
A family of efficient AI models for long-context processing, from edge devices to maximum-context deployments.
“The bottleneck in AI-assisted work is not intelligence — it is memory. Asa was built to remove that constraint entirely.”
Engineered for on-device and low-latency environments. Nano fits within strict memory and compute budgets without sacrificing reasoning coherence.
Parameters: 3B
Latency: <50ms
Context: 128K tokens
The workhorse of the family. Core balances intelligence and throughput for production workloads — document processing, code generation, and enterprise assistants.
Parameters: 32B
Latency: <200ms
Context: 1M tokens
The full-context model. Max holds entire codebases, legal case files, or research corpora in a single pass — enabling reasoning that shorter-context models can't sustain.
Parameters: 180B
Latency: <800ms
Context: 10M tokens
Sustained coherence across book-length inputs. No chunking artifacts, no lost context mid-document.
Full-repo ingestion for code review, refactoring suggestions, and dependency auditing in a single pass.
Conversation histories that span hours without degradation, ideal for complex negotiations or extended interviews.
Simultaneous reasoning across multiple primary sources, surfacing contradictions and convergences.
First-class JSON, Markdown, and XML output modes with strict schema adherence — production-ready without post-processing.
State-of-the-art benchmark performance on complex, multi-step instruction sets across both technical and natural language tasks.
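The structured-output claim above implies that downstream code can consume responses directly, with no repair step. A minimal Python sketch of that contract, using a hypothetical schema and a fabricated sample response (not actual Asa API output):

```python
import json

# Illustrative only: the schema and the sample response below are invented
# to show what "strict schema adherence" means for a consumer.
expected_types = {"title": str, "parties": list, "effective_date": str}

sample_response = (
    '{"title": "Master Services Agreement",'
    ' "parties": ["Acme Corp", "Globex Inc"],'
    ' "effective_date": "2024-03-01"}'
)

parsed = json.loads(sample_response)

# Strict adherence: exactly the requested keys, each with the expected type,
# so no post-processing or validation-and-repair loop is required.
assert set(parsed) == set(expected_types)
assert all(isinstance(parsed[k], t) for k, t in expected_types.items())
```

Under this assumption, the parsed object can flow straight into application logic.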
Asa is available across all three tiers for qualified enterprise deployments. Contact us to determine the right fit for your infrastructure and workload.
Request Access