AVALON-2B
The first sub-3B language model that knows what it doesn't know.
1.88B parameters built on Qwen 3.5 2B. A five-token reflection vocabulary — [Retrieval] [No Retrieval] [Relevant] [Utility:5] — and a 22M-parameter MiniLM router at 90.5% accuracy. 82.5% Self-RAG token accuracy under LoRA, 40 tok/s on Apple M3, 12 tok/s on iPhone 15 Pro.