Runs on the device
AVALON-2B is sub-3B parameters and ships as GGUF quants: 40 tok/s on Apple M3, comfortable on flagship phones, free on every desktop. Speed you can feel.
Personal intelligence isn't luxury infrastructure. It should run on the phone in your pocket, on the laptop in front of you, in the apps you actually use every day. AVALON-2B is the on-device Self-RAG runtime; Hypersave is the cognitive memory layer that makes the app remember you. Together they're the substrate for the next generation of consumer AI.
Hypersave gives every consumer app a persistent cognitive layer. The user's preferences, history, voice — captured once, recalled everywhere. Users stop having to introduce themselves to every new app.
AVALON’s reflection vocabulary lets a small on-device model decide when it needs to consult memory or external sources — and admit when it doesn’t know. No silent confabulation.
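In practice, this means the app can branch on explicit tokens the model emits instead of guessing intent from prose. A minimal sketch of that routing, assuming placeholder token names ([RETRIEVE], [UNSURE]) that are illustrative, not AVALON's documented vocabulary:

```typescript
// Sketch of branching on Self-RAG-style reflection tokens.
// The token strings below are placeholders, not AVALON's actual vocabulary.
type Action = "answer" | "consult_memory" | "abstain";

function routeOnReflection(generation: string): Action {
  if (generation.includes("[RETRIEVE]")) return "consult_memory"; // model asked for context
  if (generation.includes("[UNSURE]")) return "abstain";          // model admits uncertainty
  return "answer";                                                // confident direct answer
}
```

The app, not the model, decides what "consult memory" means in context: a Hypersave recall, a web search, or a clarifying question back to the user.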
Inference stays local. Memory is keyed per device. No mandatory call-home. Build consumer apps where the privacy story is the product story.
Nuro Chat is multi-model chat with persistent memory built on Hypersave — switch models without losing context, recall conversations from last week, run AVALON-2B locally when you don't want to send a turn to the cloud. The Nuro stack (Chat, Studio, One) is the consumer surface and the reference implementation.
The fastest way from open-weights model to consumer app with a memory. Run AVALON through Ollama on the user's machine, route memory through Hypersave (managed or self-hosted). Nothing else to glue.
# On the user's device
ollama pull nuroai/avalon-2b
# In your app
npm install @hypersave/sdk ollama

import { Hypersave } from "@hypersave/sdk";
import ollama from "ollama";

const memory = new Hypersave({ apiKey: process.env.HYPERSAVE_KEY });

async function reply(userId: string, prompt: string) {
  // Recall what Hypersave knows about this user, scoped to the current prompt
  const { answer: context } = await memory.recall({ userId, query: prompt });
  const response = await ollama.chat({
    model: "nuroai/avalon-2b", // local — never hits the cloud
    messages: [
      { role: "system", content: `What you remember about the user: ${context}` },
      { role: "user", content: prompt },
    ],
  });
  // Store the turn so future sessions can recall it
  await memory.remember({ userId, text: prompt, sector: "episodic" });
  return response.message.content;
}

Sub-3B Self-Reflective RAG model. Apache 2.0. GGUF quants on Hugging Face. Ollama-ready. 40 tok/s on Apple M3. Beats Qwen 3.5 2B, Gemma 4 E2B, SmolLM3 3B on its target benchmarks.
Read the paper →

Five sectors, Ebbinghaus decay, RRF hybrid retrieval. Embed via TS or Python SDK. Use the managed cloud or self-host on the user's own infrastructure.
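Reciprocal rank fusion (RRF), the technique behind that hybrid retrieval, is simple to sketch. The following is an illustrative standalone implementation, not Hypersave's actual code; the `rrfFuse` name and the k = 60 default are assumptions (60 is a common default for RRF):

```typescript
// Illustrative reciprocal rank fusion over ranked lists of document IDs.
// Not Hypersave's actual code; names and the k = 60 default are assumptions.
type Ranking = string[]; // document IDs, best first

function rrfFuse(rankings: Ranking[], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((doc, i) => {
      // Each retriever contributes 1 / (k + rank), with rank 1-based
      scores.set(doc, (scores.get(doc) ?? 0) + 1 / (k + i + 1));
    });
  }
  // Highest fused score first
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([doc]) => doc);
}

// "d1" ranks highly with both retrievers, so it wins even though
// each retriever put a different document first.
const dense = ["d2", "d1", "d3"];  // e.g. vector search
const sparse = ["d1", "d4", "d2"]; // e.g. keyword search
console.log(rrfFuse([dense, sparse])); // d1, d2, d4, d3
```

The point of RRF is that it needs only ranks, never raw scores, so it fuses a vector index and a keyword index without any score calibration between them.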
Read the docs →

A 2B model that knows when it doesn't know, paired with a memory layer that actually fuses retrieval — that is the substrate consumer AI has been waiting for.
Free tier on Hypersave. AVALON-2B free forever (Apache 2.0). If you're shipping consumer AI and want to compare notes, write to press@nuroailabs.com.