Receipts

The proof, not the pitch.

We do not ask you to trust us. We lock the test before we train, publish the receipts, and let you rerun them. Here is what one desk can prove.

Every proved mark has a frozen test you can rerun.

We write the hard test and lock it before we train, so we cannot cheat. A proved mark means we proved it and published the receipts, and it links to the page where you can check our numbers. The marks read like this:

  • proved on a locked test
  • yes, strong
  • possible, not guaranteed
  • not built for it
What it can do | Advisor 4B · ours | Kepler 8B · ours | Patent 8B · ours | Qwen3 8B · base | Llama 3.3 70B · base | DeepSeek-R1 · base
Runs offline on a desk box
Answers from your own documents
Gives exact source citations
Refuses cleanly, no made-up answers
Calls tools, takes actions
Shows step-by-step reasoning
Returns one checkable number
Checks its own memory quality

Base-model marks (Qwen3, Llama, DeepSeek-R1) reflect general, vendor-documented capability, not our measurement. Only the proved cells come with a frozen test anyone can rerun.
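
One common way to make a locked test checkable is to publish a fingerprint of the test file before training, then let anyone recompute it later. The sketch below illustrates that idea in Python and is not taken from the pipeline described here: the function names are hypothetical, and it assumes the frozen test ships as a single file.

```python
import hashlib
from pathlib import Path


def fingerprint(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path: Path, published_digest: str) -> bool:
    """True if the file still matches the digest published before training.

    `published_digest` is assumed to be a hex string recorded at lock time.
    """
    return fingerprint(path) == published_digest.lower()
```

If the digest recorded before training matches the digest of the file you download, the test was not edited after the fact. Changing a single byte changes the fingerprint.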

Typing speed, words a second

A free box on your desk keeps pace with the big paid models.

  • Our Advisor (4B), your desk, free and private: ~70
  • DeepSeek V4 Pro, hosted, paid: 78.6
  • Claude Opus 4.7, hosted, paid: 50.9
  • GPT-5.4, hosted, paid: 166.8

Straight talk: those big hosted models are far smarter than our small one, and some speed-chip services are faster still. GPT-5.4 types more than twice as fast. The WOW is raw typing speed in the same league as DeepSeek V4 Pro and Claude Opus 4.7, from a box on your desk, for free, that never sends your data out.
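
To see how close the speeds are, you can divide the desk box's figure by each hosted figure. This is a small Python sketch using only the numbers listed above. It treats "~70" as exactly 70.0, which is an assumption, and the function name is made up for illustration.

```python
# Words per second, copied from the figures above. "~70" is approximate.
speeds = {
    "Our Advisor (4B)": 70.0,
    "DeepSeek V4 Pro": 78.6,
    "Claude Opus 4.7": 50.9,
    "GPT-5.4": 166.8,
}


def relative_to(baseline: str, table: dict[str, float]) -> dict[str, float]:
    """Return the baseline's speed as a multiple of every other entry's speed."""
    base = table[baseline]
    return {
        name: round(base / wps, 2)
        for name, wps in table.items()
        if name != baseline
    }


ratios = relative_to("Our Advisor (4B)", speeds)
# {'DeepSeek V4 Pro': 0.89, 'Claude Opus 4.7': 1.38, 'GPT-5.4': 0.42}
```

A ratio near 1.0 means the two type at about the same pace. Below 1.0 the hosted model is faster, and above 1.0 the desk box is faster.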

We proved these wrong

Five things almost everyone assumes about AI. Each one, next to the receipt from our own desk that breaks it.

  • Real AI needs the cloud.

    We ran 50 experiments overnight for 2 cents, all on a desk.

  • Failure is expensive, so play it safe.

    On the box, a failed try costs a fraction of a penny. So we let the machine fail 42 times out of 50, on purpose.

  • Better answers come from a smarter model.

    Our biggest jump came from the ranker, which picks which notes to read first, not from the model. Answer quality went from 3.30 to 4.27 out of 5.

  • A small box can't beat a managed service.

    Dropping to a 4-bit mode made our box 76% faster than the convenient default.

  • Bigger models are always safer to trust.

    Our 4B out-trusts a 30B, 18 of 21 versus 8 of 21, and only the big one made up answers.
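
The arithmetic behind the receipts above can be redone in a few lines. This Python sketch uses only the figures quoted in the list. The variable names are illustrative, and it assumes the 50 overnight experiments and the 42-of-50 failures refer to the same run, which the list does not state outright.

```python
# Figures quoted in the list above.
experiments = 50
total_cost_cents = 2.0
failures = 42

# Cost of one experiment, in cents (assumes the 2 cents covers all 50).
cost_per_experiment_cents = total_cost_cents / experiments

# Share of tries allowed to fail.
failure_rate = failures / experiments

# Relative gain in answer quality from the ranker change (3.30 to 4.27 out of 5).
quality_gain = (4.27 - 3.30) / 3.30

# Scores out of 21 for the 4B and the 30B model.
small_model_rate = 18 / 21
large_model_rate = 8 / 21

print(f"cost per experiment: {cost_per_experiment_cents:.2f} cents")
print(f"failure rate: {failure_rate:.0%}")
print(f"quality gain: {quality_gain:.1%}")
print(f"4B: {small_model_rate:.1%}  30B: {large_model_rate:.1%}")
```

Run as written, this gives 0.04 cents per experiment, an 84% failure rate, a quality gain of about 29.4%, and scores of about 85.7% for the 4B against 38.1% for the 30B.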

Proof you can rerun. An AI team you can own.

The receipts are the reason to believe. The payoff is what they earn: a private AI team that runs on the Spark you already own, with no cloud account and no per-word bills. The full story of why is in the letter from the founder.