The proof, not the pitch.
We do not ask you to trust us. We lock the test before we train, publish the receipts, and let you rerun them. Here is what one desk can prove.
Every ★ has a frozen test you can rerun.
We write the hard test and lock it before we train, so we cannot cheat. A ★ means we proved it and published the receipts, and it links to the page where you can check our numbers. The rest of the marks read like this:
- ★ proved on a locked test
- ● yes, strong
- ◐ possible, not guaranteed
- ○ not built for it
| What it can do | Advisor 4B · ours | Kepler 8B · ours | Patent 8B · ours | Qwen3 8B · base | Llama 3.3 70B · base | DeepSeek-R1 base |
|---|---|---|---|---|---|---|
| Runs offline on a desk box | ● | ● | ● | ● | ◐ | ◐ |
| Answers from your own documents | ★ | ○ | ◐ | ◐ | ◐ | ◐ |
| Gives exact source citations | ★ | ○ | ★ | ◐ | ◐ | ◐ |
| Refuses cleanly, no made-up answers | ★ | ○ | ◐ | ◐ | ◐ | ◐ |
| Calls tools, takes actions | ◐ | ○ | ○ | ● | ● | ● |
| Shows step-by-step reasoning | ○ | ★ | ★ | ◐ | ○ | ● |
| Returns one checkable number | ○ | ★ | ○ | ◐ | ◐ | ◐ |
| Checks its own memory quality | ★ | ○ | ○ | ○ | ○ | ○ |
Base-model marks (Qwen3, Llama, DeepSeek-R1) are general, vendor-documented capability, not our measurement. Only the ★ cells come with a frozen test anyone can rerun.
Typing speed, words a second
A free box on your desk keeps pace with the big paid models.
Our Advisor (4B)
your desk, free and private
DeepSeek V4 Pro
hosted, paid
Claude Opus 4.7
hosted, paid
GPT-5.4
hosted, paid
Straight talk: those big hosted models are far smarter than our small one, and some speed-chip services are faster still. The WOW is the dead heat on raw typing speed from a box on your desk, for free, that never sends your data out.
The numbers that pop
Our desk box ran 50 AI experiments by itself in 73 minutes, on about 2 cents of electricity. The same loop on rented cloud chips runs to dollars, plus a fee for every word. And ours never sends your data out.
See the work →On a hard locked test, our 4B Advisor scored 18 of 21. A model eight times bigger scored 8 of 21 and made up 3 fake answers. Advisor refused all 9 trick questions and leaked zero secrets.
See the work →A special 4-bit mode made the same model run 76% faster, 38.8 words a second up from 22.1, on a fraction of the memory. Less memory means more room for everything else on the one box.
See the work →Before booking a big cloud training run, we test the design on the desk first. A $1 test on the box gates about $1,679 of expected loss. One wrong booking it stops pays for the whole box.
See the work →We proved these wrong
Six things almost everyone assumes about AI. Each one, next to the receipt from our own desk that breaks it.
-
Real AI needs the cloud.
We ran 50 experiments overnight for 2 cents, all on a desk.
-
Failure is expensive, so play it safe.
On the box, a failed try costs a fraction of a penny. So we let the machine fail 42 times out of 50, on purpose.
-
Better answers come from a smarter model.
Our biggest jump came from the ranker, which notes to read first, not the model. Answer quality went from 3.30 to 4.27 out of 5.
-
A small box can't beat a managed service.
Dropping to a 4-bit mode made our box 76% faster than the convenient default.
-
Bigger models are always safer to trust.
Our 4B out-trusts a 30B, 18 of 21 versus 8 of 21, and only the big one made up answers.
Proof you can rerun. An AI team you can own.
The receipts are the reason to believe. The payoff is what they earn: a private AI team that runs on the Spark you already own, with no cloud account and no per-word bills. The full story of why is in the letter from the founder.