Real AI needs the cloud.

We ran 50 experiments overnight for 2 cents, all on a desk.

Failure is expensive, so play it safe.

On the box, a failed try costs a fraction of a penny. So we let the machine fail 42 times out of 50, on purpose.

Better answers come from a smarter model.

Our biggest jump came from the ranker, which notes to read first, not the model. Answer quality went from 3.30 to 4.27 out of 5.

A small box can't beat a managed service.

Dropping to a 4-bit mode made our box 76% faster than the convenient default.

Bigger models are always safer to trust.

Our 4B out-trusts a 30B, 18 of 21 versus 8 of 21, and only the big one made up answers.

Receipts

The proof, not the pitch.

Name: Orionfold Proof
Brand: Orionfold
Price: 499.00 USD
Availability: InStock
Author: Manav Sehgal

We do not ask you to trust us. We lock the test before we train, publish the receipts, and let you rerun them. Here is what one desk can prove. Open source, offline, and it runs on the machine in front of you. Rerun anything on this page yourself.

Install pip install orionfold-proof then orionfold up

Every ★ has a frozen test you can rerun.

We write the hard test and lock it before we train, so we cannot cheat. A ★ means we proved it and published the receipts, and it links to the page where you can check our numbers. The rest of the marks read like this:

★ proved on a locked test
● yes, strong
◐ possible, not guaranteed
○ not built for it

Show

★ Click a row with a star to open its receipt, the frozen test you can rerun.

What it can do	Advisor 4B · ours	Kepler 8B · ours	Patent 8B · ours	Qwen3 8B · base	Llama 3.3 70B · base	DeepSeek-R1 base
Runs offline on a desk box	●	●	●	●	◐	◐
Answers from your own documents	★	○	◐	◐	◐	◐
Gives exact source citations	★	○	★	◐	◐	◐
Refuses cleanly, no made-up answers	★	○	◐	◐	◐	◐
Calls tools, takes actions	◐	○	○	●	●	●
Shows step-by-step reasoning	○	★	★	◐	○	●
Returns one checkable number	○	★	○	◐	◐	◐
Checks its own memory quality	★	○	○	○	○	○

Base-model marks (Qwen3, Llama, DeepSeek-R1) are general, vendor-documented capability, not our measurement. Only the ★ cells come with a frozen test anyone can rerun.

Typing speed, words a second

A model on your own desk keeps pace with the big hosted ones.

Our Advisor (4B)

your own desk, fully private

~70

DeepSeek V4 Pro

hosted in the cloud

78.6

Claude Opus 4.7

hosted in the cloud

50.9

GPT-5.4

hosted in the cloud

166.8

Straight talk: those big hosted models are far smarter than our small one, and some speed-chip services are faster still. The WOW is the dead heat on raw typing speed from a model on your own desk that never sends your data out.

The Orionfold line · Flow leads

Flow Conduct documents with AI agency. Relay Run repeatable agent work. Arena See which local AI wins.

What Orionfold Proof does

The tool behind these receipts

Orionfold Proof runs on your own machine. You point it at your own task, run the AI models or setups you are weighing, and it hands back a signed receipt: which one won, at what cost, with what failures. One command to install, a cockpit in your browser, and you are proving in minutes. Nothing phones home.

The Orionfold Proof cockpit after a run: the always-on telemetry rail across the top, the recommended winner, the leaderboard, the cost-vs-quality chart, and the failure cases on one scrolling canvas. — One canvas, the whole proof: the live telemetry rail up top, then the recommended winner, the leaderboard with cost and speed, the cost-vs-quality chart, and every failure case. Scroll inside the frame to read it all.

How it works

Step 1 Configure task + candidates Point Proof at your own task. Pick the models or setups you are weighing, local or cloud, and the rubric to score them by.
Step 2 Run local, row by row Proof runs every candidate on your machine, row by row, with a live rail showing the host, the runtime, and the result as it lands. Nothing leaves your desk.
Step 3 Decide a signed receipt You get a clear winner, a leaderboard with cost and speed, the failures laid out, and a signed receipt you can rerun any time.

Inside the cockpit

Orionfold Proof — your datasets

The Datasets screen in the Proof cockpit: the frozen example sets, the governance bench, and the corpus.

Your datasets

Bring your own task rows. Proof scores against the answers you decide are right.

Orionfold Proof — any candidate

The candidate picker listing local models through Ollama and cloud models across providers, side by side.

Any candidate

Local models through Ollama or cloud models through their API, side by side in one run.

Orionfold Proof — signed receipts

The Receipts screen: the bento masthead over the Runs table, each run exportable as Markdown, HTML, or JSON.

Signed receipts

Every run exports a receipt as Markdown, HTML, or JSON. Same inputs, same receipt, every time.

The flagship proof

Reproduce a published model result on your own laptop, turnkey

We took our Advisor 4B model and ran its governance bench again, this time on an Apple M3 Max laptop instead of the lab box. It scored 18 of 21, matching the published baseline answer for answer: the same 9 of 9 refusals, and the same 3 misses. It cost $0.00 because it ran fully local. Every line of that is in the receipt, down to the config hash 552d0060f782.

The full Orionfold Proof workflow: configure the Advisor governance bench, run 21 rows locally, and reach an 18 of 21 result. — The whole run in one take: configure, generate all 21 rows locally, reach 18 of 21.

It does not hide the losses. The result misses 3 of the 21 and shows exactly which checks failed and why. A trust tool that only ever shows a perfect score is the one you should not trust.

The full Orionfold Proof results page: the recommended 18 of 21 verdict, the leaderboard with tokens per second and zero cost, the run cost ledger, and the three honest failure cases with per-check detail. — The full results page: the recommended verdict, the leaderboard, the cost ledger, and the three honest misses laid out check by check. Scroll inside the frame to read it all.

One tool, two jobs

The same locked-test-and-receipt loop serves the builder proving a model on their desk and the regulated team that has to hand a reviewer something they can check. Pick your path.

For builders

Benchmark the models on your own desk, offline, in minutes.

Run local and cloud models against the same locked test.
Read one scored receipt: which passed, which refused, which made things up.
No account, no per-word bill. It runs on the machine in front of you.

For regulated teams

Produce an audit-ready receipt your risk team can rerun.

Every run writes a signed receipt with the exact test, the models, and the scores.
The same config hash reproduces the same result, so a reviewer can check it independently.
Nothing leaves your machine. Prompts and data never reach us or any vendor you did not choose.

Prove it on your own machine

Own the tool behind these receipts.

Orionfold Proof runs on your own machine and hands back a signed receipt you can rerun. Owning it unlocks every pack it ships with, starting with the Advisor governance pack.

$349 founding, first 25 then $499 one time

or see the full details

Orionfold Proof poster: an art-deco receipt rising from a laptop into the Orion stars, the tool that proves which AI to trust.

The numbers that pop

2 cents An overnight lab

Our desk box ran 50 AI experiments by itself in 73 minutes, on about 2 cents of electricity. The same loop on rented cloud chips runs to dollars, plus a fee for every word. And ours never sends your data out.

See the work →

4B beats 30B Small out-trusts big

On a hard locked test, our 4B Advisor scored 18 of 21. A model eight times bigger scored 8 of 21 and made up 3 fake answers. Advisor refused all 9 trick questions and leaked zero secrets.

See the work →

76% faster Same model, same chip

A special 4-bit mode made the same model run 76% faster, 38.8 words a second up from 22.1, on a fraction of the memory. Less memory means more room for everything else on the one box.

See the work →

$1 guards $1,679 A cheap wind tunnel

Before booking a big cloud training run, we test the design on the desk first. A $1 test on the box gates about $1,679 of expected loss. One wrong booking it stops pays for the whole box.

See the work →

We proved these wrong

Six things almost everyone assumes about AI. Each one, next to the receipt from our own desk that breaks it.

Real AI needs the cloud.

We ran 50 experiments overnight for 2 cents, all on a desk.
Failure is expensive, so play it safe.

On the box, a failed try costs a fraction of a penny. So we let the machine fail 42 times out of 50, on purpose.
Better answers come from a smarter model.

Our biggest jump came from the ranker, which notes to read first, not the model. Answer quality went from 3.30 to 4.27 out of 5.
A small box can't beat a managed service.

Dropping to a 4-bit mode made our box 76% faster than the convenient default.
Bigger models are always safer to trust.

Our 4B out-trusts a 30B, 18 of 21 versus 8 of 21, and only the big one made up answers.

Proof in the field

Every receipt above has a story behind it: the test we locked, the run, and what we found. Here are the field notes, each one a thing you can rerun yourself.

Yours, not ours

No lock-in, ever

The tool is open and installs with one command. Your receipts are plain files you keep. If we vanished tomorrow, everything you ran still verifies.

See how

Auditable end to end

Every release ships with a signed record of where it came from and a full parts list. Check what you install before you run it.

See how

Run it yourself

Proof runs on your own machine, offline if you want. Nothing you test is uploaded. Anything that could leave the box is listed in full, with an off switch.

See how

Before you buy

The questions your risk team asks.

Proof runs on your own machine, holds none of your data, and keeps working even if we do not. Here are the four questions a careful buyer asks, answered in plain words, each backed by a page that links the code that proves it.

Does Proof send our prompts or data to you?

No. Proof runs on your own machine. Your prompts, your test data, and your results stay there. AI work runs through your own model accounts, not ours. We run no service in the middle, so there is nothing for us to see and nothing for us to hold.
See every network call
Can I run it fully offline, air-gapped?

Yes. Point Proof at local models and it never needs the internet to run a test. The one time it reaches out is the checked, one-time download of the tool itself. After that, you can pull the plug and it keeps working.
Read the security packet
What if you disappear? Do my receipts still verify?

Yes. The tool is open source and your receipts are plain files with the test, the config hash, and the scores baked in. Anyone can rerun the same config and reach the same result, with or without us. The honest edge cases are written down too.
Read the continuity note
Can my risk team rerun this independently?

That is the whole point. A receipt names the exact test and the models it ran. Hand it to a reviewer, they install Proof, run the same config, and get the same numbers. A benchmark you cannot rerun is a rumor. This one you can.
See the supply chain

Prove it yourself, on your own machine

Get Orionfold Proof

Orionfold Proof is the tool behind these receipts. It runs on your own machine, points at your own task, and tries each AI model or setup you are weighing. It scores them on a rubric you set and hands back a signed receipt: which one won, at what cost, with what failures. Nothing leaves your machine, and you can rerun the proof any time.

Install it in one command, open the cockpit in your browser, and you are proving in minutes. Owning Orionfold Proof unlocks every pack that ships with it, starting with the Advisor governance pack: rerun our headline receipt, a 4B model that out-trusts a much bigger one, on your own desk.

$349 founding, first 25 licenses

then $499 one time, 12-month window included

After the first year, $149/yr keeps it proven: new packs, new checks, and a fresh receipt. Skip it and your copy keeps working, you just stop getting new proof.

Not ready to buy?

Get the proof playbook.

The one-page guide to locking a test, running it on your own desk, and reading the receipts. Plus a note when we publish new proof.

One-page PDF + new-proof notes

By subscribing you agree to receive the AI For Everyone digest, one email a week, no more. You can unsubscribe any time. See our privacy policy.

Proof you can rerun. An AI team you can own.

The receipts are the reason to believe. The payoff is what they earn: a private AI team that runs on the Spark you already own, with no cloud account and no per-word bills. The full story of why is in the letter from the founder.

Get an AI team you own Read the letter

Orionfold Proof