Bronze
Back the work
$10 / month
- Your name on our supporters list
- A vote on what we build next
- A thank you in the build log
Open-weight model
A small open AI model that answers from your own notes and files and names the exact source. When the answer is not in your notes, it says so instead of making one up. It runs fully offline on a desktop you own.
Advisor
Advisor is a small open AI model that answers from the notes and files you give it. It does two things most models will not. First, it names the exact source for its answer, so you can check it. Second, when the answer is not in your notes, it says so instead of making one up. It does all of this fully offline on a desktop you own, so your private notes never leave the room.
You hand Advisor a set of your own notes, files, or records. You ask a question. It answers in plain words and points at the exact note the answer came from. If your notes do not hold the answer, it tells you that plainly. It is built on NVIDIA’s Nemotron 4B, a small open model, and packed into ready-to-run files so it starts fast on a single desktop.
Most big models will give you a confident answer even when they are guessing. That is the dangerous part: a made-up answer that sounds sure. Advisor is tuned the other way. It would rather refuse than guess, and it shows its source when it does answer. For sensitive work, a clean “I do not have that” beats a confident wrong answer every time.
We wrote a hard test of 21 questions and locked it before training, so we could not cheat. Our 4B Advisor scored 18 of 21. A model eight times bigger scored 8 of 21 and made up 3 fake answers. We also slipped in 9 trick questions meant to pull a secret out of the model or bait it into guessing. Advisor refused all 9 and leaked nothing. The results table on this page has the numbers. You can rerun the whole test yourself on the proof page.
This is one locked test, not a promise about every question. Treat Advisor as a careful helper that shows its work, and still check anything that matters.
Download the GGUF files (the ready-to-run format) and run them with llama.cpp on a Spark-class desktop, a small AI machine with 128 GB of memory. Start with the Q4_K_M build: it is about 2.6 GB, runs around 70 tokens a second, and scores the same as the full-size build on our test. If you would rather have the whole setup done for you, with a cockpit and memory built in, that is the Advisor offer.
huggingface-cli download Orionfold/Advisor-GGUF
llama-cli -hf Orionfold/Advisor-GGUF:Q4_K_M
llama-cli -hf Orionfold/Advisor-GGUF:Q4_K_M -p "Using only the notes I gave you, what did we decide about pricing, and which note says so?"

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Orionfold/Advisor-GGUF",
    filename="*Q4_K_M.gguf",
)
out = llm("Using only the notes I gave you, what did we decide about pricing, and which note says so?")
print(out["choices"][0]["text"])
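The prompt in the snippet above refers to "the notes I gave you", but the notes still have to reach the model somehow. Here is a minimal sketch of one way to do that, assuming the notes can simply be pasted into the prompt as labeled blocks. The `build_prompt` helper, the bracketed label format, the instruction wording, and the sample notes are all hypothetical and are not taken from Advisor's documentation, so the prompt layout the model was actually tuned on may differ.

```python
def build_prompt(notes: dict[str, str], question: str) -> str:
    """Join labeled notes and a question into one prompt string.

    Assumption: the model accepts notes pasted inline. The layout
    below is illustrative, not Advisor's documented format.
    """
    # Label each note with its name so the model has something to cite.
    blocks = [f"[{name}]\n{text.strip()}" for name, text in notes.items()]
    instructions = (
        "Answer using only the notes below. "
        "Name the note the answer came from. "
        "If the notes do not hold the answer, say so."
    )
    return "\n\n".join([instructions, *blocks, f"Question: {question}", "Answer:"])


# Hypothetical notes, for illustration only.
notes = {
    "pricing-meeting.md": "We agreed to keep one flat price for the first year.",
    "roadmap.md": "Next up: a faster installer.",
}
prompt = build_prompt(
    notes, "What did we decide about pricing, and which note says so?"
)
print(prompt)
```

The resulting `prompt` string can be passed to `llm(prompt)` in place of the fixed question. Long note sets may not fit in the model's context window, so in practice you may need to pass only the notes that are likely to matter.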
| Model | Score on the locked test | Made-up answers | Trick questions refused |
|---|---|---|---|
| Our Advisor (4B, ours) | 18 of 21 | 0 | 9 of 9 |
| A model 8x bigger | 8 of 21 | 3 | fewer |
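To compare the two rows as percentages, the arithmetic can be sketched as below. It uses only the counts reported in the table and assumes the made-up answers are counted out of the same 21 questions. The labels and the `rate` helper are illustrative.

```python
TOTAL_QUESTIONS = 21  # size of the locked test, as reported above

# Counts copied from the results table.
results = {
    "Advisor (4B)": {"correct": 18, "made_up": 0},
    "Model 8x bigger": {"correct": 8, "made_up": 3},
}


def rate(count: int, total: int = TOTAL_QUESTIONS) -> float:
    """Return count/total as a percentage, rounded to one decimal place."""
    return round(100 * count / total, 1)


for name, row in results.items():
    print(f"{name}: {rate(row['correct'])}% correct, {rate(row['made_up'])}% made up")
```

With these counts, Advisor answers 85.7% of the questions correctly and the larger model 38.1%, with 14.3% of the larger model's answers made up. With only 21 questions, each question moves a score by almost five points, which is one more reason to read the result as a single test and not a general ranking.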
Back this work with a monthly tier. Your support moves your requests up the list, and Gold or Platinum earns a badge on the roadmap item you back.
Back the work
$10 / month
Get a say
$25 / month
Move it up the list
$50 / month
Shape the roadmap
$100 / month
Need something specific? Send an enquiry from the roadmap.

Offline patent reasoning in ready-to-run files, built with the NeMo toolkit. Nothing leaves your desktop.

A book on running a business with AI agents (software helpers that do the work for you). Fourteen short chapters in four parts take you from the first idea to a working system. About a two-hour read, open to all.