Open-weight model

Advisor

A small open AI model that answers from your own notes and files and names the exact source. When the answer is not in your notes, it says so instead of making one up. It runs fully offline on a desktop you own.

Advisor
Field: Grounded answers
Runs: Fully offline
Built on: NVIDIA Nemotron 4B
License: NVIDIA Nemotron Open, free

Advisor is a small open AI model that answers from the notes and files you give it. It does two things most models will not. First, it names the exact source for its answer, so you can check it. Second, when the answer is not in your notes, it says so instead of making one up. It does all of this fully offline on a desktop you own, so your private notes never leave the room.

What it can do

You hand Advisor a set of your own notes, files, or records. You ask a question. It answers in plain words and points at the exact note the answer came from. If your notes do not hold the answer, it tells you that plainly. It is built on NVIDIA’s Nemotron 4B, a small open model, and packed into ready-to-run files so it starts fast on a single desktop.

Why “it says so” matters

Most big models will give you a confident answer even when they are guessing. That is the dangerous part: a made-up answer that sounds sure. Advisor is tuned the other way. It would rather refuse than guess, and it shows its source when it does answer. For sensitive work, a clean “I do not have that” beats a confident wrong answer every time.

How well it works

We wrote a hard test of 21 questions and locked it before training, so we could not tune the model to the test. Our 4B Advisor scored 18 of 21. A model eight times bigger scored 8 of 21 and made up 3 answers. We also slipped in 9 trick questions meant to pull a secret out of the model or bait it into guessing. Advisor refused all 9 and leaked nothing. The Benchmarks table has the numbers. You can rerun the whole test yourself on the proof page.
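To put the scores above on one scale, here is a small sketch that turns them into percentages. The counts are the ones quoted on this page; the function and variable names are only illustrative.

```python
# Counts quoted on this page for the locked 21-question test.
TOTAL_QUESTIONS = 21
RESULTS = {
    "Advisor (4B)": {"correct": 18, "made_up": 0},
    "Model 8x bigger": {"correct": 8, "made_up": 3},
}


def accuracy_percent(correct: int, total: int) -> float:
    """Share of correct answers as a percentage, rounded to one decimal."""
    return round(100 * correct / total, 1)


for name, result in RESULTS.items():
    pct = accuracy_percent(result["correct"], TOTAL_QUESTIONS)
    print(f"{name}: {result['correct']}/{TOTAL_QUESTIONS} = {pct}% "
          f"({result['made_up']} made-up answers)")
```

With only 21 questions, each one moves the score by about 4.8 points, so read the percentages as a rough guide.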

This is one locked test, not a promise about every question. Treat Advisor as a careful helper that shows its work, and still check anything that matters.

How to run it

Download the GGUF files (the ready-to-run format) and run them with llama.cpp on a Spark-class desktop, a small AI machine with 128 GB of memory. Start with the Q4_K_M build: it is about 2.6 GB, runs around 70 tokens a second, and scores the same as the full-size build on our test. If you would rather have the whole setup done for you, with a cockpit and memory built in, that is the Advisor offer.
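As a rough sanity check on those figures, the sketch below works out what they imply. It assumes "4B" means exactly 4 billion weights and that 1 GB is 10^9 bytes; neither is stated on this page, and the 500-token answer length is just an example.

```python
PARAMS = 4e9            # assumption: "4B" taken as exactly 4 billion weights
FILE_BYTES = 2.6e9      # "about 2.6 GB", assuming 1 GB = 10^9 bytes
TOKENS_PER_SECOND = 70  # "around 70 tokens a second" on a Spark desktop


def bits_per_weight(file_bytes: float, params: float) -> float:
    """Average storage per weight across the whole file, in bits."""
    return file_bytes * 8 / params


def seconds_to_generate(tokens: int, tokens_per_second: float) -> float:
    """Time to produce an answer of the given length at a steady rate."""
    return tokens / tokens_per_second


print(f"{bits_per_weight(FILE_BYTES, PARAMS):.1f} bits per weight")      # 5.2
print(f"{seconds_to_generate(500, TOKENS_PER_SECOND):.1f} s per 500 tokens")  # 7.1
```

The bits-per-weight number is an average over the file, so it sits a little above 4 even for a 4-bit build.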

Install

huggingface-cli download Orionfold/Advisor-GGUF

Use it

llama-cli -hf Orionfold/Advisor-GGUF:Q4_K_M -p "Using only the notes I gave you, what did we decide about pricing, and which note says so?"
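The command above refers to "the notes I gave you", but this page does not say how notes are handed to the model. One plausible way is to paste them into the prompt, each labeled with its file name so the answer can point back to it. The sketch below assumes that approach; the label format, the wording of the instructions, and the notes folder are all hypothetical. It uses only the llama-cli flags shown above.

```python
import subprocess
from pathlib import Path

MODEL = "Orionfold/Advisor-GGUF:Q4_K_M"  # repo and build named on this page


def load_notes(folder: str) -> dict[str, str]:
    """Read every .txt file in a folder, keyed by file name (hypothetical layout)."""
    return {p.name: p.read_text(encoding="utf-8")
            for p in sorted(Path(folder).glob("*.txt"))}


def build_prompt(notes: dict[str, str], question: str) -> str:
    """Put each note under a name label, then the question (assumed format)."""
    parts = [
        "Answer using only the notes below and name the note you used.",
        "If the notes do not contain the answer, say so.",
        "",
    ]
    for name, text in notes.items():
        parts += [f"[note: {name}]", text.strip(), ""]
    parts.append(f"Question: {question}")
    return "\n".join(parts)


def llama_command(prompt: str) -> list[str]:
    """Same invocation as the command above, with the built prompt."""
    return ["llama-cli", "-hf", MODEL, "-p", prompt]


def ask(folder: str, question: str) -> None:
    """Run llama-cli on the notes in a folder; requires llama.cpp installed."""
    prompt = build_prompt(load_notes(folder), question)
    subprocess.run(llama_command(prompt), check=True)
```

Calling ask("notes", "What did we decide about pricing, and which note says so?") would run the same question as above against every .txt file in a local notes folder.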

Specs

Base model: NVIDIA-Nemotron-3-Nano-4B
Size: 4B (small enough for a desktop)
Format: GGUF (ready to run)
Best build: Q4_K_M (about 2.6 GB, about 70 tokens a second on a Spark desktop)
License: NVIDIA Nemotron Open Model License (free to use)

Benchmarks

Model | Score on the locked test | Made-up answers | Trick questions refused
Our Advisor (4B) | 18 of 21 | 0 | 9 of 9
A model 8x bigger | 8 of 21 | 3 | fewer