Bronze
Back the work
$10 / month
- Your name on our supporters list
- A vote on what we build next
- A thank you in the build log
Open-weight model
An open AI model tuned for legal text and built to follow instructions. It runs fully offline on a small desktop, so client matters and case files never leave the room.
Saul 7B Instruct
Saul 7B Instruct is an open AI model tuned for legal text. Legal work is full of careful reading, like contracts, filings, and case files, and much of it is private. This model reads and answers in private, fully offline, so client matters never leave the room.
It follows plain instructions on legal tasks: sum up a contract, pull out the key duties and dates, explain a clause in simple words, or draft a first pass at a routine document. It is built on Equall’s Saul-7B-Instruct, an open model trained on legal material, and packed into ready-to-run files for a single desktop.
We scored five builds on LegalBench, a 50-question legal test, on a small Spark desktop. The Q5_K_M build scored the best at 72 percent while running at about 20 tokens a second. If you want more speed, the Q4_K_M build is faster, at 29 tokens a second, and still scores 62 percent. The table above lists every build.
This is a short test, and it is not legal advice. Use the model as a fast first reader, and always have a person check its work before it matters.
Download the GGUF files (the ready-to-run format) and run them with llama.cpp on a Spark-class desktop, a small AI machine with 128 GB of memory. Pick Q5_K_M for the best answers, or Q4_K_M when you want it faster.
huggingface-cli download Orionfold/Saul-7B-Instruct-v1-GGUFllama-cli -hf Orionfold/Saul-7B-Instruct-v1-GGUF:Q5_K_Mllama-cli -hf Orionfold/Saul-7B-Instruct-v1-GGUF:Q5_K_M -p "Summarize the key duties in a standard non-disclosure agreement."from llama_cpp import Llama
llm = Llama.from_pretrained(
repo_id="Orionfold/Saul-7B-Instruct-v1-GGUF",
filename="*Q5_K_M.gguf",
)
out = llm("Summarize the key duties in a standard non-disclosure agreement.")
print(out["choices"][0]["text"])
| Build | LegalBench score | Speed on a Spark |
|---|---|---|
| Q4_K_M (fastest) | 62% | 29 tokens a second |
| Q5_K_M (best score) | 72% | 20 tokens a second |
| Q6_K | 68% | 22 tokens a second |
| Q8_0 | 66% | 7 tokens a second |
| F16 (full size) | 68% | 11 tokens a second |
Live counts from HuggingFace, refreshed when the site builds. Built and maintained in the open by Orionfold.
Back this work with a monthly tier. Your support moves your requests up the list, and Gold or Platinum earns a badge on the roadmap item you back.
Back the work
$10 / month
Get a say
$25 / month
Move it up the list
$50 / month
Shape the roadmap
$100 / month
Need something specific? Send an enquiry from the roadmap.

Offline patent reasoning in ready-to-run files, built with the NeMo toolkit. Nothing leaves your desktop.

Real notes from doing AI research on one desktop. The NVIDIA DGX Spark is a small machine with huge power (petascale means it runs about a quadrillion math steps a second), so you can push local AI further with no cloud needed. Every lesson is backed by code that runs.