Bronze
Back the work
$10 / month
- Your name on our supporters list
- A vote on what we build next
- A thank you in the build log
Open-weight model
An open AI model for finance and money questions in plain chat. It runs fully offline on a small desktop, so account details and deal terms never leave your machine.
Finance Chat
Finance Chat is an open AI model for money and finance questions in plain chat. It runs fully offline on a small desktop, so account numbers, deal terms, and other private figures never leave your machine.
It talks through finance ideas in simple words: what a margin is, how a balance sheet fits together, or what a term in a deal means. It is built on AdaptLLM’s finance-chat, an open model trained on finance text, and packed into ready-to-run files for a single desktop.
We scored five builds on FinanceBench, a strict 50-question test that only counts an exact number as right. The scores are low, 14 to 18 percent, and this is the honest weak spot: the model is good at explaining finance in words, but not at pulling exact figures out of a filing. So use it to learn and to draft, and check every number yourself. One nice find from our testing: the Q8_0 build matches the full-size model almost exactly while taking far less space.
Download the GGUF files (the ready-to-run format) and run them with llama.cpp on a Spark-class desktop, a small AI machine with 128 GB of memory. The Q4_K_M build is the fastest, at about 31 tokens a second, and a good place to start.
huggingface-cli download Orionfold/finance-chat-GGUFllama-cli -hf Orionfold/finance-chat-GGUF:Q4_K_Mllama-cli -hf Orionfold/finance-chat-GGUF:Q4_K_M -p "Explain the difference between gross margin and net margin."from llama_cpp import Llama
llm = Llama.from_pretrained(
repo_id="Orionfold/finance-chat-GGUF",
filename="*Q4_K_M.gguf",
)
out = llm("Explain the difference between gross margin and net margin.")
print(out["choices"][0]["text"])
| Build | FinanceBench score | Speed on a Spark |
|---|---|---|
| Q4_K_M (fastest) | 14% | 31 tokens a second |
| Q5_K_M | 16% | 27 tokens a second |
| Q6_K | 16% | 24 tokens a second |
| Q8_0 | 18% | 9 tokens a second |
| F16 (full size) | 18% | 12 tokens a second |
Live counts from HuggingFace, refreshed when the site builds. Built and maintained in the open by Orionfold.
Back this work with a monthly tier. Your support moves your requests up the list, and Gold or Platinum earns a badge on the roadmap item you back.
Back the work
$10 / month
Get a say
$25 / month
Move it up the list
$50 / month
Shape the roadmap
$100 / month
Need something specific? Send an enquiry from the roadmap.

Offline patent reasoning in ready-to-run files, built with the NeMo toolkit. Nothing leaves your desktop.

Real notes from doing AI research on one desktop. The NVIDIA DGX Spark is a small machine with huge power (petascale means it runs about a quadrillion math steps a second), so you can push local AI further with no cloud needed. Every lesson is backed by code that runs.