Open-weight model

Finance Chat

An open AI model for finance and money questions in plain chat. It runs fully offline on a small desktop, so account details and deal terms never leave your machine.

Field: Finance
Runs: Fully offline
Built on: AdaptLLM finance-chat
License: Free to use

What it can do

It talks through finance ideas in simple words: what a margin is, how a balance sheet fits together, or what a term in a deal means. It is built on AdaptLLM’s finance-chat, an open model trained on finance text, and packed into ready-to-run files for a single desktop.

How well it works

We scored five builds on FinanceBench, a strict 50-question test that only counts an exact number as right. The scores are low, 14 to 18 percent (7 to 9 correct answers out of 50), and this is the honest weak spot: the model is good at explaining finance in words, but not at pulling exact figures out of a filing. So use it to learn and to draft, and check every number yourself. One useful finding from our testing: the Q8_0 build matches the full-size F16 model almost exactly, both scoring 18 percent, while taking far less space.
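To make "only counts an exact number as right" concrete, here is a minimal sketch of exact-number scoring. It is an assumption about how such a check could work, not the actual FinanceBench grader: the helper names, the choice to take the last number in an answer, and the toy data are all hypothetical.

```python
import re

def extract_number(text):
    """Return the last number found in text as a float, or None if there is none."""
    # Drop thousands separators, then look for optionally signed decimals.
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(matches[-1]) if matches else None

def exact_match_score(predictions, references):
    """Fraction of answers whose extracted number equals the reference exactly."""
    correct = 0
    for pred, ref in zip(predictions, references):
        value = extract_number(pred)
        if value is not None and value == float(ref):
            correct += 1
    return correct / len(references)

# Hypothetical toy data, not taken from FinanceBench.
preds = ["Gross margin was 41.5%", "About 1,200 million", "I am not sure"]
refs = ["41.5", "1250", "3.2"]
score = exact_match_score(preds, refs)  # only the first answer matches
```

Under a rule like this, a close answer such as 1,200 against a reference of 1,250 earns nothing, which is one reason a model that explains well can still score low.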

How to run it

Download the GGUF files (the ready-to-run format) and run them with llama.cpp on a Spark-class desktop, a small AI machine with 128 GB of memory. The Q4_K_M build is the fastest, at about 31 tokens a second, and a good place to start.

Install

huggingface-cli download Orionfold/finance-chat-GGUF

Use it

llama-cli -hf Orionfold/finance-chat-GGUF:Q4_K_M -p "Explain the difference between gross margin and net margin."
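
If you would rather call the model from a script, the command above can be wrapped in a few lines of Python. This is a sketch under assumptions: it expects `llama-cli` to be on your PATH, uses only the `-hf` and `-p` flags shown above, and the function names `build_command` and `ask` are made up for illustration. Some llama.cpp versions open an interactive chat session by default, so standard input is closed here to keep the call from waiting for typed input; check `llama-cli --help` for your version.

```python
import subprocess

REPO = "Orionfold/finance-chat-GGUF"  # repository name from this page
BUILDS = ("Q4_K_M", "Q5_K_M", "Q6_K", "Q8_0", "F16")  # builds listed on this page

def build_command(prompt, build="Q4_K_M"):
    """Return the llama-cli argument list for one prompt (hypothetical helper)."""
    if build not in BUILDS:
        raise ValueError(f"unknown build: {build}")
    return ["llama-cli", "-hf", f"{REPO}:{build}", "-p", prompt]

def ask(prompt, build="Q4_K_M"):
    """Run llama-cli once and return whatever it writes to standard output."""
    result = subprocess.run(
        build_command(prompt, build),
        stdin=subprocess.DEVNULL,  # no keyboard input, so the call cannot block on it
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if llama-cli exits with an error
    )
    return result.stdout
```

Calling `ask("Explain the difference between gross margin and net margin.")` would then return the model's reply as a string, along with anything else llama-cli prints to standard output.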

Specs

Base model: AdaptLLM/finance-chat
Format: GGUF (ready to run)
Builds: Q4_K_M · Q5_K_M · Q6_K · Q8_0 · F16
Best build: Q4_K_M (about 31 tokens a second on a Spark desktop)
License: Free to use

Benchmarks

Build | FinanceBench score | Speed on a Spark
Q4_K_M (fastest) | 14% | 31 tokens a second
Q5_K_M | 16% | 27 tokens a second
Q6_K | 16% | 24 tokens a second
Q8_0 | 18% | 9 tokens a second
F16 (full size) | 18% | 12 tokens a second
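
The table trades score against speed, so one way to read it is to set a minimum speed and take the highest-scoring build that meets it. The sketch below does that with the figures from the table. The selection rule (best score first, then faster on ties) and the function names are assumptions for illustration, not a recommendation from the maintainers.

```python
# Figures copied from the benchmark table: (score in percent, tokens per second).
BENCHMARKS = {
    "Q4_K_M": (14, 31),
    "Q5_K_M": (16, 27),
    "Q6_K": (16, 24),
    "Q8_0": (18, 9),
    "F16": (18, 12),
}
QUESTIONS = 50  # size of the FinanceBench test as stated on this page

def correct_answers(build):
    """Convert a build's percentage score into a count of correct answers."""
    score, _speed = BENCHMARKS[build]
    return score * QUESTIONS // 100

def pick_build(min_speed):
    """Return the highest-scoring build at or above min_speed tokens a second."""
    candidates = {name: v for name, v in BENCHMARKS.items() if v[1] >= min_speed}
    if not candidates:
        return None
    # Tuples compare by score first, then by speed, so ties go to the faster build.
    return max(candidates, key=lambda name: candidates[name])
```

With a floor of 20 tokens a second this picks Q5_K_M, and with a floor of 30 it picks Q4_K_M. Each question is worth 2 percentage points, so the gap between the weakest and strongest builds is two questions.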

Used in the open

Live counts from Hugging Face, refreshed when the site builds. Built and maintained in the open by Orionfold.

Downloads in the last 30 days: 531