quant · legal
A 7B legal-domain chat model, quantized to run offline on a consumer GPU
Equall's Saul-7B-Instruct-v1 is a Mistral-based legal chat model, strong on LegalBench-style classification, but its 13.5 GB checkpoint wants a workstation card. This release ships five GGUF variants (from Q4_K_M at 4.1 GB and 29.4 tok/s up to F16) so it runs offline on consumer hardware. Each variant carries a four-axis card measured on the DGX Spark: wikitext-2 perplexity, sustained tok/s, thermal-envelope minutes, and a LegalBench score. Orionfold's contribution is the distribution and measurement layer; Equall did the legal fine-tune.
- Offline legal-domain chat and clause/issue classification on consumer hardware
- Drafting and triage behind your own document-retrieval layer
- Picking a quant variant by workload shape, not just RAM budget
Audience — Local-LLM power users and legal-tech builders who want an offline legal chat model on a consumer GPU — for drafting and triage support, not legal advice.
| Variant | Perplexity | tok/s | LegalBench (n=50, contains) |
|---|---|---|---|
| Q4_K_M | 5.986 | 29.4 | 0.62 |
| Q5_K_M (sweet spot) | 5.938 | 20.2 | 0.72 |
| Q6_K | 5.925 | 22.4 | 0.68 |
| Q8_0 | 5.914 | 7.3 | 0.66 |
| F16 | 5.917 | 10.9 | 0.68 |
Lower perplexity is better; tok/s was measured on the DGX Spark (GB10, 128 GB unified memory).
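One way to read the table by workload shape rather than by RAM budget alone is to filter on a throughput floor and then rank on the metric you care about. The sketch below is illustrative only: the `Variant` class and `pick_variant` helper are hypothetical names, not part of this release, and the figures are copied from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    perplexity: float  # wikitext-2, lower is better
    tok_s: float       # sustained tokens per second on the DGX Spark
    legalbench: float  # mini-eval, n=50, "contains" matcher


# Figures copied from the table above.
VARIANTS = [
    Variant("Q4_K_M", 5.986, 29.4, 0.62),
    Variant("Q5_K_M", 5.938, 20.2, 0.72),
    Variant("Q6_K", 5.925, 22.4, 0.68),
    Variant("Q8_0", 5.914, 7.3, 0.66),
    Variant("F16", 5.917, 10.9, 0.68),
]


def pick_variant(min_tok_s: float = 0.0, prefer: str = "legalbench") -> Variant:
    """Hypothetical helper: apply a throughput floor, then rank on one metric."""
    candidates = [v for v in VARIANTS if v.tok_s >= min_tok_s]
    if not candidates:
        raise ValueError(f"no variant sustains {min_tok_s} tok/s")
    if prefer == "perplexity":
        return min(candidates, key=lambda v: v.perplexity)
    if prefer == "tok_s":
        return max(candidates, key=lambda v: v.tok_s)
    if prefer == "legalbench":
        # Break LegalBench ties in favor of the faster variant.
        return max(candidates, key=lambda v: (v.legalbench, v.tok_s))
    raise ValueError(f"unknown preference: {prefer!r}")
```

With these figures, `pick_variant(min_tok_s=20.0)` selects Q5_K_M, while `pick_variant(min_tok_s=20.0, prefer="perplexity")` selects Q6_K, because the 20 tok/s floor excludes Q8_0 and F16.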
- LegalBench scored with a lenient "contains" matcher: The LegalBench mini-eval (n=50) scores by substring "contains" match, which is more forgiving than strict exact-match. Read the 62–72% range as an upper bound on what exact-match scoring would give, not as a strict-accuracy figure. Q5_K_M scores highest at 36/50.
- Q8_0 sustained-throughput anomaly: Q8_0 generates at 7.3 tok/s, about 33% below F16's 10.9 and slower than every K-quant. This is the same continued-pretrain-shape Q8_0 slowdown seen on the finance card. Perplexity favors Q8_0 (5.914, the lowest of the five), but Q6_K (22.4 tok/s) is the safer throughput pick.
- Not legal advice: This is a 7B model derived from the upstream Mistral base, intended for drafting, triage, and classification support, not for legal advice or filing decisions. No jurisdiction-specific validation is claimed.
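To make the "contains" matcher caveat concrete, the sketch below compares a substring matcher with a strict exact-match scorer on a few made-up outputs. Everything here is an assumption for illustration: the `normalize` step, the function names, and the sample pairs are hypothetical, and the release's actual scoring code may normalize differently.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (an assumed normalization)."""
    return " ".join(text.lower().split())


def contains_match(output: str, label: str) -> bool:
    """Lenient scoring: the gold label appears anywhere in the output."""
    return normalize(label) in normalize(output)


def exact_match(output: str, label: str) -> bool:
    """Strict scoring: the output is the gold label and nothing else."""
    return normalize(output) == normalize(label)


def accuracy(pairs, matcher) -> float:
    """Fraction of (output, label) pairs the matcher accepts."""
    return sum(matcher(out, label) for out, label in pairs) / len(pairs)


# Made-up (model output, gold label) pairs, not taken from LegalBench.
SAMPLES = [
    ("Yes", "Yes"),                                 # both matchers accept
    ("Yes, this clause is a non-compete.", "Yes"),  # only "contains" accepts
    ("No", "Yes"),                                  # both reject
    ("Not enough information.", "No"),              # spurious "contains" hit
]

print(accuracy(SAMPLES, contains_match))  # 0.75
print(accuracy(SAMPLES, exact_match))     # 0.25
```

The last sample shows why the substring score is best read as a ceiling: "no" matches inside "Not", so the lenient matcher credits an answer that a strict scorer would reject.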