fieldkit arena mirror
Every quant variant the Spark has measured, plotted as quality × throughput. The gold line is the Pareto frontier: the builds that no other build beats on both axes at once. Public cloud arenas can't draw this frontier, because they don't know what hardware their votes ran on. We do: the operator is the hardware.
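The frontier rule fits in a few lines. What follows is a minimal sketch. It assumes each variant reduces to a (quality, throughput) pair where higher is better on both axes. The `Variant` type, the function names, and the sample numbers are hypothetical and are not fieldkit's code. A variant stays on the frontier if no other variant is at least as good on both axes and strictly better on one.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    # Hypothetical record: one quantized build of one base model.
    name: str
    quality: float     # normalized quality index, higher is better
    throughput: float  # tokens per second, higher is better


def dominates(a: Variant, b: Variant) -> bool:
    """True if `a` is at least as good as `b` on both axes and strictly better on one."""
    at_least_as_good = a.quality >= b.quality and a.throughput >= b.throughput
    strictly_better = a.quality > b.quality or a.throughput > b.throughput
    return at_least_as_good and strictly_better


def pareto_frontier(variants: List[Variant]) -> List[Variant]:
    """Return the non-dominated variants, ordered by increasing throughput."""
    frontier = [v for v in variants if not any(dominates(o, v) for o in variants)]
    return sorted(frontier, key=lambda v: v.throughput)


# Hypothetical numbers for one base model, for illustration only.
variants = [
    Variant("fp16", 1.00, 20.0),
    Variant("q8", 0.97, 50.0),
    Variant("q4", 0.90, 80.0),
    Variant("q5-bad", 0.88, 40.0),  # beaten by q4 on both axes
]
frontier_names = [v.name for v in pareto_frontier(variants)]
```

The quadratic scan is fine at arena scale, with a handful of variants per model. A sort by throughput followed by a running-maximum sweep over quality would bring it to O(n log n) if the point count grew.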
The quality index is normalized per model, because perplexity is corpus-dependent and only comparable within one base model. Each model's variants form their own curve. Per-model detail lives under Models.
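The page doesn't say how the per-model normalization is computed. Here is a sketch of one plausible scheme, assuming the index is the best perplexity within a base model divided by each variant's perplexity. Under that scheme the best build of every model scores 1.0 and raw perplexities are never compared across base models. The function name and the sample rows are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def quality_index(rows: List[Tuple[str, str, float]]) -> Dict[Tuple[str, str], float]:
    """Map (base_model, variant) to a 0-1 quality index.

    rows: (base_model, variant, perplexity); lower perplexity is better.
    Assumed scheme: index = best perplexity of the same base model / this perplexity.
    """
    best: Dict[str, float] = defaultdict(lambda: float("inf"))
    for model, _variant, ppl in rows:
        best[model] = min(best[model], ppl)
    # Each variant is scored only against siblings sharing its base model.
    return {(model, variant): best[model] / ppl for model, variant, ppl in rows}


# Hypothetical perplexities: model-b's raw values are higher, but they are
# never compared with model-a's, so both q4 builds land on the same index.
rows = [
    ("model-a", "f16", 6.0),
    ("model-a", "q4", 6.6),
    ("model-b", "f16", 9.0),
    ("model-b", "q4", 9.9),
]
index = quality_index(rows)
```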
⬢ Bench-anchored cached tier · 2026-06-11 10:56:17 UTC
| Rank | Lane | Quality | Throughput | Runs |
|---|---|---|---|---|
| 1 | Advisor 4B — trained (SFT v0.2) ◆ flagship · promoted lane · frozen OOD · curveball v0.1 · `4b-sft-v0.2::curveball-v0.1::the-refusal-floor-is-trainable` |  | 42.0 tok/s | 1 |
| 2 | Advisor 4B — trained (SFT v0.2) ◆ flagship · promoted lane · frozen OOD · curveball v0.2 · `4b-sft-v0.2::curveball-v0.2::the-refusal-floor-is-trainable` |  | 42.0 tok/s | 1 |
| 3 | Advisor 4B — trained (SFT v0.1) · superseded · frozen OOD · curveball v0.1 · `4b-sft-v0.1::curveball-v0.1::the-refusal-floor-is-trainable` |  | — | 1 |
| 4 | Nemotron 30B — teacher · prompt-only · teacher · frozen OOD · curveball v0.1 · `30b-prompted::curveball-v0.1::the-refusal-floor-is-trainable` |  | — | 1 |
| 5 | Nemotron 4B — untrained base · baseline · frozen OOD · curveball v0.1 · `4b-init::curveball-v0.1::the-refusal-floor-is-trainable` |  | — | 1 |
| 6 | Nemotron 30B — teacher · prompt-only · teacher · frozen OOD · curveball v0.2 · `30b-prompted::curveball-v0.2::the-refusal-floor-is-trainable` |  | — | 1 |
| Rank | Lane | Quality | Throughput | Runs |
|---|---|---|---|---|
| 1 | frontier-only |  | — | 1 |
| 2 | cost-routed |  | — | 1 |
| 3 | local-only |  | — | 1 |
| Rank | Lane | Quality | Throughput | Runs |
|---|---|---|---|---|
| 1 | cyber |  | — | 1 |
| 2 | finance |  | — | 1 |
| 3 | medical |  | — | 1 |
| 4 | brain |  | — | 1 |
| 5 | legal |  | — | 1 |
| 6 | patent |  | — | 1 |
| Rank | Lane | Quality | Throughput | Runs |
|---|---|---|---|---|
| 1 | qwen3-30b-moe-llamacpp-q4km |  | 83.5 tok/s | 1 |
| 2 | qwen3-30b-moe-vllm-fp8 |  | 55.0 tok/s | 1 |
| 3 | nim-incumbent |  | 23.9 tok/s | 1 |
◉ Live cockpit runs: operator compares & chats · static snapshot
| Rank | Model · provider · rubric | Quality | Throughput | TTFT | $/task | $/quality | Runs | Human ↑ |
|---|---|---|---|---|---|---|---|---|
| 1 | anthropic/claude-opus-4.8-fast · OpenRouter · patent_claim_validity |  | 158.0 tok/s | 1262 ms | — | — | 1 | — |
| 2 | anthropic/claude-haiku-4.5 · OpenRouter · generic-correctness |  | 111.4 tok/s | 1112 ms | $0.0013 | $0.0013/pt | 2 | — |
| 3 | nvidia/nemotron-nano-9b-v2 · OpenRouter · patent_claim_validity |  | 99.0 tok/s | 3588 ms | — | — | 3 | — |
| 4 | securityllm-gguf (Q4_K_M) · Spark GPU · generic-correctness |  | 48.9 tok/s | 82 ms | — | — | 1 | — |
| 5 | ii-medical-8b-gguf (Q4_K_M) · Spark GPU · generic-correctness |  | 44.5 tok/s | 67 ms | — | — | 1 | — |
| 6 | patent-strategist-v3-nemo-gguf (Q4_K_M) · Spark GPU · patent_claim_validity |  | 41.1 tok/s | 152 ms | — | — | 4 | — |
| 7 | discovered:8091 · Spark GPU · generic-correctness |  | 28.8 tok/s | 450 ms | $0 | $0 (local) | 4 | — |
| 8 | frontier · OpenRouter · patent_claim_validity |  | 27.3 tok/s | 3179 ms | — | — | 2 | — |
| 9 | qwen/qwen3-8b · OpenRouter · generic-correctness |  | 24.6 tok/s | 157503 ms | $0.0001 | $0.0001/pt | 4 | — |
| 10 | finance-chat-gguf (F16) · Spark GPU · generic-correctness |  | 18.9 tok/s | 172 ms | — | — | 2 | — |
| 11 | finance-chat-gguf (Q5_K_M) · Spark GPU · generic-correctness |  | 16.1 tok/s | 1040 ms | — | — | 4 | — |
| 12 | kepler (Q8_0) · Spark GPU · generic-correctness |  | 8.6 tok/s | 241 ms | $0 | $0 (local) | 4 | — |
| 13 | openai/gpt-5.5-pro · OpenRouter · generic-correctness |  | 8.1 tok/s | 21464 ms | $0.015 | $0.0149/pt | 7 | — |
| 14 | frontier · OpenRouter · generic-correctness |  | — | 0 ms | — | — | 1 | — |
| 15 | resident-brain · Spark GPU · generic-correctness |  | 97.9 tok/s | 134 ms | — | — | 7 | — |
| 16 | openai/gpt-5.5-pro · OpenRouter · patent_claim_validity |  | 26437.0 tok/s | 84263 ms | — | — | 2 | — |
| 17 | resident-brain · Spark GPU · patent_claim_validity |  | 89.3 tok/s | 140 ms | — | — | 7 | — |
| 18 | stepfun/step-3.7-flash · OpenRouter · patent_claim_validity |  | 239.6 tok/s | 6530 ms | — | — | 1 | — |
| 19 | openai/gpt-4o-mini · OpenRouter · patent_claim_validity |  | 200.5 tok/s | 792 ms | — | — | 1 | — |
| 20 | saul-7b-instruct-v1-gguf (Q4_K_M) · Spark GPU · patent_claim_validity |  | 46.5 tok/s | 62 ms | — | — | 1 | — |
| 21 | deepseek/deepseek-r1-0528 · OpenRouter · generic-correctness |  | — | — | $0.0000 | — | 1 | — |
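The snapshot doesn't define how TTFT, throughput, or $/quality are derived. Here is a sketch under stated assumptions:

- TTFT is the time from sending the request to receiving the first output token.
- Throughput is decode speed over the streaming window after that first token.
- $/quality is cost per task divided by the rubric score, which would match the "/pt" suffix.

The `Run` record and its field names are hypothetical, not fieldkit's schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Run:
    # Hypothetical per-run record; field names are illustrative only.
    t_request: float          # seconds, when the request was sent
    t_first_token: float      # seconds, when the first output token arrived
    t_last_token: float       # seconds, when the final output token arrived
    output_tokens: int
    cost_usd: Optional[float]  # None when the provider reports no price ("—")
    quality: Optional[float]   # rubric score; None when not yet graded


def ttft_ms(run: Run) -> float:
    """Time to first token, in milliseconds."""
    return (run.t_first_token - run.t_request) * 1000.0


def decode_tok_per_s(run: Run) -> Optional[float]:
    """Assumed definition: tokens after the first, over the streaming window."""
    window = run.t_last_token - run.t_first_token
    if window <= 0 or run.output_tokens < 2:
        return None  # not enough data to measure a rate
    return (run.output_tokens - 1) / window


def usd_per_quality(run: Run) -> Optional[float]:
    """Assumed definition: dollars per quality point; None if unpriced or unscored."""
    if run.cost_usd is None or not run.quality:  # also guards against a zero score
        return None
    return run.cost_usd / run.quality


# Hypothetical run: 201 tokens, first token after 1.25 s, last at 3.25 s.
example = Run(t_request=0.0, t_first_token=1.25, t_last_token=3.25,
              output_tokens=201, cost_usd=0.002, quality=0.8)
```

A local run priced at `0.0` yields `0.0` here rather than `None`, which is how a "$0 (local)" cell could arise. The actual rendering logic is not shown on this page.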
Source: `fieldkit.arena.mirror.export_publishable_slice()`; allowlist pinned by `fieldkit/tests/arena/demo/test_mirror_does_not_leak.py`.
The `chat_*` tables, `compare_runs.prompt`, and `compare_responses.{content,reasoning}` are NEVER enumerated.
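An allowlist export can be sketched as follows. This is a hypothetical illustration of the pattern, not the body of `export_publishable_slice()`. The `ALLOWLIST` contents, the non-sensitive column names, and the in-memory `demo_db` are assumptions. The point of the design is that the exporter iterates only over allowlisted tables and columns, so anything off the list, including the `chat_*` tables and the prompt/response text columns, is never read at all. A separate deny check guards against someone widening the allowlist by mistake.

```python
import fnmatch
from typing import Dict, List, Set

# Hypothetical allowlist: table -> columns that may be published.
ALLOWLIST: Dict[str, Set[str]] = {
    "compare_runs": {"id", "model", "rubric", "tok_per_s", "ttft_ms"},
    "compare_responses": {"run_id", "score"},
}
# Belt and braces: these stay private even if the allowlist is edited.
DENY_TABLES = ["chat_*"]
DENY_COLUMNS = {
    ("compare_runs", "prompt"),
    ("compare_responses", "content"),
    ("compare_responses", "reasoning"),
}


def publishable_slice(db: Dict[str, List[dict]]) -> Dict[str, List[dict]]:
    """Copy only allowlisted, non-denied columns; unlisted tables are never touched."""
    out: Dict[str, List[dict]] = {}
    for table, allowed in ALLOWLIST.items():
        if any(fnmatch.fnmatchcase(table, pat) for pat in DENY_TABLES):
            continue
        cols = sorted(c for c in allowed if (table, c) not in DENY_COLUMNS)
        # Pick named keys instead of iterating each row, so extra columns never leak.
        out[table] = [{c: row[c] for c in cols if c in row} for row in db.get(table, [])]
    return out


# Hypothetical in-memory data standing in for the arena database.
demo_db = {
    "chat_messages": [{"text": "secret"}],
    "compare_runs": [{"id": 1, "model": "m", "rubric": "r", "prompt": "secret",
                      "tok_per_s": 42.0, "ttft_ms": 80}],
    "compare_responses": [{"run_id": 1, "score": 0.9,
                           "content": "secret", "reasoning": "secret"}],
}
published = publishable_slice(demo_db)
```

A leak test in the spirit of the pinned test file would seed sentinel strings into every private field and then assert that none appear anywhere in the exported slice.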