GPU Util: % utilisation
GPU Temp: °C (die)
Unified memory: GB of 128 · 8 GB guard
Throughput: tok/s
TTFT: ms to first token
(Throughput and TTFT are read from the active lane.)
Active Lane: idle (no warm brain)
Benches: 4 cached evidence sources
Lanes ranked: 36 unique (lane, bench) pairs
Runs: 78 (bench + live)
Schema: v2 · leak-proof ✓
Generated: 2026-06-11 10:56:17 UTC (last fieldkit arena mirror)
Cost / quality efficiency frontier · 7 models · 30 builds

Every quant variant the Spark has measured, plotted as quality × throughput. The gold line is the Pareto frontier: the builds that no other build beats on both axes at once. It is a frontier that public cloud arenas can't draw, because they don't know what hardware their votes ran on. We do: the operator is the hardware.
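A minimal sketch of how such a frontier can be computed, assuming higher is better on both axes. The `Build` tuple and `pareto_frontier` function are illustrative names, not part of fieldkit, and the sample points are the three hermes_brain lanes listed on this page rather than the quant variants in the chart.

```python
from typing import List, NamedTuple


class Build(NamedTuple):
    name: str
    quality: float      # higher is better
    throughput: float   # tok/s, higher is better


def pareto_frontier(builds: List[Build]) -> List[Build]:
    """Return the builds that no other build dominates.

    A build is dominated when another build is at least as good on both
    axes and strictly better on at least one of them.
    """
    frontier = []
    for b in builds:
        dominated = any(
            o.quality >= b.quality
            and o.throughput >= b.throughput
            and (o.quality > b.quality or o.throughput > b.throughput)
            for o in builds
        )
        if not dominated:
            frontier.append(b)
    # Sort by throughput so the points can be joined into a line.
    return sorted(frontier, key=lambda b: b.throughput)


# Sample points: the hermes_brain lanes from this page (illustration only).
builds = [
    Build("qwen3-30b-moe-llamacpp-q4km", 90.0, 83.5),
    Build("qwen3-30b-moe-vllm-fp8", 87.5, 55.0),
    Build("nim-incumbent", 77.5, 23.9),
]
frontier = pareto_frontier(builds)
```

With these three points the first lane leads on both axes, so it is the only one on the frontier. A frontier with several points appears only when builds trade quality against throughput.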

Quality index is normalized per model, because perplexity is corpus-dependent and only comparable within one base model. Each model's variants form their own curve; hover over any point for the raw numbers. Per-model detail lives under Models.
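The page does not state the normalization formula. One plausible sketch, purely as an assumption, divides each model's best (lowest) perplexity by the variant's perplexity, so the best variant of every model scores 1.0 and scores are only ever compared within a model. The function name and the input shape are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple

Key = Tuple[str, str]  # (base model, quant variant)


def quality_index(perplexities: Dict[Key, float]) -> Dict[Key, float]:
    """Map each (model, variant) perplexity to an index in (0, 1].

    ASSUMPTION: index = best perplexity of the same base model divided by
    this variant's perplexity. The real dashboard may use another formula.
    """
    best = defaultdict(lambda: float("inf"))
    for (model, _variant), ppl in perplexities.items():
        best[model] = min(best[model], ppl)
    return {key: best[key[0]] / ppl for key, ppl in perplexities.items()}
```

Under this assumption two models with very different raw perplexities can both place a variant at 0.8, which is why each model's variants form a separate curve rather than one shared ranking.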

⬢ Bench-anchored cached tier · 2026-06-11 10:56:17 UTC

Orionfold Advisor: refusal-floor contract · the-refusal-floor-is-trainable:advisor_contract · 6 lanes · 6 runs · metric: frozen OOD curveballs
Rank | Lane | Tags | Lane ID | Quality | Throughput | Runs
1 | Advisor 4B - trained (SFT v0.2) | ◆ flagship, promoted lane, frozen OOD · curveball v0.1 | 4b-sft-v0.2::curveball-v0.1::the-refusal-floor-is-trainable | 90.0% | 42.0 tok/s | 1
2 | Advisor 4B - trained (SFT v0.2) | ◆ flagship, promoted lane, frozen OOD · curveball v0.2 | 4b-sft-v0.2::curveball-v0.2::the-refusal-floor-is-trainable | 85.7% | 42.0 tok/s | 1
3 | Advisor 4B - trained (SFT v0.1) | superseded, frozen OOD · curveball v0.1 | 4b-sft-v0.1::curveball-v0.1::the-refusal-floor-is-trainable | 70.0% | - | 1
4 | Nemotron 30B - teacher · prompt-only | teacher, frozen OOD · curveball v0.1 | 30b-prompted::curveball-v0.1::the-refusal-floor-is-trainable | 57.5% | - | 1
5 | Nemotron 4B - untrained base | baseline, frozen OOD · curveball v0.1 | 4b-init::curveball-v0.1::the-refusal-floor-is-trainable | 55.0% | - | 1
6 | Nemotron 30B - teacher · prompt-only | teacher, frozen OOD · curveball v0.2 | 30b-prompted::curveball-v0.2::the-refusal-floor-is-trainable | 38.1% | - | 1
hermes-cost-routing-local-and-openrouter:cost_router · 3 lanes · 3 runs · metric: cost_router
Rank | Lane | Quality | Throughput | Runs
1 | frontier-only | 100.0% | - | 1
2 | cost-routed | 91.7% | - | 1
3 | local-only | 66.7% | - | 1
hermes-vertical-router-on-spark:vertical_router · 6 lanes · 6 runs · metric: vertical_router
Rank | Lane | Quality | Throughput | Runs
1 | cyber | 100.0% | - | 1
2 | finance | 100.0% | - | 1
3 | medical | 100.0% | - | 1
4 | brain | 80.0% | - | 1
5 | legal | 80.0% | - | 1
6 | patent | 80.0% | - | 1
picking-the-hermes-brain-on-spark:hermes_brain · 3 lanes · 3 runs · metric: hermes_brain
Rank | Lane | Quality | Throughput | Runs
1 | qwen3-30b-moe-llamacpp-q4km | 90.0% | 83.5 tok/s | 1
2 | qwen3-30b-moe-vllm-fp8 | 87.5% | 55.0 tok/s | 1
3 | nim-incumbent | 77.5% | 23.9 tok/s | 1

◉ Live cockpit runs: operator compares & chats · static snapshot

cockpit · all rubrics · 21 rows · 60 runs · metric: rubric mean
Rank | Model | Source | Rubric | Quality | Throughput | TTFT | $/task | $/quality | Runs | Human ↑
1 | anthropic/claude-opus-4.8-fast | OpenRouter | patent_claim_validity | 100.0% ·fmt | 158.0 tok/s | 1262 ms | - | - | 1 | -
2 | anthropic/claude-haiku-4.5 | OpenRouter | generic-correctness | 100.0% ·fmt | 111.4 tok/s | 1112 ms | $0.0013 | $0.0013/pt | 2 | -
3 | nvidia/nemotron-nano-9b-v2 | OpenRouter | patent_claim_validity | 100.0% ·fmt | 99.0 tok/s | 3588 ms | - | - | 3 | -
4 | securityllm-gguf (Q4_K_M) | Spark GPU | generic-correctness | 100.0% ·fmt | 48.9 tok/s | 82 ms | - | - | 1 | -
5 | ii-medical-8b-gguf (Q4_K_M) | Spark GPU | generic-correctness | 100.0% ·fmt | 44.5 tok/s | 67 ms | - | - | 1 | -
6 | patent-strategist-v3-nemo-gguf (Q4_K_M) | Spark GPU | patent_claim_validity | 100.0% ·fmt | 41.1 tok/s | 152 ms | - | - | 4 | -
7 | discovered:8091 | Spark GPU | generic-correctness | 100.0% ·fmt | 28.8 tok/s | 450 ms | $0 | $0 (local) | 4 | -
8 | frontier | OpenRouter | patent_claim_validity | 100.0% ·fmt | 27.3 tok/s | 3179 ms | - | - | 2 | -
9 | qwen/qwen3-8b | OpenRouter | generic-correctness | 100.0% ·fmt | 24.6 tok/s | 157503 ms | $0.0001 | $0.0001/pt | 4 | -
10 | finance-chat-gguf (F16) | Spark GPU | generic-correctness | 100.0% ·fmt | 18.9 tok/s | 172 ms | - | - | 2 | -
11 | finance-chat-gguf (Q5_K_M) | Spark GPU | generic-correctness | 100.0% ·fmt | 16.1 tok/s | 1040 ms | - | - | 4 | -
12 | kepler (Q8_0) | Spark GPU | generic-correctness | 100.0% ·fmt | 8.6 tok/s | 241 ms | $0 | $0 (local) | 4 | -
13 | openai/gpt-5.5-pro | OpenRouter | generic-correctness | 100.0% ·fmt | 8.1 tok/s | 21464 ms | $0.015 | $0.0149/pt | 7 | -
14 | frontier | OpenRouter | generic-correctness | 100.0% ·fmt | - | 0 ms | - | - | 1 | -
15 | resident-brain | Spark GPU | generic-correctness | 85.7% ·fmt | 97.9 tok/s | 134 ms | - | - | 7 | -
16 | openai/gpt-5.5-pro | OpenRouter | patent_claim_validity | 50.0% ·fmt | 26437.0 tok/s | 84263 ms | - | - | 2 | -
17 | resident-brain | Spark GPU | patent_claim_validity | 42.9% ·fmt | 89.3 tok/s | 140 ms | - | - | 7 | -
18 | stepfun/step-3.7-flash | OpenRouter | patent_claim_validity | 0.0% ·fmt | 239.6 tok/s | 6530 ms | - | - | 1 | -
19 | openai/gpt-4o-mini | OpenRouter | patent_claim_validity | 0.0% ·fmt | 200.5 tok/s | 792 ms | - | - | 1 | -
20 | saul-7b-instruct-v1-gguf (Q4_K_M) | Spark GPU | patent_claim_validity | 0.0% ·fmt | 46.5 tok/s | 62 ms | - | - | 1 | -
21 | deepseek/deepseek-r1-0528 | OpenRouter | generic-correctness | 0.0% ·fmt | - | - | $0.0000 | - | 1 | -
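The $/quality values shown above look consistent with $/task divided by quality expressed as a fraction: $0.0013 at 100.0% gives $0.0013/pt. That reading is inferred from the displayed rows, not from a documented formula, and the function name below is hypothetical. A minimal sketch under that assumption:

```python
from typing import Optional


def cost_per_quality_point(cost_per_task: Optional[float],
                           quality_pct: float) -> Optional[float]:
    """Return dollars per quality point, or None when it is undefined.

    ASSUMPTION: $/quality = $/task / (quality % / 100), inferred from the
    rows on this page. Rows with no recorded cost, or with 0% quality,
    have no meaningful value, so None is returned for them.
    """
    if cost_per_task is None or quality_pct <= 0:
        return None
    return cost_per_task / (quality_pct / 100.0)
```

Returning None for a 0% quality row avoids a division by zero and matches rows that show a cost but no $/quality figure. The page shows $0.015 per task next to $0.0149/pt for one row, which suggests both cells are rounded from the same unrounded cost.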

Source: fieldkit.arena.mirror.export_publishable_slice(); the allowlist is pinned by fieldkit/tests/arena/demo/test_mirror_does_not_leak.py. The chat_* tables, compare_runs.prompt, and compare_responses.{content,reasoning} are NEVER enumerated.
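The allowlist idea can be sketched as follows. This is not the fieldkit implementation: `ALLOWLIST`, `export_slice`, and every column name other than prompt, content, and reasoning are hypothetical, and the store is assumed to be SQLite. The point it illustrates is that the export names each publishable column explicitly, so a table or column that is not listed is never read.

```python
import sqlite3
from typing import Dict, List

# Hypothetical allowlist: table -> columns that may be published.
# chat_* tables, compare_runs.prompt and compare_responses.content /
# .reasoning are deliberately absent, so no query ever touches them.
ALLOWLIST: Dict[str, List[str]] = {
    "compare_runs": ["id", "model", "rubric"],
    "compare_responses": ["run_id", "quality", "tok_per_s", "ttft_ms"],
}


def export_slice(conn: sqlite3.Connection,
                 allowlist: Dict[str, List[str]] = ALLOWLIST) -> Dict[str, List[dict]]:
    """Export only allowlisted columns from allowlisted tables."""
    out: Dict[str, List[dict]] = {}
    for table, columns in allowlist.items():
        # Identifiers come from the fixed allowlist above, never from user
        # input, so building the SELECT with an f-string is safe here.
        cols = ", ".join(columns)
        rows = conn.execute(f"SELECT {cols} FROM {table}").fetchall()
        out[table] = [dict(zip(columns, row)) for row in rows]
    return out
```

Selecting named columns instead of `SELECT *` means a column added to the database later stays private until someone adds it to the allowlist, which is the property a pinned test can then check.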