Orionfold Arena — Models

GPU Util — % utilisation

GPU Temp — °C die

Unified — GB of 128 · 8 GB guard

Throughput — tok / second

TTFT — ms · first token

throughput & first-token from the active lane

Active Lane: idle · no warm brain

Sidecar offline — start with fieldkit arena serve on the Spark to feed this rail.

Models: 8 (quant · lora · adapter)

Benches: 3 (eval datasets)

Notebooks: 5 (runnable on-ramps)

Tooling: 6 (harnesses · skills)

Free tier: 22 (run offline · yours)

Kind License

quant advisor free

A governed 4B advisor over your corpus — exact source-id citations, trusted refusals, local on a DGX Spark

base nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16

recommended Q4_K_M · 70 tok/s

quant astro free

A numeric astrodynamics reasoner — one verifiable boxed number out, served local on a DGX Spark for $0 a query

base Qwen/Qwen3-8B

recommended Q8_0 · 21 tok/s

lora patent free

Offline patent-prosecution reasoning on Spark-class hardware

base deepseek-ai/DeepSeek-R1-0528-Qwen3-8B

recommended BF16

quant patent free

Offline patent-prosecution reasoning on Spark-class hardware

base deepseek-ai/DeepSeek-R1-0528-Qwen3-8B

recommended Q5_K_M · 35 tok/s

quant medical free

An 8B medical-reasoning model with a visible think-chain, quantized for offline clinical Q&A

base Intelligent-Internet/II-Medical-8B

recommended Q5_K_M · 36 tok/s

quant cyber free

A 7B cybersecurity chat model, quantized to run offline on a consumer GPU

base ZySec-AI/SecurityLLM

recommended Q4_K_M · 48 tok/s

quant legal free

A 7B legal-domain chat model, quantized to run offline on a consumer GPU

base Equall/Saul-7B-Instruct-v1

recommended Q5_K_M · 20 tok/s

quant finance free

A finance-specialized 7B chat model, quantized to run offline on a 4 GB consumer GPU

base AdaptLLM/finance-chat

recommended F16 · 12 tok/s

bench advisor free

The eval set that caught what prompting couldn't hold — frozen OOD curveballs for grounded citation, refusal, and routing

base n/a

0 variants

hermes-brain-bench-v0.1

base n/a

0 variants

bench patent free

patent-strategist-bench-v0.1

base n/a

0 variants

notebook finance free

Build the finance-chat quant — and call the model — on a Spark or a free cloud GPU

base AdaptLLM/finance-chat

recommended builder

notebook legal free

Build the Saul-7B quant — and call the legal model — on a Spark or a free cloud GPU

base Equall/Saul-7B-Instruct-v1

recommended builder

notebook cyber free

Build the SecurityLLM quant — and call the model — on a Spark or a free cloud GPU

base ZySec-AI/SecurityLLM

recommended builder

notebook medical free

Build the II-Medical-8B quant — and call the reasoner — on a Spark or a free cloud GPU

base Intelligent-Internet/II-Medical-8B

recommended builder

notebook patent free

Run the patent-strategist build — and use the model — on a Spark or a free cloud GPU

base deepseek-ai/DeepSeek-R1-0528-Qwen3-8B

recommended builder

harness advisor free

A local memory layer that gates its own recall

base fieldkit.memory · pgvector(vectors/blog_chunks) · NIM llama-nemotron-embed-1b-v2

recommended cosine-only · top_k=5 · GB10 measured baseline

arena run astro free

An operator cockpit you run on your own DGX Spark

base fieldkit[arena] · Astro + FastAPI sidecar

0 variants

When does local stop being enough? Measure first, then route.

base Hermes Agent v0.14.0

recommended Local Spark · Qwen3-30B-A3B MoE Q4_K_M

Which local lane should drive your always-on Spark agent?

base Hermes Agent v0.14.0

recommended llama.cpp · Qwen3-30B-A3B (MoE, Q4_K_M) · 88 tok/s

The skills you write for Claude Code load into Hermes unchanged.

base agentskills.io SKILL.md (Hermes / Claude Code compatible)

recommended spark-serve

One always-on brain, five specialists, zero LLM-classifier overhead.

base Hermes Agent v0.14.0

recommended Default brain (MoE)