

What it's for
  • Re-index a multi-source corpus (article · lineage · eval · scout · deep_research) with provenance stamped per chunk, dispatched from the cockpit
  • Score chunk-recall@k + slug-recall@k against an in-repo gold set, gated like-for-like against the prior index so a rebuild can't silently regress recall
  • Query the Second Brain with a provenance/trust-tier filter — cited hits a hosted RAG can't honestly attribute
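
The recall gate described above can be sketched roughly as follows. This is a minimal illustration, not the repo's actual harness: the GoldRow and Hit schemas, the recall_at_k and gate names, and the tolerance parameter are all hypothetical, and it assumes each metric is a per-question hit rate (a question counts if its gold chunk id, or gold article slug, appears anywhere in the top-k hits).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoldRow:
    """One gold-set question (hypothetical schema, not the repo's own)."""
    question: str
    gold_chunk_id: str
    gold_slug: str


@dataclass(frozen=True)
class Hit:
    """One retrieved chunk with the slug of the article it came from."""
    chunk_id: str
    slug: str


def recall_at_k(rows, retrieved, k=5):
    """Return (chunk_recall, slug_recall) over the gold rows.

    Assumed definition: the fraction of questions whose gold chunk id
    (or gold slug) appears among the first k retrieved hits.
    """
    if not rows:
        raise ValueError("gold set is empty")
    chunk_hits = 0
    slug_hits = 0
    for row in rows:
        top = retrieved.get(row.question, [])[:k]
        chunk_hits += any(h.chunk_id == row.gold_chunk_id for h in top)
        slug_hits += any(h.slug == row.gold_slug for h in top)
    n = len(rows)
    return chunk_hits / n, slug_hits / n


def gate(new, prior, tolerance=0.0):
    """Pass only if no metric dropped by more than `tolerance` vs the prior index."""
    return all(n >= p - tolerance for n, p in zip(new, prior))
```

Both indexes should be scored with the same gold set, the same k, and the same retrieval mode before calling gate, so the comparison stays like-for-like.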

Audience — DGX Spark operators running a private, local-first RAG recall layer they drive, not a SaaS.

Quant economics: quality × speed per build
Variant: cosine-only · top_k=5 · GB10 (measured baseline, sweet spot)
Eval set: qa-eval.jsonl · 44 held-out questions
  chunk-recall@5  0.41
  slug-recall@5   0.73
Known drift: bounded · honest
  • Reranker absent on GB10: the cosine-only score over top-5 retrieval is the floor, not the reranked ceiling. The one reranker lane is unsupported on GB10 (NGC 410-gone, no -dgx-spark profile), so rerank=True hard-raises rather than mislabel a score (R22).
  • Generator-side metrics not in this lane: 3 of 3 generator-side scores (faithfulness / correctness / refusal-rate) are left null because they need the generator NIM; this is the retrieval-only recall measurement.
  • Source-class population: the multi-source provenance schema is live across 5 classes (article · lineage · eval · scout · deep_research), but only the article class is populated today, with 313/313 chunks across 49 published articles; the other 4 ingest paths are wired but unpopulated.
  • Gold-set size: recall is measured over 44 qa-eval rows, not a large-N guarantee.
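
The first two drift items (hard-raising on rerank=True, and leaving generator-side scores null) might look something like the sketch below. Everything named here is hypothetical: EvalReport, RerankerUnavailable, run_retrieval_eval, and the reranker_available flag are illustrative stand-ins, not the project's actual API.

```python
from dataclasses import dataclass
from typing import Optional


class RerankerUnavailable(RuntimeError):
    """Raised when a reranked score is requested but no reranker lane exists."""


@dataclass
class EvalReport:
    """Retrieval-only report; generator-side fields stay None (null)."""
    chunk_recall_at_5: float
    slug_recall_at_5: float
    faithfulness: Optional[float] = None
    correctness: Optional[float] = None
    refusal_rate: Optional[float] = None


def run_retrieval_eval(chunk_recall, slug_recall, rerank=False,
                       reranker_available=False):
    """Build a report, refusing to label a cosine-only score as reranked."""
    if rerank and not reranker_available:
        # Fail loudly instead of silently falling back to cosine-only,
        # which would publish a floor score under a "reranked" label.
        raise RerankerUnavailable("rerank=True requested, but no reranker lane is available")
    return EvalReport(chunk_recall, slug_recall)
```

Raising instead of falling back keeps every published number tied to the retrieval mode that actually produced it, and null generator-side fields make it visible that those scores were never measured.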