harness · advisor

A local memory layer that gates its own recall

A raw pgvector table drifts silently and carries no trust card on its chunks — you cannot tell a Spark-measured fact from an external claim, and a re-index can quietly drop recall with no alarm. Orionfold Cortex wraps the index with stamped provenance, a coverage report, and a recall@k promotion gate, all dispatched and watched through the Arena control plane. The machine manages its own memory.

base fieldkit.memory · pgvector(vectors/blog_chunks) · NIM llama-nemotron-embed-1b-v2 · license apache-2.0 · recommended cosine-only · top_k=5 · GB10 measured baseline


What it's for

Re-index a multi-source corpus (article · lineage · eval · scout · deep_research) with provenance stamped per chunk, dispatched from the cockpit
Score chunk-recall@k + slug-recall@k against an in-repo gold set, gated like-for-like against the prior index so a rebuild can't silently regress recall
Query the Second Brain with a provenance/trust-tier filter — cited hits a hosted RAG can't honestly attribute
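
A provenance/trust-tier filter of the kind described in the last item could be sketched as follows. This is a minimal illustration, not the fieldkit.memory API: the `Hit` record, the `filter_hits` function, and the two trust-tier names (`external`, `measured`) are assumptions made for the example. Only the five source-class names come from this card.

```python
from dataclasses import dataclass

# Source classes named in this card. The trust-tier names and their ordering
# are hypothetical and exist only for this illustration.
SOURCE_CLASSES = {"article", "lineage", "eval", "scout", "deep_research"}
TRUST_RANK = {"external": 0, "measured": 1}


@dataclass(frozen=True)
class Hit:
    """One retrieved chunk with its stamped provenance (hypothetical schema)."""
    chunk_id: str
    slug: str
    source_class: str
    trust_tier: str
    score: float  # cosine similarity, higher is closer


def filter_hits(hits, source_classes=None, min_tier="external", top_k=5):
    """Keep hits whose provenance passes the filter, best score first."""
    allowed = SOURCE_CLASSES if source_classes is None else set(source_classes)
    unknown = allowed - SOURCE_CLASSES
    if unknown:
        raise ValueError(f"unknown source classes: {sorted(unknown)}")
    floor = TRUST_RANK[min_tier]
    kept = [
        h for h in hits
        if h.source_class in allowed and TRUST_RANK[h.trust_tier] >= floor
    ]
    return sorted(kept, key=lambda h: h.score, reverse=True)[:top_k]


# Toy hits, not real index contents.
hits = [
    Hit("c1", "post-a", "article", "measured", 0.82),
    Hit("c2", "post-b", "scout", "external", 0.91),
    Hit("c3", "post-a", "article", "external", 0.77),
]
measured_only = filter_hits(hits, min_tier="measured")
print([h.chunk_id for h in measured_only])
```

Filtering after retrieval keeps the sketch simple; with pgvector the same predicate could instead live in the SQL `WHERE` clause so the top-k is taken over eligible chunks only.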

Audience — DGX Spark operators running a private, local-first RAG recall layer they drive, not a SaaS.

Quant economics: quality × speed per build

Benchmark: qa-eval.jsonl · 44 held-out Q · cosine-only, GB10

Variant	chunk-recall@5	slug-recall@5
cosine-only · top_k=5 · GB10 measured baseline (sweet spot)	0.41	0.73
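
The two scores above, and the like-for-like gate against the prior index, can be sketched roughly as below. The assumptions are loud ones: recall@k is taken here as the share of questions whose top-k contains at least one gold item, chunk ids are assumed to look like `slug#n`, and `recall_at_k`, `slug_of`, and `passes_gate` are hypothetical names, not fieldkit functions. The project's exact definitions may differ.

```python
def recall_at_k(retrieved, gold, k=5):
    """Share of questions with at least one gold id among the top-k retrieved ids."""
    if not gold:
        raise ValueError("gold set is empty")
    hit_count = sum(
        1
        for qid, wanted in gold.items()
        if set(retrieved.get(qid, [])[:k]) & set(wanted)
    )
    return hit_count / len(gold)


def slug_of(chunk_id):
    """Map a chunk id to its article slug (assumed 'slug#n' id scheme)."""
    return chunk_id.split("#", 1)[0]


def passes_gate(candidate, baseline, tolerance=0.0):
    """Promote only if no metric falls below the prior index by more than tolerance."""
    return all(candidate[m] >= baseline[m] - tolerance for m in baseline)


# Toy gold set and retrieval results, not the real qa-eval.jsonl.
gold_chunks = {"q1": ["a#0"], "q2": ["b#3"], "q3": ["c#1"]}
retrieved = {"q1": ["a#0", "b#1"], "q2": ["b#0", "b#1"], "q3": ["d#2"]}

gold_slugs = {q: [slug_of(c) for c in ids] for q, ids in gold_chunks.items()}
retrieved_slugs = {q: [slug_of(c) for c in ids] for q, ids in retrieved.items()}

candidate = {
    "chunk_recall@5": recall_at_k(retrieved, gold_chunks),
    "slug_recall@5": recall_at_k(retrieved_slugs, gold_slugs),
}
# Baseline values are the measured scores reported on this card.
baseline = {"chunk_recall@5": 0.41, "slug_recall@5": 0.73}

print(candidate, "promote:", passes_gate(candidate, baseline))
```

Slug recall is the looser of the two because retrieving any chunk of the right article counts as a hit, which is why it sits above chunk recall in the table.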

Known drift: bounded · honest

Reranker absent on GB10: the cosine-only score over top-5 retrieval is the floor, not the reranked ceiling. 1 reranker lane is unsupported on GB10 (NGC 410-gone, no -dgx-spark profile), so rerank=True hard-raises rather than mislabel a score (R22).
Generator-side metrics not in this lane: 3 of 3 generator-side scores (faithfulness / correctness / refusal-rate) are left null because they need the generator NIM; this lane is the retrieval-only recall measurement.
Source-class population: the multi-source provenance schema is live across 5 classes (article · lineage · eval · scout · deep_research), but only the article class is populated today (313/313 chunks across 49 published articles); the other 4 ingest paths are wired but unpopulated.
Gold-set size: recall is measured over 44 qa-eval rows, not a large-N guarantee.
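
To put a rough number on the gold-set caveat, the sketch below computes a Wilson score interval for each reported recall at n = 44. It assumes each score is a per-question hit rate, so a binomial model applies; under that reading 0.41 and 0.73 are consistent with 18 and 32 hits out of 44. If the project averages partial credit per question instead, treat the interval as indicative only. `wilson_interval` is a helper written for this example.

```python
import math


def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    z2 = z * z
    denom = 1 + z2 / n
    centre = (p_hat + z2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z2 / (4 * n * n)) / denom
    return centre - half, centre + half


N = 44  # qa-eval rows behind the reported scores
for name, score in [("chunk-recall@5", 0.41), ("slug-recall@5", 0.73)]:
    low, high = wilson_interval(score, N)
    print(f"{name}: {score:.2f}, approx 95% interval [{low:.2f}, {high:.2f}]")

# One question flipping from miss to hit moves either score by 1/N.
print(f"step per question: {1 / N:.3f}")
```

With 44 rows a single question is worth about 0.023 of recall, so a gate comparing two indexes on this set is best read as a regression alarm rather than a fine-grained ranking.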

Get it

Open on HuggingFace ↗ Read the build article

Run it local

Yours, offline, on the Spark:

pip install "fieldkit[arena]"
fieldkit arena up

then drive this model from the cockpit — prompts and telemetry never leave the box.