AI Research on NVIDIA DGX Spark
A growing field journal of real AI research run on one desktop machine, the NVIDIA DGX Spark. Fifty-three chapters in eight parts, every lesson backed by code you can run yourself. Free to read online.
- Local AI
- NVIDIA DGX Spark
- Petascale desktop
- Backed by code
- Length: 53 chapters
- Parts: 8 parts
- Size: About 145,000 words
- Price: Free to read
What's inside
Part 1 · Foundations
- 1
Setting up the Spark for solo AI work
Why the tools you work through matter more on day one than the model itself.
- 2
One machine, three ways to build on it
The same setup opens three paths, and this chapter walks you to the top of each.
- 3
What it takes to retrain a giant model
Three ways to fine-tune a 100B model (adjust a ready model on your own data), and how much memory each one needs.
- 4
What the research agent really built, in plain words
A full day of automated work for two cents of power, and why training a model from scratch is rarely worth it.
- 5
The real memory cost of serving a model
Why the bill at answer time is set by how many users and how long the prompt, not by model size.
Part 2 · Inference and retrieval
- 6
Your first model server on the Spark
Running NVIDIA's ready-to-run Llama 3.1 8B, and what the speed number does not tell you.
- 7
Your own space of meaning
A local service that turns text into numbers so the computer can find related ideas fast.
- 8
Where the meaning lives
Storing those numbers in a plain database so you can search them in milliseconds.
- 9
Three services, one answer
Letting the model look things up before it answers, the simple way. This is what people call RAG.
- 10
Better ways to look things up
Four search methods on one set of notes, and which one finds the right page.
- 11
A bigger model, the same gaps
Testing an 8B, a 49B, and a 70B model on one setup, and why a bigger model alone did not fix the misses.
- 12
A safety gate before the model speaks
One rule layer with three jobs, guarding private data, house style, and safe code.
- 13
Teaching a model to explore at answer time
A small add-on that helps the model reach wider for an answer without costing more compute.
- 14
Six fixes hiding behind two
A change that looked like two patches turned into six, and the score it reached once they all landed.
- 15
Three shapes of the same trick
Where that answer-time add-on helps a lot, a little, or not at all.
Part 3 · Training and pretraining
- 16
A real training framework against a hand-built script
Same model, same steps, and what a proper framework gives back in speed and memory.
- 17
Finding the fastest training settings
Sixteen setups swept to find the peak, landing at about 14,000 text pieces trained per second.
- 18
When real data beats random data
Feeding real text instead of noise, and how little it slows the training down.
- 19
How a small machine saves a big cloud bill
Test a hundred ideas on the desk for about a dollar of power, then rent the big machine only for the winner.
Part 4 · Fine-tuning and alignment
- 20
Teaching a model your own voice
231 of your own question-and-answer pairs and a short, cheap retrain, and what it changes.
- 21
Copying the research agent's taste
Training a small model on the agent's past choices, and where it falls short.
- 22
Building the training gym ourselves
A workbench, 200 tasks, and the lift a small retrain earned over the plain model.
- 23
Closing the loop the first retrain could not
A reward signal that teaches the agent to stop once the job is actually done.
- 24
When the practice score lies
A method that looks great in practice but slips on fresh, held-out tasks.
- 25
Smarter limits on a long task
A training tweak that pays more attention to the turns that actually taught the agent something.
- 26
Knowing where a model stands before you train it
Three test settings that bracket a model's ceiling on one machine, no cluster needed.
- 27
The trainer was fine, the data was not
Three confident wrong guesses, and the cheap bug in the data that caused all of them.
- 28
A faster trainer that fits the same memory
Six checks that prove a leaner training tool holds the same memory budget end to end.
- 29
Two trainers, one job, a 26% gap
The same recipe through two tools, and which one trained faster and wrote longer answers.
Part 5 · Agentic systems
- 30
The sandbox cost that was not the problem
Running a safe, walled-off agent next to a plain one on the same model, and where the real cost turned out to be.
- 31
Turning the research stack into a tool
Wrapping the look-it-up chain so any coding session can use it as a grounded helper.
- 32
Rules before the agent edits code
Five checks sit between what the agent proposes and any change it is allowed to make.
- 33
The overnight loop that edits its own trainer
Fifty rounds of a model improving its own training code while you sleep, for seven cents of power.
- 34
Reading the agent's paper trail
How keeping a simple log of past tries made the next try far more useful.
Part 6 · Observability and evaluation
- 35
Scoring the research stack
44 held-out questions, and which setup actually earned the points.
- 36
Was the agent working or stalling?
Putting real numbers on how often the agent just repeated itself.
- 37
One test, two ways to fail
Two models on the same hard test, both scoring zero for completely different reasons.
Part 7 · Deployment and distribution
- 38
The 4-bit trick that beats the rest
Why a newer way of shrinking the numbers, not just smaller numbers, is the real speed win on this chip.
- 39
Five finance model builds, measured
Packaging a finance model five ways and scoring each on speed, size, and a finance test.
- 40
Five legal model builds, measured
The same five-way test for a legal model, with a law-exam score for each build.
- 41
Five security model builds, measured
The same for a cyber-security model, where the smallest build came out on top.
- 42
Five medical model builds, measured
The same for a medical-reasoning model, with a clear study-helper-not-a-doctor note.
Part 8 · Field Kit toolkit reference
- 43
capabilities
A clear map of what the Spark can do, with the memory math built in.
- 44
nim
A tidy client for talking to the model server, with retries and size checks.
- 45
rag
The look-it-up pipeline, taking in notes, finding the right ones, then answering from them.
- 46
eval
The scoring tools, including tests, judges, and a checker for when a model refuses to answer.
- 47
training
The building blocks for retraining a model on the Spark.
- 48
lineage
A simple log that records what each training try learned.
- 49
quant
The tool that shrinks a model and measures what you trade away for the smaller size.
- 50
publish
The pieces that push a finished model to Hugging Face with a full report card.
- 51
command line tool
Quick checks and small benchmarks without writing any code.
- 52
viz
Branded charts and tables for the research notebooks.
- 53
notebook
A runtime that runs the same notebook on the Spark or on a free cloud GPU.
AI Research on NVIDIA DGX Spark is a running log of real AI research, all done on one small desktop machine. The DGX Spark is tiny but very powerful, so you can push local AI a long way with no cloud bill and no shared servers. Every chapter is a working note from the bench, and every claim is backed by code that runs.
What you will learn
You start by setting the machine up for everyday work, then build a system that can look things up before it answers (so it stays grounded in your own notes). From there you train and retrain models, run an agent that improves its own training code overnight, and measure the results in plain numbers. The last part is a reference for Field Kit, the small Python toolkit that ties it all together.
Who it is for
Builders and researchers who want to run serious AI on hardware they own, not rent. You do not need a cluster or a big budget. You can read the whole thing free online. If you want a copy to keep and read offline, the PDF and EPUB bundle is yours for a one-time price.
Get the full book
PDF and EPUB, yours to keep.
$50 one time
Not happy in the first 14 days? Email us and we refund you, no questions asked.