AI Research on NVIDIA DGX Spark

A growing field journal of real AI research run on one desktop machine, the NVIDIA DGX Spark. Fifty-three chapters in eight parts, every lesson backed by code you can run yourself. Free to read online.

  • Local AI
  • NVIDIA DGX Spark
  • Petascale desktop
  • Backed by code
Read online
The AI Research on NVIDIA DGX Spark book cover, a black cover with a glowing gold brain made of network nodes and circuit lines, the title in gold and white.
Length: 53 chapters
Parts: 8
Size: About 145,000 words
Price: Free to read

What's inside

Part 1 · Foundations

  1. Setting up the Spark for solo AI work

    Why the tools you work through matter more on day one than the model itself.

  2. One machine, three ways to build on it

    The same setup opens three paths, and this chapter walks you to the top of each.

  3. What it takes to retrain a giant model

    Three ways to fine-tune a 100B model (adjust a ready model on your own data), and how much memory each one needs.

  4. What the research agent really built, in plain words

    A full day of automated work for two cents of power, and why training a model from scratch is rarely worth it.

  5. The real memory cost of serving a model

    Why the bill at answer time is set by how many users and how long the prompt, not by model size.

Part 2 · Inference and retrieval

  6. Your first model server on the Spark

    Running NVIDIA's ready-to-run Llama 3.1 8B, and what the speed number does not tell you.

  7. Your own space of meaning

    A local service that turns text into numbers so the computer can find related ideas fast.

  8. Where the meaning lives

    Storing those numbers in a plain database so you can search them in milliseconds.

  9. Three services, one answer

    Letting the model look things up before it answers, the simple way. This is what people call RAG.

  10. Better ways to look things up

    Four search methods on one set of notes, and which one finds the right page.

  11. A bigger model, the same gaps

    Testing an 8B, a 49B, and a 70B model on one setup, and why a bigger model alone did not fix the misses.

  12. A safety gate before the model speaks

    One rule layer with three jobs, guarding private data, house style, and safe code.

  13. Teaching a model to explore at answer time

    A small add-on that helps the model reach wider for an answer without costing more compute.

  14. Six fixes hiding behind two

    A change that looked like two patches turned into six, and the score it reached once they all landed.

  15. Three shapes of the same trick

    Where that answer-time add-on helps a lot, a little, or not at all.

Part 3 · Training and pretraining

  16. A real training framework against a hand-built script

    Same model, same steps, and what a proper framework gives back in speed and memory.

  17. Finding the fastest training settings

    Sixteen setups swept to find the peak, landing at about 14,000 text pieces trained per second.

  18. When real data beats random data

    Feeding real text instead of noise, and how little it slows the training down.

  19. How a small machine saves a big cloud bill

    Test a hundred ideas on the desk for about a dollar of power, then rent the big machine only for the winner.

Part 4 · Fine-tuning and alignment

  20. Teaching a model your own voice

    231 of your own question-and-answer pairs and a short, cheap retrain, and what it changes.

  21. Copying the research agent's taste

    Training a small model on the agent's past choices, and where it falls short.

  22. Building the training gym ourselves

    A workbench, 200 tasks, and the lift a small retrain earned over the plain model.

  23. Closing the loop the first retrain could not

    A reward signal that teaches the agent to stop once the job is actually done.

  24. When the practice score lies

    A method that looks great in practice but slips on fresh, held-out tasks.

  25. Smarter limits on a long task

    A training tweak that pays more attention to the turns that actually taught the agent something.

  26. Knowing where a model stands before you train it

    Three test settings that bracket a model's ceiling on one machine, no cluster needed.

  27. The trainer was fine, the data was not

    Three confident wrong guesses, and the cheap bug in the data that caused all of them.

  28. A faster trainer that fits the same memory

    Six checks that prove a leaner training tool holds the same memory budget end to end.

  29. Two trainers, one job, a 26% gap

    The same recipe through two tools, and which one trained faster and wrote longer answers.

Part 5 · Agentic systems

  30. The sandbox cost that was not the problem

    Running a safe, walled-off agent next to a plain one on the same model, and where the real cost turned out to be.

  31. Turning the research stack into a tool

    Wrapping the look-it-up chain so any coding session can use it as a grounded helper.

  32. Rules before the agent edits code

    Five checks sit between what the agent proposes and any change it is allowed to make.

  33. The overnight loop that edits its own trainer

    Fifty rounds of a model improving its own training code while you sleep, for seven cents of power.

  34. Reading the agent's paper trail

    How keeping a simple log of past tries made the next try far more useful.

Part 6 · Observability and evaluation

  35. Scoring the research stack

    44 held-out questions, and which setup actually earned the points.

  36. Was the agent working or stalling?

    Putting real numbers on how often the agent just repeated itself.

  37. One test, two ways to fail

    Two models on the same hard test, both scoring zero for completely different reasons.

Part 7 · Deployment and distribution

  38. The 4-bit trick that beats the rest

    Why a newer way of shrinking the numbers, not just smaller numbers, is the real speed win on this chip.

  39. Five finance model builds, measured

    Packaging a finance model five ways and scoring each on speed, size, and a finance test.

  40. Five legal model builds, measured

    The same five-way test for a legal model, with a law-exam score for each build.

  41. Five security model builds, measured

    The same for a cyber-security model, where the smallest build came out on top.

  42. Five medical model builds, measured

    The same for a medical-reasoning model, with a clear study-helper-not-a-doctor note.

Part 8 · Field Kit toolkit reference

  43. capabilities

    A clear map of what the Spark can do, with the memory math built in.

  44. nim

    A tidy client for talking to the model server, with retries and size checks.

  45. rag

    The look-it-up pipeline, taking in notes, finding the right ones, then answering from them.

  46. eval

    The scoring tools, including tests, judges, and a checker for when a model refuses to answer.

  47. training

    The building blocks for retraining a model on the Spark.

  48. lineage

    A simple log that records what each training try learned.

  49. quant

    The tool that shrinks a model and measures what you trade away for the smaller size.

  50. publish

    The pieces that push a finished model to Hugging Face with a full report card.

  51. command line tool

    Quick checks and small benchmarks without writing any code.

  52. viz

    Branded charts and tables for the research notebooks.

  53. notebook

    A runtime that runs the same notebook on the Spark or on a free cloud GPU.

AI Research on NVIDIA DGX Spark is a running log of real AI research, all done on one small desktop machine. The DGX Spark is tiny but very powerful, so you can push local AI a long way with no cloud bill and no shared servers. Every chapter is a working note from the bench, and every claim is backed by code that runs.

What you will learn

You start by setting the machine up for everyday work, then build a system that can look things up before it answers (so it stays grounded in your own notes). From there you train and retrain models, run an agent that improves its own training code overnight, and measure the results in plain numbers. The last part is a reference for Field Kit, the small Python toolkit that ties it all together.

Who it is for

Builders and researchers who want to run serious AI on hardware they own, not rent. You do not need a cluster or a big budget. You can read the whole thing free online. If you want a copy to keep and read offline, the PDF and EPUB bundle is yours for a one-time price.

Get the full book

PDF and EPUB, yours to keep.

$50 one time

Not happy in the first 14 days? Email us and we will refund you, no questions asked.