Why I'm Writing an AI Book


Most AI books fall into two categories. Textbooks that explain everything and help you understand nothing. Tutorials that help you copy-paste until something breaks.

I'm writing a third kind.


The Book

It's called Inference Intuition: Fine-Tuning, Alignment, and Agents from Scratch. It's part of the research output at On Ground Labs. Part 1 — nine chapters covering everything from evaluation to alignment — is complete. Part 2, on agents and agentic architecture, is in draft.

The goal isn't to teach you how to run a training script. Any LLM can generate a full training pipeline in 2026. The goal is to help you understand why the code works, what breaks when you change things, and how to think about the problem.

That's the gap. Not syntax. Intuition.


Evaluation Comes First

Most books and courses follow a predictable arc: start with the basics, build up to training, fine-tuning, deployment — and then, somewhere near the end, mention evaluation. As if testing were an afterthought.

Chapter 0 of this book is evaluation. Not Chapter 8. Not an appendix. The very first thing.

The reasoning is borrowed from software engineering. You don't write code for six months and then ask "does this work?" You write the test first. LLMs make this even more important because they're non-deterministic. Run the same prompt twice, get different outputs. Without eval discipline, you're flying blind and calling it progress.

Eval isn't just model benchmarks. It's a simpler question: did this do what I wanted? Five prompt versions, five evaluation criteria, a spreadsheet. Which prompt does what you need? That's eval. You can do it before you train anything.
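That spreadsheet workflow fits in a few lines of code. A minimal sketch, with made-up outputs and criteria (none of the names or checks below come from the book):

```python
# Toy "did this do what I wanted?" eval: score prompt versions against
# simple yes/no criteria. All outputs and criteria are invented.

# Pretend these came from running two prompt versions on the same input.
outputs = {
    "v1": "Refund issued. Ticket #4521 closed.",
    "v2": "Sure!! I guess we could maybe look into a refund at some point?",
}

# Each criterion is a plain yes/no check on the output text.
criteria = {
    "mentions a ticket number": lambda text: "#" in text,
    "commits to an action": lambda text: "issued" in text.lower(),
    "under 100 characters": lambda text: len(text) < 100,
}

def score(text):
    """Fraction of criteria the output satisfies."""
    return sum(check(text) for check in criteria.values()) / len(criteria)

scores = {version: score(text) for version, text in outputs.items()}
best = max(scores, key=scores.get)
print(best, scores)  # v1 wins: it passes all three checks
```

No model training, no benchmark harness: just a loop and a decision. That's the discipline the chapter argues for.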


Code for Intuition, Not Production

Every code block in the book is 5–15 lines. One idea per snippet. No imports, no boilerplate, no full pipelines.

This was a deliberate choice. The code exists to build mental models. A snippet that shows you the shape of a chat template — what the tokens actually look like before and after formatting — teaches more than a 200-line training script you'll copy and never read.

Before each code block: what we're about to see and why it matters. After: what would change if you tweaked this, and what the failure mode looks like. The code is sandwiched in explanation because the explanation is the point.
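To illustrate the chat-template point, here is a hand-rolled, ChatML-style formatter. The special tokens are one common convention, not what any particular model ships with; real models define their own templates:

```python
# Hand-rolled chat template, shown only to make the "shape" visible.
# The <|im_start|>/<|im_end|> tokens follow the ChatML convention;
# other models use different special tokens.
messages = [
    {"role": "system", "content": "You are a terse assistant."},
    {"role": "user", "content": "What is LoRA?"},
]

def apply_template(messages):
    # Each turn is wrapped in special tokens the model was trained to expect.
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    # The trailing assistant header cues the model to start generating.
    return "".join(parts) + "<|im_start|>assistant\n"

formatted = apply_template(messages)
print(formatted)
```

Seeing the formatted string makes the failure mode obvious: fine-tune on one template, serve with another, and the model receives token sequences it has never been trained on.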

If a reader wants production code, they can ask Claude or Copilot. What they can't get from an LLM is the judgment to know when the code is wrong. I've written about this elsewhere — reading code builds the mental model that makes AI-assisted coding effective. This book applies the same principle: understanding first, productivity follows.


Opinions, Not Options

The book has a banned words list. Revolutionary. Game-changing. Paradigm shift. Groundbreaking. None of these appear anywhere.

It also has a stance on every major decision in the space:

  • LoRA is the right default. 95% of the quality at 5% of the compute. Unless you have a specific reason for full fine-tuning, start here.
  • DPO over PPO for most teams. Simpler, stabler, fewer hyperparameters.
  • SFT is surgery, not school. You're not teaching the model language — it already knows language. You're making a targeted update to shift behavior on your specific task.
  • Data quality over quantity. 1,000 curated examples beat 100,000 noisy ones.
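To ground the LoRA default: the published update rule is W' = W + (alpha/r)·BA, where the frozen weight W stays put and only the small matrices A and B train. A toy sketch with made-up sizes, assuming nothing beyond that formulation:

```python
import numpy as np

# Toy LoRA update: W' = W + (alpha / r) * B @ A. Sizes are invented.
d, r, alpha = 8, 2, 16          # hidden size, rank, scaling factor
rng = np.random.default_rng(0)

W = rng.normal(size=(d, d))     # frozen pretrained weight
A = rng.normal(size=(r, d))     # trainable, small random init
B = np.zeros((d, r))            # trainable, zero init

def adapted(W, A, B):
    return W + (alpha / r) * B @ A

# Before training, B @ A is all zeros, so the adapter changes nothing:
# the model starts exactly at the pretrained weights.
assert np.allclose(adapted(W, A, B), W)

# Trainable parameters: 2*d*r for the adapter vs d*d for a full update.
print(A.size + B.size, "trainable vs", W.size, "full")
```

Even at this toy scale the compute argument is visible: 32 trainable parameters instead of 64, and the gap widens quadratically as d grows while r stays small.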

These stances aren't hedged with "one might consider" or "depending on your use case." If I think something is the right call, I say so. If there's an edge case where it breaks, I say that too. But I don't pretend every option is equally valid when it isn't.
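The "fewer hyperparameters" claim for DPO is concrete: the loss has essentially one knob, beta. A toy computation of the published formula, with invented log-probabilities:

```python
import math

# Toy DPO loss: -log sigmoid(beta * ((pi_w - pi_l) - (ref_w - ref_l))),
# where pi_* / ref_* are log-probs of the chosen (w) and rejected (l)
# answers under the policy and the frozen reference model.
# All numeric values below are made up for illustration.

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    margin = (pi_chosen - pi_rejected) - (ref_chosen - ref_rejected)
    return -math.log(1 / (1 + math.exp(-beta * margin)))  # -log sigmoid

# Policy prefers the chosen answer more strongly than the reference does:
low = dpo_loss(pi_chosen=-4.0, pi_rejected=-9.0,
               ref_chosen=-5.0, ref_rejected=-6.0)
# Policy prefers the rejected answer: the loss is higher.
high = dpo_loss(pi_chosen=-9.0, pi_rejected=-4.0,
                ref_chosen=-5.0, ref_rejected=-6.0)
print(low < high)  # True
```

Compare that to PPO, which needs a reward model, a value head, clipping ranges, and KL coefficients. That asymmetry is the whole argument for the default.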


The Structure

Part 1 covers the training stack, front to back:

  1. Evaluation — why eval comes first
  2. Tokenization — BPE, SentencePiece, token boundaries
  3. Pretraining — next-token prediction, scaling laws
  4. Data — quality, deduplication, contamination, synthetic data
  5. Training Systems — learning rate, batch size, reproducibility
  6. Decoding & Generation — temperature, top-k, top-p
  7. SFT — supervised fine-tuning as behavior modification
  8. PEFT — LoRA, QLoRA, and the tradeoffs
  9. Alignment & RL — RLHF, PPO, DPO, GRPO

Part 2 — agents, RAG, memory, agentic architecture — is where the applied work lives. It's not done yet. I'd rather ship it right than ship it fast.


Why This Book, Why Now

There's a specific kind of practitioner I'm writing for. Someone who uses LLMs daily, has fine-tuned a model or two, but doesn't have a coherent mental model of how the pieces fit together. They know the commands but not the reasons. They can follow a tutorial but can't debug when it breaks in a new way.

Textbooks give them too much theory. Tutorials give them too little. This book sits in the middle: enough depth to build real intuition, enough restraint to not bury it in math that doesn't help.

The book is part of On Ground Labs' broader thesis — that useful AI research and education should be accessible, not gated behind expensive compute or academic credentials. The same philosophy that drives our research projects drives how this book is written. If a graduate student can't run the examples on a laptop, we've failed.

I'll share more as Part 2 takes shape. For now, Part 1 is done, and I'm happy with how it turned out.

The book also ties into an argument I've been developing for a while: that small, specialized models will matter more than trillion-parameter ones, and that the real bottleneck isn't syntax but systems understanding.

Learn more about On Ground Labs →