Applied Part 0. AgentClinic-production Laboratory
Status: Standard for the learning path. This part does not introduce a new technique. It explains how to read the second volume as a single laboratory branch after the first volume.
The first volume builds a small AgentClinic: routes, SQLite, feature specifications, checks, and reviews. In the second volume, the same project is used as a learning production model. We do not require real Kubernetes, Grafana, PagerDuty, or GitOps. These words denote roles in scenarios: where the signal came from, which action could be dangerous, where rollback is needed, and which artifact proves the solution.
If you read the chapters as a set of independent advanced tricks, the volume will quickly become heavy. Read it differently: one project, one main production pipeline, one growing package of evidence. For the default completion, use high_memory_usage; other incidents are needed as small laboratory windows for individual mechanisms.
Practical rule for the first pass: capstone/README.md must answer for exactly one incident case. A local example from another chapter may use a different incident, but only its verifiable principle is transferred to the completion package. For example, autoscale_200pct does not contribute a second case, only the protective rule (guard) "do not expand the blast radius beyond quota". Likewise, cdn_error_budget_burn does not contribute a new service, only the anti-Goodhart invariant "MTTR cannot be improved at the cost of a silent P0".
Before Reading
Part 0 is methodological, without a learning case. There are no steps and no control fact here; the task is to set the map of the volume and agree on the execution stack. From the next chapter, the standard "Before Reading" block returns and works as a contract between the chapter and the completion.
Goal
Before Chapter 1, you need to understand four things:
- which learning production case passes through the completion package;
- which files are considered the result of each chapter;
- which commands are actually runnable, and which are only the interface of a future production layer;
- where the learning minimum ends and the full implementation track begins.
End-to-end Case
The base scenario is called AgentClinic-production. This is the same AgentClinic, but now with an operational pipeline around it. The main completion case is high_memory_usage in appointments-api: it is convenient to carry through webhook normalization, the readiness gateway, a trial run, and the final evidence package. Additional cases show individual mechanisms and do not need to be combined in one capstone/. The scenario consists of:
- service: appointments-api;
- alerts: high_memory_usage, autoscale_200pct, appointment_latency / appointment_latency_spike, node_not_ready, cdn_error_budget_burn;
- specifications that must survive /clear, a model change, and review by another person;
- a prohibition on dangerous actions without evidence: expanding the blast radius, losing audit, silently closing P0, automatic bypass of rollback.
The learning branch does not need to contain real production code. Artifacts in capstone/, templates from examples/templates/, and runnable examples from examples/ are sufficient. If a chapter uses something other than high_memory_usage, record in capstone/ only the verifiable output: which defect, counterexample, budget risk, or invariant needs to be transferred to the main case.
Short transfer map:
| Local chapter case | What to transfer to high_memory_usage |
|---|---|
| node_not_ready | requirement provenance and the rule "do not close without evidence of recovery" |
| appointment_latency / appointment_latency_spike | one specification defect class or stress-mutator result (distinction: appointment_latency is the general incident class "route /agents delay"; appointment_latency_spike is the specific learning payload in examples/stress-mutator/base/base_spec.json for chapters 2 and 5) |
| autoscale_200pct | counterexample to expanding the blast radius or budget risk |
| cdn_error_budget_burn | a pair of KPIs + guard metric against Goodharting |
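The transfer map above can also be kept as a small lookup, which makes it harder to accidentally carry a whole second incident into the package. This is a minimal sketch, not part of the book's examples: the names MAIN_CASE, TRANSFER_MAP, and what_to_transfer are hypothetical, and the strings are copied from the table.

```python
# Hypothetical encoding of the transfer map; names are illustrative only.
MAIN_CASE = "high_memory_usage"

TRANSFER_MAP = {
    "node_not_ready": (
        "requirement provenance and the rule "
        "'do not close without evidence of recovery'"
    ),
    "appointment_latency": "one specification defect class or stress-mutator result",
    "appointment_latency_spike": "one specification defect class or stress-mutator result",
    "autoscale_200pct": "counterexample to expanding the blast radius or budget risk",
    "cdn_error_budget_burn": "a pair of KPIs plus a guard metric against Goodharting",
}


def what_to_transfer(local_case: str) -> str:
    """Return the principle a local chapter case contributes to the main case."""
    if local_case == MAIN_CASE:
        return "nothing to transfer: this is already the main case"
    try:
        return TRANSFER_MAP[local_case]
    except KeyError:
        # An unknown case is an error, not a silent new domain in capstone/.
        raise ValueError(f"unknown local case: {local_case!r}") from None


if __name__ == "__main__":
    for case in sorted(TRANSFER_MAP):
        print(f"{case} -> {what_to_transfer(case)}")
```

Raising on an unknown case is a deliberate choice in this sketch: a case that is not in the map should prompt a decision about which principle it carries, rather than being added to capstone/ by default.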
Execution Stack
In the first volume, AgentClinic is a TypeScript application with Hono, server JSX, SQLite, and Vitest. This stack does not go anywhere: in the learning model AgentClinic-production, it remains the stack of the product itself.
In the second volume, a second layer of code appears: small runnable scripts in book2/examples/. They are written in Python stdlib and exist only so that one person can run a minimal chapter example on their own machine in a couple of seconds without standing up infrastructure. This is not a change of the product stack and not a hint that production AgentClinic has been rewritten in Python. These are learning simulators: stress-mutator, duel, Spec CI, token budget, readiness calculator. In a real project, such checks are more often implemented as a pre-commit hook, GitHub Actions, an MCP tool, or a service on your own stack; Python here is simply the cheapest language to run without a build step.
The rule is simple: everything in book2/examples/ can be run as python3 ... without dependencies. Everything marked in the chapter as [project script] or [conceptual interface] is the form of a future script or integration in your project, not tied to Python.
Minimal Path
If you have little time, go through the second volume like this:
- Read this part and the README of the chosen runnable example.
- In chapters 1–3, fill in three manual artifacts: genealogy.md, a poisoned/fixed pair, and constitution.md.
- In chapters 4–11, run only [runnable] commands from examples/; transfer results from other cases to capstone/ as a principle, not as a new domain.
- In chapter 12, check the package against the diagnostic checklist.
- In chapter 13, assemble a small capstone/ for one incident.
The minimal path does not require writing external orchestrators, MCP servers, Kubernetes integrations, or real CI gateways. These elements belong to the full track.
The path check is simple: after each chapter, one new verifiable output should appear in capstone/. Not a full production process, but a small record that can be shown to another person.
> How to read the table. The "output" column is intentionally described in plain words, without terms from chapters 4–13. If you encounter a word in the right column that is not yet in the short dictionary below, it is introduced in the chapter where needed. Do not try to learn the volume dictionary from this table.
| After chapter | Minimal output |
|---|---|
| 1 | one requirement with two sources in genealogy.md |
| 2 | a specification pair "with defect / fixed" showing one error class |
| 3 | constitution.md with two immutable rules and one rule with an expiration date |
| 4 | minimal counterexample to one rule or formulation of the next limiter |
| 5 | smoke-run result of stress-mutator or brief report on which mutations the validator caught |
| 6 | one accepted and one rejected shadow candidate (rules that could have entered the specification) |
| 7 | Spec CI line: what is covered, what is blocked |
| 8 | one file with verdict on a disputed change and link to evidence |
| 9 | model budget exhaustion risk and threshold for switching to the cheap tier |
| 10 | target metric and paired guard metric against its Goodhart skew |
| 11 | production admission verdict and dry-run of permitted action |
| 12 | three items blocker / owner / next_check |
| 13 | assembled capstone/ for one incident with five PASS lines of rubric |
If additional terms appear in a chapter but this output is missing, close the output first. Terms can be read later.
For orientation, keep the filled example [examples/templates/capstone-dossier.md](examples/templates/capstone-dossier.md) nearby. This is not a template for mindless copying, but a minimal form of answer to the question: "what trace should remain after the first pass?"
The minimal dictionary for the first pass is short:
- capstone/: final evidence package for one incident;
- genealogy.md: requirement provenance and confidence level;
- validation.md: commands, manual facts, and blockers;
- judgment.md: verdict on a disputed change;
- readiness.md: why an action is permitted, blocked, or goes to semi-manual mode.
All other terms are only needed when they help fill one of these files. If a term does not affect the current chapter output, do not stop on it during first reading.
What to Actually Run
The second volume uses three types of command blocks:
- [runnable]: run as written. The example is in book2/examples/.
- [project script]: the contract of a future script in your project. If no runnable analog is indicated nearby, the command does not have to exist in the tutorial repository.
- [conceptual interface]: the form of a future integration. It does not need to be run during learning.
The rule is simple: the completion package may only reference facts that you actually ran, or manual artifacts that can be read without chat history.
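The admissibility rule can be written down as a small predicate. The sketch below is an illustration under assumptions: the Fact record, its field names, and may_reference are hypothetical and do not exist in book2/examples/; only the three markers and the rule itself come from the text.

```python
from dataclasses import dataclass

# The three command-block markers used in the volume.
MARKERS = frozenset({"runnable", "project script", "conceptual interface"})


@dataclass(frozen=True)
class Fact:
    """Hypothetical record for one candidate entry in the completion package."""

    description: str
    source: str  # one of MARKERS, or "manual" for a hand-written artifact
    was_run: bool = False  # the command was actually executed by you
    readable_without_chat: bool = False  # understandable without chat history


def may_reference(fact: Fact) -> bool:
    """Apply the rule: only facts you ran, or self-contained manual artifacts."""
    if fact.source == "manual":
        return fact.readable_without_chat
    if fact.source in MARKERS:
        # The marker alone proves nothing; what counts is that the command ran.
        return fact.was_run
    raise ValueError(f"unknown source: {fact.source!r}")


if __name__ == "__main__":
    candidates = [
        Fact("smoke run of local examples", "runnable", was_run=True),
        Fact("future readiness integration", "conceptual interface"),
        Fact("genealogy.md", "manual", readable_without_chat=True),
    ]
    for fact in candidates:
        verdict = "ok" if may_reference(fact) else "do not cite"
        print(f"{fact.description}: {verdict}")
```

Note that the marker does not decide admissibility in this sketch: a [project script] you actually implemented and ran in your own project counts, while a [runnable] example you only read does not.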
First Smoke Run
Before reading chapters 4–11, it is useful to verify that local examples work:
bash book2/examples/smoke_all.sh
The script runs a smoke test on a temporary copy of book2/examples, so it does not leave out/ and __pycache__ in the working tree. If time is short, open examples/README.md and select only the block for the chapter you are currently working through.
Working Directory for Completion
Create the future package directory:
mkdir -p capstone
Leave it empty for now. In chapters 1–12 you will gradually understand which files will end up there. Do not mix several incidents in one evidence package: one file may reference a runnable analog from another case, but the solution must explain one main incident.
capstone/
README.md
genealogy.md
poisoned-spec.md
fixed-spec.md
constitution.md
validation.md
judgment.md
budget-note.md
goodhart-note.md
readiness.md
antipattern-audit.md
This structure repeats the first volume: first intent and boundaries, then plan and facts, then review and final package. The only difference in the second volume is that facts relate not to one feature, but to production admission of a dangerous action.
On the first pass, do not add files from the full track to capstone/ just because they are named in a chapter. scorebook, metric_network, decision_hash, precedents.md, and CI reports belong there only when you actually created them or can explain which runnable analog confirms the same principle.
To make it easier to navigate which chapter opens which file:
| capstone/ file | Opened by |
|---|---|
| genealogy.md | chapter 1 |
| poisoned-spec.md / fixed-spec.md | chapter 2 |
| constitution.md | chapter 3 |
| validation.md (happy + negative + counterexample) | chapters 4 and 7 |
| judgment.md | chapter 8 |
| budget-note.md | chapter 9 |
| goodhart-note.md | chapter 10 |
| readiness.md | chapter 11 |
| antipattern-audit.md | chapter 12 |
| README.md (final assembly) | chapter 13 |
If along the way a chapter introduces an additional file that is not in this list, it belongs to the full track. Record the principle in one line and move on.
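The file-to-chapter table above is easy to check mechanically. The following is a minimal sketch under assumptions: FILE_TO_CHAPTERS and audit_capstone are hypothetical names, not scripts from book2/examples/, and the mapping simply restates the table.

```python
from __future__ import annotations

from pathlib import Path

# Restates the table: capstone/ file -> chapter(s) that open it.
FILE_TO_CHAPTERS = {
    "genealogy.md": (1,),
    "poisoned-spec.md": (2,),
    "fixed-spec.md": (2,),
    "constitution.md": (3,),
    "validation.md": (4, 7),
    "judgment.md": (8,),
    "budget-note.md": (9,),
    "goodhart-note.md": (10,),
    "readiness.md": (11,),
    "antipattern-audit.md": (12,),
    "README.md": (13,),
}


def audit_capstone(capstone: Path) -> dict[str, list[str]]:
    """Split the directory contents into present, missing, and full-track files."""
    present: set[str] = set()
    if capstone.is_dir():
        present = {p.name for p in capstone.iterdir() if p.is_file()}
    return {
        "present": sorted(n for n in FILE_TO_CHAPTERS if n in present),
        "missing": sorted(n for n in FILE_TO_CHAPTERS if n not in present),
        # Anything outside the table is treated as full-track material.
        "full_track": sorted(present - set(FILE_TO_CHAPTERS)),
    }


if __name__ == "__main__":
    report = audit_capstone(Path("capstone"))
    for section, names in report.items():
        print(f"{section}: {', '.join(names) or '-'}")
```

A non-empty full_track list is not an error in itself; it is a prompt to confirm that each such file was really created by you or is backed by a runnable analog.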
Control Fact
After this chapter, one main incident-case is chosen, an empty capstone/ is created, and runnable examples are checked with bash book2/examples/smoke_all.sh or postponed with an explicit reason. If you cannot name the main case and the first file that will go into capstone/, it is too early to proceed to chapter 1.
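The control fact can be expressed as a short self-check. This is a sketch under assumptions: part0_problems and its parameters are hypothetical, and restricting the main case to the alert names listed in this part is an assumption of the sketch, not a rule stated by the book.

```python
from __future__ import annotations

from pathlib import Path

# Alert names listed in this part; limiting the choice to them is an assumption.
KNOWN_CASES = frozenset({
    "high_memory_usage",
    "autoscale_200pct",
    "appointment_latency",
    "appointment_latency_spike",
    "node_not_ready",
    "cdn_error_budget_burn",
})


def part0_problems(
    main_case: str,
    first_file: str,
    capstone: Path,
    smoke_passed: bool,
    postpone_reason: str = "",
) -> list[str]:
    """Return the reasons why it is too early to proceed to chapter 1."""
    problems: list[str] = []
    if main_case not in KNOWN_CASES:
        problems.append("main incident case is not chosen")
    if not first_file.strip():
        problems.append("first capstone/ file is not named")
    if not capstone.is_dir():
        problems.append("capstone/ does not exist")
    elif any(capstone.iterdir()):
        problems.append("capstone/ is not empty")
    if not smoke_passed and not postpone_reason.strip():
        problems.append("smoke run neither passed nor postponed with a reason")
    return problems


if __name__ == "__main__":
    issues = part0_problems(
        "high_memory_usage", "genealogy.md", Path("capstone"), smoke_passed=True
    )
    print("ready for chapter 1" if not issues else "\n".join(issues))
```

An empty list means the control fact holds; each returned string names one gap to close first.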
Control Questions
- Why is AgentClinic-production a learning model, not a requirement to raise real infrastructure?
- How does [runnable] differ from [project script]?
- Why can high_memory_usage and autoscale_200pct not be mixed in the final capstone/ as two equal cases?
- Why must the final capstone/ be understandable without chat history?