Applied Part 13. Practical Exam: Assemble a Production SDD Pipeline
Status: Recommendation. This part does not introduce a new mechanism. It assembles the second volume into a single verifiable route, following the pattern of the practical exam of the first volume. The goal is to prove that you can walk through a production SDD scenario from legacy traces to a solution that is admitted by facts, not by agent confidence.
The exam is best taken after chapters 1–12. If you are reading the volume selectively, use this part as a map of missing artifacts: any gap in the capstone/ package shows which chapter you need to return to. If it is unclear how to connect files into a single case, return to Part 0: it sets the AgentClinic-production lab frame and explains what counts as the educational minimum.
Goal
By the end of the exam, you should have one coherent evidence package for AgentClinic-production:
- a recovered requirement with provenance;
- a controllably defective specification, fixed;
- constitution.md with immutable and mutable rules;
- at least one counterexample and one duel record;
- a local Spec CI or its runnable equivalent;
- judgment.md or a precedent record;
- budget and anti-Goodhart controls;
- a readiness gate and a list of blockers;
- a diagnostic antipattern checklist.
The exam is considered passed not when all files look complete, but when another person can open the package, repeat the key checks, and understand why the solution is safe to admit or why it must be deferred.
Final Case
Work with one production incident. The recommended primary case is high_memory_usage, because it goes through webhook normalization, readiness gate, and trial run from Part 11. autoscale_200pct can be chosen instead if you are building the exam around a duel and file arbitration. Do not mix the two cases in a single exam.
Minimum setup:
- AgentClinic-production received an alert from Grafana or PagerDuty;
- legacy traces are incomplete: some rules are known from post-mortems, some from QWEN.md, some from oral practice;
- automatic remediation looks useful but may violate the blast radius limit, tier budget, or anti-Goodhart invariant;
- before admission, you must prove that the specification, plan, check, and readiness do not contradict each other.
Package Structure
Create a directory:
capstone/
README.md
genealogy.md
poisoned-spec.md
fixed-spec.md
constitution.md
validation.md
judgment.md
budget-note.md
goodhart-note.md
readiness.md
antipattern-audit.md
If working on a real project, names can be adapted. But the roles of the files must remain the same: origin, defect, fix, rules, facts, arbitration, budget, metrics, readiness, and process audit.
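A minimal sketch of a completeness check for the package, assuming it lives in a local capstone/ directory. The file names come from the structure above; the file-to-role mapping is one reading of the role list, the "overview" role for README.md is an assumption, and the helper name missing_files is hypothetical.

```python
from pathlib import Path

# File names are from the package structure; the role labels follow the
# chapter's role list in order. The README role ("overview") is an assumption.
CAPSTONE_FILES = {
    "README.md": "overview",
    "genealogy.md": "origin",
    "poisoned-spec.md": "defect",
    "fixed-spec.md": "fix",
    "constitution.md": "rules",
    "validation.md": "facts",
    "judgment.md": "arbitration",
    "budget-note.md": "budget",
    "goodhart-note.md": "metrics",
    "readiness.md": "readiness",
    "antipattern-audit.md": "process audit",
}


def missing_files(package_dir: Path) -> list[str]:
    """Return expected file names that are absent or empty in package_dir."""
    missing = []
    for name in CAPSTONE_FILES:
        path = package_dir / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Each reported gap points at a role that still has no artifact.
    for name in missing_files(Path("capstone")):
        print(f"missing or empty: {name} (role: {CAPSTONE_FILES[name]})")
```

If the names are adapted for a real project, only the keys of the mapping should change; the roles stay the same.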
Before filling in your own package, open [examples/templates/capstone-dossier.md](examples/templates/capstone-dossier.md). This is the reference "golden path" first pass for high_memory_usage: it shows how many facts are sufficient for the exam, without turning the chapter into a large production document.
Use it as a size limiter. If your capstone/README.md or validation.md becomes noticeably longer than the reference, first check whether artifacts from the full track have crept in: scorebook, metric_network, full out/duel.json, entire budget plan, or detailed chat history.
In chapters 1–12, look for the "How this goes into capstone/" block. It is more important than the full list of chapter artifacts on the first pass. If the block says to transfer one line, one accepted candidate, one defensive invariant, or one readiness verdict, do not expand the evidence package to all files of the full production track.
Before starting, write five placeholder lines in capstone/README.md:
Incident-case:
Main risk:
Key check:
Main blocker:
Next fix:
For the default route, the first line must be Incident-case: high_memory_usage. If autoscale_200pct is chosen, state this immediately and do not add high_memory_usage as a second equal case.
If these lines cannot be filled, the package is not yet assembled around a single case.
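The "single case" rule can be sketched as a small check over the README text. This is an illustration under assumptions: the function name unfilled_fields is hypothetical, and the check accepts only the two case names mentioned in this part.

```python
# Field names are the five placeholder lines from the chapter.
REQUIRED_FIELDS = ["Incident-case", "Main risk", "Key check", "Main blocker", "Next fix"]
# Assumption: only the two cases named in this part are valid values.
ALLOWED_CASES = {"high_memory_usage", "autoscale_200pct"}


def unfilled_fields(readme_text: str) -> list[str]:
    """Return the fields that are empty, absent, or name more than one case."""
    values = {}
    for line in readme_text.splitlines():
        # Split on the first colon only, so commands with colons stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip() in REQUIRED_FIELDS:
            values[key.strip()] = value.strip()
    problems = [field for field in REQUIRED_FIELDS if not values.get(field)]
    case = values.get("Incident-case", "")
    # A value such as "high_memory_usage, autoscale_200pct" mixes two cases.
    if case and case not in ALLOWED_CASES:
        problems.append("Incident-case")
    return problems


if __name__ == "__main__":
    template = "Incident-case:\nMain risk:\nKey check:\nMain blocker:\nNext fix:\n"
    print(unfilled_fields(template))  # every field is still a placeholder
```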
Minimum Educational Scenario
Educational Case
Take high_memory_usage from [examples/real-api/](examples/real-api/) as the default route. If autoscale_200pct from [examples/tribunal/](examples/tribunal/) is chosen instead, write this directly in capstone/README.md and do not add high_memory_usage as a second equal case. The goal is to assemble not an ideal production process, but a small reproducible evidence package: one incident, one specification defect, one counterexample or readiness conclusion, one list of blockers.
Preparation
- Read the README of the chosen runnable example.
- Copy the needed templates from [examples/templates/](examples/templates/).
- Create an empty capstone/ directory.
- Decide in advance what will count as a blocker: weak evidence_ref, priority conflict, violation of manual_review_floor, budget overrun, or readiness below threshold.
Steps
1. Fill in capstone/genealogy.md: one recovered requirement, at least two sources, confidence level, and open question.
2. Create capstone/poisoned-spec.md: introduce exactly one defect (a priority conflict, cycle, or hidden boundary escape).
3. Create capstone/fixed-spec.md: fix the defect with an exception rule, schema, or explicit negative requirement.
4. Fill in capstone/constitution.md: at least two immutable_principles, one mutable_rule with ttl, max_scope, rollback_condition, and a short governance_protocol.
5. Run one runnable example for the chosen case.
   - For high_memory_usage, use the commands from the "Minimum Educational Scenario" section of Part 11: one positive readiness, one blocking stateful, one allowed and one forbidden dry-run. Commands with readiness_block_stateful.json and delete_namespace are expected to return exit code 1; this is not a broken example but the source of blockers for capstone/validation.md.
   - For autoscale_200pct, use the three scripts from the "Minimum Educational Scenario" section of Part 8: run_duel.py, check_invariants.py, write_judgment.py.

   Full commands are not duplicated here, so the exam does not turn into copy-paste. If you have both chapters open, follow their steps in the same order.
6. Transfer the result to capstone/validation.md: command, expected fact, actual result, and admission blocker. For real-api, a positive readiness run shows the admissible path, readiness_block_stateful.json gives a stateful blocker, and delete_namespace shows the boundary of pre-agreed actions. If the command was from another runnable directory, explain which principle is transferred to the main case.
7. Fill in capstone/judgment.md: verdict APPROVE, DENY, or DEFERRED, reason, evidence_ref, next step. judgment.md is a record of a decision on a specific dispute; a recurring class of conflict is additionally recorded in capstone/precedents.md with five fields (case_id / verdict / evidence_ref / applies_to / next_check), see Part 8.
8. Add capstone/budget-note.md: what happens if local-coder fails, which limit protects frontier-reviewer, when emergency mode triggers.
9. Add capstone/goodhart-note.md: which target metric may start lying and which guard metric constrains it.
10. Fill in capstone/readiness.md: final score, blocking conditions, why 23/25 with evidence is better than 25/25 without it.
11. Complete the diagnostic checklist from Part 12 and record three risks in capstone/antipattern-audit.md.
12. Finish capstone/README.md: one paragraph of context, list of commands, final status, and list of fixes before production.
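The transfer of a run into capstone/validation.md can be sketched as a small record with the four fields named in the steps: command, expected fact, actual result, and admission blocker. This is an illustration, not the book's tooling: ValidationRecord and run_check are hypothetical names, and the demo uses a stand-in command that exits with code 1 instead of the real scripts from Part 11. The point is that an expected exit code 1 is recorded as a confirmed blocker, not as a failure.

```python
import shlex
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class ValidationRecord:
    command: list[str]
    expected_fact: str
    expected_exit: int
    actual_exit: int
    blocker: str = ""

    @property
    def matches(self) -> bool:
        """True when the command exited with the code the reviewer expected."""
        return self.actual_exit == self.expected_exit

    def to_markdown(self) -> str:
        status = "as expected" if self.matches else f"expected {self.expected_exit}"
        return "\n".join([
            f"- command: {shlex.join(self.command)}",
            f"- expected fact: {self.expected_fact}",
            f"- actual result: exit code {self.actual_exit} ({status})",
            f"- admission blocker: {self.blocker or 'none'}",
        ])


def run_check(command, expected_fact, expected_exit, blocker=""):
    """Run the command and record its exit code next to the expectation."""
    completed = subprocess.run(command, capture_output=True, text=True)
    return ValidationRecord(command, expected_fact, expected_exit,
                            completed.returncode, blocker)


if __name__ == "__main__":
    # Stand-in for a blocking check: a Python one-liner that exits with 1.
    stand_in = [sys.executable, "-c", "import sys; sys.exit(1)"]
    record = run_check(stand_in, "stateful workload without backup is blocked",
                       expected_exit=1, blocker="backup_verified is missing")
    print(record.to_markdown())
```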
After Step 12, re-read capstone/README.md as a new reviewer. It should show not all details, but the verification route: where the requirement came from, what was broken, which command was run, what verdict was obtained, and what blocks production admission.
A minimal capstone/README.md for the first pass fits in five lines:
Incident-case: high_memory_usage
Main risk: auto-remediation without full audit_trace or backup evidence
Key check: python3 scripts/check_readiness.py --readiness fixtures/readiness_block_stateful.json
Main blocker: stateful workload without backup_verified blocks action
Next fix: add evidence_ref for backup and repeat dry-run
Control Fact
The package is exam-ready if another reader can open capstone/README.md and answer five questions without your chat history:
- Which requirement was recovered and where did the evidence come from?
- Which defect was introduced and how was it fixed?
- Which check was actually run?
- Why is the file arbitration verdict or readiness gate exactly what it is?
- What remains a blocker before production?
If at least one question requires the author's oral comment, the package is not yet ready.
Reviewable Trace
Do not transfer out/ from runnable examples into the final package. The final trace is a short capstone/ with files that answer the five questions above. If you are working in your own repository, commit exactly this evidence package, not local run directories.
Quick Questions
Answer in writing, without Qwen Code.
- How does genealogy.md differ from validation.md?
- Why must a controllably defective specification contain exactly one defect?
- When can a shadow specification end up in QWEN.md but not in requirements.md?
- Why does Spec CI not replace the Verifier?
- What must judgment.md contain for a dispute to be repeatable?
- Why can manual_review_floor not be zeroed even with good KPIs?
- What makes token_health more useful than a simple count of spent tokens?
- Why is a readiness score without evidence_ref not an admission?
- When is DEFERRED better than a formal APPROVE?
- Which antipattern from Part 12 most often destroys your package?
Grading Criteria
Score the package out of 30 points. Five categories of 6 points each reflect the five pillars of production SDD: fact provenance, verifiability, dispute resolution, constraint maintenance, and package clarity. Equal weight means that one strong category does not compensate for a weak one, and within each category 6 items cover typical blind spots without excessive detail.
Provenance and Specification — 6 points
- 1: genealogy.md links the requirement to at least two sources;
- 1: disputed facts are not passed off as approved requirements;
- 1: the poisoned/fixed pair contains one defect and one fix;
- 1: the fix changes a verifiable artifact, not just the explanation;
- 1: constitution.md separates immutable and mutable layers;
- 1: a mutable rule has ttl, max_scope, rollback_condition.
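The last two criteria can be checked mechanically once the constitution is held as structured data. The sketch below assumes a dictionary form of constitution.md; the key mutable_rules (a list) and the function name constitution_problems are assumptions, while immutable_principles, ttl, max_scope, rollback_condition, and governance_protocol are the names used in the Steps section.

```python
# Fields every mutable rule must carry, as listed in the criteria above.
REQUIRED_MUTABLE_FIELDS = ("ttl", "max_scope", "rollback_condition")


def constitution_problems(constitution: dict) -> list[str]:
    """Return human-readable gaps; an empty list means the minimum is met."""
    problems = []
    if len(constitution.get("immutable_principles", [])) < 2:
        problems.append("need at least two immutable_principles")
    rules = constitution.get("mutable_rules", [])  # assumed key name
    if not rules:
        problems.append("need at least one mutable rule")
    for index, rule in enumerate(rules):
        for field in REQUIRED_MUTABLE_FIELDS:
            if not rule.get(field):
                problems.append(f"mutable rule {index}: missing {field}")
    if not constitution.get("governance_protocol"):
        problems.append("governance_protocol is empty")
    return problems


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    draft = {
        "immutable_principles": ["no action outside the agreed blast radius"],
        "mutable_rules": [{"ttl": "14d", "max_scope": "one namespace"}],
    }
    for problem in constitution_problems(draft):
        print(problem)
```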
Checks and Facts — 6 points
- 1: at least one runnable example from book2/examples/ was run;
- 1: the result is transferred to validation.md with command and expectation;
- 1: a negative or blocking scenario is explicitly described;
- 1: Spec CI or its equivalent checks the link between requirement and plan;
- 1: readiness or dry-run does not bypass blocking conditions;
- 1: out/ is not passed off as a reviewable artifact.
Arbitration and Roles — 6 points
- 1: judgment.md contains verdict, reason, and evidence_ref;
- 1: Verifier/Implementor/Safety roles are not mixed, Coordinator only maintains judgment.md;
- 1: counterexample is minimal or explicitly noted as non-minimal;
- 1: in case of dispute there is DEFERRED or a next verifiable step;
- 1: precedent is recorded so it can be applied again;
- 1: Safety-veto or its equivalent cannot be overridden by majority vote.
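The first and fourth criteria can be sketched as a check over a judgment record held as a dictionary. This is an assumption about representation: judgment.md itself is a text file, the key next_step stands for the "next step" field, and judgment_problems is a hypothetical helper name.

```python
# The three verdicts named in the Steps section.
ALLOWED_VERDICTS = {"APPROVE", "DENY", "DEFERRED"}


def judgment_problems(record: dict) -> list[str]:
    """Return the reasons a judgment record cannot be reviewed as written."""
    problems = []
    if record.get("verdict") not in ALLOWED_VERDICTS:
        problems.append("verdict must be APPROVE, DENY, or DEFERRED")
    for field in ("reason", "evidence_ref"):
        if not str(record.get(field) or "").strip():
            problems.append(f"{field} is empty")
    # A deferral without a follow-up check leaves the dispute open forever.
    deferred = record.get("verdict") == "DEFERRED"
    if deferred and not str(record.get("next_step") or "").strip():
        problems.append("DEFERRED needs a next verifiable step")
    return problems


if __name__ == "__main__":
    # Hypothetical record, for illustration only.
    print(judgment_problems({"verdict": "DEFERRED", "reason": "backup unverified",
                             "evidence_ref": "validation.md"}))
```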
Production Constraints — 6 points
- 1: budget scenario describes failure of the cheap tier;
- 1: frontier-reviewer is limited by risk or quota;
- 1: anti-Goodhart pair links KPI and guard metric;
- 1: manual_review_floor is preserved;
- 1: readiness score is accompanied by evidence;
- 1: rollback or blocker is stated before admission.
Package Clarity — 6 points
- 1: capstone/README.md explains the case without external chat;
- 1: command list can be repeated locally or replaced with a link to a runnable equivalent;
- 1: blockers are separated from improvements;
- 1: links to chapters and templates help return to the source;
- 1: diagnostic checklist from Part 12 is completed;
- 1: package contains no extra mechanisms unrelated to the chosen case.
25–30 points — production SDD pipeline is ready for team review.
19–24 — pipeline is usable for an educational pass, but needs stronger evidence or blockers.
Below 19 — return to the minimum scenarios of chapters 1–12 and reduce the case size.
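The arithmetic of the rubric can be sketched as follows: five categories, at most 6 points each, summed to a total out of 30 and mapped to the three bands above. The function name grade and the short band labels are illustrative choices, not part of the book's tooling.

```python
# Category names are the five rubric headings of this part.
CATEGORIES = (
    "Provenance and Specification",
    "Checks and Facts",
    "Arbitration and Roles",
    "Production Constraints",
    "Package Clarity",
)
MAX_PER_CATEGORY = 6


def grade(scores: dict[str, int]) -> tuple[int, str]:
    """Sum the category scores and map the total to a verdict band."""
    for name in CATEGORIES:
        value = scores[name]  # a missing category raises KeyError on purpose
        if not 0 <= value <= MAX_PER_CATEGORY:
            raise ValueError(f"{name}: score {value} is outside 0..{MAX_PER_CATEGORY}")
    total = sum(scores[name] for name in CATEGORIES)
    if total >= 25:
        band = "ready for team review"
    elif total >= 19:
        band = "usable for an educational pass"
    else:
        band = "return to the minimum scenarios"
    return total, band


if __name__ == "__main__":
    example = dict(zip(CATEGORIES, (5, 5, 5, 4, 4)))
    print(grade(example))  # (23, 'usable for an educational pass')
```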
What to Do After the Exam
Do not move the entire package into production as one ready-made template. Pick the two or three most useful artifacts and automate them first:
- if requirement provenance is most often lost, start with genealogy.md;
- if CI lets weak specifications through, start with Spec CI;
- if disputes keep recurring, start with judgment.md and precedents.md;
- if KPIs start lying, start with anti-Goodhart validation.md.
The main result of the second volume is not a set of terms, but the habit of demanding a verifiable trace before admitting a dangerous automatic action.