Applied Volume. Production SDD for Qwen Code CLI
This directory is the second, applied volume of the textbook. The first volume in book/ teaches the basic SDD cycle on AgentClinic: constitution, feature specification, plan, verifiable facts, implementation, review, and replanning. The second volume transfers the same cycle into production scenarios: legacy traces, validators, multi-agent checks, Spec CI, metrics, model budgets, and limited auto-remediation.
Version: v1.0 — verified 2026-05-20. See CHANGELOG.md for revision history.
The material is not designed for a first introduction to SDD. Before reading, you need to understand requirements.md, plan.md, validation.md, QWEN.md, feature boundaries, negative requirements, and fact-based verification. If these terms have not yet become part of your working vocabulary, complete the first volume first.
The main rule of the second volume: the first pass should leave one small verifiable trace, not introduce all production terminology at once. In each chapter, first close the educational minimum: one artifact, one command, or one blocker for capstone/. The main graded case is high_memory_usage; rules for transferring local cases into the main one are described in Part 0.
Quick Start
- Open Part 0 and take the main case high_memory_usage.
- Create an empty capstone/.
- In chapters 1–3, fill in genealogy.md, a poisoned/fixed pair, and constitution.md.
- In chapters 4–11, complete only the "Minimal Educational Scenario" section and runnable commands from [examples/](examples/); if a chapter uses a different case (autoscale_200pct, node_not_ready, appointment_latency/appointment_latency_spike, cdn_error_budget_burn), write one transfer line: which principle from that case protects the main high_memory_usage.
- In chapter 12, check the package against antipatterns.
- In chapter 13, assemble the final capstone/README.md and verify that it can be understood without chat history.
Minimal check of examples, including expected blockers:
bash book2/examples/smoke_all.sh
How to Read the Chapters
Chapters 1–12 should be read at the same pace. At the beginning of each chapter, first find the short "Before Reading" block: it answers what the chapter takes from the first volume, which local case it launches, what gets transferred to capstone/, and what belongs to the full track.
Then keep five questions in mind:
- Foundation from the first volume. Which AgentClinic idea is being extended.
- Minimal educational scenario. What to do by hand or run locally.
- Control fact. What proves the chapter has been completed.
- How this gets into capstone/. Which line or file remains after the chapter.
- Full track. What will only be needed when deploying to a real production repository.
If a chapter feels dense, don't read it linearly. First complete the minimal scenario, then return to "Key Ideas," and only after that look at calibrations, [project script], and [conceptual interface]. A term that doesn't help fill the current capstone/ file can be skipped until the second pass.
The editorial rule of the second volume: on the first pass, a new chapter should add no more than one new mandatory term to your working vocabulary. If you encounter five more names, but they aren't needed for the current capstone/ file, treat them as reference material and return to them after the minimal scenario.
Practical test for a chapter: after the minimal scenario, the reader should be able to write one line in one capstone/ file. If two new mechanisms need to be understood at once for this, one of them belongs to the second pass or the full track.
Status Labels and Commands
Chapters use the same confidence levels as the first volume:
- Standard — fixed behavior of a tool or established practice.
- Recommendation — practice that works in most cases but allows adaptation.
- Frontier — approach is in use, but the form depends on the team, models, and infrastructure.
Command blocks are divided into three types:
- [runnable]: works locally in [book2/examples/](examples/) without external dependencies.
- [project script]: interface of a script that needs to be implemented in your own project.
- [conceptual interface]: form of a future orchestrator, policy gate, MCP layer, or CI integration.
For educational completion, only [runnable] blocks and manual artifacts are needed. Everything else belongs to the full track.
End-to-End Route
| Chapters | What to do on first pass | What to defer |
|---|---|---|
| 0 | understand AgentClinic-production, choose high_memory_usage, create empty capstone/ | adaptation to your own production domain |
| 1–3 | recover one requirement, show one defect, formalize constitution.md | automatic proof normalizers and rule referendums |
| 4–5 | get a counterexample and smoke result from stress mutator | permanent duel and mutation factory in CI |
| 6–7 | accept/reject shadow candidate, run Spec CI | full scorebook, scope-gate, and PR reports |
| 8–9 | assemble judgment.md, simulate cheap tier refusal | separate budget service and arbitration orchestrator |
| 10–11 | check guard metrics, readiness and dry-run for high_memory_usage | GitOps deploy and automatic remediation without manual confirmation |
| 12 | record three risks in the form blocker / owner / next_check | turning each antipattern into a CI policy |
| 13 | assemble final proof package | production-ready implementation of the entire process |
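Row 12 of the route asks for three risks in the form blocker / owner / next_check. A minimal sketch of a local check for that shape follows. It assumes each risk sits on its own line with the three fields separated by " / "; the textbook does not prescribe this layout, and the function name `count_risk_lines` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of book2/examples/.
# Counts lines that have exactly three non-empty fields separated by " / ",
# which is one possible reading of "blocker / owner / next_check".
count_risk_lines() {
  awk -F' / ' '
    NF == 3 && $1 != "" && $2 != "" && $3 != "" { n++ }
    END { print n + 0 }   # n + 0 prints 0 when no line matched
  ' "$1"
}
```

Under these assumptions, `count_risk_lines capstone/antipattern-audit.md` should print 3 once chapter 12 is closed. A smaller number suggests that a risk is missing an owner or a next check.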
Mandatory Artifacts for First Pass
Track only these files. Other terms can be read later, once the main package already reads as a single case.
- genealogy.md: where the requirement came from.
- poisoned-spec.md / fixed-spec.md: which defect was found and how it was fixed.
- constitution.md: which actions are forbidden to the agent or permitted with limitations.
- validation.md: which facts were actually verified.
- judgment.md: what verdict was rendered and on what evidence.
- budget-note.md: what happens when the cheap tier refuses.
- goodhart-note.md: which metric may start lying and which guard metric constrains it.
- readiness.md: why the contour is admitted, blocked, or sent to semi-manual mode.
- antipattern-audit.md: three risks in the form blocker / owner / next_check after completing chapter 12.
- capstone/README.md: final assembly of the package for one case.
Chapter 6 adds a short Shadow notes block to capstone/README.md (or, if you use QWEN.md in your educational repository, there). This is not a separate file in the main list.
Other names (scorebook, metric_network, decision_hash, precedents.md) belong to the full track unless they directly help fill one of the files above.
Each chapter must provide a minimal final fragment for one of these files. If after a chapter you have only general understanding but no line, command, or blocker for capstone/, the chapter is not yet closed at the educational level.
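The list of mandatory artifacts can be turned into a quick local check. The sketch below is not one of the scripts in examples/. The function name `missing_artifacts` is hypothetical, and treating an empty file as "not yet filled" is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every first-pass artifact that is missing
# or empty in the given directory (default: capstone).
# The file names are copied from the list of mandatory artifacts.
missing_artifacts() {
  local dir="${1:-capstone}"
  local f
  for f in genealogy.md poisoned-spec.md fixed-spec.md constitution.md \
           validation.md judgment.md budget-note.md goodhart-note.md \
           readiness.md antipattern-audit.md README.md; do
    # -s is true only when the file exists and has a size greater than zero
    [ -s "$dir/$f" ] || printf '%s\n' "$f"
  done
}
```

Calling `missing_artifacts capstone` prints one file name per line. Empty output means every file exists and has content, which says nothing yet about whether that content is a verified fact.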
Cross-map of "which chapter writes which capstone/ file":
| capstone/ file | Chapter that opens it | Chapters that supplement it |
|---|---|---|
| genealogy.md | 1 | 13 (final assembly) |
| poisoned-spec.md / fixed-spec.md | 2 | 13 |
| constitution.md | 3 | 12 (mutable-rule antipatterns), 13 |
| validation.md: happy/negative + counterexample | 4 | 5 (mutants), 7 (Spec CI), 13 |
| validation.md: mutation immunity | 5 | 13 |
| Shadow notes block in capstone/README.md | 6 | 13 |
| validation.md: Spec CI line | 7 | 13 |
| judgment.md | 8 | 12 (arbitration antipatterns), 13 |
| budget-note.md | 9 | 13 |
| goodhart-note.md | 10 | 13 |
| readiness.md | 11 | 13 |
| antipattern-audit.md | 12 | 13 |
| capstone/README.md: assembly | 13 | - |
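The cross-map can also be read as a lookup: given the last closed chapter, which files should already be open. The following is a sketch under stated assumptions. The function name `files_opened_by` is hypothetical, and only the "Chapter that opens it" column is encoded. The chapter 6 Shadow notes block is left out because it may live in either README.md or QWEN.md.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the capstone/ files that should already be
# opened once chapter N is closed, following the cross-map table.
files_opened_by() {
  local n="$1" chapter file
  # Each here-document row is "<opening chapter> <file>".
  while read -r chapter file; do
    [ "$chapter" -le "$n" ] && printf '%s\n' "$file"
  done <<'EOF'
1 genealogy.md
2 poisoned-spec.md
2 fixed-spec.md
3 constitution.md
4 validation.md
8 judgment.md
9 budget-note.md
10 goodhart-note.md
11 readiness.md
12 antipattern-audit.md
13 README.md
EOF
  return 0
}
```

For example, `files_opened_by 3` prints genealogy.md, poisoned-spec.md, fixed-spec.md, and constitution.md, one per line.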
Before self-grading, open [examples/templates/capstone-dossier.md](examples/templates/capstone-dossier.md). This is a completed benchmark of the minimal package for high_memory_usage: it shows how short a good first pass can be.
Chapter Map
| Chapter | Foundation from first volume | Minimal output |
|---|---|---|
| 0. AgentClinic-production Lab | final project structure and practical exam | chosen case, empty capstone/, smoke command |
| 1. Recovering Specifications from Legacy | supporting an existing project | one entry in genealogy.md |
| 2. Diagnosing Specification Defects | negative requirements and facts | poisoned/fixed pair |
| 3. Project Constitution | mission.md, tech-stack.md, roadmap.md, QWEN.md | two immutable rules and one mutable rule |
| 4. LLM Duel | separate verification session | one counterexample or next_guard |
| 5. Mutation Testing of Specifications | negative path and counterexamples | stress mutator result |
| 6. Shadow Specification Selection | project memory and few-shot | one accepted and one rejected candidate |
| 7. Specification CI | link requirements.md → plan.md → validation.md | Spec CI line with PASS/BLOCK |
| 8. File Arbitration of Disputed Change | independent review | judgment.md with evidence_ref |
| 9. Tiered Budgets and Token Budgets | choosing model by task risk | budget risk and token_health |
| 10. Protecting Metrics from Goodhart | facts instead of persuasive prose | KPI and guard metric |
| 11. Production API | feature boundaries, rollback, manual check | readiness and dry-run |
| 12. Production SDD Antipatterns | SDD antipatterns | three diagnostic risks |
| 13. Practical Exam | full SDD cycle | final capstone/ package |
The full AgentClinic domain map is in Appendix A. Qwen Code command compatibility is described in Appendix B. Checklists are collected in Appendix C.
Why the Case Changes from Chapter to Chapter
The main graded case is high_memory_usage. Chapters 1–10 nevertheless use different incidents, because no single incident demonstrates every mechanism equally well: in some chapters a priority conflict is easier to see in another domain, and in others the mechanism needs a mutation history that high_memory_usage doesn't have. Using one case for the entire volume would turn every template into a formality.
The transfer rule is simple: after the chapter, write one line stating which principle from that case protects your high_memory_usage.
| Chapter | Chapter case | What transfers to high_memory_usage |
|---|---|---|
| 1 | node_not_ready | technique for recovering a requirement from post-mortem and provenance |
| 2 | appointment_latency | one controlled priority conflict and reverse run |
| 3 | node_not_ready | immutable principle and one mutable rule with ttl and rollback_condition |
| 4 | autoscale_200pct | minimal counterexample and next_guard for violated Then |
| 5 | payment_latency_spike | smoke mutator result and validator immunity vector |
| 6 | shadow.p0.voice_handoff | one accepted and one rejected shadow candidate |
| 7 | incident payload | Spec CI line with PASS on coverage and BLOCK on schema |
| 8 | autoscale_200pct | judgment.md with verdict, evidence_ref, and Safety role |
| 9 | autoscale_200pct | budget risk, token_health, and cheap tier refusal scenario |
| 10 | cdn_error_budget_burn | paired anti-Goodhart metric to remediation KPI |
| 11 | high_memory_usage | readiness 23/25 and dry-run for main case |
| 12 | any package from chapters 8–11 | three lines blocker / owner / next_check |
| 13 | high_memory_usage | assembly of all artifacts into unified capstone/ |
If a chapter case doesn't transfer in one line, the chapter has been read but not closed.
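Whether a transfer line exists can be checked mechanically once a convention is fixed. The sketch below assumes a convention of its own: the line starts with `transfer:` and names high_memory_usage. The textbook does not define this format, and the function name `has_transfer_line` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical convention, not defined by the textbook:
# a transfer line starts with "transfer:" and mentions high_memory_usage.
# Returns 0 when the file contains at least one such line, non-zero otherwise.
has_transfer_line() {
  grep -Eq '^transfer:.*high_memory_usage' "$1"
}
```

Under this convention, `has_transfer_line capstone/validation.md || echo "chapter not closed"` gives a one-command reminder after each chapter. The check only confirms that a line is present, so judging whether the principle really protects the main case stays a manual review step.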
Parts
- AgentClinic-production Lab
- Recovering Specifications from Legacy
- Diagnosing Specification Defects
- Project Constitution: First Rule Referendum
- LLM Duel: Verifier vs Implementor in Formal Claims
- Mutation Testing of Specifications
- Shadow Specification Selection
- Specification CI: Specification as Executable Artifact
- File Arbitration of Disputed Change: Roles, Verdicts, and Precedents
- Model Routing and Token Budgets
- Protecting Metrics from Goodhart: Guard Metrics and Emergency Mode
- Integration with Real API: From Specification to Deploy
- Production SDD Antipatterns: Diagnostic Map of Applied Cycle
- Practical Exam: Assemble Production SDD Contour
Accompanying Documents
- Applied Volume Glossary — definitions of second volume terms.
- Applied Volume Changelog — revision history of the text.
- Instructor Note — workshop formats and typical errors.
- Bridges to First Volume — prerequisites and AgentClinic domain map.
- Qwen Code Compatibility — built-in commands, custom commands, and project scripts.
- Applied SDD Checklists — checks for Spec CI, arbitration, metrics, and production readiness.
- Threshold Calibration — "Low / Default / High" tables, threshold shift exercises, and review signals for chapters 5, 6, 9, 10, 11. Not needed on first pass.
- Runnable Examples — local smoke runs and templates.
What Counts as Success
By the end of the applied volume, the result should not be a set of beautiful rules but a reproducible contour:
- disputed requirements have provenance and uncertainty level;
- dangerous automations are constrained by constitution, guardrails, and rollback conditions;
- validation.md checks happy path, negative path, counterexamples, drift, and Goodhart traps;
- CI or its runnable analog blocks uncovered requirements and weak payload contracts;
- agent decisions leave evidence suitable for review by another human or another model;
- the final capstone/ shows one path from legacy trace to production-ready solution with explicit blockers and fix plan.