Reading: Applied Part 13. Practical Exam: Build a Production SDD Pipeline


Applied Part 13. Practical Exam: Assemble a Production SDD Pipeline

Status: Recommendation. This part does not introduce a new mechanism. It assembles the second volume into a single verifiable route, following the pattern of the practical exam of the first volume. The goal is to prove that you can walk through a production SDD scenario from legacy traces to a solution that is admitted by facts, not by agent confidence.

The exam is best taken after chapters 1–12. If you are reading the volume selectively, use this part as a map of missing artifacts: any gap in the capstone/ package shows which chapter you need to return to. If it is unclear how to connect files into a single case, return to Part 0: it sets the AgentClinic-production lab frame and explains what counts as the educational minimum.

Goal

By the end of the exam, you should have one coherent evidence package for AgentClinic-production:

  • a recovered requirement with provenance;
  • a controllably defective specification and its fixed version;
  • constitution.md with immutable and mutable rules;
  • at least one counterexample and one duel record;
  • a local Spec CI or its runnable equivalent;
  • judgment.md or a precedent record;
  • budget and anti-Goodhart controls;
  • a readiness gate and a list of blockers;
  • a diagnostic antipattern checklist.

The exam is considered passed not when all files look complete, but when another person can open the package, repeat the key checks, and understand why the solution is safe to admit or why it must be deferred.

Final Case

Work with one production incident. The recommended primary case is high_memory_usage, because it passes through webhook normalization, the readiness gate, and the trial run from Part 11. You can choose autoscale_200pct instead if you are building the exam around a duel and file arbitration. Do not mix the two cases in a single exam.

Minimum setup:

  • AgentClinic-production received an alert from Grafana or PagerDuty;
  • legacy traces are incomplete: some rules are known from post-mortems, some from QWEN.md, some from oral practice;
  • automatic remediation looks useful but may violate the blast radius limit, tier budget, or anti-Goodhart invariant;
  • before admission, you must prove that the specification, plan, check, and readiness do not contradict each other.

Package Structure

Create a directory:

capstone/
  README.md
  genealogy.md
  poisoned-spec.md
  fixed-spec.md
  constitution.md
  validation.md
  judgment.md
  budget-note.md
  goodhart-note.md
  readiness.md
  antipattern-audit.md

If working on a real project, names can be adapted. But the roles of the files must remain the same: origin, defect, fix, rules, facts, arbitration, budget, metrics, readiness, and process audit.
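
The layout above can be scaffolded with a short script. This is a minimal sketch, not part of the course materials: the function name scaffold_capstone, the stub comment written into each file, and the "entry point" label for README.md are assumptions. The file names come from the listing, and the other role labels follow the order given in the paragraph above.

```python
from pathlib import Path

# File names come from the listing above. Role labels follow the order in the
# text (origin, defect, fix, ...); "entry point" for README.md is an assumption.
CAPSTONE_FILES = {
    "README.md": "entry point",
    "genealogy.md": "origin",
    "poisoned-spec.md": "defect",
    "fixed-spec.md": "fix",
    "constitution.md": "rules",
    "validation.md": "facts",
    "judgment.md": "arbitration",
    "budget-note.md": "budget",
    "goodhart-note.md": "metrics",
    "readiness.md": "readiness",
    "antipattern-audit.md": "process audit",
}


def scaffold_capstone(root: Path) -> list[Path]:
    """Create root/capstone/ with one stub per file; never overwrite existing files."""
    package = root / "capstone"
    package.mkdir(parents=True, exist_ok=True)
    created = []
    for name, role in CAPSTONE_FILES.items():
        path = package / name
        if not path.exists():
            path.write_text(f"<!-- role: {role} -->\n", encoding="utf-8")
            created.append(path)
    return created
```

Calling scaffold_capstone(Path(".")) from the repository root creates the stubs. Running it a second time changes nothing, so it is safe to repeat after you have started filling in the files.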

Before filling in your own package, open [examples/templates/capstone-dossier.md](examples/templates/capstone-dossier.md). This is the reference "golden path" first pass for high_memory_usage: it shows how many facts are sufficient for the exam, without turning the chapter into a large production document.

Use it as a size limiter. If your capstone/README.md or validation.md becomes noticeably longer than the reference, first check whether artifacts from the full track have crept in: scorebook, metric_network, full out/duel.json, entire budget plan, or detailed chat history.

In chapters 1–12, look for the "How this goes into capstone/" block. It is more important than the full list of chapter artifacts on the first pass. If the block says to transfer one line, one accepted candidate, one defensive invariant, or one readiness verdict, do not expand the evidence package to all files of the full production track.

Before starting, write five placeholder lines in capstone/README.md:

Incident-case:
Main risk:
Key check:
Main blocker:
Next fix:

For the default route, the first line must be Incident-case: high_memory_usage. If autoscale_200pct is chosen, state this immediately and do not add high_memory_usage as a second equal case.

If these lines cannot be filled, the package is not yet assembled around a single case.
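
The rule above can be checked mechanically. The sketch below is illustrative only: the function names parse_fields and readme_problems are hypothetical, and it assumes each of the five fields sits on its own line in the form "Name: value". The field names and the two case names come from this lesson.

```python
REQUIRED_FIELDS = ["Incident-case", "Main risk", "Key check", "Main blocker", "Next fix"]
ALLOWED_CASES = {"high_memory_usage", "autoscale_200pct"}


def parse_fields(readme_text: str) -> dict[str, str]:
    """Collect 'Name: value' pairs for the five required fields."""
    fields = {}
    for line in readme_text.splitlines():
        # Split on the first colon only, so values may contain colons.
        name, sep, value = line.partition(":")
        if sep and name.strip() in REQUIRED_FIELDS:
            fields[name.strip()] = value.strip()
    return fields


def readme_problems(readme_text: str) -> list[str]:
    """Return reasons why the README is not yet built around a single case."""
    fields = parse_fields(readme_text)
    problems = [f"empty field: {name}" for name in REQUIRED_FIELDS if not fields.get(name)]
    case = fields.get("Incident-case", "")
    if case and case not in ALLOWED_CASES:
        problems.append(f"unknown or mixed incident case: {case}")
    return problems
```

An empty result means the five lines are filled and name exactly one of the two cases. It does not mean the content is convincing; that still needs a human reviewer.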

Minimum Educational Scenario

Educational Case

Take high_memory_usage from [examples/real-api/](examples/real-api/) as the default route. If autoscale_200pct from [examples/tribunal/](examples/tribunal/) is chosen instead, write this directly in capstone/README.md and do not add high_memory_usage as a second equal case. The goal is to assemble not an ideal production process, but a small reproducible evidence package: one incident, one specification defect, one counterexample or readiness conclusion, one list of blockers.

Preparation

  • Read the README of the chosen runnable example.
  • Copy the needed templates from [examples/templates/](examples/templates/).
  • Create an empty capstone/ directory.
  • Decide in advance what will count as a blocker: weak evidence_ref, priority conflict, violation of manual_review_floor, budget overrun, or readiness below threshold.

Steps

  1. Fill in capstone/genealogy.md: one recovered requirement, at least two sources, confidence level, and open question.
  2. Create capstone/poisoned-spec.md: introduce exactly one defect, either a priority conflict, a cycle, or a hidden boundary escape.
  3. Create capstone/fixed-spec.md: fix the defect with an exception rule, schema, or explicit negative requirement.
  4. Fill in capstone/constitution.md: at least two immutable_principles, one mutable_rule with ttl, max_scope, rollback_condition, and a short governance_protocol.
  5. Run one runnable example for the chosen case.
     • For high_memory_usage: the commands from the "Minimum Educational Scenario" section of Part 11, that is, one positive readiness run, one blocking stateful run, one allowed dry-run, and one forbidden dry-run. The commands with readiness_block_stateful.json and delete_namespace are expected to return exit code 1. These are not broken examples but sources of blockers for capstone/validation.md.
     • For autoscale_200pct: the three scripts from the "Minimum Educational Scenario" section of Part 8, namely run_duel.py, check_invariants.py, and write_judgment.py.

Full commands are not duplicated here, so the exam does not turn into copy-paste. If you have both chapters open, follow their steps in the same order.

  6. Transfer the result to capstone/validation.md: command, expected fact, actual result, and admission blocker. For real-api, a positive readiness run shows the admissible path, readiness_block_stateful.json gives a stateful blocker, and delete_namespace shows the boundary of pre-agreed actions. If the command was from another runnable directory, explain which principle is transferred to the main case.
  7. Fill in capstone/judgment.md: verdict (APPROVE, DENY, or DEFERRED), reason, evidence_ref, and next step. judgment.md is a record of a decision on a specific dispute; a recurring class of conflict is additionally recorded in capstone/precedents.md with five fields (case_id / verdict / evidence_ref / applies_to / next_check), see Part 8.
  8. Add capstone/budget-note.md: what happens if local-coder fails, which limit protects frontier-reviewer, and when emergency mode triggers.
  9. Add capstone/goodhart-note.md: which target metric may start lying and which guard metric constrains it.
  10. Fill in capstone/readiness.md: final score, blocking conditions, and why 23/25 with evidence is better than 25/25 without it.
  11. Complete the diagnostic checklist from Part 12 and record three risks in capstone/antipattern-audit.md.
  12. Finish capstone/README.md: one paragraph of context, list of commands, final status, and list of fixes before production.

After Step 12, re-read capstone/README.md as a new reviewer. It should show not all details, but the verification route: where the requirement came from, what was broken, which command was run, what verdict was obtained, and what blocks production admission.
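
The validation.md step in the list above asks for four things per run: command, expected fact, actual result, and admission blocker. A minimal sketch of such a record follows. The class ValidationRecord and the helper run_check are hypothetical, and the command is a stand-in (a child Python process that exits with code 1), because the real commands live in Part 11 and Part 8 and are not reproduced here.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class ValidationRecord:
    command: list[str]
    expected_fact: str
    expected_exit: int
    actual_exit: int
    blocker: str  # empty string when the run produces no admission blocker

    @property
    def matches_expectation(self) -> bool:
        return self.actual_exit == self.expected_exit

    def to_text(self) -> str:
        """Render the four fields in the order the step asks for."""
        return "\n".join([
            f"Command: {' '.join(self.command)}",
            f"Expected fact: {self.expected_fact} (exit code {self.expected_exit})",
            f"Actual result: exit code {self.actual_exit}",
            f"Admission blocker: {self.blocker or 'none'}",
        ])


def run_check(command, expected_fact, expected_exit, blocker=""):
    """Run the command and record its real exit code next to the expectation."""
    completed = subprocess.run(command, capture_output=True, text=True)
    return ValidationRecord(list(command), expected_fact, expected_exit,
                            completed.returncode, blocker)


# Stand-in for a blocking run: exit code 1 is the expected outcome here.
blocking = run_check(
    [sys.executable, "-c", "import sys; sys.exit(1)"],
    expected_fact="stateful workload is blocked",
    expected_exit=1,
    blocker="stateful workload without backup_verified",
)
print(blocking.to_text())
```

Recording the expected exit code next to the actual one keeps a deliberate failure (exit code 1 for a blocking scenario) distinguishable from a broken run.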

A minimal capstone/README.md for the first pass fits in five lines:

Incident-case: high_memory_usage
Main risk: auto-remediation without full audit_trace or backup evidence
Key check: python3 scripts/check_readiness.py --readiness fixtures/readiness_block_stateful.json
Main blocker: stateful workload without backup_verified blocks action
Next fix: add evidence_ref for backup and repeat dry-run

Control Fact

The package is exam-ready if another reader can open capstone/README.md and answer five questions without your chat history:

  1. Which requirement was recovered and where did the evidence come from?
  2. Which defect was introduced and how was it fixed?
  3. Which check was actually run?
  4. Why is the file arbitration verdict or readiness gate exactly what it is?
  5. What remains a blocker before production?

If at least one question requires the author's oral comment, the package is not yet ready.

Reviewable Trace

Do not transfer out/ from runnable examples into the final package. The final trace is a short capstone/ with files that answer the five questions above. If you are working in your own repository, commit exactly this evidence package, not local run directories.
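
Before committing, you can check that no run directory slipped into the package. The helper below is a hypothetical sketch. It assumes run directories are literally named out, as in the runnable examples, and only reports them without deleting anything.

```python
from pathlib import Path


def run_directories(package: Path) -> list[Path]:
    """Return every directory named 'out' at any depth inside the package."""
    return sorted(path for path in package.rglob("out") if path.is_dir())
```

An empty list for run_directories(Path("capstone")) means the package contains only the reviewable files.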

Quick Questions

Answer in writing, without Qwen Code.

  1. How does genealogy.md differ from validation.md?
  2. Why must a controllably defective specification contain exactly one defect?
  3. When can a shadow specification end up in QWEN.md but not in requirements.md?
  4. Why does Spec CI not replace the Verifier?
  5. What must judgment.md contain for a dispute to be repeatable?
  6. Why can manual_review_floor not be zeroed even with good KPIs?
  7. What makes token_health more useful than a simple count of spent tokens?
  8. Why is a readiness score without evidence_ref not an admission?
  9. When is DEFERRED better than a formal APPROVE?
  10. Which antipattern from Part 12 most often destroys your package?

Grading Criteria

Score the package out of 30 points. Five categories of 6 points each reflect the five pillars of production SDD: fact provenance, verifiability, dispute resolution, constraint maintenance, and package clarity. Equal weight means that one strong category does not compensate for a weak one, and within each category 6 items cover typical blind spots without excessive detail.

Provenance and Specification — 6 points

  • 1: genealogy.md links the requirement to at least two sources;
  • 1: disputed facts are not passed off as approved requirements;
  • 1: the poisoned/fixed pair contains one defect and one fix;
  • 1: the fix changes a verifiable artifact, not just the explanation;
  • 1: constitution.md separates immutable and mutable layers;
  • 1: a mutable rule has ttl, max_scope, rollback_condition.
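
The last two items can be checked on a parsed constitution. The sketch below assumes you have already loaded constitution.md into a plain dictionary. The dictionary layout, the key mutable_rules, and the function name are assumptions, since this lesson does not fix a file format. The required keys come from the lesson.

```python
REQUIRED_RULE_KEYS = ("ttl", "max_scope", "rollback_condition")


def constitution_problems(constitution: dict) -> list[str]:
    """List what is missing from the immutable and mutable layers."""
    problems = []
    if len(constitution.get("immutable_principles", [])) < 2:
        problems.append("fewer than two immutable_principles")
    rules = constitution.get("mutable_rules", [])
    if not rules:
        problems.append("no mutable_rule")
    for index, rule in enumerate(rules):
        for key in REQUIRED_RULE_KEYS:
            if not rule.get(key):
                problems.append(f"mutable_rule[{index}] is missing {key}")
    if not constitution.get("governance_protocol"):
        problems.append("no governance_protocol")
    return problems


# Hypothetical content, invented only to show the shape of the data.
example = {
    "immutable_principles": [
        "Safety veto cannot be overridden by majority vote",
        "manual_review_floor is never set to zero",
    ],
    "mutable_rules": [
        {"rule": "allow automatic restart", "ttl": "14d", "max_scope": "one namespace"},
    ],
    "governance_protocol": "changes to mutable rules need a recorded judgment",
}
print(constitution_problems(example))
```

The example rule deliberately omits rollback_condition, so the check reports it as the one missing key.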

Checks and Facts — 6 points

  • 1: at least one runnable example from book2/examples/ was run;
  • 1: the result is transferred to validation.md with command and expectation;
  • 1: a negative or blocking scenario is explicitly described;
  • 1: Spec CI or its equivalent checks the link between requirement and plan;
  • 1: readiness or dry-run does not bypass blocking conditions;
  • 1: out/ is not passed off as a reviewable artifact.

Arbitration and Roles — 6 points

  • 1: judgment.md contains verdict, reason, and evidence_ref;
  • 1: Verifier/Implementor/Safety roles are not mixed, Coordinator only maintains judgment.md;
  • 1: counterexample is minimal or explicitly noted as non-minimal;
  • 1: in case of dispute there is DEFERRED or a next verifiable step;
  • 1: precedent is recorded so it can be applied again;
  • 1: Safety-veto or its equivalent cannot be overridden by majority vote.
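
The first and fifth items above can be expressed as a small record check. This is a sketch under assumptions: the class Judgment and the function to_precedent are hypothetical, while the three verdicts and the five precedent fields (case_id / verdict / evidence_ref / applies_to / next_check) come from this lesson.

```python
from dataclasses import dataclass

VERDICTS = ("APPROVE", "DENY", "DEFERRED")
PRECEDENT_FIELDS = ("case_id", "verdict", "evidence_ref", "applies_to", "next_check")


@dataclass(frozen=True)
class Judgment:
    verdict: str
    reason: str
    evidence_ref: str
    next_step: str

    def problems(self) -> list[str]:
        """Return what prevents a reviewer from replaying this decision."""
        found = []
        if self.verdict not in VERDICTS:
            found.append(f"unknown verdict: {self.verdict}")
        for name in ("reason", "evidence_ref", "next_step"):
            if not getattr(self, name).strip():
                found.append(f"empty {name}")
        return found


def to_precedent(judgment: Judgment, case_id: str, applies_to: str,
                 next_check: str) -> dict[str, str]:
    """Build the five-field precedent record for a recurring class of conflict."""
    record = {
        "case_id": case_id,
        "verdict": judgment.verdict,
        "evidence_ref": judgment.evidence_ref,
        "applies_to": applies_to,
        "next_check": next_check,
    }
    assert tuple(record) == PRECEDENT_FIELDS  # keep the field order stable
    return record
```

A DEFERRED verdict passes the check only when next_step is filled in, which matches the item that a dispute must end in DEFERRED or a next verifiable step.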

Production Constraints — 6 points

  • 1: budget scenario describes failure of the cheap tier;
  • 1: frontier-reviewer is limited by risk or quota;
  • 1: anti-Goodhart pair links KPI and guard metric;
  • 1: manual_review_floor is preserved;
  • 1: readiness score is accompanied by evidence;
  • 1: rollback or blocker is stated before admission.
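
The last two items explain why 23/25 with evidence beats 25/25 without it: the score is checked last, after blockers and evidence. A minimal sketch of such a gate follows. The class Readiness, the function admit, and the threshold parameter are hypothetical; the lesson only fixes the 25-point scale and the rule that a score without evidence_ref is not an admission.

```python
from dataclasses import dataclass, field


@dataclass
class Readiness:
    score: int
    max_score: int = 25
    evidence_refs: list[str] = field(default_factory=list)
    blockers: list[str] = field(default_factory=list)


def admit(readiness: Readiness, threshold: int) -> tuple[bool, str]:
    """Blockers and missing evidence are checked before the score is looked at."""
    if readiness.blockers:
        return False, "blocked: " + "; ".join(readiness.blockers)
    if not readiness.evidence_refs:
        return False, "blocked: score has no evidence_ref"
    if readiness.score < threshold:
        return False, (f"blocked: score {readiness.score}/{readiness.max_score} "
                       f"is below {threshold}")
    return True, f"admitted: {readiness.score}/{readiness.max_score} with evidence"
```

With any threshold, admit(Readiness(25), threshold) is rejected because no evidence is attached, while a 23/25 record with at least one evidence_ref and no blockers can pass.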

Package Clarity — 6 points

  • 1: capstone/README.md explains the case without external chat;
  • 1: command list can be repeated locally or replaced with a link to a runnable equivalent;
  • 1: blockers are separated from improvements;
  • 1: links to chapters and templates help return to the source;
  • 1: diagnostic checklist from Part 12 is completed;
  • 1: package contains no extra mechanisms unrelated to the chosen case.

25–30 points — production SDD pipeline is ready for team review.

19–24 — pipeline is usable for an educational pass, but needs stronger evidence or blockers.

Below 19 — return to the minimum scenarios of chapters 1–12 and reduce the case size.
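
The arithmetic of the scale can be written down directly. In this sketch the function name grade and the short result labels are assumptions; the five categories, the 6-point cap, and the 25 and 19 thresholds come from the criteria above.

```python
CATEGORIES = (
    "Provenance and Specification",
    "Checks and Facts",
    "Arbitration and Roles",
    "Production Constraints",
    "Package Clarity",
)
MAX_PER_CATEGORY = 6


def grade(points: dict[str, int]) -> tuple[int, str]:
    """Sum five category scores (0..6 each) and map the total to a band."""
    for name in CATEGORIES:
        value = points[name]  # raises KeyError if a category was not scored
        if not 0 <= value <= MAX_PER_CATEGORY:
            raise ValueError(f"{name}: expected 0..{MAX_PER_CATEGORY}, got {value}")
    total = sum(points[name] for name in CATEGORIES)
    if total >= 25:
        return total, "ready for team review"
    if total >= 19:
        return total, "usable for an educational pass"
    return total, "return to the minimum scenarios"
```

For example, five points in every category gives 25 and lands in the top band, while four points in four categories and two in the fifth gives 18 and falls below the educational threshold.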

What to Do After the Exam

Do not transfer the entire package into production as a whole template. Pick the two or three most useful artifacts and automate them first:

  • if requirement provenance is most often lost — start with genealogy.md;
  • if CI lets weak specifications through — start with Spec CI;
  • if disputes keep recurring — start with judgment.md and precedents.md;
  • if KPIs start lying, start with the anti-Goodhart pair from goodhart-note.md and its check in validation.md.

The main result of the second volume is not a set of terms, but the habit of demanding a verifiable trace before admitting a dangerous automatic action.
