Topic: Applied Part 2. Diagnosing Specification Defects
Difficulty level: Medium
Estimated study time: 8-12 hours (theory: 3 hours, practice: 5-9 hours)
Prerequisites: Part 7 of Volume 1: negative requirements
Part 20 of Volume 1: SDD antipatterns
Part 11 of Volume 1: introduction to the appointments-api domain and Hono JSX
Basic proficiency in YAML/JSON and JSON Schema
Experience working with Qwen Code in Plan Mode
Understanding of the SDD lifecycle: Specify → Plan → Tasks → Implement
Learning objectives: Independently introduce exactly one controlled defect into an incident triage specification, classifying it as cycle, priority_conflict, or hidden_out_of_scope
Diagnose Qwen Code's stuck point through the measurable metrics ask_storm, stage_regress, and phase_context_loss, logging the results in validation.md
Formulate a verifiable override rule that resolves a priority conflict, and translate it into Given/When/Then + JSON Schema
Perform a reverse run of the full SDD loop Specify → Plan → Tasks → Implement to verify the resilience of the fix
Create a poisoned/fixed specification pair with a single recovery line in validation.md, suitable for transfer to the capstone project
Overview: This study guide covers the engineering technique of "controlled-defect specification": a method of intentionally introducing one defect into an incident triage specification to test the diagnostic capabilities of an agent (Qwen Code) and to practice requirement recovery. The technique is related to mutation testing, but applied at the specification level rather than the code level. Key principle: one mutation, one symptom, one recovery criterion. The chapter builds on materials from Volume 1 (negative requirements and SDD antipatterns) and develops them into a controlled experiment: instead of random failures, you produce a predictable, measurable, and fixable defect. The study case is rising latency on the appointments-api route; the capstone case is high_memory_usage. The result is a working technique for reproducibly breaking a specification and restoring it as a resilient contract.
Key concepts: Controlled defect: One intentionally introduced defect in a specification, limited in class and location. Contrasted with random errors or accumulation of multiple defects. Main rule: one mutation per iteration, one expected symptom, one recovery criterion. Used to test agent diagnostic capabilities and practice recovery technique.
Poisoned specification: A study label for a specification with one controlled defect. Should not be taken literally: "poison" is not chaos, but a strictly measured mutation. Created as poisoned-spec.md and fixed in fixed-spec.md with logging in validation.md.
Poisoned/fixed pair: A pair of artifacts: poisoned-spec.md (specification with defect) and fixed-spec.md (specification after fix). The difference between them is exactly one mutation and its patch. The minimum qualifying set for capstone includes defect class, recovery line, and confirmed absence of conflict reproduction.
Defect classes — cycle: A cyclic dependency between states in a triage specification. Example: WAIT_APPROVAL → VALIDATE_ESCALATION → WAIT_APPROVAL. The agent gets stuck, unable to exit the state machine. Verified through cycle detection in the state graph (find_spec_loops.py).
Defect classes — priority conflict: Two rules with the same priority leading to mutually exclusive actions. Example: "escalate P0 in 30 seconds" (priority=100) and "any escalation requires human approval" (priority=100). The agent cannot resolve which rule to apply. Verified through check_rule_priority.py.
Defect classes — hidden out of scope: An action that a requirement forces to be performed, though it is forbidden in constraints. Example: creating a Jira ticket in an acceptance test when Jira is prohibited in constraints. The defect is "hidden" in semantics, not syntax. Requires verification through JSON Schema and acceptance tests.
Metric ask storm: Diagnostic sign: repeated clarifying requests from the agent without new data appearing. Counted as +1 for each new message requesting data already named in previous messages of the current session. Resets when a new field is added to requirements.md or clarifications.md. On the first pass it is counted manually as a heuristic; later it can be automated by parsing the transcript produced with qwen --output-format json.
Metric stage regress: Diagnostic sign: regression of the current SDD phase (specify/plan/tasks/implement) to a previous phase without explicit recording of the reason in validation.md. Counted as +1 for each rollback. Indicates specification instability or unresolved conflict in requirements.
Metric phase context loss: Diagnostic sign: loss of phase context when the agent in a new phase references a rule absent from the current requirements.md or plan.md. Boolean value: true/false. Shows that the specification does not maintain continuity between SDD phases.
Override rule: A formal strategy for resolving a priority conflict. Defines which requirement wins at the edge of time or criticality. Example: p0_time_critical_override above manual_gate_for_noncritical. Must be verifiable through JSON Schema and reflected in acceptance tests.
Single source of truth: Principle for eliminating divergence between specification text, JSON Schema, and acceptance tests. Priorities are declared once in YAML; the tests and the schema must reference that same hierarchy rather than carry a parallel interpretation.
Verification invariant: Recording of state before and after a critical transition. For escalation: before — severity, deadline, owner_state; after — channel, audit_record, reason_code. Ensures traceability and prevents a formal "resolution" of conflict with loss of control.
Reverse run of SDD loop: Full verification of Specify → Plan → Tasks → Implement after fixing a defect. Necessary to confirm that the defect did not migrate from rules into decomposition (Tasks) or from decomposition into implementation (Implement). Considered reliable only as a repeatable result: two consecutive runs without new ask_storm, stage_regress, and priority conflicts.
Recovery line in validation.md: Formal record of a fact that ceased to reproduce after the fix. Example: priority_conflict=false && escalation_path_resolved=P0 && audit_required=true. Serves as a readiness criterion and control fact for review.
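A recovery line of the form shown above is a conjunction of key=value facts, so it can be checked mechanically against the facts observed after a run. The following is a minimal sketch under stated assumptions: the function names and the observed dictionaries are hypothetical, and only &&-joined equality clauses are supported (threshold clauses such as ask_storm >= 4 are rejected).

```python
def parse_recovery_line(line: str) -> dict[str, str]:
    """Split 'a=b && c=d' into {'a': 'b', 'c': 'd'}; equality clauses only."""
    facts: dict[str, str] = {}
    for clause in line.split("&&"):
        key, sep, value = clause.strip().partition("=")
        key, value = key.strip(), value.strip()
        # Reject anything that is not a plain identifier=value pair.
        if not sep or not key.isidentifier() or not value or value.startswith("="):
            raise ValueError(f"unsupported clause: {clause.strip()!r}")
        facts[key] = value
    return facts


def recovery_line_holds(line: str, observed: dict[str, str]) -> bool:
    """True only if every fact named in the recovery line matches the observation."""
    expected = parse_recovery_line(line)
    return all(observed.get(key) == value for key, value in expected.items())


RECOVERY_LINE = (
    "priority_conflict=false && escalation_path_resolved=P0 && audit_required=true"
)

# Hypothetical observations, written by hand after a run.
observed_fixed = {
    "priority_conflict": "false",
    "escalation_path_resolved": "P0",
    "audit_required": "true",
}
observed_poisoned = {
    "priority_conflict": "true",
    "escalation_path_resolved": "false",
    "audit_required": "true",
}

if __name__ == "__main__":
    print("fixed:", recovery_line_holds(RECOVERY_LINE, observed_fixed))
    print("poisoned:", recovery_line_holds(RECOVERY_LINE, observed_poisoned))
```

Keeping the values as strings avoids guessing at types: the line in validation.md stays the only place where the expected facts are written down.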
Practice exercises: Name: Exercise 1: Creating poisoned-spec.md with a priority conflict
Problem: Take the study case appointment_latency. Create a file poisoned-spec.md (or specs/appointment-latency-poisoned.yaml) with two conflicting rules: (1) P0 escalates in 30 seconds, priority=100; (2) any escalation requires human approval, priority=100. Both rules must be formal (in YAML/JSON), not in comments. Record the expected symptom before launching the agent.
Solution: Step 1: Copy the base specification from Part 11 of Volume 1. Step 2: Add two rules with the same priority=100 leading to mutually exclusive actions. Step 3: Ensure the conflict is visible in data: run python3 scripts/spec_ci/check_rule_priority.py — it should show a collision. Step 4: In validation.md record the expected symptom: ask_storm >= 4 || stage_regress >= 2 || escalation_path_resolved=false. Step 5: Launch Qwen Code in Plan Mode without changing files, record actual metrics.
Complexity: beginner
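Step 3 of Exercise 1 expects the collision to be visible in data. The chapter does not show the internals of check_rule_priority.py, so the following is only a sketch of a comparable check. It assumes the YAML rules have already been loaded into dictionaries with the hypothetical fields id, priority, applies_to, and action; a collision is reported when two rules share a priority, overlap in scope, and demand different actions.

```python
from itertools import combinations


def find_priority_collisions(rules):
    """Return pairs of rule ids that collide on priority within an overlapping scope."""
    collisions = []
    for first, second in combinations(rules, 2):
        same_priority = first["priority"] == second["priority"]
        overlapping_scope = bool(set(first["applies_to"]) & set(second["applies_to"]))
        different_action = first["action"] != second["action"]
        if same_priority and overlapping_scope and different_action:
            collisions.append((first["id"], second["id"]))
    return collisions


# Hypothetical in-memory form of the two poisoned rules from the exercise.
poisoned_rules = [
    {"id": "p0_escalate_30s", "priority": 100,
     "applies_to": ["P0"], "action": "auto_escalate"},
    {"id": "human_approval_any_escalation", "priority": 100,
     "applies_to": ["P0", "P1", "P2", "P3"], "action": "wait_for_approval"},
]

# The same pair after an override-style fix: priorities and scopes no longer clash.
fixed_rules = [
    {"id": "p0_time_critical_override", "priority": 100,
     "applies_to": ["P0"], "action": "auto_escalate"},
    {"id": "human_gate_noncritical", "priority": 10,
     "applies_to": ["P1", "P2", "P3"], "action": "wait_for_approval"},
]

if __name__ == "__main__":
    print("poisoned:", find_priority_collisions(poisoned_rules))
    print("fixed:", find_priority_collisions(fixed_rules))
```

If a check like this reports nothing on the poisoned file, the conflict probably lives in a comment or in natural language rather than in the formal rules.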
Name: Exercise 2: Diagnosing a cycle through find_spec_loops.py
Problem: Introduce a cyclic dependency in the specification: WAIT_APPROVAL → VALIDATE_ESCALATION → WAIT_APPROVAL. Launch the agent and record how the cycle manifests. Use the project script find_spec_loops.py for visualization. Compare agent behavior under cycle and under priority_conflict: how do the ask_storm and stage_regress metrics differ?
Solution: Step 1: In poisoned-spec.md replace the linear transition with a cyclic one: VALIDATE_ESCALATION.when = 'escalation_attempted == true', then = 'goto WAIT_APPROVAL'. Step 2: Run find_spec_loops.py — get a DOT graph with a back edge. Step 3: Launch the agent in Plan Mode, record: with cycle stage_regress dominates (agent returns to the same stage), with priority_conflict — ask_storm (agent repeatedly clarifies the same thing). Step 4: In validation.md record the difference: cycle_count > 0 && stage_regress >= 3 for cycle versus ask_storm >= 4 && escalation_path_resolved=false for priority_conflict.
Complexity: intermediate
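The back edge that Exercise 2 looks for can be found with a depth-first search over the state graph. This is a sketch of the idea, not the actual find_spec_loops.py: it assumes the transitions have been extracted into a dictionary from state to next states, and the state names DETECT and ESCALATED are hypothetical additions used only to make the graph complete.

```python
def find_cycle(transitions):
    """Return one cycle as a list of states (first == last), or None if acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    color = {state: WHITE for state in transitions}
    path = []

    def visit(state):
        color[state] = GRAY
        path.append(state)
        for nxt in transitions.get(state, []):
            if color.get(nxt, WHITE) == GRAY:
                # Back edge: nxt is already on the current path.
                return path[path.index(nxt):] + [nxt]
            if color.get(nxt, WHITE) == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[state] = BLACK
        return None

    for state in list(transitions):
        if color[state] == WHITE:
            found = visit(state)
            if found:
                return found
    return None


poisoned_graph = {
    "DETECT": ["WAIT_APPROVAL"],
    "WAIT_APPROVAL": ["VALIDATE_ESCALATION"],
    "VALIDATE_ESCALATION": ["WAIT_APPROVAL"],  # the injected back edge
}
fixed_graph = {
    "DETECT": ["WAIT_APPROVAL"],
    "WAIT_APPROVAL": ["VALIDATE_ESCALATION"],
    "VALIDATE_ESCALATION": ["ESCALATED"],
}

if __name__ == "__main__":
    print("poisoned:", find_cycle(poisoned_graph))
    print("fixed:", find_cycle(fixed_graph))
```

The returned path is what would be highlighted in a DOT rendering; a result of None on the fixed graph corresponds to cycle_count = 0.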
Name: Exercise 3: Resolving a conflict through an override rule and JSON Schema
Problem: Fix priority_conflict in appointment_latency through p0_time_critical_override. The rule must: (a) permit auto-escalation for P0 when owner is unavailable; (b) preserve human_audit_required=true as post-factum; (c) leave manual barrier blocking for P1-P3. Translate the solution into JSON Schema that prevents regression to hidden manual confirmation.
Solution: Step 1: In fixed-spec.yaml create p0_time_critical_override: priority=100, when: severity=P0 && owner_unresponsive=true, then: escalation=critical_phone, human_audit_required=true. Step 2: Lower the priority of human_gate_noncritical to 10, applying it to P1-P3. Step 3: In JSON Schema add an if-then condition: if severity=P0 && owner_unresponsive=true, then required: [auto_escalation_channel, human_audit_required, reason_code], with the constants critical_phone, true, and time_critical_override respectively. Step 4: Run lint_spec.py to check rule atomicity. Step 5: Run check_rule_priority.py and confirm that it no longer reports a collision. Step 6: In validation.md record the recovery line: priority_conflict=false && escalation_path_resolved=P0 && audit_required=true.
Complexity: intermediate
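The if-then condition from Step 3 of Exercise 3 can be written out as a schema document. The sketch below holds the schema as a Python dictionary and checks it with a deliberately tiny evaluator that understands only required and properties with const; it is an illustration of the conditional, not a JSON Schema implementation, and a full validator library would replace the helper in a real project. The top-level required list is an assumption.

```python
ESCALATION_SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "required": ["severity", "owner_unresponsive"],  # assumption for this sketch
    "if": {
        "properties": {
            "severity": {"const": "P0"},
            "owner_unresponsive": {"const": True},
        },
        "required": ["severity", "owner_unresponsive"],
    },
    "then": {
        "required": ["auto_escalation_channel", "human_audit_required", "reason_code"],
        "properties": {
            "auto_escalation_channel": {"const": "critical_phone"},
            "human_audit_required": {"const": True},
            "reason_code": {"const": "time_critical_override"},
        },
    },
}


def matches(subschema, instance):
    """Evaluate only 'required' and 'properties/const' (a tiny subset of JSON Schema)."""
    for key in subschema.get("required", []):
        if key not in instance:
            return False
    for key, rule in subschema.get("properties", {}).items():
        if key in instance and "const" in rule and instance[key] != rule["const"]:
            return False
    return True


def is_valid(schema, instance):
    """Apply top-level 'required', then the if/then pair when the 'if' part matches."""
    if not matches({"required": schema.get("required", [])}, instance):
        return False
    if "if" in schema and matches(schema["if"], instance):
        return matches(schema.get("then", {}), instance)
    return True


p0_ok = {
    "severity": "P0", "owner_unresponsive": True,
    "auto_escalation_channel": "critical_phone",
    "human_audit_required": True, "reason_code": "time_critical_override",
}
# Regression to a hidden manual confirmation: the audit fields are simply missing.
p0_regressed = {"severity": "P0", "owner_unresponsive": True}
p2_manual = {"severity": "P2", "owner_unresponsive": False}

if __name__ == "__main__":
    for name, record in [("p0_ok", p0_ok), ("p0_regressed", p0_regressed), ("p2_manual", p2_manual)]:
        print(name, is_valid(ESCALATION_SCHEMA, record))
```

The point of the conditional is that a P0 record without the audit fields is invalid by construction, while P1-P3 records are left untouched by this branch.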
Name: Exercise 4: Full reverse run and transfer to capstone
Problem: Perform a reverse run of Specify → Plan → Tasks → Implement for the fixed specification. Record that no phase returns the original conflict. Transfer the result to capstone/ for the high_memory_usage case: adapt priority_conflict class to the conflict between restart_pod and human approval.
Solution: Step 1: Launch Qwen Code phase by phase: specify (requirements.md), plan (plan.md), tasks (tasks.md), implement. After each phase record the metrics in validation.md. Step 2: If Tasks produces incompatible actions, the defect has migrated into decomposition: redo the plan. If Implement passes but tests fail, the boundary of acceptable behavior is incompletely described. Step 3: For capstone/high_memory_usage set defect_class=priority_conflict. Poisoned version: memory_percent>=90 permits restart_pod, but any restart_pod requires human approval with the same priority. Step 4: Fixed version: restart_pod is permitted as a pre-approved action for a stateless pod, and the first production launch requires human_review_for_first_run=true. Step 5: Recovery line in validation.md: priority_conflict=false && action=restart_pod && human_review_for_first_run=true.
Complexity: advanced
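The reverse run in Exercise 4 counts as reliable only when it repeats, so the pass condition can be stated as a small gate over per-phase metrics. This is a sketch with hypothetical names: each run is assumed to be a dictionary from phase name to the metrics recorded in validation.md after that phase.

```python
PHASES = ("specify", "plan", "tasks", "implement")


def phase_is_clean(metrics):
    """A phase is clean when none of the tracked symptoms appeared."""
    return (
        metrics["ask_storm"] == 0
        and metrics["stage_regress"] == 0
        and not metrics["priority_conflict"]
    )


def first_dirty_phase(run):
    """Return the first phase that is missing or shows a symptom, else None."""
    for phase in PHASES:
        if phase not in run or not phase_is_clean(run[phase]):
            return phase
    return None


def reverse_run_passed(runs, required_consecutive=2):
    """Pass only if the last N runs are clean in every phase."""
    if len(runs) < required_consecutive:
        return False
    return all(first_dirty_phase(run) is None for run in runs[-required_consecutive:])


CLEAN = {"ask_storm": 0, "stage_regress": 0, "priority_conflict": False}
clean_run = {phase: dict(CLEAN) for phase in PHASES}
# The defect migrated into decomposition: Specify and Plan are clean, Tasks is not.
migrated_run = {
    **clean_run,
    "tasks": {"ask_storm": 0, "stage_regress": 1, "priority_conflict": False},
}

if __name__ == "__main__":
    print("dirty phase:", first_dirty_phase(migrated_run))
    print("one clean run:", reverse_run_passed([clean_run]))
    print("two clean runs:", reverse_run_passed([migrated_run, clean_run, clean_run]))
```

Reporting the first dirty phase, rather than a single pass or fail flag, is what makes defect migration visible at the boundary where it happened.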
Name: Exercise 5: Hidden out of scope (hidden_out_of_scope)
Problem: Create a specification where a requirement forces creation of a Jira ticket, though constraints prohibit Jira. The defect must be "hidden" — syntactically the specification is correct, semantically it violates constraints. Diagnose through phase_context_loss.
Solution: Step 1: In constraints.md add forbidden_integrations: [jira]. Step 2: In poisoned-spec.md add the rule: when: severity=P2, then: create_ticket_in=jira. Step 3: Launch the agent. It may formally continue, but in the Implement phase it then references jira, which is not in the permitted list. Step 4: Record phase_context_loss=true: in Implement the agent references a rule absent from the current constraints. Step 5: Fix: replace jira with a permitted channel (e.g., internal_tracker) or add an explicit exception to constraints. Step 6: Verify through JSON Schema: declare forbidden_integrations as an enum with a controlled list, and validate the value used in then against that list.
Complexity: advanced
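Because the defect in Exercise 5 is semantic rather than syntactic, it only shows up when rule actions are cross-checked against constraints. The sketch below assumes both files have been loaded into dictionaries; the rule id p2_ticket and the function name are hypothetical.

```python
def find_out_of_scope_actions(rules, forbidden_integrations):
    """Return (rule_id, action_key, value) for every action that targets a forbidden integration."""
    forbidden = set(forbidden_integrations)
    violations = []
    for rule in rules:
        for key, value in rule["then"].items():
            # Only string targets can name an integration; skip flags and numbers.
            if isinstance(value, str) and value in forbidden:
                violations.append((rule["id"], key, value))
    return violations


constraints = {"forbidden_integrations": ["jira"]}

poisoned_rules = [
    {"id": "p2_ticket", "when": {"severity": "P2"},
     "then": {"create_ticket_in": "jira"}},
]
fixed_rules = [
    {"id": "p2_ticket", "when": {"severity": "P2"},
     "then": {"create_ticket_in": "internal_tracker"}},
]

if __name__ == "__main__":
    forbidden = constraints["forbidden_integrations"]
    print("poisoned:", find_out_of_scope_actions(poisoned_rules, forbidden))
    print("fixed:", find_out_of_scope_actions(fixed_rules, forbidden))
```

Both rule sets are syntactically valid; only the comparison against constraints separates them, which is why this defect class is called hidden.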
Case studies: Name: Case: Rising latency of appointments-api and diagnosing priority_conflict in a production team
Scenario: Online doctor appointment platform (domain from Part 11 of Volume 1). The appointments-api route showed p95 latency growth to 2.5 seconds. The SRE team received an alert severity=P0. The triage specification required: (1) escalate P0 in 30 seconds; (2) any escalation requires human approval. Both rules had priority=100. The on-call engineer was on a plane, confirmation impossible. Qwen Code, used as an incident planner, got stuck in a loop of clarifications.
Challenge: The agent asked "clarify if owner is available" 12 times, though owner_unresponsive=true was stated in the first message (ask_storm=12). Then the agent returned from the Plan phase to Specify, proposing to "reconsider requirements" (stage_regress=2). Escalation took 4 minutes instead of the required 30 seconds. The team escalated manually by phone, missing the SLA and losing the audit trail.
Solution: After the incident the team applied the controlled-defect specification technique. Created poisoned-spec.md with the reproduced conflict. In fixed-spec.md introduced p0_time_critical_override: for P0 with owner_unresponsive=true escalation is permitted immediately via critical_phone, human_audit_required moved to post-factum audit. For P1-P3 the manual barrier was kept. JSON Schema enforced required fields auto_escalation_channel, human_audit_required, reason_code. Performed reverse run: Specify → Plan → Tasks → Implement — metrics ask_storm=0, stage_regress=0, phase_context_loss=false.
Result: P0 reaction time decreased from 4 minutes to 23 seconds. Audit trail restored: all auto-escalations have reason_code=time_critical_override. Specification became repeatably testable: the team monthly runs the poisoned/fixed pair in study mode. Method transferred to capstone project high_memory_usage with adaptation to restart_pod vs human approval.
Lessons learned: A priority conflict with identical priority=100 does not manifest during normal operation, but becomes fatal in a stress scenario (owner_unresponsive)
Metrics ask_storm and stage_regress allow quantitative distinction between a specification defect and "bad agent behavior"
An override rule must change a verifiable requirement, not just a textual explanation — otherwise the agent will revert to the old interpretation
Reverse run of the full loop is necessary: a local fix in requirements.md may migrate the defect into plan.md or tasks.md
Using JSON Schema as the "single source of truth" prevents adjacent agent steps from settling on hidden, unrecorded interpretations of the rules
Related concepts: priority_conflict
ask_storm
stage_regress
p0_time_critical_override
reverse run of SDD loop
single source of truth
recovery line in validation.md
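The override from this case can be read as a rule resolver plus Given/When/Then checks. The following is a minimal sketch, not the team's implementation: the rule structure, the resolve function, and the wait_for_approval action name are assumptions, and the P0 path with a responsive owner is left unresolved because the case text does not describe it.

```python
RULES = [
    {
        "id": "p0_time_critical_override",
        "priority": 100,
        "when": lambda inc: inc["severity"] == "P0" and inc["owner_unresponsive"],
        "then": {
            "escalation": "critical_phone",
            "human_audit_required": True,
            "reason_code": "time_critical_override",
        },
    },
    {
        "id": "human_gate_noncritical",
        "priority": 10,
        "when": lambda inc: inc["severity"] in {"P1", "P2", "P3"},
        "then": {"escalation": "wait_for_approval"},  # hypothetical action name
    },
]


def resolve(incident, rules=RULES):
    """Pick the single highest-priority matching rule; a tie is a priority_conflict."""
    matching = [rule for rule in rules if rule["when"](incident)]
    if not matching:
        return None  # path not covered by the case description
    top = max(rule["priority"] for rule in matching)
    winners = [rule for rule in matching if rule["priority"] == top]
    if len(winners) > 1:
        raise ValueError("priority_conflict: " + ", ".join(r["id"] for r in winners))
    return winners[0]["then"]


if __name__ == "__main__":
    # Given a P0 incident whose owner is unresponsive
    incident = {"severity": "P0", "owner_unresponsive": True}
    # When the triage rules are resolved
    outcome = resolve(incident)
    # Then escalation goes out immediately and the audit is preserved post factum
    print(outcome)
```

Raising on a tie instead of picking arbitrarily keeps the poisoned behavior observable: two matching rules at priority 100 would fail loudly here rather than produce a silent choice.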
Name: Case: Adapting the technique to high_memory_usage in a capstone project
Scenario: Student graduation project for the SDD course. Domain: cloud infrastructure, alert high_memory_usage (memory_percent >= 90 for 10 minutes). The student needed to demonstrate mastery of specification defect diagnosis on a new domain, preserving the defect class from the study case.
Challenge: Direct copying of appointment_latency is inapplicable: different entities (pod, restart, memory), different risks (stateful vs stateless, data loss on restart). Needed to adapt priority_conflict while preserving structure: two rules with the same priority leading to mutually exclusive actions, and formal resolution through override.
Solution: Student created poisoned-spec.md: memory_percent>=90 permits restart_pod, but any restart_pod requires human approval, both with priority=100. In fixed-spec.md introduced stateless/stateful distinction: restart_pod — pre-approved action for stateless pod, for first production launch — human_review_for_first_run=true. This preserved audit but unblocked auto-action for a known configuration. JSON Schema required fields pod_type, first_run_flag, review_record. Recovery line: priority_conflict=false && action=restart_pod && human_review_for_first_run=true.
Result: Reviewer confirmed a pass: the defect class is identified, the patch changes a verifiable rule, and the reverse run is clean. The student noted that the main difficulty is not introducing the defect but limiting it to exactly one: the first attempts also included a cycle in the pod state graph, which made the symptoms of the two defects impossible to tell apart.
Lessons learned: Transferring a defect class to a new domain requires rethinking entities, but preserving conflict structure
The "one defect per iteration" constraint is the hardest rule in practice; natural tendency to "strengthen" the study case leads to an indistinguishable trace
human_review_for_first_run=true is an example of an override rule that scales: it works for the memory case, the appointment case, and new domains
Capstone review values not artifact volume, but precision: exactly one defect, one recovery line, repeatable result
Related concepts: poisoned/fixed pair
priority_conflict
human_review_for_first_run
stateless vs stateful
recovery line
capstone transfer
Study tips: Start with manual metric counting (pencil in validation.md), not automation. Understanding the heuristic is more important than the script: you must feel when ask_storm is a specification defect, and when it is incompleteness of external world context
Create poisoned-spec.md before launching the agent and record the expected symptom in advance. This prevents post-factum rationalization: "aha, it was supposed to get stuck" — such logic deprives the technique of controllability
Use the study minimum: appointment_latency for the first pass, high_memory_usage only for capstone. Do not try to master both domains in parallel — difference in entities (latency vs memory, escalation vs restart) distracts from learning the technique itself
Verify the defect through project scripts (find_spec_loops.py, check_rule_priority.py) before launching Qwen Code. If the script doesn't catch the collision — the defect is hidden in a comment or natural language, not in the formal specification
During reverse run record metrics after each phase, not at the end. The defect may migrate: Specify clean → Plan clean → Tasks create incompatible actions. Telltale sign: stage_regress at the Plan→Tasks boundary
Practice translation into Given/When/Then and JSON Schema in parallel, not sequentially. They close different sides of the problem: GWT — verifiable scenario, JSON Schema — prohibition of invalid states. One without the other leaves a loophole for the agent
For classroom learning: split into pairs — one creates poisoned-spec, the other diagnoses without hints. Switch roles after debrief. This simulates a real situation where the specification author is "blind" to their own defect
Keep a "mutation journal": record all attempts at introducing defects, even failed ones. Common error — "the agent didn't get stuck, though it should have." This is not failure, but data: perhaps the defect was in a comment, or priorities differed, or the agent used implicit knowledge. The journal helps calibrate intuition
Don't postpone metrics ask_storm, stage_regress, phase_context_loss "for later." On the first pass they seem vague, but they are what transforms "strange agent response" into a diagnostic artifact. Practice on short sessions (3-5 messages)
For self-check before exam: ensure that the diff between poisoned and fixed touches exactly one verifiable rule. If the diff is only in comments or textual explanations, the fix does not count
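The manual counting heuristics for ask_storm and stage_regress recommended in the tips above can be written down as two short functions, which helps when checking a hand count. This is a sketch under stated assumptions: the event and phase-log shapes are hypothetical simplifications for study sessions and do not reflect the actual structure of qwen --output-format json.

```python
PHASE_ORDER = {"specify": 0, "plan": 1, "tasks": 2, "implement": 3}


def count_ask_storm(events):
    """+1 for each ask that requests a field already named earlier in the session.

    Event shapes (hypothetical):
      {"type": "provide", "fields": [...]}   data stated by the user
      {"type": "ask", "fields": [...]}       clarifying request from the agent
      {"type": "spec_field_added"}           new field in requirements.md or clarifications.md
    """
    named = set()
    storm = 0
    for event in events:
        if event["type"] == "provide":
            named.update(event["fields"])
        elif event["type"] == "ask":
            if named & set(event["fields"]):
                storm += 1
            named.update(event["fields"])
        elif event["type"] == "spec_field_added":
            storm = 0  # the counter resets when the specification gains a field
    return storm


def count_stage_regress(phase_log):
    """+1 for each move to an earlier phase with no reason recorded in validation.md.

    phase_log is a chronological list of (phase, reason_recorded) tuples.
    """
    regress = 0
    for (previous, _), (current, reason_recorded) in zip(phase_log, phase_log[1:]):
        if PHASE_ORDER[current] < PHASE_ORDER[previous] and not reason_recorded:
            regress += 1
    return regress


session = [
    {"type": "provide", "fields": ["owner_unresponsive"]},
    {"type": "ask", "fields": ["owner_unresponsive"]},
    {"type": "ask", "fields": ["owner_unresponsive"]},
    {"type": "ask", "fields": ["deadline"]},
    {"type": "ask", "fields": ["deadline"]},
]
phases = [("specify", False), ("plan", False), ("specify", False), ("plan", False)]

if __name__ == "__main__":
    print("ask_storm:", count_ask_storm(session))
    print("stage_regress:", count_stage_regress(phases))
```

A first request for a field that nobody has named yet does not count, which is the distinction between a specification defect and genuinely missing external context.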
Additional resources: Part 7 of Volume 1 (negative requirements): Basic theory underlying defect diagnosis. Necessary for understanding why a priority conflict is a negative requirement that escaped control
Part 20 of Volume 1 (SDD antipatterns): Catalog of classic specification errors, serving as a source of mutations. Cycle, priority_conflict, hidden_out_of_scope — derivatives of these antipatterns
Part 11 of Volume 1 (second phase project): Original domain appointments-api, Hono JSX agents page. Context for the study case appointment_latency
Examples/spec-ci/readme.md: Runnable analog of the basic specification gateway (Spec CI), close to the one used in the chapter. Automatically checks the form of requirements and plans
Book2/examples/templates/validation.md: Template form for recording verification. Used for logging expected symptom and recovery line
GitHub Spec Kit quickstart (https://github.github.io/spec-kit/quickstart.html): Official description of the phases Specify → Plan → Tasks → Implement. The chapter's external reference for the SDD approach
[project script] find_spec_loops.py: Script for finding cycles in the specification state graph. Generates a DOT graph for visualization
[project script] check_rule_priority.py: Script for checking priority collisions in a YAML specification. Catches priority_conflict by matching numerical priorities
[project script] lint_spec.py: Script for checking specification atomicity. Used during the reverse run to validate the fixed version
qwen --output-format json + aggregator script: Path to automating the metrics ask_storm, stage_regress, and phase_context_loss. On the study pass it serves as a transcript parser; at the advanced level, as a CI metric
Summary: The key achievement of this chapter is transforming specification from a set of wishes into a resilient, reproducible contract. The controlled-defect specification technique (poisoned/fixed pair) provides four tools: (1) one defect per iteration with a measurable symptom; (2) stuck-point diagnosis through ask_storm, stage_regress, phase_context_loss; (3) formal conflict resolution through override rules, Given/When/Then, and JSON Schema; (4) reverse run of the full SDD loop to verify resilience. Study minimum — poisoned-spec.md, fixed-spec.md, and recovery line in validation.md for appointment_latency; capstone transfer — the same priority_conflict class to high_memory_usage. The next chapter will formalize these rules in constitution.md as the first project referendum.