Applied Part 2. Diagnosing Specification Defects
Status: Recommendation. Injecting one controlled defect into the specification is a training technique close to mutation testing. Specific defect classes (cycle, priority_conflict, hidden_out_of_scope) are used in projects but are not standardized. Stuckness metrics (ask_storm, stage_regress) are frontier practice.
The engineering name for the technique is controlled-defect specification: you intentionally introduce one defect to test diagnosis. The text sometimes uses the short label "poisoned specification," but it should not obscure the main rule: one mutation, one symptom, one recovery criterion.
This chapter continues two basic ideas from the first volume: negative requirements from Part 7 and antipatterns from Part 20. The difference is that now the defect is introduced intentionally and bounded in advance. Do not try to test the entire triage process here: the training minimum is a poisoned/fixed pair and one recovery line in validation.md.
Before Reading
- Foundation from the first volume: Part 7 provides negative requirements, Part 20 provides SDD antipatterns.
- Local training case: appointment_latency, because a priority conflict is visible without external infrastructure.
- Trail for capstone/: a poisoned/fixed pair for high_memory_usage and one recovery line in validation.md.
- Key term for the first pass: controlled defect.
- What to defer: the metrics ask_storm and stage_regress, the full reverse run, and automatic cycle detection.
The boundary with neighboring techniques is simple. This chapter covers one manually introduced defect and one stuckness symptom. Chapter 4 covers one minimal counterexample to a formal Then. Chapter 5 covers many deterministic mutants for testing the validator. Chapter 8 covers the official protocol of dispute, evidence, and precedent.
The chapter scenario is rising latency on the appointments-api route, the agent page built with Hono JSX that first appeared in Part 11 of the first volume: the same domain, but under stress. The catalog of classic errors that the mutations build on is in Part 20, SDD Antipatterns.
Goal
After this chapter you will be able to intentionally break the incident triage specification, identify where Qwen Code gets stuck, and bring the specification to a stable, reproducible state.
The training value is not in getting perfect triage right away. The goal is to learn to produce a controlled failure, read its traces, and fix the root cause in the requirements. The result is a working technique:
- one defect per iteration;
- measurable deadlock diagnosis;
- formal resolution of contradictions;
- a reverse run of the full SDD loop Specify → Plan → Tasks → Implement.
Minimal Training Scenario
Training Case
Incident appointment_latency: the specification simultaneously requires "escalate P0 within 30 seconds" and "wait for manual confirmation before any escalation." You need to capture one priority conflict and fix it with an exception rule.
Preparation
- book2/examples/templates/validation.md: the form for recording the check.
- Two short files or sections: poisoned-spec.md and fixed-spec.md.
- One expected symptom: ask_storm, stage_regress, or phase_context_loss.
Minimal poisoned/fixed pair for the first pass:
```text
poisoned:
REQ-LAT-01: latency_p95 >= 2s and severity=P0 require escalation within 30 seconds. priority=100
REQ-LAT-02: any escalation requires prior human approval. priority=100

fixed:
REQ-LAT-01: for severity=P0, p0_time_critical_override applies.
REQ-LAT-02: under p0_time_critical_override, escalation is permitted immediately, but human_audit_required=true.
REQ-LAT-03: for P1-P3, prior human approval remains blocking.
```
These lines can go into the training files poisoned-spec.md and fixed-spec.md for the local case appointment_latency. If the final assessment uses high_memory_usage, transfer to capstone/ only the defect class and the recovery line from the block below. Change only one defect at a time: here it is a priority conflict.
Steps
- In poisoned-spec.md, write two conflicting rules with the same priority. Expectation: the defect is visible in the data, not hidden in a comment.
- Before running the analysis, record the expected symptom, for example priority_conflict=true && escalation_path_resolved=false.
- Perform a manual review or a Plan Mode query to Qwen Code without changing files. Expectation: the model points out the conflict or loses track exactly at the disputed point.
- In fixed-spec.md, add p0_time_critical_override and move manual verification to a post-factum audit.
- In validation.md, record two facts: the original conflict was found, and the fixed path preserves human_audit_required=true.
- If you want to check the form of the requirements and the plan automatically, compare the result with the runnable Spec CI analog in [examples/spec-ci/](examples/spec-ci/README.md).
Control Fact
The fix changes the verifiable rule, not just the explanation. In validation.md there is a recovery line: priority_conflict=false && escalation_path_resolved=P0 && audit_required=true.
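Because a recovery line is just a boolean expression over recorded facts, it can be checked mechanically rather than by eye. The following is a minimal sketch, not part of the book's tooling. evaluate_line is a hypothetical helper, and it assumes the syntax used in this chapter's control and recovery lines: terms joined by && and ||, each term written as name OP value with OP one of =, ==, >=, <=, >, <, and && binding tighter than ||. It evaluates such a line against a dict of facts you recorded in validation.md.

```python
import re

# Hypothetical helper for the chapter's control/recovery line syntax.
# Assumed grammar: "||"-separated groups of "&&"-joined terms "name OP value".
_TERM = re.compile(r"^\s*(\w+)\s*(>=|<=|==|=|>|<)\s*(\S+)\s*$")


def _coerce(raw):
    """Turn the right-hand literal into bool, int, or str."""
    if raw in ("true", "false"):
        return raw == "true"
    try:
        return int(raw)
    except ValueError:
        return raw


def _check(term, facts):
    match = _TERM.match(term)
    if match is None:
        raise ValueError(f"cannot parse term: {term!r}")
    name, op, raw = match.groups()
    actual, expected = facts[name], _coerce(raw)  # KeyError = fact not recorded
    if op in ("=", "=="):
        return actual == expected
    if op == ">=":
        return actual >= expected
    if op == "<=":
        return actual <= expected
    if op == ">":
        return actual > expected
    return actual < expected


def evaluate_line(line, facts):
    """True if at least one '||' group has all of its '&&' terms satisfied."""
    return any(
        all(_check(term, facts) for term in group.split("&&"))
        for group in line.split("||")
    )


recovery = "priority_conflict=false && escalation_path_resolved=P0 && audit_required=true"
facts = {"priority_conflict": False, "escalation_path_resolved": "P0", "audit_required": True}
print(evaluate_line(recovery, facts))  # True
```

The same helper reads the stuckness control lines too, for example ask_storm >= 4 || stage_regress >= 2 || phase_context_loss=true. A KeyError means validation.md is missing a fact the line depends on.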
How This Goes Into capstone/
Transfer exactly one defect to capstone/poisoned-spec.md and exactly one fix to capstone/fixed-spec.md. Add the recovery line to capstone/validation.md. Do not transfer a long Plan Mode trace: what matters for assessment is the defect class, the patch, and the fact that the conflict no longer reproduces.
Minimal fragment, transferring the same priority_conflict class from appointment_latency to the main assessment case high_memory_usage, where the restart_pod permission and the human approval requirement conflict at the same priority:
- defect_class: priority_conflict
- poisoned: memory_percent >= 90 for 10 minutes permits restart_pod, but any restart_pod requires prior human approval with the same priority.
- fixed: restart_pod is permitted as a pre-approved action only for stateless pods, and the first production run requires human_review_for_first_run=true.
- validation: priority_conflict=false && action=restart_pod && human_review_for_first_run=true
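Before the fixed rule for high_memory_usage goes into capstone/, it can be pinned down as a small decision function. This is an illustrative sketch under stated assumptions. MemoryIncident and decide are hypothetical names. The threshold is read as memory_percent >= 90 sustained for at least 10 minutes. A stateful pod is assumed to fall back to the blocking human approval gate, because the fixed rule only pre-approves stateless pods.

```python
from dataclasses import dataclass


@dataclass
class MemoryIncident:
    memory_percent: float
    minutes_above_threshold: int
    stateless: bool
    first_production_run: bool


def decide(incident: MemoryIncident) -> dict:
    """Action record produced by the fixed rule (illustrative sketch)."""
    triggered = incident.memory_percent >= 90 and incident.minutes_above_threshold >= 10
    if not triggered:
        return {"action": "none"}
    if not incident.stateless:
        # Assumption: outside the pre-approved scope, the blocking human gate applies.
        return {"action": "request_human_approval"}
    return {
        "action": "restart_pod",
        "pre_approved": True,
        # The first production run is flagged for human review.
        "human_review_for_first_run": incident.first_production_run,
    }


print(decide(MemoryIncident(95, 12, stateless=True, first_production_run=True)))
```

Running it over a stateless and a stateful incident gives concrete values for the validation line (action=restart_pod, human_review_for_first_run=true) instead of a prose claim.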
Reviewable Trail
In the training package, keep the poisoned-spec.md / fixed-spec.md pair and the record in validation.md. out/* outputs are not needed if they were obtained only by a local project script.
Key Ideas
Introduce exactly one defect type per iteration. Here, a "defect" means one of three controlled specification mutations:
- cycle: a cyclic dependency between states (for example, WAIT_APPROVAL → VALIDATE_ESCALATION → WAIT_APPROVAL);
- priority conflict: two rules with the same priority that lead to mutually exclusive actions (say, "escalate P0 within 30 seconds" and "wait for manual confirmation");
- hidden out-of-scope: an action that a requirement forces, although constraints forbid it (for example, a Jira ticket in an acceptance test when Jira is prohibited in constraints).
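Of the three defect classes, hidden out-of-scope is the easiest to detect statically, provided that requirements and acceptance tests declare the integrations they touch. The sketch below assumes a hypothetical uses_integrations field on each requirement and a forbidden_integrations list in constraints (the name used later in this chapter). Neither is a fixed format, and the AT-LAT-02 id is invented for illustration.

```python
def find_hidden_out_of_scope(requirements, constraints):
    """Return (requirement_id, integration) pairs that use a forbidden integration."""
    forbidden = set(constraints.get("forbidden_integrations", []))
    return [
        (req["id"], integration)
        for req in requirements
        for integration in req.get("uses_integrations", [])
        if integration in forbidden
    ]


requirements = [
    {"id": "REQ-LAT-01", "uses_integrations": ["critical_phone"]},
    # Hypothetical acceptance test that quietly files a Jira ticket.
    {"id": "AT-LAT-02", "uses_integrations": ["jira"]},
]
constraints = {"forbidden_integrations": ["jira"]}
print(find_hidden_out_of_scope(requirements, constraints))  # [('AT-LAT-02', 'jira')]
```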
If you simultaneously add a recursive dependency, a disputed escalation rule, and a forbidden integration, Qwen Code's trace shows only general chaos, and you cannot tell which element broke the behavior.
Keep the mutation's radius minimal: one changed fragment of the specification, one expected symptom, and one recovery criterion.
Localize where the model gets stuck using chat metrics, not the impression of a "weird" response. We introduce three diagnostic signs:
- ask_storm: repeated clarifying requests without new data appearing;
- stage_regress: a return to the same task or stage;
- phase_context_loss: loss of phase context, for example mixing Plan and Implement.
These signs are especially useful when Qwen Code formally keeps responding but does not actually advance the solution: it asks the owner again, rebuilds the same plan, or proposes a tool that the specification did not permit. A practical control line can look like this: ask_storm >= 4 || stage_regress >= 2 || phase_context_loss=true. Once it triggers, dissect the session as a diagnostic artifact, not as a failed dialogue.
> How to count these metrics on a training pass. These are heuristics, not CI metrics: on the first pass, a pencil mark in validation.md is enough.
>
> - ask_storm: each new agent message that requests data already named in previous messages of the current session counts as +1. The counter resets when you add at least one new field to requirements.md or clarifications.md.
> - stage_regress: a return of the current SDD phase (specify/plan/tasks/implement) to a previous one without an explicit reason recorded in validation.md. Counts as +1 per rollback.
> - phase_context_loss: true exactly when the agent in a new phase references a rule absent from the current requirements.md or plan.md.
>
> For full tracking, these counters are automated by parsing the Qwen Code session transcript (qwen --output-format json plus an aggregator script). The training minimum counts them by eye during the session.
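The counting heuristics translate directly into a small aggregator. This sketch does not parse real Qwen Code output. Because the exact JSON emitted by qwen --output-format json is not described here, it assumes the transcript has already been reduced to a simplified, hypothetical event list with the kinds ask, new_field, phase, and rule_ref. It then applies the control line ask_storm >= 4 || stage_regress >= 2 || phase_context_loss=true.

```python
PHASES = ["specify", "plan", "tasks", "implement"]


def count_stuckness(events, known_rules):
    """Aggregate the three stuckness signs over a simplified, hypothetical
    event list (not the real Qwen Code JSON schema)."""
    ask_storm, stage_regress, phase_context_loss = 0, 0, False
    named_fields = set()
    current = 0  # index into PHASES; a session starts at specify
    for event in events:
        kind = event["kind"]
        if kind == "ask":
            fields = set(event["fields"])
            if fields and fields <= named_fields:
                ask_storm += 1  # asks again for data already named
            named_fields |= fields
        elif kind == "new_field":
            ask_storm = 0  # owner added a field to requirements.md / clarifications.md
        elif kind == "phase":
            target = PHASES.index(event["phase"])
            if target < current and not event.get("reason"):
                stage_regress += 1  # rollback without a recorded reason
            current = target
        elif kind == "rule_ref" and event["rule"] not in known_rules:
            phase_context_loss = True  # rule absent from requirements.md / plan.md
    stuck = ask_storm >= 4 or stage_regress >= 2 or phase_context_loss
    return {"ask_storm": ask_storm, "stage_regress": stage_regress,
            "phase_context_loss": phase_context_loss, "stuck": stuck}


events = [{"kind": "phase", "phase": "plan"}] + [{"kind": "ask", "fields": ["owner_state"]}] * 5
print(count_stuckness(events, {"REQ-LAT-01"}))
```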
Make the defect explicit as conflicting requirements with priorities, not as a comment in YAML. Compare two approaches.
Bad:
```yaml
# TODO: P0s should escalate within 30s, but human approval is mandatory;
# unclear what wins, will figure out later.
rules:
  - id: escalate_p0
    when: severity == "P0"
    then: { escalation: critical_phone }
```
Problem: the defect sits in a comment. Neither a linter nor JSON Schema checks it, and Qwen Code may read # TODO but is not obliged to treat a comment as an executable contract. The conflict therefore stays outside formal verification.
Good:
```yaml
rules:
  - id: escalate_p0
    when: severity == "P0"
    then: { escalation: critical_phone }
    priority: 100
  - id: human_approval_required
    when: severity == "P0"
    then: { require_human_approval: true }
    priority: 100  # intentional conflict at same priority
```
Now check_rule_priority.py (a [project script], described below) catches the collision by priority instead of relying on human memory.
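The core of a check_rule_priority.py-style check fits in a few lines. This is a hypothetical implementation, not the book's script. It takes rules that have already been parsed from YAML (for example with PyYAML's yaml.safe_load), compares when conditions as literal strings, and relies on an assumed table of mutually exclusive effects. A real check would also need to reason about overlapping conditions, not only identical ones.

```python
from itertools import combinations

# Assumption: pairs of effects that cannot both hold before escalation starts.
EXCLUSIVE_EFFECTS = {frozenset({"escalation", "require_human_approval"})}


def find_priority_conflicts(rules):
    """Return (id, id) pairs of rules with the same condition and priority
    whose effects are declared mutually exclusive."""
    conflicts = []
    for a, b in combinations(rules, 2):
        if a["when"] != b["when"] or a.get("priority") != b.get("priority"):
            continue
        if any(
            frozenset({key_a, key_b}) in EXCLUSIVE_EFFECTS
            for key_a in a["then"]
            for key_b in b["then"]
        ):
            conflicts.append((a["id"], b["id"]))
    return conflicts


# The "Good" rules above, as they would look after yaml.safe_load.
rules = [
    {"id": "escalate_p0", "when": 'severity == "P0"',
     "then": {"escalation": "critical_phone"}, "priority": 100},
    {"id": "human_approval_required", "when": 'severity == "P0"',
     "then": {"require_human_approval": True}, "priority": 100},
]
print(find_priority_conflicts(rules))  # [('escalate_p0', 'human_approval_required')]
```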
Translate disputed requirements into Given/When/Then and JSON Schema. Natural language conveys intent well but is poor at holding the boundaries of permissible behavior. The formulation "for critical incidents, fast escalation is needed" leaves the model room to guess. The scenario Given severity=P0 and owner_unresponsive=true / When escalation_deadline expires / Then use critical_phone and record human_audit_required defines a verifiable branch.
JSON Schema closes the second half of the problem: it does not just describe the desired path but forbids impermissible states, for example a missing auto_escalation_channel at P0 or the use of an integration from the forbidden_integrations list. This pairing matches the SDD approach, in which the specification carries success criteria, constraints, and verifiable acceptance tests through the full development cycle. The GitHub Spec Kit Quickstart describes these phases as the sequence Specify → Plan → Tasks → Implement.
Resolve the conflict with a formal strategy made of three parts:
- an exception rule (override) defines which requirement wins at the time-critical boundary (for example, time_critical_override ranks above manual_gate_for_noncritical);
- a single source of truth eliminates divergence between specification text, schema, and tests: if priorities are declared in YAML, reference the same hierarchy from the acceptance tests and the JSON Schema instead of introducing a parallel interpretation;
- a verification invariant secures the transition: before escalation, record severity, deadline, and owner_state; after escalation, record channel, audit_record, and reason_code. Otherwise the system may formally "resolve" the conflict but lose traceability.
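The verification invariant is easy to enforce as a guard on the escalation record. This is a minimal sketch that assumes before and after are plain dicts and uses the field names listed above. check_escalation_invariant is a hypothetical name, and the AUD-1 value is a placeholder.

```python
BEFORE_FIELDS = ("severity", "deadline", "owner_state")
AFTER_FIELDS = ("channel", "audit_record", "reason_code")


def check_escalation_invariant(before: dict, after: dict) -> list:
    """Return the fields missing around an escalation; empty means traceable."""
    missing = [f"before.{name}" for name in BEFORE_FIELDS if before.get(name) in (None, "")]
    missing += [f"after.{name}" for name in AFTER_FIELDS if after.get(name) in (None, "")]
    return missing


before = {"severity": "P0", "deadline": "30s", "owner_state": "unresponsive"}
after = {"channel": "critical_phone", "audit_record": "AUD-1", "reason_code": "time_critical_override"}
print(check_escalation_invariant(before, after))  # []
```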
Finish the refactoring with a reverse run of the full Specify → Plan → Tasks → Implement loop; otherwise the fix remains a local guess. What to look for in the trace:
- if Plan has stabilized after the patch but Tasks create incompatible actions, the defect has moved from the rules into the decomposition;
- if Implement passes but acceptance tests fail, the boundary of permissible behavior is incompletely described or the schema does not cover the operational effect.
Treat only a repeatable result as reliable: the same incident log, the same specification, and two consecutive runs with no new ask_storm, stage_regress, or priority conflicts.
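You can make the repeatability criterion explicit by fingerprinting the inputs of each run. In this sketch, the run summary shape (spec_sha, log_sha, ask_storm, stage_regress, priority_conflict) is an assumption. The phrase "no new" symptoms is read strictly as zero occurrences in each of the last two runs.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Stable hash of a spec or incident log, so runs can be compared."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def is_reproducible(runs: list) -> bool:
    """Two consecutive runs on identical inputs, both free of stuckness symptoms."""
    if len(runs) < 2:
        return False
    prev, last = runs[-2], runs[-1]
    same_inputs = prev["spec_sha"] == last["spec_sha"] and prev["log_sha"] == last["log_sha"]
    clean = all(
        r["ask_storm"] == 0 and r["stage_regress"] == 0 and not r["priority_conflict"]
        for r in (prev, last)
    )
    return same_inputs and clean


spec, log = fingerprint("spec v2"), fingerprint("incident log")
clean = {"spec_sha": spec, "log_sha": log, "ask_storm": 0, "stage_regress": 0, "priority_conflict": False}
print(is_reproducible([clean, dict(clean)]))  # True
```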
Examples and Application
Let's take a scenario different from the previous cases: a sharp latency rise in appointments-api in production. In the poisoned version of the specification, two requirements are set simultaneously: "all P0 escalate within 30 seconds" and "any escalation requires manual confirmation (human approval)."
What happens: if the responsible owner is unavailable, Qwen Code enters the loop ESCALATE_EVENT → CHECK_OWNER → WAIT_APPROVAL → VALIDATE_ESCALATION → ESCALATE_EVENT. The deadline demands action, the manual gate forbids it, and no exit rule is defined. A diagnostic run can be arranged like this:
> [project script]: the commands below describe the expected checks of the poisoned specification loop; for a runnable analog of the basic specification gateway (Spec CI), see examples/spec-ci/README.md.
```bash
# out/ must exist before output is redirected into it.
mkdir -p out

qwen -p "In planning mode, analyze @specs/appointment-latency-poisoned.yaml.
Find cycles, priority conflicts, and hidden out-of-scope. Do not change files." \
  --approval-mode plan \
  --output-format json \
  > out/appointment-latency-plan-review.json

python3 scripts/spec_ci/find_spec_loops.py \
  --spec specs/appointment-latency-poisoned.yaml \
  --out out/appointment-loop.dot
```
Control line for failure: cycle_count > 0 && ask_storm >= 4 && escalation_path_resolved=false.
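The book does not show find_spec_loops.py. A minimal stand-in is sketched below. It assumes the state transitions have already been extracted from the specification as (source, target) pairs, finds cycles with a depth-first search, and writes DOT output with the cyclic edges highlighted. It reports one cycle per DFS back edge. That is enough to establish cycle_count > 0, but it does not enumerate every elementary cycle.

```python
def find_cycles(transitions):
    """Return one cycle (as a node list) per DFS back edge."""
    graph = {}
    for src, dst in transitions:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}
    stack, cycles = [], []

    def visit(node):
        color[node] = GRAY
        stack.append(node)
        for nxt in graph[node]:
            if color[nxt] == GRAY:
                # Back edge: the cycle is the stack suffix starting at nxt.
                cycles.append(stack[stack.index(nxt):] + [nxt])
            elif color[nxt] == WHITE:
                visit(nxt)
        stack.pop()
        color[node] = BLACK

    for node in graph:
        if color[node] == WHITE:
            visit(node)
    return cycles


def to_dot(transitions, cycles):
    """Render transitions as Graphviz DOT, coloring edges that lie on a cycle."""
    on_cycle = {(c[i], c[i + 1]) for c in cycles for i in range(len(c) - 1)}
    lines = ["digraph spec {"]
    for src, dst in transitions:
        style = ' [color="red"]' if (src, dst) in on_cycle else ""
        lines.append(f'  "{src}" -> "{dst}"{style};')
    lines.append("}")
    return "\n".join(lines)


poisoned = [
    ("ESCALATE_EVENT", "CHECK_OWNER"),
    ("CHECK_OWNER", "WAIT_APPROVAL"),
    ("WAIT_APPROVAL", "VALIDATE_ESCALATION"),
    ("VALIDATE_ESCALATION", "ESCALATE_EVENT"),
]
cycles = find_cycles(poisoned)
print(len(cycles))  # 1
print(to_dot(poisoned, cycles))
```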
```mermaid
flowchart TD
  Specify[Specify]
  Plan[Plan]
  Tasks[Tasks]
  WaitApproval[WAIT_APPROVAL]
  Deadlock[deadlock by priority]
  Specify -->|SDD| Plan
  Specify -->|SDD| Tasks
  Plan -->|SDD| WaitApproval
  Tasks -->|SDD| WaitApproval
  WaitApproval -->|SDD back edge| Deadlock
  Deadlock -->|priority block| Specify
  classDef danger fill:#ffcccc,stroke:#b00020,stroke-width:2px,color:#5a0000
  class Deadlock danger
```

Start the fix not by removing manual confirmation but by clarifying its scope. For P0, introduce an exception rule under which reaction time outweighs prior manual confirmation, and move manual verification to a post-factum audit.
For P1–P3, keep the manual gate blocking: there is no comparable time risk there. A minimal patch can look like this:
```yaml
rules:
  - id: p0_time_critical_override
    when: severity == "P0" && owner_unresponsive == true
    then:
      escalation: critical_phone
      human_audit_required: true
    priority: 100
  - id: human_gate_noncritical
    when: severity in ["P1", "P2", "P3"]
    then:
      require_human_approval: true
    priority: 10
```
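To see how the patch resolves the tie, you can evaluate the two rules by priority. In this sketch, Python predicates stand in for the YAML when strings for illustration, so nothing here parses the YAML. resolve is a hypothetical helper that picks the highest-priority matching rule and raises on an equal-priority tie, which is how the poisoned version would fail.

```python
RULES = [
    {
        "id": "p0_time_critical_override",
        "when": lambda i: i["severity"] == "P0" and i["owner_unresponsive"],
        "then": {"escalation": "critical_phone", "human_audit_required": True},
        "priority": 100,
    },
    {
        "id": "human_gate_noncritical",
        "when": lambda i: i["severity"] in ("P1", "P2", "P3"),
        "then": {"require_human_approval": True},
        "priority": 10,
    },
]


def resolve(incident, rules=RULES):
    """Pick the highest-priority matching rule; fail loudly on a tie."""
    matching = sorted(
        (r for r in rules if r["when"](incident)),
        key=lambda r: r["priority"],
        reverse=True,
    )
    if not matching:
        return None
    if len(matching) > 1 and matching[0]["priority"] == matching[1]["priority"]:
        raise ValueError(f"priority_conflict: {matching[0]['id']} vs {matching[1]['id']}")
    return matching[0]["id"], matching[0]["then"]


print(resolve({"severity": "P0", "owner_unresponsive": True}))
```

Running the sketch also surfaces a case the reverse run should catch: a P0 incident with a responsive owner matches neither rule in this patch, so resolve returns None for it. Deciding which rule covers that case belongs in the specification, not in the agent's guesswork.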
Then lock the disputed point with a schema, so that the model cannot slip a hidden approval step back in via a neighboring step. In JSON Schema, require an auto-escalation channel for P0 when the owner is unavailable, and at the same time keep a mandatory audit trail. This way you define not only "what to do" but also "what cannot count as successful completion":
```json
{
  "if": {
    "properties": {
      "severity": { "const": "P0" },
      "owner_unresponsive": { "const": true }
    },
    "required": ["severity", "owner_unresponsive"]
  },
  "then": {
    "required": ["auto_escalation_channel", "human_audit_required", "reason_code"],
    "properties": {
      "auto_escalation_channel": { "const": "critical_phone" },
      "human_audit_required": { "const": true },
      "reason_code": { "const": "time_critical_override" }
    }
  }
}
```
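If you want a unit test for this constraint without adding a dependency, you can mirror the same if/then logic in a few lines of stdlib Python. This is a stand-in, not a JSON Schema validator. In a real pipeline you would validate against the schema itself, for example with the validate function of the jsonschema package. schema_violations is a hypothetical name. Like const in JSON Schema, it treats 1 and true as different values.

```python
P0_OVERRIDE_REQUIRED = {
    "auto_escalation_channel": "critical_phone",
    "human_audit_required": True,
    "reason_code": "time_critical_override",
}


def schema_violations(record: dict) -> list:
    """Mirror of the if/then schema above; an empty list means the record passes."""
    applies = record.get("severity") == "P0" and record.get("owner_unresponsive") is True
    if not applies:
        return []
    errors = []
    for field, expected in P0_OVERRIDE_REQUIRED.items():
        if field not in record:
            errors.append(f"missing {field}")
        elif type(record[field]) is not type(expected) or record[field] != expected:
            errors.append(f"{field} must be {expected!r}")
    return errors


ok = {"severity": "P0", "owner_unresponsive": True, **P0_OVERRIDE_REQUIRED}
print(schema_violations(ok))  # []
```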
The final check must run the entire loop, not just the new schema:
> [project script]: lint_spec.py and check_rule_priority.py need to be implemented in your project; for a runnable analog of simple schema and coverage gateways, see examples/spec-ci/README.md.
```bash
python3 scripts/spec_ci/lint_spec.py \
  --spec specs/appointment-latency-fixed.yaml \
  --atomicity

python3 scripts/spec_ci/check_rule_priority.py \
  --spec specs/appointment-latency-fixed.yaml \
  --expect-json-schema

qwen -p "Read @specs/appointment-latency-fixed.yaml and @validation.md.
Replay phases specify/plan/tasks/implement as review: what passes,
what remains unverified, what facts require scripts." \
  --approval-mode plan \
  --output-format json \
  > out/appointment-latency-replay-review.json
```
Successful recovery line: priority_conflict=false && cycle_count==0 && escalation_path_resolved=P0 && audit_required=true.
Summary
A poisoned specification is useful only when its poison is bounded in advance: one defect, a measurable symptom, a formal patch, and a full reverse run.
Cycles, priority conflicts, and hidden out-of-scope actions turn from random Qwen Code failures into controlled laboratory mutations under two conditions. First, you read the trace through ask_storm, stage_regress, and phase_context_loss. Second, you verify the fix through Given/When/Then, JSON Schema, exception rules, and before/after escalation invariants.
After such training, the specification stops being a set of wishes and becomes a stable contract that can be reproducibly broken, fixed, and protected from repeat failures. In the next chapter we will formalize these rules in constitution.md as the first project referendum.
Artifacts and Readiness Criteria
| Artifact | Ready when |
|---|---|
| poisoned-spec.md (or specs/appointment-latency-poisoned.yaml) | exactly one controlled defect from one class is introduced: cycle, priority conflict, or hidden out-of-scope |
| Expected symptom record | before running the agent, one of ask_storm / stage_regress / phase_context_loss is named |
| fixed-spec.md (or fixed YAML) | the patch changes the verifiable rule, not just the explanation in text |
| Recovery line in validation.md | states which exact fact stopped reproducing after the fix |
The full track adds out/appointment-latency-plan-review.json with the Qwen Code diagnosis, a JSON Schema fragment that forbids a return to hidden manual confirmation, and out/appointment-latency-replay-review.json after the reverse run. Consider it ready if the runnable Spec CI analog shows, locally, a failure on the poisoned version and a pass on the fixed one, and the Specify → Plan → Tasks → Implement replay does not bring back the original conflict.
Practice
- Copy one existing feature specification and introduce exactly one defect into it: priority conflict, cycle, or hidden out-of-scope. *Expectation: you have two versions, poisoned-spec.md and fixed-spec.md, that differ by exactly one mutation, and you can name the defect class in one word before running the agent.*
- Describe the expected failure symptom before running the agent: what should loop, what should become ambiguous, which fact should fail. *Expectation: the symptom is recorded concretely (ask_storm after the third clarification, stage_regress from plan back to specify, a Then failure in validation.md), not as "the agent won't cope."*
- Fix the defect so that the patch changes the requirement, plan, and check, not just the explanation in text. *Expectation: the diff touches at least one of requirements.md, plan.md, validation.md, and the reverse run Specify → Plan → Tasks → Implement does not bring back the original conflict.*
Review Questions
- Why can't you introduce several defects into a poisoned specification at once?
- How does a priority conflict differ from a hidden out-of-scope?
- What does a full reverse run Specify → Plan → Tasks → Implement prove?
- You introduced an "escalation cycle" defect, but Qwen Code replied "no clarifications needed" and went straight to implement. What does this say about your specification, and what is the next diagnostic step?