Reading: Applied Part 2. Specification Defect Diagnosis


Applied Part 2. Diagnosing Specification Defects

Status: Recommendation. Injecting one controlled defect into the specification is a training technique close to mutation testing. Specific defect classes (cycle, priority_conflict, hidden_out_of_scope) are used in projects but not standardized. Stuckness metrics (ask_storm, stage_regress) are frontier.

The engineering name for the technique is controlled-defect specification: you intentionally introduce one defect to test diagnosis. The text sometimes uses the short label "poisoned specification," but it should not obscure the main rule: one mutation, one symptom, one recovery criterion.

This chapter continues two basic ideas from the first volume: negative requirements from Part 7 and antipatterns from Part 20. The difference is that now the defect is introduced intentionally and bounded in advance. Do not try to test the entire triage process here: the training minimum is a poisoned/fixed pair and one recovery line in validation.md.

Before Reading

  • Foundation from the first volume: Part 7 provides negative requirements, Part 20 provides SDD antipatterns.
  • Local training case: appointment_latency, because priority conflict is visible without external infrastructure.
  • Trail for capstone/: poisoned/fixed pair for high_memory_usage and one recovery line in validation.md.
  • Key term for the first pass: controlled defect.
  • What to defer: metrics ask_storm, stage_regress, full reverse run, and automatic cycle detection.

The boundary with neighboring techniques is simple. This chapter covers one manually introduced defect and one stuckness symptom. Chapter 4 covers one minimal counterexample to a formal Then. Chapter 5 covers many deterministic mutants for testing the validator. Chapter 8 covers the official protocol of dispute, evidence, and precedent.

The chapter scenario is rising latency on the appointments-api route, the agent page on Hono JSX that first appeared in Part 11 of the first volume. It is the same domain, but under stress. The catalog of classic errors that the mutations build on is in Part 20, "SDD Antipatterns."

Goal

After this chapter you will be able to intentionally break the incident triage specification, identify the stuckness point of Qwen Code, and bring the specification to a stable, reproducible state.

The training value is not in getting perfect triage immediately. The goal is to learn to produce a controlled failure, read its traces, and fix the root cause in requirements. The result will be a working technique:

  • one defect per iteration;
  • measurable deadlock diagnosis;
  • formal resolution of contradictions;
  • reverse run of the full SDD loop Specify → Plan → Tasks → Implement.

Minimal Training Scenario

Training Case

Incident appointment_latency: the specification simultaneously requires "escalate P0 within 30 seconds" and "wait for manual confirmation before any escalation." You need to capture one priority conflict and fix it with an exception rule.

Preparation

  • book2/examples/templates/validation.md — form for recording the check.
  • Two short files or sections: poisoned-spec.md and fixed-spec.md.
  • One expected symptom: ask_storm, stage_regress, or phase_context_loss.

Minimal poisoned/fixed pair for the first pass:

poisoned:

REQ-LAT-01: latency_p95 >= 2s and severity=P0 require escalation within 30 seconds. priority=100
REQ-LAT-02: any escalation requires prior human approval. priority=100

fixed:
REQ-LAT-01: for severity=P0, p0_time_critical_override applies.
REQ-LAT-02: under p0_time_critical_override, escalation is permitted immediately, but human_audit_required=true.
REQ-LAT-03: for P1-P3, prior human approval remains blocking.

These lines can be placed in training poisoned-spec.md and fixed-spec.md for the local case appointment_latency. If the final assessment uses high_memory_usage, transfer to capstone/ only the defect class and the recovery line from the block below. Change only one defect at a time: here it is a priority conflict.

Steps

  1. In poisoned-spec.md, write two conflicting rules with the same priority. Expectation: the defect is visible in the data, not hidden in a comment.
  2. Before running analysis, record the expected symptom: for example, priority_conflict=true && escalation_path_resolved=false.
  3. Perform manual review or a Plan Mode query to Qwen Code without changing files. Expectation: the model points out the conflict or loses the move exactly at the disputed point.
  4. In fixed-spec.md, add p0_time_critical_override and move manual verification to post-factum audit.
  5. In validation.md, record two facts: the original conflict was found, the fixed path preserves human_audit_required=true.
  6. Compare the result with the runnable analog of Spec CI from [examples/spec-ci/](examples/spec-ci/README.md), if you want to check the form of requirements and plan automatically.

Control Fact

The fix changes the verifiable rule, not just the explanation. In validation.md there is a recovery line: priority_conflict=false && escalation_path_resolved=P0 && audit_required=true.
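
A recovery line of this form can also be checked mechanically instead of by eye. The sketch below is a minimal illustration, not part of the course tooling: the helper names `parse_control_line` and `check_recovery` and the fact dictionaries are assumptions, and only the `key=value && key=value` form is supported.

```python
def parse_control_line(line: str) -> dict[str, str]:
    """Split 'a=b && c=d' into {'a': 'b', 'c': 'd'} (only '=' and '&&' are handled)."""
    conditions = {}
    for clause in line.split("&&"):
        key, _, expected = clause.strip().partition("=")
        conditions[key.strip()] = expected.strip()
    return conditions


def check_recovery(line: str, facts: dict[str, str]) -> list[str]:
    """Return the clauses of the control line that the recorded facts do not satisfy."""
    return [
        f"{key}={expected}"
        for key, expected in parse_control_line(line).items()
        if facts.get(key) != expected
    ]


RECOVERY = "priority_conflict=false && escalation_path_resolved=P0 && audit_required=true"

# Illustrative facts written down by hand after reviewing fixed-spec.md.
fixed_facts = {
    "priority_conflict": "false",
    "escalation_path_resolved": "P0",
    "audit_required": "true",
}
# Illustrative facts from the poisoned run: the conflict is still present.
poisoned_facts = {
    "priority_conflict": "true",
    "escalation_path_resolved": "false",
    "audit_required": "true",
}

print(check_recovery(RECOVERY, fixed_facts))     # empty list: every clause holds
print(check_recovery(RECOVERY, poisoned_facts))  # the two clauses that still fail
```

Returning the failing clauses rather than a single boolean keeps the record in validation.md specific about which fact stopped or did not stop reproducing.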

How This Goes Into capstone/

Transfer exactly one defect to capstone/poisoned-spec.md, and exactly one fix to capstone/fixed-spec.md. Add the recovery line to capstone/validation.md. Do not transfer a long Plan Mode trace: for assessment, the defect class, patch, and the fact that the conflict no longer reproduces matter.

Minimal fragment. The same priority_conflict class is transferred from appointment_latency to the main assessment case high_memory_usage: the restart_pod permission and the human approval requirement conflict at the same priority.

- defect_class: priority_conflict
- poisoned: memory_percent >= 90 for 10 minutes permits restart_pod, but any restart_pod requires prior human approval with the same priority.
- fixed: restart_pod is permitted as a pre-approved action only for stateless pod, and the first production launch requires human_review_for_first_run=true.
- validation: priority_conflict=false && action=restart_pod && human_review_for_first_run=true

Reviewable Trail

In the training package, keep the poisoned-spec.md / fixed-spec.md pair and the record in validation.md. out/* outputs are not needed if they were obtained only by a local project script.

Key Ideas

Introduce exactly one defect type per iteration. A "defect" here means one of three controlled specification mutations:

  • cycle — cyclic dependency between states (for example, WAIT_APPROVAL → VALIDATE_ESCALATION → WAIT_APPROVAL);
  • priority conflict — two rules with the same priority leading to mutually exclusive actions (say, "escalate P0 within 30 seconds" and "wait for manual confirmation");
  • hidden out-of-scope — an action that a requirement forces to perform, although it is forbidden in constraints (for example, a Jira ticket in an acceptance test when Jira is prohibited in constraints).
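
The third class can be made visible with a very small cross-check between acceptance tests and constraints. The sketch below assumes a hypothetical data layout: the test ids, action names, and the `ACTION_INTEGRATION` mapping are invented for illustration; only the idea of a forbidden_integrations list comes from this chapter.

```python
constraints = {"forbidden_integrations": ["jira"]}

# Hypothetical acceptance tests: each lists the actions its Then clause requires.
acceptance_tests = [
    {"id": "AT-01", "then": ["page_on_call", "write_audit_record"]},
    {"id": "AT-02", "then": ["create_jira_ticket"]},  # forces a forbidden integration
]

# Hypothetical mapping from an action name to the integration it depends on.
ACTION_INTEGRATION = {
    "create_jira_ticket": "jira",
    "page_on_call": "phone",
    "write_audit_record": "audit_log",
}


def find_hidden_out_of_scope(tests, constraints):
    """Return (test id, action) pairs whose action needs a forbidden integration."""
    forbidden = set(constraints["forbidden_integrations"])
    hits = []
    for test in tests:
        for action in test["then"]:
            if ACTION_INTEGRATION.get(action) in forbidden:
                hits.append((test["id"], action))
    return hits


print(find_hidden_out_of_scope(acceptance_tests, constraints))
```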

If you simultaneously add a recursive dependency, a disputed escalation rule, and a forbidden integration, Qwen Code's trace will show general chaos. It will be impossible to understand which element broke the behavior.

Keep the mutation in minimal radius: one changed fragment of the specification, one expected symptom, and one recovery criterion.

Localize model stuckness through chat metrics, not through the impression of a "weird" response. We introduce three diagnostic signs:

  • ask_storm — repeated clarifying requests without new data appearing;
  • stage_regress — return to the same task or stage;
  • phase_context_loss — loss of phase context, for example mixing Plan and Implement.

These signs are especially useful when Qwen Code formally continues to respond, but does not actually advance the solution: asks the owner again, rebuilds the same plan, or proposes a tool that was not permitted in the specification. A practical control line can look like this: ask_storm >= 4 || stage_regress >= 2 || phase_context_loss=true. After triggering, dissect the session as a diagnostic artifact, not as a failed dialogue.

> How to count these metrics on a training pass. These are heuristics, not CI metrics: on the first pass, a pencil mark in validation.md is enough.
>
> - ask_storm: each new agent message that requests data already named in previous messages of the current session. Counts as +1. Resets when you added at least one new field to requirements.md or clarifications.md.
> - stage_regress: return of the current SDD phase (specify/plan/tasks/implement) to a previous one without explicit reason recorded in validation.md. Counts as +1 per rollback.
> - phase_context_loss: true exactly when the agent in a new phase references a rule absent from the current requirements.md or plan.md.

> For full tracking, these counters are automated via a parser of Qwen Code session transcript (qwen --output-format json + aggregator script). The training minimum counts them by eye during the session.
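
The counting rules above can be sketched as a small aggregator. This is a minimal illustration under stated assumptions: the event format (`kind`, `fields`, `name`, `reason_recorded`, `rule`) is hypothetical and hand-written, not the actual Qwen Code transcript format, and the phase_context_loss check is simplified to "any reference to an unknown rule."

```python
PHASES = ["specify", "plan", "tasks", "implement"]


def count_stuckness(events, known_rules):
    """Count the three stuckness signs over a hand-written list of session events."""
    ask_storm = 0
    stage_regress = 0
    phase_context_loss = False
    seen_fields = set()   # data already named earlier in the session
    current_phase = None

    for event in events:
        kind = event["kind"]
        if kind == "ask":
            # +1 when the agent asks again for something already named.
            if seen_fields & set(event["fields"]):
                ask_storm += 1
            seen_fields |= set(event["fields"])
        elif kind == "spec_update":
            # A new field in requirements.md or clarifications.md resets the counter.
            ask_storm = 0
        elif kind == "phase":
            new_phase = event["name"]
            if current_phase is not None:
                went_back = PHASES.index(new_phase) < PHASES.index(current_phase)
                if went_back and not event.get("reason_recorded", False):
                    stage_regress += 1
            current_phase = new_phase
        elif kind == "rule_ref":
            # The agent references a rule absent from requirements.md / plan.md.
            if event["rule"] not in known_rules:
                phase_context_loss = True

    return {
        "ask_storm": ask_storm,
        "stage_regress": stage_regress,
        "phase_context_loss": phase_context_loss,
    }


def is_stuck(metrics):
    """Control line: ask_storm >= 4 || stage_regress >= 2 || phase_context_loss=true."""
    return (
        metrics["ask_storm"] >= 4
        or metrics["stage_regress"] >= 2
        or metrics["phase_context_loss"]
    )


session = [
    {"kind": "phase", "name": "specify"},
    {"kind": "phase", "name": "plan"},
    {"kind": "ask", "fields": ["owner_state"]},
    {"kind": "ask", "fields": ["owner_state"]},
    {"kind": "ask", "fields": ["owner_state", "deadline"]},
    {"kind": "phase", "name": "specify"},  # rollback, no reason recorded
    {"kind": "phase", "name": "plan"},
    {"kind": "phase", "name": "specify"},  # second rollback
]
metrics = count_stuckness(session, known_rules={"escalate_p0", "human_approval_required"})
print(metrics, is_stuck(metrics))
```

In this sample session the control line fires on stage_regress, not on ask_storm, which is the kind of distinction worth writing into validation.md.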

Make the defect explicit with conflicting requirements with priorities, not a comment in YAML. Compare two ways.

Bad:

# TODO: P0s should escalate within 30s, but human approval is mandatory —
# unclear what wins, will figure out later.
rules:
  - id: escalate_p0
    when: severity == "P0"
    then: { escalation: critical_phone }

Problem: the defect sits in a comment. Neither a linter nor JSON Schema checks it, and Qwen Code may read the # TODO but is not obligated to treat a comment as an executable contract. The conflict therefore remains outside formal verification.

Good:

rules:
  - id: escalate_p0
    when: severity == "P0"
    then: { escalation: critical_phone }
    priority: 100
  - id: human_approval_required
    when: severity == "P0"
    then: { require_human_approval: true }
    priority: 100   # intentional conflict at same priority

Now check_rule_priority.py (marked below as a [project script]) catches the collision by priority, not by human memory.
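
Since check_rule_priority.py has to be written in your own project, here is one possible core for it, as a sketch under assumptions. The function name `find_priority_collisions` is hypothetical, the rules are given as Python dictionaries to avoid a YAML dependency, and conditions are compared by plain string equality, so overlapping but differently worded conditions are not detected.

```python
from itertools import combinations

# The "Good" example above, transcribed by hand into Python dictionaries.
rules = [
    {"id": "escalate_p0", "when": 'severity == "P0"',
     "then": {"escalation": "critical_phone"}, "priority": 100},
    {"id": "human_approval_required", "when": 'severity == "P0"',
     "then": {"require_human_approval": True}, "priority": 100},
]


def find_priority_collisions(rules):
    """Return id pairs that share a condition and a priority but demand different actions."""
    collisions = []
    for a, b in combinations(rules, 2):
        same_trigger = a["when"] == b["when"]
        # Two rules without any priority also count: nothing breaks the tie.
        same_priority = a.get("priority") == b.get("priority")
        if same_trigger and same_priority and a["then"] != b["then"]:
            collisions.append((a["id"], b["id"]))
    return collisions


print(find_priority_collisions(rules))
```

The check only flags candidates for review: two different `then` blocks at the same priority are not always mutually exclusive, so a human still decides whether the pair is a real conflict.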

Translate disputed requirements into Given/When/Then and JSON Schema. Natural language conveys intent well, but poorly maintains boundaries of permissible behavior. The formulation "for critical incidents, fast escalation is needed" leaves room for the model to guess. The scenario Given severity=P0 and owner_unresponsive=true / When escalation_deadline expires / Then use critical_phone and record human_audit_required sets a verifiable branch.

JSON Schema closes the second half of the problem. It does not just describe the desired path; it forbids impermissible states, for example a missing auto_escalation_channel at P0 or the use of an integration from the forbidden_integrations list. This pairing corresponds to the SDD approach: the specification must include success criteria, constraints, and verifiable acceptance tests across the full development cycle. The GitHub Spec Kit Quickstart describes these phases as the sequence Specify → Plan → Tasks → Implement.

Resolve the conflict by formal strategy. The strategy includes three parts:

  • exception rule (override) defines which requirement wins at the time edge (for example, time_critical_override above manual_gate_for_noncritical);
  • single source of truth eliminates divergence between specification text, schema, and test — if priorities are declared in YAML, reference the same hierarchy from acceptance tests and JSON Schema, do not introduce a parallel interpretation;
  • verification invariant fixes transition safety: before escalation, record severity, deadline, and owner_state; after escalation — channel, audit_record, and reason_code. Otherwise the system may formally "resolve" the conflict but lose traceability.
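
The verification invariant from the last item can be sketched as a check over two records. The record layout and the helper names `missing_fields` and `check_escalation_invariant` are assumptions made for illustration; the field names are the ones listed above.

```python
BEFORE_FIELDS = ("severity", "deadline", "owner_state")
AFTER_FIELDS = ("channel", "audit_record", "reason_code")


def missing_fields(record, required):
    """Return required fields that are absent or empty in the record."""
    return [name for name in required if record.get(name) in (None, "")]


def check_escalation_invariant(before, after):
    """Collect invariant violations on both sides of an escalation transition."""
    problems = [f"before.{name}" for name in missing_fields(before, BEFORE_FIELDS)]
    problems += [f"after.{name}" for name in missing_fields(after, AFTER_FIELDS)]
    return problems


# Illustrative records: the escalation happened, but no audit record was written.
before = {"severity": "P0", "deadline": "30s", "owner_state": "unresponsive"}
after = {"channel": "critical_phone", "reason_code": "time_critical_override"}

print(check_escalation_invariant(before, after))
```

Here the conflict is formally "resolved" (the escalation went out), yet the check still fails on the missing audit_record, which is exactly the loss of traceability the invariant is meant to catch.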

Close refactoring with a reverse run of the full Specify → Plan → Tasks → Implement loop. Otherwise the fix remains a local guess. What to look for in the trace:

  • if after the patch Plan stabilized but Tasks create incompatible actions — the defect moved from rules into decomposition;
  • if Implement passes but acceptance tests fail — the boundary of permissible behavior is incompletely described or the schema does not cover the operational effect.

Consider only a repeatable result reliable: the same incident log, the same specification, two consecutive runs without new ask_storm, stage_regress, and priority conflicts.

Examples and Application

Let's take a scenario different from previous cases: a sharp latency rise of appointments-api in production. In the poisoned version of the specification, two requirements are set simultaneously: "all P0 escalate within 30 seconds" and "any escalation requires manual confirmation (human approval)."

What will happen. If the responsible person is unavailable, Qwen Code enters a loop ESCALATE_EVENT → CHECK_OWNER → WAIT_APPROVAL → VALIDATE_ESCALATION → ESCALATE_EVENT. The deadline demands action. The manual barrier forbids action. The exit rule is undefined. A diagnostic run can be arranged like this:

> [project script]: the commands below describe the expected checks of the poisoned specification loop; for a runnable analog of the basic specification gateway (Spec CI), see examples/spec-ci/README.md.

qwen -p "In planning mode, analyze @specs/appointment-latency-poisoned.yaml.
Find cycles, priority conflicts, and hidden out-of-scope. Do not change files." \
  --approval-mode plan \
  --output-format json \
  > out/appointment-latency-plan-review.json

python3 scripts/spec_ci/find_spec_loops.py \
  --spec specs/appointment-latency-poisoned.yaml \
  --out out/appointment-loop.dot

Control line for failure: cycle_count > 0 && ask_storm >= 4 && escalation_path_resolved=false.

flowchart TD
    Specify[Specify]
    Plan[Plan]
    Tasks[Tasks]
    WaitApproval[WAIT_APPROVAL]
    Deadlock[deadlock by priority]
    Specify -->|SDD| Plan
    Specify -->|SDD| Tasks
    Plan -->|SDD| WaitApproval
    Tasks -->|SDD| WaitApproval
    WaitApproval -->|SDD back edge| Deadlock
    Deadlock -->|priority block| Specify
    classDef danger fill:#ffcccc,stroke:#b00020,stroke-width:2px,color:#5a0000
    class Deadlock danger
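
The loop search itself can be sketched with a depth-first traversal of the state graph. This is a minimal illustration, not the actual find_spec_loops.py: the function name `find_cycle` and the dictionary representation of transitions are assumptions, and the states AUTO_ESCALATE, AUDIT, and DONE in the fixed graph are hypothetical.

```python
def find_cycle(transitions):
    """Return one cycle as a list of states (first state repeated at the end), or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on the current path / finished
    color = {state: WHITE for state in transitions}
    path = []

    def visit(state):
        color[state] = GREY
        path.append(state)
        for nxt in transitions.get(state, []):
            if color.get(nxt, WHITE) == GREY:  # back edge closes a cycle
                return path[path.index(nxt):] + [nxt]
            if color.get(nxt, WHITE) == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[state] = BLACK
        return None

    for state in list(transitions):
        if color[state] == WHITE:
            found = visit(state)
            if found:
                return found
    return None


# The loop described in the text for the poisoned specification.
poisoned = {
    "ESCALATE_EVENT": ["CHECK_OWNER"],
    "CHECK_OWNER": ["WAIT_APPROVAL"],
    "WAIT_APPROVAL": ["VALIDATE_ESCALATION"],
    "VALIDATE_ESCALATION": ["ESCALATE_EVENT"],
}
# A hypothetical fixed graph: the override gives CHECK_OWNER an exit.
fixed = {
    "ESCALATE_EVENT": ["CHECK_OWNER"],
    "CHECK_OWNER": ["AUTO_ESCALATE", "WAIT_APPROVAL"],
    "WAIT_APPROVAL": ["VALIDATE_ESCALATION"],
    "VALIDATE_ESCALATION": ["DONE"],
    "AUTO_ESCALATE": ["AUDIT"],
}

print(find_cycle(poisoned))
print(find_cycle(fixed))
```

Reporting the concrete path, not just cycle_count, makes it easier to name the disputed point in validation.md.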

Start the fix not by removing manual confirmation, but by clarifying its scope. For P0, introduce an exception rule where reaction time outweighs prior manual confirmation. Move manual verification to post-factum audit.

For P1–P3, leave the manual barrier blocking — there is no such time risk there. A minimal patch can look like this:

rules:
  - id: p0_time_critical_override
    when: severity == "P0" && owner_unresponsive == true
    then:
      escalation: critical_phone
      human_audit_required: true
    priority: 100

  - id: human_gate_noncritical
    when: severity in ["P1", "P2", "P3"]
    then:
      require_human_approval: true
    priority: 10

Then lock the disputed point with a schema. This is needed so the model does not return to hidden agreement through a neighboring step. In JSON Schema, require an auto-escalation channel for P0 when the owner is unavailable, and simultaneously keep a mandatory audit trail. Thus you define not only "what to do," but also "what cannot be considered successful completion":

{
  "if": {
    "properties": {
      "severity": { "const": "P0" },
      "owner_unresponsive": { "const": true }
    },
    "required": ["severity", "owner_unresponsive"]
  },
  "then": {
    "required": ["auto_escalation_channel", "human_audit_required", "reason_code"],
    "properties": {
      "auto_escalation_channel": { "const": "critical_phone" },
      "human_audit_required": { "const": true },
      "reason_code": { "const": "time_critical_override" }
    }
  }
}
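
To see what this fragment accepts and rejects without installing anything, the sketch below hand-evaluates only the keywords used here (if, then, required, and const inside properties). It is an assumption-laden illustration, not a JSON Schema implementation; in a real project a full validator, such as the jsonschema package, would take its place. The helper names and sample records are hypothetical.

```python
import json

SCHEMA = json.loads("""
{
  "if": {
    "properties": {
      "severity": { "const": "P0" },
      "owner_unresponsive": { "const": true }
    },
    "required": ["severity", "owner_unresponsive"]
  },
  "then": {
    "required": ["auto_escalation_channel", "human_audit_required", "reason_code"],
    "properties": {
      "auto_escalation_channel": { "const": "critical_phone" },
      "human_audit_required": { "const": true },
      "reason_code": { "const": "time_critical_override" }
    }
  }
}
""")


def matches(subschema, instance):
    """Evaluate only 'required' and 'properties/const', the keywords this fragment uses."""
    for name in subschema.get("required", []):
        if name not in instance:
            return False
    for name, rule in subschema.get("properties", {}).items():
        if name in instance and "const" in rule and instance[name] != rule["const"]:
            return False
    return True


def validate(schema, instance):
    """Apply if/then: the 'then' branch is enforced only when 'if' matches."""
    if matches(schema["if"], instance):
        return matches(schema["then"], instance)
    return True


ok = {"severity": "P0", "owner_unresponsive": True,
      "auto_escalation_channel": "critical_phone",
      "human_audit_required": True, "reason_code": "time_critical_override"}
no_audit = {"severity": "P0", "owner_unresponsive": True,
            "auto_escalation_channel": "critical_phone",
            "reason_code": "time_critical_override"}
noncritical = {"severity": "P2", "owner_unresponsive": True}

print(validate(SCHEMA, ok), validate(SCHEMA, no_audit), validate(SCHEMA, noncritical))
```

The second record is the interesting one: escalation happened through the right channel, but without human_audit_required it cannot count as successful completion.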

The final check must run the entire loop, not just the new schema:

> [project script]: lint_spec.py and check_rule_priority.py need to be implemented in your project; for a runnable analog of simple schema and coverage gateways, see examples/spec-ci/README.md.

python3 scripts/spec_ci/lint_spec.py \
  --spec specs/appointment-latency-fixed.yaml \
  --atomicity

python3 scripts/spec_ci/check_rule_priority.py \
  --spec specs/appointment-latency-fixed.yaml \
  --expect-json-schema

qwen -p "Read @specs/appointment-latency-fixed.yaml and @validation.md.
Replay phases specify/plan/tasks/implement as review: what passes,
what remains unverified, what facts require scripts." \
  --approval-mode plan \
  --output-format json \
  > out/appointment-latency-replay-review.json

Successful recovery line: priority_conflict=false && cycle_count==0 && escalation_path_resolved=P0 && audit_required=true.

Summary

A poisoned specification is useful only when its poison is bounded in advance: one defect, measurable symptom, formal patch, and full reverse run.

Cycles, priority conflicts, and hidden out-of-scope turn from random Qwen Code failures into controlled laboratory mutations under two conditions. First, you read the trace through ask_storm, stage_regress, and phase_context_loss. Second, you verify the fix through Given/When/Then, JSON Schema, exception rules, and before/after escalation invariants.

After such training, the specification ceases to be a set of wishes and becomes a stable contract. The contract can be reproducibly broken, fixed, and protected from repeated failure. In the next chapter we will formalize these rules in constitution.md as the first project referendum.

Artifacts and Readiness Criteria

| Artifact | Ready when |
| --- | --- |
| poisoned-spec.md (or specs/appointment-latency-poisoned.yaml) | exactly one controlled defect from one class introduced: cycle, priority conflict, or hidden out-of-scope |
| Expected symptom record | before running the agent, one of ask_storm / stage_regress / phase_context_loss is named |
| fixed-spec.md (or fixed YAML) | patch changes the verifiable rule, not just the explanation in text |
| Recovery line in validation.md | explains which exact fact stopped reproducing after the fix |

Full track adds out/appointment-latency-plan-review.json with Qwen Code diagnosis, a JSON Schema fragment forbidding return to hidden manual confirmation, and out/appointment-latency-replay-review.json after the reverse run. Consider it ready if the runnable analog of Spec CI locally shows a fixable failure and a pass, and the Specify → Plan → Tasks → Implement replay does not return the original conflict.

Practice

  1. Copy one existing feature specification and introduce exactly one defect into it: priority conflict, cycle, or hidden out-of-scope. *Expectation: two versions obtained — poisoned-spec.md and fixed-spec.md, differing by exactly one mutation; you can name the defect class in one word before running the agent.*
  2. Describe the expected failure symptom before running the agent: what should loop, what should become ambiguous, which fact should fail. *Expectation: symptom recorded concretely (ask_storm after third clarification, stage_regress with plan → specify, Then failure in validation.md), not as "agent won't cope."*
  3. Fix the defect so that the patch changes requirement, plan, and check, not just the explanation in text. *Expectation: diff touches at least one of requirements.md, plan.md, validation.md; reverse run Specify → Plan → Tasks → Implement does not return the original conflict.*

Review Questions

  1. Why can't you introduce several defects into a poisoned specification at once?
  2. How does a priority conflict differ from a hidden out-of-scope?
  3. What does a full reverse run Specify → Plan → Tasks → Implement prove?
  4. You introduced a defect "escalation cycle," but Qwen Code output "no clarifications needed" and went to implement. What does this say about your specification and what is the next diagnostic step?