Reading: Applied Part 2. Specification Defect Diagnostics

Applied Part 2. Diagnosing Specification Defects

Status: Recommendation. Injecting a single controlled defect into a specification is a learning technique close to mutation testing. Specific defect classes (cycle, priority_conflict, hidden_out_of_scope) are applied in projects but are not standardized. Stuck-state metrics (ask_storm, stage_regress) are frontier work.

The engineering name for the technique is the controlled-defect specification: you deliberately introduce one defect to test the diagnostics. The text sometimes uses the short label "poisoned specification", but the label should not obscure the main rule: one mutation, one symptom, one recovery criterion.

This chapter continues two foundational ideas from Volume 1: negative requirements from part 7 and antipatterns from part 20. The difference is that now the defect is introduced deliberately and bounded in advance. Do not try to verify the entire triage process here: the learning minimum is a poisoned/fixed pair and one recovery line in validation.md.

Before Reading

Anchor from Volume 1: part 7 gives negative requirements, part 20 gives SDD antipatterns.
Local learning case: appointment_latency, because the priority conflict is visible without external infrastructure.
Trail for capstone/: a poisoned/fixed pair for high_memory_usage and one recovery line in validation.md.
Key term for the first pass: controlled defect.
What to defer: ask_storm and stage_regress metrics, full reverse replay, and automatic cycle detection.

The boundary with neighboring techniques is simple. This chapter is one manually injected defect and one stuck-state symptom. Chapter 4 is one minimal counterexample to a formal Then. Chapter 5 is many deterministic mutants for testing the validator. Chapter 8 is the formal protocol for dispute, evidence, and precedent.

The chapter's scenario is a latency spike on the appointments-api route (the agent page on Hono JSX), which first appeared in part 11 of Volume 1. It is the same domain, just under stress. The catalog of classic errors that the mutations rely on is in part 20, SDD Antipatterns.

Goal

After this chapter, you will be able to deliberately break an incident-triage specification, identify the point where Qwen Code gets stuck, and bring the specification to a stable, reproducible state.

The learning value is not in getting a flawless triage right away. The goal is to learn how to produce a controlled failure, read its traces, and fix the root cause in the requirements. The outcome will be a working technique:

one defect per iteration;
measurable diagnosis of the deadlock;
formal resolution of contradictions;
reverse replay of the full SDD loop Specify → Plan → Tasks → Implement.

Minimal Learning Scenario

Learning Case

The appointment_latency incident: the specification simultaneously requires "escalate P0 within 30 seconds" and "wait for manual approval before any escalation." You need to capture one priority conflict and fix it with an override rule.

Preparation

book2/examples/templates/validation.md — a form for recording the verification.
Two short files or sections: poisoned-spec.md and fixed-spec.md.
One expected symptom: ask_storm, stage_regress, or phase_context_loss.

Minimum poisoned/fixed pair for the first pass:

poisoned:
REQ-LAT-01: latency_p95 >= 2s and severity=P0 require escalation within 30 seconds. priority=100
REQ-LAT-02: any escalation requires prior human approval. priority=100

fixed:
REQ-LAT-01: for severity=P0, p0_time_critical_override applies.
REQ-LAT-02: under p0_time_critical_override, escalation is allowed immediately, but human_audit_required=true.
REQ-LAT-03: for P1-P3, prior human approval remains blocking.

These lines can be placed in the learning poisoned-spec.md and fixed-spec.md for the local appointment_latency case. If the final assessment is based on high_memory_usage, move only the defect class and the recovery line from the block below to capstone/. Change only one defect at a time: here it is the priority conflict.

Steps

In poisoned-spec.md, record two conflicting rules with the same priority. Expectation: the defect is visible in the data, not hidden in a comment.
Before running the analysis, record the expected symptom: for example, priority_conflict=true && escalation_path_resolved=false.
Run a manual review, or send a Plan Mode request to Qwen Code, without changing files. Expectation: the model points out the conflict or loses the thread exactly at the disputed point.

In fixed-spec.md, add p0_time_critical_override and move the manual check to a post-factum audit.
In validation.md, capture two facts: the original conflict was found, and the fixed path preserves human_audit_required=true.
Compare the result with the runnable Spec CI analog from [examples/spec-ci/](examples/spec-ci/README.md) if you want to automatically verify the form of the requirements and plan.

Verification Fact

The fix changes the verifiable rule, not only the explanation. In validation.md there is a recovery line: priority_conflict=false && escalation_path_resolved=P0 && audit_required=true.

How This Lands in `capstone/`

Move exactly one defect into capstone/poisoned-spec.md, and exactly one fix into capstone/fixed-spec.md. Add the recovery line to capstone/validation.md. Do not move the long Plan Mode trace: for the assessment, the defect class, the patch, and the fact that the conflict no longer reproduces matter.

Minimal fragment (the same priority_conflict class transferred from appointment_latency to the main assessment case high_memory_usage: the restart_pod permission conflicts with the human approval requirement at the same priority):

- defect_class: priority_conflict
- poisoned: memory_percent >= 90 within 10 minutes permits restart_pod, but any restart_pod requires prior human approval with the same priority.
- fixed: restart_pod is allowed as a pre-approved action only for stateless pods, and the first production run requires human_review_for_first_run=true.
- validation: priority_conflict=false && action=restart_pod && human_review_for_first_run=true

Reviewable Trail

In the learning package, keep the poisoned-spec.md / fixed-spec.md pair and the record in validation.md. The out/* outputs are not needed if they were obtained only by a local project script.

Key Ideas

Inject exactly one defect type per iteration. By "defect" we mean one of three controlled specification mutations:

cycle — a circular dependency between states (e.g., WAIT_APPROVAL → VALIDATE_ESCALATION → WAIT_APPROVAL);
priority conflict — two rules with the same priority leading to mutually exclusive actions (e.g., "escalate P0 within 30 seconds" and "wait for manual approval");
hidden out-of-scope — an action that a requirement forces you to take, even though it is forbidden in constraints (e.g., a Jira ticket in an acceptance test when Jira is forbidden in the constraints).

If you add a recursive dependency, a disputed escalation rule, and a forbidden integration all at once, the Qwen Code trace will show general chaos. It will be impossible to tell which element broke the behavior.

Keep the mutation to a minimal radius: one changed fragment of the specification, one expected symptom, and one recovery criterion.
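Of the three classes, hidden out-of-scope is the simplest to check mechanically. The sketch below assumes a specification already loaded into a dictionary; the layout (`constraints`, `acceptance_tests`, `uses`), the test ids, and the function name `find_hidden_out_of_scope` are hypothetical and chosen only for illustration.

```python
def find_hidden_out_of_scope(spec):
    """Return (test_id, integration) pairs where a test relies on a forbidden integration."""
    forbidden = set(spec["constraints"]["forbidden_integrations"])
    hits = []
    for test in spec["acceptance_tests"]:
        for integration in test.get("uses", []):
            if integration in forbidden:
                hits.append((test["id"], integration))
    return hits


# Hypothetical poisoned fragment: Jira is forbidden, yet one acceptance test needs it.
poisoned_spec = {
    "constraints": {"forbidden_integrations": ["jira"]},
    "acceptance_tests": [
        {"id": "AT-01", "uses": ["critical_phone"]},
        {"id": "AT-02", "uses": ["jira"]},
    ],
}

print(find_hidden_out_of_scope(poisoned_spec))  # [('AT-02', 'jira')]
```

An empty result after the fix is one candidate for the recovery criterion of this defect class.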

Localize the model's stuck state through chat metrics, not by the impression of a "strange" response. Let's introduce three diagnostic indicators:

ask_storm — repeated clarifying requests with no new data appearing;
stage_regress — returning to the same task or stage;
phase_context_loss — loss of phase context, e.g., mixing Plan and Implement.

These indicators are especially useful when Qwen Code formally keeps responding but does not actually advance the solution: asking the owner again, rebuilding the same plan, or suggesting a tool that was not permitted in the specification. A practical control line might look like: ask_storm >= 4 || stage_regress >= 2 || phase_context_loss=true. After it triggers, treat the session as a diagnostic artifact, not a failed dialog.

> How to count these metrics in a learning pass. These are heuristics, not CI metrics: a pencil mark in validation.md is enough for the first pass.
>
> - ask_storm: each new agent message that requests data already named in the previous messages of the current session. Counts as +1. Resets when you add at least one new field to requirements.md or clarifications.md.
> - stage_regress: the current SDD phase (specify/plan/tasks/implement) returning to a previous one without explicitly recording the reason in validation.md. Counts as +1 per rollback.
> - phase_context_loss: true exactly when the agent, in a new phase, references a rule that is absent from the current requirements.md or plan.md.

> For the full track, these counters are automated via a Qwen Code session transcript parser (qwen --output-format json + an aggregator script). The learning minimum counts them by eye during the session.
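The control line and the stage_regress counting rule can be sketched in a few lines. This is an illustration under assumptions, not the aggregator script mentioned above: the phase log is a hand-written list, the names `count_stage_regress` and `StuckMetrics` are hypothetical, and the sketch counts every backward transition without checking whether a reason was recorded in validation.md.

```python
from dataclasses import dataclass

PHASES = ["specify", "plan", "tasks", "implement"]


def count_stage_regress(phase_log):
    """Count transitions where the SDD phase moves backwards."""
    order = {name: index for index, name in enumerate(PHASES)}
    return sum(
        1
        for prev, cur in zip(phase_log, phase_log[1:])
        if order[cur] < order[prev]
    )


@dataclass
class StuckMetrics:
    ask_storm: int = 0
    stage_regress: int = 0
    phase_context_loss: bool = False

    def triggered(self):
        # Mirrors: ask_storm >= 4 || stage_regress >= 2 || phase_context_loss=true
        return (
            self.ask_storm >= 4
            or self.stage_regress >= 2
            or self.phase_context_loss
        )


# Hypothetical session: the agent falls back from plan to specify, then from tasks to plan.
log = ["specify", "plan", "specify", "plan", "tasks", "plan"]
metrics = StuckMetrics(ask_storm=1, stage_regress=count_stage_regress(log))
print(metrics.stage_regress, metrics.triggered())  # 2 True
```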

Express the defect as explicit conflicting requirements with priorities, not as a comment in YAML. Compare the two ways.

Bad:

# TODO: P0 should escalate within 30s, but human approval is required —
# unclear which wins, we'll figure it out later.
rules:
  - id: escalate_p0
    when: severity == "P0"
    then: { escalation: critical_phone }

Problem: the defect is hiding in a comment. The linter and JSON Schema do not check it, and Qwen Code may read # TODO but is not required to treat the comment as an executable contract. So the conflict will remain outside the formal check.

Good:

rules:
  - id: escalate_p0
    when: severity == "P0"
    then: { escalation: critical_phone }
    priority: 100
  - id: human_approval_required
    when: severity == "P0"
    then: { require_human_approval: true }
    priority: 100   # intentional conflict at the same priority

Now check_rule_priority.py (a [project script], described below) can catch the collision by priority instead of relying on human memory.
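A minimal sketch of what such a priority check could do, assuming the rules are already loaded as dictionaries with the `id`, `when`, `then`, and `priority` keys from the "Good" example. The function name `find_priority_conflicts` is hypothetical, and the sketch simplifies two things: conditions overlap only when their `when` strings are identical, and any difference in `then` is treated as a conflicting action.

```python
from itertools import combinations


def find_priority_conflicts(rules):
    """Return pairs of rule ids with the same condition and priority but different actions."""
    conflicts = []
    for a, b in combinations(rules, 2):
        same_condition = a["when"] == b["when"]  # simplification: exact string match
        same_priority = a["priority"] == b["priority"]
        different_action = a["then"] != b["then"]
        if same_condition and same_priority and different_action:
            conflicts.append((a["id"], b["id"]))
    return conflicts


poisoned_rules = [
    {"id": "escalate_p0", "when": 'severity == "P0"',
     "then": {"escalation": "critical_phone"}, "priority": 100},
    {"id": "human_approval_required", "when": 'severity == "P0"',
     "then": {"require_human_approval": True}, "priority": 100},
]

print(find_priority_conflicts(poisoned_rules))
# [('escalate_p0', 'human_approval_required')]
```

Because the check reads only structured fields, the same defect written as a YAML comment would pass it unnoticed, which is the point of the comparison above.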

Convert disputed requirements into Given/When/Then and JSON Schema. Natural language conveys intent well, but holds the boundaries of acceptable behavior poorly. The phrasing "critical incidents need fast escalation" gives the model room to guess. A scenario like Given severity=P0 and owner_unresponsive=true / When escalation_deadline expires / Then use critical_phone and record human_audit_required defines a verifiable branch.
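The Given/When/Then scenario above maps directly onto an executable check. In the sketch below, `escalation_decision` is a hypothetical stand-in for the behavior under test, and the field `escalation_deadline_expired` is an assumed way to represent the When step.

```python
def escalation_decision(incident):
    """Hypothetical decision function: the behavior the scenario constrains."""
    if (
        incident["severity"] == "P0"
        and incident["owner_unresponsive"]
        and incident["escalation_deadline_expired"]
    ):
        return {"channel": "critical_phone", "human_audit_required": True}
    return {"channel": None, "human_audit_required": False}


def check_p0_scenario():
    # Given severity=P0 and owner_unresponsive=true
    incident = {
        "severity": "P0",
        "owner_unresponsive": True,
        "escalation_deadline_expired": False,
    }
    # When escalation_deadline expires
    incident["escalation_deadline_expired"] = True
    # Then use critical_phone and record human_audit_required
    result = escalation_decision(incident)
    return result == {"channel": "critical_phone", "human_audit_required": True}


print(check_p0_scenario())  # True
```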

JSON Schema closes the second half of the problem. It not only describes the desired path but also forbids invalid states, for example a P0 record without auto_escalation_channel, or the use of an integration from the forbidden_integrations list. This combination matches the SDD approach: a specification should include success criteria, constraints, and verifiable acceptance tests across the full development cycle. The GitHub Spec Kit Quickstart describes these phases as the sequence Specify → Plan → Tasks → Implement.

Resolve the conflict with a formal strategy. The strategy has three parts:

an override rule determines which requirement wins at the time edge (e.g., time_critical_override above manual_gate_for_noncritical);
a single source of truth removes the discrepancy between the specification text, the schema, and the test — if priorities are declared in YAML, reference the same hierarchy from the acceptance tests and JSON Schema, rather than introducing a parallel interpretation;
a verification invariant captures the safety of the transition: before the escalation, record severity, deadline, and owner_state; after the escalation, record channel, audit_record, and reason_code. Otherwise the system may formally "resolve" the conflict but lose traceability.
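The verification invariant from the last item can be sketched as a completeness check on two records. The field names come from the text; the record values, the audit id, and the function names are hypothetical.

```python
PRE_FIELDS = ("severity", "deadline", "owner_state")
POST_FIELDS = ("channel", "audit_record", "reason_code")


def missing_fields(record, fields):
    """List required fields that are absent or empty in the record."""
    return [name for name in fields if record.get(name) in (None, "")]


def escalation_is_traceable(before, after):
    """The transition is traceable only when both records are complete."""
    return not missing_fields(before, PRE_FIELDS) and not missing_fields(after, POST_FIELDS)


# Hypothetical records captured around one escalation.
before = {"severity": "P0", "deadline": "30s", "owner_state": "unresponsive"}
after = {"channel": "critical_phone", "audit_record": "AUD-1",
         "reason_code": "time_critical_override"}

print(escalation_is_traceable(before, after))             # True
print(missing_fields({"channel": "critical_phone"}, POST_FIELDS))
# ['audit_record', 'reason_code']
```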

Close the refactor with a reverse replay of the full cycle Specify → Plan → Tasks → Implement. Otherwise the fix remains a local guess. What to look for in the trace:

if after the patch Plan stabilizes but Tasks create incompatible actions — the defect has moved from the rules into the decomposition;
if Implement passes but the acceptance tests fail — the boundary of acceptable behavior is described incompletely, or the schema does not cover the operational effect.

Treat only a reproducible result as reliable: the same incident log, the same specification, two consecutive runs without new ask_storm, stage_regress, or priority conflicts.
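The reproducibility rule can be written down as a comparison of two run records. This is a sketch under assumptions: the record layout (`spec_digest`, `log_digest`, and the three counters) is hypothetical, and the inputs are fingerprinted with SHA-256 only to show that both runs used the same files.

```python
import hashlib


def digest(text):
    """Fingerprint an input so two runs can be shown to use identical content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def is_reproducible(run_a, run_b):
    """Two consecutive runs are reliable only with equal inputs and no new symptoms."""
    same_inputs = (
        run_a["spec_digest"] == run_b["spec_digest"]
        and run_a["log_digest"] == run_b["log_digest"]
    )
    clean = all(
        run["ask_storm"] == 0 and run["stage_regress"] == 0 and not run["priority_conflict"]
        for run in (run_a, run_b)
    )
    return same_inputs and clean


spec_text = "placeholder for the fixed specification text"
log_text = "placeholder for the incident log"
run = {
    "spec_digest": digest(spec_text),
    "log_digest": digest(log_text),
    "ask_storm": 0,
    "stage_regress": 0,
    "priority_conflict": False,
}

print(is_reproducible(dict(run), dict(run)))  # True
```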

Examples and Application

Let's take a scenario different from the previous cases: a sudden latency spike in appointments-api in production. In the poisoned version of the specification, two requirements are set at the same time: "all P0 escalate within 30 seconds" and "any escalation requires manual human approval".

What happens. If the responsible person is unavailable, Qwen Code gets stuck in the loop ESCALATE_EVENT → CHECK_OWNER → WAIT_APPROVAL → VALIDATE_ESCALATION → ESCALATE_EVENT. The deadline requires action. The manual gate forbids action. No exit rule is defined. The diagnostic run can be framed like this:

> [project script]: the commands below describe the expected checks for the poisoned-specification loop; a close runnable analog of the basic specification gate (Spec CI) is in examples/spec-ci/README.md.

qwen -p "In plan mode, analyze @specs/appointment-latency-poisoned.yaml.
Find cycles, priority conflicts, and hidden out-of-scope actions (hidden_out_of_scope). Do not change files." \
  --approval-mode plan \
  --output-format json \
  > out/appointment-latency-plan-review.json

python3 scripts/spec_ci/find_spec_loops.py \
  --spec specs/appointment-latency-poisoned.yaml \
  --out out/appointment-loop.dot

Failure control line: cycle_count > 0 && ask_storm >= 4 && escalation_path_resolved=false.
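Since find_spec_loops.py is a project script you write yourself, here is a minimal sketch of the cycle check it could perform. It assumes the state transitions have already been extracted from the specification into a dictionary; the function name `find_cycle` and the `DONE` exit state are hypothetical.

```python
def find_cycle(transitions, start):
    """Depth-first search: return the first cycle reachable from start, or None."""
    path, on_path, visited = [], set(), set()

    def visit(state):
        path.append(state)
        on_path.add(state)
        for nxt in transitions.get(state, []):
            if nxt in on_path:
                # Back edge found: cut the path at the repeated state and close the loop.
                return path[path.index(nxt):] + [nxt]
            if nxt not in visited:
                found = visit(nxt)
                if found:
                    return found
        on_path.discard(state)
        visited.add(state)
        path.pop()
        return None

    return visit(start)


poisoned = {
    "ESCALATE_EVENT": ["CHECK_OWNER"],
    "CHECK_OWNER": ["WAIT_APPROVAL"],
    "WAIT_APPROVAL": ["VALIDATE_ESCALATION"],
    "VALIDATE_ESCALATION": ["ESCALATE_EVENT"],
}

cycle = find_cycle(poisoned, "ESCALATE_EVENT")
print(" -> ".join(cycle))
# ESCALATE_EVENT -> CHECK_OWNER -> WAIT_APPROVAL -> VALIDATE_ESCALATION -> ESCALATE_EVENT
```

A non-empty result corresponds to cycle_count > 0 in the failure control line; once an exit rule is added, the same call should return None.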

flowchart TD
    Specify[Specify]
    Plan[Plan]
    Tasks[Tasks]
    WaitApproval[WAIT_APPROVAL]
    Deadlock[priority deadlock]
    Specify -->|SDD| Plan
    Specify -->|SDD| Tasks
    Plan -->|SDD| WaitApproval
    Tasks -->|SDD| WaitApproval
    WaitApproval -->|SDD backward arc| Deadlock
    Deadlock -->|priority block| Specify
    classDef danger fill:#ffcccc,stroke:#b00020,stroke-width:2px,color:#5a0000
    class Deadlock danger

Start the fix not by removing manual approval, but by clarifying its scope. For P0, introduce an override rule where response time takes precedence over prior manual approval. Move the manual check to a post-factum audit.

For P1–P3, keep the manual gate blocking — there is no comparable time risk there. A minimal patch might look like this:

rules:
  - id: p0_time_critical_override
    when: severity == "P0" && owner_unresponsive == true
    then:
      escalation: critical_phone
      human_audit_required: true
    priority: 100

  - id: human_gate_noncritical
    when: severity in ["P1", "P2", "P3"]
    then:
      require_human_approval: true
    priority: 10
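To see how the patched rules resolve, the sketch below restates them in Python and picks the highest-priority match. The `when` expressions are rewritten as lambdas, which is an assumption about how a rule engine might evaluate them, and `resolve` is a hypothetical name.

```python
RULES = [
    {"id": "p0_time_critical_override",
     "when": lambda i: i["severity"] == "P0" and i["owner_unresponsive"],
     "then": {"escalation": "critical_phone", "human_audit_required": True},
     "priority": 100},
    {"id": "human_gate_noncritical",
     "when": lambda i: i["severity"] in ("P1", "P2", "P3"),
     "then": {"require_human_approval": True},
     "priority": 10},
]


def resolve(incident, rules=RULES):
    """Return the single winning rule, None if nothing matches; raise on a priority tie."""
    matching = [rule for rule in rules if rule["when"](incident)]
    if not matching:
        return None
    top = max(rule["priority"] for rule in matching)
    winners = [rule for rule in matching if rule["priority"] == top]
    if len(winners) > 1:
        raise ValueError("priority conflict: " + ", ".join(r["id"] for r in winners))
    return winners[0]


print(resolve({"severity": "P0", "owner_unresponsive": True})["id"])
# p0_time_critical_override
print(resolve({"severity": "P2", "owner_unresponsive": False})["id"])
# human_gate_noncritical
```

In this sketch a P0 incident with a responsive owner matches neither rule and returns None, so it may be worth deciding explicitly which rule should cover that case.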

Then pin down the disputed point with a schema, so that the model cannot return to hidden manual approval through a neighboring step. In JSON Schema, require an auto-escalation channel for P0 when the owner is unavailable and, at the same time, keep the mandatory audit trail. This way you specify not only "what to do", but also "what cannot count as a successful completion":

{
  "if": {
    "properties": {
      "severity": { "const": "P0" },
      "owner_unresponsive": { "const": true }
    },
    "required": ["severity", "owner_unresponsive"]
  },
  "then": {
    "required": ["auto_escalation_channel", "human_audit_required", "reason_code"],
    "properties": {
      "auto_escalation_channel": { "const": "critical_phone" },
      "human_audit_required": { "const": true },
      "reason_code": { "const": "time_critical_override" }
    }
  }
}
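To see what this fragment accepts and rejects without installing anything, the sketch below hand-evaluates only the keywords it uses (`if`, `then`, `required`, `properties`, `const`). It is not a JSON Schema implementation and is no substitute for a real validator; the function names and sample records are hypothetical.

```python
SCHEMA = {
    "if": {
        "properties": {
            "severity": {"const": "P0"},
            "owner_unresponsive": {"const": True},
        },
        "required": ["severity", "owner_unresponsive"],
    },
    "then": {
        "required": ["auto_escalation_channel", "human_audit_required", "reason_code"],
        "properties": {
            "auto_escalation_channel": {"const": "critical_phone"},
            "human_audit_required": {"const": True},
            "reason_code": {"const": "time_critical_override"},
        },
    },
}


def matches(instance, subschema):
    """Check only the keywords used above: required, properties, const."""
    for key in subschema.get("required", []):
        if key not in instance:
            return False
    for key, rule in subschema.get("properties", {}).items():
        if key in instance and "const" in rule and instance[key] != rule["const"]:
            return False
    return True


def validate_if_then(instance, schema=SCHEMA):
    """Return True when the instance satisfies the if/then pair."""
    if matches(instance, schema["if"]):
        return matches(instance, schema["then"])
    return True  # the "if" branch did not apply, so "then" is not enforced


good = {"severity": "P0", "owner_unresponsive": True,
        "auto_escalation_channel": "critical_phone",
        "human_audit_required": True, "reason_code": "time_critical_override"}
no_audit = dict(good, human_audit_required=False)

print(validate_if_then(good), validate_if_then(no_audit))  # True False
```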

The final check should run the full loop, not just the new schema:

> [project script]: lint_spec.py and check_rule_priority.py need to be implemented in your project; a runnable analog of simple schema and coverage gates is in examples/spec-ci/README.md.

python3 scripts/spec_ci/lint_spec.py \
  --spec specs/appointment-latency-fixed.yaml \
  --atomicity

python3 scripts/spec_ci/check_rule_priority.py \
  --spec specs/appointment-latency-fixed.yaml \
  --expect-json-schema

qwen -p "Read @specs/appointment-latency-fixed.yaml and @validation.md.
Replay the specify/plan/tasks/implement phases as a review: what passes,
what remains unverified, which facts require scripts." \
  --approval-mode plan \
  --output-format json \
  > out/appointment-latency-replay-review.json

Successful recovery line: priority_conflict=false && cycle_count==0 && escalation_path_resolved=P0 && audit_required=true.
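The recovery line can be evaluated against facts collected by hand in validation.md or by your own scripts. The sketch below assumes those facts are available as a dictionary; the function name `recovery_line_holds` is hypothetical, and a missing fact is treated as a failure.

```python
def recovery_line_holds(facts):
    """priority_conflict=false && cycle_count==0 && escalation_path_resolved=P0 && audit_required=true"""
    return (
        facts.get("priority_conflict") is False
        and facts.get("cycle_count") == 0
        and facts.get("escalation_path_resolved") == "P0"
        and facts.get("audit_required") is True
    )


after_fix = {
    "priority_conflict": False,
    "cycle_count": 0,
    "escalation_path_resolved": "P0",
    "audit_required": True,
}

print(recovery_line_holds(after_fix))                          # True
print(recovery_line_holds(dict(after_fix, cycle_count=1)))     # False
```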

Summary

A poisoned specification is useful only when its core is bounded in advance: one defect, a measurable symptom, a formal patch, and a full reverse replay.

Cycles, priority conflicts, and hidden out-of-scope turn from random Qwen Code failures into managed laboratory mutations under two conditions. First, you read the trace through ask_storm, stage_regress, and phase_context_loss. Second, you verify the fix through Given/When/Then, JSON Schema, override rules, and pre/post-escalation invariants.

After such training, the specification stops being a set of wishes and becomes a stable contract. The contract can be reproducibly broken, fixed, and protected from recurrence. In the next chapter, we will formalize these rules in constitution.md as the first project referendum.

Artifacts and Readiness Criteria

| Artifact | Ready when |
| --- | --- |
| `poisoned-spec.md` (or `specs/appointment-latency-poisoned.yaml`) | exactly one controlled defect of a single class is introduced: cycle, priority conflict, or hidden out-of-scope |
| Record of the expected symptom | before running the agent, one of `ask_storm` / `stage_regress` / `phase_context_loss` is named |
| `fixed-spec.md` (or the corrected YAML) | the patch changes the verifiable rule, not only the text explanation |
| Recovery line in `validation.md` | explains which specific fact no longer reproduces after the fix |

The full track adds out/appointment-latency-plan-review.json with Qwen Code diagnostics, a JSON Schema fragment that forbids returning to hidden manual approval, and out/appointment-latency-replay-review.json after the reverse replay. Treat it as ready when the runnable Spec CI analog locally shows a fixable failure and a pass, and the Specify → Plan → Tasks → Implement replay does not return the original conflict.

Practice

Copy an existing feature specification and inject exactly one defect into it: priority conflict, cycle, or hidden out-of-scope. *Expectation: you get two versions — poisoned-spec.md and fixed-spec.md — differing by exactly one mutation; you can name the defect class in one word before running the agent.*

Describe the expected failure symptom before running the agent: what should loop, what should become ambiguous, which fact should fail. *Expectation: the symptom is recorded concretely (ask_storm after the third clarification, stage_regress from plan → specify, a Then failure in validation.md), not as "the agent will not cope".*
Fix the defect so that the patch changes the requirement, the plan, and the check, not only the text explanation. *Expectation: the diff touches at least one of requirements.md, plan.md, validation.md; the Specify → Plan → Tasks → Implement reverse replay does not return the original conflict.*

Review Questions

Why can't you introduce several defects into a poisoned specification at the same time?
How does a priority conflict differ from hidden out-of-scope?
What does a full Specify → Plan → Tasks → Implement reverse replay prove?
You injected an "escalation cycle" defect, but Qwen Code responded with "no clarifications needed" and started implementing. What does this say about your specification, and what is the next diagnostic step?