Applied Part 12. Production SDD Antipatterns: Diagnostic Map of the Applied Cycle
Status: Recommendation. This part collects antipatterns that arise specifically in the applied cycle: during duels, file arbitration, Spec CI, tiered budgets, anti-Goodhart metrics, and auto-remediation. They continue the thread from Part 20 of Volume 1: artifacts exist, checks exist, the agent works quickly, but control over the system gradually slips away.
This part serves as a diagnostic map for the applied volume. If the production pipeline has become noisy, contradictory, or defends its own speed at the user's expense, start here.
Before Reading
- Foundation from Volume 1: Part 20 shows the basic SDD antipatterns.
- Local learning case: any artifact already created from Chapters 8–11.
- Trail for `capstone/`: three lines of `blocker / owner / next_check` for the selected `high_memory_usage` package.
- Key term for the first pass: diagnostic blocker. Individual antipattern names are for reference; reading them in sequence is unnecessary.
- What to defer: converting each antipattern into an automatic CI policy.
Goal
The goal of this chapter is not to memorize a list of names, but to conduct a brief audit of an already assembled production SDD package. After the chapter, you should have three diagnostic lines: what blocks admission, who is responsible for the fix, and when it will be checked again.
For the first pass, read this chapter as a checklist, not as a dictionary of all possible failures. The catalog below is needed to recognize a found problem; the chapter can be closed with three lines of blocker / owner / next_check.
Escalations of Volume 1 Antipatterns
Some antipatterns in this catalog are not new, but production variants of those described in Part 20 of Volume 1. The difference is in scale: in the learning cycle, a basic antipattern burns a day of work; in the applied cycle, it opens a class of incidents on a live service.
| Antipattern from this chapter | Escalation from Volume 1 |
|---|---|
| Constitution as cosmetics | "Constitution that nobody opens" (Vol. 1, Part 20) |
| Poisoned specification without diff in artifacts | "Specification after code" — patch of explanation instead of artifact correction |
| `validation.md` drift after red CI | "Weakening facts after error" (Vol. 1, Part 20) |
| Shadow specification without review deadline | "QWEN.md as dump" — memory without ttl and author |
| Trail without proof annotation | "Facts on words" — absence of evidence_ref |
If you completed Part 20 of Volume 1, these entries can be read selectively — only what adds production context is here. The other 12 antipatterns in this chapter appear only in the applied cycle (duels, arbitration, budgets, anti-Goodhart, auto-remediation) and are not covered in Volume 1.
Minimal Learning Scenario
Learning Case
Before the final exam, you need to check one production SDD pipeline for noise. Choose any package from Chapters 8–11: judgment.md, validation.md, budget_network.yaml, or the readiness table. The goal is to find at least three antipatterns or explicitly prove that none exist.
Preparation
- Diagnostic checklist below.
- `book2/examples/templates/retrospective.md`: form for brief output recording.
- One artifact that has already passed through a runnable example or manual learning scenario.
Steps
- Choose one artifact and do not expand the check scope. *Expectation: the audit takes 15–30 minutes and does not turn into a review of the entire project.*
- Answer the 12 checklist questions. *Expectation: each answer is `yes`, `no`, or `not_applicable` with a short file reference.*
- For each `no`, specify the antipattern, owner, and fix deadline.
- Find at least one `[project script]` in the selected chapter and check whether it is marked as a runnable analog or an "implement yourself" interface.
- Record the result in `antipattern-audit.md` or in a retrospective: what blocks production admission, what can be left as improvement.
Control Fact
There is a list of three items: blocker, owner, next_check. If negative answers are turned only into general advice, the diagnostic map has not fulfilled its function.
How This Gets Into capstone/
Transfer three lines of blocker / owner / next_check to capstone/antipattern-audit.md. Do not fix these problems in the same file without a separate record: for the exam, it is important to see the diagnosis and the next check, not a clean result without history.
Minimal fragment:
| blocker | owner | next_check |
|---|---|---|
| readiness without `evidence_ref` | platform | repeat dry-run with reference to fixture |
| `[project script]` without runnable analog | devex | replace with `examples/real-api` or implement script |
| `manual_review_floor` not specified | sre | add guard metric before auto-mode |
Reviewable Trail
Keep antipattern-audit.md if the audit relates to the learning package or capstone/. Do not fix found problems in the same change without a separate record: first the diagnosis must be visible, then the treatment.
Key Ideas
Each antipattern below is analyzed according to one schema of three fields: Symptom — what is observed in artifacts and logs, Why it's bad — how this changes the behavior of the team or agent, How to fix — minimal steps to a state where the symptom is caught automatically.
The catalog can be read in any order: each entry is self-contained and follows the same three fields. The entries are not alternatives — they are different classes of noise that often occur together. If three or more antipatterns are found in one production package, read them as a coupled network of problems with a common root in the lost risk contract.
Examples and Application
Constitution as Cosmetics
> Escalation of antipattern from Volume 1. The basic version is "constitution that nobody opens" from Part 20 of Volume 1. In the applied cycle, the same error leads not to bad code, but to a dangerous operation being performed.
Symptom: the repository has constitution.md with immutable_principles and mutable_rules, but the gate before dangerous actions is not triggered. In judgment.md there is a reference to the forbid_unscoped_delete rule, but in team logs (logs/auto-remediation.jsonl) there is not a single call to scripts/constitution/check.py. Over the last 30 days — zero gate triggers.
Why it's bad: rules formally exist but do not constrain the agent at the moment of pressure. After several incidents, the team begins to treat the constitution as decoration. In the analysis of the next incident, a characteristic phrase surfaces: "the rule was there, just nobody checked it" — this is a signal that the constitution worked as a comment, not as a contract.
How to fix:
- connect `constitution.md` to gate checks before playbook execution (see Part 3);
- pass any dangerous operation through `scripts/constitution/check.py` or equivalent in CI;
- in `judgment.md`, record the reference to the constitution version and `decision_hash`;
- if a rule is not checked automatically, move it from `immutable_principles` to the playbook and explicitly mark it as "instruction, not gate";
- verify that over the last 30 days there is at least one log where the gate triggered and blocked an action. Zero triggers per quarter means either the constitution is not connected, or it describes non-existent risks.
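A minimal sketch of a gate placed in front of a dangerous operation. The source names `scripts/constitution/check.py` but does not show its interface, so the `Rule` type, the `gate` and `gate_triggers` functions, the encoding of `forbid_unscoped_delete`, and the log record shape are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    violates: Callable[[dict], bool]  # True when the operation breaks the rule


# Hypothetical encoding of forbid_unscoped_delete: a delete must carry a scope.
IMMUTABLE_PRINCIPLES = [
    Rule("forbid_unscoped_delete",
         lambda op: op.get("action") == "delete" and not op.get("scope")),
]


def gate(operation: dict, log: list) -> bool:
    """Return True if the operation may proceed; record every check in the log."""
    violated = [r.name for r in IMMUTABLE_PRINCIPLES if r.violates(operation)]
    log.append({"operation": operation, "violated": violated, "blocked": bool(violated)})
    return not violated


def gate_triggers(log: list) -> int:
    """Count log records where the gate actually blocked an operation."""
    return sum(1 for record in log if record["blocked"])


log = []
allowed_scoped = gate({"action": "delete", "scope": "node-7/tmp"}, log)
allowed_unscoped = gate({"action": "delete"}, log)
```

Counting `blocked` records over a window is one way to answer the "zero triggers" question: a log with checks but no blocks and a log with no checks at all point to different problems.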
Mutable Rule with ttl: ∞
Symptom: in mutable_rules there is a rule without ttl, or ttl is set to years immediately. rollback_condition is absent or formulated as "by team decision".
Why it's bad: the rule lives indefinitely and over time turns into a hidden part of the invariant. After a year, participants don't remember why it appeared and are afraid to touch it. Amendments are applied by analogy to situations for which they were not intended.
How to fix:
- set `ttl` in days, not years; first review after 30–90 days (see reference answer in `INSTRUCTOR.md`);
- formulate `rollback_condition` as a verifiable predicate: `repeat_incidents_same_node>=2`, `silent_p0_ratio>0.05`, `safety_veto=true`;
- upon `ttl` expiration, block rule application until explicit extension through referendum;
- delete rules that have not triggered once during their lifetime.
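The two structural checks above (a bounded `ttl` and a machine-checkable `rollback_condition`) can be sketched as a small audit function. The field name `ttl_days`, the 90-day ceiling used as a hard limit, and the predicate grammar (`name`, comparison operator, value) are assumptions; the source does not define the rule schema.

```python
import re

MAX_FIRST_REVIEW_DAYS = 90  # upper bound of the 30-90 day first-review window

# Assumed grammar: <signal> <operator> <value>, e.g. silent_p0_ratio>0.05
PREDICATE = re.compile(r"\s*\w+\s*(>=|<=|>|<|=)\s*[\w.]+\s*")


def audit_mutable_rule(rule: dict) -> list:
    """Return a list of problems found in one mutable rule (empty means clean)."""
    problems = []
    ttl = rule.get("ttl_days")
    if not isinstance(ttl, int) or ttl <= 0:
        problems.append("ttl missing or not a positive number of days")
    elif ttl > MAX_FIRST_REVIEW_DAYS:
        problems.append("ttl exceeds the first-review window")
    if not PREDICATE.fullmatch(rule.get("rollback_condition") or ""):
        problems.append("rollback_condition is not a verifiable predicate")
    return problems
```

A condition such as "by team decision" fails the predicate check because it names no signal that a script could evaluate.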
Poisoned Specification Without Diff in Artifacts
> Escalation of antipattern from Volume 1. The basic version is "specification after code" from Part 20 of Volume 1: the explanation changes, but the artifact does not. Here the same error manifests in the reverse task (specification recovery) — the patch treats the comment, not the root cause.
Symptom: the team trains specification recovery on poisoned specifications (see Part 2), but the patch changes only the explanation text or comment. requirements.md, plan.md, validation.md remain unchanged.
Why it's bad: the exercise stops teaching root cause localization. A week later, the same class of poisoned specification passes again — the patch did not close the root cause.
How to fix:
- in done-criteria for the lesson, require a diff in at least one artifact (`requirements.md`, `plan.md`, `validation.md`, `constitution.md`), not just in the explanation;
- run a full reverse pass `Specify → Plan → Tasks → Implement` after fixing;
- if the patch changes only the explanation text, send for retake;
- register the class of controllably defective specification in `precedents.md`, so the next similar defect is caught automatically.
ask_storm Disguised as Carefulness
Symptom: the agent in the cycle asks clarifying questions and does not proceed to solution. Control string: cycle_count > 0 && ask_storm >= 4 && escalation_path_resolved=false (ask_storm — counter of repeated clarifications without new data).
Why it's bad: questions look like caution, but signal an internal contradiction in the specification. Each human answer on the fly adds a new formulation that is not fixed anywhere and disappears with the next /clear.
How to fix:
- stop the session after the third consecutive clarification and check `requirements.md` for contradictions;
- analyze the specification as poisoned (Part 2): one defect, root cause search;
- fix answers not in chat, but in `requirements.md` or `clarifications.md`;
- do not turn agent questions into an ongoing dialogue: each clarification must either close a specification item or return to its editing.
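The control string from the symptom can be written as a plain predicate. The condition itself is taken from the text; the counter update rule (reset on new data) is an assumption derived from the definition "repeated clarifications without new data".

```python
def is_ask_storm(cycle_count: int, ask_storm: int, escalation_path_resolved: bool) -> bool:
    """cycle_count > 0 && ask_storm >= 4 && escalation_path_resolved=false"""
    return cycle_count > 0 and ask_storm >= 4 and not escalation_path_resolved


def update_ask_storm(counter: int, clarification_has_new_data: bool) -> int:
    """Assumed update rule: a clarification that brings new data resets the counter."""
    return 0 if clarification_has_new_data else counter + 1


# Four clarifications in a row without new data, no escalation path yet.
counter = 0
for has_new_data in (False, False, False, False):
    counter = update_ask_storm(counter, has_new_data)
storm = is_ask_storm(cycle_count=1, ask_storm=counter, escalation_path_resolved=False)
```

Stopping the session after the third clarification, as the list above recommends, keeps the counter below the threshold at which the predicate fires.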
stage_regress Without Explicit Cause
Symptom: implement phase returns to plan, plan phase — to specify without a recorded reason. The next day, nobody remembers why the plan was rewritten. (stage_regress — rollback to a previous SDD cycle phase without explicit cause.)
Why it's bad: the SDD cycle turns into drift. Each step back loses the context of the previous one, and after a week the project has three half-drafts of plans, none of which is closed by a fact in validation.md.
How to fix:
- explicitly record phase transitions: reviewable change record, entry in `genealogy.md`, reason;
- prohibit rollback to a previous phase without updating at least one fact in `validation.md`;
- use a project skill that on `git status` shows which phase is current and which facts are not yet closed;
- if rollback happens more than once a day, this is a signal that the specification is too small, not that the process is noisy.
phase_context_loss Between Phases
Symptom: specify fixed a decision, plan did not inherit it, implement started with a draft that never passed validation.md. (phase_context_loss — loss of context between phases.)
Why it's bad: each step works with its own picture of the world. After two or three transitions, they diverge so much that artifacts contradict each other, and any check trivially passes — it checks the wrong thing.
How to fix:
- between phases pass only file references (`@specs/...`), not retelling from chat;
- introduce project skill `check_phase_handoff`, which verifies that the plan references the current `requirements.md`, and implementation references the current plan;
- after `/clear`, start a new phase by reading `QWEN.md`, current `requirements.md`, and current `plan.md`;
- if a phase cannot explain which item of the previous phase it implements, return to it before editing code.
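One possible shape for `check_phase_handoff`, assuming each phase records a SHA-256 digest of the artifact it was derived from. The source names the skill but not its mechanism, so the digest fields `requirements_sha256` and `plan_sha256` and the dictionary layout are hypothetical.

```python
import hashlib


def digest(text: str) -> str:
    """SHA-256 of the artifact content, used as a 'current version' marker."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def check_phase_handoff(requirements: str, plan: dict, implementation: dict) -> list:
    """Return handoff problems; an empty list means each phase points at the current parent."""
    problems = []
    if plan.get("requirements_sha256") != digest(requirements):
        problems.append("plan does not reference the current requirements.md")
    if implementation.get("plan_sha256") != digest(plan.get("body", "")):
        problems.append("implementation does not reference the current plan")
    return problems


requirements_text = "R1: alert when memory usage stays above the limit"
plan = {"body": "P1 implements R1", "requirements_sha256": digest(requirements_text)}
implementation = {"plan_sha256": digest(plan["body"])}
clean = check_phase_handoff(requirements_text, plan, implementation)
stale = check_phase_handoff(requirements_text + " (edited)", plan, implementation)
```

Any edit to `requirements.md` after the plan was written changes the digest, so the stale plan is reported instead of silently reused.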
Verifier Turns into Ordinary Code Review
Symptom: Verifier writes comments like "style is not great", "better rename", "let's discuss". No specific counterexample, reference to a rule, or JSON Schema violation.
Why it's bad: the duel loses its procedural character. The Verifier stops being a formal circuit and becomes just another opinion. The Implementor responds with free text, and the dispute drifts into chat where it cannot be reproduced.
How to fix:
- prohibit the Verifier from issuing any judgment without a specific artifact: counterexample with minimality, hook log, JSON Schema, Given/When/Then;
- verdict replication must be local: another person with the same `cases/` and `metrics/` gets the same `judgment.md`;
- if the Verifier found no counterexample, record `verdict=APPROVE` and move on, rather than continuing discussion in general formulations;
- style remarks go to a separate review channel, not mixed with the duel.
File Arbitration Where Only Majority Votes
Symptom: governance_protocol is described as "2 approve out of 3", without veto from Safety and tie-breaker. At a tie, the system hangs or decides by convening date.
Why it's bad: the Safety role loses meaning. Decisions are made by votes of Verifier and Implementor, who optimize for speed; critical risks pass as "acceptable".
How to fix:
- introduce `safety_veto: critical_risk` for the Safety role;
- set `tie_breaker: safety_first_then_latest_matching_precedent`;
- check `governance_protocol` with Spec CI gate: absence of tie-breaker and veto blocks merge;
- each rejection by Safety veto is recorded in `precedents.md` with reference to the immutable rule, so a repeated similar dispute closes faster.
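A sketch of a deterministic resolution order: veto first, then majority, then tie-breaker. The role names, the `ABSTAIN` value, and the reading of `safety_first_then_latest_matching_precedent` (Safety's own vote, then the latest matching precedent, then rejection) are assumptions; the source gives only the setting names.

```python
from typing import Optional


def resolve(votes: dict, critical_risk: bool, latest_precedent: Optional[str]) -> str:
    """Resolve an arbitration round to APPROVE or REJECT without hanging on a tie."""
    if critical_risk:
        return "REJECT"  # safety_veto: critical_risk overrides any majority
    approve = sum(1 for v in votes.values() if v == "APPROVE")
    reject = sum(1 for v in votes.values() if v == "REJECT")
    if approve != reject:
        return "APPROVE" if approve > reject else "REJECT"
    # Tie: Safety's own vote first, then the latest matching precedent.
    safety_vote = votes.get("safety")
    if safety_vote in ("APPROVE", "REJECT"):
        return safety_vote
    return latest_precedent or "REJECT"  # assumption: no precedent means no admission


vetoed = resolve({"verifier": "APPROVE", "implementor": "APPROVE", "safety": "REJECT"},
                 critical_risk=True, latest_precedent=None)
tied = resolve({"verifier": "APPROVE", "implementor": "REJECT", "safety": "ABSTAIN"},
               critical_risk=False, latest_precedent="APPROVE")
```

With the veto in place, a "2 approve out of 3" majority can no longer pass a critical risk, and every tie has exactly one outcome.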
Fictitious [project script]
Symptom: in specification, checklist, or learning chapter, a command like python3 scripts/spec_ci/check_scope.py is used, but the script itself does not exist in the repository. Nobody has run it; the fact "check passed" is assumed, not observed.
Why it's bad: a false sense of control appears. CI looks strict, but the check is not executed. After several weeks, the team forgets which scripts are real and which are interfaces.
How to fix:
- next to each `[project script]` block explicitly state whether there is a runnable analog in `examples/` or if this is an "implement yourself" interface;
- in Spec CI, a separate step checks that commands mentioned in `validation.md` actually exist (`test -x path/to/script`);
- learning chapters mark `[runnable]` only for commands that pass local `python3 scripts/...`;
- if a script is needed but absent, create a ticket with a fixed implementation date, rather than leaving it as "later".
Naked KPI Without Paired Counter-Metric
Symptom: in validation.md there is a target metric (MTTR<=5m, coverage>=80%, auto_close_rate>=0.9), but no paired counter-metrics (guard metrics). The CI gate passes when KPI is met.
Why it's bad: classic Goodhart. The agent or team learns to meet the metric at any cost: closing complex incidents as easy ones, marking P0 as P2, bypassing manual review. The metric rises, real quality falls.
How to fix:
- pair each target metric with an anti-Goodhart metric (see Part 10): for `MTTR`, `silent_p0_ratio` and `manual_review_floor`; for `coverage`, `mutation_kill_rate`;
- the gate passes only when the pair is simultaneously satisfied;
- any threshold change is a risk event, not a cosmetic YAML edit; recorded with justification;
- regularly run replay on historical incidents: the new version must not worsen verdicts on analyzed cases.
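A sketch of a gate that passes only when the target metric and all of its guard metrics hold together. The thresholds reuse values mentioned in this chapter (`MTTR<=5m`, `0.05` for the silent P0 ratio, `15%` manual review); the metric key `mttr_minutes` and the `PAIRS` layout are assumptions.

```python
import operator

OPS = {"<=": operator.le, ">=": operator.ge}

# Each target metric carries its guard metrics; a missing guard fails the gate.
PAIRS = {
    "mttr_minutes": {
        "target": ("<=", 5.0),
        "guards": {
            "silent_p0_ratio": ("<=", 0.05),
            "manual_review_floor": (">=", 0.15),
        },
    },
}


def gate_passes(metric: str, observed: dict) -> bool:
    """True only if the target and every guard metric are present and satisfied."""
    spec = PAIRS[metric]
    checks = {metric: spec["target"], **spec["guards"]}
    return all(
        name in observed and OPS[op](observed[name], limit)
        for name, (op, limit) in checks.items()
    )


healthy = gate_passes("mttr_minutes",
                      {"mttr_minutes": 4.0, "silent_p0_ratio": 0.02, "manual_review_floor": 0.20})
gamed = gate_passes("mttr_minutes",
                    {"mttr_minutes": 2.0, "silent_p0_ratio": 0.20, "manual_review_floor": 0.20})
```

In the second call the KPI improves while the guard degrades, which is the pattern a naked KPI gate would have accepted.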
validation.md Drift After Red CI
Symptom: CI fails, after which the pull request author changes the threshold or deletes a fact in validation.md instead of fixing the code. Change description — "refined validation".
Why it's bad: the process begins to defend the implementation, not the user contract. This is the same error as "weakening facts after error" from Part 20 of Volume 1, but at applied scale: weakening the silent_p0 threshold from default 0.05 (AgentClinic baseline, see Appendix D.4) to 0.10 in one pull request shifts an entire class of risks.
How to fix:
- any edit to `validation.md` that weakens a check is separately reviewed as a risk contract change;
- in the change description, record the reason: incident identifier, link to post-mortem, expected effect;
- prohibit deletion of mandatory facts without recording in `precedents.md`;
- if the threshold changed more than twice last quarter, this is a signal that the goal and check are living separately.
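Detecting a weakening edit is mechanical once thresholds are in a structured form. The sketch below assumes each fact is stored as `(operator, limit)`; the source does not specify how `validation.md` encodes thresholds, so this representation and the function name `weakened` are hypothetical.

```python
def weakened(old: dict, new: dict) -> list:
    """Compare two threshold sets and list changes that loosen a check.

    A '<=' limit is weakened when it is raised, a '>=' limit when it is lowered.
    A fact that disappears from the new set is reported as deleted.
    """
    findings = []
    for name, (op, old_limit) in old.items():
        if name not in new:
            findings.append(f"{name}: fact deleted")
            continue
        new_limit = new[name][1]
        if (op == "<=" and new_limit > old_limit) or (op == ">=" and new_limit < old_limit):
            findings.append(f"{name}: {old_limit} -> {new_limit}")
    return findings


before = {"silent_p0": ("<=", 0.05), "coverage": (">=", 80)}
after = {"silent_p0": ("<=", 0.10), "coverage": (">=", 85)}
findings = weakened(before, after)
```

Tightening `coverage` is not reported; only the loosened `silent_p0` threshold would be routed to risk contract review.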
Tier Switching Without Budget
Symptom: when local-coder fails, all traffic automatically goes to frontier-reviewer. budget_keeper (budget keeper) is not configured or does not block overrun.
Why it's bad: the expensive tier consumes the daily quota in minutes and loses the ability to service real P0/P1 when they arrive. Failover switching turns into a source of secondary incidents.
How to fix:
- describe switching as ranked, not total (see Part 9);
- to `frontier-reviewer` send only tasks with `severity in [P0, P1]` and `age > N`;
- the rest go to the degradation queue and, after timeout, to the manual channel;
- emergency mode triggers by `token_health` and transitions the system to protected mode until `local-coder` recovers.
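A sketch of ranked switching with a budget check. The source leaves `N` and the queue timeout unspecified, so `min_age_minutes` and `queue_timeout_minutes` are placeholder parameters; sending an urgent task to the manual channel when the budget is exhausted is also an assumption. Emergency mode by `token_health` is not modeled here.

```python
def route(task: dict, local_coder_healthy: bool, budget_left: int,
          min_age_minutes: int = 10, queue_timeout_minutes: int = 30) -> str:
    """Pick a destination for one task; only urgent tasks may spend frontier budget."""
    if local_coder_healthy:
        return "local-coder"
    urgent = task["severity"] in ("P0", "P1") and task["age_minutes"] > min_age_minutes
    if urgent and task["estimated_tokens"] <= budget_left:
        return "frontier-reviewer"
    if urgent or task["age_minutes"] > queue_timeout_minutes:
        return "manual-channel"  # budget exhausted or queue timeout reached
    return "degradation-queue"


p0 = {"severity": "P0", "age_minutes": 20, "estimated_tokens": 1000}
p3 = {"severity": "P3", "age_minutes": 5, "estimated_tokens": 1000}
routes = [
    route(p0, local_coder_healthy=True, budget_left=0),
    route(p0, local_coder_healthy=False, budget_left=5000),
    route(p0, local_coder_healthy=False, budget_left=500),
    route(p3, local_coder_healthy=False, budget_left=5000),
]
```

The low-severity task never reaches the expensive tier, so the quota stays available for P0/P1 work during the outage.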
Shadow Specification Without Review Deadline
> Escalation of antipattern from Volume 1. The basic version is "QWEN.md as dump" from Part 20 of Volume 1. In the learning cycle, the problem is growing context; in the applied cycle, a heuristic without an author gains the force of a contract.
Symptom: QWEN.md contains a few-shot prompt or heuristic that got there "somehow on its own". Nobody on the team remembers who added it or when. The entry has no ttl, proof, or evaluation reference.
Why it's bad: the heuristic acquires the force of a contract without a review procedure. It cannot be challenged (nobody remembers the author) and cannot be precisely verified (no source). After six months, the rule is applied by analogy to cases for which it was not intended.
How to fix:
- format any heuristic in `QWEN.md` with a minimal header: author, date, proof, `ttl`, auction reference (see Part 6);
- upon `ttl` expiration, either update or quarantine with recorded reason;
ttlexpiration — either update or quarantine with recorded reason; - few-shot prompts that have not triggered in the last 50 incidents are candidates for deletion;
- shadow specification does not replace an approved requirement — it only guides the prompt in ambiguous cases.
Auto-Remediation Without Minimum Manual Review
Symptom: the agent automatically closes incidents based on metrics. manual_review_floor is not set or equals zero.
Why it's bad: even if metrics look clean, the agent gradually pushes the human out of the circuit. When a class of incidents appears that the model has not seen, there is no backup mechanism to notice the deviation. After several weeks, silent failures accumulate because nobody is there to catch them.
How to fix:
- explicitly set `manual_review_floor`: for example, "at least 15% of incidents pass through a human regardless of metrics";
- rotation is selected randomly, not by "leave the hardest for humans"; otherwise manual review doesn't see the baseline;
- manual review results go into the replay set for the next validator run;
- any reduction in `manual_review_floor` passes as a risk contract change, not as optimization.
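Random selection under a floor can be sketched with the standard library. The function name and the use of a seed for reproducible audits are assumptions; the 15% value is the example figure from the list above.

```python
import math
import random


def select_for_manual_review(incident_ids: list, floor: float, seed: int) -> set:
    """Pick a uniformly random subset whose size is at least floor * total."""
    if not 0 < floor <= 1:
        raise ValueError("manual_review_floor must be in (0, 1]")
    count = math.ceil(len(incident_ids) * floor)
    # A dedicated Random instance keeps the draw reproducible for a given seed.
    return set(random.Random(seed).sample(incident_ids, count))


incidents = [f"INC-{n}" for n in range(1, 21)]
selected = select_for_manual_review(incidents, floor=0.15, seed=42)
```

Because the draw ignores severity and model confidence, reviewers see ordinary incidents as well as hard ones, which is what keeps the baseline visible.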
Readiness 25/25 as Goal, Not Description
Symptom: the team pushes all 25 items of the readiness model to green because "we need to release". Some items are advanced without real proof annotation.
Why it's bad: readiness loses meaning as an early signal. On the next release, everything is "25/25" again, but incidents return. The scale turns into a ritual.
How to fix:
- verify readiness through `evidence_ref`, not through textual assertion;
- 23/25 with real proof is admission; 25/25 without proof on two items is not admission;
- when an item is not met, explicitly record the risk and rollback condition, rather than "coloring it green";
- regularly run readiness in reverse: which item triggered and caught a real problem, which one hasn't triggered once in a quarter — candidate for review.
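One way to express "proof over colour" as a check. The item fields (`status`, `evidence_ref`, `risk`, `rollback_condition`) and the admission rule (every green item has evidence, every non-green item has a recorded risk and rollback condition) are an assumed reading of the list above, not a definition from the readiness model.

```python
def admission(items: list) -> dict:
    """Evaluate readiness items by evidence rather than by their colour."""
    unproven = [i["id"] for i in items
                if i["status"] == "green" and not i.get("evidence_ref")]
    unrecorded = [i["id"] for i in items
                  if i["status"] != "green"
                  and not (i.get("risk") and i.get("rollback_condition"))]
    return {
        "admitted": not unproven and not unrecorded,
        "unproven": unproven,
        "unrecorded_risk": unrecorded,
    }


# 23 proven items plus 2 open items with an explicit risk record.
honest = [{"id": n, "status": "green", "evidence_ref": f"logs/run.json#item_{n}"}
          for n in range(1, 24)]
honest += [{"id": n, "status": "red", "risk": "known gap",
            "rollback_condition": "silent_p0_ratio>0.05"} for n in (24, 25)]

# 25 green items, two of them coloured without proof.
painted = [{"id": n, "status": "green",
            "evidence_ref": None if n > 23 else f"logs/run.json#item_{n}"}
           for n in range(1, 26)]

honest_result = admission(honest)
painted_result = admission(painted)
```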
Genealogy Without Update
Symptom: genealogy.md or constitution change_log exists, but the last entry is dated three months earlier. Meanwhile, five rules have changed.
Why it's bad: provenance stops working as an evidential chain. After six months, recovering "why the agent was authorized to perform this action" is impossible, and the dispute after an incident drifts into general discussion.
How to fix:
- `change_log` entry is a mandatory part of every constitution amendment, without it the gate does not pass merge;
- `parent_version` is mandatory; version skip is cause for separate review;
- `decision_hash` is calculated automatically from decision content, so substitution does not pass silently;
- monthly, run a brief audit: reconciliation of `change_log` with actual file edits. Discrepancy is recorded as a process incident.
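A sketch of an automatically computed `decision_hash` and a `parent_version` chain check. The source does not specify the hash algorithm or the entry layout; SHA-256 over canonical JSON and the `version` / `decision` field names are assumptions.

```python
import hashlib
import json


def decision_hash(decision: dict) -> str:
    """Hash the decision content in a canonical form so key order does not matter."""
    canonical = json.dumps(decision, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_change_log(entries: list) -> list:
    """Report version skips and entries whose stored hash no longer matches the content."""
    problems = []
    for previous, current in zip(entries, entries[1:]):
        if current.get("parent_version") != previous["version"]:
            problems.append(f"{current['version']}: parent_version is not {previous['version']}")
    for entry in entries:
        if entry.get("decision_hash") != decision_hash(entry["decision"]):
            problems.append(f"{entry['version']}: decision_hash mismatch")
    return problems


d1 = {"rule": "forbid_unscoped_delete", "action": "add"}
d2 = {"rule": "forbid_unscoped_delete", "action": "narrow scope"}
change_log = [
    {"version": "1.0", "parent_version": None, "decision": d1, "decision_hash": decision_hash(d1)},
    {"version": "1.1", "parent_version": "1.0", "decision": d2, "decision_hash": decision_hash(d2)},
]
clean_log = check_change_log(change_log)
```

Editing a decision after the fact without recomputing the hash surfaces as a mismatch, which is the "substitution does not pass silently" property.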
Trail Without Proof Annotation
> Escalation of antipattern from Volume 1. The basic version is "facts on words" from Part 20 of Volume 1. In the learning cycle, a trail is not needed because the feature is small. In the applied cycle, without evidence_ref it is impossible to recover on what basis the action was performed.
Symptom: the agent saves action logs, but entries have no references to source artifacts: which specification version was applied, which constitution rules, which prompt. After an incident, recovering the decision context is impossible.
Why it's bad: audit_trace_coverage (audit trail coverage) is formally close to 100%, but the trail is useless. This is the same error as a validation.md that nobody ran, but at the audit level.
How to fix:
- every trail entry must have `spec_version`, `constitution_version`, `prompt_hash`, `decision_source`, `evidence_ref`;
- Spec CI checks field completeness and blocks merge if at least one is empty;
- `evidence_ref` is a path and identifier within the artifact (`logs/node-2026-05-12.parquet#row_4123`), not a general link to a directory;
- any entry with `evidence_ref=null` is considered invalid for audit.
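The completeness check over a trail entry might look like the sketch below. The five field names come from the list above; the `path#identifier` pattern is inferred from the single example in the text and is an assumption about the general format.

```python
import re

REQUIRED = ("spec_version", "constitution_version", "prompt_hash",
            "decision_source", "evidence_ref")

# Assumed shape: a file path with an extension, '#', then an identifier inside it.
EVIDENCE_REF = re.compile(r"^[^#\s]+\.[A-Za-z0-9]+#\S+$")


def trail_entry_problems(entry: dict) -> list:
    """Return problems that make a trail entry unusable for audit."""
    problems = [f"{field} is empty" for field in REQUIRED if not entry.get(field)]
    ref = entry.get("evidence_ref")
    if ref and not EVIDENCE_REF.match(ref):
        problems.append("evidence_ref is not a path plus identifier")
    return problems


entry = {
    "spec_version": "1.4",
    "constitution_version": "2.1",
    "prompt_hash": "ab12cd34",
    "decision_source": "judgment.md",
    "evidence_ref": "logs/node-2026-05-12.parquet#row_4123",
}
ok = trail_entry_problems(entry)
vague = trail_entry_problems(dict(entry, evidence_ref="logs/"))
```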
Diagnostic Checklist
If the applied SDD pipeline has become noisy or stopped catching regressions, answer:
- Does `constitution.md` trigger as a gate before execution, not just as review after?
- Does every rule in `mutable_rules` have a `ttl` of 90 days or less?
- After a poisoned specification failure, does at least one artifact change (`requirements.md`, `plan.md`, `validation.md`)?
- Does the Verifier rely on a counterexample, hook log, or JSON Schema rather than only free text?
- Is there a Safety veto and deterministic tie-breaker in `governance_protocol`?
- Are runnable commands in the repository marked separately from `[project script]` interfaces?
- Does each target metric have a paired anti-Goodhart metric?
- When CI fails, is the code fixed rather than `validation.md` weakened?
- Does tier switching have a budget ceiling and emergency mode?
- Does every entry in `QWEN.md` have an author, proof, and `ttl`?
- Is `manual_review_floor` preserved regardless of KPI value?
- Is `evidence_ref` filled in every trail entry?
If three or more answers are negative — do not add new automation layers: file arbitration, tiered routing, new emergency rules. First remove the noise and close gaps in the current circuit.
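The threshold rule can be applied mechanically to the recorded answers. The sketch assumes answers are stored as a mapping from question number to `yes`, `no`, or `not_applicable`; the function name and the result layout are hypothetical.

```python
ALLOWED = {"yes", "no", "not_applicable"}


def summarize(answers: dict) -> dict:
    """Collect negative answers and decide whether new automation should be paused."""
    unknown = {q: a for q, a in answers.items() if a not in ALLOWED}
    if unknown:
        raise ValueError(f"unexpected answers: {unknown}")
    negative = sorted(q for q, a in answers.items() if a == "no")
    return {"negative": negative, "freeze_new_automation": len(negative) >= 3}


answers = {n: "yes" for n in range(1, 13)}
answers.update({2: "no", 6: "no", 12: "no", 9: "not_applicable"})
summary = summarize(answers)
```

Each question number in `negative` then becomes one `blocker / owner / next_check` line in the audit file.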
Summary
Applied cycle antipatterns are not catastrophic individually. The danger is their accumulation: after several releases, the team does not see the risk contract behind "green CI". The diagnostic map is the first step to repair. Each negative answer becomes a project skill, Spec CI gate, or constitution rule with a verifiable rollback_condition. Return to this chapter after every major incident: the same artifact after three months shows three different blockers.
Related Parts of Volume 1
- Part 20 of Volume 1: basic SDD antipatterns (specification after code, giant `requirements.md`, ritual `/clear`, `QWEN.md` as dump).
- Part 18 of Volume 1: antipatterns that are simultaneously security threats.
- Part 2: poisoned specifications as a training tool against most antipatterns in this chapter.
- Part 10: anti-Goodhart as protection from naked KPI.
Artifacts and Readiness Criteria
| Artifact | Ready when |
|---|---|
| `antipattern-audit.md` (or retrospective) | three fields filled: blocker, owner, next_check; each found antipattern has an owner and next checkable step |
| Answers to 12 checklist questions | completed for one selected artifact; for each negative answer there is a plan |
| Separation of blockers and improvements | audit does not fix problems in the same change without a separate diagnosis record |
Full track adds updated diagnostic checklist to [appendix-c-checklists.md](appendix-c-checklists.md), entries in precedents.md for each encountered antipattern, additions to QWEN.md for recurring ones, and Spec CI check for at least one of them. Consider it ready if Spec CI has at least one check that catches an antipattern automatically (e.g., mutable_rules without ttl), and recurring antipatterns go into precedents.md or QWEN.md only with proof and review deadline.
Practice
- Open your team's current `constitution.md` and check `mutable_rules` for `ttl` and `rollback_condition`. Find at least one rule that needs to be either updated or sent to quarantine. *Expectation: one line of `blocker / owner / next_check` appears in `antipattern-audit.md` for a specific rule; `ttl` is set in days, not years.*
- Take the latest pull request with a `validation.md` edit. Determine what changed: threshold or fact content. If threshold, check whether the change description has a link to post-mortem or incident identifier. *Expectation: for the pull request, one of two outcomes is recorded: either a justified risk contract change with incident reference, or the antipattern "`validation.md` drift after red CI" with owner and rollback deadline.*
- Go through the list of `[project script]` blocks in one selected chapter and check which commands are real and which are interfaces. Supplement the chapter README with annotations. *Expectation: for each `[project script]` it is explicitly stated "has runnable analog in `examples/...`" or "implement yourself"; there are no blocks without annotation.*
Control Questions
- Why is a mutable rule with `ttl: ∞` more dangerous than a rule without any formulation?
- How does `ask_storm` differ from good-faith clarifications, and how do you tell them apart?
- What three fields make a trail entry suitable for audit, and why is `audit_trace_coverage=100%` without them a Goodhart metric?
- On a pull request review you see that the author changed the `silent_p0` threshold from 0.05 to 0.10 and added the comment "temporarily, until stabilization". What will you do with this pull request, and what three conditions must be met before such a change can be merged?