Topic: Applied Part 12. Production SDD Antipatterns: Diagnostic Map of the Applied Cycle
Difficulty level: Medium
Estimated study time: 6-8 hours
Prerequisites: Basic knowledge of SDD (Spec-Driven Development)
Completion of Part 20 of the first volume (basic SDD antipatterns)
Experience working with SDD artifacts (requirements.md, validation.md, plan.md)
Understanding of Spec CI and duel concepts
Familiarity with tiered budgeting (Part 9)
Understanding of Goodhart metrics (Part 10)
Learning objectives: Conduct a diagnostic audit of a production SDD package and identify at least 3 antipatterns (or confirm their absence) in 15-30 minutes
Create a diagnostic map with three fields: blocker, owner, next_check for each found antipattern
Apply the 12-question diagnostic checklist to a selected artifact with documented yes/no/not_applicable answers
Distinguish production variants of antipatterns from their basic versions in the first volume and escalate understanding of the problem scale
Design at least one Spec CI gate that automatically catches the found antipattern
Overview: This part is a diagnostic map of the applied SDD cycle, collecting antipatterns that arise specifically in the production contour: during duels, file arbitration, Spec CI, tiered budgets, anti-Goodhart metrics, and auto-remediation. The goal of the chapter is not to memorize a list of names, but to conduct a short audit of an already assembled production SDD package and obtain three diagnostic lines: what blocks admission, who is responsible for fixing it, and when it will be checked again. Each antipattern is analyzed according to a three-field schema: Symptom (what is observed), Why it's bad (consequences for the team or agent), How to fix (minimum steps before automation). The antipattern catalog includes 17 items, some of which are escalations of basic antipatterns from the first volume, while others appear only in the applied cycle. Sections can be read in any order, as each record is independent and follows the same three-field structure. Warning signal: once three or more antipatterns are found in the same production package, they should be treated as a connected network of problems with a common root in a lost risk contract.
Key concepts: Diagnostic blocker: A specific obstacle preventing code from reaching production. Determined through 'no' answers on the diagnostic checklist. Each blocker requires an owner and a next check date.
Antipattern 'constitution as cosmetics': A situation where constitution rules formally exist but do not constrain the agent at execution time. The gate before dangerous actions does not trigger, scripts/constitution/check.py is not called.
Mutable rule with ttl: ∞: A rule in mutable_rules with no lifetime or an excessively long ttl. Over time it turns into a hidden part of the invariant that everyone is afraid to touch, and it gets applied by analogy to situations it does not fit.
Poisoned specification without diff in artifacts: After training on poisoned specifications, the patch only fixes the explanation text without touching requirements.md, plan.md, or validation.md.
Ask storm: Repeated clarifying questions, tracked by a counter, that yield no new data. The agent never proceeds to a resolution, creating an appearance of caution instead of real progress.
Stage regress: Rollback to a previous phase of the SDD cycle without an explicit reason or record. It leads to drift: the project accumulates multiple semi-drafts, none of which is actually closed.
Phase context loss: Loss of context between phases: specify recorded a decision, plan did not inherit it, implement started with a draft that never passed validation.md.
File arbitration without veto and tie-breaker: governance_protocol is described only as '2 approve out of 3' without critical veto from Safety and a deterministic tie-breaker. Decisions are made by voting without accounting for critical risks.
Fictitious [project script]: The specification mentions a command like python3 scripts/spec_ci/check_scope.py, but the script itself does not exist in the repository. The check is 'completed' by assumption, not observation.
Bare KPI without paired counter-metric: A target metric (MTTR, coverage, auto_close_rate) without an anti-Goodhart metric. The agent learns to hit the metric at any cost while real quality drops.
validation.md drift after red CI: After a CI failure, the PR author changes the threshold or removes a fact instead of fixing the code. The change is described as 'clarified validation'.
Switching between tiers without budget: When local-coder fails, all traffic goes to frontier-reviewer without budget_keeper. The expensive tier consumes the daily quota in minutes.
Shadow specification without ttl and author: QWEN.md contains a heuristic without author, date, or evidence. Applied by analogy to cases for which it was not designed.
Auto-remediation without manual review floor: The agent automatically closes incidents, metrics look clean, but there is no manual review. Silent failures accumulate without a backup mechanism.
Readiness 25/25 as a goal: The team pulls all readiness model items to green in advance, without real evidence_ref. The scale turns into a ritual.
Genealogy without actualization: The constitution's change_log is outdated, while several rules have changed. Provenance stops working as an evidentiary chain.
Trace without evidence ref: Action logs are saved, but without references to spec_version, constitution_version, prompt_hash. audit_trace_coverage approaches 100%, but the trace is useless.
Anti-goodhart metric: A paired metric protecting the target from manipulation. For MTTR — silent_p0_ratio and manual_review_floor; for coverage — mutation_kill_rate.
Three fields of the diagnostic map: blocker, owner, next_check. This is the minimum output required for the audit to count as complete. If negative answers are turned only into general advice, the map has not done its job.
Tier budget ceiling: The maximum token limit for a tier. When it is exceeded, the system switches to protected mode until the base tier is restored.
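To make the tier budget ceiling concrete, here is a minimal sketch of a budget keeper. The class name BudgetKeeper, its methods, and the choice to refuse all charges while in protected mode are assumptions made for illustration; the source only states that exceeding the ceiling triggers protected mode until the base tier is restored.

```python
from dataclasses import dataclass, field


@dataclass
class BudgetKeeper:
    """Hypothetical helper: tracks token spend per tier against a hard ceiling."""
    ceilings: dict[str, int]                       # tier name -> max tokens
    spent: dict[str, int] = field(default_factory=dict)
    protected: bool = False

    def charge(self, tier: str, tokens: int) -> bool:
        """Return True if the request may run, False if it is refused."""
        used = self.spent.get(tier, 0)
        if self.protected or used + tokens > self.ceilings[tier]:
            # Ceiling reached: enter protected mode instead of spending more.
            # Assumption: protected mode refuses every further charge.
            self.protected = True
            return False
        self.spent[tier] = used + tokens
        return True

    def base_tier_restored(self) -> None:
        """Leave protected mode once the base tier is healthy again."""
        self.protected = False
```

With a ceiling in place, a failover from local-coder to frontier-reviewer stops at the limit instead of consuming the daily quota in minutes.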
Important dates: Quarterly audit: Recommended time to check if a metric threshold has changed more than twice
Mutable rule review: First review — 30-90 days from creation
Shadow specification ttl: How long a heuristic in QWEN.md stays valid before automatic quarantine
Practice exercises: Name: Constitution readiness audit
Problem: Open your team's current constitution.md and check mutable_rules for ttl and rollback_condition. Find at least one rule that needs to be either updated or sent to quarantine. Create a record in antipattern-audit.md with one blocker/owner/next_check line for the specific rule.
Solution: 1. Open constitution.md and find the mutable_rules section. 2. For each rule check: is there a ttl (in days, not years), is there a rollback_condition (a verifiable predicate). 3. If ttl is absent or ∞ — this is a blocker. 4. Determine owner (who is responsible for this rule). 5. Set next_check (when to check again, minimum 30 days). 6. Record in antipattern-audit.md: | ready_without_ttl | platform | review in 60 days or send to quarantine |. 7. If the rule has not triggered once during its lifetime — candidate for removal.
Complexity: intermediate
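The checks from the constitution readiness audit above can be sketched in a few lines. This is an illustration only: the rule format (dicts with id, ttl_days, and rollback_condition keys) and the function name audit_mutable_rules are assumptions, since the source does not define how mutable_rules is stored.

```python
from datetime import date, timedelta


def audit_mutable_rules(rules, owner, today, review_days=60):
    """Return blocker/owner/next_check rows for rules lacking ttl or rollback_condition."""
    rows = []
    for rule in rules:
        ttl = rule.get("ttl_days")  # None or infinity both count as "no ttl"
        problems = []
        if ttl is None or ttl == float("inf"):
            problems.append("no ttl")
        if not rule.get("rollback_condition"):
            problems.append("no rollback_condition")
        if problems:
            rows.append({
                "blocker": f"{rule['id']}: {', '.join(problems)}",
                "owner": owner,
                "next_check": (today + timedelta(days=review_days)).isoformat(),
            })
    return rows


# Hypothetical rules, for illustration only.
example_rules = [
    {"id": "MR-1", "ttl_days": 60, "rollback_condition": "error_rate > 0.02"},
    {"id": "MR-2", "ttl_days": None, "rollback_condition": "p95 > 300ms"},
]
for row in audit_mutable_rules(example_rules, owner="platform", today=date(2025, 1, 1)):
    print(f"| {row['blocker']} | {row['owner']} | {row['next_check']} |")
```

Each printed row is one line for antipattern-audit.md; deciding whether the rule is updated or sent to quarantine remains a human call.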
Name: Pull request analysis with validation.md
Problem: Take the latest PR with a validation.md fix. Determine what changed — threshold or fact content. If threshold, check if the change description contains a link to a post-mortem or incident identifier. Record the outcome: justified risk contract change or antipattern 'validation.md drift after red CI'.
Solution: 1. Find the latest PR that changed validation.md. 2. Determine change type: (a) threshold — dangerous signal, (b) fact content — may be normal. 3. If threshold changed: (a) link to incident/post-mortem exists — justified change, (b) no link or 'temporary' comment — this is an antipattern. 4. For antipattern: who is owner? who should have checked? When is next check? 5. Record in antipattern-audit.md. 6. If change is justified — ensure it passed separate review as a risk contract change.
Complexity: intermediate
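The decision tree of the pull request analysis above can be approximated with a small classifier. This is a rough sketch under stated assumptions: thresholds are assumed to look like "coverage >= 80%", and incident evidence is assumed to be an identifier of the form INC-123 or the word "post-mortem". Neither convention comes from the source, so adjust both patterns to your tracker.

```python
import re

# Assumption: a threshold is a comparison operator followed by a number.
THRESHOLD_RE = re.compile(r"[<>]=?\s*\d")
# Assumption: evidence is an "INC-<number>" identifier or the word "post-mortem".
EVIDENCE_RE = re.compile(r"INC-\d+|post-?mortem", re.IGNORECASE)


def classify_validation_change(diff_lines, pr_description):
    """Classify a unified diff of validation.md following the exercise's steps."""
    changed = [
        line[1:] for line in diff_lines
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]
    if not any(THRESHOLD_RE.search(line) for line in changed):
        return "fact_content_change"          # may be normal
    if EVIDENCE_RE.search(pr_description):
        return "justified_risk_contract_change"
    return "validation_drift_after_red_ci"    # threshold moved without evidence
```

A "justified" result still only means a link is present; the separate review of the risk contract change from step 6 is not replaced by this check.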
Name: Verification of [project script] blocks
Problem: Go through the list of [project script] blocks in one selected chapter (chapters 8-11) and verify which commands are real and which are 'implement yourself' interfaces. Supplement the chapter README with annotations.
Solution: 1. Select a chapter (e.g., chapter 8). 2. Find all [project script] blocks. 3. For each: (a) does the file exist at the specified path? (b) if exists — is it a runnable equivalent, (c) if not — this is an interface. 4. Determine runnable commands: verify the script passes python3 scripts/... or equivalent. 5. For fictitious scripts: (a) create a ticket with a fixed implementation date, (b) mark in README 'implement yourself'. 6. Record results in antipattern-audit.md.
Complexity: intermediate
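Step 3(a) of the verification exercise above, checking whether each referenced file exists, is easy to automate. The sketch below assumes commands are written as "python3 scripts/....py" in the chapter text; the function name and the result labels are illustrative. Existence is only the first check: whether the script actually runs still has to be verified by hand.

```python
import re
from pathlib import Path

# Matches commands such as "python3 scripts/spec_ci/check_scope.py".
SCRIPT_RE = re.compile(r"python3\s+(scripts/[\w./-]+\.py)")


def classify_project_scripts(chapter_text, repo_root):
    """Map each referenced script path to 'exists' or 'implement yourself'."""
    root = Path(repo_root)
    result = {}
    for rel_path in SCRIPT_RE.findall(chapter_text):
        found = (root / rel_path).is_file()
        result[rel_path] = "exists" if found else "implement yourself"
    return result
```

Paths labelled "implement yourself" are the candidates for a ticket with a fixed implementation date and a README annotation.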
Name: Metric calibration
Problem: Find at least two target metrics (KPI) in validation.md. Check if each has a paired anti-Goodhart metric. If no pair — add a recommendation to the audit.
Solution: 1. Open validation.md. 2. Find metrics: MTTR, coverage, auto_close_rate and similar. 3. For each, check for a paired metric: (a) MTTR → silent_p0_ratio + manual_review_floor, (b) coverage → mutation_kill_rate, (c) auto_close_rate → false_negative_rate. 4. If there is no pair, this is the 'bare KPI' antipattern. 5. Determine the owner (who will add the paired metric). 6. Record: | KPI_without_counter_metric | devex | add anti-Goodhart by next sprint |.
Complexity: advanced
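The pairing check from the metric calibration exercise above can be sketched as a lookup over the text of validation.md. The mapping below lists only the MTTR and auto_close_rate pairs named in the exercise; the function name find_bare_kpis and the plain-text matching are assumptions, and the mapping is meant to be extended with your own KPIs.

```python
import re

# Pairs named in the exercise; extend this mapping for other KPIs.
COUNTER_METRICS = {
    "MTTR": {"silent_p0_ratio", "manual_review_floor"},
    "auto_close_rate": {"false_negative_rate"},
}


def find_bare_kpis(validation_text, pairs=COUNTER_METRICS):
    """Return {kpi: missing counter-metrics} for KPIs present without their pair."""
    words = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", validation_text))
    bare = {}
    for kpi, counters in pairs.items():
        missing = counters - words
        if kpi in words and missing:
            bare[kpi] = sorted(missing)
    return bare
```

Every key in the result is one 'bare KPI' finding that still needs an owner and a next_check before it goes into the audit file.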
Name: Full diagnostic audit
Problem: Select one artifact from chapters 8-11 (judgment.md, validation.md, budget_network.yaml, or readiness table). Conduct a full audit using the 12-question diagnostic checklist. Record answers (yes/no/not_applicable), for each 'no' — blocker, owner, next_check.
Solution: 1. Select an artifact and do not expand the review scope. 2. Prepare the diagnostic checklist. 3. Answer 12 questions, each with a short reference to file or evidence. 4. For each 'no': (a) determine antipattern, (b) owner, (c) next_check. 5. Minimum three records in the final table. 6. Record in antipattern-audit.md or retrospective.md. 7. Do not fix problems in the same file — the diagnosis must be visible first.
Complexity: advanced
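The rule that every 'no' must become a blocker/owner/next_check line can be enforced mechanically. The sketch below assumes the checklist answers are kept as a list of dicts; that format and the function name build_diagnostic_map are illustrative, not part of the source.

```python
def build_diagnostic_map(answers):
    """Turn checklist answers into '| blocker | owner | next_check |' rows.

    Each answer is a dict with 'question' and 'answer' keys; a 'no' answer
    must also carry 'blocker', 'owner' and 'next_check'.
    """
    allowed = {"yes", "no", "not_applicable"}
    rows = []
    for item in answers:
        if item["answer"] not in allowed:
            raise ValueError(f"unexpected answer: {item['answer']!r}")
        if item["answer"] != "no":
            continue
        missing = [k for k in ("blocker", "owner", "next_check") if not item.get(k)]
        if missing:
            # A 'no' that stays general advice does not count as a map entry.
            raise ValueError(f"{item['question']}: missing {', '.join(missing)}")
        rows.append(f"| {item['blocker']} | {item['owner']} | {item['next_check']} |")
    return rows
```

Failing loudly on an incomplete 'no' keeps the audit from quietly degrading into a list of recommendations.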
Case studies: Name: Marketplace SDD package audit: from chaos to order
Scenario: A marketplace development team has been working with the SDD process for six months. The production contour has become noisy: CI is sometimes 'green', but incidents keep returning. After each incident, the team adds new rules to constitution.md, but they don't work — the agent continues to perform dangerous operations. Meanwhile, artifacts are formally in order: there is requirements.md, validation.md with gates, judgment.md with verdicts.
Challenge: Need to understand why a formally correct system does not catch regressions. Initial surface analysis showed: about 40 rules in mutable_rules, but none with ttl. In validation.md, MTTR<=5m and coverage>=80% are specified, but no paired metrics. File arbitration works on '2 approve out of 3' principle, but the Safety role has no veto. Between phases, context is lost: plan inherits an old version of requirements.md, implement works with a draft that never passed validation.md.
Solution: A diagnostic audit was conducted using the 12 questions, and 7 antipatterns were found. The fixes recorded for six of them: 1. Constitution as cosmetics: rules exist, but scripts/constitution/check.py is not connected to the gate. A Spec CI step was added that checks for the check.py call before any merge. 2. Mutable rules with ttl: ∞: 12 rules had no review period. Each was given a 60-day ttl and a first review in 30 days. 3. Bare KPI: MTTR had no silent_p0_ratio. The paired metric was added. 4. File arbitration without veto: safety_veto: critical_risk and a tie-breaker were added. 5. Fictitious [project script]: 3 commands from the documentation did not exist. Tickets with deadlines were created. 6. phase_context_loss: a project skill, check_phase_handoff, was introduced to verify the links between phases.
Result: One month after the audit: production incidents decreased by 35%. Time to investigate each incident was halved — now there is trace with evidence_ref. Spec CI automatically catches missing check.py and missing evidence_ref. New team onboarding uses the diagnostic checklist as an entry checklist. The team understands that SDD is not about beautiful documents, but about a risk contract that is checked automatically.
Lessons learned: Formally correct documents without automatic checks are cosmetics, not a contract. A rule must either be checked automatically, or be explicitly marked as 'instruction, not gate'
An audit should take 15-30 minutes per artifact. If the audit expands to the entire project — it is no longer an audit, but a review. Focus is critically important
Three fields blocker/owner/next_check are the minimum, not the maximum. Every negative answer on the checklist must turn into a concrete action with an owner and deadline
Related concepts: Diagnostic blocker
Antipattern 'Constitution as cosmetics'
Bare KPI without paired counter-metric
File arbitration without veto
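The arbitration fix from this case study, a Safety veto plus a deterministic tie-breaker on top of majority voting, might look like the sketch below. The role names, the return strings, and the tie-break order are assumptions for illustration; the source names only safety_veto: critical_risk and the need for a tie-breaker.

```python
def arbitrate(votes, critical_risk, tie_break_order=("safety", "reviewer", "author")):
    """Decide one arbitration round.

    votes: dict of role -> 'approve' or 'reject' (hypothetical role names).
    critical_risk: True when a critical risk has been flagged for this change.
    """
    # Veto first: a Safety rejection on a critical risk cannot be outvoted.
    if critical_risk and votes.get("safety") == "reject":
        return "reject (safety veto)"
    approvals = sum(v == "approve" for v in votes.values())
    rejections = sum(v == "reject" for v in votes.values())
    if approvals != rejections:
        return "approve" if approvals > rejections else "reject"
    # Deterministic tie-breaker: the first role in the fixed order that voted decides.
    for role in tie_break_order:
        if role in votes:
            return f"{votes[role]} (tie-break: {role})"
    return "reject"  # nobody voted: fail closed
```

Without the veto branch, the first case in the tests would pass on a plain 2-of-3 majority, which is exactly the gap the audit found.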
Name: Context recovery after stage_regress
Scenario: A project implementing an auto-remediation system went through several phase rollbacks. The original specification defined manual_review_floor=15%. However, after a series of incidents, plan was rewritten three times, each time without recording the reasons. Implement started work from the final plan draft, which contained manual_review_floor=0%. validation.md was also updated, but it referenced an old requirements.md. A month later, the team could not reconstruct who had decided to exclude manual review, or why.
Challenge: After deployment, the system automatically closed 200+ incidents without a single manual review. Metrics looked excellent: auto_close_rate=0.95, MTTR=3m. However, during audit it was discovered that the model missed a class of new incidents it had not seen in the training sample. Since manual_review_floor=0%, no one noticed the accumulation of silent failures. A manual review of all 200+ incidents had to be conducted post-factum.
Solution: A mandatory process was introduced for stage_regress: any rollback requires a record in genealogy.md with reason and link to incident/discussion. Spec CI blocks merge if plan.md contains links to non-existent requirements.md. Auto-remediation now requires manual_review_floor>=15% regardless of auto_close_rate value. A project skill was created that, upon detecting stage_regress, triggers a message to the channel with a reminder about the procedure.
Result: In the last 90 days, there has not been a single stage_regress without a record. Auto-remediation now has a real floor: manual checks are distributed randomly, not 'by complexity'. Silent failures are caught in time: on average, 2-3 cases per week go to manual review and are added to the training sample.
Lessons learned: SDD cycle without records of phase transitions is drift, not a process. Every step back loses the context of the previous one
Auto-remediation without manual_review_floor is transferring control to an agent that optimizes metrics, not the user contract
Metrics look excellent when there is no person checking their meaning
Related concepts: stage_regress without explicit reason
Auto-remediation without minimum manual review
phase_context_loss between phases
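The manual review floor from this case study, with checks distributed randomly rather than by perceived complexity, can be sketched as a uniform sample over auto-closed incidents. The function name and the seed parameter are illustrative assumptions; the 15% default comes from the case.

```python
import math
import random


def select_for_manual_review(incident_ids, floor=0.15, seed=None):
    """Randomly pick at least `floor` of the auto-closed incidents for manual review."""
    if not 0 < floor <= 1:
        raise ValueError("floor must be in (0, 1]")
    ids = list(incident_ids)
    # Round up so the floor is never undershot on small batches.
    sample_size = min(len(ids), math.ceil(len(ids) * floor))
    # Uniform sampling: selection does not depend on perceived complexity.
    return random.Random(seed).sample(ids, sample_size)
```

Passing a seed makes a given selection reproducible for a later audit; leaving it as None gives a fresh random draw each time.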
Name: Escalation of 'QWEN.md as dump' in production
Scenario: A platform team has been maintaining QWEN.md for a year and a half. During this time, hundreds of heuristics, few-shot examples, and temporary rules accumulated there. When disputed situations arise, participants cite 'rules from QWEN', but no one remembers who added them, when, and on what basis. The heuristic 'if an incident looks like P0 but metrics are green — it's P0' is cited regularly, although in the last 50 incidents it has not triggered once.
Challenge: After a major incident (a payment system failure lasting 3 hours), it turned out that one participant had made a rollback decision based on a heuristic from QWEN.md that was added 'sometime by someone' and was not documented anywhere. The decision history is not reproducible: during the investigation, it is impossible to establish who recommended a particular action or on what basis.
Solution: A mandatory QWEN.md audit was conducted: each entry required author, date, evidence (link to experiment or incident), ttl. Entries without ttl were sent to quarantine. The heuristic that had not triggered in the last 50 incidents was removed. A rule was introduced: any new heuristic goes through an auction procedure — at least three participants must confirm its value before adding. Spec CI checks for ttl and author for each entry.
Result: QWEN.md was reduced from 340 entries to 89. Each remaining entry has an author, date, evidence, and ttl. Time to find a relevant heuristic was reduced by 70%. In disputed decisions, provenance can now be traced — who recommended and on what basis.
Lessons learned: A shadow specification without author and ttl eventually acquires the force of a contract that cannot be challenged and cannot be verified
The number of entries in QWEN is technical debt, not knowledge coverage. Regular cleaning is mandatory
A heuristic without evidence is an opinion that should not constrain agent behavior
Related concepts: Shadow specification without review period
Trace without evidence mark
Genealogy without actualization
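The QWEN.md cleanup from this case study, requiring author, date, evidence, and ttl and quarantining everything else, can be sketched as a triage pass. The entry format (dicts with text, author, date, evidence, and ttl_days keys) is an assumption; parsing the actual QWEN.md layout is left out because the source does not describe it.

```python
from datetime import date, timedelta

REQUIRED_FIELDS = ("author", "date", "evidence", "ttl_days")


def triage_entries(entries, today):
    """Split shadow-specification entries into (kept, quarantined)."""
    kept, quarantined = [], []
    for entry in entries:
        missing = [name for name in REQUIRED_FIELDS if not entry.get(name)]
        if missing:
            quarantined.append((entry["text"], "missing " + ", ".join(missing)))
            continue
        # Assumption: 'date' is an ISO date string and 'ttl_days' an integer.
        expires = date.fromisoformat(entry["date"]) + timedelta(days=entry["ttl_days"])
        if expires < today:
            quarantined.append((entry["text"], f"ttl expired on {expires.isoformat()}"))
        else:
            kept.append(entry["text"])
    return kept, quarantined
```

The quarantine reasons double as the blocker text for the audit file, so each quarantined heuristic arrives with its diagnosis attached.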
Study tips: Read the antipattern catalog as a checklist, not a dictionary. For the first pass, it is enough to find three antipatterns or confirm their absence — the chapter can be closed with three blocker/owner/next_check lines
Do not try to study all 17 antipatterns at once. Focus on those most relevant to your current project or artifact
When going through the diagnostic checklist, work with one artifact (judgment.md, validation.md, or budget). Expanding the review scope turns the audit into a multi-hour review
Remember: your goal is not to fix the found problems, but to record the diagnosis. Fixes are separate tasks with separate owners
Connect each antipattern with already familiar concepts from the first volume. If you remember 'a constitution that no one opens', it is easier to understand the escalation 'constitution as cosmetics' in production
Practice in pairs: one conducts the audit, the other asks clarifying questions. This helps to see blind spots and trains the ability to explain antipatterns aloud
After each major incident, return to this chapter and conduct a repeat audit of the same artifact. The same artifact often shows different blockers three months later
Make it a habit to check evidence_ref in every trace entry. It is a small thing, but without it any post-factum audit turns into a detective investigation
If three or more answers on the diagnostic checklist are negative — do not add new layers of automation. First remove the noise and close the gaps in the current contour
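The tip above about checking evidence_ref in every trace entry can be turned into a small measurement. The sketch assumes trace entries are dicts and uses the three reference fields named in the key concepts (spec_version, constitution_version, prompt_hash); the function name is illustrative.

```python
EVIDENCE_FIELDS = ("spec_version", "constitution_version", "prompt_hash")


def evidence_coverage(trace_entries):
    """Return (share of entries carrying all evidence fields, incomplete entries)."""
    incomplete = [
        entry for entry in trace_entries
        if any(not entry.get(name) for name in EVIDENCE_FIELDS)
    ]
    total = len(trace_entries)
    share = 1.0 if total == 0 else (total - len(incomplete)) / total
    return share, incomplete
```

Unlike a coverage number that counts every logged action, this share drops as soon as entries lack the references that make a trace usable as evidence.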
Additional resources: Part 20 of the first volume: Basic SDD antipatterns: specification after code, giant requirements.md, ritual /clear, QWEN.md as dump
Part 18 of the first volume: Antipatterns that are simultaneously security threats
Part 2: poisoned specifications: Training tool against most antipatterns in this chapter
Part 9: tiered budgeting: More about budget ceiling and failover between tiers
Part 10: anti-goodhart metrics: Protection against bare KPI through paired counter-metrics
Book2/examples/templates/retrospective.md: Template for a short record of diagnostic audit output
Appendix-c-checklists.md: Updated diagnostic checklist
Appendix d.4: Protecting metrics from Goodhart — threshold calibration
Summary: Applied Part 12 presents a diagnostic map for auditing the production SDD contour. Key idea: artifacts exist, checks exist, the agent works fast, but control over the system is gradually slipping away. A catalog of 17 antipatterns (some are escalations from the first volume, some are unique to the applied cycle) is analyzed according to the Symptom → Why it's bad → How to fix schema. Minimum learning scenario: select one artifact, answer the 12 questions of the diagnostic checklist, and for each 'no' record blocker/owner/next_check. The goal is not to memorize names, but to conduct a 15-30-minute audit and obtain three diagnostic lines. Applied cycle antipatterns are not catastrophic individually. Their accumulation is dangerous: over several releases, the team no longer sees the risk contract behind the 'green CI'. The diagnostic map is the first step to repair. Return to this chapter after every major incident.