Study guide: Appendix C. Applied SDD Checklists

Lesson 3 of 5 in module «Appendix C. Applied SDD Checklists»


Topic: Appendix C. Checklists for Applied SDD

Difficulty level: Medium

Estimated study time: 4-6 hours

Prerequisites: Study of the basic specification, implementation, and merge checklists from the first volume of SDD.

Understanding of CI/CD concepts and validation automation.

Familiarity with the basic structure of artifacts: requirements.md, plan.md, validation.md, QWEN.md.

Experience with Markdown and version control systems.

Learning objectives: Learn to apply specialized checklists at each stage of the applied SDD cycle.

Master the mechanisms for working with legacy systems and resolving disputed changes through file arbitration.

Understand the principles of safe configuration of auto-remediation and protection against Goodhart's Law when optimizing metrics.

Examine the methodology of the 'poisoned specification' for testing the resilience of validation processes.

Successfully prepare a project package in the capstone/ directory to pass the final production assessment.

Overview: This study guide is a detailed manual on using Appendix C during the applied cycle of the Software Design Document (SDD). The material builds upon the basic concepts of the first volume and focuses on advanced scenarios: working with legacy systems, implementing delta-arbitration, protecting metrics from manipulation, and preparing for the final audit. The guide describes strict criteria for specification gateways (Spec CI), requirements for the evidence base, and rules for stopping automation when antipatterns are detected.

Key concepts: Poisoned specification: A single defect specially introduced into the system to test the effectiveness of quality control processes. The expected symptom of failure must be described in advance, and a successful fix must affect artifacts (spec/plan/validation), not just textual explanations.

File arbitration: A method of resolving disputes in a team, where the focus shifts from textual discussions in chat to analyzing differences (diff) in the files themselves. Requires a strict distribution of roles (Coordinator, Implementer, Verifier).

Goodhart's law and anti-Goodhart metrics: The principle that a metric ceases to be useful as soon as it becomes a goal. To protect the system, each target metric must have a paired protective (anti-Goodhart) metric and a 'red button rule'.
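One way to picture the pairing is a check that reads the protective metric before the target metric. The sketch below is illustrative only: the names `MetricPair` and `evaluate`, the metric names, and the thresholds are assumptions and do not come from Appendix C.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricPair:
    """A target metric paired with the guard metric that protects it (hypothetical)."""
    target_name: str
    target_min: float   # the target should stay at or above this value
    guard_name: str
    guard_max: float    # the guard must stay at or below this value


def evaluate(pair: MetricPair, readings: dict[str, float]) -> str:
    """Return 'red_button', 'below_target', or 'ok' for one set of readings."""
    # The guard is checked first: a breached guard stops optimization
    # no matter how good the target metric looks.
    if readings[pair.guard_name] > pair.guard_max:
        return "red_button"
    if readings[pair.target_name] < pair.target_min:
        return "below_target"
    return "ok"


# Assumed example values: uptime is the target, duplicate transactions the guard.
pair = MetricPair("uptime_pct", 99.5, "duplicate_transactions", 0)
print(evaluate(pair, {"uptime_pct": 99.9, "duplicate_transactions": 3}))
```

In this sketch a high uptime reading does not mask the breached guard, which is the behavior the red button rule is meant to enforce.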

Auto-remediation: An automated process of restoring a system after a failure. Requires a pre-defined blast radius, mandatory dry-run, and strict rollback conditions recorded before execution begins.
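The three preconditions named above can be expressed as a gate that runs before any remediation action. This is a minimal sketch under assumed names (`RemediationPlan`, `may_execute`, the service names); the guide does not prescribe a data structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemediationPlan:
    """Hypothetical record of what must be fixed in writing before execution."""
    action: str
    blast_radius: frozenset[str]   # services the action is allowed to touch
    rollback_condition: str        # recorded before execution begins
    dry_run_passed: bool


def may_execute(plan: RemediationPlan, affected: set[str]) -> tuple[bool, str]:
    """Return (allowed, reason); every refusal names the failed precondition."""
    if not plan.rollback_condition.strip():
        return False, "no rollback condition recorded"
    if not plan.dry_run_passed:
        return False, "dry-run missing or failed"
    outside = affected - plan.blast_radius
    if outside:
        return False, f"outside blast radius: {sorted(outside)}"
    return True, "ok"


plan = RemediationPlan(
    action="restart",
    blast_radius=frozenset({"payments-worker"}),
    rollback_condition="roll back if the error rate rises after restart",
    dry_run_passed=True,
)
print(may_execute(plan, {"payments-worker", "ledger-db"}))
```

The second service in the example call is outside the declared blast radius, so the gate refuses the action instead of letting it spread.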

Spec CI (specification gateway): Continuous integration for requirements. Checks for the presence of stable REQ-* identifiers, the coherence of the plan with requirements, and the validity of JSON examples. A CI error must contain exhaustive information: file, line, rule, reason, and action.
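To make the five-field error format concrete, here is a minimal sketch of an identifier check. It assumes that each requirement is a Markdown bullet beginning with `- ` and that identifiers look like `REQ-01:`; the rule names `req-id-missing` and `req-id-duplicate` and the class `SpecError` are hypothetical.

```python
import re
from dataclasses import dataclass

REQ_ID = re.compile(r"^- (REQ-\d+):")


@dataclass(frozen=True)
class SpecError:
    """One CI error carrying the five fields: file, line, rule, reason, action."""
    file: str
    line: int
    rule: str
    reason: str
    action: str

    def render(self) -> str:
        return (f"{self.file}:{self.line}: [{self.rule}] {self.reason} "
                f"(action: {self.action})")


def check_requirement_ids(file: str, text: str) -> list[SpecError]:
    """Flag bullet lines without a REQ-* id and ids that are defined twice."""
    errors: list[SpecError] = []
    seen: dict[str, int] = {}
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.startswith("- "):
            continue  # only bullet lines are treated as requirements here
        match = REQ_ID.match(line)
        if match is None:
            errors.append(SpecError(
                file, number, "req-id-missing",
                "requirement has no REQ-* identifier",
                "prefix the line with a new, unused REQ-NN id"))
        elif match.group(1) in seen:
            first = seen[match.group(1)]
            errors.append(SpecError(
                file, number, "req-id-duplicate",
                f"{match.group(1)} is already defined on line {first}",
                "assign a new id; do not renumber existing ones"))
        else:
            seen[match.group(1)] = number
    return errors


sample = "- REQ-01: return 200 on success\n- reject empty payloads\n- REQ-01: log failures\n"
for error in check_requirement_ids("requirements.md", sample):
    print(error.render())
```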

Genealogy and evidence artifacts: The principle that every decision, metric, or entry in QWEN.md must have an author, a decision source (prompt/policy hash), evidence, and a confidence level.
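A genealogy entry with these four attributes could be modeled as a small immutable record. The sketch below is an assumption-heavy illustration: the field names, the use of SHA-256 for the source hash, and the 0.0 to 1.0 confidence scale are not specified by the guide.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class GenealogyEntry:
    """Hypothetical shape of one traceable decision."""
    decision: str
    author: str
    source_hash: str    # hash of the prompt or policy text behind the decision
    evidence_ref: str   # pointer to the supporting evidence
    confidence: float   # assumed scale: 0.0 (guess) to 1.0 (verified)


def make_entry(decision: str, author: str, source_text: str,
               evidence_ref: str, confidence: float) -> GenealogyEntry:
    """Build an entry, refusing records that would break traceability."""
    if not author or not evidence_ref:
        raise ValueError("author and evidence_ref are required")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    source_hash = hashlib.sha256(source_text.encode("utf-8")).hexdigest()
    return GenealogyEntry(decision, author, source_hash, evidence_ref, confidence)


entry = make_entry("Raise retry limit to 3", "alice", "prompt text v1",
                   "evidence/run-17.log", 0.8)
print(entry.author, entry.source_hash[:12], entry.confidence)
```

Rejecting incomplete entries at creation time keeps untraceable decisions out of the record instead of relying on a later review to find them.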

Practice exercises: Name: Configuring the Specification Gateway (Spec CI)

Problem: You are given a list of disparate API requirements. You need to prepare them for passing through Spec CI, ensuring the coherence of the plan and validation.

Solution: 1. Assign each requirement a unique, stable identifier in the format REQ-01, REQ-02, and so on.

2. In plan.md, create plan items that reference the corresponding REQ-* identifiers directly.

3. Describe contracts in the domain model.

4. Create JSON examples in validation.md and ensure they validate against the JSON Schema.

5. Configure CI error output so that each error contains the file, line, rule, reason, and action.
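The coherence between requirements.md and plan.md from step 2 can be checked mechanically. The following is a minimal sketch that assumes identifiers are written literally as `REQ-<digits>` in both files; the function name `coherence_gaps` and the sample text are hypothetical.

```python
import re

REQ_REF = re.compile(r"\bREQ-\d+\b")


def coherence_gaps(requirements_md: str, plan_md: str) -> dict[str, list[str]]:
    """Compare REQ-* ids defined in the requirements with ids referenced in the plan."""
    defined = set(REQ_REF.findall(requirements_md))
    referenced = set(REQ_REF.findall(plan_md))
    return {
        # requirements that no plan item covers
        "unplanned": sorted(defined - referenced),
        # plan items pointing at ids that do not exist
        "unknown": sorted(referenced - defined),
    }


requirements = "REQ-01: validate payload\nREQ-02: return 404 for missing items\n"
plan = "Step 1 (REQ-01): add schema check\nStep 2 (REQ-03): add cache\n"
print(coherence_gaps(requirements, plan))
```

Both directions matter: an unplanned requirement is a gap in the plan, while an unknown reference usually signals a renamed or deleted requirement.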

Complexity: intermediate

Name: Implementing the 'Poisoned Specification'

Problem: The team is confident that their validation process is ideal. Your task is to prove or disprove this by introducing a poisoned specification into the test set of requirements.

Solution: 1. Select one specific defect (e.g., changing a data type in an API response).

2. Introduce it into validation.md.

3. Record the expected symptom of failure in a separate document.

4. Run CI/validation.

5. If the failure is not detected, update the requirements (requirements.md) so that this type of error becomes impossible (by changing the schema or spec).
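The steps above can be sketched as a small experiment. To stay self-contained, the sketch uses a hand-rolled type check as a stand-in for real JSON Schema validation; the contract, the field names, and the function `violations` are all hypothetical.

```python
import json

# Hypothetical contract: field name -> expected Python type after json.loads.
CONTRACT = {"id": int, "status": str}


def violations(example_json: str, contract: dict[str, type]) -> list[str]:
    """Return a message for every field that is missing or has the wrong type."""
    data = json.loads(example_json)
    problems = []
    for field, expected in contract.items():
        if field not in data:
            problems.append(f"{field}: missing")
        elif type(data[field]) is not expected:
            actual = type(data[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    return problems


clean = '{"id": 7, "status": "paid"}'
poisoned = '{"id": "7", "status": "paid"}'       # the single deliberate defect
expected_symptom = "id: expected int, got str"   # recorded before the run

detected = expected_symptom in violations(poisoned, CONTRACT)
print("poison detected" if detected else "validation gap: fix spec/plan/validation")
```

The comparison against `expected_symptom` is the important part: a run that fails for some other reason does not prove that validation catches the planted defect.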

Complexity: advanced

Name: Analysis of Antipatterns in the Applied Cycle

Problem: A project consistently fails in production because the team constantly weakens validation.md when CI fails and has no restrictions on mutable_rules. Conduct an audit using the checklist.

Solution: 1. Ban edits to validation.md while tests are failing: a failing CI run is fixed in the code, not by weakening validation.

2. Find all rules in mutable_rules that have no TTL or a TTL longer than 90 days, then remove or update them.

3. Configure the red button rule (Safety veto) in governance_protocol.
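The TTL audit of mutable_rules lends itself to a short script. The guide does not show how mutable_rules is stored, so this sketch assumes a list of dictionaries with hypothetical `id` and `ttl_days` keys.

```python
MAX_TTL_DAYS = 90  # limit taken from the audit step above


def audit_mutable_rules(rules: list[dict]) -> list[str]:
    """Return one finding per rule with no TTL or a TTL above the limit."""
    findings = []
    for rule in rules:
        ttl = rule.get("ttl_days")
        if ttl is None:
            findings.append(f"{rule['id']}: no TTL")
        elif ttl > MAX_TTL_DAYS:
            findings.append(f"{rule['id']}: TTL {ttl} days exceeds {MAX_TTL_DAYS}")
    return findings


# Assumed sample data for illustration only.
rules = [
    {"id": "skip-flaky-test", "ttl_days": 14},
    {"id": "relax-latency-check"},
    {"id": "allow-legacy-format", "ttl_days": 365},
]
for finding in audit_mutable_rules(rules):
    print(finding)
```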

Complexity: intermediate

Case studies: Name: Auto-Remediation Crisis in a Payment Gateway

Scenario: The team implemented an auto-remediation script to restart hung payment processing services based on CPU metrics. The automation was intended to reduce MTTD (Mean Time To Detect) and MTTR (Mean Time To Resolve).

Challenge: During a recent failure, the script began restarting services in an infinite loop, leading to transaction duplication and database lockup. The blast radius had not been defined in advance, and the script had no stop condition for manual review.

Solution: The team urgently stopped the script and applied the 'Before Auto-Remediation' checklist. They introduced a dry-run in an isolated environment, a rollback condition recorded before execution, and a manual confirmation threshold that triggers on repeated failure. For monitoring, they added a paired anti-Goodhart metric (the number of duplicate transactions) so that uptime is not pursued blindly.

Result: The incident was contained. Auto-remediation now triggers correctly only on an isolated failure; when the blast radius expands, the system transitions to a manual confirmation mode (manual_review_floor).
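The stop condition that was missing in this incident can be sketched as a restart counter with a time window. The interpretation of manual_review_floor as "number of automatic restarts allowed within a window" is an assumption, as are the class name `RestartGuard` and the window length.

```python
from collections import deque


class RestartGuard:
    """Allow automatic restarts until a floor is hit, then demand manual review."""

    def __init__(self, manual_review_floor: int, window_seconds: float):
        self.floor = manual_review_floor
        self.window = window_seconds
        self._restarts: deque[float] = deque()  # timestamps of recent restarts

    def decide(self, now: float) -> str:
        # Forget restarts that happened before the current window.
        while self._restarts and now - self._restarts[0] > self.window:
            self._restarts.popleft()
        if len(self._restarts) >= self.floor:
            return "manual_review"   # repeated failure: stop the loop
        self._restarts.append(now)
        return "auto_restart"


guard = RestartGuard(manual_review_floor=2, window_seconds=60)
print([guard.decide(t) for t in (0, 10, 20, 100)])
```

With these assumed settings, the third failure inside one minute is handed to a human, and automatic restarts resume only after the window has cleared.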

Lessons learned: Never implement auto-remediation without a pre-recorded rollback condition.

The blast radius must be strictly documented.

Aggressive automation without protective (anti-Goodhart) metrics leads to disasters.

Related concepts: Auto-remediation

Goodhart's Law

Blast radius

Study tips: Create a physical or digital checklist template for each stage (Spec CI, Auto-remediation, etc.) and use it in your current project.

Pay special attention to the difference between '[runnable]' and '[project script]' — this is a common cause of confusion during implementation.

Use the 'Three Negative Answers' rule: if three antipattern audit items receive a negative answer, immediately stop adding new automation and fix the existing process first.

Always link practical tasks to artifacts (requirements.md, validation.md, genealogy.md) — in applied SDD, the evidence base is more important than the code.

Additional resources: Part 0 (production lab): Source text for selecting a study incident (incident-case) and understanding the structure of capstone.

Part 12 (antipattern audit): Detailed analysis of symptoms and causes of failures in the applied cycle, expanding the quick checklist.

examples/templates/ folder: Contains templates that must be studied: pr-template.md, retrospective.md, clear-prompt.md, replan-prompt.md.

Appendix C of the first volume: Basic checklists that form the foundation for the current advanced guide.

Summary: Appendix C is a strict set of validation barriers for the advanced applied SDD cycle. The main rule of the guide is that automation and optimization are unacceptable without strict control (Spec CI, anti-Goodhart metrics, dry-run). Successful completion of the final production assessment requires impeccable traceability (genealogy.md, evidence_ref), the ability to work with incidents through file diffs, and the system's ability to recognize 'poisoned specifications'. If basic processes fail, development must be suspended until they are fixed.