Study guide: Part 20. SDD Antipatterns

Lesson 2 of 5 in module «Part 20. SDD Antipatterns»


Topic: Part 20. SDD Antipatterns

Difficulty level: Medium

Estimated study time: 4-6 hours (theory 2 hours, practice 2-4 hours)

Prerequisites: SDD fundamentals (Specification-Driven Development) — parts 1-5 of the course

Understanding of git, pull requests, and basic CI/CD

Experience working with AI assistants in development (Claude, GPT, Qwen, etc.)

Basic knowledge of project structure: package.json, requirements, testing

Learning objectives: Diagnose at least 8 SDD antipatterns in a real project using the checklist from part 20

Apply specific remediation techniques for each antipattern (e.g., separating QWEN.md and specs/, introducing fact-repro in validation.md)

Review validation.md and identify testing illusions (tautologies, mirrors, snapshot deception)

Create a /clear-resilient process where a new agent can continue work without chat history

Formulate PR explanations in your own words, distinguishing human and agent responsibility

Overview: This part of the course is a diagnostic map for SDD processes that have become heavy, noisy, or useless. An antipattern in SDD looks like a correct process (files in place, checks passing, agent working fast), but gradually strips the human of control over the project. The material covers 14 specific antipatterns: from specification after code and giant requirements.md to hallucinations in agent code and testing illusions. Each antipattern includes symptoms, explanation of harm, and step-by-step remediation. The final diagnostic checklist of 8 questions allows quick assessment of process health: with three negative answers, simplification is recommended instead of adding new tools.
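The three-negative-answers rule can be sketched as a small scoring helper. This is only an illustration: the eight checklist questions are not reproduced here, so `ChecklistAnswer`, `assessProcess`, and the reading of the threshold as "three or more" are assumptions, not part of the course material.

```typescript
// Hypothetical shape: one entry per checklist question, ok = positive answer.
type ChecklistAnswer = { question: string; ok: boolean };

type Assessment = {
  negatives: number;
  recommendation: "simplify" | "continue";
};

// Assumption: "three negative answers" is read as "three or more".
const SIMPLIFY_THRESHOLD = 3;

function assessProcess(answers: ChecklistAnswer[]): Assessment {
  const negatives = answers.filter((a) => !a.ok).length;
  return {
    negatives,
    recommendation: negatives >= SIMPLIFY_THRESHOLD ? "simplify" : "continue",
  };
}
```

With four negative answers out of eight, as in the first practice exercise, the helper recommends simplifying the process rather than adding tools.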

Key concepts: Specification after code: An antipattern where the agent first implements a feature, then appends requirements.md, plan.md, and validation.md to match the finished code. Specification becomes a report instead of a steering tool. Remediation: commit draft specification before implementation, prohibit product code in specification creation sessions, explicitly show specification commit before implementation commit in PR.
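A minimal sketch of the "specification commit before implementation commit" check, assuming the commits of a feature branch are already available oldest-first with their changed files. The `Commit` type and `specPrecedesCode` are hypothetical names, and treating every non-specification file as product code is a simplification.

```typescript
// Hypothetical input: commits of one feature branch, oldest first.
type Commit = { id: string; files: string[] };

// Specification file names taken from the text above.
const SPEC_FILES = ["requirements.md", "plan.md", "validation.md"];

function isSpecFile(path: string): boolean {
  return SPEC_FILES.some((name) => path === name || path.endsWith("/" + name));
}

// True only when the first non-empty commit touches specification files
// and nothing else, i.e. the draft specification was committed on its own
// before any product code.
function specPrecedesCode(commits: Commit[]): boolean {
  const first = commits.find((c) => c.files.length > 0);
  return first !== undefined && first.files.every(isSpecFile);
}
```

A commit that mixes specification files with product code is rejected on purpose: it gives no evidence that the specification existed before the implementation.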

Giant requirements.md: A single requirements file contains dozens of items, multiple scenarios, future phases, and disputed decisions. The agent starts choosing priorities itself, the human loses boundaries. Remediation: split into phases, move future items to roadmap.md, keep only current branch, mark disputed decisions as questions.
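The split described above can be sketched as a sorting pass over requirement lines. The markers 'phase N' and 'discuss' are assumptions about how a team tags its file (they mirror the marks mentioned in the first practice exercise), and `splitRequirements` is a hypothetical helper.

```typescript
type SplitRequirements = {
  current: string[];   // stays in requirements.md
  roadmap: string[];   // moves to roadmap.md
  questions: string[]; // disputed decisions, kept as open questions
};

function splitRequirements(lines: string[]): SplitRequirements {
  const result: SplitRequirements = { current: [], roadmap: [], questions: [] };
  for (const line of lines) {
    const text = line.trim();
    if (text === "") continue; // skip blank lines
    const lower = text.toLowerCase();
    const phase = lower.match(/phase\s*(\d+)/);
    if (lower.includes("discuss")) {
      result.questions.push(text);
    } else if (phase !== null && Number(phase[1]) > 1) {
      result.roadmap.push(text); // anything tagged for a later phase
    } else {
      result.current.push(text);
    }
  }
  return result;
}
```

The output is a proposal for a human to review, not a decision: the point of the remediation is that the human, not the agent, sets the boundaries of the current phase.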

Unrun validation.md: validation.md lists well-worded facts but shows no trace of their execution, which creates a false sense of readiness. Remediation: store the command or scenario next to each fact, require the agent to list passed, failed, and unverified facts, and do not treat a fact as confirmed unless it can be reproduced.
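One possible way to model "a fact is not confirmed without reproducibility" is sketched below. The `Fact` shape and `reportFacts` are hypothetical; the course does not prescribe a data format for validation.md.

```typescript
type Fact = {
  id: string;
  statement: string;
  command?: string;              // command or scenario that reproduces the fact
  lastRun?: "passed" | "failed"; // result of the most recent recorded run
};

type FactReport = { passed: string[]; failed: string[]; unverified: string[] };

// A fact counts as passed or failed only if it has BOTH a reproduction
// command and a recorded run; everything else is reported as unverified.
function reportFacts(facts: Fact[]): FactReport {
  const report: FactReport = { passed: [], failed: [], unverified: [] };
  for (const fact of facts) {
    if (!fact.command || !fact.lastRun) {
      report.unverified.push(fact.id);
    } else if (fact.lastRun === "passed") {
      report.passed.push(fact.id);
    } else {
      report.failed.push(fact.id);
    }
  }
  return report;
}
```

Note that a fact marked as passed but lacking a command still lands in the unverified list, which is the behavior the remediation asks for.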

Weakening facts after failure: A test fails, and the agent changes the expected result in validation.md instead of fixing the code. The process then protects the agent's implementation, not the product intent. Remediation: require the agent to show the discrepancy without editing anything, review validation.md changes with extra care, record the reason for each change, and prohibit removing mandatory facts without a human decision.
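A review aid for validation.md changes might look like the sketch below, which compares two versions of the fact list. The `ExpectedFact` fields (`mandatory`, `changeReason`) and `reviewValidationChange` are assumptions made for illustration only.

```typescript
type ExpectedFact = {
  id: string;
  expected: string;      // the expected result as written in validation.md
  mandatory: boolean;
  changeReason?: string; // recorded when a human approves a change
};

type Finding = {
  id: string;
  problem: "mandatory fact removed" | "expectation changed without reason";
};

// Flags the two kinds of weakening named above: a mandatory fact that
// disappeared, and an expected result that changed with no recorded reason.
function reviewValidationChange(
  before: ExpectedFact[],
  after: ExpectedFact[]
): Finding[] {
  const afterById = new Map(after.map((f) => [f.id, f] as const));
  const findings: Finding[] = [];
  for (const old of before) {
    const current = afterById.get(old.id);
    if (current === undefined) {
      if (old.mandatory) {
        findings.push({ id: old.id, problem: "mandatory fact removed" });
      }
    } else if (current.expected !== old.expected && !current.changeReason) {
      findings.push({ id: old.id, problem: "expectation changed without reason" });
    }
  }
  return findings;
}
```

An empty result does not prove the change is fine; it only means neither of these two signals fired, so the human review still applies.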

Ritual /clear: The /clear command is invoked between phases, but afterwards the agent receives a long explanation pasted from chat. The appearance of transferability masks an actual dependency on human memory. Remediation: after /clear, provide only file references; verify understanding in a new session; extend the specifications instead of expanding the prompt.

Skill as magic button: A command invokes Qwen Code skill, but nobody reads SKILL.md or understands its decisions. The skill becomes a hidden process. Remediation: store project skills in repository, review SKILL.md as process code, write constraints, do not create a skill before 2–3 repetitions of manual process.

QWEN.md as dump: QWEN.md accumulates product requirements, stack, personal preferences, temporary tasks, and bug notes. The agent stops distinguishing permanent rules from temporary context. Remediation: product decisions go in specs/, behavior rules in QWEN.md, temporary conclusions in memory or a retrospective; outdated items are cleaned up regularly.

Hook that silently changes project: A hook formats, rewrites, or deletes files without an explicit step in the plan, so changes happen outside agent and human control. Remediation: hooks verify or log by default; formatting runs only through an explicit command; all changes are visible in git diff; when a hook blocks, it explains the reason.
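The difference between a verifying hook and a rewriting hook can be sketched with two pure functions. Everything here is hypothetical: `findUnformatted`, `hookVerdict`, and the toy formatter stand in for whatever formatter and hook runner a project actually uses.

```typescript
type Formatter = (source: string) => string;

// Toy formatter for the example: strips trailing spaces and tabs.
const stripTrailingWhitespace: Formatter = (source) =>
  source.replace(/[ \t]+$/gm, "");

// Reports which files WOULD change if formatted. It never writes anything,
// so the hook cannot alter the project behind the human's back.
function findUnformatted(
  files: Record<string, string>,
  format: Formatter
): string[] {
  return Object.entries(files)
    .filter(([, content]) => format(content) !== content)
    .map(([path]) => path)
    .sort();
}

// Blocks with an explanation instead of silently fixing the files.
function hookVerdict(unformatted: string[]): { allow: boolean; reason: string } {
  if (unformatted.length === 0) {
    return { allow: true, reason: "all files are formatted" };
  }
  return {
    allow: false,
    reason: `unformatted files: ${unformatted.join(", ")}; run the explicit format command`,
  };
}
```

The formatting itself then happens through an explicit command that a human or agent runs deliberately, so the resulting change shows up in git diff as an intended step.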

Memory as hidden source of truth: Agent makes decisions based on memory, but they are not in specs/, QWEN.md, or AGENTS.md. New participants cannot see the basis for decisions. Remediation: memory = hint, not rule; transfer recurring conclusions to reviewable files; delete outdated memory; in conflict memory vs specification — choose specification.

MCP without purpose: MCP servers are connected "for the future". Agent gets extra powers, team doesn't understand possible external actions. Remediation: connect only for specific scenario, limit tools, store configuration in reviewable place, disable experimental servers after testing.

Too large MVP: First version includes auth, roles, analytics, interface, migrations, import, integrations. Agent quickly creates many files, human cannot assess quality. Remediation: first phase proves one risk, time limit, rollback to last green state on scope creep, add features only after verified facts.

Hallucinations in agent code: The agent confidently references non-existent functions, methods, or packages. Non-existent package names are especially dangerous (slopsquatting: an attacker registers a name that agents tend to hallucinate, often one resembling a real package, and npm install then pulls malicious code). Remediation: an allowed dependencies list in tech-stack.md, a separate review step for adding dependencies, a version check on the first error, visual verification of package names, and a required reference to the definition whenever a new function is referenced.
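Visual verification of package names can be backed by a mechanical check against the allowed list. The sketch below is an assumption-laden illustration: `checkDependency`, the edit-distance threshold of 2, and the "known name plus suffix" heuristic are choices made for this example, not rules from the course.

```typescript
// Classic Levenshtein edit distance, computed with a rolling row.
function editDistance(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

type DependencyVerdict = "allowed" | "suspicious" | "needs-review";

// allowed: package names from the allowed dependencies list.
function checkDependency(name: string, allowed: string[]): DependencyVerdict {
  if (allowed.includes(name)) return "allowed";
  // A name that is almost an allowed one, or an allowed name with an
  // extra suffix, is treated as a stop signal rather than a typo.
  const nearMiss = allowed.some(
    (known) => editDistance(name, known) <= 2 || name.startsWith(known + "-")
  );
  return nearMiss ? "suspicious" : "needs-review";
}
```

A "needs-review" verdict still goes through the separate dependency review step; the check only separates look-alike names, which deserve extra suspicion, from names that are simply new.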

Testing illusions: npm test is green, but bug remains. Subtypes: tautological test (compares with same expression), mirror test (checks what was returned without independent expectation), snapshot deception (bug captured in snapshot as "correct"), test that never fails on any error. Remediation: fact-repro in validation.md, reading actual tests during review, mutation testing (Stryker for Vitest), ban snapshots for business logic.

Developer doesn't understand their PR: The PR author cannot explain the decision and forwards the agent's answers instead. Responsibility blurs, and future maintenance becomes impossible. Remediation: the author explains the PR in their own words; the reviewer asks a question that only the human may answer; "pair SDD" is encouraged; the author re-reads the git diff and formulates "I did X because Y".

Practice exercises: Name: Repository diagnosis using checklist

Problem: You are given access to a project repository that has been using SDD for 3 months. Perform diagnosis using 8 questions from the part 20 checklist. For each negative answer: identify the specific antipattern, find evidence in the repository (commit, file, PR), propose remediation with example of new state. The repository contains: requirements.md of 200 lines with marks 'phase 2' and 'discuss', validation.md with facts without commands, QWEN.md with product decisions and personal preferences, pre-commit hook that auto-formats code, 3 MCP servers in configuration, 2 of which are not used in current tasks.

Solution:

1. Question 1 (specification after code): Run git log --oneline -- requirements.md plan.md validation.md; the files were committed after the implementation. Antipattern: 'Specification after code'. Remediation: reorder the commits with git rebase -i (on an unpublished branch) and add a rule to CONTRIBUTING.md: specification before implementation.

2. Question 3 (QWEN.md vs specs/): QWEN.md contains 'we use PostgreSQL' (product decision) and 'don't use await in loops' (behavior rule). Antipattern: 'QWEN.md as dump'. Remediation: move 'PostgreSQL' to specs/architecture.md and keep only agent rules in QWEN.md.

3. Question 4 (hooks): The pre-commit hook changes files without a step in plan.md. Antipattern: 'Hook that silently changes project'. Remediation: replace it with a verification hook; formatting runs via an explicit npm run format command in CI.

4. Question 5 (MCP): 2 servers are unused. Antipattern: 'MCP without purpose'. Remediation: remove them from the configuration and leave a comment with the experiment date and the decision.

5. Question 7 (/clear): Test it: create a new session, provide only file references, and verify task understanding. If the agent doesn't understand the task, write specs/current-task.md.

Total: 4 negative answers, so the process must be simplified before adding new tools.

Complexity: intermediate

Name: Reviewing validation.md to identify illusions

Problem: You are sent validation.md and the tests for a 'discount calculation' feature. The tests pass (95% coverage). Find the testing illusions:

(1) test('10% discount', () => expect(calcDiscount(100, 10)).toBe(100 * 0.9));

(2) test('discount works', async () => { const result = await calcDiscount(200, 20); expect(result).toBe(result); });

(3) snapshot test for the price formatting function;

(4) test('does not crash', () => { expect(() => calcDiscount('abc', 'def')).not.toThrow(); }).

For each: classify the illusion subtype, explain what bug will remain unnoticed, and rewrite the test correctly.

Solution:

(1) Tautological test: 100 * 0.9 is the same expression the function computes internally. Bug: if the function simply returns price * (100 - percent) / 100, the test repeats that arithmetic, so the logic is never verified against an independently known value. Correct: const expected = 90; expect(calcDiscount(100, 10)).toBe(expected); plus a separate test for boundary values (percent = 0, 100, 101).

(2) Mirror test: expect(result).toBe(result) is always true. Bug: any result is considered correct, including undefined, null, or a calculation error. Correct: hardcode expected = 160 and add a type check.

(3) Snapshot deception: the first run created a snapshot that already contained the error. Bug: formatting '1 000,00' vs '1 000.00' was captured as correct and never verified. Correct: explicit expects with a locale; ban snapshots for business logic.

(4) Test that never fails on any error: its only assertion is that nothing is thrown. Bug: the function returns NaN, null, or the string 'NaN', and the test stays green. Correct: check valid inputs separately; check errors with expect(...).toThrow() and a specific message.

Additionally: add a fact-repro to validation.md, that is, a command that failed before the fix (e.g., calcDiscount(-10, 10) returned a negative price).
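The mirror-test problem can be demonstrated with a hand-made mutant, which is the idea behind mutation testing in miniature (this is not Stryker, only a sketch of the principle). The source never shows calcDiscount itself, so both implementations below are assumptions, and the tests are written as plain functions instead of Vitest cases to keep the example self-contained.

```typescript
type Discount = (price: number, percent: number) => number;

// Assumed correct implementation.
const calcDiscount: Discount = (price, percent) =>
  (price * (100 - percent)) / 100;

// Hand-made mutant: ignores the discount entirely.
const buggyDiscount: Discount = (price) => price;

// Mirror test: compares the result with itself. Object.is mirrors the
// semantics of toBe, so this holds for every possible return value.
const mirrorTest = (impl: Discount): boolean => {
  const result = impl(200, 20);
  return Object.is(result, result);
};

// Independent expectation: expected values are written down by hand.
const independentTest = (impl: Discount): boolean =>
  impl(100, 10) === 90 && impl(200, 20) === 160 && impl(100, 0) === 100;

const outcome = {
  mirrorCatchesMutant: !mirrorTest(buggyDiscount),           // false
  independentCatchesMutant: !independentTest(buggyDiscount), // true
  independentPassesOnCorrect: independentTest(calcDiscount), // true
};
```

A test earns its place only if some plausible wrong implementation makes it fail; the mirror test fails for none, while the hardcoded expectations fail for this mutant and pass for the assumed correct code.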

Complexity: intermediate

Name: Transforming QWEN.md 'dump' into structured process

Problem: Given a fragment of QWEN.md from a real project:

    # QWEN.md

    ## Product
    We are building a CRM for dental clinics. Main screen: appointment calendar.

    ## Stack
    React 18, Node 20, PostgreSQL 15.

    ## My preferences
    I can't stand async/await in loops, write via Promise.all.

    ## Temporary
    Bug #234: not fixing for now, client agreed.

    ## Error March 15
    Agent used lodash, though we rejected it. Do not use anymore.

    ## Agent rules
    - Always write tests before code
    - Don't change package.json without asking

Transform it into a proper structure: separate the blocks by purpose, and indicate where each block should move, which rules are outdated and should be deleted, and which require regular review.

Solution: Structuring per the part 20 rules:

1. 'Product' + 'Stack' → specs/product.md and specs/tech-stack.md (product decisions do not belong in QWEN.md).

2. 'My preferences' → delete, or transform into an objective agent behavior rule in QWEN.md: 'Prefer parallel execution of independent operations via Promise.all if order is unimportant'. A personal 'can't stand' is not an acceptable rule.

3. 'Temporary: Bug #234' → move to the retrospective or to memory with an expiration date; delete from QWEN.md 2 weeks after the bug is closed.

4. 'Error March 15' → if the rule is still relevant ('do not use lodash'), move it to specs/dependencies.md with a justification; if agents no longer violate it, delete it as outdated.

5. 'Agent rules' → keep in QWEN.md and strengthen. 'Always write tests before code' → clarify as 'Follow TDD: fact in validation.md → test → implementation'. 'Don't change package.json without asking' → strengthen to 'Adding a dependency is a separate step with tech-stack.md review'.

Review cadence: QWEN.md monthly; specs/ on architecture change; memory weekly for cleanup.

Complexity: intermediate

Name: Building a /clear-resilient process

Problem: You have been working on 'payment system integration' feature for 2 weeks. Chat history contains 50+ messages with clarifications, deviations from plan, compromises. Tomorrow a new developer (and new agent) joins the project. Create a minimal set of files that allows continuing work after /clear without retelling from chat. Consider: current phase — webhook testing, known problem — sandbox responds 202 instead of 200, accepted decision — retries with exponential backoff (not in original specification).

Solution: Create the following files:

1. specs/payment-integration/current-phase.md: 'Phase 3.3: Webhook testing. Status: in progress. Blocker: sandbox returns 202 instead of documented 200. Decision: retries with exponential backoff (max 5 attempts, base delay 1s, multiplier 2). Decision made 2024-01-15, reason: sandbox environment of payment system incompatible with documentation, production not affected. Next step: validation of retries under load.'

2. specs/payment-integration/decisions.md: a record of the decision with context, alternatives (waiting for a payment system fix was rejected because the timeline is unknown), and a human signature.

3. validation.md: update the fact 'Webhook handles 202 from sandbox' with a curl command for reproduction, and add the fact 'Retries do not exceed 30s total' with a load command.

4. QWEN.md: add the rule 'For external API integrations: document discrepancies with documentation in specs/<integration>/discrepancies.md'.

5. Verification: a new session receives only references to these 4 files plus the task 'continue phase 3.3'. If the agent suggests changing the retries without reading decisions.md, the process is not resilient and the specification needs strengthening.
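The retry parameters and the fact 'Retries do not exceed 30s total' can be checked against each other with a little arithmetic. The sketch below assumes that "max 5 attempts" counts the first call, that the wait before each further attempt grows by the multiplier, and that request time itself is ignored; `RetryPolicy`, `backoffDelays`, and `totalWaitMs` are hypothetical names.

```typescript
type RetryPolicy = {
  maxAttempts: number; // total attempts, including the first call
  baseDelayMs: number; // wait before the second attempt
  multiplier: number;  // growth factor for each further wait
};

// Waits before attempts 2..maxAttempts; the first attempt is immediate.
function backoffDelays(policy: RetryPolicy): number[] {
  const delays: number[] = [];
  for (let retry = 0; retry < policy.maxAttempts - 1; retry++) {
    delays.push(policy.baseDelayMs * Math.pow(policy.multiplier, retry));
  }
  return delays;
}

function totalWaitMs(policy: RetryPolicy): number {
  return backoffDelays(policy).reduce((sum, delay) => sum + delay, 0);
}

// Parameters from current-phase.md above.
const webhookPolicy: RetryPolicy = { maxAttempts: 5, baseDelayMs: 1000, multiplier: 2 };
// backoffDelays(webhookPolicy) gives [1000, 2000, 4000, 8000]
// totalWaitMs(webhookPolicy) gives 15000, within the 30 s fact
```

Under the other reading, five retries after the initial call (six attempts), the waits sum to 31000 ms and the 30 s fact would fail, so decisions.md should state which reading was intended.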

Complexity: advanced

Case studies: Name: MVP crash: when agent built too much in 48 hours

Scenario: An EdTech startup commissioned development of an online course platform. Request: 'MVP in 2 weeks — registration, course viewing, basic analytics for instructors'. Agent (Claude Code with large context access) generated in 48 hours: full authorization with roles (admin, instructor, student, guest), field-level permission system, analytics dashboard with 12 widgets, data migration from CSV, SendGrid and Stripe integration. Everything 'worked' — npm test green, 150+ files, 80% coverage.

Challenge: Founder could not explain how the permission system worked: 'Agent said it's more secure'. Under first real load (20 simultaneous registrations) discovered: race condition in email uniqueness check, missing transactions in critical operations, 'analytics' calculated metrics client-side over full dataset. 80% coverage was an illusion — tests verified function existence, not correctness. Attempting to fix one error broke three others due to hidden dependencies. Project went for full rewrite after 6 weeks.

Solution: Applying the part 20 antipatterns:

1. 'Too large MVP': the first phase should prove one risk. In this case: can we quickly create and display a course? Everything else belongs to subsequent phases.

2. 'Testing illusions': fact-repro was introduced, so each fact in validation.md is accompanied by a command that fails before the fix. Mutation testing (Stryker) revealed that 60% of the 'covering' tests did not catch mutations.

3. 'Developer doesn't understand their PR': a rule was introduced that the founder must explain each PR in their own words, otherwise the merge is blocked.

4. 'Specification after code': the rewrite started with a draft specification, 'one course, one user, one page'.

Result: Rewritten MVP (effectively 'nano-MVP') — email registration, creating course with text, viewing list — was ready in 3 days with 12 files. Founder understood every decision. Phase 2 (analytics) revealed that original 12 widgets were not needed: instructors only asked 'how many students started and finished'. Total time to first paying user reduced from 10 weeks to 4, support cost — 8 times lower.

Lessons learned: Code coverage as a number is an illusion metric if tests don't verify behavior. Mutation testing is mandatory for critical paths.

An agent can create a 'working' project that a human cannot maintain. Human control is measured by ability to explain PR, not code generation speed.

MVP is an experiment to verify one risk, not a mini-version of full product. Each phase must have one measurable risk and a fact verifying it.

Related concepts: Too large MVP

Testing illusions

Developer doesn't understand their PR

Specification after code

Name: Slopsquatting attack: when agent hallucination became a vulnerability

Scenario: A team of 5 developers used SDD for a document processing microservice. Agent suggested a PDF parsing library — 'pdf-parse-pro' instead of known 'pdf-parse'. Developer didn't visually check the name, npm install worked, tests passed (agent wrote stubs for missing methods). Package turned out malicious: when parsing documents above 10 MB, it sent contents to external server. Discovered 3 weeks after production deployment.

Challenge: Vulnerability arose at intersection of two antipatterns: 'Hallucinations in agent code' (non-existent package, name similar to real one) and 'Testing illusions' (agent stubs masked missing real functionality). Standard SDD process didn't include separate dependency review — they were added as part of 'implement feature'. tech-stack.md existed but didn't contain allowed dependencies list.

Solution: Implementing multi-layered defense per part 20:

1. tech-stack.md: an explicit allowed dependencies list with versions, justification, and alternatives.

2. Adding a dependency is a separate review step, equivalent to an architecture change.

3. Visual verification: a name that differs from a familiar one by one letter is a stop signal; it requires a search in the official registry and a check of downloads and age.

4. On the first type or runtime error: a mandatory check against package.json and verification that the package exists.

5. Snyk and npm audit integration in CI, blocking PRs that add new dependencies without review.
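The CI rule in step 5 could be approximated by comparing the dependency maps of the base branch and the PR against the allowed list. This is a sketch under assumptions: `newDependencies`, `unreviewedDependencies`, and the `AllowedDependency` shape are hypothetical, and reading package.json or parsing tech-stack.md is left out.

```typescript
// "dependencies" section of package.json: package name -> version range.
type DependencyMap = Record<string, string>;

// Hypothetical entry of the allowed list kept in tech-stack.md.
type AllowedDependency = { name: string; justification: string };

// Names present in the PR but absent on the base branch.
function newDependencies(base: DependencyMap, head: DependencyMap): string[] {
  return Object.keys(head)
    .filter((name) => !Object.prototype.hasOwnProperty.call(base, name))
    .sort();
}

// New names that are not on the allowed list; a non-empty result
// would block the PR until the dependency review step has happened.
function unreviewedDependencies(
  base: DependencyMap,
  head: DependencyMap,
  allowed: AllowedDependency[]
): string[] {
  const allowedNames = new Set(allowed.map((d) => d.name));
  return newDependencies(base, head).filter((name) => !allowedNames.has(name));
}
```

The check makes the review step unavoidable rather than replacing it: adding the name to the allowed list, with a justification, is itself a reviewable change.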

Result: After implementation over 6 months: 2 attempts to add similar names ('expresss' instead of 'express', 'lodash-es-extra') blocked at CI stage. Time to add legitimate dependency increased from 0 to 15 minutes (necessity discussion). 'Shadow' dependencies reduced by 40%. Team consciously rejected 3 'convenient' packages in favor of built-in Node.js solutions.

Lessons learned: Agent hallucinations are not just code quality errors, but security vectors. Package names require same rigor as secrets and access credentials.

Agent stubs for non-existent APIs create false sense of functionality. Test must fail before implementation, otherwise it's not TDD but self-deception.

'Separate review step for dependencies' process seems excessive until you encounter an incident. Optimal review frequency is not zero, but conscious.

Related concepts: Hallucinations in agent code

Testing illusions

Skill as magic button

MCP without purpose

Name: Regaining control after 'ritual /clear'

Scenario: A team of 2 (tech lead and product manager) developed a data visualization tool for 4 months. Process 'worked': /clear every 3-4 days, then 10-minute context retelling in chat. When tech lead went on vacation, product manager tried to continue with new agent. New session didn't understand: which phase is active, why this or that data format was chosen, what compromises were made. 2 weeks went to context recovery, project froze.

Challenge: Root problem: /clear was used as a 'cleansing ritual', but process fully depended on tech lead's memory in chat. No decisions, compromises, deviations from plan were recorded in reviewable files. Antipatterns: 'Ritual /clear', 'Memory as hidden source of truth', partially 'QWEN.md as dump' (temporary conclusions accumulated in agent memory).

Solution: Radical transformation per part 20:

1. After /clear: only file references, no 'reminders' from chat.

2. Introduce specs/decisions/ with a template: context, considered options, chosen decision, reason, date, signature.

3. Each phase gets a separate specs/phase-N.md with current status, blockers, and next step.

4. Transferability check: every week one developer started a new session with /clear and only files, verifying task understanding.

5. Agent memory: periodic cleanup; recurring conclusions migrate to specs/ or QWEN.md.
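The decision template from step 2 can be expressed as a type with a completeness check. The field names follow the template listed above; `DecisionRecord` and `missingFields` are hypothetical, and how records are stored in specs/decisions/ is left open.

```typescript
// Fields follow the template: context, considered options,
// chosen decision, reason, date, signature.
type DecisionRecord = {
  context: string;
  options: string[];
  decision: string;
  reason: string;
  date: string;
  signature: string;
};

// Returns the template fields that are absent or empty, so an
// incomplete record can be sent back before it is merged.
function missingFields(record: Partial<DecisionRecord>): string[] {
  const required: (keyof DecisionRecord)[] = [
    "context",
    "options",
    "decision",
    "reason",
    "date",
    "signature",
  ];
  return required.filter((key) => {
    const value = record[key];
    return value === undefined || value.length === 0;
  });
}
```

A record without a reason or a signature is exactly the kind of decision that later lives only in someone's memory, so the check targets those fields as much as the decision itself.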

Result: After 3 months: new developer onboarded in 2 hours (reading specs/decisions/ + current phase), instead of 2 weeks. Tech lead vacation no longer blocked project. Discovered side effect: recorded decisions allowed rejecting 40% of agent suggestions as 'already considered and rejected for reason X'. Development speed decreased by 15% (documentation time), but predictability increased by an order of magnitude.

Lessons learned: /clear is an agent tool, not a process. Process is resilient if new session understands task from files, not from human chat.

Agent memory is valuable as a hint, but dangerous as a rule. Recurring conclusions must migrate to reviewable files.

Recording decisions 'slows down' development, but 'speeds up' team scaling and reduces dependency on specific people.

Related concepts: Ritual /clear

Memory as hidden source of truth

QWEN.md as dump

Developer doesn't understand their PR

Study tips: Go through material sequentially, but after each antipattern pause and check your current project — diagnosis in real context reinforces better than theory

Keep an 'anti-diary': for each antipattern found in your practice record specific file/commit/PR, remediation, and check date in a month

Study with a colleague in pairs: one finds antipattern, other verifies if it's really that or false positive — develops critical thinking

Use final checklist as weekly routine, not one-time diagnosis; set reminder in calendar

For visual learners: create a mind map of the 14 antipatterns, the connections between them (e.g., 'ritual /clear' → 'memory as source of truth'), and examples from your project

For practitioners: don't try to fix all antipatterns at once; pick 3 with highest negative impact per your checklist, implement in a week, then next ones

For auditory learners: explain each PR description aloud before merge; if 'I did X because Y' doesn't add up, the PR is not ready

Additional resources: Part 16 of course (reviewing antipatterns): Same errors from reviewer perspective: how to catch antipattern in someone else's pull request

Part 18 of course (security): Antipatterns that are simultaneously security threats: secrets in specifications, MCP without review, weakened validation.md

Part 9 of course (fact matrix): Matrix 'feature type → fact level' that directly cures 'weak validation.md' antipattern

Part 22 of course (pair SDD): Pattern 'one writes specification, second reviews before implementation' — prevention for 'developer doesn't understand their PR'

Stryker Mutator (for Vitest): Mutation testing tool for identifying testing illusions, https://stryker-mutator.io/

npm audit and Snyk: Tools for automatic dependency checking, a supplement to manual package name review

Appendix C of the course (PR template): PR description template requiring explanation in own words

Summary: SDD antipatterns are habits that look like a correct process but break the result. Key difference from ordinary errors: files in place, checks passing, agent working fast, but human gradually loses control. Part 20 provides a diagnostic map of 14 antipatterns: from specification after code and giant requirements.md to hallucinations in agent code and testing illusions. Each antipattern has clear symptoms, explanation of harm, and specific remediation. Final checklist of 8 questions allows quick assessment of process health: with three negative answers the rule is simple — don't add new tools, simplify the process. Main principle: SDD works when agent is guided by specification, not replacing it; when human understands their PR, not forwarding agent's answers; when new session after /clear continues work from files, not from chat memory.