Study guide: Part 9. Feature Validation: From Specifications to Facts

Lesson 2 of 5 in module «Part 9. Feature Validation: From Specifications to Facts»

Topic: Part 9. Feature Validation: From Specifications to Facts

Difficulty level: Medium

Estimated study time: 4-6 hours (theory 2 hours, practice 2-4 hours)

Prerequisites: Basic understanding of Git and branch workflow

Experience writing tests (unit/integration)

Familiarity with TypeScript and basic npm commands

Understanding of client-server architecture (HTTP, REST)

Experience working with markdown documentation

Recommended: familiarity with materials from Parts 1-8 of the course (SDD, specs, roadmap)

Learning objectives: Compose validation.md with clear, verifiable facts for any feature, distinguishing facts from prose wishes

Select the optimal level of facts (examples, invariants, properties, contracts) based on the risk matrix by feature type

Apply the fact lifecycle statuses (draft → mandatory → implemented → deferred) for quality control before merge

Form an evidence bundle for review, including fact statuses, command traces, and manual check results

Use agents (Qwen Code) for automatic code reconciliation with validation.md and identifying specification deviations

Overview: This part of the course is dedicated to the critical transition from text specifications to verifiable facts — a mechanism that turns intentions into proof of feature readiness for merge. Specifications explain what should be done, but do not by themselves guarantee correct implementation. Facts are executable or unambiguously verifiable statements that a machine or human can confirm without re-interpretation. The material covers four levels of facts, a risk matrix for choosing the right density of checks, the structure of validation.md, the fact lifecycle, integration with CI/CD, manual and automatic checks, as well as forming an evidence bundle for review. Special attention is given to working with code-writing agents: since the model may interpret the same specification differently, facts become the only reliable gate for merge.

Key concepts: Fact: An executable or unambiguously verifiable statement that requires no interpretation. Examples: npm run typecheck exits with code 0; GET / returns 200; response contains <h1>AgentClinic</h1>. Contrasted with prose wishes like "make sure the page looks good".

Specification vs fact: Specification guides development, explaining intent and boundaries. Facts gate merge by providing objective evidence. Short formula: "Specifications guide. Facts gate merge."

Four levels of facts: Examples (concrete input-output pairs: one curl, one test), Invariants (always-true statements: re-running migration does not change schema), Properties (class-of-cases verification: any rating outside 1..5 is rejected), Contracts (precondition → action → postcondition: if session is unauthenticated, GET /dashboard redirects to /login).
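The difference between an example, a property, and a contract can be sketched in a few lines. This is a minimal illustration, not the course's code: `isValidRating`, `guardDashboard`, and the 302 status are assumptions introduced here to mirror the rating and dashboard examples above.

```typescript
// Hypothetical validator, used only for illustration.
function isValidRating(value: number): boolean {
  return Number.isInteger(value) && value >= 1 && value <= 5;
}

// Hypothetical route guard modelling GET /dashboard.
type Session = { userId: string } | null;
type GuardResult = { status: number; location?: string };

function guardDashboard(session: Session): GuardResult {
  // Precondition: no authenticated session. Postcondition: redirect to /login.
  // The 302 status is an assumption; the lesson only says "redirects".
  if (session === null) {
    return { status: 302, location: "/login" };
  }
  return { status: 200 };
}

// Example level: one concrete input-output pair.
const exampleHolds: boolean = isValidRating(3) === true;

// Property level: a whole class of inputs (here, sampled integers
// outside 1..5 plus two non-integers) must be rejected.
function propertyHolds(): boolean {
  for (let n = -100; n <= 100; n++) {
    const outside = n < 1 || n > 5;
    if (outside && isValidRating(n)) return false;
  }
  return !isValidRating(2.5) && !isValidRating(Number.NaN);
}

// Contract level: precondition -> action -> postcondition.
function contractHolds(): boolean {
  const res = guardDashboard(null);
  return res.status === 302 && res.location === "/login";
}
```

The example pins one case, the property loops over a class of cases, and the contract ties a precondition to a postcondition. An invariant would additionally assert that repeating an action leaves some state unchanged.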

Risk matrix: A tool for selecting the minimally sufficient level of facts by feature type. Visual changes require manual facts; data migrations — invariants and properties; authorization — properties and contracts; payments — examples and contracts. The matrix goal is to uncover missed checks, not create bureaucracy.

Validation.md: The central artifact of feature validation. Contains a set of facts with command/check, expectation, responsible party (automatic/manual check), and status. Not a checklist, but a set of facts for merge gating.

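Fact lifecycle: Draft (proposed, not finalized) → Mandatory (accepted as feature criterion) → Implemented (has test, command, or confirmation). Deferred (consciously moved to a future phase with an explanation) is the status a fact takes when it is postponed rather than implemented in the current phase. Keeping the statuses separate helps not mix intent and evidence.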
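One way to make the lifecycle enforceable is to treat it as data and compute which facts still block a merge. The sketch below is an assumption-laden illustration: the transition table and the rule that draft facts block a merge are choices made here, since the lesson names the statuses but does not prescribe exact rules.

```typescript
type FactStatus = "draft" | "mandatory" | "implemented" | "deferred";

interface Fact {
  id: string;
  status: FactStatus;
  // Expected when status is "deferred": why the fact moved to a later phase.
  deferralReason?: string;
}

// Transitions assumed for this sketch only.
const allowedTransitions: Record<FactStatus, FactStatus[]> = {
  draft: ["mandatory", "deferred"],
  mandatory: ["implemented", "deferred"],
  implemented: [],
  deferred: ["mandatory"],
};

function canTransition(from: FactStatus, to: FactStatus): boolean {
  return allowedTransitions[from].indexOf(to) !== -1;
}

// Returns the ids of facts that should block a merge:
// anything without evidence, and deferrals without an explanation.
function mergeBlockers(facts: Fact[]): string[] {
  return facts
    .filter((f) => {
      if (f.status === "implemented") return false;
      if (f.status === "deferred") return !f.deferralReason?.trim();
      return true; // draft or mandatory: intent without evidence
    })
    .map((f) => f.id);
}
```

A deferred fact passes the gate only when it carries an explanation, which matches the idea that deferral is a conscious decision rather than a forgotten check.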

Manual fact: A check performed by a human, but concrete and unambiguous. Weak example: "Check the interface". Strong: "At 375px width, page /feedback shows name field, message field, submit button without horizontal scrolling or overlaps". Manual facts are mandatory for UI, useful for tone and accessibility.

Evidence bundle: A compact artifact for the reviewer at merge: link to specification, list of facts with statuses, command run traces (exit codes, output), manual check results, decisions made during implementation, commit links. The reviewer should not re-run everything — they should understand what the author checked, and selectively re-verify if in doubt.

Human-in-the-loop verification: Agent finds mechanical discrepancies; human evaluates product and architectural aspects: mission alignment, boundary creep, unspoken dependencies, structure clarity for new developers, presence of facts for risky behavior, decisions left only in chat.

Synchronous update of specification, plan, and facts: When implementation reveals need for new structure (e.g., splitting into Layout/Header/Main/Footer), update plan.md and validation.md simultaneously so future agent sessions don't revert to old interpretation.

Practice exercises: Name: Converting wishes to facts

Problem: Given a prose check: "Make sure the feedback form works correctly and looks normal on phone". Convert this to 3-4 concrete facts for validation.md, including automatic and manual checks. Indicate the level of each fact (example, invariant, property, contract, or manual fact).

Solution: Four facts for validation.md:

F1 (example): curl -X POST http://localhost:3000/feedback -H 'Content-Type: application/json' -d '{"name":"Test","message":""}' returns 400 Bad Request. Level: example + contract (empty message → rejection).
F2 (property): Any POST /feedback with rating outside range 1..5 returns 400, regardless of other fields. Level: property.
F3 (invariant): After a successful POST /feedback with valid data, the number of records in the feedback table increases by exactly 1, and the response redirects to /feedback. Level: invariant + contract.
F4 (manual fact): At 375px width, page /feedback displays name field (type='text'), message field (textarea), submit button (type='submit'), and list of last 3 records without horizontal scrolling, element overlaps, or text truncation. Level: manual fact.

Key distinction: each fact contains a specific command or check condition, expected result, and success criterion that excludes ambiguous interpretation.
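To see how F1 and F3 become executable, the facts can be checked against an in-memory stand-in for the handler. This is a sketch under assumptions: `handleFeedback`, the array used as a feedback table, and the 303 status are hypothetical and only model the behavior the facts describe.

```typescript
// Hypothetical shapes for illustration; the real route and storage may differ.
interface FeedbackInput {
  name: string;
  message: string;
}

interface HandlerResult {
  status: number;
  location?: string;
}

// In-memory stand-in for the feedback table and the POST /feedback handler.
function handleFeedback(store: FeedbackInput[], input: FeedbackInput): HandlerResult {
  if (input.message.trim() === "") {
    // F1: an empty message is rejected and nothing is stored.
    return { status: 400 };
  }
  store.push(input);
  // 303 is an assumption; the exercise only says the response redirects to /feedback.
  return { status: 303, location: "/feedback" };
}

// F1 as a check: a rejected submission leaves the record count unchanged.
function f1Holds(): boolean {
  const store: FeedbackInput[] = [];
  const res = handleFeedback(store, { name: "Test", message: "" });
  return res.status === 400 && store.length === 0;
}

// F3 as a check: a valid submission adds exactly one record and redirects.
function f3Holds(): boolean {
  const store: FeedbackInput[] = [];
  const before = store.length;
  const res = handleFeedback(store, { name: "Test", message: "Hello" });
  return store.length === before + 1 && res.location === "/feedback";
}
```

Each check returns a boolean with no room for interpretation, which is the quality that separates a fact from a wish such as "works correctly".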

Complexity: beginner

Name: Applying the risk matrix

Problem: A team is developing the following features: (A) adding a promotional banner to the home page, (B) database migration to add email field to users table, (C) new endpoint POST /payments for payment processing, (D) registration form validation with email uniqueness check. For each feature, determine which fact levels are mandatory according to the risk matrix, and explain why.

Solution: A — Banner (visual/UI change): Example (correct HTML renders) + Manual fact (visual hierarchy, mobile readability). Without manual check, visual quality cannot be confirmed.

B — Data migration: Invariant (re-run does not change schema/duplicate column) + Property (migration applies idempotently for all existing records). Example is insufficient: need to guarantee data safety at scale.

C — Payment endpoint: Example (specific successful payment passes) + Contract (invalid credentials → 400 with specific error; duplicate idempotency key → 409). Side effects require strict contracts.

D — Registration validation: Example (valid data creates user) + Property (any duplicate email is rejected) + Contract (missing required field → 400 with field indication). Form validation requires properties for classes of invalid inputs.

Check: if validation.md for migration B lists only an example and no invariant, that is a signal to rewrite validation.md.
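The "what is missing" use of the risk matrix can be sketched as a lookup plus a set difference. The matrix rows below follow the summary in the key concepts (visual, migration, authorization, payment); the type names and the `missingLevels` helper are assumptions made for this illustration.

```typescript
type FactLevel = "example" | "invariant" | "property" | "contract" | "manual";
type FeatureType = "visual" | "migration" | "authorization" | "payment";

// Minimum levels per feature type, as summarized in this lesson.
const riskMatrix: Record<FeatureType, FactLevel[]> = {
  visual: ["manual"],
  migration: ["invariant", "property"],
  authorization: ["property", "contract"],
  payment: ["example", "contract"],
};

// Returns the levels the matrix expects but validation.md does not declare.
function missingLevels(feature: FeatureType, declared: FactLevel[]): FactLevel[] {
  return riskMatrix[feature].filter((level) => declared.indexOf(level) === -1);
}

// Migration B with only an example: both expected levels are missing.
const migrationGaps = missingLevels("migration", ["example"]);

// Payment endpoint C with example + contract: nothing is missing.
const paymentGaps = missingLevels("payment", ["example", "contract"]);
```

The function only reports omissions. It does not forbid extra levels, which keeps the matrix a prompt for missed checks rather than a mandatory set.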

Complexity: intermediate

Name: Creating a complete validation.md

Problem: Develop validation.md for the "Hello Hono" feature — a minimal web application on Hono with server-side rendering. Feature includes: installing Hono and tsx, GET / route returning HTML with AgentClinic heading, connecting static/style.css, type check script. Use lifecycle statuses and indicate responsible parties for checks.

Solution:

```markdown
# Validation: Hello Hono

## Fact Set

### F1: TypeScript compiles

- Command: `npm run typecheck`
- Expectation: exit code 0, no type errors
- Responsible: automatic check (CI + local)
- Status: mandatory → implemented

### F2: Development server starts

- Command: `npm run dev` (background), then `curl -si http://localhost:3000`
- Expectation: HTTP 200, Content-Type header contains text/html
- Responsible: automatic check
- Status: mandatory → implemented

### F3: HTML contains landmark

- Command: `curl -s http://localhost:3000 | grep -c '<h1>AgentClinic</h1>'`
- Expectation: output is 1 (exactly one matching line), grep exit code 0
- Responsible: automatic check
- Status: mandatory → implemented

### F4: Static files are served

- Command: `curl -s -o /dev/null -w '%{http_code}' http://localhost:3000/static/style.css`, then `curl -s http://localhost:3000/static/style.css | head -c 100`
- Expectation: first command prints 200; second command shows CSS rules
- Responsible: automatic check
- Status: mandatory → implemented

### F5: Page structure is semantically correct

- Check: open response source (`curl -s http://localhost:3000`)
- Expectation: presence of `<header>`, `<main>`, `<footer>` tags in correct hierarchy
- Responsible: developer manual check
- Status: mandatory → implemented

### F6: Visual integrity on mobile

- Check: DevTools, width 375px, height 667px
- Expectation: heading, main content and footer don't overlap; horizontal scrolling absent
- Responsible: developer manual check
- Status: mandatory → implemented

## Readiness Criteria

- [x] All automatic facts (F1-F4) pass in CI
- [x] Manual facts (F5-F6) checked locally
- [x] Roadmap updated: Hello Hono phase marked complete
- [x] Commits contain all changes to specs/ and src/
```

Note: F1-F4 are automatic, reproducible commands; F5-F6 are manual but with specific conditions and criteria. Not a single prose wish remains.
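Because every fact in such a file follows the same shape, the file itself can be checked mechanically. The sketch below is one possible approach, not part of the course tooling: `parseFacts` and `incompleteFacts` are hypothetical helpers, and the accepted header and field formats are assumptions chosen to tolerate both plain and markdown-formatted fact entries.

```typescript
interface ParsedFact {
  id: string;
  title: string;
  fields: Record<string, string>;
}

// Parses fact entries such as "### F1: Title" or "F1 \u2014 Title",
// followed by "Command:", "Check:", "Expectation:", "Responsible:", "Status:" lines.
function parseFacts(markdown: string): ParsedFact[] {
  const headerRe = /^(?:#+\s*)?(F\d+)(?:\s+[\u2014-]\s+|:\s+)(.+)$/;
  const fieldRe = /^(?:[-*]\s*)?(Command|Check|Expectation|Responsible|Status):\s*(.+)$/;
  const facts: ParsedFact[] = [];
  for (const rawLine of markdown.split(/\r?\n/)) {
    const line = rawLine.trim();
    const header = headerRe.exec(line);
    if (header) {
      facts.push({ id: header[1], title: header[2], fields: {} });
      continue;
    }
    const field = fieldRe.exec(line);
    if (field && facts.length > 0) {
      // Attach the field to the most recently opened fact.
      facts[facts.length - 1].fields[field[1].toLowerCase()] = field[2];
    }
  }
  return facts;
}

// A fact is complete when it has a command or check, an expectation, and a status.
function incompleteFacts(facts: ParsedFact[]): string[] {
  return facts
    .filter((f) => {
      const hasAction = "command" in f.fields || "check" in f.fields;
      return !(hasAction && "expectation" in f.fields && "status" in f.fields);
    })
    .map((f) => f.id);
}
```

Running `incompleteFacts(parseFacts(text))` over a validation.md draft would list facts that lack an expectation or status, which is a cheap way to catch prose wishes before review.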

Complexity: intermediate

Name: Analyzing specification deviation

Problem: When validating the "feedback form" feature, tests pass, but implementation: (1) adds "phone" field not specified in requirements.md, (2) redirects to /success instead of /feedback after submission, (3) doesn't validate message length (required ≤500 characters). Compose a request to Qwen Code for deviation analysis and describe which files need synchronous update.

Solution: Request to Qwen Code:

/clear

Compare this branch with @specs/2026-05-01-feedback-form/validation.md.

Show:

- facts that are implemented and passing;
- facts that are missing (message length validation ≤500);
- facts that are ambiguous and require rewriting;
- implementation decisions not described in requirements.md ("phone" field, redirect to /success);
- outdated specification statements.

Do not modify files yet.

After analysis, synchronous update:
1. **requirements.md**: either add rationale for "phone" field and /success, or revert to original requirements.
2. **plan.md**: update component structure if adding field requires new component.
3. **validation.md**: 
   - Add F-new: POST /feedback with message >500 characters returns 400.
   - Update F-redirect: expectation changed to /feedback or /success — depending on product decision.
   - Add fact for phone field if it stays; or remove from implementation.
4. **Code**: bring into compliance with updated specification and facts.

Principle: any decision made during implementation must be reflected in specs, not left only in chat or commit message.
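The mechanical part of this analysis is a two-way set difference between what the specification states and what the implementation does. The sketch below assumes behaviors are written as short labels; the label format and the `diffAgainstSpec` helper are hypothetical and stand in for whatever the agent extracts from the spec and the code.

```typescript
interface DeviationReport {
  undocumented: string[]; // implemented but absent from requirements.md
  missing: string[]; // required but not implemented
}

function diffAgainstSpec(specified: string[], implemented: string[]): DeviationReport {
  return {
    undocumented: implemented.filter((item) => specified.indexOf(item) === -1),
    missing: specified.filter((item) => implemented.indexOf(item) === -1),
  };
}

// Behaviors from the exercise, written as hypothetical labels.
const feedbackReport = diffAgainstSpec(
  ["field:name", "field:message", "redirect:/feedback", "validate:message<=500"],
  ["field:name", "field:message", "field:phone", "redirect:/success"],
);
// feedbackReport.undocumented -> ["field:phone", "redirect:/success"]
// feedbackReport.missing      -> ["redirect:/feedback", "validate:message<=500"]
```

Both lists need a decision: each undocumented item is either added to the specs with a rationale or removed from the code, and each missing item becomes a fact in validation.md.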

Complexity: advanced

Name: Forming an evidence bundle for merge

Problem: Prepare a merge request description (evidence bundle) for the "Hello Hono" feature based on completed exercises. Include all mandatory evidence bundle elements and demonstrate how a reviewer can selectively re-verify any fact.

Solution:

````markdown
## Evidence Bundle: Hello Hono

### Specification
- Folder: `specs/2026-05-01-hello-hono/`
- requirements.md: captures intent (minimal SSR on Hono)
- plan.md: Layout/Header/Main/Footer structure
- validation.md: 6 facts (see below)

### Fact Statuses
| ID | Fact | Status | Confirmation |
|---|---|---|---|
| F1 | TypeScript compiles | ✅ confirmed | `npm run typecheck` → exit 0 |
| F2 | Server responds 200 | ✅ confirmed | `curl -s -o /dev/null -w '%{http_code}' http://localhost:3000` → 200 |
| F3 | HTML contains `<h1>AgentClinic</h1>` | ✅ confirmed | `curl -s http://localhost:3000 \| grep '<h1>AgentClinic</h1>'` → match |
| F4 | static/style.css is served | ✅ confirmed | `curl -s -o /dev/null -w '%{http_code}' http://localhost:3000/static/style.css` → 200 |
| F5 | Semantic structure | ✅ confirmed | Manual check: source contains `<header>`, `<main>`, `<footer>` |
| F6 | Mobile integrity | ✅ confirmed | Manual check: DevTools 375×667, screenshot attached |

### Command Traces

```
$ npm run typecheck
> tsc --noEmit

exit code 0

$ npm run dev &
$ curl -s http://localhost:3000 | head -c 200
<!DOCTYPE html><html><head>...<h1>AgentClinic</h1>...

$ curl -s -w '\nHTTP %{http_code}' http://localhost:3000/static/style.css
body { font-family: system-ui; }
HTTP 200
```

### Decisions Made During Implementation
- Added Layout component for consistency: plan.md updated, F5 added.
- static/style.css connected via `<link>` in Layout, not inline: plan and facts synchronized.

### Commits
- `a1b2c3d` feat: Hono setup with SSR
- `e4f5g6h` feat: Layout/Header/Main/Footer structure
- `i7j8k9l` docs: validation.md and roadmap update

### Selective Re-verification for Reviewer

Quick check of the automatic facts:

```
npm install && npm run typecheck && echo 'F1 OK'
npm run dev &
sleep 2
curl -s http://localhost:3000 | grep -q '<h1>AgentClinic</h1>' && echo 'F2/F3 OK'
curl -sf -o /dev/null http://localhost:3000/static/style.css && echo 'F4 OK'
```
````

Key value: in 30 seconds the reviewer understands what the author checked and can either accept or run a few commands to re-verify, without reproducing everything from scratch.
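The fact status table is a good candidate for generation, so that the bundle cannot silently diverge from the recorded results. The following is a minimal sketch, assuming results are already collected into plain records; `FactRow`, `escapeCell`, and `renderFactTable` are names invented for this illustration.

```typescript
interface FactRow {
  id: string;
  fact: string;
  confirmed: boolean;
  confirmation: string;
}

// A raw "|" would end a markdown table cell, so it is escaped as "\|".
function escapeCell(text: string): string {
  return text.replace(/\|/g, "\\|");
}

function renderFactTable(rows: FactRow[]): string {
  const header = ["| ID | Fact | Status | Confirmation |", "|---|---|---|---|"];
  const body = rows.map((r) => {
    const status = r.confirmed ? "confirmed" : "not confirmed";
    return `| ${r.id} | ${escapeCell(r.fact)} | ${status} | ${escapeCell(r.confirmation)} |`;
  });
  return header.concat(body).join("\n");
}
```

Escaping matters for facts whose confirmation is a shell pipeline: without it, a command such as `curl ... | grep ...` would split into extra table columns.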

Complexity: advanced

Case studies: Name: Migration deviation without invariant: data loss at a startup

Scenario: A team of 4 developers was building a SaaS platform for clinics. The "data migration" feature added a patient_records table with foreign key to users. Developer prepared migration.sql and checked it once locally — worked. In validation.md there was only one fact: "Example: migration applies successfully on empty database".

Challenge: After merge to staging, it was discovered that re-running migration (due to a CI script error) created a duplicate column patient_records_id_1 and broke indexes. Production deploy was delayed by 3 days. Rollback required manual DBA intervention. Problem: missing invariant "re-run does not change schema" and property "migration is idempotent for database with existing data".

Solution: Team introduced mandatory risk matrix for all migrations. For patient_records, rewrote validation.md:

F1 (invariant): npm run migrate:up && npm run migrate:up → schema unchanged, exit code 0 both times.
F2 (property): For database with 10K+ users, migration completes in <30 seconds without table lock.
F3 (example): After migration, SELECT COUNT(*) FROM patient_records returns 0 for new clinics, correctly counts existing for migrated ones.
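Added CI step: run the migration twice and diff the pg_dump --schema-only output taken after the first run against the output taken after the second run; any difference fails the build.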

Also synchronously updated plan.md: migrations split into pre-deploy (safe) and post-deploy (requiring monitoring).
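The idempotency invariant from F1 can be illustrated without a database by modelling the schema as data and comparing snapshots. Everything here is a simplified assumption: the `Schema` type, the `user_id` column name, and both migration functions are hypothetical, and `snapshot` merely plays the role of a schema-only dump.

```typescript
type Schema = Record<string, string[]>; // table name -> column names

// Stable text form of the schema, standing in for a schema-only dump.
function snapshot(schema: Schema): string {
  return Object.keys(schema)
    .sort()
    .map((table) => `${table}(${schema[table].join(",")})`)
    .join(";");
}

// Naive migration: appends blindly, so a second run changes the schema.
function migrateNaive(schema: Schema): void {
  if (!schema["patient_records"]) schema["patient_records"] = [];
  schema["patient_records"].push("user_id");
}

// Idempotent migration: adds the column only when it is absent.
function migrateIdempotent(schema: Schema): void {
  if (!schema["patient_records"]) schema["patient_records"] = [];
  const columns = schema["patient_records"];
  if (columns.indexOf("user_id") === -1) columns.push("user_id");
}

// The invariant: running the migration a second time leaves the snapshot unchanged.
function isIdempotent(migrate: (schema: Schema) => void): boolean {
  const schema: Schema = { users: ["id"] };
  migrate(schema);
  const afterFirstRun = snapshot(schema);
  migrate(schema);
  return snapshot(schema) === afterFirstRun;
}
```

A single example ("applies on an empty database") passes for both migrations. Only the invariant check distinguishes them, which is the gap the original validation.md left open.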

Result: Migration verification time reduced from 2 hours of manual checking to 5 minutes automatic. In 8 months since implementation, not a single migration incident occurred. Team could deploy 3 times a week instead of once every two weeks. Reviewers started trusting evidence bundles and didn't re-run migrations locally without reason.

Lessons learned: Risk matrix is not bureaucracy — it uncovers blind spots: migration with one example without invariant is a classic anti-pattern

Invariant "idempotency" for migrations is more important than dozens of examples: it covers all re-apply cases

CI must check exactly the scenario that's scary in production, not just the "happy path"

Evidence bundle reduces reviewer cognitive load — they decide based on artifacts, not guesses

Related concepts: Risk matrix

Invariant

validation.md

Evidence bundle

Fact lifecycle

Name: UI feature without manual facts: accessibility regression in a fintech app

Scenario: A large bank was developing a new transfer screen. Feature included a complex form with amount, recipient, and confirmation. Team relied entirely on automated tests: 47 unit tests, 12 integration tests, 100% branch coverage. validation.md contained only automatic facts.

Challenge: After release, complaints came from users with vision impairments: keyboard navigation lost focus on the confirmation modal, and screen reader didn't announce validation errors. Autotests passed, but didn't check tab order, aria attributes, contrast. Regression affected 12% of mobile app users in web-view. Rollback cost the bank reputational damage and a regulator fine for accessibility violation.

Solution: Team introduced mandatory manual facts for all UI features affecting financial operations:

F-manual-1: At 320px width and 200% zoom, all interactive elements remain accessible without horizontal scrolling.
F-manual-2: Tab navigation goes through all form fields, confirmation button, and "Cancel" link in logical order, without looping in modal.
F-manual-3: Screen reader NVDA/VoiceOver correctly announces validation errors on field blur.
For recurring checks, implemented Playwright with axe-core, but kept manual fact for new UI patterns.

Synchronously updated plan.md: each UI component must have accessibility-notes.md file with expected focus and screen reader behavior.

Result: Manual facts uncovered 3 potential regressions during development of next feature. Time from development to release reduced by 20%, as fewer bugs returned from QA. Regulatory accessibility audit passed without remarks. Team started sharing evidence bundles with compliance department, accelerating approvals.

Lessons learned: Automatic coverage ≠ quality: 100% branch coverage does not guarantee accessibility, parts of which only a human can verify

Manual facts are not weaker than automatic, if they are concrete and reproducible

Recurring manual fact is a signal to automate via specialized tools (Playwright + axe)

Synchronous update of plan.md and validation.md prevents "drift" of implementation from intent when working with agents

Related concepts: Manual fact

Risk matrix (visual/UI change)

Human-in-the-loop verification

Synchronous update of specification, plan, and facts

Name: Working with an agent: preventing specification drift in an AI-native project

Scenario: Startup AgentClinic fully used Qwen Code for code generation. The "agent details" feature required displaying a list of ailments for each initial agent. Original specification described a monolithic page component. Agent implemented it correctly, but in a subsequent session (a week later, different context) started generating different structure — without Layout component.

Challenge: Agent interpreted the same text specification differently in different sessions. Absence of fixed facts in validation.md caused each new session to "creatively" reinterpret structure. Code worked, but architecture drifted. Reviewers spent time uncovering implicit changes not captured in specification.

Solution: Team introduced strict process:

After first implementation, updated plan.md: page = Layout(Header, Main, Footer).
Added facts to validation.md locking structure:

F-structure: curl -s http://localhost:3000/agents/1 | grep -oE '<(header|main|footer)[ >]' | wc -l → exactly 3 matches.
F-css: curl -s -o /dev/null -w '%{http_code}' http://localhost:3000/static/style.css → 200.
F-typecheck: npm run typecheck → exit code 0.

Agent request now always included: "Update implementation according to @specs/.../plan.md and @specs/.../validation.md. Do not modify specs files without explicit request."
Introduced check: git diff --stat main...HEAD + agent request to compare with validation.md before each commit.

When structure change was needed, used synchronous request: update plan.md, validation.md, and implementation simultaneously.
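A structure-locking fact can also be expressed as a small function over the response body. A line-oriented grep counts matching lines, so counting opening tags directly can be more robust when the HTML arrives minified on one line. The `checkLandmarks` helper below is a hypothetical sketch, and requiring exactly one of each landmark is an assumption about the intended layout.

```typescript
interface LandmarkReport {
  counts: Record<string, number>;
  ok: boolean;
}

// Counts opening <header>, <main>, and <footer> tags in an HTML string.
function checkLandmarks(html: string): LandmarkReport {
  const counts: Record<string, number> = { header: 0, main: 0, footer: 0 };
  // The lookahead requires whitespace or ">" after the tag name,
  // so closing tags and longer names such as <mainframe> are not counted.
  const tagRe = /<(header|main|footer)(?=[\s>])/gi;
  let match: RegExpExecArray | null;
  while ((match = tagRe.exec(html)) !== null) {
    counts[match[1].toLowerCase()] += 1;
  }
  const ok = counts["header"] === 1 && counts["main"] === 1 && counts["footer"] === 1;
  return { counts, ok };
}
```

In a test, the function would receive the body fetched from the page route; a missing or duplicated landmark then fails the check regardless of how the agent formatted the markup.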

Result: Architecture drift stopped. 90% of agent sessions generated code compatible with existing structure without additional fixes. Review time reduced by 60%. New (human) developers understood project structure in minutes, not hours, thanks to facts in validation.md serving as executable documentation.

Lessons learned: Text specification is interpreted differently by agent across sessions; facts are the only stable contract between sessions

Fact "response contains header/main/footer landmarks" is better than "page is structured correctly" — it is machine-verifiable without interpretation

Synchronous update of plan.md + validation.md + code prevents agent's "creative drift"

Git diff + validation.md check before commit creates feedback loop that teaches agent (and human) to honor contracts

Related concepts: Fact

validation.md

Agent-assisted verification

Synchronous update of specification, plan, and facts

Start with differences

Study tips: Start practice with "inversion": take bad prose checks and turn them into facts — this trains "fact thinking" better than reading theory

Use risk matrix as a "what's missing" checklist, not mandatory set: goal is to uncover omission, not create bureaucracy

Practice with real curl commands: write them in terminal, check exit codes, copy into validation.md — fact must be reproducible

Create a validation.md template for your project and use it as starting point for each feature — this lowers the barrier to start

Conduct "fact audit" with a colleague: one reads validation.md, other tries to perform checks without reading code — uncovers ambiguities in minutes

For agent work, keep history of requests for reconciliation with validation.md — they become training material for new sessions

Keep a "fact changed after failure" journal — explicitly documented fact change is normal; hidden one is anti-pattern that returns as technical debt

Transition from manual to automatic facts when you notice repetition: manually checked three times — candidate for Playwright or integration test

Use lifecycle statuses for stakeholder expectation management: "deferred" with explanation is better than "forgotten" fact surfacing before release

Practice forming evidence bundles even for personal projects — skill of structuring arguments for reviewer transfers to any team work

Additional resources: Appendix C to course (PR evidence bundle template): Official merge request description template with evidence bundle — use as starting point for all projects

Course Part 20 (hidden fact changes as anti-pattern): Deep dive into fact evolution and specification technical debt management

Playwright documentation (playwright.dev): Tool for automating manual facts: screenshot testing, accessibility checks, mobile viewports

Axe-core (deque.com/axe): Library for programmatic accessibility verification — candidate for manual UI facts automation

Hono framework documentation (hono.dev): Context of framework used in course examples — for reproducing exercises

Git documentation: git diff, git status, git log: Basic commands for "start with differences" practice in feature validation

Part 9 review questions (from original document): Self-check: 1) Why shouldn't a specification be the only gate? 2) How is a fact different from a wish? 3) When is a manual check a fact? 4) What should you do if tests pass but the implementation doesn't match requirements.md? 5) Why is "specifications guide, facts gate" better than just "write better specifications"?

Summary: Feature validation is a separate mode of work where text specification transforms into verifiable facts. Specifications guide development, but only facts gate merge. A fact is an executable or unambiguously verifiable statement, not a prose wish. Four levels of facts (examples, invariants, properties, contracts) and the risk matrix help select minimally sufficient check density for each feature type. Central artifact is validation.md with lifecycle statuses separating intent from evidence. Manual facts are not weaker than automatic if concrete; recurring manual checks are a signal to automate. When working with agents, facts are critically important: they overcome ambiguity of text specification interpretation between sessions. Evidence bundle turns review from guesswork into artifact verification. Synchronous update of specification, plan, and facts prevents implementation drift and makes project understandable for future developers and agents.