Topic: Applied Part 1. Recovering Specifications from Legacy
Difficulty level: Medium
Estimated study time: 6-8 hours (theory + practice)
Prerequisites: Familiarity with the first volume of the course, part 13 (recovering the constitution of an existing project)
Basic understanding of Kubernetes, Grafana, PagerDuty
Experience working with markdown and JSON
Understanding of Git repository structure
Familiarity with the Given/When/Then format for user stories
Learning objectives: Distinguish between requirements (SDD contract) and background context (memory bank), applying strict separation when inventorying artifacts
Create a genealogy.md with at least two evidence_ref entries, confidence level, and an open question for one production requirement
Transform recovered statements into Given/When/Then format and define corresponding JSON Schema fields
Use Qwen Code in extraction mode (not generation) to obtain claims with mandatory sources and counterexamples
Evaluate a requirement's readiness for assertion by checking threshold defensibility and SLA source references
Overview: This chapter covers recovering engineering-grade specifications from the chaos of legacy artifacts: unstructured logs, Slack threads, dashboard screenshots, and post-mortems written without a formal SDD. After the SRE team on an automated incident-management project turns over, only fragments remain, and verifiable requirements must be extracted from them rather than a collection of plausible guesses. The educational focus is narrow: one claim, two sources, one open question. The full production track (normalizers, historical replay, file arbitration) is deferred to part 8. The main result is a completed genealogy.md that separates an asserted requirement from a hypothesis and from background context.
Key concepts: Genealogy.md: A registry of requirement provenance, linking an assertion to sources (logs, Slack, metrics, post-mortems), confidence level (uncertainty), and open questions. Unlike git log, which shows who changed a file and when, genealogy.md shows where the requirement itself came from and how confident we are in it. Required fields: claim, evidence_ref (minimum two), status (approved/needs_clarity/rejected), uncertainty, open_questions.
Memory bank: A separate layer of infrastructure context: everything that helps interpret facts but is not itself a contract. It holds cluster topology, the team list, historical agreements, API limitations, customary communication channels, and operational vocabulary. Mixing the memory bank with requirements is dangerous: false rules such as "canary is always non-escalating" end up in the SDD when they are really just context for a test namespace.
Evidence ref: A proof annotation — a reference to a specific location in the source artifact. Format: source:path#location. Examples: grafana:NR-2026-05-17-01, postmortem:node-not-ready-2026-05, slack/thread_11#msg_7. Without evidence_ref, a statement remains the author's opinion, not a verifiable requirement.
Claim (assertion candidate): A recovered rule, not yet asserted in the contract. A good claim contains: specific threshold (>=3 NodeNotReady), window (within 10 minutes), condition (for one node), correlation (with 5xx growth), evidence_ref, counterexample, missing_context, confidence level.
Given/When/Then: A behavioral requirement description format. Given is the initial system state, When is the trigger event, and Then is the expected result with specific numbers, statuses, and SLAs. Example: Given a cluster in active shift and >=3 NodeNotReady within 10m; When the event is correlated with a deployment; Then a P1 is created, initial response within 8 minutes, escalation to NOC within 15 minutes, closure after 2 consecutive OKs within 10 minutes.
JSON Schema contract: A machine-readable representation of a requirement with mandatory fields, allowed values (enum), and numeric boundaries (minimum/maximum). Dual notation (Given/When/Then + JSON Schema) eliminates the gap between "human-understandable" and "machine-verifiable". Verifiable fields: rule_id, severity (enum P0-P3), sla_minutes, conditions (event_code, count, window_minutes, namespace_rule).
Timeline normalization: Bringing sources to a common time (UTC), removing duplicates, extracting event codes, linking records with a unified incident identifier. Skeleton: ts → source → event_code → actor → affected_scope → evidence_ref. Without normalization, recovery becomes an argument about memories rather than a reconstruction of system behavior.
File arbitration (full track): A process for verifying disputed requirements by three roles: Verifier (consistency of numbers and statuses), Implementor (feasibility in the current pipeline), Safety (boundaries of safe action, veto right at critical_risk). The coordinator keeps a journal without voting. Detailed in part 8. For the educational minimum — reference only.
Qwen Code as extractor: The model works not as an author of business logic, but as an intermediary for extracting claims from normalized data. A good prompt requests: recurring rules, confirming sources, counterexamples, confidence level. Assertions without evidence references are prohibited. Mode: headless Plan Mode (--approval-mode plan).
Uncertainty (the inverse of confidence): low means the assertion is confirmed by independent sources and has passed replay or arbitration. medium means there are contradictions, missing context, or only one source. high means the assumption rests on indirect data. Disputed facts are not masked as approved: they are marked needs_clarity with a verification plan.
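Timeline normalization, described under Key concepts, can be sketched with the Python standard library. This is a minimal sketch, not the course's normalizer: the record fields follow the skeleton ts → source → event_code → actor → affected_scope → evidence_ref, while the function name normalize_timeline, the duplicate key, the second evidence reference, and all sample timestamps are assumptions made for illustration.

```python
from datetime import datetime, timezone


def normalize_timeline(records):
    """Convert timestamps to UTC, drop duplicates, and sort by time."""
    seen = set()
    rows = []
    for rec in records:
        ts = datetime.fromisoformat(rec["ts"])
        if ts.tzinfo is None:
            # Assumption: refuse naive timestamps instead of guessing a zone.
            raise ValueError(f"timestamp without UTC offset: {rec['ts']}")
        ts = ts.astimezone(timezone.utc)
        # Assumed duplicate key: same instant, source, event code, and scope.
        key = (ts, rec["source"], rec["event_code"], rec["affected_scope"])
        if key in seen:
            continue
        seen.add(key)
        rows.append((ts, {
            "ts": ts.isoformat(),
            "source": rec["source"],
            "event_code": rec["event_code"],
            "actor": rec.get("actor", "unknown"),
            "affected_scope": rec["affected_scope"],
            "evidence_ref": rec["evidence_ref"],
        }))
    rows.sort(key=lambda pair: pair[0])
    return [row for _, row in rows]


# Hypothetical records: the same Grafana event exported in two time zones,
# plus one PagerDuty record with a made-up evidence reference.
raw = [
    {"ts": "2026-05-17T06:05:00+03:00", "source": "grafana",
     "event_code": "NodeNotReady", "affected_scope": "worker-07",
     "evidence_ref": "grafana:NR-2026-05-17-01"},
    {"ts": "2026-05-17T03:05:00+00:00", "source": "grafana",
     "event_code": "NodeNotReady", "affected_scope": "worker-07",
     "evidence_ref": "grafana:NR-2026-05-17-01"},
    {"ts": "2026-05-17T03:01:00+00:00", "source": "pagerduty",
     "event_code": "EscalationP1", "affected_scope": "worker-07",
     "evidence_ref": "pagerduty:hypothetical-incident"},
]
timeline = normalize_timeline(raw)
```

Comparing timezone-aware datetime objects rather than raw strings is what lets the two Grafana rows collapse into one, even though their textual timestamps differ.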
Practice exercises: Name: Creating genealogy.md for node_not_ready
Problem: Using the course excerpt, create a genealogy.md file with one entry for the node_not_ready incident. Given: a Grafana log with three NodeNotReady events within 10 minutes, a PagerDuty record of P1 escalation, a post-mortem with rejection of auto-resolve until two stable windows, and an open question about canary namespace. Task: formulate the claim, add two evidence_ref entries, specify uncertainty and open_questions, separate memory bank.
Solution: 1. Copy the template from book2/examples/templates/genealogy.md to your working directory. 2. Write the claim: 'When >=3 NodeNotReady within 10m for one node, a P1 is created, closure requires 2 consecutive OKs within 10m'. 3. Add evidence_ref: ['grafana:NR-2026-05-17-01', 'postmortem:node-not-ready-2026-05']. 4. Specify uncertainty: medium (due to open question about canary). 5. Add open_questions: ['does canary namespace exclude P1 or only reduce confidence?']. 6. Send to memory bank: cluster=prod-k8s, node=worker-07, owner=platform_oncall (context, not contract). 7. Set status: needs_clarity. 8. Verify: the assertion cannot be read as 'author's opinion' — the threshold is protected by a Grafana reference, the closure condition is protected by a post-mortem reference.
Complexity: beginner
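The entry from this exercise can also be checked mechanically. The sketch below models only the required fields named in Key concepts (claim, evidence_ref, status, uncertainty, open_questions); the layout of the actual template file is not reproduced here, and the class name GenealogyEntry, the problems method, and the rule that an approved entry may not carry open questions are assumptions for illustration.

```python
from dataclasses import dataclass, field

ALLOWED_STATUS = {"approved", "needs_clarity", "rejected"}
ALLOWED_UNCERTAINTY = {"low", "medium", "high"}


@dataclass
class GenealogyEntry:
    claim: str
    evidence_ref: list
    status: str
    uncertainty: str
    open_questions: list = field(default_factory=list)

    def problems(self):
        """Return a list of reasons why the entry is not well-formed."""
        issues = []
        if not self.claim.strip():
            issues.append("claim is empty")
        if len(set(self.evidence_ref)) < 2:
            issues.append("fewer than two distinct evidence_ref entries")
        if self.status not in ALLOWED_STATUS:
            issues.append(f"unknown status: {self.status}")
        if self.uncertainty not in ALLOWED_UNCERTAINTY:
            issues.append(f"unknown uncertainty: {self.uncertainty}")
        # Assumed rule: open questions and 'approved' do not go together.
        if self.status == "approved" and self.open_questions:
            issues.append("approved entry still has open questions")
        return issues


entry = GenealogyEntry(
    claim=("When >=3 NodeNotReady within 10m for one node, a P1 is created, "
           "closure requires 2 consecutive OKs within 10m"),
    evidence_ref=["grafana:NR-2026-05-17-01", "postmortem:node-not-ready-2026-05"],
    status="needs_clarity",
    uncertainty="medium",
    open_questions=["does canary namespace exclude P1 or only reduce confidence?"],
)
print(entry.problems())
```

Memory bank items such as cluster=prod-k8s are deliberately absent from the class: they would live in a separate context file, not in the entry.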
Name: Translating a claim into Given/When/Then and JSON Schema
Problem: Take the assertion from the previous exercise and format it in dual notation: behavioral story (Given/When/Then) and minimal JSON Schema. Identify which three schema fields verify the threshold, severity, and closure condition. Format the canary exclusion as a separate namespace_rule condition.
Solution: Given/When/Then: Given a cluster in active shift where >=3 NodeNotReady events within 10 minutes are recorded for one node; When the event is not in the canary namespace, or is in canary with correlated 5xx growth; Then an incident with severity=P1 is created, initial response within 8 minutes, escalation to NOC within 15 minutes, closure after 2 consecutive OKs within 10 minutes. JSON Schema: { "$id": "urn:spec:node-not-ready:v1", "type": "object", "required": ["rule_id","severity","sla_minutes","conditions"], "properties": { "rule_id": {"type":"string"}, "severity": {"type":"string","enum":["P0","P1","P2","P3"]}, "sla_minutes": {"type":"integer","minimum":1,"maximum":120}, "conditions": {"type":"object","required":["event_code","count","window_minutes","namespace_rule"], "properties": {"count":{"type":"integer","minimum":3}, "window_minutes":{"type":"integer","minimum":1}, "namespace_rule":{"type":"string","enum":["standard","canary"]}}}}} Three verifiable fields: count (threshold >=3), severity (enum, expected value P1), and conditions with window_minutes and namespace_rule. The minimal schema has no dedicated field for the closure condition (2 consecutive OKs); in the full track it is expressed via auto_resolve_window, checked with a regular expression.
Complexity: intermediate
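To see the schema from this exercise reject a weakened threshold, a rule instance can be checked against it. The sketch below is a hand-written validator that understands only the keywords the schema uses (type, required, properties, enum, minimum, maximum); it is not a full JSON Schema implementation, and a dedicated validator library would be the usual choice in practice. The function name validate, the rule_id value, and the reading of sla_minutes as the 15-minute escalation SLA are assumptions.

```python
SCHEMA = {
    "$id": "urn:spec:node-not-ready:v1",
    "type": "object",
    "required": ["rule_id", "severity", "sla_minutes", "conditions"],
    "properties": {
        "rule_id": {"type": "string"},
        "severity": {"type": "string", "enum": ["P0", "P1", "P2", "P3"]},
        "sla_minutes": {"type": "integer", "minimum": 1, "maximum": 120},
        "conditions": {
            "type": "object",
            "required": ["event_code", "count", "window_minutes", "namespace_rule"],
            "properties": {
                "count": {"type": "integer", "minimum": 3},
                "window_minutes": {"type": "integer", "minimum": 1},
                "namespace_rule": {"type": "string", "enum": ["standard", "canary"]},
            },
        },
    },
}


def validate(instance, schema, path="$"):
    """Return a list of violations for the keyword subset used above."""
    kind = schema.get("type")
    if kind == "object":
        if not isinstance(instance, dict):
            return [f"{path}: expected object"]
        errors = [f"{path}.{name}: required field is missing"
                  for name in schema.get("required", []) if name not in instance]
        for name, sub in schema.get("properties", {}).items():
            if name in instance:
                errors.extend(validate(instance[name], sub, f"{path}.{name}"))
        return errors
    if kind == "string" and not isinstance(instance, str):
        return [f"{path}: expected string"]
    if kind == "integer" and (isinstance(instance, bool) or not isinstance(instance, int)):
        return [f"{path}: expected integer"]
    errors = []
    if "enum" in schema and instance not in schema["enum"]:
        errors.append(f"{path}: {instance!r} is not one of {schema['enum']}")
    if "minimum" in schema and instance < schema["minimum"]:
        errors.append(f"{path}: {instance} is below minimum {schema['minimum']}")
    if "maximum" in schema and instance > schema["maximum"]:
        errors.append(f"{path}: {instance} is above maximum {schema['maximum']}")
    return errors


# Hypothetical rule instance for the recovered requirement.
rule = {
    "rule_id": "node_not_ready",
    "severity": "P1",
    "sla_minutes": 15,  # assumption: the escalation-to-NOC SLA
    "conditions": {"event_code": "NodeNotReady", "count": 3,
                   "window_minutes": 10, "namespace_rule": "standard"},
}
print(validate(rule, SCHEMA))
```

Lowering count to 2 in the instance produces a violation at $.conditions.count, which is the machine-verifiable half of the dual notation.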
Name: Separating requirements and memory bank in a real fragment
Problem: Given a fragment from a Slack thread and post-mortem: 'On-call Vasya noticed at 3 AM that worker-07 in prod-k8s was NotReady again, wrote in channel #sre-alerts, after 15 minutes called Nastya from NOC, they decided not to close automatically because last time auto-resolve led to a repeat incident. It turned out that at 2:47 there was a planned deployment to canary. The old service was called node-health-check, now it's k8s-node-monitor'. Separate into requirements (SDD) and memory bank. Explain why each element belongs to its layer.
Solution: SDD requirements: '>=3 NodeNotReady within 10m → P1' (recovered from recurring pattern), 'closure requires 2 consecutive OKs within 10m' (post-mortem: auto-resolve rejected), 'escalation to NOC within 15 minutes' (temporal event chain). Memory bank: 'on-call Vasya' (who was on duty is context, not contract), 'channel #sre-alerts' (customary communication channel), 'worker-07 in prod-k8s' (specific topology, which changes over time), 'old name node-health-check' (historical vocabulary), 'planned deployment to canary at 2:47' (context for interpretation, not a rule in itself unless it is formalized as a namespace_rule condition). Rationale: people's names and node names change, communication channels are operational habits, while thresholds and SLAs are verifiable system behavior.
Complexity: intermediate
Name: Evaluating a claim's readiness for assertion
Problem: Given three claims for high_memory_usage. A: 'When memory_percent >= 90% within 10m for appointments-api, a P1 is created' — two evidence_ref (grafana, post-mortem), but the prohibition of auto-resolve without two stable windows is not confirmed. B: 'When memory_percent >= 85% within 5m, a P0 is created' — one evidence_ref (Slack), no post-mortem, contradicts neighboring cluster. C: 'When memory_percent >= 90% within 10m, a P1 is created, auto-resolve is prohibited without two stable windows of 5m each' — three evidence_ref, passed replay on historical data, but the service owner quit. Determine status and uncertainty for each, justify.
Solution: A: status needs_clarity, uncertainty medium. Threshold and severity are protected by two sources, but the closure condition is an open question. A partially confirmed requirement cannot be partially approved. Plan: ask the service owner or find an additional post-mortem. B: status rejected (or remains a hypothesis), uncertainty high. One source, a contradiction with the neighboring cluster, and an unprotected 85% threshold: it appears to be a guess from chat and cannot be accepted without re-verification. C: status approved or needs_clarity, depending on procedure. Technically, three sources and replay give low uncertainty, but the absence of a service owner is a risk for future review. Decision: keep approved with the note 'last confirmation from departed owner, re-validation required upon architecture change', or move to needs_clarity until a new owner is assigned. Lesson: even strong evidence requires a living review mechanism.
Complexity: advanced
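The reasoning applied to claims A, B, and C can be written down as an explicit policy so that reviewers apply it the same way. This is one possible encoding, not the course's procedure: the function name assess, its parameters, the decision to require replay before approval, and the require_owner switch are all assumptions chosen to reproduce the three verdicts in this exercise.

```python
def assess(evidence_count, contradicted, open_questions, replay_passed,
           owner_available, require_owner=False):
    """Return (status, uncertainty, note) for one claim."""
    # A single contradicted source is treated as a guess, as with claim B.
    if evidence_count < 2 and contradicted:
        return ("rejected", "high", "single contradicted source: re-verify first")
    # Any unresolved gap blocks approval, as with claim A.
    if evidence_count < 2 or contradicted or open_questions or not replay_passed:
        return ("needs_clarity", "medium", "close open points before approval")
    # Strong evidence without an owner: the outcome depends on team procedure.
    if not owner_available:
        status = "needs_clarity" if require_owner else "approved"
        return (status, "low", "re-validation required: no current owner")
    return ("approved", "low", "")


claim_a = assess(evidence_count=2, contradicted=False, open_questions=1,
                 replay_passed=False, owner_available=True)
claim_b = assess(evidence_count=1, contradicted=True, open_questions=0,
                 replay_passed=False, owner_available=True)
claim_c = assess(evidence_count=3, contradicted=False, open_questions=0,
                 replay_passed=True, owner_available=False)
for name, verdict in (("A", claim_a), ("B", claim_b), ("C", claim_c)):
    print(name, verdict)
```

Passing require_owner=True for claim C yields needs_clarity with low uncertainty, which matches the stricter of the two decisions described in the solution.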
Case studies: Name: Recovering escalation SLA after SRE team departure in AgentClinic-production
Scenario: The educational model AgentClinic-production is deployed in Kubernetes. After the SRE team left, only fragments remain: 47 pages of unstructured logs, 11 relevant Slack messages, Grafana dashboard screenshots, and post-mortems without a formal SDD. The triage pipeline receives webhooks from Grafana and PagerDuty. The task is to recover one requirement: when NodeNotReady becomes a P1 and when it cannot be closed automatically.
Challenge: 1. Disparate sources without unified format: metric logs, PagerDuty escalations, informal post-mortems. 2. Risk of mixing operational context (on-call names, cluster topology, old service names) with business rules. 3. Implicit rule about canary namespace: does it exclude P1 or only reduce confidence? 4. Absence of formal SDD — the alternative of 'a collection of plausible guesses' is unacceptable. 5. Need for verifiability: the recovered specification must become an executable artifact.
Solution: 1. Applied strict 'requirement vs memory bank' filter already at the inventory stage. On-call names, Slack channels, worker-07 topology — sent to memory bank. 2. Created normalized timeline: 1248 NodeNotReady events → grouping by 10-minute windows → 63 alerts → extracting 8 previously closed incidents. 3. Identified coincidence of sharp NodeNotReady growth with deployment, and two behavior branches: standard P1 and canary path with relaxed thresholds. 4. Formulated claim with two evidence_ref: grafana:NR-2026-05-17-01 and postmortem:node-not-ready-2026-05. 5. Open question about canary was recorded, not masked as an asserted rule. 6. Dual notation: Given/When/Then + JSON Schema with severity, sla_minutes, conditions fields. 7. For the educational minimum — genealogy.md without full arbitration; for production track, transition to file arbitration in part 8 is provided.
Result: Recovered verifiable requirement: '>=3 NodeNotReady within 10 minutes for one node creates P1, initial response 8 minutes, escalation to NOC within 15 minutes, closure after 2 consecutive OKs within 10 minutes'. Canary namespace condition is extracted as a separate namespace_rule, not a note. genealogy.md allows auditing the provenance of each point. Specification is ready for CI validation and replay on historical data.
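The trigger half of the recovered requirement (>=3 NodeNotReady within 10 minutes for one node) is the kind of rule that can be replayed over a normalized timeline. The sketch below uses a sliding window per node; the case study groups events into 10-minute windows without saying whether they slide or tumble, so the sliding window, the inclusive boundary, the function name nodes_breaching, the node worker-03, and all timestamps are assumptions.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def nodes_breaching(events, count=3, window=timedelta(minutes=10)):
    """Map each breaching node to the time of the event that hit the threshold."""
    per_node = defaultdict(list)
    for ts, node in events:
        per_node[node].append(ts)
    breaching = {}
    for node, stamps in per_node.items():
        recent = deque()
        for ts in sorted(stamps):
            recent.append(ts)
            # Drop events older than the window (boundary treated as inclusive).
            while ts - recent[0] > window:
                recent.popleft()
            if len(recent) >= count:
                breaching[node] = ts
                break
    return breaching


def at(hour, minute):
    # Hypothetical helper: all sample events share one date, in UTC.
    return datetime(2026, 5, 17, hour, minute)


# Hypothetical NodeNotReady events as (timestamp, node) pairs.
events = [
    (at(3, 0), "worker-07"), (at(3, 4), "worker-07"), (at(3, 8), "worker-07"),
    (at(3, 0), "worker-03"), (at(3, 20), "worker-03"), (at(3, 40), "worker-03"),
]
breaches = nodes_breaching(events)
```

Here worker-07 breaches on its third event, while worker-03 has three events that never fall inside one 10-minute window, so only the former would qualify for a P1 under the recovered rule.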
Lessons learned: Evidence matters more than confident phrasing: a beautiful rule without evidence_ref is still a hypothesis
Separation of requirements and memory bank must happen at the inventory stage, not at the end — otherwise false rules infiltrate the contract
Exceptions (canary, planned deployment) are not noise but pointers to hidden specification conditions; they cannot be removed
Open questions must be explicitly marked, not masked as approved — this protects against accepting unsubstantiated decisions
Dual notation Given/When/Then + JSON Schema eliminates the gap between human-readable description and machine verifiability
Even strong evidence requires a living review mechanism upon team or architecture change
Related concepts: genealogy.md
memory bank
evidence_ref
Given/When/Then
JSON Schema contract
timeline normalization
uncertainty
Name: Error of mixing memory bank and requirements in a monitoring migration project
Scenario: The platform team migrated from a custom monitoring system to Prometheus+Grafana. When recovering alerting specifications, engineers included in the SDD the phrase: 'Alerts severity=P1 are sent to channel #critical-alerts, mentioning oncall from team platform'.
Challenge: After 6 months, team platform was renamed to platform-infra, the channel changed name, and on-call rotation moved to PagerDuty. The SDD had to be rewritten, even though the business logic (when P1 is created) did not change. Problem: operational context was encoded as a contractual requirement.
Solution: Retrospective analysis showed that the correct separation is: requirement — 'When >=3 NodeNotReady within 10m, a P1 is created, escalation occurs within 15 minutes per SLA'. Memory bank — 'channel #critical-alerts, team platform, current on-call Vasya'. Rewritten SDD with genealogy.md separates stable behavior from changing context. JSON Schema contains severity and sla_minutes, but not channel_name or team_name.
Result: The new structure survived three communication reorganizations without contract changes. Updating memory bank is a minute-long operation not requiring arbitration. The lesson confirmed: the 'contract vs context' filter is critical for long-lived specifications.
Lessons learned: Team names, channel names, people's names — memory bank, regardless of how 'obvious' they seem when writing
If reading the SDD raises the question 'what if the team is renamed?' — this is a signal of layer mixing
JSON Schema as an audit tool: if a field has no enum or numeric boundaries, it likely belongs to memory bank
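The last lesson, using JSON Schema as an audit tool, can be turned into a small check. The sketch below flags leaf fields that carry neither an enum nor numeric boundaries; the function name unconstrained_fields and the sample schema are assumptions, with channel_name and team_name added only to reproduce the mistake from this case study.

```python
def unconstrained_fields(schema, prefix=""):
    """List leaf fields that have no enum, minimum, or maximum."""
    flagged = []
    for name, sub in schema.get("properties", {}).items():
        path = f"{prefix}{name}"
        if sub.get("type") == "object":
            # Recurse into nested objects such as 'conditions'.
            flagged.extend(unconstrained_fields(sub, path + "."))
        elif not any(key in sub for key in ("enum", "minimum", "maximum")):
            flagged.append(path)
    return flagged


# Hypothetical schema mixing contract fields with operational context.
mixed_schema = {
    "type": "object",
    "properties": {
        "rule_id": {"type": "string"},
        "severity": {"type": "string", "enum": ["P0", "P1", "P2", "P3"]},
        "sla_minutes": {"type": "integer", "minimum": 1, "maximum": 120},
        "channel_name": {"type": "string"},
        "team_name": {"type": "string"},
    },
}
suspects = unconstrained_fields(mixed_schema)
print(suspects)
```

The check is only a heuristic: it also flags rule_id, which is an identifier rather than context, so each flagged field still needs a human decision about which layer it belongs to.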
Related concepts: memory bank
contract vs context filter
JSON Schema contract
genealogy.md
Study tips: Start with the minimal educational scenario: one claim, two sources, one open question. Don't try to immediately cover all 47 pages of logs — that's the full production track
Practice on the 4-line course excerpt before applying to real materials. Structure matters more than volume
For visual style: draw two columns on paper — left 'requirement (verifiable)', right 'memory bank (context)'. Go through each fragment of the source artifact, deciding which column it belongs to
For auditory style: vocalize claims aloud with the question 'what should change in the system?' If the answer is 'nothing, this is just information' — it's memory bank
For kinesthetic style: physically move cards with fragments between two stacks 'SDD' and 'memory bank', discussing each move
Test every claim with the question 'can I write an automatic test for this?' If not — it may not be a requirement, or the requirement is not specific enough
Use genealogy.md as a 'suspicion checklist': if an entry lacks two evidence_ref — it is automatically a hypothesis, regardless of the author's confidence
Practice with Qwen Code on educational data in Plan Mode, but always validate the output JSON with a separate parser — the model generates a session report, not strict claims.json
Create 'counterexamples' as a mandatory field: a good exercise is to find a case in the source data that contradicts your claim, and explain why (exception? data error? incorrect hypothesis?)
Review genealogy.md after 24 hours with 'fresh eyes': can you, without recalling the context, understand why the threshold is exactly 3, not 2 or 4?
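The tip about validating model output with a separate parser can be sketched as follows. The structure of claims.json is not specified here, so the field names (claim, evidence_ref, counterexample, uncertainty), the function name check_claims, and the sample counterexample text are assumptions; the point is only that free-form session text fails loudly and that claims without sources are set aside rather than accepted.

```python
import json

REQUIRED_KEYS = ("claim", "evidence_ref", "counterexample", "uncertainty")


def check_claims(raw_text):
    """Split extracted claims into accepted ones and (index, reasons) rejects."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        # A session report in prose ends up here instead of passing silently.
        raise ValueError(f"output is not strict JSON: {exc}") from exc
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of claims")
    accepted, rejected = [], []
    for index, item in enumerate(data):
        if not isinstance(item, dict):
            rejected.append((index, ["claim is not an object"]))
            continue
        reasons = [f"missing field: {key}" for key in REQUIRED_KEYS if key not in item]
        refs = item.get("evidence_ref")
        if "evidence_ref" in item and not (isinstance(refs, list) and refs):
            reasons.append("evidence_ref must be a non-empty list")
        if reasons:
            rejected.append((index, reasons))
        else:
            accepted.append(item)
    return accepted, rejected


# Hypothetical extractor output: one sourced claim, one unsourced claim.
sample = json.dumps([
    {"claim": ">=3 NodeNotReady within 10m for one node creates P1",
     "evidence_ref": ["grafana:NR-2026-05-17-01", "postmortem:node-not-ready-2026-05"],
     "counterexample": "canary namespace event that did not escalate",
     "uncertainty": "medium"},
    {"claim": "canary is always non-escalating",
     "evidence_ref": [], "counterexample": None, "uncertainty": "high"},
])
accepted, rejected = check_claims(sample)
```

Rejected claims are returned with their reasons instead of being dropped, so they can be logged as hypotheses in genealogy.md rather than lost.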
Additional resources: Genealogy.md template: book2/examples/templates/genealogy.md — starter template for provenance
Educational project from first volume: AgentClinic on TypeScript, Hono, server JSX, SQLite, Vitest — foundation for understanding memory bank from tech-stack.md and QWEN.md
GitHub Spec Kit: https://github.com/github/spec-kit (framework of 'specification as executable artifact')
Part 13 of first volume: Recovering the constitution of an existing project — preliminary foundation
Part 8 of second volume: part-08-multiagent-tribunal.md — file arbitration, role details for Verifier/Implementor/Safety
Tribunal examples: examples/tribunal/ — runnable educational equivalent
Appendix A: appendix-a-bridges-to-book.md (connections with first volume)
Course repository: book2/examples/ — runnable Python stdlib scripts
Summary: Recovering specifications from legacy is an engineering technique for transforming artifact chaos (logs, chats, post-mortems) into a verifiable contract. Key principles: strict separation of requirements and memory bank, timeline normalization, extraction (not generation) of claims via Qwen Code with mandatory evidence_ref, dual notation Given/When/Then + JSON Schema, explicit marking of uncertainty and open_questions. The educational minimum is one completed genealogy.md with a protected assertion, two sources, and an open question. The full production track adds normalizers, historical replay, and file arbitration (part 8). The main result: the specification becomes a chain of evidence that can be validated, disputed, replayed on historical data, and audited months later.