Reading: Applied Part 1. Recovering Specifications from Legacy

Applied Part 1. Recovering Specifications from Legacy

Status: Recommendation. Gathering evidence, normalizing the timeline, and separating requirements from memory bank are established engineering practices. The three-party file arbitration at the end of the chapter is frontier.

For the educational walkthrough, it is enough to assemble one genealogy.md and separate an approved requirement from a hypothesis. File arbitration, normalizers, and historical data replay are only needed for the full production track.

This chapter continues Part 13 of the first volume: there we recovered the constitution of an existing project, here we recover one production requirement from the traces of an incident. Keep the focus narrow: one claim, two sources, one open question. Everything that requires normalizers, historical replay, or file arbitration belongs to the full track.

Before Reading

  • Foundation from the first volume: Part 13 teaches how to recover the constitution of an existing project; here you recover one production requirement.
  • Local educational case: node_not_ready, because it is easy to demonstrate provenance and uncertainty with it.
  • Artifact for capstone/: one genealogy.md entry for the main case, high_memory_usage, with two evidence_refs and one open question.
  • Key terms for the first pass: evidence_ref and memory bank (the boundary between requirement and background context). The remaining terms in the chapter — Verifier/Implementor/Safety, Coordinator-Scribe, normalizer, file arbitration — are reference material, examined in detail in Part 8.
  • What to defer: log normalizers, historical replay, and file arbitration.

In the first volume, AgentClinic was an educational project in TypeScript, Hono, server JSX, SQLite, and Vitest. In the second volume, we use the educational model AgentClinic-production. Imagine the same project deployed in Kubernetes. Grafana and PagerDuty send webhooks into its triage pipeline, and long-running replicas have accumulated operational history. Python in the second volume is used only for small runnable scripts in examples/, not as the main application stack.

No real cluster needs to be spun up. The legacy traces that Chapters 1–11 work with are educational post-mortems, dashboards, and logs from a production scenario. The specific incidents below (node_not_ready, appointment_latency / appointment_latency_spike, autoscale_200pct, cdn_error_budget_burn, high_memory_usage) are events from this model, not abstract scenarios.

The engineering name for this technique is recovering specifications from observable artifacts: logs, metrics, chats, post-mortems, and verifiable traces of decisions. If you encounter the figurative phrasing "Spec-necromancy," treat it only as a short label for this reconstruction, not a separate technique.

Goal

After turnover in the SRE team, only fragments of the automated incident management project remained: 47 pages of unstructured logs, several Slack threads, dashboard screenshots, and post-mortems without a formal SDD. The chapter's goal is to show how to recover an engineering-grade specification for a triage pipeline based on Qwen Code from such traces. The alternative, a set of plausible guesses, is not acceptable.

After this section you will be able to:

  • separate requirements from the background memory bank model (full definition in "Key Ideas" below);
  • gather evidence into a single chain of events;
  • extract implicit rules and turn them into verifiable user stories;
  • anchor the provenance of each item so that contentious decisions can be audited and re-justified later (the SDD "specification as executable artifact" frame from GitHub Spec Kit).

Minimal Educational Scenario

Educational Case

Production incident node_not_ready: from the metrics log, a PagerDuty escalation, and one post-mortem, you need to recover one requirement — when a NodeNotReady event becomes P1 and when it cannot be auto-closed.

Preparation

  • book2/examples/templates/genealogy.md — provenance template.
  • The educational excerpt below — a minimal substitute for logs, post-mortem, and Slack thread.
  • One contentious fact: planned deploy window, canary namespace, or manual escalation cancellation.

A logical question: how does genealogy.md differ from git log or git blame? In short, by the fields that carry meaning here. git log shows which file changed and who changed it. genealogy.md shows where the requirement itself came from, how confident we are in it (uncertainty), which sources confirm it (evidence_ref), and which open questions remain unanswered. A commit message "added requirement" in git history does not distinguish "we know this firmly from two post-mortems" from "we guessed this in chat." In genealogy.md, recording this difference is mandatory.

Minimal educational excerpt:

grafana:NR-2026-05-17-01  cluster=prod-k8s node=worker-07 event=NodeNotReady count=3 window=10m
pagerduty:NR-2026-05-17-01 escalation=created owner=platform_oncall severity=P1
postmortem:node-not-ready-2026-05  note="auto-resolve was rejected until two stable OK windows"
open_questions:
  - "does canary namespace exclude P1 or only reduce confidence?"

If you do not have your own logs, use this excerpt. If you have real materials, replace the lines with your own, but keep the same minimum: two sources, one claim, one open question.
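If you prefer to handle the excerpt programmatically, the sketch below shows one possible way to split each line into an evidence tag and its fields. The function name `parse_evidence_line` and the returned dictionary shape are illustrative assumptions, not part of the textbook repository; only the three excerpt lines come from the text above.

```python
import shlex


def parse_evidence_line(line):
    """Split 'source:id key=value ...' into an evidence_ref and its fields."""
    tokens = shlex.split(line)  # keeps note="..." together as one token
    evidence_ref = tokens[0]    # e.g. "grafana:NR-2026-05-17-01"
    source = evidence_ref.split(":", 1)[0]
    fields = dict(token.split("=", 1) for token in tokens[1:])
    return {"evidence_ref": evidence_ref, "source": source, "fields": fields}


# The three source lines of the educational excerpt, copied as-is.
EXCERPT = [
    "grafana:NR-2026-05-17-01  cluster=prod-k8s node=worker-07 event=NodeNotReady count=3 window=10m",
    "pagerduty:NR-2026-05-17-01 escalation=created owner=platform_oncall severity=P1",
    'postmortem:node-not-ready-2026-05  note="auto-resolve was rejected until two stable OK windows"',
]

records = [parse_evidence_line(line) for line in EXCERPT]
for record in records:
    print(record["evidence_ref"], record["fields"])
```

Values stay strings on purpose: deciding that count=3 is a threshold rather than a coincidence is part of the analysis, not of the parsing.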

Steps

  1. Copy the genealogy.md template into your working directory. Expected result: a file appears with sections for source, status, confidence, and open questions.
  2. Write one candidate claim: for example, >=3 NodeNotReady in 10 minutes creates P1.
  3. Add at least two evidence tags (evidence_ref) and one missing context. Expected result: the claim cannot be read as "just the author's opinion."
  4. Separate the requirement from memory bank: cluster topology and on-call names must not become the contract.
  5. Rewrite the claim in Given/When/Then and indicate alongside which field of the future JSON Schema will verify the threshold, severity, and closure condition.
  6. Set status to approved, needs_clarity, or rejected. Expected result: a contentious fact is not masked as an approved requirement.

Checkpoint

In genealogy.md there is one entry where the claim, sources, confidence level, missing context, and link to verifiable behavior are simultaneously visible. If the threshold or SLA cannot be defended with a source reference, the requirement remains a hypothesis.
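The checkpoint can also be expressed as a small self-check. This is a minimal sketch, assuming the entry has already been loaded into a Python dict with the keys used in this chapter (claim, status, evidence_ref, uncertainty, open_questions). The function `check_entry`, the value uncertainty: medium in the example, and the rule "an approved entry must have no open questions" are assumptions made for illustration.

```python
ALLOWED_STATUS = {"approved", "needs_clarity", "rejected"}


def check_entry(entry):
    """Return a list of problems; an empty list means the entry passes."""
    problems = []
    if not entry.get("claim"):
        problems.append("claim is missing")
    if len(entry.get("evidence_ref") or []) < 2:
        problems.append("fewer than two evidence_refs")
    if entry.get("status") not in ALLOWED_STATUS:
        problems.append("status must be approved, needs_clarity, or rejected")
    if not entry.get("uncertainty"):
        problems.append("confidence level (uncertainty) is missing")
    # Assumption: an entry with unanswered questions is still a hypothesis.
    if entry.get("status") == "approved" and entry.get("open_questions"):
        problems.append("approved entry still has open questions")
    return problems


entry = {
    "claim": ">=3 NodeNotReady in 10 minutes creates P1",
    "status": "needs_clarity",
    "evidence_ref": ["grafana:NR-2026-05-17-01", "pagerduty:NR-2026-05-17-01"],
    "uncertainty": "medium",  # illustrative value
    "open_questions": ["does canary namespace exclude P1 or only reduce confidence?"],
}

print(check_entry(entry))                            # passes: no problems
print(check_entry({**entry, "status": "approved"}))  # contentious fact masked as approved
```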

How This Goes into capstone/

Transfer into capstone/genealogy.md only one defended entry: the claim, two evidence_refs, confidence level, and open question. Do not transfer the entire timeline, log excerpts, and Slack quotes unless they became evidence for a specific requirement.

Minimal fragment for high_memory_usage:

- claim: "At memory_percent >= 90% for 10m for appointments-api, P1 is created."
  status: needs_clarity
  evidence_ref: ["grafana:HM-2026-05-17-01", "postmortem:api-memory-2026-05"]
  uncertainty: medium
  open_questions:
    - "Is the ban on auto-resolve without two stable windows confirmed?"

Reviewable Trace

In the educational package, keep only the completed genealogy.md or its fragment. Draft log excerpts and temporary tables are not needed in the repository unless they became verifiable evidence.

Key Ideas

The first discipline of specification recovery is to strictly separate factual requirements from the background memory bank model. By memory bank we mean a separate layer of infrastructure context: everything that helps interpret facts but is not itself a contract.

If this term seems new, look at it through the first volume. What lived there in tech-stack.md (the stack we build on) and in QWEN.md (the agent's permanent context) goes by the single common name memory bank in the second volume. It is the same background layer, only now it is explicitly separated from requirements, because in production scenarios the difference between contract and context becomes critical.

A requirement, unlike memory bank, describes feature behavior: what counts as a trigger, when an incident is created, which SLA applies, who receives escalation, and under what conditions the event is closed.

memory bank stores something else: cluster topology, team list, historical agreements, API constraints, habitual communication channels, and operational vocabulary. Why does this separation matter? If you mix the levels, a false rule like "canary is always non-escalating" can easily appear in the SDD. In reality this may be only context for a test namespace, not universal product behavior.

Introduce the separation already at the artifact inventory stage. In the SDD, include claims that can be verified by an observable scenario: >=3 NodeNotReady in 10 minutes creates P1, NOC receives notification within 15 minutes, closure requires 2 consecutive OKs.

Send to memory bank everything that helps interpret facts but is not a contract:

  • who was on call the night of the incident;
  • why an old service name was used in Slack;
  • which teams have access to Grafana.

Such a filter reduces the risk that Qwen Code will take infrastructure background for a business rule and start designing behavior based on an accidental detail.

The second idea is to gather and normalize evidence into a single temporal chain of events. Each source has its own profile:

  • logs provide observable states and the order of event occurrence;
  • Slack shows operator intent and manual workarounds;
  • post-mortem records causes and consequences;
  • metrics allow estimating the scale of degradation.

Before analysis, bring sources to common time (UTC). Remove duplicates, highlight event codes, and link records by a common incident, cluster, node, or deployment identifier. Without this, SDD recovery turns into an argument about memories rather than a reconstruction of system behavior.

The normalized chain is built as a sequence of ts → source → event_code → actor → affected_scope → evidence_ref, where the last field is the evidence tag (evidence_ref), a reference to a specific place in the original artifact. In the node_not_ready case, the skeleton may show that three NodeNotReady events in 10 minutes almost always preceded P1 creation. Then after 15 minutes came escalation to NOC. Closure only occurred after a pair of stable OKs.
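The normalization described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the textbook's timeline_builder.py: the function names, the sample timestamps, the actor values, and the event code EscalationCreated are hypothetical, and timestamps are assumed to arrive as ISO 8601 strings with an explicit UTC offset.

```python
from datetime import datetime, timezone

FIELDS = ("ts", "source", "event_code", "actor", "affected_scope", "evidence_ref")


def to_utc(ts_text):
    """Parse an ISO 8601 timestamp with an explicit offset and convert it to UTC."""
    ts = datetime.fromisoformat(ts_text)
    if ts.tzinfo is None:
        raise ValueError(f"timestamp without offset: {ts_text}")
    return ts.astimezone(timezone.utc)


def build_timeline(raw_events):
    """Keep the six chain fields, convert ts to UTC, drop duplicates, sort by ts."""
    seen = set()
    rows = []
    for event in raw_events:
        row = {name: event[name] for name in FIELDS}
        row["ts"] = to_utc(row["ts"])
        key = (row["ts"], row["source"], row["event_code"], row["affected_scope"])
        if key in seen:
            continue  # same event exported twice
        seen.add(key)
        rows.append(row)
    return sorted(rows, key=lambda r: r["ts"])


# Hypothetical raw events; only the evidence_ref values come from the excerpt.
RAW = [
    {"ts": "2026-05-17T11:05:00+00:00", "source": "pagerduty", "event_code": "EscalationCreated",
     "actor": "platform_oncall", "affected_scope": "prod-k8s/worker-07",
     "evidence_ref": "pagerduty:NR-2026-05-17-01"},
    {"ts": "2026-05-17T14:02:00+03:00", "source": "grafana", "event_code": "NodeNotReady",
     "actor": "alertmanager", "affected_scope": "prod-k8s/worker-07",
     "evidence_ref": "grafana:NR-2026-05-17-01"},
    # The same instant exported a second time, already in UTC: a duplicate.
    {"ts": "2026-05-17T11:02:00+00:00", "source": "grafana", "event_code": "NodeNotReady",
     "actor": "alertmanager", "affected_scope": "prod-k8s/worker-07",
     "evidence_ref": "grafana:NR-2026-05-17-01"},
]

timeline = build_timeline(RAW)
for row in timeline:
    print(row["ts"].isoformat(), row["source"], row["event_code"], row["evidence_ref"])
```

Rejecting timestamps without an offset is a deliberate choice in this sketch: silently guessing a time zone would reintroduce exactly the kind of ambiguity the timeline is meant to remove.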

Separately record exceptions: planned deploy window, canary namespace, temporary metrics loss, or manual escalation cancellation. Do not remove such exceptions as noise — they often point to hidden conditions of the future specification.

> [conceptual interface] — these commands show the expected interface of local normalizers. Ready-made timeline_builder.py and evidence_matrix.py are not in the textbook repository; implement them in your own project if you move from the educational minimum to the full track.

rg -n "NotReady|NodeNotReady|ALERT|deploy" evidence/raw/* > evidence/index.txt
python3 tools/timeline_builder.py --input evidence/raw --out evidence/timeline.ndjson
python3 tools/evidence_matrix.py \
  --timeline evidence/timeline.ndjson \
  --slack evidence/slack_export.json \
  --metrics evidence/metrics.csv \
  --out evidence/matrix.csv

Check: every row in evidence/timeline.ndjson contains ts, source, event_code, cluster, namespace, actor, and evidence_ref; empty fields block transition to requirement inference.
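The check on timeline rows can be automated in a few lines. The sketch below assumes the NDJSON content is already read into a string; the function name `find_incomplete_rows` and the sample rows are hypothetical.

```python
import json

REQUIRED = ("ts", "source", "event_code", "cluster", "namespace", "actor", "evidence_ref")


def find_incomplete_rows(ndjson_text):
    """Return (line_number, missing_fields) for every row that fails the check."""
    failures = []
    for number, line in enumerate(ndjson_text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines between records
        row = json.loads(line)
        missing = [name for name in REQUIRED if row.get(name) in (None, "")]
        if missing:
            failures.append((number, missing))
    return failures


# Hypothetical two-row timeline: the second row has an empty namespace.
SAMPLE = "\n".join([
    json.dumps({"ts": "2026-05-17T11:02:00+00:00", "source": "grafana",
                "event_code": "NodeNotReady", "cluster": "prod-k8s", "namespace": "default",
                "actor": "alertmanager", "evidence_ref": "grafana:NR-2026-05-17-01"}),
    json.dumps({"ts": "2026-05-17T11:05:00+00:00", "source": "pagerduty",
                "event_code": "EscalationCreated", "cluster": "prod-k8s", "namespace": "",
                "actor": "platform_oncall", "evidence_ref": "pagerduty:NR-2026-05-17-01"}),
])

failures = find_incomplete_rows(SAMPLE)
print(failures or "timeline is complete; requirement inference may start")
```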

The diagram below shows how a recovered SDD is obtained from legacy. It includes an "Arbitration" block; in the full track this block holds three roles and a coordinator, examined in detail in Part 8. On the first pass, treat it as a single step, "an independent role checks contentious requirements"; the detailed role composition does not need to be read here.

flowchart TD
  subgraph Input["Input: legacy"]
    L[Logs, post-mortem, Slack, metrics]
  end
  subgraph Processing["Processing"]
    P[Parsing and timeline]
    R[Requirement hypotheses and user stories]
  end
  subgraph Arbitration["Arbitration (full track, ch.8)"]
    TBR[Independent role checks contentious requirements]
  end
  subgraph Result["Result"]
    S[Recovered SDD and genealogy.md]
  end
  L --> P --> R --> TBR --> S

The third idea is to extract implicit requirements through Qwen Code, but evaluate each claim by source and context. Qwen Code here does not work as the author of business logic, but as an intermediary for extraction. It is given facts, environment constraints, and a strict response format where claims without evidence references are prohibited.

A good prompt does not ask to "invent an SDD," but to do something else:

  • find repeating rules in the event chain;
  • indicate confirming sources;
  • name counterexamples;
  • assign a confidence level.

Thus the model amplifies the analysis but does not get the right to turn guesses into requirements. Expect a list of candidate claims from Qwen Code, not a final specification.

Bad:

> REQ-NR-01: when frequent NodeNotReady on node, P1 is created.

Problem: no threshold, no window, no evidence tag. The rule is impossible to verify or challenge.

Good:

> REQ-NR-01: when >=3 NodeNotReady in 10 minutes on one node and correlated 5xx growth, P1 is created. evidence: logs/node-2026-05-12.parquet#row_4123, slack/thread_11#msg_7, grafana/node_5xx#segment_11:00. confidence: medium. missing_context: planned deploy window.

What this gives in practice. Such a record is more useful than a smooth user-story text: it immediately shows where the requirement is solid and where it needs checking with the service owner. If the rule is confirmed by only one post-mortem and does not match metrics, it remains a hypothesis even if it sounds convincing.

> [project script] qwen -p is runnable by itself, but the input @evidence/matrix.csv must first be assembled in your project. Stabilize the final JSON format with a separate parser-normalizer.

qwen -p "Read @evidence/matrix.csv. Find repeating rules
for incident node_not_ready. Return claims with evidence, counterexample,
missing_context, and confidence. Do not assert facts without evidence." \
  --approval-mode plan \
  --output-format json \
  > sdd/drafts/nr-claims.qwen.json

qwen -p "Read @sdd/drafts/nr-claims.qwen.json and cross-examine it:
for each claim check source, counterexample, and missing_context.
Mark claim as approved, needs_clarity, or rejected." \
  --approval-mode plan \
  --output-format json \
  > sdd/drafts/nr-claims-cross.qwen.json

Check: Qwen Code runs here in headless Plan Mode. Its final JSON is a session message report; if the project needs a strict claims.json, add a separate parser-normalizer and test it.
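A parser-normalizer of this kind might look like the sketch below. It deliberately does not parse the Qwen Code session report itself, whose structure is not described in this chapter. It assumes the candidate claims have already been extracted into a list of dicts with the keys requested in the prompt (claim, evidence, counterexample, missing_context, confidence). The function name and the choice to mark well-formed claims as needs_clarity until cross-examination are assumptions.

```python
import json

CONFIDENCE = {"low", "medium", "high"}


def normalize_claims(raw_claims):
    """Turn loosely shaped candidate claims into strict records; never invent evidence."""
    strict = []
    for raw in raw_claims:
        evidence = [ref.strip() for ref in (raw.get("evidence") or [])
                    if isinstance(ref, str) and ref.strip()]
        confidence = str(raw.get("confidence") or "").strip().lower()
        record = {
            "claim": str(raw.get("claim") or "").strip(),
            "evidence": evidence,
            "counterexample": raw.get("counterexample"),
            "missing_context": raw.get("missing_context"),
            "confidence": confidence if confidence in CONFIDENCE else None,
        }
        # A claim without text, evidence, or a recognised confidence is rejected;
        # everything else waits for cross-examination as needs_clarity.
        complete = bool(record["claim"] and evidence and record["confidence"])
        record["status"] = "needs_clarity" if complete else "rejected"
        strict.append(record)
    return strict


# Hypothetical extracted claims; the evidence references come from the chapter's example.
raw_claims = [
    {"claim": ">=3 NodeNotReady in 10 minutes on one node and correlated 5xx growth creates P1",
     "evidence": ["logs/node-2026-05-12.parquet#row_4123", "slack/thread_11#msg_7"],
     "missing_context": "planned deploy window", "confidence": "Medium"},
    {"claim": "frequent NodeNotReady creates P1", "evidence": [], "confidence": "high"},
]

claims = normalize_claims(raw_claims)
print(json.dumps(claims, indent=2))
```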

The fourth idea is to encode requirements simultaneously in Given/When/Then and a machine-readable contract, e.g. JSON Schema. Given/When/Then keeps the requirement in behavioral language: initial state, event, expected result.

JSON Schema fixes mandatory fields, allowed values, numeric bounds, and data structure. The contract can be validated in CI or a local validator pipeline. Dual notation eliminates the gap between "understandable to humans" and "verifiable by machines".

For node_not_ready the behavioral story looks like this:

  • Given cluster prod-k8s is in active shift and >=3 NodeNotReady is recorded for one node within 10 minutes;
  • When the event is correlated with a deployment or 5xx growth in related metrics;
  • Then incident severity=P1 is created, initial response is expected within 8 minutes, auto-escalation to NOC occurs within 15 minutes, and closure is only permitted after 2 consecutive OKs within 10 minutes.

Formulate the canary namespace exception as a separate condition, not as a note at the end. Otherwise the validator will not be able to distinguish the standard path from the relaxed threshold. This format turns the conversation about "quick response" into concrete numbers, events, and statuses.
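To see how the behavioral story becomes checkable, here is a minimal sketch of the trigger and the closure condition for the standard path. The function names are hypothetical, and the reading of "2 consecutive OKs within 10 minutes" as "the two most recent checks are OK and at most 10 minutes apart" is an assumption. The canary path is intentionally not modeled, because it remains an open question in this chapter.

```python
from datetime import datetime, timedelta


def p1_required(not_ready_count, correlated):
    """Given/When: >=3 NodeNotReady in the window plus correlation. Then: P1."""
    return not_ready_count >= 3 and correlated


def closure_allowed(checks, window=timedelta(minutes=10)):
    """Closure needs the two most recent checks to be OK and at most `window` apart."""
    if len(checks) < 2:
        return False
    (prev_ts, prev_status), (last_ts, last_status) = sorted(checks)[-2:]
    return prev_status == "OK" and last_status == "OK" and last_ts - prev_ts <= window


# Hypothetical health checks after the incident was opened.
t0 = datetime(2026, 5, 17, 11, 0)
recovering = [(t0, "FAIL"), (t0 + timedelta(minutes=5), "OK"), (t0 + timedelta(minutes=12), "OK")]
flapping = [(t0, "OK"), (t0 + timedelta(minutes=5), "FAIL")]

print(p1_required(3, correlated=True))   # threshold met and correlated
print(closure_allowed(recovering))       # two consecutive OKs, 7 minutes apart
print(closure_allowed(flapping))         # last check is not OK
```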

Minimal JSON Schema for the same contract (full form with triggers and regex for auto_resolve_window — in the full track):

{
  "$id": "urn:spec:node-not-ready:v1",
  "type": "object",
  "required": ["rule_id", "severity", "sla_minutes", "conditions"],
  "properties": {
    "rule_id":      {"type": "string"},
    "severity":     {"type": "string", "enum": ["P0", "P1", "P2", "P3"]},
    "sla_minutes":  {"type": "integer", "minimum": 1, "maximum": 120},
    "conditions": {
      "type": "object",
      "required": ["event_code", "count", "window_minutes", "namespace_rule"],
      "properties": {
        "count":          {"type": "integer", "minimum": 3},
        "window_minutes": {"type": "integer", "minimum": 1},
        "namespace_rule": {"type": "string", "enum": ["standard", "canary"]}
      }
    }
  }
}
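In a real project a full JSON Schema validator, such as the third-party jsonschema package, would usually do this job. To stay on the standard library, the sketch below hand-checks only the keywords this minimal schema uses (type, required, properties, enum, minimum, maximum). The function `validate`, the rule_id value NR-01, and the two sample instances are illustrative assumptions.

```python
SCHEMA = {
    "$id": "urn:spec:node-not-ready:v1",
    "type": "object",
    "required": ["rule_id", "severity", "sla_minutes", "conditions"],
    "properties": {
        "rule_id": {"type": "string"},
        "severity": {"type": "string", "enum": ["P0", "P1", "P2", "P3"]},
        "sla_minutes": {"type": "integer", "minimum": 1, "maximum": 120},
        "conditions": {
            "type": "object",
            "required": ["event_code", "count", "window_minutes", "namespace_rule"],
            "properties": {
                "count": {"type": "integer", "minimum": 3},
                "window_minutes": {"type": "integer", "minimum": 1},
                "namespace_rule": {"type": "string", "enum": ["standard", "canary"]},
            },
        },
    },
}

TYPES = {"object": dict, "string": str, "integer": int}


def validate(instance, schema, path="$"):
    """Check only the keywords used above; returns a list of error strings."""
    expected = TYPES.get(schema.get("type"))
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if expected and (not isinstance(instance, expected) or isinstance(instance, bool)):
        return [f"{path}: expected {schema['type']}"]
    errors = []
    if "enum" in schema and instance not in schema["enum"]:
        errors.append(f"{path}: {instance!r} is not an allowed value")
    if "minimum" in schema and instance < schema["minimum"]:
        errors.append(f"{path}: {instance} is below minimum {schema['minimum']}")
    if "maximum" in schema and instance > schema["maximum"]:
        errors.append(f"{path}: {instance} is above maximum {schema['maximum']}")
    if isinstance(instance, dict):
        for name in schema.get("required", []):
            if name not in instance:
                errors.append(f"{path}.{name}: required field is missing")
        for name, subschema in schema.get("properties", {}).items():
            if name in instance:
                errors.extend(validate(instance[name], subschema, f"{path}.{name}"))
    return errors


good = {"rule_id": "NR-01", "severity": "P1", "sla_minutes": 8,
        "conditions": {"event_code": "NodeNotReady", "count": 3,
                       "window_minutes": 10, "namespace_rule": "standard"}}
bad = {"rule_id": "NR-01", "severity": "P5", "sla_minutes": 8,
       "conditions": {"event_code": "NodeNotReady", "count": 2, "window_minutes": 10}}

print(validate(good, SCHEMA))
for error in validate(bad, SCHEMA):
    print(error)
```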

The fifth idea applies only to the full track: contentious recovered requirements can be sent to file arbitration. Three roles vote — Verifier, Implementor, Safety; the Coordinator keeps the log without voting. The Verifier checks consistency of numbers and statuses, the Implementor — implementability in the current triage pipeline, Safety — boundaries of safe action and veto right at critical_risk. Roles, verdicts, and precedents are examined in detail in Part 8; the runnable educational equivalent is in [examples/tribunal/](examples/tribunal/). For the educational minimum this step is not needed: genealogy.md with sources, confidence level, and open question is sufficient.

The sixth idea is to maintain genealogy.md, a separate registry of provenance for each requirement. Why is it needed? A recovered SDD quickly loses value if, a month later, nobody can explain:

  • why the threshold of 3 events in 10 minutes was chosen;
  • who confirmed the 8-minute SLA;
  • why canary got a separate mode.

genealogy.md links the claim to logs, Slack, metrics, post-mortem, file arbitration decision, and current uncertainty level. Thus the specification becomes a chain of evidence, not a textual snapshot of collective memory.

- req_id: NR-01
  statement: "At >=3 NodeNotReady in 10m for one node and 5xx growth, P1 is created."
  source:
    - logs: evidence/normalized_node_logs.parquet#row_4123
    - slack: export/slack_thread_11.json#msg_7
    - metrics: grafana/node_5xx_timeseries.csv#segment_2026-05-12T11:00
  status: approved
  adjudicated_by: [Verifier, Implementor, Safety]
  uncertainty: low
  open_questions: []

If an item remains contentious, do not mask it as an approved contract. Set uncertainty: medium or uncertainty: high, indicate the reason for doubt, and add a verification plan:

  • ask the service owner;
  • run replay on historical data;
  • compare with a neighboring cluster;
  • gather the missing metric.

Such a provenance registry is especially important for the future Project Constitution. Only rules with clear provenance, scope of applicability, and review mechanism should transfer into it.

Examples and Application

The educational excerpt of 4 lines in the "Minimal Educational Scenario" is already a filtered normalization result. The original set contains:

  • 9 hours of observations;
  • 11 relevant Slack messages;
  • 47 pages of uncleaned logs;
  • 1,248 NodeNotReady events;
  • 63 alerts;
  • 8 previously closed incidents.

After normalization it is visible that a sharp NodeNotReady growth coincided with deployment, some events went into the canary segment with different auto-escalation logic, and two behavior branches appear: standard P1 and canary path with relaxed thresholds.

> [conceptual interface] — normalizer pseudocode. Runnable examples in the second volume remain on Python stdlib and live in book2/examples/.

read evidence/normalized_node_logs
sort events by ts
filter event_code == "NodeNotReady"
group by cluster,node in 10m windows
mark windows where count >= 3

link marked windows to alerts and Slack messages in [-15m,+5m]

The [-15m,+5m] window is needed because the operator may have discussed the problem before the formal incident record or already after the automatic alert. If the event belongs to a canary namespace without SLO degradation — set a separate tag, do not remove it as noise. If a planned deploy window explains some NodeNotReady, directly indicate in the requirement whether this blocks P1 creation or only reduces confidence.
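The pseudocode above could be made runnable roughly as follows. This is a sketch under assumptions: "10m windows" is read as fixed 10-minute buckets, and [-15m,+5m] is read as "from 15 minutes before the bucket starts to 5 minutes after it ends". The event and message records, along with all function names, are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)
BEFORE = timedelta(minutes=15)
AFTER = timedelta(minutes=5)


def window_start(ts):
    """Floor a timestamp to the start of its fixed 10-minute bucket."""
    return ts - timedelta(minutes=ts.minute % 10, seconds=ts.second,
                          microseconds=ts.microsecond)


def marked_windows(events, threshold=3):
    """Count NodeNotReady per (cluster, node, bucket); keep buckets at or over the threshold."""
    counts = Counter(
        (e["cluster"], e["node"], window_start(e["ts"]))
        for e in events
        if e["event_code"] == "NodeNotReady"
    )
    return sorted(key for key, count in counts.items() if count >= threshold)


def link_messages(bucket_start, messages):
    """Return alerts and Slack messages inside [bucket_start - 15m, bucket_end + 5m]."""
    earliest = bucket_start - BEFORE
    latest = bucket_start + WINDOW + AFTER
    return [m for m in messages if earliest <= m["ts"] <= latest]


def at(hour, minute):
    return datetime(2026, 5, 17, hour, minute)


events = [
    {"cluster": "prod-k8s", "node": "worker-07", "event_code": "NodeNotReady", "ts": at(11, 1)},
    {"cluster": "prod-k8s", "node": "worker-07", "event_code": "NodeNotReady", "ts": at(11, 4)},
    {"cluster": "prod-k8s", "node": "worker-07", "event_code": "NodeNotReady", "ts": at(11, 8)},
    {"cluster": "prod-k8s", "node": "worker-02", "event_code": "NodeNotReady", "ts": at(11, 3)},
]
messages = [
    {"source": "slack", "ts": at(10, 50)},   # discussed before the formal record
    {"source": "alert", "ts": at(11, 14)},   # shortly after the bucket ends
    {"source": "slack", "ts": at(11, 30)},   # too late to be linked
]

marked = marked_windows(events)
linked = link_messages(marked[0][2], messages)
print(marked)
print([m["source"] for m in linked])
```

Fixed buckets can split a burst across two windows and miss it; a sliding window avoids that at the cost of overlapping matches. Which reading matches the historical behavior is itself a question for the evidence, not for the code.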

The recovered SDD becomes a working artifact only after replay: run historical incidents through the new JSON contract and check whether the created severity, SLA, and escalations match confirmed outcomes. Mismatches do not always mean a contract error — sometimes they show that old practice was inconsistent or depended on a specific on-call person. What to change in this case — the specification, memory bank, or hypothesis status in genealogy.md — is decided by file arbitration from Part 8.
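A replay loop can be sketched in a few lines. Everything here is illustrative: the historical incident records, the field names, and `decide_severity` (a stand-in that encodes only the NR-01 threshold) are assumptions, and a real replay would call whatever evaluates your JSON contract.

```python
def decide_severity(incident):
    """Hypothetical stand-in for the contract: the NR-01 rule only."""
    if incident["not_ready_count"] >= 3 and incident["correlated"]:
        return "P1"
    return None


def replay(incidents, decide=decide_severity):
    """Compare the contract's decision with the confirmed outcome of each incident."""
    mismatches = []
    for incident in incidents:
        predicted = decide(incident)
        if predicted != incident["confirmed_severity"]:
            mismatches.append({
                "id": incident["id"],
                "predicted": predicted,
                "confirmed": incident["confirmed_severity"],
            })
    return mismatches


# Hypothetical history; hist-2 was handled without a P1 despite meeting the threshold.
HISTORY = [
    {"id": "hist-1", "not_ready_count": 4, "correlated": True, "confirmed_severity": "P1"},
    {"id": "hist-2", "not_ready_count": 5, "correlated": True, "confirmed_severity": None},
    {"id": "hist-3", "not_ready_count": 1, "correlated": False, "confirmed_severity": None},
]

mismatches = replay(HISTORY)
for item in mismatches:
    print(item)
```

The function only reports mismatches. It does not decide whether the contract or the old practice was wrong, which keeps that decision with people or with arbitration.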

Summary

Recovering specifications from legacy restores an SDD not from intuition, but from a verifiable chain of evidence. The route is:

  • legacy artifacts are normalized into a timeline;
  • Qwen Code extracts candidate claims with confidence level;
  • requirements are separated from memory bank;
  • then encoded in Given/When/Then and JSON Schema;
  • for the full track they pass Verifier/Implementor/Safety file arbitration, with the Coordinator keeping the log;
  • and receive provenance in genealogy.md.

This process turns the chaos of logs, chats, and post-mortems into a contract. The contract can be validated, challenged, replayed on historical data, and transferred to a stricter rule system. In the next chapter we will intentionally poison specifications with contradictions and study where Qwen Code begins to get stuck.

Artifacts and Definition of Done

| Artifact | Done when |
| --- | --- |
| genealogy.md with one requirement or hypothesis | requirement is separated from memory bank, contentious facts are marked as hypotheses |
| At least two evidence_refs and one missing context | claim cannot be read as "author's opinion," threshold/SLA is defended by source reference or explicitly marked as not yet approvable |
| Given/When/Then formulation | verifiable fields are linked to what the JSON Schema covers |

The full track adds evidence/timeline.ndjson, evidence/matrix.csv with references to logs, Slack, metrics, and post-mortems, sdd/drafts/nr-claims.qwen.json with candidate claims, contracts/node_not_ready.schema.json, and a file arbitration record for requirements that cannot be approved manually. Consider the full track done when Given/When/Then and JSON Schema describe the same contract, the normalizer gives a reproducible timeline, and the validator or file arbitration delivers a verifiable verdict.

Practice

  1. Copy [examples/templates/genealogy.md](examples/templates/genealogy.md) to capstone/genealogy.md and fill in one entry for the main case high_memory_usage: claim, at least two evidence_refs, confidence level, and one open question. The educational excerpt from the "Minimal Educational Scenario" can be used as a substitute for real logs.
  2. Rewrite your claim in Given/When/Then and indicate alongside which three JSON Schema fields verify the threshold, severity, and closure condition. Leave any field that cannot be defended by a source reference as uncertainty: medium, not as an approved contract.
  3. Open [appendix-a-bridges-to-book.md](appendix-a-bridges-to-book.md) and mark which chapter of the first volume was the foundation for your genealogy.md. If there is no foundation, this is a signal that the requirement is not yet tied to the educational model.

Review Questions

  1. Why is evidence more important than a confidently worded requirement?
  2. How does memory bank differ from an SDD contract, and why is it dangerous to mix them?
  3. When can a hypothesis not be transferred to an approved requirement?
  4. You recovered a rule from two post-mortems, but the service owner left half a year ago. What will you do with this rule before adding it to requirements.md?