Reading: Applied Part 11. Integration with a Real API: From Specification to Deployment

Lesson 1 of 5 in module «Applied Part 11. Integration with a Real API: From Specification to Deployment»

Status: Recommendation. The SDD phase separation (Specify/Plan/Tasks/Implement/Validate) and the 25-point readiness model are a recommended framework. The educational pass does not require an actual Kubernetes cluster, GitOps, or an external executor.

Frontier. Fully automated auto-remediation without human review on the critical path remains frontier: even teams with extensive SDD experience keep a human in the loop. Of the built-in Qwen Code commands, only /plan is available here; the other steps are user commands or direct qwen -p via project scripts.

For the educational pass, the local pipeline examples/real-api/ is sufficient: normalize webhooks, pass the readiness gate, and block a forbidden action. GitOps, Kubernetes API, and full auto-remediation belong to the full production track.

> [runnable] — a runnable equivalent of the "webhook → normalization → readiness gate → dry run" pipeline is in [examples/real-api/](examples/real-api/README.md). Scripts work on stdlib without external dependencies; they do not replace production infrastructure, but allow you to run the gate locally and see which conditions block the action.

The high_memory_usage scenario is a spike in readings on the same SQLite database we built in Part 12 of the first volume, with the same idempotent migration technique, only now viewed from the operations side. The Specify → Plan → Tasks → Implement cycle, practiced in Parts 7, 8, and 9 of the first volume, is not cancelled or replaced here. It is wrapped in a production gate and completed with a team review of the evidence package in the spirit of Part 16.

Before Reading

  • Foundation from the first volume: Parts 7–9 establish the specification-plan-validation cycle, and Part 16 covers team review.
  • Local educational case: high_memory_usage, the canonical case of the entire first pass.
  • Trace for capstone/: readiness verdict, two blocking conditions, and dry-run of the permitted action.
  • Key terms of the first pass: readiness and dry-run. The 25-point rubric, audit_trace, GitOps, and executor are reference material.
  • What to defer: GitOps, Kubernetes API, full executor, and auto-remediation without manual confirmation.

Objective

In the educational minimum, this chapter checks the short chain webhook -> normalization -> readiness -> dry-run for high_memory_usage. The full production track expands it to GitOps deployment, rollback of changes, and readiness assessment before limited auto-remediation. Every action must be linked to specify/plan/tasks/implement/validate artifacts, not lost in manual commands.

The practical result of the first pass is not a production orchestrator, but proof that a permitted action passes readiness, while a forbidden one is blocked until the system is changed.

readiness here is a formal pipeline assessment on a 25-point scale with a threshold of 23/25. Auto-remediation in this chapter means a limited playbook with pre-approved actions, rollback conditions, and human review. This is not an agent's right to arbitrarily change production.

Of the built-in Qwen Code commands in this pipeline, only /plan is available. The other steps — /sdd:specify, /sdd:tasks, /sdd:validate — should be implemented as user commands in .qwen/commands/sdd/ or replaced with regular prompts via qwen -p and project scripts.

Minimal Educational Scenario

Educational Case

The production incident high_memory_usage for appointments-api is derived from the MVP phase and the SQLite migrations in book/part-12-mvp.md. The pipeline is: webhook from Grafana and PagerDuty → normalize_webhook.py → readiness gate by the 25-point model → dry run against the list of pre-approved actions. The goal is to complete the full path from raw payload to a controlled restart_pod and to confirm that the blocking conditions (audit, stateful) catch failures exactly where they should.

Preparation

  • book2/examples/real-api/fixtures/webhook_grafana.json, webhook_pagerduty.json — raw payloads with the same incident_key.
  • book2/examples/real-api/fixtures/incident_event.expected.json — reference for normalized event.
  • book2/examples/real-api/fixtures/readiness_pass.json (24/25), readiness_block_audit.json (22/25 + audit below 1.0), readiness_block_stateful.json (24/25, but stateful without confirmed backup).
  • book2/examples/real-api/specs/high_memory_usage/specify.md — pre-approved restart_pod and scale_up_replicas_one.
  • book2/examples/real-api/scripts/normalize_webhook.py, check_readiness.py, dry_run.py.

Steps

  1. cd book2/examples/real-api. Expected: you are in the example directory, no additional dependencies.
  2. python3 scripts/normalize_webhook.py --grafana fixtures/webhook_grafana.json --pagerduty fixtures/webhook_pagerduty.json --expected fixtures/incident_event.expected.json. *Expected: exit code 0, normalized incident_event matches reference.*
  3. python3 scripts/check_readiness.py --readiness fixtures/readiness_pass.json. *Expected: exit code 0, PASS incident=HM-2026-05-17-01 score=24/25.*
  4. python3 scripts/check_readiness.py --readiness fixtures/readiness_block_audit.json. *Expected: exit code 1, reason: audit_trace_coverage=0.7 < 1.0, plus failure by total score (22/25).*
  5. python3 scripts/check_readiness.py --readiness fixtures/readiness_block_stateful.json. *Expected: exit code 1, reason: stateful workload without confirmed backup, even though total is 24/25.*

Bad: run dry_run.py before the readiness gate — the action is formally permitted by specification, but audit_trace_coverage or backup_verified may be missing. Good: gate first, dry run only on exit code 0 from gate — this sequence ensures the blast radius is known before checking the action list.

  6. python3 scripts/dry_run.py --spec specs/high_memory_usage/specify.md --action restart_pod. *Expected: exit code 0, PASS: action=restart_pod permitted (2 actions in spec).*
  7. python3 scripts/dry_run.py --spec specs/high_memory_usage/specify.md --action delete_namespace. *Expected: exit code 1, BLOCK: action="delete_namespace" not found among pre-approved.*
  8. For the educational minimum, stop here: the runnable chain has demonstrated normalization, PASS for the permitted path, and BLOCK for audit/stateful/delete-namespace.
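
The two dry_run.py steps above amount to a lookup against the pre-approved list. The sketch below is a minimal illustration and not the actual dry_run.py: it assumes the specification carries a line of the form "Pre-approved actions: a, b." (the wording used in the Good specify example in this chapter), and the names parse_preapproved and dry_run are hypothetical.

```python
import re


def parse_preapproved(spec_text: str) -> list[str]:
    """Extract action names from a 'Pre-approved actions: a, b.' line.

    The line format is an assumption made for this sketch.
    """
    match = re.search(r"Pre-approved actions:\s*([^\n.]+)", spec_text)
    if not match:
        return []
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


def dry_run(spec_text: str, action: str) -> tuple[int, str]:
    """Return (exit_code, message) without touching any real system."""
    allowed = parse_preapproved(spec_text)
    if action in allowed:
        return 0, f"PASS: action={action} permitted ({len(allowed)} actions in spec)"
    return 1, f'BLOCK: action="{action}" not found among pre-approved'


# Hypothetical specification text, modeled on the Good specify example.
SPEC = (
    "Keep memory_percent < 80% for 5 minutes after action. "
    "Pre-approved actions: restart_pod, scale_up_replicas_one. "
    "Audit trace required."
)
```

Calling dry_run(SPEC, "restart_pod") returns exit code 0, while dry_run(SPEC, "delete_namespace") returns exit code 1; an action absent from the list is blocked by default rather than needing an explicit deny rule.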

If you have Qwen Code installed and need an explanation for review, perform a separate optional step:

qwen -p "Read @fixtures/readiness_block_audit.json and @specs/high_memory_usage/specify.md. What needs to be added to reach readiness 23/25 and audit_trace_coverage=1.0? Do not modify files." --approval-mode plan

This request is not part of the runnable minimum. Its output can be attached to review, but readiness clearance must rely on check_readiness.py and dry_run.py.

Control Fact

Steps 3, 6 — PASS. Steps 4, 5, 7 — BLOCK with specific reason in stderr. If step 5 passes with stateful=true, backup_verified=false, the readiness gate is broken: the hard block for stateful cannot be bypassed.
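
To make the "hard block cannot be bypassed" rule concrete, here is a minimal sketch of a readiness gate. It is not the actual check_readiness.py: the flat fixture layout (keys score, audit_trace_coverage, stateful, backup_verified) and the function name are assumptions for illustration. The point it shows is that the hard blocks are evaluated independently of the total.

```python
THRESHOLD = 23  # auto-clearance threshold on the 25-point scale


def check_readiness(report: dict) -> tuple[int, list[str]]:
    """Return (exit_code, reasons). Hard blocks apply regardless of the total.

    The report keys are assumed for this sketch, not taken from the fixtures.
    """
    reasons = []
    score = report["score"]
    if score < THRESHOLD:
        reasons.append(f"score {score}/25 below threshold {THRESHOLD}")
    # A missing coverage value is treated as zero coverage.
    coverage = report.get("audit_trace_coverage", 0.0)
    if coverage < 1.0:
        reasons.append(f"audit_trace_coverage={coverage} < 1.0")
    # Stateful workloads need a confirmed backup even when the score passes.
    if report.get("stateful", False) and not report.get("backup_verified", False):
        reasons.append("stateful workload without confirmed backup")
    return (1 if reasons else 0), reasons
```

With this shape, a report with score 24, full audit coverage, stateful=True, and backup_verified=False still returns exit code 1, which is the behavior step 5 is meant to confirm.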

How This Goes Into capstone/

Transfer to capstone/readiness.md the readiness result, two blocking conditions, and the dry_run.py result for the permitted action. In capstone/validation.md list the commands that were actually run. GitOps, Kubernetes API, and full executor are not part of the educational minimum unless they were implemented.

Read this fragment as follows: one positive fixture shows the permitted path, two blockers record specific rejection reasons, dry_run is the boundary case of permitted and blocked action. If any line is missing, the readiness package is incomplete.

readiness:
  pass_fixture: "readiness_pass.json -> 24/25"
  blockers:
    - "audit_trace_coverage=0.7 blocks auto mode"
    - "stateful=true without backup_verified blocks action"
  dry_run: "restart_pod PASS; delete_namespace BLOCK"

Reviewable Trace

Scripts write to stdout/stderr and do not create out/. Record the run as a readable artifact: a short capstone/readiness.md or a CI report if your project has one. Minimum content — the same four YAML lines from the block above (pass_fixture, two blockers, dry_run); full 25-point report is only needed in the full track.

Do not create a commit marker for the sake of the commit itself. For the textbook, a reproducible trace that can be read without chat history is what matters.
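
If you prefer to generate the trace rather than type it, the following sketch renders the minimal readiness fragment as text. The function name render_readiness is hypothetical, and the sketch assumes the values contain no double quotes, since it does no YAML escaping.

```python
def render_readiness(pass_fixture: str, blockers: list[str], dry_run: str) -> str:
    """Render the minimal readiness fragment as YAML-style text.

    No escaping is performed: values are assumed to contain no double quotes.
    """
    lines = ["readiness:", f'  pass_fixture: "{pass_fixture}"', "  blockers:"]
    lines += [f'    - "{blocker}"' for blocker in blockers]
    lines.append(f'  dry_run: "{dry_run}"')
    return "\n".join(lines) + "\n"
```

The returned string can be written to capstone/readiness.md as is; the inputs should come from runs you actually performed, not from the expected values in this chapter.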

Key Ideas

The starting point of traceability is audit_trace (live log of Qwen Code), in which the incoming webhook and specification diffs are recorded as a single causal chain. For incident HM-2026-05-17-01, the first record links incident_event.json, user command /sdd:specify, and created file specs/high_memory_usage/specify.md. If any element is missing, the pipeline has already lost provability. Minimal log fragment: webhook_received -> incident_event_normalized -> /sdd:specify -> spec_diff_created; each subsequent diff references the same incident_id. /sdd:specify is a project extension; implement it as a user command in .qwen/commands/sdd/specify.md or replace with direct qwen -p.
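
The causal chain described above can be checked mechanically. The sketch below assumes each audit_trace record is a dict with event and incident_id keys; that record shape and the name check_trace are hypothetical, since the chapter does not define the log format.

```python
EXPECTED_CHAIN = [
    "webhook_received",
    "incident_event_normalized",
    "/sdd:specify",
    "spec_diff_created",
]


def check_trace(records: list[dict], incident_id: str) -> list[str]:
    """Return a list of problems; an empty list means the chain is intact."""
    problems = []
    events = [record.get("event") for record in records]
    head = events[: len(EXPECTED_CHAIN)]
    if head != EXPECTED_CHAIN:
        problems.append(f"chain starts with {head}, expected {EXPECTED_CHAIN}")
    # Every record, including later diffs, must reference the same incident.
    for index, record in enumerate(records):
        if record.get("incident_id") != incident_id:
            problems.append(
                f"record {index} ({record.get('event')}) is not linked to {incident_id}"
            )
    return problems
```

A missing step and an unlinked record are reported separately, so a reviewer can tell whether the chain is incomplete or merely not tied to the incident.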

Normalize alerts from Grafana and PagerDuty into a single incident-event. Otherwise different sources will dictate different versions of the same incident. Grafana provides metrics and observation window, e.g. memory_percent=93 over 10m. PagerDuty adds priority, service binding, and escalation status. The normalizer reduces them to fields service, namespace, pod, severity, window_minutes, metric_context, source_refs. After this, the specify step describes only WHY and WHAT: why intervention is needed and what result counts as success. It does not choose a library, SDK, or specific API endpoint.
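
A minimal sketch of such a normalizer follows. It is not the actual normalize_webhook.py: the input payload keys (incident_key, memory_percent, window, and so on) are assumptions for illustration, and real Grafana and PagerDuty payloads are structured differently. Only the output field names come from the list above.

```python
def parse_window_minutes(window: str) -> int:
    """Convert a window such as '10m' or '1h' into minutes."""
    units = {"m": 1, "h": 60}
    return int(window[:-1]) * units[window[-1]]


def normalize(grafana: dict, pagerduty: dict) -> dict:
    """Merge two simplified (hypothetical) payloads into one incident event."""
    if grafana["incident_key"] != pagerduty["incident_key"]:
        raise ValueError("payloads describe different incidents")
    return {
        "incident_id": grafana["incident_key"],
        "service": pagerduty["service"],      # service binding comes from PagerDuty
        "namespace": grafana["namespace"],
        "pod": grafana["pod"],
        "severity": pagerduty["severity"],    # priority comes from PagerDuty
        "window_minutes": parse_window_minutes(grafana["window"]),
        "metric_context": {"memory_percent": grafana["memory_percent"]},
        "source_refs": [
            {"source": "grafana", "incident_key": grafana["incident_key"]},
            {"source": "pagerduty", "incident_key": pagerduty["incident_key"]},
        ],
    }
```

Refusing to merge payloads with different incident keys keeps two unrelated alerts from being silently folded into one event.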

What this means in practice. Let's compare two specify variants for the same incident:

Bad:

> Specify for high_memory_usage: Restart pod via kubectl delete pod ...

Problem: specify immediately chooses the implementation command and blocks Plan.

Good:

> Specify for high_memory_usage: Keep memory_percent < 80% for 5 minutes after action. Pre-approved actions: restart_pod, scale_up_replicas_one. Audit trace required.

SDD phase separation protects the pipeline from premature implementation. Each phase has its own responsibility:

  • Specify captures user story, success criteria, functional and non-functional constraints;
  • Plan chooses strategy;
  • Tasks turns it into executable steps;
  • Implement applies changes through a controlled mechanism.

This structure corresponds to the practical phase framework Specify → Plan → Tasks → Implement from GitHub Spec Kit (see also GitHub Spec Kit Quickstart). In production this matters because the model does not get the right to "immediately fix" an incident until cause, intervention boundaries, and result verification method are proven.

Do not expand the core of the chapter to the entire production orchestrator. In the first pass, only the chain webhook -> normalization -> readiness -> dry-run is checked here. Other mechanisms from previous chapters serve as control points:

  • Verifier from Parts 4 and 8 is needed if dry-run produces a disputed counterexample.
  • Tiered budgets from Part 9 are needed if frontier-reviewer starts serving not only high-risk branches.
  • Anti-Goodhart from Part 10 is needed if memory drops at the cost of 5xx, latency, or manual audit.

If these mechanisms are not yet assembled, do not try to model them inside Chapter 11. Record them as blockers or links to corresponding chapters, and complete the minimum run with a readiness verdict and dry-run result.

For high_memory_usage, start planning with minimal impact. The base /plan first chooses restarting a specific pod, prioritizing a small blast radius. It then checks whether scale-up is necessary. Only after that does it permit expanding the action, while maintaining a rollback path.

The tasks step breaks this down into operations: confirm that the workload is stateless, perform a dry-run without real changes, delete only the target pod, and observe RSS, CPU, and 5xx. If there is no improvement within the set window, activate rollback and create a human_review.

Validation completes the auto-remediation loop only after checking real metrics, security gate, and GitOps commit (this is part of the frontier scenario, see chapter header). In validation.md check four conditions:

  • memory stays below threshold in two consecutive windows;
  • 5xx does not increase;
  • latency does not degrade;
  • rollback is described and executable.
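
The four conditions above can be expressed as a small check over metric windows. This is a sketch under stated assumptions: the Window fields, the strict comparison against a pre-action baseline, and the rollback_documented flag (code cannot prove a rollback is executable) are all hypothetical choices, not the chapter's validation.md format.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """One monitoring window (field names are assumptions for this sketch)."""
    memory_percent: float
    error_5xx_rate: float
    latency_ms: float


def validate(baseline: Window, after: list[Window],
             memory_threshold: float, rollback_documented: bool) -> dict:
    """Evaluate the four validation conditions; every value must be True to pass."""
    last_two = after[-2:]
    return {
        # Requires two consecutive post-action windows below the threshold.
        "memory_below_threshold": len(last_two) == 2
        and all(w.memory_percent < memory_threshold for w in last_two),
        "no_5xx_increase": all(w.error_5xx_rate <= baseline.error_5xx_rate for w in after),
        "no_latency_degradation": all(w.latency_ms <= baseline.latency_ms for w in after),
        "rollback_ready": rollback_documented,
    }
```

Returning a dict of named conditions instead of a single boolean keeps the failed criterion visible in the report, which matters when the readiness score is recalculated.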

After successful verification, six basic artifacts go into GitOps: specification, plan, tasks, diff, decision log, and 25-point report. Constitution update is added when necessary. Without this, the incident may be technically mitigated but not considered managed-closed.

Full Track: 25-Point Readiness Model

On the first pass, it is sufficient to understand two facts: readiness_pass.json passes, and audit/stateful fixtures are blocked. The full rubric below is needed when you transfer this gate to a real production process and must explain why the threshold was chosen this way.

The model assesses five categories on a 0–5 scale and gives a total sum. Points are assigned by artifacts, not by impression. If a criterion cannot be confirmed by file, log, or schema, a lower score is given. Below are rubrics for each category.

The 23/25 threshold is a "strict but not paralyzing" compromise for the educational AgentClinic-production model. It tolerates two lost points in total: two "fixable" complaints at "4" in different categories (4+4+5+5+5 = 23), one "4" with the rest at "5" (24/25), or a single "3" with every other category at "5" (3+5+5+5+5 = 23). A "3" combined with any other reduction, or a "2" or below in any category, drops the total to 22 or less and removes auto-clearance. A total of 20–22/25 shifts the pipeline to semi-manual mode with human confirmation after each implement step. A higher threshold of 24/25 drops auto to semi-manual on the second minor complaint, and teams begin ignoring the model. Calibrate to the risk profile: for payments and healthcare, auto requires ≥24/25; internal tools may run at 21–22/25, but only as semi-manual or canary, not as production-ready auto-remediation.

Spec — Completeness of WHY/WHAT/constraints

| Points | Spec |
| --- | --- |
| 5 | WHY/WHAT/constraints explicit, acceptance criteria present, no out-of-scope in plan, Given/When/Then present |
| 4 | WHY/WHAT explicit, constraints present, but one plan item lacks implements: |
| 3 | WHY present, WHAT vague, constraints partial |
| 2 | One of three blocks (WHY/WHAT/constraints) missing |
| 1 | Only symptom description, no WHY, WHAT, or constraints |
| 0 | No specification |

Implementation — Idempotency and Controlled Changes

| Points | Implementation |
| --- | --- |
| 5 | All tasks idempotent, dry-run present, blast radius explicitly stated at pod/deployment level, changes go through GitOps |
| 4 | Idempotency and dry-run present, but one task changes state without prior check |
| 3 | Dry-run only for some steps, blast radius described in text without explicit field |
| 2 | No dry-run, changes applied directly to cluster bypassing GitOps |
| 1 | Tasks not idempotent, rerun breaks state |
| 0 | Actions performed manually, not recorded in tasks |

Verification — Given/When/Then, Schemas, Stress, Monitoring

| Points | Verification |
| --- | --- |
| 5 | Given/When/Then covers happy and negative path, JSON Schema validates inputs and outputs, stress specification and post-metrics in two windows |
| 4 | All elements present, but stress specification covers only one violation class |
| 3 | Given/When/Then and schema present, but monitoring checked in one window |
| 2 | Only Given/When/Then without schema and without post-metrics |
| 1 | Validation reduced to exit code check or single screenshot |
| 0 | validation.md missing or not running |

Process — Traceability "webhook → CLI → diff → replay"

| Points | Process |
| --- | --- |
| 5 | Every step (webhook, normalization, CLI command, diff, commit, validate) linked via incident_id, log reproducible, replay gives same diff |
| 4 | Traceability complete, but replay requires manual substitution of one variable |
| 3 | Webhook and CLI linked, but diff not tied to incident_id |
| 2 | Log exists, but step order recoverable only by timestamp |
| 1 | Actions recorded in chat, not in files |
| 0 | No traceability, incident source unknown |

Security — Guardrails, Emergency Stop, Rollback, Escalation

| Points | Security |
| --- | --- |
| 5 | Guardrails prevent blast radius expansion, emergency stop present, rollback condition written before execution, escalation to manual confirmation on uncertainty |
| 4 | All elements present, but escalation described only in text without formal trigger |
| 3 | Rollback and guardrails present, emergency stop missing |
| 2 | Only rollback, without guardrails and escalation |
| 1 | Declared "manual rollback" but no executable path described |
| 0 | Security gate undefined, actions proceed without restrictions |

How It Is Calculated and What Blocks Merge

The sum of points gives a total from 0 to 25. Passing threshold for auto-clearance is 23/25: below this boundary the pipeline does not get production-ready status, even if three categories are at maximum. Zero score in Security is forbidden at any total. 0 in this column means absence of protective contour and blocks even semi-manual mode until minimum rollback, guardrails, and escalation appear.

Blocking conditions do not depend on total. Each of these cases blocks merge separately:

  • failed validation (Verification ≤ 2);
  • missing rollback (Security ≤ 2);
  • undefined blast radius (Implementation ≤ 2 without explicit field).

At a total of 20–22, the pipeline is permitted only in semi-manual mode, and only if none of the blocking conditions above apply. Semi-manual mode means a stop after each implement step, explicit human confirmation, a mandatory specification update, and re-assessment before returning to the auto loop.
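
The scoring rules in this section can be summarized in a short function. This is a sketch, not part of the chapter's scripts: the name verdict and the category keys are hypothetical, and treating totals below 20 as "not-cleared" is an assumption, since the text defines only the 20–22 and 23+ bands.

```python
CATEGORIES = ("spec", "implementation", "verification", "process", "security")
AUTO_THRESHOLD = 23
SEMI_MANUAL_FLOOR = 20


def verdict(scores: dict) -> tuple[str, list[str]]:
    """Return (mode, blockers) for per-category scores on a 0-5 scale."""
    blockers = []
    # Blocking conditions are checked first and do not depend on the total.
    if scores["verification"] <= 2:
        blockers.append("failed validation (Verification <= 2)")
    if scores["security"] <= 2:
        blockers.append("missing rollback (Security <= 2)")
    if scores["implementation"] <= 2:
        blockers.append("undefined blast radius (Implementation <= 2)")
    if blockers:
        return "blocked", blockers
    total = sum(scores[name] for name in CATEGORIES)
    if total >= AUTO_THRESHOLD:
        return "auto", []
    if total >= SEMI_MANUAL_FLOOR:
        return "semi-manual", []
    # Assumption: the chapter does not name a mode below 20.
    return "not-cleared", [f"total {total}/25 below {SEMI_MANUAL_FLOOR}"]
```

For example, scores of 5, 4, 5, 5, 4 total 23 and yield "auto", while four categories at 5 with Security at 2 total 22 but yield "blocked" rather than "semi-manual", because the blocking condition takes precedence over the band.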

Checklist Before Production Cutover

Use this checklist when transferring the gate to a real process. Each item is tied to the rubric category where a hidden score drop is possible:

  • [ ] Spec contains WHY/WHAT/constraints and is tied to incident_id; each task has implements: pointing to REQ-identifier.
  • [ ] Dry-run is logged before real changes; blast radius fixed at pod or deployment level, not in words.
  • [ ] JSON Schema validates incident_event and final validation_report; Given/When/Then cover happy and negative path.
  • [ ] Rollback condition written before execution and tested on staging; emergency stop available to operator without cluster access.
  • [ ] Trace webhook → CLI → diff → commit → validate reproducible by incident_id; manual confirmation triggered automatically on repeated failure or blast radius expansion.

Example Filled Rubric for high_memory_usage

| Category | Score | Justification |
| --- | --- | --- |
| Spec | 5 | WHY (prevent OOMKill), WHAT (RSS below 80% for 10 minutes), constraints (don't touch stateful, rollback after 6 minutes) explicit, Given/When/Then assembled |
| Implementation | 4 | Tasks idempotent, dry-run present, but scale-up branch lacks separate dry-run step |
| Verification | 5 | Given/When/Then, JSON Schema on incident_event and validation_report, stress specification on hidden leak, post-metrics in two windows |
| Process | 5 | incident_id=HM-2026-05-17-01 links webhook, /sdd:specify, diff, commit, and replay |
| Security | 4 | Guardrails on stateful workloads, rollback and emergency stop present, escalation described in text without formal trigger |
| Total | 23/25 | Production-ready by threshold, but scale-up branch remains semi-manual until separate dry-run |

Full Track: Threshold Calibration

The "Low / Default / High" table for the readiness threshold, an exercise with a THRESHOLD override, and signals for review are in Appendix D, Section D.5. On the first pass, the chapter minimum is already proven if readiness_pass.json passes, audit/stateful fixtures are blocked, and delete_namespace does not get into the pre-approved action list.

Examples and Application

Practical input log for Qwen Code may start like this: POST /hooks/grafana reports memory_percent=93, pod=api-7b4, namespace=appointments-api, window=10m. Then POST /hooks/pagerduty confirms severity=critical and links the event to service appointments-api. The normalizer creates incident_event with incident_id=HM-2026-05-17-01, removes sensitive fields, attaches source references, and triggers user command /sdd:specify --event incident_event.json --preset high_memory_usage or equivalent qwen -p prompt — both variants belong to the recommended framework from the chapter header and are implemented as project commands around Qwen Code.

The first diff in specify.md captures three blocks: WHAT (reduce RSS below 80% for 10 minutes), WHY (prevent OOMKill and latency growth), constraints (don't touch stateful workloads, don't change HPA, have rollback after 6 minutes without improvement). On /plan the system compares two strategies: (A) restart target pod and observe; (B) restart plus temporary scale-up to four replicas. The verifier runs Given/When/Then: given pod is stateless and memory above 90% for 10 minutes; when only target pod is restarted; then memory must drop below 80%, and 5xx must not exceed acceptable threshold. If stress specification shows that scale-up requires changing rollout policy or hides memory leak by growing replica count, variant B remains a backup branch with manual confirmation, not automatic action.

At implement step, dry-run is performed first. Then commit goes through GitOps and syncs to ArgoCD only on green validator status. The executor does not close the PagerDuty incident immediately after restart. It waits two monitoring windows, checks validation.md, verifies the security gate, and adds spec, tasks, commit, and validation result links to the comment. If after 6 minutes memory does not decrease or 5xx grows, rollback path activates, human_review is created, and readiness score is recalculated with the failed verification criterion.
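
The rollback rule in the last sentence can be sketched as a pure decision function. Everything here is illustrative: the function name, its parameters, and the returned action labels are hypothetical, and a real executor would read these values from monitoring rather than take them as arguments.

```python
def post_restart_decision(minutes_elapsed: float,
                          memory_before: float, memory_after: float,
                          rate_5xx_before: float, rate_5xx_after: float,
                          rollback_after_minutes: float = 6.0) -> tuple[str, ...]:
    """Decide what the executor does after a restart (action labels are assumed)."""
    if minutes_elapsed < rollback_after_minutes:
        return ("wait",)
    memory_not_decreased = memory_after >= memory_before
    errors_grew = rate_5xx_after > rate_5xx_before
    if memory_not_decreased or errors_grew:
        # Rollback also opens a human review and forces a new readiness score.
        return ("rollback", "human_review", "recalculate_readiness")
    return ("continue_monitoring",)
```

Note that a successful check returns "continue_monitoring" rather than closing the incident: closure still waits for two monitoring windows and the validation.md checks.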

Summary

Production pipeline readiness is fixed by the 25-point model: five categories (Spec, Implementation, Verification, Process, Security), weighted equally at 5 points each, mirror the SDD cycle stages. Equal weight is a principle: no category compensates for a gap in another, so threshold 23 permits no more than two lost points in total. Production-ready means a total of at least 23/25 with no critical validation or security gate violations. Dropping below the threshold shifts auto-remediation to semi-manual mode until the specification, policy, or execution path is fixed. Fully automated remediation remains the frontier scenario from the chapter header: permit it only after accumulating replay evidence and operator dry-runs. Such a loop turns every future incident into a verifiable system improvement.

Artifacts and Readiness Criteria

| Artifact | Ready when |
| --- | --- |
| Normalized incident_event | matches examples/real-api/fixtures/incident_event.expected.json field-for-field; Specify captures WHY/WHAT/constraints and does not choose remediation command |
| Local readiness gate run | readiness_pass.json passes; audit/stateful fixtures blocked with specific reason |
| dry_run.py on permitted and forbidden action | restart_pod PASS, delete_namespace BLOCK |
| Record in capstone/readiness.md | score, blocking conditions, one actually run command |

Full track adds specs/high_memory_usage/specify.md, plan.md, tasks.md, and validation.md, GitOps diff or commit linked to incident_id, decision log webhook → CLI → diff → commit → validate, and filled 25-point readiness table with evidence. Consider it ready if plan and tasks have blast radius, dry-run, rollback condition, and manual confirmation trigger; validation checks two metric windows, 5xx, latency, and security gate; user commands are either implemented as project commands or replaced with qwen -p prompts or project scripts; readiness total not below 23/25 without blocking conditions on rollback, verification, or blast radius.

Practice

  1. cd book2/examples/real-api && python3 scripts/normalize_webhook.py --grafana fixtures/webhook_grafana.json --pagerduty fixtures/webhook_pagerduty.json --expected fixtures/incident_event.expected.json — *expected: code 0, normalized incident_event matches reference field-for-field.*
  2. Run four checks separately (each returns its own code, so && between them is not suitable):
   python3 scripts/check_readiness.py --readiness fixtures/readiness_pass.json
   python3 scripts/check_readiness.py --readiness fixtures/readiness_block_audit.json
   python3 scripts/dry_run.py --spec specs/high_memory_usage/specify.md --action restart_pod
   python3 scripts/dry_run.py --spec specs/high_memory_usage/specify.md --action delete_namespace

*Expected: readiness_pass → code 0, PASS incident=HM-… score=24/25; readiness_block_audit → code 1, BLOCK … score=22/25 with reasons "score 22/25 below threshold 23" and "audit_trace_coverage=0.7 < 1.0 — full coverage mandatory"; restart_pod PASS, delete_namespace BLOCK.*

  3. Assess your case by the 25-point model and fill the table below. For each category indicate score, evidence artifact, and reason for reduction if score is below 5. Calculate total, check blocking conditions, and formulate what needs to change for the pipeline to pass threshold 23/25. *Expected: all three fields filled in each table row; "Evidence artifact" points to specific file or run, not general statement; total cell contains number in format N/25 and list of blocking conditions, or explicit "no blockers".*

| Category | Score (0–5) | Evidence artifact | Reason for reduction |
| --- | --- | --- | --- |
| Spec | | | |
| Implementation | | | |
| Verification | | | |
| Process | | | |
| Security | | | |
| Total | | Blocking conditions: | What to change before cutover: |

Control Questions

  1. Why should specify not choose a specific remediation command?
  2. What conditions make auto-remediation unacceptable?
  3. What blocks clearance if readiness is below 23/25?
  4. A webhook about high_memory_usage arrives off-hours, automatic remediation is ready to restart pod. The readiness model gives 22/25 (minus 3 for incomplete audit). What will you do — restart, wait until morning, or call the on-call?