Study guide: Applied Part 0. AgentClinic-production Laboratory

Lesson 3 of 5 in module "Applied Part 0. AgentClinic-production Laboratory"

Topic: Applied Part 0. AgentClinic-production Laboratory

Difficulty level: Medium

Estimated study time: 4-6 hours (theory + first-pass practice)

Prerequisites: Completed first volume of AgentClinic (routes, SQLite, feature specs, checks and reviews)

Basic proficiency in TypeScript and Python

Understanding of CI/CD and production operations concepts at reading level

Ability to work with command line and Git

Familiarity with alerting, metrics, and incident management concepts

Learning objectives: Explain the difference between the educational production model and real infrastructure, and interpret Kubernetes, Grafana, and PagerDuty as scenario roles rather than required infrastructure

Classify three types of command blocks ([runnable], [project script], [conceptual interface]) and apply the rule of running only [runnable] commands when forming a graded package

Build the end-to-end graded case high_memory_usage in appointments-api: select it as the primary incident and pick principles from the additional cases to transfer into capstone/

Create the capstone/ structure and fill in the first artifacts (genealogy.md, specification pair, constitution.md) via the minimal route of chapters 1–3

Execute the smoke run bash book2/examples/smoke_all.sh and verify that the educational simulators work before proceeding to chapters 4–11

Overview: Applied Part 0 is the methodological introductory lab of the second volume of AgentClinic. It introduces no new technique; it establishes a reading map for turning a set of advanced chapters into a unified educational production contour around the familiar AgentClinic project. Key insight: the second volume keeps the same product stack (TypeScript, Hono, SQLite) and adds a second layer of small Python scripts in book2/examples/ that act as educational simulators of production mechanisms. These scripts do not replace the product stack; they let you run a minimal example in seconds without spinning up infrastructure. Part 0 introduces the rule "one project, one primary contour, one growing evidence package" and defines the end-to-end case high_memory_usage, the capstone/ structure, and the minimal route for limited time. Successful completion of this part is measured by a control fact: the primary incident case is selected, an empty capstone/ is created, and the runnable examples are verified or deferred with an explicit reason.

Key concepts: AgentClinic-production: Educational production model around the familiar AgentClinic project. It does not require real Kubernetes, Grafana, PagerDuty, or GitOps; these terms denote roles in scenarios: where the signal came from, which action is dangerous, where rollback is needed, and which artifact proves the solution. The model allows studying production practices without infrastructure costs.

high_memory_usage: Primary graded case for end-to-end completion. It is an incident in the appointments-api service that is convenient to bring to a full evidence package: webhook normalization, readiness gateway, trial run, final package. Other cases (autoscale_200pct, cdn_error_budget_burn, node_not_ready, etc.) serve as laboratory windows for individual mechanisms. Their principles are transferred to the primary case; the cases themselves are not mixed in as equals.

capstone/: Final evidence package for one incident. The structure repeats the logic of the first volume: intent and boundaries → plan and facts → review and outcome. Each chapter opens a specific file. Key rule: one incident per package, and the solution must be understandable without chat history.

Three types of command blocks: [runnable] is run as written, with examples in book2/examples/; [project script] is a contract for a future script in your project and is not required to exist in the repository; [conceptual interface] is the form of a future integration and is not run during educational completion. The graded package references only executed facts or manual artifacts readable without chat context.
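The rule above can be sketched as a small classifier. This is only an illustration: the three tags come from the text, but the "[tag] command" line format, the CommandBlock type, and the function names are assumptions made for this sketch, not part of the book's tooling.

```python
import re
from dataclasses import dataclass

# The three tags are taken from the text; the "[tag] command" layout is assumed.
TAG_PATTERN = re.compile(
    r"^\[(runnable|project script|conceptual interface)\]\s*(.+)$"
)


@dataclass(frozen=True)
class CommandBlock:
    kind: str     # one of the three tags
    command: str  # text that follows the tag


def parse_block(line: str) -> CommandBlock:
    """Parse a line such as '[runnable] bash book2/examples/smoke_all.sh'."""
    match = TAG_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"no recognised tag in: {line!r}")
    return CommandBlock(kind=match.group(1), command=match.group(2))


def may_run_now(block: CommandBlock) -> bool:
    """Only [runnable] blocks are executed during the educational pass."""
    return block.kind == "runnable"


if __name__ == "__main__":
    lines = [
        "[runnable] bash book2/examples/smoke_all.sh",
        "[project script] deploy-gate --readiness capstone/readiness.md --dry-run",
        "[conceptual interface] Integration with PagerDuty Oncall API",
    ]
    for line in lines:
        block = parse_block(line)
        print(f"{block.kind:22} run now: {may_run_now(block)}")
```

The sketch only decides what may be executed. Whether a result may be cited in the graded package still depends on the command actually having been run, or on a manual artifact existing.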

Educational simulators (Python stdlib): Second layer of code in the second volume: small scripts that use only the Python standard library. They are not a stack change for the product, but the cheapest way to run an example without building anything: stress-mutator, duel, Spec CI, token budget, readiness calculator. In a real project these checks would become a pre-commit hook, a GitHub Actions job, an MCP tool, or a service on the product stack.

Transfer of principles vs. transfer of cases: What moves from a chapter's local case to high_memory_usage is not the case itself but a verifiable principle. For example: from autoscale_200pct, the guard rule "do not expand blast radius beyond quota"; from cdn_error_budget_burn, the anti-Goodhart invariant "MTTR cannot be improved at the cost of silent P0"; from node_not_ready, requirement provenance and the rule "do not close without recovery evidence".
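One way to keep a transfer honest is to store each principle as a single record and render it as one line. The sketch below is illustrative only: the three principles are quoted from the text, while the Transfer type, the "Invariant:" prefix, and the choice of target file for each principle are assumptions.

```python
from typing import List, NamedTuple


class Transfer(NamedTuple):
    source_case: str  # local case that stays in its own chapter
    principle: str    # one-line, verifiable invariant
    target_file: str  # capstone/ file; the choice of file is an assumption


# Principles are quoted from the text; the target files are illustrative guesses.
TRANSFERS: List[Transfer] = [
    Transfer("autoscale_200pct",
             "do not expand blast radius beyond quota",
             "validation.md"),
    Transfer("cdn_error_budget_burn",
             "MTTR cannot be improved at the cost of silent P0",
             "goodhart-note.md"),
    Transfer("node_not_ready",
             "do not close without recovery evidence",
             "genealogy.md"),
]


def as_capstone_line(transfer: Transfer) -> str:
    """Render one principle as a single line for a capstone/ file."""
    return f"Invariant: {transfer.principle} (from {transfer.source_case})"


if __name__ == "__main__":
    for t in TRANSFERS:
        print(f"capstone/{t.target_file}: {as_capstone_line(t)}")
```

Because each record renders to exactly one line, the local case cannot grow into a second story inside the package for high_memory_usage.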

genealogy.md: Chapter 1 artifact. Requirement origin and confidence level. Minimal output: one requirement with two sources.

poisoned-spec.md / fixed-spec.md: Chapter 2 artifact. Pair of specifications "with defect / fixed", demonstrating one error class.

constitution.md: Chapter 3 artifact. Two immutable rules and one time-bound rule.

validation.md: Chapters 4 and 7 artifact. Commands, manual facts, and blockers. Contains happy path, negative path, and counterexample.

judgment.md: Chapter 8 artifact. Verdict on a disputed change with reference to evidence.

budget-note.md: Chapter 9 artifact. Risk of model budget depletion and threshold for switching to cheaper tier.

goodhart-note.md: Chapter 10 artifact. Target metric and paired guard metric against Goodhart distortion.

readiness.md: Chapter 11 artifact. Production readiness verdict and dry-run of permitted action.

antipattern-audit.md: Chapter 12 artifact. Three items of blocker / owner / next_check.
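The artifact list above maps directly to a checklist. The following is a minimal sketch, assuming all artifacts sit directly inside a capstone/ directory; the file names and chapter numbers come from the list, while the missing_artifacts helper is hypothetical and not a script from book2/examples/.

```python
from pathlib import Path
from typing import List

# Artifact -> chapter mapping as listed in the text.
ARTIFACTS = {
    "genealogy.md": "chapter 1",
    "poisoned-spec.md": "chapter 2",
    "fixed-spec.md": "chapter 2",
    "constitution.md": "chapter 3",
    "validation.md": "chapters 4 and 7",
    "judgment.md": "chapter 8",
    "budget-note.md": "chapter 9",
    "goodhart-note.md": "chapter 10",
    "readiness.md": "chapter 11",
    "antipattern-audit.md": "chapter 12",
}


def missing_artifacts(capstone_dir: Path) -> List[str]:
    """Return artifact names that do not yet exist in capstone_dir."""
    return [name for name in ARTIFACTS if not (capstone_dir / name).is_file()]


if __name__ == "__main__":
    capstone = Path("capstone")  # assumed location, relative to the repo root
    for name in missing_artifacts(capstone):
        print(f"missing: {name} ({ARTIFACTS[name]})")
```

The check is read-only and only tests existence. Whether a file meets its minimal output still needs a human review.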

Minimal route: Shortened path for limited time: part 0 + README of runnable example → chapters 1–3 (three manual artifacts) → chapters 4–11 (only [runnable] commands, principles from other cases transferred as one-line invariants) → chapter 12 (diagnostic checklist) → chapter 13 (assembly of capstone/ with five PASS lines from the rubric). It does not require external orchestrators, MCP servers, or Kubernetes integrations.

Full track: Extended completion with real integrations. Files like scorebook, metric_network, decision_hash, precedents.md, and CI reports are added only if actually created or if there is a runnable analog confirming the principle.

Practice exercises: Name: Classification of command blocks

Problem: You are given three fragments from different chapters of the second volume:

Fragment A: python3 book2/examples/stress-mutator/run.py --spec capstone/fixed-spec.md

Fragment B: [project script] deploy-gate --readiness capstone/readiness.md --dry-run

Fragment C: [conceptual interface] Integration with PagerDuty Oncall API for automatic engineer assignment

Determine the type of each fragment. Indicate which one can be run now, which one is to be implemented in your project later, and which one is not run during educational completion. Explain whether a graded package can reference each of them.

Solution:

  1. Fragment A is [runnable]: it is a command from book2/examples/. It can be run now if the file capstone/fixed-spec.md exists. The graded package can reference the run result as a verifiable fact.
  2. Fragment B is a [project script]. This is a contract for a future script in your project. It is not required to exist in the textbook repository. The graded package can reference it only if you implemented an analog and ran it, or if you described a manual artifact readable without chat context.
  3. Fragment C is a [conceptual interface]. It is the form of a future integration and is not run during educational completion. It does not enter the graded package as a fact on first pass; if the principle is important, it is recorded as a one-line invariant in the corresponding capstone/ file.

Complexity: beginner

Name: Forming a transfer map for autoscale_200pct

Problem: You are completing a chapter with the local case autoscale_200pct. According to Part 0 rules, you must not mix this case with the primary high_memory_usage in one capstone/. After reading the chapter, you see: autoscaling doubled resources on alert, but this expanded the blast radius (affected neighboring services), exceeding the quota. Which principle should be transferred to the primary case high_memory_usage and in which capstone/ file should it be recorded?

Solution:

  1. The local case autoscale_200pct remains in the chapter as a laboratory window.
  2. Verifiable principle for transfer: the guard rule "do not expand blast radius beyond quota".
  3. This rule relates to a counterexample or limiter, so it is recorded in validation.md (chapters 4 and 7) as a negative path, or in constitution.md (chapter 3) as a time-bound rule if it is regularly checked.
  4. Formulation in capstone/: one specific item, for example: "Guard: on high_memory_usage alert, scaling is limited to quota X; counterexample from autoscale_200pct shows that quota excess leads to cascading failure".
  5. A reference to the local runnable example is acceptable, but the solution in README.md explains high_memory_usage.
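The guard rule from this solution can be expressed as a tiny check. This is a sketch under stated assumptions: the function names, the replica counts, and the quota value are all hypothetical, and the doubling function only models the behavior described in the problem statement.

```python
def scaling_allowed(requested_replicas: int, quota: int) -> bool:
    """Guard: a scaling request must not exceed the quota."""
    if requested_replicas < 0 or quota < 0:
        raise ValueError("replica counts and quota must be non-negative")
    return requested_replicas <= quota


def autoscale_200pct(current_replicas: int) -> int:
    """Hypothetical model of the local case: double the replica count."""
    return current_replicas * 2


if __name__ == "__main__":
    quota = 10    # assumed value for "quota X"
    current = 6   # assumed current replica count
    requested = autoscale_200pct(current)
    verdict = "ALLOW" if scaling_allowed(requested, quota) else "BLOCK"
    print(f"requested={requested} quota={quota} -> {verdict}")
```

With these assumed numbers the doubling requests 12 replicas against a quota of 10, so the guard blocks it. That is the negative path that would be recorded in validation.md.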

Complexity: intermediate

Name: Smoke run and failure diagnosis

Problem: You execute bash book2/examples/smoke_all.sh before chapter 5. The script fails on the stress-mutator block with error ModuleNotFoundError: No module named 'json'. Using knowledge from Part 0, determine the probable cause and actions. Can you continue the educational route without fixing it?

Solution:

  1. json is part of the Python stdlib, so ModuleNotFoundError for it is impossible in a normal installation. Probable cause: the script ran in a broken or nonstandard environment (custom Python build, container image without the stdlib, or an overridden PYTHONPATH or PYTHONHOME).
  2. Check: python3 -c "import json; print(json.__file__)". If it fails, the environment is broken.
  3. According to Part 0, all book2/examples/ scripts use only the Python stdlib and must run without extra dependencies. If json is unavailable, this is an environment contract violation, not an issue in the educational material.
  4. Actions: check which interpreter is used (which python3), switch to a system Python 3.8+, and remove the virtual environment with the custom configuration.
  5. Can you continue without fixing it? Not silently: the smoke run is a control fact before chapters 4–11. You may, however, take the minimal option: open examples/README.md, run only the current chapter's block manually, and defer the rest with an explicit reason such as "nonstandard environment, manually verified for chapter 5".
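The diagnosis steps above can be collected into one stdlib-only probe. This is a minimal sketch: json comes from the exercise, while the other probed modules are an assumed sample of what the simulators might import, and the probe function is hypothetical rather than part of smoke_all.sh.

```python
import importlib
import sys
from typing import Dict, Iterable

# json comes from the exercise; the other names are an assumed sample of stdlib.
STDLIB_PROBES = ("json", "argparse", "pathlib", "subprocess")


def probe(modules: Iterable[str] = STDLIB_PROBES) -> Dict[str, bool]:
    """Try to import each module and report whether it succeeded."""
    results = {}
    for name in modules:
        try:
            importlib.import_module(name)
            results[name] = True
        except ImportError:  # ModuleNotFoundError is a subclass of ImportError
            results[name] = False
    return results


if __name__ == "__main__":
    print("executable:", sys.executable)
    print("version:", sys.version.split()[0])
    for name, ok in probe().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

The printed interpreter path and version can be pasted into the deferral reason, so the blocker stays readable without chat context.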

Complexity: intermediate

Name: Designing capstone/ structure after chapter 3

Problem: You have completed chapters 1–3 via the minimal route. You have: genealogy.md with requirement "readiness gateway must check memory_before_allow" (sources: incident high_memory_usage and team SRE guide), pair poisoned-spec.md/fixed-spec.md showing defect "alert does not distinguish memory leak and legitimate spike", and constitution.md with two immutable rules ("no actions without dry-run" and "all P0 require audit") and one time-bound rule "readiness check valid 24 hours". Check whether this matches the minimal output of Part 0. What is missing to proceed to chapter 4?

Solution:

  1. Check against the minimal output table:

  • After chapter 1, one requirement with two sources in genealogy.md: ✅ (requirement + incident + SRE guide).
  • After chapter 2, a specification pair with a visible error class: ✅ (poisoned/fixed show the alert classification defect).
  • After chapter 3, constitution.md with two immutable rules and one time-bound rule: ✅.
  2. Chapter 4 asks for a minimal counterexample to one rule or a formulation of the next limiter. That is an output of chapter 4, not a precondition for it, so the package is ready for chapter 4.
  3. It is still useful to pre-select which rule from constitution.md will get a counterexample in chapter 4. Recommendation: the rule "readiness check valid 24 hours", since a counterexample is easy to build with an alert arriving 25 hours after the check.
  4. Check the capstone/ structure: files do not mix incidents, and each is readable without chat history: ✅.
  5. The only missing item is the smoke_all.sh run or its conscious deferral with a reason.
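The recommended counterexample for the time-bound rule can be written down as a small check. This is a sketch, not the book's readiness calculator: the 24-hour window and the 25-hour counterexample come from the exercise, while the function name and the timestamps are assumptions.

```python
from datetime import datetime, timedelta

# Time-bound rule from the exercise: a readiness check is valid for 24 hours.
READINESS_TTL = timedelta(hours=24)


def readiness_is_valid(checked_at: datetime, alert_at: datetime,
                       ttl: timedelta = READINESS_TTL) -> bool:
    """Return True if the alert falls inside the validity window of the check."""
    age = alert_at - checked_at
    return timedelta(0) <= age <= ttl


if __name__ == "__main__":
    checked_at = datetime(2024, 1, 1, 9, 0)  # arbitrary example timestamp
    happy_path = checked_at + timedelta(hours=3)
    counterexample = checked_at + timedelta(hours=25)
    print("alert after 3h :", readiness_is_valid(checked_at, happy_path))
    print("alert after 25h:", readiness_is_valid(checked_at, counterexample))
```

The sketch treats an alert that predates the check as invalid too. That is a design choice made here, not something the text specifies.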

Complexity: intermediate

Name: Distinguishing minimal route from full track

Problem: In chapter 7 you encounter the terms scorebook, Spec CI pipeline, and decision_hash. According to Part 0, how do you determine whether these terms are needed for the graded package? Your current minimal output is a Spec CI line: what is covered, what is blocked. Formulate what to record in capstone/ if scorebook and decision_hash were not created by you but are conceptually described in the chapter.

Solution:

  1. Part 0 rule: if a term does not affect the current chapter output, do not stop on it during first reading.
  2. Minimal output of chapter 7: a Spec CI line stating what is covered and what is blocked. This goes into validation.md.
  3. scorebook and decision_hash are elements of the full track. They are needed only if you actually created them or can explain a runnable analog confirming the same principle.
  4. Action: record in validation.md, in one line, the principle that scorebook and decision_hash protect, without mentioning the terms themselves. For example: "Coverage: specification passes 5/5 Spec CI checks. Blocker: if mutator finds unhandled payload, merge is forbidden. Verifiable fact: running python3 examples/spec-ci/run.py on fixed-spec.md yields PASS".
  5. If you later implement scorebook in the full track, add the file then. It is redundant on the first pass.

Complexity: advanced

Case studies: Name: Migrating the educational model to real SRE practice: from AgentClinic-production to production-ready pipeline

Scenario: A team of 8 completed the first and second volumes of AgentClinic via the minimal route. Their capstone/ contained a complete package for high_memory_usage: genealogy.md with two sources, specification pair with alert classification defect, constitution.md with guard rules, validation.md with stress-mutator and Spec CI results, readiness.md with dry-run verdict. The team was tasked to implement a similar process for a real doctor appointment service in a clinical network.

Challenge: Three key problems arose in translating the educational model: (1) the real service used Go and PostgreSQL, not TypeScript/SQLite, which created a temptation to discard the educational stack as irrelevant; (2) the infrastructure required real PagerDuty, DataDog, and GitHub Actions, not Python simulators; (3) management demanded the "full track" immediately, including scorebook and metric_network, increasing the timeline from 2 weeks to 3 months and paralyzing the project.

Solution: The team applied Part 0 principles: (1) it separated the product stack (Go/PostgreSQL) from the check layer: the Python simulators were replaced with GitHub Actions and pre-commit hooks in Go, but the artifact structure (genealogy.md, constitution.md, etc.) remained; (2) it treated PagerDuty and DataDog as scenario roles and implemented integration gradually, first a Slack webhook simulator, then real PagerDuty; (3) it insisted on the minimal route: the first 2 weeks delivered a working pipeline with 5 files in capstone/, and the full track was added iteratively on the principle "one new verifiable output per sprint".

Result: Minimal pipeline worked in 10 days. First real incident (memory pressure on appointment service) was handled via capstone/ in 45 minutes instead of previous 4 hours. Guard rule "do not expand blast radius" prevented cascading shutdown of payment gateway. Full track (scorebook, decision_hash) was implemented over 6 months iteratively, without paralysis. Educational Python scripts remained as prototypes for local debugging of new guard rules.

Lessons learned: Educational model is valuable not for its stack, but for artifact structure and principles of separating checks from product

Minimal route is not simplification for the weak, but a strategy of fast validation before scaling

Scenario roles (PagerDuty, Kubernetes) allow designing process before infrastructure readiness

Full track without working minimal route — risk of architectural paralysis

Educational Python scripts have long life as guard rule prototyping tools

Related concepts: AgentClinic-production

Minimal route

Three types of command blocks

Transfer of principles vs. transfer of cases

capstone/

Name: Case mixing error: when autoscale_200pct consumed high_memory_usage

Scenario: An engineer completing the second volume decided to "do everything right" and include several incidents in capstone/ as equal cases. README.md described high_memory_usage, but validation.md and readiness.md contained solutions for autoscale_200pct, and budget-note.md covered cdn_error_budget_burn.

Challenge: On review by another engineer the package was unreadable: guard rules from autoscale_200pct contradicted the readiness gateway for high_memory_usage, judgment.md referenced a disputed change without context, and README.md did not explain why the package contained three incidents. Reviewer spent 2 hours figuring out context and rejected the package.

Solution: Rework per Part 0 rules: one primary case high_memory_usage selected; principles from other cases extracted as one-line invariants in corresponding files. For example, instead of a section about autoscale_200pct — a line in validation.md: "Guard rule: scaling limited by quota (counterexample from autoscale_200pct)".

Result: Reworked package passed review in 15 minutes. Engineer mastered the key Part 0 skill: separating principle from case. In subsequent projects they applied this to consolidate post-mortems from dozens of incidents into a unified guard rule set.

Lessons learned: Mixing cases in one capstone/ creates unreadability and contradictions

Principle "one incident — one package" does not limit learning, but focuses it

Review by "another person" is a quality criterion, not a formality; package must be understandable without chat history

Counterexample from another case strengthens the primary one if recorded as an invariant, not as a separate story

Related concepts: high_memory_usage

Transfer of principles vs. transfer of cases

capstone/

validation.md

Study tips: Read Part 0 in full before touching code. This is a methodological chapter without steps; its purpose is a map, not execution. Attempting to "just run smoke_all.sh" without understanding the graded case will lead to accumulation of irrelevant artifacts.

Run mkdir -p capstone immediately after reading. The empty directory is an anchor for decisions: "does this file go here, or is it full track?"

Print or keep open the table "After chapter → Minimal output". Use it as a checklist: if after a chapter the file does not match the description, proceed to the next chapter only after revision.

For visual learners: draw arrows on paper between capstone/ files: genealogy.md → poisoned-spec.md/fixed-spec.md → constitution.md → ... This helps you see that each file is not an isolated document, but a link in an evidence chain.

For auditory learners: read control questions aloud and record voice answers before checking. If the answer takes more than 30 seconds — understanding is likely not deep enough.

For kinesthetic learners: execute the smoke run yourself rather than only reading the script. Then open smoke_all.sh in an editor and trace which block belongs to which chapter. The manual mapping "code → chapter → output" reinforces the structure.

When working with additional cases (autoscale_200pct, cdn_error_budget_burn, etc.) use the "one principle — one line" technique. Open the target file for transfer and record the principle before closing the chapter tab. Otherwise context will be lost.

Conduct a self-review: close all materials and try to explain "why cases cannot be mixed" in your own words to an 8-year-old or a colleague from another team. If you need to mention Kubernetes or PagerDuty, you are still in the "real infrastructure" trap; step back to "scenario roles".

For chapters 4–11: run runnable examples with --help flag or study the argparse section before execution. Understanding simulator parameters is more important than quick result — you must be able to explain what exactly the script checks, not only that it output PASS.

Final test before chapter 1: name the primary case, show empty capstone/, demonstrate smoke_all.sh result or its conscious deferral. If any item is impossible — return to Part 0.

Additional resources: examples/templates/capstone-dossier.md: Minimal form of answer to "what trace should remain after first pass?". Not a template for mindless copying, but a reference for self-checking package structure

examples/README.md: Description of all runnable examples by chapter blocks. Use for a selective smoke run if the full smoke_all.sh takes too long

book2/examples/smoke_all.sh: Full smoke run script. Runs on a temporary copy and does not leave artifacts in the working tree

First volume of AgentClinic (routes, SQLite, feature specs): Foundation on which the second volume is built. Ensure genealogy.md and specification pairs from the first volume are understood

Part 0 control questions (inside document): Four self-check questions before proceeding to chapter 1. Use as flashcards

Glossary of terms for chapters 4–13: Do not learn in advance from Part 0 table — terms are introduced in their chapters. But keep at hand for quick check when encountering an unfamiliar word

Summary: Applied Part 0 is a fundamental methodological laboratory defining how to read the second volume of AgentClinic as a unified production contour, rather than a set of isolated advanced techniques. Key principles: (1) the educational model AgentClinic-production uses scenario roles instead of real infrastructure; (2) Python scripts in book2/examples/ are simulators for local execution, not a product stack change; (3) there is one end-to-end case, high_memory_usage, and the other cases are laboratory windows for principle transfer; (4) there are three types of command blocks, and only [runnable] commands are executed for the graded package; (5) the capstone/ structure is a growing evidence package understandable without chat history; (6) the minimal route comes before the full track, as fast validation before scaling. Successful completion is measured by a control fact: case selected, capstone/ created, runnable examples verified or deferred with an explicit reason. Everything subsequent builds on this base.

Course: Production SDD for Qwen Code CLI. Part 2