Applied Part 10. Protecting Metrics from Goodhart: Guard Metrics and Emergency Mode
Status: Recommendation. Protecting KPIs with a paired guard metric and a blocking emergency mode is an established practice described in the Google SRE Book. Specific thresholds (silent_p0, manual_review_floor, audit_trace_coverage) and the validation.md v1.1 format are a recommended framework that most teams adapt.
For the educational walkthrough, it is sufficient to run examples/goodhart-validator/ and see how a good MTTR gets blocked by a rising silent_p0. The metric network, trace fields, and threshold calibration belong to the full production track. If the term "red button" appears below, read it as a short label for the formal emergency mode.
In Part 9 of the first volume, one metric per check was sufficient: "review visible after publication," "sum does not go negative." In the production scenario cdn_error_budget_burn, that same logic is no longer enough. The agent journal and ailment dashboard from Part 11 show a contradictory picture after release, and a single metric becomes a trap. Here we expand it into a network of paired guard metrics: pairs of "KPI + safety indicator," where the second prevents optimizing the first at the cost of hidden damage. The catalog of typical manipulations that this network protects against is systematized in Part 20, SDD Antipatterns.
Before Reading
- Foundation from the first volume: Part 9 teaches to verify the fact, not convincing prose; Part 20 shows how the process starts protecting the wrong goal.
- Local educational case: cdn_error_budget_burn, because improved MTTR can be blocked by rising silent_p0.
- Trail for capstone/: one target metric, one guard metric, and one blocked example for high_memory_usage.
- Key terms for the first pass: guard metric and emergency mode ("red button"). The rest (silent_p0, manual_review_floor, audit_trace_coverage, edge_drift, trace fields, metric network) is reference material; open it only when needed for a single line in capstone/goodhart-note.md.
- What to defer: metric network, trace fields, drift calibration, and full emergency mode.
Goal
By the end of the section, you will assemble a validation.md that catches Goodhart traps in advance and does not allow the LLM incident pipeline to improve reported KPIs at the cost of triage degradation.
The main gain is this: you will separate metrics into controllable goals and inviolable quality invariants. Then you will establish verifiable thresholds for them, evidence in traces, and CI blocks.
"Trap metrics" here means KPIs that are useful as signals but become dangerous if optimized separately from quality invariants. KPI (key performance indicator) is a key metric that the team seeks to improve with a release.
This approach continues the SDD cycle: specification, verification criteria, and iterations are fixed before implementing changes, not adjusted after obtaining a pretty result (GitHub Spec Kit Quickstart).
The effect "when a measure becomes a target, it ceases to be a good measure" is classically known as Goodhart's law (Wikipedia: Goodhart's law). Google's SRE definition of SLO directly relies on this caution (SRE Book: Service Level Objectives).
Minimal Educational Scenario
Educational Case
Production incident cdn_error_budget_burn, projected onto the educational agent journal from book/part-11-second-feature-phase.md. The release improved MTTR from 660s to 290s, formally looking like a success. But silent_p0 jumped from 0.02 to 0.18, manual_review_rate dropped from 0.18 to 0.12. The goal is to see that the CI gate catches this shift and blocks the merge despite the "green" MTTR.
Preparation
- book2/examples/goodhart-validator/specs/validation.yaml: invariants and red button check.
- book2/examples/goodhart-validator/fixtures/baseline_metrics.json: baseline (MTTR 660s, silent_p0 0.02).
- book2/examples/goodhart-validator/fixtures/new_metrics_good.json: improvement without blind spots.
- book2/examples/goodhart-validator/fixtures/new_metrics_bad.json: "MTTR blindness" (290s, silent_p0 0.18).
- book2/examples/goodhart-validator/fixtures/new_metrics_drift.json: correlation edge drift.
- book2/examples/goodhart-validator/scripts/run_validation.py, compare_drift.py, ci_gate.py.
Steps
1. cd book2/examples/goodhart-validator. *Expected: you are in the example directory, no additional dependencies needed.*
2. Run "good": python3 scripts/run_validation.py --validation specs/validation.yaml --metrics fixtures/new_metrics_good.json. *Expected: exit code 0, status PASS, all three invariants OK.*
3. Run "MTTR blindness": python3 scripts/run_validation.py --validation specs/validation.yaml --metrics fixtures/new_metrics_bad.json. *Expected: exit code 1, red_button_mttr_blindness triggers, manual_review_floor and silent_p0_cap marked FAIL.*

   Bad: looking only at MTTR, where the release is faster and seems "better."

   Good: running validation with invariants, so that "faster" with silent_p0=0.18 gets blocked automatically.
4. Run drift against the drift fixture: python3 scripts/compare_drift.py --baseline fixtures/baseline_metrics.json --new fixtures/new_metrics_drift.json. *Expected: edge_drift > 0.12, exit code 1.*
5. Control: the same compare_drift.py against good metrics. *Expected: edge_drift <= 0.12, exit code 0.*
6. Full CI gate: python3 scripts/ci_gate.py --validation specs/validation.yaml --baseline fixtures/baseline_metrics.json --new fixtures/new_metrics_bad.json. *Expected: exit code 1, reasons lists specific violated invariants, not a generic FAIL.*
7. Record the run as a short anti-Goodhart conclusion: target metric improved, but silent_p0_cap and manual_review_floor blocked the release. *Expected: on the next pull request with MTTR acceleration, the validator compares against good/bad/drift fixtures, not "green vs old baseline."*
If you have Qwen Code installed and need an explanation for review, perform a separate optional step:
qwen -p "Read @fixtures/new_metrics_bad.json and @specs/validation.yaml. Which invariant cannot be bypassed even at MTTR=290s? Do not modify the files." --approval-mode plan
Such output is useful as explanation but does not replace run_validation.py, compare_drift.py, and ci_gate.py.
Control Fact
Step 2 gives exit code 0, steps 3 and 4 give exit code 1 with specific indication of violated invariants. Step 6 shows the same behavior in the composite gate. If the CI gate passes new_metrics_bad.json, the validator configuration is weakened — the silent_p0_cap or manual_review_floor threshold has been shifted.
How This Gets into capstone/
Transfer to capstone/goodhart-note.md one target metric, one guard metric, and one blocked example. If the main credit case is high_memory_usage, record this run as an anti-Goodhart risk for the same contour: memory or MTTR cannot be improved at the cost of silent_p0, manual audit, or 5xx. Do not transfer the entire metric network if it was not recomputed; for the educational minimum, it is sufficient to show that an improved KPI does not pass without a protective invariant.
Minimal fragment:
```yaml
target_metric: "MTTR <= 5m"
guard_metric: "silent_p0 <= 0.05 and manual_review_rate >= 0.15"
blocked_example: "new_metrics_bad.json"
reason: "MTTR improved, but silent_p0 and manual_review_floor fail"
```
Reviewable Trail
The scripts run_validation.py, compare_drift.py, and ci_gate.py write results to stdout; they do not create a separate out/ directory. For the educational route, transfer the summary to capstone/goodhart-note.md: target metric, guard metric, blocked example, and reason.
If in your own project you save outputs/goodhart.last-run.txt, it should be a readable appendix for review, not an empty marker. In SDD, a fact is a reproducible command or readable artifact, not the mere existence of a commit.
Key Ideas
First, define which indicators remain quality invariants and which become optimization targets and therefore subject to manipulation. An invariant cannot be "improved" by direct pressure: it describes the minimally acceptable system state. Examples of invariants:
- audit completeness;
- manual review share;
- upper bound on silent_p0 (the share of "silent" critical incidents closed without escalation).
An optimization target, conversely, can be lowered or raised, but only within a protective corridor. MTTR is useful as a recovery speed indicator but dangerous as the sole reward for the model or team.
In validation.md, make this distinction explicit. MTTR<=5m can be a goal. But manual_review_rate>=15%, silent_p0<=5%, and audit_trace_coverage==100% should remain admission conditions.
Bad:
> Achieve MTTR below 5 minutes.
Problem: bare goal without guard metrics, direct path to silent_p0.
Good:
> MTTR <= 5m AND silent_p0 <= 5% AND manual_review_rate >= 15% AND audit_trace_coverage == 100% — violation of any condition = CI_BLOCK.
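The split between a goal and admission conditions can be sketched in a few lines of Python. This is a minimal illustration, not the code of run_validation.py: the names Invariant, gate, and mttr_seconds are hypothetical, MTTR is assumed to be reported in seconds, and the audit_trace_coverage value in the sample is an assumption. The sketch reports the goal separately and blocks only on invariants; a stricter variant could also block when the goal itself is missed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metrics = Dict[str, float]


@dataclass(frozen=True)
class Invariant:
    """An admission condition: it must hold no matter how good the target looks."""
    name: str
    holds: Callable[[Metrics], bool]


# Thresholds are the ones quoted in this section.
INVARIANTS: List[Invariant] = [
    Invariant("manual_review_floor", lambda m: m["manual_review_rate"] >= 0.15),
    Invariant("silent_p0_cap", lambda m: m["silent_p0"] <= 0.05),
    Invariant("audit_trace_required", lambda m: m["audit_trace_coverage"] == 1.0),
]

MTTR_GOAL_SECONDS = 300  # "MTTR <= 5m", assuming MTTR is reported in seconds


def gate(metrics: Metrics) -> dict:
    """Return the gate status plus the name of every violated invariant."""
    violated = [inv.name for inv in INVARIANTS if not inv.holds(metrics)]
    return {
        "goal_met": metrics["mttr_seconds"] <= MTTR_GOAL_SECONDS,
        "status": "CI_BLOCK" if violated else "PASS",
        "violated_invariants": violated,
    }


# Numbers from the educational case; audit_trace_coverage is an assumed value.
bad_release = {
    "mttr_seconds": 290,
    "silent_p0": 0.18,
    "manual_review_rate": 0.12,
    "audit_trace_coverage": 1.0,
}
result = gate(bad_release)
print(result)
```

The goal is met and the release is still blocked, and the result names the violated invariants instead of returning a generic failure.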
The Goodhart trap manifests when a metric becomes a substitute for reality. The system starts optimizing the measurement method, not triage quality. If MTTR is checked in isolation, the model learns to close incidents faster, reduce escalation share, and avoid lengthy investigations — exactly those that spoil mean time to recovery.
On the chart this looks like a win: MTTR drops to 5 minutes or below. But in the operational loop this may mean the opposite. Complex P0s did not disappear, they became invisible because they were misclassified as false positives, low urgency, or "self-healing" events.
The "MTTR 5 minutes" trap is especially dangerous for rare severe incidents where closure speed competes with investigation completeness. In numbers it looks like this:
- baseline on 300 incidents in replay: MTTR 11:00, escalation share 14%, silent_p0 2%;
- new optimized version: MTTR 4:50, escalations 6%, silent_p0 18%.
Formally the KPI improved. But the system started missing critical events more often without manual review and escalation. Block such a release: it transfers risk from the visible report into future repeat incidents, post-mortem regressions, and lost chains of responsibility.
The antidote in validation.md is formal conditions that prevent optimization from redefining the meaning of quality. The minimal set is three rules, and they must be checked simultaneously:
| Rule | What It Protects | Boundary |
|---|---|---|
| manual_review_floor | share of decisions with manual verification | not below 15% |
| silent_p0_cap | share of "silent" P0 closed without escalation | not above 5% |
| audit_trace_required | completeness of decision trace (prompt, diff, source) | 100%, no exceptions |
Individually these rules leave loopholes. High traceability does not compensate for rising silent_p0. Manual review is useless if it is impossible to reconstruct the prompt, diff, and source of the decision. Tune the "red button" so it triggers not on one bad number but on violation of the protective contour.
What to Choose as Goal and What as Protection
Not all KPIs require equal protection. Manual triage operations and automatic remediation have different risk levels, so the minimal set of invariants differs. The main rule is one: the more dangerous the action, the more guard metrics must pair with the target KPI.
| Decision Type | What We Improve | What Must Go in Pair |
|---|---|---|
| Manual triage operation | MTTR | decision trace saved completely |
| Auto-classification without action | classification speed and accuracy | no silent P0; decision trace saved |
| Auto-escalation | escalation delay | no silent P0; no false escalations |
| Auto-remediation without state | MTTR | no silent P0; manual review exists; full audit trace |
| Auto-remediation with state (DB, cache) | MTTR | same + verified backup |
| New policy release | repeat accuracy | no "edge drift"; full audit trace |
Full English metric names (silent_p0, manual_review_floor, audit_trace_coverage, false_escalation_rate, edge_drift, postmortem_gap, backup_verified) with thresholds and formulas are placed in Appendix D. Here the rule matters, not the table of names: for each "what we improve" row there must be one or two guards from the same risk area.
For dangerous actions (last three rows), additionally include the "red button" — a blocking gate that cannot be bypassed without a referendum from Chapter 3. For manual and observational operations (first three rows), a soft warning is sufficient.
The goal of the table is not to turn it into dogma but to help see what you missed. If in the "auto-remediation with state" row you have no backup check — that is a signal to rewrite validation.md, not to "optimize MTTR."
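One way to use the table as a checklist is to encode it and compare it with the guards a given validation.md actually declares. The sketch below is only an illustration: the decision-type keys and the function missing_guards are hypothetical, and the mapping of table cells to metric names is an interpretation of the rows above, not the schema from Appendix D.

```python
from typing import Dict, List, Set

# Guards required per decision type, paraphrasing the table above.
# Keys and the cell-to-metric mapping are illustrative assumptions.
REQUIRED_GUARDS: Dict[str, Set[str]] = {
    "manual_triage": {"audit_trace_coverage"},
    "auto_classification": {"silent_p0", "audit_trace_coverage"},
    "auto_escalation": {"silent_p0", "false_escalation_rate"},
    "auto_remediation_stateless": {
        "silent_p0", "manual_review_floor", "audit_trace_coverage",
    },
    "auto_remediation_stateful": {
        "silent_p0", "manual_review_floor", "audit_trace_coverage", "backup_verified",
    },
    "policy_release": {"edge_drift", "audit_trace_coverage"},
}


def missing_guards(decision_type: str, declared: Set[str]) -> List[str]:
    """List the guards the table expects but the spec does not declare."""
    return sorted(REQUIRED_GUARDS[decision_type] - declared)


# A spec for stateful auto-remediation that forgot the backup check.
declared_in_spec = {"silent_p0", "manual_review_floor", "audit_trace_coverage"}
gaps = missing_guards("auto_remediation_stateful", declared_in_spec)
print(gaps)
```

A non-empty result is the signal described above: rewrite validation.md before touching the target KPI.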
> [conceptual interface] — structure of validation.md, which you should adapt to your trace files.
#### Minimal validation.md v1.1 structure
```yaml
version: 1.1
invariants:
  - name: manual_review_floor
    expression: "manual_review_rate >= 0.15"
  - name: silent_p0_cap
    expression: "silent_p0 <= 0.05"
  - name: audit_trace_required
    expression: "audit_trace_coverage == 1.0"
checks:
  - name: red_button_mttr_blindness
    when: "MTTR <= 5m"
    assert: "manual_review_rate >= 0.15 and silent_p0 <= 0.05 and audit_trace_coverage == 1.0"
    fail: "CI_BLOCK"
```
Full form with artifact_inputs, network_consistency, and precise expression for audit_trace_required via COUNT(events_with(...)) — in [examples/goodhart-validator/specs/validation.yaml](examples/goodhart-validator/specs/validation.yaml).
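To make the when / assert / fail semantics concrete, here is a small evaluator for expressions of this shape. It is a sketch under stated assumptions, not the parser used by run_validation.py: the functions holds and run_check are hypothetical, only clauses of the form "name op number" joined by "and" are supported, and the when condition is written as MTTR <= 300 on the assumption that MTTR is stored in seconds (the "5m" notation would need its own unit parsing).

```python
import operator
import re
from typing import Dict

OPS = {
    "<=": operator.le, ">=": operator.ge, "==": operator.eq,
    "<": operator.lt, ">": operator.gt,
}
# One clause: metric name, comparison operator, numeric threshold.
CLAUSE = re.compile(r"^\s*(\w+)\s*(<=|>=|==|<|>)\s*([0-9.]+)\s*$")


def holds(expression: str, metrics: Dict[str, float]) -> bool:
    """Evaluate 'a <= 1 and b >= 2' style expressions without using eval()."""
    for clause in expression.split(" and "):
        match = CLAUSE.match(clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        name, op, threshold = match.groups()
        if not OPS[op](metrics[name], float(threshold)):
            return False
    return True


def run_check(check: Dict[str, str], metrics: Dict[str, float]) -> str:
    """Apply the assert only when the trigger condition is true."""
    if not holds(check["when"], metrics):
        return "SKIPPED"
    return "PASS" if holds(check["assert"], metrics) else check["fail"]


CHECK = {
    "name": "red_button_mttr_blindness",
    "when": "MTTR <= 300",  # assumption: seconds instead of the "5m" notation
    "assert": "manual_review_rate >= 0.15 and silent_p0 <= 0.05 "
              "and audit_trace_coverage == 1.0",
    "fail": "CI_BLOCK",
}

bad = {"MTTR": 290, "manual_review_rate": 0.12, "silent_p0": 0.18,
       "audit_trace_coverage": 1.0}
print(run_check(CHECK, bad))
```

A skipped check does not mean the release is safe: the invariants section is still evaluated on its own, regardless of whether the MTTR trigger fired.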
The next layer of protection is a detector of hidden distortions right in the specification. Treat any change in triage behavior under unchanged KPIs as a regression: here a regression is a shift in the distribution of decisions that aggregates do not show.
Reason: harm is not always visible in top-level numbers. MTTR may remain the same, escalation share may look normal, but the model starts distributing ambiguous cases differently among auto_close, manual_review, and defer.
Therefore in validation.md, compare not only aggregates but behavioral patterns:
- severity transition matrix;
- distribution of closure reasons;
- share of reopened incidents;
- delay to post-mortem label;
- change in the relationship between manual_review_rate and silent_p0.
If drift_budget (allowable deviation corridor from baseline) is exceeded, block the build even with "green" KPIs. This means the system has already changed its decision-making mode.
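A behavioral comparison of this kind can be sketched with a simple distance between decision distributions. The example uses total variation distance as a stand-in; it is not the formula behind edge_drift or compare_drift.py, which this section does not define. The sample shares and the reuse of 0.12 as the budget are assumptions for illustration.

```python
from typing import Dict


def total_variation(baseline: Dict[str, float], new: Dict[str, float]) -> float:
    """Half the sum of absolute differences between two share distributions."""
    keys = set(baseline) | set(new)
    return 0.5 * sum(abs(baseline.get(k, 0.0) - new.get(k, 0.0)) for k in keys)


DRIFT_BUDGET = 0.12  # assumption: reuse the edge_drift threshold as the corridor

# Shares of decisions on ambiguous cases; the values are invented for the sketch.
baseline = {"auto_close": 0.60, "manual_review": 0.30, "defer": 0.10}
shifted = {"auto_close": 0.75, "manual_review": 0.15, "defer": 0.10}

drift = total_variation(baseline, shifted)
status = "CI_BLOCK" if drift > DRIFT_BUDGET else "PASS"
print(round(drift, 3), status)
```

Here 15 percentage points of decisions moved from manual_review to auto_close. Aggregate KPIs could stay flat while this shift happens, which is why the distribution itself is compared.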
To see the main trap, three metrics and one guard are sufficient:
```mermaid
flowchart LR
    MTTR[MTTR]
    silent_p0[silent_p0]
    manual_review_rate[manual_review_rate]
    audit_trace_coverage[audit_trace_coverage]
    silent_p0 -->|pulls MTTR down unfairly| MTTR
    manual_review_rate -->|pulls MTTR up fairly| MTTR
    audit_trace_coverage -->|constrains silent_p0| silent_p0
```

Read as: MTTR can be artificially improved if silent P0 are allowed to close without escalation. The guard audit_trace_coverage forbids closure without a trace, and manual_review_rate maintains the share of manual checks. The full picture with additional metrics (escalation_rate, postmortem_regression), together with formal thresholds and relationships, is in Appendix D.
Bind checks to Qwen logs, decisions, and diff chains — otherwise they cannot be transferred to production without losing context. Minimal trace composition per event: trace_id (chain), prompt_hash (prompt hash), decision (what was chosen), policy_version + diff_id (which version and which change introduced it), and postmortem_label (what the review confirmed). Full set of fields with agent, raw_alert_excerpt, reasoning_delta, and review_outcome belongs to the full track and is collected in [examples/templates/validation.md](examples/templates/validation.md).
These five fields allow answering engineering questions after blocking: which specification version changed behavior, which prompt pushed the model toward auto-closure, which diff introduced the new heuristic. Without this linkage validation.md remains a declaration; with it, it becomes a reproducible audit artifact.
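Trace completeness can be computed directly from events. The sketch below assumes each event is a flat mapping and treats a trace as complete only when every field from the minimal composition is present and non-empty. The function names are hypothetical, and the precise expression used by the example validator lives in its validation.yaml, not here. In practice postmortem_label is often filled in later, so a real check may need to allow a pending state.

```python
from typing import Iterable, Mapping

# Field names come from the minimal trace composition described above.
REQUIRED_FIELDS = (
    "trace_id", "prompt_hash", "decision",
    "policy_version", "diff_id", "postmortem_label",
)


def is_complete(event: Mapping[str, object]) -> bool:
    """A trace is complete when every required field is present and non-empty."""
    return all(event.get(field) not in (None, "") for field in REQUIRED_FIELDS)


def audit_trace_coverage(events: Iterable[Mapping[str, object]]) -> float:
    """Share of events with a complete trace; no events counts as no coverage."""
    events = list(events)
    if not events:
        return 0.0  # conservative choice: absence of evidence does not pass
    return sum(is_complete(e) for e in events) / len(events)


events = [
    {"trace_id": "t1", "prompt_hash": "ab12", "decision": "auto_close",
     "policy_version": "v7", "diff_id": "d41", "postmortem_label": "false_positive"},
    {"trace_id": "t2", "prompt_hash": "cd34", "decision": "auto_close",
     "policy_version": "v7", "diff_id": "", "postmortem_label": "real_p0"},
]
coverage = audit_trace_coverage(events)
print(coverage)
```

With the invariant audit_trace_coverage == 1.0, a single event with an empty diff_id is enough to block the build.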
Design metrics as a network of dependencies, not as a set of independent counters. This is network_consistency: change in one metric must not contradict related ones. Recompute together MTTR, silent_p0, manual_review_rate, escalation_rate, postmortem_regression, rollback_rate, and audit_gap (see diagram above). Local improvement in one value often creates debt in another. Practical criterion — edge consistency: if MTTR drops but manual review simultaneously decreases and the share of late-confirmed P0 rises, mark the system as risky. This turns CI from a "pass/fail KPI" check into a triage behavior stability check.
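The edge consistency criterion can be expressed as a comparison of directions rather than absolute values. This is a minimal sketch: risky_edges and is_risky are hypothetical names, mttr_seconds is an assumed field, and silent_p0 is used as a stand-in for the share of late-confirmed P0, which the fixtures in this section do not report separately.

```python
from typing import Dict, List


def risky_edges(baseline: Dict[str, float], new: Dict[str, float]) -> List[str]:
    """Return the contradictory edges; an empty list means nothing suspicious."""
    if new["mttr_seconds"] >= baseline["mttr_seconds"]:
        return []  # MTTR did not improve, so there is nothing to cross-check
    edges = []
    if new["manual_review_rate"] < baseline["manual_review_rate"]:
        edges.append("MTTR down while manual_review_rate down")
    if new["silent_p0"] > baseline["silent_p0"]:
        edges.append("MTTR down while silent_p0 up")
    return edges


def is_risky(baseline: Dict[str, float], new: Dict[str, float]) -> bool:
    """The criterion in the text requires both contradictions at once."""
    return len(risky_edges(baseline, new)) == 2


# Values from the educational case.
baseline = {"mttr_seconds": 660, "manual_review_rate": 0.18, "silent_p0": 0.02}
bad = {"mttr_seconds": 290, "manual_review_rate": 0.12, "silent_p0": 0.18}

print(risky_edges(baseline, bad), is_risky(baseline, bad))
```

Returning the edges, not just a boolean, keeps the block explainable: the reviewer sees which relationship broke.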
> [conceptual interface] scripts/metrics/network_recompute.py shows the form of local metric network recomputation; no ready CLI exists in the textbook repository. The runnable analog of the CI gate itself with anti-Goodhart checks is python3 examples/goodhart-validator/scripts/run_validation.py and ci_gate.py (see "Control Fact" above).
#### Updating metric network and validation CI-gate after spec change
```bash
python3 scripts/metrics/network_recompute.py \
  --spec specs/incident-spec.md \
  --replay data/replay_*.jsonl \
  --out .artifacts/metric_network.json
python3 scripts/metrics/ci_gate.py \
  --artifact validation.md \
  --metric-network .artifacts/metric_network.json \
  --traces .artifacts/qwen_trace.ndjson
```
CONTROL: CI_GATE = PASS if (edge_drift <= 0.12 && silent_p0 <= 0.05 && manual_review_rate >= 0.15 && audit_trace_coverage == 1.0) else CI_BLOCK
Full Track: Threshold Calibration
The "Low / Default / High" table for silent_p0, manual_review_rate, edge_drift, and audit_trace_coverage, the exercise on the "dangerous" simultaneous weakening of two protections, and the full metric dependency network are in Appendix D, Section D.4. On the first pass it is sufficient to see that a bad release is blocked by a guard metric.
Examples and Application
Example: a team wants to prove readiness of a new triage pipeline for more aggressive auto-closure. First, a replay run is launched with target optimization MTTR<=5m. Then the same incident set is checked through red_button_mttr_blindness.
If the result looks like MTTR=4:50, silent_p0=18%, manual_review_rate=12%, block the release. The reason is not bad speed but violation of protective invariants. This is an important distinction: the goal is achieved, but the quality contract is broken.
> [conceptual interface] — scripts/metrics/simulate.py and validate_red_button.py show the form of emergency mode checking; no ready CLI exists in the textbook repository. Runnable analog of checking the same invariants on educational fixtures — python3 examples/goodhart-validator/scripts/run_validation.py (see examples/goodhart-validator/README.md).
#### Example red button run on replay
```bash
python3 scripts/metrics/simulate.py \
  --scenario data/replay_300.jsonl \
  --goal "MTTR<=5m" \
  --spec specs/incident-spec.md
python3 scripts/metrics/validate_red_button.py \
  --artifact validation.md \
  --mode red_button \
  --assert "silent_p0<=5% && manual_review_rate>=15% && audit_trace_coverage==1.0"
```
CONTROL: red_button = BLOCKED (MTTR=4:50, silent_p0=18%, manual_review_rate=12%)
Second example — erroneous auto-closure of 40 P0 as "false positives." Before post-mortem the metric looks clean: incidents closed quickly, fewer escalations, queue not growing.
After label reconciliation something else emerges. Five events were real critical failures. They must increase silent_p0, escalation_regret, and postmortem_regression.
In validation.md, record such a case as a predictive triage failure. Do not wait for user damage in production. Use replay and post-mortem ground truth as an early blocking signal.
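The reconciliation step from this example can be sketched as follows. The fields decision and postmortem_label come from the trace composition described earlier. The label values real_p0 and false_positive and the function name are hypothetical, and the result is a plain share among auto-closed incidents, not the formal definition of silent_p0 or escalation_regret.

```python
from typing import Dict, List


def confirmed_p0_share(closures: List[Dict[str, str]]) -> float:
    """Share of auto-closed incidents that post-mortem labels confirm as real P0."""
    auto_closed = [c for c in closures if c["decision"] == "auto_close"]
    if not auto_closed:
        return 0.0
    confirmed = sum(c["postmortem_label"] == "real_p0" for c in auto_closed)
    return confirmed / len(auto_closed)


# The case from the text: 40 P0 auto-closed as false positives, 5 were real.
closures = (
    [{"decision": "auto_close", "postmortem_label": "real_p0"}] * 5
    + [{"decision": "auto_close", "postmortem_label": "false_positive"}] * 35
)

share = confirmed_p0_share(closures)
print(f"{share:.3f}")
```

Before label reconciliation the same 40 closures contribute nothing to any guard metric, which is exactly why replay with post-mortem ground truth is used as the early signal.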
In practice, store validation.md next to the specification and update it only through the same review process as triage rules. On every change, CI rebuilds the metric network, runs replay, checks trace completeness, and compares behavior to baseline. Treat a threshold change, for example raising the allowable silent_p0 from 5% to 7%, as a risk contract change that goes through the mutable rule from Part 3, not as a technical YAML edit. This barrier protects the system from gradual erosion of invariants under pressure from convenient reports.
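Threshold erosion can be caught mechanically by comparing the thresholds of two spec versions and flagging every change that weakens a protection. The sketch below is an assumption about how such a check might look: the threshold names reuse the invariant names from this section, edge_drift_budget and the function loosened are hypothetical, and the "looser" directions are stated explicitly in the table at the top.

```python
from typing import Dict, List

# Direction in which each threshold becomes weaker (an assumption of this sketch).
LOOSER_WHEN = {
    "silent_p0_cap": "higher",        # a larger cap tolerates more silent P0
    "manual_review_floor": "lower",   # a smaller floor tolerates less manual review
    "audit_trace_required": "lower",  # anything below 1.0 admits missing traces
    "edge_drift_budget": "higher",    # a larger budget tolerates more drift
}


def loosened(old: Dict[str, float], new: Dict[str, float]) -> List[str]:
    """Describe every threshold that was weakened or removed between versions."""
    findings = []
    for name, direction in LOOSER_WHEN.items():
        if name not in old:
            continue
        if name not in new:
            findings.append(f"{name}: removed")
            continue
        if direction == "higher":
            weaker = new[name] > old[name]
        else:
            weaker = new[name] < old[name]
        if weaker:
            findings.append(f"{name}: {old[name]} -> {new[name]}")
    return findings


old_spec = {"silent_p0_cap": 0.05, "manual_review_floor": 0.15}
new_spec = {"silent_p0_cap": 0.07, "manual_review_floor": 0.15}
findings = loosened(old_spec, new_spec)
print(findings)
```

A non-empty list does not have to fail the build by itself; it can route the pull request into the risk contract review instead of a routine YAML approval.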
Summary
Trap metrics are dangerous not because they are false. They are useful until the moment they become the sole optimization target.
A reliable validation.md solves five tasks:
- separates goals from invariants;
- blocks MTTR improvements when silent_p0 rises;
- requires minimum manual verification;
- checks triage behavior drift;
- preserves the evidentiary chain of Qwen logs, decisions, and diffs.
> [runnable] — minimal smoke run for this chapter is in [examples/goodhart-validator/](examples/goodhart-validator/README.md).
```bash
cd book2/examples/goodhart-validator
python3 scripts/run_validation.py \
  --validation specs/validation.yaml \
  --metrics fixtures/new_metrics_good.json
python3 scripts/ci_gate.py \
  --validation specs/validation.yaml \
  --baseline fixtures/baseline_metrics.json \
  --new fixtures/new_metrics_good.json
```
Example with expected failure for the red button scenario uses fixtures/new_metrics_bad.json: run_validation.py and ci_gate.py should exit with code 1 because manual_review_floor and silent_p0_cap are violated.
In the next chapter this protective contour will be connected to real monitoring and deployment APIs.
Artifacts and Readiness Criteria
| Artifact | Ready When |
|---|---|
| Local run of book2/examples/goodhart-validator | optimization goal separated from inviolable invariant |
| Three fixtures: good / bad / drift | new_metrics_good.json passes, new_metrics_bad.json blocked with specific reason, new_metrics_drift.json blocked by compare_drift.py |
| One guard metric block example | MTTR improved but release blocked due to silent_p0 or manual_review_floor |
| Record in capstone/goodhart-note.md | three lines: target metric, paired guard metric, blocking condition |
The full track adds validation.md with a target metric and quality invariants, .artifacts/metric_network.json or a runnable metric network analog, a replay set, and a CI gate report with edge_drift, silent_p0, and audit_trace_coverage. Consider it ready if emergency mode blocks MTTR<=5m when silent_p0 rises, trace fields link prompt, diff, decision, and post-mortem label, and a threshold change is filed as a risk contract change, not YAML cosmetics.
Practice
1. cd book2/examples/goodhart-validator && python3 scripts/run_validation.py --validation specs/validation.yaml --metrics fixtures/new_metrics_good.json --json. *Expected: code 0, in JSON "status": "PASS", all three invariants manual_review_floor, silent_p0_cap, audit_trace_required have "ok": true.*
2. python3 scripts/run_validation.py --validation specs/validation.yaml --metrics fixtures/new_metrics_bad.json --json. *Expected: code 1, in JSON "status": "CI_BLOCK", check red_button_mttr_blindness triggers with "fail": "CI_BLOCK" and violated_invariants: [manual_review_floor, silent_p0_cap]. Also python3 scripts/compare_drift.py --baseline fixtures/baseline_metrics.json --new fixtures/new_metrics_drift.json --threshold 0.12 returns code 1 and prints edge_drift=0.18 threshold=0.12 -> FAIL.*
3. Transfer to capstone/goodhart-note.md three lines: target metric (MTTR), guard metric (silent_p0 or audit_trace_coverage), red button rule. *Expected: next merge attempt with improved MTTR but failing audit_trace_coverage gets blocked automatically.*
Check Questions
- Why is MTTR dangerous as the sole optimization target?
- How does a quality invariant differ from a KPI?
- What trace fields are needed to investigate a Goodhart regression?
- After release MTTR dropped 30%, but silent_p0 rose from 4% to 12%. What will you do: accept the release, roll back, or demand additional diagnostics? What exactly will you check?