Reading: Applied Part 10. Protecting Metrics from Goodhart's Law: Guardrail Metrics and Emergency Mode


Applied Part 10. Protecting Metrics from Goodhart: Guard Metrics and Emergency Mode

Status: Recommendation. Protecting KPIs with a paired guard metric and a blocking emergency mode is an established practice described in the Google SRE Book. Specific thresholds (silent_p0, manual_review_floor, audit_trace_coverage) and the validation.md v1.1 format are a recommended framework that most teams adapt.

For the educational walkthrough, it is sufficient to run examples/goodhart-validator/ and see how a good MTTR gets blocked by rising silent_p0. The metric network, trace fields, and threshold calibration belong to the full production track. If the term "red button" appears below, read it as a short label for the formal emergency mode.

In Part 9 of the first volume, one metric per check was sufficient: "review visible after publication," "sum does not go negative." In the production scenario cdn_error_budget_burn, that same logic is no longer enough. The agent journal and ailment dashboard from Part 11 show a contradictory picture after release, and a single metric becomes a trap. Here we will expand it into a network of paired guard metrics: pairs of "KPI + safety indicator," where the second prevents optimizing the first at the cost of hidden damage. The catalog of typical manipulations against which this network protects is systematized in Part 20, SDD Antipatterns.

Before Reading

  • Foundation from the first volume: Part 9 teaches you to verify facts rather than trust convincing prose; Part 20 shows how a process starts protecting the wrong goal.
  • Local educational case: cdn_error_budget_burn, because improved MTTR can be blocked by rising silent_p0.
  • Trail for capstone/: one target metric, one guard metric, and one blocked example for high_memory_usage.
  • Key terms for the first pass: guard metric and emergency mode ("red button"). The rest — silent_p0, manual_review_floor, audit_trace_coverage, edge_drift, trace fields, metric network — are reference material; open them only when needed for a single line in capstone/goodhart-note.md.
  • What to defer: metric network, trace fields, drift calibration, and full emergency mode.

Goal

By the end of the section, you will assemble a validation.md that catches Goodhart traps in advance and does not allow the LLM incident pipeline to improve reported KPIs at the cost of triage degradation.

The main gain is this: you will separate metrics into controllable goals and inviolable quality invariants. Then you will establish verifiable thresholds for them, evidence in traces, and CI blocks.

"Trap metrics" here means KPIs that are useful as signals but become dangerous if optimized separately from quality invariants. KPI (key performance indicator) is a key metric that the team seeks to improve with a release.

This approach continues the SDD cycle: specification, verification criteria, and iterations are fixed before implementing changes, not adjusted after obtaining a pretty result (GitHub Spec Kit Quickstart).

The effect "when a measure becomes a target, it ceases to be a good measure" is classically known as Goodhart's law (Wikipedia: Goodhart's law). Google's SRE definition of SLO directly relies on this caution (SRE Book: Service Level Objectives).

Minimal Educational Scenario

Educational Case

Production incident cdn_error_budget_burn, projected onto the educational agent journal from book/part-11-second-feature-phase.md. The release improved MTTR from 660s to 290s, which formally looks like a success. But silent_p0 jumped from 0.02 to 0.18, and manual_review_rate dropped from 0.18 to 0.12. The goal is to see that the CI gate catches this shift and blocks the merge despite the "green" MTTR.

Preparation

  • book2/examples/goodhart-validator/specs/validation.yaml — invariants and red button check.
  • book2/examples/goodhart-validator/fixtures/baseline_metrics.json — baseline (MTTR 660s, silent_p0 0.02).
  • book2/examples/goodhart-validator/fixtures/new_metrics_good.json — improvement without blind spots.
  • book2/examples/goodhart-validator/fixtures/new_metrics_bad.json — "MTTR blindness" (290s, silent_p0 0.18).
  • book2/examples/goodhart-validator/fixtures/new_metrics_drift.json — correlation edge drift.
  • book2/examples/goodhart-validator/scripts/run_validation.py, compare_drift.py, ci_gate.py.

Steps

  1. cd book2/examples/goodhart-validator. Expected: you are in the example directory, no additional dependencies needed.
  2. Run "good": python3 scripts/run_validation.py --validation specs/validation.yaml --metrics fixtures/new_metrics_good.json. *Expected: exit code 0, status PASS, all three invariants OK.*
  3. Run "MTTR blindness": python3 scripts/run_validation.py --validation specs/validation.yaml --metrics fixtures/new_metrics_bad.json. *Expected: exit code 1, red_button_mttr_blindness triggers, manual_review_floor and silent_p0_cap marked FAIL.*

Bad: looking only at MTTR — the release is faster, seems "better."

Good: running validation with invariants — "faster" with silent_p0=0.18 gets blocked automatically.

  4. Run drift against drift fixture: python3 scripts/compare_drift.py --baseline fixtures/baseline_metrics.json --new fixtures/new_metrics_drift.json. *Expected: edge_drift > 0.12, exit code 1.*
  5. Control: same compare_drift.py against good metrics. *Expected: edge_drift <= 0.12, exit code 0.*
  6. Full CI gate: python3 scripts/ci_gate.py --validation specs/validation.yaml --baseline fixtures/baseline_metrics.json --new fixtures/new_metrics_bad.json. *Expected: exit code 1, reasons lists specific violated invariants, not a generic FAIL.*
  7. Record the run as a short anti-Goodhart conclusion: target metric improved, but silent_p0_cap and manual_review_floor blocked the release. Expected: on the next pull request with MTTR acceleration, the validator compares against good/bad/drift fixtures, not "green vs old baseline."

If you have Qwen Code installed and need an explanation for review, perform a separate optional step:

qwen -p "Read @fixtures/new_metrics_bad.json and @specs/validation.yaml. Which invariant cannot be bypassed even at MTTR=290s? Do not modify the files." --approval-mode plan

Such output is useful as explanation but does not replace run_validation.py, compare_drift.py, and ci_gate.py.

Control Fact

Step 2 gives exit code 0, steps 3 and 4 give exit code 1 with specific indication of violated invariants. Step 6 shows the same behavior in the composite gate. If the CI gate passes new_metrics_bad.json, the validator configuration is weakened — the silent_p0_cap or manual_review_floor threshold has been shifted.

How This Gets into capstone/

Transfer to capstone/goodhart-note.md one target metric, one guard metric, and one blocked example. If the main credit case is high_memory_usage, record this run as an anti-Goodhart risk for the same contour: memory or MTTR cannot be improved at the cost of silent_p0, manual audit, or 5xx. Do not transfer the entire metric network if it was not recomputed; for the educational minimum, it is sufficient to show that an improved KPI does not pass without a protective invariant.

Minimal fragment:

target_metric: "MTTR <= 5m"
guard_metric: "silent_p0 <= 0.05 and manual_review_rate >= 0.15"
blocked_example: "new_metrics_bad.json"
reason: "MTTR improved, but silent_p0 and manual_review_floor fail"

Reviewable Trail

The scripts run_validation.py, compare_drift.py, and ci_gate.py write results to stdout; they do not create a separate out/ directory. For the educational route, transfer the summary to capstone/goodhart-note.md: target metric, guard metric, blocked example, and reason.

If in your own project you save outputs/goodhart.last-run.txt, it should be a readable appendix for review, not an empty marker. In SDD, a fact is a reproducible command or readable artifact, not the mere existence of a commit.

Key Ideas

First, define which indicators remain quality invariants and which become optimization targets and therefore subject to manipulation. An invariant cannot be "improved" by direct pressure: it describes the minimally acceptable system state. Examples of invariants:

  • audit completeness;
  • manual review share;
  • upper bound on silent_p0 (this is the share of "silent" critical incidents closed without escalation).

An optimization target, conversely, can be lowered or raised, but only within a protective corridor. MTTR is useful as a recovery speed indicator but dangerous as the sole reward for the model or team.

In validation.md, make this distinction explicit. MTTR<=5m can be a goal. But manual_review_rate>=15%, silent_p0<=5%, and audit_trace_coverage==100% should remain admission conditions.

Bad:

> Achieve MTTR below 5 minutes.

Problem: a bare goal without guard metrics is a direct path to rising silent_p0.

Good:

> MTTR <= 5m AND silent_p0 <= 5% AND manual_review_rate >= 15% AND audit_trace_coverage == 100% — violation of any condition = CI_BLOCK.

The Goodhart trap manifests when a metric becomes a substitute for reality. The system starts optimizing the measurement method, not triage quality. If MTTR is checked in isolation, the model learns to close incidents faster, reduce escalation share, and avoid lengthy investigations — exactly those that spoil mean time to recovery.

On the chart this looks like a win: MTTR drops to 5 minutes or below. But in the operational loop this may mean the opposite. Complex P0s did not disappear, they became invisible because they were misclassified as false positives, low urgency, or "self-healing" events.

The "MTTR 5 minutes" trap is especially dangerous for rare severe incidents where closure speed competes with investigation completeness. In numbers it looks like this:

  • baseline on 300 incidents in replay: MTTR 11:00, escalation share 14%, silent_p0 2%;
  • new optimized version: MTTR 4:50, escalations 6%, silent_p0 18%.

Formally the KPI improved. But the system started missing critical events more often without manual review and escalation. Block such a release: it transfers risk from the visible report into future repeat incidents, post-mortem regressions, and lost chains of responsibility.
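
The replay figures above are easier to judge as absolute counts. The following is a minimal sketch of that arithmetic, assuming that every share is measured against all 300 replayed incidents (the text does not state the denominator); the variable names are illustrative.

```python
# Replay figures from the text. ASSUMPTION: shares are taken over all 300 incidents.
TOTAL_INCIDENTS = 300

baseline = {"mttr_s": 11 * 60, "escalation_share": 0.14, "silent_p0": 0.02}
optimized = {"mttr_s": 4 * 60 + 50, "escalation_share": 0.06, "silent_p0": 0.18}


def as_count(share: float, total: int = TOTAL_INCIDENTS) -> int:
    """Convert a share of the replay set into a whole number of incidents."""
    return round(share * total)


# Relative MTTR improvement: the number that makes the release look like a win.
mttr_gain = 1 - optimized["mttr_s"] / baseline["mttr_s"]

# Absolute counts: the numbers that show where the risk moved.
silent_before = as_count(baseline["silent_p0"])
silent_after = as_count(optimized["silent_p0"])
escalations_before = as_count(baseline["escalation_share"])
escalations_after = as_count(optimized["escalation_share"])

print(f"MTTR reduced by {mttr_gain:.0%}")
print(f"silent P0: {silent_before} -> {silent_after}")
print(f"escalations: {escalations_before} -> {escalations_after}")
```

Under that assumption, a 56% faster MTTR is paid for with silent P0 growing from 6 to 54 incidents while escalations fall from 42 to 18.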

The antidote in validation.md is formal conditions that prevent optimization from redefining the meaning of quality. The minimal set is three rules, and they must be checked simultaneously:

| Rule | What It Protects | Boundary |
| --- | --- | --- |
| manual_review_floor | share of decisions with manual verification | not below 15% |
| silent_p0_cap | share of "silent" P0 closed without escalation | not above 5% |
| audit_trace_required | completeness of decision trace (prompt, diff, source) | 100%, no exceptions |

Individually these rules leave loopholes. High traceability does not compensate for rising silent_p0. Manual review is useless if it is impossible to reconstruct the prompt, diff, and source of the decision. Tune the "red button" so it triggers not on one bad number but on violation of the protective contour.
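
Checking the three rules together can be sketched as follows. This is an illustration under stated assumptions, not the code of run_validation.py: the `Metrics` class and function names are hypothetical, and the audit_trace_coverage value of 1.0 for the bad release is assumed, since the text only says that the other two rules fail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    mttr_s: float
    silent_p0: float
    manual_review_rate: float
    audit_trace_coverage: float


def violated_invariants(m: Metrics) -> list[str]:
    """Return the names of all broken rules; every rule is always evaluated."""
    rules = {
        "manual_review_floor": m.manual_review_rate >= 0.15,
        "silent_p0_cap": m.silent_p0 <= 0.05,
        "audit_trace_required": m.audit_trace_coverage == 1.0,
    }
    return [name for name, ok in rules.items() if not ok]


def gate(m: Metrics) -> str:
    """MTTR is deliberately not consulted: a good goal value cannot buy a pass."""
    return "CI_BLOCK" if violated_invariants(m) else "PASS"


# Values from the educational case; audit_trace_coverage=1.0 is an ASSUMPTION.
bad = Metrics(mttr_s=290, silent_p0=0.18, manual_review_rate=0.12,
              audit_trace_coverage=1.0)
print(gate(bad), violated_invariants(bad))
```

Returning the list of violated rules instead of a single boolean keeps the block reason specific, which is what the CI gate step in the walkthrough expects.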

What to Choose as Goal and What as Protection

Not all KPIs require equal protection. Manual triage operations and automatic remediation have different risk levels, so the minimal set of invariants differs. The main rule is one: the more dangerous the action, the more guard metrics must pair with the target KPI.

| Decision Type | What We Improve | What Must Go in Pair |
| --- | --- | --- |
| Manual triage operation | MTTR | decision trace saved completely |
| Auto-classification without action | classification speed and accuracy | no silent P0; decision trace saved |
| Auto-escalation | escalation delay | no silent P0; no false escalations |
| Auto-remediation without state | MTTR | no silent P0; manual review exists; full audit trace |
| Auto-remediation with state (DB, cache) | MTTR | same + verified backup |
| New policy release | repeat accuracy | no "edge drift"; full audit trace |

The full metric names (silent_p0, manual_review_floor, audit_trace_coverage, false_escalation_rate, edge_drift, postmortem_gap, backup_verified) with thresholds and formulas are placed in Appendix D. Here the rule matters, not the table of names: for each "what we improve" row there must be one or two guards from the same risk area.

For dangerous actions (last three rows), additionally include the "red button" — a blocking gate that cannot be bypassed without a referendum from Chapter 3. For manual and observational operations (first three rows), a soft warning is sufficient.

The goal of the table is not to turn it into dogma but to help see what you missed. If in the "auto-remediation with state" row you have no backup check — that is a signal to rewrite validation.md, not to "optimize MTTR."
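
The table can also be used mechanically, as a checklist for what a given validation.md is missing. A minimal sketch, assuming the table rows are mapped to the metric names listed above; the dictionary keys and the `missing_guards` helper are hypothetical and exist only for this illustration.

```python
# Required guards per decision type, transcribed from the table above.
# ASSUMPTION: the keys are identifiers invented for this sketch.
REQUIRED_GUARDS: dict[str, set[str]] = {
    "manual_triage": {"audit_trace_coverage"},
    "auto_classification": {"silent_p0", "audit_trace_coverage"},
    "auto_escalation": {"silent_p0", "false_escalation_rate"},
    "auto_remediation_stateless": {
        "silent_p0", "manual_review_floor", "audit_trace_coverage",
    },
    "auto_remediation_stateful": {
        "silent_p0", "manual_review_floor", "audit_trace_coverage",
        "backup_verified",
    },
    "policy_release": {"edge_drift", "audit_trace_coverage"},
}


def missing_guards(decision_type: str, declared: set[str]) -> set[str]:
    """Guards the table requires for this decision type but the spec omits."""
    return REQUIRED_GUARDS[decision_type] - declared


# A spec for stateful auto-remediation that forgot the backup check.
declared = {"silent_p0", "manual_review_floor", "audit_trace_coverage"}
print(sorted(missing_guards("auto_remediation_stateful", declared)))
```

A non-empty result is the signal described above: rewrite validation.md before touching the target KPI.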

> [conceptual interface] — structure of validation.md, which you should adapt to your trace files.

#### Minimal validation.md v1.1 structure
version: 1.1
invariants:
  - name: manual_review_floor
    expression: "manual_review_rate >= 0.15"

  - name: silent_p0_cap
    expression: "silent_p0 <= 0.05"

  - name: audit_trace_required
    expression: "audit_trace_coverage == 1.0"

checks:
  - name: red_button_mttr_blindness
    when: "MTTR <= 5m"
    assert: "manual_review_rate >= 0.15 and silent_p0 <= 0.05 and audit_trace_coverage == 1.0"
    fail: "CI_BLOCK"

Full form with artifact_inputs, network_consistency, and precise expression for audit_trace_required via COUNT(events_with(...)) — in [examples/goodhart-validator/specs/validation.yaml](examples/goodhart-validator/specs/validation.yaml).
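
To make the structure concrete, here is one way such invariant expressions could be evaluated without calling eval. It is a sketch under assumptions: it supports only the `name operator number` form used in the minimal structure, and it is not claimed to match how run_validation.py parses the real validation.yaml.

```python
import operator
import re

# Only the comparison operators that appear in the minimal structure.
OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq}
PATTERN = re.compile(r"^\s*(\w+)\s*(>=|<=|==)\s*([0-9.]+)\s*$")


def check(expression: str, metrics: dict[str, float]) -> bool:
    """Evaluate a single 'metric op threshold' expression against metrics."""
    match = PATTERN.match(expression)
    if match is None:
        raise ValueError(f"unsupported expression: {expression!r}")
    name, op, threshold = match.groups()
    return OPS[op](metrics[name], float(threshold))


invariants = [
    ("manual_review_floor", "manual_review_rate >= 0.15"),
    ("silent_p0_cap", "silent_p0 <= 0.05"),
    ("audit_trace_required", "audit_trace_coverage == 1.0"),
]

# Values from the bad release; audit_trace_coverage=1.0 is an ASSUMPTION.
metrics = {"manual_review_rate": 0.12, "silent_p0": 0.18,
           "audit_trace_coverage": 1.0}

results = {name: check(expr, metrics) for name, expr in invariants}
status = "PASS" if all(results.values()) else "CI_BLOCK"
print(status, results)
```

Rejecting anything outside the narrow pattern is a deliberate choice: a validator that silently skips an expression it cannot parse would weaken the gate.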

The next layer of protection is a detector of hidden distortions right in the specification. Treat any change in triage behavior under unchanged KPIs as a regression. Here a regression is a shift in the distribution of decisions that is invisible to aggregates.

Reason: harm is not always visible in top-level numbers. MTTR may remain the same, escalation share may look normal, but the model starts distributing ambiguous cases differently among auto_close, manual_review, and defer.

Therefore in validation.md, compare not only aggregates but behavioral patterns:

  • severity transition matrix;
  • distribution of closure reasons;
  • share of reopened incidents;
  • delay to post-mortem label;
  • change in the relationship between manual_review_rate and silent_p0.

If drift_budget (allowable deviation corridor from baseline) is exceeded, block the build even with "green" KPIs. This means the system has already changed its decision-making mode.
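
One possible way to quantify such a behavioral shift is the total variation distance between the baseline and candidate distributions of decisions. This is a stand-in chosen for illustration, not the edge_drift formula from Appendix D; the decision counts are hypothetical, and reusing 0.12 as the drift budget is an assumption borrowed from the edge_drift threshold in the walkthrough.

```python
from collections import Counter


def distribution(decisions: list[str]) -> dict[str, float]:
    """Turn a list of decision labels into label -> share."""
    if not decisions:
        raise ValueError("cannot build a distribution from an empty replay")
    counts = Counter(decisions)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def total_variation(p: dict[str, float], q: dict[str, float]) -> float:
    """Half the sum of absolute share differences; 0 = identical, 1 = disjoint."""
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(label, 0.0) - q.get(label, 0.0)) for label in labels)


# HYPOTHETICAL decision logs for 100 ambiguous incidents.
baseline = ["auto_close"] * 50 + ["manual_review"] * 30 + ["defer"] * 20
candidate = ["auto_close"] * 70 + ["manual_review"] * 15 + ["defer"] * 15

DRIFT_BUDGET = 0.12  # ASSUMPTION: same value as the edge_drift threshold
drift = total_variation(distribution(baseline), distribution(candidate))
verdict = "CI_BLOCK" if drift > DRIFT_BUDGET else "PASS"
print(f"drift={drift:.2f} budget={DRIFT_BUDGET} -> {verdict}")
```

In this made-up replay, 20% of decisions moved toward auto_close, so the build is blocked even if MTTR and the escalation share look unchanged.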

To see the main trap, three metrics and one guard are sufficient:

flowchart LR
    MTTR[MTTR]
    silent_p0[silent_p0]
    manual_review_rate[manual_review_rate]
    audit_trace_coverage[audit_trace_coverage]
    silent_p0 -->|pulls MTTR down unfairly| MTTR
    manual_review_rate -->|pulls MTTR up fairly| MTTR
    audit_trace_coverage -->|constrains silent_p0| silent_p0

Read as: MTTR can be artificially improved if silent P0 are allowed to close without escalation. The guard audit_trace_coverage forbids closure without a trace, and manual_review_rate maintains the share of manual checks. The full picture with additional metrics (escalation_rate, postmortem_regression) is in Appendix D, along with formal thresholds and relationships.

Bind checks to Qwen logs, decisions, and diff chains — otherwise they cannot be transferred to production without losing context. Minimal trace composition per event: trace_id (chain), prompt_hash (prompt hash), decision (what was chosen), policy_version + diff_id (which version and which change introduced it), and postmortem_label (what the review confirmed). Full set of fields with agent, raw_alert_excerpt, reasoning_delta, and review_outcome belongs to the full track and is collected in [examples/templates/validation.md](examples/templates/validation.md).

These five fields allow answering engineering questions after blocking: which specification version changed behavior, which prompt pushed the model toward auto-closure, which diff introduced the new heuristic. Without this linkage validation.md remains a declaration; with it, it becomes a reproducible audit artifact.
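
A completeness check over these fields might look like the sketch below. It assumes trace events are available as dictionaries and that an empty string counts as missing; the sample events are hypothetical, and the real audit_trace_required expression lives in the example's validation.yaml.

```python
# Minimal trace composition from the text (policy_version + diff_id count as a pair).
REQUIRED_FIELDS = (
    "trace_id", "prompt_hash", "decision",
    "policy_version", "diff_id", "postmortem_label",
)


def is_complete(event: dict) -> bool:
    """An event is traceable only if every required field is present and non-empty."""
    return all(event.get(field) not in (None, "") for field in REQUIRED_FIELDS)


def audit_trace_coverage(events: list[dict]) -> float:
    """Share of events with a complete trace; no events means no evidence."""
    if not events:
        return 0.0
    return sum(is_complete(e) for e in events) / len(events)


# HYPOTHETICAL events: the second one lost its prompt hash.
events = [
    {"trace_id": "t-1", "prompt_hash": "ab12", "decision": "auto_close",
     "policy_version": "v2", "diff_id": "d-7", "postmortem_label": "false_positive"},
    {"trace_id": "t-2", "prompt_hash": "", "decision": "auto_close",
     "policy_version": "v2", "diff_id": "d-7", "postmortem_label": "real_p0"},
]

coverage = audit_trace_coverage(events)
print(f"audit_trace_coverage={coverage:.2f}")
```

Treating an empty replay as zero coverage is a conservative choice: the invariant demands 100%, so a missing trace file should block rather than pass vacuously.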

Design metrics as a network of dependencies, not as a set of independent counters. This is network_consistency: change in one metric must not contradict related ones. Recompute together MTTR, silent_p0, manual_review_rate, escalation_rate, postmortem_regression, rollback_rate, and audit_gap (see diagram above). Local improvement in one value often creates debt in another. Practical criterion — edge consistency: if MTTR drops but manual review simultaneously decreases and the share of late-confirmed P0 rises, mark the system as risky. This turns CI from a "pass/fail KPI" check into a triage behavior stability check.
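
The edge consistency criterion can be expressed as a single predicate over a baseline and a candidate snapshot. The sketch below is an assumption about form only: the key names are hypothetical, and the late-confirmed P0 values reuse the silent_p0 figures from the educational case as a proxy.

```python
def is_risky(baseline: dict[str, float], new: dict[str, float]) -> bool:
    """Flag the pattern from the text: faster MTTR, less review, more late P0."""
    mttr_dropped = new["mttr_s"] < baseline["mttr_s"]
    review_dropped = new["manual_review_rate"] < baseline["manual_review_rate"]
    late_p0_rose = new["late_confirmed_p0_share"] > baseline["late_confirmed_p0_share"]
    return mttr_dropped and review_dropped and late_p0_rose


# MTTR and manual_review_rate come from the educational case.
# ASSUMPTION: late_confirmed_p0_share uses the silent_p0 values as a proxy.
baseline = {"mttr_s": 660, "manual_review_rate": 0.18, "late_confirmed_p0_share": 0.02}
candidate = {"mttr_s": 290, "manual_review_rate": 0.12, "late_confirmed_p0_share": 0.18}

print("risky" if is_risky(baseline, candidate) else "consistent")
```

The predicate looks at the direction of three related edges at once, which is the difference between a network check and three independent threshold checks.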

> [conceptual interface] scripts/metrics/network_recompute.py shows the form of local metric network recomputation; no ready CLI exists in the textbook repository. The runnable analog of the CI gate itself with anti-Goodhart checks is python3 examples/goodhart-validator/scripts/run_validation.py and ci_gate.py (see "Control Fact" above).

#### Updating metric network and validation CI-gate after spec change
python3 scripts/metrics/network_recompute.py \
  --spec specs/incident-spec.md \
  --replay data/replay_*.jsonl \
  --out .artifacts/metric_network.json

python3 scripts/metrics/ci_gate.py \
  --artifact validation.md \
  --metric-network .artifacts/metric_network.json \
  --traces .artifacts/qwen_trace.ndjson

CONTROL: CI_GATE = PASS if (edge_drift <= 0.12 && silent_p0 <= 0.05 && manual_review_rate >= 0.15 && audit_trace_coverage == 1.0) else CI_BLOCK

Full Track: Threshold Calibration

The "Low / Default / High" table for silent_p0, manual_review_rate, edge_drift, audit_trace_coverage, exercise on "dangerous" simultaneous weakening of two protections, and the full metric dependency network — in Appendix D, Section D.4. On the first pass it is sufficient to see that a bad release is blocked by a guard metric.

Examples and Application

Example: a team wants to prove readiness of a new triage pipeline for more aggressive auto-closure. First, a replay run is launched with target optimization MTTR<=5m. Then the same incident set is checked through red_button_mttr_blindness.

If the result looks like MTTR=4:50, silent_p0=18%, manual_review_rate=12%, block the release. The reason is not bad speed but violation of protective invariants. This is an important distinction: the goal is achieved, but the quality contract is broken.

> [conceptual interface] scripts/metrics/simulate.py and validate_red_button.py show the form of emergency mode checking; no ready CLI exists in the textbook repository. The runnable analog of checking the same invariants on educational fixtures is python3 examples/goodhart-validator/scripts/run_validation.py (see examples/goodhart-validator/README.md).

#### Example red button run on replay
python3 scripts/metrics/simulate.py \
  --scenario data/replay_300.jsonl \
  --goal "MTTR<=5m" \
  --spec specs/incident-spec.md

python3 scripts/metrics/validate_red_button.py \
  --artifact validation.md \
  --mode red_button \
  --assert "silent_p0<=5% && manual_review_rate>=15% && audit_trace_coverage==1.0"

CONTROL: red_button = BLOCKED (MTTR=4:50, silent_p0=18%, manual_review_rate=12%)

Second example — erroneous auto-closure of 40 P0 as "false positives." Before post-mortem the metric looks clean: incidents closed quickly, fewer escalations, queue not growing.

After label reconciliation something else emerges. Five events were real critical failures. They must increase silent_p0, escalation_regret, and postmortem_regression.

In validation.md, record such a case as a predictive triage failure. Do not wait for user damage in production. Use replay and post-mortem ground truth as an early blocking signal.
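
The reconciliation in this second example reduces to a small calculation. A minimal sketch, assuming the share is computed over the 40 auto-closed events (the text does not give a wider denominator) and that post-mortem labels are stored per closure; the label strings are hypothetical.

```python
SILENT_P0_CAP = 0.05  # threshold from silent_p0_cap


def silent_share(closures: list[dict]) -> float:
    """Share of auto-closed events that post-mortem later confirmed as real P0."""
    silent = sum(1 for c in closures if c["postmortem_label"] == "real_p0")
    return silent / len(closures)


# 40 auto-closures from the text, 5 of them confirmed as real critical failures.
closures = (
    [{"postmortem_label": "real_p0"}] * 5
    + [{"postmortem_label": "false_positive"}] * 35
)

share = silent_share(closures)
blocked = share > SILENT_P0_CAP
print(f"silent share={share:.3f} cap={SILENT_P0_CAP} blocked={blocked}")
```

Under that assumption the share is 12.5%, well above the 5% cap, so replay plus post-mortem labels would block the release before any user-facing damage.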

In practice, store validation.md next to the specification and update it only through the same review process as triage rules. CI on every change rebuilds the metric network, runs replay, checks trace completeness, and compares behavior to baseline. Treat a threshold change, e.g. raising the allowable silent_p0 from 5% to 7%, as a risk contract change made through the mutable rule from Part 3, not as a technical YAML edit. This barrier protects the system from gradual erosion of invariants under pressure from convenient reports.
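
A CI step could detect such threshold edits by comparing the old and new thresholds and flagging any move in the weaker direction. This is a sketch of the idea only; the flat threshold dictionaries and the `weakened` helper are hypothetical and not part of the example repository.

```python
# Direction in which each threshold becomes MORE permissive.
WEAKER_WHEN = {
    "silent_p0_cap": "higher",         # a cap is weakened by raising it
    "manual_review_floor": "lower",    # a floor is weakened by lowering it
    "audit_trace_required": "lower",
}


def weakened(old: dict[str, float], new: dict[str, float]) -> list[str]:
    """Names of invariants whose threshold moved in the permissive direction."""
    flagged = []
    for name, direction in WEAKER_WHEN.items():
        if direction == "higher" and new[name] > old[name]:
            flagged.append(name)
        elif direction == "lower" and new[name] < old[name]:
            flagged.append(name)
    return flagged


old = {"silent_p0_cap": 0.05, "manual_review_floor": 0.15, "audit_trace_required": 1.0}
new = {**old, "silent_p0_cap": 0.07}  # the 5% -> 7% edit from the text

needs_risk_review = weakened(old, new)
print(needs_risk_review or "no invariant weakened")
```

Tightening a threshold passes silently, while any loosening produces a named list that can be routed to the risk review instead of an ordinary code review.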

Summary

Trap metrics are dangerous not because they are false. They are useful until the moment they become the sole optimization target.

A reliable validation.md solves five tasks:

  • separates goals from invariants;
  • blocks MTTR improvements when silent_p0 rises;
  • requires minimum manual verification;
  • checks triage behavior drift;
  • preserves the evidentiary chain of Qwen logs, decisions, and diffs.

> [runnable] — minimal smoke run for this chapter is in [examples/goodhart-validator/](examples/goodhart-validator/README.md).

cd book2/examples/goodhart-validator
python3 scripts/run_validation.py \
  --validation specs/validation.yaml \
  --metrics fixtures/new_metrics_good.json

python3 scripts/ci_gate.py \
  --validation specs/validation.yaml \
  --baseline fixtures/baseline_metrics.json \
  --new fixtures/new_metrics_good.json

Example with expected failure for the red button scenario uses fixtures/new_metrics_bad.json: run_validation.py and ci_gate.py should exit with code 1 because manual_review_floor and silent_p0_cap are violated.

In the next chapter this protective contour will be connected to real monitoring and deployment APIs.

Artifacts and Readiness Criteria

| Artifact | Ready When |
| --- | --- |
| Local run of book2/examples/goodhart-validator | optimization goal separated from inviolable invariant |
| Three fixtures: good / bad / drift | new_metrics_good.json passes, new_metrics_bad.json blocked with specific reason, new_metrics_drift.json blocked by compare_drift.py |
| One guard metric block example | MTTR improved but release blocked due to silent_p0 or manual_review_floor |
| Record in capstone/goodhart-note.md | three lines: target metric, paired guard metric, blocking condition |

The full track adds validation.md with target metric and quality invariants, .artifacts/metric_network.json or a runnable metric network analog, a replay set, and a CI gate report with edge_drift, silent_p0, and audit_trace_coverage. Consider it ready if emergency mode blocks MTTR<=5m when silent_p0 rises, trace fields link prompt, diff, decision, and post-mortem label, and a threshold change is filed as a risk contract change, not YAML cosmetics.

Practice

  1. cd book2/examples/goodhart-validator && python3 scripts/run_validation.py --validation specs/validation.yaml --metrics fixtures/new_metrics_good.json --json — *expected: code 0, in JSON "status": "PASS", all three invariants manual_review_floor, silent_p0_cap, audit_trace_required have "ok": true.*
  2. python3 scripts/run_validation.py --validation specs/validation.yaml --metrics fixtures/new_metrics_bad.json --json — *expected: code 1, in JSON "status": "CI_BLOCK", check red_button_mttr_blindness triggers with "fail": "CI_BLOCK" and violated_invariants: [manual_review_floor, silent_p0_cap]. Also python3 scripts/compare_drift.py --baseline fixtures/baseline_metrics.json --new fixtures/new_metrics_drift.json --threshold 0.12 returns code 1 and prints edge_drift=0.18 threshold=0.12 -> FAIL.*
  3. Transfer to capstone/goodhart-note.md three lines: target metric (MTTR), guard metric (silent_p0 or audit_trace_coverage), red button rule. *Expected: next merge attempt with improved MTTR but failing audit_trace_coverage gets blocked automatically.*

Check Questions

  1. Why is MTTR dangerous as the sole optimization target?
  2. How does a quality invariant differ from a KPI?
  3. What trace fields are needed to investigate a Goodhart regression?
  4. After release MTTR dropped 30%, but silent_p0 rose from 4% to 12%. What will you do: accept the release, roll back, or demand additional diagnostics? What exactly will you check?

Production SDD for Qwen Code CLI. Part 2