Reading: Applied Part 7. Specification CI: specification as an executable artifact

Applied part 7. Specification CI: the specification as an executable artifact

Status: Recommendation. Running specification checks in CI is a stable practice. The specific set of gates (coverage, scope, schema, spec_gate) and the JSON diagnostic format are a recommended framework that most teams adapt. JSON Schema validation of fixtures from validation.md is a standard use of the tool.

In this chapter, a "specification gate" is a short name for the loop that checks the specification in the same way that regular CI checks code. The check itself consists of ordinary steps: parsing requirements.md and plan.md, validating examples against JSON Schema, and verifying cross-file consistency. All other terms (gate, fixture, schema-check, coverage-check) are introduced below, in the places where they actually appear in a command or script; there is no need to condense them into a single opening line.

The specification gate is the automation of exactly the procedure that part 9 of the first volume performed with the human eye, and that part 16 performed as a team in a pull request. In the training AgentClinic, the REQ identifiers and review payload schemas from part 12 could still remain a convention. In production, that reserve of trust does not exist. The same links must be turned into a mandatory gate that green unit tests cannot bypass.

Before reading

Anchor from the first volume: part 9 links validation.md to facts, part 16 shows review of the evidence package.
Local training case: incident payload, because coverage and JSON Schema can be checked locally without CI.
Trace for capstone/: a single Spec CI line for high_memory_usage: the command, the proven fact, and a negative example.
Main term of the first pass: Spec CI. The terms gate, fixture, schema-check, and coverage-check are reference terms that appear directly in the commands and script comments.
What to defer: GitHub Actions workflow, scope-gate, and extraction of fixtures from an arbitrary validation.md.

Goal

In this chapter, the specification gate turns from the idea of "checking documents" into a working GitHub Actions loop for the incident project. Every push and every pull request passes through a mandatory gate. The gate blocks merging on three classes of violations:

unmet requirements,
out-of-scope actions,
JSON Schema errors.

The reader will get a practical repository scheme in which requirements.md, plan.md, validation.md, and API contracts are checked as executable artifacts, not as reference documentation.

The main win is that the team gets a reproducible blocking mechanism in CI. The argument about specification quality is reduced to a specific line, rule, and action to fix it.

Minimal training scenario

Training case

incident payload: verify that requirements.md is linked to plan.md, and that the JSON fixtures (input examples that we extract from validation.md) contain the required incident_id. The goal is to see Spec CI as a small local gate (a mandatory check without which merging is blocked), not as a large GitHub Actions process.

Preparation

book2/examples/spec-ci/requirements.md.
book2/examples/spec-ci/plan.md.
book2/examples/spec-ci/schemas/incident_payload.schema.json.
book2/examples/spec-ci/fixtures/valid-incident.json.
book2/examples/spec-ci/fixtures/invalid-missing-incident-id.json.
The scripts scripts/check_coverage.py and scripts/validate_schema.py.

Steps

cd book2/examples/spec-ci. Expectation: you are in the runnable example directory.
python3 scripts/check_coverage.py --requirements requirements.md --plan plan.md. Expectation: return code 0; every REQ-* has a link to the plan.
python3 scripts/validate_schema.py --schema schemas/incident_payload.schema.json --fixtures fixtures. Expectation: the valid fixture passes, the negative one fails predictably.
Open the error message for the negative fixture. Expectation: it is clear which field is missing and which file to fix.
Record in validation.md what exactly the gate blocks: coverage, scope, or schema.

Checkable fact

A single local run shows two kinds of truth: a requirement is linked to the plan, and data conforms to the contract. If a CI error does not point to a file, a rule, and an action, it is not ready for the team.

How it gets into `capstone/`

Transfer to capstone/validation.md a single Spec CI line: which command was run, what it proved, which negative example was blocked. Do not transfer the full GitHub Actions workflow if it has not been created; for the training minimum, the runnable analogue from examples/spec-ci is enough.

Minimal fragment:

| Spec CI | `python3 scripts/check_coverage.py ...` | all REQ-* are linked to plan | PASS |
| Schema negative | `python3 scripts/validate_schema.py ...` | missing incident_id blocked | PASS |

Transfer to `high_memory_usage`

The training example works on incident payload, but in capstone/ you need a line for high_memory_usage. Apply the same two classes of checks to your own requirements:

| What we check | Command (training) | What it proves for `high_memory_usage` |
|---|---|---|
| Coverage | `check_coverage.py --requirements requirements.md --plan plan.md` | requirement REQ-HM-01 "do not restart a pod without confirmed RSS > 90% for 5 minutes" is linked to a task in `plan.md` |
| Schema negative | `validate_schema.py --schema schemas/incident_payload.schema.json --fixtures fixtures` | a fixture without `incident_id` or with `severity: "P0"` without `backup_verified` is blocked |

If you cannot write Coverage and Schema negative lines for high_memory_usage, it means requirements.md does not yet contain a checkable requirement, or the schema does not yet distinguish P0 without backup.
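The two negative cases from the table can be sketched in plain Python. This is not the schema from examples/spec-ci, only a hand-written stand-in for the same rule; the function name and the sample payloads are assumptions made for illustration. In an actual JSON Schema the same intent would usually be expressed with `required` plus a conditional (`if`/`then`) on `severity`.

```python
def payload_errors(payload: dict) -> list[str]:
    """Return human-readable violations; an empty list means the payload is accepted."""
    errors = []
    if not payload.get("incident_id"):
        errors.append("missing required property incident_id")
    # Conditional rule: a P0 incident must carry backup_verified set to true.
    if payload.get("severity") == "P0" and payload.get("backup_verified") is not True:
        errors.append("severity P0 requires backup_verified: true")
    return errors


# Illustrative payloads (hypothetical values).
samples = {
    "valid": {"incident_id": "HM-2026-05-17-01", "severity": "P0", "backup_verified": True},
    "no_id": {"severity": "P2"},
    "p0_no_backup": {"incident_id": "HM-2026-05-17-02", "severity": "P0"},
}

for name, payload in samples.items():
    print(name, payload_errors(payload) or "ok")
```

If the schema accepts the last two payloads, the Schema negative line for high_memory_usage cannot yet be written.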

Reviewable trace

In the training package, keep changes to requirements.md, plan.md, validation.md, or the schema. Temporary fixtures created only for local diagnostics are not needed unless they have become part of the regression set.

Key ideas

Here, "executable artifact" does not mean running Markdown as a program. It is about checking requirements, plan, and examples with ordinary CI scripts.

Run the mandatory GitHub Actions gate on both pull_request and push to the protected branch. Both triggers are needed because a specification violation can enter in two ways: through an ordinary pull request, or through a direct update of the tracked files on that branch.

The minimum set of tracked artifacts:

requirements.md,
plan.md,
validation.md,
contracts/**,
constitution.md (if needed, when it records domain constraints of the incident pipeline).

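In branch protection settings, mark the final spec_gate job as the required status check. Otherwise, a pull request with green unit tests can merge without passing the semantic check. Keep in mind that the workflow below is path-filtered: when GitHub skips a workflow because of a path filter, a required check from that workflow stays pending, so a pull request that touches none of the tracked files remains blocked unless the filter is dropped or another workflow reports the same check. This scheme matches the SDD approach, in which requirements, plan, and tasks become checkable layers, not static text (GitHub Spec Kit).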

> [project script] — .github/workflows/spec-ci.yml calls the project scripts scripts/spec_ci/*.py.

name: spec-ci

on:
  pull_request:
    paths:
      - 'requirements.md'
      - 'plan.md'
      - 'validation.md'
      - 'contracts/**'
      - 'constitution.md'
  push:
    branches: [main]
    paths:
      - 'requirements.md'
      - 'plan.md'
      - 'validation.md'
      - 'contracts/**'
      - 'constitution.md'

jobs:
  coverage:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: python3 scripts/spec_ci/check_coverage.py --requirements requirements.md --plan plan.md --out out/spec-ci/coverage-report.json

  scope:
    runs-on: ubuntu-latest
    needs: [coverage]
    steps:
      - uses: actions/checkout@v4
      - run: python3 scripts/spec_ci/check_scope.py --domain models/incident-response.yaml --plan plan.md --contracts contracts/api.md --out out/spec-ci/scope-violations.ndjson

  schema:
    runs-on: ubuntu-latest
    needs: [coverage, scope]
    steps:
      - uses: actions/checkout@v4
      - run: python3 scripts/spec_ci/extract_fixtures.py --from validation.md --out out/spec-ci/fixtures
      - run: python3 scripts/spec_ci/validate_schema.py --schemas schemas --fixtures out/spec-ci/fixtures --out out/spec-ci/schema-audit.json

  spec_gate:
    runs-on: ubuntu-latest
    needs: [coverage, scope, schema]
    steps:
      - run: echo "Specification gate passed"

Coverage check starts from the requirements → plan graph, not from matching-word search. Let us introduce tracing rules:

every user story from requirements.md gets a stable identifier REQ-*;
every task or step in plan.md must reference one or more such identifiers through a field like implements: [REQ-014].

What counts as an error. If a story has no traceable task, the check exits with a failure: the promise to the user is lost before implementation even starts. The reverse violation matters too. A task without implements is a rogue task: the plan has started adding functionality that no requirement backs.

> [project script] — scripts/spec_ci/check_coverage.py; the runnable analogue is [examples/spec-ci/scripts/check_coverage.py](examples/spec-ci/).

python3 scripts/spec_ci/check_coverage.py \
  --requirements requirements.md \
  --plan plan.md \
  --out out/spec-ci/coverage-report.json
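A minimal sketch of the graph check behind this command, assuming that requirements carry `REQ-*` identifiers in the text and that each plan item is a single line starting with a `TASK-*` identifier and optionally containing `implements: [...]`. The line format, the function name, and the sample texts are assumptions; the actual check_coverage.py may parse the files differently.

```python
import re

REQ_ID = re.compile(r"\bREQ-[A-Z0-9]+(?:-[A-Z0-9]+)*\b")
TASK_LINE = re.compile(r"^\s*[-*]?\s*(TASK-\d+)\b(.*)$")
IMPLEMENTS = re.compile(r"implements:\s*\[([^\]]*)\]")


def check_coverage(requirements_text: str, plan_text: str) -> dict:
    """Build the requirements -> plan graph and report both kinds of violation."""
    required = set(REQ_ID.findall(requirements_text))
    covered, rogue = set(), []
    for line in plan_text.splitlines():
        task = TASK_LINE.match(line)
        if not task:
            continue
        refs = IMPLEMENTS.search(task.group(2))
        ids = set(REQ_ID.findall(refs.group(1))) if refs else set()
        if not ids:
            rogue.append(task.group(1))  # task that implements nothing
        covered |= ids
    return {
        "orphan_requirements": sorted(required - covered),
        "rogue_tasks": rogue,
        "unknown_references": sorted(covered - required),
    }


# Illustrative input (hypothetical stories and tasks).
requirements = """
- REQ-014: as an on-call engineer, I want to receive escalation confirmation
- REQ-021: as an on-call engineer, I want to annotate an incident
"""
plan = """
- TASK-033 send escalation confirmation implements: [REQ-014]
- TASK-034 add finance notification
"""
print(check_coverage(requirements, plan))
```

The check matches identifiers, not words, so renaming a story does not break the link, while deleting `implements` from a task is reported immediately.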

The out-of-scope detector is needed for cases where formal tracing exists, but the step's content goes beyond the incident domain. Match actions from plan.md against the domain model incident-response. Allowed operations, for example:

acknowledge,
escalate,
annotate,
rollback,
notify_on_call.

Arbitrary business actions (such as notify_finance, close_customer_contract, or force_resolve_without_operator) are not included here.

Take into account not only the verb. Also check:

the actor,
the endpoint,
the trigger condition.

Why all three matter: resolve incident can be allowed for a human on-call operator and forbidden for an autonomous agent. The practical rule is simple: if a step cannot be explained through the incident model and the allowed API contract, it blocks the pull request.

> [project script] — scripts/spec_ci/check_scope.py. There is no ready-made analogue; implement it yourself on top of the domain model and the API contract.

python3 scripts/spec_ci/check_scope.py \
  --domain models/incident-response.yaml \
  --plan plan.md \
  --contracts contracts/api.md \
  --out out/spec-ci/scope-violations.ndjson
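Since there is no ready-made analogue, here is one possible shape of the core check, as a sketch only. It assumes the plan steps have already been parsed into dictionaries with `line`, `operation`, and `actor`; the allowlist below, including which actor may perform which operation, is a hypothetical stand-in for models/incident-response.yaml. Endpoint and trigger-condition checks are left out.

```python
import json

# Hypothetical domain model: operation -> actors allowed to perform it.
ALLOWED = {
    "acknowledge": {"agent", "on_call"},
    "escalate": {"agent", "on_call"},
    "annotate": {"agent", "on_call"},
    "rollback": {"on_call"},
    "notify_on_call": {"agent"},
    "resolve": {"on_call"},
}


def scope_violations(steps: list[dict]) -> list[dict]:
    """Return one record per plan step that the domain model does not permit."""
    violations = []
    for step in steps:
        actors = ALLOWED.get(step["operation"])
        if actors is None:
            reason = f"operation {step['operation']} is outside the incident-response domain model"
        elif step["actor"] not in actors:
            reason = f"actor {step['actor']} may not perform {step['operation']}"
        else:
            continue
        violations.append({"file": "plan.md", "line": step["line"], "reason": reason})
    return violations


# Illustrative parsed steps (hypothetical line numbers).
steps = [
    {"line": 48, "operation": "force_resolve", "actor": "agent"},
    {"line": 52, "operation": "resolve", "actor": "agent"},
    {"line": 55, "operation": "escalate", "actor": "agent"},
]
for record in scope_violations(steps):
    print(json.dumps(record))  # one JSON object per line, as in an .ndjson report
```

The second step shows why the verb alone is not enough: `resolve` is a known operation, but not for this actor.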

JSON Schema checks close the layer of fixtures and payload examples, where silent integration regressions often appear. What to do:

extract all JSON blocks from validation.md;
convert them into separate fixtures;
validate against schemas from schemas/** — for example, incident_payload.schema.json, pagerduty_webhook.schema.json, or grafana_alert.schema.json.
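The first two steps of this list can be sketched as follows, assuming that validation.md marks its examples as fenced blocks with the `json` language tag. That convention, the fixture naming, and the sample document are assumptions; a project with a different validation.md format needs a different extractor.

```python
import json
import re

FENCE = "`" * 3  # built at runtime so the sample below does not close this code block
JSON_BLOCK = re.compile(FENCE + r"json[^\n]*\n(.*?)\n" + FENCE, re.DOTALL)


def extract_fixtures(markdown: str) -> list[dict]:
    """Turn every fenced json block into a fixture that remembers its source line."""
    fixtures = []
    for index, match in enumerate(JSON_BLOCK.finditer(markdown), start=1):
        line = markdown.count("\n", 0, match.start(1)) + 1  # 1-based line of the payload
        fixtures.append({
            "name": f"fixture-{index:03d}.json",
            "line": line,
            "payload": json.loads(match.group(1)),  # raises on malformed JSON
        })
    return fixtures


# Illustrative validation.md content (hypothetical).
sample = "\n".join([
    "## Escalation webhook",
    FENCE + "json",
    '{"incident_id": "INC-1", "severity": "P1", "source": "grafana"}',
    FENCE,
    "",
    "Negative example:",
    FENCE + "json",
    '{"event_code": "E42", "severity": "P1", "source": "grafana"}',
    FENCE,
])
for fixture in extract_fixtures(sample):
    print(fixture["name"], fixture["line"], sorted(fixture["payload"]))
```

Keeping the source line with each fixture is what later allows a schema error to point at validation.md rather than at a generated file.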

Watch two directions. Valid examples must pass without errors. Deliberately negative examples must fail predictably. If a negative payload passes, the schema is too soft and does not protect the contract.

Before merging to a protected branch, this is checked as strictly as application tests. An incorrect incident_id, an invalid severity, or an empty source can break the entire remediation loop.
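The two directions can be expressed as a small audit loop. The sketch below assumes negative fixtures carry the `_expected_invalid: true` marker used in the runnable example; `validate_incident` is a hypothetical stand-in for a real JSON Schema validator and only checks presence of three fields and a non-empty `source`.

```python
def validate_incident(payload: dict) -> list[str]:
    """Hypothetical stand-in for a schema validator: returns a list of violations."""
    errors = [f"missing required property {key}"
              for key in ("incident_id", "severity", "source") if key not in payload]
    if payload.get("source") == "":
        errors.append("source must not be empty")
    return errors


def audit(fixtures: dict, validate) -> list[str]:
    """Fail on valid fixtures that are rejected and on negative fixtures that pass."""
    failures = []
    for name in sorted(fixtures):
        payload = dict(fixtures[name])
        expected_invalid = payload.pop("_expected_invalid", False)
        errors = validate(payload)
        if expected_invalid and not errors:
            failures.append(f"{name}: negative fixture passed, the schema is too soft")
        elif errors and not expected_invalid:
            failures.append(f"{name}: {errors[0]}")
    return failures


fixtures = {
    "valid-incident.json": {"incident_id": "INC-1", "severity": "P1", "source": "grafana"},
    "invalid-missing-incident-id.json": {"_expected_invalid": True, "severity": "P1", "source": "grafana"},
}
print(audit(fixtures, validate_incident))   # strict validator
print(audit(fixtures, lambda payload: []))  # validator that accepts everything
```

With the strict validator the audit is empty; with the permissive one the negative fixture is reported, which is exactly the "schema is too soft" signal described above.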

> [project script] — scripts/spec_ci/extract_fixtures.py and scripts/spec_ci/validate_schema.py; the runnable analogue of the schema check is [examples/spec-ci/scripts/validate_schema.py](examples/spec-ci/). Implement the extraction step yourself for the validation.md format of your project.

python3 scripts/spec_ci/extract_fixtures.py \
  --from validation.md \
  --out out/spec-ci/fixtures

python3 scripts/spec_ci/validate_schema.py \
  --schemas schemas \
  --fixtures out/spec-ci/fixtures \
  --out out/spec-ci/schema-audit.json

Design the diagnostic format of CI rejections to be quick to fix, not something to investigate through process logs.

Bad:

> Coverage failed: missing REQ

Problem: the reviewer does not know which requirement was orphaned or where to look. The error cannot be fixed without a separate investigation.

Good:

> requirements.md:42: REQ-014 has no reference in plan.md. Add a task with implements: REQ-014 to plan.md or remove the requirement.

In every error, indicate four elements:

a clear reason,
a file and line reference,
the identifier of the violated rule,
a concrete action for the specification team.

Error types look like this. For coverage, it can be REQ-021 has no implementing plan item; add implements: [REQ-021] to plan.md or remove the requirement. For scope — plan.md:48 uses force_resolve without domain permission. For schema — validation.md:72 missing required property incident_id.

This format reduces reviewer load. The reviewer checks the meaning of the change instead of reconstructing what exactly CI rejected.

{
  "status": "failed",
  "check": "scope",
  "file": "plan.md",
  "line": 48,
  "rule": "IR-SCOPE-007",
  "reason": "Autonomous force resolve is outside the incident-response domain model",
  "action": "Replace with POST /incidents/{id}/ack or add an approved requirement and domain rule"
}
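A record like the one above can be enforced mechanically before it reaches the reviewer. The sketch below assumes the field names of that JSON example; the function name and the one-line output layout are illustrative choices, not a fixed format.

```python
REQUIRED_KEYS = ("file", "line", "rule", "reason", "action")


def format_diagnostic(record: dict) -> str:
    """Render a diagnostic as one line, refusing records that are not actionable."""
    missing = [key for key in REQUIRED_KEYS if not record.get(key)]
    if missing:
        raise ValueError(f"diagnostic is not actionable, missing: {', '.join(missing)}")
    return (f"{record['file']}:{record['line']}: [{record['rule']}] "
            f"{record['reason']}. {record['action']}.")


record = {
    "status": "failed",
    "check": "scope",
    "file": "plan.md",
    "line": 48,
    "rule": "IR-SCOPE-007",
    "reason": "Autonomous force resolve is outside the incident-response domain model",
    "action": "Replace with POST /incidents/{id}/ack or add an approved requirement and domain rule",
}
print(format_diagnostic(record))

try:
    format_diagnostic({"status": "failed", "reason": "Coverage failed: missing REQ"})
except ValueError as error:
    print(error)
```

The "bad" message from the beginning of this section fails this check by construction: it has a reason, but no file, line, rule, or action.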

Examples and application

flowchart LR
A[pre-commit hook]
B[local quick run and light duel before push]
C[PR push]
D[extract changed files]
E[check_coverage requirements plan tasks graph]
F[check_scope domain model and contracts/api]
G[check_schema validation and counterexamples]
H[gate report and PR status]
A --> B --> C --> D --> E --> F --> G --> H

A typical pull request in the training incident repository changes three files:

requirements.md,
plan.md,
validation.md.

The author describes the story REQ-014: as an on-call engineer, I want to receive escalation confirmation. Then in the plan they add task TASK-033 with implements: [REQ-014]. And in validation.md they put an example webhook payload with the fields incident_id, severity, source, and escalation_target.

What gets checked. The coverage check passes if the link REQ-014 → TASK-033 exists. The scope check passes if the action matches the domain model. The schema check passes if the payload matches the contract. If any of the three layers breaks, spec_gate returns a red status and GitHub does not allow merging.

A telling failure: the author tries to "speed up" handling and adds the step POST /pagerduty/force-resolve to plan.md without a separate requirement and without permission in the domain model. Coverage can stay green if the step is formally tied to an existing story. But the scope check will block the pull request: autonomous incident closure without on-call confirmation is not among the agreed operations.

If the same pull request adds a payload with event_code instead of the required incident_id to validation.md, the schema check produces an independent blocker. The team gets two different classes of errors:

semantic out-of-scope,
data-structure violation.

A local quick run before push saves time and makes the specification gate a habitual part of the working cycle. In pre-commit, check only the changed files. Leave the full run to GitHub Actions, so the developer is not slowed down by a long check of all fixtures.

For the incident project, a command is enough that does three things:

builds the coverage graph,
checks scope by diff,
validates the affected JSON blocks.

If the local report already shows orphan requirement, rogue task, or schema mismatch, the author fixes the specification before creating the pull request, not after getting a red status in the remote CI.
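One way to keep the pre-commit run short is to decide from the list of changed files which checks are worth running. The mapping below is an assumption made for illustration (the chapter does not fix which file triggers which check), and the list of changed files is expected to come from the hook itself.

```python
from fnmatch import fnmatch

# Hypothetical mapping: check name -> file patterns that make it worth running.
TRIGGERS = {
    "coverage": ["requirements.md", "plan.md"],
    "scope": ["plan.md", "contracts/*", "models/*"],
    "schema": ["validation.md", "schemas/*"],
}


def select_checks(changed_files: list[str]) -> list[str]:
    """Return the checks affected by the changed files, in a stable order."""
    return [
        check
        for check, patterns in TRIGGERS.items()
        if any(fnmatch(path, pattern) for path in changed_files for pattern in patterns)
    ]


print(select_checks(["plan.md"]))
print(select_checks(["validation.md", "README.md"]))
print(select_checks(["README.md"]))
```

The full set of checks still runs in GitHub Actions; this selection only shortens the local loop.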

> [project script] — example local wrapper for scripts/spec_ci/*.py.

#!/usr/bin/env bash
set -euo pipefail

python3 scripts/spec_ci/check_coverage.py \
  --requirements requirements.md \
  --plan plan.md \
  --out out/spec-ci/coverage-report.json

python3 scripts/spec_ci/check_scope.py \
  --domain models/incident-response.yaml \
  --plan plan.md \
  --contracts contracts/api.md \
  --out out/spec-ci/scope-violations.ndjson

python3 scripts/spec_ci/extract_fixtures.py \
  --from validation.md \
  --out out/spec-ci/fixtures

python3 scripts/spec_ci/validate_schema.py \
  --schemas schemas \
  --fixtures out/spec-ci/fixtures \
  --out out/spec-ci/schema-audit.json

Summary

The specification gate makes the specification an executable arbiter of the repository. GitHub Actions blocks the pull request on three classes of violations:

uncovered user stories,
extraneous scenarios in the plan,
JSON Schema errors in validation examples.

For the team, this changes the nature of review. Instead of a subjective argument about requirement completeness, there is a diagnostic report with a file, line, rule, and action.

In incident automation, this rigor is especially important. An incorrect scope or a weak payload contract can lead to three consequences:

false escalations,
dangerous auto-operations,
loss of trust in remediation.

Next, this loop becomes the basis for file-based arbitration of disputed changes.

The minimum runnable set for this chapter is in examples/spec-ci/. Walk through it before implementing the full GitHub Actions process. First, achieve a green local gate. Then transfer the same commands to CI.

> [runnable] — runnable example: examples/spec-ci/scripts/check_coverage.py and examples/spec-ci/scripts/validate_schema.py.

cd book2/examples/spec-ci
python3 scripts/check_coverage.py --requirements requirements.md --plan plan.md
python3 scripts/validate_schema.py --schema schemas/incident_payload.schema.json --fixtures fixtures

AoT gate for agent plans

Spec CI checks not only textual requirements. If an agent builds a plan as a set of atomic actions, that plan also becomes a checkable artifact. The format can be simple: atoms[], where each atom has id, kind, name, input, and dependsOn.

Minimal rule: an LLM is allowed to propose atoms, but is not allowed to execute them before checking. The local gate must reject:

an unknown tool name;
missing required parameters;
a reference to the result of a non-existent atom;
a dependency cycle;
an action outside the incident-response domain model.

For high_memory_usage, it looks like this:

{
  "atoms": [
    {"id": 1, "kind": "tool", "name": "read_metrics", "input": {"window": "10m"}, "dependsOn": []},
    {"id": 2, "kind": "tool", "name": "check_readiness", "input": {"incident_id": "HM-2026-05-17-01"}, "dependsOn": [1]},
    {"id": 3, "kind": "tool", "name": "dry_run", "input": {"action": "restart_pod"}, "dependsOn": [2]}
  ]
}

Such a plan cannot be accepted on the persuasiveness of its text. It must be validated as a graph: a topological sort, an allowlist of name values, a JSON Schema for the input of each tool, and a separate blast-radius check. This is the same principle as for requirements.md → plan.md: a specification is useful only when it can be rejected mechanically.
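A minimal sketch of the structural part of this gate, using `graphlib` from the Python standard library (3.9+). The tool allowlist with its required parameters is an assumption derived from the example plan; a per-tool JSON Schema, the domain-model check, and the blast-radius check are not shown.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical allowlist: tool name -> required input parameters.
ALLOWED_TOOLS = {
    "read_metrics": {"window"},
    "check_readiness": {"incident_id"},
    "dry_run": {"action"},
}


def validate_plan(plan: dict) -> list[str]:
    """Return the reasons to reject the plan; an empty list means it may proceed."""
    atoms = {atom["id"]: atom for atom in plan["atoms"]}
    errors = []
    for atom in plan["atoms"]:
        required = ALLOWED_TOOLS.get(atom["name"])
        if required is None:
            errors.append(f"atom {atom['id']}: unknown tool {atom['name']}")
        else:
            for param in sorted(required - set(atom["input"])):
                errors.append(f"atom {atom['id']}: missing required parameter {param}")
        for dep in atom["dependsOn"]:
            if dep not in atoms:
                errors.append(f"atom {atom['id']}: depends on non-existent atom {dep}")
    # Topological sort over existing atoms only; a cycle raises CycleError.
    graph = {a["id"]: [d for d in a["dependsOn"] if d in atoms] for a in plan["atoms"]}
    try:
        tuple(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        errors.append(f"dependency cycle: {exc.args[1]}")
    return errors


plan = {"atoms": [
    {"id": 1, "kind": "tool", "name": "read_metrics", "input": {"window": "10m"}, "dependsOn": []},
    {"id": 2, "kind": "tool", "name": "check_readiness", "input": {"incident_id": "HM-2026-05-17-01"}, "dependsOn": [1]},
    {"id": 3, "kind": "tool", "name": "dry_run", "input": {"action": "restart_pod"}, "dependsOn": [2]},
]}
print(validate_plan(plan))
```

The function only returns reasons for rejection; the decision to execute anything stays outside it, which keeps the rule "propose, but do not execute before checking" easy to enforce.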

Artifacts and readiness criteria

Artifact	Ready when
Local run of `book2/examples/spec-ci`	smoke-pass without external dependencies
Coverage check `requirements → plan`	every `REQ-*` has an implementing task, every task has `implements`
JSON Schema check	valid fixture passes, negative one fails predictably
Atomic plan check	unknown tools, cycles, and actions outside the domain are blocked before execution
Entry in `validation.md`	the gate error message indicates the file, the rule, and the fix action

The full track adds .github/workflows/spec-ci.yml or its project analogue, out/spec-ci/coverage-report.json for the requirements → plan graph, out/spec-ci/scope-violations.ndjson with domain model violations, out/spec-ci/schema-audit.json for fixtures from validation.md, and a local wrapper for the quick run. Consider it ready when the scope check blocks autonomous actions outside the incident-response model, the spec_gate task is required in the protected branch, and the CI diagnostic format indicates a file, line, rule identifier, and action.

Practice

cd book2/examples/spec-ci && python3 scripts/check_coverage.py --requirements requirements.md --plan plan.md. Expectation: return code 0; stdout is the single line coverage ok: 3 requirements covered.
python3 scripts/validate_schema.py --schema schemas/incident_payload.schema.json --fixtures fixtures. Expectation: return code 0; stdout contains valid-incident.json: valid and invalid-missing-incident-id.json: expected invalid, rejected: missing required property incident_id (the negative fixture is marked _expected_invalid: true and is therefore counted as successfully rejected).

Transfer to capstone/validation.md a single Spec CI line: "coverage ok: 3/3, schema ok: 2/2 (negative rejected for missing required property incident_id)". Expectation: on the next regression, the line lets you reconstruct what exactly blocks the merge without reading CI logs.

Check questions

Why is coverage by words weaker than the requirements → plan graph?
Which violations should the scope check catch, and which should it not?
What makes a CI error fixable without investigation?
The specification gate blocks a merge because of a REQ-ID mismatch. The developer wants to add a REQ-ID to an existing plan item and merge the pull request. What is dangerous about this approach?