Study guide: Applied Part 5. Mutation Testing of Specifications

Lesson 3 of 5 in module «Applied Part 5. Mutation Testing of Specifications»


Topic: Applied Part 5. Mutation Testing of Specifications

Difficulty level: Medium

Estimated study time: 3-4 hours

Prerequisites: Familiarity with the feature validation discipline from part 9 of the first volume (Given/When/Then format)

Understanding of SDD antipatterns from part 20

Basic skills working with the Linux command line and Python 3

Understanding of JSON structure and the principles of JSON Schema

Learning objectives: Run the generator of degenerate specifications and confirm that generation is deterministic for a fixed seed.

Measure the resilience of the validator loop using the vector immunity metric (strict_reject_rate, depth_of_diagnostics, recovery_time).

Link each generated mutant with a specific step of the Given/When/Then scenario and a JSON Schema rule.

Produce a correct and reproducible reviewable trail (smoke test report) for CI and SDD.

Overview: Mutation testing of specifications (or "vaccination of validators") is a practice for checking the resilience of a validator. Instead of checking the system only against correct data, the technique deliberately corrupts the baseline specification using predefined mutation operators (for example, blanking fields or creating cycles). The validator must catch each defect at a strictly defined step and emit a precise error code. This turns the validator from a simple syntax check into an instrument of "anatomical diagnostics" and reduces the risk that a malformed incident specification causes cascading failures in production.

Key concepts: Mutation testing: A testing technique in which a reference artifact is controllably modified (corrupted), and the test loop is required to detect this defect and reject it.

Immunity score: A vector metric of validator resilience, consisting of three components: strict_reject_rate (share of strictly rejected cases), depth_of_diagnostics (useful diagnostic depth), and recovery_time (time to return a stable verdict).
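The three components can be aggregated from per-mutant results roughly as follows. This is a minimal sketch, not the logic of scripts/immunity_score.py: the ValidatorResult fields, the choice of the shallowest trail as depth_of_diagnostics, and the nearest-rank percentile are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidatorResult:
    mutation_id: str
    rejected: bool        # the validator refused the mutant
    code_matches: bool    # emitted error code equals the expected one
    trail_depth: int      # hops recorded in the diagnostic trail
    verdict_ms: float     # time until a stable verdict was returned


def p95(values):
    """Nearest-rank 95th percentile of a non-empty sequence."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def immunity_vector(results):
    """Aggregate per-mutant results into the three-component vector."""
    strict = sum(1 for r in results if r.rejected and r.code_matches)
    return {
        "strict_reject_rate": strict / len(results),
        # Assumption: depth is reported as the worst (shallowest) trail.
        "depth_of_diagnostics": min(r.trail_depth for r in results),
        "recovery_time_p95_ms": p95([r.verdict_ms for r in results]),
    }
```

A rejection with the wrong error code does not count toward strict_reject_rate in this sketch, which is one possible reading of "strictly rejected".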

Mutation factory: A deterministic generator that takes a correct baseline specification (base_spec.json) and applies a catalog of corruption operators to it, preserving reproducibility through a generation seed.

Mutation operators: Types of controlled defects. Main classes: Nullify (zeroing out fields), FutureTime (temporal anomalies), EscalationCycle (cycles in escalation routes), PriorityContradiction (priority contradictions).
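To make the operator classes concrete, here is a small sketch of a seeded factory with one function per operator. The field names in BASE_SPEC, the way each operator corrupts the data, and the mutation_id scheme are hypothetical; the real base_spec.json and scripts/mutate_specs.py may differ.

```python
import copy
import hashlib
import json
import random

# Hypothetical baseline; the real base_spec.json may use other field names.
BASE_SPEC = {
    "incident": "appointment_latency_spike",
    "severity": "P1",
    "priority": "high",
    "received_at": "2026-05-17T10:00:00Z",
    "sla_minutes": 10,
    "escalation": [["appointments_oncall", "sre_lead"]],
}


def nullify(spec, rng):
    """Blank one string field, chosen by the seeded generator."""
    field = rng.choice(sorted(k for k, v in spec.items() if isinstance(v, str)))
    old, spec[field] = spec[field], ""
    return f"{field}: {old!r} -> ''"


def future_time(spec, rng):
    """Move the receipt timestamp far into the future."""
    spec["received_at"] = "2999-01-01T00:00:00Z"
    return "received_at -> 2999-01-01T00:00:00Z"


def escalation_cycle(spec, rng):
    """Append a reverse edge so the route loops back to its first owner."""
    first, last = spec["escalation"][0][0], spec["escalation"][-1][1]
    spec["escalation"].append([last, first])
    return f"escalation += [{last} -> {first}]"


def priority_contradiction(spec, rng):
    """Pair the existing severity with a priority that contradicts it."""
    spec["priority"] = "low"
    return "priority: 'high' -> 'low'"


OPERATORS = {
    "Nullify": nullify,
    "FutureTime": future_time,
    "EscalationCycle": escalation_cycle,
    "PriorityContradiction": priority_contradiction,
}


def generate(base, seed, operator_names):
    """Return a manifest with one mutant per requested operator."""
    rng = random.Random(seed)  # local generator, no shared global state
    manifest = []
    for name in operator_names:
        mutant = copy.deepcopy(base)  # every mutant starts from a clean base
        diff = OPERATORS[name](mutant, rng)
        body = json.dumps(mutant, sort_keys=True)
        digest = hashlib.sha256(f"{seed}:{name}:{body}".encode("utf-8")).hexdigest()
        manifest.append({"mutation_id": digest[:12], "operator": name,
                         "diff": diff, "spec": mutant})
    return manifest
```

Using a local random.Random(seed) instead of the module-level generator keeps the output independent of any other code that draws random numbers in the same process.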

The "one mutant — one expected failure" principle: The main rule of the mutation factory. One mutator must introduce only one defect, so that when a test fails the cause can be precisely localized and tied to a specific rule.
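One way to enforce this principle mechanically is to count the paths at which a mutant differs from the baseline. The helper names below are hypothetical, and the sketch treats a list as a single value, so a whole changed list counts as one defect.

```python
def changed_paths(base, mutant, prefix=""):
    """List the paths at which two JSON-like values differ."""
    if isinstance(base, dict) and isinstance(mutant, dict):
        paths = []
        for key in sorted(set(base) | set(mutant)):
            if key not in base or key not in mutant:
                paths.append(f"{prefix}/{key}")  # added or removed key
            else:
                paths.extend(changed_paths(base[key], mutant[key], f"{prefix}/{key}"))
        return paths
    # Scalars and lists are compared as whole values.
    return [] if base == mutant else [prefix or "/"]


def is_single_defect(base, mutant):
    """True when the mutant differs from the base at exactly one path."""
    return len(changed_paths(base, mutant)) == 1
```

A generator could run is_single_defect on every mutant before writing it out and refuse to emit mutants that carry zero or several defects.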

Practice exercises: Name: Running mutation generation and verifying determinism

Problem: You need to navigate to the example directory and run the mutation generation script using the Nullify, FutureTime, EscalationCycle, PriorityContradiction operators and a fixed seed of 20260517. You must confirm that the generator is deterministic.

Solution: 1. Open a terminal and execute: cd book2/examples/stress-mutator
2. Run the generator: python3 scripts/mutate_specs.py --base base/base_spec.json --seed 20260517 --operators Nullify,FutureTime,EscalationCycle,PriorityContradiction --out out/mutations
3. Repeat the command from step 2.
4. Compare the manifest files: diff out/mutations/manifest.json manifest.example.json

Expectation: 0 lines of differences, the order of mutation_id matches on every run.
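The same comparison can be done in Python when a byte-level diff is too strict, for example when key order or whitespace varies between environments. This is a sketch with hypothetical helper names; it makes no assumption about the manifest layout beyond it being valid JSON.

```python
import hashlib
import json
from pathlib import Path


def canonical_digest(text):
    """Hash JSON text after normalizing key order and whitespace."""
    data = json.loads(text)
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def same_manifest(path_a, path_b):
    """True when both manifest files carry identical JSON content."""
    text_a = Path(path_a).read_text(encoding="utf-8")
    text_b = Path(path_b).read_text(encoding="utf-8")
    return canonical_digest(text_a) == canonical_digest(text_b)
```

Sorting keys does not reorder JSON arrays, so if the mutants are stored in an array, a change in mutation_id order still changes the digest. Note that this check is looser than diff, which also reports formatting differences.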

Complexity: beginner

Name: Calculating the immunity vector of the validator

Problem: Using the previously generated mutations, run the fake validator and the immunity score calculation script. Analyze the obtained values and draw a conclusion about the readiness of the loop.

Solution: 1. Run the validator: python3 scripts/fake_validator.py --mutations out/mutations --out out/validator_results.json
2. Run the metrics calculation: python3 scripts/immunity_score.py --validator-results out/validator_results.json --expected expected/expected_failures.json
3. Verify that the values meet the thresholds: strict_reject_rate >= 0.98, depth_of_diagnostics >= 3, recovery_time_p95_ms <= 1200.
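The threshold check can be expressed as a small gate function suitable for CI. The thresholds are the ones listed above; the function name and the shape of the input dictionary are assumptions, since the output format of immunity_score.py is not shown here.

```python
import operator

# Thresholds from the exercise: comparison function and limit per component.
THRESHOLDS = {
    "strict_reject_rate": (operator.ge, 0.98),
    "depth_of_diagnostics": (operator.ge, 3),
    "recovery_time_p95_ms": (operator.le, 1200),
}


def gate(vector):
    """Return ("PASS" or "FAIL", list of components that missed their limit)."""
    failed = [
        name
        for name, (check, limit) in THRESHOLDS.items()
        if not check(vector[name], limit)
    ]
    return ("PASS" if not failed else "FAIL"), failed
```

Returning the list of failed components alongside the verdict keeps the gate output actionable: a FAIL names which part of the vector regressed.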

Complexity: intermediate

Name: Producing the reviewable trail (Capstone)

Problem: Based on the smoke test performed (practice exercises 1 and 2), produce a minimal report fragment for the file capstone/validation.md.

Solution: Create or open capstone/validation.md and add a block containing the seed, the list of operators, the three computed immunity metrics, and the final verdict (verdict: PASS/FAIL). Do not commit the out/mutations directory itself — the source of truth must be the reproducible command.
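A report block of this kind can be rendered from the values already at hand. The layout below (one "key: value" line per item) is only an assumed format; the lesson does not prescribe one, so adapt it to whatever capstone/validation.md already uses.

```python
def render_block(seed, operators, vector, verdict):
    """Render the smoke test summary as plain "key: value" lines."""
    if verdict not in ("PASS", "FAIL"):
        raise ValueError("verdict must be PASS or FAIL")
    lines = [
        f"seed: {seed}",
        f"operators: {','.join(operators)}",
    ]
    for name in ("strict_reject_rate", "depth_of_diagnostics", "recovery_time_p95_ms"):
        lines.append(f"{name}: {vector[name]}")
    lines.append(f"verdict: {verdict}")
    return "\n".join(lines) + "\n"
```

The vector passed in should come from your own run of the metrics script; the function only formats the numbers and does not compute or check them.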

Complexity: intermediate

Case studies: Name: Validation of the appointment_latency_spike incident

Scenario: Consider a production incident with a sharp increase in latency (appointment_latency_spike). According to the specification, the SLA is 10 minutes, and the escalation route goes from appointments_oncall to sre_lead. The baseline specification contains correct data for this case.

Challenge: The validator must not let through a specification whose severity field has become empty. If an empty field leaks into the system, it can distort the SLA calculation and lead to incorrect selection of the incident owner, which is critical for the throughput of on-call engineers.

Solution: The Nullify mutation operator is applied to the severity field. The validator is expected to detect the defect at the Given:incident_received step and emit the code EMPTY_REQUIRED_FIELD. The pipeline must be forcibly stopped before the When:evaluate_sla_window step. The result is bound to the JSON Schema rule: severity.minLength.

Result: The validator stopped processing before the SLA computation. The evidence chain is recorded in the SDD: mutation_id, difference in the specification (diff: 'P1' -> ''), diagnostic code, and routing stack. This demonstrates the "anatomical diagnostics" expected of the validator.
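The behavior described in this case can be sketched as a validator that records each step it enters and halts at the first defect. The step names, error code, and rule come from the case; the function shape and the result dictionary are hypothetical and are not taken from scripts/fake_validator.py.

```python
def validate(spec):
    """Check severity at the Given step and halt before SLA evaluation."""
    trail = ["Given:incident_received"]
    severity = spec.get("severity")
    if not isinstance(severity, str) or severity == "":
        return {
            "verdict": "REJECT",
            "step": "Given:incident_received",
            "code": "EMPTY_REQUIRED_FIELD",
            "rule": "severity.minLength",
            "trail": trail,  # the When step was never entered
        }
    trail.append("When:evaluate_sla_window")
    return {"verdict": "ACCEPT", "step": None, "code": None,
            "rule": None, "trail": trail}
```

Because the trail is returned with the verdict, a test can assert not only that the mutant was rejected but also that When:evaluate_sla_window never ran.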

Lessons learned: Temporal anomalies and empty fields should be checked before routing and SLA computations begin.

The "one mutant — one expected failure" principle is critical for localizing the problem without manual debugging.

Related concepts: Nullify operator

Given/When/Then format

strict_reject_rate

Name: Detecting an escalation cycle (cdn_error_budget_burn)

Scenario: An incident with an error budget burn (cdn_error_budget_burn) is being analyzed. The edge_oncall owner hands off a P1 incident to the traffic_sre team.

Challenge: Using the EscalationCycle operator, a reverse edge (traffic_sre → edge_oncall) is injected into the escalation graph. JSON Schema validates the shape of the data well, but is not always able to express the topological behavior of the route. The check must establish whether the execution loop could fall into endless reassignment of the owner.

Solution: The validator builds a directed graph of owners and runs a depth-first search (DFS) with white/gray/black node states. When an edge reaches a gray node, that is, a node still on the current DFS path, the validator returns the error CYCLE_ESCALATION and reports the minimal cycle. The failure is bound to the When:route_escalation step.

Result: The validator detected the cycle before the execution phase and stopped the process. The report showed a clear failure route: schema.normalize -> step.when.prepare -> graph.build -> graph.detect_cycle -> halt.
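The three-color DFS used in this case can be sketched as follows. The function names and the edge-list input format are assumptions. The sketch returns the cycle found on the current DFS path, which is not guaranteed to be the shortest cycle in a larger graph.

```python
WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished


def find_cycle(edges):
    """Return a cycle as a list of owners, or None if the graph is acyclic."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    color = {node: WHITE for node in graph}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GRAY:  # back edge: nxt is still on the path
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


def check_escalation(edges):
    """Wrap the cycle search in a validator-style result."""
    cycle = find_cycle(edges)
    if cycle is None:
        return {"verdict": "ACCEPT"}
    return {"verdict": "REJECT", "code": "CYCLE_ESCALATION",
            "step": "When:route_escalation", "cycle": cycle}
```

The recursive form is adequate for escalation routes with a handful of owners; a very deep graph would call for an explicit stack to stay within Python's recursion limit.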

Lessons learned: Diagnosing topological cycles requires a separate graph pass inside the validator.

Each mutant must be bound to a specific rule (for example, $defs.escalation_graph.no_cycles) so that the diagnosis is not overly generic.
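The binding between an operator, its Given/When/Then step, its error code, and its rule can be kept in a small table and checked against each validator result. The two entries below use the values from the case studies in this lesson; the table layout and the helper name are hypothetical and do not reflect the actual format of expected/expected_failures.json.

```python
# Expected failure per operator, taken from the two case studies.
EXPECTED = {
    "Nullify": {
        "step": "Given:incident_received",
        "code": "EMPTY_REQUIRED_FIELD",
        "rule": "severity.minLength",
    },
    "EscalationCycle": {
        "step": "When:route_escalation",
        "code": "CYCLE_ESCALATION",
        "rule": "$defs.escalation_graph.no_cycles",
    },
}


def binding_errors(operator_name, result):
    """Describe every mismatch between a result and its expected failure."""
    expected = EXPECTED.get(operator_name)
    if expected is None:
        return [f"{operator_name}: no expected failure registered"]
    return [
        f"{key}: expected {want!r}, got {result.get(key)!r}"
        for key, want in expected.items()
        if result.get(key) != want
    ]
```

An empty list means the diagnosis was as specific as required; a generic code such as a catch-all validation error would show up as a mismatch rather than pass silently.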

Related concepts: EscalationCycle operator

Graph traversals (DFS)

depth_of_diagnostics

Study tips: Always verify determinism: run the mutation generator twice with the same seed and compare the files. A single run without verification is random noise, not a reliable metric.

Pay attention to the classes of degenerate scenarios (empty fields, temporal anomalies, escalation cycles, recursive dependencies). Developers often forget about reverse edges in graphs.

When studying the immunity metric, always treat it as a vector. If strict_reject_rate grows while depth_of_diagnostics falls, your loop has become stricter but "blinder" (the engineer will not understand how to fix the incident).
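Treating the metric as a vector means comparing two runs component by component instead of collapsing them into one number. A minimal sketch, with a hypothetical helper name and the assumption that lower recovery time is better while the other two components should grow:

```python
# Direction of improvement for each component of the vector.
HIGHER_IS_BETTER = {
    "strict_reject_rate": True,
    "depth_of_diagnostics": True,
    "recovery_time_p95_ms": False,
}


def regressions(before, after):
    """Return the components that got worse between two runs."""
    worse = []
    for name, higher in HIGHER_IS_BETTER.items():
        delta = after[name] - before[name]
        if (delta < 0) if higher else (delta > 0):
            worse.append(name)
    return worse
```

A run that raises strict_reject_rate but lowers depth_of_diagnostics is reported as a regression in depth, which is exactly the "stricter but blinder" case.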

Remember the rule: The out/ directory is not committed to Git. Keep only the seed, operators, and summary metrics in the repository.

Additional resources: Runnable example: book2/examples/stress-mutator/README.md — working Python scripts for local execution of the mutation factory and immunity calculation.

Part 09 feature validation: ../book/part-09-feature-validation.md — the discipline of feature validation, without which mutations of specifications lose meaning.

Part 20 sdd antipatterns: ../book/part-20-sdd-antipatterns.md — a catalog of classic process errors on which the mutation operators are built.

Summary: Mutation testing of specifications turns validator testing into a managed engineering cycle. Using a mutation factory, you deliberately create degenerate scenarios (empty fields, cycles, time errors) and verify that the validator catches them at the correct Given/When/Then step. Computing the vector immunity metric (strict_reject_rate, depth_of_diagnostics, recovery_time) allows configuring a CI gate that will not let a regression through and will protect the system from toxic requirements and cascading failures.