Study guide: Applied Part 8. File Arbitration of Disputed Changes: Roles, Verdicts, and Precedents

Lesson 3 of 5 in module «Applied Part 8. File Arbitration of Disputed Changes: Roles, Verdicts, and Precedents»

Topic: Applied Part 8. File-Based Arbitration of Disputed Changes: Roles, Verdicts, and Precedents

Difficulty level: Medium

Estimated study time: 2-3 hours

Prerequisites: Knowledge of team review principles (Part 16 of Volume 1)

Understanding of LLM duel mechanisms and working with counterexamples

Basic CLI skills (running Python scripts)

Understanding of YAML and JSON data structures

Learning objectives: Understand the difference between an LLM duel and file-based arbitration: a duel asks defect-finding questions, while arbitration issues an official verdict.

Set up and run a minimal training arbitration scenario using the run_duel.py, check_invariants.py, and write_judgment.py scripts.

Compose correct entries in judgment.md using strict evidence (evidence_ref).

Create entries in precedents.md to document recurring disputes.

Apply the anti-Goodhart rule to protect the system from optimizing metrics at the expense of hidden risks (silent failures, rollback-flapping).

Overview: File-based arbitration turns the resolution of disputes over code changes from a subjective exchange of opinions in chat into a formalized, reproducible chain of evidence. The system uses clearly separated roles: a Coordinator manages the process, an Implementer proposes changes, a Verifier checks them against formal criteria, and Safety can veto changes that carry critical risks. The outcome of arbitration is recorded as project artifacts: judgment.md (a decision log with evidence) and precedents.md (a knowledge base of recurring disputes). The approach is designed so that the verdict does not depend on a specific AI model (this is checked through tier rotation) and the specification remains the primary source of truth.

Key concepts: File-based arbitration: Collective review of a disputed change by multiple roles, where the result is recorded as files (artifacts) rather than in chat history.

Arbitration roles: Coordinator (keeps the protocol, does not vote), Implementer (proposes changes), Verifier (checks formal criteria), Safety (blocks critical risks via veto).

Judgment.md: Decision log for disputes. It must contain a verdict (APPROVE/DENY/DEFERRED), a reason, and an evidence_ref — a reference to a file, not a paraphrase from chat.

Precedents.md: Precedent log for recurring disputes. It contains a stable identifier, a verdict, evidence, applicability boundaries, and a condition for reconsideration (next_check).

Evidence ref: A strict reference to evidence (a hook log, a diff, a JSON Schema, a Given/When/Then scenario). The Verifier does not accept arguments without an evidence_ref.

Tier rotation: Testing the same specification on different pairs of agents (for example, a cheap local model + an expensive cloud model). It reveals whether the verdict depends on a specific AI.

Anti-Goodhart invariants: A protective mechanism of arbitration that forbids improving one metric (for example, MTTR) at the expense of degrading others (rising false_escalation_rate, rollback-flapping, silent_p0).

Decision trace: A reproducible phased decision protocol describing extracted facts, red-flag checks, the policy applied, and the final verdict.
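A decision trace can be pictured as an ordered record of the four phases listed above. The sketch below assumes each phase is stored as short strings and rendered as numbered plain text; the DecisionTrace class, its field names, and the layout are illustrative assumptions, not a format defined by the course scripts.

```python
from dataclasses import dataclass, field


@dataclass
class DecisionTrace:
    """Hypothetical record of one arbitration decision, phase by phase."""
    facts: list[str] = field(default_factory=list)      # what the evidence files show
    red_flags: list[str] = field(default_factory=list)  # red-flag checks and their outcome
    policy: str = ""                                     # rule or precedent applied
    verdict: str = ""                                    # APPROVE / DENY / DEFERRED

    def render(self) -> str:
        """Render the phases in a fixed order so any verifier can replay them."""
        lines = ["1. Facts:"]
        lines += [f"   - {fact}" for fact in self.facts]
        lines.append("2. Red-flag checks:")
        lines += [f"   - {flag}" for flag in self.red_flags]
        lines.append(f"3. Policy: {self.policy}")
        lines.append(f"4. Verdict: {self.verdict}")
        return "\n".join(lines)
```

Keeping the facts apart from the verdict is what makes the trace reproducible: another verifier can re-check each fact line against its file without having to accept the arbiters' reasoning.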

Practice exercises: Name: Running the runnable training case autoscale_200pct

Problem: You need to conduct file-based arbitration for the autoscale_spec.yaml specification. Run the duel, check the anti-Goodhart invariants, and generate the final decision log.

Solution:
1. Go to the example directory: cd book2/examples/tribunal
2. Run the duel script: python3 scripts/run_duel.py --spec specs/autoscale_spec.yaml --cases cases/ --out out/duel.json
3. Run the invariant check: python3 scripts/check_invariants.py --metrics metrics/validation_metrics.json --out out/invariants.json
4. Generate the verdict: python3 scripts/write_judgment.py --duel-out out/duel.json --invariants-out out/invariants.json --to out/judgment.md

Complexity: beginner
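The three commands can also be driven from one script. The sketch below builds exactly the command lines from the solution and runs them in order, stopping at the first failing step. It assumes it is launched from book2/examples/tribunal; build_commands and run_pipeline are hypothetical helpers that only wrap the commands listed above.

```python
import subprocess
import sys


def build_commands(python: str = "python3") -> list[list[str]]:
    """Return the three arbitration steps as argument lists, in execution order."""
    return [
        [python, "scripts/run_duel.py", "--spec", "specs/autoscale_spec.yaml",
         "--cases", "cases/", "--out", "out/duel.json"],
        [python, "scripts/check_invariants.py",
         "--metrics", "metrics/validation_metrics.json", "--out", "out/invariants.json"],
        [python, "scripts/write_judgment.py", "--duel-out", "out/duel.json",
         "--invariants-out", "out/invariants.json", "--to", "out/judgment.md"],
    ]


def run_pipeline(dry_run: bool = True) -> None:
    """Print each step; when dry_run is False, also execute it."""
    for cmd in build_commands():
        print("$", " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError, so a failed step stops the chain.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(dry_run="--run" not in sys.argv)
```

Run it without arguments to print the commands (a dry run), or with --run to execute them; because subprocess.run is called with check=True, a failing duel never reaches write_judgment.py.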

Name: Moving a conflict to the precedent database

Problem: During the autoscale_200pct review, a recurring conflict was identified: the automatic mode requires audit_trace_coverage=1.0, but the current value is 0.7. This precedent needs to be documented.

Solution: Open precedents.md and add a YAML entry with these fields:
case_id: PREC-001
verdict: DENY
evidence_ref: tests/regression_001.json
applies_to: auto-remediation without full audit_trace
next_check: repeat the duel when manual_review_floor changes
In the next dispute of this kind, refer to PREC-001.

Complexity: intermediate
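To keep every precedent complete, the entry can be produced by a small helper instead of by hand. This sketch uses only the standard library and writes the YAML text itself (double-quoting each value) rather than calling a YAML library; the function names and the list-item layout are assumptions, and precedents.md may use a different layout in the course repository.

```python
# Field order for a precedent entry, as listed in the exercise.
REQUIRED_FIELDS = ("case_id", "verdict", "evidence_ref", "applies_to", "next_check")


def render_precedent(entry: dict[str, str]) -> str:
    """Render one precedent as a YAML list item, refusing incomplete entries."""
    missing = [name for name in REQUIRED_FIELDS if not entry.get(name)]
    if missing:
        raise ValueError("precedent is missing: " + ", ".join(missing))
    lines = []
    for i, name in enumerate(REQUIRED_FIELDS):
        # Escape backslashes and quotes, then double-quote, so values with ':' stay strings.
        value = entry[name].replace("\\", "\\\\").replace('"', '\\"')
        prefix = "- " if i == 0 else "  "
        lines.append(f'{prefix}{name}: "{value}"')
    return "\n".join(lines) + "\n"


def append_precedent(path: str, entry: dict[str, str]) -> None:
    """Append a rendered precedent to the log file (e.g. precedents.md)."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(render_precedent(entry))
```

Calling append_precedent("precedents.md", prec) with the five PREC-001 fields from the solution appends one entry; an entry missing next_check raises ValueError instead of silently producing a precedent that can never be reconsidered.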

Name: Verifying protection against the Goodhart trap

Problem: The Implementer has proposed a plan to reduce MTTR from 6 to 2 minutes through aggressive auto-escalation. As the Verifier, you need to reject the plan if it violates invariants.

Solution: Check the plan against the hard-stop conditions in validation.md: false_escalation_rate <= 0.05, rollback_flapping < 3/hr, and silent_p0_ratio == 0. If the plan violates even one condition, the Coordinator records FAIL(reason=metric corruption) in judgment.md.

Complexity: advanced
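The hard-stop conditions from the solution map directly onto code. In this sketch the thresholds are the ones quoted above, while the flat metrics dictionary, its key names, and the PASS result are assumptions; check_invariants.py may read and report these values differently.

```python
def violations(metrics: dict[str, float]) -> list[str]:
    """Return the hard-stop conditions a plan breaks; an empty list means none."""
    broken = []
    # Written as "not (value <= limit)" so a NaN placeholder also counts as a breach.
    if not (metrics["false_escalation_rate"] <= 0.05):
        broken.append("false_escalation_rate > 0.05")
    if not (metrics["rollback_flapping"] < 3):  # rollbacks per hour
        broken.append("rollback_flapping >= 3/hr")
    if metrics["silent_p0_ratio"] != 0:
        broken.append("silent_p0_ratio != 0")
    return broken


def invariant_verdict(metrics: dict[str, float]) -> str:
    """MTTR is deliberately not consulted: a faster MTTR cannot buy back a broken invariant."""
    return "FAIL(reason=metric corruption)" if violations(metrics) else "PASS"
```

For instance, a plan with mttr_min=2 but false_escalation_rate=0.12 yields FAIL(reason=metric corruption) regardless of the MTTR gain, which is exactly the anti-Goodhart behaviour the exercise asks the Verifier to enforce.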

Case studies: Name: Arbitrating a dispute over rate limiting in an API gateway

Scenario: During a burst of requests, the specification requires temporarily throttling a specific client (tenant) without blocking the entire service. The Implementer proposes a patch that adds tenant_id to the deduplication key and sets burst_window_sec=60.

Challenge: The arbiters must confirm that the patch actually isolates tenants and does not cause a global block. Checking that the text sounds technically plausible is not enough; strict evidence must be present.

Solution: The Verifier requires three pieces of evidence (evidence_ref): 1) a JSON Schema that requires tenant_id, limit_reason, and expires_at; 2) a PreToolUse hook log showing that changes to the global limit are blocked; 3) a Given/When/Then scenario proving that a burst from tenant A does not reduce tenant B's quota.

Result: A/B testing showed that with a "strong Implementer + weak Verifier" pair the patch was rejected, because the evidence was buried in long text instead of being extracted into a formalized diff. In response, an explicit tenant isolation scenario in Gherkin was added to the specification.

Lessons learned: A verdict must be based on formalized evidence, not on a model's authority or the persuasiveness of text.

A divergence of verdicts under tier rotation means that the specification's requirements are not portable enough and need to be strengthened through a diff in validation.md.

Related concepts: Tier rotation

evidence_ref

JSON Schema

Given/When/Then scenarios
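The third piece of evidence in this case study, the Given/When/Then scenario, can be made executable. The sketch assumes a fixed-window limiter whose counter key includes tenant_id, echoing the patch, with burst_window_sec=60; the TenantLimiter class, the limit of 100 requests per window, and the tenant names are hypothetical stand-ins for the gateway's real limiter.

```python
from collections import defaultdict


class TenantLimiter:
    """Fixed-window rate limiter whose counter key includes tenant_id."""

    def __init__(self, limit_per_window: int, burst_window_sec: int = 60):
        self.limit = limit_per_window
        self.window = burst_window_sec
        self.used = defaultdict(int)  # (tenant_id, window index) -> requests admitted

    def _key(self, tenant_id: str, now_sec: float) -> tuple[str, int]:
        return (tenant_id, int(now_sec // self.window))

    def allow(self, tenant_id: str, now_sec: float) -> bool:
        key = self._key(tenant_id, now_sec)
        if self.used[key] >= self.limit:
            return False  # only this tenant is throttled, only in this window
        self.used[key] += 1
        return True

    def remaining(self, tenant_id: str, now_sec: float) -> int:
        return self.limit - self.used.get(self._key(tenant_id, now_sec), 0)


def test_burst_from_tenant_a_does_not_reduce_tenant_b_quota():
    # Given a gateway with a per-tenant limit of 100 requests per 60 s window
    limiter = TenantLimiter(limit_per_window=100, burst_window_sec=60)
    # When tenant A sends a burst of 150 requests in the same window
    admitted = [limiter.allow("tenant-a", now_sec=10.0) for _ in range(150)]
    # Then A is throttled after its own 100 requests
    assert admitted.count(True) == 100
    # And tenant B still has its full quota
    assert limiter.remaining("tenant-b", now_sec=10.0) == 100
    assert limiter.allow("tenant-b", now_sec=10.0)


if __name__ == "__main__":
    test_burst_from_tenant_a_does_not_reduce_tenant_b_quota()
    print("tenant isolation scenario passed")
```

Because the counter key is (tenant_id, window), a burst can only exhaust the bucket of the tenant that sent it; a key without tenant_id would make all tenants share one counter, which is exactly the global block the challenge warns about.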

Study tips: Pay special attention to the difference in evidence formats: weak models (local-coder) only understand minimal_form (a short diagnostic_code), while strong ones (frontier-reviewer) can work with extended_form (an evidence_by_invariant structure).

Remember the golden rule of arbitration: a dispute is resolved only through diffs in requirements.md, hooks.md, or validation.md. Chat correspondence is not evidence.

When studying on your own, use the matrix.py script to check how the verdict changes when AI-agent pairs (C1-C4) are swapped.

Study the decision_trace format — it helps separate facts from assessments and makes the decision protocol reproducible for other verifiers.
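The tip about swapping agent pairs can be illustrated with a small report function. It takes the verdict each pair produced for the same specification and flags the pairs that disagree with the most common verdict. The pair labels C1-C4 follow the tips above; the input dictionary and the output keys are assumptions and do not describe what matrix.py actually prints.

```python
from collections import Counter


def rotation_report(verdicts: dict[str, str]) -> dict:
    """Compare the verdicts that different agent pairs gave for one specification.

    verdicts maps a pair label (for example "C1") to that pair's verdict.
    """
    if not verdicts:
        raise ValueError("no verdicts to compare")
    counts = Counter(verdicts.values())
    # On a tie, most_common keeps first-seen order, so treat this as a
    # reference point rather than a vote.
    most_common_verdict = counts.most_common(1)[0][0]
    divergent = sorted(p for p, v in verdicts.items() if v != most_common_verdict)
    return {
        "portable": len(counts) == 1,  # every pair reached the same verdict
        "most_common_verdict": most_common_verdict,
        "divergent_pairs": divergent,
    }
```

Any result with portable set to False is the signal described in the case study: the requirements are not portable enough and should be tightened, for example through a diff in validation.md.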

Additional resources: Example of file-based arbitration (runnable): book2/examples/tribunal/ and book2/examples/tribunal/matrix/

Project charter and voting weights: Part 3 (part-03-project-constitution.md#key-ideas)

Feature validation artifact: Part 9 (part-09-feature-validation.md) — validation.md is described here

Compatibility with Qwen Code CLI: appendix-b-qwen-code-compatibility.md

Summary: File-based arbitration turns code review from a chaotic discussion into a strict procedural protocol. Its main tools are the judgment.md and precedents.md logs, and decisions rest on formal evidence (evidence_ref) such as hook logs, JSON Schemas, and Given/When/Then test scenarios. Anti-Goodhart invariants protect the approach against metric distortion, and tier rotation protects it against bias toward a specific AI model. The main rule: if there is no evidence in a file, the Coordinator must not turn an agent's impression into an official verdict.
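The main rule can also be checked mechanically before an entry reaches judgment.md. A minimal sketch, assuming an entry is a record with the verdict, reason, and evidence_ref fields described in the key concepts; the JudgmentEntry class and the heuristic that evidence_ref must be a relative file path with an extension are illustrative assumptions, not the behaviour of write_judgment.py.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Verdicts allowed in judgment.md according to this guide.
VERDICTS = {"APPROVE", "DENY", "DEFERRED"}


@dataclass
class JudgmentEntry:
    """Hypothetical in-memory form of one judgment.md entry."""
    verdict: str
    reason: str
    evidence_ref: str


def validate(entry: JudgmentEntry) -> list[str]:
    """Return the problems found; an empty list means the entry may be written."""
    problems = []
    if entry.verdict not in VERDICTS:
        problems.append(f"unknown verdict: {entry.verdict!r}")
    if not entry.reason.strip():
        problems.append("reason is empty")
    ref = entry.evidence_ref.strip()
    path = PurePosixPath(ref)
    # Heuristic (assumption): a file reference is a relative path with an
    # extension and no spaces, e.g. a hook log, a diff, a schema, or a .feature file.
    # Free text such as "as agreed in chat" fails this check.
    if not ref or " " in ref or path.is_absolute() or not path.suffix:
        problems.append("evidence_ref must point to a file, not paraphrase the chat")
    return problems


def to_markdown(entry: JudgmentEntry) -> str:
    """Render a validated entry as a markdown list item (layout is an assumption)."""
    problems = validate(entry)
    if problems:
        raise ValueError("; ".join(problems))
    return (
        f"- verdict: {entry.verdict}\n"
        f"  reason: {entry.reason}\n"
        f"  evidence_ref: {entry.evidence_ref}\n"
    )
```

Refusing to render an entry without a file reference is the code-level version of the rule: an agent's impression never gets the format of an official verdict.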

Course: Using SDD in Development for Qwen Code CLI. Applied Course
