Study guide: Applied Part 3. Project Constitution: First Rules Referendum

Lesson 3 of 5 in module «Applied Part 3. Project Constitution: First Rules Referendum»

Topic: Applied Part 3. Project Constitution: The First Rules Referendum

Difficulty level: Medium

Estimated study time: 6-8 hours (minimal track: 2-3 hours; full track with referendum: +4-5 hours)

Prerequisites: Part 6 of Volume 1: understanding the structure of mission.md, tech-stack.md, roadmap.md

Part 18 of Volume 1: basics of SDD process security

Basic understanding of YAML syntax

Experience with auto-remediation systems or incident management

Understanding of CI/CD and production security concepts

Learning objectives: Formulate at least two immutable_principles as prohibitions (not recommendations) and one mutable_rule with all six mandatory fields (incident_type, pipeline_phase, permitted_actions, max_scope, ttl, rollback_condition)

Create a working constitution.md file that passes verification by the check.py script with a PASS verdict

Explain the difference between a product constitution (mission/tech-stack/roadmap) and a safe automation constitution (constitution.md)

Describe a governance_protocol with three roles, a quorum, a Safety veto, and a deterministic tie-breaker rule

Transfer principles from the local node_not_ready case to the high_memory_usage exam case while keeping the invariants portable

Overview: This section teaches how to create a versioned security contract for auto-remediation: a constitution.md file that separates what an agent is never allowed to do (immutable_principles) from what can be delegated with restrictions (mutable_rules). Unlike the product files from Volume 1 (mission.md, tech-stack.md, roadmap.md), which answer the question "what are we building," constitution.md answers the question "what must the agent not do with it, even if it really wants to." The key innovation is the strict structure of mutable rules: six mandatory fields that limit each rule's lifespan, make its blast radius explicit, and trigger automatic rollback on degradation. To complete the lesson, it is enough to fill in a minimal constitution.md by hand; the automated agent referendum and the external arbiter belong to the full production track.

Key concepts: Immutable principles: Unchangeable, security-level prohibitions and obligations that can never be disabled automatically. They are formulated as invariants, not recommendations. Examples: a prohibition on restarting production databases without a verified backup, a prohibition on automatically deleting backups and audit logs, and a requirement of two consecutive OKs before an incident transitions to resolved. These principles constrain the agent under pressure, when the shortest path to a lower MTTR could trigger a larger cascade of failures.

Mutable rules: Changeable norms with a precisely defined scope that can be revoked or rewritten through a formal procedure. Each rule must contain six fields: incident_type (incident class), pipeline_phase (response phase: triage, recovery, etc.), permitted_actions (list of allowed actions), max_scope (limit on blast radius, e.g., single_node), ttl (rule lifespan, after which review is required), and rollback_condition (automatic rollback condition). Without these fields, an amendment becomes too broad and starts to compete with the invariants.
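The six-field requirement is easy to check mechanically. The following is a minimal Python sketch. It assumes a rule has already been loaded as a plain dict. `REQUIRED_FIELDS` and `missing_fields` are hypothetical names, not part of the course's check.py.

```python
# Hypothetical validator: reports mandatory mutable_rule fields that are
# absent or empty. Not taken from book2's check.py.
REQUIRED_FIELDS = (
    "incident_type",
    "pipeline_phase",
    "permitted_actions",
    "max_scope",
    "ttl",
    "rollback_condition",
)


def missing_fields(rule: dict) -> list[str]:
    """Return the mandatory fields that are absent or empty in `rule`."""
    return [name for name in REQUIRED_FIELDS if not rule.get(name)]


rule = {
    "incident_type": "node_not_ready",
    "pipeline_phase": "triage",
    "permitted_actions": ["soft_restart_agent"],
    "max_scope": "single_node",
    "ttl": "30d",
    # rollback_condition intentionally omitted
}

print(missing_fields(rule))  # ['rollback_condition']
```

An empty list or an empty string counts as missing, because a rule with `permitted_actions: []` is just as underspecified as one without the key.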

Max scope: Maximum blast radius within which an action is permitted. A critically important parameter for limiting cascading failures. Examples: single_node, single_pod, single_namespace. Any expansion of max_scope requires a separate proposal and referendum.
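One way to enforce "any expansion requires a referendum" is to compare the current and proposed scopes before a rule change is accepted. The sketch below assumes a simplified linear ranking (single_pod < single_node < single_namespace). Real infrastructure is not strictly ordered this way. `SCOPE_RANK` and `is_expansion` are hypothetical.

```python
# Simplified, assumed ranking from narrowest to widest blast radius.
# Real scopes are not strictly ordered (a namespace can span many nodes,
# and a node can host many namespaces); this is only an illustration.
SCOPE_RANK = {"single_pod": 0, "single_node": 1, "single_namespace": 2}


def is_expansion(current: str, proposed: str) -> bool:
    """True if `proposed` widens the blast radius compared with `current`.

    Unknown scopes are treated as an expansion, so the check fails closed
    and the change is routed to a proposal and referendum.
    """
    if current not in SCOPE_RANK or proposed not in SCOPE_RANK:
        return True
    return SCOPE_RANK[proposed] > SCOPE_RANK[current]
```

Failing closed on unknown values means a typo in max_scope triggers review instead of silently granting a wider radius.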

TTL (time to live): Lifespan of a mutable rule. After the ttl expires, the rule is deactivated automatically or requires explicit extension through a referendum. This prevents "forgotten" rules that nobody revisits six months later. Typical values: 12h for hotfix rules; 14d or 30d for regular operations.
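A minimal sketch of ttl expiry follows. It assumes ttl strings use the notation from the examples (an integer followed by `h` or `d`) and that activation times are timezone-aware. `parse_ttl` and `is_expired` are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone

# Assumed ttl notation: an integer followed by "h" (hours) or "d" (days),
# as in the 12h, 14d and 30d examples.
_UNITS = {"h": "hours", "d": "days"}


def parse_ttl(ttl: str) -> timedelta:
    if len(ttl) < 2 or ttl[-1] not in _UNITS or not ttl[:-1].isdigit():
        raise ValueError(f"unsupported ttl: {ttl!r}")
    return timedelta(**{_UNITS[ttl[-1]]: int(ttl[:-1])})


def is_expired(activated_at: datetime, ttl: str, now: datetime) -> bool:
    """An expired rule must be deactivated or explicitly re-approved."""
    return now >= activated_at + parse_ttl(ttl)


activated = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(is_expired(activated, "12h", activated + timedelta(hours=11)))  # False
print(is_expired(activated, "12h", activated + timedelta(hours=12)))  # True
```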

Rollback condition: Condition under which a mutable rule rolls back automatically, without waiting for ttl expiration. Examples: repeat_incidents_same_node>=2, memory_percent>=90% after 2 windows, 5xx_increase, Safety veto. Ensures rapid reaction to unforeseen side effects of automation.
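The sketch below evaluates a rollback_condition as an OR of observable triggers. It assumes metrics arrive as a dict whose keys mirror the examples above. Reading "memory_percent>=90% after 2 windows" as "the last two windows are both at or above 90%" is an interpretation. `should_roll_back` is hypothetical.

```python
def memory_stuck_high(windows: list[float], threshold: float = 90.0, count: int = 2) -> bool:
    """True if each of the last `count` windows is at or above `threshold` percent."""
    recent = windows[-count:]
    return len(recent) == count and all(m >= threshold for m in recent)


def should_roll_back(metrics: dict) -> bool:
    """OR of observable triggers: any single one rolls the rule back, before ttl."""
    return bool(
        metrics.get("repeat_incidents_same_node", 0) >= 2
        or memory_stuck_high(metrics.get("memory_percent_windows", []))
        or metrics.get("safety_veto", False)
    )
```

Each trigger is a measured value rather than a judgment call, so the rollback can fire without a human in the loop.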

Governance protocol: Voting procedure for amending the constitution. The minimum composition is three roles: Verifier (checks for invariant violations), Implementor (evaluates applicability in the playbook), and Safety (checks blast radius, privacy, and rollback conditions). Quorum: 2 approve + no Safety veto. Tie-breaker rule: safety-first. A referendum is called after three unknown incidents with the same pattern_id within 48 hours.
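The referendum trigger and the quorum rule can both be expressed in a few lines. This sketch assumes the incident list already contains only incidents classified as unknown. Vote values are assumed to be "approve", "veto", "abstain" and so on. `referendum_due` and `passes` are hypothetical names.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=48)
THRESHOLD = 3


def referendum_due(incidents: list[tuple[str, datetime]], pattern_id: str) -> bool:
    """True if THRESHOLD unknown incidents with `pattern_id` fit in some 48h window.

    `incidents` is assumed to contain only incidents already classified as unknown.
    """
    times = sorted(t for pid, t in incidents if pid == pattern_id)
    # Slide over every run of THRESHOLD consecutive incidents.
    return any(
        times[i + THRESHOLD - 1] - times[i] <= WINDOW
        for i in range(len(times) - THRESHOLD + 1)
    )


def passes(votes: dict[str, str]) -> bool:
    """Quorum: at least 2 approve and no Safety veto."""
    approvals = sum(v == "approve" for v in votes.values())
    return approvals >= 2 and votes.get("safety") != "veto"
```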

proposal.md: The amendment form for the constitution. It includes the input events, the current unknown classification, the assumed risk, the expected effect, and a cancellation condition. An amendment without an origin (reason) and a voting record is not considered part of production SDD.
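A completeness check on proposal.md can be sketched as a search for required section headings. This assumes the form uses Markdown headings named after the five parts listed above. The real template in book2/examples/templates/proposal.md may name them differently. `missing_sections` and `verdict` are hypothetical.

```python
# Assumed section headings; the course template may use other names.
REQUIRED_SECTIONS = (
    "Input events",
    "Current classification",
    "Assumed risk",
    "Expected effect",
    "Cancellation condition",
)


def missing_sections(proposal_text: str) -> list[str]:
    """Return required sections whose heading line is absent (case-insensitive)."""
    headings = {
        line.lstrip("#").strip().lower()
        for line in proposal_text.splitlines()
        if line.startswith("#")
    }
    return [s for s in REQUIRED_SECTIONS if s.lower() not in headings]


def verdict(proposal_text: str) -> tuple[str, list[str]]:
    missing = missing_sections(proposal_text)
    return ("BLOCK", missing) if missing else ("PASS", [])
```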

Change log: Immutable trail of constitution changes, containing version, parent_version, reason, votes by role, decision_hash (cryptographic hash for verification), incident_context, activation_time, link to diff. Transforms amendment history into a chain of evidence.

Decision hash: Cryptographic hash of the decision (e.g., SHA-256) that can be recalculated and verified later. Because anyone can recompute it, the decision is reproducible and any later tampering with the history becomes detectable.
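A change_log entry and its decision_hash can be sketched as SHA-256 over a canonical JSON form of the entry: keys are sorted and separators are fixed, so the same decision always yields the same hash. The entry layout follows the fields listed for the change log. The incident IDs, diff path and the helper names `decision_hash` and `verify` are hypothetical.

```python
import hashlib
import json


def decision_hash(entry: dict) -> str:
    """SHA-256 over canonical JSON of the entry, excluding the hash field itself."""
    payload = {k: v for k, v in entry.items() if k != "decision_hash"}
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify(entry: dict) -> bool:
    """Recompute the hash and compare it with the stored one."""
    return entry.get("decision_hash") == decision_hash(entry)


entry = {
    "version": "1.1.0",
    "parent_version": "1.0.0",
    "reason": "3 unknown NodeNotReady incidents within 48h",
    "votes": {"verifier": "approve", "implementor": "approve", "safety": "abstain"},
    "incident_context": ["INC-1", "INC-2", "INC-3"],  # hypothetical IDs
    "activation_time": "2024-01-01T12:00:00Z",
    "diff": "diffs/1.1.0.patch",  # hypothetical path
}
entry["decision_hash"] = decision_hash(entry)
```

Changing any recorded field, such as a vote, makes `verify` return False. That is what turns the log into evidence rather than a plain history.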

Safety-first tie-breaker: A tie-breaker rule under which a critical risk flagged by Safety always leads to rejection of the amendment. Because the rule is deterministic, a tied vote cannot deadlock the system.
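The sketch below shows a deterministic tie-breaker in the safety-first_then_latest_matching_precedent form used later in this lesson. It assumes each precedent record carries `pattern_id`, `outcome`, `decided_at` (ISO-8601 strings in one format) and `decision_hash`. Falling back to rejection when there is no precedent is an assumption. `break_tie` is hypothetical.

```python
from typing import Optional


def break_tie(safety_critical_risk: bool, precedents: list[dict],
              pattern_id: str) -> tuple[str, Optional[str]]:
    """Deterministic safety-first_then_latest_matching_precedent tie-breaker.

    Returns the outcome and the decision_hash of the precedent it relied on.
    """
    # 1. Safety first: a critical risk flagged by Safety always rejects.
    if safety_critical_risk:
        return "REJECT", None
    # 2. Otherwise follow the most recent precedent for the same pattern_id.
    #    ISO-8601 timestamps in one format sort correctly as strings.
    matching = [p for p in precedents if p["pattern_id"] == pattern_id]
    if matching:
        latest = max(matching, key=lambda p: p["decided_at"])
        return latest["outcome"], latest["decision_hash"]
    # 3. Assumption: with no precedent, fail closed.
    return "REJECT", None
```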

Constitution gateway: A verification gate in front of every dangerous action. First a local script checks immutable_principles and mutable_rules, then an LLM in planning mode explains the risks, and only after that does the executor receive clearance. The order is critical: check before the action, not after.
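The ordering of the gate can be sketched in Python. The sketch assumes each immutable principle is encoded as a predicate that flags a violating action, and mutable rules are matched exactly on incident type, phase, action name and scope. The LLM plan-mode step is left as a placeholder comment. `IMMUTABLE`, `MUTABLE_RULES`, `block_reasons` and `gateway` are hypothetical.

```python
from typing import Callable

# Hypothetical encoding: each immutable principle is a predicate that
# returns True when a proposed action would violate it.
IMMUTABLE: dict[str, Callable[[dict], bool]] = {
    "Do not perform auto-remediation without audit_trace":
        lambda a: not a.get("audit_trace"),
    "Do not touch stateful workload without a confirmed backup":
        lambda a: bool(a.get("stateful")) and not a.get("backup_confirmed"),
}

# Active mutable rules (only the fields the gate needs are shown).
MUTABLE_RULES = [
    {"incident_type": "node_not_ready", "pipeline_phase": "triage",
     "permitted_actions": ["soft_restart_agent"], "max_scope": "single_node"},
]


def block_reasons(action: dict) -> list[str]:
    """Return reasons to block; an empty list means clearance."""
    reasons = [text for text, violated in IMMUTABLE.items() if violated(action)]
    permitted = any(
        rule["incident_type"] == action.get("incident_type")
        and rule["pipeline_phase"] == action.get("pipeline_phase")
        and action.get("name") in rule["permitted_actions"]
        and rule["max_scope"] == action.get("scope")  # exact match, for simplicity
        for rule in MUTABLE_RULES
    )
    if not permitted:
        reasons.append("no active mutable_rule permits this action")
    return reasons


def gateway(action: dict, execute: Callable[[dict], None]) -> str:
    # The check runs before the action, never after it.
    reasons = block_reasons(action)
    if reasons:
        return "BLOCK: " + "; ".join(reasons)
    # The LLM plan-mode risk explanation would run here; omitted in this sketch.
    execute(action)
    return "EXECUTED"
```

The executor is only reachable through `gateway`, so a blocked action never runs, even partially.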

Practice exercises: Name: Creating a minimal constitution.md for node_not_ready

Problem: Create a constitution.md file for the node_not_ready scenario. Requirements: two immutable_principles formulated as prohibitions (not recommendations); one mutable_rule with all six mandatory fields; a short governance_protocol. Test the file with the question: "What action will the agent be unable to perform, even if it would reduce MTTR?" The answer must be in immutable_principles, not in chat.

Solution: Step 1: Formulate immutable_principles as prohibitions. Example: "Do not perform auto-remediation without audit_trace" and "Do not touch stateful workload without a confirmed backup." Step 2: Create a mutable_rule with incident_type: node_not_ready, pipeline_phase: triage, permitted_actions: ["soft_restart_agent"], max_scope: "single_node", ttl: "30d", rollback_condition: "repeat_incidents_same_node>=2 or Safety veto". Step 3: Describe the governance_protocol: roles Verifier/Implementor/Safety, a quorum of 2 approve + no Safety veto, tie-breaker: safety-first. Step 4: Run the verification: cd book2/examples/constitution && python3 scripts/check.py --constitution specs/constitution.yaml --proposal proposals/valid_proposal.md. Expected result: verdict: PASS. Step 5: Test with proposals/missing_evidence.md. Expected result: verdict: BLOCK, with the reasons listed.

Complexity: beginner
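Before running the real script, you can self-check the exercise structure with a short Python sketch. The dict below mirrors the Step 1-3 answers as if loaded from YAML. `check_constitution` and its rules are hypothetical and not taken from check.py. The rules are: at least two principles starting with "Do not", six non-empty rule fields, and three governance keys.

```python
RULE_FIELDS = ("incident_type", "pipeline_phase", "permitted_actions",
               "max_scope", "ttl", "rollback_condition")

constitution = {
    "immutable_principles": [
        "Do not perform auto-remediation without audit_trace",
        "Do not touch stateful workload without a confirmed backup",
    ],
    "mutable_rules": [{
        "incident_type": "node_not_ready",
        "pipeline_phase": "triage",
        "permitted_actions": ["soft_restart_agent"],
        "max_scope": "single_node",
        "ttl": "30d",
        "rollback_condition": "repeat_incidents_same_node>=2 or Safety veto",
    }],
    "governance_protocol": {
        "roles": ["Verifier", "Implementor", "Safety"],
        "quorum": "2 approve + no Safety-veto",
        "tie_breaker": "safety-first",
    },
}


def check_constitution(doc: dict) -> tuple[str, list[str]]:
    """Hypothetical structural check returning (verdict, reasons)."""
    reasons = []
    principles = doc.get("immutable_principles", [])
    if len(principles) < 2:
        reasons.append("need at least two immutable_principles")
    reasons += [f"not a prohibition: {p!r}" for p in principles if not p.startswith("Do not")]
    rules = doc.get("mutable_rules", [])
    if not rules:
        reasons.append("need at least one mutable_rule")
    for i, rule in enumerate(rules):
        reasons += [f"mutable_rules[{i}] missing {f}" for f in RULE_FIELDS if not rule.get(f)]
    gov = doc.get("governance_protocol", {})
    reasons += [f"governance_protocol missing {k}"
                for k in ("roles", "quorum", "tie_breaker") if not gov.get(k)]
    return ("BLOCK", reasons) if reasons else ("PASS", [])
```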

Name: Transferring principles to high_memory_usage

Problem: Transfer the structure from node_not_ready to the high_memory_usage case. Do not copy the rule — find a portable principle. Formulate an immutable_principle for confirmed memory stabilization and a mutable_rule for restart_pod with appropriate max_scope, ttl, and rollback_condition. If the principle cannot be formulated as a string, return to the local case.

Solution: Step 1: Immutable: "Do not close high_memory_usage without confirmation that RSS returned below 80% twice in a row" (transfer of the two-consecutive-OKs principle). Step 2: Mutable: incident_type: high_memory_usage, pipeline_phase: recovery, permitted_actions: ["restart_pod"], max_scope: "single_pod", ttl: "14d", rollback_condition: "5xx_increase OR memory_percent>=90% after 2 windows". Step 3: Governance: Safety veto when anyone attempts to expand restart_pod to namespace scope (transfer of the radius-limitation principle). Step 4: Verify that all six fields are filled in. Step 5: Add one line to capstone/README.md explaining why the rule was introduced.

Complexity: intermediate
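The transferred "two consecutive OKs" principle can be sketched as a closing guard. This sketch assumes RSS is sampled as a percentage per observation window and that the two most recent samples are the ones that count. `can_close` is a hypothetical name.

```python
def can_close(rss_percent_samples: list[float], limit: float = 80.0,
              required_oks: int = 2) -> bool:
    """Allow resolving high_memory_usage only if the latest `required_oks`
    samples are all below `limit` (the two-consecutive-OKs principle)."""
    recent = rss_percent_samples[-required_oks:]
    return len(recent) == required_oks and all(s < limit for s in recent)
```

A single good sample is never enough, and a rebound above the limit resets the count.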

Name: Checking and fixing typical errors

Problem: You are given a draft constitution.md with typical errors: an immutable rule formulated as "avoid if possible"; a mutable_rule missing rollback_condition; a governance_protocol that does not describe a tie-breaker; and no change_log. Find and fix all errors while keeping the verification reproducible.

Solution: Step 1: Convert "avoid if possible" into a strict prohibition: "Do not perform X automatically, even under MTTR pressure." Step 2: Add a rollback_condition: a specific automatic rollback condition tied to observable metrics. Step 3: Add a tie-breaker: safety-first_then_latest_matching_precedent. Step 4: Create a change_log with at least one record containing version, parent_version, reason, votes, decision_hash, and activation_time. Step 5: Review with Qwen Code: ask it to check the draft and return a list of discrepancies without writing any files automatically. Step 6: Re-run the runnable verification until it returns PASS.

Complexity: intermediate
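Step 1 of this exercise can be partly automated with a small lint that flags principles reading as recommendations. The list of hedging phrases is an assumption, and `soft_principles` is hypothetical. A lint like this only catches wording; the pressure test from the study tips is still needed.

```python
import re

# Assumed hedging phrases that turn a prohibition into a recommendation.
SOFT_PATTERNS = [r"\bavoid\b", r"\bif possible\b", r"\bshould\b",
                 r"\btry to\b", r"\bpreferably\b", r"\bwhen feasible\b"]


def soft_principles(principles: list[str]) -> list[str]:
    """Return principles that are not phrased as strict "Do not" prohibitions
    or that contain a hedging phrase."""
    return [
        p for p in principles
        if not p.lower().startswith("do not")
        or any(re.search(pat, p, re.IGNORECASE) for pat in SOFT_PATTERNS)
    ]
```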

Name: Simulating a referendum and filling change_log

Problem: Three identical unknown NodeNotReady incidents occurred within 36 hours. Form a proposal.md, conduct a referendum with three roles, and record the result in change_log. Consider the scenario where Safety casts a veto while Verifier and Implementor both approve. Make sure the tie-breaker works deterministically.

Solution: Step 1: Confirm that the threshold is reached: 3 unknown incidents with pattern_id NodeNotReady within 48h (36h < 48h, so the threshold is reached). Step 2: Form proposal.md: input events (3 incidents with timestamps), current unknown classification, assumed risk (soft_restart_agent on an unprepared node), expected effect (40% MTTR reduction), cancellation condition (repeat_incidents_same_node>=2). Step 3: Convene the referendum within 15 minutes. Step 4: Voting: Verifier: approve, Implementor: approve, Safety: veto (critical_risk: the node hosts a stateful workload without an explicit backup). Step 5: Apply pass_rule at_least_2_approve_and_no_safety_veto: the Safety veto blocks the amendment. With the safety-first tie-breaker, the result is BLOCK. Step 6: Record in change_log: version 1.2.0, parent_version 1.1.0, reason, votes (verifier: approve, implementor: approve, safety: veto), decision_hash, activation_time: null (amendment not accepted). Step 7: Repeat the scenario with Safety: abstain. The amendment is accepted and activation_time is filled in.

Complexity: advanced
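Steps 5-7 can be sketched as a function that applies the pass rule and builds the change_log record. A rejected amendment is still recorded, with activation_time set to None. decision_hash is left out for brevity. `referendum` is a hypothetical helper, and the timestamp is illustrative.

```python
from datetime import datetime, timezone


def referendum(votes: dict[str, str], version: str, parent_version: str,
               reason: str, now: datetime) -> dict:
    """Apply at_least_2_approve_and_no_safety_veto and build a change_log record."""
    approvals = sum(v == "approve" for v in votes.values())
    accepted = approvals >= 2 and votes.get("safety") != "veto"
    return {
        "version": version,
        "parent_version": parent_version,
        "reason": reason,
        "votes": dict(votes),
        "result": "ACCEPT" if accepted else "BLOCK",
        # Only an accepted amendment gets an activation time.
        "activation_time": now.isoformat() if accepted else None,
    }


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
reason = "3 unknown NodeNotReady incidents within 36h"
vetoed = referendum({"verifier": "approve", "implementor": "approve", "safety": "veto"},
                    "1.2.0", "1.1.0", reason, now)
accepted = referendum({"verifier": "approve", "implementor": "approve", "safety": "abstain"},
                      "1.2.0", "1.1.0", reason, now)
```

Running both scenarios with identical inputs except the Safety vote shows that the outcome depends only on the recorded votes, which is what makes the procedure deterministic.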

Case studies: Name: Cascade failure without max_scope: incident at a cloud provider

Scenario: A major cloud provider implemented auto-remediation for DiskPressure incidents on Kubernetes clusters. The rule allowed draining nodes without limiting blast radius. During a mass storage failure, the rule sequentially drained nodes across the entire region, turning a local incident into regional unavailability.

Challenge: The absence of max_scope in the mutable_rule let the agent widen its blast radius in a cascade. There was no ttl, so the rule acted indefinitely. There was no rollback_condition, so the system did not stop automatically as the failure scope grew. The team discovered the problem after 47 minutes, when 60% of the region's nodes had already been drained.

Solution: After the incident, the team introduced a strict constitution: the immutable_principle "Do not perform mass operations on nodes without explicit approval from Safety"; a mutable_rule for DiskPressure with max_scope: single_node, ttl: 4h, rollback_condition: "drain_rate>1_node_per_5min OR affected_pods>10"; and a governance_protocol with a Safety veto on any max_scope expansion. An automatic gateway checks the constitution before each drain.
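The DiskPressure rollback_condition can be sketched as a sliding-window count. This sketch assumes "drain_rate>1_node_per_5min" means more than one drain within the last five minutes, and that affected_pods is a running count supplied by monitoring. `drain_rate_exceeded` and `must_roll_back` are hypothetical.

```python
from datetime import datetime, timedelta

RATE_WINDOW = timedelta(minutes=5)


def drain_rate_exceeded(drain_times: list[datetime], now: datetime,
                        max_drains: int = 1) -> bool:
    """True if more than `max_drains` nodes were drained in the last 5 minutes."""
    recent = [t for t in drain_times if now - RATE_WINDOW < t <= now]
    return len(recent) > max_drains


def must_roll_back(drain_times: list[datetime], affected_pods: int,
                   now: datetime) -> bool:
    # drain_rate>1_node_per_5min OR affected_pods>10
    return drain_rate_exceeded(drain_times, now) or affected_pods > 10
```

Both inputs are measured automatically, so a rule like this stops a cascade after the second drain rather than after 60% of a region.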

Result: Recovery time reduced from 4 hours to 15 minutes for isolated incidents. No regional cascade failures occurred for 18 months. Average time for new rule adoption through referendum: 12 minutes. All changes are traceable in change_log with decision_hash.

Lessons learned: max_scope is not a decorative field, but a critical barrier against cascade failures; its absence is more costly than any MTTR savings

ttl forces rule review and prevents "forgotten" permissions that become dangerous when architecture changes

rollback_condition must be tied to observable metrics, not abstract assessments; drain_rate and affected_pods are measured automatically

Safety-veto on radius expansion is a necessary compensation for referendum speed; without it, a quorum of 2 approve becomes insufficient for critical decisions

Related concepts: max_scope

ttl

rollback_condition

safety-first tie-breaker

constitution gateway

Name: Successful hotfix for a ticketing flood: a limited rule with a clear ttl

Scenario: A support platform faced a wave of tickets caused by an external API integration failure. Standard ticket routing built up a 4-hour queue, which put the SLA at risk. The team considered automatically redirecting all tickets to a general queue without classification.

Challenge: Full automation of routing violated the immutable_principle "Do not lose audit-trace when processing tickets" and created a risk of PII leakage into the wrong queue. Manual processing could not handle the volume. A temporary rule with guaranteed rollback was needed.

Solution: The team created a mutable_rule hotfix_ticketing_flood: incident_type: api_integration_failure, pipeline_phase: triage, permitted_actions: ["simplified_routing"], max_scope: "single_api_endpoint", ttl: "12h", rollback_condition: "checkpoint_missing OR pii_exposure_detected". The immutable_principle kept the prohibition on losing the audit trace. Under the governance_protocol, Verifier checked that simplified routing exposed no PII, Implementor checked that a checkpoint was in place, and Safety checked the blast radius. The referendum was convened in 8 minutes, and the rule was accepted with Safety: abstain.

Result: SLA restored within 2 hours. The rule automatically deactivated after 12 hours. The checkpoint allowed restoring standard routing without data loss. Audit-trace fully preserved. Record in change_log with decision_hash is used as precedent for future hotfix rules.

Lessons learned: A short ttl (12h) for hotfix rules is preferable to a "reasonable" one-week term: it forces an explicit decision on extension or archival

checkpoint as part of rollback_condition ensures technical rollback capability, not just political permission

Safety: abstain instead of approve is an acceptable outcome when risks are minimal; it is important that veto remains possible and blocks on critical risks

Precedent in change_log accelerates future referendums: latest_matching_precedent in tie-breaker references the hash of the accepted decision

Related concepts: ttl

rollback_condition

change_log

decision_hash

immutable_principles

Study tips: Start with manual filling, not automation: create constitution.md in a text editor, check by eye, then use the script. This builds structural intuition that automation hides

Test each immutable_principle with a pressure test: formulate a scenario where the agent "really wants" to violate the rule for MTTR. If the formulation allows circumvention — it is a recommendation, not an invariant

Use paired comparison "bad/good" for mutable_rules: take a rule without ttl/max_scope/rollback_condition, imagine it in a year when architecture changes, then add missing fields and repeat the thought experiment

Practice transferring principles, not copying rules: take the local node_not_ready case, extract an abstract principle (e.g., "confirmed stabilization before resolved"), then apply to a completely different context (high_memory_usage, cpu_throttling, disk_pressure)

Practice interviewing with Qwen Code using the template: ask three grouped questions, get answers, check the draft, then ask to find typical errors. This models the production process and prepares you for working with an LLM assistant

Create your own "bad" examples for runnable verification: take proposals/missing_evidence.md and proposals/conflict_with_immutable.md, modify them, predict the verdict, then check. Script feedback reinforces understanding of boundaries

Keep a "constitutional diary": for each incident in your practice, record which immutable_principle and mutable_rule would apply, which max_scope and rollback_condition you would choose. This turns abstract concepts into professional intuition

For the full track: simulate a referendum with colleagues or in a role-play. Assign roles Verifier/Implementor/Safety, distribute the same proposal, compare votes. Discrepancies will show ambiguities in governance_protocol that need to be eliminated

Additional resources: Book2/examples/constitution/: Runnable examples of constitution verification, including check.py script, templates for constitution.yaml and proposal.md, PASS and BLOCK examples

Book2/examples/templates/proposal.md: Amendment form template with mandatory sections for referendum

Book2/examples/spec-ci/readme.md: Runnable analog of gateway: same idea — verifiable gateway before action

Part 6 of Volume 1 (part-06-constitution.md): Source of mission.md, tech-stack.md, roadmap.md — product constitution of Volume 1

Part 18 of Volume 1 (part-18-sdd-security.md): SDD process security — context for understanding the cost of error in production

Part 13 of Volume 1 (part-13-legacy-support.md): Interaction with running code — connection of constitution with legacy systems

Part 16 of Volume 1 (part-16-team-code-review.md): Comparison of feature code review and dangerous action review — why one person is not enough

Qwen Code plan mode documentation: Planning mode for constitution verification without automatic file writing

YAML 1.2 specification: Technical specification of the format used for constitution.md, for a deeper understanding of its data structure

Summary: The project constitution (constitution.md) is a versioned security contract that separates what an agent is never allowed to do (immutable_principles as strict prohibitions) from what can be delegated with restrictions (mutable_rules with six mandatory fields: incident_type, pipeline_phase, permitted_actions, max_scope, ttl, rollback_condition). Unlike the product files of Volume 1, it answers the question "what must the agent not do." The educational minimum is a hand-written file with two prohibitions and one mutable rule that passes script verification. The full track adds a governance_protocol with three roles, a quorum, a Safety veto, and a deterministic tie-breaker, plus an agent referendum and an immutable change_log with decision_hash. The key practical skill is transferring principles between cases (node_not_ready → high_memory_usage), not copying rules mechanically. The constitution is checked before a dangerous action, not after, which turns reaction speed from a risk into a manageable process.

Course: Production SDD for Qwen Code CLI. Part 2