Merge feat/adr-structure: ADR-023 structure & lifecycle + back-catalogue conformance

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
sjat 2026-06-10 15:18:48 +02:00
commit 7ebbc113ab
27 changed files with 1434 additions and 60 deletions

@ -25,7 +25,8 @@ report the rest, and write a tracked report to `docs/reviews/`.
### Phase 0 — deterministic pre-scan
Run `python3 scripts/repo-scan.py > /tmp/repo-scan.json`. It returns the **inventory**
(roles, ADRs, runbooks, playbooks, scripts — your shard list) and **exact findings**
(markers, broken refs, unencrypted vaults). Fold these into the report verbatim.
(markers, broken refs, unencrypted vaults, ADR-structure violations). Fold these into
the report verbatim.
It also emits two deferral checks (see Phase 2): `open-deferred-item` (every still-open
ADR "Deferred/Open" entry — a checklist to confirm) and `stale-deferred` (an entry

@ -231,6 +231,7 @@ Single-contributor, trunk-based (no merge requests / approval gates):
| Firewall strategy | `docs/decisions/020-firewall.md` |
| Operational access | `docs/decisions/021-operational-access.md` |
| Backup & disaster recovery | `docs/decisions/022-backup.md` |
| ADR structure & lifecycle | `docs/decisions/023-adr-structure.md` |
| Adding a new role | `docs/runbooks/new-role.md` |
| Adding a new host | `docs/runbooks/new-host.md` |
| Rotating vault secrets | `docs/runbooks/rotate-secrets.md` |
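The new index row only helps if every ADR file has one. Below is a minimal sketch of a check that a pre-scan such as `scripts/repo-scan.py` could run. It is not that script's actual code. It assumes the index is the Markdown table above and that ADR filenames follow the `NNN-slug.md` pattern shown. The function name and the `024-example.md` file are hypothetical.

```python
import re

# Backticked ADR paths as they appear in the index table,
# e.g. `docs/decisions/023-adr-structure.md` (NNN-slug.md pattern assumed).
ADR_PATH = re.compile(r"`(docs/decisions/\d{3}-[a-z0-9-]+\.md)`")


def unindexed_adrs(index_markdown: str, adr_files: list[str]) -> list[str]:
    """Return the ADR files that have no row in the index table, sorted."""
    indexed = set(ADR_PATH.findall(index_markdown))
    return sorted(path for path in adr_files if path not in indexed)


if __name__ == "__main__":
    index = "| ADR structure & lifecycle | `docs/decisions/023-adr-structure.md` |\n"
    files = [
        "docs/decisions/023-adr-structure.md",
        "docs/decisions/024-example.md",  # hypothetical ADR with no index row
    ]
    print(unindexed_adrs(index, files))  # → ['docs/decisions/024-example.md']
```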

@ -25,6 +25,24 @@ _(append new raw signals here; the next kaizen review consumes them)_
invented a Status header ("Proposed") on the fly because there's no documented
convention for how we write ADRs (status lifecycle, required sections). → TODO 10.2 —
decide a minimal ADR template / status convention.
- `[recurring]` **Brainstorming's "user reviews spec" gate fires despite a standing
agreement to skip it** (2026-06-10): writing the ADR-structure spec, I stopped to ask
the user to review the finished spec before writing the plan — the
`superpowers:brainstorming` skill scripts that gate. We had previously agreed I should
move directly from the Q/A to the implementation plan once the spec is written. Same
shape as the execution-mode-menu signal: an external skill's script conflicting with a
boma convention, where prose reminders don't hold. → consider a mechanical guard
(Stop-hook family) or a CLAUDE.md/skill-override note that suppresses the spec-review
gate.
- `[recurring]` **Subagent faithfulness self-reports can be wrong — controller must
diff** (2026-06-10): during the ADR-023 retroactive restructure, an implementer
subagent reported "0 substantive deletions, the See-also lines reappear verbatim" for
ADR-014, but it had actually dropped the cross-reference lines. Caught only by the
controller independently running `git show <sha> | grep '^-[^-]'`. For
faithfulness-critical edits delegated to subagents, the agent's own audit is not
sufficient evidence. → systematize a controller-side deletion-audit step (every `-`
line must be a classified, expected change) before accepting any "presentational-only"
restructure; consider a helper script.
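The controller-side deletion audit proposed in the last signal can be sketched as a helper that lists every line a diff deletes, so the controller can classify each one. There is one caveat about the grep used above. `grep '^-[^-]'` skips any deleted line whose content itself starts with `-`. That includes deleted Markdown list items, which appear in the diff as `-- …`, and deleted `---` rules, so a dropped bullet point never reaches the audit. The sketch below skips only the per-file `--- a/…` headers instead. It assumes git's default `a/`/`b/` path prefixes, and the `removed_lines` name is hypothetical.

```python
import sys


def removed_lines(diff_text: str) -> list[str]:
    """Return every line a unified diff deletes, minus its leading '-'."""
    removed = []
    for line in diff_text.splitlines():
        # Per-file headers look like deletions but are not content
        # (assumes git's default "a/" prefix, or /dev/null for new files).
        if line.startswith(("--- a/", "--- /dev/null")):
            continue
        if line.startswith("-"):
            removed.append(line[1:])
    return removed


if __name__ == "__main__":
    # Usage: git show <sha> | python3 removed_lines.py
    for n, text in enumerate(removed_lines(sys.stdin.read()), start=1):
        print(f"{n:3}: {text}")
```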
---

@ -1,5 +1,9 @@
# ADR-001 — Architecture overview
## Status
Accepted (2026-05-30)
## Context
This document describes the overall architecture of the homelab infrastructure
@ -65,3 +69,21 @@ This architecture prioritises:
- **Simplicity**: few moving parts, no orchestration layer (no Kubernetes, no Swarm)
- **Reproducibility**: any host can be rebuilt from scratch via Ansible
- **Legibility**: a human reading the repo can understand what runs where
## Consequences
Drawn from the boundaries this ADR already states:
- Hosts in the small fleet (25 VMs) are treated as individuals, not cattle (per
Infrastructure), and forgoing an orchestration layer is the cost of prioritising
simplicity (per Decision).
- The control node `ubongo` cannot be created by the Terraform it hosts, so it is
provisioned manually — the one documented exception to Terraform-owned VM existence
(per Infrastructure / Host groups; ADR-009, ADR-015).
- Management scope is deliberately bounded: Proxmox configuration itself (storage,
clustering, networking) is out of scope, and the `control` group never runs the
`docker_host` role (per Host groups).
- Compose files are always regenerated by Ansible on deploy; no hand-edited Compose
files exist on hosts (per Service interaction model).
- The "What this repo manages" table describes the *intended* design — STATUS.md
records what is actually built (per that section).

@ -1,5 +1,9 @@
# ADR-002 — Security baseline and strategy
## Status
Accepted (2026-05-30)
## Context
Security here is not a single control but the sum of several combined efforts —
@ -183,3 +187,27 @@ This posture was chosen to be:
Out-of-scope items and conscious trade-offs are recorded in
`docs/security/accepted-risks.md` rather than here, so this decision record stays
stable while the risk posture evolves.
## Consequences
Drawn from the trade-offs, scoping, and follow-on work this ADR already states:
- Targeted/physical adversaries are out of scope at this scale, and supply-chain risk is
consciously deprioritized: active vulnerability scanning is deferred as an accepted risk
(per Threat model; `docs/security/accepted-risks.md`).
- SELinux is not used (non-native to Debian, redundant with AppArmor), recorded as an
accepted risk (per Mandatory access control).
- Some CIS L2 items require separate partitions with restrictive mount options, which
reaches into VM disk layout — a provisioning concern (Terraform / cloud-init, ADR-006),
not just the `base` role (per Hardening standard). Any impractical CIS item is exempted
into the accepted-risk register with rationale, recording named exceptions rather than a
blanket opt-out.
- Several controls and governance mechanisms are stated as planned, not yet built:
Suricata network IDS, active alerting wiring AIDE/`auditd`/`fail2ban`/Suricata plus
log-source-silence into Grafana, the `/security-review` skill and its aggregation of
every `roles/*/SECURITY.md`, and the periodic security review (per File integrity /
Governance; STATUS.md / `docs/TODO.md`).
- The per-service security bar is enforced manually in review today, pending the planned
`/security-review` automation (per Governance).
- The accepted-risk register is kept out of this ADR so the record stays stable while the
risk posture evolves (per Decision; `docs/security/accepted-risks.md`).

@ -1,6 +1,20 @@
# ADR-003 — Toolchain decisions
## Execution engine
## Status
Accepted (2026-05-30)
## Context
boma needs a defined, reproducible toolchain for running and testing its Ansible
monorepo: an execution engine, a Python environment, secrets handling, a testing
framework, linting, CI/CD, developer-ergonomics conventions, and a collections/roles
policy. This ADR records the choice made for each, together with the alternatives
weighed and why they were not adopted.
## Decision
### Execution engine
**Choice**: `ansible-core` (pip-installed, pinned version) + explicit `requirements.yml`
@ -12,7 +26,7 @@ that isn't needed in a maintained monorepo.
---
## Python environment
### Python environment
**Choice**: `python3-venv` (system Python on Debian 13) + pinned `requirements.txt`
@ -24,7 +38,7 @@ reproducible, and has no extra dependencies.
---
## Secrets
### Secrets
**Choice**: Ansible Vault (file-based, built-in)
@ -40,7 +54,7 @@ CLAUDE.md → Secrets).
---
## Testing
### Testing
**Choice**: Molecule with Docker driver (`molecule-plugins[docker]`)
@ -59,7 +73,7 @@ are needed.
---
## Linting
### Linting
**Choice**: `ansible-lint` + `yamllint` + `pre-commit`
@ -71,7 +85,7 @@ Config files: `.ansible-lint`, `.yamllint` in repo root.
---
## CI/CD
### CI/CD
**Choice**: Forgejo Actions (self-hosted at forgejo.nyumbani.baobab.band) + `act_runner`
@ -87,7 +101,7 @@ a dedicated runner VM later if CI load warrants a separate host.
---
## Developer ergonomics
### Developer ergonomics
**Choice**: `Makefile` as the single interface for all operations
@ -102,7 +116,7 @@ The venv is activated in the user's shell profile.
---
## Collections and roles policy
### Collections and roles policy
**No Galaxy roles.** All roles are written and maintained locally in `roles/`.
Galaxy roles introduce external state, versioning surprises, and implicit
@ -136,3 +150,24 @@ are removed. Each entry in `requirements.yml` must justify its presence.
| NixOS targets | Poor Ansible fit; all hosts standardised on Debian 13 |
Terraform is **adopted** for VM provisioning only (no DNS) — see `docs/decisions/006-terraform.md`.
## Consequences
Drawn from the rationale and trade-offs this ADR already states:
- Pinning `ansible-core` + an explicit `requirements.yml` and a plain pinned venv keeps
the control-node environment small and fully reproducible, at the cost of maintaining
the pins (per Execution engine / Python environment).
- Ansible Vault's whole-file encryption makes diffs unreadable regardless of layout, so
secrets are organised for human lookup (`vault.<service>.<key>`) rather than diff
ergonomics — the trade accepted against SOPS/age (per Secrets).
- The `Makefile` is the single interface: Claude Code and CI invoke the same targets, so
local and CI behaviour can't drift and collaborators need not know raw flags (per
Developer ergonomics).
- Collections are added only on demand, so `requirements.yml` stays minimal; this defers
`community.crypto` (use `openssl` CLI until a role needs certs) and `community.general`
(add only the specific sub-module needed) until a real need appears (per Collections
and roles policy).
- The heavier orchestration tools were declined for this scale, each with a named
revisit trigger — e.g. Semaphore if non-SSH operators must trigger runs, AWX-adjacent
tooling only if AWX/AAP is ever adopted (per "What was explicitly ruled out").
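ADR-003's restructure shows the target shape. `## Status`, `## Context`, `## Decision` and `## Consequences` sit at the top level, and the former top-level sections are demoted to `###` under Decision. Below is a minimal sketch of a conformance check. It assumes ADR-023 requires exactly those four sections in that order; ADR-023 itself is the authority on this, and the function name is hypothetical.

```python
import re

# Assumed from the restructured ADRs in this merge; ADR-023 is the authority.
REQUIRED = ["Status", "Context", "Decision", "Consequences"]


def structure_violations(markdown: str) -> list[str]:
    """Report missing or out-of-order required '## ' sections in one ADR."""
    # Only level-2 headings count; "### Python environment" is a subsection.
    headings = re.findall(r"^## (.+?)\s*$", markdown, flags=re.MULTILINE)
    problems = [f"missing section: {name}" for name in REQUIRED if name not in headings]
    present = [h for h in headings if h in REQUIRED]
    expected_order = [name for name in REQUIRED if name in present]
    if present != expected_order:
        problems.append("required sections out of order: " + ", ".join(present))
    return problems
```

Headings inside fenced code blocks would be miscounted. A fuller check would skip fences before matching.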

@ -1,5 +1,9 @@
# ADR-004 — Docker and Compose service model
## Status
Accepted (2026-05-30)
## Context
All services run as Docker containers managed via Docker Compose. This document
@ -107,3 +111,22 @@ Docker Compose was chosen over Kubernetes/Swarm because:
- Compose files are human-readable and easily auditable
- No distributed state to manage
- Straightforward to back up and restore
## Consequences
Drawn from the trade-offs and deferred items this ADR already states:
- A shared `compose_service` engine role is intentionally not built: the ~5 standard
tasks are duplicated per role in favour of legible, self-contained roles, with a stated
revisit trigger — extract a shared engine if maintaining the duplicated mechanics
becomes painful (a pattern change touching many roles, or drift this standard alone
isn't preventing) (per "Why not a shared engine").
- Forgoing Kubernetes/Swarm is the deliberate cost of matching complexity to a 25-host
fleet with no distributed state to manage (per Decision).
- User-namespace remapping is not enabled by default — evaluated per use case (per Docker
daemon configuration).
- Bare `latest` is acceptable only on the stateless tier; the stateful tier is always
pinned `tag@digest`, and image updates are a deliberate operation (per Image management;
ADR-011).
- Backup strategy is defined separately and is out of scope for this ADR (per Persistent
data).
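The image-pinning rule reduces to a per-tier check. Below is a sketch that assumes the `name:tag@sha256:<digest>` form Docker uses for digest references. The function name and example images are hypothetical.

```python
import re

# tag@digest, e.g. "postgres:16.4@sha256:" followed by 64 hex characters.
# The name part may include a registry host:port; the tag cannot contain "/".
PINNED = re.compile(r"^[^@\s]+:[\w][\w.-]{0,127}@sha256:[0-9a-f]{64}$")


def image_allowed(image: str, stateful: bool) -> bool:
    """Stateless images may float (even bare `latest`); stateful ones need tag@digest."""
    return not stateful or PINNED.match(image) is not None
```

A digest-only reference such as `postgres@sha256:…` is rejected on the stateful tier, because the rule asks for a tag *and* a digest.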

@ -1,5 +1,9 @@
# ADR-005 — Host bootstrapping
## Status
Accepted (2026-05-30)
## Context
This document defines the **cloud-init template** that managed VMs are cloned
@ -81,3 +85,19 @@ Cloud-init with Proxmox templates provides:
- No manual installer interaction
- A clean handoff point to Ansible
- Easy rebuilds — destroy VM, clone template, run Ansible
## Consequences
Drawn from the trade-offs and special cases this ADR already states:
- The cloud-init image was chosen over a manual Debian installer (slow, error-prone,
not reproducible) and over preseed/netboot (powerful but complex to maintain) (per
Approach).
- Template creation is a one-time manual procedure per Proxmox cluster, and the template
is never booted directly (per Template creation).
- There is no manual `qm clone` path for managed hosts; the full create → inventory →
configure pipeline and the Terraform↔Ansible contract live in ADR-009 (per VM
provisioning / Ansible handoff).
- The control node is the sole documented exception — `ubongo`, a physical machine
installed by hand because it cannot be created by the Terraform it hosts (chicken-and-egg);
its hardware target and recovery model live in ADR-015 (per Control node bootstrapping).

@ -1,5 +1,9 @@
# ADR-006 — Terraform for infrastructure provisioning
## Status
Accepted (2026-05-30)
## Context
Ansible manages host configuration well but has no state model for infrastructure
@ -13,7 +17,9 @@ exact boundary, handoff pipeline, and data contract between them live in **ADR-0
---
## Responsibility split
## Decision
### Responsibility split
The canonical responsibility-split table lives in **ADR-009**. In short: Terraform
owns VM existence only; Ansible owns everything inside a VM, including all internal
@ -26,7 +32,7 @@ cadence, making them a poor fit for Terraform state.
---
## Providers
### Providers
**`bpg/proxmox` (`~> 0.70`)**: Chosen over `telmate/proxmox` for active maintenance,
full Proxmox 8 API support, and better cloud-init integration. This is the only
@ -42,7 +48,7 @@ Terraform manages its own provider dependencies via `required_providers` and
---
## State backend
### State backend
**Choice**: Local state on the control node.
@ -59,7 +65,7 @@ integration boundary.
---
## Structure
### Structure
```
terraform/
@ -83,7 +89,7 @@ Each environment directory contains:
---
## Secrets handling
### Secrets handling
The only secret input (the Proxmox API token) is passed via a `TF_VAR_*`
environment variable and declared `sensitive = true` in `variables.tf`. It never
@ -92,7 +98,7 @@ appears in `.tfvars` files. Non-secret configuration lives in tracked
---
## Ansible integration
### Ansible integration
After `terraform apply`, run `make tf-inventory TF_ENV=<env>` to regenerate
`inventories/<env>/hosts.yml` from the `vms` output. The full handoff pipeline,
@ -102,7 +108,7 @@ handoff)**.
---
## What was ruled out
### What was ruled out
| Option | Reason |
|---|---|
@ -110,3 +116,24 @@ handoff)**.
| OPNsense Terraform provider | Community-maintained; provider rot risk across OPNsense releases |
| Terraform workspaces | Single state file with workspace prefix; accidental cross-env apply possible |
| Separate Terraform repo | Cross-referencing between infra and config adds friction; monorepo keeps the full picture together |
## Consequences
Drawn from the "What was ruled out" section and the decisions stated above:
- `bpg/proxmox` is the only provider; `telmate/proxmox` was ruled out for weaker
maintenance and Proxmox 8 / cloud-init support (Providers; What was ruled out).
- OPNsense stays entirely in Ansible — no Terraform OPNsense provider — to avoid
community-provider rot across OPNsense releases (Responsibility split; What was
ruled out).
- Terraform writes no DNS records; Ansible's `dns` role owns the entire internal
zone, avoiding the bootstrap cycle and split DNS ownership the earlier
`hashicorp/dns` design created (Providers).
- State is local on the control node because Forgejo offers no usable HTTP state
backend; this is sufficient at solo-operator scale (no concurrent applies, so no
need for remote locking), with a real backend such as MinIO/S3 to be added later if
warranted (State backend).
- Separate environment directories are used instead of Terraform workspaces to
remove the risk of applying the wrong state (Structure; What was ruled out).
- Terraform and Ansible internals are kept in one monorepo rather than a separate
Terraform repo to avoid cross-referencing friction (What was ruled out).
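The secrets-handling rule lends itself to a mechanical check: the Proxmox API token arrives only as a `TF_VAR_*` environment variable and never appears in a `.tfvars` file. Below is a sketch that assumes the variable is named `proxmox_api_token`. That name is hypothetical; use the one declared `sensitive = true` in `variables.tf`. Terraform would then read the value from `TF_VAR_proxmox_api_token`.

```python
import re

# Hypothetical name; match whatever `variables.tf` declares with sensitive = true.
SECRET_VARS = {"proxmox_api_token"}

# A `name = value` assignment at line start; comment lines (# or //) never match.
ASSIGNMENT = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_-]*)\s*=", re.MULTILINE)


def secrets_in_tfvars(tfvars_text: str) -> list[str]:
    """Return the secret variables that a .tfvars file assigns, sorted."""
    assigned = set(ASSIGNMENT.findall(tfvars_text))
    return sorted(assigned & SECRET_VARS)
```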

@ -1,5 +1,9 @@
# ADR-007 — Network topology and addressing
## Status
Accepted (2026-05-30)
## Context
The boma homelab is a Proxmox cluster on a dedicated private network behind an
@ -10,7 +14,9 @@ and OPNsense configuration.
---
## Physical topology
## Decision
### Physical topology
```
ISP
@ -38,7 +44,7 @@ ISP
---
## VLAN design
### VLAN design
| VLAN | Name | Subnet | Purpose |
|---|---|---|---|
@ -51,9 +57,9 @@ ISP
---
## IP addressing
### IP addressing
### VLAN 10 — mgmt (10.10.0.0/24) — no DHCP
#### VLAN 10 — mgmt (10.10.0.0/24) — no DHCP
| Address | Host |
|---|---|
@ -63,7 +69,7 @@ ISP
| `10.10.0.201` | `pve1` |
| `10.10.0.202` | `pve2` |
### VLAN 20 — srv (10.20.0.0/24) — no DHCP, all static
#### VLAN 20 — srv (10.20.0.0/24) — no DHCP, all static
| Range | Purpose |
|---|---|
@ -81,28 +87,28 @@ Assigned infrastructure addresses:
| `10.20.0.12` | `proxy` | Reverse proxy |
| `10.20.0.13` | `homeassistant` | Home Assistant (IoT controller) |
### VLAN 30 — lan (10.30.0.0/24)
#### VLAN 30 — lan (10.30.0.0/24)
| Range | Purpose |
|---|---|
| `10.30.0.1` | OPNsense gateway |
| `10.30.0.100`–`.249` | DHCP pool |
### VLAN 40 — iot (10.40.0.0/24)
#### VLAN 40 — iot (10.40.0.0/24)
| Range | Purpose |
|---|---|
| `10.40.0.1` | OPNsense gateway |
| `10.40.0.100`–`.249` | DHCP pool |
### VLAN 50 — guest (10.50.0.0/24)
#### VLAN 50 — guest (10.50.0.0/24)
| Range | Purpose |
|---|---|
| `10.50.0.1` | OPNsense gateway |
| `10.50.0.100`–`.249` | DHCP pool |
### VLAN 99 — vpn — retired
#### VLAN 99 — vpn — retired
The OPNsense WireGuard VPN (`10.99.0.0/24`) is **replaced by the NetBird mesh**
(ADR-016). Remote access for `ubongo`, `askari`, and road-warrior clients rides a
@ -111,7 +117,7 @@ NetBird self-hosted on `askari`. NetBird manages its own overlay addressing
(default `100.64.0.0/10`); no boma VLAN/subnet is allocated for it, and
`10.99.0.0/24` is freed.
### Corosync ring (172.16.0.0/24) — not on managed switch
#### Corosync ring (172.16.0.0/24) — not on managed switch
| Address | Host |
|---|---|
@ -121,7 +127,7 @@ NetBird self-hosted on `askari`. NetBird manages its own overlay addressing
---
## OPNsense firewall rules (intent)
### OPNsense firewall rules (intent)
| Source | Destination | Policy |
|---|---|---|
@ -142,7 +148,7 @@ IoT devices cannot initiate connections to `srv`.
---
## Naming scheme
### Naming scheme
| Layer | Convention | Examples |
|---|---|---|
@ -155,7 +161,7 @@ IoT devices cannot initiate connections to `srv`.
---
## DNS zones and split-horizon
### DNS zones and split-horizon
**Internal zone**: `boma.baobab.band` — served by `dns1` and `dns2`.
The zone is rendered by the Ansible `dns` role: host A records come from the
@ -175,7 +181,7 @@ All other queries go upstream (e.g., `1.1.1.1`, `9.9.9.9`).
---
## External monitoring — askari
### External monitoring — askari
`askari` (Hetzner VPS) is a peer on the **NetBird mesh** (ADR-016) and also **hosts
the self-hosted NetBird coordinator** (management/signal/relay). It reaches `srv`
@ -186,3 +192,24 @@ ACLs — no OPNsense WireGuard tunnel and no `10.99.0.0/24` routing.
be reachable even when the homelab is down (its entire purpose), which is also why
the mesh coordinator lives here: an off-site control plane survives a homelab outage.
FQDN: `askari.baobab.band`.
---
## Consequences
Drawn from the implications already stated above:
- VLAN 99 (`vpn`, `10.99.0.0/24`) is retired and the subnet freed; remote access is
carried by the self-hosted NetBird mesh instead of an OPNsense WireGuard subnet
(VLAN design; IP addressing — VLAN 99 retired).
- Mesh-peer firewall allowances (to `srv` metrics ports and `mgmt`) are enforced by
NetBird ACLs, not OPNsense rules (OPNsense firewall rules (intent)).
- IoT devices cannot initiate connections to `srv`; only Home Assistant at
`10.20.0.13` may reach the IoT VLAN, with OPNsense Avahi bridging `srv` and `iot`
for discovery (OPNsense firewall rules (intent)).
- Terraform writes no DNS records; the Ansible `dns` role renders the internal zone
from inventory plus `group_vars`, with `dns1`/`dns2` serving split-horizon answers
(DNS zones and split-horizon).
- `askari` runs independently of the cluster so it survives a homelab outage, which
is why the off-site NetBird control plane lives there (External monitoring —
askari).
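The lan, iot and guest VLANs follow one pattern. VLAN *n* is `10.n.0.0/24`, the OPNsense gateway is `.1`, and the DHCP pool is `.100`–`.249`. Below is a sketch using Python's `ipaddress` module that derives those values, for example to check that a static address does not collide with a pool. The helper names are hypothetical. The pattern does not cover mgmt, srv or the Corosync ring, which have no DHCP.

```python
import ipaddress


def vlan_plan(vlan_id: int) -> dict:
    """Subnet, gateway and DHCP pool for a client VLAN (lan/iot/guest pattern)."""
    subnet = ipaddress.ip_network(f"10.{vlan_id}.0.0/24")
    base = subnet.network_address
    return {
        "subnet": subnet,
        "gateway": base + 1,  # OPNsense gateway at .1
        "pool": (base + 100, base + 249),  # DHCP pool .100-.249, inclusive
    }


def in_pool(address: str, plan: dict) -> bool:
    """True if the address falls inside the VLAN's DHCP pool."""
    start, end = plan["pool"]
    return start <= ipaddress.ip_address(address) <= end


if __name__ == "__main__":
    lan = vlan_plan(30)
    print(lan["subnet"], lan["gateway"])  # → 10.30.0.0/24 10.30.0.1
    print(in_pool("10.30.0.150", lan), in_pool("10.30.0.50", lan))  # → True False
```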

@ -3,6 +3,10 @@
> Practical point-of-use pitfalls (nft render checks, Molecule `community.docker`,
> apply-path coverage blind spots) live in `docs/testing/gotchas.md`.
## Status
Accepted (2026-05-30)
## Context
Ansible roles must be idempotent and correct before they touch production hosts.
@ -11,9 +15,11 @@ This document records the testing strategy, what each level covers, and — crit
---
## Three testing levels
## Decision
### Level 1 — Molecule (per role, always required)
### Three testing levels
#### Level 1 — Molecule (per role, always required)
Runs in Docker on the control node (`ubongo`) or in CI. Fast (~5 min per role).
@ -41,7 +47,7 @@ The idempotency step is non-negotiable. Every role must pass it cleanly.
that: svc.stdout == "active"
```
### Level 2 — Staging playbook (full stack, real VMs)
#### Level 2 — Staging playbook (full stack, real VMs)
`make check PLAYBOOK=site` followed by `make deploy PLAYBOOK=site` on
Terraform-provisioned staging VMs. Catches inter-role dependencies and ordering
@ -50,13 +56,13 @@ have already run and configured the firewall).
Run before every merge to `main`.
### Level 3 — External smoke test from askari
#### Level 3 — External smoke test from askari
Once `askari` is operational: scripted checks from outside the network confirming
that public-facing services respond correctly. Catches firewall and reverse proxy
configuration issues invisible to Ansible check mode.
### Level 4 — Service-UI acceptance (Claude-driven exploratory)
#### Level 4 — Service-UI acceptance (Claude-driven exploratory)
A Claude-driven exploratory check of a service's **application UI**, run as
`/verify-service <name>` on `ubongo` (ADR-017). Claude drives Chromium via the
@ -78,7 +84,7 @@ deploy (STATUS.md). Full design: ADR-017.
---
## Molecule test image
### Molecule test image
**No external images.** The project builds and hosts its own test image.
@ -103,7 +109,7 @@ functionally equivalent and fully owned.
---
## Idempotency requirements
### Idempotency requirements
Every role task must satisfy one of these:
@ -121,9 +127,9 @@ catches anything lint misses.
---
## What Molecule tests — and what it does not
### What Molecule tests — and what it does not
### Tested in Molecule
#### Tested in Molecule
| Capability | Notes |
|---|---|
@ -139,7 +145,7 @@ catches anything lint misses.
| auditd installation and configuration | Install and config file |
| Idempotency of all of the above | Enforced by Molecule's idempotency step |
### Not tested in Molecule — explicit exceptions
#### Not tested in Molecule — explicit exceptions
The following require a real kernel or real hardware and are validated only at
Level 2 (staging) or Level 3 (external). This is a conscious, documented decision
@ -161,7 +167,7 @@ Behavioural correctness is confirmed on staging.
---
## CI pipeline
### CI pipeline
```
push to main
@ -178,3 +184,27 @@ promote to production
Manual gates are intentional. Automated tests prove correctness in isolation;
a human confirms the change is safe to promote.
---
## Consequences
Drawn from the limitations and trade-offs already stated above:
- The Molecule idempotency step is non-negotiable; every role must pass it cleanly
(Three testing levels — Level 1).
- A class of capabilities (nftables rule loading, NetBird mesh data plane,
unattended-upgrades behaviour, OPNsense DHCP, Avahi mDNS reflection, hardware
passthrough, corosync cluster formation) cannot be verified in Molecule and is
validated only at Level 2 (staging) or Level 3 (external) — a conscious,
documented decision, not a gap (What Molecule tests — and what it does not).
- The project builds and hosts its own `molecule-debian13` image rather than relying
on an external Docker Hub image (e.g. geerlingguy), accepting the maintenance of a
custom image to avoid drift, disappearance, or unexpected changes outside project
control (Molecule test image).
- Level 4 service-UI acceptance is authorable now but its execution is deferred,
pending `ubongo`, the `playwright` plugin, Authentik, and a staging deploy (Three
testing levels — Level 4).
- Promotion to staging and to production stays behind intentional manual approval
gates; automation proves isolated correctness, a human confirms promotion safety
(CI pipeline).
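Molecule's idempotence step enforces the non-negotiable rule itself: it converges a second time and fails if any task reports a change. The criterion it applies can be illustrated against Ansible's `PLAY RECAP` output. This is a sketch with a hypothetical helper, not a replacement for the Molecule step.

```python
import re

# A PLAY RECAP line, e.g. "instance : ok=12  changed=0  unreachable=0  failed=0"
RECAP = re.compile(r"^(?P<host>\S+)\s*:\s*ok=\d+\s+changed=(?P<changed>\d+)", re.MULTILINE)


def non_idempotent_hosts(second_run_output: str) -> list[str]:
    """Hosts whose second converge still reported changed tasks."""
    return [m["host"] for m in RECAP.finditer(second_run_output) if int(m["changed"]) > 0]
```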

@ -1,5 +1,9 @@
# ADR-009 — Terraform ↔ Ansible provisioning handoff
## Status
Accepted (2026-05-30)
## Context
Two tools touch every managed host. Terraform owns **what exists** — VMs on
@ -14,7 +18,9 @@ the cloud-init template that VMs are cloned from. This ADR covers how they conne
---
## The boundary
## Decision
### The boundary
| Layer | Tool | Notes |
|---|---|---|
@ -31,7 +37,7 @@ below).
---
## The handoff pipeline
### The handoff pipeline
There is one path by which a managed host comes into existence and reaches its
configured state:
@ -55,7 +61,7 @@ this pipeline — **never** by hand-editing the inventory.
---
## The data contract
### The data contract
The seam's interface is a single Terraform output consumed by a single script.
@ -88,7 +94,7 @@ Terraform, and the inventory is regenerated, never edited.
---
## Cloud-init's role
### Cloud-init's role
Cloud-init is the thin first-boot layer between Terraform and Ansible:
@ -103,7 +109,7 @@ The line is sharp: cloud-init buys *reachability*, Ansible owns *configuration*.
---
## Internal DNS — owned by Ansible, no chicken-and-egg
### Internal DNS — owned by Ansible, no chicken-and-egg
Terraform writes **no** DNS records. The internal zone (`boma.baobab.band`) is
rendered entirely by the Ansible `dns` role:
@ -129,7 +135,7 @@ convention only — it no longer implies any difference in how records are writt
---
## The control-node exception
### The control-node exception
The control node — the host that runs Terraform and Ansible — is `ubongo`, a
dedicated **physical** machine outside the cluster. It is not a VM at all, so
@ -146,7 +152,7 @@ Every other host is Terraform-managed.
---
## What was ruled out
### What was ruled out
| Option | Reason |
|---|---|
@ -154,3 +160,25 @@ Every other host is Terraform-managed.
| Hand-editing the generated inventory | `hosts.yml` is a build artifact of `tf_to_inventory.py`; edits are overwritten on the next `make tf-inventory`. Edit `local.vms` instead. |
| Documenting the seam in both ADR-005 and ADR-006 | The boundary belongs in exactly one place. Those ADRs link here. |
| Terraform-managed DNS records (`hashicorp/dns` + RFC 2136) | Created a bootstrap cycle (the first DNS server can't register itself) and split DNS ownership across two tools. Ansible owns the whole internal zone instead — one owner, no cycle. |
## Consequences
Drawn from the boundary, the data contract, and the "What was ruled out" section above:
- Adding a host means editing `local.vms` and running the handoff pipeline; the
generated `hosts.yml` is a build artifact and must never be hand-edited — manual
edits are overwritten on the next `make tf-inventory` (The handoff pipeline; The
data contract; What was ruled out).
- Manual `qm clone` is rejected as a general provisioning path so the inventory and
real infrastructure cannot drift; Terraform is the single way VMs come into
existence (What was ruled out).
- Terraform writes no DNS records: the Ansible `dns` role renders the whole internal
zone from inventory plus `group_vars`, dissolving the bootstrap cycle a
Terraform-managed zone (`hashicorp/dns` + RFC 2136) would create (Internal DNS —
owned by Ansible, no chicken-and-egg; What was ruled out).
- The control node (`ubongo`) is the single documented exception to "Terraform owns
VM existence" — a physical machine provisioned manually and managed by Ansible for
baseline config only; every other host is Terraform-managed (The control-node
exception).
- The seam is documented in exactly one place (this ADR); ADR-005 and ADR-006 link
here rather than restating it (What was ruled out).
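The data contract is one Terraform output, consumed by one script, producing a generated `hosts.yml`. It can be sketched as a pure function from the JSON printed by `terraform output -json vms` to Ansible's YAML-inventory structure (`all` → `children` → group → `hosts`). The real `tf_to_inventory.py` is the authority here. The shape assumed for each `vms` entry (`ip` and `group` keys) is hypothetical, as are the group names in the example. Serialising the result with PyYAML's `yaml.safe_dump` would produce the file; the sketch stops at the dictionary to stay within the standard library.

```python
import json


def vms_to_inventory(vms_json: str) -> dict:
    """Map `terraform output -json vms` to an Ansible YAML-inventory dict.

    Assumes each entry looks like {"ip": "...", "group": "..."} (hypothetical shape).
    """
    vms = json.loads(vms_json)
    children: dict = {}
    for name in sorted(vms):  # sorted, so regeneration is deterministic
        vm = vms[name]
        group = children.setdefault(vm["group"], {"hosts": {}})
        group["hosts"][name] = {"ansible_host": vm["ip"]}
    return {"all": {"children": children}}


if __name__ == "__main__":
    raw = json.dumps({
        "proxy": {"ip": "10.20.0.12", "group": "proxies"},  # hypothetical group names
        "homeassistant": {"ip": "10.20.0.13", "group": "iot_controllers"},
    })
    print(json.dumps(vms_to_inventory(raw), indent=2))
```

Because the function is regenerated from Terraform output on every run, hand edits to its result would not survive, which matches the rule that `hosts.yml` is never edited by hand.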

@ -1,5 +1,9 @@
# ADR-010 — Forgejo integration and CI
## Status
Accepted (2026-05-30)
## Context
boma's git host, container registry, and (planned) CI all run on a self-hosted
@ -20,7 +24,7 @@ held to the same standard as the rest of the repo's secrets.
---
## Decisions
## Decision
### 1. API tokens are managed secrets, least-privilege
@ -75,3 +79,21 @@ later if CI load warrants a separate host. Actions is not yet enabled — see ST
| Terraform Forgejo HTTP state backend | Forgejo's `/raw/` API is read-only; state can't be written there. Local state instead (ADR-006). |
| Admin-scoped automation tokens | Unnecessary privilege; scope to `read:repository` + `read`/`write:package`. |
| Ad-hoc UI/API configuration as the norm | Becomes undocumented drift; codify or document instead. |
---
## Consequences
- The planned CI pipeline (see "CI pipeline (planned)") is trunk-based per ADR-003 /
ADR-008 — `push to main → lint + Molecule → deploy staging → [manual gate] → deploy
production` — running `act_runner` on `ubongo` (or a dedicated runner VM later if CI
load warrants); Actions is not yet enabled, so this remains future work tracked in
STATUS.md.
- Terraform state is not held in Forgejo: its `/raw/` API is read-only, so local state
is used instead (ADR-006; see "What was ruled out").
- Automation tokens are scoped to `read:repository` + `read`/`write:package` rather
than admin, accepting the limits that least-privilege imposes on what automation can
do (see "What was ruled out").
- Instance/repo configuration must be codified or documented rather than changed
ad-hoc, to avoid the undocumented drift `/review-repo` exists to catch (see "What was
ruled out").
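The least-privilege rule for automation tokens is easy to check mechanically. Below is a sketch that assumes the token's scopes are available as a list of Forgejo scope strings. How the scopes are fetched is left out, and the helper name is hypothetical.

```python
# The scopes ADR-010 allows for automation tokens.
ALLOWED_SCOPES = {"read:repository", "read:package", "write:package"}


def excess_scopes(token_scopes: list[str]) -> list[str]:
    """Return the scopes a token holds beyond the allowed set, sorted."""
    return sorted(set(token_scopes) - ALLOWED_SCOPES)


if __name__ == "__main__":
    print(excess_scopes(["read:repository", "write:package"]))  # → []
    print(excess_scopes(["read:repository", "write:admin"]))  # → ['write:admin']
```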

@ -1,6 +1,9 @@
# ADR-011 — Update and upgrade management
**Status: Proposed — draft for discussion (not yet accepted).**
## Status
Proposed (2026-06-04) — draft for discussion; not yet accepted. The core decisions
below are settled in intent, but several specifics remain open (see "Open questions").
## Context
@ -10,7 +13,7 @@ drift over time and must be kept current without breaking the homelab: the **hos
---
## Decisions
## Decision
### 1. Every service is classified stateful or stateless
@ -132,3 +135,19 @@ alert-driven.
| 8-weekly as the only stateful path | Too slow for urgent CVEs — hence the DIUN security fast-path. |
---
## Consequences
- A single uniform update policy is rejected: the stateful/stateless split is
load-bearing, so stateless services roll on rolling tags while stateful services are
pinned `tag@digest`, human-gated, and backup-first (see "What was ruled out").
- The weekly run never touches stateful services and the whole fleet is never updated
at once, accepting the added orchestration of host ordering and an 8-weekly +
fast-path cadence in exchange for bounded blast radius (see "What was ruled out").
- No update automation ships until the health-check verification gate is in order; the
pipeline is deliberately sequenced behind that harness (see Decision 6).
- Several points remain open for discussion (see "Open questions"): where the Proxmox
snapshot is driven from across the TF/Ansible boundary; the exact cadences; where the
health-check harness lives and the minimum bar that counts as "in order"; whether
classification is a per-role `__stateful` flag or a group_vars list; whether the
weekly run hits staging first; and the notification + "skip/pause" control channel.
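ADR-011 moved from an inline `**Status: …**` line to a `## Status` section. That change shows the convention the back-catalogue now follows: a lifecycle state, the date in parentheses, then an optional note. Below is a sketch of a parser that assumes that shape. Which states are valid is ADR-023's call, so the sketch accepts any capitalised word; the function name is hypothetical. An ADR still using the old inline form returns `None`, which is what a conformance check would flag.

```python
import re
from datetime import date
from typing import Optional, Tuple

# "<State> (YYYY-MM-DD)" plus an optional trailing note.
STATUS_LINE = re.compile(r"^(?P<state>[A-Z][a-z]+) \((?P<date>\d{4}-\d{2}-\d{2})\)(?P<note>.*)$")


def parse_status(markdown: str) -> Optional[Tuple[str, date, str]]:
    """Return (state, date, note) from an ADR's '## Status' section, or None."""
    lines = [line.strip() for line in markdown.splitlines()]
    if "## Status" not in lines:
        return None
    following = lines[lines.index("## Status") + 1:]
    body = next((line for line in following if line), "")  # first non-blank line
    match = STATUS_LINE.match(body)
    if not match:
        return None
    # fromisoformat raises ValueError for impossible dates such as 2026-02-30.
    return match["state"], date.fromisoformat(match["date"]), match["note"].strip()
```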

@ -1,5 +1,9 @@
# ADR-012 — Hardware reference & capacity evaluation
## Status
Accepted (2026-06-01)
## Context
The repo modelled the logical/network layer (Terraform VM specs, ADR-007

@ -1,5 +1,9 @@
# ADR-013 — Heritage: learning from AnsibleBaobabV4 without inheriting it
## Status
Accepted (2026-06-04)
## Context
boma is the methodology successor to AnsibleBaobabV4 (and V3 before it) — not a new
@ -10,7 +14,9 @@ structure and assumptions creep back in under the guise of "inspiration." This A
sets the policy for drawing on V4 without inheriting it. (Resolves the questions
previously parked in TODO 3.3 and 10.1.)
## Principle — translate, don't transplant
## Decision
### Principle — translate, don't transplant
V4 is **evidence, never authority.** It can show what was needed or what went wrong;
it can never be the reason boma does something a certain way.
@ -21,7 +27,7 @@ it can never be the reason boma does something a certain way.
- **Acceptance test** for anything V4-derived: *can it be justified purely from
boma's principles, with zero reference to V4?* If not, it does not land.
## What V4 is — and is not — a source of
### What V4 is — and is not — a source of
| Legitimate source of | Never a source of |
|---|---|
@ -33,7 +39,7 @@ it can never be the reason boma does something a certain way.
Only concrete, verifiable, low-level knowledge crosses over — precisely because it is
safe to re-derive, whereas structure and requirements drag assumptions along.
## Provenance — transient only
### Provenance — transient only
When a boma decision was prompted by a V4 lesson, or a config adapted from V4, the
lineage is recorded only in **transient** places: the commit message, the working
@ -42,7 +48,7 @@ extraction warrants one. **Durable artifacts (ADRs, role READMEs, `SECURITY.md`)
stand on boma's own terms with no V4 reference.** Honest about lineage in history;
clean in the living repo.
## AI consultation guardrails
### AI consultation guardrails
The AI is the main consumer of V4 — it is on disk and readable. When consulting it:

@ -1,5 +1,9 @@
# ADR-014 — Sourcing technical knowledge (docs and best practices)
## Status
Accepted (2026-06-04)
## Context
Most work in boma is done by AI agents drawing on training memory, which is stale
@ -100,5 +104,27 @@ above keeps the policy working.
- Commit to the principle, not a tool — degrade to `WebFetch`/`WebSearch` when plugins
are absent.
See also: ADR-013 (heritage / translate-don't-transplant), ADR-011 (version pinning),
ADR-008 (testing/verification).
## Consequences
Drawn from the follow-on work and limitations this ADR already states:
- Verified facts carry a durable, greppable stamp; a stamp binds a fact to a pinned
version, so a `requirements` change or image upgrade marks exactly what to re-check
(per Capture / Re-verification).
- Stale-stamp detection — a `/review-repo` or `/security-review` check flagging stamps
whose recorded version no longer matches what is pinned — is a noted enhancement, not
built yet (per Re-verification).
- Any version-specific claim given from memory must be marked "from memory, unverified"
as a transparency backstop, since agent self-assessed certainty is unreliable (per
When consulting is required).
- The policy commits to the principle rather than a specific plugin, so it degrades to
`WebFetch`/`WebSearch` on a bare install; reproducing the plugin toolchain from the
repo is done via `.claude/settings.json` and `docs/runbooks/claude-code-setup.md`,
with the graceful-degradation fallback covering a fresh clone until bootstrap runs
(per Source hierarchy / Reproducibility of the toolchain).
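ADR-014 lists stale-stamp detection as a noted enhancement that is not built yet. Below is a minimal sketch of the comparison it could make. It assumes stamps follow the `verified: <subject> · <tool> <version> · <source> · <YYYY-MM-DD>` shape from the ADR template, and that pinned versions are available as a `{tool: version}` map. Both `stale_stamps` and that map are hypothetical.

```python
import re

# Hypothetical stamp shape, taken from the ADR template's Verified-facts hint:
#   verified: <subject> · <tool> <version> · <source> · <YYYY-MM-DD>
STAMP_RE = re.compile(
    r"verified:\s*(?P<subject>[^·\n]+?)\s*·\s*(?P<tool>\S+)\s+(?P<version>\S+)"
    r"\s*·\s*(?P<source>[^·\n]+?)\s*·\s*(?P<date>\d{4}-\d{2}-\d{2})"
)


def stale_stamps(text, pinned):
    """Return (subject, tool, stamped_version, pinned_version) for every stamp
    whose tool is pinned to a different version. `pinned` maps tool -> version
    (a hypothetical input, e.g. assembled from requirements files)."""
    stale = []
    for m in STAMP_RE.finditer(text):
        current = pinned.get(m["tool"])
        # Unpinned tools are ignored; only a version mismatch counts as stale.
        if current is not None and current != m["version"]:
            stale.append((m["subject"], m["tool"], m["version"], current))
    return stale
```

Tools missing from the pinned map are skipped rather than flagged. That keeps the check quiet about stamps it cannot judge.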
## Related
- ADR-013 — heritage / translate-don't-transplant.
- ADR-011 — version pinning.
- ADR-008 — testing / verification.

@ -1,5 +1,9 @@
# ADR-015 — Control / development / AI-worker host (`ubongo`)
## Status
Accepted (2026-06-05)
## Context
Earlier ADRs framed the control node — the host that runs Terraform and Ansible —

@ -90,7 +90,7 @@ allocated for it.
## Status
Designed, not built — depends on the unbuilt `base` role and service-role machinery
Accepted (2026-06-05). Designed, not built — depends on the unbuilt `base` role and service-role machinery
(STATUS.md). This ADR records the decision and doc reconciliation; role tasks land when
`base` exists.
@ -108,3 +108,22 @@ Designed, not built — depends on the unbuilt `base` role and service-role mach
See also: ADR-007 (network — amended), ADR-015 (control host), ADR-002 (security),
ADR-011 (version pinning), ADR-004 (one service = one role), ADR-009 (TF↔Ansible
handoff), ADR-013 (heritage — V4 ran WireGuard; NetBird is translated, not transplanted).
## Consequences
- A new public surface appears on `askari` — management API + dashboard (80/443) +
Coturn (3478) — mitigated by TLS, embedded-IdP login, source-IP limits where
practical, `base` hardening and version-pinned NetBird, and recorded as accepted-risk
R3 (Security).
- On-LAN SSH never depends on the mesh: `base` allows inbound SSH from `ubongo`'s LAN
address as a mesh-independent secondary path, so a mesh/coordinator outage never
blocks on-LAN SSH and Ansible stays off the mesh (Security; Recovery & operations).
- The mesh survives a homelab outage because the coordinator is off-site on `askari`,
with its management datastore backed up encrypted off `askari` and peers keeping
last-known config through a brief coordinator outage (Recovery & operations).
- Choosing NetBird over plain OPNsense WireGuard, Tailscale, Tailscale+Headscale, an
on-cluster coordinator, a `ubongo` subnet router, and a standalone IdP gains
identity/ACL policy, self-hosted sovereignty, no routing SPOF, and a light
single-operator footprint (What was ruled out).
- Implementation is pending: the role tasks land only once the unbuilt `base` role and
service-role machinery exist (Status).

@ -65,7 +65,7 @@ them.
## Status
Designed. **Authorable now:** this ADR, the ADR-008 Level 4 expansion, the `VERIFY.md`
Accepted (2026-06-05). Designed. **Authorable now:** this ADR, the ADR-008 Level 4 expansion, the `VERIFY.md`
template, the `/verify-service` skill, the convention/checklist/Further-reading edits,
`.gitignore`/dir, STATUS/TODO. **Running is deferred** on its dependencies.
@ -90,3 +90,21 @@ template, the `/verify-service` skill, the convention/checklist/Further-reading
See also: ADR-008 (testing — expanded), ADR-015 (control host), ADR-002 (security),
ADR-004 (`VERIFY.md` parallels `SECURITY.md`), ADR-013/014 (heritage / knowledge sourcing).
## Consequences
- The harness is confined to staging by a hard stop: it refuses to run against
production because exploratory clicking is destructive, the blast radius is bounded to
the target service, and test users live only in the staging `test` group (Safety).
- No secrets leak: the git-ignored screenshot dir is the safety boundary and credential
screens are avoided (Safety; Reporting & manual handoff).
- Test identities are ephemeral per-run credentials in the staging Authentik only —
never production, none persisted in `vault.yml` — created reuse-or-create and torn
down via staging rebuild or `test`-group cleanup (Test-user standard).
- Anything Claude cannot exercise (physical device, paid/external flow, subjective
judgment) is handed off via a structured manual-test checklist in the run report
(Reporting & manual handoff).
- Authoring is possible now (this ADR, the `VERIFY.md` template, the `/verify-service`
skill, conventions/checklist edits), but running is deferred on its dependencies:
`ubongo`, the `playwright` plugin, Authentik, a staging deploy, and `make new-role`
scaffolding `VERIFY.md` (Status; Dependencies).

@ -72,7 +72,7 @@ tracked allocation in `docs/hardware/reference.md` (ADR-012).
## Status
Designed. **Authorable now:** this ADR + the ADR-002/CAPABILITIES/ADR-012/
Accepted (2026-06-06). Designed. **Authorable now:** this ADR + the ADR-002/CAPABILITIES/ADR-012/
accepted-risks/STATUS/TODO reconciliations. **Deferred on the stack:** Alloy-in-`base`,
the `loki`/`grafana` service roles, OPNsense syslog config, the push-only credential,
and the live pipeline.
@ -97,3 +97,26 @@ the metrics stack (Prometheus / `node_exporter`) for SSD-wearout + log-silence a
See also: ADR-002 (security baseline — realised here), ADR-016 (mesh / `askari`),
ADR-007 (OPNsense / `askari`), ADR-012 (hardware/capacity), ADR-004 (service-role
standard), ADR-011 (health checks — distinct from this).
## Consequences
- Opportunistic track-covering and host-pivot-to-store are defeated because logs leave
the host in near-real-time and the off-cluster security trail is append-only, so it
survives full-cluster compromise (Security, integrity & residual risks).
- Conscious residuals remain: append-only is not cryptographic WORM (root-on-`askari`
could edit chunks — R4); there is a few-seconds un-shipped window; agent compromise
can stop future shipping but not alter shipped history; a stolen push credential
appends noise but cannot delete; and an `askari` outage buffers then flushes on
reconnect (Security, integrity & residual risks).
- A host going silent is itself an alert (Security, integrity & residual risks).
- Only a bounded security subset ships off-site — `auditd`, `authpriv`, `fail2ban`,
AIDE, Suricata and key container security events tagged `security="true"` — while the
cluster Loki holds everything, keeping off-site volume small (Data flow & the security
subset).
- Disk-wear is a managed parameter: log storage on NVMe/SSD or HDD, never SD/USB flash,
bounded verbosity at source, tuned Loki retention/compaction, and monitored SSD
wearout/TBW with an alert; log storage is a tracked allocation in
`docs/hardware/reference.md` (Retention & disk-wear).
- The decision is authorable now but the live pipeline is deferred on the stack:
Alloy-in-`base`, the `loki`/`grafana` service roles, OPNsense syslog config, and the
push-only credential (Status; Dependencies).

@ -0,0 +1,106 @@
# ADR-023 — ADR structure & lifecycle
## Status
Accepted (2026-06-10). Meta/doctrine ADR — pins how ADRs are written; the
`adr-structure` check (`scripts/repo-scan.py`) and `docs/decisions/adr-template.md`
ship with it, and ADRs 001–018 were retroactively restructured to conform. Resolves
the FRICTION signal (2026-05-31) about ADR-writing policy being unsettled.
## Context
boma records architectural decisions as numbered ADRs in `docs/decisions/`, and
CLAUDE.md treats them as load-bearing. Yet no ADR said how an ADR is written. The
newest ADRs (019–022) converged on a clean shape — Status → Context → Decision →
Consequences → Related — but only by imitation. ADRs 001–018 predate it and drifted
widely: most lacked a `## Status` section entirely (016–018 carried only a trailing
build-state note), and many lacked an explicit `## Decision` or `## Consequences`
heading, their decisions spread across ad-hoc topical sections. The result was
structural drift and no uniform way to tell an active decision from a superseded or
deprecated one.
## Decision
### 1. Title & filename
Title line: `# ADR-NNN — <Title>: <optional clarifying subtitle>` (em-dash). Filename:
`NNN-kebab-title.md`, zero-padded 3-digit, monotonic, never reused — a superseded ADR
keeps its number and file. A new ADR is registered as a row in the CLAUDE.md
"Further reading" table.
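The filename rule above (zero-padded, monotonic, never reused) can be made concrete with a small helper. This is only a sketch: `next_adr_filename` and `kebab` are hypothetical and not part of the repo. Real filenames are also hand-shortened (`023-adr-structure.md`, not the full title slug), so treat the generated slug as a starting point.

```python
import re

ADR_NAME_RE = re.compile(r"^(\d{3})-.*\.md$")


def kebab(title):
    """Lower-case the title and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def next_adr_filename(existing, title):
    """Next NNN-kebab-title.md: one past the highest number present. Because a
    superseded ADR keeps its file, the highest number is never freed up."""
    numbers = [int(m.group(1)) for name in existing if (m := ADR_NAME_RE.match(name))]
    nxt = max(numbers, default=0) + 1
    return f"{nxt:03d}-{kebab(title)}.md"


print(next_adr_filename(
    ["001-architecture.md", "022-backup.md", "023-adr-structure.md", "adr-template.md"],
    "Secrets rotation cadence"))  # 024-secrets-rotation-cadence.md
```

The helper takes max + 1 rather than filling gaps, which is what keeps numbers from being reused. This only holds as long as ADR files are never deleted.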
### 2. Mandatory sections, in this order
- `## Status` — a lifecycle line, usually `Accepted (YYYY-MM-DD)` (see §4), plus an
optional one-line note.
- `## Context` — the forces, the problem, what exists today, why now.
- `## Decision` — what we are doing; numbered sub-decisions for multi-part ADRs.
- `## Consequences` — results, trade-offs explicitly accepted, follow-on work.
### 3. Optional sections (use only where they genuinely apply)
`## Related`, `## Scope`, `## Guardrails` / `## Enforcement`, `## What was ruled out`,
`## Verified facts (ADR-014)`.
### 4. Status lifecycle
Four states. Because boma is single-contributor and trunk-based with no review gate,
most ADRs are **born `Accepted (YYYY-MM-DD)`** — committed-to on writing. A
**`Proposed`** state exists for a genuine draft whose core direction is recorded but
whose specifics are still open for discussion (e.g. ADR-011); it is promoted to
`Accepted` once settled.
- **`Proposed (YYYY-MM-DD)`** — drafted, under discussion, not yet committed-to. May
carry open questions. Promoted to `Accepted (YYYY-MM-DD)` when decided.
- **`Accepted (YYYY-MM-DD)`** — committed-to. The common starting state.
- Replaced → old ADR's Status becomes **`Superseded by ADR-NNN (YYYY-MM-DD)`**; the new
ADR records `Supersedes ADR-MMM` in its Status and `## Related`. The link is
**bidirectional**.
- Retired with no replacement → **`Deprecated (YYYY-MM-DD)`** + a one-line reason.
**No silent rewrites.** An Accepted ADR is not edited to reverse its decision. Typo and
clarity fixes are fine; a material reversal requires a new ADR and a `Superseded by`
marker on the old one.
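§4 requires the Superseded/Supersedes link to be bidirectional, while §5 has the pre-scan check only section presence and a parseable Status. If that link ever needed checking, a minimal sketch could look like the one below. The helper name and its `{number: first Status line}` input are assumptions. The sketch inspects Status only, not `## Related`, and the ADR numbers in the example are invented.

```python
import re

SUPERSEDED_BY_RE = re.compile(r"^Superseded by ADR-(\d{3})")
SUPERSEDES_RE = re.compile(r"Supersedes ADR-(\d{3})")


def one_way_supersessions(statuses):
    """statuses: {"NNN": first Status line}. Return (old, new) pairs where the
    old ADR says "Superseded by ADR-new" but the new ADR's Status does not say
    "Supersedes ADR-old", i.e. links that are not bidirectional."""
    broken = []
    for old, text in sorted(statuses.items()):
        m = SUPERSEDED_BY_RE.match(text)
        if not m:
            continue
        new = m.group(1)
        # A missing new ADR yields "" and therefore no back-reference.
        if old not in SUPERSEDES_RE.findall(statuses.get(new, "")):
            broken.append((old, new))
    return broken


example = {
    "005": "Superseded by ADR-030 (2026-07-01)",
    "030": "Accepted (2026-07-01). Supersedes ADR-005.",
    "007": "Superseded by ADR-031 (2026-07-02)",
    "031": "Accepted (2026-07-02).",
}
print(one_way_supersessions(example))  # [('007', '031')]
```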
### 5. Template & enforcement
`docs/decisions/adr-template.md` is the scaffold for new ADRs. The `/review-repo`
command's pre-scan (`scripts/repo-scan.py`) emits an `adr-structure` finding for any
numbered ADR missing a mandatory section or with an unparseable Status line. It checks
**presence and Status, not section order** — order is a convention the template carries,
deliberately not gated, to keep enforcement lightweight (consistent with boma's other
doctrine ADRs adding no CI gate).
### 6. Retroactive conformance of the back-catalogue
ADRs 001–018 are restructured to satisfy this standard rather than grandfathered. The
restructure is **presentational** — existing headings are relabelled, regrouped, or
demoted under a `## Decision` umbrella; a dated `## Status` is added; a `## Consequences`
section is assembled from implications the ADR already states. **The substance of no
decision is changed.** This keeps the check uniform (no number threshold) and the corpus
a consistent, legible decision history.
## Consequences
- New ADRs have one obvious shape and a scaffold; structural drift stops.
- Every ADR declares its lifecycle state uniformly, and reversals are traceable.
- The whole corpus conforms; the check needs no grandfathering and stays simple.
- One-time restructure churn across ADRs 001–018 (heading reorganization + a Status and
a Consequences section per file; no decision substance changed).
- `/review-repo` grows one deterministic check; no new CI machinery.
- This ADR is the first conformant example and is held to its own check.
## What was ruled out
- **A `make lint` / CI gate for ADR structure** — heavier than the risk warrants;
the `/review-repo` check and the template suffice.
- **Machine-enforcing section order** — brittle for marginal value; left as a
template-demonstrated convention.
- **Grandfathering 001–018 from the check** — rejected in favour of restructuring the
whole corpus to conform, so the standard applies uniformly with no exceptions.
## Related
- ADR-014 — knowledge sourcing (the `Verified facts` optional section).
- ADR-019/020/021/022 — the emergent structure this ADR codifies.
- `docs/decisions/adr-template.md` — the scaffold.
- `scripts/repo-scan.py` — the `adr-structure` enforcement check.

@ -0,0 +1,40 @@
# ADR-NNN — <Title>: <optional clarifying subtitle>
<!-- Filename: NNN-kebab-title.md (zero-padded, monotonic, never reused).
Register a row in CLAUDE.md "Further reading" when this ADR is created.
Sections below in order. Mandatory: Status, Context, Decision, Consequences.
Delete this comment and any optional section you don't use. -->
## Status
Accepted (YYYY-MM-DD)
<!-- Lifecycle: usually born "Accepted (YYYY-MM-DD)"; use "Proposed (YYYY-MM-DD)" for a
genuine draft (open questions), promoted to Accepted once settled. Later:
"Superseded by ADR-NNN (YYYY-MM-DD)" or "Deprecated (YYYY-MM-DD)" + one-line why.
Optional trailing note OK, e.g.
"Accepted (2026-06-10). Doctrine ADR — pins policy, builds nothing yet." -->
## Context
<!-- The forces, the problem, what exists today, why now. -->
## Decision
<!-- What we are doing. Use numbered sub-decisions (### 1. ...) for multi-part ADRs. -->
## Consequences
<!-- Results, trade-offs explicitly accepted, follow-on work. -->
<!-- Optional sections — uncomment any that genuinely apply; never pad:
## Scope — explicit in / out-of-scope boundaries.
## Guardrails — how the decision is mechanically enforced (lint, CI, hooks).
## What was ruled out — rejected alternatives, each with its reason.
## Verified facts (ADR-014) — verified: <subject> · <tool> <version> · <source> · <YYYY-MM-DD>
## Related — links to other ADRs by number; bidirectional for Supersedes/Superseded-by.
-->

@ -0,0 +1,556 @@
# ADR Structure & Lifecycle Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Codify how boma's ADRs are structured — a canonical section set, a Proposed/Accepted/Superseded/Deprecated lifecycle, a template, a lightweight enforcement check, and a one-time presentational restructure of the back-catalogue.
**Architecture:** Five independent units. (1) A pure-function `adr-structure` check added to the existing `scripts/repo-scan.py` (stdlib only, pytest-tested like its siblings), verifying every numbered ADR has the four mandatory sections and a parseable Status line — presence only, not order. (2) An `adr-template.md` scaffold. (3) ADR-023 itself, written to pass its own check. (4) Wiring into CLAUDE.md and the `/review-repo` command doc. (5) A presentational restructure of ADRs 001–018 to all-four-section conformance, with each new `## Status` dated from the file's first git commit.
**Tech Stack:** Python 3 stdlib (`scripts/repo-scan.py`), pytest (`.venv/bin/pytest`), Markdown, git.
**Spec:** `docs/superpowers/specs/2026-06-10-adr-structure-design.md`
**Branch:** `feat/adr-structure` (already created; the design spec is the first commit).
**Convention reminders (from CLAUDE.md):** docs-/script-only commits skip the ansible-lint pre-commit hook and need no `rbw` unlock. Imperative subject ≤72 chars. `Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>` trailer on every commit.
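Unit (5) dates each new Status from the file's first git commit. One way to get that date is sketched below. The helper names are hypothetical and not part of `repo-scan.py`, and the `%as` short-date placeholder needs a reasonably recent git.

```python
import subprocess


def oldest_date(git_log_output):
    """git log prints newest first, so the last date listed is the oldest."""
    dates = git_log_output.split()
    return dates[-1] if dates else None


def first_commit_date(path):
    """Author date (YYYY-MM-DD) of the commit that added `path`.
    --diff-filter=A keeps adding commits only; --follow tracks renames."""
    result = subprocess.run(
        ["git", "log", "--follow", "--diff-filter=A", "--format=%as", "--", path],
        check=True, capture_output=True, text=True,
    )
    return oldest_date(result.stdout)


def status_line(date, state="Accepted"):
    """Render a lifecycle line; pass state="Proposed" for a genuine draft."""
    return f"{state} ({date})"
```

Inside a checkout, `status_line(first_commit_date("docs/decisions/001-architecture.md"))` would yield the line to place under `## Status`.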
---
## Decisions locked by the spec (do not re-litigate)
- **Mandatory sections, in this order:** `## Status`, `## Context`, `## Decision`, `## Consequences`.
- **Optional sections:** `## Related`, `## Scope`, `## Guardrails` / `## Enforcement`, `## What was ruled out`, `## Verified facts (ADR-014)`.
- **Status lifecycle (4 states):** `Proposed (YYYY-MM-DD)` (genuine drafts, e.g. ADR-011) → `Accepted (YYYY-MM-DD)` (the common starting state) → optionally `Superseded by ADR-NNN (YYYY-MM-DD)` or `Deprecated (YYYY-MM-DD)`. (`Proposed` was added on the evidence of ADR-011, which is a real draft with open questions.)
- **No silent rewrites:** material reversal = new ADR + `Superseded by` marker; bidirectional link.
- **Enforcement checks presence + parseable Status line, NOT section order.** Order is demonstrated by the template, not machine-enforced.
- **Back-catalogue is fully restructured (no grandfathering)** — ADRs 001–018 are brought to all-four-section conformance. The restructure is **presentational**: relabel/regroup/demote existing headings, add a dated Status, assemble a Consequences section from implications the ADR already states. **The substance of no decision is changed.** If a faithful Consequences cannot be drawn from existing content, escalate that file rather than inventing one.
---
## Task 1: `adr-structure` check in repo-scan.py
**Files:**
- Modify: `scripts/repo-scan.py` (add module-level regexes near the other `_RE` definitions ~lines 38–44; add `adr_structure_findings()` next to `deferred_findings()` ~line 96; wire it into `scan()` at the `findings.extend(...)` site ~line 215)
- Test: `tests/test_repo_scan.py` (new)
- [ ] **Step 1: Write the failing test**
Create `tests/test_repo_scan.py`:
```python
import importlib.util
import pathlib

# repo-scan.py has a hyphen in its filename, so load it by path rather than import.
_PATH = pathlib.Path(__file__).resolve().parent.parent / "scripts" / "repo-scan.py"
_spec = importlib.util.spec_from_file_location("repo_scan", _PATH)
rs = importlib.util.module_from_spec(_spec)
_spec.loader.exec_module(rs)

# A minimal ADR that satisfies the check; each test mutates a copy of it.
GOOD = [
    "# ADR-099 — Example\n", "\n",
    "## Status\n", "\n", "Accepted (2026-06-10)\n", "\n",
    "## Context\n", "\n", "Why.\n", "\n",
    "## Decision\n", "\n", "What.\n", "\n",
    "## Consequences\n", "\n", "So what.\n",
]


def _checks(findings):
    return [f for f in findings if f["check"] == "adr-structure"]


def test_good_adr_has_no_findings():
    out = rs.adr_structure_findings({"docs/decisions/099-example.md": GOOD})
    assert _checks(out) == []


def test_missing_mandatory_section_is_flagged():
    lines = [ln for ln in GOOD if not ln.startswith("## Consequences")]
    out = _checks(rs.adr_structure_findings({"docs/decisions/099-example.md": lines}))
    assert len(out) == 1
    assert "Consequences" in out[0]["detail"]


def test_unparseable_status_is_flagged():
    lines = [("Designed, not built.\n" if ln == "Accepted (2026-06-10)\n" else ln)
             for ln in GOOD]
    out = _checks(rs.adr_structure_findings({"docs/decisions/099-example.md": lines}))
    assert len(out) == 1
    assert "Status not parseable" in out[0]["detail"]


def test_superseded_status_is_accepted():
    lines = [("Superseded by ADR-100 (2026-06-11)\n" if ln == "Accepted (2026-06-10)\n"
              else ln) for ln in GOOD]
    out = _checks(rs.adr_structure_findings({"docs/decisions/099-example.md": lines}))
    assert out == []


def test_non_numbered_file_is_skipped():
    bare = ["# ADR template\n", "\n", "## Status\n", "\n", "<!-- hint -->\n"]
    out = _checks(rs.adr_structure_findings({"docs/decisions/adr-template.md": bare}))
    assert out == []
```
- [ ] **Step 2: Run the test to verify it fails**
Run: `.venv/bin/pytest tests/test_repo_scan.py -q`
Expected: FAIL — `AttributeError: module 'repo_scan' has no attribute 'adr_structure_findings'`.
- [ ] **Step 3: Add the regexes**
In `scripts/repo-scan.py`, after the `RESOLVE_WORD_RE = ...` line (~line 44), add:
```python
# ADR-structure check (ADR-023): numbered ADRs must carry the four mandatory
# sections and a parseable Status line. Presence only — section ORDER is a
# template-demonstrated convention, not machine-enforced.
ADR_FILE_RE = re.compile(r"^\d{3}-.*\.md$")
ADR_REQUIRED_SECTIONS = ("Status", "Context", "Decision", "Consequences")
# All four ADR-023 lifecycle states; omitting Proposed would flag ADR-011.
ADR_STATUS_LINE_RE = re.compile(
    r"^(Proposed \(\d{4}-\d{2}-\d{2}\)"
    r"|Accepted \(\d{4}-\d{2}-\d{2}\)"
    r"|Superseded by ADR-\d{3}"
    r"|Deprecated \(\d{4}-\d{2}-\d{2}\))")
```
- [ ] **Step 4: Add the check function**
In `scripts/repo-scan.py`, immediately after the `deferred_findings(...)` function (it ends ~line 96, just before `def walk_files():`), add:
```python
def adr_structure_findings(adr_files):
    """adr_files: {rel_path: [lines]} for docs/decisions/*.md.

    Flags numbered ADRs (NNN-*.md) missing a mandatory section or whose Status
    section has no parseable lifecycle line. Non-numbered files (e.g.
    adr-template.md) are skipped. Section order is NOT checked (ADR-023)."""
    out = []
    for rpath, lines in sorted(adr_files.items()):
        if not ADR_FILE_RE.match(os.path.basename(rpath)):
            continue
        # Map the first word of each level-2 heading to its first line index.
        headings = {}
        for i, line in enumerate(lines):
            m = re.match(r"^##\s+(\w+)", line)
            if m:
                headings.setdefault(m.group(1), i)
        missing = [s for s in ADR_REQUIRED_SECTIONS if s not in headings]
        if missing:
            out.append({"check": "adr-structure", "severity": "medium",
                        "path": rpath, "line": 1,
                        "detail": f"missing mandatory section(s): {', '.join(missing)}"})
        if "Status" in headings:
            # Collect the Status body up to the next level-2 heading.
            body = []
            for line in lines[headings["Status"] + 1:]:
                if line.startswith("## "):
                    break
                body.append(line)
            # The first non-blank line must carry the lifecycle state.
            status_text = next((ln.strip() for ln in body if ln.strip()), "")
            if not ADR_STATUS_LINE_RE.match(status_text):
                out.append({"check": "adr-structure", "severity": "medium",
                            "path": rpath, "line": headings["Status"] + 1,
                            "detail": "Status not parseable (want 'Proposed (YYYY-MM-DD)', "
                                      "'Accepted (YYYY-MM-DD)', 'Superseded by ADR-NNN', "
                                      "or 'Deprecated (YYYY-MM-DD)'); "
                                      f"got: {status_text[:60]!r}"})
    return out
```
- [ ] **Step 5: Run the test to verify it passes**
Run: `.venv/bin/pytest tests/test_repo_scan.py -q`
Expected: PASS — 5 passed.
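As a quick standalone sanity check, the sketch below shows which Status lines the lifecycle regex should accept. The pattern is restated here with all four ADR-023 states, including `Proposed`, which ADR-011 uses. The sample dates and ADR numbers are invented.

```python
import re

# Restated for illustration; mirrors the four states ADR-023 §4 allows.
STATUS_RE = re.compile(
    r"^(Proposed \(\d{4}-\d{2}-\d{2}\)"
    r"|Accepted \(\d{4}-\d{2}-\d{2}\)"
    r"|Superseded by ADR-\d{3}"
    r"|Deprecated \(\d{4}-\d{2}-\d{2}\))")

SAMPLES = {
    "Proposed (2026-05-30)": True,
    "Accepted (2026-06-05). Designed, not built.": True,  # trailing note is fine
    "Superseded by ADR-024 (2026-07-01)": True,
    "Deprecated (2026-07-01). No longer needed.": True,
    "Designed, not built.": False,  # the old trailing build-state note style
    "accepted (2026-06-05)": False,  # the match is case-sensitive
}

for text, expected in SAMPLES.items():
    assert bool(STATUS_RE.match(text)) is expected, text
```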
- [ ] **Step 6: Wire the check into `scan()`**
In `scripts/repo-scan.py`, find (~line 215):
```python
findings.extend(deferred_findings(adr_files, defer_refs))
return findings
```
Replace with:
```python
findings.extend(deferred_findings(adr_files, defer_refs))
findings.extend(adr_structure_findings(adr_files))
return findings
```
- [ ] **Step 7: Confirm the check fires on the real (not-yet-backfilled) repo**
Run: `python3 scripts/repo-scan.py 2>/dev/null | python3 -c "import json,sys; print(sorted({f['path'] for f in json.load(sys.stdin)['findings'] if f['check']=='adr-structure'}))"`
Expected: a list including `docs/decisions/001-architecture.md` … through `018-logging.md` (001–015 missing Status; 016–018 unparseable Status). 019–022 and 023 must NOT appear. This proves the check works and previews Task 5's worklist.
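When the worklist gets long, a more readable alternative to the one-liner is to group the findings per file. The sketch below assumes the report shape that the one-liner already relies on (`{"findings": [{"check", "path", "detail"}, ...]}`). The helper names are hypothetical.

```python
from collections import defaultdict


def adr_findings_by_path(report):
    """Group adr-structure findings by file, sorted by path."""
    grouped = defaultdict(list)
    for finding in report.get("findings", []):
        if finding.get("check") == "adr-structure":
            grouped[finding["path"]].append(finding["detail"])
    return dict(sorted(grouped.items()))


def format_findings(grouped):
    """One path per line, each followed by its indented details."""
    lines = []
    for path, details in grouped.items():
        lines.append(path)
        lines.extend(f"  - {d}" for d in details)
    return "\n".join(lines)
```

Fed from the same pipe via `format_findings(adr_findings_by_path(json.load(sys.stdin)))`, this prints each ADR once with all of its problems underneath.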
- [ ] **Step 8: Commit**
```bash
git add scripts/repo-scan.py tests/test_repo_scan.py
git commit -m "feat(review): add adr-structure check to repo-scan
Flags numbered ADRs missing a mandatory section (Status/Context/Decision/
Consequences) or with an unparseable Status line. Presence only, not order.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>"
```
---
## Task 2: ADR template
**Files:**
- Create: `docs/decisions/adr-template.md`
- [ ] **Step 1: Write the template**
Create `docs/decisions/adr-template.md` with exactly:
```markdown
# ADR-NNN — <Title>: <optional clarifying subtitle>
<!-- Filename: NNN-kebab-title.md (zero-padded, monotonic, never reused).
Register a row in CLAUDE.md "Further reading" when this ADR is created.
Sections below in order. Mandatory: Status, Context, Decision, Consequences.
Delete this comment and any optional section you don't use. -->
## Status
Accepted (YYYY-MM-DD)
<!-- Lifecycle: usually born "Accepted (YYYY-MM-DD)"; use "Proposed (YYYY-MM-DD)" for a
genuine draft (open questions), promoted to Accepted once settled. Later:
"Superseded by ADR-NNN (YYYY-MM-DD)" or "Deprecated (YYYY-MM-DD)" + one-line why.
Optional trailing note OK, e.g.
"Accepted (2026-06-10). Doctrine ADR — pins policy, builds nothing yet." -->
## Context
<!-- The forces, the problem, what exists today, why now. -->
## Decision
<!-- What we are doing. Use numbered sub-decisions (### 1. ...) for multi-part ADRs. -->
## Consequences
<!-- Results, trade-offs explicitly accepted, follow-on work. -->
<!-- Optional sections — uncomment any that genuinely apply; never pad:
## Scope — explicit in / out-of-scope boundaries.
## Guardrails — how the decision is mechanically enforced (lint, CI, hooks).
## What was ruled out — rejected alternatives, each with its reason.
## Verified facts (ADR-014) — verified: <subject> · <tool> <version> · <source> · <YYYY-MM-DD>
## Related — links to other ADRs by number; bidirectional for Supersedes/Superseded-by.
-->
```
(HTML comments do not nest — optional sections use one flat comment block with inline
em-dash descriptions, not commented sub-hints inside an outer comment.)
- [ ] **Step 2: Confirm the template is skipped by the check**
Run: `python3 scripts/repo-scan.py 2>/dev/null | python3 -c "import json,sys; print([f for f in json.load(sys.stdin)['findings'] if f['check']=='adr-structure' and 'adr-template' in f['path']])"`
Expected: `[]` (non-numbered filename → skipped).
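The skip comes entirely from the filename pattern in Task 1, Step 3. The short sketch below illustrates it. `README.md` is a hypothetical extra file, used only to show that any non-numbered name is ignored.

```python
import os
import re

ADR_FILE_RE = re.compile(r"^\d{3}-.*\.md$")  # as defined in Task 1, Step 3

paths = [
    "docs/decisions/023-adr-structure.md",
    "docs/decisions/adr-template.md",
    "docs/decisions/README.md",
]
checked = [p for p in paths if ADR_FILE_RE.match(os.path.basename(p))]
print(checked)  # ['docs/decisions/023-adr-structure.md']
```

One consequence of this design is that a misnamed ADR (for example `23-foo.md`, with two digits) is skipped silently rather than flagged. The zero-padding rule in ADR-023 §1 is what keeps that from happening.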
- [ ] **Step 3: Commit**
```bash
git add docs/decisions/adr-template.md
git commit -m "docs(adr): add adr-template.md scaffold (ADR-023)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>"
```
---
## Task 3: ADR-023 itself
**Files:**
- Create: `docs/decisions/023-adr-structure.md`
- [ ] **Step 1: Write ADR-023**
Create `docs/decisions/023-adr-structure.md`. It must pass its own check (Status/Context/Decision/Consequences present; parseable Status line). Use this content:
```markdown
# ADR-023 — ADR structure & lifecycle
## Status
Accepted (2026-06-10). Meta/doctrine ADR — pins how ADRs are written; the
`adr-structure` check (`scripts/repo-scan.py`) and `docs/decisions/adr-template.md`
ship with it, and ADRs 001–018 were retroactively restructured to conform. Resolves
the FRICTION signal (2026-05-31) about ADR-writing policy being unsettled.
## Context
boma records architectural decisions as numbered ADRs in `docs/decisions/`, and
CLAUDE.md treats them as load-bearing. Yet no ADR said how an ADR is written. The
newest ADRs (019–022) converged on a clean shape — Status → Context → Decision →
Consequences → Related — but only by imitation. ADRs 001–018 predate it and drifted
widely: most lacked a `## Status` section entirely (016–018 carried only a trailing
build-state note), and many lacked an explicit `## Decision` or `## Consequences`
heading, their decisions spread across ad-hoc topical sections. The result was
structural drift and no uniform way to tell an active decision from a superseded or
deprecated one.
## Decision
### 1. Title & filename
Title line: `# ADR-NNN — <Title>: <optional clarifying subtitle>` (em-dash). Filename:
`NNN-kebab-title.md`, zero-padded 3-digit, monotonic, never reused — a superseded ADR
keeps its number and file. A new ADR is registered as a row in the CLAUDE.md
"Further reading" table.
### 2. Mandatory sections, in this order
- `## Status` — a lifecycle line, usually `Accepted (YYYY-MM-DD)` (see §4), plus an
optional one-line note.
- `## Context` — the forces, the problem, what exists today, why now.
- `## Decision` — what we are doing; numbered sub-decisions for multi-part ADRs.
- `## Consequences` — results, trade-offs explicitly accepted, follow-on work.
### 3. Optional sections (use only where they genuinely apply)
`## Related`, `## Scope`, `## Guardrails` / `## Enforcement`, `## What was ruled out`,
`## Verified facts (ADR-014)`.
### 4. Status lifecycle
Four states. Because boma is single-contributor and trunk-based with no review gate,
most ADRs are **born `Accepted (YYYY-MM-DD)`** — committed-to on writing. A
**`Proposed`** state exists for a genuine draft whose core direction is recorded but
whose specifics are still open for discussion (e.g. ADR-011); it is promoted to
`Accepted` once settled.
- **`Proposed (YYYY-MM-DD)`** — drafted, under discussion, not yet committed-to. May
carry open questions. Promoted to `Accepted (YYYY-MM-DD)` when decided.
- **`Accepted (YYYY-MM-DD)`** — committed-to. The common starting state.
- Replaced → old ADR's Status becomes **`Superseded by ADR-NNN (YYYY-MM-DD)`**; the new
ADR records `Supersedes ADR-MMM` in its Status and `## Related`. The link is
**bidirectional**.
- Retired with no replacement → **`Deprecated (YYYY-MM-DD)`** + a one-line reason.
**No silent rewrites.** An Accepted ADR is not edited to reverse its decision. Typo and
clarity fixes are fine; a material reversal requires a new ADR and a `Superseded by`
marker on the old one.
### 5. Template & enforcement
`docs/decisions/adr-template.md` is the scaffold for new ADRs. The `/review-repo`
command's pre-scan (`scripts/repo-scan.py`) emits an `adr-structure` finding for any
numbered ADR missing a mandatory section or with an unparseable Status line. It checks
**presence and Status, not section order** — order is a convention the template carries,
deliberately not gated, to keep enforcement lightweight (consistent with boma's other
doctrine ADRs adding no CI gate).
### 6. Retroactive conformance of the back-catalogue
ADRs 001–018 are restructured to satisfy this standard rather than grandfathered. The
restructure is **presentational** — existing headings are relabelled, regrouped, or
demoted under a `## Decision` umbrella; a dated `## Status` is added; a `## Consequences`
section is assembled from implications the ADR already states. **The substance of no
decision is changed.** This keeps the check uniform (no number threshold) and the corpus
a consistent, legible decision history.
## Consequences
- New ADRs have one obvious shape and a scaffold; structural drift stops.
- Every ADR declares its lifecycle state uniformly, and reversals are traceable.
- The whole corpus conforms; the check needs no grandfathering and stays simple.
- One-time restructure churn across ADRs 001–018 (heading reorganization + a Status and
a Consequences section per file; no decision substance changed).
- `/review-repo` grows one deterministic check; no new CI machinery.
- This ADR is the first conformant example and is held to its own check.
## What was ruled out
- **A `make lint` / CI gate for ADR structure** — heavier than the risk warrants;
the `/review-repo` check and the template suffice.
- **Machine-enforcing section order** — brittle for marginal value; left as a
template-demonstrated convention.
- **Grandfathering 001–018 from the check** — rejected in favour of restructuring the
whole corpus to conform, so the standard applies uniformly with no exceptions.
## Related
- ADR-014 — knowledge sourcing (the `Verified facts` optional section).
- ADR-019/020/021/022 — the emergent structure this ADR codifies.
- `docs/decisions/adr-template.md` — the scaffold.
- `scripts/repo-scan.py` — the `adr-structure` enforcement check.
```
- [ ] **Step 2: Confirm ADR-023 passes its own check**
Run: `python3 scripts/repo-scan.py 2>/dev/null | python3 -c "import json,sys; print([f for f in json.load(sys.stdin)['findings'] if f['check']=='adr-structure' and '023-' in f['path']])"`
Expected: `[]`.
- [ ] **Step 3: Commit**
```bash
git add docs/decisions/023-adr-structure.md
git commit -m "docs(adr): ADR-023 — ADR structure & lifecycle
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>"
```
---
## Task 4: Wire into CLAUDE.md and the review-repo command doc
**Files:**
- Modify: `CLAUDE.md` ("Further reading" table)
- Modify: `.claude/commands/review-repo.md` (the deterministic-findings description, ~lines 26–28)
- [ ] **Step 1: Add the CLAUDE.md "Further reading" row**
In `CLAUDE.md`, in the "Further reading" table, after the `Backup & disaster recovery` row, add:
```markdown
| ADR structure & lifecycle | `docs/decisions/023-adr-structure.md` |
```
- [ ] **Step 2: Mention the new check in review-repo.md**
In `.claude/commands/review-repo.md`, find (~lines 27–28):
```markdown
(roles, ADRs, runbooks, playbooks, scripts — your shard list) and **exact findings**
(markers, broken refs, unencrypted vaults). Fold these into the report verbatim.
```
Replace the parenthetical with:
```markdown
(roles, ADRs, runbooks, playbooks, scripts — your shard list) and **exact findings**
(markers, broken refs, unencrypted vaults, ADR-structure violations). Fold these into
the report verbatim.
```
- [ ] **Step 3: Verify the CLAUDE.md link resolves**
Run: `test -f docs/decisions/023-adr-structure.md && echo OK`
Expected: `OK`.
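`test -f` proves only that ADR-023's own row resolves. Because the design also requires every new ADR to be registered in the "Further reading" table, a broader check could look like the sketch below. It is hypothetical: `table_paths` and `registration_problems` are illustrative names, and the row pattern assumes each row has two cells with a backticked path in the second, as in the CLAUDE.md diff.

```python
import re
from pathlib import Path

# A "Further reading" row: two cells, the second a backticked path.
ROW_RE = re.compile(r"^\|\s*[^|]+\|\s*`([^`]+)`\s*\|$")
ADR_FILE_RE = re.compile(r"^\d{3}-.*\.md$")  # same shape repo-scan.py uses


def table_paths(claude_md: str) -> list[str]:
    """Backticked paths from every two-column table row in CLAUDE.md."""
    paths = []
    for line in claude_md.splitlines():
        m = ROW_RE.match(line.strip())
        if m:
            paths.append(m.group(1))
    return paths


def registration_problems(claude_md, adr_names, exists=lambda p: Path(p).is_file()):
    """Rows whose target is missing, plus numbered ADRs that have no row.

    `exists` defaults to a filesystem check relative to the working directory
    (assumed to be the repo root); pass a stub to test without real files.
    """
    paths = table_paths(claude_md)
    problems = [f"broken row: {p}" for p in paths if not exists(p)]
    registered = {Path(p).name for p in paths}
    problems += [f"unregistered ADR: {n}" for n in sorted(adr_names)
                 if ADR_FILE_RE.match(n) and n not in registered]
    return problems
```

From the repo root it would be called as `registration_problems(Path("CLAUDE.md").read_text(), [p.name for p in Path("docs/decisions").glob("*.md")])`; `adr-template.md` is skipped by the same filename rule the scan uses.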
- [ ] **Step 4: Commit**
```bash
git add CLAUDE.md .claude/commands/review-repo.md
git commit -m "docs(adr): register ADR-023 and note adr-structure check
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>"
```
---
## Task 5: Retroactively restructure ADRs 001–018 to full conformance
**Goal:** every ADR in 001–018 ends with all four mandatory sections present and a
parseable Status line, so the `adr-structure` check reports zero findings — **without
changing the substance of any decision.**
**Files (current findings — the exact worklist):**
- Missing `Status` + `Consequences`: `001-architecture.md`, `002-security.md`, `004-docker-model.md`, `005-bootstrapping.md`, `014-knowledge-sourcing.md`
- Missing `Status` + `Decision` + `Consequences`: `006-terraform.md`, `007-network.md`, `008-testing.md`, `009-provisioning-handoff.md`, `010-forgejo-ci.md`, `011-update-management.md`
- Missing all four: `003-toolchain.md`
- Missing `Status` + `Decision`: `013-heritage-v4.md`
- Missing `Status` only: `012-hardware-capacity.md`, `015-control-host.md`
- Unparseable `Status` + missing `Consequences`: `016-mesh-vpn.md`, `017-service-ui-verification.md`, `018-logging.md`
(`010`/`011` use `## Decisions` (plural) → relabel to `## Decision`. The "missing
Decision" cases generally have the decision spread across topical `##` headings.)
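The worklist can be regenerated from the pre-scan instead of compiled by hand. A minimal sketch, assuming the JSON shape the plan's one-liners rely on (a top-level `findings` list) and the two `detail` strings that `adr_structure_findings` emits; `worklist` is a hypothetical helper.

```python
import json
from collections import defaultdict

MISSING_PREFIX = "missing mandatory section(s): "


def worklist(scan_json: str) -> dict[str, list[str]]:
    """Group adr-structure findings into buckets keyed by what each ADR lacks."""
    problems = defaultdict(list)  # path -> problem labels, in finding order
    for f in json.loads(scan_json)["findings"]:
        if f["check"] != "adr-structure":
            continue
        if f["detail"].startswith(MISSING_PREFIX):
            problems[f["path"]].extend(f["detail"][len(MISSING_PREFIX):].split(", "))
        elif f["detail"].startswith("Status not parseable"):
            problems[f["path"]].append("unparseable Status")
    buckets = defaultdict(list)
    for path, labels in sorted(problems.items()):
        buckets[" + ".join(labels)].append(path.rsplit("/", 1)[-1])
    return dict(buckets)
```

Fed the output of Phase 0 (`python3 scripts/repo-scan.py > /tmp/repo-scan.json`), it returns one bucket per combination of missing sections, each listing bare filenames.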
**THE FAITHFULNESS RULE (non-negotiable):** This is a *presentational* restructure.
You MAY: add a `## Status` section; relabel a heading (`## Decisions` → `## Decision`);
introduce a `## Decision` umbrella heading and **demote** existing topical `##` headings
to `###` beneath it; add a `## Consequences` section. You MUST NOT alter any existing
sentence of decision prose, reword arguments, or add new policy. A `## Consequences`
section is assembled **only** from implications the ADR already states (its trade-offs,
"what was ruled out", "open questions", named follow-on work). **If an ADR states
nothing that can be faithfully cast as a consequence, STOP and report it as
DONE_WITH_CONCERNS / escalate — do not invent consequences.**
**Per-file date source:** the file's first git-commit (add) date —
`git log --diff-filter=A --format=%as -- <path> | tail -1` (yields `YYYY-MM-DD`).
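The shell pipeline relies on `git log` listing commits newest first, so `tail -1` picks the earliest add (which matters only if a file was ever deleted and re-added). The same logic in Python, as a sketch: `first_add_date` and `git_first_add_date` are illustrative names, and the parsing half is split out so it can be exercised without a repository.

```python
import re
import subprocess

DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def first_add_date(git_log_output: str) -> str:
    """Earliest add date from `git log --diff-filter=A --format=%as` output.

    git log prints newest first, so the last non-empty line is the oldest add,
    which is what `| tail -1` selects in the shell version.
    """
    dates = [ln.strip() for ln in git_log_output.splitlines() if ln.strip()]
    if not dates or not DATE_RE.match(dates[-1]):
        raise ValueError(f"no add date in git output: {git_log_output!r}")
    return dates[-1]


def git_first_add_date(path: str) -> str:
    """Run the plan's git command for one file and return YYYY-MM-DD."""
    out = subprocess.run(
        ["git", "log", "--diff-filter=A", "--format=%as", "--", path],
        check=True, capture_output=True, text=True,
    ).stdout
    return first_add_date(out)
```

Raising on empty output is deliberate: an untracked or renamed file yields no add commit, and a guessed date would be worse than a stopped run.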
- [ ] **Step 1: Add a dated `## Status` section to each ADR**
For 001–015 (no Status today): insert, between the title line and the first `##`
heading, a Status section:
```markdown
## Status
Accepted (<d>)
```
where `<d>` is the file's first-git-commit date. For 016/017/018 (unparseable Status
today): prepend a parseable `Accepted (<d>). ` clause to the first line of their
existing `## Status` section so the build-state note becomes its tail, e.g.
`Accepted (2026-06-05). Designed. **Authorable now:** ...`.
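Both cases in Step 1 are mechanical. A sketch of a hypothetical `add_status` helper working on newline-terminated lines (the form `adr_structure_findings` receives them in). It is idempotent, so rerunning it on an already-fixed file changes nothing.

```python
import re

STATUS_OK_RE = re.compile(
    r"^(Proposed|Accepted|Superseded by ADR-\d{3}|Deprecated) \(\d{4}-\d{2}-\d{2}\)")


def add_status(lines: list[str], date: str) -> list[str]:
    """Return a copy of an ADR's lines carrying a parseable, dated Status.

    - No `## Status` heading: insert one just before the first `##` heading.
    - `## Status` present but unparseable: prefix its first text line with
      `Accepted (<date>). ` so the existing note becomes the tail.
    """
    out = list(lines)
    heads = [i for i, ln in enumerate(out) if ln.startswith("## ")]
    status_at = next((i for i in heads if out[i].strip() == "## Status"), None)
    if status_at is None:
        first = heads[0] if heads else len(out)
        out[first:first] = ["## Status\n", "\n", f"Accepted ({date})\n", "\n"]
        return out
    for i in range(status_at + 1, len(out)):
        if out[i].startswith("## "):
            break  # empty Status section: leave it for a human
        if out[i].strip():
            if not STATUS_OK_RE.match(out[i].strip()):
                out[i] = f"Accepted ({date}). " + out[i].lstrip()
            break
    return out
```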
- [ ] **Step 2: Ensure a `## Decision` section exists**
For ADRs flagged "missing Decision" (003, 006, 007, 008, 009, 010, 011, 013): relabel a
plural/synonym heading where one exists (`## Decisions` → `## Decision` in 010/011), or
introduce a `## Decision` umbrella immediately after `## Context` and demote the existing
topical `##` body headings (e.g. in 003: "Execution engine", "Python environment", …) to
`###`. Do not move or rewrite the prose under them.
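Choosing which headings belong to the decision is a human judgment, but once that list exists the edit itself is mechanical. A sketch with a hypothetical `wrap_in_decision`: it places the umbrella directly before the first chosen heading, which is the same as "immediately after `## Context`" when the topical headings follow Context, as this step assumes.

```python
def wrap_in_decision(lines: list[str], topical: list[str]) -> list[str]:
    """Relabel `## Decisions`, then add an umbrella and demote chosen headings.

    `topical` lists the `##` titles judged to be parts of the decision (for
    example "Execution engine" in 003). Non-heading lines are copied unchanged.
    """
    wanted = set(topical)
    out, umbrella_added = [], False
    for line in lines:
        title = line[3:].strip() if line.startswith("## ") else None
        if title == "Decisions":
            out.append("## Decision\n")  # 010/011: plural synonym -> canonical
        elif title in wanted:
            if not umbrella_added:
                out += ["## Decision\n", "\n"]  # umbrella before the first topic
                umbrella_added = True
            out.append("#" + line)  # "## Topic" becomes "### Topic"
        else:
            out.append(line)
    return out
```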
- [ ] **Step 3: Ensure a `## Consequences` section exists**
For every ADR flagged "missing Consequences" (001, 002, 003, 004, 005, 006, 007, 008,
009, 010, 011, 014, 016, 017, 018): add a `## Consequences` section near the end,
assembled strictly from implications the ADR already states. Where an ADR has a trailing
section that *is* consequences under another name (e.g. "What was ruled out", "Open
questions", "Trade-offs"), you may keep that section and add a short `## Consequences`
that references/summarizes the already-stated trade-offs — without introducing new
claims. **Honour the faithfulness rule; escalate any ADR where no faithful Consequences
can be drawn.**
- [ ] **Step 4: Verify the whole corpus passes the check**
Run: `python3 scripts/repo-scan.py 2>/dev/null | python3 -c "import json,sys; v=[f for f in json.load(sys.stdin)['findings'] if f['check']=='adr-structure']; print('adr-structure findings:', len(v)); [print(' ', f['path'], '—', f['detail']) for f in v]"`
Expected: `adr-structure findings: 0`.
- [ ] **Step 5: Verify faithfulness via diff**
Run: `git diff --stat` and spot-check `git diff docs/decisions/003-toolchain.md`.
Expected: changes are heading additions/relabels/level-demotions, a new Status section,
and a new Consequences section — **no edits to existing decision sentences.**
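Spot-checking one file leaves 17 unchecked. A rough heuristic over the whole `git diff` output: in a purely presentational change, every removed line should be a heading (relabel or demotion), a blank line, or a 016–018 Status note that reappears as the tail of a dated `Accepted (...)` line. `suspicious_removals` is a hypothetical name, and the heuristic cannot see prose appended to an existing paragraph or judge the new Consequences text, so it complements reading the diff rather than replacing it.

```python
import re

HEADING_RE = re.compile(r"^#{2,3} ")  # "## X" or "### X": relabels and demotions
STATUS_RE = re.compile(
    r"^(Proposed|Accepted|Superseded by ADR-\d{3}|Deprecated) \(\d{4}-\d{2}-\d{2}\)")


def suspicious_removals(diff_text: str) -> list[str]:
    """Removed lines in a unified diff that a presentational restructure does not explain.

    Allowed removals: headings, blank lines, and a Status note that reappears as
    the tail of a dated lifecycle line. Lines starting "---" are skipped as file
    headers, which also hides a removed "---" horizontal rule.
    """
    lines = diff_text.splitlines()
    removed = [ln[1:] for ln in lines if ln.startswith("-") and not ln.startswith("---")]
    added = [ln[1:] for ln in lines if ln.startswith("+") and not ln.startswith("+++")]
    flagged = []
    for text in removed:
        if not text.strip() or HEADING_RE.match(text):
            continue
        if any(STATUS_RE.match(a) and a.endswith(text) for a in added):
            continue  # the 016-018 fix: old note kept as the Status tail
        flagged.append(text)
    return flagged
```

Run as `suspicious_removals(subprocess.run(["git", "diff"], capture_output=True, text=True).stdout)`; an empty list is the expected result.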
- [ ] **Step 6: Run the repo-scan test suite**
Run: `.venv/bin/pytest tests/test_repo_scan.py -q`
Expected: PASS — 6 passed.
- [ ] **Step 7: Commit**
```bash
git add docs/decisions/0*.md docs/decisions/1*.md
git commit -m "docs(adr): restructure ADRs 001-018 to ADR-023 conformance
Presentational only: add a dated Status section, relabel/regroup headings
under Decision, and add a Consequences section assembled from each ADR's
already-stated implications. No decision substance changed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>"
```
---
## Final verification (after all tasks)
- [ ] **Lint:** `make lint` — Expected: passes (docs + a stdlib script touched; ansible content unchanged).
- [ ] **Full deterministic scan clean for our check:** `python3 scripts/repo-scan.py 2>/dev/null | python3 -c "import json,sys; print('adr-structure:', sum(1 for f in json.load(sys.stdin)['findings'] if f['check']=='adr-structure'))"` → `adr-structure: 0`.
- [ ] **Tests green:** `.venv/bin/pytest tests/ -q` → all pass.
- [ ] **Branch ready:** invoke `superpowers:finishing-a-development-branch` to merge `feat/adr-structure` into `main` (trunk-based, no PR) and delete the branch.
---
## Self-review notes
- **Spec coverage:** §1 title/filename → Task 3 + template; §2 sections → Tasks 2/3 + check; §3 lifecycle → Task 3; §4 cross-refs → Task 3 `## Related`; §5 template → Task 2; §6 retroactive restructure → Task 5; §7 enforcement → Task 1 + Task 4. All covered.
- **Order nuance:** spec says sections come "in this order"; the check enforces presence + Status only. This is intentional and stated in both the spec's enforcement wording ("the four mandatory sections and a parseable Status line") and ADR-023's Decision §5 / "What was ruled out". Not a gap.
- **Type/name consistency:** `adr_structure_findings` and the `"adr-structure"` check key are used identically in the function, the `scan()` wiring, the tests, and both verification one-liners.

@@ -0,0 +1,164 @@
# Design — ADR structure & lifecycle
- **Date:** 2026-06-10
- **Status:** Approved design — implementation plan to follow
- **Resolves:** the absence of a written standard for how ADRs in
`docs/decisions/` are structured. The newest ADRs (019–022) have converged on a
clean pattern (`Status` → `Context` → `Decision` → `Consequences` → `Related`),
but it lives only as imitation; ADRs 001–018 predate it and most lack a `Status`
section.
- **Becomes:** ADR-023 (this design is the basis for that ADR).
- **Reuses:** boma's existing `*-template.md` convention (`service-security-template.md`,
`service-verify-template.md`, `service-access-template.md`, `service-backup-template.md`);
ADR-014 (knowledge-sourcing → the optional `Verified facts` section); ADR-019/020/021/022
(the emergent structure being codified); the `/review-repo` command (enforcement home).
---
## Problem
boma documents architectural decisions as numbered ADRs in `docs/decisions/`, and
CLAUDE.md treats them as load-bearing ("Before assuming a role, provider, or pipeline
exists, check STATUS.md"; the entire "Further reading" table points into them). Yet
there is no ADR that says how an ADR is written. The result:
- **Structural drift.** ADRs 001–018 are freeform; 019–022 converged on a consistent
shape but only by imitation. A new ADR's structure depends on which existing one the
author happened to copy.
- **No status discipline.** Most early ADRs have no `## Status` section, so there is no
uniform way to tell an active decision from a superseded or deprecated one — and no
written rule for how a decision gets reversed without silently rewriting history.
- **No scaffold.** Every other recurring document type in boma has a template
(`service-security-template.md`, etc.). ADRs do not.
This design codifies the structure 019–022 already demonstrate, pins a status
lifecycle, ships a template, and reconciles the back-catalogue.
## Scope
- **In:** the canonical section set (mandatory + optional); title and filename
convention; the `Accepted / Superseded / Deprecated` status lifecycle and the
no-silent-rewrite rule; cross-reference convention; an ADR template file; a
lightweight `/review-repo` structure check; a **one-time retroactive restructure of
ADRs 001018** to full conformance (all four mandatory sections + a parseable Status
line), reorganizing existing content under canonical headings.
- **Out (for now):** *changing the substance of* any existing decision (the restructure
is presentational — relabel/regroup/demote existing content, add a dated Status, never
alter what was decided); a `make lint` / CI gate for ADR structure (explicitly
rejected in favour of the `/review-repo` check — consistent with boma's other doctrine
ADRs, which add no CI gate); grandfathering pre-convention ADRs from the check
(rejected — the whole corpus is brought to conformance instead).
The lifecycle uses four states — `Proposed / Accepted / Superseded / Deprecated`. An
earlier draft of this design omitted `Proposed`, but ADR-011 (a real draft with open
questions) is evidence boma occasionally needs it, so it was kept.
## Decision
### 1. Title & filename
- Title line: `# ADR-NNN — <Title>: <optional clarifying subtitle>` (em-dash `—`,
matching every existing ADR).
- Filename: `NNN-kebab-title.md`, zero-padded 3-digit, monotonic, **never reused**
(a superseded ADR keeps its number and file).
- A new ADR is registered as a row in the CLAUDE.md "Further reading" table.
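The title and filename rules are regular enough to check mechanically, although the design deliberately keeps `/review-repo` to section presence and Status. A hypothetical sketch, including number allocation: "monotonic, never reused" means taking the highest number in use plus one, so a number is never handed out twice.

```python
import re

TITLE_RE = re.compile(r"^# ADR-(\d{3}) — \S.*$")  # "# ADR-NNN — Title[: subtitle]"
FILENAME_RE = re.compile(r"^(\d{3})-[a-z0-9]+(?:-[a-z0-9]+)*\.md$")  # NNN-kebab-title.md


def title_filename_problems(filename: str, first_line: str) -> list[str]:
    """Check one ADR's filename and title line against the convention."""
    problems = []
    f = FILENAME_RE.match(filename)
    t = TITLE_RE.match(first_line.rstrip("\n"))
    if not f:
        problems.append(f"filename not NNN-kebab-title.md: {filename}")
    if not t:
        problems.append(f"title not '# ADR-NNN — <Title>': {first_line.strip()!r}")
    if f and t and f.group(1) != t.group(1):
        problems.append(f"number mismatch: file {f.group(1)} vs title {t.group(1)}")
    return problems


def next_adr_number(filenames: list[str]) -> str:
    """Highest existing number plus one, zero-padded; gaps are never refilled."""
    used = [int(n[:3]) for n in filenames if FILENAME_RE.match(n)]
    return f"{max(used, default=0) + 1:03d}"
```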
### 2. Canonical sections
**Mandatory — every ADR, in this order:**
| Section | Holds |
|---|---|
| `## Status` | `Accepted (YYYY-MM-DD)`, plus an optional one-line note (what it resolves/supersedes, or a doctrine-not-yet-built caveat as ADR-022 uses) |
| `## Context` | the forces, the problem, what exists today, why now |
| `## Decision` | what we are doing — numbered sub-decisions for multi-part ADRs, as 020/021/022 do |
| `## Consequences` | results, trade-offs *explicitly accepted*, follow-on work |
**Optional — use only where genuinely applicable, never as padding:**
- `## Related` — links to other ADRs by number.
- `## Scope` — explicit in/out-of-scope boundaries.
- `## Guardrails` / `## Enforcement` — how the decision is mechanically enforced
(lint, CI, hooks).
- `## What was ruled out` — rejected alternatives, each with its reason.
- `## Verified facts (ADR-014)` — version-stamped facts per the knowledge-sourcing rule.
### 3. Status lifecycle
Four states. Most ADRs are **born `Accepted (YYYY-MM-DD)`** — the sole author commits
to it on writing (boma is single-contributor and trunk-based with no review gate).
- **`Proposed (YYYY-MM-DD)`** — a genuine draft whose core direction is recorded but
whose specifics are still open (e.g. ADR-011, which carries open questions). Promoted
to `Accepted (YYYY-MM-DD)` once settled.
- **`Accepted (YYYY-MM-DD)`** — committed-to; the common starting state.
- Replaced by a later decision → the old ADR's Status becomes
**`Superseded by ADR-NNN (YYYY-MM-DD)`**; the superseding ADR records
`Supersedes ADR-MMM` in its own `## Status` and `## Related`. The link is
**bidirectional** — both files must point at each other.
- Retired with no replacement → **`Deprecated (YYYY-MM-DD)`** plus a one-line reason.
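The bidirectional rule is easy to break by editing only one of the two files. A hypothetical check over each ADR's Status text, keyed by number. Per the design the superseding ADR should also list the link in `## Related`; this sketch does not verify that part.

```python
import re

SUPERSEDED_RE = re.compile(r"^Superseded by ADR-(\d{3}) \(\d{4}-\d{2}-\d{2}\)")
SUPERSEDES_RE = re.compile(r"\bSupersedes ADR-(\d{3})")


def supersession_problems(status: dict[str, str]) -> list[str]:
    """`status` maps an ADR number ("098") to the text of its `## Status` section."""
    problems = []
    # Forward: "Superseded by ADR-new" must be answered by "Supersedes ADR-old".
    for old, text in sorted(status.items()):
        m = SUPERSEDED_RE.match(text.strip())
        if not m:
            continue
        new = m.group(1)
        if new not in status:
            problems.append(f"ADR-{old}: superseded by missing ADR-{new}")
        elif old not in SUPERSEDES_RE.findall(status[new]):
            problems.append(f"ADR-{new}: lacks 'Supersedes ADR-{old}'")
    # Backward: "Supersedes ADR-old" must be answered by "Superseded by ADR-new".
    for new, text in sorted(status.items()):
        for old in SUPERSEDES_RE.findall(text):
            m = SUPERSEDED_RE.match(status.get(old, "").strip())
            if not m or m.group(1) != new:
                problems.append(f"ADR-{old}: not marked 'Superseded by ADR-{new}'")
    return problems
```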
**Load-bearing rule — no silent rewrites.** An `Accepted` ADR is not edited to reverse
its decision. Typo and clarity fixes are fine; a *material reversal* requires a new ADR
and a `Superseded by` marker on the old one. The history of decisions stays legible.
### 4. Cross-references
Reference other ADRs by number inline (`ADR-019`), and collect the relationships in a
`## Related` section.
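A reviewer aid rather than anything the design mandates: the sketch lists ADR numbers cited in the body but absent from `## Related`. `uncollected_refs` is hypothetical; it also expands slash-joined forms such as `ADR-019/020/021/022`, which ADR-023 itself uses in its Related list.

```python
import re

# "ADR-019" or slash-joined runs such as "ADR-019/020/021/022".
ADR_REF_RE = re.compile(r"\bADR-(\d{3}(?:/\d{3})*)\b")


def uncollected_refs(lines: list[str], self_number: str) -> list[str]:
    """ADR numbers cited outside `## Related` but not listed inside it."""
    inline, related, section = set(), set(), None
    for line in lines:
        if line.startswith("## "):
            section = line[3:].strip()
            continue  # headings like "## Verified facts (ADR-014)" are not citations
        refs = {n for run in ADR_REF_RE.findall(line) for n in run.split("/")}
        refs.discard(self_number)
        (related if section == "Related" else inline).update(refs)
    return sorted(inline - related)
```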
### 5. Template file
Ship `docs/decisions/adr-template.md` — consistent with boma's existing
`*-template.md` convention. It contains the mandatory section headers pre-filled with
short HTML-comment hints, and the optional sections listed as commented stubs to
uncomment when relevant. It is a skeleton, not a numbered decision, so it does not take
an ADR number.
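This section describes the template's shape, not its text. To illustrate that shape (not the contents of `adr-template.md`), here is a hypothetical `new_adr` helper that stamps out a numbered ADR: title and filename per §1, Status pre-filled as `Accepted (date)`, the other mandatory headings with HTML-comment hints, and the optional sections as commented-out stubs. Commented headings do not match `^##`, so the output passes the `adr-structure` check.

```python
import re

# Hint wording is illustrative; it paraphrases the §2 table, not adr-template.md.
HINTS = {
    "Context": "the forces, the problem, what exists today, why now",
    "Decision": "what we are doing; numbered sub-decisions if multi-part",
    "Consequences": "results, trade-offs explicitly accepted, follow-on work",
}
OPTIONAL = ["Related", "Scope", "Guardrails", "What was ruled out", "Verified facts (ADR-014)"]


def kebab(title: str) -> str:
    """Lower-case, with runs of non-alphanumerics collapsed to one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def new_adr(number: int, title: str, date: str) -> tuple[str, str]:
    """Return (filename, markdown) for a new ADR born Accepted on `date`."""
    lines = [f"# ADR-{number:03d} — {title}", "",
             "## Status", "", f"Accepted ({date})", ""]
    for name, hint in HINTS.items():
        lines += [f"## {name}", "", f"<!-- {hint} -->", ""]
    # Optional sections stay commented out until genuinely needed.
    lines += [f"<!-- ## {name} -->" for name in OPTIONAL]
    return f"{number:03d}-{kebab(title)}.md", "\n".join(lines) + "\n"
```

The derived filename is only a default: ADR-023 lives in `023-adr-structure.md`, whereas `kebab` would produce `adr-structure-lifecycle` from its full title.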
### 6. Retroactive restructure (001–018)
A **separate step** after the ADR and template land: bring every pre-convention ADR to
full conformance — all four mandatory sections present and a parseable Status line. This
is a **presentational** restructure, governed by a strict faithfulness rule:
- **Add** a `## Status` section valued `Accepted (YYYY-MM-DD)`, the date reconstructed
from the file's **first git-commit date**. For 016–018, whose existing trailing
build-state note is unparseable, prepend the dated `Accepted (...)` clause so the note
becomes a parseable Status line's tail.
- **Reorganize** existing content under the canonical headings: relabel a synonym
(`## Decisions` → `## Decision`), or introduce a `## Decision` umbrella and **demote**
the existing topical `##` headings to `###` beneath it. No sentence of existing prose
is altered.
- **Add** a `## Consequences` section built **only** from implications the ADR already
states (trade-offs, "what was ruled out", "open questions", follow-on work already
named). If an ADR genuinely states nothing that can be faithfully cast as a
consequence, that file is escalated for a human decision rather than inventing one.
- **Never** change the substance of a decision. A `git diff` of the restructure should
show heading-level changes, a new Status section, and a Consequences section assembled
from existing material — not edits to existing argument.
ADRs already conformant (019–022) are left alone. End state: the `adr-structure` check
reports zero findings across the whole corpus, with no grandfathering.
### 7. Enforcement
Lightweight, no CI gate. The `/review-repo` command gains an ADR-structure check:
every file in `docs/decisions/` matching `NNN-*.md` has the four mandatory sections and
a parseable `## Status` line. The template carries the convention forward for new ADRs.
## Consequences
- New ADRs have one obvious shape and a scaffold to start from; structural drift stops.
- Every ADR declares its lifecycle state uniformly, and reversals are traceable rather
than silent — the back-catalogue becomes a legible decision history.
- One-time churn: a restructure touching ~18 files (heading reorganization + a Status
section + a Consequences section per file). Larger and more judgment-heavy than a
Status-only backfill, hence the faithfulness rule and per-file review.
- The whole corpus conforms — the check needs no grandfathering or number threshold, and
stays simple (presence + parseable Status, applied uniformly).
- `/review-repo` grows a new check; no new CI machinery, matching boma's habit of not
gating doctrine in CI.
- This ADR is itself the first conformant example — it must follow its own structure.
## Open questions
None outstanding — title/filename, the **4-state lifecycle** (`Proposed / Accepted /
Superseded / Deprecated`; `Proposed` adopted on the evidence of ADR-011), template name
(`adr-template.md`), enforcement (`/review-repo`, no CI gate), and the **full
retroactive restructure** of 001–018 (no grandfathering) were all confirmed during
brainstorming and execution.

@@ -41,6 +41,17 @@ LIST_ITEM_RE = re.compile(r"^\s*(\d+\.|[-*+])\s+(.*)")
DEFER_REF_RE = re.compile(r"ADR-(\d{3})\D{0,40}?deferred\D{0,12}?(\d+)", re.I)
RESOLVE_WORD_RE = re.compile(r"\b(?:resolv\w*|decid\w*|address\w*|complet\w*|done)\b", re.I)
# ADR-structure check (ADR-023): numbered ADRs must carry the four mandatory
# sections and a parseable Status line. Presence only — section ORDER is a
# template-demonstrated convention, not machine-enforced.
ADR_FILE_RE = re.compile(r"^\d{3}-.*\.md$")
ADR_REQUIRED_SECTIONS = ("Status", "Context", "Decision", "Consequences")
ADR_STATUS_LINE_RE = re.compile(
r"^(Proposed \(\d{4}-\d{2}-\d{2}\)"
r"|Accepted \(\d{4}-\d{2}-\d{2}\)"
r"|Superseded by ADR-\d{3} \(\d{4}-\d{2}-\d{2}\)"
r"|Deprecated \(\d{4}-\d{2}-\d{2}\))")
def _is_defer_heading(text):
t = text.strip().lower()
@@ -95,6 +106,42 @@ def deferred_findings(adr_files, defer_refs):
return out
def adr_structure_findings(adr_files):
"""adr_files: {rel_path: [lines]} for docs/decisions/*.md.
Flags numbered ADRs (NNN-*.md) missing a mandatory section or whose Status
section has no parseable lifecycle line. Non-numbered files (e.g.
adr-template.md) are skipped. Section order is NOT checked (ADR-023)."""
out = []
for rpath, lines in sorted(adr_files.items()):
if not ADR_FILE_RE.match(os.path.basename(rpath)):
continue
headings = {}
for i, line in enumerate(lines):
m = re.match(r"^##\s+(\w+)", line)
if m:
headings.setdefault(m.group(1), i)
missing = [s for s in ADR_REQUIRED_SECTIONS if s not in headings]
if missing:
out.append({"check": "adr-structure", "severity": "medium",
"path": rpath, "line": 1,
"detail": f"missing mandatory section(s): {', '.join(missing)}"})
if "Status" in headings:
body = []
for line in lines[headings["Status"] + 1:]:
if line.startswith("## "):
break
body.append(line)
status_text = next((ln.strip() for ln in body if ln.strip()), "")
if not ADR_STATUS_LINE_RE.match(status_text):
out.append({"check": "adr-structure", "severity": "medium",
"path": rpath, "line": headings["Status"] + 1,
"detail": "Status not parseable (want 'Proposed (YYYY-MM-DD)', "
"'Accepted (YYYY-MM-DD)', 'Superseded by ADR-NNN "
"(YYYY-MM-DD)', or 'Deprecated (YYYY-MM-DD)'); "
f"got: {status_text[:60]!r}"})
return out
def walk_files():
for dirpath, dirnames, filenames in os.walk(ROOT):
dirnames[:] = [d for d in dirnames if d not in PRUNE]
@@ -213,6 +260,7 @@ def scan():
findings.append({"check": "broken-path-ref", "severity": "medium", "path": rpath,
"line": i, "detail": f"references '{ref}' which does not exist"})
findings.extend(deferred_findings(adr_files, defer_refs))
findings.extend(adr_structure_findings(adr_files))
return findings
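`ADR_STATUS_LINE_RE` has no trailing `$` and is applied with `re.match`, so it only constrains how the first non-blank Status line begins. That is what lets 016–018 keep their build-state note as a tail. The pattern is copied verbatim below to show what passes and what does not; note that the date is checked for shape, not validity.

```python
import re

# Copied verbatim from scripts/repo-scan.py (ADR-023 structure check).
ADR_STATUS_LINE_RE = re.compile(
    r"^(Proposed \(\d{4}-\d{2}-\d{2}\)"
    r"|Accepted \(\d{4}-\d{2}-\d{2}\)"
    r"|Superseded by ADR-\d{3} \(\d{4}-\d{2}-\d{2}\)"
    r"|Deprecated \(\d{4}-\d{2}-\d{2}\))")

# Candidate first Status lines, mapped to whether the check accepts them.
EXPECTED = {
    "Accepted (2026-06-05). Designed. **Authorable now:** ...": True,  # free tail is fine
    "Superseded by ADR-024 (2026-07-01)": True,
    "Accepted (2026-13-45)": True,       # shape only: not a calendar check
    "Designed, not built.": False,       # the pre-fix 016-018 shape
    "Accepted 2026-06-05": False,        # date must be parenthesised
    "**Accepted** (2026-06-05)": False,  # no markdown emphasis allowed
    "Accepted (2026-6-5)": False,        # zero-padded YYYY-MM-DD only
}

actual = {line: bool(ADR_STATUS_LINE_RE.match(line)) for line in EXPECTED}
for line, ok in actual.items():
    print(f"{'pass' if ok else 'FAIL'}  {line}")
```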

tests/test_repo_scan.py Normal file
@@ -0,0 +1,59 @@
import importlib.util
import pathlib
_PATH = pathlib.Path(__file__).resolve().parent.parent / "scripts" / "repo-scan.py"
_spec = importlib.util.spec_from_file_location("repo_scan", _PATH)
rs = importlib.util.module_from_spec(_spec)
_spec.loader.exec_module(rs)
GOOD = [
"# ADR-099 — Example\n", "\n",
"## Status\n", "\n", "Accepted (2026-06-10)\n", "\n",
"## Context\n", "\n", "Why.\n", "\n",
"## Decision\n", "\n", "What.\n", "\n",
"## Consequences\n", "\n", "So what.\n",
]
def _checks(findings):
return [f for f in findings if f["check"] == "adr-structure"]
def test_good_adr_has_no_findings():
out = rs.adr_structure_findings({"docs/decisions/099-example.md": GOOD})
assert _checks(out) == []
def test_missing_mandatory_section_is_flagged():
lines = [ln for ln in GOOD if not ln.startswith("## Consequences")]
out = _checks(rs.adr_structure_findings({"docs/decisions/099-example.md": lines}))
assert len(out) == 1
assert "Consequences" in out[0]["detail"]
def test_unparseable_status_is_flagged():
lines = [("Designed, not built.\n" if ln == "Accepted (2026-06-10)\n" else ln)
for ln in GOOD]
out = _checks(rs.adr_structure_findings({"docs/decisions/099-example.md": lines}))
assert len(out) == 1
assert "Status not parseable" in out[0]["detail"]
def test_superseded_status_is_accepted():
lines = [("Superseded by ADR-100 (2026-06-11)\n" if ln == "Accepted (2026-06-10)\n"
else ln) for ln in GOOD]
out = _checks(rs.adr_structure_findings({"docs/decisions/099-example.md": lines}))
assert out == []
def test_proposed_status_is_accepted():
lines = [("Proposed (2026-06-04)\n" if ln == "Accepted (2026-06-10)\n"
else ln) for ln in GOOD]
out = _checks(rs.adr_structure_findings({"docs/decisions/099-example.md": lines}))
assert out == []
def test_non_numbered_file_is_skipped():
bare = ["# ADR template\n", "\n", "## Status\n", "\n", "<!-- hint -->\n"]
out = _checks(rs.adr_structure_findings({"docs/decisions/adr-template.md": bare}))
assert out == []
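`adr_structure_findings` keys each `##` heading by its first word (`^##\s+(\w+)`). The replica below copies that loop out as a hypothetical `section_keys` to show the consequences: the plural `## Decisions` in 010/011 does not count as `Decision`, demoted `###` headings are invisible, any heading that merely starts with a mandatory word satisfies it, and `##Status` without a space is ignored. The loop does not track code fences either, so a `## Status` line inside a fenced block in an ADR would also be counted.

```python
import re


def section_keys(lines):
    """Same loop as adr_structure_findings: first word of each `##` heading -> line index."""
    headings = {}
    for i, line in enumerate(lines):
        m = re.match(r"^##\s+(\w+)", line)
        if m:
            headings.setdefault(m.group(1), i)
    return headings


keys = section_keys([
    "# ADR-010 - Example\n",           # H1: ignored
    "## Context and forces\n",         # keys as 'Context': first word is enough
    "## Decisions\n",                  # keys as 'Decisions', not 'Decision'
    "### Decision\n",                  # H3: '#' is not whitespace, so ignored
    "## Verified facts (ADR-014)\n",   # keys as 'Verified'
    "##Status\n",                      # no space after '##': ignored
])
missing = [s for s in ("Status", "Context", "Decision", "Consequences") if s not in keys]
```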