boma/docs/reviews/2026-06-14-review.md
sjat 64f1e821d8 docs(review): 2026-06-14 repo audit — M4a doc drift + Traefik→Caddy lag
11 safe auto-fixes (docs/comments only): reverse_proxy meta stale DNS-01
description, base/playbooks/scripts/terraform/public_dns README build-state,
CAPABILITIES reverse-proxy Traefik→Caddy, README ADR list → 024, TF cax11→cx23
stamps, public_dns wildcard DNS-01→HTTP-01 comment. 29 open findings reported.
make lint green. No stale-deferred (ADR-011 open questions still open).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 18:37:54 +02:00

9.7 KiB
Raw Blame History

Repo review — 2026-06-14

  • Reviewed commit: e346137 (docs(plan): M4b — NetBird coordinator service role)
  • Mode: on-demand (interactive — auto-fixes applied + committed)
  • Previous run: 2026-06-11 (67f2aba)
  • make lint: green before and after fixes (260 files, profile production; check-tags OK).

Summary

A lot shipped since the last review (M4a: docker_host Docker engine, reverse_proxy Caddy applied to askari; offsite Terraform env live; ADR-024). Most findings this run are the predictable docs-lagging-the-build kind — stale "not built yet" notes, a reverse-proxy that switched from DNS-01/custom-image to vanilla HTTP-01 leaving stale descriptions behind, and the Traefik→Caddy rename only half-propagated through the ADR set. The previous run's blocker (O1, make lint RED) is resolved.

Counts

Dimension High Medium Low Total
Cruft / staleness 0 0 0 0
Design conformance 1 2 2 5
Consistency & intent 2 2 9 13
Docs-vs-reality drift 1 4 5 10
Open total 4 8 16 29

Plus 11 auto-fixes applied (3 high, 5 medium, 3 low).

Phase-0 scan

repo-scan.py: 5 roles, 25 ADRs · broken-adr-ref=4, broken-path-ref=2, marker=14, open-deferred-item=5, stale-deferred=0. Every scan finding is a known false-positive (test fixtures ADR-099/100; the roles/netbird/ references in the M4b plan for unbuilt work; superpowers planning artifacts; 019-tagging.md:14 is prose about "over-tagging", not a TODO). Details in the findings JSON.

Deferral checklist

All 5 ADR-011 "Open questions" (Proxmox snapshot driver, exact cadences, health-check harness home, classification home, staging-first) confirmed genuinely still open — ADR-011 is still Proposed/unbuilt, the same questions sit open in docs/TODO.md item 16, and no later ADR or STATUS decides any of them. No stale-deferred (same as last run).

Auto-fixes applied

All safe/obvious (stale text contradicting code/reality, partial enumerations, broken descriptions) — no logic, variable, secret, or task-order changes.

ID Sev File What
AF1 high roles/reverse_proxy/meta/main.yml description still said DNS-01 + custom on-host image → rewrote to vanilla Caddy + HTTP-01 (matches the role since b7e919d)
AF2 med roles/README.md base hardening + docker_host/reverse_proxy/public_dns build-state was stale → reconciled with STATUS
AF3 med playbooks/README.md stale "docker_host has no tasks" note; added missing dns.yml + offsite.yml bullets
AF4 low roles/public_dns/README.md "askari in M4" → askari + *.askari records applied in M4a
AF5 low scripts/README.md added the missing check-tags.py entry (run by make lint)
AF6 med terraform/README.md added modules/hetzner_vm + environments/offsite (the one applied env)
AF7 low terraform/environments/offsite/providers.tf verified-stamp cax11@hel1cx23@hel1 (actual server)
AF8 low terraform/modules/hetzner_vm/variables.tf server_type example cax11 (ARM)cx23 (x86) or cax11 (ARM)
AF9 med inventories/production/group_vars/all/public_dns.yml wildcard comment "cert via DNS-01" → ACME HTTP-01 (M4a)
AF10 high docs/CAPABILITIES.md reverse-proxy candidate TraefikCaddy (ADR-024); public DNS "apply pending" → "applied (M1)"
AF11 low README.md Documentation ADR list extended ADR-017 → ADR-024

Open findings (prioritised)

High

  • O1 — drift — STATUS.md:41 (+45-48) ↔ 33-34 (new): docker_host still appears in the "Scaffolded but empty — NOT implemented" table as a no-op, contradicting its own "Built + applied" rows and the real tasks file. Reword the scaffold row + closing paragraph (left for the operator — STATUS is the ground-truth doc).
  • O2 — consistency — ADR-004:105,131 ↔ ADR-022 (recurring): ADR-004 says backup is "not in scope of this repo"; ADR-022 defines a full in-repo backup doctrine. Repoint ADR-004 at ADR-022 (ADR↔ADR design decision — report).
  • O3 — consistency — ADR-024 Consequences ↔ ADR-008:70/017:27,88/019:52 (new): ADR-024 claims it updated ADR-017's Traefik prose to Caddy; it didn't, and ADR-008/019 still say Traefik too. Either finish the rename or soften ADR-024's claim.
  • O4 — conformance — ADR-023:7-8,77-80 ↔ ADR-016/017/018 (recurring): ADR-023 claims ADRs 001018 were restructured to lead with ## Status, but 016/017/018 still open with ## Context and bury Status. Fix the three ADRs or correct ADR-023 §6.

Medium

  • O5 — ADR-004:48-50 (recurring): service-role file table omits ACCESS.md + BACKUP.md rows (now mandated by CLAUDE.md/ADR-021/022).
  • O6 — ADR-002:82 (recurring): make deploy PLAYBOOK=upgrade cited as real, but no upgrade.yml exists and ADR-011 is unbuilt — needs a (planned) caveat.
  • O7 — CAPABILITIES:150-155 ↔ STATUS:29 (recurring): nvim/tmux listed as a "confirmed exclusion" while dev_env installs them on ubongo; needs a control-host carve-out (not a token swap, so left from AF10).
  • O8 — dev_env tasks (include_tasks + per_user.yml:4-9) (recurring): untagged set_fact dev_env__home preflight + include without apply: tags:; a partial --tags users|config run breaks (base guards this; dev_env doesn't).
  • O9 — inventories/production/hosts.yml (recurring): header claims TF-generated but it's hand-maintained (carries ubongo, omits offsite_hosts); tf-inventory would drop ubongo. Make the header honest.
  • O10 — group_vars/all/vars.yml:42 ↔ ADR-007 (recurring): ubongo 10.20.10.151 is in no ADR-007 subnet and undocumented; base__firewall_control_addr depends on it.
  • O11 — terraform tfvars.example (both envs) (recurring): pve01 vs ADR-007's pve0; verify the real node name before changing.
  • O12 — roles/reverse_proxy/ (new): first built+applied service role, but missing SECURITY/VERIFY/ACCESS/BACKUP.md. (Recorded judgement: public_dns is exempt — control- node external-API role, not a host service.)
  • O15 — runbooks/new-host.md Part E (recurring): still describes an ansible user on ubongo; STATUS says ubongo is managed as sjat (ansible-user bootstrap pending).
  • O18 — ADR-007/009/016 internal-zone name (new): boma.baobab.band vs target boma.wingu.me used inconsistently across the doc set after M1; state the transition in one place.

Low

O13 (See-also vs ## Related in ADR-012/013/015/016/017/018 — recurring), O14 (docs/README + inventories/README narrow enumerations — recurring), O16 (.zshrc rclone alias + unguarded direnv hook — recurring), O17 (oh_my_posh zen.toml tasks missing config tag — recurring), O19 (ADR-009:122 nyumbani example after retirement — recurring), O20 (ROADMAP M2 CAX11/ARM vs cx23/x86 — new), O21 (ADR-020 "ports will be added in M4" stale; already opened in M4a — new), O22 (ADR-024 body still asserts custom- image obligation contradicting its revised Status — new), O23 (netbird_coordinator vs netbird role name across ADRs/ROADMAP/plan — new), O24 (*.boma.<domain> vs *.<boma-domain> wildcard scope ADR-024 vs ROADMAP — new), O25 (tags: [verify] out of the ADR-019 vocabulary in molecule verify — new), O26 (reverse_proxy templates lack ansible_managed header — new), O27 (reverse_proxy vars in group_vars/all/ not offsite_hosts/ — new), O28 (capacity-scan.py ignores offsite.yml — new), O29 (offsite.yml duplicates empty groups from hosts.yml, undocumented merge — new).

Full detail + suggested fixes in 2026-06-14-findings.json.

Themes worth a deliberate pass

  1. Finish the Traefik→Caddy rename (O3, and ADR-024 over-claimed it was done). One sweep across ADR-008/017/019 closes it.
  2. STATUS docker_host self-contradiction (O1) — quick, but it's the ground-truth doc.
  3. ADR-024 internal consistency (O22) — the role went vanilla/HTTP-01 but the ADR body still mandates the custom image; reconcile §2/§3/Consequences with its own Status.
  4. dev_env tag-isolation (O8) — the one real conformance bug with runtime impact; mirror base's apply: tags: guard.
  5. First service-role doc quartet (O12) — reverse_proxy is the template for every future service role; getting SECURITY/VERIFY/ACCESS/BACKUP.md right now pays forward.

Follow-up prompt

Work the open findings from docs/reviews/2026-06-14-review.md. Priority order: (1) O1 — fix the STATUS.md docker_host contradiction (it's built+applied, not a no-op; reword the "Scaffolded but empty" row + the 45-48 paragraph). (2) O3 + O22 — finish the Traefik→Caddy rename in ADR-008:70, ADR-017:27,88, ADR-019:52, and reconcile ADR-024's body (§2 custom image, §3 NetBird, Consequences) with its own revised HTTP-01 Status note. (3) O2 + O5 — repoint ADR-004's "backup not in scope" line at ADR-022 and add ACCESS.md + BACKUP.md rows to its service-role file table. (4) O8 — add apply: tags: [users, config] to dev_env's per_user.yml include and tag the dev_env__home set_fact always; add a Molecule assertion that a partial --tags config run still resolves the home dir. (5) O12 — author the four service-role doc files for roles/reverse_proxy/ from the templates (BACKUP.md = backup__state: false, re-issuable certs). (6) O4 — restructure ADR-016/017/018 to lead with ## Status, or correct ADR-023 §6. Then the medium drift items (O6 upgrade caveat, O7 nvim/tmux carve-out, O9 hosts.yml header, O15 new-host Part E, O18 internal-zone naming). Run make lint after each batch; commit per CLAUDE.md git conventions.