boma/docs/reviews/2026-06-11-review.md
sjat 1da117d65b docs(review): 2026-06-11 repo audit — fix build-wave doc drift
/review-repo run at 67f2aba. Auto-fixed 5 safe doc-drift items left by the
base(firewall)+dev_env build wave: README/playbook/role notes that still called
the roles "empty/not built", plus README tree gaps and the reciprocal ADR-021
cross-links in ADR-016/020.

18 open findings reported (not fixed). Headline: `make lint` is red on `main`
(site.yml imports the non-existent docker_host role) and an ADR-004 <-> ADR-022
backup-scope contradiction. Deferral checklist clean (0 stale-deferred); 7 of
12 prior findings confirmed resolved. See docs/reviews/2026-06-11-review.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 14:48:00 +02:00

9.8 KiB
Raw Permalink Blame History

Repo review — 2026-06-11

  • Reviewed commit: 67f2aba (main)
  • Mode: on-demand (interactive)
  • Previous run: 2026-06-05 (commit f566fd1)
  • Process: Phase 0 deterministic scan → 5 parallel shard reviewers + 1 cross-cutting reviewer → synthesis, deferral-checklist resolution, prior-run diff → safe auto-fixes.

Summary

High Medium Low Total
Auto-fixed 1 2 2 5
Open (report-only) 2 7 9 18

By dimension (open): conformance 3 · consistency 8 · drift 6 · cruft 1.

Headline: make lint is currently red on mainplaybooks/site.yml imports the not-yet-existent docker_host role (confirmed at clean HEAD, unrelated to this run's edits). That breaks CLAUDE.md's "main must always work" / "Never skip lint" contract and is the top open finding (O1). The bulk of the rest is documentation drift created by the recent base (firewall) + dev_env build wave: several READMEs/playbook notes still described the roles as "empty / not built." Those were the safe auto-fixes.

Good news: 7 of the 12 open findings from the 2026-06-05 run are confirmed resolved (VERIFY.md row + runbook step, backend.tf relabel, askari group naming, ADR-014 reproducibility, CAPABILITIES Level-4 row, TODO 3.10). The deferral checklist is clean — 0 stale-deferred this run (the recurring miss logged in FRICTION.md did not recur).

Auto-fixes applied

Markdown / YAML-comment only; no runtime behaviour, logic, vars, or task order touched.

ID Sev File(s) What
AF1 high roles/README.md Rewrote stale "base & docker_host are empty untracked dirs, site.yml would fail on a clean clone" → base partially built (firewall), docker_host not yet created, dev_env built+applied.
AF2 med playbooks/site.yml NOTE no longer claims base is unbuilt / "fails on a clean clone"; now reflects firewall-only base + missing docker_host.
AF3 med playbooks/README.md Dropped the "currently a no-op" claim; added a workstation.yml bullet.
AF4 low README.md Added docs/access/, docs/backup/, roles/dev_env/, playbooks/workstation.yml to the project-structure tree.
AF5 low docs/decisions/016-mesh-vpn.md, docs/decisions/020-firewall.md Added the reciprocal ADR-021 cross-reference that ADR-021 says it amended in.

make lint was re-run after the fixes: it fails only on the pre-existing docker_host syntax-check (O1), identical to clean HEAD. No auto-fix introduced or changed any lint result, so none were reverted.

Open findings (prioritised)

High

  • O1 — make lint is red on main · playbooks/site.yml:18 · conformance site.yml imports the docker_host role, which does not exist, so ansible-lint's syntax-check fails on a clean checkout. Violates "main must always work" + "Never skip lint" (pre-commit would block every commit unless bypassed). Fix (judgement): guard/skip the docker_host play until the role exists, scaffold a stub via make new-role NAME=docker_host, or exclude site.yml from syntax-check until built — and record the choice. new

  • O2 — ADR-004 ↔ ADR-022 backup-scope contradiction · docs/decisions/004-docker-model.md:105 · consistency ADR-004 says "Backup strategy is defined separately (not in scope of this repo)"; ADR-022 defines a full in-repo backup strategy. Per ADR-023 (no silent reversals), update ADR-004's line to defer to ADR-022 and cross-link. Design decision — report. new

Medium

  • O3 — ADR-004 service-role file table missing ACCESS.md + BACKUP.md · docs/decisions/004-docker-model.md:48 · consistency — CLAUDE.md + ADR-021/022 now mandate both for service roles; the canonical table lists only SECURITY.md + VERIFY.md. (Prior "missing VERIFY.md" is resolved; this is the next evolution.) new
  • O4 — CAPABILITIES nvim/tmux exclusion ↔ dev_env built · docs/CAPABILITIES.md:149 · consistency — listed as a confirmed exclusion ("server-only"), but dev_env (built+applied to ubongo) installs exactly that. Carve out the control-node/AI-worker exception (ADR-015). new
  • O5 — phantom make deploy PLAYBOOK=upgrade · docs/decisions/002-security.md:82 · drift — no upgrade.yml exists; ADR-011 is unbuilt. Add a "(planned)" caveat. new
  • O6 — hosts.yml stubs missing offsite_hosts group · inventories/{production,staging}/hosts.yml · drift — the generator emits it (one of four VALID_GROUPS); the hand-stubs predate the standard. Regenerate via make tf-inventory (don't hand-edit). (Prior "askari group unnamed" is resolved.) new
  • O7 — new-host runbook Part E vs ubongo reality · docs/runbooks/new-host.md:81-130 · drift — instructs creating an ansible user / ssh ansible@; STATUS records ubongo is managed as sjat, ansible-user bootstrap pending. new
  • O8 — dev_env untagged set_fact under tagged consumers · roles/dev_env/tasks/per_user.yml:2-9 · conformance — partial --tags users|config runs skip the dev_env__home set_fact and fail. Tag the preflight [users, config] or always. new
  • O9 — ubongo address outside ADR-007 subnets · STATUS.md:31 ↔ 007-network.md · drift — 10.20.10.151 is in neither srv (10.20.0.0/24) nor mgmt (10.10.0.0/24); base__firewall_control_addr depends on it. Already a tracked follow-up in the ubongo-build plan. Reconcile address or ADR-007. new

Low

  • O10 — README ADR list stops at 017 · README.md:104 · drift — 018023 exist; extend or trim to a pointer. recurring (evolved from prior O3)
  • O11 — ADR section-order vs ADR-023 §2 · 008:3, 014:98, 016:91, 017:66, 018:73 · conformance — Status-not-first / Decision-late; passes lint (order not gated) but not the standard. Presentational restructure. new
  • O12 — ADR-007 FQDN convention vs its own example · 007-network.md:160 · consistency<service>.baobab.band vs forgejo.nyumbani.baobab.band; ties to open TODO 4 (split-horizon). new
  • O13 — dev_env .zshrc heritage carryovers · roles/dev_env/files/dotfiles/zsh/.zshrc:28,55 · consistency — hard-coded /usr/bin/rclone alias (not installed by the role) + unguarded direnv hook. new
  • O14 — oh_my_posh config tasks untagged · roles/dev_env/tasks/oh_my_posh.yml:15-26 · consistency — inconsistent config tagging vs per_user.yml. new
  • O15 — tfvars.example pve01 vs ADR-007 pve0 · terraform/environments/*/terraform.tfvars.example:9 · consistency — verify the real node name, then align. new
  • O16 — ADR-013/015 "See also:" vs ## Related · consistency — stylistic; convert for uniformity. new
  • O17 — empty scaffold handlers/main.yml · roles/{dev_env,base}/handlers/main.yml · cruft — confirm convention or delete. new
  • O18 — docs/README.md + inventories/README.md narrower than reality · consistency — omit several real subdirs / the offsite_hosts group. new

Deferral checklist (Phase 2)

Source Items Verdict
ADR-011 Deferred/Open 5 (snapshot driver, cadences, health-check harness home, classification home, staging-first) All genuinely still open — cross-checked against later ADRs + TODO 16. None silently resolved.
ADR-015 Deferred #1 mesh VPN, #2 service-UI, #3 build All marked RESOLVED in place (ADR-016 / ADR-017 / 2026-06-11 build).

Stale-deferred found: 0. The recurring FRICTION.md miss did not recur this run.

Scan false positives (folded in, not actionable)

  • broken-path-ref STATUS.md:38 — STATUS legitimately documents roles/docker_host/ as "Not in git." (intentional reference to an unbuilt role).
  • broken-adr-ref ×4 — ADR-099/ADR-100 in tests/test_repo_scan.py and the adr-structure plan are intentional test fixtures for the scanner's bad-ref check.
  • marker ×14 — all in docs/superpowers/{plans,specs}/* (historical commit-message TODOs / plan steps) or prose discussing "over-tagging" as a concept. Not cruft.

Prior-run diff (vs 2026-06-05)

Resolved (7): O1 VERIFY.md row · O2 new-role VERIFY step · O4 askari group naming · O5 backend.tf relabel · O6 ADR-014 reproducibility · O11 CAPABILITIES Level-4 row · O12 TODO 3.10. Partial: O3 (docs tree fixed in AF4; ADR-list carried as O10). Not re-detected (verify next run): O7O10 (ADR-011 still Proposed).

Follow-up prompt (copy-paste)

Act on the open findings from docs/reviews/2026-06-11-review.md. Priority order:

  1. O1 (high): make lint is red on mainplaybooks/site.yml imports the non-existent docker_host role. Pick an interim posture (guard/skip the play, or make new-role NAME=docker_host to scaffold a stub, or exclude from syntax-check until built) so the trunk lints clean again, and record the choice in STATUS.md.
  2. O2 (high): Resolve the ADR-004 ↔ ADR-022 backup-scope contradiction — update ADR-004's "not in scope of this repo" line to defer to ADR-022 (per ADR-023's no-silent-reversal rule) and cross-link.
  3. O3: Add ACCESS.md + BACKUP.md rows to ADR-004's service-role file table.
  4. O4: Reconcile CAPABILITIES' nvim/tmux exclusion with the built dev_env role (carve out the ubongo control-node exception).
  5. O8 (conformance): Tag the dev_env__home preflight set_fact so partial --tags users|config runs don't fail.
  6. O6 / O9: Regenerate the inventory stubs to include offsite_hosts; reconcile ubongo's 10.20.10.151 against ADR-007's subnets (or amend ADR-007).
  7. Sweep the low-severity doc items (O5 caveat, O7 runbook, O10 ADR list, O11 ADR section order, O12O18) as a single docs-hygiene batch. Run make lint before committing; commit per CLAUDE.md git conventions.