boma/docs/reviews/2026-06-11-review.md
sjat 1da117d65b docs(review): 2026-06-11 repo audit — fix build-wave doc drift
/review-repo run at 67f2aba. Auto-fixed 5 safe doc-drift items left by the
base(firewall)+dev_env build wave: README/playbook/role notes that still called
the roles "empty/not built", plus README tree gaps and the reciprocal ADR-021
cross-links in ADR-016/020.

18 open findings reported (not fixed). Headline: `make lint` is red on `main`
(site.yml imports the non-existent docker_host role) and an ADR-004 <-> ADR-022
backup-scope contradiction. Deferral checklist clean (0 stale-deferred); 7 of
12 prior findings confirmed resolved. See docs/reviews/2026-06-11-review.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 14:48:00 +02:00

161 lines
9.8 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Repo review — 2026-06-11
- **Reviewed commit:** `67f2aba` (main)
- **Mode:** on-demand (interactive)
- **Previous run:** `2026-06-05` (commit `f566fd1`)
- **Process:** Phase 0 deterministic scan → 5 parallel shard reviewers + 1 cross-cutting
reviewer → synthesis, deferral-checklist resolution, prior-run diff → safe auto-fixes.
## Summary
| | High | Medium | Low | Total |
|---|---|---|---|---|
| **Auto-fixed** | 1 | 2 | 2 | 5 |
| **Open (report-only)** | 2 | 7 | 9 | 18 |
By dimension (open): conformance 3 · consistency 8 · drift 6 · cruft 1.
**Headline:** `make lint` is currently **red on `main`**`playbooks/site.yml` imports the
not-yet-existent `docker_host` role (confirmed at clean HEAD, unrelated to this run's
edits). That breaks CLAUDE.md's "main must always work" / "Never skip lint" contract and
is the top open finding (O1). The bulk of the rest is documentation drift created by the
recent `base` (firewall) + `dev_env` build wave: several READMEs/playbook notes still
described the roles as "empty / not built." Those were the safe auto-fixes.
**Good news:** 7 of the 12 open findings from the 2026-06-05 run are confirmed resolved
(VERIFY.md row + runbook step, backend.tf relabel, askari group naming, ADR-014
reproducibility, CAPABILITIES Level-4 row, TODO 3.10). The deferral checklist is clean —
**0 stale-deferred** this run (the recurring miss logged in FRICTION.md did not recur).
## Auto-fixes applied
Markdown / YAML-comment only; no runtime behaviour, logic, vars, or task order touched.
| ID | Sev | File(s) | What |
|---|---|---|---|
| AF1 | high | `roles/README.md` | Rewrote stale "base & docker_host are empty untracked dirs, site.yml would fail on a clean clone" → base partially built (firewall), docker_host not yet created, dev_env built+applied. |
| AF2 | med | `playbooks/site.yml` | NOTE no longer claims base is unbuilt / "fails on a clean clone"; now reflects firewall-only base + missing docker_host. |
| AF3 | med | `playbooks/README.md` | Dropped the "currently a no-op" claim; added a `workstation.yml` bullet. |
| AF4 | low | `README.md` | Added `docs/access/`, `docs/backup/`, `roles/dev_env/`, `playbooks/workstation.yml` to the project-structure tree. |
| AF5 | low | `docs/decisions/016-mesh-vpn.md`, `docs/decisions/020-firewall.md` | Added the reciprocal `ADR-021` cross-reference that ADR-021 says it amended in. |
> `make lint` was re-run after the fixes: it fails **only** on the pre-existing
> `docker_host` syntax-check (O1), identical to clean HEAD. No auto-fix introduced or
> changed any lint result, so none were reverted.
## Open findings (prioritised)
### High
- **O1 — `make lint` is red on `main`** · `playbooks/site.yml:18` · *conformance*
site.yml imports the `docker_host` role, which does not exist, so ansible-lint's
syntax-check fails on a clean checkout. Violates "main must always work" + "Never skip
lint" (pre-commit would block every commit unless bypassed).
*Fix (judgement):* guard/skip the docker_host play until the role exists, scaffold a
stub via `make new-role NAME=docker_host`, or exclude site.yml from syntax-check until
built — and record the choice. **new**
- **O2 — ADR-004 ↔ ADR-022 backup-scope contradiction** ·
`docs/decisions/004-docker-model.md:105` · *consistency*
ADR-004 says "Backup strategy is defined separately (not in scope of this repo)";
ADR-022 defines a full in-repo backup strategy. Per ADR-023 (no silent reversals),
update ADR-004's line to defer to ADR-022 and cross-link. Design decision — report. **new**
### Medium
- **O3 — ADR-004 service-role file table missing ACCESS.md + BACKUP.md** ·
`docs/decisions/004-docker-model.md:48` · *consistency* — CLAUDE.md + ADR-021/022 now
mandate both for service roles; the canonical table lists only SECURITY.md + VERIFY.md.
(Prior "missing VERIFY.md" is resolved; this is the next evolution.) **new**
- **O4 — CAPABILITIES nvim/tmux exclusion ↔ dev_env built** ·
`docs/CAPABILITIES.md:149` · *consistency* — listed as a confirmed exclusion
("server-only"), but `dev_env` (built+applied to ubongo) installs exactly that. Carve
out the control-node/AI-worker exception (ADR-015). **new**
- **O5 — phantom `make deploy PLAYBOOK=upgrade`** · `docs/decisions/002-security.md:82` ·
*drift* — no `upgrade.yml` exists; ADR-011 is unbuilt. Add a "(planned)" caveat. **new**
- **O6 — hosts.yml stubs missing `offsite_hosts` group** ·
`inventories/{production,staging}/hosts.yml` · *drift* — the generator emits it (one of
four VALID_GROUPS); the hand-stubs predate the standard. Regenerate via
`make tf-inventory` (don't hand-edit). (Prior "askari group unnamed" is resolved.) **new**
- **O7 — new-host runbook Part E vs ubongo reality** · `docs/runbooks/new-host.md:81-130`
· *drift* — instructs creating an `ansible` user / `ssh ansible@`; STATUS records ubongo
is managed as `sjat`, ansible-user bootstrap pending. **new**
- **O8 — dev_env untagged `set_fact` under tagged consumers** ·
`roles/dev_env/tasks/per_user.yml:2-9` · *conformance* — partial `--tags users|config`
runs skip the `dev_env__home` set_fact and fail. Tag the preflight `[users, config]` or
`always`. **new**
- **O9 — ubongo address outside ADR-007 subnets** · `STATUS.md:31 ↔ 007-network.md` ·
*drift* — 10.20.10.151 is in neither srv (10.20.0.0/24) nor mgmt (10.10.0.0/24);
`base__firewall_control_addr` depends on it. Already a tracked follow-up in the
ubongo-build plan. Reconcile address or ADR-007. **new**
### Low
- **O10 — README ADR list stops at 017** · `README.md:104` · *drift* — 018023 exist;
extend or trim to a pointer. **recurring** (evolved from prior O3)
- **O11 — ADR section-order vs ADR-023 §2** · `008:3, 014:98, 016:91, 017:66, 018:73` ·
*conformance* — Status-not-first / Decision-late; passes lint (order not gated) but not
the standard. Presentational restructure. **new**
- **O12 — ADR-007 FQDN convention vs its own example** · `007-network.md:160` ·
*consistency*`<service>.baobab.band` vs `forgejo.nyumbani.baobab.band`; ties to open
TODO 4 (split-horizon). **new**
- **O13 — dev_env `.zshrc` heritage carryovers** ·
`roles/dev_env/files/dotfiles/zsh/.zshrc:28,55` · *consistency* — hard-coded
`/usr/bin/rclone` alias (not installed by the role) + unguarded `direnv` hook. **new**
- **O14 — oh_my_posh config tasks untagged** · `roles/dev_env/tasks/oh_my_posh.yml:15-26`
· *consistency* — inconsistent `config` tagging vs per_user.yml. **new**
- **O15 — tfvars.example `pve01` vs ADR-007 `pve0`** ·
`terraform/environments/*/terraform.tfvars.example:9` · *consistency* — verify the real
node name, then align. **new**
- **O16 — ADR-013/015 "See also:" vs `## Related`** · *consistency* — stylistic; convert
for uniformity. **new**
- **O17 — empty scaffold `handlers/main.yml`** · `roles/{dev_env,base}/handlers/main.yml`
· *cruft* — confirm convention or delete. **new**
- **O18 — docs/README.md + inventories/README.md narrower than reality** · *consistency*
— omit several real subdirs / the offsite_hosts group. **new**
## Deferral checklist (Phase 2)
| Source | Items | Verdict |
|---|---|---|
| ADR-011 Deferred/Open | 5 (snapshot driver, cadences, health-check harness home, classification home, staging-first) | **All genuinely still open** — cross-checked against later ADRs + TODO 16. None silently resolved. |
| ADR-015 Deferred | #1 mesh VPN, #2 service-UI, #3 build | **All marked RESOLVED in place** (ADR-016 / ADR-017 / 2026-06-11 build). |
**Stale-deferred found: 0.** The recurring FRICTION.md miss did not recur this run.
## Scan false positives (folded in, not actionable)
- `broken-path-ref STATUS.md:38` — STATUS legitimately documents `roles/docker_host/` as
"Not in git." (intentional reference to an unbuilt role).
- `broken-adr-ref` ×4 — `ADR-099`/`ADR-100` in `tests/test_repo_scan.py` and the
adr-structure plan are intentional **test fixtures** for the scanner's bad-ref check.
- `marker` ×14 — all in `docs/superpowers/{plans,specs}/*` (historical commit-message
TODOs / plan steps) or prose discussing "over-tagging" as a concept. Not cruft.
## Prior-run diff (vs 2026-06-05)
**Resolved (7):** O1 VERIFY.md row · O2 new-role VERIFY step · O4 askari group naming ·
O5 backend.tf relabel · O6 ADR-014 reproducibility · O11 CAPABILITIES Level-4 row ·
O12 TODO 3.10. **Partial:** O3 (docs tree fixed in AF4; ADR-list carried as O10).
**Not re-detected (verify next run):** O7O10 (ADR-011 still Proposed).
## Follow-up prompt (copy-paste)
> Act on the open findings from `docs/reviews/2026-06-11-review.md`. Priority order:
> 1. **O1 (high):** `make lint` is red on `main` — `playbooks/site.yml` imports the
> non-existent `docker_host` role. Pick an interim posture (guard/skip the play, or
> `make new-role NAME=docker_host` to scaffold a stub, or exclude from syntax-check
> until built) so the trunk lints clean again, and record the choice in STATUS.md.
> 2. **O2 (high):** Resolve the ADR-004 ↔ ADR-022 backup-scope contradiction —
> update ADR-004's "not in scope of this repo" line to defer to ADR-022 (per ADR-023's
> no-silent-reversal rule) and cross-link.
> 3. **O3:** Add ACCESS.md + BACKUP.md rows to ADR-004's service-role file table.
> 4. **O4:** Reconcile CAPABILITIES' nvim/tmux exclusion with the built `dev_env` role
> (carve out the ubongo control-node exception).
> 5. **O8 (conformance):** Tag the `dev_env__home` preflight `set_fact` so partial
> `--tags users|config` runs don't fail.
> 6. **O6 / O9:** Regenerate the inventory stubs to include `offsite_hosts`; reconcile
> ubongo's 10.20.10.151 against ADR-007's subnets (or amend ADR-007).
> 7. Sweep the low-severity doc items (O5 caveat, O7 runbook, O10 ADR list, O11 ADR
> section order, O12O18) as a single docs-hygiene batch.
> Run `make lint` before committing; commit per CLAUDE.md git conventions.