boma/docs/reviews/2026-06-05-review.md

94 lines
7.6 KiB
Markdown
Raw Normal View History

# Repo review — 2026-06-05
- **Reviewed commit:** `f566fd1` (scan); auto-fixes landed in `666ad42`
- **Mode:** on-demand (interactive)
- **Scope:** whole repo — 2 roles, 17 ADRs, 4 runbooks, 7 scripts; doc-heavy
- **Prior run:** 2026-05-30 (`de38d1c`) — 7 auto-fixed, 17 open
## Summary
| | high | medium | low | total |
|---|---|---|---|---|
| Auto-fixed | 2 | 0 | 2 | 4 |
| Open (report-only) | 0 | 5 | 7 | 12 |
This review followed a session of heavy documentation work (ADR-015 `ubongo`,
ADR-016 NetBird mesh, ADR-017 Level-4 verification). Most findings are **propagation
gaps** — a new decision landed but an older doc still reflects the prior design.
**New deferral check exercised.** `repo-scan.py` now enumerates open ADR "Deferred/
Open" items and flags any another file calls resolved-but-unmarked. This run: 6
open-deferred-items surfaced, **all confirmed genuinely open** by the cross-cutting
reviewer (ADR-011 #15, ADR-015 #3), **0 stale-deferred**. The check produced no false
resolutions and the judgement layer agreed — working as designed.
## Auto-fixes applied (`666ad42`)
| id | dim | sev | location | fix |
|---|---|---|---|---|
| AF1 | consistency | high | `docs/decisions/005-bootstrapping.md:36`, `docs/runbooks/new-host.md:62,71` | Removed "Terraform writes the host's DNS A record" — contradicts ADR-009 (the `dns` role owns the zone). **Recurring**: the 2026-05-30 run fixed the same contradiction in README/ADR-003; it reappeared in two more files. |
| AF2 | consistency | high | `docs/decisions/005-bootstrapping.md:8` | Control node described as cloned from the cloud-init template; ADR-015 makes `ubongo` a physical box installed directly. Corrected. |
| AF3 | consistency | low | `CLAUDE.md:197` | Added the missing `docs/testing/service-verify-template.md` row to Further reading (parallels the security-template row). |
| AF4 | cruft | low | `docs/TODO.md:79` | Typos: "we we" → "we"; "seperate" → "separate". |
## Open findings (report-only)
### VERIFY.md propagation cluster (ADR-017 not fully threaded through)
| id | sev | location | finding | suggested fix |
|---|---|---|---|---|
| O1 | medium | `docs/decisions/004-docker-model.md` (file table) | The service-role standard lists `SECURITY.md` but not `VERIFY.md`, though ADR-017 + CLAUDE.md:85 now mandate it. | Add a `VERIFY.md` row to ADR-004's file table. |
| O2 | medium | `docs/runbooks/new-role.md` (step 9 → Commit) | No step to write `VERIFY.md` for service roles (only `SECURITY.md`). Makes `STATUS.md:17` ("runbooks current and mutually reconciled") slightly overstated. | Add a "write the per-service verification spec" step mirroring the SECURITY.md step. |
| O3 | low | `README.md:58-60, 94` | ADR list stops at 001009 (010017 absent); the `docs/` tree omits `security/`, `testing/`, `hardware/`. | Extend the ADR list (or point to `docs/decisions/` + CLAUDE.md's table); expand the `docs/` subtree. |
### Design gaps from the recent ADRs
| id | sev | location | finding | suggested fix |
|---|---|---|---|---|
| O4 | medium | `CLAUDE.md:106`, `docs/decisions/009-provisioning-handoff.md:78`, `scripts/tf_to_inventory.py:24` | ADR-016 says "`askari` is Ansible-managed — its own inventory group", but no group is named anywhere; host-groups list + valid-groups set don't include it. | Decide the group name (e.g. `edge_hosts`/`hetzner_hosts`), add to CLAUDE.md host groups + ADR-009 valid groups. (`askari` is manual like the control node, so `tf_to_inventory.py` need not generate it, but the group must be valid.) |
| O5 | medium | `docs/decisions/006-terraform.md:78` | `backend.tf` labelled "Forgejo state backend", contradicting ADR-006's own State-backend section (local state on `ubongo`; Forgejo's API is read-only). | Relabel to "local state backend (no remote backend)". |
| O6 | medium | `docs/decisions/014-knowledge-sourcing.md:88` | Plugin-reproducibility described as open ("tracked in `docs/TODO.md`"), but TODO 10.7 is marked DONE (settings.json declares the plugin set; claude-code-setup.md covers bootstrap). | Update to reflect the resolved state; drop the forward-pointer. |
### Clarity / lower-priority consistency
| id | sev | location | finding | suggested fix |
|---|---|---|---|---|
| O7 | low | `docs/decisions/011-update-management.md:128` | "Digest-pinning the stateful tier" sits in the ruled-out table, but Decision #2 *adopts* `tag@digest` for stateful (TODO 16 confirms). ADR-011 is still **Proposed/draft**. | Remove/replace the ruled-out row when accepting ADR-011 (TODO 16). |
| O8 | low | `docs/decisions/003-toolchain.md:85`, `docs/decisions/010-forgejo-ci.md:66` | "act_runner on the control node **or a dedicated runner VM**" reads ambiguously against ADR-015 (no cluster control VM). Not wrong (a runner VM is a separate option) but worth disambiguating. | Name `ubongo` as the runner host; cross-ref ADR-015; keep "dedicated runner VM" as an explicit future option. |
| O9 | low | `docs/decisions/008-testing.md:148` | The "WireGuard tunnel establishment" Molecule-exclusion row is framed for the retired OPNsense VLAN-99 WireGuard; NetBird still uses WireGuard (`wt0`) as its data plane. | Reframe the row to the NetBird `wt0` data-plane (ADR-016). |
| O10 | low | `docs/decisions/011-update-management.md:67` | Cross-references "the `scheduled_jobs` plan and ADR-010"; ADR-010 is Forgejo CI, not scheduled jobs (that's TODO 8.3, unbuilt). | Point to TODO 8.3 instead. |
| O11 | low | `docs/CAPABILITIES.md` §10 | No row for the `/verify-service` (Level 4) capability though ADR-017 decided it. | Add an Operations row for `/verify-service`. |
| O12 | low | `docs/TODO.md:30` (item 3.10) | Garbled text ("maybe something in the improvements of the methods in boma moods the point?") — unfollowable. | Rewrite the question clearly or strike it. |
### Deterministic-scan noise (not fixed — known limitations)
- **`broken-path-ref` ×14** — all illustrative/future paths: report-name templates
(`docs/testing/reviews/YYYY-MM-DD-<service>.md`) and `latest.md` files not yet
created. The path-ref check stops at the `<placeholder>` boundary, so a templated
path registers as a partial broken ref. *Potential scanner improvement: skip a path
ref immediately followed by a placeholder char or a `YYYY-MM-DD` token.*
- **`marker` ×35** — mostly prose references to `TODO.md` items, not code markers.
Known noise; the regex already excludes `TODO.md`/alternations but not "TODO 8.2"
prose.
- **`open-deferred-item` ×6** — all confirmed genuinely open (see above). `0`
stale-deferred. New check healthy.
## Diff vs prior run (2026-05-30)
- **Recurring:** the Terraform-writes-DNS contradiction (AF1) — fixed in README/ADR-003
last run, reappeared in ADR-005/new-host.md. Signal that this phrasing keeps being
copied; worth a `/review-repo`-time grep for "writes … DNS A record".
- **New:** everything else — the repo gained ADR-010…017 and the `ubongo`/NetBird/
Level-4 work since the prior run, so most findings are fresh propagation gaps.
- **Resolved:** prior-run open items were largely addressed during the intervening
doc work (control-node-as-VM, WireGuard framing, etc., now mostly reconciled).
## Follow-up prompt
> Thread the ADR-017 `VERIFY.md` convention through the remaining docs (O1O3): add a
> `VERIFY.md` row to ADR-004's service-role file table, a VERIFY.md step to
> `new-role.md` (and reconcile STATUS.md:17), and refresh `README.md`'s ADR list +
> `docs/` tree. Then settle the `askari` inventory group name (O4) and propagate it to
> CLAUDE.md host-groups + ADR-009 valid-groups. Finally clear the stale labels O5
> (ADR-006 backend.tf) and O6 (ADR-014 plugin reproducibility = DONE).