11 safe auto-fixes (docs/comments only): reverse_proxy meta stale DNS-01 description, base/playbooks/scripts/terraform/public_dns README build-state, CAPABILITIES reverse-proxy Traefik→Caddy, README ADR list → 024, TF cax11→cx23 stamps, public_dns wildcard DNS-01→HTTP-01 comment. 29 open findings reported. make lint green. No stale-deferred (ADR-011 open questions still open). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
157 lines
9.7 KiB
Markdown
157 lines
9.7 KiB
Markdown
# Repo review — 2026-06-14
|
||
|
||
- **Reviewed commit:** `e346137` (docs(plan): M4b — NetBird coordinator service role)
|
||
- **Mode:** on-demand (interactive — auto-fixes applied + committed)
|
||
- **Previous run:** 2026-06-11 (`67f2aba`)
|
||
- **`make lint`:** green before and after fixes (260 files, profile production; check-tags OK).
|
||
|
||
## Summary
|
||
|
||
A lot shipped since the last review (M4a: `docker_host` Docker engine, `reverse_proxy`
|
||
Caddy applied to askari; offsite Terraform env live; ADR-024). Most findings this run are
|
||
the predictable **docs-lagging-the-build** kind — stale "not built yet" notes, a
|
||
reverse-proxy that switched from DNS-01/custom-image to vanilla HTTP-01 leaving stale
|
||
descriptions behind, and the **Traefik→Caddy** rename only half-propagated through the
|
||
ADR set. The previous run's blocker (O1, `make lint` RED) is **resolved**.
|
||
|
||
### Counts
|
||
|
||
| Dimension | High | Medium | Low | Total |
|
||
|---|---|---|---|---|
|
||
| Cruft / staleness | 0 | 0 | 0 | 0 |
|
||
| Design conformance | 1 | 2 | 2 | 5 |
|
||
| Consistency & intent | 2 | 2 | 9 | 13 |
|
||
| Docs-vs-reality drift | 1 | 4 | 5 | 10 |
|
||
| **Open total** | **4** | **8** | **16** | **29** |
|
||
|
||
Plus **11 auto-fixes applied** (3 high, 5 medium, 3 low).
|
||
|
||
### Phase-0 scan
|
||
|
||
`repo-scan.py`: 5 roles, 25 ADRs · broken-adr-ref=4, broken-path-ref=2, marker=14,
|
||
open-deferred-item=5, **stale-deferred=0**. Every scan finding is a known false-positive
|
||
(test fixtures ADR-099/100; the `roles/netbird/` references in the M4b *plan* for unbuilt
|
||
work; superpowers planning artifacts; `019-tagging.md:14` is prose about "over-tagging",
|
||
not a TODO). Details in the findings JSON.
|
||
|
||
### Deferral checklist
|
||
|
||
All 5 ADR-011 "Open questions" (Proxmox snapshot driver, exact cadences, health-check
|
||
harness home, classification home, staging-first) confirmed **genuinely still open** —
|
||
ADR-011 is still Proposed/unbuilt, the same questions sit open in `docs/TODO.md` item 16,
|
||
and no later ADR or STATUS decides any of them. **No stale-deferred** (same as last run).
|
||
|
||
## Auto-fixes applied
|
||
|
||
All safe/obvious (stale text contradicting code/reality, partial enumerations, broken
|
||
descriptions) — no logic, variable, secret, or task-order changes.
|
||
|
||
| ID | Sev | File | What |
|
||
|---|---|---|---|
|
||
| AF1 | high | `roles/reverse_proxy/meta/main.yml` | description still said DNS-01 + custom on-host image → rewrote to vanilla Caddy + HTTP-01 (matches the role since b7e919d) |
|
||
| AF2 | med | `roles/README.md` | base hardening + docker_host/reverse_proxy/public_dns build-state was stale → reconciled with STATUS |
|
||
| AF3 | med | `playbooks/README.md` | stale "docker_host has no tasks" note; added missing `dns.yml` + `offsite.yml` bullets |
|
||
| AF4 | low | `roles/public_dns/README.md` | "askari in M4" → askari + `*.askari` records applied in M4a |
|
||
| AF5 | low | `scripts/README.md` | added the missing `check-tags.py` entry (run by `make lint`) |
|
||
| AF6 | med | `terraform/README.md` | added `modules/hetzner_vm` + `environments/offsite` (the one applied env) |
|
||
| AF7 | low | `terraform/environments/offsite/providers.tf` | verified-stamp `cax11@hel1` → `cx23@hel1` (actual server) |
|
||
| AF8 | low | `terraform/modules/hetzner_vm/variables.tf` | `server_type` example `cax11 (ARM)` → `cx23 (x86) or cax11 (ARM)` |
|
||
| AF9 | med | `inventories/production/group_vars/all/public_dns.yml` | wildcard comment "cert via DNS-01" → ACME HTTP-01 (M4a) |
|
||
| AF10 | high | `docs/CAPABILITIES.md` | reverse-proxy candidate `Traefik` → `Caddy (ADR-024)`; public DNS "apply pending" → "applied (M1)" |
|
||
| AF11 | low | `README.md` | Documentation ADR list extended ADR-017 → ADR-024 |
|
||
|
||
## Open findings (prioritised)
|
||
|
||
### High
|
||
|
||
- **O1 — drift — STATUS.md:41 (+45-48) ↔ 33-34** *(new)*: docker_host still appears in
|
||
the "Scaffolded but empty — NOT implemented" table as a no-op, contradicting its own
|
||
"Built + applied" rows and the real tasks file. Reword the scaffold row + closing
|
||
paragraph (left for the operator — STATUS is the ground-truth doc).
|
||
- **O2 — consistency — ADR-004:105,131 ↔ ADR-022** *(recurring)*: ADR-004 says backup is
|
||
"not in scope of this repo"; ADR-022 defines a full in-repo backup doctrine. Repoint
|
||
ADR-004 at ADR-022 (ADR↔ADR design decision — report).
|
||
- **O3 — consistency — ADR-024 Consequences ↔ ADR-008:70/017:27,88/019:52** *(new)*:
|
||
ADR-024 claims it updated ADR-017's Traefik prose to Caddy; it didn't, and ADR-008/019
|
||
still say Traefik too. Either finish the rename or soften ADR-024's claim.
|
||
- **O4 — conformance — ADR-023:7-8,77-80 ↔ ADR-016/017/018** *(recurring)*: ADR-023
|
||
claims ADRs 001–018 were restructured to lead with `## Status`, but 016/017/018 still
|
||
open with `## Context` and bury Status. Fix the three ADRs or correct ADR-023 §6.
|
||
|
||
### Medium
|
||
|
||
- **O5 — ADR-004:48-50** *(recurring)*: service-role file table omits ACCESS.md +
|
||
BACKUP.md rows (now mandated by CLAUDE.md/ADR-021/022).
|
||
- **O6 — ADR-002:82** *(recurring)*: `make deploy PLAYBOOK=upgrade` cited as real, but no
|
||
`upgrade.yml` exists and ADR-011 is unbuilt — needs a `(planned)` caveat.
|
||
- **O7 — CAPABILITIES:150-155 ↔ STATUS:29** *(recurring)*: nvim/tmux listed as a
|
||
"confirmed exclusion" while `dev_env` installs them on ubongo; needs a control-host
|
||
carve-out (not a token swap, so left from AF10).
|
||
- **O8 — dev_env tasks (include_tasks + per_user.yml:4-9)** *(recurring)*: untagged
|
||
`set_fact dev_env__home` preflight + include without `apply: tags:`; a partial
|
||
`--tags users|config` run breaks (base guards this; dev_env doesn't).
|
||
- **O9 — inventories/production/hosts.yml** *(recurring)*: header claims TF-generated but
|
||
it's hand-maintained (carries ubongo, omits offsite_hosts); `tf-inventory` would drop
|
||
ubongo. Make the header honest.
|
||
- **O10 — group_vars/all/vars.yml:42 ↔ ADR-007** *(recurring)*: ubongo `10.20.10.151` is
|
||
in no ADR-007 subnet and undocumented; `base__firewall_control_addr` depends on it.
|
||
- **O11 — terraform tfvars.example (both envs)** *(recurring)*: `pve01` vs ADR-007's
|
||
`pve0`; verify the real node name before changing.
|
||
- **O12 — roles/reverse_proxy/** *(new)*: first built+applied service role, but missing
|
||
SECURITY/VERIFY/ACCESS/BACKUP.md. (Recorded judgement: public_dns is exempt — control-
|
||
node external-API role, not a host service.)
|
||
- **O15 — runbooks/new-host.md Part E** *(recurring)*: still describes an `ansible` user
|
||
on ubongo; STATUS says ubongo is managed as `sjat` (ansible-user bootstrap pending).
|
||
- **O18 — ADR-007/009/016 internal-zone name** *(new)*: `boma.baobab.band` vs target
|
||
`boma.wingu.me` used inconsistently across the doc set after M1; state the transition
|
||
in one place.
|
||
|
||
### Low
|
||
|
||
O13 (See-also vs `## Related` in ADR-012/013/015/016/017/018 — recurring), O14
|
||
(docs/README + inventories/README narrow enumerations — recurring), O16 (.zshrc rclone
|
||
alias + unguarded direnv hook — recurring), O17 (oh_my_posh zen.toml tasks missing
|
||
`config` tag — recurring), O19 (ADR-009:122 `nyumbani` example after retirement —
|
||
recurring), O20 (ROADMAP M2 CAX11/ARM vs cx23/x86 — new), O21 (ADR-020 "ports will be
|
||
added in M4" stale; already opened in M4a — new), O22 (ADR-024 body still asserts custom-
|
||
image obligation contradicting its revised Status — new), O23 (`netbird_coordinator` vs
|
||
`netbird` role name across ADRs/ROADMAP/plan — new), O24 (`*.boma.<domain>` vs
|
||
`*.<boma-domain>` wildcard scope ADR-024 vs ROADMAP — new), O25 (`tags: [verify]` out of
|
||
the ADR-019 vocabulary in molecule verify — new), O26 (reverse_proxy templates lack
|
||
`ansible_managed` header — new), O27 (reverse_proxy vars in `group_vars/all/` not
|
||
`offsite_hosts/` — new), O28 (capacity-scan.py ignores `offsite.yml` — new), O29
|
||
(offsite.yml duplicates empty groups from hosts.yml, undocumented merge — new).
|
||
|
||
Full detail + suggested fixes in `2026-06-14-findings.json`.
|
||
|
||
## Themes worth a deliberate pass
|
||
|
||
1. **Finish the Traefik→Caddy rename** (O3, and ADR-024 over-claimed it was done). One
|
||
sweep across ADR-008/017/019 closes it.
|
||
2. **STATUS docker_host self-contradiction** (O1) — quick, but it's the ground-truth doc.
|
||
3. **ADR-024 internal consistency** (O22) — the role went vanilla/HTTP-01 but the ADR
|
||
body still mandates the custom image; reconcile §2/§3/Consequences with its own Status.
|
||
4. **dev_env tag-isolation** (O8) — the one real conformance bug with runtime impact;
|
||
mirror base's `apply: tags:` guard.
|
||
5. **First service-role doc quartet** (O12) — reverse_proxy is the template for every
|
||
future service role; getting SECURITY/VERIFY/ACCESS/BACKUP.md right now pays forward.
|
||
|
||
## Follow-up prompt
|
||
|
||
> Work the open findings from `docs/reviews/2026-06-14-review.md`. Priority order:
|
||
> (1) **O1** — fix the STATUS.md docker_host contradiction (it's built+applied, not a
|
||
> no-op; reword the "Scaffolded but empty" row + the 45-48 paragraph).
|
||
> (2) **O3 + O22** — finish the Traefik→Caddy rename in ADR-008:70, ADR-017:27,88,
|
||
> ADR-019:52, and reconcile ADR-024's body (§2 custom image, §3 NetBird, Consequences)
|
||
> with its own revised HTTP-01 Status note.
|
||
> (3) **O2 + O5** — repoint ADR-004's "backup not in scope" line at ADR-022 and add
|
||
> ACCESS.md + BACKUP.md rows to its service-role file table.
|
||
> (4) **O8** — add `apply: tags: [users, config]` to dev_env's per_user.yml include and
|
||
> tag the `dev_env__home` set_fact `always`; add a Molecule assertion that a partial
|
||
> `--tags config` run still resolves the home dir.
|
||
> (5) **O12** — author the four service-role doc files for `roles/reverse_proxy/` from the
|
||
> templates (BACKUP.md = `backup__state: false`, re-issuable certs).
|
||
> (6) **O4** — restructure ADR-016/017/018 to lead with `## Status`, or correct ADR-023 §6.
|
||
> Then the medium drift items (O6 upgrade caveat, O7 nvim/tmux carve-out, O9 hosts.yml
|
||
> header, O15 new-host Part E, O18 internal-zone naming). Run `make lint` after each
|
||
> batch; commit per CLAUDE.md git conventions.
|