# Repo review — 2026-06-14
- **Reviewed commit:** `e346137` (docs(plan): M4b — NetBird coordinator service role)
- **Mode:** on-demand (interactive — auto-fixes applied + committed)
- **Previous run:** 2026-06-11 (`67f2aba`)
- **`make lint`:** green before and after fixes (260 files, `production` profile; check-tags OK).
## Summary
A lot shipped since the last review (M4a: the `docker_host` Docker engine and the
`reverse_proxy` Caddy role applied to askari; the offsite Terraform env went live; ADR-024).
Most findings this run are the predictable **docs-lagging-the-build** kind: stale "not built
yet" notes, stale descriptions left behind when the reverse proxy switched from DNS-01 with a
custom image to vanilla Caddy with HTTP-01, and the **Traefik→Caddy** rename only
half-propagated through the ADR set. The previous run's blocker (O1, `make lint` RED) is
**resolved**.
### Counts

| Dimension | High | Medium | Low | Total |
|---|---|---|---|---|
| Cruft / staleness | 0 | 0 | 0 | 0 |
| Design conformance | 1 | 2 | 2 | 5 |
| Consistency & intent | 2 | 2 | 9 | 13 |
| Docs-vs-reality drift | 1 | 4 | 5 | 10 |
| **Open total** | **4** | **8** | **16** | **29** |

Plus **11 auto-fixes applied** (2 high, 4 medium, 5 low).
### Phase-0 scan
`repo-scan.py`: 5 roles, 25 ADRs · broken-adr-ref=4, broken-path-ref=2, marker=14,
open-deferred-item=5, **stale-deferred=0**. Every scan finding is a known false positive
(test fixtures ADR-099/100; the `roles/netbird/` references in the M4b *plan* for unbuilt
work; superpowers planning artifacts; `019-tagging.md:14` is prose about "over-tagging",
not a TODO). Details in the findings JSON.
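
The broken-adr-ref category can be approximated by collecting every `ADR-NNN` token in the docs, subtracting the ADR numbers that actually exist, and ignoring an allowlist of known false positives such as the ADR-099/100 test fixtures. This is a hypothetical sketch, not `repo-scan.py`'s actual logic; it assumes three-digit ADR numbers and docs already loaded into a dict, and all names are illustrative:

```python
import re

# Matches references such as "ADR-024"; assumes ADR numbers are exactly three digits.
ADR_REF = re.compile(r"\bADR-(\d{3})\b")


def broken_adr_refs(docs, existing, allowlist=frozenset()):
    """Return sorted (doc_name, adr_number) pairs that point at a missing ADR.

    docs: mapping of document name -> text.
    existing: set of ADR numbers (ints) that exist as files.
    allowlist: ADR numbers to ignore, e.g. test-fixture IDs.
    """
    broken = set()
    for name, text in docs.items():
        for match in ADR_REF.finditer(text):
            number = int(match.group(1))
            if number not in existing and number not in allowlist:
                broken.add((name, number))
    return sorted(broken)


if __name__ == "__main__":
    docs = {
        "tests/fixtures/sample.md": "Cites ADR-099 and ADR-100.",
        "README.md": "ADR-024 supersedes part of ADR-017; see also ADR-031.",
    }
    # Hypothetical: pretend only ADR-017 and ADR-024 exist.
    print(broken_adr_refs(docs, existing={17, 24}, allowlist={99, 100}))
    # prints: [('README.md', 31)]
```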
### Deferral checklist
All 5 ADR-011 "Open questions" (Proxmox snapshot driver, exact cadences, health-check
harness home, classification home, staging-first) confirmed **genuinely still open** —
ADR-011 is still Proposed/unbuilt, the same questions sit open in `docs/TODO.md` item 16,
and no later ADR or STATUS decides any of them. **No stale-deferred** (same as last run).
## Auto-fixes applied
All safe and obvious (stale text contradicting code or reality, partial enumerations, broken
descriptions); no logic, variable, secret, or task-order changes.

| ID | Sev | File | What |
|---|---|---|---|
| AF1 | high | `roles/reverse_proxy/meta/main.yml` | description still said DNS-01 + custom on-host image → rewrote to vanilla Caddy + HTTP-01 (matches the role since `b7e919d`) |
| AF2 | med | `roles/README.md` | base hardening + docker_host/reverse_proxy/public_dns build-state was stale → reconciled with STATUS |
| AF3 | med | `playbooks/README.md` | stale "docker_host has no tasks" note; added missing `dns.yml` + `offsite.yml` bullets |
| AF4 | low | `roles/public_dns/README.md` | "askari in M4" → askari + `*.askari` records applied in M4a |
| AF5 | low | `scripts/README.md` | added the missing `check-tags.py` entry (run by `make lint`) |
| AF6 | med | `terraform/README.md` | added `modules/hetzner_vm` + `environments/offsite` (the one applied env) |
| AF7 | low | `terraform/environments/offsite/providers.tf` | verified-stamp `cax11@hel1` → `cx23@hel1` (actual server) |
| AF8 | low | `terraform/modules/hetzner_vm/variables.tf` | `server_type` example `cax11 (ARM)` → `cx23 (x86) or cax11 (ARM)` |
| AF9 | med | `inventories/production/group_vars/all/public_dns.yml` | wildcard comment "cert via DNS-01" → ACME HTTP-01 (M4a) |
| AF10 | high | `docs/CAPABILITIES.md` | reverse-proxy candidate `Traefik` → `Caddy (ADR-024)`; public DNS "apply pending" → "applied (M1)" |
| AF11 | low | `README.md` | Documentation ADR list extended ADR-017 → ADR-024 |
## Open findings (prioritised)
### High
- **O1 · drift · STATUS.md:41 (+45-48) ↔ 33-34** *(new)*: docker_host still appears in
the "Scaffolded but empty — NOT implemented" table as a no-op, contradicting its own
"Built + applied" rows and the real tasks file. Reword the scaffold row and the closing
paragraph (left for the operator, since STATUS is the ground-truth doc).
- **O2 · consistency · ADR-004:105,131 ↔ ADR-022** *(recurring)*: ADR-004 says backup is
"not in scope of this repo"; ADR-022 defines a full in-repo backup doctrine. Repoint
ADR-004 at ADR-022 (an ADR↔ADR design decision, so reported rather than auto-fixed).
- **O3 · consistency · ADR-024 Consequences ↔ ADR-008:70/017:27,88/019:52** *(new)*:
ADR-024's Consequences section claims it updated ADR-017's Traefik prose to Caddy; it
didn't, and ADR-008/019 still say Traefik too. Either finish the rename or soften the claim.
- **O4 · conformance · ADR-023:7-8,77-80 ↔ ADR-016/017/018** *(recurring)*: ADR-023
claims ADRs 001–018 were restructured to lead with `## Status`, but 016/017/018 still
open with `## Context` and bury Status. Fix the three ADRs or correct ADR-023 §6.
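
The O4 discrepancy is mechanically checkable. A hedged sketch, assuming each ADR is a Markdown file whose sections are `## ` headings and reading "leads with `## Status`" as "Status is the first level-2 heading"; the helper names are hypothetical and nothing like this is claimed to exist in `scripts/`:

```python
FENCE = "`" * 3  # a Markdown code-fence marker


def first_h2(markdown):
    """Return the text of the first '## ' heading outside code fences, else None."""
    in_fence = False
    for line in markdown.splitlines():
        if line.startswith(FENCE):
            in_fence = not in_fence  # ignore headings that only appear inside fences
            continue
        if not in_fence and line.startswith("## "):
            return line[3:].strip()
    return None


def adrs_not_leading_with_status(adrs):
    """adrs maps an ADR file name to its text; return names whose first H2 isn't Status."""
    return sorted(name for name, text in adrs.items() if first_h2(text) != "Status")


if __name__ == "__main__":
    sample = {
        "016-example.md": "# ADR-016\n\n## Context\n...\n\n## Status\nAccepted\n",
        "001-example.md": "# ADR-001\n\n## Status\nAccepted\n\n## Context\n...\n",
    }
    print(adrs_not_leading_with_status(sample))  # prints: ['016-example.md']
```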
### Medium
- **O5 · ADR-004:48-50** *(recurring)*: the service-role file table omits ACCESS.md and
BACKUP.md rows (now mandated by CLAUDE.md/ADR-021/022).
- **O6 · ADR-002:82** *(recurring)*: `make deploy PLAYBOOK=upgrade` is cited as real, but
no `upgrade.yml` exists and ADR-011 is unbuilt; it needs a `(planned)` caveat.
- **O7 · CAPABILITIES:150-155 ↔ STATUS:29** *(recurring)*: nvim/tmux are listed as a
"confirmed exclusion" while `dev_env` installs them on ubongo. This needs a control-host
carve-out rather than a token swap, so it was left out of AF10.
- **O8 · dev_env tasks (include_tasks + per_user.yml:4-9)** *(recurring)*: the
`set_fact dev_env__home` preflight is untagged and the include has no `apply: tags:`, so a
partial `--tags users|config` run breaks (base guards against this; dev_env doesn't).
- **O9 · inventories/production/hosts.yml** *(recurring)*: the header claims the file is
TF-generated, but it's hand-maintained (it carries ubongo and omits offsite_hosts), and
`tf-inventory` would drop ubongo. Make the header honest.
- **O10 · group_vars/all/vars.yml:42 ↔ ADR-007** *(recurring)*: ubongo's `10.20.10.151`
falls in no ADR-007 subnet and is undocumented, yet `base__firewall_control_addr` depends on it.
- **O11 · terraform tfvars.example (both envs)** *(recurring)*: `pve01` vs ADR-007's
`pve0`; verify the real node name before changing either.
- **O12 · roles/reverse_proxy/** *(new)*: the first built+applied service role, but it lacks
SECURITY/VERIFY/ACCESS/BACKUP.md. (Recorded judgement: public_dns is exempt as a
control-node external-API role, not a host service.)
- **O15 · runbooks/new-host.md Part E** *(recurring)*: still describes an `ansible` user
on ubongo, but STATUS says ubongo is managed as `sjat` (ansible-user bootstrap pending).
- **O18 · ADR-007/009/016 internal-zone name** *(new)*: `boma.baobab.band` and the target
`boma.wingu.me` are used inconsistently across the doc set after M1; state the transition
in one place.
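
For O12, a check along these lines could run over built service roles. It is a sketch that assumes the four docs sit at the top level of the role directory (for example `roles/reverse_proxy/SECURITY.md`); the function names are hypothetical, and `public_dns` is skipped per the recorded exemption:

```python
from pathlib import Path

# The four per-role docs named in O12 (assumed to live at the role's top level).
REQUIRED_DOCS = ("SECURITY.md", "VERIFY.md", "ACCESS.md", "BACKUP.md")
# Roles recorded as exempt (public_dns: control-node external-API role).
EXEMPT_ROLES = frozenset({"public_dns"})


def missing_role_docs(present, required=REQUIRED_DOCS):
    """Return the required doc names absent from `present`, in required order."""
    names = set(present)
    return [doc for doc in required if doc not in names]


def check_role(role_dir):
    """Report missing docs for one role directory; exempt roles report nothing."""
    role = Path(role_dir)
    if role.name in EXEMPT_ROLES:
        return []
    present = [p.name for p in role.iterdir() if p.is_file()]
    return missing_role_docs(present)


if __name__ == "__main__":
    print(missing_role_docs(["README.md", "SECURITY.md"]))
    # prints: ['VERIFY.md', 'ACCESS.md', 'BACKUP.md']
```

Deciding which roles count as "built service roles" is left to the caller; the sketch only checks the files.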
### Low
- **O13** *(recurring)*: See-also vs `## Related` in ADR-012/013/015/016/017/018.
- **O14** *(recurring)*: docs/README + inventories/README have narrow enumerations.
- **O16** *(recurring)*: .zshrc rclone alias + unguarded direnv hook.
- **O17** *(recurring)*: oh_my_posh zen.toml tasks are missing the `config` tag.
- **O19** *(recurring)*: ADR-009:122 still uses a `nyumbani` example after its retirement.
- **O20** *(new)*: ROADMAP M2 says CAX11/ARM vs the actual cx23/x86.
- **O21** *(new)*: ADR-020 "ports will be added in M4" is stale; they were already opened in M4a.
- **O22** *(new)*: ADR-024's body still asserts the custom-image obligation, contradicting its
revised Status.
- **O23** *(new)*: `netbird_coordinator` vs `netbird` role name across ADRs/ROADMAP/plan.
- **O24** *(new)*: `*.boma.<domain>` vs `*.<boma-domain>` wildcard scope, ADR-024 vs ROADMAP.
- **O25** *(new)*: `tags: [verify]` in molecule verify is outside the ADR-019 vocabulary.
- **O26** *(new)*: reverse_proxy templates lack an `ansible_managed` header.
- **O27** *(new)*: reverse_proxy vars live in `group_vars/all/`, not `offsite_hosts/`.
- **O28** *(new)*: capacity-scan.py ignores `offsite.yml`.
- **O29** *(new)*: offsite.yml duplicates empty groups from hosts.yml (undocumented merge).

Full detail + suggested fixes in `2026-06-14-findings.json`.
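
O26 can be caught with a plain text check. A sketch, assuming templates are read as strings and that the header should appear near the top (the five-line window is an arbitrary choice for this example); `ansible_managed` is Ansible's standard variable for this header, commonly written as `{{ ansible_managed | comment }}`, while the function names are hypothetical:

```python
import re

# Matches "{{ ansible_managed }}" and filtered forms like "{{ ansible_managed | comment }}".
MANAGED_MARKER = re.compile(r"\{\{\s*ansible_managed\b")


def has_managed_header(template_text, max_lines=5):
    """True if the ansible_managed marker appears within the first max_lines lines."""
    head = "\n".join(template_text.splitlines()[:max_lines])
    return MANAGED_MARKER.search(head) is not None


def templates_missing_header(templates):
    """templates maps a template path to its text; return sorted paths without the header."""
    return sorted(path for path, text in templates.items() if not has_managed_header(text))


if __name__ == "__main__":
    sample = {
        "with_header.j2": "{{ ansible_managed | comment }}\nexample.org {\n}\n",
        "without_header.j2": 'example.org {\n  respond "ok"\n}\n',
    }
    print(templates_missing_header(sample))  # prints: ['without_header.j2']
```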
## Themes worth a deliberate pass
1. **Finish the Traefik→Caddy rename** (O3, and ADR-024 over-claimed it was done). One
sweep across ADR-008/017/019 closes it.
2. **STATUS docker_host self-contradiction** (O1) — quick, but it's the ground-truth doc.
3. **ADR-024 internal consistency** (O22) — the role went vanilla/HTTP-01 but the ADR
body still mandates the custom image; reconcile §2/§3/Consequences with its own Status.
4. **dev_env tag-isolation** (O8) — the one real conformance bug with runtime impact;
mirror base's `apply: tags:` guard.
5. **First service-role doc quartet** (O12) — reverse_proxy is the template for every
future service role; getting SECURITY/VERIFY/ACCESS/BACKUP.md right now pays forward.
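
The sweep in theme 1 amounts to listing every remaining mention for review. A small sketch, assuming the ADR texts are loaded into a dict keyed by file name (the helper name is hypothetical); it reports 1-based line numbers so hits can be compared with references like ADR-008:70. Some hits, such as a passage recording Traefik as the replaced candidate, may be intentional, so the output is a review list rather than input to a blind search-and-replace.

```python
import re

# Whole-word, case-insensitive match for the old proxy name.
STALE_TERM = re.compile(r"\btraefik\b", re.IGNORECASE)


def find_stale_mentions(docs, pattern=STALE_TERM):
    """Return (doc_name, line_number, line) for every line matching pattern."""
    hits = []
    for name in sorted(docs):
        for lineno, line in enumerate(docs[name].splitlines(), start=1):
            if pattern.search(line):
                hits.append((name, lineno, line.strip()))
    return hits


if __name__ == "__main__":
    sample = {
        "adr-a.md": "Context\nTraefik terminates TLS for *.example\n",
        "adr-b.md": "Caddy terminates TLS.\n",
    }
    for name, lineno, line in find_stale_mentions(sample):
        print(f"{name}:{lineno}: {line}")
    # prints: adr-a.md:2: Traefik terminates TLS for *.example
```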
## Follow-up prompt
> Work the open findings from `docs/reviews/2026-06-14-review.md`. Priority order:
> (1) **O1** — fix the STATUS.md docker_host contradiction (it's built+applied, not a
> no-op; reword the "Scaffolded but empty" row + the 45-48 paragraph).
> (2) **O3 + O22** — finish the Traefik→Caddy rename in ADR-008:70, ADR-017:27,88,
> ADR-019:52, and reconcile ADR-024's body (§2 custom image, §3 NetBird, Consequences)
> with its own revised HTTP-01 Status note.
> (3) **O2 + O5** — repoint ADR-004's "backup not in scope" line at ADR-022 and add
> ACCESS.md + BACKUP.md rows to its service-role file table.
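> (4) **O8**: add `apply: tags: [users, config]` to dev_env's per_user.yml include, put
> the same tags on the include task itself (tags on an include do not reach its tasks,
> and an untagged include is skipped by `--tags`), and tag the `dev_env__home` set_fact
> `always`; add a Molecule assertion that a partial `--tags config` run still resolves
> the home dir.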
> (5) **O12** — author the four service-role doc files for `roles/reverse_proxy/` from the
> templates (BACKUP.md = `backup__state: false`, re-issuable certs).
> (6) **O4** — restructure ADR-016/017/018 to lead with `## Status`, or correct ADR-023 §6.
> Then the medium drift items (O6 upgrade caveat, O7 nvim/tmux carve-out, O9 hosts.yml
> header, O15 new-host Part E, O18 internal-zone naming). Run `make lint` after each
> batch; commit per CLAUDE.md git conventions.