Repo review — 2026-06-14
- Reviewed commit: `e346137` (docs(plan): M4b — NetBird coordinator service role)
- Mode: on-demand (interactive; auto-fixes applied + committed)
- Previous run: 2026-06-11 (`67f2aba`)
- `make lint`: green before and after fixes (260 files, profile production; check-tags OK).
Summary
A lot shipped since the last review (M4a: docker_host Docker engine, reverse_proxy
Caddy applied to askari; offsite Terraform env live; ADR-024). Most findings this run are
the predictable docs-lagging-the-build kind — stale "not built yet" notes, a
reverse-proxy that switched from DNS-01/custom-image to vanilla HTTP-01 leaving stale
descriptions behind, and the Traefik→Caddy rename only half-propagated through the
ADR set. The previous run's blocker (O1, make lint RED) is resolved.
Counts
| Dimension | High | Medium | Low | Total |
|---|---|---|---|---|
| Cruft / staleness | 0 | 0 | 0 | 0 |
| Design conformance | 1 | 2 | 2 | 5 |
| Consistency & intent | 2 | 2 | 9 | 13 |
| Docs-vs-reality drift | 1 | 4 | 5 | 10 |
| Open total | 4 | 8 | 16 | 29 |
Plus 11 auto-fixes applied (2 high, 4 medium, 5 low, per the severity column of the auto-fix table).
Phase-0 scan
repo-scan.py: 5 roles, 25 ADRs · broken-adr-ref=4, broken-path-ref=2, marker=14,
open-deferred-item=5, stale-deferred=0. Every scan finding is a known false-positive
(test fixtures ADR-099/100; the roles/netbird/ references in the M4b plan for unbuilt
work; superpowers planning artifacts; 019-tagging.md:14 is prose about "over-tagging",
not a TODO). Details in the findings JSON.
Deferral checklist
All 5 ADR-011 "Open questions" (Proxmox snapshot driver, exact cadences, health-check
harness home, classification home, staging-first) confirmed genuinely still open —
ADR-011 is still Proposed/unbuilt, the same questions sit open in docs/TODO.md item 16,
and no later ADR or STATUS decides any of them. No stale-deferred (same as last run).
Auto-fixes applied
All safe/obvious (stale text contradicting code/reality, partial enumerations, broken descriptions) — no logic, variable, secret, or task-order changes.
| ID | Sev | File | What |
|---|---|---|---|
| AF1 | high | roles/reverse_proxy/meta/main.yml | description still said DNS-01 + custom on-host image → rewrote to vanilla Caddy + HTTP-01 (matches the role since b7e919d) |
| AF2 | med | roles/README.md | base hardening + docker_host/reverse_proxy/public_dns build-state was stale → reconciled with STATUS |
| AF3 | med | playbooks/README.md | stale "docker_host has no tasks" note; added missing dns.yml + offsite.yml bullets |
| AF4 | low | roles/public_dns/README.md | "askari in M4" → askari + *.askari records applied in M4a |
| AF5 | low | scripts/README.md | added the missing check-tags.py entry (run by make lint) |
| AF6 | med | terraform/README.md | added modules/hetzner_vm + environments/offsite (the one applied env) |
| AF7 | low | terraform/environments/offsite/providers.tf | verified-stamp cax11@hel1 → cx23@hel1 (actual server) |
| AF8 | low | terraform/modules/hetzner_vm/variables.tf | server_type example cax11 (ARM) → cx23 (x86) or cax11 (ARM) |
| AF9 | med | inventories/production/group_vars/all/public_dns.yml | wildcard comment "cert via DNS-01" → ACME HTTP-01 (M4a) |
| AF10 | high | docs/CAPABILITIES.md | reverse-proxy candidate Traefik → Caddy (ADR-024); public DNS "apply pending" → "applied (M1)" |
| AF11 | low | README.md | Documentation ADR list extended ADR-017 → ADR-024 |
Open findings (prioritised)
High
- O1 — drift — STATUS.md:41 (+45-48) ↔ 33-34 (new): docker_host still appears in the "Scaffolded but empty — NOT implemented" table as a no-op, contradicting its own "Built + applied" rows and the real tasks file. Reword the scaffold row + closing paragraph (left for the operator — STATUS is the ground-truth doc).
- O2 — consistency — ADR-004:105,131 ↔ ADR-022 (recurring): ADR-004 says backup is "not in scope of this repo"; ADR-022 defines a full in-repo backup doctrine. Repoint ADR-004 at ADR-022 (ADR↔ADR design decision — report).
- O3 — consistency — ADR-024 Consequences ↔ ADR-008:70/017:27,88/019:52 (new): ADR-024 claims it updated ADR-017's Traefik prose to Caddy; it didn't, and ADR-008/019 still say Traefik too. Either finish the rename or soften ADR-024's claim.
- O4 — conformance — ADR-023:7-8,77-80 ↔ ADR-016/017/018 (recurring): ADR-023 claims ADRs 001–018 were restructured to lead with `## Status`, but 016/017/018 still open with `## Context` and bury Status. Fix the three ADRs or correct ADR-023 §6.
Medium
- O5 — ADR-004:48-50 (recurring): service-role file table omits ACCESS.md + BACKUP.md rows (now mandated by CLAUDE.md/ADR-021/022).
- O6 — ADR-002:82 (recurring): `make deploy PLAYBOOK=upgrade` cited as real, but no `upgrade.yml` exists and ADR-011 is unbuilt — needs a `(planned)` caveat.
- O7 — CAPABILITIES:150-155 ↔ STATUS:29 (recurring): nvim/tmux listed as a "confirmed exclusion" while `dev_env` installs them on ubongo; needs a control-host carve-out (not a token swap, so left from AF10).
- O8 — dev_env tasks (include_tasks + per_user.yml:4-9) (recurring): untagged `set_fact dev_env__home` preflight + include without `apply: tags:`; a partial `--tags users|config` run breaks (base guards this; dev_env doesn't).
- O9 — inventories/production/hosts.yml (recurring): header claims TF-generated but it's hand-maintained (carries ubongo, omits offsite_hosts); `tf-inventory` would drop ubongo. Make the header honest.
- O10 — group_vars/all/vars.yml:42 ↔ ADR-007 (recurring): ubongo `10.20.10.151` is in no ADR-007 subnet and undocumented; `base__firewall_control_addr` depends on it.
- O11 — terraform tfvars.example (both envs) (recurring): `pve01` vs ADR-007's `pve0`; verify the real node name before changing.
- O12 — roles/reverse_proxy/ (new): first built+applied service role, but missing SECURITY/VERIFY/ACCESS/BACKUP.md. (Recorded judgement: public_dns is exempt — control-node external-API role, not a host service.)
- O15 — runbooks/new-host.md Part E (recurring): still describes an `ansible` user on ubongo; STATUS says ubongo is managed as `sjat` (ansible-user bootstrap pending).
- O18 — ADR-007/009/016 internal-zone name (new): `boma.baobab.band` vs target `boma.wingu.me` used inconsistently across the doc set after M1; state the transition in one place.
Low
O13 (See-also vs ## Related in ADR-012/013/015/016/017/018 — recurring), O14
(docs/README + inventories/README narrow enumerations — recurring), O16 (.zshrc rclone
alias + unguarded direnv hook — recurring), O17 (oh_my_posh zen.toml tasks missing
config tag — recurring), O19 (ADR-009:122 nyumbani example after retirement —
recurring), O20 (ROADMAP M2 CAX11/ARM vs cx23/x86 — new), O21 (ADR-020 "ports will be
added in M4" stale; already opened in M4a — new), O22 (ADR-024 body still asserts
custom-image obligation contradicting its revised Status — new), O23 (netbird_coordinator vs
netbird role name across ADRs/ROADMAP/plan — new), O24 (*.boma.<domain> vs
*.<boma-domain> wildcard scope ADR-024 vs ROADMAP — new), O25 (tags: [verify] out of
the ADR-019 vocabulary in molecule verify — new), O26 (reverse_proxy templates lack
ansible_managed header — new), O27 (reverse_proxy vars in group_vars/all/ not
offsite_hosts/ — new), O28 (capacity-scan.py ignores offsite.yml — new), O29
(offsite.yml duplicates empty groups from hosts.yml, undocumented merge — new).
Full detail + suggested fixes in 2026-06-14-findings.json.
Themes worth a deliberate pass
- Finish the Traefik→Caddy rename (O3, and ADR-024 over-claimed it was done). One sweep across ADR-008/017/019 closes it.
- STATUS docker_host self-contradiction (O1) — quick, but it's the ground-truth doc.
- ADR-024 internal consistency (O22) — the role went vanilla/HTTP-01 but the ADR body still mandates the custom image; reconcile §2/§3/Consequences with its own Status.
- dev_env tag-isolation (O8) — the one real conformance bug with runtime impact; mirror base's `apply: tags:` guard.
- First service-role doc quartet (O12) — reverse_proxy is the template for every future service role; getting SECURITY/VERIFY/ACCESS/BACKUP.md right now pays forward.
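
For the dev_env tag-isolation theme (O8), the guard could look roughly like the sketch below. It is a minimal illustration, not the role's actual code: the file layout, the `dev_env__users` list, the `dev_env__user` loop variable, and the way `dev_env__home` is derived are all assumptions made for the example. Only `dev_env__home`, `per_user.yml`, and the `users`/`config` tags come from the finding itself.

```yaml
# roles/dev_env/tasks/main.yml (hypothetical layout, for illustration only)
- name: Run per-user setup
  ansible.builtin.include_tasks:
    file: per_user.yml
    apply:
      # Tags listed here are inherited by every task inside per_user.yml.
      tags: [users, config]
  loop: "{{ dev_env__users }}"    # assumed variable name
  loop_control:
    loop_var: dev_env__user       # assumed loop variable
  # The include itself needs the same tags; otherwise a run with
  # `--tags config` skips the include and nothing inside it is loaded.
  tags: [users, config]

# roles/dev_env/tasks/per_user.yml (hypothetical excerpt)
- name: Resolve the user's home directory
  ansible.builtin.set_fact:
    dev_env__home: "/home/{{ dev_env__user }}"   # assumed derivation
  # `always` keeps this preflight in every tag-filtered run.
  tags: [always]
```

The two tag placements do different jobs: `apply: tags:` propagates tags to the included tasks, while `tags:` on the include task decides whether the include runs at all under a partial `--tags` selection. Tagging the preflight `always` is what lets later tasks rely on `dev_env__home` regardless of which tag was requested.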
Follow-up prompt
Work the open findings from `docs/reviews/2026-06-14-review.md`. Priority order: (1) O1 — fix the STATUS.md docker_host contradiction (it's built+applied, not a no-op; reword the "Scaffolded but empty" row + the 45-48 paragraph). (2) O3 + O22 — finish the Traefik→Caddy rename in ADR-008:70, ADR-017:27,88, ADR-019:52, and reconcile ADR-024's body (§2 custom image, §3 NetBird, Consequences) with its own revised HTTP-01 Status note. (3) O2 + O5 — repoint ADR-004's "backup not in scope" line at ADR-022 and add ACCESS.md + BACKUP.md rows to its service-role file table. (4) O8 — add `apply: tags: [users, config]` to dev_env's per_user.yml include and tag the `dev_env__home` `set_fact` `always`; add a Molecule assertion that a partial `--tags config` run still resolves the home dir. (5) O12 — author the four service-role doc files for `roles/reverse_proxy/` from the templates (BACKUP.md = `backup__state: false`, re-issuable certs). (6) O4 — restructure ADR-016/017/018 to lead with `## Status`, or correct ADR-023 §6. Then the medium drift items (O6 upgrade caveat, O7 nvim/tmux carve-out, O9 hosts.yml header, O15 new-host Part E, O18 internal-zone naming). Run `make lint` after each batch; commit per CLAUDE.md git conventions.