Compare commits

...

3 commits

Author SHA1 Message Date
3dd03d4198 review-repo: 2026-06-05 report (4 auto-fixed, 12 open)
Stale-deferred check exercised: 6 open-deferred-items all confirmed genuinely
open, 0 stale-deferred. Top open: thread ADR-017 VERIFY.md convention through
ADR-004/new-role/README; name the askari inventory group (ADR-016).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 18:24:39 +02:00
666ad42634 review-repo: fix DNS-write contradictions + stale control-node/template refs
Auto-fixes from /review-repo:
- ADR-005 + new-host.md: drop "Terraform writes the host's DNS A record"
  (contradicts ADR-009 — dns role owns the zone; recurs from the 2026-05-30 run)
- ADR-005: control node is physical ubongo, not cloned from the template (ADR-015)
- CLAUDE.md: add the VERIFY.md template to Further reading
- TODO.md: typo fixes (we we / seperate)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 18:23:16 +02:00
f566fd17eb review-repo: add stale-deferred check for ADR Deferred entries
repo-scan.py now enumerates open ADR "Deferred/Open" items and flags any that
another file describes as resolved but which isn't marked resolved in place
(the recurring miss in docs/FRICTION.md). review-repo.md's Phase 2 reviewer
confirms each open item against later ADRs/STATUS.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 18:13:49 +02:00
9 changed files with 367 additions and 35 deletions

View file

@ -27,6 +27,11 @@ Run `python3 scripts/repo-scan.py > /tmp/repo-scan.json`. It returns the **inven
(roles, ADRs, runbooks, playbooks, scripts — your shard list) and **exact findings**
(markers, broken refs, unencrypted vaults). Fold these into the report verbatim.
It also emits two deferral checks (see Phase 2): `open-deferred-item` (every still-open
ADR "Deferred/Open" entry — a checklist to confirm) and `stale-deferred` (an entry
another file describes as resolved but which isn't marked resolved in place —
high-confidence, usually auto-fixable by marking the source ADR's entry RESOLVED).
### Phase 1 — fan-out judgement review
Scale to repo size:
- **Small** (≤ ~10 roles, like boma today): a few sub-agents, or one pass per area.
@ -42,6 +47,13 @@ location (file:line), description, suggested_fix, auto_fixable (bool)}`.
- Merge and dedupe all findings (deterministic + reviewer).
- Run **one cross-cutting reviewer** over the full ADR set + `STATUS.md` + `CLAUDE.md`
to catch contradictions that span files (per-shard agents can't see these).
- **Resolve the deferral checklist.** For every `open-deferred-item` from Phase 0,
judge whether it is *genuinely* still open: search later ADRs / `STATUS.md` for a
decision on that subject (a deferred item often resolves silently when a later ADR
lands). If it has been decided, it is a stale-deferred finding — the fix is to mark
that entry RESOLVED in its **source ADR's** Deferred list (the spot the resolving
ADR's own change won't have touched). Treat every `stale-deferred` finding as
high-confidence. This is the recurring miss logged in `docs/FRICTION.md`.
- Diff against the previous run's `docs/reviews/<prev>-findings.json` and tag each
finding **new / recurring / resolved**.
- Prioritise by severity; split into auto-fixable vs report-only.

View file

@ -195,6 +195,7 @@ Single-contributor, trunk-based (no merge requests / approval gates):
| Accepted security risks | `docs/security/accepted-risks.md` |
| Per-service security checklist | `docs/security/service-checklist.md` |
| Per-service security record (template) | `docs/security/service-security-template.md` |
| Per-service verification spec (template) | `docs/testing/service-verify-template.md` |
| Heritage / V4 policy | `docs/decisions/013-heritage-v4.md` |
| Sourcing tech knowledge | `docs/decisions/014-knowledge-sourcing.md` |
| Toolchain choices | `docs/decisions/003-toolchain.md` |

View file

@ -28,20 +28,17 @@
8. Ensure the right things are backed up (incl. database dumps if we land on PBS).
9. Decide: a central database server, or individual database services per app?
10. Should we continue to use the base-container method, or maybe something in the improvements of the methods in boma moods the point?
11. Deliberate tagging strategy.
4. **Split-horizon FQDN** — adopt split-horizon FQDN with or without nyumbani?
5. **Control node**
1. Set up and test the control node while waiting for hardware.
2. Define control-node bootstrapping — a dedicated recipe and playbook?
3. Decide the role of mamba — access/availability vs compute power and ease?
4. Set up rbw on the control node.
3. Set up rbw on the control node.
6. **Updating**
1. Decide pinning vs latest for versions.
2. Decide the update strategy across services & containers vs packages &
builds / GitHub pulls / Flatpaks.
3. Define scheduling of updates and reboots, including post-update testing.
6. **Updating** 2. Decide the update strategy across services & containers vs packages &
builds / GitHub pulls / Flatpaks. 3. Define scheduling of updates and reboots, including post-update testing.
7. **Shell setup**
1. Decide what shell setup matters for the AI's work on the control node.
@ -79,7 +76,7 @@
remaining setup to carry out from this decision?
1. ~~Policy for how we collaborate with references to baobabAnsibleV4 without misusing it.~~ DECIDED — ADR-013.
2. Policy for how we write key documents like ADRs.
3. Further development on how we we collaborate on designing the foundation for the project - seperate from how we implement new containers etc.
3. Further development on how we collaborate on designing the foundation for the project - separate from how we implement new containers etc.
4. ~~How do we make sure agents always use the latest official documentation for the technologies etc. we use?~~ DECIDED — ADR-014 (facts → version-matched docs, cited + stamped; best practices → translated per ADR-013; risk-based triggers; graceful fallback to WebFetch).
5. Always subagent driven?
6. When AI deploys, i.e. runs playbooks etc., should we make a methodology so that it does not have to poll all the time or review all the output. Perhaps something about the MAKE method could provide only the relevant feedback?
@ -99,7 +96,7 @@
2. Keep appending raw signals to `docs/FRICTION.md` (live now) until the
retro consumes them.
12. **Spin-up order** — what is the right order of operations when spinning up
12. **Spin-up / build order** — what is the right order of operations when spinning up
from scratch (OS, DNS, Authentik, Traefik, …)?
13. **Intentions** - Is the current setup clearly identifying intentions throughout? We have the readme files but is that enough? Also, how do we rechallange desisions and how they interact over time. I.e. We have these two services running, but extending one a little bit could make the other redundant so we could remove it. Or an alternative to this services has emerged, and it is actually better.

View file

@ -6,8 +6,8 @@ This document defines the **cloud-init template** that managed VMs are cloned
from, and the **control-node** bootstrapping special case. The per-host
provisioning pipeline — how a VM is created from this template and handed off to
Ansible — is owned by ADR-009. Terraform clones the template defined here; the
template is the base image both for Terraform-managed hosts and for the manually
provisioned control node.
template is the base image for Terraform-managed hosts. The control node (`ubongo`)
is a physical machine installed directly, not cloned from this template (ADR-015).
## Approach: Proxmox cloud-init template
@ -32,10 +32,10 @@ High-level steps:
## VM provisioning (per new host)
Per-host VMs are created by **Terraform**, which clones this template, sets the
cloud-init values (hostname, SSH public key, IP/gateway), and writes the host's
DNS A record. Cloud-init runs at first boot (~3060 seconds), leaving the VM
reachable via SSH with the ansible user's key.
Per-host VMs are created by **Terraform**, which clones this template and sets the
cloud-init values (hostname, SSH public key, IP/gateway). Cloud-init runs at first
boot (~3060 seconds), leaving the VM reachable via SSH with the ansible user's key.
Terraform writes no DNS records — the `dns` role owns the internal zone (ADR-009).
The full create → inventory → configure pipeline, and the Terraform↔Ansible data
contract, are defined in **ADR-009 (provisioning handoff)**. There is no manual

View file

@ -0,0 +1,88 @@
{
"date": "2026-06-05",
"reviewed_commit": "f566fd1",
"fixes_commit": "666ad42",
"mode": "on-demand",
"counts": {
"auto_fixed": 4,
"open": 12,
"scan": {"broken-path-ref": 14, "marker": 35, "open-deferred-item": 6, "stale-deferred": 0}
},
"auto_fixed": [
{"id": "AF1", "dimension": "consistency", "severity": "high",
"location": "docs/decisions/005-bootstrapping.md:36; docs/runbooks/new-host.md:62,71",
"description": "Terraform 'writes the host's DNS A record' contradicts ADR-009 (dns role owns the zone)",
"fix": "removed the DNS-write clause; noted Terraform writes no DNS records",
"tag": "recurring"},
{"id": "AF2", "dimension": "consistency", "severity": "high",
"location": "docs/decisions/005-bootstrapping.md:8",
"description": "control node described as cloned from the cloud-init template; ADR-015 makes ubongo physical",
"fix": "control node is a physical box installed directly, not cloned (ADR-015)",
"tag": "new"},
{"id": "AF3", "dimension": "consistency", "severity": "low",
"location": "CLAUDE.md:197",
"description": "Further reading missing the VERIFY.md template row",
"fix": "added docs/testing/service-verify-template.md row",
"tag": "new"},
{"id": "AF4", "dimension": "cruft", "severity": "low",
"location": "docs/TODO.md:79",
"description": "typos: 'we we', 'seperate'",
"fix": "corrected to 'we' and 'separate'",
"tag": "new"}
],
"open": [
{"id": "O1", "dimension": "consistency", "severity": "medium",
"location": "docs/decisions/004-docker-model.md",
"description": "service-role standard file table lists SECURITY.md but not VERIFY.md (ADR-017/CLAUDE.md:85 mandate it)",
"suggested_fix": "add a VERIFY.md row to ADR-004's file table", "tag": "new"},
{"id": "O2", "dimension": "consistency", "severity": "medium",
"location": "docs/runbooks/new-role.md",
"description": "no step to write VERIFY.md for service roles; STATUS.md:17 'runbooks reconciled' now overstated",
"suggested_fix": "add a VERIFY.md step mirroring the SECURITY.md step", "tag": "new"},
{"id": "O3", "dimension": "cruft", "severity": "low",
"location": "README.md:58-60,94",
"description": "ADR list stops at 001-009; docs/ tree omits security/, testing/, hardware/",
"suggested_fix": "extend ADR list + docs/ subtree", "tag": "new"},
{"id": "O4", "dimension": "consistency", "severity": "medium",
"location": "CLAUDE.md:106; docs/decisions/009-provisioning-handoff.md:78; scripts/tf_to_inventory.py:24",
"description": "ADR-016 says askari gets its own inventory group but none is named; valid-groups set excludes it",
"suggested_fix": "name the group; add to host-groups + ADR-009 valid groups", "tag": "new"},
{"id": "O5", "dimension": "consistency", "severity": "medium",
"location": "docs/decisions/006-terraform.md:78",
"description": "backend.tf labelled 'Forgejo state backend' contradicts ADR-006's own local-state section",
"suggested_fix": "relabel to local state backend (no remote backend)", "tag": "new"},
{"id": "O6", "dimension": "drift", "severity": "medium",
"location": "docs/decisions/014-knowledge-sourcing.md:88",
"description": "plugin reproducibility described as open, but TODO 10.7 is DONE",
"suggested_fix": "update to resolved state; drop the forward-pointer", "tag": "new"},
{"id": "O7", "dimension": "consistency", "severity": "low",
"location": "docs/decisions/011-update-management.md:128",
"description": "ruled-out 'Digest-pinning the stateful tier' contradicts Decision #2 (adopts tag@digest); ADR-011 is draft",
"suggested_fix": "remove/replace the ruled-out row when accepting ADR-011 (TODO 16)", "tag": "new"},
{"id": "O8", "dimension": "consistency", "severity": "low",
"location": "docs/decisions/003-toolchain.md:85; docs/decisions/010-forgejo-ci.md:66",
"description": "'act_runner on control node or a dedicated runner VM' ambiguous vs ADR-015",
"suggested_fix": "name ubongo as runner host; cross-ref ADR-015", "tag": "new"},
{"id": "O9", "dimension": "consistency", "severity": "low",
"location": "docs/decisions/008-testing.md:148",
"description": "WireGuard Molecule-exclusion row framed for retired OPNsense VLAN-99 WireGuard",
"suggested_fix": "reframe to NetBird wt0 data plane (ADR-016)", "tag": "new"},
{"id": "O10", "dimension": "consistency", "severity": "low",
"location": "docs/decisions/011-update-management.md:67",
"description": "cross-refs 'scheduled_jobs plan and ADR-010'; ADR-010 has no such plan (TODO 8.3)",
"suggested_fix": "point to TODO 8.3", "tag": "new"},
{"id": "O11", "dimension": "consistency", "severity": "low",
"location": "docs/CAPABILITIES.md",
"description": "no row for the /verify-service (Level 4) capability decided in ADR-017",
"suggested_fix": "add an Operations row for /verify-service", "tag": "new"},
{"id": "O12", "dimension": "cruft", "severity": "low",
"location": "docs/TODO.md:30",
"description": "item 3.10 is garbled/unfollowable",
"suggested_fix": "rewrite clearly or strike", "tag": "new"}
],
"scan_noise": [
"broken-path-ref x14: illustrative report-name templates (YYYY-MM-DD-<service>.md) and not-yet-created latest.md files; scanner stops at the <placeholder> boundary",
"marker x35: mostly prose references to TODO.md items, not code markers",
"open-deferred-item x6: all confirmed genuinely open (ADR-011 #1-5, ADR-015 #3); 0 stale-deferred"
]
}

View file

@ -0,0 +1,93 @@
# Repo review — 2026-06-05
- **Reviewed commit:** `f566fd1` (scan); auto-fixes landed in `666ad42`
- **Mode:** on-demand (interactive)
- **Scope:** whole repo — 2 roles, 17 ADRs, 4 runbooks, 7 scripts; doc-heavy
- **Prior run:** 2026-05-30 (`de38d1c`) — 7 auto-fixed, 17 open
## Summary
| | high | medium | low | total |
|---|---|---|---|---|
| Auto-fixed | 2 | 0 | 2 | 4 |
| Open (report-only) | 0 | 5 | 7 | 12 |
This review followed a session of heavy documentation work (ADR-015 `ubongo`,
ADR-016 NetBird mesh, ADR-017 Level-4 verification). Most findings are **propagation
gaps** — a new decision landed but an older doc still reflects the prior design.
**New deferral check exercised.** `repo-scan.py` now enumerates open ADR "Deferred/
Open" items and flags any another file calls resolved-but-unmarked. This run: 6
open-deferred-items surfaced, **all confirmed genuinely open** by the cross-cutting
reviewer (ADR-011 #15, ADR-015 #3), **0 stale-deferred**. The check produced no false
resolutions and the judgement layer agreed — working as designed.
## Auto-fixes applied (`666ad42`)
| id | dim | sev | location | fix |
|---|---|---|---|---|
| AF1 | consistency | high | `docs/decisions/005-bootstrapping.md:36`, `docs/runbooks/new-host.md:62,71` | Removed "Terraform writes the host's DNS A record" — contradicts ADR-009 (the `dns` role owns the zone). **Recurring**: the 2026-05-30 run fixed the same contradiction in README/ADR-003; it reappeared in two more files. |
| AF2 | consistency | high | `docs/decisions/005-bootstrapping.md:8` | Control node described as cloned from the cloud-init template; ADR-015 makes `ubongo` a physical box installed directly. Corrected. |
| AF3 | consistency | low | `CLAUDE.md:197` | Added the missing `docs/testing/service-verify-template.md` row to Further reading (parallels the security-template row). |
| AF4 | cruft | low | `docs/TODO.md:79` | Typos: "we we" → "we"; "seperate" → "separate". |
## Open findings (report-only)
### VERIFY.md propagation cluster (ADR-017 not fully threaded through)
| id | sev | location | finding | suggested fix |
|---|---|---|---|---|
| O1 | medium | `docs/decisions/004-docker-model.md` (file table) | The service-role standard lists `SECURITY.md` but not `VERIFY.md`, though ADR-017 + CLAUDE.md:85 now mandate it. | Add a `VERIFY.md` row to ADR-004's file table. |
| O2 | medium | `docs/runbooks/new-role.md` (step 9 → Commit) | No step to write `VERIFY.md` for service roles (only `SECURITY.md`). Makes `STATUS.md:17` ("runbooks current and mutually reconciled") slightly overstated. | Add a "write the per-service verification spec" step mirroring the SECURITY.md step. |
| O3 | low | `README.md:58-60, 94` | ADR list stops at 001009 (010017 absent); the `docs/` tree omits `security/`, `testing/`, `hardware/`. | Extend the ADR list (or point to `docs/decisions/` + CLAUDE.md's table); expand the `docs/` subtree. |
### Design gaps from the recent ADRs
| id | sev | location | finding | suggested fix |
|---|---|---|---|---|
| O4 | medium | `CLAUDE.md:106`, `docs/decisions/009-provisioning-handoff.md:78`, `scripts/tf_to_inventory.py:24` | ADR-016 says "`askari` is Ansible-managed — its own inventory group", but no group is named anywhere; host-groups list + valid-groups set don't include it. | Decide the group name (e.g. `edge_hosts`/`hetzner_hosts`), add to CLAUDE.md host groups + ADR-009 valid groups. (`askari` is manual like the control node, so `tf_to_inventory.py` need not generate it, but the group must be valid.) |
| O5 | medium | `docs/decisions/006-terraform.md:78` | `backend.tf` labelled "Forgejo state backend", contradicting ADR-006's own State-backend section (local state on `ubongo`; Forgejo's API is read-only). | Relabel to "local state backend (no remote backend)". |
| O6 | medium | `docs/decisions/014-knowledge-sourcing.md:88` | Plugin-reproducibility described as open ("tracked in `docs/TODO.md`"), but TODO 10.7 is marked DONE (settings.json declares the plugin set; claude-code-setup.md covers bootstrap). | Update to reflect the resolved state; drop the forward-pointer. |
### Clarity / lower-priority consistency
| id | sev | location | finding | suggested fix |
|---|---|---|---|---|
| O7 | low | `docs/decisions/011-update-management.md:128` | "Digest-pinning the stateful tier" sits in the ruled-out table, but Decision #2 *adopts* `tag@digest` for stateful (TODO 16 confirms). ADR-011 is still **Proposed/draft**. | Remove/replace the ruled-out row when accepting ADR-011 (TODO 16). |
| O8 | low | `docs/decisions/003-toolchain.md:85`, `docs/decisions/010-forgejo-ci.md:66` | "act_runner on the control node **or a dedicated runner VM**" reads ambiguously against ADR-015 (no cluster control VM). Not wrong (a runner VM is a separate option) but worth disambiguating. | Name `ubongo` as the runner host; cross-ref ADR-015; keep "dedicated runner VM" as an explicit future option. |
| O9 | low | `docs/decisions/008-testing.md:148` | The "WireGuard tunnel establishment" Molecule-exclusion row is framed for the retired OPNsense VLAN-99 WireGuard; NetBird still uses WireGuard (`wt0`) as its data plane. | Reframe the row to the NetBird `wt0` data-plane (ADR-016). |
| O10 | low | `docs/decisions/011-update-management.md:67` | Cross-references "the `scheduled_jobs` plan and ADR-010"; ADR-010 is Forgejo CI, not scheduled jobs (that's TODO 8.3, unbuilt). | Point to TODO 8.3 instead. |
| O11 | low | `docs/CAPABILITIES.md` §10 | No row for the `/verify-service` (Level 4) capability though ADR-017 decided it. | Add an Operations row for `/verify-service`. |
| O12 | low | `docs/TODO.md:30` (item 3.10) | Garbled text ("maybe something in the improvements of the methods in boma moods the point?") — unfollowable. | Rewrite the question clearly or strike it. |
### Deterministic-scan noise (not fixed — known limitations)
- **`broken-path-ref` ×14** — all illustrative/future paths: report-name templates
(`docs/testing/reviews/YYYY-MM-DD-<service>.md`) and `latest.md` files not yet
created. The path-ref check stops at the `<placeholder>` boundary, so a templated
path registers as a partial broken ref. *Potential scanner improvement: skip a path
ref immediately followed by a placeholder char or a `YYYY-MM-DD` token.*
- **`marker` ×35** — mostly prose references to `TODO.md` items, not code markers.
Known noise; the regex already excludes `TODO.md`/alternations but not "TODO 8.2"
prose.
- **`open-deferred-item` ×6** — all confirmed genuinely open (see above). `0`
stale-deferred. New check healthy.
## Diff vs prior run (2026-05-30)
- **Recurring:** the Terraform-writes-DNS contradiction (AF1) — fixed in README/ADR-003
last run, reappeared in ADR-005/new-host.md. Signal that this phrasing keeps being
copied; worth a `/review-repo`-time grep for "writes … DNS A record".
- **New:** everything else — the repo gained ADR-010…017 and the `ubongo`/NetBird/
Level-4 work since the prior run, so most findings are fresh propagation gaps.
- **Resolved:** prior-run open items were largely addressed during the intervening
doc work (control-node-as-VM, WireGuard framing, etc., now mostly reconciled).
## Follow-up prompt
> Thread the ADR-017 `VERIFY.md` convention through the remaining docs (O1O3): add a
> `VERIFY.md` row to ADR-004's service-role file table, a VERIFY.md step to
> `new-role.md` (and reconcile STATUS.md:17), and refresh `README.md`'s ADR list +
> `docs/` tree. Then settle the `askari` inventory group name (O4) and propagate it to
> CLAUDE.md host-groups + ADR-009 valid-groups. Finally clear the stale labels O5
> (ADR-006 backend.tf) and O6 (ADR-014 plugin reproducibility = DONE).

View file

@ -1,23 +1,93 @@
# Latest repo review
# Repo review — 2026-06-05
Most recent: **2026-05-30** → full report: `docs/reviews/2026-05-30-review.md`
- **Reviewed commit:** `f566fd1` (scan); auto-fixes landed in `666ad42`
- **Mode:** on-demand (interactive)
- **Scope:** whole repo — 2 roles, 17 ADRs, 4 runbooks, 7 scripts; doc-heavy
- **Prior run:** 2026-05-30 (`de38d1c`) — 7 auto-fixed, 17 open
## Summary
| | high | medium | low | total |
|---|---|---|---|---|
| Auto-fixed | 2 | 3 | 2 | 7 |
| Open | 4 | 4 | 9 | 17 |
| Auto-fixed | 2 | 0 | 2 | 4 |
| Open (report-only) | 0 | 5 | 7 | 12 |
Dominant theme: drift from this session's own changes — residual `.vault_pass`
references after the Vaultwarden/rbw switch, and leftover PR/merge-request language
after going trunk-based.
This review followed a session of heavy documentation work (ADR-015 `ubongo`,
ADR-016 NetBird mesh, ADR-017 Level-4 verification). Most findings are **propagation
gaps** — a new decision landed but an older doc still reflects the prior design.
## Suggested follow-up prompt
**New deferral check exercised.** `repo-scan.py` now enumerates open ADR "Deferred/
Open" items and flags any another file calls resolved-but-unmarked. This run: 6
open-deferred-items surfaced, **all confirmed genuinely open** by the cross-cutting
reviewer (ADR-011 #15, ADR-015 #3), **0 stale-deferred**. The check produced no false
resolutions and the judgement layer agreed — working as designed.
> Remediate the boma 2026-05-30 review (`docs/reviews/2026-05-30-review.md`):
> 1. Purge the residual `.vault_pass` references R1R5 → the rbw/Vaultwarden flow.
> 2. Decide the workflow model R6R7 — I lean "keep deploy approval gates, drop the
> PR/merge-request framing"; reconcile ADR-003/008 and CLAUDE.md to match.
> 3. Resolve R8 — scaffold `base`/`docker_host` via `make new-role`, or correct
> STATUS.md/roles/README.md to say the roles don't exist yet.
> 4. Fix the Terraform `vlan_tag` wiring (R9).
> Report on the rest.
## Auto-fixes applied (`666ad42`)
| id | dim | sev | location | fix |
|---|---|---|---|---|
| AF1 | consistency | high | `docs/decisions/005-bootstrapping.md:36`, `docs/runbooks/new-host.md:62,71` | Removed "Terraform writes the host's DNS A record" — contradicts ADR-009 (the `dns` role owns the zone). **Recurring**: the 2026-05-30 run fixed the same contradiction in README/ADR-003; it reappeared in two more files. |
| AF2 | consistency | high | `docs/decisions/005-bootstrapping.md:8` | Control node described as cloned from the cloud-init template; ADR-015 makes `ubongo` a physical box installed directly. Corrected. |
| AF3 | consistency | low | `CLAUDE.md:197` | Added the missing `docs/testing/service-verify-template.md` row to Further reading (parallels the security-template row). |
| AF4 | cruft | low | `docs/TODO.md:79` | Typos: "we we" → "we"; "seperate" → "separate". |
## Open findings (report-only)
### VERIFY.md propagation cluster (ADR-017 not fully threaded through)
| id | sev | location | finding | suggested fix |
|---|---|---|---|---|
| O1 | medium | `docs/decisions/004-docker-model.md` (file table) | The service-role standard lists `SECURITY.md` but not `VERIFY.md`, though ADR-017 + CLAUDE.md:85 now mandate it. | Add a `VERIFY.md` row to ADR-004's file table. |
| O2 | medium | `docs/runbooks/new-role.md` (step 9 → Commit) | No step to write `VERIFY.md` for service roles (only `SECURITY.md`). Makes `STATUS.md:17` ("runbooks current and mutually reconciled") slightly overstated. | Add a "write the per-service verification spec" step mirroring the SECURITY.md step. |
| O3 | low | `README.md:58-60, 94` | ADR list stops at 001009 (010017 absent); the `docs/` tree omits `security/`, `testing/`, `hardware/`. | Extend the ADR list (or point to `docs/decisions/` + CLAUDE.md's table); expand the `docs/` subtree. |
### Design gaps from the recent ADRs
| id | sev | location | finding | suggested fix |
|---|---|---|---|---|
| O4 | medium | `CLAUDE.md:106`, `docs/decisions/009-provisioning-handoff.md:78`, `scripts/tf_to_inventory.py:24` | ADR-016 says "`askari` is Ansible-managed — its own inventory group", but no group is named anywhere; host-groups list + valid-groups set don't include it. | Decide the group name (e.g. `edge_hosts`/`hetzner_hosts`), add to CLAUDE.md host groups + ADR-009 valid groups. (`askari` is manual like the control node, so `tf_to_inventory.py` need not generate it, but the group must be valid.) |
| O5 | medium | `docs/decisions/006-terraform.md:78` | `backend.tf` labelled "Forgejo state backend", contradicting ADR-006's own State-backend section (local state on `ubongo`; Forgejo's API is read-only). | Relabel to "local state backend (no remote backend)". |
| O6 | medium | `docs/decisions/014-knowledge-sourcing.md:88` | Plugin-reproducibility described as open ("tracked in `docs/TODO.md`"), but TODO 10.7 is marked DONE (settings.json declares the plugin set; claude-code-setup.md covers bootstrap). | Update to reflect the resolved state; drop the forward-pointer. |
### Clarity / lower-priority consistency
| id | sev | location | finding | suggested fix |
|---|---|---|---|---|
| O7 | low | `docs/decisions/011-update-management.md:128` | "Digest-pinning the stateful tier" sits in the ruled-out table, but Decision #2 *adopts* `tag@digest` for stateful (TODO 16 confirms). ADR-011 is still **Proposed/draft**. | Remove/replace the ruled-out row when accepting ADR-011 (TODO 16). |
| O8 | low | `docs/decisions/003-toolchain.md:85`, `docs/decisions/010-forgejo-ci.md:66` | "act_runner on the control node **or a dedicated runner VM**" reads ambiguously against ADR-015 (no cluster control VM). Not wrong (a runner VM is a separate option) but worth disambiguating. | Name `ubongo` as the runner host; cross-ref ADR-015; keep "dedicated runner VM" as an explicit future option. |
| O9 | low | `docs/decisions/008-testing.md:148` | The "WireGuard tunnel establishment" Molecule-exclusion row is framed for the retired OPNsense VLAN-99 WireGuard; NetBird still uses WireGuard (`wt0`) as its data plane. | Reframe the row to the NetBird `wt0` data-plane (ADR-016). |
| O10 | low | `docs/decisions/011-update-management.md:67` | Cross-references "the `scheduled_jobs` plan and ADR-010"; ADR-010 is Forgejo CI, not scheduled jobs (that's TODO 8.3, unbuilt). | Point to TODO 8.3 instead. |
| O11 | low | `docs/CAPABILITIES.md` §10 | No row for the `/verify-service` (Level 4) capability though ADR-017 decided it. | Add an Operations row for `/verify-service`. |
| O12 | low | `docs/TODO.md:30` (item 3.10) | Garbled text ("maybe something in the improvements of the methods in boma moods the point?") — unfollowable. | Rewrite the question clearly or strike it. |
### Deterministic-scan noise (not fixed — known limitations)
- **`broken-path-ref` ×14** — all illustrative/future paths: report-name templates
(`docs/testing/reviews/YYYY-MM-DD-<service>.md`) and `latest.md` files not yet
created. The path-ref check stops at the `<placeholder>` boundary, so a templated
path registers as a partial broken ref. *Potential scanner improvement: skip a path
ref immediately followed by a placeholder char or a `YYYY-MM-DD` token.*
- **`marker` ×35** — mostly prose references to `TODO.md` items, not code markers.
Known noise; the regex already excludes `TODO.md`/alternations but not "TODO 8.2"
prose.
- **`open-deferred-item` ×6** — all confirmed genuinely open (see above). `0`
stale-deferred. New check healthy.
## Diff vs prior run (2026-05-30)
- **Recurring:** the Terraform-writes-DNS contradiction (AF1) — fixed in README/ADR-003
last run, reappeared in ADR-005/new-host.md. Signal that this phrasing keeps being
copied; worth a `/review-repo`-time grep for "writes … DNS A record".
- **New:** everything else — the repo gained ADR-010…017 and the `ubongo`/NetBird/
Level-4 work since the prior run, so most findings are fresh propagation gaps.
- **Resolved:** prior-run open items were largely addressed during the intervening
doc work (control-node-as-VM, WireGuard framing, etc., now mostly reconciled).
## Follow-up prompt
> Thread the ADR-017 `VERIFY.md` convention through the remaining docs (O1O3): add a
> `VERIFY.md` row to ADR-004's service-role file table, a VERIFY.md step to
> `new-role.md` (and reconcile STATUS.md:17), and refresh `README.md`'s ADR list +
> `docs/` tree. Then settle the `askari` inventory group name (O4) and propagate it to
> CLAUDE.md host-groups + ADR-009 valid-groups. Finally clear the stale labels O5
> (ADR-006 backend.tf) and O6 (ADR-014 plugin reproducibility = DONE).

View file

@ -58,9 +58,9 @@ locals {
}
```
Terraform clones the cloud-init template from Part A, sets the cloud-init values
(hostname, SSH key, IP/gateway), and writes the host's DNS A record. See ADR-009
for the full handoff and the `vms` output → inventory data contract.
Terraform clones the cloud-init template from Part A and sets the cloud-init values
(hostname, SSH key, IP/gateway). It writes no DNS records — the `dns` role owns the
internal zone. See ADR-009 for the full handoff and the `vms` output → inventory data contract.
---
@ -68,7 +68,7 @@ for the full handoff and the `vms` output → inventory data contract.
```bash
make tf-plan TF_ENV=production # review — confirm only the new VM is added
make tf-apply TF_ENV=production # create the VM + write its DNS A record
make tf-apply TF_ENV=production # create the VM (no DNS records written)
make tf-inventory TF_ENV=production # regenerate inventories/production/hosts.yml
```

View file

@ -31,6 +31,67 @@ ADR_REF_RE = re.compile(r"\bADR-(\d{3})\b")
PATH_REF_RE = re.compile(r"(?:docs|scripts|roles|inventories|terraform|playbooks)/[\w./-]+")
PLACEHOLDER = set("<>*${}")
# Stale-deferred detection: ADR "Deferred/Open" entries that another file describes
# as resolved, but which aren't marked resolved in place. (See docs/FRICTION.md.)
RESOLVE_MARK_RE = re.compile(r"\b(?:RESOLVED|DECIDED)\b", re.I)
LIST_ITEM_RE = re.compile(r"^\s*(\d+\.|[-*+])\s+(.*)")
# An external "this resolves ADR-NNN deferred #K" style reference.
DEFER_REF_RE = re.compile(r"ADR-(\d{3})\D{0,40}?deferred\D{0,12}?(\d+)", re.I)
RESOLVE_WORD_RE = re.compile(r"\b(?:resolv\w*|decid\w*|address\w*|complet\w*|done)\b", re.I)
def _is_defer_heading(text):
t = text.strip().lower()
return (t.startswith("deferred") or t.startswith("unresolved")
or "open question" in t or "open issue" in t)
def _defer_subject(item_text):
m = re.search(r"\*\*(.+?)\*\*", item_text)
s = m.group(1) if m else re.split(r"\s+[—–-]\s+|:", item_text, maxsplit=1)[0]
return re.sub(r"\s+", " ", s).strip(" *_`~—–-:.")
def deferred_findings(adr_files, defer_refs):
"""adr_files: {rel_path: [lines]} for docs/decisions/*.md.
defer_refs: [(adr, ordinal, path, line, has_resolve_word)] gathered repo-wide.
Emits one informational `open-deferred-item` per open entry, and a `stale-deferred`
contradiction when another file describes that entry as resolved."""
out = []
for rpath, lines in sorted(adr_files.items()):
madr = re.match(r"(\d{3})-", os.path.basename(rpath))
adr_num = madr.group(1) if madr else None
in_defer = False
for i, raw in enumerate(lines, 1):
hm = re.match(r"#{1,6}\s+(.*)", raw)
if hm:
in_defer = _is_defer_heading(hm.group(1))
continue
if not in_defer:
continue
im = LIST_ITEM_RE.match(raw)
if not im:
continue
marker, item_text = im.group(1), im.group(2)
# self-marked resolved (inline RESOLVED/DECIDED or ~~strikethrough~~) → fine
if RESOLVE_MARK_RE.search(raw) or item_text.lstrip().startswith("~~"):
continue
ordinal = int(marker[:-1]) if marker[:-1].isdigit() else None
subject = _defer_subject(item_text)
tag = f" #{ordinal}" if ordinal else ""
out.append({"check": "open-deferred-item", "severity": "low", "path": rpath,
"line": i, "detail": f"open deferred item{tag} in ADR-{adr_num}: "
f"'{subject[:80]}' — confirm not resolved by a later ADR/STATUS"})
if adr_num and ordinal:
for ra, rk, rp, rl, has_res in defer_refs:
if ra == adr_num and rk == ordinal and rp != rpath and has_res:
out.append({"check": "stale-deferred", "severity": "medium",
"path": rpath, "line": i,
"detail": f"ADR-{adr_num} deferred #{ordinal} "
f"('{subject[:60]}') is described as resolved at "
f"{rp}:{rl}, but is not marked RESOLVED in place"})
return out
def walk_files():
for dirpath, dirnames, filenames in os.walk(ROOT):
@ -81,6 +142,9 @@ def adr_numbers():
def scan():
findings = []
adrs = adr_numbers()
adr_files = {} # docs/decisions/*.md → lines, for deferred-section parsing
defer_refs = [] # repo-wide "resolves ADR-NNN deferred #K" references
decisions_dir = os.path.join("docs", "decisions")
for path in walk_files():
rpath = rel(path)
if rpath.startswith(SKIP_PREFIX):
@ -108,7 +172,13 @@ def scan():
except OSError:
continue
if rpath.startswith(decisions_dir) and rpath.endswith(".md"):
adr_files[rpath] = lines
for i, line in enumerate(lines, 1):
for m in DEFER_REF_RE.finditer(line):
defer_refs.append((m.group(1), int(m.group(2)), rpath, i,
bool(RESOLVE_WORD_RE.search(line))))
markers = sorted(set(m.group(1) for m in MARKER_RE.finditer(line)))
if markers:
findings.append({"check": "marker", "severity": "low", "path": rpath,
@ -131,6 +201,7 @@ def scan():
if not os.path.exists(os.path.join(ROOT, ref)):
findings.append({"check": "broken-path-ref", "severity": "medium", "path": rpath,
"line": i, "detail": f"references '{ref}' which does not exist"})
findings.extend(deferred_findings(adr_files, defer_refs))
return findings