Commit graph

77 commits

Author SHA1 Message Date
0030b45bbd docs(adr-016): soften the second stale off-site-backup claim (R8 consistency)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-20 11:42:49 +02:00
a483f4e55c fix: address whole-branch review (anchor pin regexp, ADR-016 backup note, verify comment)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-20 11:41:19 +02:00
c09b7fe6a5 docs(security): accept the single-coordinator mesh SPOF (R8) + ADR-016 availability amendment
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-20 11:34:21 +02:00
bc8592616b fix: address final whole-branch review findings
- ADR-023 §4: ADR-015 no-sudo sub-decision now Superseded-by ADR-025 (bidirectional), not just an in-place amendment.
- STATUS: drop the deferred `reset` verb; honest integration_test (molecule not run in this env; applied to ubongo) + verify (forward/DNAT, not wt0); RED->GREEN validated.
- driver: remove unused `import shutil`.
- README: fix the ADR-025 link filename.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 21:52:28 +02:00
d7bd31babb docs(adr/status): integration-testing harness RED→GREEN validated (ADR-025)
The local-VM integration harness RED→GREEN acceptance passed on real hardware
(2026-06-18): a KVM VM on ubongo reproduced the 2026-06-17 nftables/Docker reboot
breakage (RED) and survived with the docker_host container-forward drop-in (GREEN).

ADR-025: Status updated to PASSED; shakedown learnings section added (UEFI boot
required, claude sudo load-bearing); ADR-021 added to Related.
STATUS.md: integration-harness section updated from PENDING to PASSED; ubongo
entry updated to reflect claude NOPASSWD sudo + sjat-ansible NOPASSWD removal;
last-reviewed date updated.
docs/TODO.md: item 2.4 collapsed to one-line pointer per the file's convention.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 21:39:30 +02:00
cc772ff845 docs(adr/security): record claude NOPASSWD sudo model (ADR-015 amend + R7)
The integration-testing shakedown reversed ADR-015's "no local sudo" sub-decision:
the claude AI-worker now has NOPASSWD:ALL sudo on ubongo — without it, virsh,
nft, and journalctl all block during VM diagnosis. Compensating controls:
password-locked account, auditd/Loki attribution, repo-managed revocable drop-in.

ADR-015: dated amendment note in Status + expanded AI-worker identity section.
ADR-021: new §Sudo model (amendment 2026-06-18) — claude=NOPASSWD, sjat=password
required; former sjat NOPASSWD drop-in removed 2026-06-18 (least-privilege cleanup).
accepted-risks.md: R7 added (claude NOPASSWD:ALL on ubongo); last-reviewed updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 21:39:20 +02:00
4732730515 docs: wire ADR-025 into testing/control-host/risks/status/capacity
- ADR-008: add reboot-survivability gap row + ADR-025 pointer to the
  "not tested in Molecule" table
- ADR-015: reconcile "not a hypervisor" with ephemeral KVM test VMs
  (ADR-025); note ~3 GiB test-VM RAM against the 16 GiB sizing
- accepted-risks: add R6 (le-prod-wildcard PAT + transient TXT records)
- CLAUDE.md: add make test-integration[/-clean] to key-commands;
  add ADR-025 + runbook rows to further-reading
- hardware/reference.md: note one ephemeral KVM test VM on ubongo
- STATUS.md: add integration harness entry (built, lint+pytest clean;
  RED/GREEN acceptance PENDING ubongo live pass); TODO 2.4 stays open

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 12:51:22 +02:00
edcc347a95 docs(adr): ADR-025 local VM integration testing
Accepted decision to implement ADR-008 Level 2/3 on ubongo via
libvirt/KVM directly: throwaway VM overlays, stdlib-only driver,
tiered cert fidelity, three safety invariants. Addresses the
2026-06-17 mesh-hardening incident's reboot-survivability gap.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 12:49:52 +02:00
b3468b34e4 docs: record Caddy/Gandi DNS-01 as resolved + proven (was M4a deferral)
ADR-024 Status/Consequences, STATUS.md, ROADMAP M4a, and the FRICTION ledger now
record that the DNS-01 path is built and proven, with the root cause of the M4a
failure (version skew: pre-Bearer libdns/gandi sent the deprecated Apikey header;
plus building on a Hetzner IP). Traefik was reconsidered and rejected again — lego's
Gandi provider has the same PAT-vs-Apikey question, so it would not have helped.

Dated review reports and spec/plan snapshots are left as historical records.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 06:57:55 +02:00
13ae674cc9 chore(kaizen): first /kaizen run — curate 12 friction signals
Dogfood of the new /kaizen command. 11 consumed, 1 kept open.
- SYSTEMATIZE → docs/testing/gotchas.md (apply:{tags} propagation, Molecule
  tag-isolation testing, API/templating render-only gap); CLAUDE.md
  (item['key'] loop convention, TF module required_providers); public_dns
  README (Gandi null-MX workaround).
- CHANGE → extend the Stop hook to also guard the brainstorming spec-review gate
  (verified: blocks the gate, passes meta-discussion).
- SYSTEMATIZE → make new-role scaffolds the access__/backup__ noqa reminder;
  ADR-004 documents the cross-role-naming convention.
- ALREADY-BUILT/ACCEPTED → exec-menu guard verified firing; ADR-023; ADR-024;
  subagent-faithfulness now embodied in the two-stage subagent review.
- KEEP-OPEN → a repo-scan.py check for ADRs that over-claim reconciliation.

Nudge: OVERDUE (13 signals) → ok (1). make lint + 16 friction-scan tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 21:46:23 +02:00
9e0c264658 docs: reconcile lower-severity review findings (O9-O24)
- ADR-007: document ubongo on the legacy V4 net at 10.20.10.151 (transitional,
  outside the planned srv /24 until the LAN is re-cut) (O10); single authoritative
  boma.baobab.band -> boma.wingu.me transition note already added earlier
- terraform tfvars.example + variables.tf (both envs): pve01 -> pve0 and
  <host>.boma.baobab.band per ADR-007 naming (O11)
- ADR-012/013/015/016/017/018: convert "See also:" prose to `## Related` sections
  placed after Consequences, matching ADR-014/019-023 (O13)
- docs/README + inventories/README: list the missing subdirs / offsite_hosts +
  offsite.yml merge behaviour (O14, O29 note)
- ADR-009: drop the retired `nyumbani` example; use vaultwarden.wingu.me split-horizon (O19)
- ROADMAP M2: askari shipped as cx23/x86 (CAX11/ARM out of stock) (O20)
- ADR-020: 80/443/3478 opened in M4a (past tense); coordinator role is M4b (O21)
- netbird -> netbird_coordinator across ROADMAP M4b, the M4b plan, ADR-024 (O23)
- ADR-024: align the M1 DNS-01 wildcard scope wording with ROADMAP (O24)
- capacity-scan.py: read the inventory directory so offsite.yml (askari) is seen (O28)
- tf_to_inventory.py: generated header now warns it overwrites the manual control node (O9)
- tests/tags.yml: proxy concern comment Traefik -> Caddy (missed in the O3 sweep)

O9's existing stub hosts.yml header stays as-is (generator-owned, hook-protected);
the fix lives in the generator for the next regeneration. make lint + pytest (57) green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 19:31:40 +02:00
175777e36a docs: reconcile 2026-06-14 review findings (O1-O7,O18,O22)
- STATUS: docker_host is built+applied, not scaffold-only (O1)
- ADR-004: backup points to ADR-022, not "out of scope"; service-role file
  table gains ACCESS.md + BACKUP.md rows (O2, O5)
- Finish Traefik->Caddy: ADR-008/011/017/019, CAPABILITIES, TODO (O3); scope
  ADR-024's custom-image/NetBird claims to the deferred DNS-01/M4b paths (O22)
- ADR-016/017/018 now lead with ## Status per ADR-023 (O4)
- ADR-002: caveat `PLAYBOOK=upgrade` as planned/unbuilt (O6)
- CAPABILITIES: carve out ubongo's dev_env from the nvim/tmux exclusion (O7)
- ADR-007: one authoritative boma.baobab.band -> boma.wingu.me transition note (O18)
- new-host Part E: note ubongo is managed as sjat, ansible-user bootstrap pending (O15)

O9 (hosts.yml header) left open: the file is generator-owned (hook-protected);
fixing it needs a tf_to_inventory.py change or a tf-inventory run, not a hand-edit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 19:06:33 +02:00
1862b7a828 docs(m4a): HTTP-01 for askari; ADR-024 cert-method-follows-exposure; STATUS/roadmap/friction
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 18:14:38 +02:00
d10f6de84b docs(adr): ADR-024 — Caddy is boma's reverse proxy
Adds ADR-024 pinning Caddy (xcaddy + caddy-dns/gandi) as boma's reverse
proxy, superseding the soft Traefik assumption in the roadmap and ADR-017
prose. Updates CLAUDE.md Further reading table and ROADMAP.md Phase-2 step 5.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 17:28:42 +02:00
3588904528 docs(askari): amend ADR-006/009/020/007/016 for TF-provisioned offsite host; STATUS (apply pending)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 12:09:20 +02:00
3cb6436ad2 docs(adr-007): fix askari FQDN to askari.wingu.me (review nit)
The naming-table amendment left the 'External monitoring' prose saying
askari.baobab.band; askari is greenfield (never on baobab.band), so its FQDN is
askari.wingu.me, off-site tier.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 10:44:21 +02:00
f170ffd936 docs(public_dns): amend ADR-007 to wingu.me/Gandi; resolve TODO 4; STATUS + CAPABILITIES
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 10:38:45 +02:00
1da117d65b docs(review): 2026-06-11 repo audit — fix build-wave doc drift
/review-repo run at 67f2aba. Auto-fixed 5 safe doc-drift items left by the
base(firewall)+dev_env build wave: README/playbook/role notes that still called
the roles "empty/not built", plus README tree gaps and the reciprocal ADR-021
cross-links in ADR-016/020.

18 open findings reported (not fixed). Headline: `make lint` is red on `main`
(site.yml imports the non-existent docker_host role) and an ADR-004 <-> ADR-022
backup-scope contradiction. Deferral checklist clean (0 stale-deferred); 7 of
12 prior findings confirmed resolved. See docs/reviews/2026-06-11-review.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 14:48:00 +02:00
349d10d65c docs: record ubongo physical build (2026-06-11)
Move ubongo to 'Built (partial)' in STATUS; fill real M70q hardware specs
(i3-10100T, 16 GB, 256 GB SanDisk X600 SATA, no disk encryption). Record in
ADR-015 the dedicated claude AI-worker identity, LAN-SSH-only operational
reality, and the no-encryption decision; close the rbw offline-cache
recovery-verification item (ADR-015 + rotate-secrets). Add accepted-risk R5
(control-node disk unencrypted at rest) with its compensating controls.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 10:32:26 +02:00
d0a3307822 docs(adr): fix 007/008 heading nesting; require date in Superseded status
Final-review polish: demote the sub-headings under the demoted 'IP addressing'
(007) and 'Three testing levels'/'What Molecule tests' (008) to #### so they
nest correctly instead of flattening to siblings. Tighten the adr-structure
Superseded pattern to require '(YYYY-MM-DD)' per ADR-023.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 15:00:58 +02:00
0df24909e3 docs(adr): restructure ADRs 016-018 to ADR-023 conformance
Make the existing Status sections parseable (Accepted (date) + the existing
designed-not-built note) and add Consequences sections assembled from each
ADR's already-stated residual risks, trade-offs and build status. No
decision substance changed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 14:51:51 +02:00
40a428975a docs(adr): restructure ADR-003 to ADR-023 conformance
Add Status, a descriptive Context, a Decision umbrella over the existing
topical sections (demoted to ###), and a Consequences section assembled
from the ADR's already-stated rationale. No decision substance changed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 14:50:03 +02:00
6d7d27b03b docs(adr): add Proposed lifecycle state; mark ADR-011 Proposed
Revisits the lifecycle decision on the evidence of ADR-011 (a real draft
with open questions). Adds a fourth state, Proposed (YYYY-MM-DD), to ADR-023,
the template, the adr-structure check (+test), spec and plan. Sets ADR-011's
Status to Proposed and removes its now-redundant inline 'Proposed' line.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 14:48:55 +02:00
b3ca510380 docs(adr): restructure ADRs 010,011,013 to ADR-023 conformance
010/011: relabel Decisions->Decision + add Status/Consequences.
013: add Status + Decision umbrella (existing Consequences untouched).
No decision substance changed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 14:43:41 +02:00
44dbd4628f docs(adr): restructure ADRs 006-009 to ADR-023 conformance
Add dated Status sections, a Decision umbrella over the existing topical
sections (demoted to ###), and Consequences assembled from each ADR's
already-stated implications. No decision substance changed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 14:41:24 +02:00
188882449d docs(adr): restructure ADRs 001,002,004,005,012,014,015 to ADR-023 conformance
Add dated Status sections and (where missing) Consequences sections assembled
from each ADR's already-stated implications. No decision substance changed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 14:39:00 +02:00
a9aab9d040 docs(adr): ADR-023 — ADR structure & lifecycle
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 14:32:40 +02:00
ab14d65aa1 docs(adr): add adr-template.md scaffold (ADR-023)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 14:30:52 +02:00
91713127cb docs(kaizen): migrate gotchas to docs; curate FRICTION log (2026-06-10 review)
- New docs/testing/gotchas.md (nft iif/iifname, Molecule ansible_host,
  apply-path coverage blind spot, render-nft-c pattern); pointer from ADR-008.
- claude-code-setup.md gains "Environment gotchas" (hooks-need-restart,
  pre-commit stashes unstaged, rbw sync cache, zsh word-split).
- FRICTION.md restructured into Open signals + a decisions ledger; consumed
  signals archived with where their resolution now lives.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 12:51:39 +02:00
ed6d5463aa docs(backup): final-review fixes — stateless BACKUP.md, dump-step wording, spec sync
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 11:32:06 +02:00
f5c97d1f36 docs(backup): record ADR-022; wire into CLAUDE.md, STATUS, TODO
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-10 11:19:01 +02:00
f151e99d04 docs(access): correct ADR-021 governance (runbook+gate, not scaffold)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-09 17:52:24 +02:00
f8098c2e15 docs(access): reconcile ADR-016/020 with control-node SSH source (ADR-021)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-09 17:34:57 +02:00
0fe9e45f57 docs(access): add ADR-021 operational-access doctrine 2026-06-09 17:33:46 +02:00
d311f67098 docs(adr): ADR-020 firewall strategy (two-layer + shared catalog) 2026-06-06 15:59:30 +02:00
5aeeb094eb feat(tags): enforce role imports carry their role-name tag
Adds role_tag_problems() to check-tags.py: every role imported in a
play's roles: block must carry its own role name as a tag (extra tags
allowed; templated role names skipped). Wires the check into main() so
make lint catches violations. 6 new unit tests (29 total, all passing).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 15:12:48 +02:00
2e5a1e1e23 fix(tags): exclude molecule scenarios from tag scan; clarify ADR enforcement
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 09:50:14 +02:00
24b5e9361e docs(tags): ADR-019 + CLAUDE.md/TODO/CAPABILITIES (tagging standard)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 09:42:22 +02:00
c6aa45037d ADR-012: track log-storage allocation + SSD wearout (ADR-018) 2026-06-06 07:05:15 +02:00
30c6a93c28 ADR-002: make central-logging + alerting controls concrete (ADR-018) 2026-06-06 07:02:32 +02:00
2894319f01 Add ADR-018 (logging and log integrity)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 07:01:36 +02:00
db76be2a63 review-repo: clear O7-O12 clarity items
- ADR-011: ruled-out row was "digest-pinning stateful" (contradicted Decision 2);
  now "digest-only (no readable tag)" — tag@digest is adopted (O7)
- ADR-003/010: act_runner names ubongo as the runner host, runner VM as a future
  option (O8)
- ADR-008: WireGuard Molecule-exclusion row reframed to NetBird wt0 data plane (O9)
- ADR-011: scheduled_jobs xref points to TODO 8.3, not ADR-010 (O10)
- CAPABILITIES: add /verify-service Level 4 capability row (O11)
- TODO 3.10: rewrite the garbled base-container question (O12)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 19:28:07 +02:00
8e4bf3dd88 ADR-006/014: clear two stale labels
Review O5/O6: ADR-006 mislabeled backend.tf as "Forgejo state backend" (its own
State-backend section chooses local state — Forgejo's API is read-only); ADR-014
called plugin reproducibility open though TODO 10.7 is done.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 18:55:17 +02:00
d8afa94c4b Name and propagate the offsite_hosts inventory group (askari)
Review O4: ADR-016 said askari gets "its own inventory group" but never named it.
Settled as offsite_hosts (off-site, distinct from on-site-but-off-cluster ubongo).
Added to VALID_GROUPS (tf_to_inventory.py), ADR-009 valid groups, ADR-001/ADR-016
host-group enumerations, and CLAUDE.md. Generated hosts.yml picks up the section on
the next make tf-inventory (a manual-exception group like control).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 18:54:54 +02:00
f0d189ca09 Thread the VERIFY.md convention through ADR-004/new-role/README
Review O1-O3: ADR-017's per-service VERIFY.md requirement now appears in the
ADR-004 service-role file table, as a new-role runbook step, and the README
docs index/tree are refreshed (ADRs 010-017, security/testing/hardware dirs).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 18:52:42 +02:00
666ad42634 review-repo: fix DNS-write contradictions + stale control-node/template refs
Auto-fixes from /review-repo:
- ADR-005 + new-host.md: drop "Terraform writes the host's DNS A record"
  (contradicts ADR-009 — dns role owns the zone; recurs from the 2026-05-30 run)
- ADR-005: control node is physical ubongo, not cloned from the template (ADR-015)
- CLAUDE.md: add the VERIFY.md template to Further reading
- TODO.md: typo fixes (we we / seperate)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 18:23:16 +02:00
d5c62c99ad STATUS/ADR-015: mark the three deferred design threads resolved
ubongo, the NetBird mesh, and Level 4 verification are design-resolved
(ADR-015/016/017 + specs + plans); STATUS now says so while keeping build
status honest. Also resolves ADR-015 deferred #2 (browser harness), which
was left open when ADR-017 landed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 18:01:14 +02:00
2df1f98153 ADR-008: expand Level 4 into the verify-service harness (ADR-017) 2026-06-05 13:14:12 +02:00
cc3337502f Add ADR-017 (service-UI acceptance verification, Level 4) 2026-06-05 13:13:09 +02:00
2ae5cf4535 ADR-015: resolve mesh-VPN deferral — NetBird on askari (ADR-016) 2026-06-05 11:48:04 +02:00