Commit graph

348 commits

Author SHA1 Message Date
32d480efcf docs(spec): note project (boma) vs domain (wingu.me) in the naming scheme
Decided to keep the project named boma with wingu.me as its domain (boma was not
available as a domain). Record why the infra tier reads <host>.boma.wingu.me so it
isn't re-litigated; folds into the ADR-007 amendment.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 09:47:13 +02:00
79f2315eee feat(make): add edit-vault + check-vault targets
`make edit-vault` runs `ansible-vault edit` (decrypt → nvim → re-encrypt on :wq,
abort on :cq) so editing the vault is one step with no plaintext left in the work
tree, then validates structure. `make check-vault` runs scripts/check-vault.py:
decrypts in-memory, asserts valid YAML with secrets under the nested `vault:` map
and no empty leaves, and prints a values-masked structure view (comments visible,
secrets never printed). Both default to the production all-vault; override VAULT=.

Update the vault header comment, CLAUDE.md (command table + Secrets section), and
scripts/README to point at edit-vault (note check-vault.py is the one venv-
dependent helper, by design).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 09:36:15 +02:00
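The structural checks described for scripts/check-vault.py (secrets under a nested `vault:` map, no empty leaves, values masked in the printed view) can be sketched roughly as follows. This is a minimal illustration, not the script itself: it assumes the vault has already been decrypted and parsed into a plain dict, and the function names `empty_leaves`, `masked` and `check` are hypothetical.

```python
from typing import Any


def empty_leaves(node: Any, path: str = "vault") -> list:
    """Return dotted paths of leaves that are None, "" or an empty collection."""
    if isinstance(node, dict):
        if not node:
            return [path]
        found = []
        for key, value in node.items():
            found.extend(empty_leaves(value, f"{path}.{key}"))
        return found
    if node is None or node == "" or node == []:
        return [path]
    return []


def masked(node: Any) -> Any:
    """Copy the structure, replacing every leaf value with a fixed mask."""
    if isinstance(node, dict):
        return {key: masked(value) for key, value in node.items()}
    return "****"


def check(decrypted: dict) -> dict:
    """Validate an already-decrypted vault dict and return its masked view."""
    if not isinstance(decrypted.get("vault"), dict):
        raise ValueError("secrets must live under a top-level 'vault:' map")
    bad = empty_leaves(decrypted["vault"])
    if bad:
        raise ValueError("empty leaves: " + ", ".join(bad))
    return masked(decrypted)


if __name__ == "__main__":
    # Placeholder value only; no real secret is involved in this sketch.
    sample = {"vault": {"gandi": {"pat": "placeholder"}}}
    print(check(sample))
```

Keeping the masking separate from the validation means the structure view can be printed without a secret value ever reaching stdout.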
43e5a4aa53 secrets(vault): add Gandi LiveDNS PAT as vault.gandi.pat
Personal Access Token for wingu.me LiveDNS, used by the M1 public_dns role via
community.general.gandi_livedns. Stored under the nested vault.<service>.<key> map
(CLAUDE.md); the placeholder canary is preserved. Verified the token authenticates
+ is scoped to wingu.me, and that the file round-trips (decrypts to the expected
structure). PAT to be rotated after M1 (transmitted in plaintext during setup).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 09:14:10 +02:00
f7fac5f5e3 docs(spec): M1 — finalize for wingu.me (greenfield), record Gandi-defaults purge
boma's domain is wingu.me (registered at Gandi; 'wingu' = Swahili for cloud).
Replace the parametric <boma-domain> placeholder with wingu.me throughout. The
zone was NOT empty — Gandi auto-seeded 13 default records (parking A, www redirect,
a full Gandi mailbox set), so M1 includes a one-time purge to a clean baseline plus
an anti-spoof null-mail set (null MX, SPF -all, DMARC reject) since wingu.me sends
no mail. Domain-pick open item closed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 09:14:10 +02:00
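For readers unfamiliar with the anti-spoof null-mail set named above (null MX, SPF -all, DMARC reject), the three records can be written out as data. The record values are the standard ones for a domain that sends and receives no mail; the dict shape and the `zone_lines` helper are hypothetical and are not the `public_dns` role's actual group_vars schema.

```python
# Hypothetical record layout for illustration only.
NULL_MAIL_RECORDS = [
    # Null MX (RFC 7505): preference 0, target "." means "accepts no mail".
    {"name": "@", "type": "MX", "values": ["0 ."]},
    # SPF hard fail: no host is authorised to send for this domain.
    {"name": "@", "type": "TXT", "values": ["v=spf1 -all"]},
    # DMARC policy: receivers should reject mail that fails authentication.
    {"name": "_dmarc", "type": "TXT", "values": ["v=DMARC1; p=reject"]},
]


def zone_lines(records):
    """Render records as zone-file style lines (TXT values are quoted)."""
    lines = []
    for record in records:
        for value in record["values"]:
            rendered = f'"{value}"' if record["type"] == "TXT" else value
            lines.append(f'{record["name"]} IN {record["type"]} {rendered}')
    return lines


if __name__ == "__main__":
    for line in zone_lines(NULL_MAIL_RECORDS):
        print(line)
```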
7a47dd9dec docs(spec): M1 — public DNS migration to Gandi (DNS-as-code) design
Settles the M1 design: full registrar transfer Cloudflare -> Gandi; three-tier
naming scheme (host.boma / service.bare / service.askari), nyumbani dropped,
mesh/LAN-only default; public-DNS-as-code via a control-node `public_dns` role
driven by group_vars data, using community.general.gandi_livedns with a PAT
(api_key is deprecated/rejected by Gandi — verified per ADR-014). Stale records +
unused MX cleaned by omission. Cert scope is DNS+PAT only (issuance deferred to
M4/Phase 2). Human/agent division of labour + token-scoping recorded.

Resolves TODO 4 and review finding O12 once the ADR-007 amendment lands. Point
ROADMAP.md M1 at the spec.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 23:17:19 +02:00
be2679cc66 docs(roadmap): record decided DNS naming scheme in M1
Three-tier scheme: <host>.boma.baobab.band (infra, internal) /
<service>.baobab.band (home, split-horizon, mesh/LAN-only default) /
<service>.askari.baobab.band (off-site, public). nyumbani dropped; mesh carries
the baobab.band match-domain to road-warriors; *.baobab.band DNS-01 wildcard
certs via Gandi API. Resolves TODO 4 and review finding O12.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 22:17:28 +02:00
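The three-tier scheme in the commit above maps a tier and a name to a fully qualified name. A minimal sketch, assuming tier labels `infra`, `home` and `offsite` (hypothetical identifiers chosen for this example) and taking the domain as a parameter:

```python
# Name patterns per tier, following the scheme recorded in the commit message.
PATTERNS = {
    "infra": "{name}.boma.{domain}",      # <host>.boma.<domain>, internal
    "home": "{name}.{domain}",            # <service>.<domain>, split-horizon
    "offsite": "{name}.askari.{domain}",  # <service>.askari.<domain>, public
}


def fqdn(tier: str, name: str, domain: str) -> str:
    """Build the DNS name for a host or service in the given tier."""
    if tier not in PATTERNS:
        raise ValueError(f"unknown tier: {tier!r}")
    return PATTERNS[tier].format(name=name, domain=domain)


if __name__ == "__main__":
    print(fqdn("infra", "ubongo", "baobab.band"))
```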
3cfcb1c2e9 docs(roadmap): add ROADMAP.md — remote-access-first build order
High-level build order for the project (Approach A): one Off-site/Remote-access
track first (Gandi DNS-as-code -> askari -> NetBird control plane -> enroll
ubongo + road-warrior laptops -> harden), a procurement gate sized by
/capacity-review, then the Cluster track. Sequences the docs/TODO.md backlog into
milestones and records why the order is what it is.

Decisions captured this session: Gandi over Cloudflare is values-driven and
independent of NetBird (sequenced first so records are born at Gandi); public DNS
managed as code (Ansible, consistent with internal DNS + Terraform-owns-no-DNS);
NetBird-on-ubongo before base default-deny (chicken-and-egg); cluster procurement
gated on patterns proven on two cheap hosts.

Wire ROADMAP.md into CLAUDE.md's Further-reading index and point TODO.md at it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 22:12:38 +02:00
03d33f83dd fix(O1): scaffold docker_host role so make lint passes on main
playbooks/site.yml imports the docker_host role, but it didn't exist, so
ansible-lint's syntax-check failed on a clean checkout — breaking CLAUDE.md's
"main must always work" / "Never skip lint" (top open finding O1 from the
2026-06-11 review).

Scaffold docker_host as a proper placeholder via the prescribed mechanism
(make new-role): filled meta/main.yml + README, an honest no-task tasks/main.yml
documenting planned scope (Docker engine + Compose, daemon hardening, nftables.d
container rules per ADR-004/020), and the standard molecule scenario. This
preserves site.yml's full-standard-state intent rather than dropping the play.

Update STATUS.md (docker_host moves from "Not in git" to "scaffolded, no tasks")
and the role/playbook READMEs to match.

make lint: 0 failures, 0 warnings; check-tags OK.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 14:53:55 +02:00
1da117d65b docs(review): 2026-06-11 repo audit — fix build-wave doc drift
/review-repo run at 67f2aba. Auto-fixed 5 safe doc-drift items left by the
base(firewall)+dev_env build wave: README/playbook/role notes that still called
the roles "empty/not built", plus README tree gaps and the reciprocal ADR-021
cross-links in ADR-016/020.

18 open findings reported (not fixed). Headline: `make lint` is red on `main`
(site.yml imports the non-existent docker_host role) and an ADR-004 <-> ADR-022
backup-scope contradiction. Deferral checklist clean (0 stale-deferred); 7 of
12 prior findings confirmed resolved. See docs/reviews/2026-06-11-review.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 14:48:00 +02:00
67f2aba9d8 STATUS: record dev_env (built+applied) and working deploy path
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 14:21:36 +02:00
aea4f8c3d6 dev_env: install Node.js from pinned tarball, drop npm bloat
Debian's npm package pulls a ~400-package node-* tree (the first deploy
installed 527 packages). Replace apt nodejs+npm with a pinned upstream Node
tarball (v20.19.2) installed to /opt + symlinked, mirroring the nvim install
pattern (ADR-014 pinning). npm/npx come bundled. Molecule verifies node/npm
on PATH; lint + idempotent converge green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 14:21:33 +02:00
6203513220 inventory: manage ubongo (control node) as the operator account
group_vars/all assumes the ansible service user (created by bootstrap on
Terraform VMs). ubongo is the manually-provisioned control node (ADR-009/
ADR-015 exception) with no bootstrapped ansible user, so connect as sjat.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 14:09:15 +02:00
607423d0e7 dev_env: install acl for become_user file copies
When the login user differs from the become_user (ubongo connects as sjat,
the role copies files as claude), Ansible needs ACLs on its temp files;
without the acl package it falls back to an unsupported chmod syntax and
fails. Molecule didn't catch it (root login can chown directly).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 14:09:12 +02:00
a2bb99928c fix(deploy): make check/deploy actually run
Two latent bugs that blocked the documented deploy path (never exercised
end-to-end before applying dev_env to ubongo):
- Makefile: the PLAYBOOK variable was both the ansible-playbook BINARY path
  and the user-supplied playbook NAME, so `make check/deploy PLAYBOOK=<name>`
  overrode the binary. Renamed the binary var to PLAYBOOK_BIN.
- ansible.cfg: stdout_callback=yaml and callbacks_enabled=timer were
  community.general plugins (not installed; boma only ships ansible.posix).
  Use the built-in default callback with callback_result_format=yaml and
  ansible.posix.profile_tasks — same intent, no new heavy collection.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 14:09:12 +02:00
f3f382ae69 Add dev_env role: zsh/tmux/nvim for workstation-class hosts
A new role (separate from base) that gives workstation-class hosts (ubongo
now, mamba later) a clean interactive environment: zsh + oh-my-zsh +
oh-my-posh, tmux + TPM plugins, and neovim. Dotfiles are real files deployed
via GNU stow (not templated); pinned nvim v0.12.2 + oh-my-posh 29.0.1.

Configs re-derived (ADR-013) from AnsibleBaobabV4 + the operator's fisi setup
on boma's terms: no Nerd Font (headless host), no system LSP suite (nvim uses
mason), versions pinned (V4 tracks latest). Applied via playbooks/workstation.yml
to the control group for users sjat + claude. Lint + Molecule (idempotent) green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 13:50:11 +02:00
b9daf2a0ad plan: record ubongo build outcome (done/deferred/follow-ups)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 10:33:18 +02:00
349d10d65c docs: record ubongo physical build (2026-06-11)
Move ubongo to 'Built (partial)' in STATUS; fill real M70q hardware specs
(i3-10100T, 16 GB, 256 GB SanDisk X600 SATA, no disk encryption). Record in
ADR-015 the dedicated claude AI-worker identity, LAN-SSH-only operational
reality, and the no-encryption decision; close the rbw offline-cache
recovery-verification item (ADR-015 + rotate-secrets). Add accepted-risk R5
(control-node disk unencrypted at rest) with its compensating controls.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 10:32:26 +02:00
7b5fd17e55 inventory: add ubongo to control group; set ssh-from-control addr
Wire the now-built physical control node ubongo (10.20.10.151) into the
production control group (the documented manual exception), and activate the
dormant base__firewall_control_addr knob (ADR-021 ssh-from-control source).
Forward-wiring only: no host has the base role applied yet.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 10:32:24 +02:00
7b190e4313 Add ubongo physical-build plan (2026-06-11 session)
Captures the interactive build decisions (no-encryption + accepted risk,
simple partition, dedicated claude identity, LAN-only access, pinned
versions) and the A-F + H task breakdown. Sequel to the 2026-06-05
docs-only ADR-015 plan.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 10:01:41 +02:00
7ebbc113ab Merge feat/adr-structure: ADR-023 structure & lifecycle + back-catalogue conformance
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 15:18:48 +02:00
fa3db421dc docs(kaizen): FRICTION signal — controller must diff-audit subagent restructures
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 15:01:21 +02:00
d0a3307822 docs(adr): fix 007/008 heading nesting; require date in Superseded status
Final-review polish: demote the sub-headings under the demoted 'IP addressing'
(007) and 'Three testing levels'/'What Molecule tests' (008) to #### so they
nest correctly instead of flattening to siblings. Tighten the adr-structure
Superseded pattern to require '(YYYY-MM-DD)' per ADR-023.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 15:00:58 +02:00
0df24909e3 docs(adr): restructure ADRs 016-018 to ADR-023 conformance
Make the existing Status sections parseable (Accepted (date) + the existing
designed-not-built note) and add Consequences sections assembled from each
ADR's already-stated residual risks, trade-offs and build status. No
decision substance changed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 14:51:51 +02:00
40a428975a docs(adr): restructure ADR-003 to ADR-023 conformance
Add Status, a descriptive Context, a Decision umbrella over the existing
topical sections (demoted to ###), and a Consequences section assembled
from the ADR's already-stated rationale. No decision substance changed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 14:50:03 +02:00
6d7d27b03b docs(adr): add Proposed lifecycle state; mark ADR-011 Proposed
Revisits the lifecycle decision on the evidence of ADR-011 (a real draft
with open questions). Adds a fourth state, Proposed (YYYY-MM-DD), to ADR-023,
the template, the adr-structure check (+test), spec and plan. Sets ADR-011's
Status to Proposed and removes its now-redundant inline 'Proposed' line.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 14:48:55 +02:00
b3ca510380 docs(adr): restructure ADRs 010,011,013 to ADR-023 conformance
010/011: relabel Decisions->Decision + add Status/Consequences.
013: add Status + Decision umbrella (existing Consequences untouched).
No decision substance changed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 14:43:41 +02:00
44dbd4628f docs(adr): restructure ADRs 006-009 to ADR-023 conformance
Add dated Status sections, a Decision umbrella over the existing topical
sections (demoted to ###), and Consequences assembled from each ADR's
already-stated implications. No decision substance changed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 14:41:24 +02:00
188882449d docs(adr): restructure ADRs 001,002,004,005,012,014,015 to ADR-023 conformance
Add dated Status sections and (where missing) Consequences sections assembled
from each ADR's already-stated implications. No decision substance changed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 14:39:00 +02:00
9b1502cf7d docs(adr): register ADR-023 and note adr-structure check
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 14:33:55 +02:00
a9aab9d040 docs(adr): ADR-023 — ADR structure & lifecycle
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 14:32:40 +02:00
3c920ae630 docs(adr): sync plan Task 2 with flat-comment template fix
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 14:31:23 +02:00
ab14d65aa1 docs(adr): add adr-template.md scaffold (ADR-023)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 14:30:52 +02:00
89179dd7c9 docs(adr): revise spec+plan — full retroactive restructure of 001-018
Replaces the Status-only backfill with a faithful presentational
restructure bringing the whole back-catalogue to 4-section conformance
(no grandfathering). Adds the faithfulness rule and per-file worklist.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 14:28:20 +02:00
a3ea0f7d80 feat(review): add adr-structure check to repo-scan
Flags numbered ADRs missing a mandatory section (Status/Context/Decision/
Consequences) or with an unparseable Status line. Presence only, not order.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 13:57:42 +02:00
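A presence-only check of the kind described above might look roughly like this. It is a sketch under stated assumptions, not the repo-scan implementation: it assumes the mandatory sections are level-2 Markdown headings and that a parseable Status line has the form `State (YYYY-MM-DD)`; the state list includes Proposed, which a later commit in this log adds.

```python
import re

REQUIRED = ("Status", "Context", "Decision", "Consequences")
STATUS_RE = re.compile(
    r"^(Proposed|Accepted|Superseded|Deprecated) \(\d{4}-\d{2}-\d{2}\)"
)
HEADING_RE = re.compile(r"^## (.+?)\s*$")


def check_adr(text: str) -> list:
    """Return a list of problems; an empty list means the ADR conforms."""
    sections = {}
    current = None
    for line in text.splitlines():
        match = HEADING_RE.match(line)
        if match:
            current = match.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)

    # Presence only: section order is deliberately not checked.
    problems = [f"missing section: {name}" for name in REQUIRED if name not in sections]

    if "Status" in sections:
        body = [line.strip() for line in sections["Status"] if line.strip()]
        if not body or not STATUS_RE.match(body[0]):
            problems.append("unparseable Status line")
    return problems


if __name__ == "__main__":
    draft = "## Status\n\nmaybe\n\n## Context\n\n## Decision\n"
    print(check_adr(draft))
```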
ce3319cbed docs(adr): implementation plan + FRICTION signal for ADR structure
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 13:55:16 +02:00
dfbe37916f docs(adr): design spec for ADR structure & lifecycle (ADR-023)
Codifies the structure ADRs 019-022 converged on, pins an
Accepted/Superseded/Deprecated lifecycle with a no-silent-rewrite rule,
adds an adr-template.md scaffold, and plans a Status-header backfill of
ADRs 001-018. Basis for ADR-023.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 13:45:21 +02:00
4116286ed0 feat(hooks): Stop guard blocking the execution-mode menu
Mechanical fix for the 4×-recurring execution-mode menu ask (kaizen 2026-06-10).
A Stop hook reads the transcript and, if the final assistant message presents the
"subagent-driven vs inline — which approach?" menu, blocks the turn and tells the
model to proceed subagent-driven (boma's standing preference). Fails open,
respects stop_hook_active (no loop), tight match signature (no false positives on
meta-discussion). Pipe-tested across 5 scenarios. Activates next session
(settings watcher only tracks files present at session start).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 12:51:46 +02:00
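The decision logic of such a Stop guard can be sketched as a pure function. This is an illustration only: it assumes the hook payload is a dict carrying `stop_hook_active`, that the final assistant message has already been extracted from the transcript as a string, and that returning a `decision`/`reason` dict is how the block is signalled. The regular expression and the function name `decide` are hypothetical, not the repository's actual match signature.

```python
import re
from typing import Optional

# Tight signature: only the literal menu question triggers the guard.
MENU_RE = re.compile(
    r"subagent-driven\s+vs\.?\s+inline\b.*which approach\?",
    re.IGNORECASE | re.DOTALL,
)


def decide(payload: dict, final_message: str) -> Optional[dict]:
    """Return a block decision, or None to let the turn end normally."""
    try:
        # Avoid a loop: if this stop was already triggered by the hook, allow it.
        if payload.get("stop_hook_active"):
            return None
        if MENU_RE.search(final_message):
            return {
                "decision": "block",
                "reason": "Proceed subagent-driven; do not ask which mode to use.",
            }
    except Exception:
        # Fail open: any unexpected input must never block the session.
        return None
    return None


if __name__ == "__main__":
    print(decide({}, "Subagent-driven vs inline: which approach?"))
```

Keeping the decision separate from transcript parsing makes it easy to pipe-test the guard against canned messages.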
91713127cb docs(kaizen): migrate gotchas to docs; curate FRICTION log (2026-06-10 review)
- New docs/testing/gotchas.md (nft iif/iifname, Molecule ansible_host,
  apply-path coverage blind spot, render-nft-c pattern); pointer from ADR-008.
- claude-code-setup.md gains "Environment gotchas" (hooks-need-restart,
  pre-commit stashes unstaged, rbw sync cache, zsh word-split).
- FRICTION.md restructured into Open signals + a decisions ledger; consumed
  signals archived with where their resolution now lives.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 12:51:39 +02:00
2dbcac11a0 chore(tooling): scope ansible-lint to ansible content; venv PATH in make test
Kaizen 2026-06-10 fixes:
- ansible-lint pre-commit hook now `always_run: false` + a files filter for
  roles/playbooks/inventories YAML, so docs-/config-only commits skip it and no
  longer need `rbw unlock` (root cause was ansible-lint auto-decrypting the
  group_vars vault, not the syntax-check).
- `make test`/`test-all` prepend $(CURDIR)/.venv/bin to PATH so non-activated
  agent runs find ansible-config/ansible-playbook.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 12:51:30 +02:00
9be4366ac3 feat(backup): backup strategy foundation layer (ADR-022)
Plan 1 of the backup & DR strategy: ADR-022, per-service backup__* contract +
BACKUP.md governance (template + checklist gate + new-role runbook step + dormant
/check-backup), and hardware/CAPABILITIES updates. Docs-only; the backup role and
live restore testing are Plans 2-3.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 11:32:36 +02:00
ed6d5463aa docs(backup): final-review fixes — stateless BACKUP.md, dump-step wording, spec sync
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 11:32:06 +02:00
1e85c11ede docs(backup): update hardware ref (ubongo M70q, add fisi) + CAPABILITIES §9 (ADR-022)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-10 11:25:37 +02:00
5f946ac640 feat(backup): add dormant /check-backup verifier (ADR-022)
2026-06-10 11:22:57 +02:00
01e47d0890 docs(backup): add BACKUP.md step to new-role runbook (ADR-022)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 11:21:56 +02:00
81dac4f28b docs(backup): gate BACKUP.md in service checklist (ADR-022)
2026-06-10 11:20:55 +02:00
f3f80443d0 docs(backup): add BACKUP.md template + backup__* contract (ADR-022)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 11:20:01 +02:00
f5c97d1f36 docs(backup): record ADR-022; wire into CLAUDE.md, STATUS, TODO
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-10 11:19:01 +02:00
da116e1d92 docs(friction): log execution-mode ask (4th occurrence)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 11:06:25 +02:00
2041bd3b70 docs(backup): add foundation-layer implementation plan (ADR-022)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 11:05:17 +02:00
eaffd8d900 docs(backup): add backup & DR strategy design (→ ADR-022)
Data-only restic backups, rebuild-from-code recovery (Model A); central
off-cluster pull node (fisi) with 8TB mirror; 3-2-1 via pCloud (rclone)
+ rotated USB air-gap. Per-service backup__* contract + BACKUP.md as a
hard convention. Two-tier restore testing (ubongo container restore-verify
+ semi-annual staging DR rehearsal). One restic password escrowed to
Vaultwarden + paper (restic + vault passwords) for a non-circular
break-glass. Dead-man's-switch alerting via Uptime Kuma.

Resolves TODO 3.8; grounds ADR-011's backup-first assumption.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 11:00:01 +02:00