Settles the M1 design: full registrar transfer Cloudflare -> Gandi; three-tier
naming scheme (host.boma / service.bare / service.askari), nyumbani dropped,
mesh/LAN-only default; public-DNS-as-code via a control-node `public_dns` role
driven by group_vars data, using community.general.gandi_livedns with a PAT
(api_key is deprecated/rejected by Gandi — verified per ADR-014). Stale records +
unused MX cleaned by omission. Cert scope is DNS+PAT only (issuance deferred to
M4/Phase 2). Human/agent division of labour + token-scoping recorded.
Resolves TODO 4 and review finding O12 once the ADR-007 amendment lands. Point
ROADMAP.md M1 at the spec.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
High-level build order for the project (Approach A): one Off-site/Remote-access
track first (Gandi DNS-as-code -> askari -> NetBird control plane -> enroll
ubongo + road-warrior laptops -> harden), a procurement gate sized by
/capacity-review, then the Cluster track. Sequences the docs/TODO.md backlog into
milestones and records why the order is what it is.
Decisions captured this session: Gandi over Cloudflare is values-driven and
independent of NetBird (sequenced first so records are born at Gandi); public DNS
managed as code (Ansible, consistent with internal DNS + Terraform-owns-no-DNS);
NetBird-on-ubongo before base default-deny (chicken-and-egg); cluster procurement
gated on patterns proven on two cheap hosts.
Wire ROADMAP.md into CLAUDE.md's Further-reading index and point TODO.md at it.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
playbooks/site.yml imports the docker_host role, but it didn't exist, so
ansible-lint's syntax-check failed on a clean checkout — breaking CLAUDE.md's
"main must always work" / "Never skip lint" (top open finding O1 from the
2026-06-11 review).
Scaffold docker_host as a proper placeholder via the prescribed mechanism
(make new-role): filled meta/main.yml + README, an honest no-task tasks/main.yml
documenting planned scope (Docker engine + Compose, daemon hardening, nftables.d
container rules per ADR-004/020), and the standard molecule scenario. This
preserves site.yml's full-standard-state intent rather than dropping the play.
Update STATUS.md (docker_host moves from "Not in git" to "scaffolded, no tasks")
and the role/playbook READMEs to match.
make lint: 0 failures, 0 warnings; check-tags OK.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
/review-repo run at 67f2aba. Auto-fixed 5 safe doc-drift items left by the
base(firewall)+dev_env build wave: README/playbook/role notes that still called
the roles "empty/not built", plus README tree gaps and the reciprocal ADR-021
cross-links in ADR-016/020.
18 open findings reported (not fixed). Headline: `make lint` is red on `main`
(site.yml imports the non-existent docker_host role) and an ADR-004 <-> ADR-022
backup-scope contradiction. Deferral checklist clean (0 stale-deferred); 7 of
12 prior findings confirmed resolved. See docs/reviews/2026-06-11-review.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Debian's npm package pulls a ~400-package node-* tree (the first deploy
installed 527 packages). Replace apt nodejs+npm with a pinned upstream Node
tarball (v20.19.2) installed to /opt + symlinked, mirroring the nvim install
pattern (ADR-014 pinning). npm/npx come bundled. Molecule verifies node/npm
on PATH; lint + idempotent converge green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
group_vars/all assumes the ansible service user (created by bootstrap on
Terraform VMs). ubongo is the manually-provisioned control node (ADR-009/
ADR-015 exception) with no bootstrapped ansible user, so connect as sjat.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
When the login user differs from the become_user (ubongo connects as sjat,
the role copies files as claude), Ansible needs ACLs on its temp files;
without the acl package it falls back to an unsupported chmod syntax and
fails. Molecule didn't catch it (root login can chown directly).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two latent bugs that blocked the documented deploy path (never exercised
end-to-end before applying dev_env to ubongo):
- Makefile: the PLAYBOOK variable was both the ansible-playbook BINARY path
  and the user-supplied playbook NAME, so `make check/deploy PLAYBOOK=<name>`
  overrode the binary. Renamed the binary var to PLAYBOOK_BIN.
- ansible.cfg: stdout_callback=yaml and callbacks_enabled=timer were
  community.general plugins (not installed; boma only ships ansible.posix).
  Use the built-in default callback with callback_result_format=yaml and
  ansible.posix.profile_tasks; same intent, no new heavy collection.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A new role (separate from base) that gives workstation-class hosts (ubongo
now, mamba later) a clean interactive environment: zsh + oh-my-zsh +
oh-my-posh, tmux + TPM plugins, and neovim. Dotfiles are real files deployed
via GNU stow (not templated); pinned nvim v0.12.2 + oh-my-posh 29.0.1.
Configs re-derived (ADR-013) from AnsibleBaobabV4 + the operator's fisi setup
on boma's terms: no Nerd Font (headless host), no system LSP suite (nvim uses
mason), versions pinned (V4 tracks latest). Applied via playbooks/workstation.yml
to the control group for users sjat + claude. Lint + Molecule (idempotent) green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Move ubongo to 'Built (partial)' in STATUS; fill real M70q hardware specs
(i3-10100T, 16 GB, 256 GB SanDisk X600 SATA, no disk encryption). Record in
ADR-015 the dedicated claude AI-worker identity, LAN-SSH-only operational
reality, and the no-encryption decision; close the rbw offline-cache
recovery-verification item (ADR-015 + rotate-secrets). Add accepted-risk R5
(control-node disk unencrypted at rest) with its compensating controls.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Wire the now-built physical control node ubongo (10.20.10.151) into the
production control group (the documented manual exception), and activate the
dormant base__firewall_control_addr knob (ADR-021 ssh-from-control source).
Forward-wiring only: no host has the base role applied yet.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Captures the interactive build decisions (no-encryption + accepted risk,
simple partition, dedicated claude identity, LAN-only access, pinned
versions) and the A-F + H task breakdown. Sequel to the 2026-06-05
docs-only ADR-015 plan.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Final-review polish: demote the sub-headings under the demoted 'IP addressing'
(007) and 'Three testing levels'/'What Molecule tests' (008) to #### so they
nest correctly instead of flattening to siblings. Tighten the adr-structure
Superseded pattern to require '(YYYY-MM-DD)' per ADR-023.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Make the existing Status sections parseable (Accepted (date) + the existing
designed-not-built note) and add Consequences sections assembled from each
ADR's already-stated residual risks, trade-offs and build status. No
decision substance changed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add Status, a descriptive Context, a Decision umbrella over the existing
topical sections (demoted to ###), and a Consequences section assembled
from the ADR's already-stated rationale. No decision substance changed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Revisits the lifecycle decision on the evidence of ADR-011 (a real draft
with open questions). Adds a fourth state, Proposed (YYYY-MM-DD), to ADR-023,
the template, the adr-structure check (+test), spec and plan. Sets ADR-011's
Status to Proposed and removes its now-redundant inline 'Proposed' line.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add dated Status sections, a Decision umbrella over the existing topical
sections (demoted to ###), and Consequences assembled from each ADR's
already-stated implications. No decision substance changed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add dated Status sections and (where missing) Consequences sections assembled
from each ADR's already-stated implications. No decision substance changed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replaces the Status-only backfill with a faithful presentational
restructure bringing the whole back-catalogue to 4-section conformance
(no grandfathering). Adds the faithfulness rule and per-file worklist.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Flags numbered ADRs missing a mandatory section (Status/Context/Decision/
Consequences) or with an unparseable Status line. Presence only, not order.
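
A minimal sketch of what a presence-only check of this kind might look like. It
is not the repository's actual script: the `## ` heading level, the
`check_adr` name and the Status pattern (a lifecycle word followed by a
`(YYYY-MM-DD)` date) are all assumptions made for illustration.

```python
import re

# Mandatory sections named in the commit message; the "## " heading level is assumed.
REQUIRED = ("Status", "Context", "Decision", "Consequences")
# Assumed Status shape: a lifecycle word followed somewhere by a (YYYY-MM-DD) date.
STATUS_RE = re.compile(r"^(Accepted|Superseded|Deprecated)\b.*\(\d{4}-\d{2}-\d{2}\)")


def check_adr(text):
    """Return a list of problems for one ADR's text (empty list means it conforms)."""
    sections = {}  # heading -> body lines found under it
    current = None
    for line in text.splitlines():
        m = re.match(r"^## +(.+?)\s*$", line)
        if m:
            current = m.group(1)
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)

    # Presence only: section order is deliberately not checked.
    problems = [f"missing section: {name}" for name in REQUIRED if name not in sections]

    if "Status" in sections:
        body = [l.strip() for l in sections["Status"] if l.strip()]
        if not body or not STATUS_RE.match(body[0]):
            problems.append("unparseable Status line")
    return problems
```

Returning a list of problems rather than a boolean lets a caller report every
gap in a file at once.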
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Codifies the structure ADRs 019-022 converged on, pins an
Accepted/Superseded/Deprecated lifecycle with a no-silent-rewrite rule,
adds an adr-template.md scaffold, and plans a Status-header backfill of
ADRs 001-018. Basis for ADR-023.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Mechanical fix for the 4×-recurring execution-mode menu ask (kaizen 2026-06-10).
A Stop hook reads the transcript and, if the final assistant message presents the
"subagent-driven vs inline — which approach?" menu, blocks the turn and tells the
model to proceed subagent-driven (boma's standing preference). Fails open,
respects stop_hook_active (no loop), tight match signature (no false positives on
meta-discussion). Pipe-tested across 5 scenarios. Activates next session
(settings watcher only tracks files present at session start).
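
A hedged sketch of the hook logic described above, not the committed script.
It assumes the hook receives JSON on stdin with `transcript_path` and
`stop_hook_active`, that the transcript is JSONL with `type: "assistant"`
entries, and that printing a `{"decision": "block", ...}` object blocks the
stop. The regex is an illustrative stand-in; the real match signature is
presumably tighter.

```python
import json
import re
import sys

# Assumed match signature: both mode names plus the closing question, in one message.
MENU_RE = re.compile(r"subagent-driven\b.*\binline\b.*which approach\?", re.I | re.S)


def last_assistant_text(transcript_lines):
    """Return the text of the final assistant message in a JSONL transcript."""
    text = ""
    for raw in transcript_lines:
        try:
            entry = json.loads(raw)
        except ValueError:
            continue  # skip malformed lines rather than failing the hook
        if not isinstance(entry, dict) or entry.get("type") != "assistant":
            continue
        content = (entry.get("message") or {}).get("content", "")
        if isinstance(content, list):
            content = "\n".join(
                part.get("text", "") for part in content
                if isinstance(part, dict) and part.get("type") == "text"
            )
        text = content if isinstance(content, str) else ""
    return text


def decide(hook_input, transcript_lines):
    """Return a block decision dict, or None to let the turn end normally."""
    if hook_input.get("stop_hook_active"):
        return None  # already continuing because of a Stop hook: do not loop
    if MENU_RE.search(last_assistant_text(transcript_lines)):
        return {"decision": "block",
                "reason": "Standing preference: proceed subagent-driven; do not ask."}
    return None


def main():
    try:
        hook_input = json.load(sys.stdin)
        with open(hook_input["transcript_path"], encoding="utf-8") as fh:
            decision = decide(hook_input, fh)
    except Exception:
        decision = None  # fail open: any error allows the stop
    if decision:
        print(json.dumps(decision))


if __name__ == "__main__":
    main()
```

Keeping `decide` free of I/O is what makes pipe-testing across scenarios cheap:
each scenario is just a hook-input dict plus a few transcript lines.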
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Kaizen 2026-06-10 fixes:
- ansible-lint pre-commit hook now `always_run: false` + a files filter for
  roles/playbooks/inventories YAML, so docs-/config-only commits skip it and no
  longer need `rbw unlock` (root cause was ansible-lint auto-decrypting the
  group_vars vault, not the syntax-check).
- `make test`/`test-all` prepend $(CURDIR)/.venv/bin to PATH so non-activated
  agent runs find ansible-config/ansible-playbook.
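
To illustrate the files filter: pre-commit matches a hook's `files` pattern
against each staged path with Python's `re.search`, and with
`always_run: false` the hook runs only if at least one path matches. The
pattern below is hypothetical; the repository's real filter may differ.

```python
import re

# Hypothetical pattern; the repository's real filter may differ.
FILES_RE = re.compile(r"^(roles|playbooks|inventories)/.*\.ya?ml$")


def hook_would_run(staged_paths):
    """True if any staged path matches, i.e. the lint hook would be triggered."""
    return any(FILES_RE.search(p) for p in staged_paths)


print(hook_would_run(["docs/TODO.md", "Makefile"]))   # False: docs-/config-only commit
print(hook_would_run(["roles/base/tasks/main.yml"]))  # True: role YAML is staged
```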
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Plan 1 of the backup & DR strategy: ADR-022, per-service backup__* contract +
BACKUP.md governance (template + checklist gate + new-role runbook step + dormant
/check-backup), and hardware/CAPABILITIES updates. Docs-only; the backup role and
live restore testing are Plans 2-3.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Data-only restic backups, rebuild-from-code recovery (Model A); central
off-cluster pull node (fisi) with 8TB mirror; 3-2-1 via pCloud (rclone)
+ rotated USB air-gap. Per-service backup__* contract + BACKUP.md as a
hard convention. Two-tier restore testing (ubongo container restore-verify
+ semi-annual staging DR rehearsal). One restic password, escrowed in
Vaultwarden and on paper (the paper copy holds both the restic and vault
passwords), gives a non-circular break-glass. Dead-man's-switch alerting via
Uptime Kuma.
Resolves TODO 3.8; grounds ADR-011's backup-first assumption.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Complete the 2026-06-09 entry (third recurrence of presenting the
execution-mode menu despite the standing subagent-driven preference) and
restore two continuation-line indents a markdown formatter had stripped.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>