Commit graph

217 commits

Author SHA1 Message Date
07af037ff3 feat(make): offsite TF token injection + directory inventory + tf-inventory-offsite
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 12:05:41 +02:00
127ade59a3 feat(tf): offsite environment — askari (CAX11/hel1/debian-13)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 12:03:31 +02:00
bbc287900a feat(tf): hetzner_vm module (server + firewall + ssh key + cloud-init)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 12:03:01 +02:00
29921428c4 docs(plan): M2 — askari provisioning (Terraform + Hetzner Cloud)
9-task plan: verify hcloud facts; hetzner_vm module (server+firewall+ssh+cloud-init);
offsite env (CAX11/hel1/debian-13, local state); Makefile token-injection + directory
inventory + tf-inventory-offsite; offsite-handoff pytest; init/validate/plan; GATED
apply (billed VPS) + bootstrap; ADR-006/009/020/007/016 amendments. Resolves the
inventory-handoff open item via a directory inventory.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 11:53:08 +02:00
993d7885e4 docs: mark M1 applied (STATUS); log item.values + Gandi null-MX gotchas
M1 public_dns applied to wingu.me (purge + SPF/DMARC, idempotent). Friction:
item.values dict-method collision, Gandi null-MX rejection, and the apply=false-
Molecule/data-only-pytest gap that let both bugs reach a live apply.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 10:58:03 +02:00
76bd1d63fc fix(public_dns): index loop keys with item['key'] not item.key
item.values resolved to the dict's built-in .values() METHOD, not the 'values'
key, so gandi_livedns received '<built-in method values of dict object at 0x..>'
as the TXT value — garbage AND non-idempotent (the address changes each run).
Bracket-index all loop fields. Caught only by the live apply (apply=false Molecule
+ data-only pytest both missed it).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 10:57:23 +02:00
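
Note on 76bd1d63fc: the collision comes from how Jinja2 resolves dotted access, which tries the Python attribute before the mapping key, whereas bracket access tries the key first. The sketch below approximates those two lookup orders in plain Python so the failure can be reproduced without Ansible. The helper names and the sample `item` are hypothetical, and this is a simplified model of the lookup rules rather than Jinja2's actual code.

```python
def attribute_lookup(obj, name):
    """Approximate Jinja2's `obj.name`: Python attribute first, then the key."""
    try:
        return getattr(obj, name)
    except AttributeError:
        return obj[name]


def subscript_lookup(obj, key):
    """Approximate Jinja2's `obj['key']`: the key first, then the attribute."""
    try:
        return obj[key]
    except (KeyError, IndexError, TypeError):
        return getattr(obj, key)


# Hypothetical loop item; field names follow the commit message, data is made up.
item = {"name": "@", "type": "TXT", "values": ["v=spf1 -all"]}

dotted = attribute_lookup(item, "values")     # dict's built-in .values method
bracketed = subscript_lookup(item, "values")  # the list stored under 'values'

print(callable(dotted))  # True: a bound method, not record data
print(bracketed)         # ['v=spf1 -all']
```

Keys such as `name` and `type` do not collide because `dict` has no attributes with those names, which is why only the `values` field misbehaved.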
078d1ad9d9 fix(public_dns): drop null-MX (Gandi rejects '0 .'); remove MX instead
Gandi LiveDNS rejects the RFC-7505 null-MX value '0 .' ('invalid format for MX
record'), which failed the live apply. No MX + no apex A = no mail delivery, and
SPF -all + DMARC reject still prevent spoofing — so remove Gandi's seeded MX (add
@/MX to absent) rather than declare a null-MX present. Assert now requires an SPF
@/TXT record; tests + Molecule sample updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 10:53:54 +02:00
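
Note on 078d1ad9d9: the guard described above (require an SPF `@`/TXT record, and remove the MX rather than declare a null-MX) could look roughly like the following data check. The record shape (`name`, `type`, `values`) and the function names are assumptions made for illustration; the role's real assertion is an Ansible task and may differ.

```python
def has_apex_spf(records):
    """True if some @/TXT record carries a value starting with v=spf1."""
    for record in records:
        if record.get("name") == "@" and record.get("type") == "TXT":
            for value in record.get("values", []):
                if value.strip('"').startswith("v=spf1"):
                    return True
    return False


def conflicting_keys(present, absent):
    """Return (name, type) pairs declared both present and absent."""
    wanted = {(r["name"], r["type"]) for r in present}
    removed = {(r["name"], r["type"]) for r in absent}
    return sorted(wanted & removed)


# Hypothetical record data: the anti-spoof baseline with the seeded MX removed.
present = [
    {"name": "@", "type": "TXT", "values": ["v=spf1 -all"]},
    {"name": "_dmarc", "type": "TXT", "values": ["v=DMARC1; p=reject"]},
]
absent = [{"name": "@", "type": "MX"}]

if not has_apex_spf(present):
    raise SystemExit("an SPF @/TXT record is required")
if conflicting_keys(present, absent):
    raise SystemExit("a record is listed as both present and absent")
print("record data OK")
```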
3cb6436ad2 docs(adr-007): fix askari FQDN to askari.wingu.me (review nit)
The naming-table amendment left the 'External monitoring' prose saying
askari.baobab.band; askari is greenfield (never on baobab.band), so its FQDN is
askari.wingu.me, off-site tier.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 10:44:21 +02:00
f170ffd936 docs(public_dns): amend ADR-007 to wingu.me/Gandi; resolve TODO 4; STATUS + CAPABILITIES
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 10:38:45 +02:00
e247af6e55 test(public_dns): Molecule scenario (apply disabled, no live API)
Converge runs in CI; the no-op apply=false scenario adds no local signal over
the pytest, and the test image is on an unreachable registry.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 10:36:40 +02:00
a0a3e4d356 feat(public_dns): dns.yml play (control-node, Gandi LiveDNS)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 10:35:30 +02:00
bd84dd0213 feat(public_dns): role tasks, defaults, meta, README
Implement M1: manage wingu.me public DNS zone at Gandi LiveDNS via
community.general.gandi_livedns (PAT from vault.gandi.pat). Adds
assertion guard for domain + null-MX, present/absent record loops
with run_once, and apply-gate for Molecule dry-run mode.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 10:34:42 +02:00
9311968363 feat(public_dns): wingu.me record data + validation test
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 10:33:07 +02:00
91ad629c02 secrets(vault): rotate Gandi PAT (via make edit-vault)
The chat-exposed PAT was rotated at Gandi and swapped in via the new edit-vault
target; commit the re-encrypted vault so the rotation is versioned.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 10:30:58 +02:00
70c302d7e5 scaffold(public_dns): empty role structure
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 10:30:02 +02:00
6f5c7b2bfb deps: add community.general for gandi_livedns (public_dns)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 10:29:57 +02:00
e96480692d docs(friction): execution-mode menu recurred despite the 06-10 mechanical fix
5th occurrence (06-14): asked the subagent-driven/inline menu at the M1 plan
handoff. The 06-10 ledger claims a Stop hook blocks this; it didn't fire. Flag to
verify the hook is present + its matcher catches the writing-plans menu wording.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 10:26:43 +02:00
b131ee317e docs(plan): M1 — public_dns implementation plan
Bite-sized TDD plan: add community.general; scaffold public_dns; wingu.me record
data + pytest; role tasks (gandi_livedns present/absent loops, apply toggle);
Molecule (apply=false, no live API); dns.yml play; gated live run on ubongo
(purge Gandi defaults + anti-spoof baseline + dig verify); ADR-007 amendment +
TODO 4 resolution + STATUS/CAPABILITIES.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 10:23:26 +02:00
602550fdaa docs(spec): M2 — provision askari via Terraform + Hetzner Cloud
askari is provisioned as IaC: Terraform owns its existence too, generalizing
ADR-006 from "Proxmox VM existence" to Proxmox + Hetzner (new hetznercloud/hcloud
provider, hetzner_vm module, offsite stack with local state). CAX11 (ARM) in
Helsinki on Debian 13, behind a TF-managed Hetzner Cloud Firewall (SSH-from-ubongo
now; NetBird ports in M4). Token via TF_VAR_hcloud_token from vault.hetzner.token.
Handoff stays ADR-009-shaped (tf_to_inventory.py extended to emit askari into
offsite_hosts). State in the ADR-022 backup scope; DR via terraform import.

Amends ADR-006/009/020/007/016. Point ROADMAP.md M2 at the spec.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 10:12:10 +02:00
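
Note on 602550fdaa: the ADR-009-shaped handoff turns Terraform outputs into Ansible inventory. A minimal sketch of that idea follows, assuming a Terraform output named `offsite_hosts` that maps hostname to address. That output name, the function name, and the sample address (from a documentation range) are hypothetical; the real tf_to_inventory.py is not shown in this log and may work differently. The outer `value` wrapper matches what `terraform output -json` emits.

```python
import json


def offsite_inventory(tf_output_json, output_name="offsite_hosts"):
    """Build an Ansible-style inventory dict from `terraform output -json` text."""
    outputs = json.loads(tf_output_json)
    hosts = outputs[output_name]["value"]  # Terraform wraps each output in {"value": ...}
    return {
        "offsite_hosts": {
            "hosts": {
                name: {"ansible_host": address}
                for name, address in sorted(hosts.items())
            }
        }
    }


# Hypothetical Terraform output for the offsite stack.
sample = json.dumps(
    {
        "offsite_hosts": {
            "sensitive": False,
            "type": ["map", "string"],
            "value": {"askari": "203.0.113.10"},
        }
    }
)

inventory = offsite_inventory(sample)
print(json.dumps(inventory, indent=2))
```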
32d480efcf docs(spec): note project (boma) vs domain (wingu.me) in the naming scheme
Decided to keep the project named boma with wingu.me as its domain (boma was not
available as a domain). Record why the infra tier reads <host>.boma.wingu.me so it
isn't re-litigated; folds into the ADR-007 amendment.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 09:47:13 +02:00
79f2315eee feat(make): add edit-vault + check-vault targets
`make edit-vault` runs `ansible-vault edit` (decrypt → nvim → re-encrypt on :wq,
abort on :cq) so editing the vault is one step with no plaintext left in the work
tree, then validates structure. `make check-vault` runs scripts/check-vault.py:
decrypts in-memory, asserts valid YAML with secrets under the nested `vault:` map
and no empty leaves, and prints a values-masked structure view (comments visible,
secrets never printed). Both default to the production all-vault; override VAULT=.

Update the vault header comment, CLAUDE.md (command table + Secrets section), and
scripts/README to point at edit-vault (note check-vault.py is the one venv-
dependent helper, by design).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 09:36:15 +02:00
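
Note on 79f2315eee: the structural checks described for `make check-vault` (secrets under a nested `vault:` map, no empty leaves, values masked in the printed view) can be sketched as below. This starts from an already-parsed dictionary; decryption, YAML parsing, and comment display are left out. The function names and sample values are hypothetical and not taken from scripts/check-vault.py.

```python
def empty_leaves(node, path="vault"):
    """Return dotted paths of leaves that are None, blank, or empty maps."""
    if isinstance(node, dict):
        if not node:
            return [path]
        found = []
        for key, child in node.items():
            found.extend(empty_leaves(child, f"{path}.{key}"))
        return found
    if node is None or (isinstance(node, str) and not node.strip()):
        return [path]
    return []


def masked(node):
    """Copy the structure, replacing every leaf value with a mask."""
    if isinstance(node, dict):
        return {key: masked(child) for key, child in node.items()}
    return "****"


def check(document):
    """Validate the vault layout and return a view that is safe to print."""
    if not isinstance(document, dict) or not isinstance(document.get("vault"), dict):
        raise ValueError("secrets must live under a top-level 'vault:' map")
    problems = empty_leaves(document["vault"])
    if problems:
        raise ValueError("empty leaves: " + ", ".join(problems))
    return masked(document)


# Hypothetical decrypted content; the values are placeholders, not real secrets.
sample = {"vault": {"gandi": {"pat": "placeholder"}, "hetzner": {"token": "placeholder"}}}
print(check(sample))
```

Masking by copying the structure, rather than printing and redacting, means a secret value never reaches the output path at all.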
43e5a4aa53 secrets(vault): add Gandi LiveDNS PAT as vault.gandi.pat
Personal Access Token for wingu.me LiveDNS, used by the M1 public_dns role via
community.general.gandi_livedns. Stored under the nested vault.<service>.<key> map
(CLAUDE.md); the placeholder canary is preserved. Verified the token authenticates
+ is scoped to wingu.me, and that the file round-trips (decrypts to the expected
structure). PAT to be rotated after M1 (transmitted in plaintext during setup).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 09:14:10 +02:00
f7fac5f5e3 docs(spec): M1 — finalize for wingu.me (greenfield), record Gandi-defaults purge
boma's domain is wingu.me (registered at Gandi; 'wingu' = Swahili for cloud).
Replace the parametric <boma-domain> placeholder with wingu.me throughout. The
zone was NOT empty — Gandi auto-seeded 13 default records (parking A, www redirect,
a full Gandi mailbox set), so M1 includes a one-time purge to a clean baseline plus
an anti-spoof null-mail set (null MX, SPF -all, DMARC reject) since wingu.me sends
no mail. Domain-pick open item closed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 09:14:10 +02:00
7a47dd9dec docs(spec): M1 — public DNS migration to Gandi (DNS-as-code) design
Settles the M1 design: full registrar transfer Cloudflare -> Gandi; three-tier
naming scheme (host.boma / service.bare / service.askari), nyumbani dropped,
mesh/LAN-only default; public-DNS-as-code via a control-node `public_dns` role
driven by group_vars data, using community.general.gandi_livedns with a PAT
(api_key is deprecated/rejected by Gandi — verified per ADR-014). Stale records +
unused MX cleaned by omission. Cert scope is DNS+PAT only (issuance deferred to
M4/Phase 2). Human/agent division of labour + token-scoping recorded.

Resolves TODO 4 and review finding O12 once the ADR-007 amendment lands. Point
ROADMAP.md M1 at the spec.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 23:17:19 +02:00
be2679cc66 docs(roadmap): record decided DNS naming scheme in M1
Three-tier scheme: <host>.boma.baobab.band (infra, internal) /
<service>.baobab.band (home, split-horizon, mesh/LAN-only default) /
<service>.askari.baobab.band (off-site, public). nyumbani dropped; mesh carries
the baobab.band match-domain to road-warriors; *.baobab.band DNS-01 wildcard
certs via Gandi API. Resolves TODO 4 and review finding O12.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 22:17:28 +02:00
3cfcb1c2e9 docs(roadmap): add ROADMAP.md — remote-access-first build order
High-level build order for the project (Approach A): one Off-site/Remote-access
track first (Gandi DNS-as-code -> askari -> NetBird control plane -> enroll
ubongo + road-warrior laptops -> harden), a procurement gate sized by
/capacity-review, then the Cluster track. Sequences the docs/TODO.md backlog into
milestones and records why the order is what it is.

Decisions captured this session: Gandi over Cloudflare is values-driven and
independent of NetBird (sequenced first so records are born at Gandi); public DNS
managed as code (Ansible, consistent with internal DNS + Terraform-owns-no-DNS);
NetBird-on-ubongo before base default-deny (chicken-and-egg); cluster procurement
gated on patterns proven on two cheap hosts.

Wire ROADMAP.md into CLAUDE.md's Further-reading index and point TODO.md at it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 22:12:38 +02:00
03d33f83dd fix(O1): scaffold docker_host role so make lint passes on main
playbooks/site.yml imports the docker_host role, but it didn't exist, so
ansible-lint's syntax-check failed on a clean checkout — breaking CLAUDE.md's
"main must always work" / "Never skip lint" (top open finding O1 from the
2026-06-11 review).

Scaffold docker_host as a proper placeholder via the prescribed mechanism
(make new-role): filled meta/main.yml + README, an honest no-task tasks/main.yml
documenting planned scope (Docker engine + Compose, daemon hardening, nftables.d
container rules per ADR-004/020), and the standard molecule scenario. This
preserves site.yml's full-standard-state intent rather than dropping the play.

Update STATUS.md (docker_host moves from "Not in git" to "scaffolded, no tasks")
and the role/playbook READMEs to match.

make lint: 0 failures, 0 warnings; check-tags OK.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 14:53:55 +02:00
1da117d65b docs(review): 2026-06-11 repo audit — fix build-wave doc drift
/review-repo run at 67f2aba. Auto-fixed 5 safe doc-drift items left by the
base(firewall)+dev_env build wave: README/playbook/role notes that still called
the roles "empty/not built", plus README tree gaps and the reciprocal ADR-021
cross-links in ADR-016/020.

18 open findings reported (not fixed). Headline: `make lint` is red on `main`
(site.yml imports the non-existent docker_host role) and an ADR-004 <-> ADR-022
backup-scope contradiction. Deferral checklist clean (0 stale-deferred); 7 of
12 prior findings confirmed resolved. See docs/reviews/2026-06-11-review.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 14:48:00 +02:00
67f2aba9d8 STATUS: record dev_env (built+applied) and working deploy path
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 14:21:36 +02:00
aea4f8c3d6 dev_env: install Node.js from pinned tarball, drop npm bloat
Debian's npm package pulls a ~400-package node-* tree (the first deploy
installed 527 packages). Replace apt nodejs+npm with a pinned upstream Node
tarball (v20.19.2) installed to /opt + symlinked, mirroring the nvim install
pattern (ADR-014 pinning). npm/npx come bundled. Molecule verifies node/npm
on PATH; lint + idempotent converge green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 14:21:33 +02:00
6203513220 inventory: manage ubongo (control node) as the operator account
group_vars/all assumes the ansible service user (created by bootstrap on
Terraform VMs). ubongo is the manually-provisioned control node (ADR-009/
ADR-015 exception) with no bootstrapped ansible user, so connect as sjat.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 14:09:15 +02:00
607423d0e7 dev_env: install acl for become_user file copies
When the login user differs from the become_user (ubongo connects as sjat,
the role copies files as claude), Ansible needs ACLs on its temp files;
without the acl package it falls back to an unsupported chmod syntax and
fails. Molecule didn't catch it (root login can chown directly).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 14:09:12 +02:00
a2bb99928c fix(deploy): make check/deploy actually run
Two latent bugs that blocked the documented deploy path (never exercised
end-to-end before applying dev_env to ubongo):
- Makefile: the PLAYBOOK variable was both the ansible-playbook BINARY path
  and the user-supplied playbook NAME, so `make check/deploy PLAYBOOK=<name>`
  overrode the binary. Renamed the binary var to PLAYBOOK_BIN.
- ansible.cfg: stdout_callback=yaml and callbacks_enabled=timer were
  community.general plugins (not installed; boma only ships ansible.posix).
  Use the built-in default callback with callback_result_format=yaml and
  ansible.posix.profile_tasks — same intent, no new heavy collection.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 14:09:12 +02:00
f3f382ae69 Add dev_env role: zsh/tmux/nvim for workstation-class hosts
A new role (separate from base) that gives workstation-class hosts (ubongo
now, mamba later) a clean interactive environment: zsh + oh-my-zsh +
oh-my-posh, tmux + TPM plugins, and neovim. Dotfiles are real files deployed
via GNU stow (not templated); pinned nvim v0.12.2 + oh-my-posh 29.0.1.

Configs re-derived (ADR-013) from AnsibleBaobabV4 + the operator's fisi setup
on boma's terms: no Nerd Font (headless host), no system LSP suite (nvim uses
mason), versions pinned (V4 tracks latest). Applied via playbooks/workstation.yml
to the control group for users sjat + claude. Lint + Molecule (idempotent) green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 13:50:11 +02:00
b9daf2a0ad plan: record ubongo build outcome (done/deferred/follow-ups)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 10:33:18 +02:00
349d10d65c docs: record ubongo physical build (2026-06-11)
Move ubongo to 'Built (partial)' in STATUS; fill real M70q hardware specs
(i3-10100T, 16 GB, 256 GB SanDisk X600 SATA, no disk encryption). Record in
ADR-015 the dedicated claude AI-worker identity, LAN-SSH-only operational
reality, and the no-encryption decision; close the rbw offline-cache
recovery-verification item (ADR-015 + rotate-secrets). Add accepted-risk R5
(control-node disk unencrypted at rest) with its compensating controls.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 10:32:26 +02:00
7b5fd17e55 inventory: add ubongo to control group; set ssh-from-control addr
Wire the now-built physical control node ubongo (10.20.10.151) into the
production control group (the documented manual exception), and activate the
dormant base__firewall_control_addr knob (ADR-021 ssh-from-control source).
Forward-wiring only: no host has the base role applied yet.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 10:32:24 +02:00
7b190e4313 Add ubongo physical-build plan (2026-06-11 session)
Captures the interactive build decisions (no-encryption + accepted risk,
simple partition, dedicated claude identity, LAN-only access, pinned
versions) and the A-F + H task breakdown. Sequel to the 2026-06-05
docs-only ADR-015 plan.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 10:01:41 +02:00
7ebbc113ab Merge feat/adr-structure: ADR-023 structure & lifecycle + back-catalogue conformance
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 15:18:48 +02:00
fa3db421dc docs(kaizen): FRICTION signal — controller must diff-audit subagent restructures
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 15:01:21 +02:00
d0a3307822 docs(adr): fix 007/008 heading nesting; require date in Superseded status
Final-review polish: demote the sub-headings under the demoted 'IP addressing'
(007) and 'Three testing levels'/'What Molecule tests' (008) to #### so they
nest correctly instead of flattening to siblings. Tighten the adr-structure
Superseded pattern to require '(YYYY-MM-DD)' per ADR-023.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 15:00:58 +02:00
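
Note on d0a3307822: a Status check that insists on a `(YYYY-MM-DD)` date could be sketched as follows. Only the states named in this log (Accepted, Proposed, Superseded) are included; ADR-023 defines a further state that is not named here, and the exact pattern used by the adr-structure check is not shown, so the regex and function name are assumptions.

```python
import re
from datetime import date

# State name, optional free text (e.g. "by ADR-NNN"), then a parenthesised ISO date.
STATUS_RE = re.compile(
    r"^(Accepted|Proposed|Superseded)\b[^()]*\((\d{4}-\d{2}-\d{2})\)"
)


def status_is_valid(line):
    """True if the line names a known state followed by a real calendar date."""
    match = STATUS_RE.match(line.strip())
    if match is None:
        return False
    try:
        date.fromisoformat(match.group(2))
    except ValueError:
        return False  # right shape, impossible date
    return True


# Hypothetical Status lines for illustration.
for line in [
    "Accepted (2026-06-10)",
    "Superseded by ADR-099 (2026-06-10)",
    "Superseded",
    "Proposed (2026-13-40)",
]:
    print(f"{line!r}: {status_is_valid(line)}")
```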
0df24909e3 docs(adr): restructure ADRs 016-018 to ADR-023 conformance
Make the existing Status sections parseable (Accepted (date) + the existing
designed-not-built note) and add Consequences sections assembled from each
ADR's already-stated residual risks, trade-offs and build status. No
decision substance changed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 14:51:51 +02:00
40a428975a docs(adr): restructure ADR-003 to ADR-023 conformance
Add Status, a descriptive Context, a Decision umbrella over the existing
topical sections (demoted to ###), and a Consequences section assembled
from the ADR's already-stated rationale. No decision substance changed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 14:50:03 +02:00
6d7d27b03b docs(adr): add Proposed lifecycle state; mark ADR-011 Proposed
Revisits the lifecycle decision on the evidence of ADR-011 (a real draft
with open questions). Adds a fourth state, Proposed (YYYY-MM-DD), to ADR-023,
the template, the adr-structure check (+test), spec and plan. Sets ADR-011's
Status to Proposed and removes its now-redundant inline 'Proposed' line.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 14:48:55 +02:00
b3ca510380 docs(adr): restructure ADRs 010,011,013 to ADR-023 conformance
010/011: relabel Decisions->Decision + add Status/Consequences.
013: add Status + Decision umbrella (existing Consequences untouched).
No decision substance changed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 14:43:41 +02:00
44dbd4628f docs(adr): restructure ADRs 006-009 to ADR-023 conformance
Add dated Status sections, a Decision umbrella over the existing topical
sections (demoted to ###), and Consequences assembled from each ADR's
already-stated implications. No decision substance changed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 14:41:24 +02:00
188882449d docs(adr): restructure ADRs 001,002,004,005,012,014,015 to ADR-023 conformance
Add dated Status sections and (where missing) Consequences sections assembled
from each ADR's already-stated implications. No decision substance changed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 14:39:00 +02:00
9b1502cf7d docs(adr): register ADR-023 and note adr-structure check
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 14:33:55 +02:00
a9aab9d040 docs(adr): ADR-023 — ADR structure & lifecycle
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 14:32:40 +02:00
3c920ae630 docs(adr): sync plan Task 2 with flat-comment template fix
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 14:31:23 +02:00