11 safe auto-fixes (docs/comments only): reverse_proxy meta stale DNS-01
description, base/playbooks/scripts/terraform/public_dns README build-state,
CAPABILITIES reverse-proxy Traefik→Caddy, README ADR list → 024, TF cax11→cx23
stamps, public_dns wildcard DNS-01→HTTP-01 comment. 29 open findings reported.
make lint green. No stale-deferred (ADR-011 open questions still open).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Capture NetBird's configure.sh reference for a pinned version → translate into
boma role templates (compose + management.json + dex/openid + turnserver),
external-proxy mode behind the M4a Caddy (netbird.askari.wingu.me). First service
role: full ADR-004 standard files; secrets generated/CHANGEME-stubbed (setup key
for M5). Gated live deploy + verify.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Switch from a custom caddy-dns/gandi image built on-host to the official
caddy:2 image with per-host ACME HTTP-01 certificates. Removes the
Dockerfile, env.j2 (Gandi token), on-host image build/ship/load tasks,
the caddy-image Makefile target, and the wildcard DNS-01 Caddyfile.
Each route now gets its own server block and automatic certificate.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Implements the Caddy reverse proxy role (ADR-024): builds boma/caddy-gandi:latest
on-host (caddy-dns/gandi plugin), renders the Caddyfile from the route catalog, and
brings the Compose project up. Adds community.docker to requirements.yml, plus
production group_vars and a caddy-image Makefile target.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Implements the docker_host role tasks: prerequisites, /etc/apt/keyrings
directory (ordered before the GPG key write), Docker APT key + repo, and
docker-ce/cli/containerd.io/compose-plugin install. Daemon hardening and
nftables.d integration remain deferred to Phase 2 (cluster + base firewall).
Updates defaults, README, and molecule verify to assert docker --version.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds ADR-024 pinning Caddy (xcaddy + caddy-dns/gandi) as boma's reverse
proxy, superseding the soft Traefik assumption in the roadmap and ADR-017
prose. Updates CLAUDE.md Further reading table and ROADMAP.md Phase-2 step 5.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
First of M4's two build phases: docker_host (Docker engine), custom xcaddy Caddy
image (caddy-dns/gandi), reverse_proxy role (Caddyfile from a route catalog,
DNS-01 wildcard cert for *.askari.wingu.me via vault.gandi.pat), ADR-024 (Caddy is
boma's reverse proxy), firewall 80/443 + DNS, proven by serving a test route over
TLS. M4b (NetBird) follows and will read NetBird's then-current self-host compose.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Caddy becomes boma's standard reverse proxy (amends the soft Traefik assumption;
new ADR) with Gandi DNS-01 certs (custom xcaddy image, reuses vault.gandi.pat) —
the only cert path for mesh/LAN-only services. NetBird self-hosted in
external-proxy mode (embedded Dex), compose rendered from boma templates
(ADR-004/013). Three roles: docker_host (first real content), reverse_proxy (new,
Caddy), netbird (first service role w/ full ADR-004 standard files). Firewall +
DNS amendments; backup execution deferred (fisi). caddy-dns/gandi + NetBird
self-host facts verified.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two bugs caught by the live make check/deploy on askari:
- include_tasks with a tag selects the include but NOT its tasks, so --tags hardening
ran nothing. Use apply: {tags:} to propagate (also fixed the firewall include).
- fail2ban service start + restart handler fail in a first-run --check (package not
installed yet); guard both with when: not ansible_check_mode so check is clean.
Applied to askari: SSH hardened, fail2ban active, ping still works (no lockout).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add explicit base__ssh_authorised_keys: [] default to prevent
undefined-var errors in Molecule. Extend verify.yml with sshd
drop-in validation, PasswordAuthentication check, and fail2ban
jail assertion. Pre-create /run/sshd in ssh.yml so sshd -t
works in containers before the service has ever started.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ADR-002 baseline (key-only, no root, fail2ban 5/1h) as two base task files under
the existing 'hardening' concern tag; applied to askari by tag (NOT the host
firewall — that's mesh-gated to avoid lockout; Hetzner Cloud Firewall is the
perimeter until M5). NetBird agent deferred to M4. Adds a LIMIT=/TAGS= passthrough
to make check/deploy.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
askari provisioned + bootstrapped (cx23/hel1/Debian 13.5, cloud-init ansible user
+ sudo, cloud firewall SSH-from-ubongo-WAN, reachable, in offsite_hosts). Added
askari.wingu.me A -> 77.42.120.136 via public_dns. STATUS: askari moves to
'Real and working today'.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ARM (cax11) unavailable in all EU locations 2026-06-14; fell back to cx23 (x86,
same 2/4/40 spec, cheaper in hel1). Server created (id 141153963); offsite.yml
generated into the directory inventory.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
terraform init failed: child modules using non-hashicorp providers must declare
required_providers, else TF infers hashicorp/{hcloud,proxmox} (nonexistent). Add
versions.tf to hetzner_vm AND proxmox_vm (same latent bug, never caught because
Proxmox TF was never init'd). Track the offsite lock (hcloud 1.65.0). Caught by
running 'make tf-init/plan TF_ENV=offsite' on ubongo — static review missed it.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Streamline the recurring secret-entry friction: the agent stubs a needed secret as
vault.<service>.<key>=CHANGEME with a what/how-to-obtain comment, wires the code,
and commits; the operator fills it via make edit-vault (real value never hits chat).
check-vault now lists outstanding CHANGEME placeholders so none are forgotten.
Convention documented in CLAUDE.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
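The placeholder listing described in the commit above can be sketched as a small tree walk. This is a minimal illustration, not the actual scripts/check-vault.py: the function name `find_placeholders`, the sample data, and the use of a plain dict in place of the decrypted YAML are all assumptions.

```python
from typing import Any, Iterator

PLACEHOLDER = "CHANGEME"  # stub value named in the convention above


def find_placeholders(node: Any, path: str = "vault") -> Iterator[str]:
    """Yield the dotted path of every leaf still set to the placeholder."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_placeholders(value, f"{path}.{key}")
    elif node == PLACEHOLDER:
        yield path


# Hypothetical vault contents: a plain dict standing in for the parsed
# `vault:` map. Neither value is a real secret.
sample_vault = {
    "gandi": {"pat": "not-a-real-token"},
    "netbird": {"setup_key": "CHANGEME"},
}

outstanding = sorted(find_placeholders(sample_vault))
print(outstanding)
```

Reporting dotted paths rather than values keeps the output safe to paste into chat, which matches the intent that real values never appear there.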
Review catches: (1) <<-EOT strips by the closing marker's indent, so the
cloud-config body must match it (2 spaces) for '#cloud-config' to land at column
0; (2) the Hetzner Cloud Firewall filters public traffic, so ssh_admin_cidrs is
ubongo's WAN/egress IP, not its LAN address — a private CIDR would lock SSH out of
the live VPS.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
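The second review catch above (a private CIDR in ssh_admin_cidrs locks SSH out of a public firewall) lends itself to a pre-apply check. The sketch below uses Python's standard `ipaddress` module. The helper name `private_cidrs` and the sample addresses are assumptions, and nothing in the log says the repository contains such a check.

```python
import ipaddress
from typing import List


def private_cidrs(cidrs: List[str]) -> List[str]:
    """Return the entries that fall in private (non-routable) address space."""
    return [
        cidr
        for cidr in cidrs
        # strict=False tolerates host bits being set, e.g. 192.168.1.10/24
        if ipaddress.ip_network(cidr, strict=False).is_private
    ]


# Hypothetical inputs: one LAN address and one publicly routable address.
candidates = ["192.168.1.10/32", "8.8.8.8/32"]
rejected = private_cidrs(candidates)
print(rejected)
```

A non-empty result would mean at least one admin CIDR can never match traffic arriving at the cloud firewall from the public internet.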
M1 public_dns applied to wingu.me (purge + SPF/DMARC, idempotent). Friction: the
item.values dict-method collision, Gandi's null-MX rejection, and the test gap
(apply=false Molecule plus data-only pytest) that let both bugs reach a live apply.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
item.values resolved to the dict's built-in .values() METHOD, not the 'values'
key, so gandi_livedns received '<built-in method values of dict object at 0x..>'
as the TXT value — garbage AND non-idempotent (the address changes each run).
Bracket-index all loop fields. Caught only by the live apply (apply=false Molecule
+ data-only pytest both missed it).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
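The collision in the commit above can be reproduced without Ansible. The two helpers below only approximate the lookup order Jinja2 uses: dot access tries the attribute first, bracket access tries the key first. They are illustrative stand-ins, not Jinja2's code, and the record shape is an assumption.

```python
def dot_lookup(obj, name):
    """Approximate `obj.name` in a template: attribute first, then key."""
    try:
        return getattr(obj, name)
    except AttributeError:
        return obj[name]


def bracket_lookup(obj, name):
    """Approximate `obj['name']` in a template: key first, then attribute."""
    try:
        return obj[name]
    except (KeyError, IndexError, TypeError):
        return getattr(obj, name)


# Hypothetical loop item shaped like a DNS record entry.
item = {"name": "@", "type": "TXT", "values": ["v=spf1 -all"]}

via_dot = dot_lookup(item, "values")          # the dict's bound .values method
via_bracket = bracket_lookup(item, "values")  # the list stored under 'values'

print(callable(via_dot))
print(via_bracket)
```

Because the method's repr embeds a memory address, any task that stringifies it sees a different value on each run, which is why the bug also broke idempotence.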
Gandi LiveDNS rejects the RFC-7505 null-MX value '0 .' ('invalid format for MX
record'), which failed the live apply. No MX + no apex A = no mail delivery, and
SPF -all + DMARC reject still prevent spoofing — so remove Gandi's seeded MX (add
@/MX to absent) rather than declare a null-MX present. Assert now requires an SPF
@/TXT record; tests + Molecule sample updated.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
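A data-only check in the spirit of the assertion described above might look like the following sketch. The record schema (`name`, `type`, `values`), the function name `check_mail_baseline`, and the exact rules are assumptions. The log only states that an SPF @/TXT record is required and that @/MX moves to the absent list.

```python
from typing import Dict, List

Record = Dict[str, object]


def check_mail_baseline(present: List[Record], absent: List[Record]) -> List[str]:
    """Return human-readable problems; an empty list means the data passes."""
    errors = []

    def is_apex(record: Record, rtype: str) -> bool:
        return record["name"] == "@" and record["type"] == rtype

    has_spf = any(
        is_apex(r, "TXT")
        and any(str(v).strip('"').startswith("v=spf1") for v in r.get("values", []))
        for r in present
    )
    if not has_spf:
        errors.append("missing SPF @/TXT record")
    if not any(is_apex(r, "MX") for r in absent):
        errors.append("@/MX is not listed as absent")
    return errors


# Hypothetical record data for a zone that sends and receives no mail.
present_records = [{"name": "@", "type": "TXT", "values": ['"v=spf1 -all"']}]
absent_records = [{"name": "@", "type": "MX"}]

problems = check_mail_baseline(present_records, absent_records)
print(problems)
```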
The naming-table amendment left the 'External monitoring' prose saying
askari.baobab.band; askari is greenfield (never on baobab.band), so its FQDN is
askari.wingu.me, off-site tier.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Converge runs in CI; the no-op apply=false scenario adds no local signal over
the pytest, and the test image is on an unreachable registry.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Implement M1: manage the wingu.me public DNS zone at Gandi LiveDNS via
community.general.gandi_livedns (PAT from vault.gandi.pat). Adds an
assertion guard for domain + null-MX, present/absent record loops
with run_once, and an apply gate for Molecule dry-run mode.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The chat-exposed PAT was rotated at Gandi and swapped in via the new edit-vault
target; commit the re-encrypted vault so the rotation is versioned.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
5th occurrence (06-14): the subagent-driven/inline menu question was asked again at
the M1 plan handoff. The 06-10 ledger claims a Stop hook blocks this; it didn't
fire. Flag to verify that the hook is present and that its matcher catches the
writing-plans menu wording.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bite-sized TDD plan: add community.general; scaffold public_dns; wingu.me record
data + pytest; role tasks (gandi_livedns present/absent loops, apply toggle);
Molecule (apply=false, no live API); dns.yml play; gated live run on ubongo
(purge Gandi defaults + anti-spoof baseline + dig verify); ADR-007 amendment +
TODO 4 resolution + STATUS/CAPABILITIES.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
askari is provisioned as IaC: Terraform owns its existence too, generalizing
ADR-006 from "Proxmox VM existence" to Proxmox + Hetzner (new hetznercloud/hcloud
provider, hetzner_vm module, offsite stack with local state). CAX11 (ARM) in
Helsinki on Debian 13, behind a TF-managed Hetzner Cloud Firewall (SSH-from-ubongo
now; NetBird ports in M4). Token via TF_VAR_hcloud_token from vault.hetzner.token.
Handoff stays ADR-009-shaped (tf_to_inventory.py extended to emit askari into
offsite_hosts). State in the ADR-022 backup scope; DR via terraform import.
Amends ADR-006/009/020/007/016. Point ROADMAP.md M2 at the spec.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Decided to keep the project named boma with wingu.me as its domain (boma was not
available as a domain). Record why the infra tier reads <host>.boma.wingu.me so it
isn't re-litigated; folds into the ADR-007 amendment.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`make edit-vault` runs `ansible-vault edit` (decrypt → nvim → re-encrypt on :wq,
abort on :cq) so editing the vault is one step with no plaintext left in the work
tree, then validates structure. `make check-vault` runs scripts/check-vault.py:
decrypts in-memory, asserts valid YAML with secrets under the nested `vault:` map
and no empty leaves, and prints a values-masked structure view (comments visible,
secrets never printed). Both default to the production all-vault; override VAULT=.
Update the vault header comment, CLAUDE.md (command table + Secrets section), and
scripts/README to point at edit-vault (note check-vault.py is the one
venv-dependent helper, by design).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
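The two structural checks named in the commit above (no empty leaves, and a values-masked view) can be sketched as follows. This is an illustration under stated assumptions rather than the real scripts/check-vault.py: decryption and YAML parsing are left out, a plain dict stands in for the parsed file, and the names `masked_view` and `empty_leaves` are hypothetical.

```python
from typing import Any, Dict, List


def masked_view(node: Dict[str, Any], indent: int = 0) -> List[str]:
    """Render keys as an indented tree, replacing every leaf value with ****."""
    lines = []
    pad = "  " * indent
    for key, value in node.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(masked_view(value, indent + 1))
        else:
            lines.append(f"{pad}{key}: ****")
    return lines


def empty_leaves(node: Dict[str, Any], path: str = "vault") -> List[str]:
    """Return dotted paths of leaves that are None, '', {} or []."""
    found = []
    for key, value in node.items():
        here = f"{path}.{key}"
        if isinstance(value, dict) and value:
            found.extend(empty_leaves(value, here))
        elif value in (None, "", {}, []):
            found.append(here)
    return found


# Hypothetical parsed vault file; neither value is a real secret.
sample = {"vault": {"gandi": {"pat": "secret-value"}, "hetzner": {"token": ""}}}

view = masked_view(sample)
empties = empty_leaves(sample["vault"])
print("\n".join(view))
print(empties)
```

Masking every leaf unconditionally, including empty ones, means the view never reveals whether a value is set. The separate empty-leaf list reports that without printing any content.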
Personal Access Token for wingu.me LiveDNS, used by the M1 public_dns role via
community.general.gandi_livedns. Stored under the nested vault.<service>.<key> map
(CLAUDE.md); the placeholder canary is preserved. Verified the token authenticates
+ is scoped to wingu.me, and that the file round-trips (decrypts to the expected
structure). PAT to be rotated after M1 (transmitted in plaintext during setup).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
boma's domain is wingu.me (registered at Gandi; 'wingu' = Swahili for cloud).
Replace the parametric <boma-domain> placeholder with wingu.me throughout. The
zone was NOT empty — Gandi auto-seeded 13 default records (parking A, www redirect,
a full Gandi mailbox set), so M1 includes a one-time purge to a clean baseline plus
an anti-spoof null-mail set (null MX, SPF -all, DMARC reject) since wingu.me sends
no mail. Domain-pick open item closed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Settles the M1 design: full registrar transfer Cloudflare -> Gandi; three-tier
naming scheme (host.boma / service.bare / service.askari), nyumbani dropped,
mesh/LAN-only default; public-DNS-as-code via a control-node `public_dns` role
driven by group_vars data, using community.general.gandi_livedns with a PAT
(api_key is deprecated/rejected by Gandi — verified per ADR-014). Stale records +
unused MX cleaned by omission. Cert scope is DNS+PAT only (issuance deferred to
M4/Phase 2). Human/agent division of labour + token-scoping recorded.
Resolves TODO 4 and review finding O12 once the ADR-007 amendment lands. Point
ROADMAP.md M1 at the spec.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>