Incident 2026-06-17: applying base's nftables default-deny (forward policy drop)
to askari — a Docker host — broke container forwarding/NAT on reboot, and the
wt0-only sshd ListenAddress left no break-glass (ip_nonlocal_bind did NOT beat
the boot race). Recovery: disable nftables + restart docker (restore the wiped
NAT masquerade) + force-recreate the coordinator (it FATAL-looped unable to
download its GeoLite2 DB with no egress) -> mesh re-formed.
Back out the enablement so a future deploy can't re-break askari:
- offsite_hosts: base__ssh_listen_mesh_only=false, base__firewall_apply=false
- remove host_vars/askari.yml (manage over the WAN again, not wt0)
- tf/offsite: re-open WAN :22 to ubongo only (break-glass; already applied)
askari now: sshd on all interfaces (Ansible-managed), nftables disabled, WAN :22
open -> stable + reboot-survivable. The base feature code (sshd ListenAddress
option, firewall public zone) stays; it's just not enabled on Docker hosts.
Mesh-hardening 1/3 to be re-spec'd before any retry.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Hetzner Cloud Firewall SSH rule is now conditional on a non-empty
ssh_admin_cidrs (default []); askari sets it empty so the WAN :22 rule is
removed on the next apply. SSH is reached over wt0; break-glass is the Hetzner
console. Apply is the live cutover (Task 5). Mesh-hardening 1/3.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- ADR-007: document ubongo on the legacy V4 net at 10.20.10.151 (transitional,
outside the planned srv /24 until the LAN is re-cut) (O10); single authoritative
boma.baobab.band -> boma.wingu.me transition note already added earlier
- terraform tfvars.example + variables.tf (both envs): pve01 -> pve0 and
<host>.boma.baobab.band per ADR-007 naming (O11)
- ADR-012/013/015/016/017/018: convert "See also:" prose to `## Related` sections
placed after Consequences, matching ADR-014/019-023 (O13)
- docs/README + inventories/README: list the missing subdirs / offsite_hosts +
offsite.yml merge behaviour (O14, O29 note)
- ADR-009: drop the retired `nyumbani` example; use vaultwarden.wingu.me split-horizon (O19)
- ROADMAP M2: askari shipped as cx23/x86 (CAX11/ARM out of stock) (O20)
- ADR-020: 80/443/3478 opened in M4a (past tense); coordinator role is M4b (O21)
- netbird -> netbird_coordinator across ROADMAP M4b, the M4b plan, ADR-024 (O23)
- ADR-024: align the M1 DNS-01 wildcard scope wording with ROADMAP (O24)
- capacity-scan.py: read the inventory directory so offsite.yml (askari) is seen (O28)
- tf_to_inventory.py: generated header now warns it overwrites the manual control node (O9)
- tests/tags.yml: proxy concern comment Traefik -> Caddy (missed in the O3 sweep)
O9's existing stub hosts.yml header stays as-is (generator-owned, hook-protected);
the fix lives in the generator for the next regeneration. make lint + pytest (57) green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
11 safe auto-fixes (docs/comments only): reverse_proxy meta stale DNS-01
description, base/playbooks/scripts/terraform/public_dns README build-state,
CAPABILITIES reverse-proxy Traefik→Caddy, README ADR list → 024, TF cax11→cx23
stamps, public_dns wildcard DNS-01→HTTP-01 comment. 29 open findings reported.
make lint green. No stale-deferred (ADR-011 open questions still open).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ARM (cax11) unavailable in all EU locations 2026-06-14; fell back to cx23 (x86,
same 2/4/40 spec, cheaper in hel1). Server created (id 141153963); offsite.yml
generated into the directory inventory.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
terraform init failed: child modules using non-hashicorp providers must declare
required_providers, else TF infers hashicorp/{hcloud,proxmox} (nonexistent). Add
versions.tf to hetzner_vm AND proxmox_vm (same latent bug, never caught because
Proxmox TF was never init'd). Track the offsite lock (hcloud 1.65.0). Caught by
running 'make tf-init/plan TF_ENV=offsite' on ubongo — static review missed it.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Review catches: (1) <<-EOT strips by the closing marker's indent, so the
cloud-config body must match it (2 spaces) for '#cloud-config' to land at column
0; (2) the Hetzner Cloud Firewall filters public traffic, so ssh_admin_cidrs is
ubongo's WAN/egress IP, not its LAN address — a private CIDR would lock SSH out of
the live VPS.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Forgejo's /raw/ API is read-only so it cannot serve as a Terraform HTTP state
backend. Switch both envs to local state on the control node (ADR-006); remove
the dead TF_HTTP_* credential hints.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
R9: pass vlan_tag (default 20 = srv VLAN, ADR-007) from both envs to the
proxmox_vm module so VMs are tagged, not on untagged vmbr0. R11: make new-role
now sed-substitutes ROLE_NAME_PLACEHOLDER so scaffolded molecule converge works
out of the box.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>