# terraform/
Infrastructure provisioning. Terraform owns VM existence only: creating and destroying VMs on Proxmox and, for the off-site host, on Hetzner Cloud. It writes no DNS records and configures nothing inside a VM; Ansible owns all of that.
- `modules/proxmox_vm/`: reusable VM module (Proxmox only).
- `modules/hetzner_vm/`: reusable VM module (Hetzner Cloud: server + firewall + SSH key + cloud-init).
- `environments/{staging,production}/`: separate state per environment (Proxmox). Add a VM by editing `local.vms` in that env's `main.tf`, then run `make tf-plan` → `tf-apply` → `tf-inventory`. Not yet `terraform init`ed.
- `environments/offsite/`: the off-site Hetzner host (askari); the one applied environment. Use `make tf-* TF_ENV=offsite` and `tf-inventory-offsite`.
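The `tf-inventory` step is where Terraform's view of the VMs is handed to Ansible. The sketch below shows one plausible shape for that conversion, not what the Makefile target actually does. It assumes a hypothetical map output named `vm_ips` (VM name → IP address) and renders it as an Ansible INI inventory; the function name and group name are illustrative. The only real interface it relies on is the `terraform output -json` format, which wraps each output as an object with `sensitive`, `type`, and `value` keys.

```python
import json


def outputs_to_inventory(outputs_json: str, group: str, output_name: str = "vm_ips") -> str:
    """Render an Ansible INI inventory from `terraform output -json` text.

    Hypothetical: assumes a map output `output_name` of VM name -> IP address.
    """
    outputs = json.loads(outputs_json)
    # `terraform output -json` wraps every output as {"sensitive", "type", "value"}.
    vms = outputs[output_name]["value"]
    lines = [f"[{group}]"]
    # Sort host names so the generated file is stable and diff-friendly.
    for name in sorted(vms):
        lines.append(f"{name} ansible_host={vms[name]}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative sample only; real input would come from `terraform output -json`.
    sample = json.dumps({
        "vm_ips": {
            "sensitive": False,
            "type": ["map", "string"],
            "value": {"web1": "10.0.0.11", "db1": "10.0.0.12"},
        }
    })
    print(outputs_to_inventory(sample, group="staging"), end="")
```

In practice the input would come from something like `terraform -chdir=environments/offsite output -json`. Keeping the conversion a pure string-to-string function makes it easy to test without touching real state.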
Rationale: ADR-006. Handoff to Ansible: ADR-009. Secrets via `TF_VAR_*` only, never in `.tfvars`. See `STATUS.md` for what is provisioned.
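Terraform sets an input variable `<name>` from an environment variable `TF_VAR_<name>`, which is what makes the "secrets via `TF_VAR_*` only" rule workable. The sketch below is a hypothetical pre-plan guard for that rule. The secret variable names in `SECRET_VARS` are placeholders (substitute whatever the env's `variables.tf` declares), and the `.tfvars` scan is a crude line match on top-level `name = ...` assignments, not an HCL parser.

```python
import os
import re
from pathlib import Path

# Hypothetical secret variable names; replace with the ones your environment declares.
SECRET_VARS = ("proxmox_api_token", "hcloud_token")


def missing_env_secrets(env: dict, names=SECRET_VARS) -> list[str]:
    """Return secret names whose TF_VAR_<name> is unset or empty in `env`."""
    return [n for n in names if not env.get(f"TF_VAR_{n}")]


def tfvars_secret_leaks(text: str, names=SECRET_VARS) -> list[str]:
    """Return secret names assigned in .tfvars text (line-based match, not a parser)."""
    leaked = []
    for n in names:
        # Only matches an assignment at the start of a line, so commented-out
        # lines such as `# hcloud_token = ...` are ignored.
        if re.search(rf"^\s*{re.escape(n)}\s*=", text, flags=re.MULTILINE):
            leaked.append(n)
    return leaked


def check(env_dir: Path) -> list[str]:
    """Collect rule violations for one environment directory."""
    problems = [f"TF_VAR_{n} is not set" for n in missing_env_secrets(dict(os.environ))]
    # `*.tfvars` also matches `*.auto.tfvars`, which Terraform loads automatically.
    for path in sorted(env_dir.glob("*.tfvars")):
        for n in tfvars_secret_leaks(path.read_text()):
            problems.append(f"{path.name}: assigns secret '{n}'")
    return problems


if __name__ == "__main__":
    for problem in check(Path(".")):
        print(problem)
```

Wired into a Makefile or pre-commit hook, a non-empty result from `check()` would be turned into a non-zero exit code so that `tf-plan` refuses to run.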