askari is provisioned as IaC: Terraform owns its existence too, generalizing ADR-006 from "Proxmox VM existence" to Proxmox + Hetzner (new hetznercloud/hcloud provider, hetzner_vm module, offsite stack with local state). CAX11 (ARM) in Helsinki on Debian 13, behind a TF-managed Hetzner Cloud Firewall (SSH-from-ubongo now; NetBird ports in M4). Token via TF_VAR_hcloud_token from vault.hetzner.token. Handoff stays ADR-009-shaped (tf_to_inventory.py extended to emit askari into offsite_hosts). State in the ADR-022 backup scope; DR via terraform import. Amends ADR-006/009/020/007/016. Point ROADMAP.md M2 at the spec. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Design — Provisioning askari (Terraform + Hetzner Cloud)
- Date: 2026-06-14
- Status: Draft for review — design settled in brainstorming; pending user review, then implementation plan
- Roadmap milestone: M2 (docs/ROADMAP.md)
- Amends: ADR-006 (Terraform scope → Proxmox + Hetzner), ADR-009 (offsite handoff), ADR-020 (Hetzner Cloud Firewall = askari's perimeter), ADR-007/016 (askari is Terraform-provisioned, not "added manually")
- Becomes: amendments to those ADRs
Problem
askari (the off-site Hetzner VPS — NetBird coordinator + watchdog, later the off-site
log subset) does not exist yet. ADR-007/016 designed it as "provisioned independently…
added manually." Now that there's a dedicated Hetzner account + a verified API token in
the vault, we can provision it as IaC instead. boma's principle (ADR-006/009) is
"Terraform owns VM existence; Ansible owns config" — but scoped to Proxmox. This
milestone generalizes that principle to Hetzner and stands askari up.
Decisions (as settled)
- Terraform owns askari's existence (Approach 1): generalize ADR-006 from "Proxmox VM existence" to "VM existence on Proxmox + Hetzner." (Rejected: Ansible hetzner.hcloud, which breaks the TF/Ansible boundary; the hcloud CLI, which is not stateful IaC.)
- Server: CAX11 (ARM/Ampere, 2 vCPU / 4 GB / 40 GB), Helsinki (hel1), Debian 13. Rescale up later if the off-site log subset needs it.
- TF-managed Hetzner Cloud Firewall as askari's perimeter (the off-site OPNsense-analog). Starts minimal (SSH from ubongo only); service ports are added as services land (NetBird ports in M4). The ADR-020 catalog stays authoritative for the host nftables layer.
- Token via TF_VAR_hcloud_token, sourced from vault.hetzner.token at apply time, never in .tfvars (CLAUDE.md).
- Handoff stays ADR-009-shaped: tf_to_inventory.py is extended to emit askari into offsite_hosts, so hosts.yml stays fully generated.
Verified facts (ADR-014)
verified: Hetzner Cloud entry tiers · WebSearch · 2026-06-14 · CAX11 (ARM/Ampere) 2 vCPU / 4 GB / 40 GB ≈ €3.79/mo, 20 TB traffic + 1 IPv4; ARM (CAX) is EU-locations only (incl. hel1). Price change for new orders from 2026-06-15.
to verify when writing the role (ADR-014): the hetznercloud/hcloud provider version to pin; the Debian 13 image slug (expected debian-13); CAX11 availability in hel1.
Architecture
Terraform structure
- Module terraform/modules/hetzner_vm/ (sibling to proxmox_vm): inputs name, server_type, location, image, ssh_keys, user_data, firewall_rules, labels; outputs the server's ipv4 (+ id, name).
- Stack terraform/environments/offsite/ (its own local state on ubongo, gitignored): providers.tf pins hetznercloud/hcloud; main.tf calls hetzner_vm for askari + an hcloud_firewall + an hcloud_ssh_key; variables.tf (incl. hcloud_token, control_ssh_pubkey, ssh_admin_cidr); outputs.tf (askari ipv4, for the handoff + DNS); backend.tf (local state, like the Proxmox envs).
- make tf-* TF_ENV=offsite drives it; for offsite the targets first export TF_VAR_hcloud_token from vault.hetzner.token (a small vault→env step). tf-apply stays gated behind a shown tf-plan (CLAUDE.md).
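The vault→env step might look roughly like the sketch below. It is an illustration only: read_secret is a hypothetical callable standing in for whatever vault lookup is eventually chosen, and run_plan calls terraform directly where the real targets would go through make. The point it shows is that the token lives only in the child process environment and is never written to disk.

```python
import os
import subprocess
from typing import Callable, Mapping, Optional


def terraform_env(read_secret: Callable[[str], str],
                  base_env: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment with TF_VAR_hcloud_token set.

    read_secret is a hypothetical stand-in for the real vault lookup;
    the token is held in memory only and never written to a file.
    """
    env = dict(os.environ if base_env is None else base_env)
    token = read_secret("vault.hetzner.token").strip()
    if not token:
        raise ValueError("vault.hetzner.token is empty")
    env["TF_VAR_hcloud_token"] = token
    return env


def run_plan(read_secret: Callable[[str], str],
             workdir: str = "terraform/environments/offsite"):
    """Run `terraform plan` with the token injected via the environment."""
    return subprocess.run(["terraform", "plan"], cwd=workdir,
                          env=terraform_env(read_secret), check=True)
```

Terraform picks up any TF_VAR_<name> environment variable as the value of the variable <name>, so no .tfvars entry is needed for hcloud_token.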
Provisioning → Ansible handoff
- TF creates the CAX11 with a cloud-init user_data that injects ubongo's control SSH public key for first login (minimal: no config beyond the key + ensuring Python is present for Ansible).
- TF outputs askari's public IPv4. tf_to_inventory.py (extended for the offsite stack) writes askari into the offsite_hosts group of hosts.yml.
- playbooks/bootstrap.yml runs against askari → creates the ansible user + sudoers (as for Proxmox hosts). This is where M2 ends.
- (Downstream, not M2): base remote-access subset (M3), NetBird coordinator (M4), mesh enrollment + SSH narrowed to wt0 (M5).
- A convenience askari.wingu.me A record is added via the M1 public_dns role (stable name for humans + future certs); the inventory may reference it once DNS exists.
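As a rough illustration of the handoff, the sketch below turns the JSON printed by terraform output -json into an offsite_hosts group. The output name askari_ipv4 and the exact inventory layout are assumptions made for the example; the real extension follows whatever tf_to_inventory.py already does for the Proxmox stacks.

```python
import json


def offsite_inventory(tf_output_json: str,
                      output_name: str = "askari_ipv4") -> dict:
    """Build the offsite_hosts group from `terraform output -json`.

    output_name is an assumed output name; the real stack may differ.
    """
    outputs = json.loads(tf_output_json)
    if output_name not in outputs:
        raise KeyError(f"terraform output {output_name!r} not found")
    ipv4 = outputs[output_name]["value"]
    return {"offsite_hosts": {"hosts": {"askari": {"ansible_host": ipv4}}}}


# Sample input in the shape `terraform output -json` emits; the address is
# from a documentation range, not a real server.
sample = json.dumps({
    "askari_ipv4": {"sensitive": False, "type": "string",
                    "value": "203.0.113.10"},
})
inventory = offsite_inventory(sample)
```

Keeping the function pure (string in, dict out) is what makes a stdlib-only pytest for it straightforward.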
Cloud firewall (perimeter)
- TF hcloud_firewall attached to askari:
  - inbound SSH (22/tcp) from ubongo's address only (ssh_admin_cidr var);
  - everything else default-deny.
- Grows with services: NetBird's UDP 3478 (Coturn) + TCP 80/443 (management/dashboard) are added in M4 when the coordinator deploys, not opened to a non-existent listener now.
- This is the off-site perimeter layer (OPNsense has no presence off-cluster); ADR-020's group_vars catalog remains the single source for the host nftables layer that base renders (M3).
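The default-deny behaviour described above can be modelled in a few lines. This is a toy model for reasoning about the ruleset, not the hcloud_firewall resource itself: the Rule type, the placeholder admin address, and the assumption that the M4 ports are open to any IPv4 source are all illustrative.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    protocol: str     # "tcp" or "udp"
    port: int
    source_cidr: str  # allowed source range


def is_allowed(rules, protocol, port, source_ip):
    """Default-deny: traffic passes only if some rule matches it."""
    return any(
        r.protocol == protocol
        and r.port == port
        and ip_address(source_ip) in ip_network(r.source_cidr)
        for r in rules
    )


# 198.51.100.7/32 is a placeholder (documentation range) for ssh_admin_cidr.
M2_RULES = [Rule("tcp", 22, "198.51.100.7/32")]

# Assumption: in M4 the NetBird ports are opened to any IPv4 source.
M4_RULES = M2_RULES + [
    Rule("udp", 3478, "0.0.0.0/0"),
    Rule("tcp", 80, "0.0.0.0/0"),
    Rule("tcp", 443, "0.0.0.0/0"),
]
```

With M2_RULES, SSH from the admin address passes while SSH from anywhere else, and every other port, is dropped; the NetBird ports only start matching once the M4 rules are appended.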
State + disaster recovery
- The offsite terraform.tfstate lives on ubongo and is added to the ADR-022 backup scope (the control-node TF state backup already flagged in STATUS).
- DR is management-only: askari survives a homelab/ubongo outage by design, so a lost state is recovered by terraform import-ing the still-running server, with no rebuild.
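To make the recovery path concrete, the sketch below assembles one terraform import ADDRESS ID command per resource that would need re-adopting after a lost state. The resource addresses and numeric IDs are hypothetical; the real addresses depend on how the offsite stack and hetzner_vm module name their resources, and the IDs come from the Hetzner console or API.

```python
import shlex


def import_commands(resources):
    """Build one `terraform import ADDRESS ID` command per resource.

    resources maps a Terraform resource address to the provider-side ID.
    """
    return [["terraform", "import", address, str(remote_id)]
            for address, remote_id in resources.items()]


# Hypothetical addresses and IDs, for illustration only.
lost_state = {
    "module.askari.hcloud_server.this": 12345678,
    "hcloud_firewall.askari": 234567,
    "hcloud_ssh_key.control": 345678,
}
commands = import_commands(lost_state)

for cmd in commands:
    print(shlex.join(cmd))  # printed for review, not executed
```

Printing rather than executing keeps the step reviewable, in the same spirit as gating tf-apply behind a shown plan.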
Division of labour & access
| Task | Who | How |
|---|---|---|
| Hetzner token | Done | vault.hetzner.token (verified live, HTTP 200). |
| hetzner_vm module + offsite stack + tf_to_inventory extension + make token-inject | Agent | Committed IaC + a pytest for the handoff. |
| terraform plan (offsite) | Agent | make tf-plan TF_ENV=offsite, output shown. |
| terraform apply (offsite) | Human-gated | Only after the plan is reviewed (CLAUDE.md: never apply without a shown plan). Run on ubongo. |
| Confirm the control SSH key | Human | Which ubongo key Ansible uses to reach hosts (its public key feeds control_ssh_pubkey). |
- Token: TF_VAR_hcloud_token from vault at apply; never written to a .tfvars file.
- SSH: cloud-init injects only the control public key; the private key stays on ubongo. The cloud firewall limits SSH to ubongo's address until the mesh exists.
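A minimal user_data along these lines could be rendered as below. This is a sketch under assumptions: render_user_data is a hypothetical helper, and the real stack may just as well template the cloud-config inside Terraform. It emits only the two things the design calls for, one authorized public key and a Python package for Ansible, and refuses anything that looks like a private key.

```python
import json


def render_user_data(control_ssh_pubkey: str) -> str:
    """Render a minimal #cloud-config: one authorized key plus Python."""
    key = control_ssh_pubkey.strip()
    if not key or "\n" in key:
        raise ValueError("expected exactly one public key line")
    if "PRIVATE KEY" in key:
        raise ValueError("refusing to embed a private key")
    # json.dumps yields a double-quoted string, which is also valid YAML.
    return (
        "#cloud-config\n"
        "ssh_authorized_keys:\n"
        f"  - {json.dumps(key)}\n"
        "packages:\n"
        "  - python3\n"
    )
```

The rendered string would be what feeds the module's user_data input, with control_ssh_pubkey as its only variable part.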
Testing & verification
- terraform fmt + terraform validate + make tf-plan TF_ENV=offsite (plan reviewed before any apply).
- pytest for the tf_to_inventory.py offsite extension (mirrors the existing stdlib-only script tests), asserting an askari entry lands in offsite_hosts.
- Post-apply: SSH reachability from ubongo; cloud-init ran; then bootstrap.yml connectivity. (base/NetBird get their own Molecule/verify in M3/M4.)
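The post-apply SSH reachability check could be as small as the following stdlib-only sketch, run from ubongo. The helper names are hypothetical, and the check only confirms that something speaking SSH answers on the port; it does not authenticate, so it complements rather than replaces the bootstrap.yml run.

```python
import socket


def looks_like_ssh(banner: bytes) -> bool:
    """True if the bytes start with an SSH identification string."""
    return banner.startswith(b"SSH-")


def ssh_reachable(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Open a TCP connection and check for an SSH banner.

    Any connection error or timeout counts as unreachable.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            return looks_like_ssh(sock.recv(256))
    except OSError:
        return False
```

Called as ssh_reachable(ip) with the IPv4 from the Terraform output, a False result from ubongo points at the cloud firewall rule or ssh_admin_cidr before anything Ansible-related.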
Scope boundaries — what M2 is NOT
- Not the base hardening subset (SSH hardening, fail2ban, NetBird agent): M3.
- Not the NetBird coordinator or the cloud-firewall NetBird ports: M4.
- Not mesh enrollment / narrowing SSH to wt0: M5.
- Not the off-site log subset (may need a bigger instance / a volume): later.
ADR work
- ADR-006: generalize "Terraform owns VM existence" to Proxmox + Hetzner; add the hetznercloud/hcloud provider (no longer "the only provider is bpg/proxmox"); add the offsite environment + hetzner_vm module to Structure; note the TF-managed Hetzner Cloud Firewall.
- ADR-009: the offsite handoff (tf_to_inventory.py emits askari → offsite_hosts).
- ADR-020: the Hetzner Cloud Firewall is askari's perimeter (OPNsense-analog); catalog still authoritative for host nftables.
- ADR-007 / ADR-016: askari is Terraform-provisioned (hcloud), superseding "added manually."
Open items (resolve during the plan / implementation)
- Pin the hetznercloud/hcloud provider version; confirm the debian-13 image slug and CAX11/hel1 availability (ADR-014).
- The make tf token-inject mechanism for offsite (read vault.hetzner.token → export TF_VAR_hcloud_token): shape it in the plan (rbw/ansible-vault one-liner vs a wrapper).
- Whether the inventory references askari by IPv4 (from TF output) or by askari.wingu.me once the DNS record exists: decide in the plan.