ADR-009 — Terraform ↔ Ansible provisioning handoff
Status
Accepted (2026-05-30)
Context
Two tools touch every managed host. Terraform owns what exists — VMs on Proxmox. Ansible owns what is configured inside — users, packages, firewall, Docker services, and all internal DNS. This ADR is the single source of truth for the seam between them: the exact handoff, the data contract, and the one documented exception. The two tools must never overlap; this document defines the line they meet at.
ADR-006 covers Terraform's internals (providers, state, structure). ADR-005 covers the cloud-init template that VMs are cloned from. This ADR covers how they connect.
Decision
The boundary
| Layer | Tool | Notes |
|---|---|---|
| VM existence | Terraform | Create/destroy Proxmox VMs, assign static IPs |
| VM resolver (cloud-init) | Terraform | Sets which DNS servers a VM queries — not a zone record |
| OS configuration | Ansible | Users, SSH, firewall, packages |
| Service deployment | Ansible | Docker, Compose files, secrets |
| OPNsense (all) | Ansible | Firewall rules, DHCP, interfaces, VLANs |
| Internal DNS (all records) | Ansible (dns role) | Internal zone rendered from inventory + group_vars; see ADR-007 |
This table is canonical here. ADR-006 links to it rather than restating it. Terraform owns VM existence only — it writes no DNS records (see "Internal DNS" below).
The handoff pipeline
There is one path by which a managed host comes into existence and reaches its configured state:
make tf-plan TF_ENV=production # review infrastructure changes
make tf-apply TF_ENV=production # clone template → VM (no DNS records written)
make tf-inventory TF_ENV=production # regenerate Ansible inventory from outputs
make check PLAYBOOK=site # dry-run Ansible against the new host(s)
make deploy PLAYBOOK=bootstrap # first-run specifics (see ADR-005)
make deploy PLAYBOOK=site # full standard state — `dns` role writes the zone
tf-apply creates the VM by cloning the Debian 13 cloud-init template (ADR-005).
tf-inventory regenerates the Ansible inventory from Terraform outputs. From
make check onward the host is Ansible's — including its DNS record, which the
dns role writes into the internal zone during make deploy.
To add a host, edit local.vms in the environment's main.tf and run this
pipeline. Never add one by hand-editing the inventory.
The data contract
The seam's interface is a single Terraform output consumed by a single script.
Producer — terraform/environments/<env>/outputs.tf emits a vms map:
{
  "vms": {
    "value": {
      "host-a": { "ip": "192.168.1.10", "group": "docker_hosts" }
    }
  }
}
Consumer — scripts/tf_to_inventory.py (Python standard library only) reads
terraform output -json and writes inventories/<env>/hosts.yml. It validates the
group against the allowed set and fails loudly on an unknown group.
Valid groups: control, docker_hosts, proxmox_hosts, offsite_hosts.
control holds ubongo, a physical machine not managed by Terraform (see the
control-node exception below and ADR-015). offsite_hosts holds askari, which is
Terraform-managed via the hetznercloud/hcloud provider in the offsite environment
(see the off-site handoff note below and ADR-016).
The generated hosts.yml carries a "do not edit manually" header and is owned by
the generator. Treat it as a build artifact: the source of truth is local.vms in
Terraform, and the inventory is regenerated, never edited.
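The contract above can be sketched in a few lines of standard-library Python. This is an illustration only, not the actual `scripts/tf_to_inventory.py`: the function names, the header wording, and the `ansible_host` layout of the rendered file are assumptions made for the example. Only the `vms` shape and the list of valid groups come from this ADR.

```python
import json

# Allowed groups, as listed in this ADR.
VALID_GROUPS = ("control", "docker_hosts", "proxmox_hosts", "offsite_hosts")

# Hypothetical header wording; the real generator's text may differ.
HEADER = "# GENERATED FILE - do not edit manually"


def parse_vms(terraform_output_json):
    """Return {group: {host: ip}} from `terraform output -json` text."""
    vms = json.loads(terraform_output_json)["vms"]["value"]
    groups = {}
    for host, attrs in sorted(vms.items()):
        group = attrs["group"]
        if group not in VALID_GROUPS:
            # Fail loudly instead of writing an inventory with a bad group.
            raise ValueError(f"{host}: unknown group {group!r}")
        groups.setdefault(group, {})[host] = attrs["ip"]
    return groups


def render_inventory(groups):
    """Render a minimal YAML inventory by hand (no third-party YAML library)."""
    lines = [HEADER, "all:", "  children:"]
    for group in sorted(groups):
        lines.append(f"    {group}:")
        lines.append("      hosts:")
        for host, ip in groups[group].items():
            lines.append(f"        {host}:")
            lines.append(f"          ansible_host: {ip}")
    return "\n".join(lines) + "\n"


# The sample payload from the producer section above.
sample = (
    '{"vms": {"value": {"host-a": '
    '{"ip": "192.168.1.10", "group": "docker_hosts"}}}}'
)
print(render_inventory(parse_vms(sample)))
```

Validating the group before rendering means a typo in `local.vms` stops the pipeline at `make tf-inventory` and never reaches Ansible.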
Cloud-init's role
Cloud-init is the thin first-boot layer between Terraform and Ansible:
- Terraform clones the cloud-init template (ADR-005) and sets cloud-init values (hostname, SSH public key, IP/gateway).
- Cloud-init does just enough at first boot to make the VM reachable over SSH with the ansible user's key — nothing more.
- Ansible takes over from a reachable host: the `bootstrap` playbook handles first-run specifics, then `site` applies the full standard state.
The line is sharp: cloud-init buys reachability, Ansible owns configuration.
Internal DNS — owned by Ansible, no chicken-and-egg
Terraform writes no DNS records. The internal zone (boma.baobab.band) is
rendered entirely by the Ansible dns role:
- Host A records derive from the inventory: the same `hostname → ip` data that originated in `local.vms` and reached Ansible via `make tf-inventory`. So Terraform remains the ultimate source of truth for which hosts exist; the data simply flows through the inventory instead of through a direct Terraform→DNS write.
- Service, alias (CNAME), split-horizon, and non-VM records (e.g. the OPNsense gateway, `vaultwarden.wingu.me` → proxy split-horizon) are explicit zone data in `group_vars`.
This dissolves the bootstrap cycle that a Terraform-managed zone would create. If
Terraform wrote records via RFC 2136, provisioning the first DNS server would
require a DNS server that does not yet exist — dns1 cannot register its own A
record before it is running and configured. Because Ansible renders the zone from
inventory (using IP addresses, never name resolution, to connect), dns1/dns2
are ordinary Terraform-created VMs whose records are written by the same role that
configures the DNS service. There is no special case and no ordering trap.
ADR-007 holds the zone structure, split-horizon, and addressing conventions. The
IP-range split there (.10–.19 core infra vs .50–.249 fleet) is now an addressing
convention only — it no longer implies any difference in how records are written.
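As a rough sketch of why no ordering trap exists, the zone can be built as a pure function of inventory data plus explicit records, with no name resolution anywhere. The function name, the record dictionaries, and the documentation-range addresses below are hypothetical; the real `dns` role is an Ansible role and its templates are not shown in this ADR.

```python
ZONE = "boma.baobab.band"  # internal zone named in this ADR


def render_zone_records(inventory_hosts, explicit_records, zone=ZONE):
    """Return zone lines: inventory-derived A records, then explicit data."""
    lines = [
        f"{host}.{zone}. IN A {ip}"
        for host, ip in sorted(inventory_hosts.items())
    ]
    for record in explicit_records:
        # Explicit records carry their own fully qualified name.
        lines.append(f"{record['name']}. IN {record['type']} {record['value']}")
    return lines


# Hypothetical data: documentation-range addresses, not the real fleet.
inventory_hosts = {"dns2": "192.0.2.11", "dns1": "192.0.2.10"}
explicit_records = [
    {"name": "vaultwarden.wingu.me", "type": "A", "value": "192.0.2.50"},
]

for line in render_zone_records(inventory_hosts, explicit_records):
    print(line)
```

The records for `dns1` and `dns2` come out of the same data as every other host's, so the DNS servers need no special handling.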
The control-node exception
The control node — the host that runs Terraform and Ansible — is ubongo, a
dedicated physical machine outside the cluster. It is not a VM at all, so
Terraform genuinely never touches it: it cannot provision the infrastructure that
would provision itself (chicken-and-egg). It is therefore the single documented
exception to "Terraform owns VM existence":
- Provisioned and bootstrapped manually on bare metal, per the control-node section of ADR-005; rationale, hardware, and recovery model in ADR-015.
- Listed in `inventories/<env>/hosts.yml` under the `control` group, and managed by Ansible for baseline config only (no `docker_host` role).
Every other host is Terraform-managed.
The off-site handoff (offsite environment → offsite_hosts)
askari (Hetzner VPS, ADR-016) follows the same handoff pipeline as Proxmox hosts but
with its own provider and environment:
- Producer: `terraform/environments/offsite/outputs.tf` emits a `vms` map in the same `{ host: { ip, group } }` shape as Proxmox environments; `askari`'s group is `offsite_hosts`.
- Consumer: `scripts/tf_to_inventory.py` reads `terraform output -json` from the `offsite` environment and writes `inventories/production/offsite.yml`.
- Makefile target: `make tf-inventory-offsite` runs the generator for the offsite environment.
The production inventory is a directory (inventories/production/) that Ansible
merges at runtime: hosts.yml (Proxmox-generated) and offsite.yml
(offsite-generated) together form the full production host list. Each file is a build
artifact — never hand-edited; their source of truth is local.vms in the respective
environment's main.tf.
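The effect of the directory merge can be approximated as a union of the group maps from each file. The snippet below is a simplified model for illustration, not Ansible's own inventory loader; the function name and the off-site address are hypothetical.

```python
def merge_inventory_sources(*sources):
    """Union the group -> {host: ip} maps from several inventory files."""
    merged = {}
    for source in sources:
        for group, hosts in source.items():
            # Later sources add to a group; they do not replace it.
            merged.setdefault(group, {}).update(hosts)
    return merged


# Hypothetical parsed contents of the two generated files.
hosts_yml = {"docker_hosts": {"host-a": "192.168.1.10"}}
offsite_yml = {"offsite_hosts": {"askari": "203.0.113.7"}}

production = merge_inventory_sources(hosts_yml, offsite_yml)
print(sorted(production))
```

Because each generated file contributes its own groups, regenerating `offsite.yml` leaves the Proxmox hosts in `hosts.yml` untouched, and the reverse holds too.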
What was ruled out
| Option | Reason |
|---|---|
| Manual qm clone as a general provisioning path | Terraform is the single way VMs come into existence; a parallel manual path would let the inventory and real infrastructure drift. The sole exception is the control node. |
| Hand-editing the generated inventory | hosts.yml is a build artifact of tf_to_inventory.py; edits are overwritten on the next make tf-inventory. Edit local.vms instead. |
| Documenting the seam in both ADR-005 and ADR-006 | The boundary belongs in exactly one place. Those ADRs link here. |
| Terraform-managed DNS records (hashicorp/dns + RFC 2136) | Created a bootstrap cycle (the first DNS server can't register itself) and split DNS ownership across two tools. Ansible owns the whole internal zone instead: one owner, no cycle. |
Consequences
Drawn from the boundary, the data contract, and the "What was ruled out" section above:
- Adding a host means editing `local.vms` and running the handoff pipeline; the generated `hosts.yml` is a build artifact and must never be hand-edited, because manual edits are overwritten on the next `make tf-inventory` (The handoff pipeline; The data contract; What was ruled out).
- Manual `qm clone` is rejected as a general provisioning path so the inventory and real infrastructure cannot drift; Terraform is the single way VMs come into existence (What was ruled out).
- Terraform writes no DNS records: the Ansible `dns` role renders the whole internal zone from inventory plus `group_vars`, dissolving the bootstrap cycle a Terraform-managed zone (`hashicorp/dns` + RFC 2136) would create (Internal DNS; What was ruled out).
- The control node (`ubongo`) is the single documented exception to "Terraform owns VM existence": a physical machine provisioned manually and managed by Ansible for baseline config only (The control-node exception).
- The `offsite` TF environment's `vms` output feeds the `offsite_hosts` group via `tf_to_inventory.py` (`make tf-inventory-offsite` → `inventories/production/offsite.yml`); the production inventory is a directory that merges `hosts.yml` (Proxmox) and `offsite.yml` (offsite) (The off-site handoff).
- The seam is documented in exactly one place (this ADR); ADR-005 and ADR-006 link here rather than restating it (What was ruled out).