# ADR-009 — Terraform ↔ Ansible provisioning handoff

## Status

Accepted (2026-05-30)

## Context

Two tools touch every managed host. Terraform owns **what exists** — VMs on Proxmox. Ansible owns **what is configured inside** — users, packages, firewall, Docker services, and all internal DNS.

This ADR is the single source of truth for the seam between them: the exact handoff, the data contract, and the one documented exception. The two tools must never overlap; this document defines the line they meet at.

ADR-006 covers Terraform's internals (providers, state, structure). ADR-005 covers the cloud-init template that VMs are cloned from. This ADR covers how they connect.

---

## Decision

### The boundary

| Layer | Tool | Notes |
|---|---|---|
| VM existence | Terraform | Create/destroy Proxmox VMs, assign static IPs |
| VM resolver (cloud-init) | Terraform | Sets *which* DNS servers a VM queries — not a zone record |
| OS configuration | Ansible | Users, SSH, firewall, packages |
| Service deployment | Ansible | Docker, Compose files, secrets |
| OPNsense (all) | Ansible | Firewall rules, DHCP, interfaces, VLANs |
| Internal DNS (all records) | Ansible (`dns` role) | Internal zone rendered from inventory + `group_vars`; see ADR-007 |

This table is canonical here. ADR-006 links to it rather than restating it. Terraform owns VM **existence** only — it writes no DNS records (see "Internal DNS" below).


---

### The handoff pipeline

There is one path by which a managed host comes into existence and reaches its configured state:

```
make tf-plan      TF_ENV=production   # review infrastructure changes
make tf-apply     TF_ENV=production   # clone template → VM (no DNS records written)
make tf-inventory TF_ENV=production   # regenerate Ansible inventory from outputs
make check  PLAYBOOK=site             # dry-run Ansible against the new host(s)
make deploy PLAYBOOK=bootstrap        # first-run specifics (see ADR-005)
make deploy PLAYBOOK=site             # full standard state — `dns` role writes the zone
```

`tf-apply` creates the VM by cloning the Debian 13 cloud-init template (ADR-005). `tf-inventory` regenerates the Ansible inventory from Terraform outputs. From `make check` onward the host is Ansible's — including its DNS record, which the `dns` role writes into the internal zone during `make deploy`.

Adding a host means editing `local.vms` in the environment's `main.tf` and running this pipeline. It **never** means hand-editing the inventory.

---

### The data contract

The seam's interface is a single Terraform output consumed by a single script.

**Producer** — `terraform/environments//outputs.tf` emits a `vms` map:

```json
{
  "vms": {
    "value": {
      "host-a": { "ip": "192.168.1.10", "group": "docker_hosts" }
    }
  }
}
```

**Consumer** — `scripts/tf_to_inventory.py` (Python standard library only) reads `terraform output -json` and writes `inventories//hosts.yml`. It validates the group against the allowed set and fails loudly on an unknown group.

**Valid groups**: `control`, `docker_hosts`, `proxmox_hosts`, `offsite_hosts`. `control` holds `ubongo`, a physical machine not managed by Terraform (see the control-node exception below and ADR-015). `offsite_hosts` holds `askari`, which is Terraform-managed via the `hetznercloud/hcloud` provider in the `offsite` environment (see the off-site handoff note below and ADR-016).

The generated `hosts.yml` carries a "do not edit manually" header and is owned by the generator.
Treat it as a build artifact: the source of truth is `local.vms` in Terraform, and the inventory is regenerated, never edited.

---

### Cloud-init's role

Cloud-init is the thin first-boot layer between Terraform and Ansible:

- **Terraform** clones the cloud-init template (ADR-005) and sets cloud-init values (hostname, SSH public key, IP/gateway).
- **Cloud-init** does just enough at first boot to make the VM reachable over SSH with the ansible user's key — nothing more.
- **Ansible** takes over from a reachable host: the `bootstrap` playbook handles first-run specifics, then `site` applies the full standard state.

The line is sharp: cloud-init buys *reachability*, Ansible owns *configuration*.

---

### Internal DNS — owned by Ansible, no chicken-and-egg

Terraform writes **no** DNS records. The internal zone (`boma.baobab.band`) is rendered entirely by the Ansible `dns` role:

- **Host A records** derive from the inventory — the same `hostname → ip` data that originated in `local.vms` and reached Ansible via `make tf-inventory`. So Terraform remains the ultimate source of truth for which hosts exist; the data simply flows through the inventory instead of through a direct Terraform→DNS write.
- **Service, alias (CNAME), split-horizon, and non-VM records** (e.g. the OPNsense gateway, `forgejo.nyumbani.baobab.band` → proxy) are explicit zone data in `group_vars`.

This dissolves the bootstrap cycle that a Terraform-managed zone would create. If Terraform wrote records via RFC 2136, provisioning the **first** DNS server would require a DNS server that does not yet exist — `dns1` cannot register its own A record before it is running and configured. Because Ansible renders the zone from inventory (using IP addresses, never name resolution, to connect), `dns1`/`dns2` are ordinary Terraform-created VMs whose records are written by the same role that configures the DNS service. There is no special case and no ordering trap.
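
The derivation of host A records from inventory data can be illustrated with a minimal Python sketch. This is not the `dns` role itself, which is Ansible and whose templates are not shown in this ADR. The function name `render_a_records`, the record line format, and the sample hosts are assumptions made for illustration only.

```python
import ipaddress

ZONE = "boma.baobab.band"  # internal zone named in this ADR


def render_a_records(hosts: dict[str, str], zone: str = ZONE) -> list[str]:
    """Return one zone-file style A record line per inventory host.

    `hosts` maps hostname -> IPv4 address, i.e. the same data that
    originates in `local.vms` and reaches Ansible via the inventory.
    """
    lines = []
    for name in sorted(hosts):
        # Validate the address; a malformed value raises ValueError.
        ip = ipaddress.IPv4Address(hosts[name])
        lines.append(f"{name}.{zone}. IN A {ip}")
    return lines


if __name__ == "__main__":
    # Hypothetical inventory data; names and addresses are illustrative.
    inventory = {"dns1": "192.168.1.11", "host-a": "192.168.1.10"}
    for line in render_a_records(inventory):
        print(line)
```

The sketch never resolves a name: it only reads addresses already present in the inventory, which is why the DNS servers themselves need no special handling.
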
ADR-007 holds the zone structure, split-horizon, and addressing conventions. The IP-range split there (`.10–.19` core infra vs `.50–.249` fleet) is now an addressing convention only — it no longer implies any difference in how records are written.

---

### The control-node exception

The control node — the host that runs Terraform and Ansible — is `ubongo`, a dedicated **physical** machine outside the cluster. It is not a VM at all, so Terraform genuinely never touches it: it cannot provision the infrastructure that would provision itself (chicken-and-egg). It is therefore the single documented exception to "Terraform owns VM existence":

- Provisioned and bootstrapped manually on bare metal, per the control-node section of ADR-005; rationale, hardware, and recovery model in ADR-015.
- Listed in `inventories//hosts.yml` under the `control` group, and managed by Ansible for baseline config only (no `docker_host` role).

Every other host is Terraform-managed.

---

### The off-site handoff (`offsite` environment → `offsite_hosts`)

`askari` (Hetzner VPS, ADR-016) follows the same handoff pipeline as Proxmox hosts but with its own provider and environment:

- **Producer** — `terraform/environments/offsite/outputs.tf` emits a `vms` map in the same `{ host: { ip, group } }` shape as Proxmox environments; `askari`'s group is `offsite_hosts`.
- **Consumer** — `scripts/tf_to_inventory.py` reads `terraform output -json` from the `offsite` environment and writes `inventories/production/offsite.yml`.
- **Makefile target** — `make tf-inventory-offsite` runs the generator for the offsite environment.

The production inventory is a **directory** (`inventories/production/`) that Ansible merges at runtime: `hosts.yml` (Proxmox-generated) and `offsite.yml` (offsite-generated) together form the full production host list. Each file is a build artifact — never hand-edited; their source of truth is `local.vms` in the respective environment's `main.tf`.

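
The producer/consumer contract shared by the Proxmox and offsite environments can be sketched in Python as follows. This is not the actual `scripts/tf_to_inventory.py`, whose source is not reproduced in this ADR. The function name `build_inventory`, the `group → hosts → ansible_host` layout, and the JSON rendering (the real script writes YAML) are assumptions made for illustration.

```python
import json

# The allowed set listed under "Valid groups" in this ADR.
ALLOWED_GROUPS = {"control", "docker_hosts", "proxmox_hosts", "offsite_hosts"}


def build_inventory(tf_output: dict) -> dict:
    """Turn parsed `terraform output -json` data into an inventory mapping.

    Expects the `vms` map shape from the data contract:
    {"vms": {"value": {host: {"ip": ..., "group": ...}}}}
    """
    vms = tf_output["vms"]["value"]
    inventory: dict = {}
    for host in sorted(vms):
        group = vms[host]["group"]
        if group not in ALLOWED_GROUPS:
            # Fail loudly rather than emit a host in an unknown group.
            raise ValueError(f"{host}: unknown group {group!r}")
        hosts = inventory.setdefault(group, {"hosts": {}})["hosts"]
        hosts[host] = {"ansible_host": vms[host]["ip"]}
    return inventory


if __name__ == "__main__":
    # Sample taken from the data contract example in this ADR.
    sample = json.loads(
        '{"vms": {"value": {"host-a": '
        '{"ip": "192.168.1.10", "group": "docker_hosts"}}}}'
    )
    print(json.dumps(build_inventory(sample), indent=2))
```

Rejecting unknown groups before anything is written keeps a typo in `local.vms` from silently producing a host that no playbook targets.
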

---

### What was ruled out

| Option | Reason |
|---|---|
| Manual `qm clone` as a general provisioning path | Terraform is the single way VMs come into existence; a parallel manual path would let the inventory and real infrastructure drift. The sole exception is the control node. |
| Hand-editing the generated inventory | `hosts.yml` is a build artifact of `tf_to_inventory.py`; edits are overwritten on the next `make tf-inventory`. Edit `local.vms` instead. |
| Documenting the seam in both ADR-005 and ADR-006 | The boundary belongs in exactly one place. Those ADRs link here. |
| Terraform-managed DNS records (`hashicorp/dns` + RFC 2136) | Created a bootstrap cycle (the first DNS server can't register itself) and split DNS ownership across two tools. Ansible owns the whole internal zone instead — one owner, no cycle. |

## Consequences

Drawn from the boundary, the data contract, and the "What was ruled out" section above:

- Adding a host means editing `local.vms` and running the handoff pipeline; the generated `hosts.yml` is a build artifact and must never be hand-edited — manual edits are overwritten on the next `make tf-inventory` (The handoff pipeline; The data contract; What was ruled out).
- Manual `qm clone` is rejected as a general provisioning path so the inventory and real infrastructure cannot drift; Terraform is the single way VMs come into existence (What was ruled out).
- Terraform writes no DNS records: the Ansible `dns` role renders the whole internal zone from inventory plus `group_vars`, dissolving the bootstrap cycle a Terraform-managed zone (`hashicorp/dns` + RFC 2136) would create (Internal DNS — owned by Ansible, no chicken-and-egg; What was ruled out).
- The control node (`ubongo`) is the single documented exception to "Terraform owns VM existence" — a physical machine provisioned manually and managed by Ansible for baseline config only (The control-node exception).
- The `offsite` TF environment's `vms` output feeds the `offsite_hosts` group via `tf_to_inventory.py` (`make tf-inventory-offsite` → `inventories/production/offsite.yml`); the production inventory is a directory that merges `hosts.yml` (Proxmox) and `offsite.yml` (offsite) (The off-site handoff).
- The seam is documented in exactly one place (this ADR); ADR-005 and ADR-006 link here rather than restating it (What was ruled out).