boma/docs/decisions/009-provisioning-handoff.md
sjat 810e6d557b Correct Forgejo host to forgejo.nyumbani.baobab.band
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 18:16:38 +02:00


ADR-009 — Terraform ↔ Ansible provisioning handoff

Context

Two tools touch every managed host. Terraform owns what exists — VMs on Proxmox. Ansible owns what is configured inside — users, packages, firewall, Docker services, and all internal DNS. This ADR is the single source of truth for the seam between them: the exact handoff, the data contract, and the one documented exception. The two tools must never overlap; this document defines the line they meet at.

ADR-006 covers Terraform's internals (providers, state, structure). ADR-005 covers the cloud-init template that VMs are cloned from. This ADR covers how they connect.


The boundary

| Layer | Tool | Notes |
| --- | --- | --- |
| VM existence | Terraform | Create/destroy Proxmox VMs, assign static IPs |
| VM resolver (cloud-init) | Terraform | Sets which DNS servers a VM queries, not a zone record |
| OS configuration | Ansible | Users, SSH, firewall, packages |
| Service deployment | Ansible | Docker, Compose files, secrets |
| OPNsense (all) | Ansible | Firewall rules, DHCP, interfaces, VLANs |
| Internal DNS (all records) | Ansible (dns role) | Internal zone rendered from inventory + group_vars; see ADR-007 |

This table is canonical here. ADR-006 links to it rather than restating it. Terraform owns VM existence only — it writes no DNS records (see "Internal DNS" below).


The handoff pipeline

There is one path by which a managed host comes into existence and reaches its configured state:

make tf-plan TF_ENV=production       # review infrastructure changes
make tf-apply TF_ENV=production      # clone template → VM (no DNS records written)
make tf-inventory TF_ENV=production  # regenerate Ansible inventory from outputs
make check PLAYBOOK=site             # dry-run Ansible against the new host(s)
make deploy PLAYBOOK=bootstrap       # first-run specifics (see ADR-005)
make deploy PLAYBOOK=site            # full standard state — `dns` role writes the zone

tf-apply creates the VM by cloning the Debian 13 cloud-init template (ADR-005). tf-inventory regenerates the Ansible inventory from Terraform outputs. From make check onward the host is Ansible's — including its DNS record, which the dns role writes into the internal zone during make deploy.

To add a host, edit local.vms in the environment's main.tf and run this pipeline. Never hand-edit the inventory.
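The ordering above can be sketched as a small wrapper that runs the make targets in sequence and stops at the first failure. This is an illustration only: `handoff_commands` and `run_handoff` are hypothetical names, not part of the repository, and the sketch assumes each make target exits non-zero on failure.

```python
import subprocess


def handoff_commands(tf_env: str = "production") -> list[list[str]]:
    """Return the make invocations of the handoff pipeline, in order."""
    return [
        ["make", "tf-plan", f"TF_ENV={tf_env}"],
        ["make", "tf-apply", f"TF_ENV={tf_env}"],
        ["make", "tf-inventory", f"TF_ENV={tf_env}"],
        ["make", "check", "PLAYBOOK=site"],
        ["make", "deploy", "PLAYBOOK=bootstrap"],
        ["make", "deploy", "PLAYBOOK=site"],
    ]


def run_handoff(tf_env: str = "production", runner=subprocess.run) -> None:
    """Run each step in order; check=True makes a failing step raise and abort."""
    for cmd in handoff_commands(tf_env):
        runner(cmd, check=True)
```

Running the steps unattended skips the human review that tf-plan and make check exist for, so a wrapper like this is better suited to showing the order than to replacing the manual pipeline.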


The data contract

The seam's interface is a single Terraform output consumed by a single script.

Producer: terraform/environments/<env>/outputs.tf emits a vms map:

{
  "vms": {
    "value": {
      "host-a": { "ip": "192.168.1.10", "group": "docker_hosts" }
    }
  }
}

Consumer: scripts/tf_to_inventory.py (Python standard library only) reads terraform output -json and writes inventories/<env>/hosts.yml. It validates the group against the allowed set and fails loudly on an unknown group.

Valid groups: control, docker_hosts, proxmox_hosts.

The generated hosts.yml carries a "do not edit manually" header and is owned by the generator. Treat it as a build artifact: the source of truth is local.vms in Terraform, and the inventory is regenerated, never edited.
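A minimal sketch of the consumer side of this contract, using only the standard library. It is not the actual tf_to_inventory.py: the function names, the header wording, and the hosts.yml layout (Ansible's usual `all` / `children` / `hosts` nesting with `ansible_host`) are assumptions made for illustration.

```python
import json

# Allowed groups as listed in this ADR.
ALLOWED_GROUPS = ("control", "docker_hosts", "proxmox_hosts")


def build_inventory(tf_output: dict) -> dict:
    """Group hosts from the `vms` output; raise on an unknown group."""
    inventory = {group: {} for group in ALLOWED_GROUPS}
    for name, vm in tf_output["vms"]["value"].items():
        group = vm["group"]
        if group not in ALLOWED_GROUPS:
            raise ValueError(f"{name}: unknown group {group!r}")
        inventory[group][name] = {"ansible_host": vm["ip"]}
    return inventory


def render_hosts_yml(inventory: dict) -> str:
    """Render a YAML inventory by hand (assumed layout, no YAML library)."""
    lines = ["# GENERATED FILE - do not edit manually", "all:", "  children:"]
    for group, hosts in inventory.items():
        if not hosts:
            continue  # skip groups with no hosts
        lines.append(f"    {group}:")
        lines.append("      hosts:")
        for name in sorted(hosts):
            lines.append(f"        {name}:")
            lines.append(f"          ansible_host: {hosts[name]['ansible_host']}")
    return "\n".join(lines) + "\n"
```

Fed the sample output above through `json.loads`, `build_inventory` places host-a under docker_hosts, while a group outside the allowed set raises `ValueError` instead of producing a partial inventory.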


Cloud-init's role

Cloud-init is the thin first-boot layer between Terraform and Ansible:

  • Terraform clones the cloud-init template (ADR-005) and sets cloud-init values (hostname, SSH public key, IP/gateway).
  • Cloud-init does just enough at first boot to make the VM reachable over SSH with the ansible user's key — nothing more.
  • Ansible takes over from a reachable host: the bootstrap playbook handles first-run specifics, then site applies the full standard state.

The line is sharp: cloud-init buys reachability, Ansible owns configuration.
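"Reachable" can be approximated by polling the SSH port until it accepts a TCP connection. The helper below is a hypothetical sketch, not something the repository is known to contain, and it only checks that the port is open; it does not verify that the ansible user's key is accepted.

```python
import socket
import time


def wait_for_ssh(ip: str, port: int = 22, timeout: float = 120.0,
                 interval: float = 2.0) -> bool:
    """Return True once a TCP connection to ip:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((ip, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A check like this would sit between make tf-apply and the first Ansible run, connecting by IP address rather than by name.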


Internal DNS — owned by Ansible, no chicken-and-egg

Terraform writes no DNS records. The internal zone (boma.baobab.band) is rendered entirely by the Ansible dns role:

  • Host A records derive from the inventory — the same hostname → ip data that originated in local.vms and reached Ansible via make tf-inventory. So Terraform remains the ultimate source of truth for which hosts exist; the data simply flows through the inventory instead of through a direct Terraform→DNS write.
  • Service, alias (CNAME), split-horizon, and non-VM records (e.g. the OPNsense gateway, forgejo.nyumbani.baobab.band → proxy) are explicit zone data in group_vars.

This dissolves the bootstrap cycle that a Terraform-managed zone would create. If Terraform wrote records via RFC 2136, provisioning the first DNS server would require a DNS server that does not yet exist — dns1 cannot register its own A record before it is running and configured. Because Ansible renders the zone from inventory (using IP addresses, never name resolution, to connect), dns1/dns2 are ordinary Terraform-created VMs whose records are written by the same role that configures the DNS service. There is no special case and no ordering trap.
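The rendering step can be sketched as a pure function from inventory data to record lines. Everything here is illustrative: `render_zone` is a hypothetical name, the BIND-style line format is an assumption about what the dns role emits, and the dns1 address and the CNAME target in the usage note are invented.

```python
def render_zone(hosts: dict[str, str], extra_records: list[dict],
                zone: str = "boma.baobab.band") -> list[str]:
    """Render A records from inventory hosts, then explicit group_vars records."""
    # One A record per inventory host, sorted for stable output.
    lines = [f"{name}.{zone}. IN A {ip}" for name, ip in sorted(hosts.items())]
    # Explicit records (aliases, non-VM hosts) carry fully qualified names.
    for rec in extra_records:
        lines.append(f"{rec['name']}. IN {rec['type']} {rec['value']}")
    return lines
```

Called with `{"dns1": "192.168.1.11", "host-a": "192.168.1.10"}` and one explicit CNAME entry, the function emits dns1's A record like any other host's. No name lookup happens anywhere in the process, which is why the DNS servers need no special case.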

ADR-007 holds the zone structure, split-horizon, and addressing conventions. The IP-range split there (.10–.19 core infra vs .50–.249 fleet) is now an addressing convention only; it no longer implies any difference in how records are written.


The control-node exception

The control node — the host that runs Terraform and Ansible — is the one VM Terraform does not create. It cannot provision the infrastructure that would provision itself (chicken-and-egg). It is therefore the single documented exception to "Terraform owns VM existence":

  • Provisioned and bootstrapped manually, per the control-node section of ADR-005.
  • Listed in inventories/<env>/hosts.yml under the control group, and managed by Ansible for baseline config only (no docker_host role).

Every other host is Terraform-managed.


What was ruled out

| Option | Reason |
| --- | --- |
| Manual qm clone as a general provisioning path | Terraform is the single way VMs come into existence; a parallel manual path would let the inventory and real infrastructure drift. The sole exception is the control node. |
| Hand-editing the generated inventory | hosts.yml is a build artifact of tf_to_inventory.py; edits are overwritten on the next make tf-inventory. Edit local.vms instead. |
| Documenting the seam in both ADR-005 and ADR-006 | The boundary belongs in exactly one place. Those ADRs link here. |
| Terraform-managed DNS records (hashicorp/dns + RFC 2136) | Created a bootstrap cycle (the first DNS server can't register itself) and split DNS ownership across two tools. Ansible owns the whole internal zone instead: one owner, no cycle. |