boma/docs/decisions/006-terraform.md
sjat 905bc92b15 Use local Terraform state; drop unworkable Forgejo HTTP backend (R10b)
Forgejo's /raw/ API is read-only so it cannot serve as a Terraform HTTP state
backend. Switch both envs to local state on the control node (ADR-006); remove
the dead TF_HTTP_* credential hints.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 21:34:05 +02:00

ADR-006 — Terraform for infrastructure provisioning

Context

Ansible manages host configuration well but has no state model for infrastructure existence. Adding Terraform handles the "what exists" layer — creating and destroying VMs on Proxmox — while Ansible continues to own everything that runs inside them, including all internal DNS records.

This complements rather than replaces Ansible. The two tools do not overlap. The exact boundary, handoff pipeline, and data contract between them live in ADR-009 (provisioning handoff) — this ADR covers Terraform's own internals only.


Responsibility split

The canonical responsibility-split table lives in ADR-009. In short: Terraform owns VM existence only; Ansible owns everything inside a VM, including all internal DNS records.

OPNsense is entirely Ansible. The available Terraform providers for OPNsense are community-maintained with real risk of provider rot across OPNsense releases. OPNsense firewall rules also change on a service cadence, not an infrastructure cadence, making them a poor fit for Terraform state.


Providers

bpg/proxmox (~> 0.70): Chosen over telmate/proxmox for active maintenance, full Proxmox 8 API support, and better cloud-init integration. This is the only provider.

Terraform does not manage DNS. An earlier design used hashicorp/dns (RFC 2136) to write A records, but that created a bootstrap cycle — the first DNS server cannot register itself — and split DNS ownership across two tools. Ansible's dns role now owns the entire internal zone, rendered from inventory. See ADR-009.

Terraform manages its own provider dependencies via required_providers and .terraform.lock.hcl (tracked in git once terraform init has been run).
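As a minimal sketch, the pin described above could look like this in providers.tf. The source address and version constraint come from this ADR; the rest is standard Terraform syntax, not a copy of the actual file.

```hcl
# providers.tf (sketch): pins the single provider this ADR allows.
terraform {
  required_providers {
    proxmox = {
      source = "bpg/proxmox"
      # Pessimistic constraint: accepts 0.70 and later 0.x releases, but not 1.0.
      version = "~> 0.70"
    }
  }
}
```

Running terraform init against this block resolves a concrete version and records it, with checksums, in .terraform.lock.hcl.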


State backend

Choice: Local state on the control node.

Forgejo (Gitea-based) has no usable Terraform HTTP state backend — its API /raw/ endpoint is read-only, so state cannot be written there. State therefore lives locally as terraform.tfstate (gitignored) on the control node, which is persistent and backed up with the rest of the node.

At this scale (solo operator, a handful of VMs) local state is sufficient: no concurrent applies, so no remote locking is needed. If a remote backend with locking becomes worthwhile later, add a backend block to backend.tf pointing at a real backend such as MinIO/S3 — Forgejo is not an option. See ADR-010 for the Forgejo integration boundary.
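For illustration, a local-state backend.tf can be as small as the sketch below. The explicit path is an assumption that matches Terraform's default file name; the real file may omit it, or omit the backend block entirely, since local is the default backend.

```hcl
# backend.tf (sketch): state is stored next to the environment's configuration.
terraform {
  backend "local" {
    # Relative to the environment directory; this is also the default value.
    path = "terraform.tfstate"
  }
}
```

Because each environment directory carries its own backend.tf, staging and production each write to their own terraform.tfstate.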


Structure

terraform/
  modules/
    proxmox_vm/          # reusable VM module — Proxmox only, no DNS
  environments/
    staging/             # staging VMs, separate state file
    production/          # production VMs, separate state file

Environments are separate directories (not Terraform workspaces) for the clearest isolation: each directory has its own configuration and state file, so an apply run from one directory cannot touch the other environment's state.

Each environment directory contains:

  • providers.tf — provider version pins and configuration
  • backend.tf — local state backend (each environment keeps its own state file)
  • variables.tf — input declarations
  • terraform.tfvars.example — tracked template; copy to terraform.tfvars for actual values
  • main.tf — local.vms map and module calls (no DNS resources)
  • outputs.tf — VM map consumed by make tf-inventory
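
The layout above could be wired together roughly as follows. This is a sketch only: the VM name, the attribute names (cores, memory_mb, ip), the module's input variables, and the shape of the vms output are all hypothetical. The real data contract is defined in ADR-009.

```hcl
# main.tf (sketch): one map drives every VM in the environment.
locals {
  vms = {
    # Hypothetical entry; keys and attributes are illustrative only.
    "app-01" = { cores = 2, memory_mb = 2048, ip = "10.0.10.11" }
  }
}

# One module instance per map entry. Proxmox resources only, no DNS.
module "vm" {
  source   = "../../modules/proxmox_vm"
  for_each = local.vms

  # Hypothetical module inputs.
  name      = each.key
  cores     = each.value.cores
  memory_mb = each.value.memory_mb
  ip        = each.value.ip
}

# outputs.tf (sketch): the map that make tf-inventory reads.
output "vms" {
  description = "VMs in this environment, keyed by name (hypothetical shape)"
  value       = { for name, vm in local.vms : name => { ip = vm.ip } }
}
```

Driving the module with for_each over a single map means adding or removing a VM is a one-entry change, and the same map feeds the output.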

Secrets handling

The only secret input (the Proxmox API token) is passed via a TF_VAR_* environment variable and declared sensitive = true in variables.tf. It never appears in .tfvars files. Non-secret configuration lives in tracked terraform.tfvars.example; the real terraform.tfvars is gitignored.
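
A sketch of how the token might be declared and consumed. The variable name proxmox_api_token and the endpoint URL are assumptions for illustration. The endpoint and api_token arguments are those of the bpg/proxmox provider.

```hcl
# variables.tf (sketch): the only secret input.
variable "proxmox_api_token" {
  description = "Proxmox API token; supplied via the environment, never via .tfvars"
  type        = string
  sensitive   = true # redacted in plan and apply output
}

# providers.tf (sketch): the provider reads the token from the variable.
provider "proxmox" {
  endpoint  = "https://pve.example.internal:8006/" # hypothetical URL
  api_token = var.proxmox_api_token
}

# Terraform fills the variable from an environment variable of the same name
# with the TF_VAR_ prefix, here TF_VAR_proxmox_api_token, set in the shell
# before running plan or apply.
```

Note that sensitive = true only redacts the value in CLI output. The value can still end up in terraform.tfstate, which is one more reason the state file stays gitignored.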


Ansible integration

After terraform apply, run make tf-inventory TF_ENV=<env> to regenerate inventories/<env>/hosts.yml from the vms output. The full handoff pipeline, the vms output → inventory data contract, and the generator script (scripts/tf_to_inventory.py) are documented in ADR-009 (provisioning handoff).


What was ruled out

| Option | Reason |
| --- | --- |
| telmate/proxmox provider | Less actively maintained; weaker cloud-init and Proxmox 8 support |
| OPNsense Terraform provider | Community-maintained; provider rot risk across OPNsense releases |
| Terraform workspaces | One configuration and backend shared by all environments, with state separated only by the selected workspace; accidental cross-env apply possible |
| Separate Terraform repo | Cross-referencing between infra and config adds friction; monorepo keeps the full picture together |