# ADR-006 — Terraform for infrastructure provisioning

## Status

Accepted (2026-05-30)

## Context

Ansible manages host configuration well but has no state model for infrastructure
existence. Terraform is added to handle the "what exists" layer (creating and
destroying VMs on Proxmox and Hetzner), while Ansible continues to own everything that
runs inside them, including all internal DNS records.

This complements rather than replaces Ansible; the two tools do not overlap. The
exact boundary, handoff pipeline, and data contract between them live in **ADR-009
(provisioning handoff)**; this ADR covers Terraform's own internals only.

---

## Decision

### Responsibility split

The canonical responsibility-split table lives in **ADR-009**. In short: Terraform
owns VM existence only; Ansible owns everything inside a VM, including all internal
DNS records.

**OPNsense is entirely Ansible.** The available Terraform providers for OPNsense
are community-maintained with real risk of provider rot across OPNsense releases.
OPNsense firewall rules also change on a service cadence, not an infrastructure
cadence, making them a poor fit for Terraform state.

---

### Providers

**`bpg/proxmox` (`~> 0.70`)**: Chosen over `telmate/proxmox` for active maintenance,
full Proxmox 8 API support, and better cloud-init integration. This is the provider
for Proxmox VMs.

**`hetznercloud/hcloud` (`~> 1.65`)**: Owns off-site VM existence (`askari`). This ADR's
scope is therefore **Proxmox + Hetzner**: "Terraform owns VM existence" generalizes
across providers. The `offsite` environment and `hetzner_vm` module live alongside the
Proxmox environments and `proxmox_vm` module; each environment has its own local state.

Terraform does **not** manage DNS. An earlier design used `hashicorp/dns` (RFC 2136)
to write A records, but that created a bootstrap cycle (the first DNS server cannot
register itself) and split DNS ownership across two tools. Ansible's `dns` role now
owns the entire internal zone, rendered from inventory. See ADR-009.

Terraform manages its own provider dependencies via `required_providers` and
`.terraform.lock.hcl` (tracked in git once `terraform init` has been run).
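
The provider pins described above could look roughly like the following in each
environment's `providers.tf`. This is a minimal sketch, not the repository's actual
file: the variable names (`proxmox_endpoint`, `proxmox_api_token`, `hcloud_token`) are
assumptions made for illustration. Only the provider sources and version constraints
come from this ADR.

```hcl
# providers.tf in a Proxmox environment (staging or production)
terraform {
  required_providers {
    proxmox = {
      source  = "bpg/proxmox"
      version = "~> 0.70" # allows 0.70 and later 0.x releases, but not 1.0
    }
  }
}

provider "proxmox" {
  endpoint  = var.proxmox_endpoint  # hypothetical variable: the Proxmox API URL
  api_token = var.proxmox_api_token # hypothetical variable: supplied via TF_VAR_*
}

# providers.tf in the offsite environment (a separate directory and state)
terraform {
  required_providers {
    hcloud = {
      source  = "hetznercloud/hcloud"
      version = "~> 1.65"
    }
  }
}

provider "hcloud" {
  token = var.hcloud_token # hypothetical variable
}
```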

---

### State backend

**Choice**: Local state on the control node.

Forgejo (Gitea-based) has no usable Terraform HTTP state backend: its API `/raw/`
endpoint is read-only, so state cannot be written there. State therefore lives
locally as `terraform.tfstate` (gitignored) on the control node, which is persistent
and backed up with the rest of the node.

At this scale (solo operator, a handful of VMs) local state is sufficient: no
concurrent applies, so no remote locking is needed. If a remote backend with locking
becomes worthwhile later, add a `backend` block to `backend.tf` pointing at a real
backend such as MinIO/S3; Forgejo is not an option. See ADR-010 for the Forgejo
integration boundary.
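
As an illustration of the choice above, a `backend.tf` with local state might look
like the sketch below, with a possible later migration shown commented out. The
bucket name, state key, and MinIO endpoint are placeholders invented for this example,
and the S3 arguments assume Terraform 1.6 or newer.

```hcl
# backend.tf (sketch): local state, stated explicitly.
# Omitting the block has the same effect, because local is Terraform's default backend.
terraform {
  backend "local" {
    path = "terraform.tfstate"
  }
}

# Hypothetical later migration to an S3-compatible store such as MinIO.
# A configuration may contain only one backend block, so this would replace the one above.
#
# terraform {
#   backend "s3" {
#     bucket    = "terraform-state"                 # placeholder
#     key       = "production/terraform.tfstate"    # placeholder
#     region    = "us-east-1"                       # placeholder; the backend requires a value
#     endpoints = { s3 = "https://minio.example.internal:9000" } # placeholder URL
#
#     use_path_style              = true # address buckets by path rather than subdomain
#     skip_credentials_validation = true # skip AWS-specific checks that a
#     skip_region_validation      = true # self-hosted store cannot answer
#     skip_requesting_account_id  = true
#     skip_metadata_api_check     = true
#   }
# }
```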

---

### Structure

```
terraform/
  modules/
    proxmox_vm/       # reusable VM module (Proxmox only, no DNS)
    hetzner_vm/       # reusable VM module (Hetzner Cloud, no DNS)
  environments/
    staging/          # staging Proxmox VMs, separate state file
    production/       # production Proxmox VMs, separate state file
    offsite/          # off-site Hetzner VMs (askari), separate state file
```

Separate environment directories (not Terraform workspaces) give the clearest
isolation: there is no risk of accidentally applying the wrong state.

Each environment directory contains:

- `providers.tf`: provider version pins and configuration
- `backend.tf`: backend configuration (local state on the control node; no remote backend, see "State backend" above)
- `variables.tf`: input declarations
- `terraform.tfvars.example`: tracked template; copy to `terraform.tfvars` for actual values
- `main.tf`: `local.vms` map and module calls (no DNS resources)
- `outputs.tf`: VM map consumed by `make tf-inventory`
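
To make the `main.tf` entry in the list above concrete, here is one way a `local.vms`
map could drive the module calls. Treat it as a sketch under stated assumptions: the
VM names, addresses, and the module's input names (`name`, `cores`, `memory_mb`, `ip`)
are hypothetical, since the real module interface is not given in this ADR.

```hcl
# main.tf in a Proxmox environment (sketch; all VM entries are placeholders)
locals {
  vms = {
    "web-01" = { cores = 2, memory_mb = 2048, ip = "10.0.10.11" }
    "db-01"  = { cores = 4, memory_mb = 8192, ip = "10.0.10.12" }
  }
}

# One module instance per map entry; removing an entry destroys that VM
# on the next apply, which is what "Terraform owns VM existence" means here.
module "vm" {
  source   = "../../modules/proxmox_vm"
  for_each = local.vms

  name      = each.key              # hypothetical module input
  cores     = each.value.cores      # hypothetical module input
  memory_mb = each.value.memory_mb  # hypothetical module input
  ip        = each.value.ip         # hypothetical module input
}
```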

---

### Secrets handling

The only secret input (the Proxmox API token) is passed via a `TF_VAR_*`
environment variable and declared `sensitive = true` in `variables.tf`. It never
appears in `.tfvars` files. Non-secret configuration lives in tracked
`terraform.tfvars.example`; the real `terraform.tfvars` is gitignored.
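
A minimal sketch of the declaration described above, assuming the variable is named
`proxmox_api_token` (the actual name is not stated in this ADR). Note that
`sensitive = true` only redacts the value in CLI output; Terraform still writes it to
the state file, which is one more reason the local `terraform.tfstate` stays
gitignored.

```hcl
# variables.tf (sketch; the variable name is hypothetical)
variable "proxmox_api_token" {
  description = "Proxmox API token, supplied via the environment and never via .tfvars"
  type        = string
  sensitive   = true # redacts the value in plan and apply output
}

# Terraform reads TF_VAR_<name> from the environment, so the token would be
# exported in the shell before running plan or apply, for example:
#   export TF_VAR_proxmox_api_token='<token>'
```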

---

### Ansible integration

After `terraform apply`, run `make tf-inventory TF_ENV=<env>` to regenerate
`inventories/<env>/hosts.yml` from the `vms` output. The full handoff pipeline,
the `vms` output → inventory data contract, and the generator script
(`scripts/tf_to_inventory.py`) are documented in **ADR-009 (provisioning
handoff)**.
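
For orientation only, the `vms` output mentioned above might be declared along these
lines. The real field set is defined by the data contract in ADR-009; the module
label `vm` and the exported `ip` attribute here are assumptions made for the sketch.

```hcl
# outputs.tf (sketch; per-VM fields are hypothetical, ADR-009 defines the real contract)
output "vms" {
  description = "VM map consumed by make tf-inventory"

  # Assumes a module "vm" instantiated with for_each, so module.vm is a map
  # keyed by VM name, and assumes each instance exports an `ip` output.
  value = {
    for name, vm in module.vm : name => {
      ip = vm.ip
    }
  }
}

# A generator script can read this as JSON with:
#   terraform output -json vms
```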

---

### What was ruled out

| Option | Reason |
|---|---|
| `telmate/proxmox` provider | Less actively maintained; weaker cloud-init and Proxmox 8 support |
| OPNsense Terraform provider | Community-maintained; provider rot risk across OPNsense releases |
| Terraform workspaces | All environments share one configuration directory and backend, separated only by the selected workspace; accidental cross-env apply possible |
| Separate Terraform repo | Cross-referencing between infra and config adds friction; monorepo keeps the full picture together |

## Consequences

Drawn from the "What was ruled out" section and the decisions stated above:

- `bpg/proxmox` is the provider for Proxmox VMs; `telmate/proxmox` was ruled out for weaker
  maintenance and Proxmox 8 / cloud-init support (Providers; What was ruled out).
- `hetznercloud/hcloud` is the provider for off-site VM existence (`askari`); this ADR's
  scope now covers Proxmox + Hetzner (Providers).
- OPNsense stays entirely in Ansible, with no Terraform OPNsense provider, to avoid
  community-provider rot across OPNsense releases (Responsibility split; What was
  ruled out).
- Terraform writes no DNS records; Ansible's `dns` role owns the entire internal
  zone, avoiding the bootstrap cycle and split DNS ownership the earlier
  `hashicorp/dns` design created (Providers).
- State is local on the control node because Forgejo offers no usable HTTP state
  backend; this is sufficient at solo-operator scale (no concurrent applies, no
  remote locking), with a real backend such as MinIO/S3 to be added later if
  warranted (State backend).
- Separate environment directories are used instead of Terraform workspaces to
  remove the risk of applying the wrong state (Structure; What was ruled out).
- Terraform and Ansible code are kept in one monorepo rather than a separate
  Terraform repo to avoid cross-referencing friction (What was ruled out).