boma/docs/decisions/006-terraform.md

# ADR-006 — Terraform for infrastructure provisioning

## Status

Accepted (2026-05-30)

## Context

Ansible manages host configuration well but has no state model for whether a
piece of infrastructure exists. Terraform takes over that "what exists" layer
(creating and destroying VMs on Proxmox), while Ansible continues to own everything
that runs inside them, including all internal DNS records.

This complements rather than replaces Ansible. The two tools do not overlap. The
exact boundary, handoff pipeline, and data contract between them live in **ADR-009
(provisioning handoff)** — this ADR covers Terraform's own internals only.

---

## Decision

### Responsibility split

The canonical responsibility-split table lives in **ADR-009**. In short: Terraform
owns VM existence only; Ansible owns everything inside a VM, including all internal
DNS records.

**OPNsense is entirely Ansible.** The available Terraform providers for OPNsense
are community-maintained with real risk of provider rot across OPNsense releases.
OPNsense firewall rules also change on a service cadence, not an infrastructure
cadence, making them a poor fit for Terraform state.

---

### Providers

**`bpg/proxmox` (`~> 0.70`)**: Chosen over `telmate/proxmox` for active maintenance,
full Proxmox 8 API support, and better cloud-init integration. This is the only
provider.

Terraform does **not** manage DNS. An earlier design used `hashicorp/dns` (RFC 2136)
to write A records, but that created a bootstrap cycle — the first DNS server cannot
register itself — and split DNS ownership across two tools. Ansible's `dns` role now
owns the entire internal zone, rendered from inventory. See ADR-009.

Terraform manages its own provider dependencies via `required_providers` and
`.terraform.lock.hcl` (tracked in git once `terraform init` has been run).
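
As a minimal sketch of the pin described above, the `required_providers` block in `providers.tf` might look like the following. Only the provider source and the `~> 0.70` constraint come from this ADR; the rest of the file layout is an assumption.

```hcl
# providers.tf (sketch): the single provider this ADR allows.
terraform {
  required_providers {
    proxmox = {
      source = "bpg/proxmox"
      # Pessimistic constraint: accepts 0.70 and later 0.x releases, never 1.0.
      version = "~> 0.70"
    }
  }
}
```

`terraform init` resolves this constraint to one exact version and records it, with checksums, in `.terraform.lock.hcl`; committing that file is what makes later runs pick the same provider build.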

---

### State backend

**Choice**: Local state on the control node.

Forgejo (Gitea-based) has no usable Terraform HTTP state backend — its API `/raw/`
endpoint is read-only, so state cannot be written there. State therefore lives
locally as `terraform.tfstate` (gitignored) on the control node, which is persistent
and backed up with the rest of the node.

At this scale (solo operator, a handful of VMs) local state is sufficient: no
concurrent applies, so no remote locking is needed. If a remote backend with locking
becomes worthwhile later, add a `backend` block to `backend.tf` pointing at a real
backend such as MinIO/S3 — Forgejo is not an option. See ADR-010 for the Forgejo
integration boundary.
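
For illustration, an explicit local backend in `backend.tf` could be written as below. This is a sketch: the ADR does not say whether the repository spells the block out or simply relies on Terraform's default, and the two are equivalent.

```hcl
# backend.tf (sketch): local state, stored next to the environment's config.
terraform {
  backend "local" {
    # Relative to the environment directory; this is also Terraform's default.
    path = "terraform.tfstate"
  }
}
```

Moving to a remote backend later would mean replacing this block and running `terraform init -migrate-state` so the existing state is copied across.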

---

### Structure

```
terraform/
  modules/
    proxmox_vm/          # reusable VM module — Proxmox only, no DNS
  environments/
    staging/             # staging VMs, separate state file
    production/          # production VMs, separate state file
```

Environments are separate directories rather than Terraform workspaces. Each
directory has its own configuration and its own state file, so an apply run in one
directory cannot be pointed at the other environment's state by accident.

Each environment directory contains:
- `providers.tf` — provider version pins and configuration
- `backend.tf` — backend configuration (local state on the control node; no remote backend — see "State backend" above)
- `variables.tf` — input declarations
- `terraform.tfvars.example` — tracked template; copy to `terraform.tfvars` for actual values
- `main.tf` — `local.vms` map and module calls (no DNS resources)
- `outputs.tf` — VM map consumed by `make tf-inventory`
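
The `main.tf` entry above can be sketched as follows. The `local.vms` name and the module path come from this ADR; the VM names, attribute names, addresses, and module inputs are hypothetical and chosen only to show the shape.

```hcl
# environments/staging/main.tf (sketch)
locals {
  # Hypothetical VM definitions; each key doubles as the VM name.
  vms = {
    "dns-01" = { node = "pve1", cores = 2, memory_mb = 2048, ipv4 = "192.0.2.11/24" }
    "app-01" = { node = "pve1", cores = 4, memory_mb = 4096, ipv4 = "192.0.2.21/24" }
  }
}

# One module instance per map entry. The input names below are assumptions
# about the proxmox_vm module's interface, not its documented variables.
module "vm" {
  source   = "../../modules/proxmox_vm"
  for_each = local.vms

  name      = each.key
  node_name = each.value.node
  cores     = each.value.cores
  memory_mb = each.value.memory_mb
  ipv4_cidr = each.value.ipv4
}
```

Driving the module with `for_each` over a single map keeps every VM definition in one place, so adding or removing a VM is a one-line change to `local.vms`.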

---

### Secrets handling

The only secret input (the Proxmox API token) is passed via a `TF_VAR_*`
environment variable and declared `sensitive = true` in `variables.tf`. It never
appears in `.tfvars` files. Non-secret configuration lives in tracked
`terraform.tfvars.example`; the real `terraform.tfvars` is gitignored.
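
A sketch of how the token declaration and its use could fit together is shown below. The variable names `proxmox_endpoint` and `proxmox_api_token` are assumptions; the ADR only states that the token arrives through a `TF_VAR_*` variable and is marked sensitive.

```hcl
# variables.tf (sketch)
variable "proxmox_endpoint" {
  description = "Proxmox API URL; non-secret, set in terraform.tfvars"
  type        = string
}

variable "proxmox_api_token" {
  # Terraform reads this from the environment variable TF_VAR_proxmox_api_token.
  description = "Proxmox API token; never written to a .tfvars file"
  type        = string
  sensitive   = true
}

# providers.tf (sketch): the provider consumes both inputs.
provider "proxmox" {
  endpoint  = var.proxmox_endpoint
  api_token = var.proxmox_api_token
}
```

Note that `sensitive = true` only redacts the value from plan and apply output. Terraform can still store sensitive values in state as plain text, which is one more reason `terraform.tfstate` stays gitignored.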

---

### Ansible integration

After `terraform apply`, run `make tf-inventory TF_ENV=<env>` to regenerate
`inventories/<env>/hosts.yml` from the `vms` output. The full handoff pipeline,
the `vms` output → inventory data contract, and the generator script
(`scripts/tf_to_inventory.py`) are documented in **ADR-009 (provisioning
handoff)**.
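
As a rough illustration of the Terraform side of this handoff, the `vms` output could be declared as below. The output name and `local.vms` are from this ADR; the exported fields (`node`, `ipv4`) are hypothetical, and the authoritative data contract remains the one in ADR-009.

```hcl
# outputs.tf (sketch): exposes the VM map for the inventory generator.
output "vms" {
  description = "VM map consumed by make tf-inventory"
  value = {
    # local.vms is the map defined in main.tf; the fields are illustrative.
    for name, vm in local.vms : name => {
      node = vm.node
      ipv4 = vm.ipv4
    }
  }
}
```

An output declared this way can be read as JSON with `terraform output -json vms`, which is a convenient form for a script to parse.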

---

### What was ruled out

| Option | Reason |
|---|---|
| `telmate/proxmox` provider | Less actively maintained; weaker cloud-init and Proxmox 8 support |
| OPNsense Terraform provider | Community-maintained; provider rot risk across OPNsense releases |
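| Terraform workspaces | One configuration directory and backend shared by every environment; the target depends on whichever workspace is currently selected, so an accidental cross-env apply is possible |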
| Separate Terraform repo | Cross-referencing between infra and config adds friction; monorepo keeps the full picture together |

## Consequences

Drawn from the "What was ruled out" section and the decisions stated above:

- `bpg/proxmox` is the only provider; `telmate/proxmox` was ruled out for weaker
  maintenance and Proxmox 8 / cloud-init support (Providers; What was ruled out).
- OPNsense stays entirely in Ansible — no Terraform OPNsense provider — to avoid
  community-provider rot across OPNsense releases (Responsibility split; What was
  ruled out).
- Terraform writes no DNS records; Ansible's `dns` role owns the entire internal
  zone, avoiding the bootstrap cycle and split DNS ownership the earlier
  `hashicorp/dns` design created (Providers).
- State is local on the control node because Forgejo offers no usable HTTP state
  backend; this is sufficient at solo-operator scale (no concurrent applies, no
  remote locking), with a real backend such as MinIO/S3 to be added later if
  warranted (State backend).
- Separate environment directories are used instead of Terraform workspaces to
  remove the risk of applying the wrong state (Structure; What was ruled out).
- Terraform and Ansible internals are kept in one monorepo rather than a separate
  Terraform repo to avoid cross-referencing friction (What was ruled out).