docs(adr): restructure ADRs 006-009 to ADR-023 conformance

Add dated Status sections, a Decision umbrella over the existing topical
sections (demoted to ###), and Consequences assembled from each ADR's
already-stated implications. No decision substance changed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
sjat 2026-06-10 14:41:24 +02:00
parent 188882449d
commit 44dbd4628f
4 changed files with 138 additions and 26 deletions


@@ -1,5 +1,9 @@
 # ADR-006 — Terraform for infrastructure provisioning

+## Status
+
+Accepted (2026-05-30)
+
 ## Context

 Ansible manages host configuration well but has no state model for infrastructure
@@ -13,7 +17,9 @@ exact boundary, handoff pipeline, and data contract between them live in **ADR-0

 ---

-## Responsibility split
+## Decision
+
+### Responsibility split

 The canonical responsibility-split table lives in **ADR-009**. In short: Terraform
 owns VM existence only; Ansible owns everything inside a VM, including all internal
@@ -26,7 +32,7 @@ cadence, making them a poor fit for Terraform state.

 ---

-## Providers
+### Providers

 **`bpg/proxmox` (`~> 0.70`)**: Chosen over `telmate/proxmox` for active maintenance,
 full Proxmox 8 API support, and better cloud-init integration. This is the only
@@ -42,7 +48,7 @@ Terraform manages its own provider dependencies via `required_providers` and

 ---

-## State backend
+### State backend

 **Choice**: Local state on the control node.

@@ -59,7 +65,7 @@ integration boundary.

 ---

-## Structure
+### Structure

 ```
 terraform/
@@ -83,7 +89,7 @@ Each environment directory contains:

 ---

-## Secrets handling
+### Secrets handling

 The only secret input (the Proxmox API token) is passed via a `TF_VAR_*`
 environment variable and declared `sensitive = true` in `variables.tf`. It never
@@ -92,7 +98,7 @@ appears in `.tfvars` files. Non-secret configuration lives in tracked

 ---

-## Ansible integration
+### Ansible integration

 After `terraform apply`, run `make tf-inventory TF_ENV=<env>` to regenerate
 `inventories/<env>/hosts.yml` from the `vms` output. The full handoff pipeline,
@@ -102,7 +108,7 @@ handoff)**.

 ---

-## What was ruled out
+### What was ruled out

 | Option | Reason |
 |---|---|
@@ -110,3 +116,24 @@ handoff)**.
 | OPNsense Terraform provider | Community-maintained; provider rot risk across OPNsense releases |
 | Terraform workspaces | Single state file with workspace prefix; accidental cross-env apply possible |
 | Separate Terraform repo | Cross-referencing between infra and config adds friction; monorepo keeps the full picture together |
+
+## Consequences
+
+Drawn from the "What was ruled out" section and the decisions stated above:
+
+- `bpg/proxmox` is the only provider; `telmate/proxmox` was ruled out for weaker
+maintenance and Proxmox 8 / cloud-init support (Providers; What was ruled out).
+- OPNsense stays entirely in Ansible — no Terraform OPNsense provider — to avoid
+community-provider rot across OPNsense releases (Responsibility split; What was
+ruled out).
+- Terraform writes no DNS records; Ansible's `dns` role owns the entire internal
+zone, avoiding the bootstrap cycle and split DNS ownership the earlier
+`hashicorp/dns` design created (Providers).
+- State is local on the control node because Forgejo offers no usable HTTP state
+backend; this is sufficient at solo-operator scale (no concurrent applies, no
+remote locking), with a real backend such as MinIO/S3 to be added later if
+warranted (State backend).
+- Separate environment directories are used instead of Terraform workspaces to
+remove the risk of applying the wrong state (Structure; What was ruled out).
+- Terraform and Ansible internals are kept in one monorepo rather than a separate
+Terraform repo to avoid cross-referencing friction (What was ruled out).
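
Reviewer sketch (not part of the commit): the Secrets handling rule in this ADR, with the token supplied only through a `TF_VAR_*` environment variable, lends itself to a preflight check before `terraform apply`. The helper below is a minimal illustration. The variable name `proxmox_api_token` is an assumption and is not taken from the repository's `variables.tf`.

```python
import os


def missing_tf_vars(names, env=None):
    """Return the TF_VAR_* names that are unset or empty in `env`.

    `names` are Terraform variable names (hypothetical here); Terraform
    itself reads each one from the environment variable TF_VAR_<name>.
    """
    env = os.environ if env is None else env
    return [f"TF_VAR_{name}" for name in names if not env.get(f"TF_VAR_{name}")]
```

A wrapper script or Makefile target could call `missing_tf_vars(["proxmox_api_token"])` and abort when the returned list is non-empty, so an apply never starts without the secret in place.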


@@ -1,5 +1,9 @@
 # ADR-007 — Network topology and addressing

+## Status
+
+Accepted (2026-05-30)
+
 ## Context

 The boma homelab is a Proxmox cluster on a dedicated private network behind an
@@ -10,7 +14,9 @@ and OPNsense configuration.

 ---

-## Physical topology
+## Decision
+
+### Physical topology

 ```
 ISP
@@ -38,7 +44,7 @@ ISP

 ---

-## VLAN design
+### VLAN design

 | VLAN | Name | Subnet | Purpose |
 |---|---|---|---|
@@ -51,7 +57,7 @@ ISP

 ---

-## IP addressing
+### IP addressing

 ### VLAN 10 — mgmt (10.10.0.0/24) — no DHCP

@@ -121,7 +127,7 @@ NetBird self-hosted on `askari`. NetBird manages its own overlay addressing

 ---

-## OPNsense firewall rules (intent)
+### OPNsense firewall rules (intent)

 | Source | Destination | Policy |
 |---|---|---|
@@ -142,7 +148,7 @@ IoT devices cannot initiate connections to `srv`.

 ---

-## Naming scheme
+### Naming scheme

 | Layer | Convention | Examples |
 |---|---|---|
@@ -155,7 +161,7 @@ IoT devices cannot initiate connections to `srv`.

 ---

-## DNS zones and split-horizon
+### DNS zones and split-horizon

 **Internal zone**: `boma.baobab.band` — served by `dns1` and `dns2`.
 The zone is rendered by the Ansible `dns` role: host A records come from the
@@ -175,7 +181,7 @@ All other queries go upstream (e.g., `1.1.1.1`, `9.9.9.9`).

 ---

-## External monitoring — askari
+### External monitoring — askari

 `askari` (Hetzner VPS) is a peer on the **NetBird mesh** (ADR-016) and also **hosts
 the self-hosted NetBird coordinator** (management/signal/relay). It reaches `srv`
@@ -186,3 +192,24 @@ ACLs — no OPNsense WireGuard tunnel and no `10.99.0.0/24` routing.
 be reachable even when the homelab is down (its entire purpose), which is also why
 the mesh coordinator lives here: an off-site control plane survives a homelab outage.
 FQDN: `askari.baobab.band`.
+
+---
+
+## Consequences
+
+Drawn from the implications already stated above:
+
+- VLAN 99 (`vpn`, `10.99.0.0/24`) is retired and the subnet freed; remote access is
+carried by the self-hosted NetBird mesh instead of an OPNsense WireGuard subnet
+(VLAN design; IP addressing — VLAN 99 retired).
+- Mesh-peer firewall allowances (to `srv` metrics ports and `mgmt`) are enforced by
+NetBird ACLs, not OPNsense rules (OPNsense firewall rules (intent)).
+- IoT devices cannot initiate connections to `srv`; only Home Assistant at
+`10.20.0.13` may reach the IoT VLAN, with OPNsense Avahi bridging `srv` ↔ `iot`
+for discovery (OPNsense firewall rules (intent)).
+- Terraform writes no DNS records; the Ansible `dns` role renders the internal zone
+from inventory plus `group_vars`, with `dns1`/`dns2` serving split-horizon answers
+(DNS zones and split-horizon).
+- `askari` runs independently of the cluster so it survives a homelab outage, which
+is why the off-site NetBird control plane lives there (External monitoring —
+askari).
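
Reviewer sketch (not part of the commit): the DNS consequence says host A records are rendered from inventory data rather than written by Terraform. The real rendering happens in the Ansible `dns` role, whose templates are not shown here, so the Python below only illustrates the idea. The function name is hypothetical and the addresses in the usage note are documentation-range placeholders, not the homelab's real assignments.

```python
import ipaddress


def render_a_records(hosts, zone="boma.baobab.band"):
    """Render one fully qualified A record per host, sorted for stable output.

    `hosts` maps a short hostname to an IPv4 address string, standing in for
    whatever the inventory provides.
    """
    lines = []
    for name, ip in sorted(hosts.items()):
        ipaddress.IPv4Address(ip)  # raises ValueError on a malformed address
        lines.append(f"{name}.{zone}. IN A {ip}")
    return lines
```

For example, `render_a_records({"dns1": "192.0.2.11"})` yields a single record for `dns1.boma.baobab.band.`. Because the records derive from one source and are sorted, re-rendering an unchanged inventory produces identical zone content.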


@@ -3,6 +3,10 @@
 > Practical point-of-use pitfalls (nft render checks, Molecule `community.docker`,
 > apply-path coverage blind spots) live in `docs/testing/gotchas.md`.

+## Status
+
+Accepted (2026-05-30)
+
 ## Context

 Ansible roles must be idempotent and correct before they touch production hosts.
@@ -11,7 +15,9 @@ This document records the testing strategy, what each level covers, and — crit

 ---

-## Three testing levels
+## Decision
+
+### Three testing levels

 ### Level 1 — Molecule (per role, always required)

@@ -78,7 +84,7 @@ deploy (STATUS.md). Full design: ADR-017.

 ---

-## Molecule test image
+### Molecule test image

 **No external images.** The project builds and hosts its own test image.

@@ -103,7 +109,7 @@ functionally equivalent and fully owned.

 ---

-## Idempotency requirements
+### Idempotency requirements

 Every role task must satisfy one of these:

@@ -121,7 +127,7 @@ catches anything lint misses.

 ---

-## What Molecule tests — and what it does not
+### What Molecule tests — and what it does not

 ### Tested in Molecule

@@ -161,7 +167,7 @@ Behavioural correctness is confirmed on staging.

 ---

-## CI pipeline
+### CI pipeline

 ```
 push to main
@@ -178,3 +184,27 @@ promote to production

 Manual gates are intentional. Automated tests prove correctness in isolation;
 a human confirms the change is safe to promote.
+
+---
+
+## Consequences
+
+Drawn from the limitations and trade-offs already stated above:
+
+- The Molecule idempotency step is non-negotiable; every role must pass it cleanly
+(Three testing levels — Level 1).
+- A class of capabilities (nftables rule loading, NetBird mesh data plane,
+unattended-upgrades behaviour, OPNsense DHCP, Avahi mDNS reflection, hardware
+passthrough, corosync cluster formation) cannot be verified in Molecule and is
+validated only at Level 2 (staging) or Level 3 (external) — a conscious,
+documented decision, not a gap (What Molecule tests — and what it does not).
+- The project builds and hosts its own `molecule-debian13` image rather than relying
+on an external Docker Hub image (e.g. geerlingguy), accepting the maintenance of a
+custom image to avoid drift, disappearance, or unexpected changes outside project
+control (Molecule test image).
+- Level 4 service-UI acceptance is authorable now but its execution is deferred,
+pending `ubongo`, the `playwright` plugin, Authentik, and a staging deploy (Three
+testing levels — Level 4).
+- Promotion to staging and to production stays behind intentional manual approval
+gates; automation proves isolated correctness, a human confirms promotion safety
+(CI pipeline).
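
Reviewer sketch (not part of the commit): the first consequence, that every role must pass the idempotency step cleanly, comes down to a second converge reporting `changed=0` for every host. Molecule's own `idempotence` step already performs this check. The Python below only makes the criterion concrete, assuming the standard Ansible `PLAY RECAP` line format; the function name is illustrative.

```python
import re

# One PLAY RECAP line looks like:
#   "dns1 : ok=12 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0"
RECAP_LINE = re.compile(r"^(?P<host>\S+)\s*:\s*(?P<stats>(?:\w+=\d+\s*)+)$")


def non_idempotent_hosts(recap_text):
    """Return {host: changed_count} for hosts whose run reported changes."""
    offenders = {}
    for line in recap_text.splitlines():
        match = RECAP_LINE.match(line.strip())
        if not match:
            continue  # banner lines such as "PLAY RECAP ****" are skipped
        stats = dict(pair.split("=") for pair in match.group("stats").split())
        changed = int(stats.get("changed", 0))
        if changed > 0:
            offenders[match.group("host")] = changed
    return offenders
```

Fed the recap of a second converge, an empty result means the role is idempotent; any entry names a host where some task still reports a change.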


@@ -1,5 +1,9 @@
 # ADR-009 — Terraform ↔ Ansible provisioning handoff

+## Status
+
+Accepted (2026-05-30)
+
 ## Context

 Two tools touch every managed host. Terraform owns **what exists** — VMs on
@@ -14,7 +18,9 @@ the cloud-init template that VMs are cloned from. This ADR covers how they conne

 ---

-## The boundary
+## Decision
+
+### The boundary

 | Layer | Tool | Notes |
 |---|---|---|
@@ -31,7 +37,7 @@ below).

 ---

-## The handoff pipeline
+### The handoff pipeline

 There is one path by which a managed host comes into existence and reaches its
 configured state:
@@ -55,7 +61,7 @@ this pipeline — **never** by hand-editing the inventory.

 ---

-## The data contract
+### The data contract

 The seam's interface is a single Terraform output consumed by a single script.

@@ -88,7 +94,7 @@ Terraform, and the inventory is regenerated, never edited.

 ---

-## Cloud-init's role
+### Cloud-init's role

 Cloud-init is the thin first-boot layer between Terraform and Ansible:

@@ -103,7 +109,7 @@ The line is sharp: cloud-init buys *reachability*, Ansible owns *configuration*.

 ---

-## Internal DNS — owned by Ansible, no chicken-and-egg
+### Internal DNS — owned by Ansible, no chicken-and-egg

 Terraform writes **no** DNS records. The internal zone (`boma.baobab.band`) is
 rendered entirely by the Ansible `dns` role:
@@ -129,7 +135,7 @@ convention only — it no longer implies any difference in how records are writt

 ---

-## The control-node exception
+### The control-node exception

 The control node — the host that runs Terraform and Ansible — is `ubongo`, a
 dedicated **physical** machine outside the cluster. It is not a VM at all, so
@@ -146,7 +152,7 @@ Every other host is Terraform-managed.

 ---

-## What was ruled out
+### What was ruled out

 | Option | Reason |
 |---|---|
@@ -154,3 +160,25 @@ Every other host is Terraform-managed.
 | Hand-editing the generated inventory | `hosts.yml` is a build artifact of `tf_to_inventory.py`; edits are overwritten on the next `make tf-inventory`. Edit `local.vms` instead. |
 | Documenting the seam in both ADR-005 and ADR-006 | The boundary belongs in exactly one place. Those ADRs link here. |
 | Terraform-managed DNS records (`hashicorp/dns` + RFC 2136) | Created a bootstrap cycle (the first DNS server can't register itself) and split DNS ownership across two tools. Ansible owns the whole internal zone instead — one owner, no cycle. |
+
+## Consequences
+
+Drawn from the boundary, the data contract, and the "What was ruled out" section above:
+
+- Adding a host means editing `local.vms` and running the handoff pipeline; the
+generated `hosts.yml` is a build artifact and must never be hand-edited — manual
+edits are overwritten on the next `make tf-inventory` (The handoff pipeline; The
+data contract; What was ruled out).
+- Manual `qm clone` is rejected as a general provisioning path so the inventory and
+real infrastructure cannot drift; Terraform is the single way VMs come into
+existence (What was ruled out).
+- Terraform writes no DNS records: the Ansible `dns` role renders the whole internal
+zone from inventory plus `group_vars`, dissolving the bootstrap cycle a
+Terraform-managed zone (`hashicorp/dns` + RFC 2136) would create (Internal DNS —
+owned by Ansible, no chicken-and-egg; What was ruled out).
+- The control node (`ubongo`) is the single documented exception to "Terraform owns
+VM existence" — a physical machine provisioned manually and managed by Ansible for
+baseline config only; every other host is Terraform-managed (The control-node
+exception).
+- The seam is documented in exactly one place (this ADR); ADR-005 and ADR-006 link
+here rather than restating it (What was ruled out).
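
Reviewer sketch (not part of the commit): the first consequence treats `hosts.yml` as a build artifact derived from the Terraform `vms` output. The repository's `tf_to_inventory.py` is not shown in this diff, so the Python below is only a guess at the general shape of such a transform. The layout of `vms` (hostname mapped to an `ip` and a list of `groups`) and the placeholder addresses are assumptions.

```python
def vms_to_inventory(vms):
    """Build an Ansible YAML-inventory structure from a `vms`-style mapping.

    Assumed input shape (hypothetical): {"<host>": {"ip": "...", "groups": [...]}}
    Output shape: {"all": {"children": {"<group>": {"hosts": {...}}}}}
    """
    children = {}
    for name in sorted(vms):  # sorted so regeneration is deterministic
        attrs = vms[name]
        for group in attrs["groups"]:
            hosts = children.setdefault(group, {"hosts": {}})["hosts"]
            hosts[name] = {"ansible_host": attrs["ip"]}
    return {"all": {"children": children}}
```

Because the inventory is a pure function of the Terraform output, regenerating it discards any manual edit, which is why the ADR directs changes to `local.vms` instead.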