docs(adr): restructure ADRs 006-009 to ADR-023 conformance

Add dated Status sections, a Decision umbrella over the existing topical
sections (demoted to ###), and Consequences assembled from each ADR's
already-stated implications. No decision substance changed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
sjat 2026-06-10 14:41:24 +02:00
parent 188882449d
commit 44dbd4628f
4 changed files with 138 additions and 26 deletions

@ -1,5 +1,9 @@
# ADR-006 — Terraform for infrastructure provisioning
## Status
Accepted (2026-05-30)
## Context
Ansible manages host configuration well but has no state model for infrastructure
@ -13,7 +17,9 @@ exact boundary, handoff pipeline, and data contract between them live in **ADR-0
---
## Responsibility split
## Decision
### Responsibility split
The canonical responsibility-split table lives in **ADR-009**. In short: Terraform
owns VM existence only; Ansible owns everything inside a VM, including all internal
@ -26,7 +32,7 @@ cadence, making them a poor fit for Terraform state.
---
## Providers
### Providers
**`bpg/proxmox` (`~> 0.70`)**: Chosen over `telmate/proxmox` for active maintenance,
full Proxmox 8 API support, and better cloud-init integration. This is the only
@ -42,7 +48,7 @@ Terraform manages its own provider dependencies via `required_providers` and
---
## State backend
### State backend
**Choice**: Local state on the control node.
@ -59,7 +65,7 @@ integration boundary.
---
## Structure
### Structure
```
terraform/
@ -83,7 +89,7 @@ Each environment directory contains:
---
## Secrets handling
### Secrets handling
The only secret input (the Proxmox API token) is passed via a `TF_VAR_*`
environment variable and declared `sensitive = true` in `variables.tf`. It never
@ -92,7 +98,7 @@ appears in `.tfvars` files. Non-secret configuration lives in tracked
---
## Ansible integration
### Ansible integration
After `terraform apply`, run `make tf-inventory TF_ENV=<env>` to regenerate
`inventories/<env>/hosts.yml` from the `vms` output. The full handoff pipeline,
@ -102,7 +108,7 @@ handoff)**.
---
## What was ruled out
### What was ruled out
| Option | Reason |
|---|---|
@ -110,3 +116,24 @@ handoff)**.
| OPNsense Terraform provider | Community-maintained; provider rot risk across OPNsense releases |
| Terraform workspaces | Single state file with workspace prefix; accidental cross-env apply possible |
| Separate Terraform repo | Cross-referencing between infra and config adds friction; monorepo keeps the full picture together |
## Consequences
Drawn from the "What was ruled out" section and the decisions stated above:
- `bpg/proxmox` is the only provider; `telmate/proxmox` was ruled out because its
maintenance, Proxmox 8 API support, and cloud-init integration are weaker
(Providers; What was ruled out).
- OPNsense stays entirely in Ansible — no Terraform OPNsense provider — to avoid
community-provider rot across OPNsense releases (Responsibility split; What was
ruled out).
- Terraform writes no DNS records; Ansible's `dns` role owns the entire internal
zone, avoiding the bootstrap cycle and split DNS ownership the earlier
`hashicorp/dns` design created (Providers).
- State is local on the control node because Forgejo offers no usable HTTP state
backend; this is sufficient at solo-operator scale (no concurrent applies, no
remote locking), with a real backend such as MinIO/S3 to be added later if
warranted (State backend).
- Separate environment directories are used instead of Terraform workspaces to
remove the risk of applying the wrong state (Structure; What was ruled out).
- Terraform (infrastructure) and Ansible (configuration) code live in one monorepo
rather than a separate Terraform repo, avoiding cross-referencing friction and
keeping the full picture together (What was ruled out).
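
The environment-directory and secrets consequences above can be sketched as a small wrapper. This is only an illustration, not the project's tooling: the `terraform/environments/<env>` layout, the environment names, and the variable name `TF_VAR_proxmox_api_token` are all assumptions, since the ADR excerpt does not show them.

```python
from pathlib import PurePosixPath

# Hypothetical names: the real directory layout and variable name are not shown in the ADR.
ENV_ROOT = PurePosixPath("terraform/environments")
KNOWN_ENVS = {"staging", "production"}
TOKEN_VAR = "TF_VAR_proxmox_api_token"


def terraform_invocation(tf_env, action, environ):
    """Return (argv, child_env) for running Terraform against one environment directory."""
    if tf_env not in KNOWN_ENVS:
        raise ValueError(f"unknown environment: {tf_env!r}")
    if not environ.get(TOKEN_VAR):
        raise RuntimeError(f"{TOKEN_VAR} must be set in the environment")
    # -chdir selects the environment directory, so each environment
    # works against its own local state file.
    argv = ["terraform", f"-chdir={ENV_ROOT / tf_env}", action]
    # The token travels only through the child process environment,
    # never on the command line or in a .tfvars file.
    child_env = dict(environ)
    return argv, child_env


if __name__ == "__main__":
    argv, _ = terraform_invocation("staging", "plan", {TOKEN_VAR: "example-token"})
    print(" ".join(argv))
```

Refusing unknown environment names up front is one way to get the "cannot apply the wrong state" property that separate directories are meant to provide.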

@ -1,5 +1,9 @@
# ADR-007 — Network topology and addressing
## Status
Accepted (2026-05-30)
## Context
The boma homelab is a Proxmox cluster on a dedicated private network behind an
@ -10,7 +14,9 @@ and OPNsense configuration.
---
## Physical topology
## Decision
### Physical topology
```
ISP
@ -38,7 +44,7 @@ ISP
---
## VLAN design
### VLAN design
| VLAN | Name | Subnet | Purpose |
|---|---|---|---|
@ -51,7 +57,7 @@ ISP
---
## IP addressing
### IP addressing
### VLAN 10 — mgmt (10.10.0.0/24) — no DHCP
@ -121,7 +127,7 @@ NetBird self-hosted on `askari`. NetBird manages its own overlay addressing
---
## OPNsense firewall rules (intent)
### OPNsense firewall rules (intent)
| Source | Destination | Policy |
|---|---|---|
@ -142,7 +148,7 @@ IoT devices cannot initiate connections to `srv`.
---
## Naming scheme
### Naming scheme
| Layer | Convention | Examples |
|---|---|---|
@ -155,7 +161,7 @@ IoT devices cannot initiate connections to `srv`.
---
## DNS zones and split-horizon
### DNS zones and split-horizon
**Internal zone**: `boma.baobab.band` — served by `dns1` and `dns2`.
The zone is rendered by the Ansible `dns` role: host A records come from the
@ -175,7 +181,7 @@ All other queries go upstream (e.g., `1.1.1.1`, `9.9.9.9`).
---
## External monitoring — askari
### External monitoring — askari
`askari` (Hetzner VPS) is a peer on the **NetBird mesh** (ADR-016) and also **hosts
the self-hosted NetBird coordinator** (management/signal/relay). It reaches `srv`
@ -186,3 +192,24 @@ ACLs — no OPNsense WireGuard tunnel and no `10.99.0.0/24` routing.
be reachable even when the homelab is down (its entire purpose), which is also why
the mesh coordinator lives here: an off-site control plane survives a homelab outage.
FQDN: `askari.baobab.band`.
---
## Consequences
Drawn from the implications already stated above:
- VLAN 99 (`vpn`, `10.99.0.0/24`) is retired and the subnet freed; remote access is
carried by the self-hosted NetBird mesh instead of an OPNsense WireGuard subnet
(VLAN design; IP addressing — VLAN 99 retired).
- Mesh-peer firewall allowances (to `srv` metrics ports and `mgmt`) are enforced by
NetBird ACLs, not OPNsense rules (OPNsense firewall rules (intent)).
- IoT devices cannot initiate connections to `srv`; only Home Assistant at
`10.20.0.13` may reach the IoT VLAN, with OPNsense Avahi bridging `srv` and `iot`
for discovery (OPNsense firewall rules (intent)).
- Terraform writes no DNS records; the Ansible `dns` role renders the internal zone
from inventory plus `group_vars`, with `dns1`/`dns2` serving split-horizon answers
(DNS zones and split-horizon).
- `askari` runs independently of the cluster so it survives a homelab outage, which
is why the off-site NetBird control plane lives there (External monitoring —
askari).
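
The retirement of `10.99.0.0/24` lends itself to a mechanical check. The sketch below uses Python's standard `ipaddress` module to flag any host still addressed in the retired subnet. The `mgmt` and `vpn` subnets are taken from the ADR; the `srv` subnet `10.20.0.0/24` is an assumption inferred from the Home Assistant address, and the host names are invented for illustration.

```python
import ipaddress

MGMT = ipaddress.ip_network("10.10.0.0/24")         # VLAN 10, stated in the ADR
RETIRED_VPN = ipaddress.ip_network("10.99.0.0/24")  # VLAN 99, retired
SRV = ipaddress.ip_network("10.20.0.0/24")          # assumed from 10.20.0.13

ACTIVE = {"mgmt": MGMT, "srv": SRV}


def vlan_of(ip):
    """Return the name of the known active VLAN containing ip, or None."""
    addr = ipaddress.ip_address(ip)
    for name, net in ACTIVE.items():
        if addr in net:
            return name
    return None


def retired_subnet_hosts(hosts):
    """Return one message per host in a {hostname: ip} mapping that uses the retired subnet."""
    problems = []
    for name, ip in sorted(hosts.items()):
        if ipaddress.ip_address(ip) in RETIRED_VPN:
            problems.append(f"{name}: {ip} is in the retired vpn subnet {RETIRED_VPN}")
    return problems


if __name__ == "__main__":
    # Hypothetical inventory excerpt.
    print(retired_subnet_hosts({"homeassistant": "10.20.0.13", "old-peer": "10.99.0.5"}))
```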

@ -3,6 +3,10 @@
> Practical point-of-use pitfalls (nft render checks, Molecule `community.docker`,
> apply-path coverage blind spots) live in `docs/testing/gotchas.md`.
## Status
Accepted (2026-05-30)
## Context
Ansible roles must be idempotent and correct before they touch production hosts.
@ -11,7 +15,9 @@ This document records the testing strategy, what each level covers, and — crit
---
## Three testing levels
## Decision
### Three testing levels
### Level 1 — Molecule (per role, always required)
@ -78,7 +84,7 @@ deploy (STATUS.md). Full design: ADR-017.
---
## Molecule test image
### Molecule test image
**No external images.** The project builds and hosts its own test image.
@ -103,7 +109,7 @@ functionally equivalent and fully owned.
---
## Idempotency requirements
### Idempotency requirements
Every role task must satisfy one of these:
@ -121,7 +127,7 @@ catches anything lint misses.
---
## What Molecule tests — and what it does not
### What Molecule tests — and what it does not
### Tested in Molecule
@ -161,7 +167,7 @@ Behavioural correctness is confirmed on staging.
---
## CI pipeline
### CI pipeline
```
push to main
@ -178,3 +184,27 @@ promote to production
Manual gates are intentional. Automated tests prove correctness in isolation;
a human confirms the change is safe to promote.
---
## Consequences
Drawn from the limitations and trade-offs already stated above:
- The Molecule idempotency step is non-negotiable; every role must pass it cleanly
(Three testing levels — Level 1).
- A class of capabilities (nftables rule loading, NetBird mesh data plane,
unattended-upgrades behaviour, OPNsense DHCP, Avahi mDNS reflection, hardware
passthrough, corosync cluster formation) cannot be verified in Molecule and is
validated only at Level 2 (staging) or Level 3 (external) — a conscious,
documented decision, not a gap (What Molecule tests — and what it does not).
- The project builds and hosts its own `molecule-debian13` image rather than relying
on an external Docker Hub image (e.g. geerlingguy), accepting the maintenance of a
custom image to avoid drift, disappearance, or unexpected changes outside project
control (Molecule test image).
- Level 4 service-UI acceptance is authorable now but its execution is deferred,
pending `ubongo`, the `playwright` plugin, Authentik, and a staging deploy (Three
testing levels — Level 4).
- Promotion to staging and to production stays behind intentional manual approval
gates; automation proves isolated correctness, a human confirms promotion safety
(CI pipeline).
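
The idempotency criterion in the first consequence amounts to "a second converge reports zero changes". Molecule runs that check itself. The sketch below is not Molecule's implementation, only an illustration of the criterion applied to Ansible's `PLAY RECAP` lines, with invented host names.

```python
import re

# Matches recap lines such as:
#   instance   : ok=12   changed=0    unreachable=0    failed=0 ...
RECAP_LINE = re.compile(r"^(?P<host>\S+)\s*:\s*ok=\d+\s+changed=(?P<changed>\d+)")


def hosts_with_changes(recap_text):
    """Return {host: changed_count} for hosts whose run reported any change."""
    offenders = {}
    for line in recap_text.splitlines():
        match = RECAP_LINE.match(line.strip())
        if match and int(match.group("changed")) > 0:
            offenders[match.group("host")] = int(match.group("changed"))
    return offenders


if __name__ == "__main__":
    second_run = """
PLAY RECAP *********************************************************************
instance-a   : ok=12   changed=0    unreachable=0    failed=0    skipped=1
instance-b   : ok=9    changed=2    unreachable=0    failed=0    skipped=0
"""
    # A role is idempotent only if this mapping is empty after the second run.
    print(hosts_with_changes(second_run))
```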

@ -1,5 +1,9 @@
# ADR-009 — Terraform ↔ Ansible provisioning handoff
## Status
Accepted (2026-05-30)
## Context
Two tools touch every managed host. Terraform owns **what exists** — VMs on
@ -14,7 +18,9 @@ the cloud-init template that VMs are cloned from. This ADR covers how they conne
---
## The boundary
## Decision
### The boundary
| Layer | Tool | Notes |
|---|---|---|
@ -31,7 +37,7 @@ below).
---
## The handoff pipeline
### The handoff pipeline
There is one path by which a managed host comes into existence and reaches its
configured state:
@ -55,7 +61,7 @@ this pipeline — **never** by hand-editing the inventory.
---
## The data contract
### The data contract
The seam's interface is a single Terraform output consumed by a single script.
@ -88,7 +94,7 @@ Terraform, and the inventory is regenerated, never edited.
---
## Cloud-init's role
### Cloud-init's role
Cloud-init is the thin first-boot layer between Terraform and Ansible:
@ -103,7 +109,7 @@ The line is sharp: cloud-init buys *reachability*, Ansible owns *configuration*.
---
## Internal DNS — owned by Ansible, no chicken-and-egg
### Internal DNS — owned by Ansible, no chicken-and-egg
Terraform writes **no** DNS records. The internal zone (`boma.baobab.band`) is
rendered entirely by the Ansible `dns` role:
@ -129,7 +135,7 @@ convention only — it no longer implies any difference in how records are writt
---
## The control-node exception
### The control-node exception
The control node — the host that runs Terraform and Ansible — is `ubongo`, a
dedicated **physical** machine outside the cluster. It is not a VM at all, so
@ -146,7 +152,7 @@ Every other host is Terraform-managed.
---
## What was ruled out
### What was ruled out
| Option | Reason |
|---|---|
@ -154,3 +160,25 @@ Every other host is Terraform-managed.
| Hand-editing the generated inventory | `hosts.yml` is a build artifact of `tf_to_inventory.py`; edits are overwritten on the next `make tf-inventory`. Edit `local.vms` instead. |
| Documenting the seam in both ADR-005 and ADR-006 | The boundary belongs in exactly one place. Those ADRs link here. |
| Terraform-managed DNS records (`hashicorp/dns` + RFC 2136) | Created a bootstrap cycle (the first DNS server can't register itself) and split DNS ownership across two tools. Ansible owns the whole internal zone instead — one owner, no cycle. |
## Consequences
Drawn from the boundary, the data contract, and the "What was ruled out" section above:
- Adding a host means editing `local.vms` and running the handoff pipeline; the
generated `hosts.yml` is a build artifact and must never be hand-edited — manual
edits are overwritten on the next `make tf-inventory` (The handoff pipeline; The
data contract; What was ruled out).
- Manual `qm clone` is rejected as a general provisioning path so the inventory and
real infrastructure cannot drift; Terraform is the single way VMs come into
existence (What was ruled out).
- Terraform writes no DNS records: the Ansible `dns` role renders the whole internal
zone from inventory plus `group_vars`, dissolving the bootstrap cycle a
Terraform-managed zone (`hashicorp/dns` + RFC 2136) would create (Internal DNS —
owned by Ansible, no chicken-and-egg; What was ruled out).
- The control node (`ubongo`) is the single documented exception to "Terraform owns
VM existence" — a physical machine provisioned manually and managed by Ansible for
baseline config only; every other host is Terraform-managed (The control-node
exception).
- The seam is documented in exactly one place (this ADR); ADR-005 and ADR-006 link
here rather than restating it (What was ruled out).
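
The "inventory is a build artifact" consequence can be illustrated with a minimal generator. This is a sketch, not the project's `tf_to_inventory.py`: the shape of the `vms` output (a mapping of host name to `ip` and `groups`) is an assumption, as the ADR excerpt does not show it, and the sample host is invented.

```python
import json


def build_inventory(vms):
    """Turn a {name: {"ip": ..., "groups": [...]}} mapping into an Ansible-style inventory dict."""
    inventory = {"all": {"children": {}}}
    for name in sorted(vms):
        vm = vms[name]
        for group in vm["groups"]:
            # Each group becomes a child of "all" with its own hosts mapping.
            hosts = inventory["all"]["children"].setdefault(group, {"hosts": {}})["hosts"]
            hosts[name] = {"ansible_host": vm["ip"]}
    return inventory


if __name__ == "__main__":
    # Stand-in for the JSON printed by `terraform output -json vms`;
    # the host name, address, and group are hypothetical.
    sample = json.loads('{"example-vm": {"ip": "10.10.0.50", "groups": ["example"]}}')
    print(json.dumps(build_inventory(sample), indent=2))
```

Because the output is a pure function of the Terraform data, regenerating it always overwrites manual edits, which is why changes belong in `local.vms`.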