docs(adr): restructure ADRs 006-009 to ADR-023 conformance

Add dated Status sections, a Decision umbrella over the existing topical sections (demoted to ###), and Consequences assembled from each ADR's already-stated implications. No decision substance changed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
parent 188882449d
commit 44dbd4628f
4 changed files with 138 additions and 26 deletions
@@ -1,5 +1,9 @@
 # ADR-006 — Terraform for infrastructure provisioning

+## Status
+
+Accepted (2026-05-30)
+
 ## Context

 Ansible manages host configuration well but has no state model for infrastructure
@@ -13,7 +17,9 @@ exact boundary, handoff pipeline, and data contract between them live in **ADR-0

 ---

-## Responsibility split
+## Decision
+
+### Responsibility split

 The canonical responsibility-split table lives in **ADR-009**. In short: Terraform
 owns VM existence only; Ansible owns everything inside a VM, including all internal
@@ -26,7 +32,7 @@ cadence, making them a poor fit for Terraform state.

 ---

-## Providers
+### Providers

 **`bpg/proxmox` (`~> 0.70`)**: Chosen over `telmate/proxmox` for active maintenance,
 full Proxmox 8 API support, and better cloud-init integration. This is the only
@@ -42,7 +48,7 @@ Terraform manages its own provider dependencies via `required_providers` and

 ---

-## State backend
+### State backend

 **Choice**: Local state on the control node.

@@ -59,7 +65,7 @@ integration boundary.

 ---

-## Structure
+### Structure

 ```
 terraform/
@@ -83,7 +89,7 @@ Each environment directory contains:

 ---

-## Secrets handling
+### Secrets handling

 The only secret input (the Proxmox API token) is passed via a `TF_VAR_*`
 environment variable and declared `sensitive = true` in `variables.tf`. It never
@@ -92,7 +98,7 @@ appears in `.tfvars` files. Non-secret configuration lives in tracked

 ---

-## Ansible integration
+### Ansible integration

 After `terraform apply`, run `make tf-inventory TF_ENV=<env>` to regenerate
 `inventories/<env>/hosts.yml` from the `vms` output. The full handoff pipeline,
@@ -102,7 +108,7 @@ handoff)**.

 ---

-## What was ruled out
+### What was ruled out

 | Option | Reason |
 |---|---|
@@ -110,3 +116,24 @@ handoff)**.
 | OPNsense Terraform provider | Community-maintained; provider rot risk across OPNsense releases |
 | Terraform workspaces | Single state file with workspace prefix; accidental cross-env apply possible |
 | Separate Terraform repo | Cross-referencing between infra and config adds friction; monorepo keeps the full picture together |
+
+## Consequences
+
+Drawn from the "What was ruled out" section and the decisions stated above:
+
+- `bpg/proxmox` is the only provider; `telmate/proxmox` was ruled out for weaker
+maintenance and Proxmox 8 / cloud-init support (Providers; What was ruled out).
+- OPNsense stays entirely in Ansible — no Terraform OPNsense provider — to avoid
+community-provider rot across OPNsense releases (Responsibility split; What was
+ruled out).
+- Terraform writes no DNS records; Ansible's `dns` role owns the entire internal
+zone, avoiding the bootstrap cycle and split DNS ownership the earlier
+`hashicorp/dns` design created (Providers).
+- State is local on the control node because Forgejo offers no usable HTTP state
+backend; this is sufficient at solo-operator scale (no concurrent applies, no
+remote locking), with a real backend such as MinIO/S3 to be added later if
+warranted (State backend).
+- Separate environment directories are used instead of Terraform workspaces to
+remove the risk of applying the wrong state (Structure; What was ruled out).
+- Terraform and Ansible internals are kept in one monorepo rather than a separate
+Terraform repo to avoid cross-referencing friction (What was ruled out).
@@ -1,5 +1,9 @@
 # ADR-007 — Network topology and addressing

+## Status
+
+Accepted (2026-05-30)
+
 ## Context

 The boma homelab is a Proxmox cluster on a dedicated private network behind an
@@ -10,7 +14,9 @@ and OPNsense configuration.

 ---

-## Physical topology
+## Decision
+
+### Physical topology

 ```
 ISP
@@ -38,7 +44,7 @@ ISP

 ---

-## VLAN design
+### VLAN design

 | VLAN | Name | Subnet | Purpose |
 |---|---|---|---|
@@ -51,7 +57,7 @@ ISP

 ---

-## IP addressing
+### IP addressing

 ### VLAN 10 — mgmt (10.10.0.0/24) — no DHCP

@@ -121,7 +127,7 @@ NetBird self-hosted on `askari`. NetBird manages its own overlay addressing

 ---

-## OPNsense firewall rules (intent)
+### OPNsense firewall rules (intent)

 | Source | Destination | Policy |
 |---|---|---|
@@ -142,7 +148,7 @@ IoT devices cannot initiate connections to `srv`.

 ---

-## Naming scheme
+### Naming scheme

 | Layer | Convention | Examples |
 |---|---|---|
@@ -155,7 +161,7 @@ IoT devices cannot initiate connections to `srv`.

 ---

-## DNS zones and split-horizon
+### DNS zones and split-horizon

 **Internal zone**: `boma.baobab.band` — served by `dns1` and `dns2`.
 The zone is rendered by the Ansible `dns` role: host A records come from the
@@ -175,7 +181,7 @@ All other queries go upstream (e.g., `1.1.1.1`, `9.9.9.9`).

 ---

-## External monitoring — askari
+### External monitoring — askari

 `askari` (Hetzner VPS) is a peer on the **NetBird mesh** (ADR-016) and also **hosts
 the self-hosted NetBird coordinator** (management/signal/relay). It reaches `srv`
@@ -186,3 +192,24 @@ ACLs — no OPNsense WireGuard tunnel and no `10.99.0.0/24` routing.
 be reachable even when the homelab is down (its entire purpose), which is also why
 the mesh coordinator lives here: an off-site control plane survives a homelab outage.
 FQDN: `askari.baobab.band`.
+
+---
+
+## Consequences
+
+Drawn from the implications already stated above:
+
+- VLAN 99 (`vpn`, `10.99.0.0/24`) is retired and the subnet freed; remote access is
+carried by the self-hosted NetBird mesh instead of an OPNsense WireGuard subnet
+(VLAN design; IP addressing — VLAN 99 retired).
+- Mesh-peer firewall allowances (to `srv` metrics ports and `mgmt`) are enforced by
+NetBird ACLs, not OPNsense rules (OPNsense firewall rules (intent)).
+- IoT devices cannot initiate connections to `srv`; only Home Assistant at
+`10.20.0.13` may reach the IoT VLAN, with OPNsense Avahi bridging `srv` ↔ `iot`
+for discovery (OPNsense firewall rules (intent)).
+- Terraform writes no DNS records; the Ansible `dns` role renders the internal zone
+from inventory plus `group_vars`, with `dns1`/`dns2` serving split-horizon answers
+(DNS zones and split-horizon).
+- `askari` runs independently of the cluster so it survives a homelab outage, which
+is why the off-site NetBird control plane lives there (External monitoring —
+askari).
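
To make the DNS consequence above concrete, here is a minimal Python sketch of rendering A records for the internal zone from a host-to-address map. It is not the `dns` role's actual template: the function name `render_a_records`, the input shape, and the `192.0.2.x` documentation addresses are assumptions for illustration. Only the zone name `boma.baobab.band` and the `dns1`/`dns2` hostnames come from the ADR.

```python
def render_a_records(zone, hosts):
    """Return zone-file A record lines, one per host, sorted by hostname.

    `hosts` maps a short hostname to an IPv4 address string. This shape is
    hypothetical; the real role reads inventory plus `group_vars`.
    """
    origin = zone.rstrip(".") + "."  # fully qualified names end with a dot
    return [f"{name}.{origin} IN A {hosts[name]}" for name in sorted(hosts)]


if __name__ == "__main__":
    # Placeholder addresses from the RFC 5737 documentation range.
    sample = {"dns2": "192.0.2.12", "dns1": "192.0.2.11"}
    for line in render_a_records("boma.baobab.band", sample):
        print(line)
```

Sorting the hostnames keeps the rendered zone stable between runs, so a re-render with unchanged inputs produces no diff.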
@@ -3,6 +3,10 @@
 > Practical point-of-use pitfalls (nft render checks, Molecule `community.docker`,
 > apply-path coverage blind spots) live in `docs/testing/gotchas.md`.

+## Status
+
+Accepted (2026-05-30)
+
 ## Context

 Ansible roles must be idempotent and correct before they touch production hosts.
@@ -11,7 +15,9 @@ This document records the testing strategy, what each level covers, and — crit

 ---

-## Three testing levels
+## Decision
+
+### Three testing levels

 ### Level 1 — Molecule (per role, always required)

@@ -78,7 +84,7 @@ deploy (STATUS.md). Full design: ADR-017.

 ---

-## Molecule test image
+### Molecule test image

 **No external images.** The project builds and hosts its own test image.

@@ -103,7 +109,7 @@ functionally equivalent and fully owned.

 ---

-## Idempotency requirements
+### Idempotency requirements

 Every role task must satisfy one of these:

@@ -121,7 +127,7 @@ catches anything lint misses.

 ---

-## What Molecule tests — and what it does not
+### What Molecule tests — and what it does not

 ### Tested in Molecule

@@ -161,7 +167,7 @@ Behavioural correctness is confirmed on staging.

 ---

-## CI pipeline
+### CI pipeline

 ```
 push to main
@@ -178,3 +184,27 @@ promote to production

 Manual gates are intentional. Automated tests prove correctness in isolation;
 a human confirms the change is safe to promote.
+
+---
+
+## Consequences
+
+Drawn from the limitations and trade-offs already stated above:
+
+- The Molecule idempotency step is non-negotiable; every role must pass it cleanly
+(Three testing levels — Level 1).
+- A class of capabilities (nftables rule loading, NetBird mesh data plane,
+unattended-upgrades behaviour, OPNsense DHCP, Avahi mDNS reflection, hardware
+passthrough, corosync cluster formation) cannot be verified in Molecule and is
+validated only at Level 2 (staging) or Level 3 (external) — a conscious,
+documented decision, not a gap (What Molecule tests — and what it does not).
+- The project builds and hosts its own `molecule-debian13` image rather than relying
+on an external Docker Hub image (e.g. geerlingguy), accepting the maintenance of a
+custom image to avoid drift, disappearance, or unexpected changes outside project
+control (Molecule test image).
+- Level 4 service-UI acceptance is authorable now but its execution is deferred,
+pending `ubongo`, the `playwright` plugin, Authentik, and a staging deploy (Three
+testing levels — Level 4).
+- Promotion to staging and to production stays behind intentional manual approval
+gates; automation proves isolated correctness, a human confirms promotion safety
+(CI pipeline).
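
What passing the idempotency step cleanly means can be sketched in Python: a second run of the role should report no changed tasks. This is an illustrative check over an Ansible play recap, not Molecule's own implementation. The helper name `is_idempotent` and the sample recap line are hypothetical.

```python
import re

# Matches the per-host "changed=<n>" counter in an Ansible PLAY RECAP line.
CHANGED_RE = re.compile(r"\bchanged=(\d+)")


def is_idempotent(recap: str) -> bool:
    """Return True only if every host in the recap reports changed=0.

    A recap with no counters at all is treated as a failure, because it
    proves nothing about the second run.
    """
    counts = [int(n) for n in CHANGED_RE.findall(recap)]
    return bool(counts) and all(n == 0 for n in counts)


if __name__ == "__main__":
    # Hypothetical recap line from a second converge of the same role.
    second_run = "instance : ok=12 changed=0 unreachable=0 failed=0 skipped=1"
    print(is_idempotent(second_run))
```

Treating an empty recap as a failure is a deliberate choice in this sketch: a check that passes when nothing ran would hide a broken test harness.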
@@ -1,5 +1,9 @@
 # ADR-009 — Terraform ↔ Ansible provisioning handoff

+## Status
+
+Accepted (2026-05-30)
+
 ## Context

 Two tools touch every managed host. Terraform owns **what exists** — VMs on
@@ -14,7 +18,9 @@ the cloud-init template that VMs are cloned from. This ADR covers how they conne

 ---

-## The boundary
+## Decision
+
+### The boundary

 | Layer | Tool | Notes |
 |---|---|---|
@@ -31,7 +37,7 @@ below).

 ---

-## The handoff pipeline
+### The handoff pipeline

 There is one path by which a managed host comes into existence and reaches its
 configured state:
@@ -55,7 +61,7 @@ this pipeline — **never** by hand-editing the inventory.

 ---

-## The data contract
+### The data contract

 The seam's interface is a single Terraform output consumed by a single script.

@@ -88,7 +94,7 @@ Terraform, and the inventory is regenerated, never edited.

 ---

-## Cloud-init's role
+### Cloud-init's role

 Cloud-init is the thin first-boot layer between Terraform and Ansible:

@@ -103,7 +109,7 @@ The line is sharp: cloud-init buys *reachability*, Ansible owns *configuration*.

 ---

-## Internal DNS — owned by Ansible, no chicken-and-egg
+### Internal DNS — owned by Ansible, no chicken-and-egg

 Terraform writes **no** DNS records. The internal zone (`boma.baobab.band`) is
 rendered entirely by the Ansible `dns` role:
@@ -129,7 +135,7 @@ convention only — it no longer implies any difference in how records are writt

 ---

-## The control-node exception
+### The control-node exception

 The control node — the host that runs Terraform and Ansible — is `ubongo`, a
 dedicated **physical** machine outside the cluster. It is not a VM at all, so
@@ -146,7 +152,7 @@ Every other host is Terraform-managed.

 ---

-## What was ruled out
+### What was ruled out

 | Option | Reason |
 |---|---|
@@ -154,3 +160,25 @@ Every other host is Terraform-managed.
 | Hand-editing the generated inventory | `hosts.yml` is a build artifact of `tf_to_inventory.py`; edits are overwritten on the next `make tf-inventory`. Edit `local.vms` instead. |
 | Documenting the seam in both ADR-005 and ADR-006 | The boundary belongs in exactly one place. Those ADRs link here. |
 | Terraform-managed DNS records (`hashicorp/dns` + RFC 2136) | Created a bootstrap cycle (the first DNS server can't register itself) and split DNS ownership across two tools. Ansible owns the whole internal zone instead — one owner, no cycle. |
+
+## Consequences
+
+Drawn from the boundary, the data contract, and the "What was ruled out" section above:
+
+- Adding a host means editing `local.vms` and running the handoff pipeline; the
+generated `hosts.yml` is a build artifact and must never be hand-edited — manual
+edits are overwritten on the next `make tf-inventory` (The handoff pipeline; The
+data contract; What was ruled out).
+- Manual `qm clone` is rejected as a general provisioning path so the inventory and
+real infrastructure cannot drift; Terraform is the single way VMs come into
+existence (What was ruled out).
+- Terraform writes no DNS records: the Ansible `dns` role renders the whole internal
+zone from inventory plus `group_vars`, dissolving the bootstrap cycle a
+Terraform-managed zone (`hashicorp/dns` + RFC 2136) would create (Internal DNS —
+owned by Ansible, no chicken-and-egg; What was ruled out).
+- The control node (`ubongo`) is the single documented exception to "Terraform owns
+VM existence" — a physical machine provisioned manually and managed by Ansible for
+baseline config only; every other host is Terraform-managed (The control-node
+exception).
+- The seam is documented in exactly one place (this ADR); ADR-005 and ADR-006 link
+here rather than restating it (What was ruled out).
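
The rule that `hosts.yml` is regenerated and never edited can be illustrated with a minimal Python sketch of the transformation. The real `tf_to_inventory.py` is not shown in this diff, so everything here is assumed: the shape of the `vms` output (hostname mapped to an `ip` and a list of `groups`), the function name `vms_to_inventory`, and the sample hosts and addresses.

```python
import json


def vms_to_inventory(vms):
    """Build an Ansible-style inventory dict from a Terraform `vms` output map.

    Assumed (hypothetical) input shape:
        {"<hostname>": {"ip": "<address>", "groups": ["<group>", ...]}}
    """
    children = {}
    for name in sorted(vms):  # sorted so regeneration is deterministic
        vm = vms[name]
        for group in vm.get("groups", ["ungrouped"]):
            hosts = children.setdefault(group, {"hosts": {}})["hosts"]
            hosts[name] = {"ansible_host": vm["ip"]}
    return {"all": {"children": children}}


if __name__ == "__main__":
    # Placeholder data; in practice the map could come from
    # `terraform output -json vms`.
    sample = {
        "dns1": {"ip": "192.0.2.11", "groups": ["dns"]},
        "dns2": {"ip": "192.0.2.12", "groups": ["dns"]},
    }
    # Printed as JSON to stay within the standard library.
    print(json.dumps(vms_to_inventory(sample), indent=2))
```

Because the result depends only on its input, rerunning the generator after every apply is safe, and a hand edit to the output has nowhere to persist.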