From d8afa94c4b42bfadb5fee553778ad7d877b8f559 Mon Sep 17 00:00:00 2001
From: sjat
Date: Fri, 5 Jun 2026 18:54:54 +0200
Subject: [PATCH] Name and propagate the offsite_hosts inventory group (askari)

Review O4: ADR-016 said askari gets "its own inventory group" but never
named it. Settled as offsite_hosts (off-site, distinct from
on-site-but-off-cluster ubongo). Added to VALID_GROUPS
(tf_to_inventory.py), ADR-009 valid groups, ADR-001/ADR-016 host-group
enumerations, and CLAUDE.md. Generated hosts.yml picks up the section on
the next make tf-inventory (a manual-exception group like control).

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 CLAUDE.md                                  | 7 +++++--
 docs/decisions/001-architecture.md         | 7 +++++--
 docs/decisions/009-provisioning-handoff.md | 7 ++++++-
 docs/decisions/016-mesh-vpn.md             | 3 ++-
 scripts/tf_to_inventory.py                 | 6 ++++--
 5 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/CLAUDE.md b/CLAUDE.md
index e55f4a6..1d98b3c 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -101,14 +101,17 @@ inventories/
         vault.yml
       docker_hosts/       # hosts running Docker services
       proxmox_hosts/      # Proxmox nodes themselves
+      offsite_hosts/      # off-site hosts (askari) — NetBird coordinator + watchdog
     host_vars/            # per-host overrides
   staging/                # safe to run freely
 ```
 
-Host groups: `all`, `control`, `docker_hosts`, `proxmox_hosts`
+Host groups: `all`, `control`, `docker_hosts`, `proxmox_hosts`, `offsite_hosts`
 
 (`control` holds `ubongo`, the one manually-provisioned **physical** control node
-outside the cluster — see ADR-009 and ADR-015.)
+outside the cluster; `offsite_hosts` holds `askari`, the off-site Hetzner host that
+runs the NetBird coordinator + watchdog — also added manually. See ADR-009, ADR-015,
+ADR-016.)
 
 ---
 
diff --git a/docs/decisions/001-architecture.md b/docs/decisions/001-architecture.md
index cac5be9..adc3dbc 100644
--- a/docs/decisions/001-architecture.md
+++ b/docs/decisions/001-architecture.md
@@ -35,12 +35,15 @@ describes the *intended* design — see STATUS.md for what is actually built.
 all
 ├── control         # ubongo — physical control node outside the cluster; baseline config only, runs no services
 ├── docker_hosts    # VMs running Docker services (most hosts)
-└── proxmox_hosts   # Proxmox nodes themselves (limited management scope)
+├── proxmox_hosts   # Proxmox nodes themselves (limited management scope)
+└── offsite_hosts   # askari (off-site Hetzner) — NetBird coordinator + external watchdog
 ```
 
 The `control` group holds the single manually-provisioned control node; it is
 managed for baseline config (SSH, firewall, updates) but never runs the
-`docker_host` role. Proxmox nodes are managed only for basic baseline tasks (SSH).
+`docker_host` role. The `offsite_hosts` group holds `askari`, the off-site Hetzner
+host — also manually provisioned (ADR-016), managed for baseline config plus the
+`netbird_coordinator` service role. Proxmox nodes are managed only for basic baseline tasks (SSH).
 Proxmox configuration itself (storage, clustering, networking) is out of scope.
 
 
diff --git a/docs/decisions/009-provisioning-handoff.md b/docs/decisions/009-provisioning-handoff.md
index 0b9cc42..abb0173 100644
--- a/docs/decisions/009-provisioning-handoff.md
+++ b/docs/decisions/009-provisioning-handoff.md
@@ -75,7 +75,12 @@ The seam's interface is a single Terraform output consumed by a single script.
 `terraform output -json` and writes `inventories//hosts.yml`. It validates the
 group against the allowed set and fails loudly on an unknown group.
 
-**Valid groups**: `control`, `docker_hosts`, `proxmox_hosts`.
+**Valid groups**: `control`, `docker_hosts`, `proxmox_hosts`, `offsite_hosts`.
+
+`control` and `offsite_hosts` are not produced by Terraform — they hold manually
+provisioned hosts (`ubongo` and `askari` respectively) added to the inventory by hand
+(see the control-node exception below and ADR-015/ADR-016). They are valid groups so
+the generated `hosts.yml` carries their (otherwise empty) sections.
 
 The generated `hosts.yml` carries a "do not edit manually" header and is owned by the
 generator. Treat it as a build artifact: the source of truth is `local.vms` in
diff --git a/docs/decisions/016-mesh-vpn.md b/docs/decisions/016-mesh-vpn.md
index a322c0b..a317d01 100644
--- a/docs/decisions/016-mesh-vpn.md
+++ b/docs/decisions/016-mesh-vpn.md
@@ -77,7 +77,8 @@ allocated for it.
 - **Coordinator survival:** off-site on `askari` ⇒ mesh survives a homelab outage.
   NetBird's management datastore is backed up encrypted off `askari` (synced to
   `ubongo`/`mamba`); peers keep last-known config through a brief coordinator outage.
-- **`askari` is Ansible-managed:** its own inventory group, `base` role, plus a
+- **`askari` is Ansible-managed:** its own inventory group `offsite_hosts` (added
+  manually like the control node — it is not Terraform-managed), `base` role, plus a
   dedicated `netbird_coordinator` service role (one service = one role, ADR-004; with
   `SECURITY.md`). Agent install/enrollment lives in `base`. NetBird server +
   agents are version-pinned (ADR-011). boma's `dns` role stays authoritative for
diff --git a/scripts/tf_to_inventory.py b/scripts/tf_to_inventory.py
index b9a0959..aaed934 100644
--- a/scripts/tf_to_inventory.py
+++ b/scripts/tf_to_inventory.py
@@ -15,13 +15,15 @@ Expected Terraform output shape:
   }
 }
 
-Valid groups: control, docker_hosts, proxmox_hosts
+Valid groups: control, docker_hosts, proxmox_hosts, offsite_hosts
+(control and offsite_hosts hold manually-provisioned hosts not in Terraform; they
+are valid so their sections appear in the generated inventory — see ADR-009.)
 """
 
 import json
 import sys
 
-VALID_GROUPS = {"control", "docker_hosts", "proxmox_hosts"}
+VALID_GROUPS = {"control", "docker_hosts", "proxmox_hosts", "offsite_hosts"}
 
 
 def main() -> None:
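
The group check described in ADR-009 and in the `tf_to_inventory.py` docstring can be sketched as follows. This is a hypothetical illustration, not the script's code. The per-host input shape (`{"NAME": {"ansible_host": ..., "group": ...}}`), the `build_inventory` function, and the `UnknownGroupError` exception are all assumptions, because the patch shows neither the script body nor the full Terraform output shape. The sketch demonstrates the two behaviors the patch relies on. First, an unknown group fails loudly. Second, every group in `VALID_GROUPS` gets a section, including the manual-exception groups `control` and `offsite_hosts`, even when Terraform supplies no hosts for it.

```python
"""Hypothetical sketch of the group validation in tf_to_inventory.py.

ASSUMPTION: Terraform hosts arrive as
{"NAME": {"ansible_host": "IP", "group": "GROUP"}}; the real output
shape and script body are not shown in the patch.
"""
import json
import sys

VALID_GROUPS = {"control", "docker_hosts", "proxmox_hosts", "offsite_hosts"}


class UnknownGroupError(ValueError):
    """Hypothetical error for a host whose group is not in VALID_GROUPS."""


def build_inventory(tf_hosts: dict[str, dict[str, str]]) -> dict:
    """Return an Ansible YAML-inventory-shaped dict (all -> children -> group)."""
    # Seed every valid group with an empty host map, so the manual-exception
    # groups (control, offsite_hosts) keep a section even though Terraform
    # never assigns hosts to them.
    children = {group: {"hosts": {}} for group in sorted(VALID_GROUPS)}
    for name, attrs in sorted(tf_hosts.items()):
        group = attrs.get("group")
        if group not in VALID_GROUPS:
            # Fail loudly instead of silently dropping the host.
            raise UnknownGroupError(
                f"host {name!r} has unknown group {group!r}; "
                f"valid groups: {', '.join(sorted(VALID_GROUPS))}"
            )
        children[group]["hosts"][name] = {"ansible_host": attrs["ansible_host"]}
    return {"all": {"children": children}}


if __name__ == "__main__":
    # Hypothetical Terraform-managed VM; ubongo and askari never appear in this input.
    example = {"vm-a": {"ansible_host": "10.0.0.10", "group": "docker_hosts"}}
    try:
        # JSON keeps the sketch stdlib-only; the real generator writes YAML.
        print(json.dumps(build_inventory(example), indent=2))
    except UnknownGroupError as err:
        sys.exit(f"error: {err}")
```

The key design choice is seeding every valid group before iterating over the Terraform hosts. That step keeps `control` and `offsite_hosts` present in the output, nested under `all.children` as in Ansible's YAML inventory layout, even though no Terraform host is ever assigned to them.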