boma/docs/decisions/001-architecture.md
sjat d8afa94c4b Name and propagate the offsite_hosts inventory group (askari)
Review O4: ADR-016 said askari gets "its own inventory group" but never named it.
Settled as offsite_hosts (off-site, distinct from on-site-but-off-cluster ubongo).
Added to VALID_GROUPS (tf_to_inventory.py), ADR-009 valid groups, ADR-001/ADR-016
host-group enumerations, and CLAUDE.md. Generated hosts.yml picks up the section on
the next make tf-inventory (a manual-exception group like control).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 18:54:54 +02:00


ADR-001 — Architecture overview

Context

This document describes the overall architecture of the homelab infrastructure and the boundaries of what this Ansible monorepo manages.

Infrastructure

  • Hypervisor: Proxmox cluster (2+ nodes)
  • Guest OS: Debian 13 (all managed hosts)
  • Scale: 25 VMs, small fleet — treated as individuals, not cattle
  • Control node: ubongo — a dedicated always-on physical x86-64 machine outside the cluster. Ansible runs from here. It cannot be created by the Terraform it hosts, so it is provisioned manually (see ADR-015 and docs/runbooks/new-host.md).

What this repo manages

| Layer | Managed by | Notes |
|---|---|---|
| VM existence | Terraform (terraform/) | Clones the cloud-init template; ubongo (control node) is a physical box outside the cluster, the one manual exception (see ADR-009/ADR-015) |
| Internal DNS records | Ansible dns role | Internal zone rendered from inventory (see ADR-007/009) |
| OS baseline | Ansible base role | Users, SSH, firewall, updates, audit |
| Docker runtime | Ansible docker_host role | Engine, daemon config, log driver |
| Service deployment | Ansible per-service roles | Compose rendered from templates |
| Secrets | Ansible Vault | Encrypted vault.yml files in repo |

The Terraform↔Ansible boundary and handoff are defined in ADR-009. This table describes the intended design — see STATUS.md for what is actually built.

Host groups

all
├── control           # ubongo — physical control node outside the cluster; baseline config only, runs no services
├── docker_hosts      # VMs running Docker services (most hosts)
├── proxmox_hosts     # Proxmox nodes themselves (limited management scope)
└── offsite_hosts     # askari (off-site Hetzner) — NetBird coordinator + external watchdog

The control group holds the single manually-provisioned control node; it is managed for baseline config (SSH, firewall, updates) but never runs the docker_host role. The offsite_hosts group holds askari, the off-site Hetzner host — also manually provisioned (ADR-016), managed for baseline config plus the netbird_coordinator service role. Proxmox nodes are managed only for basic baseline tasks (SSH). Proxmox configuration itself (storage, clustering, networking) is out of scope.
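
The group rules in the paragraph above can be sketched as a small lookup. This is only an illustration: `GROUP_ROLES` and `roles_for` are hypothetical names, the contents of `VALID_GROUPS` are assumed from the host-group tree rather than copied from `tf_to_inventory.py`, and giving `docker_hosts` both `base` and `docker_host` is an assumption read off the layer table.

```python
# Hypothetical sketch of the group-to-role rules described in this ADR.
# Group names come from the host-group tree; the mapping itself is illustrative
# and is not taken from the real playbooks or from tf_to_inventory.py.
VALID_GROUPS = {"control", "docker_hosts", "proxmox_hosts", "offsite_hosts"}

GROUP_ROLES = {
    # ubongo: baseline config only, never the docker_host role
    "control": ["base"],
    # assumption: Docker VMs get the OS baseline plus the Docker runtime
    "docker_hosts": ["base", "docker_host"],
    # Proxmox nodes: limited baseline scope (SSH); Proxmox config is out of scope
    "proxmox_hosts": ["base"],
    # askari: baseline plus the netbird_coordinator service role
    "offsite_hosts": ["base", "netbird_coordinator"],
}


def roles_for(group: str) -> list[str]:
    """Return the roles applied to a group, rejecting unknown group names."""
    if group not in VALID_GROUPS:
        raise ValueError(f"unknown inventory group: {group!r}")
    return list(GROUP_ROLES[group])
```

Rejecting unknown names up front mirrors the idea of a fixed set of valid groups: a typo in a group name fails loudly instead of silently producing a host with no roles.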

Service interaction model

Services run as Docker containers on one or more docker_hosts. Where services need to interact, they do so via:

  • Docker networks (same host)
  • Internal DNS / hostname resolution (cross-host)
  • Explicitly defined published ports (external access)
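
As a rough illustration of the three paths listed above, the choice depends only on where the caller sits relative to the target. The `Service` type, the `interaction_path` helper, and the host names in the usage note are hypothetical and exist only for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Service:
    name: str
    host: str  # inventory hostname of the docker_hosts member running it


def interaction_path(caller: Optional[Service], target: Service) -> str:
    """Pick the interaction mechanism; caller=None models an external client."""
    if caller is None:
        return "published port"   # external access
    if caller.host == target.host:
        return "docker network"   # same host
    return "internal DNS"         # cross-host
```

For example, two services on a hypothetical host `vm-a` would talk over a Docker network, while a caller on `vm-b` would resolve the target through internal DNS.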

All Compose files are rendered by Ansible from Jinja2 templates. No hand-edited Compose files exist on hosts — they are always regenerated on deploy.
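
The "always regenerated" rule can be sketched as follows. To stay dependency-free, this sketch uses Python's standard-library `string.Template` as a stand-in for Jinja2, and it writes to a local path rather than to a remote host. The template contents, the `deploy_compose` helper, and the service values in the usage note are all hypothetical.

```python
from pathlib import Path
from string import Template

# Stand-in for a Jinja2 template; the real templates live in the per-service roles.
COMPOSE_TEMPLATE = Template(
    "services:\n"
    "  $name:\n"
    "    image: $image\n"
    "    ports:\n"
    '      - "$port:$port"\n'
)


def deploy_compose(target: Path, name: str, image: str, port: int) -> str:
    """Render the Compose file and overwrite whatever is currently at target."""
    rendered = COMPOSE_TEMPLATE.substitute(name=name, image=image, port=port)
    target.write_text(rendered)  # unconditional overwrite: hand edits do not survive
    return rendered
```

Calling `deploy_compose(path, "whoami", "example/whoami:1.0", 8080)` twice, with a manual edit to the file in between, leaves the file identical to the first render, which is the property the paragraph above relies on.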

Decision

This architecture prioritises:

  • Simplicity: few moving parts, no orchestration layer (no Kubernetes, no Swarm)
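  • Reproducibility: any host can be rebuilt from scratch (Terraform for VM creation, Ansible for everything above it); ubongo and askari are the manually provisioned exceptions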
  • Legibility: a human reading the repo can understand what runs where