# ADR-001 — Architecture overview

## Context

This document describes the overall architecture of the homelab infrastructure
and the boundaries of what this Ansible monorepo manages.

## Infrastructure

- **Hypervisor**: Proxmox cluster (2+ nodes)
- **Guest OS**: Debian 13 (all managed hosts)
- **Scale**: 2–5 VMs, a small fleet treated as individuals, not cattle
- **Control node**: A dedicated Debian 13 VM on the cluster. Ansible runs from here.
  The control node is the one host that cannot fully bootstrap itself from scratch
  and requires manual initial setup (see `docs/runbooks/new-host.md`).

## What this repo manages

| Layer                | Managed by                 | Notes |
|----------------------|----------------------------|-------|
| VM existence         | Terraform (`terraform/`)   | Clones the cloud-init template; control node is the one manual exception (see ADR-009) |
| Internal DNS records | Ansible `dns` role         | Internal zone rendered from inventory (see ADR-007/009) |
| OS baseline          | Ansible `base` role        | Users, SSH, firewall, updates, audit |
| Docker runtime       | Ansible `docker_host` role | Engine, daemon config, log driver |
| Service deployment   | Ansible per-service roles  | Compose rendered from templates |
| Secrets              | Ansible Vault              | Encrypted `vault.yml` files in repo |

The Terraform↔Ansible boundary and handoff are defined in ADR-009. This table
describes the *intended* design; see STATUS.md for what is actually built.

## Host groups

```
all
├── control        # the control node itself: baseline config only, runs no services
├── docker_hosts   # VMs running Docker services (most hosts)
└── proxmox_hosts  # Proxmox nodes themselves (limited management scope)
```

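
As a rough illustration, the group tree could be expressed as a YAML inventory along the following lines. Only the three group names come from this ADR; every hostname is a hypothetical placeholder, and the real inventory may be laid out differently.

```yaml
# Sketch of a YAML inventory matching the group tree.
# All hostnames below are hypothetical placeholders.
all:
  children:
    control:
      hosts:
        ctl01.example.internal:      # the single manually provisioned control node
    docker_hosts:
      hosts:
        docker01.example.internal:   # VMs that run Docker services
        docker02.example.internal:
    proxmox_hosts:
      hosts:
        pve01.example.internal:      # Proxmox nodes, limited management scope
        pve02.example.internal:
```
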
The `control` group holds the single manually-provisioned control node; it is
managed for baseline config (SSH, firewall, updates) but never runs the
`docker_host` role. Proxmox nodes are managed only for basic baseline tasks (SSH).
Proxmox configuration itself (storage, clustering, networking) is out of scope.

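
One way the group-to-role mapping described here might look in a top-level playbook is sketched below. It reflects the intended design only: the role names `base` and `docker_host` and the group names are taken from this ADR, while the play names, the use of `become`, and the overall layout are assumptions. How the `dns` role and the SSH-only Proxmox baseline are wired in is not specified here, so both are left out.

```yaml
# Hypothetical playbook sketch; play structure is an assumption, not the repo's actual site.yml.
- name: OS baseline on the control node and Docker hosts
  hosts: control:docker_hosts
  become: true
  roles:
    - base          # users, SSH, firewall, updates, audit

- name: Docker runtime on service hosts only
  hosts: docker_hosts   # the control group is deliberately excluded
  become: true
  roles:
    - docker_host   # engine, daemon config, log driver
```

Keeping `docker_host` in its own play scoped to `docker_hosts` makes the "control node never runs Docker" rule visible in the playbook itself rather than relying on a conditional inside the role.
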
## Service interaction model

Services run as Docker containers on one or more `docker_hosts`. Where services
need to interact, they do so via:

- Docker networks (same host)
- Internal DNS / hostname resolution (cross-host)
- Explicitly defined published ports (external access)

All Compose files are rendered by Ansible from Jinja2 templates. No hand-edited
Compose files exist on hosts; they are always regenerated on deploy.

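
To make the three interaction paths concrete, a per-service Compose template might look roughly like the sketch below. The service name, network name, container port, and every `app_*` variable are hypothetical; the only things taken from this ADR are that the file is a Jinja2 template rendered by Ansible and that services interact over Docker networks, internal DNS names, and explicitly published ports.

```yaml
# Hypothetical Compose template (for example templates/compose.yml.j2).
# All names and variables here are illustrative assumptions.
services:
  app:
    image: "{{ app_image }}"
    restart: unless-stopped
    networks:
      - backend                           # same-host traffic uses a Docker network
    environment:
      # cross-host dependency, reached through its internal DNS name
      DATABASE_HOST: "{{ app_db_host }}"
    ports:
      - "{{ app_published_port }}:8080"   # explicitly published port for external access

networks:
  backend: {}
```

A role would typically render such a file with `ansible.builtin.template` on every deploy, which is what keeps the copy on the host from drifting away from the repo.
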
## Decision

This architecture prioritises:
- **Simplicity**: few moving parts, no orchestration layer (no Kubernetes, no Swarm)
- **Reproducibility**: any host can be rebuilt from scratch via Ansible (the control node needs its manual initial setup first)
- **Legibility**: a human reading the repo can understand what runs where