ADR-005 — Host bootstrapping
Context
This document defines the cloud-init template that managed VMs are cloned
from, and the control-node bootstrapping special case. The per-host
provisioning pipeline — how a VM is created from this template and handed off to
Ansible — is owned by ADR-009. Terraform clones the template defined here; the
template is the base image for Terraform-managed hosts. The control node (ubongo)
is a physical machine installed directly, not cloned from this template (ADR-015).
Approach: Proxmox cloud-init template
Managed VMs are cloned from a Proxmox VM template based on the official Debian 13 cloud image. Cloud-init handles first-boot configuration. Ansible takes over from there.
The cloud-init image was chosen over:
- Manual Debian installer: slow, error-prone, not reproducible
- Preseed/netboot: powerful but complex to maintain
Template creation (one-time, manual)
This is a manual procedure performed once per Proxmox cluster. It is documented
in docs/runbooks/new-host.md.
High-level steps:
- Download official Debian 13 genericcloud image
- Import disk to Proxmox, create VM template
- Install qemu-guest-agent in the template image
- Convert VM to template; never boot the template directly
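The high-level steps above could look roughly like the following on a Proxmox node. This is a minimal sketch, not the procedure from docs/runbooks/new-host.md: the VMID (9000), storage name (local-lvm), bridge (vmbr0), template name, and image filename are all assumptions, and the disk volume name assumes LVM-thin style storage. It also assumes the image has already been downloaded and that virt-customize (from libguestfs-tools) is available for installing the guest agent offline, so the template never has to boot.

```shell
#!/usr/bin/env bash
# Sketch of one-time template creation. All default values below are
# hypothetical placeholders, not values taken from this repository.
set -eu

TEMPLATE_VMID="${TEMPLATE_VMID:-9000}"
STORAGE="${STORAGE:-local-lvm}"
BRIDGE="${BRIDGE:-vmbr0}"
IMAGE="${IMAGE:-debian-13-genericcloud-amd64.qcow2}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

create_template() {
  # Install the guest agent into the image offline (no boot required).
  run virt-customize -a "$IMAGE" --install qemu-guest-agent
  # Create an empty VM shell with the guest agent option enabled.
  run qm create "$TEMPLATE_VMID" --name debian-13-cloudinit --memory 2048 \
    --net0 "virtio,bridge=$BRIDGE" --agent enabled=1
  # Import the cloud image as a disk and attach it.
  run qm importdisk "$TEMPLATE_VMID" "$IMAGE" "$STORAGE"
  run qm set "$TEMPLATE_VMID" --scsihw virtio-scsi-pci \
    --scsi0 "$STORAGE:vm-$TEMPLATE_VMID-disk-0"
  # Add the cloud-init drive, boot order, and a serial console.
  run qm set "$TEMPLATE_VMID" --ide2 "$STORAGE:cloudinit" \
    --boot order=scsi0 --serial0 socket --vga serial0
  # Convert to a template; it is never started as a VM.
  run qm template "$TEMPLATE_VMID"
}
```

Running `DRY_RUN=1 create_template` prints the six commands without touching the node, which is a cheap way to review them against the runbook before executing anything.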
VM provisioning (per new host)
Per-host VMs are created by Terraform, which clones this template and sets the
cloud-init values (hostname, SSH public key, IP/gateway). Cloud-init runs at first
boot (~30–60 seconds), leaving the VM reachable via SSH with the ansible user's key.
Terraform writes no DNS records — the dns role owns the internal zone (ADR-009).
The full create → inventory → configure pipeline, and the Terraform↔Ansible data
contract, are defined in ADR-009 (provisioning handoff). There is no manual
qm clone path for managed hosts — the sole exception is the control node below.
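Because cloud-init needs roughly 30 to 60 seconds at first boot, anything that runs right after Terraform may want to wait until the VM accepts SSH and cloud-init reports completion. The helper below is a hedged sketch of such a wait loop, not part of the documented pipeline: the function names, the 120-second default timeout, and the POLL_INTERVAL variable are illustrative assumptions. It relies on `cloud-init status --wait`, which blocks until cloud-init has finished.

```shell
#!/usr/bin/env bash
# Sketch: poll a freshly cloned VM until SSH works and cloud-init is done.
# Function names, timeout, and POLL_INTERVAL are hypothetical.
set -eu

# One probe: log in as the ansible user and block until cloud-init finishes.
ssh_probe() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 \
      -o StrictHostKeyChecking=accept-new \
      "ansible@$1" 'cloud-init status --wait' >/dev/null 2>&1
}

# wait_for_host HOST [TIMEOUT_SECONDS]
# Returns 0 once a probe succeeds, 1 if the deadline passes first.
wait_for_host() {
  local host="$1" timeout="${2:-120}" deadline
  deadline=$(( $(date +%s) + timeout ))
  until ssh_probe "$host"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1
    fi
    sleep "${POLL_INTERVAL:-5}"
  done
}
```

A caller might use it as `wait_for_host 10.0.0.5 180 || echo "host did not come up" >&2`, where the address is a placeholder.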
Ansible handoff
Once Terraform has created the VM and make tf-inventory has regenerated the
inventory, the bootstrap playbook handles first-run specifics (Python may not be
present, and the login user may differ), and the site playbook then applies the full
standard state. See ADR-009 for the end-to-end commands and
docs/runbooks/new-host.md for the full procedure.
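The order of the handoff can be sketched as a small wrapper. The authoritative commands live in ADR-009; here only `make tf-inventory` comes from this document, while the playbook paths (playbooks/bootstrap.yml, playbooks/site.yml), the use of `--limit`, and the function name are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch of the Terraform-to-Ansible handoff order for one new host.
# Playbook paths are hypothetical; check ADR-009 for the real commands.
set -eu

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# handoff HOST: regenerate inventory, bootstrap the host, apply full state.
handoff() {
  local host="$1"
  run make tf-inventory
  run ansible-playbook playbooks/bootstrap.yml --limit "$host"
  run ansible-playbook playbooks/site.yml --limit "$host"
}
```

With `set -e` in effect, a failing step stops the sequence, so the site playbook never runs against a host that failed bootstrap.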
Control node bootstrapping
The control node is a special case — it runs Terraform and Ansible, so it cannot
be created by the Terraform it hosts (chicken-and-egg). It is ubongo, a dedicated
physical machine outside the cluster, and the one documented exception to
Terraform-owned VM existence (see ADR-009 and ADR-015). The control node requires:
- Manual OS provisioning — install Debian 13 on the physical box by hand (it is not a Proxmox guest, so there is no template to clone)
- Manual setup of the Ansible environment:

      git clone <repo> ~/ansible
      cd ~/ansible
      make setup        # creates venv, installs deps
      make collections  # installs Ansible collections
      # set up rbw + unlock so the vault password resolves from Vaultwarden
      # (one-time, per docs/runbooks/rotate-secrets.md)
      rbw login && rbw unlock

- After that, the control node can manage all other hosts normally
ubongo is listed in inventories/production/hosts.yml under the control group
and can be managed for baseline config (SSH, firewall, updates) but not for the
docker_host role (it does not run services). Hardware target and recovery model
are in ADR-015.
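After the manual setup, a quick preflight check on ubongo can confirm that the tools the workflow depends on are present and that the rbw agent is unlocked. This is a sketch under assumptions: the function names and the tool list (git, make, rbw) are illustrative and inferred from the setup commands above, not an existing script in the repository. `rbw unlocked` exits with a nonzero status when the agent is locked.

```shell
#!/usr/bin/env bash
# Sketch: control-node preflight. Function names and tool list are hypothetical.
set -eu

# Print each named tool that is not found on PATH, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Fail if a required tool is absent or the vault password cannot resolve.
preflight() {
  local missing
  missing="$(missing_tools git make rbw)"
  if [ -n "$missing" ]; then
    echo "missing tools:" $missing >&2
    return 1
  fi
  if ! rbw unlocked; then
    echo "rbw is locked; run: rbw unlock" >&2
    return 1
  fi
}
```

Calling `preflight` before any playbook run gives an early, readable failure instead of a vault-password error partway through.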
Decision
Cloud-init with Proxmox templates provides:
- Reproducible VM creation in under 2 minutes
- No manual installer interaction
- A clean handoff point to Ansible
- Easy rebuilds — destroy VM, clone template, run Ansible