boma/docs/decisions/005-bootstrapping.md
sjat 188882449d docs(adr): restructure ADRs 001,002,004,005,012,014,015 to ADR-023 conformance
Add dated Status sections and (where missing) Consequences sections assembled
from each ADR's already-stated implications. No decision substance changed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 14:39:00 +02:00

ADR-005 — Host bootstrapping

Status

Accepted (2026-05-30)

Context

This document defines the cloud-init template that managed VMs are cloned from, and the special case of bootstrapping the control node. The per-host provisioning pipeline (how a VM is created from this template and handed off to Ansible) is owned by ADR-009; Terraform clones the template defined here as the base image for every Terraform-managed host. The control node (ubongo) is a physical machine installed directly, not cloned from this template (ADR-015).

Approach: Proxmox cloud-init template

Managed VMs are cloned from a Proxmox VM template based on the official Debian 13 cloud image. Cloud-init handles first-boot configuration. Ansible takes over from there.

The cloud-init image was chosen over:

  • Manual Debian installer: slow, error-prone, not reproducible
  • Preseed/netboot: powerful but complex to maintain

Template creation (one-time, manual)

This is a manual procedure performed once per Proxmox cluster. It is documented in docs/runbooks/new-host.md.

High-level steps:

  1. Download official Debian 13 genericcloud image
  2. Import disk to Proxmox, create VM template
  3. Install qemu-guest-agent in the template image
  4. Convert VM to template — never boot the template directly
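
The four steps above can be sketched as a dry-run command plan. This is an illustration under stated assumptions, not the runbook itself: the VM ID, storage name, bridge, template name, and image URL are hypothetical defaults, the disk volume name `vm-<id>-disk-0` assumes LVM-thin style storage, and the guest agent is installed offline with `virt-customize` (from libguestfs-tools) before the import so that the template never has to boot. The script only prints the commands; docs/runbooks/new-host.md remains the authoritative procedure.

```shell
#!/bin/sh
# Sketch of the one-time template build, intended for a Proxmox node.
# Hypothetical defaults: VMID, STORAGE, BRIDGE, IMAGE_URL, template name.
set -eu

VMID="${VMID:-9000}"
STORAGE="${STORAGE:-local-lvm}"
BRIDGE="${BRIDGE:-vmbr0}"
IMAGE_URL="${IMAGE_URL:-https://cloud.debian.org/images/cloud/trixie/latest/debian-13-genericcloud-amd64.qcow2}"
IMAGE="${IMAGE_URL##*/}"   # file name part of the URL

# Print the commands for the template build, one per line, without running them.
template_plan() {
    cat <<EOF
wget -O ${IMAGE} ${IMAGE_URL}
virt-customize -a ${IMAGE} --install qemu-guest-agent
qm create ${VMID} --name debian-13-cloudinit --memory 2048 --net0 virtio,bridge=${BRIDGE} --agent enabled=1
qm importdisk ${VMID} ${IMAGE} ${STORAGE}
qm set ${VMID} --scsihw virtio-scsi-pci --scsi0 ${STORAGE}:vm-${VMID}-disk-0
qm set ${VMID} --ide2 ${STORAGE}:cloudinit --boot order=scsi0 --serial0 socket --vga serial0
qm template ${VMID}
EOF
}

template_plan
```

Printing the plan first lets the operator review each command against the runbook before running anything on the cluster.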

VM provisioning (per new host)

Per-host VMs are created by Terraform, which clones this template and sets the cloud-init values (hostname, SSH public key, IP/gateway). Cloud-init runs at first boot (roughly 30 to 60 seconds), leaving the VM reachable via SSH with the ansible user's key. Terraform writes no DNS records; the dns role owns the internal zone (ADR-009).
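
To make the "reachable via SSH" condition concrete, the following is a minimal sketch of a wait loop that polls a freshly cloned VM until the ansible user can log in with its key. The function name, default timeout, and polling interval are assumptions for illustration and are not part of the pipeline defined in ADR-009.

```shell
#!/bin/sh
# wait_for_ssh HOST [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Polls until key-based SSH as the "ansible" user succeeds, or the timeout expires.
# Hypothetical helper; the defaults are illustrative.
wait_for_ssh() {
    host="$1"
    timeout="${2:-120}"
    interval="${3:-5}"
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        if ssh -o BatchMode=yes -o ConnectTimeout=5 \
               -o StrictHostKeyChecking=accept-new \
               "ansible@${host}" true 2>/dev/null; then
            echo "ssh ready on ${host} after ~${elapsed}s"
            return 0
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    echo "ssh not ready on ${host} after ${timeout}s" >&2
    return 1
}
```

A call such as `wait_for_ssh 192.0.2.10 120 5` would return 0 once cloud-init has finished and the key is accepted, and 1 if the host is still unreachable after two minutes.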

The full create → inventory → configure pipeline, and the Terraform↔Ansible data contract, are defined in ADR-009 (provisioning handoff). There is no manual qm clone path for managed hosts; the only host not created by Terraform is the control node, described below.

Ansible handoff

Once Terraform has created the VM and make tf-inventory has regenerated the inventory, the bootstrap playbook handles first-run specifics (Python may not be installed yet, and the login user may differ), after which site applies the full standard state. See ADR-009 for the end-to-end commands and docs/runbooks/new-host.md for the full procedure.
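
As a rough illustration of the order of operations only, the handoff could be sketched as the plan below. The playbook paths and the inventory directory are hypothetical placeholders; ADR-009 owns the real end-to-end commands, and only `make tf-inventory` and the playbook names bootstrap and site come from this document.

```shell
#!/bin/sh
# handoff_plan HOST: print the post-Terraform commands for one new host.
# Hypothetical: INVENTORY, BOOTSTRAP_PLAYBOOK, and SITE_PLAYBOOK defaults.
set -eu

INVENTORY="${INVENTORY:-inventories/production}"
BOOTSTRAP_PLAYBOOK="${BOOTSTRAP_PLAYBOOK:-playbooks/bootstrap.yml}"
SITE_PLAYBOOK="${SITE_PLAYBOOK:-playbooks/site.yml}"

handoff_plan() {
    host="$1"
    cat <<EOF
make tf-inventory
ansible-playbook -i ${INVENTORY} ${BOOTSTRAP_PLAYBOOK} --limit ${host}
ansible-playbook -i ${INVENTORY} ${SITE_PLAYBOOK} --limit ${host}
EOF
}

# Dry run: print the plan for the host given as the first argument.
handoff_plan "${1:-newhost}"
```

Limiting both runs to the new host keeps the first-run bootstrap from touching machines that are already in their standard state.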

Control node bootstrapping

The control node is a special case — it runs Terraform and Ansible, so it cannot be created by the Terraform it hosts (chicken-and-egg). It is ubongo, a dedicated physical machine outside the cluster, and the one documented exception to Terraform-owned VM existence (see ADR-009 and ADR-015). The control node requires:

  1. Manual OS provisioning — install Debian 13 on the physical box by hand (it is not a Proxmox guest, so there is no template to clone)
  2. Manual setup of the Ansible environment:
    git clone <repo> ~/ansible
    cd ~/ansible
    make setup        # creates venv, installs deps
    make collections  # installs Ansible collections
    # set up rbw + unlock so the vault password resolves from Vaultwarden
    # (one-time, per docs/runbooks/rotate-secrets.md)
    rbw login && rbw unlock
    
  3. After that, the control node can manage all other hosts normally
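
Because these steps are manual, a small preflight check can help confirm the control node is ready before it manages other hosts. The sketch below is an assumption-laden illustration: the function names and the tool list (git, make, python3, rbw) are hypothetical and inferred from the commands above, and the lock check relies on `rbw unlocked`, which exits non-zero when the rbw database is locked.

```shell
#!/bin/sh
# missing_tools TOOL...: print each named tool that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# control_node_preflight: check the prerequisites of the manual setup steps.
# Hypothetical helper; the tool list is an assumption.
control_node_preflight() {
    missing="$(missing_tools git make python3 rbw)"
    if [ -n "$missing" ]; then
        echo "missing tools:" $missing >&2
        return 1
    fi
    # The vault password resolves through rbw, so the database must be unlocked.
    if ! rbw unlocked >/dev/null 2>&1; then
        echo "rbw is locked; run: rbw unlock" >&2
        return 1
    fi
    echo "control node prerequisites look OK"
}
```

Running `control_node_preflight` after `rbw login && rbw unlock` should print the OK line; any other outcome points at the step that still needs attention.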

ubongo is listed in inventories/production/hosts.yml under the control group. It is managed for baseline config (SSH, firewall, updates) but is not assigned the docker_host role, because it does not run services. Hardware target and recovery model are in ADR-015.

Decision

Managed VMs are bootstrapped with cloud-init from a Proxmox template. This provides:

  • Reproducible VM creation in under 2 minutes
  • No manual installer interaction
  • A clean handoff point to Ansible
  • Easy rebuilds: destroy the VM, let Terraform re-clone the template, run Ansible

Consequences

Drawn from the trade-offs and special cases this ADR already states:

  • The cloud-init image was chosen over a manual Debian installer (slow, error-prone, not reproducible) and over preseed/netboot (powerful but complex to maintain) (per Approach).
  • Template creation is a one-time manual procedure per Proxmox cluster, and the template is never booted directly (per Template creation).
  • There is no manual qm clone path for managed hosts; the full create → inventory → configure pipeline and the Terraform↔Ansible contract live in ADR-009 (per VM provisioning / Ansible handoff).
  • The control node is the sole documented exception — ubongo, a physical machine installed by hand because it cannot be created by the Terraform it hosts (chicken-and-egg); its hardware target and recovery model live in ADR-015 (per Control node bootstrapping).