boma/docs/decisions/005-bootstrapping.md
sjat 188882449d docs(adr): restructure ADRs 001,002,004,005,012,014,015 to ADR-023 conformance
Add dated Status sections and (where missing) Consequences sections assembled
from each ADR's already-stated implications. No decision substance changed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 14:39:00 +02:00

# ADR-005 — Host bootstrapping
## Status
Accepted (2026-05-30)
## Context
This document defines the **cloud-init template** that managed VMs are cloned
from, and the **control-node** bootstrapping special case. The per-host
provisioning pipeline — how a VM is created from this template and handed off to
Ansible — is owned by ADR-009. Terraform clones the template defined here; the
template is the base image for Terraform-managed hosts. The control node (`ubongo`)
is a physical machine installed directly, not cloned from this template (ADR-015).
## Approach: Proxmox cloud-init template
Managed VMs are cloned from a Proxmox VM template based on the official Debian 13
cloud image. Cloud-init handles first-boot configuration. Ansible takes over
from there.
The cloud-init image was chosen over:
- **Manual Debian installer**: slow, error-prone, not reproducible
- **Preseed/netboot**: powerful but complex to maintain
## Template creation (one-time, manual)
This is a manual procedure performed once per Proxmox cluster. It is documented in
`docs/runbooks/new-host.md`.
High-level steps:
1. Download official Debian 13 genericcloud image
2. Import disk to Proxmox, create VM template
3. Install `qemu-guest-agent` in the template image
4. Convert VM to template — never boot the template directly
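
For orientation, the steps above might look roughly like the following sketch. It is not the runbook's procedure: the VM ID, storage name, bridge, template name, and image URL are all assumptions, and the imported disk name (`vm-<id>-disk-0`) depends on the storage type. The guest agent is installed into the image file with `virt-customize` (from `libguestfs-tools`) before import, which is one way to satisfy step 3 without ever booting the template. Commands are only printed unless `DRY_RUN=0` is set.

```bash
# Sketch only. All defaults below are illustrative assumptions.
VMID="${VMID:-9000}"
STORAGE="${STORAGE:-local-lvm}"
BRIDGE="${BRIDGE:-vmbr0}"
IMAGE="${IMAGE:-debian-13-genericcloud-amd64.qcow2}"
IMAGE_URL="${IMAGE_URL:-https://cloud.debian.org/images/cloud/trixie/latest/${IMAGE}}"

# Print each command instead of running it unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

build_template() {
  # Step 1: download the official Debian 13 genericcloud image.
  run wget -O "$IMAGE" "$IMAGE_URL" || return 1
  # Step 3 (done offline, before import): bake qemu-guest-agent into the image.
  run virt-customize -a "$IMAGE" --install qemu-guest-agent || return 1
  # Step 2: create the VM shell, import the disk, attach it and a cloud-init drive.
  run qm create "$VMID" --name debian-13-cloud --memory 2048 \
    --net0 "virtio,bridge=${BRIDGE}" || return 1
  run qm importdisk "$VMID" "$IMAGE" "$STORAGE" || return 1
  run qm set "$VMID" --scsihw virtio-scsi-pci \
    --scsi0 "${STORAGE}:vm-${VMID}-disk-0" || return 1
  run qm set "$VMID" --ide2 "${STORAGE}:cloudinit" \
    --boot order=scsi0 --agent enabled=1 || return 1
  # Step 4: convert to a template; the VM is never started.
  run qm template "$VMID"
}
```

Calling `build_template` with the defaults prints the seven commands for review; `DRY_RUN=0 build_template` on a Proxmox node would execute them.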
## VM provisioning (per new host)
Per-host VMs are created by **Terraform**, which clones this template and sets the
cloud-init values (hostname, SSH public key, IP/gateway). Cloud-init runs at first
boot (~30-60 seconds), leaving the VM reachable via SSH with the ansible user's key.
Terraform writes no DNS records — the `dns` role owns the internal zone (ADR-009).
The full create → inventory → configure pipeline, and the Terraform↔Ansible data
contract, are defined in **ADR-009 (provisioning handoff)**. There is no manual
`qm clone` path for managed hosts — the sole exception is the control node below.
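
Because cloud-init needs a short time at first boot before SSH answers, a caller may want to wait for the host rather than fail immediately. The sketch below is one hedged way to do that; it is not part of the ADR-009 pipeline. The helper names, the `ansible` login name, and the `accept-new` host-key policy (trust on first use, OpenSSH 7.6 or later) are assumptions.

```bash
# Retry a command until it succeeds or roughly <timeout_s> seconds of waiting
# have elapsed. Time spent inside the command itself is not counted.
# Usage: retry_until <timeout_s> <interval_s> <command> [args...]
retry_until() {
  local timeout_s="$1" interval_s="$2" elapsed=0
  shift 2
  until "$@"; do
    elapsed=$((elapsed + interval_s))
    if [ "$elapsed" -ge "$timeout_s" ]; then
      return 1
    fi
    sleep "$interval_s"
  done
  return 0
}

# Non-interactive SSH probe. The "ansible" user name is an assumption.
# accept-new records the key of a freshly created host on first contact.
ssh_ready() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 \
      -o StrictHostKeyChecking=accept-new "ansible@$1" true
}
```

For example, `retry_until 120 5 ssh_ready 192.0.2.10` would probe a (placeholder) address every 5 seconds for up to about two minutes.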
## Ansible handoff
Once Terraform has created the VM and `make tf-inventory` has regenerated the
inventory, the `bootstrap` playbook handles first-run specifics (Python may not yet
be installed, and the login user may differ), and `site` then applies the full
standard state. See ADR-009 for the end-to-end commands and
`docs/runbooks/new-host.md` for the full procedure.
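
As a rough illustration of the order of operations only, the handoff could be written out as a command plan like the one below. ADR-009 remains authoritative for the real commands. The `playbooks/bootstrap.yml` and `playbooks/site.yml` paths and the use of `--limit` are assumptions; only `make tf-inventory`, the playbook names, and the inventory path come from this repository's documentation.

```bash
# Print the handoff commands for one host, one per line, without running them.
# Usage: handoff_plan <hostname> [inventory]
handoff_plan() {
  local host="$1"
  local inventory="${2:-inventories/production/hosts.yml}"
  # Regenerate the inventory from Terraform state first.
  printf 'make tf-inventory\n'
  # First-run specifics (hypothetical playbook path).
  printf 'ansible-playbook -i %s playbooks/bootstrap.yml --limit %s\n' \
    "$inventory" "$host"
  # Full standard state (hypothetical playbook path).
  printf 'ansible-playbook -i %s playbooks/site.yml --limit %s\n' \
    "$inventory" "$host"
}
```

Running `handoff_plan web01` (a placeholder hostname) prints three lines that can be reviewed before being run by hand.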
## Control node bootstrapping
The control node is a special case — it runs Terraform and Ansible, so it cannot
be created by the Terraform it hosts (chicken-and-egg). It is `ubongo`, a dedicated
**physical** machine outside the cluster, and the one documented exception to
Terraform-owned VM existence (see ADR-009 and ADR-015). The control node requires:
1. Manual OS provisioning — install Debian 13 on the physical box by hand (it is not
a Proxmox guest, so there is no template to clone)
2. Manual setup of the Ansible environment:
```bash
git clone <repo> ~/ansible
cd ~/ansible
make setup # creates venv, installs deps
make collections # installs Ansible collections
# set up rbw + unlock so the vault password resolves from Vaultwarden
# (one-time, per docs/runbooks/rotate-secrets.md)
rbw login && rbw unlock
```
3. After that, the control node can manage all other hosts normally
`ubongo` is listed in `inventories/production/hosts.yml` under the `control` group
and can be managed for baseline config (SSH, firewall, updates) but not for the
`docker_host` role (it does not run services). Hardware target and recovery model
are in ADR-015.
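
Since the control node is set up by hand, a small preflight check can catch a missed step before the first playbook run. The following is a sketch under assumptions: the function names are hypothetical, and the tool list (`git`, `make`, `rbw`) is inferred from the setup commands above rather than taken from a documented requirement. It relies on `rbw unlocked`, which exits non-zero while the local vault is locked.

```bash
# Print the name of every listed command that is not on PATH, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Return 0 only when the required tools exist and the rbw vault is unlocked.
control_node_ready() {
  local missing
  missing="$(missing_tools git make rbw)"
  if [ -n "$missing" ]; then
    printf 'missing tools: %s\n' "$(printf '%s' "$missing" | tr '\n' ' ')" >&2
    return 1
  fi
  if ! rbw unlocked; then
    printf 'rbw is locked; run: rbw unlock\n' >&2
    return 1
  fi
}
```

On `ubongo`, `control_node_ready && echo ok` would then confirm that the vault password can resolve before Ansible is invoked.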
## Decision
Cloud-init with Proxmox templates provides:
- Reproducible VM creation in under 2 minutes
- No manual installer interaction
- A clean handoff point to Ansible
- Easy rebuilds — destroy VM, clone template, run Ansible
## Consequences
Drawn from the trade-offs and special cases this ADR already states:
- The cloud-init image was chosen over a manual Debian installer (slow, error-prone,
not reproducible) and over preseed/netboot (powerful but complex to maintain) (per
Approach).
- Template creation is a one-time manual procedure per Proxmox cluster, and the template
is never booted directly (per Template creation).
- There is no manual `qm clone` path for managed hosts; the full create → inventory →
configure pipeline and the Terraform↔Ansible contract live in ADR-009 (per VM
provisioning / Ansible handoff).
- The control node is the sole documented exception — `ubongo`, a physical machine
installed by hand because it cannot be created by the Terraform it hosts (chicken-and-egg);
its hardware target and recovery model live in ADR-015 (per Control node bootstrapping).