# ADR-005 — Host bootstrapping

## Status

Accepted (2026-05-30)

## Context

This document defines the **cloud-init template** that managed VMs are cloned
from, and the **control-node** bootstrapping special case. The per-host
provisioning pipeline (how a VM is created from this template and handed off to
Ansible) is owned by ADR-009. Terraform clones the template defined here, which
serves as the base image for Terraform-managed hosts. The control node (`ubongo`)
is a physical machine installed directly, not cloned from this template (ADR-015).

## Approach: Proxmox cloud-init template

Managed VMs are cloned from a Proxmox VM template based on the official Debian 13
cloud image. Cloud-init handles first-boot configuration. Ansible takes over
from there.

The cloud-init image was chosen over:
- **Manual Debian installer**: slow, error-prone, not reproducible
- **Preseed/netboot**: powerful but complex to maintain

## Template creation (one-time, manual)

This is a manual procedure performed once per Proxmox cluster. It is documented in
`docs/runbooks/new-host.md`.

High-level steps:
1. Download the official Debian 13 genericcloud image
2. Import the disk into Proxmox and create the VM that will become the template
3. Install `qemu-guest-agent` in the template image
4. Convert the VM to a template; never boot the template directly

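The four steps above could be sketched roughly as follows. This is an illustration only, and the runbook remains the authoritative procedure. The VM ID `9000`, the storage `local-lvm`, the bridge `vmbr0`, the template name, and the image URL are all assumptions, not values taken from this ADR. The sketch also assumes the guest agent is installed offline with `virt-customize` (from `libguestfs-tools`) before the disk is imported, which swaps the order of steps 2 and 3. Commands are only printed unless `DRY_RUN=0` is set.

```bash
#!/usr/bin/env bash
# Sketch only: VMID, STORAGE, BRIDGE, the template name and IMAGE_URL are
# illustrative assumptions, not values from this ADR or the runbook.
set -euo pipefail

VMID="${VMID:-9000}"
STORAGE="${STORAGE:-local-lvm}"
BRIDGE="${BRIDGE:-vmbr0}"
IMAGE="debian-13-genericcloud-amd64.qcow2"
IMAGE_URL="${IMAGE_URL:-https://cloud.debian.org/images/cloud/trixie/latest/${IMAGE}}"

# Print each command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

build_template() {
  # Step 1: download the official genericcloud image.
  run wget -O "$IMAGE" "$IMAGE_URL"
  # Step 3 (done early here): install the guest agent into the image offline.
  run virt-customize -a "$IMAGE" --install qemu-guest-agent
  # Step 2: create an empty VM, import the disk, attach it plus a cloud-init drive.
  run qm create "$VMID" --name debian-13-template --memory 2048 \
    --net0 "virtio,bridge=${BRIDGE}" --scsihw virtio-scsi-pci
  run qm importdisk "$VMID" "$IMAGE" "$STORAGE"
  run qm set "$VMID" --scsi0 "${STORAGE}:vm-${VMID}-disk-0" \
    --ide2 "${STORAGE}:cloudinit" --boot order=scsi0 --agent enabled=1
  # Step 4: convert to a template without ever booting it.
  run qm template "$VMID"
}

# Usage: review the printed plan first, then re-run with DRY_RUN=0 on a Proxmox node.
# build_template
```

The volume name passed to `--scsi0` depends on the storage type, so check the output of the import step before attaching the disk.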
## VM provisioning (per new host)

Per-host VMs are created by **Terraform**, which clones this template and sets the
cloud-init values (hostname, SSH public key, IP/gateway). Cloud-init runs at first
boot (~30–60 seconds), leaving the VM reachable via SSH with the ansible user's key.
Terraform writes no DNS records; the `dns` role owns the internal zone (ADR-009).

The full create → inventory → configure pipeline and the Terraform↔Ansible data
contract are defined in **ADR-009 (provisioning handoff)**. There is no manual
`qm clone` path for managed hosts; the only host not created by Terraform is the
control node, described below.

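Because cloud-init needs roughly 30–60 seconds at first boot, anything that runs right after Terraform has to wait until SSH actually accepts the key. A minimal sketch of such a wait is shown here. The login name `ansible`, the helper names `retry_until` and `ssh_ready`, and the example address are assumptions for illustration; ADR-009 defines the real pipeline.

```bash
#!/usr/bin/env bash
set -euo pipefail

# retry_until ATTEMPTS DELAY CMD...
# Runs CMD until it succeeds or ATTEMPTS is exhausted; returns 1 on timeout.
retry_until() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Succeeds once sshd accepts the key without prompting.
# The user name "ansible" is an assumption, not confirmed by this ADR.
ssh_ready() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 \
    -o StrictHostKeyChecking=accept-new "ansible@$1" true
}

# Usage (hypothetical address): allow up to 24 attempts, 5 s apart.
# retry_until 24 5 ssh_ready 192.0.2.10
```

`BatchMode=yes` makes the probe fail immediately instead of prompting for a password, so a VM whose key has not yet been installed counts as not ready.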
## Ansible handoff

Once Terraform has created the VM and `make tf-inventory` has regenerated the
inventory, the `bootstrap` playbook handles first-run specifics (Python may not be
present, the login user may differ) and `site` applies the full standard state. See
ADR-009 for the end-to-end commands and `docs/runbooks/new-host.md` for the full
procedure.

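The order of operations described above might look like the following for a single host. Only `make tf-inventory` and the playbook names `bootstrap` and `site` come from this ADR. The `playbooks/` directory, the `.yml` file names, the `--limit` usage, and the host name `web01` are hypothetical, and ADR-009 remains the reference for the real commands. Commands are only printed unless `DRY_RUN=0` is set.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print each command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# handoff HOST: regenerate the inventory, bootstrap the host, apply full state.
# PLAYBOOK_DIR and the .yml file names are assumptions for illustration.
handoff() {
  local host="$1" dir="${PLAYBOOK_DIR:-playbooks}"
  run make tf-inventory
  run ansible-playbook "${dir}/bootstrap.yml" --limit "$host"
  run ansible-playbook "${dir}/site.yml" --limit "$host"
}

# Usage (hypothetical host name):
# handoff web01
```

Running `bootstrap` before `site` matters because the first run may have to cope with a missing Python interpreter or a different login user, which the full `site` run would not tolerate.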
## Control node bootstrapping

The control node is a special case: it runs Terraform and Ansible, so it cannot
be created by the Terraform it hosts (chicken-and-egg). It is `ubongo`, a dedicated
**physical** machine outside the cluster, and the one documented exception to
Terraform-owned VM existence (see ADR-009 and ADR-015). The control node requires:

1. Manual OS provisioning: install Debian 13 on the physical box by hand (it is not
   a Proxmox guest, so there is no template to clone)
2. Manual setup of the Ansible environment:
   ```bash
   git clone <repo> ~/ansible
   cd ~/ansible
   make setup        # creates venv, installs deps
   make collections  # installs Ansible collections
   # set up rbw + unlock so the vault password resolves from Vaultwarden
   # (one-time, per docs/runbooks/rotate-secrets.md)
   rbw login && rbw unlock
   ```
3. After that, the control node can manage all other hosts normally

`ubongo` is listed in `inventories/production/hosts.yml` under the `control` group.
It is managed for baseline config (SSH, firewall, updates) but does not get the
`docker_host` role, because it does not run services. Hardware target and recovery
model are in ADR-015.

## Decision

Cloud-init with Proxmox templates provides:
- Reproducible VM creation in under 2 minutes
- No manual installer interaction
- A clean handoff point to Ansible
- Easy rebuilds: destroy the VM, clone the template, run Ansible

## Consequences

Drawn from the trade-offs and special cases this ADR already states:

- The cloud-init image was chosen over a manual Debian installer (slow, error-prone,
  not reproducible) and over preseed/netboot (powerful but complex to maintain) (per
  Approach).
- Template creation is a one-time manual procedure per Proxmox cluster, and the template
  is never booted directly (per Template creation).
- There is no manual `qm clone` path for managed hosts; the full create → inventory →
  configure pipeline and the Terraform↔Ansible contract live in ADR-009 (per VM
  provisioning / Ansible handoff).
- The control node is the sole documented exception: `ubongo` is a physical machine
  installed by hand because it cannot be created by the Terraform it hosts (chicken-and-egg);
  its hardware target and recovery model live in ADR-015 (per Control node bootstrapping).