boma/docs/decisions/005-bootstrapping.md
sjat 45ab6ced01 Purge residual .vault_pass references (review R1-R5)
Point ADR-005, the new-host runbook, CONTRIBUTING, and AGENTS at the
rbw/Vaultwarden flow instead of a .vault_pass file. Also record the cron-section
idea in docs/TODO.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 19:17:25 +02:00

# ADR-005 — Host bootstrapping
## Context
This document defines the **cloud-init template** that managed VMs are cloned
from, and the **control-node** bootstrapping special case. The per-host
provisioning pipeline — how a VM is created from this template and handed off to
Ansible — is owned by ADR-009. Terraform clones the template defined here; the
template is the base image both for Terraform-managed hosts and for the manually
provisioned control node.
## Approach: Proxmox cloud-init template
Managed VMs are cloned from a Proxmox VM template based on the official Debian 13
cloud image. Cloud-init handles first-boot configuration. Ansible takes over
from there.
The cloud-init image was chosen over:
- **Manual Debian installer**: slow, error-prone, not reproducible
- **Preseed/netboot**: powerful but complex to maintain
## Template creation (one-time, manual)
This is a manual procedure performed once per Proxmox cluster. Documented in
`docs/runbooks/new-host.md`.
High-level steps:
1. Download the official Debian 13 `genericcloud` image
2. Import the disk into Proxmox and create the VM that will become the template
3. Install `qemu-guest-agent` in the template image
4. Convert VM to template — never boot the template directly
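
The high-level steps above could look roughly like the following on a Proxmox node. This is a sketch under assumptions, not the runbook's procedure: the VM ID `9000`, the storage `local-lvm`, the bridge `vmbr0`, the template name, the image URL, and the use of `virt-customize` (from libguestfs-tools) to install the guest agent into the image file before import are all illustrative choices. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: every default below is an assumption, not a value from this repo.
set -eu

VMID="${VMID:-9000}"
STORAGE="${STORAGE:-local-lvm}"
BRIDGE="${BRIDGE:-vmbr0}"
IMAGE="${IMAGE:-debian-13-genericcloud-amd64.qcow2}"
IMAGE_URL="${IMAGE_URL:-https://cloud.debian.org/images/cloud/trixie/latest/${IMAGE}}"

# Print each command instead of executing it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

build_template() {
  # 1. Download the cloud image
  run wget -O "$IMAGE" "$IMAGE_URL"
  # 3. Install the guest agent (done on the image file here, before import)
  run virt-customize -a "$IMAGE" --install qemu-guest-agent
  # 2. Create the VM, import the disk, attach it plus a cloud-init drive
  run qm create "$VMID" --name debian-13-cloudinit --memory 2048 --net0 "virtio,bridge=${BRIDGE}" --agent enabled=1
  run qm importdisk "$VMID" "$IMAGE" "$STORAGE"
  run qm set "$VMID" --scsihw virtio-scsi-pci --scsi0 "${STORAGE}:vm-${VMID}-disk-0"
  run qm set "$VMID" --ide2 "${STORAGE}:cloudinit" --boot order=scsi0
  # 4. Convert to a template without ever booting it
  run qm template "$VMID"
}
```

Calling `build_template` prints the seven commands for review; setting `DRY_RUN=0` on a Proxmox node would execute them instead.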
## VM provisioning (per new host)
Per-host VMs are created by **Terraform**, which clones this template, sets the
cloud-init values (hostname, SSH public key, IP/gateway), and writes the host's
DNS A record. Cloud-init runs at first boot (~30-60 seconds), leaving the VM
reachable via SSH with the ansible user's key.
The full create → inventory → configure pipeline, and the Terraform↔Ansible data
contract, are defined in **ADR-009 (provisioning handoff)**. There is no manual
`qm clone` path for managed hosts — the sole exception is the control node below.
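
Since cloud-init needs a short while after first boot before SSH answers, a caller may want to wait for the host rather than fail on the first attempt. The helper below is a hypothetical sketch, not part of the pipeline defined in ADR-009: the function name, the `ansible` login name, the 120-second default deadline, and the overridable `PROBE` command are all assumptions made for illustration.

```shell
# Hypothetical helper: poll until SSH answers or the deadline passes.
# PROBE can be overridden so the loop can be exercised without a real host.
wait_for_ssh() {
  host="$1"
  timeout="${2:-120}"    # seconds to wait in total (assumed default)
  interval="${3:-5}"     # seconds between attempts (assumed default)
  user="${SSH_USER:-ansible}"
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # The default probe runs `true` on the remote side over key-based SSH.
    if ${PROBE:-ssh -o BatchMode=yes -o ConnectTimeout=5} "${user}@${host}" true 2>/dev/null; then
      echo "ssh up on ${host} after ${elapsed}s"
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "ssh not reachable on ${host} within ${timeout}s" >&2
  return 1
}
```

A call such as `wait_for_ssh <hostname>` returns 0 as soon as a login succeeds and 1 once the deadline has passed, so it can gate the Ansible step in a script.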
## Ansible handoff
Once Terraform has created the VM and `make tf-inventory` has regenerated the
inventory, the `bootstrap` playbook handles first-run specifics (Python may not yet be
installed, and the login user may differ) and `site` applies the full standard state. See ADR-009
for the end-to-end commands and `docs/runbooks/new-host.md` for the full procedure.
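
As a rough illustration of that order of operations, the sketch below prints one possible command sequence. ADR-009 and the runbook remain the reference for the real commands: the playbook paths `playbooks/bootstrap.yml` and `playbooks/site.yml`, the `--limit` usage, and the `handoff` function name are assumptions, and only `make tf-inventory` and the inventory path come from this repository's documentation.

```shell
# Assumed layout: the playbook paths below are hypothetical.
INVENTORY="${INVENTORY:-inventories/production/hosts.yml}"

# Print each command instead of executing it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

handoff() {
  host="$1"
  # Regenerate the inventory from Terraform state
  run make tf-inventory
  # First-run specifics for the new host only
  run ansible-playbook -i "$INVENTORY" playbooks/bootstrap.yml --limit "$host"
  # Full standard state for the new host
  run ansible-playbook -i "$INVENTORY" playbooks/site.yml --limit "$host"
}
```

Running `handoff <hostname>` prints the three steps in order, which makes the dependency explicit: the inventory must be regenerated before either playbook can target the new host.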
## Control node bootstrapping
The control node is a special case — it runs Terraform and Ansible, so it cannot
be created by the Terraform it hosts (chicken-and-egg). It is the one documented
exception to Terraform-owned VM existence (see ADR-009). The control node requires:
1. Manual VM provisioning — clone this cloud-init template by hand (Proxmox UI or
`qm clone`), since Terraform is not yet available to do it
2. Manual setup of the Ansible environment:
```bash
git clone <repo> ~/ansible
cd ~/ansible
make setup # creates venv, installs deps
make collections # installs Ansible collections
# set up rbw + unlock so the vault password resolves from Vaultwarden
# (one-time, per docs/runbooks/rotate-secrets.md)
rbw login && rbw unlock
```
3. After that, the control node can manage all other hosts normally

The control node itself is listed in `inventories/production/hosts.yml` under
a `control` group and can be managed for baseline config (SSH, firewall, updates)
but not for the `docker_host` role (it does not run services).
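
For the manual clone in step 1, the `qm` route might look like the sketch below, which sets by hand the same cloud-init values Terraform sets for managed hosts (hostname, SSH public key, IP/gateway). All concrete values are assumptions for illustration: the template ID `9000`, the new VM ID `100`, the name `control`, the `ansible` cloud-init user, the key path, and the documentation-range addresses. The script prints the commands by default.

```shell
# All defaults below are illustrative assumptions, not values from this repo.
TEMPLATE_ID="${TEMPLATE_ID:-9000}"
NEW_ID="${NEW_ID:-100}"
NAME="${NAME:-control}"
CI_USER="${CI_USER:-ansible}"
SSH_PUBKEY="${SSH_PUBKEY:-$HOME/.ssh/id_ed25519.pub}"
IPCONFIG="${IPCONFIG:-ip=192.0.2.10/24,gw=192.0.2.1}"

# Print each command instead of executing it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

clone_control_node() {
  # Full clone so the control node does not depend on the template's disk
  run qm clone "$TEMPLATE_ID" "$NEW_ID" --name "$NAME" --full
  # Cloud-init values that Terraform would otherwise set
  run qm set "$NEW_ID" --ciuser "$CI_USER" --sshkeys "$SSH_PUBKEY" --ipconfig0 "$IPCONFIG"
  run qm start "$NEW_ID"
}
```

`clone_control_node` prints the three commands for review; with `DRY_RUN=0` on a Proxmox node it would run them, after which the environment setup in step 2 can proceed over SSH.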
## Decision
Cloud-init with Proxmox templates provides:
- Reproducible VM creation in under 2 minutes
- No manual installer interaction
- A clean handoff point to Ansible
- Easy rebuilds: destroy the VM, let Terraform re-clone the template, run Ansible