# ADR-005 — Host bootstrapping
## Context

This document defines the **cloud-init template** that managed VMs are cloned
from, and the **control-node** bootstrapping special case. The per-host
provisioning pipeline (how a VM is created from this template and handed off to
Ansible) is owned by ADR-009. Terraform clones the template defined here; the
template is the base image both for Terraform-managed hosts and for the manually
provisioned control node.

## Approach: Proxmox cloud-init template

Managed VMs are cloned from a Proxmox VM template based on the official Debian 13
cloud image. Cloud-init handles first-boot configuration. Ansible takes over
from there.

The cloud-init image was chosen over:

- **Manual Debian installer**: slow, error-prone, not reproducible
- **Preseed/netboot**: powerful but complex to maintain

## Template creation (one-time, manual)

This is a manual procedure performed once per Proxmox cluster. It is documented
in `docs/runbooks/new-host.md`.

High-level steps:

1. Download the official Debian 13 genericcloud image
2. Import the disk to Proxmox and create the VM that will become the template
3. Install `qemu-guest-agent` in the template image
4. Convert the VM to a template; never boot the template directly

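The steps above could be sketched as a dry-run script like the one below. This is an illustration under stated assumptions, not the procedure from `docs/runbooks/new-host.md`: the VM ID, storage name, bridge, memory size, template name, and image URL are all hypothetical defaults, and the sketch installs `qemu-guest-agent` with `virt-customize` before importing the disk, which is one possible way to cover step 3. By default it only prints the commands it would run.

```bash
#!/bin/sh
# Sketch of the one-time template build, intended for a Proxmox node.
# Every default below is an assumption for illustration only.
set -eu

VMID="${VMID:-9000}"              # hypothetical template VM ID
STORAGE="${STORAGE:-local-lvm}"   # hypothetical storage name (LVM-style volume naming)
BRIDGE="${BRIDGE:-vmbr0}"         # hypothetical network bridge
IMAGE="${IMAGE:-debian-13-genericcloud-amd64.qcow2}"
IMAGE_URL="${IMAGE_URL:-https://cloud.debian.org/images/cloud/trixie/latest/${IMAGE}}"

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

build_template() {
  # Step 1: download the official Debian 13 genericcloud image.
  run wget -O "$IMAGE" "$IMAGE_URL"
  # Step 3 (done early here): bake qemu-guest-agent into the image file.
  run virt-customize -a "$IMAGE" --install qemu-guest-agent
  # Step 2: create the VM, import the disk, attach it plus a cloud-init drive.
  run qm create "$VMID" --name debian-13-template --memory 2048 --net0 "virtio,bridge=${BRIDGE}"
  run qm importdisk "$VMID" "$IMAGE" "$STORAGE"
  run qm set "$VMID" --scsihw virtio-scsi-pci --scsi0 "${STORAGE}:vm-${VMID}-disk-0"
  run qm set "$VMID" --ide2 "${STORAGE}:cloudinit" --boot order=scsi0 --agent enabled=1
  # Step 4: convert to a template; the template itself is never booted.
  run qm template "$VMID"
}
```

Calling `build_template` prints the planned commands for review; `DRY_RUN=0 build_template` would execute them. The disk volume name passed to `--scsi0` depends on the storage type, so it should be checked against the output of the import step before use.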
## VM provisioning (per new host)

Per-host VMs are created by **Terraform**, which clones this template, sets the
cloud-init values (hostname, SSH public key, IP/gateway), and writes the host's
DNS A record. Cloud-init runs at first boot (~30–60 seconds), leaving the VM
reachable via SSH with the ansible user's key.

The full create → inventory → configure pipeline, and the Terraform↔Ansible data
contract, are defined in **ADR-009 (provisioning handoff)**. There is no manual
`qm clone` path for managed hosts; the sole exception is the control node,
described below.

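Because cloud-init needs roughly 30–60 seconds at first boot, anything that runs right after VM creation may need to wait until SSH answers. A minimal sketch of such a wait loop follows. The function name, the retry budget, the `ansible` login name, and the `accept-new` host-key policy are assumptions for illustration; ADR-009 remains the authority on how the handoff is actually sequenced.

```bash
#!/bin/sh
# Poll until a freshly created VM accepts SSH, or give up after a fixed budget.
# PROBE can be overridden so the loop can be exercised without a real host.
wait_for_ssh() {
  host="$1"
  attempts="${2:-24}"   # assumed budget: 24 tries with a 5 s pause, about 2 minutes
  delay="${3:-5}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # BatchMode avoids password prompts; accept-new trusts the host key on first
    # contact (an assumed policy, tighten it if host keys are pinned elsewhere).
    if ${PROBE:-ssh -o BatchMode=yes -o ConnectTimeout=5 -o StrictHostKeyChecking=accept-new} \
        "ansible@${host}" true; then
      echo "ssh ready on ${host} after ${i} attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "ssh not reachable on ${host} after ${attempts} attempts" >&2
  return 1
}
```

For example, `wait_for_ssh web01.example.org` (a hypothetical hostname) would return 0 as soon as the login succeeds and 1 if the budget runs out.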
## Ansible handoff

Once Terraform has created the VM and `make tf-inventory` has regenerated the
inventory, the `bootstrap` playbook handles first-run specifics (Python may not be
present, and the login user may differ) and `site` applies the full standard
state. See ADR-009 for the end-to-end commands and `docs/runbooks/new-host.md`
for the full procedure.

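The ordering described above can be illustrated with a small dry-run wrapper. Only `make tf-inventory` and the playbook names `bootstrap` and `site` come from this document; the `playbooks/*.yml` paths and the use of `--limit` are assumptions, and the commands in ADR-009 take precedence over this sketch.

```bash
#!/bin/sh
# Hypothetical handoff wrapper: regenerate the inventory, bootstrap, then converge.
set -eu

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

handoff() {
  host="$1"
  # Regenerate the inventory from Terraform state first.
  run make tf-inventory
  # First-run specifics (missing Python, different login user). Path is assumed.
  run ansible-playbook playbooks/bootstrap.yml --limit "$host"
  # Full standard state. Path is assumed.
  run ansible-playbook playbooks/site.yml --limit "$host"
}
```

Running `handoff web01` (a hypothetical host name) prints the three commands in order; `DRY_RUN=0 handoff web01` would execute them and stop at the first failure because of `set -e`.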
## Control node bootstrapping

The control node is a special case: it runs Terraform and Ansible, so it cannot
be created by the Terraform it hosts (a chicken-and-egg problem). It is the one
documented exception to Terraform-owned VM existence (see ADR-009). The control
node requires:

1. Manual VM provisioning: clone this cloud-init template by hand (Proxmox UI or
   `qm clone`), since Terraform is not yet available to do it
2. Manual setup of the Ansible environment:
   ```bash
   git clone <repo> ~/ansible
   cd ~/ansible
   make setup        # creates venv, installs deps
   make collections  # installs Ansible collections
   # set up rbw + unlock so the vault password resolves from Vaultwarden
   # (one-time, per docs/runbooks/rotate-secrets.md)
   rbw login && rbw unlock
   ```
3. After that, the control node can manage all other hosts normally

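To make the rbw step more concrete, here is a sketch of how a vault password could be resolved from Vaultwarden once `rbw` is unlocked. It is not the repository's actual helper: the function name, the entry name `ansible-vault`, and the `RBW` and `VAULT_ENTRY` override variables are assumptions for illustration. It relies only on `rbw unlocked` (which exits non-zero while the agent is locked) and `rbw get`.

```bash
#!/bin/sh
# Hypothetical vault-password helper backed by rbw.
# RBW can be overridden so the function can be tested without a real account.
vault_password() {
  rbw_bin="${RBW:-rbw}"
  entry="${VAULT_ENTRY:-ansible-vault}"   # assumed Vaultwarden entry name
  # Fail early with a hint instead of hanging on an unlock prompt.
  if ! "$rbw_bin" unlocked >/dev/null 2>&1; then
    echo "rbw is locked; run 'rbw unlock' first" >&2
    return 1
  fi
  # Print the entry's password on stdout and nothing else.
  "$rbw_bin" get "$entry"
}
```

Ansible runs a vault password file that is marked executable and reads the password from its stdout, so a script that ends by calling `vault_password` could be passed to `--vault-password-file`. Whether this repository wires it up that way is described in `docs/runbooks/rotate-secrets.md`, not here.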
The control node itself is listed in `inventories/production/hosts.yml` under
a `control` group. It is managed for baseline config (SSH, firewall, updates)
but does not get the `docker_host` role, because it does not run services.

## Decision

Cloud-init with Proxmox templates provides:

- Reproducible VM creation in under 2 minutes
- No manual installer interaction
- A clean handoff point to Ansible
- Easy rebuilds: destroy the VM, clone the template, run Ansible