# Runbook — Adding a new managed host

## Prerequisites

- Proxmox VM template exists (Debian 13 cloud-init image — see below if not)
- `rbw` is installed and unlocked (`rbw unlock`) so the vault password resolves from Vaultwarden
- The host's intended hostname and IP are decided

---

## Part A — Create the Proxmox template (one-time)

Run on a Proxmox node. Only needed once per cluster.

```bash
# Download the Debian 13 genericcloud image
wget https://cloud.debian.org/images/cloud/trixie/latest/debian-13-genericcloud-amd64.qcow2

# Create a VM (adjust ID, storage name as needed)
qm create 9000 --name debian13-template --memory 2048 --cores 2 \
  --net0 virtio,bridge=vmbr0 --serial0 socket --vga serial0

# Import the disk
qm importdisk 9000 debian-13-genericcloud-amd64.qcow2 local-lvm

# Attach disk and set boot order
qm set 9000 --scsihw virtio-scsi-pci --scsi0 local-lvm:vm-9000-disk-0
qm set 9000 --boot c --bootdisk scsi0

# Add cloud-init drive
qm set 9000 --ide2 local-lvm:cloudinit

# Enable QEMU guest agent
qm set 9000 --agent enabled=1

# Convert to template (cannot be undone)
qm template 9000
```

---

## Part B — Define the VM in Terraform

Managed hosts are created by Terraform, never by hand. Add an entry to `local.vms` in the environment's `main.tf` (`terraform/environments//main.tf`):

```hcl
locals {
  vms = {
     = {
      ip        = "/24"          # static; from docs/decisions/007-network.md
      group     = "docker_hosts" # control | docker_hosts | proxmox_hosts
      cores     = 2
      memory_mb = 2048
    }
  }
}
```

Terraform clones the cloud-init template from Part A, sets the cloud-init values (hostname, SSH key, IP/gateway), and writes the host's DNS A record. See ADR-009 for the full handoff and the `vms` output → inventory data contract.

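Before adding an entry to `local.vms`, it can help to sanity-check the value intended for the `ip` field. The following is a minimal sketch and not part of this repository: `valid_ipv4_cidr` is a hypothetical helper name, and `192.0.2.10/24` is a documentation-range placeholder, not an address from this network. It only checks the shape of the string (four octets of 0-255 and a prefix length of 0-32); it does not check the value against `docs/decisions/007-network.md`.

```bash
# Hypothetical pre-flight helper (assumption: not provided by this repo).
# Returns 0 if $1 looks like an IPv4 address with a prefix length.
valid_ipv4_cidr() {
  local cidr=$1 addr prefix octet
  local -a octets

  # Shape check: four dot-separated digit groups, a slash, 1-2 digits.
  [[ $cidr =~ ^([0-9]{1,3}\.){3}[0-9]{1,3}/([0-9]{1,2})$ ]] || return 1

  addr=${cidr%/*}
  prefix=${cidr#*/}

  # 10# forces base 10 so values such as "08" are not read as octal.
  (( 10#$prefix <= 32 )) || return 1

  IFS=. read -r -a octets <<< "$addr"
  for octet in "${octets[@]}"; do
    (( 10#$octet <= 255 )) || return 1
  done
}

# Example (placeholder address):
#   valid_ipv4_cidr 192.0.2.10/24 && echo "looks valid"
```

A check like this catches typos such as a missing prefix length before `make tf-plan` is run; Terraform and the provider remain the authority on what is actually accepted.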

---

## Part C — Provision and regenerate the inventory

```bash
make tf-plan TF_ENV=production       # review — confirm only the new VM is added
make tf-apply TF_ENV=production      # create the VM + write its DNS A record
make tf-inventory TF_ENV=production  # regenerate inventories/production/hosts.yml
```

`make tf-inventory` rewrites `hosts.yml` from Terraform outputs — **do not edit that file by hand**; it carries a "do not edit manually" header and your changes would be overwritten. The source of truth is `local.vms`.

Wait ~60 seconds after apply for cloud-init to complete, then verify SSH access:

```bash
ssh ansible@ echo ok
```

Add a `host_vars//` directory if the host needs specific overrides (this is config, not inventory membership, so it is not generated):

```bash
mkdir -p inventories/production/host_vars/
touch inventories/production/host_vars//vars.yml
```

---

## Part D — Bootstrap and configure

```bash
# First-run bootstrap (handles Python installation, initial user setup)
make deploy PLAYBOOK=bootstrap

# Apply full standard state
make deploy PLAYBOOK=site
```

Verify the host reaches baseline:

```bash
make check PLAYBOOK=site  # Should report no changes
```

---

## Part E — Control node (manual exception)

The control node runs Terraform and Ansible, so it cannot be created by the Terraform it hosts (chicken-and-egg). It is the **one** host provisioned manually — see ADR-009 and the control-node section of ADR-005. Use the template from Part A:

```bash
# Clone the template by hand (Proxmox UI or qm clone)
qm clone 9000 --name --full
qm set --memory 2048 --cores 2 \
  --ciuser ansible \
  --sshkeys /path/to/ansible_ed25519.pub \
  --ipconfig0 ip=/24,gw=
qm start
```

Then set up the Ansible environment on it (`make setup`, `make collections`, set up `rbw` and `rbw unlock`) per ADR-005, and add it to `inventories//hosts.yml` under the `control` group.
Because the control node is not in `local.vms`, this is the only case where editing `hosts.yml` by hand is expected — every other host comes from `make tf-inventory`.

---

## Troubleshooting

**SSH connection refused**: cloud-init may still be running. Wait and retry.

**Python not found**: the bootstrap playbook handles this via the `raw` module. If bootstrap fails, SSH to the host manually and run `apt install -y python3`.

**Firewall locked out**: if nftables rules are misconfigured, connect via the Proxmox console (not SSH) and run `nft flush ruleset` to clear all rules temporarily.
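
For the "SSH connection refused" case, the wait-and-retry step can be scripted instead of repeated by hand. This is a minimal sketch under stated assumptions: `retry` is a hypothetical helper (not a target in this repo's Makefile), `TARGET` is a placeholder variable for the new host's address, and the attempt count and delay are arbitrary starting values.

```bash
# Hypothetical helper: run a command until it succeeds or attempts run out.
# Usage: retry <attempts> <delay_seconds> <command> [args...]
retry() {
  local attempts=$1 delay=$2 i
  shift 2
  for (( i = 1; i <= attempts; i++ )); do
    "$@" && return 0
    # Sleep only between attempts, not after the final failure.
    (( i < attempts )) && sleep "$delay"
  done
  return 1
}

# Example (TARGET is a placeholder; up to 12 tries, 5 seconds apart):
#   retry 12 5 ssh -o BatchMode=yes -o ConnectTimeout=5 "ansible@${TARGET}" echo ok
```

`BatchMode=yes` makes `ssh` fail rather than prompt for a password, and `ConnectTimeout` bounds each attempt, so the loop finishes in a predictable time if the host never comes up.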