From 666ad42634352148b3e5448d65df513392aacb24 Mon Sep 17 00:00:00 2001
From: sjat
Date: Fri, 5 Jun 2026 18:23:16 +0200
Subject: [PATCH] review-repo: fix DNS-write contradictions + stale control-node/template refs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Auto-fixes from /review-repo:
- ADR-005 + new-host.md: drop "Terraform writes the host's DNS A record"
  (contradicts ADR-009 — dns role owns the zone; recurs from the 2026-05-30 run)
- ADR-005: control node is physical ubongo, not cloned from the template (ADR-015)
- CLAUDE.md: add the VERIFY.md template to Further reading
- TODO.md: typo fixes (we we / seperate)

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 CLAUDE.md                           |  1 +
 docs/TODO.md                        | 15 ++++++---------
 docs/decisions/005-bootstrapping.md | 12 ++++++------
 docs/runbooks/new-host.md           |  8 ++++----
 4 files changed, 17 insertions(+), 19 deletions(-)

diff --git a/CLAUDE.md b/CLAUDE.md
index 2389b1d..e55f4a6 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -195,6 +195,7 @@ Single-contributor, trunk-based (no merge requests / approval gates):
 | Accepted security risks | `docs/security/accepted-risks.md` |
 | Per-service security checklist | `docs/security/service-checklist.md` |
 | Per-service security record (template) | `docs/security/service-security-template.md` |
+| Per-service verification spec (template) | `docs/testing/service-verify-template.md` |
 | Heritage / V4 policy | `docs/decisions/013-heritage-v4.md` |
 | Sourcing tech knowledge | `docs/decisions/014-knowledge-sourcing.md` |
 | Toolchain choices | `docs/decisions/003-toolchain.md` |
diff --git a/docs/TODO.md b/docs/TODO.md
index b05f0e9..644548b 100644
--- a/docs/TODO.md
+++ b/docs/TODO.md
@@ -28,20 +28,17 @@
    8. Ensure the right things are backed up (incl. database dumps if we land on PBS).
    9. Decide: a central database server, or individual database services per app?
    10. Should we continue to use the base-container method, or maybe something in the improvements of the methods in boma moods the point?
+   11. Deliberate tagging strategy.
 
 4. **Split-horizon FQDN** — adopt split-horizon FQDN with or without nyumbani?
 
 5. **Control node**
    1. Set up and test the control node while waiting for hardware.
    2. Define control-node bootstrapping — a dedicated recipe and playbook?
-   3. Decide the role of mamba — access/availability vs compute power and ease?
-   4. Set up rbw on the control node.
+   3. Set up rbw on the control node.
 
-6. **Updating**
-   1. Decide pinning vs latest for versions.
-   2. Decide the update strategy across services & containers vs packages &
-      builds / GitHub pulls / Flatpaks.
-   3. Define scheduling of updates and reboots, including post-update testing.
+6. **Updating** 2. Decide the update strategy across services & containers vs packages &
+   builds / GitHub pulls / Flatpaks. 3. Define scheduling of updates and reboots, including post-update testing.
 
 7. **Shell setup**
    1. Decide what shell setup matters for the AI's work on the control node.
@@ -79,7 +76,7 @@
    remaining setup to carry out from this decision?
    1. ~~Policy for how we collaborate with references to baobabAnsibleV4 without misusing it.~~ DECIDED — ADR-013.
    2. Policy for how we write key documents like ADRs.
-   3. Further development on how we we collaborate on designing the foundation for the project - seperate from how we implement new containers etc.
+   3. Further development on how we collaborate on designing the foundation for the project - separate from how we implement new containers etc.
    4. ~~How do we make sure agents always use the latest official documentation for the technologies etc. we use?~~ DECIDED — ADR-014 (facts → version-matched docs, cited + stamped; best practices → translated per ADR-013; risk-based triggers; graceful fallback to WebFetch).
    5. Always subagent driven?
    6. When AI deploys, i.e. runs playbooks etc., should we make a methodology so that it does not have to poll all the time or review all the output. Perhaps something about the MAKE method could provide only the relevant feedback?
@@ -99,7 +96,7 @@
    2. Keep appending raw signals to `docs/FRICTION.md` (live now) until the
       retro consumes them.
 
-12. **Spin-up order** — what is the right order of operations when spinning up
+12. **Spin-up / build order** — what is the right order of operations when spinning up
     from scratch (OS, DNS, Authentik, Traefik, …)?
 
 13. **Intentions** - Is the current setup clearly identifying intentions throughout? We have the readme files but is that enough? Also, how do we rechallange desisions and how they interact over time. I.e. We have these two services running, but extending one a little bit could make the other redundant so we could remove it. Or an alternative to this services has emerged, and it is actually better.
diff --git a/docs/decisions/005-bootstrapping.md b/docs/decisions/005-bootstrapping.md
index 71188f6..b91a85c 100644
--- a/docs/decisions/005-bootstrapping.md
+++ b/docs/decisions/005-bootstrapping.md
@@ -6,8 +6,8 @@ This document defines the **cloud-init template** that managed VMs are cloned
 from, and the **control-node** bootstrapping special case. The per-host
 provisioning pipeline — how a VM is created from this template and handed off to
 Ansible — is owned by ADR-009. Terraform clones the template defined here; the
-template is the base image both for Terraform-managed hosts and for the manually
-provisioned control node.
+template is the base image for Terraform-managed hosts. The control node (`ubongo`)
+is a physical machine installed directly, not cloned from this template (ADR-015).
 
 ## Approach: Proxmox cloud-init template
 
@@ -32,10 +32,10 @@ High-level steps:
 
 ## VM provisioning (per new host)
 
-Per-host VMs are created by **Terraform**, which clones this template, sets the
-cloud-init values (hostname, SSH public key, IP/gateway), and writes the host's
-DNS A record. Cloud-init runs at first boot (~30–60 seconds), leaving the VM
-reachable via SSH with the ansible user's key.
+Per-host VMs are created by **Terraform**, which clones this template and sets the
+cloud-init values (hostname, SSH public key, IP/gateway). Cloud-init runs at first
+boot (~30–60 seconds), leaving the VM reachable via SSH with the ansible user's key.
+Terraform writes no DNS records — the `dns` role owns the internal zone (ADR-009).
 
 The full create → inventory → configure pipeline, and the Terraform↔Ansible data
 contract, are defined in **ADR-009 (provisioning handoff)**. There is no manual
diff --git a/docs/runbooks/new-host.md b/docs/runbooks/new-host.md
index 0f23f26..7b99266 100644
--- a/docs/runbooks/new-host.md
+++ b/docs/runbooks/new-host.md
@@ -58,9 +58,9 @@ locals {
 }
 ```
 
-Terraform clones the cloud-init template from Part A, sets the cloud-init values
-(hostname, SSH key, IP/gateway), and writes the host's DNS A record. See ADR-009
-for the full handoff and the `vms` output → inventory data contract.
+Terraform clones the cloud-init template from Part A and sets the cloud-init values
+(hostname, SSH key, IP/gateway). It writes no DNS records — the `dns` role owns the
+internal zone. See ADR-009 for the full handoff and the `vms` output → inventory data contract.
 
 ---
 
@@ -68,7 +68,7 @@ for the full handoff and the `vms` output → inventory data contract.
 
 ```bash
 make tf-plan TF_ENV=production # review — confirm only the new VM is added
-make tf-apply TF_ENV=production # create the VM + write its DNS A record
+make tf-apply TF_ENV=production # create the VM (no DNS records written)
 make tf-inventory TF_ENV=production # regenerate inventories/production/hosts.yml
 ```
 