diff --git a/docs/superpowers/plans/2026-06-05-ubongo-control-host.md b/docs/superpowers/plans/2026-06-05-ubongo-control-host.md
new file mode 100644
index 0000000..0a1807f
--- /dev/null
+++ b/docs/superpowers/plans/2026-06-05-ubongo-control-host.md
@@ -0,0 +1,745 @@
+# Ubongo Control / AI-Worker Host — Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Record the decision to replace the cluster-resident control VM with a dedicated always-on physical host (`ubongo`) outside the Proxmox cluster, by authoring ADR-015 and reconciling every doc that currently assumes the control node is a cluster VM.
+
+**Architecture:** This is a **documentation-only** change. No code, no roles, no inventory data. `ubongo` is recorded as *designed, not built* (per STATUS.md discipline) — the physical box, its OS install, and its inventory wiring are a future manual build, not part of this plan. The work is: one new ADR (the home of record) plus targeted amendments to the ADRs/runbooks/registers that contradict it, each cross-linking ADR-015.
+
+**Tech Stack:** Markdown only. Verification is the repo's pre-commit hooks (trailing-whitespace, end-of-file, gitleaks, ansible-lint, vault-encryption guard) plus manual internal-consistency checks. There is no markdown linter in the toolchain, so "tests" are hook-pass + cross-reference-resolves greps.
+
+---
+
+## Pre-flight (read once before starting)
+
+- **`rbw` must be unlocked before every commit.** The pre-commit ansible-lint hook decrypts `vault.yml`. Run `rbw unlocked` (exit 0 = good); if not, stop and ask the user to `rbw unlock`. Do not start a task you cannot commit.
+- **Commit style:** one commit per task, imperative subject ≤72 chars, with the trailer:
+  ```
+  Co-Authored-By: Claude Opus 4.8 (1M context)
+  ```
+- **Order matters:** Task 1 (ADR-015) must land first — every later task links to it.
+- **Spec reference:** `docs/superpowers/specs/2026-06-05-ubongo-control-host-design.md`.
+
+---
+
+## File map
+
+| File | Action | Responsibility after change |
+|---|---|---|
+| `docs/decisions/015-control-host.md` | Create | Home of record for the `ubongo` decision |
+| `docs/decisions/001-architecture.md` | Modify | Control node = physical box outside cluster |
+| `docs/decisions/005-bootstrapping.md` | Modify | Control-node bootstrap = bare-metal Debian install |
+| `docs/decisions/009-provisioning-handoff.md` | Modify | Control-node exception is genuinely physical |
+| `docs/decisions/008-testing.md` | Modify | All test levels run on `ubongo`; stub future UI level |
+| `docs/decisions/012-hardware-capacity.md` | Modify | `ubongo` is in-scope physical compute |
+| `docs/hardware/reference.md` | Modify | `ubongo` row in node-capacity + physical-compute section |
+| `docs/runbooks/new-host.md` | Modify | Part E: control node is bare-metal, not `qm clone` |
+| `docs/runbooks/rotate-secrets.md` | Modify | Offline break-glass vault-password requirement |
+| `docs/security/accepted-risks.md` | Modify | Reserve mesh-VPN coordinator risk (pending VPN choice) |
+| `STATUS.md` | Modify | Row: `ubongo` — designed, not built |
+| `CLAUDE.md` | Modify | ADR-015 in Further reading; control-group note |
+
+---
+
+### Task 1: Author ADR-015 (the home of record)
+
+**Files:**
+- Create: `docs/decisions/015-control-host.md`
+
+- [ ] **Step 1: Create the ADR file**
+
+Create `docs/decisions/015-control-host.md` with exactly this content:
+
+```markdown
+# ADR-015 — Control / development / AI-worker host (`ubongo`)
+
+## Context
+
+Earlier ADRs framed the control node — the host that runs Terraform and Ansible —
+as a **single Debian 13 VM on the Proxmox cluster**, manually provisioned as the one
+documented exception to "Terraform owns VM existence" (ADR-009). That framing treats
+the control node purely as a control-plane runner.
+
+It fails four needs, all confirmed as drivers:
+
+1. **Cold-start bootstrap** — the VM that runs Terraform/Ansible cannot exist until
+   something else creates it; the bootstrap is circular and awkward.
+2. **Always-on availability** — the operator wants to SSH in from a work PC or
+   anywhere to drive Claude Code. A cluster VM is gone whenever the cluster is down
+   or being rebuilt.
+3. **Recovery / disaster** — the tool used to rebuild the cluster must not live
+   inside the thing it rebuilds.
+4. **Dev ergonomics** — a persistent home for Claude Code + the repo, not entangled
+   with production VM lifecycle.
+
+A laptop-only answer fails always-on and recovery. A VM-only answer fails cold-start
+and recovery. A small dedicated always-on physical machine outside the cluster
+satisfies all four.
+
+## Decision
+
+Introduce **`ubongo`** (Swahili: *brain*, consistent with the fleet's theme): a
+single dedicated x86-64 mini-PC, always-on, living **outside** the Proxmox cluster.
+It becomes *the* control node and collapses four roles into one box:
+
+- Terraform + Ansible runner (control plane)
+- Claude Code / AI-worker host the operator SSHes into
+- Local test runner (Molecule/Docker, lint, and later a browser stack)
+- Persistent dev home for the repo
+
+There is **no longer a control VM on the cluster.** The `control` inventory group
+points at this physical box. This *strengthens* the ADR-009 control-node exception:
+it is genuinely outside Terraform's world, not a VM pretending to be the exception.
+Every other host stays a Terraform-managed VM exactly as designed.
+
+`ubongo` runs **plain Debian 13** (the `base` role applies). It is not a hypervisor
+and runs no `docker_host` services.
+
+### Hardware target
+
+| Spec | Target | Why |
+|---|---|---|
+| CPU | 4 cores, x86-64 (Intel N100-class or better) | Molecule containers + Chromium prefer x86 |
+| RAM | 16 GB | Docker + headless Chromium + toolchain headroom |
+| Disk | 250 GB SSD/NVMe | Docker images, molecule layers, repos, browser cache |
+| Network | Wired GbE | Always-on reliability over Wi-Fi |
+| Power | Low draw (≤15 W idle) | Runs 24/7 |
+
+Indicative: a refurb Dell/Lenovo/HP micro (USFF) or an N100 mini-PC (~€150–250).
+Claude Code itself is light (the model runs in Anthropic's cloud); the sizing driver
+is **all testing being local** — Molecule (Docker), lint, and a future
+headless-Chromium/Playwright stack.
+
+### Provisioning (bootstrap path)
+
+Manual, on bare metal:
+
+1. Install Debian 13 on the box (one-time, by hand).
+2. `git clone` the repo; `make setup`; `make collections`; set up `rbw` + unlock.
+3. Join the mesh VPN (choice deferred — see below).
+4. From then on `ubongo` manages every other host normally; Ansible manages *it* for
+   baseline config via the `control` group (`base` role only).
+
+### Access & security
+
+- Remote access is via the **mesh VPN** (choice deferred). SSH to `ubongo` over the
+  mesh; nothing is published to the public internet — this stays inside ADR-002.
+- `ubongo` runs the `base` role: SSH hardening, nftables default-deny, fail2ban,
+  auditd, unattended-upgrades. Inbound SSH is allowed **only on the mesh interface**,
+  denied on the physical NIC.
+
+### Recovery model
+
+`ubongo` is the rebuild tool, so three things must survive a full cluster loss:
+
+1. **`mamba` (laptop) is a break-glass clone** — repo + toolchain + mesh + `rbw`,
+   able to drive the fleet if `ubongo` dies.
+2. **Terraform state** lives on `ubongo`, backed up encrypted off-box (synced to
+   `mamba`). For a 2–5 VM fleet it is also reconstructable via `terraform import`.
+3. **Vault password** — `ubongo` gets it from Vaultwarden via `rbw`. `rbw` keeps a
+   local encrypted copy of the vault and decrypts it offline with the operator's
+   Vaultwarden master password, so `ubongo` can decrypt the Ansible vault with the
+   whole cluster down — provided `rbw` has synced once and the operator keeps the
+   Vaultwarden master password offline (memorised + paper in a safe). Mirror onto
+   `mamba`.
+
+There is always exactly one irreducible offline root secret; here it is the
+Vaultwarden master password. Mirroring Vaultwarden onto `ubongo` is rejected: it
+would make the control node run a service (against its remit) and still need that
+master password.
+
+> verified: rbw offline-cache decryption · TO VERIFY before relying on the recovery
+> model · rbw docs · (ADR-014, security-relevant — confirm during build)
+
+## Consequences
+
+- The control node is physical compute outside the cluster, so it appears in
+  `docs/hardware/reference.md` even though it is not a cluster node (ADR-012).
+- All testing (Molecule, lint, staging/external) runs on `ubongo` (ADR-008).
+- A future **service-UI acceptance** testing level (Claude driving a headless browser
+  against a deployed service) is anticipated; `ubongo` is sized for it. The harness
+  is a separate spec.
+
+## Deferred (separate specs / discussions)
+
+1. **Mesh VPN choice** — Tailscale vs NetBird, hosted vs self-hosted. Recovery
+   dimension: a hosted coordinator keeps the mesh up when the cluster is down; a
+   self-hosted coordinator must live off-cluster (on `ubongo`), never on the fleet,
+   or it recreates the chicken-and-egg.
+2. **Browser-E2E verification harness** — Playwright/headless-Chromium, test-user
+   generation, screenshot-back-to-Claude, and the new ADR-008 level.
+3. **`rbw` offline-cache verification** — confirm offline decryption before relying
+   on it (ADR-014).
+
+## What was ruled out
+
+| Option | Reason |
+|---|---|
+| Keep control node as a cluster VM | Fails cold-start, recovery, always-on. |
+| Laptop-only (`mamba` for everything) | Fails always-on. Retained as break-glass backup. |
+| Split roles (control VM + thin jump box) | Two toolchains, split control plane, heavy testing back on a cluster VM. |
+| Mirror Vaultwarden onto `ubongo` | Control node would run a service; still needs the master password. |
+| Self-hosted mesh coordinator on the cluster | Recreates the chicken-and-egg. |
+| Raspberry Pi | Chokes running Docker + Chromium + toolchain together. |
+
+See also: ADR-001 (architecture), ADR-005 (bootstrapping), ADR-008 (testing),
+ADR-009 (provisioning handoff), ADR-012 (hardware/capacity), ADR-002 (security).
+```
+
+- [ ] **Step 2: Confirm `rbw` is unlocked, then verify hooks pass**
+
+Run: `rbw unlocked && pre-commit run --files docs/decisions/015-control-host.md`
+Expected: `rbw` exits 0; hooks report `Passed`/`Skipped` (ansible-lint skips non-YAML; trailing-whitespace + end-of-file Passed).
+
+- [ ] **Step 3: Commit**
+
+```bash
+git add docs/decisions/015-control-host.md
+git commit -m "Add ADR-015 (control/AI-worker host ubongo)"
+```
+
+---
+
+### Task 2: Amend ADR-001 (architecture)
+
+**Files:**
+- Modify: `docs/decisions/001-architecture.md`
+
+- [ ] **Step 1: Update the control-node bullet**
+
+Find (lines ~13–15):
+```markdown
+- **Control node**: A dedicated Debian 13 VM on the cluster. Ansible runs from here.
+  The control node is the one host that cannot fully bootstrap itself from scratch
+  and requires manual initial setup (see `docs/runbooks/new-host.md`).
+```
+Replace with:
+```markdown
+- **Control node**: `ubongo` — a dedicated always-on **physical** x86-64 machine
+  **outside** the cluster. Ansible runs from here. It cannot be created by the
+  Terraform it hosts, so it is provisioned manually (see ADR-015 and
+  `docs/runbooks/new-host.md`).
+```
+
+- [ ] **Step 2: Update the VM-existence table row**
+
+Find:
+```markdown
+| VM existence | Terraform (`terraform/`) | Clones the cloud-init template; control node is the one manual exception (see ADR-009) |
+```
+Replace with:
+```markdown
+| VM existence | Terraform (`terraform/`) | Clones the cloud-init template; `ubongo` (control node) is a physical box outside the cluster, the one manual exception (see ADR-009/ADR-015) |
+```
+
+- [ ] **Step 3: Update the `control` host-group comment**
+
+Find:
+```markdown
+├── control # the control node itself — baseline config only, runs no services
+```
+Replace with:
+```markdown
+├── control # ubongo — physical control node outside the cluster; baseline config only, runs no services
+```
+
+- [ ] **Step 4: Verify and commit**
+
+Run: `rbw unlocked && pre-commit run --files docs/decisions/001-architecture.md`
+Expected: hooks `Passed`/`Skipped`.
+```bash
+git add docs/decisions/001-architecture.md
+git commit -m "ADR-001: control node is physical ubongo outside cluster"
+```
+
+---
+
+### Task 3: Amend ADR-005 (bootstrapping)
+
+**Files:**
+- Modify: `docs/decisions/005-bootstrapping.md`
+
+- [ ] **Step 1: Replace the "Control node bootstrapping" section body**
+
+Find (the numbered list under `## Control node bootstrapping`, lines ~52–69):
+```markdown
+The control node is a special case — it runs Terraform and Ansible, so it cannot
+be created by the Terraform it hosts (chicken-and-egg). It is the one documented
+exception to Terraform-owned VM existence (see ADR-009). The control node requires:
+
+1. Manual VM provisioning — clone this cloud-init template by hand (Proxmox UI or
+   `qm clone`), since Terraform is not yet available to do it
+2. Manual setup of the Ansible environment:
+```
+Replace with:
+```markdown
+The control node is a special case — it runs Terraform and Ansible, so it cannot
+be created by the Terraform it hosts (chicken-and-egg). It is `ubongo`, a dedicated
+**physical** machine outside the cluster, and the one documented exception to
+Terraform-owned VM existence (see ADR-009 and ADR-015). The control node requires:
+
+1. Manual OS provisioning — install Debian 13 on the physical box by hand (it is not
+   a Proxmox guest, so there is no template to clone)
+2. Manual setup of the Ansible environment:
+```
+
+- [ ] **Step 2: Update the trailing reference to the control node listing**
+
+Find:
+```markdown
+The control node itself is listed in `inventories/production/hosts.yml` under
+a `control` group and can be managed for baseline config (SSH, firewall, updates)
+but not for the `docker_host` role (it does not run services).
+```
+Replace with:
+```markdown
+`ubongo` is listed in `inventories/production/hosts.yml` under the `control` group
+and can be managed for baseline config (SSH, firewall, updates) but not for the
+`docker_host` role (it does not run services). Hardware target and recovery model
+are in ADR-015.
+```
+
+- [ ] **Step 3: Verify and commit**
+
+Run: `rbw unlocked && pre-commit run --files docs/decisions/005-bootstrapping.md`
+Expected: hooks `Passed`/`Skipped`.
+```bash
+git add docs/decisions/005-bootstrapping.md
+git commit -m "ADR-005: control node bootstrap is bare-metal Debian on ubongo"
+```
+
+---
+
+### Task 4: Amend ADR-009 (provisioning handoff)
+
+**Files:**
+- Modify: `docs/decisions/009-provisioning-handoff.md`
+
+- [ ] **Step 1: Strengthen the control-node exception section**
+
+Find (under `## The control-node exception`, lines ~129–138):
+```markdown
+The control node — the host that runs Terraform and Ansible — is the one VM
+Terraform does **not** create. It cannot provision the infrastructure that would
+provision itself (chicken-and-egg). It is therefore the single documented exception
+to "Terraform owns VM existence":
+
+- Provisioned and bootstrapped manually, per the control-node section of ADR-005.
+- Listed in `inventories//hosts.yml` under the `control` group, and managed by
+  Ansible for baseline config only (no `docker_host` role).
+```
+Replace with:
+```markdown
+The control node — the host that runs Terraform and Ansible — is `ubongo`, a
+dedicated **physical** machine outside the cluster. It is not a VM at all, so
+Terraform genuinely never touches it: it cannot provision the infrastructure that
+would provision itself (chicken-and-egg). It is therefore the single documented
+exception to "Terraform owns VM existence":
+
+- Provisioned and bootstrapped manually on bare metal, per the control-node section
+  of ADR-005; rationale, hardware, and recovery model in ADR-015.
+- Listed in `inventories//hosts.yml` under the `control` group, and managed by
+  Ansible for baseline config only (no `docker_host` role).
+```
+
+- [ ] **Step 2: Verify and commit**
+
+Run: `rbw unlocked && pre-commit run --files docs/decisions/009-provisioning-handoff.md`
+Expected: hooks `Passed`/`Skipped`.
+```bash
+git add docs/decisions/009-provisioning-handoff.md
+git commit -m "ADR-009: control-node exception is a physical box, not a VM"
+```
+
+---
+
+### Task 5: Amend ADR-008 (testing)
+
+**Files:**
+- Modify: `docs/decisions/008-testing.md`
+
+- [ ] **Step 1: Make Level 1 say it runs on `ubongo`**
+
+Find:
+```markdown
+Runs in Docker on the control node or in CI. Fast (~5 min per role).
+```
+Replace with:
+```markdown
+Runs in Docker on the control node (`ubongo`) or in CI. Fast (~5 min per role).
+```
+
+- [ ] **Step 2: Add a future service-UI acceptance level stub**
+
+Find (the end of `### Level 3 — External smoke test from askari`, lines ~51–55):
+```markdown
+### Level 3 — External smoke test from askari
+
+Once `askari` is operational: scripted checks from outside the network confirming
+that public-facing services respond correctly. Catches firewall and reverse proxy
+configuration issues invisible to Ansible check mode.
+```
+Replace with:
+```markdown
+### Level 3 — External smoke test from askari
+
+Once `askari` is operational: scripted checks from outside the network confirming
+that public-facing services respond correctly. Catches firewall and reverse proxy
+configuration issues invisible to Ansible check mode.
+
+### Level 4 — Service-UI acceptance (planned, not built)
+
+Claude drives a headless browser from `ubongo` against a *deployed* service: loads
+the rendered UI, creates test users, exercises features, and hands the operator a
+manual test script for the rest. Catches application-level regressions that no lower
+level sees. The harness (Playwright/headless-Chromium, screenshot-back-to-Claude) is
+a **separate spec**; `ubongo` is sized for it (ADR-015). Status: designed, not built
+(STATUS.md).
+```
+
+- [ ] **Step 3: Verify and commit**
+
+Run: `rbw unlocked && pre-commit run --files docs/decisions/008-testing.md`
+Expected: hooks `Passed`/`Skipped`.
+```bash
+git add docs/decisions/008-testing.md
+git commit -m "ADR-008: tests run on ubongo; stub Level 4 service-UI acceptance"
+```
+
+---
+
+### Task 6: Amend ADR-012 and the hardware reference
+
+**Files:**
+- Modify: `docs/decisions/012-hardware-capacity.md`
+- Modify: `docs/hardware/reference.md`
+
+- [ ] **Step 1: Note `ubongo` as in-scope physical compute in ADR-012**
+
+In `docs/decisions/012-hardware-capacity.md`, find the first bullet under `## Decision`:
+```markdown
+- `docs/hardware/reference.md` is the single, hand-maintained source of truth for
+  physical compute + network gear and workload placement intent. Two
+  machine-readable tables (node capacity, workload placement) carry the numbers.
+```
+Replace with:
+```markdown
+- `docs/hardware/reference.md` is the single, hand-maintained source of truth for
+  physical compute + network gear and workload placement intent. Two
+  machine-readable tables (node capacity, workload placement) carry the numbers.
+  This includes `ubongo`, the physical control node (ADR-015), even though it sits
+  outside the Proxmox cluster.
+```
+
+- [ ] **Step 2: Add `ubongo` to the physical-compute section of the reference**
+
+In `docs/hardware/reference.md`, find:
+```markdown
+_(repeat for pve1, pve2, askari)_
+```
+Replace with:
+```markdown
+### ubongo (control node — outside the cluster)
+- **Model / form factor:** _TBD (x86-64 mini-PC / USFF, e.g. N100 or refurb micro)_
+- **CPU:** _TBD (target 4 cores, x86-64)_
+- **RAM:** _TBD (target 16 GB)_
+- **Storage:** _TBD (target 250 GB SSD/NVMe)_
+- **NICs:** _wired GbE_
+- **Notes:** _always-on; control plane + AI-worker + local test runner (ADR-015); not a Proxmox guest_
+
+_(repeat for pve1, pve2, askari)_
+```
+
+- [ ] **Step 3: Add `ubongo` to the machine-readable node-capacity table**
+
+In `docs/hardware/reference.md`, find the node-capacity table:
+```markdown
+| node | cores | ram_gb | disk_gb |
+|------|-------|--------|---------|
+| pve0 | 20 | 64 | 4000 |
+| pve1 | 20 | 64 | 4000 |
+```
+Replace with:
+```markdown
+| node | cores | ram_gb | disk_gb |
+|------|-------|--------|---------|
+| pve0 | 20 | 64 | 4000 |
+| pve1 | 20 | 64 | 4000 |
+| ubongo | 4 | 16 | 250 |
+```
+
+Note: the header row (`node | cores | ram_gb | disk_gb`) is a parser contract for
+`scripts/capacity-scan.py` — only a data row is added, the header is untouched.
+
+- [ ] **Step 4: Verify the capacity scan still parses, hooks pass, then commit**
+
+Run: `python3 scripts/capacity-scan.py 2>&1 | head -c 400`
+Expected: it runs without a parse error and the output reflects the new `ubongo` row (no traceback). If the script needs an argument or env, consult its `--help`; a clean exit with JSON is success.
+
+Run: `rbw unlocked && pre-commit run --files docs/decisions/012-hardware-capacity.md docs/hardware/reference.md`
+Expected: hooks `Passed`/`Skipped`.
+```bash
+git add docs/decisions/012-hardware-capacity.md docs/hardware/reference.md
+git commit -m "ADR-012/hardware: add ubongo as physical control node"
+```
+
+---
+
+### Task 7: Update the new-host runbook (Part E)
+
+**Files:**
+- Modify: `docs/runbooks/new-host.md`
+
+- [ ] **Step 1: Replace Part E with the bare-metal control-node procedure**
+
+Find the whole `## Part E — Control node (manual exception)` section (lines ~113–133), from the heading through the paragraph ending "every other host comes from `make tf-inventory`." Replace it with:
+```markdown
+## Part E — Control node (`ubongo`, manual exception)
+
+The control node runs Terraform and Ansible, so it cannot be created by the
+Terraform it hosts (chicken-and-egg). It is `ubongo`, a dedicated **physical**
+machine outside the cluster — not a Proxmox guest. It is the **one** host
+provisioned manually. Rationale, hardware target, and recovery model: ADR-015.
+
+1. Install Debian 13 on the physical box by hand (no template to clone).
+2. Create the `ansible` user and install its SSH public key.
+3. Set up the Ansible environment on it:
+   ```bash
+   git clone ~/ansible
+   cd ~/ansible
+   make setup # venv + Python deps
+   make collections # Ansible collections
+   rbw login && rbw unlock # vault password from Vaultwarden (see rotate-secrets.md)
+   ```
+4. Join the mesh VPN (choice deferred — see ADR-015) so it is reachable over SSH
+   from elsewhere.
+5. Add `ubongo` to `inventories//hosts.yml` under the `control` group.
+
+Because `ubongo` is not in `local.vms`, this is the only case where editing
+`hosts.yml` by hand is expected. **Known limitation:** `make tf-inventory`
+regenerates `hosts.yml` from Terraform outputs and will overwrite a hand-added
+`control` entry — re-add `ubongo` after running it (preserving the control entry in
+the generator is tracked separately, not yet built).
+```
+
+- [ ] **Step 2: Update the Prerequisites note that assumes a template**
+
+Find:
+```markdown
+- Proxmox VM template exists (Debian 13 cloud-init image — see below if not)
+```
+Replace with:
+```markdown
+- Proxmox VM template exists (Debian 13 cloud-init image — see below if not).
+  Not needed for the control node `ubongo`, which is bare-metal (Part E).
+```
+
+- [ ] **Step 3: Verify and commit**
+
+Run: `rbw unlocked && pre-commit run --files docs/runbooks/new-host.md`
+Expected: hooks `Passed`/`Skipped`.
+```bash
+git add docs/runbooks/new-host.md
+git commit -m "new-host runbook: control node ubongo is bare-metal"
+```
+
+---
+
+### Task 8: Update the rotate-secrets runbook (offline break-glass)
+
+**Files:**
+- Modify: `docs/runbooks/rotate-secrets.md`
+
+- [ ] **Step 1: Add a break-glass section after the `rbw` setup section**
+
+Find the end of the `## One-time — \`rbw\` setup on a new machine` section:
+```markdown
+Once unlocked, `make encrypt/decrypt/check/deploy` and the pre-commit ansible-lint
+hook all obtain the password automatically. If the agent is locked you'll see a
+clear "run: rbw unlock" error rather than a hang.
+```
+Replace with:
+```markdown
+Once unlocked, `make encrypt/decrypt/check/deploy` and the pre-commit ansible-lint
+hook all obtain the password automatically. If the agent is locked you'll see a
+clear "run: rbw unlock" error rather than a hang.
+
+---
+
+## Break-glass — vault access during a full cluster outage
+
+The control node `ubongo` (ADR-015) is the tool used to rebuild the cluster, so it
+must be able to decrypt the vault even when Vaultwarden (if hosted on the cluster)
+is down. `rbw` keeps a **local encrypted copy** of the Vaultwarden vault and decrypts
+it **offline** with your Vaultwarden master password — no live server needed for
+entries it has already synced. The recovery design therefore requires:
+
+- `rbw` on `ubongo` (and on `mamba`, the break-glass laptop) has **synced at least
+  once** while Vaultwarden was reachable (`rbw sync`).
+- Your **Vaultwarden master password** is kept **offline** — in a password manager on
+  `mamba` and on paper in a safe — independent of any cluster-hosted Vaultwarden.
+
+There is always exactly one irreducible offline root secret; here it is the
+Vaultwarden master password. Keep it recoverable without the cluster.
+
+> **To verify (ADR-014, security-relevant):** confirm `rbw` actually decrypts its
+> local cache fully offline on your pinned `rbw` version before relying on this.
+```
+
+- [ ] **Step 2: Verify and commit**
+
+Run: `rbw unlocked && pre-commit run --files docs/runbooks/rotate-secrets.md`
+Expected: hooks `Passed`/`Skipped`.
+```bash
+git add docs/runbooks/rotate-secrets.md
+git commit -m "rotate-secrets: document offline vault break-glass for ubongo"
+```
+
+---
+
+### Task 9: Reserve the mesh-VPN accepted-risk entry
+
+**Files:**
+- Modify: `docs/security/accepted-risks.md`
+
+- [ ] **Step 1: Add R3 to the risk table**
+
+Find the table row for R2:
+```markdown
+| R2 | **SELinux not used** — no SELinux mandatory access control | AppArmor — Debian-native and enforced via the CIS baseline — already provides MAC; adding SELinux means two MAC systems, non-native to Debian, for no real gain | A service that ships and requires its own SELinux policy; threat model shifts toward targeted attackers |
+```
+Add immediately **after** it:
+```markdown
+| R3 | **Mesh-VPN coordinator dependency (pending VPN choice)** — remote SSH to the control node `ubongo` (ADR-015) rides a mesh VPN whose coordination plane may be a third party (e.g. hosted Tailscale/NetBird) | A hosted coordinator keeps the mesh up when the cluster is down, which *helps* recovery; nothing is exposed to the public internet (ADR-002 preserved). Provisional — finalised when the VPN is chosen (separate discussion) | The VPN choice is settled (replace this entry with the concrete decision); a self-hosted coordinator is adopted; the provider's trust/security posture changes |
+```
+
+- [ ] **Step 2: Update the "Last reviewed" footer date**
+
+Find:
+```markdown
+_Last reviewed: 2026-06-04. The prior gaps
+```
+Replace `2026-06-04` with `2026-06-05` (only the date changes; leave the rest of the sentence intact):
+```markdown
+_Last reviewed: 2026-06-05. The prior gaps
+```
+
+- [ ] **Step 3: Verify and commit**
+
+Run: `rbw unlocked && pre-commit run --files docs/security/accepted-risks.md`
+Expected: hooks `Passed`/`Skipped`.
+```bash
+git add docs/security/accepted-risks.md
+git commit -m "accepted-risks: reserve R3 mesh-VPN coordinator (pending choice)"
+```
+
+---
+
+### Task 10: Add the `ubongo` row to STATUS.md
+
+**Files:**
+- Modify: `STATUS.md`
+
+- [ ] **Step 1: Add a row to the "Designed but not built" table**
+
+Find the last row of the `## Designed but not built` table:
+```markdown
+| Network IDS + security alerting | ADR-002 / TODO 15 | Suricata on OPNsense + AIDE/`auditd`/`fail2ban` alerting into the monitoring stack; not built |
+```
+Add immediately **after** it:
+```markdown
+| `ubongo` — physical control / AI-worker host | ADR-015 | Replaces the cluster control VM with a dedicated always-on x86 box outside the cluster. Decision recorded; box not yet acquired/installed, not in inventory. |
+```
+
+- [ ] **Step 2: Verify and commit**
+
+Run: `rbw unlocked && pre-commit run --files STATUS.md`
+Expected: hooks `Passed`/`Skipped`.
+```bash
+git add STATUS.md
+git commit -m "STATUS: record ubongo control host as designed, not built"
+```
+
+---
+
+### Task 11: Update CLAUDE.md (index + control-group note)
+
+**Files:**
+- Modify: `CLAUDE.md`
+
+- [ ] **Step 1: Add ADR-015 to the Further reading table**
+
+Find:
+```markdown
+| Bootstrapping hosts | `docs/decisions/005-bootstrapping.md` |
+```
+Replace with:
+```markdown
+| Bootstrapping hosts | `docs/decisions/005-bootstrapping.md` |
+| Control / AI-worker host (`ubongo`) | `docs/decisions/015-control-host.md` |
+```
+
+- [ ] **Step 2: Update the control-group parenthetical in the Inventory structure section**
+
+Find:
+```markdown
+(`control` holds the one manually-provisioned control node — see ADR-009.)
+```
+Replace with:
+```markdown
+(`control` holds `ubongo`, the one manually-provisioned **physical** control node
+outside the cluster — see ADR-009 and ADR-015.)
+```
+
+- [ ] **Step 3: Verify and commit**
+
+Run: `rbw unlocked && pre-commit run --files CLAUDE.md`
+Expected: hooks `Passed`/`Skipped`.
+```bash
+git add CLAUDE.md
+git commit -m "CLAUDE.md: link ADR-015; note ubongo as physical control node"
+```
+
+---
+
+### Task 12: Final consistency sweep
+
+**Files:** none modified (verification only)
+
+- [ ] **Step 1: Confirm no doc still calls the control node a VM**
+
+Run:
+```bash
+grep -rniE "control node.*(VM|virtual)|dedicated Debian 13 VM" docs/ CLAUDE.md STATUS.md
+```
+Expected: no hit that *asserts* the control node is a VM. (Hits inside ADR-015's "What was ruled out" table that describe the rejected option are fine.) If any other doc still frames the control node as a VM, fix it the same way as the relevant task above and amend that task's commit.
+
+- [ ] **Step 2: Confirm every ADR-015 cross-link resolves**
+
+Run:
+```bash
+grep -rl "ADR-015\|015-control-host" docs/ CLAUDE.md STATUS.md
+test -f docs/decisions/015-control-host.md && echo "ADR-015 present"
+```
+Expected: the file exists and the referencing docs (001, 005, 008, 009, 012, runbooks, accepted-risks, STATUS, CLAUDE.md) appear.
+
+- [ ] **Step 3: Full hook run**
+
+Run: `rbw unlocked && pre-commit run --all-files`
+Expected: all hooks `Passed`/`Skipped`. Fix anything that fails (most likely trailing whitespace or end-of-file) and amend the owning commit.
+
+- [ ] **Step 4: Push (only if the user asks)**
+
+Per CLAUDE.md, push to `origin` is the off-machine backup. If the user wants it pushed:
+```bash
+git push origin main
+```
+
+---
+
+## Self-review notes (author)
+
+- **Spec coverage:** every spec section maps to a task — host decision/hardware/bootstrap/access/recovery → Task 1 (ADR-015); the doc-changes table → Tasks 2–11; testing implication → Task 5; deferrals are recorded in ADR-015 and not implemented here (correct — they are separate specs). ✓
+- **Not in scope (intentional):** acquiring/installing the box, mesh-VPN selection, the browser harness, adding `ubongo` to live inventory, and modifying `tf_to_inventory.py` to preserve the control entry (logged as a known limitation in Task 7). ✓
+- **No placeholders:** every edit shows exact find/replace text; the only `_TBD_` strings are deliberate hardware-reference skeleton fields matching that file's existing style. ✓
+```