Add implementation plan for ubongo control host

Task-by-task docs plan: author ADR-015 and reconcile ADR-001/005/008/009/012,
the new-host and rotate-secrets runbooks, accepted-risks, STATUS, and CLAUDE.md.
Documentation-only; the physical box stays "designed, not built".

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
sjat 2026-06-05 09:29:10 +02:00
parent c1b21c9b2b
commit 0e9f179bfc

@@ -0,0 +1,745 @@
# Ubongo Control / AI-Worker Host — Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Record the decision to replace the cluster-resident control VM with a dedicated always-on physical host (`ubongo`) outside the Proxmox cluster, by authoring ADR-015 and reconciling every doc that currently assumes the control node is a cluster VM.
**Architecture:** This is a **documentation-only** change. No code, no roles, no inventory data. `ubongo` is recorded as *designed, not built* (per STATUS.md discipline) — the physical box, its OS install, and its inventory wiring are a future manual build, not part of this plan. The work is: one new ADR (the home of record) plus targeted amendments to the ADRs/runbooks/registers that contradict it, each cross-linking ADR-015.
**Tech Stack:** Markdown only. Verification is the repo's pre-commit hooks (trailing-whitespace, end-of-file, gitleaks, ansible-lint, vault-encryption guard) plus manual internal-consistency checks. There is no markdown linter in the toolchain, so "tests" are hook-pass + cross-reference-resolves greps.
---
## Pre-flight (read once before starting)
- **`rbw` must be unlocked before every commit.** The pre-commit ansible-lint hook decrypts `vault.yml`. Run `rbw unlocked` (exit 0 = good); if not, stop and ask the user to `rbw unlock`. Do not start a task you cannot commit.
- **Commit style:** one commit per task, imperative subject ≤72 chars, with the trailer:
```
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
```
- **Order matters:** Task 1 (ADR-015) must land first — every later task links to it.
- **Spec reference:** `docs/superpowers/specs/2026-06-05-ubongo-control-host-design.md`.
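
The per-task `git commit -m "…"` commands in this plan pass only a subject, so the trailer and the `rbw` check have to be applied by hand each time. One way to bundle the three pre-flight rules is a small wrapper. This is only a sketch: `commit_task` and `TRAILER` are hypothetical names, not something the repo provides, and it assumes `rbw` and `git` are on `PATH`.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the repo: bundles the pre-flight rules.
TRAILER='Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>'

commit_task() {
    local subject="$1"
    # Commit-style rule: subject of at most 72 characters.
    if [ "${#subject}" -gt 72 ]; then
        echo "subject is ${#subject} chars (max 72)" >&2
        return 1
    fi
    # The ansible-lint hook decrypts vault.yml, so refuse to commit while locked.
    if ! rbw unlocked >/dev/null 2>&1; then
        echo "rbw is locked; run: rbw unlock" >&2
        return 1
    fi
    # A second -m becomes its own paragraph, which is where Git expects trailers.
    git commit -m "$subject" -m "$TRAILER"
}
```

Usage would be `git add <file> && commit_task "Add ADR-015 (control/AI-worker host ubongo)"`. The wrapper stops before `git commit` runs, so a locked `rbw` agent surfaces as a clear message rather than a failed hook.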
---
## File map
| File | Action | Responsibility after change |
|---|---|---|
| `docs/decisions/015-control-host.md` | Create | Home of record for the `ubongo` decision |
| `docs/decisions/001-architecture.md` | Modify | Control node = physical box outside cluster |
| `docs/decisions/005-bootstrapping.md` | Modify | Control-node bootstrap = bare-metal Debian install |
| `docs/decisions/009-provisioning-handoff.md` | Modify | Control-node exception is genuinely physical |
| `docs/decisions/008-testing.md` | Modify | All test levels run on `ubongo`; stub future UI level |
| `docs/decisions/012-hardware-capacity.md` | Modify | `ubongo` is in-scope physical compute |
| `docs/hardware/reference.md` | Modify | `ubongo` row in node-capacity + physical-compute section |
| `docs/runbooks/new-host.md` | Modify | Part E: control node is bare-metal, not `qm clone` |
| `docs/runbooks/rotate-secrets.md` | Modify | Offline break-glass vault-password requirement |
| `docs/security/accepted-risks.md` | Modify | Reserve mesh-VPN coordinator risk (pending VPN choice) |
| `STATUS.md` | Modify | Row: `ubongo` — designed, not built |
| `CLAUDE.md` | Modify | ADR-015 in Further reading; control-group note |
---
### Task 1: Author ADR-015 (the home of record)
**Files:**
- Create: `docs/decisions/015-control-host.md`
- [ ] **Step 1: Create the ADR file**
Create `docs/decisions/015-control-host.md` with exactly this content:
```markdown
# ADR-015 — Control / development / AI-worker host (`ubongo`)
## Context
Earlier ADRs framed the control node — the host that runs Terraform and Ansible —
as a **single Debian 13 VM on the Proxmox cluster**, manually provisioned as the one
documented exception to "Terraform owns VM existence" (ADR-009). That framing treats
the control node purely as a control-plane runner.
It fails four needs, all confirmed as drivers:
1. **Cold-start bootstrap** — the VM that runs Terraform/Ansible cannot exist until
something else creates it; the bootstrap is circular and awkward.
2. **Always-on availability** — the operator wants to SSH in from a work PC or
anywhere to drive Claude Code. A cluster VM is gone whenever the cluster is down
or being rebuilt.
3. **Recovery / disaster** — the tool used to rebuild the cluster must not live
inside the thing it rebuilds.
4. **Dev ergonomics** — a persistent home for Claude Code + the repo, not entangled
with production VM lifecycle.
A laptop-only answer fails always-on and recovery. A VM-only answer fails cold-start
and recovery. A small dedicated always-on physical machine outside the cluster
satisfies all four.
## Decision
Introduce **`ubongo`** (Swahili: *brain*, consistent with the fleet's theme): a
single dedicated x86-64 mini-PC, always-on, living **outside** the Proxmox cluster.
It becomes *the* control node and collapses four roles into one box:
- Terraform + Ansible runner (control plane)
- Claude Code / AI-worker host the operator SSHes into
- Local test runner (Molecule/Docker, lint, and later a browser stack)
- Persistent dev home for the repo
There is **no longer a control VM on the cluster.** The `control` inventory group
points at this physical box. This *strengthens* the ADR-009 control-node exception:
it is genuinely outside Terraform's world, not a VM pretending to be the exception.
Every other host stays a Terraform-managed VM exactly as designed.
`ubongo` runs **plain Debian 13** (the `base` role applies). It is not a hypervisor
and runs no `docker_host` services.
### Hardware target
| Spec | Target | Why |
|---|---|---|
| CPU | 4 cores, x86-64 (Intel N100-class or better) | Molecule containers + Chromium prefer x86 |
| RAM | 16 GB | Docker + headless Chromium + toolchain headroom |
| Disk | 250 GB SSD/NVMe | Docker images, molecule layers, repos, browser cache |
| Network | Wired GbE | Always-on reliability over Wi-Fi |
| Power | Low draw (≤15 W idle) | Runs 24/7 |
Indicative: a refurb Dell/Lenovo/HP micro (USFF) or an N100 mini-PC (~€150-250).
Claude Code itself is light (the model runs in Anthropic's cloud); the sizing driver
is **all testing being local** — Molecule (Docker), lint, and a future
headless-Chromium/Playwright stack.
### Provisioning (bootstrap path)
Manual, on bare metal:
1. Install Debian 13 on the box (one-time, by hand).
2. `git clone` the repo; `make setup`; `make collections`; set up `rbw` + unlock.
3. Join the mesh VPN (choice deferred — see below).
4. From then on `ubongo` manages every other host normally; Ansible manages *it* for
baseline config via the `control` group (`base` role only).
### Access & security
- Remote access is via the **mesh VPN** (choice deferred). SSH to `ubongo` over the
mesh; nothing is published to the public internet — this stays inside ADR-002.
- `ubongo` runs the `base` role: SSH hardening, nftables default-deny, fail2ban,
auditd, unattended-upgrades. Inbound SSH is allowed **only on the mesh interface**,
denied on the physical NIC.
### Recovery model
`ubongo` is the rebuild tool, so three things must survive a full cluster loss:
1. **`mamba` (laptop) is a break-glass clone** — repo + toolchain + mesh + `rbw`,
able to drive the fleet if `ubongo` dies.
2. **Terraform state** lives on `ubongo`, backed up encrypted off-box (synced to
`mamba`). For a 25 VM fleet it is also reconstructable via `terraform import`.
3. **Vault password**: `ubongo` gets it from Vaultwarden via `rbw`. `rbw` keeps a
local encrypted copy of the vault and decrypts it offline with the operator's
Vaultwarden master password, so `ubongo` can decrypt the Ansible vault with the
whole cluster down — provided `rbw` has synced once and the operator keeps the
Vaultwarden master password offline (memorised + paper in a safe). Mirror onto
`mamba`.
There is always exactly one irreducible offline root secret; here it is the
Vaultwarden master password. Mirroring Vaultwarden onto `ubongo` is rejected: it
would make the control node run a service (against its remit) and still need that
master password.
> verified: rbw offline-cache decryption · TO VERIFY before relying on the recovery
> model · rbw docs · (ADR-014, security-relevant — confirm during build)
## Consequences
- The control node is physical compute outside the cluster, so it appears in
`docs/hardware/reference.md` even though it is not a cluster node (ADR-012).
- All testing (Molecule, lint, staging/external) runs on `ubongo` (ADR-008).
- A future **service-UI acceptance** testing level (Claude driving a headless browser
against a deployed service) is anticipated; `ubongo` is sized for it. The harness
is a separate spec.
## Deferred (separate specs / discussions)
1. **Mesh VPN choice** — Tailscale vs NetBird, hosted vs self-hosted. Recovery
dimension: a hosted coordinator keeps the mesh up when the cluster is down; a
self-hosted coordinator must live off-cluster (on `ubongo`), never on the fleet,
or it recreates the chicken-and-egg.
2. **Browser-E2E verification harness** — Playwright/headless-Chromium, test-user
generation, screenshot-back-to-Claude, and the new ADR-008 level.
3. **`rbw` offline-cache verification** — confirm offline decryption before relying
on it (ADR-014).
## What was ruled out
| Option | Reason |
|---|---|
| Keep control node as a cluster VM | Fails cold-start, recovery, always-on. |
| Laptop-only (`mamba` for everything) | Fails always-on. Retained as break-glass backup. |
| Split roles (control VM + thin jump box) | Two toolchains, split control plane, heavy testing back on a cluster VM. |
| Mirror Vaultwarden onto `ubongo` | Control node would run a service; still needs the master password. |
| Self-hosted mesh coordinator on the cluster | Recreates the chicken-and-egg. |
| Raspberry Pi | Chokes running Docker + Chromium + toolchain together. |
See also: ADR-001 (architecture), ADR-005 (bootstrapping), ADR-008 (testing),
ADR-009 (provisioning handoff), ADR-012 (hardware/capacity), ADR-002 (security).
```
- [ ] **Step 2: Confirm `rbw` is unlocked, then verify hooks pass**
Run: `rbw unlocked && pre-commit run --files docs/decisions/015-control-host.md`
Expected: `rbw` exits 0; hooks report `Passed`/`Skipped` (ansible-lint skips non-YAML; trailing-whitespace + end-of-file Passed).
- [ ] **Step 3: Commit**
```bash
git add docs/decisions/015-control-host.md
git commit -m "Add ADR-015 (control/AI-worker host ubongo)"
```
---
### Task 2: Amend ADR-001 (architecture)
**Files:**
- Modify: `docs/decisions/001-architecture.md`
- [ ] **Step 1: Update the control-node bullet**
Find (lines ~13-15):
```markdown
- **Control node**: A dedicated Debian 13 VM on the cluster. Ansible runs from here.
The control node is the one host that cannot fully bootstrap itself from scratch
and requires manual initial setup (see `docs/runbooks/new-host.md`).
```
Replace with:
```markdown
- **Control node**: `ubongo` — a dedicated always-on **physical** x86-64 machine
**outside** the cluster. Ansible runs from here. It cannot be created by the
Terraform it hosts, so it is provisioned manually (see ADR-015 and
`docs/runbooks/new-host.md`).
```
- [ ] **Step 2: Update the VM-existence table row**
Find:
```markdown
| VM existence | Terraform (`terraform/`) | Clones the cloud-init template; control node is the one manual exception (see ADR-009) |
```
Replace with:
```markdown
| VM existence | Terraform (`terraform/`) | Clones the cloud-init template; `ubongo` (control node) is a physical box outside the cluster, the one manual exception (see ADR-009/ADR-015) |
```
- [ ] **Step 3: Update the `control` host-group comment**
Find:
```markdown
├── control # the control node itself — baseline config only, runs no services
```
Replace with:
```markdown
├── control # ubongo — physical control node outside the cluster; baseline config only, runs no services
```
- [ ] **Step 4: Verify and commit**
Run: `rbw unlocked && pre-commit run --files docs/decisions/001-architecture.md`
Expected: hooks `Passed`/`Skipped`.
```bash
git add docs/decisions/001-architecture.md
git commit -m "ADR-001: control node is physical ubongo outside cluster"
```
---
### Task 3: Amend ADR-005 (bootstrapping)
**Files:**
- Modify: `docs/decisions/005-bootstrapping.md`
- [ ] **Step 1: Replace the "Control node bootstrapping" section body**
Find (the numbered list under `## Control node bootstrapping`, lines ~52-69):
```markdown
The control node is a special case — it runs Terraform and Ansible, so it cannot
be created by the Terraform it hosts (chicken-and-egg). It is the one documented
exception to Terraform-owned VM existence (see ADR-009). The control node requires:
1. Manual VM provisioning — clone this cloud-init template by hand (Proxmox UI or
`qm clone`), since Terraform is not yet available to do it
2. Manual setup of the Ansible environment:
```
Replace with:
```markdown
The control node is a special case — it runs Terraform and Ansible, so it cannot
be created by the Terraform it hosts (chicken-and-egg). It is `ubongo`, a dedicated
**physical** machine outside the cluster, and the one documented exception to
Terraform-owned VM existence (see ADR-009 and ADR-015). The control node requires:
1. Manual OS provisioning — install Debian 13 on the physical box by hand (it is not
a Proxmox guest, so there is no template to clone)
2. Manual setup of the Ansible environment:
```
- [ ] **Step 2: Update the trailing reference to the control node listing**
Find:
```markdown
The control node itself is listed in `inventories/production/hosts.yml` under
a `control` group and can be managed for baseline config (SSH, firewall, updates)
but not for the `docker_host` role (it does not run services).
```
Replace with:
```markdown
`ubongo` is listed in `inventories/production/hosts.yml` under the `control` group
and can be managed for baseline config (SSH, firewall, updates) but not for the
`docker_host` role (it does not run services). Hardware target and recovery model
are in ADR-015.
```
- [ ] **Step 3: Verify and commit**
Run: `rbw unlocked && pre-commit run --files docs/decisions/005-bootstrapping.md`
Expected: hooks `Passed`/`Skipped`.
```bash
git add docs/decisions/005-bootstrapping.md
git commit -m "ADR-005: control node bootstrap is bare-metal Debian on ubongo"
```
---
### Task 4: Amend ADR-009 (provisioning handoff)
**Files:**
- Modify: `docs/decisions/009-provisioning-handoff.md`
- [ ] **Step 1: Strengthen the control-node exception section**
Find (under `## The control-node exception`, lines ~129-138):
```markdown
The control node — the host that runs Terraform and Ansible — is the one VM
Terraform does **not** create. It cannot provision the infrastructure that would
provision itself (chicken-and-egg). It is therefore the single documented exception
to "Terraform owns VM existence":
- Provisioned and bootstrapped manually, per the control-node section of ADR-005.
- Listed in `inventories/<env>/hosts.yml` under the `control` group, and managed by
Ansible for baseline config only (no `docker_host` role).
```
Replace with:
```markdown
The control node — the host that runs Terraform and Ansible — is `ubongo`, a
dedicated **physical** machine outside the cluster. It is not a VM at all, so
Terraform genuinely never touches it: it cannot provision the infrastructure that
would provision itself (chicken-and-egg). It is therefore the single documented
exception to "Terraform owns VM existence":
- Provisioned and bootstrapped manually on bare metal, per the control-node section
of ADR-005; rationale, hardware, and recovery model in ADR-015.
- Listed in `inventories/<env>/hosts.yml` under the `control` group, and managed by
Ansible for baseline config only (no `docker_host` role).
```
- [ ] **Step 2: Verify and commit**
Run: `rbw unlocked && pre-commit run --files docs/decisions/009-provisioning-handoff.md`
Expected: hooks `Passed`/`Skipped`.
```bash
git add docs/decisions/009-provisioning-handoff.md
git commit -m "ADR-009: control-node exception is a physical box, not a VM"
```
---
### Task 5: Amend ADR-008 (testing)
**Files:**
- Modify: `docs/decisions/008-testing.md`
- [ ] **Step 1: Make Level 1 say it runs on `ubongo`**
Find:
```markdown
Runs in Docker on the control node or in CI. Fast (~5 min per role).
```
Replace with:
```markdown
Runs in Docker on the control node (`ubongo`) or in CI. Fast (~5 min per role).
```
- [ ] **Step 2: Add a future service-UI acceptance level stub**
Find (the end of `### Level 3 — External smoke test from askari`, lines ~51-55):
```markdown
### Level 3 — External smoke test from askari
Once `askari` is operational: scripted checks from outside the network confirming
that public-facing services respond correctly. Catches firewall and reverse proxy
configuration issues invisible to Ansible check mode.
```
Replace with:
```markdown
### Level 3 — External smoke test from askari
Once `askari` is operational: scripted checks from outside the network confirming
that public-facing services respond correctly. Catches firewall and reverse proxy
configuration issues invisible to Ansible check mode.
### Level 4 — Service-UI acceptance (planned, not built)
Claude drives a headless browser from `ubongo` against a *deployed* service: loads
the rendered UI, creates test users, exercises features, and hands the operator a
manual test script for the rest. Catches application-level regressions that no lower
level sees. The harness (Playwright/headless-Chromium, screenshot-back-to-Claude) is
a **separate spec**; `ubongo` is sized for it (ADR-015). Status: designed, not built
(STATUS.md).
```
- [ ] **Step 3: Verify and commit**
Run: `rbw unlocked && pre-commit run --files docs/decisions/008-testing.md`
Expected: hooks `Passed`/`Skipped`.
```bash
git add docs/decisions/008-testing.md
git commit -m "ADR-008: tests run on ubongo; stub Level 4 service-UI acceptance"
```
---
### Task 6: Amend ADR-012 and the hardware reference
**Files:**
- Modify: `docs/decisions/012-hardware-capacity.md`
- Modify: `docs/hardware/reference.md`
- [ ] **Step 1: Note `ubongo` as in-scope physical compute in ADR-012**
In `docs/decisions/012-hardware-capacity.md`, find the first bullet under `## Decision`:
```markdown
- `docs/hardware/reference.md` is the single, hand-maintained source of truth for
physical compute + network gear and workload placement intent. Two
machine-readable tables (node capacity, workload placement) carry the numbers.
```
Replace with:
```markdown
- `docs/hardware/reference.md` is the single, hand-maintained source of truth for
physical compute + network gear and workload placement intent. Two
machine-readable tables (node capacity, workload placement) carry the numbers.
This includes `ubongo`, the physical control node (ADR-015), even though it sits
outside the Proxmox cluster.
```
- [ ] **Step 2: Add `ubongo` to the physical-compute section of the reference**
In `docs/hardware/reference.md`, find:
```markdown
_(repeat for pve1, pve2, askari)_
```
Replace with:
```markdown
### ubongo (control node — outside the cluster)
- **Model / form factor:** _TBD (x86-64 mini-PC / USFF, e.g. N100 or refurb micro)_
- **CPU:** _TBD (target 4 cores, x86-64)_
- **RAM:** _TBD (target 16 GB)_
- **Storage:** _TBD (target 250 GB SSD/NVMe)_
- **NICs:** _wired GbE_
- **Notes:** _always-on; control plane + AI-worker + local test runner (ADR-015); not a Proxmox guest_
_(repeat for pve1, pve2, askari)_
```
- [ ] **Step 3: Add `ubongo` to the machine-readable node-capacity table**
In `docs/hardware/reference.md`, find the node-capacity table:
```markdown
| node | cores | ram_gb | disk_gb |
|------|-------|--------|---------|
| pve0 | 20 | 64 | 4000 |
| pve1 | 20 | 64 | 4000 |
```
Replace with:
```markdown
| node | cores | ram_gb | disk_gb |
|------|-------|--------|---------|
| pve0 | 20 | 64 | 4000 |
| pve1 | 20 | 64 | 4000 |
| ubongo | 4 | 16 | 250 |
```
Note: the header row (`node | cores | ram_gb | disk_gb`) is a parser contract for
`scripts/capacity-scan.py` — only a data row is added, the header is untouched.
- [ ] **Step 4: Verify the capacity scan still parses, hooks pass, then commit**
Run: `python3 scripts/capacity-scan.py 2>&1 | head -c 400`
Expected: it runs without a parse error and the output reflects the new `ubongo` row (no traceback). If the script needs an argument or env, consult its `--help`; a clean exit with JSON is success.
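
As a cross-check that does not depend on the script's interface (which this plan does not show), the new row can also be read straight from the markdown table. A minimal sketch, assuming the table keeps the `node | cores | ram_gb | disk_gb` column order and that the node name appears in the first column of only that table; `capacity_row` is a hypothetical helper:

```bash
# capacity_row FILE NODE: print "cores ram_gb disk_gb" for NODE, fail if absent.
capacity_row() {
    local file="$1" node="$2"
    awk -F'|' -v node="$node" '
        # Trim the padding around every cell before comparing.
        { for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i) }
        # A data row "| a | b | c | d |" splits into 6 fields; $2 is the node.
        $2 == node && NF >= 6 { print $3, $4, $5; found = 1 }
        END { exit !found }
    ' "$file"
}
```

With the Step 3 row in place, `capacity_row docs/hardware/reference.md ubongo` should print `4 16 250`; a non-zero exit means the row is missing.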
Run: `rbw unlocked && pre-commit run --files docs/decisions/012-hardware-capacity.md docs/hardware/reference.md`
Expected: hooks `Passed`/`Skipped`.
```bash
git add docs/decisions/012-hardware-capacity.md docs/hardware/reference.md
git commit -m "ADR-012/hardware: add ubongo as physical control node"
```
---
### Task 7: Update the new-host runbook (Part E)
**Files:**
- Modify: `docs/runbooks/new-host.md`
- [ ] **Step 1: Replace Part E with the bare-metal control-node procedure**
Find the whole `## Part E — Control node (manual exception)` section (lines ~113-133), from the heading through the paragraph ending "every other host comes from `make tf-inventory`." Replace it with:
````markdown
## Part E — Control node (`ubongo`, manual exception)
The control node runs Terraform and Ansible, so it cannot be created by the
Terraform it hosts (chicken-and-egg). It is `ubongo`, a dedicated **physical**
machine outside the cluster — not a Proxmox guest. It is the **one** host
provisioned manually. Rationale, hardware target, and recovery model: ADR-015.
1. Install Debian 13 on the physical box by hand (no template to clone).
2. Create the `ansible` user and install its SSH public key.
3. Set up the Ansible environment on it:
```bash
git clone <repo> ~/ansible
cd ~/ansible
make setup # venv + Python deps
make collections # Ansible collections
rbw login && rbw unlock # vault password from Vaultwarden (see rotate-secrets.md)
```
4. Join the mesh VPN (choice deferred — see ADR-015) so it is reachable over SSH
from elsewhere.
5. Add `ubongo` to `inventories/<env>/hosts.yml` under the `control` group.
Because `ubongo` is not in `local.vms`, this is the only case where editing
`hosts.yml` by hand is expected. **Known limitation:** `make tf-inventory`
regenerates `hosts.yml` from Terraform outputs and will overwrite a hand-added
`control` entry — re-add `ubongo` after running it (preserving the control entry in
the generator is tracked separately, not yet built).
````
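
Until the generator preserves the control entry, the overwrite described in step 5 of Part E is easy to miss. A small guard run after `make tf-inventory` could flag it. This is a sketch only: it assumes `hosts.yml` lists the host as a `ubongo:` YAML key, and `control_entry_present` is a hypothetical name rather than an existing make target or script.

```bash
# control_entry_present HOSTS_FILE: succeed only if a `ubongo:` key is present.
control_entry_present() {
    local hosts_file="$1"
    if grep -Eq '^[[:space:]]*ubongo:' "$hosts_file"; then
        return 0
    fi
    echo "ubongo missing from $hosts_file; re-add it under the control group" >&2
    return 1
}
```

For example: `make tf-inventory && control_entry_present inventories/production/hosts.yml`. The check only looks for the key, so it does not confirm the entry sits under the `control` group.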
- [ ] **Step 2: Update the Prerequisites note that assumes a template**
Find:
```markdown
- Proxmox VM template exists (Debian 13 cloud-init image — see below if not)
```
Replace with:
```markdown
- Proxmox VM template exists (Debian 13 cloud-init image — see below if not).
Not needed for the control node `ubongo`, which is bare-metal (Part E).
```
- [ ] **Step 3: Verify and commit**
Run: `rbw unlocked && pre-commit run --files docs/runbooks/new-host.md`
Expected: hooks `Passed`/`Skipped`.
```bash
git add docs/runbooks/new-host.md
git commit -m "new-host runbook: control node ubongo is bare-metal"
```
---
### Task 8: Update the rotate-secrets runbook (offline break-glass)
**Files:**
- Modify: `docs/runbooks/rotate-secrets.md`
- [ ] **Step 1: Add a break-glass section after the `rbw` setup section**
Find the end of the `## One-time — \`rbw\` setup on a new machine` section:
```markdown
Once unlocked, `make encrypt/decrypt/check/deploy` and the pre-commit ansible-lint
hook all obtain the password automatically. If the agent is locked you'll see a
clear "run: rbw unlock" error rather than a hang.
```
Replace with:
```markdown
Once unlocked, `make encrypt/decrypt/check/deploy` and the pre-commit ansible-lint
hook all obtain the password automatically. If the agent is locked you'll see a
clear "run: rbw unlock" error rather than a hang.
---
## Break-glass — vault access during a full cluster outage
The control node `ubongo` (ADR-015) is the tool used to rebuild the cluster, so it
must be able to decrypt the vault even when Vaultwarden (if hosted on the cluster)
is down. `rbw` keeps a **local encrypted copy** of the Vaultwarden vault and decrypts
it **offline** with your Vaultwarden master password — no live server needed for
entries it has already synced. The recovery design therefore requires:
- `rbw` on `ubongo` (and on `mamba`, the break-glass laptop) has **synced at least
once** while Vaultwarden was reachable (`rbw sync`).
- Your **Vaultwarden master password** is kept **offline** — in a password manager on
`mamba` and on paper in a safe — independent of any cluster-hosted Vaultwarden.
There is always exactly one irreducible offline root secret; here it is the
Vaultwarden master password. Keep it recoverable without the cluster.
> **To verify (ADR-014, security-relevant):** confirm `rbw` actually decrypts its
> local cache fully offline on your pinned `rbw` version before relying on this.
```
- [ ] **Step 2: Verify and commit**
Run: `rbw unlocked && pre-commit run --files docs/runbooks/rotate-secrets.md`
Expected: hooks `Passed`/`Skipped`.
```bash
git add docs/runbooks/rotate-secrets.md
git commit -m "rotate-secrets: document offline vault break-glass for ubongo"
```
---
### Task 9: Reserve the mesh-VPN accepted-risk entry
**Files:**
- Modify: `docs/security/accepted-risks.md`
- [ ] **Step 1: Add R3 to the risk table**
Find the table row for R2:
```markdown
| R2 | **SELinux not used** — no SELinux mandatory access control | AppArmor — Debian-native and enforced via the CIS baseline — already provides MAC; adding SELinux means two MAC systems, non-native to Debian, for no real gain | A service that ships and requires its own SELinux policy; threat model shifts toward targeted attackers |
```
Add immediately **after** it:
```markdown
| R3 | **Mesh-VPN coordinator dependency (pending VPN choice)** — remote SSH to the control node `ubongo` (ADR-015) rides a mesh VPN whose coordination plane may be a third party (e.g. hosted Tailscale/NetBird) | A hosted coordinator keeps the mesh up when the cluster is down, which *helps* recovery; nothing is exposed to the public internet (ADR-002 preserved). Provisional — finalised when the VPN is chosen (separate discussion) | The VPN choice is settled (replace this entry with the concrete decision); a self-hosted coordinator is adopted; the provider's trust/security posture changes |
```
- [ ] **Step 2: Update the "Last reviewed" footer date**
Find:
```markdown
_Last reviewed: 2026-06-04. The prior gaps
```
Replace `2026-06-04` with `2026-06-05` (only the date changes; leave the rest of the sentence intact):
```markdown
_Last reviewed: 2026-06-05. The prior gaps
```
- [ ] **Step 3: Verify and commit**
Run: `rbw unlocked && pre-commit run --files docs/security/accepted-risks.md`
Expected: hooks `Passed`/`Skipped`.
```bash
git add docs/security/accepted-risks.md
git commit -m "accepted-risks: reserve R3 mesh-VPN coordinator (pending choice)"
```
---
### Task 10: Add the `ubongo` row to STATUS.md
**Files:**
- Modify: `STATUS.md`
- [ ] **Step 1: Add a row to the "Designed but not built" table**
Find the last row of the `## Designed but not built` table:
```markdown
| Network IDS + security alerting | ADR-002 / TODO 15 | Suricata on OPNsense + AIDE/`auditd`/`fail2ban` alerting into the monitoring stack; not built |
```
Add immediately **after** it:
```markdown
| `ubongo` — physical control / AI-worker host | ADR-015 | Replaces the cluster control VM with a dedicated always-on x86 box outside the cluster. Decision recorded; box not yet acquired/installed, not in inventory. |
```
- [ ] **Step 2: Verify and commit**
Run: `rbw unlocked && pre-commit run --files STATUS.md`
Expected: hooks `Passed`/`Skipped`.
```bash
git add STATUS.md
git commit -m "STATUS: record ubongo control host as designed, not built"
```
---
### Task 11: Update CLAUDE.md (index + control-group note)
**Files:**
- Modify: `CLAUDE.md`
- [ ] **Step 1: Add ADR-015 to the Further reading table**
Find:
```markdown
| Bootstrapping hosts | `docs/decisions/005-bootstrapping.md` |
```
Replace with:
```markdown
| Bootstrapping hosts | `docs/decisions/005-bootstrapping.md` |
| Control / AI-worker host (`ubongo`) | `docs/decisions/015-control-host.md` |
```
- [ ] **Step 2: Update the control-group parenthetical in the Inventory structure section**
Find:
```markdown
(`control` holds the one manually-provisioned control node — see ADR-009.)
```
Replace with:
```markdown
(`control` holds `ubongo`, the one manually-provisioned **physical** control node
outside the cluster — see ADR-009 and ADR-015.)
```
- [ ] **Step 3: Verify and commit**
Run: `rbw unlocked && pre-commit run --files CLAUDE.md`
Expected: hooks `Passed`/`Skipped`.
```bash
git add CLAUDE.md
git commit -m "CLAUDE.md: link ADR-015; note ubongo as physical control node"
```
---
### Task 12: Final consistency sweep
**Files:** none modified (verification only)
- [ ] **Step 1: Confirm no doc still calls the control node a VM**
Run:
```bash
grep -rniE "control node.*(VM|virtual)|dedicated Debian 13 VM" docs/ CLAUDE.md STATUS.md
```
Expected: no hit that *asserts* the control node is a VM. (Hits inside ADR-015's "What was ruled out" table that describe the rejected option are fine.) If any other doc still frames the control node as a VM, fix it the same way as the relevant task above and amend that task's commit.
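
The raw grep will also match files that are allowed to describe the old design: ADR-015's ruled-out table and, since the search covers `docs/`, the spec under `docs/superpowers/`. A sketch that filters those out so that any remaining output is a real finding; `stale_vm_claims` is a hypothetical helper, and excluding the whole `docs/superpowers/` tree is an assumption about where the plan and spec live:

```bash
# stale_vm_claims PATH...: print remaining "control node is a VM" hits.
# Returns 1 when any are left, 0 when the docs are clean.
stale_vm_claims() {
    local hits
    # Same pattern as the step above; drop the files that may mention the old design.
    hits=$(grep -rniE "control node.*(VM|virtual)|dedicated Debian 13 VM" "$@" \
        | grep -vE '015-control-host\.md|docs/superpowers/' || true)
    if [ -n "$hits" ]; then
        printf '%s\n' "$hits"
        return 1
    fi
    return 0
}
```

Run it as `stale_vm_claims docs/ CLAUDE.md STATUS.md`. Because the filter hides every line of ADR-015, that file still deserves one manual read.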
- [ ] **Step 2: Confirm every ADR-015 cross-link resolves**
Run:
```bash
grep -rl "ADR-015\|015-control-host" docs/ CLAUDE.md STATUS.md
test -f docs/decisions/015-control-host.md && echo "ADR-015 present"
```
Expected: the file exists and the referencing docs (001, 005, 008, 009, 012, runbooks, accepted-risks, STATUS, CLAUDE.md) appear.
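
`grep -rl` lists the files that do reference ADR-015, which leaves the reader to spot any that are missing. A sketch that inverts the check and names the gaps directly; `ADR015_REFERRERS` and `missing_adr015_links` are hypothetical names, and the list is copied from the file map in this plan:

```bash
# Files that Tasks 2-11 amend; each should mention ADR-015 afterwards.
ADR015_REFERRERS="
docs/decisions/001-architecture.md
docs/decisions/005-bootstrapping.md
docs/decisions/008-testing.md
docs/decisions/009-provisioning-handoff.md
docs/decisions/012-hardware-capacity.md
docs/hardware/reference.md
docs/runbooks/new-host.md
docs/runbooks/rotate-secrets.md
docs/security/accepted-risks.md
STATUS.md
CLAUDE.md
"

# missing_adr015_links [ROOT]: print each listed file lacking a reference.
# Returns 1 if any file is missing the link (or does not exist).
missing_adr015_links() {
    local root="${1:-.}" status=0 f
    for f in $ADR015_REFERRERS; do
        if ! grep -Eq 'ADR-015|015-control-host' "$root/$f" 2>/dev/null; then
            echo "no ADR-015 link: $f"
            status=1
        fi
    done
    return "$status"
}
```

Called with no argument from the repo root, it prints nothing and returns 0 when every amended file carries the cross-link.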
- [ ] **Step 3: Full hook run**
Run: `rbw unlocked && pre-commit run --all-files`
Expected: all hooks `Passed`/`Skipped`. Fix anything that fails (most likely trailing whitespace or end-of-file) and amend the owning commit.
- [ ] **Step 4: Push (only if the user asks)**
Per CLAUDE.md, push to `origin` is the off-machine backup. If the user wants it pushed:
```bash
git push origin main
```
---
## Self-review notes (author)
- **Spec coverage:** every spec section maps to a task — host decision/hardware/bootstrap/access/recovery → Task 1 (ADR-015); the doc-changes table → Tasks 2-11; testing implication → Task 5; deferrals are recorded in ADR-015 and not implemented here (correct — they are separate specs). ✓
- **Not in scope (intentional):** acquiring/installing the box, mesh-VPN selection, the browser harness, adding `ubongo` to live inventory, and modifying `tf_to_inventory.py` to preserve the control entry (logged as a known limitation in Task 7). ✓
- **No placeholders:** every edit shows exact find/replace text; the only `_TBD_` strings are deliberate hardware-reference skeleton fields matching that file's existing style. ✓