boma/docs/superpowers/plans/2026-06-11-ubongo-build.md
sjat b9daf2a0ad plan: record ubongo build outcome (done/deferred/follow-ups)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 10:33:18 +02:00


# Ubongo Physical Build — Implementation Plan

> **For agentic workers:** Execute task-by-task. This is the **physical bring-up** of
> `ubongo`. The 2026-06-05 plan (`2026-06-05-ubongo-control-host.md`) was
> *documentation-only* (it authored ADR-015); this is its sequel — taking the actual
> box from bare Debian 13 to a working control / AI-worker node.

**Goal:** Bring the Lenovo ThinkCentre M70q from a fresh Debian 13 install to a working
control node: toolchain, dedicated `claude` identity, repo + Claude Code, vault access,
inventory wiring, keys-only SSH, and reconciliation of the docs to "built."

**Spec / decisions of record:** ADR-015 + `docs/superpowers/specs/2026-06-05-ubongo-control-host-design.md`,
plus the interactive build decisions captured below (2026-06-11 session).

---
## Decisions made this session (2026-06-11)
- **Hardware:** Lenovo ThinkCentre M70q Tiny · i3-10100T (4c/8t) · 16 GB · 256 GB
SanDisk X600 SATA SSD (TCG **Opal**-capable; Opal **unused**, see encryption).
- **BIOS:** auto-power-on after loss; Wake-on-LAN on; ErP/deep-S5 off; **supervisor
password set**; external/USB + PXE boot **disabled**; Secure Boot on; TPM (PTT) on;
VT-x/VT-d on; Better-Thermal cooling.
- **Disk encryption: NONE.** Accepted risk — compensated by physical security + BIOS
supervisor password + disabled external boot. Recorded in `accepted-risks.md` (Task H1).
- **Partitioning:** simple single ext4 root (`/dev/sda2`, 221 G) + 12 G swap, no LVM.
Revisit via reinstall onto LVM/bigger drive only if the layout bites.
- **Identity:** dedicated **`claude`** user — for **attribution + revocation, not
containment**. In the `docker` group (Molecule); **no local sudo** (boma deploys run
over SSH as `ansible`; the agent needs Docker, not root). Reached via `sudo -iu claude`
from `sjat`. Own `ed25519` key for Forgejo. ADR-021 leaves this identity open — note it.
- **Access:** LAN SSH only for now — the NetBird mesh (ADR-016) is deferred (`askari` +
service machinery unbuilt). Keys-only enforced after bootstrap.
- **Address:** `10.20.10.151/24` on `eno1`. Make stable via an OPNsense DHCP reservation.

**Pinned versions (match `fisi`):** docker 29.5.2 · rbw 1.15.0 · node 20.19.2 ·
claude 2.1.173. Terraform is absent on `fisi` (TF un-init'd) — install deferred.

---
## Pre-flight
- **Temp passwordless sudo** for `sjat` during the build (`/etc/sudoers.d/99-boma-build`);
**removed in Task F2**. Without it, non-interactive SSH `sudo` hangs.
- **`rbw unlock`** on `fisi` before any commit (pre-commit decrypts `vault.yml`).
- **Commit style:** one commit per logical unit; imperative subject ≤72 chars; trailer
`Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>`.
- Drive the live box (`ubongo`) directly over SSH; do repo/doc tasks (H) as clean commits.
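
The commit-style rules above are mechanical enough to check before pushing. A minimal sketch, assuming the message is available as a string (for example read from the file git passes to a `commit-msg` hook; that wiring is hypothetical and not part of boma). `commit_message_problems` is an illustrative name, and imperative mood is left to human review.

```python
# Style rules from the Pre-flight list; imperative mood is not machine-checked.
TRAILER = "Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>"
MAX_SUBJECT = 72


def commit_message_problems(message: str) -> list[str]:
    """Return the style problems in a commit message (an empty list means OK)."""
    lines = message.strip("\n").splitlines()
    subject = lines[0] if lines else ""
    problems = []
    if not subject:
        problems.append("empty subject")
    elif len(subject) > MAX_SUBJECT:
        problems.append(f"subject is {len(subject)} chars (max {MAX_SUBJECT})")
    # The trailer must appear on its own line somewhere after the subject.
    if TRAILER not in (line.strip() for line in lines[1:]):
        problems.append("missing Co-Authored-By trailer")
    return problems
```
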
---
## Stage A — Toolchain (on `ubongo`, via `sjat` sudo)
- [ ] **A1.** apt base: `git make build-essential python3-venv python3-pip curl
ca-certificates gnupg jq` (+ `apt update`).
- [ ] **A2.** Docker Engine from Docker's official apt repo (Debian 13/trixie); enable +
start; confirm `docker --version` ≈ 29.5.2.
- [ ] **A3.** `rbw` 1.15.0 — try `apt install rbw`; if the version doesn't match, install
the pinned release binary to `/usr/local/bin` (match `fisi`).
- [ ] **A4.** Node 20.19.2 (nodesource or distro) — only if Claude Code needs it; the
native installer bundles its runtime, so Node may be optional.
- [ ] **A5.** Claude Code via the **native installer** (matches `fisi`'s
`~/.local/share/claude/versions/`), installed under the `claude` user in Stage C.
- [ ] Defer Terraform (absent on `fisi`).
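
Stage A's version confirmations (A2, A3) amount to comparing each tool's `--version` output against the `fisi` pins. A sketch under stated assumptions: the output shapes in `SAMPLE` (Docker's `Docker version X.Y.Z, build <hash>`, rbw's `rbw X.Y.Z`, Claude Code's `X.Y.Z (Claude Code)`) are assumed, so capture the real strings on `ubongo` before relying on it. All names are hypothetical.

```python
from __future__ import annotations

import re

# Pins from this plan ("match fisi"). Node is optional and Terraform deferred, so both are omitted.
PINS = {"docker": "29.5.2", "rbw": "1.15.0", "claude": "2.1.173"}
VERSION_RE = re.compile(r"\d+\.\d+\.\d+")


def extract_version(output: str) -> str | None:
    """Return the first X.Y.Z string in a `<tool> --version` output, or None."""
    match = VERSION_RE.search(output)
    return match.group(0) if match else None


def pin_mismatches(outputs: dict[str, str]) -> dict[str, tuple[str, str | None]]:
    """Map tool -> (pinned, found) for each pinned tool whose version differs."""
    mismatches = {}
    for tool, pinned in PINS.items():
        found = extract_version(outputs.get(tool, ""))
        if found != pinned:
            mismatches[tool] = (pinned, found)
    return mismatches


# Assumed output shapes; the real ones come from `docker --version`, `rbw --version`, etc.
SAMPLE = {
    "docker": "Docker version 29.5.3, build <hash>",
    "rbw": "rbw 1.15.0",
    "claude": "2.1.173 (Claude Code)",
}
```

Fed these sample strings, it flags Docker 29.5.3 against the 29.5.2 pin, the same patch-level drift the Outcome records. Whether a patch-level difference matters is a judgment call, not something the check decides.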
## Stage B — Identity (`claude` user)
- [ ] **B1.** `useradd -m -s /bin/bash claude`; lock the password (`passwd -l claude`) —
reached only via `sudo -iu claude` from `sjat` or its own key.
- [ ] **B2.** Add `claude` to the `docker` group.
- [ ] **B3.** No sudo for `claude` (explicit decision). Confirm `sudo -iu claude` works.
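
B2 and B3 can be spot-checked from `id claude`. A sketch assuming the usual `id` output format, e.g. `uid=1001(claude) gid=1001(claude) groups=1001(claude),988(docker)` (the numeric IDs are illustrative). Group membership is only half of "no sudo": a per-user rule in `/etc/sudoers.d/` would not show up here, so `sudo -l -U claude` (run as root) remains the authoritative check. Function names are hypothetical.

```python
import re


def groups_from_id(output: str) -> set[str]:
    """Extract group names from `id <user>` output (the part after 'groups=')."""
    _, found, groups_part = output.partition("groups=")
    if not found:
        return set()
    # Each entry looks like '988(docker)'; keep the name inside the parentheses.
    return set(re.findall(r"\d+\(([^)]+)\)", groups_part))


def identity_problems(id_output: str) -> list[str]:
    """Check the Stage B intent: in `docker`, not in Debian's `sudo` group."""
    groups = groups_from_id(id_output)
    problems = []
    if "docker" not in groups:
        problems.append("claude is not in the docker group")
    if "sudo" in groups:
        problems.append("claude is in the sudo group")
    return problems
```
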
## Stage C — Repo + Claude Code (as `claude`)
- [ ] **C1.** Generate `claude`'s `ed25519` key; **[USER]** register the public key in
Forgejo (Settings → SSH keys).
- [ ] **C2.** Clone `ssh://git@forgejo.nyumbani.baobab.band:7577/sjat/boma.git` into
`/home/claude/Projects/boma`.
- [ ] **C3.** `make setup` (venv + `requirements.txt`); `make collections`.
- [ ] **C4.** Install Claude Code (native installer) for `claude`; set up plugins/MCP/
settings per `docs/runbooks/claude-code-setup.md`. Set git `user.name`/`user.email`.
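
Before registering the key in Forgejo (C1), a quick sanity check on the `.pub` line can catch a truncated copy-paste. `ssh-keygen -lf ~/.ssh/id_ed25519.pub` is the usual tool (the default key path is an assumption); the sketch below shows what such a check amounts to: decoding the base64 blob in the SSH wire format, which is a length-prefixed key-type string followed by a length-prefixed 32-byte Ed25519 public key. Function names are hypothetical.

```python
import base64
import struct

ED25519 = b"ssh-ed25519"


def _read_string(blob: bytes, offset: int) -> tuple[bytes, int]:
    """Read one SSH wire-format string: a big-endian uint32 length, then that many bytes."""
    if offset + 4 > len(blob):
        raise ValueError("truncated length field")
    (length,) = struct.unpack(">I", blob[offset:offset + 4])
    end = offset + 4 + length
    if end > len(blob):
        raise ValueError("truncated string")
    return blob[offset + 4:end], end


def is_ed25519_pubkey(line: str) -> bool:
    """True if `line` looks like 'ssh-ed25519 <base64 blob> [comment]' with a 32-byte key."""
    parts = line.split()
    if len(parts) < 2 or parts[0] != ED25519.decode():
        return False
    try:
        blob = base64.b64decode(parts[1], validate=True)
        key_type, offset = _read_string(blob, 0)
        key, offset = _read_string(blob, offset)
    except ValueError:  # binascii.Error (bad base64) is a ValueError subclass
        return False
    # The blob must name the same key type and hold exactly one 32-byte public key.
    return key_type == ED25519 and len(key) == 32 and offset == len(blob)
```
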
## Stage D — Vault (`rbw`)
- [ ] **D1.** `rbw config set base_url https://vaultwarden.baobab.band`; set email.
- [ ] **D2. [USER]** `rbw login` (master password) on `ubongo`; then `rbw sync`,
`rbw unlock`; verify `rbw get boma-ansible-vault` returns the vault password.
- [ ] **D3.** **Offline-cache verification (ADR-015 open item, security-relevant):**
confirm `rbw` decrypts its local cache with Vaultwarden unreachable. Stamp the result
into ADR-015 / `rotate-secrets.md` (replaces the `TO VERIFY` note).
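
D2 and D3 both come down to one question: does `rbw get boma-ansible-vault` return a non-empty secret? Run it once with Vaultwarden reachable and once with it cut off (how to cut it off is left to the operator). A minimal sketch; `fetch_secret` is a hypothetical helper, and its `cmd` parameter exists only so the logic can be exercised without `rbw`. It never prints the secret.

```python
import subprocess


def fetch_secret(name: str, cmd: tuple[str, ...] = ("rbw", "get")) -> str:
    """Run `<cmd> <name>` (by default `rbw get <name>`) and return stdout without the trailing newline."""
    result = subprocess.run([*cmd, name], capture_output=True, text=True, check=False)
    if result.returncode != 0:
        # Report the failing command and its stderr, never the secret itself.
        raise RuntimeError(f"{' '.join(cmd)} {name} failed: {result.stderr.strip()}")
    secret = result.stdout.rstrip("\n")
    if not secret:
        raise RuntimeError(f"{name} came back empty")
    return secret
```

On `ubongo`, `fetch_secret("boma-ansible-vault")` raising `RuntimeError` while Vaultwarden is unreachable would mean the D3 offline-cache check failed.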
## Stage E — Inventory + base (partial)
- [ ] **E1.** Add `ubongo` to `inventories/production/hosts.yml` under `control`
(a manual exception: `tf-inventory` will overwrite the file, so re-add the entry afterwards).
- [ ] **E2.** Set `base__firewall_control_addr` to `10.20.10.151` in the appropriate
`group_vars` (the dormant `ssh-from-control` knob, ADR-020/021).
- [ ] **E3.** `make check PLAYBOOK=site` against `control`; apply the built `firewall`
concern only (SSH-hardening/fail2ban/auditd concerns are unbuilt — note the gap).
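
Because `tf-inventory` overwrites `hosts.yml` (E1), it is worth confirming after each regeneration that `ubongo` still lands in `control`. A sketch assuming the JSON shape printed by `ansible-inventory -i inventories/production/hosts.yml --list`, where each group maps to optional `hosts` and `children` lists; `host_in_group` is a hypothetical name. It follows nested child groups, so it also answers the question for `all`.

```python
def host_in_group(inventory: dict, group: str, host: str) -> bool:
    """True if `host` is in `group` directly or through any nested child group.

    `inventory` is the parsed JSON from `ansible-inventory --list`.
    """
    seen = set()
    stack = [group]
    while stack:
        name = stack.pop()
        if name in seen:  # guard against cycles in the group graph
            continue
        seen.add(name)
        entry = inventory.get(name, {})
        if host in entry.get("hosts", []):
            return True
        stack.extend(entry.get("children", []))
    return False
```

Parse the command's output with `json.loads` and pass the resulting dict in.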
## Stage F — Hardening / address
- [ ] **F1.** Disable SSH password auth (keys-only) via a drop-in in `/etc/ssh/sshd_config.d/`;
set `PermitRootLogin no`; reload `sshd` (safe, since the current session is already authenticated by key).
- [ ] **F2.** **Remove the temp NOPASSWD** drop-in (`/etc/sudoers.d/99-boma-build`).
- [ ] **F3. [USER]** OPNsense DHCP reservation for `10.20.10.151`.
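
For F1, checking the effective configuration is safer than re-reading the drop-in: for most keywords sshd keeps the first value it reads, and Debian's `sshd_config` includes `sshd_config.d/*.conf` near the top, so an earlier-sorting drop-in could silently win. A sketch that parses `sshd -T` output (run as root; it prints lowercase `keyword value` lines). The `kbdinteractiveauthentication` expectation is an assumption about what "keys-only" should cover; names are hypothetical.

```python
# Effective values F1 should produce. KbdInteractiveAuthentication is an assumption:
# keys-only also means keyboard-interactive off (Debian's stock sshd_config sets it to no).
EXPECTED = {
    "passwordauthentication": "no",
    "kbdinteractiveauthentication": "no",
    "permitrootlogin": "no",
}


def effective_sshd(output: str) -> dict[str, str]:
    """Parse `sshd -T` output: one lowercase 'keyword value' pair per line."""
    settings = {}
    for line in output.splitlines():
        keyword, _, value = line.strip().partition(" ")
        if keyword:
            # Multi-valued keywords repeat; the first line is enough for these checks.
            settings.setdefault(keyword, value.strip())
    return settings


def sshd_problems(output: str) -> list[str]:
    """List every expected setting whose effective value differs."""
    settings = effective_sshd(output)
    return [
        f"{key} is {settings.get(key)!r}, want {want!r}"
        for key, want in EXPECTED.items()
        if settings.get(key) != want
    ]
```

Running `sshd -t` before the reload validates the configuration syntax.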
## Stage H — Docs reconciliation (repo commits)
- [ ] **H1.** `accepted-risks.md`: add the plaintext-disk accepted risk (compensations:
physical security, BIOS supervisor password, no external boot).
- [ ] **H2.** `docs/hardware/reference.md`: fill `ubongo`'s real specs (M70q, i3-10100T,
16 GB, 256 GB SanDisk X600) into the TBD skeleton; node-capacity row already present.
- [ ] **H3.** `STATUS.md`: move `ubongo` from "Designed but not built" toward built
(note what's live vs. still pending — mesh, full `base`).
- [ ] **H4.** Note the dedicated-`claude` identity decision (short amendment to ADR-021
or ADR-015) and the LAN address.
---
## Out of scope this session
- **Mesh VPN** (NetBird) — needs `askari` + service roles (ADR-016). SSH stays LAN-only.
- **Full `base` hardening** — SSH/fail2ban/auditd concerns not built (only `firewall`).
- **Recovery wiring (G)** — TF-state backup to `mamba`, rbw mirror — no TF state yet
(TF un-init'd). `mamba` as break-glass clone tracked separately.
---
## Outcome (2026-06-11)
`STATUS.md` is the live source of truth; this is the session record.

**Done:** A (toolchain — Docker 29.5.3, one patch above the 29.5.2 pin; rbw 1.15.0; Claude Code 2.1.173; Node deferred),
B (dedicated `claude` user — docker group, no sudo), C (repo cloned, `make setup` +
`collections`, git identity; plugins install on first interactive launch), D (vault via
rbw + **offline-cache decryption verified**), E1/E2 (inventory + `ssh-from-control`
knob), F1 (key-only SSH), F2 (temp NOPASSWD removed), H1–H4 (docs reconciled).

**Deferred, with reason:**

- **E3 — apply `base` to `ubongo`:** would push nftables default-deny with SSH allowed
*only on the mesh interface*, but no mesh exists yet → would deny inbound SSH on `eno1`
and strand the box. Wait for NetBird (ADR-016). `base` also covers only the `firewall` concern so far.
- **F3 — OPNsense DHCP reservation** for `10.20.10.151` (MAC `88:a4:c2:e0:ee:da`): operator action.
- **Mesh enrollment, full `base` hardening, recovery wiring (G):** out of scope (above).

**Follow-ups flagged:** (1) `ubongo` sits in `10.20.10.0/24`, which doesn't match
ADR-007's zone map (`srv: 10.20.0.0/24`) — network-design drift to reconcile. (2) The
hardware reference previously assumed `ubongo` had 1 TB NVMe for an ADR-022 "restore-verify"
role; the real disk is 256 GB — check ADR-022 doesn't bank on the larger size.
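
The first follow-up is easy to confirm with Python's standard `ipaddress` module: `10.20.0.0/24` spans only 10.20.0.0 to 10.20.0.255, so `ubongo`'s address falls outside ADR-007's `srv` zone and sits in `10.20.10.0/24` instead. A sketch; only the `srv` entry of the zone map is quoted in this plan, so the `ZONES` table is deliberately partial and the names are hypothetical.

```python
from __future__ import annotations

import ipaddress

# Only the `srv` entry of ADR-007's zone map is quoted in this plan; other zones are omitted.
ZONES = {"srv": ipaddress.ip_network("10.20.0.0/24")}


def zone_for(address: str) -> str | None:
    """Return the ADR-007 zone containing `address`, or None if it falls outside all of them."""
    ip = ipaddress.ip_address(address)
    for name, network in ZONES.items():
        if ip in network:
            return name
    return None


# ubongo's live interface: 10.20.10.151/24 on eno1.
UBONGO = ipaddress.ip_interface("10.20.10.151/24")
```
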