From 69faaf5e43c66159fcf6c3b4e30257e093b6b800 Mon Sep 17 00:00:00 2001
From: sjat
Date: Wed, 17 Jun 2026 22:27:26 +0200
Subject: [PATCH] docs(todo): local VM integration testing (2.4) + screenshot hand-off (10.8)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

From the 2026-06-17 mesh-hardening incident: Molecule can't catch
reboot/firewall-x-Docker/boot-order bugs — build local-VM pre-deploy
testing on ubongo (ADR-008 Level 2/3). And a smooth screenshot hand-off
for the agent during incidents.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 docs/TODO.md | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/docs/TODO.md b/docs/TODO.md
index 97a30d8..4f0456c 100644
--- a/docs/TODO.md
+++ b/docs/TODO.md
@@ -17,6 +17,19 @@
        calls, curl pulls of web products, log reviews. Headless browsing → ADR-017
        (`/verify-service`); the API/curl/log-review siblings remain open.
     3. ~~Standard for test users + manual-test instructions.~~ → ADR-017.
+    4. **Local VM integration testing on ubongo (pre-deploy).** Molecule (containers,
+       one converge, no reboot, no real Docker/firewall interaction) structurally
+       **cannot** catch reboot-survivability, host-firewall × Docker, or boot-order bugs —
+       exactly the class that caused the 2026-06-17 mesh-hardening incident (`base`'s
+       nftables `forward policy drop` broke the askari Docker host on reboot;
+       `ip_nonlocal_bind` didn't beat the sshd boot-race). Build a way for the agent to
+       spin up throwaway VMs **locally on ubongo** (libvirt/QEMU? Proxmox-on-ubongo?) that
+       mirror a target host (real Docker, a real reboot, the real role apply) and validate
+       risky infra changes there **before** deploying to a live host. This is the concrete
+       build of ADR-008's Level 2/3 (staging/integration) testing — deferred for lack of
+       hosts, but ubongo can host it. Decide the virtualisation approach + how the agent
+       drives it (provision → snapshot/reset → run the playbook → reboot → assert). Ties to
+       3.10 (testing approach as it matures) and the 2026-06-17 FRICTION signals.

 3. **Building services**
     1. ~~Decide how to manage logs.~~ → ADR-018.
@@ -84,6 +97,13 @@
     5. ~~Always subagent-driven?~~ → DECIDED: yes (standing agreement; enforced by `.claude/hooks/guard-execution-mode-menu.sh`).
     6. When AI deploys, i.e. runs playbooks etc., should we make a methodology so that it does not have to poll all the time or review all the output. Perhaps something about the MAKE method could provide only the relevant feedback?
     7. ~~Reproducible agent toolchain.~~ → `.claude/settings.json` + `docs/runbooks/claude-code-setup.md`.
+    8. **Screenshot hand-off to the agent.** Give the operator a smooth way to hand the
+       agent a screenshot (e.g. of a Hetzner/VNC console during an incident) — the agent
+       can already read image files; the gap is the hand-off. During the 2026-06-17
+       incident the only diagnostic channel was console screenshots, copied manually to
+       `/tmp` and `find`-located. Options: a known drop path the agent checks (e.g.
+       `~/screenshots/`), a small `screenshot`/paste helper or slash-command, or a
+       clipboard→file convention. Cheap, high-value for incident work.

 11. **Kaizen loop** — `/kaizen` built (STATUS).
     1. ~~Build the loop command.~~ → `/kaizen` (`scripts/friction-scan.py` + `.claude/commands/kaizen.md`; spec `docs/superpowers/specs/2026-06-14-kaizen-command-design.md`).