docs(todo): local VM integration testing (2.4) + screenshot hand-off (10.8)

From the 2026-06-17 mesh-hardening incident: Molecule can't catch reboot/firewall-x-Docker/boot-order bugs, so build local-VM pre-deploy testing on ubongo (ADR-008 Level 2/3). Also add a smooth screenshot hand-off for the agent during incidents.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 22:27:26 +02:00 · 69faaf5e43
commit 69faaf5e43
parent 958e35e3c3
1 changed file with 20 additions and 0 deletions
--- a/docs/TODO.md
+++ b/docs/TODO.md
@@ -17,6 +17,19 @@
      calls, curl pulls of web products, log reviews. Headless browsing → ADR-017
      (`/verify-service`); the API/curl/log-review siblings remain open.
   3. ~~Standard for test users + manual-test instructions.~~ → ADR-017.
   4. **Local VM integration testing on ubongo (pre-deploy).** Molecule (containers,
      one converge, no reboot, no real Docker/firewall interaction) structurally
      **cannot** catch reboot-survivability, host-firewall × Docker, or boot-order bugs —
      exactly the class that caused the 2026-06-17 mesh-hardening incident (`base`'s
      nftables `forward policy drop` broke the askari Docker host on reboot;
      `ip_nonlocal_bind` didn't beat the sshd boot-race). Build a way for the agent to
      spin up throwaway VMs **locally on ubongo** (libvirt/QEMU? Proxmox-on-ubongo?) that
      mirror a target host (real Docker, a real reboot, the real role apply) and validate
      risky infra changes there **before** deploying to a live host. This is the concrete
      build of ADR-008's Level 2/3 (staging/integration) testing — deferred for lack of
      hosts, but ubongo can host it. Decide the virtualisation approach + how the agent
      drives it (provision → snapshot/reset → run the playbook → reboot → assert). Ties to
      3.10 (testing approach as it matures) and the 2026-06-17 FRICTION signals.
 3. **Building services**
   1. ~~Decide how to manage logs.~~ → ADR-018.
@@ -84,6 +97,13 @@
    5. ~~Always subagent-driven?~~ → DECIDED: yes (standing agreement; enforced by `.claude/hooks/guard-execution-mode-menu.sh`).
    6. When the AI deploys (i.e. runs playbooks etc.), should we define a methodology so that it does not have to poll constantly or review all the output? Perhaps something about the MAKE method could provide only the relevant feedback?
    7. ~~Reproducible agent toolchain.~~ → `.claude/settings.json` + `docs/runbooks/claude-code-setup.md`.
    8. **Screenshot hand-off to the agent.** Give the operator a smooth way to hand the
       agent a screenshot (e.g. of a Hetzner/VNC console during an incident) — the agent
       can already read image files; the gap is the hand-off. During the 2026-06-17
       incident the only diagnostic channel was console screenshots, copied manually to
       `/tmp` and `find`-located. Options: a known drop path the agent checks (e.g.
       `~/screenshots/`), a small `screenshot`/paste helper or slash-command, or a
       clipboard→file convention. Cheap, high-value for incident work.
 11. **Kaizen loop** — `/kaizen` built (STATUS).
    1. ~~Build the loop command.~~ → `/kaizen` (`scripts/friction-scan.py` + `.claude/commands/kaizen.md`; spec `docs/superpowers/specs/2026-06-14-kaizen-command-design.md`).