docs(todo): local VM integration testing (2.4) + screenshot hand-off (10.8)

From the 2026-06-17 mesh-hardening incident: Molecule can't catch
reboot/firewall-x-Docker/boot-order bugs — build local-VM pre-deploy testing
on ubongo (ADR-008 Level 2/3). And a smooth screenshot hand-off for the agent
during incidents.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
sjat 2026-06-17 22:27:26 +02:00
parent 958e35e3c3
commit 69faaf5e43

@@ -17,6 +17,19 @@
calls, curl pulls of web products, log reviews. Headless browsing → ADR-017
(`/verify-service`); the API/curl/log-review siblings remain open.
3. ~~Standard for test users + manual-test instructions.~~ → ADR-017.
4. **Local VM integration testing on ubongo (pre-deploy).** Molecule (containers,
one converge, no reboot, no real Docker/firewall interaction) structurally
**cannot** catch reboot-survivability, host-firewall × Docker, or boot-order bugs —
exactly the class that caused the 2026-06-17 mesh-hardening incident (`base`'s
nftables `forward policy drop` broke the askari Docker host on reboot;
`ip_nonlocal_bind` didn't beat the sshd boot-race). Build a way for the agent to
spin up throwaway VMs **locally on ubongo** (libvirt/QEMU? Proxmox-on-ubongo?) that
mirror a target host (real Docker, a real reboot, the real role apply) and validate
risky infra changes there **before** deploying to a live host. This is the concrete
build of ADR-008's Level 2/3 (staging/integration) testing — deferred for lack of
hosts, but ubongo can host it. Decide the virtualisation approach + how the agent
drives it (provision → snapshot/reset → run the playbook → reboot → assert). Ties to
3.10 (testing approach as it matures) and the 2026-06-17 FRICTION signals.
3. **Building services**
1. ~~Decide how to manage logs.~~ → ADR-018.
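
The provision → snapshot/reset → run the playbook → reboot → assert loop from item 4 could be sketched roughly as below. This is only an illustration under stated assumptions: the virtualisation backend is still undecided, so `VmDriver`, `RecordingDriver`, `run_cycle`, the VM name `askari-clone` and the playbook name `site.yml` are all hypothetical, and the driver shown merely records calls instead of talking to libvirt or Proxmox.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Protocol


class VmDriver(Protocol):
    """Hypothetical backend interface; a libvirt or Proxmox driver would implement it."""

    def provision(self, name: str) -> None: ...
    def reset(self, name: str) -> None: ...
    def apply_playbook(self, name: str, playbook: str) -> None: ...
    def reboot(self, name: str) -> None: ...


@dataclass
class RecordingDriver:
    """Stand-in driver (assumption): records each call so the order can be inspected."""

    calls: List[str] = field(default_factory=list)

    def provision(self, name: str) -> None:
        self.calls.append(f"provision {name}")

    def reset(self, name: str) -> None:
        self.calls.append(f"reset {name}")

    def apply_playbook(self, name: str, playbook: str) -> None:
        self.calls.append(f"apply {playbook} to {name}")

    def reboot(self, name: str) -> None:
        self.calls.append(f"reboot {name}")


def run_cycle(
    driver: VmDriver,
    name: str,
    playbook: str,
    checks: Dict[str, Callable[[], bool]],
) -> List[str]:
    """Run provision, reset, apply and reboot in order, then return the failed check labels."""
    driver.provision(name)
    driver.reset(name)
    driver.apply_playbook(name, playbook)
    driver.reboot(name)
    # Assertions run only after the reboot, since that is the failure class Molecule misses.
    return [label for label, check in checks.items() if not check()]


# Usage with placeholder checks; real checks would probe the VM itself.
driver = RecordingDriver()
failed = run_cycle(
    driver,
    "askari-clone",
    "site.yml",
    {
        "docker forwarding works after reboot": lambda: True,
        "sshd is listening after reboot": lambda: False,
    },
)
print(failed)
```

Keeping the backend behind a small interface means the libvirt-versus-Proxmox decision can be made later without changing the cycle itself; an empty returned list would mean the change survived a reboot on the throwaway VM.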
@@ -84,6 +97,13 @@
5. ~~Always subagent-driven?~~ → DECIDED: yes (standing agreement; enforced by `.claude/hooks/guard-execution-mode-menu.sh`).
6. When the AI deploys (i.e. runs playbooks etc.), should we define a methodology so that it does not have to poll constantly or review all the output? Perhaps something about the MAKE method could provide only the relevant feedback?
7. ~~Reproducible agent toolchain.~~ → `.claude/settings.json` + `docs/runbooks/claude-code-setup.md`.
8. **Screenshot hand-off to the agent.** Give the operator a smooth way to hand the
agent a screenshot (e.g. of a Hetzner/VNC console during an incident) — the agent
can already read image files; the gap is the hand-off. During the 2026-06-17
incident the only diagnostic channel was console screenshots, copied manually to
`/tmp` and `find`-located. Options: a known drop path the agent checks (e.g.
`~/screenshots/`), a small `screenshot`/paste helper or slash-command, or a
clipboard→file convention. Cheap, high-value for incident work.
11. **Kaizen loop** → `/kaizen` built (STATUS).
1. ~~Build the loop command.~~ → `/kaizen` (`scripts/friction-scan.py` + `.claude/commands/kaizen.md`; spec `docs/superpowers/specs/2026-06-14-kaizen-command-design.md`).
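
The "known drop path" option from item 8 could look something like the sketch below. It is one possible shape, not a decided design: the helper name `latest_screenshot` and the set of accepted suffixes are assumptions, and `~/screenshots/` is simply the example path mentioned in the item.

```python
from pathlib import Path
from typing import Optional

# Assumed set of image types the operator might drop; adjust as needed.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def latest_screenshot(drop_dir: Path) -> Optional[Path]:
    """Return the most recently modified image in drop_dir, or None if there is none."""
    if not drop_dir.is_dir():
        return None
    images = [
        path
        for path in drop_dir.iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    ]
    # max() with default=None handles an empty drop directory.
    return max(images, key=lambda path: path.stat().st_mtime, default=None)


# Usage: the agent checks the agreed drop path instead of searching /tmp.
newest = latest_screenshot(Path.home() / "screenshots")
print(newest)
```

A fixed path plus "newest file wins" keeps the operator's side to a single copy or save action, which is the hand-off gap the item describes.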