docs(todo): local VM integration testing (2.4) + screenshot hand-off (10.8)

From the 2026-06-17 mesh-hardening incident: Molecule can't catch
reboot, firewall × Docker, or boot-order bugs, so build local-VM pre-deploy
testing on ubongo (ADR-008 Level 2/3). Also add a smooth screenshot hand-off
for the agent during incidents.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
sjat 2026-06-17 22:27:26 +02:00
parent 958e35e3c3
commit 69faaf5e43


@ -17,6 +17,19 @@
calls, curl pulls of web products, log reviews. Headless browsing → ADR-017
(`/verify-service`); the API/curl/log-review siblings remain open.
3. ~~Standard for test users + manual-test instructions.~~ → ADR-017.
4. **Local VM integration testing on ubongo (pre-deploy).** Molecule (containers,
one converge, no reboot, no real Docker/firewall interaction) structurally
**cannot** catch reboot-survivability, host-firewall × Docker, or boot-order bugs.
That is exactly the class that caused the 2026-06-17 mesh-hardening incident: `base`'s
nftables `forward policy drop` broke the askari Docker host on reboot, and
`ip_nonlocal_bind` didn't beat sshd's boot race. Build a way for the agent to
spin up throwaway VMs **locally on ubongo** (libvirt/QEMU? Proxmox-on-ubongo?) that
mirror a target host (real Docker, a real reboot, the real role apply), and validate
risky infra changes there **before** deploying to a live host. This is the concrete
build of ADR-008's Level 2/3 (staging/integration) testing, which was deferred for lack of
hosts; ubongo can host it. Decide the virtualisation approach and how the agent
drives it (provision → snapshot/reset → run the playbook → reboot → assert). Ties to
3.10 (testing approach as it matures) and the 2026-06-17 FRICTION signals.
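The snapshot/reset → run the playbook → reboot → assert part of that loop could look roughly like this Python sketch. It assumes libvirt (one of the open options above) and Ansible; the VM, snapshot, host and inventory names, the `VmTarget`/`build_plan`/`run_plan` helpers and the two `systemctl` checks are all hypothetical placeholders, since the real assertions are still to be decided. Provisioning the VM and taking its clean snapshot are out of scope here.

```python
"""Hypothetical pre-deploy VM test driver for ubongo (sketch; assumes libvirt + Ansible)."""
import shlex
import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)
class VmTarget:
    domain: str     # libvirt domain of the throwaway VM (assumed name)
    snapshot: str   # clean snapshot to revert to before every run
    host: str       # inventory hostname that resolves to the VM, never a live host
    inventory: str  # Ansible inventory file used only for VM tests


def build_plan(t: VmTarget, playbook: str, checks: list[str]) -> list[tuple[str, list[str]]]:
    """Return ordered (step, argv) pairs: reset -> wait -> apply -> reboot -> assert."""
    adhoc = ["ansible", t.host, "-i", t.inventory]
    plan = [
        # Revert to the clean snapshot and make sure the VM is running afterwards.
        ("reset", ["virsh", "snapshot-revert", t.domain, t.snapshot, "--running"]),
        # Block until Ansible can reach the VM again.
        ("wait", adhoc + ["-m", "ansible.builtin.wait_for_connection"]),
        ("apply", ["ansible-playbook", "-i", t.inventory, playbook, "--limit", t.host]),
        # The reboot module reboots the host and waits for it to come back up.
        ("reboot", adhoc + ["-b", "-m", "ansible.builtin.reboot"]),
    ]
    # Each check is a command that must exit 0 on the rebooted VM.
    for i, check in enumerate(checks, start=1):
        plan.append((f"assert-{i}", adhoc + ["-b", "-m", "ansible.builtin.command", "-a", check]))
    return plan


def run_plan(plan: list[tuple[str, list[str]]], dry_run: bool = True) -> None:
    """Print every step; unless dry_run, execute it and stop at the first failure."""
    for name, argv in plan:
        print(f"[{name}] {shlex.join(argv)}")
        if not dry_run:
            subprocess.run(argv, check=True)  # raises CalledProcessError on a non-zero exit


if __name__ == "__main__":
    # Hypothetical names; placeholder checks for the post-reboot state.
    vm = VmTarget("askari-vm", "clean", "askari-vm", "inventories/vm-test.ini")
    run_plan(build_plan(vm, "site.yml", ["systemctl is-active docker", "systemctl is-active ssh"]))
```

`run_plan` defaults to a dry run that only prints the commands, so the agent (or operator) can review the plan before anything touches the VM; `dry_run=False` executes the steps and stops at the first non-zero exit. Keeping plan construction separate from execution also makes the step order testable without a hypervisor.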
3. **Building services**
1. ~~Decide how to manage logs.~~ → ADR-018.
@ -84,6 +97,13 @@
5. ~~Always subagent-driven?~~ → DECIDED: yes (standing agreement; enforced by `.claude/hooks/guard-execution-mode-menu.sh`).
6. When the agent deploys (runs playbooks etc.), should we define a methodology so that it does not have to poll constantly or review all the output? Perhaps something like the MAKE method could surface only the relevant feedback?
7. ~~Reproducible agent toolchain.~~ → `.claude/settings.json` + `docs/runbooks/claude-code-setup.md`.
8. **Screenshot hand-off to the agent.** Give the operator a smooth way to hand the
agent a screenshot (e.g. of a Hetzner/VNC console during an incident). The agent
can already read image files; the gap is the hand-off itself. During the 2026-06-17
incident the only diagnostic channel was console screenshots, which had to be copied
manually to `/tmp` and located with `find`. Options: a known drop path the agent checks
(e.g. `~/screenshots/`), a small `screenshot`/paste helper or slash command, or a
clipboard→file convention. Cheap and high-value for incident work.
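The known-drop-path option needs little more than "pick the newest image there". A minimal Python sketch, assuming `~/screenshots/` (the example path above) as the drop directory and that the most recently modified image is the one the operator means; the `newest_screenshot` helper and the extension set are hypothetical:

```python
"""Hypothetical helper: find the newest screenshot in a known drop directory."""
from pathlib import Path
from typing import Optional

# Assumed image extensions for console screenshots (matched case-insensitively).
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}


def newest_screenshot(drop_dir: Path = Path.home() / "screenshots") -> Optional[Path]:
    """Return the most recently modified image in drop_dir, or None if there is none."""
    if not drop_dir.is_dir():
        return None
    images = [p for p in drop_dir.iterdir()
              if p.is_file() and p.suffix.lower() in IMAGE_EXTS]
    # Newest by modification time; default=None covers an empty directory.
    return max(images, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    found = newest_screenshot()
    print(found if found else "no screenshot in the drop directory")
```

The agent would then read the returned path like any other image file. Selecting by modification time avoids depending on a file-naming convention, but it picks the wrong file if an older screenshot is touched after the one the operator meant.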
11. **Kaizen loop** → `/kaizen` built (STATUS).
1. ~~Build the loop command.~~ → `/kaizen` (`scripts/friction-scan.py` + `.claude/commands/kaizen.md`; spec `docs/superpowers/specs/2026-06-14-kaizen-command-design.md`).