diff --git a/docs/FRICTION.md b/docs/FRICTION.md
index 994ba59..fdcc2ac 100644
--- a/docs/FRICTION.md
+++ b/docs/FRICTION.md
@@ -158,6 +158,34 @@ harness on ubongo and shaking it down against real KVM (spec/plan in docs/superp
   reservations** (`10.20.10.17` = MAC `bc:0f:f3:c8:4a:8a`; mamba's MAC TBD) and allow the reserved
   IPs. Spec: `docs/superpowers/specs/2026-06-19-mesh-hardening-ubongo-default-deny-design.md`.
 
+- `[gotcha]` **`make test-integration` on ubongo fails (`qemu-img` "Permission denied") when
+  the agent session predates the `libvirt` group grant** (2026-06-19): the `integration_test`
+  role adds `claude` to `libvirt`+`kvm` and makes the cache dir `/var/lib/boma-integration`
+  `root:libvirt 2775` (correct), but a `claude` session whose shell started *before* that
+  grant carries a stale supplementary group set (`id` → `claude,docker` only, no `libvirt`), so
+  `qemu-img create` of the VM overlay into the group-owned dir is denied. `virsh`/`virt-install`
+  still work (they reach system libvirtd via polkit/socket, and the real KVM runs server-side
+  as `libvirt-qemu`), so ONLY claude's own file-writes break. Unblock without restarting the
+  session: **`sg libvirt -c 'make test-integration HOST='`** (claude needs only `libvirt`
+  for the dir; `kvm` is server-side; note `sg` adds one group, not the full set). → self-heal
+  in `scripts/integration-vm.py`: if the `libvirt` gid is absent from `os.getgroups()`, re-exec
+  under `sg libvirt` (or have the Makefile target do it), so a stale-session agent never hits
+  this opaque symptom. New agent sessions pick the groups up on login, so it's a stale-session
+  transient, but high-confusion and worth self-healing.
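The self-heal proposed in the gotcha above could look roughly like the sketch below. It is not the actual contents of `scripts/integration-vm.py`: the helper names (`missing_group`, `sg_command`, `ensure_group`) and the `REEXEC_UNDER_SG` loop-guard variable are hypothetical, and it assumes `sg` is on `PATH`. It also checks the effective gid alongside `os.getgroups()`, on the assumption that `sg` may expose the group as the primary group rather than as a supplementary one.

```python
import grp
import os
import shlex
import sys


def missing_group(name, current_gids, lookup=grp.getgrnam):
    """Return True if group `name` exists but its gid is not in current_gids."""
    try:
        gid = lookup(name).gr_gid
    except KeyError:
        return False  # group is not defined on this host; nothing to heal
    return gid not in current_gids


def sg_command(group, argv):
    """Build the `sg GROUP -c '...'` argv that re-runs argv under that group."""
    return ["sg", group, "-c", shlex.join(argv)]


def ensure_group(group="libvirt"):
    """Re-exec this script under `sg GROUP` when the session's groups are stale."""
    if os.environ.get("REEXEC_UNDER_SG") == "1":  # hypothetical loop guard
        return
    # Include the effective gid: under `sg` the group may be the primary one.
    gids = set(os.getgroups()) | {os.getegid()}
    if missing_group(group, gids):
        os.environ["REEXEC_UNDER_SG"] = "1"
        cmd = sg_command(group, [sys.executable, *sys.argv])
        os.execvp(cmd[0], cmd)  # replaces the process; does not return on success
```

Calling `ensure_group()` as the first statement of the script's entry point would keep the re-exec transparent to the caller; the guard variable stops a second re-exec if the group is still missing afterwards.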
+
+- `[friction]` **No standard for when the agent may run local-VM integration tests on ubongo
+  without asking** (2026-06-19): `make test-integration HOST=` spins up an ISOLATED throwaway
+  KVM VM (its own libvirt NAT; never touches the real host's firewall/network; guards:
+  one-VM-at-a-time + a 4 GiB free-RAM floor + auto-destroy on success), so it is safe and
+  self-contained, yet the agent paused for a go-ahead before running it (mesh-hardening 2/3,
+  Task 4). The operator wants a STANDARD that pre-authorises VM-testing on ubongo so the agent
+  just runs it. → decide + record the rule: e.g. a `.claude/settings.json` permission allow for
+  `make test-integration*` / `scripts/integration-vm.py` (and the `sg libvirt -c '…'` form per
+  the gotcha above), plus a CLAUDE.md line distinguishing the pre-authorised isolated VM tests
+  from the genuinely-gated live steps (`make deploy` to real hosts, host reboots, cutovers;
+  these still need a go-ahead). Ties to the `test-risky-infra-before-live-deploy` +
+  `dont-reask-settled-defaults` memories + ADR-025.
+
 ---
 
 ## Kaizen reviews — decisions ledger