docs(friction): VM-testing standard + libvirt stale-session gotcha
Two signals from running the ubongo harness gate: (1) the operator wants a standard pre-authorising isolated VM integration tests on ubongo so the agent doesn't ask each time; (2) a stale agent session (shell predating the integration_test libvirt-group grant) carries stale process groups, so the harness's qemu-img/file writes are denied -> run via 'sg libvirt -c ...'; self-heal idea noted.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
parent 468f8c3a92
commit 8d8c86fa39
1 changed files with 28 additions and 0 deletions
@@ -158,6 +158,34 @@ harness on ubongo and shaking it down against real KVM (spec/plan in docs/superp
reservations** (`10.20.10.17` = MAC `bc:0f:f3:c8:4a:8a`; mamba's MAC TBD) and allow the
reserved IPs. Spec: `docs/superpowers/specs/2026-06-19-mesh-hardening-ubongo-default-deny-design.md`.
- `[gotcha]` **`make test-integration` on ubongo fails (`qemu-img` "Permission denied") when
the agent session predates the `libvirt` group grant** (2026-06-19): the `integration_test`
role adds `claude` to `libvirt`+`kvm` and sets the cache dir `/var/lib/boma-integration` to
`root:libvirt 2775`, which is correct, but a `claude` session whose shell started *before* that
grant carries a stale process group set (`id` → `claude,docker` only, no `libvirt`), so
`qemu-img create` of the VM overlay into the group-owned dir is denied. `virsh`/`virt-install`
still work (they reach the system libvirtd via polkit/socket, and the real KVM guest runs
server-side as `libvirt-qemu`), so ONLY claude's own file writes break. Unblock without
restarting the session: **`sg libvirt -c 'make test-integration HOST=<name>'`** (claude needs
only `libvirt` for the dir; `kvm` is server-side; note `sg` adds one group, not the full set).
→ self-heal in `scripts/integration-vm.py`: if the `libvirt` gid is absent from
`os.getgroups()`, re-exec under `sg libvirt` (or have the Makefile target do it), so a
stale-session agent never hits this opaque symptom. New agent sessions pick the groups up on
login, so this is a stale-session transient, but a high-confusion one that is worth self-healing.
- `[friction]` **No standard for when the agent may run local-VM integration tests on ubongo
without asking** (2026-06-19): `make test-integration HOST=<name>` spins up an ISOLATED
throwaway KVM VM (its own libvirt NAT; never touches the real host's firewall/network; guards:
one-VM-at-a-time + a 4 GiB free-RAM floor + auto-destroy on success), so it is safe and
self-contained, yet the agent paused for a go-ahead before running it (mesh-hardening 2/3,
Task 4). The operator wants a STANDARD that pre-authorises VM-testing on ubongo so the agent
just runs it. → decide + record the rule: e.g. a `.claude/settings.json` permission allow for
`make test-integration*` / `scripts/integration-vm.py` (and the `sg libvirt -c '…'` form per
the gotcha above), plus a CLAUDE.md line distinguishing the pre-authorised isolated VM tests
from the genuinely-gated live steps (`make deploy` to real hosts, host reboots, cutovers,
all of which still need a go-ahead). Ties to the `test-risky-infra-before-live-deploy` +
`dont-reask-settled-defaults` memories + ADR-025.
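
The self-heal idea from the `[gotcha]` entry above could look roughly like the sketch below. It is a minimal illustration, not the contents of `scripts/integration-vm.py`: the function names (`group_is_active`, `sg_argv`, `ensure_group`) and the `INTEGRATION_VM_REEXEC` loop-guard variable are hypothetical, and it assumes `sg` is on `PATH` and that the user is already a member of the group in the group database (otherwise `sg` may prompt for a group password). The script would call `ensure_group()` once, early in its entry point, before any file writes into the group-owned cache dir.

```python
import grp
import os
import shlex
import sys

# Hypothetical marker variable: stops a second re-exec if sg did not help.
REEXEC_MARKER = "INTEGRATION_VM_REEXEC"


def group_is_active(gid, active_gids):
    """Return True if gid is part of the process's active group set."""
    return gid in active_gids


def sg_argv(group_name, argv):
    """Build the argv that runs `argv` as one shell command under `sg <group> -c`."""
    return ["sg", group_name, "-c", shlex.join(argv)]


def ensure_group(group_name="libvirt"):
    """Re-exec this script under `sg <group_name>` if the session's groups are stale."""
    if os.environ.get(REEXEC_MARKER) == "1":
        return  # already re-executed once; do not loop
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:
        return  # group does not exist on this host; nothing to heal
    # os.getgroups() lists supplementary groups; include the effective gid too.
    active = set(os.getgroups()) | {os.getegid()}
    if group_is_active(gid, active):
        return
    os.environ[REEXEC_MARKER] = "1"
    cmd = sg_argv(group_name, [sys.executable, *sys.argv])
    # Replaces the current process; does not return on success.
    os.execvp(cmd[0], cmd)
```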
---
## Kaizen reviews — decisions ledger