docs(friction): base firewall flush wipes Docker nat (cutover finding)

Applying base's nftables (even INPUT-only/forward-accept) to a Docker host
flushes Docker's ip nat -> container egress breaks until 'systemctl restart
docker'. Found on the ubongo mesh-hardening 2/3 live cutover; the Docker-less
test VM couldn't surface it. Self-heals on reboot (dockerd re-adds nat;
forward=accept doesn't block). Runbook/docker_host follow-ups noted.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
sjat 2026-06-19 15:16:21 +02:00
parent 180af46879
commit a881185c73

View file

@ -197,6 +197,23 @@ harness on ubongo and shaking it down against real KVM (spec/plan in docs/superp
integration-only coverage. Final-review finding; not a cutover blocker (the accept branch is a integration-only coverage. Final-review finding; not a cutover blocker (the accept branch is a
literal, and a var-name break would fail the drop branch too → caught). literal, and a var-name break would fail the drop branch too → caught).
- `[gotcha]` **Applying base's firewall to a Docker host flushes Docker's nat → container
egress dies until `restart docker`** (2026-06-19, mesh-hardening 2/3 live cutover): base's
`nftables.conf.j2` starts with `flush ruleset`, which wipes ALL tables incl. Docker's
`ip nat`/`ip filter` (+ libvirt's). On ubongo I chose INPUT-only so `forward` stays `accept`
— yet the apply STILL broke CONTAINER egress: `docker pull` worked (dockerd uses HOST egress)
but a container `ping` FAILED — the masquerade (SNAT) was gone, so replies couldn't return.
`forward accept` permits forwarding but can't replace the missing nat. The spec's "input-only
keeps Docker egress working" was therefore **incomplete**, and the local-VM harness couldn't
catch it (the test VM runs no Docker). Fix on the live host: `systemctl restart docker`
re-adds its `ip nat`/`ip filter` (egress restored; coexists fine with base's `inet filter`).
On REBOOT it self-heals (dockerd re-adds nat on boot; `forward accept` doesn't block — unlike
the 2026-06-17 `forward drop` incident). → (1) any cutover/runbook applying base firewall to a
Docker host MUST `restart docker` + check container egress after the apply; (2) the pending
`docker_host` nftables integration should own re-adding/persisting Docker's rules so base's
`flush` is safe; (3) the firewall final-review checklist should include "does the host run
Docker/libvirt? the flush wipes their nat."
--- ---
## Kaizen reviews — decisions ledger ## Kaizen reviews — decisions ledger