# Spec — Mesh-hardening (2 of 3): ubongo INPUT-only default-deny + `ssh-from-control`

Status: Accepted (2026-06-19)

## Context & scope

The **mesh-hardening follow-on** (deferred from M5, ROADMAP) was decomposed into three independent sub-projects, each its own spec → plan → implementation cycle:

1. askari SSH → `wt0` — spec/plan written 2026-06-17, **attempted and backed out the same day** (the incident; six lessons in `FRICTION.md`). Needs a redesign — **not** this spec.
2. **ubongo nftables default-deny + `ssh-from-control`** ← *this spec*
3. NetBird ACL off Allow-All → scoped policies (its own later spec; open mechanism question — no headless API path).

ROADMAP (re-ordered after the 2026-06-17 incident) puts **ubongo first**: it is the clean, low-risk case — a physical box with a permanent console break-glass, and *not* the coordinator host that the incident proved you must not corner.

This spec hardens **ubongo's inbound surface only**. It does **not** change sshd's `ListenAddress` (so no boot-race), does **not** apply a forward-chain default-deny (so Docker + the libvirt NAT keep working), and does **not** touch askari or the NetBird ACL.

Current state (verified on ubongo, 2026-06-19): **no host firewall** — sshd listens on `0.0.0.0:22`, reachable from LAN, mesh, and anything routable; only Docker's + libvirt's own `iptables-nft` tables exist. Interfaces: `eno1` `10.20.10.151` (LAN, = `ansible_host`), `wt0` `100.99.146.14` (mesh), `docker0` (one container, no published ports), `virbr-boma` `192.168.150.1/24` (the libvirt NAT that `make test-integration` uses), `ip_forward=1`.

## Goal / success criteria

- SSH to ubongo succeeds over **`wt0`** (road-warriors, askari), from **mamba on the LAN** (`10.20.10.50`), and via the **`ssh-from-control` self-path** (Ansible; source `10.20.10.151`).
- SSH from any **other** LAN source is **dropped** (default-deny on `input`).
- **Docker container egress and `make test-integration` (libvirt NAT) keep working** — the forward chain is untouched.
- A **reboot** does not lock SSH out (no `ListenAddress`, so no bind race).
- Break-glass is the **on-prem physical console** (permanent, non-mesh). The live apply is additionally gated by the firewall **auto-rollback** timer.

## Design

Apply base's nftables `firewall` concern to ubongo, with two adjustments and one deliberate non-change:

1. **INPUT-only default-deny.** The `input` chain keeps `policy drop` with the guaranteed management plane: `lo`, `established,related`, ICMP, SSH on `wt0`, and SSH from `ssh-from-control` (`10.20.10.151`). We add **one operator-workstation source** (mamba, `10.20.10.50`) via a new `base__firewall_admin_addrs` list. Everything else on `eno1` drops.
2. **Forward chain left permissive.** base hardcodes `chain forward { … policy drop; }` for inter-container isolation. On ubongo that would break Docker egress **and** the libvirt NAT the integration harness depends on — the same class of failure that sank askari (FRICTION 2026-06-17, signal 1). A new `base__firewall_input_only` knob renders the forward chain `policy accept` instead. Docker's and libvirt's own `iptables-nft` forward rules continue to apply (separate tables); base simply does not add a default-deny on top.
3. **No sshd `ListenAddress` change.** sshd keeps listening on `0.0.0.0:22`; nftables does all inbound scoping. This deliberately avoids the `ip_nonlocal_bind` boot-race that broke askari (FRICTION signal 2) — there is nothing to bind before `wt0` exists.
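As a reading aid, the management plane in item 1 behaves as a first-match walk over the rules that falls through to the chain's drop policy, while item 2 only flips the forward policy. The sketch below is a minimal Python model under stated assumptions, not nftables itself: `Packet`, `input_verdict` and `forward_policy` are hypothetical names, only IPv4 is modeled (the ICMPv6 rule is omitted), and `forward_policy` merely mirrors the template's `base__firewall_input_only` conditional.

```python
from dataclasses import dataclass
from ipaddress import ip_address
from typing import Optional

# Hypothetical packet model: only the fields ubongo's input rules inspect.
@dataclass(frozen=True)
class Packet:
    iif: str                    # inbound interface: "lo", "eno1", "wt0", ...
    saddr: str                  # IPv4 source address
    proto: str                  # "tcp" or "icmp"
    dport: Optional[int] = None
    ct_state: str = "new"       # "new", "established", "related" or "invalid"


SSH_PORT = 22
CONTROL_ADDR = "10.20.10.151"   # ssh-from-control (set in group_vars/all)
ADMIN_ADDRS = ("10.20.10.50",)  # base__firewall_admin_addrs (mamba on the LAN)


def input_verdict(pkt: Packet, admin_addrs=ADMIN_ADDRS,
                  control_addr: str = CONTROL_ADDR, ssh_port: int = SSH_PORT) -> str:
    """Walk the input rules in order; the first match decides, else the drop policy."""
    if pkt.iif == "lo":
        return "accept"
    if pkt.ct_state in ("established", "related"):
        return "accept"
    if pkt.ct_state == "invalid":
        return "drop"
    is_ssh = pkt.proto == "tcp" and pkt.dport == ssh_port
    src = ip_address(pkt.saddr)
    if is_ssh and pkt.iif == "wt0":
        return "accept"                  # mesh: any source arriving on wt0
    if is_ssh and src == ip_address(control_addr):
        return "accept"                  # Ansible's self-path
    if is_ssh and src in {ip_address(a) for a in admin_addrs}:
        return "accept"                  # operator workstation(s)
    if pkt.proto == "icmp":
        return "accept"
    return "drop"                        # chain policy drop


def forward_policy(input_only: bool) -> str:
    """Mirrors the template's `"accept" if base__firewall_input_only | bool else "drop"`."""
    return "accept" if input_only else "drop"
```

For example, `input_verdict(Packet("eno1", "10.20.10.77", "tcp", 22))` returns `"drop"`: an arbitrary LAN host matches no accept rule and reaches the policy.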
Resulting `input` allow-list:

```
iif "lo" accept
ct state established,related accept
ct state invalid drop
iifname "wt0" tcp dport 22 accept           # mesh (road-warriors, askari)
ip saddr 10.20.10.151 tcp dport 22 accept   # ssh-from-control (Ansible self) — group_vars/all
ip saddr 10.20.10.50 tcp dport 22 accept    # mamba on the LAN — base__firewall_admin_addrs
ip protocol icmp accept
ip6 nexthdr ipv6-icmp accept
# (no catalog services on ubongo) → default drop

chain forward: policy accept                # Docker + libvirt-NAT forwarding preserved
```

## Why ubongo is the safe case (maps to the 2026-06-17 incident)

- **Signal 1** (forward-drop breaks Docker hosts): sidestepped — INPUT-only leaves forwarding alone.
- **Signal 2** (`ip_nonlocal_bind` boot-race): sidestepped — no `ListenAddress`; sshd binds nothing new.
- **Signal 3** (a host's only mgmt path must not depend on a service it hosts): satisfied — ubongo is not the coordinator and keeps three independent paths (mesh, LAN, physical console).
- **Signal 6** (recovery tested after the break-glass was removed): the physical console is permanent (nothing to retire), and reboot-recovery is proven on a throwaway VM first.

## New & changed code

**Role `base`:**

- `roles/base/defaults/main.yml` — add:
  - `base__firewall_input_only: false` — when true, the forward chain is `policy accept` (host-local input filtering only), for hosts that route/forward container or NAT traffic (e.g. the control node's Docker + libvirt-NAT) where a forward default-deny would break them.
  - `base__firewall_admin_addrs: []` — extra LAN source IPs allowed to SSH (besides `wt0` + `ssh-from-control`); for an operator workstation reaching the host over the LAN. Key-gated.
- `roles/base/templates/nftables.conf.j2`:
  - the forward line (currently line 21) → `chain forward { type filter hook forward priority 0; policy {{ "accept" if base__firewall_input_only | bool else "drop" }}; }`
  - after the `ssh-from-control` block (currently lines 12-14), add a loop: `{% for addr in base__firewall_admin_addrs %}` → `ip saddr {{ addr }} tcp dport {{ base__firewall_ssh_port }} accept` → `{% endfor %}`
- `roles/base/molecule/default/{converge,verify}.yml` — fixture sets `input_only: true` + an `admin_addrs` entry; assert (a) `forward` renders `policy accept`, (b) the admin-addr accept rule renders, (c) existing input default-deny + `wt0` + control-addr assertions stay green.

**Inventory** (`inventories/production/group_vars/control/vars.yml`, append):

```yaml
# Mesh-hardening 2/3 (2026-06-19, ADR-020/021): apply base's host firewall to ubongo as
# INPUT-only default-deny — harden the inbound surface, leave the forward chain permissive so
# Docker egress + the libvirt-NAT integration harness keep working. sshd is unchanged
# (nftables scopes inbound), so there is no boot-race. Reach ubongo over wt0, the
# ssh-from-control self-path (base__firewall_control_addr in group_vars/all), or mamba on the
# LAN. Break-glass: the physical console.
base__firewall_input_only: true
base__firewall_admin_addrs:
  - "10.20.10.50"   # mamba over the LAN (NetBird off). Raw DHCP lease — see note below.
# base__firewall_apply defaults true; base__firewall_control_addr (= ubongo's own 10.20.10.151)
# is set in group_vars/all and covers Ansible's self-connection.
```

**Integration harness** (ADR-025) — a "be ubongo" profile, mirroring "be askari":

- `tests/integration/overrides/ubongo.yml` — `firewall_apply: true`, `input_only: true`, `admin_addrs: ["192.168.150.99"]` (a representative LAN addr to exercise the rule), `firewall_control_addr: "192.168.150.1"` (the libvirt-NAT gateway = the harness's own SSH path, so the apply + reboot don't lock it out), `ssh_listen_mesh_only: false`, `mesh_enabled: false`.
- `tests/integration/profiles/ubongo.json` — mirror `profiles/askari.json` (VM resources/image).
- `tests/integration/verify.yml` — make the assertions **profile-aware** (gated on the active profile, since `verify.yml` is shared): for ubongo assert `input` policy drop, `forward` policy **accept**, and the admin-addr rule present. Reachability across the reboot is the harness's existing cycle. The askari assertions (Docker/forward-DNAT) must **not** run for the ubongo profile, nor vice-versa.

Together these enable `make test-integration HOST=ubongo`.

## The mamba admin-addr — a deliberately interim value

`base__firewall_admin_addrs: ["10.20.10.50"]` is mamba's **current raw DHCP lease**, not a reservation (operator decision, 2026-06-19). Caveats, accepted for now:

- **Lease drift:** if DHCP reassigns `10.20.10.50`, the rule allows whatever host then holds it (still SSH-key-gated, so low risk) and mamba loses its *LAN* path. **Backstop:** mamba also reaches ubongo over `wt0` (mesh), so it is never cut off — only the off-mesh LAN convenience lapses until the IP is corrected.
- **Revisit trigger:** when OPNsense-as-code lands (ADR-020 perimeter layer), replace this with a **DHCP reservation** (MAC → fixed IP) and allow the reserved address. Tracked here and in the implementation plan's follow-ups.

## Testing

- **Molecule** (base `default`, render-only, `firewall_apply: false`): the new forward-accept + admin-addr assertions above, with existing assertions green.
- **Integration harness** (`make test-integration HOST=ubongo`): on a throwaway UEFI VM, apply the ubongo overlay, assert the ruleset shape, and prove **SSH survives a reboot** from an allowed source (the existing assert/cycle). This is the gate before touching the real control node.
- **Live** (during cutover): SSH over `wt0` ✓, from mamba LAN ✓, Ansible self-ping ✓; SSH from a disallowed LAN host dropped ✓; `docker run …` egress ✓; a fresh `make test-integration` still spins a VM (libvirt NAT intact) ✓.

## Staged cutover (operator-supervised — lockout-aware, FRICTION signal-6 order)

ubongo is managed as `sjat` (password sudo), so the live apply needs the operator present anyway. The physical console is open throughout.

1. **Harness GREEN:** `make test-integration HOST=ubongo` passes (incl. the reboot).
2. **Pre-check the real paths** *before* applying: SSH over `wt0`, SSH from mamba (`10.20.10.50`), `ansible ubongo -m ping`. Confirm the physical console is reachable.
3. **Dry-run:** `make check PLAYBOOK=site LIMIT=ubongo TAGS=firewall` — review the nftables diff (input default-deny + `wt0` + `10.20.10.151` + `10.20.10.50`; forward `policy accept`).
4. **Apply (auto-rollback armed):** `make deploy PLAYBOOK=site LIMIT=ubongo TAGS=firewall` — the firewall concern snapshots, arms the 45 s revert, applies, `reset_connection` → `wait_for_connection` over the live path (`10.20.10.151`), then cancels the timer. A bad ruleset reverts itself; the console is the ultimate fallback.
5. **Verify** every path + Docker egress + a fresh integration-VM spin (above).
6. **Reboot ubongo; confirm SSH returns on all paths unaided** (console present). Only now is it done — recovery is proven *while the break-glass is still there*.
7. **Docs:** update `STATUS.md` (ubongo row: input-only default-deny applied) and `ROADMAP.md` (mesh-hardening 2/3 done; next is sub-project 1 askari redesign or 3 NetBird ACL).
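Step 4's apply is a dead-man's switch: the revert is armed before the change and disarmed only once the management path answers. The following is a minimal Python model of that ordering, not base's actual firewall concern: `Host`, `guarded_apply` and the `reachable` callback (standing in for `reset_connection` → `wait_for_connection`) are hypothetical, the 30 s wait window is an assumed value, and only the 45 s revert delay comes from this spec. The real timer mechanism is not modeled.

```python
import threading
import time


class Host:
    """Hypothetical stand-in for the host's live ruleset (not a real API)."""

    def __init__(self, ruleset: str):
        self._ruleset = ruleset
        self._lock = threading.Lock()

    def snapshot(self) -> str:
        with self._lock:
            return self._ruleset

    def load(self, ruleset: str) -> None:
        with self._lock:
            self._ruleset = ruleset


def guarded_apply(host, new_ruleset, reachable, revert_after=45.0,
                  wait_timeout=30.0, poll=0.5) -> bool:
    """Snapshot, arm the revert, apply, then disarm only if the path answers.

    Returns True if the new ruleset was kept, False if it was reverted.
    """
    if wait_timeout >= revert_after:
        # In this sketch a late confirmation must never race the revert.
        raise ValueError("wait_timeout must be shorter than revert_after")
    saved = host.snapshot()                               # 1. snapshot
    timer = threading.Timer(revert_after, host.load, args=(saved,))
    timer.start()                                         # 2. arm the revert first
    host.load(new_ruleset)                                # 3. apply
    deadline = time.monotonic() + wait_timeout
    while time.monotonic() < deadline:                    # 4. wait for the live path
        if reachable(host.snapshot()):
            timer.cancel()                                # 5. confirmed: disarm
            return True
        time.sleep(poll)
    timer.join()                                          # never answered: let the revert fire
    return False
```

Arming the revert before applying is the point of the ordering: if the apply itself severs the session, nothing further has to run for the old ruleset to come back.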
## Risks & rollback

- **Self-referential apply** (ubongo runs Ansible against itself): mitigated by the auto-rollback timer, the `wait_for_connection` over the real path, three redundant allowed sources, and the permanent physical console. ubongo cannot be bricked.
- **Raw-lease fragility:** documented above; backstopped by the mesh path; revisit with OPNsense.
- **No new container isolation** (forward stays accept): accepted — ubongo is a single-tenant control node, not a service host; Docker/libvirt keep their own forward rules. The forward default-deny remains the norm for real service hosts (`base__firewall_input_only: false`).

## Out of scope / follow-ons

- askari SSH → `wt0` redesign (sub-project 1) — needs the boot-race + coordinator-bootstrap resolved; folds in the coordinator-robustness (geo-DB FATAL-loop) + off-site backup lessons.
- NetBird ACL off Allow-All (sub-project 3) — open mechanism question (no headless API path).
- OPNsense DHCP reservation for mamba (and ubongo) — replaces the raw lease; with OPNsense-as-code.
- Forward-chain container isolation on ubongo — deliberately not done here.
- `STATUS.md` / `ROADMAP.md` edits land with the implementation, not this spec.