# Mesh-hardening 2/3 — ubongo INPUT-only default-deny — Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Apply base's nftables firewall to the control node (ubongo) as an INPUT-only default-deny. This hardens its inbound surface while leaving the forward chain permissive, so Docker egress and the libvirt-NAT integration harness keep working, and it requires no sshd `ListenAddress` change.

**Architecture:** Two new `base` knobs make the existing firewall concern fit a control node: `base__firewall_input_only` flips the forward chain to `policy accept` (host-local input filtering only), and `base__firewall_admin_addrs` adds operator-workstation LAN sources to the SSH allow-list (alongside `wt0` and `ssh-from-control`). sshd is untouched (nftables does the scoping → no `ip_nonlocal_bind` boot-race). The change is validated on a throwaway VM via the ADR-025 integration harness (a new "be ubongo" profile) before an operator-supervised live cutover whose safety net is the firewall auto-rollback timer plus the permanent on-prem physical console.

**Tech Stack:** Ansible (role `base`, FQCN), nftables, Jinja2, Molecule on Debian 13, pytest (none new), the ADR-025 integration harness (`scripts/integration-vm.py`, JSON profiles, `-e @` overlays).

**Spec:** `docs/superpowers/specs/2026-06-19-mesh-hardening-ubongo-default-deny-design.md`

**Conventions:** `make lint` and `make test ROLE=base` before each commit; `make check` before `make deploy`; never hand-edit the generated `offsite.yml`; `rbw unlocked` for any commit touching Ansible content and for the integration/live applies (the production `group_vars/all/vault.yml` is in inventory scope and gets decrypted at playbook load).

Tasks 1–3 are code (subagent-driven, each lint/Molecule-verified).
Task 4 is a real-VM validation gate on ubongo. Task 5 is the live, operator-supervised cutover.

---

## File Structure

| File | Create/Modify | Responsibility |
|---|---|---|
| `roles/base/defaults/main.yml` | Modify | Declare `base__firewall_input_only` + `base__firewall_admin_addrs` (defaults: off / empty). |
| `roles/base/templates/nftables.conf.j2` | Modify | Conditional forward policy; render an SSH-allow rule per admin address. |
| `roles/base/molecule/default/converge.yml` | Modify | Fixture: an admin-addr source (input-only stays at its default → forward drop). |
| `roles/base/molecule/default/verify.yml` | Modify | Assert forward-drop default + the admin-addr rule render. |
| `inventories/production/group_vars/control/vars.yml` | Modify | Turn the knobs on for ubongo (input-only; the operator workstations' LAN IPs). |
| `tests/integration/overrides/ubongo.yml` | Create | The "be ubongo" overlay (input-only firewall; harness SSH lifeline). |
| `tests/integration/profiles/ubongo.json` | Create | The "be ubongo" VM profile (group `control`, applies `site.yml:base`). |
| `tests/integration/overrides/askari.yml` | Modify | Add the `integration_profile` marker (verify is now profile-aware). |
| `tests/integration/verify.yml` | Modify | Gate the askari (Docker/DNAT) block; add the ubongo (input-only) block + a guard. |
| `STATUS.md`, `docs/ROADMAP.md` | Modify (Task 5) | Record mesh-hardening 2/3 done. |

---

### Task 1: base role — `base__firewall_input_only` (forward policy) + `base__firewall_admin_addrs` (LAN SSH allow)

**Files:**
- Modify: `roles/base/defaults/main.yml`
- Modify: `roles/base/templates/nftables.conf.j2`
- Modify: `roles/base/molecule/default/converge.yml`
- Modify: `roles/base/molecule/default/verify.yml`

> **Test strategy (note):** Molecule renders one fixture, so it locks the *secure default*:
> `input_only` **off** → forward `policy drop`, plus the new admin-addr rule (red→green).
> The `input_only` **on** → forward `policy accept` path is exercised on a real VM by the
> integration "be ubongo" profile (Tasks 3–4), whose verify fails red until this template
> conditional exists. Both branches are covered, across the two test layers.

- [ ] **Step 1: Write the failing test (extend Molecule verify)**

In `roles/base/molecule/default/verify.yml`, after the `Assert the docker_host extension hook is present` block, add:

```yaml
    - name: Assert the forward chain defaults to policy drop (input_only off)
      ansible.builtin.assert:
        that:
          - "'hook forward priority 0; policy drop;' in nft"
        fail_msg: >-
          forward chain must default to policy drop when base__firewall_input_only is
          false (container isolation stays the norm on real service hosts)

    - name: Assert the admin-addr SSH allow rule (operator workstation on the LAN)
      ansible.builtin.assert:
        that:
          - "'ip saddr 10.30.0.77 tcp dport 22 accept' in nft"
        fail_msg: "missing admin-addr SSH allow rule from base__firewall_admin_addrs"
```

- [ ] **Step 2: Add the fixture that drives it (Molecule converge)**

In `roles/base/molecule/default/converge.yml`, add to the `vars:` block (after the `base__firewall_control_addr` line):

```yaml
    base__firewall_admin_addrs:
      - "10.30.0.77"   # fixture: an operator-workstation LAN source (admin-addr SSH allow)
```

- [ ] **Step 3: Run the test to verify it fails**

Run: `make test ROLE=base`
Expected: FAIL on `Assert the admin-addr SSH allow rule` (the template does not consume `base__firewall_admin_addrs` yet, so the `ip saddr 10.30.0.77 …` rule is absent). The forward-drop assertion already passes (the template currently hardcodes `policy drop`).

- [ ] **Step 4: Add the defaults**

In `roles/base/defaults/main.yml`, after the `base__firewall_apply: true` line (end of the firewall behaviour block, currently line 13), add:

```yaml
base__firewall_input_only: false  # true → the forward chain is `policy accept` (host-local
                                  # INPUT filtering only). For hosts that forward/route
                                  # container or NAT traffic (the control node's Docker +
                                  # libvirt-NAT) where a forward default-deny would break
                                  # them. Real service hosts keep this false (forward drop).
base__firewall_admin_addrs: []    # extra LAN source IPs allowed to SSH, besides wt0 +
                                  # ssh-from-control. For an operator workstation reaching
                                  # the host over the LAN (no mesh). Key-gated. (ADR-021)
```

- [ ] **Step 5: Make the forward policy conditional + render the admin-addr rules**

In `roles/base/templates/nftables.conf.j2`:

(a) Replace the forward-chain line (currently line 21) so the chain reads:

```jinja
  chain forward {
    type filter hook forward priority 0; policy {{ 'accept' if base__firewall_input_only | bool else 'drop' }};
  }
```

(b) After the `ssh-from-control` `{% endif %}` (currently line 14) and before the `ip protocol icmp accept` line, add the admin-addr loop:

```jinja
{% for addr in base__firewall_admin_addrs %}
    ip saddr {{ addr }} tcp dport {{ base__firewall_ssh_port }} accept
{% endfor %}
```

- [ ] **Step 6: Run the test to verify it passes**

Run: `make test ROLE=base`
Expected: PASS. Converge renders the ruleset; verify confirms the forward chain is `policy drop` (input_only defaults false) and the `ip saddr 10.30.0.77 tcp dport 22 accept` rule is present; all pre-existing assertions stay green.

- [ ] **Step 7: Lint**

Run: `make lint`
Expected: `Passed: 0 failure(s)` and `check-tags: OK`.

- [ ] **Step 8: Commit**

```bash
git add roles/base/defaults/main.yml roles/base/templates/nftables.conf.j2 \
        roles/base/molecule/default/converge.yml roles/base/molecule/default/verify.yml
git commit -m "feat(base): input-only forward policy + admin-addr SSH allow

base__firewall_input_only renders the forward chain policy accept (host-local
INPUT filtering only) for hosts that forward container/NAT traffic; defaults
false so real service hosts keep the forward default-deny.
base__firewall_admin_addrs adds operator-workstation LAN sources to the SSH
allow-list alongside wt0 + ssh-from-control.
Molecule locks the secure default + the admin rule. Mesh-hardening 2/3 (ADR-020/021).

Co-Authored-By: Claude Opus 4.8 (1M context) "
```

---

### Task 2: inventory — enable input-only default-deny + operator-workstation LAN SSH on ubongo (control group)

**Files:**
- Modify: `inventories/production/group_vars/control/vars.yml`

- [ ] **Step 1: Turn the knobs on for the control group**

Append to `inventories/production/group_vars/control/vars.yml`:

```yaml
# Mesh-hardening 2/3 (2026-06-19, ADR-020/021): apply base's host firewall to ubongo as
# INPUT-only default-deny — harden the inbound surface, leave the forward chain permissive so
# Docker egress + the libvirt-NAT integration harness keep working. sshd is unchanged
# (nftables scopes inbound), so there is no boot-race. Reach ubongo over wt0 (mesh), the
# ssh-from-control self-path (base__firewall_control_addr, group_vars/all = 10.20.10.151), or
# an operator workstation on the LAN (listed below). Break-glass: the physical console.
# (base__firewall_apply defaults true.)
base__firewall_input_only: true
base__firewall_admin_addrs:
  - "10.20.10.50"   # mamba over the LAN (NetBird off). Raw DHCP lease — revisit with an
                    # OPNsense reservation when OPNsense-as-code lands; backstopped by wt0.
  - "10.20.10.17"   # 2nd operator workstation (MAC bc:0f:f3:c8:4a:8a). Raw lease — ditto.
```

- [ ] **Step 2: Verify the vars resolve for ubongo**

Run: `.venv/bin/ansible-inventory -i inventories/production/ --host ubongo 2>/dev/null | grep -E 'firewall_input_only|firewall_admin_addrs|10.20.10.(50|17)'`
Expected: shows `"base__firewall_input_only": true`, the `"base__firewall_admin_addrs": [` key, and both `"10.20.10.50"` and `"10.20.10.17"` entries (`ansible-inventory` pretty-prints its JSON, so the list spans several lines).

- [ ] **Step 3: Lint**

Run: `make lint`
Expected: clean pass (`check-tags: OK`).
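
The admin-addr loop added in Task 1 Step 5 emits one rule per list entry, so the two addresses above should render as exactly two SSH allow lines. The sketch below mirrors that loop in shell to produce the exact strings to look for later (for example in Task 5 Step 2's diff). `render_admin_rules` is a hypothetical helper, not part of the repo, and it assumes `base__firewall_ssh_port` keeps its default of 22.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the repo). Mirrors the nftables.conf.j2 admin-addr loop:
#   {% for addr in base__firewall_admin_addrs %}
#       ip saddr {{ addr }} tcp dport {{ base__firewall_ssh_port }} accept
#   {% endfor %}
# and prints one SSH allow rule per address, in list order.
set -eu

render_admin_rules() {
  local port=$1
  shift
  local addr
  for addr in "$@"; do
    printf 'ip saddr %s tcp dport %s accept\n' "$addr" "$port"
  done
}

# Task 2's values for ubongo (port 22 assumed to be the SSH port default).
render_admin_rules 22 10.20.10.50 10.20.10.17
```

An empty `base__firewall_admin_addrs` renders no rules at all, which is why the default `[]` leaves every other host's ruleset unchanged.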

- [ ] **Step 4: Commit**

```bash
git add inventories/production/group_vars/control/vars.yml
git commit -m "feat(inventory): ubongo gets INPUT-only host firewall + mamba LAN SSH

Enables base__firewall_input_only on the control group (forward chain stays
permissive so Docker egress + the integration-test libvirt NAT survive) and
allows the operator workstations' LAN IPs (mamba 10.20.10.50 + 10.20.10.17;
raw leases, backstopped by wt0). Mesh-hardening 2/3.

Co-Authored-By: Claude Opus 4.8 (1M context) "
```

---

### Task 3: integration harness — "be ubongo" profile (overlay + profile + profile-aware verify)

**Files:**
- Create: `tests/integration/overrides/ubongo.yml`
- Create: `tests/integration/profiles/ubongo.json`
- Modify: `tests/integration/overrides/askari.yml`
- Modify: `tests/integration/verify.yml`

- [ ] **Step 1: Create the "be ubongo" overlay**

Create `tests/integration/overrides/ubongo.yml`:

```yaml
---
# Integration-test overlay for the "ubongo" profile (ADR-025). Passed via `-e @`.
# Exercises mesh-hardening 2/3: base's INPUT-only default-deny on the control node — input
# chain default-deny, forward chain left permissive (Docker/libvirt-NAT safe), no sshd
# ListenAddress change (so no boot-race).
integration_profile: ubongo

base__firewall_apply: true
base__firewall_input_only: true      # forward chain renders `policy accept`
base__firewall_admin_addrs:
  - "192.168.150.98"                 # two representative LAN sources — exercises the
  - "192.168.150.99"                 # admin-addr loop with a multi-entry list (like ubongo)

# Never wt0-only; never touch the real mesh from a throwaway VM.
base__ssh_listen_mesh_only: false
base__mesh_enabled: false

# Allow SSH from the libvirt-NAT gateway (where the driver/ansible connect from) so the
# default-deny apply + the reboot don't lock out the harness. By source IP (interface-independent).
# This is the harness's lifeline; the admin-addrs above are only checked in the rendered
# ruleset, never used to connect.
base__firewall_control_addr: "192.168.150.1"
```

- [ ] **Step 2: Create the "be ubongo" VM profile**

Create `tests/integration/profiles/ubongo.json`:

```json
{
  "groups": ["control"],
  "applies": [
    {"playbook": "site.yml", "tags": ["base"]}
  ],
  "extra_vars_files": ["overrides/ubongo.yml"],
  "mem_mib": 2048,
  "vcpus": 2
}
```

- [ ] **Step 3: Mark the askari overlay with its profile name**

In `tests/integration/overrides/askari.yml`, after the two header comment lines (before `base__firewall_apply: true`), add:

```yaml
integration_profile: askari
```

- [ ] **Step 4: Make `verify.yml` profile-aware (the test)**

Replace the entire contents of `tests/integration/verify.yml` with:

```yaml
---
# Integration verify (ADR-025). Outcome-based, profile-aware: the active profile is named by
# `integration_profile` (set in each profile's overlay). Each profile asserts its own success
# criteria; an unknown/unset profile fails loudly (never a silent pass).
- name: Verify the rebooted host
  hosts: all
  become: true
  gather_facts: false
  tasks:
    - name: A known integration_profile must be set (no silent pass)
      ansible.builtin.assert:
        that:
          - integration_profile is defined
          - integration_profile in ['askari', 'ubongo']
        fail_msg: "integration_profile must be set in the profile overlay (askari|ubongo)"

    # ── askari profile — Docker host: published-port forwarding survives the reboot ──
    # The load-bearing check probes the VM's published :80 FROM the controller (ubongo) — if
    # base's forward-drop killed DNAT, this times out (the FRICTION 2026-06-17 #1 bug).
    - name: (askari) Gather service facts
      when: integration_profile == 'askari'
      ansible.builtin.service_facts:

    - name: (askari) Docker daemon is active
      when: integration_profile == 'askari'
      ansible.builtin.assert:
        that: "ansible_facts.services['docker.service'].state == 'running'"
        fail_msg: "docker.service is not running"

    - name: (askari) Forward chain permits container traffic (drop-in loaded)
      when: integration_profile == 'askari'
      ansible.builtin.command: nft list chain inet filter forward
      register: _fwd
      changed_when: false

    - name: (askari) Assert container forwarding is allowed (not pure drop)
      when: integration_profile == 'askari'
      ansible.builtin.assert:
        that: "'accept' in _fwd.stdout"
        fail_msg: >-
          forward chain is pure drop — container forwarding will die on reboot
          (FRICTION 2026-06-17 #1). docker_host container-forward drop-in missing.

    - name: (askari) Published port answers from the controller (DNAT + forward alive)
      when: integration_profile == 'askari'
      delegate_to: localhost
      become: false
      ansible.builtin.uri:
        url: "http://{{ ansible_host }}/"
        follow_redirects: none
        status_code: [200, 301, 308, 404, 502, 503]
        timeout: 10
      register: _probe
      retries: 5
      delay: 6
      until: _probe is succeeded

    # ── ubongo profile — control node: INPUT-only default-deny survives the reboot ──
    # SSH reachability across the reboot is proven by the harness itself (it re-SSHes and
    # checks boot_id changed before this verify runs). Here we assert the ruleset shape.
    - name: (ubongo) Read the live nftables ruleset
      when: integration_profile == 'ubongo'
      ansible.builtin.command: nft list ruleset
      register: _nft
      changed_when: false

    - name: (ubongo) INPUT default-deny, forward permissive, admin-addr allow
      when: integration_profile == 'ubongo'
      ansible.builtin.assert:
        that:
          # `nft list` prints the standard priority 0 by its symbolic name
          # (`priority filter`), so accept either spelling.
          - "_nft.stdout is search('hook input priority (0|filter); policy drop;')"
          - "_nft.stdout is search('hook forward priority (0|filter); policy accept;')"
          - "'ip saddr 192.168.150.98 tcp dport 22 accept' in _nft.stdout"
          - "'ip saddr 192.168.150.99 tcp dport 22 accept' in _nft.stdout"
        fail_msg: >-
          ubongo profile: expected input policy drop, forward policy accept (input-only),
          and both admin-addr (192.168.150.98/99) SSH allows in the live ruleset.
```

- [ ] **Step 5: Validate the JSON + lint**

Run: `.venv/bin/python -m json.tool tests/integration/profiles/ubongo.json >/dev/null && echo OK`, then `make lint`
Expected: `OK`, then a clean lint pass (`check-tags: OK`).

- [ ] **Step 6: Commit**

```bash
git add tests/integration/overrides/ubongo.yml tests/integration/profiles/ubongo.json \
        tests/integration/overrides/askari.yml tests/integration/verify.yml
git commit -m "test(integration): add the 'be ubongo' profile (input-only default-deny)

A control-group VM that applies base with INPUT-only default-deny (forward
policy accept; admin-addr SSH allow). verify.yml is now profile-aware via an
integration_profile marker — the askari Docker/DNAT block is gated, and a
ubongo block asserts input drop + forward accept + the admin-addr rule.
Enables \`make test-integration HOST=ubongo\`. Mesh-hardening 2/3 (ADR-025).

Co-Authored-By: Claude Opus 4.8 (1M context) "
```

---

### Task 4: Validate on the integration harness (`make test-integration HOST=ubongo`) — the GREEN gate

> Runs a throwaway UEFI VM on ubongo: boots it, applies the base role with the ubongo
> overlay (INPUT-only default-deny), **reboots it**, and asserts the ruleset and that SSH returns.
> This proves the change survives a reboot before the real control node is ever touched
> (spec §cutover step 1; FRICTION signal-6). No code change / no commit — a validation gate.

- [ ] **Step 1: Ensure the vault is unlocked**

The run loads `inventories/production/group_vars/all/vault.yml` (symlinked into the run dir), which is decrypted at playbook load.

Run: `rbw unlocked || rbw unlock`
Expected: exits 0 (unlocked). If it prompts, the operator unlocks.

- [ ] **Step 2: Run the integration cycle**

Run: `make test-integration HOST=ubongo`
Expected (the `cycle`: up → apply → reboot → assert): the VM gets a `192.168.150.x` lease; `site.yml --tags base` applies cleanly; `… rebooted (boot_id changed), SSH back at 192.168.150.x`; then `VERIFY PASSED for boma-it-ubongo-…`. The VM is destroyed on success.

- [ ] **Step 3: On failure, read the diagnostics**

If it prints `VERIFY FAILED`, diagnostics are in `~/integration-runs/boma-it-ubongo-/` (`nft.txt`, `console.log`, `journal.txt`). The likely suspects are the admin-addr/forward assertion (Task 1/3 wiring) or SSH not returning post-reboot (the `base__firewall_control_addr: 192.168.150.1` lifeline in the overlay). Fix the implicated task, re-commit, and re-run Step 2, running `make test-integration-clean` first if a VM was left defined.

- [ ] **Step 4: Record the result**

Capture the `VERIFY PASSED` line in the task notes (this is the gate Task 5 step 1 depends on). No commit.

---

### Task 5: Live staged cutover (operator-supervised — NOT a subagent task)

> Touches the **real ubongo** (the control node Ansible runs from) and reboots it, so it is
> lockout-risky. Run it interactively with the operator, in order, verifying each step before
> the next. The firewall auto-rollback timer (`base__firewall_rollback_timeout`, 45 s) +
> `wait_for_connection` over the live path is the safety net; the **on-prem physical console**
> is the permanent break-glass. Do NOT hand this to an unattended agent.
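
For orientation, the arm-apply-confirm pattern behind that safety net can be sketched in shell. This is a minimal sketch under assumptions, not the `base` role's actual tasks: the transient unit name `nft-rollback`, the `DRY_RUN` switch and the script itself are hypothetical; only the snapshot path `/etc/nftables.rollback` and the 45 s window come from this plan. With `DRY_RUN=1` (the default here) it only prints the commands it would run.

```bash
#!/usr/bin/env bash
# Sketch of an nftables apply guarded by a dead-man's timer. Hypothetical: the unit
# name, the DRY_RUN switch and this script (the base role implements the real thing).
set -eu

ROLLBACK_FILE=/etc/nftables.rollback
TIMEOUT_S=45                 # mirrors base__firewall_rollback_timeout
UNIT=nft-rollback            # hypothetical transient unit name
DRY_RUN=${DRY_RUN:-1}        # 1 = print commands only; 0 = execute (root required)

run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

guarded_apply() {
  local new_conf=$1
  # 1. Snapshot the live ruleset; the leading flush makes restoring it a clean replace.
  run sh -c "{ echo 'flush ruleset'; nft list ruleset; } > $ROLLBACK_FILE"
  # 2. Arm the revert: a transient timer that reloads the snapshot unless cancelled.
  run systemd-run --unit="$UNIT" --on-active="$TIMEOUT_S" /usr/sbin/nft -f "$ROLLBACK_FILE"
  # 3. Apply the new ruleset (the rendered nftables.conf is assumed to flush first).
  run nft -f "$new_conf"
}

confirm_apply() {
  # 4. Only after a fresh connection succeeds (the role's wait_for_connection step)
  #    is the pending revert cancelled.
  run systemctl stop "$UNIT.timer"
}

guarded_apply /etc/nftables.conf
confirm_apply
```

The design point is that the revert is armed before the risky change and cancelled only by positive proof of connectivity; if the new ruleset cuts the session, nothing runs `confirm_apply` and the timer restores the snapshot on its own.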

- [ ] **Step 1: Pre-checks (gate: Task 4 GREEN)**

- `rbw unlocked || rbw unlock`.
- SSH to ubongo over `wt0` from a road-warrior succeeds.
- SSH to ubongo from mamba on the LAN (`10.20.10.50`) succeeds.
- `.venv/bin/ansible ubongo -i inventories/production/ -m ping` → `SUCCESS` (over `10.20.10.151`).
- The physical console is reachable.

If any path fails, STOP.

- [ ] **Step 2: Dry-run the firewall apply**

Run: `make check PLAYBOOK=site LIMIT=ubongo TAGS=firewall`
Expected: the nftables diff shows `policy drop` on input, `iifname "wt0" … accept`, `ip saddr 10.20.10.151 … accept`, `ip saddr 10.20.10.50 … accept`, `ip saddr 10.20.10.17 … accept`, and the forward chain as `policy accept`. No errors.

- [ ] **Step 3: Apply the host firewall (auto-rollback armed)**

Run: `make deploy PLAYBOOK=site LIMIT=ubongo TAGS=firewall`
Expected: the firewall concern snapshots `/etc/nftables.rollback`, arms the 45 s `systemd-run` revert, applies the ruleset, `reset_connection` → `wait_for_connection` over `10.20.10.151` succeeds, then cancels the timer. If connectivity is lost, the timer reverts the ruleset within 45 s and the console is the fallback.

- [ ] **Step 4: Verify every path + forwarding still works**

```bash
# from a road-warrior over wt0, and from mamba on the LAN:
ssh sjat@100.99.146.14 true && echo "wt0 OK"
ssh sjat@10.20.10.151 true && echo "mamba-LAN OK"    # run from mamba (10.20.10.50)
# Ansible self-path:
.venv/bin/ansible ubongo -i inventories/production/ -m ping
# a LAN host NOT in base__firewall_admin_addrs (i.e. neither 10.20.10.50 nor 10.20.10.17)
# must now be refused/time out on :22
# Docker egress (forward chain still permissive):
docker run --rm busybox wget -qO- https://cloudflare.com/cdn-cgi/trace | head -1
# libvirt-NAT forwarding intact — a fresh integration VM still reaches apt:
make test-integration HOST=ubongo    # expect VERIFY PASSED (proves the NAT path survived)
```

Expected: `wt0 OK`, `mamba-LAN OK`, Ansible `SUCCESS`, the disallowed host refused, the Docker egress line returns, and the integration cycle passes.
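
As a complement to the connectivity probes, the live ruleset shape can be spot-checked on ubongo itself. A minimal sketch, assuming the hypothetical `check_input_only` function below (not part of the repo): it reads a ruleset listing on stdin and greps for the same shape the integration verify asserts. It accepts both `priority 0` and `priority filter`, because `nft list` prints the standard filter priority by its symbolic name.

```bash
#!/usr/bin/env bash
# Hypothetical spot-check (not in the repo): does a ruleset listing have the
# INPUT-only shape? Reads the listing on stdin; args: SSH port, then admin addrs.
set -eu

check_input_only() {
  local port=$1
  shift
  local ruleset addr
  ruleset=$(cat)   # the whole listing from stdin
  # Input chain default-deny; forward chain permissive (either priority spelling).
  printf '%s\n' "$ruleset" | grep -Eq 'hook input priority (0|filter); policy drop;' \
    || { echo "input chain is not policy drop" >&2; return 1; }
  printf '%s\n' "$ruleset" | grep -Eq 'hook forward priority (0|filter); policy accept;' \
    || { echo "forward chain is not policy accept" >&2; return 1; }
  # One SSH allow rule per admin address.
  for addr in "$@"; do
    printf '%s\n' "$ruleset" | grep -Fq "ip saddr $addr tcp dport $port accept" \
      || { echo "missing admin-addr rule for $addr" >&2; return 1; }
  done
  echo "input-only shape OK"
}
```

On ubongo, `sudo nft list table inet filter | check_input_only 22 10.20.10.50 10.20.10.17` should print `input-only shape OK`. Listing only base's `inet filter` table (the one the askari verify already reads) keeps other tables, for example ones Docker manages, out of the match.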

- [ ] **Step 5: Reboot resilience — while the console is present (FRICTION signal-6)**

With the operator at the physical console, reboot ubongo (`sudo systemctl reboot`). After it returns, confirm SSH comes back on all paths **unaided**:

```bash
ssh sjat@100.99.146.14 true && echo "wt0 OK after reboot"
ssh sjat@10.20.10.151 true && echo "mamba-LAN OK after reboot"    # run from mamba (10.20.10.50)
.venv/bin/ansible ubongo -i inventories/production/ -m ping
```

Expected: SSH returns with no manual intervention (no `ListenAddress`, so nothing to race). Only now is the cutover complete.

- [ ] **Step 6: Update STATUS + ROADMAP**

- In `STATUS.md`: in the `roles/base/` row of "Scaffolded but empty", change the firewall note. The `firewall` concern is now **applied to ubongo** as INPUT-only default-deny (it is no longer "not yet applied to any host"); note the `base__firewall_input_only` knob and that the forward default-deny still awaits the `docker_host` drop-in for real service hosts. In the ubongo control-node row, mark the default-deny item under "Pending" as done.
- In `docs/ROADMAP.md`: mark **mesh-hardening sub-project 2 (ubongo default-deny) done**; the remaining follow-ons are sub-project 1 (askari SSH→`wt0` *redesign*) and sub-project 3 (NetBird ACL). Update the "Next step" section accordingly.

```bash
git add STATUS.md docs/ROADMAP.md
git commit -m "docs: ubongo INPUT-only default-deny applied (mesh-hardening 2/3 done)

Co-Authored-By: Claude Opus 4.8 (1M context) "
```

- [ ] **Step 7: Push**

Run: `git push origin main`

---

## Self-review (against the spec)

- **§ Design — INPUT-only default-deny** → Task 1 (forward-policy knob) + Task 2 (enabled on ubongo). ✓
- **§ Design — admin-addrs (operator workstations on LAN)** → Task 1 (`base__firewall_admin_addrs` + template loop) + Task 2 (`10.20.10.50` mamba, `10.20.10.17`). ✓
- **§ Design — no sshd ListenAddress change** → nothing touches `ssh.yml`/`sshd_hardening.conf.j2`; only nftables. ✓ (verified: Tasks 1–3 file lists exclude them).
- **§ allow-list** (lo, established, wt0, ssh-from-control, admin-addr, icmp; forward accept) → the template already renders lo/established/wt0/control/icmp; Task 1 adds admin-addr + forward-accept. ✓
- **§ Why-safe (incident signals 1/2/3/6)** → signal 1 (forward accept, Task 1); signal 2 (no ListenAddress); signal 3 (ubongo keeps LAN + console); signal 6 (Task 4 harness reboot + Task 5 step 5 reboot-while-console). ✓
- **§ New & changed code** (defaults, template, molecule, group_vars/control, integration profile) → Tasks 1–3. ✓
- **§ admin raw-leases + revisit** → Task 2 comments record both leases + the OPNsense-reservation revisit trigger; backstop (wt0) noted; flagged in `FRICTION.md`. ✓
- **§ Testing** (Molecule render asserts; `make test-integration HOST=ubongo`; live checks) → Task 1 (Molecule), Task 4 (harness), Task 5 step 4 (live). ✓ Coverage split (default in Molecule, input_only on the VM) noted in Task 1.
- **§ Staged cutover (signal-6 order)** → Task 5 steps 1–7; reboot-recovery (step 5) precedes nothing that retires a break-glass (the console is permanent). ✓
- **§ Risks/rollback** → auto-rollback (Task 5 step 3), redundant paths + physical console, raw-lease backstop. ✓
- **Type/name consistency:** `base__firewall_input_only` (bool) and `base__firewall_admin_addrs` (list) are spelled identically in defaults, template, converge, group_vars, and the overlay. `integration_profile` is spelled identically in both overlays, the guard assertion, and every per-profile `when:` gate in `verify.yml`. ✓
- **Placeholder scan:** no TBD/TODO; every code/command step shows the actual content. ✓