Five tasks: base knobs (input-only forward policy + admin-addr SSH allow, TDD via Molecule) → enable on the control group → a 'be ubongo' integration profile (profile-aware verify) → the real-VM harness GREEN gate → the operator-supervised live cutover (signal-6 order, physical-console break-glass). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Mesh-hardening 2/3 — ubongo INPUT-only default-deny — Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.
Goal: Apply base's nftables firewall to the control node (ubongo) as an INPUT-only default-deny — hardening its inbound surface — while leaving the forward chain permissive so Docker egress and the libvirt-NAT integration harness keep working, and without any sshd ListenAddress change.
Architecture: Two new base knobs make the existing firewall concern fit a control node: base__firewall_input_only flips the forward chain to policy accept (host-local input filtering only), and base__firewall_admin_addrs adds operator-workstation LAN sources to the SSH allow-list (alongside wt0 and ssh-from-control). sshd is untouched (nftables does the scoping → no ip_nonlocal_bind boot-race). The change is validated on a throwaway VM via the ADR-025 integration harness (a new "be ubongo" profile) before an operator-supervised live cutover whose safety net is the firewall auto-rollback timer plus the permanent on-prem physical console.
Tech Stack: Ansible (role base, FQCN), nftables, Jinja2, Molecule on Debian 13, pytest (none new), the ADR-025 integration harness (scripts/integration-vm.py, JSON profiles, -e @ overlays).
Spec: docs/superpowers/specs/2026-06-19-mesh-hardening-ubongo-default-deny-design.md
Conventions: make lint and make test ROLE=base before each commit; make check before make deploy; never hand-edit the generated offsite.yml; rbw unlocked for any commit touching Ansible content and for the integration/live applies (the production group_vars/all/vault.yml is in inventory scope and gets decrypted at playbook load). Tasks 1–3 are code (subagent-driven, each lint/Molecule-verified). Task 4 is a real-VM validation gate on ubongo. Task 5 is the live, operator-supervised cutover.
File Structure
| File | Create/Modify | Responsibility |
|---|---|---|
| roles/base/defaults/main.yml | Modify | Declare base__firewall_input_only + base__firewall_admin_addrs (defaults: off / empty). |
| roles/base/templates/nftables.conf.j2 | Modify | Conditional forward policy; render an SSH-allow rule per admin address. |
| roles/base/molecule/default/converge.yml | Modify | Fixture: an admin-addr source (input-only stays at its default → forward drop). |
| roles/base/molecule/default/verify.yml | Modify | Assert forward-drop default + the admin-addr rule render. |
| inventories/production/group_vars/control/vars.yml | Modify | Turn the knobs on for ubongo (input-only; mamba's LAN IP). |
| tests/integration/overrides/ubongo.yml | Create | The "be ubongo" overlay (input-only firewall; harness SSH lifeline). |
| tests/integration/profiles/ubongo.json | Create | The "be ubongo" VM profile (group control, applies site.yml:base). |
| tests/integration/overrides/askari.yml | Modify | Add the integration_profile marker (verify is now profile-aware). |
| tests/integration/verify.yml | Modify | Gate the askari (Docker/DNAT) block; add the ubongo (input-only) block + a guard. |
| STATUS.md, docs/ROADMAP.md | Modify (Task 5) | Record mesh-hardening 2/3 done. |
Task 1: base role — base__firewall_input_only (forward policy) + base__firewall_admin_addrs (LAN SSH allow)
Files:
- Modify: roles/base/defaults/main.yml
- Modify: roles/base/templates/nftables.conf.j2
- Modify: roles/base/molecule/default/converge.yml
- Modify: roles/base/molecule/default/verify.yml
Test strategy (note): Molecule renders one fixture, so it locks the secure default (input_only off → forward policy drop) plus the new admin-addr rule (red→green). The input_only on → forward policy accept path is exercised on a real VM by the integration "be ubongo" profile (Tasks 3–4), whose verify fails red until this template conditional exists. Both branches are covered, across the two test layers.
- Step 1: Write the failing test (extend Molecule verify)
In roles/base/molecule/default/verify.yml, after the Assert the docker_host extension hook is present block, add:
    - name: Assert the forward chain defaults to policy drop (input_only off)
      ansible.builtin.assert:
        that:
          - "'hook forward priority 0; policy drop;' in nft"
        fail_msg: >-
          forward chain must default to policy drop when base__firewall_input_only is
          false (container isolation stays the norm on real service hosts)

    - name: Assert the admin-addr SSH allow rule (operator workstation on the LAN)
      ansible.builtin.assert:
        that:
          - "'ip saddr 10.30.0.77 tcp dport 22 accept' in nft"
        fail_msg: "missing admin-addr SSH allow rule from base__firewall_admin_addrs"
- Step 2: Add the fixture that drives it (Molecule converge)
In roles/base/molecule/default/converge.yml, add to the vars: block (after the base__firewall_control_addr line):
    base__firewall_admin_addrs:
      - "10.30.0.77"   # fixture: an operator-workstation LAN source (admin-addr SSH allow)
- Step 3: Run the test to verify it fails
Run: make test ROLE=base
Expected: FAIL on Assert the admin-addr SSH allow rule (the template does not consume base__firewall_admin_addrs yet, so the ip saddr 10.30.0.77 … rule is absent). The forward-drop assertion passes already (the template currently hardcodes policy drop).
- Step 4: Add the defaults
In roles/base/defaults/main.yml, after the base__firewall_apply: true line (end of the firewall behaviour block, currently line 13), add:
base__firewall_input_only: false  # true → the forward chain is `policy accept` (host-local
                                  # INPUT filtering only). For hosts that forward/route
                                  # container or NAT traffic (the control node's Docker +
                                  # libvirt-NAT) where a forward default-deny would break
                                  # them. Real service hosts keep this false (forward drop).
base__firewall_admin_addrs: []    # extra LAN source IPs allowed to SSH, besides wt0 +
                                  # ssh-from-control. For an operator workstation reaching
                                  # the host over the LAN (no mesh). Key-gated. (ADR-021)
- Step 5: Make the forward policy conditional + render the admin-addr rules
In roles/base/templates/nftables.conf.j2:
(a) Replace the forward-chain line (currently line 21):
chain forward { type filter hook forward priority 0; policy {{ 'accept' if base__firewall_input_only | bool else 'drop' }}; }
(b) After the ssh-from-control {% endif %} (currently line 14) and before the ip protocol icmp accept line, add the admin-addr loop:
{% for addr in base__firewall_admin_addrs %}
ip saddr {{ addr }} tcp dport {{ base__firewall_ssh_port }} accept
{% endfor %}
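To make both branches of the two template changes concrete, here is a minimal Python sketch that mimics the rendered lines with plain string formatting. It is not the Jinja2 template and does not use the role; the function name `render_firewall_fragment` is hypothetical and exists only for illustration.

```python
def render_firewall_fragment(input_only, admin_addrs, ssh_port=22):
    """Mimic what Step 5 (a) and (b) render. Illustration only, not the real template."""
    # (b) one SSH-allow rule per admin address, in list order
    admin_rules = [
        f"ip saddr {addr} tcp dport {ssh_port} accept" for addr in admin_addrs
    ]
    # (a) forward policy: accept when input-only, otherwise the secure default (drop)
    policy = "accept" if input_only else "drop"
    forward = (
        "chain forward { type filter hook forward priority 0; "
        f"policy {policy}; }}"
    )
    return admin_rules, forward


# Molecule fixture: input_only stays false, one admin address
rules, forward = render_firewall_fragment(False, ["10.30.0.77"])
print(rules[0])
print(forward)
```

With the Molecule fixture values the sketch yields the two strings the Step 1 assertions look for; passing `True` switches only the forward policy to `accept`, which is the branch the integration profile covers.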
- Step 6: Run the test to verify it passes
Run: make test ROLE=base
Expected: PASS — converge renders the ruleset; verify confirms the forward chain is policy drop (input_only defaults false) and the ip saddr 10.30.0.77 tcp dport 22 accept rule is present; all pre-existing assertions stay green.
- Step 7: Lint
Run: make lint
Expected: Passed: 0 failure(s) and check-tags: OK.
- Step 8: Commit
git add roles/base/defaults/main.yml roles/base/templates/nftables.conf.j2 \
  roles/base/molecule/default/converge.yml roles/base/molecule/default/verify.yml
git commit -m "feat(base): input-only forward policy + admin-addr SSH allow

base__firewall_input_only renders the forward chain policy accept (host-local
INPUT filtering only) for hosts that forward container/NAT traffic; defaults
false so real service hosts keep the forward default-deny. base__firewall_admin_addrs
adds operator-workstation LAN sources to the SSH allow-list alongside wt0 +
ssh-from-control. Molecule locks the secure default + the admin rule.
Mesh-hardening 2/3 (ADR-020/021).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>"
Task 2: inventory — enable input-only default-deny + mamba on ubongo (control group)
Files:
- Modify: inventories/production/group_vars/control/vars.yml
- Step 1: Turn the knobs on for the control group
Append to inventories/production/group_vars/control/vars.yml:
# Mesh-hardening 2/3 (2026-06-19, ADR-020/021): apply base's host firewall to ubongo as
# INPUT-only default-deny — harden the inbound surface, leave the forward chain permissive so
# Docker egress + the libvirt-NAT integration harness keep working. sshd is unchanged
# (nftables scopes inbound), so there is no boot-race. Reach ubongo over wt0 (mesh), the
# ssh-from-control self-path (base__firewall_control_addr, group_vars/all = 10.20.10.151), or
# mamba on the LAN. Break-glass: the physical console. (base__firewall_apply defaults true.)
base__firewall_input_only: true
base__firewall_admin_addrs:
  - "10.20.10.50"   # mamba over the LAN (NetBird off). Raw DHCP lease — revisit with an
                    # OPNsense reservation when OPNsense-as-code lands; backstopped by wt0.
  - "10.20.10.17"   # 2nd operator workstation (MAC bc:0f:f3:c8:4a:8a). Raw lease — ditto.
- Step 2: Verify the vars resolve for ubongo
Run: .venv/bin/ansible-inventory -i inventories/production/ --host ubongo 2>/dev/null | grep -E 'firewall_input_only|firewall_admin_addrs|10\.20\.10\.(50|17)'
Expected: shows "base__firewall_input_only": true and, under "base__firewall_admin_addrs", both "10.20.10.50" and "10.20.10.17" (the JSON is pretty-printed, so each list entry matches on its own line).
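Where a grep feels too loose, the same check can be made structurally. The following is a minimal sketch, assuming the JSON that `ansible-inventory --host` prints is passed in as text; the function name `check_control_vars` is hypothetical and not part of the repo.

```python
import json


def check_control_vars(host_vars_json, expected_addrs):
    """Return a list of problems found in `ansible-inventory --host` JSON output."""
    host_vars = json.loads(host_vars_json)
    problems = []
    # The knob must be the boolean true, not merely truthy text
    if host_vars.get("base__firewall_input_only") is not True:
        problems.append("base__firewall_input_only is not true")
    # Every expected workstation address must be in the allow-list
    addrs = host_vars.get("base__firewall_admin_addrs", [])
    for addr in expected_addrs:
        if addr not in addrs:
            problems.append(f"missing admin addr {addr}")
    return problems


# Sample shaped like the Expected output above (not captured from a real run)
sample = (
    '{"base__firewall_input_only": true, '
    '"base__firewall_admin_addrs": ["10.20.10.50", "10.20.10.17"]}'
)
print(check_control_vars(sample, ["10.20.10.50", "10.20.10.17"]))
```

An empty list means both knobs resolved as intended; in practice the text would come from piping the Step 2 command into the script (for example via `sys.stdin.read()`).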
- Step 3: Lint
Run: make lint
Expected: clean pass (check-tags: OK).
- Step 4: Commit
git add inventories/production/group_vars/control/vars.yml
git commit -m "feat(inventory): ubongo gets INPUT-only host firewall + mamba LAN SSH

Enables base__firewall_input_only on the control group (forward chain stays
permissive so Docker egress + the integration-test libvirt NAT survive) and
allows the operator workstations' LAN IPs (mamba 10.20.10.50 + 10.20.10.17;
raw leases, backstopped by wt0). Mesh-hardening 2/3.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>"
Task 3: integration harness — "be ubongo" profile (overlay + profile + profile-aware verify)
Files:
- Create: tests/integration/overrides/ubongo.yml
- Create: tests/integration/profiles/ubongo.json
- Modify: tests/integration/overrides/askari.yml
- Modify: tests/integration/verify.yml
- Step 1: Create the "be ubongo" overlay
Create tests/integration/overrides/ubongo.yml:
---
# Integration-test overlay for the "ubongo" profile (ADR-025). Passed via `-e @`.
# Exercises mesh-hardening 2/3: base's INPUT-only default-deny on the control node — input
# chain default-deny, forward chain left permissive (Docker/libvirt-NAT safe), no sshd
# ListenAddress change (so no boot-race).
integration_profile: ubongo
base__firewall_apply: true
base__firewall_input_only: true # forward chain renders `policy accept`
base__firewall_admin_addrs:
  - "192.168.150.98"   # two representative LAN sources — exercises the
  - "192.168.150.99"   # admin-addr loop with a multi-entry list (like ubongo)
# Never wt0-only; never touch the real mesh from a throwaway VM.
base__ssh_listen_mesh_only: false
base__mesh_enabled: false
# Allow SSH from the libvirt-NAT gateway (where the driver/ansible connect from) so the
# default-deny apply + the reboot don't lock out the harness. By source IP (interface-
# independent). This is the harness's lifeline; the admin-addr above is only exercised.
base__firewall_control_addr: "192.168.150.1"
- Step 2: Create the "be ubongo" VM profile
Create tests/integration/profiles/ubongo.json:
{
  "groups": ["control"],
  "applies": [
    {"playbook": "site.yml", "tags": ["base"]}
  ],
  "extra_vars_files": ["overrides/ubongo.yml"],
  "mem_mib": 2048,
  "vcpus": 2
}
- Step 3: Mark the askari overlay with its profile name
In tests/integration/overrides/askari.yml, after the two header comment lines (before base__firewall_apply: true), add:
integration_profile: askari
- Step 4: Make verify.yml profile-aware (the test)
Replace the entire contents of tests/integration/verify.yml with:
---
# Integration verify (ADR-025). Outcome-based, profile-aware: the active profile is named by
# `integration_profile` (set in each profile's overlay). Each profile asserts its own success
# criteria; an unknown/unset profile fails loudly (never a silent pass).
- name: Verify the rebooted host
  hosts: all
  become: true
  gather_facts: false
  tasks:
    - name: A known integration_profile must be set (no silent pass)
      ansible.builtin.assert:
        that:
          - integration_profile is defined
          - integration_profile in ['askari', 'ubongo']
        fail_msg: "integration_profile must be set in the profile overlay (askari|ubongo)"

    # ── askari profile — Docker host: published-port forwarding survives the reboot ──
    # The load-bearing check probes the VM's published :80 FROM the controller (ubongo) — if
    # base's forward-drop killed DNAT, this times out (the FRICTION 2026-06-17 #1 bug).
    - name: (askari) Gather service facts
      when: integration_profile == 'askari'
      ansible.builtin.service_facts:

    - name: (askari) Docker daemon is active
      when: integration_profile == 'askari'
      ansible.builtin.assert:
        that: "ansible_facts.services['docker.service'].state == 'running'"
        fail_msg: "docker.service is not running"

    - name: (askari) Forward chain permits container traffic (drop-in loaded)
      when: integration_profile == 'askari'
      ansible.builtin.command: nft list chain inet filter forward
      register: _fwd
      changed_when: false

    - name: (askari) Assert container forwarding is allowed (not pure drop)
      when: integration_profile == 'askari'
      ansible.builtin.assert:
        that: "'accept' in _fwd.stdout"
        fail_msg: >-
          forward chain is pure drop — container forwarding will die on reboot
          (FRICTION 2026-06-17 #1). docker_host container-forward drop-in missing.

    - name: (askari) Published port answers from the controller (DNAT + forward alive)
      when: integration_profile == 'askari'
      delegate_to: localhost
      become: false
      ansible.builtin.uri:
        url: "http://{{ ansible_host }}/"
        follow_redirects: none
        status_code: [200, 301, 308, 404, 502, 503]
        timeout: 10
      register: _probe
      retries: 5
      delay: 6
      until: _probe is succeeded
    # ── ubongo profile — control node: INPUT-only default-deny survives the reboot ──
    # SSH reachability across the reboot is proven by the harness itself (it re-SSHes and
    # checks boot_id changed before this verify runs). Here we assert the ruleset shape.
    - name: (ubongo) Read the live nftables ruleset
      when: integration_profile == 'ubongo'
      # -y (--numeric-priority) lists `priority 0` rather than the symbolic `priority filter`,
      # so the literal matches below line up with the live listing.
      ansible.builtin.command: nft -y list ruleset
      register: _nft
      changed_when: false

    - name: (ubongo) INPUT default-deny, forward permissive, admin-addr allow
      when: integration_profile == 'ubongo'
      ansible.builtin.assert:
        that:
          - "'hook input priority 0; policy drop;' in _nft.stdout"
          - "'hook forward priority 0; policy accept;' in _nft.stdout"
          - "'ip saddr 192.168.150.98 tcp dport 22 accept' in _nft.stdout"
          - "'ip saddr 192.168.150.99 tcp dport 22 accept' in _nft.stdout"
        fail_msg: >-
          ubongo profile: expected input policy drop, forward policy accept (input-only),
          and both admin-addr (192.168.150.98/99) SSH allows in the live ruleset.
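The gating pattern in this verify (a guard that rejects unknown profiles, then per-profile blocks) can be sketched in a few lines of Python. This is only an illustration of the "no silent pass" idea; `PROFILE_CHECKS` and `checks_for` are hypothetical names, and the check labels are paraphrases of the task names above.

```python
# Hypothetical mirror of verify.yml's structure: one entry per known profile
PROFILE_CHECKS = {
    "askari": [
        "docker.service running",
        "forward chain contains accept",
        "published port answers from the controller",
    ],
    "ubongo": [
        "input policy drop",
        "forward policy accept",
        "admin-addr SSH allows present",
    ],
}


def checks_for(profile):
    """Return the checks for a profile; raise on an unknown or unset one."""
    if profile not in PROFILE_CHECKS:
        known = "|".join(sorted(PROFILE_CHECKS))
        raise ValueError(
            f"integration_profile must be set in the profile overlay ({known})"
        )
    return PROFILE_CHECKS[profile]


print(checks_for("ubongo"))
```

The point of the guard is that a missing marker raises instead of selecting an empty set of checks, which is what would otherwise let every `when:`-gated task skip and the run pass vacuously.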
- Step 5: Validate the JSON + lint
Run: .venv/bin/python -m json.tool tests/integration/profiles/ubongo.json >/dev/null && echo OK then make lint
Expected: OK, then a clean lint pass (check-tags: OK).
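`json.tool` only proves the profile parses. A slightly stronger check is sketched below in Python, assuming the keys shown in the ubongo.json above are the ones that matter; the harness's real schema is not shown in this plan, so `REQUIRED_KEYS` and `profile_problems` are hypothetical.

```python
import json

# Assumption: exactly the keys visible in the profile from Step 2
REQUIRED_KEYS = {"groups", "applies", "extra_vars_files", "mem_mib", "vcpus"}


def profile_problems(profile_text):
    """Return a list of structural problems in a profile JSON document."""
    profile = json.loads(profile_text)
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(profile))]
    # Each apply entry needs at least a playbook to run
    for entry in profile.get("applies", []):
        if "playbook" not in entry:
            problems.append("applies entry without a playbook")
    return problems


ubongo_profile = """
{
  "groups": ["control"],
  "applies": [{"playbook": "site.yml", "tags": ["base"]}],
  "extra_vars_files": ["overrides/ubongo.yml"],
  "mem_mib": 2048,
  "vcpus": 2
}
"""
print(profile_problems(ubongo_profile))
```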
- Step 6: Commit
git add tests/integration/overrides/ubongo.yml tests/integration/profiles/ubongo.json \
  tests/integration/overrides/askari.yml tests/integration/verify.yml
git commit -m "test(integration): add the 'be ubongo' profile (input-only default-deny)

A control-group VM that applies base with INPUT-only default-deny (forward
policy accept; admin-addr SSH allow). verify.yml is now profile-aware via an
integration_profile marker — the askari Docker/DNAT block is gated, and a ubongo
block asserts input drop + forward accept + the admin-addr rule. Enables
\`make test-integration HOST=ubongo\`. Mesh-hardening 2/3 (ADR-025).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>"
Task 4: Validate on the integration harness (make test-integration HOST=ubongo) — the GREEN gate
Runs a throwaway UEFI VM on ubongo: boots it, applies the base role with the ubongo overlay (INPUT-only default-deny), reboots it, and asserts the ruleset + SSH-returns. This proves the change survives a reboot before the real control node is ever touched (spec §cutover step 1; FRICTION signal-6). No code change / no commit — a validation gate.
- Step 1: Ensure the vault is unlocked
The run loads inventories/production/group_vars/all/vault.yml (symlinked into the run dir), which is decrypted at playbook load.
Run: rbw unlocked || rbw unlock
Expected: exits 0 (unlocked). If it prompts, the operator unlocks.
- Step 2: Run the integration cycle
Run: make test-integration HOST=ubongo
Expected (the cycle: up → apply → reboot → assert): the VM gets a 192.168.150.x lease; site.yml --tags base applies cleanly; … rebooted (boot_id changed), SSH back at 192.168.150.x; then VERIFY PASSED for boma-it-ubongo-…. The VM is destroyed on success.
- Step 3: On failure, read the diagnostics
If it prints VERIFY FAILED, diagnostics are in ~/integration-runs/boma-it-ubongo-<id>/ (nft.txt, console.log, journal.txt). The likely suspects: the admin-addr/forward assertion (Task 1/3 wiring) or SSH not returning post-reboot (the base__firewall_control_addr: 192.168.150.1 lifeline in the overlay). Fix the implicated task, re-commit, and re-run Step 2. Re-run make test-integration-clean first if a VM was left defined.
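When the failure is the ruleset assertion, it helps to see which fragment is missing rather than re-reading the whole dump. A minimal Python sketch follows, assuming nft.txt holds `nft list ruleset` output (the plan names the file but not its contents) and that nft may print priority 0 symbolically as `filter`; `EXPECTED` and `missing_fragments` are hypothetical names.

```python
import re

# Patterns mirror the (ubongo) assertions in verify.yml.
# Assumption: the listing may show priority 0 either as "0" or as "filter".
EXPECTED = {
    "input policy drop": r"hook input priority (0|filter); policy drop;",
    "forward policy accept": r"hook forward priority (0|filter); policy accept;",
    "admin-addr .98": r"ip saddr 192\.168\.150\.98 tcp dport 22 accept",
    "admin-addr .99": r"ip saddr 192\.168\.150\.99 tcp dport 22 accept",
}


def missing_fragments(ruleset_text):
    """Return the names of expected fragments absent from a ruleset dump."""
    return [
        name
        for name, pattern in EXPECTED.items()
        if not re.search(pattern, ruleset_text)
    ]


# Hand-written sample with the forward chain still at policy drop
sample = """
table inet filter {
    chain input {
        type filter hook input priority filter; policy drop;
        ip saddr 192.168.150.98 tcp dport 22 accept
        ip saddr 192.168.150.99 tcp dport 22 accept
    }
    chain forward {
        type filter hook forward priority filter; policy drop;
    }
}
"""
print(missing_fragments(sample))
```

On the sample, only the forward-policy entry is reported, which points at the Task 1 template conditional or the overlay's `base__firewall_input_only` rather than at the admin-addr loop.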
- Step 4: Record the result
Capture the VERIFY PASSED line in the task notes (this is the gate Task 5 step 1 depends on). No commit.
Task 5: Live staged cutover (operator-supervised — NOT a subagent task)
Touches the real ubongo (the control node Ansible runs from) and reboots it, which makes it lockout-risky. Run it interactively with the operator, in order, verifying each step before the next. The firewall auto-rollback timer (base__firewall_rollback_timeout, 45 s) plus wait_for_connection over the live path is the safety net; the on-prem physical console is the permanent break-glass. Do NOT hand this to an unattended agent.
- Step 1: Pre-checks (gate: Task 4 GREEN)
  - rbw unlocked || rbw unlock.
  - SSH to ubongo over wt0 from a road-warrior succeeds.
  - SSH to ubongo from mamba on the LAN (10.20.10.50) succeeds.
  - .venv/bin/ansible ubongo -i inventories/production/ -m ping → SUCCESS (over 10.20.10.151).
  - The physical console is reachable.
  If any path fails, STOP.
Step 2: Dry-run the firewall apply
Run: make check PLAYBOOK=site LIMIT=ubongo TAGS=firewall
Expected: the nftables diff shows policy drop on input, iifname "wt0" … accept, ip saddr 10.20.10.151 … accept, ip saddr 10.20.10.50 … accept, ip saddr 10.20.10.17 … accept, and the forward chain as policy accept. No errors.
- Step 3: Apply the host firewall (auto-rollback armed)
Run: make deploy PLAYBOOK=site LIMIT=ubongo TAGS=firewall
Expected: the firewall concern snapshots /etc/nftables.rollback, arms the 45 s systemd-run revert, applies the ruleset, reset_connection → wait_for_connection over 10.20.10.151 succeeds, then cancels the timer. If connectivity is lost, the timer reverts the ruleset within 45 s and the console is the fallback.
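The order of operations in this step is what makes it safe: the revert is armed before the new ruleset lands, and it is cancelled only after the connection has been re-proven. The Python sketch below models just that ordering with injected callables. It is not the role's implementation; `guarded_apply` and the step labels are hypothetical.

```python
def guarded_apply(apply_ruleset, connection_ok, timeout_s=45):
    """Model the snapshot / arm / apply / probe cycle; return the ordered steps."""
    # The revert is armed BEFORE anything changes, so a lockout cannot strand the host
    steps = ["snapshot current ruleset", f"arm revert timer ({timeout_s} s)"]
    apply_ruleset()
    steps.append("apply new ruleset")
    if connection_ok():
        # Connection re-proven over the live path: safe to disarm
        steps.append("cancel revert timer")
    else:
        # Nobody can cancel, so the timer restores the snapshot on its own
        steps.append("timer fires: revert to snapshot")
    return steps


print(guarded_apply(lambda: None, lambda: True))
print(guarded_apply(lambda: None, lambda: False))
```

The second call shows why the console stays the fallback only for the window before the timer fires: a failed probe needs no operator action for the ruleset to come back.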
- Step 4: Verify every path + forwarding still works
# from a road-warrior over wt0, and from mamba on the LAN:
ssh sjat@100.99.146.14 true && echo "wt0 OK"
ssh sjat@10.20.10.151 true && echo "mamba-LAN OK" # run from mamba (10.20.10.50)
# Ansible self-path:
.venv/bin/ansible ubongo -i inventories/production/ -m ping
# a LAN host NOT in base__firewall_admin_addrs must now be refused/time out on :22
# (10.20.10.17 is allow-listed in Task 2, so probe from a different address)
# Docker egress (forward chain still permissive):
docker run --rm busybox wget -qO- https://cloudflare.com/cdn-cgi/trace | head -1
# libvirt-NAT forwarding intact — a fresh integration VM still reaches apt:
make test-integration HOST=ubongo # expect VERIFY PASSED (proves the NAT path survived)
Expected: wt0 OK, mamba-LAN OK, Ansible SUCCESS, the disallowed host refused, the Docker egress line returns, and the integration cycle passes.
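The disallowed-host check above is described only in a comment. One way to run it from a machine outside the allow-list is a plain TCP probe, sketched here in Python with the standard library; `ssh_port_reachable` is a hypothetical helper, and it tests only whether the TCP handshake completes, not whether SSH authentication would succeed.

```python
import socket


def ssh_port_reachable(host, port=22, timeout=3.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refusal, timeout (a dropped SYN), and unreachable-network errors
        return False
```

Run from a non-allow-listed LAN host, `ssh_port_reachable("10.20.10.151")` is expected to return `False` once the input chain is default-deny; a dropped packet shows up as a timeout rather than an immediate refusal, so the call can take the full timeout to return.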
- Step 5: Reboot resilience — while the console is present (FRICTION signal-6)
With the operator at the physical console, reboot ubongo (sudo systemctl reboot). After it returns, confirm SSH comes back on all paths unaided:
ssh sjat@100.99.146.14 true && echo "wt0 OK after reboot"
.venv/bin/ansible ubongo -i inventories/production/ -m ping
Expected: SSH returns with no manual intervention (no ListenAddress, so nothing to race). Only now is the cutover complete.
- Step 6: Update STATUS + ROADMAP
  - In STATUS.md: in the roles/base/ row of "Scaffolded but empty", change the firewall note: the firewall concern is now applied to ubongo as INPUT-only default-deny (it is no longer "not yet applied to any host"); note the base__firewall_input_only knob and that the forward default-deny still awaits the docker_host drop-in for real service hosts. In the ubongo control-node row, mark the "Pending" item for default-deny as done.
  - In docs/ROADMAP.md: mark mesh-hardening sub-project 2 (ubongo default-deny) done; the remaining follow-ons are sub-project 1 (askari SSH→wt0 redesign) and sub-project 3 (NetBird ACL). Update the "Next step" section accordingly.
git add STATUS.md docs/ROADMAP.md
git commit -m "docs: ubongo INPUT-only default-deny applied (mesh-hardening 2/3 done)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>"
- Step 7: Push
Run: git push origin main
Self-review (against the spec)
- § Design: INPUT-only default-deny → Task 1 (forward-policy knob) + Task 2 (enabled on ubongo). ✓
- § Design: admin-addrs (operator workstations on LAN) → Task 1 (base__firewall_admin_addrs + template loop) + Task 2 (10.20.10.50 mamba, 10.20.10.17). ✓
- § Design: no sshd ListenAddress change → nothing touches ssh.yml / sshd_hardening.conf.j2; only nftables. ✓ (verified: Tasks 1–3 file lists exclude them).
- § allow-list (lo, established, wt0, ssh-from-control, admin-addr, icmp; forward accept) → template already renders lo/established/wt0/control/icmp; Task 1 adds admin-addr + forward-accept. ✓
- § Why-safe (incident signals 1/2/3/6) → signal 1 (forward accept, Task 1); signal 2 (no ListenAddress); signal 3 (ubongo keeps LAN + console); signal 6 (Task 4 harness reboot + Task 5 step 5 reboot-while-console). ✓
- § New & changed code (defaults, template, molecule, group_vars/control, integration profile) → Tasks 1–3. ✓
- § admin raw-leases + revisit → Task 2 comments record both leases + the OPNsense-reservation revisit trigger; backstop (wt0) noted; flagged in FRICTION.md. ✓
- § Testing (Molecule render asserts; make test-integration HOST=ubongo; live checks) → Task 1 (Molecule), Task 4 (harness), Task 5 step 4 (live). ✓ Coverage split (default in Molecule, input_only on the VM) noted in Task 1.
- § Staged cutover (signal-6 order) → Task 5 steps 1–7; reboot-recovery (step 5) precedes nothing that retires a break-glass (the console is permanent). ✓
- § Risks/rollback → auto-rollback (Task 5 step 3), redundant paths + physical console, raw-lease backstop. ✓
- Type/name consistency: base__firewall_input_only (bool) and base__firewall_admin_addrs (list) are spelled identically in defaults, template, converge, group_vars, and the overlay. integration_profile is spelled identically in both overlays and the three gates in verify.yml. ✓
- Placeholder scan: no TBD/TODO; every code/command step shows the actual content. ✓