boma/docs/superpowers/specs/2026-06-06-host-nftables-firewall-design.md
sjat d7fbaca554 docs(spec): host nftables firewall design (ADR-020 build #1)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 18:40:50 +02:00

Design — Host nftables firewall (the firewall concern of base)

  • Date: 2026-06-06
  • Status: Approved design — pending implementation plan
  • Implements: ADR-020 deferred build #1 (host nftables in base)
  • Scope: The firewall-tagged concern of the base role only. Other base concerns (SSH hardening, fail2ban, auditd, packages, users) are separate future efforts. Docker netfilter is deferred to the docker_host role.

Problem

ADR-020 settled the firewall strategy: a per-host nftables layer doing default-deny inbound + east-west allowlisting + permissive egress, rendered from a shared group_vars service catalog. Nothing is built yet — roles/base/ is empty. This spec designs the concrete host firewall: the catalog schema, how rules are resolved and rendered, how they are applied without locking out the host, and how it is tested.

Two hard constraints shape the design:

  1. Molecule runs in a privileged Docker container sharing the dev host (ubongo) kernel netfilter — applying real nftables rules there could mutate the live host. So Level-1 testing renders and syntax-checks but does not apply.
  2. Lockout risk — a bad ruleset can brick SSH/Ansible. On-cluster hosts have the Proxmox console as break-glass; offsite askari (Hetzner) has no equally cheap break-glass path.

Scope decisions (settled in brainstorming)

  • Host firewall only, coherent on any host (even one with no services). Docker iptables:false + container forward/NAT/masquerade are deferred to docker_host, which contributes rules via an extension hook (below).
  • Placement lives in the catalog (host: | group: | hosts:), giving one source of truth that also resolves symbolic sources. Proxmox HA/migration moves a VM between physical nodes but the VM keeps its static srv IP and inventory identity, so node-level failover is invisible to the firewall. A planned service relocation is a one-line catalog edit + --tags firewall re-deploy (which re-renders opened ports and every source resolution consistently). Within-group HA is handled by placing a service on a group/hosts list — the allowlist then already covers every member.
  • Level-1 testing = render + nft -c syntax check, no apply. Enforcement is verified at Level 2 on staging VMs.
  • Auto-rollback safety net on apply (critical for offsite askari).

Role layout

Scaffold with make new-role base, then implement the firewall concern:

roles/base/
  tasks/main.yml                     # include_tasks firewall.yml (tags: [firewall]); grows later
  tasks/firewall.yml                 # install nftables, render, validate, safe-apply
  filter_plugins/firewall_rules.py   # pure catalog→resolved-rules resolver (pytest-unit-tested)
  templates/nftables.conf.j2
  defaults/main.yml                  # base__firewall_* behaviour knobs
  handlers/main.yml
  molecule/default/                  # fixture catalog + inventory; converge + verify
  README.md, meta/main.yml

base is infrastructure, not a service role, so the service-role SECURITY.md / VERIFY.md conventions (ADR-004) do not apply. The firewall role import in a playbook carries the base role-name tag (enforced by check-tags.py, ADR-019); the firewall tasks within carry the firewall concern tag.

Data model — shared catalog + zones

Two new global inventory facts (read by base now and OPNsense later, so plain names, not role-namespaced) in inventories/<env>/group_vars/all/firewall.yml:

# Zone → subnet (from ADR-007)
firewall_zones:
  lan:  10.30.0.0/24
  srv:  10.20.0.0/24
  mgmt: 10.10.0.0/24
  iot:  10.40.0.0/24
  guest: 10.50.0.0/24

# Service catalog: name → placement + ingress
firewall_catalog:
  reverse_proxy:
    host: docker01                  # placement: host | group | hosts:[...]
    ingress:
      - { from: lan, port: 443, proto: tcp }
  photoprism:
    host: docker01
    ingress:
      - { from: reverse_proxy, port: 2342, proto: tcp }
  • Placement is exactly one of host: <name>, group: <group>, or hosts: [<name>, …].
  • from resolves three ways, checked in this order: (1) a key in firewall_zones → that subnet; (2) a key in firewall_catalog → that service's placement → host IP(s) as /32; (3) an inventory group or host name → its IP(s) as /32. An unresolvable from is a hard error (fail fast, never silently open/skip).
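
The resolution order above can be sketched in Python. This is a minimal illustration, not the role's code: the helper names (hosts_in_group, placement_hosts, resolve_source) are hypothetical, group membership is assumed to come from each host's group_names entry in hostvars, and trying a group name before a host name in step (3) is an assumption the text does not settle.

```python
PLACEMENT_KEYS = ("host", "group", "hosts")


def hosts_in_group(group, hostvars):
    # Assumption: each hostvars entry exposes Ansible's `group_names` list.
    return sorted(h for h, v in hostvars.items() if group in v.get("group_names", []))


def placement_hosts(entry, hostvars):
    """Return the hosts an entry is placed on; exactly one placement key is allowed."""
    keys = [k for k in PLACEMENT_KEYS if k in entry]
    if len(keys) != 1:
        raise ValueError(f"placement must be exactly one of {PLACEMENT_KEYS}, got {keys}")
    if keys[0] == "host":
        return [entry["host"]]
    if keys[0] == "hosts":
        return list(entry["hosts"])
    return hosts_in_group(entry["group"], hostvars)


def resolve_source(name, zones, catalog, hostvars):
    """Resolve a `from` value to a sorted list of CIDRs, or raise ValueError."""
    if name in zones:                      # (1) zone -> subnet
        return [zones[name]]
    if name in catalog:                    # (2) service -> placement -> host IPs
        hosts = placement_hosts(catalog[name], hostvars)
    else:                                  # (3) inventory group, then host name
        hosts = hosts_in_group(name, hostvars) or ([name] if name in hostvars else [])
    if not hosts:
        raise ValueError(f"unresolvable source {name!r}")
    cidrs = set()
    for host in hosts:
        ip = hostvars.get(host, {}).get("ansible_host")
        if ip is None:
            raise ValueError(f"no ansible_host for {host!r} (source {name!r})")
        cidrs.add(f"{ip}/32")
    return sorted(cidrs)
```

Raising instead of returning an empty list keeps the fail-fast behaviour: a typo in from stops the play rather than rendering a rule with no sources.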

Role behaviour knobs stay role-namespaced in roles/base/defaults/main.yml:

| Default | Value | Purpose |
|---|---|---|
| base__firewall_mgmt_interface | wt0 | interface SSH is accepted on (NetBird overlay, ADR-016) |
| base__firewall_ssh_port | 22 | SSH port allowed on the mgmt interface |
| base__firewall_rollback_timeout | 45 | seconds before auto-revert fires |
| base__firewall_dropin_dir | /etc/nftables.d | extension dir included by the ruleset |

Resolution & rendering

The resolver is a pure Python filter plugin, roles/base/filter_plugins/firewall_rules.py, exposing resolve_firewall_rules(catalog, zones, inventory_hostname, hostvars). It:

  1. selects catalog entries placed on inventory_hostname (matching host, membership in group, or presence in hosts);
  2. for each entry's ingress rules, resolves from to a list of source CIDRs (zone / service-placement / group-or-host, per the order above);
  3. returns a deterministic, de-duplicated, sorted list of {proto, port, sources: [cidr, …]}.

Chosen over inline Jinja (unreadable, untestable) and a set_fact loop (awkward to unit-test) — a filter plugin matches the house style of check-tags.py / capacity-scan.py and is pytest-unit-testable in isolation. Host→IP resolution reads hostvars[<host>].ansible_host (the static srv IP the Terraform-generated inventory provides).
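
As a rough sketch of steps 1 and 3, the selection and merge logic might look like the following. The function names placed_on and merge_rules are hypothetical, source resolution is passed in as a callable so the sketch stays independent of how from is resolved, and merging rules that share a proto/port into one entry with the union of their sources is an assumed reading of "de-duplicated".

```python
def placed_on(entry, hostname, group_names):
    """True if a catalog entry is placed on this host (host / group / hosts)."""
    if "host" in entry:
        return entry["host"] == hostname
    if "group" in entry:
        return entry["group"] in group_names
    return hostname in entry.get("hosts", [])


def merge_rules(catalog, hostname, group_names, resolve_source):
    """Return a sorted, de-duplicated list of {proto, port, sources} for one host.

    `resolve_source` is a caller-supplied function mapping a `from` value to a
    list of CIDRs; it is expected to raise on unresolvable names.
    """
    merged = {}
    for name in sorted(catalog):
        entry = catalog[name]
        if not placed_on(entry, hostname, group_names):
            continue
        for rule in entry.get("ingress", []):
            key = (rule["proto"], int(rule["port"]))
            # A set per (proto, port) removes duplicate sources across services.
            merged.setdefault(key, set()).update(resolve_source(rule["from"]))
    return [
        {"proto": proto, "port": port, "sources": sorted(sources)}
        for (proto, port), sources in sorted(merged.items())
    ]
```

Because both the keys and the sources are sorted, the same catalog always yields the same list, which is what keeps rendered diffs stable.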

tasks/firewall.yml builds base__firewall_resolved from the filter; the template renders that flat list:

#!/usr/sbin/nft -f
flush ruleset
table inet filter {
  chain input {
    type filter hook input priority 0; policy drop;
    iif "lo" accept
    ct state established,related accept
    ct state invalid drop
    iif "{{ base__firewall_mgmt_interface }}" tcp dport {{ base__firewall_ssh_port }} accept
    ip protocol icmp accept
    ip6 nexthdr ipv6-icmp accept
{% for r in base__firewall_resolved %}
    ip saddr { {{ r.sources | join(', ') }} } {{ r.proto }} dport {{ r.port }} accept
{% endfor %}
  }
  chain forward { type filter hook forward priority 0; policy drop; }
  chain output  { type filter hook output  priority 0; policy accept; }
}
include "{{ base__firewall_dropin_dir }}/*.nft"

A host with no catalog entries still gets a valid default-deny + management-plane ruleset. The include is the docker_host extension hook (forward/NAT drop-ins). Sorted resolved rules → stable diffs and deterministic tests.
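
To make the loop in the template concrete, the Python below mirrors what it would emit for a resolved list. It is only an illustration of the expected output shape (render_accept_lines is a hypothetical name); the role itself renders through Jinja.

```python
def render_accept_lines(resolved, indent="    "):
    """Mirror the template loop: one accept line per resolved rule."""
    lines = []
    for rule in resolved:
        sources = ", ".join(rule["sources"])
        lines.append(
            f"{indent}ip saddr {{ {sources} }} {rule['proto']} dport {rule['port']} accept"
        )
    return lines


example = [{"proto": "tcp", "port": 443, "sources": ["10.30.0.0/24"]}]
print("\n".join(render_accept_lines(example)))
# prints "    ip saddr { 10.30.0.0/24 } tcp dport 443 accept"
```

An empty resolved list yields no lines, which corresponds to the default-deny plus management-plane ruleset described above.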

Safe apply (lockout protection)

tasks/firewall.yml renders /etc/nftables.conf; when it changes, a linear safe-apply sequence runs (deliberately in tasks, not a handler, so the confirm/cancel step is controllable — a small, justified deviation from the handler idiom, noted in the role README):

  1. Validate: nft -c -f /etc/nftables.conf; fail the play if invalid, before touching the live ruleset.
  2. Snapshot: nft list ruleset > /etc/nftables.rollback (empty/flush on first run).
  3. Arm revert: systemd-run --on-active={{ base__firewall_rollback_timeout }} --unit=nft-rollback nft -f /etc/nftables.rollback (transient timer, no at dependency).
  4. Apply: nft -f /etc/nftables.conf.
  5. Confirm + disarm: the next Ansible task running proves the connection survived → systemctl stop nft-rollback.timer (systemd-run creates the transient timer as nft-rollback.timer alongside nft-rollback.service, so the timer unit is the one to stop). If the apply bricked connectivity, the play cannot continue, the timer fires, and the host self-heals (the offsite-askari safeguard).
  6. Persist: enable nftables.service so /etc/nftables.conf loads on boot.

established/related (rendered in the ruleset) means the in-flight Ansible session survives the swap; atomic nft -f avoids partial states.
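
The ordering of the six steps can be sketched as a small Python driver with an injected command runner. This is an illustration of the sequence only, not the role's tasks. Two assumptions go beyond the text: the snapshot is written with a leading flush ruleset line (nft list ruleset output does not contain one, so loading it as-is would add to the live ruleset instead of replacing it), and the disarm step stops the nft-rollback.timer unit. In the role, step 5 is reached only if the next Ansible task can still connect; a local script like this cannot prove that.

```python
import subprocess


def subprocess_runner(argv):
    """Run a command, raise on non-zero exit, return stdout."""
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout


def safe_apply(run, write_file, conf="/etc/nftables.conf",
               rollback="/etc/nftables.rollback", timeout=45):
    """Run the safe-apply sequence; any failing step raises and stops the sequence.

    `run(argv)` executes a command and returns stdout; `write_file(path, text)`
    writes a file. Both are injected so the order can be tested without nft.
    """
    # 1. Validate before touching the live ruleset.
    run(["nft", "-c", "-f", conf])
    # 2. Snapshot the live ruleset (assumption: prefix with a flush).
    write_file(rollback, "flush ruleset\n" + run(["nft", "list", "ruleset"]))
    # 3. Arm the auto-revert as a transient systemd timer.
    run(["systemd-run", f"--on-active={timeout}", "--unit=nft-rollback",
         "nft", "-f", rollback])
    # 4. Apply the new ruleset atomically.
    run(["nft", "-f", conf])
    # 5. Still running, so disarm the revert timer.
    run(["systemctl", "stop", "nft-rollback.timer"])
    # 6. Persist across reboots.
    run(["systemctl", "enable", "nftables.service"])
```

If step 4 cuts the session in the real role, step 5 never runs and the timer loads the snapshot after the timeout.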

NetBird dependency: locking SSH to wt0-only assumes NetBird (ADR-016) is built. Until then, base__firewall_mgmt_interface (and, if needed, an additional management source) is set to a reachable path so the role is deployable independently. This is a config knob, not a code dependency.

Testing (ADR-008)

  • Level 1 / pytest — unit-test firewall_rules.py against fixture catalogs: zone resolution, service→host-IP resolution, group/hosts multi-host placement, a host with no services, source de-dup/sort, and an unresolvable from raising. Mirrors tests/test_check_tags.py (import the module, assert on return values).
  • Level 1 / Molecule — fixture firewall_catalog + fixture inventory (host_vars/ group_vars) in the scenario; converge renders /etc/nftables.conf; verify asserts (a) expected accept lines are present for the fixture and (b) nft -c -f /etc/nftables.conf validates syntax. No apply (kernel safety).
  • Level 2 / staging — real apply on staging VMs verifies enforcement and the safe-apply + auto-rollback path (steps 2-5), which Level 1 cannot safely cover.
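
The Molecule verify assertion (a) amounts to a text check on the rendered file. A minimal Python sketch of that check follows; missing_lines is a hypothetical helper, and the real scenario may well express the same thing as Ansible assert tasks instead.

```python
def missing_lines(rendered, expected):
    """Return the expected ruleset lines that are absent from the rendered text.

    Comparison ignores leading/trailing whitespace so indentation changes in
    the template do not cause false failures.
    """
    present = {line.strip() for line in rendered.splitlines()}
    return [line for line in expected if line.strip() not in present]


rendered = """\
table inet filter {
  chain input {
    type filter hook input priority 0; policy drop;
    ip saddr { 10.30.0.0/24 } tcp dport 443 accept
  }
}
"""
expected = [
    "type filter hook input priority 0; policy drop;",
    "ip saddr { 10.30.0.0/24 } tcp dport 443 accept",
]
assert missing_lines(rendered, expected) == []
```

Returning the missing lines rather than a boolean makes a failing verify run say which rule did not render.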

The Molecule base image is not guaranteed to ship nft. The role installs the nftables package as its first firewall task, so by the time verify runs the nft -c syntax check, nft is present (installed during converge).

Open dependencies / notes

  • NetBird/ADR-016 unbuilt — see the mgmt-interface knob above; full wt0-only lockdown lands when NetBird does.
  • The safe-apply orchestration (steps 2-5) has no Level-1 coverage by design; it is integration-tested at Level 2. Called out so the gap is explicit.

Scope summary

Built here: firewall_catalog/firewall_zones schema; firewall_rules.py resolver + pytest; nftables.conf.j2 (default-deny input, mgmt plane, permissive egress, drop-in include hook); safe-apply-with-rollback tasks; Molecule render/syntax scenario; base role scaffolding (README, meta, defaults, handlers).

Deferred: Docker iptables:false + container forward/NAT (→ docker_host spec, via the drop-in hook); OPNsense rendering from the same catalog (→ OPNsense-as-code spec); drift-detection check (ADR-020); all other base concerns.

Related ADRs: ADR-020 (firewall strategy), ADR-002 (security baseline), ADR-004 (Docker model — iptables:false, one service = one role), ADR-007 (VLANs/subnets), ADR-008 (testing levels), ADR-016 (NetBird mesh — SSH on wt0), ADR-019 (firewall tag).