boma/docs/superpowers/specs/2026-06-06-host-nftables-firewall-design.md
sjat d7fbaca554 docs(spec): host nftables firewall design (ADR-020 build #1)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 18:40:50 +02:00

# Design — Host nftables firewall (the `firewall` concern of `base`)
- **Date:** 2026-06-06
- **Status:** Approved design — pending implementation plan
- **Implements:** ADR-020 deferred build #1 (host nftables in `base`)
- **Scope:** The **`firewall`-tagged concern of the `base` role only**. Other `base`
concerns (SSH hardening, fail2ban, auditd, packages, users) are separate future efforts.
Docker netfilter is deferred to the `docker_host` role.
---
## Problem
ADR-020 settled the firewall *strategy*: a per-host nftables layer doing default-deny
inbound + east-west allowlisting + permissive egress, rendered from a shared
`group_vars` service catalog. Nothing is built yet — `roles/base/` is empty. This spec
designs the concrete host firewall: the catalog schema, how rules are resolved and
rendered, how they are applied without locking out the host, and how it is tested.
Two hard constraints shape the design:
1. **Molecule runs in a privileged Docker container that shares the kernel netfilter of
the dev host (`ubongo`)**, so applying real nftables rules there could mutate the live
host. Level-1 testing therefore renders and syntax-checks but does **not** apply.
2. **Lockout risk**: a bad ruleset can brick SSH/Ansible. On-cluster hosts have the
Proxmox console as break-glass; offsite `askari` (Hetzner) has no comparably cheap
console fallback.
## Scope decisions (settled in brainstorming)
- **Host firewall only**, coherent on any host (even one with no services). Docker
`iptables:false` + container forward/NAT/masquerade are **deferred to `docker_host`**,
which contributes rules via an extension hook (below).
- **Placement lives in the catalog** (`host:` | `group:` | `hosts:`), giving one source
of truth that also resolves symbolic sources. Proxmox HA/migration moves a *VM*
between physical nodes but the VM keeps its static `srv` IP and inventory identity, so
node-level failover is invisible to the firewall. A planned service relocation is a
one-line catalog edit + `--tags firewall` re-deploy (which re-renders opened ports
*and* every source resolution consistently). Within-group HA is handled by placing a
service on a `group`/`hosts` list — the allowlist then already covers every member.
- **Level-1 testing = render + `nft -c` syntax check, no apply.** Enforcement is
verified at Level 2 on staging VMs.
- **Auto-rollback safety net** on apply (critical for offsite `askari`).
## Role layout
Scaffold with `make new-role base`, then implement the firewall concern:
```
roles/base/
  tasks/main.yml                    # include_tasks firewall.yml (tags: [firewall]); grows later
  tasks/firewall.yml                # install nftables, render, validate, safe-apply
  filter_plugins/firewall_rules.py  # pure catalog→resolved-rules resolver (pytest-unit-tested)
  templates/nftables.conf.j2
  defaults/main.yml                 # base__firewall_* behaviour knobs
  handlers/main.yml
  molecule/default/                 # fixture catalog + inventory; converge + verify
  README.md, meta/main.yml
```
`base` is infrastructure, not a *service* role, so the service-role `SECURITY.md` /
`VERIFY.md` conventions (ADR-004) do not apply. The `base` role import in a playbook
carries the `base` role-name tag (enforced by `check-tags.py`, ADR-019); the firewall
tasks within carry the `firewall` concern tag.
## Data model — shared catalog + zones
Two new **global inventory facts** (read by `base` now and OPNsense later, so plain
names, not role-namespaced) in `inventories/<env>/group_vars/all/firewall.yml`:
```yaml
# Zone → subnet (from ADR-007)
firewall_zones:
  lan: 10.30.0.0/24
  srv: 10.20.0.0/24
  mgmt: 10.10.0.0/24
  iot: 10.40.0.0/24
  guest: 10.50.0.0/24

# Service catalog: name → placement + ingress
firewall_catalog:
  reverse_proxy:
    host: docker01            # placement: host | group | hosts:[...]
    ingress:
      - { from: lan, port: 443, proto: tcp }
  photoprism:
    host: docker01
    ingress:
      - { from: reverse_proxy, port: 2342, proto: tcp }
```
- **Placement** is exactly one of `host: <name>`, `group: <group>`, or `hosts: [<name>, …]`.
- **`from`** resolves three ways, checked in this order: (1) a key in `firewall_zones`
→ that subnet; (2) a key in `firewall_catalog` → that service's placement → host
IP(s) as `/32`; (3) an inventory group or host name → its IP(s) as `/32`. An
unresolvable `from` is a hard error (fail fast, never silently open/skip).
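
The two catalog rules above can be sketched in a few lines of Python. This is an illustrative sketch only: the helper names `placement_of` and `classify_source`, and the `inventory_names` argument (the set of known inventory group and host names), are hypothetical and not part of the design.

```python
PLACEMENT_KEYS = ("host", "group", "hosts")


def placement_of(name, entry):
    """Return (kind, value) for a catalog entry.

    Raises ValueError unless exactly one placement key is present.
    """
    present = [key for key in PLACEMENT_KEYS if key in entry]
    if len(present) != 1:
        raise ValueError(
            f"service {name!r}: expected exactly one of {PLACEMENT_KEYS}, got {present}"
        )
    kind = present[0]
    return kind, entry[kind]


def classify_source(src, zones, catalog, inventory_names):
    """Decide how a `from` value resolves, in the documented order."""
    if src in zones:            # (1) zone name wins first
        return "zone"
    if src in catalog:          # (2) then a catalog service
        return "service"
    if src in inventory_names:  # (3) then an inventory group or host
        return "inventory"
    # Fail fast: never silently open or skip a rule.
    raise ValueError(f"unresolvable from: {src!r}")
```

Because zones are checked first, a name that exists both as a zone and as a service resolves to the zone's subnet.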
Role **behaviour knobs** stay role-namespaced in `roles/base/defaults/main.yml`:
| Default | Value | Purpose |
|---|---|---|
| `base__firewall_mgmt_interface` | `wt0` | interface SSH is accepted on (NetBird overlay, ADR-016) |
| `base__firewall_ssh_port` | `22` | SSH port allowed on the mgmt interface |
| `base__firewall_rollback_timeout` | `45` | seconds before auto-revert fires |
| `base__firewall_dropin_dir` | `/etc/nftables.d` | extension dir included by the ruleset |
## Resolution & rendering
The resolver is a **pure Python filter plugin**, `roles/base/filter_plugins/firewall_rules.py`,
exposing `resolve_firewall_rules(catalog, zones, inventory_hostname, hostvars)`. It:
1. selects catalog entries placed on `inventory_hostname` (matching `host`, membership
in `group`, or presence in `hosts`);
2. for each entry's `ingress` rules, resolves `from` to a list of source CIDRs (zone /
service-placement / group-or-host, per the order above);
3. returns a **deterministic, de-duplicated, sorted** list of
`{proto, port, sources: [cidr, …]}`.
Chosen over inline Jinja (unreadable, untestable) and a `set_fact` loop (awkward to
unit-test) — a filter plugin matches the house style of `check-tags.py` /
`capacity-scan.py` and is pytest-unit-testable in isolation. Host→IP resolution reads
`hostvars[<host>].ansible_host` (the static `srv` IP the Terraform-generated inventory
provides).
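
A minimal sketch of what such a resolver could look like follows. Only the function name, its four arguments, and the use of `ansible_host` come from this spec. The rest is assumed for illustration: group membership is read from each host's `group_names`, rules that share a `(proto, port)` pair are merged, a host name is tried before a group name in step (3), and sources are sorted as plain strings.

```python
def _hosts_in_group(group, hostvars):
    # Assumption: membership is derived from each host's `group_names`.
    return sorted(h for h, v in hostvars.items() if group in v.get("group_names", []))


def _placement_hosts(entry, hostvars):
    if "host" in entry:
        return [entry["host"]]
    if "hosts" in entry:
        return list(entry["hosts"])
    if "group" in entry:
        return _hosts_in_group(entry["group"], hostvars)
    return []


def _resolve_from(src, catalog, zones, hostvars):
    if src in zones:                      # (1) zone -> subnet
        return [zones[src]]
    if src in catalog:                    # (2) service -> placement hosts
        hosts = _placement_hosts(catalog[src], hostvars)
    elif src in hostvars:                 # (3a) inventory host
        hosts = [src]
    else:                                 # (3b) inventory group
        hosts = _hosts_in_group(src, hostvars)
    if not hosts:
        raise ValueError(f"unresolvable firewall source: {src!r}")
    return [f"{hostvars[h]['ansible_host']}/32" for h in hosts]


def resolve_firewall_rules(catalog, zones, inventory_hostname, hostvars):
    merged = {}
    for entry in catalog.values():
        if inventory_hostname not in _placement_hosts(entry, hostvars):
            continue  # service is not placed on this host
        for rule in entry.get("ingress", []):
            key = (rule["proto"], int(rule["port"]))
            sources = _resolve_from(rule["from"], catalog, zones, hostvars)
            merged.setdefault(key, set()).update(sources)
    # Sorted keys and sorted sources give deterministic output.
    return [
        {"proto": proto, "port": port, "sources": sorted(srcs)}
        for (proto, port), srcs in sorted(merged.items())
    ]


class FilterModule:
    """Ansible filter-plugin entry point."""

    def filters(self):
        return {"resolve_firewall_rules": resolve_firewall_rules}
```

With the example catalog above and a hypothetical `docker01` at `10.20.0.11`, the sketch yields one rule for tcp/443 from `10.30.0.0/24` and one for tcp/2342 from `10.20.0.11/32`. A host with no catalog entries gets an empty list.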
`tasks/firewall.yml` builds `base__firewall_resolved` from the filter; the template
renders that flat list:
```jinja
#!/usr/sbin/nft -f
flush ruleset

table inet filter {
  chain input {
    type filter hook input priority 0; policy drop;

    iif "lo" accept
    ct state established,related accept
    ct state invalid drop

    # Management plane. iifname matches by name, so the ruleset still loads
    # when the overlay interface does not exist yet (iif would fail to load).
    iifname "{{ base__firewall_mgmt_interface }}" tcp dport {{ base__firewall_ssh_port }} accept

    ip protocol icmp accept
    ip6 nexthdr ipv6-icmp accept

    # One accept rule per resolved catalog entry
{% for r in base__firewall_resolved %}
    ip saddr { {{ r.sources | join(', ') }} } {{ r.proto }} dport {{ r.port }} accept
{% endfor %}
  }

  chain forward { type filter hook forward priority 0; policy drop; }
  chain output  { type filter hook output priority 0; policy accept; }
}

# Extension hook for drop-ins
include "{{ base__firewall_dropin_dir }}/*.nft"
```
A host with no catalog entries still gets a valid default-deny + management-plane
ruleset. The `include` is the `docker_host` extension hook (forward/NAT drop-ins).
Sorted resolved rules → stable diffs and deterministic tests.
## Safe apply (lockout protection)
`tasks/firewall.yml` renders `/etc/nftables.conf`; when it changes, a **linear**
safe-apply sequence runs (deliberately in tasks, not a handler, so the confirm/cancel
step is controllable — a small, justified deviation from the handler idiom, noted in the
role README):
1. **Validate**: `nft -c -f /etc/nftables.conf`; fail the play if invalid, before
touching the live ruleset.
2. **Snapshot**: write `flush ruleset` followed by the output of `nft list ruleset` to
`/etc/nftables.rollback`, so that replaying the file replaces the live ruleset
instead of merging into it (on first run the snapshot is just the flush).
3. **Arm revert**: `systemd-run --on-active={{ base__firewall_rollback_timeout }}
--unit=nft-rollback nft -f /etc/nftables.rollback` (transient timer, no `at`
dependency).
4. **Apply**: `nft -f /etc/nftables.conf`.
5. **Confirm + disarm**: the next Ansible task running proves the connection survived →
`systemctl stop nft-rollback.timer` (the transient timer unit, not the service it
would start). If the apply bricked connectivity, the play cannot continue, the timer
fires, and the host self-heals (the offsite-`askari` safeguard).
6. **Persist**: enable `nftables.service` so `/etc/nftables.conf` loads on boot.
`established/related` (rendered in the ruleset) means the in-flight Ansible session
survives the swap; atomic `nft -f` avoids partial states.
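
The ordering of the sequence above matters more than any single command, so the sketch below lays it out as data. It only builds the command list and executes nothing. In the role these would be Ansible tasks. The function name `safe_apply_plan` and its parameters are hypothetical. Two details are assumptions of this sketch rather than settled design: the snapshot is prefixed with `flush ruleset` so that replaying it replaces the live ruleset, and the disarm step stops the transient `.timer` unit created by `systemd-run`.

```python
import shlex


def safe_apply_plan(
    conf="/etc/nftables.conf",
    rollback="/etc/nftables.rollback",
    timeout=45,
    unit="nft-rollback",
):
    """Return the safe-apply steps, in order, as (name, argv) pairs."""
    snapshot_cmd = (
        # Assumption: prefix the dump with a flush so a replay is a full revert.
        f"{{ echo 'flush ruleset'; nft list ruleset; }} > {shlex.quote(rollback)}"
    )
    return [
        ("validate", ["nft", "-c", "-f", conf]),
        ("snapshot", ["sh", "-c", snapshot_cmd]),
        # The revert must be armed BEFORE the apply, or a lockout is permanent.
        ("arm", ["systemd-run", f"--on-active={timeout}", f"--unit={unit}",
                 "nft", "-f", rollback]),
        ("apply", ["nft", "-f", conf]),
        # Reached only if the connection survived the apply.
        ("disarm", ["systemctl", "stop", f"{unit}.timer"]),
        ("persist", ["systemctl", "enable", "nftables.service"]),
    ]
```

If the play dies between `apply` and `disarm`, nothing cancels the timer, and the snapshot is replayed after `timeout` seconds.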
**NetBird dependency:** locking SSH to `wt0`-only assumes NetBird (ADR-016) is built.
Until then, `base__firewall_mgmt_interface` (and, if needed, an additional management
source) is set to a reachable path so the role is deployable independently. This is a
config knob, not a code dependency.
## Testing (ADR-008)
- **Level 1 / pytest** — unit-test `firewall_rules.py` against fixture catalogs: zone
resolution, service→host-IP resolution, `group`/`hosts` multi-host placement, a host
with no services, source de-dup/sort, and an unresolvable `from` raising. Mirrors
`tests/test_check_tags.py` (import the module, assert on return values).
- **Level 1 / Molecule** — fixture `firewall_catalog` + fixture inventory (host_vars/
group_vars) in the scenario; `converge` renders `/etc/nftables.conf`; `verify` asserts
(a) expected accept lines are present for the fixture and (b) `nft -c -f
/etc/nftables.conf` validates syntax. **No apply** (kernel safety).
- **Level 2 / staging** — real apply on staging VMs verifies enforcement *and* the
safe-apply + auto-rollback path (steps 2–5), which Level 1 cannot safely cover.
The Molecule base image is not guaranteed to ship `nft`. The role installs the
`nftables` package as its first firewall task, so by the time `verify` runs the `nft -c`
syntax check, `nft` is present (installed during `converge`).
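
Check (a) of the Molecule `verify` step could be expressed as below. This is a sketch under assumptions: the helper names are hypothetical, and the expected line format is copied from the template's per-rule loop, so the two must be kept in sync. The `nft -c` syntax check in (b) stays a separate step.

```python
def expected_accept_lines(resolved):
    """Mirror the template's loop: one accept line per resolved rule."""
    return [
        "ip saddr { %s } %s dport %s accept"
        % (", ".join(rule["sources"]), rule["proto"], rule["port"])
        for rule in resolved
    ]


def missing_accept_lines(rendered, resolved):
    """Return the expected accept lines absent from the rendered ruleset."""
    # Compare on stripped lines so template indentation does not matter.
    present = {line.strip() for line in rendered.splitlines()}
    return [line for line in expected_accept_lines(resolved) if line not in present]
```

A `verify` task would fail when `missing_accept_lines` returns a non-empty list for the fixture catalog.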
## Open dependencies / notes
- **NetBird/ADR-016 unbuilt** — see the mgmt-interface knob above; full `wt0`-only
lockdown lands when NetBird does.
- The safe-apply orchestration (steps 2–5) has **no Level-1 coverage** by design; it is
integration-tested at Level 2. Called out so the gap is explicit.
## Scope summary
**Built here:** `firewall_catalog`/`firewall_zones` schema; `firewall_rules.py` resolver
+ pytest; `nftables.conf.j2` (default-deny input, mgmt plane, permissive egress, drop-in
`include` hook); safe-apply-with-rollback tasks; Molecule render/syntax scenario;
`base` role scaffolding (README, meta, defaults, handlers).
**Deferred:** Docker `iptables:false` + container forward/NAT (→ `docker_host` spec, via
the drop-in hook); OPNsense rendering from the same catalog (→ OPNsense-as-code spec);
drift-detection check (ADR-020); all other `base` concerns.
## Related
ADR-020 (firewall strategy), ADR-002 (security baseline), ADR-004 (Docker model —
`iptables:false`, one service = one role), ADR-007 (VLANs/subnets), ADR-008 (testing
levels), ADR-016 (NetBird mesh — SSH on `wt0`), ADR-019 (`firewall` tag).