ADR-020 — Firewall strategy: two-layer model with a shared service catalog
Status
Accepted (2026-06-06). Resolves TODO 3.5 ("Decide the firewall strategy — which firewall, ruleset, per-host vs central").
Strategy ADR. It pins the architecture and each layer's responsibilities; the detailed builds are separate follow-up efforts (see Scope).
Context
boma needs a firewall strategy that is predictable, declarative, and defends the stated
threat model — opportunistic external, lateral movement / blast radius, operator/agent
error (ADR-002). The pieces were already committed across other ADRs (nftables
default-deny on hosts — ADR-002; OPNsense at the perimeter — ADR-007; Docker with
iptables: false — ADR-004), but nothing tied them together: which layer owns what,
where firewall intent is declared, and how the layers stay consistent. Without that,
ports drift open ad-hoc and "per-host vs central" stays unanswered.
Decision
Two layers, distinct jobs
OPNsense — perimeter + inter-VLAN. Owns the WAN edge and all policy between zones:
lan/iot/guest → srv, mgmt access, and the per-VLAN egress rules (ADR-007). It
is structurally blind to intra-srv traffic — services share the switched srv
subnet (VLAN 20), which never reaches the gateway.
Host nftables — host-local + east-west within srv (in the base role, every VM):
- Default-deny inbound; allow loopback + established/related.
- East-west allowlist: a service host accepts a connection only from declared sources (e.g. the reverse proxy, a named peer) — the lateral-movement control OPNsense cannot provide.
- Permissive egress: allow outbound + established/related; per-VLAN egress restriction stays at OPNsense (ADR-007). Host-level egress allowlisting is high-friction (every DNS/NTP/update/registry/webhook must be enumerated) for limited added benefit once the VLAN already bounds where a host can go.
- Docker: daemon runs with `"iptables": false`; nftables owns all filtering, including container traffic (ADR-004).
- Guaranteed management plane: loopback, established/related, `wt0` (NetBird, ADR-016), and SSH from the control node's LAN address (`base__firewall_control_addr`, the `ssh-from-control` source) for SSH + Ansible are always allowed, independent of the catalog, applied atomically, so a malformed or empty catalog can never lock out management. The control-node source is part of the guaranteed plane, not the service catalog (it is management, not a service); see ADR-021 for the access doctrine.
So "per-host vs central" is answered: both, with clear ownership.
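As an illustration of the guaranteed management plane, here is a minimal Python sketch that renders an input chain with the management rules emitted unconditionally before any catalog-derived rule. The function name, the shape of `catalog_rules`, the example addresses, and the choice to limit `wt0` to TCP 22 are assumptions made for this example, not the `base` role's actual template. Skipping malformed entries is only one possible policy; a real role might prefer to fail the run.

```python
def render_input_chain(control_addr, catalog_rules):
    """Render an nftables input chain as text (hypothetical helper).

    Management rules never depend on the catalog, so an empty or
    malformed catalog still yields a ruleset that admits SSH.
    """
    lines = [
        "table inet filter {",
        "  chain input {",
        "    type filter hook input priority 0; policy drop;",
        '    iif "lo" accept',
        "    ct state established,related accept",
        # Assumption: the mesh interface only needs SSH (ADR-016).
        '    iifname "wt0" tcp dport 22 accept',
        f"    ip saddr {control_addr} tcp dport 22 accept",
    ]
    for rule in catalog_rules or []:
        try:
            source, proto = rule["source"], rule["proto"]
            port = int(rule["port"])
        except (KeyError, TypeError, ValueError):
            continue  # malformed entry: skipped, management rules unaffected
        lines.append(f"    ip saddr {source} {proto} dport {port} accept")
    lines += ["  }", "}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # 192.0.2.x addresses are documentation placeholders, not real hosts.
    print(render_input_chain("192.0.2.5", [
        {"source": "192.0.2.10", "proto": "tcp", "port": 2342},
    ]))
```

Loading the rendered file in one `nft -f` call is one way to get the all-or-nothing application described above, since `nft -f` commits a file as a single transaction.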
Single source of truth — a shared service catalog
A central, declarative service catalog in group_vars/ is the one source of truth
for firewall intent (aligning with ADR-002's "port definitions live in group_vars/",
and keeping connectivity topology in inventory rather than in any one self-contained
service role — ADR-004). Each entry describes a service's ingress:
    photoprism:
      ingress:
        - { from: reverse_proxy, port: 2342, proto: tcp }
    reverse_proxy:
      ingress:
        - { from: lan, port: 443, proto: tcp }
from is symbolic, resolved at render time: a host/group → IP(s) from inventory; a
role (reverse_proxy) → the host(s) filling it; a VLAN/zone (lan) → the subnet from
the ADR-007 table. This keeps the catalog readable and resilient to IP changes.
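The resolution of symbolic sources can be sketched in a few lines of Python. Everything below is hypothetical: the host names, the addresses (documentation ranges), the three lookup tables, and the `resolve_source` helper stand in for whatever the real inventory and ADR-007 table provide. Inventory groups are left out for brevity.

```python
from ipaddress import ip_network

# Hypothetical stand-ins for inventory and the ADR-007 subnet table.
INVENTORY_HOSTS = {"proxy01": "192.0.2.10", "photos01": "192.0.2.21"}
ROLE_HOLDERS = {"reverse_proxy": ["proxy01"]}
ZONE_SUBNETS = {"lan": "198.51.100.0/24"}


def resolve_source(name):
    """Resolve a symbolic `from` value to a sorted list of CIDR strings."""
    if name in ZONE_SUBNETS:        # VLAN/zone -> its subnet
        return [str(ip_network(ZONE_SUBNETS[name]))]
    if name in ROLE_HOLDERS:        # role -> the host(s) filling it
        hosts = ROLE_HOLDERS[name]
    elif name in INVENTORY_HOSTS:   # single host -> its address
        hosts = [name]
    else:
        # Unknown names fail loudly rather than rendering an empty rule.
        raise KeyError(f"unknown firewall source: {name!r}")
    return sorted(str(ip_network(INVENTORY_HOSTS[h])) for h in hosts)


if __name__ == "__main__":
    for symbol in ("reverse_proxy", "lan", "photos01"):
        print(symbol, "->", resolve_source(symbol))
```

Because the catalog only ever stores the symbol, re-addressing a host changes the lookup table and nothing else.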
Each layer renders only its own slice
| Ingress rule | Host nftables | OPNsense |
|---|---|---|
| from: reverse_proxy (a srv peer) | allow proxy IP → port | n/a (intra-srv, invisible) |
| from: lan (cross-VLAN) | allow lan subnet → port | allow lan → host:port |
The dominant pattern falls out naturally: most services are proxied — their only
ingress is from: reverse_proxy, and users reach them through the reverse proxy, which
alone carries from: lan, port: 443 (matches "services sit behind the reverse proxy
with authentication", ADR-002).
This was chosen over a single connectivity model that generates both rule sets (too much machinery, and tight coupling of two very different rule domains) and over fully independent per-layer declarations (real drift risk).
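The per-layer slicing can be sketched as a single pass over the catalog. This is a rough Python illustration under stated assumptions: the `ZONES` set, the tuple layout, and the `slices` function are invented for the example, and the catalog literal simply mirrors the two entries shown earlier.

```python
# Assumption: these zone names mark a source as cross-VLAN.
ZONES = {"lan", "iot", "guest"}

CATALOG = {
    "photoprism": {
        "ingress": [{"from": "reverse_proxy", "port": 2342, "proto": "tcp"}],
    },
    "reverse_proxy": {
        "ingress": [{"from": "lan", "port": 443, "proto": "tcp"}],
    },
}


def slices(catalog):
    """Split catalog ingress rules into (host_rules, opnsense_rules)."""
    host_rules, opnsense_rules = [], []
    for service, entry in sorted(catalog.items()):
        for rule in entry.get("ingress", []):
            item = (service, rule["from"], rule["port"], rule["proto"])
            host_rules.append(item)        # the host enforces every ingress rule
            if rule["from"] in ZONES:      # cross-VLAN traffic also crosses OPNsense
                opnsense_rules.append(item)
    return host_rules, opnsense_rules


if __name__ == "__main__":
    host, perimeter = slices(CATALOG)
    print("host nftables:", host)
    print("OPNsense:     ", perimeter)
```

With the two sample entries, the host slice carries both rules while the OPNsense slice carries only the `lan` to 443 rule, which matches the table.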
Off-cluster hosts — askari (Hetzner)
askari sits outside the Proxmox cluster and has no OPNsense. Its perimeter layer
is a TF-managed Hetzner Cloud Firewall (declared in terraform/environments/offsite/)
alongside the VM itself. Rule set: SSH inbound from ubongo's public IP (M2), plus
TCP 80/443 + UDP 3478 opened in M4a (Caddy + NetBird). The netbird_coordinator
service role that uses 3478 lands in M4b; the ports are already open.
The group_vars service catalog remains authoritative for askari's host nftables
layer — the same two-layer model applies, with Hetzner Cloud Firewall substituting for
OPNsense at the perimeter.
OPNsense automation — owned here, mechanism deferred
OPNsense is Ansible-managed (CLAUDE.md: "OPNsense is entirely Ansible; no Terraform OPNsense provider"). It renders the cross-VLAN slice of the catalog plus the static ADR-007 facts. The how — config-XML templating vs the OPNsense API vs a plugin — is deferred to the OPNsense-as-code follow-up spec. Recorded as an explicit open sub-decision.
Guardrails
- The catalog is authoritative. If a port is not in the catalog, it does not exist — hardening the existing rule "never open a firewall port ad-hoc on a host" (ADR-002).
- The `firewall` tag (ADR-019) marks firewall tasks; `--tags firewall` re-renders rules.
- Drift detection (aspiration). A deterministic check, in the spirit of `scripts/check-tags.py`, comparing each host's live `nft` ruleset / listening ports against the catalog and flagging anything undeclared. Ties to TODO 8.5 (/security-review). Not necessarily built first.
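The core of such a drift check could be a set difference, sketched below in Python. The function name, its parameters, and the assumption that TCP 22 belongs to the management plane rather than the catalog are all hypothetical; how the listening sockets are collected (for instance by parsing `ss` output) is left to the caller.

```python
def undeclared_ports(listening, catalog, services_on_host,
                     management=frozenset({("tcp", 22)})):
    """Return sorted (proto, port) pairs that listen but are not declared.

    listening        -- iterable of (proto, port) observed on the host
    catalog          -- service catalog mapping, as in group_vars
    services_on_host -- names of catalog services deployed on this host
    management       -- pairs allowed outside the catalog (assumption)
    """
    declared = {
        (rule["proto"], rule["port"])
        for service in services_on_host
        for rule in catalog.get(service, {}).get("ingress", [])
    }
    return sorted(set(listening) - declared - set(management))


if __name__ == "__main__":
    catalog = {
        "photoprism": {
            "ingress": [{"from": "reverse_proxy", "port": 2342, "proto": "tcp"}],
        },
    }
    observed = {("tcp", 22), ("tcp", 2342), ("tcp", 8080)}
    print(undeclared_ports(observed, catalog, ["photoprism"]))
```

A non-empty result is the "anything undeclared" signal; an empty list means the host matches the catalog for the ports observed.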
Consequences
- Lateral movement within `srv` is constrained: the gap OPNsense structurally can't close.
- One declarative catalog → no ad-hoc ports and no cross-layer drift on shared facts (ports, IPs, sources).
- Cost: the catalog + render-per-layer machinery must be built and maintained; east-west allowlisting adds per-service ingress declarations (mitigated by proxied-by-default, which keeps most entries to a single line).
Scope
Decided here: the two-layer model and responsibilities; host nftables = default-deny
inbound + east-west allowlist + permissive egress + guaranteed management plane + Docker
iptables:false; the shared group_vars catalog as single source of truth with
symbolic sources; each layer renders its own slice; the no-ad-hoc-ports guardrail.
Deferred to follow-up specs (each its own brainstorm → plan):
- Host nftables implementation in `base`: catalog schema, nftables template, Docker `iptables:false` integration, fail-safe ordering, Molecule tests. The natural next spec.
- OPNsense-as-code: tooling mechanism + cross-VLAN rule rendering.
- Drift-detection check — if/when built.
Related
ADR-002 (security baseline: nftables default-deny, fail2ban, blast radius),
ADR-004 (Docker model: iptables:false), ADR-007 (network topology, VLANs, OPNsense,
per-VLAN egress), ADR-016 (NetBird mesh: SSH on wt0 only), ADR-019 (firewall tag),
ADR-021 (operational access doctrine; ssh-from-control management-plane source).