# ADR-020 — Firewall strategy: two-layer model with a shared service catalog

## Status

Accepted (2026-06-06). Resolves TODO 3.5 ("Decide the firewall strategy — which firewall, ruleset, per-host vs central").

**Strategy ADR.** It pins the architecture and each layer's responsibilities; the detailed builds are separate follow-up efforts (see *Scope*).

## Context

boma needs a firewall strategy that is predictable, declarative, and defends the stated threat model — opportunistic external, lateral movement / blast radius, operator/agent error (ADR-002). The pieces were already committed across other ADRs (`nftables` default-deny on hosts — ADR-002; OPNsense at the perimeter — ADR-007; Docker with `iptables: false` — ADR-004), but nothing tied them together: which layer owns what, where firewall intent is declared, and how the layers stay consistent. Without that, ports drift open ad-hoc and "per-host vs central" stays unanswered.

## Decision

### Two layers, distinct jobs

**OPNsense — perimeter + inter-VLAN.** Owns the WAN edge and all policy *between zones*: `lan`/`iot`/`guest` → `srv`, `mgmt` access, and the per-VLAN egress rules (ADR-007). It is **structurally blind to intra-`srv` traffic** — services share the switched `srv` subnet (VLAN 20), which never reaches the gateway.

**Host nftables — host-local + east-west within `srv`** (in the `base` role, every VM):

- **Default-deny inbound**; allow loopback + established/related.
- **East-west allowlist**: a service host accepts a connection only from declared sources (e.g. the reverse proxy, a named peer) — the lateral-movement control OPNsense cannot provide.
- **Permissive egress**: allow outbound + established/related; per-VLAN egress restriction stays at OPNsense (ADR-007). Host-level egress allowlisting is high-friction (every DNS/NTP/update/registry/webhook must be enumerated) for limited added benefit once the VLAN already bounds where a host can go.
- **Docker**: daemon runs with `"iptables": false`; nftables owns all filtering, including container traffic (ADR-004).
- **Guaranteed management plane**: loopback, established/related, `wt0` (NetBird, ADR-016), and SSH from the control node's LAN address (`base__firewall_control_addr`, the `ssh-from-control` source) for SSH + Ansible are always allowed, independent of the catalog, applied atomically — a malformed or empty catalog can never lock out management. The control-node source is part of the guaranteed plane, not the service catalog (it is management, not a service); see ADR-021 for the access doctrine.

So "per-host vs central" is answered: **both**, with clear ownership.

### Single source of truth — a shared service catalog

A central, declarative **service catalog** in `group_vars/` is the one source of truth for firewall intent (aligning with ADR-002's "port definitions live in `group_vars/`", and keeping connectivity *topology* in inventory rather than in any one self-contained service role — ADR-004). Each entry describes a service's **ingress**:

```yaml
photoprism:
  ingress:
    - { from: reverse_proxy, port: 2342, proto: tcp }
reverse_proxy:
  ingress:
    - { from: lan, port: 443, proto: tcp }
```

`from` is **symbolic**, resolved at render time: a host/group → IP(s) from inventory; a role (`reverse_proxy`) → the host(s) filling it; a VLAN/zone (`lan`) → the subnet from the ADR-007 table. This keeps the catalog readable and resilient to IP changes.
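
The symbolic resolution described above can be illustrated with a minimal Python sketch. It is not the actual render logic, which is deferred to the host nftables spec. Every host name, address, and subnet below is a hypothetical placeholder rather than a value from inventory or ADR-007, and the lookup order (zone, then role, then host) is an assumption.

```python
# Hypothetical stand-ins for inventory data; none of these values come from the ADRs.
HOSTS = {"proxy-01": "10.0.20.10", "photoprism-01": "10.0.20.11"}  # host -> IP
ROLES = {"reverse_proxy": ["proxy-01"]}  # role -> host(s) filling it
ZONES = {"lan": "10.0.10.0/24"}          # zone -> subnet (placeholder for the ADR-007 table)


def resolve_source(name: str) -> list[str]:
    """Resolve a symbolic `from` value to concrete addresses or subnets."""
    if name in ZONES:
        return [ZONES[name]]
    if name in ROLES:
        return [HOSTS[host] for host in ROLES[name]]
    if name in HOSTS:
        return [HOSTS[name]]
    # Fail loudly: an unknown source must never silently render as "allow nothing" or "allow all".
    raise KeyError(f"unknown firewall source: {name!r}")
```

With these placeholders, `resolve_source("reverse_proxy")` yields the proxy host's address and `resolve_source("lan")` yields the `lan` subnet, so renumbering a host touches inventory only and leaves the catalog unchanged.
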
### Each layer renders only its own slice

| Ingress rule | Host nftables | OPNsense |
|---|---|---|
| `from: reverse_proxy` (a `srv` peer) | allow proxy IP → port | — (intra-`srv`, invisible) |
| `from: lan` (cross-VLAN) | allow `lan` subnet → port | allow `lan` → host:port |

The dominant pattern falls out naturally: most services are **proxied** — their only ingress is `from: reverse_proxy`, and users reach them through the reverse proxy, which alone carries `from: lan, port: 443` (matches "services sit behind the reverse proxy with authentication", ADR-002).

This was chosen over a single connectivity-model-generates-both (too much machinery, tight coupling of two very different rule domains) and over fully independent per-layer declarations (real drift risk).

### Off-cluster hosts — `askari` (Hetzner)

`askari` sits outside the Proxmox cluster and has no OPNsense. Its **perimeter** layer is a TF-managed **Hetzner Cloud Firewall** (declared in `terraform/environments/offsite/`) alongside the VM itself. Rule set: SSH inbound from `ubongo`'s public IP (M2), plus TCP 80/443 + UDP 3478 opened in **M4a** (Caddy + NetBird). The `netbird_coordinator` service role that uses 3478 lands in **M4b**; the ports are already open.

The `group_vars` service catalog remains authoritative for `askari`'s **host nftables** layer — the same two-layer model applies, with Hetzner Cloud Firewall substituting for OPNsense at the perimeter.

---

### OPNsense automation — owned here, mechanism deferred

OPNsense is Ansible-managed (CLAUDE.md: "OPNsense is entirely Ansible; no Terraform OPNsense provider"). It renders the cross-VLAN slice of the catalog plus the static ADR-007 facts. The **how** — config-XML templating vs the OPNsense API vs a plugin — is deferred to the OPNsense-as-code follow-up spec. Recorded as an explicit open sub-decision.
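
Whichever OPNsense mechanism is eventually chosen, the per-layer slicing from the table above can be sketched as a pure function over the catalog. This is an illustration only: the `slices` function, the `CROSS_VLAN_ZONES` set, and the tuple shape are hypothetical, and the sketch assumes every service lives in `srv`, so a rule is cross-VLAN exactly when its `from` names another zone.

```python
# Catalog mirrors the YAML example from this ADR.
CATALOG = {
    "photoprism": {"ingress": [{"from": "reverse_proxy", "port": 2342, "proto": "tcp"}]},
    "reverse_proxy": {"ingress": [{"from": "lan", "port": 443, "proto": "tcp"}]},
}

# Assumption: zones other than `srv`; traffic from these crosses the gateway.
CROSS_VLAN_ZONES = {"lan", "iot", "guest", "mgmt"}


def slices(catalog):
    """Split catalog ingress rules into (host nftables slice, perimeter slice)."""
    host_rules, perimeter_rules = [], []
    for service, spec in catalog.items():
        for rule in spec.get("ingress", []):
            entry = (service, rule["from"], rule["port"], rule["proto"])
            # The host layer enforces every declared ingress rule.
            host_rules.append(entry)
            # The perimeter only ever sees traffic that crosses VLANs.
            if rule["from"] in CROSS_VLAN_ZONES:
                perimeter_rules.append(entry)
    return host_rules, perimeter_rules
```

Both layers read the same entries, which is what keeps ports and sources from drifting apart; the perimeter slice is simply a filtered view of the host slice.
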
## Guardrails

- **The catalog is authoritative.** If a port is not in the catalog, it does not exist — hardening the existing rule "never open a firewall port ad-hoc on a host" (ADR-002).
- **The `firewall` tag** (ADR-019) marks firewall tasks; `--tags firewall` re-renders rules.
- **Drift detection (aspiration).** A deterministic check — in the spirit of `scripts/check-tags.py` — comparing each host's live `nft` ruleset / listening ports against the catalog and flagging anything undeclared. Ties to TODO 8.5 (`/security-review`). Not necessarily built first.

## Consequences

- Lateral movement within `srv` is constrained — the gap OPNsense structurally can't close.
- One declarative catalog → no ad-hoc ports and no cross-layer drift on shared facts (ports, IPs, sources).
- Cost: the catalog + render-per-layer machinery must be built and maintained; east-west allowlisting adds per-service ingress declarations (mitigated by proxied-by-default, which keeps most entries to a single line).

## Scope

**Decided here:** the two-layer model and responsibilities; host nftables = default-deny inbound + east-west allowlist + permissive egress + guaranteed management plane + Docker `iptables:false`; the shared `group_vars` catalog as single source of truth with symbolic sources; each layer renders its own slice; the no-ad-hoc-ports guardrail.

**Deferred to follow-up specs (each its own brainstorm → plan):**

1. **Host nftables implementation** in `base` — catalog schema, nftables template, Docker `iptables:false` integration, fail-safe ordering, Molecule tests. The natural next spec.
2. **OPNsense-as-code** — tooling mechanism + cross-VLAN rule rendering.
3. **Drift-detection check** — if/when built.
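
If the drift-detection check is built, its core comparison could look roughly like the sketch below. It assumes the listening sockets have already been collected as `(port, proto)` pairs by some other step; how they are gathered from a host is left open here. The function name and the `always_allowed` parameter are hypothetical, and SSH on TCP 22 is an assumed stand-in for the guaranteed management plane.

```python
def undeclared_listeners(listening, catalog_entry, always_allowed=frozenset()):
    """Return listening (port, proto) pairs that the catalog does not declare.

    listening      -- iterable of (port, proto) pairs observed on the host
    catalog_entry  -- the service's catalog entry, e.g. {"ingress": [...]}
    always_allowed -- management-plane pairs exempt from the catalog (assumption)
    """
    declared = {(rule["port"], rule["proto"]) for rule in catalog_entry.get("ingress", [])}
    return sorted(set(listening) - declared - set(always_allowed))


# Example: photoprism's declared port plus SSH are fine; 8080/tcp is flagged.
entry = {"ingress": [{"from": "reverse_proxy", "port": 2342, "proto": "tcp"}]}
observed = {(2342, "tcp"), (22, "tcp"), (8080, "tcp")}
drift = undeclared_listeners(observed, entry, always_allowed={(22, "tcp")})
```

Because the comparison is a plain set difference, the check stays deterministic: the same observed sockets and the same catalog always produce the same findings.
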
## Related

ADR-002 (security baseline: nftables default-deny, fail2ban, blast radius), ADR-004 (Docker model: `iptables:false`), ADR-007 (network topology, VLANs, OPNsense, per-VLAN egress), ADR-016 (NetBird mesh: SSH on `wt0` only), ADR-019 (`firewall` tag), ADR-021 (operational access doctrine; `ssh-from-control` management-plane source).