ADR-020 — Firewall strategy: two-layer model with a shared service catalog
Status
Accepted (2026-06-06). Resolves TODO 3.5 ("Decide the firewall strategy — which firewall, ruleset, per-host vs central").
Strategy ADR. It pins the architecture and each layer's responsibilities; the detailed builds are separate follow-up efforts (see Scope).
Context
boma needs a firewall strategy that is predictable, declarative, and defends the stated
threat model — opportunistic external, lateral movement / blast radius, operator/agent
error (ADR-002). The pieces were already committed across other ADRs (nftables
default-deny on hosts — ADR-002; OPNsense at the perimeter — ADR-007; Docker with
iptables: false — ADR-004), but nothing tied them together: which layer owns what,
where firewall intent is declared, and how the layers stay consistent. Without that,
ports drift open ad-hoc and "per-host vs central" stays unanswered.
Decision
Two layers, distinct jobs
OPNsense — perimeter + inter-VLAN. Owns the WAN edge and all policy between zones:
lan/iot/guest → srv, mgmt access, and the per-VLAN egress rules (ADR-007). It
is structurally blind to intra-srv traffic — services share the switched srv
subnet (VLAN 20), which never reaches the gateway.
Host nftables — host-local + east-west within srv (in the base role, every VM):
- Default-deny inbound; allow loopback + established/related.
- East-west allowlist: a service host accepts a connection only from declared sources (e.g. the reverse proxy, a named peer) — the lateral-movement control OPNsense cannot provide.
- Permissive egress: allow outbound + established/related; per-VLAN egress restriction stays at OPNsense (ADR-007). Host-level egress allowlisting is high-friction (every DNS/NTP/update/registry/webhook must be enumerated) for limited added benefit once the VLAN already bounds where a host can go.
- Docker: daemon runs with "iptables": false; nftables owns all filtering, including container traffic (ADR-004).
- Guaranteed management plane: loopback, established/related, and wt0 (NetBird, ADR-016) for SSH + Ansible are always allowed, independent of the catalog, applied atomically, so a malformed or empty catalog can never lock out management. (ADR-016: SSH is allowed only on wt0.)
So "per-host vs central" is answered: both, with clear ownership.
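The "catalog can never lock out management" property can be illustrated with a small renderer. This is a minimal sketch, not the base role's implementation: the function names, the entry shape (`source`, `port`, `proto`), SSH on port 22, and IPv4-only sources are all assumptions made for illustration. It emits one complete ruleset that could be loaded with `nft -f`, which applies a file as a single transaction.

```python
import ipaddress

# Assumed fixed preamble: these rules never depend on the catalog.
MANAGEMENT_RULES = [
    'iif "lo" accept',
    "ct state established,related accept",
    'iifname "wt0" tcp dport 22 accept',  # assumption: SSH on its default port
]


def render_catalog_rules(ingress):
    """Turn already-resolved ingress entries into nft rules; raise on bad input."""
    rules = []
    for entry in ingress:
        source = ipaddress.ip_network(entry["source"])  # ValueError if malformed
        port = int(entry["port"])
        if entry["proto"] not in ("tcp", "udp") or not 0 < port < 65536:
            raise ValueError(f"bad ingress entry: {entry!r}")
        rules.append(f'ip saddr {source} {entry["proto"]} dport {port} accept')
    return rules


def render_ruleset(ingress):
    """Render a full input chain; a broken catalog drops service rules only."""
    try:
        catalog_rules = render_catalog_rules(ingress)
    except (KeyError, TypeError, ValueError):
        catalog_rules = []  # fail closed for services, never for management
    body = "\n".join(f"        {rule}" for rule in MANAGEMENT_RULES + catalog_rules)
    return (
        "flush ruleset\n"
        "table inet filter {\n"
        "    chain input {\n"
        "        type filter hook input priority 0; policy drop;\n"
        f"{body}\n"
        "    }\n"
        "}\n"
    )
```

Because the management rules are a constant rather than catalog output, `render_ruleset(None)` still yields a chain that accepts loopback, established/related, and SSH on wt0. The sketch covers only the input chain; forwarding for container traffic is out of its scope.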
Single source of truth — a shared service catalog
A central, declarative service catalog in group_vars/ is the one source of truth
for firewall intent (aligning with ADR-002's "port definitions live in group_vars/",
and keeping connectivity topology in inventory rather than in any one self-contained
service role — ADR-004). Each entry describes a service's ingress:

    photoprism:
      ingress:
        - { from: reverse_proxy, port: 2342, proto: tcp }
    reverse_proxy:
      ingress:
        - { from: lan, port: 443, proto: tcp }

from is symbolic, resolved at render time: a host/group → IP(s) from inventory; a
role (reverse_proxy) → the host(s) filling it; a VLAN/zone (lan) → the subnet from
the ADR-007 table. This keeps the catalog readable and resilient to IP changes.
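A minimal sketch of that resolution step, assuming three lookup tables. The host names, addresses (documentation ranges), and the zone → role → host lookup order are hypothetical; in practice the data would come from the Ansible inventory and the ADR-007 table.

```python
# Hypothetical data for illustration only.
INVENTORY_HOSTS = {"proxy01": "192.0.2.10", "photos01": "192.0.2.20"}
ROLE_MEMBERS = {"reverse_proxy": ["proxy01"]}
ZONE_SUBNETS = {"lan": "198.51.100.0/24"}


def resolve_source(name):
    """Resolve a symbolic `from` value to a sorted list of IPs or CIDR strings."""
    if name in ZONE_SUBNETS:
        return [ZONE_SUBNETS[name]]
    if name in ROLE_MEMBERS:
        return sorted(INVENTORY_HOSTS[host] for host in ROLE_MEMBERS[name])
    if name in INVENTORY_HOSTS:
        return [INVENTORY_HOSTS[name]]
    # An unknown name fails the render instead of silently producing no rule.
    raise KeyError(f"unknown source {name!r}: not a zone, role, or host")
```

With this data, `resolve_source("reverse_proxy")` returns `["192.0.2.10"]`, and moving the proxy to a new address changes only the inventory entry, not the catalog.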
Each layer renders only its own slice
| Ingress rule | Host nftables | OPNsense |
|---|---|---|
| from: reverse_proxy (a srv peer) | allow proxy IP → port | none (intra-srv, invisible) |
| from: lan (cross-VLAN) | allow lan subnet → port | allow lan → host:port |
The dominant pattern falls out naturally: most services are proxied — their only
ingress is from: reverse_proxy, and users reach them through the reverse proxy, which
alone carries from: lan, port: 443 (matches "services sit behind the reverse proxy
with authentication", ADR-002).
This was chosen over two alternatives: a single connectivity model that generates both rule sets (too much machinery, and tight coupling of two very different rule domains), and fully independent per-layer declarations (real drift risk).
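The per-layer slicing can be sketched as a single pass over one service's resolved ingress. This is an illustration under assumptions: `split_by_layer` and the rule tuple shape are hypothetical, sources are IPv4, and `SRV_SUBNET` is a placeholder documentation range, not the real VLAN 20 subnet.

```python
import ipaddress

SRV_SUBNET = ipaddress.ip_network("192.0.2.0/24")  # placeholder, not the real srv range


def split_by_layer(service_ip, ingress):
    """Return (host_rules, opnsense_rules) for one service's resolved ingress."""
    host_rules, opnsense_rules = [], []
    for entry in ingress:
        source = ipaddress.ip_network(entry["source"])
        rule = (str(source), service_ip, entry["port"], entry["proto"])
        host_rules.append(rule)  # the host always filters its own inbound traffic
        if not source.subnet_of(SRV_SUBNET):
            opnsense_rules.append(rule)  # only cross-VLAN traffic reaches the gateway
    return host_rules, opnsense_rules
```

An entry whose source lies inside srv produces a host rule only, while a cross-VLAN source produces a rule on both layers, mirroring the two table rows.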
OPNsense automation — owned here, mechanism deferred
OPNsense is Ansible-managed (CLAUDE.md: "OPNsense is entirely Ansible; no Terraform OPNsense provider"). It renders the cross-VLAN slice of the catalog plus the static ADR-007 facts. The how — config-XML templating vs the OPNsense API vs a plugin — is deferred to the OPNsense-as-code follow-up spec. Recorded as an explicit open sub-decision.
Guardrails
- The catalog is authoritative. If a port is not in the catalog, it does not exist — hardening the existing rule "never open a firewall port ad-hoc on a host" (ADR-002).
- The firewall tag (ADR-019) marks firewall tasks; --tags firewall re-renders rules.
- Drift detection (aspiration). A deterministic check, in the spirit of scripts/check-tags.py, comparing each host's live nft ruleset / listening ports against the catalog and flagging anything undeclared. Ties to TODO 8.5 (/security-review). Not necessarily built first.
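The core of such a drift check is a set difference between what is listening and what the catalog declares. The sketch below is hypothetical: the function names and the `(port, proto)` pair shape are assumptions, and how the live set is gathered (for example by parsing `ss` or `nft list ruleset` output) is left open.

```python
def declared_ports(catalog, service_names):
    """Collect (port, proto) pairs for the services placed on one host."""
    return {
        (entry["port"], entry["proto"])
        for name in service_names
        for entry in catalog[name]["ingress"]
    }


def find_undeclared(listening, declared):
    """Return listening (port, proto) pairs that the catalog does not declare."""
    return sorted(set(listening) - set(declared))


# Catalog shape taken from the example entries in this ADR.
catalog = {
    "photoprism": {"ingress": [{"from": "reverse_proxy", "port": 2342, "proto": "tcp"}]},
    "reverse_proxy": {"ingress": [{"from": "lan", "port": 443, "proto": "tcp"}]},
}
live = [(2342, "tcp"), (8080, "tcp")]  # made-up observation for one host
print(find_undeclared(live, declared_ports(catalog, ["photoprism"])))
# prints [(8080, 'tcp')]
```

A real check would also need to add the always-allowed management ports to the declared set, otherwise SSH on wt0 would be reported as drift.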
Consequences
- Lateral movement within srv is constrained: the gap OPNsense structurally can't close.
- One declarative catalog → no ad-hoc ports and no cross-layer drift on shared facts (ports, IPs, sources).
- Cost: the catalog + render-per-layer machinery must be built and maintained; east-west allowlisting adds per-service ingress declarations (mitigated by proxied-by-default, which keeps most entries to a single line).
Scope
Decided here: the two-layer model and responsibilities; host nftables = default-deny
inbound + east-west allowlist + permissive egress + guaranteed management plane + Docker
iptables:false; the shared group_vars catalog as single source of truth with
symbolic sources; each layer renders its own slice; the no-ad-hoc-ports guardrail.
Deferred to follow-up specs (each its own brainstorm → plan):
- Host nftables implementation in base: catalog schema, nftables template, Docker iptables:false integration, fail-safe ordering, Molecule tests. The natural next spec.
- OPNsense-as-code: tooling mechanism + cross-VLAN rule rendering.
- Drift-detection check: if/when built.
Related
ADR-002 (security baseline: nftables default-deny, fail2ban, blast radius),
ADR-004 (Docker model: iptables:false), ADR-007 (network topology, VLANs, OPNsense,
per-VLAN egress), ADR-016 (NetBird mesh: SSH on wt0 only), ADR-019 (firewall tag).