docs(adr): ADR-020 firewall strategy (two-layer + shared catalog)

Commit `d311f67098` (parent `8d1d8a88ea`): 1 file changed, 133 additions, 0 deletions; new file `docs/decisions/020-firewall.md`.

# ADR-020 — Firewall strategy: two-layer model with a shared service catalog

## Status

Accepted (2026-06-06). Resolves TODO 3.5 ("Decide the firewall strategy — which
firewall, ruleset, per-host vs central").

**Strategy ADR.** It pins the architecture and each layer's responsibilities; the
detailed builds are separate follow-up efforts (see *Scope*).

## Context

boma needs a firewall strategy that is predictable, declarative, and defends the stated
threat model — opportunistic external, lateral movement / blast radius, operator/agent
error (ADR-002). The pieces were already committed across other ADRs (`nftables`
default-deny on hosts — ADR-002; OPNsense at the perimeter — ADR-007; Docker with
`iptables: false` — ADR-004), but nothing tied them together: which layer owns what,
where firewall intent is declared, and how the layers stay consistent. Without that,
ports drift open ad-hoc and "per-host vs central" stays unanswered.

## Decision

### Two layers, distinct jobs

**OPNsense — perimeter + inter-VLAN.** Owns the WAN edge and all policy *between zones*:
`lan`/`iot`/`guest` → `srv`, `mgmt` access, and the per-VLAN egress rules (ADR-007). It
is **structurally blind to intra-`srv` traffic**: services share the switched `srv`
subnet (VLAN 20), so traffic between them is switched locally and never reaches the
gateway.

**Host nftables — host-local + east-west within `srv`** (in the `base` role, every VM):

- **Default-deny inbound**; allow loopback + established/related.
- **East-west allowlist**: a service host accepts a connection only from declared
  sources (e.g. the reverse proxy, a named peer) — the lateral-movement control OPNsense
  cannot provide.
- **Permissive egress**: allow outbound + established/related; per-VLAN egress
  restriction stays at OPNsense (ADR-007). Host-level egress allowlisting is
  high-friction (every DNS, NTP, update, registry, and webhook destination must be
  enumerated) for limited added benefit once the VLAN already bounds where a host can go.
- **Docker**: daemon runs with `"iptables": false`; nftables owns all filtering,
  including container traffic (ADR-004).
- **Guaranteed management plane**: loopback, established/related, and `wt0` (NetBird,
  ADR-016) for SSH + Ansible are always allowed, independent of the catalog, applied
  atomically — a malformed or empty catalog can never lock out management. (ADR-016: SSH
  is allowed only on `wt0`.)
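
The guaranteed management plane can be made concrete with a minimal Python sketch of a render step. Everything in it is a hypothetical illustration, not the planned `base` role implementation: the `Allow` shape, `render_ruleset`, the table and chain names, port 22 for SSH, and IPv4-only `ip saddr` matching are all assumptions, and Docker forwarding/NAT rules are omitted. The management-plane rules are emitted first and unconditionally, so an empty catalog still yields a reachable host, and the file begins with `flush ruleset` so that loading it with `nft -f` replaces the whole ruleset in one transaction.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Allow:
    """One resolved east-west ingress rule (hypothetical shape)."""
    source: str  # resolved IPv4 address or CIDR, e.g. "10.20.0.10" (made up)
    port: int
    proto: str   # "tcp" or "udp"; assumed already validated upstream


# Emitted first and unconditionally, whatever the catalog contains.
# Port 22 for SSH over NetBird's wt0 interface is an assumption.
MANAGEMENT_PLANE = [
    'iifname "lo" accept',
    "ct state established,related accept",
    'iifname "wt0" tcp dport 22 accept',
]


def render_ruleset(allows: list[Allow]) -> str:
    """Build a complete ruleset file intended for a single `nft -f` load."""
    catalog_rules = [
        f"ip saddr {a.source} {a.proto} dport {a.port} accept"  # IPv4-only match
        # Sorted so repeated renders of the same catalog produce identical files.
        for a in sorted(allows, key=lambda a: (a.port, a.proto, a.source))
    ]
    rules = "\n".join(f"    {r}" for r in MANAGEMENT_PLANE + catalog_rules)
    return (
        "flush ruleset\n"
        "table inet filter {\n"
        "  chain input {\n"
        "    type filter hook input priority 0; policy drop;\n"
        f"{rules}\n"
        "  }\n"
        "  chain output {\n"
        "    type filter hook output priority 0; policy accept;\n"  # permissive egress
        "  }\n"
        "}\n"
    )
```

For example, `render_ruleset([])` still contains the `wt0` SSH rule and `policy drop`, which is the property the bullet above asks for.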

So "per-host vs central" is answered: **both**, with clear ownership.

### Single source of truth — a shared service catalog

A central, declarative **service catalog** in `group_vars/` is the one source of truth
for firewall intent (aligning with ADR-002's "port definitions live in `group_vars/`",
and keeping connectivity *topology* in inventory rather than in any one self-contained
service role — ADR-004). Each entry describes a service's **ingress**:

```yaml
photoprism:
  ingress:
    - { from: reverse_proxy, port: 2342, proto: tcp }
reverse_proxy:
  ingress:
    - { from: lan, port: 443, proto: tcp }
```
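
Once loaded (for example by Ansible from `group_vars/`), the YAML above is a plain nested mapping of service name to an `ingress` list. A shape check could run before any rendering so that a malformed catalog fails loudly instead of producing a broken ruleset. This is a hypothetical Python sketch: the catalog schema is deferred to the host-nftables spec, so `validate_catalog` and the accepted `proto` values are assumptions.

```python
from __future__ import annotations

from typing import Any

VALID_PROTOS = {"tcp", "udp"}  # assumed; the real schema is still to be decided


def validate_catalog(catalog: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the shape is OK."""
    problems: list[str] = []
    for service, entry in catalog.items():
        ingress = entry.get("ingress") if isinstance(entry, dict) else None
        if not isinstance(ingress, list):
            problems.append(f"{service}: 'ingress' must be a list")
            continue
        for i, rule in enumerate(ingress):
            where = f"{service}.ingress[{i}]"
            if not isinstance(rule, dict):
                problems.append(f"{where}: must be a mapping")
                continue
            source = rule.get("from")
            if not isinstance(source, str) or not source:
                problems.append(f"{where}: 'from' must be a non-empty string")
            port = rule.get("port")
            # bool is a subclass of int, so reject it explicitly.
            if not isinstance(port, int) or isinstance(port, bool) or not 1 <= port <= 65535:
                problems.append(f"{where}: 'port' must be an integer in 1..65535")
            if rule.get("proto") not in VALID_PROTOS:
                problems.append(f"{where}: 'proto' must be one of {sorted(VALID_PROTOS)}")
    return problems
```

Only the shape is checked here; whether each `from` symbol actually resolves is a separate render-time step.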

`from` is **symbolic**, resolved at render time: a host/group → IP(s) from inventory; a
role (`reverse_proxy`) → the host(s) filling it; a VLAN/zone (`lan`) → the subnet from
the ADR-007 table. This keeps the catalog readable and resilient to IP changes.
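
The render-time lookup can be sketched in Python. The three dictionaries are made-up stand-ins for Ansible inventory and the ADR-007 VLAN table (in a real play the host and group data would more likely come from Ansible's `groups` and `hostvars` variables), and the precedence order (zone, then group/role, then host) is an assumption the ADR does not fix. Raising on an unknown symbol, rather than skipping it, keeps a typo in the catalog from silently dropping a rule.

```python
from __future__ import annotations

import ipaddress

# Made-up stand-ins for Ansible inventory and the ADR-007 VLAN table.
HOST_IPS = {"proxy01": "10.20.0.10", "photos01": "10.20.0.21"}
GROUPS = {"reverse_proxy": ["proxy01"], "photoprism": ["photos01"]}
ZONES = {"lan": "10.10.0.0/24", "srv": "10.20.0.0/24"}


def resolve_from(symbol: str) -> list[str]:
    """Resolve a catalog `from` value to IPs or CIDRs; unknown names are an error."""
    if symbol in ZONES:  # VLAN/zone -> its subnet
        return [str(ipaddress.ip_network(ZONES[symbol]))]
    if symbol in GROUPS:  # role/group -> IPs of the hosts currently filling it
        return sorted((HOST_IPS[h] for h in GROUPS[symbol]), key=ipaddress.ip_address)
    if symbol in HOST_IPS:  # single host -> its own IP
        return [HOST_IPS[symbol]]
    raise KeyError(f"unknown catalog source: {symbol!r}")
```

Because the catalog only ever names `reverse_proxy` or `lan`, moving the proxy to a new address changes `HOST_IPS` (inventory) and nothing in the catalog itself.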

### Each layer renders only its own slice

| Ingress rule | Host nftables | OPNsense |
|---|---|---|
| `from: reverse_proxy` (a `srv` peer) | allow proxy IP → port | — (intra-`srv`, invisible) |
| `from: lan` (cross-VLAN) | allow `lan` subnet → port | allow `lan` → host:port |

The dominant pattern falls out naturally: most services are **proxied** — their only
ingress is `from: reverse_proxy`, and users reach them through the reverse proxy, which
alone carries `from: lan, port: 443` (this matches "services sit behind the reverse proxy
with authentication", ADR-002).
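
The table's split can be sketched in Python: every resolved ingress rule is enforced on the destination host, and only rules whose source lies outside `srv` are also handed to OPNsense. The subnet, the `Rule` shape, and `split_slices` are illustrative assumptions. The sketch is IPv4-only because `ipaddress`'s `subnet_of()` raises `TypeError` when address families differ.

```python
from __future__ import annotations

import ipaddress
from typing import NamedTuple

SRV = ipaddress.ip_network("10.20.0.0/24")  # made-up srv (VLAN 20) subnet


class Rule(NamedTuple):
    source: str  # resolved source IP or CIDR
    dest: str    # IP of the host running the service
    port: int
    proto: str


def split_slices(rules: list[Rule]) -> tuple[list[Rule], list[Rule]]:
    """Return (host_slice, opnsense_slice).

    Every rule is enforced on the destination host. Only rules whose source
    lies outside srv also go to OPNsense, because intra-srv traffic is
    switched and never reaches the gateway.
    """
    host_slice: list[Rule] = []
    opnsense_slice: list[Rule] = []
    for rule in rules:
        host_slice.append(rule)
        source_net = ipaddress.ip_network(rule.source, strict=False)
        if not source_net.subnet_of(SRV):
            opnsense_slice.append(rule)
    return host_slice, opnsense_slice
```

With the two catalog entries shown earlier, the OPNsense slice ends up holding only the reverse proxy's `lan` → 443 rule, which is the proxied-by-default pattern described above.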

This was chosen over two alternatives: a single connectivity model that generates the
rules for both layers (too much machinery, and it tightly couples two very different rule
domains), and fully independent per-layer declarations (a real risk of drift).

### OPNsense automation — owned here, mechanism deferred

OPNsense is Ansible-managed (CLAUDE.md: "OPNsense is entirely Ansible; no Terraform
OPNsense provider"). Its Ansible automation renders the cross-VLAN slice of the catalog
plus the static ADR-007 facts. The **how** — config-XML templating vs the OPNsense API
vs a plugin — is deferred to the OPNsense-as-code follow-up spec and is recorded here as
an explicit open sub-decision.

## Guardrails

- **The catalog is authoritative.** If a port is not in the catalog, it does not exist —
  hardening the existing rule "never open a firewall port ad-hoc on a host" (ADR-002).
- **The `firewall` tag** (ADR-019) marks firewall tasks, so `--tags firewall` re-renders
  the rules without a full playbook run.
- **Drift detection (aspiration).** A deterministic check — in the spirit of
  `scripts/check-tags.py` — that compares each host's live `nft` ruleset / listening
  ports against the catalog and flags anything undeclared. Ties to TODO 8.5
  (`/security-review`). Not necessarily built first.
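
One half of that check (listening ports, not the `nft` ruleset itself) can be sketched in Python. It assumes the input is text captured from `ss -Htln` (TCP listeners, numeric, no header), treats SSH on port 22 as the always-allowed management plane, and ignores loopback-only listeners. `parse_ss_listeners`, `undeclared`, and the per-host service list are hypothetical names, not an existing script.

```python
from __future__ import annotations

# Allowed regardless of the catalog (management plane). Assumed: SSH on 22.
ALWAYS_ALLOWED = {("tcp", 22)}


def parse_ss_listeners(ss_output: str, proto: str = "tcp") -> set[tuple[str, int]]:
    """Parse `ss -Htln` lines (State Recv-Q Send-Q Local:Port Peer:Port ...).

    Loopback-only listeners are skipped because they are not reachable from
    the network.
    """
    found: set[tuple[str, int]] = set()
    for line in ss_output.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # blank or unexpected line
        addr, _, port = fields[3].rpartition(":")  # handles "[::]:8080" too
        if addr.startswith("127.") or addr == "[::1]":
            continue
        found.add((proto, int(port)))
    return found


def undeclared(
    live: set[tuple[str, int]], catalog: dict, host_services: list[str]
) -> set[tuple[str, int]]:
    """Return live (proto, port) pairs not declared by this host's catalog entries."""
    declared = {
        (rule["proto"], rule["port"])
        for name in host_services
        for rule in catalog.get(name, {}).get("ingress", [])
    }
    return live - declared - ALWAYS_ALLOWED
```

Restricting `declared` to the services placed on the host matters: otherwise a port declared for some other host's service would mask an undeclared listener here.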

## Consequences

- Lateral movement within `srv` is constrained — the gap OPNsense structurally can't
  close.
- One declarative catalog → no ad-hoc ports and no cross-layer drift on shared facts
  (ports, IPs, sources).
- Cost: the catalog + render-per-layer machinery must be built and maintained; east-west
  allowlisting adds per-service ingress declarations (mitigated by proxied-by-default,
  which keeps most entries to a single line).

## Scope

**Decided here:** the two-layer model and responsibilities; host nftables = default-deny
inbound + east-west allowlist + permissive egress + guaranteed management plane + Docker
`iptables: false`; the shared `group_vars` catalog as single source of truth with
symbolic sources; each layer renders its own slice; the no-ad-hoc-ports guardrail.

**Deferred to follow-up specs (each its own brainstorm → plan):**

1. **Host nftables implementation** in `base` — catalog schema, nftables template,
   Docker `iptables: false` integration, fail-safe ordering, Molecule tests. The natural
   next spec.
2. **OPNsense-as-code** — tooling mechanism + cross-VLAN rule rendering.
3. **Drift-detection check** — if/when built.

## Related

ADR-002 (security baseline: nftables default-deny, fail2ban, blast radius),
ADR-004 (Docker model: `iptables: false`), ADR-007 (network topology, VLANs, OPNsense,
per-VLAN egress), ADR-016 (NetBird mesh: SSH on `wt0` only), ADR-019 (`firewall` tag).