docs(spec): firewall strategy design (TODO 3.5 → ADR-020)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
parent 86bb3559ad
commit 2a65391c0e
1 changed file with 164 additions and 0 deletions
164
docs/superpowers/specs/2026-06-06-firewall-strategy-design.md
Normal file
@@ -0,0 +1,164 @@
# Design — Firewall strategy (two-layer model + shared catalog)

- **Date:** 2026-06-06
- **Status:** Approved design — pending implementation plan
- **Resolves:** TODO 3.5 ("Decide the firewall strategy — which firewall, ruleset,
  per-host vs central")
- **Becomes:** ADR-020 (this design is the basis for that ADR)
- **Scope note:** This is the **strategy** ADR. It pins the architecture and
  responsibilities; the detailed builds (host nftables in `base`, OPNsense-as-code) are
  separate follow-up specs (see *Scope*).

---

## Problem

boma needs a firewall strategy that is **predictable, declarative, and defends the
stated threat model** (opportunistic external, lateral movement / blast radius,
operator/agent error — ADR-002). The ADRs already commit to pieces of this — `nftables`
default-deny on hosts (ADR-002), OPNsense at the perimeter (ADR-007), Docker with
`iptables: false` (ADR-004) — but no document ties them together: *which layer owns
what, where firewall intent is declared, and how the two layers stay consistent.*
Without that, ports drift open ad-hoc and "per-host vs central" stays unanswered.

The roles that would hold the host firewall (`base`, `docker_host`) are empty, and there
is no OPNsense automation yet — so this is greenfield strategy work.

## The two-layer model

Two firewall layers, each with a distinct job; the host layer adds deliberate
defense-in-depth for the one thing the perimeter structurally cannot see.

### OPNsense — perimeter + inter-VLAN

Owns everything *between zones* and at the edge:

- WAN edge (the internet boundary).
- Inter-VLAN policy: `lan`/`iot`/`guest` → `srv`, `mgmt` access, the documented
  per-VLAN egress rules (ADR-007).
- **Structurally blind to intra-`srv` traffic**: services share the `srv` subnet
  (VLAN 20), which is switched and never reaches the OPNsense gateway.

### Host nftables — host-local + east-west within `srv` (in `base`)

Runs on every Debian VM:

- **Default-deny inbound**; allow loopback + established/related.
- **East-west allowlist**: a service host accepts a connection only from declared
  sources (e.g. the reverse proxy, a named peer). This is the lateral-movement control
  OPNsense cannot provide — the blast-radius goal in ADR-002.
- **Permissive egress**: allow outbound + established/related. Per-VLAN egress
  restriction stays at OPNsense (where it already lives, ADR-007). Rationale: host-level
  egress allowlisting is high-friction (every DNS/NTP/update/registry/webhook call must
  be enumerated) for limited additional benefit given OPNsense already bounds where each
  VLAN can go.
- **Docker integration**: Docker daemon runs with `"iptables": false`; nftables owns all
  filtering, including container traffic (ADR-004).
- **Guaranteed management plane**: loopback, established/related, and `wt0` (the NetBird
  overlay, ADR-016) for SSH + Ansible are *always* allowed, independent of the catalog,
  and the ruleset is applied atomically — so a malformed or empty catalog can never lock
  out management. (ADR-016: SSH is allowed only on `wt0`, not the LAN.)

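The host-layer bullets above can be sketched as a renderer that always emits the management-plane rules first and only then appends catalog-derived allow rules. This is a minimal sketch under assumptions, not the `base` role's implementation: the name `render_ruleset`, the `inet filter` table layout, and the idea that sources arrive already resolved to IPv4 addresses are all hypothetical, and the `forward` chain that container traffic would need is left out.

```python
from ipaddress import ip_network

# Hypothetical fixed rules: always emitted, never read from the catalog.
MANAGEMENT_RULES = [
    'iif "lo" accept',
    "ct state established,related accept",
    'iifname "wt0" tcp dport 22 accept',  # SSH only on the NetBird overlay
]


def render_ruleset(allow_rules):
    """Render a complete nftables ruleset as text.

    allow_rules: iterable of (source, port, proto) tuples, where source is
    an IPv4 address or CIDR that has already been resolved (assumption).
    """
    catalog_lines = []
    for source, port, proto in allow_rules:
        if proto not in ("tcp", "udp"):
            raise ValueError(f"unsupported proto: {proto!r}")
        if not (isinstance(port, int) and 1 <= port <= 65535):
            raise ValueError(f"invalid port: {port!r}")
        net = ip_network(source, strict=False)  # raises ValueError on bad input
        if net.version != 4:
            raise ValueError(f"sketch handles IPv4 only: {source!r}")
        catalog_lines.append(f"    ip saddr {net} {proto} dport {port} accept")

    lines = [
        "flush ruleset",
        "table inet filter {",
        "  chain input {",
        "    type filter hook input priority 0; policy drop;",
    ]
    lines += [f"    {rule}" for rule in MANAGEMENT_RULES]
    lines += catalog_lines
    lines += [
        "  }",
        "  chain output {",
        "    type filter hook output priority 0; policy accept;",
        "  }",
        "}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder address from the RFC 5737 documentation range.
    print(render_ruleset([("198.51.100.10", 2342, "tcp")]))
```

Every entry is validated before any text is returned, so a malformed catalog raises instead of producing a partial file, and an empty catalog still yields the loopback, established/related, and `wt0` SSH rules. Loading the rendered file with `nft -f` applies it as a single transaction, and `nft -c -f` can check it beforehand.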
## The shared service catalog (single source of truth)

A central, declarative **service catalog** in `group_vars/` is the one source of truth
for firewall intent. This aligns with ADR-002's existing rule that "port definitions
live in `group_vars/` so rules stay in sync with deployed services," and keeps
connectivity *topology* (inherently cross-cutting) in inventory rather than in any one
self-contained service role (ADR-004).

Each entry describes a service's **ingress** as a list of allow rules:

```yaml
photoprism:
  ingress:
    - { from: reverse_proxy, port: 2342, proto: tcp }
reverse_proxy:
  ingress:
    - { from: lan, port: 443, proto: tcp }
```

`from` is **symbolic**, resolved at render time:

- a **host or group** → IP(s) from inventory;
- a **role** (e.g. `reverse_proxy`) → the host(s) filling it;
- a **VLAN/zone** (e.g. `lan`) → the subnet from the ADR-007 table.

Symbolic sources keep the catalog readable and resilient to IP changes.

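The three resolution cases above can be sketched as a single lookup function. This is illustrative only: the dictionaries stand in for Ansible inventory and the ADR-007 subnet table, every host name and address is a hypothetical placeholder from the RFC 5737 documentation ranges, and the lookup order (zone, role, group, host) is an assumption the follow-up spec would need to settle.

```python
# Hypothetical stand-ins for inventory and the ADR-007 subnet table.
ZONES = {"lan": "192.0.2.0/24", "srv": "198.51.100.0/24"}
HOSTS = {"proxy01": "198.51.100.10", "photos01": "198.51.100.21"}
GROUPS = {"media": ["photos01"]}
ROLES = {"reverse_proxy": ["proxy01"]}


def resolve_source(symbol):
    """Resolve a catalog `from` symbol to a sorted list of addresses or CIDRs."""
    if symbol in ZONES:
        return [ZONES[symbol]]
    if symbol in ROLES:
        return sorted(HOSTS[name] for name in ROLES[symbol])
    if symbol in GROUPS:
        return sorted(HOSTS[name] for name in GROUPS[symbol])
    if symbol in HOSTS:
        return [HOSTS[symbol]]
    # Failing loudly keeps a typo from silently rendering no rule at all.
    raise KeyError(f"unknown catalog source: {symbol!r}")


if __name__ == "__main__":
    for name in ("reverse_proxy", "lan", "media"):
        print(name, "->", resolve_source(name))
```

Raising on an unknown symbol, rather than returning an empty list, is a deliberate choice in this sketch: a misspelled source then fails the render instead of quietly dropping an allow rule.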
### Each layer renders only its own slice

The same catalog feeds both layers; each filters for the rules it owns:

| Ingress rule | Host nftables | OPNsense |
|---|---|---|
| `from: reverse_proxy` (a `srv` peer) | allow proxy IP → port | — (intra-`srv`, invisible) |
| `from: lan` (cross-VLAN) | allow `lan` subnet → port | allow `lan` → host:port |

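The table above amounts to one filter per layer over the same data. A minimal sketch, assuming each symbolic source and each service can be mapped to a zone; the `SOURCE_ZONE` and `SERVICE_ZONE` tables and both function names are hypothetical, and only the catalog entries come from the example earlier in this document.

```python
# Catalog entries as in the YAML example, expressed as Python data.
CATALOG = {
    "photoprism": {
        "ingress": [{"from": "reverse_proxy", "port": 2342, "proto": "tcp"}],
    },
    "reverse_proxy": {
        "ingress": [{"from": "lan", "port": 443, "proto": "tcp"}],
    },
}

# Hypothetical zone lookups; in practice these would derive from inventory.
SOURCE_ZONE = {"reverse_proxy": "srv", "lan": "lan"}
SERVICE_ZONE = {"photoprism": "srv", "reverse_proxy": "srv"}


def host_slice(catalog, service):
    """Host nftables renders every ingress rule of a service it runs."""
    return list(catalog[service]["ingress"])


def opnsense_slice(catalog):
    """OPNsense renders only rules whose source zone differs from the
    service's own zone; same-zone traffic never reaches the gateway."""
    rules = []
    for service, entry in sorted(catalog.items()):
        for rule in entry["ingress"]:
            if SOURCE_ZONE[rule["from"]] != SERVICE_ZONE[service]:
                rules.append((rule["from"], service, rule["port"], rule["proto"]))
    return rules


if __name__ == "__main__":
    print(host_slice(CATALOG, "photoprism"))
    print(opnsense_slice(CATALOG))
```

With this example catalog, the OPNsense slice contains only the `lan` to `reverse_proxy` rule on port 443, while the host slice for `photoprism` contains its single `from: reverse_proxy` rule.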
The dominant pattern falls out naturally: most services are **proxied**. Their only
ingress is `from: reverse_proxy`; users reach them *through* the reverse proxy, which
alone carries `from: lan, port: 443`. This matches "services sit behind the reverse
proxy with authentication" (ADR-002).

"Shared catalog, each layer renders its own" was chosen over two alternatives: a single
connectivity model that generates both rulesets (too much machinery, and tight coupling
of two very different rule domains), and fully independent per-layer declarations (real
drift risk: a port opened on the host but not at OPNsense, or vice versa).

## OPNsense automation — owned here, mechanism deferred

OPNsense is **Ansible-managed** (CLAUDE.md: "OPNsense is entirely Ansible; do not reach
for a Terraform OPNsense provider"). It renders the **cross-VLAN slice** of the catalog
(every `from: <other-zone>` rule) plus the static ADR-007 facts (WAN edge, per-VLAN
egress, mgmt access, inter-VLAN defaults).

This ADR pins **what** OPNsense owns and that it renders from the shared catalog. The
**how** — config-XML templating vs the OPNsense API vs a plugin — is a substantial,
separate tooling decision, **deferred to the OPNsense-as-code follow-up spec**. Recorded
here as an explicit open sub-decision so it is not lost.

## Guardrails & enforcement

- **The catalog is authoritative.** If a port is not in the catalog, it does not exist.
  This hardens the existing CLAUDE.md guardrail ("never open a firewall port ad-hoc on a
  host") into a positive contract.
- **The `firewall` tag** (ADR-019) marks firewall tasks, so `--tags firewall` re-renders
  rules on `base` and any service role that contributes them.
- **Drift detection (aspiration).** A deterministic check — in the spirit of
  `scripts/check-tags.py` — compares each host's actual listening ports / live `nft`
  ruleset against the catalog and flags anything undeclared. Ties to TODO 8.5
  (`/security-review`) and the "undeclared open ports" pre-scan idea. Listed as a
  consequence and future guardrail; not necessarily built in the first implementation.

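The core of the drift check described above is a set difference between what a host is observed to listen on and what the catalog declares for it. The sketch below assumes the listening ports have already been collected into `(port, proto)` pairs by some other step; the function names, the `MANAGEMENT_PORTS` exemption, and port 8080 in the demo are hypothetical and not part of any existing script.

```python
# Hypothetical exemption: the management plane (SSH on wt0) is allowed
# independently of the catalog, so it must not be reported as drift.
MANAGEMENT_PORTS = {(22, "tcp")}


def declared_ports(catalog, services):
    """Collect the (port, proto) pairs the catalog declares for the given services."""
    return {
        (rule["port"], rule["proto"])
        for service in services
        for rule in catalog[service]["ingress"]
    }


def undeclared_ports(listening, catalog, services):
    """Return, sorted, every observed listener with no catalog entry behind it."""
    allowed = declared_ports(catalog, services) | MANAGEMENT_PORTS
    return sorted(set(listening) - allowed)


if __name__ == "__main__":
    catalog = {
        "photoprism": {
            "ingress": [{"from": "reverse_proxy", "port": 2342, "proto": "tcp"}],
        },
    }
    observed = [(22, "tcp"), (2342, "tcp"), (8080, "tcp")]
    print(undeclared_ports(observed, catalog, ["photoprism"]))
```

A check built this way stays deterministic because it compares plain data; how the observed ports are gathered (socket listing, the live `nft` ruleset, or both) is left open here, as it is in the design.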
## Consequences

- "Per-host vs central" is answered: **both**, with clear ownership — central perimeter
  (OPNsense) + per-host default-deny with east-west allowlisting, fed by one catalog.
- Lateral movement within `srv` is constrained (the gap OPNsense can't close).
- One declarative catalog means no ad-hoc ports and no cross-layer drift on the shared
  facts (ports, IPs, sources).
- Cost: the catalog and the render-per-layer machinery must be built and maintained;
  east-west allowlisting adds per-service ingress declarations (mitigated by the
  proxied-by-default pattern, which keeps most entries to a single line).

## Scope

**This ADR decides:** the two-layer model and each layer's responsibilities; host
nftables = default-deny inbound + east-west allowlist + permissive egress + guaranteed
management plane + Docker `iptables: false`; the shared `group_vars` service catalog as
single source of truth with symbolic sources; each layer renders its own slice; the
no-ad-hoc-ports guardrail.

**Deferred to follow-up specs (each its own brainstorm → plan):**

1. **Host nftables implementation** in `base` — exact catalog schema, nftables template
   structure, Docker `iptables: false` integration, fail-safe ordering, Molecule tests.
   The natural next spec.
2. **OPNsense-as-code** — the tooling mechanism + cross-VLAN rule rendering.
3. **Drift-detection check** — if/when we build it.

## Related

ADR-002 (security baseline: nftables default-deny, fail2ban, blast radius),
ADR-004 (Docker model: `iptables: false`), ADR-007 (network topology, VLANs, OPNsense,
per-VLAN egress), ADR-016 (NetBird mesh: SSH on `wt0` only), ADR-019 (`firewall` tag).