boma/docs/superpowers/plans/2026-06-06-firewall-strategy.md
sjat f700f4a475 docs(plan): firewall strategy ADR-020 landing plan
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 15:42:17 +02:00

# Firewall Strategy (ADR-020) Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Land the firewall *strategy* as ADR-020 and fold it into the living docs. No firewall code is built here (the host-nftables and OPNsense-as-code builds are separate follow-up specs).

**Architecture:** This is a documentation-only change. It creates `docs/decisions/020-firewall.md` from the approved design spec, then updates `CLAUDE.md` (Further reading + the firewall guardrail), `docs/TODO.md` (mark 3.5 decided), and `docs/CAPABILITIES.md` (point the firewall note at ADR-020). There is no executable code, so verification is consistency greps plus `make lint`.

**Tech Stack:** Markdown docs only. `make lint` (yamllint + ansible-lint + check-tags) must stay green; none of these tools lint Markdown content, but the run confirms nothing else broke.

---
## File structure
| File | Responsibility | Action |
|------|----------------|--------|
| `docs/decisions/020-firewall.md` | The firewall strategy ADR (two-layer model, shared catalog, deferred builds) | Create |
| `CLAUDE.md` | Add ADR-020 to *Further reading*; harden the firewall guardrail bullet to reference the catalog/ADR-020 | Modify |
| `docs/TODO.md` | Mark item 3.5 DECIDED (ADR-020) | Modify |
| `docs/CAPABILITIES.md` | Point the existing firewall parenthetical at ADR-020 + the two-layer model | Modify |

Notes for the implementer:
- The design spec this ADR is based on is `docs/superpowers/specs/2026-06-06-firewall-strategy-design.md` — read it if you need the full rationale, but the ADR text below is complete and self-contained.
- Existing ADRs live in `docs/decisions/`, numbered 001 through 019; this is 020. Match their concise, decision-focused tone (ADR-019 is a good recent reference).
- Before any `git commit`, the pre-commit hook runs and decrypts `vault.yml`, so the vault agent must be unlocked: run `rbw unlocked` (exit 0 = good). If locked, ask the user to `rbw unlock` and wait. None of these tasks touch vault files.
- Run `make lint` through the Makefile, which handles the repo venv wiring and paths.
---
### Task 1: Write ADR-020
**Files:**
- Create: `docs/decisions/020-firewall.md`
- [ ] **Step 1: Create the ADR**
Create `docs/decisions/020-firewall.md` with exactly this content:
````markdown
# ADR-020 — Firewall strategy: two-layer model with a shared service catalog
## Status
Accepted (2026-06-06). Resolves TODO 3.5 ("Decide the firewall strategy — which
firewall, ruleset, per-host vs central").
**Strategy ADR.** It pins the architecture and each layer's responsibilities; the
detailed builds are separate follow-up efforts (see *Scope*).
## Context
boma needs a firewall strategy that is predictable, declarative, and defends the stated
threat model — opportunistic external, lateral movement / blast radius, operator/agent
error (ADR-002). The pieces were already committed across other ADRs (`nftables`
default-deny on hosts — ADR-002; OPNsense at the perimeter — ADR-007; Docker with
`iptables: false` — ADR-004), but nothing tied them together: which layer owns what,
where firewall intent is declared, and how the layers stay consistent. Without that,
ports drift open ad-hoc and "per-host vs central" stays unanswered.
## Decision
### Two layers, distinct jobs
**OPNsense — perimeter + inter-VLAN.** Owns the WAN edge and all policy *between zones*:
`lan`/`iot`/`guest` → `srv`, `mgmt` access, and the per-VLAN egress rules (ADR-007). It
is **structurally blind to intra-`srv` traffic** — services share the switched `srv`
subnet (VLAN 20), which never reaches the gateway.
**Host nftables — host-local + east-west within `srv`** (in the `base` role, every VM):
- **Default-deny inbound**; allow loopback + established/related.
- **East-west allowlist**: a service host accepts a connection only from declared
sources (e.g. the reverse proxy, a named peer) — the lateral-movement control OPNsense
cannot provide.
- **Permissive egress**: allow outbound + established/related; per-VLAN egress
restriction stays at OPNsense (ADR-007). Host-level egress allowlisting is
high-friction (every DNS/NTP/update/registry/webhook must be enumerated) for limited
added benefit once the VLAN already bounds where a host can go.
- **Docker**: daemon runs with `"iptables": false`; nftables owns all filtering,
including container traffic (ADR-004).
- **Guaranteed management plane**: loopback, established/related, and `wt0` (NetBird,
ADR-016) for SSH + Ansible are always allowed, independent of the catalog, applied
atomically — a malformed or empty catalog can never lock out management. (ADR-016: SSH
is allowed only on `wt0`.)
So "per-host vs central" is answered: **both**, with clear ownership.
### Single source of truth — a shared service catalog
A central, declarative **service catalog** in `group_vars/` is the one source of truth
for firewall intent (aligning with ADR-002's "port definitions live in `group_vars/`",
and keeping connectivity *topology* in inventory rather than in any one self-contained
service role — ADR-004). Each entry describes a service's **ingress**:
```yaml
photoprism:
  ingress:
    - { from: reverse_proxy, port: 2342, proto: tcp }
reverse_proxy:
  ingress:
    - { from: lan, port: 443, proto: tcp }
```
`from` is **symbolic**, resolved at render time: a host/group → IP(s) from inventory; a
role (`reverse_proxy`) → the host(s) filling it; a VLAN/zone (`lan`) → the subnet from
the ADR-007 table. This keeps the catalog readable and resilient to IP changes.
### Each layer renders only its own slice
| Ingress rule | Host nftables | OPNsense |
|---|---|---|
| `from: reverse_proxy` (a `srv` peer) | allow proxy IP → port | — (intra-`srv`, invisible) |
| `from: lan` (cross-VLAN) | allow `lan` subnet → port | allow `lan` → host:port |

The dominant pattern falls out naturally: most services are **proxied**; their only
ingress is `from: reverse_proxy`, and users reach them through the reverse proxy, which
alone carries `from: lan, port: 443` (matches "services sit behind the reverse proxy
with authentication", ADR-002).
This was chosen over a single connectivity-model-generates-both (too much machinery,
tight coupling of two very different rule domains) and over fully independent per-layer
declarations (real drift risk).
### OPNsense automation — owned here, mechanism deferred
OPNsense is Ansible-managed (CLAUDE.md: "OPNsense is entirely Ansible; no Terraform
OPNsense provider"). It renders the cross-VLAN slice of the catalog plus the static
ADR-007 facts. The **how** — config-XML templating vs the OPNsense API vs a plugin — is
deferred to the OPNsense-as-code follow-up spec. Recorded as an explicit open
sub-decision.
## Guardrails
- **The catalog is authoritative.** If a port is not in the catalog, it does not exist —
hardening the existing rule "never open a firewall port ad-hoc on a host" (ADR-002).
- **The `firewall` tag** (ADR-019) marks firewall tasks; `--tags firewall` re-renders
rules.
- **Drift detection (aspiration).** A deterministic check — in the spirit of
`scripts/check-tags.py` — comparing each host's live `nft` ruleset / listening ports
against the catalog and flagging anything undeclared. Ties to TODO 8.5
(`/security-review`). Not necessarily built first.
## Consequences
- Lateral movement within `srv` is constrained — the gap OPNsense structurally can't
close.
- One declarative catalog → no ad-hoc ports and no cross-layer drift on shared facts
(ports, IPs, sources).
- Cost: the catalog + render-per-layer machinery must be built and maintained; east-west
allowlisting adds per-service ingress declarations (mitigated by proxied-by-default,
which keeps most entries to a single line).
## Scope
**Decided here:** the two-layer model and responsibilities; host nftables = default-deny
inbound + east-west allowlist + permissive egress + guaranteed management plane + Docker
`iptables:false`; the shared `group_vars` catalog as single source of truth with
symbolic sources; each layer renders its own slice; the no-ad-hoc-ports guardrail.
**Deferred to follow-up specs (each its own brainstorm → plan):**
1. **Host nftables implementation** in `base` — catalog schema, nftables template,
Docker `iptables:false` integration, fail-safe ordering, Molecule tests. The natural
next spec.
2. **OPNsense-as-code** — tooling mechanism + cross-VLAN rule rendering.
3. **Drift-detection check** — if/when built.
## Related
ADR-002 (security baseline: nftables default-deny, fail2ban, blast radius),
ADR-004 (Docker model: `iptables:false`), ADR-007 (network topology, VLANs, OPNsense,
per-VLAN egress), ADR-016 (NetBird mesh: SSH on `wt0` only), ADR-019 (`firewall` tag).
````
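
For orientation only (this is not part of the ADR file, and nothing in this plan builds it): the symbolic `from` resolution and the per-layer slicing described in the ADR could be sketched in Python roughly as follows. Everything concrete here is an assumption made for illustration: the subnets and IPs are placeholders, and the `ZONES`, `ROLES`, and `HOSTS` tables and the functions `resolve`, `is_intra_srv`, and `render` are hypothetical names. The real catalog schema and renderers belong to the follow-up specs.

```python
from ipaddress import ip_network

# Hypothetical facts; the real values come from inventory and the ADR-007 table.
ZONES = {"lan": "10.0.10.0/24", "srv": "10.0.20.0/24"}
ROLES = {"reverse_proxy": ["10.0.20.10"]}
HOSTS = {"photoprism": "10.0.20.21"}

# The catalog example from the ADR, as a Python dict.
CATALOG = {
    "photoprism": {"ingress": [{"from": "reverse_proxy", "port": 2342, "proto": "tcp"}]},
    "reverse_proxy": {"ingress": [{"from": "lan", "port": 443, "proto": "tcp"}]},
}


def resolve(source: str) -> list[str]:
    """Turn a symbolic `from` into CIDR strings: zone -> subnet, role/host -> /32s."""
    if source in ZONES:
        return [ZONES[source]]
    if source in ROLES:
        return [f"{ip}/32" for ip in ROLES[source]]
    if source in HOSTS:
        return [f"{HOSTS[source]}/32"]
    # Fail closed: an unknown name is an error, never a guess.
    raise KeyError(f"unknown source: {source}")


def is_intra_srv(cidr: str) -> bool:
    """True when the source sits inside the srv subnet, which OPNsense never sees."""
    return ip_network(cidr).subnet_of(ip_network(ZONES["srv"]))


def render(catalog: dict) -> tuple[list, list]:
    """Split the catalog into (host_rules, opnsense_rules).

    Every ingress rule lands in the host slice; only cross-VLAN sources
    also land in the OPNsense slice.
    """
    host_rules, opnsense_rules = [], []
    for service, entry in catalog.items():
        for rule in entry["ingress"]:
            for cidr in resolve(rule["from"]):
                item = (service, cidr, rule["port"], rule["proto"])
                host_rules.append(item)
                if not is_intra_srv(cidr):
                    opnsense_rules.append(item)
    return host_rules, opnsense_rules
```

With the sample catalog, `render(CATALOG)` puts both rules in the host slice and only the `from: lan` rule in the OPNsense slice, which mirrors the two rows of the ADR's rendering table.
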
- [ ] **Step 2: Verify the file is well-formed**
Run:
```bash
test -f docs/decisions/020-firewall.md && grep -c "^## " docs/decisions/020-firewall.md
```
Expected: exit 0 and a printed count of `7` (the H2 sections: Status, Context, Decision, Guardrails, Consequences, Scope, Related — H3 subsections under Decision are not counted by `^## `).
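
The count alone would not notice a misspelled or reordered section. As an optional, stricter sketch (the helper name `h2_headings` and the `EXPECTED` list are assumptions introduced here, not existing repo tooling), the H2 titles can be compared by name while skipping anything inside fenced code:

```python
FENCE = "`" * 3  # a Markdown code-fence marker, built here to keep this block fence-safe

# The seven H2 sections the ADR is expected to contain, in order.
EXPECTED = ["Status", "Context", "Decision", "Guardrails", "Consequences", "Scope", "Related"]


def h2_headings(markdown: str) -> list[str]:
    """Return H2 heading titles in document order, ignoring fenced code blocks."""
    titles, in_fence = [], False
    for line in markdown.splitlines():
        if line.startswith(FENCE):
            in_fence = not in_fence  # toggle on every opening or closing fence
        elif not in_fence and line.startswith("## "):
            titles.append(line[3:].strip())
    return titles
```

Reading `docs/decisions/020-firewall.md` with `pathlib.Path(...).read_text()` and comparing `h2_headings(text) == EXPECTED` would then replace the bare count.
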
- [ ] **Step 3: Commit**
```bash
git add docs/decisions/020-firewall.md
git commit -m "docs(adr): ADR-020 firewall strategy (two-layer + shared catalog)"
```
---
### Task 2: Wire ADR-020 into CLAUDE.md
**Files:**
- Modify: `CLAUDE.md` (Further reading table; firewall guardrail bullet)
- [ ] **Step 1: Add ADR-020 to the Further reading table**
In `CLAUDE.md`, find this row (around line 225):
```markdown
| Tagging & run-targeting | `docs/decisions/019-tagging.md` |
```
Add this row immediately after it:
```markdown
| Firewall strategy | `docs/decisions/020-firewall.md` |
```
(Exact column padding need not match perfectly — just produce a valid Markdown table row consistent with the surrounding rows.)
- [ ] **Step 2: Harden the firewall guardrail bullet**
In `CLAUDE.md`, find this bullet (around line 172, under "What Claude must not do without explicit instruction"):
```markdown
- Open a firewall port anywhere but the `group_vars` firewall definitions — never ad-hoc on a host (ADR-002)
```
Replace it with:
```markdown
- Open a firewall port anywhere but the `group_vars` service catalog — never ad-hoc on a host. If it's not in the catalog, it doesn't exist (ADR-002, ADR-020)
```
- [ ] **Step 3: Verify both edits**
Run:
```bash
grep -n "020-firewall" CLAUDE.md && grep -n "service catalog" CLAUDE.md
```
Expected: the Further reading row matches `020-firewall`, and the guardrail bullet now contains "service catalog".
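
The greps confirm the new text exists but not that the old guardrail wording is gone. A small optional sketch of that stricter check follows; the function name `guardrail_updated` is hypothetical, and it assumes the old phrase appears nowhere else in `CLAUDE.md`:

```python
OLD = "`group_vars` firewall definitions"  # wording of the bullet being replaced
NEW = "`group_vars` service catalog"       # wording of the hardened bullet


def guardrail_updated(text: str) -> bool:
    """True when the new guardrail wording is present and the old wording is gone."""
    return NEW in text and OLD not in text
```

Calling it on the contents of `CLAUDE.md` should return `True` once Step 2 has been applied.
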
- [ ] **Step 4: Commit**
```bash
git add CLAUDE.md
git commit -m "docs: link ADR-020; harden firewall guardrail to the service catalog"
```
---
### Task 3: Mark TODO 3.5 decided
**Files:**
- Modify: `docs/TODO.md` (item 3.5)
- [ ] **Step 1: Strike through and annotate item 3.5**
In `docs/TODO.md`, find this line (around line 26):
```markdown
5. Decide the firewall strategy (which firewall, ruleset, per-host vs central).
```
Replace it with:
```markdown
5. ~~Decide the firewall strategy (which firewall, ruleset, per-host vs central).~~
DECIDED (ADR-020): two layers — OPNsense (perimeter + inter-VLAN) + host nftables
(default-deny inbound + east-west allowlist, permissive egress). Single source of
truth: a `group_vars` service catalog with symbolic sources; each layer renders
its own slice. Builds deferred to follow-up specs (host nftables in `base`, then
OPNsense-as-code).
```
- [ ] **Step 2: Verify**
Run: `grep -n "DECIDED (ADR-020)" docs/TODO.md`
Expected: one match on the item 3.5 annotation.
- [ ] **Step 3: Commit**
```bash
git add docs/TODO.md
git commit -m "docs(todo): mark 3.5 firewall strategy decided (ADR-020)"
```
---
### Task 4: Update CAPABILITIES.md firewall note
**Files:**
- Modify: `docs/CAPABILITIES.md` (the firewall parenthetical in §1 Edge & networking, around line 32)
- [ ] **Step 1: Point the firewall note at ADR-020**
In `docs/CAPABILITIES.md`, find this line (around line 32, just under the §1 table):
```markdown
_(DHCP, firewall, mDNS reflection live on OPNsense — Ansible-managed, not containers.)_
```
Replace it with:
```markdown
_(DHCP, firewall, mDNS reflection live on OPNsense — Ansible-managed, not containers.)_
_Firewalling is two-layer (ADR-020): OPNsense at the perimeter + inter-VLAN, plus
per-host `nftables` (default-deny inbound + east-west allowlist) rendered by the `base`
role from a shared `group_vars` service catalog. Both layers are still to be built._
```
- [ ] **Step 2: Verify and run the full lint suite**
Run:
```bash
grep -n "ADR-020" docs/CAPABILITIES.md && make lint
```
Expected: the new ADR-020 note is found, and `make lint` passes (yamllint clean, ansible-lint clean, `check-tags: OK`).
- [ ] **Step 3: Commit**
```bash
git add docs/CAPABILITIES.md
git commit -m "docs(capabilities): note two-layer firewall model (ADR-020)"
```
---
## Final verification
- [ ] Confirm cross-references resolve:
```bash
ls docs/decisions/020-firewall.md && grep -rl "ADR-020\|020-firewall" CLAUDE.md docs/TODO.md docs/CAPABILITIES.md
```
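Expected: the ADR path is printed, followed by all three file names (`CLAUDE.md`, `docs/TODO.md`, `docs/CAPABILITIES.md`). `grep -l` exits 0 as soon as any one file matches, so check the printed list rather than the exit code.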
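
A stricter variant of this cross-reference check, as a sketch: report every living doc that lacks a reference, so that an empty result is the only passing outcome. The helper name `missing_references` is an assumption for illustration, not existing repo tooling.

```python
import re

# Same pattern the grep uses: either the ADR id or the file stem.
ADR_REF = re.compile(r"ADR-020|020-firewall")


def missing_references(docs: dict[str, str]) -> list[str]:
    """Return the names of documents whose text never mentions ADR-020."""
    return [name for name, text in docs.items() if not ADR_REF.search(text)]
```

Building `docs` from the three files (name mapped to `pathlib.Path(name).read_text()`) and asserting that the result is `[]` makes the check fail if even one file was missed.
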
- [ ] `make lint` passes end to end.
- [ ] `git log --oneline -4` shows the four task commits.
- [ ] Sanity: the ADR's *Scope* section names the two deferred build specs (host nftables in `base`, OPNsense-as-code) so the next brainstorm has an obvious starting point.