docs(capabilities): note two-layer firewall model (ADR-020)

docs(todo): mark 3.5 firewall strategy decided (ADR-020)
docs: link ADR-020; harden firewall guardrail to the service catalog
2026-06-06 16:00:19 +02:00 · 2026-06-06 16:00:01 +02:00 · 2026-06-06 15:59:47 +02:00 · 2026-06-06 15:59:30 +02:00 · 2026-06-06 15:57:40 +02:00 · 2026-06-06 15:42:17 +02:00
7 changed files with 657 additions and 2 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -169,7 +169,7 @@ Single-contributor, trunk-based (no merge requests / approval gates):
 - Edit vault-encrypted files directly — decrypt first, re-encrypt after
 - Force-push or rewrite already-pushed history on `main`
 - Add a collection to `requirements.yml` without a specific module need in existing role tasks
- Open a firewall port anywhere but the `group_vars` firewall definitions — never ad-hoc on a host (ADR-002)
+- Open a firewall port anywhere but the `group_vars` service catalog — never ad-hoc on a host. If it's not in the catalog, it doesn't exist (ADR-002, ADR-020)
 - Disable or weaken a baseline control from ADR-002 (SSH hardening, nftables default-deny, fail2ban, auditd)
 - Expose a service to the LAN/WAN without it sitting behind the reverse proxy with authentication (ADR-002)
 - Deploy a service that hasn't cleared `docs/security/service-checklist.md` (record any deviation in `docs/security/accepted-risks.md`)
@@ -223,6 +223,7 @@ Single-contributor, trunk-based (no merge requests / approval gates):
 | Hardware & capacity    | `docs/decisions/012-hardware-capacity.md` |
 | Logging & log integrity | `docs/decisions/018-logging.md` |
 | Tagging & run-targeting | `docs/decisions/019-tagging.md` |
+| Firewall strategy      | `docs/decisions/020-firewall.md`      |
 | Adding a new role      | `docs/runbooks/new-role.md`           |
 | Adding a new host      | `docs/runbooks/new-host.md`           |
 | Rotating vault secrets | `docs/runbooks/rotate-secrets.md`     |
--- a/docs/CAPABILITIES.md
+++ b/docs/CAPABILITIES.md
@@ -31,6 +31,10 @@ decisions this frame enables.

 _(DHCP, firewall, mDNS reflection live on OPNsense — Ansible-managed, not containers.)_

+_Firewalling is two-layer (ADR-020): OPNsense at the perimeter + inter-VLAN, plus
+per-host `nftables` (default-deny inbound + east-west allowlist) rendered by the `base`
+role from a shared `group_vars` service catalog. Both layers are still to be built._
+
 ## 2. Identity & access — [P]

 | Capability | Candidate service(s) | Tier | Commitment | What it does | Notes / open |
--- a/docs/FRICTION.md
+++ b/docs/FRICTION.md
@@ -77,3 +77,20 @@ earning its keep.
    ADR's Deferred list** in the same change. Three hits now — promote from "worth a
    check" to **build it**: a `/review-repo` rule flagging any ADR "Deferred/Open" entry
    whose subject is named as RESOLVED/DECIDED elsewhere.
+
+## 2026-06-06
+
+- `[recurring]` **Asked the execution-mode question AGAIN** ("subagent-driven vs inline —
+  which approach?") at the end of `writing-plans`, despite the 2026-06-05 standing
+  preference *and* the `always-subagent-driven-execution` memory both saying don't ask.
+  Root cause: the `writing-plans` skill's "Execution Handoff" step scripts the menu, and
+  I followed the skill text over the user's standing override. Second occurrence →
+  escalate from "skip the prompt" to a **hard rule**: never present the execution-mode
+  menu; finishing a plan means defaulting straight to subagent-driven.
+- `[friction]` **Don't pause for approval between writing a plan and implementing it.**
+  The user has standing pre-approval to carry straight through plan → implementation. The
+  brainstorming/plan flow already has explicit approval gates (design approval, spec
+  review); adding another "shall I proceed to implement?" gate after the plan is written
+  is redundant friction. → After `writing-plans` finishes, begin subagent-driven
+  implementation directly. The only reason to stop is a genuine blocker or ambiguity, not
+  a routine checkpoint.
--- a/docs/TODO.md
+++ b/docs/TODO.md
@@ -23,7 +23,12 @@
      translate-don't-transplant — V4 is a source only of gotchas + working config
      snippets, re-derived on boma's terms; never structure/requirements/values.
   4. Decide what each node runs — base packages plus which apps/services.
-   5. Decide the firewall strategy (which firewall, ruleset, per-host vs central).
+   5. ~~Decide the firewall strategy (which firewall, ruleset, per-host vs central).~~
+      DECIDED (ADR-020): two layers — OPNsense (perimeter + inter-VLAN) + host nftables
+      (default-deny inbound + east-west allowlist, permissive egress). Single source of
+      truth: a `group_vars` service catalog with symbolic sources; each layer renders
+      its own slice. Builds deferred to follow-up specs (host nftables in `base`, then
+      OPNsense-as-code).
   6. Wire up the monitoring stack. Logging topology DECIDED (ADR-018): cluster Loki
      (all logs) + off-site security subset on `askari` + Grafana on-cluster (not the
      whole stack on `askari`). Still to design/build: Prometheus + metric exporters,
--- a/docs/decisions/020-firewall.md
+++ b/docs/decisions/020-firewall.md
@@ -0,0 +1,133 @@
+# ADR-020 — Firewall strategy: two-layer model with a shared service catalog
+
+## Status
+
+Accepted (2026-06-06). Resolves TODO 3.5 ("Decide the firewall strategy — which
+firewall, ruleset, per-host vs central").
+
+**Strategy ADR.** It pins the architecture and each layer's responsibilities; the
+detailed builds are separate follow-up efforts (see *Scope*).
+
+## Context
+
+boma needs a firewall strategy that is predictable, declarative, and defends the stated
+threat model — opportunistic external, lateral movement / blast radius, operator/agent
+error (ADR-002). The pieces were already committed across other ADRs (`nftables`
+default-deny on hosts — ADR-002; OPNsense at the perimeter — ADR-007; Docker with
+`iptables: false` — ADR-004), but nothing tied them together: which layer owns what,
+where firewall intent is declared, and how the layers stay consistent. Without that,
+ports drift open ad-hoc and "per-host vs central" stays unanswered.
+
+## Decision
+
+### Two layers, distinct jobs
+
+**OPNsense — perimeter + inter-VLAN.** Owns the WAN edge and all policy *between zones*:
+`lan`/`iot`/`guest` → `srv`, `mgmt` access, and the per-VLAN egress rules (ADR-007). It
+is **structurally blind to intra-`srv` traffic** — services share the switched `srv`
+subnet (VLAN 20), which never reaches the gateway.
+
+**Host nftables — host-local + east-west within `srv`** (in the `base` role, every VM):
+
+- **Default-deny inbound**; allow loopback + established/related.
+- **East-west allowlist**: a service host accepts a connection only from declared
+  sources (e.g. the reverse proxy, a named peer) — the lateral-movement control OPNsense
+  cannot provide.
+- **Permissive egress**: allow outbound + established/related; per-VLAN egress
+  restriction stays at OPNsense (ADR-007). Host-level egress allowlisting is
+  high-friction (every DNS/NTP/update/registry/webhook must be enumerated) for limited
+  added benefit once the VLAN already bounds where a host can go.
+- **Docker**: daemon runs with `"iptables": false`; nftables owns all filtering,
+  including container traffic (ADR-004).
+- **Guaranteed management plane**: loopback, established/related, and `wt0` (NetBird,
+  ADR-016) for SSH + Ansible are always allowed, independent of the catalog, applied
+  atomically — a malformed or empty catalog can never lock out management. (ADR-016: SSH
+  is allowed only on `wt0`.)
+
+So "per-host vs central" is answered: **both**, with clear ownership.
+
+### Single source of truth — a shared service catalog
+
+A central, declarative **service catalog** in `group_vars/` is the one source of truth
+for firewall intent (aligning with ADR-002's "port definitions live in `group_vars/`",
+and keeping connectivity *topology* in inventory rather than in any one self-contained
+service role — ADR-004). Each entry describes a service's **ingress**:
+
+```yaml
+photoprism:
+  ingress:
+    - { from: reverse_proxy, port: 2342, proto: tcp }
+reverse_proxy:
+  ingress:
+    - { from: lan, port: 443, proto: tcp }
+```
+
+`from` is **symbolic**, resolved at render time: a host/group → IP(s) from inventory; a
+role (`reverse_proxy`) → the host(s) filling it; a VLAN/zone (`lan`) → the subnet from
+the ADR-007 table. This keeps the catalog readable and resilient to IP changes.
+
+### Each layer renders only its own slice
+
+| Ingress rule | Host nftables | OPNsense |
+|---|---|---|
+| `from: reverse_proxy` (a `srv` peer) | allow proxy IP → port | — (intra-`srv`, invisible) |
+| `from: lan` (cross-VLAN) | allow `lan` subnet → port | allow `lan` → host:port |
+
+The dominant pattern falls out naturally: most services are **proxied** — their only
+ingress is `from: reverse_proxy`, and users reach them through the reverse proxy, which
+alone carries `from: lan, port: 443` (matches "services sit behind the reverse proxy
+with authentication", ADR-002).
+
+This was chosen over a single connectivity-model-generates-both (too much machinery,
+tight coupling of two very different rule domains) and over fully independent per-layer
+declarations (real drift risk).
+
+### OPNsense automation — owned here, mechanism deferred
+
+OPNsense is Ansible-managed (CLAUDE.md: "OPNsense is entirely Ansible; no Terraform
+OPNsense provider"). It renders the cross-VLAN slice of the catalog plus the static
+ADR-007 facts. The **how** — config-XML templating vs the OPNsense API vs a plugin — is
+deferred to the OPNsense-as-code follow-up spec. Recorded as an explicit open
+sub-decision.
+
+## Guardrails
+
+- **The catalog is authoritative.** If a port is not in the catalog, it does not exist —
+  hardening the existing rule "never open a firewall port ad-hoc on a host" (ADR-002).
+- **The `firewall` tag** (ADR-019) marks firewall tasks; `--tags firewall` re-renders
+  rules.
+- **Drift detection (aspiration).** A deterministic check — in the spirit of
+  `scripts/check-tags.py` — comparing each host's live `nft` ruleset / listening ports
+  against the catalog and flagging anything undeclared. Ties to TODO 8.5
+  (`/security-review`). Not necessarily built first.
+
+## Consequences
+
+- Lateral movement within `srv` is constrained — the gap OPNsense structurally can't
+  close.
+- One declarative catalog → no ad-hoc ports and no cross-layer drift on shared facts
+  (ports, IPs, sources).
+- Cost: the catalog + render-per-layer machinery must be built and maintained; east-west
+  allowlisting adds per-service ingress declarations (mitigated by proxied-by-default,
+  which keeps most entries to a single line).
+
+## Scope
+
+**Decided here:** the two-layer model and responsibilities; host nftables = default-deny
+inbound + east-west allowlist + permissive egress + guaranteed management plane + Docker
+`iptables:false`; the shared `group_vars` catalog as single source of truth with
+symbolic sources; each layer renders its own slice; the no-ad-hoc-ports guardrail.
+
+**Deferred to follow-up specs (each its own brainstorm → plan):**
+
+1. **Host nftables implementation** in `base` — catalog schema, nftables template,
+   Docker `iptables:false` integration, fail-safe ordering, Molecule tests. The natural
+   next spec.
+2. **OPNsense-as-code** — tooling mechanism + cross-VLAN rule rendering.
+3. **Drift-detection check** — if/when built.
+
+## Related
+
+ADR-002 (security baseline: nftables default-deny, fail2ban, blast radius),
+ADR-004 (Docker model: `iptables:false`), ADR-007 (network topology, VLANs, OPNsense,
+per-VLAN egress), ADR-016 (NetBird mesh: SSH on `wt0` only), ADR-019 (`firewall` tag).
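The symbolic `from` resolution and the "each layer renders only its own slice" rule in ADR-020 can be sketched as follows. This is a minimal illustration under stated assumptions, not the repo's implementation: the catalog schema is explicitly deferred to the host-nftables spec, and the `ZONES`, `ROLES`, and `HOSTS` tables, the host names, and the documentation-range addresses are all hypothetical stand-ins for inventory and the ADR-007 subnet table. Inventory groups are omitted for brevity.

```python
# Hypothetical inventory facts; none of these names or addresses come from the repo.
ZONES = {"lan": "192.0.2.0/24", "srv": "198.51.100.0/24"}  # zone -> subnet
ROLES = {"reverse_proxy": ["proxy01"]}                     # role -> hosts filling it
HOSTS = {"proxy01": "198.51.100.10", "photo01": "198.51.100.20"}

# The two catalog entries shown in ADR-020.
CATALOG = {
    "photoprism": {"ingress": [{"from": "reverse_proxy", "port": 2342, "proto": "tcp"}]},
    "reverse_proxy": {"ingress": [{"from": "lan", "port": 443, "proto": "tcp"}]},
}


def resolve(source: str) -> list[str]:
    """Turn a symbolic `from` value into concrete addresses or subnets."""
    if source in ZONES:
        return [ZONES[source]]
    if source in ROLES:
        return [HOSTS[host] for host in ROLES[source]]
    if source in HOSTS:
        return [HOSTS[source]]
    # Fail loudly: an unknown source must never silently render as "allow nothing".
    raise KeyError(f"unknown catalog source: {source!r}")


def is_cross_zone(source: str, local_zone: str = "srv") -> bool:
    """A rule crosses VLANs when its source is a zone other than the local one."""
    return source in ZONES and source != local_zone


def render_slices(catalog: dict) -> tuple[list, list]:
    """Split the catalog into (host nftables rules, OPNsense rules).

    Host nftables receives every ingress rule; OPNsense receives only the
    cross-VLAN ones, because intra-`srv` traffic never reaches the gateway.
    """
    host_rules, opnsense_rules = [], []
    for service, entry in sorted(catalog.items()):
        for rule in entry["ingress"]:
            for addr in resolve(rule["from"]):
                rendered = (service, addr, rule["port"], rule["proto"])
                host_rules.append(rendered)
                if is_cross_zone(rule["from"]):
                    opnsense_rules.append(rendered)
    return host_rules, opnsense_rules
```

With the sample data, `render_slices(CATALOG)` places both rules in the host slice and only the `from: lan` rule in the OPNsense slice, matching the table in the ADR. Raising on an unknown source is one possible design choice; it keeps a typo in the catalog from rendering as a silently missing rule.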
--- a/docs/superpowers/plans/2026-06-06-firewall-strategy.md
+++ b/docs/superpowers/plans/2026-06-06-firewall-strategy.md
@@ -0,0 +1,331 @@
+# Firewall Strategy (ADR-020) Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Land the firewall *strategy* as ADR-020 and fold it into the living docs — no firewall code is built here (the host-nftables and OPNsense-as-code builds are separate follow-up specs).
+
+**Architecture:** This is a documentation-only change. It creates `docs/decisions/020-firewall.md` from the approved design spec, then updates CLAUDE.md (Further reading + the firewall guardrail), `docs/TODO.md` (mark 3.5 decided), and `docs/CAPABILITIES.md` (point the firewall note at ADR-020). There is no executable code, so verification is consistency greps + `make lint`.
+
+**Tech Stack:** Markdown docs only. `make lint` (yamllint + ansible-lint + check-tags) must stay green; none of these tools lint Markdown content, but the run confirms nothing else broke.
+
+---
+
+## File structure
+
+| File | Responsibility | Action |
+|------|----------------|--------|
+| `docs/decisions/020-firewall.md` | The firewall strategy ADR (two-layer model, shared catalog, deferred builds) | Create |
+| `CLAUDE.md` | Add ADR-020 to *Further reading*; harden the firewall guardrail bullet to reference the catalog/ADR-020 | Modify |
+| `docs/TODO.md` | Mark item 3.5 DECIDED (ADR-020) | Modify |
+| `docs/CAPABILITIES.md` | Point the existing firewall parenthetical at ADR-020 + the two-layer model | Modify |
+
+Notes for the implementer:
+- The design spec this ADR is based on is `docs/superpowers/specs/2026-06-06-firewall-strategy-design.md` — read it if you need the full rationale, but the ADR text below is complete and self-contained.
+- Existing ADRs live in `docs/decisions/` numbered 001–019; this is 020. Match their concise, decision-focused tone (ADR-019 is a good recent reference).
+- Before any `git commit`, the pre-commit hook runs and decrypts `vault.yml`, so the vault agent must be unlocked: run `rbw unlocked` (exit 0 = good). If locked, ask the user to `rbw unlock` and wait. None of these tasks touch vault files.
+- Run `make lint` as-is: the Makefile wires up the repo venv and handles paths.
+
+---
+
+### Task 1: Write ADR-020
+
+**Files:**
+- Create: `docs/decisions/020-firewall.md`
+
+- [ ] **Step 1: Create the ADR**
+
+Create `docs/decisions/020-firewall.md` with exactly this content:
+
+````markdown
+# ADR-020 — Firewall strategy: two-layer model with a shared service catalog
+
+## Status
+
+Accepted (2026-06-06). Resolves TODO 3.5 ("Decide the firewall strategy — which
+firewall, ruleset, per-host vs central").
+
+**Strategy ADR.** It pins the architecture and each layer's responsibilities; the
+detailed builds are separate follow-up efforts (see *Scope*).
+
+## Context
+
+boma needs a firewall strategy that is predictable, declarative, and defends the stated
+threat model — opportunistic external, lateral movement / blast radius, operator/agent
+error (ADR-002). The pieces were already committed across other ADRs (`nftables`
+default-deny on hosts — ADR-002; OPNsense at the perimeter — ADR-007; Docker with
+`iptables: false` — ADR-004), but nothing tied them together: which layer owns what,
+where firewall intent is declared, and how the layers stay consistent. Without that,
+ports drift open ad-hoc and "per-host vs central" stays unanswered.
+
+## Decision
+
+### Two layers, distinct jobs
+
+**OPNsense — perimeter + inter-VLAN.** Owns the WAN edge and all policy *between zones*:
+`lan`/`iot`/`guest` → `srv`, `mgmt` access, and the per-VLAN egress rules (ADR-007). It
+is **structurally blind to intra-`srv` traffic** — services share the switched `srv`
+subnet (VLAN 20), which never reaches the gateway.
+
+**Host nftables — host-local + east-west within `srv`** (in the `base` role, every VM):
+
+- **Default-deny inbound**; allow loopback + established/related.
+- **East-west allowlist**: a service host accepts a connection only from declared
+  sources (e.g. the reverse proxy, a named peer) — the lateral-movement control OPNsense
+  cannot provide.
+- **Permissive egress**: allow outbound + established/related; per-VLAN egress
+  restriction stays at OPNsense (ADR-007). Host-level egress allowlisting is
+  high-friction (every DNS/NTP/update/registry/webhook must be enumerated) for limited
+  added benefit once the VLAN already bounds where a host can go.
+- **Docker**: daemon runs with `"iptables": false`; nftables owns all filtering,
+  including container traffic (ADR-004).
+- **Guaranteed management plane**: loopback, established/related, and `wt0` (NetBird,
+  ADR-016) for SSH + Ansible are always allowed, independent of the catalog, applied
+  atomically — a malformed or empty catalog can never lock out management. (ADR-016: SSH
+  is allowed only on `wt0`.)
+
+So "per-host vs central" is answered: **both**, with clear ownership.
+
+### Single source of truth — a shared service catalog
+
+A central, declarative **service catalog** in `group_vars/` is the one source of truth
+for firewall intent (aligning with ADR-002's "port definitions live in `group_vars/`",
+and keeping connectivity *topology* in inventory rather than in any one self-contained
+service role — ADR-004). Each entry describes a service's **ingress**:
+
+```yaml
+photoprism:
+  ingress:
+    - { from: reverse_proxy, port: 2342, proto: tcp }
+reverse_proxy:
+  ingress:
+    - { from: lan, port: 443, proto: tcp }
+```
+
+`from` is **symbolic**, resolved at render time: a host/group → IP(s) from inventory; a
+role (`reverse_proxy`) → the host(s) filling it; a VLAN/zone (`lan`) → the subnet from
+the ADR-007 table. This keeps the catalog readable and resilient to IP changes.
+
+### Each layer renders only its own slice
+
+| Ingress rule | Host nftables | OPNsense |
+|---|---|---|
+| `from: reverse_proxy` (a `srv` peer) | allow proxy IP → port | — (intra-`srv`, invisible) |
+| `from: lan` (cross-VLAN) | allow `lan` subnet → port | allow `lan` → host:port |
+
+The dominant pattern falls out naturally: most services are **proxied** — their only
+ingress is `from: reverse_proxy`, and users reach them through the reverse proxy, which
+alone carries `from: lan, port: 443` (matches "services sit behind the reverse proxy
+with authentication", ADR-002).
+
+This was chosen over a single connectivity-model-generates-both (too much machinery,
+tight coupling of two very different rule domains) and over fully independent per-layer
+declarations (real drift risk).
+
+### OPNsense automation — owned here, mechanism deferred
+
+OPNsense is Ansible-managed (CLAUDE.md: "OPNsense is entirely Ansible; no Terraform
+OPNsense provider"). It renders the cross-VLAN slice of the catalog plus the static
+ADR-007 facts. The **how** — config-XML templating vs the OPNsense API vs a plugin — is
+deferred to the OPNsense-as-code follow-up spec. Recorded as an explicit open
+sub-decision.
+
+## Guardrails
+
+- **The catalog is authoritative.** If a port is not in the catalog, it does not exist —
+  hardening the existing rule "never open a firewall port ad-hoc on a host" (ADR-002).
+- **The `firewall` tag** (ADR-019) marks firewall tasks; `--tags firewall` re-renders
+  rules.
+- **Drift detection (aspiration).** A deterministic check — in the spirit of
+  `scripts/check-tags.py` — comparing each host's live `nft` ruleset / listening ports
+  against the catalog and flagging anything undeclared. Ties to TODO 8.5
+  (`/security-review`). Not necessarily built first.
+
+## Consequences
+
+- Lateral movement within `srv` is constrained — the gap OPNsense structurally can't
+  close.
+- One declarative catalog → no ad-hoc ports and no cross-layer drift on shared facts
+  (ports, IPs, sources).
+- Cost: the catalog + render-per-layer machinery must be built and maintained; east-west
+  allowlisting adds per-service ingress declarations (mitigated by proxied-by-default,
+  which keeps most entries to a single line).
+
+## Scope
+
+**Decided here:** the two-layer model and responsibilities; host nftables = default-deny
+inbound + east-west allowlist + permissive egress + guaranteed management plane + Docker
+`iptables:false`; the shared `group_vars` catalog as single source of truth with
+symbolic sources; each layer renders its own slice; the no-ad-hoc-ports guardrail.
+
+**Deferred to follow-up specs (each its own brainstorm → plan):**
+
+1. **Host nftables implementation** in `base` — catalog schema, nftables template,
+   Docker `iptables:false` integration, fail-safe ordering, Molecule tests. The natural
+   next spec.
+2. **OPNsense-as-code** — tooling mechanism + cross-VLAN rule rendering.
+3. **Drift-detection check** — if/when built.
+
+## Related
+
+ADR-002 (security baseline: nftables default-deny, fail2ban, blast radius),
+ADR-004 (Docker model: `iptables:false`), ADR-007 (network topology, VLANs, OPNsense,
+per-VLAN egress), ADR-016 (NetBird mesh: SSH on `wt0` only), ADR-019 (`firewall` tag).
+````
+
+- [ ] **Step 2: Verify the file is well-formed**
+
+Run:
+```bash
+test -f docs/decisions/020-firewall.md && grep -c "^## " docs/decisions/020-firewall.md
+```
+Expected: exit 0 and a printed count of `7` (the H2 sections: Status, Context, Decision, Guardrails, Consequences, Scope, Related — H3 subsections under Decision are not counted by `^## `).
+
+- [ ] **Step 3: Commit**
+
+```bash
+git add docs/decisions/020-firewall.md
+git commit -m "docs(adr): ADR-020 firewall strategy (two-layer + shared catalog)"
+```
+
+---
+
+### Task 2: Wire ADR-020 into CLAUDE.md
+
+**Files:**
+- Modify: `CLAUDE.md` (Further reading table; firewall guardrail bullet)
+
+- [ ] **Step 1: Add ADR-020 to the Further reading table**
+
+In `CLAUDE.md`, find this row (around line 225):
+
+```markdown
+| Tagging & run-targeting | `docs/decisions/019-tagging.md` |
+```
+
+Add this row immediately after it:
+
+```markdown
+| Firewall strategy      | `docs/decisions/020-firewall.md`      |
+```
+
+(Exact column padding need not match perfectly — just produce a valid Markdown table row consistent with the surrounding rows.)
+
+- [ ] **Step 2: Harden the firewall guardrail bullet**
+
+In `CLAUDE.md`, find this bullet (around line 172, under "What Claude must not do without explicit instruction"):
+
+```markdown
+- Open a firewall port anywhere but the `group_vars` firewall definitions — never ad-hoc on a host (ADR-002)
+```
+
+Replace it with:
+
+```markdown
+- Open a firewall port anywhere but the `group_vars` service catalog — never ad-hoc on a host. If it's not in the catalog, it doesn't exist (ADR-002, ADR-020)
+```
+
+- [ ] **Step 3: Verify both edits**
+
+Run:
+```bash
+grep -n "020-firewall" CLAUDE.md && grep -n "service catalog" CLAUDE.md
+```
+Expected: the Further reading row matches `020-firewall`, and the guardrail bullet now contains "service catalog".
+
+- [ ] **Step 4: Commit**
+
+```bash
+git add CLAUDE.md
+git commit -m "docs: link ADR-020; harden firewall guardrail to the service catalog"
+```
+
+---
+
+### Task 3: Mark TODO 3.5 decided
+
+**Files:**
+- Modify: `docs/TODO.md` (item 3.5)
+
+- [ ] **Step 1: Strike through and annotate item 3.5**
+
+In `docs/TODO.md`, find this line (around line 26):
+
+```markdown
+   5. Decide the firewall strategy (which firewall, ruleset, per-host vs central).
+```
+
+Replace it with:
+
+```markdown
+   5. ~~Decide the firewall strategy (which firewall, ruleset, per-host vs central).~~
+      DECIDED (ADR-020): two layers — OPNsense (perimeter + inter-VLAN) + host nftables
+      (default-deny inbound + east-west allowlist, permissive egress). Single source of
+      truth: a `group_vars` service catalog with symbolic sources; each layer renders
+      its own slice. Builds deferred to follow-up specs (host nftables in `base`, then
+      OPNsense-as-code).
+```
+
+- [ ] **Step 2: Verify**
+
+Run: `grep -n "DECIDED (ADR-020)" docs/TODO.md`
+Expected: one match on the item 3.5 annotation.
+
+- [ ] **Step 3: Commit**
+
+```bash
+git add docs/TODO.md
+git commit -m "docs(todo): mark 3.5 firewall strategy decided (ADR-020)"
+```
+
+---
+
+### Task 4: Update CAPABILITIES.md firewall note
+
+**Files:**
+- Modify: `docs/CAPABILITIES.md` (the firewall parenthetical in §1 Edge & networking, around line 32)
+
+- [ ] **Step 1: Point the firewall note at ADR-020**
+
+In `docs/CAPABILITIES.md`, find this line (around line 32, just under the §1 table):
+
+```markdown
+_(DHCP, firewall, mDNS reflection live on OPNsense — Ansible-managed, not containers.)_
+```
+
+Replace it with:
+
+```markdown
+_(DHCP, firewall, mDNS reflection live on OPNsense — Ansible-managed, not containers.)_
+
+_Firewalling is two-layer (ADR-020): OPNsense at the perimeter + inter-VLAN, plus
+per-host `nftables` (default-deny inbound + east-west allowlist) rendered by the `base`
+role from a shared `group_vars` service catalog. Both layers are still to be built._
+```
+
+- [ ] **Step 2: Verify and run the full lint suite**
+
+Run:
+```bash
+grep -n "ADR-020" docs/CAPABILITIES.md && make lint
+```
+Expected: the new ADR-020 note is found, and `make lint` passes (yamllint clean, ansible-lint clean, `check-tags: OK`).
+
+- [ ] **Step 3: Commit**
+
+```bash
+git add docs/CAPABILITIES.md
+git commit -m "docs(capabilities): note two-layer firewall model (ADR-020)"
+```
+
+---
+
+## Final verification
+
+- [ ] Confirm cross-references resolve:
+  ```bash
+  ls docs/decisions/020-firewall.md && grep -rl "ADR-020\|020-firewall" CLAUDE.md docs/TODO.md docs/CAPABILITIES.md
+  ```
+  Expected: the ADR file exists and all three living docs reference it.
+- [ ] `make lint` passes end to end.
+- [ ] `git log --oneline -4` shows the four task commits.
+- [ ] Sanity: the ADR's *Scope* section names the two deferred build specs (host nftables in `base`, OPNsense-as-code) so the next brainstorm has an obvious starting point.
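The plan's final cross-reference check (the `ls` plus `grep -rl` step) could also be written as a small deterministic script, in the spirit of `scripts/check-tags.py`. The sketch below is an assumption about how such a check might look. The function names are hypothetical and nothing like it exists in the commits shown here; only the file paths and the search pattern are taken from the plan.

```python
import re
from pathlib import Path

# Same pattern the plan greps for.
PATTERN = re.compile(r"ADR-020|020-firewall")
ADR_PATH = "docs/decisions/020-firewall.md"
LIVING_DOCS = ["CLAUDE.md", "docs/TODO.md", "docs/CAPABILITIES.md"]


def missing_references(texts: dict[str, str]) -> list[str]:
    """Return the names of documents whose text never mentions ADR-020."""
    return [name for name, text in texts.items() if not PATTERN.search(text)]


def check_repo(root: Path) -> list[str]:
    """Collect problems: a missing ADR file or a living doc without a reference."""
    problems = []
    if not (root / ADR_PATH).is_file():
        problems.append(f"{ADR_PATH} is missing")
    texts = {}
    for name in LIVING_DOCS:
        path = root / name
        # A missing living doc is treated as "no reference" rather than a crash.
        texts[name] = path.read_text(encoding="utf-8") if path.is_file() else ""
    problems += [f"{name} does not reference ADR-020" for name in missing_references(texts)]
    return problems
```

Calling `check_repo(Path("."))` from the repo root returns an empty list when the ADR exists and all three living docs reference it. A non-empty list names each gap, which is slightly more specific than the `grep -rl` output.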
--- a/docs/superpowers/specs/2026-06-06-firewall-strategy-design.md
+++ b/docs/superpowers/specs/2026-06-06-firewall-strategy-design.md
@@ -0,0 +1,164 @@
+# Design — Firewall strategy (two-layer model + shared catalog)
+
+- **Date:** 2026-06-06
+- **Status:** Approved design — pending implementation plan
+- **Resolves:** TODO 3.5 ("Decide the firewall strategy — which firewall, ruleset,
+  per-host vs central")
+- **Becomes:** ADR-020 (this design is the basis for that ADR)
+- **Scope note:** This is the **strategy** ADR. It pins the architecture and
+  responsibilities; the detailed builds (host nftables in `base`, OPNsense-as-code) are
+  separate follow-up specs (see *Scope*).
+
+---
+
+## Problem
+
+boma needs a firewall strategy that is **predictable, declarative, and defends the
+stated threat model** (opportunistic external, lateral movement / blast radius,
+operator/agent error — ADR-002). The ADRs already commit to pieces of this — `nftables`
+default-deny on hosts (ADR-002), OPNsense at the perimeter (ADR-007), Docker with
+`iptables: false` (ADR-004) — but no document ties them together: *which layer owns
+what, where firewall intent is declared, and how the two layers stay consistent.*
+Without that, ports drift open ad-hoc and "per-host vs central" stays unanswered.
+
+The roles that would hold the host firewall (`base`, `docker_host`) are empty, and there
+is no OPNsense automation yet — so this is greenfield strategy work.
+
+## The two-layer model
+
+Two firewall layers, each with a distinct job; the host layer adds deliberate
+defense-in-depth for the one thing the perimeter structurally cannot see.
+
+### OPNsense — perimeter + inter-VLAN
+
+Owns everything *between zones* and at the edge:
+
+- WAN edge (the internet boundary).
+- Inter-VLAN policy: `lan`/`iot`/`guest` → `srv`, `mgmt` access, the documented
+  per-VLAN egress rules (ADR-007).
+- **Structurally blind to intra-`srv` traffic**: services share the `srv` subnet
+  (VLAN 20), which is switched and never reaches the OPNsense gateway.
+
+### Host nftables — host-local + east-west within `srv` (in `base`)
+
+Runs on every Debian VM:
+
+- **Default-deny inbound**; allow loopback + established/related.
+- **East-west allowlist**: a service host accepts a connection only from declared
+  sources (e.g. the reverse proxy, a named peer). This is the lateral-movement control
+  OPNsense cannot provide — the blast-radius goal in ADR-002.
+- **Permissive egress**: allow outbound + established/related. Per-VLAN egress
+  restriction stays at OPNsense (where it already lives, ADR-007). Rationale: host-level
+  egress allowlisting is high-friction (every DNS/NTP/update/registry/webhook call must
+  be enumerated) for limited additional benefit given OPNsense already bounds where each
+  VLAN can go.
+- **Docker integration**: Docker daemon runs with `"iptables": false`; nftables owns all
+  filtering, including container traffic (ADR-004).
+- **Guaranteed management plane**: loopback, established/related, and `wt0` (the NetBird
+  overlay, ADR-016) for SSH + Ansible are *always* allowed, independent of the catalog,
+  and the ruleset is applied atomically — so a malformed or empty catalog can never lock
+  out management. (ADR-016: SSH is allowed only on `wt0`, not the LAN.)
+
+## The shared service catalog (single source of truth)
+
+A central, declarative **service catalog** in `group_vars/` is the one source of truth
+for firewall intent. This aligns with ADR-002's existing rule that "port definitions
+live in `group_vars/` so rules stay in sync with deployed services," and keeps
+connectivity *topology* (inherently cross-cutting) in inventory rather than in any one
+self-contained service role (ADR-004).
+
+Each entry describes a service's **ingress** as a list of allow rules:
+
+```yaml
+photoprism:
+  ingress:
+    - { from: reverse_proxy, port: 2342, proto: tcp }
+reverse_proxy:
+  ingress:
+    - { from: lan, port: 443, proto: tcp }
+```
+
+`from` is **symbolic**, resolved at render time:
+
+- a **host or group** → IP(s) from inventory;
+- a **role** (e.g. `reverse_proxy`) → the host(s) filling it;
+- a **VLAN/zone** (e.g. `lan`) → the subnet from the ADR-007 table.
+
+Symbolic sources keep the catalog readable and resilient to IP changes.
+
+### Each layer renders only its own slice
+
+The same catalog feeds both layers; each filters for the rules it owns:
+
+| Ingress rule | Host nftables | OPNsense |
+|---|---|---|
+| `from: reverse_proxy` (a `srv` peer) | allow proxy IP → port | — (intra-`srv`, invisible) |
+| `from: lan` (cross-VLAN) | allow `lan` subnet → port | allow `lan` → host:port |
+
+The dominant pattern falls out naturally: most services are **proxied** — their only
+ingress is `from: reverse_proxy`; users reach them *through* the reverse proxy, which
+alone carries `from: lan, port: 443`. This matches "services sit behind the reverse
+proxy with authentication" (ADR-002).
+
+"Shared catalog, each layer renders its own" was chosen over a single
+connectivity-model-generates-both (too much machinery, tight coupling of two very
+different rule domains) and over fully independent per-layer declarations (real drift
+risk: a port opened on the host but not at OPNsense, or vice versa).
+
+## OPNsense automation — owned here, mechanism deferred
+
+OPNsense is **Ansible-managed** (CLAUDE.md: "OPNsense is entirely Ansible; do not reach
+for a Terraform OPNsense provider"). It renders the **cross-VLAN slice** of the catalog
+(every `from: <other-zone>` rule) plus the static ADR-007 facts (WAN edge, per-VLAN
+egress, mgmt access, inter-VLAN defaults).
+
+This ADR pins **what** OPNsense owns and that it renders from the shared catalog. The
+**how** — config-XML templating vs the OPNsense API vs a plugin — is a substantial,
+separate tooling decision, **deferred to the OPNsense-as-code follow-up spec**. Recorded
+here as an explicit open sub-decision so it is not lost.
+
+## Guardrails & enforcement
+
+- **The catalog is authoritative.** If a port is not in the catalog, it does not exist.
+  This hardens the existing CLAUDE.md guardrail ("never open a firewall port ad-hoc on a
+  host") into a positive contract.
+- **The `firewall` tag** (ADR-019) marks firewall tasks, so `--tags firewall` re-renders
+  rules on `base` and any service role that contributes them.
+- **Drift detection (aspiration).** A deterministic check — in the spirit of
+  `scripts/check-tags.py` — compares each host's actual listening ports / live `nft`
+  ruleset against the catalog and flags anything undeclared. Ties to TODO 8.5
+  (`/security-review`) and the "undeclared open ports" pre-scan idea. Listed as a
+  consequence and future guardrail; not necessarily built in the first implementation.
+
+## Consequences
+
+- "Per-host vs central" is answered: **both**, with clear ownership — central perimeter
+  (OPNsense) + per-host default-deny with east-west allowlisting, fed by one catalog.
+- Lateral movement within `srv` is constrained (the gap OPNsense can't close).
+- One declarative catalog means no ad-hoc ports and no cross-layer drift on the shared
+  facts (ports, IPs, sources).
+- Cost: the catalog and the render-per-layer machinery must be built and maintained;
+  east-west allowlisting adds per-service ingress declarations (mitigated by the
+  proxied-by-default pattern, which keeps most entries to a single line).
+
+## Scope
+
+**This ADR decides:** the two-layer model and each layer's responsibilities; host
+nftables = default-deny inbound + east-west allowlist + permissive egress + guaranteed
+management plane + Docker `iptables:false`; the shared `group_vars` service catalog as
+single source of truth with symbolic sources; each layer renders its own slice; the
+no-ad-hoc-ports guardrail.
+
+**Deferred to follow-up specs (each its own brainstorm → plan):**
+
+1. **Host nftables implementation** in `base` — exact catalog schema, nftables template
+   structure, Docker `iptables:false` integration, fail-safe ordering, Molecule tests.
+   The natural next spec.
+2. **OPNsense-as-code** — the tooling mechanism + cross-VLAN rule rendering.
+3. **Drift-detection check** — if/when we build it.
+
+## Related
+
+ADR-002 (security baseline: nftables default-deny, fail2ban, blast radius),
+ADR-004 (Docker model: `iptables:false`), ADR-007 (network topology, VLANs, OPNsense,
+per-VLAN egress), ADR-016 (NetBird mesh: SSH on `wt0` only), ADR-019 (`firewall` tag).
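The drift-detection guardrail in the design is listed as an aspiration, so the following is only a sketch of the comparison it describes: ports observed on a host versus ports the catalog declares. Everything here is hypothetical, including the function names and the `MANAGEMENT_PLANE` set, which assumes SSH (22/tcp on `wt0`) is the only always-allowed listener. Collecting the observed ports, for example by parsing `ss -tuln` output or the live `nft` ruleset, is left out.

```python
# The two catalog entries from the design.
CATALOG = {
    "photoprism": {"ingress": [{"from": "reverse_proxy", "port": 2342, "proto": "tcp"}]},
    "reverse_proxy": {"ingress": [{"from": "lan", "port": 443, "proto": "tcp"}]},
}

# Assumption: listeners permitted independent of the catalog (the guaranteed
# management plane). Only SSH is modelled here.
MANAGEMENT_PLANE = frozenset({(22, "tcp")})


def declared_ports(catalog: dict, services: list[str]) -> set[tuple[int, str]]:
    """(port, proto) pairs the catalog declares for the services on one host."""
    return {
        (rule["port"], rule["proto"])
        for service in services
        for rule in catalog[service]["ingress"]
    }


def undeclared_ports(listening, catalog: dict, services: list[str]) -> list[tuple[int, str]]:
    """Observed listeners that neither the catalog nor the management plane covers."""
    allowed = declared_ports(catalog, services) | MANAGEMENT_PLANE
    return sorted(set(listening) - allowed)
```

For a host running only `photoprism` and observed to listen on 22/tcp, 2342/tcp, and 8080/tcp, `undeclared_ports` reports 8080/tcp as the single undeclared listener. A real check would also need to decide how to treat loopback-only listeners, which this sketch does not distinguish.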
Author	SHA1	Message	Date
sjat	2ad50e4d5b	docs(capabilities): note two-layer firewall model (ADR-020)	2026-06-06 16:00:19 +02:00
sjat	a9287427e3	docs(todo): mark 3.5 firewall strategy decided (ADR-020)	2026-06-06 16:00:01 +02:00
sjat	e24aab28b2	docs: link ADR-020; harden firewall guardrail to the service catalog	2026-06-06 15:59:47 +02:00
sjat	d311f67098	docs(adr): ADR-020 firewall strategy (two-layer + shared catalog)	2026-06-06 15:59:30 +02:00
sjat	8d1d8a88ea	docs(friction): escalate execution-mode prompt; no plan→impl approval gate Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 15:57:40 +02:00
sjat	f700f4a475	docs(plan): firewall strategy ADR-020 landing plan Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 15:42:17 +02:00
sjat	2a65391c0e	docs(spec): firewall strategy design (TODO 3.5 → ADR-020) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 15:36:24 +02:00