Re-challenge accepted risks; adopt CIS hardening + IDS

Walked the seeded accepted-risk register (R1-R4) and turned inherited gaps
into deliberate decisions:

- Supply chain (R1): tightened to required baseline hygiene (digest pinning,
  official/verified images); active scanning deferred, so it stays an
  accepted risk
- CIS (R2): adopted deliberately; CIS Debian L1+L2 (base role) + CIS Docker
  (docker_host + service checklist), with the app layer covered via the checklist
- SELinux/AppArmor (R3): AppArmor becomes a baseline control (CIS-enforced);
  the register keeps a clean "no SELinux" accept
- IDS (R4): adopt AIDE (baseline via CIS) + Suricata on OPNsense + active
  alerting

Register shrinks from 4 inherited gaps to 2 deliberate accepts. ADR-002 gains
a Hardening standard section; STATUS + TODO 15 track the (unbuilt)
implementation, including the CIS L2 partition impact on VM provisioning
(ADR-006).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 15:15:39 +02:00 · 2026-06-04 15:15:39 +02:00 · 19dd89b875
commit 19dd89b875
parent f338bccd46
4 changed files with 71 additions and 8 deletions
--- a/STATUS.md
+++ b/STATUS.md
@@ -49,6 +49,8 @@ So `make deploy PLAYBOOK=site` currently **fails** on a clean clone — the `bas
 | Forgejo Actions CI | ADR-003 / ADR-008 | Remote is live (pushed); Actions/`act_runner` pipeline not yet built |
 | Live usage stats for `/capacity-review` | ADR-012 / TODO 8.4 | `gather_usage()` stubbed; source undecided (Proxmox RRD vs PLG stack); needs the cluster |
 | `/security-review` skill | ADR-002 / TODO 8.5 | Periodic posture re-check + accepted-risk re-challenge; planned, not built |
+| CIS hardening (Debian L1+L2 + Docker) | ADR-002 / TODO 15 | To be implemented by the `base`/`docker_host` roles (not yet built); brings AppArmor + AIDE into the baseline. CIS L2 partitions affect VM provisioning (ADR-006) |
+| Network IDS + security alerting | ADR-002 / TODO 15 | Suricata on OPNsense + AIDE/`auditd`/`fail2ban` alerting into the monitoring stack; not built |

 ## Keeping this honest

--- a/docs/TODO.md
+++ b/docs/TODO.md
@ -97,3 +97,20 @@
    whether selectively allowing libraries (e.g. PyYAML — already present via
    Ansible) is a better fit in general: weigh the parsing-correctness win
    against losing zero-setup portability. Decide a clear rule and record it.
+
+15. **Security hardening implementation** — build out the ADR-002 hardening standard.
+    1. Implement the CIS Debian Benchmark **Level 1 + Level 2** in the `base` role
+       (local tasks; CIS / `dev-sec` as reference only — no Galaxy roles). Includes
+       AppArmor (enforce mode) and AIDE file-integrity monitoring.
+    2. Implement the CIS Docker Benchmark: daemon/engine settings in `docker_host`;
+       per-container settings enforced via `docs/security/service-checklist.md`.
+    3. VM disk layout for CIS L2: separate `/tmp`, `/var`, `/var/log`, `/home`
+       partitions with `nodev,nosuid` (plus `noexec` where the benchmark asks for
+       it); a Terraform/cloud-init concern (ADR-006). Decide the template layout
+       **before** provisioning, since it is painful to retrofit.
+    4. Network IDS: enable Suricata on OPNsense in IDS (alert-only) mode; IPS mode is undecided.
+    5. Active security alerting: wire AIDE, `auditd`, `fail2ban`, and Suricata into
+       the Loki/Grafana alerting stack (ties to 3.6).
+    6. Supply-chain hygiene: enforce image digest pinning + official/verified images
+       via the service checklist; revisit active scanning (Trivy/Grype) once a
+       triage stack exists (accepted-risk R1).
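TODO 15.3 could be verified on a provisioned VM with a small check along these lines. This is a minimal sketch, not part of the commit: the names `REQUIRED`, `parse_mounts`, and `mount_findings` are hypothetical, the input is assumed to be `/proc/mounts`-format text (device, mount point, filesystem type, comma-separated options, ...), and the per-mount option sets are an assumption to be checked against the CIS Debian benchmark version actually implemented. It uses only the Python standard library.

```python
# Hypothetical post-provisioning check for TODO 15.3 (CIS L2 partitions).
# ASSUMPTION: these option sets approximate the CIS Debian benchmark;
# verify them against the benchmark version you actually implement.
REQUIRED: dict[str, set[str]] = {
    "/tmp": {"nodev", "nosuid", "noexec"},
    "/var": {"nodev", "nosuid"},
    "/var/log": {"nodev", "nosuid", "noexec"},
    "/home": {"nodev", "nosuid"},
}


def parse_mounts(text: str) -> dict[str, set[str]]:
    """Map mount point -> option set from /proc/mounts-format text."""
    mounts: dict[str, set[str]] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        # fields: device, mount point, fs type, comma-separated options, ...
        # A later entry for the same mount point wins (it is mounted on top).
        mounts[fields[1]] = set(fields[3].split(","))
    return mounts


def mount_findings(text: str) -> list[str]:
    """List each required mount that is missing or lacks required options."""
    mounts = parse_mounts(text)
    findings: list[str] = []
    for path, wanted in REQUIRED.items():
        if path not in mounts:
            findings.append(f"{path}: not a separate mount")
            continue
        missing = wanted - mounts[path]
        if missing:
            findings.append(f"{path}: missing {','.join(sorted(missing))}")
    return findings
```

On a VM, `mount_findings(open("/proc/mounts").read())` would list the gaps; an empty list means every listed path is a separate mount carrying its assumed options.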
--- a/docs/decisions/002-security.md
+++ b/docs/decisions/002-security.md
@@ -25,7 +25,7 @@ What we deliberately design against — and, just as importantly, what we do not
 | **Opportunistic external** — bots scanning, credential stuffing, mass-exploiting known CVEs in exposed services | Yes — primary | SSH key-only + fail2ban, deny-by-default firewall, security auto-patching, minimal attack surface, services behind a reverse proxy with auth |
 | **Lateral movement / blast radius** — assume one service *is* compromised; limit how far it spreads | Yes | VLAN segmentation (ADR-007), least-privilege containers, no host network mode, per-service isolation, no shared credentials |
 | **Operator / agent error** — accidental secret leak, misconfiguration, or an AI agent making an unsafe change | Yes | Vault + gitleaks, declarative firewall (no ad-hoc ports), review gates, agent guardrails (below), pre-commit hooks |
-| **Supply chain** — compromised images, base images, dependencies, collections | Acknowledged, lower priority | Version pinning where practical (ADR-011), gitleaks; tracked as an accepted risk with a revisit trigger |
+| **Supply chain** — compromised images, base images, dependencies, collections | Acknowledged, lower priority | Baseline hygiene required: image digest pinning + prefer official/verified images (ADR-011, service checklist), gitleaks. Active vuln scanning deferred — accepted risk |
 | **Targeted / physical** — a determined adversary specifically after this homelab, or physical device access | Out of scope | Not designed against at this scale; revisit if the threat model changes |

 Supply chain is consciously deprioritized, not forgotten — see
@@ -89,6 +89,22 @@ time. Each heading tags the threat(s) it primarily serves.
 - `auditd` installed and running with a baseline ruleset
 - Logs shipped to a central location if a log aggregation service is available

+### Mandatory access control — *blast radius*
+
+- **AppArmor** enabled with profiles in enforce mode — Debian-native MAC, default-on,
+  and required by the CIS Debian benchmark. Docker applies its `docker-default`
+  profile to containers; tighter per-service profiles are authored as needed.
+- **SELinux is not used** — non-native to Debian and redundant with AppArmor
+  (see `docs/security/accepted-risks.md`).
+
+### File integrity & intrusion detection — *opportunistic, blast radius, agent error*
+
+- **AIDE** file-integrity monitoring (required by the CIS Debian benchmark) — detects
+  unexpected changes to system files
+- **Network IDS** — Suricata on OPNsense (planned; see STATUS.md / TODO)
+- **Active alerting** wires AIDE, `auditd`, `fail2ban`, and Suricata into the
+  monitoring/alerting stack (planned; ties to the Loki/Grafana effort)
+
 ## Secrets management — *agent error, opportunistic*

 - Ansible Vault for all secrets (API keys, passwords, certificates), structured as a
@@ -99,6 +115,29 @@ time. Each heading tags the threat(s) it primarily serves.
  `rbw unlock`; nothing decryptable sits at rest in the repo or working tree
 - See `docs/runbooks/rotate-secrets.md` for `rbw` setup and rotation

+## Hardening standard
+
+The baseline above is implemented against a recognised benchmark rather than ad hoc:
+
+- **Hosts** — the **CIS Debian Benchmark, Levels 1 and 2**, applied by the `base`
+  role. Some L2 items require separate partitions (`/tmp`, `/var`, `/var/log`,
+  `/home`) with restrictive mount options (`nodev,nosuid`, plus `noexec` where the
+  benchmark asks for it). That reaches into VM disk layout, a provisioning concern
+  (Terraform / cloud-init, ADR-006), not just the `base` role.
+- **Container runtime** — the **CIS Docker Benchmark**: daemon/engine settings in the
+  `docker_host` role; per-container run settings (non-root, read-only rootfs, dropped
+  capabilities, no `privileged`, no host namespaces) enforced via
+  `docs/security/service-checklist.md`.
+- **Application containers** — no CIS benchmark exists for the app long tail
+  (Jellyfin, Nextcloud, Forgejo, …); they are covered by the CIS Docker run settings
+  plus the service checklist plus upstream hardening guidance.
+
+Hardening controls are **implemented as local roles** (per the no-Galaxy-roles
+policy, ADR-003), using the CIS benchmarks and community roles (e.g. `dev-sec`) only
+as reference. Any specific CIS item that proves impractical is exempted into
+`docs/security/accepted-risks.md` with a rationale — so the register records named
+exceptions, not a blanket opt-out.
+
 ## Governance

 Security is maintained, not achieved once. This ADR **establishes** four
@@ -132,6 +171,8 @@ This posture was chosen to be:
 - **Automated** — no manual steps to reach baseline state
 - **Legible & revisitable** — the threat model, principles, and accepted risks are
  written down and reviewed over time, not implicit
+- **Benchmarked** — host and container hardening follow CIS (Debian L1+L2, Docker),
+  not ad-hoc choices

 Out-of-scope items and conscious trade-offs are recorded in
 `docs/security/accepted-risks.md` rather than here, so this decision record stays
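The per-container run settings that ADR-002 delegates to `docs/security/service-checklist.md` (non-root, read-only rootfs, dropped capabilities, no `privileged`, no host namespaces) are mechanical enough to lint. Below is a minimal sketch under stated assumptions: services are described as Docker Compose services already parsed into plain dicts (so no YAML library is needed), and `check_service` / `check_compose` are hypothetical names. It inspects only the Compose keys `user`, `read_only`, `cap_drop`, `privileged`, `network_mode`, `pid`, `ipc`, and `userns_mode`; it is not a full CIS Docker Benchmark audit.

```python
# Hypothetical lint for the per-container CIS Docker run settings that the
# service checklist enforces. ASSUMPTION: services are Docker Compose
# services already parsed into plain dicts (no YAML library needed).
HOST_NAMESPACE_KEYS = ("network_mode", "pid", "ipc", "userns_mode")


def check_service(name: str, svc: dict) -> list[str]:
    """Return checklist violations for one Compose service mapping."""
    problems: list[str] = []
    # An unset `user` falls back to the image's USER, often root, so flag it.
    user = str(svc.get("user", "")).split(":")[0]
    if user in ("", "root", "0"):
        problems.append(f"{name}: may run as root (set a non-root `user`)")
    if svc.get("read_only") is not True:
        problems.append(f"{name}: writable root filesystem (set `read_only: true`)")
    if "ALL" not in svc.get("cap_drop", []):
        problems.append(f"{name}: capabilities not dropped (set `cap_drop: [ALL]`)")
    if svc.get("privileged") is True:
        problems.append(f"{name}: runs privileged")
    for key in HOST_NAMESPACE_KEYS:
        if svc.get(key) == "host":
            problems.append(f"{name}: shares a host namespace (`{key}: host`)")
    return problems


def check_compose(compose: dict) -> list[str]:
    """Run check_service over every service in a parsed Compose document."""
    problems: list[str] = []
    for name, svc in compose.get("services", {}).items():
        problems.extend(check_service(name, svc))
    return problems
```

Dropping `ALL` and re-adding only what a service needs via `cap_add` still passes. A service that genuinely cannot meet an item (say, one that needs host networking) would be flagged here and, per ADR-002, recorded as a named exception in `docs/security/accepted-risks.md`.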
--- a/docs/security/accepted-risks.md
+++ b/docs/security/accepted-risks.md
@@ -2,8 +2,8 @@

 Conscious security trade-offs we are choosing to live with — recorded so "what we
 are *not* doing" is explicit and revisitable, not forgotten. This register is a
-**living document** and is expected to change; it is deliberately kept out of
-ADR-002 (which records durable decisions) so the ADR stays stable.
+**living document**, deliberately kept out of ADR-002 (which records durable
+decisions) so the ADR stays stable.

 Owned by **ADR-002** (Security baseline and strategy). Re-challenged during the
 periodic security review (planned `/security-review`; see `docs/TODO.md`).
@@ -13,9 +13,12 @@ revisit (trigger).

 | # | Accepted risk | Rationale | Revisit trigger |
 |---|---|---|---|
-| R1 | **Supply chain not actively defended** — third-party container/base images, dependencies, and Ansible collections are trusted as pulled | Out of proportion to a homelab's effort budget; the realistic threat is opportunistic, not a targeted supply-chain attack. gitleaks + version pinning (ADR-011) give partial cover | Hosting high-value data/finances for others; a relevant upstream compromise; appetite for image signing / SBOM / pinned digests |
-| R2 | **No full CIS benchmark hardening** | Significant complexity for marginal gain at this scale | A compliance need, or hosting third-party data with obligations |
-| R3 | **No SELinux / AppArmor** mandatory access control | Operational overhead exceeds benefit for the current threat model | Threat model shifts toward targeted attackers; a service with a poor security history |
-| R4 | **No intrusion detection system (IDS)** | Detection is only useful with the capacity to triage it; alerts no one reads are noise | Monitoring/alerting stack (Prometheus/Loki/Grafana) is in place and someone will act on alerts |
+| R1 | **Active supply-chain scanning deferred** — baseline hygiene *is* required (image digest pinning + prefer official/verified images, ADR-011 / service checklist; gitleaks), but images and dependencies are not actively vulnerability-scanned (Trivy/Grype) or signature-verified | Scanning only pays off with the capacity to triage its output; the realistic threat is opportunistic, not a targeted supply-chain attack | A monitoring/triage stack is live; hosting high-value data/finances for others; a relevant upstream compromise |
+| R2 | **SELinux not used** — no SELinux mandatory access control | AppArmor — Debian-native and enforced via the CIS baseline — already provides MAC; adding SELinux means two MAC systems, non-native to Debian, for no real gain | A service that ships and requires its own SELinux policy; threat model shifts toward targeted attackers |

-_Last reviewed: 2026-06-04 (seeded — pending a first re-challenge pass)._
+_Last reviewed: 2026-06-04. The prior gaps (full CIS hardening, SELinux/AppArmor,
+IDS) were re-challenged and **adopted rather than accepted**: CIS Debian L1+L2 + CIS
+Docker, AppArmor (enforce), AIDE file-integrity, and Suricata network IDS are now
+part of the security strategy (ADR-002). See STATUS.md / `docs/TODO.md` for build
+status. As CIS is implemented, any specific item that proves impractical is added
+here as a named exception._
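R1's required baseline, image digest pinning, can also be checked mechanically. A minimal sketch, assuming Compose-style service mappings already parsed into dicts; `DIGEST_RE`, `is_digest_pinned`, and `unpinned_images` are hypothetical names. A reference counts as pinned only if it ends in `@sha256:` followed by 64 lowercase hex characters. Checking for official/verified images needs registry metadata and is left out.

```python
import re

# Hypothetical check for the R1 baseline: every image must be digest-pinned.
# A pinned reference ends in "@sha256:" + 64 lowercase hex characters; a tag
# alone (e.g. "postgres:16") can be re-pointed upstream at any time.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_digest_pinned(image: str) -> bool:
    """True if the image reference carries a sha256 digest."""
    return DIGEST_RE.search(image) is not None


def unpinned_images(compose: dict) -> list[str]:
    """List 'service: image' for each Compose service whose image is unpinned.

    Services with only a `build:` section (no `image:`) are skipped.
    """
    return [
        f"{name}: {svc['image']}"
        for name, svc in compose.get("services", {}).items()
        if "image" in svc and not is_digest_pinned(svc["image"])
    ]
```

Keeping the tag alongside the digest (`nginx:1.27@sha256:...`) stays readable for humans while the digest is what actually fixes the content.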