From 19dd89b8758e60b754c0c394519df89f678520b6 Mon Sep 17 00:00:00 2001
From: sjat
Date: Thu, 4 Jun 2026 15:15:39 +0200
Subject: [PATCH] Re-challenge accepted risks; adopt CIS hardening + IDS
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Walked the seeded accepted-risk register (R1-R4) and turned inherited gaps into
deliberate decisions:

- Supply chain (R1): tightened to required baseline hygiene (digest pinning,
  official/verified images); active scanning deferred — stays an accepted risk
- CIS (R2): adopted as a positive decision — CIS Debian L1+L2 (base role) +
  CIS Docker (docker_host + service checklist); app layer via the checklist
- SELinux/AppArmor (R3): AppArmor becomes a baseline control (CIS-enforced);
  register keeps a clean "no SELinux" accept
- IDS (R4): adopt AIDE (baseline via CIS) + Suricata on OPNsense + active
  alerting

Register shrinks from 4 inherited gaps to 2 deliberate accepts. ADR-002 gains a
Hardening standard section; STATUS + TODO 15 track the (unbuilt) implementation,
including the CIS L2 partition impact on VM provisioning (ADR-006).

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 STATUS.md                       |  2 ++
 docs/TODO.md                    | 17 +++++++++++++
 docs/decisions/002-security.md  | 43 ++++++++++++++++++++++++++++++++-
 docs/security/accepted-risks.md | 17 +++++++------
 4 files changed, 71 insertions(+), 8 deletions(-)

diff --git a/STATUS.md b/STATUS.md
index ece666e..5b8500a 100644
--- a/STATUS.md
+++ b/STATUS.md
@@ -49,6 +49,8 @@ So `make deploy PLAYBOOK=site` currently **fails** on a clean clone — the `bas
 | Forgejo Actions CI | ADR-003 / ADR-008 | Remote is live (pushed); Actions/`act_runner` pipeline not yet built |
 | Live usage stats for `/capacity-review` | ADR-012 / TODO 8.4 | `gather_usage()` stubbed; source undecided (Proxmox RRD vs PLG stack); needs the cluster |
 | `/security-review` skill | ADR-002 / TODO 8.5 | Periodic posture re-check + accepted-risk re-challenge; planned, not built |
+| CIS hardening (Debian L1+L2 + Docker) | ADR-002 / TODO 15 | Implemented by the (unbuilt) `base`/`docker_host` roles; brings AppArmor + AIDE as baseline. L2 partitions affect VM provisioning (ADR-006) |
+| Network IDS + security alerting | ADR-002 / TODO 15 | Suricata on OPNsense + AIDE/`auditd`/`fail2ban` alerting into the monitoring stack; not built |
 
 ## Keeping this honest
 
diff --git a/docs/TODO.md b/docs/TODO.md
index d4563da..a00b0b2 100644
--- a/docs/TODO.md
+++ b/docs/TODO.md
@@ -97,3 +97,20 @@
        whether selectively allowing libraries (e.g. PyYAML — already present via
        Ansible) is a better fit in general: weigh the parsing-correctness win
        against losing zero-setup portability. Decide a clear rule and record it.
+
+15. **Security hardening implementation** — build out the ADR-002 hardening standard.
+    1. Implement the CIS Debian Benchmark **Level 1 + Level 2** in the `base` role
+       (local tasks; CIS / `dev-sec` as reference only — no Galaxy roles). Includes
+       AppArmor (enforce mode) and AIDE file-integrity.
+    2. Implement the CIS Docker Benchmark: daemon/engine settings in `docker_host`;
+       per-container settings enforced via `docs/security/service-checklist.md`.
+    3. VM disk layout for CIS L2: separate `/tmp`, `/var`, `/var/log`, `/home`
+       partitions with `nodev,nosuid,noexec` — a Terraform/cloud-init concern
+       (ADR-006). Decide the template layout **before** provisioning, since it is
+       painful to retrofit.
+    4. Network IDS: enable Suricata on OPNsense (IDS first; IPS later?).
+    5. Active security alerting: wire AIDE, `auditd`, `fail2ban`, and Suricata into
+       the Loki/Grafana alerting stack (ties to 3.6).
+    6. Supply-chain hygiene: enforce image digest pinning + official/verified images
+       via the service checklist; revisit active scanning (Trivy/Grype) once a
+       triage stack exists (accepted-risk R1).
diff --git a/docs/decisions/002-security.md b/docs/decisions/002-security.md
index cff531c..eb736af 100644
--- a/docs/decisions/002-security.md
+++ b/docs/decisions/002-security.md
@@ -25,7 +25,7 @@ What we deliberately design against — and, just as importantly, what we do not
 | **Opportunistic external** — bots scanning, credential stuffing, mass-exploiting known CVEs in exposed services | Yes — primary | SSH key-only + fail2ban, deny-by-default firewall, security auto-patching, minimal attack surface, services behind a reverse proxy with auth |
 | **Lateral movement / blast radius** — assume one service *is* compromised; limit how far it spreads | Yes | VLAN segmentation (ADR-007), least-privilege containers, no host network mode, per-service isolation, no shared credentials |
 | **Operator / agent error** — accidental secret leak, misconfiguration, or an AI agent making an unsafe change | Yes | Vault + gitleaks, declarative firewall (no ad-hoc ports), review gates, agent guardrails (below), pre-commit hooks |
-| **Supply chain** — compromised images, base images, dependencies, collections | Acknowledged, lower priority | Version pinning where practical (ADR-011), gitleaks; tracked as an accepted risk with a revisit trigger |
+| **Supply chain** — compromised images, base images, dependencies, collections | Acknowledged, lower priority | Baseline hygiene required: image digest pinning + prefer official/verified images (ADR-011, service checklist), gitleaks. Active vuln scanning deferred — accepted risk |
 | **Targeted / physical** — a determined adversary specifically after this homelab, or physical device access | Out of scope | Not designed against at this scale; revisit if the threat model changes |
 
 Supply chain is consciously deprioritized, not forgotten — see
@@ -89,6 +89,22 @@ time. Each heading tags the threat(s) it primarily serves.
 - `auditd` installed and running with a baseline ruleset
 - Logs shipped to a central location if a log aggregation service is available
 
+### Mandatory access control — *blast radius*
+
+- **AppArmor** enabled with profiles in enforce mode — Debian-native MAC, default-on,
+  and required by the CIS Debian benchmark. Docker applies its `docker-default`
+  profile to containers; tighter per-service profiles are authored as needed.
+- **SELinux is not used** — non-native to Debian and redundant with AppArmor
+  (see `docs/security/accepted-risks.md`).
+
+### File integrity & intrusion detection — *opportunistic, blast radius, agent error*
+
+- **AIDE** file-integrity monitoring (required by the CIS Debian benchmark) — detects
+  unexpected changes to system files
+- **Network IDS** — Suricata on OPNsense (planned; see STATUS.md / TODO)
+- **Active alerting** wires AIDE, `auditd`, `fail2ban`, and Suricata into the
+  monitoring/alerting stack (planned; ties to the Loki/Grafana effort)
+
 ## Secrets management — *agent error, opportunistic*
 
 - Ansible Vault for all secrets (API keys, passwords, certificates), structured as a
@@ -99,6 +115,29 @@ time. Each heading tags the threat(s) it primarily serves.
   `rbw unlock`; nothing decryptable sits at rest in the repo or working tree
 - See `docs/runbooks/rotate-secrets.md` for `rbw` setup and rotation
 
+## Hardening standard
+
+The baseline above is implemented to a recognised benchmark rather than ad-hoc:
+
+- **Hosts** — the **CIS Debian Benchmark, Levels 1 and 2**, applied by the `base`
+  role. Some L2 items require separate partitions (`/tmp`, `/var`, `/var/log`,
+  `/home`) with restrictive mount options (`nodev,nosuid,noexec`) — that reaches into
+  VM disk layout, a provisioning concern (Terraform / cloud-init, ADR-006), not just
+  the `base` role.
+- **Container runtime** — the **CIS Docker Benchmark**: daemon/engine settings in the
+  `docker_host` role; per-container run settings (non-root, read-only rootfs, dropped
+  capabilities, no `privileged`, no host namespaces) enforced via
+  `docs/security/service-checklist.md`.
+- **Application containers** — no CIS benchmark exists for the app long tail
+  (Jellyfin, Nextcloud, Forgejo, …); they are covered by the CIS Docker run settings
+  plus the service checklist plus upstream hardening guidance.
+
+Hardening controls are **implemented as local roles** (per the no-Galaxy-roles
+policy, ADR-003), using the CIS benchmarks and community roles (e.g. `dev-sec`) only
+as reference. Any specific CIS item that proves impractical is exempted into
+`docs/security/accepted-risks.md` with a rationale — so the register records named
+exceptions, not a blanket opt-out.
+
 ## Governance
 
 Security is maintained, not achieved once. This ADR **establishes** four
@@ -132,6 +171,8 @@ This posture was chosen to be:
 - **Automated** — no manual steps to reach baseline state
 - **Legible & revisitable** — the threat model, principles, and accepted risks are
   written down and reviewed over time, not implicit
+- **Benchmarked** — host and container hardening follow CIS (Debian L1+L2, Docker),
+  not ad-hoc choices
 
 Out-of-scope items and conscious trade-offs are recorded in
 `docs/security/accepted-risks.md` rather than here, so this decision record stays
diff --git a/docs/security/accepted-risks.md b/docs/security/accepted-risks.md
index b5498cd..2e7f776 100644
--- a/docs/security/accepted-risks.md
+++ b/docs/security/accepted-risks.md
@@ -2,8 +2,8 @@
 
 Conscious security trade-offs we are choosing to live with — recorded so "what we
 are *not* doing" is explicit and revisitable, not forgotten. This register is a
-**living document** and is expected to change; it is deliberately kept out of
-ADR-002 (which records durable decisions) so the ADR stays stable.
+**living document**, deliberately kept out of ADR-002 (which records durable
+decisions) so the ADR stays stable.
 
 Owned by **ADR-002** (Security baseline and strategy). Re-challenged during the
 periodic security review (planned `/security-review`; see `docs/TODO.md`).
@@ -13,9 +13,12 @@ revisit (trigger).
 
 | # | Accepted risk | Rationale | Revisit trigger |
 |---|---|---|---|
-| R1 | **Supply chain not actively defended** — third-party container/base images, dependencies, and Ansible collections are trusted as pulled | Out of proportion to a homelab's effort budget; the realistic threat is opportunistic, not a targeted supply-chain attack. gitleaks + version pinning (ADR-011) give partial cover | Hosting high-value data/finances for others; a relevant upstream compromise; appetite for image signing / SBOM / pinned digests |
-| R2 | **No full CIS benchmark hardening** | Significant complexity for marginal gain at this scale | A compliance need, or hosting third-party data with obligations |
-| R3 | **No SELinux / AppArmor** mandatory access control | Operational overhead exceeds benefit for the current threat model | Threat model shifts toward targeted attackers; a service with a poor security history |
-| R4 | **No intrusion detection system (IDS)** | Detection is only useful with the capacity to triage it; alerts no one reads are noise | Monitoring/alerting stack (Prometheus/Loki/Grafana) is in place and someone will act on alerts |
+| R1 | **Active supply-chain scanning deferred** — baseline hygiene *is* required (image digest pinning + prefer official/verified images, ADR-011 / service checklist; gitleaks), but images and dependencies are not actively vulnerability-scanned (Trivy/Grype) or signature-verified | Scanning only pays off with the capacity to triage its output; the realistic threat is opportunistic, not a targeted supply-chain attack | A monitoring/triage stack is live; hosting high-value data/finances for others; a relevant upstream compromise |
+| R2 | **SELinux not used** — no SELinux mandatory access control | AppArmor — Debian-native and enforced via the CIS baseline — already provides MAC; adding SELinux means two MAC systems, non-native to Debian, for no real gain | A service that ships and requires its own SELinux policy; threat model shifts toward targeted attackers |
 
-_Last reviewed: 2026-06-04 (seeded — pending a first re-challenge pass)._
+_Last reviewed: 2026-06-04. The prior gaps (full CIS hardening, SELinux/AppArmor,
+IDS) were re-challenged and **adopted rather than accepted**: CIS Debian L1+L2 + CIS
+Docker, AppArmor (enforce), AIDE file-integrity, and Suricata network IDS are now
+part of the security strategy (ADR-002). See STATUS.md / `docs/TODO.md` for build
+status. As CIS is implemented, any specific item that proves impractical is added
+here as a named exception._
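
Reviewer note: the digest-pinning hygiene that R1 and TODO 15.6 make mandatory could be checked mechanically. The following is a minimal sketch, not part of this patch or the repository. The function names `is_digest_pinned` and `unpinned_images` are hypothetical, and it assumes that "pinned" means the image reference ends in an `@sha256:<64 hex characters>` digest, so a version tag alone does not count.

```python
import re

# Assumption: an image reference counts as "pinned" only when it ends in an
# @sha256:<64 lowercase hex chars> digest; a tag alone (even a version tag)
# is treated as unpinned.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_digest_pinned(image_ref: str) -> bool:
    """Return True if the image reference ends in a sha256 digest."""
    return bool(DIGEST_RE.search(image_ref.strip()))


def unpinned_images(image_refs: list[str]) -> list[str]:
    """Return the references that are not pinned by digest, in input order."""
    return [ref for ref in image_refs if not is_digest_pinned(ref)]


if __name__ == "__main__":
    placeholder_digest = "a" * 64  # placeholder value, not a real image digest
    refs = [
        f"docker.io/library/nginx@sha256:{placeholder_digest}",
        "docker.io/library/nginx:1.27",
        "nginx:latest",
    ]
    for ref in unpinned_images(refs):
        print(f"not digest-pinned: {ref}")
```

A check of this shape could run as a pre-commit hook or as a step of the service checklist. Where the image references come from (compose files, role variables) is left open here because the patch does not specify it.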