Re-challenge accepted risks; adopt CIS hardening + IDS

Walked the seeded accepted-risk register (R1-R4) and turned inherited gaps into
deliberate decisions:

- Supply chain (R1): tightened to required baseline hygiene (digest pinning,
  official/verified images); active scanning deferred — stays an accepted risk
- CIS (R2): adopted as a positive decision — CIS Debian L1+L2 (base role) + CIS
  Docker (docker_host + service checklist); app layer via the checklist
- SELinux/AppArmor (R3): AppArmor becomes a baseline control (CIS-enforced);
  register keeps a clean "no SELinux" accept
- IDS (R4): adopt AIDE (baseline via CIS) + Suricata on OPNsense + active alerting

Register shrinks from 4 inherited gaps to 2 deliberate accepts. ADR-002 gains a
Hardening standard section; STATUS + TODO 15 track the (unbuilt) implementation,
including the CIS L2 partition impact on VM provisioning (ADR-006).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
sjat 2026-06-04 15:15:39 +02:00
parent f338bccd46
commit 19dd89b875
4 changed files with 71 additions and 8 deletions

@@ -49,6 +49,8 @@ So `make deploy PLAYBOOK=site` currently **fails** on a clean clone — the `bas
| Forgejo Actions CI | ADR-003 / ADR-008 | Remote is live (pushed); Actions/`act_runner` pipeline not yet built |
| Live usage stats for `/capacity-review` | ADR-012 / TODO 8.4 | `gather_usage()` stubbed; source undecided (Proxmox RRD vs PLG stack); needs the cluster |
| `/security-review` skill | ADR-002 / TODO 8.5 | Periodic posture re-check + accepted-risk re-challenge; planned, not built |
| CIS hardening (Debian L1+L2 + Docker) | ADR-002 / TODO 15 | Implemented by the (unbuilt) `base`/`docker_host` roles; brings AppArmor + AIDE as baseline. L2 partitions affect VM provisioning (ADR-006) |
| Network IDS + security alerting | ADR-002 / TODO 15 | Suricata on OPNsense + AIDE/`auditd`/`fail2ban` alerting into the monitoring stack; not built |
## Keeping this honest

@@ -97,3 +97,20 @@
whether selectively allowing libraries (e.g. PyYAML — already present via
Ansible) is a better fit in general: weigh the parsing-correctness win
against losing zero-setup portability. Decide a clear rule and record it.
15. **Security hardening implementation** — build out the ADR-002 hardening standard.
1. Implement the CIS Debian Benchmark **Level 1 + Level 2** in the `base` role
(local tasks; CIS / `dev-sec` as reference only — no Galaxy roles). Includes
AppArmor (enforce mode) and AIDE file-integrity.
2. Implement the CIS Docker Benchmark: daemon/engine settings in `docker_host`;
per-container settings enforced via `docs/security/service-checklist.md`.
3. VM disk layout for CIS L2: separate `/tmp`, `/var`, `/var/log`, `/home`
partitions with `nodev,nosuid,noexec` — a Terraform/cloud-init concern
(ADR-006). Decide the template layout **before** provisioning, since it is
painful to retrofit.
4. Network IDS: enable Suricata on OPNsense (IDS first; IPS later?).
5. Active security alerting: wire AIDE, `auditd`, `fail2ban`, and Suricata into
the Loki/Grafana alerting stack (ties to 3.6).
6. Supply-chain hygiene: enforce image digest pinning + official/verified images
via the service checklist; revisit active scanning (Trivy/Grype) once a
triage stack exists (accepted-risk R1).
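
The partition requirement in item 3 can be verified once a VM is provisioned. The following is a minimal sketch, assuming Python and `/proc/mounts`-style input. The helper names, the sample data, and the choice to apply one option set to every mount point are illustrative assumptions rather than anything in the repo; the CIS benchmark remains the authority on which options apply to which mount point.

```python
# Hypothetical helper: names and defaults are illustrative, not from the repo.
REQUIRED_OPTIONS = {"nodev", "nosuid", "noexec"}
SEPARATE_MOUNTS = ("/tmp", "/var", "/var/log", "/home")

# Sample input in /proc/mounts format: device, mount point, fstype, options, dump, pass.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw,relatime 0 0
/dev/sda2 /tmp ext4 rw,nosuid,nodev,noexec,relatime 0 0
/dev/sda3 /var ext4 rw,nosuid,nodev,relatime 0 0
/dev/sda4 /var/log ext4 rw,nosuid,nodev,noexec,relatime 0 0
"""


def parse_mounts(mounts_text):
    """Map each mount point to its set of mount options."""
    mounts = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, options = fields[1], fields[3]
        mounts[mount_point] = set(options.split(","))
    return mounts


def missing_hardening(mounts_text, mount_points=SEPARATE_MOUNTS,
                      required=REQUIRED_OPTIONS):
    """Return {mount_point: [problems]} for every mount point that falls short."""
    mounts = parse_mounts(mounts_text)
    findings = {}
    for mount_point in mount_points:
        if mount_point not in mounts:
            findings[mount_point] = ["not a separate mount"]
            continue
        missing = sorted(required - mounts[mount_point])
        if missing:
            findings[mount_point] = missing
    return findings


if __name__ == "__main__":
    for mount_point, problems in missing_hardening(SAMPLE_MOUNTS).items():
        print(f"{mount_point}: {', '.join(problems)}")
```

On a real host the input would come from reading `/proc/mounts`. Keeping the required options as a parameter allows per-mount-point differences to be expressed later without changing the parser.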

@@ -25,7 +25,7 @@ What we deliberately design against — and, just as importantly, what we do not
| **Opportunistic external** — bots scanning, credential stuffing, mass-exploiting known CVEs in exposed services | Yes — primary | SSH key-only + fail2ban, deny-by-default firewall, security auto-patching, minimal attack surface, services behind a reverse proxy with auth |
| **Lateral movement / blast radius** — assume one service *is* compromised; limit how far it spreads | Yes | VLAN segmentation (ADR-007), least-privilege containers, no host network mode, per-service isolation, no shared credentials |
| **Operator / agent error** — accidental secret leak, misconfiguration, or an AI agent making an unsafe change | Yes | Vault + gitleaks, declarative firewall (no ad-hoc ports), review gates, agent guardrails (below), pre-commit hooks |
| **Supply chain** — compromised images, base images, dependencies, collections | Acknowledged, lower priority | Version pinning where practical (ADR-011), gitleaks; tracked as an accepted risk with a revisit trigger |
| **Supply chain** — compromised images, base images, dependencies, collections | Acknowledged, lower priority | Baseline hygiene required: image digest pinning + prefer official/verified images (ADR-011, service checklist), gitleaks. Active vuln scanning deferred — accepted risk |
| **Targeted / physical** — a determined adversary specifically after this homelab, or physical device access | Out of scope | Not designed against at this scale; revisit if the threat model changes |
Supply chain is consciously deprioritized, not forgotten — see
@@ -89,6 +89,22 @@ time. Each heading tags the threat(s) it primarily serves.
- `auditd` installed and running with a baseline ruleset
- Logs shipped to a central location if a log aggregation service is available
### Mandatory access control — *blast radius*
- **AppArmor** enabled with profiles in enforce mode — Debian-native MAC, default-on,
and required by the CIS Debian benchmark. Docker applies its `docker-default`
profile to containers; tighter per-service profiles are authored as needed.
- **SELinux is not used** — non-native to Debian and redundant with AppArmor
(see `docs/security/accepted-risks.md`).
### File integrity & intrusion detection — *opportunistic, blast radius, agent error*
- **AIDE** file-integrity monitoring (required by the CIS Debian benchmark) — detects
unexpected changes to system files
- **Network IDS** — Suricata on OPNsense (planned; see STATUS.md / TODO)
- **Active alerting** wires AIDE, `auditd`, `fail2ban`, and Suricata into the
monitoring/alerting stack (planned; ties to the Loki/Grafana effort)
## Secrets management — *agent error, opportunistic*
- Ansible Vault for all secrets (API keys, passwords, certificates), structured as a
@@ -99,6 +115,29 @@ time. Each heading tags the threat(s) it primarily serves.
`rbw unlock`; nothing decryptable sits at rest in the repo or working tree
- See `docs/runbooks/rotate-secrets.md` for `rbw` setup and rotation
## Hardening standard
The baseline above is implemented to a recognised benchmark rather than ad-hoc:
- **Hosts** — the **CIS Debian Benchmark, Levels 1 and 2**, applied by the `base`
role. Some L2 items require separate partitions (`/tmp`, `/var`, `/var/log`,
`/home`) with restrictive mount options (`nodev,nosuid,noexec`) — that reaches into
VM disk layout, a provisioning concern (Terraform / cloud-init, ADR-006), not just
the `base` role.
- **Container runtime** — the **CIS Docker Benchmark**: daemon/engine settings in the
`docker_host` role; per-container run settings (non-root, read-only rootfs, dropped
capabilities, no `privileged`, no host namespaces) enforced via
`docs/security/service-checklist.md`.
- **Application containers** — no CIS benchmark exists for the app long tail
(Jellyfin, Nextcloud, Forgejo, …); they are covered by the CIS Docker run settings
plus the service checklist plus upstream hardening guidance.
Hardening controls are **implemented as local roles** (per the no-Galaxy-roles
policy, ADR-003), using the CIS benchmarks and community roles (e.g. `dev-sec`) only
as reference. Any specific CIS item that proves impractical is exempted into
`docs/security/accepted-risks.md` with a rationale — so the register records named
exceptions, not a blanket opt-out.
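
The per-container run settings named above (non-root, read-only rootfs, dropped capabilities, no `privileged`, no host namespaces) lend themselves to an automated check. This is a minimal sketch, assuming each service is available as a Compose-style mapping already loaded into a Python dict. The function name and the policy of treating a missing `user` as root are assumptions for illustration; the actual checklist in `docs/security/service-checklist.md` may define these differently.

```python
# Hypothetical checker: the function name and policy details are illustrative.
def checklist_violations(service):
    """Return the run-setting violations for one Compose-style service mapping."""
    violations = []

    # Assumption: an unset `user` is treated as root, although the image
    # itself may declare a non-root user.
    user = str(service.get("user", "")).split(":")[0]
    if user in ("", "0", "root"):
        violations.append("runs as root (set a non-root `user`)")

    if service.get("read_only") is not True:
        violations.append("root filesystem is writable (set `read_only: true`)")

    if "ALL" not in service.get("cap_drop", []):
        violations.append("capabilities not dropped (add `cap_drop: [ALL]`)")

    if service.get("privileged") is True:
        violations.append("`privileged: true` is set")

    # Host namespace sharing through the network, PID, or IPC settings.
    for key in ("network_mode", "pid", "ipc"):
        if service.get(key) == "host":
            violations.append(f"shares a host namespace via `{key}: host`")

    return violations


if __name__ == "__main__":
    hardened = {"user": "1000:1000", "read_only": True, "cap_drop": ["ALL"]}
    risky = {"privileged": True, "network_mode": "host"}
    for name, service in (("hardened", hardened), ("risky", risky)):
        print(name, checklist_violations(service))
```

Returning a list of findings rather than a boolean keeps the output usable as review feedback, and a service that needs an exception can be matched against a named entry in the accepted-risk register.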
## Governance
Security is maintained, not achieved once. This ADR **establishes** four
@@ -132,6 +171,8 @@ This posture was chosen to be:
- **Automated** — no manual steps to reach baseline state
- **Legible & revisitable** — the threat model, principles, and accepted risks are
written down and reviewed over time, not implicit
- **Benchmarked** — host and container hardening follow CIS (Debian L1+L2, Docker),
not ad-hoc choices
Out-of-scope items and conscious trade-offs are recorded in
`docs/security/accepted-risks.md` rather than here, so this decision record stays

@@ -2,8 +2,8 @@
Conscious security trade-offs we are choosing to live with — recorded so "what we
are *not* doing" is explicit and revisitable, not forgotten. This register is a
**living document** and is expected to change; it is deliberately kept out of
ADR-002 (which records durable decisions) so the ADR stays stable.
**living document**, deliberately kept out of ADR-002 (which records durable
decisions) so the ADR stays stable.
Owned by **ADR-002** (Security baseline and strategy). Re-challenged during the
periodic security review (planned `/security-review`; see `docs/TODO.md`).
@@ -13,9 +13,12 @@ revisit (trigger).
| # | Accepted risk | Rationale | Revisit trigger |
|---|---|---|---|
| R1 | **Supply chain not actively defended** — third-party container/base images, dependencies, and Ansible collections are trusted as pulled | Out of proportion to a homelab's effort budget; the realistic threat is opportunistic, not a targeted supply-chain attack. gitleaks + version pinning (ADR-011) give partial cover | Hosting high-value data/finances for others; a relevant upstream compromise; appetite for image signing / SBOM / pinned digests |
| R2 | **No full CIS benchmark hardening** | Significant complexity for marginal gain at this scale | A compliance need, or hosting third-party data with obligations |
| R3 | **No SELinux / AppArmor** mandatory access control | Operational overhead exceeds benefit for the current threat model | Threat model shifts toward targeted attackers; a service with a poor security history |
| R4 | **No intrusion detection system (IDS)** | Detection is only useful with the capacity to triage it; alerts no one reads are noise | Monitoring/alerting stack (Prometheus/Loki/Grafana) is in place and someone will act on alerts |
| R1 | **Active supply-chain scanning deferred** — baseline hygiene *is* required (image digest pinning + prefer official/verified images, ADR-011 / service checklist; gitleaks), but images and dependencies are not actively vulnerability-scanned (Trivy/Grype) or signature-verified | Scanning only pays off with the capacity to triage its output; the realistic threat is opportunistic, not a targeted supply-chain attack | A monitoring/triage stack is live; hosting high-value data/finances for others; a relevant upstream compromise |
| R2 | **SELinux not used** — no SELinux mandatory access control | AppArmor — Debian-native and enforced via the CIS baseline — already provides MAC; adding SELinux means two MAC systems, non-native to Debian, for no real gain | A service that ships and requires its own SELinux policy; threat model shifts toward targeted attackers |
_Last reviewed: 2026-06-04 (seeded — pending a first re-challenge pass)._
_Last reviewed: 2026-06-04. The prior gaps (full CIS hardening, SELinux/AppArmor,
IDS) were re-challenged and **adopted rather than accepted**: CIS Debian L1+L2 + CIS
Docker, AppArmor (enforce), AIDE file-integrity, and Suricata network IDS are now
part of the security strategy (ADR-002). See STATUS.md / `docs/TODO.md` for build
status. As CIS is implemented, any specific item that proves impractical is added
here as a named exception._
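
The baseline hygiene that R1 now requires, image digest pinning, can be checked mechanically even while active scanning stays deferred. This is a minimal sketch, assuming image references are available as plain strings; the function names and the regular expression are illustrative assumptions, and the check only confirms that a `sha256` digest is present, not that the image is official or verified.

```python
import re

# Hypothetical check: a reference counts as pinned when it ends in
# "@sha256:" followed by 64 lowercase hex characters. A tag may precede it.
DIGEST_PINNED = re.compile(r"^[^@\s]+@sha256:[0-9a-f]{64}$")


def is_digest_pinned(image_ref):
    """Return True if the image reference is pinned to a sha256 digest."""
    return DIGEST_PINNED.match(image_ref) is not None


def unpinned_images(image_refs):
    """Return the references that rely on a mutable tag instead of a digest."""
    return [ref for ref in image_refs if not is_digest_pinned(ref)]


if __name__ == "__main__":
    refs = [
        "nginx:1.27",                      # tag only: mutable
        "nginx:1.27@sha256:" + "0" * 64,   # tag plus digest: pinned
    ]
    print(unpinned_images(refs))
```

A check like this could run in a pre-commit hook alongside gitleaks, which would keep the rule enforced without needing the triage capacity that scanning requires.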