7 changed files with 24 additions and 245 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -154,10 +154,6 @@ Single-contributor, trunk-based (no merge requests / approval gates):
 - Edit vault-encrypted files directly — decrypt first, re-encrypt after
 - Force-push or rewrite already-pushed history on `main`
 - Add a collection to `requirements.yml` without a specific module need in existing role tasks
 - Open a firewall port anywhere but the `group_vars` firewall definitions — never ad-hoc on a host (ADR-002)
 - Disable or weaken a baseline control from ADR-002 (SSH hardening, nftables default-deny, fail2ban, auditd)
 - Expose a service to the LAN/WAN without it sitting behind the reverse proxy with authentication (ADR-002)
 - Deploy a service that hasn't cleared `docs/security/service-checklist.md` (record any deviation in `docs/security/accepted-risks.md`)
 ---
@@ -166,9 +162,7 @@ Single-contributor, trunk-based (no merge requests / approval gates):
 | Topic                  | File                                  |
 |------------------------|---------------------------------------|
 | Architecture overview  | `docs/decisions/001-architecture.md`  |
-| Security baseline & strategy | `docs/decisions/002-security.md`      |
+| Security baseline      | `docs/decisions/002-security.md`      |
 | Accepted security risks | `docs/security/accepted-risks.md`     |
 | Per-service security checklist | `docs/security/service-checklist.md` |
 | Toolchain choices      | `docs/decisions/003-toolchain.md`     |
 | Docker & Compose model | `docs/decisions/004-docker-model.md`  |
 | Bootstrapping hosts    | `docs/decisions/005-bootstrapping.md` |
--- a/STATUS.md
+++ b/STATUS.md
@@ -23,7 +23,6 @@ _Last reviewed: 2026-05-30._
 | Terraform HCL (`terraform/`) | Written (proxmox VM module + envs) — but never run; see below |
 | `docs/hardware/reference.md` + `scripts/capacity-scan.py` | Present — reference doc (skeleton until real hardware) + stdlib scan; emits capacity JSON |
 | `/capacity-review` | Works — on-demand capacity evaluation → `docs/hardware/reviews/`. Intent-based (no live usage yet) |
 | ADR-002 security strategy + `docs/security/{accepted-risks,service-checklist}.md` | Present — threat model, principles, governance frame; checklist + risk register are docs, enforced manually in review |
 ## Scaffolded but empty — NOT implemented
@@ -48,9 +47,6 @@ So `make deploy PLAYBOOK=site` currently **fails** on a clean clone — the `bas
 | Per-service roles | ADR-004 | Model defined; no service roles built |
 | Forgejo Actions CI | ADR-003 / ADR-008 | Remote is live (pushed); Actions/`act_runner` pipeline not yet built |
 | Live usage stats for `/capacity-review` | ADR-012 / TODO 8.4 | `gather_usage()` stubbed; source undecided (Proxmox RRD vs PLG stack); needs the cluster |
 | `/security-review` skill | ADR-002 / TODO 8.5 | Periodic posture re-check + accepted-risk re-challenge; planned, not built |
 | CIS hardening (Debian L1+L2 + Docker) | ADR-002 / TODO 15 | Implemented by the (unbuilt) `base`/`docker_host` roles; brings AppArmor + AIDE as baseline. L2 partitions affect VM provisioning (ADR-006) |
 | Network IDS + security alerting | ADR-002 / TODO 15 | Suricata on OPNsense + AIDE/`auditd`/`fail2ban` alerting into the monitoring stack; not built |
 ## Keeping this honest
--- a/docs/TODO.md
+++ b/docs/TODO.md
@@ -60,12 +60,6 @@
      Prometheus/Loki/Grafana/Grafana-Alloy stack we will likely set up anyway
      (richer, per-process, but more to run) — see TODO 3.6. Don't build the
      Proxmox-RRD hook before settling this, to avoid throwaway work.
   5. Build a `/security-review` skill (sibling to `/review-repo`): re-check the
      security posture against ADR-002, surface drift, and re-challenge the
      accepted-risk register (`docs/security/accepted-risks.md`). Could pair a
      deterministic pre-scan (undeclared open ports, disabled baseline controls,
      world-readable secrets, services not behind auth) with a judgement pass.
      Open question: standalone, or folded into the kaizen `/retro` (item 11)?
 9. Should we make a basic function so that tools (and AI) can send messages to the user - email, matrix or ntfy?
 10. **Claude setup** — DECIDED: brainstorm for intent, capture as ADRs (skip plan
@@ -74,9 +68,6 @@
    1. Policy for how we collaborate with references to baobabAnsibleV4 without misusing it.
    2. Policy for how we write key documents like ADRs.
    3. Further development on how we collaborate on designing the foundation for the project - separate from how we implement new containers etc.
    4. How do we make sure agents always use the latest official documentation for the technologies etc. we use?
    5. Always subagent driven?
    6. When AI deploys, i.e. runs playbooks etc., should we make a methodology so that it does not have to poll all the time or review all the output? Perhaps something about the MAKE method could provide only the relevant feedback?
 11. **Kaizen loop** — set up ~2026-06-06 (one week from now).
    1. Build `/retro`: reads `docs/FRICTION.md` + recurring `/review-repo`
@@ -97,20 +88,3 @@
    whether selectively allowing libraries (e.g. PyYAML — already present via
    Ansible) is a better fit in general: weigh the parsing-correctness win
    against losing zero-setup portability. Decide a clear rule and record it.
 15. **Security hardening implementation** — build out the ADR-002 hardening standard.
    1. Implement the CIS Debian Benchmark **Level 1 + Level 2** in the `base` role
       (local tasks; CIS / `dev-sec` as reference only — no Galaxy roles). Includes
       AppArmor (enforce mode) and AIDE file-integrity.
    2. Implement the CIS Docker Benchmark: daemon/engine settings in `docker_host`;
       per-container settings enforced via `docs/security/service-checklist.md`.
    3. VM disk layout for CIS L2: separate `/tmp`, `/var`, `/var/log`, `/home`
       partitions with `nodev,nosuid,noexec` — a Terraform/cloud-init concern
       (ADR-006). Decide the template layout **before** provisioning, since it is
       painful to retrofit.
    4. Network IDS: enable Suricata on OPNsense (IDS first; IPS later?).
    5. Active security alerting: wire AIDE, `auditd`, `fail2ban`, and Suricata into
       the Loki/Grafana alerting stack (ties to 3.6).
    6. Supply-chain hygiene: enforce image digest pinning + official/verified images
       via the service checklist; revisit active scanning (Trivy/Grype) once a
       triage stack exists (accepted-risk R1).
--- a/docs/decisions/002-security.md
+++ b/docs/decisions/002-security.md
@@ -1,61 +1,24 @@
-# ADR-002 — Security baseline and strategy
+# ADR-002 — Security baseline
 ## Context
-Security here is not a single control but the sum of several combined efforts —
+Every managed host must reach a defined security baseline before any services
-host hardening, network segmentation, secrets handling, supply-chain hygiene, and
+are deployed. This baseline is applied by the `base` role and is non-negotiable —
-disciplined automation. This ADR is the frame that organizes them: it records the
+it runs first, on every host, every time.
 **threat model** we design against, the **principles** every control serves, the
 host-level **baseline** the `base` role enforces, and the **governance** that keeps
 security sharp as the homelab grows.
-The goal is a principled, maintainable posture for a homelab with some
+The goal is a principled, maintainable baseline appropriate for a homelab with
-public-facing services — effective against a realistic threat model, not a
+some public-facing services — not a compliance exercise.
 compliance exercise.
-Related decisions: network segmentation (ADR-007), secrets structure (ADR-003),
+## Baseline components
 per-service roles (ADR-004), CI secret-scanning (ADR-010).
-## Threat model
+### Access & authentication
 What we deliberately design against — and, just as importantly, what we do not:
 | Threat | In scope? | What it drives |
 |---|---|---|
 | **Opportunistic external** — bots scanning, credential stuffing, mass-exploiting known CVEs in exposed services | Yes — primary | SSH key-only + fail2ban, deny-by-default firewall, security auto-patching, minimal attack surface, services behind a reverse proxy with auth |
 | **Lateral movement / blast radius** — assume one service *is* compromised; limit how far it spreads | Yes | VLAN segmentation (ADR-007), least-privilege containers, no host network mode, per-service isolation, no shared credentials |
 | **Operator / agent error** — accidental secret leak, misconfiguration, or an AI agent making an unsafe change | Yes | Vault + gitleaks, declarative firewall (no ad-hoc ports), review gates, agent guardrails (below), pre-commit hooks |
 | **Supply chain** — compromised images, base images, dependencies, collections | Acknowledged, lower priority | Baseline hygiene required: image digest pinning + prefer official/verified images (ADR-011, service checklist), gitleaks. Active vuln scanning deferred — accepted risk |
 | **Targeted / physical** — a determined adversary specifically after this homelab, or physical device access | Out of scope | Not designed against at this scale; revisit if the threat model changes |
 Supply chain is consciously deprioritized, not forgotten — see
 `docs/security/accepted-risks.md`.
 ## Security principles
 Every control below should trace back to one of these:
 - **Defense in depth** — no single control is load-bearing; layers compensate.
 - **Least privilege** — accounts, containers, and automation get the minimum they need.
 - **Deny / secure by default** — closed unless explicitly opened; safe defaults.
 - **Contain the blast radius** — segment and isolate so one compromise isn't total.
 - **Automated & reproducible** — the baseline is reached by Ansible, never by hand.
 - **Explicit & revisitable** — decisions and accepted risks are written down and
  re-challenged, not left implicit.
 ## Baseline controls
 Applied by the `base` role, non-negotiable — it runs first, on every host, every
 time. Each heading tags the threat(s) it primarily serves.
 ### Access & authentication — *opportunistic, agent error*
 - SSH key authentication only — password auth disabled
 - Root login disabled — `PermitRootLogin no`
 - Dedicated `ansible` user with locked-down sudo (NOPASSWD for automation)
 - No shared user accounts — per-person SSH keys in `group_vars/all/vars.yml`
-### Firewall — *opportunistic, blast radius, agent error*
+### Firewall
 - `nftables` (native on Debian 13, replaces iptables)
 - Default policy: deny inbound, allow established/related, allow loopback
@@ -67,45 +30,29 @@ time. Each heading tags the threat(s) it primarily serves.
 > This is addressed by setting `"iptables": false` in Docker daemon config and managing
 > all rules via nftables explicitly. See `docs/decisions/004-docker-model.md`.
-### Intrusion deterrence — *opportunistic*
+### Intrusion deterrence
 - `fail2ban` monitoring SSH (and optionally reverse proxy logs)
 - Configured to ban after 5 failed attempts, 1-hour ban
-### Updates — *opportunistic*
+### Updates
 - `unattended-upgrades` enabled for **security patches only**
 - Full system upgrades triggered deliberately via Ansible (`make deploy PLAYBOOK=upgrade`)
 - No automatic reboots — reboots are a conscious operational decision
-### Minimal attack surface — *opportunistic, blast radius*
+### Minimal attack surface
 - No unnecessary packages installed
 - Docker daemon TCP socket disabled — Unix socket only
 - No open ports beyond those explicitly defined in firewall rules
-### Audit trail — *agent error, blast radius*
+### Audit trail
 - `auditd` installed and running with a baseline ruleset
 - Logs shipped to a central location if a log aggregation service is available
-### Mandatory access control — *blast radius*
+## Secrets management
 - **AppArmor** enabled with profiles in enforce mode — Debian-native MAC, default-on,
  and required by the CIS Debian benchmark. Docker applies its `docker-default`
  profile to containers; tighter per-service profiles are authored as needed.
 - **SELinux is not used** — non-native to Debian and redundant with AppArmor
  (see `docs/security/accepted-risks.md`).
 ### File integrity & intrusion detection — *opportunistic, blast radius, agent error*
 - **AIDE** file-integrity monitoring (required by the CIS Debian benchmark) — detects
  unexpected changes to system files
 - **Network IDS** — Suricata on OPNsense (planned; see STATUS.md / TODO)
 - **Active alerting** wires AIDE, `auditd`, `fail2ban`, and Suricata into the
  monitoring/alerting stack (planned; ties to the Loki/Grafana effort)
 ## Secrets management — *agent error, opportunistic*
 - Ansible Vault for all secrets (API keys, passwords, certificates), structured as a
  nested `vault.<service>.<key>` map (ADR-003)
@@ -115,65 +62,15 @@ time. Each heading tags the threat(s) it primarily serves.
  `rbw unlock`; nothing decryptable sits at rest in the repo or working tree
 - See `docs/runbooks/rotate-secrets.md` for `rbw` setup and rotation
-## Hardening standard
+## What this baseline does not include
-The baseline above is implemented to a recognised benchmark rather than ad-hoc:
+- Full CIS benchmark hardening — adds complexity for marginal gain at this scale
-
+- SELinux / AppArmor — not applied by default, revisit if threat model changes
- **Hosts** — the **CIS Debian Benchmark, Levels 1 and 2**, applied by the `base`
+- Intrusion detection (IDS) — out of scope for now
  role. Some L2 items require separate partitions (`/tmp`, `/var`, `/var/log`,
  `/home`) with restrictive mount options (`nodev,nosuid,noexec`) — that reaches into
  VM disk layout, a provisioning concern (Terraform / cloud-init, ADR-006), not just
  the `base` role.
 - **Container runtime** — the **CIS Docker Benchmark**: daemon/engine settings in the
  `docker_host` role; per-container run settings (non-root, read-only rootfs, dropped
  capabilities, no `privileged`, no host namespaces) enforced via
  `docs/security/service-checklist.md`.
 - **Application containers** — no CIS benchmark exists for the app long tail
  (Jellyfin, Nextcloud, Forgejo, …); they are covered by the CIS Docker run settings
  plus the service checklist plus upstream hardening guidance.
 Hardening controls are **implemented as local roles** (per the no-Galaxy-roles
 policy, ADR-003), using the CIS benchmarks and community roles (e.g. `dev-sec`) only
 as reference. Any specific CIS item that proves impractical is exempted into
 `docs/security/accepted-risks.md` with a rationale — so the register records named
 exceptions, not a blanket opt-out.
 ## Governance
 Security is maintained, not achieved once. This ADR **establishes** four
 mechanisms; each lives where change is cheap and is linked from here.
 - **Per-service security bar** — every exposed service must clear a defined
  checklist before deploy (secrets in vault, no default creds, least-privilege /
  non-root, declared firewall ports, reverse-proxy + auth if exposed). Lives in
  `docs/security/service-checklist.md`; referenced from `docs/runbooks/new-role.md`.
  Enforced manually in review today; the planned `/security-review` will automate it.
 - **Periodic security review** — a recurring review that re-checks posture,
  surfaces drift, and re-challenges accepted risks. Planned as a `/security-review`
  skill (sibling to `/review-repo`); see `docs/TODO.md` (Scheduled work). Not built
  yet — see STATUS.md.
 - **Accepted-risk register** — the conscious trade-offs we choose to live with, each
  with rationale and a revisit trigger. Lives in `docs/security/accepted-risks.md`
  (expected to change; kept out of this ADR so the ADR stays stable).
 - **Agent / automation guardrails** — what AI agents and automation may do
  unsupervised vs. what needs a human gate, since operator/agent error is in the
  threat model. Encoded in `CLAUDE.md` ("What Claude must not do without explicit
  instruction") and enforced by PreToolUse hooks (generated-file guard, `rbw`
  pre-flight).
 ## Decision
-This posture was chosen to be:
+This baseline was chosen to be:
-
+- **Effective** against the realistic threat model (exposed services, shared repo)
- **Effective** against the stated threat model (opportunistic external, lateral
+- **Maintainable** by a small team without security expertise overhead
-  movement, operator/agent error)
+- **Automated** — no manual steps should be needed to reach baseline state
 - **Maintainable** by a small team without security-expertise overhead
 - **Automated** — no manual steps to reach baseline state
 - **Legible & revisitable** — the threat model, principles, and accepted risks are
  written down and reviewed over time, not implicit
 - **Benchmarked** — host and container hardening follow CIS (Debian L1+L2, Docker),
  not ad-hoc choices
 Out-of-scope items and conscious trade-offs are recorded in
 `docs/security/accepted-risks.md` rather than here, so this decision record stays
 stable while the risk posture evolves.
--- a/docs/runbooks/new-role.md
+++ b/docs/runbooks/new-role.md
@@ -71,16 +71,7 @@ Fix any lint or test failures before committing.
 Add the role to the appropriate playbook in `playbooks/` and add the host group
 to `inventories/staging/hosts.yml` for integration testing.
-### 9. Clear the security checklist (services)
+### 9. Commit
 If the role is a **service** — especially one reachable beyond its own host —
 walk `docs/security/service-checklist.md` and confirm every item passes (secrets
 in vault, no default creds, least-privilege, declared firewall ports, behind the
 reverse proxy with auth if exposed). Record any conscious deviation in
 `docs/security/accepted-risks.md`. This bar is established by ADR-002; enforcement
 is manual in review today, with the planned `/security-review` to automate it.
 ### 10. Commit
 ```bash
 git checkout -b role/<rolename>
--- a/docs/security/accepted-risks.md
+++ b/docs/security/accepted-risks.md
@@ -1,24 +0,0 @@
 # Accepted security risks
 Conscious security trade-offs we are choosing to live with — recorded so "what we
 are *not* doing" is explicit and revisitable, not forgotten. This register is a
 **living document**, deliberately kept out of ADR-002 (which records durable
 decisions) so the ADR stays stable.
 Owned by **ADR-002** (Security baseline and strategy). Re-challenged during the
 periodic security review (planned `/security-review`; see `docs/TODO.md`).
 **Each entry:** the risk · why we accept it (rationale) · what would make us
 revisit (trigger).
 | # | Accepted risk | Rationale | Revisit trigger |
 |---|---|---|---|
 | R1 | **Active supply-chain scanning deferred** — baseline hygiene *is* required (image digest pinning + prefer official/verified images, ADR-011 / service checklist; gitleaks), but images and dependencies are not actively vulnerability-scanned (Trivy/Grype) or signature-verified | Scanning only pays off with the capacity to triage its output; the realistic threat is opportunistic, not a targeted supply-chain attack | A monitoring/triage stack is live; hosting high-value data/finances for others; a relevant upstream compromise |
 | R2 | **SELinux not used** — no SELinux mandatory access control | AppArmor — Debian-native and enforced via the CIS baseline — already provides MAC; adding SELinux means two MAC systems, non-native to Debian, for no real gain | A service that ships and requires its own SELinux policy; threat model shifts toward targeted attackers |
 _Last reviewed: 2026-06-04. The prior gaps (full CIS hardening, SELinux/AppArmor,
 IDS) were re-challenged and **adopted rather than accepted**: CIS Debian L1+L2 + CIS
 Docker, AppArmor (enforce), AIDE file-integrity, and Suricata network IDS are now
 part of the security strategy (ADR-002). See STATUS.md / `docs/TODO.md` for build
 status. As CIS is implemented, any specific item that proves impractical is added
 here as a named exception._
--- a/docs/security/service-checklist.md
+++ b/docs/security/service-checklist.md
@@ -1,49 +0,0 @@
 # Per-service security checklist
 The bar every service (a per-service role — ADR-004) must clear **before deploy**,
 especially anything reachable beyond its own host. Established by **ADR-002**
 (Security baseline and strategy); referenced from `docs/runbooks/new-role.md`.
 Enforced manually in review today; the planned `/security-review` skill (see
 `docs/TODO.md`) will automate the check.
 Treat each item as must-pass **unless** a deviation is recorded in
 `docs/security/accepted-risks.md` with a rationale and a revisit trigger.
 ## Secrets & credentials
 - [ ] All secrets live in an encrypted `vault.yml` (`vault.<service>.<key>`); none in
      plaintext files, templates, or Compose env literals
 - [ ] No default or vendor-shipped credentials remain — admin passwords/tokens are
      generated and stored in vault
 - [ ] Nothing secret is baked into an image or committed to git (gitleaks must pass)
 ## Least privilege
 - [ ] Container runs as a non-root user where the image supports it
 - [ ] No `privileged: true` and no host network mode unless explicitly justified
 - [ ] Only the volumes/paths the service needs are mounted; read-only where possible
 - [ ] Linux capabilities dropped to what's required (no blanket grants)
 ## Network & exposure
 - [ ] Every listening port is declared in `group_vars` firewall definitions — never
      opened ad-hoc on a host
 - [ ] The service is not published directly to a LAN/WAN port if it can sit behind the
      reverse proxy instead
 - [ ] Anything reachable beyond the `srv` VLAN is behind the reverse proxy **with
      authentication** (and TLS)
 - [ ] Inter-service reach follows least privilege — no broad `srv`→`srv` access where a
      single declared dependency suffices
 ## Updates & provenance
 - [ ] Image/source version is pinned (tag or digest), not floating `latest` (ADR-011)
 - [ ] The update path is known — how this service gets patched
 ## Operability (security-adjacent)
 - [ ] Logs go somewhere reviewable (central aggregation when available)
 - [ ] Backup/restore is covered if the service holds state
 > Deviations are allowed but must be **conscious**: record them in
 > `docs/security/accepted-risks.md`, don't leave them implicit.