# ADR-002 — Security baseline and strategy

## Context

Security here is not a single control but the sum of several combined efforts — host hardening, network segmentation, secrets handling, supply-chain hygiene, and disciplined automation. This ADR is the frame that organizes them: it records the **threat model** we design against, the **principles** every control serves, the host-level **baseline** the `base` role enforces, and the **governance** that keeps security sharp as the homelab grows.

The goal is a principled, maintainable posture for a homelab with some public-facing services — effective against a realistic threat model, not a compliance exercise.

Related decisions: network segmentation (ADR-007), secrets structure (ADR-003), per-service roles (ADR-004), CI secret-scanning (ADR-010).

## Threat model

What we deliberately design against — and, just as importantly, what we do not:

| Threat | In scope? | What it drives |
|---|---|---|
| **Opportunistic external** — bots scanning, credential stuffing, mass-exploiting known CVEs in exposed services | Yes — primary | SSH key-only + fail2ban, deny-by-default firewall, security auto-patching, minimal attack surface, services behind a reverse proxy with auth |
| **Lateral movement / blast radius** — assume one service *is* compromised; limit how far it spreads | Yes | VLAN segmentation (ADR-007), least-privilege containers, no host network mode, per-service isolation, no shared credentials |
| **Operator / agent error** — accidental secret leak, misconfiguration, or an AI agent making an unsafe change | Yes | Vault + gitleaks, declarative firewall (no ad-hoc ports), review gates, agent guardrails (below), pre-commit hooks |
| **Supply chain** — compromised images, base images, dependencies, collections | Acknowledged, lower priority | Version pinning where practical (ADR-011), gitleaks; tracked as an accepted risk with a revisit trigger |
| **Targeted / physical** — a determined adversary specifically after this homelab, or physical device access | Out of scope | Not designed against at this scale; revisit if the threat model changes |

Supply chain is consciously deprioritized, not forgotten — see `docs/security/accepted-risks.md`.

## Security principles

Every control below should trace back to one of these:

- **Defense in depth** — no single control is load-bearing; layers compensate.
- **Least privilege** — accounts, containers, and automation get the minimum they need.
- **Deny / secure by default** — closed unless explicitly opened; safe defaults.
- **Contain the blast radius** — segment and isolate so one compromise isn't total.
- **Automated & reproducible** — the baseline is reached by Ansible, never by hand.
- **Explicit & revisitable** — decisions and accepted risks are written down and re-challenged, not left implicit.

## Baseline controls

Applied by the `base` role, non-negotiable — it runs first, on every host, every time. Each heading tags the threat(s) it primarily serves.

### Access & authentication — *opportunistic, agent error*

- SSH key authentication only — password auth disabled
- Root login disabled — `PermitRootLogin no`
- Dedicated `ansible` user with locked-down sudo (NOPASSWD for automation)
- No shared user accounts — per-person SSH keys in `group_vars/all/vars.yml`

### Firewall — *opportunistic, blast radius, agent error*

- `nftables` (native on Debian 13, replaces iptables)
- Default policy: deny inbound, allow established/related, allow loopback
- Rules managed entirely by Ansible — never edited manually on hosts
- Port definitions live in `group_vars/` so rules stay in sync with deployed services
- Docker's own iptables rules are disabled — nftables manages all filtering

> **Note on Docker + nftables**: Docker historically bypassed iptables-based firewalls.
> This is addressed by setting `"iptables": false` in Docker daemon config and managing
> all rules via nftables explicitly. See `docs/decisions/004-docker-model.md`.
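
To illustrate how port definitions could drive the default-deny ruleset described in the Firewall section, here is a minimal Python sketch. It is not the `base` role's actual template: the `service_ports` mapping, the `inet filter` table and `input` chain names, and the `render_ruleset` function are all assumptions made for illustration.

```python
from typing import Mapping, Sequence


def render_ruleset(service_ports: Mapping[str, Sequence[int]]) -> str:
    """Render a default-deny nftables input chain from declared TCP ports.

    `service_ports` is a hypothetical stand-in for the port definitions
    kept in group_vars/, e.g. {"ssh": [22], "proxy": [80, 443]}.
    """
    lines = [
        "table inet filter {",
        "    chain input {",
        # Deny inbound by default; only the rules below open anything.
        "        type filter hook input priority 0; policy drop;",
        '        iif "lo" accept',
        "        ct state established,related accept",
    ]
    for name in sorted(service_ports):
        ports = sorted(set(service_ports[name]))
        for port in ports:
            if not 1 <= port <= 65535:
                raise ValueError(f"{name}: invalid TCP port {port}")
        if not ports:
            continue  # a service with no declared ports opens nothing
        port_set = ", ".join(str(p) for p in ports)
        lines.append(f'        tcp dport {{ {port_set} }} accept comment "{name}"')
    lines += ["    }", "}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_ruleset({"ssh": [22], "proxy": [443, 80]}))
```

Because the only `accept` rules for service traffic come from the declared mapping, a port that is not declared stays closed, which is the property the "no ad-hoc ports" control relies on.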
### Intrusion deterrence — *opportunistic*

- `fail2ban` monitoring SSH (and optionally reverse proxy logs)
- Configured to ban after 5 failed attempts, 1-hour ban

### Updates — *opportunistic*

- `unattended-upgrades` enabled for **security patches only**
- Full system upgrades triggered deliberately via Ansible (`make deploy PLAYBOOK=upgrade`)
- No automatic reboots — reboots are a conscious operational decision

### Minimal attack surface — *opportunistic, blast radius*

- No unnecessary packages installed
- Docker daemon TCP socket disabled — Unix socket only
- No open ports beyond those explicitly defined in firewall rules

### Audit trail — *agent error, blast radius*

- `auditd` installed and running with a baseline ruleset
- Logs shipped to a central location if a log aggregation service is available

## Secrets management — *agent error, opportunistic*

- Ansible Vault for all secrets (API keys, passwords, certificates), structured as a nested `vault..` map (ADR-003)
- The master vault password lives in **Vaultwarden** and is fetched on demand by `scripts/vault-pass-client.sh` (wired as `vault_password_file`) through the `rbw` agent — never written to a plaintext file on disk. Unlock once per session with `rbw unlock`; nothing decryptable sits at rest in the repo or working tree
- See `docs/runbooks/rotate-secrets.md` for `rbw` setup and rotation

## Governance

Security is maintained, not achieved once. This ADR **establishes** four mechanisms; each lives where change is cheap and is linked from here.

- **Per-service security bar** — every exposed service must clear a defined checklist before deploy (secrets in vault, no default creds, least-privilege / non-root, declared firewall ports, reverse-proxy + auth if exposed). Lives in `docs/security/service-checklist.md`; referenced from `docs/runbooks/new-role.md`. Enforced manually in review today; the planned `/security-review` will automate it.
- **Periodic security review** — a recurring review that re-checks posture, surfaces drift, and re-challenges accepted risks. Planned as a `/security-review` skill (sibling to `/review-repo`); see `docs/TODO.md` (Scheduled work). Not built yet — see STATUS.md.
- **Accepted-risk register** — the conscious trade-offs we choose to live with, each with rationale and a revisit trigger. Lives in `docs/security/accepted-risks.md` (expected to change; kept out of this ADR so the ADR stays stable).
- **Agent / automation guardrails** — what AI agents and automation may do unsupervised vs. what needs a human gate, since operator/agent error is in the threat model. Encoded in `CLAUDE.md` ("What Claude must not do without explicit instruction") and enforced by PreToolUse hooks (generated-file guard, `rbw` pre-flight).

## Decision

This posture was chosen to be:

- **Effective** against the stated threat model (opportunistic external, lateral movement, operator/agent error)
- **Maintainable** by a small team without security-expertise overhead
- **Automated** — no manual steps to reach baseline state
- **Legible & revisitable** — the threat model, principles, and accepted risks are written down and reviewed over time, not implicit

Out-of-scope items and conscious trade-offs are recorded in `docs/security/accepted-risks.md` rather than here, so this decision record stays stable while the risk posture evolves.
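
The per-service security bar listed under Governance is enforced manually in review today. As a hedged sketch of how the same five checks might be evaluated mechanically, the Python below uses a hypothetical `Service` descriptor. Its field names and the `checklist_failures` function are assumptions for illustration; they do not reflect the contents of `docs/security/service-checklist.md` or the design of the planned `/security-review` skill.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Service:
    """Hypothetical per-service descriptor; every field name is an assumption."""
    name: str
    secrets_in_vault: bool
    default_creds_changed: bool
    runs_as_root: bool
    declared_ports: list[int] = field(default_factory=list)
    listening_ports: list[int] = field(default_factory=list)
    exposed: bool = False
    behind_proxy_with_auth: bool = False


def checklist_failures(svc: Service) -> list[str]:
    """Return the checklist items the service does not yet clear."""
    failures = []
    if not svc.secrets_in_vault:
        failures.append("secrets not in vault")
    if not svc.default_creds_changed:
        failures.append("default credentials still in place")
    if svc.runs_as_root:
        failures.append("runs as root")
    # Any port the service listens on must also be declared for the firewall.
    undeclared = sorted(set(svc.listening_ports) - set(svc.declared_ports))
    if undeclared:
        failures.append(f"undeclared firewall ports: {undeclared}")
    # The proxy + auth requirement only applies to exposed services.
    if svc.exposed and not svc.behind_proxy_with_auth:
        failures.append("exposed without reverse proxy + auth")
    return failures
```

An empty list means the service clears the bar; anything else names the items a reviewer would need to resolve before deploy.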