ADR-002 — Security baseline and strategy
Context
Security here is not a single control but the sum of several combined efforts —
host hardening, network segmentation, secrets handling, supply-chain hygiene, and
disciplined automation. This ADR is the frame that organizes them: it records the
threat model we design against, the principles every control serves, the
host-level baseline the base role enforces, and the governance that keeps
security sharp as the homelab grows.
The goal is a principled, maintainable posture for a homelab with some public-facing services — effective against a realistic threat model, not a compliance exercise.
Related decisions: network segmentation (ADR-007), secrets structure (ADR-003), per-service roles (ADR-004), CI secret-scanning (ADR-010).
Threat model
What we deliberately design against — and, just as importantly, what we do not:
| Threat | In scope? | What it drives |
|---|---|---|
| Opportunistic external — bots scanning, credential stuffing, mass-exploiting known CVEs in exposed services | Yes — primary | SSH key-only + fail2ban, deny-by-default firewall, security auto-patching, minimal attack surface, services behind a reverse proxy with auth |
| Lateral movement / blast radius — assume one service is compromised; limit how far it spreads | Yes | VLAN segmentation (ADR-007), least-privilege containers, no host network mode, per-service isolation, no shared credentials |
| Operator / agent error — accidental secret leak, misconfiguration, or an AI agent making an unsafe change | Yes | Vault + gitleaks, declarative firewall (no ad-hoc ports), review gates, agent guardrails (below), pre-commit hooks |
| Supply chain — compromised images, base images, dependencies, collections | Acknowledged, lower priority | Baseline hygiene required: tiered image pinning (stateful tag@digest, stateless rolling — ADR-011) + prefer official/verified images, gitleaks. Active vuln scanning deferred — accepted risk |
| Targeted / physical — a determined adversary specifically after this homelab, or physical device access | Out of scope | Not designed against at this scale; revisit if the threat model changes |
Supply chain is consciously deprioritized, not forgotten — see
docs/security/accepted-risks.md.
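As a rough illustration of the tiered pinning rule in the supply-chain row, a Compose-style fragment might look like the sketch below. The service names, the images, the tags, and the `<digest>` placeholder are all hypothetical and are not taken from this repository.

```yaml
# Hypothetical Compose fragment; names, tags, and the digest are placeholders.
services:
  database:
    # Stateful: readable tag plus integrity digest (tag@digest).
    image: "postgres:17@sha256:<digest>"
  whoami:
    # Stateless: rolling tag; the container is disposable by design.
    image: "traefik/whoami:latest"
```

The tag keeps diffs legible for a reviewer, while the digest is what the runtime actually resolves, so a swapped image under the same tag would not be pulled silently.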
Security principles
Every control below should trace back to one of these:
- Defense in depth — no single control is load-bearing; layers compensate.
- Least privilege — accounts, containers, and automation get the minimum they need.
- Deny / secure by default — closed unless explicitly opened; safe defaults.
- Contain the blast radius — segment and isolate so one compromise isn't total.
- Automated & reproducible — the baseline is reached by Ansible, never by hand.
- Explicit & revisitable — decisions and accepted risks are written down and re-challenged, not left implicit.
Baseline controls
Applied by the base role, non-negotiable — it runs first, on every host, every
time. Each heading tags the threat(s) it primarily serves.
Access & authentication — opportunistic, agent error
- SSH key authentication only — password auth disabled
- Root login disabled (`PermitRootLogin no`)
- Dedicated `ansible` user with locked-down sudo (NOPASSWD for automation)
- No shared user accounts; per-person SSH keys in `group_vars/all/vars.yml`
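A minimal sketch of how the `base` role could enforce the two SSH settings above, assuming a drop-in file under `/etc/ssh/sshd_config.d/`. The file name `00-hardening.conf` and the handler name `Restart ssh` are assumptions made for illustration, not names from this repository.

```yaml
# Sketch only: drop-in file name and handler name are hypothetical.
- name: Install SSH hardening drop-in
  ansible.builtin.copy:
    dest: /etc/ssh/sshd_config.d/00-hardening.conf
    owner: root
    group: root
    mode: "0644"
    # Refuse to install a file that sshd cannot parse.
    validate: "/usr/sbin/sshd -t -f %s"
    content: |
      PasswordAuthentication no
      PermitRootLogin no
  notify: Restart ssh
```

The values are quoted inside a literal block so YAML does not turn `no` into a boolean, and the `validate` step lowers the chance of locking yourself out with a malformed config.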
Firewall — opportunistic, blast radius, agent error
- `nftables` (native on Debian 13, replaces iptables)
- Default policy: deny inbound, allow established/related, allow loopback
- Rules managed entirely by Ansible, never edited manually on hosts
- Port definitions live in `group_vars/` so rules stay in sync with deployed services
- Docker's own iptables rules are disabled; nftables manages all filtering
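The firewall bullets above could be sketched as two Ansible tasks along these lines. The variable `firewall_tcp_ports` (assumed to be a non-empty list defined in `group_vars/`) and the handler names are hypothetical. The ruleset covers only the input chain, so the forward and NAT rules that containers need once Docker stops managing iptables are deliberately left out.

```yaml
# Sketch only: firewall_tcp_ports and the handler names are hypothetical.
- name: Install deny-by-default nftables ruleset
  ansible.builtin.copy:
    dest: /etc/nftables.conf
    owner: root
    group: root
    mode: "0644"
    # Syntax-check the ruleset without applying it.
    validate: "/usr/sbin/nft -c -f %s"
    content: |
      #!/usr/sbin/nft -f
      flush ruleset
      table inet filter {
        chain input {
          type filter hook input priority 0; policy drop;
          iif "lo" accept
          ct state established,related accept
          tcp dport { {{ firewall_tcp_ports | join(', ') }} } accept
        }
      }
  notify: Reload nftables

- name: Stop Docker from managing iptables rules
  ansible.builtin.copy:
    dest: /etc/docker/daemon.json
    owner: root
    group: root
    mode: "0644"
    content: |
      {
        "iptables": false
      }
  notify: Restart docker
```

Rendering the port set from a single variable is one way to keep the ruleset in step with the services declared in `group_vars/`.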
Note on Docker + nftables: Docker historically bypassed iptables-based firewalls. This is addressed by setting `"iptables": false` in the Docker daemon config and managing all rules via nftables explicitly. See `docs/decisions/004-docker-model.md`.
Intrusion deterrence — opportunistic
- `fail2ban` monitoring SSH (and optionally reverse proxy logs)
- Configured to ban after 5 failed attempts, 1-hour ban
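The thresholds above map onto a small fail2ban jail override. This is a sketch under assumptions: the file name `sshd.local` and the handler name are hypothetical, and the log backend and ban action are left at the package defaults, which may need adjusting on a host that logs to the journal and filters with nftables.

```yaml
# Sketch only: file name and handler name are hypothetical.
- name: Configure fail2ban SSH jail
  ansible.builtin.copy:
    dest: /etc/fail2ban/jail.d/sshd.local
    owner: root
    group: root
    mode: "0644"
    content: |
      [sshd]
      enabled = true
      # Ban after 5 failed attempts, for one hour.
      maxretry = 5
      bantime = 1h
  notify: Restart fail2ban
```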
Updates — opportunistic
- `unattended-upgrades` enabled for security patches only
- Full system upgrades triggered deliberately via Ansible (`make deploy PLAYBOOK=upgrade`)
- No automatic reboots; reboots are a conscious operational decision
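One possible shape for the update policy above is an apt configuration override installed by Ansible. The file name `52unattended-upgrades-local` is an assumption, chosen so it sorts after the packaged `50unattended-upgrades`. The `#clear` line is there because apt list options accumulate across files, so the packaged origins would otherwise stay active.

```yaml
# Sketch only: the override file name is hypothetical.
- name: Restrict unattended-upgrades to security patches, no auto-reboot
  ansible.builtin.copy:
    dest: /etc/apt/apt.conf.d/52unattended-upgrades-local
    owner: root
    group: root
    mode: "0644"
    content: |
      APT::Periodic::Update-Package-Lists "1";
      APT::Periodic::Unattended-Upgrade "1";
      #clear Unattended-Upgrade::Origins-Pattern;
      Unattended-Upgrade::Origins-Pattern {
        "origin=Debian,codename=${distro_codename}-security,label=Debian-Security";
      };
      Unattended-Upgrade::Automatic-Reboot "false";
```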
Minimal attack surface — opportunistic, blast radius
- No unnecessary packages installed
- Docker daemon TCP socket disabled — Unix socket only
- No open ports beyond those explicitly defined in firewall rules
Audit trail — agent error, blast radius
- `auditd` installed and running with a baseline ruleset
- Logs shipped to a central location if a log aggregation service is available
Mandatory access control — blast radius
- AppArmor enabled with profiles in enforce mode: Debian-native MAC, default-on, and required by the CIS Debian benchmark. Docker applies its `docker-default` profile to containers; tighter per-service profiles are authored as needed.
- SELinux is not used: non-native to Debian and redundant with AppArmor (see `docs/security/accepted-risks.md`).
File integrity & intrusion detection — opportunistic, blast radius, agent error
- AIDE file-integrity monitoring (required by the CIS Debian benchmark) — detects unexpected changes to system files
- Network IDS — Suricata on OPNsense (planned; see STATUS.md / TODO)
- Active alerting wires AIDE, `auditd`, `fail2ban`, and Suricata into the monitoring/alerting stack (planned; ties to the Loki/Grafana effort)
Secrets management — agent error, opportunistic
- Ansible Vault for all secrets (API keys, passwords, certificates), structured as a nested `vault.<service>.<key>` map (ADR-003)
- The master vault password lives in Vaultwarden and is fetched on demand by `scripts/vault-pass-client.sh` (wired as `vault_password_file`) through the `rbw` agent, never written to a plaintext file on disk. Unlock once per session with `rbw unlock`; nothing decryptable sits at rest in the repo or working tree
- See `docs/runbooks/rotate-secrets.md` for `rbw` setup and rotation
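To make the nested `vault.<service>.<key>` shape concrete, here is a hedged sketch of what the decrypted vault map and one consumer of it might look like. The service keys, the key names, the values, and the variable `nextcloud_admin_password` are hypothetical placeholders; the real layout is defined in ADR-003.

```yaml
# Sketch only: keys and values are placeholders, shown decrypted for
# readability. In the repo this content would be stored encrypted.
vault:
  nextcloud:
    admin_password: "REPLACE_ME"
  forgejo:
    secret_key: "REPLACE_ME"

# A role variable would then reference the map rather than hold a secret.
nextcloud_admin_password: "{{ vault.nextcloud.admin_password }}"
```

Keeping every secret under one predictable path makes it easier to see which role consumes which secret.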
Hardening standard
The baseline above is implemented to a recognised benchmark rather than ad-hoc:
- Hosts: the CIS Debian Benchmark, Levels 1 and 2, applied by the `base` role. Some L2 items require separate partitions (`/tmp`, `/var`, `/var/log`, `/home`) with restrictive mount options (`nodev`, `nosuid`, `noexec`). That reaches into VM disk layout, a provisioning concern (Terraform / cloud-init, ADR-006), not just the `base` role.
- Container runtime: the CIS Docker Benchmark, with daemon/engine settings in the `docker_host` role and per-container run settings (non-root, read-only rootfs, dropped capabilities, no `privileged`, no host namespaces) enforced via `docs/security/service-checklist.md`.
- Application containers: no CIS benchmark exists for the app long tail (Jellyfin, Nextcloud, Forgejo, …); they are covered by the CIS Docker run settings plus the service checklist plus upstream hardening guidance.
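The per-container run settings named above could translate into a Compose service roughly like this. It is a sketch, not the checklist itself: the service name, the image, and the UID/GID are placeholders, and the `tmpfs` entry is an assumption about what a read-only container commonly needs.

```yaml
# Sketch only: service name, image, and UID/GID are placeholders.
services:
  app:
    image: "example/app:1.2.3"
    user: "1000:1000"      # run as a non-root user
    read_only: true        # read-only root filesystem
    cap_drop:
      - ALL                # drop every capability; add back only what is needed
    privileged: false      # never run privileged
    tmpfs:
      - /tmp               # writable scratch space despite read_only
    # No network_mode: host, pid: host, or ipc: host, so the container
    # keeps its own namespaces.
```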
Hardening controls are implemented as local roles (per the no-Galaxy-roles
policy, ADR-003), using the CIS benchmarks and community roles (e.g. dev-sec) only
as reference. Any specific CIS item that proves impractical is exempted into
docs/security/accepted-risks.md with a rationale — so the register records named
exceptions, not a blanket opt-out.
Governance
Security is maintained, not achieved once. This ADR establishes four mechanisms; each lives where change is cheap and is linked from here.
- Per-service security bar: every exposed service must clear a defined checklist before deploy (secrets in vault, no default creds, least-privilege / non-root, declared firewall ports, reverse-proxy + auth if exposed). The generic bar lives in `docs/security/service-checklist.md`, and each service records how it meets the bar (plus service-specific hardening) in its own `roles/<service>/SECURITY.md`, created from `docs/security/service-security-template.md` (ADR-004). Enforced manually in review today; the planned `/security-review` aggregates every `roles/*/SECURITY.md` and cross-checks it against the role's config.
- Periodic security review: a recurring review that re-checks posture, surfaces drift, and re-challenges accepted risks. Planned as a `/security-review` skill (sibling to `/review-repo`); see `docs/TODO.md` (Scheduled work). Not built yet; see STATUS.md.
- Accepted-risk register: the conscious trade-offs we choose to live with, each with rationale and a revisit trigger. Lives in `docs/security/accepted-risks.md` (expected to change; kept out of this ADR so the ADR stays stable).
- Agent / automation guardrails: what AI agents and automation may do unsupervised vs. what needs a human gate, since operator/agent error is in the threat model. Encoded in `CLAUDE.md` ("What Claude must not do without explicit instruction") and enforced by PreToolUse hooks (generated-file guard, `rbw` pre-flight).
Decision
This posture was chosen to be:
- Effective against the stated threat model (opportunistic external, lateral movement, operator/agent error)
- Maintainable by a small team without security-expertise overhead
- Automated — no manual steps to reach baseline state
- Legible & revisitable — the threat model, principles, and accepted risks are written down and reviewed over time, not implicit
- Benchmarked — host and container hardening follow CIS (Debian L1+L2, Docker), not ad-hoc choices
Out-of-scope items and conscious trade-offs are recorded in
docs/security/accepted-risks.md rather than here, so this decision record stays
stable while the risk posture evolves.