sjat/boma

sjat f338bccd46 Expand ADR-002 into a security baseline + strategy

Add a managerial security frame on top of the host baseline: explicit threat
model (opportunistic external, lateral movement/blast radius, operator/agent
error; supply chain accepted-lower-priority), security principles, and four
governance mechanisms that ADR-002 establishes and links out to:

- docs/security/service-checklist.md — per-service security bar (referenced
  from the new-role runbook)
- docs/security/accepted-risks.md — living accepted-risk register (R1-R4)
- planned /security-review skill (TODO 8.5)
- agent guardrails in CLAUDE.md "what Claude must not do"

STATUS.md records the frame as present (manual enforcement) and /security-review
as planned-not-built.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-04 14:39:51 +02:00

7.3 KiB


ADR-002 — Security baseline and strategy

Context

Security here is not a single control but the sum of several combined efforts — host hardening, network segmentation, secrets handling, supply-chain hygiene, and disciplined automation. This ADR is the frame that organizes them: it records the threat model we design against, the principles every control serves, the host-level baseline the base role enforces, and the governance that keeps security sharp as the homelab grows.

The goal is a principled, maintainable posture for a homelab with some public-facing services — effective against a realistic threat model, not a compliance exercise.

Related decisions: network segmentation (ADR-007), secrets structure (ADR-003), per-service roles (ADR-004), CI secret-scanning (ADR-010).

Threat model

What we deliberately design against — and, just as importantly, what we do not:

Threat	In scope?	What it drives
Opportunistic external — bots scanning, credential stuffing, mass-exploiting known CVEs in exposed services	Yes — primary	SSH key-only + fail2ban, deny-by-default firewall, security auto-patching, minimal attack surface, services behind a reverse proxy with auth
Lateral movement / blast radius — assume one service is compromised; limit how far it spreads	Yes	VLAN segmentation (ADR-007), least-privilege containers, no host network mode, per-service isolation, no shared credentials
Operator / agent error — accidental secret leak, misconfiguration, or an AI agent making an unsafe change	Yes	Vault + gitleaks, declarative firewall (no ad-hoc ports), review gates, agent guardrails (below), pre-commit hooks
Supply chain — compromised images, base images, dependencies, collections	Acknowledged, lower priority	Version pinning where practical (ADR-011), gitleaks; tracked as an accepted risk with a revisit trigger
Targeted / physical — a determined adversary specifically after this homelab, or physical device access	Out of scope	Not designed against at this scale; revisit if the threat model changes

Supply chain is consciously deprioritized, not forgotten — see docs/security/accepted-risks.md.

Security principles

Every control below should trace back to one of these:

Defense in depth — no single control is load-bearing; layers compensate.
Least privilege — accounts, containers, and automation get the minimum they need.
Deny / secure by default — closed unless explicitly opened; safe defaults.
Contain the blast radius — segment and isolate so one compromise isn't total.
Automated & reproducible — the baseline is reached by Ansible, never by hand.
Explicit & revisitable — decisions and accepted risks are written down and re-challenged, not left implicit.

Baseline controls

Applied by the base role, non-negotiable — it runs first, on every host, every time. Each heading tags the threat(s) it primarily serves.

Access & authentication — opportunistic, agent error

SSH key authentication only — password auth disabled
Root login disabled — PermitRootLogin no
Dedicated ansible user with locked-down sudo (NOPASSWD for automation)
No shared user accounts — per-person SSH keys in group_vars/all/vars.yml
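
As a rough illustration of the first two items, the sshd settings could be delivered as a drop-in file by a task in the base role. This is a sketch under assumptions, not the repository's actual task: the drop-in filename `00-hardening.conf` and the handler name are made up for the example, and it relies on Debian's stock `sshd_config` including `/etc/ssh/sshd_config.d/*.conf`.

```yaml
# Illustrative playbook; the drop-in filename and handler name are assumptions.
- name: Enforce key-only SSH access
  hosts: all
  become: true
  tasks:
    - name: Install sshd hardening drop-in
      ansible.builtin.copy:
        dest: /etc/ssh/sshd_config.d/00-hardening.conf
        owner: root
        group: root
        mode: "0644"
        # Syntax-check the candidate file before it is moved into place.
        validate: /usr/sbin/sshd -t -f %s
        content: |
          PasswordAuthentication no
          PermitRootLogin no
      notify: Restart ssh

  handlers:
    - name: Restart ssh
      ansible.builtin.service:
        name: ssh
        state: restarted
```

sshd keeps the first value it reads for each keyword, so a drop-in included at the top of `sshd_config` takes precedence over later lines in the main file.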

Firewall — opportunistic, blast radius, agent error

nftables (native on Debian 13, replaces iptables)
Default policy: deny inbound, allow established/related, allow loopback
Rules managed entirely by Ansible — never edited manually on hosts
Port definitions live in group_vars/ so rules stay in sync with deployed services
Docker's own iptables rules are disabled — nftables manages all filtering

Note on Docker + nftables: Docker historically bypassed iptables-based firewalls. This is addressed by setting "iptables": false in Docker daemon config and managing all rules via nftables explicitly. See docs/decisions/004-docker-model.md.
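
A minimal sketch of how the default policy and the Docker setting could look when deployed by Ansible. The variable `firewall_allowed_tcp_ports`, its value, and the handler names are hypothetical; the ADR only states that port definitions live in `group_vars/`. The ruleset covers the input chain only.

```yaml
# Illustrative playbook; variable name, port list, and handler names are assumptions.
- name: Deny-by-default firewall with Docker's iptables handling off
  hosts: all
  become: true
  vars:
    # Hypothetical variable standing in for the port definitions in group_vars/.
    firewall_allowed_tcp_ports: [22]
  tasks:
    - name: Install base nftables ruleset
      ansible.builtin.copy:
        dest: /etc/nftables.conf
        owner: root
        group: root
        mode: "0644"
        # Check the candidate ruleset without applying it.
        validate: /usr/sbin/nft -c -f %s
        content: |
          #!/usr/sbin/nft -f
          flush ruleset

          table inet filter {
            chain input {
              type filter hook input priority 0; policy drop;
              iif "lo" accept
              ct state established,related accept
              tcp dport { {{ firewall_allowed_tcp_ports | join(', ') }} } accept
            }
          }
      notify: Reload nftables

    - name: Stop Docker from managing iptables rules
      ansible.builtin.copy:
        dest: /etc/docker/daemon.json
        owner: root
        group: root
        mode: "0644"
        content: |
          {
            "iptables": false
          }
      notify: Restart docker

  handlers:
    - name: Reload nftables
      ansible.builtin.service:
        name: nftables
        state: reloaded

    - name: Restart docker
      ansible.builtin.service:
        name: docker
        state: restarted
```

With `"iptables": false`, Docker no longer creates NAT or forwarding rules for containers, so any masquerading and published-port rules would also have to be declared in nftables. They are omitted from this sketch.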

Intrusion deterrence — opportunistic

fail2ban monitoring SSH (and optionally reverse proxy logs)
Bans a source address for 1 hour after 5 failed attempts
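
The thresholds above map onto fail2ban's `maxretry` and `bantime` jail options. A sketch of how the sshd jail might be configured; the jail filename `sshd.local` and the handler name are assumptions, and `findtime` is left at the package default because the ADR does not specify it.

```yaml
# Illustrative playbook; the jail filename and handler name are assumptions.
- name: Ban repeated SSH login failures
  hosts: all
  become: true
  tasks:
    - name: Install fail2ban
      ansible.builtin.apt:
        name: fail2ban
        state: present

    - name: Configure the sshd jail
      ansible.builtin.copy:
        dest: /etc/fail2ban/jail.d/sshd.local
        owner: root
        group: root
        mode: "0644"
        content: |
          [sshd]
          enabled = true
          maxretry = 5
          bantime = 1h
      notify: Restart fail2ban

  handlers:
    - name: Restart fail2ban
      ansible.builtin.service:
        name: fail2ban
        state: restarted
```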

Updates — opportunistic

unattended-upgrades enabled for security patches only
Full system upgrades triggered deliberately via Ansible (make deploy PLAYBOOK=upgrade)
No automatic reboots — reboots are a conscious operational decision
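
One way the security-only, no-reboot policy could be expressed, assuming Debian's stock `50unattended-upgrades` stays in place and is overridden by a later drop-in. The drop-in filename `52unattended-upgrades-local` is hypothetical, and the origin pattern should be checked against the target release before use.

```yaml
# Illustrative playbook; the drop-in filename is an assumption.
- name: Security-only unattended upgrades without automatic reboots
  hosts: all
  become: true
  tasks:
    - name: Install unattended-upgrades
      ansible.builtin.apt:
        name: unattended-upgrades
        state: present

    - name: Enable the periodic unattended-upgrade run
      ansible.builtin.copy:
        dest: /etc/apt/apt.conf.d/20auto-upgrades
        owner: root
        group: root
        mode: "0644"
        content: |
          APT::Periodic::Update-Package-Lists "1";
          APT::Periodic::Unattended-Upgrade "1";

    - name: Restrict upgrades to the security archive and disable reboots
      ansible.builtin.copy:
        dest: /etc/apt/apt.conf.d/52unattended-upgrades-local
        owner: root
        group: root
        mode: "0644"
        # "#clear" below is an apt.conf directive inside the file body, not a YAML comment.
        content: |
          #clear Unattended-Upgrade::Origins-Pattern;
          Unattended-Upgrade::Origins-Pattern {
              "origin=Debian,codename=${distro_codename}-security,label=Debian-Security";
          };
          Unattended-Upgrade::Automatic-Reboot "false";
```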

Minimal attack surface — opportunistic, blast radius

No unnecessary packages installed
Docker daemon TCP socket disabled — Unix socket only
No open ports beyond those explicitly defined in firewall rules

Audit trail — agent error, blast radius

auditd installed and running with a baseline ruleset
Logs shipped to a central location if a log aggregation service is available

Secrets management — agent error, opportunistic

Ansible Vault for all secrets (API keys, passwords, certificates), structured as a nested vault.<service>.<key> map (ADR-003)
The master vault password lives in Vaultwarden and is fetched on demand by scripts/vault-pass-client.sh (wired as vault_password_file) through the rbw agent — never written to a plaintext file on disk. Unlock once per session with rbw unlock; nothing decryptable sits at rest in the repo or working tree
See docs/runbooks/rotate-secrets.md for rbw setup and rotation
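
To make the nested `vault.<service>.<key>` shape concrete, here is a sketch of what a vault-encrypted vars file might contain before encryption, and how a role variable could reference it. The service names, key names, and values are hypothetical placeholders, not entries from the real vault.

```yaml
# Vault-encrypted vars file, shown decrypted (hypothetical services and keys).
vault:
  exampleapp:
    admin_password: "change-me"
    api_key: "change-me"
  reverse_proxy:
    dns_api_token: "change-me"
---
# Plain vars file for a role: references the secret by its nested path.
exampleapp_admin_password: "{{ vault.exampleapp.admin_password }}"
```

Keeping every secret under a single `vault` key makes secret references easy to search for in roles and templates.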

Governance

Security is maintained, not achieved once. This ADR establishes four mechanisms; each lives where change is cheap and is linked from here.

Per-service security bar — every exposed service must clear a defined checklist before deploy (secrets in vault, no default creds, least-privilege / non-root, declared firewall ports, reverse-proxy + auth if exposed). Lives in docs/security/service-checklist.md; referenced from docs/runbooks/new-role.md. Enforced manually in review today; the planned /security-review will automate it.
Periodic security review — a recurring review that re-checks posture, surfaces drift, and re-challenges accepted risks. Planned as a /security-review skill (sibling to /review-repo); see docs/TODO.md (Scheduled work). Not built yet — see STATUS.md.
Accepted-risk register — the conscious trade-offs we choose to live with, each with rationale and a revisit trigger. Lives in docs/security/accepted-risks.md (expected to change; kept out of this ADR so the ADR stays stable).
Agent / automation guardrails — what AI agents and automation may do unsupervised vs. what needs a human gate, since operator/agent error is in the threat model. Encoded in CLAUDE.md ("What Claude must not do without explicit instruction") and enforced by PreToolUse hooks (generated-file guard, rbw pre-flight).

Decision

This posture was chosen to be:

Effective against the stated threat model (opportunistic external, lateral movement, operator/agent error)
Maintainable by a small team without security-expertise overhead
Automated — no manual steps to reach baseline state
Legible & revisitable — the threat model, principles, and accepted risks are written down and reviewed over time, not implicit

Out-of-scope items and conscious trade-offs are recorded in docs/security/accepted-risks.md rather than here, so this decision record stays stable while the risk posture evolves.
