Expand ADR-002 into a security baseline + strategy
Add a managerial security frame on top of the host baseline: explicit threat model (opportunistic external, lateral movement/blast radius, operator/agent error; supply chain accepted-lower-priority), security principles, and four governance mechanisms that ADR-002 establishes and links out to:

- docs/security/service-checklist.md — per-service security bar (referenced from the new-role runbook)
- docs/security/accepted-risks.md — living accepted-risk register (R1-R4)
- planned /security-review skill (TODO 8.5)
- agent guardrails in CLAUDE.md "what Claude must not do"

STATUS.md records the frame as present (manual enforcement) and /security-review as planned-not-built.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
c57910eda8
commit
f338bccd46
7 changed files with 182 additions and 24 deletions
@@ -154,6 +154,10 @@ Single-contributor, trunk-based (no merge requests / approval gates):
- Edit vault-encrypted files directly — decrypt first, re-encrypt after
- Force-push or rewrite already-pushed history on `main`
- Add a collection to `requirements.yml` without a specific module need in existing role tasks
- Open a firewall port anywhere but the `group_vars` firewall definitions — never ad-hoc on a host (ADR-002)
- Disable or weaken a baseline control from ADR-002 (SSH hardening, nftables default-deny, fail2ban, auditd)
- Expose a service to the LAN/WAN without it sitting behind the reverse proxy with authentication (ADR-002)
- Deploy a service that hasn't cleared `docs/security/service-checklist.md` (record any deviation in `docs/security/accepted-risks.md`)

---
@@ -162,7 +166,9 @@ Single-contributor, trunk-based (no merge requests / approval gates):
| Topic | File |
|------------------------|---------------------------------------|
| Architecture overview | `docs/decisions/001-architecture.md` |
| Security baseline | `docs/decisions/002-security.md` |
| Security baseline & strategy | `docs/decisions/002-security.md` |
| Accepted security risks | `docs/security/accepted-risks.md` |
| Per-service security checklist | `docs/security/service-checklist.md` |
| Toolchain choices | `docs/decisions/003-toolchain.md` |
| Docker & Compose model | `docs/decisions/004-docker-model.md` |
| Bootstrapping hosts | `docs/decisions/005-bootstrapping.md` |
@@ -23,6 +23,7 @@ _Last reviewed: 2026-05-30._
| Terraform HCL (`terraform/`) | Written (proxmox VM module + envs) — but never run; see below |
| `docs/hardware/reference.md` + `scripts/capacity-scan.py` | Present — reference doc (skeleton until real hardware) + stdlib scan; emits capacity JSON |
| `/capacity-review` | Works — on-demand capacity evaluation → `docs/hardware/reviews/`. Intent-based (no live usage yet) |
| ADR-002 security strategy + `docs/security/{accepted-risks,service-checklist}.md` | Present — threat model, principles, governance frame; checklist + risk register are docs, enforced manually in review |

## Scaffolded but empty — NOT implemented
@@ -47,6 +48,7 @@ So `make deploy PLAYBOOK=site` currently **fails** on a clean clone — the `bas
| Per-service roles | ADR-004 | Model defined; no service roles built |
| Forgejo Actions CI | ADR-003 / ADR-008 | Remote is live (pushed); Actions/`act_runner` pipeline not yet built |
| Live usage stats for `/capacity-review` | ADR-012 / TODO 8.4 | `gather_usage()` stubbed; source undecided (Proxmox RRD vs PLG stack); needs the cluster |
| `/security-review` skill | ADR-002 / TODO 8.5 | Periodic posture re-check + accepted-risk re-challenge; planned, not built |

## Keeping this honest
@@ -60,6 +60,12 @@
Prometheus/Loki/Grafana/Grafana-Alloy stack we will likely set up anyway
(richer, per-process, but more to run) — see TODO 3.6. Don't build the
Proxmox-RRD hook before settling this, to avoid throwaway work.
5. Build a `/security-review` skill (sibling to `/review-repo`): re-check the
security posture against ADR-002, surface drift, and re-challenge the
accepted-risk register (`docs/security/accepted-risks.md`). Could pair a
deterministic pre-scan (undeclared open ports, disabled baseline controls,
world-readable secrets, services not behind auth) with a judgement pass.
Open question: standalone, or folded into the kaizen `/retro` (item 11)?
9. Should we make a basic function so that tools (and AI) can send messages to the user - email, matrix or ntfy?

10. **Claude setup** — DECIDED: brainstorm for intent, capture as ADRs (skip plan
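The deterministic pre-scan floated in TODO item 5 could start very small. What follows is a minimal sketch, assuming the declared firewall ports and the observed listening ports are already available as plain integer sets and that the caller passes in the directory holding secret files. `undeclared_ports` and `world_readable` are hypothetical helper names, not existing repo code.

```python
import stat
from pathlib import Path


def undeclared_ports(listening: set[int], declared: set[int]) -> list[int]:
    """Return listening ports that no firewall definition declares."""
    return sorted(listening - declared)


def world_readable(root: Path) -> list[Path]:
    """Return regular files under root that any local user can read."""
    hits = []
    for path in sorted(root.rglob("*")):
        # S_IROTH is the "others may read" permission bit.
        if path.is_file() and path.stat().st_mode & stat.S_IROTH:
            hits.append(path)
    return hits
```

How the two input sets get populated (for example from `ss -tln` output and the `group_vars` firewall definitions) is left open here; the judgement pass would still decide whether a finding matters.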
@@ -68,6 +74,9 @@
1. Policy for how we collaborate with references to baobabAnsibleV4 without misusing it.
2. Policy for how we write key documents like ADRs.
3. Further development on how we collaborate on designing the foundation for the project - separate from how we implement new containers etc.
4. How do we make sure agents always use the latest official documentation for the technologies etc. we use?
5. Always subagent driven?
6. When AI deploys, i.e. runs playbooks etc., should we make a methodology so that it does not have to poll all the time or review all the output? Perhaps something about the MAKE method could provide only the relevant feedback?

11. **Kaizen loop** — set up ~2026-06-06 (one week from now).
1. Build `/retro`: reads `docs/FRICTION.md` + recurring `/review-repo`
@@ -1,24 +1,61 @@
# ADR-002 — Security baseline
# ADR-002 — Security baseline and strategy

## Context

Every managed host must reach a defined security baseline before any services
are deployed. This baseline is applied by the `base` role and is non-negotiable —
it runs first, on every host, every time.
Security here is not a single control but the sum of several combined efforts —
host hardening, network segmentation, secrets handling, supply-chain hygiene, and
disciplined automation. This ADR is the frame that organizes them: it records the
**threat model** we design against, the **principles** every control serves, the
host-level **baseline** the `base` role enforces, and the **governance** that keeps
security sharp as the homelab grows.

The goal is a principled, maintainable baseline appropriate for a homelab with
some public-facing services — not a compliance exercise.
The goal is a principled, maintainable posture for a homelab with some
public-facing services — effective against a realistic threat model, not a
compliance exercise.

## Baseline components
Related decisions: network segmentation (ADR-007), secrets structure (ADR-003),
per-service roles (ADR-004), CI secret-scanning (ADR-010).

### Access & authentication
## Threat model

What we deliberately design against — and, just as importantly, what we do not:

| Threat | In scope? | What it drives |
|---|---|---|
| **Opportunistic external** — bots scanning, credential stuffing, mass-exploiting known CVEs in exposed services | Yes — primary | SSH key-only + fail2ban, deny-by-default firewall, security auto-patching, minimal attack surface, services behind a reverse proxy with auth |
| **Lateral movement / blast radius** — assume one service *is* compromised; limit how far it spreads | Yes | VLAN segmentation (ADR-007), least-privilege containers, no host network mode, per-service isolation, no shared credentials |
| **Operator / agent error** — accidental secret leak, misconfiguration, or an AI agent making an unsafe change | Yes | Vault + gitleaks, declarative firewall (no ad-hoc ports), review gates, agent guardrails (below), pre-commit hooks |
| **Supply chain** — compromised images, base images, dependencies, collections | Acknowledged, lower priority | Version pinning where practical (ADR-011), gitleaks; tracked as an accepted risk with a revisit trigger |
| **Targeted / physical** — a determined adversary specifically after this homelab, or physical device access | Out of scope | Not designed against at this scale; revisit if the threat model changes |

Supply chain is consciously deprioritized, not forgotten — see
`docs/security/accepted-risks.md`.

## Security principles

Every control below should trace back to one of these:

- **Defense in depth** — no single control is load-bearing; layers compensate.
- **Least privilege** — accounts, containers, and automation get the minimum they need.
- **Deny / secure by default** — closed unless explicitly opened; safe defaults.
- **Contain the blast radius** — segment and isolate so one compromise isn't total.
- **Automated & reproducible** — the baseline is reached by Ansible, never by hand.
- **Explicit & revisitable** — decisions and accepted risks are written down and
re-challenged, not left implicit.

## Baseline controls

Applied by the `base` role, non-negotiable — it runs first, on every host, every
time. Each heading tags the threat(s) it primarily serves.

### Access & authentication — *opportunistic, agent error*

- SSH key authentication only — password auth disabled
- Root login disabled — `PermitRootLogin no`
- Dedicated `ansible` user with locked-down sudo (NOPASSWD for automation)
- No shared user accounts — per-person SSH keys in `group_vars/all/vars.yml`

### Firewall
### Firewall — *opportunistic, blast radius, agent error*

- `nftables` (native on Debian 13, replaces iptables)
- Default policy: deny inbound, allow established/related, allow loopback
@@ -30,29 +67,29 @@ some public-facing services — not a compliance exercise.
> This is addressed by setting `"iptables": false` in Docker daemon config and managing
> all rules via nftables explicitly. See `docs/decisions/004-docker-model.md`.

### Intrusion deterrence
### Intrusion deterrence — *opportunistic*

- `fail2ban` monitoring SSH (and optionally reverse proxy logs)
- Configured to ban after 5 failed attempts, 1-hour ban

### Updates
### Updates — *opportunistic*

- `unattended-upgrades` enabled for **security patches only**
- Full system upgrades triggered deliberately via Ansible (`make deploy PLAYBOOK=upgrade`)
- No automatic reboots — reboots are a conscious operational decision

### Minimal attack surface
### Minimal attack surface — *opportunistic, blast radius*

- No unnecessary packages installed
- Docker daemon TCP socket disabled — Unix socket only
- No open ports beyond those explicitly defined in firewall rules

### Audit trail
### Audit trail — *agent error, blast radius*

- `auditd` installed and running with a baseline ruleset
- Logs shipped to a central location if a log aggregation service is available

## Secrets management
## Secrets management — *agent error, opportunistic*

- Ansible Vault for all secrets (API keys, passwords, certificates), structured as a
nested `vault.<service>.<key>` map (ADR-003)
@@ -62,15 +99,40 @@ some public-facing services — not a compliance exercise.
`rbw unlock`; nothing decryptable sits at rest in the repo or working tree
- See `docs/runbooks/rotate-secrets.md` for `rbw` setup and rotation

## What this baseline does not include
## Governance

- Full CIS benchmark hardening — adds complexity for marginal gain at this scale
- SELinux / AppArmor — not applied by default, revisit if threat model changes
- Intrusion detection (IDS) — out of scope for now
Security is maintained, not achieved once. This ADR **establishes** four
mechanisms; each lives where change is cheap and is linked from here.

- **Per-service security bar** — every exposed service must clear a defined
checklist before deploy (secrets in vault, no default creds, least-privilege /
non-root, declared firewall ports, reverse-proxy + auth if exposed). Lives in
`docs/security/service-checklist.md`; referenced from `docs/runbooks/new-role.md`.
Enforced manually in review today; the planned `/security-review` will automate it.
- **Periodic security review** — a recurring review that re-checks posture,
surfaces drift, and re-challenges accepted risks. Planned as a `/security-review`
skill (sibling to `/review-repo`); see `docs/TODO.md` (Scheduled work). Not built
yet — see STATUS.md.
- **Accepted-risk register** — the conscious trade-offs we choose to live with, each
with rationale and a revisit trigger. Lives in `docs/security/accepted-risks.md`
(expected to change; kept out of this ADR so the ADR stays stable).
- **Agent / automation guardrails** — what AI agents and automation may do
unsupervised vs. what needs a human gate, since operator/agent error is in the
threat model. Encoded in `CLAUDE.md` ("What Claude must not do without explicit
instruction") and enforced by PreToolUse hooks (generated-file guard, `rbw`
pre-flight).

## Decision

This baseline was chosen to be:
- **Effective** against the realistic threat model (exposed services, shared repo)
- **Maintainable** by a small team without security expertise overhead
- **Automated** — no manual steps should be needed to reach baseline state
This posture was chosen to be:

- **Effective** against the stated threat model (opportunistic external, lateral
movement, operator/agent error)
- **Maintainable** by a small team without security-expertise overhead
- **Automated** — no manual steps to reach baseline state
- **Legible & revisitable** — the threat model, principles, and accepted risks are
written down and reviewed over time, not implicit

Out-of-scope items and conscious trade-offs are recorded in
`docs/security/accepted-risks.md` rather than here, so this decision record stays
stable while the risk posture evolves.
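The threat tags on the ADR-002 baseline headings can be treated as data, which makes "does every in-scope threat have at least one control?" a mechanical question. This is a small sketch, assuming the tags are copied by hand from the headings in the diff above. The dictionary layout and the `coverage_gaps` name are illustrative and not part of the repo.

```python
IN_SCOPE = {"opportunistic", "blast radius", "agent error"}

# Threat tags as written on the ADR-002 headings in this diff.
CONTROLS = {
    "Access & authentication": {"opportunistic", "agent error"},
    "Firewall": {"opportunistic", "blast radius", "agent error"},
    "Intrusion deterrence": {"opportunistic"},
    "Updates": {"opportunistic"},
    "Minimal attack surface": {"opportunistic", "blast radius"},
    "Audit trail": {"agent error", "blast radius"},
    "Secrets management": {"agent error", "opportunistic"},
}


def coverage_gaps(controls, in_scope):
    """Return (threats with no control, controls tagged with an unknown threat)."""
    covered = set().union(*controls.values())
    uncovered = sorted(in_scope - covered)
    unknown = sorted(name for name, tags in controls.items() if tags - in_scope)
    return uncovered, unknown
```

With the tags as they stand, both lists come back empty; the check only earns its keep when a heading is retagged or a new control section is added.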
@@ -71,7 +71,16 @@ Fix any lint or test failures before committing.
Add the role to the appropriate playbook in `playbooks/` and add the host group
to `inventories/staging/hosts.yml` for integration testing.

### 9. Commit
### 9. Clear the security checklist (services)

If the role is a **service** — especially one reachable beyond its own host —
walk `docs/security/service-checklist.md` and confirm every item passes (secrets
in vault, no default creds, least-privilege, declared firewall ports, behind the
reverse proxy with auth if exposed). Record any conscious deviation in
`docs/security/accepted-risks.md`. This bar is established by ADR-002; enforcement
is manual in review today, with the planned `/security-review` to automate it.

### 10. Commit

```bash
git checkout -b role/<rolename>
21
docs/security/accepted-risks.md
Normal file
@@ -0,0 +1,21 @@
# Accepted security risks

Conscious security trade-offs we are choosing to live with — recorded so "what we
are *not* doing" is explicit and revisitable, not forgotten. This register is a
**living document** and is expected to change; it is deliberately kept out of
ADR-002 (which records durable decisions) so the ADR stays stable.

Owned by **ADR-002** (Security baseline and strategy). Re-challenged during the
periodic security review (planned `/security-review`; see `docs/TODO.md`).

**Each entry:** the risk · why we accept it (rationale) · what would make us
revisit (trigger).

| # | Accepted risk | Rationale | Revisit trigger |
|---|---|---|---|
| R1 | **Supply chain not actively defended** — third-party container/base images, dependencies, and Ansible collections are trusted as pulled | Out of proportion to a homelab's effort budget; the realistic threat is opportunistic, not a targeted supply-chain attack. gitleaks + version pinning (ADR-011) give partial cover | Hosting high-value data/finances for others; a relevant upstream compromise; appetite for image signing / SBOM / pinned digests |
| R2 | **No full CIS benchmark hardening** | Significant complexity for marginal gain at this scale | A compliance need, or hosting third-party data with obligations |
| R3 | **No SELinux / AppArmor** mandatory access control | Operational overhead exceeds benefit for the current threat model | Threat model shifts toward targeted attackers; a service with a poor security history |
| R4 | **No intrusion detection system (IDS)** | Detection is only useful with the capacity to triage it; alerts no one reads are noise | Monitoring/alerting stack (Prometheus/Loki/Grafana) is in place and someone will act on alerts |

_Last reviewed: 2026-06-04 (seeded — pending a first re-challenge pass)._
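Because every entry is supposed to carry both a rationale and a revisit trigger, the register lends itself to a mechanical completeness check. The following is a minimal sketch, assuming the table keeps the four-column layout shown above. `parse_register` and `incomplete_entries` are hypothetical names, and the parser is deliberately naive: it would mis-split a cell that contains a literal `|`.

```python
def parse_register(markdown: str) -> list[dict[str, str]]:
    """Parse the four-column risk table into one dict per entry."""
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 4:
            continue
        is_header = cells[0] == "#"
        is_separator = bool(cells[0]) and set(cells[0]) <= set("-:")
        if is_header or is_separator:
            continue
        rows.append(dict(zip(("id", "risk", "rationale", "trigger"), cells)))
    return rows


def incomplete_entries(rows: list[dict[str, str]]) -> list[str]:
    """Return ids of entries missing a rationale or a revisit trigger."""
    return [r["id"] for r in rows if not r["rationale"] or not r["trigger"]]
```

A check like this only confirms the cells are filled in; whether a rationale still holds is exactly what the periodic re-challenge is for.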
49
docs/security/service-checklist.md
Normal file
@@ -0,0 +1,49 @@
# Per-service security checklist

The bar every service (a per-service role — ADR-004) must clear **before deploy**,
especially anything reachable beyond its own host. Established by **ADR-002**
(Security baseline and strategy); referenced from `docs/runbooks/new-role.md`.
Enforced manually in review today; the planned `/security-review` skill (see
`docs/TODO.md`) will automate the check.

Treat each item as must-pass **unless** a deviation is recorded in
`docs/security/accepted-risks.md` with a rationale and a revisit trigger.

## Secrets & credentials

- [ ] All secrets live in an encrypted `vault.yml` (`vault.<service>.<key>`); none in
plaintext files, templates, or Compose env literals
- [ ] No default or vendor-shipped credentials remain — admin passwords/tokens are
generated and stored in vault
- [ ] Nothing secret is baked into an image or committed to git (gitleaks must pass)

## Least privilege

- [ ] Container runs as a non-root user where the image supports it
- [ ] No `privileged: true` and no host network mode unless explicitly justified
- [ ] Only the volumes/paths the service needs are mounted; read-only where possible
- [ ] Linux capabilities dropped to what's required (no blanket grants)

## Network & exposure

- [ ] Every listening port is declared in `group_vars` firewall definitions — never
opened ad-hoc on a host
- [ ] The service is not published directly to a LAN/WAN port if it can sit behind the
reverse proxy instead
- [ ] Anything reachable beyond the `srv` VLAN is behind the reverse proxy **with
authentication** (and TLS)
- [ ] Inter-service reach follows least privilege — no broad `srv`→`srv` access where a
single declared dependency suffices

## Updates & provenance

- [ ] Image/source version is pinned (tag or digest), not floating `latest` (ADR-011)
- [ ] The update path is known — how this service gets patched

## Operability (security-adjacent)

- [ ] Logs go somewhere reviewable (central aggregation when available)
- [ ] Backup/restore is covered if the service holds state

> Deviations are allowed but must be **conscious**: record them in
> `docs/security/accepted-risks.md`, don't leave them implicit.
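Several checklist items are mechanical enough to pre-check before the manual review. Here is a minimal sketch, assuming a Compose service definition has already been loaded into a plain dict (the YAML loading step is omitted). `check_service` is a hypothetical helper; the keys it reads (`image`, `privileged`, `network_mode`, `user`) are standard Compose service keys, and it covers only a slice of the checklist.

```python
def check_service(name: str, svc: dict) -> list[str]:
    """Return checklist findings for one Compose service definition."""
    findings = []
    image = svc.get("image", "")
    # Look for the tag only after the last "/", so a registry port is not mistaken for one.
    last_part = image.rsplit("/", 1)[-1]
    tag = last_part.split(":", 1)[1] if ":" in last_part else ""
    pinned_by_digest = "@sha256:" in image
    if not pinned_by_digest and tag in ("", "latest"):
        findings.append(f"{name}: image is not pinned to a tag or digest")
    if svc.get("privileged") is True:
        findings.append(f"{name}: privileged: true")
    if svc.get("network_mode") == "host":
        findings.append(f"{name}: host network mode")
    # Heuristic: a missing user key may still be fine if the image sets a non-root USER.
    if str(svc.get("user", "")).split(":")[0] in ("", "0", "root"):
        findings.append(f"{name}: no non-root user set in Compose")
    return findings
```

An empty list does not mean the service has cleared the bar: secrets handling, firewall declarations, and reverse-proxy exposure still need the manual walk-through, and any justified finding belongs in `docs/security/accepted-risks.md`.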