diff --git a/CLAUDE.md b/CLAUDE.md
index 92a83d6..068d727 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -80,6 +80,8 @@ Full design rationale: `docs/decisions/`
 - Every role must have `molecule/default/` scenario targeting Debian 13
 - Every role must have a populated `README.md`
 - Every role must have `meta/main.yml` filled in
+- Every **service** role must have a populated `SECURITY.md` (ADR-002/004) — copy `docs/security/service-security-template.md`
+- One service = one self-contained role; no shared multi-service roles (ADR-004)
 - Role names: `snake_case`, descriptive nouns (`base`, `docker_host`, `reverse_proxy`)
 - Use `make new-role NAME=` to scaffold — never create role structure by hand
 
@@ -169,6 +171,7 @@ Single-contributor, trunk-based (no merge requests / approval gates):
 | Security baseline & strategy | `docs/decisions/002-security.md` |
 | Accepted security risks | `docs/security/accepted-risks.md` |
 | Per-service security checklist | `docs/security/service-checklist.md` |
+| Per-service security record (template) | `docs/security/service-security-template.md` |
 | Toolchain choices | `docs/decisions/003-toolchain.md` |
 | Docker & Compose model | `docs/decisions/004-docker-model.md` |
 | Bootstrapping hosts | `docs/decisions/005-bootstrapping.md` |
diff --git a/STATUS.md b/STATUS.md
index 5b8500a..6dad1a3 100644
--- a/STATUS.md
+++ b/STATUS.md
@@ -24,6 +24,7 @@ _Last reviewed: 2026-05-30._
 | `docs/hardware/reference.md` + `scripts/capacity-scan.py` | Present — reference doc (skeleton until real hardware) + stdlib scan; emits capacity JSON |
 | `/capacity-review` | Works — on-demand capacity evaluation → `docs/hardware/reviews/`. Intent-based (no live usage yet) |
 | ADR-002 security strategy + `docs/security/{accepted-risks,service-checklist}.md` | Present — threat model, principles, governance frame; checklist + risk register are docs, enforced manually in review |
+| Service-role standard + per-service `SECURITY.md` convention | Defined (ADR-004 + `docs/security/service-security-template.md`); not yet applied — no service roles exist |
 
 ## Scaffolded but empty — NOT implemented
 
diff --git a/docs/decisions/002-security.md b/docs/decisions/002-security.md
index eb736af..6259f74 100644
--- a/docs/decisions/002-security.md
+++ b/docs/decisions/002-security.md
@@ -145,9 +145,12 @@ mechanisms; each lives where change is cheap and is linked from here.
 
 - **Per-service security bar** — every exposed service must clear a defined
   checklist before deploy (secrets in vault, no default creds, least-privilege /
-  non-root, declared firewall ports, reverse-proxy + auth if exposed). Lives in
-  `docs/security/service-checklist.md`; referenced from `docs/runbooks/new-role.md`.
-  Enforced manually in review today; the planned `/security-review` will automate it.
+  non-root, declared firewall ports, reverse-proxy + auth if exposed). The generic
+  bar lives in `docs/security/service-checklist.md`, and each service
+  records how it meets the bar (plus service-specific hardening) in its own
+  `roles//SECURITY.md`, created from `docs/security/service-security-template.md`
+  (ADR-004). Enforced manually in review today; the planned `/security-review`
+  aggregates every `roles/*/SECURITY.md` and cross-checks it against the role's config.
 - **Periodic security review** — a recurring review that re-checks posture,
   surfaces drift, and re-challenges accepted risks. Planned as a `/security-review`
   skill (sibling to `/review-repo`); see `docs/TODO.md` (Scheduled work). Not built
diff --git a/docs/decisions/004-docker-model.md b/docs/decisions/004-docker-model.md
index 52d1051..f2fb6ce 100644
--- a/docs/decisions/004-docker-model.md
+++ b/docs/decisions/004-docker-model.md
@@ -28,16 +28,44 @@ defines how services are structured, deployed, and maintained.
 All services live under `/opt/services/`. The path is defined in
 `group_vars/all/vars.yml` as `services__base_dir`.
 
-## Compose file delivery
+## Service-role standard
 
-Each service has a corresponding Ansible role (or is managed by a shared role
-with per-service variables). The role:
+**Every service has its own self-contained role** — one service, one role. Shared
+roles serving multiple services are no longer used (see "Why not a shared engine"
+below). Each service role contains a standard set of files:
 
-1. Creates `/opt/services/servicename/` directory
-2. Renders `docker-compose.yml` from `templates/docker-compose.yml.j2`
-3. Renders `.env` from `templates/env.j2` (pulling secrets from vault variables)
-4. Runs `docker compose up -d --remove-orphans` via `ansible.builtin.command`
-5. Optionally runs `docker compose pull` before up (controlled by variable)
+| File | Purpose |
+|---|---|
+| `tasks/main.yml` | The standard deploy mechanics (below) |
+| `templates/docker-compose.yml.j2` | The Compose definition |
+| `templates/env.j2` | `.env` rendered from vault variables |
+| `defaults/main.yml` | Tuneables, `rolename__` namespace |
+| `README.md` | Purpose, variables, usage (role convention) |
+| `SECURITY.md` | Per-service security record — see ADR-002 and `docs/security/service-security-template.md` |
+| `meta/main.yml`, `molecule/default/` | Metadata + Debian 13 test scenario |
+
+### Standard deploy mechanics
+
+Every service role's `tasks/main.yml` follows the same sequence, so all roles are
+uniform and predictable:
+
+1. Create `/opt/services//` directory
+2. Render `docker-compose.yml` from `templates/docker-compose.yml.j2`
+3. Render `.env` from `templates/env.j2` (secrets from vault variables)
+4. Run `docker compose up -d --remove-orphans` via `ansible.builtin.command`
+5. Optionally run `docker compose pull` before up (controlled by a variable)
+
+### Why not a shared engine
+
+A shared `compose_service` engine role — service roles delegating the mechanics to
+one place — is **intentionally not built**. Duplicating the ~5 standard tasks per
+role is accepted in favour of legible, self-contained roles a reader can understand
+without indirection, and AI authorship makes the duplication cheap to generate
+uniformly from this standard.
+
+**Revisit trigger:** extract a shared engine role if maintaining the duplicated
+mechanics across service roles becomes painful — a pattern change that means editing
+many roles, or drift between them that this standard alone isn't preventing.
 
 ## Docker daemon configuration
 
diff --git a/docs/runbooks/new-role.md b/docs/runbooks/new-role.md
index aa7dd34..4e133a4 100644
--- a/docs/runbooks/new-role.md
+++ b/docs/runbooks/new-role.md
@@ -71,14 +71,16 @@ Fix any lint or test failures before committing.
 Add the role to the appropriate playbook in `playbooks/` and add the host group to
 `inventories/staging/hosts.yml` for integration testing.
 
-### 9. Clear the security checklist (services)
+### 9. Write the per-service security record (services)
 
-If the role is a **service** — especially one reachable beyond its own host —
-walk `docs/security/service-checklist.md` and confirm every item passes (secrets
-in vault, no default creds, least-privilege, declared firewall ports, behind the
-reverse proxy with auth if exposed). Record any conscious deviation in
-`docs/security/accepted-risks.md`. This bar is established by ADR-002; enforcement
-is manual in review today, with the planned `/security-review` to automate it.
+For a **service** role, copy `docs/security/service-security-template.md` to
+`roles//SECURITY.md` and fill it in: exposure, the checklist status
+(from `docs/security/service-checklist.md`), service-specific hardening, and any
+residual/accepted risks. Filling the **Checklist status** section is how the
+service clears the security bar — record any conscious deviation in
+`docs/security/accepted-risks.md`. The bar is established by ADR-002; enforcement is
+manual in review today, with the planned `/security-review` aggregating every
+`roles/*/SECURITY.md` to automate it.
 
 ### 10. Commit
 
diff --git a/docs/security/service-checklist.md b/docs/security/service-checklist.md
index 1a73edb..0f1e112 100644
--- a/docs/security/service-checklist.md
+++ b/docs/security/service-checklist.md
@@ -9,6 +9,10 @@ Enforced manually in review today; the planned `/security-review` skill (see
 Treat each item as must-pass **unless** a deviation is recorded in
 `docs/security/accepted-risks.md` with a rationale and a revisit trigger.
 
+This checklist is the generic **bar**. Each service answers it in its own
+`roles//SECURITY.md` (the "Checklist status" section), created from
+`docs/security/service-security-template.md` — see ADR-004.
+
 ## Secrets & credentials
 
 - [ ] All secrets live in an encrypted `vault.yml` (`vault..`); none in
diff --git a/docs/security/service-security-template.md b/docs/security/service-security-template.md
new file mode 100644
index 0000000..b22b0a4
--- /dev/null
+++ b/docs/security/service-security-template.md
@@ -0,0 +1,53 @@
+# Per-service security record — template
+
+Copy this file to `roles//SECURITY.md` and fill it in when building a
+service role (ADR-004). It is the per-service security **record**: the generic bar
+lives in `docs/security/service-checklist.md` (the questions); this file is *this
+service's answers, plus service-specific measures*. Filling the **Checklist status**
+section is how a service clears the bar (`docs/runbooks/new-role.md`).
+
+The planned `/security-review` aggregates every `roles/*/SECURITY.md` and
+cross-checks the claims here against the role's actual config (Compose template,
+declared firewall ports), so keep this honest and current.
+
+Delete this preamble in the copy and start from the heading below.
+
+---
+
+# Security — <service>
+
+## Exposure
+
+- **Published ports:** <ports> — and which are declared in the `group_vars` firewall vars
+- **Auth surface:** <how it authenticates — e.g. behind reverse proxy + SSO, or app-native login>
+- **Reachability:** <which VLANs / sources can reach it; internal-only vs LAN vs public>
+- **Data sensitivity:** <what data it holds; backup/restore pointer>
+
+## Checklist status
+
+Each item from `docs/security/service-checklist.md`, with this service's status:
+✅ met · ⚠️ deviation (link to `docs/security/accepted-risks.md`) · n/a.
+
+- [ ] Secrets in vault; no default creds; nothing secret in git/images
+- [ ] Non-root; no `privileged`/host-network unless justified; minimal mounts; caps dropped
+- [ ] Ports declared in `group_vars`; behind reverse proxy + auth if exposed; least-privilege inter-service reach
+- [ ] Image pinned (tag/digest), update path known
+- [ ] Logs reviewable; backup/restore covered if stateful
+
+## Service-specific hardening
+
+Measures beyond the generic bar — the application's own relevant settings. Examples
+of the *kind* of thing that belongs here (replace with this service's actual measures):
+
+- App-level brute-force / rate-limiting protection enabled
+- Trusted-domains / allowed-hosts restricted
+- Unused features or remote-access modes disabled
+- Server-side encryption / secure cookie / security-header settings
+
+## Residual / accepted risks
+
+Anything not done for this service — each with rationale and a revisit trigger.
+Link global trade-offs to `docs/security/accepted-risks.md`; note service-local ones
+here.
+
+- <none yet, or: risk · rationale · revisit trigger>
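
Reviewer note (not part of the diff): the ADR-004 hunk lists the files every service role must contain, and the convention is enforced manually in review today. The sketch below shows one way that file list could be checked mechanically. It is only an illustration under stated assumptions: the function name `missing_service_role_files` is hypothetical, "populated" is approximated as "non-empty file", and nothing here reflects how the planned `/security-review` will actually work.

```python
from pathlib import Path
from typing import List

# Standard paths per service role, copied from the ADR-004 table in this diff.
REQUIRED_PATHS = (
    "tasks/main.yml",
    "templates/docker-compose.yml.j2",
    "templates/env.j2",
    "defaults/main.yml",
    "README.md",
    "SECURITY.md",
    "meta/main.yml",
    "molecule/default",
)


def missing_service_role_files(role_dir: Path) -> List[str]:
    """Return the required paths that are absent or empty under role_dir.

    Hypothetical helper: the name and the emptiness rule are assumptions,
    not something defined by this change.
    """
    missing = []
    for rel in REQUIRED_PATHS:
        path = role_dir / rel
        if path.is_dir():
            # A directory counts as present only if it contains something.
            if not any(path.iterdir()):
                missing.append(rel)
        elif not path.is_file() or path.stat().st_size == 0:
            # Missing files and zero-byte files are both reported.
            missing.append(rel)
    return missing
```

Calling `missing_service_role_files(Path("roles") / name)` for each service role would return an empty list once the role meets the file standard. Whether a non-empty `SECURITY.md` is actually filled in honestly still needs a human (or the planned review) to judge.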