boma/docs/decisions/004-docker-model.md
sjat 13ae674cc9 chore(kaizen): first /kaizen run — curate 12 friction signals
2026-06-14 21:46:23 +02:00


# ADR-004 — Docker and Compose service model
## Status
Accepted (2026-05-30)
## Context
All services run as Docker containers managed via Docker Compose. This document
defines how services are structured, deployed, and maintained.
## Core principles
- **No hand-edited files on hosts**: all Compose files are rendered by Ansible
from Jinja2 templates. If a file exists on a host, it was put there by Ansible.
- **Compose per service**: each service (or tightly coupled service group) gets
its own Compose file and directory under a standard path.
- **Variables drive differences**: the same template renders differently per host
via `group_vars` and `host_vars`. No host-specific templates.
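
As a minimal sketch of the last principle, one role variable can carry a fleet-wide value in `group_vars` and a per-host override in `host_vars`, while the template itself stays identical. The role name `exampleapp`, the variable `exampleapp__http_port`, and the host name `host-a` are hypothetical and used only for illustration.

```yaml
# group_vars/all/vars.yml: fleet-wide value (variable name is hypothetical)
exampleapp__http_port: 8080
---
# host_vars/host-a/vars.yml: override for a single host (host name is hypothetical)
exampleapp__http_port: 9090
```

The shared template would reference `{{ exampleapp__http_port }}` once; Ansible's variable precedence picks the host-level value where one exists.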
## Directory layout on hosts
```
/opt/services/
├── servicename/
│ ├── docker-compose.yml # rendered by Ansible, never edited manually
│ ├── .env # rendered by Ansible from vault variables
│ └── data/ # persistent volumes (bind mounts)
│ └── ...
```
All services live under `/opt/services/`. The path is defined in
`group_vars/all/vars.yml` as `services__base_dir`.
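
A sketch of how that variable might look and be consumed. Only `services__base_dir` and its value come from this ADR; the role name `exampleapp` and the derived variable `exampleapp__dir` are assumptions.

```yaml
# group_vars/all/vars.yml
services__base_dir: /opt/services
---
# roles/exampleapp/defaults/main.yml (hypothetical role and variable)
# Derives the per-service directory from the shared base path.
exampleapp__dir: "{{ services__base_dir }}/exampleapp"
```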
## Service-role standard
**Every service has its own self-contained role** — one service, one role. Shared
roles serving multiple services are no longer used (see "Why not a shared engine"
below). Each service role contains a standard set of files:
| File | Purpose |
|---|---|
| `tasks/main.yml` | The standard deploy mechanics (below) |
| `templates/docker-compose.yml.j2` | The Compose definition |
| `templates/env.j2` | `.env` rendered from vault variables |
| `defaults/main.yml` | Tuneables, `rolename__` namespace |
| `README.md` | Purpose, variables, usage (role convention) |
| `SECURITY.md` | Per-service security record — see ADR-002 and `docs/security/service-security-template.md` |
| `VERIFY.md` | Per-service UI acceptance spec — see ADR-008 Level 4 / ADR-017 and `docs/testing/service-verify-template.md` |
| `ACCESS.md` | Per-service operational-access record — see ADR-021 and `docs/access/service-access-template.md` |
| `BACKUP.md` | Per-service backup record — see ADR-022 and `docs/backup/service-backup-template.md` (a stateless service declares `backup__state: false` with a reason) |
| `meta/main.yml`, `molecule/default/` | Metadata + Debian 13 test scenario |
The `access__*` (ADR-021) and `backup__*` (ADR-022) data in `defaults/main.yml` are
**cross-role conventions** — shared field names that deliberately do *not* carry the
`<rolename>__` prefix. ansible-lint's `var-naming[no-role-prefix]` has no per-prefix
allowlist, so each such line carries a trailing `# noqa: var-naming[no-role-prefix]` (the
rule stays enforced for genuinely role-scoped vars). `make new-role` scaffolds a reminder;
`roles/reverse_proxy/defaults/main.yml` is the reference.
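
The convention can be sketched as follows, assuming a hypothetical role named `exampleapp`. Only `backup__state` and the `noqa` comment are taken from this ADR; the other variable is illustrative, and the accompanying reason field is omitted because its name is not given here.

```yaml
# roles/exampleapp/defaults/main.yml (hypothetical role)

# Role-scoped tuneable: carries the role prefix, so the lint rule applies as usual.
exampleapp__pull_before_up: false

# Cross-role convention (ADR-022): deliberately unprefixed, so the rule is
# silenced on this line only.
backup__state: false  # noqa: var-naming[no-role-prefix]
```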
### Standard deploy mechanics
Every service role's `tasks/main.yml` follows the same sequence, so all roles are
uniform and predictable:
1. Create `/opt/services/<service>/` directory
2. Render `docker-compose.yml` from `templates/docker-compose.yml.j2`
3. Render `.env` from `templates/env.j2` (secrets from vault variables)
4. Optionally run `docker compose pull` first (controlled by a variable)
5. Run `docker compose up -d --remove-orphans` via `ansible.builtin.command`
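
The sequence above could look roughly like this in a role's `tasks/main.yml`, with the optional pull placed before `up`. This is a sketch, not a reference implementation: the role name `exampleapp`, the toggle `exampleapp__pull_before_up`, the file modes, and the `changed_when` choices are all assumptions.

```yaml
---
# roles/exampleapp/tasks/main.yml (hypothetical role)
- name: Create the service directory
  ansible.builtin.file:
    path: "{{ services__base_dir }}/exampleapp"
    state: directory
    mode: "0750"  # assumed mode

- name: Render docker-compose.yml
  ansible.builtin.template:
    src: docker-compose.yml.j2
    dest: "{{ services__base_dir }}/exampleapp/docker-compose.yml"
    mode: "0640"  # assumed mode

- name: Render .env from vault variables
  ansible.builtin.template:
    src: env.j2
    dest: "{{ services__base_dir }}/exampleapp/.env"
    mode: "0600"  # assumed mode; the file holds secrets
  no_log: true    # keep secret values out of task output

- name: Pull images before up (optional)
  ansible.builtin.command:
    cmd: docker compose pull
    chdir: "{{ services__base_dir }}/exampleapp"
  when: exampleapp__pull_before_up | bool  # hypothetical toggle
  changed_when: false  # simplification: the pull is not reported as a change

- name: Bring the service up
  ansible.builtin.command:
    cmd: docker compose up -d --remove-orphans
    chdir: "{{ services__base_dir }}/exampleapp"
  changed_when: true  # simplification: always reports changed
```

A real role would likely inspect the command output to report changes accurately; that detail is left out to keep the sketch short.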
### Why not a shared engine
A shared `compose_service` engine role — service roles delegating the mechanics to
one place — is **intentionally not built**. Duplicating the ~5 standard tasks per
role is accepted in favour of legible, self-contained roles a reader can understand
without indirection, and AI authorship makes the duplication cheap to generate
uniformly from this standard.
**Revisit trigger:** extract a shared engine role if maintaining the duplicated
mechanics across service roles becomes painful — a pattern change that means editing
many roles, or drift between them that this standard alone isn't preventing.
## Docker daemon configuration
Managed by the `docker_host` role. Key settings:
- `"log-driver": "json-file"` with size limits (prevents disk exhaustion)
- `"iptables": false` — firewall managed entirely by nftables (see ADR-002)
- TCP socket disabled — Unix socket only (`/var/run/docker.sock`)
- User namespace remapping: evaluated per use case, not enabled by default
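
One way the `docker_host` role might express these settings is as a dictionary rendered to `/etc/docker/daemon.json`. The variable name `docker_host__daemon_config` and the concrete size limits are assumptions; the `log-driver` and `iptables` values come from the list above.

```yaml
# roles/docker_host/defaults/main.yml (variable name is hypothetical)
docker_host__daemon_config:
  log-driver: json-file
  log-opts:
    max-size: "10m"  # assumed limit per log file
    max-file: "3"    # assumed number of rotated files
  iptables: false    # nftables owns the firewall (ADR-002)
  # No "hosts" entry is set, so no TCP listener is configured.
---
# roles/docker_host/tasks/main.yml (excerpt)
- name: Render /etc/docker/daemon.json
  ansible.builtin.copy:
    content: "{{ docker_host__daemon_config | to_nice_json }}\n"
    dest: /etc/docker/daemon.json
    owner: root
    group: root
    mode: "0644"
```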
## Networking
- Each service Compose file defines its own named network(s)
- Services that need to communicate are placed on a shared named network
defined in a dedicated `docker-compose.networks.yml` (if cross-service
networking is needed on a host)
- External port publishing is explicit and matches nftables rules
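
A sketch of the service side of this model: a Compose template that declares the service's own network, joins a shared network created elsewhere, and publishes one port explicitly. The network names, the service name, and the variables are hypothetical.

```yaml
# templates/docker-compose.yml.j2 (excerpt; all names are hypothetical)
services:
  app:
    image: "{{ exampleapp__image }}"
    networks:
      - exampleapp_internal  # the service's own named network
      - shared_backend       # shared network for cross-service traffic
    ports:
      # Explicit publish; a matching nftables rule is expected to exist.
      - "{{ exampleapp__http_port }}:8080"

networks:
  exampleapp_internal: {}
  shared_backend:
    external: true  # defined outside this file, not created by it
```

Marking the shared network `external` keeps its lifecycle out of any single service's Compose file, so removing one service does not tear down a network that others still use.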
## Image management
- Image pinning follows the tiered model in ADR-011: **stateful** services pin
`tag@digest` (readable tag + integrity digest); **stateless** services use rolling
tags (`latest`/`stable`), refreshed deliberately and watched by DIUN
- Bare `latest` is therefore acceptable only on the stateless tier; the stateful tier
is always pinned
- Image updates are a deliberate operation: update the tag/digest variable, then run the deploy
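
The two tiers might appear in role defaults like this. The variable names are hypothetical, the images are arbitrary examples, and the digest is left as a placeholder rather than a real value.

```yaml
# Stateful tier: readable tag plus integrity digest (placeholder digest)
exampledb__image: "postgres:16@sha256:<digest>"

# Stateless tier: rolling tag, refreshed deliberately and watched by DIUN
exampleapp__image: "traefik/whoami:latest"
```

Updating a stateful service then means changing the tag and digest together in one variable and running the deploy.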
## Persistent data
- Bind mounts preferred over named volumes for data that must be backed up
- All bind mount paths are under `/opt/services/<name>/data/`
- Backup strategy is defined in **ADR-022** — the bind mounts under
`/opt/services/<name>/data/` are exactly the unit ADR-022's per-service `backup__*`
contract (and `BACKUP.md`) captures
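
A minimal sketch of a bind mount that follows this layout, assuming a hypothetical `exampledb` service; the subdirectory name and the container path are illustrative.

```yaml
# templates/docker-compose.yml.j2 (excerpt; names are hypothetical)
services:
  db:
    image: "{{ exampledb__image }}"
    volumes:
      - type: bind
        source: "{{ services__base_dir }}/exampledb/data/postgres"
        target: /var/lib/postgresql/data
```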
## Decision
Docker Compose was chosen over Kubernetes/Swarm because:
- Appropriate complexity level for 25 hosts with independent service sets
- Compose files are human-readable and easily auditable
- No distributed state to manage
- Straightforward to back up and restore
## Consequences
Drawn from the trade-offs and deferred items this ADR already states:
- A shared `compose_service` engine role is intentionally not built: the ~5 standard
tasks are duplicated per role in favour of legible, self-contained roles, with a stated
revisit trigger — extract a shared engine if maintaining the duplicated mechanics
becomes painful (a pattern change touching many roles, or drift this standard alone
isn't preventing) (per "Why not a shared engine").
- Forgoing Kubernetes/Swarm is the deliberate cost of matching complexity to a 25-host
fleet with no distributed state to manage (per Decision).
- User-namespace remapping is not enabled by default — evaluated per use case (per Docker
daemon configuration).
- Bare `latest` is acceptable only on the stateless tier; the stateful tier is always
pinned `tag@digest`, and image updates are a deliberate operation (per Image management;
ADR-011).
- Backup strategy is defined in ADR-022 (not in this ADR); the persistent bind mounts
under `/opt/services/<name>/data/` are the unit ADR-022's per-service `backup__*`
contract captures (per Persistent data).