docs(tags): ADR-019 + CLAUDE.md/TODO/CAPABILITIES (tagging standard)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 09:42:22 +02:00 · 2026-06-06 09:42:22 +02:00 · 24b5e9361e
commit 24b5e9361e
parent 9584cc2c76
4 changed files with 128 additions and 3 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -51,7 +51,11 @@ Full design rationale: `docs/decisions/`
 ## Ansible conventions

 - **FQCN always**: `ansible.builtin.template`, never `template`
-- **Tags**: every task must have at least one tag; playbooks support `--tags` filtering
+- **Tags** (ADR-019): tag each role with its role name once, at the play level where it is
+  applied; Ansible propagates the tag to every task in the role. Add a concern tag from the
+  approved list (`tests/tags.yml`) only to tasks/blocks that genuinely belong to that concern;
+  don't invent tags or tag for tagging's sake. Target one axis at a time (role/service *or*
+  concern); `--tags` selects the union (OR), never an intersection. `make lint` enforces the vocabulary.
 - **Handlers**: use `listen:` topic strings, not direct name references
 - **Variables**: `rolename__varname` double-underscore namespace for role defaults
 - **No inline vars in playbooks**: use `group_vars/` or `host_vars/` only
@@ -144,6 +148,9 @@ Single-contributor, trunk-based (no merge requests / approval gates):
 ## Terraform conventions

 - Terraform owns VM existence only — nothing inside a VM, and no DNS records
+- Every TF-managed VM carries three Proxmox tags — `<env>`, its inventory `group`, and
+  `managed-by=terraform` — as **metadata only** (ADR-019). They do not feed inventory
+  or run-targeting; `tf_to_inventory.py` still groups by the `group` output field.
 - Internal DNS is entirely Ansible (the `dns` role renders the zone from inventory)
 - OPNsense is entirely Ansible; do not reach for a Terraform OPNsense provider
 - Environments are separate directories (`staging/`, `production/`), not workspaces
@@ -215,6 +222,7 @@ Single-contributor, trunk-based (no merge requests / approval gates):
 | Update management      | `docs/decisions/011-update-management.md` |
 | Hardware & capacity    | `docs/decisions/012-hardware-capacity.md` |
 | Logging & log integrity | `docs/decisions/018-logging.md` |
+| Tagging & run-targeting | `docs/decisions/019-tagging.md` |
 | Adding a new role      | `docs/runbooks/new-role.md`           |
 | Adding a new host      | `docs/runbooks/new-host.md`           |
 | Rotating vault secrets | `docs/runbooks/rotate-secrets.md`     |
--- a/docs/CAPABILITIES.md
+++ b/docs/CAPABILITIES.md
@@ -112,6 +112,10 @@ _(DHCP, firewall, mDNS reflection live on OPNsense — Ansible-managed, not cont
 | Sanity / smoke | whoami + health checks | S | planned | Verification endpoints + "is it actually working" checks | ADR-011 / TODO 8.2 |
 | Service-UI verification | `/verify-service` skill | S | planned | Claude-driven exploratory Level 4 acceptance check of a deployed service's UI | Decided (ADR-017); running deferred on ubongo + playwright + Authentik |

+- **Targeted runs** (ADR-019): playbooks are sliced with `--tags` along two axes —
+  role/service (tag = role name) or a closed list of cross-cutting concerns
+  (`firewall`, `logging`, `config`, `deploy`, …); the vocabulary is lint-enforced.
+
 ---

 ## V4 completeness check
--- a/docs/TODO.md
+++ b/docs/TODO.md
@@ -28,11 +28,13 @@
      (all logs) + off-site security subset on `askari` + Grafana on-cluster (not the
      whole stack on `askari`). Still to design/build: Prometheus + metric exporters,
      Uptime Kuma, and exactly which alerts live where.
-   7. Define a tagging standard that lets us target runs without over-tagging.
+   7. ~~Define a tagging standard that lets us target runs without over-tagging.~~
+      DECIDED (ADR-019): two-tier — role-name tags (auto, at play level) + a closed
+      9-tag concern list (`tests/tags.yml`); union-only targeting; enforced by `make lint`.
   8. Ensure the right things are backed up (incl. database dumps if we land on PBS).
   9. Decide: a central database server, or individual database services per app?
   10. Should we keep the custom base-container (Molecule test image) method for role testing, or revisit it as boma's testing approach matures (ADR-008)?
-   11. Deliberate tagging strategy.
+   11. ~~Deliberate tagging strategy.~~ DECIDED (ADR-019) — folded into 3.7.

 4. **Split-horizon FQDN** — adopt split-horizon FQDN with or without nyumbani?

--- a/docs/decisions/019-tagging.md
+++ b/docs/decisions/019-tagging.md
@@ -0,0 +1,111 @@
+# ADR-019 — Tagging standard for targeted, predictable runs
+
+## Status
+
+Accepted (2026-06-06). Resolves TODO 3.7 ("Define a tagging standard that lets us
+target runs without over-tagging") and TODO 3.11 ("Deliberate tagging strategy").
+
+## Context
+
+boma wants **targeted** playbook runs (a single service, a single layer, or a single
+cross-cutting concern) that are **transparent and predictable**: a reader should know
+from a `--tags` invocation exactly what it will and won't touch. CLAUDE.md required
+every task to carry at least one tag, but no tag vocabulary or convention existed, and
+the TODO explicitly warns against the opposite failure mode: **over-tagging**.
+
+## Decision
+
+### Two-tier tagging
+
+**Tier 1 — role/service tag (mechanical).** The tag equals the role name, applied
+once at the role-import level:
+
+```yaml
+roles:
+  - role: photoprism
+    tags: [photoprism]
+```
+
+Ansible propagates it to every task in the role. Because one service = one role
+(ADR-004), this single rule covers both the *layer/role* and *single-service*
+targeting axes with zero per-task burden. Role-less lifecycle playbooks
+(e.g. `bootstrap.yml`) carry a single playbook-identity tag instead.
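+
+A role-less playbook can carry that identity tag on the play itself, because Ansible applies
+play-level tags to every task in the play. Below is a minimal sketch. The tag name `bootstrap`,
+the host pattern and the task are illustrative assumptions, not the real `bootstrap.yml`:
+
+```yaml
+# playbooks/bootstrap.yml (illustrative sketch only)
+- name: Bootstrap freshly provisioned hosts
+  hosts: all            # assumption: the real host pattern may differ
+  tags: [bootstrap]     # hypothetical playbook-identity tag, inherited by every task
+  tasks:
+    - name: Confirm the host is reachable
+      ansible.builtin.ping:
+```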
+
+**Tier 2 — concern tag (curated).** A small **closed list** of cross-cutting concern
+tags, applied per-task/block **only where a task genuinely belongs to that concern**.
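+
+The sketch below shows two tasks inside the `photoprism` role. The template name, the paths
+and the `reload nftables` handler topic are hypothetical. The firewall task carries only its
+concern tag, because the role tag already arrives by inheritance. A task that fits no listed
+concern gets no extra tag at all:
+
+```yaml
+# roles/photoprism/tasks/main.yml (hypothetical excerpt)
+- name: Install photoprism nftables rules
+  ansible.builtin.template:
+    src: photoprism.nft.j2                  # hypothetical template
+    dest: /etc/nftables.d/photoprism.nft    # hypothetical path
+    mode: "0644"
+  notify: reload nftables                   # hypothetical listen: topic
+  tags: [firewall]    # effective tags: [photoprism, firewall]
+
+- name: Create the photoprism data directory
+  ansible.builtin.file:
+    path: /srv/photoprism                   # hypothetical path
+    state: directory
+    mode: "0750"
+  # No concern tag: this belongs to no listed concern, so only
+  # `--tags photoprism` (or an untargeted run) selects it.
+```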
+
+### The closed concern list
+
+A concern earns a tag only if it (a) appears in 2+ roles, (b) is worth running as a
+slice on its own, and (c) doesn't overlap confusingly with another.
+
+| Tag | Covers |
+|-----|--------|
+| `packages`   | apt package install/management |
+| `users`      | accounts, groups, sudo |
+| `firewall`   | nftables rulesets & port definitions (ADR-002) |
+| `hardening`  | security baseline — sshd config, fail2ban, auditd, sysctl |
+| `logging`    | Alloy / log-shipping config (ADR-018) |
+| `monitoring` | metric exporters / health checks |
+| `config`     | render templated config/compose files to disk — **no restart** |
+| `deploy`     | bring services up / restart (`compose up -d`) |
+| `proxy`      | reverse-proxy + TLS registration (Traefik routes, Authentik) |
+
+The `config`/`deploy` split lets you re-render and diff configuration (`--tags
+config`) without bouncing services, then restart deliberately (`--tags deploy`).
+`backup` and `secrets` are intentionally omitted until the roles needing them exist.
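+
+The following sketch shows the split. It assumes the `community.docker` collection and a
+hypothetical `/opt/photoprism` project directory. The `config` task deliberately notifies no
+restart handler, so a `config` slice leaves running containers alone:
+
+```yaml
+# Hypothetical excerpt from roles/photoprism/tasks/main.yml
+- name: Render the photoprism compose file
+  ansible.builtin.template:
+    src: compose.yml.j2                     # hypothetical template
+    dest: /opt/photoprism/compose.yml       # hypothetical path
+    mode: "0640"
+  # Deliberately no `notify:` here; restarting is the deploy slice's job.
+  tags: [config]
+
+- name: Bring photoprism up (roughly `docker compose up -d`)
+  community.docker.docker_compose_v2:
+    project_src: /opt/photoprism
+    state: present
+  tags: [deploy]
+```
+
+With this layout, `ansible-playbook <playbook> --tags config --diff` renders the files and
+shows what changed without restarting anything. A later `--tags deploy` then applies the changes.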
+
+### `always` / `never`
+
+- **`always`** — reserved for cheap preflight assertions (vault unlocked, OS is
+  Debian 13, required vars present), so even `--tags config` runs its safety guards.
+- **`never`** — reserved for destructive/expensive opt-in tasks, each paired with a
+  descriptive tag (e.g. `tags: [never, force_pull]`); they run only when named.
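+
+Both reservations are sketched below. The assertion wording, the `/opt/photoprism` path and
+the use of `community.docker.docker_compose_v2` for the pull are assumptions:
+
+```yaml
+# Preflight guard: `always` makes it run under any --tags selection
+- name: Preflight | target runs Debian 13
+  ansible.builtin.assert:
+    that:
+      - ansible_facts['distribution'] == 'Debian'
+      - ansible_facts['distribution_major_version'] == '13'
+    fail_msg: "boma expects Debian 13 on this host"
+  tags: [always]
+
+# Opt-in task: intended to run only when `force_pull` is requested
+- name: Force-pull fresh photoprism images
+  community.docker.docker_compose_v2:
+    project_src: /opt/photoprism           # hypothetical path
+    pull: always
+    state: present
+  tags: [never, force_pull]
+```
+
+One caveat is worth checking against the Ansible version in use. Ansible runs a `never` task
+whenever *any* of its tags is requested, and for a task inside a role those tags include the
+inherited role tag. As a result, `--tags photoprism` would likely select the force-pull task too.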
+
+### Predictability principle: tags are union-only
+
+`--tags a,b` runs tasks tagged a **OR** b — Ansible has no native AND. boma therefore
+targets **one axis at a time**: either a role/service *or* a concern, never an
+intersection like "photoprism's firewall only." If that's ever needed, just run
+`--tags photoprism` (idempotent and fast). Designing for intersection is the
+over-tagging trap; we decline it on purpose.
+
+### Terraform / Proxmox VM tags (metadata only)
+
+Every Terraform-managed VM carries exactly three Proxmox tags:
+
+| Tag | Value | Purpose |
+|-----|-------|---------|
+| env        | `staging` \| `production`          | which environment |
+| role/group | `docker_hosts`, `proxmox_hosts`, … | matches the inventory group |
+| managed-by | `terraform`                        | distinguishes IaC VMs from hand-made ones |
+
+These are **pure metadata for transparency** (glanceable in the Proxmox UI). They do
+**not** drive run-targeting and do **not** feed inventory — `scripts/tf_to_inventory.py`
+keeps building groups from the `group` output field, the single source of truth.
+
+## Enforcement
+
+`tests/tags.yml` is the single source of truth for the allowed concern/special/
+opt-in/playbook tags. `scripts/check-tags.py` (run by `make lint`, covered by
+`tests/test_check_tags.py`) scans `roles/` and `playbooks/` and fails on any tag
+outside `{role directory names} ∪ {tests/tags.yml entries}`.
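+
+The exact schema of `tests/tags.yml` is not spelled out here. One plausible layout is sketched
+below, with role directory names allowed implicitly. The top-level keys and the `bootstrap`
+playbook tag are assumptions:
+
+```yaml
+# tests/tags.yml (hypothetical layout; the real schema may differ)
+# Role directory names are allowed implicitly and are not listed here.
+concern:        # the closed list; keep in sync with the ADR-019 table
+  - packages
+  - users
+  - firewall
+  - hardening
+  - logging
+  - monitoring
+  - config
+  - deploy
+  - proxy
+special:        # Ansible's built-in special tags
+  - always
+  - never
+opt_in:         # descriptive partners for `never`
+  - force_pull
+playbook:       # identity tags for role-less lifecycle playbooks
+  - bootstrap   # hypothetical tag for bootstrap.yml
+```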
+
+## Extending the vocabulary
+
+To add a concern tag: (1) add it to `tests/tags.yml`; (2) add a row to the concern
+table above, with a one-line justification showing it meets all three criteria (used in
+2+ roles, worth running as a slice on its own, no confusing overlap). That is the whole
+gate: lightweight, but it leaves a paper trail.
+
+## Consequences
+
+- Targeted runs are predictable: every tag is either a role name or an entry in `tests/tags.yml`.
+- Over-tagging is structurally resisted (closed list + lint enforcement).
+- Intersection targeting is unavailable by design.
+- Authors must keep each role's tag equal to its name; the linter rejects unknown tags.
+
+## Related
+
+ADR-002 (security baseline / firewall), ADR-004 (one service = one role),
+ADR-009 (TF↔Ansible handoff / inventory), ADR-018 (logging).