docs(spec): tagging standard design (TODO 3.7/3.11 → ADR-019)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 09:15:44 +02:00 · 4ed9e9a8bf
commit 4ed9e9a8bf
parent 9bdb3017bb
1 changed file with 188 additions and 0 deletions
--- a/docs/superpowers/specs/2026-06-06-tagging-strategy-design.md
+++ b/docs/superpowers/specs/2026-06-06-tagging-strategy-design.md
@@ -0,0 +1,188 @@
+# Design — Ansible tagging standard (targeted, predictable runs)
+
+- **Date:** 2026-06-06
+- **Status:** Approved design — pending implementation plan
+- **Resolves:** TODO 3.7 ("Define a tagging standard that lets us target runs without
+  over-tagging") and TODO 3.11 ("Deliberate tagging strategy") — the same thread
+- **Becomes:** ADR-019 (this design is the basis for that ADR)
+
+---
+
+## Problem
+
+boma wants to run playbooks **targeted** — a single service, a single layer, or a
+single cross-cutting concern — and to do so **transparently and predictably**: you
+should be able to look at a `--tags` invocation and know exactly what it will and won't
+touch. CLAUDE.md already mandates that every task be tag-filterable, but no *vocabulary*
+or *naming convention* exists. Without one, tags proliferate ad-hoc per role and the
+"predictable" property is lost — and the TODO explicitly warns against the opposite
+failure mode, **over-tagging**.
+
+The repo is effectively greenfield for this: `base` and `docker_host` are empty, and the
+only tags in existence are `[base]`/`[docker]` in `site.yml` and `[bootstrap]` in
+`bootstrap.yml`. So we can bake the standard into role-authoring conventions *before*
+there are a dozen service roles to retrofit.
+
+## Targeting axes (what we want to slice by)
+
+1. **Layer / role** — `--tags base`, `--tags docker`
+2. **Single service** — `--tags photoprism`, `--tags traefik`
+3. **Concern / function** — `--tags firewall`, `--tags logging`, …
+
+Lifecycle phases (bootstrap/config/deploy) are **not** a tag axis — `bootstrap.yml` vs
+`site.yml` already separate those as whole playbooks.
+
+Key simplification: because of ADR-004 (*one service = one role*, role name = service
+name), axes 1 and 2 are the **same mechanism** — a tag equal to the role name. Only the
+concern axis needs a curated vocabulary.
+
+## Approach (chosen): two-tier tagging
+
+**Tier 1 — role/service tag (mechanical).** The tag *equals the role name*, applied
+**once** at the role-import level in the playbook:
+
+```yaml
+roles:
+  - role: photoprism
+    tags: [photoprism]
+```
+
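+Ansible propagates the tag to every task in the role, because `roles:` entries (like
+`import_role`) are static imports; a dynamic `include_role` would not pass its tags
+down to the role's tasks. This covers both the layer/role and single-service axes with
+one rule and **zero per-task burden**.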
+
+**Tier 2 — concern tag (curated).** A small **closed, documented list** of cross-cutting
+concern tags, applied per-task/block **only where a task genuinely belongs to that
+concern**. `--tags firewall` then hits firewall tasks in `base` and in every service
+role.
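+
+As an illustration, a service role's task file under this scheme might look like the
+sketch below. The file path, task names, template and destination paths, and the use of
+the `community.docker.docker_compose_v2` module are assumptions made for the example,
+not part of the standard itself. No task carries the role tag: `photoprism` is inherited
+from the play.
+
+```yaml
+# roles/photoprism/tasks/main.yml (hypothetical layout)
+- name: Render compose file
+  ansible.builtin.template:
+    src: compose.yml.j2
+    dest: /opt/photoprism/compose.yml
+    mode: "0644"
+  tags: [config]
+
+- name: Render nftables rules for the service port
+  ansible.builtin.template:
+    src: nftables-photoprism.nft.j2
+    dest: /etc/nftables.d/photoprism.nft
+    mode: "0644"
+  tags: [firewall]
+
+- name: Bring the stack up
+  community.docker.docker_compose_v2:
+    project_src: /opt/photoprism
+    state: present
+  tags: [deploy]
+```
+
+With this layout `--tags photoprism` runs all three tasks, `--tags config` only renders
+the compose file, and `--tags firewall` touches the nftables task here plus any
+`firewall`-tagged task in other roles.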
+
+Rejected alternatives: *concern-only/flat* (loses natural `--tags <service>` ergonomics);
+*rich multi-dimensional* (role+service+concern+lifecycle+ad-hoc per task) — that is
+precisely the over-tagging the TODO warns against.
+
+## The closed concern list
+
+Litmus test for earning a spot: a concern must (a) appear in **2+ roles**, (b) be
+something you'd realistically want to run as a slice on its own, and (c) not overlap
+confusingly with another.
+
+**Baseline concerns** (mostly in `base`, some echoed in service roles):
+
+| Tag | Covers |
+|-----|--------|
+| `packages`   | apt package install/management |
+| `users`      | accounts, groups, sudo |
+| `firewall`   | nftables rulesets & port definitions (ADR-002) |
+| `hardening`  | security baseline — sshd config, fail2ban, auditd, sysctl |
+| `logging`    | Alloy / log-shipping config (ADR-018) |
+| `monitoring` | metric exporters / health checks |
+
+**Service concerns** (in every service role, ADR-004):
+
+| Tag | Covers |
+|-----|--------|
+| `config` | render templated config/compose files to disk — **no restart** |
+| `deploy` | bring services up / restart (`docker compose up -d`) |
+| `proxy`  | reverse-proxy + TLS registration (Traefik routes, Authentik) |
+
+Nine tags total. The `config`/`deploy` split is deliberate and high-value: `--tags
+config` re-renders and lets you diff configuration without bouncing services; `--tags
+deploy` does the restart.
+
+`backup` and `secrets` are **intentionally omitted** until the roles that need them
+exist — they enter via the extend process, not speculative reservation.
+
+## `always` / `never` policy
+
+boma uses Ansible's two built-in special tags, narrowly:
+
+- **`always`** — reserved strictly for **cheap preflight assertions** (vault unlocked,
+  OS is Debian 13, required vars present). Because `always` tasks run under any `--tags`
+  selection, even a `--tags config` run still executes its safety guards.
+- **`never`** — reserved for **destructive/expensive opt-in tasks**, each paired with a
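+  descriptive tag (e.g. `never, force_pull` or `never, restore`). They are skipped in
+  normal runs and execute only when one of their other tags is requested, keeping
+  dangerous actions opt-in. The descriptive partner tag is a documented `never`-paired
+  opt-in (allowed by the linter). Caveat: an inherited role tag counts as "another tag",
+  so a `never` task inside a role imported with `tags: [photoprism]` would also run under
+  `--tags photoprism`; to stay strictly opt-in, such tasks would need to sit outside the
+  role tag's inheritance (for example, in a separate play or playbook).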
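+
+A minimal sketch of both special tags in use. The playbook name, task names, assertion
+expressions, and the `/opt/photoprism` path are assumptions for illustration;
+`force_pull` is the example opt-in tag named above. The tasks sit in a play without an
+inherited role tag, since Ansible runs a `never` task whenever any of its other tags is
+requested, inherited ones included.
+
+```yaml
+# playbooks/maintenance.yml (hypothetical): a play with no inherited role tag
+- name: Opt-in maintenance tasks
+  hosts: docker_hosts
+  tasks:
+    # Preflight guard: runs under every --tags selection.
+    - name: Assert supported OS
+      ansible.builtin.assert:
+        that:
+          - ansible_facts['distribution'] == 'Debian'
+          - ansible_facts['distribution_major_version'] == '13'
+        fail_msg: "boma targets Debian 13 only"
+      tags: [always]
+
+    # Opt-in task: skipped unless --tags force_pull is given explicitly.
+    - name: Re-pull images for the stack
+      community.docker.docker_compose_v2:
+        project_src: /opt/photoprism
+        pull: always
+      tags: [never, force_pull]
+```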
+
+## Predictability principle: tags are union-only
+
+`--tags a,b` runs tasks tagged a **OR** b — Ansible has no native AND. Rather than fight
+this, we make it an explicit principle: **boma targets one axis at a time** — *either* a
+role/service (`--tags photoprism`) *or* a concern (`--tags firewall`), never an
+intersection like "photoprism's firewall only." If that is ever genuinely needed, the
+answer is "just run `--tags photoprism`" (idempotent and fast). Designing for
+intersection is the over-tagging trap; we decline it on purpose.
+
+## Reconciling the existing CLAUDE.md rule
+
+CLAUDE.md currently says *"every task must have at least one tag."* Under the two-tier
+model the role tag is applied **once at the play/import level** and **inherited** by
+every task, so tasks are always reachable without hand-tagging each one. The rule is
+**reworded** to:
+
+> Import each role with its role-name tag (once, at the play level). Within a role, tag a
+> task/block with a concern tag from the approved list **only where it genuinely belongs
+> to that concern** — don't invent tags or tag for tagging's sake.
+
+This directly resolves the "without over-tagging" tension.
+
+## Terraform / Proxmox VM tags (metadata only)
+
+Formalize the convention that already half-exists in `staging/main.tf`
+(`tags = ["staging", each.value.group]`). Every TF-managed VM gets exactly three tags:
+
+| Tag | Value | Purpose |
+|-----|-------|---------|
+| env        | `staging` \| `production`            | which environment |
+| role/group | `docker_hosts`, `proxmox_hosts`, …   | matches the inventory group |
+| managed-by | `terraform`                          | distinguishes IaC VMs from hand-made ones |
+
+Set as `tags = ["${env}", each.value.group, "managed-by=terraform"]` in the env
+`main.tf` (env is constant per directory).
+
+**Explicit non-goals** (stated so nobody wires them up later): these tags are **pure
+metadata for transparency** — glanceable in the Proxmox UI. They do **not** drive
+run-targeting and do **not** feed inventory. `scripts/tf_to_inventory.py` keeps building
+groups from the `group` output field, which stays the single source of truth.
+
+## Enforcement
+
+A small **lint check wired into `make lint`**: a script collects every `tags:` value
+across `roles/` and `playbooks/` and fails if any tag is not in the allowed set:
+
+```
+{role names} ∪ {9 concern tags} ∪ {always, never} ∪ {documented never-paired opt-ins}
+```
+
+The allowed concern list (and the `never`-paired opt-ins) live in **one
+machine-readable file, `tests/tags.yml`**, which both the linter reads and the ADR
+documents — so doc and enforcement cannot drift. This is more honest than ansible-lint's
+limited built-in tags rule. A unit test (mirroring `tests/test_capacity_scan.py`) covers
+the checker.
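+
+One possible shape for `tests/tags.yml` is sketched below. The key names (`concerns`,
+`special`, `never_paired`) are assumptions; the tag values are the nine concerns and the
+example opt-ins named in this document. Role names are not listed, on the assumption
+that the checker derives them from the directories under `roles/`.
+
+```yaml
+# tests/tags.yml (sketch; key names are hypothetical)
+concerns:
+  # baseline concerns
+  - packages
+  - users
+  - firewall
+  - hardening
+  - logging
+  - monitoring
+  # service concerns
+  - config
+  - deploy
+  - proxy
+
+# Ansible's built-in special tags, used narrowly per the policy above
+special:
+  - always
+  - never
+
+# descriptive partners that may only appear together with `never`
+never_paired:
+  - force_pull
+  - restore
+```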
+
+## The "propose to extend" process
+
+To add a concern tag: (1) add it to `tests/tags.yml`; (2) add a row to the ADR-019 table
+with a one-line justification showing it passes the litmus test (cross-cutting, 2+
+roles, distinct). That is the whole gate — lightweight, but it leaves a paper trail.
+
+## Deliverables
+
+- **New `docs/decisions/019-tagging.md`** — the standard: rationale, two-tier model,
+  concern table, union-only principle, `always`/`never` policy, Proxmox tag convention,
+  extend process.
+- **`tests/tags.yml`** — machine-readable allowed concern list + `never`-paired opt-ins.
+- **Lint checker script** (e.g. `scripts/check-tags.py`) + **`make lint`** wiring +
+  **`tests/test_check_tags.py`**.
+- **CLAUDE.md** — reword the tag bullet under *Ansible conventions*; add the Proxmox tag
+  convention under *Terraform conventions*; add ADR-019 to *Further reading*.
+- **`terraform/environments/{staging,production}/main.tf`** — apply the three-tag
+  convention.
+- **`docs/TODO.md`** — mark 3.7 and 3.11 DECIDED (ADR-019).
+- **`docs/CAPABILITIES.md`** — note targeted runs as a capability, if it fits.
+
+## Out of scope
+
+- Intersection targeting (role ∩ concern) — declined on purpose (see principle).
+- Lifecycle-phase tags — handled by separate playbooks.
+- Proxmox tags feeding inventory or run-targeting — metadata only.
+- `backup`/`secrets` concern tags — added later via the extend process.