From 4ed9e9a8bf857dc6219dbe94bd753a62e231d451 Mon Sep 17 00:00:00 2001
From: sjat
Date: Sat, 6 Jun 2026 09:15:44 +0200
Subject: [PATCH] =?UTF-8?q?docs(spec):=20tagging=20standard=20design=20(TO?=
 =?UTF-8?q?DO=203.7/3.11=20=E2=86=92=20ADR-019)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 .../2026-06-06-tagging-strategy-design.md | 188 ++++++++++++++++++
 1 file changed, 188 insertions(+)
 create mode 100644 docs/superpowers/specs/2026-06-06-tagging-strategy-design.md

diff --git a/docs/superpowers/specs/2026-06-06-tagging-strategy-design.md b/docs/superpowers/specs/2026-06-06-tagging-strategy-design.md
new file mode 100644
index 0000000..896fab2
--- /dev/null
+++ b/docs/superpowers/specs/2026-06-06-tagging-strategy-design.md
@@ -0,0 +1,188 @@
+# Design — Ansible tagging standard (targeted, predictable runs)
+
+- **Date:** 2026-06-06
+- **Status:** Approved design — pending implementation plan
+- **Resolves:** TODO 3.7 ("Define a tagging standard that lets us target runs without
+  over-tagging") and TODO 3.11 ("Deliberate tagging strategy") — the same thread
+- **Becomes:** ADR-019 (this design is the basis for that ADR)
+
+---
+
+## Problem
+
+boma wants to run playbooks **targeted** — a single service, a single layer, or a
+single cross-cutting concern — and to do so **transparently and predictably**: you
+should be able to look at a `--tags` invocation and know exactly what it will and won't
+touch. CLAUDE.md already mandates that every task be tag-filterable, but no *vocabulary*
+or *naming convention* exists. Without one, tags proliferate ad-hoc per role and the
+"predictable" property is lost — and the TODO explicitly warns against the opposite
+failure mode, **over-tagging**.
+
+The repo is effectively greenfield for this: `base` and `docker_host` are empty, and the
+only tags in existence are `[base]`/`[docker]` in `site.yml` and `[bootstrap]` in
+`bootstrap.yml`.
+So we can bake the standard into role-authoring conventions *before* there are a dozen service roles to retrofit.
+
+## Targeting axes (what we want to slice by)
+
+1. **Layer / role** — `--tags base`, `--tags docker`
+2. **Single service** — `--tags photoprism`, `--tags traefik`
+3. **Concern / function** — `--tags firewall`, `--tags logging`, …
+
+Lifecycle phases (bootstrap/config/deploy) are **not** a tag axis — `bootstrap.yml` vs
+`site.yml` already separate those as whole playbooks.
+
+Key simplification: because of ADR-004 (*one service = one role*, role name = service
+name), axes 1 and 2 are the **same mechanism** — a tag equal to the role name. Only the
+concern axis needs a curated vocabulary.
+
+## Approach (chosen): two-tier tagging
+
+**Tier 1 — role/service tag (mechanical).** The tag *equals the role name*, applied
+**once** at the role-import level in the playbook:
+
+```yaml
+roles:
+  - role: photoprism
+    tags: [photoprism]
+```
+
+Ansible propagates the tag to every task in the role. This covers both the layer/role
+and single-service axes with one rule and **zero per-task burden**.
+
+**Tier 2 — concern tag (curated).** A small **closed, documented list** of cross-cutting
+concern tags, applied per-task/block **only where a task genuinely belongs to that
+concern**. `--tags firewall` then hits firewall tasks in `base` and in every service
+role.
+
+Rejected alternatives: *concern-only/flat* (loses natural `--tags ` ergonomics);
+*rich multi-dimensional* (role+service+concern+lifecycle+ad-hoc per task) — that is
+precisely the over-tagging the TODO warns against.
+
+## The closed concern list
+
+Litmus test for earning a spot: a concern must (a) appear in **2+ roles**, (b) be
+something you'd realistically want to run as a slice on its own, and (c) not overlap
+confusingly with another.
+
+**Baseline concerns** (mostly in `base`, some echoed in service roles):
+
+| Tag | Covers |
+|-----|--------|
+| `packages` | apt package install/management |
+| `users` | accounts, groups, sudo |
+| `firewall` | nftables rulesets & port definitions (ADR-002) |
+| `hardening` | security baseline — sshd config, fail2ban, auditd, sysctl |
+| `logging` | Alloy / log-shipping config (ADR-018) |
+| `monitoring` | metric exporters / health checks |
+
+**Service concerns** (in every service role, ADR-004):
+
+| Tag | Covers |
+|-----|--------|
+| `config` | render templated config/compose files to disk — **no restart** |
+| `deploy` | bring services up / restart (`compose up -d`) |
+| `proxy` | reverse-proxy + TLS registration (Traefik routes, Authentik) |
+
+Nine tags total. The `config`/`deploy` split is deliberate and high-value: `--tags
+config` re-renders and lets you diff configuration without bouncing services; `--tags
+deploy` does the restart.
+
+`backup` and `secrets` are **intentionally omitted** until the roles that need them
+exist — they enter via the extend process, not speculative reservation.
+
+## `always` / `never` policy
+
+boma uses Ansible's two built-in special tags, narrowly:
+
+- **`always`** — reserved strictly for **cheap preflight assertions** (vault unlocked,
+  OS is Debian 13, required vars present). Ensures even `--tags config` runs its safety
+  guards.
+- **`never`** — reserved for **destructive/expensive opt-in tasks**, each paired with a
+  descriptive tag (e.g. `never, force_pull` or `never, restore`). They never run unless
+  explicitly named, keeping dangerous actions out of normal runs. The descriptive
+  partner tag is a documented `never`-paired opt-in (allowed by the linter).
+
+## Predictability principle: tags are union-only
+
+`--tags a,b` runs tasks tagged a **OR** b — Ansible has no native AND.
+Rather than fight this, we make it an explicit principle: **boma targets one axis at a time** — *either* a
+role/service (`--tags photoprism`) *or* a concern (`--tags firewall`), never an
+intersection like "photoprism's firewall only." If that is ever genuinely needed, the
+answer is "just run `--tags photoprism`" (idempotent and fast). Designing for
+intersection is the over-tagging trap; we decline it on purpose.
+
+## Reconciling the existing CLAUDE.md rule
+
+CLAUDE.md currently says *"every task must have at least one tag."* Under the two-tier
+model the role tag is applied **once at the play/import level** and **inherited** by
+every task, so tasks are always reachable without hand-tagging each one. The rule is
+**reworded** to:
+
+> Import each role with its role-name tag (once, at the play level). Within a role, tag a
+> task/block with a concern tag from the approved list **only where it genuinely belongs
+> to that concern** — don't invent tags or tag for tagging's sake.
+
+This directly resolves the "without over-tagging" tension.
+
+## Terraform / Proxmox VM tags (metadata only)
+
+Formalize the convention that already half-exists in `staging/main.tf`
+(`tags = ["staging", each.value.group]`). Every TF-managed VM gets exactly three tags:
+
+| Tag | Value | Purpose |
+|-----|-------|---------|
+| env | `staging` \| `production` | which environment |
+| role/group | `docker_hosts`, `proxmox_hosts`, … | matches the inventory group |
+| managed-by | `terraform` | distinguishes IaC VMs from hand-made ones |
+
+Set as `tags = ["${env}", each.value.group, "managed-by=terraform"]` in the env
+`main.tf` (env is constant per directory).
+
+**Explicit non-goals** (stated so nobody wires them up later): these tags are **pure
+metadata for transparency** — glanceable in the Proxmox UI. They do **not** drive
+run-targeting and do **not** feed inventory.
+`scripts/tf_to_inventory.py` keeps building groups from the `group` output field, which stays the single source of truth.
+
+## Enforcement
+
+A small **lint check wired into `make lint`**: a script collects every `tags:` value
+across `roles/` and `playbooks/` and fails if any tag is not in the allowed set:
+
+```
+{role names} ∪ {9 concern tags} ∪ {always, never} ∪ {documented never-paired opt-ins}
+```
+
+The allowed concern list (and the `never`-paired opt-ins) live in **one
+machine-readable file, `tests/tags.yml`**, which both the linter reads and the ADR
+documents — so doc and enforcement cannot drift. This is more honest than ansible-lint's
+limited built-in tags rule. A unit test (mirroring `tests/test_capacity_scan.py`) covers
+the checker.
+
+## The "propose to extend" process
+
+To add a concern tag: (1) add it to `tests/tags.yml`; (2) add a row to the ADR-019 table
+with a one-line justification showing it passes the litmus test (cross-cutting, 2+
+roles, distinct). That is the whole gate — lightweight, but it leaves a paper trail.
+
+## Deliverables
+
+- **New `docs/decisions/019-tagging.md`** — the standard: rationale, two-tier model,
+  concern table, union-only principle, `always`/`never` policy, Proxmox tag convention,
+  extend process.
+- **`tests/tags.yml`** — machine-readable allowed concern list + `never`-paired opt-ins.
+- **Lint checker script** (e.g. `scripts/check-tags.py`) + **`make lint`** wiring +
+  **`tests/test_check_tags.py`**.
+- **CLAUDE.md** — reword the tag bullet under *Ansible conventions*; add the Proxmox tag
+  convention under *Terraform conventions*; add ADR-019 to *Further reading*.
+- **`terraform/environments/{staging,production}/main.tf`** — apply the three-tag
+  convention.
+- **`docs/TODO.md`** — mark 3.7 and 3.11 DECIDED (ADR-019).
+- **`docs/CAPABILITIES.md`** — note targeted runs as a capability, if it fits.
+
+## Out of scope
+
+- Intersection targeting (role ∩ concern) — declined on purpose (see principle).
+- Lifecycle-phase tags — handled by separate playbooks.
+- Proxmox tags feeding inventory or run-targeting — metadata only.
+- `backup`/`secrets` concern tags — added later via the extend process.
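
Reviewer note on the Enforcement section of the spec above: the allowed-set rule is simple enough to sketch in a few lines. The block below is a minimal illustration, not the planned `scripts/check-tags.py`. It assumes the role names, the `never`-paired opt-ins, and the collected `tags:` values arrive as plain lists; a real checker would presumably read the first two from `roles/` and `tests/tags.yml` and scan the YAML for the third. The names `CONCERN_TAGS`, `SPECIAL_TAGS`, `allowed_tags`, and `find_violations` are hypothetical.

```python
from typing import Iterable

# The nine concern tags listed in the two tables of the spec.
CONCERN_TAGS = frozenset({
    "packages", "users", "firewall", "hardening", "logging", "monitoring",
    "config", "deploy", "proxy",
})

# Ansible's two built-in special tags.
SPECIAL_TAGS = frozenset({"always", "never"})


def allowed_tags(role_names: Iterable[str], never_paired: Iterable[str]) -> frozenset:
    """Build {role names} | {concern tags} | {always, never} | {never-paired opt-ins}."""
    return frozenset(role_names) | CONCERN_TAGS | SPECIAL_TAGS | frozenset(never_paired)


def find_violations(
    used_tags: Iterable[str],
    role_names: Iterable[str],
    never_paired: Iterable[str],
) -> list:
    """Return, sorted, every used tag that is not in the allowed set."""
    return sorted(set(used_tags) - allowed_tags(role_names, never_paired))


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    violations = find_violations(
        used_tags=["photoprism", "firewall", "never", "force_pull", "tmp_debug"],
        role_names=["base", "photoprism", "traefik"],
        never_paired=["force_pull", "restore"],
    )
    print(violations)  # ['tmp_debug']
```

Keeping the check a pure function over sets means the unit test can cover it without touching the filesystem, and the YAML collection step can be tested separately.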