docs(spec): tagging standard design (TODO 3.7/3.11 → ADR-019)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
parent 9bdb3017bb
commit 4ed9e9a8bf
1 changed files with 188 additions and 0 deletions
docs/superpowers/specs/2026-06-06-tagging-strategy-design.md
# Design: Ansible tagging standard (targeted, predictable runs)

- **Date:** 2026-06-06
- **Status:** Approved design, pending implementation plan
- **Resolves:** TODO 3.7 ("Define a tagging standard that lets us target runs without
  over-tagging") and TODO 3.11 ("Deliberate tagging strategy"), which are the same thread
- **Becomes:** ADR-019 (this design is the basis for that ADR)

---

## Problem

boma wants to run playbooks **targeted** (a single service, a single layer, or a
single cross-cutting concern) and to do so **transparently and predictably**: you
should be able to look at a `--tags` invocation and know exactly what it will and won't
touch. CLAUDE.md already mandates that every task be tag-filterable, but no *vocabulary*
or *naming convention* exists. Without one, tags proliferate ad hoc per role and the
"predictable" property is lost. The TODO also explicitly warns against the opposite
failure mode, **over-tagging**.

The repo is effectively greenfield for this: `base` and `docker_host` are empty, and the
only tags in existence are `[base]`/`[docker]` in `site.yml` and `[bootstrap]` in
`bootstrap.yml`. So we can bake the standard into role-authoring conventions *before*
there are a dozen service roles to retrofit.

## Targeting axes (what we want to slice by)

1. **Layer / role**: `--tags base`, `--tags docker`
2. **Single service**: `--tags photoprism`, `--tags traefik`
3. **Concern / function**: `--tags firewall`, `--tags logging`, …

Lifecycle phases (bootstrap/config/deploy) are **not** a tag axis: `bootstrap.yml` and
`site.yml` already separate those as whole playbooks.

Key simplification: because of ADR-004 (*one service = one role*, role name = service
name), axes 1 and 2 are the **same mechanism**, a tag equal to the role name. Only the
concern axis needs a curated vocabulary.

## Approach (chosen): two-tier tagging

**Tier 1: role/service tag (mechanical).** The tag *equals the role name*, applied
**once** at the role-import level in the playbook:

```yaml
roles:
  - role: photoprism
    tags: [photoprism]
```

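Ansible propagates the tag to every task in the role. This covers both the layer/role
and single-service axes with one rule and **zero per-task burden**. The inheritance
holds for static imports (`roles:` or `import_role`); a tag placed on a dynamic
`include_role` applies only to the include statement itself, so this rule assumes static
imports.
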
**Tier 2: concern tag (curated).** A small **closed, documented list** of cross-cutting
concern tags, applied per task or block **only where a task genuinely belongs to that
concern**. `--tags firewall` then hits the firewall tasks in `base` and in every service
role that has them.

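As a rough illustration of how the two tiers combine, the sketch below models a task's effective tags as its inherited role tag plus any concern tags, and selects tasks the way a single `--tags` value would. The `Task` class, the `select` helper, and the task names are hypothetical and exist only for this example; this is not Ansible's implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Task:
    role: str  # role the task lives in
    name: str
    concerns: frozenset = field(default_factory=frozenset)  # Tier 2 tags, if any

    @property
    def effective_tags(self) -> frozenset:
        # Tier 1: the role-name tag is inherited from the play-level import.
        return self.concerns | {self.role}


def select(tasks, requested):
    """Return labels of tasks that share at least one tag with `requested`."""
    wanted = set(requested)
    return [f"{t.role}: {t.name}" for t in tasks if t.effective_tags & wanted]


# Hypothetical tasks, for illustration only.
TASKS = [
    Task("base", "install apt packages", frozenset({"packages"})),
    Task("base", "load nftables ruleset", frozenset({"firewall"})),
    Task("photoprism", "open service port", frozenset({"firewall"})),
    Task("photoprism", "render compose file", frozenset({"config"})),
    Task("photoprism", "create data directory"),  # no concern tag: role tag only
]

print(select(TASKS, ["firewall"]))    # concern slice across roles
print(select(TASKS, ["photoprism"]))  # whole role, including the untagged task
```

The last task carries no concern tag at all, yet `--tags photoprism` still reaches it, which is the "zero per-task burden" property of Tier 1.
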
Rejected alternatives: *concern-only/flat* (loses natural `--tags <service>` ergonomics);
*rich multi-dimensional* (role+service+concern+lifecycle+ad-hoc per task), which is
precisely the over-tagging the TODO warns against.

## The closed concern list

Litmus test for earning a spot: a concern must (a) appear in **2+ roles**, (b) be
something you'd realistically want to run as a slice on its own, and (c) not overlap
confusingly with another.

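Criterion (a) is the only part of the litmus test that can be checked mechanically; (b) and (c) remain judgment calls. A minimal sketch of that check, assuming tag usage has already been gathered into a role-to-tags mapping (the `USAGE` data and both function names are hypothetical):

```python
from collections import defaultdict


def roles_per_tag(usage):
    """Map each concern tag to the set of roles that use it.

    `usage` maps a role name to the concern tags used inside that role.
    """
    roles = defaultdict(set)
    for role, tags in usage.items():
        for tag in tags:
            roles[tag].add(role)
    return dict(roles)


def passes_two_role_rule(tag, usage):
    """Criterion (a) only: the tag appears in at least two roles."""
    return len(roles_per_tag(usage).get(tag, set())) >= 2


# Hypothetical usage data, for illustration only.
USAGE = {
    "base": ["packages", "firewall"],
    "photoprism": ["packages", "firewall", "thumbnails"],
    "traefik": ["firewall"],
}

print(passes_two_role_rule("firewall", USAGE))    # used by three roles
print(passes_two_role_rule("thumbnails", USAGE))  # used by one role only
```
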
**Baseline concerns** (mostly in `base`, some echoed in service roles):

| Tag | Covers |
|-----|--------|
| `packages` | apt package install/management |
| `users` | accounts, groups, sudo |
| `firewall` | nftables rulesets & port definitions (ADR-002) |
| `hardening` | security baseline: sshd config, fail2ban, auditd, sysctl |
| `logging` | Alloy / log-shipping config (ADR-018) |
| `monitoring` | metric exporters / health checks |

**Service concerns** (in every service role, ADR-004):

| Tag | Covers |
|-----|--------|
| `config` | render templated config/compose files to disk, **no restart** |
| `deploy` | bring services up / restart (`compose up -d`) |
| `proxy` | reverse-proxy + TLS registration (Traefik routes, Authentik) |

Nine tags total. The `config`/`deploy` split is deliberate and high-value: `--tags
config` re-renders and lets you diff configuration without bouncing services; `--tags
deploy` does the restart.

`backup` and `secrets` are **intentionally omitted** until the roles that need them
exist; they enter via the extend process, not speculative reservation.

## `always` / `never` policy

boma uses Ansible's two built-in special tags, narrowly:

- **`always`**: reserved strictly for **cheap preflight assertions** (vault unlocked,
  OS is Debian 13, required vars present). This ensures that even `--tags config` runs
  its safety guards.
- **`never`**: reserved for **destructive/expensive opt-in tasks**, each paired with a
  descriptive tag (e.g. `never, force_pull` or `never, restore`). Ansible skips them
  unless one of their tags is explicitly requested, which keeps dangerous actions out of
  normal runs. The descriptive partner tag is a documented `never`-paired opt-in
  (allowed by the linter).

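The selection behaviour of the two special tags can be modelled in a few lines. This is a simplified sketch of Ansible's documented `--tags` semantics, not its actual code: it ignores `--skip-tags` and the `all`/`tagged`/`untagged` keywords, and the `should_run` name and the three example tasks are hypothetical.

```python
def should_run(task_tags, requested=None):
    """Decide whether a task runs; `requested` mirrors --tags (None = no --tags)."""
    tags = set(task_tags)
    if "always" in tags:
        return True  # preflight guards run in every invocation
    if requested is None:
        return "never" not in tags  # a plain run skips opt-in tasks
    # With --tags, any overlap selects the task, including a `never` task
    # whose partner tag (or any other tag it carries) was named.
    return bool(tags & set(requested))


# Effective tags, i.e. including the inherited role tag (hypothetical tasks).
preflight = ["photoprism", "always"]
render = ["photoprism", "config"]
restore = ["photoprism", "never", "restore"]

print(should_run(preflight, ["config"]))    # True: guard still runs
print(should_run(restore))                  # False: skipped on a normal run
print(should_run(restore, ["config"]))      # False: not named
print(should_run(restore, ["restore"]))     # True: explicitly opted in
print(should_run(restore, ["photoprism"]))  # True: the inherited role tag matches
```

The last case is worth noting for the implementation plan: because a `never` task also carries its inherited role tag, requesting that role tag selects it, so a `never`-tagged task inside a role is protected on untagged runs and concern-tag runs but not on a `--tags <role>` run.
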
## Predictability principle: tags are union-only

`--tags a,b` runs tasks tagged a **OR** b; Ansible has no native AND. Rather than fight
this, we make it an explicit principle: **boma targets one axis at a time**, *either* a
role/service (`--tags photoprism`) *or* a concern (`--tags firewall`), never an
intersection like "photoprism's firewall only." If that is ever genuinely needed, the
answer is "just run `--tags photoprism`" (idempotent and fast). Designing for
intersection is the over-tagging trap; we decline it on purpose.

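To make the union-only point concrete, the sketch below contrasts what `--tags photoprism,firewall` selects (any match) with the intersection that someone might expect (all match). The task table and both helper names are hypothetical; only the OR behaviour corresponds to what Ansible does.

```python
# Hypothetical tasks mapped to their effective tags (role tag + concern tags).
TASKS = {
    "base: load nftables ruleset": {"base", "firewall"},
    "photoprism: open service port": {"photoprism", "firewall"},
    "photoprism: render compose file": {"photoprism", "config"},
    "traefik: open service port": {"traefik", "firewall"},
}


def union_select(requested):
    """What `--tags a,b` does: a task runs if it carries ANY requested tag."""
    wanted = set(requested)
    return sorted(name for name, tags in TASKS.items() if tags & wanted)


def intersection_select(requested):
    """What `--tags` does NOT do: require ALL requested tags on a task."""
    wanted = set(requested)
    return sorted(name for name, tags in TASKS.items() if wanted <= tags)


print(union_select(["photoprism", "firewall"]))         # all four tasks
print(intersection_select(["photoprism", "firewall"]))  # one task only
```

Combining a role tag with a concern tag therefore widens the run rather than narrowing it, which is why the standard asks for one axis per invocation.
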
## Reconciling the existing CLAUDE.md rule

CLAUDE.md currently says *"every task must have at least one tag."* Under the two-tier
model the role tag is applied **once at the play/import level** and **inherited** by
every task, so tasks are always reachable without hand-tagging each one. The rule is
**reworded** to:

> Import each role with its role-name tag (once, at the play level). Within a role, tag a
> task/block with a concern tag from the approved list **only where it genuinely belongs
> to that concern**. Don't invent tags or tag for tagging's sake.

This directly resolves the "without over-tagging" tension.

## Terraform / Proxmox VM tags (metadata only)

Formalize the convention that already half-exists in `staging/main.tf`
(`tags = ["staging", each.value.group]`). Every TF-managed VM gets exactly three tags:

| Tag | Value | Purpose |
|-----|-------|---------|
| env | `staging` \| `production` | which environment |
| role/group | `docker_hosts`, `proxmox_hosts`, … | matches the inventory group |
| managed-by | `terraform` | distinguishes IaC VMs from hand-made ones |

Set as `tags = ["${env}", each.value.group, "managed-by=terraform"]` in the env
`main.tf` (env is constant per directory).

**Explicit non-goals** (stated so nobody wires them up later): these tags are **pure
metadata for transparency**, glanceable in the Proxmox UI. They do **not** drive
run-targeting and do **not** feed inventory. `scripts/tf_to_inventory.py` keeps building
groups from the `group` output field, which stays the single source of truth.

## Enforcement

A small **lint check wired into `make lint`**: a script collects every `tags:` value
across `roles/` and `playbooks/` and fails if any tag is not in the allowed set:

```
{role names} ∪ {9 concern tags} ∪ {always, never} ∪ {documented never-paired opt-ins}
```

The allowed concern list (and the `never`-paired opt-ins) live in **one
machine-readable file, `tests/tags.yml`**, which the linter reads and the ADR
documents, so doc and enforcement cannot drift. This is more honest than ansible-lint's
limited built-in tags rule. A unit test (mirroring `tests/test_capacity_scan.py`) covers
the checker.

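One possible shape for the checker's core, shown as a sketch rather than the planned implementation: it walks already-parsed YAML data (plain dicts and lists, as any YAML loader would produce), collects every value under a `tags` key, and reports tags outside the allowed set. The function names, the `playbook` sample, and the decision to ignore non-string, non-list `tags` values (such as a module parameter that happens to be called `tags`) are assumptions made for this example.

```python
def collect_tags(node):
    """Recursively yield every tag found under a `tags` key in parsed YAML data."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "tags":
                if isinstance(value, str):
                    # This sketch treats a string as a comma-separated tag list.
                    yield from (t.strip() for t in value.split(",") if t.strip())
                elif isinstance(value, list):
                    yield from (t for t in value if isinstance(t, str))
                # Anything else (e.g. a dict-valued module parameter) is ignored.
            else:
                yield from collect_tags(value)
    elif isinstance(node, list):
        for item in node:
            yield from collect_tags(item)


def unknown_tags(documents, role_names, concerns, never_opt_ins):
    """Return the sorted tags that fall outside the allowed set."""
    allowed = set(role_names) | set(concerns) | {"always", "never"} | set(never_opt_ins)
    found = {tag for doc in documents for tag in collect_tags(doc)}
    return sorted(found - allowed)


CONCERNS = ["packages", "users", "firewall", "hardening", "logging",
            "monitoring", "config", "deploy", "proxy"]

# Hypothetical parsed playbook, for illustration only.
playbook = [{
    "hosts": "docker_hosts",
    "roles": [{"role": "photoprism", "tags": ["photoprism"]}],
    "tasks": [
        {"name": "open port", "tags": "firewall"},
        {"name": "pull images", "tags": ["never", "force_pull"]},
        {"name": "misc", "block": [{"name": "make thumbs", "tags": ["thumbnails"]}]},
    ],
}]

print(unknown_tags([playbook], ["base", "photoprism"], CONCERNS, ["force_pull"]))
# prints ['thumbnails']
```

A `make lint` wrapper would load the role names from `roles/`, the other two lists from `tests/tags.yml`, and exit non-zero when the returned list is non-empty.
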
## The "propose to extend" process

To add a concern tag: (1) add it to `tests/tags.yml`; (2) add a row to the ADR-019 table
with a one-line justification showing it passes the litmus test (2+ roles, worth running
as its own slice, distinct from existing tags). That is the whole gate: lightweight, but
it leaves a paper trail.

## Deliverables

- **New `docs/decisions/019-tagging.md`**: the standard (rationale, two-tier model,
  concern table, union-only principle, `always`/`never` policy, Proxmox tag convention,
  extend process).
- **`tests/tags.yml`**: machine-readable allowed concern list + `never`-paired opt-ins.
- **Lint checker script** (e.g. `scripts/check-tags.py`) + **`make lint`** wiring +
  **`tests/test_check_tags.py`**.
- **CLAUDE.md**: reword the tag bullet under *Ansible conventions*; add the Proxmox tag
  convention under *Terraform conventions*; add ADR-019 to *Further reading*.
- **`terraform/environments/{staging,production}/main.tf`**: apply the three-tag
  convention.
- **`docs/TODO.md`**: mark 3.7 and 3.11 DECIDED (ADR-019).
- **`docs/CAPABILITIES.md`**: note targeted runs as a capability, if it fits.

## Out of scope

- Intersection targeting (role ∩ concern): declined on purpose (see principle).
- Lifecycle-phase tags: handled by separate playbooks.
- Proxmox tags feeding inventory or run-targeting: metadata only.
- `backup`/`secrets` concern tags: added later via the extend process.