From 24b5e9361eea98b37f3f77bd8f16a8b2a294a1eb Mon Sep 17 00:00:00 2001
From: sjat
Date: Sat, 6 Jun 2026 09:42:22 +0200
Subject: [PATCH] docs(tags): ADR-019 + CLAUDE.md/TODO/CAPABILITIES (tagging standard)

Co-Authored-By: Claude Sonnet 4.6
---
 CLAUDE.md                     |  10 ++-
 docs/CAPABILITIES.md          |   4 ++
 docs/TODO.md                  |   6 +-
 docs/decisions/019-tagging.md | 111 ++++++++++++++++++++++++++++++++++
 4 files changed, 128 insertions(+), 3 deletions(-)
 create mode 100644 docs/decisions/019-tagging.md

diff --git a/CLAUDE.md b/CLAUDE.md
index 543c6e2..10f70ce 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -51,7 +51,11 @@ Full design rationale: `docs/decisions/`
 ## Ansible conventions
 
 - **FQCN always**: `ansible.builtin.template`, never `template`
-- **Tags**: every task must have at least one tag; playbooks support `--tags` filtering
+- **Tags** (ADR-019): import each role with its role-name tag once at the play level
+  (Ansible inherits it to every task). Tag a task/block with a concern tag from the
+  approved list (`tests/tags.yml`) only where it genuinely belongs to that concern —
+  don't invent tags or tag for tagging's sake. Target one axis at a time (role/service
+  *or* concern; tags are union/OR, never intersected). `make lint` enforces the vocabulary.
 - **Handlers**: use `listen:` topic strings, not direct name references
 - **Variables**: `rolename__varname` double-underscore namespace for role defaults
 - **No inline vars in playbooks**: use `group_vars/` or `host_vars/` only
@@ -144,6 +148,9 @@ Single-contributor, trunk-based (no merge requests / approval gates):
 ## Terraform conventions
 
 - Terraform owns VM existence only — nothing inside a VM, and no DNS records
+- Every TF-managed VM carries three Proxmox tags — ``, its inventory `group`, and
+  `managed-by=terraform` — as **metadata only** (ADR-019). They do not feed inventory
+  or run-targeting; `tf_to_inventory.py` still groups by the `group` output field.
 - Internal DNS is entirely Ansible (the `dns` role renders the zone from inventory)
 - OPNsense is entirely Ansible; do not reach for a Terraform OPNsense provider
 - Environments are separate directories (`staging/`, `production/`), not workspaces
@@ -215,6 +222,7 @@ Single-contributor, trunk-based (no merge requests / approval gates):
 | Update management | `docs/decisions/011-update-management.md` |
 | Hardware & capacity | `docs/decisions/012-hardware-capacity.md` |
 | Logging & log integrity | `docs/decisions/018-logging.md` |
+| Tagging & run-targeting | `docs/decisions/019-tagging.md` |
 | Adding a new role | `docs/runbooks/new-role.md` |
 | Adding a new host | `docs/runbooks/new-host.md` |
 | Rotating vault secrets | `docs/runbooks/rotate-secrets.md` |
diff --git a/docs/CAPABILITIES.md b/docs/CAPABILITIES.md
index a149eb2..b71ee4a 100644
--- a/docs/CAPABILITIES.md
+++ b/docs/CAPABILITIES.md
@@ -112,6 +112,10 @@ _(DHCP, firewall, mDNS reflection live on OPNsense — Ansible-managed, not cont
 | Sanity / smoke | whoami + health checks | S | planned | Verification endpoints + "is it actually working" checks | ADR-011 / TODO 8.2 |
 | Service-UI verification | `/verify-service` skill | S | planned | Claude-driven exploratory Level 4 acceptance check of a deployed service's UI | Decided (ADR-017); running deferred on ubongo + playwright + Authentik |
 
+- **Targeted runs** (ADR-019): playbooks are sliced with `--tags` along two axes —
+  role/service (tag = role name) or a closed list of cross-cutting concerns
+  (`firewall`, `logging`, `config`, `deploy`, …); the vocabulary is lint-enforced.
+
 ---
 
 ## V4 completeness check
diff --git a/docs/TODO.md b/docs/TODO.md
index 9d9d38e..c8699c8 100644
--- a/docs/TODO.md
+++ b/docs/TODO.md
@@ -28,11 +28,13 @@
       (all logs) + off-site security subset on `askari` + Grafana on-cluster (not the
       whole stack on `askari`). Still to design/build: Prometheus + metric exporters,
       Uptime Kuma, and exactly which alerts live where.
-   7. Define a tagging standard that lets us target runs without over-tagging.
+   7. ~~Define a tagging standard that lets us target runs without over-tagging.~~
+      DECIDED (ADR-019): two-tier — role-name tags (auto, at play level) + a closed
+      9-tag concern list (`tests/tags.yml`); union-only targeting; enforced by `make lint`.
    8. Ensure the right things are backed up (incl. database dumps if we land on PBS).
    9. Decide: a central database server, or individual database services per app?
    10. Should we keep the custom base-container (Molecule test image) method for role
        testing, or revisit it as boma's testing approach matures (ADR-008)?
-   11. Deliberate tagging strategy.
+   11. ~~Deliberate tagging strategy.~~ DECIDED (ADR-019) — folded into 3.7.
 
 4. **Split-horizon FQDN** — adopt split-horizon FQDN with or without nyumbani?
diff --git a/docs/decisions/019-tagging.md b/docs/decisions/019-tagging.md
new file mode 100644
index 0000000..3772a68
--- /dev/null
+++ b/docs/decisions/019-tagging.md
@@ -0,0 +1,111 @@
+# ADR-019 — Tagging standard for targeted, predictable runs
+
+## Status
+
+Accepted (2026-06-06). Resolves TODO 3.7 ("Define a tagging standard that lets us
+target runs without over-tagging") and TODO 3.11 ("Deliberate tagging strategy").
+
+## Context
+
+boma wants to run playbooks **targeted** — a single service, a single layer, or a
+single cross-cutting concern — **transparently and predictably**: a reader should
+know from a `--tags` invocation exactly what it will and won't touch. CLAUDE.md
+already requires tag-filterable tasks, but no vocabulary or convention existed, and
+the TODO explicitly warns against the opposite failure mode: **over-tagging**.
+
+## Decision
+
+### Two-tier tagging
+
+**Tier 1 — role/service tag (mechanical).** The tag equals the role name, applied
+once at the role-import level:
+
+```yaml
+roles:
+  - role: photoprism
+    tags: [photoprism]
+```
+
+Ansible propagates it to every task in the role. Because one service = one role
+(ADR-004), this single rule covers both the *layer/role* and *single-service*
+targeting axes with zero per-task burden. Role-less lifecycle playbooks
+(e.g. `bootstrap.yml`) carry a single playbook-identity tag instead.
+
+**Tier 2 — concern tag (curated).** A small **closed list** of cross-cutting concern
+tags, applied per-task/block **only where a task genuinely belongs to that concern**.
+
+### The closed concern list
+
+A concern earns a tag only if it (a) appears in 2+ roles, (b) is worth running as a
+slice on its own, and (c) doesn't overlap confusingly with another.
+
+| Tag | Covers |
+|-----|--------|
+| `packages` | apt package install/management |
+| `users` | accounts, groups, sudo |
+| `firewall` | nftables rulesets & port definitions (ADR-002) |
+| `hardening` | security baseline — sshd config, fail2ban, auditd, sysctl |
+| `logging` | Alloy / log-shipping config (ADR-018) |
+| `monitoring` | metric exporters / health checks |
+| `config` | render templated config/compose files to disk — **no restart** |
+| `deploy` | bring services up / restart (`compose up -d`) |
+| `proxy` | reverse-proxy + TLS registration (Traefik routes, Authentik) |
+
+The `config`/`deploy` split lets you re-render and diff configuration (`--tags
+config`) without bouncing services, then restart deliberately (`--tags deploy`).
+`backup` and `secrets` are intentionally omitted until the roles needing them exist.
+
+### `always` / `never`
+
+- **`always`** — reserved for cheap preflight assertions (vault unlocked, OS is
+  Debian 13, required vars present), so even `--tags config` runs its safety guards.
+- **`never`** — reserved for destructive/expensive opt-in tasks, each paired with a
+  descriptive tag (e.g. `tags: [never, force_pull]`); they run only when named.
+
+### Predictability principle: tags are union-only
+
+`--tags a,b` runs tasks tagged a **OR** b — Ansible has no native AND. boma therefore
+targets **one axis at a time**: either a role/service *or* a concern, never an
+intersection like "photoprism's firewall only." If that's ever needed, just run
+`--tags photoprism` (idempotent and fast). Designing for intersection is the
+over-tagging trap; we decline it on purpose.
+
+### Terraform / Proxmox VM tags (metadata only)
+
+Every Terraform-managed VM carries exactly three Proxmox tags:
+
+| Tag | Value | Purpose |
+|-----|-------|---------|
+| env | `staging` \| `production` | which environment |
+| role/group | `docker_hosts`, `proxmox_hosts`, … | matches the inventory group |
+| managed-by | `terraform` | distinguishes IaC VMs from hand-made ones |
+
+These are **pure metadata for transparency** (glanceable in the Proxmox UI). They do
+**not** drive run-targeting and do **not** feed inventory — `scripts/tf_to_inventory.py`
+keeps building groups from the `group` output field, the single source of truth.
+
+## Enforcement
+
+`tests/tags.yml` is the single source of truth for the allowed concern/special/
+opt-in/playbook tags. `scripts/check-tags.py` (run by `make lint`, covered by
+`tests/test_check_tags.py`) scans `roles/` and `playbooks/` and fails on any tag
+outside `{role directory names} ∪ {tests/tags.yml entries}`.
+
+## Extending the vocabulary
+
+To add a concern tag: (1) add it to `tests/tags.yml`; (2) add a row to the concern
+table above with a one-line justification showing it passes the litmus test
+(cross-cutting, 2+ roles, distinct). That is the whole gate — lightweight, but it
+leaves a paper trail.
+
+## Consequences
+
+- Targeted runs are predictable: only two kinds of tags exist, one of them mechanical.
+- Over-tagging is structurally resisted (closed list + lint enforcement).
+- Intersection targeting is unavailable by design.
+- Authors must keep role tags = role names; the linter enforces it.
+
+## Related
+
+ADR-002 (security baseline / firewall), ADR-004 (one service = one role),
+ADR-009 (TF↔Ansible handoff / inventory), ADR-018 (logging).
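
Note on the Enforcement section of ADR-019: the patch describes `scripts/check-tags.py` only in prose and does not include the script. The rule it states (fail on any tag outside `{role directory names} ∪ {tests/tags.yml entries}`) could be sketched roughly as below. This is an illustration, not the repository's actual script: the function names `collect_tags` and `unknown_tags` and the sample playbook are hypothetical, the input is assumed to be YAML that has already been parsed into Python dicts and lists, and treating a plain string as a comma-separated tag list is an assumption.

```python
from __future__ import annotations

from typing import Any, Iterable


def collect_tags(node: Any) -> set[str]:
    """Recursively gather every value found under a `tags` key."""
    found: set[str] = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "tags":
                if isinstance(value, str):
                    # Assumption: a plain string may hold comma-separated tags.
                    found.update(t.strip() for t in value.split(",") if t.strip())
                elif isinstance(value, list):
                    found.update(str(t) for t in value)
            else:
                found |= collect_tags(value)
    elif isinstance(node, list):
        for item in node:
            found |= collect_tags(item)
    return found


def unknown_tags(
    documents: Iterable[Any],
    role_names: Iterable[str],
    vocabulary: Iterable[str],
) -> list[str]:
    """Return the sorted tags that are neither a role name nor in the vocabulary."""
    allowed = set(role_names) | set(vocabulary)
    used: set[str] = set()
    for doc in documents:
        used |= collect_tags(doc)
    return sorted(used - allowed)


# Hypothetical parsed playbook: one play importing a role, plus two tagged tasks.
playbook = [{
    "hosts": "docker_hosts",
    "roles": [{"role": "photoprism", "tags": ["photoprism"]}],
    "tasks": [
        {"name": "Render compose file", "tags": ["config"]},
        {"name": "Ad-hoc tweak", "tags": ["misc"]},  # outside the vocabulary
    ],
}]

bad = unknown_tags(
    [playbook],
    role_names={"photoprism"},
    vocabulary={"config", "deploy", "always", "never"},
)
print(bad)  # ['misc']
```

A lint wrapper would exit non-zero whenever the returned list is non-empty. Keeping the check as a pure function over parsed data means it can be unit-tested without touching the filesystem.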