docs(tags): ADR-019 + CLAUDE.md/TODO/CAPABILITIES (tagging standard)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
sjat 2026-06-06 09:42:22 +02:00
parent 9584cc2c76
commit 24b5e9361e
4 changed files with 128 additions and 3 deletions


@@ -51,7 +51,11 @@ Full design rationale: `docs/decisions/`
## Ansible conventions
- **FQCN always**: `ansible.builtin.template`, never `template`
- **Tags**: every task must have at least one tag; playbooks support `--tags` filtering
- **Tags** (ADR-019): import each role with its role-name tag once at the play level
(Ansible inherits it to every task). Tag a task/block with a concern tag from the
approved list (`tests/tags.yml`) only where it genuinely belongs to that concern —
don't invent tags or tag for tagging's sake. Target one axis at a time (role/service
*or* concern; tags are union/OR, never intersected). `make lint` enforces the vocabulary.
- **Handlers**: use `listen:` topic strings, not direct name references
- **Variables**: `rolename__varname` double-underscore namespace for role defaults
- **No inline vars in playbooks**: use `group_vars/` or `host_vars/` only
@@ -144,6 +148,9 @@ Single-contributor, trunk-based (no merge requests / approval gates):
## Terraform conventions
- Terraform owns VM existence only — nothing inside a VM, and no DNS records
- Every TF-managed VM carries three Proxmox tags — `<env>`, its inventory `group`, and
`managed-by=terraform` — as **metadata only** (ADR-019). They do not feed inventory
or run-targeting; `tf_to_inventory.py` still groups by the `group` output field.
- Internal DNS is entirely Ansible (the `dns` role renders the zone from inventory)
- OPNsense is entirely Ansible; do not reach for a Terraform OPNsense provider
- Environments are separate directories (`staging/`, `production/`), not workspaces
@@ -215,6 +222,7 @@ Single-contributor, trunk-based (no merge requests / approval gates):
| Update management | `docs/decisions/011-update-management.md` |
| Hardware & capacity | `docs/decisions/012-hardware-capacity.md` |
| Logging & log integrity | `docs/decisions/018-logging.md` |
| Tagging & run-targeting | `docs/decisions/019-tagging.md` |
| Adding a new role | `docs/runbooks/new-role.md` |
| Adding a new host | `docs/runbooks/new-host.md` |
| Rotating vault secrets | `docs/runbooks/rotate-secrets.md` |


@ -112,6 +112,10 @@ _(DHCP, firewall, mDNS reflection live on OPNsense — Ansible-managed, not cont
| Sanity / smoke | whoami + health checks | S | planned | Verification endpoints + "is it actually working" checks | ADR-011 / TODO 8.2 |
| Service-UI verification | `/verify-service` skill | S | planned | Claude-driven exploratory Level 4 acceptance check of a deployed service's UI | Decided (ADR-017); running deferred on ubongo + playwright + Authentik |
- **Targeted runs** (ADR-019): playbooks are sliced with `--tags` along two axes —
role/service (tag = role name) or a closed list of cross-cutting concerns
(`firewall`, `logging`, `config`, `deploy`, …); the vocabulary is lint-enforced.
---
## V4 completeness check


@@ -28,11 +28,13 @@
(all logs) + off-site security subset on `askari` + Grafana on-cluster (not the
whole stack on `askari`). Still to design/build: Prometheus + metric exporters,
Uptime Kuma, and exactly which alerts live where.
7. Define a tagging standard that lets us target runs without over-tagging.
7. ~~Define a tagging standard that lets us target runs without over-tagging.~~
DECIDED (ADR-019): two-tier — role-name tags (auto, at play level) + a closed
9-tag concern list (`tests/tags.yml`); union-only targeting; enforced by `make lint`.
8. Ensure the right things are backed up (incl. database dumps if we land on PBS).
9. Decide: a central database server, or individual database services per app?
10. Should we keep the custom base-container (Molecule test image) method for role testing, or revisit it as boma's testing approach matures (ADR-008)?
11. Deliberate tagging strategy.
11. ~~Deliberate tagging strategy.~~ DECIDED (ADR-019) — folded into 3.7.
4. **Split-horizon FQDN** — adopt split-horizon FQDN with or without nyumbani?


@@ -0,0 +1,111 @@
# ADR-019 — Tagging standard for targeted, predictable runs
## Status
Accepted (2026-06-06). Resolves TODO 3.7 ("Define a tagging standard that lets us
target runs without over-tagging") and TODO 3.11 ("Deliberate tagging strategy").
## Context
boma wants to run playbooks **targeted** — a single service, a single layer, or a
single cross-cutting concern — **transparently and predictably**: a reader should
know from a `--tags` invocation exactly what it will and won't touch. CLAUDE.md
already requires tag-filterable tasks, but no vocabulary or convention existed, and
the TODO explicitly warns against the opposite failure mode: **over-tagging**.
## Decision
### Two-tier tagging
**Tier 1 — role/service tag (mechanical).** The tag equals the role name, applied
once at the role-import level:
```yaml
roles:
- role: photoprism
tags: [photoprism]
```
Ansible propagates it to every task in the role. Because one service = one role
(ADR-004), this single rule covers both the *layer/role* and *single-service*
targeting axes with zero per-task burden. Role-less lifecycle playbooks
(e.g. `bootstrap.yml`) carry a single playbook-identity tag instead.
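For a role-less playbook, the identity tag can sit at the play level, where Ansible applies it to every task in the play. A minimal sketch, assuming the tag is named `bootstrap`; the `hosts` pattern and the placeholder task are illustrative and not taken from the repository:

```yaml
# playbooks/bootstrap.yml (hypothetical excerpt)
- name: Bootstrap a freshly provisioned host
  hosts: all                 # assumed host pattern
  tags: [bootstrap]          # assumed playbook-identity tag; inherited by every task below
  tasks:
    - name: Confirm the host is reachable
      ansible.builtin.ping:
```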
**Tier 2 — concern tag (curated).** A small **closed list** of cross-cutting concern
tags, applied per-task/block **only where a task genuinely belongs to that concern**.
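As an illustration, a role's task file might tag only the block that belongs to a concern and leave everything else untagged. This is a sketch under assumptions: the file paths, the template name, and the `nftables changed` handler topic are hypothetical.

```yaml
# roles/photoprism/tasks/main.yml (hypothetical excerpt)
- name: Open the photoprism port in nftables
  tags: [firewall]                      # concern tag from the approved list
  block:
    - name: Render the photoprism nftables snippet
      ansible.builtin.template:
        src: photoprism.nft.j2          # hypothetical template
        dest: /etc/nftables.d/photoprism.nft
        owner: root
        group: root
        mode: "0644"
      notify: nftables changed          # hypothetical listen: topic

- name: Create the photoprism data directory
  ansible.builtin.file:
    path: /srv/photoprism               # hypothetical path
    state: directory
    mode: "0750"
  # No concern tag here: the task carries only the inherited role tag.
```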
### The closed concern list
A concern earns a tag only if it (a) appears in 2+ roles, (b) is worth running as a
slice on its own, and (c) doesn't overlap confusingly with another.
| Tag | Covers |
|-----|--------|
| `packages` | apt package install/management |
| `users` | accounts, groups, sudo |
| `firewall` | nftables rulesets & port definitions (ADR-002) |
| `hardening` | security baseline — sshd config, fail2ban, auditd, sysctl |
| `logging` | Alloy / log-shipping config (ADR-018) |
| `monitoring` | metric exporters / health checks |
| `config` | render templated config/compose files to disk — **no restart** |
| `deploy` | bring services up / restart (`compose up -d`) |
| `proxy` | reverse-proxy + TLS registration (Traefik routes, Authentik) |
The `config`/`deploy` split lets you re-render and diff configuration (`--tags
config`) without bouncing services, then restart deliberately (`--tags deploy`).
`backup` and `secrets` are intentionally omitted until the roles needing them exist.
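Inside a role, the `config`/`deploy` split might look like the following. This is only a sketch: the compose path, the template name, and the use of `ansible.builtin.command` to call `docker compose` are assumptions, not the repository's actual tasks.

```yaml
# Hypothetical excerpt from a service role's tasks
- name: Render the compose file
  ansible.builtin.template:
    src: compose.yml.j2                 # hypothetical template
    dest: /opt/photoprism/compose.yml   # hypothetical path
    mode: "0640"
  tags: [config]                        # writes to disk only, never restarts

- name: Bring the stack up
  ansible.builtin.command:
    cmd: docker compose up -d
    chdir: /opt/photoprism
  changed_when: true                    # sketch: always reports "changed"
  tags: [deploy]
```

Adding `--check --diff` to a `--tags config` run previews the rendered changes without writing them; that pairing is a suggestion, not something this ADR mandates.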
### `always` / `never`
- **`always`** — reserved for cheap preflight assertions (vault unlocked, OS is
Debian 13, required vars present), so even `--tags config` runs its safety guards.
- **`never`** — reserved for destructive/expensive opt-in tasks, each paired with a
descriptive tag (e.g. `tags: [never, force_pull]`); they run only when named.
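A sketch of both special tags in use. The assertion mirrors the "OS is Debian 13" guard named above; what `force_pull` actually does is not specified here, so the task body below is an assumption.

```yaml
- name: Assert the target runs Debian 13
  ansible.builtin.assert:
    that:
      - ansible_facts['distribution'] == 'Debian'
      - ansible_facts['distribution_major_version'] == '13'
    fail_msg: "This playbook expects Debian 13"
  tags: [always]                        # runs under any --tags selection

- name: Re-pull container images
  ansible.builtin.command:
    cmd: docker compose pull            # assumed meaning of force_pull
    chdir: /opt/photoprism              # hypothetical path
  changed_when: true
  tags: [never, force_pull]             # skipped unless --tags force_pull is given
```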
### Predictability principle: tags are union-only
`--tags a,b` runs tasks tagged a **OR** b — Ansible has no native AND. boma therefore
targets **one axis at a time**: either a role/service *or* a concern, never an
intersection like "photoprism's firewall only." When only that slice has changed, run
`--tags photoprism` anyway: the role is idempotent, so the unrelated tasks report `ok`
and the run stays fast. Designing for intersection is the over-tagging trap; we decline
it on purpose.
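The union rule can be read off a small play. In this sketch the playbook name and the `base` role are hypothetical; the comments list what each invocation would select.

```yaml
# playbooks/site.yml (hypothetical)
- name: Configure docker hosts
  hosts: docker_hosts
  roles:
    - role: base                # hypothetical role that contains firewall-tagged tasks
      tags: [base]
    - role: photoprism
      tags: [photoprism]

# What each invocation selects (always-tagged tasks run in every case):
#   --tags photoprism           every task in the photoprism role
#   --tags firewall             firewall-tagged tasks in base and photoprism
#   --tags photoprism,firewall  union: all of photoprism plus base's firewall tasks
```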
### Terraform / Proxmox VM tags (metadata only)
Every Terraform-managed VM carries exactly three Proxmox tags:
| Tag | Value | Purpose |
|-----|-------|---------|
| env | `staging` \| `production` | which environment |
| role/group | `docker_hosts`, `proxmox_hosts`, … | matches the inventory group |
| managed-by | `terraform` | distinguishes IaC VMs from hand-made ones |
These are **pure metadata for transparency** (glanceable in the Proxmox UI). They do
**not** drive run-targeting and do **not** feed inventory — `scripts/tf_to_inventory.py`
keeps building groups from the `group` output field, the single source of truth.
## Enforcement
`tests/tags.yml` is the single source of truth for the allowed concern/special/
opt-in/playbook tags. `scripts/check-tags.py` (run by `make lint`, covered by
`tests/test_check_tags.py`) scans `roles/` and `playbooks/` and fails on any tag
outside `{role directory names} ∪ {tests/tags.yml entries}`.
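The layout of `tests/tags.yml` is not shown in this ADR. One plausible shape, with the four categories named above as top-level keys, is sketched here; the key names and the `bootstrap` playbook tag are assumptions.

```yaml
# tests/tags.yml (hypothetical layout; key names are assumptions)
concern:
  - packages
  - users
  - firewall
  - hardening
  - logging
  - monitoring
  - config
  - deploy
  - proxy
special:
  - always
  - never
opt_in:
  - force_pull
playbook:
  - bootstrap        # assumed identity tag for bootstrap.yml
```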
## Extending the vocabulary
To add a concern tag: (1) add it to `tests/tags.yml`; (2) add a row to the concern
table above with a one-line justification showing it passes the litmus test
(appears in 2+ roles, worth running as its own slice, distinct from existing tags).
That is the whole gate: lightweight, but it leaves a paper trail.
## Consequences
- Targeted runs are predictable: only two kinds of tags exist, one of them mechanical.
- Over-tagging is structurally resisted (closed list + lint enforcement).
- Intersection targeting is unavailable by design.
- Authors must keep role tags equal to role names; the linter rejects any tag that is
  neither a role directory name nor a `tests/tags.yml` entry.
## Related
ADR-002 (security baseline / firewall), ADR-004 (one service = one role),
ADR-009 (TF↔Ansible handoff / inventory), ADR-018 (logging).