docs(tags): ADR-019 + CLAUDE.md/TODO/CAPABILITIES (tagging standard)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
sjat 2026-06-06 09:42:22 +02:00
parent 9584cc2c76
commit 24b5e9361e
4 changed files with 128 additions and 3 deletions


@@ -51,7 +51,11 @@ Full design rationale: `docs/decisions/`
 ## Ansible conventions
 - **FQCN always**: `ansible.builtin.template`, never `template`
-- **Tags**: every task must have at least one tag; playbooks support `--tags` filtering
+- **Tags** (ADR-019): import each role with its role-name tag once at the play level
+(Ansible inherits it to every task). Tag a task/block with a concern tag from the
+approved list (`tests/tags.yml`) only where it genuinely belongs to that concern —
+don't invent tags or tag for tagging's sake. Target one axis at a time (role/service
+*or* concern; tags are union/OR, never intersected). `make lint` enforces the vocabulary.
 - **Handlers**: use `listen:` topic strings, not direct name references
 - **Variables**: `rolename__varname` double-underscore namespace for role defaults
 - **No inline vars in playbooks**: use `group_vars/` or `host_vars/` only
@@ -144,6 +148,9 @@ Single-contributor, trunk-based (no merge requests / approval gates):
 ## Terraform conventions
 - Terraform owns VM existence only — nothing inside a VM, and no DNS records
+- Every TF-managed VM carries three Proxmox tags — `<env>`, its inventory `group`, and
+`managed-by=terraform` — as **metadata only** (ADR-019). They do not feed inventory
+or run-targeting; `tf_to_inventory.py` still groups by the `group` output field.
 - Internal DNS is entirely Ansible (the `dns` role renders the zone from inventory)
 - OPNsense is entirely Ansible; do not reach for a Terraform OPNsense provider
 - Environments are separate directories (`staging/`, `production/`), not workspaces
@@ -215,6 +222,7 @@ Single-contributor, trunk-based (no merge requests / approval gates):
 | Update management | `docs/decisions/011-update-management.md` |
 | Hardware & capacity | `docs/decisions/012-hardware-capacity.md` |
 | Logging & log integrity | `docs/decisions/018-logging.md` |
+| Tagging & run-targeting | `docs/decisions/019-tagging.md` |
 | Adding a new role | `docs/runbooks/new-role.md` |
 | Adding a new host | `docs/runbooks/new-host.md` |
 | Rotating vault secrets | `docs/runbooks/rotate-secrets.md` |


@@ -112,6 +112,10 @@ _(DHCP, firewall, mDNS reflection live on OPNsense — Ansible-managed, not cont
 | Sanity / smoke | whoami + health checks | S | planned | Verification endpoints + "is it actually working" checks | ADR-011 / TODO 8.2 |
 | Service-UI verification | `/verify-service` skill | S | planned | Claude-driven exploratory Level 4 acceptance check of a deployed service's UI | Decided (ADR-017); running deferred on ubongo + playwright + Authentik |
+- **Targeted runs** (ADR-019): playbooks are sliced with `--tags` along two axes —
+role/service (tag = role name) or a closed list of cross-cutting concerns
+(`firewall`, `logging`, `config`, `deploy`, …); the vocabulary is lint-enforced.
 ---
 ## V4 completeness check


@@ -28,11 +28,13 @@
 (all logs) + off-site security subset on `askari` + Grafana on-cluster (not the
 whole stack on `askari`). Still to design/build: Prometheus + metric exporters,
 Uptime Kuma, and exactly which alerts live where.
-7. Define a tagging standard that lets us target runs without over-tagging.
+7. ~~Define a tagging standard that lets us target runs without over-tagging.~~
+DECIDED (ADR-019): two-tier — role-name tags (auto, at play level) + a closed
+9-tag concern list (`tests/tags.yml`); union-only targeting; enforced by `make lint`.
 8. Ensure the right things are backed up (incl. database dumps if we land on PBS).
 9. Decide: a central database server, or individual database services per app?
 10. Should we keep the custom base-container (Molecule test image) method for role testing, or revisit it as boma's testing approach matures (ADR-008)?
-11. Deliberate tagging strategy.
+11. ~~Deliberate tagging strategy.~~ DECIDED (ADR-019) — folded into 3.7.
 4. **Split-horizon FQDN** — adopt split-horizon FQDN with or without nyumbani?


@@ -0,0 +1,111 @@
# ADR-019 — Tagging standard for targeted, predictable runs
## Status
Accepted (2026-06-06). Resolves TODO 3.7 ("Define a tagging standard that lets us
target runs without over-tagging") and TODO 3.11 ("Deliberate tagging strategy").
## Context
boma wants to run playbooks **targeted** — a single service, a single layer, or a
single cross-cutting concern — **transparently and predictably**: a reader should
know from a `--tags` invocation exactly what it will and won't touch. CLAUDE.md
already requires tag-filterable tasks, but no vocabulary or convention existed, and
the TODO explicitly warns against the opposite failure mode: **over-tagging**.
## Decision
### Two-tier tagging
**Tier 1 — role/service tag (mechanical).** The tag equals the role name, applied
once at the role-import level:
```yaml
roles:
  - role: photoprism
    tags: [photoprism]
```
Ansible propagates it to every task in the role. Because one service = one role
(ADR-004), this single rule covers both the *layer/role* and *single-service*
targeting axes with zero per-task burden. Role-less lifecycle playbooks
(e.g. `bootstrap.yml`) carry a single playbook-identity tag instead.
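For a role-less playbook, the identity tag can sit on the play itself, since a play-level `tags:` keyword is inherited by every task in that play. The sketch below is illustrative only: the tag name `bootstrap`, the host pattern, and the task are assumptions, not the contents of boma's real `bootstrap.yml`.

```yaml
# playbooks/bootstrap.yml (hypothetical contents)
- name: Bootstrap a freshly provisioned host
  hosts: all
  tags: [bootstrap]          # assumed playbook-identity tag; applies to every task below
  tasks:
    - name: Confirm the host is reachable
      ansible.builtin.ping:
```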
**Tier 2 — concern tag (curated).** A small **closed list** of cross-cutting concern
tags, applied per-task/block **only where a task genuinely belongs to that concern**.
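As a minimal sketch of Tier 2, a role's task file might tag only the block that belongs to the `firewall` concern and leave everything else untagged. The role content, file paths, and handler topic below are hypothetical and are not taken from boma's roles.

```yaml
# roles/photoprism/tasks/main.yml (hypothetical role content)
- name: Open the ports this service needs
  tags: [firewall]                             # concern tag from the closed list
  block:
    - name: Render the service's nftables rules
      ansible.builtin.template:
        src: photoprism.nft.j2                 # hypothetical template name
        dest: /etc/nftables.d/photoprism.nft   # hypothetical path
        owner: root
        group: root
        mode: "0644"
      notify: reload nftables                  # hypothetical `listen:` topic

- name: Create the service data directory
  ansible.builtin.file:
    path: /srv/photoprism                      # hypothetical path
    state: directory
    mode: "0750"
  # No concern tag here: the task belongs to no cross-cutting concern, so it
  # carries only the role-name tag inherited from the play-level import.
```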
### The closed concern list
A concern earns a tag only if it (a) appears in 2+ roles, (b) is worth running as a
slice on its own, and (c) doesn't overlap confusingly with another.
| Tag | Covers |
|-----|--------|
| `packages` | apt package install/management |
| `users` | accounts, groups, sudo |
| `firewall` | nftables rulesets & port definitions (ADR-002) |
| `hardening` | security baseline — sshd config, fail2ban, auditd, sysctl |
| `logging` | Alloy / log-shipping config (ADR-018) |
| `monitoring` | metric exporters / health checks |
| `config` | render templated config/compose files to disk — **no restart** |
| `deploy` | bring services up / restart (`compose up -d`) |
| `proxy` | reverse-proxy + TLS registration (Traefik routes, Authentik) |
The `config`/`deploy` split lets you re-render and diff configuration (`--tags
config`) without bouncing services, then restart deliberately (`--tags deploy`).
`backup` and `secrets` are intentionally omitted until the roles needing them exist.
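The `config`/`deploy` split could look roughly like this inside a role. This is a sketch under assumptions: the paths and template name are invented, and the plain `docker compose` command stands in for whatever mechanism the real roles use.

```yaml
- name: Render the compose file
  ansible.builtin.template:
    src: compose.yml.j2                  # hypothetical template name
    dest: /opt/photoprism/compose.yml    # hypothetical path
    mode: "0640"
  tags: [config]                         # writes files only; nothing is restarted

- name: Bring the stack up
  ansible.builtin.command:
    cmd: docker compose up -d
    chdir: /opt/photoprism               # hypothetical path
  changed_when: true                     # simplification: always reports "changed"
  tags: [deploy]
```

With tasks tagged this way, `ansible-playbook site.yml --tags config --check --diff` (where `site.yml` is a placeholder playbook name) would preview rendered changes without selecting the restart task.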
### `always` / `never`
- **`always`** — reserved for cheap preflight assertions (vault unlocked, OS is
Debian 13, required vars present), so even `--tags config` runs its safety guards.
- **`never`** — reserved for destructive/expensive opt-in tasks, each paired with a
descriptive tag (e.g. `tags: [never, force_pull]`); they run only when named.
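The two special tags might be used as follows. The assertion mirrors the "OS is Debian 13" guard mentioned above; the pull command and its path are assumptions chosen for illustration, and `force_pull` would also have to be listed in `tests/tags.yml` to pass the lint.

```yaml
- name: Assert the host runs Debian 13
  ansible.builtin.assert:
    that:
      - ansible_facts['distribution'] == 'Debian'
      - ansible_facts['distribution_major_version'] == '13'
    fail_msg: "This playbook supports Debian 13 only"
  tags: [always]              # runs under any --tags selection

- name: Re-pull all images for the stack
  ansible.builtin.command:
    cmd: docker compose pull
    chdir: /opt/photoprism    # hypothetical path
  changed_when: true          # simplification: always reports "changed"
  tags: [never, force_pull]   # skipped unless --tags force_pull is given
```

The assertion relies on gathered facts; Ansible's implicit fact-gathering task is itself tagged `always`, so the facts are available even in a tag-filtered run.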
### Predictability principle: tags are union-only
`--tags a,b` runs tasks tagged a **OR** b — Ansible has no native AND. boma therefore
targets **one axis at a time**: either a role/service *or* a concern, never an
intersection like "photoprism's firewall only." If that's ever needed, just run
`--tags photoprism` (idempotent and fast). Designing for intersection is the
over-tagging trap; we decline it on purpose.
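The union behaviour can be checked with a small standalone play. In this sketch the role-name tags are written directly on each task so that the file is self-contained; in boma they would be inherited from the play-level role import instead. The role name `base` is hypothetical.

```yaml
# union-demo.yml (illustrative only)
- name: Show that --tags selects a union, never an intersection
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Task 1 (photoprism role, firewall concern)
      ansible.builtin.debug:
        msg: "photoprism firewall rules"
      tags: [photoprism, firewall]

    - name: Task 2 (photoprism role, config concern)
      ansible.builtin.debug:
        msg: "photoprism config files"
      tags: [photoprism, config]

    - name: Task 3 (base role, firewall concern)
      ansible.builtin.debug:
        msg: "base firewall rules"
      tags: [base, firewall]

# --tags firewall             selects tasks 1 and 3
# --tags photoprism           selects tasks 1 and 2
# --tags photoprism,firewall  selects tasks 1, 2 and 3 (the union), not task 1 alone
```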
### Terraform / Proxmox VM tags (metadata only)
Every Terraform-managed VM carries exactly three Proxmox tags:
| Tag | Value | Purpose |
|-----|-------|---------|
| env | `staging` \| `production` | which environment |
| role/group | `docker_hosts`, `proxmox_hosts`, … | matches the inventory group |
| managed-by | `terraform` | distinguishes IaC VMs from hand-made ones |
These are **pure metadata for transparency** (glanceable in the Proxmox UI). They do
**not** drive run-targeting and do **not** feed inventory — `scripts/tf_to_inventory.py`
keeps building groups from the `group` output field, the single source of truth.
## Enforcement
`tests/tags.yml` is the single source of truth for the allowed concern/special/
opt-in/playbook tags. `scripts/check-tags.py` (run by `make lint`, covered by
`tests/test_check_tags.py`) scans `roles/` and `playbooks/` and fails on any tag
outside `{role directory names} ∪ {tests/tags.yml entries}`.
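The actual schema of `tests/tags.yml` is not shown in this ADR. Purely as an assumed layout, a file covering the four tag kinds named above might look like the sketch below; the key names and the `bootstrap` entry are guesses, while the concern and special tags are those defined in this document.

```yaml
# tests/tags.yml (assumed layout; the real key names may differ)
concern:
  - packages
  - users
  - firewall
  - hardening
  - logging
  - monitoring
  - config
  - deploy
  - proxy
special:
  - always
  - never
opt_in:
  - force_pull     # the opt-in example used in this ADR
playbook:
  - bootstrap      # hypothetical identity tag for bootstrap.yml
```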
## Extending the vocabulary
To add a concern tag: (1) add it to `tests/tags.yml`; (2) add a row to the concern
table above with a one-line justification showing it passes the litmus test
(appears in 2+ roles, worth running as its own slice, no confusing overlap with an
existing tag). That is the whole gate — lightweight, but it
leaves a paper trail.
## Consequences
- Targeted runs are predictable: only two kinds of tags exist, one of them mechanical.
- Over-tagging is structurally resisted (closed list + lint enforcement).
- Intersection targeting is unavailable by design.
- Authors must keep role tags = role names; the linter enforces it.
## Related
ADR-002 (security baseline / firewall), ADR-004 (one service = one role),
ADR-009 (TF↔Ansible handoff / inventory), ADR-018 (logging).