Compare commits
9 commits
9bdb3017bb...2e5a1e1e23

| Author | SHA1 | Date | |
|---|---|---|---|
| | 2e5a1e1e23 | | |
| | 24b5e9361e | | |
| | 9584cc2c76 | | |
| | 0b59107b33 | | |
| | a3ea2aceb2 | | |
| | b45118dac3 | | |
| | 24397fa280 | | |
| | 04bfc26422 | | |
| | 4ed9e9a8bf | | |
13 changed files with 1295 additions and 6 deletions
10
CLAUDE.md
@@ -51,7 +51,11 @@ Full design rationale: `docs/decisions/`
 ## Ansible conventions
 
 - **FQCN always**: `ansible.builtin.template`, never `template`
-- **Tags**: every task must have at least one tag; playbooks support `--tags` filtering
+- **Tags** (ADR-019): import each role with its role-name tag once at the play level
+  (Ansible inherits it to every task). Tag a task/block with a concern tag from the
+  approved list (`tests/tags.yml`) only where it genuinely belongs to that concern —
+  don't invent tags or tag for tagging's sake. Target one axis at a time (role/service
+  *or* concern; tags are union/OR, never intersected). `make lint` enforces the vocabulary.
 - **Handlers**: use `listen:` topic strings, not direct name references
 - **Variables**: `rolename__varname` double-underscore namespace for role defaults
 - **No inline vars in playbooks**: use `group_vars/` or `host_vars/` only
@@ -144,6 +148,9 @@ Single-contributor, trunk-based (no merge requests / approval gates):
 ## Terraform conventions
 
 - Terraform owns VM existence only — nothing inside a VM, and no DNS records
+- Every TF-managed VM carries three Proxmox tags — `<env>`, its inventory `group`, and
+  `managed-by=terraform` — as **metadata only** (ADR-019). They do not feed inventory
+  or run-targeting; `tf_to_inventory.py` still groups by the `group` output field.
 - Internal DNS is entirely Ansible (the `dns` role renders the zone from inventory)
 - OPNsense is entirely Ansible; do not reach for a Terraform OPNsense provider
 - Environments are separate directories (`staging/`, `production/`), not workspaces
@@ -215,6 +222,7 @@ Single-contributor, trunk-based (no merge requests / approval gates):
 | Update management | `docs/decisions/011-update-management.md` |
 | Hardware & capacity | `docs/decisions/012-hardware-capacity.md` |
 | Logging & log integrity | `docs/decisions/018-logging.md` |
+| Tagging & run-targeting | `docs/decisions/019-tagging.md` |
 | Adding a new role | `docs/runbooks/new-role.md` |
 | Adding a new host | `docs/runbooks/new-host.md` |
 | Rotating vault secrets | `docs/runbooks/rotate-secrets.md` |

1
Makefile
@@ -67,6 +67,7 @@ collections:
 lint:
 	$(VENV)/bin/yamllint .
 	$(LINT)
+	$(PYTHON) scripts/check-tags.py
 
 # ── Testing ───────────────────────────────────────────────────────────────────
 

@@ -112,6 +112,10 @@ _(DHCP, firewall, mDNS reflection live on OPNsense — Ansible-managed, not cont
 | Sanity / smoke | whoami + health checks | S | planned | Verification endpoints + "is it actually working" checks | ADR-011 / TODO 8.2 |
 | Service-UI verification | `/verify-service` skill | S | planned | Claude-driven exploratory Level 4 acceptance check of a deployed service's UI | Decided (ADR-017); running deferred on ubongo + playwright + Authentik |
 
+- **Targeted runs** (ADR-019): playbooks are sliced with `--tags` along two axes —
+  role/service (tag = role name) or a closed list of cross-cutting concerns
+  (`firewall`, `logging`, `config`, `deploy`, …); the vocabulary is lint-enforced.
+
 ---
 
 ## V4 completeness check

@@ -28,11 +28,13 @@
    (all logs) + off-site security subset on `askari` + Grafana on-cluster (not the
    whole stack on `askari`). Still to design/build: Prometheus + metric exporters,
    Uptime Kuma, and exactly which alerts live where.
-7. Define a tagging standard that lets us target runs without over-tagging.
+7. ~~Define a tagging standard that lets us target runs without over-tagging.~~
+   DECIDED (ADR-019): two-tier — role-name tags (auto, at play level) + a closed
+   9-tag concern list (`tests/tags.yml`); union-only targeting; enforced by `make lint`.
 8. Ensure the right things are backed up (incl. database dumps if we land on PBS).
 9. Decide: a central database server, or individual database services per app?
 10. Should we keep the custom base-container (Molecule test image) method for role testing, or revisit it as boma's testing approach matures (ADR-008)?
-11. Deliberate tagging strategy.
+11. ~~Deliberate tagging strategy.~~ DECIDED (ADR-019) — folded into 3.7.
 
 4. **Split-horizon FQDN** — adopt split-horizon FQDN with or without nyumbani?
 

112
docs/decisions/019-tagging.md
Normal file
@@ -0,0 +1,112 @@
# ADR-019 — Tagging standard for targeted, predictable runs

## Status

Accepted (2026-06-06). Resolves TODO 3.7 ("Define a tagging standard that lets us
target runs without over-tagging") and TODO 3.11 ("Deliberate tagging strategy").

## Context

boma wants to run playbooks **targeted** — a single service, a single layer, or a
single cross-cutting concern — **transparently and predictably**: a reader should
know from a `--tags` invocation exactly what it will and won't touch. CLAUDE.md
already requires tag-filterable tasks, but no vocabulary or convention existed, and
the TODO explicitly warns against the opposite failure mode: **over-tagging**.

## Decision

### Two-tier tagging

**Tier 1 — role/service tag (mechanical).** The tag equals the role name, applied
once at the role-import level:

```yaml
roles:
  - role: photoprism
    tags: [photoprism]
```

Ansible propagates it to every task in the role. Because one service = one role
(ADR-004), this single rule covers both the *layer/role* and *single-service*
targeting axes with zero per-task burden. Role-less lifecycle playbooks
(e.g. `bootstrap.yml`) carry a single playbook-identity tag instead.

**Tier 2 — concern tag (curated).** A small **closed list** of cross-cutting concern
tags, applied per-task/block **only where a task genuinely belongs to that concern**.

### The closed concern list

A concern earns a tag only if it (a) appears in 2+ roles, (b) is worth running as a
slice on its own, and (c) doesn't overlap confusingly with another.

| Tag | Covers |
|-----|--------|
| `packages` | apt package install/management |
| `users` | accounts, groups, sudo |
| `firewall` | nftables rulesets & port definitions (ADR-002) |
| `hardening` | security baseline — sshd config, fail2ban, auditd, sysctl |
| `logging` | Alloy / log-shipping config (ADR-018) |
| `monitoring` | metric exporters / health checks |
| `config` | render templated config/compose files to disk — **no restart** |
| `deploy` | bring services up / restart (`compose up -d`) |
| `proxy` | reverse-proxy + TLS registration (Traefik routes, Authentik) |

The `config`/`deploy` split lets you re-render and diff configuration (`--tags
config`) without bouncing services, then restart deliberately (`--tags deploy`).
`backup` and `secrets` are intentionally omitted until the roles needing them exist.

### `always` / `never`

- **`always`** — reserved for cheap preflight assertions (vault unlocked, OS is
  Debian 13, required vars present), so even `--tags config` runs its safety guards.
- **`never`** — reserved for destructive/expensive opt-in tasks, each paired with a
  descriptive tag (e.g. `tags: [never, force_pull]`); they run only when named.
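
A rough model of how the two special tags behave when a run passes `--tags`, offered as a sketch rather than a description of Ansible internals. The `runs()` helper is hypothetical (it exists neither in boma nor in Ansible), it ignores `--skip-tags`, and it only models runs where at least one tag was requested.

```python
def runs(task_tags, requested):
    """Simplified model: does a task run under `--tags <requested>`?

    Hypothetical helper for illustration only. It does not model
    --skip-tags, nor a run without any --tags at all.
    """
    tags, wanted = set(task_tags), set(requested)
    if "never" in tags:
        # Opt-in tasks run only when one of their tags is named explicitly.
        return bool(tags & wanted)
    if "always" in tags:
        # Preflight guards run whatever slice was requested.
        return True
    # Ordinary tasks run when they share a tag with the request.
    return bool(tags & wanted)


# A preflight guard still runs during a `--tags config` slice.
print(runs(["always"], ["config"]))
# A destructive task stays off unless its opt-in tag is named.
print(runs(["never", "force_pull"], ["config"]))
print(runs(["never", "force_pull"], ["force_pull"]))
```

Under this model the three calls print `True`, `False`, `True`, which matches the intent stated in the bullets: guards always run, destructive tasks run only when asked for by name.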

### Predictability principle: tags are union-only

`--tags a,b` runs tasks tagged a **OR** b — Ansible has no native AND. boma therefore
targets **one axis at a time**: either a role/service *or* a concern, never an
intersection like "photoprism's firewall only." If that's ever needed, just run
`--tags photoprism` (idempotent and fast). Designing for intersection is the
over-tagging trap; we decline it on purpose.
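
To make the union semantics concrete, here is a small sketch. The task names and their tag sets are invented for illustration, and `select()` is a hypothetical helper, not an Ansible API.

```python
# Hypothetical tasks and the tags they would carry: a role tag inherited
# from the play-level import, plus an optional concern tag.
tasks = {
    "photoprism: render compose file": {"photoprism", "config"},
    "photoprism: open port": {"photoprism", "firewall"},
    "base: nftables ruleset": {"base", "firewall"},
    "base: install packages": {"base", "packages"},
}


def select(requested):
    """Return the tasks that share at least one tag with the request (OR)."""
    wanted = set(requested)
    return sorted(name for name, tags in tasks.items() if tags & wanted)


# Asking for both tags widens the run; it does not narrow it.
for name in select(["photoprism", "firewall"]):
    print(name)
```

The request selects three tasks, including `base: nftables ruleset`, rather than the single "photoprism's firewall" task an intersection would give. That is why the ADR recommends naming one axis per run.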

### Terraform / Proxmox VM tags (metadata only)

Every Terraform-managed VM carries exactly three Proxmox tags:

| Tag | Value | Purpose |
|-----|-------|---------|
| env | `staging` \| `production` | which environment |
| role/group | `docker_hosts`, `proxmox_hosts`, … | matches the inventory group |
| managed-by | `terraform` | distinguishes IaC VMs from hand-made ones |

These are **pure metadata for transparency** (glanceable in the Proxmox UI). They do
**not** drive run-targeting and do **not** feed inventory — `scripts/tf_to_inventory.py`
keeps building groups from the `group` output field, the single source of truth.
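
The separation between tags and inventory can be pictured with a short sketch. Everything here is an assumption made for illustration: the real `scripts/tf_to_inventory.py` is not shown in this change, so the data shape, the host names, and the `build_groups()` helper are all hypothetical.

```python
from collections import defaultdict


def build_groups(vms):
    """Group hosts by their `group` field; Proxmox tags are never consulted.

    Hypothetical helper: the input shape is assumed, not taken from the
    real Terraform output.
    """
    groups = defaultdict(list)
    for host, attrs in vms.items():
        groups[attrs["group"]].append(host)
    return {group: sorted(hosts) for group, hosts in groups.items()}


# Assumed shape of per-VM data; host names are invented.
vms = {
    "docker01": {
        "group": "docker_hosts",
        "tags": ["staging", "docker_hosts", "managed-by=terraform"],
    },
    "pve01": {
        "group": "proxmox_hosts",
        "tags": ["staging", "proxmox_hosts", "managed-by=terraform"],
    },
}

print(build_groups(vms))
```

Because the grouping reads only `group`, editing or removing a tag in the Proxmox UI would leave the resulting inventory unchanged in this model.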

## Enforcement

`tests/tags.yml` is the single source of truth for the allowed concern/special/
opt-in/playbook tags. `scripts/check-tags.py` (run by `make lint`, covered by
`tests/test_check_tags.py`) scans `roles/` and `playbooks/` and fails on any tag
outside `{role directory names} ∪ {tests/tags.yml entries}`.
Molecule scenario files (`roles/*/molecule/**`) are excluded from the scan — they are test orchestration, not the production run-targeting surface this standard governs.
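
One way the Molecule exclusion could be expressed is sketched below. The `is_molecule_file()` helper is hypothetical and is not claimed to be how `scripts/check-tags.py` implements the rule; it only illustrates the `roles/*/molecule/**` pattern with `pathlib`.

```python
import pathlib


def is_molecule_file(path, roles_dir):
    """True when `path` lies under roles/<role>/molecule/.

    Hypothetical helper: both arguments are pathlib paths, and `path`
    must be located inside `roles_dir`.
    """
    parts = path.relative_to(roles_dir).parts
    # parts looks like (<role>, "molecule", ..., <file>) for scenario files.
    return len(parts) > 2 and parts[1] == "molecule"


roles = pathlib.PurePosixPath("roles")
for candidate in (
    "roles/base/tasks/main.yml",
    "roles/base/molecule/default/converge.yml",
):
    print(candidate, is_molecule_file(pathlib.PurePosixPath(candidate), roles))
```

A scanner could skip every path for which this predicate is true, so that tags used only in test scenarios never count as violations.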

## Extending the vocabulary

To add a concern tag: (1) add it to `tests/tags.yml`; (2) add a row to the concern
table above with a one-line justification showing it passes the litmus test
(cross-cutting, 2+ roles, distinct). That is the whole gate — lightweight, but it
leaves a paper trail.

## Consequences

- Targeted runs are predictable: only two kinds of tags exist, one of them mechanical.
- Over-tagging is structurally resisted (closed list + lint enforcement).
- Intersection targeting is unavailable by design.
- Authors must keep role tags = role names. The linter enforces the *vocabulary* (every tag must be a known role name or an approved tag); the role-tag-equals-role-name rule itself is a convention the linter does not separately check.
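
For readers wondering what a check of the role-tag-equals-role-name convention might look like, here is a sketch. It is not part of the linter described above; `mismatched_role_tags()` is a hypothetical helper that works on already-parsed playbook data.

```python
def mismatched_role_tags(plays):
    """Return names of roles imported without their own role-name tag.

    Hypothetical helper: `plays` is a parsed playbook (a list of play
    dicts), as produced by a YAML loader.
    """
    bad = []
    for play in plays:
        for entry in play.get("roles") or []:
            if isinstance(entry, str):
                # The bare `- base` form cannot carry any tags.
                bad.append(entry)
                continue
            name = entry.get("role") or entry.get("name")
            tags = entry.get("tags") or []
            if isinstance(tags, str):
                tags = [tags]
            if name not in tags:
                bad.append(name)
    return bad


plays = [
    {
        "hosts": "all",
        "roles": [
            {"role": "base", "tags": ["base"]},
            {"role": "docker_host", "tags": ["docker"]},
        ],
    }
]
print(mismatched_role_tags(plays))  # ['docker_host']
```

Note that a vocabulary check alone would accept `tags: [base]` on the `docker_host` import, since `base` is a valid role name; a pairing check like this one would be needed to catch that case.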

## Related

ADR-002 (security baseline / firewall), ADR-004 (one service = one role),
ADR-009 (TF↔Ansible handoff / inventory), ADR-018 (logging).

728
docs/superpowers/plans/2026-06-06-tagging-strategy.md
Normal file
@@ -0,0 +1,728 @@
# Ansible Tagging Standard Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Establish a two-tier Ansible tagging standard (role-name tags + a closed concern list) with machine-enforced vocabulary, plus a Proxmox VM metadata-tag convention, so playbook runs are targeted, transparent, and predictable.

**Architecture:** A single source-of-truth YAML (`tests/tags.yml`) lists the allowed concern/special/opt-in/playbook tags. A Python checker (`scripts/check-tags.py`) scans `roles/` and `playbooks/`, computes the allowed set as `{role dir names} ∪ {tags.yml entries}`, and fails `make lint` on any unknown tag. Terraform gets a documented three-tag VM convention (metadata only). The standard is recorded as ADR-019 and folded into CLAUDE.md.

**Tech Stack:** Python 3 (stdlib + PyYAML, already present via ansible-core), pytest (already in `requirements.txt`), Make, Terraform (HCL edit only — not `init`ed), Markdown docs.

---

## File structure

| File | Responsibility | Action |
|------|----------------|--------|
| `tests/tags.yml` | Single source of truth: allowed concern/special/opt-in/playbook tags | Create |
| `scripts/check-tags.py` | Scan `roles/`+`playbooks/`, fail on tags outside the allowed set | Create |
| `tests/test_check_tags.py` | Unit tests for the checker (mirrors `tests/test_capacity_scan.py`) | Create |
| `Makefile` | Wire `check-tags.py` into the `lint` target | Modify |
| `playbooks/site.yml` | Fix `docker_host` role tag (`docker` → `docker_host`) | Modify |
| `docs/decisions/019-tagging.md` | The ADR (the standard itself) | Create |
| `CLAUDE.md` | Reword tag rule; add Proxmox tag convention; add ADR-019 to Further reading | Modify |
| `terraform/environments/staging/main.tf` | Add `managed-by=terraform` tag | Modify |
| `terraform/environments/production/main.tf` | Add `managed-by=terraform` tag | Modify |
| `docs/TODO.md` | Mark 3.7 and 3.11 DECIDED | Modify |
| `docs/CAPABILITIES.md` | Note targeted runs as a capability | Modify |

Notes for the implementer:
- The repo venv is `.venv`. Run Python as `.venv/bin/python` (Makefile vars: `PYTHON := .venv/bin/python`). If `.venv` is missing, run `make setup` first.
- PyYAML is available in the venv (ansible-core depends on it) — `import yaml` works.
- Terraform is **not** `init`ed in this repo, so `terraform validate`/`plan` will fail offline. Only use `terraform fmt` (offline-safe) for the HCL tasks.
- Before any `git commit`, the pre-commit hook decrypts `vault.yml`, so the vault agent must be unlocked: run `rbw unlocked` (exit 0 = good). If locked, ask the user to `rbw unlock` and wait. None of these tasks touch vault files, but the hook still runs.

---

### Task 1: Tag vocabulary file (`tests/tags.yml`)

**Files:**
- Create: `tests/tags.yml`

- [ ] **Step 1: Create the vocabulary file**

Create `tests/tags.yml` with exactly this content:

```yaml
---
# Allowed Ansible tag vocabulary — single source of truth for scripts/check-tags.py.
# Authoritative reference & rationale: docs/decisions/019-tagging.md.
#
# The full allowed set the linter enforces is:
# {role directory names under roles/} ∪ everything listed below.
#
# To add a CONCERN tag: add it here AND add a row to the ADR-019 table with a
# one-line justification (cross-cutting, used in 2+ roles, distinct).

# Cross-cutting concern tags, applied per-task/block where a task belongs to the
# concern. Targeted one at a time (tags are union/OR, never intersected).
concerns:
  - packages    # apt package install/management
  - users       # accounts, groups, sudo
  - firewall    # nftables rulesets & port definitions (ADR-002)
  - hardening   # security baseline — sshd config, fail2ban, auditd, sysctl
  - logging     # Alloy / log-shipping config (ADR-018)
  - monitoring  # metric exporters / health checks
  - config      # render templated config/compose files to disk — no restart
  - deploy      # bring services up / restart (compose up -d)
  - proxy       # reverse-proxy + TLS registration (Traefik routes, Authentik)

# Ansible built-in special tags. Narrow use only:
# always — cheap preflight assertions (run regardless of --tags)
# never — destructive/expensive tasks, paired with an opt-in tag below
special:
  - always
  - never

# `never`-paired opt-in tags: destructive/expensive tasks that only run when
# named explicitly (e.g. `tags: [never, force_pull]`). Empty until a role adds one.
opt_ins: []

# Playbook-level identity tags for role-less lifecycle plays (e.g. bootstrap.yml).
playbooks:
  - bootstrap
```

- [ ] **Step 2: Verify it parses and has the expected shape**

Run:
```bash
.venv/bin/python -c "import yaml; d=yaml.safe_load(open('tests/tags.yml')); assert len(d['concerns'])==9, d['concerns']; assert d['special']==['always','never']; assert d['opt_ins']==[]; assert d['playbooks']==['bootstrap']; print('tags.yml OK')"
```
Expected: prints `tags.yml OK` and exits 0.

- [ ] **Step 3: Commit**

```bash
git add tests/tags.yml
git commit -m "feat(tags): add allowed-tag vocabulary (tests/tags.yml)"
```

---

### Task 2: Checker core — tag collection & allowed-set helpers

**Files:**
- Create: `scripts/check-tags.py`
- Test: `tests/test_check_tags.py`

- [ ] **Step 1: Write the failing tests**

Create `tests/test_check_tags.py`:

```python
import importlib.util
import pathlib

_PATH = pathlib.Path(__file__).resolve().parent.parent / "scripts" / "check-tags.py"
_spec = importlib.util.spec_from_file_location("check_tags", _PATH)
ct = importlib.util.module_from_spec(_spec)
_spec.loader.exec_module(ct)


def test_collect_tags_list_form():
    node = {"name": "t", "tags": ["firewall", "users"]}
    assert ct.collect_tags(node) == {"firewall", "users"}


def test_collect_tags_string_form():
    node = {"name": "t", "tags": "always"}
    assert ct.collect_tags(node) == {"always"}


def test_collect_tags_nested_blocks_and_roles():
    doc = [
        {"hosts": "all", "roles": [{"role": "base", "tags": ["base"]}]},
        {"block": [{"name": "x", "tags": ["config"]}], "tags": ["deploy"]},
    ]
    assert ct.collect_tags(doc) == {"base", "config", "deploy"}


def test_collect_tags_ignores_templated_values():
    node = {"tags": ["{{ dynamic }}", "logging"]}
    assert ct.collect_tags(node) == {"logging"}


def test_load_vocab_unions_all_categories():
    vocab = ct.load_vocab()
    assert "firewall" in vocab  # concern
    assert "always" in vocab  # special
    assert "bootstrap" in vocab  # playbook identity
    assert len([c for c in vocab]) >= 12


def test_role_names_reads_role_dirs():
    names = ct.role_names()
    assert "base" in names
    assert "docker_host" in names
```

- [ ] **Step 2: Run tests to verify they fail**

Run: `.venv/bin/python -m pytest tests/test_check_tags.py -v`
Expected: FAIL — `ModuleNotFoundError` / file not found for `scripts/check-tags.py` (the module can't be imported yet).

- [ ] **Step 3: Write the minimal implementation**

Create `scripts/check-tags.py`:

```python
#!/usr/bin/env python3
"""
Validate that every Ansible tag used under roles/ and playbooks/ belongs to the
approved vocabulary. Single source of truth: tests/tags.yml. Rationale: ADR-019.

Allowed set = {role directory names under roles/} ∪ {concerns, special, opt_ins,
playbooks from tests/tags.yml}. Templated tags (containing "{{") are skipped —
they can't be statically validated.

Usage: python3 scripts/check-tags.py
Exit 0 = all tags allowed; exit 1 = unknown tag(s) found.
"""
import pathlib
import sys

import yaml

REPO = pathlib.Path(__file__).resolve().parent.parent
VOCAB_FILE = REPO / "tests" / "tags.yml"
SCAN_DIRS = ("roles", "playbooks")


class _IgnoreUnknownTags(yaml.SafeLoader):
    """SafeLoader that tolerates custom YAML tags (e.g. !vault) instead of crashing."""


def _ignore(loader, tag_suffix, node):
    return None


_IgnoreUnknownTags.add_multi_constructor("", _ignore)
_IgnoreUnknownTags.add_multi_constructor("!", _ignore)


def _static_str(value):
    return isinstance(value, str) and "{{" not in value


def load_vocab(path=VOCAB_FILE):
    data = yaml.safe_load(path.read_text()) or {}
    vocab = set()
    for key in ("concerns", "special", "opt_ins", "playbooks"):
        vocab.update(data.get(key) or [])
    return vocab


def role_names(repo=REPO):
    roles_dir = repo / "roles"
    if not roles_dir.is_dir():
        return set()
    return {p.name for p in roles_dir.iterdir() if p.is_dir()}


def collect_tags(node):
    """Recursively collect every static tag string under any 'tags:' key."""
    tags = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "tags":
                if _static_str(value):
                    tags.add(value)
                elif isinstance(value, list):
                    tags.update(t for t in value if _static_str(t))
            tags |= collect_tags(value)
    elif isinstance(node, list):
        for item in node:
            tags |= collect_tags(item)
    return tags


if __name__ == "__main__":  # pragma: no cover
    sys.exit(0)
```

- [ ] **Step 4: Run tests to verify they pass**

Run: `.venv/bin/python -m pytest tests/test_check_tags.py -v`
Expected: PASS (all 6 tests).

- [ ] **Step 5: Commit**

```bash
git add scripts/check-tags.py tests/test_check_tags.py
git commit -m "feat(tags): checker helpers — tag collection & allowed-set"
```

---

### Task 3: Checker validation — scan files and fail on unknown tags

**Files:**
- Modify: `scripts/check-tags.py`
- Test: `tests/test_check_tags.py`

- [ ] **Step 1: Write the failing tests**

Append to `tests/test_check_tags.py`:

```python
def test_scan_text_collects_from_yaml_string():
    text = """
- hosts: all
  roles:
    - role: base
      tags: [base]
  tasks:
    - name: open port
      tags: [firewall]
"""
    assert ct.scan_text(text) == {"base", "firewall"}


def test_scan_text_tolerates_custom_yaml_tags():
    text = "- name: t\n  secret: !vault xxx\n  tags: [users]\n"
    assert ct.scan_text(text) == {"users"}


def test_find_violations_flags_unknown_tag():
    allowed = {"base", "firewall"}
    used = {"base", "frewall"}  # typo
    assert ct.find_violations(used, allowed) == ["frewall"]


def test_find_violations_empty_when_all_allowed():
    assert ct.find_violations({"base", "firewall"}, {"base", "firewall"}) == []
```

- [ ] **Step 2: Run tests to verify they fail**

Run: `.venv/bin/python -m pytest tests/test_check_tags.py -v`
Expected: FAIL — `AttributeError: module 'check_tags' has no attribute 'scan_text'` (and `find_violations`).

- [ ] **Step 3: Add the scanning + validation functions**

In `scripts/check-tags.py`, replace the final block:

```python
if __name__ == "__main__":  # pragma: no cover
    sys.exit(0)
```

with:

```python
def scan_text(text):
    """Collect static tags from a (possibly multi-document) YAML string."""
    found = set()
    for doc in yaml.load_all(text, Loader=_IgnoreUnknownTags):
        found |= collect_tags(doc)
    return found


def iter_yaml_files(repo=REPO, scan_dirs=SCAN_DIRS):
    for name in scan_dirs:
        base = repo / name
        if not base.is_dir():
            continue
        for ext in ("*.yml", "*.yaml"):
            yield from sorted(base.rglob(ext))


def find_violations(used, allowed):
    return sorted(used - allowed)


def main():
    allowed = load_vocab() | role_names()
    violations = []
    for path in iter_yaml_files():
        try:
            used = scan_text(path.read_text())
        except yaml.YAMLError as exc:
            print(f"warning: could not parse {path}: {exc}", file=sys.stderr)
            continue
        for tag in find_violations(used, allowed):
            violations.append((path.relative_to(REPO), tag))

    if violations:
        print(
            "error: Ansible tag(s) not in tests/tags.yml or role names "
            "(see docs/decisions/019-tagging.md):",
            file=sys.stderr,
        )
        for relpath, tag in violations:
            print(f"  {relpath}: '{tag}'", file=sys.stderr)
        print(f"\nallowed: {', '.join(sorted(allowed))}", file=sys.stderr)
        sys.exit(1)

    print(f"check-tags: OK ({len(allowed)} tags allowed across {len(SCAN_DIRS)} dirs)")


if __name__ == "__main__":
    main()
```

- [ ] **Step 4: Run tests to verify they pass**

Run: `.venv/bin/python -m pytest tests/test_check_tags.py -v`
Expected: PASS (all 10 tests).

- [ ] **Step 5: Commit**

```bash
git add scripts/check-tags.py tests/test_check_tags.py
git commit -m "feat(tags): scan roles/+playbooks/ and fail on unknown tags"
```

---

### Task 4: Reconcile existing tags & wire into `make lint`

**Files:**
- Modify: `playbooks/site.yml:18-19`
- Modify: `Makefile` (the `lint:` target)

- [ ] **Step 1: Run the checker against the current repo (expect one violation)**

Run: `.venv/bin/python scripts/check-tags.py`
Expected: FAIL (exit 1) reporting `playbooks/site.yml: 'docker'` — because the `docker_host` role is tagged `[docker]`, which is neither a role name nor a vocabulary tag. This confirms the checker works end-to-end.
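
The expected violation comes down to a set difference. The sketch below spells it out with literal values; the contents of `used_in_site_yml` are an assumption about what `playbooks/site.yml` contains before the fix, inferred from the step description rather than read from the file.

```python
# Role directory names named in this plan.
role_dirs = {"base", "docker_host"}

# Entries from tests/tags.yml (concerns, special, playbooks; opt_ins is empty).
vocab = {
    "packages", "users", "firewall", "hardening", "logging",
    "monitoring", "config", "deploy", "proxy",
    "always", "never",
    "bootstrap",
}

# Assumed tags in playbooks/site.yml before the fix (illustrative).
used_in_site_yml = {"base", "docker"}

allowed = role_dirs | vocab
print(sorted(used_in_site_yml - allowed))  # ['docker']
```

Renaming the tag to `docker_host` empties the difference, which is what the re-run in Step 3 is checking for.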

- [ ] **Step 2: Fix the role tag to equal the role name**

In `playbooks/site.yml`, change:

```yaml
- role: docker_host
  tags: [docker]
```

to:

```yaml
- role: docker_host
  tags: [docker_host]
```

- [ ] **Step 3: Re-run the checker (expect clean)**

Run: `.venv/bin/python scripts/check-tags.py`
Expected: PASS — prints `check-tags: OK (... tags allowed across 2 dirs)` and exits 0.
(Allowed set now includes role names `base`, `docker_host`; used tags are `base`, `docker_host`, `bootstrap` — all allowed.)

- [ ] **Step 4: Wire the checker into `make lint`**

In `Makefile`, change the `lint:` target from:

```makefile
lint:
	$(VENV)/bin/yamllint .
	$(LINT)
```

to:

```makefile
lint:
	$(VENV)/bin/yamllint .
	$(LINT)
	$(PYTHON) scripts/check-tags.py
```

- [ ] **Step 5: Run the full lint suite and the test suite**

Run: `make lint && .venv/bin/python -m pytest tests/test_check_tags.py -v`
Expected: yamllint passes, ansible-lint passes, `check-tags: OK`, and all pytest tests PASS.

- [ ] **Step 6: Commit**

```bash
git add playbooks/site.yml Makefile
git commit -m "feat(tags): enforce tag vocabulary in make lint; fix docker_host tag"
```

---

### Task 5: Terraform Proxmox VM tag convention

**Files:**
- Modify: `terraform/environments/staging/main.tf` (the `tags =` line in `module "vms"`)
- Modify: `terraform/environments/production/main.tf` (the `tags =` line in `module "vms"`)

- [ ] **Step 1: Add `managed-by=terraform` to the staging VM tags**

In `terraform/environments/staging/main.tf`, change:

```hcl
tags = ["staging", each.value.group]
```

to:

```hcl
tags = ["staging", each.value.group, "managed-by=terraform"]
```

- [ ] **Step 2: Add `managed-by=terraform` to the production VM tags**

In `terraform/environments/production/main.tf`, change:

```hcl
tags = ["production", each.value.group]
```

to:

```hcl
tags = ["production", each.value.group, "managed-by=terraform"]
```

- [ ] **Step 3: Format-check the HCL (offline-safe)**

Run: `terraform -chdir=terraform/environments/staging fmt && terraform -chdir=terraform/environments/production fmt`
Expected: either no output (already formatted) or the filename printed (reformatted). Exit 0.
(Do NOT run `terraform validate`/`plan` — Terraform is not `init`ed in this repo and they will fail offline.)

- [ ] **Step 4: Confirm the edits**

Run: `grep -n "managed-by=terraform" terraform/environments/staging/main.tf terraform/environments/production/main.tf`
Expected: one match in each file.

- [ ] **Step 5: Commit**

```bash
git add terraform/environments/staging/main.tf terraform/environments/production/main.tf
git commit -m "feat(tags): Proxmox VM metadata convention (managed-by=terraform)"
```

---

### Task 6: Documentation — ADR-019, CLAUDE.md, TODO, CAPABILITIES

**Files:**
- Create: `docs/decisions/019-tagging.md`
- Modify: `CLAUDE.md` (Ansible conventions; Terraform conventions; Further reading)
- Modify: `docs/TODO.md` (items 3.7 and 3.11)
- Modify: `docs/CAPABILITIES.md`

- [ ] **Step 1: Write the ADR**

Create `docs/decisions/019-tagging.md`:

````markdown
|
||||
# ADR-019 — Tagging standard for targeted, predictable runs
|
||||
|
||||
## Status
|
||||
|
||||
Accepted (2026-06-06). Resolves TODO 3.7 ("Define a tagging standard that lets us
|
||||
target runs without over-tagging") and TODO 3.11 ("Deliberate tagging strategy").
|
||||
|
||||
## Context
|
||||
|
||||
boma wants to run playbooks **targeted** — a single service, a single layer, or a
|
||||
single cross-cutting concern — **transparently and predictably**: a reader should
|
||||
know from a `--tags` invocation exactly what it will and won't touch. CLAUDE.md
|
||||
already requires tag-filterable tasks, but no vocabulary or convention existed, and
|
||||
the TODO explicitly warns against the opposite failure mode: **over-tagging**.
|
||||
|
||||
## Decision
|
||||
|
||||
### Two-tier tagging
|
||||
|
||||
**Tier 1 — role/service tag (mechanical).** The tag equals the role name, applied
|
||||
once at the role-import level:
|
||||
|
||||
```yaml
|
||||
roles:
|
||||
- role: photoprism
|
||||
tags: [photoprism]
|
||||
```
|
||||
|
||||
Ansible propagates it to every task in the role. Because one service = one role
|
||||
(ADR-004), this single rule covers both the *layer/role* and *single-service*
|
||||
targeting axes with zero per-task burden. Role-less lifecycle playbooks
|
||||
(e.g. `bootstrap.yml`) carry a single playbook-identity tag instead.
|
||||
|
||||
**Tier 2 — concern tag (curated).** A small **closed list** of cross-cutting concern
|
||||
tags, applied per-task/block **only where a task genuinely belongs to that concern**.
|
||||
|
||||
### The closed concern list

A concern earns a tag only if it (a) appears in 2+ roles, (b) is worth running as a
slice on its own, and (c) doesn't overlap confusingly with another.

| Tag | Covers |
|-----|--------|
| `packages` | apt package install/management |
| `users` | accounts, groups, sudo |
| `firewall` | nftables rulesets & port definitions (ADR-002) |
| `hardening` | security baseline — sshd config, fail2ban, auditd, sysctl |
| `logging` | Alloy / log-shipping config (ADR-018) |
| `monitoring` | metric exporters / health checks |
| `config` | render templated config/compose files to disk — **no restart** |
| `deploy` | bring services up / restart (`compose up -d`) |
| `proxy` | reverse-proxy + TLS registration (Traefik routes, Authentik) |

The `config`/`deploy` split lets you re-render and diff configuration (`--tags
config`) without bouncing services, then restart deliberately (`--tags deploy`).
`backup` and `secrets` are intentionally omitted until the roles needing them exist.

### `always` / `never`

- **`always`** — reserved for cheap preflight assertions (vault unlocked, OS is
Debian 13, required vars present), so even `--tags config` runs its safety guards.
- **`never`** — reserved for destructive/expensive opt-in tasks, each paired with a
descriptive tag (e.g. `tags: [never, force_pull]`); they run only when named.

### Predictability principle: tags are union-only

`--tags a,b` runs tasks tagged a **OR** b — Ansible has no native AND. boma therefore
targets **one axis at a time**: either a role/service *or* a concern, never an
intersection like "photoprism's firewall only." If that's ever needed, just run
`--tags photoprism` (idempotent and fast). Designing for intersection is the
over-tagging trap; we decline it on purpose.

### Terraform / Proxmox VM tags (metadata only)

Every Terraform-managed VM carries exactly three Proxmox tags:

| Tag | Value | Purpose |
|-----|-------|---------|
| env | `staging` \| `production` | which environment |
| role/group | `docker_hosts`, `proxmox_hosts`, … | matches the inventory group |
| managed-by | `terraform` | distinguishes IaC VMs from hand-made ones |

These are **pure metadata for transparency** (glanceable in the Proxmox UI). They do
**not** drive run-targeting and do **not** feed inventory — `scripts/tf_to_inventory.py`
keeps building groups from the `group` output field, the single source of truth.

## Enforcement

`tests/tags.yml` is the single source of truth for the allowed concern/special/
opt-in/playbook tags. `scripts/check-tags.py` (run by `make lint`, covered by
`tests/test_check_tags.py`) scans `roles/` and `playbooks/` and fails on any tag
outside `{role directory names} ∪ {tests/tags.yml entries}`.

## Extending the vocabulary

To add a concern tag: (1) add it to `tests/tags.yml`; (2) add a row to the concern
table above with a one-line justification showing it passes the litmus test
(cross-cutting, 2+ roles, distinct). That is the whole gate — lightweight, but it
leaves a paper trail.

## Consequences

- Targeted runs are predictable: only two kinds of tags exist, one of them mechanical.
- Over-tagging is structurally resisted (closed list + lint enforcement).
- Intersection targeting is unavailable by design.
- Authors must keep role tags = role names; the linter enforces it.

## Related

ADR-002 (security baseline / firewall), ADR-004 (one service = one role),
ADR-009 (TF↔Ansible handoff / inventory), ADR-018 (logging).
````

- [ ] **Step 2: Reword the tag rule in CLAUDE.md**

In `CLAUDE.md`, under **Ansible conventions**, change:

```markdown
- **Tags**: every task must have at least one tag; playbooks support `--tags` filtering
```

to:

```markdown
- **Tags** (ADR-019): import each role with its role-name tag once at the play level
(Ansible inherits it to every task). Tag a task/block with a concern tag from the
approved list (`tests/tags.yml`) only where it genuinely belongs to that concern —
don't invent tags or tag for tagging's sake. Target one axis at a time (role/service
*or* concern; tags are union/OR, never intersected). `make lint` enforces the vocabulary.
```

- [ ] **Step 3: Add the Proxmox tag convention to CLAUDE.md**

In `CLAUDE.md`, under **Terraform conventions**, add this bullet after the existing
"Terraform owns VM existence only" bullet:

```markdown
- Every TF-managed VM carries three Proxmox tags — `<env>`, its inventory `group`, and
`managed-by=terraform` — as **metadata only** (ADR-019). They do not feed inventory
or run-targeting; `tf_to_inventory.py` still groups by the `group` output field.
```

- [ ] **Step 4: Add ADR-019 to the Further reading table**

In `CLAUDE.md`, in the **Further reading** table, add this row immediately after the
`Logging & log integrity` row:

```markdown
| Tagging & run-targeting | `docs/decisions/019-tagging.md` |
```

- [ ] **Step 5: Mark the TODO items decided**

In `docs/TODO.md`, change the line for item 3.7:

```markdown
7. Define a tagging standard that lets us target runs without over-tagging.
```

to:

```markdown
7. ~~Define a tagging standard that lets us target runs without over-tagging.~~
DECIDED (ADR-019): two-tier — role-name tags (auto, at play level) + a closed
9-tag concern list (`tests/tags.yml`); union-only targeting; enforced by `make lint`.
```

and change the line for item 3.11:

```markdown
11. Deliberate tagging strategy.
```

to:

```markdown
11. ~~Deliberate tagging strategy.~~ DECIDED (ADR-019) — folded into 3.7.
```

- [ ] **Step 6: Note the capability in CAPABILITIES.md**

Run: `grep -n "^## \|^### " docs/CAPABILITIES.md` to locate the section covering
operations / CI / how playbooks are run. Add this bullet under the most appropriate
existing section (operations or testing/CI):

```markdown
- **Targeted runs** (ADR-019): playbooks are sliced with `--tags` along two axes —
role/service (tag = role name) or a closed list of cross-cutting concerns
(`firewall`, `logging`, `config`, `deploy`, …); the vocabulary is lint-enforced.
```

- [ ] **Step 7: Verify docs are consistent and lint still passes**

Run:
```bash
grep -n "019-tagging" CLAUDE.md && grep -c "managed-by=terraform" CLAUDE.md && make lint
```
Expected: the ADR-019 row is found in CLAUDE.md, `managed-by=terraform` appears at
least once, and `make lint` passes (including `check-tags: OK`).

- [ ] **Step 8: Commit**

```bash
git add docs/decisions/019-tagging.md CLAUDE.md docs/TODO.md docs/CAPABILITIES.md
git commit -m "docs(tags): ADR-019 + CLAUDE.md/TODO/CAPABILITIES (tagging standard)"
```

---

## Final verification

- [ ] Run the full suite once more: `make lint && .venv/bin/python -m pytest tests/ -v`
Expected: yamllint + ansible-lint pass, `check-tags: OK`, all tests PASS.
- [ ] Confirm a deliberate violation is caught: temporarily add `tags: [bogus]` to a
task in `playbooks/site.yml`, run `.venv/bin/python scripts/check-tags.py`, confirm it
exits 1 reporting `'bogus'`, then revert the edit.
- [ ] `git log --oneline -7` shows the six task commits.

188
docs/superpowers/specs/2026-06-06-tagging-strategy-design.md
Normal file
@@ -0,0 +1,188 @@
# Design — Ansible tagging standard (targeted, predictable runs)

- **Date:** 2026-06-06
- **Status:** Approved design — pending implementation plan
- **Resolves:** TODO 3.7 ("Define a tagging standard that lets us target runs without
over-tagging") and TODO 3.11 ("Deliberate tagging strategy") — the same thread
- **Becomes:** ADR-019 (this design is the basis for that ADR)

---

## Problem

boma wants to run playbooks **targeted** — a single service, a single layer, or a
single cross-cutting concern — and to do so **transparently and predictably**: you
should be able to look at a `--tags` invocation and know exactly what it will and won't
touch. CLAUDE.md already mandates that every task be tag-filterable, but no *vocabulary*
or *naming convention* exists. Without one, tags proliferate ad-hoc per role and the
"predictable" property is lost — and the TODO explicitly warns against the opposite
failure mode, **over-tagging**.

The repo is effectively greenfield for this: `base` and `docker_host` are empty, and the
only tags in existence are `[base]`/`[docker]` in `site.yml` and `[bootstrap]` in
`bootstrap.yml`. So we can bake the standard into role-authoring conventions *before*
there are a dozen service roles to retrofit.

## Targeting axes (what we want to slice by)

1. **Layer / role** — `--tags base`, `--tags docker`
2. **Single service** — `--tags photoprism`, `--tags traefik`
3. **Concern / function** — `--tags firewall`, `--tags logging`, …

Lifecycle phases (bootstrap/config/deploy) are **not** a tag axis — `bootstrap.yml` vs
`site.yml` already separate those as whole playbooks.

Key simplification: because of ADR-004 (*one service = one role*, role name = service
name), axes 1 and 2 are the **same mechanism** — a tag equal to the role name. Only the
concern axis needs a curated vocabulary.

## Approach (chosen): two-tier tagging

**Tier 1 — role/service tag (mechanical).** The tag *equals the role name*, applied
**once** at the role-import level in the playbook:

```yaml
roles:
  - role: photoprism
    tags: [photoprism]
```

Ansible propagates the tag to every task in the role. This covers both the layer/role
and single-service axes with one rule and **zero per-task burden**.

**Tier 2 — concern tag (curated).** A small **closed, documented list** of cross-cutting
concern tags, applied per-task/block **only where a task genuinely belongs to that
concern**. `--tags firewall` then hits firewall tasks in `base` and in every service
role.

Rejected alternatives: *concern-only/flat* (loses natural `--tags <service>` ergonomics);
*rich multi-dimensional* (role+service+concern+lifecycle+ad-hoc per task) — that is
precisely the over-tagging the TODO warns against.

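
The inheritance that Tier 1 relies on can be pictured as a plain set union over the levels a task sits inside. The sketch below is an illustration only, not Ansible's code: `effective_tags` and the example values are hypothetical, and it assumes a static import (`roles:` or `import_role`), since a dynamic `include_role` applies its tags to the include itself rather than to the tasks inside.

```python
def effective_tags(*levels):
    """Union the tags declared at each enclosing level of a task.

    Each argument is the tag list declared at one level, outermost first
    (play, role import, block, task). Hypothetical helper for illustration.
    """
    result = set()
    for tags in levels:
        result |= set(tags)
    return result


# Tier 1: the role tag is written once, on the role import in the playbook.
role_import_tags = ["photoprism"]
# A block inside the role that declares no tags of its own.
block_tags = []
# Tier 2: one task that genuinely belongs to the firewall concern.
task_tags = ["firewall"]

untagged_task = effective_tags(role_import_tags, block_tags, [])
firewall_task = effective_tags(role_import_tags, block_tags, task_tags)
```

Under these assumptions every task in the role carries `photoprism` without any per-task tagging, and only the firewall task additionally carries `firewall`.
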
## The closed concern list

Litmus test for earning a spot: a concern must (a) appear in **2+ roles**, (b) be
something you'd realistically want to run as a slice on its own, and (c) not overlap
confusingly with another.

**Baseline concerns** (mostly in `base`, some echoed in service roles):

| Tag | Covers |
|-----|--------|
| `packages` | apt package install/management |
| `users` | accounts, groups, sudo |
| `firewall` | nftables rulesets & port definitions (ADR-002) |
| `hardening` | security baseline — sshd config, fail2ban, auditd, sysctl |
| `logging` | Alloy / log-shipping config (ADR-018) |
| `monitoring` | metric exporters / health checks |

**Service concerns** (in every service role, ADR-004):

| Tag | Covers |
|-----|--------|
| `config` | render templated config/compose files to disk — **no restart** |
| `deploy` | bring services up / restart (`compose up -d`) |
| `proxy` | reverse-proxy + TLS registration (Traefik routes, Authentik) |

Nine tags total. The `config`/`deploy` split is deliberate and high-value: `--tags
config` re-renders and lets you diff configuration without bouncing services; `--tags
deploy` does the restart.

`backup` and `secrets` are **intentionally omitted** until the roles that need them
exist — they enter via the extend process, not speculative reservation.

## `always` / `never` policy

boma uses Ansible's two built-in special tags, narrowly:

- **`always`** — reserved strictly for **cheap preflight assertions** (vault unlocked,
OS is Debian 13, required vars present). Ensures even `--tags config` runs its safety
guards.
- **`never`** — reserved for **destructive/expensive opt-in tasks**, each paired with a
descriptive tag (e.g. `never, force_pull` or `never, restore`). They never run unless
explicitly named, keeping dangerous actions out of normal runs. The descriptive
partner tag is a documented `never`-paired opt-in (allowed by the linter).

## Predictability principle: tags are union-only

`--tags a,b` runs tasks tagged a **OR** b — Ansible has no native AND. Rather than fight
this, we make it an explicit principle: **boma targets one axis at a time** — *either* a
role/service (`--tags photoprism`) *or* a concern (`--tags firewall`), never an
intersection like "photoprism's firewall only." If that is ever genuinely needed, the
answer is "just run `--tags photoprism`" (idempotent and fast). Designing for
intersection is the over-tagging trap; we decline it on purpose.

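
To make the union-only rule and the two special tags concrete, here is a small model of how a `--tags` selection picks tasks. It is a simplified approximation rather than Ansible's implementation: `selected`, `plan`, and the task names are hypothetical, `--skip-tags` is ignored, and each task's tag set is assumed to already include the inherited role tag.

```python
def selected(task_tags, requested=()):
    """Approximate whether one task runs for a given --tags selection."""
    task_tags, requested = set(task_tags), set(requested)
    if "always" in task_tags:
        # Preflight guards run for any selection.
        return True
    if not requested:
        # A plain run executes everything except opt-in `never` tasks.
        return "never" not in task_tags
    # Union (OR) semantics: any overlap is enough; there is no AND.
    return bool(task_tags & requested)


# Hypothetical tasks with their effective (inherited + own) tags.
TASKS = {
    "assert vault unlocked": {"always"},
    "photoprism: render compose file": {"photoprism", "config"},
    "photoprism: open port": {"photoprism", "firewall"},
    "base: nftables ruleset": {"base", "firewall"},
    "photoprism: force image pull": {"photoprism", "never", "force_pull"},
}


def plan(requested=()):
    """List the task names that would run, in declaration order."""
    return [name for name, tags in TASKS.items() if selected(tags, requested)]


firewall_run = plan(["firewall"])
both_run = plan(["photoprism", "firewall"])
plain_run = plan()
```

In this model `plan(["photoprism", "firewall"])` includes `base: nftables ruleset`, which shows why combining the two axes widens the run instead of narrowing it. It also includes the `never` task, because the inherited `photoprism` tag overlaps the request; if real Ansible behaves the same way here, which is worth confirming with `--list-tasks` before relying on it, a `never` task inside a role-tagged import is reachable through the role tag as well as its opt-in tag.
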
## Reconciling the existing CLAUDE.md rule

CLAUDE.md currently says *"every task must have at least one tag."* Under the two-tier
model the role tag is applied **once at the play/import level** and **inherited** by
every task, so tasks are always reachable without hand-tagging each one. The rule is
**reworded** to:

> Import each role with its role-name tag (once, at the play level). Within a role, tag a
> task/block with a concern tag from the approved list **only where it genuinely belongs
> to that concern** — don't invent tags or tag for tagging's sake.

This directly resolves the "without over-tagging" tension.

## Terraform / Proxmox VM tags (metadata only)

Formalize the convention that already half-exists in `staging/main.tf`
(`tags = ["staging", each.value.group]`). Every TF-managed VM gets exactly three tags:

| Tag | Value | Purpose |
|-----|-------|---------|
| env | `staging` \| `production` | which environment |
| role/group | `docker_hosts`, `proxmox_hosts`, … | matches the inventory group |
| managed-by | `terraform` | distinguishes IaC VMs from hand-made ones |

Set as `tags = ["${env}", each.value.group, "managed-by=terraform"]` in the env
`main.tf` (env is constant per directory).

**Explicit non-goals** (stated so nobody wires them up later): these tags are **pure
metadata for transparency** — glanceable in the Proxmox UI. They do **not** drive
run-targeting and do **not** feed inventory. `scripts/tf_to_inventory.py` keeps building
groups from the `group` output field, which stays the single source of truth.

## Enforcement

A small **lint check wired into `make lint`**: a script collects every `tags:` value
across `roles/` and `playbooks/` and fails if any tag is not in the allowed set:

```
{role names} ∪ {9 concern tags} ∪ {always, never} ∪ {documented never-paired opt-ins}
```

The allowed concern list (and the `never`-paired opt-ins) live in **one
machine-readable file, `tests/tags.yml`**, which both the linter reads and the ADR
documents — so doc and enforcement cannot drift. This is more honest than ansible-lint's
limited built-in tags rule. A unit test (mirroring `tests/test_capacity_scan.py`) covers
the checker.

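
The allowed-set formula above amounts to a set union followed by a set difference. The following is a minimal sketch of that check under stated assumptions: the concern list is hard-coded here instead of being read from `tests/tags.yml`, the opt-in set is assumed empty, and `unknown_tags` is a hypothetical name rather than the checker's real interface.

```python
# The nine concern tags from the table in this design.
CONCERNS = {
    "packages", "users", "firewall", "hardening", "logging",
    "monitoring", "config", "deploy", "proxy",
}
# Ansible's two built-in special tags.
SPECIAL = {"always", "never"}
# Documented never-paired opt-ins; assumed empty for this sketch.
OPT_INS = set()


def unknown_tags(used, role_names):
    """Return the used tags that fall outside the allowed set, sorted."""
    allowed = set(role_names) | CONCERNS | SPECIAL | OPT_INS
    return sorted(set(used) - allowed)


roles = {"base", "docker_host"}
clean = unknown_tags({"base", "firewall", "always"}, roles)
typo = unknown_tags({"base", "frewall"}, roles)
```

A misspelled concern such as `frewall` is reported, while role names and listed concerns pass; a non-empty result is what would make the lint step fail.
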
## The "propose to extend" process

To add a concern tag: (1) add it to `tests/tags.yml`; (2) add a row to the ADR-019 table
with a one-line justification showing it passes the litmus test (cross-cutting, 2+
roles, distinct). That is the whole gate — lightweight, but it leaves a paper trail.

## Deliverables

- **New `docs/decisions/019-tagging.md`** — the standard: rationale, two-tier model,
concern table, union-only principle, `always`/`never` policy, Proxmox tag convention,
extend process.
- **`tests/tags.yml`** — machine-readable allowed concern list + `never`-paired opt-ins.
- **Lint checker script** (e.g. `scripts/check-tags.py`) + **`make lint`** wiring +
**`tests/test_check_tags.py`**.
- **CLAUDE.md** — reword the tag bullet under *Ansible conventions*; add the Proxmox tag
convention under *Terraform conventions*; add ADR-019 to *Further reading*.
- **`terraform/environments/{staging,production}/main.tf`** — apply the three-tag
convention.
- **`docs/TODO.md`** — mark 3.7 and 3.11 DECIDED (ADR-019).
- **`docs/CAPABILITIES.md`** — note targeted runs as a capability, if it fits.

## Out of scope

- Intersection targeting (role ∩ concern) — declined on purpose (see principle).
- Lifecycle-phase tags — handled by separate playbooks.
- Proxmox tags feeding inventory or run-targeting — metadata only.
- `backup`/`secrets` concern tags — added later via the extend process.

@@ -16,4 +16,4 @@
   become: true
   roles:
     - role: docker_host
-      tags: [docker]
+      tags: [docker_host]

124
scripts/check-tags.py
Normal file
@@ -0,0 +1,124 @@
#!/usr/bin/env python3
"""
Validate that every Ansible tag used under roles/ and playbooks/ belongs to the
approved vocabulary. Single source of truth: tests/tags.yml. Rationale: ADR-019.

Allowed set = {role directory names under roles/} ∪ {concerns, special, opt_ins,
playbooks from tests/tags.yml}. Templated tags (containing "{{") are skipped —
they can't be statically validated.

Usage: python3 scripts/check-tags.py
Exit 0 = all tags allowed; exit 1 = unknown tag(s) found.
"""
import pathlib
import sys

import yaml

REPO = pathlib.Path(__file__).resolve().parent.parent
VOCAB_FILE = REPO / "tests" / "tags.yml"
SCAN_DIRS = ("roles", "playbooks")


class _IgnoreUnknownTags(yaml.SafeLoader):
    """SafeLoader that tolerates custom YAML tags (e.g. !vault) instead of crashing."""


def _ignore(loader, tag_suffix, node):
    return None


_IgnoreUnknownTags.add_multi_constructor("", _ignore)


def _static_str(value):
    return isinstance(value, str) and "{{" not in value


def load_vocab(path=VOCAB_FILE):
    data = yaml.safe_load(path.read_text()) or {}
    vocab = set()
    for key in ("concerns", "special", "opt_ins", "playbooks"):
        vocab.update(data.get(key) or [])
    return vocab


def role_names(repo=REPO):
    roles_dir = repo / "roles"
    if not roles_dir.is_dir():
        return set()
    return {p.name for p in roles_dir.iterdir() if p.is_dir()}


def collect_tags(node):
    """Recursively collect every static tag string under any 'tags:' key."""
    # Matches any dict key literally named `tags`; Ansible-tag semantics assumed.
    tags = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "tags":
                if _static_str(value):
                    tags.add(value)
                elif isinstance(value, list):
                    tags.update(t for t in value if _static_str(t))
            tags |= collect_tags(value)
    elif isinstance(node, list):
        for item in node:
            tags |= collect_tags(item)
    return tags


def scan_text(text):
    """Collect static tags from a (possibly multi-document) YAML string."""
    found = set()
    for doc in yaml.load_all(text, Loader=_IgnoreUnknownTags):
        found |= collect_tags(doc)
    return found


def iter_yaml_files(repo=REPO, scan_dirs=SCAN_DIRS):
    for name in scan_dirs:
        base = repo / name
        if not base.is_dir():
            continue
        for ext in ("*.yml", "*.yaml"):
            for path in sorted(base.rglob(ext)):
                # Molecule scenarios are test orchestration, not the production
                # run-targeting surface this standard governs (ADR-019). Skip them.
                if "molecule" in path.relative_to(base).parts:
                    continue
                yield path


def find_violations(used, allowed):
    return sorted(used - allowed)


def main():
    allowed = load_vocab() | role_names()
    violations = []
    for path in iter_yaml_files():
        try:
            used = scan_text(path.read_text())
        except yaml.YAMLError as exc:
            print(f"warning: could not parse {path}: {exc}", file=sys.stderr)
            continue
        for tag in find_violations(used, allowed):
            violations.append((path.relative_to(REPO), tag))

    if violations:
        print(
            "error: Ansible tag(s) not in tests/tags.yml or role names "
            "(see docs/decisions/019-tagging.md):",
            file=sys.stderr,
        )
        for relpath, tag in violations:
            print(f" {relpath}: '{tag}'", file=sys.stderr)
        print(f"\nallowed: {', '.join(sorted(allowed))}", file=sys.stderr)
        sys.exit(1)

    print(f"check-tags: OK ({len(allowed)} tags allowed across {len(SCAN_DIRS)} dirs)")


if __name__ == "__main__":
    main()

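
The comment in `collect_tags` notes that any key literally named `tags` is treated as Ansible tags. The dependency-free restatement of that traversal below (no PyYAML, same matching rule) shows what follows from it: a module argument that happens to be called `tags` and holds a list is collected as well. The module name `example.cloud.instance` and the task content are hypothetical.

```python
def collect_tags(node):
    """Collect static strings under any key named 'tags', at any depth."""
    tags = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "tags":
                # Accept both `tags: foo` and `tags: [foo, bar]`.
                values = [value] if isinstance(value, str) else value
                if isinstance(values, list):
                    tags.update(
                        t for t in values if isinstance(t, str) and "{{" not in t
                    )
            # Recurse into every value, whatever its key.
            tags |= collect_tags(value)
    elif isinstance(node, list):
        for item in node:
            tags |= collect_tags(item)
    return tags


task = {
    "name": "create instance",
    "example.cloud.instance": {   # hypothetical module
        "name": "web",
        "tags": ["billing"],      # module argument, not an Ansible tag
    },
    "tags": ["deploy"],           # the actual Ansible tag
}
found = collect_tags(task)

# The mapping form of such an argument is not collected.
mapping_form = collect_tags({"example.cloud.instance": {"tags": {"team": "infra"}}})
```

Here `found` contains both `deploy` and `billing`, so a list-valued module argument named `tags` would be reported unless its values are in the vocabulary, whereas the mapping form passes through untouched.
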
@@ -35,7 +35,7 @@ module "vms" {
   ssh_public_keys = var.ssh_public_keys
   cores = each.value.cores
   memory_mb = each.value.memory_mb
-  tags = ["production", each.value.group]
+  tags = ["production", each.value.group, "managed-by=terraform"]
 }

 # Internal DNS records are NOT managed here. Terraform owns VM existence only;

@@ -29,7 +29,7 @@ module "vms" {
   ssh_public_keys = var.ssh_public_keys
   cores = each.value.cores
   memory_mb = each.value.memory_mb
-  tags = ["staging", each.value.group]
+  tags = ["staging", each.value.group, "managed-by=terraform"]
 }

 # Internal DNS records are NOT managed here. Terraform owns VM existence only;

37
tests/tags.yml
Normal file
@@ -0,0 +1,37 @@
---
# Allowed Ansible tag vocabulary — single source of truth for scripts/check-tags.py.
# Authoritative reference & rationale: docs/decisions/019-tagging.md.
#
# The full allowed set the linter enforces is:
# {role directory names under roles/} ∪ everything listed below.
#
# To add a CONCERN tag: add it here AND add a row to the ADR-019 table with a
# one-line justification (cross-cutting, used in 2+ roles, distinct).

# Cross-cutting concern tags, applied per-task/block where a task belongs to the
# concern. Targeted one at a time (tags are union/OR, never intersected).
concerns:
  - packages    # apt package install/management
  - users       # accounts, groups, sudo
  - firewall    # nftables rulesets & port definitions (ADR-002)
  - hardening   # security baseline — sshd config, fail2ban, auditd, sysctl
  - logging     # Alloy / log-shipping config (ADR-018)
  - monitoring  # metric exporters / health checks
  - config      # render templated config/compose files to disk — no restart
  - deploy      # bring services up / restart (compose up -d)
  - proxy       # reverse-proxy + TLS registration (Traefik routes, Authentik)

# Ansible built-in special tags. Narrow use only:
# always — cheap preflight assertions (run regardless of --tags)
# never — destructive/expensive tasks, paired with an opt-in tag below
special:
  - always
  - never

# `never`-paired opt-in tags: destructive/expensive tasks that only run when
# named explicitly (e.g. `tags: [never, force_pull]`). Empty until a role adds one.
opt_ins: []

# Playbook-level identity tags for role-less lifecycle plays (e.g. bootstrap.yml).
playbooks:
  - bootstrap
85
tests/test_check_tags.py
Normal file
@@ -0,0 +1,85 @@
import importlib.util
import pathlib

_PATH = pathlib.Path(__file__).resolve().parent.parent / "scripts" / "check-tags.py"
_spec = importlib.util.spec_from_file_location("check_tags", _PATH)
ct = importlib.util.module_from_spec(_spec)
_spec.loader.exec_module(ct)


def test_collect_tags_list_form():
    node = {"name": "t", "tags": ["firewall", "users"]}
    assert ct.collect_tags(node) == {"firewall", "users"}


def test_collect_tags_string_form():
    node = {"name": "t", "tags": "always"}
    assert ct.collect_tags(node) == {"always"}


def test_collect_tags_nested_blocks_and_roles():
    doc = [
        {"hosts": "all", "roles": [{"role": "base", "tags": ["base"]}]},
        {"block": [{"name": "x", "tags": ["config"]}], "tags": ["deploy"]},
    ]
    assert ct.collect_tags(doc) == {"base", "config", "deploy"}


def test_collect_tags_ignores_templated_values():
    node = {"tags": ["{{ dynamic }}", "logging"]}
    assert ct.collect_tags(node) == {"logging"}


def test_load_vocab_unions_all_categories():
    vocab = ct.load_vocab()
    assert "firewall" in vocab  # concern
    assert "always" in vocab  # special
    assert "bootstrap" in vocab  # playbook identity
    assert len(vocab) >= 10


def test_role_names_reads_role_dirs():
    names = ct.role_names()
    assert "base" in names
    assert "docker_host" in names


def test_scan_text_collects_from_yaml_string():
    text = """
- hosts: all
  roles:
    - role: base
      tags: [base]
  tasks:
    - name: open port
      tags: [firewall]
"""
    assert ct.scan_text(text) == {"base", "firewall"}


def test_scan_text_tolerates_custom_yaml_tags():
    text = "- name: t\n secret: !vault xxx\n tags: [users]\n"
    assert ct.scan_text(text) == {"users"}


def test_find_violations_flags_unknown_tag():
    allowed = {"base", "firewall"}
    used = {"base", "frewall"}  # typo
    assert ct.find_violations(used, allowed) == ["frewall"]


def test_find_violations_empty_when_all_allowed():
    assert ct.find_violations({"base", "firewall"}, {"base", "firewall"}) == []


def test_iter_yaml_files_skips_molecule(tmp_path):
    role = tmp_path / "roles" / "demo"
    (role / "tasks").mkdir(parents=True)
    (role / "tasks" / "main.yml").write_text("---\n")
    mol = role / "molecule" / "default"
    mol.mkdir(parents=True)
    (mol / "verify.yml").write_text("---\n")
    found = list(ct.iter_yaml_files(repo=tmp_path, scan_dirs=("roles",)))
    names = [p.name for p in found]
    assert "main.yml" in names
    assert "verify.yml" not in names