
Testing & Molecule gotchas

Durable, point-of-use knowledge for writing and running role tests (ADR-008). Migrated from docs/FRICTION.md by the 2026-06-10 kaizen review. Append here when a testing surprise is worth remembering past the session that hit it.

nftables / nft -c render checks

  • nft -c rejects iif "<name>" when the interface is absent. iif resolves to an interface index at load time, so the check fails in the Molecule container and would fail identically on any real host before the interface exists (e.g. wt0 before NetBird is up). Use iifname "<name>" (string match, no existence requirement, survives the interface coming and going) for any interface that may be absent.
  • The render-and-nft -c (no-apply) Molecule approach earns its keep — it caught the iif/iifname bug deterministically without touching the host kernel. Reuse this pattern (render template → static-check, never apply) for other config-rendering roles.
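The render-then-check pattern above can be sketched as two Ansible tasks. This is a minimal sketch, not the role's actual tasks: the path /tmp/nftables-render.nft and the inline ruleset are assumptions standing in for a real template render, and wt0 is the NetBird interface used as the example above.

```yaml
# Sketch only: path and rule contents are illustrative assumptions.
- name: Render a ruleset that matches the interface by name
  ansible.builtin.copy:
    dest: /tmp/nftables-render.nft
    mode: "0600"
    content: |
      table inet filter {
        chain input {
          type filter hook input priority 0; policy drop;
          # iifname is a string match, so wt0 does not need to exist yet
          iifname "wt0" accept
        }
      }

- name: Static-check the rendered ruleset without loading it
  ansible.builtin.command:
    cmd: nft -c -f /tmp/nftables-render.nft
  changed_when: false
```

A real role would most likely render with ansible.builtin.template instead of inline content; the check step stays the same either way.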

Molecule (community.docker)

  • Molecule's community.docker connection uses ansible_host as the container name (remote_addr). Setting ansible_host as data in a scenario's host_vars (e.g. to give a resolver a fake IP) breaks the connection → UNREACHABLE / "Failed to create temporary directory". Don't override ansible_host in Molecule; feed fixture IPs another way (keep fixtures to zone sources and unit-test IP resolution).
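One possible way to pass a fixture address without touching ansible_host is a dedicated variable, assuming the role under test can read its address from a variable of its own. The variable name fixture_resolver_ip, the image, and the instance name are hypothetical; the bullet above prefers keeping fixtures to zone sources, so treat this only as an illustration of what to avoid and what is safe.

```yaml
# molecule/default/molecule.yml (fragment); all names are illustrative.
driver:
  name: docker
platforms:
  - name: instance                      # the connection uses this container name
    image: docker.io/library/debian:12
provisioner:
  name: ansible
  inventory:
    host_vars:
      instance:
        # Do NOT set ansible_host here: it would replace the container name
        # that the community.docker connection relies on.
        fixture_resolver_ip: 192.0.2.10   # hypothetical variable (TEST-NET-1 address)
```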

Coverage blind spot: apply-only task paths

  • Apply-only task paths have no Level-1 coverage, so safety bugs hide there. Example: an nft auto-rollback snapshot used a bare nft list ruleset (no leading flush ruleset), so the revert was a silent no-op on first apply and errored on later ones — the whole safety net was dead. Molecule never runs the apply (gated off), so only adversarial review + an isolated-netns round-trip test caught it. → For apply/safety paths Molecule can't exercise, validate out-of-band (a throwaway --privileged container with its own netns) and treat a final adversarial review as mandatory, not optional.
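The snapshot fix described above can be sketched as follows. This is a minimal sketch under assumptions: the snapshot path and the nft_rollback_needed trigger are hypothetical, and the real role's rollback mechanism may differ. The point is the leading flush ruleset line, since loading a plain listing merges into whatever is currently loaded instead of replacing it.

```yaml
# Sketch only: path and trigger variable are illustrative assumptions.
- name: Snapshot the live ruleset with a leading flush
  ansible.builtin.shell:
    cmd: |
      set -eu
      {
        echo 'flush ruleset'
        nft list ruleset
      } > /var/tmp/nft-rollback.nft
  changed_when: false

- name: Revert to the snapshot
  ansible.builtin.command:
    cmd: nft -f /var/tmp/nft-rollback.nft
  when: nft_rollback_needed | default(false)   # hypothetical trigger
```

Because Molecule never runs this path, a round trip (snapshot, change, revert, compare) still has to be exercised out-of-band as the bullet above describes.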

Tags on dynamic include_tasks need apply: to reach the included tasks

  • A tag on a dynamic include_tasks selects the include statement, not its contents. Tagging include_tasks: x.yml with concern and running --tags concern runs nothing (ok=N changed=0) unless the included tasks are independently tagged. Use include_tasks: {file: x.yml, apply: {tags: [concern]}} to propagate the tag onto the included tasks — mandatory whenever a role uses tags to apply concern-subsets (roles/base/tasks/main.yml and roles/dev_env/tasks/main.yml are the references).
  • Molecule converges untagged, so it cannot catch this by default — the bug only shows under make deploy … TAGS=<concern> on a real host (first hit live on askari, M3). See the tag-isolation pattern below to catch it in Molecule instead.
  • Check-mode artifact: a service/handler for a not-yet-installed package fails in a first-run --check; guard with when: not ansible_check_mode.
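The two guards in this section can be sketched together. The file name x.yml and the tag concern come from the text above; the service name example is a placeholder.

```yaml
- name: Include the concern's tasks
  ansible.builtin.include_tasks:
    file: x.yml
    apply:
      tags: [concern]    # propagated onto every task inside x.yml
  tags: [concern]        # selects the include statement itself

- name: Ensure the service is running
  ansible.builtin.service:
    name: example        # placeholder service name
    state: started
  # Skip in check mode, where the package providing the service
  # may not have been installed yet.
  when: not ansible_check_mode
```

With both tags in place, --tags concern selects the include and, through apply, the tasks in x.yml.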

Testing concern-tag isolation in Molecule

  • To catch the tag-propagation bug above in Molecule, add a second converge play that applies one concern to a fresh target — include_role with apply: {tags: [config]} — plus a verify assertion that the concern's effect landed. Drive the real partial path with molecule converge -- --tags config.
  • Sequence matters: a partial-tag run on a fresh instance fails on cross-concern deps (a config task may need a binary the packages concern installs). The realistic test is full converge → partial --tags re-run (idempotent). Harness pre_tasks (e.g. test-user creation) must be tagged always, or --tags filters them out. (Pattern proven on dev_env, 2026-06-14.)
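A minimal sketch of the harness side of this pattern, assuming a default scenario for dev_env. The user name and the asserted file path are hypothetical, and the second converge play with include_role is not shown here; only the always-tagged pre_task and the verify assertion are.

```yaml
# molecule/default/converge.yml (sketch)
---
- name: Converge
  hosts: all
  pre_tasks:
    - name: Create the test user the role expects
      ansible.builtin.user:
        name: testuser                 # hypothetical
      tags: [always]                   # survives a --tags config re-run
  roles:
    - role: dev_env

# molecule/default/verify.yml (sketch)
---
- name: Verify
  hosts: all
  tasks:
    - name: Stat a file the config concern should have written
      ansible.builtin.stat:
        path: /home/testuser/.config/example.conf   # hypothetical path
      register: config_file

    - name: Assert the config concern landed
      ansible.builtin.assert:
        that:
          - config_file.stat.exists
```

The sequence from the text would then be molecule converge, followed by molecule converge -- --tags config, followed by molecule verify.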

API / templating roles: render-only tests miss the real call

  • For a role whose payload is "render data → external API call" (e.g. public_dns → Gandi LiveDNS), apply=false Molecule and data-only pytest exercise the data file, not the rendered module args. Corrupt-template and API-rejection bugs (item.values resolving to a dict method; Gandi rejecting RFC-7505 null-MX 0 .) therefore pass both checks, and code review as well. Only a real (or --check) call against the API surfaces them.
  • → Treat a check-mode run against the real API as a required gate for such roles, or build a render-only assertion that materializes and inspects the rendered module args.
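The render-only alternative could look roughly like the play below. It is a sketch under assumptions: public_dns_records and the argument keys are hypothetical, and the check is only meaningful if the set_fact expressions are kept identical to those in the real API task (ideally by sharing them). It cannot detect rejections on the API side, such as the null-MX case.

```yaml
- name: Render-only check of the arguments an API task would receive
  hosts: localhost
  gather_facts: false
  vars:
    # Hypothetical record data; a real role would load its own data file.
    public_dns_records:
      - name: "@"
        type: TXT
        values: ["v=spf1 -all"]
  tasks:
    - name: Materialize the arguments with the same expressions the API task uses
      ansible.builtin.set_fact:
        rendered_args: >-
          {{ rendered_args | default([])
             + [{'name': item['name'], 'type': item['type'], 'values': item['values']}] }}
      loop: "{{ public_dns_records }}"

    - name: Assert each rendered values field is a non-empty list
      ansible.builtin.assert:
        that:
          # item['values'] subscripts the dict; item.values would look up
          # the dict method first.
          - item['values'] is sequence
          - item['values'] is not string
          - item['values'] | length > 0
      loop: "{{ rendered_args }}"
```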