# Testing & Molecule gotchas
Durable, point-of-use knowledge for writing and running role tests (ADR-008).
Migrated from `docs/FRICTION.md` by the 2026-06-10 kaizen review. Append here when a
testing surprise is worth remembering past the session that hit it.
## nftables / `nft -c` render checks
- **`nft -c` rejects `iif "<name>"` when the interface is absent** — `iif` resolves to
an interface *index* at load time, so it fails in the Molecule container and would
fail identically on any real host before the interface exists (e.g. `wt0` before
NetBird is up). Use **`iifname "<name>"`** (string match, no existence requirement,
survives the interface coming and going) for any interface that may be absent.
- **The render-and-`nft -c` (no-apply) Molecule approach earns its keep** — it caught
the `iif`/`iifname` bug deterministically without touching the host kernel. Reuse
this pattern (render template → static-check, never apply) for other config-rendering
roles.
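
A minimal sketch of the render-then-check pattern as Ansible tasks. The `nftables.conf.j2` template name and the scratch path are hypothetical. `nft -c` (`--check`) validates the batch without committing it, so the live ruleset is never touched:

```yaml
---
# Sketch only: template name and scratch path are hypothetical.
# Inside the template, match interfaces that may be absent by name:
#   iifname "wt0" ...   (not: iif "wt0" ..., which needs the interface to exist)
- name: Render the nftables ruleset to a scratch path
  ansible.builtin.template:
    src: nftables.conf.j2
    dest: /tmp/nftables.rendered.conf
    mode: "0644"

- name: Statically check the rendered ruleset without applying it
  ansible.builtin.command: nft -c -f /tmp/nftables.rendered.conf
  changed_when: false
```

`ansible.builtin.template` also accepts `validate: nft -c -f %s`, which runs the same check against a temporary file before the result is moved into place.
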
## Molecule (`community.docker`)
- **Molecule's `community.docker` connection uses `ansible_host` as the container name**
(`remote_addr`). Setting `ansible_host` as *data* in a scenario's `host_vars` (e.g. to
give a resolver a fake IP) breaks the connection → `UNREACHABLE` / "Failed to create
temporary directory". Don't override `ansible_host` in Molecule; feed fixture IPs
another way (keep fixtures to zone sources and unit-test IP resolution).
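
A sketch of the collision in a scenario's `molecule.yml`, assuming the default platform name `instance`. Zone-source fixtures remain the preferred route. If a role really does need a host-level IP fixture, a differently named variable (here the hypothetical `resolver_fixture_ip`) leaves the connection plugin alone:

```yaml
# molecule/default/molecule.yml (excerpt; sketch)
provisioner:
  name: ansible
  inventory:
    host_vars:
      instance:                          # must match the platform name
        # ansible_host: 192.0.2.10       # DON'T: community.docker treats this as the container name
        resolver_fixture_ip: 192.0.2.10  # hypothetical role variable; connection is unaffected
```
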
## Coverage blind spot: apply-only task paths
- **Apply-only task paths have no Level-1 coverage**, so safety bugs hide there. Example:
an `nft` auto-rollback snapshot used a bare `nft list ruleset` (no leading
`flush ruleset`), so restoring it added to the live ruleset instead of replacing it: the
revert was a silent no-op on first apply and errored on later ones, and the whole safety
net was dead. Molecule never runs the apply (gated off), so
only adversarial review + an isolated-netns round-trip test caught it. → For
apply/safety paths Molecule can't exercise, validate out-of-band (a throwaway
`--privileged` container with its own netns) and treat a final adversarial review as
**mandatory, not optional**.
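
A minimal sketch of a snapshot that restores correctly. It assumes a hypothetical `nft_rollback_file` variable and leaves out how the revert is triggered (timer, `rescue:` block, and so on). Prefixing the listing with `flush ruleset` turns the file into a full replacement, and `nft -f` loads it as a single atomic transaction:

```yaml
---
# Sketch only: nft_rollback_file is a hypothetical variable.
- name: Snapshot the live ruleset as a self-replacing restore file
  ansible.builtin.shell: "{ echo 'flush ruleset'; nft list ruleset; } > {{ nft_rollback_file }}"
  changed_when: false

# Revert path: the leading `flush ruleset` clears whatever the failed apply left,
# then the old rules are reloaded, all in one transaction.
- name: Restore the snapshot
  ansible.builtin.command: nft -f {{ nft_rollback_file }}
```

The out-of-band check in a throwaway `--privileged` container should exercise this round trip: snapshot, apply, restore, then compare `nft list ruleset` against the original.
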
## Tags on dynamic `include_tasks` need `apply:` to reach the included tasks
- **A tag on a dynamic `include_tasks` selects the include statement, not its contents.**
Tagging `include_tasks: x.yml` with `concern` and running `--tags concern` runs
*nothing* (`ok=N changed=0`) unless the included tasks are independently tagged. Use
`include_tasks: {file: x.yml, apply: {tags: [concern]}}` (keeping `tags: [concern]` on the
include itself) to propagate the tag onto the
included tasks — **mandatory** whenever a role uses tags to apply concern-subsets
(`roles/base/tasks/main.yml` and `roles/dev_env/tasks/main.yml` are the references).
- **Molecule converges *untagged*, so it cannot catch this by default** — the bug only
shows under `make deploy … TAGS=<concern>` on a real host (first hit live on askari, M3).
See the tag-isolation pattern below to catch it in Molecule instead.
- **Check-mode artifact:** a `service`/handler for a not-yet-installed package fails in a
first-run `--check`, because check mode only simulates the install and the service does
not exist yet; guard it with `when: not ansible_check_mode`.
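
A sketch of both fixes. `config` stands in for a concern tag and `example` for a hypothetical service. Under Ansible's tag-inheritance rules, the tag must appear in two places. It goes on the include itself so that `--tags config` selects the include. It also goes under `apply:` so that the included tasks inherit it:

```yaml
---
# roles/<role>/tasks/main.yml (sketch; `config` is a placeholder concern tag)
- name: Config concern
  ansible.builtin.include_tasks:
    file: config.yml
    apply:
      tags: [config]            # every task inside config.yml inherits the tag
  tags: [config]                # selects the include statement itself under --tags config
---
# roles/<role>/handlers/main.yml
- name: Restart example
  ansible.builtin.service:
    name: example               # hypothetical unit installed by another concern
    state: restarted
  when: not ansible_check_mode  # in a first-run --check the unit does not exist yet
```
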
## Testing concern-tag isolation in Molecule
- To catch the tag-propagation bug above *in Molecule*, add a **second converge play**
that applies one concern to a fresh target (`include_role` with `apply: {tags: [config]}`
and `tags: [config]` on the include itself) and a `verify` assertion that the concern's
effect landed. Drive the real partial path with `molecule converge -- --tags config`.
- **Sequence matters:** a partial-tag run on a *fresh* instance fails on cross-concern
deps (a `config` task may need a binary the `packages` concern installs). The realistic
test is **full converge → partial `--tags` re-run** (idempotent). Harness `pre_tasks`
(e.g. test-user creation) must be tagged `always`, or `--tags` filters them out.
(Pattern proven on `dev_env`, 2026-06-14.)
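
A sketch of the harness, assuming the default scenario layout. The `molecule_user` account and the checked path are hypothetical placeholders for whatever the `config` concern actually manages:

```yaml
---
# molecule/default/converge.yml (sketch)
- name: Converge (full role)
  hosts: all
  pre_tasks:
    - name: Create the test user the role configures
      ansible.builtin.user:
        name: molecule_user       # hypothetical harness account
      tags: [always]              # harness setup must survive --tags filtering
  roles:
    - role: dev_env

- name: Converge (config concern only)
  hosts: all
  tasks:
    - name: Apply just the config concern
      ansible.builtin.include_role:
        name: dev_env
        apply:
          tags: [config]          # propagate onto the role's tasks
      tags: [config]              # select the include itself under --tags config
---
# molecule/default/verify.yml (sketch)
- name: Verify
  hosts: all
  tasks:
    - name: Stat a file the config concern is expected to manage
      ansible.builtin.stat:
        path: /home/molecule_user/.config/example.conf   # hypothetical path
      register: concern_artifact

    - name: Assert the config concern's effect landed
      ansible.builtin.assert:
        that:
          - concern_artifact.stat.exists
```

Run these steps in order:

1. `molecule converge` (a full, untagged run).
2. `molecule converge -- --tags config` (the partial re-run).
3. `molecule verify`.
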
## API / templating roles: render-only tests miss the real call
- For a role whose payload is "render data → external API call" (e.g. `public_dns`
Gandi LiveDNS), `apply=false` Molecule and data-only pytest exercise the *data file*, not
the *rendered module args*. Corrupt-template and API-rejection bugs therefore sail through
both, and through review. Two examples: `item.values` resolving to the dict's `values`
method instead of the `values` key (use `item['values']`), and Gandi rejecting the
RFC 7505 null MX `0 .`. Only a real (or `--check`) call against the API surfaces them.
- → Treat a **check-mode run against the real API as a required gate** for such roles, or
build a render-only assertion that materializes and inspects the rendered module args.
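
A minimal render-only assertion, sketched under the assumption that the records live in a hypothetical `public_dns_records` list. The `vars:` expressions must mirror the role's real module call. Suppose the role wrote `item.values`: the value would then resolve to the dict's `values` method rather than the list. In that case the assertion (or the templating itself) should fail before the bad value reaches the API. This check cannot catch server-side rejections such as the null MX, which still need the check-mode gate:

```yaml
---
# Sketch only: public_dns_records is a hypothetical variable name.
- name: Materialize each record's module args and inspect them (no API call)
  vars:
    rendered:                       # must mirror the role's real module call expressions
      record: "{{ item.name }}"
      type: "{{ item.type }}"
      values: "{{ item['values'] }}"  # bracket form avoids the dict.values method
  ansible.builtin.assert:
    that:
      - rendered['values'] is sequence
      - rendered['values'] is not string
      - rendered['values'] | reject('string') | list | length == 0
    fail_msg: "{{ rendered.record }} {{ rendered.type }}: bad values {{ rendered['values'] }}"
  loop: "{{ public_dns_records }}"
```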