# Testing & Molecule gotchas

Durable, point-of-use knowledge for writing and running role tests (ADR-008). Migrated from `docs/FRICTION.md` by the 2026-06-10 kaizen review. Append here when a testing surprise is worth remembering past the session that hit it.

## nftables / `nft -c` render checks

- **`nft -c` rejects `iif ""` when the interface is absent.** `iif` resolves to an interface *index* at load time, so it fails in the Molecule container and would fail identically on any real host before the interface exists (e.g. `wt0` before NetBird is up). Use **`iifname ""`** (string match, no existence requirement, survives the interface coming and going) for any interface that may be absent.
- **The render-and-`nft -c` (no-apply) Molecule approach earns its keep.** It caught the `iif`/`iifname` bug deterministically without touching the host kernel. Reuse this pattern (render template → static-check, never apply) for other config-rendering roles.

## Molecule (`community.docker`)

- **Molecule's `community.docker` connection uses `ansible_host` as the container name** (`remote_addr`). Setting `ansible_host` as *data* in a scenario's `host_vars` (e.g. to give a resolver a fake IP) breaks the connection → `UNREACHABLE` / "Failed to create temporary directory". Don't override `ansible_host` in Molecule; feed fixture IPs another way (keep fixtures to zone sources and unit-test IP resolution).

## Coverage blind spot: apply-only task paths

- **Apply-only task paths have no Level-1 coverage**, so safety bugs hide there. Example: an `nft` auto-rollback snapshot used a bare `nft list ruleset` (no leading `flush ruleset`), so the revert was a silent no-op on first apply and errored on later ones: the whole safety net was dead. Molecule never runs the apply (gated off), so only adversarial review + an isolated-netns round-trip test caught it.
→ For apply/safety paths Molecule can't exercise, validate out-of-band (a throwaway `--privileged` container with its own netns) and treat a final adversarial review as **mandatory, not optional**.

## Tags on dynamic `include_tasks` need `apply:` to reach the included tasks

- **A tag on a dynamic `include_tasks` selects the include statement, not its contents.** Tagging `include_tasks: x.yml` with `concern` and running `--tags concern` runs *nothing* (`ok=N changed=0`) unless the included tasks are independently tagged. Use `include_tasks: {file: x.yml, apply: {tags: [concern]}}` to propagate the tag onto the included tasks; this is **mandatory** whenever a role uses tags to apply concern-subsets (`roles/base/tasks/main.yml` and `roles/dev_env/tasks/main.yml` are the references).
- **Molecule converges *untagged*, so it cannot catch this by default.** The bug only shows under `make deploy … TAGS=` on a real host (first hit live on askari, M3). See the tag-isolation pattern below to catch it in Molecule instead.
- **Check-mode artifact:** a `service`/handler for a not-yet-installed package fails in a first-run `--check`; guard with `when: not ansible_check_mode`.

## Testing concern-tag isolation in Molecule

- To catch the tag-propagation bug above *in Molecule*, add a **second converge play** that applies one concern to a fresh target (`include_role` with `apply: {tags: [config]}`), plus a `verify` assertion that the concern's effect landed. Drive the real partial path with `molecule converge -- --tags config`.
- **Sequence matters:** a partial-tag run on a *fresh* instance fails on cross-concern deps (a `config` task may need a binary the `packages` concern installs). The realistic test is **full converge → partial `--tags` re-run** (idempotent). Harness `pre_tasks` (e.g. test-user creation) must be tagged `always`, or `--tags` filters them out. (Pattern proven on `dev_env`, 2026-06-14.)
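
The tag-propagation rule that both tag sections rely on can be sketched as a role's `tasks/main.yml`. This is a minimal illustration, not a copy of the reference roles: the file names `packages.yml` and `config.yml` and the service name `example-service` are hypothetical, and only the `packages` and `config` concern names come from the notes above.

```yaml
---
# roles/<role>/tasks/main.yml (sketch; file and service names are assumptions)

- name: Include the packages concern
  ansible.builtin.include_tasks:
    file: packages.yml
    apply:
      tags: [packages]   # propagated onto every task inside packages.yml
  tags: [packages]       # selects the include statement itself under --tags

- name: Include the config concern
  ansible.builtin.include_tasks:
    file: config.yml
    apply:
      tags: [config]     # without this, --tags config runs the include but none of its tasks
  tags: [config]

- name: Ensure the service is running
  ansible.builtin.service:
    name: example-service          # hypothetical service name
    state: started
  when: not ansible_check_mode     # the package may not be installed yet in a first-run --check
  tags: [config]
```

Under this layout, `--tags config` selects the second include through its outer `tags` and reaches the included tasks through `apply`. Dropping either half reproduces the run-nothing symptom: without the outer tag the include is never selected, and without `apply` the included tasks carry no matching tag.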
## API / templating roles: render-only tests miss the real call

- For a role whose payload is "render data → external API call" (e.g. `public_dns` → Gandi LiveDNS), `apply=false` Molecule + data-only pytest exercise the *data file*, not the *rendered module args*, so corrupt-template and API-rejection bugs (`item.values` resolving to a dict method; Gandi rejecting RFC-7505 null-MX `0 .`) sail through both, plus review. Only a real (or `--check`) call against the API surfaces them.
- → Treat a **check-mode run against the real API as a required gate** for such roles, or build a render-only assertion that materializes and inspects the rendered module args.

## Single-file bind mount + atomic rewrite = stale config (reload-in-place only)

- **`ansible.builtin.template` writes atomically** (temp file + rename → a *new inode*). A Docker **single-file** bind mount pins the *old* inode, so a container that reloads config **in place** (no restart) keeps reading the stale file. Live hit: `reverse_proxy` bind-mounted the Caddyfile as a single file; `caddy reload` (in-container) re-read the old inode and silently no-op'd (`"config is unchanged"`). The new NetBird route never loaded → Caddy never requested its cert → surfaced only as a downstream TLS handshake failure.
- **Fix for reload-in-place roles: bind-mount the config *directory*, not the file** (`./caddy` → `/etc/caddy`). Directory mounts reflect the inode swap, so the reload sees the new file (proven on askari).
- **Restart-based roles are fine with a single-file mount.** Sibling case: `netbird` single-file-mounts `config.yaml`, but its handler does `docker compose restart` (not an in-container reload), and a **restart re-resolves the bind mount** (verified: route count 0 before, 1 after). Rule of thumb: **reload-in-place needs a directory mount; restart-based roles don't.**
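
The directory-mount fix can be sketched as a Compose fragment. Only the `./caddy` → `/etc/caddy` mapping comes from the notes above; the service name, the `caddy:2` image tag, and the commented-out single-file path are assumptions for illustration and may not match the real `reverse_proxy` role.

```yaml
# docker-compose.yml (sketch; service name and image tag are assumptions)
services:
  caddy:
    image: caddy:2
    volumes:
      # Single-file mount: pins the inode that existed when the container
      # started, so an atomic template rewrite is invisible to a reload.
      # - ./caddy/Caddyfile:/etc/caddy/Caddyfile

      # Directory mount: the rename happens inside the mounted directory,
      # so an in-container reload opens the newly written file.
      - ./caddy:/etc/caddy
```

The commented-out line is kept only to show the contrast. A role whose handler restarts the container rather than reloading in place can keep a single-file mount, as the `netbird` case above shows.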