Migrate the single-file-bind-mount/stale-config gotcha (reload-in-place needs a directory mount; restart-based roles don't) to docs/testing/gotchas.md, and move all 7 open signals out of FRICTION.md's Open-signals section into the new 2026-06-17 decisions-ledger block: all consumed, 1 PARK (the ubongo self-management gap, tracked in STATUS), 0 REMOVE. Relax test_load_signals to accept an empty Open-signals section (the goal state after a kaizen pass). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
90 lines
5.7 KiB
Markdown
# Testing & Molecule gotchas

Durable, point-of-use knowledge for writing and running role tests (ADR-008).
Migrated from `docs/FRICTION.md` by the 2026-06-10 kaizen review. Append here when a
testing surprise is worth remembering past the session that hit it.

## nftables / `nft -c` render checks

- **`nft -c` rejects `iif "<name>"` when the interface is absent.** `iif` resolves to an
  interface *index* at load time, so it fails in the Molecule container and would fail
  identically on any real host before the interface exists (e.g. `wt0` before NetBird is
  up). Use **`iifname "<name>"`** (string match, no existence requirement, survives the
  interface coming and going) for any interface that may be absent.
- **The render-and-`nft -c` (no-apply) Molecule approach earns its keep:** it caught the
  `iif`/`iifname` bug deterministically without touching the host kernel. Reuse this
  pattern (render template → static-check, never apply) for other config-rendering
  roles.

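A cheap pytest-style guard can flag index-based interface matches in a rendered ruleset before `nft -c` ever runs. This is a minimal sketch, assuming the rendered ruleset is available as a string; `find_index_interface_matches` and its `always_present` allow-list are hypothetical names, and the regex only handles a single literal name per match (not sets such as `iif { "a", "b" }` or variables). It also flags `oif`, which is index-based in the same way (its name-based counterpart is `oifname`).

```python
import re

# `iif`/`oif` match on interface *index*; `iifname`/`oifname` match on name.
# Requiring whitespace right after the keyword means `iifname "wt0"` is skipped.
_INDEX_MATCH = re.compile(r'\b(iif|oif)\s+"?([\w.-]+)"?')


def find_index_interface_matches(ruleset, always_present=("lo",)):
    """Return (line_no, keyword, interface) for every index-based match on an
    interface that is not known to exist when the ruleset is loaded."""
    hits = []
    for line_no, line in enumerate(ruleset.splitlines(), start=1):
        code = line.split("#", 1)[0]  # drop trailing nft comments
        for match in _INDEX_MATCH.finditer(code):
            keyword, name = match.groups()
            if name not in always_present:
                hits.append((line_no, keyword, name))
    return hits


if __name__ == "__main__":
    rendered = """\
table inet filter {
  chain input {
    iif "lo" accept
    iif "wt0" tcp dport 22 accept
    iifname "wt0" tcp dport 80 accept
  }
}
"""
    for hit in find_index_interface_matches(rendered):
        print(hit)  # prints (4, 'iif', 'wt0')
```

The allow-list exists because `lo` is always present, so `iif "lo"` is safe; anything that can come and go (like `wt0`) should use `iifname`.
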
## Molecule (`community.docker`)

- **Molecule's `community.docker` connection uses `ansible_host` as the container name**
  (`remote_addr`). Setting `ansible_host` as *data* in a scenario's `host_vars` (e.g. to
  give a resolver a fake IP) breaks the connection → `UNREACHABLE` / "Failed to create
  temporary directory". Don't override `ansible_host` in Molecule; feed fixture IPs
  another way (keep fixtures to zone sources and unit-test IP resolution).

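A guard test can keep `ansible_host` out of scenario data so the failure shows up as a clear assertion instead of `UNREACHABLE`. Sketch, assuming the scenario's `host_vars` files have already been parsed into plain dicts (e.g. with PyYAML); `find_ansible_host_overrides` is a hypothetical helper.

```python
def find_ansible_host_overrides(host_vars):
    """Return the hosts whose scenario vars set `ansible_host`, sorted.

    `host_vars` maps host name -> variables dict (None for an empty file).
    Under Molecule's community.docker connection, `ansible_host` is the
    container name, so any override here breaks the connection.
    """
    return sorted(
        host
        for host, variables in host_vars.items()
        if "ansible_host" in (variables or {})
    )
```

In a pytest module this becomes a one-liner: `assert find_ansible_host_overrides(loaded_host_vars) == []`.
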
## Coverage blind spot: apply-only task paths

- **Apply-only task paths have no Level-1 coverage**, so safety bugs hide there. Example:
  an `nft` auto-rollback snapshot used a bare `nft list ruleset` (no leading
  `flush ruleset`), so the revert was a silent no-op on first apply and errored on later
  ones: the whole safety net was dead. Molecule never runs the apply (it is gated off), so
  only adversarial review plus an isolated-netns round-trip test caught it. → For
  apply/safety paths that Molecule can't exercise, validate out-of-band (a throwaway
  `--privileged` container with its own netns) and treat a final adversarial review as
  **mandatory, not optional**.

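The fix for the dead safety net is small: prefix the saved listing with `flush ruleset`, so restoring it replaces the live ruleset rather than being applied on top of it. Sketch under assumptions: `build_rollback_script` and `snapshot_for_rollback` are hypothetical names, and the `run` parameter (defaulting to `subprocess.run`) exists only so the logic can be exercised without root or a real kernel ruleset.

```python
import subprocess


def build_rollback_script(saved_listing):
    """Turn `nft list ruleset` output into a script for `nft -f`.

    The leading `flush ruleset` makes the restore *replace* the live ruleset.
    Without it, restoring an empty snapshot (first apply) changes nothing, and
    restoring a non-empty one is applied on top of the live ruleset instead of
    reverting it.
    """
    return "flush ruleset\n" + saved_listing


def snapshot_for_rollback(run=subprocess.run):
    """Capture the current ruleset as a rollback script.

    `run` is injectable only so tests can pass a fake instead of calling nft.
    """
    result = run(
        ["nft", "list", "ruleset"], check=True, capture_output=True, text=True
    )
    return build_rollback_script(result.stdout)
```

Restoring is then `nft -f <file>` on the returned text; the round trip itself still belongs in the isolated netns described above, never on the host.
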
## Tags on dynamic `include_tasks` need `apply:` to reach the included tasks

- **A tag on a dynamic `include_tasks` selects the include statement, not its contents.**
  Tagging `include_tasks: x.yml` with `concern` and running `--tags concern` runs
  *nothing* (`ok=N changed=0`) unless the included tasks are independently tagged. Keep
  the tag on the include itself (or tag it `always`) so the include still runs, and add
  `apply:` to propagate the tag onto the included tasks:
  `include_tasks: {file: x.yml, apply: {tags: [concern]}}`. This is **mandatory**
  whenever a role uses tags to apply concern-subsets (`roles/base/tasks/main.yml` and
  `roles/dev_env/tasks/main.yml` are the references).
- **Molecule converges *untagged*, so it cannot catch this by default.** The bug only
  shows under `make deploy … TAGS=<concern>` on a real host (first hit live on askari,
  M3). See the tag-isolation pattern below to catch it in Molecule instead.
- **Check-mode artifact:** a `service`/handler for a not-yet-installed package fails in a
  first-run `--check`; guard it with `when: not ansible_check_mode`.

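Because an untagged Molecule converge won't notice a missing `apply:`, a static lint over a role's task files is a cheap backstop. Sketch, assuming each task file has been loaded into a list of dicts (e.g. with PyYAML); `find_unpropagated_include_tags` is hypothetical and only inspects `include_tasks` (short and FQCN forms), not `include_role`. It treats `always`/`never` on the include as control tags rather than concern tags, so the "tag the include `always`, apply the concern" idiom passes.

```python
INCLUDE_KEYS = ("include_tasks", "ansible.builtin.include_tasks")
CONTROL_TAGS = {"always", "never"}


def _as_list(tags):
    """Ansible accepts a single tag string or a list of tags."""
    if tags is None:
        return []
    if isinstance(tags, str):
        return [tags]
    return list(tags)


def find_unpropagated_include_tags(tasks):
    """Return (task_index, missing_tags) for each include_tasks whose own
    concern tags are not also passed to the included tasks via `apply`."""
    problems = []
    for index, task in enumerate(tasks):
        key = next((k for k in INCLUDE_KEYS if k in task), None)
        if key is None:
            continue  # not an include_tasks task
        own_tags = set(_as_list(task.get("tags"))) - CONTROL_TAGS
        args = task[key]
        applied = set()
        if isinstance(args, dict):  # the `{file: ..., apply: ...}` form
            applied = set(_as_list((args.get("apply") or {}).get("tags")))
        missing = own_tags - applied
        if missing:
            problems.append((index, sorted(missing)))
    return problems
```
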
## Testing concern-tag isolation in Molecule

- To catch the tag-propagation bug above *in Molecule*, add a **second converge play**
  that applies one concern to a fresh target (`include_role` with
  `apply: {tags: [config]}`), plus a `verify` assertion that the concern's effect landed.
  Drive the real partial path with `molecule converge -- --tags config`.
- **Sequence matters:** a partial-tag run on a *fresh* instance fails on cross-concern
  deps (a `config` task may need a binary the `packages` concern installs). The realistic
  test is **full converge → partial `--tags` re-run** (idempotent). Harness `pre_tasks`
  (e.g. test-user creation) must be tagged `always`, or `--tags` filters them out.
  (Pattern proven on `dev_env`, 2026-06-14.)

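The full-converge-then-partial sequence can be scripted so it runs in the same order every time. Sketch, assuming the `molecule` CLI is on `PATH` and the caller names the scenario (`-s` is `--scenario-name`); `tag_isolation_steps` and `run_steps` are hypothetical helpers, and `run` is injectable only for testing.

```python
import subprocess


def tag_isolation_steps(scenario, tag):
    """Full converge first (so cross-concern deps such as packages exist),
    then re-run only one concern, then verify that its effect landed."""
    return [
        ["molecule", "converge", "-s", scenario],
        ["molecule", "converge", "-s", scenario, "--", "--tags", tag],
        ["molecule", "verify", "-s", scenario],
    ]


def run_steps(steps, run=subprocess.run):
    """Run each command in order; check=True stops at the first failure."""
    for argv in steps:
        run(argv, check=True)
```

Usage: `run_steps(tag_isolation_steps("default", "config"))`. Keeping the full converge as an explicit first step is what avoids the fresh-instance cross-concern failure described above.
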
## API / templating roles: render-only tests miss the real call

- For a role whose payload is "render data → external API call" (e.g. `public_dns` →
  Gandi LiveDNS), `apply=false` Molecule and data-only pytest exercise the *data file*,
  not the *rendered module args*. So corrupt-template and API-rejection bugs
  (`item.values` resolving to a dict method; Gandi rejecting the RFC 7505 null MX `0 .`)
  slip past both test layers and past review as well. Only a real (or `--check`) call
  against the API surfaces them.
- → Treat a **check-mode run against the real API as a required gate** for such roles, or
  build a render-only assertion that materializes and inspects the rendered module args.

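One way to build that render-only assertion: dump the rendered module args (for instance from a `debug` task) and check them in pytest. Jinja2's dot lookup tries the attribute before the item, which is why `item.values` yields the dict's `values` method; `item['values']` tries the item first. Sketch, assuming the args are plain dicts shaped like the `community.general.gandi_livedns` options (`record`, `type`, `values`); `check_rendered_dns_args` is hypothetical, and the null-MX check encodes only the Gandi rejection noted above, not a general DNS rule.

```python
def check_rendered_dns_args(args):
    """Return a list of problems found in one rendered record's module args."""
    problems = []
    record = args.get("record", "?")
    values = args.get("values")
    if not isinstance(values, list) or not all(isinstance(v, str) for v in values):
        problems.append(f"{record}: values must be a list of strings, got {values!r}")
        return problems
    for value in values:
        # `{{ item.values }}` renders the dict *method*, e.g.
        # "<built-in method values of dict object at 0x...>".
        if value.startswith("<built-in method"):
            problems.append(f"{record}: value is a method repr: {value!r}")
    if args.get("type") == "MX" and any(v.split() == ["0", "."] for v in values):
        problems.append(f"{record}: null MX '0 .' was rejected by Gandi LiveDNS")
    return problems
```
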
## Single-file bind mount + atomic rewrite = stale config (reload-in-place only)

- **`ansible.builtin.template` writes atomically** (temp file + rename → a *new inode*). A
  Docker **single-file** bind mount pins the *old* inode, so a container that reloads
  config **in place** (no restart) keeps reading the stale file. Live hit: `reverse_proxy`
  bind-mounted the Caddyfile as a single file; `caddy reload` (in-container) re-read the
  old inode and silently no-op'd (`"config is unchanged"`). The new NetBird route never
  loaded → Caddy never requested its cert → the problem surfaced only as a downstream
  TLS handshake failure.
- **Fix for reload-in-place roles: bind-mount the config *directory*, not the file**
  (`./caddy` → `/etc/caddy`). Directory mounts reflect the inode swap, so the reload sees
  the new file (proven on askari).
- **Restart-based roles are fine with a single-file mount.** Sibling case: `netbird`
  single-file-mounts `config.yaml`, but its handler does `docker compose restart` (not an
  in-container reload), and a **restart re-resolves the bind mount** (verified: route
  count 0 before, 1 after). Rule of thumb: **reload-in-place needs a directory mount;
  restart-based roles don't.**

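The inode mechanics can be reproduced without Docker. In this sketch an open file handle stands in for the single-file bind mount (both keep a reference to the inode instead of re-resolving the path), and `atomic_write` imitates the temp-file + rename write described above. Both helpers are hypothetical, and the sketch assumes a POSIX filesystem.

```python
import os
import tempfile


def atomic_write(path, text):
    """Imitate a temp-file + rename write: the target name ends up pointing
    at a brand-new inode, while the old inode lives on for open handles."""
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "w") as tmp:
        tmp.write(text)
    os.replace(tmp_path, path)  # atomic rename over the target


def read(path):
    """Fresh lookup by name, like a reader behind a directory mount."""
    with open(path) as f:
        return f.read()


def demo(directory):
    path = os.path.join(directory, "Caddyfile")
    with open(path, "w") as f:
        f.write("old\n")
    inode_before = os.stat(path).st_ino
    pinned = open(path)  # stand-in for a single-file bind mount
    try:
        atomic_write(path, "new\n")
        return {
            "inode_changed": os.stat(path).st_ino != inode_before,
            "pinned_reads": pinned.read(),  # old inode: stale content
            "path_reads": read(path),  # name re-resolved: new content
        }
    finally:
        pinned.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(demo(d))
```

The pinned handle is the in-place reload reading through a single-file mount; `read(path)` is the directory-mount case, where each open resolves the name again and finds the renamed-in file.
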