Migrate the single-file-bind-mount/stale-config gotcha (reload-in-place needs a directory mount; restart-based roles don't) to docs/testing/gotchas.md, and move all 7 open signals out of FRICTION.md's Open-signals section into the new 2026-06-17 decisions-ledger block: all consumed, 1 PARK (the ubongo self-management gap, tracked in STATUS), 0 REMOVE. Relax test_load_signals to accept an empty Open-signals section (the goal state after a kaizen pass). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Testing & Molecule gotchas
Durable, point-of-use knowledge for writing and running role tests (ADR-008).
Migrated from docs/FRICTION.md by the 2026-06-10 kaizen review. Append here when a
testing surprise is worth remembering past the session that hit it.
nftables / nft -c render checks
- `nft -c` rejects `iif "<name>"` when the interface is absent: `iif` resolves to an interface index at load time, so it fails in the Molecule container and would fail identically on any real host before the interface exists (e.g. `wt0` before NetBird is up). Use `iifname "<name>"` (string match, no existence requirement, survives the interface coming and going) for any interface that may be absent.
- The render-and-`nft -c` (no-apply) Molecule approach earns its keep: it caught the `iif`/`iifname` bug deterministically without touching the host kernel. Reuse this pattern (render template → static-check, never apply) for other config-rendering roles.
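The render → static-check pattern can be sketched as two tasks. This is a minimal illustration, not the role's actual task file: the template name and the scratch path are hypothetical, and it assumes `nft` is installed in the test container.

```yaml
# Hypothetical tasks: template name and scratch path are illustrative only.
- name: Render the ruleset to a scratch path (never to the live config)
  ansible.builtin.template:
    src: nftables.conf.j2          # assumed template name
    dest: /tmp/nftables.rendered.conf
    mode: "0600"

- name: Static-check the rendered ruleset without applying it
  ansible.builtin.command:
    cmd: nft -c -f /tmp/nftables.rendered.conf   # -c checks only, nothing is loaded
  changed_when: false

# Inside the template, match possibly-absent interfaces by name:
#   iifname "wt0" accept     # string match, passes the check without the interface
#   iif "wt0" accept         # index lookup, fails the check until wt0 exists
```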
Molecule (community.docker)
- Molecule's `community.docker` connection uses `ansible_host` as the container name (`remote_addr`). Setting `ansible_host` as data in a scenario's `host_vars` (e.g. to give a resolver a fake IP) breaks the connection → `UNREACHABLE` / "Failed to create temporary directory". Don't override `ansible_host` in Molecule; feed fixture IPs another way (keep fixtures to zone sources and unit-test IP resolution).
Coverage blind spot: apply-only task paths
- Apply-only task paths have no Level-1 coverage, so safety bugs hide there. Example: an `nft` auto-rollback snapshot used a bare `nft list ruleset` (no leading `flush ruleset`), so the revert was a silent no-op on first apply and errored on later ones. The whole safety net was dead. Molecule never runs the apply (gated off), so only adversarial review + an isolated-netns round-trip test caught it. → For apply/safety paths Molecule can't exercise, validate out-of-band (a throwaway `--privileged` container with its own netns) and treat a final adversarial review as mandatory, not optional.
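A snapshot that actually reverts has to start with `flush ruleset`, so that loading it replaces the live state instead of layering on top of it. The tasks below are a sketch under stated assumptions: the snapshot path is hypothetical and the real role's task layout may differ.

```yaml
# Hypothetical tasks: /run/nft-rollback.nft is an illustrative path.
- name: Snapshot the live ruleset in a form that replaces state on restore
  ansible.builtin.shell:
    cmd: |
      { echo 'flush ruleset'; nft list ruleset; } > /run/nft-rollback.nft

- name: Revert to the snapshot (flush and reload happen in one nft transaction)
  ansible.builtin.command:
    cmd: nft -f /run/nft-rollback.nft
```

Because Molecule never runs this path, a round trip (snapshot, change a rule, revert, compare `nft list ruleset`) inside a throwaway network namespace is the kind of out-of-band check the bullet above calls for.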
Tags on dynamic include_tasks need apply: to reach the included tasks
- A tag on a dynamic `include_tasks` selects the include statement, not its contents. Tagging `include_tasks: x.yml` with `concern` and running `--tags concern` runs nothing (`ok=N changed=0`) unless the included tasks are independently tagged. Use `include_tasks: {file: x.yml, apply: {tags: [concern]}}` to propagate the tag onto the included tasks. This is mandatory whenever a role uses tags to apply concern-subsets (`roles/base/tasks/main.yml` and `roles/dev_env/tasks/main.yml` are the references).
- Molecule converges untagged, so it cannot catch this by default: the bug only shows under `make deploy … TAGS=<concern>` on a real host (first hit live on askari, M3). See the tag-isolation pattern below to catch it in Molecule instead.
- Check-mode artifact: a `service`/handler for a not-yet-installed package fails in a first-run `--check`; guard with `when: not ansible_check_mode`.
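Written out in block form, the two guards from this section look roughly like this. The file name `x.yml` and the tag `concern` come from the text above; the service name is a placeholder.

```yaml
# The tag on the include selects the include statement itself;
# apply.tags propagates the same tag onto every task inside x.yml.
- name: Include the concern's tasks
  ansible.builtin.include_tasks:
    file: x.yml
    apply:
      tags: [concern]
  tags: [concern]

# Check-mode guard: skip the service task on a first-run --check,
# when the package providing the unit is not installed yet.
- name: Ensure the service is running
  ansible.builtin.service:
    name: example-service        # placeholder name
    state: started
  when: not ansible_check_mode
  tags: [concern]
```

Both `tags` entries on the include are needed: without the outer one `--tags concern` skips the include entirely, and without `apply` the included tasks are filtered out after the include runs.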
Testing concern-tag isolation in Molecule
- To catch the tag-propagation bug above in Molecule, add a second converge play that applies one concern to a fresh target (`include_role` with `apply: {tags: [config]}`) plus a `verify` assertion that the concern's effect landed. Drive the real partial path with `molecule converge -- --tags config`.
- Sequence matters: a partial-tag run on a fresh instance fails on cross-concern deps (a `config` task may need a binary the `packages` concern installs). The realistic test is full converge → partial `--tags` re-run (idempotent). Harness `pre_tasks` (e.g. test-user creation) must be tagged `always`, or `--tags` filters them out. (Pattern proven on `dev_env`, 2026-06-14.)
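One way the pieces above might fit together in a scenario's `converge.yml` is sketched below. The role name `dev_env` and the `config` tag come from the text; the test user and the play layout are assumptions, not the scenario actually used.

```yaml
# Hypothetical converge play for a tag-isolation scenario.
- name: Converge one concern through the tagged path
  hosts: all
  pre_tasks:
    - name: Create the test user the role expects
      ansible.builtin.user:
        name: testuser           # illustrative fixture
      tags: [always]             # otherwise --tags config filters this task out
  tasks:
    - name: Apply the role with the concern tag propagated
      ansible.builtin.include_role:
        name: dev_env
        apply:
          tags: [config]
      tags: [config]
```

Run `molecule converge` first for the full pass, then `molecule converge -- --tags config` for the partial re-run, and let `verify` assert that the concern's effect is present.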
API / templating roles: render-only tests miss the real call
- For a role whose payload is "render data → external API call" (e.g. `public_dns` → Gandi LiveDNS), `apply=false` Molecule + data-only pytest exercise the data file, not the rendered module args, so corrupt-template and API-rejection bugs (`item.values` resolving to a dict method; Gandi rejecting RFC-7505 null-MX `0 .`) sail through both, plus review. Only a real (or `--check`) call against the API surfaces them.
- → Treat a check-mode run against the real API as a required gate for such roles, or build a render-only assertion that materializes and inspects the rendered module args.
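A render-only assertion for the `item.values` case could look like the sketch below. In Jinja, dot access tries attributes before keys, so `item.values` on a dict returns the `dict.values` method; bracket access reads the key. The variable name `public_dns_records` is hypothetical.

```yaml
# Hypothetical assertion: public_dns_records stands in for the role's record list.
- name: Assert each record's values render as a list, not a method or a string
  ansible.builtin.assert:
    that:
      - item['values'] is sequence
      - item['values'] is not string
      - item['values'] is not mapping
    fail_msg: "values for this record did not render as a list"
  loop: "{{ public_dns_records }}"
```

This only covers the template side. An API-side rejection such as the null-MX case still needs a check-mode or real call against the provider.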
Single-file bind mount + atomic rewrite = stale config (reload-in-place only)
- `ansible.builtin.template` writes atomically (temp file + rename → a new inode). A Docker single-file bind mount pins the old inode, so a container that reloads config in place (no restart) keeps reading the stale file. Live hit: `reverse_proxy` bind-mounted the Caddyfile as a single file; `caddy reload` (in-container) re-read the old inode and silently no-op'd ("config is unchanged"). The new NetBird route never loaded → Caddy never requested its cert → surfaced only as a downstream TLS handshake failure.
- Fix for reload-in-place roles: bind-mount the config directory, not the file (`./caddy` → `/etc/caddy`). Directory mounts reflect the inode swap, so the reload sees the new file (proven on askari).
- Restart-based roles are fine with a single-file mount. Sibling case: `netbird` single-file-mounts `config.yaml`, but its handler does `docker compose restart` (not an in-container reload), and a restart re-resolves the bind mount (verified: route count 0 before, 1 after). Rule of thumb: reload-in-place needs a directory mount; restart-based roles don't.
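In a Compose file the difference is a single volume line. The fragment below is a sketch: the `./caddy` → `/etc/caddy` mapping comes from the text, while the service name and image are placeholders.

```yaml
# Hypothetical Compose fragment for a reload-in-place service.
services:
  caddy:
    image: caddy                 # placeholder image reference
    volumes:
      # Single-file mount: pins the inode that existed at container start,
      # so an atomic rewrite on the host is invisible to an in-container reload.
      # - ./caddy/Caddyfile:/etc/caddy/Caddyfile
      # Directory mount: the rename lands inside the mounted directory,
      # so the reload opens the new file.
      - ./caddy:/etc/caddy
```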