# ADR-008 — Testing methodology

> Practical point-of-use pitfalls (nft render checks, Molecule `community.docker`,
> apply-path coverage blind spots) live in `docs/testing/gotchas.md`.

## Status

Accepted (2026-05-30)

## Context

Ansible roles must be idempotent and correct before they touch production hosts. This document records the testing strategy, what each level covers, and, critically, what is explicitly out of scope for automated testing and why.

---

## Decision

### Three testing levels

#### Level 1 — Molecule (per role, always required)

Runs in Docker on the control node (`ubongo`) or in CI. Fast (~5 min per role).

**What happens during `molecule test` (core steps):**

1. `create`: start the test container
2. `converge`: apply the role via `converge.yml`
3. **`idempotence`**: run `converge.yml` again; fail if any task reports `changed`
4. `verify`: assert the expected state via `verify.yml`
5. `destroy`: remove the container

The idempotency step is non-negotiable. Every role must pass it cleanly.

**`verify.yml` must assert outcomes, not task success:**

```yaml
# Wrong: only proves the task ran
- ansible.builtin.assert:
    that: result is success

# Right: proves the outcome exists
- name: Query the fail2ban unit state
  ansible.builtin.command: systemctl is-active fail2ban
  changed_when: false
  failed_when: false   # let the assert below decide, with a clear message
  register: svc

- name: Assert fail2ban is running
  ansible.builtin.assert:
    that: svc.stdout == "active"
```

#### Level 2 — Staging playbook (full stack, real VMs)

`make check PLAYBOOK=site` followed by `make deploy PLAYBOOK=site` on Terraform-provisioned staging VMs. Catches inter-role dependency and ordering issues that Molecule cannot see (e.g., the `docker_host` role requires `base` to have already run and configured the firewall). Run before every merge to `main`.

#### Level 3 — External smoke test from askari

Once `askari` is operational, scripted checks run from outside the network confirm that public-facing services respond correctly. This catches firewall and reverse-proxy misconfigurations that are invisible to Ansible check mode.
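A minimal sketch of a Level 3 check follows. It assumes the smoke tests are written as an Ansible playbook that targets `askari`. The playbook name `smoke.yml`, the inventory host name and the endpoint URLs are hypothetical placeholders, not project configuration. `ansible.builtin.uri` executes on the managed host, so each request originates from `askari`, outside the network, which is the vantage point this level needs:

```yaml
# smoke.yml: hypothetical Level 3 playbook, run with askari as the target host.
# The file name and the endpoint list below are placeholders, not project config.
- name: External smoke test of public-facing services
  hosts: askari
  gather_facts: false
  vars:
    smoke_endpoints:                       # hypothetical: replace with real public URLs
      - https://service-a.example.org/
      - https://service-b.example.org/
  tasks:
    - name: Endpoint answers 200 through the firewall and reverse proxy
      ansible.builtin.uri:                 # executes on askari, i.e. outside the network
        url: "{{ item }}"
        method: GET
        status_code: 200                   # any other final status fails the task
        follow_redirects: safe
        timeout: 10                        # seconds; a timeout also fails the task
        validate_certs: true               # TLS problems surface as failures too
      loop: "{{ smoke_endpoints }}"
```

Run after a deploy (for example `ansible-playbook smoke.yml`, with inventory flags omitted here), this turns any non-200 response, timeout, or certificate error into a failed task. The failure points at the firewall or reverse proxy rather than at an individual role. A plain script using `curl` would serve equally well; the playbook form is just one option.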

#### Level 4 — Service-UI acceptance (Claude-driven exploratory)

A Claude-driven exploratory check of a service's **application UI**, run as `/verify-service ` on `ubongo` (ADR-017). Claude drives Chromium via the `playwright` plugin against a **staging** deploy. It authenticates through the real Caddy (ADR-024) + Authentik SSO flow using a test user in the staging `test` group. It then executes the service's `roles//VERIFY.md` acceptance journeys *and* explores freely, judging pass/fail and screenshotting key states. It writes a dated report to `docs/testing/reviews/` and hands the operator a manual-test checklist for anything it cannot verify (hardware, paid/external flows, subjective judgment). This catches application-level regressions that no lower level sees ("does PhotoPrism actually serve photos?").

Placement: after Level 2 (staging deploy), before production promotion. Exploratory and interactive by design; *not* a deterministic CI/cron gate (that role belongs to health checks / Uptime Kuma).

**Status:** the skill, the `VERIFY.md` template, and the standards can be authored now. Running the skill is deferred until `ubongo`, the `playwright` plugin, Authentik, and a staging deploy are available (see STATUS.md). Full design: ADR-017.

---

### Molecule test image

**No external images.** The project builds and hosts its own test image.

- **Source**: `.docker/molecule-debian13/Dockerfile`
- **Base**: `debian:trixie-slim` (the official Debian 13 image on Docker Hub). This is the only external dependency permitted here, because the base OS image cannot be substituted.
- **Registry**: `forgejo.nyumbani.baobab.band/sjat/molecule-debian13:latest`

Build and push with:

```bash
make molecule-image        # build locally
make molecule-image-push   # push to the Forgejo registry (requires docker login)
```

The scaffold `molecule.yml` references this image with `pre_build_image: true`, meaning Molecule uses the image as-is and does not attempt to build it.

**Why not `geerlingguy/docker-debian13-ansible`?** It is a Docker Hub image outside project control.
It is not a Galaxy role, but it is an external dependency that can drift, disappear, or introduce unexpected changes. The custom image is functionally equivalent and fully owned.

---

### Idempotency requirements

Every role task must satisfy one of these:

| Task type | Requirement |
|---|---|
| `apt`, `template`, `copy`, `file`, `user`, `group`, `service` | Naturally idempotent; no action needed |
| `command` / `shell` (read-only) | `changed_when: false` |
| `command` / `shell` (detectable change) | `changed_when: result.stdout \| length > 0` or equivalent |
| `command` / `shell` (creates a file) | `creates: /path/to/artifact` |
| Service restart after config change | Move to a handler; the handler fires only when notified |
| `docker compose up -d` | Handler only: notified by a template change, never run unconditionally |

ansible-lint enforces most of these at lint time. The Molecule idempotency step catches anything lint misses.

---

### What Molecule tests — and what it does not

#### Tested in Molecule

| Capability | Notes |
|---|---|
| Package installation | `apt` works in the container |
| File and directory creation, permissions, ownership | Full support |
| Template rendering and content | Full support |
| User and group management | Full support |
| Service installation and `systemctl enable` | Requires the systemd-capable image |
| Service start/stop | Works for most services in the container |
| SSH configuration file content | File-level only |
| fail2ban installation and configuration | Install and config file; not live banning |
| Docker daemon installation | Works in a privileged container |
| auditd installation and configuration | Install and config file |
| Idempotency of all of the above | Enforced by Molecule's idempotency step |

#### Not tested in Molecule — explicit exceptions

The following require a real kernel or real hardware and are validated only at Level 2 (staging) or Level 3 (external). This is a conscious, documented decision, not a gap.

| Capability | Reason not testable in Molecule |
|---|---|
| `nftables` rule loading | Requires the `nf_tables` kernel module; not available in Docker |
| **Reboot-survivability / host-firewall × Docker interaction / boot-ordering** | **Requires a real kernel reboot. This is the class that caused the 2026-06-17 mesh-hardening incident; it is now covered by local VM integration testing (ADR-025).** |
| NetBird mesh data plane (`wt0` WireGuard interface) | Requires the `wireguard` kernel module; Molecule checks only that the agent is installed and configured (ADR-016) |
| `unattended-upgrades` behaviour | Installs correctly; actual upgrade behaviour requires a real apt environment |
| DHCP behaviour (OPNsense) | OPNsense is managed by Ansible but not testable in a container |
| mDNS reflector (Avahi cross-VLAN) | Requires real network interfaces and VLANs |
| Hardware passthrough (NIC, USB) | Not applicable in containers |
| Corosync cluster formation | Requires multiple real nodes |

For the above, Molecule tests only what it can: that the relevant packages are installed, that configuration files render correctly, and that services are enabled. Behavioural correctness is confirmed on staging.

**ADR-025 is the concrete implementation of Level 2/3**: local VM integration testing on `ubongo` (libvirt/KVM, throwaway overlay VMs, stdlib-only driver). It specifically targets the reboot-survivability / host-firewall × Docker / boot-ordering class that Molecule structurally cannot reach. See `docs/decisions/025-local-vm-integration-testing.md`.

---

### CI pipeline

```
push to main
  ├── yamllint + ansible-lint        (fast gate, ~1 min)
  └── molecule test (all roles)      (parallel, ~5 min per role)

on green (main)
  ├── review tf-plan if infra changed; make check on staging
  └── [manual approval] make deploy PLAYBOOK=site on staging

promote to production
  └── [manual approval] make deploy PLAYBOOK=site on production
```

Manual gates are intentional.
Automated tests prove correctness in isolation; a human confirms the change is safe to promote.

---

## Consequences

Drawn from the limitations and trade-offs already stated above:

- The Molecule idempotency step is non-negotiable; every role must pass it cleanly (Three testing levels — Level 1).
- A class of capabilities (nftables rule loading, NetBird mesh data plane, unattended-upgrades behaviour, OPNsense DHCP, Avahi mDNS reflection, hardware passthrough, Corosync cluster formation) cannot be verified in Molecule. These are validated only at Level 2 (staging) or Level 3 (external). This is a conscious, documented decision, not a gap (What Molecule tests — and what it does not).
- The project builds and hosts its own `molecule-debian13` image rather than relying on an external Docker Hub image (e.g. geerlingguy's). It accepts the cost of maintaining a custom image in exchange for avoiding drift, disappearance, or unexpected changes outside project control (Molecule test image).
- Level 4 service-UI acceptance can be authored now, but its execution is deferred pending `ubongo`, the `playwright` plugin, Authentik, and a staging deploy (Three testing levels — Level 4).
- Promotion to staging and to production stays behind intentional manual approval gates. Automation proves isolated correctness, and a human confirms promotion safety (CI pipeline).