ADR-008 — Testing methodology
Practical point-of-use pitfalls (nft render checks, Molecule
community.docker, apply-path coverage blind spots) live in docs/testing/gotchas.md.
Status
Accepted (2026-05-30)
Context
Ansible roles must be idempotent and correct before they touch production hosts. This document records the testing strategy, what each level covers, and — critically — what is explicitly out of scope for automated testing and why.
Decision
Three testing levels
Level 1 — Molecule (per role, always required)
Runs in Docker on the control node (ubongo) or in CI. Fast (~5 min per role).
What happens during molecule test:
1. create: start the test container
2. converge: apply the role via converge.yml
3. idempotency: run converge.yml again; fail if any task reports changed
4. verify: assert expected state via verify.yml
5. destroy: remove the container
The idempotency step is non-negotiable. Every role must pass it cleanly.
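The sequence above can be pinned explicitly in a scenario's molecule.yml. This is a minimal sketch, assuming the scaffold restricts Molecule to these five steps rather than relying on its longer default sequence. Note that Molecule itself names the action `idempotence`.

```yaml
# molecule/default/molecule.yml (fragment; assumes the default scenario layout)
scenario:
  test_sequence:
    - create
    - converge
    - idempotence   # second converge run; fails if any task reports "changed"
    - verify
    - destroy
```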
verify.yml must assert outcomes, not task success:
# Wrong: only proves the task ran
- assert:
    that: result is success

# Right: proves the outcome exists
- ansible.builtin.command: systemctl is-active fail2ban
  changed_when: false
  register: svc
- ansible.builtin.assert:
    that: svc.stdout == "active"
Level 2 — Staging playbook (full stack, real VMs)
make check PLAYBOOK=site followed by make deploy PLAYBOOK=site on
Terraform-provisioned staging VMs. Catches inter-role dependencies and ordering
issues that Molecule cannot see (e.g., the docker_host role requires base to
have already run and configured the firewall).
Run before every merge to main.
Level 3 — External smoke test from askari
Once askari is operational: scripted checks from outside the network confirming
that public-facing services respond correctly. Catches firewall and reverse proxy
configuration issues invisible to Ansible check mode.
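One possible shape for such a check is sketched below, assuming the smoke test is written as an Ansible play that runs on askari and probes public endpoints with ansible.builtin.uri. The inventory name `askari`, the hostnames, and the expected status codes are placeholders, not values taken from this project.

```yaml
# Hypothetical smoke-test play; hostnames and status codes are placeholders.
- name: External smoke test from askari
  hosts: askari
  gather_facts: false
  tasks:
    - name: Public HTTPS endpoints answer with the expected status
      ansible.builtin.uri:
        url: "https://{{ item.host }}/"
        status_code: "{{ item.status }}"
        follow_redirects: none   # judge the first response, not the redirect target
        timeout: 10
      loop:
        - { host: service.example.org, status: 200 }   # served directly by the reverse proxy
        - { host: private.example.org, status: 302 }   # expected to redirect to SSO login
```

The task fails when an endpoint is unreachable or returns a different status, which is how a firewall or reverse proxy misconfiguration would surface from outside the network.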
Level 4 — Service-UI acceptance (Claude-driven exploratory)
A Claude-driven exploratory check of a service's application UI, run as
/verify-service <name> on ubongo (ADR-017). Claude drives Chromium via the
playwright plugin against a staging deploy, authenticates through the real
Caddy (ADR-024) + Authentik SSO flow using a test user in the staging test group, then
executes the service's roles/<service>/VERIFY.md acceptance journeys and
free-explores — judging pass/fail, screenshotting key states. It writes a dated report
to docs/testing/reviews/ and hands the operator a manual-test checklist for anything
it can't verify (hardware, paid/external flows, subjective judgment).
Catches application-level regressions no lower level sees ("does PhotoPrism actually serve photos?"). Placement: after Level 2 (staging deploy), before production promotion. Exploratory and interactive by design — not a deterministic CI/cron gate (that role belongs to health checks / Uptime Kuma).
Status: the skill, the VERIFY.md template, and standards are authorable now;
running it is deferred until ubongo, the playwright plugin, Authentik, and a staging
deploy are in place (STATUS.md). Full design: ADR-017.
Molecule test image
No external images. The project builds and hosts its own test image.
Source: .docker/molecule-debian13/Dockerfile
Base: debian:trixie-slim (official Debian 13, Docker Hub — only external
dependency permitted here, as the base OS image is not substitutable)
Registry: forgejo.nyumbani.baobab.band/sjat/molecule-debian13:latest
Build and push with:
make molecule-image # build locally
make molecule-image-push # push to Forgejo registry (requires docker login)
The scaffold molecule.yml references this image with pre_build_image: true,
meaning Molecule uses the image as-is and does not attempt to build it.
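A platform entry along these lines would match that description. It is a sketch only: the instance name and the `privileged` setting are assumptions (the latter inferred from the note that Docker daemon installation needs a privileged container), and any systemd-specific options depend on how the image is built.

```yaml
# molecule/default/molecule.yml (fragment; instance name is hypothetical)
driver:
  name: docker
platforms:
  - name: instance
    image: forgejo.nyumbani.baobab.band/sjat/molecule-debian13:latest
    pre_build_image: true   # use the image as-is; Molecule does not build it
    privileged: true        # assumption: needed for roles that install the Docker daemon
provisioner:
  name: ansible
verifier:
  name: ansible
```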
Why not geerlingguy/docker-debian13-ansible? It is a Docker Hub image outside project control. It is not a Galaxy role, but it is an external dependency that can drift, disappear, or introduce unexpected changes. The custom image is functionally equivalent and fully owned.
Idempotency requirements
Every role task must satisfy one of these:
| Task type | Requirement |
|---|---|
| `apt`, `template`, `copy`, `file`, `user`, `group`, `service` | Naturally idempotent; no action needed |
| `command` / `shell` (read-only) | `changed_when: false` |
| `command` / `shell` (detectable change) | `changed_when: result.stdout \| length > 0` or equivalent |
| `command` / `shell` (creates a file) | `creates: /path/to/artifact` |
| Service restart after config change | Move to a handler; handler fires only when notified |
| `docker compose up -d` | Handler only; notified by template change, never runs unconditionally |
ansible-lint enforces most of these at lint time. The Molecule idempotency step catches anything lint misses.
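The rules above might look like this in a role. This is an illustrative sketch: the role name, paths, the `example-migrate` and `init-data` scripts, and the handler name are all hypothetical.

```yaml
# roles/example/tasks/main.yml (hypothetical role)
- name: Check the installed Compose version (read-only)
  ansible.builtin.command: docker compose version
  register: compose_version
  changed_when: false

- name: Apply pending migrations (script prints output only when it changes something)
  ansible.builtin.command: /usr/local/bin/example-migrate
  register: result
  changed_when: result.stdout | length > 0

- name: Initialise the data directory once (skipped when the marker file exists)
  ansible.builtin.command:
    cmd: /usr/local/bin/init-data --target /srv/example
    creates: /srv/example/.initialised

- name: Render the compose file (notifies the handler only when the file changes)
  ansible.builtin.template:
    src: docker-compose.yml.j2
    dest: /opt/example/docker-compose.yml
    mode: "0644"
  notify: Restart example stack
---
# roles/example/handlers/main.yml (hypothetical role)
- name: Restart example stack
  ansible.builtin.command:
    cmd: docker compose up -d
    chdir: /opt/example
  changed_when: true   # runs only when notified, so a run always counts as a change
```

On the second converge run the template is unchanged, the handler is never notified, and every task reports `ok`, which is what the Molecule idempotency step expects.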
What Molecule tests — and what it does not
Tested in Molecule
| Capability | Notes |
|---|---|
| Package installation | apt works in the container |
| File and directory creation, permissions, ownership | Full support |
| Template rendering and content | Full support |
| User and group management | Full support |
| Service installation and systemd enable | Requires the systemd-capable image |
| Service start/stop | Works for most services in the container |
| SSH configuration file content | File-level only |
| fail2ban installation and configuration | Install and config file; not live banning |
| Docker daemon installation | Works in privileged container |
| auditd installation and configuration | Install and config file |
| Idempotency of all of the above | Enforced by Molecule's idempotency step |
Not tested in Molecule — explicit exceptions
The following require a real kernel or real hardware and are validated only at Level 2 (staging) or Level 3 (external). This is a conscious, documented decision — not a gap.
| Capability | Reason not testable in Molecule |
|---|---|
| nftables rule loading | Requires nf_tables kernel module; not available in Docker |
| Reboot-survivability / host-firewall × Docker interaction / boot-ordering | Requires a real kernel reboot — the class that caused the 2026-06-17 mesh-hardening incident. Now covered by local VM integration testing (ADR-025). |
| NetBird mesh data plane (wt0 WireGuard interface) | Requires the wireguard kernel module; Molecule checks only that the agent is installed/configured (ADR-016) |
| unattended-upgrades behaviour | Installs correctly; actual upgrade behaviour requires a real apt environment |
| DHCP behaviour (OPNsense) | OPNsense is managed by Ansible but not testable in a container |
| mDNS reflector (Avahi cross-VLAN) | Requires real network interfaces and VLANs |
| Hardware passthrough (NIC, USB) | Not applicable in containers |
| Corosync cluster formation | Requires multiple real nodes |
For the above, Molecule tests only what it can: that the relevant packages are installed, that configuration files render correctly, and that services are enabled. Behavioural correctness is confirmed on staging.
ADR-025 is the concrete build of Level 2/3 — local VM integration testing on
ubongo (libvirt/KVM, throwaway overlay VMs, stdlib-only driver). It specifically
targets the reboot-survivability / host-firewall × Docker / boot-ordering class that
Molecule structurally cannot reach. See docs/decisions/025-local-vm-integration-testing.md.
CI pipeline
push to main
├── yamllint + ansible-lint (fast gate, ~1 min)
└── molecule test (all roles) (parallel, ~5 min per role)
on green (main)
├── review tf-plan if infra changed; make check on staging
└── [manual approval] make deploy PLAYBOOK=site on staging
promote to production
└── [manual approval] make deploy PLAYBOOK=site on production
Manual gates are intentional. Automated tests prove correctness in isolation; a human confirms the change is safe to promote.
Consequences
Drawn from the limitations and trade-offs already stated above:
- The Molecule idempotency step is non-negotiable; every role must pass it cleanly (Three testing levels — Level 1).
- A class of capabilities (nftables rule loading, NetBird mesh data plane, unattended-upgrades behaviour, OPNsense DHCP, Avahi mDNS reflection, hardware passthrough, corosync cluster formation) cannot be verified in Molecule and is validated only at Level 2 (staging) or Level 3 (external) — a conscious, documented decision, not a gap (What Molecule tests — and what it does not).
- The project builds and hosts its own molecule-debian13 image rather than relying on an external Docker Hub image (e.g. geerlingguy), accepting the maintenance of a custom image to avoid drift, disappearance, or unexpected changes outside project control (Molecule test image).
- Level 4 service-UI acceptance is authorable now but its execution is deferred, pending ubongo, the playwright plugin, Authentik, and a staging deploy (Three testing levels, Level 4).
- Promotion to staging and to production stays behind intentional manual approval gates; automation proves isolated correctness, a human confirms promotion safety (CI pipeline).