boma/docs/decisions/008-testing.md
sjat 4732730515 docs: wire ADR-025 into testing/control-host/risks/status/capacity
- ADR-008: add reboot-survivability gap row + ADR-025 pointer to the
  "not tested in Molecule" table
- ADR-015: reconcile "not a hypervisor" with ephemeral KVM test VMs
  (ADR-025); note ~3 GiB test-VM RAM against the 16 GiB sizing
- accepted-risks: add R6 (le-prod-wildcard PAT + transient TXT records)
- CLAUDE.md: add make test-integration[/-clean] to key-commands;
  add ADR-025 + runbook rows to further-reading
- hardware/reference.md: note one ephemeral KVM test VM on ubongo
- STATUS.md: add integration harness entry (built, lint+pytest clean;
  RED/GREEN acceptance PENDING ubongo live pass); TODO 2.4 stays open

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 12:51:22 +02:00


ADR-008 — Testing methodology

Practical point-of-use pitfalls (nft render checks, Molecule community.docker, apply-path coverage blind spots) live in docs/testing/gotchas.md.

Status

Accepted (2026-05-30)

Context

Ansible roles must be idempotent and correct before they touch production hosts. This document records the testing strategy, what each level covers, and — critically — what is explicitly out of scope for automated testing and why.


Decision

Four testing levels

Level 1 — Molecule (per role, always required)

Runs in Docker on the control node (ubongo) or in CI. Fast (~5 min per role).

What happens during molecule test:

  1. create — start the test container
  2. converge — apply the role via converge.yml
  3. idempotency — run converge.yml again; fail if any task reports changed
  4. verify — assert expected state via verify.yml
  5. destroy — remove the container

The idempotency step is non-negotiable. Every role must pass it cleanly.
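The five steps above can be pinned explicitly in a scenario's molecule.yml. The following is a minimal sketch, assuming the scaffold does not already set its own sequence; Molecule's default sequence contains additional steps (such as syntax and prepare), and Molecule's own name for the idempotency step is idempotence.

```yaml
# molecule/default/molecule.yml (excerpt; sketch, not the project's actual scaffold)
scenario:
  test_sequence:
    - create       # start the test container
    - converge     # apply the role via converge.yml
    - idempotence  # second converge run; fails if any task reports changed
    - verify       # assert expected state via verify.yml
    - destroy      # remove the container
```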

verify.yml must assert outcomes, not task success:

# Wrong — only proves the task ran
- assert:
    that: result is success

# Right — proves the outcome exists
- ansible.builtin.command: systemctl is-active fail2ban
  changed_when: false
  register: svc
- ansible.builtin.assert:
    that: svc.stdout == "active"

Level 2 — Staging playbook (full stack, real VMs)

Run make check PLAYBOOK=site, then make deploy PLAYBOOK=site, against Terraform-provisioned staging VMs. This catches inter-role dependencies and ordering issues that Molecule cannot see (for example, the docker_host role requires base to have already run and configured the firewall).

Run before every merge to main.
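The ordering dependency mentioned above can be illustrated with a site playbook sketch. The role names base and docker_host come from the text; the file path, play layout, and the docker_hosts inventory group are assumptions made for illustration only.

```yaml
# playbooks/site.yml (hypothetical path and layout)
- name: Baseline for every host
  hosts: all
  become: true
  roles:
    - role: base          # runs first and configures the firewall

- name: Docker hosts
  hosts: docker_hosts     # hypothetical inventory group
  become: true
  roles:
    - role: docker_host   # relies on base having already run
```

A per-role Molecule scenario converges docker_host in isolation, so a wrong play order like this only shows up when the whole playbook runs on staging.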

Level 3 — External smoke test from askari

Once askari is operational, scripted checks run from outside the network confirm that public-facing services respond correctly. This catches firewall and reverse-proxy configuration issues that are invisible to Ansible check mode.
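One possible shape for such a scripted check is a small playbook executed on askari. This is a sketch only: the playbook name, URL, host, and port are placeholders, and the document does not say which tool the real checks will use.

```yaml
# smoke.yml (hypothetical playbook; tasks execute on askari, outside the network)
- name: External smoke test
  hosts: askari
  gather_facts: false
  tasks:
    - name: Public HTTPS endpoint answers with 200 through the reverse proxy
      ansible.builtin.uri:
        url: "https://photos.example.org/"   # placeholder URL
        status_code: 200
        timeout: 10
        validate_certs: true

    - name: An internal-only port is not reachable from outside
      ansible.builtin.wait_for:
        host: photos.example.org             # placeholder host
        port: 2375                           # placeholder port
        state: stopped                       # succeeds only if the connection attempt fails
        timeout: 5
```

The second task checks the negative case: a port the firewall should block must fail to connect, which check mode on the target host cannot show.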

Level 4 — Service-UI acceptance (Claude-driven exploratory)

A Claude-driven exploratory check of a service's application UI, run as /verify-service <name> on ubongo (ADR-017). Claude drives Chromium via the playwright plugin against a staging deploy, authenticates through the real Caddy (ADR-024) + Authentik SSO flow using a test user in the staging test group, then executes the service's roles/<service>/VERIFY.md acceptance journeys and free-explores — judging pass/fail, screenshotting key states. It writes a dated report to docs/testing/reviews/ and hands the operator a manual-test checklist for anything it can't verify (hardware, paid/external flows, subjective judgment).

Catches application-level regressions no lower level sees ("does PhotoPrism actually serve photos?"). Placement: after Level 2 (staging deploy), before production promotion. Exploratory and interactive by design — not a deterministic CI/cron gate (that role belongs to health checks / Uptime Kuma).

Status: the skill, the VERIFY.md template, and the standards can be authored now; running it is deferred until ubongo, the playwright plugin, Authentik, and a staging deploy are all available (see STATUS.md). Full design: ADR-017.


Molecule test image

No external images. The project builds and hosts its own test image.

Source: .docker/molecule-debian13/Dockerfile
Base: debian:trixie-slim (official Debian 13 image on Docker Hub; the only external dependency permitted here, because the base OS image is not substitutable)
Registry: forgejo.nyumbani.baobab.band/sjat/molecule-debian13:latest

Build and push with:

make molecule-image        # build locally
make molecule-image-push   # push to Forgejo registry (requires docker login)

The scaffold molecule.yml references this image with pre_build_image: true, meaning Molecule uses the image as-is and does not attempt to build it.
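A platform entry along these lines would match that description. This is a sketch, not the project's scaffold: the instance name is hypothetical, and the privileged and command settings are assumptions inferred from the systemd and Docker-daemon notes elsewhere in this ADR.

```yaml
# molecule/default/molecule.yml (excerpt; sketch only)
driver:
  name: docker
platforms:
  - name: instance                  # hypothetical container name
    image: forgejo.nyumbani.baobab.band/sjat/molecule-debian13:latest
    pre_build_image: true           # use the image as-is, do not build it
    privileged: true                # assumption: needed for systemd and Docker-in-container
    command: /lib/systemd/systemd   # assumption: boot systemd as PID 1
provisioner:
  name: ansible
verifier:
  name: ansible
```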

Why not geerlingguy/docker-debian13-ansible? It is a Docker Hub image outside project control. It is not a Galaxy role, but it is an external dependency that can drift, disappear, or introduce unexpected changes. The custom image is functionally equivalent and fully owned.


Idempotency requirements

Every role task must satisfy one of these:

| Task type | Requirement |
| --- | --- |
| apt, template, copy, file, user, group, service | Naturally idempotent; no action needed |
| command / shell (read-only) | changed_when: false |
| command / shell (detectable change) | changed_when: result.stdout \| length > 0, or equivalent |
| command / shell (creates a file) | creates: /path/to/artifact |
| Service restart after config change | Move to a handler; the handler fires only when notified |
| docker compose up -d | Handler only; notified by a template change, never run unconditionally |

ansible-lint enforces most of these at lint time. The Molecule idempotency step catches anything lint misses.
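The patterns in the table can be sketched as follows. The role name, paths, and commands are hypothetical examples chosen for illustration; they are not taken from the project's roles.

```yaml
# roles/example/tasks/main.yml (hypothetical role)
- name: Read the active timezone (read-only, never reports changed)
  ansible.builtin.command: timedatectl show --property=Timezone --value
  register: tz
  changed_when: false

- name: Generate an SSH host key once (skipped when the file already exists)
  ansible.builtin.command:
    cmd: ssh-keygen -t ed25519 -N "" -f /etc/ssh/ssh_host_ed25519_key
    creates: /etc/ssh/ssh_host_ed25519_key

- name: Render the compose file (notifies the handler only when the file changes)
  ansible.builtin.template:
    src: compose.yml.j2
    dest: /opt/example/compose.yml
    mode: "0644"
  notify: Start example stack

---
# roles/example/handlers/main.yml
- name: Start example stack
  ansible.builtin.command:
    cmd: docker compose up -d
    chdir: /opt/example
  changed_when: true   # a handler runs only when notified, so a change is expected
```

On the second converge the template is unchanged, the handler is never notified, and the idempotency step passes.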


What Molecule tests — and what it does not

Tested in Molecule

| Capability | Notes |
| --- | --- |
| Package installation | apt works in the container |
| File and directory creation, permissions, ownership | Full support |
| Template rendering and content | Full support |
| User and group management | Full support |
| Service installation and systemd enable | Requires the systemd-capable image |
| Service start/stop | Works for most services in the container |
| SSH configuration file content | File-level only |
| fail2ban installation and configuration | Install and config file; not live banning |
| Docker daemon installation | Works in privileged container |
| auditd installation and configuration | Install and config file |
| Idempotency of all of the above | Enforced by Molecule's idempotency step |

Not tested in Molecule — explicit exceptions

The following require a real kernel or real hardware and are validated only at Level 2 (staging) or Level 3 (external). This is a conscious, documented decision — not a gap.

| Capability | Reason not testable in Molecule |
| --- | --- |
| nftables rule loading | Requires nf_tables kernel module; not available in Docker |
| Reboot-survivability / host-firewall × Docker interaction / boot-ordering | Requires a real kernel reboot; this is the class that caused the 2026-06-17 mesh-hardening incident. Now covered by local VM integration testing (ADR-025). |
| NetBird mesh data plane (wt0 WireGuard interface) | Requires the wireguard kernel module; Molecule checks only that the agent is installed/configured (ADR-016) |
| unattended-upgrades behaviour | Installs correctly; actual upgrade behaviour requires a real apt environment |
| DHCP behaviour (OPNsense) | OPNsense is managed by Ansible but not testable in a container |
| mDNS reflector (Avahi cross-VLAN) | Requires real network interfaces and VLANs |
| Hardware passthrough (NIC, USB) | Not applicable in containers |
| Corosync cluster formation | Requires multiple real nodes |

For the above, Molecule tests only what it can: that the relevant packages are installed, that configuration files render correctly, and that services are enabled. Behavioural correctness is confirmed on staging.
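For nftables, such a reduced check might look like the sketch below. It assumes the role manages the Debian default path /etc/nftables.conf and that the rendered ruleset contains a table named inet filter; both are illustrative assumptions. Nothing here loads the ruleset into the kernel.

```yaml
# molecule/default/verify.yml (excerpt; sketch under the assumptions above)
- name: Gather installed packages
  ansible.builtin.package_facts:
    manager: auto

- name: Read the rendered ruleset file
  ansible.builtin.slurp:
    src: /etc/nftables.conf          # assumed path (Debian default)
  register: nft_conf

- name: Check whether the service is enabled at boot
  ansible.builtin.command: systemctl is-enabled nftables
  changed_when: false
  register: nft_enabled

- name: Assert install, rendered content, and enablement (not rule loading)
  ansible.builtin.assert:
    that:
      - "'nftables' in ansible_facts.packages"
      - "'table inet filter' in (nft_conf.content | b64decode)"  # assumed table name
      - nft_enabled.stdout == "enabled"
```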

ADR-025 is the concrete build of Level 2/3 — local VM integration testing on ubongo (libvirt/KVM, throwaway overlay VMs, stdlib-only driver). It specifically targets the reboot-survivability / host-firewall × Docker / boot-ordering class that Molecule structurally cannot reach. See docs/decisions/025-local-vm-integration-testing.md.


CI pipeline

push to main
  ├── yamllint + ansible-lint          (fast gate, ~1 min)
  └── molecule test (all roles)        (parallel, ~5 min per role)

on green (main)
  ├── review tf-plan if infra changed; make check on staging
  └── [manual approval] make deploy PLAYBOOK=site on staging

promote to production
  └── [manual approval] make deploy PLAYBOOK=site on production

Manual gates are intentional. Automated tests prove correctness in isolation; a human confirms the change is safe to promote.
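The automated part of the pipeline (the two jobs under "push to main") could be expressed as a workflow like the one below. This is a sketch that assumes Forgejo Actions with a runner labelled docker that already has yamllint, ansible-lint, Molecule, and Docker available. The file name and the role list are hypothetical, and the manual approval gates are deliberately not modelled.

```yaml
# .forgejo/workflows/test.yml (hypothetical file; assumes Forgejo Actions)
name: test
on:
  push:
    branches: [main]

jobs:
  lint:
    runs-on: docker                  # assumed runner label
    steps:
      - uses: actions/checkout@v4
      - run: yamllint .
      - run: ansible-lint

  molecule:
    runs-on: docker
    strategy:
      fail-fast: false               # let every role report its own result
      matrix:
        role: [base, docker_host]    # illustrative list; role names taken from the text
    steps:
      - uses: actions/checkout@v4
      - run: molecule test
        working-directory: roles/${{ matrix.role }}
```

The two jobs have no dependency on each other, so they run in parallel, and the matrix gives each role its own run.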


Consequences

Drawn from the limitations and trade-offs already stated above:

  • The Molecule idempotency step is non-negotiable; every role must pass it cleanly (Decision, Level 1).
  • A class of capabilities (nftables rule loading, NetBird mesh data plane, unattended-upgrades behaviour, OPNsense DHCP, Avahi mDNS reflection, hardware passthrough, corosync cluster formation) cannot be verified in Molecule and is validated only at Level 2 (staging) or Level 3 (external) — a conscious, documented decision, not a gap (What Molecule tests — and what it does not).
  • The project builds and hosts its own molecule-debian13 image rather than relying on an external Docker Hub image (e.g. geerlingguy), accepting the maintenance of a custom image to avoid drift, disappearance, or unexpected changes outside project control (Molecule test image).
  • Level 4 service-UI acceptance can be authored now, but its execution is deferred pending ubongo, the playwright plugin, Authentik, and a staging deploy (Decision, Level 4).
  • Promotion to staging and to production stays behind intentional manual approval gates; automation proves isolated correctness, a human confirms promotion safety (CI pipeline).