# ADR-008 — Testing methodology

> Practical point-of-use pitfalls (nft render checks, Molecule `community.docker`,
> apply-path coverage blind spots) live in `docs/testing/gotchas.md`.

## Status

Accepted (2026-05-30)

## Context

Ansible roles must be idempotent and correct before they touch production hosts.
This document records the testing strategy, what each level covers, and — critically
— what is explicitly out of scope for automated testing and why.

---

## Decision

### Three testing levels

#### Level 1 — Molecule (per role, always required)

Runs in Docker on the control node (`ubongo`) or in CI. Fast (~5 min per role).

**What happens during `molecule test`:**
1. `create` — start the test container
2. `converge` — apply the role via `converge.yml`
3. **`idempotence`** — run `converge.yml` again; fail if any task reports `changed`
4. `verify` — assert expected state via `verify.yml`
5. `destroy` — remove the container

The idempotency step is non-negotiable. Every role must pass it cleanly.

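
The five steps listed above are a trimmed view: Molecule's default `test` sequence also runs steps such as `dependency`, `syntax`, and `prepare`, and Molecule's own name for the second-converge step is `idempotence`. As a hedged sketch, a scenario could pin exactly the steps this ADR names through `scenario.test_sequence` in `molecule.yml`. Whether the project's scaffold actually pins the sequence is an assumption; `default` is simply Molecule's conventional scenario name.

```yaml
# molecule/default/molecule.yml (fragment; pinning the sequence is an assumption)
scenario:
  name: default
  test_sequence:
    - create
    - converge
    - idempotence   # second converge; fails if any task reports changed
    - verify
    - destroy
```
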
**`verify.yml` must assert outcomes, not task success:**

```yaml
# Wrong — only proves the task ran
- assert:
    that: result is success

# Right — proves the outcome exists
- ansible.builtin.command: systemctl is-active fail2ban
  changed_when: false
  register: svc
- ansible.builtin.assert:
    that: svc.stdout == "active"
```

#### Level 2 — Staging playbook (full stack, real VMs)

`make check PLAYBOOK=site` followed by `make deploy PLAYBOOK=site` on
Terraform-provisioned staging VMs. Catches inter-role dependencies and ordering
issues that Molecule cannot see (e.g., the `docker_host` role requires `base` to
have already run and configured the firewall).

Run before every merge to `main`.

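
The ordering dependency mentioned above only shows up when the roles run together. A minimal sketch of how a `site` playbook could encode it, assuming a host group named `docker_hosts`; the group name and the play layout are hypothetical, since this ADR does not show the real playbook or inventory.

```yaml
# playbooks/site.yml (illustrative layout; group name is an assumption)
- name: Baseline every host first
  hosts: all
  become: true
  roles:
    - role: base          # configures the firewall

- name: Then layer Docker on top
  hosts: docker_hosts     # hypothetical group
  become: true
  roles:
    - role: docker_host   # relies on base having already run
```

Molecule converges each role on its own, so only a full `make deploy PLAYBOOK=site` run exercises this order.
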
#### Level 3 — External smoke test from askari

Once `askari` is operational: scripted checks from outside the network confirming
that public-facing services respond correctly. Catches firewall and reverse proxy
configuration issues invisible to Ansible check mode.

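
A minimal sketch of one such scripted check, written as an Ansible play so it fits the existing tooling. It assumes `askari` is an inventory host outside the network and uses a placeholder URL; the real service list, the hostnames, and the choice of Ansible over a plain script are not specified in this ADR.

```yaml
# playbooks/smoke.yml (hypothetical file name)
- name: External smoke test from askari
  hosts: askari
  gather_facts: false
  vars:
    smoke_urls:                        # placeholder; substitute real public endpoints
      - https://service.example.org/
  tasks:
    - name: Public endpoint answers over HTTPS with a valid certificate
      ansible.builtin.uri:
        url: "{{ item }}"
        method: GET
        status_code: 200
        validate_certs: true
        timeout: 10
      loop: "{{ smoke_urls }}"
```

Because the request originates on `askari`, it traverses the public firewall and reverse proxy, which check mode on the target hosts never exercises.
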
#### Level 4 — Service-UI acceptance (Claude-driven exploratory)

A Claude-driven exploratory check of a service's **application UI**, run as
`/verify-service <name>` on `ubongo` (ADR-017). Claude drives Chromium via the
`playwright` plugin against a **staging** deploy, authenticates through the real
Caddy (ADR-024) + Authentik SSO flow using a test user in the staging `test` group, then
executes the service's `roles/<service>/VERIFY.md` acceptance journeys *and*
free-explores, judging pass/fail and screenshotting key states. It writes a dated report
to `docs/testing/reviews/` and hands the operator a manual-test checklist for anything
it can't verify (hardware, paid/external flows, subjective judgment).

Catches application-level regressions no lower level sees ("does PhotoPrism actually
serve photos?"). Placement: after Level 2 (staging deploy), before production
promotion. Exploratory and interactive by design, *not* a deterministic CI/cron gate
(that role belongs to health checks / Uptime Kuma).

**Status:** the skill, the `VERIFY.md` template, and standards are authorable now;
running it is deferred until `ubongo`, the `playwright` plugin, Authentik, and a staging
deploy are in place (STATUS.md). Full design: ADR-017.

---

### Molecule test image

**No external images.** The project builds and hosts its own test image.

- **Source**: `.docker/molecule-debian13/Dockerfile`
- **Base**: `debian:trixie-slim` (official Debian 13, Docker Hub — only external
  dependency permitted here, as the base OS image is not substitutable)
- **Registry**: `forgejo.nyumbani.baobab.band/sjat/molecule-debian13:latest`

Build and push with:

```bash
make molecule-image       # build locally
make molecule-image-push  # push to Forgejo registry (requires docker login)
```

The scaffold `molecule.yml` references this image with `pre_build_image: true`,
meaning Molecule uses the image as-is and does not attempt to build it.

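
For illustration, the relevant part of a scenario's `molecule.yml` might look like the fragment below. Only the image reference, the Docker driver, and `pre_build_image: true` come from this ADR. The platform name and `privileged: true` (suggested by the ADR's note that the Docker daemon needs a privileged container) are assumptions about the scaffold.

```yaml
# molecule/default/molecule.yml (fragment)
driver:
  name: docker
platforms:
  - name: instance            # hypothetical platform name
    image: forgejo.nyumbani.baobab.band/sjat/molecule-debian13:latest
    pre_build_image: true     # use the image as-is; do not build from a Dockerfile
    privileged: true          # assumption: needed for systemd and the Docker daemon
```
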
**Why not geerlingguy/docker-debian13-ansible?** It is a Docker Hub image outside
project control. It is not a Galaxy role, but it is an external dependency that
can drift, disappear, or introduce unexpected changes. The custom image is
functionally equivalent and fully owned.

---

### Idempotency requirements

Every role task must satisfy one of these:

| Task type | Requirement |
|---|---|
| `apt`, `template`, `copy`, `file`, `user`, `group`, `service` | Naturally idempotent — no action needed |
| `command` / `shell` (read-only) | `changed_when: false` |
| `command` / `shell` (detectable change) | `changed_when: result.stdout \| length > 0` or equivalent |
| `command` / `shell` (creates a file) | `creates: /path/to/artifact` |
| Service restart after config change | Move to a handler; handler fires only when notified |
| `docker compose up -d` | Handler only — notified by template change, never runs unconditionally |

ansible-lint enforces most of these at lint time. The Molecule idempotency step
catches anything lint misses.

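
The rows above map onto task patterns like the following. This is a sketch only: the `example` role, its paths, the two scripts, and the handler name are hypothetical stand-ins for a real role's files.

```yaml
# roles/example/tasks/main.yml (hypothetical role)
- name: Render the compose file (naturally idempotent)
  ansible.builtin.template:
    src: compose.yml.j2
    dest: /opt/example/compose.yml
    mode: "0644"
  notify: Apply example stack

- name: Read the kernel release (read-only command)
  ansible.builtin.command: uname -r
  register: kernel
  changed_when: false

- name: Sync settings (hypothetical script that prints only when it changes something)
  ansible.builtin.command: /usr/local/bin/example-sync
  register: result
  changed_when: result.stdout | length > 0

- name: Initialise the data directory once
  ansible.builtin.command:
    cmd: /usr/local/bin/example-init
    creates: /var/lib/example/.initialised

# roles/example/handlers/main.yml
- name: Apply example stack
  ansible.builtin.command:
    cmd: docker compose up -d
    chdir: /opt/example
  changed_when: true
```

On the second converge the template is unchanged, so the handler is never notified and the run reports no changes.
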

---

### What Molecule tests — and what it does not

#### Tested in Molecule

| Capability | Notes |
|---|---|
| Package installation | `apt` works in the container |
| File and directory creation, permissions, ownership | Full support |
| Template rendering and content | Full support |
| User and group management | Full support |
| Service installation and `systemctl enable` | Requires the systemd-capable image |
| Service start/stop | Works for most services in the container |
| SSH configuration file content | File-level only |
| fail2ban installation and configuration | Install and config file; not live banning |
| Docker daemon installation | Works in privileged container |
| auditd installation and configuration | Install and config file |
| Idempotency of all of the above | Enforced by Molecule's idempotency step |

#### Not tested in Molecule — explicit exceptions

The following require a real kernel or real hardware and are validated only at
Level 2 (staging) or Level 3 (external). This is a conscious, documented decision
— not a gap.

| Capability | Reason not testable in Molecule |
|---|---|
| `nftables` rule loading | Requires `nf_tables` kernel module; not available in Docker |
| **Reboot-survivability / host-firewall × Docker interaction / boot-ordering** | **Requires a real kernel reboot — the class that caused the 2026-06-17 mesh-hardening incident. Now covered by local VM integration testing (ADR-025).** |
| NetBird mesh data plane (`wt0` WireGuard interface) | Requires the `wireguard` kernel module; Molecule checks only that the agent is installed/configured (ADR-016) |
| `unattended-upgrades` behaviour | Installs correctly; actual upgrade behaviour requires a real apt environment |
| DHCP behaviour (OPNsense) | OPNsense is managed by Ansible but not testable in a container |
| mDNS reflector (Avahi cross-VLAN) | Requires real network interfaces and VLANs |
| Hardware passthrough (NIC, USB) | Not applicable in containers |
| Corosync cluster formation | Requires multiple real nodes |

For the above, Molecule tests only what it can: that the relevant packages are
installed, that configuration files render correctly, and that services are enabled.
Behavioural correctness is confirmed on staging.

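
A sketch of what such config-level checks might look like for `nftables` in a role's `verify.yml`: package present, config file rendered, service enabled, with no attempt to load rules. The config path `/etc/nftables.conf` (Debian's default) and the exact assertions are assumptions, not taken from the project's roles.

```yaml
# molecule/default/verify.yml (illustrative; path and assertions are assumptions)
- name: Verify nftables at config level only
  hosts: all
  gather_facts: false
  tasks:
    - name: Collect package facts   # assumes python3-apt is present in the test image
      ansible.builtin.package_facts:
        manager: auto

    - name: Collect service facts
      ansible.builtin.service_facts:

    - name: Stat the rendered ruleset file
      ansible.builtin.stat:
        path: /etc/nftables.conf
      register: ruleset

    - name: Package installed, config rendered, service enabled
      ansible.builtin.assert:
        that:
          - "'nftables' in ansible_facts.packages"
          - ruleset.stat.exists
          - ansible_facts.services['nftables.service'].status == 'enabled'
```
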
**ADR-025 is the concrete build of Level 2/3** — local VM integration testing on
ubongo (libvirt/KVM, throwaway overlay VMs, stdlib-only driver). It specifically
targets the reboot-survivability / host-firewall × Docker / boot-ordering class that
Molecule structurally cannot reach. See `docs/decisions/025-local-vm-integration-testing.md`.

---

### CI pipeline

```
push to main
  ├── yamllint + ansible-lint        (fast gate, ~1 min)
  └── molecule test (all roles)      (parallel, ~5 min per role)

on green (main)
  ├── review tf-plan if infra changed; make check on staging
  └── [manual approval] make deploy PLAYBOOK=site on staging

promote to production
  └── [manual approval] make deploy PLAYBOOK=site on production
```

Manual gates are intentional. Automated tests prove correctness in isolation;
a human confirms the change is safe to promote.

---

## Consequences

Drawn from the limitations and trade-offs already stated above:

- The Molecule idempotency step is non-negotiable; every role must pass it cleanly
  (Three testing levels — Level 1).
- A class of capabilities (nftables rule loading, NetBird mesh data plane,
  unattended-upgrades behaviour, OPNsense DHCP, Avahi mDNS reflection, hardware
  passthrough, corosync cluster formation) cannot be verified in Molecule and is
  validated only at Level 2 (staging) or Level 3 (external) — a conscious,
  documented decision, not a gap (What Molecule tests — and what it does not).
- The project builds and hosts its own `molecule-debian13` image rather than relying
  on an external Docker Hub image (e.g. geerlingguy), accepting the maintenance of a
  custom image to avoid drift, disappearance, or unexpected changes outside project
  control (Molecule test image).
- Level 4 service-UI acceptance is authorable now but its execution is deferred,
  pending `ubongo`, the `playwright` plugin, Authentik, and a staging deploy (Three
  testing levels — Level 4).
- Promotion to staging and to production stays behind intentional manual approval
  gates; automation proves isolated correctness, a human confirms promotion safety
  (CI pipeline).