boma/docs/decisions/008-testing.md
# ADR-008 — Testing methodology
> Practical point-of-use pitfalls (nft render checks, Molecule `community.docker`,
> apply-path coverage blind spots) live in `docs/testing/gotchas.md`.
## Status
Accepted (2026-05-30)
## Context
Ansible roles must be idempotent and correct before they touch production hosts.
This document records the testing strategy, what each level covers, and — critically
— what is explicitly out of scope for automated testing and why.
---
## Decision
### Three testing levels
#### Level 1 — Molecule (per role, always required)
Runs in Docker on the control node (`ubongo`) or in CI. Fast (~5 min per role).

**What happens during `molecule test`:**

1. `create`: start the test container
2. `converge`: apply the role via `converge.yml`
3. **`idempotence`**: run `converge.yml` again; fail if any task reports `changed`
4. `verify`: assert expected state via `verify.yml`
5. `destroy`: remove the container

The idempotency step is non-negotiable. Every role must pass it cleanly.

**`verify.yml` must assert outcomes, not task success:**
```yaml
# Wrong: only proves the task ran
- assert:
    that: result is success

# Right: proves the outcome exists
- ansible.builtin.command: systemctl is-active fail2ban
  changed_when: false
  register: svc
- ansible.builtin.assert:
    that: svc.stdout == "active"
```
#### Level 2 — Staging playbook (full stack, real VMs)
`make check PLAYBOOK=site` followed by `make deploy PLAYBOOK=site` on
Terraform-provisioned staging VMs. Catches inter-role dependencies and ordering
issues that Molecule cannot see (e.g., `docker_host` role requires `base` to
have already run and configured the firewall).
Run before every merge to `main`.
#### Level 3 — External smoke test from askari
Once `askari` is operational: scripted checks from outside the network confirming
that public-facing services respond correctly. Catches firewall and reverse proxy
configuration issues invisible to Ansible check mode.
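A Level 3 check could be sketched as a small playbook that runs on `askari` and requests each public endpoint from outside the network. This is only an illustration: the file name, the `smoke_urls` variable, and the URL are placeholders, not taken from this repository, and it assumes `askari` is an inventory host.

```yaml
# smoke.yml (hypothetical): external smoke test executed from askari.
# The URL below is a placeholder; substitute the real public endpoints.
- name: External smoke test of public-facing services
  hosts: askari
  gather_facts: false
  vars:
    smoke_urls:
      - https://service.example.org/
  tasks:
    - name: Public endpoint answers over HTTPS with a valid certificate
      ansible.builtin.uri:
        url: "{{ item }}"
        method: GET
        status_code: 200        # anything else fails the task
        validate_certs: true    # also catches reverse-proxy certificate problems
        timeout: 10
      loop: "{{ smoke_urls }}"
```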
#### Level 4 — Service-UI acceptance (Claude-driven exploratory)
A Claude-driven exploratory check of a service's **application UI**, run as
`/verify-service <name>` on `ubongo` (ADR-017). Claude drives Chromium via the
`playwright` plugin against a **staging** deploy, authenticates through the real
Caddy (ADR-024) + Authentik SSO flow using a test user in the staging `test` group, then
executes the service's `roles/<service>/VERIFY.md` acceptance journeys *and*
free-explores — judging pass/fail, screenshotting key states. It writes a dated report
to `docs/testing/reviews/` and hands the operator a manual-test checklist for anything
it can't verify (hardware, paid/external flows, subjective judgment).
Catches application-level regressions no lower level sees ("does PhotoPrism actually
serve photos?"). Placement: after Level 2 (staging deploy), before production
promotion. Exploratory and interactive by design — *not* a deterministic CI/cron gate
(that role belongs to health checks / Uptime Kuma).
**Status:** the skill, the `VERIFY.md` template, and standards can be authored now;
running it is deferred until `ubongo`, the `playwright` plugin, Authentik, and a staging
deploy are in place (STATUS.md). Full design: ADR-017.

---
### Molecule test image
**No external images.** The project builds and hosts its own test image.

**Source**: `.docker/molecule-debian13/Dockerfile`

**Base**: `debian:trixie-slim` (official Debian 13 image on Docker Hub; the only
external dependency permitted here, as the base OS image is not substitutable)

**Registry**: `forgejo.nyumbani.baobab.band/sjat/molecule-debian13:latest`

Build and push with:
```bash
make molecule-image # build locally
make molecule-image-push # push to Forgejo registry (requires docker login)
```
The scaffold `molecule.yml` references this image with `pre_build_image: true`,
meaning Molecule uses the image as-is and does not attempt to build it.
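For orientation, a scenario file along these lines would produce the behaviour described above. It is a sketch, not the project's actual scaffold: the platform name, the `privileged` setting, and the trimmed `test_sequence` are assumptions. Note that Molecule names the idempotency action `idempotence`.

```yaml
# molecule/default/molecule.yml (illustrative sketch, not the real scaffold)
driver:
  name: docker
platforms:
  - name: instance            # hypothetical container name
    image: forgejo.nyumbani.baobab.band/sjat/molecule-debian13:latest
    pre_build_image: true     # use the image as-is; do not build it
    privileged: true          # assumption: needed for systemd and Docker roles
provisioner:
  name: ansible
verifier:
  name: ansible               # runs verify.yml
scenario:
  test_sequence:              # assumption: trimmed to the five steps listed in Level 1
    - create
    - converge
    - idempotence
    - verify
    - destroy
```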
**Why not geerlingguy/docker-debian13-ansible?** It is a Docker Hub image outside
project control. It is not a Galaxy role, but it is an external dependency that
can drift, disappear, or introduce unexpected changes. The custom image is
functionally equivalent and fully owned.
---
### Idempotency requirements
Every role task must satisfy one of these:

| Task type | Requirement |
|---|---|
| `apt`, `template`, `copy`, `file`, `user`, `group`, `service` | Naturally idempotent; no action needed |
| `command` / `shell` (read-only) | `changed_when: false` |
| `command` / `shell` (detectable change) | `changed_when: result.stdout \| length > 0` or equivalent |
| `command` / `shell` (creates a file) | `creates: /path/to/artifact` |
| Service restart after config change | Move to a handler; handler fires only when notified |
| `docker compose up -d` | Handler only: notified by a template change, never run unconditionally |

ansible-lint enforces most of these at lint time. The Molecule idempotency step
catches anything lint misses.
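The rows above might look like this in a role. All task names, paths, and the `init-data` command are hypothetical and exist only to illustrate the patterns; the handler is shown in its own block because it would live in `handlers/main.yml`.

```yaml
# tasks/main.yml (sketch; names, paths, and commands are hypothetical)
- name: Read the current default target        # read-only command
  ansible.builtin.command: systemctl get-default
  register: default_target
  changed_when: false

- name: Initialise the data directory once      # command that creates a file
  ansible.builtin.command:
    cmd: /usr/local/bin/init-data --dir /var/lib/example
    creates: /var/lib/example/.initialised      # skipped when this file exists

- name: Render the compose file                 # naturally idempotent module
  ansible.builtin.template:
    src: compose.yml.j2
    dest: /opt/example/compose.yml
    mode: "0644"
  notify: Restart example stack                 # fires only if the file changed
```

```yaml
# handlers/main.yml (sketch)
- name: Restart example stack
  ansible.builtin.command:
    cmd: docker compose up -d
    chdir: /opt/example
  changed_when: true    # a handler runs only when notified, so this is always a change
```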
---
### What Molecule tests — and what it does not
#### Tested in Molecule
| Capability | Notes |
|---|---|
| Package installation | `apt` works in the container |
| File and directory creation, permissions, ownership | Full support |
| Template rendering and content | Full support |
| User and group management | Full support |
| Service installation and `systemctl enable` | Requires the systemd-capable image |
| Service start/stop | Works for most services in the container |
| SSH configuration file content | File-level only |
| fail2ban installation and configuration | Install and config file; not live banning |
| Docker daemon installation | Works in privileged container |
| auditd installation and configuration | Install and config file |
| Idempotency of all of the above | Enforced by Molecule's idempotency step |
#### Not tested in Molecule — explicit exceptions
The following require a real kernel or real hardware and are validated only at
Level 2 (staging) or Level 3 (external). This is a conscious, documented decision
— not a gap.
| Capability | Reason not testable in Molecule |
|---|---|
| `nftables` rule loading | Requires `nf_tables` kernel module; not available in Docker |
| **Reboot-survivability / host-firewall × Docker interaction / boot-ordering** | **Requires a real kernel reboot — the class that caused the 2026-06-17 mesh-hardening incident. Now covered by local VM integration testing (ADR-025).** |
| NetBird mesh data plane (`wt0` WireGuard interface) | Requires the `wireguard` kernel module; Molecule checks only that the agent is installed/configured (ADR-016) |
| `unattended-upgrades` behaviour | Installs correctly; actual upgrade behaviour requires a real apt environment |
| DHCP behaviour (OPNsense) | OPNsense is managed by Ansible but not testable in a container |
| mDNS reflector (Avahi cross-VLAN) | Requires real network interfaces and VLANs |
| Hardware passthrough (NIC, USB) | Not applicable in containers |
| Corosync cluster formation | Requires multiple real nodes |

For the above, Molecule tests only what it can: that the relevant packages are
installed, that configuration files render correctly, and that services are enabled.
Behavioural correctness is confirmed on staging.
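Taking `nftables` as an example, a `verify.yml` restricted to what a container can prove might look like the sketch below. The ruleset path is Debian's default and is assumed here, as is the presence of `python3-apt` in the test image (needed by `package_facts`). Rule loading itself is deliberately not asserted.

```yaml
# molecule/default/verify.yml (sketch): install, render, and enablement only
- name: Verify
  hosts: all
  gather_facts: false
  tasks:
    - name: Gather installed packages
      ansible.builtin.package_facts:
        manager: auto

    - name: Stat the rendered ruleset
      ansible.builtin.stat:
        path: /etc/nftables.conf      # assumed path (Debian default)
      register: ruleset

    - name: Query whether the unit is enabled
      ansible.builtin.command: systemctl is-enabled nftables
      register: unit
      changed_when: false
      failed_when: false              # let the assert below report the result

    - name: Assert what the container can prove
      ansible.builtin.assert:
        that:
          - "'nftables' in ansible_facts.packages"
          - ruleset.stat.exists
          - unit.stdout == "enabled"
```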
**ADR-025 is the concrete build of Level 2/3** — local VM integration testing on
ubongo (libvirt/KVM, throwaway overlay VMs, stdlib-only driver). It specifically
targets the reboot-survivability / host-firewall × Docker / boot-ordering class that
Molecule structurally cannot reach. See `docs/decisions/025-local-vm-integration-testing.md`.
---
### CI pipeline
```
push to main
  ├── yamllint + ansible-lint         (fast gate, ~1 min)
  └── molecule test (all roles)       (parallel, ~5 min per role)

on green (main)
  ├── review tf-plan if infra changed; make check on staging
  └── [manual approval] make deploy PLAYBOOK=site on staging

promote to production
  └── [manual approval] make deploy PLAYBOOK=site on production
```
Manual gates are intentional. Automated tests prove correctness in isolation;
a human confirms the change is safe to promote.
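The automated half of this pipeline could be expressed as a Forgejo Actions workflow roughly like the one below. Everything in it is an assumption made for illustration: the file path, the `docker` runner label, the availability of `yamllint`, `ansible-lint`, and `molecule` on the runner, the `needs: lint` ordering, and the role list (only `base` and `docker_host` are named in this ADR). The manual staging and production gates are intentionally left out.

```yaml
# .forgejo/workflows/ci.yml (hypothetical sketch, not the project's real pipeline)
on:
  push:
    branches: [main]

jobs:
  lint:                                  # fast gate
    runs-on: docker                      # assumed runner label
    steps:
      - uses: actions/checkout@v4
      - run: yamllint .
      - run: ansible-lint

  molecule:
    needs: lint                          # assumption: lint gates the slower job
    runs-on: docker
    strategy:
      fail-fast: false                   # one failing role does not cancel the others
      matrix:
        role: [base, docker_host]        # illustrative; the full role list is not shown here
    steps:
      - uses: actions/checkout@v4
      - run: molecule test
        working-directory: roles/${{ matrix.role }}
```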
---
## Consequences
Drawn from the limitations and trade-offs already stated above:
- The Molecule idempotency step is non-negotiable; every role must pass it cleanly
(Three testing levels — Level 1).
- A class of capabilities (nftables rule loading, reboot-survivability and boot
ordering, NetBird mesh data plane, unattended-upgrades behaviour, OPNsense DHCP,
Avahi mDNS reflection, hardware passthrough, corosync cluster formation) cannot be
verified in Molecule and is
validated only at Level 2 (staging) or Level 3 (external) — a conscious,
documented decision, not a gap (What Molecule tests — and what it does not).
- The project builds and hosts its own `molecule-debian13` image rather than relying
on an external Docker Hub image (e.g. geerlingguy), accepting the maintenance of a
custom image to avoid drift, disappearance, or unexpected changes outside project
control (Molecule test image).
- Level 4 service-UI acceptance is authorable now but its execution is deferred,
pending `ubongo`, the `playwright` plugin, Authentik, and a staging deploy (Three
testing levels — Level 4).
- Promotion to staging and to production stays behind intentional manual approval
gates; automation proves isolated correctness, a human confirms promotion safety
(CI pipeline).