# ADR-008 — Testing methodology
## Context
Ansible roles must be idempotent and correct before they touch production hosts.
This document records the testing strategy, what each level covers, and — critically
— what is explicitly out of scope for automated testing and why.

---

## Three testing levels
### Level 1 — Molecule (per role, always required)
Runs in Docker on the control node or in CI. Fast (~5 min per role).
**What happens during `molecule test`:**
1. `create` — start the test container
2. `converge` — apply the role via `converge.yml`
3. **`idempotence`** — run `converge.yml` again; fail if any task reports `changed`
4. `verify` — assert expected state via `verify.yml`
5. `destroy` — remove the container

The idempotency step is non-negotiable. Every role must pass it cleanly.
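
The five steps above map onto a Molecule scenario roughly as follows. This is a minimal sketch of the `scenario` section of a `molecule.yml`, assuming the default scenario directory; the project's actual scaffold is not shown in this document and may list additional steps such as `dependency` or `prepare`.

```yaml
# molecule/default/molecule.yml (sketch; only the scenario section is shown)
scenario:
  # Order in which `molecule test` runs the steps described above.
  test_sequence:
    - create       # start the test container
    - converge     # apply the role via converge.yml
    - idempotence  # second converge run; fails if any task reports changed
    - verify       # assert expected state via verify.yml
    - destroy      # remove the container
```
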
**`verify.yml` must assert outcomes, not task success:**
```yaml
# Wrong: only proves the task ran
- ansible.builtin.assert:
    that: result is success

# Right: proves the outcome exists
- ansible.builtin.command: systemctl is-active fail2ban
  changed_when: false
  register: svc

- ansible.builtin.assert:
    that: svc.stdout == "active"
```
### Level 2 — Staging playbook (full stack, real VMs)
Run `make check PLAYBOOK=site` followed by `make deploy PLAYBOOK=site` on
Terraform-provisioned staging VMs. This catches inter-role dependencies and ordering
issues that Molecule cannot see (for example, the `docker_host` role requires `base` to
have already run and configured the firewall).

Run this level before every merge to `main`.
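
The ordering dependency mentioned above can be illustrated with a short play. This is only a sketch: the group name `docker_hosts` and the layout of `site.yml` are assumptions, while the role names `base` and `docker_host` come from the text.

```yaml
# site.yml (sketch; the `docker_hosts` group name is hypothetical)
- name: Configure Docker hosts
  hosts: docker_hosts
  become: true
  roles:
    # Roles run in list order, so `base` configures the firewall
    # before `docker_host` starts.
    - role: base
    - role: docker_host
```

Molecule converges each role on its own, so a mistake in this ordering only shows up when the full playbook runs on staging.
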
### Level 3 — External smoke test from askari
Once `askari` is operational, scripted checks run from outside the network to confirm
that public-facing services respond correctly. This catches firewall and reverse proxy
configuration issues that are invisible to Ansible check mode.
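
One possible shape for such a check is a small playbook run from `askari`. The sketch below is illustrative only: the hostname `service.example.com`, the port, and the assumption that `askari` is an inventory host are all hypothetical, and the real checks may just as well be shell scripts.

```yaml
# smoke.yml (sketch; hostname, port, and inventory name are assumptions)
- name: External smoke test
  hosts: askari
  gather_facts: false
  tasks:
    - name: Public HTTPS endpoint answers through the reverse proxy
      ansible.builtin.uri:
        url: https://service.example.com/
        status_code: 200
        timeout: 10

    - name: Port that should be firewalled is not reachable from outside
      ansible.builtin.wait_for:
        host: service.example.com
        port: 5432
        state: stopped
        timeout: 5
```
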

---

## Molecule test image
**No external images.** The project builds and hosts its own test image.

**Source**: `.docker/molecule-debian13/Dockerfile`

**Base**: `debian:trixie-slim` (official Debian 13 image from Docker Hub; the only
external dependency permitted here, because the base OS image is not substitutable)

**Registry**: `forgejo.nyumbani.baobab.band/sjat/molecule-debian13:latest`

Build and push with:
```bash
make molecule-image # build locally
make molecule-image-push # push to Forgejo registry (requires docker login)
```
The scaffold `molecule.yml` references this image with `pre_build_image: true`,
meaning Molecule uses the image as-is and does not attempt to build it.
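
A platform entry along these lines would match that description. It is a sketch rather than the project's actual scaffold: the platform name and the systemd-related settings are assumptions, and only the image reference and `pre_build_image: true` come from the text.

```yaml
# molecule/default/molecule.yml (sketch; platform name and systemd settings are assumptions)
driver:
  name: docker
platforms:
  - name: instance
    image: forgejo.nyumbani.baobab.band/sjat/molecule-debian13:latest
    # Use the image as pulled; do not build it from a Dockerfile.
    pre_build_image: true
    # Commonly needed for systemd and the Docker daemon inside the container.
    privileged: true
    cgroupns_mode: host
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:rw
```
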

**Why not geerlingguy/docker-debian13-ansible?** It is a Docker Hub image outside
project control. It is not a Galaxy role, but it is an external dependency that
can drift, disappear, or introduce unexpected changes. The custom image is
functionally equivalent and fully owned.

---

## Idempotency requirements
Every role task must satisfy one of these:
| Task type | Requirement |
|---|---|
| `apt`, `template`, `copy`, `file`, `user`, `group`, `service` | Naturally idempotent — no action needed |
| `command` / `shell` (read-only) | `changed_when: false` |
| `command` / `shell` (detectable change) | `changed_when: result.stdout \| length > 0` or equivalent |
| `command` / `shell` (creates a file) | `creates: /path/to/artifact` |
| Service restart after config change | Move to a handler; handler fires only when notified |
| `docker compose up -d` | Handler only — notified by template change, never runs unconditionally |

ansible-lint enforces most of these at lint time. The Molecule idempotency step
catches anything lint misses.
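
The rows of the table can be sketched as tasks. Every command, path, and handler name below is hypothetical and chosen only to illustrate the pattern; the tasks and the handler would live in separate files in a real role.

```yaml
# tasks/main.yml (sketch; commands, paths, and names are hypothetical)
- name: Read-only command never reports a change
  ansible.builtin.command: docker --version
  register: docker_version
  changed_when: false

- name: Command whose output signals a change
  ansible.builtin.command: /usr/local/bin/sync-config
  register: result
  changed_when: result.stdout | length > 0

- name: Command that creates a file is skipped once the file exists
  ansible.builtin.command:
    cmd: /usr/local/bin/generate-key /etc/example/key.pem
    creates: /etc/example/key.pem

- name: Template change notifies the handler
  ansible.builtin.template:
    src: compose.yml.j2
    dest: /opt/example/compose.yml
    mode: "0644"
  notify: Start compose project

# handlers/main.yml (sketch)
- name: Start compose project
  ansible.builtin.command:
    cmd: docker compose up -d
    chdir: /opt/example
  changed_when: true
```

On the second converge run the template is unchanged, so the handler is never notified and the run reports no changes.
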

---

## What Molecule tests — and what it does not
### Tested in Molecule
| Capability | Notes |
|---|---|
| Package installation | `apt` works in the container |
| File and directory creation, permissions, ownership | Full support |
| Template rendering and content | Full support |
| User and group management | Full support |
| Service installation and `systemctl enable` | Requires the systemd-capable image |
| Service start/stop | Works for most services in the container |
| SSH configuration file content | File-level only |
| fail2ban installation and configuration | Install and config file; not live banning |
| Docker daemon installation | Works in privileged container |
| auditd installation and configuration | Install and config file |
| Idempotency of all of the above | Enforced by Molecule's idempotency step |
### Not tested in Molecule — explicit exceptions
The following require a real kernel or real hardware and are validated only at
Level 2 (staging) or Level 3 (external). This is a conscious, documented decision
— not a gap.
| Capability | Reason not testable in Molecule |
|---|---|
| `nftables` rule loading | Requires `nf_tables` kernel module; not available in Docker |
| WireGuard tunnel establishment | Requires `wireguard` kernel module |
| `unattended-upgrades` behaviour | Installs correctly; actual upgrade behaviour requires a real apt environment |
| DHCP behaviour (OPNsense) | OPNsense is managed by Ansible but not testable in a container |
| mDNS reflector (Avahi cross-VLAN) | Requires real network interfaces and VLANs |
| Hardware passthrough (NIC, USB) | Not applicable in containers |
| Corosync cluster formation | Requires multiple real nodes |

For the above, Molecule tests only what it can: that the relevant packages are
installed, that configuration files render correctly, and that services are enabled.
Behavioural correctness is confirmed on staging.
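
For `nftables`, that reduced scope could look like the `verify.yml` sketch below. It assumes the role installs the `nftables` package, writes its ruleset to `/etc/nftables.conf`, and enables a unit named `nftables`, and that the test image has the Python apt bindings that `package_facts` needs; none of these details are confirmed by this document.

```yaml
# verify.yml (sketch; package name, path, and unit name are assumptions)
- name: Verify
  hosts: all
  gather_facts: false
  tasks:
    - name: Gather installed packages
      ansible.builtin.package_facts:
        manager: auto

    - name: Stat the rendered ruleset file
      ansible.builtin.stat:
        path: /etc/nftables.conf
      register: ruleset

    - name: Check whether the service is enabled
      ansible.builtin.command: systemctl is-enabled nftables
      register: enabled
      changed_when: false
      failed_when: false

    - name: Assert package, config file, and enablement (not rule loading)
      ansible.builtin.assert:
        that:
          - "'nftables' in ansible_facts.packages"
          - ruleset.stat.exists
          - enabled.stdout == "enabled"
```
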

---

## CI pipeline
```
push to main
├── yamllint + ansible-lint (fast gate, ~1 min)
└── molecule test (all roles) (parallel, ~5 min per role)
on green (main)
├── review tf-plan if infra changed; make check on staging
└── [manual approval] make deploy PLAYBOOK=site on staging
promote to production
└── [manual approval] make deploy PLAYBOOK=site on production
```
Manual gates are intentional. Automated tests prove correctness in isolation;
a human confirms the change is safe to promote.
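
The automated part of the pipeline could be expressed as a workflow like the one below. This is a sketch under several assumptions: it uses Forgejo Actions syntax (the document does not name the CI system), the runner label `docker`, a runner image with `yamllint`, `ansible-lint`, and `molecule` preinstalled, a `roles/<name>` layout, and a role list limited to the two roles named in this document.

```yaml
# .forgejo/workflows/test.yml (sketch; CI system, runner label, and role list are assumptions)
on:
  push:
    branches: [main]

jobs:
  lint:
    runs-on: docker
    steps:
      - uses: actions/checkout@v4
      - run: yamllint .
      - run: ansible-lint

  molecule:
    runs-on: docker
    strategy:
      fail-fast: false   # let every role finish so all failures are visible
      matrix:
        role: [base, docker_host]
    steps:
      - uses: actions/checkout@v4
      - run: molecule test
        working-directory: roles/${{ matrix.role }}
```

The staging and production deploys are deliberately left out of the workflow, since they sit behind the manual approvals described above.
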