sjat 9e0c264658 docs: reconcile lower-severity review findings (O9-O24)
- ADR-007: document ubongo on the legacy V4 net at 10.20.10.151 (transitional,
  outside the planned srv /24 until the LAN is re-cut) (O10); single authoritative
  boma.baobab.band -> boma.wingu.me transition note already added earlier
- terraform tfvars.example + variables.tf (both envs): pve01 -> pve0 and
  <host>.boma.baobab.band per ADR-007 naming (O11)
- ADR-012/013/015/016/017/018: convert "See also:" prose to `## Related` sections
  placed after Consequences, matching ADR-014/019-023 (O13)
- docs/README + inventories/README: list the missing subdirs / offsite_hosts +
  offsite.yml merge behaviour (O14, O29 note)
- ADR-009: drop the retired `nyumbani` example; use vaultwarden.wingu.me split-horizon (O19)
- ROADMAP M2: askari shipped as cx23/x86 (CAX11/ARM out of stock) (O20)
- ADR-020: 80/443/3478 opened in M4a (past tense); coordinator role is M4b (O21)
- netbird -> netbird_coordinator across ROADMAP M4b, the M4b plan, ADR-024 (O23)
- ADR-024: align the M1 DNS-01 wildcard scope wording with ROADMAP (O24)
- capacity-scan.py: read the inventory directory so offsite.yml (askari) is seen (O28)
- tf_to_inventory.py: generated header now warns it overwrites the manual control node (O9)
- tests/tags.yml: proxy concern comment Traefik -> Caddy (missed in the O3 sweep)

O9's existing stub hosts.yml header stays as-is (generator-owned, hook-protected);
the fix lives in the generator for the next regeneration. make lint + pytest (57) green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 19:31:40 +02:00


# ADR-017 — Service-UI acceptance verification (Level 4)
## Status
Accepted (2026-06-05). Designed. **Authorable now:** this ADR, the ADR-008 Level 4 expansion, the `VERIFY.md`
template, the `/verify-service` skill, the convention/checklist/Further-reading edits,
`.gitignore`/dir, STATUS/TODO. **Running is deferred** until its dependencies are in place.
## Context
ADR-008 defines testing Levels 1–3 (Molecule, staging deploy, external smoke) and a
Level 4 stub. Nothing below Level 4 exercises a service's **application UI**; none of
those levels answers "does PhotoPrism actually let me log in, upload a photo, and see a thumbnail?"
(TODO 8.2). The operator's ask (TODO 2.2 headless browsing + TODO 2.3 test users +
manual-test instruction): Claude spins up a browser, *sees* the service UI, exercises
it, generates test users, and instructs the operator on manual tests. Today Claude sees
a browser only passively (`/screenshot` fetches operator-taken shots from `mamba`); this
is the active counterpart.
## Decision
A Claude-driven exploratory service-UI verification harness — **Level 4** — invoked as
`/verify-service <name>` on `ubongo`. Five settled forks:
1. **Claude-driven exploratory** — Claude navigates with judgment, not deterministic
scripts. A scripted regression suite is explicitly not built here.
2. **Interactive, Claude-in-the-loop** — exploratory judgment can't be a headless cron
gate; scheduled smoke testing needs determinism and is left to health checks / Uptime Kuma later.
3. **Staging, full exercise** — Claude creates test users and exercises features
(incl. destructive flows) against a *staging* deploy; staging is a rebuildable sandbox,
so any damage is undone by a rebuild.
4. **Test users in Authentik (central IdP), real SSO flow** — authenticates through
Caddy (ADR-024) + Authentik as a real user would.
5. **Per-service `VERIFY.md` backbone + free exploration** — each service role ships an
acceptance spec of critical journeys; Claude executes it and explores beyond it.
## VERIFY.md standard
Every service role ships a populated `roles/<service>/VERIFY.md`, copied from
`docs/testing/service-verify-template.md` — parallel to `SECURITY.md` from
`service-security-template.md`. This is a new role convention. The file lists the service's critical
user journeys (what "working" means), what good looks like, and what is not
browser-verifiable (→ manual handoff). `VERIFY.md` is also added to the pre-production gate in
`docs/security/service-checklist.md`.
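
A minimal sketch of how the "every service role ships a populated `VERIFY.md`" rule could be linted. The function name, the caller-supplied list of service roles, and the test that "populated" means "not byte-identical to the template" are all assumptions for illustration, not part of this ADR.

```python
from collections.abc import Iterable
from pathlib import Path


def missing_verify_specs(roles_dir: Path, template: Path,
                         service_roles: Iterable[str]) -> list[str]:
    """List service roles whose VERIFY.md is absent or an unmodified template copy.

    Hypothetical helper: which roles count as "service roles" is supplied by the caller.
    """
    template_text = template.read_text(encoding="utf-8")
    problems: list[str] = []
    for name in sorted(service_roles):
        spec = roles_dir / name / "VERIFY.md"
        if not spec.is_file():
            problems.append(f"{name}: missing VERIFY.md")
        elif spec.read_text(encoding="utf-8") == template_text:
            # Copied from docs/testing/service-verify-template.md but never filled in.
            problems.append(f"{name}: VERIFY.md is an unmodified template copy")
    return problems
```

Called with `roles_dir=Path("roles")` and `template=Path("docs/testing/service-verify-template.md")`, a non-empty result could fail a lint target; whether it is wired into `make lint` is left open here.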
## Test-user standard (TODO 2.3)
Test identities live only in the **staging** Authentik, never production:
- a dedicated `test` group and naming prefix;
- ephemeral per-run credentials: staging is rebuildable, so nothing is persisted and none go in `vault.yml`;
- reuse-or-create: an existing test user is reused, otherwise one is created;
- teardown via staging rebuild or explicit `test`-group cleanup.
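
A sketch of the test-user lifecycle, with an in-memory dict standing in for the staging Authentik user list (no Authentik API is called). The `test-` prefix and the `StagingTestUser` type are hypothetical, and reading reuse-or-create as "reuse keyed on username, fresh password every run" is an assumption that follows from nothing being persisted between runs.

```python
import secrets
from dataclasses import dataclass

TEST_PREFIX = "test-"  # hypothetical naming prefix
TEST_GROUP = "test"    # the dedicated staging group named in this ADR


@dataclass(frozen=True)
class StagingTestUser:
    username: str
    password: str
    group: str = TEST_GROUP


def reuse_or_create(directory: dict[str, StagingTestUser],
                    service: str) -> tuple[StagingTestUser, bool]:
    """Return (user, created). Credentials are minted per run and never persisted."""
    username = f"{TEST_PREFIX}{service}"
    created = username not in directory
    # A reused account still gets a fresh password: nothing from a previous run is kept.
    user = StagingTestUser(username=username, password=secrets.token_urlsafe(24))
    directory[username] = user
    return user, created


def cleanup_test_users(directory: dict[str, StagingTestUser]) -> list[str]:
    """Explicit teardown: delete only prefixed members of the test group."""
    doomed = [name for name, user in directory.items()
              if name.startswith(TEST_PREFIX) and user.group == TEST_GROUP]
    for name in doomed:
        del directory[name]
    return doomed
```

Requiring both the prefix and the group membership before deleting keeps cleanup from touching a real account that merely happens to share one of the two markers.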
## Reporting & manual handoff
`/verify-service` writes `docs/testing/reviews/YYYY-MM-DD-<service>.md` (+ `latest.md`),
mirroring `/review-repo` and `/capacity-review`: pass/fail per `VERIFY.md` journey,
observations, the test-user/env used, a verdict, and a structured **manual-test
checklist** for anything Claude can't do (physical device, paid/external flow,
subjective judgment); this checklist is the "instruct me on tests" output. Screenshots are
saved to a git-ignored working dir on `ubongo` rather than committed (to avoid PNG bloat and
secret-leak risk); the report links them.
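
A sketch of the report layout: a dated `YYYY-MM-DD-<service>.md` mirrored to `latest.md`. The Markdown shape, the `Journey` type, the verdict rule (PASS only if every `VERIFY.md` journey passed), and writing `latest.md` as a copy rather than a symlink are assumptions, not the skill's actual output format.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import Path


@dataclass
class Journey:
    name: str          # a critical journey taken from the role's VERIFY.md
    passed: bool
    note: str = ""     # free-form observation


def write_report(reviews_dir: Path, service: str, run_day: date, test_user: str,
                 journeys: list[Journey], manual_checks: list[str],
                 screenshots: list[str]) -> Path:
    """Write YYYY-MM-DD-<service>.md and mirror it to latest.md; return the dated path."""
    # Assumed verdict rule: PASS only if there is at least one journey and all passed.
    verdict = "PASS" if journeys and all(j.passed for j in journeys) else "FAIL"
    lines = [f"# {service}: service-UI verification, {run_day.isoformat()}", "",
             f"**Verdict:** {verdict}", "",
             f"**Test user / env:** {test_user} / staging", "",
             "## Journeys (from VERIFY.md)"]
    for j in journeys:
        note = f" ({j.note})" if j.note else ""
        lines.append(f"- {'PASS' if j.passed else 'FAIL'}: {j.name}{note}")
    lines += ["", "## Manual-test checklist"]
    lines += [f"- [ ] {item}" for item in manual_checks]
    lines += ["", "## Screenshots (git-ignored working dir on ubongo)"]
    lines += [f"- [{Path(p).name}]({p})" for p in screenshots]
    text = "\n".join(lines) + "\n"

    reviews_dir.mkdir(parents=True, exist_ok=True)
    report = reviews_dir / f"{run_day.isoformat()}-{service}.md"
    report.write_text(text, encoding="utf-8")
    (reviews_dir / "latest.md").write_text(text, encoding="utf-8")  # copy, not a symlink
    return report
```

Only links to screenshots enter the report; the image files themselves stay in the git-ignored directory.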
## Safety
- **Staging-only guard** — the skill refuses to run against production (exploratory
clicking is destructive); ADR-002-aligned hard stop.
- **Confined blast radius** — test users only in the staging `test` group; the run
sticks to the target service.
- **No secrets leaked** — the git-ignored screenshot dir is the safety boundary;
avoid capturing credential screens.
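
A sketch of the staging-only hard stop, assuming targets are resolved against an inventory mapping of environment name to hostnames. The `staging` key, the exception name, and the example hostnames are hypothetical.

```python
class NonStagingTargetError(RuntimeError):
    """Raised instead of running: exploratory clicking is destructive outside staging."""


def assert_staging_target(host: str, inventory: dict[str, set[str]]) -> None:
    """Refuse unless the host is in staging and in no other environment (fail closed)."""
    staging = inventory.get("staging", set())
    elsewhere = {env for env, hosts in inventory.items()
                 if env != "staging" and host in hosts}
    if host not in staging or elsewhere:
        raise NonStagingTargetError(
            f"refusing /verify-service against {host!r}: not an exclusively staging host"
            + (f" (also in {sorted(elsewhere)})" if elsewhere else "")
        )
```

The guard denies by default: an unknown host is refused just like a production one, and a host listed in both staging and another environment is refused too, so an inventory mistake cannot turn into a run against production.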
## Dependencies
- `ubongo` (ADR-015) — runs the browser. Designed, not built.
- `playwright` Claude Code plugin — enabled when this lands (`claude-code-setup.md`).
- Authentik (CAPABILITIES §2, planned) — central IdP for test users + SSO.
- A staging deploy of the service (ADR-008 Level 2) — staging is currently empty stubs.
- `make new-role` scaffolding `VERIFY.md` — deferred to when that scaffold is next touched.
## What was ruled out
| Option | Reason |
|---|---|
| Scripted Playwright regression suite | Operator wants exploratory judgment; scripts add maintenance burden. Could be a later layer, not this. |
| Scheduled headless smoke gate | Needs determinism the exploratory nature excludes; belongs to health checks / Uptime Kuma. |
| Verify against production | Exploratory clicking + test-user creation is destructive/polluting; staging sandbox instead. |
| Free-form, no per-service spec | Non-repeatable, can miss a critical flow; `VERIFY.md` gives a backbone. |
| Staging bypasses SSO / per-app users | Wouldn't exercise the real Caddy+Authentik path; central test users are faithful. |
| Commit screenshots to the repo | Repo bloat + secret-leak risk; git-ignored on `ubongo`. |
## Consequences
- The harness is confined to staging by a hard stop: it refuses to run against
production because exploratory clicking is destructive. Its blast radius is bounded to
the target service, and test users live only in the staging `test` group (Safety).
- No secrets leak: the git-ignored screenshot dir is the safety boundary and credential
screens are avoided (Safety; Reporting & manual handoff).
- Test identities are ephemeral per-run credentials in the staging Authentik only —
never production, none persisted in `vault.yml` — created reuse-or-create and torn
down via staging rebuild or `test`-group cleanup (Test-user standard).
- Anything Claude cannot exercise (physical device, paid/external flow, subjective
judgment) is handed off via a structured manual-test checklist in the run report
(Reporting & manual handoff).
- Authoring is possible now (this ADR, the `VERIFY.md` template, the `/verify-service`
skill, conventions/checklist edits), but running is deferred on its dependencies:
`ubongo`, the `playwright` plugin, Authentik, a staging deploy, and `make new-role`
scaffolding `VERIFY.md` (Status; Dependencies).
## Related
ADR-008 (testing — expanded), ADR-015 (control host), ADR-002 (security),
ADR-004 (`VERIFY.md` parallels `SECURITY.md`), ADR-013/014 (heritage / knowledge sourcing).