From cc3337502fae3177b4d7d0513affe263c6154431 Mon Sep 17 00:00:00 2001
From: sjat
Date: Fri, 5 Jun 2026 13:13:09 +0200
Subject: [PATCH] Add ADR-017 (service-UI acceptance verification, Level 4)

---
 docs/decisions/017-service-ui-verification.md | 92 +++++++++++++++++++
 1 file changed, 92 insertions(+)
 create mode 100644 docs/decisions/017-service-ui-verification.md

diff --git a/docs/decisions/017-service-ui-verification.md b/docs/decisions/017-service-ui-verification.md
new file mode 100644
index 0000000..62fdb5a
--- /dev/null
+++ b/docs/decisions/017-service-ui-verification.md
@@ -0,0 +1,92 @@
+# ADR-017 — Service-UI acceptance verification (Level 4)
+
+## Context
+
+ADR-008 defines testing Levels 1–3 (Molecule, staging deploy, external smoke) and a
+Level 4 stub. Nothing below Level 4 exercises a service's **application UI** — none
+answer "does PhotoPrism actually let me log in, upload a photo, and see a thumbnail?"
+(TODO 8.2). The operator's ask (TODO 2.2 headless browsing + TODO 2.3 test users +
+manual-test instruction): Claude spins up a browser, *sees* the service UI, exercises
+it, generates test users, and instructs the operator on manual tests. Today Claude sees
+a browser only passively (`/screenshot` fetches operator-taken shots from `mamba`); this
+is the active counterpart.
+
+## Decision
+
+A Claude-driven exploratory service-UI verification harness — **Level 4** — invoked as
+`/verify-service ` on `ubongo`. Five settled forks:
+
+1. **Claude-driven exploratory** — Claude navigates with judgment, not deterministic
+   scripts. A scripted regression suite is explicitly not built here.
+2. **Interactive, Claude-in-the-loop** — exploratory judgment can't be a headless cron
+   gate; scheduled smoke is a determinism job for health checks / Uptime Kuma later.
+3. **Staging, full exercise** — Claude creates test users and exercises features
+   (incl. destructive flows) against a *staging* deploy; the rebuildable sandbox
+   resolves safety.
+4. **Test users in Authentik (central IdP), real SSO flow** — authenticates through
+   Traefik + Authentik as a real user would.
+5. **Per-service `VERIFY.md` backbone + free exploration** — each service role ships an
+   acceptance spec of critical journeys; Claude executes it and explores beyond it.
+
+## VERIFY.md standard
+
+Every service role ships a populated `roles//VERIFY.md`, copied from
+`docs/testing/service-verify-template.md` — parallel to `SECURITY.md` from
+`service-security-template.md`. A new role convention. It lists the service's critical
+user journeys (what "working" means), what good looks like, and what is not
+browser-verifiable (→ manual handoff). It also joins the pre-production gate in
+`docs/security/service-checklist.md`.
+
+## Test-user standard (TODO 2.3)
+
+Test identities live only in the **staging** Authentik (never production): a dedicated
+`test` group / naming prefix; ephemeral per-run credentials (staging is rebuildable, so
+nothing persisted, none in `vault.yml`); reuse-or-create; teardown via staging rebuild
+or explicit `test`-group cleanup.
+
+## Reporting & manual handoff
+
+`/verify-service` writes `docs/testing/reviews/YYYY-MM-DD-.md` (+ `latest.md`),
+mirroring `/review-repo` and `/capacity-review`: pass/fail per `VERIFY.md` journey,
+observations, the test-user/env used, a verdict, and a structured **manual-test
+checklist** for anything Claude can't do (physical device, paid/external flow,
+subjective judgment) — the "instruct me on tests" output. Screenshots are saved to a
+git-ignored working dir on `ubongo` (PNG bloat + secret-leak risk); the report links
+them.
+
+## Safety
+
+- **Staging-only guard** — the skill refuses to run against production (exploratory
+  clicking is destructive); ADR-002-aligned hard stop.
+- **Confined blast radius** — test users only in the staging `test` group; the run
+  sticks to the target service.
+- **No secrets leaked** — the git-ignored screenshot dir is the safety boundary;
+  avoid capturing credential screens.
+
+## Status
+
+Designed. **Authorable now:** this ADR, the ADR-008 Level 4 expansion, the `VERIFY.md`
+template, the `/verify-service` skill, the convention/checklist/Further-reading edits,
+`.gitignore`/dir, STATUS/TODO. **Running is deferred** on its dependencies.
+
+## Dependencies
+
+- `ubongo` (ADR-015) — runs the browser. Designed, not built.
+- `playwright` Claude Code plugin — enabled when this lands (`claude-code-setup.md`).
+- Authentik (CAPABILITIES §2, planned) — central IdP for test users + SSO.
+- A staging deploy of the service (ADR-008 Level 2) — staging is currently empty stubs.
+- `make new-role` scaffolding `VERIFY.md` — deferred to when that scaffold is next touched.
+
+## What was ruled out
+
+| Option | Reason |
+|---|---|
+| Scripted Playwright regression suite | Operator wants exploratory judgment; scripts add maintenance burden. Could be a later layer, not this. |
+| Scheduled headless smoke gate | Needs determinism the exploratory nature excludes; belongs to health checks / Uptime Kuma. |
+| Verify against production | Exploratory clicking + test-user creation is destructive/polluting; staging sandbox instead. |
+| Free-form, no per-service spec | Non-repeatable, can miss a critical flow; `VERIFY.md` gives a backbone. |
+| Staging bypasses SSO / per-app users | Wouldn't exercise the real Traefik+Authentik path; central test users are faithful. |
+| Commit screenshots to the repo | Repo bloat + secret-leak risk; git-ignored on `ubongo`. |
+
+See also: ADR-008 (testing — expanded), ADR-015 (control host), ADR-002 (security),
+ADR-004 (`VERIFY.md` parallels `SECURITY.md`), ADR-013/014 (heritage / knowledge sourcing).
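
The staging-only guard and the ephemeral test-user standard described in the ADR could be sketched roughly as follows. This is a minimal Python illustration, not the `/verify-service` skill's implementation. Every name in it is hypothetical: `require_staging`, `ProductionTargetError`, `EphemeralUser`, and `make_ephemeral_user` are invented here. The ADR fixes the `test` group but not a naming prefix, so the `test-` prefix is an assumption, as is the idea that the target environment arrives as a plain string.

```python
import secrets
from dataclasses import dataclass


class ProductionTargetError(RuntimeError):
    """Raised when a run is attempted against anything other than staging."""


def require_staging(environment: str) -> None:
    # Fail closed: only the literal value "staging" passes. Unknown or
    # misspelled environments are refused, not just "production".
    if environment.strip().lower() != "staging":
        raise ProductionTargetError(
            f"refusing to run against {environment!r}: staging only"
        )


@dataclass(frozen=True)
class EphemeralUser:
    username: str
    password: str
    group: str = "test"  # the dedicated staging group named in the ADR


def make_ephemeral_user(prefix: str = "test-") -> EphemeralUser:
    # Assumed prefix. Credentials are generated per run and only returned,
    # never written to disk, matching "nothing persisted".
    return EphemeralUser(
        username=f"{prefix}{secrets.token_hex(4)}",
        password=secrets.token_urlsafe(24),
    )


if __name__ == "__main__":
    require_staging("staging")       # passes silently
    user = make_ephemeral_user()
    print(user.username, user.group)  # password deliberately not printed
```

The guard is written as an allowlist, so an empty or mistyped environment is refused rather than treated as safe, which fits the ADR's "hard stop" wording.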