2026-06-05 13:13:09 +02:00
# ADR-017 — Service-UI acceptance verification (Level 4)

## Context

ADR-008 defines testing Levels 1–3 (Molecule, staging deploy, external smoke) and a
Level 4 stub. Nothing below Level 4 exercises a service's **application UI**: none of
those levels answers "does PhotoPrism actually let me log in, upload a photo, and see a
thumbnail?" (TODO 8.2). The operator's ask (TODO 2.2 headless browsing + TODO 2.3 test
users + manual-test instruction) is that Claude spins up a browser, *sees* the service
UI, exercises it, generates test users, and instructs the operator on manual tests.
Today Claude sees a browser only passively (`/screenshot` fetches operator-taken shots
from `mamba`); this is the active counterpart.

## Decision

A Claude-driven exploratory service-UI verification harness, **Level 4**, invoked as
`/verify-service <name>` on `ubongo`. Five settled forks:

1. **Claude-driven exploratory**: Claude navigates with judgment, not deterministic
   scripts. A scripted regression suite is explicitly not built here.
2. **Interactive, Claude-in-the-loop**: exploratory judgment can't be a headless cron
   gate; scheduled smoke is a deterministic job, left to health checks / Uptime Kuma
   later.
3. **Staging, full exercise**: Claude creates test users and exercises features
   (incl. destructive flows) against a *staging* deploy; the rebuildable sandbox
   resolves the safety concern.
4. **Test users in Authentik (central IdP), real SSO flow**: Claude authenticates
   through Traefik + Authentik as a real user would.
5. **Per-service `VERIFY.md` backbone + free exploration**: each service role ships an
   acceptance spec of critical journeys; Claude executes it and explores beyond it.

## VERIFY.md standard

Every service role ships a populated `roles/<service>/VERIFY.md`, copied from
`docs/testing/service-verify-template.md`, parallel to `SECURITY.md` from
`service-security-template.md`. This is a new role convention. The file lists the
service's critical user journeys (what "working" means), what good looks like, and what
is not browser-verifiable (→ manual handoff). It also joins the pre-production gate in
`docs/security/service-checklist.md`.
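
As a rough illustration of what a `VERIFY.md` carries, the journeys could be modelled as below. This is only a sketch: the `Journey` fields, the `split_journeys` helper, and the sample entries are assumptions for illustration, not the template's actual format (the template itself is not reproduced in this ADR).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Journey:
    """One critical user journey from a VERIFY.md (hypothetical shape)."""
    name: str                 # what the user does
    expected: str             # what good looks like
    browser_verifiable: bool  # False means it goes to the manual handoff


def split_journeys(journeys):
    """Return (browser_journeys, manual_handoff), each in original order."""
    browser = [j for j in journeys if j.browser_verifiable]
    manual = [j for j in journeys if not j.browser_verifiable]
    return browser, manual


# Illustrative entries only. The login/upload/thumbnail journey echoes the
# Context section; the phone-sync entry is invented for this example.
PHOTOPRISM = [
    Journey("log in via SSO", "lands on the library view", True),
    Journey("upload a photo", "photo appears with a thumbnail", True),
    Journey("sync from a phone", "new photo shows up in the library", False),
]
```

Calling `split_journeys(PHOTOPRISM)` would give Claude the two browser journeys to execute and leave the phone-sync journey for the manual-test checklist.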

## Test-user standard (TODO 2.3)

Test identities live only in the **staging** Authentik (never production): a dedicated
`test` group / naming prefix; ephemeral per-run credentials (staging is rebuildable, so
nothing is persisted and none are stored in `vault.yml`); reuse-or-create; teardown via
staging rebuild or explicit `test`-group cleanup.
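
A minimal sketch of ephemeral per-run credentials, using only the Python standard library. The `test-` prefix, the username layout, and the names `EphemeralUser`, `new_test_identity`, and `is_test_identity` are assumptions; the ADR fixes only the `test` group / prefix idea. Creating the user in Authentik is deliberately left out.

```python
import secrets
from dataclasses import dataclass

# Assumed prefix; the ADR only says "`test` group / naming prefix".
TEST_PREFIX = "test-"


@dataclass(frozen=True)
class EphemeralUser:
    """Credentials that exist only for one verification run."""
    username: str
    password: str
    group: str = "test"


def new_test_identity(service: str) -> EphemeralUser:
    """Generate a fresh, random identity; nothing is written to disk."""
    suffix = secrets.token_hex(4)  # 8 hex characters
    return EphemeralUser(
        username=f"{TEST_PREFIX}{service}-{suffix}",
        password=secrets.token_urlsafe(24),
    )


def is_test_identity(username: str) -> bool:
    """Cleanup helper: only prefixed accounts are candidates for teardown."""
    return username.startswith(TEST_PREFIX)
```

Keeping the prefix check in one place means an explicit `test`-group cleanup can refuse to touch any account that does not match it.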

## Reporting & manual handoff

`/verify-service` writes `docs/testing/reviews/YYYY-MM-DD-<service>.md` (+ `latest.md`),
mirroring `/review-repo` and `/capacity-review`: pass/fail per `VERIFY.md` journey,
observations, the test-user/env used, a verdict, and a structured **manual-test
checklist** for anything Claude can't do (physical device, paid/external flow,
subjective judgment). That checklist is the "instruct me on tests" output. Screenshots
are saved to a git-ignored working dir on `ubongo` (PNG bloat + secret-leak risk); the
report links them.
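
The report naming and contents described above might be sketched as follows. Only the path pattern comes from this ADR; the Markdown layout, the rule that any failed journey makes the verdict `FAIL`, and the function names are assumptions. Handling of `latest.md` is not shown.

```python
from datetime import date
from pathlib import Path

REVIEWS_DIR = Path("docs/testing/reviews")


def report_path(service: str, run_date: date) -> Path:
    """Build docs/testing/reviews/YYYY-MM-DD-<service>.md."""
    return REVIEWS_DIR / f"{run_date.isoformat()}-{service}.md"


def render_report(service, run_date, results, manual_checks):
    """Render a report body.

    results: list of (journey, passed, note) tuples.
    manual_checks: items Claude cannot exercise itself.
    """
    # Assumed verdict rule: one failed journey fails the run.
    verdict = "PASS" if all(passed for _, passed, _ in results) else "FAIL"
    lines = [
        f"# {service} verification, {run_date.isoformat()}",
        "",
        f"Verdict: {verdict}",
        "",
        "## Journeys",
        "",
    ]
    for journey, passed, note in results:
        lines.append(f"- {'PASS' if passed else 'FAIL'}: {journey} ({note})")
    lines += ["", "## Manual-test checklist", ""]
    lines += [f"- [ ] {item}" for item in manual_checks]
    return "\n".join(lines) + "\n"
```

A caller would write `render_report(...)` to `report_path(...)`; the unchecked boxes give the operator a list to tick off by hand.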

## Safety

- **Staging-only guard**: the skill refuses to run against production (exploratory
  clicking is destructive); ADR-002-aligned hard stop.
- **Confined blast radius**: test users only in the staging `test` group; the run
  sticks to the target service.
- **No secrets leaked**: the git-ignored screenshot dir is the safety boundary;
  avoid capturing credential screens.
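
One way the staging-only guard could work is an allowlist that fails closed, sketched below. The hostname, `STAGING_HOSTS`, `ProductionTargetError`, and `assert_staging` are all hypothetical; the ADR does not say how the skill tells staging from production.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; real staging hostnames are not given in this ADR.
STAGING_HOSTS = frozenset({"photoprism.staging.example.internal"})


class ProductionTargetError(RuntimeError):
    """Raised when the target is not a known staging host."""


def assert_staging(url: str, allowed=STAGING_HOSTS) -> str:
    """Return the host if it is allowlisted, otherwise refuse to continue."""
    host = (urlsplit(url).hostname or "").lower()
    # Fail closed: unknown, empty, or unparsable hosts count as production.
    if host not in allowed:
        raise ProductionTargetError(
            f"refusing to run: {host or url!r} is not a known staging host"
        )
    return host
```

An allowlist is chosen here over a production denylist so that a new or mistyped hostname stops the run instead of slipping through.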

## Status

Accepted (2026-06-05). Designed. **Authorable now:** this ADR, the ADR-008 Level 4
expansion, the `VERIFY.md` template, the `/verify-service` skill, the
convention/checklist/Further-reading edits, `.gitignore`/dir, STATUS/TODO.
**Running is deferred** until its dependencies exist.

## Dependencies

- `ubongo` (ADR-015): runs the browser. Designed, not built.
- `playwright` Claude Code plugin: enabled when this lands (`claude-code-setup.md`).
- Authentik (CAPABILITIES §2, planned): central IdP for test users + SSO.
- A staging deploy of the service (ADR-008 Level 2): staging is currently empty stubs.
- `make new-role` scaffolding `VERIFY.md`: deferred to when that scaffold is next touched.

## What was ruled out

| Option | Reason |
|---|---|
| Scripted Playwright regression suite | Operator wants exploratory judgment; scripts add maintenance burden. Could be a later layer, not this. |
| Scheduled headless smoke gate | Needs determinism the exploratory nature excludes; belongs to health checks / Uptime Kuma. |
| Verify against production | Exploratory clicking + test-user creation is destructive/polluting; staging sandbox instead. |
| Free-form, no per-service spec | Non-repeatable, can miss a critical flow; `VERIFY.md` gives a backbone. |
| Staging bypasses SSO / per-app users | Wouldn't exercise the real Traefik+Authentik path; central test users are faithful. |
| Commit screenshots to the repo | Repo bloat + secret-leak risk; git-ignored on `ubongo`. |

See also: ADR-008 (testing, expanded), ADR-015 (control host), ADR-002 (security),
ADR-004 (`VERIFY.md` parallels `SECURITY.md`), ADR-013/014 (heritage / knowledge sourcing).

## Consequences

- The harness is confined to staging by a hard stop: it refuses to run against
  production because exploratory clicking is destructive. The blast radius is bounded
  to the target service, and test users live only in the staging `test` group (Safety).
- No secrets leak: the git-ignored screenshot dir is the safety boundary and credential
  screens are avoided (Safety; Reporting & manual handoff).
- Test identities are ephemeral per-run credentials in the staging Authentik only
  (never production, none persisted in `vault.yml`), handled reuse-or-create and torn
  down via staging rebuild or `test`-group cleanup (Test-user standard).
- Anything Claude cannot exercise (physical device, paid/external flow, subjective
  judgment) is handed off via a structured manual-test checklist in the run report
  (Reporting & manual handoff).
- Authoring is possible now (this ADR, the `VERIFY.md` template, the `/verify-service`
  skill, conventions/checklist edits), but running is deferred until its dependencies
  exist: `ubongo`, the `playwright` plugin, Authentik, a staging deploy, and
  `make new-role` scaffolding `VERIFY.md` (Status; Dependencies).