ADR-017 — Service-UI acceptance verification (Level 4)
Status
Accepted (2026-06-05). Designed. Authorable now: this ADR, the ADR-008 Level 4 expansion, the VERIFY.md
template, the /verify-service skill, the convention/checklist/Further-reading edits,
.gitignore/dir, STATUS/TODO. Running is deferred on its dependencies.
Context
ADR-008 defines testing Levels 1–3 (Molecule, staging deploy, external smoke) and a
Level 4 stub. Nothing below Level 4 exercises a service's application UI — none
answer "does PhotoPrism actually let me log in, upload a photo, and see a thumbnail?"
(TODO 8.2). The operator's ask (TODO 2.2 headless browsing + TODO 2.3 test users +
manual-test instruction): Claude spins up a browser, sees the service UI, exercises
it, generates test users, and instructs the operator on manual tests. Today Claude sees
a browser only passively (/screenshot fetches operator-taken shots from mamba); this
is the active counterpart.
Decision
A Claude-driven exploratory service-UI verification harness — Level 4 — invoked as
/verify-service <name> on ubongo. Five settled forks:
- Claude-driven exploratory: Claude navigates with judgment, not deterministic scripts. A scripted regression suite is explicitly not built here.
- Interactive, Claude-in-the-loop: exploratory judgment can't be a headless cron gate; scheduled smoke testing needs determinism and is left to health checks / Uptime Kuma later.
- Staging, full exercise: Claude creates test users and exercises features (incl. destructive flows) against a staging deploy; because staging is a rebuildable sandbox, destructive actions are safe there.
- Test users in Authentik (central IdP), real SSO flow: Claude authenticates through Caddy (ADR-024) + Authentik as a real user would.
- Per-service `VERIFY.md` backbone + free exploration: each service role ships an acceptance spec of critical journeys; Claude executes it and explores beyond it.
VERIFY.md standard
Every service role ships a populated roles/<service>/VERIFY.md, copied from
docs/testing/service-verify-template.md — parallel to SECURITY.md from
service-security-template.md. A new role convention. It lists the service's critical
user journeys (what "working" means), what good looks like, and what is not
browser-verifiable (→ manual handoff). It also joins the pre-production gate in
docs/security/service-checklist.md.
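
As an illustration of how the convention could be checked mechanically, here is a minimal sketch. It is not part of the ADR. The function name `roles_missing_verify` is hypothetical, it treats every directory under `roles/` as a service role, and it approximates "populated" as "exists and is not blank".

```python
from pathlib import Path


def roles_missing_verify(roles_dir: Path) -> list[str]:
    """Return the names of role directories that lack a populated VERIFY.md.

    Assumption: every directory directly under roles_dir is a service role.
    """
    missing = []
    for role in sorted(p for p in roles_dir.iterdir() if p.is_dir()):
        spec = role / "VERIFY.md"
        # "Populated" is approximated as "the file exists and is not blank".
        if not spec.is_file() or not spec.read_text(encoding="utf-8").strip():
            missing.append(role.name)
    return missing
```

A pre-production gate could call `roles_missing_verify(Path("roles"))` and fail when the returned list is non-empty.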
Test-user standard (TODO 2.3)
Test identities live only in the staging Authentik (never production): a dedicated
test group / naming prefix; ephemeral per-run credentials (staging is rebuildable, so
nothing persisted, none in vault.yml); reuse-or-create; teardown via staging rebuild
or explicit test-group cleanup.
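
The "ephemeral per-run credentials" idea can be sketched as follows. This covers only the "create" half of reuse-or-create and only generates values in memory; registering the user in the staging Authentik is deliberately left out. The `verify-` prefix, the `EphemeralIdentity` type, and the function name are assumptions made for illustration, not names defined by this ADR.

```python
import secrets
from dataclasses import dataclass

# Hypothetical naming prefix; the ADR only says "a dedicated test group / naming prefix".
TEST_PREFIX = "verify-"


@dataclass(frozen=True)
class EphemeralIdentity:
    username: str
    password: str


def new_test_identity(service: str) -> EphemeralIdentity:
    """Build a throwaway identity for a single verification run."""
    run_id = secrets.token_hex(4)  # 8 hex characters to tell runs apart
    return EphemeralIdentity(
        username=f"{TEST_PREFIX}{service}-{run_id}",
        # Held in memory for the run only; never written to vault.yml.
        password=secrets.token_urlsafe(24),
    )
```

Because the prefix is fixed, an explicit cleanup pass could select test users by name as well as by group membership.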
Reporting & manual handoff
/verify-service writes docs/testing/reviews/YYYY-MM-DD-<service>.md (+ latest.md),
mirroring /review-repo and /capacity-review: pass/fail per VERIFY.md journey,
observations, the test-user/env used, a verdict, and a structured manual-test
checklist for anything Claude can't do (physical device, paid/external flow,
subjective judgment) — the "instruct me on tests" output. Screenshots are saved to a
git-ignored working dir on ubongo (PNG bloat + secret-leak risk); the report links
them.
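
A minimal sketch of the report naming and per-journey layout described above. Only the path pattern comes from this ADR; the report headings, the `render_report` signature, and the decision to fail a run with no journeys are assumptions, and updating `latest.md` is not shown.

```python
from datetime import date
from pathlib import Path

REVIEWS_DIR = Path("docs/testing/reviews")


def report_path(service: str, run_date: date) -> Path:
    """Path of the dated report: docs/testing/reviews/YYYY-MM-DD-<service>.md."""
    return REVIEWS_DIR / f"{run_date.isoformat()}-{service}.md"


def render_report(
    service: str,
    run_date: date,
    journeys: dict[str, bool],
    manual_checks: list[str],
) -> str:
    """Render a hypothetical report body: verdict, journeys, manual handoff."""
    # Assumption: a run with no journeys recorded is not a pass.
    verdict = "PASS" if journeys and all(journeys.values()) else "FAIL"
    lines = [
        f"# {service} verification, {run_date.isoformat()}",
        "",
        f"Verdict: {verdict}",
        "",
        "## Journeys",
    ]
    lines += [f"- [{'pass' if ok else 'fail'}] {name}" for name, ok in journeys.items()]
    lines += ["", "## Manual tests for the operator"]
    lines += [f"- [ ] {item}" for item in manual_checks]
    return "\n".join(lines) + "\n"
```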
Safety
- Staging-only guard: the skill refuses to run against production (exploratory clicking is destructive); ADR-002-aligned hard stop.
- Confined blast radius: test users only in the staging `test` group; the run sticks to the target service.
- No secrets leaked: the git-ignored screenshot dir is the safety boundary; avoid capturing credential screens.
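
The staging-only hard stop could look roughly like this. It is a sketch under the assumption that the skill knows its target environment as a plain string; `require_staging` and `ProductionTargetError` are hypothetical names, and how the environment is actually determined is not specified by this ADR.

```python
class ProductionTargetError(RuntimeError):
    """Raised when a verification run targets anything other than staging."""


def require_staging(environment: str) -> None:
    """Refuse to continue unless the target environment is staging.

    This is an allow-list: any value other than "staging" (production, a
    typo, an empty string) is refused instead of being given the benefit
    of the doubt.
    """
    if environment.strip().lower() != "staging":
        raise ProductionTargetError(
            f"refusing to verify against {environment!r}: staging only"
        )
```

Calling the guard before the browser is launched keeps a refused run from touching the target at all.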
Dependencies
- `ubongo` (ADR-015): runs the browser. Designed, not built.
- `playwright` Claude Code plugin: enabled when this lands (claude-code-setup.md).
- Authentik (CAPABILITIES §2, planned): central IdP for test users + SSO.
- A staging deploy of the service (ADR-008 Level 2): staging is currently empty stubs.
- `make new-role` scaffolding `VERIFY.md`: deferred to when that scaffold is next touched.
What was ruled out
| Option | Reason |
|---|---|
| Scripted Playwright regression suite | Operator wants exploratory judgment; scripts add maintenance burden. Could be a later layer, not this. |
| Scheduled headless smoke gate | Needs determinism the exploratory nature excludes; belongs to health checks / Uptime Kuma. |
| Verify against production | Exploratory clicking + test-user creation is destructive/polluting; staging sandbox instead. |
| Free-form, no per-service spec | Non-repeatable, can miss a critical flow; VERIFY.md gives a backbone. |
| Staging bypasses SSO / per-app users | Wouldn't exercise the real Caddy+Authentik path; central test users are faithful. |
| Commit screenshots to the repo | Repo bloat + secret-leak risk; git-ignored on ubongo. |
See also: ADR-008 (testing — expanded), ADR-015 (control host), ADR-002 (security),
ADR-004 (VERIFY.md parallels SECURITY.md), ADR-013/014 (heritage / knowledge sourcing).
Consequences
- The harness is confined to staging by a hard stop: it refuses to run against production because exploratory clicking is destructive, the blast radius is bounded to the target service, and test users live only in the staging `test` group (Safety).
- No secrets leak: the git-ignored screenshot dir is the safety boundary and credential screens are avoided (Safety; Reporting & manual handoff).
- Test identities are ephemeral per-run credentials in the staging Authentik only (never production, none persisted in `vault.yml`), created reuse-or-create and torn down via staging rebuild or `test`-group cleanup (Test-user standard).
- Anything Claude cannot exercise (physical device, paid/external flow, subjective judgment) is handed off via a structured manual-test checklist in the run report (Reporting & manual handoff).
- Authoring is possible now (this ADR, the `VERIFY.md` template, the `/verify-service` skill, conventions/checklist edits), but running is deferred on its dependencies: `ubongo`, the `playwright` plugin, Authentik, a staging deploy, and `make new-role` scaffolding `VERIFY.md` (Status; Dependencies).