boma/docs/decisions/017-service-ui-verification.md


ADR-017 — Service-UI acceptance verification (Level 4)

Status

Accepted (2026-06-05). Designed. Authorable now: this ADR, the ADR-008 Level 4 expansion, the VERIFY.md template, the /verify-service skill, the convention/checklist/Further-reading edits, .gitignore/dir, STATUS/TODO. Running is deferred until the items under Dependencies are in place.

Context

ADR-008 defines testing Levels 1–3 (Molecule, staging deploy, external smoke) and a Level 4 stub. Nothing below Level 4 exercises a service's application UI; none of those levels answers "does PhotoPrism actually let me log in, upload a photo, and see a thumbnail?" (TODO 8.2). The operator's ask (TODO 2.2 headless browsing + TODO 2.3 test users + manual-test instruction): Claude spins up a browser, sees the service UI, exercises it, generates test users, and instructs the operator on manual tests. Today Claude sees a browser only passively (/screenshot fetches operator-taken shots from mamba); this is the active counterpart.

Decision

A Claude-driven exploratory service-UI verification harness — Level 4 — invoked as /verify-service <name> on ubongo. Five settled forks:

  1. Claude-driven exploratory — Claude navigates with judgment, not deterministic scripts. A scripted regression suite is explicitly not built here.
  2. Interactive, Claude-in-the-loop — exploratory judgment can't be a headless cron gate; scheduled smoke is a determinism job for health checks / Uptime Kuma later.
  3. Staging, full exercise — Claude creates test users and exercises features (incl. destructive flows) against a staging deploy; the rebuildable sandbox resolves safety.
  4. Test users in Authentik (central IdP), real SSO flow — authenticates through Caddy (ADR-024) + Authentik as a real user would.
  5. Per-service VERIFY.md backbone + free exploration — each service role ships an acceptance spec of critical journeys; Claude executes it and explores beyond it.

VERIFY.md standard

Every service role ships a populated roles/<service>/VERIFY.md, copied from docs/testing/service-verify-template.md — parallel to SECURITY.md from service-security-template.md. A new role convention. It lists the service's critical user journeys (what "working" means), what good looks like, and what is not browser-verifiable (→ manual handoff). It also joins the pre-production gate in docs/security/service-checklist.md.
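The "every service role ships a populated VERIFY.md" convention could be backed by a small check in the pre-production gate. The following is a minimal sketch, not something the repo contains: the function name is hypothetical, it treats every directory under roles/ as a service role (a simplification), and it approximates "populated" as "exists and is not blank".

```python
from pathlib import Path


def roles_missing_verify(roles_dir: Path) -> list[str]:
    """Return names of role directories lacking a non-empty VERIFY.md."""
    missing = []
    for role in sorted(p for p in roles_dir.iterdir() if p.is_dir()):
        spec = role / "VERIFY.md"
        # Assumption: "populated" is approximated as "exists and is not blank".
        if not spec.is_file() or not spec.read_text(encoding="utf-8").strip():
            missing.append(role.name)
    return missing
```

An empty return value would mean every role directory carries a spec; a stricter check could also compare the file against the template to catch unedited copies.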

Test-user standard (TODO 2.3)

Test identities live only in the staging Authentik (never production): a dedicated test group / naming prefix; ephemeral per-run credentials (staging is rebuildable, so nothing persisted, none in vault.yml); reuse-or-create; teardown via staging rebuild or explicit test-group cleanup.
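The naming-prefix and ephemeral-credential rules can be illustrated with a short sketch. Everything named here is an assumption for illustration: the prefix value, the class, and both functions are hypothetical, and the actual creation of the user in the staging Authentik is deliberately left out.

```python
import secrets
from dataclasses import dataclass

TEST_PREFIX = "verify-test-"  # hypothetical naming prefix, not defined by this ADR


@dataclass(frozen=True)
class EphemeralUser:
    username: str
    password: str


def new_test_user(service: str) -> EphemeralUser:
    """Build a per-run test identity in memory; nothing is persisted."""
    suffix = secrets.token_hex(4)  # 8 hex characters, unique enough per run
    return EphemeralUser(
        username=f"{TEST_PREFIX}{service}-{suffix}",
        password=secrets.token_urlsafe(24),
    )


def is_test_user(username: str) -> bool:
    """Cleanup should only ever touch identities carrying the test prefix."""
    return username.startswith(TEST_PREFIX)
```

Keeping the prefix check in one place means explicit test-group cleanup can filter on it and never touch a non-test identity.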

Reporting & manual handoff

/verify-service writes docs/testing/reviews/YYYY-MM-DD-<service>.md (+ latest.md), mirroring /review-repo and /capacity-review: pass/fail per VERIFY.md journey, observations, the test-user/env used, a verdict, and a structured manual-test checklist for anything Claude can't do (physical device, paid/external flow, subjective judgment) — the "instruct me on tests" output. Screenshots are saved to a git-ignored working dir on ubongo (PNG bloat + secret-leak risk); the report links them.
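The report naming and per-journey verdict described above can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, latest.md is assumed to sit in the same directory as the dated report, and the "all journeys must pass" rule is one plausible way to derive a verdict rather than a rule this ADR states.

```python
from datetime import date
from pathlib import Path

REVIEWS_DIR = Path("docs/testing/reviews")


def report_paths(service: str, run_date: date) -> tuple[Path, Path]:
    """Return (dated report, latest pointer) for one verification run."""
    dated = REVIEWS_DIR / f"{run_date.isoformat()}-{service}.md"
    # Assumption: latest.md lives alongside the dated reports.
    return dated, REVIEWS_DIR / "latest.md"


def verdict(journeys: dict[str, bool]) -> str:
    """Assumption: a run passes only if every VERIFY.md journey passed."""
    return "pass" if journeys and all(journeys.values()) else "fail"
```

An empty journey map yields "fail" on purpose: a run that exercised nothing should not read as a pass.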

Safety

  • Staging-only guard — the skill refuses to run against production (exploratory clicking is destructive); ADR-002-aligned hard stop.
  • Confined blast radius — test users only in the staging test group; the run sticks to the target service.
  • No secrets leaked — the git-ignored screenshot dir is the safety boundary; avoid capturing credential screens.
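The staging-only guard can be pictured as an allow-list check that runs before any browser is started. This is a hedged sketch: the exception and function names are hypothetical, and how the skill actually learns which environment it is pointed at is not specified in this ADR.

```python
class ProductionTargetError(RuntimeError):
    """Raised when a run is pointed at anything other than staging."""


def require_staging(environment: str) -> None:
    """Hard stop: return silently for staging, raise for everything else."""
    # Allow-list rather than deny-list: unknown environments are refused too.
    if environment.strip().lower() != "staging":
        raise ProductionTargetError(
            f"refusing to run against {environment!r}: staging only"
        )
```

Checking for "is staging" instead of "is not production" means a mistyped or newly added environment fails closed.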

Dependencies

  • ubongo (ADR-015) — runs the browser. Designed, not built.
  • playwright Claude Code plugin — enabled when this lands (claude-code-setup.md).
  • Authentik (CAPABILITIES §2, planned) — central IdP for test users + SSO.
  • A staging deploy of the service (ADR-008 Level 2) — staging is currently empty stubs.
  • make new-role scaffolding VERIFY.md — deferred to when that scaffold is next touched.

What was ruled out

| Option | Reason |
| --- | --- |
| Scripted Playwright regression suite | Operator wants exploratory judgment; scripts add maintenance burden. Could be a later layer, not this. |
| Scheduled headless smoke gate | Needs determinism the exploratory nature excludes; belongs to health checks / Uptime Kuma. |
| Verify against production | Exploratory clicking + test-user creation is destructive/polluting; staging sandbox instead. |
| Free-form, no per-service spec | Non-repeatable, can miss a critical flow; VERIFY.md gives a backbone. |
| Staging bypasses SSO / per-app users | Wouldn't exercise the real Caddy+Authentik path; central test users are faithful. |
| Commit screenshots to the repo | Repo bloat + secret-leak risk; git-ignored on ubongo. |

See also: ADR-008 (testing — expanded), ADR-015 (control host), ADR-002 (security), ADR-004 (VERIFY.md parallels SECURITY.md), ADR-013/014 (heritage / knowledge sourcing).

Consequences

  • The harness is confined to staging by a hard stop: it refuses to run against production because exploratory clicking is destructive, the blast radius is bounded to the target service, and test users live only in the staging test group (Safety).
  • No secrets leak: the git-ignored screenshot dir is the safety boundary and credential screens are avoided (Safety; Reporting & manual handoff).
  • Test identities are ephemeral per-run credentials in the staging Authentik only — never production, none persisted in vault.yml — created reuse-or-create and torn down via staging rebuild or test-group cleanup (Test-user standard).
  • Anything Claude cannot exercise (physical device, paid/external flow, subjective judgment) is handed off via a structured manual-test checklist in the run report (Reporting & manual handoff).
  • Authoring is possible now (this ADR, the VERIFY.md template, the /verify-service skill, conventions/checklist edits), but running is deferred on its dependencies: ubongo, the playwright plugin, Authentik, a staging deploy, and make new-role scaffolding VERIFY.md (Status; Dependencies).