boma/docs/decisions/017-service-ui-verification.md
sjat 175777e36a docs: reconcile 2026-06-14 review findings (O1-O7,O18,O22)
- STATUS: docker_host is built+applied, not scaffold-only (O1)
- ADR-004: backup points to ADR-022, not "out of scope"; service-role file
  table gains ACCESS.md + BACKUP.md rows (O2, O5)
- Finish Traefik->Caddy: ADR-008/011/017/019, CAPABILITIES, TODO (O3); scope
  ADR-024's custom-image/NetBird claims to the deferred DNS-01/M4b paths (O22)
- ADR-016/017/018 now lead with ## Status per ADR-023 (O4)
- ADR-002: caveat `PLAYBOOK=upgrade` as planned/unbuilt (O6)
- CAPABILITIES: carve out ubongo's dev_env from the nvim/tmux exclusion (O7)
- ADR-007: one authoritative boma.baobab.band -> boma.wingu.me transition note (O18)
- new-host Part E: note ubongo is managed as sjat, ansible-user bootstrap pending (O15)

O9 (hosts.yml header) left open: the file is generator-owned (hook-protected);
fixing it needs a tf_to_inventory.py change or a tf-inventory run, not a hand-edit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 19:06:33 +02:00

# ADR-017 — Service-UI acceptance verification (Level 4)
## Status
Accepted (2026-06-05). Designed. **Authorable now:** this ADR, the ADR-008 Level 4 expansion, the `VERIFY.md`
template, the `/verify-service` skill, the convention/checklist/Further-reading edits,
`.gitignore`/dir, STATUS/TODO. **Running is deferred** on its dependencies.
## Context
ADR-008 defines testing Levels 1-3 (Molecule, staging deploy, external smoke) and a
Level 4 stub. Nothing below Level 4 exercises a service's **application UI** — none
answer "does PhotoPrism actually let me log in, upload a photo, and see a thumbnail?"
(TODO 8.2). The operator's ask (TODO 2.2 headless browsing + TODO 2.3 test users +
manual-test instruction): Claude spins up a browser, *sees* the service UI, exercises
it, generates test users, and instructs the operator on manual tests. Today Claude sees
a browser only passively (`/screenshot` fetches operator-taken shots from `mamba`); this
is the active counterpart.
## Decision
A Claude-driven exploratory service-UI verification harness — **Level 4** — invoked as
`/verify-service <name>` on `ubongo`. Five settled forks:
1. **Claude-driven exploratory** — Claude navigates with judgment, not deterministic
scripts. A scripted regression suite is explicitly not built here.
2. **Interactive, Claude-in-the-loop** — exploratory judgment can't be a headless cron
gate; scheduled smoke testing needs determinism and is left to health checks / Uptime Kuma later.
3. **Staging, full exercise** — Claude creates test users and exercises features
(incl. destructive flows) against a *staging* deploy; because staging is a rebuildable
sandbox, destructive actions are safe to run there.
4. **Test users in Authentik (central IdP), real SSO flow** — authenticates through
Caddy (ADR-024) + Authentik as a real user would.
5. **Per-service `VERIFY.md` backbone + free exploration** — each service role ships an
acceptance spec of critical journeys; Claude executes it and explores beyond it.
## VERIFY.md standard
Every service role ships a populated `roles/<service>/VERIFY.md`, copied from
`docs/testing/service-verify-template.md` — parallel to `SECURITY.md` from
`service-security-template.md`. A new role convention. It lists the service's critical
user journeys (what "working" means), what good looks like, and what is not
browser-verifiable (→ manual handoff). It also joins the pre-production gate in
`docs/security/service-checklist.md`.
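
The "populated `VERIFY.md`" requirement could be checked mechanically as part of the pre-production gate. The following is a minimal sketch, not the repository's actual check: the helper name `missing_verify_specs` is hypothetical, and the rule that a file counts as populated once it is non-empty and differs from the template is an assumption this ADR does not state.

```python
from pathlib import Path


def missing_verify_specs(roles_dir: Path, template: Path) -> list[str]:
    """Return the names of roles whose VERIFY.md is absent or unpopulated.

    Assumption (not from the ADR): "populated" means the file is non-empty
    and differs from the template it was copied from.
    """
    template_text = template.read_text(encoding="utf-8").strip()
    missing = []
    for role in sorted(p for p in roles_dir.iterdir() if p.is_dir()):
        spec = role / "VERIFY.md"
        if not spec.is_file():
            missing.append(role.name)  # no spec at all
            continue
        content = spec.read_text(encoding="utf-8").strip()
        if content in ("", template_text):
            missing.append(role.name)  # empty, or still the untouched template
    return missing
```

Called as `missing_verify_specs(Path("roles"), Path("docs/testing/service-verify-template.md"))`, an empty result would mean every role passes this part of the gate.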
## Test-user standard (TODO 2.3)
Test identities live only in the **staging** Authentik (never production): a dedicated
`test` group / naming prefix; ephemeral per-run credentials (staging is rebuildable, so
nothing is persisted and none are stored in `vault.yml`); reuse-or-create; teardown via staging rebuild
or explicit `test`-group cleanup.
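
The reuse-or-create and cleanup rules above can be sketched with an in-memory stand-in for the staging IdP. This is illustrative only: `StagingDirectory`, `EphemeralUser` and the `test-` prefix are hypothetical names, and nothing here calls the real Authentik API.

```python
import secrets
from dataclasses import dataclass

TEST_PREFIX = "test-"  # assumed naming prefix; the ADR does not fix one
TEST_GROUP = "test"    # dedicated group named in the ADR


@dataclass
class EphemeralUser:
    username: str
    password: str  # generated per run, held in memory, never written to vault.yml
    group: str = TEST_GROUP


class StagingDirectory:
    """In-memory stand-in for the staging IdP (hypothetical, not Authentik)."""

    def __init__(self) -> None:
        self._users: dict[str, EphemeralUser] = {}

    def reuse_or_create(self, service: str) -> EphemeralUser:
        """Return the existing test user for a service, or create one."""
        username = f"{TEST_PREFIX}{service}"
        user = self._users.get(username)
        if user is None:
            user = EphemeralUser(username, secrets.token_urlsafe(24))
            self._users[username] = user
        return user

    def cleanup_test_group(self) -> int:
        """Delete every user in the test group; return how many were removed."""
        doomed = [name for name, u in self._users.items() if u.group == TEST_GROUP]
        for name in doomed:
            del self._users[name]
        return len(doomed)
```

Keeping the credential only in the returned object mirrors the "nothing persisted" rule: once the run ends or staging is rebuilt, the password is gone.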
## Reporting & manual handoff
`/verify-service` writes `docs/testing/reviews/YYYY-MM-DD-<service>.md` (+ `latest.md`),
mirroring `/review-repo` and `/capacity-review`: pass/fail per `VERIFY.md` journey,
observations, the test-user/env used, a verdict, and a structured **manual-test
checklist** for anything Claude can't do (physical device, paid/external flow,
subjective judgment) — the "instruct me on tests" output. Screenshots are saved to a
git-ignored working dir on `ubongo` (PNG bloat + secret-leak risk); the report links
them.
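
As an illustration of the report naming and contents described above, here is a minimal sketch. The path pattern comes from this ADR. The Markdown layout, the PASS/FAIL wording and the function names are assumptions; the `latest.md` copy and screenshot links are left out.

```python
from datetime import date
from pathlib import Path

REVIEWS_DIR = Path("docs/testing/reviews")  # location named in the ADR


def report_path(service: str, run_date: date) -> Path:
    """Build docs/testing/reviews/YYYY-MM-DD-<service>.md for a run."""
    return REVIEWS_DIR / f"{run_date.isoformat()}-{service}.md"


def render_report(service: str, run_date: date,
                  journeys: dict[str, bool], manual_checks: list[str]) -> str:
    """Render per-journey results, a verdict, and the manual-test checklist.

    The layout is hypothetical; the ADR only lists what the report contains.
    """
    # A run with no journeys is treated as a failure rather than a vacuous pass.
    verdict = "PASS" if journeys and all(journeys.values()) else "FAIL"
    lines = [
        f"# {service} verification, {run_date.isoformat()}",
        "",
        f"Verdict: {verdict}",
        "",
        "## Journeys",
    ]
    lines += [f"- [{'pass' if ok else 'fail'}] {name}" for name, ok in journeys.items()]
    lines += ["", "## Manual-test checklist"]
    lines += [f"- [ ] {item}" for item in manual_checks]
    return "\n".join(lines) + "\n"
```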
## Safety
- **Staging-only guard** — the skill refuses to run against production (exploratory
clicking is destructive); ADR-002-aligned hard stop.
- **Confined blast radius** — test users only in the staging `test` group; the run
sticks to the target service.
- **No secrets leaked** — the git-ignored screenshot dir is the safety boundary;
avoid capturing credential screens.
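
The staging-only guard could take the shape of a fail-closed allowlist check that runs before any browser is started. This is a minimal sketch under stated assumptions: the hostname `photoprism.staging.example`, the `STAGING_HOSTS` allowlist and the function name are hypothetical, and the ADR does not say how the skill identifies staging.

```python
from urllib.parse import urlsplit

# Assumption: staging targets are identified by an explicit hostname allowlist.
STAGING_HOSTS = frozenset({"photoprism.staging.example"})  # hypothetical host


class ProductionTargetError(RuntimeError):
    """Raised when a run is requested against anything not known to be staging."""


def require_staging(url: str, allowed: frozenset = STAGING_HOSTS) -> str:
    """Return the target hostname if it is an allowed staging host, else raise.

    Fails closed: an unparseable URL or an unknown host is treated as production.
    """
    host = urlsplit(url).hostname
    if host is None or host not in allowed:
        raise ProductionTargetError(
            f"refusing to run against non-staging target: {url!r}"
        )
    return host
```

An allowlist is used here rather than a pattern such as "hostname contains staging" so that a mistyped or unexpected target is refused by default.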
## Dependencies
- `ubongo` (ADR-015) — runs the browser. Designed, not built.
- `playwright` Claude Code plugin — enabled when this lands (`claude-code-setup.md`).
- Authentik (CAPABILITIES §2, planned) — central IdP for test users + SSO.
- A staging deploy of the service (ADR-008 Level 2) — staging is currently empty stubs.
- `make new-role` scaffolding `VERIFY.md` — deferred to when that scaffold is next touched.
## What was ruled out
| Option | Reason |
|---|---|
| Scripted Playwright regression suite | Operator wants exploratory judgment; scripts add maintenance burden. Could be a later layer, not this. |
| Scheduled headless smoke gate | Needs determinism the exploratory nature excludes; belongs to health checks / Uptime Kuma. |
| Verify against production | Exploratory clicking + test-user creation is destructive/polluting; staging sandbox instead. |
| Free-form, no per-service spec | Non-repeatable, can miss a critical flow; `VERIFY.md` gives a backbone. |
| Staging bypasses SSO / per-app users | Wouldn't exercise the real Caddy+Authentik path; central test users are faithful. |
| Commit screenshots to the repo | Repo bloat + secret-leak risk; git-ignored on `ubongo`. |

See also: ADR-008 (testing — expanded), ADR-015 (control host), ADR-002 (security),
ADR-004 (`VERIFY.md` parallels `SECURITY.md`), ADR-013/014 (heritage / knowledge sourcing).
## Consequences
- The harness is confined to staging by a hard stop: it refuses to run against
production because exploratory clicking is destructive, the blast radius is bounded to
the target service, and test users live only in the staging `test` group (Safety).
- No secrets leak: the git-ignored screenshot dir is the safety boundary and credential
screens are avoided (Safety; Reporting & manual handoff).
- Test identities are ephemeral per-run credentials in the staging Authentik only —
never production, none persisted in `vault.yml` — created reuse-or-create and torn
down via staging rebuild or `test`-group cleanup (Test-user standard).
- Anything Claude cannot exercise (physical device, paid/external flow, subjective
judgment) is handed off via a structured manual-test checklist in the run report
(Reporting & manual handoff).
- Authoring is possible now (this ADR, the `VERIFY.md` template, the `/verify-service`
skill, conventions/checklist edits), but running is deferred on its dependencies:
`ubongo`, the `playwright` plugin, Authentik, a staging deploy, and `make new-role`
scaffolding `VERIFY.md` (Status; Dependencies).