boma/docs/decisions/017-service-ui-verification.md
sjat 9e0c264658 docs: reconcile lower-severity review findings (O9-O24)
- ADR-007: document ubongo on the legacy V4 net at 10.20.10.151 (transitional,
  outside the planned srv /24 until the LAN is re-cut) (O10); single authoritative
  boma.baobab.band -> boma.wingu.me transition note already added earlier
- terraform tfvars.example + variables.tf (both envs): pve01 -> pve0 and
  <host>.boma.baobab.band per ADR-007 naming (O11)
- ADR-012/013/015/016/017/018: convert "See also:" prose to `## Related` sections
  placed after Consequences, matching ADR-014/019-023 (O13)
- docs/README + inventories/README: list the missing subdirs / offsite_hosts +
  offsite.yml merge behaviour (O14, O29 note)
- ADR-009: drop the retired `nyumbani` example; use vaultwarden.wingu.me split-horizon (O19)
- ROADMAP M2: askari shipped as cx23/x86 (CAX11/ARM out of stock) (O20)
- ADR-020: 80/443/3478 opened in M4a (past tense); coordinator role is M4b (O21)
- netbird -> netbird_coordinator across ROADMAP M4b, the M4b plan, ADR-024 (O23)
- ADR-024: align the M1 DNS-01 wildcard scope wording with ROADMAP (O24)
- capacity-scan.py: read the inventory directory so offsite.yml (askari) is seen (O28)
- tf_to_inventory.py: generated header now warns it overwrites the manual control node (O9)
- tests/tags.yml: proxy concern comment Traefik -> Caddy (missed in the O3 sweep)

O9's existing stub hosts.yml header stays as-is (generator-owned, hook-protected);
the fix lives in the generator for the next regeneration. make lint + pytest (57) green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 19:31:40 +02:00


ADR-017 — Service-UI acceptance verification (Level 4)

Status

Accepted (2026-06-05). Designed. Authorable now: this ADR, the ADR-008 Level 4 expansion, the VERIFY.md template, the /verify-service skill, the convention/checklist/Further-reading edits, the .gitignore entry and working dir, and the STATUS/TODO updates. Running the harness is deferred until its dependencies are in place.

Context

ADR-008 defines testing Levels 1–3 (Molecule, staging deploy, external smoke) and a Level 4 stub. None of Levels 1–3 exercises a service's application UI, so none answers "does PhotoPrism actually let me log in, upload a photo, and see a thumbnail?" (TODO 8.2). The operator's ask (TODO 2.2 headless browsing, TODO 2.3 test users, plus manual-test instruction) is that Claude spins up a browser, sees the service UI, exercises it, generates test users, and instructs the operator on manual tests. Today Claude sees a browser only passively (/screenshot fetches operator-taken shots from mamba); this is the active counterpart.

Decision

A Claude-driven exploratory service-UI verification harness — Level 4 — invoked as /verify-service <name> on ubongo. Five settled forks:

  1. Claude-driven exploratory — Claude navigates with judgment, not deterministic scripts. A scripted regression suite is explicitly not built here.
  2. Interactive, Claude-in-the-loop — exploratory judgment can't be a headless cron gate; scheduled smoke is a determinism job for health checks / Uptime Kuma later.
  3. Staging, full exercise — Claude creates test users and exercises features (incl. destructive flows) against a staging deploy; the rebuildable sandbox resolves safety.
  4. Test users in Authentik (central IdP), real SSO flow — authenticates through Caddy (ADR-024) + Authentik as a real user would.
  5. Per-service VERIFY.md backbone + free exploration — each service role ships an acceptance spec of critical journeys; Claude executes it and explores beyond it.

VERIFY.md standard

Every service role ships a populated roles/<service>/VERIFY.md, copied from docs/testing/service-verify-template.md — parallel to SECURITY.md from service-security-template.md. A new role convention. It lists the service's critical user journeys (what "working" means), what good looks like, and what is not browser-verifiable (→ manual handoff). It also joins the pre-production gate in docs/security/service-checklist.md.
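A pre-production gate of this kind could be checked mechanically. The following is a minimal sketch, not the repository's actual check: the function name `roles_missing_verify` is hypothetical, and it assumes every directory directly under `roles/` is a service role, which may be too broad for roles that ship no UI.

```python
from pathlib import Path


def roles_missing_verify(roles_dir: Path) -> list[str]:
    """Return the names of role directories that lack a populated VERIFY.md.

    Assumption: every directory directly under roles_dir is a service role.
    """
    missing = []
    for role in sorted(p for p in roles_dir.iterdir() if p.is_dir()):
        spec = role / "VERIFY.md"
        # An empty or whitespace-only file counts as missing, because the
        # convention asks for a populated spec, not a placeholder.
        if not spec.is_file() or not spec.read_text(encoding="utf-8").strip():
            missing.append(role.name)
    return missing
```

Calling `roles_missing_verify(Path("roles"))` from a lint step and failing on a non-empty result would surface roles that still need a spec before they reach the checklist.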

Test-user standard (TODO 2.3)

Test identities live only in the staging Authentik (never production): a dedicated test group / naming prefix; ephemeral per-run credentials (staging is rebuildable, so nothing persisted, none in vault.yml); reuse-or-create; teardown via staging rebuild or explicit test-group cleanup.
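As an illustration of the naming-prefix and ephemeral-credential rules, here is a hedged sketch that only generates an identity in memory. The prefix `verify-test-`, the class `EphemeralIdentity`, and both function names are assumptions made for this example; creating the user in the staging Authentik is deliberately left out, since this ADR does not specify that call.

```python
import secrets
from dataclasses import dataclass

# Hypothetical naming prefix; the ADR only requires that one exists.
TEST_PREFIX = "verify-test-"


@dataclass(frozen=True)
class EphemeralIdentity:
    username: str
    password: str


def new_test_identity(service: str) -> EphemeralIdentity:
    """Generate a per-run identity that is never written to disk or vault.yml."""
    suffix = secrets.token_hex(4)  # 8 hex chars keep usernames unique per run
    return EphemeralIdentity(
        username=f"{TEST_PREFIX}{service}-{suffix}",
        password=secrets.token_urlsafe(24),
    )


def is_test_identity(username: str) -> bool:
    """True if the username carries the test prefix, e.g. for group cleanup."""
    return username.startswith(TEST_PREFIX)
```

Keeping the prefix check in one place means the explicit test-group cleanup can select accounts by name without touching any non-test user.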

Reporting & manual handoff

/verify-service writes docs/testing/reviews/YYYY-MM-DD-<service>.md (+ latest.md), mirroring /review-repo and /capacity-review: pass/fail per VERIFY.md journey, observations, the test-user/env used, a verdict, and a structured manual-test checklist for anything Claude can't do (physical device, paid/external flow, subjective judgment) — the "instruct me on tests" output. Screenshots are saved to a git-ignored working dir on ubongo (PNG bloat + secret-leak risk); the report links them.
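The report naming can be sketched as follows. This is an assumption-laden illustration rather than the skill's implementation: the helper names are hypothetical, and it assumes `latest.md` is a plain copy of the dated report in the same directory (the ADR does not say whether it is a copy, a symlink, or per-service).

```python
from datetime import date
from pathlib import Path

REVIEWS_DIR = Path("docs/testing/reviews")


def report_paths(service: str, run_date: date,
                 reviews_dir: Path = REVIEWS_DIR) -> tuple[Path, Path]:
    """Return (dated report path, latest.md path) for one verification run."""
    dated = reviews_dir / f"{run_date.isoformat()}-{service}.md"
    return dated, reviews_dir / "latest.md"


def write_report(service: str, run_date: date, body: str,
                 reviews_dir: Path = REVIEWS_DIR) -> Path:
    """Write the dated report and refresh latest.md with the same content."""
    dated, latest = report_paths(service, run_date, reviews_dir)
    reviews_dir.mkdir(parents=True, exist_ok=True)
    dated.write_text(body, encoding="utf-8")
    # Assumption: latest.md is a plain copy, not a symlink.
    latest.write_text(body, encoding="utf-8")
    return dated
```

For a run on 2026-06-14 against a service named `photoprism`, `report_paths` yields `docs/testing/reviews/2026-06-14-photoprism.md` alongside `docs/testing/reviews/latest.md`.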

Safety

  • Staging-only guard — the skill refuses to run against production (exploratory clicking is destructive); ADR-002-aligned hard stop.
  • Confined blast radius — test users only in the staging test group; the run sticks to the target service.
  • No secrets leaked — the git-ignored screenshot dir is the safety boundary; avoid capturing credential screens.
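The staging-only guard could take the shape of a fail-closed allow-list. The sketch below is one possible form under stated assumptions: the suffix `.staging.example.internal` is a placeholder (the real staging hostnames are not given in this ADR), and `require_staging` and `ProductionTargetError` are hypothetical names.

```python
from urllib.parse import urlsplit

# Placeholder allow-list; the real staging domain is not specified here.
STAGING_SUFFIXES = (".staging.example.internal",)


class ProductionTargetError(RuntimeError):
    """Raised when the target is not provably a staging host."""


def require_staging(url: str) -> str:
    """Return the target hostname, or raise unless it is on the allow-list.

    The check fails closed: a URL without a parseable hostname is refused
    rather than assumed to be staging.
    """
    host = (urlsplit(url).hostname or "").lower()
    if not host or not any(host.endswith(s) for s in STAGING_SUFFIXES):
        raise ProductionTargetError(
            f"refusing to run against non-staging target: {url!r}"
        )
    return host
```

An allow-list is used instead of a production deny-list so that an unknown or newly added host is treated as production by default, which matches the hard-stop intent.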

Dependencies

  • ubongo (ADR-015) — runs the browser. Designed, not built.
  • playwright Claude Code plugin — enabled when this lands (claude-code-setup.md).
  • Authentik (CAPABILITIES §2, planned) — central IdP for test users + SSO.
  • A staging deploy of the service (ADR-008 Level 2) — staging is currently empty stubs.
  • make new-role scaffolding VERIFY.md — deferred to when that scaffold is next touched.

What was ruled out

| Option | Reason |
| --- | --- |
| Scripted Playwright regression suite | Operator wants exploratory judgment; scripts add maintenance burden. Could be a later layer, not this. |
| Scheduled headless smoke gate | Needs determinism the exploratory nature excludes; belongs to health checks / Uptime Kuma. |
| Verify against production | Exploratory clicking + test-user creation is destructive/polluting; staging sandbox instead. |
| Free-form, no per-service spec | Non-repeatable, can miss a critical flow; VERIFY.md gives a backbone. |
| Staging bypasses SSO / per-app users | Wouldn't exercise the real Caddy+Authentik path; central test users are faithful. |
| Commit screenshots to the repo | Repo bloat + secret-leak risk; git-ignored on ubongo. |

Consequences

  • The harness is confined to staging by a hard stop: it refuses to run against production because exploratory clicking is destructive, the blast radius is bounded to the target service, and test users live only in the staging test group (Safety).
  • No secrets leak: the git-ignored screenshot dir is the safety boundary and credential screens are avoided (Safety; Reporting & manual handoff).
  • Test identities are ephemeral per-run credentials in the staging Authentik only — never production, none persisted in vault.yml — created reuse-or-create and torn down via staging rebuild or test-group cleanup (Test-user standard).
  • Anything Claude cannot exercise (physical device, paid/external flow, subjective judgment) is handed off via a structured manual-test checklist in the run report (Reporting & manual handoff).
  • Authoring is possible now (this ADR, the VERIFY.md template, the /verify-service skill, conventions/checklist edits), but running is deferred on its dependencies: ubongo, the playwright plugin, Authentik, a staging deploy, and make new-role scaffolding VERIFY.md (Status; Dependencies).

Related

ADR-008 (testing, expanded), ADR-015 (control host), ADR-002 (security), ADR-004 (VERIFY.md parallels SECURITY.md), ADR-013/014 (heritage / knowledge sourcing).