ADR-017 — Service-UI acceptance verification (Level 4)
Status
Accepted (2026-06-05). Designed. Authorable now: this ADR, the ADR-008 Level 4 expansion, the VERIFY.md
template, the /verify-service skill, the convention/checklist/Further-reading edits,
.gitignore/dir, STATUS/TODO. Running is deferred on its dependencies.
Context
ADR-008 defines testing Levels 1–3 (Molecule, staging deploy, external smoke) and a
Level 4 stub. Nothing below Level 4 exercises a service's application UI: none of
those levels answers "does PhotoPrism actually let me log in, upload a photo, and see a thumbnail?"
(TODO 8.2). The operator's ask (TODO 2.2 headless browsing + TODO 2.3 test users +
manual-test instruction): Claude spins up a browser, sees the service UI, exercises
it, generates test users, and instructs the operator on manual tests. Today Claude sees
a browser only passively (/screenshot fetches operator-taken shots from mamba); this
is the active counterpart.
Decision
A Claude-driven exploratory service-UI verification harness — Level 4 — invoked as
/verify-service <name> on ubongo. Five settled forks:
- Claude-driven exploratory — Claude navigates with judgment, not deterministic scripts. A scripted regression suite is explicitly not built here.
- Interactive, Claude-in-the-loop — exploratory judgment can't be a headless cron gate; scheduled smoke is a determinism job for health checks / Uptime Kuma later.
- Staging, full exercise — Claude creates test users and exercises features (incl. destructive flows) against a staging deploy; the rebuildable sandbox resolves safety.
- Test users in Authentik (central IdP), real SSO flow — authenticates through Caddy (ADR-024) + Authentik as a real user would.
- Per-service VERIFY.md backbone + free exploration — each service role ships an acceptance spec of critical journeys; Claude executes it and explores beyond it.
VERIFY.md standard
Every service role ships a populated roles/<service>/VERIFY.md, copied from
docs/testing/service-verify-template.md — parallel to SECURITY.md from
service-security-template.md. A new role convention. It lists the service's critical
user journeys (what "working" means), what good looks like, and what is not
browser-verifiable (→ manual handoff). It also joins the pre-production gate in
docs/security/service-checklist.md.
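The "every role ships a populated VERIFY.md" convention lends itself to a mechanical check. The following is a minimal sketch, not the repo's actual gate: the helper name `roles_missing_verify` is hypothetical, and treating "populated" as "non-empty and different from the template" is an assumption.

```python
from pathlib import Path


def roles_missing_verify(roles_dir: Path, template: Path) -> list[str]:
    """Return role names whose VERIFY.md is absent, empty, or still the raw template.

    Hypothetical helper. The "populated" heuristic (non-empty and not
    identical to the template text) is an assumption, not a rule from the ADR.
    """
    template_text = ""
    if template.is_file():
        template_text = template.read_text(encoding="utf-8").strip()

    missing = []
    for role in sorted(p for p in roles_dir.iterdir() if p.is_dir()):
        verify = role / "VERIFY.md"
        if not verify.is_file():
            missing.append(role.name)
            continue
        text = verify.read_text(encoding="utf-8").strip()
        # An untouched copy of the template does not count as populated.
        if not text or text == template_text:
            missing.append(role.name)
    return missing
```

Called as `roles_missing_verify(Path("roles"), Path("docs/testing/service-verify-template.md"))`, a non-empty result could fail the pre-production gate.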
Test-user standard (TODO 2.3)
Test identities live only in the staging Authentik (never production): a dedicated
test group / naming prefix; ephemeral per-run credentials (staging is rebuildable, so
nothing persisted, none in vault.yml); reuse-or-create; teardown via staging rebuild
or explicit test-group cleanup.
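A minimal sketch of the ephemeral, reuse-or-create credential handling described above. It deliberately makes no Authentik API calls and keeps identities in memory only, so nothing is persisted. The `verify-` prefix, the `EphemeralIdentity` type, and the `reuse_or_create` helper are all hypothetical names chosen for illustration.

```python
import secrets
from dataclasses import dataclass

# Hypothetical naming prefix that marks an account as a test identity.
TEST_PREFIX = "verify-"


@dataclass(frozen=True)
class EphemeralIdentity:
    username: str
    password: str


def reuse_or_create(
    pool: dict[str, EphemeralIdentity], service: str, run_id: str
) -> EphemeralIdentity:
    """Return the identity already minted for this run, or mint a new one.

    The pool is an in-memory dict: credentials vanish when the run ends,
    which matches "nothing persisted, none in vault.yml".
    """
    username = f"{TEST_PREFIX}{service}-{run_id}"
    if username not in pool:
        # token_urlsafe draws from the OS CSPRNG; 24 bytes of entropy.
        pool[username] = EphemeralIdentity(username, secrets.token_urlsafe(24))
    return pool[username]
```

Creating the matching user in the staging Authentik, and removing it on teardown, would sit behind this helper and is out of scope for the sketch.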
Reporting & manual handoff
/verify-service writes docs/testing/reviews/YYYY-MM-DD-<service>.md (+ latest.md),
mirroring /review-repo and /capacity-review: pass/fail per VERIFY.md journey,
observations, the test-user/env used, a verdict, and a structured manual-test
checklist for anything Claude can't do (physical device, paid/external flow,
subjective judgment) — the "instruct me on tests" output. Screenshots are saved to a
git-ignored working dir on ubongo (PNG bloat + secret-leak risk); the report links
them.
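The report naming and contents above could be sketched as follows. This is an illustration under assumptions: that latest.md sits in the same directory as the dated report, and that the body layout (per-journey lines, a verdict line, a checkbox list for manual tests) is acceptable. The function names and layout are hypothetical, not the skill's actual output format.

```python
from datetime import date
from pathlib import Path

REVIEWS_DIR = Path("docs/testing/reviews")


def report_paths(service: str, run_date: date) -> tuple[Path, Path]:
    """Return (dated report path, latest.md path) for one verification run."""
    dated = REVIEWS_DIR / f"{run_date.isoformat()}-{service}.md"
    # Assumption: latest.md lives alongside the dated reports.
    return dated, REVIEWS_DIR / "latest.md"


def render_report(
    service: str, env: str, journeys: dict[str, bool], manual: list[str]
) -> str:
    """Render pass/fail per journey, a verdict, and the manual-test checklist."""
    # A run with no journeys, or any failed journey, is a FAIL.
    verdict = "PASS" if journeys and all(journeys.values()) else "FAIL"
    lines = [f"{service} verification ({env})", ""]
    lines += [f"- {'pass' if ok else 'fail'}: {name}" for name, ok in journeys.items()]
    lines += ["", f"Verdict: {verdict}", "", "Manual tests for the operator:"]
    lines += [f"- [ ] {item}" for item in manual]
    return "\n".join(lines) + "\n"
```

Treating an empty journey list as a failure is a design choice in this sketch: a run that verified nothing should not read as a pass.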
Safety
- Staging-only guard — the skill refuses to run against production (exploratory clicking is destructive); ADR-002-aligned hard stop.
- Confined blast radius — test users only in the staging test group; the run sticks to the target service.
- No secrets leaked — the git-ignored screenshot dir is the safety boundary; avoid capturing credential screens.
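The staging-only guard can be sketched as a hard stop that runs before any browser is launched. This is a minimal illustration: how the skill actually determines its target environment is not specified here, so the `environment` string argument and the `ProductionTargetError` name are assumptions.

```python
class ProductionTargetError(RuntimeError):
    """Raised when a verification run targets anything other than staging."""


def require_staging(environment: str) -> None:
    """Refuse to proceed unless the target environment is exactly 'staging'.

    Allow-listing staging (rather than deny-listing production) means an
    unknown or misspelled environment also stops the run.
    """
    if environment.strip().lower() != "staging":
        raise ProductionTargetError(
            f"refusing to run against {environment!r}: "
            "exploratory verification is staging-only"
        )
```

A caller would invoke `require_staging(env)` first and let the exception abort the run.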
Dependencies
- ubongo (ADR-015) — runs the browser. Designed, not built.
- playwright Claude Code plugin — enabled when this lands (claude-code-setup.md).
- Authentik (CAPABILITIES §2, planned) — central IdP for test users + SSO.
- A staging deploy of the service (ADR-008 Level 2) — staging is currently empty stubs.
- make new-role scaffolding VERIFY.md — deferred to when that scaffold is next touched.
What was ruled out
| Option | Reason |
|---|---|
| Scripted Playwright regression suite | Operator wants exploratory judgment; scripts add maintenance burden. Could be a later layer, not this. |
| Scheduled headless smoke gate | Needs determinism the exploratory nature excludes; belongs to health checks / Uptime Kuma. |
| Verify against production | Exploratory clicking + test-user creation is destructive/polluting; staging sandbox instead. |
| Free-form, no per-service spec | Non-repeatable, can miss a critical flow; VERIFY.md gives a backbone. |
| Staging bypasses SSO / per-app users | Wouldn't exercise the real Caddy+Authentik path; central test users are faithful. |
| Commit screenshots to the repo | Repo bloat + secret-leak risk; git-ignored on ubongo. |
Consequences
- The harness is confined to staging by a hard stop: it refuses to run against production because exploratory clicking is destructive, the blast radius is bounded to the target service, and test users live only in the staging test group (Safety).
- No secrets leak: the git-ignored screenshot dir is the safety boundary and credential screens are avoided (Safety; Reporting & manual handoff).
- Test identities are ephemeral per-run credentials in the staging Authentik only — never production, none persisted in vault.yml — created reuse-or-create and torn down via staging rebuild or test-group cleanup (Test-user standard).
- Anything Claude cannot exercise (physical device, paid/external flow, subjective judgment) is handed off via a structured manual-test checklist in the run report (Reporting & manual handoff).
- Authoring is possible now (this ADR, the VERIFY.md template, the /verify-service skill, conventions/checklist edits), but running is deferred on its dependencies: ubongo, the playwright plugin, Authentik, a staging deploy, and make new-role scaffolding VERIFY.md (Status; Dependencies).
Related
ADR-008 (testing — expanded), ADR-015 (control host), ADR-002 (security),
ADR-004 (VERIFY.md parallels SECURITY.md), ADR-013/014 (heritage / knowledge sourcing).