TODO: mark headless-browsing + test-user standard decided (ADR-017)

STATUS: record Level 4 service-UI verification (ADR-017)
Git-ignore verify screenshots; add testing/reviews dir
2026-06-05 13:20:40 +02:00 · 2026-06-05 13:19:53 +02:00 · 2026-06-05 13:19:04 +02:00 · 2026-06-05 13:18:07 +02:00 · 2026-06-05 13:17:16 +02:00 · 2026-06-05 13:16:25 +02:00
12 changed files with 1049 additions and 9 deletions
--- a/.claude/commands/verify-service.md
+++ b/.claude/commands/verify-service.md
@@ -0,0 +1,65 @@
+Exploratory service-UI verification (ADR-008 Level 4 / ADR-017)
+
+Drive a browser against a **staging** deploy of a service, exercise its
+`roles/<service>/VERIFY.md` acceptance journeys plus free exploration, and write a
+tracked report. Argument: the service/role name (e.g. `/verify-service photoprism`).
+
+## Prerequisites (this is forward-looking — ADR-017 dependencies)
+
+This skill cannot run until all of these exist; if any is missing, say so and stop —
+do not improvise around it:
+
+- `ubongo` with the `playwright` Claude Code plugin (browser automation tools).
+- A **staging** deploy of the target service (ADR-008 Level 2).
+- Authentik (staging) for test-user provisioning + SSO.
+- `roles/<name>/VERIFY.md` present.
+
+## Process
+
+### Phase 0 — safety gate (staging only)
+
+Confirm the target resolves to the **staging** environment/inventory, never production.
+If you cannot prove it is staging, **stop** — exploratory clicking is destructive
+(ADR-002). State why you stopped.
+
+### Phase 1 — read intent
+
+Read `roles/<name>/VERIFY.md`: the Critical user journeys, What good looks like, Not
+browser-verifiable, and Test data sections.
+
+### Phase 2 — test user
+
+Provision (reuse-or-create) a test user in the staging Authentik `test` group, with
+ephemeral credentials held only for this run. Never use a real/production account.
+
+### Phase 3 — drive the browser
+
+Via the `playwright` plugin, on `ubongo`: open the service's staging URL (resolved via
+boma DNS), authenticate through the real Traefik + Authentik SSO flow, then execute each
+`VERIFY.md` journey — judging pass/fail and screenshotting key states — and free-explore
+for anything obviously broken. Save screenshots to the git-ignored `.verify-runs/`
+working dir; avoid capturing credential screens.
+
+### Phase 4 — write the report
+
+Save to `docs/testing/reviews/YYYY-MM-DD-<name>.md` and overwrite
+`docs/testing/reviews/latest.md`. Structure:
+
+- **One-line verdict** — e.g. "5/5 journeys passed; one manual check pending".
+- **Run metadata** — date, service, staging env, test user, reviewed commit SHA.
+- **Per-journey result** — pass/fail against `VERIFY.md`, with the evidence (linked
+  screenshot path) and any observation.
+- **Free-exploration findings** — anything noticed beyond the listed journeys.
+- **Manual-test checklist** — the "Not browser-verifiable" items plus anything Claude
+  couldn't do: numbered steps, expected result, and why it was handed off.
+
+### Phase 5 — clean up + commit
+
+Offer to clean up the `test`-group user (or note that the staging rebuild will).
+Commit the report markdown per CLAUDE.md git conventions. **Do not** commit
+`.verify-runs/` (git-ignored).
+
+## Notes
+
+- Reports (markdown) are committed; screenshots stay local on `ubongo` in `.verify-runs/`.
+- Exploratory and interactive — this is not a deterministic CI gate.
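Phase 0 is the skill's only hard stop, so it is worth showing what "fail closed" can mean in practice. The following is a minimal sketch. It assumes staging hostnames share a DNS suffix that the caller passes in. The function name `require_staging` and the suffix are hypothetical, and a real gate should also confirm that the host sits in the staging inventory rather than trusting the name alone.

```bash
#!/usr/bin/env bash
# Hypothetical Phase 0 gate for /verify-service: succeed only when the target
# host provably belongs to staging. ASSUMPTION: staging hosts share a DNS
# suffix supplied by the caller; a name check alone is a proxy, not proof.
require_staging() {
  local host="${1:-}" suffix="${2:-}"

  # Fail closed: a missing host or suffix means staging cannot be proven.
  if [[ -z "$host" || -z "$suffix" ]]; then
    echo "refusing: cannot prove the target is staging" >&2
    return 1
  fi

  # Require a dot before the suffix so "evilstaging.boma" cannot pass as
  # "staging.boma"; the quoted part of the pattern is matched literally.
  if [[ "$host" == *".${suffix}" ]]; then
    return 0
  fi

  echo "refusing: ${host} is not under ${suffix}" >&2
  return 1
}
```

Call it before any browser action, for example `require_staging "$target_host" "$staging_suffix" || exit 1`. Both variable names are placeholders.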
--- a/.gitignore
+++ b/.gitignore
@@ -31,3 +31,6 @@ terraform/**/*.tfstate
 terraform/**/*.tfstate.backup
 terraform/**/terraform.tfvars
 # .terraform.lock.hcl is intentionally tracked (pins provider versions)
+
+# Service-UI verification screenshots (kept locally on ubongo, not committed — ADR-017)
+.verify-runs/
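ADR-017 treats this ignore entry as the safety boundary for screenshots. A run can therefore confirm the entry still holds before it commits a report. The sketch below uses `git check-ignore`, which exits 0 when a path is ignored and 1 when it is not. The helper name `assert_ignored` and the per-run subdirectory in the usage line are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical pre-commit guard: confirm a screenshot path is git-ignored.
# Run from the repository root so the root .gitignore applies.
assert_ignored() {
  local path="$1"
  # -q: no output, exit status only; "--" stops odd filenames from being
  # parsed as options.
  if git check-ignore -q -- "$path"; then
    return 0
  fi
  echo "NOT ignored, do not commit: ${path}" >&2
  return 1
}
```

Example: `assert_ignored .verify-runs/2026-06-05-photoprism/login.png`.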
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -82,6 +82,7 @@ Full design rationale: `docs/decisions/`
 - Every role must have a populated `README.md`
 - Every role must have `meta/main.yml` filled in
 - Every **service** role must have a populated `SECURITY.md` (ADR-002/004) — copy `docs/security/service-security-template.md`
+- Every **service** role must have a populated `VERIFY.md` (ADR-008/017) — copy `docs/testing/service-verify-template.md`
 - One service = one self-contained role; no shared multi-service roles (ADR-004)
 - Role names: `snake_case`, descriptive nouns (`base`, `docker_host`, `reverse_proxy`)
 - Use `make new-role NAME=<name>` to scaffold — never create role structure by hand
@@ -204,6 +205,7 @@ Single-contributor, trunk-based (no merge requests / approval gates):
 | Network topology       | `docs/decisions/007-network.md`       |
 | Mesh VPN (NetBird, self-hosted) | `docs/decisions/016-mesh-vpn.md` |
 | Testing methodology    | `docs/decisions/008-testing.md`       |
+| Service-UI verification (Level 4) | `docs/decisions/017-service-ui-verification.md` |
 | TF ↔ Ansible handoff   | `docs/decisions/009-provisioning-handoff.md` |
 | Forgejo & CI           | `docs/decisions/010-forgejo-ci.md`    |
 | Update management      | `docs/decisions/011-update-management.md` |
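The new role convention makes `VERIFY.md` a required file alongside `README.md`, `meta/main.yml` and `SECURITY.md`. Below is a sketch of a check a contributor could run against a single service role. It counts `VERIFY.md` as unpopulated while the file still contains the template's `&lt;` placeholder escapes. The function name and that heuristic are assumptions; this is not an existing repo tool.

```bash
#!/usr/bin/env bash
# Hypothetical check for one service role directory, e.g. roles/photoprism.
# ASSUMPTION: a VERIFY.md still containing the template's "&lt;" escapes has
# not been filled in yet.
check_service_role_docs() {
  local role_dir="$1" f status=0
  for f in README.md meta/main.yml SECURITY.md VERIFY.md; do
    # -s: exists and is non-empty.
    if [[ ! -s "${role_dir}/${f}" ]]; then
      echo "missing or empty: ${role_dir}/${f}" >&2
      status=1
    fi
  done
  # -F: match the placeholder literally rather than as a regex.
  if [[ -f "${role_dir}/VERIFY.md" ]] && grep -qF '&lt;' "${role_dir}/VERIFY.md"; then
    echo "still a template: ${role_dir}/VERIFY.md" >&2
    status=1
  fi
  return "$status"
}
```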
--- a/STATUS.md
+++ b/STATUS.md
@@ -55,6 +55,7 @@ So `make deploy PLAYBOOK=site` currently **fails** on a clean clone — the `bas
 | `ubongo` — physical control / AI-worker host | ADR-015 | Replaces the cluster control VM with a dedicated always-on x86 box outside the cluster. Decision recorded; box not yet acquired/installed, not in inventory. |
 | NetBird mesh — coordinator on `askari` | ADR-016 | Self-hosted NetBird control plane (management/signal/relay) on askari; replaces ADR-007 WireGuard. Decision recorded; not deployed (askari + service-role machinery not built). |
 | NetBird agent enrollment in `base` | ADR-016 | Every Linux host joins the mesh via the base role (setup keys in vault); SSH allowed only on `wt0`. Designed; base role not built. |
+| Service-UI verification (Level 4) | ADR-017 / ADR-008 | `/verify-service` skill + `VERIFY.md` template + standards are authorable and present; *running* deferred on ubongo + `playwright` plugin + Authentik + a staging deploy. |

 ## Keeping this honest

--- a/docs/TODO.md
+++ b/docs/TODO.md
@@ -7,8 +7,12 @@
   1. Choose and configure code-testing tooling (Molecule, etc.).
   2. Decide how the AI interprets Molecule output and performs live testing:
      API calls, curl pulls of web products, log reviews, and headless browsing.
-   3. Define a standard for generating test users and for instructing the user to
-      perform relevant manual tests.
+      — Headless browsing DECIDED (ADR-017): the `/verify-service` Level 4 harness.
+      The API/curl/log-review siblings remain open.
+   3. ~~Define a standard for generating test users and for instructing the user to
+      perform relevant manual tests.~~ DECIDED (ADR-017): test users in the staging
+      Authentik `test` group; manual tests handed off as a checklist in the
+      `/verify-service` report.

 3. **Building services**
   1. Decide how to manage logs.
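The test-user standard marked as decided above requires ephemeral, per-run credentials that are never persisted. The sketch below shows one way to mint them in the shell that drives a run. The `test-` username prefix, the run-id format and the helper name are all assumptions. Creating the account in the staging Authentik `test` group is deliberately out of scope.

```bash
#!/usr/bin/env bash
# Hypothetical helper: mint a throwaway identity for one /verify-service run.
# ASSUMPTION: the "test-<service>-<run id>" naming is illustrative only.
# Creating the user in Authentik is out of scope here.
new_test_identity() {
  local service="$1"
  # Deliberately global but not exported: visible to this shell only and
  # gone when it exits; nothing is written to disk or to vault.yml.
  # UTC timestamp plus this shell's PID keeps concurrent runs distinct.
  TEST_USER="test-${service}-$(date -u +%Y%m%d%H%M%S)-$$"
  # 24 random bytes encode to exactly 32 base64 characters (no padding).
  TEST_PASS="$(head -c 24 /dev/urandom | base64 | tr -d '\n')"
}
```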
--- a/docs/decisions/008-testing.md
+++ b/docs/decisions/008-testing.md
@@ -53,14 +53,25 @@ Once `askari` is operational: scripted checks from outside the network confirmin
 that public-facing services respond correctly. Catches firewall and reverse proxy
 configuration issues invisible to Ansible check mode.

-### Level 4 — Service-UI acceptance (planned, not built)
+### Level 4 — Service-UI acceptance (Claude-driven exploratory)

-Claude drives a headless browser from `ubongo` against a *deployed* service: loads
-the rendered UI, creates test users, exercises features, and hands the operator a
-manual test script for the rest. Catches application-level regressions that no lower
-level sees. The harness (Playwright/headless-Chromium, screenshot-back-to-Claude) is
-a **separate spec**; `ubongo` is sized for it (ADR-015). Status: designed, not built
-(STATUS.md).
+A Claude-driven exploratory check of a service's **application UI**, run as
+`/verify-service <name>` on `ubongo` (ADR-017). Claude drives Chromium via the
+`playwright` plugin against a **staging** deploy, authenticates through the real
+Traefik + Authentik SSO flow using a test user in the staging `test` group, then
+executes the service's `roles/<service>/VERIFY.md` acceptance journeys *and*
+free-explores — judging pass/fail, screenshotting key states. It writes a dated report
+to `docs/testing/reviews/` and hands the operator a manual-test checklist for anything
+it can't verify (hardware, paid/external flows, subjective judgment).
+
+Catches application-level regressions no lower level sees ("does PhotoPrism actually
+serve photos?"). Placement: after Level 2 (staging deploy), before production
+promotion. Exploratory and interactive by design — *not* a deterministic CI/cron gate
+(that role belongs to health checks / Uptime Kuma).
+
+**Status:** the skill, the `VERIFY.md` template, and standards are authorable now;
+running it is deferred on `ubongo` + the `playwright` plugin + Authentik + a staging
+deploy (STATUS.md). Full design: ADR-017.

 ---

--- a/docs/decisions/017-service-ui-verification.md
+++ b/docs/decisions/017-service-ui-verification.md
@@ -0,0 +1,92 @@
+# ADR-017 — Service-UI acceptance verification (Level 4)
+
+## Context
+
+ADR-008 defines testing Levels 1–3 (Molecule, staging deploy, external smoke) and a
+Level 4 stub. Nothing below Level 4 exercises a service's **application UI** — none
+answer "does PhotoPrism actually let me log in, upload a photo, and see a thumbnail?"
+(TODO 8.2). The operator's ask (TODO 2.2 headless browsing + TODO 2.3 test users +
+manual-test instruction): Claude spins up a browser, *sees* the service UI, exercises
+it, generates test users, and instructs the operator on manual tests. Today Claude sees
+a browser only passively (`/screenshot` fetches operator-taken shots from `mamba`); this
+is the active counterpart.
+
+## Decision
+
+A Claude-driven exploratory service-UI verification harness — **Level 4** — invoked as
+`/verify-service <name>` on `ubongo`. Five settled forks:
+
+1. **Claude-driven exploratory** — Claude navigates with judgment, not deterministic
+   scripts. A scripted regression suite is explicitly not built here.
+2. **Interactive, Claude-in-the-loop** — exploratory judgment can't be a headless cron
+   gate; scheduled smoke is a determinism job for health checks / Uptime Kuma later.
+3. **Staging, full exercise** — Claude creates test users and exercises features
+   (incl. destructive flows) against a *staging* deploy; the rebuildable sandbox
+   resolves safety.
+4. **Test users in Authentik (central IdP), real SSO flow** — authenticates through
+   Traefik + Authentik as a real user would.
+5. **Per-service `VERIFY.md` backbone + free exploration** — each service role ships an
+   acceptance spec of critical journeys; Claude executes it and explores beyond it.
+
+## VERIFY.md standard
+
+Every service role ships a populated `roles/<service>/VERIFY.md`, copied from
+`docs/testing/service-verify-template.md` — parallel to `SECURITY.md` from
+`service-security-template.md`. A new role convention. It lists the service's critical
+user journeys (what "working" means), what good looks like, and what is not
+browser-verifiable (→ manual handoff). It also joins the pre-production gate in
+`docs/security/service-checklist.md`.
+
+## Test-user standard (TODO 2.3)
+
+Test identities live only in the **staging** Authentik (never production): a dedicated
+`test` group / naming prefix; ephemeral per-run credentials (staging is rebuildable, so
+nothing persisted, none in `vault.yml`); reuse-or-create; teardown via staging rebuild
+or explicit `test`-group cleanup.
+
+## Reporting & manual handoff
+
+`/verify-service` writes `docs/testing/reviews/YYYY-MM-DD-<service>.md` (+ `latest.md`),
+mirroring `/review-repo` and `/capacity-review`: pass/fail per `VERIFY.md` journey,
+observations, the test-user/env used, a verdict, and a structured **manual-test
+checklist** for anything Claude can't do (physical device, paid/external flow,
+subjective judgment) — the "instruct me on tests" output. Screenshots are saved to a
+git-ignored working dir on `ubongo` (PNG bloat + secret-leak risk); the report links
+them.
+
+## Safety
+
+- **Staging-only guard** — the skill refuses to run against production (exploratory
+  clicking is destructive); ADR-002-aligned hard stop.
+- **Confined blast radius** — test users only in the staging `test` group; the run
+  sticks to the target service.
+- **No secrets leaked** — the git-ignored screenshot dir is the safety boundary;
+  avoid capturing credential screens.
+
+## Status
+
+Designed. **Authorable now:** this ADR, the ADR-008 Level 4 expansion, the `VERIFY.md`
+template, the `/verify-service` skill, the convention/checklist/Further-reading edits,
+`.gitignore`/dir, STATUS/TODO. **Running is deferred** on its dependencies.
+
+## Dependencies
+
+- `ubongo` (ADR-015) — runs the browser. Designed, not built.
+- `playwright` Claude Code plugin — enabled when this lands (`claude-code-setup.md`).
+- Authentik (CAPABILITIES §2, planned) — central IdP for test users + SSO.
+- A staging deploy of the service (ADR-008 Level 2) — staging is currently empty stubs.
+- `make new-role` scaffolding `VERIFY.md` — deferred to when that scaffold is next touched.
+
+## What was ruled out
+
+| Option | Reason |
+|---|---|
+| Scripted Playwright regression suite | Operator wants exploratory judgment; scripts add maintenance burden. Could be a later layer, not this. |
+| Scheduled headless smoke gate | Needs determinism the exploratory nature excludes; belongs to health checks / Uptime Kuma. |
+| Verify against production | Exploratory clicking + test-user creation is destructive/polluting; staging sandbox instead. |
+| Free-form, no per-service spec | Non-repeatable, can miss a critical flow; `VERIFY.md` gives a backbone. |
+| Staging bypasses SSO / per-app users | Wouldn't exercise the real Traefik+Authentik path; central test users are faithful. |
+| Commit screenshots to the repo | Repo bloat + secret-leak risk; git-ignored on `ubongo`. |
+
+See also: ADR-008 (testing — expanded), ADR-015 (control host), ADR-002 (security),
+ADR-004 (`VERIFY.md` parallels `SECURITY.md`), ADR-013/014 (heritage / knowledge sourcing).
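The reporting rule in ADR-017 (a dated file plus an overwritten `latest.md`) fits in a small helper. The sketch below assumes the report body has already been written to a file. The function name and argument order are illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical helper for the reporting step: publish a finished report body
# as <reviews_dir>/YYYY-MM-DD-<service>.md and refresh latest.md.
publish_report() {
  local reviews_dir="$1" service="$2" body_file="$3"
  local dated
  dated="${reviews_dir}/$(date +%F)-${service}.md"   # %F is YYYY-MM-DD

  mkdir -p -- "$reviews_dir"
  cp -- "$body_file" "$dated"
  # latest.md is overwritten on every run, whichever service was verified.
  cp -- "$dated" "${reviews_dir}/latest.md"
  printf '%s\n' "$dated"
}
```

Under this naming, a second run of the same service on the same day replaces that day's dated report.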
--- a/docs/security/service-checklist.md
+++ b/docs/security/service-checklist.md
@@ -48,6 +48,9 @@ This checklist is the generic **bar**. Each service answers it in its own

 - [ ] Logs go somewhere reviewable (central aggregation when available)
 - [ ] Backup/restore is covered if the service holds state
+- [ ] Passed Level 4 service-UI verification (`/verify-service`) against staging — the
+      service has a populated `roles/<service>/VERIFY.md` and its critical journeys
+      verified (ADR-008 Level 4 / ADR-017)

 > Deviations are allowed but must be **conscious**: record them in
 > `docs/security/accepted-risks.md`, don't leave them implicit.
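To clear the new Level 4 gate, you first need the service's most recent report. Reports are named `YYYY-MM-DD-<service>.md`, so lexical order is also chronological order. The sketch below relies only on that naming. The helper name is hypothetical. The sketch does not parse the verdict, because the report format does not fix its exact wording.

```bash
#!/usr/bin/env bash
# Hypothetical lookup: print the newest dated Level 4 report for a service,
# or fail if none exists. Relies only on the YYYY-MM-DD-<service>.md naming.
newest_report() {
  local reviews_dir="$1" service="$2" f newest=""
  # Bash expands globs in sorted order and ISO dates sort chronologically,
  # so the last match is the newest. Requiring "-<service>.md" right after
  # the date keeps e.g. "x_photoprism" from matching "photoprism", and
  # latest.md never matches the date pattern.
  for f in "$reviews_dir"/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]-"$service".md; do
    [[ -e "$f" ]] && newest="$f"
  done
  [[ -n "$newest" ]] || return 1
  printf '%s\n' "$newest"
}
```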
--- a/docs/superpowers/plans/2026-06-05-service-ui-verification.md
+++ b/docs/superpowers/plans/2026-06-05-service-ui-verification.md
@@ -0,0 +1,605 @@
+# Service-UI Verification (Level 4) Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Build the authorable-now parts of ADR-008 Level 4 — a Claude-driven exploratory service-UI verification harness — namely ADR-017, the `/verify-service` skill, the per-service `VERIFY.md` template/convention, and the doc reconciliations; the *live run* stays deferred on `ubongo`/Authentik/staging.
+
+**Architecture:** Mostly documentation + two new authorable artifacts (the `/verify-service` Claude Code command and the `VERIFY.md` template). No application code, no Ansible roles (none of the prerequisite roles exist). The harness *mechanism* is the `playwright` Claude Code plugin driving Chromium on `ubongo`; this plan does not install or run it — it records the decision, the standards, and the orchestration logic.
+
+**Tech Stack:** Markdown + a Claude Code command file. Verification is the repo's pre-commit hooks plus a final cross-reference/staleness sweep. No markdown linter exists, so "tests" are hook-pass + grep checks.
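The repo has no markdown linter, so the grep checks do most of the verification work. Below is one plausible shape for the cross-reference sweep, offered as a sketch. It extracts backticked paths under `docs/` that end in `.md` and reports any that do not exist. It skips placeholder paths containing `<` or `YYYY`. The helper name is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical cross-reference sweep: every backticked docs/*.md path named in
# a file must exist. Run from the repository root.
check_doc_refs() {
  local file="$1" path status=0
  while IFS= read -r path; do
    if [[ ! -e "$path" ]]; then
      echo "missing: ${path} (referenced in ${file})" >&2
      status=1
    fi
  done < <(grep -o '`docs/[^`]*\.md`' "$file" \
             | tr -d '`' \
             | grep -v -e '<' -e 'YYYY' \
             | sort -u)
  return "$status"
}
```

Example: `check_doc_refs CLAUDE.md` after Task 6, which adds the ADR-017 row.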
+
+---
+
+## Pre-flight (read once before starting)
+
+- **`rbw` must be unlocked before every commit** (the pre-commit ansible-lint hook decrypts `vault.yml`). Run `rbw unlocked`; if it exits non-zero, stop and ask the user to `rbw unlock`.
+- **Commit style:** one commit per task, imperative subject ≤72 chars.
+- **Order matters:** Task 1 (ADR-017) lands first — later tasks link to it.
+- **Spec reference:** `docs/superpowers/specs/2026-06-05-service-ui-verification-design.md`.
+- **Branch:** the controller creates `chore/service-ui-verification-docs` off `main` before dispatching Task 1; do not implement on `main`.
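Every task's "Verify and commit" step repeats these pre-flight rules. The sketch below bundles them into one function. `commit_task` is a hypothetical name. The function only chains commands the plan already uses: `rbw unlocked`, `pre-commit run --files`, `git add` and `git commit`.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper for a task's verify-and-commit step.
# Usage: commit_task "<subject>" <file> [file...]
commit_task() {
  local msg="$1"; shift
  # The ansible-lint pre-commit hook decrypts vault.yml, so rbw must be unlocked.
  if ! rbw unlocked; then
    echo "rbw is locked: ask the user to run 'rbw unlock'" >&2
    return 1
  fi
  # Commit style from the pre-flight list: subject of at most 72 characters.
  if (( ${#msg} > 72 )); then
    echo "commit subject exceeds 72 chars" >&2
    return 1
  fi
  pre-commit run --files "$@" || return 1
  git add -- "$@"
  git commit -m "$msg"
}
```

For example, `commit_task "Add ADR-017 (service-UI acceptance verification, Level 4)" docs/decisions/017-service-ui-verification.md` covers Task 1, Step 2.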
+
+---
+
+## File map
+
+| File | Action | Responsibility |
+|---|---|---|
+| `docs/decisions/017-service-ui-verification.md` | Create | Home of record for Level 4 verification |
+| `docs/decisions/008-testing.md` | Modify | Expand the Level 4 stub; link ADR-017 |
+| `docs/testing/service-verify-template.md` | Create | The `VERIFY.md` template (parallels `service-security-template.md`) |
+| `.claude/commands/verify-service.md` | Create | The `/verify-service <name>` orchestrating skill |
+| `docs/security/service-checklist.md` | Modify | Add "passed Level 4" to the pre-deploy gate |
+| `CLAUDE.md` | Modify | Role-convention bullet (`VERIFY.md`); Further-reading ADR-017 row |
+| `.gitignore` | Modify | Ignore the screenshot working dir |
+| `docs/testing/reviews/README.md` | Create | Explains the committed-report dir (also makes the dir exist in git) |
+| `STATUS.md` | Modify | Row: Level 4 verification (skill/template authorable; running deferred) |
+| `docs/TODO.md` | Modify | Mark 2.2 (browser) + 2.3 addressed by ADR-017 |
+
+**Deferred (not in this plan):** scaffolding `VERIFY.md` into `make new-role` (do it when that scaffold is next touched — noted in ADR-017); the Authentik test-user provisioning automation; per-service `VERIFY.md` files (no service roles exist); installing/running the `playwright` plugin.
+
+---
+
+### Task 1: Author ADR-017 (the home of record)
+
+**Files:**
+- Create: `docs/decisions/017-service-ui-verification.md`
+
+- [ ] **Step 1: Create the ADR file**
+
+Create `docs/decisions/017-service-ui-verification.md` with exactly this content (preserve em-dashes —, backticks, table pipes):
+
+```markdown
+# ADR-017 — Service-UI acceptance verification (Level 4)
+
+## Context
+
+ADR-008 defines testing Levels 1–3 (Molecule, staging deploy, external smoke) and a
+Level 4 stub. Nothing below Level 4 exercises a service's **application UI** — none
+answer "does PhotoPrism actually let me log in, upload a photo, and see a thumbnail?"
+(TODO 8.2). The operator's ask (TODO 2.2 headless browsing + TODO 2.3 test users +
+manual-test instruction): Claude spins up a browser, *sees* the service UI, exercises
+it, generates test users, and instructs the operator on manual tests. Today Claude sees
+a browser only passively (`/screenshot` fetches operator-taken shots from `mamba`); this
+is the active counterpart.
+
+## Decision
+
+A Claude-driven exploratory service-UI verification harness — **Level 4** — invoked as
+`/verify-service <name>` on `ubongo`. Five settled forks:
+
+1. **Claude-driven exploratory** — Claude navigates with judgment, not deterministic
+   scripts. A scripted regression suite is explicitly not built here.
+2. **Interactive, Claude-in-the-loop** — exploratory judgment can't be a headless cron
+   gate; scheduled smoke is a determinism job for health checks / Uptime Kuma later.
+3. **Staging, full exercise** — Claude creates test users and exercises features
+   (incl. destructive flows) against a *staging* deploy; the rebuildable sandbox
+   resolves safety.
+4. **Test users in Authentik (central IdP), real SSO flow** — authenticates through
+   Traefik + Authentik as a real user would.
+5. **Per-service `VERIFY.md` backbone + free exploration** — each service role ships an
+   acceptance spec of critical journeys; Claude executes it and explores beyond it.
+
+## VERIFY.md standard
+
+Every service role ships a populated `roles/<service>/VERIFY.md`, copied from
+`docs/testing/service-verify-template.md` — parallel to `SECURITY.md` from
+`service-security-template.md`. A new role convention. It lists the service's critical
+user journeys (what "working" means), what good looks like, and what is not
+browser-verifiable (→ manual handoff). It also joins the pre-production gate in
+`docs/security/service-checklist.md`.
+
+## Test-user standard (TODO 2.3)
+
+Test identities live only in the **staging** Authentik (never production): a dedicated
+`test` group / naming prefix; ephemeral per-run credentials (staging is rebuildable, so
+nothing persisted, none in `vault.yml`); reuse-or-create; teardown via staging rebuild
+or explicit `test`-group cleanup.
+
+## Reporting & manual handoff
+
+`/verify-service` writes `docs/testing/reviews/YYYY-MM-DD-<service>.md` (+ `latest.md`),
+mirroring `/review-repo` and `/capacity-review`: pass/fail per `VERIFY.md` journey,
+observations, the test-user/env used, a verdict, and a structured **manual-test
+checklist** for anything Claude can't do (physical device, paid/external flow,
+subjective judgment) — the "instruct me on tests" output. Screenshots are saved to a
+git-ignored working dir on `ubongo` (PNG bloat + secret-leak risk); the report links
+them.
+
+## Safety
+
+- **Staging-only guard** — the skill refuses to run against production (exploratory
+  clicking is destructive); ADR-002-aligned hard stop.
+- **Confined blast radius** — test users only in the staging `test` group; the run
+  sticks to the target service.
+- **No secrets leaked** — the git-ignored screenshot dir is the safety boundary;
+  avoid capturing credential screens.
+
+## Status
+
+Designed. **Authorable now:** this ADR, the ADR-008 Level 4 expansion, the `VERIFY.md`
+template, the `/verify-service` skill, the convention/checklist/Further-reading edits,
+`.gitignore`/dir, STATUS/TODO. **Running is deferred** on its dependencies.
+
+## Dependencies
+
+- `ubongo` (ADR-015) — runs the browser. Designed, not built.
+- `playwright` Claude Code plugin — enabled when this lands (`claude-code-setup.md`).
+- Authentik (CAPABILITIES §2, planned) — central IdP for test users + SSO.
+- A staging deploy of the service (ADR-008 Level 2) — staging is currently empty stubs.
+- `make new-role` scaffolding `VERIFY.md` — deferred to when that scaffold is next touched.
+
+## What was ruled out
+
+| Option | Reason |
+|---|---|
+| Scripted Playwright regression suite | Operator wants exploratory judgment; scripts add maintenance burden. Could be a later layer, not this. |
+| Scheduled headless smoke gate | Needs determinism the exploratory nature excludes; belongs to health checks / Uptime Kuma. |
+| Verify against production | Exploratory clicking + test-user creation is destructive/polluting; staging sandbox instead. |
+| Free-form, no per-service spec | Non-repeatable, can miss a critical flow; `VERIFY.md` gives a backbone. |
+| Staging bypasses SSO / per-app users | Wouldn't exercise the real Traefik+Authentik path; central test users are faithful. |
+| Commit screenshots to the repo | Repo bloat + secret-leak risk; git-ignored on `ubongo`. |
+
+See also: ADR-008 (testing — expanded), ADR-015 (control host), ADR-002 (security),
+ADR-004 (`VERIFY.md` parallels `SECURITY.md`), ADR-013/014 (heritage / knowledge sourcing).
+```
+
+- [ ] **Step 2: Verify and commit**
+
+Run: `rbw unlocked && pre-commit run --files docs/decisions/017-service-ui-verification.md`
+Expected: Passed/Skipped.
+```bash
+git add docs/decisions/017-service-ui-verification.md
+git commit -m "Add ADR-017 (service-UI acceptance verification, Level 4)"
+```
+
+---
+
+### Task 2: Expand the ADR-008 Level 4 stub
+
+**Files:**
+- Modify: `docs/decisions/008-testing.md`
+
+- [ ] **Step 1: Replace the Level 4 stub with the full definition**
+
+Find this exact block:
+```
+### Level 4 — Service-UI acceptance (planned, not built)
+
+Claude drives a headless browser from `ubongo` against a *deployed* service: loads
+the rendered UI, creates test users, exercises features, and hands the operator a
+manual test script for the rest. Catches application-level regressions that no lower
+level sees. The harness (Playwright/headless-Chromium, screenshot-back-to-Claude) is
+a **separate spec**; `ubongo` is sized for it (ADR-015). Status: designed, not built
+(STATUS.md).
+```
+Replace with:
+```
+### Level 4 — Service-UI acceptance (Claude-driven exploratory)
+
+A Claude-driven exploratory check of a service's **application UI**, run as
+`/verify-service <name>` on `ubongo` (ADR-017). Claude drives Chromium via the
+`playwright` plugin against a **staging** deploy, authenticates through the real
+Traefik + Authentik SSO flow using a test user in the staging `test` group, then
+executes the service's `roles/<service>/VERIFY.md` acceptance journeys *and*
+free-explores — judging pass/fail, screenshotting key states. It writes a dated report
+to `docs/testing/reviews/` and hands the operator a manual-test checklist for anything
+it can't verify (hardware, paid/external flows, subjective judgment).
+
+Catches application-level regressions no lower level sees ("does PhotoPrism actually
+serve photos?"). Placement: after Level 2 (staging deploy), before production
+promotion. Exploratory and interactive by design — *not* a deterministic CI/cron gate
+(that role belongs to health checks / Uptime Kuma).
+
+**Status:** the skill, the `VERIFY.md` template, and standards are authorable now;
+running it is deferred on `ubongo` + the `playwright` plugin + Authentik + a staging
+deploy (STATUS.md). Full design: ADR-017.
+```
+
+- [ ] **Step 2: Verify and commit**
+
+Run: `rbw unlocked && pre-commit run --files docs/decisions/008-testing.md`
+Expected: Passed/Skipped.
+```bash
+git add docs/decisions/008-testing.md
+git commit -m "ADR-008: expand Level 4 into the verify-service harness (ADR-017)"
+```
+
+---
+
+### Task 3: Create the `VERIFY.md` template
+
+**Files:**
+- Create: `docs/testing/service-verify-template.md`
+
+- [ ] **Step 1: Create the template**
+
+Create `docs/testing/service-verify-template.md` with exactly this content (preserve `&lt;`/`&gt;` HTML escapes, em-dashes, backticks):
+
+```markdown
+# Per-service verification record — template
+
+Copy this file to `roles/<service>/VERIFY.md` and fill it in when building a service
+role (ADR-008 Level 4 / ADR-017). It is the per-service **acceptance spec**: the
+critical user journeys that define "working" for this service. `/verify-service <name>`
+reads it, drives a browser through them against the staging deploy, and explores beyond
+them.
+
+Delete this preamble in the copy and start from the heading below.
+
+---
+
+# Verify — &lt;service&gt;
+
+## Critical user journeys
+
+The acceptance criteria — what "working" means for this service. Numbered; each is an
+action and its expected result. Example shape (replace with this service's flows):
+
+1. SSO login via Authentik succeeds and lands on the service's home/dashboard.
+2. &lt;core action&gt; — e.g. "upload a test image" → &lt;expected&gt; — "a thumbnail renders".
+3. &lt;core action&gt; → &lt;expected&gt;.
+
+## What good looks like
+
+Key states/screens Claude should confirm (and screenshot) — the visual/textual signals
+that the journeys above actually succeeded.
+
+- &lt;e.g. "the uploaded image appears in the library grid within ~10s"&gt;
+
+## Not browser-verifiable
+
+Items to route to the manual-test handoff — things a headless browser can't or
+shouldn't judge.
+
+- &lt;e.g. hardware passthrough, a paid/external integration, subjective media quality&gt;
+
+## Test data
+
+What the journeys need, provisioned in the **staging** Authentik `test` group
+(ephemeral, torn down by staging rebuild).
+
+- &lt;e.g. "one test user; no pre-seeded content"&gt;
+```
+
+- [ ] **Step 2: Verify and commit**
+
+Run: `rbw unlocked && pre-commit run --files docs/testing/service-verify-template.md`
+Expected: Passed/Skipped.
+```bash
+git add docs/testing/service-verify-template.md
+git commit -m "Add VERIFY.md template for service-UI acceptance (ADR-017)"
+```
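Until `make new-role` scaffolds `VERIFY.md` (deferred in this plan), contributors copy the file by hand. The sketch below performs that copy and also follows the template's own instruction to delete the preamble. It keeps everything from the `# Verify` heading onward and fills the service name into that heading. The helper name and its default paths are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the manual copy step:
#   scaffold_verify <service> [template] [roles_dir]
scaffold_verify() {
  local service="$1"
  local template="${2:-docs/testing/service-verify-template.md}"
  local roles_dir="${3:-roles}"
  local dest="${roles_dir}/${service}/VERIFY.md"

  # Never clobber a VERIFY.md someone has already filled in.
  if [[ -e "$dest" ]]; then
    echo "exists, not overwriting: ${dest}" >&2
    return 1
  fi
  mkdir -p -- "${roles_dir}/${service}"
  # Print from the first "# Verify" heading to end of file (dropping the
  # preamble), then put the service name into that heading. "&" is literal
  # in a sed pattern; snake_case role names need no escaping.
  sed -n '/^# Verify/,$p' "$template" \
    | sed "1s/&lt;service&gt;/${service}/" > "$dest"
}
```

The remaining `&lt;…&gt;` placeholders are left in place for the author to fill in.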
+
+---
+
+### Task 4: Create the `/verify-service` skill
+
+**Files:**
+- Create: `.claude/commands/verify-service.md`
+
+- [ ] **Step 1: Create the command file**
+
+Create `.claude/commands/verify-service.md` with exactly this content (preserve em-dashes, backticks, code fences):
+
+```markdown
+Exploratory service-UI verification (ADR-008 Level 4 / ADR-017)
+
+Drive a browser against a **staging** deploy of a service, exercise its
+`roles/<service>/VERIFY.md` acceptance journeys plus free exploration, and write a
+tracked report. Argument: the service/role name (e.g. `/verify-service photoprism`).
+
+## Prerequisites (this is forward-looking — ADR-017 dependencies)
+
+This skill cannot run until all of these exist; if any is missing, say so and stop —
+do not improvise around it:
+
+- `ubongo` with the `playwright` Claude Code plugin (browser automation tools).
+- A **staging** deploy of the target service (ADR-008 Level 2).
+- Authentik (staging) for test-user provisioning + SSO.
+- `roles/<name>/VERIFY.md` present.
+
+## Process
+
+### Phase 0 — safety gate (staging only)
+
+Confirm the target resolves to the **staging** environment/inventory, never production.
+If you cannot prove it is staging, **stop** — exploratory clicking is destructive
+(ADR-002). State why you stopped.
+
+### Phase 1 — read intent
+
+Read `roles/<name>/VERIFY.md`: the Critical user journeys, What good looks like, Not
+browser-verifiable, and Test data sections.
+
+### Phase 2 — test user
+
+Provision (reuse-or-create) a test user in the staging Authentik `test` group, with
+ephemeral credentials held only for this run. Never use a real/production account.
+
+### Phase 3 — drive the browser
+
+Via the `playwright` plugin, on `ubongo`: open the service's staging URL (resolved via
+boma DNS), authenticate through the real Traefik + Authentik SSO flow, then execute each
+`VERIFY.md` journey — judging pass/fail and screenshotting key states — and free-explore
+for anything obviously broken. Save screenshots to the git-ignored `.verify-runs/`
+working dir; avoid capturing credential screens.
+
+### Phase 4 — write the report
+
+Save to `docs/testing/reviews/YYYY-MM-DD-<name>.md` and overwrite
+`docs/testing/reviews/latest.md`. Structure:
+
+- **One-line verdict** — e.g. "5/5 journeys passed; one manual check pending".
+- **Run metadata** — date, service, staging env, test user, reviewed commit SHA.
+- **Per-journey result** — pass/fail against `VERIFY.md`, with the evidence (linked
+  screenshot path) and any observation.
+- **Free-exploration findings** — anything noticed beyond the listed journeys.
+- **Manual-test checklist** — the "Not browser-verifiable" items plus anything Claude
+  couldn't do: numbered steps, expected result, and why it was handed off.
+
+### Phase 5 — clean up + commit
+
+Offer to clean up the `test`-group user (or note that the staging rebuild will).
+Commit the report markdown per CLAUDE.md git conventions. **Do not** commit
+`.verify-runs/` (git-ignored).
+
+## Notes
+
+- Reports (markdown) are committed; screenshots stay local on `ubongo` in `.verify-runs/`.
+- Exploratory and interactive — this is not a deterministic CI gate.
+```
+
+- [ ] **Step 2: Verify and commit**
+
+Run: `rbw unlocked && pre-commit run --files .claude/commands/verify-service.md`
+Expected: Passed/Skipped.
+```bash
+git add .claude/commands/verify-service.md
+git commit -m "Add /verify-service skill for Level 4 UI verification (ADR-017)"
+```
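In Phase 1 the skill reads the numbered items under `## Critical user journeys`. Extracting them mechanically gives the run a fixed denominator for the verdict line (for example "5/5 journeys passed"). The sketch below does this with `awk`. It assumes the journeys stay a numbered list directly under that heading, as in the template. The helper name is hypothetical, and only the first line of each item is printed.

```bash
#!/usr/bin/env bash
# Hypothetical Phase 1 helper: print the numbered journeys from a VERIFY.md,
# one per line, in order.
list_journeys() {
  awk '
    # Any level-2 heading opens or closes the section we care about.
    /^## / { in_journeys = ($0 == "## Critical user journeys"); next }
    # Inside it, keep only numbered items such as "1. SSO login ...".
    in_journeys && /^[0-9]+\. / { print }
  ' "$1"
}
```

`list_journeys roles/photoprism/VERIFY.md | wc -l` then gives the number of journeys to pass.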
+
+---
+
+### Task 5: Add Level 4 to the service-clearance gate
+
+**Files:**
+- Modify: `docs/security/service-checklist.md`
+
+- [ ] **Step 1: Add an Operability bullet for Level 4**
+
+Find this exact block:
+```
+## Operability (security-adjacent)
+
+- [ ] Logs go somewhere reviewable (central aggregation when available)
+- [ ] Backup/restore is covered if the service holds state
+```
+Replace with:
+```
+## Operability (security-adjacent)
+
+- [ ] Logs go somewhere reviewable (central aggregation when available)
+- [ ] Backup/restore is covered if the service holds state
+- [ ] Passed Level 4 service-UI verification (`/verify-service`) against staging — the
+      service has a populated `roles/<service>/VERIFY.md` and its critical journeys
+      verified (ADR-008 Level 4 / ADR-017)
+```
+
+- [ ] **Step 2: Verify and commit**
+
+Run: `rbw unlocked && pre-commit run --files docs/security/service-checklist.md`
+Expected: Passed/Skipped.
+```bash
+git add docs/security/service-checklist.md
+git commit -m "service-checklist: add Level 4 UI verification to the gate"
+```
+
+---
+
+### Task 6: Update CLAUDE.md (role convention + Further reading)
+
+**Files:**
+- Modify: `CLAUDE.md`
+
+- [ ] **Step 1: Add the `VERIFY.md` role-convention bullet**
+
+Find this exact line:
+```
+- Every **service** role must have a populated `SECURITY.md` (ADR-002/004) — copy `docs/security/service-security-template.md`
+```
+Replace with that SAME line followed by a new bullet:
+```
+- Every **service** role must have a populated `SECURITY.md` (ADR-002/004) — copy `docs/security/service-security-template.md`
+- Every **service** role must have a populated `VERIFY.md` (ADR-008/017) — copy `docs/testing/service-verify-template.md`
+```
+
+- [ ] **Step 2: Add the ADR-017 Further-reading row**
+
+Find this exact line:
+```
+| Testing methodology    | `docs/decisions/008-testing.md`       |
+```
+Replace with that SAME line followed by a new row:
+```
+| Testing methodology    | `docs/decisions/008-testing.md`       |
+| Service-UI verification (Level 4) | `docs/decisions/017-service-ui-verification.md` |
+```
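Tasks 6 and 8 both use a "find this exact line, then add a line after it" edit. If that step is scripted rather than done by hand, the script should refuse to act unless the anchor matches exactly once. That keeps it as strict as the plan's wording. Below is a sketch using POSIX `awk`. `insert_after_exact` is a hypothetical helper, not an existing repo tool.

```bash
# Hypothetical helper: append NEW after the line that equals ANCHOR exactly
# (whole-line match, not a regex). Refuses, leaving FILE unchanged, unless
# ANCHOR occurs exactly once.
# Usage: insert_after_exact <file> <anchor_line> <new_line>
insert_after_exact() {
    file=$1
    anchor=$2
    new=$3

    # Passing text via ENVIRON avoids awk -v, which interprets backslash escapes.
    count=$(ANCHOR=$anchor awk '$0 == ENVIRON["ANCHOR"] { n++ } END { print n + 0 }' "$file") ||
        return 1
    if [ "$count" -ne 1 ]; then
        echo "anchor matched $count line(s) in $file; expected exactly 1" >&2
        return 1
    fi

    tmp=$(mktemp) || return 1
    # Write back through the existing file (cat >) so its permissions are kept.
    ANCHOR=$anchor NEW=$new awk '
        { print }
        $0 == ENVIRON["ANCHOR"] { print ENVIRON["NEW"] }
    ' "$file" > "$tmp" &&
        cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

The match is on the whole line. For Task 6 Step 2, the anchor is therefore the full `| Testing methodology ... |` row, including its padding spaces.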
+
+- [ ] **Step 3: Verify and commit**
+
+Run: `rbw unlocked && pre-commit run --files CLAUDE.md`
+Expected: Passed/Skipped.
+```bash
+git add CLAUDE.md
+git commit -m "CLAUDE.md: VERIFY.md role convention; link ADR-017"
+```
+
+---
+
+### Task 7: Git-ignore screenshots + create the reviews dir
+
+**Files:**
+- Modify: `.gitignore`
+- Create: `docs/testing/reviews/README.md`
+
+- [ ] **Step 1: Add the screenshot working dir to `.gitignore`**
+
+Find this exact block at the end of `.gitignore`:
+```
+# Terraform
+terraform/**/.terraform/
+terraform/**/*.tfstate
+terraform/**/*.tfstate.backup
+terraform/**/terraform.tfvars
+# .terraform.lock.hcl is intentionally tracked (pins provider versions)
+```
+Replace with:
+```
+# Terraform
+terraform/**/.terraform/
+terraform/**/*.tfstate
+terraform/**/*.tfstate.backup
+terraform/**/terraform.tfvars
+# .terraform.lock.hcl is intentionally tracked (pins provider versions)
+
+# Service-UI verification screenshots (kept locally on ubongo, not committed — ADR-017)
+.verify-runs/
+```
+
+- [ ] **Step 2: Create the reviews dir README (so the dir exists in git)**
+
+Create `docs/testing/reviews/README.md` with exactly this content:
+```markdown
+# Service-UI verification reports
+
+Dated reports written by `/verify-service` (ADR-008 Level 4 / ADR-017), one per run:
+`YYYY-MM-DD-<service>.md`, plus `latest.md`. These markdown reports are committed; the
+screenshots they reference stay local on `ubongo` in the git-ignored `.verify-runs/`
+working dir.
+
+No reports yet — the harness is designed, not yet runnable (see STATUS.md).
+```
+
+- [ ] **Step 3: Verify and commit**
+
+Run: `rbw unlocked && pre-commit run --files .gitignore docs/testing/reviews/README.md`
+Expected: Passed/Skipped.
+```bash
+git add .gitignore docs/testing/reviews/README.md
+git commit -m "Git-ignore verify screenshots; add testing/reviews dir"
+```
+
+---
+
+### Task 8: Add the Level 4 row to STATUS.md
+
+**Files:**
+- Modify: `STATUS.md`
+
+- [ ] **Step 1: Add a row to the "Designed but not built" table**
+
+Find this exact line:
+```
+| NetBird agent enrollment in `base` | ADR-016 | Every Linux host joins the mesh via the base role (setup keys in vault); SSH allowed only on `wt0`. Designed; base role not built. |
+```
+Replace with that SAME line followed by the new row:
+```
+| NetBird agent enrollment in `base` | ADR-016 | Every Linux host joins the mesh via the base role (setup keys in vault); SSH allowed only on `wt0`. Designed; base role not built. |
+| Service-UI verification (Level 4) | ADR-017 / ADR-008 | `/verify-service` skill + `VERIFY.md` template + standards are authorable and present; *running* deferred on ubongo + `playwright` plugin + Authentik + a staging deploy. |
+```
+
+- [ ] **Step 2: Verify and commit**
+
+Run: `rbw unlocked && pre-commit run --files STATUS.md`
+Expected: Passed/Skipped.
+```bash
+git add STATUS.md
+git commit -m "STATUS: record Level 4 service-UI verification (ADR-017)"
+```
+
+---
+
+### Task 9: Mark TODO 2.2/2.3 addressed
+
+**Files:**
+- Modify: `docs/TODO.md`
+
+- [ ] **Step 1: Annotate the Testing items**
+
+Find this exact block:
+```
+2. **Testing**
+   1. Choose and configure code-testing tooling (Molecule, etc.).
+   2. Decide how the AI interprets Molecule output and performs live testing:
+      API calls, curl pulls of web products, log reviews, and headless browsing.
+   3. Define a standard for generating test users and for instructing the user to
+      perform relevant manual tests.
+```
+Replace with:
+```
+2. **Testing**
+   1. Choose and configure code-testing tooling (Molecule, etc.).
+   2. Decide how the AI interprets Molecule output and performs live testing:
+      API calls, curl pulls of web products, log reviews, and headless browsing.
+      — Headless browsing DECIDED (ADR-017): the `/verify-service` Level 4 harness.
+      The API/curl/log-review siblings remain open.
+   3. ~~Define a standard for generating test users and for instructing the user to
+      perform relevant manual tests.~~ DECIDED (ADR-017): test users in the staging
+      Authentik `test` group; manual tests handed off as a checklist in the
+      `/verify-service` report.
+```
+
+- [ ] **Step 2: Verify and commit**
+
+Run: `rbw unlocked && pre-commit run --files docs/TODO.md`
+Expected: Passed/Skipped.
+```bash
+git add docs/TODO.md
+git commit -m "TODO: mark headless-browsing + test-user standard decided (ADR-017)"
+```
+
+---
+
+### Task 10: Final consistency sweep
+
+**Files:** none modified (verification only)
+
+- [ ] **Step 1: Confirm ADR-017 is present and cross-linked**
+
+Run:
+```bash
+test -f docs/decisions/017-service-ui-verification.md && echo "ADR-017 present"
+grep -rl "ADR-017\|017-service-ui-verification" docs/ CLAUDE.md STATUS.md .claude/ | grep -vE "superpowers/(plans|specs)/"
+```
+Expected: the file exists and the referencing files appear — ADR-008, CLAUDE.md, STATUS.md, the `VERIFY.md` template, the `/verify-service` skill, service-checklist, TODO, the reviews README.
+
+- [ ] **Step 2: Confirm the new artifacts exist and the Level 4 stub is gone**
+
+Run:
+```bash
+ls docs/testing/service-verify-template.md .claude/commands/verify-service.md docs/testing/reviews/README.md
+grep -n "planned, not built" docs/decisions/008-testing.md || echo "Level 4 stub replaced (good)"
+grep -n "\.verify-runs/" .gitignore && echo "screenshot dir ignored (good)"
+```
+Expected: all three files listed; the old Level 4 "planned, not built" stub line gone; `.verify-runs/` in `.gitignore`.
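Step 2's checks produce output that has to be eyeballed. If a single pass/fail result is preferred, the same checks can be folded into one function that exits non-zero on any problem. `sweep_level4_artifacts` is hypothetical, and it is stricter than Step 2 in one respect: it requires `.verify-runs/` as a whole line in `.gitignore`.

```bash
# Hypothetical sweep: the Task 10 existence checks as one pass/fail command.
# Usage: sweep_level4_artifacts [repo_root]   (defaults to the current directory)
sweep_level4_artifacts() {
    root=${1:-.}
    missing=0
    for f in docs/decisions/017-service-ui-verification.md \
             docs/decisions/008-testing.md \
             docs/testing/service-verify-template.md \
             .claude/commands/verify-service.md \
             docs/testing/reviews/README.md; do
        if [ ! -f "$root/$f" ]; then
            echo "missing: $f" >&2
            missing=1
        fi
    done
    # The screenshot working dir must be ignored (exact line, as added in Task 7).
    if ! grep -qxF '.verify-runs/' "$root/.gitignore" 2>/dev/null; then
        echo "not ignored: .verify-runs/ absent from .gitignore" >&2
        missing=1
    fi
    # The old Level 4 stub wording must be gone from ADR-008.
    if grep -qF 'planned, not built' "$root/docs/decisions/008-testing.md" 2>/dev/null; then
        echo "ADR-008 still contains the Level 4 stub" >&2
        missing=1
    fi
    return "$missing"
}
```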
+
+- [ ] **Step 3: Full hook run**
+
+Run: `rbw unlocked && pre-commit run --all-files`
+Expected: all hooks Passed/Skipped. Fix anything that fails (likely trailing whitespace / end-of-file) and amend the owning commit.
+
+- [ ] **Step 4: Push (only if the user asks)**
+
+```bash
+git push origin <branch-or-main-after-merge>
+```
+
+---
+
+## Self-review notes (author)
+
+- **Spec coverage:** decision/forks/architecture → Task 1 (ADR-017) + Task 2 (ADR-008); `VERIFY.md` standard → Task 3 (template) + Task 6 (convention) + Task 5 (gate); skill/mechanism/reporting/safety → Task 4 (`/verify-service`); reporting dir + screenshot policy → Task 7; STATUS/TODO reconciliation → Tasks 8–9. ✓
+- **Buildable-now vs deferred:** every task is authorable without `ubongo`/Authentik/staging; the skill carries an explicit Prerequisites gate so it cannot pretend to run. Deferred items (new-role scaffold, Authentik automation, per-service `VERIFY.md`, plugin install) are recorded in ADR-017/STATUS, not implemented. ✓
+- **No placeholders:** every create/edit shows exact content; the `&lt;…&gt;` tokens in the template are deliberate (match `service-security-template.md`'s house style). ✓
+- **Name consistency:** `/verify-service`, `roles/<service>/VERIFY.md`, `docs/testing/service-verify-template.md`, `docs/testing/reviews/`, `.verify-runs/`, and the `test` Authentik group are used identically across all tasks. ✓
+```
--- a/docs/superpowers/specs/2026-06-05-service-ui-verification-design.md
+++ b/docs/superpowers/specs/2026-06-05-service-ui-verification-design.md
@ -0,0 +1,203 @@
+# Design — Service-UI acceptance verification (ADR-008 Level 4)
+
+- **Date:** 2026-06-05
+- **Status:** Approved design — pending implementation plan
+- **Resolves:** ADR-015 deferred item #2 (browser-E2E verification harness); TODO 2.2
+  (browser portion) + TODO 2.3 (test users + manual-test instruction)
+- **Expands:** ADR-008 Level 4 (currently a stub)
+- **Becomes:** ADR-017 (this design is the basis for that ADR)
+
+---
+
+## Problem
+
+ADR-008 defines testing Levels 1–3 (Molecule, staging deploy, external smoke) and a
+**Level 4 stub**: "Claude drives a headless browser from `ubongo` against a deployed
+service: loads the rendered UI, creates test users, exercises features, and hands the
+operator a manual test script." Nothing below Level 4 actually exercises a service's
+**application UI** — Molecule tests the role in a container, Level 2 confirms the stack
+converges, Level 3 confirms public endpoints respond. None answer "does PhotoPrism
+actually let me log in, upload a photo, and see a thumbnail?" (TODO 8.2).
+
+The operator's original ask: *"Claude could spin up a browser and actually see the
+generated service web-UIs to verify various things. Perhaps even generate test users
+and test features and instruct me on tests as well."* That is TODO 2.2 (headless
+browsing) + TODO 2.3 (test-user generation + manual-test instruction).
+
+Today Claude "sees" a browser only **passively** — the `/screenshot` skill fetches
+screenshots the operator took on `mamba`. This harness is the **active** counterpart:
+Claude drives the browser itself.
+
+## Decisions (the settled forks)
+
+1. **Nature — Claude-driven exploratory.** Claude navigates the live UI with judgment
+   (look, click, reason about whether it works, notice anything off), not deterministic
+   scripts. This is the distinctive value; a scripted Playwright regression suite is
+   explicitly *not* built here.
+2. **Mode — interactive, Claude-in-the-loop.** Follows from #1: exploratory judgment
+   can't be a headless cron gate. Scheduled smoke-testing stays out of scope (that job
+   needs determinism and belongs to health checks / Uptime Kuma later).
+3. **Environment — staging, full exercise.** Claude creates test users and exercises
+   features (including destructive flows) against a *staging* deploy. Staging is a
+   rebuildable sandbox, which settles the safety question: no risk to production data
+   and no pollution of production.
+4. **Auth — test users in Authentik (central IdP), real SSO flow.** Claude's browser
+   authenticates through Traefik + Authentik exactly as a real user would, faithfully
+   testing the real access path.
+5. **Structure — per-service `VERIFY.md` backbone + free exploration.** Each service
+   role ships an acceptance spec of critical user journeys; Claude executes it *and*
+   explores beyond it. Repeatable + intent-capturing, without losing exploratory value.
+
+## Scope
+
+In scope: the **browser/UI** verification harness (TODO 2.2 browser portion) + the
+**test-user** and **manual-test-instruction** standards (TODO 2.3) = ADR-008 **Level 4**.
+
+Out of scope (siblings, noted not built): the other TODO-2.2 "live testing" methods —
+API calls, `curl` pulls, log review. They share the spirit but are not browser work.
+Also out: a scripted/CI regression suite; scheduled headless smoke checks.
+
+---
+
+## Architecture, mechanism, and workflow placement
+
+**Mechanism.** Claude drives a real Chromium on `ubongo` via the **`playwright` Claude
+Code plugin** (already earmarked in `claude-code-setup.md`, enabled when this lands).
+No bespoke browser code — Claude calls the Playwright MCP tools (navigate, click, type,
+screenshot, read DOM) and reasons over what it sees. Active counterpart to the passive
+`/screenshot`-from-`mamba` pattern.
+
+**Orchestration.** A boma skill/command — **`/verify-service <name>`** — run
+interactively on `ubongo`. It:
+1. Reads the service's `roles/<name>/VERIFY.md` acceptance spec.
+2. Provisions/uses a test user in the **staging** Authentik.
+3. Drives the browser through the real SSO flow into the staging service.
+4. Executes the listed journeys exploratorily (judging pass/fail, screenshotting key
+   states) and free-explores.
+5. Writes a dated verification report with linked screenshots.
+6. Emits a manual-test checklist for anything it couldn't do.
+
+**Pipeline placement.** Level 4 runs after Level 2 (staging deploy) and before
+production promotion:
+`build role → molecule (L1) → staging deploy (L2) → /verify-service (L4) → promote`.
+It reaches the staging service over the LAN from `ubongo` (services on `srv`; resolved
+via boma DNS), through Traefik + Authentik as a real user would.
+
+**Boundaries (one unit, clear interface):** the skill *orchestrates*; `VERIFY.md`
+*declares intent* (per service); Authentik *provides identity*; the report *captures
+results*. Each is independently understandable and swappable.
+
+---
+
+## The `VERIFY.md` standard
+
+Every service role ships a populated `roles/<service>/VERIFY.md`, copied from a new
+template `docs/testing/service-verify-template.md` — parallel to how each role ships
+`SECURITY.md` from `service-security-template.md`. It becomes a **role convention**
+(every *service* role must have a populated `VERIFY.md`).
+
+Contents:
+- **Critical user journeys** — the acceptance criteria that define "working" for this
+  service (e.g. PhotoPrism: *SSO login → library loads → upload a test image →
+  thumbnail generates → search finds it*).
+- **What good looks like** — states/screenshots to confirm.
+- **Not browser-verifiable** — items to route to the manual-test handoff (hardware,
+  paid/external flows, subjective quality).
+
+`/verify-service` reads `roles/<name>/VERIFY.md`, executes those journeys, and explores
+beyond them.
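The convention requires a *populated* `VERIFY.md`, not merely a present one. A cheap pre-flight check can therefore catch a copied-but-unfilled template. The sketch below makes three assumptions:

- It checks the four section headings of `docs/testing/service-verify-template.md`, which adds **Test data** to the three listed above.
- It treats any leftover `&lt;` entity as an unfilled placeholder.
- `check_verify_md` is hypothetical, not an existing hook.

```bash
# Hypothetical pre-flight check: does a role's VERIFY.md look populated?
# Usage: check_verify_md <path/to/VERIFY.md>
check_verify_md() {
    f=$1
    ok=0
    [ -f "$f" ] || { echo "missing: $f" >&2; return 1; }

    # Every section heading from the template must be present as a whole line.
    for heading in '## Critical user journeys' '## What good looks like' \
                   '## Not browser-verifiable' '## Test data'; do
        if ! grep -qxF "$heading" "$f"; then
            echo "missing section: $heading" >&2
            ok=1
        fi
    done

    # The template writes its placeholders as &lt;...&gt; entities.
    if grep -qF '&lt;' "$f"; then
        echo "unfilled placeholder(s) on $(grep -cF '&lt;' "$f") line(s)" >&2
        ok=1
    fi

    # The template's preamble should have been deleted in the copy.
    if grep -qF '# Per-service verification record' "$f"; then
        echo "template preamble still present" >&2
        ok=1
    fi
    return "$ok"
}
```

When run against a role file (for example a future `roles/photoprism/VERIFY.md`), a non-zero exit comes with the reasons on stderr.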
+
+## Test-user generation standard (TODO 2.3)
+
+Test identities are provisioned in the **staging** Authentik (never the production IdP
+— test accounts must not exist in prod):
+- **Convention:** a dedicated `test` group / naming prefix (e.g. `test-<service>@…`) so
+  accounts are identifiable and bulk-removable.
+- **Credentials:** ephemeral, generated per run (staging is rebuildable); held only for
+  the run. No test creds in `vault.yml`.
+- **Idempotent:** reuse-or-create.
+- **Teardown:** primary teardown is the staging rebuild (sandbox); the skill also
+  offers explicit cleanup of the `test` group.
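The naming, per-run credential, and reuse-or-create rules can be shown without touching Authentik. In the sketch below, a plain text file is a hypothetical stand-in for the staging IdP's user list; real provisioning would go through Authentik. The username also omits the elided `@…` domain.

```bash
# Sketch of the test-user rules. The registry file is a hypothetical stand-in
# for the staging Authentik user list; each line is "<username> <group>".
# Usage: ensure_test_user <registry_file> <service>
# Prints "<username> <password>" for this run.

test_username() {
    printf 'test-%s\n' "$1"          # naming convention: test-<service>
}

ephemeral_password() {
    # 32 random alphanumerics, generated per run and never written to vault.yml.
    LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 32
    echo
}

ensure_test_user() {
    registry=$1
    user=$(test_username "$2")
    touch "$registry" || return 1
    # Idempotent: reuse the account if it exists, otherwise create it in group "test".
    if ! grep -qxF "$user test" "$registry"; then
        printf '%s test\n' "$user" >> "$registry"
    fi
    # A fresh password is generated on every run (a real implementation would
    # set it on the account); it is only printed, never stored here.
    printf '%s %s\n' "$user" "$(ephemeral_password)"
}
```

Because every account shares the `test-` prefix and the `test` group, a single filter on either one is enough for bulk removal.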
+
+## Reporting & manual-test handoff
+
+- **Report:** `/verify-service` writes `docs/testing/reviews/YYYY-MM-DD-<service>.md`
+  (plus `latest.md`), mirroring `/review-repo`→`docs/reviews/` and
+  `/capacity-review`→`docs/hardware/reviews/`. It contains pass/fail per `VERIFY.md`
+  journey, observations, the test-user/env used, a verdict, and the manual-test
+  checklist. The committed markdown is the durable artifact.
+- **Screenshots:** saved to a **git-ignored** dir on `ubongo` (PNGs would bloat the
+  repo); the report links them and inlines only a few key evidence shots.
+- **Manual-test handoff (TODO 2.3):** anything Claude can't do — physical device,
+  paid/external flow, subjective judgment — becomes a **structured checklist** in the
+  report (numbered steps, expected result, why handed off). The operator runs them and
+  reports back. This is the "instruct me on tests" half of the vision, as a first-class
+  output.
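Part of the handoff can be seeded mechanically. Each bullet under a service's `## Not browser-verifiable` heading becomes a numbered checklist item, which Claude then completes with steps and an expected result. Below is a sketch with POSIX `awk`. The output layout is illustrative, not a fixed report format.

```bash
# Sketch: turn the "## Not browser-verifiable" bullets of a VERIFY.md into a
# numbered manual-test checklist skeleton for the report.
# Usage: manual_checklist <path/to/VERIFY.md>
manual_checklist() {
    awk '
        # Any level-2 heading switches the section; only one heading is of interest.
        /^## / { in_section = ($0 == "## Not browser-verifiable"); next }
        in_section && /^- / {
            n++
            printf "%d. %s\n", n, substr($0, 3)   # drop the leading "- "
            print "   - Steps: TODO"
            print "   - Expected result: TODO"
            print "   - Why handed off: listed as not browser-verifiable in VERIFY.md"
        }
    ' "$1"
}
```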
+
+## Safety
+
+Even though staging is a sandbox:
+- **Staging-only guard.** The skill refuses to run against production (verifies it is
+  pointed at the staging environment/inventory before acting) — an ADR-002-aligned hard
+  stop, since exploratory clicking is destructive by nature.
+- **Confined blast radius.** Test users live only in the staging `test` group; the run
+  sticks to the target service.
+- **No secrets leaked.** Screenshots can capture on-screen tokens/credentials, so the
+  git-ignored screenshot dir is also the safety boundary (evidence isn't committed by
+  default), and the skill avoids capturing credential screens.
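The staging-only guard should fail closed: anything it cannot positively identify as staging is refused. The sketch below assumes, hypothetically, that the skill receives an Ansible inventory directory and that the staging one is literally named `staging`. boma's actual inventory layout may differ.

```bash
# Sketch of a fail-closed staging guard. The "directory named staging" rule is
# an assumption of this sketch, not boma's documented layout.
# Usage: require_staging <inventory_dir>
require_staging() {
    inv=$1
    [ -n "$inv" ] || { echo "refusing: no inventory given" >&2; return 1; }
    # Resolve symlinks and relative paths so the decision uses the real target.
    real=$(cd "$inv" 2>/dev/null && pwd -P) || {
        echo "refusing: cannot resolve inventory '$inv'" >&2
        return 1
    }
    case $(basename "$real") in
        staging) return 0 ;;
        *)
            echo "refusing: '$real' is not the staging inventory" >&2
            return 1
            ;;
    esac
}
```

Resolving the path first matters for symlinks. A link named `staging` that points at production is judged by its real target and refused.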
+
+---
+
+## Documentation & implementation changes
+
+This is a substantial capability, so it gets its own ADR (ADR-017), plus reconciling edits to existing docs:
+
+| Doc / artifact | Change |
+|---|---|
+| ADR-017 (new) | Home of record: harness, the five settled forks, `VERIFY.md` standard, test-user + manual-handoff standards, safety. |
+| ADR-008 (testing) | Expand the Level 4 stub into the full definition; link ADR-017. |
+| `docs/testing/service-verify-template.md` (new) | The `VERIFY.md` template (parallels `service-security-template.md`). |
+| `.claude/commands/verify-service.md` (new) | The `/verify-service <name>` orchestrating skill. |
+| `CLAUDE.md` | Role conventions: every *service* role must ship a populated `VERIFY.md`. Further reading: ADR-017. |
+| `docs/security/service-checklist.md` | Add "passed Level 4 (`/verify-service`)" to the pre-production service-clearance gate. |
+| `.gitignore` + `docs/testing/reviews/` | Ignore the screenshot dir; create the reviews dir (README/`.gitkeep`). |
+| `STATUS.md` | Row: Level 4 verification — skill + template authorable; *running* deferred. |
+| `docs/TODO.md` | Mark 2.2 (browser portion) + 2.3 addressed by ADR-017; note API/`curl`/log siblings remain. |
+| `make new-role` scaffold | Scaffold `VERIFY.md` into new service roles (when that scaffold is next touched). |
+
+**Buildable now** (no `ubongo`/Authentik/staging needed): ADR-017, the ADR-008
+expansion, the `VERIFY.md` template, the `/verify-service` skill logic, the convention +
+checklist + Further-reading edits, `.gitignore`/dir, STATUS/TODO. This spec yields real
+working artifacts immediately — the skill and standards exist and are reviewable; only
+the *live run* waits on the stack.
+
+**Deferred** (needs the stack): actually running it (`ubongo` + `playwright` plugin +
+Authentik + a staging deploy); the Authentik test-user provisioning automation;
+per-service `VERIFY.md` files (need the service roles, which don't exist yet).
+
+---
+
+## Dependencies
+
+- `ubongo` (ADR-015) — the host that runs the browser. Designed, not built.
+- `playwright` Claude Code plugin — enabled when this lands (`claude-code-setup.md`).
+- Authentik (CAPABILITIES §2, planned) — central IdP for test users + SSO.
+- A staging environment with the service deployed (ADR-008 Level 2) — staging
+  currently holds only empty stubs.
+
+---
+
+## What was ruled out
+
+| Option | Reason |
+|---|---|
+| Scripted Playwright regression suite | The operator wants exploratory judgment, not deterministic scripts; scripts add authoring/maintenance burden. A scripted layer could come later but is not this. |
+| Scheduled headless smoke gate (cron) | Needs determinism, which the exploratory nature excludes; that role belongs to health checks / Uptime Kuma. |
+| Verify against production | Exploratory clicking + test-user creation is destructive/polluting; staging sandbox instead. Production gets non-destructive checks elsewhere, not here. |
+| Free-form exploration with no per-service spec | Flexible but non-repeatable and can miss a service's critical flow; `VERIFY.md` gives a backbone while keeping free exploration. |
+| Staging bypasses SSO / per-app local users | Wouldn't exercise the real Traefik+Authentik access path; central test users in Authentik are faithful. |
+| Commit screenshots to the repo | Repo bloat + secret-leak risk; git-ignored on `ubongo`, markdown report committed. |
+
+See also: ADR-008 (testing — expanded), ADR-015 (control host — runs the browser),
+ADR-002 (security), ADR-004 (one service = one role — `VERIFY.md` parallels
+`SECURITY.md`), ADR-013/014 (heritage / knowledge sourcing).
--- a/docs/testing/reviews/README.md
+++ b/docs/testing/reviews/README.md
@ -0,0 +1,8 @@
+# Service-UI verification reports
+
+Dated reports written by `/verify-service` (ADR-008 Level 4 / ADR-017), one per run:
+`YYYY-MM-DD-<service>.md`, plus `latest.md`. These markdown reports are committed; the
+screenshots they reference stay local on `ubongo` in the git-ignored `.verify-runs/`
+working dir.
+
+No reports yet — the harness is designed, not yet runnable (see STATUS.md).
--- a/docs/testing/service-verify-template.md
+++ b/docs/testing/service-verify-template.md
@ -0,0 +1,43 @@
+# Per-service verification record — template
+
+Copy this file to `roles/<service>/VERIFY.md` and fill it in when building a service
+role (ADR-008 Level 4 / ADR-017). It is the per-service **acceptance spec**: the
+critical user journeys that define "working" for this service. `/verify-service <name>`
+reads it, drives a browser through them against the staging deploy, and explores beyond
+them.
+
+Delete this preamble in the copy and start from the heading below.
+
+---
+
+# Verify — &lt;service&gt;
+
+## Critical user journeys
+
+The acceptance criteria — what "working" means for this service. Numbered; each is an
+action and its expected result. Example shape (replace with this service's flows):
+
+1. SSO login via Authentik succeeds and lands on the service's home/dashboard.
+2. &lt;core action&gt; — e.g. "upload a test image" → &lt;expected&gt; — "a thumbnail renders".
+3. &lt;core action&gt; → &lt;expected&gt;.
+
+## What good looks like
+
+Key states/screens Claude should confirm (and screenshot) — the visual/textual signals
+that the journeys above actually succeeded.
+
+- &lt;e.g. "the uploaded image appears in the library grid within ~10s"&gt;
+
+## Not browser-verifiable
+
+Items to route to the manual-test handoff — things a headless browser can't or
+shouldn't judge.
+
+- &lt;e.g. hardware passthrough, a paid/external integration, subjective media quality&gt;
+
+## Test data
+
+What the journeys need, provisioned in the **staging** Authentik `test` group
+(ephemeral, torn down by staging rebuild).
+
+- &lt;e.g. "one test user; no pre-seeded content"&gt;
Author	SHA1	Message	Date
sjat	91d851fe4d	TODO: mark headless-browsing + test-user standard decided (ADR-017)	2026-06-05 13:20:40 +02:00
sjat	01e4f96983	STATUS: record Level 4 service-UI verification (ADR-017)	2026-06-05 13:19:53 +02:00
sjat	eb415db96e	Git-ignore verify screenshots; add testing/reviews dir Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-05 13:19:04 +02:00
sjat	920e47b50d	CLAUDE.md: VERIFY.md role convention; link ADR-017 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-05 13:18:07 +02:00
sjat	22c0747c0b	service-checklist: add Level 4 UI verification to the gate	2026-06-05 13:17:16 +02:00
sjat	25f04002df	Add /verify-service skill for Level 4 UI verification (ADR-017)	2026-06-05 13:16:25 +02:00
sjat	05abb3b6a5	Add VERIFY.md template for service-UI acceptance (ADR-017)	2026-06-05 13:15:13 +02:00
sjat	2df1f98153	ADR-008: expand Level 4 into the verify-service harness (ADR-017)	2026-06-05 13:14:12 +02:00
sjat	cc3337502f	Add ADR-017 (service-UI acceptance verification, Level 4)	2026-06-05 13:13:09 +02:00
sjat	be6a064f44	Add implementation plan for service-UI verification (Level 4) Task-by-task: author ADR-017, expand ADR-008 Level 4, create the VERIFY.md template + /verify-service skill, and reconcile the checklist/CLAUDE.md/ gitignore/STATUS/TODO. Buildable-now artifacts; live run stays deferred. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-05 13:11:43 +02:00
sjat	2bd11b5aa9	Add design spec for service-UI verification (ADR-008 Level 4) Resolves ADR-015 deferred item #2 + TODO 2.2/2.3: a Claude-driven exploratory browser harness (/verify-service) that exercises staging service UIs through real SSO, backed by a per-service VERIFY.md, with test users in staging Authentik and a manual-test handoff. Basis for ADR-017. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-05 13:05:11 +02:00