# Service-UI Verification (Level 4) Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Build the authorable-now parts of ADR-008 Level 4 — a Claude-driven exploratory service-UI verification harness — namely ADR-017, the `/verify-service` skill, the per-service `VERIFY.md` template/convention, and the doc reconciliations; the *live run* stays deferred on `ubongo`/Authentik/staging.

**Architecture:** Mostly documentation + two new authorable artifacts (the `/verify-service` Claude Code command and the `VERIFY.md` template). No application code, no Ansible roles (none of the prerequisite roles exist). The harness *mechanism* is the `playwright` Claude Code plugin driving Chromium on `ubongo`; this plan does not install or run it — it records the decision, the standards, and the orchestration logic.

**Tech Stack:** Markdown + a Claude Code command file. Verification is the repo's pre-commit hooks plus a final cross-reference/staleness sweep. No markdown linter exists, so "tests" are hook-pass + grep checks.

---

## Pre-flight (read once before starting)

- **`rbw` must be unlocked before every commit** (the pre-commit ansible-lint hook decrypts `vault.yml`). Run `rbw unlocked`; if it exits non-zero, stop and ask the user to `rbw unlock`.
- **Commit style:** one commit per task, imperative subject ≤72 chars.
- **Order matters:** Task 1 (ADR-017) lands first — later tasks link to it.
- **Spec reference:** `docs/superpowers/specs/2026-06-05-service-ui-verification-design.md`.
- **Branch:** the controller creates `chore/service-ui-verification-docs` off `main` before dispatching Task 1; do not implement on `main`.
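The Pre-flight bullets above can be read as a small guard to run before each task. The following is a minimal sketch, not part of the plan: `preflight` and `subject_ok` are hypothetical helper names, and the only external commands assumed are `rbw unlocked` (named in the bullets) and `git rev-parse`. Calling `preflight && subject_ok "$subject"` before `git commit` would cover the three mechanical rules (vault unlocked, not on `main`, subject within 72 characters); the imperative-mood rule still needs a human or Claude to judge.

```shell
# Hypothetical helpers sketching the Pre-flight bullets; names are illustrative.

# True when a commit subject fits the plan's 72-character limit.
subject_ok() {
  [ "${#1}" -le 72 ]
}

# Stop early if the vault is locked or the work would land on main.
preflight() {
  if ! rbw unlocked; then
    echo "rbw is locked: ask the user to run 'rbw unlock'" >&2
    return 1
  fi
  branch=$(git rev-parse --abbrev-ref HEAD) || return 1
  if [ "$branch" = "main" ]; then
    echo "on main: switch to chore/service-ui-verification-docs first" >&2
    return 1
  fi
}
```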
---

## File map

| File | Action | Responsibility |
|---|---|---|
| `docs/decisions/017-service-ui-verification.md` | Create | Home of record for Level 4 verification |
| `docs/decisions/008-testing.md` | Modify | Expand the Level 4 stub; link ADR-017 |
| `docs/testing/service-verify-template.md` | Create | The `VERIFY.md` template (parallels `service-security-template.md`) |
| `.claude/commands/verify-service.md` | Create | The `/verify-service ` orchestrating skill |
| `docs/security/service-checklist.md` | Modify | Add "passed Level 4" to the pre-deploy gate |
| `CLAUDE.md` | Modify | Role-convention bullet (`VERIFY.md`); Further-reading ADR-017 row |
| `.gitignore` | Modify | Ignore the screenshot working dir |
| `docs/testing/reviews/README.md` | Create | Explains the committed-report dir (also makes the dir exist in git) |
| `STATUS.md` | Modify | Row: Level 4 verification (skill/template authorable; running deferred) |
| `docs/TODO.md` | Modify | Mark 2.2 (browser) + 2.3 addressed by ADR-017 |

**Deferred (not in this plan):** scaffolding `VERIFY.md` into `make new-role` (do it when that scaffold is next touched — noted in ADR-017); the Authentik test-user provisioning automation; per-service `VERIFY.md` files (no service roles exist); installing/running the `playwright` plugin.

---

### Task 1: Author ADR-017 (the home of record)

**Files:**
- Create: `docs/decisions/017-service-ui-verification.md`

- [ ] **Step 1: Create the ADR file**

Create `docs/decisions/017-service-ui-verification.md` with exactly this content (preserve em-dashes —, backticks, table pipes):

```markdown
# ADR-017 — Service-UI acceptance verification (Level 4)

## Context

ADR-008 defines testing Levels 1–3 (Molecule, staging deploy, external smoke) and a Level 4 stub. Nothing below Level 4 exercises a service's **application UI** — none answer "does PhotoPrism actually let me log in, upload a photo, and see a thumbnail?" (TODO 8.2).
The operator's ask (TODO 2.2 headless browsing + TODO 2.3 test users + manual-test instruction): Claude spins up a browser, *sees* the service UI, exercises it, generates test users, and instructs the operator on manual tests. Today Claude sees a browser only passively (`/screenshot` fetches operator-taken shots from `mamba`); this is the active counterpart.

## Decision

A Claude-driven exploratory service-UI verification harness — **Level 4** — invoked as `/verify-service ` on `ubongo`. Five settled forks:

1. **Claude-driven exploratory** — Claude navigates with judgment, not deterministic scripts. A scripted regression suite is explicitly not built here.
2. **Interactive, Claude-in-the-loop** — exploratory judgment can't be a headless cron gate; scheduled smoke is a determinism job for health checks / Uptime Kuma later.
3. **Staging, full exercise** — Claude creates test users and exercises features (incl. destructive flows) against a *staging* deploy; the rebuildable sandbox resolves safety.
4. **Test users in Authentik (central IdP), real SSO flow** — authenticates through Traefik + Authentik as a real user would.
5. **Per-service `VERIFY.md` backbone + free exploration** — each service role ships an acceptance spec of critical journeys; Claude executes it and explores beyond it.

## VERIFY.md standard

Every service role ships a populated `roles//VERIFY.md`, copied from `docs/testing/service-verify-template.md` — parallel to `SECURITY.md` from `service-security-template.md`. A new role convention. It lists the service's critical user journeys (what "working" means), what good looks like, and what is not browser-verifiable (→ manual handoff). It also joins the pre-production gate in `docs/security/service-checklist.md`.
## Test-user standard (TODO 2.3)

Test identities live only in the **staging** Authentik (never production): a dedicated `test` group / naming prefix; ephemeral per-run credentials (staging is rebuildable, so nothing persisted, none in `vault.yml`); reuse-or-create; teardown via staging rebuild or explicit `test`-group cleanup.

## Reporting & manual handoff

`/verify-service` writes `docs/testing/reviews/YYYY-MM-DD-.md` (+ `latest.md`), mirroring `/review-repo` and `/capacity-review`: pass/fail per `VERIFY.md` journey, observations, the test-user/env used, a verdict, and a structured **manual-test checklist** for anything Claude can't do (physical device, paid/external flow, subjective judgment) — the "instruct me on tests" output. Screenshots are saved to a git-ignored working dir on `ubongo` (PNG bloat + secret-leak risk); the report links them.

## Safety

- **Staging-only guard** — the skill refuses to run against production (exploratory clicking is destructive); ADR-002-aligned hard stop.
- **Confined blast radius** — test users only in the staging `test` group; the run sticks to the target service.
- **No secrets leaked** — the git-ignored screenshot dir is the safety boundary; avoid capturing credential screens.

## Status

Designed. **Authorable now:** this ADR, the ADR-008 Level 4 expansion, the `VERIFY.md` template, the `/verify-service` skill, the convention/checklist/Further-reading edits, `.gitignore`/dir, STATUS/TODO. **Running is deferred** on its dependencies.

## Dependencies

- `ubongo` (ADR-015) — runs the browser. Designed, not built.
- `playwright` Claude Code plugin — enabled when this lands (`claude-code-setup.md`).
- Authentik (CAPABILITIES §2, planned) — central IdP for test users + SSO.
- A staging deploy of the service (ADR-008 Level 2) — staging is currently empty stubs.
- `make new-role` scaffolding `VERIFY.md` — deferred to when that scaffold is next touched.
## What was ruled out

| Option | Reason |
|---|---|
| Scripted Playwright regression suite | Operator wants exploratory judgment; scripts add maintenance burden. Could be a later layer, not this. |
| Scheduled headless smoke gate | Needs determinism the exploratory nature excludes; belongs to health checks / Uptime Kuma. |
| Verify against production | Exploratory clicking + test-user creation is destructive/polluting; staging sandbox instead. |
| Free-form, no per-service spec | Non-repeatable, can miss a critical flow; `VERIFY.md` gives a backbone. |
| Staging bypasses SSO / per-app users | Wouldn't exercise the real Traefik+Authentik path; central test users are faithful. |
| Commit screenshots to the repo | Repo bloat + secret-leak risk; git-ignored on `ubongo`. |

See also: ADR-008 (testing — expanded), ADR-015 (control host), ADR-002 (security), ADR-004 (`VERIFY.md` parallels `SECURITY.md`), ADR-013/014 (heritage / knowledge sourcing).
```

- [ ] **Step 2: Verify and commit**

Run: `rbw unlocked && pre-commit run --files docs/decisions/017-service-ui-verification.md`
Expected: Passed/Skipped.

```bash
git add docs/decisions/017-service-ui-verification.md
git commit -m "Add ADR-017 (service-UI acceptance verification, Level 4)"
```

---

### Task 2: Expand the ADR-008 Level 4 stub

**Files:**
- Modify: `docs/decisions/008-testing.md`

- [ ] **Step 1: Replace the Level 4 stub with the full definition**

Find this exact block:

```
### Level 4 — Service-UI acceptance (planned, not built)

Claude drives a headless browser from `ubongo` against a *deployed* service: loads the rendered UI, creates test users, exercises features, and hands the operator a manual test script for the rest. Catches application-level regressions that no lower level sees. The harness (Playwright/headless-Chromium, screenshot-back-to-Claude) is a **separate spec**; `ubongo` is sized for it (ADR-015). Status: designed, not built (STATUS.md).
```

Replace with:

```
### Level 4 — Service-UI acceptance (Claude-driven exploratory)

A Claude-driven exploratory check of a service's **application UI**, run as `/verify-service ` on `ubongo` (ADR-017). Claude drives Chromium via the `playwright` plugin against a **staging** deploy, authenticates through the real Traefik + Authentik SSO flow using a test user in the staging `test` group, then executes the service's `roles//VERIFY.md` acceptance journeys *and* free-explores — judging pass/fail, screenshotting key states. It writes a dated report to `docs/testing/reviews/` and hands the operator a manual-test checklist for anything it can't verify (hardware, paid/external flows, subjective judgment). Catches application-level regressions no lower level sees ("does PhotoPrism actually serve photos?").

Placement: after Level 2 (staging deploy), before production promotion. Exploratory and interactive by design — *not* a deterministic CI/cron gate (that role belongs to health checks / Uptime Kuma).

**Status:** the skill, the `VERIFY.md` template, and standards are authorable now; running it is deferred on `ubongo` + the `playwright` plugin + Authentik + a staging deploy (STATUS.md). Full design: ADR-017.
```

- [ ] **Step 2: Verify and commit**

Run: `rbw unlocked && pre-commit run --files docs/decisions/008-testing.md`
Expected: Passed/Skipped.

```bash
git add docs/decisions/008-testing.md
git commit -m "ADR-008: expand Level 4 into the verify-service harness (ADR-017)"
```

---

### Task 3: Create the `VERIFY.md` template

**Files:**
- Create: `docs/testing/service-verify-template.md`

- [ ] **Step 1: Create the template**

Create `docs/testing/service-verify-template.md` with exactly this content (preserve `<`/`>` HTML escapes, em-dashes, backticks):

```markdown
# Per-service verification record — template

Copy this file to `roles//VERIFY.md` and fill it in when building a service role (ADR-008 Level 4 / ADR-017).
It is the per-service **acceptance spec**: the critical user journeys that define "working" for this service. `/verify-service ` reads it, drives a browser through them against the staging deploy, and explores beyond them. Delete this preamble in the copy and start from the heading below.

---

# Verify — <service>

## Critical user journeys

The acceptance criteria — what "working" means for this service. Numbered; each is an action and its expected result. Example shape (replace with this service's flows):

1. SSO login via Authentik succeeds and lands on the service's home/dashboard.
2. <core action> — e.g. "upload a test image" → <expected> — "a thumbnail renders".
3. <core action> → <expected>.

## What good looks like

Key states/screens Claude should confirm (and screenshot) — the visual/textual signals that the journeys above actually succeeded.

- <e.g. "the uploaded image appears in the library grid within ~10s">

## Not browser-verifiable

Items to route to the manual-test handoff — things a headless browser can't or shouldn't judge.

- <e.g. hardware passthrough, a paid/external integration, subjective media quality>

## Test data

What the journeys need, provisioned in the **staging** Authentik `test` group (ephemeral, torn down by staging rebuild).

- <e.g. "one test user; no pre-seeded content">
```

- [ ] **Step 2: Verify and commit**

Run: `rbw unlocked && pre-commit run --files docs/testing/service-verify-template.md`
Expected: Passed/Skipped.
```bash
git add docs/testing/service-verify-template.md
git commit -m "Add VERIFY.md template for service-UI acceptance (ADR-017)"
```

---

### Task 4: Create the `/verify-service` skill

**Files:**
- Create: `.claude/commands/verify-service.md`

- [ ] **Step 1: Create the command file**

Create `.claude/commands/verify-service.md` with exactly this content (preserve em-dashes, backticks, code fences):

```markdown
Exploratory service-UI verification (ADR-008 Level 4 / ADR-017)

Drive a browser against a **staging** deploy of a service, exercise its `roles//VERIFY.md` acceptance journeys plus free exploration, and write a tracked report.

Argument: the service/role name (e.g. `/verify-service photoprism`).

## Prerequisites (this is forward-looking — ADR-017 dependencies)

This skill cannot run until all of these exist; if any is missing, say so and stop — do not improvise around it:

- `ubongo` with the `playwright` Claude Code plugin (browser automation tools).
- A **staging** deploy of the target service (ADR-008 Level 2).
- Authentik (staging) for test-user provisioning + SSO.
- `roles//VERIFY.md` present.

## Process

### Phase 0 — safety gate (staging only)

Confirm the target resolves to the **staging** environment/inventory, never production. If you cannot prove it is staging, **stop** — exploratory clicking is destructive (ADR-002). State why you stopped.

### Phase 1 — read intent

Read `roles//VERIFY.md`: the Critical user journeys, What good looks like, Not browser-verifiable, and Test data sections.

### Phase 2 — test user

Provision (reuse-or-create) a test user in the staging Authentik `test` group, with ephemeral credentials held only for this run. Never use a real/production account.
### Phase 3 — drive the browser

Via the `playwright` plugin, on `ubongo`: open the service's staging URL (resolved via boma DNS), authenticate through the real Traefik + Authentik SSO flow, then execute each `VERIFY.md` journey — judging pass/fail and screenshotting key states — and free-explore for anything obviously broken. Save screenshots to the git-ignored `.verify-runs/` working dir; avoid capturing credential screens.

### Phase 4 — write the report

Save to `docs/testing/reviews/YYYY-MM-DD-.md` and overwrite `docs/testing/reviews/latest.md`. Structure:

- **One-line verdict** — e.g. "5/5 journeys passed; one manual check pending".
- **Run metadata** — date, service, staging env, test user, reviewed commit SHA.
- **Per-journey result** — pass/fail against `VERIFY.md`, with the evidence (linked screenshot path) and any observation.
- **Free-exploration findings** — anything noticed beyond the listed journeys.
- **Manual-test checklist** — the "Not browser-verifiable" items plus anything Claude couldn't do: numbered steps, expected result, and why it was handed off.

### Phase 5 — clean up + commit

Offer to clean up the `test`-group user (or note that the staging rebuild will). Commit the report markdown per CLAUDE.md git conventions. **Do not** commit `.verify-runs/` (git-ignored).

## Notes

- Reports (markdown) are committed; screenshots stay local on `ubongo` in `.verify-runs/`.
- Exploratory and interactive — this is not a deterministic CI gate.
```

- [ ] **Step 2: Verify and commit**

Run: `rbw unlocked && pre-commit run --files .claude/commands/verify-service.md`
Expected: Passed/Skipped.
```bash
git add .claude/commands/verify-service.md
git commit -m "Add /verify-service skill for Level 4 UI verification (ADR-017)"
```

---

### Task 5: Add Level 4 to the service-clearance gate

**Files:**
- Modify: `docs/security/service-checklist.md`

- [ ] **Step 1: Add an Operability bullet for Level 4**

Find this exact block:

```
## Operability (security-adjacent)

- [ ] Logs go somewhere reviewable (central aggregation when available)
- [ ] Backup/restore is covered if the service holds state
```

Replace with:

```
## Operability (security-adjacent)

- [ ] Logs go somewhere reviewable (central aggregation when available)
- [ ] Backup/restore is covered if the service holds state
- [ ] Passed Level 4 service-UI verification (`/verify-service`) against staging — the service has a populated `roles//VERIFY.md` and its critical journeys verified (ADR-008 Level 4 / ADR-017)
```

- [ ] **Step 2: Verify and commit**

Run: `rbw unlocked && pre-commit run --files docs/security/service-checklist.md`
Expected: Passed/Skipped.
```bash
git add docs/security/service-checklist.md
git commit -m "service-checklist: add Level 4 UI verification to the gate"
```

---

### Task 6: Update CLAUDE.md (role convention + Further reading)

**Files:**
- Modify: `CLAUDE.md`

- [ ] **Step 1: Add the `VERIFY.md` role-convention bullet**

Find this exact line:

```
- Every **service** role must have a populated `SECURITY.md` (ADR-002/004) — copy `docs/security/service-security-template.md`
```

Replace with that SAME line followed by a new bullet:

```
- Every **service** role must have a populated `SECURITY.md` (ADR-002/004) — copy `docs/security/service-security-template.md`
- Every **service** role must have a populated `VERIFY.md` (ADR-008/017) — copy `docs/testing/service-verify-template.md`
```

- [ ] **Step 2: Add the ADR-017 Further-reading row**

Find this exact line:

```
| Testing methodology | `docs/decisions/008-testing.md` |
```

Replace with that SAME line followed by a new row:

```
| Testing methodology | `docs/decisions/008-testing.md` |
| Service-UI verification (Level 4) | `docs/decisions/017-service-ui-verification.md` |
```

- [ ] **Step 3: Verify and commit**

Run: `rbw unlocked && pre-commit run --files CLAUDE.md`
Expected: Passed/Skipped.
```bash
git add CLAUDE.md
git commit -m "CLAUDE.md: VERIFY.md role convention; link ADR-017"
```

---

### Task 7: Git-ignore screenshots + create the reviews dir

**Files:**
- Modify: `.gitignore`
- Create: `docs/testing/reviews/README.md`

- [ ] **Step 1: Add the screenshot working dir to `.gitignore`**

Find this exact block at the end of `.gitignore`:

```
# Terraform
terraform/**/.terraform/
terraform/**/*.tfstate
terraform/**/*.tfstate.backup
terraform/**/terraform.tfvars
# .terraform.lock.hcl is intentionally tracked (pins provider versions)
```

Replace with:

```
# Terraform
terraform/**/.terraform/
terraform/**/*.tfstate
terraform/**/*.tfstate.backup
terraform/**/terraform.tfvars
# .terraform.lock.hcl is intentionally tracked (pins provider versions)

# Service-UI verification screenshots (kept locally on ubongo, not committed — ADR-017)
.verify-runs/
```

- [ ] **Step 2: Create the reviews dir README (so the dir exists in git)**

Create `docs/testing/reviews/README.md` with exactly this content:

```markdown
# Service-UI verification reports

Dated reports written by `/verify-service` (ADR-008 Level 4 / ADR-017), one per run: `YYYY-MM-DD-.md`, plus `latest.md`. These markdown reports are committed; the screenshots they reference stay local on `ubongo` in the git-ignored `.verify-runs/` working dir.

No reports yet — the harness is designed, not yet runnable (see STATUS.md).
```

- [ ] **Step 3: Verify and commit**

Run: `rbw unlocked && pre-commit run --files .gitignore docs/testing/reviews/README.md`
Expected: Passed/Skipped.
```bash
git add .gitignore docs/testing/reviews/README.md
git commit -m "Git-ignore verify screenshots; add testing/reviews dir"
```

---

### Task 8: Add the Level 4 row to STATUS.md

**Files:**
- Modify: `STATUS.md`

- [ ] **Step 1: Add a row to the "Designed but not built" table**

Find this exact line:

```
| NetBird agent enrollment in `base` | ADR-016 | Every Linux host joins the mesh via the base role (setup keys in vault); SSH allowed only on `wt0`. Designed; base role not built. |
```

Replace with that SAME line followed by the new row:

```
| NetBird agent enrollment in `base` | ADR-016 | Every Linux host joins the mesh via the base role (setup keys in vault); SSH allowed only on `wt0`. Designed; base role not built. |
| Service-UI verification (Level 4) | ADR-017 / ADR-008 | `/verify-service` skill + `VERIFY.md` template + standards are authorable and present; *running* deferred on ubongo + `playwright` plugin + Authentik + a staging deploy. |
```

- [ ] **Step 2: Verify and commit**

Run: `rbw unlocked && pre-commit run --files STATUS.md`
Expected: Passed/Skipped.

```bash
git add STATUS.md
git commit -m "STATUS: record Level 4 service-UI verification (ADR-017)"
```

---

### Task 9: Mark TODO 2.2/2.3 addressed

**Files:**
- Modify: `docs/TODO.md`

- [ ] **Step 1: Annotate the Testing items**

Find this exact block:

```
2. **Testing**
   1. Choose and configure code-testing tooling (Molecule, etc.).
   2. Decide how the AI interprets Molecule output and performs live testing: API calls, curl pulls of web products, log reviews, and headless browsing.
   3. Define a standard for generating test users and for instructing the user to perform relevant manual tests.
```

Replace with:

```
2. **Testing**
   1. Choose and configure code-testing tooling (Molecule, etc.).
   2. Decide how the AI interprets Molecule output and performs live testing: API calls, curl pulls of web products, log reviews, and headless browsing.
      — Headless browsing DECIDED (ADR-017): the `/verify-service` Level 4 harness. The API/curl/log-review siblings remain open.
   3. ~~Define a standard for generating test users and for instructing the user to perform relevant manual tests.~~ DECIDED (ADR-017): test users in the staging Authentik `test` group; manual tests handed off as a checklist in the `/verify-service` report.
```

- [ ] **Step 2: Verify and commit**

Run: `rbw unlocked && pre-commit run --files docs/TODO.md`
Expected: Passed/Skipped.

```bash
git add docs/TODO.md
git commit -m "TODO: mark headless-browsing + test-user standard decided (ADR-017)"
```

---

### Task 10: Final consistency sweep

**Files:** none modified (verification only)

- [ ] **Step 1: Confirm ADR-017 is present and cross-linked**

Run:

```bash
test -f docs/decisions/017-service-ui-verification.md && echo "ADR-017 present"
grep -rl "ADR-017\|017-service-ui-verification" docs/ CLAUDE.md STATUS.md .claude/ | grep -vE "superpowers/(plans|specs)/"
```

Expected: the file exists and the referencing files appear — ADR-008, CLAUDE.md, STATUS.md, the `VERIFY.md` template, the `/verify-service` skill, service-checklist, TODO, the reviews README.

- [ ] **Step 2: Confirm the new artifacts exist and the Level 4 stub is gone**

Run:

```bash
ls docs/testing/service-verify-template.md .claude/commands/verify-service.md docs/testing/reviews/README.md
grep -n "planned, not built" docs/decisions/008-testing.md || echo "Level 4 stub replaced (good)"
grep -n "\.verify-runs/" .gitignore && echo "screenshot dir ignored (good)"
```

Expected: all three files listed; the old Level 4 "planned, not built" stub line gone; `.verify-runs/` in `.gitignore`.

- [ ] **Step 3: Full hook run**

Run: `rbw unlocked && pre-commit run --all-files`
Expected: all hooks Passed/Skipped. Fix anything that fails (likely trailing whitespace / end-of-file) and amend the owning commit.
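Because the plan keeps one commit per task, the commit that owns a failing file may not be the latest one, in which case a plain `git commit --amend` would touch the wrong commit. One way to handle that, sketched under the assumption that the branch was cut from `main` and has not been pushed yet, is a fixup commit followed by an autosquash rebase. `fixup_owning_commit` is a hypothetical helper name, not something the plan defines; for example, `fixup_owning_commit docs/TODO.md` would fold a whitespace fix into the Task 9 commit.

```shell
# Hypothetical helper (not part of the plan): fold a fix for FILE into the
# branch commit that last touched it, keeping one commit per task.
# BASE defaults to main, the branch point named in the Pre-flight section.
fixup_owning_commit() {
  file="$1"
  base="${2:-main}"

  # Most recent commit on this branch (base..HEAD) that touched the file.
  owner=$(git log -n 1 --format=%H "$base..HEAD" -- "$file") || return 1
  if [ -z "$owner" ]; then
    echo "no commit in $base..HEAD touches $file" >&2
    return 1
  fi

  # Record the fix as a fixup commit, then let the rebase squash it into
  # its target. GIT_SEQUENCE_EDITOR=true accepts the generated todo list
  # without opening an editor.
  git add -- "$file" &&
    git commit -q --fixup="$owner" &&
    GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash "$base"
}
```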
- [ ] **Step 4: Push (only if the user asks)**

```bash
git push origin
```

---

## Self-review notes (author)

- **Spec coverage:** decision/forks/architecture → Task 1 (ADR-017) + Task 2 (ADR-008); `VERIFY.md` standard → Task 3 (template) + Task 6 (convention) + Task 5 (gate); skill/mechanism/reporting/safety → Task 4 (`/verify-service`); reporting dir + screenshot policy → Task 7; STATUS/TODO reconciliation → Tasks 8–9. ✓
- **Buildable-now vs deferred:** every task is authorable without `ubongo`/Authentik/staging; the skill carries an explicit Prerequisites gate so it cannot pretend to run. Deferred items (new-role scaffold, Authentik automation, per-service `VERIFY.md`, plugin install) are recorded in ADR-017/STATUS, not implemented. ✓
- **No placeholders:** every create/edit shows exact content; the `<…>` tokens in the template are deliberate (match `service-security-template.md`'s house style). ✓
- **Name consistency:** `/verify-service`, `roles//VERIFY.md`, `docs/testing/service-verify-template.md`, `docs/testing/reviews/`, `.verify-runs/`, and the `test` Authentik group are used identically across all tasks. ✓