boma/docs/superpowers/plans/2026-06-05-service-ui-verification.md
sjat be6a064f44 Add implementation plan for service-UI verification (Level 4)
Task-by-task: author ADR-017, expand ADR-008 Level 4, create the VERIFY.md
template + /verify-service skill, and reconcile the checklist/CLAUDE.md/
gitignore/STATUS/TODO. Buildable-now artifacts; live run stays deferred.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 13:11:43 +02:00


Service-UI Verification (Level 4) Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Build the authorable-now parts of ADR-008 Level 4 — a Claude-driven exploratory service-UI verification harness — namely ADR-017, the /verify-service skill, the per-service VERIFY.md template/convention, and the doc reconciliations; the live run stays deferred on ubongo/Authentik/staging.

Architecture: Mostly documentation + two new authorable artifacts (the /verify-service Claude Code command and the VERIFY.md template). No application code, no Ansible roles (none of the prerequisite roles exist). The harness mechanism is the playwright Claude Code plugin driving Chromium on ubongo; this plan does not install or run it — it records the decision, the standards, and the orchestration logic.

Tech Stack: Markdown + a Claude Code command file. Verification is the repo's pre-commit hooks plus a final cross-reference/staleness sweep. No markdown linter exists, so "tests" are hook-pass + grep checks.


Pre-flight (read once before starting)

  • rbw must be unlocked before every commit (the pre-commit ansible-lint hook decrypts vault.yml). Run rbw unlocked; if it exits non-zero, stop and ask the user to rbw unlock.
  • Commit style: one commit per task, imperative subject ≤72 chars.
  • Order matters: Task 1 (ADR-017) lands first — later tasks link to it.
  • Spec reference: docs/superpowers/specs/2026-06-05-service-ui-verification-design.md.
  • Branch: the controller creates chore/service-ui-verification-docs off main before dispatching Task 1; do not implement on main.
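
The pre-flight rules above can be checked mechanically before each commit. What follows is a minimal sketch and not one of the plan's required steps: the helper names `subject_ok`, `branch_ok`, and `preflight` are hypothetical, and it assumes `rbw` and `git` are on `PATH`.

```shell
# Hypothetical helpers (not defined anywhere in the repo).

# Succeeds when a commit subject is at most 72 characters long.
subject_ok() {
    [ "${#1}" -le 72 ]
}

# Succeeds on any branch except main.
branch_ok() {
    [ "$1" != "main" ]
}

# Runs the checks that need the live environment: the vault must be
# unlocked and the current branch must not be main.
preflight() {
    if ! rbw unlocked; then
        echo "rbw is locked: ask the user to run 'rbw unlock'" >&2
        return 1
    fi
    if ! branch_ok "$(git rev-parse --abbrev-ref HEAD)"; then
        echo "on main: switch to chore/service-ui-verification-docs" >&2
        return 1
    fi
}
```

A worker could call `preflight` once per task and `subject_ok "<subject>"` before each `git commit`; neither replaces the pre-commit hooks.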

File map

File Action Responsibility
docs/decisions/017-service-ui-verification.md Create Home of record for Level 4 verification
docs/decisions/008-testing.md Modify Expand the Level 4 stub; link ADR-017
docs/testing/service-verify-template.md Create The VERIFY.md template (parallels service-security-template.md)
.claude/commands/verify-service.md Create The /verify-service <name> orchestrating skill
docs/security/service-checklist.md Modify Add "passed Level 4" to the pre-deploy gate
CLAUDE.md Modify Role-convention bullet (VERIFY.md); Further-reading ADR-017 row
.gitignore Modify Ignore the screenshot working dir
docs/testing/reviews/README.md Create Explains the committed-report dir (also makes the dir exist in git)
STATUS.md Modify Row: Level 4 verification (skill/template authorable; running deferred)
docs/TODO.md Modify Mark 2.2 (browser) + 2.3 addressed by ADR-017

Deferred (not in this plan): scaffolding VERIFY.md into make new-role (do it when that scaffold is next touched — noted in ADR-017); the Authentik test-user provisioning automation; per-service VERIFY.md files (no service roles exist); installing/running the playwright plugin.


Task 1: Author ADR-017 (the home of record)

Files:

  • Create: docs/decisions/017-service-ui-verification.md

  • Step 1: Create the ADR file

Create docs/decisions/017-service-ui-verification.md with exactly this content (preserve em-dashes —, backticks, table pipes):

# ADR-017 — Service-UI acceptance verification (Level 4)

## Context

ADR-008 defines testing Levels 1–3 (Molecule, staging deploy, external smoke) and a
Level 4 stub. Nothing below Level 4 exercises a service's **application UI** — none
answer "does PhotoPrism actually let me log in, upload a photo, and see a thumbnail?"
(TODO 8.2). The operator's ask (TODO 2.2 headless browsing + TODO 2.3 test users +
manual-test instruction): Claude spins up a browser, *sees* the service UI, exercises
it, generates test users, and instructs the operator on manual tests. Today Claude sees
a browser only passively (`/screenshot` fetches operator-taken shots from `mamba`); this
is the active counterpart.

## Decision

A Claude-driven exploratory service-UI verification harness — **Level 4** — invoked as
`/verify-service <name>` on `ubongo`. Five settled forks:

1. **Claude-driven exploratory** — Claude navigates with judgment, not deterministic
   scripts. A scripted regression suite is explicitly not built here.
2. **Interactive, Claude-in-the-loop** — exploratory judgment can't be a headless cron
   gate; scheduled smoke is a determinism job for health checks / Uptime Kuma later.
3. **Staging, full exercise** — Claude creates test users and exercises features
   (incl. destructive flows) against a *staging* deploy; the rebuildable sandbox
   resolves safety.
4. **Test users in Authentik (central IdP), real SSO flow** — authenticates through
   Traefik + Authentik as a real user would.
5. **Per-service `VERIFY.md` backbone + free exploration** — each service role ships an
   acceptance spec of critical journeys; Claude executes it and explores beyond it.

## VERIFY.md standard

Every service role ships a populated `roles/<service>/VERIFY.md`, copied from
`docs/testing/service-verify-template.md` — parallel to `SECURITY.md` from
`service-security-template.md`. A new role convention. It lists the service's critical
user journeys (what "working" means), what good looks like, and what is not
browser-verifiable (→ manual handoff). It also joins the pre-production gate in
`docs/security/service-checklist.md`.

## Test-user standard (TODO 2.3)

Test identities live only in the **staging** Authentik (never production): a dedicated
`test` group / naming prefix; ephemeral per-run credentials (staging is rebuildable, so
nothing persisted, none in `vault.yml`); reuse-or-create; teardown via staging rebuild
or explicit `test`-group cleanup.

## Reporting & manual handoff

`/verify-service` writes `docs/testing/reviews/YYYY-MM-DD-<service>.md` (+ `latest.md`),
mirroring `/review-repo` and `/capacity-review`: pass/fail per `VERIFY.md` journey,
observations, the test-user/env used, a verdict, and a structured **manual-test
checklist** for anything Claude can't do (physical device, paid/external flow,
subjective judgment) — the "instruct me on tests" output. Screenshots are saved to a
git-ignored working dir on `ubongo` (PNG bloat + secret-leak risk); the report links
them.

## Safety

- **Staging-only guard** — the skill refuses to run against production (exploratory
  clicking is destructive); ADR-002-aligned hard stop.
- **Confined blast radius** — test users only in the staging `test` group; the run
  sticks to the target service.
- **No secrets leaked** — the git-ignored screenshot dir is the safety boundary;
  avoid capturing credential screens.

## Status

Designed. **Authorable now:** this ADR, the ADR-008 Level 4 expansion, the `VERIFY.md`
template, the `/verify-service` skill, the convention/checklist/Further-reading edits,
`.gitignore`/dir, STATUS/TODO. **Running is deferred** on its dependencies.

## Dependencies

- `ubongo` (ADR-015) — runs the browser. Designed, not built.
- `playwright` Claude Code plugin — enabled when this lands (`claude-code-setup.md`).
- Authentik (CAPABILITIES §2, planned) — central IdP for test users + SSO.
- A staging deploy of the service (ADR-008 Level 2) — staging is currently empty stubs.
- `make new-role` scaffolding `VERIFY.md` — deferred to when that scaffold is next touched.

## What was ruled out

| Option | Reason |
|---|---|
| Scripted Playwright regression suite | Operator wants exploratory judgment; scripts add maintenance burden. Could be a later layer, not this. |
| Scheduled headless smoke gate | Needs determinism the exploratory nature excludes; belongs to health checks / Uptime Kuma. |
| Verify against production | Exploratory clicking + test-user creation is destructive/polluting; staging sandbox instead. |
| Free-form, no per-service spec | Non-repeatable, can miss a critical flow; `VERIFY.md` gives a backbone. |
| Staging bypasses SSO / per-app users | Wouldn't exercise the real Traefik+Authentik path; central test users are faithful. |
| Commit screenshots to the repo | Repo bloat + secret-leak risk; git-ignored on `ubongo`. |

See also: ADR-008 (testing — expanded), ADR-015 (control host), ADR-002 (security),
ADR-004 (`VERIFY.md` parallels `SECURITY.md`), ADR-013/014 (heritage / knowledge sourcing).
  • Step 2: Verify and commit

Run: rbw unlocked && pre-commit run --files docs/decisions/017-service-ui-verification.md
Expected: Passed/Skipped.

git add docs/decisions/017-service-ui-verification.md
git commit -m "Add ADR-017 (service-UI acceptance verification, Level 4)"
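
Because Step 1 asks for exact content, a quick structural check before committing can catch a dropped section. The sketch below is an optional aid, assuming the nine second-level headings listed in Step 1; `missing_sections` is a hypothetical helper name.

```shell
# Prints each expected ADR-017 heading that is missing from the file
# given as $1; returns non-zero if any heading is missing.
missing_sections() {
    rc=0
    while IFS= read -r heading; do
        # -x matches the whole line, -F treats the heading literally.
        if ! grep -qxF -- "$heading" "$1"; then
            echo "missing: $heading"
            rc=1
        fi
    done <<'EOF'
## Context
## Decision
## VERIFY.md standard
## Test-user standard (TODO 2.3)
## Reporting & manual handoff
## Safety
## Status
## Dependencies
## What was ruled out
EOF
    return "$rc"
}
```

Usage: `missing_sections docs/decisions/017-service-ui-verification.md`. It checks headings only, not the body text beneath them.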

Task 2: Expand the ADR-008 Level 4 stub

Files:

  • Modify: docs/decisions/008-testing.md

  • Step 1: Replace the Level 4 stub with the full definition

Find this exact block:

### Level 4 — Service-UI acceptance (planned, not built)

Claude drives a headless browser from `ubongo` against a *deployed* service: loads
the rendered UI, creates test users, exercises features, and hands the operator a
manual test script for the rest. Catches application-level regressions that no lower
level sees. The harness (Playwright/headless-Chromium, screenshot-back-to-Claude) is
a **separate spec**; `ubongo` is sized for it (ADR-015). Status: designed, not built
(STATUS.md).

Replace with:

### Level 4 — Service-UI acceptance (Claude-driven exploratory)

A Claude-driven exploratory check of a service's **application UI**, run as
`/verify-service <name>` on `ubongo` (ADR-017). Claude drives Chromium via the
`playwright` plugin against a **staging** deploy, authenticates through the real
Traefik + Authentik SSO flow using a test user in the staging `test` group, then
executes the service's `roles/<service>/VERIFY.md` acceptance journeys *and*
free-explores — judging pass/fail, screenshotting key states. It writes a dated report
to `docs/testing/reviews/` and hands the operator a manual-test checklist for anything
it can't verify (hardware, paid/external flows, subjective judgment).

Catches application-level regressions no lower level sees ("does PhotoPrism actually
serve photos?"). Placement: after Level 2 (staging deploy), before production
promotion. Exploratory and interactive by design — *not* a deterministic CI/cron gate
(that role belongs to health checks / Uptime Kuma).

**Status:** the skill, the `VERIFY.md` template, and standards are authorable now;
running it is deferred on `ubongo` + the `playwright` plugin + Authentik + a staging
deploy (STATUS.md). Full design: ADR-017.
  • Step 2: Verify and commit

Run: rbw unlocked && pre-commit run --files docs/decisions/008-testing.md
Expected: Passed/Skipped.

git add docs/decisions/008-testing.md
git commit -m "ADR-008: expand Level 4 into the verify-service harness (ADR-017)"

Task 3: Create the VERIFY.md template

Files:

  • Create: docs/testing/service-verify-template.md

  • Step 1: Create the template

Create docs/testing/service-verify-template.md with exactly this content (preserve &lt;/&gt; HTML escapes, em-dashes, backticks):

# Per-service verification record — template

Copy this file to `roles/<service>/VERIFY.md` and fill it in when building a service
role (ADR-008 Level 4 / ADR-017). It is the per-service **acceptance spec**: the
critical user journeys that define "working" for this service. `/verify-service <name>`
reads it, drives a browser through them against the staging deploy, and explores beyond
them.

Delete this preamble in the copy and start from the heading below.

---

# Verify — &lt;service&gt;

## Critical user journeys

The acceptance criteria — what "working" means for this service. Numbered; each is an
action and its expected result. Example shape (replace with this service's flows):

1. SSO login via Authentik succeeds and lands on the service's home/dashboard.
2. &lt;core action&gt; — e.g. "upload a test image" → &lt;expected&gt; — "a thumbnail renders".
3. &lt;core action&gt; → &lt;expected&gt;.

## What good looks like

Key states/screens Claude should confirm (and screenshot) — the visual/textual signals
that the journeys above actually succeeded.

- &lt;e.g. "the uploaded image appears in the library grid within ~10s"&gt;

## Not browser-verifiable

Items to route to the manual-test handoff — things a headless browser can't or
shouldn't judge.

- &lt;e.g. hardware passthrough, a paid/external integration, subjective media quality&gt;

## Test data

What the journeys need, provisioned in the **staging** Authentik `test` group
(ephemeral, torn down by staging rebuild).

- &lt;e.g. "one test user; no pre-seeded content"&gt;
  • Step 2: Verify and commit

Run: rbw unlocked && pre-commit run --files docs/testing/service-verify-template.md
Expected: Passed/Skipped.

git add docs/testing/service-verify-template.md
git commit -m "Add VERIFY.md template for service-UI acceptance (ADR-017)"
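
The template tells whoever copies it to delete the preamble and start from the heading after the `---` rule. That step could be scripted roughly as follows; this is a sketch only, `strip_preamble` is a hypothetical helper, and it assumes the first line consisting solely of `---` marks the end of the preamble.

```shell
# Hypothetical helper: prints the template at $1 without its preamble,
# i.e. everything after the first line that is exactly "---",
# with leading blank lines dropped.
strip_preamble() {
    sed '1,/^---$/d' "$1" | sed '/./,$!d'
}
```

For example, `strip_preamble docs/testing/service-verify-template.md > roles/photoprism/VERIFY.md` would seed a role's file (the role name is illustrative; no service roles exist yet). The `&lt;…&gt;` placeholders are left untouched and still need filling in by hand.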

Task 4: Create the /verify-service skill

Files:

  • Create: .claude/commands/verify-service.md

  • Step 1: Create the command file

Create .claude/commands/verify-service.md with exactly this content (preserve em-dashes, backticks, code fences):

Exploratory service-UI verification (ADR-008 Level 4 / ADR-017)

Drive a browser against a **staging** deploy of a service, exercise its
`roles/<service>/VERIFY.md` acceptance journeys plus free exploration, and write a
tracked report. Argument: the service/role name (e.g. `/verify-service photoprism`).

## Prerequisites (this is forward-looking — ADR-017 dependencies)

This skill cannot run until all of these exist; if any is missing, say so and stop —
do not improvise around it:

- `ubongo` with the `playwright` Claude Code plugin (browser automation tools).
- A **staging** deploy of the target service (ADR-008 Level 2).
- Authentik (staging) for test-user provisioning + SSO.
- `roles/<name>/VERIFY.md` present.

## Process

### Phase 0 — safety gate (staging only)

Confirm the target resolves to the **staging** environment/inventory, never production.
If you cannot prove it is staging, **stop** — exploratory clicking is destructive
(ADR-002). State why you stopped.

### Phase 1 — read intent

Read `roles/<name>/VERIFY.md`: the Critical user journeys, What good looks like, Not
browser-verifiable, and Test data sections.

### Phase 2 — test user

Provision (reuse-or-create) a test user in the staging Authentik `test` group, with
ephemeral credentials held only for this run. Never use a real/production account.

### Phase 3 — drive the browser

Via the `playwright` plugin, on `ubongo`: open the service's staging URL (resolved via
boma DNS), authenticate through the real Traefik + Authentik SSO flow, then execute each
`VERIFY.md` journey — judging pass/fail and screenshotting key states — and free-explore
for anything obviously broken. Save screenshots to the git-ignored `.verify-runs/`
working dir; avoid capturing credential screens.

### Phase 4 — write the report

Save to `docs/testing/reviews/YYYY-MM-DD-<name>.md` and overwrite
`docs/testing/reviews/latest.md`. Structure:

- **One-line verdict** — e.g. "5/5 journeys passed; one manual check pending".
- **Run metadata** — date, service, staging env, test user, reviewed commit SHA.
- **Per-journey result** — pass/fail against `VERIFY.md`, with the evidence (linked
  screenshot path) and any observation.
- **Free-exploration findings** — anything noticed beyond the listed journeys.
- **Manual-test checklist** — the "Not browser-verifiable" items plus anything Claude
  couldn't do: numbered steps, expected result, and why it was handed off.

### Phase 5 — clean up + commit

Offer to clean up the `test`-group user (or note that the staging rebuild will).
Commit the report markdown per CLAUDE.md git conventions. **Do not** commit
`.verify-runs/` (git-ignored).

## Notes

- Reports (markdown) are committed; screenshots stay local on `ubongo` in `.verify-runs/`.
- Exploratory and interactive — this is not a deterministic CI gate.
  • Step 2: Verify and commit

Run: rbw unlocked && pre-commit run --files .claude/commands/verify-service.md
Expected: Passed/Skipped.

git add .claude/commands/verify-service.md
git commit -m "Add /verify-service skill for Level 4 UI verification (ADR-017)"
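
Phase 0 of the skill is described in prose only. As one possible shape for a first filter, the sketch below refuses anything whose inventory path does not contain a `staging` component. The inventory layout (`inventories/staging/...`) and the name `is_staging_inventory` are assumptions, since staging is currently empty stubs. A path check alone does not prove the target is staging, so it would complement the skill's judgment rather than replace it.

```shell
# Hypothetical Phase 0 guard: succeeds only when the inventory path in $1
# contains a path component named exactly "staging" and none named
# "production". Empty or ambiguous input is treated as unproven.
is_staging_inventory() {
    case "/$1/" in
        */production/*) return 1 ;;
        */staging/*)    return 0 ;;
        *)              return 1 ;;
    esac
}
```

The default branch fails closed, matching the skill's rule to stop when staging cannot be proven.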

Task 5: Add Level 4 to the service-clearance gate

Files:

  • Modify: docs/security/service-checklist.md

  • Step 1: Add an Operability bullet for Level 4

Find this exact block:

## Operability (security-adjacent)

- [ ] Logs go somewhere reviewable (central aggregation when available)
- [ ] Backup/restore is covered if the service holds state

Replace with:

## Operability (security-adjacent)

- [ ] Logs go somewhere reviewable (central aggregation when available)
- [ ] Backup/restore is covered if the service holds state
- [ ] Passed Level 4 service-UI verification (`/verify-service`) against staging — the
      service has a populated `roles/<service>/VERIFY.md` and its critical journeys
      verified (ADR-008 Level 4 / ADR-017)
  • Step 2: Verify and commit

Run: rbw unlocked && pre-commit run --files docs/security/service-checklist.md
Expected: Passed/Skipped.

git add docs/security/service-checklist.md
git commit -m "service-checklist: add Level 4 UI verification to the gate"
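
The new checklist item requires a "populated" `VERIFY.md` but does not define the word. One rough reading, offered as a sketch: the file exists, is non-empty, and has no template placeholders left. Treating a leftover `&lt;` token as "not populated" is an assumption of this example, and `verify_md_populated` is a hypothetical helper.

```shell
# Hypothetical gate check: succeeds when the VERIFY.md at $1 exists,
# is non-empty, and contains no leftover "&lt;" placeholder tokens.
verify_md_populated() {
    [ -s "$1" ] || return 1
    if grep -q '&lt;' "$1"; then
        return 1
    fi
    return 0
}
```

This only detects an unfilled copy of the template; whether the journeys listed are the right ones remains a review judgment.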

Task 6: Update CLAUDE.md (role convention + Further reading)

Files:

  • Modify: CLAUDE.md

  • Step 1: Add the VERIFY.md role-convention bullet

Find this exact line:

- Every **service** role must have a populated `SECURITY.md` (ADR-002/004) — copy `docs/security/service-security-template.md`

Replace with that SAME line followed by a new bullet:

- Every **service** role must have a populated `SECURITY.md` (ADR-002/004) — copy `docs/security/service-security-template.md`
- Every **service** role must have a populated `VERIFY.md` (ADR-008/017) — copy `docs/testing/service-verify-template.md`
  • Step 2: Add the ADR-017 Further-reading row

Find this exact line:

| Testing methodology    | `docs/decisions/008-testing.md`       |

Replace with that SAME line followed by a new row:

| Testing methodology    | `docs/decisions/008-testing.md`       |
| Service-UI verification (Level 4) | `docs/decisions/017-service-ui-verification.md` |
  • Step 3: Verify and commit

Run: rbw unlocked && pre-commit run --files CLAUDE.md
Expected: Passed/Skipped.

git add CLAUDE.md
git commit -m "CLAUDE.md: VERIFY.md role convention; link ADR-017"
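
Tasks 6 and 8 share an edit pattern: find one exact line and add a new line directly after it. If that edit were scripted rather than done by hand, an idempotent version might look like the sketch below. `append_after` is a hypothetical helper, not something the repo provides.

```shell
# Hypothetical helper: in file $1, insert line $3 directly after the
# single line that equals $2. Safe to re-run: it does nothing when $3
# is already present, and fails when $2 does not match exactly once.
append_after() {
    target=$1
    anchor=$2
    addition=$3
    # Idempotent: nothing to do if the new line is already there.
    if grep -qxF -- "$addition" "$target"; then
        return 0
    fi
    # Refuse to edit unless the anchor matches exactly one whole line.
    [ "$(grep -cxF -- "$anchor" "$target")" -eq 1 ] || return 1
    scratch=$(mktemp) || return 1
    # Values travel via the environment so awk does not interpret
    # backslashes; the "" concatenation forces a string comparison.
    ANCHOR=$anchor ADDITION=$addition awk \
        '{ print } ($0 "") == (ENVIRON["ANCHOR"] "") { print ENVIRON["ADDITION"] }' \
        "$target" > "$scratch" || return 1
    # Write back through the original file so its permissions are kept.
    cat "$scratch" > "$target"
    rm -f "$scratch"
}
```

Failing on a missing or duplicated anchor mirrors the plan's "Find this exact line" wording: if the line has drifted, the edit should stop, not guess.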

Task 7: Git-ignore screenshots + create the reviews dir

Files:

  • Modify: .gitignore

  • Create: docs/testing/reviews/README.md

  • Step 1: Add the screenshot working dir to .gitignore

Find this exact block at the end of .gitignore:

# Terraform
terraform/**/.terraform/
terraform/**/*.tfstate
terraform/**/*.tfstate.backup
terraform/**/terraform.tfvars
# .terraform.lock.hcl is intentionally tracked (pins provider versions)

Replace with:

# Terraform
terraform/**/.terraform/
terraform/**/*.tfstate
terraform/**/*.tfstate.backup
terraform/**/terraform.tfvars
# .terraform.lock.hcl is intentionally tracked (pins provider versions)

# Service-UI verification screenshots (kept locally on ubongo, not committed — ADR-017)
.verify-runs/
  • Step 2: Create the reviews dir README (so the dir exists in git)

Create docs/testing/reviews/README.md with exactly this content:

# Service-UI verification reports

Dated reports written by `/verify-service` (ADR-008 Level 4 / ADR-017), one per run:
`YYYY-MM-DD-<service>.md`, plus `latest.md`. These markdown reports are committed; the
screenshots they reference stay local on `ubongo` in the git-ignored `.verify-runs/`
working dir.

No reports yet — the harness is designed, not yet runnable (see STATUS.md).
  • Step 3: Verify and commit

Run: rbw unlocked && pre-commit run --files .gitignore docs/testing/reviews/README.md
Expected: Passed/Skipped.

git add .gitignore docs/testing/reviews/README.md
git commit -m "Git-ignore verify screenshots; add testing/reviews dir"
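
The README fixes the report naming: a dated `YYYY-MM-DD-<service>.md` file plus `latest.md`. A minimal sketch of that convention follows, assuming `latest.md` is a plain copy of the newest report; `publish_report` is a hypothetical helper, and the skill itself may write the files directly.

```shell
# Hypothetical helper: copies the finished report at $3 into the reviews
# directory $1 under the dated name for service $2, refreshes latest.md,
# and prints the dated path. $4 is the run date (YYYY-MM-DD) and
# defaults to today.
publish_report() {
    run_date=${4:-$(date +%Y-%m-%d)}
    dated="$1/$run_date-$2.md"
    cp "$3" "$dated" && cp "$dated" "$1/latest.md" && echo "$dated"
}
```

Example: `publish_report docs/testing/reviews photoprism /tmp/report.md` (the service name and source path are illustrative).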

Task 8: Add the Level 4 row to STATUS.md

Files:

  • Modify: STATUS.md

  • Step 1: Add a row to the "Designed but not built" table

Find this exact line:

| NetBird agent enrollment in `base` | ADR-016 | Every Linux host joins the mesh via the base role (setup keys in vault); SSH allowed only on `wt0`. Designed; base role not built. |

Replace with that SAME line followed by the new row:

| NetBird agent enrollment in `base` | ADR-016 | Every Linux host joins the mesh via the base role (setup keys in vault); SSH allowed only on `wt0`. Designed; base role not built. |
| Service-UI verification (Level 4) | ADR-017 / ADR-008 | `/verify-service` skill + `VERIFY.md` template + standards are authorable and present; *running* deferred on ubongo + `playwright` plugin + Authentik + a staging deploy. |
  • Step 2: Verify and commit

Run: rbw unlocked && pre-commit run --files STATUS.md
Expected: Passed/Skipped.

git add STATUS.md
git commit -m "STATUS: record Level 4 service-UI verification (ADR-017)"

Task 9: Mark TODO 2.2/2.3 addressed

Files:

  • Modify: docs/TODO.md

  • Step 1: Annotate the Testing items

Find this exact block:

2. **Testing**
   1. Choose and configure code-testing tooling (Molecule, etc.).
   2. Decide how the AI interprets Molecule output and performs live testing:
      API calls, curl pulls of web products, log reviews, and headless browsing.
   3. Define a standard for generating test users and for instructing the user to
      perform relevant manual tests.

Replace with:

2. **Testing**
   1. Choose and configure code-testing tooling (Molecule, etc.).
   2. Decide how the AI interprets Molecule output and performs live testing:
      API calls, curl pulls of web products, log reviews, and headless browsing.
      — Headless browsing DECIDED (ADR-017): the `/verify-service` Level 4 harness.
      The API/curl/log-review siblings remain open.
   3. ~~Define a standard for generating test users and for instructing the user to
      perform relevant manual tests.~~ DECIDED (ADR-017): test users in the staging
      Authentik `test` group; manual tests handed off as a checklist in the
      `/verify-service` report.
  • Step 2: Verify and commit

Run: rbw unlocked && pre-commit run --files docs/TODO.md
Expected: Passed/Skipped.

git add docs/TODO.md
git commit -m "TODO: mark headless-browsing + test-user standard decided (ADR-017)"

Task 10: Final consistency sweep

Files: none modified (verification only)

  • Step 1: Confirm ADR-017 is present and cross-linked

Run:

test -f docs/decisions/017-service-ui-verification.md && echo "ADR-017 present"
grep -rl "ADR-017\|017-service-ui-verification" docs/ CLAUDE.md STATUS.md .claude/ | grep -vE "superpowers/(plans|specs)/"

Expected: the file exists and the referencing files appear — ADR-008, CLAUDE.md, STATUS.md, the VERIFY.md template, the /verify-service skill, service-checklist, TODO, the reviews README.
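
The `grep -rl` above lists the files that reference ADR-017 and leaves the comparison with the expected list to the reader. A sketch that inverts the check and reports only the gaps is shown below; `unreferenced` is a hypothetical helper, and its file list is taken from the expected output above.

```shell
# Hypothetical helper: for a checkout rooted at $1, prints every expected
# file that is missing or does not mention ADR-017, and returns non-zero
# if any such file is found.
unreferenced() {
    rc=0
    for rel in \
        docs/decisions/008-testing.md \
        CLAUDE.md \
        STATUS.md \
        docs/testing/service-verify-template.md \
        .claude/commands/verify-service.md \
        docs/security/service-checklist.md \
        docs/TODO.md \
        docs/testing/reviews/README.md
    do
        if ! grep -qE 'ADR-017|017-service-ui-verification' "$1/$rel" 2>/dev/null; then
            echo "no ADR-017 reference: $rel"
            rc=1
        fi
    done
    return "$rc"
}
```

Run as `unreferenced .` from the repo root; no output and a zero exit status mean every expected file carries the cross-link.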

  • Step 2: Confirm the new artifacts exist and the Level 4 stub is gone

Run:

ls docs/testing/service-verify-template.md .claude/commands/verify-service.md docs/testing/reviews/README.md
grep -n "planned, not built" docs/decisions/008-testing.md || echo "Level 4 stub replaced (good)"
grep -n "\.verify-runs/" .gitignore && echo "screenshot dir ignored (good)"

Expected: all three files listed; the old Level 4 "planned, not built" stub line gone; .verify-runs/ in .gitignore.

  • Step 3: Full hook run

Run: rbw unlocked && pre-commit run --all-files
Expected: all hooks Passed/Skipped. Fix anything that fails (likely trailing whitespace / end-of-file) and amend the owning commit.

  • Step 4: Push (only if the user asks)

git push origin <branch-or-main-after-merge>

Self-review notes (author)

  • Spec coverage: decision/forks/architecture → Task 1 (ADR-017) + Task 2 (ADR-008); VERIFY.md standard → Task 3 (template) + Task 6 (convention) + Task 5 (gate); skill/mechanism/reporting/safety → Task 4 (/verify-service); reporting dir + screenshot policy → Task 7; STATUS/TODO reconciliation → Tasks 8–9. ✓
  • Buildable-now vs deferred: every task is authorable without ubongo/Authentik/staging; the skill carries an explicit Prerequisites gate so it cannot pretend to run. Deferred items (new-role scaffold, Authentik automation, per-service VERIFY.md, plugin install) are recorded in ADR-017/STATUS, not implemented. ✓
  • No placeholders: every create/edit shows exact content; the &lt;…&gt; tokens in the template are deliberate (match service-security-template.md's house style). ✓
  • Name consistency: /verify-service, roles/<service>/VERIFY.md, docs/testing/service-verify-template.md, docs/testing/reviews/, .verify-runs/, and the test Authentik group are used identically across all tasks. ✓