From 2df1f9815321b94afd5c0be2ed0ad7d614ab987e Mon Sep 17 00:00:00 2001
From: sjat
Date: Fri, 5 Jun 2026 13:14:12 +0200
Subject: [PATCH] ADR-008: expand Level 4 into the verify-service harness (ADR-017)

---
 docs/decisions/008-testing.md | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/docs/decisions/008-testing.md b/docs/decisions/008-testing.md
index aa45d3d..477ce0c 100644
--- a/docs/decisions/008-testing.md
+++ b/docs/decisions/008-testing.md
@@ -53,14 +53,25 @@ Once `askari` is operational: scripted checks from outside the network confirmin
 that public-facing services respond correctly. Catches firewall and reverse proxy
 configuration issues invisible to Ansible check mode.
 
-### Level 4 — Service-UI acceptance (planned, not built)
+### Level 4 — Service-UI acceptance (Claude-driven exploratory)
 
-Claude drives a headless browser from `ubongo` against a *deployed* service: loads
-the rendered UI, creates test users, exercises features, and hands the operator a
-manual test script for the rest. Catches application-level regressions that no lower
-level sees. The harness (Playwright/headless-Chromium, screenshot-back-to-Claude) is
-a **separate spec**; `ubongo` is sized for it (ADR-015). Status: designed, not built
-(STATUS.md).
+A Claude-driven exploratory check of a service's **application UI**, run as
+`/verify-service ` on `ubongo` (ADR-017). Claude drives Chromium via the
+`playwright` plugin against a **staging** deploy, authenticates through the real
+Traefik + Authentik SSO flow using a test user in the staging `test` group, then
+executes the service's `roles//VERIFY.md` acceptance journeys *and*
+free-explores — judging pass/fail, screenshotting key states. It writes a dated report
+to `docs/testing/reviews/` and hands the operator a manual-test checklist for anything
+it can't verify (hardware, paid/external flows, subjective judgment).
+
+Catches application-level regressions no lower level sees ("does PhotoPrism actually
+serve photos?"). Placement: after Level 2 (staging deploy), before production
+promotion. Exploratory and interactive by design — *not* a deterministic CI/cron gate
+(that role belongs to health checks / Uptime Kuma).
+
+**Status:** the skill, the `VERIFY.md` template, and standards are authorable now;
+running it is deferred on `ubongo` + the `playwright` plugin + Authentik + a staging
+deploy (STATUS.md). Full design: ADR-017.
 
 ---