sjat/boma

sjat 175777e36a docs: reconcile 2026-06-14 review findings (O1-O7,O18,O22)

- STATUS: docker_host is built+applied, not scaffold-only (O1)
- ADR-004: backup points to ADR-022, not "out of scope"; service-role file
  table gains ACCESS.md + BACKUP.md rows (O2, O5)
- Finish Traefik->Caddy: ADR-008/011/017/019, CAPABILITIES, TODO (O3); scope
  ADR-024's custom-image/NetBird claims to the deferred DNS-01/M4b paths (O22)
- ADR-016/017/018 now lead with ## Status per ADR-023 (O4)
- ADR-002: caveat `PLAYBOOK=upgrade` as planned/unbuilt (O6)
- CAPABILITIES: carve out ubongo's dev_env from the nvim/tmux exclusion (O7)
- ADR-007: one authoritative boma.baobab.band -> boma.wingu.me transition note (O18)
- new-host Part E: note ubongo is managed as sjat, ansible-user bootstrap pending (O15)

O9 (hosts.yml header) left open: the file is generator-owned (hook-protected);
fixing it needs a tf_to_inventory.py change or a tf-inventory run, not a hand-edit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-14 19:06:33 +02:00

6.1 KiB


ADR-017 — Service-UI acceptance verification (Level 4)

Status

Accepted (2026-06-05). Designed. Authorable now: this ADR, the ADR-008 Level 4 expansion, the VERIFY.md template, the /verify-service skill, the convention/checklist/Further-reading edits, .gitignore/dir, and STATUS/TODO. Running the harness is deferred until its dependencies exist (see Dependencies).

Context

ADR-008 defines testing Levels 1–3 (Molecule, staging deploy, external smoke) and a Level 4 stub. None of Levels 1–3 exercises a service's application UI; none answers "does PhotoPrism actually let me log in, upload a photo, and see a thumbnail?" (TODO 8.2). The operator's ask combines TODO 2.2 (headless browsing), TODO 2.3 (test users), and manual-test instruction: Claude spins up a browser, sees the service UI, exercises it, generates test users, and instructs the operator on manual tests. Today Claude sees a browser only passively (/screenshot fetches operator-taken shots from mamba); this ADR is the active counterpart.

Decision

Build a Claude-driven exploratory service-UI verification harness (Level 4), invoked as /verify-service <name> on ubongo. Five forks are settled:

1. Claude-driven and exploratory: Claude navigates with judgment, not deterministic scripts. A scripted regression suite is explicitly not built here.
2. Interactive, Claude-in-the-loop: exploratory judgment can't serve as a headless cron gate; scheduled smoke testing needs determinism and is left to health checks / Uptime Kuma later.
3. Staging, full exercise: Claude creates test users and exercises features (including destructive flows) against a staging deploy; because the sandbox is rebuildable, destructive flows are safe there.
4. Test users in Authentik (central IdP), real SSO flow: Claude authenticates through Caddy (ADR-024) + Authentik as a real user would.
5. Per-service VERIFY.md backbone plus free exploration: each service role ships an acceptance spec of critical journeys; Claude executes it and explores beyond it.

VERIFY.md standard

Every service role ships a populated roles/<service>/VERIFY.md, copied from docs/testing/service-verify-template.md, in parallel to SECURITY.md from service-security-template.md. This is a new role convention. The file lists the service's critical user journeys (what "working" means), what good looks like, and what is not browser-verifiable (→ manual handoff). It also joins the pre-production gate in docs/security/service-checklist.md.
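The convention above lends itself to a mechanical check. The following is a minimal sketch, not part of this ADR's deliverables: the function name `roles_missing_verify` is hypothetical, it assumes every directory under roles/ is a service role, and it approximates "populated" as "exists and is not blank".

```python
from pathlib import Path


def roles_missing_verify(roles_dir: Path) -> list[str]:
    """Return the names of role directories that lack a non-empty VERIFY.md.

    Assumption: every directory directly under roles_dir is a service role.
    """
    missing = []
    for role in sorted(p for p in roles_dir.iterdir() if p.is_dir()):
        spec = role / "VERIFY.md"
        # "Populated" is approximated as "exists and is not blank"; a stricter
        # gate could also compare the file against the unfilled template.
        if not spec.is_file() or not spec.read_text(encoding="utf-8").strip():
            missing.append(role.name)
    return missing
```

Called as `roles_missing_verify(Path("roles"))` from the repository root, an empty result would mean every role directory carries a non-blank VERIFY.md. Whether such a check belongs in the pre-production checklist or in CI is left open here.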

Test-user standard (TODO 2.3)

Test identities live only in the staging Authentik (never production): a dedicated test group / naming prefix; ephemeral per-run credentials (staging is rebuildable, so nothing is persisted and none are stored in vault.yml); reuse-or-create; teardown via staging rebuild or explicit test-group cleanup.
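As an illustration of the naming-prefix, ephemeral-credential, and reuse-or-create rules, here is a minimal in-memory sketch. The prefix `verify-test-`, the `StagingIdentity` type, and both function names are hypothetical, and no Authentik API call is shown; the dictionary stands in for whatever lookup the real skill performs against the staging IdP.

```python
import secrets
from dataclasses import dataclass

# Hypothetical naming prefix; the ADR only requires that one exists.
TEST_USER_PREFIX = "verify-test-"


@dataclass(frozen=True)
class StagingIdentity:
    username: str
    password: str  # held in memory for the run only, never written to vault.yml


def new_identity(service: str, run_id: str) -> StagingIdentity:
    """Create a fresh identity with a random per-run password."""
    username = f"{TEST_USER_PREFIX}{service}-{run_id}"
    return StagingIdentity(username, secrets.token_urlsafe(24))


def reuse_or_create(
    known: dict[str, StagingIdentity], service: str, run_id: str
) -> StagingIdentity:
    """Return the identity already created for this run, or create one."""
    key = f"{service}-{run_id}"
    if key not in known:
        known[key] = new_identity(service, run_id)
    return known[key]
```

Because the prefix is shared by all test identities, an explicit cleanup could select on it, which is one way a test-group cleanup might be scoped.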

Reporting & manual handoff

/verify-service writes docs/testing/reviews/YYYY-MM-DD-<service>.md (+ latest.md), mirroring /review-repo and /capacity-review: pass/fail per VERIFY.md journey, observations, the test-user/env used, a verdict, and a structured manual-test checklist for anything Claude can't do (physical device, paid/external flow, subjective judgment). This checklist is the "instruct me on tests" output. Screenshots are saved to a git-ignored working dir on ubongo rather than committed, since committing them would bloat the repo with PNGs and risk leaking secrets; the report links them.
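The report naming and contents described above can be sketched as follows. This is an assumption-laden illustration: the function names, the exact section labels, and the rule "verdict is PASS only if every journey passed" are not specified by this ADR, and the latest.md copy, observations, and test-user/env fields are omitted.

```python
from datetime import date
from pathlib import Path

REVIEWS_DIR = Path("docs/testing/reviews")


def report_path(service: str, run_date: date) -> Path:
    """Build docs/testing/reviews/YYYY-MM-DD-<service>.md."""
    return REVIEWS_DIR / f"{run_date.isoformat()}-{service}.md"


def render_report(
    service: str,
    run_date: date,
    journeys: dict[str, bool],
    manual_checks: list[str],
) -> str:
    """Render per-journey results, a verdict, and the manual-test checklist."""
    # Assumed rule: an empty run or any failed journey yields FAIL.
    verdict = "PASS" if journeys and all(journeys.values()) else "FAIL"
    lines = [f"{service} verification, {run_date.isoformat()}", ""]
    lines += [f"Verdict: {verdict}", "", "Journeys"]
    lines += [f"- {'pass' if ok else 'fail'}: {name}" for name, ok in journeys.items()]
    lines += ["", "Manual-test checklist"]
    lines += [f"- [ ] {item}" for item in manual_checks]
    return "\n".join(lines) + "\n"
```

Keeping path construction separate from rendering means the same text could be written to both the dated file and latest.md.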

Safety

- Staging-only guard: the skill refuses to run against production (exploratory clicking is destructive); ADR-002-aligned hard stop.
- Confined blast radius: test users only in the staging test group; the run sticks to the target service.
- No secrets leaked: the git-ignored screenshot dir is the safety boundary; avoid capturing credential screens.
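The staging-only hard stop could take roughly the following shape. This is a sketch under assumptions: the exception name, the `require_staging` function, and the idea that the target environment arrives as a plain string such as "staging" are all hypothetical; how the real skill determines the environment is not specified here.

```python
class ProductionTargetError(RuntimeError):
    """Raised when a verification run targets anything other than staging."""


# Allow-list rather than deny-list: an unknown or misspelled environment
# is refused instead of being treated as safe.
ALLOWED_ENVIRONMENTS = frozenset({"staging"})


def require_staging(environment: str) -> None:
    """Hard stop: raise unless the target environment is exactly 'staging'."""
    if environment not in ALLOWED_ENVIRONMENTS:
        raise ProductionTargetError(
            f"refusing to run exploratory verification against {environment!r}; "
            "only staging is allowed"
        )
```

Calling the guard before any browser is launched keeps the refusal independent of everything else the run does.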

Dependencies

- ubongo (ADR-015): runs the browser. Designed, not built.
- playwright Claude Code plugin: enabled when this lands (claude-code-setup.md).
- Authentik (CAPABILITIES §2, planned): central IdP for test users + SSO.
- A staging deploy of the service (ADR-008 Level 2): staging currently consists of empty stubs.
- make new-role scaffolding VERIFY.md: deferred to when that scaffold is next touched.

What was ruled out

| Option | Reason |
| --- | --- |
| Scripted Playwright regression suite | Operator wants exploratory judgment; scripts add maintenance burden. Could be a later layer, not this. |
| Scheduled headless smoke gate | Needs determinism the exploratory nature excludes; belongs to health checks / Uptime Kuma. |
| Verify against production | Exploratory clicking + test-user creation is destructive/polluting; staging sandbox instead. |
| Free-form, no per-service spec | Non-repeatable, can miss a critical flow; `VERIFY.md` gives a backbone. |
| Staging bypasses SSO / per-app users | Wouldn't exercise the real Caddy+Authentik path; central test users are faithful. |
| Commit screenshots to the repo | Repo bloat + secret-leak risk; git-ignored on `ubongo`. |

See also: ADR-008 (testing — expanded), ADR-015 (control host), ADR-002 (security), ADR-004 (VERIFY.md parallels SECURITY.md), ADR-013/014 (heritage / knowledge sourcing).

Consequences

- The harness is confined to staging by a hard stop: it refuses to run against production because exploratory clicking is destructive, the blast radius is bounded to the target service, and test users live only in the staging test group (Safety).
- No secrets leak: the git-ignored screenshot dir is the safety boundary and credential screens are avoided (Safety; Reporting & manual handoff).
- Test identities are ephemeral per-run credentials in the staging Authentik only (never production, none persisted in vault.yml), provisioned on a reuse-or-create basis and torn down via staging rebuild or test-group cleanup (Test-user standard).
- Anything Claude cannot exercise (physical device, paid/external flow, subjective judgment) is handed off via a structured manual-test checklist in the run report (Reporting & manual handoff).
- Authoring is possible now (this ADR, the VERIFY.md template, the /verify-service skill, conventions/checklist edits), but running is deferred until its dependencies exist: ubongo, the playwright plugin, Authentik, a staging deploy, and make new-role scaffolding VERIFY.md (Status; Dependencies).
