sjat/boma


2026-06-14 19:31:40 +02:00


ADR-017 — Service-UI acceptance verification (Level 4)

Status

Accepted (2026-06-05). Designed. The following can be authored now: this ADR, the ADR-008 Level 4 expansion, the VERIFY.md template, the /verify-service skill, the convention/checklist/Further-reading edits, the .gitignore entry and working directory, and the STATUS/TODO updates. Running the harness is deferred until its dependencies land.

Context

ADR-008 defines testing Levels 1–3 (Molecule, staging deploy, external smoke) and a Level 4 stub. None of Levels 1–3 exercises a service's application UI, so none can answer "does PhotoPrism actually let me log in, upload a photo, and see a thumbnail?" (TODO 8.2). The operator's ask (TODO 2.2 headless browsing, TODO 2.3 test users, plus manual-test instruction) is that Claude spins up a browser, sees the service UI, exercises it, generates test users, and instructs the operator on manual tests. Today Claude sees a browser only passively (/screenshot fetches operator-taken shots from mamba); this ADR is the active counterpart.

Decision

Build a Claude-driven exploratory service-UI verification harness (Level 4), invoked as /verify-service <name> on ubongo. Five forks are settled:

1. Claude-driven and exploratory: Claude navigates with judgment rather than following deterministic scripts. A scripted regression suite is explicitly not built here.
2. Interactive, Claude-in-the-loop: exploratory judgment cannot serve as a headless cron gate; scheduled smoke testing needs determinism and belongs to health checks / Uptime Kuma later.
3. Staging, full exercise: Claude creates test users and exercises features (including destructive flows) against a staging deploy; because the sandbox is rebuildable, destructive actions are safe.
4. Test users in Authentik (central IdP), real SSO flow: Claude authenticates through Caddy (ADR-024) and Authentik as a real user would.
5. Per-service VERIFY.md backbone plus free exploration: each service role ships an acceptance spec of critical journeys; Claude executes it and explores beyond it.

VERIFY.md standard

Every service role ships a populated roles/<service>/VERIFY.md, copied from docs/testing/service-verify-template.md, just as SECURITY.md is copied from service-security-template.md. This is a new role convention. The file lists the service's critical user journeys (what "working" means), what good looks like, and what is not browser-verifiable (and therefore goes to manual handoff). It also joins the pre-production gate in docs/security/service-checklist.md.
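The convention could be enforced with a small check. The following is a minimal sketch, not part of the repository: the function name `roles_missing_verify` is hypothetical, and it treats "populated" as "present and not blank", which cannot detect a template that was copied but never filled in.

```python
from pathlib import Path


def roles_missing_verify(roles_dir: Path) -> list[str]:
    """Return the names of roles whose VERIFY.md is absent or blank.

    Assumption: each immediate subdirectory of roles_dir is one role.
    """
    missing = []
    for role in sorted(p for p in roles_dir.iterdir() if p.is_dir()):
        spec = role / "VERIFY.md"
        # A missing file and a whitespace-only file both fail the check.
        if not spec.is_file() or not spec.read_text(encoding="utf-8").strip():
            missing.append(role.name)
    return missing
```

A check like this could sit next to the pre-production gate, for example as `roles_missing_verify(Path("roles"))` in a lint step that fails when the returned list is non-empty.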

Test-user standard (TODO 2.3)

Test identities live only in the staging Authentik, never in production. They are marked by a dedicated test group / naming prefix. Credentials are ephemeral and per-run: staging is rebuildable, so nothing is persisted and none are stored in vault.yml. Each run reuses an existing test user or creates one, and teardown happens via a staging rebuild or an explicit test-group cleanup.
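The reuse-or-create and cleanup rules can be sketched as follows. This is an illustration under stated assumptions: a plain dict stands in for the staging Authentik directory (no Authentik API is called), and the prefix `l4test-`, the `StagingUser` type, and both function names are hypothetical.

```python
import secrets
from dataclasses import dataclass

TEST_PREFIX = "l4test-"  # hypothetical naming prefix for test identities


@dataclass(frozen=True)
class StagingUser:
    username: str
    password: str  # generated per run, held in memory only


def reuse_or_create(directory: dict[str, StagingUser], service: str) -> StagingUser:
    """Return the test user for a service, creating it on first use."""
    username = f"{TEST_PREFIX}{service}"
    user = directory.get(username)
    if user is None:
        # Ephemeral credential: never written to disk or to vault.yml.
        user = StagingUser(username, secrets.token_urlsafe(24))
        directory[username] = user
    return user


def cleanup_test_group(directory: dict[str, StagingUser]) -> None:
    """Remove every identity carrying the test prefix, and nothing else."""
    for name in [n for n in directory if n.startswith(TEST_PREFIX)]:
        del directory[name]
```

Keying cleanup on the prefix keeps the teardown confined to test identities even if other accounts exist in the same directory.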

Reporting & manual handoff

/verify-service writes docs/testing/reviews/YYYY-MM-DD-<service>.md (+ latest.md), mirroring /review-repo and /capacity-review. The report contains pass/fail per VERIFY.md journey, observations, the test user and environment used, a verdict, and a structured manual-test checklist for anything Claude can't do (physical device, paid/external flow, subjective judgment). That checklist is the "instruct me on tests" output. Screenshots are saved to a git-ignored working dir on ubongo, because committing them would bloat the repo with PNGs and risk leaking secrets; the report links them.
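The report naming and contents described above can be sketched as below. The path pattern comes from this ADR; everything else is an assumption made for illustration: that latest.md sits in the same directory, the function names, and the exact report layout.

```python
from datetime import date
from pathlib import Path

REVIEWS_DIR = Path("docs/testing/reviews")


def report_paths(service: str, run_date: date) -> tuple[Path, Path]:
    """Return (dated report path, latest.md path) for one run."""
    dated = REVIEWS_DIR / f"{run_date.isoformat()}-{service}.md"
    # Assumption: latest.md lives alongside the dated reports.
    return dated, REVIEWS_DIR / "latest.md"


def render_report(service: str, journeys: dict[str, bool], manual: list[str]) -> str:
    """Render pass/fail per journey, a verdict, and the manual checklist."""
    # An empty journey list is treated as a failure, not a vacuous pass.
    verdict = "PASS" if journeys and all(journeys.values()) else "FAIL"
    lines = [f"# {service} verification", "", f"Verdict: {verdict}", "", "## Journeys"]
    lines += [f"- {'pass' if ok else 'fail'}: {name}" for name, ok in journeys.items()]
    lines += ["", "## Manual tests"]
    lines += [f"- [ ] {item}" for item in manual]
    return "\n".join(lines) + "\n"
```

For example, `report_paths("photoprism", date(2026, 6, 5))` yields `docs/testing/reviews/2026-06-05-photoprism.md` as the dated path.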

Safety

- Staging-only guard: the skill refuses to run against production because exploratory clicking is destructive; this is an ADR-002-aligned hard stop.
- Confined blast radius: test users exist only in the staging test group, and the run sticks to the target service.
- No secrets leaked: screenshots stay in the git-ignored directory, which is the safety boundary, and the run avoids capturing credential screens.
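The staging-only guard can be sketched as an allowlist check that runs before any browser is started. This is a minimal illustration, not the skill's actual code: the exception and function names are hypothetical, and how the target environment name is resolved is left as an assumption.

```python
class ProductionTargetError(RuntimeError):
    """Raised when a run is requested against anything other than staging."""


def assert_staging(environment: str) -> None:
    """Hard stop: allow only the literal 'staging' environment.

    An allowlist is used rather than a 'not production' check, so an
    unknown or misspelled environment name is refused as well.
    """
    if environment != "staging":
        raise ProductionTargetError(
            f"refusing to run exploratory verification against {environment!r}"
        )
```

Calling `assert_staging("production")` raises `ProductionTargetError`, while `assert_staging("staging")` returns silently and lets the run continue.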

Dependencies

- ubongo (ADR-015): runs the browser. Designed, not built.
- playwright Claude Code plugin: enabled when this lands (claude-code-setup.md).
- Authentik (CAPABILITIES §2, planned): central IdP for test users and SSO.
- A staging deploy of the service (ADR-008 Level 2): staging is currently empty stubs.
- make new-role scaffolding VERIFY.md: deferred to when that scaffold is next touched.

What was ruled out

| Option | Reason |
| --- | --- |
| Scripted Playwright regression suite | Operator wants exploratory judgment; scripts add maintenance burden. Could be a later layer, not this. |
| Scheduled headless smoke gate | Needs determinism the exploratory nature excludes; belongs to health checks / Uptime Kuma. |
| Verify against production | Exploratory clicking + test-user creation is destructive/polluting; staging sandbox instead. |
| Free-form, no per-service spec | Non-repeatable, can miss a critical flow; `VERIFY.md` gives a backbone. |
| Staging bypasses SSO / per-app users | Wouldn't exercise the real Caddy+Authentik path; central test users are faithful. |
| Commit screenshots to the repo | Repo bloat + secret-leak risk; git-ignored on `ubongo`. |

Consequences

- The harness is confined to staging by a hard stop: it refuses to run against production because exploratory clicking is destructive, the blast radius is bounded to the target service, and test users live only in the staging test group (Safety).
- No secrets leak: the git-ignored screenshot dir is the safety boundary and credential screens are avoided (Safety; Reporting & manual handoff).
- Test identities are ephemeral per-run credentials in the staging Authentik only. They never exist in production, none are persisted in vault.yml, and they are provisioned reuse-or-create and torn down via staging rebuild or test-group cleanup (Test-user standard).
- Anything Claude cannot exercise (physical device, paid/external flow, subjective judgment) is handed off via a structured manual-test checklist in the run report (Reporting & manual handoff).
- Authoring is possible now (this ADR, the VERIFY.md template, the /verify-service skill, conventions/checklist edits), but running is deferred on its dependencies: ubongo, the playwright plugin, Authentik, a staging deploy, and make new-role scaffolding VERIFY.md (Status; Dependencies).

Related

ADR-008 (testing, expanded), ADR-015 (control host), ADR-002 (security), ADR-004 (VERIFY.md parallels SECURITY.md), ADR-013/014 (heritage / knowledge sourcing).
