diff --git a/docs/CAPABILITIES.md b/docs/CAPABILITIES.md
index 783e634..a4eda32 100644
--- a/docs/CAPABILITIES.md
+++ b/docs/CAPABILITIES.md
@@ -109,6 +109,7 @@ _(DHCP, firewall, mDNS reflection live on OPNsense — Ansible-managed, not cont
 | Update watcher | DIUN | S | planned | New-image alerts driving the update process | ADR-011 |
 | Scheduled jobs | `scheduled_jobs` role + `claude -p` jobs | S | planned | Declarative cron: `/review-repo`, security/capacity reviews, sanity checks | TODO 8 |
 | Sanity / smoke | whoami + health checks | S | planned | Verification endpoints + "is it actually working" checks | ADR-011 / TODO 8.2 |
+| Service-UI verification | `/verify-service` skill | S | planned | Claude-driven exploratory Level 4 acceptance check of a deployed service's UI | Decided (ADR-017); running deferred on ubongo + playwright + Authentik |
 
 ---
 
diff --git a/docs/TODO.md b/docs/TODO.md
index 644548b..c54226a 100644
--- a/docs/TODO.md
+++ b/docs/TODO.md
@@ -27,7 +27,7 @@
    7. Define a tagging standard that lets us target runs without over-tagging.
    8. Ensure the right things are backed up (incl. database dumps if we land on PBS).
    9. Decide: a central database server, or individual database services per app?
-   10. Should we continue to use the base-container method, or maybe something in the improvements of the methods in boma moods the point?
+   10. Should we keep the custom base-container (Molecule test image) method for role testing, or revisit it as boma's testing approach matures (ADR-008)?
    11. Deliberate tagging strategy.
 
 4. **Split-horizon FQDN** — adopt split-horizon FQDN with or without nyumbani?
diff --git a/docs/decisions/003-toolchain.md b/docs/decisions/003-toolchain.md
index 93cf1b7..f981642 100644
--- a/docs/decisions/003-toolchain.md
+++ b/docs/decisions/003-toolchain.md
@@ -82,7 +82,8 @@ Config files: `.ansible-lint`, `.yamllint` in repo root.
 2. On green → deploy to staging
 3. [manual promote gate] → deploy to production
 
-`act_runner` runs as a Docker container on the control node or a dedicated runner VM.
+`act_runner` runs as a Docker container on `ubongo` (the control node — ADR-015), or on
+a dedicated runner VM later if CI load warrants a separate host.
 
 ---
 
diff --git a/docs/decisions/008-testing.md b/docs/decisions/008-testing.md
index 477ce0c..d0b44c6 100644
--- a/docs/decisions/008-testing.md
+++ b/docs/decisions/008-testing.md
@@ -145,7 +145,7 @@ Level 2 (staging) or Level 3 (external). This is a conscious, documented decisio
 | Capability | Reason not testable in Molecule |
 |---|---|
 | `nftables` rule loading | Requires `nf_tables` kernel module; not available in Docker |
-| WireGuard tunnel establishment | Requires `wireguard` kernel module |
+| NetBird mesh data plane (`wt0` WireGuard interface) | Requires the `wireguard` kernel module; Molecule checks only that the agent is installed/configured (ADR-016) |
 | `unattended-upgrades` behaviour | Installs correctly; actual upgrade behaviour requires a real apt environment |
 | DHCP behaviour (OPNsense) | OPNsense is managed by Ansible but not testable in a container |
 | mDNS reflector (Avahi cross-VLAN) | Requires real network interfaces and VLANs |
diff --git a/docs/decisions/010-forgejo-ci.md b/docs/decisions/010-forgejo-ci.md
index 8a8e15c..836fa74 100644
--- a/docs/decisions/010-forgejo-ci.md
+++ b/docs/decisions/010-forgejo-ci.md
@@ -63,8 +63,8 @@ Trunk-based, matching ADR-003 / ADR-008:
 push to main → lint + Molecule → deploy staging → [manual gate] → deploy production
 ```
 
-Runner: `act_runner` on the control node or a dedicated runner VM. Actions is not
-yet enabled — see STATUS.md.
+Runner: `act_runner` on `ubongo` (the control node — ADR-015), or a dedicated runner VM
+later if CI load warrants a separate host. Actions is not yet enabled — see STATUS.md.
 
 ---
 
diff --git a/docs/decisions/011-update-management.md b/docs/decisions/011-update-management.md
index 6497ab4..c96bbe6 100644
--- a/docs/decisions/011-update-management.md
+++ b/docs/decisions/011-update-management.md
@@ -64,8 +64,8 @@ Because these are primarily Proxmox VMs, take a **VM snapshot before the Friday
 ### 5. Stateful upgrades — 8-weekly analysis, human-gated, backup-first
 
 Stateful services are **never** touched by the weekly run. Instead, **every 8 weeks**
-an automated analysis job (a scheduled `claude -p`, per the `scheduled_jobs` plan and
-ADR-010) does:
+an automated analysis job (a scheduled `claude -p`, per the `scheduled_jobs` design in
+`docs/TODO.md` 8.3, not yet built) does:
 
 1. Read changelogs / breaking-change notes for each pinned stateful image; diff the
    pinned tag against what's available.
@@ -125,7 +125,7 @@ alert-driven.
 | -------------------------------------- | ----------------------------------------------------------------------------- |
 | One uniform policy for all services | Ignores blast radius; stateful data loss ≠ stateless re-pull. |
 | Rolling `latest` for stateful services | Unattended schema/migration changes are how you lose data. |
-| Digest-pinning the stateful tier | Unreadable in diffs; snapshot-before + backups give the immutability instead. |
+| Digest-_only_ pin (no readable tag) for stateful | Unreadable in diffs — the tiered rule pins `tag@digest` (readable tag *and* digest) instead (Decision 2). |
 | Pinning the stateless tier | No durable data to protect; pins just add churn DIUN already covers. |
 | Auto-updating stateful on a timer | Must be human-gated and backup-first; only the _analysis_ is automated. |
 | Updating the whole fleet at once | Simultaneous reboots hide which host/phase actually broke. |