docs(adr): restructure ADRs 016-018 to ADR-023 conformance
Make the existing Status sections parseable (Accepted (date) + the existing designed-not-built note) and add Consequences sections assembled from each ADR's already-stated residual risks, trade-offs and build status. No decision substance changed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
parent 40a428975a
commit 0df24909e3
3 changed files with 63 additions and 3 deletions
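The commit message describes the new Status line as "Accepted (date)" followed by the existing free-text note. A minimal sketch of what parsing that shape could look like is given here; the regular expression, the `Status` tuple and `parse_status` are illustrative assumptions, not the checker ADR-023 actually specifies.

```python
import re
from datetime import date
from typing import NamedTuple, Optional

# Assumed shape: "<State> (<YYYY-MM-DD>)." followed by an optional free-text note.
STATUS_RE = re.compile(
    r"^(?P<state>[A-Z][A-Za-z]*) \((?P<day>\d{4}-\d{2}-\d{2})\)\.?\s*(?P<note>.*)$",
    re.DOTALL,
)


class Status(NamedTuple):
    state: str
    decided: date
    note: str


def parse_status(text: str) -> Optional[Status]:
    """Return the parsed status, or None if the text does not fit the assumed shape."""
    match = STATUS_RE.match(text.strip())
    if match is None:
        return None
    return Status(match["state"], date.fromisoformat(match["day"]), match["note"].strip())


# The post-commit wording parses; the pre-commit wording does not.
print(parse_status("Accepted (2026-06-05). Designed, not built"))
print(parse_status("Designed, not built"))
```

Under this assumption the pre-commit lines return `None` because they open with the free-text note rather than a state and a date, which is the gap the Status edits below close.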
@@ -90,7 +90,7 @@ allocated for it.
 ## Status

-Designed, not built — depends on the unbuilt `base` role and service-role machinery
+Accepted (2026-06-05). Designed, not built — depends on the unbuilt `base` role and service-role machinery
 (STATUS.md). This ADR records the decision and doc reconciliation; role tasks land when
 `base` exists.

@@ -108,3 +108,22 @@ Designed, not built — depends on the unbuilt `base` role and service-role mach
 See also: ADR-007 (network — amended), ADR-015 (control host), ADR-002 (security),
 ADR-011 (version pinning), ADR-004 (one service = one role), ADR-009 (TF↔Ansible
 handoff), ADR-013 (heritage — V4 ran WireGuard; NetBird is translated, not transplanted).
+
+## Consequences
+
+- A new public surface appears on `askari` — management API + dashboard (80/443) +
+Coturn (3478) — mitigated by TLS, embedded-IdP login, source-IP limits where
+practical, `base` hardening and version-pinned NetBird, and recorded as accepted-risk
+R3 (Security).
+- On-LAN SSH never depends on the mesh: `base` allows inbound SSH from `ubongo`'s LAN
+address as a mesh-independent secondary path, so a mesh/coordinator outage never
+blocks on-LAN SSH and Ansible stays off the mesh (Security; Recovery & operations).
+- The mesh survives a homelab outage because the coordinator is off-site on `askari`,
+with its management datastore backed up encrypted off `askari` and peers keeping
+last-known config through a brief coordinator outage (Recovery & operations).
+- Choosing NetBird over plain OPNsense WireGuard, Tailscale, Tailscale+Headscale, an
+on-cluster coordinator, a `ubongo` subnet router, and a standalone IdP gains
+identity/ACL policy, self-hosted sovereignty, no routing SPOF, and a light single
+operator footprint (What was ruled out).
+- Implementation is pending: the role tasks land only once the unbuilt `base` role and
+service-role machinery exist (Status).

@@ -65,7 +65,7 @@ them.
 ## Status

-Designed. **Authorable now:** this ADR, the ADR-008 Level 4 expansion, the `VERIFY.md`
+Accepted (2026-06-05). Designed. **Authorable now:** this ADR, the ADR-008 Level 4 expansion, the `VERIFY.md`
 template, the `/verify-service` skill, the convention/checklist/Further-reading edits,
 `.gitignore`/dir, STATUS/TODO. **Running is deferred** on its dependencies.

@@ -90,3 +90,21 @@ template, the `/verify-service` skill, the convention/checklist/Further-reading

 See also: ADR-008 (testing — expanded), ADR-015 (control host), ADR-002 (security),
 ADR-004 (`VERIFY.md` parallels `SECURITY.md`), ADR-013/014 (heritage / knowledge sourcing).
+
+## Consequences
+
+- The harness is confined to staging by a hard stop: it refuses to run against
+production because exploratory clicking is destructive, the blast radius is bounded to
+the target service, and test users live only in the staging `test` group (Safety).
+- No secrets leak: the git-ignored screenshot dir is the safety boundary and credential
+screens are avoided (Safety; Reporting & manual handoff).
+- Test identities are ephemeral per-run credentials in the staging Authentik only —
+never production, none persisted in `vault.yml` — created reuse-or-create and torn
+down via staging rebuild or `test`-group cleanup (Test-user standard).
+- Anything Claude cannot exercise (physical device, paid/external flow, subjective
+judgment) is handed off via a structured manual-test checklist in the run report
+(Reporting & manual handoff).
+- Authoring is possible now (this ADR, the `VERIFY.md` template, the `/verify-service`
+skill, conventions/checklist edits), but running is deferred on its dependencies:
+`ubongo`, the `playwright` plugin, Authentik, a staging deploy, and `make new-role`
+scaffolding `VERIFY.md` (Status; Dependencies).

@@ -72,7 +72,7 @@ tracked allocation in `docs/hardware/reference.md` (ADR-012).
 ## Status

-Designed. **Authorable now:** this ADR + the ADR-002/CAPABILITIES/ADR-012/
+Accepted (2026-06-06). Designed. **Authorable now:** this ADR + the ADR-002/CAPABILITIES/ADR-012/
 accepted-risks/STATUS/TODO reconciliations. **Deferred on the stack:** Alloy-in-`base`,
 the `loki`/`grafana` service roles, OPNsense syslog config, the push-only credential,
 and the live pipeline.

@@ -97,3 +97,26 @@ the metrics stack (Prometheus / `node_exporter`) for SSD-wearout + log-silence a
 See also: ADR-002 (security baseline — realised here), ADR-016 (mesh / `askari`),
 ADR-007 (OPNsense / `askari`), ADR-012 (hardware/capacity), ADR-004 (service-role
 standard), ADR-011 (health checks — distinct from this).
+
+## Consequences
+
+- Opportunistic track-covering and host-pivot-to-store are defeated because logs leave
+the host in near-real-time and the off-cluster security trail is append-only, so it
+survives full-cluster compromise (Security, integrity & residual risks).
+- Conscious residuals remain: append-only is not cryptographic WORM (root-on-`askari`
+could edit chunks — R4); there is a few-seconds un-shipped window; agent compromise
+can stop future shipping but not alter shipped history; a stolen push credential
+appends noise but cannot delete; and an `askari` outage buffers then flushes on
+reconnect (Security, integrity & residual risks).
+- A host going silent is itself an alert (Security, integrity & residual risks).
+- Only a bounded security subset ships off-site — `auditd`, `authpriv`, `fail2ban`,
+AIDE, Suricata and key container security events tagged `security="true"` — while the
+cluster Loki holds everything, keeping off-site volume small (Data flow & the security
+subset).
+- Disk-wear is a managed parameter: log storage on NVMe/SSD or HDD never SD/USB flash,
+bounded verbosity at source, tuned Loki retention/compaction, and monitored SSD
+wearout/TBW with an alert; log storage is a tracked allocation in
+`docs/hardware/reference.md` (Retention & disk-wear).
+- The decision is authorable now but the live pipeline is deferred on the stack:
+Alloy-in-`base`, the `loki`/`grafana` service roles, OPNsense syslog config, and the
+push-only credential (Status; Dependencies).

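The security-subset consequence above says that only events tagged `security="true"` ship off-site while the cluster Loki keeps everything. A minimal sketch of that selection rule follows, assuming each log record is a flat label dictionary; the `source` label, the sample records and `offsite_subset` are hypothetical and are not the Alloy or Loki configuration the ADR defers.

```python
from typing import Dict, Iterable, List

Record = Dict[str, str]  # assumed representation: one flat label set per log record


def offsite_subset(records: Iterable[Record]) -> List[Record]:
    """Keep only the records carrying the security="true" label."""
    return [r for r in records if r.get("security") == "true"]


# Hypothetical sample: the cluster store would hold all three records,
# while only the tagged ones would be forwarded off-site.
records = [
    {"source": "auditd", "security": "true"},
    {"source": "nginx-access"},
    {"source": "fail2ban", "security": "true"},
]

for record in offsite_subset(records):
    print(record["source"])
```

Selecting on a single label rather than a list of source names would keep the off-site rule in one place: a new source joins the subset by being tagged at collection time, under the assumption that tagging happens before shipping.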