Splits the work into Tranche A (land now: ADR-021, ADR-016/020
reconciliation, ssh-from-control firewall source, ACCESS.md template,
/check-access command, governance + index wiring) and Tranche B
(build-pending on infra: per-service access__* + rendered ACCESS.md,
/check-access running).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Brainstorming spec for ADR-021: operational access as a deployment
deliverable. Two layers (host baseline + per-service), a three-tier
access ladder (mesh SSH -> LAN SSH from ubongo -> console break-glass),
declarative access__* data rendering ACCESS.md and driving a
/check-access verifier. Resolves TODO 3.2 (API access) and 7.2 (host
access); amends ADR-016 (SSH also from ubongo) and ADR-020
(ssh-from-control source).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bare 'nft list ruleset' has no leading flush, so the timer's 'nft -f rollback'
was a no-op on first apply (empty file) and errored ('table exists') on later
applies — the auto-rollback silently did nothing, defeating the askari lockout
safeguard. Prepend 'flush ruleset' so the revert is atomic + self-contained.
Verified the snapshot->lockout->revert round-trip in an isolated netns.
Also fix stale STATUS prose (base is partially built, not absent).
established/related keeps the in-flight session alive across the swap, so the
prior 'next task runs' confirm always passed even if new connections were
bricked — the rollback was theater. reset_connection + wait_for_connection now
force a fresh handshake through the new ruleset; failure aborts the play and the
armed timer reverts. (meta: reset_connection ignores 'when' — benign extra
reconnect on no-op runs; verified idempotent in molecule.)
nft -c rejects iif "wt0" when the interface is absent (container, or any host
before NetBird); iifname matches by name and is robust to wt0 coming/going.
Drop the ansible_host fixture override (the docker connection uses it as the
container name) — molecule covers zone resolution, pytest covers service->IP.
Adds role_tag_problems() to check-tags.py: every role imported in a
play's roles: block must carry its own role name as a tag (extra tags
allowed; templated role names skipped). Wires the check into main() so
make lint catches violations. 6 new unit tests (29 total, all passing).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Task-by-task docs plan: author ADR-018 and reconcile ADR-002, accepted-risks
(R4), CAPABILITIES, ADR-012, STATUS, TODO, CLAUDE.md. Roles/pipeline deferred
on the base + service-role machinery.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
All logs -> on-cluster Loki for troubleshooting/trends; a security-relevant
subset also ships write-only off-site to askari (append-only, tamper-resistant
against full-cluster compromise); skip WORM (accepted-risk R4). Alloy agent in
base; loki/grafana service roles; disk-wear handled as a design parameter.
Basis for ADR-018.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- broken-path-ref: skip template/generated-report paths — a placeholder
(<service>) immediately following the match, a YYYY-MM-DD date token, or a
path under a generated-report reviews/ dir (14 -> 0 on the current tree).
- marker: skip numbered-backlog references (TODO 8.2, TODO-3.1, TODO (2.2,
TODO item 16) which point at the backlog, not code markers (35 -> 2; the
remaining two are literal "TODO:" strings in a plan doc). Real code markers
(TODO:, FIXME, etc.) still caught — verified with a synthetic fixture.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>