Commit graph

160 commits

Author SHA1 Message Date
2dbcac11a0 chore(tooling): scope ansible-lint to ansible content; venv PATH in make test
Kaizen 2026-06-10 fixes:
- ansible-lint pre-commit hook now `always_run: false` + a files filter for
  roles/playbooks/inventories YAML, so docs-/config-only commits skip it and no
  longer need `rbw unlock` (root cause was ansible-lint auto-decrypting the
  group_vars vault, not the syntax-check).
- `make test`/`test-all` prepend $(CURDIR)/.venv/bin to PATH so non-activated
  agent runs find ansible-config/ansible-playbook.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 12:51:30 +02:00
9be4366ac3 feat(backup): backup strategy foundation layer (ADR-022)
Plan 1 of the backup & DR strategy: ADR-022, per-service backup__* contract +
BACKUP.md governance (template + checklist gate + new-role runbook step + dormant
/check-backup), and hardware/CAPABILITIES updates. Docs-only; the backup role and
live restore testing are Plans 2-3.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 11:32:36 +02:00
ed6d5463aa docs(backup): final-review fixes — stateless BACKUP.md, dump-step wording, spec sync
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 11:32:06 +02:00
1e85c11ede docs(backup): update hardware ref (ubongo M70q, add fisi) + CAPABILITIES §9 (ADR-022)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-10 11:25:37 +02:00
5f946ac640 feat(backup): add dormant /check-backup verifier (ADR-022) 2026-06-10 11:22:57 +02:00
01e47d0890 docs(backup): add BACKUP.md step to new-role runbook (ADR-022)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 11:21:56 +02:00
81dac4f28b docs(backup): gate BACKUP.md in service checklist (ADR-022) 2026-06-10 11:20:55 +02:00
f3f80443d0 docs(backup): add BACKUP.md template + backup__* contract (ADR-022)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 11:20:01 +02:00
f5c97d1f36 docs(backup): record ADR-022; wire into CLAUDE.md, STATUS, TODO
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-10 11:19:01 +02:00
da116e1d92 docs(friction): log execution-mode ask (4th occurrence)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 11:06:25 +02:00
2041bd3b70 docs(backup): add foundation-layer implementation plan (ADR-022)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 11:05:17 +02:00
eaffd8d900 docs(backup): add backup & DR strategy design (→ ADR-022)
Data-only restic backups, rebuild-from-code recovery (Model A); central
off-cluster pull node (fisi) with 8TB mirror; 3-2-1 via pCloud (rclone)
+ rotated USB air-gap. Per-service backup__* contract + BACKUP.md as a
hard convention. Two-tier restore testing (ubongo container restore-verify
+ semi-annual staging DR rehearsal). One restic password escrowed to
Vaultwarden + paper (restic + vault passwords) for a non-circular
break-glass. Dead-man's-switch alerting via Uptime Kuma.

Resolves TODO 3.8; grounds ADR-011's backup-first assumption.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 11:00:01 +02:00
032adf1525 docs(friction): log execution-mode recurrence; fix list de-indents
Complete the 2026-06-09 entry (third recurrence of presenting the
execution-mode menu despite the standing subagent-driven preference) and
restore two continuation-line indents a markdown formatter had stripped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 08:54:37 +02:00
f151e99d04 docs(access): correct ADR-021 governance (runbook+gate, not scaffold)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-09 17:52:24 +02:00
13f0d482bd docs(access): wire ADR-021 into CLAUDE.md, STATUS, TODO
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-09 17:48:31 +02:00
649925b303 docs(access): gate ACCESS.md in checklist + new-role runbook (ADR-021)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-09 17:46:51 +02:00
384b94e34b feat(access): add /check-access verifier command (ADR-021, dormant)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-09 17:45:24 +02:00
0c507bbace feat(base): add ssh-from-control management-plane source (ADR-021)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-09 17:43:55 +02:00
46d091e82e docs(access): add ACCESS.md service record template
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-09 17:36:28 +02:00
f8098c2e15 docs(access): reconcile ADR-016/020 with control-node SSH source (ADR-021)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-09 17:34:57 +02:00
0fe9e45f57 docs(access): add ADR-021 operational-access doctrine 2026-06-09 17:33:46 +02:00
cdbd66410a docs(access): implementation plan for ADR-021 operational access
Splits the work into Tranche A (land now: ADR-021, ADR-016/020
reconciliation, ssh-from-control firewall source, ACCESS.md template,
/check-access command, governance + index wiring) and Tranche B
(build-pending on infra: per-service access__* + rendered ACCESS.md,
/check-access running).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-09 17:16:49 +02:00
fd4bbbc977 docs(access): design operational-access doctrine (ADR-021)
Brainstorming spec for ADR-021: operational access as a deployment
deliverable. Two layers (host baseline + per-service), a three-tier
access ladder (mesh SSH -> LAN SSH from ubongo -> console break-glass),
declarative access__* data rendering ACCESS.md and driving a
/check-access verifier. Resolves TODO 3.2 (API access) and 7.2 (host
access); amends ADR-016 (SSH also from ubongo) and ADR-020
(ssh-from-control source).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-09 17:10:54 +02:00
fcfb056591 docs(friction): record host-nftables build gotchas (iif/iifname, molecule ansible_host, venv PATH, apply-path coverage)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 19:16:21 +02:00
402913efb3 fix(base): make rollback snapshot restorable (flush-prefixed)
Bare 'nft list ruleset' has no leading flush, so the timer's 'nft -f rollback'
was a no-op on first apply (empty file) and errored ('table exists') on later
applies — the auto-rollback silently did nothing, defeating the askari lockout
safeguard. Prepend 'flush ruleset' so the revert is atomic + self-contained.
Verified the snapshot->lockout->revert round-trip in an isolated netns.
Also fix stale STATUS prose (base is partially built, not absent).
2026-06-06 19:15:38 +02:00
90683c7912 docs: record base firewall concern built (ADR-020 host layer)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 19:10:27 +02:00
6fb104e934 test(base): molecule verify asserts rendered firewall rules + nft -c
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 19:07:24 +02:00
b006196cc5 fix(base): confirm firewall apply over a FRESH connection
established/related keeps the in-flight session alive across the swap, so the
prior 'next task runs' confirm always passed even if new connections were
bricked — the rollback was theater. reset_connection + wait_for_connection now
force a fresh handshake through the new ruleset; failure aborts the play and the
armed timer reverts. (meta: reset_connection ignores 'when' — benign extra
reconnect on no-op runs; verified idempotent in molecule.)
2026-06-06 19:06:39 +02:00
026a29f609 feat(base): safe nftables apply with systemd-run auto-rollback
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 19:03:58 +02:00
bca74458fb fix(base): iifname for load-time safety; zone-source molecule fixture
nft -c rejects iif "wt0" when the interface is absent (container, or any host
before NetBird); iifname matches by name and is robust to wt0 coming/going.
Drop the ansible_host fixture override (the docker connection uses it as the
container name) — molecule covers zone resolution, pytest covers service->IP.
2026-06-06 19:02:50 +02:00
eeab5ed8de feat(base): render nftables ruleset from catalog (+ molecule fixture)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 18:57:44 +02:00
7dae93e4e1 fix(base): firewall resolver fails fast on empty/malformed sources; cover hosts: + proto default
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 18:56:04 +02:00
4127f8bc6b feat(base): firewall catalog resolver filter plugin + tests
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 18:51:10 +02:00
390cd3b335 feat(base): shared firewall catalog/zones + firewall defaults
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 18:49:40 +02:00
2486e31f7d feat(base): scaffold role + meta/README (firewall concern incoming)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 18:48:35 +02:00
03329d7d25 docs(plan): host nftables firewall implementation plan
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 18:47:48 +02:00
d7fbaca554 docs(spec): host nftables firewall design (ADR-020 build #1)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 18:40:50 +02:00
2ad50e4d5b docs(capabilities): note two-layer firewall model (ADR-020) 2026-06-06 16:00:19 +02:00
a9287427e3 docs(todo): mark 3.5 firewall strategy decided (ADR-020) 2026-06-06 16:00:01 +02:00
e24aab28b2 docs: link ADR-020; harden firewall guardrail to the service catalog 2026-06-06 15:59:47 +02:00
d311f67098 docs(adr): ADR-020 firewall strategy (two-layer + shared catalog) 2026-06-06 15:59:30 +02:00
8d1d8a88ea docs(friction): escalate execution-mode prompt; no plan→impl approval gate
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 15:57:40 +02:00
f700f4a475 docs(plan): firewall strategy ADR-020 landing plan
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 15:42:17 +02:00
2a65391c0e docs(spec): firewall strategy design (TODO 3.5 → ADR-020)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 15:36:24 +02:00
86bb3559ad STATUS: record tag standard + enforcement (ADR-019)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 15:23:58 +02:00
fac438cc92 fix(tags): recognize name: role key; only check roles: in plays
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 15:20:09 +02:00
5aeeb094eb feat(tags): enforce role imports carry their role-name tag
Adds role_tag_problems() to check-tags.py: every role imported in a
play's roles: block must carry its own role name as a tag (extra tags
allowed; templated role names skipped). Wires the check into main() so
make lint catches violations. 6 new unit tests (29 total, all passing).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 15:12:48 +02:00
2e5a1e1e23 fix(tags): exclude molecule scenarios from tag scan; clarify ADR enforcement
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 09:50:14 +02:00
24b5e9361e docs(tags): ADR-019 + CLAUDE.md/TODO/CAPABILITIES (tagging standard)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 09:42:22 +02:00
9584cc2c76 feat(tags): Proxmox VM metadata convention (managed-by=terraform)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 09:39:19 +02:00