Commit graph

207 commits

Author SHA1 Message Date
0030b45bbd docs(adr-016): soften the second stale off-site-backup claim (R8 consistency)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-20 11:42:49 +02:00
a483f4e55c fix: address whole-branch review (anchor pin regexp, ADR-016 backup note, verify comment)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-20 11:41:19 +02:00
c09b7fe6a5 docs(security): accept the single-coordinator mesh SPOF (R8) + ADR-016 availability amendment
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-20 11:34:21 +02:00
0286c78f36 docs(plan): mesh-hardening SPOF — accept + DNS-resilience implementation plan
Two tasks: a base mesh coordinator-FQDN /etc/hosts pin (Molecule TDD) + the accept-and-document docs (R8, ADR-016 availability amendment, STATUS/ROADMAP). Coordinator backup deferred to ADR-022.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-20 10:49:26 +02:00
3ba22d199a docs(spec): mesh-hardening SPOF — accept single-coordinator SPOF + DNS-resilience pin
Sub-project 3 of the mesh-hardening follow-on. Accepts the single off-site coordinator as a documented availability SPOF (R8 + ADR-016 amendment) given the narrow blast radius (LAN/intra-cluster/local traffic unaffected; only remote relayed mesh access breaks). Hardens the one real gap: a base mesh coordinator-FQDN /etc/hosts pin so managed hosts survive a local-DNS hiccup. Coordinator off-site backup explicitly deferred to an ADR-022 kickoff (no throwaway infra).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-20 10:42:19 +02:00
f10fe8bb60 docs(status): mesh-hardening askari redesign applied + live reboot-validated (2026-06-20)
Live cutover complete: base INPUT-only default-deny + wt0-primary SSH + permanent WAN break-glass on askari, netbird_coordinator geo-disabled. A real reboot recovered unattended — firewall persisted, Docker forwarding + public services up, coordinator geo-disabled (no FATAL), mesh + both SSH paths back. ROADMAP sub-project 3 (askari redesign) marked DONE; next = relay-SPOF reduction.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-20 09:22:20 +02:00
d6e80990b2 fix(integration): real wait_for_ip arp-fallback test + document substrate coverage gap
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 22:41:11 +02:00
4933186d31 docs(friction): task-3 integration-gate findings (dnsmasq, nftables, hostname)
Documents three blockers found while developing the askari_inputonly
integration-test profile:

1. inet filter default-deny silently blocks libvirt dnsmasq DHCP: nftables
   multi-table independence means ip filter LIBVIRT_INP accept does NOT
   prevent inet filter drop. Diagnosed via strace; fixed with a drop-in.

2. libvirt leaseshelper PID-file: virPidFileReleasePath unlinks the file after
   every call; nobody cannot recreate in /run/. Fix: suid root C wrapper.

3. cloud-init rejects underscores in local-hostname → skips network-config
   → no DHCP. Fix: sanitize with replace("_", "-") in meta-data hostname.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-19 19:16:45 +02:00
9f0626040b docs(todo): add note on ubongo↔cluster network topology question
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-19 19:15:18 +02:00
61cbcc6c18 docs(friction): re-asked settled defaults (push + subagent-driven) at plan->execute handoff
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 17:11:01 +02:00
6be758bece docs(plan): mesh-hardening redesign — askari implementation plan
Four tasks: netbird_coordinator geolocation disable (TDD via Molecule) -> inventory enablement (INPUT-only firewall + WAN break-glass + manage over wt0) -> an askari_inputonly integration profile (the reboot-safety GREEN gate) -> the operator-gated supervised live cutover + STATUS/ROADMAP update. Tasks 1-3 are autonomously implementable; Task 4 is operator-gated (live off-site host, lockout risk).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 16:32:27 +02:00
a178729587 docs(spec): mesh-hardening redesign — askari wt0-primary + WAN break-glass
Redesign of the backed-out 2026-06-17 askari SSH->wt0 attempt. Mirrors the proven ubongo 2/3 pattern (INPUT-only default-deny, SSH scoped by iifname wt0, no sshd ListenAddress change -> no boot-race) and adds the coordinator-host exception the incident demanded: a permanent non-mesh break-glass (WAN :22 from ubongo's static WAN IP + the Hetzner console), WAN :22 deliberately left open. Folds in the netbird_coordinator geo-DB robustness fix (FRICTION #4) so a transient egress blip can't FATAL the control plane. Harness-GREEN gate before a supervised live cutover.

Operator decision (2026-06-19): do this redesign first, then a separate sub-project to reduce askari's SPOF role.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 16:25:26 +02:00
ef5e049e9b docs(status): mesh-hardening 2/3 — ubongo reboot-validated
After an operator reboot of ubongo, verified live that the INPUT-only default-deny ruleset re-applied on boot (input chain policy drop + the full wt0/ssh-from-control/admin-addr allow-list), the wt0 mesh came back (Management+Signal Connected), and both SSH paths recovered clean. Closes the 'real-host reboot validation pending' item for mesh-hardening 2/3.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 16:25:19 +02:00
fa2c4c6368 docs(status): mesh-hardening 2/3 — ubongo INPUT-only default-deny applied
base firewall applied + live-verified on ubongo (INPUT-only default-deny;
base__firewall_input_only). Records the Docker-nat-flush caveat (needs a restart
docker on a Docker host), the claude self-SSH grant, and reboot-validation-pending.
ROADMAP: sub-project 2 done; remaining = NetBird ACL + askari redesign.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 15:34:20 +02:00
a881185c73 docs(friction): base firewall flush wipes Docker nat (cutover finding)
Applying base's nftables (even INPUT-only/forward-accept) to a Docker host
flushes Docker's ip nat -> container egress breaks until 'systemctl restart
docker'. Found on the ubongo mesh-hardening 2/3 live cutover; the Docker-less
test VM couldn't surface it. Self-heals on reboot (dockerd re-adds nat;
forward=accept doesn't block). Runbook/docker_host follow-ups noted.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 15:16:21 +02:00
180af46879 docs(friction): log the Molecule input_only-accept coverage gap
Final-review finding: the default Molecule scenario only renders the forward
drop (input_only off) branch; the accept branch is covered by the integration
harness only. Tracked for a kaizen decision (2nd scenario vs accept the split).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 10:40:29 +02:00
8d8c86fa39 docs(friction): VM-testing standard + libvirt stale-session gotcha
Two signals from running the ubongo harness gate: (1) the operator wants a
standard pre-authorising isolated VM integration tests on ubongo so the agent
doesn't ask each time; (2) a stale agent session (shell predating the
integration_test libvirt-group grant) carries stale process groups, so the
harness's qemu-img/file writes are denied -> run via 'sg libvirt -c ...';
self-heal idea noted.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 10:32:09 +02:00
66a9a0af08 docs: ubongo admin-addrs add 10.20.10.17 + flag raw-lease follow-up
Allow a second operator workstation (10.20.10.17) onto ubongo's LAN SSH
alongside mamba (10.20.10.50). Both are raw DHCP leases; recorded a FRICTION
open signal to replace them with MAC-pinned OPNsense reservations when
OPNsense-as-code lands (ADR-020 / TODO 3.5).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 09:26:04 +02:00
e14e347047 docs(plan): mesh-hardening 2/3 — ubongo implementation plan
Five tasks: base knobs (input-only forward policy + admin-addr SSH allow,
TDD via Molecule) → enable on the control group → a 'be ubongo' integration
profile (profile-aware verify) → the real-VM harness GREEN gate → the
operator-supervised live cutover (signal-6 order, physical-console break-glass).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 09:26:04 +02:00
24a1d909c9 docs(spec): mesh-hardening 2/3 — ubongo INPUT-only default-deny
Sub-project 2 of the mesh-hardening follow-on (the post-incident roadmap
ordering puts ubongo first). Harden the control node's inbound surface via
base's nftables firewall as INPUT-only default-deny: the forward chain stays
permissive (new base__firewall_input_only knob) so Docker egress + the
libvirt-NAT integration harness keep working, and there is no sshd ListenAddress
change — sidestepping the ip_nonlocal_bind boot-race that sank askari. SSH
allowed from wt0, ssh-from-control (Ansible self), and mamba on the LAN (new
base__firewall_admin_addrs). Harness-validated before an operator-supervised
cutover; the physical console is the permanent break-glass.

Design maps to the four relevant 2026-06-17 incident lessons (FRICTION signals
1/2/3/6).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 09:12:58 +02:00
77a20b8d40 docs(runbook): netbird-client mesh-drop / DNS troubleshooting
Document the 2026-06-18 incident class: a road-warrior laptop losing DNS on a network transition strands NetBird (can't resolve the coordinator FQDN), taking ubongo unreachable until DNS recovers. Adds triage (local DNS vs coordinator), device mitigations (reliable resolvers + hosts-file pin), the non-mesh LAN break-glass to ubongo, and why ubongo is relay-only (deferred mesh-hardening, not a bug) — including the break-glass rule that hardening must preserve.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 22:30:41 +02:00
bc8592616b fix: address final whole-branch review findings
- ADR-023 §4: ADR-015 no-sudo sub-decision now Superseded-by ADR-025 (bidirectional), not just an in-place amendment.
- STATUS: drop the deferred `reset` verb; honest integration_test (molecule not run in this env; applied to ubongo) + verify (forward/DNAT, not wt0); RED->GREEN validated.
- driver: remove unused `import shutil`.
- README: fix the ADR-025 link filename.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 21:52:28 +02:00
d7bd31babb docs(adr/status): integration-testing harness RED→GREEN validated (ADR-025)
The local-VM integration harness RED→GREEN acceptance passed on real hardware
(2026-06-18): a KVM VM on ubongo reproduced the 2026-06-17 nftables/Docker reboot
breakage (RED) and survived with the docker_host container-forward drop-in (GREEN).

ADR-025: Status updated to PASSED; shakedown learnings section added (UEFI boot
required, claude sudo load-bearing); ADR-021 added to Related.
STATUS.md: integration-harness section updated from PENDING to PASSED; ubongo
entry updated to reflect claude NOPASSWD sudo + sjat-ansible NOPASSWD removal;
last-reviewed date updated.
docs/TODO.md: item 2.4 collapsed to one-line pointer per the file's convention.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 21:39:30 +02:00
cc772ff845 docs(adr/security): record claude NOPASSWD sudo model (ADR-015 amend + R7)
The integration-testing shakedown reversed ADR-015's "no local sudo" sub-decision:
the claude AI-worker now has NOPASSWD:ALL sudo on ubongo — without it, virsh,
nft, and journalctl all block during VM diagnosis. Compensating controls:
password-locked account, auditd/Loki attribution, repo-managed revocable drop-in.

ADR-015: dated amendment note in Status + expanded AI-worker identity section.
ADR-021: new §Sudo model (amendment 2026-06-18) — claude=NOPASSWD, sjat=password
required; former sjat NOPASSWD drop-in removed 2026-06-18 (least-privilege cleanup).
accepted-risks.md: R7 added (claude NOPASSWD:ALL on ubongo); last-reviewed updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 21:39:20 +02:00
941141e270 docs(friction): capture 9 signals from the ADR-025 harness shakedown
UEFI-vs-BIOS boot loop, no-sudo diagnosis gap (-> claude sudo decision), qemu
session-vs-system URI, system-qemu home-traversal, directory-inventory phantom
hosts, jinja trim_blocks render trap, empty apt lists on fresh cloud images,
NAT-gateway firewall allow, and the review-vs-hardware coverage lesson.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 16:30:13 +02:00
f51ae1a13d docs(runbook): integration-testing runbook + pre-flight cross-links
- New docs/runbooks/integration-testing.md: when to use (firewall/
  sshd/boot/Docker changes); make test-integration commands; lower-
  level driver sub-commands; cert tier guidance; diagnostics dir;
  VM inspection (virsh console / SSH); safety invariants; resource
  constraints; adding a new profile; self-validating acceptance test.
- docs/runbooks/new-host.md: pre-flight warning before deploying
  lockout-risky changes (firewall/sshd/boot) while break-glass is open
- docs/runbooks/new-role.md: step 13 pre-flight for lockout-risky roles

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 12:59:06 +02:00
4732730515 docs: wire ADR-025 into testing/control-host/risks/status/capacity
- ADR-008: add reboot-survivability gap row + ADR-025 pointer to the
  "not tested in Molecule" table
- ADR-015: reconcile "not a hypervisor" with ephemeral KVM test VMs
  (ADR-025); note ~3 GiB test-VM RAM against the 16 GiB sizing
- accepted-risks: add R6 (le-prod-wildcard PAT + transient TXT records)
- CLAUDE.md: add make test-integration[/-clean] to key-commands;
  add ADR-025 + runbook rows to further-reading
- hardware/reference.md: note one ephemeral KVM test VM on ubongo
- STATUS.md: add integration harness entry (built, lint+pytest clean;
  RED/GREEN acceptance PENDING ubongo live pass); TODO 2.4 stays open

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 12:51:22 +02:00
edcc347a95 docs(adr): ADR-025 local VM integration testing
Accepted decision to implement ADR-008 Level 2/3 on ubongo via
libvirt/KVM directly: throwaway VM overlays, stdlib-only driver,
tiered cert fidelity, three safety invariants. Addresses the
2026-06-17 mesh-hardening incident's reboot-survivability gap.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 12:49:52 +02:00
65533be4d9 docs(plan): implementation plan for local VM integration testing (2.4)
20-task TDD plan: integration_test substrate role, stdlib virsh driver, askari profile, tiered certs, RED->GREEN acceptance, docker_host container-forward fix, ADR-025 + docs. Follows the 2026-06-18 design spec.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 11:56:04 +02:00
02e1eb7449 docs(spec): design local VM integration testing on ubongo (2.4)
Throwaway KVM VMs on ubongo (libvirt, Approach A) that mirror a real host (real Docker, real reboot, real role apply) to catch the reboot/firewall/boot-order class Molecule cannot - the 2026-06-17 mesh-hardening incident. First profile: be askari; tiered certs (internal + le-staging built, le-prod-wildcard on-demand). Concrete build of ADR-008 Level 2/3; to be recorded as ADR-025.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 11:35:51 +02:00
69faaf5e43 docs(todo): local VM integration testing (2.4) + screenshot hand-off (10.8)
From the 2026-06-17 mesh-hardening incident: Molecule can't catch
reboot/firewall-x-Docker/boot-order bugs — build local-VM pre-deploy testing
on ubongo (ADR-008 Level 2/3). And a smooth screenshot hand-off for the agent
during incidents.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 22:27:26 +02:00
958e35e3c3 docs(friction): capture 6 signals from the mesh-hardening 1/3 incident
firewall-breaks-Docker-hosts, ip_nonlocal_bind didn't beat the boot race,
coordinator-host circular bootstrap, NetBird geo-DB FATAL dependency, no
off-site coordinator backup, and reboot-tested-after-removing-break-glass.
For the next /kaizen + the mesh-hardening re-spec.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 22:21:19 +02:00
dfa363cecd docs(plan): mesh-hardening 1/3 — askari SSH onto wt0 implementation plan
5 tasks: base sshd ListenAddress+ip_nonlocal_bind (Molecule), firewall public
zone + askari catalog, inventory wt0 override, TF retire WAN :22, then the live
operator-supervised staged cutover.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 20:25:59 +02:00
292c204752 docs(spec): mesh-hardening 1/3 — move askari SSH onto wt0
Decomposes the M5 mesh-hardening follow-on into 3 independent sub-specs; this
is sub-project 1. Three-layer SSH-on-wt0 (sshd ListenAddress=mesh + nftables
iifname wt0 + retire the Hetzner WAN :22), ip_nonlocal_bind to beat the
post-boot wt0 bind race (fail-closed), live wt0 fact for the listen addr,
staged cutover with the firewall auto-rollback as the safety gate.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 20:15:12 +02:00
e5a8e5d3b9 docs(roadmap): Phase 1 complete — point Next step at mesh-hardening follow-on
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 18:39:08 +02:00
a0762c563e docs(kaizen): bind-mount gotcha + consume 7 signals into the ledger (2026-06-17)
Migrate the single-file-bind-mount/stale-config gotcha (reload-in-place needs a
directory mount; restart-based roles don't) to docs/testing/gotchas.md, and move
all 7 open signals out of FRICTION.md's Open-signals section into the new
2026-06-17 decisions-ledger block: all consumed, 1 PARK (the ubongo
self-management gap, tracked in STATUS), 0 REMOVE. Relax test_load_signals to
accept an empty Open-signals section (the goal state after a kaizen pass).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 17:50:17 +02:00
c1323a3f29 feat(make): registry-login via vaulted Forgejo token (kaizen)
scripts/registry-login.sh reads vault.forgejo.registry_token and pipes it to
docker login --password-stdin (never echoed, never on argv); 'make registry-login'
wires it with the venv binaries. Adds the operator-minted CHANGEME vault stub
(fill via make edit-vault) and a per-machine prereq note in the claude-code-setup
runbook, so 'make caddy-image-push'/'molecule-image-push' become agent-completable
non-interactively. Consumes the 2026-06-15 signal in docs/FRICTION.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 17:50:07 +02:00
5d14efc864 docs: Phase 1 complete — clients enrolled + NetBird client runbook
mamba + work laptop enrolled in the mesh → ubongo reachable from anywhere; the
mobile-access goal is met and Phase 1 (remote access) is complete. Adds
docs/runbooks/netbird-client.md (reusable client-enrollment runbook) + STATUS/
ROADMAP flips + CLAUDE.md reading-table entry.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 17:11:32 +02:00
4c8fb9e03b docs: M5 mesh enrollment — ubongo + askari on the mesh
STATUS: base mesh concern built + applied; ubongo (100.99.146.14) + askari
(100.99.226.39) enrolled, link verified; ubongo agent-management access (sjat key
+ NOPASSWD sudo) recorded. ROADMAP M5: infra done, laptops = operator step,
mesh-hardening split out as the deferred follow-on. FRICTION: docs-only-commit rbw
guard + control-node self-management access gap.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 16:40:02 +02:00
4cfc3cddd5 docs(friction): re-asked operator about push + execution mode (settled)
I re-surfaced two already-settled decisions as questions (push to origin; subagent
vs inline) at the M5 handoff. The existing execution-mode guard only matches the
writing-plans menu's literal text, so free-form prose re-asks slip through. Default:
push as backup and go subagent-driven without asking.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 15:58:26 +02:00
55776fb03c docs(plan): M5 mesh-enrollment implementation plan
8 tasks: build the base 'mesh' concern + tag + vault stub + per-host opt-in
(autonomous), operator handoff for /setup + setup key, gated live enrol of
ubongo + askari, operator laptop enrol, docs. Reachability-only; lockdown deferred.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 15:49:28 +02:00
4142bb15f8 docs(spec): M5 mesh-enrollment design (reachability-only)
base 'mesh' concern enrols NetBird agents on ubongo + askari via a reusable scoped
setup key (vault); laptops enrolled by the operator. Reachability via the default
peer policy; the base nftables default-deny on ubongo + ACL tightening are deferred
to a follow-on. Resolves ROADMAP M5 design; next: writing-plans.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 15:44:13 +02:00
684718f4a5 docs(netbird): M4b done — STATUS/ROADMAP/risks/friction
netbird_coordinator built + applied to askari (first service role, dashboard live).
STATUS: new "real and working" row + askari/coordinator rows updated. ROADMAP: M4b
done, M5 (peer enrol) next, recorded the v0.72.4 combined-container/embedded-Dex/
no-Coturn reality. accepted-risks R3: Coturn -> STUN wording. FRICTION: single-file
bind-mount stale-inode gotcha + check-before-first-deploy artifact.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 07:48:53 +02:00
19e675fa5a docs(friction): log registry-push auth gotcha (no creds in vault)
Building images is fully automatable; pushing to the Forgejo registry needs an
interactive docker login, and registry creds aren't in vault — so an agent can't
complete a push. Captured for the next kaizen review.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 06:58:45 +02:00
b3468b34e4 docs: record Caddy/Gandi DNS-01 as resolved + proven (was M4a deferral)
ADR-024 Status/Consequences, STATUS.md, ROADMAP M4a, and the FRICTION ledger now
record that the DNS-01 path is built and proven, with the root cause of the M4a
failure (version skew: pre-Bearer libdns/gandi sent the deprecated Apikey header;
plus building on a Hetzner IP). Traefik was reconsidered and rejected again — lego's
Gandi provider has the same PAT-vs-Apikey question, so it would not have helped.

Dated review reports and spec/plan snapshots are left as historical records.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 06:57:55 +02:00
293c1f88d8 docs(todo): collapse done items to one-line pointers; open-only convention
TODO had accreted multi-line DECIDED/DONE summaries duplicating the ADRs they
cite. Collapsed every done item to a one-line "~~task~~ -> ADR-NNN" pointer and
added an "open items only" convention note up top. Item numbers are stable
cross-references (ROADMAP/STATUS/ADRs/scripts cite them) so they are PRESERVED,
not renumbered — verified all externally-referenced numbers survive. 176->136 lines.
No new ledger: the record already lives in the ADRs / STATUS.md / FRICTION ledger.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 22:00:53 +02:00
13ae674cc9 chore(kaizen): first /kaizen run — curate 12 friction signals
Dogfood of the new /kaizen command. 11 consumed, 1 kept open.
- SYSTEMATIZE → docs/testing/gotchas.md (apply:{tags} propagation, Molecule
  tag-isolation testing, API/templating render-only gap); CLAUDE.md
  (item['key'] loop convention, TF module required_providers); public_dns
  README (Gandi null-MX workaround).
- CHANGE → extend the Stop hook to also guard the brainstorming spec-review gate
  (verified: blocks the gate, passes meta-discussion).
- SYSTEMATIZE → make new-role scaffolds the access__/backup__ noqa reminder;
  ADR-004 documents the cross-role-naming convention.
- ALREADY-BUILT/ACCEPTED → exec-menu guard verified firing; ADR-023; ADR-024;
  subagent-faithfulness now embodied in the two-stage subagent review.
- KEEP-OPEN → a repo-scan.py check for ADRs that over-claim reconciliation.

Nudge: OVERDUE (13 signals) → ok (1). make lint + 16 friction-scan tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 21:46:23 +02:00
d1e1e38879 feat(kaizen): nudge in /review-repo; STATUS + TODO
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 21:27:23 +02:00
d14639e80a docs(plan): /kaizen command — implementation plan (TODO 11)
7 tasks: friction-scan.py (TDD, --json/--nudge) + tests; kaizen.md command;
/review-repo nudge hookup + STATUS/TODO; dogfood run. Mirrors /review-repo.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 21:09:29 +02:00
1a0e30e278 docs(spec): /kaizen — kaizen-loop command (TODO 11)
Curate-only consume pass over FRICTION.md Open signals: interactive guided
session, add/change/park/remove verdicts (park-with-resurrection-trigger to
protect out-of-phase tooling on a solo project), single source = FRICTION.md,
ledger is the durable record. Mirrors /review-repo (command md + stdlib scanner).
Stage 1 on-demand + stage-2 nudge; headless/cron deferred (TODO 11.3).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 21:05:09 +02:00