boma/docs/superpowers/specs/2026-06-14-kaizen-command-design.md
sjat 1a0e30e278 docs(spec): /kaizen — kaizen-loop command (TODO 11)
Curate-only consume pass over FRICTION.md Open signals: interactive guided
session, add/change/park/remove verdicts (park-with-resurrection-trigger to
protect out-of-phase tooling on a solo project), single source = FRICTION.md,
ledger is the durable record. Mirrors /review-repo (command md + stdlib scanner).
Stage 1 on-demand + stage-2 nudge; headless/cron deferred (TODO 11.3).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 21:05:09 +02:00

/kaizen — kaizen-loop command (design)

Status: Designed, not built. Resolves docs/TODO.md item 11 (Kaizen loop). Date: 2026-06-14.

Context

boma runs a kaizen (continuous-improvement) loop on its own methodology and tooling. The capture half already works: raw signals are appended to docs/FRICTION.md Open signals during work (tags [friction]/[gotcha]/[recurring]/[unused]). The consume half is manual and therefore easy to skip: it means periodically reading those signals, deciding add / change / park / remove, migrating durable knowledge into the right docs, and archiving consumed signals into the decisions ledger. The one kaizen review on record (FRICTION.md, 2026-06-10 ledger block) was done by hand; its own process note says "the /retro tool (TODO 11) still isn't built, so this review was manual."

This spec defines the command that makes the consume pass a repeatable, low-friction ritual. The name /kaizen is chosen over the placeholder /retro: a "retro" is a backward-looking, time-boxed ritual, whereas this is continuous improvement — and FRICTION.md already speaks this language ("kaizen friction log", "Kaizen reviews — decisions ledger"), so the command name reinforces the artifact names.

Decisions (from the 2026-06-14 brainstorm)

  1. Scope: curate-only. /kaizen consumes the FRICTION.md Open signals; it does not auto-harvest new signals. Capture stays manual and continuous. FRICTION.md is the single input source (single source of truth).
  2. Verdict model: add / change / park / remove, with a critical split on the reductive side to protect a solo, phase-shifting project:
    • Knowledge is never removed — it is migrated to the right doc or archived to the ledger. The reductive verdicts act only on active surface (scripts, checks, conventions, plugins), never on understanding.
    • park: out-of-phase but not obsolete; plausibly valuable in a later focus. Moved out of the active surface but recorded in the ledger with (a) where it now lives (git SHA/branch/doc) and (b) an explicit resurrection trigger.
    • remove: reserved for the obsolete (superseded, wrong, never worked, duplicated).
    • Every reductive verdict must classify why unused: obsolete → remove, out-of-phase → park. The default for "not touched lately but not wrong" is park.
    • Reversibility safety net: single operator + everything in git, so even a wrong remove is git revert-able with a ledger breadcrumb; park lowers the cost further.
    • Precedent in-repo: docs/runbooks/claude-code-setup.md already lists "Deferred plugins … with triggers" — park-with-a-trigger, made a first-class kaizen outcome.
  3. Trigger model: on-demand command + a light nudge, staged.
    • Stage 1 (this spec): the on-demand /kaizen command.
    • Stage 2 (this spec, small follow-on): a nudge. friction-scan.py --nudge prints a one-line "loop overdue" reminder, surfaced inside /review-repo's report.
    • Deferred (TODO 11.3): a scheduled headless run (report-only) once the notification (ntfy) + scheduled-job/cron stack exists.
  4. Apply model: interactive guided session. /kaizen proposes verdicts (one or grouped); the operator approves / modifies / rejects each; on approval the command performs the mechanical edit and shows the diff, then commits at close-out. There is no auto-applied "safe class" (unlike /review-repo): every kaizen verdict is a judgment call, so the human is in the loop for each. Report-only behaviour is reserved for the future headless path.
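
The reductive split in decision 2 can be sketched as a small classifier. This is a minimal illustration, not the command's implementation: the reason strings, `reductive_verdict`, and `ParkRecord` are hypothetical names chosen here; the spec only fixes the rule (obsolete → remove, everything else → park, and a park must carry a location plus a resurrection trigger).

```python
from dataclasses import dataclass

# Hypothetical reason labels: the spec names the categories, not these strings.
OBSOLETE_REASONS = frozenset({"superseded", "wrong", "never-worked", "duplicated"})


def reductive_verdict(reason: str) -> str:
    """Map a 'why unused' reason to a verdict; anything not obsolete is parked."""
    return "remove" if reason in OBSOLETE_REASONS else "park"


@dataclass(frozen=True)
class ParkRecord:
    """What a park verdict must carry into the ledger (assumed shape)."""

    location: str  # git SHA / branch / doc where the parked item now lives
    trigger: str   # explicit resurrection trigger

    def __post_init__(self) -> None:
        # Reject a park that is missing either half of the breadcrumb.
        if not self.location.strip() or not self.trigger.strip():
            raise ValueError("park needs both a location and a resurrection trigger")
```

Defaulting to park for any unrecognised reason mirrors the stated default for "not touched lately but not wrong".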

Components

Mirrors the /review-repo shape (.claude/commands/review-repo.md + scripts/repo-scan.py):

  1. scripts/friction-scan.py — stdlib only; parses FRICTION.md Open signals and emits structured data. Two modes:
    • --json (default): the Phase-0 input for /kaizen.
    • --nudge: prints one line and signals "overdue" per the thresholds below.
  2. .claude/commands/kaizen.md — the interactive curation process (session flow below).
  3. tests/test_friction_scan.py — unit tests for the parser (matches the tests/test_repo_scan.py convention).
  4. /review-repo hook-up (stage 2): review-repo.md calls friction-scan.py --nudge and includes the line in its report.
  5. Deferred: the headless/cron path (TODO 11.3).
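
The two scanner modes listed above could be wired up with `argparse` roughly as follows. This is a sketch under assumptions: making the flags mutually exclusive and the `resolve_mode` helper are choices made here for illustration; the spec only states that `--json` is the default and `--nudge` is the alternative.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="friction-scan.py",
        description="Parse FRICTION.md Open signals (stdlib only).",
    )
    # Assumption: the two modes cannot be combined in one invocation.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--json", action="store_true",
                      help="emit structured data for /kaizen (default)")
    mode.add_argument("--nudge", action="store_true",
                      help="print a one-line overdue reminder")
    return parser


def resolve_mode(argv: List[str]) -> str:
    """Return 'nudge' when --nudge is given, otherwise the default 'json'."""
    args = build_parser().parse_args(argv)
    return "nudge" if args.nudge else "json"
```

Calling `resolve_mode([])` falls through to the JSON mode, which matches "--json (default)".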

friction-scan.py output schema

Per Open signal: {tag, first_seen, age_days, recurrence_count, referenced_paths, still_exists, text}.

  • tag — one of friction / gotcha / recurring / unused.
  • first_seen / age_days — parsed from the leading date — [tag] marker.
  • recurrence_count — best-effort from explicit markers in the entry (entries already write "5th occurrence (06-05/06/06/…)"); refined by the human during triage.
  • referenced_paths / still_exists — paths the signal names and whether they still exist on disk (a missing target hints the signal may be already-resolved).
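
A parser for one Open-signal entry, producing the fields above, might look like the sketch below. Several details are assumptions rather than facts about FRICTION.md: the exact entry layout (an optional bullet, an ISO date, a dash, then `[tag]`), treating backtick-quoted tokens containing a slash as paths, defaulting `recurrence_count` to 1 when no "Nth occurrence" marker is present, and representing `still_exists` as a path-to-bool mapping.

```python
import re
from dataclasses import dataclass
from datetime import date
from pathlib import Path
from typing import Dict, List, Optional

TAGS = ("friction", "gotcha", "recurring", "unused")
# Assumed layout: "- 2026-06-05 <dash> [tag] free text" (em dash, en dash or hyphen).
ENTRY_RE = re.compile(
    r"^\s*[-*]?\s*(\d{4}-\d{2}-\d{2})\s*[\u2014\u2013-]+\s*\[("
    + "|".join(TAGS)
    + r")\]\s*(.*)$"
)
OCCURRENCE_RE = re.compile(r"(\d+)(?:st|nd|rd|th) occurrence")
PATH_RE = re.compile(r"`([^`\s]+/[^`\s]*)`")  # assumed: backticked token with a slash


@dataclass
class Signal:
    tag: str
    first_seen: str
    age_days: int
    recurrence_count: int
    referenced_paths: List[str]
    still_exists: Dict[str, bool]
    text: str


def parse_entry(line: str, today: date, repo_root: Path) -> Optional[Signal]:
    """Parse one entry line; return None when the line is not a signal."""
    match = ENTRY_RE.match(line)
    if match is None:
        return None
    first_seen, tag, text = match.groups()
    occurrence = OCCURRENCE_RE.search(text)
    paths = PATH_RE.findall(text)
    return Signal(
        tag=tag,
        first_seen=first_seen,
        age_days=(today - date.fromisoformat(first_seen)).days,
        # Best-effort only; the human refines this during triage.
        recurrence_count=int(occurrence.group(1)) if occurrence else 1,
        referenced_paths=paths,
        still_exists={p: (repo_root / p).exists() for p in paths},
        text=text,
    )
```

Passing `today` and `repo_root` in explicitly keeps the parser deterministic, which makes it straightforward to unit-test.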

--nudge reports overdue when any of the following holds: recurrence_count >= 3 for any signal, open count >= 8, or oldest age_days >= 21. The thresholds are tunable constants, and the self-eval phase revisits them.
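
The overdue rule can be expressed as three independent checks over the scanner output. In this sketch the constant names, the dict-shaped signals, and the wording of the reminder line are hypothetical; only the three threshold values come from the spec. Returning `None` when nothing is overdue is also an assumption.

```python
from typing import Dict, List, Optional

# Threshold values from the spec; the constant names are illustrative.
RECURRENCE_LIMIT = 3
OPEN_COUNT_LIMIT = 8
MAX_AGE_DAYS = 21


def overdue_reasons(signals: List[Dict]) -> List[str]:
    """Return one reason per threshold that is met; an empty list means not overdue."""
    reasons = []
    if any(s["recurrence_count"] >= RECURRENCE_LIMIT for s in signals):
        reasons.append(f"a signal recurred {RECURRENCE_LIMIT}+ times")
    if len(signals) >= OPEN_COUNT_LIMIT:
        reasons.append(f"{len(signals)} open signals")
    if signals and max(s["age_days"] for s in signals) >= MAX_AGE_DAYS:
        reasons.append(f"oldest signal is {max(s['age_days'] for s in signals)} days old")
    return reasons


def nudge_line(signals: List[Dict]) -> Optional[str]:
    """Build the one-line reminder, or None when the loop is not overdue."""
    reasons = overdue_reasons(signals)
    if not reasons:
        return None
    return "kaizen loop overdue: " + "; ".join(reasons)
```

Keeping the reasons separate lets the reminder say why the loop is overdue, which is useful input when the self-eval phase revisits the thresholds.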

The /kaizen session flow

Phase 0 — scan (deterministic). Run friction-scan.py --json. Produces the agenda and the cheap "is this still real?" check (still_exists).

Phase 1 — triage. Order signals by recurrence, then age, then tag. Group signals sharing a root cause (e.g. the execution-mode-menu and brainstorming-gate signals are both "external skill script vs boma convention" — curated together). Present the agenda before editing anything: counts of open / recurring / likely-already-resolved.
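
The triage ordering and agenda counts could be computed as below. The spec gives the sort keys but not their direction, so this sketch assumes highest recurrence first, then oldest first, then tag alphabetically. Counting a signal as "recurring" when `recurrence_count > 1`, and as "likely already resolved" when it names paths and none of them exist, are likewise assumptions; grouping by root cause is left to the human.

```python
from typing import Dict, List


def triage_order(signals: List[Dict]) -> List[Dict]:
    """Sort by recurrence (desc), then age (desc), then tag (asc)."""
    return sorted(
        signals,
        key=lambda s: (-s["recurrence_count"], -s["age_days"], s["tag"]),
    )


def agenda_counts(signals: List[Dict]) -> Dict[str, int]:
    """Counts shown to the operator before any edit is made."""
    return {
        "open": len(signals),
        "recurring": sum(1 for s in signals if s["recurrence_count"] > 1),
        # Names at least one path, and every named path is gone from disk.
        "likely_resolved": sum(
            1
            for s in signals
            if s["still_exists"] and not any(s["still_exists"].values())
        ),
    }
```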

Phase 2 — per-signal curation (interactive). For each signal/group, present: a one-line restatement, the evidence (age/recurrence, still-real), and a proposed verdict:

  • systematize → migrate the durable lesson into its right home (a runbook, an ADR, CLAUDE.md, a new repo-scan.py check, or a hook),
  • change → adjust an existing tool/convention/config rather than document it,
  • park → ledger row with git location + resurrection trigger,
  • remove → obsolete; ledger row with the reason,
  • already-built → the systematization already exists / the fix landed elsewhere; archive,
  • accepted → conscious no-op (revisit-if-recurs); archive,
  • keep-open → still accruing; leave in Open signals (the only verdict with no ledger row).

The ledger verdict vocabulary is therefore SYSTEMATIZE · CHANGE · PARK · REMOVE · ALREADY-BUILT · ACCEPTED (keep-open produces no row). These extend the verdicts the 2026-06-10 ledger block already used (CHANGE, MIGRATE, already built, accepted).
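
The verdict vocabulary lends itself to an enumeration in which keep-open is the one member without a ledger label. This is an illustrative sketch; the `Verdict` class and `ledger_label` helper are names invented here, not part of the spec.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    SYSTEMATIZE = "systematize"
    CHANGE = "change"
    PARK = "park"
    REMOVE = "remove"
    ALREADY_BUILT = "already-built"
    ACCEPTED = "accepted"
    KEEP_OPEN = "keep-open"


def ledger_label(verdict: Verdict) -> Optional[str]:
    """Return the upper-case ledger label, or None for keep-open (no ledger row)."""
    if verdict is Verdict.KEEP_OPEN:
        return None
    return verdict.value.upper()
```

Deriving the label from the session-level value keeps the two spellings (for example already-built and ALREADY-BUILT) from drifting apart.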

The operator approves / modifies / rejects each. On approval, the command performs the mechanical edit (migrate the text into the target doc; move the signal from Open signals into the ledger table; delete/park the file) and shows the diff. park and remove both delete from the active tree — the difference is the ledger row (park records a resurrection trigger). Git history + the ledger row are the park mechanism; there is no parked/ graveyard directory.

Phase 3 — close-out.

  • Write a new dated review block in the ledger (newest-first, same shape as the 2026-06-10 block).
  • Bias-to-remove discipline check — if every verdict this pass was "add", flag that the loop is only accreting.
  • Self-eval (light) — is /kaizen being run often enough (oldest-consumed age); should the nudge thresholds change.
  • make lint if code/docs changed; commit per CLAUDE.md git conventions (the curation is one logical unit — straight to main if small/safe, a branch if sweeping).
  • Print a one-line summary: consumed X · parked Y · removed Z · kept-open W · migrated → <docs>.
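
Two of the close-out steps are mechanical enough to sketch: the accretion check and the one-line summary. The assumptions here are that "add" corresponds to the systematize verdict, that "consumed" counts every verdict except keep-open, and that verdicts arrive as plain strings; the function names and the example doc path in the usage are hypothetical.

```python
from collections import Counter
from typing import List


def accretion_only(verdicts: List[str]) -> bool:
    """True when the pass had verdicts and every one of them only added."""
    return bool(verdicts) and all(v == "systematize" for v in verdicts)


def summary_line(verdicts: List[str], migrated_docs: List[str]) -> str:
    """Format the close-out summary in the shape the spec describes."""
    counts = Counter(verdicts)
    # Assumption: everything except keep-open counts as consumed.
    consumed = sum(n for v, n in counts.items() if v != "keep-open")
    docs = ", ".join(migrated_docs) if migrated_docs else "none"
    return (
        f"consumed {consumed} · parked {counts['park']} · removed {counts['remove']}"
        f" · kept-open {counts['keep-open']} · migrated → {docs}"
    )
```

For example, `summary_line(["systematize", "park", "keep-open"], ["docs/runbooks/example.md"])` yields `consumed 2 · parked 1 · removed 0 · kept-open 1 · migrated → docs/runbooks/example.md`.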

Ledger row format

A new dated block extends the existing ## Kaizen reviews — decisions ledger table:

| column | content |
| --- | --- |
| Signal (first seen) | the signal + first-seen date + recurrence (e.g. "5× 06-05…06-14") |
| Verdict | SYSTEMATIZE · CHANGE · PARK · REMOVE · ALREADY-BUILT · ACCEPTED |
| Resolution / where it lives now | systematize → the doc/guard it migrated to; park → git location + resurrection trigger; remove → why obsolete |
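
Rendering one ledger row from these three columns might look like the following sketch. It assumes the ledger is a Markdown pipe table and that a PARK row states its trigger inside the resolution cell; the `ledger_row` name, its parameters, and the "resurrection trigger:" wording are hypothetical.

```python
from typing import Optional


def ledger_row(signal: str, first_seen: str, verdict: str,
               resolution: str, trigger: Optional[str] = None) -> str:
    """Render one row: Signal (first seen) | Verdict | Resolution."""
    verdict = verdict.upper()
    if verdict == "PARK":
        # A park without a trigger would defeat the point of parking.
        if not trigger:
            raise ValueError("PARK rows need a resurrection trigger")
        resolution = f"{resolution}; resurrection trigger: {trigger}"
    cells = [f"{signal} ({first_seen})", verdict, resolution]
    # Escape pipes so free text cannot break the table layout.
    return "| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |"
```

Refusing to emit a PARK row without a trigger enforces at write time what the verdict model requires.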

Parked rows stay permanently visible in the ledger with their trigger, so a future phase can grep PARK and revive — the explicit answer to "don't drop something we'll come back to."

Out of scope (YAGNI)

  • Headless/cron run — deferred to the notify + cron stack (TODO 11.3).
  • Auto-harvesting new signals — rejected; capture stays manual, and the [unused] tag is how dormant tooling enters the loop.
  • Decision/ADR re-challenge — that is TODO 13 ("Intentions"), a separate future command; /kaizen curates methodology/tooling signals, not service/architecture decisions.
  • Auto tooling-usage inventory — rejected for the same reason as auto-harvest.
  • A separate report artifact (à la docs/reviews/) — the ledger is the durable record; the interactive session is the "report".

Relationship to /review-repo

/review-repo audits repo drift (code/doc staleness, conformance). /kaizen curates methodology/tooling friction. They stay distinct. Future integration (not in this spec): when /review-repo sees a finding recur across runs, it could append a [recurring] signal to FRICTION.md, making /review-repo a producer into the single input source that /kaizen consumes.

Build order

  1. scripts/friction-scan.py (--json) + tests/test_friction_scan.py.
  2. .claude/commands/kaizen.md (the session flow).
  3. First real /kaizen run against the current Open signals (dogfood).
  4. Stage 2: --nudge + /review-repo hook-up.
  5. (Deferred) headless/cron — TODO 11.3.