boma/docs/superpowers/specs/2026-06-14-kaizen-command-design.md

# `/kaizen` — kaizen-loop command (design)

**Status:** Designed, not built. Resolves `docs/TODO.md` item 11 (Kaizen loop).
**Date:** 2026-06-14.

## Context

boma runs a kaizen (continuous-improvement) loop on its own methodology and tooling.
The *capture* half already works: raw signals are appended to `docs/FRICTION.md` →
*Open signals* during work (tags `[friction]`/`[gotcha]`/`[recurring]`/`[unused]`). The
*consume* half — periodically reading those signals, deciding **add / change / park /
remove**, migrating durable knowledge into the right docs, and archiving consumed signals
into the decisions ledger — is **manual** and therefore easy to skip. The one kaizen
review on record (`FRICTION.md`, 2026-06-10 ledger block) was done by hand; its own
process note says "the `/retro` tool (TODO 11) still isn't built, so this review was
manual."

This spec defines the command that makes the consume pass a repeatable, low-friction
ritual. The name **`/kaizen`** is chosen over the placeholder `/retro`: a "retro" is a
backward-looking, time-boxed ritual, whereas this is continuous improvement — and
`FRICTION.md` already speaks this language ("kaizen friction log", "Kaizen reviews —
decisions ledger"), so the command name reinforces the artifact names.

## Decisions (from the 2026-06-14 brainstorm)

1. **Scope: curate-only.** `/kaizen` consumes the `FRICTION.md` *Open signals*; it does
   **not** auto-harvest new signals. Capture stays manual and continuous. `FRICTION.md`
   is the **single input source** (single source of truth).
2. **Verdict model: add / change / park / remove**, with a critical split on the
   reductive side to protect a solo, phase-shifting project:
   - **Knowledge is never removed** — it is *migrated* to the right doc or *archived* to
     the ledger. The reductive verdicts act only on *active surface* (scripts, checks,
     conventions, plugins), never on understanding.
   - **park** — *out-of-phase but not obsolete*; plausibly valuable in a later focus.
     Moved out of the active surface but recorded in the ledger with **(a)** where it now
     lives (git SHA/branch/doc) and **(b)** an explicit **resurrection trigger**.
   - **remove** — reserved for the *obsolete*: superseded, wrong, never worked, duplicated.
   - Every reductive verdict must classify *why unused*: **obsolete → remove**,
     **out-of-phase → park**. The default for "not touched lately but not wrong" is **park**.
   - Reversibility safety net: single operator + everything in git, so even a wrong
     `remove` is `git revert`-able with a ledger breadcrumb; `park` lowers the cost further.
   - Precedent in-repo: `docs/runbooks/claude-code-setup.md` already lists "Deferred
     plugins … with triggers" — park-with-a-trigger, made a first-class kaizen outcome.
3. **Trigger model: on-demand command + a light nudge, staged.**
   - Stage 1 (this spec): the **on-demand** `/kaizen` command.
   - Stage 2 (this spec, small follow-on): a **nudge** — `friction-scan.py --nudge`
     prints a one-line "loop overdue" reminder, surfaced inside `/review-repo`'s report.
   - Deferred (TODO 11.3): a **scheduled headless** run (report-only) once the
     notification (ntfy) + scheduled-job/cron stack exists.
4. **Apply model: interactive guided session.** `/kaizen` proposes verdicts (one or
   grouped); the operator approves / modifies / rejects each; on approval the command
   performs the mechanical edit and shows the diff, then commits at close-out. There is
   no auto-applied "safe class" (unlike `/review-repo`): every kaizen verdict is a
   judgment call, so the human is in the loop for each. Report-only behaviour is reserved
   for the future headless path.

## Components

Mirrors the `/review-repo` shape (`.claude/commands/review-repo.md` + `scripts/repo-scan.py`):

1. **`scripts/friction-scan.py`** — stdlib only; parses `FRICTION.md` *Open signals* and
   emits structured data. Two modes:
   - `--json` (default): the Phase-0 input for `/kaizen`.
   - `--nudge`: prints one line and signals "overdue" per the thresholds below.
2. **`.claude/commands/kaizen.md`** — the interactive curation process (session flow below).
3. **`tests/test_friction_scan.py`** — unit tests for the parser (matches the
   `tests/test_repo_scan.py` convention).
4. **`/review-repo` hook-up (stage 2)** — `review-repo.md` calls `friction-scan.py
   --nudge` and includes the line in its report.
5. **Deferred:** the headless/cron path (TODO 11.3).

### `friction-scan.py` output schema

Per *Open signal*: `{tag, first_seen, age_days, recurrence_count, referenced_paths,
still_exists, text}`.

- `tag` — one of `friction` / `gotcha` / `recurring` / `unused`.
- `first_seen` / `age_days` — parsed from the leading `date — [tag]` marker.
- `recurrence_count` — best-effort from explicit markers in the entry (entries already
  write "5th occurrence (06-05/06/06/…)"); refined by the human during triage.
- `referenced_paths` / `still_exists` — paths the signal names and whether they still
  exist on disk (a missing target hints the signal may be already-resolved).

`--nudge` reports **overdue** when **any** holds: `recurrence_count >= 3` for any signal,
open count `>= 8`, or oldest `age_days >= 21`. Thresholds are constants, tunable, and the
self-eval phase revisits them.

## The `/kaizen` session flow

**Phase 0 — scan (deterministic).** Run `friction-scan.py --json`. Produces the agenda
and the cheap "is this still real?" check (`still_exists`).

**Phase 1 — triage.** Order signals by recurrence, then age, then tag. Group signals
sharing a root cause (e.g. the execution-mode-menu and brainstorming-gate signals are both
"external skill script vs boma convention" — curated together). Present the agenda before
editing anything: counts of open / recurring / likely-already-resolved.

**Phase 2 — per-signal curation (interactive).** For each signal/group, present: a
one-line restatement, the evidence (age/recurrence, still-real), and a proposed
**verdict** —
- **systematize** → migrate the durable lesson into its right home (a runbook, an ADR,
  `CLAUDE.md`, a new `repo-scan.py` check, or a hook),
- **change** → adjust an existing tool/convention/config rather than document it,
- **park** → ledger row with git location + resurrection trigger,
- **remove** → obsolete; ledger row with the reason,
- **already-built** → the systematization already exists / the fix landed elsewhere; archive,
- **accepted** → conscious no-op (revisit-if-recurs); archive,
- **keep-open** → still accruing; leave in *Open signals* (the only verdict with no ledger row).

The ledger verdict vocabulary is therefore `SYSTEMATIZE · CHANGE · PARK · REMOVE ·
ALREADY-BUILT · ACCEPTED` (keep-open produces no row). These extend the verdicts the
2026-06-10 ledger block already used (CHANGE, MIGRATE, already built, accepted).

The operator approves / modifies / rejects each. On approval, the command performs the
mechanical edit (migrate the text into the target doc; **move the signal from *Open
signals* into the ledger table**; delete/park the file) and shows the diff. **park and
remove both delete from the active tree** — the difference is the ledger row (park records
a resurrection trigger). Git history + the ledger row *are* the park mechanism; there is
no `parked/` graveyard directory.

**Phase 3 — close-out.**
- Write a new dated review block in the ledger (newest-first, same shape as the 2026-06-10
  block).
- **Bias-to-remove discipline check** — if every verdict this pass was "add", flag that
  the loop is only accreting.
- **Self-eval** (light) — is `/kaizen` being run often enough (oldest-consumed age); should
  the nudge thresholds change.
- `make lint` if code/docs changed; commit per `CLAUDE.md` git conventions (the curation
  is one logical unit — straight to `main` if small/safe, a branch if sweeping).
- Print a one-line summary: consumed X · parked Y · removed Z · kept-open W · migrated →
  \<docs>.

### Ledger row format

A new dated block extends the existing `## Kaizen reviews — decisions ledger` table:

| column | content |
|---|---|
| Signal (first seen) | the signal + first-seen date + recurrence (e.g. "5× 06-05…06-14") |
| Verdict | `SYSTEMATIZE` · `CHANGE` · `PARK` · `REMOVE` · `ALREADY-BUILT` · `ACCEPTED` |
| Resolution / where it lives now | systematize → the doc/guard it migrated to; **park** → git location + resurrection trigger; remove → why obsolete |

Parked rows stay permanently visible in the ledger with their trigger, so a future phase
can `grep PARK` and revive — the explicit answer to "don't drop something we'll come back to."

## Out of scope (YAGNI)

- **Headless/cron run** — deferred to the notify + cron stack (TODO 11.3).
- **Auto-harvesting new signals** — rejected; capture stays manual, and the `[unused]` tag
  is how dormant tooling enters the loop.
- **Decision/ADR re-challenge** — that is TODO 13 ("Intentions"), a separate future
  command; `/kaizen` curates methodology/tooling signals, not service/architecture decisions.
- **Auto tooling-usage inventory** — rejected for the same reason as auto-harvest.
- **A separate report artifact** (à la `docs/reviews/`) — the ledger *is* the durable
  record; the interactive session is the "report".

## Relationship to `/review-repo`

`/review-repo` audits **repo drift** (code/doc staleness, conformance). `/kaizen` curates
**methodology/tooling** friction. They stay distinct. Future integration (not in this
spec): when `/review-repo` sees a finding recur across runs, it could append a
`[recurring]` signal to `FRICTION.md`, making `/review-repo` a *producer* into the single
input source that `/kaizen` consumes.

## Build order

1. `scripts/friction-scan.py` (`--json`) + `tests/test_friction_scan.py`.
2. `.claude/commands/kaizen.md` (the session flow).
3. First real `/kaizen` run against the current Open signals (dogfood).
4. Stage 2: `--nudge` + `/review-repo` hook-up.
5. (Deferred) headless/cron — TODO 11.3.