Add kaizen friction log and schedule the kaizen-loop setup

docs/FRICTION.md: a running log of friction/gotchas/recurring-fixes/unused tooling,
seeded with this session's real signals — raw material for the periodic kaizen
review. docs/TODO.md: schedule building /retro in ~1 week, and record the Claude-setup
decision. (Also carries your earlier backlog edits.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
sjat 2026-05-30 22:05:40 +02:00
parent 778b581729
commit 11af84938d
2 changed files with 52 additions and 1 deletion

docs/FRICTION.md Normal file

@@ -0,0 +1,37 @@
# FRICTION.md — kaizen friction log
Raw signals for the periodic **kaizen review** (the methodology retrospective; see
`docs/TODO.md`). This is the input that keeps our tooling and conventions sharpening
over time instead of only accreting.
**How to use:** append freely *during* work — don't curate, don't fix here. Capture
friction, surprises, fixes that keep recurring, and tooling that isn't earning its
keep. The kaizen review reads this, then proposes **add / change / remove** (biased
toward *remove*) and records the decisions as ADRs.
**Entry format:** `date — [tag] observation — (optional) → systematization idea`
Tags: `[friction]` recurring annoyance · `[gotcha]` surprising behaviour ·
`[recurring]` keeps coming back, should be systematized · `[unused]` tooling not
earning its keep.
---
## 2026-05-30 — initial seed (from the Claude-Code setup session)
- `[recurring]` Every `git commit` needs `rbw` unlocked (the pre-commit ansible-lint
hook decrypts `vault.yml` for its syntax-check). Mitigated with a 5h lock timeout
and an `rbw unlocked` pre-flight convention. → *Open:* could ansible-lint skip vault
decryption for syntax-check, so committing doesn't need the vault at all?
- `[gotcha]` pre-commit stashes *unstaged* changes before running hooks, so a partial
commit reverted an interdependent file (`ansible.cfg`) and failed. → Commit
interdependent changes together, or stage the config change first.
- `[gotcha]` `make new-role` had never worked on this host: `mkdir {a,b,c}` brace
expansion fails under `/bin/sh` (dash). Fixed with explicit paths. → A real run
catches what static review can't; consider smoke-testing scaffold commands.
- `[gotcha]` `rbw sync` is required after adding a Vaultwarden item before `rbw get`
finds it (stale local cache).
- `[gotcha]` This shell is zsh — unquoted `$VAR` does not word-split, so a variable
holding a file list was passed as a single argument. → Use explicit args/arrays.
- `[friction]` Long sessions: I make a batch of edits but can't commit until you
`rbw unlock`. The 5h timeout + pre-flight check address the symptom; watch whether
it still bites.
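
The `make new-role` gotcha above comes down to brace expansion being a bash/zsh feature that dash does not implement. A minimal sketch of the explicit-path approach follows. The `scaffold_role` function name and the subdirectory names are assumptions for illustration and are not taken from the actual Makefile.

```shell
#!/bin/sh
# Hypothetical scaffold helper; the function name and role layout are
# illustrative assumptions, not the repository's real `make new-role` recipe.
# It avoids `mkdir -p "$role_dir"/{tasks,defaults,...}` because dash passes
# the braces through literally instead of expanding them.
scaffold_role() {
    role_dir="$1"
    # Literal words in a `for` list behave the same in dash, bash and zsh,
    # so no brace expansion or word splitting of a variable is relied on.
    for sub in tasks defaults handlers templates; do
        mkdir -p "$role_dir/$sub" || return 1
    done
}
```

Running such a helper once against a temporary directory is one cheap form of the smoke test suggested above, for example `scaffold_role "$(mktemp -d)/roles/demo"`.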

docs/TODO.md

@@ -3,7 +3,7 @@
- [x] Main readme only says ansible, not terraform. Should probably be included.
- [x] Main readme does not include a description of the name boma, nor the scope (i.e. infrastructure - not laptops)
- [ ] Method to review repo to ensure
- [x] Method to review repo to ensure
- We don't carry around code, comments, notes, etc. that are no longer needed but were perhaps added to fix an issue that has been resolved.
- That all code, structure, comments, notes etc. follow our design decisions.
- That clear intent is documented throughout - and that there are not any overlaps, contradictions etc.
@@ -21,6 +21,8 @@
- What to install on nodes?
- firewalls?
- apps?
- wiring up Loki, Prometheus, Grafana dashboards, Grafana alerts, Uptime Kuma alerts on askari
- tagging strategy - we need a specific standard so that we can target runs, but don't over-tag.
- [ ] Split horizon FQDN - with or without nyumbani
@@ -48,3 +50,15 @@
managed /etc/cron.d file. Open Qs: general role vs control-node-only; prune
undeclared jobs (repo authoritative) vs additive; validate headless email + that
cron's env has the `claude` CLI. The /review-repo fortnightly job is the first entry.
- [ ] Claude setup
- superpowers or other methodologies? → decided: brainstorm for intent, capture as
ADRs (skip plan files); hooks + slash commands + /review-repo for enforcement at scale.
- [ ] Kaizen loop — set up ~2026-06-06 (one week from now)
- Build `/retro`: reads `docs/FRICTION.md` + `/review-repo` recurring findings + a
tooling-usage inventory; proposes add / change / **remove** (biased to remove);
records decisions as ADRs; evaluates itself. Recurrence-triggered + light periodic sweep.
- `docs/FRICTION.md` is live now — keep appending raw signals until the retro consumes them.
- [ ] What is the right order of operations when spinning up from scratch? (OS, DNS, authentik, traefik...?)
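
One input to the `/retro` item above is a count of raw signals per tag in `docs/FRICTION.md`. A rough sketch of that single step follows, assuming entries keep the ``- `[tag]` `` list-item form used in the seed. The `tally_tags` function name is hypothetical, and this is not the planned `/retro` implementation.

```shell
#!/bin/sh
# Hypothetical helper: counts friction-log entries per tag.
# Assumes each entry is a list item that starts with "- `[tag]`".
tally_tags() {
    file="$1"
    for tag in friction gotcha recurring unused; do
        # -F matches the string literally, so the brackets need no escaping.
        # grep -c prints 0 but exits non-zero when nothing matches, hence `|| true`.
        count=$(grep -cF -- "- \`[$tag]\`" "$file" || true)
        printf '%s %s\n' "$tag" "$count"
    done
}
```

Called as `tally_tags docs/FRICTION.md`, it prints one `tag count` line per tag. The "Tags:" legend in the log is not counted because its tags are not preceded by a list marker.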