FRICTION.md — kaizen friction log
Raw signals for the periodic kaizen review (the methodology retrospective; see
docs/TODO.md). This is the input that keeps our tooling and conventions sharpening
over time instead of only accreting.
How to use: append freely during work — don't curate, don't fix here. Capture friction, surprises, fixes that keep recurring, and tooling that isn't earning its keep. The kaizen review reads this, then proposes add / change / remove (biased toward remove) and records the decisions as ADRs.
Entry format: date — [tag] observation — (optional) → systematization idea
Tags: [friction] recurring annoyance · [gotcha] surprising behaviour ·
[recurring] keeps coming back, should be systematized · [unused] tooling not
earning its keep.
2026-05-30 — initial seed (from the Claude-Code setup session)
- [recurring] Every `git commit` needs `rbw` unlocked (the pre-commit ansible-lint hook decrypts `vault.yml` for its syntax-check). Mitigated with a 5h lock timeout and an `rbw unlocked` pre-flight convention. → Open: could ansible-lint skip vault decryption for syntax-check, so committing doesn't need the vault at all?
- [gotcha] pre-commit stashes unstaged changes before running hooks, so a partial commit reverted an interdependent file (`ansible.cfg`) and failed. → Commit interdependent changes together, or stage the config change first.
- [gotcha] `make new-role` had never worked on this host: `mkdir {a,b,c}` brace expansion doesn't happen under `/bin/sh` (dash). Fixed with explicit paths. → A real run catches what static review can't; consider smoke-testing scaffold commands.
- [gotcha] `rbw sync` is required after adding a Vaultwarden item before `rbw get` finds it (stale local cache).
- [gotcha] This shell is zsh: unquoted `$VAR` does not word-split, so a variable holding a file list was passed as a single argument. → Use explicit args/arrays.
- [friction] Long sessions: I make a batch of edits but can't commit until you `rbw unlock`. The 5h timeout + pre-flight check address the symptom; watch whether it still bites.
- [gotcha] Hooks (or any new `.claude/settings.json`) added mid-session don't activate until a Claude Code restart; the settings watcher only tracks settings files that existed at session start. Opening `/hooks` and dismissing did not load them. → Fresh sessions load them normally; restart after adding hooks.
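The brace-expansion and zsh word-splitting gotchas share one portable fix. A minimal POSIX `sh` sketch, assuming a hypothetical `role_dir` layout (this is not the real `make new-role` recipe): loop over explicit subdirectory names, and keep a file list in the positional parameters so each path stays its own argument under dash, bash and zsh alike.

```sh
#!/bin/sh
# Hypothetical sketch, not the real `make new-role` recipe: scaffold a role
# skeleton without brace expansion, which dash (/bin/sh) does not perform.
set -eu

# Default to a throwaway location; the layout below is illustrative only.
role_dir=${1:-"$(mktemp -d)/roles/example"}

# Explicit names instead of: mkdir -p "$role_dir"/{tasks,handlers,defaults,meta}
for sub in tasks handlers defaults meta; do
  mkdir -p "$role_dir/$sub"
done

# Keep the file list in the positional parameters ("$@"), the one array
# POSIX sh has: each path stays a separate argument in sh, bash and zsh,
# unlike one space-joined string stored in a plain variable.
set -- "$role_dir/tasks/main.yml" "$role_dir/handlers/main.yml"
touch "$@"
```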
2026-05-31
- I asked to draft an ADR and got: "No formal status-header convention, but since this is a draft for discussion I'll mark it Proposed so it isn't mistaken for an accepted decision. Here's the draft."
2026-06-01
- [friction] The `finishing-a-development-branch` flow (and generic AI/dev tooling) offers "push and open a Pull Request," but our Forgejo `origin` is trunk-based with no merge-request / approval gate (CLAUDE.md git conventions). That option doesn't apply: the real path is a local fast-forward merge to `main`, then push. → Skills and conventions that assume a GitHub-style PR workflow need a homelab-aware variant; encode that here "finishing a branch" means merge-locally-then-push, not open-a-PR.
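For reference, the merge-locally-then-push path as a sketch. The function name `finish_branch` is hypothetical; `main` and `origin` come from the entry above. `--ff-only` makes git refuse, rather than create a merge commit, when `main` has moved since the branch was cut, which keeps the trunk linear.

```sh
#!/bin/sh
# Hypothetical helper: finish a branch the trunk-based way (no PR).
set -eu

finish_branch() {
  branch=$1
  git switch main
  # Exits non-zero if a fast-forward is impossible (main gained commits
  # since the branch was cut); rebase the branch onto main and retry.
  git merge --ff-only "$branch"
  git push origin main
}
```

Usage, from the repo root: `finish_branch <branch>`.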
2026-06-05
- [recurring] The `writing-plans` skill ends by asking "subagent-driven vs inline execution?"; always answer subagent-driven here. Don't ask; default straight to subagent-driven (fresh subagent per task + review between tasks). → Standing preference; skip the execution-mode prompt.
- [recurring] When a deferred decision later resolves, docs that referenced the deferral go stale and a plan's file-map can miss them (e.g. resolving the mesh-VPN choice left `new-host.md` still saying "mesh VPN (choice deferred)"; the ubongo work similarly left a contradiction in CLAUDE.md). A broadened final grep sweep caught both. → On resolving a deferred decision, grep all canonical docs for the deferral language ("choice deferred", "pending", "TBD", the placeholder's name) and reconcile every hit; don't rely on the plan's file-map alone. Worth a `/review-repo` check for lingering "deferred/pending/TBD" references whose ADR has since resolved.
  - Recurred a 3rd time (same day): ADR-017 resolved the browser-E2E harness but left ADR-015's own "Deferred" list item #2 still reading as open. The ADR-017 plan's sweep missed it (it only checked for its own placeholder language); only a later STATUS pass caught it. Lesson sharpened: the stale reference often lives in the originating ADR's Deferred section, which the resolving ADR's plan won't think to grep. → When an ADR resolves another ADR's deferred item, edit that source ADR's Deferred list in the same change. Three hits now: promote from "worth a check" to building it, a `/review-repo` rule flagging any ADR "Deferred/Open" entry whose subject is named as RESOLVED/DECIDED elsewhere.
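The broadened sweep is easy to keep as a reusable shell function. A minimal sketch: the function name, the `SWEEP_EXTRA` variable and the exact pattern are assumptions, and the paths you pass (e.g. `docs` and `CLAUDE.md`) should be the repo's canonical docs. Capitalised variants are matched so an ADR's "Deferred" heading is hit too. This covers only the grep step, not the cross-check against resolving ADRs that a `/review-repo` rule would add.

```sh
#!/bin/sh
# Hypothetical helper: list every line in the given docs that still uses
# deferral language, so each hit can be reconciled by hand.
set -eu

deferral_sweep() {
  # Usage: deferral_sweep PATH...  (extra placeholder names via SWEEP_EXTRA)
  pattern='[Dd]eferred|[Pp]ending|TBD'
  if [ -n "${SWEEP_EXTRA:-}" ]; then
    pattern="$pattern|$SWEEP_EXTRA"
  fi
  # -r recurse, -n line numbers, -E extended regex. grep exits 1 when
  # nothing matches: treat that as a clean sweep, but keep real errors (2).
  grep -rnE "$pattern" "$@" || [ "$?" -eq 1 ]
}
```

Usage: `SWEEP_EXTRA='<placeholder name>' deferral_sweep docs CLAUDE.md`.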
2026-06-06
- [recurring] Asked the execution-mode question AGAIN ("subagent-driven vs inline — which approach?") at the end of `writing-plans`, despite the 2026-06-05 standing preference and the `always-subagent-driven-execution` memory both saying don't ask. Root cause: the `writing-plans` skill's "Execution Handoff" step scripts the menu, and I followed the skill text over the user's standing override. Second occurrence → escalate from "skip the prompt" to a hard rule: never present the execution-mode menu; finishing a plan means defaulting straight to subagent-driven.
- [friction] Don't pause for approval between writing a plan and implementing it. The user has standing pre-approval to carry straight through plan → implementation. The brainstorming/plan flow already has explicit approval gates (design approval, spec review); adding another "shall I proceed to implement?" gate after the plan is written is redundant friction. → After `writing-plans` finishes, begin subagent-driven implementation directly. The only reason to stop is a genuine blocker or ambiguity, not a routine checkpoint.
Host nftables firewall build (base role)
- [gotcha] `nft -c` rejects `iif "<name>"` when the interface is absent (it resolves to an interface index at load time). The render+syntax-check Molecule step caught `iif "wt0"` failing in the container, and it would fail identically on any real host before NetBird brings up `wt0`. Use `iifname "<name>"` (string match, no existence requirement, survives the interface coming/going) for any interface that may be absent.
- [gotcha] Molecule's `community.docker` connection uses `ansible_host` as the container name (`remote_addr`). Setting `ansible_host` as data in a scenario's `host_vars` (e.g. to give a resolver a fake IP) breaks the connection → `UNREACHABLE`, "Failed to create temporary directory". Don't override `ansible_host` in molecule; feed fixture IPs another way (or keep fixtures to zone sources and unit-test IP resolution).
- [recurring] `make test ROLE=<r>` needs the venv on PATH. Run non-activated (as agents do), molecule dies with `FileNotFoundError: 'ansible-config'`, because it shells out to `ansible-config`/`ansible-playbook` by bare name. Workaround: `PATH="$PWD/.venv/bin:$PATH" .venv/bin/molecule test`. Also the molecule image wasn't in the Forgejo registry (pull → "not found"); had to `make molecule-image` to build it locally. → Consider (a) the Makefile `test` target prepending `.venv/bin` to PATH, and (b) `make molecule-image-push` so a fresh checkout can pull it.
- [gotcha] Apply-only task paths have no Level-1 coverage, so safety bugs hide there. The `nft` auto-rollback snapshot used a bare `nft list ruleset` (no leading `flush ruleset`) → the revert was a silent no-op on first apply and errored on later ones; the whole safety net was dead. Molecule never runs the apply (gated off), so only adversarial review + an isolated-netns round-trip test caught it. → For apply/safety paths molecule can't exercise, validate out-of-band (a throwaway `--privileged` container with its own netns) and treat a final adversarial review as mandatory, not optional.
- [note] The render-and-`nft -c` (no-apply) Molecule approach earned its keep: it caught the `iif`/`iifname` bug deterministically without touching the host kernel. Good pattern to reuse for other config-rendering roles.
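The rollback bug comes down to one missing line. Here is a minimal sketch of a snapshot that restores the old state instead of merely re-adding to the current one. It assumes a hypothetical `snapshot_ruleset` helper and snapshot path, and the role's real task will differ. Replaying a bare `nft list ruleset` dump only adds to whatever is already loaded. By contrast, when a file that starts with `flush ruleset` is loaded with `nft -f`, it replaces the whole ruleset in one transaction.

```sh
#!/bin/sh
# Hypothetical helper, not the role's actual task: capture the live ruleset
# so that `nft -f "$file"` later restores it exactly.
set -eu

snapshot_ruleset() {
  out=$1
  {
    # Without this line, replaying the dump only adds to the loaded ruleset:
    # a silent no-op on first apply, errors on later ones (as seen here).
    echo 'flush ruleset'
    nft list ruleset
  } > "$out"
}

# Restore (as root): nft -f "$snapshot_file"
```

Proving the restore works still takes the isolated-netns round trip described above. A dry-run check does not show that the old state actually comes back.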