docs(todo): collapse done items to one-line pointers; open-only convention

TODO had accreted multi-line DECIDED/DONE summaries duplicating the ADRs they
cite. Collapsed every done item to a one-line "~~task~~ -> ADR-NNN" pointer and
added an "open items only" convention note up top. Item numbers are stable
cross-references (ROADMAP/STATUS/ADRs/scripts cite them) so they are PRESERVED,
not renumbered — verified all externally-referenced numbers survive. 176->136 lines.
No new ledger: the record already lives in the ADRs / STATUS.md / FRICTION ledger.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
sjat 2026-06-14 22:00:53 +02:00
parent 13ae674cc9
commit 293c1f88d8


@@ -2,72 +2,51 @@
> **Build order lives in `docs/ROADMAP.md`** — that sequences this backlog into
> milestones. This file is the decision backlog; the roadmap is the order we build them.
>
> **Open items only.** Item numbers are stable cross-references (cited by ROADMAP,
> STATUS, ADRs, scripts) — **never renumber**. When an item is decided or built, collapse
> it to a one-line pointer in place; the full record lives in its ADR / `STATUS.md` / the
> `FRICTION.md` decisions ledger.
1. **Forgejo CI** — what CI work remains after ADR-010 (which workflows, runner
setup, etc. still need to be built)?
2. **Testing**
1. Choose and configure code-testing tooling (Molecule, etc.).
2. Decide how the AI interprets Molecule output and performs live testing:
API calls, curl pulls of web products, log reviews, and headless browsing.
— Headless browsing DECIDED (ADR-017): the `/verify-service` Level 4 harness.
The API/curl/log-review siblings remain open.
3. ~~Define a standard for generating test users and for instructing the user to
perform relevant manual tests.~~ DECIDED (ADR-017): test users in the staging
Authentik `test` group; manual tests handed off as a checklist in the
`/verify-service` report.
2. Decide how the AI interprets Molecule output and performs live testing — API
calls, curl pulls of web products, log reviews. Headless browsing → ADR-017
(`/verify-service`); the API/curl/log-review siblings remain open.
3. ~~Standard for test users + manual-test instructions.~~ → ADR-017.
3. **Building services**
1. ~~Decide how to manage logs.~~ DECIDED (ADR-018): all logs → on-cluster Loki via
Grafana Alloy (in `base`); a security subset also ships write-only off-site to
`askari` (append-only); Grafana queries both. WORM skipped (accepted-risk R4).
2. ~~Decide how to manage APIs / API access.~~ DECIDED (ADR-021): per-service `access__*`
data declares the admin API (endpoint + `firewall_ref` to the catalog + vault token
ref + health path); rendered into `ACCESS.md` and probed by `/check-access`. Part of
the two-layer operational-access doctrine.
3. ~~Decide how to import or integrate from baobabAnsibleV4.~~ DECIDED (ADR-013):
translate-don't-transplant — V4 is a source only of gotchas + working config
snippets, re-derived on boma's terms; never structure/requirements/values.
1. ~~Decide how to manage logs.~~ → ADR-018.
2. ~~Decide how to manage APIs / API access.~~ → ADR-021.
3. ~~Decide how to import/integrate from baobabAnsibleV4.~~ → ADR-013.
4. Decide what each node runs — base packages plus which apps/services.
5. ~~Decide the firewall strategy (which firewall, ruleset, per-host vs central).~~
DECIDED (ADR-020): two layers — OPNsense (perimeter + inter-VLAN) + host nftables
(default-deny inbound + east-west allowlist, permissive egress). Single source of
truth: a `group_vars` service catalog with symbolic sources; each layer renders
its own slice. Builds deferred to follow-up specs (host nftables in `base`, then
OPNsense-as-code).
6. Wire up the monitoring stack. Logging topology DECIDED (ADR-018): cluster Loki
(all logs) + off-site security subset on `askari` + Grafana on-cluster (not the
whole stack on `askari`). Still to design/build: Prometheus + metric exporters,
Uptime Kuma, and exactly which alerts live where.
7. ~~Define a tagging standard that lets us target runs without over-tagging.~~
DECIDED (ADR-019): two-tier — role-name tags (auto, at play level) + a closed
9-tag concern list (`tests/tags.yml`); union-only targeting; enforced by `make lint`.
8. ~~Ensure the right things are backed up (incl. database dumps if we land on PBS).~~
DECIDED (ADR-022): data-only restic (Model A, no PBS) pulled by an off-cluster
node (`fisi`); per-service `backup__*` + `BACKUP.md`; logical DB dumps; 3-2-1 via
pCloud + rotated USB air-gap. Build: Plans 23.
5. ~~Decide the firewall strategy.~~ → ADR-020 (builds: host nftables in `base` done; OPNsense-as-code pending).
6. Wire up the monitoring stack — Prometheus + metric exporters, Uptime Kuma, and
exactly which alerts live where. (Logging topology → ADR-018.)
7. ~~Define a tagging standard.~~ → ADR-019.
8. ~~Ensure the right things are backed up.~~ → ADR-022 (build: the `backup` role, Plans 23, pending).
9. Decide: a central database server, or individual database services per app?
10. Should we keep the custom base-container (Molecule test image) method for role testing, or revisit it as boma's testing approach matures (ADR-008)?
11. ~~Deliberate tagging strategy.~~ DECIDED (ADR-019) — folded into 3.7.
10. Should we keep the custom base-container (Molecule test image) method for role
testing, or revisit it as boma's testing approach matures (ADR-008)?
11. ~~Deliberate tagging strategy.~~ → ADR-019 (folded into 3.7).
4. ~~**Split-horizon FQDN** — adopt split-horizon FQDN with or without nyumbani?~~
DECIDED (M1): three-tier scheme on `wingu.me`; `nyumbani` dropped; mesh/LAN-only
default. See `docs/decisions/007-network.md` + the M1 spec.
4. ~~**Split-horizon FQDN.**~~ → ADR-007 / M1 (`wingu.me` three-tier; `nyumbani` dropped; mesh/LAN-only default).
5. **Control node**
1. Set up and test the control node while waiting for hardware.
2. Define control-node bootstrapping — a dedicated recipe and playbook?
3. Set up rbw on the control node.
6. **Updating** 2. Decide the update strategy across services & containers vs packages &
builds / GitHub pulls / Flatpaks. 3. Define scheduling of updates and reboots, including post-update testing.
6. **Updating** — 1. Decide the update strategy across services & containers vs packages
& builds / GitHub pulls / Flatpaks. 2. Define scheduling of updates and reboots,
including post-update testing. (Tracked in item 16 / ADR-011.)
7. **Shell setup**
1. Decide what shell setup matters for the AI's work on the control node.
2. ~~Decide what to set up on the hosts, given that direct access will be rare.~~
DECIDED (ADR-021): the host-layer access baseline — SSH on `wt0` + from `ubongo`,
Docker/Compose tooling, Alloy log shipping, and a recorded break-glass console per
host class.
2. ~~Decide what to set up on the hosts (direct access rare).~~ → ADR-021.
8. **Scheduled work**
1. Run `/review-repo` as `claude -p` via cron every two weeks?
@@ -93,41 +72,25 @@
accepted-risk register (`docs/security/accepted-risks.md`). Could pair a
deterministic pre-scan (undeclared open ports, disabled baseline controls,
world-readable secrets, services not behind auth) with a judgement pass.
Open question: standalone, or folded into the kaizen `/retro` (item 11)?
Open question: standalone, or folded into `/kaizen` (item 11)?
9. Should we build a basic function so that tools (and the AI) can send messages to the user via email, Matrix, or ntfy?
10. **Claude setup** — DECIDED: brainstorm for intent, capture as ADRs (skip plan
files); hooks + slash commands + `/review-repo` for enforcement at scale. Any
remaining setup to carry out from this decision?
1. ~~Policy for how we collaborate with references to baobabAnsibleV4 without misusing it.~~ DECIDED — ADR-013.
2. Policy for how we write key documents like ADRs.
10. **Claude setup** — DECIDED: brainstorm for intent → ADRs; hooks + slash commands +
`/review-repo` for enforcement at scale. Remaining:
1. ~~V4 collaboration policy.~~ → ADR-013.
2. ~~Policy for how we write key documents like ADRs.~~ → ADR-023.
3. Further develop how we collaborate on designing the project's foundation, separate from how we implement new containers etc.
4. ~~How do we make sure agents always use the latest official documentation for the technologies etc. we use?~~ DECIDED — ADR-014 (facts → version-matched docs, cited + stamped; best practices → translated per ADR-013; risk-based triggers; graceful fallback to WebFetch).
5. Always subagent driven?
4. ~~Always-latest official documentation for our tech.~~ → ADR-014.
5. ~~Always subagent-driven?~~ → DECIDED: yes (standing agreement; enforced by `.claude/hooks/guard-execution-mode-menu.sh`).
6. When the AI deploys (i.e. runs playbooks etc.), should we define a method so that it does not have to poll constantly or review all the output? Perhaps the MAKE method could surface only the relevant feedback?
7. ~~Reproducible agent toolchain (surfaced by ADR-014).~~ DONE — repo
`.claude/settings.json` declares `extraKnownMarketplaces` + `enabledPlugins`
(active set: superpowers · context7 · terraform · claude-md-management) and a
conservative permissions allowlist; bootstrap procedure in
`docs/runbooks/claude-code-setup.md`. Deferred plugins listed there with
triggers. (Plugin install is still a per-machine `/plugin` action — no native
auto-install.)
7. ~~Reproducible agent toolchain.~~ → `.claude/settings.json` + `docs/runbooks/claude-code-setup.md`.
11. **Kaizen loop** — set up ~2026-06-06 (one week from now).
1. ~~Build `/retro`~~ **DONE — built as `/kaizen`** (`scripts/friction-scan.py` +
`.claude/commands/kaizen.md`; spec `docs/superpowers/specs/2026-06-14-kaizen-command-design.md`).
Scope narrowed to **curate-only** per the 2026-06-14 spec (no auto-harvest, no
tooling-usage inventory; decision re-challenge is TODO 13, not this). Verdicts are
add / change / **park** / remove (park-with-resurrection-trigger). The `--nudge`
(recurrence/age/backlog) surfaces in `/review-repo`; headless/cron is 11.3.
2. Keep appending raw signals to `docs/FRICTION.md` (live now) until the
retro consumes them.
3. **Automation deferred (revisit when the notify + cron stack is up):** the
first build is an **on-demand** command plus a light recurrence/age **nudge**
(printed reminder when the loop is overdue). Wiring a **scheduled headless
run** — report-only: it proposes add/change/**park**/remove and notifies, but
does not auto-curate/commit — waits until the notification (ntfy) +
scheduled-job stack exists. Look into automating it then.
11. **Kaizen loop** → `/kaizen` built (STATUS).
1. ~~Build the loop command.~~ → `/kaizen` (`scripts/friction-scan.py` + `.claude/commands/kaizen.md`; spec `docs/superpowers/specs/2026-06-14-kaizen-command-design.md`).
2. Keep appending raw signals to `docs/FRICTION.md` (ongoing practice; see FRICTION.md).
3. **Automation deferred** (revisit when the notify + cron stack is up): wire a
**scheduled headless** run — report-only (proposes verdicts + notifies, does not
auto-curate/commit). The on-demand command + recurrence/age nudge ship now.
12. **Spin-up / build order** — what is the right order of operations when spinning up
from scratch (OS, DNS, Authentik, Caddy, …)?
@@ -135,11 +98,11 @@
13. **Intentions** - Is the current setup clearly identifying intentions throughout? We have the README files, but is that enough? Also, how do we re-challenge decisions and how they interact over time? E.g. we have two services running, but extending one a little could make the other redundant, so we could remove it. Or an alternative to one of these services has emerged, and it is actually better.
14. **Script dependencies policy** — utility scripts (`tf_to_inventory.py`,
`repo-scan.py`, `capacity-scan.py`) are stdlib-only by convention, for
run-anywhere portability (control node, CI, bare clone, no venv). Reevaluate
whether selectively allowing libraries (e.g. PyYAML — already present via
Ansible) is a better fit in general: weigh the parsing-correctness win
against losing zero-setup portability. Decide a clear rule and record it.
`repo-scan.py`, `capacity-scan.py`, `friction-scan.py`) are stdlib-only by
convention, for run-anywhere portability (control node, CI, bare clone, no venv).
Reevaluate whether selectively allowing libraries (e.g. PyYAML — already present via
Ansible) is a better fit in general: weigh the parsing-correctness win against losing
zero-setup portability. Decide a clear rule and record it.
15. **Security hardening implementation** — build out the ADR-002 hardening standard.
1. Implement the CIS Debian Benchmark **Level 1 + Level 2** in the `base` role
@@ -170,7 +133,4 @@
Friday timing enough at this scale?
6. Notification/control channel — boma's own ntfy topics (ADR-013) + a "skip this
week" / "pause" switch (ties to TODO 9).
7. ~~Reconcile pinning conflict (tags vs digests).~~ DECIDED: tiered rule —
**stateful `tag@digest`** (readable tag + integrity digest), **stateless
rolling tags**. Aligned across ADR-011 (dec. 2), ADR-004, ADR-002 supply-chain
row + accepted-risk R1, the service checklist, and 15.6.
7. ~~Reconcile pinning conflict (tags vs digests).~~ → DECIDED: tiered (stateful `tag@digest`, stateless rolling); ADR-011 dec. 2 / ADR-004 / ADR-002.
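
The commit message states that every externally referenced item number was verified to survive the collapse. A minimal sketch of how such a check might look, kept stdlib-only in the spirit of item 14. The function names `item_numbers` and `missing_numbers` are hypothetical, and the sketch assumes top-level items start at column 0 while sub-items are indented; it is not the check actually used for this commit.

```python
import re

# Matches a numbered list item such as "3. " or "   7. " and captures
# the leading indentation and the item number.
ITEM_RE = re.compile(r"^(\s*)(\d+)\.\s")


def item_numbers(markdown: str) -> set[str]:
    """Collect dotted item numbers ("3", "3.7") from a two-level numbered list.

    Assumption: top-level items are unindented and sub-items are indented.
    """
    found: set[str] = set()
    parent = None
    for line in markdown.splitlines():
        match = ITEM_RE.match(line)
        if not match:
            continue  # prose, blockquotes, continuation lines
        indent, number = match.group(1), match.group(2)
        if indent == "":
            parent = number
            found.add(number)
        elif parent is not None:
            found.add(f"{parent}.{number}")
    return found


def missing_numbers(after: str, cited: set[str]) -> set[str]:
    """Return the cited item numbers that no longer exist in the edited file."""
    return cited - item_numbers(after)


if __name__ == "__main__":
    edited = "1. A\n2. B\n   2. kept sub-item\n3. C\n"
    # Hypothetical citations gathered from ROADMAP/STATUS/ADRs/scripts.
    print(sorted(missing_numbers(edited, {"2.1", "2.2", "3"})))
```

An empty result would mean every cited number still resolves; anything returned is a cross-reference that the edit broke.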