ToDo
Build order lives in docs/ROADMAP.md, which sequences this backlog into milestones. This file is the decision backlog; the roadmap is the order we build them.
-
Forgejo CI — what CI work remains after ADR-010 (which workflows, runner setup, etc. still need to be built)?
-
Testing
- Choose and configure code-testing tooling (Molecule, etc.).
- Decide how the AI interprets Molecule output and performs live testing: API calls, curl pulls of web products, log reviews, and headless browsing. Headless browsing DECIDED (ADR-017): the /verify-service Level 4 harness. The API/curl/log-review siblings remain open.
- Define a standard for generating test users and for instructing the user to perform relevant manual tests. DECIDED (ADR-017): test users in the staging Authentik test group; manual tests handed off as a checklist in the /verify-service report.
-
Building services
- Decide how to manage logs. DECIDED (ADR-018): all logs → on-cluster Loki via Grafana Alloy (in base); a security subset also ships write-only off-site to askari (append-only); Grafana queries both. WORM skipped (accepted-risk R4).
- Decide how to manage APIs / API access. DECIDED (ADR-021): per-service access__* data declares the admin API (endpoint + firewall_ref to the catalog + vault token ref + health path); rendered into ACCESS.md and probed by /check-access. Part of the two-layer operational-access doctrine.
- Decide how to import or integrate from baobabAnsibleV4. DECIDED (ADR-013): translate-don't-transplant. V4 is a source only of gotchas + working config snippets, re-derived on boma's terms; never structure/requirements/values.
- Decide what each node runs: base packages plus which apps/services.
- Decide the firewall strategy (which firewall, ruleset, per-host vs central). DECIDED (ADR-020): two layers, OPNsense (perimeter + inter-VLAN) + host nftables (default-deny inbound + east-west allowlist, permissive egress). Single source of truth: a group_vars service catalog with symbolic sources; each layer renders its own slice. Builds deferred to follow-up specs (host nftables in base, then OPNsense-as-code).
- Wire up the monitoring stack. Logging topology DECIDED (ADR-018): cluster Loki (all logs) + off-site security subset on askari + Grafana on-cluster (not the whole stack on askari). Still to design/build: Prometheus + metric exporters, Uptime Kuma, and exactly which alerts live where.
- Define a tagging standard that lets us target runs without over-tagging. DECIDED (ADR-019): two-tier, role-name tags (auto, at play level) + a closed 9-tag concern list (tests/tags.yml); union-only targeting; enforced by make lint.
- Ensure the right things are backed up (incl. database dumps if we land on PBS). DECIDED (ADR-022): data-only restic (Model A, no PBS) pulled by an off-cluster node (fisi); per-service backup__* + BACKUP.md; logical DB dumps; 3-2-1 via pCloud + rotated USB air-gap. Build: Plans 2–3.
- Decide: a central database server, or individual database services per app?
- Should we keep the custom base-container (Molecule test image) method for role testing, or revisit it as boma's testing approach matures (ADR-008)?
- Deliberate tagging strategy. DECIDED (ADR-019): folded into 3.7.
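The two-tier tagging rule from ADR-019 lends itself to a small lint check. The following is a minimal sketch, not the actual `make lint` implementation. It assumes tags have already been extracted into plain lists. `CONCERN_TAGS`, `unknown_tags`, and `selected` are hypothetical names, the tag values are placeholders (the real closed list lives in tests/tags.yml), and reading "union-only targeting" as "a task runs if it carries any requested tag" is an assumption.

```python
# Placeholder values only: the real closed concern list lives in tests/tags.yml.
CONCERN_TAGS = frozenset({"install", "config", "firewall"})


def unknown_tags(task_tags, role_names, concern_tags=CONCERN_TAGS):
    """Return tags that are neither a role name nor a declared concern tag."""
    allowed = set(role_names) | set(concern_tags)
    return sorted(set(task_tags) - allowed)


def selected(task_tags, requested):
    """Union-only targeting (assumed): run if any requested tag is present."""
    return bool(set(task_tags) & set(requested))
```

A lint step along these lines would fail the build whenever `unknown_tags` returns a non-empty list, which is what keeps the concern list closed.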
-
Split-horizon FQDN: adopt split-horizon FQDN with or without nyumbani? DECIDED (M1): three-tier scheme on wingu.me; nyumbani dropped; mesh/LAN-only default. See docs/decisions/007-network.md + the M1 spec.
-
Control node
- Set up and test the control node while waiting for hardware.
- Define control-node bootstrapping — a dedicated recipe and playbook?
- Set up rbw on the control node.
-
Updating
- Decide the update strategy across services & containers vs packages & builds / GitHub pulls / Flatpaks.
- Define scheduling of updates and reboots, including post-update testing.
-
Shell setup
- Decide what shell setup matters for the AI's work on the control node.
- Decide what to set up on the hosts, given that direct access will be rare. DECIDED (ADR-021): the host-layer access baseline, i.e. SSH on wt0 + from ubongo, Docker/Compose tooling, Alloy log shipping, and a recorded break-glass console per host class.
-
Scheduled work
- Run /review-repo as claude -p via cron every two weeks?
- Build sanity checks (e.g. does PhotoPrism have its pictures? are email services receiving and sending?).
- Design a declarative scheduled_jobs role so the repo owns which cron jobs run on a host, enforced by Ansible. Sketch (deferred until we have hosts): reads a scheduled_jobs__jobs list from group_vars/host_vars, rendered via a managed /etc/cron.d file. Open questions:
  - General role vs control-node-only?
  - Prune undeclared jobs (repo authoritative) vs additive?
  - Validate headless email and that cron's env has the claude CLI.
  - (The fortnightly /review-repo job is the first entry.)
- Schedule /capacity-review to run periodically (on-demand only for now). Revisit once the physical cluster + a live usage-stats hook exist, so it reasons on real usage rather than declared intent alone. Decide the usage source first: Proxmox RRD (built-in, no extra infra) vs the Prometheus/Loki/Grafana/Grafana-Alloy stack we will likely set up anyway (richer, per-process, but more to run); see TODO 3.6. Don't build the Proxmox-RRD hook before settling this, to avoid throwaway work.
- Build a /security-review skill (sibling to /review-repo): re-check the security posture against ADR-002, surface drift, and re-challenge the accepted-risk register (docs/security/accepted-risks.md). Could pair a deterministic pre-scan (undeclared open ports, disabled baseline controls, world-readable secrets, services not behind auth) with a judgement pass. Open question: standalone, or folded into the kaizen /retro (item 11)?
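To make the scheduled_jobs sketch concrete, here is one way a scheduled_jobs__jobs list could be rendered into the text of a managed /etc/cron.d file. This is an illustration under assumptions, not the role's design. The job fields (`name`, `schedule`, `user`, `command`) and the function name `render_cron_d` are hypothetical, and in the real role this would presumably be an Ansible template rather than Python.

```python
def render_cron_d(jobs):
    """Render job dicts into the text of one managed /etc/cron.d file."""
    lines = ["# Managed by Ansible: local edits will be overwritten."]
    for job in jobs:
        # Unlike a user crontab, /etc/cron.d entries carry a user field
        # between the five schedule fields and the command.
        lines.append(f"# {job['name']}")
        lines.append(
            f"{job['schedule']} {job.get('user', 'root')} {job['command']}"
        )
    return "\n".join(lines) + "\n"


# Hypothetical first entry. Cron has no native "every two weeks", so the
# 1st and 15th of the month stand in for a fortnightly cadence here.
example_jobs = [
    {
        "name": "review-repo",
        "schedule": "0 6 1,15 * *",
        "command": "claude -p /review-repo",
    }
]
```

Rendering every declared job into a single managed file would make the "repo authoritative" option straightforward, since anything undeclared simply disappears on the next run. The additive option would need one file per job or a merge step.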
-
Should we build a basic function so that tools (and the AI) can send messages to the user via email, Matrix, or ntfy?
-
Claude setup. DECIDED: brainstorm for intent, capture as ADRs (skip plan files); hooks + slash commands + /review-repo for enforcement at scale. Any remaining setup to carry out from this decision?
- Policy for how we collaborate with references to baobabAnsibleV4 without misusing it. DECIDED: ADR-013.
- Policy for how we write key documents like ADRs.
- Further development on how we collaborate on designing the foundation for the project, separate from how we implement new containers etc.
- How do we make sure agents always use the latest official documentation for the technologies etc. we use? DECIDED: ADR-014 (facts → version-matched docs, cited + stamped; best practices → translated per ADR-013; risk-based triggers; graceful fallback to WebFetch).
- Always subagent driven?
- When the AI deploys (i.e. runs playbooks etc.), should we define a methodology so that it does not have to poll all the time or review all the output? Perhaps something about the MAKE method could provide only the relevant feedback.
- Reproducible agent toolchain (surfaced by ADR-014). DONE: repo .claude/settings.json declares extraKnownMarketplaces + enabledPlugins (active set: superpowers · context7 · terraform · claude-md-management) and a conservative permissions allowlist; bootstrap procedure in docs/runbooks/claude-code-setup.md. Deferred plugins listed there with triggers. (Plugin install is still a per-machine /plugin action; there is no native auto-install.)
-
Kaizen loop — set up ~2026-06-06 (one week from now).
- Build: DONE, built as /retro/kaizen (scripts/friction-scan.py + .claude/commands/kaizen.md; spec docs/superpowers/specs/2026-06-14-kaizen-command-design.md). Scope narrowed to curate-only per the 2026-06-14 spec (no auto-harvest, no tooling-usage inventory; decision re-challenge is TODO 13, not this). Verdicts are add / change / park / remove (park-with-resurrection-trigger). The --nudge (recurrence/age/backlog) surfaces in /review-repo; headless/cron is 11.3.
- Keep appending raw signals to docs/FRICTION.md (live now) until the retro consumes them.
- Automation deferred (revisit when the notify + cron stack is up): the first build is an on-demand command plus a light recurrence/age nudge (printed reminder when the loop is overdue). Wiring a scheduled headless run (report-only: it proposes add/change/park/remove and notifies, but does not auto-curate/commit) waits until the notification (ntfy) + scheduled-job stack exists. Look into automating it then.
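The recurrence/age nudge described above can be as small as a date comparison. A minimal sketch, assuming the date of the last retro is known. The function name `kaizen_nudge` and the 14-day default interval are hypothetical, and this is not the logic in scripts/friction-scan.py.

```python
from datetime import date


def kaizen_nudge(last_run, today, interval_days=14):
    """Return a reminder string when the loop is overdue, else None."""
    overdue = (today - last_run).days - interval_days
    if overdue > 0:
        return f"kaizen overdue by {overdue} day(s)"
    return None
```

Returning `None` when nothing is due keeps the nudge silent by default, so a caller such as /review-repo only prints when there is something to say.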
-
Spin-up / build order — what is the right order of operations when spinning up from scratch (OS, DNS, Authentik, Caddy, …)?
-
Intentions: is the current setup clearly identifying intentions throughout? We have the README files, but is that enough? Also, how do we re-challenge decisions and how they interact over time? For example, we have two services running, but extending one a little could make the other redundant, so we could remove it. Or an alternative to one of these services has emerged, and it is actually better.
-
Script dependencies policy: utility scripts (tf_to_inventory.py, repo-scan.py, capacity-scan.py) are stdlib-only by convention, for run-anywhere portability (control node, CI, bare clone, no venv). Reevaluate whether selectively allowing libraries (e.g. PyYAML, already present via Ansible) is a better fit in general: weigh the parsing-correctness win against losing zero-setup portability. Decide a clear rule and record it.
-
Security hardening implementation — build out the ADR-002 hardening standard.
- Implement the CIS Debian Benchmark Level 1 + Level 2 in the base role (local tasks; CIS / dev-sec as reference only, no Galaxy roles). Includes AppArmor (enforce mode) and AIDE file-integrity.
- Implement the CIS Docker Benchmark: daemon/engine settings in docker_host; per-container settings enforced via docs/security/service-checklist.md.
- VM disk layout for CIS L2: separate /tmp, /var, /var/log, /home partitions with nodev, nosuid, noexec (a Terraform/cloud-init concern, ADR-006). Decide the template layout before provisioning, since it is painful to retrofit.
- Network IDS: enable Suricata on OPNsense (IDS first; IPS later?).
- Active security alerting: wire AIDE, auditd, fail2ban, and Suricata into the Loki/Grafana alerting stack (ties to 3.6).
- Supply-chain hygiene: enforce tiered image pinning (stateful tag@digest; stateless rolling tags; ADR-011) + official/verified images via the service checklist; revisit active scanning (Trivy/Grype) once a triage stack exists (R1).
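Once the partition layout is decided, the mount options can be verified from the text of /proc/mounts. The following is a sketch under assumptions rather than the base role's check. The function name `missing_mount_options` is hypothetical, and which options each mount point actually needs is left to the caller through `required`, since that depends on the template layout still to be decided.

```python
REQUIRED = frozenset({"nodev", "nosuid", "noexec"})


def missing_mount_options(mounts_text, targets, required=REQUIRED):
    """Map each target mount point to the required options it lacks.

    mounts_text uses the /proc/mounts layout:
    device, mount point, fstype, comma-separated options, dump, pass.
    A target that is not mounted at all lacks every required option.
    """
    found = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] in targets:
            found[fields[1]] = set(fields[3].split(","))
    report = {}
    for target in targets:
        missing = set(required) - found.get(target, set())
        if missing:
            report[target] = sorted(missing)
    return report
```

An empty result means every target carries the required options. A target missing from the input shows up with all options listed, which also catches a partition that was never split out.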
-
ADR-011 (update management) — resolve open questions + accept. Committed as Proposed; resolve before marking Accepted:
- Snapshot driver: control node calling the Proxmox API vs a Proxmox-side hook (crosses the TF/Ansible boundary, ADR-006/009).
- Cadences: is weekly OS patching right; should reboots be rarer than apt?
- Health-check harness: where it lives and the minimum bar that counts as "in order" before the weekly run ships (ties to ADR-008, TODO 2.2 / 8.2).
- Stateful classification home: per-role __stateful flag vs a group_vars list.
- Staging-first? Hit a staging host before production, or is snapshot-before + Friday timing enough at this scale?
- Notification/control channel: boma's own ntfy topics (ADR-013) + a "skip this week" / "pause" switch (ties to TODO 9).
- Reconcile pinning conflict (tags vs digests). DECIDED: tiered rule, stateful tag@digest (readable tag + integrity digest), stateless rolling tags. Aligned across ADR-011 (dec. 2), ADR-004, ADR-002 supply-chain row + accepted-risk R1, the service checklist, and 15.6.
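The tiered pinning rule is mechanical enough to check in code. A minimal sketch, assuming image references are plain strings and the stateful flag is already known (wherever the classification ends up living). `PINNED` and `pin_ok` are hypothetical names, and the pattern only recognises sha256 digests.

```python
import re

# name:tag@sha256:<64 hex chars>. The tag may not contain ':' or '/', so a
# registry port (registry:5000/app@sha256:...) is not mistaken for a tag.
PINNED = re.compile(r"^[^@\s]+:[^@\s:/]+@sha256:[0-9a-f]{64}$")


def pin_ok(image, stateful):
    """Stateful images need tag@digest; stateless may use rolling tags."""
    if stateful:
        return bool(PINNED.match(image))
    return True
```

Requiring both parts for stateful services rejects a bare digest as well as a bare tag, which matches the "readable tag + integrity digest" wording of the rule.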