# ToDo

> **Build order lives in `docs/ROADMAP.md`** — that sequences this backlog into
> milestones. This file is the decision backlog; the roadmap is the order we build them.
>
> **Open items only.** Item numbers are stable cross-references (cited by ROADMAP,
> STATUS, ADRs, scripts) — **never renumber**. When an item is decided or built, collapse
> it to a one-line pointer in place; the full record lives in its ADR / `STATUS.md` / the
> `FRICTION.md` decisions ledger.

1. **Forgejo CI** — what CI work remains after ADR-010 (which workflows, runner
   setup, etc. still need to be built)?

2. **Testing**
   1. Choose and configure code-testing tooling (Molecule, etc.).
   2. Decide how the AI interprets Molecule output and performs live testing: API
      calls, curl requests against web services, log reviews. Headless browsing → ADR-017
      (`/verify-service`); the API/curl/log-review siblings remain open.
   3. ~~Standard for test users + manual-test instructions.~~ → ADR-017.
   4. ~~Local VM integration testing on ubongo.~~ → ADR-025 / `make test-integration` (built + RED→GREEN validated 2026-06-18).
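
For the API/curl half of 2.2, one low-noise shape is a probe that reduces each check to a single pass/fail line. The agent then reads a verdict instead of a response body. Below is a minimal sketch, kept stdlib-only in line with the scripts convention in item 14. `probe` and its parameters are hypothetical; this is not an existing boma script.

```python
"""Hypothetical endpoint probe: one verdict line per URL (stdlib only)."""
from __future__ import annotations

import urllib.error
import urllib.request


def probe(url: str, expect_status: int = 200, expect_text: str | None = None,
          timeout: float = 5.0) -> tuple[bool, str]:
    """Fetch `url` and return (passed, one-line summary) rather than the raw body."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
            # Only the first 64 KiB is inspected; enough for a smoke check.
            body = resp.read(65536).decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError, which still carries the status code.
        status, body = err.code, ""
    except OSError as err:
        # URLError (refused, DNS, TLS) and timeouts are all OSError subclasses.
        return False, f"FAIL {url}: unreachable ({err})"

    if status != expect_status:
        return False, f"FAIL {url}: status {status}, expected {expect_status}"
    if expect_text is not None and expect_text not in body:
        return False, f"FAIL {url}: {expect_text!r} not found in response"
    return True, f"OK   {url}: status {status}"
```

A `make` target could loop over a per-service list of URLs and print only these lines. That list (hypothetical here) would hold the expected status and marker text for each service.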

3. **Building services**
   1. ~~Decide how to manage logs.~~ → ADR-018.
   2. ~~Decide how to manage APIs / API access.~~ → ADR-021.
   3. ~~Decide how to import/integrate from baobabAnsibleV4.~~ → ADR-013.
   4. Decide what each node runs — base packages plus which apps/services.
   5. ~~Decide the firewall strategy.~~ → ADR-020 (builds: host nftables in `base` done; OPNsense-as-code pending).
   6. Wire up the monitoring stack — Prometheus + metric exporters, Uptime Kuma, and
      exactly which alerts live where. (Logging topology → ADR-018.)
   7. ~~Define a tagging standard.~~ → ADR-019.
   8. ~~Ensure the right things are backed up.~~ → ADR-022 (build: the `backup` role, Plans 2–3, pending).
   9. Decide: a central database server, or individual database services per app?
   10. Should we keep the custom base-container (Molecule test image) method for role
       testing, or revisit it as boma's testing approach matures (ADR-008)?
   11. ~~Deliberate tagging strategy.~~ → ADR-019 (folded into 3.7).

4. ~~**Split-horizon FQDN.**~~ → ADR-007 / M1 (`wingu.me` three-tier; `nyumbani` dropped; mesh/LAN-only default).

5. **Control node**
   1. Set up and test the control node while waiting for hardware.
   2. Define control-node bootstrapping — a dedicated recipe and playbook?
   3. Set up rbw on the control node.

6. **Updating** (tracked in item 16 / ADR-011)
   1. Decide the update strategy for services and containers vs packages, builds,
      GitHub pulls and Flatpaks.
   2. Define the scheduling of updates and reboots, including post-update testing.

7. **Shell setup**
   1. Decide what shell setup matters for the AI's work on the control node.
   2. ~~Decide what to set up on the hosts (direct access rare).~~ → ADR-021.

8. **Scheduled work**
   1. Run `/review-repo` as `claude -p` via cron every two weeks?
   2. Build sanity checks (e.g. does PhotoPrism have its pictures? are email
      services receiving and sending?).
   3. Design a declarative `scheduled_jobs` role so the repo owns which cronjobs
      run on a host, enforced by Ansible. Sketch (deferred until we have hosts):
      reads a `scheduled_jobs__jobs` list from group_vars/host_vars, rendered via
      a managed `/etc/cron.d` file. Open questions:
      1. General role vs control-node-only?
      2. Prune undeclared jobs (repo authoritative) vs additive?
      3. Validate that headless email works and that cron's environment can find the
         `claude` CLI.
      4. (The fortnightly `/review-repo` job is the first entry.)
   4. Schedule `/capacity-review` to run periodically (on-demand only for now).
      Revisit once the physical cluster + a live usage-stats hook exist, so it
      reasons on real usage rather than declared intent alone. **Decide the usage
      source first:** Proxmox RRD (built-in, no extra infra) vs the
      Prometheus/Loki/Grafana/Grafana-Alloy stack we will likely set up anyway
      (richer, per-process, but more to run) — see TODO 3.6. Don't build the
      Proxmox-RRD hook before settling this, to avoid throwaway work.
   5. Build a `/security-review` skill (sibling to `/review-repo`): re-check the
      security posture against ADR-002, surface drift, and re-challenge the
      accepted-risk register (`docs/security/accepted-risks.md`). Could pair a
      deterministic pre-scan (undeclared open ports, disabled baseline controls,
      world-readable secrets, services not behind auth) with a judgement pass.
      Open question: standalone, or folded into `/kaizen` (item 11)?
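
The `scheduled_jobs` sketch in 8.3 boils down to rendering a declared list into one managed `/etc/cron.d` file. The real role would likely use an Ansible `template` task. The Python below only illustrates the render step and assumes hypothetical `name`, `schedule`, `user` and `command` keys per job. Because the whole file is rewritten from the list, this shape corresponds to the "prune undeclared jobs" option in 8.3.2; an additive mode would need a different layout.

```python
"""Hypothetical render step for a managed /etc/cron.d file (illustration only)."""
from __future__ import annotations

import re

# Each schedule is five cron fields, or one of cron's @keywords.
_FIELD = re.compile(r"^[0-9A-Za-z*/,\-]+$")
_KEYWORDS = {"@reboot", "@yearly", "@annually", "@monthly", "@weekly", "@daily", "@hourly"}


def render_cron_d(jobs: list[dict], path: str = "/usr/local/bin:/usr/bin:/bin") -> str:
    """Render one cron.d file from a `scheduled_jobs__jobs`-style list of dicts."""
    lines = [
        "# Managed by Ansible (scheduled_jobs). Manual edits are overwritten.",
        # cron starts jobs with a minimal PATH; set it so CLIs such as `claude` resolve.
        f"PATH={path}",
        "",
    ]
    seen: set[str] = set()
    for job in sorted(jobs, key=lambda j: j["name"]):
        name, schedule = job["name"], " ".join(job["schedule"].split())
        if name in seen:
            raise ValueError(f"duplicate job name: {name}")
        seen.add(name)
        fields = schedule.split()
        valid = (len(fields) == 1 and fields[0] in _KEYWORDS) or (
            len(fields) == 5 and all(_FIELD.match(f) for f in fields))
        if not valid:
            raise ValueError(f"{name}: bad schedule {schedule!r}")
        # An unescaped % in a cron command is turned into a newline.
        command = job["command"].replace("%", r"\%")
        lines += [f"# {name}", f"{schedule} {job.get('user', 'root')} {command}"]
    # cron requires every entry, including the last, to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example entry: cron has no "every two weeks" syntax, so the 1st and 15th
    # of the month stand in for the fortnightly /review-repo run.
    print(render_cron_d([{"name": "review-repo", "schedule": "0 3 1,15 * *",
                          "user": "claude", "command": "claude -p '/review-repo'"}]), end="")
```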

9. Should we build a basic notification function so that tools (and the AI) can send
   messages to the user via email, Matrix or ntfy?
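
For the ntfy option, the basic function can be very small. ntfy accepts a plain HTTP POST to `<server>/<topic>` with the message as the body, plus optional `Title`, `Priority` and `Tags` headers. The sketch below covers ntfy only, not email or Matrix. It assumes a self-hosted server URL, a topic name and an optional access token, all placeholders. Building the request is kept separate from sending it so the payload can be checked offline.

```python
"""Hypothetical notify helper for ntfy (stdlib only)."""
from __future__ import annotations

import urllib.request

PRIORITIES = {"min": 1, "low": 2, "default": 3, "high": 4, "urgent": 5}


def build_ntfy_request(server: str, topic: str, message: str, title: str | None = None,
                       priority: str = "default", tags: list[str] | None = None,
                       token: str | None = None) -> urllib.request.Request:
    """Build (but do not send) a POST to <server>/<topic>."""
    if priority not in PRIORITIES:
        raise ValueError(f"unknown priority {priority!r}")
    headers = {"Priority": str(PRIORITIES[priority])}
    if title:
        headers["Title"] = title  # header values must be ASCII/latin-1
    if tags:
        headers["Tags"] = ",".join(tags)
    if token:
        headers["Authorization"] = f"Bearer {token}"  # ntfy access token
    url = f"{server.rstrip('/')}/{topic}"
    return urllib.request.Request(url, data=message.encode("utf-8"),
                                  headers=headers, method="POST")


def notify(server: str, topic: str, message: str, **kwargs) -> int:
    """Send the notification and return the HTTP status (raises on failure)."""
    req = build_ntfy_request(server, topic, message, **kwargs)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Keeping the transport behind one function leaves room to add email or Matrix backends later without changing callers.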

10. **Claude setup** — DECIDED: brainstorm for intent → ADRs; hooks + slash commands +
    `/review-repo` for enforcement at scale. Remaining:
    1. ~~V4 collaboration policy.~~ → ADR-013.
    2. ~~Policy for how we write key documents like ADRs.~~ → ADR-023.
    3. Develop further how we collaborate on designing the project's foundations, as
       distinct from how we implement new containers and the like.
    4. ~~Always-latest official documentation for our tech.~~ → ADR-014.
    5. ~~Always subagent-driven?~~ → DECIDED: yes (standing agreement; enforced by `.claude/hooks/guard-execution-mode-menu.sh`).
    6. When the AI deploys (runs playbooks, etc.), should we define a method so it does
       not have to poll constantly or review all the output? Could the `make`-based
       workflow surface only the relevant feedback?
    7. ~~Reproducible agent toolchain.~~ → `.claude/settings.json` + `docs/runbooks/claude-code-setup.md`.
    8. **Screenshot hand-off to the agent.** Give the operator a smooth way to hand the
       agent a screenshot (e.g. of a Hetzner/VNC console during an incident) — the agent
       can already read image files; the gap is the hand-off. During the 2026-06-17
       incident the only diagnostic channel was console screenshots, copied manually to
       `/tmp` and `find`-located. Options: a known drop path the agent checks (e.g.
       `~/screenshots/`), a small `screenshot`/paste helper or slash-command, or a
       clipboard→file convention. Cheap, high-value for incident work.
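
One cheap version of 10.6 is a filter that `make` targets could pipe `ansible-playbook` through. It keeps only failing tasks, their error lines and the `PLAY RECAP`, so the agent reads a few lines instead of the full log. The sketch assumes Ansible's default stdout callback with uncoloured output; other callbacks print differently. The `digest`/`main` names and any wrapper script are hypothetical.

```python
"""Hypothetical digest filter for ansible-playbook output (default callback)."""
from __future__ import annotations

import re
import sys

RECAP = re.compile(r"^(?P<host>\S+)\s*:\s*ok=\d+\s+changed=\d+\s+"
                   r"unreachable=(?P<unreachable>\d+)\s+failed=(?P<failed>\d+)")


def digest(lines: list[str]) -> tuple[list[str], bool]:
    """Keep failing tasks, their error lines and the PLAY RECAP; flag any failure."""
    keep: list[str] = []
    task = last_shown = None
    in_recap = bad = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("TASK ["):
            task = line.rstrip(" *")  # "TASK [role : name] ****" -> "TASK [role : name]"
        elif line.startswith("PLAY RECAP"):
            in_recap = True
            keep.append("PLAY RECAP")
        elif in_recap:
            m = RECAP.match(line)
            if m:
                keep.append(line.strip())
                bad = bad or int(m["failed"]) > 0 or int(m["unreachable"]) > 0
        elif line.startswith(("fatal: ", "failed: ")):
            if task and task != last_shown:  # name each failing task once
                keep.append(task)
                last_shown = task
            keep.append(line)
    return keep, bad


def main(stream=sys.stdin) -> int:
    """Print the digest; return 2 if any host failed or was unreachable."""
    keep, bad = digest(stream.readlines())
    print("\n".join(keep))
    return 2 if bad else 0

# A wrapper script would end with: sys.exit(main())
```

A plain pipe reports only the last command's status. The recipe therefore still needs `pipefail` to catch errors that never reach a recap, such as a syntax error. In GNU make that can be set with `SHELL := /bin/bash` and `.SHELLFLAGS := -o pipefail -c`.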

11. **Kaizen loop** — `/kaizen` built (STATUS).
    1. ~~Build the loop command.~~ → `/kaizen` (`scripts/friction-scan.py` + `.claude/commands/kaizen.md`; spec `docs/superpowers/specs/2026-06-14-kaizen-command-design.md`).
    2. Keep appending raw signals to `docs/FRICTION.md` (ongoing practice; see FRICTION.md).
    3. **Automation deferred** (revisit when the notify + cron stack is up): wire a
       **scheduled headless** run — report-only (proposes verdicts + notifies, does not
       auto-curate/commit). The on-demand command + recurrence/age nudge ship now.

12. **Spin-up / build order** — what is the right order of operations when spinning up
    from scratch (OS, DNS, Authentik, Caddy, …)?
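
One way to make the answer to 12 explicit and checkable is to record each component's prerequisites and let a topological sort produce the order. The sort also rejects chicken-and-egg cycles. The dependency edges below are illustrative placeholders, not a decided order. Python's `graphlib` (3.9+) does the sorting.

```python
"""Spin-up order from declared prerequisites (example edges are placeholders)."""
from graphlib import CycleError, TopologicalSorter

# component -> components that must already be up (illustrative, not decided)
PREREQS: dict[str, set[str]] = {
    "os": set(),
    "dns": {"os"},
    "caddy": {"os", "dns"},
    "authentik": {"dns", "caddy"},
}


def spin_up_order(prereqs: dict[str, set[str]]) -> list[list[str]]:
    """Group components into waves; members of one wave do not depend on each other."""
    sorter = TopologicalSorter(prereqs)
    sorter.prepare()  # raises CycleError on a chicken-and-egg loop
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    try:
        for n, wave in enumerate(spin_up_order(PREREQS), start=1):
            print(f"wave {n}: {', '.join(wave)}")
    except CycleError as err:
        print(f"dependency cycle: {err.args[1]}")
```

Grouping into waves also shows which steps could run in parallel, not just a single valid sequence.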

13. **Intentions.** Does the current setup clearly record intent throughout? We have the
    README files, but is that enough? Also, how do we re-challenge decisions and track
    how they interact over time? For example, two services are running, but extending
    one slightly could make the other redundant so we could remove it; or a better
    alternative to an existing service has emerged.

14. **Script dependencies policy** — utility scripts (`tf_to_inventory.py`,
    `repo-scan.py`, `capacity-scan.py`, `friction-scan.py`) are stdlib-only by
    convention, for run-anywhere portability (control node, CI, bare clone, no venv).
    Reevaluate whether selectively allowing libraries (e.g. PyYAML — already present via
    Ansible) is a better fit in general: weigh the parsing-correctness win against losing
    zero-setup portability. Decide a clear rule and record it.

15. **Security hardening implementation** — build out the ADR-002 hardening standard.
    1. Implement the CIS Debian Benchmark **Level 1 + Level 2** in the `base` role
       (local tasks; CIS / `dev-sec` as reference only — no Galaxy roles). Includes
       AppArmor (enforce mode) and AIDE file-integrity.
    2. Implement the CIS Docker Benchmark: daemon/engine settings in `docker_host`;
       per-container settings enforced via `docs/security/service-checklist.md`.
    3. VM disk layout for CIS L2: separate `/tmp`, `/var`, `/var/log`, `/home`
       partitions with `nodev,nosuid,noexec` — a Terraform/cloud-init concern
       (ADR-006). Decide the template layout **before** provisioning, since it is
       painful to retrofit.
    4. Network IDS: enable Suricata on OPNsense (IDS first; IPS later?).
    5. Active security alerting: wire AIDE, `auditd`, `fail2ban`, and Suricata into
       the Loki/Grafana alerting stack (ties to 3.6).
    6. Supply-chain hygiene: enforce tiered image pinning (stateful `tag@digest`;
       stateless rolling tags — ADR-011) + official/verified images via the service
       checklist; revisit active scanning (Trivy/Grype) once a triage stack exists (R1).
    7. Review the network topology: does all traffic between ubongo and the other nodes
       go via askari? If askari fails, does the rest keep working?
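
The tiered pinning rule in 15.6 (stateful `tag@digest`, stateless rolling tags) is mechanical enough to lint. Where the stateful flag lives is still open (16.4), so here it is simply a boolean argument. `pinning_problem` is a hypothetical name, and the sketch checks a single image reference only.

```python
"""Lint one image reference against the tiered pinning rule (sketch)."""
from __future__ import annotations

import re

_DIGEST = re.compile(r"sha256:[0-9a-f]{64}")


def pinning_problem(image: str, stateful: bool) -> str | None:
    """Return a problem description, or None if `image` meets its tier's rule."""
    if not stateful:
        # Stateless services follow rolling tags: no digest is required.
        return None
    name, sep, digest = image.partition("@")
    if not sep or not _DIGEST.fullmatch(digest):
        return f"{image}: stateful image must be pinned as tag@sha256:<digest>"
    # Look for the tag only in the last path segment, so a registry port
    # (registry.local:5000/app) is not mistaken for a tag.
    if ":" not in name.rsplit("/", 1)[-1]:
        return f"{image}: digest present but no tag; tag@digest keeps it readable"
    return None
```

A CI step could run this over every role's image variable once 16.4 settles where the stateful classification is recorded.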

16. **ADR-011 (update management) — resolve open questions + accept.** Committed as
    **Proposed**; resolve before marking Accepted:
    1. Snapshot driver — control node calling the Proxmox API vs a Proxmox-side hook
       (crosses the TF/Ansible boundary, ADR-006/009).
    2. Cadences: is weekly OS patching right? Should reboots be rarer than `apt` upgrades?
    3. Health-check harness — where it lives and the minimum bar that counts as
       "in order" before the weekly run ships (ties to ADR-008, TODO 2.2 / 8.2).
    4. Stateful classification home — per-role `__stateful` flag vs a group_vars list.
    5. Staging-first? — hit a staging host before production, or is snapshot-before +
       Friday timing enough at this scale?
    6. Notification/control channel — boma's own ntfy topics (ADR-013) + a "skip this
       week" / "pause" switch (ties to TODO 9).
    7. ~~Reconcile pinning conflict (tags vs digests).~~ → DECIDED: tiered (stateful `tag@digest`, stateless rolling); ADR-011 dec. 2 / ADR-004 / ADR-002.
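
This sketch illustrates the first option in 16.1, with the control node as the snapshot driver. There, the pre-update snapshot is a single authenticated POST to the Proxmox VE API at `/nodes/{node}/qemu/{vmid}/snapshot`, with an API token sent as `PVEAPIToken=USER@REALM!TOKENID=SECRET`. The sketch only builds the request. The API base URL, node name, VM ID, token and the `preupdate_YYYYMMDD` naming scheme are placeholders, and it does not settle 16.1 against a Proxmox-side hook.

```python
"""Hypothetical pre-update snapshot request for the Proxmox VE API (not sent here)."""
from __future__ import annotations

import datetime as dt
import urllib.parse
import urllib.request


def snapshot_request(api: str, node: str, vmid: int, token_id: str, secret: str,
                     when: dt.date, description: str = "pre-update snapshot",
                     ) -> urllib.request.Request:
    """Build POST {api}/nodes/{node}/qemu/{vmid}/snapshot, e.g. api=https://pve:8006/api2/json."""
    # Placeholder naming scheme: starts with a letter, letters/digits/underscore only.
    snapname = f"preupdate_{when:%Y%m%d}"
    body = urllib.parse.urlencode({"snapname": snapname, "description": description})
    return urllib.request.Request(
        f"{api.rstrip('/')}/nodes/{node}/qemu/{vmid}/snapshot",
        data=body.encode("ascii"),
        headers={
            # API token header form: PVEAPIToken=USER@REALM!TOKENID=SECRET
            "Authorization": f"PVEAPIToken={token_id}={secret}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
```

The API should answer with a task ID (UPID) rather than waiting for the snapshot to finish, so a real driver would wait on that task before patching. Proxmox's default self-signed certificate also needs a trusted CA or an explicit TLS context.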
-												Track discussion backlog (docs/todo.md)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

											
										
										
											2026-05-30 18:23:19 +02:00
+								# ToDo
-												docs(roadmap): add ROADMAP.md — remote-access-first build order

High-level build order for the project (Approach A): one Off-site/Remote-access
track first (Gandi DNS-as-code -> askari -> NetBird control plane -> enroll
ubongo + road-warrior laptops -> harden), a procurement gate sized by
/capacity-review, then the Cluster track. Sequences the docs/TODO.md backlog into
milestones and records why the order is what it is.

Decisions captured this session: Gandi over Cloudflare is values-driven and
independent of NetBird (sequenced first so records are born at Gandi); public DNS
managed as code (Ansible, consistent with internal DNS + Terraform-owns-no-DNS);
NetBird-on-ubongo before base default-deny (chicken-and-egg); cluster procurement
gated on patterns proven on two cheap hosts.

Wire ROADMAP.md into CLAUDE.md's Further-reading index and point TODO.md at it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

											
										
										
											2026-06-11 22:12:38 +02:00
+								> **Build order lives in `docs/ROADMAP.md`** — that sequences this backlog into
 								> milestones. This file is the decision backlog; the roadmap is the order we build them.
-												docs(todo): collapse done items to one-line pointers; open-only convention

TODO had accreted multi-line DECIDED/DONE summaries duplicating the ADRs they
cite. Collapsed every done item to a one-line "~~task~~ -> ADR-NNN" pointer and
added an "open items only" convention note up top. Item numbers are stable
cross-references (ROADMAP/STATUS/ADRs/scripts cite them) so they are PRESERVED,
not renumbered — verified all externally-referenced numbers survive. 176->136 lines.
No new ledger: the record already lives in the ADRs / STATUS.md / FRICTION ledger.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

											
										
										
											2026-06-14 22:00:53 +02:00
+								>
 								> **Open items only.** Item numbers are stable cross-references (cited by ROADMAP,
 								> STATUS, ADRs, scripts) — **never renumber**. When an item is decided or built, collapse
 								> it to a one-line pointer in place; the full record lives in its ADR / `STATUS.md` / the
 								> `FRICTION.md` decisions ledger.
-												docs(roadmap): add ROADMAP.md — remote-access-first build order

High-level build order for the project (Approach A): one Off-site/Remote-access
track first (Gandi DNS-as-code -> askari -> NetBird control plane -> enroll
ubongo + road-warrior laptops -> harden), a procurement gate sized by
/capacity-review, then the Cluster track. Sequences the docs/TODO.md backlog into
milestones and records why the order is what it is.

Decisions captured this session: Gandi over Cloudflare is values-driven and
independent of NetBird (sequenced first so records are born at Gandi); public DNS
managed as code (Ansible, consistent with internal DNS + Terraform-owns-no-DNS);
NetBird-on-ubongo before base default-deny (chicken-and-egg); cluster procurement
gated on patterns proven on two cheap hosts.

Wire ROADMAP.md into CLAUDE.md's Further-reading index and point TODO.md at it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

											
										
										
											2026-06-11 22:12:38 +02:00
-												Add hardware reference & capacity-evaluation design spec

Brainstormed design for docs/hardware/reference.md (physical compute +
network gear + workload placement intent), a stdlib-only capacity-scan.py,
and an on-demand /capacity-review skill that reports to docs/hardware/reviews/.
Mirrors the repo-scan -> /review-repo -> docs/reviews triad.

TODO additions: schedule /capacity-review later and decide its usage-stats
source (Proxmox RRD vs the Prometheus/Loki/Grafana/Alloy stack) before
building any hook (8.4); reevaluate the stdlib-only script policy (#14).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

											
										
										
											2026-06-01 09:59:16 +02:00
+. **Forgejo CI** — what CI work remains after ADR-010 (which workflows, runner
 								   setup, etc. still need to be built)?
-												Track discussion backlog (docs/todo.md)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

											
										
										
											2026-05-30 18:23:19 +02:00
-												Add hardware reference & capacity-evaluation design spec

Brainstormed design for docs/hardware/reference.md (physical compute +
network gear + workload placement intent), a stdlib-only capacity-scan.py,
and an on-demand /capacity-review skill that reports to docs/hardware/reviews/.
Mirrors the repo-scan -> /review-repo -> docs/reviews triad.

TODO additions: schedule /capacity-review later and decide its usage-stats
source (Proxmox RRD vs the Prometheus/Loki/Grafana/Alloy stack) before
building any hook (8.4); reevaluate the stdlib-only script policy (#14).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

											
										
										
											2026-06-01 09:59:16 +02:00
+. **Testing**
 . Choose and configure code-testing tooling (Molecule, etc.).
-												docs(todo): collapse done items to one-line pointers; open-only convention

TODO had accreted multi-line DECIDED/DONE summaries duplicating the ADRs they
cite. Collapsed every done item to a one-line "~~task~~ -> ADR-NNN" pointer and
added an "open items only" convention note up top. Item numbers are stable
cross-references (ROADMAP/STATUS/ADRs/scripts cite them) so they are PRESERVED,
not renumbered — verified all externally-referenced numbers survive. 176->136 lines.
No new ledger: the record already lives in the ADRs / STATUS.md / FRICTION ledger.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

											
										
										
											2026-06-14 22:00:53 +02:00
+. Decide how the AI interprets Molecule output and performs live testing — API
 								      calls, curl pulls of web products, log reviews. Headless browsing → ADR-017
 								      (`/verify-service`); the API/curl/log-review siblings remain open.
 . ~~Standard for test users + manual-test instructions.~~ → ADR-017.
-												docs(adr/status): integration-testing harness RED→GREEN validated (ADR-025)

The local-VM integration harness RED→GREEN acceptance passed on real hardware
(2026-06-18): a KVM VM on ubongo reproduced the 2026-06-17 nftables/Docker reboot
breakage (RED) and survived with the docker_host container-forward drop-in (GREEN).

ADR-025: Status updated to PASSED; shakedown learnings section added (UEFI boot
required, claude sudo load-bearing); ADR-021 added to Related.
STATUS.md: integration-harness section updated from PENDING to PASSED; ubongo
entry updated to reflect claude NOPASSWD sudo + sjat-ansible NOPASSWD removal;
last-reviewed date updated.
docs/TODO.md: item 2.4 collapsed to one-line pointer per the file's convention.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

											
										
										
											2026-06-18 21:39:30 +02:00
+. ~~Local VM integration testing on ubongo.~~ → ADR-025 / `make test-integration` (built + RED→GREEN validated 2026-06-18).
-												Track discussion backlog (docs/todo.md)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

											
										
										
											2026-05-30 18:23:19 +02:00
-												Add hardware reference & capacity-evaluation design spec

Brainstormed design for docs/hardware/reference.md (physical compute +
network gear + workload placement intent), a stdlib-only capacity-scan.py,
and an on-demand /capacity-review skill that reports to docs/hardware/reviews/.
Mirrors the repo-scan -> /review-repo -> docs/reviews triad.

TODO additions: schedule /capacity-review later and decide its usage-stats
source (Proxmox RRD vs the Prometheus/Loki/Grafana/Alloy stack) before
building any hook (8.4); reevaluate the stdlib-only script policy (#14).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

											
										
										
											2026-06-01 09:59:16 +02:00
+. **Building services**
-												docs(todo): collapse done items to one-line pointers; open-only convention

TODO had accreted multi-line DECIDED/DONE summaries duplicating the ADRs they
cite. Collapsed every done item to a one-line "~~task~~ -> ADR-NNN" pointer and
added an "open items only" convention note up top. Item numbers are stable
cross-references (ROADMAP/STATUS/ADRs/scripts cite them) so they are PRESERVED,
not renumbered — verified all externally-referenced numbers survive. 176->136 lines.
No new ledger: the record already lives in the ADRs / STATUS.md / FRICTION ledger.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

											
										
										
											2026-06-14 22:00:53 +02:00
+. ~~Decide how to manage logs.~~ → ADR-018.
 . ~~Decide how to manage APIs / API access.~~ → ADR-021.
 . ~~Decide how to import/integrate from baobabAnsibleV4.~~ → ADR-013.
-												Add hardware reference & capacity-evaluation design spec

Brainstormed design for docs/hardware/reference.md (physical compute +
network gear + workload placement intent), a stdlib-only capacity-scan.py,
and an on-demand /capacity-review skill that reports to docs/hardware/reviews/.
Mirrors the repo-scan -> /review-repo -> docs/reviews triad.

TODO additions: schedule /capacity-review later and decide its usage-stats
source (Proxmox RRD vs the Prometheus/Loki/Grafana/Alloy stack) before
building any hook (8.4); reevaluate the stdlib-only script policy (#14).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

											
										
										
											2026-06-01 09:59:16 +02:00
+. Decide what each node runs — base packages plus which apps/services.
-												docs(todo): collapse done items to one-line pointers; open-only convention

TODO had accreted multi-line DECIDED/DONE summaries duplicating the ADRs they
cite. Collapsed every done item to a one-line "~~task~~ -> ADR-NNN" pointer and
added an "open items only" convention note up top. Item numbers are stable
cross-references (ROADMAP/STATUS/ADRs/scripts cite them) so they are PRESERVED,
not renumbered — verified all externally-referenced numbers survive. 176->136 lines.
No new ledger: the record already lives in the ADRs / STATUS.md / FRICTION ledger.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

											
										
										
											2026-06-14 22:00:53 +02:00
+. ~~Decide the firewall strategy.~~ → ADR-020 (builds: host nftables in `base` done; OPNsense-as-code pending).
 . Wire up the monitoring stack — Prometheus + metric exporters, Uptime Kuma, and
 								      exactly which alerts live where. (Logging topology → ADR-018.)
 . ~~Define a tagging standard.~~ → ADR-019.
 . ~~Ensure the right things are backed up.~~ → ADR-022 (build: the `backup` role, Plans 2–3, pending).
-												Add hardware reference & capacity-evaluation design spec

Brainstormed design for docs/hardware/reference.md (physical compute +
network gear + workload placement intent), a stdlib-only capacity-scan.py,
and an on-demand /capacity-review skill that reports to docs/hardware/reviews/.
Mirrors the repo-scan -> /review-repo -> docs/reviews triad.

TODO additions: schedule /capacity-review later and decide its usage-stats
source (Proxmox RRD vs the Prometheus/Loki/Grafana/Alloy stack) before
building any hook (8.4); reevaluate the stdlib-only script policy (#14).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

											
										
										
											2026-06-01 09:59:16 +02:00
+. Decide: a central database server, or individual database services per app?
-												docs(todo): collapse done items to one-line pointers; open-only convention

TODO had accreted multi-line DECIDED/DONE summaries duplicating the ADRs they
cite. Collapsed every done item to a one-line "~~task~~ -> ADR-NNN" pointer and
added an "open items only" convention note up top. Item numbers are stable
cross-references (ROADMAP/STATUS/ADRs/scripts cite them) so they are PRESERVED,
not renumbered — verified all externally-referenced numbers survive. 176->136 lines.
No new ledger: the record already lives in the ADRs / STATUS.md / FRICTION ledger.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

											
										
										
											2026-06-14 22:00:53 +02:00
+. Should we keep the custom base-container (Molecule test image) method for role
 								       testing, or revisit it as boma's testing approach matures (ADR-008)?
 . ~~Deliberate tagging strategy.~~ → ADR-019 (folded into 3.7).
-												Track discussion backlog (docs/todo.md)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

											
										
										
											2026-05-30 18:23:19 +02:00
-												docs(todo): collapse done items to one-line pointers; open-only convention

TODO had accreted multi-line DECIDED/DONE summaries duplicating the ADRs they
cite. Collapsed every done item to a one-line "~~task~~ -> ADR-NNN" pointer and
added an "open items only" convention note up top. Item numbers are stable
cross-references (ROADMAP/STATUS/ADRs/scripts cite them) so they are PRESERVED,
not renumbered — verified all externally-referenced numbers survive. 176->136 lines.
No new ledger: the record already lives in the ADRs / STATUS.md / FRICTION ledger.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

											
										
										
											2026-06-14 22:00:53 +02:00
+. ~~**Split-horizon FQDN.**~~ → ADR-007 / M1 (`wingu.me` three-tier; `nyumbani` dropped; mesh/LAN-only default).
-												Track discussion backlog (docs/todo.md)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

											
										
										
											2026-05-30 18:23:19 +02:00
-												Add hardware reference & capacity-evaluation design spec

Brainstormed design for docs/hardware/reference.md (physical compute +
network gear + workload placement intent), a stdlib-only capacity-scan.py,
and an on-demand /capacity-review skill that reports to docs/hardware/reviews/.
Mirrors the repo-scan -> /review-repo -> docs/reviews triad.

TODO additions: schedule /capacity-review later and decide its usage-stats
source (Proxmox RRD vs the Prometheus/Loki/Grafana/Alloy stack) before
building any hook (8.4); reevaluate the stdlib-only script policy (#14).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

											
										
										
											2026-06-01 09:59:16 +02:00
+. **Control node**
 . Set up and test the control node while waiting for hardware.
 . Define control-node bootstrapping — a dedicated recipe and playbook?
-												review-repo: fix DNS-write contradictions + stale control-node/template refs

Auto-fixes from /review-repo:
- ADR-005 + new-host.md: drop "Terraform writes the host's DNS A record"
  (contradicts ADR-009 — dns role owns the zone; recurs from the 2026-05-30 run)
- ADR-005: control node is physical ubongo, not cloned from the template (ADR-015)
- CLAUDE.md: add the VERIFY.md template to Further reading
- TODO.md: typo fixes (we we / seperate)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

											
										
										
											2026-06-05 18:23:16 +02:00
+. Set up rbw on the control node.
-												Track discussion backlog (docs/todo.md)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

											
										
										
											2026-05-30 18:23:19 +02:00
-												docs(todo): collapse done items to one-line pointers; open-only convention

TODO had accreted multi-line DECIDED/DONE summaries duplicating the ADRs they
cite. Collapsed every done item to a one-line "~~task~~ -> ADR-NNN" pointer and
added an "open items only" convention note up top. Item numbers are stable
cross-references (ROADMAP/STATUS/ADRs/scripts cite them) so they are PRESERVED,
not renumbered — verified all externally-referenced numbers survive. 176->136 lines.
No new ledger: the record already lives in the ADRs / STATUS.md / FRICTION ledger.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

											
										
										
											2026-06-14 22:00:53 +02:00
+. **Updating** — 1. Decide the update strategy across services & containers vs packages
 								   & builds / GitHub pulls / Flatpaks. 2. Define scheduling of updates and reboots,
 								   including post-update testing. (Tracked in item 16 / ADR-011.)
-												Track discussion backlog (docs/todo.md)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

											
										
										
											2026-05-30 18:23:19 +02:00
-												Add hardware reference & capacity-evaluation design spec

Brainstormed design for docs/hardware/reference.md (physical compute +
network gear + workload placement intent), a stdlib-only capacity-scan.py,
and an on-demand /capacity-review skill that reports to docs/hardware/reviews/.
Mirrors the repo-scan -> /review-repo -> docs/reviews triad.

TODO additions: schedule /capacity-review later and decide its usage-stats
source (Proxmox RRD vs the Prometheus/Loki/Grafana/Alloy stack) before
building any hook (8.4); reevaluate the stdlib-only script policy (#14).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

											
										
										
											2026-06-01 09:59:16 +02:00
+. **Shell setup**
 . Decide what shell setup matters for the AI's work on the control node.
-												docs(todo): collapse done items to one-line pointers; open-only convention

TODO had accreted multi-line DECIDED/DONE summaries duplicating the ADRs they
cite. Collapsed every done item to a one-line "~~task~~ -> ADR-NNN" pointer and
added an "open items only" convention note up top. Item numbers are stable
cross-references (ROADMAP/STATUS/ADRs/scripts cite them) so they are PRESERVED,
not renumbered — verified all externally-referenced numbers survive. 176->136 lines.
No new ledger: the record already lives in the ADRs / STATUS.md / FRICTION ledger.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

											
										
										
											2026-06-14 22:00:53 +02:00
+. ~~Decide what to set up on the hosts (direct access rare).~~ → ADR-021.
-												Track discussion backlog (docs/todo.md)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

											
										
										
											2026-05-30 18:23:19 +02:00
-												Add hardware reference & capacity-evaluation design spec

Brainstormed design for docs/hardware/reference.md (physical compute +
network gear + workload placement intent), a stdlib-only capacity-scan.py,
and an on-demand /capacity-review skill that reports to docs/hardware/reviews/.
Mirrors the repo-scan -> /review-repo -> docs/reviews triad.

TODO additions: schedule /capacity-review later and decide its usage-stats
source (Proxmox RRD vs the Prometheus/Loki/Grafana/Alloy stack) before
building any hook (8.4); reevaluate the stdlib-only script policy (#14).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

											
										
										
											2026-06-01 09:59:16 +02:00
+. **Scheduled work**
 . Run `/review-repo` as `claude -p` via cron every two weeks?
 . Build sanity checks (e.g. does PhotoPrism have its pictures? are email
 								      services receiving and sending?).
 . Design a declarative `scheduled_jobs` role so the repo owns which cronjobs
 								      run on a host, enforced by Ansible. Sketch (deferred until we have hosts):
 								      reads a `scheduled_jobs__jobs` list from group_vars/host_vars, rendered via
 								      a managed `/etc/cron.d` file. Open questions:
 . General role vs control-node-only?
 . Prune undeclared jobs (repo authoritative) vs additive?
 . Validate headless email and that cron's env has the `claude` CLI.
 . (The fortnightly `/review-repo` job is the first entry.)
 . Schedule `/capacity-review` to run periodically (on-demand only for now).
 								      Revisit once the physical cluster + a live usage-stats hook exist, so it
 								      reasons on real usage rather than declared intent alone. **Decide the usage
 								      source first:** Proxmox RRD (built-in, no extra infra) vs the
 								      Prometheus/Loki/Grafana/Grafana-Alloy stack we will likely set up anyway
 								      (richer, per-process, but more to run) — see TODO 3.6. Don't build the
 								      Proxmox-RRD hook before settling this, to avoid throwaway work.
-												Expand ADR-002 into a security baseline + strategy

Add a managerial security frame on top of the host baseline: explicit threat
model (opportunistic external, lateral movement/blast radius, operator/agent
error; supply chain accepted-lower-priority), security principles, and four
governance mechanisms that ADR-002 establishes and links out to:

- docs/security/service-checklist.md — per-service security bar (referenced
  from the new-role runbook)
- docs/security/accepted-risks.md — living accepted-risk register (R1-R4)
- planned /security-review skill (TODO 8.5)
- agent guardrails in CLAUDE.md "what Claude must not do"

STATUS.md records the frame as present (manual enforcement) and /security-review
as planned-not-built.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

											
										
										
											2026-06-04 14:39:51 +02:00
   5. Build a `/security-review` skill (sibling to `/review-repo`): re-check the
      security posture against ADR-002, surface drift, and re-challenge the
      accepted-risk register (`docs/security/accepted-risks.md`). Could pair a
      deterministic pre-scan (undeclared open ports, disabled baseline controls,
      world-readable secrets, services not behind auth) with a judgement pass.
      Open question: standalone, or folded into `/kaizen` (item 11)?
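
      One deterministic check can be very small. Below is a sketch of the "world-readable secrets" probe. It assumes the secrets sit under a directory passed in by the caller; which directories count as secret stores is still undecided. It flags regular files with the other-read bit (`o+r`) set and does not follow symlinks.

      ```python
      """Sketch of one deterministic pre-scan check: world-readable secret files."""

      import os
      import stat
      from pathlib import Path


      def world_readable(root: Path) -> list[Path]:
          """Return regular files under root that any local user can read (o+r)."""
          hits = []
          # os.walk does not descend into symlinked directories by default.
          for dirpath, _dirnames, filenames in os.walk(root):
              for name in filenames:
                  path = Path(dirpath) / name
                  mode = path.lstat().st_mode  # lstat: symlinks are not followed
                  if stat.S_ISREG(mode) and mode & stat.S_IROTH:
                      hits.append(path)
          return sorted(hits)
      ```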
9. Should we build a basic function so that tools (and the AI) can send messages to the user via email, Matrix, or ntfy?
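
   For the ntfy option, publishing is a plain HTTP POST of the message body to `https://<server>/<topic>`, with optional headers such as `Title` and `Priority`. Below is a minimal stdlib sketch. It assumes a hypothetical self-hosted server URL; topics would be boma's own (ADR-013). Email and Matrix would each need their own transport behind the same `notify()` shape.

   ```python
   """Sketch: a tiny ntfy notifier that scripts (and the agent) could share."""

   from __future__ import annotations

   import urllib.request

   NTFY_SERVER = "https://ntfy.example.org"  # hypothetical self-hosted instance


   def build_ntfy_request(
       topic: str, message: str, title: str | None = None, priority: str | None = None
   ) -> urllib.request.Request:
       """Build, without sending, an ntfy publish request: POST the body to /<topic>."""
       headers = {}
       if title:
           headers["Title"] = title
       if priority:
           headers["Priority"] = priority  # e.g. "high", or "1".."5"
       return urllib.request.Request(
           f"{NTFY_SERVER}/{topic}",
           data=message.encode("utf-8"),
           headers=headers,
           method="POST",
       )


   def notify(topic: str, message: str, **kwargs) -> int:
       """Send the notification and return the HTTP status code."""
       request = build_ntfy_request(topic, message, **kwargs)
       with urllib.request.urlopen(request, timeout=10) as response:
           return response.status
   ```

   Splitting request construction from sending keeps the message format testable without a live server.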
10. **Claude setup** — DECIDED: brainstorm for intent → ADRs; hooks + slash commands +
    `/review-repo` for enforcement at scale. Remaining:
    1. ~~V4 collaboration policy.~~ → ADR-013.
    2. ~~Policy for how we write key documents like ADRs.~~ → ADR-023.
    3. Further develop how we collaborate on designing the foundation of the project, separately from how we implement new containers etc.
    4. ~~Always-latest official documentation for our tech.~~ → ADR-014.
    5. ~~Always subagent-driven?~~ → DECIDED: yes (standing agreement; enforced by `.claude/hooks/guard-execution-mode-menu.sh`).
    6. When the AI deploys (runs playbooks etc.), should we define a methodology so it does not have to poll constantly or review all of the output? Perhaps the `make` targets could return only the relevant feedback?
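
       One possible shape for "only the relevant feedback" is to have the agent read the PLAY RECAP rather than the full log. The sketch below assumes Ansible's default stdout callback with uncolored output, where each recap line reads `host : ok=N changed=N unreachable=N failed=N ...`. A `make` target could pipe playbook output through something like this and print only the hosts that failed or were unreachable.

       ```python
       """Sketch: condense ansible-playbook output to the hosts that need attention."""

       from __future__ import annotations

       import re

       # A recap line looks like (after stripping):
       #   askari : ok=12 changed=1 unreachable=0 failed=0 skipped=3 rescued=0 ignored=0
       RECAP_LINE = re.compile(r"^(?P<host>\S+)\s*:\s*(?P<counts>(?:\w+=\d+\s*)+)$")


       def parse_recap(output: str) -> dict[str, dict[str, int]]:
           """Return {host: {counter: value}} parsed from the PLAY RECAP section."""
           hosts: dict[str, dict[str, int]] = {}
           in_recap = False
           for line in output.splitlines():
               if line.startswith("PLAY RECAP"):
                   in_recap = True
                   continue
               match = RECAP_LINE.match(line.strip()) if in_recap else None
               if match:
                   pairs = (kv.split("=") for kv in match["counts"].split())
                   hosts[match["host"]] = {key: int(value) for key, value in pairs}
           return hosts


       def needs_attention(output: str) -> list[str]:
           """Hosts with failed or unreachable counts above zero, sorted by name."""
           return sorted(
               host
               for host, counts in parse_recap(output).items()
               if counts.get("failed", 0) or counts.get("unreachable", 0)
           )
       ```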
    7. ~~Reproducible agent toolchain.~~ → `.claude/settings.json` + `docs/runbooks/claude-code-setup.md`.
    8. **Screenshot hand-off to the agent.** Give the operator a smooth way to hand the
       agent a screenshot (e.g. of a Hetzner/VNC console during an incident) — the agent
       can already read image files; the gap is the hand-off. During the 2026-06-17
       incident the only diagnostic channel was console screenshots, copied manually to
       `/tmp` and `find`-located. Options: a known drop path the agent checks (e.g.
       `~/screenshots/`), a small `screenshot`/paste helper or slash-command, or a
       clipboard→file convention. Cheap, high-value for incident work.
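
       The drop-path option needs very little code on the agent's side. The sketch below assumes `~/screenshots/` as the drop directory and a fixed set of image suffixes. It takes the newest file by modification time to be the one the operator just saved.

       ```python
       """Sketch: find the operator's most recent screenshot in a known drop path."""

       from __future__ import annotations

       from pathlib import Path

       DROP_DIR = Path.home() / "screenshots"  # candidate drop path from this item
       IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


       def latest_screenshot(drop_dir: Path = DROP_DIR) -> Path | None:
           """Return the newest image in drop_dir by modification time, or None."""
           if not drop_dir.is_dir():
               return None
           images = [
               path
               for path in drop_dir.iterdir()
               if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
           ]
           return max(images, key=lambda path: path.stat().st_mtime, default=None)
       ```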
11. **Kaizen loop** — `/kaizen` built (STATUS).
    1. ~~Build the loop command.~~ → `/kaizen` (`scripts/friction-scan.py` + `.claude/commands/kaizen.md`; spec `docs/superpowers/specs/2026-06-14-kaizen-command-design.md`).
    2. Keep appending raw signals to `docs/FRICTION.md` (ongoing practice; see FRICTION.md).
    3. **Automation deferred** (revisit when the notify + cron stack is up): wire a
       **scheduled headless** run — report-only (proposes verdicts + notifies, does not
       auto-curate/commit). The on-demand command + recurrence/age nudge ship now.
12. **Spin-up / build order** — what is the right order of operations when spinning up
    from scratch (OS, DNS, Authentik, Caddy, …)?
13. **Intentions** - Does the current setup clearly state intentions throughout? We have the README files, but is that enough? Also, how do we re-challenge decisions, and how they interact, over time? For example, we have two services running, but extending one a little could make the other redundant so we could remove it; or a better alternative to one of our services has emerged.
14. **Script dependencies policy** — utility scripts (`tf_to_inventory.py`,
    `repo-scan.py`, `capacity-scan.py`, `friction-scan.py`) are stdlib-only by
    convention, for run-anywhere portability (control node, CI, bare clone, no venv).
    Reevaluate whether selectively allowing libraries (e.g. PyYAML — already present via
    Ansible) is a better fit in general: weigh the parsing-correctness win against losing
    zero-setup portability. Decide a clear rule and record it.
15. **Security hardening implementation** — build out the ADR-002 hardening standard.
    1. Implement the CIS Debian Benchmark **Level 1 + Level 2** in the `base` role
       (local tasks; CIS / `dev-sec` as reference only — no Galaxy roles). Includes
       AppArmor (enforce mode) and AIDE file-integrity.
    2. Implement the CIS Docker Benchmark: daemon/engine settings in `docker_host`;
       per-container settings enforced via `docs/security/service-checklist.md`.
    3. VM disk layout for CIS L2: separate `/tmp`, `/var`, `/var/log`, `/home`
       partitions with `nodev,nosuid,noexec` — a Terraform/cloud-init concern
       (ADR-006). Decide the template layout **before** provisioning, since it is
       painful to retrofit.
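
       The layout is hard to retrofit, so a cheap check on each freshly provisioned VM could catch a wrong template early. The sketch below compares an fstab-style or `/proc/mounts` table against the mount points and options listed in this item. The per-mount option set is copied from this TODO and should be confirmed against the benchmark itself.

       ```python
       """Sketch: audit a mount table against this item's CIS L2 partition layout."""

       from __future__ import annotations

       # Copied from this TODO item; confirm per-mount options against the
       # CIS Debian Benchmark before relying on this check.
       REQUIRED = {
           mount: {"nodev", "nosuid", "noexec"}
           for mount in ("/tmp", "/var", "/var/log", "/home")
       }


       def audit_mounts(mount_table: str, required=REQUIRED) -> dict[str, set[str]]:
           """Return {mount point: missing options}; an absent mount misses them all.

           mount_table uses fstab / /proc/mounts columns:
           device, mount point, fs type, options, ...
           """
           found: dict[str, set[str]] = {}
           for line in mount_table.splitlines():
               fields = line.split()
               if len(fields) >= 4 and not fields[0].startswith("#"):
                   found[fields[1]] = set(fields[3].split(","))
           problems = {}
           for mount, options in required.items():
               missing = options - found.get(mount, set())
               if missing:
                   problems[mount] = missing
           return problems
       ```

       On a live host, pass the contents of `/proc/mounts`. An empty result means every listed mount exists with the required options.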
    4. Network IDS: enable Suricata on OPNsense (IDS first; IPS later?).
    5. Active security alerting: wire AIDE, `auditd`, `fail2ban`, and Suricata into
       the Loki/Grafana alerting stack (ties to 3.6).
    6. Supply-chain hygiene: enforce tiered image pinning (stateful `tag@digest`;
       stateless rolling tags — ADR-011) + official/verified images via the service
       checklist; revisit active scanning (Trivy/Grype) once a triage stack exists (R1).
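
       The checklist's tiered pinning rule could also be checked mechanically. The sketch below is one way to do it; running it from `repo-scan.py` or the planned `/security-review` pre-scan would be a hypothetical hookup. It uses a simplified `name[:tag][@digest]` split rather than the full OCI reference grammar. It also assumes stateless images must name their rolling tag explicitly instead of relying on the implicit `latest`.

       ```python
       """Sketch: check an image reference against the tiered pinning rule (ADR-011)."""

       from __future__ import annotations

       import re

       DIGEST = re.compile(r"^sha256:[0-9a-f]{64}$")


       def split_ref(ref: str) -> tuple[str, str | None, str | None]:
           """Split 'name[:tag][@digest]' into (name, tag, digest); simplified grammar."""
           name, _, digest = ref.partition("@")
           repo, sep, tag = name.rpartition(":")
           if not sep or "/" in tag:  # no tag, or the colon was a registry port
               repo, tag = name, ""
           return repo, tag or None, digest or None


       def pinning_violation(ref: str, stateful: bool) -> str | None:
           """Return why ref breaks the tiered rule, or None if it complies."""
           _repo, tag, digest = split_ref(ref)
           if stateful:
               if not (tag and digest):
                   return "stateful images must pin tag@digest"
               if not DIGEST.match(digest):
                   return "digest must be sha256:<64 lowercase hex chars>"
               return None
           if digest:
               return "stateless images use rolling tags; a digest defeats rolling updates"
           if not tag:
               return "stateless images must name their rolling tag (e.g. latest, stable)"
           return None
       ```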
    7. Is our network set up as it should be? I am not sure whether all traffic between ubongo and the nodes goes via askari. If askari breaks, will the rest still work?
16. **ADR-011 (update management) — resolve open questions + accept.** Committed as
    **Proposed**; resolve before marking Accepted:
    1. Snapshot driver — control node calling the Proxmox API vs a Proxmox-side hook
       (crosses the TF/Ansible boundary, ADR-006/009).
    2. Cadences — is weekly OS patching right; should reboots be rarer than `apt`?
    3. Health-check harness — where it lives and the minimum bar that counts as
       "in order" before the weekly run ships (ties to ADR-008, TODO 2.2 / 8.2).
    4. Stateful classification home — per-role `__stateful` flag vs a group_vars list.
    5. Staging-first? — hit a staging host before production, or is snapshot-before +
       Friday timing enough at this scale?
    6. Notification/control channel — boma's own ntfy topics (ADR-013) + a "skip this
       week" / "pause" switch (ties to TODO 9).
    7. ~~Reconcile pinning conflict (tags vs digests).~~ → DECIDED: tiered (stateful `tag@digest`, stateless rolling); ADR-011 dec. 2 / ADR-004 / ADR-002.