diff --git a/docs/TODO.md b/docs/TODO.md
index c54226a..9d9d38e 100644
--- a/docs/TODO.md
+++ b/docs/TODO.md
@@ -15,15 +15,19 @@
    `/verify-service` report.
 
 3. **Building services**
-   1. Decide how to manage logs.
+   1. ~~Decide how to manage logs.~~ DECIDED (ADR-018): all logs → on-cluster Loki via
+      Grafana Alloy (in `base`); a security subset also ships write-only off-site to
+      `askari` (append-only); Grafana queries both. WORM skipped (accepted-risk R4).
    2. Decide how to manage APIs / API access.
    3. ~~Decide how to import or integrate from baobabAnsibleV4.~~ DECIDED (ADR-013):
       translate-don't-transplant — V4 is a source only of gotchas + working config
       snippets, re-derived on boma's terms; never structure/requirements/values.
    4. Decide what each node runs — base packages plus which apps/services.
    5. Decide the firewall strategy (which firewall, ruleset, per-host vs central).
-   6. Wire up Loki, Prometheus, Grafana dashboards, Grafana alerts, and Uptime
-      Kuma alerts on askari.
+   6. Wire up the monitoring stack. Logging topology DECIDED (ADR-018): cluster Loki
+      (all logs) + off-site security subset on `askari` + Grafana on-cluster (not the
+      whole stack on `askari`). Still to design/build: Prometheus + metric exporters,
+      Uptime Kuma, and exactly which alerts live where.
    7. Define a tagging standard that lets us target runs without over-tagging.
    8. Ensure the right things are backed up (incl. database dumps if we land on PBS).
    9. Decide: a central database server, or individual database services per app?