# boma capabilities overview

A high-level, **living** map of what boma is intended to do — organised by capability
domain, decided on boma's own terms. This is the frame of reference for "what does
what" alignment, for judging which tools/plugins are relevant, and (later) for
planning what runs on which node. It is **intent**, not status — see `STATUS.md` for
what actually exists.

Decided fresh per **ADR-013** (V4 is not a source of scope); a V4 completeness check
is recorded at the bottom. Individual service *picks* (e.g. Jellyfin vs Plex) and
node placement are **surfaced here, not resolved here** — they are downstream
decisions this frame enables.

## Legend

- **Tier** — **P** platform (others depend on it) · **A** user-facing app · **S** supporting
- **Commitment** — **core** (foundational, committed) · **planned** (intended) ·
  **candidate** (wanted, option not settled) · **maybe-later** (nice to have)
- ⚠️ = interacts with an existing ADR; reconcile before building.

---

## 1. Edge & networking — [P]

| Capability | Candidate service(s) | Tier | Commitment | What it does | Notes / open |
|---|---|---|---|---|---|
| Reverse proxy / TLS | Caddy (ADR-024) | P | core | Edge routing + ACME certs for everything exposed | Spin-up order names it (TODO 12) |
| Internal DNS | `dns` role → dns1/dns2 | P | core | Authoritative internal zone (ADR-007) | Ansible-rendered zone |
| Public DNS | `public_dns` role → Gandi LiveDNS | P | core | wingu.me zone as code (ADR-007) | anti-spoof baseline; mesh/LAN-only default; applied (M1) |
| VPN / remote access | NetBird (self-hosted on `askari`) | P | core | Secure mesh remote access to `srv`/`mgmt` | **Decided (ADR-016):** NetBird mesh replaces ADR-007 OPNsense WireGuard |
| Service portal / dashboard | Homepage | A | candidate | One landing page listing all services — a "what does what" front door | Gap surfaced by V4; fits boma's legibility goal |

_(DHCP, firewall, mDNS reflection live on OPNsense — Ansible-managed, not containers.)_

_Firewalling is two-layer (ADR-020): OPNsense at the perimeter + inter-VLAN, plus
per-host `nftables` (default-deny inbound + east-west allowlist) rendered by the `base`
role from a shared `group_vars` service catalog. The host `nftables` layer is built (the
`base` firewall concern); the OPNsense layer is still to be built._

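The host layer described above can be pictured with a small sketch. It assumes a hypothetical catalog shape: `Service`, `CATALOG`, the host names, the ports and the documentation-range CIDRs are all invented for illustration. It renders a default-deny `nftables` input chain with one allowlist rule per service on the given host. The real `base` role template and `group_vars` layout may look quite different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """One entry of a hypothetical shared service catalog."""
    name: str
    host: str                       # host that runs the service
    port: int
    proto: str                      # "tcp" or "udp"
    allowed_from: tuple[str, ...]   # source CIDRs allowed to reach it


# Illustrative data only; not boma's actual catalog or address plan.
CATALOG = (
    Service("forgejo", "host-a", 3000, "tcp", ("192.0.2.0/24",)),
    Service("loki", "host-b", 3100, "tcp", ("192.0.2.0/24", "198.51.100.0/24")),
    Service("dns", "host-b", 53, "udp", ("192.0.2.0/24",)),
)


def render_input_chain(host: str, catalog=CATALOG) -> str:
    """Render a default-deny inbound chain plus an allowlist for `host`."""
    lines = [
        "table inet filter {",
        "  chain input {",
        "    type filter hook input priority 0; policy drop;",
        "    ct state established,related accept",
        '    iif "lo" accept',
    ]
    for svc in catalog:
        if svc.host != host:
            continue  # only services hosted here get an inbound rule
        sources = ", ".join(svc.allowed_from)
        lines.append(
            f"    ip saddr {{ {sources} }} {svc.proto} dport {svc.port} accept  # {svc.name}"
        )
    lines += ["  }", "}"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_input_chain("host-b"))
```

Calling `render_input_chain("host-b")` yields rules only for the services hosted on `host-b`; any other inbound traffic falls through to `policy drop`.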
## 2. Identity & access — [P]

| Capability | Candidate service(s) | Tier | Commitment | What it does | Notes / open |
|---|---|---|---|---|---|
| SSO / identity provider | Authentik | P | planned | Central auth / forward-auth for exposed services | Named in spin-up order (TODO 12) |
| Secrets / password vault | Vaultwarden | P | core | Personal vault; also holds the Ansible vault master password | Already used by `rbw` (ADR-002) |

## 3. Observability — [P]

| Capability | Candidate service(s) | Tier | Commitment | What it does | Notes / open |
|---|---|---|---|---|---|
| Metrics | Prometheus | P | planned | Time-series metrics + alert rules | TODO 3.6 |
| Logs | Loki (cluster all-logs + off-site security subset on `askari`) | P | core | Central log aggregation; a security subset ships write-only off-site (append-only) | **Decided (ADR-018)** |
| Log shipping agent | Grafana Alloy (in `base`) | P | core | Collects journald + container + security logs on every host; ships to Loki (ADR-018) | **Decided (ADR-018)** |
| Dashboards | Grafana | P | planned | Visualisation + alerting (incl. AIDE/`auditd`/`fail2ban`/Suricata + log-silence — ADR-018) | TODO 3.6 |
| Uptime checks | Uptime Kuma | P | planned | Endpoint up/down checks | TODO 3.6 |
| External watchdog | askari (Hetzner VPS) | P | core | Off-site monitoring that survives a homelab outage | ADR-007 |
| Notify / alerting | ntfy · Matrix · email (multi-channel) | S | planned | Deliver alerts to the user across channels | TODO 9; Matrix homeserver in §8 |
| Metric exporters | node_exporter, cAdvisor, … | S | planned | Feed Prometheus | per host/service |

## 4. Source & CI — [P/A]

| Capability | Candidate service(s) | Tier | Commitment | What it does | Notes / open |
|---|---|---|---|---|---|
| Git hosting | Forgejo | P/A | core | Repo hosting + off-machine backup | **Live** (ADR-010) |
| CI runner | Forgejo Actions (`act_runner`) | P | planned | Pipelines (lint/test/deploy) | ADR-010 / ADR-008 |

## 5. Home automation & IoT — [A]

| Capability | Candidate service(s) | Tier | Commitment | What it does | Notes / open |
|---|---|---|---|---|---|
| Automation hub | Home Assistant | A | planned | IoT controller; bridges `srv`↔`iot` | ADR-007 (`10.20.0.13`) |
| Device messaging | MQTT broker (Mosquitto) | S | candidate | Messaging backbone for HA/devices | Only if devices need it |
| Radio / firmware bridges | Zigbee2MQTT · ESPHome | S | maybe-later | Zigbee/ESP device integration | Hardware-dependent |

## 6. Media — [A]

| Capability | Candidate service(s) | Tier | Commitment | What it does | Notes / open |
|---|---|---|---|---|---|
| Media server | Jellyfin · *vs* Plex | A | candidate | Stream media to the household | Jellyfin = FOSS default |
| Library automation | \*arr stack (Sonarr/Radarr/…) | A | candidate | Acquire/organise media | ADR-011 example |
| Download client | qBittorrent (+ VPN) | S | candidate | Fetching | Egress isolation needed |
| Indexer helper | FlareSolverr | S | candidate | CAPTCHA/Cloudflare for indexers | ADR-011 example |
| Books / audiobooks / podcasts | Audiobookshelf · Calibre-Web · (Readarr) | A | candidate | Ebook + audiobook + podcast library | Gap surfaced by V4; video-only was too narrow |
| Download VPN egress | Gluetun | S | candidate | Routes the download client through a commercial VPN | Pairs with qBittorrent |

## 7. Personal cloud & files — [A]

| Capability | Candidate service(s) | Tier | Commitment | What it does | Notes / open |
|---|---|---|---|---|---|
| Files / sync | Nextcloud | A | planned | Personal cloud, file sync & share; CalDAV/CardDAV (contacts & calendar) | ADR-011 |
| Photos | PhotoPrism · *vs* Immich | A | candidate | Photo library + phone backup | ADR-011 lists PhotoPrism; Immich is the modern alt |
| Office documents | Collabora Online | A | candidate | In-browser document editing for Nextcloud | Gap surfaced by V4 |
| LAN file shares | Samba · NFS | S | candidate | Raw SMB/NFS shares (distinct from Nextcloud sync) | Gap surfaced by V4; only if a direct-share need exists |

## 8. Communications — [A]

| Capability | Candidate service(s) | Tier | Commitment | What it does | Notes / open |
|---|---|---|---|---|---|
| Real-time chat | Matrix homeserver (Synapse · *vs* Conduit) | A | planned | Self-hosted messaging; also an alert route | Stateful + internet-facing → careful exposure, own `SECURITY.md` |
| Bridges | mautrix-* | S | maybe-later | Bridge other networks into Matrix | After the homeserver is stable |
| Self-hosted email | Poste.io · Mailcow | A | maybe-later | Run boma's own mail server | ⚠️ Deliverability + security are heavy; V4 ran one — re-justify hard before committing |

## 9. Data & backup — [P/S]

| Capability | Candidate service(s) | Tier | Commitment | What it does | Notes / open |
|---|---|---|---|---|---|
| Databases | Postgres/MariaDB — central *vs* per-app | P | candidate | Backing store for stateful apps | Open: central server vs per-service (TODO 3.9) |
| Backup engine | restic (data-only) | S | planned | Per-service state: file dirs + logical DB dumps, pulled by `fisi` | ADR-022 (PBS deferred) |
| Off-site target | pCloud (via rclone) | S | planned | Encrypted off-site copy of the restic repo (3-2-1) | ADR-022; sync-coupled |
| Air-gap target | USB hard drives | S | planned | Rotated offline cold copy — the immutable backstop | ADR-022; udev-triggered `restic copy` |

## 10. Operations & support — [S]

| Capability | Candidate service(s) | Tier | Commitment | What it does | Notes / open |
|---|---|---|---|---|---|
| Update watcher | DIUN | S | planned | New-image alerts driving the update process | ADR-011 |
| Scheduled jobs | `scheduled_jobs` role + `claude -p` jobs | S | planned | Declarative cron: `/review-repo`, security/capacity reviews, sanity checks | TODO 8 |
| Sanity / smoke | whoami + health checks | S | planned | Verification endpoints + "is it actually working" checks | ADR-011 / TODO 8.2 |
| Service-UI verification | `/verify-service` skill | S | planned | Claude-driven exploratory Level 4 acceptance check of a deployed service's UI | Decided (ADR-017); running deferred on ubongo + playwright + Authentik |

- **Targeted runs** (ADR-019): playbooks are sliced with `--tags` along two axes —
  role/service (tag = role name) or a closed list of cross-cutting concerns
  (`firewall`, `logging`, `config`, `deploy`, …); the vocabulary is lint-enforced.

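As a rough illustration of what "lint-enforced vocabulary" could mean for the targeted-runs tags, the sketch below accepts a tag only if it is a role name or one of the cross-cutting concerns. Everything here is an assumption: `lint_tags`, the `task_tags` input shape and the sample task names are hypothetical, and `CONCERN_TAGS` holds only the four concerns named in the text, not the full closed list. The actual lint rule in the repo may work differently.

```python
# Only the concerns named in the text; the real closed list is longer.
CONCERN_TAGS = frozenset({"firewall", "logging", "config", "deploy"})


def lint_tags(task_tags: dict[str, list[str]], role_names: set[str]) -> list[str]:
    """Return one message per tag that is neither a role name nor a concern."""
    allowed = CONCERN_TAGS | role_names
    problems = []
    for task, tags in task_tags.items():
        for tag in tags:
            if tag not in allowed:
                problems.append(f"{task}: unknown tag '{tag}'")
    return problems


if __name__ == "__main__":
    roles = {"base", "dns", "public_dns"}          # role names double as tags
    sample = {
        "render nftables ruleset": ["base", "firewall"],   # valid on both axes
        "render internal zone": ["dns", "zones"],          # 'zones' is not in the vocabulary
    }
    for message in lint_tags(sample, roles):
        print(message)
```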
---

## V4 completeness check

Run against AnsibleBaobabV4's role set per ADR-013/014 — used **only** as a coverage
check, not a source of scope. Each finding was re-justified on boma's terms before it
changed anything here.

**Strong alignment (confirms the fresh frame).** Most of boma's picks correspond to
services V4 actually ran: Traefik, DNS, Vaultwarden, Forgejo (+runner), Prometheus,
Grafana, Grafana-Alloy, Loki, Uptime Kuma, exporters, ntfy, DIUN, Nextcloud (+db),
PhotoPrism (+db), Jellyfin, the \*arr stack, qBittorrent, FlareSolverr, Home Assistant,
a Matrix homeserver (V4: Conduwuit) + Element web client, WireGuard, backups (incl.
cloud). That overlap is reassuring — these were arrived at independently and match.

**Gaps it surfaced (added above as `candidate`/`maybe-later`, re-justified):** books/
audiobooks/podcasts library; Gluetun download-VPN egress; Collabora office docs;
Samba/NFS LAN shares; CalDAV/CardDAV via Nextcloud; a service portal (Homepage);
self-hosted email (parked maybe-later — heavy). Also noted: a Matrix deployment needs
a **client** (Element web), not just the homeserver.

**Long-tail surfaced but parked (maybe-later, not added as rows):** local self-hosted
AI/LLM, a game server (Minecraft), generic static-site hosting. Plausible someday;
none are committed.

**Confirmed exclusions (V4 had them; boma deliberately does not).** V4 mixed in a lot
of **workstation/desktop** config — XFCE/GNOME desktops, kiosk mode, nvim/kitty/tmux,
LibreOffice, antivirus, remote desktop. boma is **server-only**, so these are correctly
absent. Likewise the removed Knowledge domain (Discourse, Snipe-IT, MRBS booking) and
V4-specific project websites — out of boma's scope by design. The narrower surface is
intentional, not an oversight.