boma/docs/superpowers/plans/2026-06-14-m4b-netbird.md
sjat e3461375f5 docs(plan): M4b — NetBird coordinator service role
Capture NetBird's configure.sh reference for a pinned version → translate into
boma role templates (compose + management.json + dex/openid + turnserver),
external-proxy mode behind the M4a Caddy (netbird.askari.wingu.me). First service
role: full ADR-004 standard files; secrets generated/CHANGEME-stubbed (setup key
for M5). Gated live deploy + verify.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 18:20:04 +02:00

# M4b — NetBird coordinator (service role) Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: superpowers:subagent-driven-development (recommended) or superpowers:executing-plans. Steps use `- [ ]` checkboxes.
**Goal:** Deploy the self-hosted NetBird control plane on askari as boma's first real service role (`netbird`), fronted by the M4a Caddy, reachable at `https://netbird.askari.wingu.me` with the embedded Dex login.
**Architecture:** NetBird's own `configure.sh` generates the canonical compose + config for a pinned version; boma **captures that reference once and translates it into role templates** (ADR-004/013 — don't run their imperative script in production, render from templates). Runs in **external-reverse-proxy mode** (no bundled Traefik); Caddy adds a `netbird.askari.wingu.me` route. Secrets (datastore encryption key, TURN password, Dex secrets) are generated into vault; the setup key is stubbed `CHANGEME` for M5.
**Tech Stack:** NetBird (combined `netbird-server` container if stable for the pinned version, else the multi-container set), embedded Dex IdP, Coturn, Docker Compose, Caddy (M4a), Ansible.
**Spec:** `docs/superpowers/specs/2026-06-14-netbird-coordinator-m4-design.md` · **Prereq:** M4a (Docker + Caddy) ✓ on askari.
**Execution context:** Task 1 runs `configure.sh` in a scratch dir (capture only). Tasks 2-6 are authoring work. **Task 7 deploys live to askari** (gated). NetBird self-hosting is finicky, so expect live debugging.
---
### Task 1: Capture NetBird's reference setup (pin the version)
- [ ] **Step 1:** Pick + pin the NetBird version (ADR-014 — check the latest stable release). Record it.
- [ ] **Step 2:** In a scratch dir (on ubongo, throwaway), fetch NetBird's `getting-started`/`configure.sh` for that version and run it with answers for: domain `netbird.askari.wingu.me`, **external reverse proxy** (disable bundled Traefik/Caddy), **embedded Dex** (no external SSO), Let's Encrypt off (Caddy terminates TLS).
- [ ] **Step 3:** Capture the generated files verbatim into the plan/notes: `docker-compose.yml`, `management.json` (or `config.yaml`), `turnserver.conf`, `openid-configuration.json`, dashboard env. Also capture NetBird's **Caddy external-proxy template** (their docs ship one) — it shows the exact upstreams + HTTP/2/gRPC routing the dashboard/management/signal/relay need.
- [ ] **Step 4:** No commit (reference capture; informs Tasks 2-4).
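
One way to record what Step 3 captured is a small checksum manifest of the scratch dir, so a later re-run of `configure.sh` for a new pinned version can be compared against the reference. This is a minimal sketch, not part of the plan's tooling: the `capture_manifest` helper and the `EXPECTED` list are hypothetical, and the real file set (for example `config.yaml` instead of `management.json`, or the dashboard env file) depends on the pinned NetBird version.

```python
import hashlib
from pathlib import Path

# Hypothetical list: file names Task 1 Step 3 expects configure.sh to emit.
# Adjust to whatever the pinned NetBird version actually generates.
EXPECTED = [
    "docker-compose.yml",
    "management.json",
    "turnserver.conf",
    "openid-configuration.json",
]


def capture_manifest(scratch_dir, expected=EXPECTED):
    """Return ({file name: sha256 hex digest}, [missing file names])."""
    root = Path(scratch_dir)
    digests, missing = {}, []
    for name in expected:
        path = root / name
        if path.is_file():
            # Hash the raw bytes so the capture is compared verbatim.
            digests[name] = hashlib.sha256(path.read_bytes()).hexdigest()
        else:
            missing.append(name)
    return digests, missing
```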
---
### Task 2: `netbird` service role — templates
**Files:** `roles/netbird/` (scaffold via `make new-role NAME=netbird`): `defaults/main.yml`, `tasks/main.yml`, `templates/{docker-compose.yml,management.json,turnserver.conf,openid-configuration.json,dashboard.env}.j2`, `handlers/main.yml`, `README.md`.
- [ ] **Step 1:** Translate the captured compose into `templates/docker-compose.yml.j2` — containers, the shared `boma` Docker network (so Caddy reaches them by name), **no host port mappings except what Caddy/Coturn need** (Coturn 3478/udp; everything else internal, Caddy fronts it). Pin image tags (ADR-011).
- [ ] **Step 2:** Translate `management.json`/`config.yaml` into a template — fill `Datadir`, `DataStoreEncryptionKey` (`{{ vault.netbird.datastore_key }}`), `HttpConfig` (public URL `https://netbird.askari.wingu.me`), `TURNConfig` (coturn host + `{{ vault.netbird.turn_password }}`), `Signal`, `Relay`, `Store` (sqlite), and the embedded-Dex IdP block (DeviceAuthorizationFlow/PKCE, `openid-configuration.json` URL).
- [ ] **Step 3:** `turnserver.conf.j2` (realm = `netbird.askari.wingu.me`, the TURN secret), `openid-configuration.json.j2`, `dashboard.env.j2` (`NETBIRD_MGMT_API_ENDPOINT=https://netbird.askari.wingu.me`, the `AUTH_*` Dex values).
- [ ] **Step 4:** `defaults/main.yml` (`netbird__*` knobs: version, base_dir `/opt/services/netbird`, domain) + `tasks/main.yml` (ADR-004 deploy mechanics: ensure dir, render all files, `community.docker.docker_compose_v2` up; `netbird__manage` toggle for Molecule).
- [ ] **Step 5:** `make lint`; commit `feat(netbird): coordinator service role (compose + config templates)`.
---
### Task 3: Secrets (CHANGEME convention + generated)
- [ ] **Step 1:** Add to vault (`make edit-vault`): `vault.netbird.datastore_key`, `vault.netbird.turn_password`, any Dex client secret — **generate** strong values (or stub `CHANGEME` + a comment if operator-supplied). Add `vault.netbird.setup_key: CHANGEME` with a comment "created in the NetBird dashboard after first boot — M5 enrolment".
- [ ] **Step 2:** `make check-vault` confirms structure + lists the `setup_key` placeholder.
- [ ] **Step 3:** Commit the vault.
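
The generate-or-stub convention above can be sketched in a few lines. This is an illustration under assumptions, not what `make check-vault` actually does: `generate_key` and `find_placeholders` are hypothetical helpers, and the 32-byte base64 key format (comparable to `openssl rand -base64 32`) should be confirmed against what the captured `configure.sh` produces for the pinned version.

```python
import base64
import secrets

PLACEHOLDER = "CHANGEME"


def generate_key(num_bytes=32):
    """Base64-encode `num_bytes` of output from the OS CSPRNG."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


def find_placeholders(node, prefix=""):
    """Return the dotted paths of every value that is still the CHANGEME stub."""
    if isinstance(node, dict):
        found = []
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            found += find_placeholders(value, path)
        return found
    if isinstance(node, list):
        found = []
        for index, value in enumerate(node):
            found += find_placeholders(value, f"{prefix}[{index}]")
        return found
    return [prefix] if node == PLACEHOLDER else []
```

Run against the decrypted vault structure, `find_placeholders` would be expected to report only `vault.netbird.setup_key` until M5 replaces it.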
---
### Task 4: Wire Caddy + DNS
- [ ] **Step 1:** Append to `reverse_proxy__routes` (`group_vars/all/reverse_proxy.yml`): `{host: netbird.askari.wingu.me, upstream: "<netbird container:port>"}` — per the captured Caddy template (NetBird needs HTTP/2 + gRPC; add the required Caddy directives, e.g. separate handles for the management gRPC path if the template shows them).
- [ ] **Step 2:** `netbird.askari.wingu.me` already resolves via the `*.askari.wingu.me` wildcard (M4a) — no new DNS record.
- [ ] **Step 3:** Commit.
---
### Task 5: Service-role standard files (ADR-004, authored)
- [ ] **Step 1:** Author `roles/netbird/SECURITY.md` (copy `docs/security/service-security-template.md`; record the public surface = Caddy 443 + Coturn 3478, embedded-Dex auth, accepted-risk R3).
- [ ] **Step 2:** `VERIFY.md` (copy the template; the `/verify-service` UI spec — run later when the playwright harness exists).
- [ ] **Step 3:** `ACCESS.md` (ADR-021; the dashboard/admin access + `access__*` intent).
- [ ] **Step 4:** `BACKUP.md` (ADR-022; the **datastore is stateful** → `backup__*` data; record that off-site backup is **pending `fisi`**, an accepted risk for now).
- [ ] **Step 5:** `make lint`; commit `docs(netbird): service-role standard files (SECURITY/VERIFY/ACCESS/BACKUP)`.
---
### Task 6: Add netbird to the offsite playbook
- [ ] **Step 1:** In `playbooks/offsite.yml`, add `netbird` after `reverse_proxy` (role-name tag). `make lint`. Commit.
---
### Task 7: Deploy to askari + verify (gated, live — expect debugging)
> NetBird self-hosting is finicky; budget for iterating on the management config + Caddy routing.
- [ ] **Step 1:** `make check PLAYBOOK=offsite LIMIT=askari TAGS=netbird` — review.
- [ ] **Step 2:** `make deploy PLAYBOOK=offsite LIMIT=askari TAGS=netbird`, then `make deploy ... TAGS=reverse_proxy` (Caddy reloads with the netbird route).
- [ ] **Step 3:** Verify: `docker compose ps` all healthy; `curl -sI https://netbird.askari.wingu.me` → 200 with the M4a cert; the **dashboard loads** in a browser; the management API responds. Iterate on config/routing until green.
- [ ] **Step 4:** No repo commit (host state).
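
While iterating in Step 3, the "all healthy" check can be scripted rather than read by eye. The sketch below parses the text produced by `docker compose ps --format json`; it assumes the rows carry `Service`, `State`, and `Health` fields and that the output is either a single JSON array or one JSON object per line, depending on the Compose version. `unhealthy_services` is a hypothetical helper.

```python
import json


def unhealthy_services(ps_output):
    """Return 'service: state=... health=...' for every container that is
    not running, or that has a healthcheck which is not yet healthy."""
    text = ps_output.strip()
    if not text:
        return []
    if text.startswith("["):
        rows = json.loads(text)  # one JSON array
    else:
        # one JSON object per line
        rows = [json.loads(line) for line in text.splitlines() if line.strip()]
    bad = []
    for row in rows:
        state = row.get("State", "")
        health = row.get("Health", "")  # empty when no healthcheck is defined
        if state != "running" or health not in ("", "healthy"):
            name = row.get("Service", row.get("Name", "?"))
            bad.append(f"{name}: state={state} health={health or 'n/a'}")
    return bad
```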
---
### Task 8: Docs
- [ ] **Step 1:** STATUS — `netbird` coordinator built + applied (dashboard live); the first service role. ROADMAP M4b done; **M5 (enrol) next**. `make lint`; commit.
---
## Self-Review (completed)
- **Spec coverage:** external-proxy NetBird + embedded Dex (Decisions 3) → Tasks 1,2,4; first service role + standard files (Decision 7) → Tasks 2,5; firewall 3478 (Decision 5) → done in M4a; setup key M5 + CHANGEME (Decision 8) → Task 3; Caddy front (M4a) → Task 4. Enrolment → M5, correct.
- **Placeholder scan:** the concrete config field *values* are intentionally captured from `configure.sh` (Task 1) rather than invented — version-sensitive, and inventing them would be wrong. The plan pins the method, not guesses.
- **Risk:** NetBird's external-proxy + gRPC routing is the hard part — Task 1 captures NetBird's own Caddy template to get it right, and Task 7 budgets for live iteration.