boma/docs/superpowers/plans/2026-06-14-m4b-netbird.md
sjat e3461375f5 docs(plan): M4b — NetBird coordinator service role
Capture NetBird's configure.sh reference for a pinned version → translate into
boma role templates (compose + management.json + dex/openid + turnserver),
external-proxy mode behind the M4a Caddy (netbird.askari.wingu.me). First service
role: full ADR-004 standard files; secrets generated/CHANGEME-stubbed (setup key
for M5). Gated live deploy + verify.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 18:20:04 +02:00


M4b — NetBird coordinator (service role) Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: superpowers:subagent-driven-development (recommended) or superpowers:executing-plans. Steps use - [ ] checkboxes.

Goal: Deploy the self-hosted NetBird control plane on askari as boma's first real service role (netbird), fronted by the M4a Caddy, reachable at https://netbird.askari.wingu.me with the embedded Dex login.

Architecture: NetBird's own configure.sh generates the canonical compose + config for a pinned version; boma captures that reference once and translates it into role templates (ADR-004/013 — don't run their imperative script in production, render from templates). Runs in external-reverse-proxy mode (no bundled Traefik); Caddy adds a netbird.askari.wingu.me route. Secrets (datastore encryption key, TURN password, Dex secrets) are generated into vault; the setup key is stubbed CHANGEME for M5.

Tech Stack: NetBird (combined netbird-server container if stable for the pinned version, else the multi-container set), embedded Dex IdP, Coturn, Docker Compose, Caddy (M4a), Ansible.

Spec: docs/superpowers/specs/2026-06-14-netbird-coordinator-m4-design.md · Prereq: M4a (Docker + Caddy) ✓ on askari.

Execution context: Task 1 runs configure.sh in a scratch dir (capture only). Tasks 2-6 are authoring work. Task 7 deploys live to askari (gated). NetBird self-hosting is finicky, so expect live debugging.


Task 1: Capture NetBird's reference setup (pin the version)

  • Step 1: Pick + pin the NetBird version (ADR-014 — check the latest stable release). Record it.
  • Step 2: In a scratch dir (on ubongo, throwaway), fetch NetBird's getting-started/configure.sh for that version and run it with answers for: domain netbird.askari.wingu.me, external reverse proxy (disable bundled Traefik/Caddy), embedded Dex (no external SSO), Let's Encrypt off (Caddy terminates TLS).
  • Step 3: Capture the generated files verbatim into the plan/notes: docker-compose.yml, management.json (or config.yaml), turnserver.conf, openid-configuration.json, dashboard env. Also capture NetBird's Caddy external-proxy template (their docs ship one) — it shows the exact upstreams + HTTP/2/gRPC routing the dashboard/management/signal/relay need.
  • Step 4: No commit (reference capture; informs Tasks 2-4).

Task 2: netbird service role — templates

Files: roles/netbird/ (scaffold via make new-role NAME=netbird): defaults/main.yml, tasks/main.yml, templates/{docker-compose.yml,management.json,turnserver.conf,openid-configuration.json,dashboard.env}.j2, handlers/main.yml, README.md.

  • Step 1: Translate the captured compose into templates/docker-compose.yml.j2 — containers, the shared boma Docker network (so Caddy reaches them by name), no host port mappings except what Caddy/Coturn need (Coturn 3478/udp; everything else internal, Caddy fronts it). Pin image tags (ADR-011).
  • Step 2: Translate management.json/config.yaml into a template — fill Datadir, DataStoreEncryptionKey ({{ vault.netbird.datastore_key }}), HttpConfig (public URL https://netbird.askari.wingu.me), TURNConfig (coturn host + {{ vault.netbird.turn_password }}), Signal, Relay, Store (sqlite), and the embedded-Dex IdP block (DeviceAuthorizationFlow/PKCE, openid-configuration.json URL).
  • Step 3: turnserver.conf.j2 (realm = netbird.askari.wingu.me, the TURN secret), openid-configuration.json.j2, dashboard.env.j2 (NETBIRD_MGMT_API_ENDPOINT=https://netbird.askari.wingu.me, the AUTH_* Dex values).
  • Step 4: defaults/main.yml (netbird__* knobs: version, base_dir /opt/services/netbird, domain) + tasks/main.yml (ADR-004 deploy mechanics: ensure dir, render all files, community.docker.docker_compose_v2 up; netbird__manage toggle for Molecule).
  • Step 5: make lint; commit feat(netbird): coordinator service role (compose + config templates).
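
Steps 4 and 5 above can be sketched roughly as follows. This is a minimal illustration, not the final role: the version string is a placeholder until Task 1 pins it, the file modes and the handler name `Restart netbird` are assumptions, and the list of rendered files simply mirrors the templates named in the Files line.

```yaml
# roles/netbird/defaults/main.yml (sketch)
netbird__version: "x.y.z"             # placeholder; the real pin comes from Task 1
netbird__base_dir: /opt/services/netbird
netbird__domain: netbird.askari.wingu.me
netbird__manage: true                  # set to false in Molecule to skip the compose step
---
# roles/netbird/tasks/main.yml (sketch)
- name: Ensure the NetBird base directory exists
  ansible.builtin.file:
    path: "{{ netbird__base_dir }}"
    state: directory
    mode: "0750"                       # assumed mode

- name: Render compose and config files from templates
  ansible.builtin.template:
    src: "{{ item }}.j2"
    dest: "{{ netbird__base_dir }}/{{ item }}"
    mode: "0640"                       # assumed mode; loosen if a container user cannot read it
  loop:
    - docker-compose.yml
    - management.json
    - turnserver.conf
    - openid-configuration.json
    - dashboard.env
  notify: Restart netbird              # hypothetical handler name in handlers/main.yml

- name: Bring the NetBird stack up
  community.docker.docker_compose_v2:
    project_src: "{{ netbird__base_dir }}"
    state: present
  when: netbird__manage | bool
```

Gating only the compose task on `netbird__manage` keeps template rendering testable in Molecule while leaving the container runtime out of the test run.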

Task 3: Secrets (CHANGEME convention + generated)

  • Step 1: Add to vault (make edit-vault): vault.netbird.datastore_key, vault.netbird.turn_password, any Dex client secret — generate strong values (or stub CHANGEME + a comment if operator-supplied). Add vault.netbird.setup_key: CHANGEME with a comment "created in the NetBird dashboard after first boot — M5 enrolment".
  • Step 2: make check-vault confirms structure + lists the setup_key placeholder.
  • Step 3: Commit the vault.
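
As a rough picture of the vault layout Step 1 implies, assuming the nested `vault.netbird.*` structure used in the Task 2 template references. The key `dex_client_secret` is a hypothetical name (the plan only says "any Dex client secret"), and the `<generated>` markers stand in for real generated values.

```yaml
# Decrypted view of the vault (sketch; edit via make edit-vault)
vault:
  netbird:
    datastore_key: "<generated>"       # strong random value, generated once
    turn_password: "<generated>"       # shared with turnserver.conf.j2
    dex_client_secret: "<generated>"   # hypothetical key name; only if the captured config needs one
    # created in the NetBird dashboard after first boot (M5 enrolment)
    setup_key: CHANGEME
```

Keeping `setup_key` as a literal `CHANGEME` is what lets `make check-vault` list it as an outstanding placeholder in Step 2.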

Task 4: Wire Caddy + DNS

  • Step 1: Append to reverse_proxy__routes (group_vars/all/reverse_proxy.yml): {host: netbird.askari.wingu.me, upstream: "<netbird container:port>"} — per the captured Caddy template (NetBird needs HTTP/2 + gRPC; add the required Caddy directives, e.g. separate handles for the management gRPC path if the template shows them).
  • Step 2: netbird.askari.wingu.me already resolves via the *.askari.wingu.me wildcard (M4a) — no new DNS record.
  • Step 3: Commit.
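
The route from Step 1 would look something like this in the vars file. The `host`/`upstream` keys come from the plan itself; the upstream value is deliberately left as the plan's placeholder, and any extra keys for gRPC handling are omitted here because they depend on what the captured Caddy template shows.

```yaml
# group_vars/all/reverse_proxy.yml (sketch)
reverse_proxy__routes:
  # ...existing M4a routes stay above this entry...
  - host: netbird.askari.wingu.me
    upstream: "<netbird container:port>"   # fill in from the Task 1 capture
```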

Task 5: Service-role standard files (ADR-004, authored)

  • Step 1: Author roles/netbird/SECURITY.md (copy docs/security/service-security-template.md; record the public surface = Caddy 443 + Coturn 3478, embedded-Dex auth, accepted-risk R3).
  • Step 2: VERIFY.md (copy the template; the /verify-service UI spec — run later when the playwright harness exists).
  • Step 3: ACCESS.md (ADR-021; the dashboard/admin access + access__* intent).
  • Step 4: BACKUP.md (ADR-022; the datastore is stateful → backup__* data; record that off-site backup is pending fisi, an accepted risk for now).
  • Step 5: make lint; commit docs(netbird): service-role standard files (SECURITY/VERIFY/ACCESS/BACKUP).

Task 6: Add netbird to the offsite playbook

  • Step 1: In playbooks/offsite.yml, add netbird after reverse_proxy (role-name tag). make lint. Commit.
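
A sketch of the resulting role list, assuming the playbook uses a plain `roles:` section with one tag per role. The `hosts` value is an assumption, and roles that precede `reverse_proxy` are elided.

```yaml
# playbooks/offsite.yml (sketch)
- hosts: offsite                 # assumed group name
  roles:
    # ...earlier roles unchanged...
    - role: reverse_proxy
      tags: [reverse_proxy]
    - role: netbird              # new: runs after the proxy role
      tags: [netbird]
```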

Task 7: Deploy to askari + verify (gated, live — expect debugging)

NetBird self-hosting is finicky; budget for iterating on the management config + Caddy routing.

  • Step 1: make check PLAYBOOK=offsite LIMIT=askari TAGS=netbird — review.
  • Step 2: make deploy PLAYBOOK=offsite LIMIT=askari TAGS=netbird, then make deploy ... TAGS=reverse_proxy (Caddy reloads with the netbird route).
  • Step 3: Verify: docker compose ps shows all containers healthy; curl -sI https://netbird.askari.wingu.me → 200 with the M4a cert; the dashboard loads in a browser; the management API responds. Iterate on config/routing until green.
  • Step 4: No repo commit (host state).
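
The curl check in Step 3 could also be expressed as a small Ansible play, which is handy while iterating. This is only a sketch: running it from `localhost` and the retry count and delay are assumptions, and it covers the HTTPS 200 check only, not container health or the management API.

```yaml
# Hypothetical smoke-check play (not part of the plan's file list)
- name: Smoke-check the NetBird dashboard through Caddy
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Dashboard answers 200 over a valid TLS certificate
      ansible.builtin.uri:
        url: "https://netbird.askari.wingu.me"
        method: HEAD
        status_code: 200
        validate_certs: true
      register: netbird_probe
      retries: 5                     # assumed budget while containers start
      delay: 10
      until: netbird_probe is succeeded
```

With `validate_certs: true`, a certificate problem fails the task the same way a non-200 response does, so one probe covers both halves of the curl check.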

Task 8: Docs

  • Step 1: STATUS — netbird coordinator built + applied (dashboard live); the first service role. ROADMAP M4b done; M5 (enrol) next. make lint; commit.

Self-Review (completed)

  • Spec coverage: external-proxy NetBird + embedded Dex (Decisions 3) → Tasks 1,2,4; first service role + standard files (Decision 7) → Tasks 2,5; firewall 3478 (Decision 5) → done in M4a; setup key M5 + CHANGEME (Decision 8) → Task 3; Caddy front (M4a) → Task 4. Enrolment → M5, correct.
  • Placeholder scan: the concrete config field values are intentionally captured from configure.sh (Task 1) rather than invented — version-sensitive, and inventing them would be wrong. The plan pins the method, not guesses.
  • Risk: NetBird's external-proxy + gRPC routing is the hard part — Task 1 captures NetBird's own Caddy template to get it right, and Task 7 budgets for live iteration.