# netbird_coordinator
Self-hosted **NetBird coordinator** — the mesh-VPN control plane (ADR-016). Runs on
`askari` (the off-site Hetzner host) and is the rendezvous point every NetBird peer
talks to. Deployed via Docker Compose (ADR-004), behind the Caddy reverse proxy.
## Architecture — combined server
NetBird's self-hosted stack is now a **single combined server image** plus a separate
dashboard UI — there is no longer a separate signal / relay / coturn / dex container,
and no `turnserver.conf` / `management.json` / `openid-configuration.json`.
| Container | Image | Role |
|---|---|---|
| `netbird-server` | `netbirdio/netbird-server` | Management API + Signal + Relay + STUN + embedded Dex IdP (`/oauth2`), all in one process. Config at `/etc/netbird/config.yaml`. State in the `netbird_data` volume (SQLite). |
| `netbird-dashboard` | `netbirdio/dashboard` | Web UI. Configured purely by environment (`dashboard.env`); a public PKCE OIDC client, so its client secret is intentionally empty. |

Both containers join the **existing external `boma` Docker network** (created by the
`reverse_proxy` role's compose) so Caddy reaches them by container name. The only
host-exposed port is **`3478/udp` (STUN)**; HTTP/gRPC/WS traffic enters via Caddy over
the boma network, not via host ports.
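
The layout above could be expressed in a Compose file roughly as follows. This is a minimal sketch, not the role's actual template: the image tags are the defaults from the variables table, while the container-side data path (`/var/lib/netbird`), the container STUN port, the `restart` policy, and the relative file locations are assumptions made for illustration.

```yaml
# docker-compose.yml (sketch; mount targets and file layout are assumptions)
services:
  netbird-server:
    image: netbirdio/netbird-server:0.72.4
    container_name: netbird-server      # Caddy reaches it by this name
    restart: unless-stopped             # assumed policy
    ports:
      - "3478:3478/udp"                 # STUN, the only host-exposed port
    volumes:
      - ./config.yaml:/etc/netbird/config.yaml:ro
      - netbird_data:/var/lib/netbird   # assumed data path inside the container
    networks:
      - boma

  netbird-dashboard:
    image: netbirdio/dashboard:v2.39.0
    container_name: netbird-dashboard
    restart: unless-stopped             # assumed policy
    env_file:
      - dashboard.env                   # no secrets; public PKCE client
    networks:
      - boma

volumes:
  netbird_data:

networks:
  boma:
    external: true                      # created by the reverse_proxy role
```

Marking `boma` as `external` means Compose will not create the network and will fail if it is missing, so the `reverse_proxy` role has to be applied first.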
### Reverse-proxy routing (added separately — M4a Caddy)
This role does **not** add the Caddy route. The route is a separate task and must
front several path groups over the boma network. All but the last go to the same
`netbird-server` backend:

- HTTP: `/api/*`, `/oauth2/*`
- Native gRPC (h2c): `/signalexchange.SignalExchange/*`, `/management.ManagementService/*`
- WebSocket: `/relay*`, `/ws-proxy/*` (upgrade + long timeouts)
- Dashboard catch-all: `/*` → `netbird-dashboard`

The gRPC routes need cleartext HTTP/2 (h2c) support on the upstream connection, and both
the WebSocket and gRPC routes need extended timeouts.
## Variables — `netbird_coordinator__*`
| Variable | Default | Description |
|---|---|---|
| `netbird_coordinator__server_image` | `netbirdio/netbird-server:0.72.4` | Combined server image (pinned; never `latest`) |
| `netbird_coordinator__dashboard_image` | `netbirdio/dashboard:v2.39.0` | Dashboard image (versioned independently of the server) |
| `netbird_coordinator__base_dir` | `/opt/services/netbird` | Working directory for the Compose project |
| `netbird_coordinator__domain` | `netbird.askari.wingu.me` | Public hostname; feeds `exposedAddress`, the OIDC issuer, redirect URIs, and the dashboard endpoints |
| `netbird_coordinator__trusted_proxies` | `["172.16.0.0/12"]` | Source ranges NetBird trusts `X-Forwarded-*` from (`server.reverseProxy.trustedHTTPProxies`). Must cover Caddy's source IP on the boma network — verify the actual bridge subnet at deploy |
| `netbird_coordinator__manage` | `true` | Set `false` in Molecule to render templates without a Docker daemon |

Production overrides live in `inventories/production/group_vars/`.
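
A production override might look like the sketch below. The file name and both values are hypothetical placeholders. Only the variable names come from the table above.

```yaml
# inventories/production/group_vars/all/netbird.yml  (hypothetical file name)

# Placeholder hostname; replace with the real public name of the coordinator.
netbird_coordinator__domain: netbird.example.org

# Placeholder subnet; narrow the default 172.16.0.0/12 to the actual boma bridge range.
netbird_coordinator__trusted_proxies:
  - 172.18.0.0/16
```

The bridge subnet can be read on the host with `docker network inspect boma`. Narrowing the list to that range reduces the set of sources whose `X-Forwarded-*` headers NetBird will accept.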
## Secrets
Two secrets come from the vault and are rendered into the host-side `config.yaml`
(mode 0640, `no_log`); they never touch the work tree or the dashboard:

- `vault.netbird.auth_secret` → `server.authSecret`
- `vault.netbird.datastore_key` → `server.store.encryptionKey` (base64; keep the padding)

The dashboard's OIDC client is a public PKCE client, so `AUTH_CLIENT_SECRET` is
intentionally empty, and `dashboard.env` carries no secrets.
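
The mapping above could be wired up as sketched here. The template name, the destination under `netbird_coordinator__base_dir`, and the task wording are assumptions. The key paths, file mode, and `no_log` come from this section.

```yaml
# templates/config.yaml.j2 (hypothetical name): secret-bearing keys only
server:
  authSecret: "{{ vault.netbird.auth_secret }}"
  store:
    # Base64 value; quoting keeps the trailing '=' padding intact.
    encryptionKey: "{{ vault.netbird.datastore_key }}"
---
# tasks/main.yml (sketch): render the template without logging its contents
- name: Render NetBird server config
  ansible.builtin.template:
    src: config.yaml.j2
    dest: "{{ netbird_coordinator__base_dir }}/config.yaml"   # assumed location
    mode: "0640"
  no_log: true
```

Quoting the mode as `"0640"` avoids YAML reading it as a decimal integer.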
## `netbird_coordinator__manage` toggle
Docker operations (`docker compose up`, the restart handler) are gated on
`netbird_coordinator__manage | bool`. Molecule sets it `false` so the role can be tested
(template rendering, directory creation) without a Docker daemon.
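
The gating described above might be written as in the following sketch. It assumes the role drives Compose through the `community.docker.docker_compose_v2` module, which the README does not confirm. The task names and the Molecule play layout are likewise illustrative.

```yaml
# tasks/main.yml (sketch): skipped entirely when the toggle is false
- name: Start the NetBird stack
  community.docker.docker_compose_v2:
    project_src: "{{ netbird_coordinator__base_dir }}"
    state: present
  when: netbird_coordinator__manage | bool
---
# molecule/default/converge.yml (sketch): templates and directories only
- name: Converge
  hosts: all
  roles:
    - role: netbird_coordinator
      vars:
        netbird_coordinator__manage: false
```

The `| bool` filter matters when the value arrives as a string, for example from `-e netbird_coordinator__manage=false` on the command line, where a bare string `"false"` would otherwise be truthy.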