# netbird_coordinator
Self-hosted **NetBird coordinator** — the mesh-VPN control plane (ADR-016). Runs on
`askari` (the off-site Hetzner host) and is the rendezvous point every NetBird peer
talks to. Deployed via Docker Compose (ADR-004), behind the Caddy reverse proxy.
## Architecture — combined server
NetBird's self-hosted stack is now a **single combined server image** plus a separate
dashboard UI — there is no longer a separate signal / relay / coturn / dex container,
and no `turnserver.conf` / `management.json` / `openid-configuration.json`.
| Container | Image | Role |
|---|---|---|
| `netbird-server` | `netbirdio/netbird-server` | Management API + Signal + Relay + STUN + embedded Dex IdP (`/oauth2`), all in one process. Config at `/etc/netbird/config.yaml`. State in the `netbird_data` volume (SQLite). |
| `netbird-dashboard` | `netbirdio/dashboard` | Web UI. Configured purely by environment (`dashboard.env`); a public PKCE OIDC client, so its client secret is intentionally empty. |
Both containers join the **existing external `boma` Docker network** (created by the
`reverse_proxy` role's compose) so Caddy reaches them by container name. The only
host-exposed port is **`3478/udp` (STUN)**; HTTP/gRPC/WS traffic enters via Caddy over
the boma network, not via host ports.
### Reverse-proxy routing (added separately — M4a Caddy)
This role does **not** add the Caddy route. The route is a separate task and must
front several upstreams on `netbird-server` over the boma network, all to the same
backend:
- Native gRPC (signal + management), matched by **`Content-Type: application/grpc*`**
  (not by path) → `h2c://netbird-server:80`
- HTTP + WebSocket, paths `/relay*`, `/ws-proxy/*`, `/api/*`, `/oauth2/*` → `netbird-server:80`
- Dashboard catch-all, `/*` → `netbird-dashboard:80`
This matches NetBird's own external-proxy Caddy example: gRPC (the
`/management.ManagementService/*` + `/signalexchange.SignalExchange/*` services) is
selected by content-type rather than enumerated by path. gRPC needs HTTP/2 (h2c)
upstream support, and the long-lived WebSocket/gRPC streams must not be cut off by
short proxy timeouts (Caddy sets none by default).
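
The matching order described above can be sketched as a small Python function. This is only an illustration of the routing decision, not the Caddy configuration itself: the function name `pick_upstream` is hypothetical, and the prefix matching is a simplification of Caddy's path matchers.

```python
from typing import Optional

GRPC_UPSTREAM = "h2c://netbird-server:80"
HTTP_UPSTREAM = "netbird-server:80"
DASHBOARD_UPSTREAM = "netbird-dashboard:80"

# Path prefixes served by netbird-server over plain HTTP / WebSocket.
# "/relay*" is a bare prefix; the other patterns include the trailing slash.
SERVER_PREFIXES = ("/relay", "/ws-proxy/", "/api/", "/oauth2/")


def pick_upstream(path: str, content_type: Optional[str] = None) -> str:
    """Return the upstream a request would be proxied to (hypothetical helper)."""
    # 1. gRPC is selected by content type, never by path.
    if content_type and content_type.lower().startswith("application/grpc"):
        return GRPC_UPSTREAM
    # 2. HTTP and WebSocket endpoints on the combined server.
    if path.startswith(SERVER_PREFIXES):
        return HTTP_UPSTREAM
    # 3. Everything else falls through to the dashboard catch-all.
    return DASHBOARD_UPSTREAM
```

Checking the content type first means a gRPC call is routed correctly whatever its service path is, which is why the route does not need to enumerate `/management.ManagementService/*` or `/signalexchange.SignalExchange/*`.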
## Variables — `netbird_coordinator__*`
| Variable | Default | Description |
|---|---|---|
| `netbird_coordinator__server_image` | `netbirdio/netbird-server:0.72.4` | Combined server image (pinned; never `latest`) |
| `netbird_coordinator__dashboard_image` | `netbirdio/dashboard:v2.39.0` | Dashboard image (versioned independently of the server) |
| `netbird_coordinator__base_dir` | `/opt/services/netbird` | Working directory for the Compose project |
| `netbird_coordinator__domain` | `netbird.askari.wingu.me` | Public hostname; feeds `exposedAddress`, the OIDC issuer, redirect URIs, and the dashboard endpoints |
| `netbird_coordinator__trusted_proxies` | `["172.16.0.0/12"]` | Source ranges NetBird trusts `X-Forwarded-*` from (`server.reverseProxy.trustedHTTPProxies`). Must cover Caddy's source IP on the boma network — verify the actual bridge subnet at deploy |
| `netbird_coordinator__manage` | `true` | Set `false` in Molecule to render templates without a Docker daemon |
Production overrides live in `inventories/production/group_vars/`.
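
The "verify the actual bridge subnet at deploy" note for `netbird_coordinator__trusted_proxies` can be checked with a few lines of Python. This is a minimal sketch using only the standard `ipaddress` module. The helper name `is_covered` and the sample subnets are hypothetical; substitute the subnet reported for the `boma` network (for example, from `docker network inspect boma`).

```python
import ipaddress
from typing import Iterable


def is_covered(subnet: str, trusted_proxies: Iterable[str]) -> bool:
    """True if `subnet` lies entirely inside one of the trusted ranges."""
    candidate = ipaddress.ip_network(subnet)
    for entry in trusted_proxies:
        trusted = ipaddress.ip_network(entry)
        # subnet_of() raises TypeError across IP versions, so compare like with like.
        if candidate.version == trusted.version and candidate.subnet_of(trusted):
            return True
    return False


trusted = ["172.16.0.0/12"]  # the role default

# Hypothetical bridge subnets, for illustration only.
print(is_covered("172.18.0.0/16", trusted))    # True: inside 172.16.0.0/12
print(is_covered("192.168.32.0/20", trusted))  # False: the default would not cover it
```

If the check fails for the real subnet, override `netbird_coordinator__trusted_proxies` in the production group vars rather than widening the default.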
## Secrets
Two secrets come from the vault and are rendered into the host-side `config.yaml`
(mode 0640, `no_log`); they never touch the work tree or the dashboard:
- `vault.netbird.auth_secret` → `server.authSecret`
- `vault.netbird.datastore_key` → `server.store.encryptionKey` (base64; keep the padding)
The dashboard's OIDC client is a public PKCE client, so `AUTH_CLIENT_SECRET` is
intentionally empty — `dashboard.env` carries no secrets.
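
Because the datastore key must keep its base64 padding, a quick pre-flight check can catch a key that lost its trailing `=` in a copy-paste. The sketch below uses only the Python standard library. The helper name `is_valid_base64_key` is hypothetical, and the check covers decodability only, not the key length NetBird expects.

```python
import base64
import binascii


def is_valid_base64_key(value: str) -> bool:
    """True if `value` is non-empty, strictly valid base64 with intact padding."""
    try:
        # validate=True rejects characters outside the base64 alphabet;
        # missing or wrong padding raises binascii.Error.
        decoded = base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        return False
    return len(decoded) > 0
```

A key whose padding was stripped fails this check, so it can be rejected before it is rendered into `config.yaml`.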
## `netbird_coordinator__manage` toggle
Docker operations (`docker compose up`, the restart handler) are gated on
`netbird_coordinator__manage | bool`. Molecule sets it `false` so the role can be tested
(template rendering, directory creation) without a Docker daemon.