boma/roles/netbird_coordinator/SECURITY.md
sjat 94dd6da14c docs(netbird): describe gRPC routing as the deployed Content-Type matcher
README/SECURITY said gRPC was path-matched (/management.ManagementService/* etc.);
the deployed Caddy route selects gRPC by Content-Type: application/grpc* (NetBird's
own external-proxy example). Reconciled the prose to what actually runs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 07:54:09 +02:00


Security — netbird_coordinator (NetBird control plane)

Exposure

  • Published ports:
    • 443/tcp: not host-published; reached via the M4a Caddy reverse proxy on the boma Docker network. Caddy fronts the dashboard SPA, the management REST API (/api), the embedded Dex IdP (/oauth2), native gRPC over h2c (the management + signal services, matched by Content-Type: application/grpc*), and the relay WebSocket (/relay*, /ws-proxy/*). TLS terminates at Caddy (Let's Encrypt HTTP-01); upstreams listen plain :80 on the internal network only.
    • 3478/udp: STUN, host-published directly (netbird-server's only host port), bypassing Caddy because STUN is UDP and not HTTP.
    • The Hetzner Cloud Firewall already opens 80/443/3478 (done in M4a) — this role adds no new firewall change. The host nftables firewall_catalog (ADR-020) stays empty for askari; the cloud firewall is the authoritative edge here.
    • In-container only, never published: metrics :9090, healthcheck :9000.
  • Auth surface: the embedded Dex IdP shipped inside netbird-server (served at /oauth2). The dashboard authenticates as a public PKCE OIDC client (AUTH_CLIENT_ID=netbird-dashboard, no client secret — intentionally empty). The management REST/gRPC API is behind Dex-issued JWTs. The first admin user is created via a one-time /setup page on first boot, reachable only while zero users exist; once an admin exists, /setup is closed. Peer enrolment uses setup keys minted in the dashboard after login (used in M5, not part of this provisioning).
  • Reachability: public — askari is internet-facing. The HTTP surface is reachable only through Caddy (single public entry point, ADR-024); STUN/3478-udp is reachable directly on askari's public IP. The management API controls the whole mesh, so this is a deliberate public attack surface (see accepted risk R3 below).
  • Data sensitivity: stateful — holds the entire mesh control-plane state (peers, setup keys, ACLs, IdP users) in an encrypted SQLite datastore at /var/lib/netbird in the netbird_data volume. The datastore is encrypted with vault.netbird.datastore_key; a restore needs both the volume and that key. See backup record: BACKUP.md (backup__state: true).
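The routing split described under Published ports can be modelled in a few lines. This is a minimal sketch, not the deployed Caddy configuration: the upstream labels, the evaluation order, and the treatment of /api and /oauth2 as path prefixes are assumptions made for illustration. The point it shows is that gRPC is selected by the Content-Type header rather than by request path.

```python
def classify_request(path: str, content_type: str = "") -> str:
    """Return a (hypothetical) upstream label for one request in this model."""
    # gRPC is matched on Content-Type: application/grpc*, whatever the path is.
    if content_type.lower().startswith("application/grpc"):
        return "grpc-h2c"
    # Relay WebSocket paths: /relay* and /ws-proxy/*.
    if path.startswith("/relay") or path.startswith("/ws-proxy/"):
        return "relay-websocket"
    # Management REST API (assumed to cover /api and everything below it).
    if path == "/api" or path.startswith("/api/"):
        return "management-rest"
    # Embedded Dex IdP (assumed to cover /oauth2 and everything below it).
    if path == "/oauth2" or path.startswith("/oauth2/"):
        return "dex-idp"
    # Everything else falls through to the dashboard SPA.
    return "dashboard-spa"
```

In this model a call to a path under /management.ManagementService/ with Content-Type application/grpc+proto is classified as gRPC because of the header alone; the same path sent with a non-gRPC Content-Type would fall through to the dashboard.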

Checklist status

Each item from docs/security/service-checklist.md:

  • Secrets in vault; no default creds; nothing secret in git/images — two secrets come from the vault (vault.netbird.auth_secret, vault.netbird.datastore_key), rendered into host-side config.yaml (mode 0640, task no_log: true). No default creds: the first admin is bootstrapped interactively via /setup; the dashboard's OIDC client secret is intentionally empty (public PKCE), not a leaked credential.
  • Non-root; no privileged/host-network unless justified; minimal mounts; caps dropped — ⚠️ both containers run the upstream images' default user; no privileged, no host networking (bridge boma). netbird-server mounts the read-only config.yaml (:ro) and the netbird_data named volume; it publishes only 3478/udp. Hardening is the upstream default; revisit if NetBird documents a rootless/cap-drop posture.
  • Ports declared; behind reverse proxy + auth if exposed; least-privilege inter-service reach — the HTTP surface (443) is behind Caddy + Dex auth; STUN/3478 is intentionally direct (UDP, can't proxy) and opened only at the Hetzner Cloud Firewall (M4a). Containers reach Caddy by name on the boma network; nothing else is published.
  • Image pinned (tag/digest), update path known — ⚠️ stateful tier (ADR-011) — pinned to exact tags netbirdio/netbird-server:0.72.4 and netbirdio/dashboard:v2.39.0, not yet tag@digest. Watched by DIUN; bumped deliberately on boma's cadence (ADR-011). Tighten to digests when convenient.
  • Logs reviewable; backup/restore covered if stateful — docker logs netbird-server / netbird-dashboard now (json-file driver capped at 500m×2 since the default never rotates), Loki labels declared for the ADR-018 pipeline. Stateful: backup is declared in BACKUP.md but not yet captured (pending the fisi pull node — see Residual risks).
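The 0640 mode on the rendered config.yaml can be spot-checked on the host. The helper below is a hypothetical sketch (its name and the idea of running it as a post-deploy check are assumptions, not part of the role); it only verifies that a file grants nothing beyond owner read/write and group read.

```python
import os
import stat


def is_mode_0640_or_stricter(path: str) -> bool:
    """True if the file's permission bits are a subset of 0640."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # Any bit outside rw-r----- (world access, execute, group write) fails.
    return (mode & ~0o640) == 0
```

A file at 0600 also passes, since it is stricter than 0640; a file at 0644 fails because it is world-readable.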

Service-specific hardening

  • Trusted-proxy pinning: server.reverseProxy.trustedHTTPProxies is set from netbird_coordinator__trusted_proxies so NetBird honours X-Forwarded-* only from Caddy's source range on the boma bridge — rendered via to_json so an empty override becomes [] (trust nothing), never YAML null. Tighten the range to Caddy's actual container subnet at deploy (docker network inspect boma).
  • /setup self-closes: the one-time admin-bootstrap page is reachable only while the IdP has zero users — first login closes the window, so there is no standing unauthenticated admin-creation route.
  • No standing unauthenticated admin surface: the management REST/gRPC API requires a Dex-issued JWT; metrics (:9090) and healthcheck (:9000) are in-container only and never published (access__api describes the authenticated path).
  • Secrets never reach the dashboard or work tree: config.yaml (with both secrets) is rendered 0640 with no_log; dashboard.env carries no secrets (public client).
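The trusted-proxy behaviour above has two parts that are easy to illustrate: serialising the range list as JSON so that an empty list stays [] (JSON is valid YAML flow syntax), and honouring forwarded headers only when the connection's source address falls inside a trusted range. This is a sketch of the idea in plain Python, not NetBird's or Ansible's implementation; the function names and the 172.18.0.0/16 subnet are hypothetical.

```python
import ipaddress
import json


def render_trusted_proxies(ranges) -> str:
    """Serialise the CIDR list as JSON; an empty list renders as [] and not null."""
    return json.dumps(list(ranges))


def is_trusted_source(source_ip: str, ranges) -> bool:
    """True if source_ip lies inside any trusted CIDR; an empty list trusts nothing."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in ranges)


# Hypothetical subnet for the boma bridge; the real one comes from
# `docker network inspect boma`.
EXAMPLE_RANGES = ["172.18.0.0/16"]
```

With EXAMPLE_RANGES, a request arriving from 172.18.0.5 would have its X-Forwarded-* headers honoured, while one arriving from a public address would not. With an empty list, no source is trusted.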

Residual / accepted risks

  • Public mesh control plane on askari — the management API + dashboard (443 via Caddy) and STUN (3478/udp) are exposed on askari's public IP; the management API controls the whole mesh. Accepted as R3 in docs/security/accepted-risks.md (self-hosting = no third-party trust + an off-site control plane that survives a homelab outage). Mitigated by TLS + embedded-Dex login, trusted-proxy pinning, base hardening, and version-pinned NetBird patched on boma's cadence. Revisit per R3's trigger (a coordinator compromise / unpatched NetBird CVE, or the management plane becoming reachable without auth). (Note: R3's text says "Coturn (UDP 3478)"; the v0.72.4 combined server actually exposes plain STUN on 3478/udp with no Coturn — same port and surface, no functional difference to the accepted risk.)
  • Off-site backup not yet captured — the service is stateful (backup__state: true) but the restic/fisi pull pipeline (ADR-022 Plan 2) is not built. Until then, the encrypted datastore is not backed up off-host: a loss of askari loses the mesh control-plane state (recoverable only by re-bootstrapping a fresh coordinator and re-enrolling peers). Accepted for now; revisit when fisi lands. See BACKUP.md.
  • Images pinned to tags, not digests — stateful tier wants tag@digest (ADR-011); currently exact tags. Revisit when convenient.
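The difference between an exact tag and tag@digest can be checked mechanically. The classifier below is a minimal sketch under stated assumptions: the function name and the three labels are hypothetical, and it only recognises sha256 digests. It could serve as a lint step when the pins are tightened.

```python
import re

# A digest pin ends in @sha256: followed by 64 lowercase hex characters.
_DIGEST = re.compile(r"@sha256:[0-9a-f]{64}$")


def pin_level(image_ref: str) -> str:
    """Classify an image reference as 'digest', 'tag', or 'floating'."""
    if _DIGEST.search(image_ref):
        return "digest"
    # Only a ':' in the last path component is a tag separator;
    # an earlier ':' would be a registry port.
    last = image_ref.rsplit("/", 1)[-1]
    if ":" in last and not last.endswith(":latest"):
        return "tag"
    return "floating"
```

Both current references, netbirdio/netbird-server:0.72.4 and netbirdio/dashboard:v2.39.0, classify as "tag" here; appending a real @sha256:... digest to either would move it to "digest".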