README/SECURITY said gRPC was path-matched (/management.ManagementService/* etc.); the deployed Caddy route selects gRPC by Content-Type: application/grpc* (NetBird's own external-proxy example). Reconciled the prose to what actually runs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Security — netbird_coordinator (NetBird control plane)
Exposure
- Published ports:
  - `443/tcp`: not host-published; reached via the M4a Caddy reverse proxy on the `boma` Docker network. Caddy fronts the dashboard SPA, the management REST API (`/api`), the embedded Dex IdP (`/oauth2`), native gRPC over h2c (the management + signal services, matched by `Content-Type: application/grpc*`), and the relay WebSocket (`/relay*`, `/ws-proxy/*`). TLS terminates at Caddy (Let's Encrypt HTTP-01); upstreams listen plain on `:80` on the internal network only.
  - `3478/udp`: STUN, host-published directly (`netbird-server`'s only host port), bypassing Caddy because STUN is UDP and not HTTP.
- The Hetzner Cloud Firewall already opens 80/443/3478 (done in M4a), so this role adds no new firewall change. The host nftables `firewall_catalog` (ADR-020) stays empty for askari; the cloud firewall is the authoritative edge here.
- In-container only, never published: metrics `:9090`, healthcheck `:9000`.
- Auth surface: the embedded Dex IdP shipped inside `netbird-server` (served at `/oauth2`). The dashboard authenticates as a public PKCE OIDC client (`AUTH_CLIENT_ID=netbird-dashboard`; the client secret is intentionally empty). The management REST/gRPC API is behind Dex-issued JWTs. The first admin user is created via a one-time `/setup` page on first boot, reachable only while zero users exist; once an admin exists, `/setup` is closed. Peer enrolment uses setup keys minted in the dashboard after login (used in M5, not part of this provisioning).
- Reachability: public; askari is internet-facing. The HTTP surface is reachable only through Caddy (single public entry point, ADR-024); STUN on `3478/udp` is reachable directly on askari's public IP. The management API controls the whole mesh, so this is a deliberate public attack surface (see accepted risk R3 below).
- Data sensitivity: stateful. Holds the entire mesh control-plane state (peers, setup keys, ACLs, IdP users) in an encrypted SQLite datastore at `/var/lib/netbird` in the `netbird_data` volume. The datastore is encrypted with `vault.netbird.datastore_key`; a restore needs both the volume and that key. See backup record: `BACKUP.md` (`backup__state: true`).
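
The routing split described under "Published ports" can be illustrated with a small sketch. It is not the deployed Caddy configuration: the function name, the returned labels, and the order in which the checks run are assumptions made for illustration. Only the match criteria (`Content-Type: application/grpc*`, `/relay*`, `/ws-proxy/*`, `/api`, `/oauth2`) come from the text above.

```python
from typing import Optional


def route_for(path: str, content_type: Optional[str]) -> str:
    """Classify a request the way the prose describes the proxy's routes.

    Hypothetical helper: labels and check order are illustrative only.
    """
    ct = (content_type or "").strip().lower()
    # gRPC is selected by header, not by path, so any path qualifies.
    if ct.startswith("application/grpc"):
        return "grpc"
    # Relay WebSocket routes (/relay*, /ws-proxy/*).
    if path.startswith("/relay") or path.startswith("/ws-proxy/"):
        return "relay"
    # Management REST API.
    if path == "/api" or path.startswith("/api/"):
        return "management-api"
    # Embedded Dex IdP.
    if path == "/oauth2" or path.startswith("/oauth2/"):
        return "dex"
    # Everything else falls through to the dashboard SPA.
    return "dashboard"
```

The point of matching on the header is that a gRPC call is recognised regardless of its service path, so the proxy needs no list of gRPC service names.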
Checklist status
Each item from docs/security/service-checklist.md:
- Secrets in vault; no default creds; nothing secret in git/images: ✅ two secrets come from the vault (`vault.netbird.auth_secret`, `vault.netbird.datastore_key`), rendered into host-side `config.yaml` (mode `0640`, task `no_log: true`). No default creds: the first admin is bootstrapped interactively via `/setup`; the dashboard's OIDC client secret is intentionally empty (public PKCE), not a leaked credential.
- Non-root; no `privileged`/host-network unless justified; minimal mounts; caps dropped: ⚠️ both containers run the upstream images' default user; no `privileged`, no host networking (bridge `boma`). `netbird-server` mounts the read-only `config.yaml` (`:ro`) and the `netbird_data` named volume; it publishes only `3478/udp`. Hardening is the upstream default; revisit if NetBird documents a rootless/cap-drop posture.
- Ports declared; behind reverse proxy + auth if exposed; least-privilege inter-service reach: ✅ the HTTP surface (443) is behind Caddy + Dex auth; STUN/3478 is intentionally direct (UDP, can't proxy) and opened only at the Hetzner Cloud Firewall (M4a). Containers reach Caddy by name on the `boma` network; nothing else is published.
- Image pinned (tag/digest), update path known: ⚠️ stateful tier (ADR-011), pinned to exact tags `netbirdio/netbird-server:0.72.4` and `netbirdio/dashboard:v2.39.0`, not yet `tag@digest`. Watched by DIUN; bumped deliberately on boma's cadence (ADR-011). Tighten to digests when convenient.
- Logs reviewable; backup/restore covered if stateful: ✅ `docker logs netbird-server` / `netbird-dashboard` available now (json-file driver capped at 500m×2 since the default never rotates), Loki labels declared for the ADR-018 pipeline. Stateful: backup is declared in `BACKUP.md` but not yet captured (pending the `fisi` pull node; see Residual risks).
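
To make the tag versus `tag@digest` distinction in the image-pinning item concrete, here is a minimal sketch of a check that classifies an image reference. The function name and the three labels are hypothetical, and treating only `latest` (or a missing tag) as floating is a simplifying assumption; it is not part of this role.

```python
import re

# A sha256 digest suffix, e.g. "...@sha256:<64 hex chars>".
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def pin_level(image_ref: str) -> str:
    """Return "digest", "tag", or "floating" for an image reference."""
    if DIGEST_RE.search(image_ref):
        return "digest"
    # Only look for a tag after the last "/", so a registry port
    # such as "registry:5000/foo" is not mistaken for a tag.
    name = image_ref.rsplit("/", 1)[-1]
    if ":" in name:
        tag = name.rsplit(":", 1)[1]
        return "floating" if tag == "latest" else "tag"
    return "floating"
```

Under this sketch the two references named above classify as `"tag"`, which matches the ⚠️ status: they are exact, but the content behind a tag can still change upstream.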
Service-specific hardening
- Trusted-proxy pinning: `server.reverseProxy.trustedHTTPProxies` is set from `netbird_coordinator__trusted_proxies` so NetBird honours `X-Forwarded-*` only from Caddy's source range on the `boma` bridge. It is rendered via `to_json` so an empty override becomes `[]` (trust nothing), never YAML `null`. Tighten the range to Caddy's actual container subnet at deploy (`docker network inspect boma`).
- `/setup` self-closes: the one-time admin-bootstrap page is reachable only while the IdP has zero users. First login closes the window, so there is no standing unauthenticated admin-creation route.
- No standing unauthenticated admin surface: the management REST/gRPC API requires a Dex-issued JWT; metrics (`:9090`) and healthcheck (`:9000`) are in-container only and never published (`access__api` describes the authenticated path).
- Secrets never reach the dashboard or work tree: `config.yaml` (with both secrets) is rendered `0640` with `no_log`; `dashboard.env` carries no secrets (public client).
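
The trusted-proxy item above relies on two properties: an empty list serialises to `[]`, and forwarded headers are honoured only when the peer address falls inside a configured range. The sketch below illustrates both with the Python standard library. It is not NetBird's implementation or Ansible's `to_json` filter; the helper names and the `172.18.0.0/16` subnet are assumptions (the real range should come from `docker network inspect boma`).

```python
import ipaddress
import json
from typing import Iterable, List


def render_trusted_proxies(cidrs: List[str]) -> str:
    """Serialise the proxy list as JSON; an empty list yields "[]"."""
    return json.dumps(list(cidrs))


def is_trusted(peer_ip: str, cidrs: Iterable[str]) -> bool:
    """True only if peer_ip lies inside one of the configured CIDRs.

    An empty list trusts nothing, which is the safe default.
    """
    addr = ipaddress.ip_address(peer_ip)
    return any(addr in ipaddress.ip_network(c) for c in cidrs)


# Hypothetical bridge subnet, for illustration only.
EXAMPLE_PROXIES = ["172.18.0.0/16"]
```

With `EXAMPLE_PROXIES`, a request arriving from `172.18.0.5` would have its `X-Forwarded-*` headers honoured, while one from `203.0.113.7` would not; with an empty list, no source is trusted.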
Residual / accepted risks
- Public mesh control plane on askari: the management API + dashboard (443 via Caddy) and STUN (3478/udp) are exposed on askari's public IP; the management API controls the whole mesh. Accepted as R3 in `docs/security/accepted-risks.md` (self-hosting = no third-party trust + an off-site control plane that survives a homelab outage). Mitigated by TLS + embedded-Dex login, trusted-proxy pinning, `base` hardening, and version-pinned NetBird patched on boma's cadence. Revisit per R3's trigger (a coordinator compromise / unpatched NetBird CVE, or the management plane becoming reachable without auth). (Note: R3's text says "Coturn (UDP 3478)"; the v0.72.4 combined server actually exposes plain STUN on 3478/udp with no Coturn. The port and surface are the same, so there is no functional difference to the accepted risk.)
- Off-site backup not yet captured: the service is stateful (`backup__state: true`) but the restic/`fisi` pull pipeline (ADR-022 Plan 2) is not built. Until then, the encrypted datastore is not backed up off-host: a loss of askari loses the mesh control-plane state (recoverable only by re-bootstrapping a fresh coordinator and re-enrolling peers). Accepted for now; revisit when `fisi` lands. See `BACKUP.md`.
- Images pinned to tags, not digests: stateful tier wants `tag@digest` (ADR-011); currently exact tags. Revisit when convenient.