# Security — netbird_coordinator (NetBird control plane)

## Exposure

- **Published ports:**
  - `443/tcp` — **not host-published**; reached via the M4a Caddy reverse proxy on the
    `boma` Docker network. Caddy fronts the dashboard SPA, the management REST API
    (`/api`), the embedded Dex IdP (`/oauth2`), native gRPC over h2c (the management +
    signal services, matched by `Content-Type: application/grpc*`), and the relay
    WebSocket (`/relay*`, `/ws-proxy/*`). TLS terminates at Caddy (Let's Encrypt
    HTTP-01); upstreams listen plain `:80` on the internal network only.
  - `3478/udp` — **STUN, host-published directly** (`netbird-server`'s only host port),
    bypassing Caddy because STUN is UDP and not HTTP.
  - The **Hetzner Cloud Firewall already opens 80/443/3478** (done in M4a) — this role
    adds **no** new firewall change. The host nftables `firewall_catalog` (ADR-020)
    stays empty for askari; the cloud firewall is the authoritative edge here.
  - In-container only, never published: metrics `:9090`, healthcheck `:9000`.
- **Auth surface:** the **embedded Dex IdP** shipped inside `netbird-server` (served at
  `/oauth2`). The dashboard authenticates as a **public PKCE OIDC client**
  (`AUTH_CLIENT_ID=netbird-dashboard`, **no client secret** — intentionally empty). The
  management REST/gRPC API is behind Dex-issued JWTs. The **first admin user is created
  via a one-time `/setup` page on first boot**, reachable only while zero users exist;
  once an admin exists, `/setup` is closed. Peer enrolment uses **setup keys** minted in
  the dashboard after login (used in M5, not part of this provisioning).
- **Reachability:** public — askari is internet-facing. The HTTP surface is reachable
  only through Caddy (single public entry point, ADR-024); STUN/3478-udp is reachable
  directly on askari's public IP. The management API controls the whole mesh, so this is
  a deliberate public attack surface (see accepted risk **R3** below).
- **Data sensitivity:** **stateful** — holds the entire mesh control-plane state (peers,
  setup keys, ACLs, IdP users) in an **encrypted SQLite datastore** at `/var/lib/netbird`
  in the `netbird_data` volume. The datastore is encrypted with
  `vault.netbird.datastore_key`; a restore needs **both** the volume **and** that key.
  See backup record: `BACKUP.md` (`backup__state: true`).
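
The routing split described under **Published ports** can be sketched as a small classifier. This is a minimal illustration only: the function name, the backend labels, and the order in which the rules are checked are assumptions, and the role's actual Caddy configuration is not reproduced here. It shows the one non-obvious point from the list above, namely that gRPC is selected by the `Content-Type` header while everything else is selected by path.

```python
def classify(path: str, content_type: str = "") -> str:
    """Return a hypothetical backend label for a request reaching Caddy on 443.

    The labels and rule order are illustrative assumptions; only the match
    patterns (/api, /oauth2, /relay*, /ws-proxy/*, application/grpc*) come
    from the Exposure list.
    """
    # gRPC is matched on the header, not on the path.
    if content_type.startswith("application/grpc"):
        return "grpc"
    # Relay WebSocket: /relay* and /ws-proxy/*
    if path.startswith("/relay") or path.startswith("/ws-proxy/"):
        return "relay"
    # Management REST API
    if path == "/api" or path.startswith("/api/"):
        return "management-api"
    # Embedded Dex IdP
    if path == "/oauth2" or path.startswith("/oauth2/"):
        return "dex"
    # Everything else falls through to the dashboard SPA.
    return "dashboard"
```

A request with `Content-Type: application/grpc+proto` is classified as `grpc` whatever its path, while a path such as `/apiary` falls through to the dashboard because it only shares a prefix with `/api`.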

## Checklist status

Each item from `docs/security/service-checklist.md`:

- [x] Secrets in vault; no default creds; nothing secret in git/images — ✅ two secrets
  come from the vault (`vault.netbird.auth_secret`, `vault.netbird.datastore_key`),
  rendered into host-side `config.yaml` (mode `0640`, task `no_log: true`). No default
  creds: the first admin is bootstrapped interactively via `/setup`; the dashboard's
  OIDC client secret is intentionally empty (public PKCE), not a leaked credential.
- [x] Non-root; no `privileged`/host-network unless justified; minimal mounts; caps
  dropped — ⚠️ both containers run the upstream images' default user; no `privileged`,
  no host networking (bridge `boma`). `netbird-server` mounts the read-only `config.yaml`
  (`:ro`) and the `netbird_data` named volume; it publishes only `3478/udp`. Hardening
  is the upstream default; revisit if NetBird documents a rootless/cap-drop posture.
- [x] Ports declared; behind reverse proxy + auth if exposed; least-privilege
  inter-service reach — ✅ the HTTP surface (443) is behind Caddy + Dex auth; STUN/3478
  is intentionally direct (UDP, can't proxy) and opened only at the Hetzner Cloud
  Firewall (M4a). Containers reach Caddy by name on the `boma` network; nothing else is
  published.
- [x] Image pinned (tag/digest), update path known — ⚠️ stateful tier (ADR-011) — pinned
  to exact tags `netbirdio/netbird-server:0.72.4` and `netbirdio/dashboard:v2.39.0`, not
  yet `tag@digest`. Watched by DIUN; bumped deliberately on boma's cadence (ADR-011).
  Tighten to digests when convenient.
- [x] Logs reviewable; backup/restore covered if stateful — ✅ `docker logs netbird-server` /
  `netbird-dashboard` now (json-file driver capped at 500m×2 since the
  default never rotates), Loki labels declared for the ADR-018 pipeline. Stateful: backup
  is declared in `BACKUP.md` but **not yet captured** (pending the fisi pull node — see
  Residual risks).
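
The tag-versus-digest distinction in the image-pinning item can be checked mechanically. The following is a minimal sketch, not part of this role: `pin_level` is a hypothetical helper, and the all-`a` digest used in the usage note is a placeholder, not a real image digest. It relies only on the general shape of image references, where a tag can be re-pointed by the publisher while an `@sha256:` digest identifies fixed content.

```python
import re

# A sha256 digest suffix: '@sha256:' followed by 64 lowercase hex characters.
_DIGEST = re.compile(r"@sha256:[0-9a-f]{64}$")


def pin_level(image: str) -> str:
    """Classify an image reference as 'digest', 'tag', or 'floating'."""
    if _DIGEST.search(image):
        return "digest"
    # Look for a tag in the final path component only, so a registry port
    # such as 'localhost:5000/img' is not mistaken for a tag.
    name = image.rsplit("/", 1)[-1]
    tag = name.partition(":")[2]
    if tag and tag != "latest":
        return "tag"
    # No tag, or the mutable 'latest' tag.
    return "floating"
```

Under these assumptions, `pin_level("netbirdio/netbird-server:0.72.4")` returns `"tag"`, which matches the ⚠️ status above; appending `@sha256:` plus a 64-character hex digest to the same reference would return `"digest"`.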

## Service-specific hardening

- **Trusted-proxy pinning:** `server.reverseProxy.trustedHTTPProxies` is set from
  `netbird_coordinator__trusted_proxies` so NetBird honours `X-Forwarded-*` **only** from
  Caddy's source range on the `boma` bridge — rendered via `to_json` so an empty override
  becomes `[]` (trust nothing), never YAML `null`. Tighten the range to Caddy's actual
  container subnet at deploy (`docker network inspect boma`).
- **`/setup` self-closes:** the one-time admin-bootstrap page is reachable only while the
  IdP has zero users — creating the first admin closes the window, so there is no standing
  unauthenticated admin-creation route.
- **No standing unauthenticated admin surface:** the management REST/gRPC API requires a
  Dex-issued JWT; metrics (`:9090`) and healthcheck (`:9000`) are in-container only and
  never published (`access__api` describes the authenticated path).
- **Secrets never reach the dashboard or work tree:** `config.yaml` (with both secrets)
  is rendered `0640` with `no_log`; `dashboard.env` carries no secrets (public client).
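
The trusted-proxy rule above can be illustrated with a short sketch. Everything here is an assumption made for illustration: the function names, the `172.18.0.0/16` range in the usage note, and the choice to read the right-most `X-Forwarded-For` entry are not taken from NetBird's implementation. The sketch shows two things: an empty range list serialises to `[]` rather than a null value, and a forwarded header is honoured only when the direct peer sits inside a trusted range.

```python
import ipaddress
import json


def render_trusted_proxies(ranges):
    """Serialise the range list as JSON, so an empty list becomes '[]'."""
    return json.dumps(list(ranges))


def client_ip(peer_ip, forwarded_for, trusted_ranges):
    """Return the client address, honouring X-Forwarded-For only from trusted peers."""
    nets = [ipaddress.ip_network(r) for r in trusted_ranges]
    peer = ipaddress.ip_address(peer_ip)
    if forwarded_for and any(peer in net for net in nets):
        # Assumption: use the right-most entry, the one appended by the
        # trusted proxy itself.
        return forwarded_for.split(",")[-1].strip()
    # Untrusted peer or empty trust list: ignore the header entirely.
    return peer_ip
```

With `trusted_ranges=["172.18.0.0/16"]`, a request from peer `172.18.0.5` carrying `X-Forwarded-For: 203.0.113.7` resolves to `203.0.113.7`; the same header from a peer outside that range, or with an empty trust list, is ignored and the peer address is used instead.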

## Residual / accepted risks

- **Public mesh control plane on askari** — the management API + dashboard (443 via
  Caddy) and STUN (3478/udp) are exposed on askari's public IP; the management API
  controls the whole mesh. Accepted as **R3** in `docs/security/accepted-risks.md`
  (self-hosting = no third-party trust + an off-site control plane that survives a
  homelab outage). Mitigated by TLS + embedded-Dex login, trusted-proxy pinning, `base`
  hardening, and version-pinned NetBird patched on boma's cadence. Revisit per R3's
  trigger (a coordinator compromise / unpatched NetBird CVE, or the management plane
  becoming reachable without auth). *(Note: R3's text says "Coturn (UDP 3478)"; the
  v0.72.4 combined server actually exposes plain STUN on 3478/udp with no Coturn — same
  port and surface, no functional difference to the accepted risk.)*
- **Off-site backup not yet captured** — the service is stateful (`backup__state: true`)
  but the restic/`fisi` pull pipeline (ADR-022 Plan 2) is not built. Until then, the
  encrypted datastore is **not** backed up off-host: a loss of askari loses the mesh
  control-plane state (recoverable only by re-bootstrapping a fresh coordinator and
  re-enrolling peers). Accepted for now; revisit when `fisi` lands. See `BACKUP.md`.
- **Images pinned to tags, not digests** — stateful tier wants `tag@digest` (ADR-011);
  currently exact tags. Revisit when convenient.