Document the 2026-06-18 incident class: a road-warrior laptop losing DNS on a network transition strands NetBird (can't resolve the coordinator FQDN), taking ubongo unreachable until DNS recovers. Adds triage (local DNS vs coordinator), device mitigations (reliable resolvers + hosts-file pin), the non-mesh LAN break-glass to ubongo, and why ubongo is relay-only (deferred mesh-hardening, not a bug) — including the break-glass rule that hardening must preserve. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Runbook — Enrolling a NetBird client (road-warrior device)
Joins a client/road-warrior device (laptop, desktop, phone) to the boma NetBird mesh
so it can reach ubongo and other peers from anywhere. The self-hosted coordinator is on
askari (ADR-016, M4b); enrollment lands a device on the 100.64.0.0/10 overlay.
Hosts vs clients. Managed Linux hosts join via the base role's mesh concern
(base__mesh_enabled: true + the reusable key in vault.netbird.setup_key); see ADR-016 / the base README, not this runbook. This runbook is for user devices that are not managed with Ansible.
verified: NetBird client install + self-hosted --management-url flow · docs.netbird.io
(/get-started/install/windows, /get-started/cli) · 2026-06-17
Prerequisites
- The coordinator's first-boot /setup admin exists and you can log in at https://netbird.askari.wingu.me.
- Auth, pick one:
  - SSO (recommended for a personal device): your dashboard account; no secret to copy.
  - Setup key: dashboard → Settings → Setup Keys → a reusable key (mint a client-specific one for clean ACL grouping, or reuse the existing reusable key).
- Local admin rights on the device (the client installs a service).
- Coordinator facts: management URL https://netbird.askari.wingu.me; ubongo = 100.99.146.14 (ubongo.netbird.selfhosted); askari = 100.99.226.39.
Part A — Windows 11
- Install: download + run the MSI https://pkgs.netbird.io/windows/msi/x64 (official x64 client; installs the tray app + the netbird service).
- Connect from an elevated Windows Terminal / PowerShell ("Run as administrator"):
  netbird up --management-url https://netbird.askari.wingu.me
  A browser opens; sign in with your dashboard account. (SSO won't open a browser? Use a key: netbird up --setup-key <KEY> --management-url https://netbird.askari.wingu.me.)
- Proceed to Part C (verify).
Part B — Other platforms (same management URL)
- macOS / Linux desktop: install the client (macOS: NetBird app / Homebrew; Linux: pkgs.netbird.io per the distro, same apt/rpm flow as base's mesh concern), then netbird up --management-url https://netbird.askari.wingu.me (Linux: prefix sudo).
- Android / iOS: install the NetBird app, then in Settings → Advanced / Server set the management server to https://netbird.askari.wingu.me before logging in; connect and complete the SSO login. (Setup keys are supported in-app too.)
Part C — Verify + use
netbird status # expect: Management: Connected, Signal: Connected, a 100.x NetBird IP
netbird status -d # peer detail — ubongo (100.99.146.14) + askari (100.99.226.39) listed
Reach ubongo over the mesh:
ssh sjat@100.99.146.14 # or: ssh sjat@ubongo.netbird.selfhosted
SSH auth is separate from the mesh: ubongo is key-only (passwords disabled), so the
device needs an SSH key authorised for sjat@ubongo. The mesh provides the network path;
the SSH key provides auth.
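The status expectation above can be turned into a scriptable check. The sketch below is illustrative only: mesh_control_plane_ok is a hypothetical helper name, and it assumes the netbird status output contains lines beginning with "Management:" and "Signal:" followed by "Connected", as the expected output in this runbook suggests.

```shell
#!/bin/sh
# Sketch: decide whether `netbird status` output looks healthy.
# Assumption: the output has "Management: Connected" and "Signal: Connected"
# lines (possibly indented or followed by more text).

# Reads status text on stdin; succeeds only if both control-plane links
# are reported as Connected.
mesh_control_plane_ok() {
  status_text="$(cat)"
  printf '%s\n' "$status_text" |
    grep -Eq '^[[:space:]]*Management:[[:space:]]*Connected' &&
    printf '%s\n' "$status_text" |
    grep -Eq '^[[:space:]]*Signal:[[:space:]]*Connected'
}

# Usage (on an enrolled device):
#   netbird status | mesh_control_plane_ok && echo "control plane OK"
```

A "Disconnected" state does not match, because the pattern requires "Connected" directly after the colon and whitespace.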
Troubleshooting — mesh drops / SSH to ubongo times out
Symptom: SSH to ubongo (or any peer) times out for minutes and recovers on its own;
netbird status shows Management/Signal: Disconnected or peers stuck Connecting.
verified: client DNS/relay behaviour + NRPT scope read from a 0.72.4 debug bundle;
mitigations per docs.netbird.io (/manage/dns/troubleshooting,
/help/troubleshooting-client) · 2026-06-18
1. Triage — is it your device or the coordinator? On the device:
netbird status -d # Management/Signal Connected? peers P2P/Relayed?
nslookup netbird.askari.wingu.me # coordinator FQDN
nslookup pkgs.netbird.io # a PUBLIC name — control test
If the relay/handshake errors say lookup netbird.askari.wingu.me: no such host and
a public name (pkgs.netbird.io) also fails to resolve, your local resolver is
dead — the coordinator and ubongo are almost certainly fine. NetBird only manages
*.netbird.selfhosted resolution (a single NRPT rule), so it is not the cause.
Confirm from the other side if you can: the dashboard shows peer last-seen; askari/
ubongo staying green ⇒ the fault is your device's network.
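The triage logic above can be sketched as a small decision helper. This is a hedged illustration, not part of NetBird: classify_dns and resolves are hypothetical names, and only the "both fail" outcome is stated by this runbook; the labels for the other three outcomes are an interpretation. The resolves helper assumes a Linux device with getent, which also consults /etc/hosts, so a hosts-file pin will make the coordinator name report as resolvable; use nslookup by hand where getent is unavailable.

```shell
#!/bin/sh
# Sketch: classify the two DNS lookups from the triage step.

# classify_dns COORDINATOR_OK PUBLIC_OK   (each argument is "yes" or "no")
classify_dns() {
  case "$1/$2" in
    no/no)   echo "local-resolver-down" ;;    # the case this runbook describes
    no/yes)  echo "coordinator-name-only" ;;  # interpretation: only that name fails
    yes/yes) echo "dns-ok" ;;                 # interpretation: look beyond DNS
    yes/no)  echo "public-dns-failing" ;;     # interpretation: e.g. hosts pin active
    *)       echo "usage: classify_dns yes|no yes|no" >&2; return 2 ;;
  esac
}

# resolves NAME: prints "yes" if the system resolver (including /etc/hosts)
# returns an address for NAME, otherwise "no". Linux/getent assumption.
resolves() {
  if getent hosts "$1" >/dev/null 2>&1; then echo yes; else echo no; fi
}

# Usage:
#   classify_dns "$(resolves netbird.askari.wingu.me)" "$(resolves pkgs.netbird.io)"
```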
Why it cascades: NetBird re-resolves the coordinator FQDN on every reconnect. A
network transition (Wi-Fi ↔ phone hotspot, sleep/wake) that briefly kills DNS means it
can't reach management/signal/relay — and since ubongo is relay-only (below), there
is no direct path to fall back to, so SSH dies until DNS recovers.
2. Make the device resilient:
- Reliable resolvers: set the device's DNS to public resolvers (1.1.1.1, 8.8.8.8) rather than a network-handed or homelab-internal resolver that's unreachable off-LAN. Windows: inspect with Get-DnsClientServerAddress.
- Pin the coordinator so a DNS hiccup can't strand the client: add to the hosts file (C:\Windows\System32\drivers\etc\hosts as admin, or /etc/hosts):
  77.42.120.136 netbird.askari.wingu.me
  This is askari's stable WAN IP; TLS still validates on the hostname. Removes the multi-minute reconnect deadlocks.
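On Linux or macOS, the hosts-file pin can be applied idempotently, so re-running it never duplicates the entry. A minimal sketch, assuming a POSIX shell and write access to the file; pin_host is a hypothetical helper name, not a NetBird command.

```shell
#!/bin/sh
# Sketch: add "IP NAME" to a hosts file unless NAME is already pinned.

# pin_host FILE IP NAME
pin_host() {
  hosts_file="$1"; ip="$2"; name="$3"

  # Already present as a hostname field on a non-comment line? Do nothing.
  if awk -v n="$name" '
       /^[[:space:]]*#/ { next }
       { for (i = 2; i <= NF; i++) { if ($i ~ /^#/) break; if ($i == n) found = 1 } }
       END { exit !found }
     ' "$hosts_file" 2>/dev/null; then
    echo "already pinned: $name"
    return 0
  fi

  # Make sure the file ends with a newline before appending.
  if [ -s "$hosts_file" ] && [ -n "$(tail -c 1 "$hosts_file")" ]; then
    echo >> "$hosts_file"
  fi
  printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
  echo "pinned: $ip $name"
}

# Usage (as root):
#   pin_host /etc/hosts 77.42.120.136 netbird.askari.wingu.me
```

The check only looks for the name, not the address, so if askari's WAN IP ever changes the existing line has to be edited by hand.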
3. Break-glass — reach ubongo without the mesh. When the mesh is down you still need
a way in. On the home LAN, go straight to ubongo's wired address (bypasses the mesh and
coordinator DNS entirely):
ssh sjat@10.20.10.151 # ubongo eno1 (LAN) — verify this works from your device NOW
⚠️ This works today only because ubongo's host-firewall default-deny is not yet applied. When the deferred mesh-hardening lands (SSH only on wt0), this path closes unless a break-glass SSH rule is added to the firewall catalog. That hardening must keep a non-mesh break-glass (catalog SSH rule from a trusted LAN/admin source); otherwise a DNS/mesh outage = full lockout. (ADR-021 break-glass.)
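The "mesh first, LAN as break-glass" order can be sketched as a small wrapper. This is an illustration under assumptions: first_reachable and ssh_probe are hypothetical helper names, and the probe assumes the SSH key is usable non-interactively (for example via an agent), since BatchMode disables passphrase and password prompts.

```shell
#!/bin/sh
# Sketch: pick the first address that answers, mesh before LAN.

# first_reachable PROBE HOST...
# Prints the first HOST for which `PROBE HOST` succeeds; fails if none do.
first_reachable() {
  probe="$1"; shift
  for host in "$@"; do
    if "$probe" "$host"; then
      printf '%s\n' "$host"
      return 0
    fi
  done
  return 1
}

# Default probe: a non-interactive SSH login with a short connect timeout.
ssh_probe() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 "sjat@$1" true >/dev/null 2>&1
}

# Usage: try ubongo's mesh address, then its LAN (eno1) address.
#   target="$(first_reachable ssh_probe 100.99.146.14 10.20.10.151)" &&
#     ssh "sjat@$target"
```

Passing the probe as an argument keeps the selection logic testable without a network; the LAN address only answers from the home LAN, as described above.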
Why ubongo is relay-only (and P2P is not the fix). Peers connect to ubongo as
Relayed, never P2P: its nftables default-deny drops the inbound UDP that ICE
hole-punching needs (egress is open, so STUN itself succeeds). This is the intended
current posture — P2P / NAT-traversal is the deferred mesh-hardening (ADR-016/020,
STATUS.md). Enabling it needs a firewall-catalog UDP entry plus an accepted-risks.md
deviation or ADR amendment, and OPNsense NAT work — and it would not have prevented a
DNS-driven outage (a re-handshake still needs signal, which needs DNS). Tracked as future
hardening, not a quick fix.
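To confirm the relay-only posture from a client, the peer detail output can be tallied. The sketch below rests on an assumption about the output format: it expects netbird status -d to print one "Connection type: Relayed" or "Connection type: P2P" line per connected peer, which should be checked against the installed client version. count_connection_types is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: tally peer connection types from `netbird status -d` output.
# Assumption: each connected peer has a "Connection type: <Relayed|P2P>" line.

# Reads status detail on stdin and prints "relayed=N p2p=M".
count_connection_types() {
  awk '
    /Connection type:[[:space:]]*Relayed/ { relayed++ }
    /Connection type:[[:space:]]*P2P/     { p2p++ }
    END { printf "relayed=%d p2p=%d\n", relayed, p2p }
  '
}

# Usage:
#   netbird status -d | count_connection_types
```

With the current posture, ubongo should always fall in the relayed count; a P2P count for other peers is not a problem.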
Notes
- Split-tunnel: NetBird routes only the 100.x overlay by default; normal/work networking is unaffected.
- Persistence: the service auto-starts on boot and reconnects; the tray app has Connect/Disconnect; CLI netbird down / netbird up (no flags after first setup).
- Troubleshooting ("failed while getting Management Service public key" / won't register): confirm https://netbird.askari.wingu.me loads in a browser from the device (DNS + TLS + the gRPC routing through Caddy are reachable), the URL is exact, and the terminal is elevated. For peers stuck Disconnected/Connecting or SSH-to-ubongo timeouts that recover on their own, see the mesh-drops troubleshooting section above.
- Removing a device: netbird down then uninstall; revoke its peer in the dashboard (and the setup key if one-off).