base 'mesh' concern enrols NetBird agents on ubongo + askari via a reusable scoped setup key (vault); laptops enrolled by the operator. Reachability via the default peer policy; the base nftables default-deny on ubongo + ACL tightening are deferred to a follow-on. Resolves ROADMAP M5 design; next: writing-plans. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
M5 — Mesh enrollment (NetBird agents) → mobile access · design
Status: Design (2026-06-17). Implements ROADMAP M5, the last milestone of Phase 1
(remote access). Builds on M4b (the netbird_coordinator is live on askari). Design
resolved by ADR-016 (mesh, agent-per-host) and ADR-021 (SSH ladder); this spec is
the build-shaping for that decision. Next: writing-plans.
Goal
ubongo reachable from anywhere over the self-hosted NetBird mesh — the Phase-1
mobile-access goal. Reachability only. The host-firewall lockdown and NetBird
ACL-tightening are deliberately deferred (see §6).
Decisions (settled in brainstorming)
- Scope = reachability, not lockdown. The goal needs only: agents enrolled + the
  laptops on the mesh + a peer policy permitting laptop→ubongo. ubongo's SSH is
  already open, so reachability requires no firewall change. Applying the base
  nftables default-deny to ubongo is the lockout-risky part on the control node and
  is split into a follow-on (§6).
- One reusable, scoped, expiring setup key. A single reusable key in
  vault.netbird.setup_key, scoped to auto-assign peers to a boma-hosts group, with an
  expiry. base re-runs idempotently across hosts. Matches ADR-016's single vault
  path; blast radius is limited by scope + expiry + the fact that joining the mesh
  grants no access on its own (peer policy gates that). Rejected: per-host one-off
  ephemeral keys, which mean more operator toil and don't fit a single vault key for
  a re-runnable role.
- askari is enrolled as a peer (ADR-016: it runs the stack and is a peer). The agent
  coexists with the coordinator container on the same host. Enables later moving
  askari's SSH off the Hetzner-firewall WAN allow onto wt0, and gives a host-to-host
  mesh link verifiable from ubongo.
Architecture
1. New base concern: mesh (agent enrollment)
A new roles/base/tasks/mesh.yml, included from base/tasks/main.yml via
include_tasks with apply: { tags: [mesh] } (the dynamic-include tag-propagation
gotcha — see existing concerns), tagged mesh. A new mesh entry is added to the closed
tag vocabulary in tests/tags.yml.
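
A minimal sketch of what the include could look like, assuming the concern is gated on base__mesh_enabled at the include itself (the task name and the placement of the when are illustrative, not taken from the existing role). The tag appears twice on purpose: tags on the include select the include statement, while apply.tags pushes the tag onto the tasks inside mesh.yml so a tag-scoped run actually executes them.

```yaml
# roles/base/tasks/main.yml (excerpt; illustrative sketch)
- name: Mesh | enrol NetBird agent          # hypothetical task name
  ansible.builtin.include_tasks:
    file: mesh.yml
    apply:
      tags: [mesh]                          # propagate the tag to the included tasks
  tags: [mesh]                              # make the include itself selectable by tag
  when: base__mesh_enabled | bool           # assumed location of the opt-in gate
```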
The concern:
- installs a pinned NetBird agent from the official NetBird apt repo (repo + key added
  like docker_host does for Docker; exact package + version verified in the plan per
  ADR-014). Version-pinned (ADR-011).
- enrolls idempotently: run
  netbird up --management-url {{ base__mesh_management_url }} --setup-key <key>
  only when netbird status reports the host is not already connected (guard on a
  command check, changed_when accordingly). The setup key is passed with no_log: true.
- does NOT touch the host firewall. Enrollment is purely additive: wt0 comes up, sshd
  keeps listening on all interfaces exactly as today. No lockout risk in M5.
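
The idempotent enrol step might be sketched as below. Assumptions to verify in the plan (ADR-014): the registered variable name base__mesh_status is hypothetical, and the guard matches the string "Management: Connected" in the netbird status output, which may differ for the pinned agent version.

```yaml
# roles/base/tasks/mesh.yml (excerpt; illustrative sketch)
- name: Mesh | check whether the agent is already connected
  ansible.builtin.command: netbird status
  register: base__mesh_status               # hypothetical variable name
  changed_when: false                       # read-only probe
  failed_when: false                        # a non-zero exit just means "not connected yet"
  when: base__mesh_manage | bool

- name: Mesh | enrol against the coordinator
  ansible.builtin.command:
    argv:
      - netbird
      - up
      - --management-url
      - "{{ base__mesh_management_url }}"
      - --setup-key
      - "{{ base__mesh_setup_key }}"
  no_log: true                              # keep the setup key out of logs
  changed_when: true                        # only runs when enrollment is needed
  when:
    - base__mesh_manage | bool
    # Assumed status string; confirm against the pinned version's output.
    - "'Management: Connected' not in (base__mesh_status.stdout | default(''))"
```

Passing the command as argv avoids shell quoting of the key, and the default('') keeps the guard safe when the probe was skipped.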
Knobs (base__mesh_*, defaults in roles/base/defaults/main.yml):
| Var | Default | Purpose |
|---|---|---|
| base__mesh_enabled | false | Policy/opt-in gate. false ⇒ the whole concern is skipped, so applying base to a host not ready to join the mesh is a no-op. Set true per host/group (ubongo, askari) to enrol. |
| base__mesh_manage | true | Test gate for the live daemon step. true ⇒ run netbird up; Molecule sets false so the concern can be exercised without a real coordinator/key (mirrors reverse_proxy__manage / netbird_coordinator__manage). |
| base__mesh_management_url | https://netbird.askari.wingu.me | The M4b coordinator. |
| base__mesh_setup_key | "{{ vault.netbird.setup_key }}" | Reusable scoped key (vault). |
| base__mesh_version | pinned (plan) | NetBird agent version (ADR-011). |
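
As a sketch, the knobs above would translate into defaults roughly like this. The version is deliberately left out because it is pinned in the plan, not in this spec.

```yaml
# roles/base/defaults/main.yml (excerpt; illustrative sketch)
base__mesh_enabled: false                   # opt-in gate; set true per host/group
base__mesh_manage: true                     # Molecule overrides this to false
base__mesh_management_url: "https://netbird.askari.wingu.me"
base__mesh_setup_key: "{{ vault.netbird.setup_key }}"
# base__mesh_version: pinned in the plan (ADR-011, ADR-014); value not fixed here
```

Because Ansible templates variables lazily, the vault reference is only resolved when the enrol task actually uses it, so hosts with base__mesh_enabled: false do not need the key.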
2. Vault
Add vault.netbird.setup_key: CHANGEME with a comment stating it is a reusable, scoped
(boma-hosts), expiring setup key minted in the NetBird dashboard after first-boot
/setup. The agent cannot mint it — the operator supplies it via make edit-vault.
make check-vault lists the outstanding CHANGEME until then. base/tasks/mesh.yml wires
to {{ vault.netbird.setup_key }}.
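
The vault entry could look like the following sketch. The nesting is inferred from the vault.netbird.setup_key path; the surrounding file layout is an assumption.

```yaml
# vault file (decrypted view via make edit-vault; illustrative sketch)
vault:
  netbird:
    # Reusable, scoped (auto-assigns peers to boma-hosts), expiring setup key.
    # Minted by the operator in the NetBird dashboard after first-boot /setup.
    setup_key: CHANGEME
```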
3. Enrollment scope
- ubongo: base mesh concern applied (tagged), bringing up wt0. Its other base concerns
  (firewall, hardening) stay unapplied; TAGS=mesh scopes the run to enrollment only,
  so no default-deny lands on the control node.
- askari: base mesh concern applied; agent enrols against its own public coordinator
  URL and coexists with the coordinator container.
- mamba + work laptop: operator installs the NetBird client and logs in via the
  dashboard (embedded Dex SSO). Not Ansible-managed; out of automation scope.
4. Reachability
M5 relies on NetBird's default peer policy for laptop→ubongo reachability. The plan
verifies the pinned version's default-policy behaviour (ADR-014); if it is not
allow-by-default, the plan adds one minimal policy permitting the admin group → ubongo
SSH. ACL-tightening to default-deny + scoped policies (ADR-016 intent) is deferred
(§6).
Testing
- Automated (I do, needs nothing from operator): Molecule for the base mesh concern
  with base__mesh_enabled: true, base__mesh_manage: false, and a dummy
  vault.netbird.setup_key, so the install/enrol tasks are exercised but the live
  netbird up (which needs a real coordinator + key) is gated off. Note: this concern
  is install + a daemon command, so its render-only surface is thin (the "render-only
  tests miss the real call" gotcha). Molecule asserts the enrol command is constructed
  correctly + idempotency guard works; full enrollment is proven in the live step
  below. Also assert base__mesh_enabled: false is a clean no-op. make lint (incl.
  check-tags for the new mesh tag).
- Live (gated, after the operator handoff): apply base TAGS=mesh to ubongo + askari;
  verify wt0 is up and the ubongo↔askari mesh link works from ubongo (both are peers
  I manage; e.g. netbird status shows the peer, ping the peer's mesh IP).
- Goal verification (operator): from a laptop on the mesh, SSH to ubongo over its
  NetBird/wt0 address. This is the mobile-access goal landing.
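
The automated case might be wired up with a converge play along these lines. This is a sketch only: the scenario path and the dummy key value are hypothetical, and the real scenario should follow the repo's existing Molecule conventions.

```yaml
# molecule/<scenario>/converge.yml (hypothetical path; illustrative sketch)
- name: Converge
  hosts: all
  vars:
    base__mesh_enabled: true                # exercise the concern
    base__mesh_manage: false                # gate off the live netbird up
    vault:
      netbird:
        setup_key: dummy-setup-key          # placeholder, never a real key
  roles:
    - role: base
```

A second play or scenario with base__mesh_enabled: false would cover the clean no-op assertion.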
Operator handoff (the steps only the operator can do)
- Dashboard /setup (one-time) → create the admin user.
- Mint a reusable, scoped (boma-hosts), expiring setup key → make edit-vault to
  replace the CHANGEME → re-encrypt. (make check-vault confirms.)
- Install the NetBird client on mamba + the work laptop, log in via the dashboard.
- Confirm SSH to ubongo over the mesh.
Out of scope / deferred (the "mesh hardening" follow-on)
- base nftables default-deny on ubongo (SSH only on wt0 + the
  base__firewall_control_addr LAN fallback, ADR-021/020). Built + dormant today;
  applying it to the control node is the lockout-risky step and gets its own
  deliberate change after the mesh path to ubongo is proven solid.
- NetBird ACL tightening to default-deny + scoped per-group policies (ADR-016: admin
  peers → srv+mgmt, clients least-privilege). M5 uses the default policy.
- askari SSH onto wt0 (retiring the Hetzner-firewall WAN SSH allow): enabled by askari
  now being a peer, but a separate change.
Maps to
ADR-016 (mesh, agent-per-host, setup keys in vault), ADR-021 (SSH ladder — wt0 primary +
ssh-from-control; the lockdown that uses this is deferred), ADR-020 (host firewall —
default-deny deferred), ADR-002 (security baseline), ADR-011 (version-pinned agent),
ADR-004 (enrollment lives in base, not a new role), ADR-014 (verify agent
version/package + default-policy behaviour in the plan).