M5 — Mesh enrollment (NetBird agents) Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax.
Goal: ubongo reachable from anywhere over the NetBird mesh — enrol NetBird agents on ubongo + askari via a new opt-in base mesh concern; the operator enrols the laptops.
Architecture: A new base concern (roles/base/tasks/mesh.yml) installs a pinned NetBird agent and runs netbird up with a reusable scoped setup key from vault. Gated by base__mesh_enabled (per-host opt-in) and base__mesh_manage (skips network/daemon actions for Molecule). No firewall change — enrollment is additive (wt0 comes up, SSH keeps listening), so there is zero lockout risk. The host nftables default-deny + NetBird ACL tightening are a separate, deferred follow-on.
Tech Stack: NetBird agent (apt, pinned), Ansible (base role), Molecule, the M4b coordinator at https://netbird.askari.wingu.me.
Spec: docs/superpowers/specs/2026-06-17-m5-mesh-enrollment-design.md
Execution context: Tasks 1–4 author + commit (need nothing from the operator). Task 5 is an operator handoff (dashboard /setup + mint key). Task 6 applies live to ubongo + askari (gated). Task 7 is operator-only (laptops). Task 8 docs.
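The two gates described under Architecture compose as follows. This is a minimal sketch of the three intended combinations, using only variable names from this plan; the `---` separators mark independent variable files (defaults, production group_vars, Molecule converge vars), not one file to load as a whole.

```yaml
# 1. Default: host not on the mesh. mesh.yml is never included.
base__mesh_enabled: false
---
# 2. Production opt-in (ubongo, askari): mesh.yml is included and, because
#    base__mesh_manage defaults to true, the agent is installed and `netbird up` runs.
base__mesh_enabled: true
---
# 3. Molecule: mesh.yml is included, but every task gated on base__mesh_manage is skipped.
base__mesh_enabled: true
base__mesh_manage: false
```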
File structure
| File | Change | Responsibility |
|---|---|---|
| `tests/tags.yml` | modify | add the `mesh` concern to the closed tag vocabulary |
| `roles/base/defaults/main.yml` | modify | `base__mesh_*` knobs |
| `roles/base/tasks/mesh.yml` | create | the enrollment concern (install + `netbird up`) |
| `roles/base/tasks/main.yml` | modify | include `mesh.yml` (gated, tagged) |
| `roles/base/README.md` | modify | document the mesh concern + knobs |
| `roles/base/molecule/default/converge.yml` | modify | enable mesh (manage off) + dummy key |
| `roles/base/molecule/default/verify.yml` | modify | assert mesh wiring / no-op |
| `inventories/production/group_vars/control/vars.yml` | modify | `base__mesh_enabled: true` (ubongo) |
| `inventories/production/group_vars/offsite_hosts/vars.yml` | create | `base__mesh_enabled: true` (askari) |
| `inventories/production/group_vars/all/vault.yml` | modify (vault) | `vault.netbird.setup_key: CHANGEME` |
| `STATUS.md`, `docs/ROADMAP.md`, `docs/FRICTION.md` | modify | M5 done; deferred hardening; friction note |
Task 1: Verify + pin the NetBird agent; add the mesh tag
- Step 1 (ADR-014 verification; record the answers): confirm against current NetBird docs/repo (WebFetch `docs.netbird.io`, `pkgs.netbird.io`):
  - the apt repo URL + signing-key URL + suite/component (the install script publishes an apt source; capture the exact `deb` line and key URL);
  - the package name (headless agent, expected `netbird`) and that version `0.72.4` (matching the coordinator) is installable, plus the apt version-pin syntax;
  - the exact `netbird status` output string that indicates an established management connection (for the idempotency guard, e.g. `Management: Connected`);
  - the `netbird up` flags (`--management-url`, `--setup-key`);
  - whether the pinned NetBird's default peer policy is allow-by-default (decides Task 6, Step 4).
  Record all of this in the commit message / a note block.
- Step 2: add `mesh` to `tests/tags.yml` under `concerns:`:
      - mesh   # NetBird agent enrollment (ADR-016)
- Step 3: `make lint` → expect `check-tags: OK` (an unused vocab entry is allowed; nothing references it yet). Expected: 0 failures.
- Step 4: commit `feat(base): add the 'mesh' concern tag (NetBird agent, ADR-016)`.
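The installability check in Step 1 could be automated along these lines. This is a minimal sketch, assuming the NetBird apt source is already configured on a throwaway host and that `apt-cache madison` prints the pinned version verbatim; the `_netbird_pin` variable and the task names are illustrative and not part of the plan.

```yaml
# Sketch only: run on a scratch host that already has the NetBird apt source.
- name: List the NetBird versions the apt source offers
  ansible.builtin.command: apt-cache madison netbird
  register: _netbird_madison
  changed_when: false

- name: Assert the pinned version is installable
  vars:
    _netbird_pin: "0.72.4"   # hypothetical helper; the plan's real knob arrives in Task 2
  ansible.builtin.assert:
    that:
      - _netbird_pin in _netbird_madison.stdout
    fail_msg: "netbird {{ _netbird_pin }} is not offered by the configured apt source"
```

The substring match is deliberately loose (it would also accept a longer version string that merely contains the pin), so treat a pass as a prompt to read the `madison` output rather than as proof.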
Task 2: base mesh concern — defaults + tasks + include + README
Files: roles/base/defaults/main.yml, roles/base/tasks/mesh.yml (create), roles/base/tasks/main.yml, roles/base/README.md.
- Step 1: append the knobs to `roles/base/defaults/main.yml`:
    # NetBird mesh agent enrollment (ADR-016). Opt-in: default off so applying `base` to a
    # host not (yet) on the mesh is a no-op for this concern. The live actions (apt install
    # over the network, `netbird up` against the coordinator) are additionally gated by
    # base__mesh_manage so Molecule can exercise the wiring without a coordinator.
    base__mesh_enabled: false
    base__mesh_manage: true
    base__mesh_management_url: "https://netbird.askari.wingu.me"
    base__mesh_setup_key: "{{ vault.netbird.setup_key }}"  # no lint suppression needed: the name carries the base__ prefix
    base__mesh_version: "0.72.4"   # match the coordinator; confirmed installable in Task 1
- Step 2: create `roles/base/tasks/mesh.yml` (use the Task-1-verified repo URL/key/pin; the values below are the expected ones to confirm):
    ---
    # NetBird agent enrollment (ADR-016). Additive only: no firewall change here.
    - name: Ensure /etc/apt/keyrings exists
      ansible.builtin.file:
        path: /etc/apt/keyrings
        state: directory
        mode: "0755"
      tags: [mesh]

    - name: Add the NetBird APT GPG key
      ansible.builtin.get_url:
        url: https://pkgs.netbird.io/debian/public.key   # confirm in Task 1
        dest: /etc/apt/keyrings/netbird.asc
        mode: "0644"
      when: base__mesh_manage | bool
      tags: [mesh]

    - name: Add the NetBird APT repository
      ansible.builtin.apt_repository:
        # Confirm the deb line in Task 1. The note lives here because a '#' inside
        # the folded scalar below would become part of the repo string.
        repo: >-
          deb [signed-by=/etc/apt/keyrings/netbird.asc]
          https://pkgs.netbird.io/debian stable main
        filename: netbird
        state: present
      when: base__mesh_manage | bool
      tags: [mesh]

    - name: Install the NetBird agent (pinned)
      ansible.builtin.apt:
        name: "netbird={{ base__mesh_version }}"   # confirm pin syntax in Task 1
        state: present
        update_cache: true
      when: base__mesh_manage | bool
      tags: [mesh]

    - name: Check current NetBird connection status
      ansible.builtin.command: netbird status
      register: _netbird_status
      changed_when: false
      failed_when: false
      when: base__mesh_manage | bool
      tags: [mesh]

    - name: Enrol this host in the mesh
      ansible.builtin.command: >-
        netbird up
        --management-url {{ base__mesh_management_url }}
        --setup-key {{ base__mesh_setup_key }}
      register: _netbird_up
      changed_when: _netbird_up.rc == 0
      when:
        - base__mesh_manage | bool
        - "'Management: Connected' not in (_netbird_status.stdout | default(''))"   # confirm string in Task 1
      no_log: true   # setup key is on the argv
      tags: [mesh]
- Step 3: in `roles/base/tasks/main.yml`, add the include (after the existing concerns), gated by `base__mesh_enabled`:
    - name: NetBird mesh enrollment
      ansible.builtin.include_tasks:
        file: mesh.yml
        apply:
          tags: [mesh]
      when: base__mesh_enabled | bool
      tags: [mesh]
- Step 4: document the concern in `roles/base/README.md` (purpose; the `base__mesh_*` knobs table; that it is additive/no-firewall; that the setup key comes from `vault.netbird.setup_key`; the `enabled`/`manage` gating).
- Step 5: `make lint` → 0 failures. Commit `feat(base): NetBird agent enrollment concern (mesh)`.
Task 3: Molecule coverage
Files: roles/base/molecule/default/converge.yml, roles/base/molecule/default/verify.yml.
The concern is install + a daemon command needing a live coordinator, so the hermetic Molecule surface is thin (the known "render-only misses the real call" gotcha). Molecule proves: (a) enabling mesh with `manage: false` does not break the base converge and is idempotent; (b) `base__mesh_enabled: false` (the default, already exercised by the existing firewall test) is a clean no-op. Full install+enrol is proven live in Task 6.
- Step 1: in `converge.yml` add to `vars:`:
    base__mesh_enabled: true
    base__mesh_manage: false          # skip network/daemon actions
    base__mesh_setup_key: "dummy-molecule-key"
- Step 2: in `verify.yml` add a task asserting the concern is a clean no-op under `manage: false`, i.e. `netbird` is NOT installed and `wt0` does not exist (since all live actions are gated off):
    - name: Confirm mesh manage=false did not install/enrol
      ansible.builtin.command: which netbird
      register: _nb
      changed_when: false
      failed_when: false

    - name: Assert netbird absent under manage=false
      ansible.builtin.assert:
        that:
          - _nb.rc != 0
        fail_msg: "netbird should not be installed when base__mesh_manage is false"
- Step 3: `make test ROLE=base` → converge + idempotence + verify pass (`failed=0`). The existing firewall assertions still pass (mesh vars don't affect them).
- Step 4: commit `test(base): molecule coverage for the mesh concern (manage-off no-op)`.
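Step 2 also states that `wt0` must not exist, which the `which netbird` check does not cover by itself. A minimal sketch of a companion assertion for `verify.yml` follows; it assumes the Molecule instance exposes `/sys/class/net`, and the register name `_wt0` is illustrative.

```yaml
# Sketch: complements the `which netbird` check; wt0 is the NetBird interface named in this plan.
- name: Look for the NetBird interface
  ansible.builtin.stat:
    path: /sys/class/net/wt0
  register: _wt0

- name: Assert wt0 absent under manage=false
  ansible.builtin.assert:
    that:
      - not _wt0.stat.exists
    fail_msg: "wt0 should not exist when base__mesh_manage is false"
```

Using `stat` rather than `ip link show` avoids depending on `iproute2` being present in the test image.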
Task 4: Vault stub + per-host opt-in
- Step 1 (vault; needs `rbw` unlocked): `make decrypt FILE=inventories/production/group_vars/all/vault.yml`; add under `vault.netbird` (alongside `auth_secret`/`datastore_key`):
    # Reusable, scoped (group "boma-hosts"), expiring NetBird setup key. Mint it in the
    # dashboard (Setup Keys) AFTER the first-boot /setup admin exists. Consumed by the
    # base 'mesh' concern. CHANGEME until the operator supplies it via `make edit-vault`.
    setup_key: CHANGEME
  Then `make encrypt FILE=...`; `make check-vault` → confirms structure + lists the `setup_key` CHANGEME.
- Step 2: set the opt-in. In `inventories/production/group_vars/control/vars.yml` add `base__mesh_enabled: true` (ubongo). Create `inventories/production/group_vars/offsite_hosts/vars.yml`:
    ---
    # askari is a NetBird peer as well as the coordinator host (ADR-016).
    base__mesh_enabled: true
- Step 3: `make lint` → 0 failures. Commit `feat(base): vault setup_key stub + enable mesh on ubongo + askari`.
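Because the vault ships with a `CHANGEME` stub until Task 5 is done, a pre-flight guard in `mesh.yml` could stop a live run from handing the placeholder to `netbird up`. The task below is a hypothetical addition, not part of the plan as written; it uses only variables defined in Task 2.

```yaml
# Sketch: place before the "Enrol this host in the mesh" task in mesh.yml.
- name: Refuse to enrol with the placeholder setup key
  ansible.builtin.assert:
    that:
      - base__mesh_setup_key | length > 0
      - base__mesh_setup_key != 'CHANGEME'
    fail_msg: "vault.netbird.setup_key is still CHANGEME; complete Task 5 first"
    quiet: true
  when: base__mesh_manage | bool
  tags: [mesh]
```

Gating on `base__mesh_manage` keeps the Molecule run (which passes a dummy key with manage off) unaffected.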
Task 5: Operator handoff — first-boot admin + setup key (GATED, operator does this)
Nothing here is automatable — the agent cannot create a dashboard admin or mint a key.
- Step 1 (operator): browse `https://netbird.askari.wingu.me`, complete the one-time `/setup` to create the admin user, log in.
- Step 2 (operator): create a reusable setup key, scoped to auto-assign peers to a `boma-hosts` group, with an expiry. Copy the key value.
- Step 3 (operator): `make edit-vault` → replace `vault.netbird.setup_key`'s `CHANGEME` with the real key → `:wq` (re-encrypts) → `make check-vault` shows no outstanding CHANGEME. The key never enters the chat.
- Step 4: no repo commit beyond the (already-encrypted) vault, whose on-disk structure is unchanged.
Task 6: Enrol ubongo + askari (GATED, live — needs Task 5 done + rbw unlocked)
- Step 1: `make check PLAYBOOK=site LIMIT=askari TAGS=mesh` and review the output (askari is `ansible`-user managed; cleaner first target than the control node). Then `make deploy PLAYBOOK=site LIMIT=askari TAGS=mesh`.
- Step 2: verify on askari: `netbird status` shows `Management: Connected`; `ip link show wt0` succeeds (the interface exists). (Agent coexists with the coordinator container; it reaches the coordinator via the public URL.)
- Step 3: `make check PLAYBOOK=site LIMIT=ubongo TAGS=mesh` and review the output. Note: ubongo is managed as `sjat` with `become: true` (same path `dev_env` used via `playbooks/workstation.yml`); confirm `sjat` sudo works (the run will prompt/fail clearly if a become password is needed). Then `make deploy PLAYBOOK=site LIMIT=ubongo TAGS=mesh`.
- Step 4: verify the mesh link from ubongo: `netbird status` shows `ubongo` connected and lists `askari` as a peer; ping askari's NetBird (100.x) address. If the pinned NetBird is NOT allow-by-default (Task 1, Step 1), add one minimal dashboard policy permitting the admin group → `ubongo` SSH (or temporarily the default policy) so Task 7 can connect.
- Step 5: no repo commit (host state).
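The manual checks in Steps 2 and 4 could be captured as a small verification play. This is a sketch under stated assumptions: the play itself is hypothetical (the plan does not add such a playbook), the status string is the Task-1 value still to be confirmed, and `become: true` is assumed to be acceptable for reading agent status.

```yaml
# Sketch: hypothetical verification play; status string and interface name come from this plan.
- name: Verify mesh enrolment
  hosts: askari          # switch to ubongo when checking Task 6, Step 4
  become: true
  gather_facts: false
  tasks:
    - name: NetBird reports an established management connection
      ansible.builtin.command: netbird status
      register: _nb_status
      changed_when: false
      failed_when: "'Management: Connected' not in _nb_status.stdout"

    - name: The wt0 interface exists
      ansible.builtin.stat:
        path: /sys/class/net/wt0
      register: _wt0
      failed_when: not _wt0.stat.exists
```

It does not replace the peer-list and ping checks in Step 4, which depend on the coordinator's policy.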
Task 7: Enrol the road-warrior clients → goal lands (operator)
- Step 1 (operator): install the NetBird client on `mamba` + the work laptop; log in via the dashboard (Dex SSO) so they join the mesh.
- Step 2 (operator): from a laptop (anywhere), `ssh sjat@<ubongo-netbird-ip>` (or the mesh hostname); the connection succeeds. ← the mobile-access goal lands here.
- Step 3: confirm with the operator that remote access works end-to-end.
Task 8: Docs
- Step 1: `STATUS.md`: move "NetBird agent enrollment in `base`" to built + applied (ubongo + askari enrolled; reachability achieved). Note the `mesh` concern + opt-in. ubongo row: mesh-enrolled (its other base concerns still pending). askari row: NetBird peer.
- Step 2: `docs/ROADMAP.md`: M5 ✅ DONE; Phase 1 (remote access) complete. Next: the Procurement gate (`/capacity-review` → buy cluster hardware). Record the deferred "mesh hardening" follow-on (ubongo nftables default-deny + NetBird ACL tightening + askari SSH→wt0).
- Step 3: `docs/FRICTION.md`: add a signal: a docs-only commit still tripped the `rbw`-locked pre-commit guard (2026-06-17), although the 2026-06-10 kaizen fix was meant to let docs-/config-only commits through without vault; the hook scoping or a blanket guard needs a look.
- Step 4: `make lint`; commit `docs: M5 done — Phase 1 remote access complete`.
Self-Review (completed)
- Spec coverage: `mesh` concern (spec §1) → Tasks 1–3; vault stub (spec §2) → Task 4; ubongo+askari enrol (spec §3) → Tasks 4,6; laptops (spec §3) → Task 7; reachability via default policy (spec §4) → Task 6 step 4; deferred hardening (spec §6) → recorded in Task 8; operator handoff (spec) → Task 5. Testing (spec) → Task 3 (hermetic) + Task 6 (live). All covered.
- Placeholder scan: the "confirm in Task 1" markers are ADR-014 verification points executed in Task 1 (the repo URL/key/pin/status-string), not vague TODOs; Task 2's code carries the expected values to confirm, matching how M4a/M4b pinned versions in-plan.
- Consistency: `base__mesh_enabled` (opt-in) vs `base__mesh_manage` (test gate) used consistently across defaults, tasks, include, converge, and the no-op assertion; `vault.netbird.setup_key` matches between defaults, vault stub, and Task 5; `mesh` tag added (Task 1) before it is used (Task 2).
- Risk: the only live risk is Task 6 on the control node, mitigated because the `mesh` concern makes no firewall change (SSH stays open on all paths), askari is enrolled first as the lower-risk rehearsal, and the host nftables lockdown is explicitly out of scope.