Compare commits
3 commits
94dd6da14c...4cfc3cddd5
| Author | SHA1 | Date |
|---|---|---|
| | 4cfc3cddd5 | |
| | 55776fb03c | |
| | 4142bb15f8 | |

3 changed files with 378 additions and 0 deletions
@ -56,6 +56,19 @@ _(append new raw signals here; the next kaizen review consumes them)_
|
|||
`not ansible_check_mode` (clean "skipped" in dry-run; compose can't be meaningfully
|
||||
dry-run before first deploy anyway), OR document the one-time expected failure. Decide one.
|
||||
|
||||
- `[recurring]` **Re-asked the operator about settled defaults — push + execution mode**
|
||||
(2026-06-17): at the M5 plan handoff I asked (a) whether to push to origin and (b) which
|
||||
execution mode (subagent-driven vs inline) — both already settled: CLAUDE.md says push to
|
||||
`origin` often (off-machine backup), and TODO 10.5 / the standing agreement is "always
|
||||
subagent-driven" (there's even `guard-execution-mode-menu.sh`). Same shape as the 5×
|
||||
"execution-mode menu asked AGAIN" ledger entries — but this time the ask was my own
|
||||
free-form prose ("want those pushed now?", "which execution approach?"), which the
|
||||
existing menu-text matcher does NOT catch (it keys on the writing-plans menu's literal
|
||||
text). → the gap is that the guard only matches that literal menu; free-form re-asks slip
|
||||
through. Candidate: widen the Stop-hook matcher to also flag prose re-asks of
|
||||
push-vs-not / subagent-vs-inline, since prose reminders have already failed this many
|
||||
times. Default behaviour: **push as backup and proceed subagent-driven without asking.**
|
||||
|
||||
- `[recurring]` **ADRs claim cross-doc reconciliation they didn't actually perform**
|
||||
(2026-06-14): ADR-024's Status + Consequences asserted "ADR-017 prose that mentioned
|
||||
Traefik is updated to read Caddy" — but ADR-008/017/019 + CAPABILITIES still said
|
||||
|
|
|
|||
234
docs/superpowers/plans/2026-06-17-m5-mesh-enrollment.md
Normal file
|
|
@ -0,0 +1,234 @@
|
|||
# M5 — Mesh enrollment (NetBird agents) Implementation Plan
|
||||
|
||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax.
|
||||
|
||||
**Goal:** `ubongo` reachable from anywhere over the NetBird mesh — enrol NetBird agents on `ubongo` + `askari` via a new opt-in `base` `mesh` concern; the operator enrols the laptops.
|
||||
|
||||
**Architecture:** A new `base` concern (`roles/base/tasks/mesh.yml`) installs a pinned NetBird agent and runs `netbird up` with a reusable scoped setup key from vault. Gated by `base__mesh_enabled` (per-host opt-in) and `base__mesh_manage` (skips network/daemon actions for Molecule). **No firewall change** — enrollment is additive (`wt0` comes up, SSH keeps listening), so there is zero lockout risk. The host nftables default-deny + NetBird ACL tightening are a separate, deferred follow-on.
|
||||
|
||||
**Tech Stack:** NetBird agent (apt, pinned), Ansible (`base` role), Molecule, the M4b coordinator at `https://netbird.askari.wingu.me`.
|
||||
|
||||
**Spec:** `docs/superpowers/specs/2026-06-17-m5-mesh-enrollment-design.md`
|
||||
|
||||
**Execution context:** Tasks 1–4 author + commit (need nothing from the operator). **Task 5 is an operator handoff** (dashboard `/setup` + mint key). **Task 6 applies live to `ubongo` + `askari`** (gated). Task 7 is operator-only (laptops). Task 8 docs.
|
||||
|
||||
---
|
||||
|
||||
## File structure
|
||||
|
||||
| File | Change | Responsibility |
|---|---|---|
| `tests/tags.yml` | modify | add the `mesh` concern to the closed tag vocabulary |
| `roles/base/defaults/main.yml` | modify | `base__mesh_*` knobs |
| `roles/base/tasks/mesh.yml` | **create** | the enrollment concern (install + `netbird up`) |
| `roles/base/tasks/main.yml` | modify | include `mesh.yml` (gated, tagged) |
| `roles/base/README.md` | modify | document the `mesh` concern + knobs |
| `roles/base/molecule/default/converge.yml` | modify | enable mesh (manage off) + dummy key |
| `roles/base/molecule/default/verify.yml` | modify | assert mesh wiring / no-op |
| `inventories/production/group_vars/control/vars.yml` | modify | `base__mesh_enabled: true` (ubongo) |
| `inventories/production/group_vars/offsite_hosts/vars.yml` | **create** | `base__mesh_enabled: true` (askari) |
| `inventories/production/group_vars/all/vault.yml` | modify (vault) | `vault.netbird.setup_key: CHANGEME` |
| `STATUS.md`, `docs/ROADMAP.md`, `docs/FRICTION.md` | modify | M5 done; deferred hardening; friction note |
|
||||
|
||||
---
|
||||
|
||||
### Task 1: Verify + pin the NetBird agent; add the `mesh` tag
|
||||
|
||||
- [ ] **Step 1 (ADR-014 verification — record the answers):** confirm against current NetBird docs/repo (WebFetch `docs.netbird.io`, `pkgs.netbird.io`):
|
||||
- the **apt repo** URL + signing-key URL + suite/component (the install-script publishes an apt source — capture the exact `deb` line and key URL);
|
||||
- the **package name** (headless agent — expected `netbird`) and that **version `0.72.4`** (matching the coordinator) is installable, plus the apt **version-pin syntax**;
|
||||
- the exact **`netbird status`** output string that indicates an established management connection (for the idempotency guard — e.g. `Management: Connected`);
|
||||
- the **`netbird up`** flags (`--management-url`, `--setup-key`);
|
||||
- whether the pinned NetBird's **default peer policy is allow-by-default** (decides §Task 6 step 4). Record all of this in the commit message / a note block.
|
||||
- [ ] **Step 2:** add `mesh` to `tests/tags.yml` under `concerns:`:
|
||||
```yaml
- mesh  # NetBird agent enrollment (ADR-016)
```
|
||||
- [ ] **Step 3:** `make lint` → expect `check-tags: OK` with 0 failures (an unused vocab entry is allowed; nothing references the tag yet).
|
||||
- [ ] **Step 4:** commit `feat(base): add the 'mesh' concern tag (NetBird agent, ADR-016)`.
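
The package-availability check in Step 1 could be captured as a throwaway task pair, sketched below. It assumes the NetBird apt source has already been added on a scratch host and that the package is named `netbird`; both are things Step 1 is meant to confirm, not established facts. The register name `_nb_madison` is hypothetical, and `base__mesh_version` is the knob Task 2 introduces, so substitute the literal version if this runs earlier.

```yaml
# Sketch only: confirm the pinned version is installable from the configured apt source.
- name: List the netbird versions apt can see
  ansible.builtin.command: apt-cache madison netbird
  register: _nb_madison  # hypothetical register name
  changed_when: false

- name: Assert the pinned version is offered
  ansible.builtin.assert:
    that:
      - base__mesh_version in _nb_madison.stdout
    fail_msg: "netbird {{ base__mesh_version }} is not available from the configured apt source"
```

The exact version string apt reports may carry a suffix, so a substring match is used here rather than equality.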
|
||||
|
||||
---
|
||||
|
||||
### Task 2: `base` `mesh` concern — defaults + tasks + include + README
|
||||
|
||||
**Files:** `roles/base/defaults/main.yml`, `roles/base/tasks/mesh.yml` (create), `roles/base/tasks/main.yml`, `roles/base/README.md`.
|
||||
|
||||
- [ ] **Step 1:** append the knobs to `roles/base/defaults/main.yml`:
|
||||
```yaml
# NetBird mesh agent enrollment (ADR-016). Opt-in: default off so applying `base` to a
# host not (yet) on the mesh is a no-op for this concern. The live actions (apt install
# over the network, `netbird up` against the coordinator) are additionally gated by
# base__mesh_manage so Molecule can exercise the wiring without a coordinator.
base__mesh_enabled: false
base__mesh_manage: true
base__mesh_management_url: "https://netbird.askari.wingu.me"
# Carries the base__ prefix, so no var-naming[no-role-prefix] lint exemption is needed.
base__mesh_setup_key: "{{ vault.netbird.setup_key }}"
base__mesh_version: "0.72.4"  # match the coordinator; confirmed installable in Task 1
```
|
||||
- [ ] **Step 2:** create `roles/base/tasks/mesh.yml` (use the Task-1-verified repo URL/key/pin; the values below are the expected ones to confirm):
|
||||
```yaml
---
# NetBird agent enrollment (ADR-016). Additive only: no firewall change here.
- name: Ensure /etc/apt/keyrings exists
  ansible.builtin.file:
    path: /etc/apt/keyrings
    state: directory
    mode: "0755"
  tags: [mesh]

- name: Add the NetBird APT GPG key
  ansible.builtin.get_url:
    url: https://pkgs.netbird.io/debian/public.key  # confirm in Task 1
    dest: /etc/apt/keyrings/netbird.asc
    mode: "0644"
  when: base__mesh_manage | bool
  tags: [mesh]

- name: Add the NetBird APT repository
  ansible.builtin.apt_repository:
    # Confirm the repo line in Task 1. The note sits here because a `#` inside
    # the folded scalar below would become part of the repo string.
    repo: >-
      deb [signed-by=/etc/apt/keyrings/netbird.asc]
      https://pkgs.netbird.io/debian stable main
    filename: netbird
    state: present
  when: base__mesh_manage | bool
  tags: [mesh]

- name: Install the NetBird agent (pinned)
  ansible.builtin.apt:
    name: "netbird={{ base__mesh_version }}"  # confirm pin syntax in Task 1
    state: present
    update_cache: true
  when: base__mesh_manage | bool
  tags: [mesh]

- name: Check current NetBird connection status
  ansible.builtin.command: netbird status
  register: _netbird_status
  changed_when: false
  failed_when: false
  when: base__mesh_manage | bool
  tags: [mesh]

- name: Enrol this host in the mesh
  ansible.builtin.command: >-
    netbird up
    --management-url {{ base__mesh_management_url }}
    --setup-key {{ base__mesh_setup_key }}
  register: _netbird_up
  changed_when: _netbird_up.rc == 0
  when:
    - base__mesh_manage | bool
    - "'Management: Connected' not in (_netbird_status.stdout | default(''))"  # confirm string in Task 1
  no_log: true  # setup key is on the argv
  tags: [mesh]
```
|
||||
- [ ] **Step 3:** in `roles/base/tasks/main.yml`, add the include (after the existing concerns), gated by `base__mesh_enabled`:
|
||||
```yaml
- name: NetBird mesh enrollment
  ansible.builtin.include_tasks:
    file: mesh.yml
    apply:
      tags: [mesh]
  when: base__mesh_enabled | bool
  tags: [mesh]
```
|
||||
- [ ] **Step 4:** document the concern in `roles/base/README.md` (purpose; the `base__mesh_*` knobs table; that it is additive/no-firewall; that the setup key comes from `vault.netbird.setup_key`; the `enabled`/`manage` gating).
|
||||
- [ ] **Step 5:** `make lint` → 0 failures. Commit `feat(base): NetBird agent enrollment concern (mesh)`.
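
`no_log: true` keeps the setup key out of Ansible's output, but a key passed as a flag is still visible in the host's process list while `netbird up` runs. A possible variant, sketched below, passes the key through the environment instead. The variable names `NB_MANAGEMENT_URL` and `NB_SETUP_KEY` are assumptions about the agent's environment-variable support and belong on the Task 1 verification list before use.

```yaml
# Sketch only: same guard as Step 2, but the key travels in the environment, not on the argv.
- name: Enrol this host in the mesh (setup key via environment)
  ansible.builtin.command: netbird up
  environment:
    NB_MANAGEMENT_URL: "{{ base__mesh_management_url }}"  # assumed env-var name; verify
    NB_SETUP_KEY: "{{ base__mesh_setup_key }}"            # assumed env-var name; verify
  register: _netbird_up
  changed_when: _netbird_up.rc == 0
  when:
    - base__mesh_manage | bool
    - "'Management: Connected' not in (_netbird_status.stdout | default(''))"
  no_log: true
  tags: [mesh]
```

If the pinned agent does not honour these variables, the flag-based form in Step 2 remains the fallback.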
|
||||
|
||||
---
|
||||
|
||||
### Task 3: Molecule coverage
|
||||
|
||||
**Files:** `roles/base/molecule/default/converge.yml`, `roles/base/molecule/default/verify.yml`.
|
||||
|
||||
> The concern is install + a daemon command needing a live coordinator, so the hermetic Molecule surface is thin (the known "render-only misses the real call" gotcha). Molecule proves: (a) enabling mesh with `manage: false` does not break the base converge and is idempotent; (b) `base__mesh_enabled: false` (the default, already exercised by the existing firewall test) is a clean no-op. Full install+enrol is proven live in Task 6.
|
||||
|
||||
- [ ] **Step 1:** in `converge.yml` add to `vars:`:
|
||||
```yaml
base__mesh_enabled: true
base__mesh_manage: false  # skip network/daemon actions
base__mesh_setup_key: "dummy-molecule-key"
```
|
||||
- [ ] **Step 2:** in `verify.yml` add a task asserting the concern is a clean no-op under `manage: false` — `netbird` is NOT installed and `wt0` does not exist (since all live actions are gated off):
|
||||
```yaml
- name: Confirm mesh manage=false did not install/enrol
  ansible.builtin.command: which netbird
  register: _nb
  changed_when: false
  failed_when: false

- name: Assert netbird absent under manage=false
  ansible.builtin.assert:
    that:
      - _nb.rc != 0
    fail_msg: "netbird should not be installed when base__mesh_manage is false"
```
|
||||
- [ ] **Step 3:** `make test ROLE=base` → converge + idempotence + verify pass (`failed=0`). The existing firewall assertions still pass (mesh vars don't affect them).
|
||||
- [ ] **Step 4:** commit `test(base): molecule coverage for the mesh concern (manage-off no-op)`.
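
Step 2 describes two conditions (no `netbird` binary and no `wt0` interface), while its snippet checks only the first. A companion check for the interface might look like the sketch below. It assumes the Molecule image ships `ip` (iproute2); if it does not, the probe returns non-zero anyway and the assertion passes vacuously. The register name `_wt0` is hypothetical.

```yaml
# Sketch only: companion to the `which netbird` check in Step 2.
- name: Probe for the NetBird interface
  ansible.builtin.command: ip link show wt0
  register: _wt0  # hypothetical register name
  changed_when: false
  failed_when: false

- name: Assert wt0 absent under manage=false
  ansible.builtin.assert:
    that:
      - _wt0.rc != 0
    fail_msg: "wt0 should not exist when base__mesh_manage is false"
```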
|
||||
|
||||
---
|
||||
|
||||
### Task 4: Vault stub + per-host opt-in
|
||||
|
||||
- [ ] **Step 1 (vault — needs `rbw` unlocked):** `make decrypt FILE=inventories/production/group_vars/all/vault.yml`; add under `vault.netbird` (alongside `auth_secret`/`datastore_key`):
|
||||
```yaml
# Reusable, scoped (group "boma-hosts"), expiring NetBird setup key. Mint it in the
# dashboard (Setup Keys) AFTER the first-boot /setup admin exists. Consumed by the
# base 'mesh' concern. CHANGEME until the operator supplies it via `make edit-vault`.
setup_key: CHANGEME
```
|
||||
`make encrypt FILE=...`; `make check-vault` → confirms structure + lists the `setup_key` CHANGEME.
|
||||
- [ ] **Step 2:** set the opt-in. In `inventories/production/group_vars/control/vars.yml` add `base__mesh_enabled: true` (ubongo). Create `inventories/production/group_vars/offsite_hosts/vars.yml`:
|
||||
```yaml
---
# askari is a NetBird peer as well as the coordinator host (ADR-016).
base__mesh_enabled: true
```
|
||||
- [ ] **Step 3:** `make lint` → 0 failures. Commit `feat(base): vault setup_key stub + enable mesh on ubongo + askari`.
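
Because the vault stub ships as `CHANGEME` until Task 5, a live run before the handoff would send the placeholder to the coordinator. One hedged option is an early guard at the top of `mesh.yml`, sketched below. It is not part of the plan as written, and the failure wording is illustrative.

```yaml
# Sketch only: fail early instead of sending a placeholder key to the coordinator.
- name: Refuse to enrol with a placeholder setup key
  ansible.builtin.assert:
    that:
      - base__mesh_setup_key | length > 0
      - base__mesh_setup_key != 'CHANGEME'
    fail_msg: "vault.netbird.setup_key is still a placeholder; complete Task 5 first"
    quiet: true
  when: base__mesh_manage | bool
  tags: [mesh]
```

The guard is skipped under `base__mesh_manage: false`, so the Molecule dummy key is unaffected.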
|
||||
|
||||
---
|
||||
|
||||
### Task 5: Operator handoff — first-boot admin + setup key (GATED, operator does this)
|
||||
|
||||
> Nothing here is automatable — the agent cannot create a dashboard admin or mint a key.
|
||||
|
||||
- [ ] **Step 1 (operator):** browse `https://netbird.askari.wingu.me`, complete the one-time `/setup` to create the admin user, log in.
|
||||
- [ ] **Step 2 (operator):** create a **reusable** setup key, **scoped** to auto-assign peers to a `boma-hosts` group, with an **expiry**. Copy the key value.
|
||||
- [ ] **Step 3 (operator):** `make edit-vault` → replace `vault.netbird.setup_key`'s `CHANGEME` with the real key → `:wq` (re-encrypts) → `make check-vault` shows no outstanding CHANGEME. The key never enters the chat.
|
||||
- [ ] **Step 4:** commit nothing beyond the re-encrypted vault file; its structure is unchanged and only the `setup_key` value differs.
|
||||
|
||||
---
|
||||
|
||||
### Task 6: Enrol `ubongo` + `askari` (GATED, live — needs Task 5 done + `rbw` unlocked)
|
||||
|
||||
- [ ] **Step 1:** `make check PLAYBOOK=site LIMIT=askari TAGS=mesh` — review (askari is `ansible`-user managed; cleaner first target than the control node). Then `make deploy PLAYBOOK=site LIMIT=askari TAGS=mesh`.
|
||||
- [ ] **Step 2:** verify on askari: `netbird status` shows `Management: Connected`; `ip link show wt0` exists. (Agent coexists with the coordinator container; it reaches the coordinator via the public URL.)
|
||||
- [ ] **Step 3:** `make check PLAYBOOK=site LIMIT=ubongo TAGS=mesh` — review. Note: ubongo is managed as `sjat` with `become: true` (same path `dev_env` used via `playbooks/workstation.yml`); confirm `sjat` sudo works (the run will prompt/fail clearly if a become password is needed). Then `make deploy PLAYBOOK=site LIMIT=ubongo TAGS=mesh`.
|
||||
- [ ] **Step 4:** verify the mesh link from ubongo: `netbird status` shows `ubongo` connected and lists `askari` as a peer; ping askari's NetBird (`100.x`) address. If the pinned NetBird is NOT allow-by-default (Task 1, Step 1), add one minimal dashboard policy permitting the admin group → `ubongo` SSH (or, as a temporary measure, restore the default policy) so Task 7 can connect.
|
||||
- [ ] **Step 5:** no repo commit (host state).
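
The manual checks in Steps 2 and 4 could also be run as read-only tasks against each enrolled host, as sketched below. The status string is the same one Task 1 is meant to confirm, and the register name `_nb_live` is hypothetical. The peer listing and the ping to the `100.x` address are left manual because the peer address is not known in advance.

```yaml
# Sketch only: the manual checks from Steps 2 and 4 expressed as read-only tasks.
- name: Read the NetBird agent status
  ansible.builtin.command: netbird status
  register: _nb_live  # hypothetical register name
  changed_when: false

- name: Assert the management connection is up
  ansible.builtin.assert:
    that:
      - "'Management: Connected' in _nb_live.stdout"  # string to confirm in Task 1
    fail_msg: "agent is installed but not connected to the coordinator"

- name: Confirm the wt0 interface exists
  ansible.builtin.command: ip link show wt0
  changed_when: false
```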
|
||||
|
||||
---
|
||||
|
||||
### Task 7: Enrol the road-warrior clients → goal lands (operator)
|
||||
|
||||
- [ ] **Step 1 (operator):** install the NetBird client on `mamba` + the work laptop; log in via the dashboard (Dex SSO) so they join the mesh.
|
||||
- [ ] **Step 2 (operator):** from a laptop (anywhere), `ssh sjat@<ubongo-netbird-ip>` (or the mesh hostname) — connection succeeds. **← the mobile-access goal lands here.**
|
||||
- [ ] **Step 3:** confirm with the operator that remote access works end-to-end.
|
||||
|
||||
---
|
||||
|
||||
### Task 8: Docs
|
||||
|
||||
- [ ] **Step 1:** `STATUS.md` — move "NetBird agent enrollment in `base`" to **built + applied** (ubongo + askari enrolled; reachability achieved). Note the `mesh` concern + opt-in. ubongo row: mesh-enrolled (its other base concerns still pending). askari row: NetBird peer.
|
||||
- [ ] **Step 2:** `docs/ROADMAP.md` — **M5 ✅ DONE**; Phase 1 (remote access) complete. Next: the **Procurement gate** (`/capacity-review` → buy cluster hardware). Record the deferred "mesh hardening" follow-on (ubongo nftables default-deny + NetBird ACL tightening + askari SSH→`wt0`).
|
||||
- [ ] **Step 3:** `docs/FRICTION.md` — add a signal: a **docs-only commit still tripped the `rbw`-locked pre-commit guard** (2026-06-17), although the 2026-06-10 kaizen fix was meant to let docs-/config-only commits through without vault — the hook scoping or a blanket guard needs a look.
|
||||
- [ ] **Step 4:** `make lint`; commit `docs: M5 done — Phase 1 remote access complete`.
|
||||
|
||||
---
|
||||
|
||||
## Self-Review (completed)
|
||||
|
||||
- **Spec coverage:** `mesh` concern (spec §1) → Tasks 1–3; vault stub (spec §2) → Task 4; ubongo+askari enrol (spec §3) → Tasks 4,6; laptops (spec §3) → Task 7; reachability via default policy (spec §4) → Task 6 step 4; deferred hardening (spec §6) → recorded in Task 8; operator handoff (spec) → Task 5. Testing (spec) → Task 3 (hermetic) + Task 6 (live). All covered.
|
||||
- **Placeholder scan:** the "confirm in Task 1" markers are ADR-014 verification points executed in Task 1 (the repo URL/key/pin/status-string), not vague TODOs — Task 2's code carries the expected values to confirm, matching how M4a/M4b pinned versions in-plan.
|
||||
- **Consistency:** `base__mesh_enabled` (opt-in) vs `base__mesh_manage` (test gate) used consistently across defaults, tasks, include, converge, and the no-op assertion; `vault.netbird.setup_key` matches between defaults, vault stub, and Task 5; `mesh` tag added (Task 1) before it is used (Task 2).
|
||||
- **Risk:** the only live risk is Task 6 on the control node — mitigated because the `mesh` concern makes **no firewall change** (SSH stays open on all paths), askari is enrolled first as the lower-risk rehearsal, and the host nftables lockdown is explicitly out of scope.
|
||||
131
docs/superpowers/specs/2026-06-17-m5-mesh-enrollment-design.md
Normal file
|
|
@ -0,0 +1,131 @@
|
|||
# M5 — Mesh enrollment (NetBird agents) → mobile access · design
|
||||
|
||||
**Status:** Design (2026-06-17). Implements ROADMAP **M5**, the last milestone of Phase 1
|
||||
(remote access). Builds on M4b (the `netbird_coordinator` is live on `askari`). Design
|
||||
resolved by **ADR-016** (mesh, agent-per-host) and **ADR-021** (SSH ladder); this spec is
|
||||
the build-shaping for that decision. Next: `writing-plans`.
|
||||
|
||||
## Goal
|
||||
|
||||
`ubongo` reachable from anywhere over the self-hosted NetBird mesh — the Phase-1
|
||||
mobile-access goal. **Reachability only.** The host-firewall lockdown and NetBird
|
||||
ACL-tightening are deliberately **deferred** (see §6).
|
||||
|
||||
## Decisions (settled in brainstorming)
|
||||
|
||||
1. **Scope = reachability, not lockdown.** The goal needs only: agents enrolled + the
|
||||
laptops on the mesh + a peer policy permitting laptop→`ubongo`. `ubongo`'s SSH is
|
||||
already open, so reachability requires **no firewall change**. Applying the `base`
|
||||
nftables default-deny to `ubongo` is the lockout-risky part on the control node and is
|
||||
split into a follow-on (§6).
|
||||
2. **One reusable, scoped, expiring setup key.** A single reusable key in
|
||||
`vault.netbird.setup_key`, scoped to auto-assign peers to a `boma-hosts` group, with an
|
||||
expiry. `base` re-runs idempotently across hosts. Matches ADR-016's single vault path;
|
||||
blast radius is limited by scope + expiry + the fact that joining the mesh grants no
|
||||
access on its own (peer policy gates that). Rejected: per-host one-off ephemeral keys —
|
||||
more operator toil and they don't fit a single vault key for a re-runnable role.
|
||||
3. **`askari` is enrolled as a peer** (ADR-016: it runs the stack *and* is a peer). The
|
||||
agent coexists with the coordinator container on the same host. Enables later moving
|
||||
`askari`'s SSH off the Hetzner-firewall WAN allow onto `wt0`, and gives a host-to-host
|
||||
mesh link verifiable from `ubongo`.
|
||||
|
||||
## Architecture
|
||||
|
||||
### 1. New `base` concern: `mesh` (agent enrollment)
|
||||
|
||||
A new `roles/base/tasks/mesh.yml`, included from `base/tasks/main.yml` via
|
||||
`include_tasks` with `apply: { tags: [mesh] }` (the dynamic-include tag-propagation
|
||||
gotcha — see existing concerns), tagged `mesh`. A new `mesh` entry is added to the closed
|
||||
tag vocabulary in `tests/tags.yml`.
|
||||
|
||||
The concern:
|
||||
|
||||
- **installs a pinned NetBird agent** from the official NetBird apt repo (repo + key added
|
||||
like `docker_host` does for Docker; exact package + version **verified in the plan** per
|
||||
ADR-014). Version-pinned (ADR-011).
|
||||
- **enrolls idempotently:** run `netbird up --management-url {{ base__mesh_management_url }}
|
||||
--setup-key <key>` **only when** `netbird status` reports the host is not already
|
||||
connected (guard on a `command` check, `changed_when` accordingly). The setup key is
|
||||
passed with `no_log: true`.
|
||||
- **does NOT touch the host firewall.** Enrollment is purely additive: `wt0` comes up,
|
||||
`sshd` keeps listening on all interfaces exactly as today. No lockout risk in M5.
|
||||
|
||||
**Knobs (`base__mesh_*`, defaults in `roles/base/defaults/main.yml`):**
|
||||
|
||||
| Var | Default | Purpose |
|---|---|---|
| `base__mesh_enabled` | `false` | **Policy/opt-in gate.** `false` ⇒ the whole concern is skipped, so applying `base` to a host not ready to join the mesh is a no-op. Set `true` per host/group (`ubongo`, `askari`) to enrol. |
| `base__mesh_manage` | `true` | **Test gate** for the live daemon step. `true` ⇒ run `netbird up`; Molecule sets `false` so the concern can be exercised without a real coordinator/key (mirrors `reverse_proxy__manage` / `netbird_coordinator__manage`). |
| `base__mesh_management_url` | `https://netbird.askari.wingu.me` | The M4b coordinator. |
| `base__mesh_setup_key` | `"{{ vault.netbird.setup_key }}"` | Reusable scoped key (vault). |
| `base__mesh_version` | pinned (plan) | NetBird agent version (ADR-011). |
|
||||
|
||||
### 2. Vault
|
||||
|
||||
Add `vault.netbird.setup_key: CHANGEME` with a comment stating it is a **reusable, scoped
|
||||
(`boma-hosts`), expiring** setup key minted in the NetBird dashboard after first-boot
|
||||
`/setup`. The agent cannot mint it — the operator supplies it via `make edit-vault`.
|
||||
`make check-vault` lists the outstanding `CHANGEME` until then. `base/tasks/mesh.yml` wires
|
||||
to `{{ vault.netbird.setup_key }}`.
|
||||
|
||||
### 3. Enrollment scope
|
||||
|
||||
- **`ubongo`** — `base` `mesh` concern applied (tagged), bringing up `wt0`. Its other
|
||||
`base` concerns (`firewall`, `hardening`) stay unapplied — `TAGS=mesh` scopes the run to
|
||||
enrollment only, so no default-deny lands on the control node.
|
||||
- **`askari`** — `base` `mesh` concern applied; agent enrols against its own public
|
||||
coordinator URL and coexists with the coordinator container.
|
||||
- **`mamba` + work laptop** — **operator** installs the NetBird client and logs in via the
|
||||
dashboard (embedded Dex SSO). Not Ansible-managed; out of automation scope.
|
||||
|
||||
### 4. Reachability
|
||||
|
||||
M5 relies on NetBird's **default peer policy** for laptop→`ubongo` reachability. The plan
|
||||
**verifies the pinned version's default-policy behaviour** (ADR-014); if it is not
|
||||
allow-by-default, the plan adds one minimal policy permitting the admin group → `ubongo`
|
||||
SSH. ACL-tightening to default-deny + scoped policies (ADR-016 intent) is **deferred**
|
||||
(§6).
|
||||
|
||||
## Testing
|
||||
|
||||
- **Automated (I do, needs nothing from operator):** Molecule for the `base` `mesh`
|
||||
concern with `base__mesh_enabled: true`, `base__mesh_manage: false`, and a dummy
|
||||
`vault.netbird.setup_key` — so the install/enrol tasks are exercised but the live
|
||||
`netbird up` (which needs a real coordinator + key) is gated off. Note: this concern is
|
||||
install + a daemon command, so its render-only surface is thin (the "render-only tests
|
||||
miss the real call" gotcha) — Molecule asserts the enrol command is constructed
|
||||
correctly + idempotency guard works; full enrollment is proven in the live step below.
|
||||
Also assert `base__mesh_enabled: false` is a clean no-op. `make lint` (incl.
|
||||
`check-tags` for the new `mesh` tag).
|
||||
- **Live (gated, after the operator handoff):** apply `base` `TAGS=mesh` to `ubongo` +
|
||||
`askari`; verify `wt0` is up and the **`ubongo`↔`askari` mesh link** works from `ubongo`
|
||||
(both are peers I manage — e.g. `netbird status` shows the peer, ping the peer's mesh IP).
|
||||
- **Goal verification (operator):** from a laptop on the mesh, SSH `ubongo` over its
|
||||
NetBird/`wt0` address. This is the mobile-access goal landing.
|
||||
|
||||
## Operator handoff (the steps only the operator can do)
|
||||
|
||||
1. Dashboard `/setup` (one-time) → create the admin user.
|
||||
2. Mint a **reusable, scoped (`boma-hosts`), expiring** setup key → `make edit-vault` to
|
||||
replace the `CHANGEME` → re-encrypt. (`make check-vault` confirms.)
|
||||
3. Install the NetBird client on `mamba` + the work laptop, log in via the dashboard.
|
||||
4. Confirm SSH to `ubongo` over the mesh.
|
||||
|
||||
## Out of scope / deferred (the "mesh hardening" follow-on)
|
||||
|
||||
- **`base` nftables default-deny on `ubongo`** (SSH only on `wt0` + the
|
||||
`base__firewall_control_addr` LAN fallback, ADR-021/020). Built + dormant today; applying
|
||||
it to the control node is the lockout-risky step and gets its own deliberate change
|
||||
**after** the mesh path to `ubongo` is proven solid.
|
||||
- **NetBird ACL tightening** to default-deny + scoped per-group policies (ADR-016: admin
|
||||
peers → `srv`+`mgmt`, clients least-privilege). M5 uses the default policy.
|
||||
- **`askari` SSH onto `wt0`** (retiring the Hetzner-firewall WAN SSH allow) — enabled by
|
||||
`askari` now being a peer, but a separate change.
|
||||
|
||||
## Maps to
|
||||
|
||||
ADR-016 (mesh, agent-per-host, setup keys in vault), ADR-021 (SSH ladder — `wt0` primary +
|
||||
`ssh-from-control`; the lockdown that *uses* this is deferred), ADR-020 (host firewall —
|
||||
default-deny deferred), ADR-002 (security baseline), ADR-011 (version-pinned agent),
|
||||
ADR-004 (enrollment lives in `base`, not a new role), ADR-014 (verify agent
|
||||
version/package + default-policy behaviour in the plan).
|
||||