From 65cf20a993e75f188954daee0414d59c06e60d52 Mon Sep 17 00:00:00 2001
From: sjat
Date: Sun, 14 Jun 2026 17:19:21 +0200
Subject: [PATCH] =?UTF-8?q?docs(spec):=20M4=20=E2=80=94=20NetBird=20coordi?=
 =?UTF-8?q?nator=20on=20askari=20+=20Caddy=20reverse=20proxy?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Caddy becomes boma's standard reverse proxy (amends the soft Traefik
assumption; new ADR) with Gandi DNS-01 certs (custom xcaddy image, reuses
vault.gandi.pat) — the only cert path for mesh/LAN-only services. NetBird
self-hosted in external-proxy mode (embedded Dex), compose rendered from
boma templates (ADR-004/013). Three roles: docker_host (first real
content), reverse_proxy (new, Caddy), netbird (first service role w/ full
ADR-004 standard files). Firewall + DNS amendments; backup execution
deferred (fisi). caddy-dns/gandi + NetBird self-host facts verified.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 ...026-06-14-netbird-coordinator-m4-design.md | 120 ++++++++++++++++++
 1 file changed, 120 insertions(+)
 create mode 100644 docs/superpowers/specs/2026-06-14-netbird-coordinator-m4-design.md

diff --git a/docs/superpowers/specs/2026-06-14-netbird-coordinator-m4-design.md b/docs/superpowers/specs/2026-06-14-netbird-coordinator-m4-design.md
new file mode 100644
index 0000000..3eff213
--- /dev/null
+++ b/docs/superpowers/specs/2026-06-14-netbird-coordinator-m4-design.md
@@ -0,0 +1,120 @@
+# Design — NetBird coordinator on askari + Caddy reverse proxy (M4)
+
+- **Date:** 2026-06-14
+- **Status:** Draft → straight to plan (per the standing skip-the-spec-review-gate agreement)
+- **Roadmap milestone:** M4 (`docs/ROADMAP.md`)
+- **Implements:** ADR-016 (NetBird coordinator self-hosted on askari), ADR-004 (first
+  service role)
+- **Establishes:** a new **ADR — boma's reverse proxy is Caddy** (amends the soft Traefik
+  assumption in the roadmap/ADR-017 prose)
+
+---
+
+## Problem
+
+The NetBird mesh control plane (ADR-016) must run on askari so ubongo + road-warrior
+laptops can enrol (M5) and reach ubongo from anywhere. This is also boma's **first real
+service role** (ADR-004) and its **first reverse proxy** — so M4 sets two precedents:
+the service-role pattern, and Caddy as boma's standard reverse proxy.
+
+## Decisions (as settled)
+
+1. **Caddy is boma's standard reverse proxy** (replaces the soft Traefik assumption — no
+   ADR ever pinned Traefik). Rationale: boma renders all config from Ansible templates
+   (ADR-004), so Traefik's dynamic Docker-label discovery is wasted; Caddy's templated
+   Caddyfile + automatic HTTPS fits the "render from the catalog" model; far simpler for a
+   solo operator; `forward_auth` to Authentik later keeps the auth story. → small new ADR.
+2. **Caddy + Gandi DNS-01** (not HTTP-01). boma's services are mostly **mesh/LAN-only with
+   no public DNS record**, and you cannot HTTP-01 an unexposed host — DNS-01 is the only
+   cert path for them (the reason M1 built Gandi DNS-01). One mechanism fleet-wide; reuses
+   `vault.gandi.pat`. Cost: a **custom Caddy image** (`xcaddy` + `caddy-dns/gandi`) — fits
+   boma's "build our own images" pattern (the Molecule image).
+3. **NetBird in external-reverse-proxy mode** — disable its bundled Traefik; boma's Caddy
+   terminates TLS for `netbird.askari.wingu.me` and proxies to the NetBird containers.
+   Embedded **Dex** IdP (ADR-016). The compose + server config are **rendered from boma
+   Jinja templates** (ADR-004 + ADR-013 translate-don't-transplant), based on NetBird's
+   current self-host reference read at implementation time.
+4. **Three roles, applied to askari (`offsite_hosts`):**
+   - **`docker_host`** — first real content: install Docker engine + compose plugin,
+     version-pinned (ADR-011). (Cluster daemon-hardening + `nftables.d` integration stay
+     deferred to the cluster.)
+   - **`reverse_proxy`** (new) — the custom Caddy image + a `Caddyfile` rendered from route
+     data + `.env` with `GANDI_BEARER_TOKEN={{ vault.gandi.pat }}`. boma's standard proxy;
+     generalises to the cluster later (not built now).
+   - **`netbird`** (new) — **boma's first service role**: renders the NetBird compose +
+     server config + `.env` from vault; the full ADR-004 standard files.
+5. **Firewall:** amend the M2 Hetzner Cloud Firewall (TF `offsite`) to open **80/443 TCP +
+   3478 UDP** (NetBird's public ports). SSH-from-ubongo stays.
+6. **DNS:** add `netbird.askari.wingu.me` → askari's IP via `public_dns` (M1 role).
+7. **Standard service-role files authored, execution deferred.** SECURITY/VERIFY/ACCESS/
+   BACKUP.md written for `netbird` (the precedent), but `/verify-service` (playwright) and
+   `fisi` backup don't exist yet — BACKUP.md records the datastore + an **accepted risk**
+   that off-site backup is pending; VERIFY.md is authored, run later.
+8. **Setup keys are an M5 artifact** (created post-deploy via the dashboard/API). M4 stubs
+   `vault.netbird.setup_key: CHANGEME` (the placeholder convention) for M5 to fill.
+
+## Verified facts (ADR-014)
+
+> verified: caddy-dns/gandi v1.1.0 (2025-07) · module `dns.providers.gandi`, `xcaddy`
+> build, PAT via `GANDI_BEARER_TOKEN`, `tls { dns gandi {env.GANDI_BEARER_TOKEN} }` ·
+> WebFetch github.com/caddy-dns/gandi · 2026-06-14
+> verified: NetBird self-host · Docker Compose (management + signal + relay + coturn +
+> dashboard), embedded Dex, ports 80/443 TCP + 3478 UDP, supports an external reverse
+> proxy · WebFetch docs.netbird.io/selfhosted · 2026-06-14
+> to verify in the plan: exact NetBird compose/`config.yaml`/`dashboard.env` schema for
+> the pinned version, the external-proxy config knobs, and which secrets are
+> role-generated vs operator-supplied.
+
+## Architecture & data flow
+
+```
+road-warrior / ubongo ──TLS──> Caddy (askari:443, netbird.askari.wingu.me)
+                               │ cert: ACME DNS-01 via Gandi (vault.gandi.pat)
+                               └─> NetBird dashboard + management/signal (HTTP, internal)
+NetBird agents ──UDP 3478──> Coturn (STUN/TURN) ; ──relay──> relay
+```
+- All containers on askari via Docker Compose (rendered by Ansible).
+- Caddy and NetBird share a Docker network; only Caddy (80/443) + Coturn (3478) face the
+  internet (Hetzner Cloud Firewall + the container port mapping).
+
+## Roles (units, each testable)
+
+- `docker_host` — `tasks/main.yml`: add Docker apt repo (pinned), install
+  `docker-ce` + `docker-compose-plugin`, enable the service. Molecule: install + `docker
+  --version`. (Tag `packages`/role-name.)
+- `reverse_proxy` — custom image (`.docker/caddy-gandi/Dockerfile`, `xcaddy` +
+  `caddy-dns/gandi`, built/pushed like the Molecule image); `templates/{docker-compose,
+  Caddyfile,env}.j2`; route data in `group_vars` (`reverse_proxy__routes`). Molecule:
+  render + `caddy validate`.
+- `netbird` — `templates/{docker-compose.yml,config.yaml,dashboard.env,...}.j2` rendered
+  from `netbird__*` + vault; the ADR-004 standard files. Deploy mechanics per ADR-004.
+
+## Testing
+
+- **Molecule** per role where it fits (docker_host: install; reverse_proxy: `caddy
+  validate` on the rendered Caddyfile). NetBird's full stack is heavy for a container —
+  rely on **live verification on askari** (compose up; `curl -sI https://netbird.askari.wingu.me`
+  → 200 + valid cert; dashboard loads; `docker compose ps` healthy).
+- **Live (gated, on askari):** deploy the three roles; verify the cert issues via DNS-01,
+  the dashboard is reachable over TLS, and the NetBird services are healthy. (Enrolment is
+  M5.)
+
+## Scope boundaries — what M4 is NOT
+
+- **Not** enrolment (ubongo + laptops) or narrowing SSH to `wt0` — **M5**.
+- **Not** the cluster reverse proxy / Authentik forward-auth — **Phase 2** (the
+  `reverse_proxy` role is built to generalise, but only askari/NetBird is wired now).
+- **Not** off-site backup execution of the datastore — pending `fisi` (ADR-022); recorded
+  as an accepted risk with BACKUP.md authored.
+- **Not** auditd/CIS, host firewall on askari (still M5/Phase 2).
+
+## Open items (resolve in the plan)
+
+- Pin the NetBird version + read its current self-host compose/config; pin Caddy +
+  `caddy-dns/gandi` versions; pin Docker CE.
+- Decide which `netbird` secrets are **role-generated** (turn password, dex secrets — via
+  `community.general.random_string`/lookups, persisted to vault) vs operator-supplied
+  (none expected beyond the M5 setup key).
+- Confirm the custom Caddy image build/host (local build vs the Forgejo registry, like the
+  Molecule image).
+- `netbird.askari.wingu.me` as an A (to askari's IP) vs CNAME to `askari.wingu.me`.
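
Reviewer note (not part of the patch): the spec has the `reverse_proxy` role render a `Caddyfile` from route data, with every site using the verified `tls { dns gandi {env.GANDI_BEARER_TOKEN} }` directive. The Python sketch below only illustrates that "one site block per route" shape. The `Route` fields, the `render_caddyfile` name, and the `netbird-dashboard:80` upstream are hypothetical; the spec names only `reverse_proxy__routes` and says the real rendering goes through Jinja templates. A real NetBird route set probably needs more than one upstream, which the spec leaves to the plan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """One reverse-proxy route (hypothetical shape, not defined by the spec)."""
    host: str      # hostname Caddy serves and requests a certificate for
    upstream: str  # internal container address in "name:port" form


def render_caddyfile(routes: list[Route]) -> str:
    """Render one Caddy site block per route, each using Gandi DNS-01 for TLS."""
    blocks = []
    for route in routes:
        blocks.append(
            f"{route.host} {{\n"
            "\ttls {\n"
            # The token is read from the container environment at runtime,
            # so the secret never appears in the rendered file.
            "\t\tdns gandi {env.GANDI_BEARER_TOKEN}\n"
            "\t}\n"
            f"\treverse_proxy {route.upstream}\n"
            "}\n"
        )
    # Separate site blocks with a blank line; no routes yields an empty file.
    return "\n".join(blocks)


if __name__ == "__main__":
    # The upstream name below is an assumption made for illustration only.
    routes = [Route(host="netbird.askari.wingu.me", upstream="netbird-dashboard:80")]
    print(render_caddyfile(routes))
```

Keeping the token as a Caddy `{env.…}` placeholder matches the spec's choice to pass `GANDI_BEARER_TOKEN` through `.env`; the rendered output is what `caddy validate` would check in the Molecule scenario.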