# Design — NetBird coordinator on askari + Caddy reverse proxy (M4)

- **Date:** 2026-06-14
- **Status:** Draft → straight to plan (per the standing skip-the-spec-review-gate agreement)
- **Roadmap milestone:** M4 (`docs/ROADMAP.md`)
- **Implements:** ADR-016 (NetBird coordinator self-hosted on askari), ADR-004 (first
  service role)
- **Establishes:** a new **ADR — boma's reverse proxy is Caddy** (amends the soft Traefik
  assumption in the roadmap/ADR-017 prose)

---

## Problem

The NetBird mesh control plane (ADR-016) must run on askari so ubongo + road-warrior
laptops can enrol (M5) and reach ubongo from anywhere. This is also boma's **first real
service role** (ADR-004) and its **first reverse proxy** — so M4 sets two precedents: the
service-role pattern, and Caddy as boma's standard reverse proxy.

## Decisions (as settled)

1. **Caddy is boma's standard reverse proxy** (replaces the soft Traefik assumption — no
   ADR ever pinned Traefik). Rationale: boma renders all config from Ansible templates
   (ADR-004), so Traefik's dynamic Docker-label discovery is wasted; Caddy's templated
   Caddyfile + automatic HTTPS fits the "render from the catalog" model; far simpler for a
   solo operator; `forward_auth` to Authentik later keeps the auth story. → small new ADR.
2. **Caddy + Gandi DNS-01** (not HTTP-01). boma's services are mostly **mesh/LAN-only with
   no public DNS record**, and you cannot HTTP-01 an unexposed host — DNS-01 is the only
   cert path for them (the reason M1 built Gandi DNS-01). One mechanism fleet-wide; reuses
   `vault.gandi.pat`. Cost: a **custom Caddy image** (`xcaddy` + `caddy-dns/gandi`) — fits
   boma's "build our own images" pattern (the Molecule image).
3. **NetBird in external-reverse-proxy mode** — disable its bundled Traefik; boma's Caddy
   terminates TLS for `netbird.askari.wingu.me` and proxies to the NetBird containers.
   Embedded **Dex** IdP (ADR-016).
   The compose + server config are **rendered from boma Jinja templates** (ADR-004 +
   ADR-013 translate-don't-transplant), based on NetBird's current self-host reference
   read at implementation time.
4. **Three roles, applied to askari (`offsite_hosts`):**
   - **`docker_host`** — first real content: install Docker engine + compose plugin,
     version-pinned (ADR-011). (Cluster daemon-hardening + `nftables.d` integration stay
     deferred to the cluster.)
   - **`reverse_proxy`** (new) — the custom Caddy image + a `Caddyfile` rendered from
     route data + `.env` with `GANDI_BEARER_TOKEN={{ vault.gandi.pat }}`. boma's standard
     proxy; generalises to the cluster later (not built now).
   - **`netbird`** (new) — **boma's first service role**: renders the NetBird compose +
     server config + `.env` from vault; the full ADR-004 standard files.
5. **Firewall:** amend the M2 Hetzner Cloud Firewall (TF `offsite`) to open **80/443 TCP +
   3478 UDP** (NetBird's public ports). SSH-from-ubongo stays.
6. **DNS:** add `netbird.askari.wingu.me` → askari's IP via `public_dns` (M1 role).
7. **Standard service-role files authored, execution deferred.**
   SECURITY/VERIFY/ACCESS/BACKUP.md written for `netbird` (the precedent), but
   `/verify-service` (playwright) and `fisi` backup don't exist yet — BACKUP.md records
   the datastore + an **accepted risk** that off-site backup is pending; VERIFY.md is
   authored, run later.
8. **Setup keys are an M5 artifact** (created post-deploy via the dashboard/API). M4 stubs
   `vault.netbird.setup_key: CHANGEME` (the placeholder convention) for M5 to fill.
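
As a rough illustration of decision 4's `reverse_proxy` role, the compose file it renders
might look like the sketch below. The image reference, the `proxy` network name, and the
`caddy_data` volume name are placeholders assumed for this example, not settled values.
The published ports match decision 5, and the `.env` file carries `GANDI_BEARER_TOKEN` as
described above.

```yaml
# Sketch only: a possible rendering of the reverse_proxy role's compose template.
# Image reference, network name, and volume name are hypothetical.
services:
  caddy:
    image: "registry.example/caddy-gandi:pinned-tag"  # hypothetical; custom xcaddy build
    restart: unless-stopped
    env_file: .env                      # GANDI_BEARER_TOKEN, rendered from vault.gandi.pat
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro   # rendered from route data
      - caddy_data:/data                      # ACME account + issued certificates
    networks:
      - proxy

volumes:
  caddy_data:

networks:
  proxy:
    name: proxy   # hypothetical shared network; the NetBird compose would join it as external
```

Keeping the certificate store on a named volume means a container rebuild does not force
re-issuance through DNS-01.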

## Verified facts (ADR-014)

> verified: caddy-dns/gandi v1.1.0 (2025-07) · module `dns.providers.gandi`, `xcaddy`
> build, PAT via `GANDI_BEARER_TOKEN`, `tls { dns gandi {env.GANDI_BEARER_TOKEN} }` ·
> WebFetch github.com/caddy-dns/gandi · 2026-06-14
>
> verified: NetBird self-host · Docker Compose (management + signal + relay + coturn +
> dashboard), embedded Dex, ports 80/443 TCP + 3478 UDP, supports an external reverse
> proxy · WebFetch docs.netbird.io/selfhosted · 2026-06-14
>
> to verify in the plan: exact NetBird compose/`config.yaml`/`dashboard.env` schema for
> the pinned version, the external-proxy config knobs, and which secrets are
> role-generated vs operator-supplied.

## Architecture & data flow

```
road-warrior / ubongo ──TLS──> Caddy (askari:443, netbird.askari.wingu.me)
                                 │  cert: ACME DNS-01 via Gandi (vault.gandi.pat)
                                 └─> NetBird dashboard + management/signal (HTTP, internal)
NetBird agents ──UDP 3478──> Coturn (STUN/TURN) ; ──relay──> relay
```

- All containers on askari via Docker Compose (rendered by Ansible).
- Caddy and NetBird share a Docker network; only Caddy (80/443) + Coturn (3478) face the
  internet (Hetzner Cloud Firewall + the container port mapping).

## Roles (units, each testable)

- `docker_host` — `tasks/main.yml`: add Docker apt repo (pinned), install `docker-ce` +
  `docker-compose-plugin`, enable the service. Molecule: install + `docker --version`.
  (Tag `packages`/role-name.)
- `reverse_proxy` — custom image (`.docker/caddy-gandi/Dockerfile`, `xcaddy` +
  `caddy-dns/gandi`, built/pushed like the Molecule image);
  `templates/{docker-compose,Caddyfile,env}.j2`; route data in `group_vars`
  (`reverse_proxy__routes`). Molecule: render + `caddy validate`.
- `netbird` — `templates/{docker-compose.yml,config.yaml,dashboard.env,...}.j2` rendered
  from `netbird__*` + vault; the ADR-004 standard files. Deploy mechanics per ADR-004.
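
To make the `reverse_proxy__routes` idea concrete, the route data in `group_vars` could
take a shape like the one below. This is a sketch under assumptions: the file path, the
key names (`host`, `upstream`, `dns_provider`), and the upstream container name and port
are all invented for illustration. How NetBird's management and signal endpoints map onto
routes is left open, since the external-proxy knobs are still listed under "to verify in
the plan".

```yaml
# Sketch only: hypothetical group_vars/offsite_hosts/reverse_proxy.yml
# Key names and the upstream target are assumptions, not a settled schema.
reverse_proxy__routes:
  - host: netbird.askari.wingu.me
    upstream: "dashboard:80"   # hypothetical container name and port on the shared network
    dns_provider: gandi        # the Caddyfile template would emit the verified
                               # `tls { dns gandi {env.GANDI_BEARER_TOKEN} }` block
```

A list keyed by hostname keeps the Caddyfile template a plain loop, which is what lets the
same role generalise to the cluster later.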

## Testing

- **Molecule** per role where it fits (docker_host: install; reverse_proxy: `caddy
  validate` on the rendered Caddyfile). NetBird's full stack is heavy for a container —
  rely on **live verification on askari** (compose up;
  `curl -sI https://netbird.askari.wingu.me` → 200 + valid cert; dashboard loads;
  `docker compose ps` healthy).
- **Live (gated, on askari):** deploy the three roles; verify the cert issues via DNS-01,
  the dashboard is reachable over TLS, and the NetBird services are healthy. (Enrolment
  is M5.)

## Scope boundaries — what M4 is NOT

- **Not** enrolment (ubongo + laptops) or narrowing SSH to `wt0` — **M5**.
- **Not** the cluster reverse proxy / Authentik forward-auth — **Phase 2** (the
  `reverse_proxy` role is built to generalise, but only askari/NetBird is wired now).
- **Not** off-site backup execution of the datastore — pending `fisi` (ADR-022); recorded
  as an accepted risk with BACKUP.md authored.
- **Not** auditd/CIS, host firewall on askari (still M5/Phase 2).

## Open items (resolve in the plan)

- Pin the NetBird version + read its current self-host compose/config; pin Caddy +
  `caddy-dns/gandi` versions; pin Docker CE.
- Decide which `netbird` secrets are **role-generated** (turn password, dex secrets — via
  `community.general.random_string`/lookups, persisted to vault) vs operator-supplied
  (none expected beyond the M5 setup key).
- Confirm the custom Caddy image build/host (local build vs the Forgejo registry, like
  the Molecule image).
- `netbird.askari.wingu.me` as an A (to askari's IP) vs CNAME to `askari.wingu.me`.
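
For the role-generated secrets item, one possible shape of the generation step is
sketched below. The variable name `netbird__turn_password` is hypothetical, and the sketch
only covers generation in memory; persisting the value to vault, which the open item
requires, is deliberately not shown because that mechanism is still undecided.

```yaml
# Sketch only: generate a TURN password when the operator has not supplied one.
# `netbird__turn_password` is a hypothetical variable name.
- name: Generate a TURN password if none is defined
  ansible.builtin.set_fact:
    netbird__turn_password: >-
      {{ lookup('community.general.random_string', length=32, special=false) }}
  when: netbird__turn_password is not defined
  no_log: true   # keep the generated secret out of task output
```

Using `set_fact` resolves the lookup once per run, so every template rendered later in the
play sees the same value.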