Design — NetBird coordinator on askari + Caddy reverse proxy (M4)
- Date: 2026-06-14
- Status: Draft → straight to plan (per the standing skip-the-spec-review-gate agreement)
- Roadmap milestone: M4 (`docs/ROADMAP.md`)
- Implements: ADR-016 (NetBird coordinator self-hosted on askari), ADR-004 (first service role)
- Establishes: a new ADR — boma's reverse proxy is Caddy (amends the soft Traefik assumption in the roadmap/ADR-017 prose)
Problem
The NetBird mesh control plane (ADR-016) must run on askari so ubongo + road-warrior laptops can enrol (M5) and reach ubongo from anywhere. This is also boma's first real service role (ADR-004) and its first reverse proxy — so M4 sets two precedents: the service-role pattern, and Caddy as boma's standard reverse proxy.
Decisions (as settled)
- Caddy is boma's standard reverse proxy (replaces the soft Traefik assumption — no
ADR ever pinned Traefik). Rationale: boma renders all config from Ansible templates
(ADR-004), so Traefik's dynamic Docker-label discovery is wasted; Caddy's templated
Caddyfile + automatic HTTPS fits the "render from the catalog" model; far simpler for a
solo operator;
`forward_auth` to Authentik later keeps the auth story. → small new ADR.
- Caddy + Gandi DNS-01 (not HTTP-01). boma's services are mostly mesh/LAN-only with
no public DNS record, and you cannot HTTP-01 an unexposed host — DNS-01 is the only
cert path for them (the reason M1 built Gandi DNS-01). One mechanism fleet-wide; reuses
`vault.gandi.pat`. Cost: a custom Caddy image (`xcaddy` + `caddy-dns/gandi`), which fits boma's "build our own images" pattern (the Molecule image).
- NetBird in external-reverse-proxy mode: disable its bundled Traefik; boma's Caddy terminates TLS for `netbird.askari.wingu.me` and proxies to the NetBird containers. Embedded Dex IdP (ADR-016). The compose + server config are rendered from boma Jinja templates (ADR-004 + ADR-013 translate-don't-transplant), based on NetBird's current self-host reference read at implementation time.
- Three roles, applied to askari (`offsite_hosts`):
  - `docker_host`: first real content. Install Docker engine + compose plugin, version-pinned (ADR-011). (Cluster daemon-hardening + `nftables.d` integration stay deferred to the cluster.)
  - `reverse_proxy` (new): the custom Caddy image + a `Caddyfile` rendered from route data + `.env` with `GANDI_BEARER_TOKEN={{ vault.gandi.pat }}`. boma's standard proxy; generalises to the cluster later (not built now).
  - `netbird` (new): boma's first service role. Renders the NetBird compose + server config + `.env` from vault; the full ADR-004 standard files.
- Firewall: amend the M2 Hetzner Cloud Firewall (TF `offsite`) to open 80/443 TCP + 3478 UDP (NetBird's public ports). SSH-from-ubongo stays.
- DNS: add `netbird.askari.wingu.me` → askari's IP via `public_dns` (M1 role).
- Standard service-role files authored, execution deferred. SECURITY/VERIFY/ACCESS/BACKUP.md are written for `netbird` (the precedent), but `/verify-service` (playwright) and `fisi` backup don't exist yet. BACKUP.md records the datastore + an accepted risk that off-site backup is pending; VERIFY.md is authored, run later.
- Setup keys are an M5 artifact (created post-deploy via the dashboard/API). M4 stubs `vault.netbird.setup_key: CHANGEME` (the placeholder convention) for M5 to fill.
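The secret flow in the `reverse_proxy` decision can be sketched as a single Ansible task. This is a minimal sketch under assumptions, not the role's actual content: `reverse_proxy__dir` is a hypothetical variable for the compose project directory, and the template is assumed to hold only the one line quoted in the decision.

```yaml
# Sketch only. reverse_proxy__dir is a hypothetical variable for the project directory.
- name: Render the Caddy .env from templates/env.j2
  ansible.builtin.template:
    src: env.j2          # assumed content: GANDI_BEARER_TOKEN={{ vault.gandi.pat }}
    dest: "{{ reverse_proxy__dir }}/.env"
    owner: root
    group: root
    mode: "0600"         # the token must be able to edit DNS records, so keep it root-only
  no_log: true           # keep the token out of task output and diffs
```

Keeping the PAT in `.env` rather than in the `Caddyfile` means the rendered `Caddyfile` carries no secret and can be validated or diffed freely.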
Verified facts (ADR-014)
- verified: caddy-dns/gandi v1.1.0 (2025-07) · module `dns.providers.gandi`, `xcaddy` build, PAT via `GANDI_BEARER_TOKEN`, `tls { dns gandi {env.GANDI_BEARER_TOKEN} }` · WebFetch github.com/caddy-dns/gandi · 2026-06-14
- verified: NetBird self-host · Docker Compose (management + signal + relay + coturn + dashboard), embedded Dex, ports 80/443 TCP + 3478 UDP, supports an external reverse proxy · WebFetch docs.netbird.io/selfhosted · 2026-06-14
- to verify in the plan: exact NetBird compose/`config.yaml`/`dashboard.env` schema for the pinned version, the external-proxy config knobs, and which secrets are role-generated vs operator-supplied.
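For orientation, the verified `tls` snippet would sit in a site block roughly as follows. This is a simplified sketch: the destination path and the upstream name `netbird-dashboard:80` are hypothetical, the content is written inline only for illustration (the role renders a template), and the real NetBird routes for management and signal are still on the to-verify list.

```yaml
# Sketch only: dest path and upstream name are hypothetical placeholders.
- name: Write a single-route Caddyfile (inline here; the role would render a template)
  ansible.builtin.copy:
    dest: /opt/reverse_proxy/Caddyfile
    mode: "0644"
    content: |
      netbird.askari.wingu.me {
        # ACME DNS-01 through the caddy-dns/gandi module; the token is read
        # from the container environment, not stored in this file
        tls {
          dns gandi {env.GANDI_BEARER_TOKEN}
        }
        # Hypothetical upstream on the shared Docker network
        reverse_proxy netbird-dashboard:80
      }
```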
Architecture & data flow

    road-warrior / ubongo ──TLS──> Caddy (askari:443, netbird.askari.wingu.me)
                                   │ cert: ACME DNS-01 via Gandi (vault.gandi.pat)
                                   └─> NetBird dashboard + management/signal (HTTP, internal)
    NetBird agents ──UDP 3478──> Coturn (STUN/TURN) ; ──relay──> relay

- All containers on askari via Docker Compose (rendered by Ansible).
- Caddy and NetBird share a Docker network; only Caddy (80/443) + Coturn (3478) face the internet (Hetzner Cloud Firewall + the container port mapping).
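The exposure rule above (only Caddy publishes 80/443, everything else stays on a shared network) might look like this on the Caddy side. It is a sketch under assumptions: `reverse_proxy__image` and the network name `proxy` are hypothetical, and the volume layout assumes the custom image keeps the stock Caddy image's `/data` and `/etc/caddy` paths.

```yaml
# Sketch of a rendered docker-compose.yml for the reverse_proxy role.
# reverse_proxy__image and the network name "proxy" are hypothetical.
services:
  caddy:
    image: "{{ reverse_proxy__image }}"   # the custom xcaddy + caddy-dns/gandi build, pinned
    restart: unless-stopped
    env_file: .env                        # supplies GANDI_BEARER_TOKEN
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
      - caddy_data:/data                  # persists issued certificates across restarts
    networks:
      - proxy

volumes:
  caddy_data:

networks:
  proxy:
    name: proxy   # fixed name so the NetBird compose can join it with `external: true`
```

With this layout the NetBird dashboard and management containers publish no host ports; they are reachable only by container name over the shared network.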
Roles (units, each testable)
- `docker_host`: `tasks/main.yml` adds the Docker apt repo (pinned), installs `docker-ce` + `docker-compose-plugin`, and enables the service. Molecule: install + `docker --version`. (Tag `packages`/role-name.)
- `reverse_proxy`: custom image (`.docker/caddy-gandi/Dockerfile`, `xcaddy` + `caddy-dns/gandi`, built/pushed like the Molecule image); `templates/{docker-compose,Caddyfile,env}.j2`; route data in `group_vars` (`reverse_proxy__routes`). Molecule: render + `caddy validate`.
- `netbird`: `templates/{docker-compose.yml,config.yaml,dashboard.env,...}.j2` rendered from `netbird__*` + vault; the ADR-004 standard files. Deploy mechanics per ADR-004.
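The `docker_host` steps listed above could be sketched as the following `tasks/main.yml`. It assumes a Debian-family host, and the variables `docker_host__arch` and `docker_host__docker_version` are hypothetical names following the `role__var` convention; the actual pinned version is an open item and is deliberately not filled in here.

```yaml
# Sketch only. docker_host__arch and docker_host__docker_version are hypothetical variables.
- name: Ensure the apt keyring directory exists
  ansible.builtin.file:
    path: /etc/apt/keyrings
    state: directory
    mode: "0755"

- name: Install Docker's apt signing key
  ansible.builtin.get_url:
    url: "https://download.docker.com/linux/{{ ansible_distribution | lower }}/gpg"
    dest: /etc/apt/keyrings/docker.asc
    mode: "0644"

- name: Add the Docker apt repository
  ansible.builtin.apt_repository:
    repo: >-
      deb [arch={{ docker_host__arch }} signed-by=/etc/apt/keyrings/docker.asc]
      https://download.docker.com/linux/{{ ansible_distribution | lower }}
      {{ ansible_distribution_release }} stable
    filename: docker
    state: present

- name: Install Docker engine and compose plugin (engine version pinned)
  ansible.builtin.apt:
    name:
      - "docker-ce={{ docker_host__docker_version }}"
      - "docker-ce-cli={{ docker_host__docker_version }}"
      - containerd.io
      - docker-compose-plugin
    state: present
    update_cache: true
  tags: [packages, docker_host]

- name: Enable and start the Docker service
  ansible.builtin.service:
    name: docker
    enabled: true
    state: started
```

Pinning via `name=version` keeps the install reproducible; the compose plugin could be pinned the same way once its version is chosen.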
Testing
- Molecule per role where it fits (docker_host: install; reverse_proxy: `caddy validate` on the rendered Caddyfile). NetBird's full stack is heavy for a container, so rely on live verification on askari (compose up; `curl -sI https://netbird.askari.wingu.me` → 200 + valid cert; dashboard loads; `docker compose ps` healthy).
- Live (gated, on askari): deploy the three roles; verify the cert issues via DNS-01, the dashboard is reachable over TLS, and the NetBird services are healthy. (Enrolment is M5.)
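The live checks above can also be expressed as Ansible tasks rather than ad hoc shell. This is a sketch, assuming the tasks run against askari after deploy; `netbird__dir` is a hypothetical variable for the NetBird compose project directory, and the retry counts are arbitrary.

```yaml
# Sketch only. netbird__dir is a hypothetical variable; retry values are arbitrary.
- name: Dashboard answers 200 over TLS with a trusted certificate
  ansible.builtin.uri:
    url: https://netbird.askari.wingu.me
    method: HEAD             # same request shape as `curl -sI`
    status_code: 200
    validate_certs: true     # fails if the DNS-01 cert is missing or untrusted
  register: netbird_https
  retries: 5                 # certificate issuance can take a little while after first start
  delay: 10
  until: netbird_https is succeeded

- name: List compose services for the health check
  ansible.builtin.command:
    cmd: docker compose ps
    chdir: "{{ netbird__dir }}"
  register: netbird_ps
  changed_when: false        # read-only command, never reports a change
```

The `uri` task runs from the target by default; adding `delegate_to: localhost` would test the path through the Hetzner Cloud Firewall from outside instead.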
Scope boundaries — what M4 is NOT
- Not enrolment (ubongo + laptops) or narrowing SSH to `wt0`: M5.
- Not the cluster reverse proxy / Authentik forward-auth: Phase 2 (the `reverse_proxy` role is built to generalise, but only askari/NetBird is wired now).
- Not off-site backup execution of the datastore: pending `fisi` (ADR-022); recorded as an accepted risk with BACKUP.md authored.
- Not auditd/CIS, host firewall on askari (still M5/Phase 2).
Open items (resolve in the plan)
- Pin the NetBird version + read its current self-host compose/config; pin Caddy + `caddy-dns/gandi` versions; pin Docker CE.
- Decide which `netbird` secrets are role-generated (turn password, dex secrets, via `community.general.random_string`/lookups, persisted to vault) vs operator-supplied (none expected beyond the M5 setup key).
- Confirm the custom Caddy image build/host (local build vs the Forgejo registry, like the Molecule image).
- `netbird.askari.wingu.me` as an A (to askari's IP) vs CNAME to `askari.wingu.me`.
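If the role-generated route is chosen for secrets, the generation step might look like the sketch below. It is illustrative only: the vault key `vault.netbird.turn_password` and the fact name `netbird__turn_password` are hypothetical, and persisting the value back to the vault is not shown.

```yaml
# Sketch only. vault.netbird.turn_password and netbird__turn_password are hypothetical names.
- name: Generate a TURN password when the vault does not define one
  ansible.builtin.set_fact:
    netbird__turn_password: "{{ lookup('community.general.random_string', length=32, special=false) }}"
  when: vault.netbird.turn_password is not defined
  no_log: true   # keep the generated secret out of task output
```

A generated value is only stable for the current run, so it has to be written to the vault before the next deploy or the rendered config would change every time.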