docs(spec): M4 — NetBird coordinator on askari + Caddy reverse proxy

Caddy becomes boma's standard reverse proxy (amends the soft Traefik assumption; new ADR) with Gandi DNS-01 certs (custom xcaddy image, reuses vault.gandi.pat) — the only cert path for mesh/LAN-only services. NetBird self-hosted in external-proxy mode (embedded Dex), compose rendered from boma templates (ADR-004/013). Three roles: docker_host (first real content), reverse_proxy (new, Caddy), netbird (first service role w/ full ADR-004 standard files). Firewall + DNS amendments; backup execution deferred (fisi). caddy-dns/gandi + NetBird self-host facts verified.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

parent 181a02fd3a
commit 65cf20a993
1 changed files with 120 additions and 0 deletions
@@ -0,0 +1,120 @@
# Design — NetBird coordinator on askari + Caddy reverse proxy (M4)

- **Date:** 2026-06-14
- **Status:** Draft → straight to plan (per the standing skip-the-spec-review-gate agreement)
- **Roadmap milestone:** M4 (`docs/ROADMAP.md`)
- **Implements:** ADR-016 (NetBird coordinator self-hosted on askari), ADR-004 (first
  service role)
- **Establishes:** a new **ADR — boma's reverse proxy is Caddy** (amends the soft Traefik
  assumption in the roadmap/ADR-017 prose)

---

## Problem

The NetBird mesh control plane (ADR-016) must run on askari so ubongo + road-warrior
laptops can enrol (M5) and reach ubongo from anywhere. This is also boma's **first real
service role** (ADR-004) and its **first reverse proxy** — so M4 sets two precedents:
the service-role pattern, and Caddy as boma's standard reverse proxy.

## Decisions (as settled)

1. **Caddy is boma's standard reverse proxy** (replaces the soft Traefik assumption — no
   ADR ever pinned Traefik). Rationale: boma renders all config from Ansible templates
   (ADR-004), so Traefik's dynamic Docker-label discovery is wasted; Caddy's templated
   Caddyfile + automatic HTTPS fits the "render from the catalog" model; far simpler for a
   solo operator; `forward_auth` to Authentik later keeps the auth story. → small new ADR.
2. **Caddy + Gandi DNS-01** (not HTTP-01). boma's services are mostly **mesh/LAN-only with
   no public DNS record**, and you cannot HTTP-01 an unexposed host — DNS-01 is the only
   cert path for them (the reason M1 built Gandi DNS-01). One mechanism fleet-wide; reuses
   `vault.gandi.pat`. Cost: a **custom Caddy image** (`xcaddy` + `caddy-dns/gandi`) — fits
   boma's "build our own images" pattern (the Molecule image).
3. **NetBird in external-reverse-proxy mode** — disable its bundled Traefik; boma's Caddy
   terminates TLS for `netbird.askari.wingu.me` and proxies to the NetBird containers.
   Embedded **Dex** IdP (ADR-016). The compose + server config are **rendered from boma
   Jinja templates** (ADR-004 + ADR-013 translate-don't-transplant), based on NetBird's
   current self-host reference read at implementation time.
4. **Three roles, applied to askari (`offsite_hosts`):**
   - **`docker_host`** — first real content: install Docker engine + compose plugin,
     version-pinned (ADR-011). (Cluster daemon-hardening + `nftables.d` integration stay
     deferred to the cluster.)
   - **`reverse_proxy`** (new) — the custom Caddy image + a `Caddyfile` rendered from route
     data + `.env` with `GANDI_BEARER_TOKEN={{ vault.gandi.pat }}`. boma's standard proxy;
     generalises to the cluster later (not built now).
   - **`netbird`** (new) — **boma's first service role**: renders the NetBird compose +
     server config + `.env` from vault; the full ADR-004 standard files.
5. **Firewall:** amend the M2 Hetzner Cloud Firewall (TF `offsite`) to open **80/443 TCP +
   3478 UDP** (NetBird's public ports). SSH-from-ubongo stays.
6. **DNS:** add `netbird.askari.wingu.me` → askari's IP via `public_dns` (M1 role).
7. **Standard service-role files authored, execution deferred.** SECURITY/VERIFY/ACCESS/
   BACKUP.md written for `netbird` (the precedent), but `/verify-service` (playwright) and
   `fisi` backup don't exist yet — BACKUP.md records the datastore + an **accepted risk**
   that off-site backup is pending; VERIFY.md is authored, run later.
8. **Setup keys are an M5 artifact** (created post-deploy via the dashboard/API). M4 stubs
   `vault.netbird.setup_key: CHANGEME` (the placeholder convention) for M5 to fill.

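
A minimal sketch of how decision 4 might look as a play, assuming a playbook file that this spec does not name. The group `offsite_hosts` and the three role names come from the decisions above; the file path and the tags are illustrative assumptions.

```yaml
# playbooks/offsite.yml (hypothetical path, not named in this spec)
- name: NetBird coordinator and Caddy reverse proxy on askari
  hosts: offsite_hosts
  become: true
  roles:
    # Docker comes first because both later roles render and start compose projects.
    - role: docker_host
      tags: [docker_host]
    # Caddy is listed before NetBird so the proxy role exists when NetBird's routes are wired.
    - role: reverse_proxy
      tags: [reverse_proxy]
    - role: netbird
      tags: [netbird]
```

With tags wired this way, a run limited to `--tags reverse_proxy` would re-render only the Caddy side.
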
## Verified facts (ADR-014)

> verified: caddy-dns/gandi v1.1.0 (2025-07) · module `dns.providers.gandi`, `xcaddy`
> build, PAT via `GANDI_BEARER_TOKEN`, `tls { dns gandi {env.GANDI_BEARER_TOKEN} }` ·
> WebFetch github.com/caddy-dns/gandi · 2026-06-14
>
> verified: NetBird self-host · Docker Compose (management + signal + relay + coturn +
> dashboard), embedded Dex, ports 80/443 TCP + 3478 UDP, supports an external reverse
> proxy · WebFetch docs.netbird.io/selfhosted · 2026-06-14
>
> to verify in the plan: exact NetBird compose/`config.yaml`/`dashboard.env` schema for
> the pinned version, the external-proxy config knobs, and which secrets are
> role-generated vs operator-supplied.

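
The verified `tls` directive could sit in a site block roughly like the one below. This is a sketch written as an inline `ansible.builtin.copy` task purely to keep the example self-contained; the role itself is specified to render a Caddyfile template from route data. The destination path and the upstream `dashboard:80` are assumptions, and NetBird's real upstreams (including any gRPC routes) are among the items still to be verified in the plan.

```yaml
# Illustrative only: the role renders a template; this inline copy stands in for it.
- name: Write a one-site Caddyfile that obtains its certificate via Gandi DNS-01
  ansible.builtin.copy:
    dest: /opt/reverse_proxy/Caddyfile   # hypothetical path
    mode: "0644"
    content: |
      netbird.askari.wingu.me {
        tls {
          # Token is read from the container environment at runtime.
          dns gandi {env.GANDI_BEARER_TOKEN}
        }
        # Hypothetical upstream; the real routes depend on the pinned NetBird version.
        reverse_proxy dashboard:80
      }
```

Because the site block names the DNS provider explicitly, certificate issuance does not depend on ports 80/443 being reachable by the ACME server, which is the property decision 2 relies on for mesh/LAN-only services.
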
## Architecture & data flow

```
road-warrior / ubongo ──TLS──> Caddy (askari:443, netbird.askari.wingu.me)
                                 │  cert: ACME DNS-01 via Gandi (vault.gandi.pat)
                                 └─> NetBird dashboard + management/signal (HTTP, internal)
NetBird agents ──UDP 3478──> Coturn (STUN/TURN) ; ──relay──> relay
```

- All containers on askari via Docker Compose (rendered by Ansible).
- Caddy and NetBird share a Docker network; only Caddy (80/443) + Coturn (3478) face the
  internet (Hetzner Cloud Firewall + the container port mapping).

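
The port exposure and shared network described above might translate into a compose file along these lines. It is a sketch under stated assumptions: the image reference, the network name `proxy`, and the volume name are hypothetical, and version pinning is left to the plan. The in-container paths `/etc/caddy/Caddyfile` and `/data` are the ones the official Caddy image uses.

```yaml
# Rendered docker-compose.yml for the reverse_proxy role (sketch)
services:
  caddy:
    image: caddy-gandi:CHANGEME        # hypothetical reference to the custom xcaddy image
    restart: unless-stopped
    env_file: .env                     # carries GANDI_BEARER_TOKEN
    ports:
      - "80:80"                        # only Caddy publishes 80/443 on the host
      - "443:443"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
      - caddy_data:/data               # certificates and ACME account state persist here
    networks:
      - proxy

volumes:
  caddy_data:

networks:
  proxy:
    name: proxy                        # hypothetical; NetBird's compose would join it as an external network
```

In this layout the NetBird HTTP services publish no host ports at all and are reachable only over the shared network, while Coturn keeps its own 3478/UDP mapping.
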
## Roles (units, each testable)

- `docker_host` — `tasks/main.yml`: add Docker apt repo (pinned), install
  `docker-ce` + `docker-compose-plugin`, enable the service. Molecule: install +
  `docker --version`. (Tag `packages`/role-name.)
- `reverse_proxy` — custom image (`.docker/caddy-gandi/Dockerfile`, `xcaddy` +
  `caddy-dns/gandi`, built/pushed like the Molecule image);
  `templates/{docker-compose,Caddyfile,env}.j2`; route data in `group_vars`
  (`reverse_proxy__routes`). Molecule: render + `caddy validate`.
- `netbird` — `templates/{docker-compose.yml,config.yaml,dashboard.env,...}.j2` rendered
  from `netbird__*` + vault; the ADR-004 standard files. Deploy mechanics per ADR-004.

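
As an illustration of the `docker_host` bullet, the tasks could be sketched as follows. Assumptions: a Debian-family host (the spec does not state askari's OS), the keyring path, and the variable names `docker_host__engine_version` and `docker_host__compose_version`, which follow the `role__var` naming seen elsewhere in this spec but are hypothetical.

```yaml
# roles/docker_host/tasks/main.yml (sketch, Debian-family assumption)
- name: Ensure the apt keyring directory exists
  ansible.builtin.file:
    path: /etc/apt/keyrings
    state: directory
    mode: "0755"

- name: Install the Docker apt signing key
  ansible.builtin.get_url:
    url: "https://download.docker.com/linux/{{ ansible_distribution | lower }}/gpg"
    dest: /etc/apt/keyrings/docker.asc
    mode: "0644"

- name: Add the Docker apt repository
  ansible.builtin.apt_repository:
    repo: >-
      deb [signed-by=/etc/apt/keyrings/docker.asc]
      https://download.docker.com/linux/{{ ansible_distribution | lower }}
      {{ ansible_distribution_release }} stable
    filename: docker
    state: present

- name: Install the pinned Docker engine and compose plugin
  ansible.builtin.apt:
    name:
      # Hypothetical variables holding exact apt version strings (ADR-011 pinning).
      - "docker-ce={{ docker_host__engine_version }}"
      - "docker-compose-plugin={{ docker_host__compose_version }}"
    state: present
    update_cache: true

- name: Enable and start the Docker service
  ansible.builtin.service:
    name: docker
    enabled: true
    state: started
```

A Molecule verify step for this role would then only need to assert that `docker --version` exits successfully.
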
## Testing

- **Molecule** per role where it fits (docker_host: install; reverse_proxy:
  `caddy validate` on the rendered Caddyfile). NetBird's full stack is heavy for a container —
  rely on **live verification on askari** (compose up; `curl -sI https://netbird.askari.wingu.me`
  → 200 + valid cert; dashboard loads; `docker compose ps` healthy).
- **Live (gated, on askari):** deploy the three roles; verify the cert issues via DNS-01,
  the dashboard is reachable over TLS, and the NetBird services are healthy. (Enrolment is
  M5.)

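
The live checks above could also be expressed as Ansible tasks, for instance as the seed of the later VERIFY run. This is a sketch, not the `/verify-service` tooling the spec defers; the project directory `/opt/netbird` is a hypothetical path.

```yaml
# Live verification sketch: TLS reachability plus compose service state
- name: Check that the dashboard answers 200 over TLS with a trusted certificate
  ansible.builtin.uri:
    url: https://netbird.askari.wingu.me
    method: HEAD
    status_code: 200
    validate_certs: true     # fails the task if the DNS-01 certificate is missing or untrusted
  delegate_to: localhost
  become: false

- name: Collect the state of the NetBird compose services
  ansible.builtin.command:
    cmd: docker compose ps --format json
    chdir: /opt/netbird      # hypothetical project directory
  register: netbird_ps
  changed_when: false

- name: Show the collected service state for manual review
  ansible.builtin.debug:
    var: netbird_ps.stdout_lines
```

Running the HTTP check from the control node rather than from askari exercises the public DNS record and the Hetzner firewall rule as well as Caddy.
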
## Scope boundaries — what M4 is NOT

- **Not** enrolment (ubongo + laptops) or narrowing SSH to `wt0` — **M5**.
- **Not** the cluster reverse proxy / Authentik forward-auth — **Phase 2** (the
  `reverse_proxy` role is built to generalise, but only askari/NetBird is wired now).
- **Not** off-site backup execution of the datastore — pending `fisi` (ADR-022); recorded
  as an accepted risk with BACKUP.md authored.
- **Not** auditd/CIS, host firewall on askari (still M5/Phase 2).

## Open items (resolve in the plan)

- Pin the NetBird version + read its current self-host compose/config; pin Caddy +
  `caddy-dns/gandi` versions; pin Docker CE.
- Decide which `netbird` secrets are **role-generated** (turn password, dex secrets — via
  `community.general.random_string`/lookups, persisted to vault) vs operator-supplied
  (none expected beyond the M5 setup key).
- Confirm the custom Caddy image build/host (local build vs the Forgejo registry, like the
  Molecule image).
- `netbird.askari.wingu.me` as an A (to askari's IP) vs CNAME to `askari.wingu.me`.

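
One possible shape for a role-generated secret, assuming the plan settles on the `community.general.random_string` lookup mentioned above. The key `vault.netbird.turn_password` and the fact name `netbird__turn_password` are hypothetical, and the step that persists a freshly generated value back to the vault is deliberately not shown because the spec leaves it open.

```yaml
# Sketch: reuse the vaulted value when present, otherwise generate one for this run
- name: Resolve the TURN password
  ansible.builtin.set_fact:
    netbird__turn_password: >-
      {{ vault.netbird.turn_password
         | default(lookup('community.general.random_string', length=32, special=false)) }}
  no_log: true   # keep the secret out of task output
```

Without the persistence step the lookup would produce a new value on every run in which the vault key is absent, so this pattern is only safe once the generated value is written back.
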