docs(plan): M4a — Docker + Caddy reverse proxy platform

First of M4's two build phases: docker_host (Docker engine), custom xcaddy Caddy
image (caddy-dns/gandi), reverse_proxy role (Caddyfile from a route catalog,
DNS-01 wildcard cert for *.askari.wingu.me via vault.gandi.pat), ADR-024 (Caddy is
boma's reverse proxy), firewall 80/443 + DNS, proven by serving a test route over
TLS. M4b (NetBird) follows, reading NetBird's current self-host compose then.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
sjat 2026-06-14 17:20:53 +02:00
parent 65cf20a993
commit dd8c6825ba


@@ -0,0 +1,146 @@
# M4a — Docker + Caddy reverse proxy (platform) Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans. Steps use checkbox (`- [ ]`) syntax.
**Goal:** Stand up the platform NetBird needs — Docker on askari + boma's standard Caddy reverse proxy with Gandi DNS-01 wildcard certs — proven end-to-end by serving a test route over TLS.
**Architecture:** `docker_host` installs Docker engine + compose (pinned). A custom Caddy image (`xcaddy` + `caddy-dns/gandi`) gives DNS-01 via `vault.gandi.pat`. The `reverse_proxy` role renders a Caddyfile from `reverse_proxy__routes` data + an `.env`. The M2 Hetzner firewall opens 80/443; `public_dns` publishes `*.askari.wingu.me`. M4b adds NetBird as a route.
**Tech Stack:** Docker CE, Caddy (custom xcaddy build), ACME DNS-01 (Gandi), Ansible, Terraform (hcloud firewall).
**Spec:** `docs/superpowers/specs/2026-06-14-netbird-coordinator-m4-design.md`
**Execution context:** Tasks author here; **Task 7 applies live to askari + issues a real cert** (gated). The custom image builds with Docker (available).
---
### Task 1: ADR — boma's reverse proxy is Caddy
- [ ] **Step 1:** Create `docs/decisions/024-reverse-proxy.md` following ADR-023's
structure (Status: Accepted; Context; Decision; Consequences; Related). Decision:
**Caddy** is boma's reverse proxy (rationale from the M4 spec Decision 1: Ansible-rendered
config fits Caddy not Traefik's discovery; automatic HTTPS + Gandi DNS-01; simpler at
this scale; `forward_auth` to Authentik preserved). Note it amends the soft Traefik
assumption in the roadmap/ADR-017 prose (no prior ADR pinned Traefik).
- [ ] **Step 2:** Add the ADR-024 row to `CLAUDE.md`'s Further-reading table and update
the roadmap Phase-2 "auth + reverse proxy" line (Authentik + **Caddy**, not Traefik).
- [ ] **Step 3:** `make lint`; commit `docs(adr): ADR-024 — Caddy is boma's reverse proxy`.
---
### Task 2: `docker_host` — install Docker engine
**Files:** `roles/docker_host/{defaults,tasks}/main.yml`, `roles/docker_host/README.md`.
- [ ] **Step 1:** `defaults/main.yml`: `docker_host__compose_version`-style pins (use the
Docker apt repo; either pin versions via apt or accept the repo's latest and say so in a comment). Variables:
`docker_host__packages: [docker-ce, docker-ce-cli, containerd.io, docker-compose-plugin]`.
- [ ] **Step 2:** `tasks/main.yml`: add the Docker apt repo and its GPG key (prefer `ansible.builtin.deb822_repository` with `signed_by`; `ansible.builtin.apt_key` is deprecated),
`apt` install `docker_host__packages`, enable+start `docker`. (Tag: role-name; concern `packages`.)
- [ ] **Step 3:** Fill `README.md` (purpose, vars). `make lint`.
- [ ] **Step 4:** Molecule: converge installs Docker; verify `docker --version` + service active. (`make test ROLE=docker_host`; build the image if needed.)
- [ ] **Step 5:** Commit `feat(docker_host): install Docker engine + compose plugin`.
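The steps above could be sketched roughly as follows. This is a minimal sketch and not the final role. It assumes askari runs Debian or Ubuntu (so the distribution facts map onto Docker's repo paths), that `python3-debian` is acceptable as a prerequisite for `deb822_repository`, and that the repo name `docker` and the exact tag lists are illustrative choices.

```yaml
# roles/docker_host/tasks/main.yml (sketch; names below are assumptions)
- name: Install prerequisites for deb822 repository management
  ansible.builtin.apt:
    name:
      - ca-certificates
      - python3-debian        # needed by ansible.builtin.deb822_repository
    state: present
    update_cache: true
  tags: [docker_host, packages]

- name: Add the Docker apt repository together with its signing key
  ansible.builtin.deb822_repository:
    name: docker              # assumption: results in docker.sources
    types: [deb]
    uris: "https://download.docker.com/linux/{{ ansible_facts['distribution'] | lower }}"
    suites: "{{ ansible_facts['distribution_release'] }}"
    components: [stable]
    signed_by: "https://download.docker.com/linux/{{ ansible_facts['distribution'] | lower }}/gpg"
    state: present
  tags: [docker_host, packages]

- name: Install Docker engine and the compose plugin
  ansible.builtin.apt:
    name: "{{ docker_host__packages }}"
    state: present
    update_cache: true        # pick up the repo added above
  tags: [docker_host, packages]

- name: Enable and start the Docker service
  ansible.builtin.service:
    name: docker
    state: started
    enabled: true
  tags: [docker_host]
```

Letting `signed_by` fetch the key keeps the repo definition and its key in one task, which is the main reason to prefer it over a separate key-import step.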
---
### Task 3: Custom Caddy image (xcaddy + caddy-dns/gandi)
**Files:** `.docker/caddy-gandi/Dockerfile`, `Makefile` (a `caddy-image` target).
- [ ] **Step 1:** `.docker/caddy-gandi/Dockerfile` (verify the latest stable Caddy + plugin tags per ADR-014):
```dockerfile
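# Build stage: compile Caddy with the Gandi DNS provider plugin.
FROM caddy:2-builder AS build
RUN xcaddy build --with github.com/caddy-dns/gandi
# Runtime stage: stock Caddy image with the custom binary swapped in.
FROM caddy:2
COPY --from=build /usr/bin/caddy /usr/bin/caddy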
```
- [ ] **Step 2:** `Makefile` — add `caddy-image` (build, tagged for the Forgejo registry like the Molecule image) + `caddy-image-push`. Add to `.PHONY` + help.
- [ ] **Step 3:** Build it: `make caddy-image`; verify `docker run --rm <img> caddy list-modules | grep dns.providers.gandi`. Expected: the module is listed.
- [ ] **Step 4:** Commit `feat(docker): custom Caddy image with the Gandi DNS-01 plugin`.
---
### Task 4: `reverse_proxy` role (Caddy)
**Files:** create `roles/reverse_proxy/{defaults,tasks}/main.yml`, `templates/{docker-compose.yml.j2,Caddyfile.j2,env.j2}`, `README.md`; `inventories/production/group_vars/all/reverse_proxy.yml`.
- [ ] **Step 1:** `group_vars/all/reverse_proxy.yml` — route data:
```yaml
reverse_proxy__image: "<forgejo-registry>/sjat/caddy-gandi:latest"
reverse_proxy__base_dir: /opt/services/reverse_proxy
reverse_proxy__acme_domain: askari.wingu.me # wildcard *.askari.wingu.me
reverse_proxy__routes: [] # M4b appends: {host: netbird.askari.wingu.me, upstream: "netbird-dashboard:80"}
```
- [ ] **Step 2:** `templates/Caddyfile.j2` — global TLS via Gandi DNS-01 + a per-route block:
```
{
	email admin@wingu.me
}
*.{{ reverse_proxy__acme_domain }} {
	tls {
		dns gandi {env.GANDI_BEARER_TOKEN}
	}
{% for r in reverse_proxy__routes %}
	@{{ r.host | replace('.', '_') }} host {{ r.host }}
	handle @{{ r.host | replace('.', '_') }} {
		reverse_proxy {{ r.upstream }}
	}
{% endfor %}
	handle {
		respond "boma reverse proxy" 200
	}
}
```
- [ ] **Step 3:** `templates/env.j2`: `GANDI_BEARER_TOKEN={{ vault.gandi.pat }}`.
- [ ] **Step 4:** `templates/docker-compose.yml.j2` — the Caddy service (image `reverse_proxy__image`, ports 80:80 + 443:443, env_file, volumes for the Caddyfile + cert data, restart unless-stopped).
- [ ] **Step 5:** `tasks/main.yml` — ADR-004 deploy mechanics: ensure `base_dir`, render compose+Caddyfile+env, `community.docker.docker_compose_v2` up. (Adds `community.docker` to `requirements.yml` with the on-demand comment.)
- [ ] **Step 6:** `README.md`; `make lint`.
- [ ] **Step 7:** Molecule (render-only): converge renders the files but does not start the stack (skip the compose `up` inside the test container, for example behind an `apply: false`-style switch); verify `caddy validate --config Caddyfile` passes. Commit `feat(reverse_proxy): Caddy role (Gandi DNS-01, route catalog)`.
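For Step 4, the compose template might look roughly like the sketch below. It assumes the official Caddy image layout (`/etc/caddy/Caddyfile`, `/data`, `/config`), that the rendered env file is named `.env` next to the compose file, and that named volumes are acceptable for cert storage. The service name `caddy` and the volume names are illustrative, not taken from the spec.

```yaml
# roles/reverse_proxy/templates/docker-compose.yml.j2 (sketch; names are assumptions)
services:
  caddy:
    image: "{{ reverse_proxy__image }}"
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    env_file:
      - .env                                   # rendered from env.j2; carries GANDI_BEARER_TOKEN
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro    # rendered from Caddyfile.j2
      - caddy_data:/data                       # issued certificates and ACME account state
      - caddy_config:/config

volumes:
  caddy_data:
  caddy_config:
```

Persisting `/data` matters here: without it, each container recreation would request a fresh wildcard cert and could run into the CA's rate limits.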
---
### Task 5: Open the firewall (TF) + DNS
- [ ] **Step 1:** In `terraform/modules/hetzner_vm/main.tf`, add the public web ports to the firewall, variable-driven so other hosts can differ: inbound **80/tcp** and **443/tcp** from `0.0.0.0/0`, plus **3478/udp** (for NetBird; M4b uses it). Gate the rules behind a `var.public_web` bool defaulting to false; set it to true for askari in `environments/offsite/main.tf`. Run `terraform fmt`.
- [ ] **Step 2:** `make tf-plan TF_ENV=offsite` (review: firewall adds 80/443[/3478]) → **gated** `make tf-apply TF_ENV=offsite`.
- [ ] **Step 3:** Add `*.askari.wingu.me` A → askari's IP to `public_dns__records` (`group_vars/all/public_dns.yml`); `make deploy PLAYBOOK=dns`; `dig +short test.askari.wingu.me` → askari IP.
- [ ] **Step 4:** Commit the TF + DNS changes.
---
### Task 6: Playbook wiring
- [ ] **Step 1:** Create `playbooks/offsite.yml` targeting `offsite_hosts`: roles `docker_host` then `reverse_proxy` (each with its role-name tag). `make lint` (check-tags verifies the role-name tags).
- [ ] **Step 2:** Commit `feat(offsite): playbook applying docker_host + reverse_proxy to askari`.
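A minimal sketch of the playbook from Step 1, assuming `become: true` is wanted at play level (package installs need root) and that the role-name tags are applied on the role entries. The play name is illustrative.

```yaml
# playbooks/offsite.yml (sketch)
- name: Apply Docker and the Caddy reverse proxy to offsite hosts
  hosts: offsite_hosts
  become: true                # assumption: privilege escalation set per play
  roles:
    - role: docker_host
      tags: [docker_host]
    - role: reverse_proxy     # runs second; it needs the Docker engine in place
      tags: [reverse_proxy]
```

Listing the roles in this order keeps the dependency explicit without a `meta/main.yml` dependency, so either role can still be run alone via its tag.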
---
### Task 7: Apply to askari + prove TLS (gated, live)
> Live on askari. Issues a **real cert** via DNS-01. `rbw` unlocked.
- [ ] **Step 1:** `make check PLAYBOOK=offsite LIMIT=askari` — review.
- [ ] **Step 2:** `make deploy PLAYBOOK=offsite LIMIT=askari` — Docker installs, Caddy comes up.
- [ ] **Step 3:** Prove it (from ubongo): `curl -sSI https://test.askari.wingu.me` → `HTTP/2 200` with a **valid Let's Encrypt cert** (the wildcard `*.askari.wingu.me` issued via Gandi DNS-01). `curl -s https://test.askari.wingu.me` → `boma reverse proxy`.
- [ ] **Step 4:** `.venv/bin/ansible offsite_hosts -b -m command -a 'docker compose -f /opt/services/reverse_proxy/docker-compose.yml ps'` → Caddy healthy.
- [ ] **Step 5:** No repo commit (host state).
---
### Task 8: Docs
- [ ] **Step 1:** STATUS.md — Docker on askari + the `reverse_proxy` (Caddy) role built + applied; `*.askari.wingu.me` cert live. ROADMAP M4 — note M4a done, M4b (NetBird) next.
- [ ] **Step 2:** `make lint`; commit.
---
## Self-Review (completed)
- **Spec coverage:** Caddy-as-standard ADR (Decision 1) → Task 1; docker_host (Decision 4) →
Task 2; custom Caddy image + DNS-01 (Decision 2) → Task 3; reverse_proxy role + route
catalog (Decision 4) → Task 4; firewall 80/443/3478 (Decision 5) → Task 5; DNS (Decision 6)
→ Task 5; live cert proof (testing) → Task 7. NetBird itself (Decisions 3,7,8) → **M4b**, correct.
- **Placeholder scan:** `<forgejo-registry>` is the known registry host (`forgejo.nyumbani.baobab.band/...`) — fill from the Molecule image var; not a logic gap. Version pins (Caddy, Docker, plugin) are flagged ADR-014 verifications, done in their tasks.
- **Name consistency:** `reverse_proxy__*`, `vault.gandi.pat` → `GANDI_BEARER_TOKEN`, `*.askari.wingu.me` used consistently across role, templates, firewall, and DNS.
- **Risk:** the custom image + DNS-01 is the novel bit — Task 3 verifies the module loads and Task 7 proves a real cert issues before M4b depends on it.