ADR-024 — Reverse proxy: Caddy (ACME — HTTP-01 public, DNS-01 private)
Status
Accepted (2026-06-14). Amends the soft Traefik assumption carried by the roadmap (Phase-2 step 5) and ADR-017 prose; those are updated to read "Caddy (ADR-024)".
Cert method follows exposure (revised 2026-06-14, M4a). The cert challenge depends on whether a host is publicly reachable. Public hosts (askari) use HTTP-01 with vanilla Caddy: simplest, no plugin. Mesh/LAN-only cluster services (no public A-record) cannot satisfy HTTP-01, so they need DNS-01 (the M1 Gandi capability). The DNS-01 path is deferred to Phase 2 because of two blockers on askari: the caddy-dns/gandi plugin did not create the ACME TXT records despite a verified-valid token, and Hetzner IPs get a 403 from Google's Go module infrastructure, which blocks the on-host custom build. Both are to be sorted when the cluster's private services actually need DNS-01. The body below describes the DNS-01 design; askari (M4a) ships on HTTP-01.
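The exposure rule above can be sketched in a few lines of Python. This is only an illustration, not boma's actual code: the `Host` type, the function names, and the `GANDI_BEARER_TOKEN` environment variable name are assumptions made for the example, and the `tls` stanza follows the form the caddy-dns/gandi plugin documents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    public: bool  # True when the host has a public A-record


def acme_challenge(host: Host) -> str:
    """Pick the ACME challenge type from the host's exposure."""
    # Public hosts can answer HTTP-01; mesh/LAN-only hosts cannot, so DNS-01.
    return "http-01" if host.public else "dns-01"


def tls_stanza(host: Host) -> str:
    """Return the Caddyfile tls block a site would need, if any."""
    if acme_challenge(host) == "http-01":
        return ""  # vanilla Caddy: automatic HTTPS, no tls block needed
    # DNS-01 needs the caddy-dns/gandi plugin; the env var name is assumed.
    return "tls {\n\tdns gandi {env.GANDI_BEARER_TOKEN}\n}"


askari = Host(name="askari", public=True)
internal = Host(name="internal-service", public=False)  # hypothetical host
print(acme_challenge(askari))    # http-01
print(acme_challenge(internal))  # dns-01
```

Keeping the decision in one function means a host's challenge type is derived from a single catalog attribute rather than configured per service.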
Context
boma needs a reverse proxy to front its services with TLS. ADR-002 requires every
service to sit behind a proxy with authentication before it is reachable; ADR-007/M1
delivers a *.<domain> wildcard cert via ACME DNS-01 against Gandi (the apex boma
domain, matching ROADMAP M1) — the only viable cert path for mesh/LAN-only services
that cannot satisfy HTTP-01 (no public A-record to point at).
The roadmap (Phase-2, step 5) and ADR-017 prose assumed Traefik + Authentik as the auth-and-proxy pair without an ADR ever pinning Traefik. On closer inspection:
- Traefik's headline feature is dynamic Docker-label discovery — it discovers and routes services automatically from container labels without any static config.
- boma already renders all config from Ansible templates and the group_vars catalog (ADR-004). That makes dynamic label discovery a disadvantage: a service that is not in the catalog does not exist (CLAUDE.md), so any route that Traefik auto-discovers outside the catalog would be unaudited.
- The first reverse-proxy instance is needed on askari for M4 (NetBird), a host where docker_hosts patterns are being established under off-site/VPS constraints, not a full Proxmox cluster with many services.
No production investment in Traefik config has been made; the decision can be made cleanly here.
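To make the "unaudited route" concern concrete, here is a minimal sketch of the check that label discovery would force on the operator. The hostnames and the function name are hypothetical; boma's catalog lives in group_vars, not in Python sets.

```python
def unaudited_routes(catalog: set[str], discovered: set[str]) -> set[str]:
    """Routes that exist at runtime but have no catalog entry."""
    return discovered - catalog


# Hypothetical data: one catalogued service, one route that appeared via labels.
catalog = {"netbird.example.org"}
discovered = {"netbird.example.org", "debug.example.org"}
print(unaudited_routes(catalog, discovered))  # {'debug.example.org'}
```

With a statically rendered proxy config the discovered set is the catalog by construction, so this difference is always empty.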
Decision
boma's reverse proxy is Caddy.
1. Rationale for Caddy over Traefik
- Traefik's dynamic label discovery is wasted — boma renders config from the catalog; Caddy's static Caddyfile maps naturally to "render from templates" (ADR-004).
- Caddy's Caddyfile is simple to template with ansible.builtin.template; one file, one ansible_managed header, no side-channel label state.
- Automatic HTTPS via ACME DNS-01: the caddy-dns/gandi plugin satisfies the Gandi DNS-01 challenge, which is the only cert path for services with no public A-record (ADR-007/M1 wildcard strategy).
- Far simpler for a solo operator: no dashboard-as-a-service, no routing-rule DSL, no dynamic config files to reconcile.
- forward_auth to Authentik is a first-class Caddy directive, so the planned Authentik auth story (ADR-002) is preserved without Traefik as the middleman.
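The "render from the catalog" point can be illustrated with a small Python stand-in for the Jinja2 template that ansible.builtin.template would render. The catalog shape (hostname to upstream), the header text, and the example hostname are assumptions for the sketch, not boma's real group_vars schema.

```python
def render_caddyfile(catalog: dict[str, str]) -> str:
    """Render one Caddyfile site block per catalog entry (hostname -> upstream)."""
    header = "# Ansible managed: do not edit by hand\n"
    blocks = [
        f"{fqdn} {{\n\treverse_proxy {upstream}\n}}\n"
        for fqdn, upstream in sorted(catalog.items())  # sorted for stable diffs
    ]
    return header + "\n" + "\n".join(blocks)


# Hypothetical catalog with a single test vhost.
catalog = {"test.example.org": "127.0.0.1:8080"}
print(render_caddyfile(catalog))
```

Because the output is a pure function of the catalog, a service that is missing from the catalog cannot end up with a route, and every change shows up as a reviewable diff of one file.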
2. Custom image (DNS-01 path only — Phase 2)
Applies only to the DNS-01 path, which is deferred to Phase 2 (see the Status note). M4a ships vanilla caddy:2 on askari (HTTP-01), with no custom image.
Caddy's official Docker image does not include third-party DNS plugins. The caddy-dns/gandi
plugin must be compiled in via xcaddy. When the cluster's mesh/LAN-only services need
DNS-01, boma builds a custom image:
FROM caddy:builder AS builder
RUN xcaddy build --with github.com/caddy-dns/gandi
FROM caddy:latest
COPY --from=builder /usr/bin/caddy /usr/bin/caddy
That image would be maintained as a boma artifact (Forgejo registry, pinned digest in the Compose template) — the cost of the Gandi DNS-01 path. (On askari this approach hit two blockers, so DNS-01 is deferred; see the Status note.)
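Since the custom image is meant to be referenced by a pinned digest, a lint-style check along the following lines could guard the Compose template. This is a sketch under assumptions: the function name and the registry path in the usage lines are hypothetical, and the pattern only recognises sha256 digests.

```python
import re

# An image reference counts as pinned when it ends in @sha256:<64 hex chars>.
_DIGEST_PINNED = re.compile(r"^[^\s@]+@sha256:[0-9a-f]{64}$")


def is_digest_pinned(image_ref: str) -> bool:
    """True if the image reference is pinned to a sha256 digest, not a tag."""
    return _DIGEST_PINNED.match(image_ref) is not None


# Hypothetical references: a digest-pinned custom build and a floating tag.
pinned = "registry.example.org/boma/caddy-gandi@sha256:" + "a" * 64
print(is_digest_pinned(pinned))          # True
print(is_digest_pinned("caddy:latest"))  # False
```

A floating tag such as caddy:latest would fail this check, which is the point: the deployed binary should only change when the digest in the template changes.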
3. Deployment scope
The first Caddy instance runs on askari (M4a), serving a test vhost over HTTP-01 to
prove the proxy + ACME path. It fronts the NetBird stack in M4b (when the
netbird_coordinator role is built). The pattern generalises to the Proxmox cluster in
Phase 2 when services multiply.
4. Authentik integration (deferred)
forward_auth to Authentik is deferred to Phase 2 (when Authentik is deployed on the
cluster). The Caddyfile template will carry a placeholder comment. No Traefik-Authentik
middleware migration is required.
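A sketch of how the template could carry the placeholder now and the real directive later, again using Python as a stand-in for the Jinja2 template. The outpost address, the auth URI, and the copied header names follow the pattern Authentik documents for Caddy, but they are assumptions here and must be checked against the Authentik version actually deployed; a real setup would also proxy the outpost's own path, which this sketch omits.

```python
from typing import Optional


def site_block(fqdn: str, upstream: str, outpost: Optional[str] = None) -> str:
    """Render a site block; forward_auth is emitted only when an outpost is set."""
    lines = [f"{fqdn} {{"]
    if outpost is None:
        # Phase-2 placeholder: no Authentik yet, so the site is proxied as-is.
        lines.append("\t# forward_auth to Authentik: deferred to Phase 2")
    else:
        lines += [
            f"\tforward_auth {outpost} {{",
            "\t\turi /outpost.goauthentik.io/auth/caddy",  # assumed outpost path
            "\t\tcopy_headers X-Authentik-Username X-Authentik-Groups",
            "\t}",
        ]
    lines.append(f"\treverse_proxy {upstream}")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical hostnames and ports.
print(site_block("test.example.org", "127.0.0.1:8080"))
print(site_block("app.example.org", "127.0.0.1:8081", "http://authentik:9000"))
```

Switching a service to authenticated access is then a one-variable change in the catalog rather than a new middleware layer.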
Consequences
- Roadmap Phase-2 step 5 is updated from "Authentik + Traefik" to "Authentik + Caddy (ADR-024)".
- ADR-017 prose that mentioned Traefik is updated to read "Caddy (ADR-024)".
- M4a (public hosts, HTTP-01) runs vanilla caddy:2, with no custom image. If/when the Phase-2 DNS-01 path lands, a custom Caddy image (xcaddy + caddy-dns/gandi) must be built, pushed to the Forgejo registry, and kept current (plugin + base image updates).
- Caddyfile config is rendered by Ansible from group_vars, consistent with ADR-004 and easier to review than distributed container labels.
- forward_auth to Authentik is available when Authentik is deployed; no extra middleware layer required.
- The proxy concern tag (already in tests/tags.yml) covers Caddy config tasks.
What was ruled out
- Traefik — dynamic label discovery is a mismatch for boma's catalog-rendered config model (ADR-004); more complex for a solo operator; no prior investment to protect.
- nginx / HAProxy — no built-in ACME; require a separate ACME client (certbot, acme.sh) adding operational surface; Caddy's integrated ACME is simpler.
- NetBird's bundled TLS — NetBird's management UI can serve its own TLS, but that doesn't generalise; a real proxy separates concerns and applies to every service.
Related
- ADR-002 — services behind a proxy with authentication (the requirement this satisfies).
- ADR-004 — Docker & Compose model (template-rendered config, catalog-driven).
- ADR-007 / M1 — Gandi DNS-01 ACME path (the TLS strategy Caddy implements).
- ADR-016 — NetBird (M4 is the first deployment of this proxy).
- ADR-017 — service-UI verification; forward_auth to Authentik is the future auth story.