Compare commits

7 commits

Author SHA1 Message Date
1862b7a828 docs(m4a): HTTP-01 for askari; ADR-024 cert-method-follows-exposure; STATUS/roadmap/friction
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 18:14:38 +02:00
b7e919d6b3 refactor(reverse_proxy): vanilla Caddy + HTTP-01 (drop DNS-01 custom image)
Switch from a custom caddy-dns/gandi image built on-host to the official
caddy:2 image with per-host ACME HTTP-01 certificates. Removes the
Dockerfile, env.j2 (Gandi token), on-host image build/ship/load tasks,
the caddy-image Makefile target, and the wildcard DNS-01 Caddyfile.
Each route now gets its own server block and automatic certificate.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 18:11:20 +02:00
9c169561d7 feat(offsite): *.askari.wingu.me wildcard + offsite.yml (docker_host + reverse_proxy)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 17:39:44 +02:00
1ee343dfca feat(tf): open Caddy 80/443 + NetBird 3478 on askari (public_web)
hetzner_vm gains a public_web bool (default false); offsite sets it true. Firewall
adds 80/443 tcp + 3478 udp from anywhere (SSH-from-ubongo preserved). For M4 Caddy
+ NetBird.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 17:38:51 +02:00
50b6445bdd feat(reverse_proxy): Caddy role (Gandi DNS-01, on-host image build, route catalog)
Implements the Caddy reverse proxy role (ADR-024): builds boma/caddy-gandi:latest
on-host (caddy-dns/gandi plugin), renders Caddyfile from route catalog, brings
Compose project up. Adds community.docker to requirements.yml, production group_vars,
and a caddy-image Makefile target.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 17:36:58 +02:00
456c27d12b feat(docker_host): install Docker engine + compose plugin
Implements the docker_host role tasks: prerequisites, /etc/apt/keyrings
directory (ordered before the GPG key write), Docker APT key + repo, and
docker-ce/cli/containerd.io/compose-plugin install. Daemon hardening and
nftables.d integration remain deferred to Phase 2 (cluster + base firewall).
Updates defaults, README, and molecule verify to assert docker --version.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 17:28:51 +02:00
d10f6de84b docs(adr): ADR-024 — Caddy is boma's reverse proxy
Adds ADR-024 pinning Caddy (xcaddy + caddy-dns/gandi) as boma's reverse
proxy, superseding the soft Traefik assumption in the roadmap and ADR-017
prose. Updates CLAUDE.md Further reading table and ROADMAP.md Phase-2 step 5.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 17:28:42 +02:00
26 changed files with 501 additions and 39 deletions

@ -249,6 +249,7 @@ Single-contributor, trunk-based (no merge requests / approval gates):
| Operational access | `docs/decisions/021-operational-access.md` |
| Backup & disaster recovery | `docs/decisions/022-backup.md` |
| ADR structure & lifecycle | `docs/decisions/023-adr-structure.md` |
| Reverse proxy (Caddy) | `docs/decisions/024-reverse-proxy.md` |
| Adding a new role | `docs/runbooks/new-role.md` |
| Adding a new host | `docs/runbooks/new-host.md` |
| Rotating vault secrets | `docs/runbooks/rotate-secrets.md` |

@ -30,7 +30,8 @@ _Last reviewed: 2026-06-14._
| `make check` / `make deploy PLAYBOOK=<name>` | **Works.** First end-to-end run (applying `dev_env`) surfaced + fixed latent bugs: Makefile `PLAYBOOK` var collision (binary path vs playbook-name arg) meant the targets never ran; `ansible.cfg` referenced uninstalled community.general callbacks (now built-in `default` + `ansible.posix.profile_tasks`); `acl` package added so Ansible can `become_user` an unprivileged user. The make targets now function — though `site`/`base`/`docker_host` content is still incomplete (see below). |
| `roles/public_dns/` + `playbooks/dns.yml` | **Built + applied.** Manages wingu.me at Gandi LiveDNS as code (`community.general.gandi_livedns`, PAT from `vault.gandi.pat`); record data, anti-spoof baseline (SPF `-all` + DMARC reject), and the Gandi-defaults purge are defined + unit-tested (`tests/test_public_dns.py`). **Applied to wingu.me (2026-06-14):** purged Gandi's 13 seeded defaults; zone now holds only the SPF + DMARC TXT records; idempotent re-run clean. No null-MX (Gandi rejects `0 .`) — the MX is removed, so no MX + no apex A = no mail. M1 of the roadmap. |
| `ubongo` — physical control / AI-worker host (ADR-015) | **Built (partial).** Debian 13.5 on a Lenovo M70q (i3-10100T, 16 GB, 256 GB SSD; no disk encryption — accepted risk). Full toolchain installed + pinned to `fisi` (Docker 29.5.3, rbw 1.15.0, Claude Code 2.1.173, ansible-core 2.17.14 + molecule via `make setup`/`make collections`). Repo cloned under a dedicated `claude` user (docker group, no sudo). Vault works via rbw (offline-cache decryption verified). SSH key-only (password + root login disabled). In the production inventory `control` group at 10.20.10.151. **`dev_env` now applied here** (zsh/tmux/nvim for `sjat` + `claude`, via `playbooks/workstation.yml`). Managed as the operator account `sjat` (`group_vars/control` sets `ansible_user: sjat`), not the `ansible` service user `group_vars/all` assumes — ubongo has no bootstrapped `ansible` user. **Pending:** NetBird mesh enrollment (so SSH is LAN-only); full `base` hardening (only the `firewall` concern exists, and it is NOT applied here — applying default-deny with no mesh would lock out inbound SSH on the physical NIC); proper `ansible`-user bootstrap (currently managed as `sjat`); OPNsense DHCP reservation for 10.20.10.151 (MAC `88:a4:c2:e0:ee:da`); Terraform state backup (now relevant — the offsite tfstate exists). |
| `askari` — off-site Hetzner VPS (ADR-007/016, M2) | **Built + applied.** Provisioned by Terraform (`environments/offsite`, `hetznercloud/hcloud`) as **cx23 / hel1 / Debian 13.5** (CAX11/ARM was out of stock EU-wide on 2026-06-14 → cx23 is same-spec x86, cheaper). cloud-init created the `ansible` user + passwordless sudo; a TF-managed Hetzner Cloud Firewall allows SSH only from ubongo's WAN (`91.226.145.80`). Reachable from ubongo (`ansible offsite_hosts -m ping` ✓), in the `offsite_hosts` inventory (generated `offsite.yml`), published at `askari.wingu.me` → `77.42.120.136`. **SSH-hardened + fail2ban (M3 `hardening` concern applied).** **Pending:** NetBird coordinator (M4), host firewall + mesh enrollment (M5), offsite tfstate backup (ADR-022). |
| `askari` — off-site Hetzner VPS (ADR-007/016, M2) | **Built + applied.** Provisioned by Terraform (`environments/offsite`, `hetznercloud/hcloud`) as **cx23 / hel1 / Debian 13.5** (CAX11/ARM was out of stock EU-wide on 2026-06-14 → cx23 is same-spec x86, cheaper). cloud-init created the `ansible` user + passwordless sudo; a TF-managed Hetzner Cloud Firewall allows SSH only from ubongo's WAN (`91.226.145.80`). Reachable from ubongo (`ansible offsite_hosts -m ping` ✓), in the `offsite_hosts` inventory (generated `offsite.yml`), published at `askari.wingu.me` → `77.42.120.136`. **SSH-hardened + fail2ban (M3).** **Docker + Caddy reverse proxy (M4a):** `docker_host` + `reverse_proxy` (vanilla Caddy, HTTP-01) applied; `https://test.askari.wingu.me` serves a valid Let's Encrypt cert ✓ (firewall opens 80/443/3478). **Pending:** NetBird coordinator (M4b), host firewall + mesh enrollment (M5), offsite tfstate backup (ADR-022). |
| `roles/docker_host/` (Docker engine) + `roles/reverse_proxy/` (Caddy, ADR-024) | **Built + applied** (askari, M4a). `docker_host` installs Docker CE + compose; `reverse_proxy` is boma's standard Caddy proxy (HTTP-01 for public hosts; routes from `reverse_proxy__routes`). DNS-01 for cluster mesh/LAN-only services is deferred to Phase 2 (caddy-dns/gandi unresolved — see FRICTION). |
## Scaffolded but empty — NOT implemented

@ -21,6 +21,20 @@ earning its keep.
_(append new raw signals here; the next kaizen review consumes them)_
- `[gotcha]` **Hetzner IPs are 403'd by Google's Go module infra; caddy-dns/gandi DNS-01
didn't issue** (2026-06-14, M4a): building the custom Caddy image *on askari* failed —
`proxy.golang.org` and `golang.org` both return **403 Forbidden** to the Hetzner IP
(worked on ubongo). Reworked the role to build on the control node + `docker save`/`load`
to the target. *Then* the `caddy-dns/gandi` DNS-01 plugin would not create the
`_acme-challenge` TXT despite a token verified to (a) be in Caddy's env and (b) create
TXT records via the Gandi API directly — no plugin error, just "propagation timeout,
last error <nil>"; resolvers/timeout tuning didn't help. **Resolution:** askari is a
*public* host, so switched it to **HTTP-01 + vanilla Caddy** (works, drops the custom
image entirely). DNS-01 deferred to Phase 2 (cluster's mesh/LAN-only services) — the
plugin + the Hetzner-build-block to be solved then. → lesson: prefer HTTP-01 wherever a
host is publicly reachable; reserve DNS-01 (and its plugin/build complexity) for hosts
that genuinely can't do HTTP-01. Both bugs surfaced only on the live host.
- `[gotcha]` **A tag on `include_tasks` does NOT reach the included tasks — need
`apply: {tags:}`** (2026-06-14): M3's `base/tasks/main.yml` tagged the ssh/fail2ban
`include_tasks` with `hardening`, but `make deploy … TAGS=hardening` ran *nothing*

@ -109,8 +109,15 @@ active. Full CIS L1/L2, auditd, AppArmor, AIDE remain deferred to Phase 2 (TODO
### M4 · NetBird control plane on `askari` — first real service role
Built in two phases. **M4a (platform) — ✅ DONE:** Docker on askari + boma's standard
**Caddy** reverse proxy (ADR-024), proven by `https://test.askari.wingu.me` serving a
valid Let's Encrypt cert (HTTP-01 — DNS-01 deferred to Phase 2, see ADR-024/FRICTION).
Firewall opened 80/443/3478. Spec/plan: `…2026-06-14-netbird-coordinator-m4-design.md` /
`…2026-06-14-m4a-docker-caddy.md`. **M4b (next):** the `netbird` service role — read
NetBird's current self-host compose then.
Deploy the NetBird stack (management / signal / relay / Coturn + dashboard) with the
**embedded IdP** (ADR-016 — no Authentik dependency).
**embedded IdP** (ADR-016 — no Authentik dependency), fronted by the now-proven Caddy.
- **First exercise of:** the service-role conventions (`SECURITY.md` / `VERIFY.md` /
`ACCESS.md` / `BACKUP.md`), public **TLS / ACME**, and the **backup contract**
@ -156,8 +163,8 @@ Canonical dependency order:
3. **`docker_host`** — real Docker engine + Compose, daemon hardening, `nftables.d`
container rules (currently a scaffold; ADR-004, ADR-020).
4. **`dns` role** — render the internal zone from inventory (ADR-007).
5. **Auth + reverse proxy** — Authentik + Traefik: the foundation every service sits
behind with authentication (ADR-002).
5. **Auth + reverse proxy** — Authentik + **Caddy** (ADR-024): the foundation every
service sits behind with authentication (ADR-002).
6. **Monitoring** — Loki + Grafana Alloy (logging, ADR-018) + Prometheus/exporters +
Uptime Kuma; decide which alerts live where (TODO 3.6).
7. **Service roles** — PhotoPrism, email, indexers, … (`docs/CAPABILITIES.md`); each

@ -0,0 +1,117 @@
# ADR-024 — Reverse proxy: Caddy (ACME — HTTP-01 public, DNS-01 private)
## Status
Accepted (2026-06-14). Amends the soft Traefik assumption carried by the roadmap
(Phase-2 step 5) and ADR-017 prose; those are updated to read "Caddy (ADR-024)".
> **Cert method follows exposure (revised 2026-06-14, M4a).** The cert *challenge*
> depends on whether a host is publicly reachable: **public hosts** (askari) use
> **HTTP-01** with **vanilla Caddy** — simplest, no plugin; **mesh/LAN-only cluster
> services** (no public A-record) need **DNS-01** (the M1 Gandi capability), since they
> can't satisfy HTTP-01. The DNS-01 path is **deferred to Phase 2**: the `caddy-dns/gandi`
> plugin did not create the ACME TXT records on askari despite a verified-valid token
> (and Hetzner IPs are 403'd by Google's Go module infra, blocking the on-host custom
> build) — both to be sorted when the cluster's private services actually need DNS-01.
> The body below describes the DNS-01 design; askari (M4a) ships on HTTP-01.
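
The "cert method follows exposure" rule can be sketched as a small decision function. This is only an illustration of the rule as stated above, not code from the repo: the `Host` type, its `public` flag, the `acme_challenge` name, and the private host name in the usage note are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """Hypothetical host record; not part of the boma inventory schema."""

    name: str
    public: bool  # True when the host has a public A record and port 80 is reachable


def acme_challenge(host: Host) -> str:
    """Pick the ACME challenge type from a host's exposure.

    Public hosts can answer HTTP-01 on port 80; mesh/LAN-only hosts cannot,
    so they fall back to DNS-01.
    """
    return "http-01" if host.public else "dns-01"
```

Under these assumptions, `acme_challenge(Host("askari.wingu.me", public=True))` yields `"http-01"`, while a made-up mesh-only host such as `Host("photos.internal.example", public=False)` yields `"dns-01"`.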
## Context
boma needs a reverse proxy to front its services with TLS. ADR-002 requires every
service to sit behind a proxy with authentication before it is reachable; ADR-007/M1
delivers a `*.boma.<domain>` wildcard cert via ACME DNS-01 against Gandi — the only
viable cert path for mesh/LAN-only services that cannot satisfy HTTP-01 (no public
A-record to point at).
The roadmap (Phase-2, step 5) and ADR-017 prose assumed **Traefik + Authentik** as the
auth-and-proxy pair without an ADR ever pinning Traefik. On closer inspection:
- Traefik's headline feature is **dynamic Docker-label discovery** — it discovers and
routes services automatically from container labels without any static config.
- boma already renders *all* config from Ansible templates and the `group_vars` catalog
(ADR-004). That makes dynamic label discovery a disadvantage: a service that is not in
the catalog does not exist (CLAUDE.md), so any route that Traefik auto-discovers
outside the catalog would be unaudited.
- The first reverse-proxy instance is needed on `askari` for M4 (NetBird), a host where
`docker_hosts` patterns are being established under off-site/VPS constraints, not a
full Proxmox cluster with many services.
No production investment in Traefik config has been made; the decision can be made
cleanly here.
## Decision
boma's reverse proxy is **Caddy**.
### 1. Rationale for Caddy over Traefik
1. Traefik's dynamic label discovery is wasted — boma renders config from the catalog;
Caddy's static Caddyfile maps naturally to "render from templates" (ADR-004).
2. Caddy's Caddyfile is simple to template with `ansible.builtin.template`; one file,
one `ansible_managed` header, no side-channel label state.
3. **Automatic HTTPS** via ACME DNS-01: the `caddy-dns/gandi` plugin satisfies the
Gandi DNS-01 challenge, which is the only cert path for services with no public
A-record (ADR-007/M1 wildcard strategy).
4. Far simpler for a solo operator: no dashboard-as-a-service, no routing-rule DSL,
no dynamic config files to reconcile.
5. `forward_auth` to Authentik is a first-class Caddy directive — the planned
Authentik auth story (ADR-002) is preserved without Traefik as the middleman.
### 2. Custom image
Caddy's official Docker image does not include third-party DNS plugins. The `caddy-dns/gandi`
plugin must be compiled in via `xcaddy`. boma builds a custom image:
```
FROM caddy:builder AS builder
RUN xcaddy build --with github.com/caddy-dns/gandi
FROM caddy:latest
COPY --from=builder /usr/bin/caddy /usr/bin/caddy
```
This image is maintained as a boma artifact (Forgejo registry, pinned digest in the
Compose template). It is the cost of the Gandi DNS-01 path — unavoidable regardless of
proxy choice.
### 3. Deployment scope
The first Caddy instance fronts the NetBird stack on `askari` (M4). The pattern
generalises to the Proxmox cluster in Phase 2 when services multiply.
### 4. Authentik integration (deferred)
`forward_auth` to Authentik is deferred to Phase 2 (when Authentik is deployed on the
cluster). The Caddyfile template will carry a placeholder comment. No Traefik-Authentik
middleware migration is required.
## Consequences
- **Roadmap Phase-2 step 5** is updated from "Authentik + Traefik" to "Authentik +
Caddy (ADR-024)".
- **ADR-017 prose** that mentioned Traefik is updated to read "Caddy (ADR-024)".
- A custom Caddy image (`xcaddy` + `caddy-dns/gandi`) must be built, pushed to the
Forgejo registry, and kept current (plugin + base image updates).
- Caddyfile config is rendered by Ansible from `group_vars` — consistent with ADR-004
and easier to review than distributed container labels.
- `forward_auth` to Authentik is available when Authentik is deployed; no extra
middleware layer required.
- The `proxy` concern tag (already in `tests/tags.yml`) covers Caddy config tasks.
## What was ruled out
- **Traefik** — dynamic label discovery is a mismatch for boma's catalog-rendered
config model (ADR-004); more complex for a solo operator; no prior investment to
protect.
- **nginx / HAProxy** — no built-in ACME; require a separate ACME client (certbot,
acme.sh) adding operational surface; Caddy's integrated ACME is simpler.
- **NetBird's bundled TLS** — NetBird's management UI can serve its own TLS, but that
doesn't generalise; a real proxy separates concerns and applies to every service.
## Related
- ADR-002 — services behind a proxy with authentication (the requirement this satisfies).
- ADR-004 — Docker & Compose model (template-rendered config, catalog-driven).
- ADR-007 / M1 — Gandi DNS-01 ACME path (the TLS strategy Caddy implements).
- ADR-016 — NetBird (M4 is the first deployment of this proxy).
- ADR-017 — service-UI verification; forward_auth to Authentik is the future auth story.

@ -13,6 +13,9 @@ public_dns__records:
# askari (off-site host, TF-provisioned M2) — public A so it's reachable by name +
# for future ACME on *.askari.wingu.me. Mesh/LAN-only home services never appear here.
- {record: askari, type: A, values: ["77.42.120.136"], ttl: 1800}
# Wildcard for askari's services (test/netbird/...) → same host; Caddy gets a
# separate cert per hostname via HTTP-01 (M4a).
- {record: "*.askari", type: A, values: ["77.42.120.136"], ttl: 1800}
# Absent — Gandi's auto-seeded defaults we don't want (purged once, idempotent thereafter).
public_dns__absent:

@ -0,0 +1,6 @@
---
# Caddy reverse proxy on askari (ADR-024). Vanilla Caddy, ACME HTTP-01 (public host).
reverse_proxy__acme_email: admin@wingu.me
reverse_proxy__routes:
- {host: test.askari.wingu.me, respond: "boma reverse proxy"}
# M4b appends: {host: netbird.askari.wingu.me, upstream: "netbird-dashboard:80"}

playbooks/offsite.yml

@ -0,0 +1,11 @@
---
# offsite.yml — off-site hosts (askari): Docker engine + the Caddy reverse proxy.
# NetBird (M4b) appends to this play. Run: make deploy PLAYBOOK=offsite LIMIT=askari
- name: Configure off-site hosts
  hosts: offsite_hosts
  become: true
  roles:
    - role: docker_host
      tags: [docker_host]
    - role: reverse_proxy
      tags: [reverse_proxy]

@ -16,3 +16,8 @@ collections:
  # LiveDNS). PAT auth requires >= 9.0.0.
  - name: community.general
    version: ">=9.0.0"
  # community.docker — docker_compose_v2 + docker_container_exec (reverse_proxy role).
  # docker_compose_v2 first shipped in community.docker 3.6.0.
  - name: community.docker
    version: ">=3.6.0"

@ -1,25 +1,25 @@
# docker_host
Docker engine + Compose runtime applied to every host in the `docker_hosts` group.
Provides the container platform that the per-service roles (one service = one role,
ADR-004) deploy their Compose stacks onto.
Installs the Docker CE engine and the Compose plugin on every host in the
`docker_hosts` group. Provides the container runtime that per-service roles
(one service = one role, ADR-004) deploy their Compose stacks onto.
> **Status: scaffolded, not yet implemented.** This role has no tasks yet — applying it
> is a no-op. It is wired into `playbooks/site.yml` so the full standard state is
> expressed end-to-end, and so `make lint` covers it. See `STATUS.md`.
## Scope
## Planned scope
This role covers the **engine install only**. The following are deferred to Phase 2
(when the Proxmox cluster and `base` host firewall exist):
- Install Docker engine + the Compose plugin, version-pinned (ADR-011).
- Daemon hardening: `iptables: false` (the host `base` firewall owns nftables, ADR-020),
log driver, `live-restore`, user-namespace remapping where practical (ADR-002).
- Render container forward/NAT rules into `/etc/nftables.d/*.nft` — the include hook the
`base` role's ruleset exposes (see `roles/base/README.md`).
- Provide the runtime the service roles deploy their Compose files onto.
- Daemon hardening (`iptables: false`, log driver, `live-restore`, userns remapping).
- Rendering container forward/NAT rules into `/etc/nftables.d/*.nft` (the `base` role
hook for container firewall integration, ADR-020).
## Variables
None yet. Placeholders will use the `docker_host__*` namespace (CLAUDE.md convention).
| Variable | Default | Description |
|---|---|---|
| `docker_host__packages` | `[docker-ce, docker-ce-cli, containerd.io, docker-compose-plugin]` | APT packages installed from the Docker CE repository |
All variables use the `docker_host__` double-underscore namespace (CLAUDE.md convention).
## Example
@ -31,4 +31,14 @@ None yet. Placeholders will use the `docker_host__*` namespace (CLAUDE.md conven
tags: [docker_host]
```
See ADR-004 (`docs/decisions/004-docker-model.md`) for the Docker & Compose model.
## Tags
All tasks carry the `packages` concern tag (APT package install, ADR-019).
## Related
- ADR-004 (`docs/decisions/004-docker-model.md`) — Docker & Compose model.
- ADR-020 (`docs/decisions/020-firewall.md`) — daemon hardening + `nftables.d`
integration (deferred to Phase 2).
- ADR-011 (`docs/decisions/011-update-management.md`) — version pinning policy
(future: pin Docker CE version explicitly).

@ -1 +1,8 @@
---
# Docker engine install (ADR-004). Cluster-specific daemon hardening + nftables.d
# integration are deferred to when the cluster + host firewall exist.
docker_host__packages:
- docker-ce
- docker-ce-cli
- containerd.io
- docker-compose-plugin

@ -4,8 +4,14 @@
  gather_facts: true
  tasks:
    - name: Add verification tasks here
      ansible.builtin.assert:
        that: true
        msg: "Replace this with real assertions"
    - name: Verify docker binary is present
      ansible.builtin.command: docker --version
      register: docker_version_output
      changed_when: false
      tags: [verify]
    - name: Assert docker --version succeeded
      ansible.builtin.assert:
        that: docker_version_output.rc == 0
        msg: "docker --version failed — Docker was not installed correctly"
      tags: [verify]

@ -1,13 +1,39 @@
---
# docker_host — Docker engine + Compose runtime for hosts in the docker_hosts group.
#
# SCAFFOLDED, NOT YET IMPLEMENTED. This role is referenced by playbooks/site.yml so the
# full standard state is expressed end-to-end, but it has no tasks yet — applying it is a
# no-op. See STATUS.md ("Scaffolded but empty") and ADR-004 (Docker & Compose model).
#
# Planned scope (ADR-002/004/020):
# - install Docker engine + compose plugin (version-pinned, per ADR-011)
# - daemon hardening: iptables:false (host nftables owns the firewall, ADR-020),
# log-driver, live-restore, userns where practical
# - render container forward/NAT rules into /etc/nftables.d/*.nft (the base-role hook)
# - deploy per-service Compose stacks from the service roles (one service = one role)
- name: Install prerequisites
  ansible.builtin.apt:
    name: [ca-certificates, curl, gnupg]
    state: present
    update_cache: true
  tags: [packages]
- name: Ensure /etc/apt/keyrings exists
  ansible.builtin.file:
    path: /etc/apt/keyrings
    state: directory
    mode: "0755"
  tags: [packages]
- name: Add Docker's APT GPG key
  ansible.builtin.get_url:
    url: https://download.docker.com/linux/debian/gpg
    dest: /etc/apt/keyrings/docker.asc
    mode: "0644"
  tags: [packages]
# The APT repo uses Debian arch names (amd64/arm64), not kernel names (x86_64/aarch64).
- name: Add the Docker APT repository
  ansible.builtin.apt_repository:
    repo: >-
      deb [arch={{ {'x86_64': 'amd64', 'aarch64': 'arm64'}.get(ansible_architecture, ansible_architecture) }}
      signed-by=/etc/apt/keyrings/docker.asc]
      https://download.docker.com/linux/debian
      {{ ansible_distribution_release }} stable
    filename: docker
    state: present
  tags: [packages]
- name: Install Docker engine + compose plugin
  ansible.builtin.apt:
    name: "{{ docker_host__packages }}"
    state: present
    update_cache: true
  tags: [packages]

@ -0,0 +1,62 @@
# reverse_proxy
Boma's standard Caddy reverse proxy (ADR-024). Runs on `askari` (the off-site
Hetzner host) and terminates TLS for all public-facing services via ACME HTTP-01.
Uses the official `caddy:2` image — no custom build, no DNS plugin, no token required.
## How TLS works
Caddy obtains per-hostname certificates using the ACME HTTP-01 challenge. Port 80
must be reachable from the internet for the challenge to succeed. Each `host` in
`reverse_proxy__routes` gets its own certificate automatically.
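
Because HTTP-01 depends on port 80 being reachable, a quick TCP probe can rule out a closed firewall before waiting on certificate issuance. The sketch below is an assumption-laden helper, not part of the role: `port_reachable` is a hypothetical name, and the probe is only meaningful when run from a machine outside the host's own network, since the ACME validation request arrives from the internet.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

A call such as `port_reachable("test.askari.wingu.me", 80)` only shows that something accepts connections on port 80; it does not prove that Caddy answers the challenge path.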
> **DNS-01 (for mesh/LAN-only cluster services) is deferred to Phase 2.** The
> `caddy-dns/gandi` plugin failed to issue certificates during M4a and needs
> investigation before it can be used.
## Route catalog — `reverse_proxy__routes`
Services register themselves as routes by appending an entry to
`reverse_proxy__routes` in `group_vars/all/reverse_proxy.yml`:
```yaml
reverse_proxy__routes:
- {host: app.askari.wingu.me, upstream: "app:8080"}
- {host: health.askari.wingu.me, respond: "ok"}
```
Each entry renders a separate server block in the Caddyfile:
```
app.askari.wingu.me {
    reverse_proxy app:8080
}
health.askari.wingu.me {
    respond "ok" 200
}
```
Use `upstream` to proxy to a Docker service, or `respond` to return a static string.
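
The mapping from route entries to server blocks can be sketched in plain Python. This mirrors the logic described above for illustration only; the role itself renders a Jinja2 template, and `render_caddyfile` is a hypothetical name, not something the repo provides.

```python
def render_caddyfile(acme_email: str, routes: list[dict]) -> str:
    """Render a Caddyfile: one global options block, then one block per route."""
    lines = ["{", f"    email {acme_email}", "}"]
    for route in routes:
        lines.append(f"{route['host']} {{")
        if "upstream" in route:
            # Proxy to a service reachable on the shared Docker network.
            lines.append(f"    reverse_proxy {route['upstream']}")
        else:
            # Static response; falls back to "boma" when no text is given.
            text = route.get("respond", "boma")
            lines.append(f'    respond "{text}" 200')
        lines.append("}")
    return "\n".join(lines) + "\n"
```

Calling it with `[{"host": "app.askari.wingu.me", "upstream": "app:8080"}]` produces a single `app.askari.wingu.me { ... }` block containing `reverse_proxy app:8080`, the same shape as the rendered example above.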
## Variables
| Variable | Default | Description |
|---|---|---|
| `reverse_proxy__base_dir` | `/opt/services/reverse_proxy` | Working directory for Compose project |
| `reverse_proxy__acme_email` | `admin@example.test` | ACME registration email |
| `reverse_proxy__routes` | `[]` | List of `{host, upstream}` or `{host, respond}` entries |
| `reverse_proxy__manage` | `true` | Set `false` in Molecule to skip Docker tasks |
Production overrides live in
`inventories/production/group_vars/all/reverse_proxy.yml`.
## `reverse_proxy__manage` toggle
Docker operations (`docker compose up`) are gated on `reverse_proxy__manage | bool`.
Set it to `false` in Molecule so the role can be tested (template rendering, directory
creation) without a Docker daemon.
## Secrets
None. HTTP-01 requires no credentials.

@ -0,0 +1,6 @@
---
# Caddy reverse proxy (ADR-024). Vanilla Caddy; TLS via ACME HTTP-01 (public hosts).
reverse_proxy__base_dir: /opt/services/reverse_proxy
reverse_proxy__acme_email: admin@example.test
reverse_proxy__routes: [] # each: {host: x, upstream: "svc:port"} OR {host: x, respond: "text"}
reverse_proxy__manage: true # set false in Molecule to render without Docker

@ -0,0 +1,7 @@
---
- name: Reload caddy
  listen: reload caddy
  community.docker.docker_container_exec:
    container: caddy
    command: caddy reload --config /etc/caddy/Caddyfile
  when: reverse_proxy__manage | bool

@ -0,0 +1,13 @@
---
galaxy_info:
  author: sjat
  description: >-
    Caddy reverse proxy with ACME HTTP-01 TLS (ADR-024). Runs the official
    caddy:2 image and manages it via Docker Compose.
  license: MIT
  min_ansible_version: "2.17"
  platforms:
    - name: Debian
      versions:
        - trixie
dependencies: []

@ -0,0 +1,16 @@
---
- name: Converge
  hosts: all
  gather_facts: true
  vars:
    reverse_proxy__manage: false
    reverse_proxy__acme_email: admin@example.test
    reverse_proxy__routes:
      - host: app.example.test
        upstream: "app:80"
      - host: t.example.test
        respond: "ok"
  roles:
    - role: reverse_proxy

@ -0,0 +1,31 @@
---
dependency:
  name: galaxy
  options:
    requirements-file: ../../requirements.yml
driver:
  name: docker
platforms:
  - name: instance
    # Project-owned image built from .docker/molecule-debian13/Dockerfile
    # and hosted in the Forgejo container registry.
    # Build/push with: make molecule-image / make molecule-image-push
    image: forgejo.nyumbani.baobab.band/sjat/molecule-debian13:latest
    pre_build_image: true
    privileged: true # required for systemd
    cgroupns_mode: host
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:rw
    command: /lib/systemd/systemd
provisioner:
  name: ansible
  inventory:
    host_vars:
      instance:
        ansible_user: root
verifier:
  name: ansible

@ -0,0 +1,22 @@
---
- name: Verify
  hosts: all
  gather_facts: false
  tasks:
    - name: Slurp the rendered Caddyfile
      ansible.builtin.slurp:
        src: /opt/services/reverse_proxy/Caddyfile
      register: _caddyfile
      tags: [verify]
    - name: Assert Caddyfile exists and contains expected content
      ansible.builtin.assert:
        that:
          - _caddyfile.content | b64decode | length > 0
          - "'app.example.test' in (_caddyfile.content | b64decode)"
          - "'reverse_proxy app:80' in (_caddyfile.content | b64decode)"
          - "'respond \"ok\" 200' in (_caddyfile.content | b64decode)"
        fail_msg: "Caddyfile is missing expected content"
        success_msg: "Caddyfile rendered correctly"
      tags: [verify]

@ -0,0 +1,29 @@
---
- name: Ensure the service directory exists
  ansible.builtin.file:
    path: "{{ reverse_proxy__base_dir }}"
    state: directory
    mode: "0750"
  tags: [config]
- name: Render the Caddyfile
  ansible.builtin.template:
    src: Caddyfile.j2
    dest: "{{ reverse_proxy__base_dir }}/Caddyfile"
    mode: "0644"
  notify: reload caddy
  tags: [config]
- name: Render the compose file
  ansible.builtin.template:
    src: docker-compose.yml.j2
    dest: "{{ reverse_proxy__base_dir }}/docker-compose.yml"
    mode: "0644"
  tags: [config]
- name: Bring the reverse proxy up
  community.docker.docker_compose_v2:
    project_src: "{{ reverse_proxy__base_dir }}"
    state: present
  when: reverse_proxy__manage | bool
  tags: [deploy]

@ -0,0 +1,12 @@
{
    email {{ reverse_proxy__acme_email }}
}
{% for r in reverse_proxy__routes %}
{{ r.host }} {
{% if r.upstream is defined %}
    reverse_proxy {{ r.upstream }}
{% else %}
    respond "{{ r.respond | default('boma') }}" 200
{% endif %}
}
{% endfor %}

@ -0,0 +1,22 @@
services:
  caddy:
    image: caddy:2
    container_name: caddy
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
      - caddy_data:/data
      - caddy_config:/config
    networks:
      - boma
volumes:
  caddy_data:
  caddy_config:
networks:
  boma:
    name: boma

@ -5,13 +5,14 @@
module "askari" {
source = "../../modules/hetzner_vm"
name = "askari"
server_type = "cx23" # x86, 2 vCPU / 4 GB / 40 GB (CAX11/ARM was out of stock in
name = "askari"
server_type = "cx23" # x86, 2 vCPU / 4 GB / 40 GB (CAX11/ARM was out of stock in
# every EU location 2026-06-14; cx23 is same-spec + cheaper)
location = "hel1" # Helsinki
image = "debian-13"
ansible_ssh_pubkey = var.ansible_ssh_pubkey
ssh_admin_cidrs = var.ssh_admin_cidrs
public_web = true # Caddy 80/443 + NetBird 3478 (M4)
labels = {
env = "offsite"
group = "offsite_hosts"

@ -26,14 +26,35 @@ resource "hcloud_ssh_key" "ansible" {
resource "hcloud_firewall" "this" {
name = "${var.name}-fw"
# SSH from the control node only. NetBird ports (UDP 3478, TCP 80/443) are added
# in M4 when the coordinator deploys (ADR-020); host nftables stays catalog-driven.
# SSH from the control node only.
  rule {
    direction  = "in"
    protocol   = "tcp"
    port       = "22"
    source_ips = var.ssh_admin_cidrs
  }
  # Public web (Caddy 80/443) + NetBird STUN/TURN (3478/udp) only when public_web
  # (ADR-024, M4). Host nftables stays catalog-driven (ADR-020).
  dynamic "rule" {
    for_each = var.public_web ? ["80", "443"] : []
    content {
      direction  = "in"
      protocol   = "tcp"
      port       = rule.value
      source_ips = ["0.0.0.0/0", "::/0"]
    }
  }
  dynamic "rule" {
    for_each = var.public_web ? ["3478"] : []
    content {
      direction  = "in"
      protocol   = "udp"
      port       = rule.value
      source_ips = ["0.0.0.0/0", "::/0"]
    }
  }
}
resource "hcloud_server" "this" {

@ -28,6 +28,12 @@ variable "ssh_admin_cidrs" {
type = list(string)
}
variable "public_web" {
description = "Open the public web/NetBird ports (80/443 TCP, 3478 UDP) to the internet"
type = bool
default = false
}
variable "labels" {
description = "Hetzner resource labels (metadata only)"
type = map(string)