docs(reverse_proxy): service-role SECURITY/VERIFY/ACCESS records (O12)

reverse_proxy is the first built+applied service role; add the per-service
records CLAUDE.md/ADR-002/008/017/021 require. Add access__*/backup__* data to
defaults as the source of truth (ADR-021/022). reverse_proxy is stateless (ACME
certs re-issue via HTTP-01), so it declares backup__state: false with a reason
rather than a BACKUP.md (ADR-022 convention).

The access__*/backup__* cross-role field names intentionally don't carry the
reverse_proxy__ prefix, so each is marked `# noqa: var-naming[no-role-prefix]`
(ansible-lint has no per-prefix allowlist; rule stays enabled elsewhere).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
sjat 2026-06-14 19:06:23 +02:00
parent 718781053f
commit cb8f924d4b
4 changed files with 164 additions and 0 deletions

@@ -0,0 +1,37 @@
# Access — reverse_proxy (Caddy)
Rendered from the role's `access__*` data (`roles/reverse_proxy/defaults/main.yml`) —
the source of truth that also drives `/check-access`. Regenerate from the data; edit the
data, not the tables. Host: `askari` (off-site Hetzner; ADR-007/016).
## Access paths
The documented ways in, by tier (rendered from `access__*`):
| Tier | Path | Invocation |
|---|---|---|
| primary | `wt0` mesh SSH | `ssh askari` (over the NetBird mesh — pending M5; see notes) |
| secondary | LAN/WAN SSH from `ubongo` | `ssh ansible@askari` (from the control node; Hetzner firewall allows only ubongo's WAN) |
| — | container exec + compose | `docker compose -p reverse_proxy -f /opt/services/reverse_proxy/docker-compose.yml ps` / `… exec caddy sh` |
| — | logs | `docker logs caddy` now; Loki labels `{service: caddy}` once the ADR-018 pipeline lands |
| — | admin API | n/a — Caddy admin API bound to container localhost `:2019`, never exposed (`access__api.enabled: false`) |
## Break-glass
Mesh-and-LAN-independent fallback for this host's class (recorded, not routine):
- **Hetzner rescue system + Cloud Console** (VNC) for `askari` — boot the rescue image
or attach the web console from the Hetzner Cloud panel if SSH is unreachable.
## Operational notes
- **Mesh not yet enrolled (M5).** Until `askari` joins the NetBird mesh, the `wt0`
primary path does not exist — the only SSH route is the secondary one (from `ubongo`'s
WAN IP, which the TF-managed Hetzner Cloud Firewall allowlists). Promote `wt0` to
primary once M5 lands.
- **Caddy wedged / bad config:** the Caddyfile is rendered read-only by Ansible; to
recover, fix `reverse_proxy__routes` in `group_vars` and re-run the role (it reloads
Caddy via the handler). To validate the rendered config: `docker exec caddy caddy validate
--config /etc/caddy/Caddyfile`.
- **Cert issuance failing:** check that port 80 is reachable from the internet (HTTP-01
needs it) and watch `docker logs caddy` for ACME errors before assuming a routing fault.
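
A minimal sketch of the cert-issuance triage above, assuming a POSIX shell with `curl` on a machine outside the Hetzner firewall and `docker` on `askari`. The helper names and the `acme|challenge` pattern are illustrative assumptions, not part of the role, and Caddy's actual log wording may differ.

```shell
# Sketch only: helper names and the grep pattern are assumptions, not part of the role.

# Print the HTTP status returned on port 80 for a host.
# Run this from outside askari, since HTTP-01 validation arrives from the internet.
port80_status() {
  curl -sS -o /dev/null --max-time 10 -w '%{http_code}\n' "http://$1/"
}

# Reduce log text on stdin to lines that mention ACME or a challenge (heuristic match).
acme_log_lines() {
  grep -iE 'acme|challenge' || true
}
```

For example, run `port80_status test.askari.wingu.me` from an external machine, then `docker logs caddy 2>&1 | acme_log_lines` on the host. A timeout or refused connection on the first command suggests a reachability problem (firewall, or the container being down) rather than a routing fault.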

@@ -0,0 +1,61 @@
# Security — reverse_proxy (Caddy)
## Exposure
- **Published ports:** `80/tcp` + `443/tcp` (HTTP→HTTPS redirect + TLS). Both are
declared in the `group_vars` firewall catalog as the askari `public_web` opens
(ADR-020); the Hetzner Cloud Firewall also opens 80/443 (and 3478 for NetBird).
Port 80 must stay open to the internet for the ACME HTTP-01 challenge.
- **Auth surface:** none of its own. Caddy is the TLS terminator and router; per-service
authentication (Authentik `forward_auth`) is added at each route in Phase 2 (ADR-024
§4). Today it fronts only a static `respond` test vhost and (M4b) the NetBird stack,
which carries its own auth.
- **Reachability:** public — askari is internet-facing. Caddy is the single public entry
point; upstreams sit on the internal `boma` Docker network and are reached by name, not
published directly.
- **Data sensitivity:** none persistent worth protecting — only ACME account keys +
issued certificates in the `caddy_data` volume, which are re-issuable (HTTP-01). No
user data, no secrets at rest. See backup record: `backup__state: false` (stateless).
## Checklist status
Each item from `docs/security/service-checklist.md`:
- [x] Secrets in vault; no default creds; nothing secret in git/images — ✅ n/a: HTTP-01
needs no credentials; the only config input is `reverse_proxy__acme_email` (not secret).
- [x] Non-root; no `privileged`/host-network unless justified; minimal mounts; caps
dropped — ⚠️ official `caddy:2` runs as root (to bind 80/443); no `privileged`, no host
network (bridge `boma`); mounts are the read-only Caddyfile + two named volumes. Root
inside the container is the upstream default; revisit if Caddy ships a rootless variant.
- [x] Ports declared in `group_vars`; behind reverse proxy + auth if exposed;
least-privilege inter-service reach — ✅ 80/443 in the catalog; Caddy *is* the proxy;
upstreams are not published, only reachable on the `boma` network.
- [x] Image pinned (tag/digest), update path known — ⚠️ pinned to the `caddy:2` major
tag (stateless tier, ADR-011/ADR-004), not a digest; refreshed deliberately and watched
by DIUN. Tighten to `tag@digest` if the proxy is reclassified as stateful.
- [x] Logs reviewable; backup/restore covered if stateful — ✅ stateless (no backup
needed); logs via `docker logs caddy` now, Loki labels declared for the ADR-018 pipeline.
## Service-specific hardening
- **HTTP-01 only, no DNS token:** vanilla `caddy:2`, no `caddy-dns/gandi` plugin and no
Gandi API token on the host — removes a credential and a custom-image supply chain
(ADR-024 revised Status).
- **Caddyfile is read-only** in the container (`:ro` mount); rendered solely by Ansible
from the `group_vars` route catalog — no dynamic label discovery, so no route exists
that wasn't declared (the reason Caddy was chosen over Traefik, ADR-024 §1).
- **Admin API not exposed:** Caddy's admin endpoint stays on container-localhost `:2019`;
never published, never in the firewall catalog (`access__api.enabled: false`).
- **Automatic HTTPS:** HTTP is redirected to HTTPS and modern TLS defaults are Caddy's
out-of-the-box behaviour (no manual cipher config needed).
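
The admin-API and read-only-mount claims above can be spot-checked on the host. A hedged sketch, assuming the container is named `caddy` as in `access__containers`; the helper names are hypothetical.

```shell
# Sketch only: helper names are hypothetical; the container name comes from access__containers.

# Succeed when `docker port` output on stdin shows container port 2019 published.
admin_port_published() {
  grep -q '^2019/'
}

# List each mount destination with its read-write flag (false means read-only).
list_mounts() {
  docker inspect -f '{{range .Mounts}}{{println .Destination .RW}}{{end}}' "${1:-caddy}"
}
```

On a host that matches this record, `docker port caddy | admin_port_published` would be expected to exit non-zero, and `list_mounts` should show the Caddyfile destination with `false`.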
## Residual / accepted risks
- **Container runs as root** — upstream `caddy:2` default (needs to bind low ports).
Rationale: official image, no rootless variant wired yet; blast radius limited to the
proxy container. Revisit: adopt a rootless Caddy image if upstream stabilises one.
- **Image pinned to a major tag, not a digest** — accepted for the stateless tier
(ADR-011). Revisit if the role gains state.
- **ACME re-issuance vs Let's Encrypt rate limits** — losing `caddy_data` triggers
re-issuance; rapid repeated rebuilds could hit LE rate limits. Acceptable for a handful
of askari hostnames; noted in the backup rationale.

@@ -0,0 +1,44 @@
# Verify — reverse_proxy (Caddy)
`reverse_proxy` has no application UI of its own — it is the TLS terminator and router.
"Working" is verified at the HTTP/TLS layer (what `/verify-service` can drive with a
browser/HTTP client against the public hostnames it serves), not via an app login.
## Critical user journeys
1. **HTTPS serves with a valid cert** — request `https://<a host in
reverse_proxy__routes>` (e.g. `https://test.askari.wingu.me`) → 200 with a valid
Let's Encrypt certificate (trusted chain, CN/SAN matches the host, not expired).
2. **HTTP redirects to HTTPS** — request `http://<host>` → 308/301 redirect to the
`https://` URL (Caddy's automatic-HTTPS redirect).
3. **A `respond` route returns its static body** — the test vhost returns its configured
string with 200.
4. **An `upstream` route proxies through** — once a real upstream is registered (M4b
NetBird), `https://<host>` reaches the upstream's response, not a Caddy error page.
5. **An unknown host is not served a valid cert** — a hostname not in
`reverse_proxy__routes` does not get a certificate / is not routed (no accidental
catch-all).
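
Journeys 1, 2 and 5 can be approximated from a shell before driving a browser. A minimal sketch, assuming `curl` is available on the checking machine; the function names are hypothetical and the host names are supplied by the caller.

```shell
# Sketch only: function names are hypothetical; pass a host from reverse_proxy__routes.

# Journey 1: print the HTTPS status. curl exits non-zero if the certificate
# chain is untrusted, expired, or does not match the host name.
https_status() {
  curl -sS -o /dev/null --max-time 10 -w '%{http_code}\n' "https://$1/"
}

# Journey 2: print "<status> <redirect target>" for the plain-HTTP request.
http_redirect() {
  curl -sS -o /dev/null --max-time 10 -w '%{http_code} %{redirect_url}\n' "http://$1/"
}

# Journey 2 helper: succeed for the redirect codes the journey accepts.
is_https_redirect() {
  case "$1" in
    301|308) return 0 ;;
    *) return 1 ;;
  esac
}

# Journey 5: succeed only if the TLS request for an undeclared host fails.
# $1 = undeclared host name, $2 = the proxy's public IP address.
unknown_host_rejected() {
  ! curl -sS -o /dev/null --max-time 10 --resolve "$1:443:$2" "https://$1/" 2>/dev/null
}
```

For instance, `https_status test.askari.wingu.me` succeeding with `200` covers the trusted-chain, host-match and expiry parts of journey 1; confirming the issuer is Let's Encrypt still needs a look at the certificate itself.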
## What good looks like
- The browser padlock shows a valid Let's Encrypt certificate for the requested host;
the SAN matches and the chain is trusted.
- `http://` visibly becomes `https://` in the address bar.
- The expected body (static `respond` text, or the upstream's page) renders.
## Not browser-verifiable
- Certificate *renewal* (60-day cadence) — confirm out of band via `docker logs caddy`
/ Loki, not a single browser session.
- Behaviour when port 80 is blocked (HTTP-01 would fail) — an infrastructure/firewall
check, route to the manual handoff.
- The deferred DNS-01 path for mesh/LAN-only services (Phase 2, ADR-024) — not yet live.
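
For the renewal item, one out-of-band option besides the logs is to read the served certificate's expiry with `openssl`. A hedged sketch, assuming `openssl` is installed on the checking machine; the helper names and the 30-day default are illustrative, not part of the role.

```shell
# Sketch only: helper names and the default threshold are illustrative assumptions.

# Convert days to seconds for `openssl x509 -checkend`.
days_to_seconds() {
  echo $(( $1 * 86400 ))
}

# Print issuer and notAfter of the certificate served for a host.
served_cert_info() {
  openssl s_client -connect "$1:443" -servername "$1" </dev/null 2>/dev/null |
    openssl x509 -noout -issuer -enddate
}

# Succeed if the served certificate is still valid N days from now (default 30).
cert_valid_for_days() {
  openssl s_client -connect "$1:443" -servername "$1" </dev/null 2>/dev/null |
    openssl x509 -noout -checkend "$(days_to_seconds "${2:-30}")" >/dev/null
}
```

A `notAfter` date that moves forward between two runs of `served_cert_info` indicates that a renewal took place.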
## Test data
Provisioned in the **staging** deploy (no Authentik user needed — there is no SSO on the
proxy itself):
- At least one `reverse_proxy__routes` entry with a public DNS A-record pointing at the
staging host, so HTTP-01 can complete. A static `respond` route is enough for journeys
1-3 and 5.

@@ -4,3 +4,25 @@ reverse_proxy__base_dir: /opt/services/reverse_proxy
reverse_proxy__acme_email: admin@example.test
reverse_proxy__routes: [] # each: {host: x, upstream: "svc:port"} OR {host: x, respond: "text"}
reverse_proxy__manage: true # set false in Molecule to render without Docker
# access__*/backup__* are the ADR-021/022 CROSS-ROLE conventions — shared field names that
# render ACCESS.md/BACKUP.md and drive /check-access · /check-backup. They intentionally do
# NOT carry the reverse_proxy__ prefix, so each is marked `# noqa: var-naming[no-role-prefix]`
# (ansible-lint's role-prefix rule has no per-prefix allowlist; the rule stays enabled elsewhere).
# Operational-access record (ADR-021) — source of truth for ACCESS.md + /check-access.
access__service: reverse_proxy # noqa: var-naming[no-role-prefix]
access__compose_project: reverse_proxy # noqa: var-naming[no-role-prefix]
access__compose_path: "{{ reverse_proxy__base_dir }}/docker-compose.yml" # noqa: var-naming[no-role-prefix]
access__containers: [caddy] # noqa: var-naming[no-role-prefix]
access__log: # noqa: var-naming[no-role-prefix]
  loki_labels: { service: caddy } # intent; Loki/Alloy pipeline is ADR-018 (pending)
access__api: # noqa: var-naming[no-role-prefix]
  enabled: false
  reason: "Caddy admin API bound to container localhost :2019; never exposed (ADR-020 catalog owns ports)"
# Backup contract (ADR-022). Stateless: Caddy's /data holds only ACME account keys +
# issued certs, which are re-requested automatically on restart via HTTP-01 (no manual
# steps). Residual risk: Let's Encrypt rate limits on rapid repeated re-issuance.
backup__service: reverse_proxy # noqa: var-naming[no-role-prefix]
backup__state: false # noqa: var-naming[no-role-prefix]