docs(reverse_proxy): service-role SECURITY/VERIFY/ACCESS records (O12)
reverse_proxy is the first built+applied service role; add the per-service records CLAUDE.md/ADR-002/008/017/021 require. Add access__*/backup__* data to defaults as the source of truth (ADR-021/022).

reverse_proxy is stateless (ACME certs re-issue via HTTP-01), so it declares backup__state: false with a reason rather than a BACKUP.md (ADR-022 convention).

The access__*/backup__* cross-role field names intentionally don't carry the reverse_proxy__ prefix, so each is marked `# noqa: var-naming[no-role-prefix]` (ansible-lint has no per-prefix allowlist; rule stays enabled elsewhere).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
parent 718781053f
commit cb8f924d4b
4 changed files with 164 additions and 0 deletions
37
roles/reverse_proxy/ACCESS.md
Normal file
@@ -0,0 +1,37 @@
# Access — reverse_proxy (Caddy)

Rendered from the role's `access__*` data (`roles/reverse_proxy/defaults/main.yml`) —
the source of truth that also drives `/check-access`. Regenerate from the data; edit the
data, not the tables. Host: `askari` (off-site Hetzner; ADR-007/016).

## Access paths

The documented ways in, by tier (rendered from `access__*`):

| Tier | Path | Invocation |
|---|---|---|
| primary | `wt0` mesh SSH | `ssh askari` (over the NetBird mesh — pending M5; see notes) |
| secondary | LAN/WAN SSH from `ubongo` | `ssh ansible@askari` (from the control node; Hetzner firewall allows only ubongo's WAN) |
| — | container exec + compose | `docker compose -p reverse_proxy -f /opt/services/reverse_proxy/docker-compose.yml ps` / `… exec caddy sh` |
| — | logs | `docker logs caddy` now; Loki labels `{service: caddy}` once the ADR-018 pipeline lands |
| — | admin API | n/a — Caddy admin API bound to container localhost `:2019`, never exposed (`access__api.enabled: false`) |

## Break-glass

Mesh-and-LAN-independent fallback for this host's class (recorded, not routine):

- **Hetzner rescue system + Cloud Console** (VNC) for `askari` — boot the rescue image
  or attach the web console from the Hetzner Cloud panel if SSH is unreachable.

## Operational notes

- **Mesh not yet enrolled (M5).** Until `askari` joins the NetBird mesh, the `wt0`
  primary path does not exist — the only SSH route is the secondary one (from `ubongo`'s
  WAN IP, which the TF-managed Hetzner Cloud Firewall allowlists). Promote `wt0` to
  primary once M5 lands.
- **Caddy wedged / bad config:** the Caddyfile is rendered read-only by Ansible; to
  recover, fix `reverse_proxy__routes` in `group_vars` and re-run the role (it reloads
  Caddy via the handler). To validate the rendered Caddyfile (the file on disk, not the
  running config): `docker exec caddy caddy validate
  --config /etc/caddy/Caddyfile`.
- **Cert issuance failing:** check that port 80 is reachable from the internet (HTTP-01
  needs it) and watch `docker logs caddy` for ACME errors before assuming a routing fault.
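
When watching the logs for issuance problems, a small filter can cut the noise. The following is a minimal sketch, not part of the role: the function name `issuance_lines` is hypothetical, and the keyword list is an assumption about which words Caddy's log messages tend to contain rather than a documented log format.

```shell
#!/bin/sh
# Hypothetical helper: keep only log lines that look related to certificate
# issuance. The keywords are an assumption, not a documented Caddy log schema.
issuance_lines() {
    # grep exits 1 when nothing matches; "|| true" keeps the pipeline usable
    # under "set -e" when the log is simply clean.
    grep -i -E 'acme|challenge|certificate' || true
}
```

Possible usage on `askari`: `docker logs caddy 2>&1 | issuance_lines | tail -n 20`. The `2>&1` matters because `docker logs` replays the container's stderr on stderr.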
61
roles/reverse_proxy/SECURITY.md
Normal file
@@ -0,0 +1,61 @@
# Security — reverse_proxy (Caddy)

## Exposure

- **Published ports:** `80/tcp` + `443/tcp` (HTTP→HTTPS redirect + TLS). Both are
  declared in the `group_vars` firewall catalog as the askari `public_web` opens
  (ADR-020); the Hetzner Cloud Firewall also opens 80/443 (and 3478 for NetBird).
  Port 80 must stay open to the internet for the ACME HTTP-01 challenge.
- **Auth surface:** none of its own. Caddy is the TLS terminator and router; per-service
  authentication (Authentik `forward_auth`) is added at each route in Phase 2 (ADR-024
  §4). Today it fronts only a static `respond` test vhost and (M4b) the NetBird stack,
  which carries its own auth.
- **Reachability:** public — askari is internet-facing. Caddy is the single public entry
  point; upstreams sit on the internal `boma` Docker network and are reached by name, not
  published directly.
- **Data sensitivity:** none persistent worth protecting — only ACME account keys +
  issued certificates in the `caddy_data` volume, which are re-issuable (HTTP-01). No
  user data, no secrets at rest. See backup record: `backup__state: false` (stateless).

## Checklist status

Each item from `docs/security/service-checklist.md`:

- [x] Secrets in vault; no default creds; nothing secret in git/images — ✅ n/a: HTTP-01
  needs no credentials; the only config input is `reverse_proxy__acme_email` (not secret).
- [x] Non-root; no `privileged`/host-network unless justified; minimal mounts; caps
  dropped — ⚠️ official `caddy:2` runs as root (to bind 80/443); no `privileged`, no host
  network (bridge `boma`); mounts are the read-only Caddyfile + two named volumes. Root
  inside the container is the upstream default; revisit if Caddy ships a rootless variant.
- [x] Ports declared in `group_vars`; behind reverse proxy + auth if exposed;
  least-privilege inter-service reach — ✅ 80/443 in the catalog; Caddy *is* the proxy;
  upstreams are not published, only reachable on the `boma` network.
- [x] Image pinned (tag/digest), update path known — ⚠️ pinned to the `caddy:2` major
  tag (stateless tier, ADR-011/ADR-004), not a digest; refreshed deliberately and watched
  by DIUN. Tighten to `tag@digest` if the proxy is reclassified as stateful.
- [x] Logs reviewable; backup/restore covered if stateful — ✅ stateless (no backup
  needed); logs via `docker logs caddy` now, Loki labels declared for the ADR-018 pipeline.

## Service-specific hardening

- **HTTP-01 only, no DNS token:** vanilla `caddy:2`, no `caddy-dns/gandi` plugin and no
  Gandi API token on the host — removes a credential and a custom-image supply chain
  (ADR-024 revised Status).
- **Caddyfile is read-only** in the container (`:ro` mount); rendered solely by Ansible
  from the `group_vars` route catalog — no dynamic label discovery, so no route exists
  that wasn't declared (the reason Caddy was chosen over Traefik, ADR-024 §1).
- **Admin API not exposed:** Caddy's admin endpoint stays on container-localhost `:2019`;
  never published, never in the firewall catalog (`access__api.enabled: false`).
- **Automatic HTTPS:** HTTP is redirected to HTTPS and modern TLS defaults are Caddy's
  out-of-the-box behaviour (no manual cipher config needed).
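
The "only 80/443 published, admin API never published" claims can be spot-checked on the host. This is a minimal sketch rather than anything the role ships: the function name `unexpected_ports` is hypothetical, and it assumes the usual `docker port` output shape of one `PORT/PROTO -> ADDRESS:PORT` mapping per line.

```shell
#!/bin/sh
# Hypothetical check: read `docker port <container>` output on stdin and print
# every published container port other than 80/tcp and 443/tcp.
# Assumed input shape, one mapping per line:  80/tcp -> 0.0.0.0:80
unexpected_ports() {
    # $1 is the container-side port; skip blank lines, de-duplicate the
    # IPv4/IPv6 pairs Docker prints for the same port.
    awk 'NF > 0 && $1 != "80/tcp" && $1 != "443/tcp" { print $1 }' | sort -u
}
```

Possible usage: `docker port caddy | unexpected_ports`. Empty output matches the exposure described above; a line such as `2019/tcp` would mean the admin API had been published.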

## Residual / accepted risks

- **Container runs as root** — upstream `caddy:2` default (needs to bind low ports).
  Rationale: official image, no rootless variant wired yet; blast radius limited to the
  proxy container. Revisit: adopt a rootless Caddy image if upstream stabilises one.
- **Image pinned to a major tag, not a digest** — accepted for the stateless tier
  (ADR-011). Revisit if the role gains state.
- **ACME re-issuance vs Let's Encrypt rate limits** — losing `caddy_data` triggers
  re-issuance; rapid repeated rebuilds could hit LE rate limits. Acceptable for a handful
  of askari hostnames; noted in the backup rationale.
44
roles/reverse_proxy/VERIFY.md
Normal file
@@ -0,0 +1,44 @@
# Verify — reverse_proxy (Caddy)

`reverse_proxy` has no application UI of its own — it is the TLS terminator and router.
"Working" is verified at the HTTP/TLS layer (what `/verify-service` can drive with a
browser/HTTP client against the public hostnames it serves), not via an app login.

## Critical user journeys

1. **HTTPS serves with a valid cert** — request `https://<a host in
   reverse_proxy__routes>` (e.g. `https://test.askari.wingu.me`) → 200 with a valid
   Let's Encrypt certificate (trusted chain, CN/SAN matches the host, not expired).
2. **HTTP redirects to HTTPS** — request `http://<host>` → 308/301 redirect to the
   `https://` URL (Caddy's automatic-HTTPS redirect).
3. **A `respond` route returns its static body** — the test vhost returns its configured
   string with 200.
4. **An `upstream` route proxies through** — once a real upstream is registered (M4b
   NetBird), `https://<host>` reaches the upstream's response, not a Caddy error page.
5. **An unknown host is not served a valid cert** — a hostname not in
   `reverse_proxy__routes` does not get a certificate / is not routed (no accidental
   catch-all).
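
Journeys 1 and 2 can also be spot-checked from any machine with `curl`. The following is a minimal sketch under stated assumptions, not the `/verify-service` implementation: every function name is hypothetical, and the host argument is whatever `reverse_proxy__routes` declares. It leans on curl's default certificate verification, so an untrusted, mismatched or expired certificate makes the request fail instead of returning a status. It does not confirm that the issuer is Let's Encrypt, and it does not inspect the redirect target.

```shell
#!/bin/sh
# Hypothetical helpers for journeys 1-2; names are illustrative only.

# Print the HTTP status code for a URL without following redirects.
# curl verifies chain and hostname by default; on a TLS or connection
# failure it prints 000 and returns non-zero.
http_status() {
    curl --silent --show-error --output /dev/null \
         --max-time 10 --write-out '%{http_code}' "$1"
}

# Succeed if the status code is one of the redirects journey 2 accepts.
is_https_redirect() {
    case "$1" in
        301|308) return 0 ;;
        *)       return 1 ;;
    esac
}

# Journey 1: HTTPS answers 200 with a certificate curl trusts.
check_https() {
    [ "$(http_status "https://$1")" = "200" ]
}

# Journey 2: plain HTTP answers with a permanent redirect.
check_redirect() {
    is_https_redirect "$(http_status "http://$1")"
}
```

Possible usage, with the example host from journey 1: `check_https test.askari.wingu.me && check_redirect test.askari.wingu.me && echo ok`.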

## What good looks like

- The browser padlock shows a valid Let's Encrypt certificate for the requested host;
  the SAN matches and the chain is trusted.
- `http://` visibly becomes `https://` in the address bar.
- The expected body (static `respond` text, or the upstream's page) renders.

## Not browser-verifiable

- Certificate *renewal* (60-day cadence) — confirm out of band via `docker logs caddy`
  / Loki, not a single browser session.
- Behaviour when port 80 is blocked (HTTP-01 would fail) — an infrastructure/firewall
  check, route to the manual handoff.
- The deferred DNS-01 path for mesh/LAN-only services (Phase 2, ADR-024) — not yet live.

## Test data

Provisioned in the **staging** deploy (no Authentik user needed — there is no SSO on the
proxy itself):

- At least one `reverse_proxy__routes` entry with a public DNS A-record pointing at the
  staging host, so HTTP-01 can complete. A static `respond` route is enough for journeys
  1–3 and 5.
roles/reverse_proxy/defaults/main.yml
@@ -4,3 +4,25 @@ reverse_proxy__base_dir: /opt/services/reverse_proxy
reverse_proxy__acme_email: admin@example.test
reverse_proxy__routes: [] # each: {host: x, upstream: "svc:port"} OR {host: x, respond: "text"}
reverse_proxy__manage: true # set false in Molecule to render without Docker

# access__*/backup__* are the ADR-021/022 CROSS-ROLE conventions — shared field names that
# render ACCESS.md/BACKUP.md and drive /check-access · /check-backup. They intentionally do
# NOT carry the reverse_proxy__ prefix, so each is marked `# noqa: var-naming[no-role-prefix]`
# (ansible-lint's role-prefix rule has no per-prefix allowlist; keeping it enabled elsewhere).

# Operational-access record (ADR-021) — source of truth for ACCESS.md + /check-access.
access__service: reverse_proxy # noqa: var-naming[no-role-prefix]
access__compose_project: reverse_proxy # noqa: var-naming[no-role-prefix]
access__compose_path: "{{ reverse_proxy__base_dir }}/docker-compose.yml" # noqa: var-naming[no-role-prefix]
access__containers: [caddy] # noqa: var-naming[no-role-prefix]
access__log: # noqa: var-naming[no-role-prefix]
  loki_labels: { service: caddy } # intent; Loki/Alloy pipeline is ADR-018 (pending)
access__api: # noqa: var-naming[no-role-prefix]
  enabled: false
  reason: "Caddy admin API bound to container localhost :2019; never exposed (ADR-020 catalog owns ports)"

# Backup contract (ADR-022). Stateless: Caddy's /data holds only ACME account keys +
# issued certs, which are re-requested automatically on restart via HTTP-01 (no manual
# steps). Residual risk: Let's Encrypt rate limits on rapid repeated re-issuance.
backup__service: reverse_proxy # noqa: var-naming[no-role-prefix]
backup__state: false # noqa: var-naming[no-role-prefix]