Compare commits
7 commits
dd8c6825ba
...
1862b7a828
| Author | SHA1 | Date | |
|---|---|---|---|
| | 1862b7a828 | | |
| | b7e919d6b3 | | |
| | 9c169561d7 | | |
| | 1ee343dfca | | |
| | 50b6445bdd | | |
| | 456c27d12b | | |
| | d10f6de84b | | |
26 changed files with 501 additions and 39 deletions
|
|
@@ -249,6 +249,7 @@ Single-contributor, trunk-based (no merge requests / approval gates):
|
|||
| Operational access | `docs/decisions/021-operational-access.md` |
|
||||
| Backup & disaster recovery | `docs/decisions/022-backup.md` |
|
||||
| ADR structure & lifecycle | `docs/decisions/023-adr-structure.md` |
|
||||
| Reverse proxy (Caddy) | `docs/decisions/024-reverse-proxy.md` |
|
||||
| Adding a new role | `docs/runbooks/new-role.md` |
|
||||
| Adding a new host | `docs/runbooks/new-host.md` |
|
||||
| Rotating vault secrets | `docs/runbooks/rotate-secrets.md` |
|
||||
|
|
|
|||
|
|
@@ -30,7 +30,8 @@ _Last reviewed: 2026-06-14._
|
|||
| `make check` / `make deploy PLAYBOOK=<name>` | **Works.** First end-to-end run (applying `dev_env`) surfaced + fixed latent bugs: Makefile `PLAYBOOK` var collision (binary path vs playbook-name arg) meant the targets never ran; `ansible.cfg` referenced uninstalled community.general callbacks (now built-in `default` + `ansible.posix.profile_tasks`); `acl` package added so Ansible can `become_user` an unprivileged user. The make targets now function — though `site`/`base`/`docker_host` content is still incomplete (see below). |
|
||||
| `roles/public_dns/` + `playbooks/dns.yml` | **Built + applied.** Manages wingu.me at Gandi LiveDNS as code (`community.general.gandi_livedns`, PAT from `vault.gandi.pat`); record data, anti-spoof baseline (SPF `-all` + DMARC reject), and the Gandi-defaults purge are defined + unit-tested (`tests/test_public_dns.py`). **Applied to wingu.me (2026-06-14):** purged Gandi's 13 seeded defaults; zone now holds only the SPF + DMARC TXT records; idempotent re-run clean. No null-MX (Gandi rejects `0 .`) — the MX is removed, so no MX + no apex A = no mail. M1 of the roadmap. |
|
||||
| `ubongo` — physical control / AI-worker host (ADR-015) | **Built (partial).** Debian 13.5 on a Lenovo M70q (i3-10100T, 16 GB, 256 GB SSD; no disk encryption — accepted risk). Full toolchain installed + pinned to `fisi` (Docker 29.5.3, rbw 1.15.0, Claude Code 2.1.173, ansible-core 2.17.14 + molecule via `make setup`/`make collections`). Repo cloned under a dedicated `claude` user (docker group, no sudo). Vault works via rbw (offline-cache decryption verified). SSH key-only (password + root login disabled). In the production inventory `control` group at 10.20.10.151. **`dev_env` now applied here** (zsh/tmux/nvim for `sjat` + `claude`, via `playbooks/workstation.yml`). Managed as the operator account `sjat` (`group_vars/control` sets `ansible_user: sjat`), not the `ansible` service user `group_vars/all` assumes — ubongo has no bootstrapped `ansible` user. **Pending:** NetBird mesh enrollment (so SSH is LAN-only); full `base` hardening (only the `firewall` concern exists, and it is NOT applied here — applying default-deny with no mesh would lock out inbound SSH on the physical NIC); proper `ansible`-user bootstrap (currently managed as `sjat`); OPNsense DHCP reservation for 10.20.10.151 (MAC `88:a4:c2:e0:ee:da`); Terraform state backup (now relevant — the offsite tfstate exists). |
|
||||
| `askari` — off-site Hetzner VPS (ADR-007/016, M2) | **Built + applied.** Provisioned by Terraform (`environments/offsite`, `hetznercloud/hcloud`) as **cx23 / hel1 / Debian 13.5** (CAX11/ARM was out of stock EU-wide on 2026-06-14 → cx23 is same-spec x86, cheaper). cloud-init created the `ansible` user + passwordless sudo; a TF-managed Hetzner Cloud Firewall allows SSH only from ubongo's WAN (`91.226.145.80`). Reachable from ubongo (`ansible offsite_hosts -m ping` ✓), in the `offsite_hosts` inventory (generated `offsite.yml`), published at `askari.wingu.me` → `77.42.120.136`. **SSH-hardened + fail2ban (M3 `hardening` concern applied).** **Pending:** NetBird coordinator (M4), host firewall + mesh enrollment (M5), offsite tfstate backup (ADR-022). |
|
||||
| `askari` — off-site Hetzner VPS (ADR-007/016, M2) | **Built + applied.** Provisioned by Terraform (`environments/offsite`, `hetznercloud/hcloud`) as **cx23 / hel1 / Debian 13.5** (CAX11/ARM was out of stock EU-wide on 2026-06-14 → cx23 is same-spec x86, cheaper). cloud-init created the `ansible` user + passwordless sudo; a TF-managed Hetzner Cloud Firewall allows SSH only from ubongo's WAN (`91.226.145.80`). Reachable from ubongo (`ansible offsite_hosts -m ping` ✓), in the `offsite_hosts` inventory (generated `offsite.yml`), published at `askari.wingu.me` → `77.42.120.136`. **SSH-hardened + fail2ban (M3).** **Docker + Caddy reverse proxy (M4a):** `docker_host` + `reverse_proxy` (vanilla Caddy, HTTP-01) applied; `https://test.askari.wingu.me` serves a valid Let's Encrypt cert ✓ (firewall opens 80/443/3478). **Pending:** NetBird coordinator (M4b), host firewall + mesh enrollment (M5), offsite tfstate backup (ADR-022). |
|
||||
| `roles/docker_host/` (Docker engine) + `roles/reverse_proxy/` (Caddy, ADR-024) | **Built + applied** (askari, M4a). `docker_host` installs Docker CE + compose; `reverse_proxy` is boma's standard Caddy proxy (HTTP-01 for public hosts; routes from `reverse_proxy__routes`). DNS-01 for cluster mesh/LAN-only services is deferred to Phase 2 (caddy-dns/gandi unresolved — see FRICTION). |
|
||||
|
||||
## Scaffolded but empty — NOT implemented
|
||||
|
||||
|
|
|
|||
|
|
@@ -21,6 +21,20 @@ earning its keep.
|
|||
|
||||
_(append new raw signals here; the next kaizen review consumes them)_
|
||||
|
||||
- `[gotcha]` **Hetzner IPs are 403'd by Google's Go module infra; caddy-dns/gandi DNS-01
|
||||
didn't issue** (2026-06-14, M4a): building the custom Caddy image *on askari* failed —
|
||||
`proxy.golang.org` and `golang.org` both return **403 Forbidden** to the Hetzner IP
|
||||
(worked on ubongo). Reworked the role to build on the control node + `docker save`/`load`
|
||||
to the target. *Then* the `caddy-dns/gandi` DNS-01 plugin would not create the
|
||||
`_acme-challenge` TXT despite a token verified to (a) be in Caddy's env and (b) create
|
||||
TXT records via the Gandi API directly — no plugin error, just "propagation timeout,
|
||||
last error <nil>"; resolvers/timeout tuning didn't help. **Resolution:** askari is a
|
||||
*public* host, so switched it to **HTTP-01 + vanilla Caddy** (works, drops the custom
|
||||
image entirely). DNS-01 deferred to Phase 2 (cluster's mesh/LAN-only services) — the
|
||||
plugin + the Hetzner-build-block to be solved then. → lesson: prefer HTTP-01 wherever a
|
||||
host is publicly reachable; reserve DNS-01 (and its plugin/build complexity) for hosts
|
||||
that genuinely can't do HTTP-01. Both bugs surfaced only on the live host.
|
||||
|
||||
- `[gotcha]` **A tag on `include_tasks` does NOT reach the included tasks — need
|
||||
`apply: {tags:}`** (2026-06-14): M3's `base/tasks/main.yml` tagged the ssh/fail2ban
|
||||
`include_tasks` with `hardening`, but `make deploy … TAGS=hardening` ran *nothing*
|
||||
|
|
|
|||
|
|
@@ -109,8 +109,15 @@ active. Full CIS L1/L2, auditd, AppArmor, AIDE remain deferred to Phase 2 (TODO
|
|||
|
||||
### M4 · NetBird control plane on `askari` — first real service role
|
||||
|
||||
Built in two phases. **M4a (platform) — ✅ DONE:** Docker on askari + boma's standard
|
||||
**Caddy** reverse proxy (ADR-024), proven by `https://test.askari.wingu.me` serving a
|
||||
valid Let's Encrypt cert (HTTP-01 — DNS-01 deferred to Phase 2, see ADR-024/FRICTION).
|
||||
Firewall opened 80/443/3478. Spec/plan: `…2026-06-14-netbird-coordinator-m4-design.md` /
|
||||
`…2026-06-14-m4a-docker-caddy.md`. **M4b (next):** the `netbird` service role; read
NetBird's current self-host compose file at that point.
|
||||
|
||||
Deploy the NetBird stack (management / signal / relay / Coturn + dashboard) with the
|
||||
**embedded IdP** (ADR-016 — no Authentik dependency).
|
||||
**embedded IdP** (ADR-016 — no Authentik dependency), fronted by the now-proven Caddy.
|
||||
|
||||
- **First exercise of:** the service-role conventions (`SECURITY.md` / `VERIFY.md` /
|
||||
`ACCESS.md` / `BACKUP.md`), public **TLS / ACME**, and the **backup contract** —
|
||||
|
|
@@ -156,8 +163,8 @@ Canonical dependency order:
|
|||
3. **`docker_host`** — real Docker engine + Compose, daemon hardening, `nftables.d`
|
||||
container rules (currently a scaffold; ADR-004, ADR-020).
|
||||
4. **`dns` role** — render the internal zone from inventory (ADR-007).
|
||||
5. **Auth + reverse proxy** — Authentik + Traefik: the foundation every service sits
|
||||
behind with authentication (ADR-002).
|
||||
5. **Auth + reverse proxy** — Authentik + **Caddy** (ADR-024): the foundation every
|
||||
service sits behind with authentication (ADR-002).
|
||||
6. **Monitoring** — Loki + Grafana Alloy (logging, ADR-018) + Prometheus/exporters +
|
||||
Uptime Kuma; decide which alerts live where (TODO 3.6).
|
||||
7. **Service roles** — PhotoPrism, email, indexers, … (`docs/CAPABILITIES.md`); each
|
||||
|
|
|
|||
117
docs/decisions/024-reverse-proxy.md
Normal file
|
|
@@ -0,0 +1,117 @@
|
|||
# ADR-024 — Reverse proxy: Caddy (ACME — HTTP-01 public, DNS-01 private)
|
||||
|
||||
## Status
|
||||
|
||||
Accepted (2026-06-14). Amends the soft Traefik assumption carried by the roadmap
|
||||
(Phase-2 step 5) and ADR-017 prose; those are updated to read "Caddy (ADR-024)".
|
||||
|
||||
> **Cert method follows exposure (revised 2026-06-14, M4a).** The cert *challenge*
|
||||
> depends on whether a host is publicly reachable: **public hosts** (askari) use
|
||||
> **HTTP-01** with **vanilla Caddy** — simplest, no plugin; **mesh/LAN-only cluster
|
||||
> services** (no public A-record) need **DNS-01** (the M1 Gandi capability), since they
|
||||
> can't satisfy HTTP-01. The DNS-01 path is **deferred to Phase 2**: the `caddy-dns/gandi`
|
||||
> plugin did not create the ACME TXT records on askari despite a verified-valid token
|
||||
> (and Hetzner IPs are 403'd by Google's Go module infra, blocking the on-host custom
|
||||
> build) — both to be sorted when the cluster's private services actually need DNS-01.
|
||||
> The body below describes the DNS-01 design; askari (M4a) ships on HTTP-01.
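
The "cert method follows exposure" rule above can be sketched as a small decision function. This is only an illustration in Python, not part of the role: the `Host` type, the `acme_challenge` helper, and the example cluster hostname are all hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """Hypothetical description of a host for the purpose of this sketch."""
    name: str
    publicly_reachable: bool  # True when a public A record points at the host


def acme_challenge(host: Host) -> str:
    """Return the ACME challenge type implied by the rule above."""
    # Public hosts can answer the challenge on port 80, so HTTP-01 suffices.
    # Mesh/LAN-only hosts cannot, so they need DNS-01.
    return "http-01" if host.publicly_reachable else "dns-01"


askari = Host("askari.wingu.me", publicly_reachable=True)
# Hypothetical mesh-only service name, used purely as an example.
cluster_service = Host("photos.boma.example", publicly_reachable=False)

if __name__ == "__main__":
    for host in (askari, cluster_service):
        print(host.name, acme_challenge(host))
```

The point of the sketch is that the challenge type is derived from one property of the host (public reachability) rather than chosen per service.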
|
||||
|
||||
## Context
|
||||
|
||||
boma needs a reverse proxy to front its services with TLS. ADR-002 requires every
|
||||
service to sit behind a proxy with authentication before it is reachable; ADR-007/M1
|
||||
delivers a `*.boma.<domain>` wildcard cert via ACME DNS-01 against Gandi — the only
|
||||
viable cert path for mesh/LAN-only services that cannot satisfy HTTP-01 (no public
|
||||
A-record to point at).
|
||||
|
||||
The roadmap (Phase-2, step 5) and ADR-017 prose assumed **Traefik + Authentik** as the
|
||||
auth-and-proxy pair without an ADR ever pinning Traefik. On closer inspection:
|
||||
|
||||
- Traefik's headline feature is **dynamic Docker-label discovery** — it discovers and
|
||||
routes services automatically from container labels without any static config.
|
||||
- boma already renders *all* config from Ansible templates and the `group_vars` catalog
|
||||
(ADR-004). That makes dynamic label discovery a disadvantage: a service that is not in
|
||||
the catalog does not exist (CLAUDE.md), so any route that Traefik auto-discovers
|
||||
outside the catalog would be unaudited.
|
||||
- The first reverse-proxy instance is needed on `askari` for M4 (NetBird), a host where
|
||||
`docker_hosts` patterns are being established under off-site/VPS constraints, not a
|
||||
full Proxmox cluster with many services.
|
||||
|
||||
No production investment in Traefik config has been made; the decision can be made
|
||||
cleanly here.
|
||||
|
||||
## Decision
|
||||
|
||||
boma's reverse proxy is **Caddy**.
|
||||
|
||||
### 1. Rationale for Caddy over Traefik
|
||||
|
||||
1. Traefik's dynamic label discovery is wasted — boma renders config from the catalog;
|
||||
Caddy's static Caddyfile maps naturally to "render from templates" (ADR-004).
|
||||
2. Caddy's Caddyfile is simple to template with `ansible.builtin.template`; one file,
|
||||
one `ansible_managed` header, no side-channel label state.
|
||||
3. **Automatic HTTPS** via ACME DNS-01: the `caddy-dns/gandi` plugin satisfies the
|
||||
Gandi DNS-01 challenge, which is the only cert path for services with no public
|
||||
A-record (ADR-007/M1 wildcard strategy).
|
||||
4. Far simpler for a solo operator: no dashboard-as-a-service, no routing-rule DSL,
|
||||
no dynamic config files to reconcile.
|
||||
5. `forward_auth` to Authentik is a first-class Caddy directive — the planned
|
||||
Authentik auth story (ADR-002) is preserved without Traefik as the middleman.
|
||||
|
||||
### 2. Custom image
|
||||
|
||||
Caddy's official Docker image does not include third-party DNS plugins. The `caddy-dns/gandi`
|
||||
plugin must be compiled in via `xcaddy`. boma builds a custom image:
|
||||
|
||||
```
|
||||
FROM caddy:builder AS builder
|
||||
RUN xcaddy build --with github.com/caddy-dns/gandi
|
||||
|
||||
FROM caddy:latest
|
||||
COPY --from=builder /usr/bin/caddy /usr/bin/caddy
|
||||
```
|
||||
|
||||
This image is maintained as a boma artifact (Forgejo registry, pinned digest in the
|
||||
Compose template). It is the cost of the Gandi DNS-01 path — unavoidable regardless of
|
||||
proxy choice.
|
||||
|
||||
### 3. Deployment scope
|
||||
|
||||
The first Caddy instance fronts the NetBird stack on `askari` (M4). The pattern
|
||||
generalises to the Proxmox cluster in Phase 2 when services multiply.
|
||||
|
||||
### 4. Authentik integration (deferred)
|
||||
|
||||
`forward_auth` to Authentik is deferred to Phase 2 (when Authentik is deployed on the
|
||||
cluster). The Caddyfile template will carry a placeholder comment. No Traefik-Authentik
|
||||
middleware migration is required.
|
||||
|
||||
## Consequences
|
||||
|
||||
- **Roadmap Phase-2 step 5** is updated from "Authentik + Traefik" to "Authentik +
|
||||
Caddy (ADR-024)".
|
||||
- **ADR-017 prose** that mentioned Traefik is updated to read "Caddy (ADR-024)".
|
||||
- A custom Caddy image (`xcaddy` + `caddy-dns/gandi`) must be built, pushed to the
|
||||
Forgejo registry, and kept current (plugin + base image updates).
|
||||
- Caddyfile config is rendered by Ansible from `group_vars` — consistent with ADR-004
|
||||
and easier to review than distributed container labels.
|
||||
- `forward_auth` to Authentik is available when Authentik is deployed; no extra
|
||||
middleware layer required.
|
||||
- The `proxy` concern tag (already in `tests/tags.yml`) covers Caddy config tasks.
|
||||
|
||||
## What was ruled out
|
||||
|
||||
- **Traefik** — dynamic label discovery is a mismatch for boma's catalog-rendered
|
||||
config model (ADR-004); more complex for a solo operator; no prior investment to
|
||||
protect.
|
||||
- **nginx / HAProxy** — no built-in ACME; require a separate ACME client (certbot,
|
||||
acme.sh) adding operational surface; Caddy's integrated ACME is simpler.
|
||||
- **NetBird's bundled TLS** — NetBird's management UI can serve its own TLS, but that
|
||||
doesn't generalise; a real proxy separates concerns and applies to every service.
|
||||
|
||||
## Related
|
||||
|
||||
- ADR-002 — services behind a proxy with authentication (the requirement this satisfies).
|
||||
- ADR-004 — Docker & Compose model (template-rendered config, catalog-driven).
|
||||
- ADR-007 / M1 — Gandi DNS-01 ACME path (the TLS strategy Caddy implements).
|
||||
- ADR-016 — NetBird (M4 is the first deployment of this proxy).
|
||||
- ADR-017 — service-UI verification; forward_auth to Authentik is the future auth story.
|
||||
|
|
@@ -13,6 +13,9 @@ public_dns__records:
|
|||
# askari (off-site host, TF-provisioned M2) — public A so it's reachable by name +
|
||||
# for future ACME on *.askari.wingu.me. Mesh/LAN-only home services never appear here.
|
||||
- {record: askari, type: A, values: ["77.42.120.136"], ttl: 1800}
|
||||
# Wildcard for askari's services (test/netbird/...) → same host; Caddy obtains a
# per-hostname cert via HTTP-01 (M4a; the DNS-01 wildcard cert is deferred, ADR-024).
|
||||
- {record: "*.askari", type: A, values: ["77.42.120.136"], ttl: 1800}
|
||||
|
||||
# Absent — Gandi's auto-seeded defaults we don't want (purged once, idempotent thereafter).
|
||||
public_dns__absent:
|
||||
|
|
|
|||
6
inventories/production/group_vars/all/reverse_proxy.yml
Normal file
|
|
@@ -0,0 +1,6 @@
|
|||
---
|
||||
# Caddy reverse proxy on askari (ADR-024). Vanilla Caddy, ACME HTTP-01 (public host).
|
||||
reverse_proxy__acme_email: admin@wingu.me
|
||||
reverse_proxy__routes:
|
||||
- {host: test.askari.wingu.me, respond: "boma reverse proxy"}
|
||||
# M4b appends: {host: netbird.askari.wingu.me, upstream: "netbird-dashboard:80"}
|
||||
11
playbooks/offsite.yml
Normal file
|
|
@@ -0,0 +1,11 @@
|
|||
---
|
||||
# offsite.yml — off-site hosts (askari): Docker engine + the Caddy reverse proxy.
|
||||
# NetBird (M4b) appends to this play. Run: make deploy PLAYBOOK=offsite LIMIT=askari
|
||||
- name: Configure off-site hosts
|
||||
hosts: offsite_hosts
|
||||
become: true
|
||||
roles:
|
||||
- role: docker_host
|
||||
tags: [docker_host]
|
||||
- role: reverse_proxy
|
||||
tags: [reverse_proxy]
|
||||
|
|
@@ -16,3 +16,8 @@ collections:
|
|||
# LiveDNS). PAT auth requires >= 9.0.0.
|
||||
- name: community.general
|
||||
version: ">=9.0.0"
|
||||
|
||||
# community.docker: docker_compose_v2 + docker_container_exec (reverse_proxy role;
# vanilla Caddy, so no on-host image build).
|
||||
- name: community.docker
|
||||
version: ">=3.0.0"
|
||||
|
|
|
|||
|
|
@@ -1,25 +1,25 @@
|
|||
# docker_host
|
||||
|
||||
Docker engine + Compose runtime applied to every host in the `docker_hosts` group.
|
||||
Provides the container platform that the per-service roles (one service = one role,
|
||||
ADR-004) deploy their Compose stacks onto.
|
||||
Installs the Docker CE engine and the Compose plugin on every host in the
|
||||
`docker_hosts` group. Provides the container runtime that per-service roles
|
||||
(one service = one role, ADR-004) deploy their Compose stacks onto.
|
||||
|
||||
> **Status: scaffolded, not yet implemented.** This role has no tasks yet — applying it
|
||||
> is a no-op. It is wired into `playbooks/site.yml` so the full standard state is
|
||||
> expressed end-to-end, and so `make lint` covers it. See `STATUS.md`.
|
||||
## Scope
|
||||
|
||||
## Planned scope
|
||||
This role covers the **engine install only**. The following are deferred to Phase 2
|
||||
(when the Proxmox cluster and `base` host firewall exist):
|
||||
|
||||
- Install Docker engine + the Compose plugin, version-pinned (ADR-011).
|
||||
- Daemon hardening: `iptables: false` (the host `base` firewall owns nftables, ADR-020),
|
||||
log driver, `live-restore`, user-namespace remapping where practical (ADR-002).
|
||||
- Render container forward/NAT rules into `/etc/nftables.d/*.nft` — the include hook the
|
||||
`base` role's ruleset exposes (see `roles/base/README.md`).
|
||||
- Provide the runtime the service roles deploy their Compose files onto.
|
||||
- Daemon hardening (`iptables: false`, log driver, `live-restore`, userns remapping).
|
||||
- Rendering container forward/NAT rules into `/etc/nftables.d/*.nft` (the `base` role
|
||||
hook for container firewall integration, ADR-020).
|
||||
|
||||
## Variables
|
||||
|
||||
None yet. Placeholders will use the `docker_host__*` namespace (CLAUDE.md convention).
|
||||
| Variable | Default | Description |
|
||||
|---|---|---|
|
||||
| `docker_host__packages` | `[docker-ce, docker-ce-cli, containerd.io, docker-compose-plugin]` | APT packages installed from the Docker CE repository |
|
||||
|
||||
All variables use the `docker_host__` double-underscore namespace (CLAUDE.md convention).
|
||||
|
||||
## Example
|
||||
|
||||
|
|
@@ -31,4 +31,14 @@ None yet. Placeholders will use the `docker_host__*` namespace (CLAUDE.md conven
|
|||
tags: [docker_host]
|
||||
```
|
||||
|
||||
See ADR-004 (`docs/decisions/004-docker-model.md`) for the Docker & Compose model.
|
||||
## Tags
|
||||
|
||||
All tasks carry the `packages` concern tag (APT package install, ADR-019).
|
||||
|
||||
## Related
|
||||
|
||||
- ADR-004 (`docs/decisions/004-docker-model.md`) — Docker & Compose model.
|
||||
- ADR-020 (`docs/decisions/020-firewall.md`) — daemon hardening + `nftables.d`
|
||||
integration (deferred to Phase 2).
|
||||
- ADR-011 (`docs/decisions/011-update-management.md`) — version pinning policy
|
||||
(future: pin Docker CE version explicitly).
|
||||
|
|
|
|||
|
|
@@ -1 +1,8 @@
|
|||
---
|
||||
# Docker engine install (ADR-004). Cluster-specific daemon hardening + nftables.d
|
||||
# integration are deferred to when the cluster + host firewall exist.
|
||||
docker_host__packages:
|
||||
- docker-ce
|
||||
- docker-ce-cli
|
||||
- containerd.io
|
||||
- docker-compose-plugin
|
||||
|
|
|
|||
|
|
@@ -4,8 +4,14 @@
|
|||
gather_facts: true
|
||||
|
||||
tasks:
|
||||
- name: Add verification tasks here
|
||||
ansible.builtin.assert:
|
||||
that: true
|
||||
msg: "Replace this with real assertions"
|
||||
- name: Verify docker binary is present
|
||||
ansible.builtin.command: docker --version
|
||||
register: docker_version_output
|
||||
changed_when: false
|
||||
tags: [verify]
|
||||
|
||||
- name: Assert docker --version succeeded
|
||||
ansible.builtin.assert:
|
||||
that: docker_version_output.rc == 0
|
||||
msg: "docker --version failed — Docker was not installed correctly"
|
||||
tags: [verify]
|
||||
|
|
|
|||
|
|
@@ -1,13 +1,39 @@
|
|||
---
|
||||
# docker_host — Docker engine + Compose runtime for hosts in the docker_hosts group.
|
||||
#
|
||||
# SCAFFOLDED, NOT YET IMPLEMENTED. This role is referenced by playbooks/site.yml so the
|
||||
# full standard state is expressed end-to-end, but it has no tasks yet — applying it is a
|
||||
# no-op. See STATUS.md ("Scaffolded but empty") and ADR-004 (Docker & Compose model).
|
||||
#
|
||||
# Planned scope (ADR-002/004/020):
|
||||
# - install Docker engine + compose plugin (version-pinned, per ADR-011)
|
||||
# - daemon hardening: iptables:false (host nftables owns the firewall, ADR-020),
|
||||
# log-driver, live-restore, userns where practical
|
||||
# - render container forward/NAT rules into /etc/nftables.d/*.nft (the base-role hook)
|
||||
# - deploy per-service Compose stacks from the service roles (one service = one role)
|
||||
- name: Install prerequisites
|
||||
ansible.builtin.apt:
|
||||
name: [ca-certificates, curl, gnupg]
|
||||
state: present
|
||||
update_cache: true
|
||||
tags: [packages]
|
||||
|
||||
- name: Ensure /etc/apt/keyrings exists
|
||||
ansible.builtin.file:
|
||||
path: /etc/apt/keyrings
|
||||
state: directory
|
||||
mode: "0755"
|
||||
tags: [packages]
|
||||
|
||||
- name: Add Docker's APT GPG key
|
||||
ansible.builtin.get_url:
|
||||
url: https://download.docker.com/linux/debian/gpg
|
||||
dest: /etc/apt/keyrings/docker.asc
|
||||
mode: "0644"
|
||||
tags: [packages]
|
||||
|
||||
- name: Add the Docker APT repository
|
||||
ansible.builtin.apt_repository:
|
||||
repo: >-
|
||||
deb [arch={{ {'x86_64': 'amd64', 'aarch64': 'arm64'}.get(ansible_architecture, ansible_architecture) }}
|
||||
signed-by=/etc/apt/keyrings/docker.asc]
|
||||
https://download.docker.com/linux/debian
|
||||
{{ ansible_distribution_release }} stable
|
||||
filename: docker
|
||||
state: present
|
||||
tags: [packages]
|
||||
|
||||
- name: Install Docker engine + compose plugin
|
||||
ansible.builtin.apt:
|
||||
name: "{{ docker_host__packages }}"
|
||||
state: present
|
||||
update_cache: true
|
||||
tags: [packages]
|
||||
|
|
|
|||
62
roles/reverse_proxy/README.md
Normal file
|
|
@@ -0,0 +1,62 @@
|
|||
# reverse_proxy
|
||||
|
||||
Boma's standard Caddy reverse proxy (ADR-024). Runs on `askari` (the off-site
|
||||
Hetzner host) and terminates TLS for all public-facing services via ACME HTTP-01.
|
||||
Uses the official `caddy:2` image — no custom build, no DNS plugin, no token required.
|
||||
|
||||
## How TLS works
|
||||
|
||||
Caddy obtains per-hostname certificates using the ACME HTTP-01 challenge. Port 80
|
||||
must be reachable from the internet for the challenge to succeed. Each `host` in
|
||||
`reverse_proxy__routes` gets its own certificate automatically.
|
||||
|
||||
> **DNS-01 (for mesh/LAN-only cluster services) is deferred to Phase 2.** The
|
||||
> `caddy-dns/gandi` plugin failed to issue certificates during M4a and needs
|
||||
> investigation before it can be used.
|
||||
|
||||
## Route catalog — `reverse_proxy__routes`
|
||||
|
||||
Services register themselves as routes by appending an entry to
|
||||
`reverse_proxy__routes` in `group_vars/all/reverse_proxy.yml`:
|
||||
|
||||
```yaml
|
||||
reverse_proxy__routes:
|
||||
- {host: app.askari.wingu.me, upstream: "app:8080"}
|
||||
- {host: health.askari.wingu.me, respond: "ok"}
|
||||
```
|
||||
|
||||
Each entry renders a separate server block in the Caddyfile:
|
||||
|
||||
```
|
||||
app.askari.wingu.me {
|
||||
reverse_proxy app:8080
|
||||
}
|
||||
|
||||
health.askari.wingu.me {
|
||||
respond "ok" 200
|
||||
}
|
||||
```
|
||||
|
||||
Use `upstream` to proxy to a Docker service, or `respond` to return a static string.
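
To make the mapping from route entries to server blocks concrete, here is a minimal Python sketch of the same branching the `Caddyfile.j2` template performs (an `upstream` entry becomes `reverse_proxy`, anything else becomes `respond`, defaulting to `boma`). The function name `render_caddyfile` and the exact whitespace are assumptions made for illustration; the role itself renders the file with `ansible.builtin.template`.

```python
def render_caddyfile(acme_email: str, routes: list[dict]) -> str:
    """Render a Caddyfile string from route entries (illustrative only)."""
    # Global options block carrying the ACME registration email.
    lines = ["{", f"\temail {acme_email}", "}"]
    for route in routes:
        lines.append("")
        lines.append(f"{route['host']} {{")
        if "upstream" in route:
            # Proxy to a container reachable on the shared Docker network.
            lines.append(f"\treverse_proxy {route['upstream']}")
        else:
            # Static response; mirrors the template's default('boma') fallback.
            text = route.get("respond", "boma")
            lines.append(f'\trespond "{text}" 200')
        lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_caddyfile(
        "admin@example.test",
        [
            {"host": "app.example.test", "upstream": "app:80"},
            {"host": "t.example.test", "respond": "ok"},
        ],
    ))
```

The example routes reuse the hostnames from the Molecule converge play, so the strings checked in `verify.yml` (`reverse_proxy app:80`, `respond "ok" 200`) appear in the sketch's output as well.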
|
||||
|
||||
## Variables
|
||||
|
||||
| Variable | Default | Description |
|
||||
|---|---|---|
|
||||
| `reverse_proxy__base_dir` | `/opt/services/reverse_proxy` | Working directory for Compose project |
|
||||
| `reverse_proxy__acme_email` | `admin@example.test` | ACME registration email |
|
||||
| `reverse_proxy__routes` | `[]` | List of `{host, upstream}` or `{host, respond}` entries |
|
||||
| `reverse_proxy__manage` | `true` | Set `false` in Molecule to skip Docker tasks |
|
||||
|
||||
Production overrides live in
|
||||
`inventories/production/group_vars/all/reverse_proxy.yml`.
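
Because each route is expected to carry a `host` plus either `upstream` or `respond`, a small pre-flight check can catch malformed entries before a deploy. The following Python sketch is hypothetical and not part of the role; `route_problems` is a name invented here. It is stricter than the template, which silently prefers `upstream` when both keys are set and falls back to `respond "boma"` when neither is.

```python
def route_problems(routes: list[dict]) -> list[str]:
    """Return human-readable problems found in a route list (empty if none)."""
    problems = []
    seen_hosts = set()
    for index, route in enumerate(routes):
        host = route.get("host")
        if not host:
            problems.append(f"route {index}: missing 'host'")
        elif host in seen_hosts:
            problems.append(f"route {index}: duplicate host {host!r}")
        else:
            seen_hosts.add(host)
        # Exactly one of the two keys should be present.
        if ("upstream" in route) == ("respond" in route):
            problems.append(
                f"route {index}: set exactly one of 'upstream' or 'respond'"
            )
    return problems


if __name__ == "__main__":
    example = [
        {"host": "test.askari.wingu.me", "respond": "boma reverse proxy"},
        {"host": "test.askari.wingu.me"},  # duplicate host, no action key
    ]
    for problem in route_problems(example):
        print(problem)
```

A check like this could run in a unit test next to the inventory, in the same spirit as the existing DNS record tests.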
|
||||
|
||||
## `reverse_proxy__manage` toggle
|
||||
|
||||
Docker operations (`docker compose up`) are gated on `reverse_proxy__manage | bool`.
|
||||
Set it to `false` in Molecule so the role can be tested (template rendering, directory
|
||||
creation) without a Docker daemon.
|
||||
|
||||
## Secrets
|
||||
|
||||
None. HTTP-01 requires no credentials.
|
||||
6
roles/reverse_proxy/defaults/main.yml
Normal file
|
|
@@ -0,0 +1,6 @@
|
|||
---
|
||||
# Caddy reverse proxy (ADR-024). Vanilla Caddy; TLS via ACME HTTP-01 (public hosts).
|
||||
reverse_proxy__base_dir: /opt/services/reverse_proxy
|
||||
reverse_proxy__acme_email: admin@example.test
|
||||
reverse_proxy__routes: [] # each: {host: x, upstream: "svc:port"} OR {host: x, respond: "text"}
|
||||
reverse_proxy__manage: true # set false in Molecule to render without Docker
|
||||
7
roles/reverse_proxy/handlers/main.yml
Normal file
|
|
@@ -0,0 +1,7 @@
|
|||
---
|
||||
- name: Reload caddy
|
||||
listen: reload caddy
|
||||
community.docker.docker_container_exec:
|
||||
container: caddy
|
||||
command: caddy reload --config /etc/caddy/Caddyfile
|
||||
when: reverse_proxy__manage | bool
|
||||
13
roles/reverse_proxy/meta/main.yml
Normal file
|
|
@@ -0,0 +1,13 @@
|
|||
---
|
||||
galaxy_info:
|
||||
author: sjat
|
||||
description: >-
|
||||
Caddy reverse proxy with ACME HTTP-01 TLS (ADR-024). Runs the official
caddy:2 image and manages it via Docker Compose.
|
||||
license: MIT
|
||||
min_ansible_version: "2.17"
|
||||
platforms:
|
||||
- name: Debian
|
||||
versions:
|
||||
- trixie
|
||||
dependencies: []
|
||||
16
roles/reverse_proxy/molecule/default/converge.yml
Normal file
|
|
@@ -0,0 +1,16 @@
|
|||
---
|
||||
- name: Converge
|
||||
hosts: all
|
||||
gather_facts: true
|
||||
|
||||
vars:
|
||||
reverse_proxy__manage: false
|
||||
reverse_proxy__acme_email: admin@example.test
|
||||
reverse_proxy__routes:
|
||||
- host: app.example.test
|
||||
upstream: "app:80"
|
||||
- host: t.example.test
|
||||
respond: "ok"
|
||||
|
||||
roles:
|
||||
- role: reverse_proxy
|
||||
31
roles/reverse_proxy/molecule/default/molecule.yml
Normal file
|
|
@@ -0,0 +1,31 @@
|
|||
---
|
||||
dependency:
|
||||
name: galaxy
|
||||
options:
|
||||
requirements-file: ../../requirements.yml
|
||||
|
||||
driver:
|
||||
name: docker
|
||||
|
||||
platforms:
|
||||
- name: instance
|
||||
# Project-owned image built from .docker/molecule-debian13/Dockerfile
|
||||
# and hosted in the Forgejo container registry.
|
||||
# Build/push with: make molecule-image / make molecule-image-push
|
||||
image: forgejo.nyumbani.baobab.band/sjat/molecule-debian13:latest
|
||||
pre_build_image: true
|
||||
privileged: true # required for systemd
|
||||
cgroupns_mode: host
|
||||
volumes:
|
||||
- /sys/fs/cgroup:/sys/fs/cgroup:rw
|
||||
command: /lib/systemd/systemd
|
||||
|
||||
provisioner:
|
||||
name: ansible
|
||||
inventory:
|
||||
host_vars:
|
||||
instance:
|
||||
ansible_user: root
|
||||
|
||||
verifier:
|
||||
name: ansible
|
||||
22
roles/reverse_proxy/molecule/default/verify.yml
Normal file
|
|
@@ -0,0 +1,22 @@
|
|||
---
|
||||
- name: Verify
|
||||
hosts: all
|
||||
gather_facts: false
|
||||
|
||||
tasks:
|
||||
- name: Slurp the rendered Caddyfile
|
||||
ansible.builtin.slurp:
|
||||
src: /opt/services/reverse_proxy/Caddyfile
|
||||
register: _caddyfile
|
||||
tags: [verify]
|
||||
|
||||
- name: Assert Caddyfile exists and contains expected content
|
||||
ansible.builtin.assert:
|
||||
that:
|
||||
- _caddyfile.content | b64decode | length > 0
|
||||
- "'app.example.test' in (_caddyfile.content | b64decode)"
|
||||
- "'reverse_proxy app:80' in (_caddyfile.content | b64decode)"
|
||||
- "'respond \"ok\" 200' in (_caddyfile.content | b64decode)"
|
||||
fail_msg: "Caddyfile is missing expected content"
|
||||
success_msg: "Caddyfile rendered correctly"
|
||||
tags: [verify]
|
||||
29
roles/reverse_proxy/tasks/main.yml
Normal file
|
|
@@ -0,0 +1,29 @@
|
|||
---
|
||||
- name: Ensure the service directory exists
|
||||
ansible.builtin.file:
|
||||
path: "{{ reverse_proxy__base_dir }}"
|
||||
state: directory
|
||||
mode: "0750"
|
||||
tags: [config]
|
||||
|
||||
- name: Render the Caddyfile
|
||||
ansible.builtin.template:
|
||||
src: Caddyfile.j2
|
||||
dest: "{{ reverse_proxy__base_dir }}/Caddyfile"
|
||||
mode: "0644"
|
||||
notify: reload caddy
|
||||
tags: [config]
|
||||
|
||||
- name: Render the compose file
|
||||
ansible.builtin.template:
|
||||
src: docker-compose.yml.j2
|
||||
dest: "{{ reverse_proxy__base_dir }}/docker-compose.yml"
|
||||
mode: "0644"
|
||||
tags: [config]
|
||||
|
||||
- name: Bring the reverse proxy up
|
||||
community.docker.docker_compose_v2:
|
||||
project_src: "{{ reverse_proxy__base_dir }}"
|
||||
state: present
|
||||
when: reverse_proxy__manage | bool
|
||||
tags: [deploy]
|
||||
12
roles/reverse_proxy/templates/Caddyfile.j2
Normal file
|
|
@@ -0,0 +1,12 @@
|
|||
{
|
||||
email {{ reverse_proxy__acme_email }}
|
||||
}
|
||||
{% for r in reverse_proxy__routes %}
|
||||
{{ r.host }} {
|
||||
{% if r.upstream is defined %}
|
||||
reverse_proxy {{ r.upstream }}
|
||||
{% else %}
|
||||
respond "{{ r.respond | default('boma') }}" 200
|
||||
{% endif %}
|
||||
}
|
||||
{% endfor %}
|
||||
22
roles/reverse_proxy/templates/docker-compose.yml.j2
Normal file
|
|
@@ -0,0 +1,22 @@
|
|||
services:
|
||||
caddy:
|
||||
image: caddy:2
|
||||
container_name: caddy
|
||||
restart: unless-stopped
|
||||
ports:
|
||||
- "80:80"
|
||||
- "443:443"
|
||||
volumes:
|
||||
- ./Caddyfile:/etc/caddy/Caddyfile:ro
|
||||
- caddy_data:/data
|
||||
- caddy_config:/config
|
||||
networks:
|
||||
- boma
|
||||
|
||||
volumes:
|
||||
caddy_data:
|
||||
caddy_config:
|
||||
|
||||
networks:
|
||||
boma:
|
||||
name: boma
|
||||
|
|
@@ -5,13 +5,14 @@
|
|||
module "askari" {
|
||||
source = "../../modules/hetzner_vm"
|
||||
|
||||
name = "askari"
|
||||
server_type = "cx23" # x86, 2 vCPU / 4 GB / 40 GB (CAX11/ARM was out of stock in
|
||||
name = "askari"
|
||||
server_type = "cx23" # x86, 2 vCPU / 4 GB / 40 GB (CAX11/ARM was out of stock in
|
||||
# every EU location 2026-06-14; cx23 is same-spec + cheaper)
|
||||
location = "hel1" # Helsinki
|
||||
image = "debian-13"
|
||||
ansible_ssh_pubkey = var.ansible_ssh_pubkey
|
||||
ssh_admin_cidrs = var.ssh_admin_cidrs
|
||||
public_web = true # Caddy 80/443 + NetBird 3478 (M4)
|
||||
labels = {
|
||||
env = "offsite"
|
||||
group = "offsite_hosts"
|
||||
|
|
|
|||
|
|
@@ -26,14 +26,35 @@ resource "hcloud_ssh_key" "ansible" {
|
|||
resource "hcloud_firewall" "this" {
|
||||
name = "${var.name}-fw"
|
||||
|
||||
# SSH from the control node only. NetBird ports (UDP 3478, TCP 80/443) are added
|
||||
# in M4 when the coordinator deploys (ADR-020); host nftables stays catalog-driven.
|
||||
# SSH from the control node only.
|
||||
rule {
|
||||
direction = "in"
|
||||
protocol = "tcp"
|
||||
port = "22"
|
||||
source_ips = var.ssh_admin_cidrs
|
||||
}
|
||||
|
||||
# Public web (Caddy 80/443) + NetBird STUN/TURN (3478/udp) — only when public_web
|
||||
# (ADR-024, M4). Host nftables stays catalog-driven (ADR-020).
|
||||
dynamic "rule" {
|
||||
for_each = var.public_web ? ["80", "443"] : []
|
||||
content {
|
||||
direction = "in"
|
||||
protocol = "tcp"
|
||||
port = rule.value
|
||||
source_ips = ["0.0.0.0/0", "::/0"]
|
||||
}
|
||||
}
|
||||
|
||||
dynamic "rule" {
|
||||
for_each = var.public_web ? ["3478"] : []
|
||||
content {
|
||||
direction = "in"
|
||||
protocol = "udp"
|
||||
port = rule.value
|
||||
source_ips = ["0.0.0.0/0", "::/0"]
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
resource "hcloud_server" "this" {
|
||||
|
|
|
|||
|
|
@@ -28,6 +28,12 @@ variable "ssh_admin_cidrs" {
|
|||
type = list(string)
|
||||
}
|
||||
|
||||
variable "public_web" {
|
||||
description = "Open the public web/NetBird ports (80/443 TCP, 3478 UDP) to the internet"
|
||||
type = bool
|
||||
default = false
|
||||
}
|
||||
|
||||
variable "labels" {
|
||||
description = "Hetzner resource labels (metadata only)"
|
||||
type = map(string)
|
||||
|
|
|
|||