Compare commits
No commits in common. "1862b7a8286af0d3d9455e8dc5aeae0c013a8668" and "dd8c6825badb16d8519e59b77074de47ee7bb88a" have entirely different histories.
1862b7a828...dd8c6825ba
26 changed files with 38 additions and 500 deletions
@@ -249,7 +249,6 @@ Single-contributor, trunk-based (no merge requests / approval gates):
 | Operational access | `docs/decisions/021-operational-access.md` |
 | Backup & disaster recovery | `docs/decisions/022-backup.md` |
 | ADR structure & lifecycle | `docs/decisions/023-adr-structure.md` |
-| Reverse proxy (Caddy) | `docs/decisions/024-reverse-proxy.md` |
 | Adding a new role | `docs/runbooks/new-role.md` |
 | Adding a new host | `docs/runbooks/new-host.md` |
 | Rotating vault secrets | `docs/runbooks/rotate-secrets.md` |
@@ -30,8 +30,7 @@ _Last reviewed: 2026-06-14._
 | `make check` / `make deploy PLAYBOOK=<name>` | **Works.** First end-to-end run (applying `dev_env`) surfaced + fixed latent bugs: Makefile `PLAYBOOK` var collision (binary path vs playbook-name arg) meant the targets never ran; `ansible.cfg` referenced uninstalled community.general callbacks (now built-in `default` + `ansible.posix.profile_tasks`); `acl` package added so Ansible can `become_user` an unprivileged user. The make targets now function — though `site`/`base`/`docker_host` content is still incomplete (see below). |
 | `roles/public_dns/` + `playbooks/dns.yml` | **Built + applied.** Manages wingu.me at Gandi LiveDNS as code (`community.general.gandi_livedns`, PAT from `vault.gandi.pat`); record data, anti-spoof baseline (SPF `-all` + DMARC reject), and the Gandi-defaults purge are defined + unit-tested (`tests/test_public_dns.py`). **Applied to wingu.me (2026-06-14):** purged Gandi's 13 seeded defaults; zone now holds only the SPF + DMARC TXT records; idempotent re-run clean. No null-MX (Gandi rejects `0 .`) — the MX is removed, so no MX + no apex A = no mail. M1 of the roadmap. |
 | `ubongo` — physical control / AI-worker host (ADR-015) | **Built (partial).** Debian 13.5 on a Lenovo M70q (i3-10100T, 16 GB, 256 GB SSD; no disk encryption — accepted risk). Full toolchain installed + pinned to `fisi` (Docker 29.5.3, rbw 1.15.0, Claude Code 2.1.173, ansible-core 2.17.14 + molecule via `make setup`/`make collections`). Repo cloned under a dedicated `claude` user (docker group, no sudo). Vault works via rbw (offline-cache decryption verified). SSH key-only (password + root login disabled). In the production inventory `control` group at 10.20.10.151. **`dev_env` now applied here** (zsh/tmux/nvim for `sjat` + `claude`, via `playbooks/workstation.yml`). Managed as the operator account `sjat` (`group_vars/control` sets `ansible_user: sjat`), not the `ansible` service user `group_vars/all` assumes — ubongo has no bootstrapped `ansible` user. **Pending:** NetBird mesh enrollment (so SSH is LAN-only); full `base` hardening (only the `firewall` concern exists, and it is NOT applied here — applying default-deny with no mesh would lock out inbound SSH on the physical NIC); proper `ansible`-user bootstrap (currently managed as `sjat`); OPNsense DHCP reservation for 10.20.10.151 (MAC `88:a4:c2:e0:ee:da`); Terraform state backup (now relevant — the offsite tfstate exists). |
-| `askari` — off-site Hetzner VPS (ADR-007/016, M2) | **Built + applied.** Provisioned by Terraform (`environments/offsite`, `hetznercloud/hcloud`) as **cx23 / hel1 / Debian 13.5** (CAX11/ARM was out of stock EU-wide on 2026-06-14 → cx23 is same-spec x86, cheaper). cloud-init created the `ansible` user + passwordless sudo; a TF-managed Hetzner Cloud Firewall allows SSH only from ubongo's WAN (`91.226.145.80`). Reachable from ubongo (`ansible offsite_hosts -m ping` ✓), in the `offsite_hosts` inventory (generated `offsite.yml`), published at `askari.wingu.me` → `77.42.120.136`. **SSH-hardened + fail2ban (M3).** **Docker + Caddy reverse proxy (M4a):** `docker_host` + `reverse_proxy` (vanilla Caddy, HTTP-01) applied; `https://test.askari.wingu.me` serves a valid Let's Encrypt cert ✓ (firewall opens 80/443/3478). **Pending:** NetBird coordinator (M4b), host firewall + mesh enrollment (M5), offsite tfstate backup (ADR-022). |
-| `roles/docker_host/` (Docker engine) + `roles/reverse_proxy/` (Caddy, ADR-024) | **Built + applied** (askari, M4a). `docker_host` installs Docker CE + compose; `reverse_proxy` is boma's standard Caddy proxy (HTTP-01 for public hosts; routes from `reverse_proxy__routes`). DNS-01 for cluster mesh/LAN-only services is deferred to Phase 2 (caddy-dns/gandi unresolved — see FRICTION). |
+| `askari` — off-site Hetzner VPS (ADR-007/016, M2) | **Built + applied.** Provisioned by Terraform (`environments/offsite`, `hetznercloud/hcloud`) as **cx23 / hel1 / Debian 13.5** (CAX11/ARM was out of stock EU-wide on 2026-06-14 → cx23 is same-spec x86, cheaper). cloud-init created the `ansible` user + passwordless sudo; a TF-managed Hetzner Cloud Firewall allows SSH only from ubongo's WAN (`91.226.145.80`). Reachable from ubongo (`ansible offsite_hosts -m ping` ✓), in the `offsite_hosts` inventory (generated `offsite.yml`), published at `askari.wingu.me` → `77.42.120.136`. **SSH-hardened + fail2ban (M3 `hardening` concern applied).** **Pending:** NetBird coordinator (M4), host firewall + mesh enrollment (M5), offsite tfstate backup (ADR-022). |
 
 ## Scaffolded but empty — NOT implemented
 
@@ -21,20 +21,6 @@ earning its keep.
 
 _(append new raw signals here; the next kaizen review consumes them)_
 
-- `[gotcha]` **Hetzner IPs are 403'd by Google's Go module infra; caddy-dns/gandi DNS-01
-didn't issue** (2026-06-14, M4a): building the custom Caddy image *on askari* failed —
-`proxy.golang.org` and `golang.org` both return **403 Forbidden** to the Hetzner IP
-(worked on ubongo). Reworked the role to build on the control node + `docker save`/`load`
-to the target. *Then* the `caddy-dns/gandi` DNS-01 plugin would not create the
-`_acme-challenge` TXT despite a token verified to (a) be in Caddy's env and (b) create
-TXT records via the Gandi API directly — no plugin error, just "propagation timeout,
-last error <nil>"; resolvers/timeout tuning didn't help. **Resolution:** askari is a
-*public* host, so switched it to **HTTP-01 + vanilla Caddy** (works, drops the custom
-image entirely). DNS-01 deferred to Phase 2 (cluster's mesh/LAN-only services) — the
-plugin + the Hetzner-build-block to be solved then. → lesson: prefer HTTP-01 wherever a
-host is publicly reachable; reserve DNS-01 (and its plugin/build complexity) for hosts
-that genuinely can't do HTTP-01. Both bugs surfaced only on the live host.
-
 - `[gotcha]` **A tag on `include_tasks` does NOT reach the included tasks — need
 `apply: {tags:}`** (2026-06-14): M3's `base/tasks/main.yml` tagged the ssh/fail2ban
 `include_tasks` with `hardening`, but `make deploy … TAGS=hardening` ran *nothing*
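Review note on the `include_tasks` gotcha in the hunk above: a tag placed on a dynamic include selects only the include statement itself, and the `apply` keyword is what passes tags down to the included tasks. A minimal sketch, assuming a hypothetical `ssh.yml` task file (the actual file names under `base/tasks/` are not shown in this diff):

```yaml
# Sketch only: ssh.yml is a hypothetical file name, not taken from the repo.
- name: Include SSH hardening tasks
  ansible.builtin.include_tasks:
    file: ssh.yml
    apply:
      tags: [hardening]   # attached to every task inside ssh.yml
  tags: [hardening]       # selects the include itself under --tags hardening
```

Both tag lines are needed: without the outer tag the include is skipped under `--tags hardening`, and without `apply` the included tasks carry no tag and are filtered out.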
@@ -109,15 +109,8 @@ active. Full CIS L1/L2, auditd, AppArmor, AIDE remain deferred to Phase 2 (TODO
 
 ### M4 · NetBird control plane on `askari` — first real service role
 
-Built in two phases. **M4a (platform) — ✅ DONE:** Docker on askari + boma's standard
-**Caddy** reverse proxy (ADR-024), proven by `https://test.askari.wingu.me` serving a
-valid Let's Encrypt cert (HTTP-01 — DNS-01 deferred to Phase 2, see ADR-024/FRICTION).
-Firewall opened 80/443/3478. Spec/plan: `…2026-06-14-netbird-coordinator-m4-design.md` /
-`…2026-06-14-m4a-docker-caddy.md`. **M4b (next):** the `netbird` service role — read
-NetBird's current self-host compose then.
-
 Deploy the NetBird stack (management / signal / relay / Coturn + dashboard) with the
-**embedded IdP** (ADR-016 — no Authentik dependency), fronted by the now-proven Caddy.
+**embedded IdP** (ADR-016 — no Authentik dependency).
 
 - **First exercise of:** the service-role conventions (`SECURITY.md` / `VERIFY.md` /
 `ACCESS.md` / `BACKUP.md`), public **TLS / ACME**, and the **backup contract** —
@@ -163,8 +156,8 @@ Canonical dependency order:
 3. **`docker_host`** — real Docker engine + Compose, daemon hardening, `nftables.d`
 container rules (currently a scaffold; ADR-004, ADR-020).
 4. **`dns` role** — render the internal zone from inventory (ADR-007).
-5. **Auth + reverse proxy** — Authentik + **Caddy** (ADR-024): the foundation every
-service sits behind with authentication (ADR-002).
+5. **Auth + reverse proxy** — Authentik + Traefik: the foundation every service sits
+behind with authentication (ADR-002).
 6. **Monitoring** — Loki + Grafana Alloy (logging, ADR-018) + Prometheus/exporters +
 Uptime Kuma; decide which alerts live where (TODO 3.6).
 7. **Service roles** — PhotoPrism, email, indexers, … (`docs/CAPABILITIES.md`); each
@@ -1,117 +0,0 @@
-# ADR-024 — Reverse proxy: Caddy (ACME — HTTP-01 public, DNS-01 private)
-
-## Status
-
-Accepted (2026-06-14). Amends the soft Traefik assumption carried by the roadmap
-(Phase-2 step 5) and ADR-017 prose; those are updated to read "Caddy (ADR-024)".
-
-> **Cert method follows exposure (revised 2026-06-14, M4a).** The cert *challenge*
-> depends on whether a host is publicly reachable: **public hosts** (askari) use
-> **HTTP-01** with **vanilla Caddy** — simplest, no plugin; **mesh/LAN-only cluster
-> services** (no public A-record) need **DNS-01** (the M1 Gandi capability), since they
-> can't satisfy HTTP-01. The DNS-01 path is **deferred to Phase 2**: the `caddy-dns/gandi`
-> plugin did not create the ACME TXT records on askari despite a verified-valid token
-> (and Hetzner IPs are 403'd by Google's Go module infra, blocking the on-host custom
-> build) — both to be sorted when the cluster's private services actually need DNS-01.
-> The body below describes the DNS-01 design; askari (M4a) ships on HTTP-01.
-
-## Context
-
-boma needs a reverse proxy to front its services with TLS. ADR-002 requires every
-service to sit behind a proxy with authentication before it is reachable; ADR-007/M1
-delivers a `*.boma.<domain>` wildcard cert via ACME DNS-01 against Gandi — the only
-viable cert path for mesh/LAN-only services that cannot satisfy HTTP-01 (no public
-A-record to point at).
-
-The roadmap (Phase-2, step 5) and ADR-017 prose assumed **Traefik + Authentik** as the
-auth-and-proxy pair without an ADR ever pinning Traefik. On closer inspection:
-
-- Traefik's headline feature is **dynamic Docker-label discovery** — it discovers and
-routes services automatically from container labels without any static config.
-- boma already renders *all* config from Ansible templates and the `group_vars` catalog
-(ADR-004). That makes dynamic label discovery a disadvantage: a service that is not in
-the catalog does not exist (CLAUDE.md), so any route that Traefik auto-discovers
-outside the catalog would be unaudited.
-- The first reverse-proxy instance is needed on `askari` for M4 (NetBird), a host where
-`docker_hosts` patterns are being established under off-site/VPS constraints, not a
-full Proxmox cluster with many services.
-
-No production investment in Traefik config has been made; the decision can be made
-cleanly here.
-
-## Decision
-
-boma's reverse proxy is **Caddy**.
-
-### 1. Rationale for Caddy over Traefik
-
-1. Traefik's dynamic label discovery is wasted — boma renders config from the catalog;
-Caddy's static Caddyfile maps naturally to "render from templates" (ADR-004).
-2. Caddy's Caddyfile is simple to template with `ansible.builtin.template`; one file,
-one `ansible_managed` header, no side-channel label state.
-3. **Automatic HTTPS** via ACME DNS-01: the `caddy-dns/gandi` plugin satisfies the
-Gandi DNS-01 challenge, which is the only cert path for services with no public
-A-record (ADR-007/M1 wildcard strategy).
-4. Far simpler for a solo operator: no dashboard-as-a-service, no routing-rule DSL,
-no dynamic config files to reconcile.
-5. `forward_auth` to Authentik is a first-class Caddy directive — the planned
-Authentik auth story (ADR-002) is preserved without Traefik as the middleman.
-
-### 2. Custom image
-
-Caddy's official Docker image does not include third-party DNS plugins. The `caddy-dns/gandi`
-plugin must be compiled in via `xcaddy`. boma builds a custom image:
-
-```
-FROM caddy:builder AS builder
-RUN xcaddy build --with github.com/caddy-dns/gandi
-
-FROM caddy:latest
-COPY --from=builder /usr/bin/caddy /usr/bin/caddy
-```
-
-This image is maintained as a boma artifact (Forgejo registry, pinned digest in the
-Compose template). It is the cost of the Gandi DNS-01 path — unavoidable regardless of
-proxy choice.
-
-### 3. Deployment scope
-
-The first Caddy instance fronts the NetBird stack on `askari` (M4). The pattern
-generalises to the Proxmox cluster in Phase 2 when services multiply.
-
-### 4. Authentik integration (deferred)
-
-`forward_auth` to Authentik is deferred to Phase 2 (when Authentik is deployed on the
-cluster). The Caddyfile template will carry a placeholder comment. No Traefik-Authentik
-middleware migration is required.
-
-## Consequences
-
-- **Roadmap Phase-2 step 5** is updated from "Authentik + Traefik" to "Authentik +
-Caddy (ADR-024)".
-- **ADR-017 prose** that mentioned Traefik is updated to read "Caddy (ADR-024)".
-- A custom Caddy image (`xcaddy` + `caddy-dns/gandi`) must be built, pushed to the
-Forgejo registry, and kept current (plugin + base image updates).
-- Caddyfile config is rendered by Ansible from `group_vars` — consistent with ADR-004
-and easier to review than distributed container labels.
-- `forward_auth` to Authentik is available when Authentik is deployed; no extra
-middleware layer required.
-- The `proxy` concern tag (already in `tests/tags.yml`) covers Caddy config tasks.
-
-## What was ruled out
-
-- **Traefik** — dynamic label discovery is a mismatch for boma's catalog-rendered
-config model (ADR-004); more complex for a solo operator; no prior investment to
-protect.
-- **nginx / HAProxy** — no built-in ACME; require a separate ACME client (certbot,
-acme.sh) adding operational surface; Caddy's integrated ACME is simpler.
-- **NetBird's bundled TLS** — NetBird's management UI can serve its own TLS, but that
-doesn't generalise; a real proxy separates concerns and applies to every service.
-
-## Related
-
-- ADR-002 — services behind a proxy with authentication (the requirement this satisfies).
-- ADR-004 — Docker & Compose model (template-rendered config, catalog-driven).
-- ADR-007 / M1 — Gandi DNS-01 ACME path (the TLS strategy Caddy implements).
-- ADR-016 — NetBird (M4 is the first deployment of this proxy).
-- ADR-017 — service-UI verification; forward_auth to Authentik is the future auth story.
@@ -13,9 +13,6 @@ public_dns__records:
   # askari (off-site host, TF-provisioned M2) — public A so it's reachable by name +
   # for future ACME on *.askari.wingu.me. Mesh/LAN-only home services never appear here.
   - {record: askari, type: A, values: ["77.42.120.136"], ttl: 1800}
-  # Wildcard for askari's services (test/netbird/...) → same host; Caddy gets a
-  # *.askari.wingu.me cert via DNS-01 (M4a).
-  - {record: "*.askari", type: A, values: ["77.42.120.136"], ttl: 1800}
 
 # Absent — Gandi's auto-seeded defaults we don't want (purged once, idempotent thereafter).
 public_dns__absent:
@@ -1,6 +0,0 @@
----
-# Caddy reverse proxy on askari (ADR-024). Vanilla Caddy, ACME HTTP-01 (public host).
-reverse_proxy__acme_email: admin@wingu.me
-reverse_proxy__routes:
-  - {host: test.askari.wingu.me, respond: "boma reverse proxy"}
-  # M4b appends: {host: netbird.askari.wingu.me, upstream: "netbird-dashboard:80"}
@@ -1,11 +0,0 @@
----
-# offsite.yml — off-site hosts (askari): Docker engine + the Caddy reverse proxy.
-# NetBird (M4b) appends to this play. Run: make deploy PLAYBOOK=offsite LIMIT=askari
-- name: Configure off-site hosts
-  hosts: offsite_hosts
-  become: true
-  roles:
-    - role: docker_host
-      tags: [docker_host]
-    - role: reverse_proxy
-      tags: [reverse_proxy]
@@ -16,8 +16,3 @@ collections:
   # LiveDNS). PAT auth requires >= 9.0.0.
   - name: community.general
     version: ">=9.0.0"
-
-  # community.docker — docker_image (build the Caddy image on-host) + docker_compose_v2
-  # (reverse_proxy role).
-  - name: community.docker
-    version: ">=3.0.0"
@@ -1,25 +1,25 @@
 # docker_host
 
-Installs the Docker CE engine and the Compose plugin on every host in the
-`docker_hosts` group. Provides the container runtime that per-service roles
-(one service = one role, ADR-004) deploy their Compose stacks onto.
+Docker engine + Compose runtime applied to every host in the `docker_hosts` group.
+Provides the container platform that the per-service roles (one service = one role,
+ADR-004) deploy their Compose stacks onto.
 
-## Scope
+> **Status: scaffolded, not yet implemented.** This role has no tasks yet — applying it
+> is a no-op. It is wired into `playbooks/site.yml` so the full standard state is
+> expressed end-to-end, and so `make lint` covers it. See `STATUS.md`.
 
-This role covers the **engine install only**. The following are deferred to Phase 2
-(when the Proxmox cluster and `base` host firewall exist):
+## Planned scope
 
-- Daemon hardening (`iptables: false`, log driver, `live-restore`, userns remapping).
-- Rendering container forward/NAT rules into `/etc/nftables.d/*.nft` (the `base` role
-hook for container firewall integration, ADR-020).
+- Install Docker engine + the Compose plugin, version-pinned (ADR-011).
+- Daemon hardening: `iptables: false` (the host `base` firewall owns nftables, ADR-020),
+log driver, `live-restore`, user-namespace remapping where practical (ADR-002).
+- Render container forward/NAT rules into `/etc/nftables.d/*.nft` — the include hook the
+`base` role's ruleset exposes (see `roles/base/README.md`).
+- Provide the runtime the service roles deploy their Compose files onto.
 
 ## Variables
 
-| Variable | Default | Description |
-|---|---|---|
-| `docker_host__packages` | `[docker-ce, docker-ce-cli, containerd.io, docker-compose-plugin]` | APT packages installed from the Docker CE repository |
-
-All variables use the `docker_host__` double-underscore namespace (CLAUDE.md convention).
+None yet. Placeholders will use the `docker_host__*` namespace (CLAUDE.md convention).
 
 ## Example
 
@@ -31,14 +31,4 @@ All variables use the `docker_host__` double-underscore namespace (CLAUDE.md con
 tags: [docker_host]
 ```
 
-## Tags
-
-All tasks carry the `packages` concern tag (APT package install, ADR-019).
-
-## Related
-
-- ADR-004 (`docs/decisions/004-docker-model.md`) — Docker & Compose model.
-- ADR-020 (`docs/decisions/020-firewall.md`) — daemon hardening + `nftables.d`
-integration (deferred to Phase 2).
-- ADR-011 (`docs/decisions/011-update-management.md`) — version pinning policy
-(future: pin Docker CE version explicitly).
+See ADR-004 (`docs/decisions/004-docker-model.md`) for the Docker & Compose model.
@@ -1,8 +1 @@
 ---
-# Docker engine install (ADR-004). Cluster-specific daemon hardening + nftables.d
-# integration are deferred to when the cluster + host firewall exist.
-docker_host__packages:
-  - docker-ce
-  - docker-ce-cli
-  - containerd.io
-  - docker-compose-plugin
@@ -4,14 +4,8 @@
   gather_facts: true
 
   tasks:
-    - name: Verify docker binary is present
-      ansible.builtin.command: docker --version
-      register: docker_version_output
-      changed_when: false
-      tags: [verify]
-
-    - name: Assert docker --version succeeded
+    - name: Add verification tasks here
       ansible.builtin.assert:
-        that: docker_version_output.rc == 0
-        msg: "docker --version failed — Docker was not installed correctly"
+        that: true
+        msg: "Replace this with real assertions"
       tags: [verify]
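Review note on the verify hunk above: the placeholder assertion (`that: true`) always passes, so it checks nothing. When the role gains tasks again, one possible shape for a real check is sketched below. The task names and the `compose_version_output` variable are assumptions for illustration, not the repository's code; the fragment is shown at top level and would sit under `tasks:` in the verify play.

```yaml
# Sketch only: names are hypothetical; indent under the play's `tasks:` key.
- name: Check that the Compose plugin responds
  ansible.builtin.command: docker compose version
  register: compose_version_output
  changed_when: false
  failed_when: false    # let the assert below report the failure message
  tags: [verify]

- name: Assert the Compose plugin is installed
  ansible.builtin.assert:
    that: compose_version_output.rc == 0
    msg: "docker compose version failed, the Compose plugin is missing"
  tags: [verify]
```

Setting `failed_when: false` on the command keeps a missing plugin from aborting the play before the assertion can print its message.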
@@ -1,39 +1,13 @@
 ---
-- name: Install prerequisites
-  ansible.builtin.apt:
-    name: [ca-certificates, curl, gnupg]
-    state: present
-    update_cache: true
-  tags: [packages]
-
-- name: Ensure /etc/apt/keyrings exists
-  ansible.builtin.file:
-    path: /etc/apt/keyrings
-    state: directory
-    mode: "0755"
-  tags: [packages]
-
-- name: Add Docker's APT GPG key
-  ansible.builtin.get_url:
-    url: https://download.docker.com/linux/debian/gpg
-    dest: /etc/apt/keyrings/docker.asc
-    mode: "0644"
-  tags: [packages]
-
-- name: Add the Docker APT repository
-  ansible.builtin.apt_repository:
-    repo: >-
-      deb [arch={{ 'amd64' if ansible_architecture == 'x86_64' else ansible_architecture }}
-      signed-by=/etc/apt/keyrings/docker.asc]
-      https://download.docker.com/linux/debian
-      {{ ansible_distribution_release }} stable
-    filename: docker
-    state: present
-  tags: [packages]
-
-- name: Install Docker engine + compose plugin
-  ansible.builtin.apt:
-    name: "{{ docker_host__packages }}"
-    state: present
-    update_cache: true
-  tags: [packages]
+# docker_host — Docker engine + Compose runtime for hosts in the docker_hosts group.
+#
+# SCAFFOLDED, NOT YET IMPLEMENTED. This role is referenced by playbooks/site.yml so the
+# full standard state is expressed end-to-end, but it has no tasks yet — applying it is a
+# no-op. See STATUS.md ("Scaffolded but empty") and ADR-004 (Docker & Compose model).
+#
+# Planned scope (ADR-002/004/020):
+# - install Docker engine + compose plugin (version-pinned, per ADR-011)
+# - daemon hardening: iptables:false (host nftables owns the firewall, ADR-020),
+# log-driver, live-restore, userns where practical
+# - render container forward/NAT rules into /etc/nftables.d/*.nft (the base-role hook)
+# - deploy per-service Compose stacks from the service roles (one service = one role)
@@ -1,62 +0,0 @@
-# reverse_proxy
-
-Boma's standard Caddy reverse proxy (ADR-024). Runs on `askari` (the off-site
-Hetzner host) and terminates TLS for all public-facing services via ACME HTTP-01.
-Uses the official `caddy:2` image — no custom build, no DNS plugin, no token required.
-
-## How TLS works
-
-Caddy obtains per-hostname certificates using the ACME HTTP-01 challenge. Port 80
-must be reachable from the internet for the challenge to succeed. Each `host` in
-`reverse_proxy__routes` gets its own certificate automatically.
-
-> **DNS-01 (for mesh/LAN-only cluster services) is deferred to Phase 2.** The
-> `caddy-dns/gandi` plugin failed to issue certificates during M4a and needs
-> investigation before it can be used.
-
-## Route catalog — `reverse_proxy__routes`
-
-Services register themselves as routes by appending an entry to
-`reverse_proxy__routes` in `group_vars/all/reverse_proxy.yml`:
-
-```yaml
-reverse_proxy__routes:
-- {host: app.askari.wingu.me, upstream: "app:8080"}
-- {host: health.askari.wingu.me, respond: "ok"}
-```
-
-Each entry renders a separate server block in the Caddyfile:
-
-```
-app.askari.wingu.me {
-reverse_proxy app:8080
-}
-
-health.askari.wingu.me {
-respond "ok" 200
-}
-```
-
-Use `upstream` to proxy to a Docker service, or `respond` to return a static string.
-
-## Variables
-
-| Variable | Default | Description |
-|---|---|---|
-| `reverse_proxy__base_dir` | `/opt/services/reverse_proxy` | Working directory for Compose project |
-| `reverse_proxy__acme_email` | `admin@example.test` | ACME registration email |
-| `reverse_proxy__routes` | `[]` | List of `{host, upstream}` or `{host, respond}` entries |
-| `reverse_proxy__manage` | `true` | Set `false` in Molecule to skip Docker tasks |
-
-Production overrides live in
-`inventories/production/group_vars/all/reverse_proxy.yml`.
-
-## `reverse_proxy__manage` toggle
-
-Docker operations (`docker compose up`) are gated on `reverse_proxy__manage | bool`.
-Set it to `false` in Molecule so the role can be tested (template rendering, directory
-creation) without a Docker daemon.
-
-## Secrets
-
-None. HTTP-01 requires no credentials.
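Review note on the removed README above: it describes vanilla Caddy on the official `caddy:2` image with HTTP-01, but the role's Compose template is not part of this diff. The sketch below is one plausible shape for such a Compose file. It takes the container name `caddy` and the path `/etc/caddy/Caddyfile` from the "Reload caddy" handler in this diff; the `conf/` directory, the `caddy_data` volume name, and the restart policy are assumptions.

```yaml
# Sketch only: the role's real Compose template is not shown in this diff.
services:
  caddy:
    image: caddy:2
    container_name: caddy        # the "Reload caddy" handler execs into this name
    restart: unless-stopped
    ports:
      - "80:80"                  # must be reachable for the HTTP-01 challenge
      - "443:443"
    volumes:
      - ./conf:/etc/caddy:ro     # hypothetical dir holding the rendered Caddyfile
      - caddy_data:/data         # keeps issued certificates across restarts

volumes:
  caddy_data:
```

Mounting the directory rather than the single file is a deliberate choice in this sketch: a template task replaces the file on disk, and a single-file bind mount can keep pointing at the old copy, so `caddy reload` would not see the new routes.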
@@ -1,6 +0,0 @@
----
-# Caddy reverse proxy (ADR-024). Vanilla Caddy; TLS via ACME HTTP-01 (public hosts).
-reverse_proxy__base_dir: /opt/services/reverse_proxy
-reverse_proxy__acme_email: admin@example.test
-reverse_proxy__routes: [] # each: {host: x, upstream: "svc:port"} OR {host: x, respond: "text"}
-reverse_proxy__manage: true # set false in Molecule to render without Docker
@ -1,7 +0,0 @@
----
-- name: Reload caddy
-  listen: reload caddy
-  community.docker.docker_container_exec:
-    container: caddy
-    command: caddy reload --config /etc/caddy/Caddyfile
-  when: reverse_proxy__manage | bool
@ -1,13 +0,0 @@
----
-galaxy_info:
-  author: sjat
-  description: >-
-    Caddy reverse proxy with ACME DNS-01 TLS via Gandi (ADR-024). Builds the
-    custom image on-host (caddy-dns/gandi) and manages it via Docker Compose.
-  license: MIT
-  min_ansible_version: "2.17"
-  platforms:
-    - name: Debian
-      versions:
-        - trixie
-dependencies: []
@ -1,16 +0,0 @@
----
-- name: Converge
-  hosts: all
-  gather_facts: true
-
-  vars:
-    reverse_proxy__manage: false
-    reverse_proxy__acme_email: admin@example.test
-    reverse_proxy__routes:
-      - host: app.example.test
-        upstream: "app:80"
-      - host: t.example.test
-        respond: "ok"
-
-  roles:
-    - role: reverse_proxy
@ -1,31 +0,0 @@
----
-dependency:
-  name: galaxy
-  options:
-    requirements-file: ../../requirements.yml
-
-driver:
-  name: docker
-
-platforms:
-  - name: instance
-    # Project-owned image built from .docker/molecule-debian13/Dockerfile
-    # and hosted in the Forgejo container registry.
-    # Build/push with: make molecule-image / make molecule-image-push
-    image: forgejo.nyumbani.baobab.band/sjat/molecule-debian13:latest
-    pre_build_image: true
-    privileged: true # required for systemd
-    cgroupns_mode: host
-    volumes:
-      - /sys/fs/cgroup:/sys/fs/cgroup:rw
-    command: /lib/systemd/systemd
-
-provisioner:
-  name: ansible
-  inventory:
-    host_vars:
-      instance:
-        ansible_user: root
-
-verifier:
-  name: ansible
@ -1,22 +0,0 @@
----
-- name: Verify
-  hosts: all
-  gather_facts: false
-
-  tasks:
-    - name: Slurp the rendered Caddyfile
-      ansible.builtin.slurp:
-        src: /opt/services/reverse_proxy/Caddyfile
-      register: _caddyfile
-      tags: [verify]
-
-    - name: Assert Caddyfile exists and contains expected content
-      ansible.builtin.assert:
-        that:
-          - _caddyfile.content | b64decode | length > 0
-          - "'app.example.test' in (_caddyfile.content | b64decode)"
-          - "'reverse_proxy app:80' in (_caddyfile.content | b64decode)"
-          - "'respond \"ok\" 200' in (_caddyfile.content | b64decode)"
-        fail_msg: "Caddyfile is missing expected content"
-        success_msg: "Caddyfile rendered correctly"
-      tags: [verify]
@ -1,29 +0,0 @@
----
-- name: Ensure the service directory exists
-  ansible.builtin.file:
-    path: "{{ reverse_proxy__base_dir }}"
-    state: directory
-    mode: "0750"
-  tags: [config]
-
-- name: Render the Caddyfile
-  ansible.builtin.template:
-    src: Caddyfile.j2
-    dest: "{{ reverse_proxy__base_dir }}/Caddyfile"
-    mode: "0644"
-  notify: reload caddy
-  tags: [config]
-
-- name: Render the compose file
-  ansible.builtin.template:
-    src: docker-compose.yml.j2
-    dest: "{{ reverse_proxy__base_dir }}/docker-compose.yml"
-    mode: "0644"
-  tags: [config]
-
-- name: Bring the reverse proxy up
-  community.docker.docker_compose_v2:
-    project_src: "{{ reverse_proxy__base_dir }}"
-    state: present
-  when: reverse_proxy__manage | bool
-  tags: [deploy]
@ -1,12 +0,0 @@
-{
-    email {{ reverse_proxy__acme_email }}
-}
-{% for r in reverse_proxy__routes %}
-{{ r.host }} {
-{% if r.upstream is defined %}
-    reverse_proxy {{ r.upstream }}
-{% else %}
-    respond "{{ r.respond | default('boma') }}" 200
-{% endif %}
-}
-{% endfor %}
@ -1,22 +0,0 @@
-services:
-  caddy:
-    image: caddy:2
-    container_name: caddy
-    restart: unless-stopped
-    ports:
-      - "80:80"
-      - "443:443"
-    volumes:
-      - ./Caddyfile:/etc/caddy/Caddyfile:ro
-      - caddy_data:/data
-      - caddy_config:/config
-    networks:
-      - boma
-
-volumes:
-  caddy_data:
-  caddy_config:
-
-networks:
-  boma:
-    name: boma
@ -5,14 +5,13 @@
 module "askari" {
   source = "../../modules/hetzner_vm"
 
   name = "askari"
   server_type = "cx23" # x86, 2 vCPU / 4 GB / 40 GB (CAX11/ARM was out of stock in
   # every EU location 2026-06-14; cx23 is same-spec + cheaper)
   location = "hel1" # Helsinki
   image = "debian-13"
   ansible_ssh_pubkey = var.ansible_ssh_pubkey
   ssh_admin_cidrs = var.ssh_admin_cidrs
-  public_web = true # Caddy 80/443 + NetBird 3478 (M4)
   labels = {
     env = "offsite"
     group = "offsite_hosts"
@ -26,35 +26,14 @@ resource "hcloud_ssh_key" "ansible" {
 resource "hcloud_firewall" "this" {
   name = "${var.name}-fw"
 
-  # SSH from the control node only.
+  # SSH from the control node only. NetBird ports (UDP 3478, TCP 80/443) are added
+  # in M4 when the coordinator deploys (ADR-020); host nftables stays catalog-driven.
   rule {
     direction = "in"
     protocol = "tcp"
     port = "22"
     source_ips = var.ssh_admin_cidrs
   }
-
-  # Public web (Caddy 80/443) + NetBird STUN/TURN (3478/udp) — only when public_web
-  # (ADR-024, M4). Host nftables stays catalog-driven (ADR-020).
-  dynamic "rule" {
-    for_each = var.public_web ? ["80", "443"] : []
-    content {
-      direction = "in"
-      protocol = "tcp"
-      port = rule.value
-      source_ips = ["0.0.0.0/0", "::/0"]
-    }
-  }
-
-  dynamic "rule" {
-    for_each = var.public_web ? ["3478"] : []
-    content {
-      direction = "in"
-      protocol = "udp"
-      port = rule.value
-      source_ips = ["0.0.0.0/0", "::/0"]
-    }
-  }
 }
 
 resource "hcloud_server" "this" {
@ -28,12 +28,6 @@ variable "ssh_admin_cidrs" {
   type = list(string)
 }
 
-variable "public_web" {
-  description = "Open the public web/NetBird ports (80/443 TCP, 3478 UDP) to the internet"
-  type = bool
-  default = false
-}
-
 variable "labels" {
   description = "Hetzner resource labels (metadata only)"
   type = map(string)