docs(spec): mesh-hardening 1/3 — move askari SSH onto wt0

Decomposes the M5 mesh-hardening follow-on into 3 independent sub-specs; this
is sub-project 1. Three-layer SSH-on-wt0 (sshd ListenAddress=mesh + nftables
iifname wt0 + retire the Hetzner WAN :22), ip_nonlocal_bind to beat the
post-boot wt0 bind race (fail-closed), live wt0 fact for the listen addr,
staged cutover with the firewall auto-rollback as the safety gate.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
sjat 2026-06-17 20:15:12 +02:00
parent e5a8e5d3b9
commit 292c204752

@ -0,0 +1,156 @@
# Spec — Mesh-hardening (1 of 3): move askari's SSH onto `wt0`
Status: Accepted (2026-06-17)
## Context & scope
The **mesh-hardening follow-on** was deferred from M5 (ROADMAP). It was decomposed into
**three independent sub-projects**, each with its own spec → plan → implementation cycle:
1. **askari SSH → `wt0`** ← *this spec*
2. ubongo nftables default-deny + `ssh-from-control` (its own later spec)
3. NetBird ACL off Allow-All → scoped policies (its own later spec)
This spec covers **only (1)**. It makes askari's SSH reachable **only over the NetBird mesh
interface `wt0`** and closes the WAN `:22` surface at both the host and the Hetzner Cloud
Firewall. It does **not** touch ubongo, the NetBird ACL (stays Allow-All for now — one
moving access-layer at a time), or askari's public service exposure (Caddy 80/443, NetBird
STUN 3478 stay on the WAN).
Current state (STATUS.md): askari is reached at `ansible_host: 77.42.120.136` (WAN, in the
TF-generated `inventories/production/offsite.yml`); `wt0` is up at `100.99.226.39`
(Management+Signal Connected, M5); the base nftables `firewall` concern is **built but not
applied** to askari (the Hetzner Cloud Firewall is its perimeter today); the Hetzner Cloud
Firewall (`terraform/modules/hetzner_vm`) opens `:22` from `var.ssh_admin_cidrs` plus
80/443/3478 from anywhere.
## Goal / success criteria
- SSH to askari succeeds over `wt0` (from ubongo) and **fails from any off-mesh source**.
- The WAN `:22` surface is closed at **both** layers (host nftables accepts `:22` on `wt0`
only; the Hetzner Cloud Firewall no longer carries a `:22` allow rule).
- Public services are unaffected: `https://test.askari.wingu.me` and
`https://netbird.askari.wingu.me` serve valid certs; STUN `3478/udp` still answers.
- Ansible manages askari over `wt0`.
- Break-glass is the **Hetzner web console** (out-of-band; works even if the mesh is down).
- A reboot of askari does **not** lock SSH out (the boot-race below is solved).
## Design — three enforcement layers (defense-in-depth)
1. **sshd** binds `ListenAddress` to askari's `wt0` IP only, so it does not accept on WAN.
2. **host nftables** (base `firewall` concern, ADR-020): catalog-driven default-deny;
`:22` allowed only via `iifname "wt0"` (`iifname` matches the interface name at packet
time, so the ruleset still loads while `wt0` is absent, unlike an `iif` match; see
`docs/testing/gotchas.md`); public service ports stay open on WAN.
3. **Hetzner Cloud Firewall** (Terraform): the `:22` `ssh_admin_cidrs` rule is removed;
80/443/3478 stay.
## The boot-race fix (load-bearing)
`wt0` is brought up by NetBird **after** boot, so at sshd start the `wt0` IP may not exist
yet. A plain `ListenAddress 100.99.226.39` would fail to bind → sshd exits → **lockout on
reboot**. Solution:
- **`net.ipv4.ip_nonlocal_bind = 1`** via a sysctl drop-in (`ansible.posix.sysctl`,
persisted under `/etc/sysctl.d/`). This lets sshd bind the `wt0` address even before the
interface exists; once `wt0` comes up with that IP, traffic is delivered to the existing
listener — no reload needed.
- The sshd drop-in **fails closed**: the mesh IP is resolved (see below) and the play
**asserts it is non-empty** before rendering. A drop-in rendered without a usable
`ListenAddress` would leave sshd on its default of listening on all addresses, silently
defeating the restriction, so that must never render.
**Mesh-IP source (decided):** the **live `wt0` fact** `ansible_wt0.ipv4.address`, gathered
at apply time (`wt0` is up during the play, since M5), with a **`host_var` fallback**
(`base__ssh_listen_addr`, default `""`) and a fail-closed `assert` that one of them yielded
a non-empty address. Live fact is preferred (correct even if NetBird reassigns the IP);
the host_var is an explicit override / belt.
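
The resolution order, the fail-closed guard, and the sysctl drop-in described above could be sketched as Ansible tasks along these lines. This is a minimal sketch under stated assumptions, not the final `roles/base/tasks/ssh.yml`: the fact name `base__ssh_resolved_mesh_ip` and the sysctl file name are hypothetical, the sketch reads "explicit override" as "a non-empty `base__ssh_listen_addr` wins, otherwise the live fact is used", and it assumes facts were gathered so that `ansible_wt0` exists whenever `wt0` is up.

```yaml
# Sketch only. base__ssh_resolved_mesh_ip and the sysctl file name are
# illustrative assumptions, not names fixed by this spec.
- name: Resolve the mesh IP (non-empty override first, else the live wt0 fact)
  ansible.builtin.set_fact:
    base__ssh_resolved_mesh_ip: >-
      {{ (base__ssh_listen_addr | default('', true))
      or ((ansible_wt0 | default({})).get('ipv4', {}).get('address', '')) }}
  when: base__ssh_listen_mesh_only | bool

- name: Fail closed if no mesh IP could be resolved
  ansible.builtin.assert:
    that:
      - base__ssh_resolved_mesh_ip | length > 0
    fail_msg: >-
      base__ssh_listen_mesh_only is true but no mesh IP was resolved;
      refusing to render an sshd drop-in without a ListenAddress.
  when: base__ssh_listen_mesh_only | bool

- name: Allow sshd to bind the wt0 address before the interface exists
  ansible.posix.sysctl:
    name: net.ipv4.ip_nonlocal_bind
    value: "1"
    sysctl_file: /etc/sysctl.d/60-ssh-nonlocal-bind.conf  # hypothetical file name
    state: present
    reload: true
  when: base__ssh_listen_mesh_only | bool
```

Placing the `assert` before any template task means a missing address stops the play before the sshd drop-in is touched, which is the fail-closed behaviour this section requires.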
## New & changed code
**Role `base` (the `hardening` + `firewall` concerns):**
- `roles/base/defaults/main.yml` — add:
- `base__ssh_listen_mesh_only: false` — opt-in; when `true`, sshd binds the mesh IP only.
- `base__ssh_listen_addr: ""` — optional explicit mesh-IP override (fallback to the
`ansible_wt0` fact).
- `roles/base/tasks/ssh.yml`
- resolve the mesh IP (`base__ssh_listen_addr` or `ansible_wt0.ipv4.address`) into a fact;
- `assert` it is non-empty **when** `base__ssh_listen_mesh_only`;
- set `net.ipv4.ip_nonlocal_bind = 1` (sysctl drop-in) under the same condition.
- `roles/base/templates/sshd_hardening.conf.j2` — append a conditional
`ListenAddress {{ resolved_mesh_ip }}` block guarded by `base__ssh_listen_mesh_only`
(unset → unchanged behaviour: listen on all). Keep the existing `sshd -t` validation.
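
For orientation, the two new defaults listed above might be declared as follows. The variable names and default values come from this spec; the comment wording is illustrative only.

```yaml
# roles/base/defaults/main.yml (additions only; sketch)
# Opt-in switch: when true, sshd binds the mesh IP only.
base__ssh_listen_mesh_only: false
# Optional explicit mesh IP; empty means "use the ansible_wt0 fact".
base__ssh_listen_addr: ""
```

With both left at their defaults the role renders the sshd drop-in exactly as before, so hosts that do not opt in are unaffected.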
**Inventory:**
- `inventories/production/host_vars/askari.yml` (new) — `ansible_host: 100.99.226.39`
(overrides the TF-generated `offsite.yml`; host_vars are not regenerated by
`tf_to_inventory.py`). A header comment explains why.
- `inventories/production/group_vars/offsite_hosts/vars.yml` — add
`base__ssh_listen_mesh_only: true`; ensure `base__firewall_apply: true`.
(`base__mesh_enabled` is already `true` for askari — set in M5 — and is a precondition,
not a change here.)
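
As an illustration of the inventory changes listed above, the two files might look roughly like this. Only keys and values named in this spec are shown; the header comment wording is an assumption, and any content already present in `vars.yml` is omitted.

```yaml
# inventories/production/host_vars/askari.yml (new; sketch)
# Overrides ansible_host from the Terraform-generated offsite.yml so that
# Ansible reaches askari over wt0. host_vars files are not regenerated by
# tf_to_inventory.py, so this override survives inventory regeneration.
ansible_host: 100.99.226.39
---
# inventories/production/group_vars/offsite_hosts/vars.yml (additions only)
base__ssh_listen_mesh_only: true
base__firewall_apply: true
```

The override relies on Ansible's variable precedence, in which `host_vars/` files rank above host variables defined in the inventory file itself.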
**Firewall catalog** (`inventories/production/group_vars/all/firewall.yml`):
- Enumerate askari's required ingress so catalog-driven default-deny does **not** drop a
live public service. Derived from the existing `reverse_proxy` + `netbird_coordinator`
definitions: `:22/tcp` on the **mesh** zone (`wt0`); `80,443/tcp` + `3478/udp` on the
**public** zone (WAN). The exact catalog/zone YAML is finalised in the implementation
plan against the `resolve_firewall_rules` filter's schema.
**Terraform** (`terraform/environments/offsite` + `terraform/modules/hetzner_vm`):
- Remove the WAN `:22` ingress rule (e.g. drop `ssh_admin_cidrs` from the firewall, or set
it empty and guard the rule). Keep 80/443/3478. Applied via `make tf-plan/apply
TF_ENV=offsite` (plan shown before apply).
## Staged cutover — a working path at every step
1. **Pre-check:** confirm `ssh sjat@100.99.226.39` and an `ansible askari -m ping` forced
over `wt0` both succeed **before** changing anything.
2. **Repoint Ansible:** add `host_vars/askari.yml` (`ansible_host` = `wt0` IP); verify
`ansible askari -m ping` runs over the mesh. WAN `:22` still open as a fallback here.
3. **Apply `base` (firewall + sshd together):** one `make deploy PLAYBOOK=site LIMIT=askari`
converge applies catalog default-deny (`:22` on `wt0` + public ports) **and** the sshd
`ListenAddress`=mesh + `ip_nonlocal_bind` drop-in. The firewall concern's
`reset_connection` → `wait_for_connection` (now over `wt0`) plus the armed auto-rollback
timer (`base__firewall_rollback_timeout`, 45 s) is the safety gate — a bad ruleset
reverts itself. The sshd `reload` cannot drop the in-flight `wt0` session, because
established sessions run in their own sshd processes and only the listener is restarted.
Verify the public services still respond.
4. **Retire the Hetzner WAN `:22`:** the Terraform change above; `make tf-plan
TF_ENV=offsite` (review) → `make tf-apply`. Verify: `wt0` SSH works; off-mesh `nc -vz
77.42.120.136 22` is refused/times out; `:443` open; STUN answers.
## Testing
- **Molecule** (base `default` scenario; `wt0` absent in-container, `base__firewall_apply:
false` render-only): assert (a) the rendered nftables allows `:22` via `iifname "wt0"`;
(b) with `base__ssh_listen_mesh_only: true` + a fixture mesh IP, the sshd drop-in renders
`ListenAddress <ip>` and `sshd -t` passes; (c) with the flag set but **no** resolvable
mesh IP, the play **fails closed** (the `assert`); (d) the `ip_nonlocal_bind` sysctl task
is present. Keep existing firewall/hardening assertions green.
- **Live, out-of-band:** post-cutover, from an off-mesh host `nc -vz 77.42.120.136 22` →
refused or timed out; `:443` → open; from ubongo over `wt0`, SSH + `ansible -m ping` succeed; reboot
askari (Hetzner console) and confirm SSH-over-`wt0` returns without intervention.
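
Molecule assertion (b) above could be sketched as a verify play like the one below. It is a sketch only: the drop-in path `/etc/ssh/sshd_config.d/hardening.conf` and the fixture address `100.64.0.10` are hypothetical placeholders, since the spec fixes neither, and it assumes `sshd` is on the `PATH` with host keys present in the test container.

```yaml
# Sketch of a Molecule verify play for assertion (b).
# sshd_dropin_path and fixture_mesh_ip are hypothetical placeholders.
- name: Verify the mesh-only sshd drop-in
  hosts: all
  gather_facts: false
  vars:
    sshd_dropin_path: /etc/ssh/sshd_config.d/hardening.conf
    fixture_mesh_ip: 100.64.0.10
  tasks:
    - name: Read the rendered sshd drop-in
      ansible.builtin.slurp:
        src: "{{ sshd_dropin_path }}"
      register: sshd_dropin

    - name: Assert the drop-in pins ListenAddress to the fixture mesh IP
      ansible.builtin.assert:
        that:
          - "('ListenAddress ' ~ fixture_mesh_ip) in (sshd_dropin.content | b64decode)"

    - name: Validate the full sshd configuration
      ansible.builtin.command: sshd -t
      changed_when: false
```

`sshd -t` only parses the configuration and does not bind sockets, so the check can pass in a container where `wt0` and the fixture address do not exist.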
## Risks & rollback
- **Mid-cutover lockout:** mitigated by the staged order (a path open at each step), the
firewall auto-rollback timer, and `ansible_host`=`wt0` so the post-change connectivity
check exercises the real new path.
- **Reboot lockout:** mitigated by `ip_nonlocal_bind` (sshd binds `wt0` regardless of
interface timing) + the fail-closed assert (never silently listen-all).
- **Default-deny breaks a public service:** mitigated by enumerating all live ingress into
the catalog and the §Testing service checks; reversible by re-running with
`base__firewall_apply: false` or widening the catalog.
- **Ultimate break-glass:** the Hetzner web console (out-of-band). The TF `:22` rule is
trivially re-addable.
## Out of scope / follow-ons
- ubongo default-deny + `ssh-from-control` (sub-project 2).
- NetBird ACL off Allow-All (sub-project 3) — until then any enrolled peer can reach
askari's `wt0:22`; scoping that is sub-project 3's job.
- `/check-access` (ADR-021) live verification — designed, build still pending.
- STATUS.md / ROADMAP updates land with the implementation, not this spec.