Compare commits

...

4 commits

Author SHA1 Message Date
b0511179cb feat(tf/offsite): retire askari's WAN :22 (mesh-only SSH)
The Hetzner Cloud Firewall SSH rule is now conditional on a non-empty
ssh_admin_cidrs (default []); askari sets it empty so the WAN :22 rule is
removed on the next apply. SSH is reached over wt0; break-glass is the Hetzner
console. Apply is the live cutover (Task 5). Mesh-hardening 1/3.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 20:51:24 +02:00
cc21344ab1 feat(inventory): manage askari over wt0 + enable mesh-only SSH
host_vars/askari.yml points ansible_host at the wt0 IP (overriding the generated
offsite.yml); offsite_hosts sets base__ssh_listen_mesh_only. Mesh-hardening 1/3.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 20:49:15 +02:00
3b30e70ba5 feat(firewall): public zone + askari's public services in the catalog
Adds a public (0.0.0.0/0) zone and askari's Caddy (80/443) + NetBird STUN
(3478/udp) ingress so the base nftables default-deny does not drop the live
public services when applied to askari. Molecule + filter unit test cover the
public-zone rendering. Mesh-hardening 1/3 (ADR-020/024/016).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 20:46:03 +02:00
39d2ad38ca feat(base): opt-in sshd ListenAddress on the mesh IP (fail-closed)
base__ssh_listen_mesh_only binds sshd to the live wt0 IP only, with
ip_nonlocal_bind to beat the post-boot bind race and a fail-closed assert so an
unresolved address never silently listens on all interfaces. Molecule covers
the render + sysctl. Mesh-hardening 1/3 (ADR-016/021).

Environmental checkpoint applied: the molecule-debian13 container image lacks
procps (no sysctl binary). Added molecule/default/prepare.yml to install procps
and sysctls: {net.ipv4.ip_nonlocal_bind: "0"} to molecule.yml platform so the
ansible.posix.sysctl task can write and read back the value hermetically.
Sysctl file format is net.ipv4.ip_nonlocal_bind=1 (no spaces); verify.yml
grep pattern updated to match ansible.posix.sysctl's actual output.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 20:43:08 +02:00
14 changed files with 130 additions and 14 deletions

View file

@ -2,14 +2,27 @@
# Shared firewall topology — single source of truth for the host nftables layer
# (base role) and OPNsense (future). See docs/decisions/020-firewall.md.
# Zone → subnet (from ADR-007).
# Zone → subnet (from ADR-007). `public` = the WAN (anywhere) for deliberately public
# off-site services (askari); home/cluster services use the internal zones only.
firewall_zones:
mgmt: 10.10.0.0/24
srv: 10.20.0.0/24
lan: 10.30.0.0/24
iot: 10.40.0.0/24
guest: 10.50.0.0/24
public: 0.0.0.0/0
# Service catalog: <name> → placement (host | group | hosts) + ingress[].
# Empty until services are built; hosts still get default-deny + the management plane.
firewall_catalog: {}
# askari's public surface (ADR-024 Caddy + ADR-016 NetBird STUN). NOTE: the host
# nftables template renders IPv4 source rules only; askari is reached via its A record
# (no AAAA), so IPv4-only public rules are sufficient (see the spec's IPv6 note).
firewall_catalog:
reverse_proxy:
host: askari
ingress:
- { from: public, port: 80, proto: tcp }
- { from: public, port: 443, proto: tcp }
netbird_stun:
host: askari
ingress:
- { from: public, port: 3478, proto: udp }

View file

@ -1,6 +1,8 @@
---
# Off-site hosts (askari). askari runs the NetBird coordinator AND is a mesh peer
# (ADR-016, M5) — enrol the agent via base's `mesh` concern. Enrollment only; the
# host firewall default-deny + moving askari's SSH onto wt0 stay deferred to the
# mesh-hardening follow-on.
# (ADR-016, M5). Mesh-hardening 1/3 (2026-06-17): SSH is moved onto wt0 — sshd binds the
# mesh IP only (base__ssh_listen_mesh_only) and the base nftables default-deny applies
# (base__firewall_apply defaults true; SSH allowed on wt0 via base__firewall_mgmt_interface,
# public services via the catalog). base__mesh_enabled stays true (precondition from M5).
base__mesh_enabled: true
base__ssh_listen_mesh_only: true

View file

@ -0,0 +1,6 @@
---
# Manage askari over the NetBird mesh (wt0), not its WAN IP. This OVERRIDES the
# TF-generated inventories/production/offsite.yml (ansible_host = 77.42.120.136); host_vars
# outrank the generated inventory and are NOT touched by `make tf-inventory-offsite`.
# Mesh-hardening 1/3 — once SSH is wt0-only, the WAN IP is no longer reachable for SSH.
ansible_host: 100.99.226.39 # askari's wt0 address (NetBird, M5)

View file

@ -21,6 +21,14 @@ base__fail2ban_findtime: 10m
# base__ssh_authorised_keys lives in group_vars/all/vars.yml (per-person control keys).
base__ssh_authorised_keys: []
# SSH listen-on-mesh (mesh-hardening 1/3, ADR-016/021). Opt-in: when true, sshd binds
# ListenAddress to this host's mesh IP only (not the WAN). The IP comes from the live wt0
# fact (ansible_facts.wt0.ipv4.address); base__ssh_listen_addr overrides it. ip_nonlocal_bind
# lets sshd bind the mesh IP before wt0 exists at boot. Fails closed: the play asserts a
# non-empty address rather than silently listening on all interfaces.
base__ssh_listen_mesh_only: false
base__ssh_listen_addr: ""
# NetBird mesh agent enrollment (ADR-016). Opt-in: default off so applying `base` to a
# host not on the mesh is a no-op for this concern. The live actions (apt install over
# the network, `netbird up` against the coordinator) are additionally gated by

View file

@ -11,10 +11,13 @@
base__mesh_enabled: true
base__mesh_manage: false
base__mesh_setup_key: "dummy-molecule-key"
base__ssh_listen_mesh_only: true
base__ssh_listen_addr: "100.99.0.1" # fixture mesh IP (no wt0 in the container)
firewall_zones:
lan: 10.30.0.0/24
srv: 10.20.0.0/24
mgmt: 10.10.0.0/24
public: 0.0.0.0/0
firewall_catalog:
reverse_proxy:
host: instance
@ -24,5 +27,9 @@
host: instance
ingress:
- { from: srv, port: 2342, proto: tcp }
netbird_stun:
host: instance
ingress:
- { from: public, port: 3478, proto: udp }
roles:
- role: base

View file

@ -19,6 +19,11 @@ platforms:
volumes:
- /sys/fs/cgroup:/sys/fs/cgroup:rw
command: /lib/systemd/systemd
# Pre-create the namespaced sysctl so ansible.posix.sysctl can set it (mesh-hardening 1/3).
# The container image lacks procps so the sysctl binary is absent; we also install it in
# prepare.yml. This entry ensures the value exists in the container's netns at startup.
sysctls:
net.ipv4.ip_nonlocal_bind: "0"
provisioner:
name: ansible

View file

@ -0,0 +1,11 @@
---
- name: Prepare
hosts: all
become: true
gather_facts: false
tasks:
- name: Install procps so ansible.posix.sysctl can find the sysctl binary
ansible.builtin.apt:
name: procps
state: present
update_cache: true

View file

@ -38,6 +38,13 @@
- "'tcp dport 2342 accept' in nft"
fail_msg: "missing srv->2342 rule for photoprism"
- name: Assert the public->stun:3478/udp ingress rule (0.0.0.0/0 source)
ansible.builtin.assert:
that:
- "'0.0.0.0/0' in nft"
- "'udp dport 3478 accept' in nft"
fail_msg: "missing public->3478/udp rule for netbird_stun"
- name: Assert the docker_host extension hook is present
ansible.builtin.assert:
that:
@ -58,6 +65,18 @@
ansible.builtin.command: grep -q '^\[sshd\]' /etc/fail2ban/jail.d/sshd.local
changed_when: false
- name: ListenAddress bound to the fixture mesh IP (mesh-only mode)
ansible.builtin.command: grep -q '^ListenAddress 100.99.0.1$' /etc/ssh/sshd_config.d/10-boma.conf
changed_when: false
- name: Sysctl drop-in for ip_nonlocal_bind is present
ansible.builtin.command: grep -q '^net.ipv4.ip_nonlocal_bind=1' /etc/sysctl.d/60-boma-nonlocal-bind.conf
changed_when: false
- name: Kernel ip_nonlocal_bind is live in this netns
ansible.builtin.command: sysctl -n net.ipv4.ip_nonlocal_bind
register: _nonlocal
changed_when: false
failed_when: _nonlocal.stdout | trim != '1'
# mesh concern: enabled but manage=false must be a clean no-op (no install/enrol)
- name: Check whether netbird got installed
ansible.builtin.command: which netbird

View file

@ -1,4 +1,31 @@
---
- name: Resolve the sshd mesh listen address (override, else live wt0 fact)
ansible.builtin.set_fact:
base__ssh_listen_addr_resolved: >-
{{ base__ssh_listen_addr
or ansible_facts.get('wt0', {}).get('ipv4', {}).get('address', '') }}
when: base__ssh_listen_mesh_only | bool
- name: Fail closed — refuse to render sshd without a known mesh address
ansible.builtin.assert:
that:
- base__ssh_listen_addr_resolved | length > 0
fail_msg: >-
base__ssh_listen_mesh_only is true but no mesh address resolved (set
base__ssh_listen_addr or ensure wt0 is up so its fact is gathered). Refusing to
render sshd ListenAddress empty (which would listen on ALL interfaces).
when: base__ssh_listen_mesh_only | bool
- name: Allow sshd to bind the mesh IP before wt0 exists at boot
ansible.posix.sysctl:
name: net.ipv4.ip_nonlocal_bind
value: "1"
sysctl_set: true
state: present
reload: true
sysctl_file: /etc/sysctl.d/60-boma-nonlocal-bind.conf
when: base__ssh_listen_mesh_only | bool
- name: Ensure openssh-server is installed
ansible.builtin.apt:
name: openssh-server

View file

@ -3,3 +3,6 @@ PasswordAuthentication {{ base__ssh_password_authentication }}
PermitRootLogin {{ base__ssh_permit_root_login }}
PubkeyAuthentication yes
KbdInteractiveAuthentication no
{% if base__ssh_listen_mesh_only | bool %}
ListenAddress {{ base__ssh_listen_addr_resolved }}
{% endif %}

View file

@ -11,7 +11,7 @@ module "askari" {
location = "hel1" # Helsinki
image = "debian-13"
ansible_ssh_pubkey = var.ansible_ssh_pubkey
ssh_admin_cidrs = var.ssh_admin_cidrs
ssh_admin_cidrs = [] # mesh-only: SSH is reached over wt0; WAN :22 retired (mesh-hardening 1/3)
public_web = true # Caddy 80/443 + NetBird 3478 (M4)
labels = {
env = "offsite"

View file

@ -26,12 +26,17 @@ resource "hcloud_ssh_key" "ansible" {
resource "hcloud_firewall" "this" {
name = "${var.name}-fw"
# SSH from the control node only.
rule {
direction = "in"
protocol = "tcp"
port = "22"
source_ips = var.ssh_admin_cidrs
# SSH from the control node only and only when admin CIDRs are set. An empty
# ssh_admin_cidrs removes the WAN :22 rule entirely (mesh-only SSH; reach the host over
# wt0, break-glass = Hetzner console). Mesh-hardening 1/3.
dynamic "rule" {
for_each = length(var.ssh_admin_cidrs) > 0 ? [1] : []
content {
direction = "in"
protocol = "tcp"
port = "22"
source_ips = var.ssh_admin_cidrs
}
}
# Public web (Caddy 80/443) + NetBird STUN/TURN (3478/udp) only when public_web

View file

@ -24,8 +24,9 @@ variable "ansible_ssh_pubkey" {
}
variable "ssh_admin_cidrs" {
description = "Source CIDRs allowed to reach SSH (e.g. ubongo's address/32)"
description = "Source CIDRs allowed to reach SSH over the WAN. Empty = no WAN SSH rule (mesh-only)."
type = list(string)
default = []
}
variable "public_web" {

View file

@ -97,3 +97,12 @@ def test_ingress_missing_port_raises():
cat = {"svc": {"host": "docker01", "ingress": [{"from": "lan"}]}}
with pytest.raises(ValueError):
fr.resolve_firewall_rules(cat, ZONES, "docker01", HOSTVARS, GROUPS)
def test_public_zone_resolves_to_anywhere():
catalog = {"web": {"host": "askari",
"ingress": [{"from": "public", "port": 443, "proto": "tcp"}]}}
zones = {"public": "0.0.0.0/0"}
rules = fr.resolve_firewall_rules(catalog, zones, "askari",
{"askari": {"ansible_host": "100.99.226.39"}}, {})
assert rules == [{"proto": "tcp", "port": 443, "sources": ["0.0.0.0/0"]}]