From 9d787a4f5332fbdb58ca272f93c2bd50db24f1d6 Mon Sep 17 00:00:00 2001
From: sjat
Date: Sun, 14 Jun 2026 16:55:22 +0200
Subject: [PATCH] =?UTF-8?q?docs(base):=20M3=20done=20=E2=80=94=20ssh=20har?=
 =?UTF-8?q?dening=20+=20fail2ban=20applied=20to=20askari;=20STATUS=20+=20r?=
 =?UTF-8?q?oadmap?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 STATUS.md       |  4 ++--
 docs/ROADMAP.md | 21 ++++++++++++---------
 2 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/STATUS.md b/STATUS.md
index 2e4afe7..9936753 100644
--- a/STATUS.md
+++ b/STATUS.md
@@ -30,13 +30,13 @@ _Last reviewed: 2026-06-14._
 | `make check` / `make deploy PLAYBOOK=` | **Works.** First end-to-end run (applying `dev_env`) surfaced + fixed latent bugs: Makefile `PLAYBOOK` var collision (binary path vs playbook-name arg) meant the targets never ran; `ansible.cfg` referenced uninstalled community.general callbacks (now built-in `default` + `ansible.posix.profile_tasks`); `acl` package added so Ansible can `become_user` an unprivileged user. The make targets now function — though `site`/`base`/`docker_host` content is still incomplete (see below). |
 | `roles/public_dns/` + `playbooks/dns.yml` | **Built + applied.** Manages wingu.me at Gandi LiveDNS as code (`community.general.gandi_livedns`, PAT from `vault.gandi.pat`); record data, anti-spoof baseline (SPF `-all` + DMARC reject), and the Gandi-defaults purge are defined + unit-tested (`tests/test_public_dns.py`). **Applied to wingu.me (2026-06-14):** purged Gandi's 13 seeded defaults; zone now holds only the SPF + DMARC TXT records; idempotent re-run clean. No null-MX (Gandi rejects `0 .`) — the MX is removed, so no MX + no apex A = no mail. M1 of the roadmap. |
 | `ubongo` — physical control / AI-worker host (ADR-015) | **Built (partial).** Debian 13.5 on a Lenovo M70q (i3-10100T, 16 GB, 256 GB SSD; no disk encryption — accepted risk). Full toolchain installed + pinned to `fisi` (Docker 29.5.3, rbw 1.15.0, Claude Code 2.1.173, ansible-core 2.17.14 + molecule via `make setup`/`make collections`). Repo cloned under a dedicated `claude` user (docker group, no sudo). Vault works via rbw (offline-cache decryption verified). SSH key-only (password + root login disabled). In the production inventory `control` group at 10.20.10.151. **`dev_env` now applied here** (zsh/tmux/nvim for `sjat` + `claude`, via `playbooks/workstation.yml`). Managed as the operator account `sjat` (`group_vars/control` sets `ansible_user: sjat`), not the `ansible` service user `group_vars/all` assumes — ubongo has no bootstrapped `ansible` user. **Pending:** NetBird mesh enrollment (so SSH is LAN-only); full `base` hardening (only the `firewall` concern exists, and it is NOT applied here — applying default-deny with no mesh would lock out inbound SSH on the physical NIC); proper `ansible`-user bootstrap (currently managed as `sjat`); OPNsense DHCP reservation for 10.20.10.151 (MAC `88:a4:c2:e0:ee:da`); Terraform state backup (now relevant — the offsite tfstate exists). |
-| `askari` — off-site Hetzner VPS (ADR-007/016, M2) | **Built + applied.** Provisioned by Terraform (`environments/offsite`, `hetznercloud/hcloud`) as **cx23 / hel1 / Debian 13.5** (CAX11/ARM was out of stock EU-wide on 2026-06-14 → cx23 is same-spec x86, cheaper). cloud-init created the `ansible` user + passwordless sudo; a TF-managed Hetzner Cloud Firewall allows SSH only from ubongo's WAN (`91.226.145.80`). Reachable from ubongo (`ansible offsite_hosts -m ping` ✓), in the `offsite_hosts` inventory (generated `offsite.yml`), published at `askari.wingu.me` → `77.42.120.136`. **Pending:** `base` hardening (M3), NetBird coordinator (M4), offsite tfstate backup (ADR-022). |
+| `askari` — off-site Hetzner VPS (ADR-007/016, M2) | **Built + applied.** Provisioned by Terraform (`environments/offsite`, `hetznercloud/hcloud`) as **cx23 / hel1 / Debian 13.5** (CAX11/ARM was out of stock EU-wide on 2026-06-14 → cx23 is same-spec x86, cheaper). cloud-init created the `ansible` user + passwordless sudo; a TF-managed Hetzner Cloud Firewall allows SSH only from ubongo's WAN (`91.226.145.80`). Reachable from ubongo (`ansible offsite_hosts -m ping` ✓), in the `offsite_hosts` inventory (generated `offsite.yml`), published at `askari.wingu.me` → `77.42.120.136`. **SSH-hardened + fail2ban (M3 `hardening` concern applied).** **Pending:** NetBird coordinator (M4), host firewall + mesh enrollment (M5), offsite tfstate backup (ADR-022). |
 
 ## Scaffolded but empty — NOT implemented
 
 | Thing | State |
 |---|---|
-| `roles/base/` | **Partially built.** The `firewall` concern is implemented (nftables: catalog-driven default-deny + east-west allowlist + auto-rollback apply; ADR-020) with pytest + Molecule render/syntax tests. Other concerns (SSH hardening, fail2ban, auditd, packages, users) are **not** built yet, so `make deploy PLAYBOOK=site` has no real content to apply (the make target itself now works — see "Real and working today"). |
+| `roles/base/` | **Partially built.** Concerns built: `firewall` (nftables: catalog-driven default-deny + east-west allowlist + auto-rollback apply; ADR-020) and **`hardening`** (M3: sshd drop-in key-only + `PermitRootLogin no`, fail2ban sshd jail 5/1h; ADR-002) — both pytest/Molecule-tested. The **`hardening`** concern is **applied to askari** (`make deploy PLAYBOOK=site LIMIT=askari TAGS=hardening`). The `firewall` concern is built but **not yet applied** to any host (mesh-gated to avoid lockout — M5). Not built: auditd, packages, users (Phase 2 / TODO 15). |
 | `roles/docker_host/` | **Scaffolded, no tasks.** In git (meta/README/molecule filled), wired into `playbooks/site.yml` so the standard state is expressed end-to-end and `make lint` covers it, but it has no tasks yet — applying it is a no-op. Planned scope (Docker engine + Compose, daemon hardening, `nftables.d` container rules) in ADR-004/ADR-020. |
 | `inventories/*/hosts.yml` | Structured stubs with empty host maps (`hosts: {}`); regenerated by `make tf-inventory` once Terraform has hosts |
 | `inventories/production/group_vars/{docker_hosts,proxmox_hosts}/` | Empty dirs |
diff --git a/docs/ROADMAP.md b/docs/ROADMAP.md
index b879d00..d7cb860 100644
--- a/docs/ROADMAP.md
+++ b/docs/ROADMAP.md
@@ -92,17 +92,20 @@ it. Design: `docs/superpowers/specs/2026-06-14-askari-provisioning-design.md`.
 - **Amends:** ADR-006 (TF scope), ADR-009 (offsite handoff), ADR-020 (Hetzner Cloud
   Firewall = perimeter), ADR-007/016 (`askari` TF-provisioned, not "added manually").
 
-### M3 · `base` matured to a "remote-access-sufficient" subset
+### M3 · `base` matured to a "remote-access-sufficient" subset — ✅ DONE
 
-Today `base` is firewall-only. Add the subset a real, internet-facing host needs:
-**SSH hardening + fail2ban + the NetBird agent task**. Full CIS L1/L2, auditd, AppArmor,
-AIDE are deferred to Phase 2.
+Added the `hardening` concern to `base` (sshd drop-in key-only + `PermitRootLogin no`;
+fail2ban sshd jail 5/1h; ADR-002) and **applied it to askari** by tag
+(`make deploy PLAYBOOK=site LIMIT=askari TAGS=hardening`) — SSH still works, fail2ban
+active. Full CIS L1/L2, auditd, AppArmor, AIDE remain deferred to Phase 2 (TODO 15).
 
-- **Why a subset:** `askari` is public (Hetzner) — it must be SSH-hardened and firewalled
-  *with* exposure, but the full hardening standard is not on the critical path to mobile
-  access.
-- **Maps to:** ADR-002 (security baseline), ADR-016 (agent enrollment lives in `base`),
-  ADR-020 (firewall — already built), TODO 15 (the rest of hardening → Phase 2).
+- **NetBird agent → M4** (deferred from M3: it enrolls against the coordinator, which
+  doesn't exist until M4 — ADR-016's coordinator-first bootstrap order).
+- **Host firewall on askari + ubongo hardening → M5** (applying default-deny pre-mesh
+  would lock out SSH; the Hetzner Cloud Firewall is askari's perimeter until then).
+- **Spec/plan:** `docs/superpowers/{specs,plans}/2026-06-14-base-ssh-fail2ban-m3*`.
+- **Maps to:** ADR-002 (security baseline), ADR-020 (firewall — built, not yet applied),
+  TODO 15 (the rest of hardening → Phase 2).
 
 ### M4 · NetBird control plane on `askari` — first real service role