docs(base): M3 done — ssh hardening + fail2ban applied to askari; STATUS + roadmap
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
parent db1e5db138
commit 9d787a4f53
2 changed files with 14 additions and 11 deletions
@@ -30,13 +30,13 @@ _Last reviewed: 2026-06-14._
| `make check` / `make deploy PLAYBOOK=<name>` | **Works.** First end-to-end run (applying `dev_env`) surfaced + fixed latent bugs: Makefile `PLAYBOOK` var collision (binary path vs playbook-name arg) meant the targets never ran; `ansible.cfg` referenced uninstalled community.general callbacks (now built-in `default` + `ansible.posix.profile_tasks`); `acl` package added so Ansible can `become_user` an unprivileged user. The make targets now function — though `site`/`base`/`docker_host` content is still incomplete (see below). |
| `roles/public_dns/` + `playbooks/dns.yml` | **Built + applied.** Manages wingu.me at Gandi LiveDNS as code (`community.general.gandi_livedns`, PAT from `vault.gandi.pat`); record data, anti-spoof baseline (SPF `-all` + DMARC reject), and the Gandi-defaults purge are defined + unit-tested (`tests/test_public_dns.py`). **Applied to wingu.me (2026-06-14):** purged Gandi's 13 seeded defaults; zone now holds only the SPF + DMARC TXT records; idempotent re-run clean. No null-MX (Gandi rejects `0 .`) — the MX is removed, so no MX + no apex A = no mail. M1 of the roadmap. |
| `ubongo` — physical control / AI-worker host (ADR-015) | **Built (partial).** Debian 13.5 on a Lenovo M70q (i3-10100T, 16 GB, 256 GB SSD; no disk encryption — accepted risk). Full toolchain installed + pinned to `fisi` (Docker 29.5.3, rbw 1.15.0, Claude Code 2.1.173, ansible-core 2.17.14 + molecule via `make setup`/`make collections`). Repo cloned under a dedicated `claude` user (docker group, no sudo). Vault works via rbw (offline-cache decryption verified). SSH key-only (password + root login disabled). In the production inventory `control` group at 10.20.10.151. **`dev_env` now applied here** (zsh/tmux/nvim for `sjat` + `claude`, via `playbooks/workstation.yml`). Managed as the operator account `sjat` (`group_vars/control` sets `ansible_user: sjat`), not the `ansible` service user `group_vars/all` assumes — ubongo has no bootstrapped `ansible` user. **Pending:** NetBird mesh enrollment (so SSH is LAN-only); full `base` hardening (only the `firewall` concern exists, and it is NOT applied here — applying default-deny with no mesh would lock out inbound SSH on the physical NIC); proper `ansible`-user bootstrap (currently managed as `sjat`); OPNsense DHCP reservation for 10.20.10.151 (MAC `88:a4:c2:e0:ee:da`); Terraform state backup (now relevant — the offsite tfstate exists). |
| `askari` — off-site Hetzner VPS (ADR-007/016, M2) | **Built + applied.** Provisioned by Terraform (`environments/offsite`, `hetznercloud/hcloud`) as **cx23 / hel1 / Debian 13.5** (CAX11/ARM was out of stock EU-wide on 2026-06-14 → cx23 is same-spec x86, cheaper). cloud-init created the `ansible` user + passwordless sudo; a TF-managed Hetzner Cloud Firewall allows SSH only from ubongo's WAN (`91.226.145.80`). Reachable from ubongo (`ansible offsite_hosts -m ping` ✓), in the `offsite_hosts` inventory (generated `offsite.yml`), published at `askari.wingu.me` → `77.42.120.136`. **Pending:** `base` hardening (M3), NetBird coordinator (M4), offsite tfstate backup (ADR-022). |
| `askari` — off-site Hetzner VPS (ADR-007/016, M2) | **Built + applied.** Provisioned by Terraform (`environments/offsite`, `hetznercloud/hcloud`) as **cx23 / hel1 / Debian 13.5** (CAX11/ARM was out of stock EU-wide on 2026-06-14 → cx23 is same-spec x86, cheaper). cloud-init created the `ansible` user + passwordless sudo; a TF-managed Hetzner Cloud Firewall allows SSH only from ubongo's WAN (`91.226.145.80`). Reachable from ubongo (`ansible offsite_hosts -m ping` ✓), in the `offsite_hosts` inventory (generated `offsite.yml`), published at `askari.wingu.me` → `77.42.120.136`. **SSH-hardened + fail2ban (M3 `hardening` concern applied).** **Pending:** NetBird coordinator (M4), host firewall + mesh enrollment (M5), offsite tfstate backup (ADR-022). |
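
As a minimal sketch of the anti-spoof baseline named in the `public_dns` row above (SPF `-all` plus DMARC reject, no MX), the two TXT records for a domain that sends no mail can be written out as follows. The function name `anti_spoof_records` is hypothetical, and the role's actual record data is not shown here; the values are simply the shortest standard forms of those two policies.

```python
# Hypothetical helper, not the role's real data structure.
def anti_spoof_records(domain):
    """Return the TXT records for a domain that sends no mail, keyed by record name."""
    return {
        # SPF hard-fail: no host is authorized to send mail as this domain.
        domain: "v=spf1 -all",
        # DMARC policy: receivers should reject mail that fails authentication.
        f"_dmarc.{domain}": "v=DMARC1; p=reject",
    }


for name, value in anti_spoof_records("wingu.me").items():
    print(f'{name} TXT "{value}"')
```

With no MX and no apex A record alongside these two TXT records, receivers have nowhere to deliver mail for the domain, which matches the "no mail" outcome described in the row.
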
## Scaffolded but empty — NOT implemented
| Thing | State |
|---|---|
| `roles/base/` | **Partially built.** The `firewall` concern is implemented (nftables: catalog-driven default-deny + east-west allowlist + auto-rollback apply; ADR-020) with pytest + Molecule render/syntax tests. Other concerns (SSH hardening, fail2ban, auditd, packages, users) are **not** built yet, so `make deploy PLAYBOOK=site` has no real content to apply (the make target itself now works — see "Real and working today"). |
| `roles/base/` | **Partially built.** Concerns built: `firewall` (nftables: catalog-driven default-deny + east-west allowlist + auto-rollback apply; ADR-020) and **`hardening`** (M3: sshd drop-in key-only + `PermitRootLogin no`, fail2ban sshd jail 5/1h; ADR-002) — both pytest/Molecule-tested. The **`hardening`** concern is **applied to askari** (`make deploy PLAYBOOK=site LIMIT=askari TAGS=hardening`). The `firewall` concern is built but **not yet applied** to any host (mesh-gated to avoid lockout — M5). Not built: auditd, packages, users (Phase 2 / TODO 15). |
| `roles/docker_host/` | **Scaffolded, no tasks.** In git (meta/README/molecule filled), wired into `playbooks/site.yml` so the standard state is expressed end-to-end and `make lint` covers it, but it has no tasks yet — applying it is a no-op. Planned scope (Docker engine + Compose, daemon hardening, `nftables.d` container rules) in ADR-004/ADR-020. |
| `inventories/*/hosts.yml` | Structured stubs with empty host maps (`hosts: {}`); regenerated by `make tf-inventory` once Terraform has hosts |
| `inventories/production/group_vars/{docker_hosts,proxmox_hosts}/` | Empty dirs |
@@ -92,17 +92,20 @@ it. Design: `docs/superpowers/specs/2026-06-14-askari-provisioning-design.md`.
- **Amends:** ADR-006 (TF scope), ADR-009 (offsite handoff), ADR-020 (Hetzner Cloud
Firewall = perimeter), ADR-007/016 (`askari` TF-provisioned, not "added manually").
### M3 · `base` matured to a "remote-access-sufficient" subset
### M3 · `base` matured to a "remote-access-sufficient" subset — ✅ DONE
Today `base` is firewall-only. Add the subset a real, internet-facing host needs:
**SSH hardening + fail2ban + the NetBird agent task**. Full CIS L1/L2, auditd, AppArmor,
AIDE are deferred to Phase 2.
Added the `hardening` concern to `base` (sshd drop-in key-only + `PermitRootLogin no`;
fail2ban sshd jail 5/1h; ADR-002) and **applied it to askari** by tag
(`make deploy PLAYBOOK=site LIMIT=askari TAGS=hardening`) — SSH still works, fail2ban
active. Full CIS L1/L2, auditd, AppArmor, AIDE remain deferred to Phase 2 (TODO 15).
- **Why a subset:** `askari` is public (Hetzner) — it must be SSH-hardened and firewalled
*with* exposure, but the full hardening standard is not on the critical path to mobile
access.
- **Maps to:** ADR-002 (security baseline), ADR-016 (agent enrollment lives in `base`),
ADR-020 (firewall — already built), TODO 15 (the rest of hardening → Phase 2).
- **NetBird agent → M4** (deferred from M3: it enrolls against the coordinator, which
doesn't exist until M4 — ADR-016's coordinator-first bootstrap order).
- **Host firewall on askari + ubongo hardening → M5** (applying default-deny pre-mesh
would lock out SSH; the Hetzner Cloud Firewall is askari's perimeter until then).
- **Spec/plan:** `docs/superpowers/{specs,plans}/2026-06-14-base-ssh-fail2ban-m3*`.
- **Maps to:** ADR-002 (security baseline), ADR-020 (firewall — built, not yet applied),
TODO 15 (the rest of hardening → Phase 2).
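
The `hardening` settings described in M3 can be spot-checked with a small script. The following is a sketch only, not the repo's pytest or Molecule tests: the names `parse_sshd_config`, `hardening_violations` and `REQUIRED` are hypothetical, and reading "key-only" as `PasswordAuthentication no` is an assumption. The fail2ban jail is left out because its exact parameters are not spelled out here.

```python
# Hypothetical check: does sshd_config-style text enforce the two settings
# the hardening concern is described as applying?

# Assumption: "key-only" is read as PasswordAuthentication no.
REQUIRED = {"passwordauthentication": "no", "permitrootlogin": "no"}


def parse_sshd_config(text):
    """Return {lowercased keyword: value}, keeping the first value per keyword.

    sshd uses the first value it obtains for each keyword. Match blocks and
    the key=value spelling are not handled in this sketch.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        value = parts[1].strip() if len(parts) > 1 else ""
        settings.setdefault(parts[0].lower(), value)
    return settings


def hardening_violations(text):
    """List every required setting that is missing or has the wrong value."""
    settings = parse_sshd_config(text)
    return [
        f"{key} should be {want!r}, found {settings.get(key)!r}"
        for key, want in REQUIRED.items()
        if settings.get(key, "").lower() != want
    ]


sample = "# hypothetical drop-in\nPasswordAuthentication no\nPermitRootLogin no\n"
print(hardening_violations(sample))  # prints []
```

The same function could be fed the effective configuration printed by `sshd -T` on the host, which also covers values set outside the drop-in.
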
### M4 · NetBird control plane on `askari` — first real service role