docs(spec,plan): M3 — base ssh hardening + fail2ban
ADR-002 baseline (key-only, no root, fail2ban 5/1h) as two base task files under the existing 'hardening' concern tag; applied to askari by tag (NOT the host firewall: that's mesh-gated to avoid lockout; Hetzner Cloud Firewall is the perimeter until M5). NetBird agent deferred to M4. Adds a LIMIT=/TAGS= passthrough to make check/deploy.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
parent a1c0f4814b
commit cff368ece2
2 changed files with 332 additions and 0 deletions

docs/superpowers/plans/2026-06-14-base-ssh-fail2ban-m3.md
@ -0,0 +1,250 @@

# base SSH hardening + fail2ban (M3) Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Add SSH-hardening + fail2ban concerns to the `base` role (ADR-002 baseline) and apply them to askari without locking anything out.

**Architecture:** Two new `base` task files (`ssh.yml`, `fail2ban.yml`), both under the existing `hardening` concern tag, included after `firewall.yml`. They are applied to askari **by tag** (`hardening`) so that the host firewall (default-deny) is NOT applied pre-mesh; the Hetzner Cloud Firewall remains askari's perimeter until M5. A `LIMIT=`/`TAGS=` passthrough on `make check/deploy` enables the targeted apply.

**Tech Stack:** Ansible (`ansible.builtin`, plus `ansible.posix.authorized_key`, already vendored), sshd drop-in config, fail2ban.

**Spec:** `docs/superpowers/specs/2026-06-14-base-ssh-fail2ban-m3-design.md`

**Execution context:** Tasks 1–3 are authoring plus Molecule runs (Docker available). **Task 4 applies to live askari** (gated; reachable from ubongo). Task 5 is docs only. No new billed resources.

---

### Task 1: `make check/deploy` LIMIT + TAGS passthrough

**Files:** Modify `Makefile` (the `check` and `deploy` recipes).

- [ ] **Step 1:** In the `check:` recipe, change the command line to:

```makefile
$(PLAYBOOK_BIN) $(INVENTORY) $(VAULT_ARGS) $(if $(LIMIT),--limit $(LIMIT)) $(if $(TAGS),--tags $(TAGS)) --check --diff playbooks/$(PLAYBOOK).yml
```

- [ ] **Step 2:** In the `deploy:` recipe, change the command line to:

```makefile
$(PLAYBOOK_BIN) $(INVENTORY) $(VAULT_ARGS) $(if $(LIMIT),--limit $(LIMIT)) $(if $(TAGS),--tags $(TAGS)) playbooks/$(PLAYBOOK).yml
```

- [ ] **Step 3:** Add help lines noting `[LIMIT=<host>] [TAGS=<tags>]` are optional on check/deploy.

- [ ] **Step 4:** Sanity-check it parses: `make check PLAYBOOK=dns LIMIT=control TAGS=public_dns 2>&1 | tail -2` (should run check-mode scoped to control). Expected: no make/syntax error.

- [ ] **Step 5:** Commit:

```bash
git add Makefile
git commit -m "feat(make): optional LIMIT= and TAGS= passthrough on check/deploy"
```

(append `Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>`)

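
The passthrough in Steps 1 and 2 relies on `$(if $(VAR),...)` expanding to nothing when the variable is empty, so plain `make check PLAYBOOK=...` behaves as before. A minimal shell sketch of the same rule follows; `playbook_args` and the `site` default are hypothetical, introduced only for illustration, and are not part of the repo.

```bash
#!/usr/bin/env bash
# Sketch only: playbook_args is a hypothetical helper, not part of the repo.
# It mirrors $(if $(VAR),--flag $(VAR)): a flag is emitted only for non-empty vars.
playbook_args() {
  local -a args=()
  if [ -n "${LIMIT:-}" ]; then args+=(--limit "$LIMIT"); fi
  if [ -n "${TAGS:-}" ]; then args+=(--tags "$TAGS"); fi
  # PLAYBOOK falls back to "site" here purely for the demo.
  args+=("playbooks/${PLAYBOOK:-site}.yml")
  printf '%s\n' "${args[*]}"
}

LIMIT=askari TAGS=hardening playbook_args
# prints: --limit askari --tags hardening playbooks/site.yml
LIMIT='' TAGS='' playbook_args
# prints: playbooks/site.yml
```

Because an empty variable contributes no arguments at all, existing invocations that never set `LIMIT` or `TAGS` should keep their current command line.
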
---
### Task 2: base `hardening` concern — ssh + fail2ban

**Files:** Create `roles/base/tasks/ssh.yml`, `roles/base/tasks/fail2ban.yml`, `roles/base/templates/sshd_hardening.conf.j2`, `roles/base/templates/fail2ban_sshd.local.j2`; modify `roles/base/tasks/main.yml`, `roles/base/defaults/main.yml`, `roles/base/handlers/main.yml`, `inventories/production/group_vars/all/vars.yml`.

- [ ] **Step 1:** Append to `roles/base/defaults/main.yml`:

```yaml

# SSH hardening + fail2ban (ADR-002): `hardening` concern.
base__ssh_password_authentication: "no"
base__ssh_permit_root_login: "no"
base__fail2ban_maxretry: 5
base__fail2ban_bantime: 1h
base__fail2ban_findtime: 10m
# base__ssh_authorised_keys lives in group_vars/all/vars.yml (per-person control keys).
```

- [ ] **Step 2:** Create `roles/base/templates/sshd_hardening.conf.j2`:

```
# Managed by Ansible (base role, ADR-002). Do not edit on the host.
PasswordAuthentication {{ base__ssh_password_authentication }}
PermitRootLogin {{ base__ssh_permit_root_login }}
PubkeyAuthentication yes
KbdInteractiveAuthentication no
```

- [ ] **Step 3:** Create `roles/base/templates/fail2ban_sshd.local.j2`:

```
# Managed by Ansible (base role, ADR-002).
[sshd]
enabled = true
maxretry = {{ base__fail2ban_maxretry }}
bantime = {{ base__fail2ban_bantime }}
findtime = {{ base__fail2ban_findtime }}
```

- [ ] **Step 4:** Create `roles/base/tasks/ssh.yml`:

```yaml
---
- name: Ensure openssh-server is installed
  ansible.builtin.apt:
    name: openssh-server
    state: present
    update_cache: true

- name: Render hardened sshd drop-in
  ansible.builtin.template:
    src: sshd_hardening.conf.j2
    dest: /etc/ssh/sshd_config.d/10-boma.conf
    owner: root
    group: root
    mode: "0644"
  notify: reload sshd

# Fails the play before the reload handler runs if the config is invalid.
- name: Validate the full sshd config (drop-in included)
  ansible.builtin.command: sshd -t
  changed_when: false

# exclusive: true removes every key that is not in base__ssh_authorised_keys.
# default([]) keeps the task skippable where the variable is not defined at all.
- name: Authorise control SSH keys for the ansible user
  ansible.posix.authorized_key:
    user: "{{ ansible_user | default('ansible') }}"
    key: "{{ base__ssh_authorised_keys | join('\n') }}"
    exclusive: true
  when: base__ssh_authorised_keys | default([]) | length > 0
```

- [ ] **Step 5:** Create `roles/base/tasks/fail2ban.yml`:

```yaml
---
- name: Install fail2ban
  ansible.builtin.apt:
    name: fail2ban
    state: present
    update_cache: true

- name: Configure the sshd jail
  ansible.builtin.template:
    src: fail2ban_sshd.local.j2
    dest: /etc/fail2ban/jail.d/sshd.local
    owner: root
    group: root
    mode: "0644"
  notify: restart fail2ban

- name: Enable and start fail2ban
  ansible.builtin.service:
    name: fail2ban
    enabled: true
    state: started
```

- [ ] **Step 6:** Replace `roles/base/handlers/main.yml`:

```yaml
---
- name: Reload sshd
  listen: reload sshd
  ansible.builtin.service:
    name: ssh
    state: reloaded

- name: Restart fail2ban
  listen: restart fail2ban
  ansible.builtin.service:
    name: fail2ban
    state: restarted
```

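- [ ] **Step 7:** In `roles/base/tasks/main.yml`, add after the firewall include. Tags set on a dynamic `include_tasks` cover only the include itself, so `apply:` passes `hardening` down to the included tasks; without it, `--tags hardening` would run the include and skip its contents:

```yaml
- name: SSH hardening
  ansible.builtin.include_tasks:
    file: ssh.yml
    apply:
      tags: [hardening]
  tags: [hardening]

- name: fail2ban intrusion deterrence
  ansible.builtin.include_tasks:
    file: fail2ban.yml
    apply:
      tags: [hardening]
  tags: [hardening]
```
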
- [ ] **Step 8:** In `inventories/production/group_vars/all/vars.yml`, set `base__ssh_authorised_keys` (replace the empty `[]`):

```yaml
base__ssh_authorised_keys:
  - "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIKSx1TFLJ9H8vCe5ZJSu7MYmAiH0/OC8evloQjGR0Bqw claude@ubongo"
```

- [ ] **Step 9:** Run `make lint`; expect `0 failure(s)` and `check-tags: OK` (the `hardening` tag is already in `tests/tags.yml`).

- [ ] **Step 10:** Commit:

```bash
git add roles/base inventories/production/group_vars/all/vars.yml
git commit -m "feat(base): ssh hardening + fail2ban (hardening concern, ADR-002)"
```

(Co-Authored-By trailer)

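
Step 4 uses `exclusive: true`, which removes every key not listed in `base__ssh_authorised_keys`, so the key the control host actually logs in with has to be in that list before the first apply. The sketch below shows one way such a pre-check could look. `key_is_authorised` is a hypothetical helper, the key blobs are placeholders, and lines that begin with key options (such as `from="..."`) are deliberately not handled.

```bash
#!/usr/bin/env bash
# Sketch only: key_is_authorised is a hypothetical helper, not part of the repo.
# Succeeds if the public key (type + base64 blob, comment ignored) appears in
# the given authorized_keys-style file. Option-prefixed lines are not handled.
key_is_authorised() {
  local file="$1" pubkey="$2" want_type want_blob
  read -r want_type want_blob _ <<< "$pubkey"
  awk -v t="$want_type" -v b="$want_blob" \
    '$1 == t && $2 == b { found = 1; exit } END { exit !found }' "$file"
}

# Demo with placeholder key material (not real keys).
demo_file="$(mktemp)"
printf '%s\n' 'ssh-ed25519 AAAAC3ExampleBlobOne control@example' > "$demo_file"

if key_is_authorised "$demo_file" 'ssh-ed25519 AAAAC3ExampleBlobOne some-other-comment'; then
  echo "key present: exclusive apply keeps it"
fi
if ! key_is_authorised "$demo_file" 'ssh-ed25519 AAAAC3ExampleBlobTwo control@example'; then
  echo "key missing: exclusive apply would lock this key out"
fi
```

The comparison ignores the trailing comment on purpose, since the same key is often stored with different comments on different hosts.
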
---
### Task 3: Molecule coverage

**Files:** Modify `roles/base/molecule/default/converge.yml`, `roles/base/molecule/default/verify.yml`.

- [ ] **Step 1:** In `converge.yml`, the role already runs with `base__firewall_apply: false`. Leave `base__ssh_authorised_keys` unset (defaults to `[]`, so the `authorized_key` task is skipped and no test user is needed). No converge change is needed unless vars are missing; confirm the play still has `roles: [base]`.

- [ ] **Step 2:** Append assertions to `verify.yml` (after the existing firewall checks):

```yaml
- name: sshd drop-in present and config valid
  ansible.builtin.command: sshd -t
  changed_when: false
  tags: [verify]

- name: PasswordAuthentication is disabled
  ansible.builtin.command: grep -q '^PasswordAuthentication no' /etc/ssh/sshd_config.d/10-boma.conf
  changed_when: false
  tags: [verify]

- name: fail2ban sshd jail configured
  ansible.builtin.command: grep -q '^\[sshd\]' /etc/fail2ban/jail.d/sshd.local
  changed_when: false
  tags: [verify]
```

- [ ] **Step 3:** Run `make test ROLE=base`. Expected: converge installs openssh-server + fail2ban, renders the drop-ins, validates sshd, and starts fail2ban; verify passes; idempotence is clean. If fail2ban cannot start because the Molecule image lacks a usable systemd, or apt fails offline, capture the error (the image is systemd-enabled per `molecule.yml`).

- [ ] **Step 4:** Commit:

```bash
git add roles/base/molecule
git commit -m "test(base): Molecule coverage for ssh hardening + fail2ban"
```

(Co-Authored-By trailer)

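
The verify step above greps the drop-in file itself. sshd keeps the first value it obtains for each keyword, and files matched by an `Include` glob are read in lexical order, so `10-boma.conf` wins over any later-sorting drop-in that sets the same keyword. The sketch below models that first-match rule on a scratch directory. `first_sshd_value` is a hypothetical helper, and the `50-cloud-init.conf` file is an assumed example of a competing drop-in, not something the plan confirms exists on askari. On a real host, `sshd -T` prints the effective configuration.

```bash
#!/usr/bin/env bash
# Sketch only: first_sshd_value is a hypothetical helper, not part of the repo.
# Prints the first value found for a keyword across *.conf files in a directory,
# scanning files in lexical (glob) order; keyword matching is case-insensitive.
first_sshd_value() {
  local dir="$1" keyword="$2" f
  for f in "$dir"/*.conf; do
    [ -f "$f" ] || continue
    awk -v k="$keyword" '
      tolower($1) == tolower(k) { print $2; found = 1; exit }
      END { exit !found }
    ' "$f" && return 0
  done
  return 1
}

# Demo: an assumed later drop-in tries to re-enable passwords; the earlier file wins.
demo_dir="$(mktemp -d)"
printf 'PasswordAuthentication no\nPermitRootLogin no\n' > "$demo_dir/10-boma.conf"
printf 'PasswordAuthentication yes\n' > "$demo_dir/50-cloud-init.conf"

first_sshd_value "$demo_dir" PasswordAuthentication
# prints: no
```

This is why the low `10-` prefix matters: a drop-in that sorted after an existing file setting `PasswordAuthentication yes` would pass the grep-based check while leaving password logins enabled.
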
---
### Task 4: Apply to askari (gated — live host)

> Runs against live askari (reachable from ubongo). `rbw` unlocked. Applies ONLY the
> `hardening` concern (`--tags hardening`) so the host firewall is not touched.

- [ ] **Step 1: Dry-run.** `make check PLAYBOOK=site LIMIT=askari TAGS=hardening`. Review: openssh-server present, sshd drop-in (`PasswordAuthentication no`, `PermitRootLogin no`), authorized_key for `ansible`, fail2ban installed + sshd jail. Confirm NO firewall tasks appear. On a first run, the fail2ban template and service tasks may fail in check mode because the package install is only simulated; that alone does not indicate a problem with the role.

- [ ] **Step 2: Apply.** `make deploy PLAYBOOK=site LIMIT=askari TAGS=hardening`. Expect `changed` for the drop-in and the fail2ban install/config; `failed=0`.

- [ ] **Step 3: Verify SSH still works (lock-out guard).** `.venv/bin/ansible offsite_hosts -m ping` → `pong`. And `.venv/bin/ansible offsite_hosts -b -m command -a 'sshd -t'` → rc=0.

- [ ] **Step 4: Verify fail2ban.** `.venv/bin/ansible offsite_hosts -b -m command -a 'fail2ban-client status sshd'` → shows the sshd jail active.

- [ ] **Step 5: Idempotence.** Re-run Step 2 → `changed=0`.

- [ ] **Step 6: No repo commit** (configures the host, not the repo).

---
### Task 5: Docs

**Files:** Modify `STATUS.md`, `docs/ROADMAP.md`.

- [ ] **Step 1:** In `STATUS.md`, update the `roles/base/` row (under "Scaffolded but empty"/partial) to note the `hardening` concern (ssh + fail2ban) is now built, and **applied to askari**; firewall concern still pending application (mesh-gated). If askari's row exists in "Real and working today," append "SSH hardened + fail2ban (M3)".

- [ ] **Step 2:** In `docs/ROADMAP.md`, mark **M3** as done (ssh + fail2ban built + applied to askari; NetBird agent deferred to M4; host firewall + ubongo hardening at M5).

- [ ] **Step 3:** `make lint`; commit:

```bash
git add STATUS.md docs/ROADMAP.md
git commit -m "docs(base): M3 - ssh hardening + fail2ban applied to askari; STATUS + roadmap"
```

(Co-Authored-By trailer)

---
## Self-Review (completed)

- **Spec coverage:** ssh + fail2ban concerns under `hardening` (Decision 1) → Task 2;
  apply-by-tag, no firewall (Decision 2) → Task 4 (`TAGS=hardening`); `base__ssh_authorised_keys`
  populated (Decision 3) → Task 2 Step 8; LIMIT/TAGS passthrough (Decision 4) → Task 1;
  ADR-002 controls (key-only, no root, fail2ban 5/1h) → Task 2; Molecule + live verify
  (testing) → Tasks 3, 4. Deferrals (agent/M4, host-fw+ubongo/M5, auditd/Phase 2) honoured.
- **Placeholder scan:** none; all task/template/handler content is concrete.
- **Name consistency:** `base__ssh_*` / `base__fail2ban_*` / `base__ssh_authorised_keys`
  used identically across defaults, templates, tasks, and group_vars; handler listen-topics
  (`reload sshd`, `restart fail2ban`) match the `notify:` strings.
- **Lock-out guard:** sshd hardening only disables password+root (we use key+sudo); the
  `ansible` user's key is preserved (`base__ssh_authorised_keys` has it); `sshd -t`
  validates before reload; firewall untouched (`--tags hardening`). Task 4 verifies SSH
  post-apply.


docs/superpowers/specs/2026-06-14-base-ssh-fail2ban-m3-design.md
@ -0,0 +1,82 @@

# Design — `base` SSH hardening + fail2ban (M3)

- **Date:** 2026-06-14
- **Status:** Draft → straight to plan (design is ADR-derived; per the standing
  skip-the-spec-review-gate agreement)
- **Roadmap milestone:** M3 (`docs/ROADMAP.md`), the "remote-access-sufficient" `base` subset
- **Implements:** ADR-002 (SSH key-only, `PermitRootLogin no`, fail2ban 5-fails/1-h)
- **Amends:** none (uses decided ADRs); touches ADR-021 only by reference

---

## Problem

`askari` is a **public** host now (M2) but only cloud-init-hardened. The `base` role so
far implements only the `firewall` concern. M3 adds the SSH-hardening + fail2ban concerns
to `base` and applies them to `askari` (the minimum needed to make a public host
remote-access-safe) without locking anything out.

## Decisions (as settled)

1. **Scope:** two new `base` task files, **`ssh.yml`** (sshd hardening) and
   **`fail2ban.yml`**, both under the **existing `hardening` concern tag** (already in
   `tests/tags.yml` / ADR-019: "sshd config, fail2ban, auditd, sysctl"), so no vocab change.
   NetBird agent enrollment → **M4** (needs the coordinator; ADR-016 bootstrap order).
   auditd / full CIS L1+L2 → **Phase 2** (TODO 15). ubongo's `base` apply + the host
   firewall on askari → **M5** (when the mesh exists).
2. **Apply only the `ssh` + `fail2ban` task files to askari (via the `hardening` tag), NOT
   the host firewall.** Applying `base`'s default-deny nftables to askari pre-mesh would
   block the WAN SSH from ubongo (the firewall allows SSH only on `wt0` + from the *LAN*
   `base__firewall_control_addr`, neither of which matches askari's WAN path) → lockout.
   The **Hetzner Cloud Firewall** (M2) is askari's perimeter until M5; the host firewall
   lands with the mesh.
3. **`base__ssh_authorised_keys` is populated** with ubongo's control key
   (`claude@ubongo`) so that the `ssh` concern's `authorized_keys` management doesn't
   remove the cloud-init-installed key and lock us out. (It is a public key, set in
   `group_vars/all`.)
4. **`make check`/`deploy` gain `LIMIT=` + `TAGS=` passthrough** so a concern subset can
   be applied to one host (e.g. `make deploy PLAYBOOK=site LIMIT=askari TAGS=hardening`).

## ADR-002 controls implemented

- **sshd:** `PasswordAuthentication no`, `PermitRootLogin no`, `PubkeyAuthentication yes`,
  `KbdInteractiveAuthentication no` (the current name for the deprecated
  `ChallengeResponseAuthentication`); the `ansible` user's `authorized_keys` from
  `base__ssh_authorised_keys`. Validate config (`sshd -t`) before reload; reload via handler.
- **fail2ban:** installed + enabled; `sshd` jail, **maxretry 5, bantime 1 h**, findtime
  10 m (knobs in defaults).

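
To make the fail2ban knobs concrete: a client is banned once it reaches `maxretry` failures within the `findtime` window, and the ban then lasts `bantime`. The sketch below converts the abbreviated durations to seconds and applies that rule. `to_seconds` and `would_ban` are hypothetical helpers written for this illustration; they handle only a single `s`/`m`/`h`/`d` suffix, which is a simplification of the time formats fail2ban itself accepts.

```bash
#!/usr/bin/env bash
# Sketch only: to_seconds and would_ban are hypothetical helpers.
# to_seconds converts a single-unit duration (600, 10m, 1h, 1d) into seconds.
to_seconds() {
  local value="$1" number unit
  number="${value%[smhd]}"        # strip one trailing unit letter, if any
  unit="${value#"$number"}"       # whatever was stripped ("" when no unit)
  case "$number" in ''|*[!0-9]*) return 1 ;; esac
  case "$unit" in
    ''|s) echo $(( 10#$number )) ;;
    m)    echo $(( 10#$number * 60 )) ;;
    h)    echo $(( 10#$number * 3600 )) ;;
    d)    echo $(( 10#$number * 86400 )) ;;
  esac
}

# would_ban FAILURES ELAPSED_SECONDS MAXRETRY FINDTIME
# Succeeds when FAILURES reached MAXRETRY within the FINDTIME window.
would_ban() {
  local failures="$1" elapsed="$2" maxretry="$3" window
  window="$(to_seconds "$4")" || return 2
  [ "$failures" -ge "$maxretry" ] && [ "$elapsed" -le "$window" ]
}

to_seconds 1h     # prints: 3600
to_seconds 10m    # prints: 600
would_ban 5 120 5 10m && echo "ban for $(to_seconds 1h) seconds"
```

With the values above, five failed logins inside ten minutes trigger a one-hour ban, while the same five failures spread over a longer period do not.
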
## Implementation

- `roles/base/tasks/ssh.yml` (tag `hardening`): render `/etc/ssh/sshd_config.d/10-boma.conf`
  (drop-in, validated), manage the `ansible` user's `authorized_keys` **only when
  `base__ssh_authorised_keys` is non-empty** (so Molecule, with empty keys, skips it and
  doesn't need a test user); notify "reload sshd".
- `roles/base/tasks/fail2ban.yml` (tag `hardening`): apt-install `fail2ban`, render a
  `jail.d/sshd.local`, enable+start the service; notify "restart fail2ban".
- `roles/base/tasks/main.yml`: `include_tasks` both (each `tags: [hardening]`) after firewall.
- `roles/base/defaults/main.yml`: `base__ssh_*` + `base__fail2ban_*` knobs.
- `roles/base/handlers/main.yml`: `reload sshd` and `restart fail2ban` (listen-topics); the
  sshd config is validated before the reload.
- `inventories/production/group_vars/all/vars.yml`: populate `base__ssh_authorised_keys`
  with `claude@ubongo`'s control key.
- `Makefile`: `$(if $(LIMIT),--limit $(LIMIT)) $(if $(TAGS),--tags $(TAGS))` on check/deploy.
- Molecule: extend the `base` scenario to converge `ssh` + `fail2ban` (with
  `base__firewall_apply: false`) and verify (`sshd -t` clean, fail2ban jail present).

## Testing

- **Molecule** (Debian 13 container): converge ssh + fail2ban; verify sshd drop-in valid,
  `PasswordAuthentication no` present, fail2ban sshd jail configured. (firewall stays
  `apply:false`.)
- **Live on askari** (gated): `make check` (review) →
  `make deploy PLAYBOOK=site LIMIT=askari TAGS=hardening` → **verify SSH still works**
  (`ansible offsite_hosts -m ping` after) → confirm `fail2ban-client status sshd`.
  Lock-out guard: the `ansible` user keeps key auth throughout (we only disable
  password/root, which we don't use).

## Scope boundaries — what M3 is NOT

- Not the NetBird agent (M4), not the host firewall on askari or ubongo hardening (M5),
  not auditd / CIS L1+L2 (Phase 2), not `unattended-upgrades` (deferred: it is part of the
  ADR-002 baseline, but lands in Phase 2 with the rest of the OS-update story / ADR-011).

## Open items (resolve in the plan)

- Whether to also apply to ubongo now (it's already manually key-only) or wait for M5:
  default **wait** (avoid any risk to the box I run on; bring it under `base` with the mesh).