boma/docs/superpowers/plans/2026-06-14-base-ssh-fail2ban-m3.md
sjat cff368ece2 docs(spec,plan): M3 — base ssh hardening + fail2ban
ADR-002 baseline (key-only, no root, fail2ban 5/1h) as two base task files under
the existing 'hardening' concern tag; applied to askari by tag (NOT the host
firewall — that's mesh-gated to avoid lockout; Hetzner Cloud Firewall is the
perimeter until M5). NetBird agent deferred to M4. Adds a LIMIT=/TAGS= passthrough
to make check/deploy.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 16:38:38 +02:00


# base SSH hardening + fail2ban (M3) Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Add SSH-hardening + fail2ban concerns to the `base` role (ADR-002 baseline) and apply them to askari — without locking anything out.
**Architecture:** Two new `base` task files (`ssh.yml`, `fail2ban.yml`), both under the existing `hardening` concern tag, included after `firewall.yml`. Applied to askari **by tag** (`hardening`) so the host firewall (default-deny) is NOT applied pre-mesh — the Hetzner Cloud Firewall remains askari's perimeter until M5. A `LIMIT=`/`TAGS=` passthrough on `make check/deploy` enables the targeted apply.
**Tech Stack:** Ansible (`ansible.builtin`, `ansible.posix.authorized_key` — already vendored), sshd drop-in config, fail2ban.
**Spec:** `docs/superpowers/specs/2026-06-14-base-ssh-fail2ban-m3-design.md`
**Execution context:** Tasks 1-3 cover authoring and Molecule testing (Docker available). **Task 4 applies to live askari** (gated; reachable from ubongo). No new billed resources.
---
### Task 1: `make check/deploy` LIMIT + TAGS passthrough
**Files:** Modify `Makefile` (the `check` and `deploy` recipes).
- [ ] **Step 1:** In the `check:` recipe, change the command line to:
```makefile
$(PLAYBOOK_BIN) $(INVENTORY) $(VAULT_ARGS) $(if $(LIMIT),--limit $(LIMIT)) $(if $(TAGS),--tags $(TAGS)) --check --diff playbooks/$(PLAYBOOK).yml
```
- [ ] **Step 2:** In the `deploy:` recipe, change the command line to:
```makefile
$(PLAYBOOK_BIN) $(INVENTORY) $(VAULT_ARGS) $(if $(LIMIT),--limit $(LIMIT)) $(if $(TAGS),--tags $(TAGS)) playbooks/$(PLAYBOOK).yml
```
- [ ] **Step 3:** Add help lines noting `[LIMIT=<host>] [TAGS=<tags>]` are optional on check/deploy.
- [ ] **Step 4:** Sanity-check it parses: `make check PLAYBOOK=dns LIMIT=control TAGS=public_dns 2>&1 | tail -2` (should run check-mode scoped to control). Expected: no make/syntax error.
- [ ] **Step 5:** Commit:
```bash
git add Makefile
git commit -m "feat(make): optional LIMIT= and TAGS= passthrough on check/deploy"
```
(append `Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>`)
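
The `$(if $(LIMIT),--limit $(LIMIT))` form emits the flag only when the variable is non-empty, so a plain `make check PLAYBOOK=site` should behave as before. A minimal shell sketch of the same expansion, for illustration only; `playbook_args` is a hypothetical helper and not part of the Makefile:

```bash
# Illustration only: mimic how the recipe's two $(if ...) calls expand.
# playbook_args is a hypothetical name, not something defined in the repo.
playbook_args() {
  args=""
  # Add each flag only when its variable is set and non-empty,
  # which is what $(if $(VAR),--flag $(VAR)) does in Make.
  if [ -n "${LIMIT:-}" ]; then args="$args --limit $LIMIT"; fi
  if [ -n "${TAGS:-}" ]; then args="$args --tags $TAGS"; fi
  # Strip the leading space left by the concatenation above.
  printf '%s\n' "${args# }"
}
```

With `LIMIT=askari TAGS=hardening` set, the function prints `--limit askari --tags hardening`; with neither set it prints an empty line, so no extra arguments reach the playbook command.
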
---
### Task 2: base `hardening` concern — ssh + fail2ban
**Files:** Create `roles/base/tasks/ssh.yml`, `roles/base/tasks/fail2ban.yml`, `roles/base/templates/sshd_hardening.conf.j2`, `roles/base/templates/fail2ban_sshd.local.j2`; modify `roles/base/tasks/main.yml`, `roles/base/defaults/main.yml`, `roles/base/handlers/main.yml`, `inventories/production/group_vars/all/vars.yml`.
- [ ] **Step 1:** Append to `roles/base/defaults/main.yml`:
```yaml
# SSH hardening + fail2ban (ADR-002) — `hardening` concern.
base__ssh_password_authentication: "no"
base__ssh_permit_root_login: "no"
base__fail2ban_maxretry: 5
base__fail2ban_bantime: 1h
base__fail2ban_findtime: 10m
# base__ssh_authorised_keys lives in group_vars/all/vars.yml (per-person control keys).
```
- [ ] **Step 2:** Create `roles/base/templates/sshd_hardening.conf.j2`:
```
# Managed by Ansible (base role, ADR-002). Do not edit on the host.
PasswordAuthentication {{ base__ssh_password_authentication }}
PermitRootLogin {{ base__ssh_permit_root_login }}
PubkeyAuthentication yes
KbdInteractiveAuthentication no
```
- [ ] **Step 3:** Create `roles/base/templates/fail2ban_sshd.local.j2`:
```
# Managed by Ansible (base role, ADR-002).
[sshd]
enabled = true
maxretry = {{ base__fail2ban_maxretry }}
bantime = {{ base__fail2ban_bantime }}
findtime = {{ base__fail2ban_findtime }}
```
- [ ] **Step 4:** Create `roles/base/tasks/ssh.yml`:
```yaml
---
- name: Ensure openssh-server is installed
  ansible.builtin.apt:
    name: openssh-server
    state: present
    update_cache: true

- name: Render hardened sshd drop-in
  ansible.builtin.template:
    src: sshd_hardening.conf.j2
    dest: /etc/ssh/sshd_config.d/10-boma.conf
    owner: root
    group: root
    mode: "0644"
  notify: reload sshd

- name: Validate the full sshd config (drop-in included)
  ansible.builtin.command: sshd -t
  changed_when: false

- name: Authorise control SSH keys for the ansible user
  ansible.posix.authorized_key:
    user: "{{ ansible_user | default('ansible') }}"
    key: "{{ base__ssh_authorised_keys | join('\n') }}"
    exclusive: true
  when: base__ssh_authorised_keys | length > 0
```
- [ ] **Step 5:** Create `roles/base/tasks/fail2ban.yml`:
```yaml
---
- name: Install fail2ban
  ansible.builtin.apt:
    name: fail2ban
    state: present
    update_cache: true

- name: Configure the sshd jail
  ansible.builtin.template:
    src: fail2ban_sshd.local.j2
    dest: /etc/fail2ban/jail.d/sshd.local
    owner: root
    group: root
    mode: "0644"
  notify: restart fail2ban

- name: Enable and start fail2ban
  ansible.builtin.service:
    name: fail2ban
    enabled: true
    state: started
```
- [ ] **Step 6:** Replace `roles/base/handlers/main.yml`:
```yaml
---
- name: Reload sshd
  listen: reload sshd
  ansible.builtin.service:
    name: ssh
    state: reloaded

- name: Restart fail2ban
  listen: restart fail2ban
  ansible.builtin.service:
    name: fail2ban
    state: restarted
```
- [ ] **Step 7:** In `roles/base/tasks/main.yml`, add after the firewall include:
```yaml
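- name: SSH hardening
  ansible.builtin.include_tasks:
    file: ssh.yml
    apply:
      tags: [hardening]  # dynamic includes do not pass their own tags down
  tags: [hardening]
- name: fail2ban intrusion deterrence
  ansible.builtin.include_tasks:
    file: fail2ban.yml
    apply:
      tags: [hardening]  # without this, --tags hardening skips the inner tasks
  tags: [hardening]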
```
- [ ] **Step 8:** In `inventories/production/group_vars/all/vars.yml`, set `base__ssh_authorised_keys` (replace the empty `[]`):
```yaml
base__ssh_authorised_keys:
  - "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIKSx1TFLJ9H8vCe5ZJSu7MYmAiH0/OC8evloQjGR0Bqw claude@ubongo"
```
- [ ] **Step 9:** `make lint` — expect `0 failure(s)` + `check-tags: OK` (the `hardening` tag is already in `tests/tags.yml`).
- [ ] **Step 10:** Commit:
```bash
git add roles/base inventories/production/group_vars/all/vars.yml
git commit -m "feat(base): ssh hardening + fail2ban (hardening concern, ADR-002)"
```
(Co-Authored-By trailer)
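
Because the `authorized_key` task uses `exclusive: true`, any key missing from `base__ssh_authorised_keys` is removed from the target user. Before applying, it may be worth confirming that the key the controller connects with is in the list. A minimal sketch, assuming the list has been written one key per line to a file and carries no `authorized_keys` option prefixes; `key_is_authorised` and both file arguments are hypothetical:

```bash
# key_is_authorised PUBKEY_FILE LIST_FILE
# Succeeds when the key in PUBKEY_FILE (type + base64 blob) appears in LIST_FILE.
# The trailing comment field is ignored, since it is not part of the key itself.
# Returns 2 when PUBKEY_FILE holds no parsable key, 1 when the key is absent.
key_is_authorised() {
  want=$(awk 'NF >= 2 { print $1 " " $2; exit }' "$1")
  [ -n "$want" ] || return 2
  awk -v want="$want" '
    ($1 " " $2) == want { found = 1 }
    END { exit !found }
  ' "$2"
}
```

A possible use is `key_is_authorised ~/.ssh/id_ed25519.pub keys.txt || echo "controller key missing"`, where both paths are placeholders for whatever the operator actually uses.
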
---
### Task 3: Molecule coverage
**Files:** Modify `roles/base/molecule/default/converge.yml`, `roles/base/molecule/default/verify.yml`.
- [ ] **Step 1:** In `converge.yml`, the role already runs with `base__firewall_apply: false`. Leave `base__ssh_authorised_keys` unset (defaults to `[]` → the `authorized_key` task is skipped, no test user needed). No converge change needed unless vars are missing — confirm the play still has `roles: [base]`.
- [ ] **Step 2:** Append assertions to `verify.yml` (after the existing firewall checks):
```yaml
- name: sshd drop-in present and config valid
  ansible.builtin.command: sshd -t
  changed_when: false
  tags: [verify]

- name: PasswordAuthentication is disabled
  ansible.builtin.command: grep -q '^PasswordAuthentication no' /etc/ssh/sshd_config.d/10-boma.conf
  changed_when: false
  tags: [verify]

- name: fail2ban sshd jail configured
  ansible.builtin.command: grep -q '^\[sshd\]' /etc/fail2ban/jail.d/sshd.local
  changed_when: false
  tags: [verify]
```
- [ ] **Step 3:** Run `make test ROLE=base`. Expected: converge installs openssh-server + fail2ban, renders the drop-ins, validates sshd, starts fail2ban; verify passes; idempotence clean. If fail2ban fails to start under systemd in the Molecule image, or apt fails because the run is offline, capture the error (the image is systemd-enabled per `molecule.yml`).
- [ ] **Step 4:** Commit:
```bash
git add roles/base/molecule
git commit -m "test(base): Molecule coverage for ssh hardening + fail2ban"
```
(Co-Authored-By trailer)
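
The `grep` assertions confirm what the drop-in file contains, not what sshd resolves: sshd uses the first value it obtains for each keyword, so an earlier file could still win. `sshd -T` prints the effective configuration with lowercase keywords. A minimal sketch that checks that output from stdin; `check_effective_sshd` is a hypothetical helper, and it assumes no `Match` blocks change the result for the connecting user:

```bash
# check_effective_sshd: read `sshd -T` output on stdin and fail unless
# both ADR-002 settings are in effect. sshd -T prints lowercase keywords.
check_effective_sshd() {
  awk '
    $1 == "passwordauthentication" && $2 == "no" { pw = 1 }
    $1 == "permitrootlogin"        && $2 == "no" { root = 1 }
    END { exit !(pw && root) }
  '
}
```

On a host this could be run as `sshd -T | check_effective_sshd` (as root), and the same pipeline could back an extra `verify.yml` task if the file-level checks turn out to be insufficient.
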
---
### Task 4: Apply to askari (gated — live host)
> Runs against live askari (reachable from ubongo). `rbw` unlocked. Applies ONLY the
> `hardening` concern (`--tags hardening`) so the host firewall is not touched.
- [ ] **Step 1: Dry-run.** `make check PLAYBOOK=site LIMIT=askari TAGS=hardening` — review: openssh-server present, sshd drop-in (`PasswordAuthentication no`, `PermitRootLogin no`), authorized_key for `ansible`, fail2ban installed + sshd jail. Confirm NO firewall tasks appear.
- [ ] **Step 2: Apply.** `make deploy PLAYBOOK=site LIMIT=askari TAGS=hardening` — expect changed for the drop-in, fail2ban install/config; `failed=0`.
- [ ] **Step 3: Verify SSH still works (lock-out guard).** `.venv/bin/ansible offsite_hosts -m ping` → `pong`. Then `.venv/bin/ansible offsite_hosts -b -m command -a 'sshd -t'` → rc=0.
- [ ] **Step 4: Verify fail2ban.** `.venv/bin/ansible offsite_hosts -b -m command -a 'fail2ban-client status sshd'` → shows the sshd jail active.
- [ ] **Step 5: Idempotence.** Re-run Step 2 → `changed=0`.
- [ ] **Step 6: No repo commit** (configures the host, not the repo).
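
The post-apply checks in Steps 3 and 4 can be chained so that a failed SSH check stops the run before the fail2ban check. A minimal sketch using the same commands; `verify_askari` and the `RUN` hook are hypothetical (leave `RUN` unset to execute, set `RUN=echo` to preview the commands):

```bash
# verify_askari: run the Task 4 verification commands in order and
# stop at the first failure. RUN is a hypothetical preview hook.
verify_askari() {
  # An unset or empty RUN expands to nothing, so the command runs directly.
  ${RUN:-} .venv/bin/ansible offsite_hosts -m ping || return 1
  ${RUN:-} .venv/bin/ansible offsite_hosts -b -m command -a 'sshd -t' || return 1
  ${RUN:-} .venv/bin/ansible offsite_hosts -b -m command -a 'fail2ban-client status sshd'
}
```

Keeping the existing SSH session open while this runs leaves a way back in if the ping step fails.
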
---
### Task 5: Docs
**Files:** Modify `STATUS.md`, `docs/ROADMAP.md`.
- [ ] **Step 1:** In `STATUS.md`, update the `roles/base/` row (under "Scaffolded but empty"/partial) to note the `hardening` concern (ssh + fail2ban) is now built, and **applied to askari**; firewall concern still pending application (mesh-gated). If askari's row exists in "Real and working today," append "SSH hardened + fail2ban (M3)".
- [ ] **Step 2:** In `docs/ROADMAP.md`, mark **M3** as done (ssh + fail2ban built + applied to askari; NetBird agent deferred to M4; host firewall + ubongo hardening at M5).
- [ ] **Step 3:** `make lint`; commit:
```bash
git add STATUS.md docs/ROADMAP.md
git commit -m "docs(base): M3 — ssh hardening + fail2ban applied to askari; STATUS + roadmap"
```
(Co-Authored-By trailer)
---
## Self-Review (completed)
- **Spec coverage:** ssh + fail2ban concerns under `hardening` (Decision 1) → Task 2;
apply-by-tag, no firewall (Decision 2) → Task 4 (`TAGS=hardening`); `base__ssh_authorised_keys`
populated (Decision 3) → Task 2 Step 8; LIMIT/TAGS passthrough (Decision 4) → Task 1;
ADR-002 controls (key-only, no root, fail2ban 5/1h) → Task 2; Molecule + live verify
(testing) → Tasks 3, 4. Deferrals (agent/M4, host-fw+ubongo/M5, auditd/Phase 2) honoured.
- **Placeholder scan:** none — all task/template/handler content is concrete.
- **Name consistency:** `base__ssh_*` / `base__fail2ban_*` / `base__ssh_authorised_keys`
used identically across defaults, templates, tasks, and group_vars; handler listen-topics
(`reload sshd`, `restart fail2ban`) match the `notify:` strings.
- **Lock-out guard:** sshd hardening only disables password+root (we use key+sudo); the
`ansible` user's key is preserved (`base__ssh_authorised_keys` has it); `sshd -t`
validates before reload; firewall untouched (`--tags hardening`). Task 4 verifies SSH
post-apply.