boma/docs/superpowers/plans/2026-06-14-base-ssh-fail2ban-m3.md
sjat cff368ece2 docs(spec,plan): M3 — base ssh hardening + fail2ban
ADR-002 baseline (key-only, no root, fail2ban 5/1h) as two base task files under
the existing 'hardening' concern tag; applied to askari by tag (NOT the host
firewall — that's mesh-gated to avoid lockout; Hetzner Cloud Firewall is the
perimeter until M5). NetBird agent deferred to M4. Adds a LIMIT=/TAGS= passthrough
to make check/deploy.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 16:38:38 +02:00


base SSH hardening + fail2ban (M3) Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Add SSH-hardening + fail2ban concerns to the base role (ADR-002 baseline) and apply them to askari — without locking anything out.

Architecture: Two new base task files (ssh.yml, fail2ban.yml), both under the existing hardening concern tag, included after firewall.yml. Applied to askari by tag (hardening) so the host firewall (default-deny) is NOT applied pre-mesh — the Hetzner Cloud Firewall remains askari's perimeter until M5. A LIMIT=/TAGS= passthrough on make check/deploy enables the targeted apply.

Tech Stack: Ansible (ansible.builtin, ansible.posix.authorized_key — already vendored), sshd drop-in config, fail2ban.

Spec: docs/superpowers/specs/2026-06-14-base-ssh-fail2ban-m3-design.md

Execution context: Tasks 1–3 are authoring plus Molecule testing (Docker available). Task 4 applies to live askari (gated; reachable from ubongo). No new billed resources.


Task 1: make check/deploy LIMIT + TAGS passthrough

Files: Modify Makefile (the check and deploy recipes).

  • Step 1: In the check: recipe, change the command line to:
	$(PLAYBOOK_BIN) $(INVENTORY) $(VAULT_ARGS) $(if $(LIMIT),--limit $(LIMIT)) $(if $(TAGS),--tags $(TAGS)) --check --diff playbooks/$(PLAYBOOK).yml
  • Step 2: In the deploy: recipe, change the command line to:
	$(PLAYBOOK_BIN) $(INVENTORY) $(VAULT_ARGS) $(if $(LIMIT),--limit $(LIMIT)) $(if $(TAGS),--tags $(TAGS)) playbooks/$(PLAYBOOK).yml
  • Step 3: Add help lines noting [LIMIT=<host>] [TAGS=<tags>] are optional on check/deploy.
  • Step 4: Sanity-check it parses: make check PLAYBOOK=dns LIMIT=control TAGS=public_dns 2>&1 | tail -2 (should run check-mode scoped to control). Expected: no make/syntax error.
  • Step 5: Commit:
git add Makefile
git commit -m "feat(make): optional LIMIT= and TAGS= passthrough on check/deploy"

(append Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>)
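
The $(if $(LIMIT),--limit $(LIMIT)) guards emit a flag only when the variable is non-empty, so a plain make check PLAYBOOK=dns behaves as before. A rough shell analogue of that expansion, as a sketch only: playbook_args is a hypothetical helper, not part of the Makefile, and it leaves out $(PLAYBOOK_BIN), $(INVENTORY) and $(VAULT_ARGS).

```shell
# Hypothetical helper: mirrors the Makefile's $(if $(VAR),--flag $(VAR)) guards.
# Usage: playbook_args <limit> <tags> <playbook>
playbook_args() {
  limit=$1
  tags=$2
  playbook=$3
  # ${var:+text} expands to text only when var is set and non-empty,
  # which is the same rule $(if ...) applies to LIMIT and TAGS.
  printf '%s\n' "${limit:+--limit $limit }${tags:+--tags $tags }playbooks/$playbook.yml"
}

playbook_args "" "" dns                  # playbooks/dns.yml
playbook_args control public_dns dns     # --limit control --tags public_dns playbooks/dns.yml
```

Leaving both variables empty yields the original argument list, which is why existing invocations need no change.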


Task 2: base hardening concern — ssh + fail2ban

Files: Create roles/base/tasks/ssh.yml, roles/base/tasks/fail2ban.yml, roles/base/templates/sshd_hardening.conf.j2, roles/base/templates/fail2ban_sshd.local.j2; modify roles/base/tasks/main.yml, roles/base/defaults/main.yml, roles/base/handlers/main.yml, inventories/production/group_vars/all/vars.yml.

  • Step 1: Append to roles/base/defaults/main.yml:

# SSH hardening + fail2ban (ADR-002) — `hardening` concern.
base__ssh_password_authentication: "no"
base__ssh_permit_root_login: "no"
base__fail2ban_maxretry: 5
base__fail2ban_bantime: 1h
base__fail2ban_findtime: 10m
# base__ssh_authorised_keys lives in group_vars/all/vars.yml (per-person control keys).
  • Step 2: Create roles/base/templates/sshd_hardening.conf.j2:
# Managed by Ansible (base role, ADR-002). Do not edit on the host.
PasswordAuthentication {{ base__ssh_password_authentication }}
PermitRootLogin {{ base__ssh_permit_root_login }}
PubkeyAuthentication yes
KbdInteractiveAuthentication no
  • Step 3: Create roles/base/templates/fail2ban_sshd.local.j2:
# Managed by Ansible (base role, ADR-002).
[sshd]
enabled  = true
maxretry = {{ base__fail2ban_maxretry }}
bantime  = {{ base__fail2ban_bantime }}
findtime = {{ base__fail2ban_findtime }}
  • Step 4: Create roles/base/tasks/ssh.yml:
---
- name: Ensure openssh-server is installed
  ansible.builtin.apt:
    name: openssh-server
    state: present
    update_cache: true

- name: Render hardened sshd drop-in
  ansible.builtin.template:
    src: sshd_hardening.conf.j2
    dest: /etc/ssh/sshd_config.d/10-boma.conf
    owner: root
    group: root
    mode: "0644"
  notify: reload sshd

- name: Validate the full sshd config (drop-in included)
  ansible.builtin.command: sshd -t
  changed_when: false

- name: Authorise control SSH keys for the ansible user
  ansible.posix.authorized_key:
    user: "{{ ansible_user | default('ansible') }}"
    key: "{{ base__ssh_authorised_keys | join('\n') }}"
    exclusive: true
  when: base__ssh_authorised_keys | default([]) | length > 0
  • Step 5: Create roles/base/tasks/fail2ban.yml:
---
- name: Install fail2ban
  ansible.builtin.apt:
    name: fail2ban
    state: present
    update_cache: true

- name: Configure the sshd jail
  ansible.builtin.template:
    src: fail2ban_sshd.local.j2
    dest: /etc/fail2ban/jail.d/sshd.local
    owner: root
    group: root
    mode: "0644"
  notify: restart fail2ban

- name: Enable and start fail2ban
  ansible.builtin.service:
    name: fail2ban
    enabled: true
    state: started
  • Step 6: Replace roles/base/handlers/main.yml:
---
- name: Reload sshd
  listen: reload sshd
  ansible.builtin.service:
    name: ssh
    state: reloaded

- name: Restart fail2ban
  listen: restart fail2ban
  ansible.builtin.service:
    name: fail2ban
    state: restarted
  • Step 7: In roles/base/tasks/main.yml, add after the firewall include:
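# import_tasks (static) so the hardening tag is inherited by every task in the file;
# a tag on include_tasks covers only the include itself.
- name: SSH hardening
  ansible.builtin.import_tasks: ssh.yml
  tags: [hardening]

- name: fail2ban intrusion deterrence
  ansible.builtin.import_tasks: fail2ban.yml
  tags: [hardening]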
  • Step 8: In inventories/production/group_vars/all/vars.yml, set base__ssh_authorised_keys (replace the empty []):
base__ssh_authorised_keys:
  - "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIKSx1TFLJ9H8vCe5ZJSu7MYmAiH0/OC8evloQjGR0Bqw claude@ubongo"
  • Step 9: make lint — expect 0 failure(s) + check-tags: OK (the hardening tag is already in tests/tags.yml).
  • Step 10: Commit:
git add roles/base inventories/production/group_vars/all/vars.yml
git commit -m "feat(base): ssh hardening + fail2ban (hardening concern, ADR-002)"

(Co-Authored-By trailer)
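
The jail defaults above read as "5 failures inside 10 minutes earns a 1 hour ban". A simplified model of how maxretry and findtime interact, as a sketch only: would_ban is a hypothetical helper, it is not fail2ban's implementation, and it assumes timestamps in seconds given in ascending order.

```shell
# Simplified model (NOT fail2ban's code): succeed when `maxretry` failures
# fall inside a window of `findtime` seconds.
# Usage: would_ban <maxretry> <findtime_seconds> <timestamp>...
would_ban() {
  maxretry=$1
  findtime=$2
  shift 2
  printf '%s\n' "$@" | awk -v n="$maxretry" -v w="$findtime" '
    { t[NR] = $1 }
    # Compare the newest failure with the one n-1 entries before it.
    NR >= n && t[NR] - t[NR - n + 1] <= w { hit = 1 }
    END { exit hit ? 0 : 1 }'
}

# findtime 10m = 600 s. Five failures one minute apart trip the jail:
would_ban 5 600 0 60 120 180 240 && echo "ban"
# Five failures spread over 800 s do not:
would_ban 5 600 0 200 400 600 800 || echo "no ban"
```

Under this model a slow guesser stays below the threshold, which is one reason the key-only sshd settings carry the real protection and fail2ban acts as a deterrent.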


Task 3: Molecule coverage

Files: Modify roles/base/molecule/default/converge.yml, roles/base/molecule/default/verify.yml.

  • Step 1: In converge.yml, the role already runs with base__firewall_apply: false. Leave base__ssh_authorised_keys unset (defaults to [] → the authorized_key task is skipped, no test user needed). No converge change needed unless vars are missing — confirm the play still has roles: [base].

  • Step 2: Append assertions to verify.yml (after the existing firewall checks):

    - name: sshd drop-in present and config valid
      ansible.builtin.command: sshd -t
      changed_when: false
      tags: [verify]

    - name: PasswordAuthentication is disabled
      ansible.builtin.command: grep -q '^PasswordAuthentication no' /etc/ssh/sshd_config.d/10-boma.conf
      changed_when: false
      tags: [verify]

    - name: fail2ban sshd jail configured
      ansible.builtin.command: grep -q '^\[sshd\]' /etc/fail2ban/jail.d/sshd.local
      changed_when: false
      tags: [verify]
  • Step 3: Run make test ROLE=base. Expected: converge installs openssh-server + fail2ban, renders the drop-ins, validates sshd, starts fail2ban; verify passes; idempotence clean. If fail2ban cannot start because the Molecule image lacks systemd, or if apt fails offline, capture the error (the image is systemd-enabled per molecule.yml).
  • Step 4: Commit:
git add roles/base/molecule
git commit -m "test(base): Molecule coverage for ssh hardening + fail2ban"

(Co-Authored-By trailer)
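
The verify tasks pass or fail purely on grep's exit code. The same patterns can be tried locally against sample files before running Molecule. This is a sketch: the fixture contents are hand-written stand-ins for the rendered templates, and check is a hypothetical helper.

```shell
# Hand-written stand-ins for the rendered files (assumed content, not real output).
fixture=$(mktemp -d)
trap 'rm -rf "$fixture"' EXIT

cat > "$fixture/10-boma.conf" <<'EOF'
PasswordAuthentication no
PermitRootLogin no
EOF

cat > "$fixture/sshd.local" <<'EOF'
[sshd]
enabled  = true
maxretry = 5
EOF

# check <pattern> <file>: exit 0 when the pattern matches, like the verify tasks.
check() {
  if grep -q "$1" "$2"; then
    echo "ok: $1"
  else
    echo "FAIL: $1"
    return 1
  fi
}

check '^PasswordAuthentication no' "$fixture/10-boma.conf"
check '^\[sshd\]' "$fixture/sshd.local"
```

A non-zero exit from grep is what makes the corresponding command task fail in verify.yml.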


Task 4: Apply to askari (gated — live host)

Runs against live askari (reachable from ubongo). Precondition: rbw is unlocked. Applies ONLY the hardening concern (--tags hardening) so the host firewall is not touched.

  • Step 1: Dry-run. make check PLAYBOOK=site LIMIT=askari TAGS=hardening — review: openssh-server present, sshd drop-in (PasswordAuthentication no, PermitRootLogin no), authorized_key for ansible, fail2ban installed + sshd jail. Confirm NO firewall tasks appear.
  • Step 2: Apply. make deploy PLAYBOOK=site LIMIT=askari TAGS=hardening — expect changed for the drop-in, fail2ban install/config; failed=0.
  • Step 3: Verify SSH still works (lock-out guard). Run .venv/bin/ansible offsite_hosts -m ping, then .venv/bin/ansible offsite_hosts -b -m command -a 'sshd -t' → rc=0.
  • Step 4: Verify fail2ban. .venv/bin/ansible offsite_hosts -b -m command -a 'fail2ban-client status sshd' → shows the sshd jail active.
  • Step 5: Idempotence. Re-run Step 2 → changed=0.
  • Step 6: No repo commit (configures the host, not the repo).
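
The post-apply checks (Steps 3 and 4) can be bundled so they always run together and in order. A minimal sketch: run, verify_askari and DRY_RUN are hypothetical names, and the default is to print the commands rather than execute them. The check and deploy steps are deliberately left out so the manual review of the dry-run stays in place.

```shell
# Hypothetical wrapper: prints the command when DRY_RUN=1 (default), runs it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Steps 3 and 4 of Task 4, chained so the first failure stops the sequence.
verify_askari() {
  run .venv/bin/ansible offsite_hosts -m ping &&
    run .venv/bin/ansible offsite_hosts -b -m command -a 'sshd -t' &&
    run .venv/bin/ansible offsite_hosts -b -m command -a 'fail2ban-client status sshd'
}

verify_askari
```

Setting DRY_RUN=0 executes the commands for real; the printed form drops the shell quoting, so it is for reading, not for copy-paste.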

Task 5: Docs

Files: Modify STATUS.md, docs/ROADMAP.md.

  • Step 1: In STATUS.md, update the roles/base/ row (under "Scaffolded but empty"/partial) to note the hardening concern (ssh + fail2ban) is now built, and applied to askari; firewall concern still pending application (mesh-gated). If askari's row exists in "Real and working today," append "SSH hardened + fail2ban (M3)".
  • Step 2: In docs/ROADMAP.md, mark M3 as done (ssh + fail2ban built + applied to askari; NetBird agent deferred to M4; host firewall + ubongo hardening at M5).
  • Step 3: make lint; commit:
git add STATUS.md docs/ROADMAP.md
git commit -m "docs(base): M3 — ssh hardening + fail2ban applied to askari; STATUS + roadmap"

(Co-Authored-By trailer)


Self-Review (completed)

  • Spec coverage: ssh + fail2ban concerns under hardening (Decision 1) → Task 2; apply-by-tag, no firewall (Decision 2) → Task 4 (TAGS=hardening); base__ssh_authorised_keys populated (Decision 3) → Task 2 Step 8; LIMIT/TAGS passthrough (Decision 4) → Task 1; ADR-002 controls (key-only, no root, fail2ban 5/1h) → Task 2; Molecule + live verify (testing) → Tasks 3, 4. Deferrals (agent/M4, host-fw+ubongo/M5, auditd/Phase 2) honoured.
  • Placeholder scan: none — all task/template/handler content is concrete.
  • Name consistency: base__ssh_* / base__fail2ban_* / base__ssh_authorised_keys used identically across defaults, templates, tasks, and group_vars; handler listen-topics (reload sshd, restart fail2ban) match the notify: strings.
  • Lock-out guard: sshd hardening only disables password+root (we use key+sudo); the ansible user's key is preserved (base__ssh_authorised_keys has it); sshd -t validates before reload; firewall untouched (--tags hardening). Task 4 verifies SSH post-apply.
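
The name-consistency item can be checked mechanically rather than by eye. A sketch, assuming single-line notify: and listen: entries as used in this plan (list-form notify is not handled); unmatched_notifies is a hypothetical helper and the demo files are stand-ins for the role's real task and handler files.

```shell
# Hypothetical check: print every notify: topic that no handler listens for.
# Usage: unmatched_notifies <tasks_dir> <handlers_file>
unmatched_notifies() {
  tasks_dir=$1
  handlers_file=$2
  notified=$(sed -n 's/^ *notify: *//p' "$tasks_dir"/*.yml | sort -u)
  listened=$(sed -n 's/^ *listen: *//p' "$handlers_file" | sort -u)
  # -x whole-line, -F fixed strings, -v invert: keep topics with no listener.
  printf '%s\n' "$notified" | grep -vxF -e "$listened" || true
}

# Demo against stand-in files shaped like the ones in Task 2.
demo=$(mktemp -d)
mkdir -p "$demo/tasks" "$demo/handlers"
printf '  notify: reload sshd\n' > "$demo/tasks/ssh.yml"
printf '  notify: restart fail2ban\n' > "$demo/tasks/fail2ban.yml"
printf '  listen: reload sshd\n  listen: restart fail2ban\n' > "$demo/handlers/main.yml"
unmatched_notifies "$demo/tasks" "$demo/handlers/main.yml"   # prints nothing: all topics match
rm -rf "$demo"
```

Any line it prints is a topic that would be notified without a listening handler.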