sjat f51ae1a13d docs(runbook): integration-testing runbook + pre-flight cross-links
- New docs/runbooks/integration-testing.md: when to use (firewall/
  sshd/boot/Docker changes); make test-integration commands; lower-
  level driver sub-commands; cert tier guidance; diagnostics dir;
  VM inspection (virsh console / SSH); safety invariants; resource
  constraints; adding a new profile; self-validating acceptance test.
- docs/runbooks/new-host.md: pre-flight warning before deploying
  lockout-risky changes (firewall/sshd/boot) while break-glass is open
- docs/runbooks/new-role.md: step 13 pre-flight for lockout-risky roles

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 12:59:06 +02:00


Runbook — Adding a new Ansible role

When to create a new role

Create a new role when you need to manage a distinct, reusable unit of configuration — a service, a system component, or a behaviour applied to a group of hosts.

Do not create a role for a single task that logically belongs in an existing role.

Procedure

1. Scaffold the role

```sh
make new-role NAME=<rolename>
```

This creates the full directory structure and placeholder files under roles/<rolename>/.

2. Fill in meta/main.yml

```yaml
galaxy_info:
  role_name: <rolename>
  author: <your name>
  description: <one sentence>
  min_ansible_version: "2.15"
  platforms:
    - name: Debian
      versions:
        - trixie  # Debian 13
```

3. Define defaults

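Add all tuneable variables to defaults/main.yml, with an inline comment explaining each one. Namespace every variable as <rolename>__<varname> (role name, double underscore, variable name) so that variables from different roles cannot collide.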
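A minimal sketch of defaults/main.yml for a hypothetical role named myservice. The variable names and values are illustrative assumptions, not taken from any existing role; the point is the namespacing and the one-comment-per-variable layout.

```yaml
---
# roles/myservice/defaults/main.yml
# "myservice" is a hypothetical role name used for illustration only.

# Name of the Debian package that provides the service.
myservice__package: myservice

# TCP port the service listens on.
myservice__port: 8080

# Directory holding the service's persistent data.
myservice__data_dir: /var/lib/myservice

# Extra key/value pairs rendered into the configuration file.
myservice__extra_config: {}
```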

4. Write tasks

  • Use fully qualified collection names (FQCN) for all modules, e.g. ansible.builtin.template rather than template
  • Every task must have a name: that reads as a sentence
  • Every task must have at least one tags: entry
  • Notify handlers by their listen: topic string, not by handler name
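The task rules above can be sketched for a hypothetical myservice role as follows. The package, template, tag, and topic names are assumptions. The handler subscribes to a topic with listen:, and tasks notify that topic, so the handler's name can change later without editing any task.

```yaml
---
# roles/myservice/tasks/main.yml (hypothetical example)
- name: Install the myservice package
  ansible.builtin.apt:
    name: "{{ myservice__package }}"
    state: present
  tags:
    - myservice
    - myservice_install

- name: Render the myservice configuration file
  ansible.builtin.template:
    src: myservice.conf.j2          # hypothetical template file
    dest: /etc/myservice/myservice.conf
    owner: root
    group: root
    mode: "0644"
  # Notify the topic string, not the handler's name.
  notify: restart myservice
  tags:
    - myservice
    - myservice_config

---
# roles/myservice/handlers/main.yml (hypothetical example)
- name: Restart the myservice service
  ansible.builtin.systemd:
    name: myservice
    state: restarted
  # Any task that notifies "restart myservice" triggers this handler.
  listen: restart myservice
```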

5. Configure Molecule

Edit molecule/default/molecule.yml to use the Debian 13 test image. Write a converge.yml that applies the role. Write a verify.yml that asserts the expected state.
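A minimal sketch of the three Molecule files for a hypothetical myservice role. It assumes the Docker driver (from the molecule-plugins package); <debian-13-test-image> is a placeholder for whichever Debian 13 image the repository uses, assumed to already contain python3 and systemd. The verify step here only asserts that a package named myservice is installed; a real role should assert more of its expected state.

```yaml
---
# roles/myservice/molecule/default/molecule.yml
driver:
  name: docker                      # assumption: the repository may use another driver
platforms:
  - name: instance
    image: <debian-13-test-image>   # placeholder, not a real image name
    pre_build_image: true
provisioner:
  name: ansible
verifier:
  name: ansible

---
# roles/myservice/molecule/default/converge.yml
- name: Converge
  hosts: all
  become: true
  roles:
    - role: myservice

---
# roles/myservice/molecule/default/verify.yml
- name: Verify
  hosts: all
  become: true
  gather_facts: false
  tasks:
    - name: Gather the installed package facts
      ansible.builtin.package_facts:
        manager: apt

    - name: Assert that the myservice package is installed
      ansible.builtin.assert:
        that:
          - "'myservice' in ansible_facts.packages"
        fail_msg: The myservice package is not installed.
```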

6. Write the README

Document:

  • Purpose of the role (one paragraph)
  • All variables from defaults/main.yml with types, defaults, and descriptions
  • Example playbook usage
  • Any dependencies or prerequisites

7. Test locally

```sh
make test ROLE=<rolename>
```

Fix any lint or test failures before committing.

8. Add to a playbook

Add the role to the appropriate playbook in playbooks/ and add the host group to inventories/staging/hosts.yml for integration testing.
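A sketch of both changes for a hypothetical myservice role, assuming one playbook per service and a matching inventory group; the playbook file name, group name, and host name are all assumptions, and the repository may group roles differently.

```yaml
---
# playbooks/myservice.yml (hypothetical file name)
- name: Apply the myservice role
  hosts: myservice
  become: true
  roles:
    - role: myservice

---
# inventories/staging/hosts.yml (excerpt; the host name is hypothetical)
all:
  children:
    myservice:
      hosts:
        staging-myservice-01:
```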

9. Write the per-service security record (services)

For a service role, copy docs/security/service-security-template.md to roles/<rolename>/SECURITY.md and fill it in: exposure, checklist status (from docs/security/service-checklist.md), service-specific hardening, and any residual or accepted risks. Filling in the Checklist status section is how the service clears the security bar; record any conscious deviation in docs/security/accepted-risks.md. ADR-002 establishes the bar. Enforcement is manual, in review, for now; the planned /security-review will automate it by aggregating every roles/*/SECURITY.md.

10. Write the per-service verification spec (services)

For a service role, copy docs/testing/service-verify-template.md to roles/<rolename>/VERIFY.md and fill it in: the critical user journeys that define "working" for this service, what good looks like, what is not browser-verifiable (→ manual handoff), and the test data needed. This is the per-service backbone for the Level 4 /verify-service check (ADR-008 / ADR-017) and is part of the pre-production service-clearance gate (docs/security/service-checklist.md).

11. Write the per-service operational-access record (services)

For a service role, copy docs/access/service-access-template.md to roles/<rolename>/ACCESS.md and populate the role's access__* data: access__service, access__compose_project/_path, access__containers, access__log.loki_labels, and access__api (enabled, endpoint, firewall_ref, auth.vault_ref and health_path, or enabled: false with a reason). ACCESS.md is rendered from that data. The admin-API path must reference an entry in the group_vars firewall catalog through firewall_ref and must never open a port itself (ADR-020/021). Once hosts exist, /check-access <rolename> proves the documented paths are live; this is part of the service-clearance gate (docs/security/service-checklist.md).
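A sketch of the access__* data for a hypothetical myservice role. The key names follow the list in this step, but the nesting, the value formats, and every value shown are assumptions; docs/access/service-access-template.md is authoritative. firewall_ref only names a catalog entry (here a hypothetical myservice_admin); it opens nothing by itself.

```yaml
---
# roles/myservice/defaults/main.yml (excerpt; structure and values are assumptions)
access__service: myservice
access__compose_project: myservice
access__compose_path: /opt/myservice/compose.yml
access__containers:
  - myservice-app
  - myservice-db
access__log:
  loki_labels:
    service: myservice
access__api:
  enabled: true
  endpoint: https://myservice.internal.example:8443/api  # hypothetical URL
  firewall_ref: myservice_admin   # must name an entry in the group_vars firewall catalog
  auth:
    vault_ref: vault_myservice_api_token                 # hypothetical vault variable
  health_path: /healthz

# A service without an admin API would instead declare
# (the name of the reason key is an assumption):
# access__api:
#   enabled: false
#   reason: Configuration is file-based only; there is no admin API.
```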

12. Write the per-service backup record (stateful services)

For a stateful service role, copy docs/backup/service-backup-template.md to roles/<rolename>/BACKUP.md and populate the role's backup__* data: backup__service, backup__paths, backup__dumps (a cmd and dest per logical dump), and backup__quiesce (ADR-022). Prefer logical dumps (pg_dump/mysqldump) over file-level copies of database files, which can capture an inconsistent state while the database is running. BACKUP.md is rendered from that data. A stateless service sets backup__state: false with a reason and gets no BACKUP.md. Once the backup node exists, /check-backup <rolename> proves the declared state is captured; this is part of the service-clearance gate (docs/security/service-checklist.md).
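A sketch of the backup__* data for a hypothetical myservice role backed by PostgreSQL in Docker Compose. The key names follow this step, but the list-of-mappings shape, the boolean backup__quiesce, the reason key, and all paths are assumptions; docs/backup/service-backup-template.md and ADR-022 are authoritative.

```yaml
---
# roles/myservice/defaults/main.yml (excerpt; structure and values are assumptions)
backup__service: myservice

# File-level state that lives outside the database.
backup__paths:
  - /var/lib/myservice/uploads

# One entry per logical dump. exec -T disables the pseudo-TTY so the dump
# streams cleanly; -Fc is pg_dump's custom format, restored with pg_restore.
backup__dumps:
  - cmd: docker compose -f /opt/myservice/compose.yml exec -T db pg_dump -U myservice -Fc myservice
    dest: /var/backups/myservice/myservice.dump

# Assumed to be a boolean here; see ADR-022 for what quiescing involves.
backup__quiesce: false

# A stateless service would instead declare
# (the name of the reason key is an assumption):
# backup__state: false
# backup__state_reason: Holds no persistent state; fully rebuilt by the role.
```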

13. Pre-flight for lockout-risky roles

If the new role touches nftables rules, SSH configuration, or boot ordering, deploy it to a live host only while that host's break-glass access (Proxmox console / Hetzner console) is still open, and only after a local VM integration test has confirmed that the host recovers from a reboot:

```sh
make test-integration HOST=<target-host>
```

See docs/runbooks/integration-testing.md and ADR-025.

14. Commit

```sh
git checkout -b role/<rolename>
git add roles/<rolename>
git commit -m "Add <rolename> role"
# merge to main once make test passes, then delete the branch
```