Runbook — Adding a new Ansible role
When to create a new role
Create a new role when you need to manage a distinct, reusable unit of configuration — a service, a system component, or a behaviour applied to a group of hosts.
Do not create a role for a single task that logically belongs in an existing role.
Procedure
1. Scaffold the role
make new-role NAME=<rolename>
This creates the full directory structure and placeholder files under roles/<rolename>/.
2. Fill in meta/main.yml
galaxy_info:
  role_name: <rolename>
  author: <your name>
  description: <one sentence>
  min_ansible_version: "2.15"
  platforms:
    - name: Debian
      versions:
        - trixie  # Debian 13
3. Define defaults
Add all tuneable variables to defaults/main.yml with inline comments explaining
each variable. Use the rolename__varname namespace convention.
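As an illustration of the namespace convention, a defaults file could look like the sketch below. The role name myservice and every variable in it are hypothetical, chosen only to show the rolename__varname pattern and the inline comments.

```yaml
---
# roles/myservice/defaults/main.yml
# Hypothetical role and variables, for illustration only.

# TCP port the service listens on (integer).
myservice__listen_port: 8080

# Whether the service is started and enabled at boot (boolean).
myservice__enabled: true

# Extra command-line arguments passed to the daemon (list of strings).
myservice__extra_args: []
```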
4. Write tasks
- Use FQCN for all modules
- Every task must have a name: that reads as a sentence
- Every task must have at least one tags: entry
- Notify handlers by listen: topic string, not handler name
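The four rules above can be sketched together as follows. The role name myservice, the file paths, the tag names, and the topic string "Restart myservice" are all assumptions made for the example; only the modules (ansible.builtin.apt, ansible.builtin.template, ansible.builtin.service) are real.

```yaml
---
# roles/myservice/tasks/main.yml (hypothetical role, paths, and tags)
- name: Install the myservice package
  ansible.builtin.apt:
    name: myservice
    state: present
  tags:
    - myservice

- name: Deploy the myservice configuration file
  ansible.builtin.template:
    src: myservice.conf.j2
    dest: /etc/myservice/myservice.conf
    owner: root
    group: root
    mode: "0644"
  # Notify the topic string, not the handler's name.
  notify: Restart myservice
  tags:
    - myservice
    - myservice_config

---
# roles/myservice/handlers/main.yml
- name: Restart the myservice service unit
  ansible.builtin.service:
    name: myservice
    state: restarted
  # The handler subscribes to the topic via listen:.
  listen: Restart myservice
```

Because the task notifies the topic rather than the handler name, the handler can be renamed or split into several handlers without touching the tasks that trigger it.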
5. Configure Molecule
Edit molecule/default/molecule.yml to use the Debian 13 test image.
Write a converge.yml that applies the role. Write a verify.yml that
asserts the expected state.
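A minimal sketch of the two playbooks, assuming a hypothetical role named myservice that renders /etc/myservice/myservice.conf. How the role name resolves depends on the Molecule and roles_path setup in this repository, so treat the role reference as an assumption too.

```yaml
---
# molecule/default/converge.yml
- name: Converge
  hosts: all
  become: true
  roles:
    - role: myservice  # hypothetical role name

---
# molecule/default/verify.yml
- name: Verify
  hosts: all
  become: true
  gather_facts: false
  tasks:
    - name: Read the state of the rendered configuration file
      ansible.builtin.stat:
        path: /etc/myservice/myservice.conf  # hypothetical path
      register: myservice_conf

    - name: Assert that the configuration file exists
      ansible.builtin.assert:
        that:
          - myservice_conf.stat.exists
        fail_msg: The myservice configuration file was not created
```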
6. Write the README
Document:
- Purpose of the role (one paragraph)
- All variables from defaults/main.yml with types, defaults, and descriptions
- Example playbook usage
- Any dependencies or prerequisites
7. Test locally
make test ROLE=<rolename>
Fix any lint or test failures before committing.
8. Add to a playbook
Add the role to the appropriate playbook in playbooks/ and add the host group
to inventories/staging/hosts.yml for integration testing.
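For orientation, the two changes might look like the sketch below. The playbook filename, the group name myservice, and the host name are hypothetical; the real playbook and the layout of inventories/staging/hosts.yml in this repository may differ.

```yaml
---
# playbooks/services.yml (hypothetical playbook name)
- name: Apply the myservice role
  hosts: myservice
  become: true
  roles:
    - role: myservice

---
# inventories/staging/hosts.yml (hypothetical group and host)
all:
  children:
    myservice:
      hosts:
        staging-app-01.example.internal:
```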
9. Write the per-service security record (services)
For a service role, copy docs/security/service-security-template.md to
roles/<rolename>/SECURITY.md and fill it in: exposure, the checklist status
(from docs/security/service-checklist.md), service-specific hardening, and any
residual or accepted risks. Completing the Checklist status section is how the
service clears the security bar; record any conscious deviation in
docs/security/accepted-risks.md. The bar is established by ADR-002. Enforcement is
currently manual, during review; the planned /security-review will automate it by
aggregating every roles/*/SECURITY.md.
10. Write the per-service verification spec (services)
For a service role, copy docs/testing/service-verify-template.md to
roles/<rolename>/VERIFY.md and fill it in: the critical user journeys that define
"working" for this service, what good looks like, what is not browser-verifiable
(→ manual handoff), and the test data needed. This is the per-service backbone for the
Level 4 /verify-service check (ADR-008 / ADR-017) and is part of the pre-production
service-clearance gate (docs/security/service-checklist.md).
11. Write the per-service operational-access record (services)
For a service role, copy docs/access/service-access-template.md to
roles/<rolename>/ACCESS.md and populate the role's access__* data
(access__service, access__compose_project/_path, access__containers,
access__log.loki_labels, and access__api — enabled + endpoint + firewall_ref +
auth.vault_ref + health_path, or enabled: false with a reason). ACCESS.md is
rendered from that data. The admin-API path must point firewall_ref at an entry in the
group_vars firewall catalog and must never open a port itself (ADR-020/021). Once hosts exist,
/check-access <rolename> proves the documented paths are live, as part of the
service-clearance gate (docs/security/service-checklist.md).
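One possible shape for this data is sketched below. The variable and key names come from the list above, but the nesting, the value types, every value, and the reason key are assumptions; the template and ADR-020/021 remain authoritative.

```yaml
---
# Hypothetical access__* data for a role named myservice.
access__service: myservice
access__compose_project: myservice
access__compose_path: /opt/myservice  # assumed expansion of "_path"
access__containers:
  - myservice-app
access__log:
  loki_labels:
    service: myservice
access__api:
  enabled: true
  endpoint: "https://myservice.example.internal/admin"
  # Names an entry in the group_vars firewall catalog; the role opens no port itself.
  firewall_ref: myservice_admin_api
  auth:
    vault_ref: vault_myservice_admin_token
  health_path: /healthz

# A role without an admin API might instead declare
# (the "reason" key name is an assumption):
# access__api:
#   enabled: false
#   reason: "No admin API is exposed"
```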
12. Write the per-service backup record (stateful services)
For a stateful service role, copy docs/backup/service-backup-template.md to
roles/<rolename>/BACKUP.md and populate the role's backup__* data (backup__service,
backup__paths, backup__dumps — cmd + dest per logical dump — and backup__quiesce;
ADR-022). Prefer logical dumps (pg_dump/mysqldump) over file-level DB copies. BACKUP.md
is rendered from that data. A stateless service sets backup__state: false with a
reason and gets no BACKUP.md. Once the backup node exists, /check-backup <rolename>
proves the declared state is captured — part of the service-clearance gate
(docs/security/service-checklist.md).
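A sketch of what the backup data for a stateful role could look like. The variable names come from the paragraph above; the role name, paths, dump command, and the type of backup__quiesce are assumptions, and ADR-022 together with the template define the real schema.

```yaml
---
# Hypothetical backup__* data for a PostgreSQL-backed role named myservice.
backup__service: myservice
backup__paths:
  - /var/lib/myservice/uploads
backup__dumps:
  # One entry per logical dump: the command to run and where its output goes.
  # How cmd output reaches dest is decided by the backup tooling (assumed here).
  - cmd: "pg_dump --format=custom --dbname=myservice"
    dest: /var/backups/myservice/myservice.dump
# Assumed to be a boolean; the real type may differ.
backup__quiesce: false
```

The example uses a logical dump rather than a copy of the database files, in line with the preference stated above.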
13. Commit
git checkout -b role/<rolename>
git add roles/<rolename>
git commit -m "Add <rolename> role"
# merge to main once make test passes, then delete the branch