# Runbook — Adding a new Ansible role
## When to create a new role
Create a new role when you need to manage a distinct, reusable unit of
configuration — a service, a system component, or a behaviour applied to
a group of hosts.
Do not create a role for a single task that logically belongs in an existing role.
## Procedure
### 1. Scaffold the role
```bash
make new-role NAME=<rolename>
```
This creates the full directory structure and placeholder files under `roles/<rolename>/`.
### 2. Fill in meta/main.yml
```yaml
galaxy_info:
  role_name: <rolename>
  author: <your name>
  description: <one sentence>
  min_ansible_version: "2.15"
  platforms:
    - name: Debian
      versions:
        - trixie  # Debian 13
```
### 3. Define defaults
Add all tuneable variables to `defaults/main.yml` with inline comments explaining
each variable. Use the `rolename__varname` namespace convention.
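As a minimal sketch of the convention, assuming a hypothetical role named `chrony` (the role and its variables are illustrative and not part of this repository), `defaults/main.yml` might look like this:

```yaml
---
# defaults/main.yml for a hypothetical `chrony` role (illustrative only)

# Upstream NTP servers to synchronise against.
chrony__servers:
  - 0.debian.pool.ntp.org
  - 1.debian.pool.ntp.org

# Whether the service is enabled and started at boot.
chrony__service_enabled: true

# Path of the rendered configuration file.
chrony__config_path: /etc/chrony/chrony.conf
```

Every variable carries the `chrony__` prefix, so it cannot collide with variables from other roles or from `group_vars`.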
### 4. Write tasks
- Use fully qualified collection names (FQCN), e.g. `ansible.builtin.copy`, for all modules
- Every task must have a `name:` that reads as a sentence
- Every task must have at least one `tags:` entry
- Notify handlers by `listen:` topic string, not handler name
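The rules above can be sketched as follows, again assuming a hypothetical `chrony` role. The template name, tag names, and topic string are assumptions chosen for illustration:

```yaml
---
# tasks/main.yml (hypothetical `chrony` role)
- name: Install the chrony package
  ansible.builtin.apt:
    name: chrony
    state: present
  tags:
    - chrony
    - chrony_install

- name: Render the chrony configuration file
  ansible.builtin.template:
    src: chrony.conf.j2
    dest: "{{ chrony__config_path }}"
    owner: root
    group: root
    mode: "0644"
  notify: restart chrony  # a `listen:` topic, not the handler's name
  tags:
    - chrony
    - chrony_config

---
# handlers/main.yml (hypothetical `chrony` role)
- name: Restart the chrony service
  ansible.builtin.service:
    name: chrony
    state: restarted
  listen: restart chrony  # topic the task above notifies
```

Notifying the topic rather than the handler name means the handler can be renamed, or a second handler can subscribe to the same topic, without touching the tasks.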
### 5. Configure Molecule
Edit `molecule/default/molecule.yml` to use the Debian 13 test image.
Write a `converge.yml` that applies the role. Write a `verify.yml` that
asserts the expected state.
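A minimal sketch of the two playbooks, assuming the same hypothetical `chrony` role and that the role renders `/etc/chrony/chrony.conf`. The assertions your role needs will differ:

```yaml
---
# molecule/default/converge.yml
- name: Converge
  hosts: all
  become: true
  tasks:
    - name: Apply the role under test
      ansible.builtin.include_role:
        name: chrony

---
# molecule/default/verify.yml
- name: Verify
  hosts: all
  become: true
  gather_facts: false
  tasks:
    - name: Collect the state of the configuration file
      ansible.builtin.stat:
        path: /etc/chrony/chrony.conf
      register: chrony_conf

    - name: Assert that the configuration file exists
      ansible.builtin.assert:
        that:
          - chrony_conf.stat.exists
        fail_msg: "chrony.conf was not rendered by the role"
```

Keeping `verify.yml` to plain `stat` and `assert` tasks makes a failure point at the exact piece of expected state that is missing.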
### 6. Write the README
Document:
- Purpose of the role (one paragraph)
- All variables from `defaults/main.yml` with types, defaults, and descriptions
- Example playbook usage
- Any dependencies or prerequisites
### 7. Test locally
```bash
make test ROLE=<rolename>
```
Fix any lint or test failures before committing.
### 8. Add to a playbook
Add the role to the appropriate playbook in `playbooks/` and add the host group
to `inventories/staging/hosts.yml` for integration testing.
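For illustration, assuming a hypothetical `timeservers` host group, playbook filename, and staging hostname (none of which come from this repository), the two changes could look like this:

```yaml
---
# playbooks/timeservers.yml (hypothetical playbook name)
- name: Configure time servers
  hosts: timeservers
  become: true
  roles:
    - role: chrony

---
# inventories/staging/hosts.yml (excerpt; the existing layout is assumed)
all:
  children:
    timeservers:
      hosts:
        staging-ntp-01.example.org:
```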
### 9. Write the per-service security record (services)
For a **service** role, copy `docs/security/service-security-template.md` to
`roles/<rolename>/SECURITY.md` and fill it in: exposure, the checklist status
(from `docs/security/service-checklist.md`), service-specific hardening, and any
residual/accepted risks. Filling in the **Checklist status** section is how the
service clears the security bar; record any conscious deviation in
`docs/security/accepted-risks.md`. The bar is established by ADR-002. Enforcement is
manual in review today; the planned `/security-review` will aggregate every
`roles/*/SECURITY.md` to automate it.
### 10. Write the per-service verification spec (services)
For a **service** role, copy `docs/testing/service-verify-template.md` to
`roles/<rolename>/VERIFY.md` and fill it in: the critical user journeys that define
"working" for this service, what good looks like, what is not browser-verifiable
(→ manual handoff), and the test data needed. This is the per-service backbone for the
Level 4 `/verify-service` check (ADR-008 / ADR-017) and is part of the pre-production
service-clearance gate (`docs/security/service-checklist.md`).
### 11. Write the per-service operational-access record (services)
For a **service** role, copy `docs/access/service-access-template.md` to
`roles/<rolename>/ACCESS.md` and populate the role's `access__*` data:
`access__service`, `access__compose_project`/`_path`, `access__containers`,
`access__log.loki_labels`, and `access__api` (`enabled` + endpoint + `firewall_ref` +
`auth.vault_ref` + `health_path`, or `enabled: false` with a reason). `ACCESS.md` is
rendered from that data; the admin-API path must reference, via `firewall_ref`, an entry in the
`group_vars` firewall catalog and never open a port itself (ADR-020/021). Once hosts exist,
`/check-access <rolename>` proves the documented paths are live — part of the
service-clearance gate (`docs/security/service-checklist.md`).
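As a rough sketch only, the `access__*` data for a hypothetical service role named `exampleapp` might be shaped like this. The key names follow the list above, but the exact nesting, the `endpoint` key name, and every value are assumptions; `docs/access/service-access-template.md` remains authoritative:

```yaml
---
# Hypothetical access__* data for a service role named `exampleapp`.
# All values and the exact nesting are assumptions.
access__service: exampleapp
access__compose_project: exampleapp
# The `_path` companion key is omitted here; its full name is defined by the template.
access__containers:
  - exampleapp-web
  - exampleapp-db
access__log:
  loki_labels:
    service: exampleapp
access__api:
  enabled: true
  endpoint: "https://exampleapp.staging.example.org/admin/api"
  firewall_ref: exampleapp_admin_api  # must name an entry in the group_vars firewall catalog
  auth:
    vault_ref: vault_exampleapp_admin_token
  health_path: /healthz
```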
### 12. Write the per-service backup record (stateful services)
For a **stateful** service role, copy `docs/backup/service-backup-template.md` to
`roles/<rolename>/BACKUP.md` and populate the role's `backup__*` data: `backup__service`,
`backup__paths`, `backup__dumps` (`cmd` + `dest` per logical dump), and `backup__quiesce`
(ADR-022). Prefer logical dumps (`pg_dump`/`mysqldump`) over file-level DB copies. `BACKUP.md`
is rendered from that data. A **stateless** service sets `backup__state: false` with a
reason and gets no `BACKUP.md`. Once the backup node exists, `/check-backup <rolename>`
proves the declared state is captured — part of the service-clearance gate
(`docs/security/service-checklist.md`).
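A minimal sketch of `backup__*` data for a hypothetical stateful role named `exampleapp` with a PostgreSQL database. The key names follow the text above; the paths, database name, and list layout are assumptions, and `docs/backup/service-backup-template.md` remains authoritative:

```yaml
---
# Hypothetical backup__* data for a stateful role named `exampleapp`.
# All values and the exact nesting are assumptions.
backup__service: exampleapp
backup__paths:
  - /srv/exampleapp/uploads
backup__dumps:
  # One entry per logical dump: the command to run and where its output lands.
  - cmd: "pg_dump --format=custom exampleapp"
    dest: /var/backups/exampleapp/exampleapp.dump
# backup__quiesce is also expected; its shape is defined by ADR-022 and is
# deliberately not guessed here.
```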
### 13. Commit
```bash
git checkout -b role/<rolename>
git add roles/<rolename>
git commit -m "Add <rolename> role"
# merge to main once make test passes, then delete the branch
```