# Runbook — Adding a new Ansible role
## When to create a new role

Create a new role when you need to manage a distinct, reusable unit of
configuration: a service, a system component, or a behaviour applied to
a group of hosts.

Do not create a role for a single task that logically belongs in an existing role.

## Procedure

### 1. Scaffold the role

```bash
make new-role NAME=<rolename>
```

This creates the full directory structure and placeholder files under `roles/<rolename>/`.

### 2. Fill in meta/main.yml

```yaml
galaxy_info:
  role_name: <rolename>
  author: <your name>
  description: <one sentence>
  min_ansible_version: "2.15"
  platforms:
    - name: Debian
      versions:
        - trixie  # Debian 13
```

### 3. Define defaults

Add all tuneable variables to `defaults/main.yml` with inline comments explaining
each variable. Use the `rolename__varname` namespace convention.

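As an illustration of the naming convention, a `defaults/main.yml` for a hypothetical role named `ntp` might look like the sketch below. The role name, the variable names, and the values are all assumptions made up for this example; none of them come from this repository.

```yaml
# roles/ntp/defaults/main.yml (hypothetical role, illustrative values only)

# Desired state of the chrony package: "present" or "latest".
ntp__package_state: present

# Upstream time servers written into the chrony configuration.
ntp__servers:
  - 0.debian.pool.ntp.org
  - 1.debian.pool.ntp.org

# Whether the chrony service is enabled at boot.
ntp__service_enabled: true
```

The double underscore keeps the role prefix visually separate from the variable name, so a variable's owning role is clear wherever it is overridden.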
### 4. Write tasks

- Use the fully qualified collection name (FQCN) for every module
- Every task must have a `name:` that reads as a sentence
- Every task must have at least one `tags:` entry
- Notify handlers by their `listen:` topic string, not by handler name

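The four rules above can be sketched together as follows. This is a minimal example for a hypothetical `ntp` role: the package, file paths, tag names, variable names, and the `restart chrony` topic string are assumptions chosen for illustration.

```yaml
# roles/ntp/tasks/main.yml (hypothetical role)
- name: Install the chrony package
  ansible.builtin.apt:            # FQCN, not the short name "apt"
    name: chrony
    state: "{{ ntp__package_state }}"
  tags:
    - ntp

- name: Deploy the chrony configuration file
  ansible.builtin.template:
    src: chrony.conf.j2
    dest: /etc/chrony/chrony.conf
    owner: root
    group: root
    mode: "0644"
  notify: "restart chrony"        # matches the handler's listen: topic
  tags:
    - ntp

# roles/ntp/handlers/main.yml (shown in the same block for brevity)
- name: Restart the chrony service
  ansible.builtin.service:
    name: chrony
    state: restarted
  listen: "restart chrony"        # topic string that tasks notify
```

Notifying the topic rather than the handler name means a handler can be renamed, or several handlers can subscribe to the same topic, without touching the tasks that trigger them.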
### 5. Configure Molecule

Edit `molecule/default/molecule.yml` to use the Debian 13 test image.
Write a `converge.yml` that applies the role. Write a `verify.yml` that
asserts the expected state.

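A minimal sketch of the three Molecule files is shown below as one multi-document block. The `docker` driver, the platform name, and the checked file path are assumptions; `<debian-13-test-image>` and `<rolename>` are placeholders for the values this repository actually uses, which the scaffold may already have filled in.

```yaml
# molecule/default/molecule.yml
driver:
  name: docker                    # assumption: use whichever driver the scaffold configures
platforms:
  - name: debian13
    image: <debian-13-test-image> # placeholder for the project's Debian 13 image
provisioner:
  name: ansible
verifier:
  name: ansible
---
# molecule/default/converge.yml
- name: Converge
  hosts: all
  tasks:
    - name: Apply the role under test
      ansible.builtin.include_role:
        name: <rolename>
---
# molecule/default/verify.yml
- name: Verify
  hosts: all
  gather_facts: false
  tasks:
    - name: Read the state of the rendered configuration file
      ansible.builtin.stat:
        path: /etc/chrony/chrony.conf   # hypothetical path managed by the role
      register: verify__config

    - name: Assert that the configuration file exists
      ansible.builtin.assert:
        that:
          - verify__config.stat.exists
        fail_msg: The role did not render its configuration file
```

Keeping the assertions in `verify.yml` rather than in the role itself means the role stays free of test-only tasks.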
### 6. Write the README

Document:
- Purpose of the role (one paragraph)
- All variables from `defaults/main.yml` with types, defaults, and descriptions
- Example playbook usage
- Any dependencies or prerequisites

### 7. Test locally

```bash
make test ROLE=<rolename>
```

Fix any lint or test failures before committing.

### 8. Add to a playbook

Add the role to the appropriate playbook in `playbooks/` and add the host group
to `inventories/staging/hosts.yml` for integration testing.

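As a rough sketch, the two additions might look like this for a hypothetical `ntp` role. The playbook file name, the group name, the host name, and the use of `become` are assumptions for illustration; follow the layout already used in the existing playbooks and inventory.

```yaml
# playbooks/<playbook>.yml (placeholder file name)
- name: Configure time synchronisation hosts
  hosts: ntp                      # hypothetical host group
  become: true
  roles:
    - role: ntp
      tags:
        - ntp
---
# inventories/staging/hosts.yml
all:
  children:
    ntp:                          # same group name the play targets
      hosts:
        staging-ntp-01.example.org:
```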
### 9. Write the per-service security record (services)

For a **service** role, copy `docs/security/service-security-template.md` to
`roles/<rolename>/SECURITY.md` and fill it in: exposure, the checklist status
(from `docs/security/service-checklist.md`), service-specific hardening, and any
residual or accepted risks. Filling in the **Checklist status** section is how the
service clears the security bar. Record any conscious deviation in
`docs/security/accepted-risks.md`. The bar is established by ADR-002. Enforcement is
currently manual, during review; the planned `/security-review` will automate it by
aggregating every `roles/*/SECURITY.md`.

### 10. Write the per-service verification spec (services)

For a **service** role, copy `docs/testing/service-verify-template.md` to
`roles/<rolename>/VERIFY.md` and fill it in: the critical user journeys that define
"working" for this service, what good looks like, what is not browser-verifiable
(and so goes to manual handoff), and the test data needed. This is the per-service
backbone for the Level 4 `/verify-service` check (ADR-008 / ADR-017) and is part of the
pre-production service-clearance gate (`docs/security/service-checklist.md`).

### 11. Write the per-service operational-access record (services)

For a **service** role, copy `docs/access/service-access-template.md` to
`roles/<rolename>/ACCESS.md` and populate the role's `access__*` data:

- `access__service`
- `access__compose_project`/`_path`
- `access__containers`
- `access__log.loki_labels`
- `access__api`: either `enabled` + endpoint + `firewall_ref` + `auth.vault_ref` + `health_path`, or `enabled: false` with a reason

`ACCESS.md` is rendered from that data. The admin-API path must use `firewall_ref` to
point at an entry in the `group_vars` firewall catalog and must never open a port itself
(ADR-020/021). Once hosts exist, `/check-access <rolename>` proves the documented paths
are live, which is part of the service-clearance gate (`docs/security/service-checklist.md`).

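The shape of the `access__*` data might look roughly like the sketch below. Only the variable and field names quoted in this step come from the runbook. The `endpoint` key name, the expansion of `_path` to `access__compose_path`, the nesting, and every value are assumptions for a hypothetical service called `exampleapp`. The authoritative schema is whatever ADR-020/021 and the template define.

```yaml
# access__* data for a hypothetical service (illustrative values only)
access__service: exampleapp
access__compose_project: exampleapp
access__compose_path: /opt/exampleapp       # assumed expansion of `_path`
access__containers:
  - exampleapp-web
access__log:
  loki_labels:
    service: exampleapp                     # assumed label name
access__api:
  enabled: true
  endpoint: https://exampleapp.internal.example.org/api   # assumed key name
  firewall_ref: exampleapp_admin_api        # must name an entry in the group_vars firewall catalog
  auth:
    vault_ref: vault_exampleapp_api_token   # hypothetical vault variable
  health_path: /health
```

Referencing the firewall catalog by name, instead of declaring a port in the role, keeps every opened port in one reviewable place.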
### 12. Write the per-service backup record (stateful services)

For a **stateful** service role, copy `docs/backup/service-backup-template.md` to
`roles/<rolename>/BACKUP.md` and populate the role's `backup__*` data: `backup__service`,
`backup__paths`, `backup__dumps` (`cmd` + `dest` per logical dump), and `backup__quiesce`
(ADR-022). Prefer logical dumps (`pg_dump`/`mysqldump`) over file-level DB copies. `BACKUP.md`
is rendered from that data. A **stateless** service sets `backup__state: false` with a
reason and gets no `BACKUP.md`. Once the backup node exists, `/check-backup <rolename>`
proves the declared state is captured, which is part of the service-clearance gate
(`docs/security/service-checklist.md`).

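A stateful service's `backup__*` data might be sketched as follows. The variable names and the `cmd`/`dest` fields come from this step. The values, the list structure, the database name, and the assumption that the dump command writes to standard output are illustrative guesses for a hypothetical `exampleapp` service. The shape of `backup__quiesce` is not described here, so it is left as a comment; ADR-022 is the reference.

```yaml
# backup__* data for a hypothetical stateful service (illustrative values only)
backup__service: exampleapp
backup__paths:
  - /opt/exampleapp/data                    # file-level state, such as uploads
backup__dumps:
  - cmd: pg_dump --format=custom exampledb  # logical dump, preferred over copying DB files
    dest: /var/backups/exampleapp/exampledb.dump
# backup__quiesce: see ADR-022 for the expected structure
```

A logical dump is consistent at the moment it is taken, whereas copying a live database's files can capture a half-written state.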
### 13. Commit

```bash
git checkout -b role/<rolename>
git add roles/<rolename>
git commit -m "Add <rolename> role"
# merge to main once make test passes, then delete the branch
```