sjat 37cece9dbd Add ADR-010 (Forgejo integration) and rbw-unlocked pre-flight convention
ADR-010: API tokens as least-privilege managed secrets, declarative-first (no
click-ops), automation boundary, planned trunk-based CI. CLAUDE.md/AGENTS.md:
check 'rbw unlocked' before vault-dependent tasks (incl. commits) rather than
failing partway.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 21:34:07 +02:00

# CLAUDE.md — Ansible homelab monorepo
This file is read by Claude Code at the start of every session.
Keep it dense and command-focused. Verbose detail lives in `docs/`.
> **Before assuming a role, provider, or pipeline exists, check `STATUS.md`.**
> Much of the design in `docs/decisions/` is intended but not yet built (e.g. the
> `base`/`docker_host` roles are currently empty, and Terraform has not been initialised).
---
## Project in one paragraph
Homelab infrastructure automation for a Proxmox cluster running 25 Debian 13 VMs.
All hosts share a hardened base configuration. Each host runs a defined set of Docker
services deployed via Compose files rendered from Ansible templates. Ansible runs from
a dedicated control VM. CI is planned to run on Forgejo Actions (self-hosted; see ADR-010).
Full design rationale: `docs/decisions/`

---
## Key commands
| Action | Command |
|-------------------------------|--------------------------------------------------|
| Lint everything | `make lint` |
| Test a single role | `make test ROLE=<name>` |
| Test all roles | `make test-all` |
| Check mode (dry run) | `make check PLAYBOOK=<name>` |
| Deploy a playbook | `make deploy PLAYBOOK=<name>` |
| Scaffold a new role | `make new-role NAME=<name>` |
| Review repo for drift/cruft | `/review-repo` (Claude command) |
| Encrypt a vault file | `make encrypt FILE=<path>` |
| Decrypt a vault file | `make decrypt FILE=<path>` |
| Install Python deps | `make setup` |
| Install Ansible collections | `make collections` |
| Initialise Terraform | `make tf-init [TF_ENV=staging]` |
| Terraform plan | `make tf-plan [TF_ENV=staging]` |
| Terraform apply | `make tf-apply [TF_ENV=staging]` |
| Regenerate Ansible inventory | `make tf-inventory TF_ENV=<staging\|production>` |

**Always `tf-plan` before `tf-apply`. Always `check` before `deploy`. Never skip lint.**
`TF_ENV` defaults to `staging`. Always specify `TF_ENV=production` explicitly for production.

---
## Ansible conventions
- **FQCN always**: `ansible.builtin.template`, never `template`
- **Tags**: every task must have at least one tag; playbooks support `--tags` filtering
- **Handlers**: use `listen:` topic strings, not direct name references
- **Variables**: `rolename__varname` double-underscore namespace for role defaults
- **No inline vars in playbooks**: use `group_vars/` or `host_vars/` only
- **Loops**: prefer `loop:` over `with_items:`
- **Booleans**: prefer `true`/`false` over `yes`/`no`
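
The sketch below shows tasks and a handler that follow these conventions. The `base` role is currently empty, so the variables (`base__packages`, `base__directories`, `base__manage_chrony`), the paths, and the chrony setup are all hypothetical. The template task notifies the topic `restart chrony` instead of naming the handler. That means the handler can be renamed, or additional handlers can subscribe to the same topic, without editing the task.

```yaml
# roles/base/defaults/main.yml (hypothetical defaults, namespaced base__*)
---
base__manage_chrony: true
base__packages:
  - chrony
  - unattended-upgrades
base__directories:
  - /etc/homelab
  - /var/lib/homelab

# roles/base/tasks/main.yml
---
- name: Install base packages
  ansible.builtin.apt:               # FQCN, never bare `apt`
    name: "{{ base__packages }}"     # apt accepts a list directly
    state: present
    update_cache: true
  tags: [base, packages]

- name: Create base directories
  ansible.builtin.file:
    path: "{{ item }}"
    state: directory
    owner: root
    group: root
    mode: "0755"
  loop: "{{ base__directories }}"    # loop:, not with_items:
  tags: [base]

- name: Render chrony configuration
  ansible.builtin.template:
    src: chrony.conf.j2
    dest: /etc/chrony/chrony.conf
    owner: root
    group: root
    mode: "0644"
  when: base__manage_chrony
  notify: restart chrony             # a listen topic, not a handler name
  tags: [base, chrony]

# roles/base/handlers/main.yml
---
- name: Restart chrony
  ansible.builtin.service:
    name: chrony
    state: restarted
  listen: restart chrony
```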
---
## Secrets
- Encrypted files are always named `vault.yml`, sitting alongside `vars.yml`
- Never put plaintext secrets in any file not named `vault.yml`
- Structure secrets as a nested map `vault.<service>.<key>` (e.g.
`vault.grafana.admin_password`); reference as `{{ vault.grafana.admin_password }}`
- Vault password comes from Vaultwarden via `rbw` (`scripts/vault-pass-client.sh`,
wired as `vault_password_file`). Unlock once per session: `rbw unlock`
- **Before any vault-dependent task** (`make deploy/check/encrypt/decrypt`, or **any
git commit**, since the pre-commit ansible-lint hook decrypts `vault.yml`), run
`rbw unlocked`. If it exits non-zero, ask the user to run `rbw unlock` and wait, rather
than starting and failing partway. Once unlocked, the `rbw` agent stays unlocked for 5 hours.
- To edit a vault file: `make decrypt FILE=<path>`, edit, `make encrypt FILE=<path>`
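
The sketch below shows the nested layout in its decrypted form, as it would look after `make decrypt`. The `forgejo.api_token` key and the `grafana__admin_password` indirection in `vars.yml` are assumed patterns for illustration. This repo is not known to contain them, and the values are placeholders.

```yaml
# inventories/production/group_vars/all/vault.yml, decrypted view.
# Commit it only after `make encrypt FILE=<path>`.
---
vault:
  grafana:
    admin_password: "placeholder-not-a-real-secret"
  forgejo:
    api_token: "placeholder-not-a-real-secret"   # hypothetical key

# inventories/production/group_vars/all/vars.yml (plaintext, safe to commit)
---
# Hypothetical role variable that points at the secret instead of holding it.
grafana__admin_password: "{{ vault.grafana.admin_password }}"
# Caveat: with Ansible's default hash_behaviour (replace), a `vault` map defined
# in host_vars/ replaces this whole map for that host instead of merging with it.
```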
---
## Role conventions
- Every role must have a `molecule/default/` scenario targeting Debian 13
- Every role must have a populated `README.md`
- Every role must have `meta/main.yml` filled in
- Role names: `snake_case`, descriptive nouns (`base`, `docker_host`, `reverse_proxy`)
- Use `make new-role NAME=<name>` to scaffold — never create role structure by hand
---
## Inventory structure
```
inventories/
  production/          # live hosts — edit with care
    hosts.yml
    group_vars/
      all/             # applies to every host
        vars.yml
        vault.yml
      docker_hosts/    # hosts running Docker services
      proxmox_hosts/   # Proxmox nodes themselves
    host_vars/         # per-host overrides
  staging/             # safe to run freely
```
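The sketch below shows how the host groups listed next might appear in an environment's `hosts.yml`. The hostnames are hypothetical, and the addresses come from the 192.0.2.0/24 documentation range. The real file's exact shape is whatever `make tf-inventory` generates.

```yaml
# inventories/<env>/hosts.yml (illustrative only; regenerate with
# `make tf-inventory`, never hand-edit)
---
all:
  children:
    control:
      hosts:
        control01:
          ansible_host: 192.0.2.10
    docker_hosts:
      hosts:
        docker01:
          ansible_host: 192.0.2.21
        docker02:
          ansible_host: 192.0.2.22
    proxmox_hosts:
      hosts:
        pve01:
          ansible_host: 192.0.2.2
```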
Host groups: `all`, `control`, `docker_hosts`, `proxmox_hosts`
(`control` holds the one manually provisioned control node; see ADR-009.)

---
## Git conventions
Single-contributor, trunk-based (no merge requests / approval gates):
- `main` is the trunk and must always work — small, safe changes commit straight to it
- Use a branch for sweeping or AI-driven changes that you want to review as one diff
or may want to abandon: `role/<name>`, `fix/<description>`, `feat/<description>`,
`chore/<description>`; merge to `main` once reviewed, then delete the branch
- Run `make lint` (and `make test` for touched roles) before committing
- Commit in logical units; imperative subject ≤72 chars
- AI agents commit their own work in logical units with a `Co-Authored-By` trailer
- Push to the Forgejo `origin` often — it is the off-machine backup
- Never commit secrets; a `vault.yml` must be `$ANSIBLE_VAULT`-encrypted (pre-commit
enforces this, plus gitleaks secret scanning)
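
Below is a minimal `.pre-commit-config.yaml` sketch covering the checks described here and under **Secrets**. Several parts are assumptions:

- pre-commit's `pygrep` language with `--negate`, which fails when a file lacks the pattern.
- The upstream gitleaks and ansible-lint hook repositories. The `rev` pins are example tags to replace with the versions the repo actually uses.
- The local hook ids `rbw-unlocked` and `vault-encrypted`, which are hypothetical names.

`fail_fast: true` makes a locked `rbw` agent stop the commit at the first hook, instead of failing later inside ansible-lint. The cost is that any failure hides the results of later hooks.

```yaml
# .pre-commit-config.yaml (sketch; pin revs to the tags you actually use)
---
fail_fast: true  # stop at the first failing hook

repos:
  - repo: local
    hooks:
      # Pre-flight: the vault password comes from rbw, so check the agent first.
      - id: rbw-unlocked
        name: rbw agent is unlocked
        entry: rbw unlocked
        language: system
        pass_filenames: false
        always_run: true

      # Every file named vault.yml must carry the ansible-vault header.
      - id: vault-encrypted
        name: vault.yml files are ansible-vault encrypted
        entry: '^\$ANSIBLE_VAULT;'
        language: pygrep
        args: [--negate]  # fail if a file does NOT contain the pattern
        files: (^|/)vault\.yml$

  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.18.2  # example pin
    hooks:
      - id: gitleaks

  - repo: https://github.com/ansible/ansible-lint
    rev: v24.2.0  # example pin
    hooks:
      - id: ansible-lint
```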
---
## Dependencies policy
- **No Galaxy roles** — all roles are local; never add a Galaxy role to `requirements.yml`
- **Collections on demand** — only add a collection when a task in a committed role
uses a module from it; add a comment in `requirements.yml` naming the module(s) used
- Full rationale: `docs/decisions/003-toolchain.md` (Collections and roles policy)
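
The sketch below shows a `requirements.yml` under this policy. The `community.docker` entry, the `docker_host` task that would use it, and the version bound are all hypothetical. They are included only to illustrate the required comment naming the module.

```yaml
# requirements.yml
---
# No `roles:` section: all roles are local, never from Galaxy.
collections:
  # Needed by roles/docker_host/tasks/main.yml (hypothetical):
  #   community.docker.docker_compose_v2
  - name: community.docker
    version: ">=3.6.0"  # example lower bound
```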
---
## Terraform conventions
- Terraform owns VM existence only — nothing inside a VM, and no DNS records
- Internal DNS is entirely Ansible (the `dns` role renders the zone from inventory)
- OPNsense is entirely Ansible; do not reach for a Terraform OPNsense provider
- Environments are separate directories (`staging/`, `production/`), not workspaces
- Secrets via `TF_VAR_*` env vars only — never in `.tfvars` files
- `terraform.tfvars.example` is tracked; `terraform.tfvars` is gitignored
- `.terraform.lock.hcl` is tracked (pins provider versions)
- Full rationale: `docs/decisions/006-terraform.md`
---
## What Claude must not do without explicit instruction
- Run `make deploy` — always run `make check` first and show output
- Run `make tf-apply` — always run `make tf-plan` first and show output
- Modify `inventories/<env>/hosts.yml` directly — regenerate via `make tf-inventory`
- Edit vault-encrypted files directly — decrypt first, re-encrypt after
- Force-push or rewrite already-pushed history on `main`
- Add a collection to `requirements.yml` without a specific module need in existing role tasks
---
## Further reading
| Topic | File |
|------------------------|---------------------------------------|
| Architecture overview | `docs/decisions/001-architecture.md` |
| Security baseline | `docs/decisions/002-security.md` |
| Toolchain choices | `docs/decisions/003-toolchain.md` |
| Docker & Compose model | `docs/decisions/004-docker-model.md` |
| Bootstrapping hosts | `docs/decisions/005-bootstrapping.md` |
| Terraform | `docs/decisions/006-terraform.md` |
| Network topology | `docs/decisions/007-network.md` |
| Testing methodology | `docs/decisions/008-testing.md` |
| TF ↔ Ansible handoff | `docs/decisions/009-provisioning-handoff.md` |
| Forgejo & CI | `docs/decisions/010-forgejo-ci.md` |
| Adding a new role | `docs/runbooks/new-role.md` |
| Adding a new host | `docs/runbooks/new-host.md` |
| Rotating vault secrets | `docs/runbooks/rotate-secrets.md` |