# CLAUDE.md — Ansible homelab monorepo

This file is read by Claude Code at the start of every session.
Keep it dense and command-focused. Verbose detail lives in `docs/`.

> **Before assuming a role, provider, or pipeline exists, check `STATUS.md`.**
> Much of the design in `docs/decisions/` is intended, not yet built (e.g. the
> `base`/`docker_host` roles are currently empty; Terraform is not `init`ed).

---

## Project in one paragraph

Homelab infrastructure automation for a Proxmox cluster running 2–5 Debian 13 VMs.
All hosts share a hardened base configuration. Each host runs a defined set of Docker
services deployed via Compose files rendered from Ansible templates. Ansible runs from
a dedicated control VM. CI runs on Forgejo Actions (self-hosted).

Full design rationale: `docs/decisions/`

---

## Key commands

| Action                        | Command                                          |
|-------------------------------|--------------------------------------------------|
| Lint everything               | `make lint`                                      |
| Test a single role            | `make test ROLE=<name>`                          |
| Test all roles                | `make test-all`                                  |
| Check mode (dry run)          | `make check PLAYBOOK=<name>`                     |
| Deploy a playbook             | `make deploy PLAYBOOK=<name>`                    |
| Scaffold a new role           | `make new-role NAME=<name>`                      |
| Review repo for drift/cruft   | `/review-repo` (Claude command)                  |
| Encrypt a vault file          | `make encrypt FILE=<path>`                       |
| Decrypt a vault file          | `make decrypt FILE=<path>`                       |
| Install Python deps           | `make setup`                                     |
| Install Ansible collections   | `make collections`                               |
| Initialise Terraform          | `make tf-init [TF_ENV=staging]`                  |
| Terraform plan                | `make tf-plan [TF_ENV=staging]`                  |
| Terraform apply               | `make tf-apply [TF_ENV=staging]`                 |
| Regenerate Ansible inventory  | `make tf-inventory TF_ENV=<staging\|production>` |

**Always `tf-plan` before `tf-apply`. Always `check` before `deploy`. Never skip lint.**

`TF_ENV` defaults to `staging`. Always specify `TF_ENV=production` explicitly for production.

---

## Ansible conventions

- **FQCN always**: `ansible.builtin.template`, never `template`
- **Tags**: every task must have at least one tag; playbooks support `--tags` filtering
- **Handlers**: use `listen:` topic strings, not direct name references
- **Variables**: `rolename__varname` double-underscore namespace for role defaults
- **No inline vars in playbooks**: use `group_vars/` or `host_vars/` only
- **Loops**: prefer `loop:` over `with_items:`
- **Booleans**: prefer `true`/`false` over `yes`/`no`
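
A minimal sketch of a task and handler that follow the conventions above. The role name `reverse_proxy`, the variable `reverse_proxy__sites`, the file paths, and the `caddy` service are hypothetical, chosen only for illustration.

```yaml
# roles/reverse_proxy/tasks/main.yml (hypothetical role and paths)
- name: Render site configuration
  ansible.builtin.template:            # FQCN, never bare `template`
    src: site.conf.j2
    dest: "/etc/caddy/sites/{{ item.name }}.conf"
    owner: root
    group: root
    mode: "0644"
  become: true                         # `true`/`false`, not `yes`/`no`
  loop: "{{ reverse_proxy__sites }}"   # role default in the `rolename__varname` namespace
  loop_control:
    label: "{{ item.name }}"
  notify: reload reverse proxy         # a `listen:` topic, not a handler name
  tags:
    - reverse_proxy                    # at least one tag per task

# roles/reverse_proxy/handlers/main.yml
- name: Reload caddy service
  ansible.builtin.service:
    name: caddy
    state: reloaded
  listen: reload reverse proxy         # tasks notify this topic string
```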

---

## Secrets

- Encrypted files are always named `vault.yml`, sitting alongside `vars.yml`
- Never put plaintext secrets in any file not named `vault.yml`
- Structure secrets as a nested map `vault.<service>.<key>` (e.g.
  `vault.grafana.admin_password`); reference as `{{ vault.grafana.admin_password }}`
- Vault password comes from Vaultwarden via `rbw` (`scripts/vault-pass-client.sh`,
  wired as `vault_password_file`). Unlock once per session: `rbw unlock`
- **Before any vault-dependent task** (`make deploy`, `make check`, `make encrypt`,
  `make decrypt`, or **any git commit**, since the pre-commit ansible-lint hook decrypts
  `vault.yml`), run `rbw unlocked`; if it exits non-zero, ask the user to run
  `rbw unlock` and wait, rather than starting and failing partway. Once unlocked, the
  `rbw` agent stays unlocked for 5h.
- To edit a vault file: `make decrypt FILE=<path>`, edit, `make encrypt FILE=<path>`
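
As an illustration of the nested-map layout, a decrypted `vault.yml` and a matching `vars.yml` entry might look like the sketch below. The placeholder value and the `grafana__admin_password` variable name are assumptions; a real `vault.yml` must be re-encrypted with `make encrypt FILE=<path>` before it is committed.

```yaml
# vault.yml (shown decrypted; on disk and in git it is $ANSIBLE_VAULT-encrypted)
vault:
  grafana:
    admin_password: "CHANGE_ME"   # placeholder, not a real secret

# vars.yml (plaintext, sits alongside vault.yml)
# Hypothetical role variable that points at the vaulted value.
grafana__admin_password: "{{ vault.grafana.admin_password }}"
```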

---

## Role conventions

- Every role must have `molecule/default/` scenario targeting Debian 13
- Every role must have a populated `README.md`
- Every role must have `meta/main.yml` filled in
- Role names: `snake_case`, descriptive nouns (`base`, `docker_host`, `reverse_proxy`)
- Use `make new-role NAME=<name>` to scaffold — never create role structure by hand

---

## Inventory structure

```
inventories/
  production/         # live hosts — edit with care
    hosts.yml
    group_vars/
      all/            # applies to every host
        vars.yml
        vault.yml
      docker_hosts/   # hosts running Docker services
      proxmox_hosts/  # Proxmox nodes themselves
    host_vars/        # per-host overrides
  staging/            # safe to run freely
```

Host groups: `all`, `control`, `docker_hosts`, `proxmox_hosts`

(`control` holds the one manually-provisioned control node — see ADR-009.)
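
For orientation, a `hosts.yml` with these groups could look roughly like the sketch below. Hostnames and addresses are made up (the addresses use the 192.0.2.0/24 documentation range), and the layout actually produced by `make tf-inventory` may differ.

```yaml
# inventories/staging/hosts.yml (illustrative only; do not hand-edit the real file)
all:
  children:
    control:
      hosts:
        control01:                 # the single manually-provisioned control node
          ansible_host: 192.0.2.10
    docker_hosts:
      hosts:
        docker01:
          ansible_host: 192.0.2.21
        docker02:
          ansible_host: 192.0.2.22
    proxmox_hosts:
      hosts:
        pve01:
          ansible_host: 192.0.2.2
```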

---

## Git conventions

Single-contributor, trunk-based (no merge requests / approval gates):

- `main` is the trunk and must always work — small, safe changes commit straight to it
- Branch for sweeping or AI-driven changes you want to review as one diff or be able
  to abandon: `role/<name>`, `fix/<description>`, `feat/<description>`,
  `chore/<description>`; merge to `main` when reviewed, then delete the branch
- Run `make lint` (and `make test` for touched roles) before committing
- Commit in logical units; imperative subject ≤72 chars
- AI agents commit their own work in logical units with a `Co-Authored-By` trailer
- Push to the Forgejo `origin` often — it is the off-machine backup
- Never commit secrets; a `vault.yml` must be `$ANSIBLE_VAULT`-encrypted (pre-commit
  enforces this, plus gitleaks secret scanning)

---

## Dependencies policy

- **No Galaxy roles** — all roles are local; never add a Galaxy role to `requirements.yml`
- **Collections on demand** — only add a collection when a task in a committed role
  uses a module from it; add a comment in `requirements.yml` naming the module(s) used
- Full rationale: `docs/decisions/003-toolchain.md` (Collections and roles policy)
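
A sketch of a `requirements.yml` entry that follows this policy, assuming a role that calls `community.docker.docker_compose_v2`. The role attribution and the version constraint are assumptions, not the repo's actual pins.

```yaml
# requirements.yml (collections only; no `roles:` section, since Galaxy roles are not allowed)
collections:
  # Modules used: community.docker.docker_compose_v2 (hypothetically, by the docker_host role)
  - name: community.docker
    version: ">=3.6.0"   # assumption: first release that ships docker_compose_v2
```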

---

## Terraform conventions

- Terraform owns VM existence only — nothing inside a VM, and no DNS records
- Internal DNS is entirely Ansible (the `dns` role renders the zone from inventory)
- OPNsense is entirely Ansible; do not reach for a Terraform OPNsense provider
- Environments are separate directories (`staging/`, `production/`), not workspaces
- Secrets via `TF_VAR_*` env vars only — never in `.tfvars` files
- `terraform.tfvars.example` is tracked; `terraform.tfvars` is gitignored
- `.terraform.lock.hcl` is tracked (pins provider versions)
- Full rationale: `docs/decisions/006-terraform.md`

---

## What Claude must not do without explicit instruction

- Run `make deploy` — always run `make check` first and show output
- Run `make tf-apply` — always run `make tf-plan` first and show output
- Modify `inventories/<env>/hosts.yml` directly — regenerate via `make tf-inventory`
- Edit vault-encrypted files directly — decrypt first, re-encrypt after
- Force-push or rewrite already-pushed history on `main`
- Add a collection to `requirements.yml` unless a task in a committed role already uses a module from it

---

## Further reading

| Topic                  | File                                         |
|------------------------|----------------------------------------------|
| Architecture overview  | `docs/decisions/001-architecture.md`         |
| Security baseline      | `docs/decisions/002-security.md`             |
| Toolchain choices      | `docs/decisions/003-toolchain.md`            |
| Docker & Compose model | `docs/decisions/004-docker-model.md`         |
| Bootstrapping hosts    | `docs/decisions/005-bootstrapping.md`        |
| Terraform              | `docs/decisions/006-terraform.md`            |
| Network topology       | `docs/decisions/007-network.md`              |
| Testing methodology    | `docs/decisions/008-testing.md`              |
| TF ↔ Ansible handoff   | `docs/decisions/009-provisioning-handoff.md` |
| Forgejo & CI           | `docs/decisions/010-forgejo-ci.md`           |
| Adding a new role      | `docs/runbooks/new-role.md`                  |
| Adding a new host      | `docs/runbooks/new-host.md`                  |
| Rotating vault secrets | `docs/runbooks/rotate-secrets.md`            |