Add project orientation and contributor docs

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
sjat 2026-05-30 14:10:01 +02:00
parent 9a8181ef18
commit 19d93d32dc
5 changed files with 386 additions and 0 deletions

AGENTS.md Normal file

@@ -0,0 +1,23 @@
# Guidance for AI coding agents
**Read `CLAUDE.md` first — it is the authoritative, detailed guide for this repo.**
This file exists so that non-Claude tools find the same rules; `CLAUDE.md` is
canonical. Also read **`STATUS.md`** to learn what actually exists versus what is
only designed — much of the ADR-described design is not built yet.
## Non-negotiables (full detail in CLAUDE.md)
- **Verify before claiming done.** Run `make lint` and the relevant `make check` /
`make test`, and report the real output. Never assert success you haven't observed.
- **Never edit generated files** (e.g. `inventories/*/hosts.yml`). Edit the source
(`terraform/environments/<env>/main.tf`) and regenerate with `make tf-inventory`.
Generated files carry a header saying so.
- **Secrets only in `vault.yml`** files — never plaintext elsewhere. Never read,
print, or commit `.vault_pass`.
- **No `make deploy` / `make tf-apply`** without running `make check` / `make tf-plan`
first and showing the output.
- **Before deleting or overwriting a file you did not create, read it first** and
surface what you find rather than proceeding blind.
- **Check `STATUS.md`** before assuming a role, provider, or pipeline exists.
- **Git**: `main` must always work; branch for sweeping changes. Commit your work in
logical units with imperative ≤72-char subjects and a `Co-Authored-By` trailer.

CLAUDE.md Normal file

@@ -0,0 +1,167 @@
# CLAUDE.md — Ansible homelab monorepo
This file is read by Claude Code at the start of every session.
Keep it dense and command-focused. Verbose detail lives in `docs/`.
> **Before assuming a role, provider, or pipeline exists, check `STATUS.md`.**
> Much of the design in `docs/decisions/` is intended, not yet built (e.g. the
> `base`/`docker_host` roles are currently empty; Terraform is not `init`ed).
---
## Project in one paragraph
Homelab infrastructure automation for a Proxmox cluster running 25 Debian 13 VMs.
All hosts share a hardened base configuration. Each host runs a defined set of Docker
services deployed via Compose files rendered from Ansible templates. Ansible runs from
a dedicated control VM. CI runs on Forgejo Actions (self-hosted).
Full design rationale: `docs/decisions/`
---
## Key commands
| Action | Command |
|-------------------------------|--------------------------------------------------|
| Lint everything | `make lint` |
| Test a single role | `make test ROLE=<name>` |
| Test all roles | `make test-all` |
| Check mode (dry run) | `make check PLAYBOOK=<name>` |
| Deploy a playbook | `make deploy PLAYBOOK=<name>` |
| Scaffold a new role | `make new-role NAME=<name>` |
| Encrypt a vault file | `make encrypt FILE=<path>` |
| Decrypt a vault file | `make decrypt FILE=<path>` |
| Install Python deps | `make setup` |
| Install Ansible collections | `make collections` |
| Initialise Terraform | `make tf-init [TF_ENV=staging]` |
| Terraform plan | `make tf-plan [TF_ENV=staging]` |
| Terraform apply | `make tf-apply [TF_ENV=staging]` |
| Regenerate Ansible inventory | `make tf-inventory TF_ENV=<staging\|production>` |
**Always `tf-plan` before `tf-apply`. Always `check` before `deploy`. Never skip lint.**
`TF_ENV` defaults to `staging`. Always specify `TF_ENV=production` explicitly for production.
---
## Ansible conventions
- **FQCN always**: `ansible.builtin.template`, never `template`
- **Tags**: every task must have at least one tag; playbooks support `--tags` filtering
- **Handlers**: use `listen:` topic strings, not direct name references
- **Variables**: `rolename__varname` double-underscore namespace for role defaults
- **No inline vars in playbooks**: use `group_vars/` or `host_vars/` only
- **Loops**: prefer `loop:` over `with_items:`
- **Conditionals**: prefer `true`/`false` over `yes`/`no`
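Several of these conventions can be checked mechanically. The following is a minimal sketch, not part of this repo's tooling: `check_task` and `TASK_KEYWORDS` are hypothetical names, the keyword list is deliberately incomplete, and the function works on an already-parsed task mapping because YAML parsing is not in the Python standard library. In practice `make lint` is the real gate.

```python
# Hypothetical convention checker; not part of this repo's tooling.
# Keys treated as task keywords rather than module names (assumed, incomplete list).
TASK_KEYWORDS = {
    "name", "tags", "when", "loop", "loop_control", "register", "notify",
    "become", "become_user", "vars", "changed_when", "failed_when",
    "with_items", "delegate_to", "environment", "no_log",
}


def check_task(task: dict) -> list[str]:
    """Return a list of convention violations for one parsed task mapping."""
    problems = []
    if not task.get("tags"):
        problems.append("missing tags")
    if "with_items" in task:
        problems.append("use loop: instead of with_items:")
    for key in task:
        if key in TASK_KEYWORDS:
            continue
        # Anything left is taken to be the module name. An FQCN has at least
        # two dots, e.g. ansible.builtin.template.
        if key.count(".") < 2:
            problems.append(f"module '{key}' is not an FQCN")
    return problems


good = {
    "name": "Render the Compose file",
    "ansible.builtin.template": {"src": "compose.yml.j2", "dest": "/opt/app/compose.yml"},
    "tags": ["compose"],
    "notify": "restart app",
}
bad = {
    "name": "Render the Compose file",
    "template": {"src": "compose.yml.j2", "dest": "/opt/app/compose.yml"},
    "with_items": [1],
}

print(check_task(good))
print(check_task(bad))
```

The `good` task follows the FQCN, tag, and handler-topic rules; the `bad` task breaks three of them at once, which is the kind of drift the conventions are meant to prevent.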
---
## Secrets
- Encrypted files are always named `vault.yml`, sitting alongside `vars.yml`
- Never put plaintext secrets in any file not named `vault.yml`
- Vault password file: `.vault_pass` (gitignored — obtain via secure channel)
- To edit a vault file: `make decrypt FILE=<path>`, edit, `make encrypt FILE=<path>`
---
## Role conventions
- Every role must have `molecule/default/` scenario targeting Debian 13
- Every role must have a populated `README.md`
- Every role must have `meta/main.yml` filled in
- Role names: `snake_case`, descriptive nouns (`base`, `docker_host`, `reverse_proxy`)
- Use `make new-role NAME=<name>` to scaffold — never create role structure by hand
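The role rules above amount to a small layout check. This is a sketch under stated assumptions: `check_role`, `REQUIRED_PATHS`, and `SNAKE_CASE` are hypothetical names, and nothing here claims to mirror what `make new-role` actually generates. The demonstration runs against a throwaway directory, not the real `roles/` tree.

```python
import re
import tempfile
from pathlib import Path

# Paths every role must contain, per the conventions above.
REQUIRED_PATHS = ("molecule/default", "README.md", "meta/main.yml")
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def check_role(role_dir: Path) -> list[str]:
    """Return convention violations for one role directory (hypothetical helper)."""
    problems = []
    if not SNAKE_CASE.match(role_dir.name):
        problems.append(f"role name '{role_dir.name}' is not snake_case")
    for rel in REQUIRED_PATHS:
        if not (role_dir / rel).exists():
            problems.append(f"missing {rel}")
    readme = role_dir / "README.md"
    if readme.is_file() and not readme.read_text(encoding="utf-8").strip():
        problems.append("README.md is empty")
    return problems


# Build a role that has the right layout but an unpopulated README.
with tempfile.TemporaryDirectory() as tmp:
    role = Path(tmp) / "docker_host"
    (role / "molecule" / "default").mkdir(parents=True)
    (role / "meta").mkdir()
    (role / "meta" / "main.yml").write_text("galaxy_info: {}\n", encoding="utf-8")
    (role / "README.md").write_text("", encoding="utf-8")
    demo_problems = check_role(role)
    print(demo_problems)
```

A check like this cannot tell whether `meta/main.yml` is meaningfully filled in; it only catches missing or empty files.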
---
## Inventory structure
```
inventories/
production/ # live hosts — edit with care
hosts.yml
group_vars/
all/ # applies to every host
vars.yml
vault.yml
control/ # the control node (baseline config only)
docker_hosts/ # hosts running Docker services
proxmox_hosts/ # Proxmox nodes themselves
host_vars/ # per-host overrides
staging/ # safe to run freely
```
Host groups: `all`, `control`, `docker_hosts`, `proxmox_hosts`
(`control` holds the one manually-provisioned control node — see ADR-009.)
---
## Git conventions
Single-contributor, trunk-based (no merge requests / approval gates):
- `main` is the trunk and must always work — small, safe changes commit straight to it
- Branch for sweeping or AI-driven changes you want to review as one diff or be able
to abandon: `role/<name>`, `fix/<description>`, `feat/<description>`,
`chore/<description>`; merge to `main` when reviewed, then delete the branch
- Run `make lint` (and `make test` for touched roles) before committing
- Commit in logical units; imperative subject ≤72 chars
- AI agents commit their own work in logical units with a `Co-Authored-By` trailer
- Push to the Forgejo `origin` often — it is the off-machine backup
- Never commit secrets; a `vault.yml` must be `$ANSIBLE_VAULT`-encrypted (pre-commit
enforces this, plus gitleaks secret scanning)
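The vault-encryption rule can be illustrated with a short check. Files encrypted by `ansible-vault` begin with a header line starting `$ANSIBLE_VAULT;`, so a guard only needs to read the first line. This is a sketch of the idea, not the repo's actual pre-commit hook; `is_vault_encrypted` and `unencrypted_vaults` are hypothetical names.

```python
import tempfile
from pathlib import Path

VAULT_HEADER = "$ANSIBLE_VAULT;"  # start of the first line of an encrypted vault file


def is_vault_encrypted(path: Path) -> bool:
    """True if the file's first line carries the ansible-vault header."""
    with path.open("r", encoding="utf-8", errors="replace") as handle:
        return handle.readline().startswith(VAULT_HEADER)


def unencrypted_vaults(paths: list[Path]) -> list[Path]:
    """Return every file named vault.yml that is not encrypted."""
    return [p for p in paths if p.name == "vault.yml" and not is_vault_encrypted(p)]


# Demonstration with one encrypted-looking file and one plaintext file.
with tempfile.TemporaryDirectory() as tmp:
    encrypted = Path(tmp) / "all" / "vault.yml"
    plaintext = Path(tmp) / "docker_hosts" / "vault.yml"
    for p in (encrypted, plaintext):
        p.parent.mkdir()
    encrypted.write_text("$ANSIBLE_VAULT;1.1;AES256\n6162636465\n", encoding="utf-8")
    plaintext.write_text("db_password: example\n", encoding="utf-8")
    offenders = unencrypted_vaults([encrypted, plaintext])
    offender_names = [p.parent.name for p in offenders]
    print(offender_names)
```

A header check only proves the file looks encrypted; it does not replace secret scanning for plaintext secrets in other files.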
---
## Dependencies policy
- **No Galaxy roles** — all roles are local; never add a Galaxy role to `requirements.yml`
- **Collections on demand** — only add a collection when a task in a committed role
uses a module from it; add a comment in `requirements.yml` naming the module(s) used
- Full rationale: `docs/decisions/003-toolchain.md` (Collections and roles policy)
---
## Terraform conventions
- Terraform owns VM existence only — nothing inside a VM, and no DNS records
- Internal DNS is entirely Ansible (the `dns` role renders the zone from inventory)
- OPNsense is entirely Ansible; do not reach for a Terraform OPNsense provider
- Environments are separate directories (`staging/`, `production/`), not workspaces
- Secrets via `TF_VAR_*` env vars only — never in `.tfvars` files
- `terraform.tfvars.example` is tracked; `terraform.tfvars` is gitignored
- `.terraform.lock.hcl` is tracked (pins provider versions)
- Full rationale: `docs/decisions/006-terraform.md`
---
## What Claude must not do without explicit instruction
- Run `make deploy` — always run `make check` first and show output
- Run `make tf-apply` — always run `make tf-plan` first and show output
- Modify `inventories/production/hosts.yml` directly — regenerate via `make tf-inventory`
- Edit vault-encrypted files directly — decrypt first, re-encrypt after
- Push to `main` branch
- Add a collection to `requirements.yml` without a specific module need in existing role tasks
---
## Further reading
| Topic | File |
|------------------------|---------------------------------------|
| Architecture overview | `docs/decisions/001-architecture.md` |
| Security baseline | `docs/decisions/002-security.md` |
| Toolchain choices | `docs/decisions/003-toolchain.md` |
| Docker & Compose model | `docs/decisions/004-docker-model.md` |
| Bootstrapping hosts | `docs/decisions/005-bootstrapping.md` |
| Terraform | `docs/decisions/006-terraform.md` |
| Network topology | `docs/decisions/007-network.md` |
| Testing methodology | `docs/decisions/008-testing.md` |
| TF ↔ Ansible handoff | `docs/decisions/009-provisioning-handoff.md` |
| Adding a new role | `docs/runbooks/new-role.md` |
| Adding a new host | `docs/runbooks/new-host.md` |
| Rotating vault secrets | `docs/runbooks/rotate-secrets.md` |

CONTRIBUTING.md Normal file

@@ -0,0 +1,59 @@
# Contributing
## Conventions
- All Ansible modules use FQCN: `ansible.builtin.template`, not `template`
- Every task has a `name:` that reads as a sentence and at least one tag
- Role variables use `rolename__varname` double-underscore namespace
- No plaintext secrets outside `vault.yml` files
## Branching
Single-contributor, trunk-based:
- `main` is the trunk and must always work. Small, self-contained changes commit
straight to `main`.
- Use a short-lived branch for sweeping or AI-driven changes you want to review as
one diff or be able to abandon: `role/<name>`, `fix/<description>`,
`feat/<description>`, `chore/<description>`. Merge to `main` when it looks right,
then delete the branch.
- Run `make lint` (and `make test` for touched roles) before committing.
- Commit messages: imperative mood, ≤72-char subject; commit in logical units.
- Push to Forgejo often — it is the off-machine backup.
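The branch-name and subject-length rules above are simple enough to express as code. A minimal sketch, assuming hypothetical helper names (`check_subject`, `branch_ok`); imperative mood is a judgment call and is not checked here.

```python
BRANCH_PREFIXES = ("role/", "fix/", "feat/", "chore/")
MAX_SUBJECT = 72


def check_subject(message: str) -> list[str]:
    """Check the first line of a commit message against the length rule."""
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    problems = []
    if not subject.strip():
        problems.append("empty subject")
    if len(subject) > MAX_SUBJECT:
        problems.append(f"subject is {len(subject)} chars, limit is {MAX_SUBJECT}")
    return problems


def branch_ok(name: str) -> bool:
    """main is always allowed; other branches need a known prefix plus a suffix."""
    if name == "main":
        return True
    return any(name.startswith(p) and len(name) > len(p) for p in BRANCH_PREFIXES)


print(check_subject("Add project orientation and contributor docs"))
print(branch_ok("role/dns"), branch_ok("wip"))
```

Something like this could run from a `commit-msg` hook, but whether the repo wires it up that way is not stated here.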
## Adding a role
Follow the runbook: `docs/runbooks/new-role.md`
Always use `make new-role NAME=<name>` to scaffold — never create structure by hand.
## Secrets
The vault password is shared via a secure channel (password manager).
Never commit `.vault_pass`. Never put secrets in non-`vault.yml` files.
See `docs/runbooks/rotate-secrets.md` for rotation procedures.
## Generated files
Some files are produced by tooling and must not be hand-edited — change the source
and regenerate. Each generated file carries a header saying so.
| Generated file | Source of truth | Regenerate with |
|---|---|---|
| `inventories/<env>/hosts.yml` | `terraform/environments/<env>/main.tf` (`local.vms`) | `make tf-inventory TF_ENV=<env>` |
Exception: the control node is added to `hosts.yml` by hand — see
`docs/runbooks/new-host.md`.
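To make the generate-don't-edit flow concrete, here is a rough sketch of turning `terraform output -json` into an inventory file. It is not the real `scripts/tf_to_inventory.py`. The output name `vms`, the `ip` and `group` fields, and the header wording are all assumptions made for illustration; the only fixed point is that `terraform output -json` wraps each output in an object with a `value` key.

```python
import json

# Hypothetical header text; the real generated files may word this differently.
HEADER = "# GENERATED FILE: do not edit by hand. Regenerate with `make tf-inventory`."


def render_inventory(tf_output_json: str) -> str:
    """Render a minimal Ansible YAML inventory from `terraform output -json` text."""
    outputs = json.loads(tf_output_json)
    # Each Terraform output is an object whose "value" key holds the data.
    vms = outputs["vms"]["value"]  # "vms" is an assumed output name

    # Group hosts by their (assumed) "group" attribute.
    groups: dict[str, dict[str, str]] = {}
    for hostname, attrs in vms.items():
        groups.setdefault(attrs["group"], {})[hostname] = attrs["ip"]

    lines = [HEADER, "all:", "  children:"]
    for group in sorted(groups):
        lines.append(f"    {group}:")
        lines.append("      hosts:")
        for hostname in sorted(groups[group]):
            lines.append(f"        {hostname}:")
            lines.append(f"          ansible_host: {groups[group][hostname]}")
    return "\n".join(lines) + "\n"


sample = json.dumps({
    "vms": {
        "sensitive": False,
        "value": {
            "app01": {"ip": "192.0.2.11", "group": "docker_hosts"},
            "app02": {"ip": "192.0.2.12", "group": "docker_hosts"},
        },
    }
})
inventory = render_inventory(sample)
print(inventory)
```

Sorting groups and hosts keeps regenerated output stable, so a rerun with unchanged Terraform state produces no diff. The sketch does not handle the hand-added control node.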
## Testing
Before committing to `main` or merging a branch into it:
```bash
make lint
make test ROLE=<affected-role>
make check PLAYBOOK=site
```
All three must pass cleanly.

README.md Normal file

@@ -0,0 +1,86 @@
# Ansible homelab
Infrastructure automation for a Proxmox-based homelab running primarily Debian 13 VMs
with Docker services. Stable, secure, and fully managed via Ansible.
## Quick start (control node)
```bash
git clone <repo-url> ~/ansible
cd ~/ansible
# Create venv and install dependencies
make setup
make collections
# Place vault password (obtain via secure channel)
echo "your-vault-password" > .vault_pass
chmod 600 .vault_pass
# Verify setup
make lint
```
## Common operations
| What | Command |
| --------------------- | ------------------------------ |
| Lint everything | `make lint` |
| Dry-run site playbook | `make check PLAYBOOK=site` |
| Deploy everything | `make deploy PLAYBOOK=site` |
| Test a role | `make test ROLE=base` |
| Scaffold a new role | `make new-role NAME=myservice` |
See `Makefile` for the full list of targets.
## Project structure
```
.
├── CLAUDE.md # Claude Code session context
├── Makefile # All operations go through here
├── ansible.cfg # Project-scoped Ansible config
├── requirements.txt # Python dependencies
├── requirements.yml # Ansible collections
├── docs/
│ ├── decisions/ # Architecture decision records (ADRs)
│ └── runbooks/ # Step-by-step operational procedures
├── inventories/
│ ├── production/ # Live hosts — edit carefully
│ └── staging/ # Test hosts — safe to run freely
├── playbooks/ # Orchestration playbooks
│ ├── site.yml # Full standard state
│ └── bootstrap.yml # First-run new host setup
├── roles/ # Ansible roles
│ ├── base/ # OS baseline applied to all hosts
│ └── docker_host/ # Docker runtime setup
├── terraform/ # VM provisioning only; DNS is Ansible's job (see ADR-006/009)
│ ├── modules/ # Reusable modules (proxmox_vm)
│ └── environments/ # Per-env state: staging/, production/
└── scripts/ # Helper scripts (tf_to_inventory.py)
```
## Documentation
- **Current state (built vs planned): `STATUS.md`** — read this before assuming
something exists; the ADRs describe intent, not necessarily reality.
- AI agents: `AGENTS.md` (points to `CLAUDE.md`, the authoritative guide)
- Architecture: `docs/decisions/001-architecture.md`
- Security baseline: `docs/decisions/002-security.md`
- Toolchain decisions: `docs/decisions/003-toolchain.md`
- Docker model: `docs/decisions/004-docker-model.md`
- Bootstrapping: `docs/decisions/005-bootstrapping.md`
- Terraform: `docs/decisions/006-terraform.md`
- Network topology: `docs/decisions/007-network.md`
- Testing methodology: `docs/decisions/008-testing.md`
- Terraform ↔ Ansible handoff: `docs/decisions/009-provisioning-handoff.md`
## Contributing
See `CONTRIBUTING.md` for conventions, branching strategy, and how to add roles.

STATUS.md Normal file

@@ -0,0 +1,51 @@
# Project status — what's real vs planned
This repo is partly aspirational: the ADRs in `docs/decisions/` describe the
*intended* design, and some of it is **not built yet**. This file is the ground
truth. **Before relying on a role, provider, or pipeline existing, check here.**
If something is listed as "designed, not built", do not assume it works.
_Last reviewed: 2026-05-30._
## Real and working today
| Thing | State |
|---|---|
| `playbooks/bootstrap.yml` | Works — self-contained (installs Python, creates the `ansible` user + sudoers) |
| `scripts/tf_to_inventory.py` | Works — stdlib only; `terraform output -json` → `hosts.yml` |
| `.docker/molecule-debian13/Dockerfile` | Present — custom Molecule test image (ADR-008) |
| `docs/decisions/*`, `docs/runbooks/*` | Current and consistent with each other |
| `Makefile`, lint config (`.ansible-lint`, `.yamllint`), `.gitignore` | Present and used |
| `git` (local) | Initialized — trunk-based on `main`. Off-machine remote (Forgejo) being set up separately. |
| Pre-commit hooks | Configured: lint, gitleaks, vault-encryption guard. Activate with `pre-commit install` after `make setup`. |
| Terraform HCL (`terraform/`) | Written (proxmox VM module + envs) — but never run; see below |
## Scaffolded but empty — NOT implemented
| Thing | State |
|---|---|
| `roles/base/` | Empty directory. `site.yml` references it, but it applies nothing. |
| `roles/docker_host/` | Empty directory. Same. |
| `inventories/*/hosts.yml` | Placeholder stubs (commented examples); regenerated by `make tf-inventory` once Terraform has hosts |
| `inventories/production/group_vars/{docker_hosts,proxmox_hosts}/` | Empty dirs |
So `make deploy PLAYBOOK=site` currently does effectively nothing — the roles it
calls are empty.
## Designed but not built
| Thing | Designed in | Notes |
|---|---|---|
| `dns` role (renders the internal zone) | ADR-007 / ADR-009 | Does not exist. Internal DNS ownership is assigned to it by design. |
| Terraform actually provisioning | ADR-006 / ADR-009 | Never `terraform init`ed: no `.terraform.lock.hcl`, no state, no real `local.vms` entries |
| CI (Forgejo Actions) | ADR-003 / ADR-008 | Pipeline described; not implemented |
| Level 2 / 3 testing (staging, `askari` smoke) | ADR-008 | Depends on real VMs / `askari`, which don't exist yet |
| Per-service roles | ADR-004 | Model defined; no service roles built |
| Forgejo remote + CI | ADR-003 / ADR-008 | Local git is live; pushing to `git.baobab.band` and Actions CI are being set up |
## Keeping this honest
Update this file whenever you build, stub, or remove something. It is the first
place an AI tool or new contributor should look to learn what they can actually
rely on. When a row moves from "designed" to "working", move it up — don't leave
stale optimism here.