diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 0000000..3789d72 --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,23 @@ +# Guidance for AI coding agents + +**Read `CLAUDE.md` first — it is the authoritative, detailed guide for this repo.** +This file exists so that non-Claude tools find the same rules; `CLAUDE.md` is +canonical. Also read **`STATUS.md`** to learn what actually exists versus what is +only designed — much of the ADR-described design is not built yet. + +## Non-negotiables (full detail in CLAUDE.md) + +- **Verify before claiming done.** Run `make lint` and the relevant `make check` / + `make test`, and report the real output. Never assert success you haven't observed. +- **Never edit generated files** (e.g. `inventories/*/hosts.yml`). Edit the source + (`terraform/environments//main.tf`) and regenerate with `make tf-inventory`. + Generated files carry a header saying so. +- **Secrets only in `vault.yml`** files — never plaintext elsewhere. Never read, + print, or commit `.vault_pass`. +- **No `make deploy` / `make tf-apply`** without running `make check` / `make tf-plan` + first and showing the output. +- **Before deleting or overwriting a file you did not create, read it first** and + surface what you find rather than proceeding blind. +- **Check `STATUS.md`** before assuming a role, provider, or pipeline exists. +- **Git**: `main` must always work; branch for sweeping changes. Commit your work in + logical units with imperative ≤72-char subjects and a `Co-Authored-By` trailer. diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 0000000..d7125a3 --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,167 @@ +# CLAUDE.md — Ansible homelab monorepo + +This file is read by Claude Code at the start of every session. +Keep it dense and command-focused. Verbose detail lives in `docs/`. + +> **Before assuming a role, provider, or pipeline exists, check `STATUS.md`.** +> Much of the design in `docs/decisions/` is intended, not yet built (e.g. 
the +> `base`/`docker_host` roles are currently empty; Terraform is not `init`ed). + +--- + +## Project in one paragraph + +Homelab infrastructure automation for a Proxmox cluster running 2–5 Debian 13 VMs. +All hosts share a hardened base configuration. Each host runs a defined set of Docker +services deployed via Compose files rendered from Ansible templates. Ansible runs from +a dedicated control VM. CI runs on Forgejo Actions (self-hosted). + +Full design rationale: `docs/decisions/` + +--- + +## Key commands + +| Action | Command | +|-------------------------------|--------------------------------------------------| +| Lint everything | `make lint` | +| Test a single role | `make test ROLE=` | +| Test all roles | `make test-all` | +| Check mode (dry run) | `make check PLAYBOOK=` | +| Deploy a playbook | `make deploy PLAYBOOK=` | +| Scaffold a new role | `make new-role NAME=` | +| Encrypt a vault file | `make encrypt FILE=` | +| Decrypt a vault file | `make decrypt FILE=` | +| Install Python deps | `make setup` | +| Install Ansible collections | `make collections` | +| Initialise Terraform | `make tf-init [TF_ENV=staging]` | +| Terraform plan | `make tf-plan [TF_ENV=staging]` | +| Terraform apply | `make tf-apply [TF_ENV=staging]` | +| Regenerate Ansible inventory | `make tf-inventory TF_ENV=` | + +**Always `tf-plan` before `tf-apply`. Always `check` before `deploy`. Never skip lint.** + +`TF_ENV` defaults to `staging`. Always specify `TF_ENV=production` explicitly for production. 
+ +--- + +## Ansible conventions + +- **FQCN always**: `ansible.builtin.template`, never `template` +- **Tags**: every task must have at least one tag; playbooks support `--tags` filtering +- **Handlers**: use `listen:` topic strings, not direct name references +- **Variables**: `rolename__varname` double-underscore namespace for role defaults +- **No inline vars in playbooks**: use `group_vars/` or `host_vars/` only +- **Loops**: prefer `loop:` over `with_items:` +- **Conditionals**: prefer `true`/`false` over `yes`/`no` + +--- + +## Secrets + +- Encrypted files are always named `vault.yml`, sitting alongside `vars.yml` +- Never put plaintext secrets in any file not named `vault.yml` +- Vault password file: `.vault_pass` (gitignored — obtain via secure channel) +- To edit a vault file: `make decrypt FILE=`, edit, `make encrypt FILE=` + +--- + +## Role conventions + +- Every role must have `molecule/default/` scenario targeting Debian 13 +- Every role must have a populated `README.md` +- Every role must have `meta/main.yml` filled in +- Role names: `snake_case`, descriptive nouns (`base`, `docker_host`, `reverse_proxy`) +- Use `make new-role NAME=` to scaffold — never create role structure by hand + +--- + +## Inventory structure + +``` +inventories/ + production/ # live hosts — edit with care + hosts.yml + group_vars/ + all/ # applies to every host + vars.yml + vault.yml + control/ # the control node (baseline config only) + docker_hosts/ # hosts running Docker services + proxmox_hosts/ # Proxmox nodes themselves + host_vars/ # per-host overrides + staging/ # safe to run freely +``` + +Host groups: `all`, `control`, `docker_hosts`, `proxmox_hosts` + +(`control` holds the one manually-provisioned control node — see ADR-009.) 
+ +--- + +## Git conventions + +Single-contributor, trunk-based (no merge requests / approval gates): + +- `main` is the trunk and must always work — small, safe changes commit straight to it +- Branch for sweeping or AI-driven changes you want to review as one diff or be able + to abandon: `role/`, `fix/`, `feat/`, + `chore/`; merge to `main` when reviewed, then delete the branch +- Run `make lint` (and `make test` for touched roles) before committing +- Commit in logical units; imperative subject ≤72 chars +- AI agents commit their own work in logical units with a `Co-Authored-By` trailer +- Push to the Forgejo `origin` often — it is the off-machine backup +- Never commit secrets; a `vault.yml` must be `$ANSIBLE_VAULT`-encrypted (pre-commit + enforces this, plus gitleaks secret scanning) + +--- + +## Dependencies policy + +- **No Galaxy roles** — all roles are local; never add a Galaxy role to `requirements.yml` +- **Collections on demand** — only add a collection when a task in a committed role + uses a module from it; add a comment in `requirements.yml` naming the module(s) used +- Full rationale: `docs/decisions/003-toolchain.md` (Collections and roles policy) + +--- + +## Terraform conventions + +- Terraform owns VM existence only — nothing inside a VM, and no DNS records +- Internal DNS is entirely Ansible (the `dns` role renders the zone from inventory) +- OPNsense is entirely Ansible; do not reach for a Terraform OPNsense provider +- Environments are separate directories (`staging/`, `production/`), not workspaces +- Secrets via `TF_VAR_*` env vars only — never in `.tfvars` files +- `terraform.tfvars.example` is tracked; `terraform.tfvars` is gitignored +- `.terraform.lock.hcl` is tracked (pins provider versions) +- Full rationale: `docs/decisions/006-terraform.md` + +--- + +## What Claude must not do without explicit instruction + +- Run `make deploy` — always run `make check` first and show output +- Run `make tf-apply` — always run `make tf-plan` first 
and show output +- Modify `inventories/production/hosts.yml` directly — regenerate via `make tf-inventory` +- Edit vault-encrypted files directly — decrypt first, re-encrypt after +- Push to `main` branch +- Add a collection to `requirements.yml` without a specific module need in existing role tasks + +--- + +## Further reading + +| Topic | File | +|------------------------|---------------------------------------| +| Architecture overview | `docs/decisions/001-architecture.md` | +| Security baseline | `docs/decisions/002-security.md` | +| Toolchain choices | `docs/decisions/003-toolchain.md` | +| Docker & Compose model | `docs/decisions/004-docker-model.md` | +| Bootstrapping hosts | `docs/decisions/005-bootstrapping.md` | +| Terraform | `docs/decisions/006-terraform.md` | +| Network topology | `docs/decisions/007-network.md` | +| Testing methodology | `docs/decisions/008-testing.md` | +| TF ↔ Ansible handoff | `docs/decisions/009-provisioning-handoff.md` | +| Adding a new role | `docs/runbooks/new-role.md` | +| Adding a new host | `docs/runbooks/new-host.md` | +| Rotating vault secrets | `docs/runbooks/rotate-secrets.md` | diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md new file mode 100644 index 0000000..cd5d96b --- /dev/null +++ b/CONTRIBUTING.md @@ -0,0 +1,59 @@ +# Contributing + +## Conventions + +- All Ansible modules use FQCN: `ansible.builtin.template`, not `template` +- Every task has a `name:` that reads as a sentence and at least one tag +- Role variables use `rolename__varname` double-underscore namespace +- No plain text secrets outside `vault.yml` files + +## Branching + +Single-contributor, trunk-based: + +- `main` is the trunk and must always work. Small, self-contained changes commit + straight to `main`. +- Use a short-lived branch for sweeping or AI-driven changes you want to review as + one diff or be able to abandon: `role/`, `fix/`, + `feat/`, `chore/`. Merge to `main` when it looks right, + then delete the branch. 
+- Run `make lint` (and `make test` for touched roles) before committing. +- Commit messages: imperative mood, ≤72-char subject; commit in logical units. +- Push to Forgejo often — it is the off-machine backup. + +## Adding a role + +Follow the runbook: `docs/runbooks/new-role.md` + +Always use `make new-role NAME=` to scaffold — never create structure by hand. + +## Secrets + +Vault password is shared via a secure channel (password manager). +Never commit `.vault_pass`. Never put secrets in non-`vault.yml` files. + +See `docs/runbooks/rotate-secrets.md` for rotation procedures. + +## Generated files + +Some files are produced by tooling and must not be hand-edited — change the source +and regenerate. Each generated file carries a header saying so. + +| Generated file | Source of truth | Regenerate with | +|---|---|---| +| `inventories//hosts.yml` | `terraform/environments//main.tf` (`local.vms`) | `make tf-inventory TF_ENV=` | + +Exception: the control node is added to `hosts.yml` by hand — see +`docs/runbooks/new-host.md`. + +## Testing + +Before committing to `main` or merging a branch: + +```bash +make lint +make test ROLE= +make check PLAYBOOK=site +``` + +All three must pass cleanly. diff --git a/README.md b/README.md new file mode 100644 index 0000000..50bc6df --- /dev/null +++ b/README.md @@ -0,0 +1,86 @@ +# Ansible homelab + +Infrastructure automation for a Proxmox-based homelab running primarily Debian 13 VMs +with Docker services. Stable, secure, and fully managed via Ansible. 
+ +## Quick start (control node) + +```bash +git clone ~/ansible +cd ~/ansible + +# Create venv and install dependencies +make setup +make collections + +# Place vault password (obtain via secure channel) +echo "your-vault-password" > .vault_pass +chmod 600 .vault_pass + +# Verify setup +make lint +``` + +## Common operations + +| What | Command | +| --------------------- | ------------------------------ | +| Lint everything | `make lint` | +| Dry-run site playbook | `make check PLAYBOOK=site` | +| Deploy everything | `make deploy PLAYBOOK=site` | +| Test a role | `make test ROLE=base` | +| Scaffold a new role | `make new-role NAME=myservice` | + +See `Makefile` for the full list of targets. + +## Project structure + +``` +. +├── CLAUDE.md # Claude Code session context +├── Makefile # All operations go through here +├── ansible.cfg # Project-scoped Ansible config +├── requirements.txt # Python dependencies +├── requirements.yml # Ansible collections +│ +├── docs/ +│ ├── decisions/ # Architecture decision records (ADRs) +│ └── runbooks/ # Step-by-step operational procedures +│ +├── inventories/ +│ ├── production/ # Live hosts — edit carefully +│ └── staging/ # Test hosts — safe to run freely +│ +├── playbooks/ # Orchestration playbooks +│ ├── site.yml # Full standard state +│ └── bootstrap.yml # First-run new host setup +│ +├── roles/ # Ansible roles +│ ├── base/ # OS baseline applied to all hosts +│ └── docker_host/ # Docker runtime setup +│ +├── terraform/ # VM provisioning only; DNS is Ansible-owned (see ADR-006/009) +│ ├── modules/ # Reusable modules (proxmox_vm) +│ └── environments/ # Per-env state: staging/, production/ +│ +└── scripts/ # Helper scripts (tf_to_inventory.py) +``` + +## Documentation + +- **Current state (built vs planned): `STATUS.md`** — read this before assuming + something exists; the ADRs describe intent, not necessarily reality. 
+- AI agents: `AGENTS.md` (points to `CLAUDE.md`, the authoritative guide) +- Architecture: `docs/decisions/001-architecture.md` +- Security baseline: `docs/decisions/002-security.md` +- Toolchain decisions: `docs/decisions/003-toolchain.md` +- Docker model: `docs/decisions/004-docker-model.md` +- Bootstrapping: `docs/decisions/005-bootstrapping.md` +- Terraform: `docs/decisions/006-terraform.md` +- Network topology: `docs/decisions/007-network.md` +- Testing methodology: `docs/decisions/008-testing.md` +- Terraform ↔ Ansible handoff: `docs/decisions/009-provisioning-handoff.md` + +## Contributing + +See `CONTRIBUTING.md` for conventions, branching strategy, and how to add roles. diff --git a/STATUS.md b/STATUS.md new file mode 100644 index 0000000..3bff46d --- /dev/null +++ b/STATUS.md @@ -0,0 +1,51 @@ +# Project status — what's real vs planned + +This repo is partly aspirational: the ADRs in `docs/decisions/` describe the +*intended* design, and some of it is **not built yet**. This file is the ground +truth. **Before relying on a role, provider, or pipeline existing, check here.** +If something is listed as "designed, not built", do not assume it works. + +_Last reviewed: 2026-05-30._ + +## Real and working today + +| Thing | State | +|---|---| +| `playbooks/bootstrap.yml` | Works — self-contained (installs Python, creates the `ansible` user + sudoers) | +| `scripts/tf_to_inventory.py` | Works — stdlib only; `terraform output -json` → `hosts.yml` | +| `.docker/molecule-debian13/Dockerfile` | Present — custom Molecule test image (ADR-008) | +| `docs/decisions/*`, `docs/runbooks/*` | Current and mutually reconciled | +| `Makefile`, lint config (`.ansible-lint`, `.yamllint`), `.gitignore` | Present and used | +| `git` (local) | Initialized — trunk-based on `main`. Off-machine remote (Forgejo) being set up separately. | +| Pre-commit hooks | Configured: lint, gitleaks, vault-encryption guard. Activate with `pre-commit install` after `make setup`. 
| +| Terraform HCL (`terraform/`) | Written (proxmox VM module + envs) — but never run; see below | + +## Scaffolded but empty — NOT implemented + +| Thing | State | +|---|---| +| `roles/base/` | Empty directory. `site.yml` references it, but it applies nothing. | +| `roles/docker_host/` | Empty directory. Same. | +| `inventories/*/hosts.yml` | Placeholder stubs (commented examples); regenerated by `make tf-inventory` once Terraform has hosts | +| `inventories/production/group_vars/{docker_hosts,proxmox_hosts}/` | Empty dirs | + +So `make deploy PLAYBOOK=site` currently does effectively nothing — the roles it +calls are empty. + +## Designed but not built + +| Thing | Designed in | Notes | +|---|---|---| +| `dns` role (renders the internal zone) | ADR-007 / ADR-009 | Does not exist. Internal DNS ownership is assigned to it by design. | +| Terraform actually provisioning | ADR-006 / ADR-009 | Never `terraform init`ed: no `.terraform.lock.hcl`, no state, no real `local.vms` entries | +| CI (Forgejo Actions) | ADR-003 / ADR-008 | Pipeline described; not implemented | +| Level 2 / 3 testing (staging, `askari` smoke) | ADR-008 | Depends on real VMs / `askari`, which don't exist yet | +| Per-service roles | ADR-004 | Model defined; no service roles built | +| Forgejo remote + CI | ADR-003 / ADR-008 | Local git is live; pushing to `git.baobab.band` and Actions CI are being set up | + +## Keeping this honest + +Update this file whenever you build, stub, or remove something. It is the first +place an AI tool or new contributor should look to learn what they can actually +rely on. When a row moves from "designed" to "working", move it up — don't leave +stale optimism here.
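Reviewer note on the vault-encryption guard: STATUS.md lists it among the pre-commit hooks, but its implementation is not part of this diff. A minimal sketch of what such a check could look like, assuming it only needs to confirm that every `vault.yml` begins with the `$ANSIBLE_VAULT;` header that `ansible-vault encrypt` writes. The function names and the whole-tree scan are hypothetical and are not taken from the repo's actual hook.

```python
from pathlib import Path

# First-line prefix that ansible-vault writes to every encrypted file,
# e.g. "$ANSIBLE_VAULT;1.1;AES256".
VAULT_HEADER = "$ANSIBLE_VAULT;"


def is_vault_encrypted(path: Path) -> bool:
    """Return True if the file's first line is an Ansible Vault header."""
    with path.open("r", encoding="utf-8", errors="replace") as handle:
        return handle.readline().startswith(VAULT_HEADER)


def find_plaintext_vaults(root: Path) -> list[Path]:
    """List every vault.yml under root that is NOT vault-encrypted.

    Hypothetical helper: a real hook would more likely receive the staged
    file names from pre-commit instead of walking the whole tree.
    """
    return sorted(
        candidate
        for candidate in root.rglob("vault.yml")
        if not is_vault_encrypted(candidate)
    )
```

A hook built on this could call `find_plaintext_vaults(Path("."))` and exit non-zero when the returned list is non-empty, which would block the commit before a plaintext `vault.yml` reaches the Forgejo remote.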