# boma

Infrastructure-as-code for a self-hosted homelab: a Proxmox cluster of Debian 13 VMs
running Docker services, provisioned with **Terraform** and configured with
**Ansible**. Stable, secure, reproducible, and fully version-controlled.

**Scope** — this repo manages *infrastructure*: the cluster's VMs, their hardened
base OS, and the containerised services they run. It does **not** manage personal
machines (laptops, desktops, phones). Terraform owns VM existence; Ansible owns
everything inside a VM. See `STATUS.md` for what's built vs planned and
`docs/decisions/` for the design rationale.

**The name** — *boma* is Swahili for a fortified homestead enclosure (a stockade
guarding what's within), fitting for a hardened, self-contained home setup. It
keeps company with the project's other Swahili names: `askari` (the external
sentinel) and `nyumbani` ("home").

## Quick start (control node)

```bash
git clone <repo-url> ~/ansible
cd ~/ansible

# Create venv and install dependencies
make setup
make collections

# Unlock the vault password from Vaultwarden via rbw
# (one-time rbw setup: docs/runbooks/rotate-secrets.md)
rbw unlock

# Verify setup
make lint
```
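You can check that the control node is ready before running any `make` target with a small preflight. The sketch below is not part of the repo. The function names `missing_tools` and `preflight` are hypothetical, and the tool list (`make`, `python3`, `rbw`) is an assumption based on the steps above. It relies on `rbw unlocked`, which exits non-zero while the rbw agent is locked.

```bash
# Hypothetical preflight helpers for the control node (not shipped in the repo).

# Print each named tool that is not on PATH, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Return 0 if the assumed tools are present and the rbw agent is unlocked.
preflight() {
  local missing
  missing="$(missing_tools make python3 rbw)"
  if [ -n "$missing" ]; then
    printf 'missing tools:\n%s\n' "$missing" >&2
    return 1
  fi
  # `rbw unlocked` exits non-zero when the agent is locked.
  if ! rbw unlocked; then
    echo "rbw is locked; run 'rbw unlock' first" >&2
    return 1
  fi
  echo "preflight OK"
}
```

Source it in a shell and run `preflight`. If a tool is missing or the vault is still locked, it prints a message on stderr and returns non-zero.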

## Common operations

| What                  | Command                        |
| --------------------- | ------------------------------ |
| Lint everything       | `make lint`                    |
| Dry-run site playbook | `make check PLAYBOOK=site`     |
| Deploy everything     | `make deploy PLAYBOOK=site`    |
| Test a role           | `make test ROLE=base`          |
| Scaffold a new role   | `make new-role NAME=myservice` |

See `Makefile` for the full list of targets.
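The targets in the table combine into a gated deploy: lint, then dry-run, then apply, stopping at the first failure. Below is a minimal sketch that assumes the targets behave as the table describes. `deploy_gated` and the `MAKE` override are hypothetical helpers, not Makefile targets.

```bash
# Hypothetical wrapper: lint -> dry-run -> deploy, stopping at the first failure.
# MAKE can be overridden (e.g. MAKE=echo) to preview the commands instead.
deploy_gated() {
  local playbook="${1:?usage: deploy_gated PLAYBOOK}"
  local make_cmd="${MAKE:-make}"
  "$make_cmd" lint &&
    "$make_cmd" check PLAYBOOK="$playbook" &&
    "$make_cmd" deploy PLAYBOOK="$playbook"
}
```

For example, `MAKE=echo deploy_gated site` prints the three commands without running them, and `deploy_gated site` runs them for real. Putting `check` ahead of `deploy` means a playbook that fails its dry-run is never applied.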

## Project structure

```
.
├── CLAUDE.md               # Claude Code session context
├── Makefile                # All operations go through here
├── ansible.cfg             # Project-scoped Ansible config
├── requirements.txt        # Python dependencies
├── requirements.yml        # Ansible collections
│
├── docs/
│   ├── decisions/          # Architecture decision records (ADRs)
│   ├── runbooks/           # Step-by-step operational procedures
│   ├── security/           # Per-service security checklist + templates + accepted risks
│   ├── testing/            # VERIFY.md template + service-UI verification reports
│   ├── access/             # ACCESS.md template (ADR-021)
│   ├── backup/             # BACKUP.md template (ADR-022)
│   ├── hardware/           # Physical capacity reference + reviews
│   └── reviews/            # /review-repo reports
│
├── inventories/
│   ├── production/         # Live hosts — edit carefully
│   └── staging/            # Test hosts — safe to run freely
│
├── playbooks/              # Orchestration playbooks
│   ├── site.yml            # Full standard state
│   ├── workstation.yml     # Developer environment (control group)
│   └── bootstrap.yml       # First-run new host setup
│
├── roles/                  # Ansible roles
│   ├── base/               # OS baseline applied to all hosts
│   ├── dev_env/            # Interactive developer environment
│   └── docker_host/        # Docker runtime setup
│
├── terraform/              # VM provisioning only — no DNS (see ADR-006/009)
│   ├── modules/            # Reusable modules (proxmox_vm)
│   └── environments/       # Per-env state: staging/, production/
│
└── scripts/                # Helper scripts (tf_to_inventory.py)
```
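Terraform state is kept per environment under `terraform/environments/`, so each Terraform command runs from one environment root. The sketch below assumes the roots are exactly `staging/` and `production/`, as the tree shows. `tf_env` and the `TERRAFORM` override are hypothetical. The function uses Terraform's global `-chdir` option (Terraform 0.14 and later), which lets the caller stay at the repo root.

```bash
# Hypothetical helper: run a Terraform subcommand in one per-environment root.
# TERRAFORM can be overridden (e.g. TERRAFORM=echo) to preview the command.
tf_env() {
  local env="${1:?usage: tf_env staging|production SUBCOMMAND [ARGS...]}"
  shift
  # Only the environments present in the tree are accepted.
  case "$env" in
    staging|production) ;;
    *) echo "unknown environment: $env" >&2; return 2 ;;
  esac
  "${TERRAFORM:-terraform}" -chdir="terraform/environments/$env" "$@"
}
```

For example, `tf_env staging plan` runs `terraform -chdir=terraform/environments/staging plan`. An unknown environment name is rejected before Terraform is invoked.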

## Documentation

- **Current state (built vs planned): `STATUS.md`** — read this before assuming
  something exists; the ADRs describe intent, not necessarily reality.
- AI agents: `AGENTS.md` (points to `CLAUDE.md`, the authoritative guide)
- Architecture: `docs/decisions/001-architecture.md`
- Security baseline: `docs/decisions/002-security.md`
- Toolchain decisions: `docs/decisions/003-toolchain.md`
- Docker model: `docs/decisions/004-docker-model.md`
- Bootstrapping: `docs/decisions/005-bootstrapping.md`
- Terraform: `docs/decisions/006-terraform.md`
- Network topology: `docs/decisions/007-network.md`
- Testing methodology: `docs/decisions/008-testing.md`
- Terraform ↔ Ansible handoff: `docs/decisions/009-provisioning-handoff.md`
- Forgejo & CI: `docs/decisions/010-forgejo-ci.md`
- Update management: `docs/decisions/011-update-management.md`
- Hardware & capacity: `docs/decisions/012-hardware-capacity.md`
- Heritage / V4 policy: `docs/decisions/013-heritage-v4.md`
- Sourcing technical knowledge: `docs/decisions/014-knowledge-sourcing.md`
- Control / AI-worker host (`ubongo`): `docs/decisions/015-control-host.md`
- Mesh VPN (NetBird): `docs/decisions/016-mesh-vpn.md`
- Service-UI verification (Level 4): `docs/decisions/017-service-ui-verification.md`
- Logging & log integrity: `docs/decisions/018-logging.md`
- Tagging & run-targeting: `docs/decisions/019-tagging.md`
- Firewall strategy: `docs/decisions/020-firewall.md`
- Operational access: `docs/decisions/021-operational-access.md`
- Backup & disaster recovery: `docs/decisions/022-backup.md`
- ADR structure & lifecycle: `docs/decisions/023-adr-structure.md`
- Reverse proxy (Caddy): `docs/decisions/024-reverse-proxy.md`

(`CLAUDE.md` carries the full cross-referenced table, including the runbooks and
security/testing docs.)
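The index above is maintained by hand, so it can drift from what actually exists in `docs/decisions/`. Below is a sketch of a drift check, assuming it runs from the repository root. `missing_adrs` is a hypothetical helper, not an existing script or `make` target.

```bash
# Hypothetical drift check: list docs/decisions/NNN-*.md paths referenced in a
# README that do not exist on disk (run from the repository root).
missing_adrs() {
  local readme="${1:-README.md}" path
  grep -o 'docs/decisions/[0-9][0-9][0-9]-[A-Za-z0-9-]*\.md' "$readme" |
    sort -u |
    while IFS= read -r path; do
      [ -f "$path" ] || printf '%s\n' "$path"
    done
}
```

`missing_adrs README.md` prints nothing when every indexed ADR exists.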

## Contributing

See `CONTRIBUTING.md` for conventions, branching strategy, and how to add roles.