sjat/boma

sjat bc8592616b fix: address final whole-branch review findings - ADR-023 §4: ADR-015 no-sudo sub-decision now Superseded-by ADR-025 (bidirectional), not just an in-place amendment. - STATUS: drop the deferred `reset` verb; honest integration_test (molecule not run in this env; applied to ubongo) + verify (forward/DNAT, not wt0); RED->GREEN validated. - driver: remove unused `import shutil`. - README: fix the ADR-025 link filename. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>		2026-06-18 21:52:28 +02:00
.claude	fix(hooks): scope vault-preflight to staged ansible; catch prose exec re-asks	2026-06-17 17:49:55 +02:00
.docker	feat(docker): custom Caddy image with the Gandi DNS-01 plugin	2026-06-15 06:57:38 +02:00
.scaffold	Fix Forgejo registry path to owner/image format (review R10a)	2026-05-30 21:34:02 +02:00
docs	fix: address final whole-branch review findings	2026-06-18 21:52:28 +02:00
inventories	feat(base): codify AI-worker NOPASSWD sudo (ADR-015 amended)	2026-06-18 21:36:31 +02:00
playbooks	feat(integration_test): KVM/libvirt substrate role on the control node	2026-06-18 12:09:35 +02:00
roles	fix: address final whole-branch review findings	2026-06-18 21:52:28 +02:00
scripts	fix: address final whole-branch review findings	2026-06-18 21:52:28 +02:00
terraform	revert: back out mesh-hardening 1/3 on askari after it broke the Docker host	2026-06-17 22:16:17 +02:00
tests	fix(integration): verify probes :80 without following redirects	2026-06-18 16:57:47 +02:00
.ansible-lint	fix(integration): exclude transient .run/ from linters; --- in generated inventory	2026-06-18 16:44:12 +02:00
.gitignore	feat(make): test-integration / test-integration-clean targets	2026-06-18 12:45:38 +02:00
.pre-commit-config.yaml	chore(tooling): scope ansible-lint to ansible content; venv PATH in make test	2026-06-10 12:51:30 +02:00
.yamllint	fix(integration): exclude transient .run/ from linters; --- in generated inventory	2026-06-18 16:44:12 +02:00
AGENTS.md	Add ADR-010 (Forgejo integration) and rbw-unlocked pre-flight convention	2026-05-30 21:34:07 +02:00
ansible.cfg	feat(make): offsite TF token injection + directory inventory + tf-inventory-offsite	2026-06-14 12:05:41 +02:00
CLAUDE.md	docs: wire ADR-025 into testing/control-host/risks/status/capacity	2026-06-18 12:51:22 +02:00
CONTRIBUTING.md	Purge residual .vault_pass references (review R1-R5)	2026-05-30 19:17:25 +02:00
Makefile	feat(make): test-integration / test-integration-clean targets	2026-06-18 12:45:38 +02:00
README.md	docs(review): 2026-06-14 repo audit — M4a doc drift + Traefik→Caddy lag	2026-06-14 18:37:54 +02:00
requirements.txt	Harden lint setup and clean inventory placeholders	2026-05-30 14:56:16 +02:00
requirements.yml	feat(reverse_proxy): Caddy role (Gandi DNS-01, on-host image build, route catalog)	2026-06-14 17:36:58 +02:00
STATUS.md	fix: address final whole-branch review findings	2026-06-18 21:52:28 +02:00

README.md

boma

Infrastructure-as-code for a self-hosted homelab: a Proxmox cluster of Debian 13 VMs running Docker services, provisioned with Terraform and configured with Ansible. Stable, secure, reproducible, and fully version-controlled.

Scope — this repo manages infrastructure: the cluster's VMs, their hardened base OS, and the containerised services they run. It does not manage personal machines (laptops, desktops, phones). Terraform owns VM existence; Ansible owns everything inside a VM. See STATUS.md for what's built vs planned and docs/decisions/ for the design rationale.

The name — boma is Swahili for a fortified homestead enclosure (a stockade guarding what's within) — fitting for a hardened, self-contained home setup. It keeps company with the project's other Swahili names: askari (the external sentinel) and nyumbani ("home").

Quick start (control node)

git clone <repo-url> ~/ansible
cd ~/ansible

# Create venv and install dependencies
make setup
make collections

# Unlock the vault password from Vaultwarden via rbw
# (one-time rbw setup: docs/runbooks/rotate-secrets.md)
rbw unlock

# Verify setup
make lint
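
The steps above call `git`, `make`, and `rbw` directly, so a quick pre-flight check can save a confusing failure halfway through. The following is a minimal sketch, assuming a POSIX shell; `need_tools` is a hypothetical helper for illustration and is not something this repo ships.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): report every command that is
# missing from PATH and return non-zero if at least one is absent.
need_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf 'missing: %s\n' "$tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `need_tools git make rbw && make setup` would only start the setup once all three commands resolve.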

Common operations

| What | Command |
| --- | --- |
| Lint everything | `make lint` |
| Dry-run site playbook | `make check PLAYBOOK=site` |
| Deploy everything | `make deploy PLAYBOOK=site` |
| Test a role | `make test ROLE=base` |
| Scaffold a new role | `make new-role NAME=myservice` |

See Makefile for the full list of targets.
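
The lint, dry-run, and deploy targets listed above are often run back to back. As a sketch only, they could be chained so that a failure at any step stops the rollout. The `rollout` function and the `MAKE` override are assumptions made for illustration; only the three `make` targets and the `PLAYBOOK` variable come from this README.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of this repo): lint, dry-run, then deploy one
# playbook, stopping at the first step that fails.
# MAKE can be overridden (for example MAKE=echo) to preview the commands.
rollout() {
    playbook="${1:-site}"
    "${MAKE:-make}" lint &&
        "${MAKE:-make}" check PLAYBOOK="$playbook" &&
        "${MAKE:-make}" deploy PLAYBOOK="$playbook"
}
```

Calling `rollout site` runs the three steps in order, while `MAKE=echo rollout site` only prints the arguments each step would pass to `make`. Keeping the dry-run between lint and deploy means a failing check never reaches live hosts.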

Project structure

.
├── CLAUDE.md               # Claude Code session context
├── Makefile                # All operations go through here
├── ansible.cfg             # Project-scoped Ansible config
├── requirements.txt        # Python dependencies
├── requirements.yml        # Ansible collections
│
├── docs/
│   ├── decisions/          # Architecture decision records (ADRs)
│   ├── runbooks/           # Step-by-step operational procedures
│   ├── security/           # Per-service security checklist + templates + accepted risks
│   ├── testing/            # VERIFY.md template + service-UI verification reports
│   ├── access/             # ACCESS.md template (ADR-021)
│   ├── backup/             # BACKUP.md template (ADR-022)
│   ├── hardware/           # Physical capacity reference + reviews
│   └── reviews/            # /review-repo reports
│
├── inventories/
│   ├── production/         # Live hosts — edit carefully
│   └── staging/            # Test hosts — safe to run freely
│
├── playbooks/              # Orchestration playbooks
│   ├── site.yml            # Full standard state
│   ├── workstation.yml     # Developer environment (control group)
│   └── bootstrap.yml       # First-run new host setup
│
├── roles/                  # Ansible roles
│   ├── base/               # OS baseline applied to all hosts
│   ├── dev_env/            # Interactive developer environment
│   └── docker_host/        # Docker runtime setup
│
├── terraform/              # VM provisioning only — no DNS (see ADR-006/009)
│   ├── modules/            # Reusable modules (proxmox_vm)
│   └── environments/       # Per-env state: staging/, production/
│
└── scripts/                # Helper scripts (tf_to_inventory.py)

Documentation

Current state (built vs planned): STATUS.md — read this before assuming something exists; the ADRs describe intent, not necessarily reality.
AI agents: AGENTS.md (points to CLAUDE.md, the authoritative guide)
Architecture: docs/decisions/001-architecture.md
Security baseline: docs/decisions/002-security.md
Toolchain decisions: docs/decisions/003-toolchain.md
Docker model: docs/decisions/004-docker-model.md
Bootstrapping: docs/decisions/005-bootstrapping.md
Terraform: docs/decisions/006-terraform.md
Network topology: docs/decisions/007-network.md
Testing methodology: docs/decisions/008-testing.md
Terraform ↔ Ansible handoff: docs/decisions/009-provisioning-handoff.md
Forgejo & CI: docs/decisions/010-forgejo-ci.md
Update management: docs/decisions/011-update-management.md
Hardware & capacity: docs/decisions/012-hardware-capacity.md
Heritage / V4 policy: docs/decisions/013-heritage-v4.md
Sourcing technical knowledge: docs/decisions/014-knowledge-sourcing.md
Control / AI-worker host (ubongo): docs/decisions/015-control-host.md
Mesh VPN (NetBird): docs/decisions/016-mesh-vpn.md
Service-UI verification (Level 4): docs/decisions/017-service-ui-verification.md
Logging & log integrity: docs/decisions/018-logging.md
Tagging & run-targeting: docs/decisions/019-tagging.md
Firewall strategy: docs/decisions/020-firewall.md
Operational access: docs/decisions/021-operational-access.md
Backup & disaster recovery: docs/decisions/022-backup.md
ADR structure & lifecycle: docs/decisions/023-adr-structure.md
Reverse proxy (Caddy): docs/decisions/024-reverse-proxy.md

(CLAUDE.md carries the full cross-referenced table, including the runbooks and security/testing docs.)

Contributing

See CONTRIBUTING.md for conventions, branching strategy, and how to add roles.