
sjat 40a428975a docs(adr): restructure ADR-003 to ADR-023 conformance

Add Status, a descriptive Context, a Decision umbrella over the existing
topical sections (demoted to ###), and a Consequences section assembled
from the ADR's already-stated rationale. No decision substance changed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-10 14:50:03 +02:00


ADR-003 — Toolchain decisions

Status

Accepted (2026-05-30)

Context

boma needs a defined, reproducible toolchain for running and testing its Ansible monorepo: an execution engine, a Python environment, secrets handling, a testing framework, linting, CI/CD, developer-ergonomics conventions, and a collections/roles policy. This ADR records the choice made for each, together with the alternatives weighed and why they were not adopted.

Decision

Execution engine

Choice: ansible-core (pip-installed, pinned version) + explicit requirements.yml

Not chosen: the full ansible package (bundles ~85 collections, each frozen at the version shipped with that release)

Rationale: Explicit collection pinning allows independent upgrades, smaller installs, and fully reproducible environments. The full package trades these away for convenience that isn't needed in a maintained monorepo.
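As a minimal sketch of what explicit collection pinning can look like, the fragment below pins one collection in requirements.yml. The version number is a placeholder chosen for illustration, not the pin this repo actually uses.

```yaml
# requirements.yml (illustrative sketch; the version is a placeholder)
---
collections:
  - name: ansible.posix
    version: "1.5.4"  # exact pin, so upgrades happen only by editing this line
```

Such a file is typically installed with `ansible-galaxy collection install -r requirements.yml`.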

Python environment

Choice: python3-venv (system Python on Debian 13) + pinned requirements.txt

Not chosen: pyenv (solves multi-version problems on developer laptops, not needed on a dedicated Debian control node with a controlled Python version)

Rationale: The control node runs one Python version. A plain venv is sufficient, reproducible, and has no extra dependencies.

Secrets

Choice: Ansible Vault (file-based, built-in)

Not chosen:

- SOPS + age: better git-diff ergonomics, but adds external tooling and key management
- HashiCorp Vault: powerful, but significant operational overhead for this scale

Rationale: Ansible Vault is built in, requires no extra services, and works well at this scale. Whole-file encryption makes diffs unreadable regardless of layout, so rather than flattening the structure we organise secrets for human lookup and clean extraction: a nested vault.<service>.<key> map inside each vault.yml, scoped to actual secrets (see CLAUDE.md → Secrets).
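The nested vault.<service>.<key> layout might look like the sketch below, shown decrypted. The service names, key names, and the consuming variable are hypothetical; on disk the whole vault.yml would be encrypted with ansible-vault.

```yaml
# vault.yml (shown decrypted; services and keys are hypothetical)
---
vault:
  forgejo:
    db_password: "CHANGE_ME"
    runner_token: "CHANGE_ME"
  grafana:
    admin_password: "CHANGE_ME"
---
# A separate, unencrypted vars file can then reference a secret by path
# (hypothetical variable name):
forgejo_db_password: "{{ vault.forgejo.db_password }}"
```

Keeping only real secrets under the vault key means the unencrypted vars file stays diffable and greppable.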

Testing

Choice: Molecule with Docker driver (molecule-plugins[docker])

Not chosen:

- Molecule + Podman: rootless is appealing, but Docker is simpler on a Debian control node
- Molecule + Vagrant: full VMs are slower and require a hypervisor on the control node
- No testing: unacceptable for a shared, maintained project

Test image: a self-built, project-owned Debian 13 image with systemd support (.docker/molecule-debian13/), hosted in the Forgejo registry. ADR-008 is canonical for the image and the rationale for not using an external image such as geerlingguy/docker-debian13-ansible.

Verifier: Molecule's built-in Ansible verifier. Testinfra may be added later if deeper assertions are needed.
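A scenario file combining the Docker driver, a systemd-capable image, and the built-in verifier could be sketched as follows. The image reference is a placeholder (the real registry path is not stated here), and the systemd-related settings are assumptions that depend on the host's cgroup setup; ADR-008 remains canonical for the image.

```yaml
# molecule/default/molecule.yml (illustrative sketch)
---
driver:
  name: docker
platforms:
  - name: instance
    # Placeholder image reference; substitute the project-owned image.
    image: registry.example.invalid/molecule-debian13:latest
    pre_build_image: true          # use the image as-is, do not rebuild it
    privileged: true               # assumption: needed so systemd can start
    command: /lib/systemd/systemd  # run systemd as PID 1
provisioner:
  name: ansible
verifier:
  name: ansible                    # built-in verifier; runs verify.yml
```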

Linting

Choice: ansible-lint + yamllint + pre-commit

- yamllint: catches formatting issues before Ansible sees the file
- ansible-lint: enforces correctness and idiomatic style
- pre-commit: runs both locally on every commit, so lint failures are caught before they reach CI

Config files: .ansible-lint, .yamllint in repo root.
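One possible pre-commit configuration wiring both linters is sketched below. The hook repositories and hook ids are the upstream ones; the rev values are placeholder pins, and the repo may instead wire its hooks differently.

```yaml
# .pre-commit-config.yaml (illustrative sketch; rev values are placeholders)
---
repos:
  - repo: https://github.com/adrienverge/yamllint
    rev: v1.35.1
    hooks:
      - id: yamllint      # reads .yamllint from the repo root
  - repo: https://github.com/ansible/ansible-lint
    rev: v24.9.2
    hooks:
      - id: ansible-lint  # reads .ansible-lint from the repo root
```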

CI/CD

Choice: Forgejo Actions (self-hosted at forgejo.nyumbani.baobab.band) + act_runner

Not chosen: GitHub Actions (external), Jenkins (heavy)

Pipeline (trunk-based — no pull requests; see CLAUDE.md git conventions):

1. Push to main → lint + Molecule tests
2. On green → deploy to staging
3. [manual promote gate] → deploy to production

act_runner runs as a Docker container on ubongo (the control node — ADR-015), or on a dedicated runner VM later if CI load warrants a separate host.
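The first two pipeline stages could be expressed as a Forgejo Actions workflow roughly like the sketch below. The file name, runner label, and Make target names are hypothetical; the manual promote gate is assumed to live in a separate workflow triggered by hand (for example with workflow_dispatch) and is not shown.

```yaml
# .forgejo/workflows/ci.yml (illustrative sketch; names are hypothetical)
---
name: ci
on:
  push:
    branches: [main]
jobs:
  test:
    runs-on: docker                # hypothetical act_runner label
    steps:
      - uses: actions/checkout@v4
      - run: make lint             # hypothetical target
      - run: make test             # hypothetical target
  deploy-staging:
    needs: test                    # runs only when the test job is green
    runs-on: docker
    steps:
      - uses: actions/checkout@v4
      - run: make deploy-staging   # hypothetical target
```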

Developer ergonomics

Choice: Makefile as the single interface for all operations

Rationale: All ansible-playbook, molecule, and ansible-lint invocations go through Make targets. This means:

- Claude Code always calls make <target> and never constructs raw commands
- Collaborators don't need to know the underlying flags
- CI uses the same targets as local development (no drift)

direnv: Not used — the control node is a dedicated host, not a shared workstation. The venv is activated in the user's shell profile.

Collections and roles policy

No Galaxy roles. All roles are written and maintained locally in roles/. Galaxy roles introduce external state, versioning surprises, and implicit conventions that conflict with this repo's style.

Collections on demand. A collection is added to requirements.yml only when a task in a committed role actively uses a module from it. Pre-emptive inclusions are removed. Each entry in requirements.yml must justify its presence.

Starting collection set (rationale for each):

| Collection | Status | Reason |
| --- | --- | --- |
| `ansible.posix` | Kept | Ansible-team maintained; fills real `ansible.builtin` gaps (`authorized_key`, `sysctl`, `acl`) |
| `community.docker` | Dropped | ADR-004 uses `ansible.builtin.command` + `docker compose`, so no Docker API modules are needed |
| `community.proxmox` | Dropped | Proxmox configuration is out of scope (ADR-001) |
| `community.crypto` | Deferred | Add when a role needs cert automation; use `openssl` CLI until then |
| `community.general` | Deferred | 1,500+ modules; add only the specific sub-module needed, with a comment |
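The rule that each entry must justify its presence could be applied with inline comments, as in this sketch. The role name and version are hypothetical; the deferred collections stay out of the file until a committed role needs them.

```yaml
# requirements.yml (illustrative sketch of the justification convention)
---
collections:
  # Used by roles/base (hypothetical role) for authorized_key and sysctl.
  - name: ansible.posix
    version: "1.5.4"  # placeholder pin
  # community.crypto: deferred; add when a role needs cert automation.
  # community.general: deferred; add with a comment naming the module used.
```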

What was explicitly ruled out

| Tool | Reason not adopted |
| --- | --- |
| AWX / AAP | Significant operational overhead, not needed at this scale |
| Semaphore | Revisit if non-SSH operators need to trigger runs |
| ansible-runner | Only needed when AWX/Semaphore orchestrates runs |
| ansible-builder | Only needed when packaging Execution Environments for AWX |
| Kubernetes/Swarm | Out of scope; Docker Compose is the right complexity level |
| NixOS targets | Poor Ansible fit; all hosts standardised on Debian 13 |

Terraform is adopted for VM provisioning only (no DNS) — see docs/decisions/006-terraform.md.

Consequences

Drawn from the rationale and trade-offs this ADR already states:

- Pinning ansible-core + an explicit requirements.yml and a plain pinned venv keeps the control-node environment small and fully reproducible, at the cost of maintaining the pins (per Execution engine / Python environment).
- Ansible Vault's whole-file encryption makes diffs unreadable regardless of layout, so secrets are organised for human lookup (vault.<service>.<key>) rather than diff ergonomics. This is the trade accepted against SOPS/age (per Secrets).
- The Makefile is the single interface: Claude Code and CI invoke the same targets, so local and CI behaviour can't drift and collaborators need not know raw flags (per Developer ergonomics).
- Collections are added only on demand, so requirements.yml stays minimal; this defers community.crypto (use openssl CLI until a role needs certs) and community.general (add only the specific sub-module needed) until a real need appears (per Collections and roles policy).
- The heavier orchestration tools were declined for this scale, each with a named revisit trigger, e.g. Semaphore if non-SSH operators must trigger runs, AWX-adjacent tooling only if AWX/AAP is ever adopted (per "What was explicitly ruled out").
