ADR-003 — Toolchain decisions

Status

Accepted (2026-05-30)

Context

boma needs a defined, reproducible toolchain for running and testing its Ansible monorepo: an execution engine, a Python environment, secrets handling, a testing framework, linting, CI/CD, developer-ergonomics conventions, and a collections/roles policy. This ADR records the choice made for each, together with the alternatives weighed and why they were not adopted.

Decision

Execution engine

Choice: ansible-core (pip-installed, pinned version) + explicit requirements.yml

Not chosen: ansible full package (bundles ~85 collections at a frozen version)

Rationale: Explicit collection pinning allows independent upgrades, smaller installs, and fully reproducible environments. The full package trades these away for convenience that isn't needed in a maintained monorepo.
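
As a rough illustration of what explicit pinning looks like, a requirements.yml along these lines would pin a single collection. The version number is a placeholder chosen for illustration, not a value taken from this ADR; ansible.posix is used only because it is the collection the starting set keeps.

```yaml
# requirements.yml (sketch)
# The version below is a placeholder, not the pin used in this repo.
collections:
  - name: ansible.posix
    version: "1.5.4"  # exact pin; upgraded independently of ansible-core
```

Installing from this file with ansible-galaxy collection install -r requirements.yml should then resolve to the same collection version on every rebuild of the control node.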


Python environment

Choice: python3-venv (system Python on Debian 13) + pinned requirements.txt

Not chosen: pyenv (solves multi-version problems on developer laptops; not needed on a dedicated Debian control node with a controlled Python version)

Rationale: The control node runs one Python version. A plain venv is sufficient, reproducible, and has no extra dependencies.


Secrets

Choice: Ansible Vault (file-based, built-in)

Not chosen:

  • SOPS + age: better git-diff ergonomics, but adds external tooling and key management
  • HashiCorp Vault: powerful, but significant operational overhead for this scale

Rationale: Ansible Vault is built in, requires no extra services, and works well at this scale. Whole-file encryption makes diffs unreadable regardless of layout, so flattening the variable structure would not buy better diffs. Instead, secrets are organised for human lookup and clean extraction: a nested vault.<service>.<key> map inside each vault.yml, scoped to actual secrets (see CLAUDE.md → Secrets).
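
A minimal sketch of the nested layout, shown as the decrypted view of a vault.yml. The service names, key names, and values are invented for illustration; only the vault.<service>.<key> shape comes from this ADR.

```yaml
# vault.yml, decrypted view (sketch; every service and key name is hypothetical)
vault:
  forgejo:
    admin_password: "example-only"
    runner_token: "example-only"
  backup:
    repo_passphrase: "example-only"
```

A role variable could then reference one secret by its full path, for example {{ vault.forgejo.admin_password }}, which keeps the origin of each value visible at the point of use.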


Testing

Choice: Molecule with Docker driver (molecule-plugins[docker])

Not chosen:

  • Molecule + Podman: rootless is appealing, but Docker is simpler on a Debian control node
  • Molecule + Vagrant: full VMs are slower and require a hypervisor on the control node
  • No testing: unacceptable for a shared, maintained project

Test image: a self-built, project-owned Debian 13 image with systemd support (.docker/molecule-debian13/), hosted in the Forgejo registry. ADR-008 is canonical for the image and the rationale for not using an external image such as geerlingguy/docker-debian13-ansible.

Verifier: Built-in Ansible verifier. Testinfra may be added later if deeper assertions are needed.
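
For orientation, a molecule.yml for one role might look roughly like this. The image reference is a placeholder (the registry path and tag are not given in this ADR; ADR-008 is the authority), and the systemd-related container options that such an image usually needs are left out.

```yaml
# molecule/default/molecule.yml (sketch)
driver:
  name: docker
platforms:
  - name: debian13
    # Placeholder: substitute the project image from the Forgejo registry (see ADR-008).
    image: registry.example/molecule-debian13:latest
    pre_build_image: true  # use the image as-is instead of building one from a Dockerfile template
provisioner:
  name: ansible
verifier:
  name: ansible  # the built-in Ansible verifier; runs the scenario's verify.yml
```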


Linting

Choice: ansible-lint + yamllint + pre-commit

  • yamllint: catches formatting issues before Ansible sees the file
  • ansible-lint: enforces correctness and idiomatic style
  • pre-commit: runs both locally on every commit, catching lint failures before they reach CI

Config files: .ansible-lint, .yamllint in repo root.
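
One way to wire both linters into pre-commit is with local hooks that call the tools already installed in the venv. This is a sketch under that assumption, not the repo's actual configuration; the hook ids and names are arbitrary.

```yaml
# .pre-commit-config.yaml (sketch; assumes yamllint and ansible-lint are on PATH via the venv)
repos:
  - repo: local
    hooks:
      - id: yamllint
        name: yamllint
        entry: yamllint
        language: system
        types: [yaml]          # pass only staged YAML files
      - id: ansible-lint
        name: ansible-lint
        entry: ansible-lint
        language: system
        pass_filenames: false  # lint the whole project, not individual files
        always_run: true
```

Using local hooks keeps the linter versions governed by the pinned requirements.txt rather than by a separate rev in the pre-commit config.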


CI/CD

Choice: Forgejo Actions (self-hosted at forgejo.nyumbani.baobab.band) + act_runner

Not chosen: GitHub Actions (external), Jenkins (heavy)

Pipeline (trunk-based — no pull requests; see CLAUDE.md git conventions):

  1. Push to main → lint + Molecule tests
  2. On green → deploy to staging
  3. [manual promote gate] → deploy to production

act_runner runs as a Docker container on ubongo (the control node — ADR-015), or on a dedicated runner VM later if CI load warrants a separate host.
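
The first two pipeline stages could be expressed in a Forgejo Actions workflow roughly as follows. The file path, the runs-on label, the checkout action reference, and the make target names (lint, test, deploy-staging) are all assumptions for illustration; none are specified in this ADR. The manual production gate is not shown.

```yaml
# .forgejo/workflows/ci.yml (sketch; labels and target names are hypothetical)
on:
  push:
    branches: [main]

jobs:
  lint-and-test:
    runs-on: docker            # hypothetical act_runner label
    steps:
      - uses: actions/checkout@v4
      - run: make lint
      - run: make test         # assumed to run the Molecule scenarios

  deploy-staging:
    needs: lint-and-test       # only runs when lint and tests are green
    runs-on: docker
    steps:
      - uses: actions/checkout@v4
      - run: make deploy-staging
```

Because every step is a make target, the workflow stays a thin wrapper and the same commands can be run by hand on the control node.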


Developer ergonomics

Choice: Makefile as the single interface for all operations

Rationale: All ansible-playbook, molecule, and ansible-lint invocations go through Make targets. This means:

  • Claude Code always calls make <target> — never constructs raw commands
  • Collaborators don't need to know the underlying flags
  • CI uses the same targets as local development (no drift)

direnv: Not used — the control node is a dedicated host, not a shared workstation. The venv is activated in the user's shell profile.


Collections and roles policy

No Galaxy roles. All roles are written and maintained locally in roles/. Galaxy roles introduce external state, versioning surprises, and implicit conventions that conflict with this repo's style.

Collections on demand. A collection is added to requirements.yml only when a task in a committed role actively uses a module from it. Pre-emptive inclusions are removed. Each entry in requirements.yml must justify its presence.

Starting collection set (rationale for each):

| Collection | Status | Reason |
| --- | --- | --- |
| ansible.posix | Kept | Ansible-team maintained; fills real ansible.builtin gaps (authorized_key, sysctl, acl) |
| community.docker | Dropped | ADR-004 uses ansible.builtin.command + docker compose, so no Docker API modules are needed |
| community.proxmox | Dropped | Proxmox configuration is out of scope (ADR-001) |
| community.crypto | Deferred | Add when a role needs cert automation; use openssl CLI until then |
| community.general | Deferred | 1,500+ modules; add only the specific sub-module needed, with a comment |

What was explicitly ruled out

| Tool | Reason not adopted |
| --- | --- |
| AWX / AAP | Significant operational overhead, not needed at this scale |
| Semaphore | Revisit if non-SSH operators need to trigger runs |
| ansible-runner | Only needed when AWX/Semaphore orchestrates runs |
| ansible-builder | Only needed when packaging Execution Environments for AWX |
| Kubernetes/Swarm | Out of scope; Docker Compose is the right complexity level |
| NixOS targets | Poor Ansible fit; all hosts standardised on Debian 13 |

Terraform is adopted for VM provisioning only (no DNS) — see docs/decisions/006-terraform.md.

Consequences

Drawn from the rationale and trade-offs this ADR already states:

  • Pinning ansible-core, keeping an explicit requirements.yml, and using a plain venv with a pinned requirements.txt keeps the control-node environment small and fully reproducible, at the cost of maintaining the pins (per Execution engine / Python environment).
  • Ansible Vault's whole-file encryption makes diffs unreadable regardless of layout, so secrets are organised for human lookup (vault.<service>.<key>) rather than diff ergonomics — the trade accepted against SOPS/age (per Secrets).
  • The Makefile is the single interface: Claude Code and CI invoke the same targets, so local and CI behaviour can't drift and collaborators need not know raw flags (per Developer ergonomics).
  • Collections are added only on demand, so requirements.yml stays minimal; this defers community.crypto (use openssl CLI until a role needs certs) and community.general (add only the specific sub-module needed) until a real need appears (per Collections and roles policy).
  • The heavier orchestration tools were declined for this scale, some with a named revisit trigger: Semaphore if non-SSH operators must trigger runs, and ansible-runner and ansible-builder only if AWX or Semaphore orchestration is ever adopted (per "What was explicitly ruled out").