boma/docs/decisions/003-toolchain.md

ADR-003 — Toolchain decisions

Execution engine

Choice: ansible-core (pip-installed, pinned version) + explicit requirements.yml

Not chosen: ansible full package (bundles ~85 collections at a frozen version)

Rationale: Explicit collection pinning allows independent upgrades, smaller installs, and fully reproducible environments. The full package trades these away for convenience that isn't needed in a maintained monorepo.
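A minimal sketch of what an explicitly pinned requirements.yml might look like under this choice. The version shown is a placeholder, not the repository's actual pin; ansible.posix appears because it is the only collection this ADR keeps.

```yaml
---
# requirements.yml (illustrative sketch; the version is a placeholder,
# not the pin actually used in this repository)
collections:
  # Kept: fills ansible.builtin gaps (authorized_key, sysctl, acl).
  - name: ansible.posix
    version: "1.6.2"
```

A file like this is typically installed with `ansible-galaxy collection install -r requirements.yml`, which in this repo would presumably sit behind a Make target.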


Python environment

Choice: python3-venv (system Python on Debian 13) + pinned requirements.txt

Not chosen: pyenv (solves multi-version problems on developer laptops, not needed on a dedicated Debian control node with a controlled Python version)

Rationale: The control node runs one Python version. A plain venv is sufficient, reproducible, and has no extra dependencies.


Secrets

Choice: Ansible Vault (file-based, built-in)

Not chosen:

  • SOPS + age: better git-diff ergonomics, but adds external tooling and key management
  • HashiCorp Vault: powerful, but significant operational overhead for this scale

Rationale: Ansible Vault is built in, requires no extra services, and works well at this scale. Whole-file encryption makes diffs unreadable regardless of layout, so rather than flattening secrets to improve diffs, we organise them for human lookup and clean extraction: a nested vault.<service>.<key> map inside each vault.yml, scoped to actual secrets (see CLAUDE.md → Secrets).
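The nested layout can be sketched as follows. This is an illustration only: the file paths, service names, keys, and values are hypothetical, and the two YAML documents stand for two separate files.

```yaml
---
# group_vars/all/vault.yml, shown decrypted (hypothetical path and contents)
vault:
  forgejo:
    db_password: "example-only"
    runner_token: "example-only"
  vaultwarden:
    admin_token: "example-only"
---
# group_vars/all/vars.yml (hypothetical): plain variables reference the map,
# so roles never read the vault structure directly
forgejo_db_password: "{{ vault.forgejo.db_password }}"
```

With this shape, a secret is found by service first and key second, and a single value can be pulled out by its dotted path after `ansible-vault view` decrypts the file.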


Testing

Choice: Molecule with Docker driver (molecule-plugins[docker])

Not chosen:

  • Molecule + Podman: rootless is appealing, but Docker is simpler on a Debian control node
  • Molecule + Vagrant: full VMs are slower and require a hypervisor on the control node
  • No testing: unacceptable for a shared, maintained project

Test image: a self-built, project-owned Debian 13 image with systemd support (.docker/molecule-debian13/), hosted in the Forgejo registry. ADR-008 is canonical for the image and the rationale for not using an external image such as geerlingguy/docker-debian13-ansible.

Verifier: Built-in Ansible verifier. Testinfra can be added later if deeper assertions are needed.
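A scenario file matching these choices might look roughly like the sketch below. The image path under the Forgejo registry and the systemd-related container settings are assumptions for illustration; ADR-008 remains the authority on the real image.

```yaml
---
# molecule/default/molecule.yml (illustrative sketch; image path and
# systemd-related settings are assumptions, see ADR-008 for the real image)
driver:
  name: docker
platforms:
  - name: debian13
    # Hypothetical registry path for the project-owned image.
    image: forgejo.nyumbani.baobab.band/example-owner/molecule-debian13:latest
    pre_build_image: true   # use the image as published, do not build it locally
    privileged: true        # commonly needed so systemd can run in the container
    cgroupns_mode: host
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:rw
provisioner:
  name: ansible
verifier:
  name: ansible             # runs verify.yml from the scenario directory
```

Keeping the verifier as `ansible` means assertions live in a plain playbook, so no extra Python test dependency is needed until Testinfra is actually adopted.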


Linting

Choice: ansible-lint + yamllint + pre-commit

  • yamllint: catches formatting issues before Ansible sees the file
  • ansible-lint: enforces correctness and idiomatic style
  • pre-commit: runs both locally on every commit, catching lint failures before they reach CI

Config files: .ansible-lint, .yamllint in repo root.
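One way to wire the two linters into pre-commit is sketched below. It assumes the upstream hook repositories are used directly; the `rev` values are placeholders, and the repo may instead prefer local hooks that call the venv's pinned tools.

```yaml
---
# .pre-commit-config.yaml (illustrative; rev values are placeholders to be
# replaced with the versions pinned for this repository)
repos:
  - repo: https://github.com/adrienverge/yamllint
    rev: v1.35.1
    hooks:
      - id: yamllint        # reads .yamllint from the repo root
  - repo: https://github.com/ansible/ansible-lint
    rev: v24.9.2
    hooks:
      - id: ansible-lint    # reads .ansible-lint from the repo root
```

Running `pre-commit install` once per clone registers the git hook, and `pre-commit run --all-files` checks the whole tree on demand.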


CI/CD

Choice: Forgejo Actions (self-hosted at forgejo.nyumbani.baobab.band) + act_runner

Not chosen: GitHub Actions (external), Jenkins (heavy)

Pipeline:

  1. Push to any branch → lint + Molecule tests
  2. Merge to main → lint + Molecule tests + manual approval gate
  3. After approval → deploy to staging, then production

act_runner runs as a Docker container on the control node or a dedicated runner VM.
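Stage 1 of the pipeline could be expressed roughly as below. This is a sketch under assumptions: the `docker` runner label and the `lint` and `test` Make target names are hypothetical, and the approval gate and deploy stages are left out because this ADR does not specify their mechanism.

```yaml
---
# .forgejo/workflows/ci.yml (illustrative sketch of stage 1 only; the runner
# label and the Make target names are assumptions)
name: ci
on: [push]
jobs:
  lint:
    runs-on: docker
    steps:
      - uses: actions/checkout@v4
      - run: make lint
  molecule:
    runs-on: docker
    needs: lint             # skip the slower Molecule run when lint fails
    steps:
      - uses: actions/checkout@v4
      - run: make test
```

Because the Molecule job starts containers itself, the runner would also need access to a Docker daemon; how that is granted depends on how act_runner is deployed.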


Developer ergonomics

Choice: Makefile as the single interface for all operations

Rationale: All ansible-playbook, molecule, and ansible-lint invocations go through Make targets. This means:

  • Claude Code always calls make <target> — never constructs raw commands
  • Collaborators don't need to know the underlying flags
  • CI uses the same targets as local development (no drift)

direnv: Not used — the control node is a dedicated host, not a shared workstation. The venv is activated in the user's shell profile.


Collections and roles policy

No Galaxy roles. All roles are written and maintained locally in roles/. Galaxy roles introduce external state, versioning surprises, and implicit conventions that conflict with this repo's style.

Collections on demand. A collection is added to requirements.yml only when a task in a committed role actively uses a module from it. Pre-emptive inclusions are removed. Each entry in requirements.yml must justify its presence.

Starting collection set (rationale for each):

| Collection | Status | Reason |
|---|---|---|
| ansible.posix | Kept | Ansible-team maintained; fills real ansible.builtin gaps (authorized_key, sysctl, acl) |
| community.docker | Dropped | ADR-004 uses ansible.builtin.command + docker compose, so no Docker API modules are needed |
| community.proxmox | Dropped | Proxmox configuration is out of scope (ADR-001) |
| community.crypto | Deferred | Add when a role needs cert automation; use openssl CLI until then |
| community.general | Deferred | 1,500+ modules; add only the specific sub-module needed, with a comment |

What was explicitly ruled out

| Tool | Reason not adopted |
|---|---|
| AWX / AAP | Significant operational overhead, not needed at this scale |
| Semaphore | Revisit if non-SSH operators need to trigger runs |
| ansible-runner | Only needed when AWX/Semaphore orchestrates runs |
| ansible-builder | Only needed when packaging Execution Environments for AWX |
| Kubernetes/Swarm | Out of scope; Docker Compose is the right complexity level |
| NixOS targets | Poor Ansible fit; all hosts standardised on Debian 13 |

Terraform is adopted for VM provisioning and infrastructure DNS — see docs/decisions/006-terraform.md.