boma/STATUS.md
sjat 03d33f83dd fix(O1): scaffold docker_host role so make lint passes on main
playbooks/site.yml imports the docker_host role, but it didn't exist, so
ansible-lint's syntax-check failed on a clean checkout — breaking CLAUDE.md's
"main must always work" / "Never skip lint" (top open finding O1 from the
2026-06-11 review).

Scaffold docker_host as a proper placeholder via the prescribed mechanism
(make new-role): filled meta/main.yml + README, an honest no-task tasks/main.yml
documenting planned scope (Docker engine + Compose, daemon hardening, nftables.d
container rules per ADR-004/020), and the standard molecule scenario. This
preserves site.yml's full-standard-state intent rather than dropping the play.

Update STATUS.md (docker_host moves from "Not in git" to "scaffolded, no tasks")
and the role/playbook READMEs to match.

make lint: 0 failures, 0 warnings; check-tags OK.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 14:53:55 +02:00


Project status — what's real vs planned

This repo is partly aspirational: the ADRs in docs/decisions/ describe the intended design, and some of it is not built yet. This file is the ground truth. Before relying on a role, provider, or pipeline existing, check here. If something is listed as "designed, not built", do not assume it works.

Last reviewed: 2026-06-11.

Real and working today

| Thing | State |
| --- | --- |
| `playbooks/bootstrap.yml` | Works: self-contained (installs Python, creates the ansible user + sudoers) |
| `scripts/tf_to_inventory.py` | Works: stdlib only; `terraform output -json` → `hosts.yml` |
| `.docker/molecule-debian13/Dockerfile` | Present: custom Molecule test image (ADR-008) |
| `docs/decisions/*`, `docs/runbooks/*` | Current and mutually reconciled |
| `Makefile`, lint config (`.ansible-lint`, `.yamllint`), `.gitignore` | Present and used |
| git | Initialized, trunk-based on `main`, pushed to origin (`forgejo.nyumbani.baobab.band:7577`). |
| Pre-commit hooks | Configured: lint, gitleaks, vault-encryption guard. Activate with `pre-commit install` after `make setup`. |
| Vault password client | `scripts/vault-pass-client.sh` fetches the master password from Vaultwarden via `rbw` (wired as `vault_password_file`). Requires `rbw` installed + `rbw unlock`. |
| `/review-repo` | Repo audit: `scripts/repo-scan.py` (Phase 0) + `.claude/commands/review-repo.md`, reports to `docs/reviews/`. On-demand only; cron + email deferred (`docs/TODO.md`). |
| Terraform HCL (`terraform/`) | Written (proxmox VM module + envs) but never run; see below |
| `docs/hardware/reference.md` + `scripts/capacity-scan.py` | Present: reference doc (skeleton until real hardware) + stdlib scan; emits capacity JSON |
| `/capacity-review` | Works: on-demand capacity evaluation → `docs/hardware/reviews/`. Intent-based (no live usage yet) |
| ADR-002 security strategy + `docs/security/{accepted-risks,service-checklist}.md` | Present: threat model, principles, governance frame; checklist + risk register are docs, enforced manually in review |
| Service-role standard + per-service `SECURITY.md` convention | Defined (ADR-004 + `docs/security/service-security-template.md`); not yet applied, since no service roles exist |
| Tag standard + enforcement (ADR-019) | Works: `tests/tags.yml` (closed vocabulary) + `scripts/check-tags.py` (run by `make lint`, unit-tested) enforce the tag vocabulary and require each role import in a play's `roles:` block to carry its role-name tag. Governs mostly-unbuilt roles, but the linter is live now. Proxmox VM tag convention (`<env>`, group, `managed-by=terraform`) is in the Terraform HCL but unprovisioned. |
| `roles/dev_env/`: interactive developer environment | Built + applied. zsh + oh-my-zsh + oh-my-posh, tmux + TPM plugins, neovim; dotfiles deployed via GNU stow (re-derived from V4/fisi per ADR-013). Node.js from a pinned upstream tarball (not Debian's npm). Lint + Molecule (idempotent) green. Applied to ubongo for users sjat + claude (verified: zsh login shells, stow-symlinked `.zshrc`/`.tmux.conf` + nvim config, oh-my-zsh, tmux plugins; nvim v0.12.2, oh-my-posh 29.0.1). Run via `playbooks/workstation.yml` against the control group (no dedicated workstations group yet). |
| `make check` / `make deploy PLAYBOOK=<name>` | Works. First end-to-end run (applying `dev_env`) surfaced + fixed latent bugs: Makefile `PLAYBOOK` var collision (binary path vs playbook-name arg) meant the targets never ran; `ansible.cfg` referenced uninstalled `community.general` callbacks (now built-in default + `ansible.posix.profile_tasks`); `acl` package added so Ansible can `become_user` an unprivileged user. The make targets now function, though site/base/docker_host content is still incomplete (see below). |
| ubongo: physical control / AI-worker host (ADR-015) | Built (partial). Debian 13.5 on a Lenovo M70q (i3-10100T, 16 GB, 256 GB SSD; no disk encryption, an accepted risk). Full toolchain installed + pinned to fisi (Docker 29.5.3, rbw 1.15.0, Claude Code 2.1.173, ansible-core 2.17.14 + molecule via `make setup`/`make collections`). Repo cloned under a dedicated claude user (docker group, no sudo). Vault works via rbw (offline-cache decryption verified). SSH key-only (password + root login disabled). In the production inventory control group at 10.20.10.151. `dev_env` now applied here (zsh/tmux/nvim for sjat + claude, via `playbooks/workstation.yml`). Managed as the operator account sjat (`group_vars/control` sets `ansible_user: sjat`), not the ansible service user `group_vars/all` assumes, because ubongo has no bootstrapped ansible user. Pending: NetBird mesh enrollment (until then SSH is LAN-only); full base hardening (only the firewall concern exists, and it is NOT applied here, because applying default-deny with no mesh would lock out inbound SSH on the physical NIC); proper ansible-user bootstrap (currently managed as sjat); OPNsense DHCP reservation for 10.20.10.151 (MAC 88:a4:c2:e0:ee:da); Terraform state backup (no TF state yet). |
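
The tag rule in the ADR-019 row above (closed vocabulary, and every role import in a play's `roles:` block carries its own role-name tag) can be illustrated with a minimal sketch. This is not the real `scripts/check-tags.py`: the function name, the sample vocabulary, and the assumption that the play has already been parsed into plain dicts are all hypothetical.

```python
# Hypothetical sketch of the ADR-019 tag rule; not the real scripts/check-tags.py.
# ASSUMPTION: the vocabulary below is invented; the real one lives in tests/tags.yml.
ALLOWED_TAGS = {"base", "docker_host", "dev_env", "firewall"}


def check_play(play: dict, allowed: set) -> list:
    """Return human-readable violations for one already-parsed play."""
    errors = []
    for entry in play.get("roles", []):
        # A bare string entry ("- base") cannot carry any tags.
        if isinstance(entry, str):
            errors.append(f"role '{entry}': missing role-name tag")
            continue
        role = entry.get("role", "<unnamed>")
        tags = entry.get("tags", [])
        if isinstance(tags, str):  # Ansible accepts a single string or a list
            tags = [tags]
        if role not in tags:
            errors.append(f"role '{role}': missing role-name tag")
        for tag in tags:
            if tag not in allowed:
                errors.append(f"role '{role}': tag '{tag}' not in vocabulary")
    return errors


if __name__ == "__main__":
    sample_play = {
        "roles": [
            {"role": "base", "tags": ["base", "firewall"]},
            {"role": "docker_host", "tags": "containers"},
            "dev_env",
        ]
    }
    for problem in check_play(sample_play, ALLOWED_TAGS):
        print(problem)
```

In this sketch, the `base` entry passes, while `docker_host` is reported twice (no role-name tag, and a tag outside the vocabulary) and the bare `dev_env` entry is reported once.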

Scaffolded but empty — NOT implemented

| Thing | State |
| --- | --- |
| `roles/base/` | Partially built. The firewall concern is implemented (nftables: catalog-driven default-deny + east-west allowlist + auto-rollback apply; ADR-020) with pytest + Molecule render/syntax tests. Other concerns (SSH hardening, fail2ban, auditd, packages, users) are not built yet, so `make deploy PLAYBOOK=site` has no real content to apply (the make target itself now works; see "Real and working today"). |
| `roles/docker_host/` | Scaffolded, no tasks. In git (meta/README/molecule filled), wired into `playbooks/site.yml` so the standard state is expressed end-to-end and `make lint` covers it, but it has no tasks yet, so applying it is a no-op. Planned scope (Docker engine + Compose, daemon hardening, nftables.d container rules) in ADR-004/ADR-020. |
| `inventories/*/hosts.yml` | Structured stubs with empty host maps (`hosts: {}`); regenerated by `make tf-inventory` once Terraform has hosts |
| `inventories/production/group_vars/{docker_hosts,proxmox_hosts}/` | Empty dirs |

So make deploy PLAYBOOK=site has no real content to apply — base is only partially built (its firewall concern only) and the docker_host role is scaffolded but has no tasks yet. (The make check/deploy machinery itself now works — first proven by applying dev_env via playbooks/workstation.yml.)
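
To make the `hosts: {}` stubs concrete, here is a rough, stdlib-only sketch of the kind of transformation `make tf-inventory` would perform on `terraform output -json`. It is not the real `scripts/tf_to_inventory.py`: the output name `vms`, the per-VM `ip` field, and the single `all` group are assumptions for illustration. Only the outer `{"value": ...}` wrapper is standard `terraform output -json` structure.

```python
import json


def build_inventory(tf_output: dict, output_name: str = "vms") -> dict:
    """Map a Terraform output (name -> attributes) to an Ansible-style host map.

    ASSUMPTION: `output_name` and the "ip" attribute are hypothetical; the real
    Terraform outputs in this repo may be shaped differently.
    """
    vms = tf_output.get(output_name, {}).get("value", {})
    hosts = {
        name: {"ansible_host": attrs["ip"]}
        for name, attrs in sorted(vms.items())
    }
    return {"all": {"hosts": hosts}}


def render(inventory: dict) -> str:
    """Serialize as JSON, which YAML parsers generally accept as flow syntax."""
    return json.dumps(inventory, indent=2) + "\n"


if __name__ == "__main__":
    # With no Terraform outputs at all, the result is the empty stub.
    print(render(build_inventory({})))
    sample = {"vms": {"sensitive": False, "value": {"web1": {"ip": "10.0.0.5"}}}}
    print(render(build_inventory(sample)))
```

With no provisioned VMs the sketch produces an empty `hosts` map, which matches the stub state described above.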

Designed but not built

| Thing | Designed in | Notes |
| --- | --- | --- |
| `dns` role (renders the internal zone) | ADR-007 / ADR-009 | Does not exist. Internal DNS ownership is assigned to it by design. |
| Terraform actually provisioning | ADR-006 / ADR-009 | `terraform init` has never been run: no `.terraform.lock.hcl`, no state, no real `local.vms` entries |
| CI (Forgejo Actions) | ADR-003 / ADR-008 | Pipeline described; the remote is live (pushed), but the Actions/act_runner pipeline is not yet built |
| Level 2 / 3 testing (staging, askari smoke) | ADR-008 | Depends on real VMs / askari, which don't exist yet |
| Per-service roles | ADR-004 | Model defined; no service roles built |
| Live usage stats for `/capacity-review` | ADR-012 / TODO 8.4 | `gather_usage()` stubbed; source undecided (Proxmox RRD vs PLG stack); needs the cluster |
| `/security-review` skill | ADR-002 / TODO 8.5 | Periodic posture re-check + accepted-risk re-challenge; planned, not built |
| CIS hardening (Debian L1+L2 + Docker) | ADR-002 / TODO 15 | To be implemented by the base/docker_host roles, which do not cover it yet (base has only its firewall concern; docker_host has no tasks); brings AppArmor + AIDE as baseline. L2 partitions affect VM provisioning (ADR-006) |
| Network IDS + security alerting | ADR-002 / TODO 15 | Suricata on OPNsense + AIDE/auditd/fail2ban alerting into the monitoring stack; not built |
| NetBird mesh: coordinator on askari | ADR-016 | Design RESOLVED (ADR-016 + spec + plan); resolves ADR-015 deferred #1. Self-hosted NetBird control plane (management/signal/relay) on askari; replaces ADR-007 WireGuard. Build pending: not deployed (askari + service-role machinery not built). |
| NetBird agent enrollment in base | ADR-016 | Design RESOLVED (ADR-016). Every Linux host joins the mesh via the base role (setup keys in vault); SSH allowed only on `wt0`. Build pending: base has only its firewall concern so far. |
| Service-UI verification (Level 4) | ADR-017 / ADR-008 | Design RESOLVED (ADR-017 + spec + plan); resolves ADR-015 deferred #2. `/verify-service` skill + `VERIFY.md` template + standards are authorable and present. Build pending: running needs ubongo + playwright plugin + Authentik + a staging deploy. |
| Logging pipeline (Loki + Alloy + off-site subset) | ADR-018 | Design RESOLVED (ADR-018 + spec). All logs → on-cluster Loki; security subset write-only off-site to askari. Build pending: Alloy in base, loki/grafana service roles, OPNsense syslog (none built). |
| Security alerting (AIDE/auditd/fail2ban/Suricata + log-silence) | ADR-002 / ADR-018 | Designed to be wired into Grafana on the Loki stack; depends on the logging pipeline + metrics stack (TODO 3.6). |
| Operational-access doctrine (ADR-021) | ADR-021 | Design RESOLVED (ADR-021 + spec + plan). Two-layer doctrine, three-tier access ladder, `access__*` model, `ACCESS.md` record, `/check-access`. Reconciles ADR-016/020 SSH. |
| ssh-from-control firewall source | ADR-021 / ADR-020 | Built (dormant). `base__firewall_control_addr` knob + nftables rule + Molecule assertion landed; empty default = no rule until ubongo's LAN address is set in group_vars. |
| `/check-access` verifier | ADR-021 | Design RESOLVED (`.claude/commands/check-access.md` authored). Build pending: running needs ubongo + live/staging hosts + vault. Access analogue of `/verify-service` (ADR-017). |
| Per-service `ACCESS.md` records | ADR-021 | Template + governance present; per-service files render when each service role is built. |
| Backup: `backup` role + `backup_hosts` group | ADR-022 | Does not exist. Pull node (fisi), restic repo, rclone→pCloud, USB air-gap (Plan 2). |
| Per-service `backup__*` contract + `BACKUP.md` | ADR-022 | Convention defined; inert until service roles exist to declare against. |
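
The "Built (dormant)" behaviour of the ssh-from-control row (empty default means no rule) can be sketched as follows. The real implementation is presumably an nftables template inside the base role; this Python function, its name, and the exact rule text are assumptions used only to show the dormant-by-default logic.

```python
def ssh_from_control_rules(control_addr: str = "") -> list:
    """Return the nftables rule lines for SSH from the control host.

    ASSUMPTION: this mirrors the described behaviour of the
    base__firewall_control_addr knob; the rule text is illustrative only.
    """
    addr = control_addr.strip()
    if not addr:
        # Empty default: the feature stays dormant and renders nothing.
        return []
    return [f"ip saddr {addr} tcp dport 22 accept"]


if __name__ == "__main__":
    print(ssh_from_control_rules())                # dormant: no rule rendered
    print(ssh_from_control_rules("10.20.10.151"))  # one allow rule for the control host
```

Keeping the default empty means the rule can land in the role without changing any host's ruleset until an address is set in group_vars.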

Keeping this honest

Update this file whenever you build, stub, or remove something. It is the first place an AI tool or new contributor should look to learn what they can actually rely on. When a row moves from "designed" to "working", move it up — don't leave stale optimism here.