# Project status — what's real vs planned
This repo is partly aspirational: the ADRs in docs/decisions/ describe the
intended design, and some of it is not built yet. This file is the ground
truth. Before relying on a role, provider, or pipeline existing, check here.
If something is listed as "designed, not built", do not assume it works.
Last reviewed: 2026-06-11.
## Real and working today
| Thing | State |
|---|---|
| `playbooks/bootstrap.yml` | Works — self-contained (installs Python, creates the ansible user + sudoers) |
| `scripts/tf_to_inventory.py` | Works — stdlib only; terraform output -json → hosts.yml |
| `.docker/molecule-debian13/Dockerfile` | Present — custom Molecule test image (ADR-008) |
| `docs/decisions/*`, `docs/runbooks/*` | Current and mutually reconciled |
| `Makefile`, lint config (`.ansible-lint`, `.yamllint`), `.gitignore` | Present and used |
| `git` | Initialized, trunk-based on main, pushed to origin (forgejo.nyumbani.baobab.band:7577). |
| Pre-commit hooks | Configured: lint, gitleaks, vault-encryption guard. Activate with pre-commit install after make setup. |
| Vault password client | scripts/vault-pass-client.sh fetches the master password from Vaultwarden via rbw (wired as vault_password_file). Requires rbw installed + rbw unlock. |
| `/review-repo` | Repo audit: scripts/repo-scan.py (Phase 0) + .claude/commands/review-repo.md, reports to docs/reviews/. On-demand only; cron + email deferred (docs/TODO.md). |
| Terraform HCL (`terraform/`) | Written (proxmox VM module + envs) — but never run; see below |
| `docs/hardware/reference.md` + `scripts/capacity-scan.py` | Present — reference doc (skeleton until real hardware) + stdlib scan; emits capacity JSON |
| `/capacity-review` | Works — on-demand capacity evaluation → docs/hardware/reviews/. Intent-based (no live usage yet) |
| ADR-002 security strategy + `docs/security/{accepted-risks,service-checklist}.md` | Present — threat model, principles, governance frame; checklist + risk register are docs, enforced manually in review |
| Service-role standard + per-service `SECURITY.md` convention | Defined (ADR-004 + docs/security/service-security-template.md); not yet applied — no service roles exist |
| Tag standard + enforcement (ADR-019) | Works — tests/tags.yml (closed vocabulary) + scripts/check-tags.py (run by make lint, unit-tested): enforces the tag vocabulary and that each role import in a play's roles: block carries its role-name tag. Governs mostly-unbuilt roles, but the linter is live now. Proxmox VM tag convention (<env>, group, managed-by=terraform) is in the Terraform HCL but unprovisioned. |
| `roles/dev_env/` — interactive developer environment | Built + applied. zsh + oh-my-zsh + oh-my-posh, tmux + TPM plugins, neovim; dotfiles deployed via GNU stow (re-derived from V4/fisi per ADR-013). Node.js from a pinned upstream tarball (not Debian's npm). Lint + Molecule (idempotent) green. Applied to ubongo for users sjat + claude (verified: zsh login shells, stow-symlinked .zshrc/.tmux.conf + nvim config, oh-my-zsh, tmux plugins; nvim v0.12.2, oh-my-posh 29.0.1). Run via playbooks/workstation.yml against the control group (no dedicated workstations group yet). |
| `make check` / `make deploy PLAYBOOK=<name>` | Works. First end-to-end run (applying dev_env) surfaced + fixed latent bugs: Makefile PLAYBOOK var collision (binary path vs playbook-name arg) meant the targets never ran; ansible.cfg referenced uninstalled community.general callbacks (now built-in default + ansible.posix.profile_tasks); acl package added so Ansible can become_user an unprivileged user. The make targets now function — though site/base/docker_host content is still incomplete (see below). |
| ubongo — physical control / AI-worker host (ADR-015) | Built (partial). Debian 13.5 on a Lenovo M70q (i3-10100T, 16 GB, 256 GB SSD; no disk encryption — accepted risk). Full toolchain installed + pinned to fisi (Docker 29.5.3, rbw 1.15.0, Claude Code 2.1.173, ansible-core 2.17.14 + molecule via make setup/make collections). Repo cloned under a dedicated claude user (docker group, no sudo). Vault works via rbw (offline-cache decryption verified). SSH key-only (password + root login disabled). In the production inventory control group at 10.20.10.151. dev_env now applied here (zsh/tmux/nvim for sjat + claude, via playbooks/workstation.yml). Managed as the operator account sjat (group_vars/control sets ansible_user: sjat), not the ansible service user group_vars/all assumes — ubongo has no bootstrapped ansible user. Pending: NetBird mesh enrollment (so SSH is LAN-only); full base hardening (only the firewall concern exists, and it is NOT applied here — applying default-deny with no mesh would lock out inbound SSH on the physical NIC); proper ansible-user bootstrap (currently managed as sjat); OPNsense DHCP reservation for 10.20.10.151 (MAC 88:a4:c2:e0:ee:da); Terraform state backup (no TF state yet). |
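
The tag rule described in the ADR-019 row (a closed vocabulary, plus every role in a play's roles: block carrying its own role-name tag) can be illustrated with a minimal sketch. This is not the code in scripts/check-tags.py: the function name, the sample vocabulary, and the choice to work on already-parsed play data (rather than reading tests/tags.yml and the playbooks) are assumptions made for illustration.

```python
# Hypothetical vocabulary for the example; the real closed list lives in tests/tags.yml.
ALLOWED_TAGS = {"base", "dev_env", "docker_host", "firewall"}


def check_play_tags(play: dict, allowed: set = ALLOWED_TAGS) -> list:
    """Return a list of violations for one already-parsed play."""
    errors = []
    for entry in play.get("roles", []):
        # A bare "- rolename" entry carries no tags; normalise it to a mapping.
        if isinstance(entry, str):
            entry = {"role": entry}
        role = entry.get("role") or entry.get("name", "")
        tags = entry.get("tags", [])
        if isinstance(tags, str):  # Ansible also accepts a single string
            tags = [tags]
        short_name = role.split(".")[-1]  # tolerate namespace.collection.role
        if short_name not in tags:
            errors.append(f"role '{role}': missing role-name tag '{short_name}'")
        for tag in tags:
            if tag not in allowed:
                errors.append(f"role '{role}': tag '{tag}' not in vocabulary")
    return errors


play = {
    "hosts": "control",
    "roles": [
        {"role": "dev_env", "tags": ["dev_env"]},
        {"role": "base", "tags": ["hardening"]},  # violates both rules
    ],
}
for problem in check_play_tags(play):
    print(problem)
```

Returning a list of messages instead of raising on the first problem lets a lint run report every violation at once.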
## Scaffolded but empty — NOT implemented
| Thing | State |
|---|---|
| `roles/base/` | Partially built. The firewall concern is implemented (nftables: catalog-driven default-deny + east-west allowlist + auto-rollback apply; ADR-020) with pytest + Molecule render/syntax tests. Other concerns (SSH hardening, fail2ban, auditd, packages, users) are not built yet, so make deploy PLAYBOOK=site has no real content to apply (the make target itself now works — see "Real and working today"). |
| `roles/docker_host/` | Not in git: the role does not exist yet, so it likewise gives make deploy PLAYBOOK=site nothing to apply. |
| `inventories/*/hosts.yml` | Structured stubs with empty host maps (hosts: {}); regenerated by make tf-inventory once Terraform has hosts |
| `inventories/production/group_vars/{docker_hosts,proxmox_hosts}/` | Empty dirs |
So make deploy PLAYBOOK=site has no real content to apply — base is only partially
built (its firewall concern only) and the docker_host role does not exist yet. (The
make check/deploy machinery itself now works — first proven by applying dev_env via
playbooks/workstation.yml.)
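
To make the inventory stubs concrete, here is a rough sketch of the kind of transformation make tf-inventory is meant to perform once Terraform has hosts: `terraform output -json` data in, an Ansible inventory mapping out. It is not the contents of scripts/tf_to_inventory.py. The output name `vms`, the per-host fields `ip` and `group`, and the sample host and address are hypothetical; only the `{"value": ...}` wrapper around each output reflects how Terraform formats its JSON output.

```python
import json


def build_inventory(tf_outputs: dict, output_name: str = "vms") -> dict:
    """Convert parsed `terraform output -json` data into an inventory mapping."""
    # Terraform wraps each output as {"value": ..., "type": ..., "sensitive": ...}.
    vms = tf_outputs.get(output_name, {}).get("value", {})
    inventory = {"all": {"children": {}}}
    for name, attrs in sorted(vms.items()):
        children = inventory["all"]["children"]
        group = children.setdefault(attrs["group"], {"hosts": {}})
        group["hosts"][name] = {"ansible_host": attrs["ip"]}
    return inventory


def to_yaml_text(inventory: dict) -> str:
    """Serialise with the stdlib only: JSON documents are also valid YAML 1.2."""
    return json.dumps(inventory, indent=2, sort_keys=True) + "\n"


# Made-up sample data standing in for real Terraform output.
sample = {
    "vms": {
        "sensitive": False,
        "value": {"docker01": {"ip": "10.20.10.21", "group": "docker_hosts"}},
    }
}
print(to_yaml_text(build_inventory(sample)), end="")
```

With no Terraform outputs at all, `build_inventory({})` returns an inventory with no groups, which matches the current state where every host map is empty.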
## Designed but not built
| Thing | Designed in | Notes |
|---|---|---|
| `dns` role (renders the internal zone) | ADR-007 / ADR-009 | Does not exist. Internal DNS ownership is assigned to it by design. |
| Terraform actually provisioning | ADR-006 / ADR-009 | Never terraform inited: no .terraform.lock.hcl, no state, no real local.vms entries |
| CI (Forgejo Actions) | ADR-003 / ADR-008 | Remote is live (pushed); pipeline described, but the Actions/act_runner pipeline is not yet built |
| Level 2 / 3 testing (staging, askari smoke) | ADR-008 | Depends on real VMs / askari, which don't exist yet |
| Per-service roles | ADR-004 | Model defined; no service roles built |
| Live usage stats for `/capacity-review` | ADR-012 / TODO 8.4 | gather_usage() stubbed; source undecided (Proxmox RRD vs PLG stack); needs the cluster |
| `/security-review` skill | ADR-002 / TODO 8.5 | Periodic posture re-check + accepted-risk re-challenge; planned, not built |
| CIS hardening (Debian L1+L2 + Docker) | ADR-002 / TODO 15 | To be implemented by the base and docker_host roles (base has only its firewall concern so far; docker_host does not exist); brings AppArmor + AIDE as baseline. L2 partitions affect VM provisioning (ADR-006) |
| Network IDS + security alerting | ADR-002 / TODO 15 | Suricata on OPNsense + AIDE/auditd/fail2ban alerting into the monitoring stack; not built |
| NetBird mesh — coordinator on askari | ADR-016 | Design RESOLVED (ADR-016 + spec + plan); resolves ADR-015 deferred #1. Self-hosted NetBird control plane (management/signal/relay) on askari; replaces ADR-007 WireGuard. Build pending: not deployed (askari + service-role machinery not built). |
| NetBird agent enrollment in `base` | ADR-016 | Design RESOLVED (ADR-016). Every Linux host joins the mesh via the base role (setup keys in vault); SSH allowed only on wt0. Build pending: the base role has only its firewall concern so far; enrollment is not built. |
| Service-UI verification (Level 4) | ADR-017 / ADR-008 | Design RESOLVED (ADR-017 + spec + plan); resolves ADR-015 deferred #2. /verify-service skill + VERIFY.md template + standards are authorable and present. Build pending: running needs ubongo + playwright plugin + Authentik + a staging deploy. |
| Logging pipeline (Loki + Alloy + off-site subset) | ADR-018 | Design RESOLVED (ADR-018 + spec). All logs → on-cluster Loki; security subset write-only off-site to askari. Build pending: Alloy in base, loki/grafana service roles, OPNsense syslog — none built. |
| Security alerting (AIDE/auditd/fail2ban/Suricata + log-silence) | ADR-002 / ADR-018 | Wired into Grafana on the Loki stack. Designed; depends on the logging pipeline + metrics stack (TODO 3.6). |
| Operational-access doctrine (ADR-021) | ADR-021 | Design RESOLVED (ADR-021 + spec + plan). Two-layer doctrine, three-tier access ladder, access__* model, ACCESS.md record, /check-access. Reconciles ADR-016/020 SSH. |
| `ssh-from-control` firewall source | ADR-021 / ADR-020 | Built (dormant). base__firewall_control_addr knob + nftables rule + Molecule assertion landed; empty default = no rule until ubongo's LAN address is set in group_vars. |
| `/check-access` verifier | ADR-021 | Design RESOLVED (.claude/commands/check-access.md authored). Build pending: running needs ubongo + live/staging hosts + vault. Access analogue of /verify-service (ADR-017). |
| Per-service `ACCESS.md` records | ADR-021 | Template + governance present; per-service files render when each service role is built. |
| Backup: `backup` role + `backup_hosts` group | ADR-022 | Does not exist. Pull node (fisi), restic repo, rclone→pCloud, USB air-gap — Plan 2. |
| Per-service `backup__*` contract + `BACKUP.md` | ADR-022 | Convention defined; inert until service roles exist to declare against. |
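
The "built (dormant)" behaviour of the ssh-from-control row (an empty base__firewall_control_addr means no rule is rendered) can be sketched as follows. In the repo this presumably lives in the base role's variables and nftables template; the Python function below, its name, and the default port of 22 are assumptions for illustration only.

```python
import ipaddress


def control_ssh_rules(control_addr: str, ssh_port: int = 22) -> list:
    """Return nftables rule lines that allow SSH from the control host.

    An empty address returns no rules, which keeps the feature dormant.
    """
    if not control_addr.strip():
        return []
    # ip_address() raises ValueError on a malformed value, so a typo in
    # group_vars fails loudly instead of rendering a broken rule.
    addr = ipaddress.ip_address(control_addr.strip())
    family = "ip" if addr.version == 4 else "ip6"
    return [f"{family} saddr {addr} tcp dport {ssh_port} accept"]


print(control_ssh_rules(""))              # dormant: no rule rendered
print(control_ssh_rules("10.20.10.151"))  # ubongo's LAN address from the inventory
```

Treating "unset" as "render nothing" means the knob can ship ahead of the address being known without changing the firewall on any host.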
## Keeping this honest
Update this file whenever you build, stub, or remove something. It is the first place an AI tool or new contributor should look to learn what they can actually rely on. When a row moves from "designed" to "working", move it up — don't leave stale optimism here.