boma/STATUS.md

Project status — what's real vs planned

This repo is partly aspirational: the ADRs in docs/decisions/ describe the intended design, and some of it is not built yet. This file is the ground truth. Before relying on a role, provider, or pipeline existing, check here. If something is listed as "designed, not built", do not assume it works.

Last reviewed: 2026-06-11.

Real and working today

| Thing | State |
| --- | --- |
| `playbooks/bootstrap.yml` | Works: self-contained (installs Python, creates the `ansible` user + sudoers) |
| `scripts/tf_to_inventory.py` | Works: stdlib only; `terraform output -json` → `hosts.yml` |
| `.docker/molecule-debian13/Dockerfile` | Present: custom Molecule test image (ADR-008) |
| `docs/decisions/*`, `docs/runbooks/*` | Current and mutually reconciled |
| `Makefile`, lint config (`.ansible-lint`, `.yamllint`), `.gitignore` | Present and used |
| git | Initialized, trunk-based on `main`, pushed to origin (`forgejo.nyumbani.baobab.band:7577`). |
| Pre-commit hooks | Configured: lint, gitleaks, vault-encryption guard. Activate with `pre-commit install` after `make setup`. |
| Vault password client | `scripts/vault-pass-client.sh` fetches the master password from Vaultwarden via `rbw` (wired as `vault_password_file`). Requires `rbw` to be installed and unlocked (`rbw unlock`). |
| `/review-repo` | Repo audit: `scripts/repo-scan.py` (Phase 0) + `.claude/commands/review-repo.md`, reports to `docs/reviews/`. On-demand only; cron + email deferred (`docs/TODO.md`). |
| Terraform HCL (`terraform/`) | Written (proxmox VM module + envs) but never run; see "Designed but not built" below |
| `docs/hardware/reference.md` + `scripts/capacity-scan.py` | Present: reference doc (skeleton until real hardware) + stdlib scan; emits capacity JSON |
| `/capacity-review` | Works: on-demand capacity evaluation → `docs/hardware/reviews/`. Intent-based (no live usage yet) |
| ADR-002 security strategy + `docs/security/{accepted-risks,service-checklist}.md` | Present: threat model, principles, governance frame; checklist + risk register are docs, enforced manually in review |
| Service-role standard + per-service `SECURITY.md` convention | Defined (ADR-004 + `docs/security/service-security-template.md`); not yet applied, since no service roles exist |
| Tag standard + enforcement (ADR-019) | Works: `tests/tags.yml` (closed vocabulary) + `scripts/check-tags.py` (run by `make lint`, unit-tested) enforce the tag vocabulary and require that each role import in a play's `roles:` block carries its role-name tag. Governs mostly-unbuilt roles, but the linter is live now. Proxmox VM tag convention (`<env>`, group, `managed-by=terraform`) is in the Terraform HCL but unprovisioned. |
| ubongo: physical control / AI-worker host (ADR-015) | Built (partial). Debian 13.5 on a Lenovo M70q (i3-10100T, 16 GB, 256 GB SSD; no disk encryption, an accepted risk). Full toolchain installed + pinned to fisi (Docker 29.5.3, rbw 1.15.0, Claude Code 2.1.173, ansible-core 2.17.14 + molecule via `make setup`/`make collections`). Repo cloned under a dedicated `claude` user (docker group, no sudo). Vault works via rbw (offline-cache decryption verified). SSH key-only (password + root login disabled). In the production inventory `control` group at 10.20.10.151. Pending: NetBird mesh enrollment (until then SSH is LAN-only); full base hardening (only the firewall concern exists, and it is NOT applied here, because default-deny with no mesh would lock out inbound SSH on the physical NIC); OPNsense DHCP reservation for 10.20.10.151 (MAC 88:a4:c2:e0:ee:da); Terraform state backup (no TF state yet). |
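
The role-name-tag rule from the tag standard row can be illustrated with a short sketch. This is not the real scripts/check-tags.py: the function name `check_play`, the `ALLOWED_TAGS` set, and the sample play are hypothetical. The sketch assumes the play has already been parsed into Python dicts and that role names are themselves part of the closed vocabulary.

```python
# Hypothetical sketch of the tag rule; NOT the real scripts/check-tags.py.
# ALLOWED_TAGS stands in for the closed vocabulary in tests/tags.yml (contents assumed).
ALLOWED_TAGS = {"base", "docker_host", "firewall", "security"}


def check_play(play, allowed=ALLOWED_TAGS):
    """Return a list of human-readable tag violations for one parsed play."""
    problems = []
    for entry in play.get("roles", []):
        # A roles: entry is either a bare role name or a mapping with role/tags keys.
        if isinstance(entry, str):
            role, tags = entry, []
        else:
            role = entry.get("role") or entry.get("name", "")
            tags = entry.get("tags", [])
        if isinstance(tags, str):  # a single tag may be written as a plain string
            tags = [tags]
        short = role.rsplit(".", 1)[-1]  # drop any collection namespace
        if short not in tags:
            problems.append(f"{role}: missing role-name tag '{short}'")
        for tag in tags:
            if tag not in allowed:
                problems.append(f"{role}: tag '{tag}' not in vocabulary")
    return problems


# Sample play (invented for illustration): one compliant entry, two violating ones.
play = {
    "hosts": "all",
    "roles": [
        {"role": "base", "tags": ["base", "firewall"]},
        {"role": "docker_host", "tags": ["containers"]},
        "base",
    ],
}
for line in check_play(play):
    print(line)
```

With this sample the first entry passes, the second is reported twice (no role-name tag, and a tag outside the vocabulary), and the bare `base` entry is reported for carrying no tags at all.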

Scaffolded but empty — NOT implemented

| Thing | State |
| --- | --- |
| `roles/base/` | Partially built. The firewall concern is implemented (nftables: catalog-driven default-deny + east-west allowlist + auto-rollback apply; ADR-020) with pytest + Molecule render/syntax tests. Other concerns (SSH hardening, fail2ban, auditd, packages, users) are not built yet. |
| `roles/docker_host/` | Not in git; the role does not exist yet. |
| `inventories/*/hosts.yml` | Structured stubs with empty host maps (`hosts: {}`); regenerated by `make tf-inventory` once Terraform has hosts |
| `inventories/production/group_vars/{docker_hosts,proxmox_hosts}/` | Empty dirs |

So make deploy PLAYBOOK=site is still incomplete: base is only partially built (just its firewall concern) and the docker_host role does not exist yet.
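
To make the difference between the stubbed and the generated inventory concrete, here is a minimal sketch of turning `terraform output -json` into an inventory structure. It is not the real scripts/tf_to_inventory.py: the output name `vms`, the `ip` and `group` keys, and the sample host are assumptions for illustration. Only the `{"value": ...}` wrapper around each output is standard Terraform behaviour. The result is printed as JSON, which YAML parsers generally accept, so the sketch stays stdlib-only.

```python
import json


def to_inventory(tf_output, output_name="vms"):
    """Build an Ansible-style inventory dict from parsed `terraform output -json`.

    `output_name` and the per-host keys ("ip", "group") are hypothetical.
    """
    # Terraform wraps every output as {"value": ..., "sensitive": ..., ...}.
    vms = tf_output.get(output_name, {}).get("value") or {}
    children = {}
    for name, attrs in sorted(vms.items()):
        group = children.setdefault(attrs["group"], {"hosts": {}})
        group["hosts"][name] = {"ansible_host": attrs["ip"]}
    if not children:
        # Nothing provisioned yet: keep the empty stub shape (hosts: {}).
        return {"all": {"hosts": {}}}
    return {"all": {"children": children}}


# No Terraform outputs at all, which is the situation this file describes today.
print(json.dumps(to_inventory(json.loads("{}"))))

# An invented host, using a documentation-range address.
sample = json.loads(
    '{"vms": {"sensitive": false, "value": '
    '{"web1": {"ip": "192.0.2.10", "group": "docker_hosts"}}}}'
)
print(json.dumps(to_inventory(sample), indent=2))
```

The first call shows why the committed stubs are harmless: with no outputs the generator would reproduce the same empty host map.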

Designed but not built

| Thing | Designed in | Notes |
| --- | --- | --- |
| `dns` role (renders the internal zone) | ADR-007 / ADR-009 | Does not exist. Internal DNS ownership is assigned to it by design. |
| Terraform actually provisioning | ADR-006 / ADR-009 | `terraform init` has never been run: no `.terraform.lock.hcl`, no state, no real `local.vms` entries |
| CI (Forgejo Actions) | ADR-003 / ADR-008 | Remote is live (pushed); the Actions/act_runner pipeline is described but not yet built |
| Level 2 / 3 testing (staging, askari smoke) | ADR-008 | Depends on real VMs / askari, which don't exist yet |
| Per-service roles | ADR-004 | Model defined; no service roles built |
| Live usage stats for `/capacity-review` | ADR-012 / TODO 8.4 | `gather_usage()` stubbed; source undecided (Proxmox RRD vs PLG stack); needs the cluster |
| `/security-review` skill | ADR-002 / TODO 8.5 | Periodic posture re-check + accepted-risk re-challenge; planned, not built |
| CIS hardening (Debian L1+L2 + Docker) | ADR-002 / TODO 15 | To be implemented by the base and docker_host roles (base has only its firewall concern; docker_host does not exist); brings AppArmor + AIDE as baseline. L2 partitions affect VM provisioning (ADR-006) |
| Network IDS + security alerting | ADR-002 / TODO 15 | Suricata on OPNsense + AIDE/auditd/fail2ban alerting into the monitoring stack; not built |
| NetBird mesh: coordinator on askari | ADR-016 | Design RESOLVED (ADR-016 + spec + plan); resolves ADR-015 deferred #1. Self-hosted NetBird control plane (management/signal/relay) on askari; replaces ADR-007 WireGuard. Build pending: not deployed (askari + service-role machinery not built). |
| NetBird agent enrollment in `base` | ADR-016 | Design RESOLVED (ADR-016). Every Linux host joins the mesh via the base role (setup keys in vault); SSH allowed only on `wt0`. Build pending: the base role has no enrollment concern yet (only its firewall concern is built). |
| Service-UI verification (Level 4) | ADR-017 / ADR-008 | Design RESOLVED (ADR-017 + spec + plan); resolves ADR-015 deferred #2. `/verify-service` skill + `VERIFY.md` template + standards are authorable and present. Build pending: running needs ubongo + playwright plugin + Authentik + a staging deploy. |
| Logging pipeline (Loki + Alloy + off-site subset) | ADR-018 | Design RESOLVED (ADR-018 + spec). All logs → on-cluster Loki; security subset write-only off-site to askari. Build pending: Alloy in base, loki/grafana service roles, OPNsense syslog (none built). |
| Security alerting (AIDE/auditd/fail2ban/Suricata + log-silence) | ADR-002 / ADR-018 | Wired into Grafana on the Loki stack. Designed; depends on the logging pipeline + metrics stack (TODO 3.6). |
| Operational-access doctrine (ADR-021) | ADR-021 | Design RESOLVED (ADR-021 + spec + plan). Two-layer doctrine, three-tier access ladder, `access__*` model, `ACCESS.md` record, `/check-access`. Reconciles ADR-016/020 SSH. |
| ssh-from-control firewall source | ADR-021 / ADR-020 | Built (dormant). `base__firewall_control_addr` knob + nftables rule + Molecule assertion landed; the empty default renders no rule until ubongo's LAN address is set in group_vars. |
| `/check-access` verifier | ADR-021 | Design RESOLVED (`.claude/commands/check-access.md` authored). Build pending: running needs ubongo + live/staging hosts + vault. Access analogue of `/verify-service` (ADR-017). |
| Per-service `ACCESS.md` records | ADR-021 | Template + governance present; per-service files render when each service role is built. |
| Backup: `backup` role + `backup_hosts` group | ADR-022 | Does not exist. Pull node (fisi), restic repo, rclone→pCloud, USB air-gap (Plan 2). |
| Per-service `backup__*` contract + `BACKUP.md` | ADR-022 | Convention defined; inert until service roles exist to declare against. |
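
The "empty default = no rule" behaviour of the ssh-from-control row can be sketched as follows. The real role renders nftables through Ansible; this Python stand-in only illustrates the logic. The function name, the SSH port (22), and the exact rule text are assumptions, not the role's actual template.

```python
import ipaddress


def ssh_from_control_rule(control_addr=""):
    """Return one nftables rule line allowing SSH from the control node, or None.

    Hypothetical stand-in for the base__firewall_control_addr knob; port 22 is assumed.
    """
    if not control_addr:
        return None  # empty default: render no rule at all (dormant)
    addr = ipaddress.ip_address(control_addr)  # raises ValueError on malformed input
    family = "ip6" if addr.version == 6 else "ip"
    return f"{family} saddr {addr} tcp dport 22 accept"


print(ssh_from_control_rule())                # knob unset
print(ssh_from_control_rule("10.20.10.151"))  # ubongo's LAN address from this file
```

Validating the address before rendering means a typo in group_vars fails loudly instead of producing a rule that silently matches nothing.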

Keeping this honest

Update this file whenever you build, stub, or remove something. It is the first place an AI tool or new contributor should look to learn what they can actually rely on. When a row moves from "designed" to "working", move it up — don't leave stale optimism here.