Record ADR-012 + STATUS/CLAUDE/scripts docs for capacity tooling

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
sjat 2026-06-01 10:34:38 +02:00
parent 1060a9c08a
commit 4c535c908e
4 changed files with 46 additions and 0 deletions


@@ -31,6 +31,7 @@ Full design rationale: `docs/decisions/`
| Deploy a playbook | `make deploy PLAYBOOK=<name>` |
| Scaffold a new role | `make new-role NAME=<name>` |
| Review repo for drift/cruft | `/review-repo` (Claude command) |
| Review hardware capacity | `/capacity-review` (Claude command) |
| Encrypt a vault file | `make encrypt FILE=<path>` |
| Decrypt a vault file | `make decrypt FILE=<path>` |
| Install Python deps | `make setup` |
@@ -170,6 +171,7 @@ Single-contributor, trunk-based (no merge requests / approval gates):
| Testing methodology | `docs/decisions/008-testing.md` |
| TF ↔ Ansible handoff | `docs/decisions/009-provisioning-handoff.md` |
| Forgejo & CI | `docs/decisions/010-forgejo-ci.md` |
| Hardware & capacity | `docs/decisions/012-hardware-capacity.md` |
| Adding a new role | `docs/runbooks/new-role.md` |
| Adding a new host | `docs/runbooks/new-host.md` |
| Rotating vault secrets | `docs/runbooks/rotate-secrets.md` |


@ -21,6 +21,8 @@ _Last reviewed: 2026-05-30._
| Vault password client | `scripts/vault-pass-client.sh` fetches the master password from Vaultwarden via `rbw` (wired as `vault_password_file`). Requires `rbw` installed + `rbw unlock`. |
| `/review-repo` | Repo audit: `scripts/repo-scan.py` (Phase 0) + `.claude/commands/review-repo.md`, reports to `docs/reviews/`. On-demand only; cron + email deferred (`docs/TODO.md`). |
| Terraform HCL (`terraform/`) | Written (proxmox VM module + envs) — but never run; see below |
| `docs/hardware/reference.md` + `scripts/capacity-scan.py` | Present — reference doc (skeleton until real hardware) + stdlib scan; emits capacity JSON |
| `/capacity-review` | Works — on-demand capacity evaluation → `docs/hardware/reviews/`. Intent-based (no live usage yet) |
## Scaffolded but empty — NOT implemented
@ -44,6 +46,7 @@ So `make deploy PLAYBOOK=site` currently **fails** on a clean clone — the `bas
| Level 2 / 3 testing (staging, `askari` smoke) | ADR-008 | Depends on real VMs / `askari`, which don't exist yet |
| Per-service roles | ADR-004 | Model defined; no service roles built |
| Forgejo Actions CI | ADR-003 / ADR-008 | Remote is live (pushed); Actions/`act_runner` pipeline not yet built |
| Live usage stats for `/capacity-review` | ADR-012 / TODO 8.4 | `gather_usage()` stubbed; source undecided (Proxmox RRD vs PLG stack); needs the cluster |
## Keeping this honest


@@ -0,0 +1,37 @@
# ADR-012 — Hardware reference & capacity evaluation
## Context
The repo modelled the logical/network layer (Terraform VM specs, ADR-007
topology) but not the physical layer — node CPU/RAM/disk capacity, network gear,
or which workloads are designed to run where with what headroom. There was also
no way to ask "is this well-proportioned?" — e.g. HA that isn't needed, a
workload that should move, or a node due an upgrade.
## Decision
- `docs/hardware/reference.md` is the single, hand-maintained source of truth for
physical compute + network gear and workload placement intent. Two
machine-readable tables (node capacity, workload placement) carry the numbers.
- `scripts/capacity-scan.py` (stdlib-only, like `repo-scan.py` / `tf_to_inventory.py`)
parses those tables, computes per-node allocated-vs-physical rollups, and
cross-checks workload hostnames against `terraform output -json` /
`ansible-inventory --list` to surface drift.
- `/capacity-review` reads the scan + intent columns and writes a dated report to
`docs/hardware/reviews/`, mirroring `/review-repo` → `docs/reviews/`.
- Numeric allocations live in `reference.md`, not Terraform: the current
`terraform output` exposes only `{ip, group}`. Terraform/inventory are used
only for hostname-drift cross-checks.
- **Live usage stats are a future hook.** The cluster is not stood up;
`gather_usage()` returns `available: false` and the evaluator reasons on
declared intent. The usage source (Proxmox RRD vs Prometheus/Loki/Grafana/
Alloy) is undecided — see docs/TODO.md 8.4, to be settled before any hook is
built.
## Consequences
- Right-sizing advice is intent-based until usage data exists; reports say so.
- `reference.md` table headers are a parser contract — changing them needs a
matching `capacity-scan.py` change.
See also: ADR-001 (architecture), ADR-007 (network), ADR-009 (TF ↔ Ansible handoff).
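
The table-parsing and rollup step described in the Decision can be sketched roughly as follows. This is a minimal illustration, not the contents of `capacity-scan.py`: the column names (`node`, `cpu_threads`, `ram_gb`, `vcpu`), the sample hosts, and the helper names `parse_table` and `rollup` are all assumptions, since the commit does not show the real table headers.

```python
import json
from collections import defaultdict

# Hypothetical tables; the real headers in docs/hardware/reference.md may differ.
NODES_MD = """
| node | cpu_threads | ram_gb |
|------|-------------|--------|
| pve1 | 16 | 64 |
| pve2 | 8 | 32 |
"""

WORKLOADS_MD = """
| workload | node | vcpu | ram_gb |
|---|---|---|---|
| forgejo | pve1 | 4 | 8 |
| vaultwarden | pve1 | 2 | 4 |
| monitoring | pve2 | 4 | 16 |
"""


def parse_table(markdown: str) -> list[dict[str, str]]:
    """Parse the first pipe-delimited Markdown table into a list of row dicts."""
    header = None
    rows = []
    for raw in markdown.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            if header is not None:
                break  # the table has ended
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells  # first table row is the header (the parser contract)
        elif all(c and set(c) <= set("-:") for c in cells):
            continue  # separator row such as |---|:--:|
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def rollup(nodes: list[dict[str, str]], workloads: list[dict[str, str]]) -> dict:
    """Sum declared workload allocations per node and compare to physical capacity."""
    allocated = defaultdict(lambda: {"vcpu": 0, "ram_gb": 0})
    for w in workloads:
        allocated[w["node"]]["vcpu"] += int(w["vcpu"])
        allocated[w["node"]]["ram_gb"] += int(w["ram_gb"])
    report = {}
    for n in nodes:
        used = allocated[n["node"]]
        report[n["node"]] = {
            "cpu_allocated": used["vcpu"],
            "cpu_physical": int(n["cpu_threads"]),
            "ram_allocated_gb": used["ram_gb"],
            "ram_physical_gb": int(n["ram_gb"]),
            "ram_headroom_gb": int(n["ram_gb"]) - used["ram_gb"],
        }
    return report


report = rollup(parse_table(NODES_MD), parse_table(WORKLOADS_MD))
print(json.dumps(report, indent=2))
```

Keying each row by its header text is what makes the headers a parser contract: renaming a column in this sketch would raise a `KeyError` in `rollup` rather than silently producing wrong totals.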


@@ -11,3 +11,7 @@ dependencies (keeps them runnable anywhere without a venv).
plaintext secrets.
- `repo-scan.py` — Phase-0 deterministic scan for `/review-repo` (markers, broken
refs, unencrypted vaults, inventory).
- `capacity-scan.py` — deterministic capacity facts for `/capacity-review`: parses
the machine-readable tables in `docs/hardware/reference.md`, computes per-node
allocated-vs-physical rollups, and cross-checks workload hostnames against
Terraform output / Ansible inventory for drift. Emits JSON. See **ADR-012**.
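
The hostname drift cross-check mentioned for `capacity-scan.py` might look something like the sketch below. It works on already-loaded JSON rather than invoking the CLIs. The Terraform output name `vms`, the sample hosts and addresses, and the function names are assumptions for illustration; only the `{ip, group}` value shape comes from ADR-012.

```python
import json


def inventory_hosts(inventory: dict) -> set[str]:
    """Collect hostnames from `ansible-inventory --list` JSON.

    Hosts appear both as keys of _meta.hostvars and in each group's
    "hosts" list, so both places are merged.
    """
    hosts = set(inventory.get("_meta", {}).get("hostvars", {}))
    for name, group in inventory.items():
        if name != "_meta" and isinstance(group, dict):
            hosts.update(group.get("hosts", []))
    return hosts


def terraform_hosts(tf_output: dict, output_name: str = "vms") -> set[str]:
    """Collect hostnames from `terraform output -json`.

    Assumes a map-typed output (hypothetically named "vms") keyed by hostname.
    """
    return set(tf_output.get(output_name, {}).get("value", {}))


def hostname_drift(declared: set[str], tf: set[str], inv: set[str]) -> dict:
    """Compare hostnames declared in reference.md with those Terraform/Ansible know."""
    known = tf | inv
    return {
        "declared_but_unknown": sorted(declared - known),
        "known_but_undeclared": sorted(known - declared),
    }


# Hypothetical sample data standing in for the real command output.
tf_json = {
    "vms": {
        "value": {
            "forgejo": {"ip": "192.0.2.10", "group": "services"},
            "vaultwarden": {"ip": "192.0.2.11", "group": "services"},
        }
    }
}
inv_json = {
    "_meta": {"hostvars": {"forgejo": {"ansible_host": "192.0.2.10"}}},
    "all": {"children": ["services"]},
    "services": {"hosts": ["forgejo", "askari"]},
}
declared = {"forgejo", "vaultwarden", "monitoring"}

drift = hostname_drift(declared, terraform_hosts(tf_json), inventory_hosts(inv_json))
print(json.dumps(drift, indent=2))
```

Reporting drift in both directions separates a workload that is planned in the reference doc but not yet provisioned from a host that exists in Terraform or the inventory but was never recorded in the doc.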