Record ADR-012 + STATUS/CLAUDE/scripts docs for capacity tooling

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-01 10:34:38 +02:00
commit 4c535c908e
parent 1060a9c08a
4 changed files with 46 additions and 0 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -31,6 +31,7 @@ Full design rationale: `docs/decisions/`
 | Deploy a playbook             | `make deploy PLAYBOOK=<name>`                    |
 | Scaffold a new role           | `make new-role NAME=<name>`                      |
 | Review repo for drift/cruft   | `/review-repo` (Claude command)                  |
+| Review hardware capacity      | `/capacity-review` (Claude command)              |
 | Encrypt a vault file          | `make encrypt FILE=<path>`                       |
 | Decrypt a vault file          | `make decrypt FILE=<path>`                       |
 | Install Python deps           | `make setup`                                     |
@@ -170,6 +171,7 @@ Single-contributor, trunk-based (no merge requests / approval gates):
 | Testing methodology    | `docs/decisions/008-testing.md`       |
 | TF ↔ Ansible handoff   | `docs/decisions/009-provisioning-handoff.md` |
 | Forgejo & CI           | `docs/decisions/010-forgejo-ci.md`    |
+| Hardware & capacity    | `docs/decisions/012-hardware-capacity.md` |
 | Adding a new role      | `docs/runbooks/new-role.md`           |
 | Adding a new host      | `docs/runbooks/new-host.md`           |
 | Rotating vault secrets | `docs/runbooks/rotate-secrets.md`     |
--- a/STATUS.md
+++ b/STATUS.md
@@ -21,6 +21,8 @@ _Last reviewed: 2026-05-30._
 | Vault password client | `scripts/vault-pass-client.sh` fetches the master password from Vaultwarden via `rbw` (wired as `vault_password_file`). Requires `rbw` installed + `rbw unlock`. |
 | `/review-repo` | Repo audit: `scripts/repo-scan.py` (Phase 0) + `.claude/commands/review-repo.md`, reports to `docs/reviews/`. On-demand only; cron + email deferred (`docs/TODO.md`). |
 | Terraform HCL (`terraform/`) | Written (proxmox VM module + envs) — but never run; see below |
+| `docs/hardware/reference.md` + `scripts/capacity-scan.py` | Present — reference doc (skeleton until real hardware) + stdlib scan; emits capacity JSON |
+| `/capacity-review` | Works — on-demand capacity evaluation → `docs/hardware/reviews/`. Intent-based (no live usage yet) |

 ## Scaffolded but empty — NOT implemented

@@ -44,6 +46,7 @@ So `make deploy PLAYBOOK=site` currently **fails** on a clean clone — the `bas
 | Level 2 / 3 testing (staging, `askari` smoke) | ADR-008 | Depends on real VMs / `askari`, which don't exist yet |
 | Per-service roles | ADR-004 | Model defined; no service roles built |
 | Forgejo Actions CI | ADR-003 / ADR-008 | Remote is live (pushed); Actions/`act_runner` pipeline not yet built |
+| Live usage stats for `/capacity-review` | ADR-012 / TODO 8.4 | `gather_usage()` stubbed; source undecided (Proxmox RRD vs PLG stack); needs the cluster |

 ## Keeping this honest

--- a/docs/decisions/012-hardware-capacity.md
+++ b/docs/decisions/012-hardware-capacity.md
@@ -0,0 +1,37 @@
+# ADR-012 — Hardware reference & capacity evaluation
+
+## Context
+
+The repo modelled the logical/network layer (Terraform VM specs, ADR-007
+topology) but not the physical layer — node CPU/RAM/disk capacity, network gear,
+or which workloads are designed to run where with what headroom. There was also
+no way to ask "is this well-proportioned?" — e.g. HA that isn't needed, a
+workload that should move, or a node due an upgrade.
+
+## Decision
+
+- `docs/hardware/reference.md` is the single, hand-maintained source of truth for
+  physical compute + network gear and workload placement intent. Two
+  machine-readable tables (node capacity, workload placement) carry the numbers.
+- `scripts/capacity-scan.py` (stdlib-only, like `repo-scan.py` / `tf_to_inventory.py`)
+  parses those tables, computes per-node allocated-vs-physical rollups, and
+  cross-checks workload hostnames against `terraform output -json` /
+  `ansible-inventory --list` to surface drift.
+- `/capacity-review` reads the scan + intent columns and writes a dated report to
+  `docs/hardware/reviews/`, mirroring `/review-repo` → `docs/reviews/`.
+- Numeric allocations live in `reference.md`, not Terraform: the current
+  `terraform output` exposes only `{ip, group}`. Terraform/inventory are used
+  only for hostname-drift cross-checks.
+- **Live usage stats are a future hook.** The cluster is not stood up;
+  `gather_usage()` returns `available: false` and the evaluator reasons on
+  declared intent. The usage source (Proxmox RRD vs Prometheus/Loki/Grafana/
+  Alloy) is undecided — see `docs/TODO.md` 8.4, to be settled before any hook is
+  built.
+
+## Consequences
+
+- Right-sizing advice is intent-based until usage data exists; reports say so.
+- `reference.md` table headers are a parser contract — changing them needs a
+  matching `capacity-scan.py` change.
+
+See also: ADR-001 (architecture), ADR-007 (network), ADR-009 (TF ↔ Ansible handoff).
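
Review note: the ADR describes `capacity-scan.py` as parsing two machine-readable tables and computing per-node allocated-vs-physical rollups, but the diff does not show the script or the table headers. The following is a minimal stdlib-only sketch of that idea, not the script itself. The column names (`node`, `cores`, `ram_gb`, `workload`, `vcpu`), the sample hosts, and the helper names `parse_md_table` and `rollup` are all assumptions made for illustration.

```python
import json
from collections import defaultdict

# Hypothetical stand-ins for the two tables in docs/hardware/reference.md.
# The real headers are a parser contract defined by the repo, not by this sketch.
NODES_MD = """
| node   | cores | ram_gb |
|--------|-------|--------|
| node-a | 8     | 32     |
| node-b | 4     | 16     |
"""

WORKLOADS_MD = """
| workload | node   | vcpu | ram_gb |
|----------|--------|------|--------|
| svc-a    | node-a | 2    | 4      |
| svc-b    | node-a | 1    | 2      |
| svc-c    | node-b | 2    | 8      |
"""


def parse_md_table(text):
    """Parse one pipe-delimited Markdown table into a list of row dicts."""
    header = None
    rows = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # ignore anything that is not a table row
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells  # first table row names the columns
        elif all(set(c) <= set("-: ") for c in cells):
            continue  # separator row such as |---|:---:|
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def rollup(nodes, workloads):
    """Sum declared workload allocations per node and compare to physical capacity."""
    allocated = defaultdict(lambda: {"vcpu": 0, "ram_gb": 0})
    for w in workloads:
        allocated[w["node"]]["vcpu"] += int(w["vcpu"])
        allocated[w["node"]]["ram_gb"] += int(w["ram_gb"])

    report = {}
    for n in nodes:
        used = allocated[n["node"]]
        report[n["node"]] = {
            "vcpu_allocated": used["vcpu"],
            "vcpu_physical": int(n["cores"]),
            "ram_gb_allocated": used["ram_gb"],
            "ram_gb_physical": int(n["ram_gb"]),
            "ram_headroom_gb": int(n["ram_gb"]) - used["ram_gb"],
        }
    return report


report = rollup(parse_md_table(NODES_MD), parse_md_table(WORKLOADS_MD))
print(json.dumps(report, indent=2))
```

Because the rows are keyed by header text, renaming a column in the tables would raise a `KeyError` here, which illustrates why the ADR treats the headers as a parser contract.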
--- a/scripts/README.md
+++ b/scripts/README.md
@ -11,3 +11,7 @@ dependencies (keeps them runnable anywhere without a venv).
  plaintext secrets.
 - `repo-scan.py` — Phase-0 deterministic scan for `/review-repo` (markers, broken
  refs, unencrypted vaults, inventory).
+- `capacity-scan.py` — deterministic capacity facts for `/capacity-review`: parses
+  the machine-readable tables in `docs/hardware/reference.md`, computes per-node
+  allocated-vs-physical rollups, and cross-checks workload hostnames against
+  Terraform output / Ansible inventory for drift. Emits JSON. See **ADR-012**.
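
Review note: the hostname-drift cross-check and the stubbed usage hook could look roughly like the sketch below. It works on already-parsed JSON so it runs without Terraform or Ansible installed. The Terraform output name `hosts`, the sample hostnames, and the function names other than `gather_usage` are assumptions; the diff only states that `terraform output` exposes `{ip, group}` and that `gather_usage()` reports `available: false`.

```python
def terraform_hosts(tf_output, output_name="hosts"):
    """Hostnames from parsed `terraform output -json`.

    Assumes a (hypothetical) output called `hosts` whose value maps
    hostname -> {"ip": ..., "group": ...}.
    """
    return set(tf_output.get(output_name, {}).get("value", {}))


def inventory_hosts(inventory):
    """Hostnames from parsed `ansible-inventory --list`, collected per group."""
    hosts = set()
    for name, group in inventory.items():
        if name == "_meta":
            continue  # hostvars live here; group membership is elsewhere
        hosts.update(group.get("hosts", []))
    return hosts


def hostname_drift(declared, tf_hosts, inv_hosts):
    """Compare hostnames declared in the reference doc with provisioned ones."""
    declared = set(declared)
    return {
        "declared_but_unknown": sorted(declared - (tf_hosts | inv_hosts)),
        "undeclared_in_terraform": sorted(tf_hosts - declared),
        "undeclared_in_inventory": sorted(inv_hosts - declared),
    }


def gather_usage():
    """Stub for live usage: no source is wired up, so report unavailable."""
    return {"available": False}


# Hypothetical sample data shaped like the two tools' JSON output.
tf_output = {
    "hosts": {
        "sensitive": False,
        "value": {
            "svc-a": {"ip": "10.0.0.10", "group": "services"},
            "svc-c": {"ip": "10.0.0.12", "group": "services"},
        },
    }
}
inventory = {
    "_meta": {"hostvars": {}},
    "all": {"children": ["services"]},
    "services": {"hosts": ["svc-a"]},
}

drift = hostname_drift(
    ["svc-a", "svc-b"], terraform_hosts(tf_output), inventory_hosts(inventory)
)
usage = gather_usage()
print(drift, usage)
```

Keeping the comparison on plain sets means the same function serves both sources, and an empty or missing Terraform output simply yields an empty set rather than an error.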