# ADR-012 — Hardware reference & capacity evaluation

## Status

Accepted (2026-06-01)

## Context

The repo modelled the logical/network layer (Terraform VM specs, ADR-007
topology) but not the physical layer: node CPU/RAM/disk capacity, network gear,
or which workloads are designed to run where and with what headroom. There was
also no way to ask "is this well-proportioned?", for example to spot HA that
isn't needed, a workload that should move, or a node due an upgrade.

## Decision

- `docs/hardware/reference.md` is the single, hand-maintained source of truth for
  physical compute + network gear and workload placement intent. Two
  machine-readable tables (node capacity, workload placement) carry the numbers.
  This includes `ubongo`, the physical control node (ADR-015), even though it sits
  outside the Proxmox cluster.
- `scripts/capacity-scan.py` (stdlib-only, like `repo-scan.py` / `tf_to_inventory.py`)
  parses those tables, computes per-node allocated-vs-physical rollups, and
  cross-checks workload hostnames against `terraform output -json` /
  `ansible-inventory --list` to surface drift.
- `/capacity-review` reads the scan + intent columns and writes a dated report to
  `docs/hardware/reviews/YYYY-MM-DD-capacity.md`, also overwriting
  `docs/hardware/reviews/latest.md`, mirroring `/review-repo` → `docs/reviews/`.
- Numeric allocations live in `reference.md`, not Terraform: the current
  `terraform output` exposes only `{ip, group}`. Terraform/inventory are used
  only for hostname-drift cross-checks.
- **Live usage stats are a future hook.** The cluster has not been stood up yet,
  so `gather_usage()` returns `available: false` and the evaluator reasons on
  declared intent. The usage source (Proxmox RRD vs Prometheus/Loki/Grafana/
  Alloy) is undecided; see `docs/TODO.md` 8.4, which is to be settled before any
  hook is built.
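
The scan step described above can be sketched roughly as follows. This is a
minimal, stdlib-only illustration, not the actual `capacity-scan.py`: the column
names (`node`, `workload`, `cores`, `ram_gb`), the sample rows, and the function
names `parse_table`, `rollup`, and `hostname_drift` are all assumptions. It also
assumes the deployed hosts have already been extracted from
`terraform output -json` into a plain `hostname -> {ip, group}` mapping.

```python
import json

# Hypothetical table contents; the real columns in reference.md may differ.
NODES_MD = """\
| node | cores | ram_gb |
|------|-------|--------|
| pve1 | 8     | 32     |
| pve2 | 4     | 16     |
"""

WORKLOADS_MD = """\
| workload | node | cores | ram_gb |
|----------|------|-------|--------|
| web      | pve1 | 2     | 4      |
| db       | pve1 | 4     | 16     |
| dns      | pve2 | 1     | 2      |
"""


def parse_table(text):
    """Parse a Markdown pipe table into a list of dicts keyed by header."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")
    ]
    header, body = rows[0], rows[2:]  # rows[1] is the |---| separator row
    return [dict(zip(header, row)) for row in body]


def rollup(nodes, workloads):
    """Sum allocated cores/RAM per node next to the physical capacity."""
    report = {}
    for node in nodes:
        placed = [w for w in workloads if w["node"] == node["node"]]
        report[node["node"]] = {
            res: {
                "physical": int(node[res]),
                "allocated": sum(int(w[res]) for w in placed),
            }
            for res in ("cores", "ram_gb")
        }
    return report


def hostname_drift(workloads, deployed_hosts):
    """Compare declared workload hostnames with the deployed hostnames."""
    declared = {w["workload"] for w in workloads}
    deployed = set(deployed_hosts)
    return {
        "only_in_reference": sorted(declared - deployed),
        "only_in_deployment": sorted(deployed - declared),
    }


if __name__ == "__main__":
    nodes = parse_table(NODES_MD)
    workloads = parse_table(WORKLOADS_MD)
    # Assumed shape: hostname -> {ip, group}, matching what the ADR says
    # the current Terraform output exposes.
    deployed = {
        "web": {"ip": "10.0.0.10", "group": "apps"},
        "db": {"ip": "10.0.0.11", "group": "data"},
        "cache": {"ip": "10.0.0.12", "group": "apps"},
    }
    print(json.dumps(rollup(nodes, workloads), indent=2))
    print(json.dumps(hostname_drift(workloads, deployed), indent=2))
```

Keeping the numbers in the tables and using the deployment data only for
hostname sets mirrors the split in the decision: the rollup never needs
Terraform, and the drift check never needs capacity figures.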

## Consequences

- Right-sizing advice is intent-based until usage data exists; reports say so.
- `reference.md` table headers are a parser contract — changing them needs a
  matching `capacity-scan.py` change.
- Log storage (ADR-018) is a tracked allocation: the cluster Loki host's retention
  budget and `askari`'s security-subset volume belong in `reference.md`, and SSD
  **wearout/TBW** is a monitored metric — logging is write-heavy, so wear is watched,
  not assumed.
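
The first two consequences can be illustrated with a small sketch. It is not the
actual `capacity-scan.py` code: `EXPECTED_HEADERS`, `check_headers`, and
`advice_basis` are hypothetical names, and the column lists are assumed. Only
`gather_usage()` and its `available: false` result come from the decision above.

```python
# Hypothetical header contract; the real column names live in reference.md.
EXPECTED_HEADERS = {
    "node capacity": ["node", "cores", "ram_gb", "disk_gb"],
    "workload placement": ["workload", "node", "cores", "ram_gb", "disk_gb"],
}


def check_headers(table_name, header_line):
    """Raise if a table's header row no longer matches the parser contract."""
    found = [cell.strip() for cell in header_line.strip().strip("|").split("|")]
    expected = EXPECTED_HEADERS[table_name]
    if found != expected:
        raise ValueError(
            f"{table_name}: header {found} does not match expected {expected}; "
            "update capacity-scan.py alongside reference.md"
        )


def gather_usage():
    """Placeholder for the future live-usage hook; no source is chosen yet."""
    return {"available": False}


def advice_basis(usage):
    """Return the caveat a report would carry about its evidence base."""
    if usage.get("available"):
        return "Advice is based on declared intent and measured usage."
    return "Advice is intent-based: no live usage data is available."


if __name__ == "__main__":
    check_headers("node capacity", "| node | cores | ram_gb | disk_gb |")
    print(advice_basis(gather_usage()))
```

Failing loudly on a header mismatch makes the contract visible at scan time
instead of producing silently wrong rollups.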

See also: ADR-001 (architecture), ADR-007 (network), ADR-009 (TF ↔ Ansible handoff).