Evaluate the homelab's hardware capacity and workload placement
Assess current allocation headroom, HA posture, and workload placement against declared
intent, and write a tracked report to docs/hardware/reviews/. On-demand only;
scheduled runs are deferred (see docs/TODO.md 8.4).
Reference material
- docs/hardware/reference.md: physical node specs, workload allocations, and the free-text intent columns (criticality, ha_intent, profile, constraints, growth).
- scripts/capacity-scan.py: deterministic scan; emits JSON with keys nodes, workloads, usage, warnings.
Process
Phase 0 — gather facts
Run python3 scripts/capacity-scan.py and parse its JSON output:
- nodes: per-node physical totals, allocated totals, ram_headroom_pct, and the oversubscribed flag.
- workloads: per-workload allocation rows from reference.md.
- usage: live usage stats if available; check usage.available. If false, every recommendation in the report is intent-based, not usage-based; state this prominently in the report header.
- warnings: drift findings the scan has already detected (reference vs Terraform/inventory).
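As a minimal sketch of Phase 0, the snippet below runs the scan and derives the report's basis from usage.available. It assumes the script prints its JSON to stdout and that usage is an object with a boolean available field; the helper names load_scan and report_basis and the sample payload are hypothetical.

```python
import json
import subprocess


def load_scan(script="scripts/capacity-scan.py"):
    """Run the scan and return its parsed JSON (assumes JSON on stdout)."""
    result = subprocess.run(
        ["python3", script], capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)


def report_basis(scan):
    """Return 'usage-based' only when the scan reports live usage data."""
    # Assumption: "usage" is an object carrying a boolean "available" field.
    usage = scan.get("usage") or {}
    return "usage-based" if usage.get("available") is True else "intent-based"


# Hypothetical payload for illustration; the real scan output may differ.
sample = json.loads(
    '{"nodes": [], "workloads": [], "usage": {"available": false}, "warnings": []}'
)
print(report_basis(sample))  # intent-based
```

Treating a missing or malformed usage key as intent-based is the conservative choice here: the report never claims a usage basis it cannot back up.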
Phase 1 — read intent
Read docs/hardware/reference.md for the free-text columns the scan does not parse:
criticality, ha_intent, profile, constraints, and growth, plus the
"Capacity notes" section at the bottom of the file.
Phase 2 — reason across five dimensions
Produce concrete, actionable recommendations. Tag every item with its type and the basis it rests on (intent-based vs usage-based):
- HA / redundancy: anti-affinity violations (e.g. an HA pair co-located on one node), single points of failure, HA posture that looks like overkill for the declared criticality, and high-criticality workloads with no redundancy.
- Right-sizing: over- or under-provisioned workloads compared to their profile. Today this is intent-based (declared allocation vs profile); flag explicitly that it becomes usage-based once the gather_usage() hook in the scan script is live.
- Placement / moves: oversubscribed nodes (oversubscribed: true or low ram_headroom_pct) and constraint-driven relocations indicated by constraints.
- Upgrade timing: cross-reference growth notes against current headroom to estimate a rough runway before a node upgrade is needed.
- Drift: surface every entry in the scan's warnings array verbatim.
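The placement check above could be sketched as follows, with each finding tagged by type and basis. The 15% cutoff for "low" headroom is a hypothetical value (this document does not define one), and the name key and list-of-objects shape for nodes are assumptions about the scan output.

```python
LOW_HEADROOM_PCT = 15.0  # hypothetical threshold; "low" is not defined in the source


def placement_findings(nodes, basis="intent-based"):
    """Flag nodes that are oversubscribed or short on RAM headroom."""
    findings = []
    for node in nodes:
        name = node.get("name", "<unnamed>")  # "name" is an assumed key
        headroom = node.get("ram_headroom_pct")
        if node.get("oversubscribed"):
            evidence = "oversubscribed: true"
        elif headroom is not None and headroom < LOW_HEADROOM_PCT:
            evidence = f"ram_headroom_pct={headroom} (< {LOW_HEADROOM_PCT})"
        else:
            continue  # node is within headroom; nothing to report
        findings.append(
            {"type": "placement", "basis": basis, "node": name, "evidence": evidence}
        )
    return findings


# Hypothetical nodes for illustration only.
sample_nodes = [
    {"name": "node-a", "ram_headroom_pct": 42.0, "oversubscribed": False},
    {"name": "node-b", "ram_headroom_pct": 8.5, "oversubscribed": False},
    {"name": "node-c", "ram_headroom_pct": -5.0, "oversubscribed": True},
]
for finding in placement_findings(sample_nodes):
    print(finding)
```

Keeping the evidence string next to each finding makes it straightforward to carry both the basis and the supporting fact into the report.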
Phase 3 — write the report
Save the report to docs/hardware/reviews/YYYY-MM-DD-capacity.md and overwrite
docs/hardware/reviews/latest.md with the same content.
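A minimal sketch of the save step, assuming the date in the filename is the local run date. The helper name write_report is hypothetical, and the demo writes into a temporary directory with a fixed example date so nothing in the repo is touched.

```python
import tempfile
from datetime import date
from pathlib import Path


def write_report(content, reviews_dir="docs/hardware/reviews", today=None):
    """Write the dated report and overwrite latest.md with the same content."""
    today = today or date.today()
    out_dir = Path(reviews_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    dated = out_dir / f"{today.isoformat()}-capacity.md"  # YYYY-MM-DD-capacity.md
    latest = out_dir / "latest.md"
    dated.write_text(content, encoding="utf-8")
    latest.write_text(content, encoding="utf-8")
    return dated, latest


# Demo against a throwaway directory with an arbitrary example date.
with tempfile.TemporaryDirectory() as tmp:
    dated, latest = write_report(
        "Capacity review\n", reviews_dir=tmp, today=date(2024, 1, 31)
    )
    print(dated.name)  # 2024-01-31-capacity.md
    same = dated.read_text(encoding="utf-8") == latest.read_text(encoding="utf-8")
```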
Report structure:
- One-line summary: overall health signal (e.g. "All nodes within headroom; two HA violations detected").
- Run metadata: date, reviewed commit SHA, usage.available status.
- Section per dimension: each with concrete, actionable items; every item states its basis (intent-based or usage-based) and the evidence behind it.
- Follow-up prompt: a generated, copy-pasteable prompt for the next review or for acting on the top finding.
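One possible way to assemble that structure is sketched below. The heading levels, field labels, and the render_report helper are all assumptions made for illustration; only the four parts and the five dimension names come from this document.

```python
DIMENSIONS = [
    "HA / redundancy",
    "Right-sizing",
    "Placement / moves",
    "Upgrade timing",
    "Drift",
]


def render_report(summary, run_date, commit_sha, usage_available, items, follow_up):
    """Render the report; items maps a dimension to (text, basis, evidence) tuples."""
    lines = [
        f"Summary: {summary}",
        "",
        f"Date: {run_date}",
        f"Reviewed commit: {commit_sha}",
        f"usage.available: {str(usage_available).lower()}",
    ]
    for dimension in DIMENSIONS:
        lines += ["", f"## {dimension}"]
        entries = items.get(dimension, [])
        if not entries:
            lines.append("- No findings.")
        for text, basis, evidence in entries:
            lines.append(f"- {text} [{basis}] Evidence: {evidence}")
    lines += ["", "## Follow-up prompt", follow_up]
    return "\n".join(lines) + "\n"


# Hypothetical inputs for illustration only.
report = render_report(
    summary="All nodes within headroom; one placement finding",
    run_date="2024-01-31",
    commit_sha="<commit SHA>",
    usage_available=False,
    items={"Placement / moves": [("Review node-b", "intent-based", "low headroom")]},
    follow_up="Propose a relocation plan for node-b.",
)
print(report)
```

Emitting every dimension even when it has no findings keeps successive reports easy to compare.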
Commit the report files per CLAUDE.md git conventions.