From 1060a9c08af3d32e5fa1d8dd89cd8195fe23bc0a Mon Sep 17 00:00:00 2001 From: sjat Date: Mon, 1 Jun 2026 10:32:07 +0200 Subject: [PATCH] Add /capacity-review skill Co-Authored-By: Claude Sonnet 4.6 --- .claude/commands/capacity-review.md | 66 +++++++++++++++++++++++++++++ 1 file changed, 66 insertions(+) create mode 100644 .claude/commands/capacity-review.md diff --git a/.claude/commands/capacity-review.md b/.claude/commands/capacity-review.md new file mode 100644 index 0000000..3daaf74 --- /dev/null +++ b/.claude/commands/capacity-review.md @@ -0,0 +1,66 @@ +# Evaluate the homelab's hardware capacity and workload placement + +Assess current allocation headroom, HA posture, and workload placement against declared +intent, and write a tracked report to `docs/hardware/reviews/`. On-demand only; +scheduled runs are deferred (see `docs/TODO.md` 8.4). + +## Reference material + +- `docs/hardware/reference.md` — physical node specs, workload allocations, and the + free-text intent columns (`criticality`, `ha_intent`, `profile`, `constraints`, `growth`). +- `scripts/capacity-scan.py` — deterministic scan; emits JSON with keys `nodes`, + `workloads`, `usage`, `warnings`. + +## Process + +### Phase 0 — gather facts + +Run `python3 scripts/capacity-scan.py` and parse its JSON output: + +- `nodes` — per-node physical totals, allocated totals, `ram_headroom_pct`, and the + `oversubscribed` flag. +- `workloads` — per-workload allocation rows from `reference.md`. +- `usage` — live usage stats if available; check `usage.available`. If `false`, every + recommendation in the report is **intent-based, not usage-based** — state this + prominently in the report header. +- `warnings` — drift findings the scan has already detected (reference vs Terraform/inventory). + +### Phase 1 — read intent + +Read `docs/hardware/reference.md` for the free-text columns the scan does not parse: +`criticality`, `ha_intent`, `profile`, `constraints`, and `growth`, plus the +"Capacity notes" section at the bottom of the file. + +### Phase 2 — reason across five dimensions + +Produce concrete, actionable recommendations. Tag every item with its type and the +basis it rests on (**intent-based** vs **usage-based**): + +1. **HA / redundancy** — anti-affinity violations (e.g. an HA pair co-located on one + node), single points of failure, HA posture that looks like overkill for the + declared `criticality`, and high-criticality workloads with no redundancy. +2. **Right-sizing** — over- or under-provisioned workloads compared to their `profile`. + Today this is intent-based (declared allocation vs profile); flag explicitly that it + becomes usage-based once the `gather_usage()` hook in the scan script is live. +3. **Placement / moves** — oversubscribed nodes (`oversubscribed: true` or low + `ram_headroom_pct`) and constraint-driven relocations indicated by `constraints`. +4. **Upgrade timing** — cross-reference `growth` notes against current headroom to + estimate a rough runway before a node upgrade is needed. +5. **Drift** — surface every entry in the scan's `warnings` array verbatim. + +### Phase 3 — write the report + +Save the report to `docs/hardware/reviews/YYYY-MM-DD-capacity.md` and overwrite +`docs/hardware/reviews/latest.md` with the same content. + +Report structure: + +- **One-line summary** — overall health signal (e.g. "All nodes within headroom; + two HA violations detected"). +- **Run metadata** — date, reviewed commit SHA, `usage.available` status. +- **Section per dimension** — each with concrete, actionable items; every item states + its basis (intent-based or usage-based) and the evidence behind it. +- **Follow-up prompt** — a generated, copy-pasteable prompt for the next review or + for acting on the top finding. + +Commit the report files per CLAUDE.md git conventions.