Add /capacity-review skill
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
parent
05694f6ea4
commit
1060a9c08a
1 changed files with 66 additions and 0 deletions
66
.claude/commands/capacity-review.md
Normal file
66
.claude/commands/capacity-review.md
Normal file
|
|
@ -0,0 +1,66 @@
|
|||
# Evaluate the homelab's hardware capacity and workload placement
|
||||
|
||||
Assess current allocation headroom, HA posture, and workload placement against declared
|
||||
intent, and write a tracked report to `docs/hardware/reviews/`. On-demand only;
|
||||
scheduled runs are deferred (see `docs/TODO.md` 8.4).
|
||||
|
||||
## Reference material
|
||||
|
||||
- `docs/hardware/reference.md` — physical node specs, workload allocations, and the
|
||||
free-text intent columns (`criticality`, `ha_intent`, `profile`, `constraints`, `growth`).
|
||||
- `scripts/capacity-scan.py` — deterministic scan; emits JSON with keys `nodes`,
|
||||
`workloads`, `usage`, `warnings`.
|
||||
|
||||
## Process
|
||||
|
||||
### Phase 0 — gather facts
|
||||
|
||||
Run `python3 scripts/capacity-scan.py` and parse its JSON output:
|
||||
|
||||
- `nodes` — per-node physical totals, allocated totals, `ram_headroom_pct`, and the
|
||||
`oversubscribed` flag.
|
||||
- `workloads` — per-workload allocation rows from `reference.md`.
|
||||
- `usage` — live usage stats if available; check `usage.available`. If `false`, every
|
||||
recommendation in the report is **intent-based, not usage-based** — state this
|
||||
prominently in the report header.
|
||||
- `warnings` — drift findings the scan has already detected (reference vs Terraform/inventory).
|
||||
|
||||
### Phase 1 — read intent
|
||||
|
||||
Read `docs/hardware/reference.md` for the free-text columns the scan does not parse:
|
||||
`criticality`, `ha_intent`, `profile`, `constraints`, and `growth`, plus the
|
||||
"Capacity notes" section at the bottom of the file.
|
||||
|
||||
### Phase 2 — reason across five dimensions
|
||||
|
||||
Produce concrete, actionable recommendations. Tag every item with its type and the
|
||||
basis it rests on (**intent-based** vs **usage-based**):
|
||||
|
||||
1. **HA / redundancy** — anti-affinity violations (e.g. an HA pair co-located on one
|
||||
node), single points of failure, HA posture that looks like overkill for the
|
||||
declared `criticality`, and high-criticality workloads with no redundancy.
|
||||
2. **Right-sizing** — over- or under-provisioned workloads compared to their `profile`.
|
||||
Today this is intent-based (declared allocation vs profile); flag explicitly that it
|
||||
becomes usage-based once the `gather_usage()` hook in the scan script is live.
|
||||
3. **Placement / moves** — oversubscribed nodes (`oversubscribed: true` or low
|
||||
`ram_headroom_pct`) and constraint-driven relocations indicated by `constraints`.
|
||||
4. **Upgrade timing** — cross-reference `growth` notes against current headroom to
|
||||
estimate a rough runway before a node upgrade is needed.
|
||||
5. **Drift** — surface every entry in the scan's `warnings` array verbatim.
|
||||
|
||||
### Phase 3 — write the report
|
||||
|
||||
Save the report to `docs/hardware/reviews/YYYY-MM-DD-capacity.md` and overwrite
|
||||
`docs/hardware/reviews/latest.md` with the same content.
|
||||
|
||||
Report structure:
|
||||
|
||||
- **One-line summary** — overall health signal (e.g. "All nodes within headroom;
|
||||
two HA violations detected").
|
||||
- **Run metadata** — date, reviewed commit SHA, `usage.available` status.
|
||||
- **Section per dimension** — each with concrete, actionable items; every item states
|
||||
its basis (intent-based or usage-based) and the evidence behind it.
|
||||
- **Follow-up prompt** — a generated, copy-pasteable prompt for the next review or
|
||||
for acting on the top finding.
|
||||
|
||||
Commit the report files per CLAUDE.md git conventions.
|
||||
Loading…
Add table
Reference in a new issue