Record ADR-012 + STATUS/CLAUDE/scripts docs for capacity tooling
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
parent 1060a9c08a
commit 4c535c908e
4 changed files with 46 additions and 0 deletions
@@ -31,6 +31,7 @@ Full design rationale: `docs/decisions/`
| Deploy a playbook | `make deploy PLAYBOOK=<name>` |
| Scaffold a new role | `make new-role NAME=<name>` |
| Review repo for drift/cruft | `/review-repo` (Claude command) |
| Review hardware capacity | `/capacity-review` (Claude command) |
| Encrypt a vault file | `make encrypt FILE=<path>` |
| Decrypt a vault file | `make decrypt FILE=<path>` |
| Install Python deps | `make setup` |
@@ -170,6 +171,7 @@ Single-contributor, trunk-based (no merge requests / approval gates):
| Testing methodology | `docs/decisions/008-testing.md` |
| TF ↔ Ansible handoff | `docs/decisions/009-provisioning-handoff.md` |
| Forgejo & CI | `docs/decisions/010-forgejo-ci.md` |
| Hardware & capacity | `docs/decisions/012-hardware-capacity.md` |
| Adding a new role | `docs/runbooks/new-role.md` |
| Adding a new host | `docs/runbooks/new-host.md` |
| Rotating vault secrets | `docs/runbooks/rotate-secrets.md` |
@@ -21,6 +21,8 @@ _Last reviewed: 2026-05-30._
| Vault password client | `scripts/vault-pass-client.sh` fetches the master password from Vaultwarden via `rbw` (wired as `vault_password_file`). Requires `rbw` installed + `rbw unlock`. |
| `/review-repo` | Repo audit: `scripts/repo-scan.py` (Phase 0) + `.claude/commands/review-repo.md`, reports to `docs/reviews/`. On-demand only; cron + email deferred (`docs/TODO.md`). |
| Terraform HCL (`terraform/`) | Written (proxmox VM module + envs) — but never run; see below |
| `docs/hardware/reference.md` + `scripts/capacity-scan.py` | Present — reference doc (skeleton until real hardware) + stdlib scan; emits capacity JSON |
| `/capacity-review` | Works — on-demand capacity evaluation → `docs/hardware/reviews/`. Intent-based (no live usage yet) |

## Scaffolded but empty — NOT implemented

@@ -44,6 +46,7 @@ So `make deploy PLAYBOOK=site` currently **fails** on a clean clone — the `bas
| Level 2 / 3 testing (staging, `askari` smoke) | ADR-008 | Depends on real VMs / `askari`, which don't exist yet |
| Per-service roles | ADR-004 | Model defined; no service roles built |
| Forgejo Actions CI | ADR-003 / ADR-008 | Remote is live (pushed); Actions/`act_runner` pipeline not yet built |
| Live usage stats for `/capacity-review` | ADR-012 / TODO 8.4 | `gather_usage()` stubbed; source undecided (Proxmox RRD vs PLG stack); needs the cluster |

## Keeping this honest

37
docs/decisions/012-hardware-capacity.md
Normal file
@@ -0,0 +1,37 @@
# ADR-012 — Hardware reference & capacity evaluation

## Context

The repo modelled the logical/network layer (Terraform VM specs, ADR-007
topology) but not the physical layer — node CPU/RAM/disk capacity, network gear,
or which workloads are designed to run where with what headroom. There was also
no way to ask "is this well-proportioned?" — e.g. HA that isn't needed, a
workload that should move, or a node due an upgrade.

## Decision

- `docs/hardware/reference.md` is the single, hand-maintained source of truth for
  physical compute + network gear and workload placement intent. Two
  machine-readable tables (node capacity, workload placement) carry the numbers.
- `scripts/capacity-scan.py` (stdlib-only, like `repo-scan.py` / `tf_to_inventory.py`)
  parses those tables, computes per-node allocated-vs-physical rollups, and
  cross-checks workload hostnames against `terraform output -json` /
  `ansible-inventory --list` to surface drift.
- `/capacity-review` reads the scan + intent columns and writes a dated report to
  `docs/hardware/reviews/`, mirroring `/review-repo` → `docs/reviews/`.
- Numeric allocations live in `reference.md`, not Terraform: the current
  `terraform output` exposes only `{ip, group}`. Terraform/inventory are used
  only for hostname-drift cross-checks.
- **Live usage stats are a future hook.** The cluster is not stood up;
  `gather_usage()` returns `available: false` and the evaluator reasons on
  declared intent. The usage source (Proxmox RRD vs Prometheus/Loki/Grafana/
  Alloy) is undecided — see docs/TODO.md 8.4, to be settled before any hook is
  built.
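
As a rough illustration of the rollup described in the Decision list, the sketch below parses two pipe-delimited tables and sums declared allocations per node. It is not the actual `capacity-scan.py`: the column names (`node`, `workload`, `cpu_cores`, `ram_gb`), the sample hosts, and the exact return shape of `gather_usage()` are all assumptions made for the example.

```python
"""Sketch of a per-node allocated-vs-physical rollup (hypothetical schema)."""

# Hypothetical excerpts; the real tables in reference.md may use other columns.
NODES_MD = """
| node | cpu_cores | ram_gb |
|------|-----------|--------|
| pve1 | 8         | 32     |
| pve2 | 4         | 16     |
"""

WORKLOADS_MD = """
| workload | node | cpu_cores | ram_gb |
|----------|------|-----------|--------|
| forgejo  | pve1 | 2         | 4      |
| vault    | pve1 | 1         | 2      |
| dns      | pve2 | 1         | 1      |
"""


def parse_table(md):
    """Parse one pipe-delimited markdown table into a list of row dicts."""
    lines = [ln.strip() for ln in md.splitlines() if ln.strip().startswith("|")]
    if len(lines) < 2:
        return []
    header = [cell.strip() for cell in lines[0].strip("|").split("|")]
    rows = []
    for line in lines[2:]:  # lines[1] is the |---| separator row
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


def rollup(nodes, workloads, resources=("cpu_cores", "ram_gb")):
    """Sum declared workload allocations per node and compare to capacity."""
    report = {}
    for node in nodes:
        name = node["node"]
        report[name] = {}
        for res in resources:
            physical = float(node[res])
            allocated = sum(float(w[res]) for w in workloads if w["node"] == name)
            report[name][res] = {
                "physical": physical,
                "allocated": allocated,
                "headroom": physical - allocated,
            }
    return report


def gather_usage():
    """Stub: no live usage source exists yet, so say so explicitly.

    The dict shape is an assumption; the ADR only states `available: false`.
    """
    return {"available": False}
```

Keeping `gather_usage()` as an explicit "not available" result, rather than omitting it, lets a report state that its advice is intent-based without special-casing a missing key.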

## Consequences

- Right-sizing advice is intent-based until usage data exists; reports say so.
- `reference.md` table headers are a parser contract — changing them needs a
  matching `capacity-scan.py` change.
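
One way to make the header contract fail loudly instead of silently mis-parsing is a check like the one below. This is a sketch only: `EXPECTED_HEADERS`, the table names, and the column names are hypothetical, and the real script may enforce the contract differently or not at all.

```python
# Hypothetical contract: the column names the parser expects, in order.
EXPECTED_HEADERS = {
    "nodes": ["node", "cpu_cores", "ram_gb"],
    "workloads": ["workload", "node", "cpu_cores", "ram_gb"],
}


def read_header(table_md):
    """Return the column names from the first pipe-delimited line of a table."""
    for line in table_md.splitlines():
        line = line.strip()
        if line.startswith("|"):
            return [cell.strip() for cell in line.strip("|").split("|")]
    return []


def check_header(table_name, table_md):
    """Raise ValueError if a table's header no longer matches the contract."""
    expected = EXPECTED_HEADERS[table_name]
    actual = read_header(table_md)
    if actual != expected:
        raise ValueError(
            f"{table_name} table header changed: expected {expected}, "
            f"got {actual}; update the parser to match"
        )
    return actual
```

Calling `check_header("nodes", table_text)` before parsing would turn a renamed column into an immediate error that names both sides of the mismatch.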

See also: ADR-001 (architecture), ADR-007 (network), ADR-009 (TF ↔ Ansible handoff).
@@ -11,3 +11,7 @@ dependencies (keeps them runnable anywhere without a venv).
plaintext secrets.
- `repo-scan.py` — Phase-0 deterministic scan for `/review-repo` (markers, broken
  refs, unencrypted vaults, inventory).
- `capacity-scan.py` — deterministic capacity facts for `/capacity-review`: parses
  the machine-readable tables in `docs/hardware/reference.md`, computes per-node
  allocated-vs-physical rollups, and cross-checks workload hostnames against
  Terraform output / Ansible inventory for drift. Emits JSON. See **ADR-012**.
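
The hostname drift cross-check can be pictured as a set comparison, sketched below. The samples are trimmed, made-up stand-ins for `terraform output -json` and `ansible-inventory --list`; the Terraform output name `hosts` and every hostname are assumptions, and this is not the script's actual code.

```python
import json

# Trimmed, hypothetical `terraform output -json` sample; each output sits
# under its name with the payload in "value". The name "hosts" is assumed.
TERRAFORM_OUTPUT_JSON = """
{"hosts": {"sensitive": false,
           "value": {"forgejo": {"ip": "10.0.0.10", "group": "services"},
                     "dns": {"ip": "10.0.0.53", "group": "infra"}}}}
"""

# Trimmed, hypothetical `ansible-inventory --list` sample; per-host variables
# are listed under _meta.hostvars.
ANSIBLE_INVENTORY_JSON = """
{"_meta": {"hostvars": {"forgejo": {}, "dns": {}, "monitor": {}}},
 "all": {"children": ["ungrouped"]}}
"""


def terraform_hosts(raw, output_name="hosts"):
    """Hostnames from one Terraform output whose value is keyed by host."""
    return set(json.loads(raw).get(output_name, {}).get("value", {}))


def inventory_hosts(raw):
    """Hostnames known to the Ansible inventory."""
    return set(json.loads(raw).get("_meta", {}).get("hostvars", {}))


def hostname_drift(declared, known):
    """Compare hostnames declared in reference.md with those another source knows."""
    declared, known = set(declared), set(known)
    return {
        "missing_from_source": sorted(declared - known),
        "undeclared_in_reference": sorted(known - declared),
    }
```

For example, `hostname_drift({"forgejo", "dns", "vault"}, inventory_hosts(ANSIBLE_INVENTORY_JSON))` flags `vault` as declared but unknown to the inventory, and `monitor` as present in the inventory but absent from the reference doc.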