# ADR-012 — Hardware reference & capacity evaluation

## Status

Accepted (2026-06-01)

## Context

The repo modelled the logical/network layer (Terraform VM specs, ADR-007
topology) but not the physical layer: node CPU/RAM/disk capacity, network gear,
or which workloads are designed to run where, and with what headroom. There was
also no way to ask "is this well-proportioned?", for example to catch HA that isn't needed, a
workload that should move, or a node due an upgrade.

## Decision

- `docs/hardware/reference.md` is the single, hand-maintained source of truth for
  physical compute and network gear, and for workload placement intent. Two
  machine-readable tables (node capacity, workload placement) carry the numbers.
  It also covers `ubongo`, the physical control node (ADR-015), even though
  `ubongo` sits outside the Proxmox cluster.
- `scripts/capacity-scan.py` (stdlib-only, like `repo-scan.py` and `tf_to_inventory.py`)
  parses those tables, computes per-node allocated-vs-physical rollups, and
  cross-checks workload hostnames against `terraform output -json` and
  `ansible-inventory --list` to surface drift.
- `/capacity-review` reads the scan output plus the intent columns and writes a
  dated report to `docs/hardware/reviews/YYYY-MM-DD-capacity.md`, also
  overwriting `docs/hardware/reviews/latest.md`. This mirrors `/review-repo`,
  which writes to `docs/reviews/`.
- Numeric allocations live in `reference.md`, not Terraform, because the current
  `terraform output` exposes only `{ip, group}`. Terraform and the inventory are
  used only for hostname-drift cross-checks.
- **Live usage stats are a future hook.** The cluster is not stood up yet, so
  `gather_usage()` returns `available: false` and the evaluator reasons from
  declared intent. The usage source (Proxmox RRD vs Prometheus/Loki/Grafana/Alloy)
  is undecided; see docs/TODO.md 8.4. It must be settled before any hook is
  built.
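Once the two tables are parsed, the per-node rollup is a group-by plus a division. This sketch assumes hypothetical column names (`node`, `cores`, `ram_gb` for node capacity; `node`, `vcpus`, `ram_gb` for workload placement), and `rollup` is an illustrative name; the real `reference.md` headers and script structure may differ. It also returns workloads placed on a node that has no capacity row, one simple form of drift.

```python
from collections import defaultdict


def rollup(
    nodes: list[dict[str, str]], workloads: list[dict[str, str]]
) -> tuple[dict[str, dict[str, float]], list[str]]:
    """Sum declared vCPU/RAM per node and compare with physical capacity.

    Column names are hypothetical; cells arrive as strings from a table parser.
    """
    allocated: dict[str, dict[str, float]] = defaultdict(
        lambda: {"vcpus": 0.0, "ram_gb": 0.0}
    )
    for w in workloads:
        allocated[w["node"]]["vcpus"] += float(w["vcpus"])
        allocated[w["node"]]["ram_gb"] += float(w["ram_gb"])

    report: dict[str, dict[str, float]] = {}
    for n in nodes:
        # .get() does not insert into the defaultdict, so idle nodes report zeros.
        alloc = allocated.get(n["node"], {"vcpus": 0.0, "ram_gb": 0.0})
        cores, ram = float(n["cores"]), float(n["ram_gb"])
        report[n["node"]] = {
            "vcpus": alloc["vcpus"],
            "cpu_overcommit": alloc["vcpus"] / cores,  # declared vCPUs per physical core
            "ram_gb": alloc["ram_gb"],
            "ram_pct": 100.0 * alloc["ram_gb"] / ram,  # share of physical RAM declared
        }
    # Workloads pointing at a node with no capacity row are drift, not headroom.
    unknown_nodes = sorted(set(allocated) - {n["node"] for n in nodes})
    return report, unknown_nodes
```

CPU and RAM are kept as separate figures because hypervisors commonly overcommit vCPUs but rarely RAM; which thresholds count as "well-proportioned" is left to the review.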
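The hostname cross-check reduces to set differences once both tools' JSON is loaded. `terraform output -json` wraps each output in an object with `sensitive`, `type` and `value` keys, and `ansible-inventory --list` returns groups carrying `hosts` lists plus a `_meta.hostvars` map. The output name `hosts` (mapping hostname to `{ip, group}`) is an assumption about this repo's Terraform, and all function names here are hypothetical.

```python
import json


def terraform_hosts(tf_output_json: str, output_name: str = "hosts") -> set[str]:
    # Each output is {"sensitive": ..., "type": ..., "value": ...}.
    # "hosts" is a hypothetical output whose value maps hostname -> {ip, group}.
    outputs = json.loads(tf_output_json)
    return set(outputs[output_name]["value"])


def inventory_hosts(inventory_json: str) -> set[str]:
    # ansible-inventory --list: groups carry "hosts" lists; _meta.hostvars is keyed by host.
    inventory = json.loads(inventory_json)
    hosts = set(inventory.get("_meta", {}).get("hostvars", {}))
    for name, group in inventory.items():
        if name != "_meta" and isinstance(group, dict):
            hosts.update(group.get("hosts", []))
    return hosts


def hostname_drift(placed: set[str], tf: set[str], inv: set[str]) -> dict[str, list[str]]:
    """Names present on one side only; all-empty lists mean no drift."""
    return {
        "placed_not_in_terraform": sorted(placed - tf),
        "terraform_not_placed": sorted(tf - placed),
        "placed_not_in_inventory": sorted(placed - inv),
    }
```

In practice the JSON strings would come from running the two commands, for example via `subprocess.run([...], capture_output=True, text=True, check=True).stdout`. Any host that legitimately lives outside Terraform, such as the physical control node `ubongo`, would need an explicit allowance if it appears in the placement table.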
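Writing the review amounts to a dated file plus an overwrite of `latest.md`. A minimal sketch, assuming the report body is already rendered to a Markdown string; `write_review` is a hypothetical name, and how `/capacity-review` actually writes its files may differ.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def write_review(report_md: str, reviews_dir: Path, today: Optional[date] = None) -> Path:
    """Write <reviews_dir>/YYYY-MM-DD-capacity.md and overwrite latest.md with the same text."""
    today = today or date.today()
    reviews_dir.mkdir(parents=True, exist_ok=True)
    dated = reviews_dir / f"{today.isoformat()}-capacity.md"
    dated.write_text(report_md, encoding="utf-8")
    # latest.md gets a full copy of the newest report (this sketch does not use a symlink).
    (reviews_dir / "latest.md").write_text(report_md, encoding="utf-8")
    return dated
```

Called as `write_review(text, Path("docs/hardware/reviews"), date(2026, 6, 1))`, it writes `docs/hardware/reviews/2026-06-01-capacity.md`. A second run on the same day overwrites that day's file, which keeps one report per date.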
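Until a usage source is chosen, the hook can stay a stub that always reports itself unavailable, which forces every consumer down the declared-intent path. Only `gather_usage()` and `available: false` come from this ADR; the `reason` field and the `advice_basis` helper are hypothetical.

```python
from typing import Any


def gather_usage() -> dict[str, Any]:
    """Live-usage hook: a stub until Proxmox RRD vs Prometheus/Loki/Grafana/Alloy is decided."""
    # No backend is queried: the cluster is not stood up yet (see docs/TODO.md 8.4).
    return {"available": False, "reason": "usage source undecided"}


def advice_basis(usage: dict[str, Any]) -> str:
    # Hypothetical helper: lets a report state what its right-sizing advice rests on.
    return "observed usage" if usage.get("available") else "declared intent"
```

Keeping the `available` flag in the return value, rather than raising, lets the evaluator run unchanged today and pick up real data once a hook replaces the stub.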
## Consequences

- Right-sizing advice is intent-based until usage data exists, and reports say so.
- `reference.md` table headers are a parser contract: changing them requires a
  matching `capacity-scan.py` change.
- Log storage (ADR-018) is a tracked allocation: the retention budget of the
  cluster's Loki host and `askari`'s security-subset volume belong in
  `reference.md`. SSD **wearout/TBW** is a monitored metric: logging is
  write-heavy, so wear is watched rather than assumed.
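One way to make the header contract fail loudly is to compare each table's header row with an expected column list before any rows are read. The table keys and column names below are hypothetical placeholders, not the real `reference.md` headers, and `check_header` / `HeaderContractError` are illustrative names.

```python
# Hypothetical column contracts; the real reference.md headers may differ.
EXPECTED_HEADERS: dict[str, list[str]] = {
    "node_capacity": ["node", "cores", "ram_gb", "disk_gb"],
    "workload_placement": ["host", "node", "vcpus", "ram_gb", "intent"],
}


class HeaderContractError(ValueError):
    """A reference.md table header no longer matches what the scanner expects."""


def check_header(table: str, header_line: str) -> list[str]:
    """Return the parsed header columns, or raise if they differ from the contract."""
    inner = header_line.strip().removeprefix("|").removesuffix("|")
    actual = [cell.strip() for cell in inner.split("|")]
    expected = EXPECTED_HEADERS[table]
    if actual != expected:
        missing = [c for c in expected if c not in actual]
        unexpected = [c for c in actual if c not in expected]
        raise HeaderContractError(
            f"{table}: header changed (missing={missing}, unexpected={unexpected}); "
            "update scripts/capacity-scan.py in the same change"
        )
    return actual
```

Requiring an exact, ordered match is stricter than a parser that keys rows by header name strictly needs; comparing sets instead would tolerate reordered columns while still catching renames.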
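Watching wear rather than assuming it comes down to simple arithmetic on the drive's reported writes. A sketch with hypothetical figures (a 600 TBW rating, 150 TB already written, 50 GB/day of log writes); the NVMe conversion follows the convention that one SMART "Data Unit" is 1,000 × 512 bytes, which is the unit `smartctl -a` reports as "Data Units Written" on NVMe drives. The function names are illustrative.

```python
def nvme_units_to_tb(data_units: int) -> float:
    # NVMe SMART "Data Units Written" counts thousands of 512-byte units.
    return data_units * 512_000 / 1e12


def wear_projection(tb_written: float, rated_tbw: float, gb_per_day: float) -> tuple[float, float]:
    """Return (% of rated TBW consumed, days until the rating is reached at this rate).

    Uses decimal units throughout: 1 TB = 1000 GB.
    """
    used_pct = 100.0 * tb_written / rated_tbw
    days_left = (rated_tbw - tb_written) * 1000.0 / gb_per_day
    return used_pct, days_left
```

With the hypothetical figures, `wear_projection(150, 600, 50)` gives 25% of the rating used and 9,000 days (about 24.6 years) to reach it. Monitoring matters because a retention or log-volume change moves `gb_per_day`, and the projection shrinks proportionally.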
## Related

ADR-001 (architecture), ADR-007 (network), ADR-009 (TF ↔ Ansible handoff).