boma/docs/hardware/reference.md
sjat 4732730515 docs: wire ADR-025 into testing/control-host/risks/status/capacity

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 12:51:22 +02:00


Hardware reference — boma

Hand-maintained source of truth for physical compute + network gear and workload placement intent. The two machine-readable tables (Node capacity, Workload placement) are parsed by scripts/capacity-scan.py — keep their headers intact. Evaluated by /capacity-review. See ADR-012.

Status: skeleton. Replace example rows with real hardware once the cluster is stood up (STATUS.md tracks real-vs-planned).

1. Physical compute

pve0

  • Model / form factor: TBD (e.g. Minisforum MS-01, mini-PC)
  • CPU: TBD (e.g. i9-13900H, 14C/20T)
  • RAM: TBD total; max TBD; free DIMM slots TBD
  • Storage: TBD (disks → pools, e.g. 2× 2 TB NVMe → local-zfs)
  • NICs: eno1 trunk (vmbr0), eno2 corosync (vmbr1)
  • Notes: warranty, quirks

ubongo (control node — outside the cluster)

  • Model / form factor: Lenovo ThinkCentre M70q Tiny (machine type 11DUS7XP00); 1-litre tiny/USFF
  • CPU: Intel Core i3-10100T — 4 cores / 8 threads, 35 W TDP
  • RAM: 16 GB DDR4-3200 (2×8 GB SODIMM)
  • Storage: 256 GB SanDisk X600 SATA 2.5" SSD (model SD9TB8W256G1001; TCG Opal-capable, Opal unused — no disk encryption)
  • NICs: wired GbE, interface eno1, MAC 88:a4:c2:e0:ee:da
  • BIOS: Lenovo M2WKT5AA (2023-06-20)
  • Notes: always-on; control plane + AI-worker (dedicated claude user) + local test runner (Molecule/Docker) per ADR-015; not a Proxmox guest; remote access currently LAN SSH only (mesh deferred). Also runs one ephemeral KVM integration test VM (~3 GiB RAM) at a time per ADR-025; the resource guard enforces the one-at-a-time limit. Do not run a test-integration cycle alongside a heavy Level-4 browser session (Chromium/Playwright), since both compete for the same 16 GB of RAM.
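To make the one-at-a-time rule concrete, here is a minimal sketch of how such a guard could work using an advisory file lock. It is not the resource guard from ADR-025, whose implementation is not shown here; the `single_test_vm` helper and the lock path are hypothetical, and the sketch assumes a Linux host where `fcntl.flock` is available.

```python
import fcntl
from contextlib import contextmanager


@contextmanager
def single_test_vm(lock_path="/tmp/boma-test-vm.lock"):  # hypothetical path
    """Hold an exclusive advisory lock for the duration of one test-VM cycle."""
    with open(lock_path, "w") as handle:
        try:
            # LOCK_NB makes the call fail immediately instead of waiting
            # when another holder already has the lock.
            fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            raise RuntimeError("another integration test VM is already running") from None
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)
```

A test cycle would wrap its VM lifetime in `with single_test_vm(): ...`; a second cycle started in the meantime fails fast instead of competing for RAM.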

fisi (backup node — outside the cluster; provisional)

  • Model / form factor: HP Elite 600 G9 (tower)
  • CPU: i-series (12th-gen), x86-64 — featherweight for a data-only restic node
  • RAM: 16 GB+ (TBD exact)
  • Storage: OS NVMe + 2× 8 TB HDD in a mirror (ZFS or mdraid → 8 TB usable, survives one disk failure)
  • NICs: wired GbE
  • Notes: off-cluster pull backup node (ADR-022); owns the restic repo, runs rclone→pCloud, docks the rotated USB air-gap drives. Pending: SATA power cable to the HDDs. Crown-jewel host → full base hardening. Assignment provisional (revisit when all hardware on hand).

(repeat for pve1, pve2, askari)

2. Network gear

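| device | model | ports | poe | throughput | uplinks | notes |
|--------|-------|-------|-----|------------|---------|-------|
| opnsense | TBD | TBD | n/a | TBD | WAN+LAN | dedicated hardware |
| switch | TBD | TBD | TBD | TBD | trunk | managed, 802.1q |
| ap1 | TBD | TBD | TBD | TBD | trunk | multi-SSID per VLAN |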

3. Workload placement & intent

The numeric columns (cores, ram_mb, disk_gb) feed capacity-scan.py; the free-text columns feed /capacity-review's judgement.

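| workload | node | cores | ram_mb | disk_gb | criticality | ha_intent | profile | constraints | growth |
|----------|------|-------|--------|---------|-------------|-----------|---------|-------------|--------|
| dns1 | pve0 | 1 | 512 | 10 | high | pair/dns2 | tiny/steady | anti-affinity: dns2 on a different node | flat |
| dns2 | pve1 | 1 | 512 | 10 | high | pair/dns1 | tiny/steady | anti-affinity: dns1 on a different node | flat |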
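As a rough illustration of how the numeric columns and the `ha_intent` column can be consumed, the sketch below sums requested cores, RAM, and disk per node and flags paired workloads that share a node. It is not the actual scripts/capacity-scan.py; `requested_per_node` and `anti_affinity_violations` are hypothetical names, and the sketch assumes that an `ha_intent` of the form `pair/<workload>` means the named peer must sit on a different node.

```python
from collections import defaultdict

# Rows copied from the placement table above:
# (workload, node, cores, ram_mb, disk_gb, ha_intent)
PLACEMENT = [
    ("dns1", "pve0", 1, 512, 10, "pair/dns2"),
    ("dns2", "pve1", 1, 512, 10, "pair/dns1"),
]


def requested_per_node(rows):
    """Sum the numeric columns for every node that hosts a workload."""
    totals = defaultdict(lambda: {"cores": 0, "ram_mb": 0, "disk_gb": 0})
    for _workload, node, cores, ram_mb, disk_gb, _ha_intent in rows:
        totals[node]["cores"] += cores
        totals[node]["ram_mb"] += ram_mb
        totals[node]["disk_gb"] += disk_gb
    return dict(totals)


def anti_affinity_violations(rows):
    """Return sorted (workload, peer) pairs that are placed on the same node."""
    node_of = {row[0]: row[1] for row in rows}
    violations = set()
    for workload, node, *_numbers, ha_intent in rows:
        if not ha_intent.startswith("pair/"):
            continue  # only pair/<peer> intents are checked in this sketch
        peer = ha_intent.split("/", 1)[1]
        if node_of.get(peer) == node:
            violations.add(tuple(sorted((workload, peer))))
    return sorted(violations)
```

With the two example rows, each node carries one DNS workload and no violation is reported; moving dns2 onto pve0 would surface the pair.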

4. Node capacity (machine-readable)

Physical totals per node. cores is an integer; ram_gb and disk_gb may be decimals.

| node | cores | ram_gb | disk_gb |
|------|-------|--------|---------|
| pve0 | 20 | 64 | 4000 |
| pve1 | 20 | 64 | 4000 |
| ubongo | 4 | 16 | 250 |
| fisi | 4 | 16 | 8000 |
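The sketch below shows why the table headers need to stay intact: a parser that locates a table by its exact header row stops finding it as soon as a column is renamed. It is an illustration only, not the real scripts/capacity-scan.py. It assumes the tables are Markdown pipe tables in the source file and that 1024 MB of `ram_mb` equals 1 GB of `ram_gb`; `parse_pipe_table` and `ram_headroom_gb` are hypothetical names.

```python
SAMPLE = """\
| node | cores | ram_gb | disk_gb |
|------|-------|--------|---------|
| pve0 | 20 | 64 | 4000 |
| ubongo | 4 | 16 | 250 |
"""


def parse_pipe_table(markdown, header):
    """Return the rows (as dicts) of the first pipe table whose header equals `header`."""
    def cells(line):
        return [cell.strip() for cell in line.strip().strip("|").split("|")]

    lines = markdown.splitlines()
    for index, line in enumerate(lines):
        if line.lstrip().startswith("|") and cells(line) == header:
            rows = []
            for body in lines[index + 2:]:  # skip the |---| separator line
                if not body.lstrip().startswith("|"):
                    break
                rows.append(dict(zip(header, cells(body))))
            return rows
    raise ValueError(f"no table with header {header!r}")


def ram_headroom_gb(capacity_rows, requested_ram_mb):
    """Physical RAM minus requested RAM per node, in GB (assumes 1024 MB per GB)."""
    return {
        row["node"]: float(row["ram_gb"]) - requested_ram_mb.get(row["node"], 0) / 1024
        for row in capacity_rows
    }


capacity = parse_pipe_table(SAMPLE, ["node", "cores", "ram_gb", "disk_gb"])
headroom = ram_headroom_gb(capacity, {"pve0": 512})
```

Matching on the full header row keeps the lookup independent of section order, at the cost of breaking loudly (a `ValueError` here) when a header changes.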

5. Capacity notes

Free-text running notes for the evaluator (trends, planned moves, upgrade ideas).