diff --git a/docs/decisions/001-architecture.md b/docs/decisions/001-architecture.md
index adc3dbc..f317a79 100644
--- a/docs/decisions/001-architecture.md
+++ b/docs/decisions/001-architecture.md
@@ -1,5 +1,9 @@
 # ADR-001 — Architecture overview
 
+## Status
+
+Accepted (2026-05-30)
+
 ## Context
 
 This document describes the overall architecture of the homelab infrastructure
@@ -65,3 +69,21 @@ This architecture prioritises:
 - **Simplicity**: few moving parts, no orchestration layer (no Kubernetes, no Swarm)
 - **Reproducibility**: any host can be rebuilt from scratch via Ansible
 - **Legibility**: a human reading the repo can understand what runs where
+
+## Consequences
+
+Drawn from the boundaries this ADR already states:
+
+- The small fleet (2–5 VMs) is treated as individuals, not cattle (per Infrastructure),
+  and forgoing an orchestration layer is the cost of the simplicity priority (per
+  Decision).
+- The control node `ubongo` cannot be created by the Terraform it hosts, so it is
+  provisioned manually — the one documented exception to Terraform-owned VM existence
+  (per Infrastructure / Host groups; ADR-009, ADR-015).
+- Management scope is deliberately bounded: Proxmox configuration itself (storage,
+  clustering, networking) is out of scope, and the `control` group never runs the
+  `docker_host` role (per Host groups).
+- Compose files are always regenerated by Ansible on deploy; no hand-edited Compose
+  files exist on hosts (per Service interaction model).
+- The "What this repo manages" table describes the *intended* design — STATUS.md
+  records what is actually built (per that section).
diff --git a/docs/decisions/002-security.md b/docs/decisions/002-security.md
index 3c57674..5249e50 100644
--- a/docs/decisions/002-security.md
+++ b/docs/decisions/002-security.md
@@ -1,5 +1,9 @@
 # ADR-002 — Security baseline and strategy
 
+## Status
+
+Accepted (2026-05-30)
+
 ## Context
 
 Security here is not a single control but the sum of several combined efforts —
@@ -183,3 +187,27 @@ This posture was chosen to be:
 Out-of-scope items and conscious trade-offs are recorded in
 `docs/security/accepted-risks.md` rather than here, so this decision record stays
 stable while the risk posture evolves.
+
+## Consequences
+
+Drawn from the trade-offs, scoping, and follow-on work this ADR already states:
+
+- Targeted/physical adversaries are out of scope at this scale, and supply chain is
+  consciously deprioritized — active vuln scanning is deferred as an accepted risk
+  (per Threat model; `docs/security/accepted-risks.md`).
+- SELinux is not used (non-native to Debian, redundant with AppArmor), recorded as an
+  accepted risk (per Mandatory access control).
+- Some CIS L2 items require separate partitions with restrictive mount options, which
+  reaches into VM disk layout — a provisioning concern (Terraform / cloud-init, ADR-006),
+  not just the `base` role (per Hardening standard). Any impractical CIS item is exempted
+  into the accepted-risk register with rationale, recording named exceptions rather than a
+  blanket opt-out.
+- Several controls and governance mechanisms are stated as planned, not yet built:
+  Suricata network IDS, active alerting wiring AIDE/`auditd`/`fail2ban`/Suricata plus
+  log-source-silence into Grafana, the `/security-review` skill and its aggregation of
+  every `roles/*/SECURITY.md`, and the periodic security review (per File integrity /
+  Governance; STATUS.md / `docs/TODO.md`).
+- The per-service security bar is enforced manually in review today, pending the planned
+  `/security-review` automation (per Governance).
+- The accepted-risk register is kept out of this ADR so the record stays stable while the
+  risk posture evolves (per Decision; `docs/security/accepted-risks.md`).
diff --git a/docs/decisions/004-docker-model.md b/docs/decisions/004-docker-model.md
index e1cd147..4bab880 100644
--- a/docs/decisions/004-docker-model.md
+++ b/docs/decisions/004-docker-model.md
@@ -1,5 +1,9 @@
 # ADR-004 — Docker and Compose service model
 
+## Status
+
+Accepted (2026-05-30)
+
 ## Context
 
 All services run as Docker containers managed via Docker Compose. This document
@@ -107,3 +111,22 @@ Docker Compose was chosen over Kubernetes/Swarm because:
 - Compose files are human-readable and easily auditable
 - No distributed state to manage
 - Straightforward to back up and restore
+
+## Consequences
+
+Drawn from the trade-offs and deferred items this ADR already states:
+
+- A shared `compose_service` engine role is intentionally not built: the ~5 standard
+  tasks are duplicated per role in favour of legible, self-contained roles, with a stated
+  revisit trigger — extract a shared engine if maintaining the duplicated mechanics
+  becomes painful (a pattern change touching many roles, or drift this standard alone
+  isn't preventing) (per "Why not a shared engine").
+- Forgoing Kubernetes/Swarm is the deliberate cost of matching complexity to a 2–5 host
+  fleet with no distributed state to manage (per Decision).
+- User-namespace remapping is not enabled by default — evaluated per use case (per Docker
+  daemon configuration).
+- Bare `latest` is acceptable only on the stateless tier; the stateful tier is always
+  pinned `tag@digest`, and image updates are a deliberate operation (per Image management;
+  ADR-011).
+- Backup strategy is stated as defined separately, not in scope of this ADR (per Persistent
+  data).
diff --git a/docs/decisions/005-bootstrapping.md b/docs/decisions/005-bootstrapping.md
index b91a85c..f205e71 100644
--- a/docs/decisions/005-bootstrapping.md
+++ b/docs/decisions/005-bootstrapping.md
@@ -1,5 +1,9 @@
 # ADR-005 — Host bootstrapping
 
+## Status
+
+Accepted (2026-05-30)
+
 ## Context
 
 This document defines the **cloud-init template** that managed VMs are cloned
@@ -81,3 +85,19 @@ Cloud-init with Proxmox templates provides:
 - No manual installer interaction
 - A clean handoff point to Ansible
 - Easy rebuilds — destroy VM, clone template, run Ansible
+
+## Consequences
+
+Drawn from the trade-offs and special cases this ADR already states:
+
+- The cloud-init image was chosen over a manual Debian installer (slow, error-prone,
+  not reproducible) and over preseed/netboot (powerful but complex to maintain) (per
+  Approach).
+- Template creation is a one-time manual procedure per Proxmox cluster, and the template
+  is never booted directly (per Template creation).
+- There is no manual `qm clone` path for managed hosts; the full create → inventory →
+  configure pipeline and the Terraform↔Ansible contract live in ADR-009 (per VM
+  provisioning / Ansible handoff).
+- The control node is the sole documented exception — `ubongo`, a physical machine
+  installed by hand because it cannot be created by the Terraform it hosts (chicken-and-egg);
+  its hardware target and recovery model live in ADR-015 (per Control node bootstrapping).
diff --git a/docs/decisions/012-hardware-capacity.md b/docs/decisions/012-hardware-capacity.md
index 2d0cb04..d760de3 100644
--- a/docs/decisions/012-hardware-capacity.md
+++ b/docs/decisions/012-hardware-capacity.md
@@ -1,5 +1,9 @@
 # ADR-012 — Hardware reference & capacity evaluation
 
+## Status
+
+Accepted (2026-06-01)
+
 ## Context
 
 The repo modelled the logical/network layer (Terraform VM specs, ADR-007
diff --git a/docs/decisions/014-knowledge-sourcing.md b/docs/decisions/014-knowledge-sourcing.md
index 4d1f91d..e92f036 100644
--- a/docs/decisions/014-knowledge-sourcing.md
+++ b/docs/decisions/014-knowledge-sourcing.md
@@ -1,5 +1,9 @@
 # ADR-014 — Sourcing technical knowledge (docs and best practices)
 
+## Status
+
+Accepted (2026-06-04)
+
 ## Context
 
 Most work in boma is done by AI agents drawing on training memory, which is stale
@@ -100,5 +104,27 @@ above keeps the policy working.
 - Commit to the principle, not a tool — degrade to `WebFetch`/`WebSearch` when
   plugins are absent.
 
-See also: ADR-013 (heritage / translate-don't-transplant), ADR-011 (version pinning),
-ADR-008 (testing/verification).
+## Consequences
+
+Drawn from the follow-on work and limitations this ADR already states:
+
+- Verified facts carry a durable, greppable stamp; a stamp binds a fact to a pinned
+  version, so a `requirements` change or image upgrade marks exactly what to re-check
+  (per Capture / Re-verification).
+- Stale-stamp detection — a `/review-repo` or `/security-review` check flagging stamps
+  whose recorded version no longer matches what is pinned — is a noted enhancement, not
+  built yet (per Re-verification).
+- Any version-specific claim given from memory must be marked "from memory, unverified"
+  as a transparency backstop, since agent self-assessed certainty is unreliable (per
+  When consulting is required).
+- The policy commits to the principle rather than a specific plugin, so it degrades to
+  `WebFetch`/`WebSearch` on a bare install; reproducing the plugin toolchain from the
+  repo is done via `.claude/settings.json` and `docs/runbooks/claude-code-setup.md`,
+  with the graceful-degradation fallback covering a fresh clone until bootstrap runs
+  (per Source hierarchy / Reproducibility of the toolchain).
+
+## Related
+
+- ADR-013 — heritage / translate-don't-transplant.
+- ADR-011 — version pinning.
+- ADR-008 — testing / verification.
diff --git a/docs/decisions/015-control-host.md b/docs/decisions/015-control-host.md
index 3ebeb49..58a1cf3 100644
--- a/docs/decisions/015-control-host.md
+++ b/docs/decisions/015-control-host.md
@@ -1,5 +1,9 @@
 # ADR-015 — Control / development / AI-worker host (`ubongo`)
 
+## Status
+
+Accepted (2026-06-05)
+
 ## Context
 
 Earlier ADRs framed the control node — the host that runs Terraform and Ansible —