sjat/boma

sjat cc772ff845 docs(adr/security): record claude NOPASSWD sudo model (ADR-015 amend + R7)

The integration-testing shakedown reversed ADR-015's "no local sudo" sub-decision:
the claude AI-worker now has NOPASSWD:ALL sudo on ubongo — without it, virsh,
nft, and journalctl all block during VM diagnosis. Compensating controls:
password-locked account, auditd/Loki attribution, repo-managed revocable drop-in.

ADR-015: dated amendment note in Status + expanded AI-worker identity section.
ADR-021: new §Sudo model (amendment 2026-06-18) — claude=NOPASSWD, sjat=password
required; former sjat NOPASSWD drop-in removed 2026-06-18 (least-privilege cleanup).
accepted-risks.md: R7 added (claude NOPASSWD:ALL on ubongo); last-reviewed updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-18 21:39:20 +02:00

6.4 KiB

Raw Blame History

Accepted security risks

Conscious security trade-offs we are choosing to live with — recorded so "what we are not doing" is explicit and revisitable, not forgotten. This register is a living document, deliberately kept out of ADR-002 (which records durable decisions) so the ADR stays stable.

Owned by ADR-002 (Security baseline and strategy). Re-challenged during the periodic security review (planned /security-review; see docs/TODO.md).

Each entry: the risk · why we accept it (rationale) · what would make us revisit (trigger).

#	Accepted risk	Rationale	Revisit trigger
R1	Active supply-chain scanning deferred — baseline hygiene is required (tiered image pinning per ADR-011 — stateful `tag@digest`, stateless rolling — prefer official/verified images; gitleaks), but images and dependencies are not actively vulnerability-scanned (Trivy/Grype) or signature-verified	Scanning only pays off with the capacity to triage its output; the realistic threat is opportunistic, not a targeted supply-chain attack	A monitoring/triage stack is live; hosting high-value data/finances for others; a relevant upstream compromise
R2	SELinux not used — no SELinux mandatory access control	AppArmor — Debian-native and enforced via the CIS baseline — already provides MAC; adding SELinux means two MAC systems, non-native to Debian, for no real gain	A service that ships and requires its own SELinux policy; threat model shifts toward targeted attackers
R3	Self-hosted mesh control plane is a public target on `askari` — the NetBird coordinator (ADR-016) exposes a management API + dashboard (TCP 80/443) and STUN (UDP 3478) on `askari`'s public IP; the management API controls the whole mesh (NetBird v0.72.4 embeds STUN in the combined server — no separate Coturn)	Self-hosting means no third-party trust and an off-site control plane that survives a homelab outage (boma's sovereignty ethos). Residual surface is on `askari` (already a public VPS) and is mitigated: TLS + embedded-IdP login, source-IP restriction where practical, `base` hardening, version-pinned NetBird (ADR-011) patched on boma's cadence	A coordinator compromise or unpatched NetBird CVE; the management plane is reachable without auth/IP-limits; the operational burden makes a hosted coordinator worth reconsidering
R4	No cryptographic WORM for logs — shipped logs are append-only via Loki's push API and copied off-site to `askari` (ADR-018), but the stored chunks are not object-locked/immutable; a root-on-`askari` attacker could edit history	Append-only push + off-site copy already defeats the realistic threat (a host attacker covering tracks survives even full-cluster compromise). True WORM (object-lock) is forensic-grade cost for boma's opportunistic threat model (R1)	Threat model shifts toward targeted/forensic; a regulatory/evidentiary need appears; `askari` itself is assessed as a likely target
R5	No disk encryption on `ubongo` — the control node's SSD (SanDisk X600 256 GB, TCG-Opal-capable but Opal unused) is unencrypted at rest, so it holds recovery-critical secrets in plaintext: the Ansible Vault password's `rbw` local cache and (future) Terraform state. Physical theft of the box would expose them	`ubongo` is always-on in a physically controlled location; compensating controls are a BIOS supervisor password and disabled external/USB + PXE boot (an attacker cannot trivially boot another OS to read the disk), and the offline-recoverable design means the irreducible root secret (Vaultwarden master password) is never stored on the box anyway. Full-disk encryption was weighed against the always-on/unattended-reboot requirement (LUKS+TPM auto-unlock or passphrase) and deferred for simplicity at this trust level	`ubongo` is relocated to a less-trusted physical location; the box starts holding additional high-value secrets; or a reinstall onto LUKS (TPM-sealed) is undertaken
R6	`le-prod-wildcard` integration runs — when `CERTS=le-prod-wildcard` is passed to `make test-integration`, the production Gandi PAT (`vault.gandi.pat`) is passed to an ephemeral local test VM via the var overlay, and transient `_acme-challenge` TXT records are written into the real `wingu.me` DNS zone to satisfy the Let's Encrypt DNS-01 challenge. A compromised or long-lived test VM could exfiltrate the PAT; the real zone is briefly (seconds) modified	Scope is on-demand only — `le-staging` is the default cert tier (`CERTS=internal` for incident repro); `le-prod-wildcard` is an explicit opt-in. Compensating controls: the VM is ephemeral and destroyed on success; it sits on an isolated libvirt NAT network (no LAN/mesh access); TXT records are auto-removed by Caddy immediately after validation; the PAT is not persisted inside the VM after the run. ADR-025 documents the cert-tier design and the three isolation invariants	The PAT is exfiltrated from a test VM; the `wingu.me` zone shows unexpected records; a `CERTS=le-prod-wildcard` run must be audited or the tier must be revoked
R7	`claude` AI-worker has `NOPASSWD:ALL` sudo on `ubongo` — the automated AI-worker account can execute any command as root on the control node without a password prompt. A compromised or misbehaving agent session could make arbitrary root-level changes to ubongo	The account is password-locked (no interactive `claude` login; `NOPASSWD` sudo is the account's only escalation path, so there is no "su to claude + sudo" attack). `auditd` + Loki attribution (ADR-018) logs every `sudo` invocation with the originating user. The drop-in (`/etc/sudoers.d/claude-ai-worker`) is repo-managed via `base__ai_worker_user` — revocable in one commit + one deploy. Single-operator homelab; all changes in git; off-machine backups (ADR-022). Full rationale: ADR-015 amendment (2026-06-18) + ADR-021 §Sudo model.	The AI-worker executes a destructive action that cannot be rolled back via git; the account key is compromised; the threat model shifts toward targeted remote attackers

Last reviewed: 2026-06-18. The prior gaps (full CIS hardening, SELinux/AppArmor, IDS) were re-challenged and adopted rather than accepted: CIS Debian L1+L2 + CIS Docker, AppArmor (enforce), AIDE file-integrity, and Suricata network IDS are now part of the security strategy (ADR-002). See STATUS.md / docs/TODO.md for build status. As CIS is implemented, any specific item that proves impractical is added here as a named exception.

6.4 KiB Raw Blame History

Accepted security risks

6.4 KiB

Raw Blame History