docs(adr/security): record claude NOPASSWD sudo model (ADR-015 amend + R7)

The integration-testing shakedown reversed ADR-015's "no local sudo" sub-decision:
the claude AI-worker now has NOPASSWD:ALL sudo on ubongo — without it, virsh,
nft, and journalctl all block during VM diagnosis. Compensating controls:
password-locked account, auditd/Loki attribution, repo-managed revocable drop-in.

ADR-015: dated amendment note in Status + expanded AI-worker identity section.
ADR-021: new §Sudo model (amendment 2026-06-18) — claude=NOPASSWD, sjat=password
required; former sjat NOPASSWD drop-in removed 2026-06-18 (least-privilege cleanup).
accepted-risks.md: R7 added (claude NOPASSWD:ALL on ubongo); last-reviewed updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
sjat 2026-06-18 21:39:20 +02:00
parent 3fe6f68316
commit cc772ff845
3 changed files with 66 additions and 9 deletions

View file

@ -2,7 +2,10 @@
## Status
Accepted (2026-06-05)
Accepted (2026-06-05). **Amended 2026-06-18:** the `claude` AI-worker account now has
`NOPASSWD:ALL` sudo on `ubongo` — reversing the original "no local sudo" sub-decision.
The amendment is recorded in §Access & security below; rationale and accepted risk are
in ADR-021 and `docs/security/accepted-risks.md` (R7).
## Context
@ -88,12 +91,33 @@ Manual, on bare metal:
only** — key-only, with password auth and root login disabled — until the NetBird mesh
(ADR-016) is stood up.
- **AI-worker identity:** `ubongo` runs the AI worker under a dedicated,
password-locked `claude` user (in the `docker` group for Molecule; **no local sudo**
boma deploys reach the fleet over SSH as the `ansible` user, not via local root). It is
reached via `sudo -iu claude` or its own SSH key. The rationale is **attribution +
revocation, not containment**: auditd/Loki (ADR-018) can separate human from agent
actions, and the account/key can be revoked without touching the operator's access.
(ADR-021 left the on-`ubongo` agent identity unspecified; this records it.)
password-locked `claude` user (in the `docker` and `libvirt` groups; **`NOPASSWD:ALL`
sudo** via a repo-managed drop-in — see amendment below). It is reached via `sudo -iu
claude` or its own SSH key. The rationale is **attribution + revocation, not
containment**: auditd/Loki (ADR-018) can separate human from agent actions, and the
account/key can be revoked without touching the operator's access. (ADR-021 left the
on-`ubongo` agent identity unspecified; this records it.)
**Amendment (2026-06-18) — `claude` now has `NOPASSWD:ALL` sudo.** During the
integration-testing harness shakedown, the original "no local sudo" sub-decision was
reversed. No-sudo blocked the AI-worker from diagnosing a failed VM: `virsh`,
`virt-install`, `cloud-localds`, `journalctl`, `nft` — nearly all low-level
diagnostic commands — require root. The AI-worker must autonomously spin up,
inspect, and tear down test VMs without operator hand-holding; that is the harness's
core value proposition. Compensating controls make the risk acceptable:
1. `claude`'s password is **locked** (no interactive login, no `su claude` without the
operator's own credentials) — `NOPASSWD` sudo is the *only* sudo path.
2. `auditd` + Loki attribution (ADR-018) separates human from agent root actions.
3. The drop-in is **repo-managed** via `base__ai_worker_user` — revocable in one commit
and one deploy.
4. Single-operator homelab: everything in git, off-machine backups (ADR-022).
The operator (`sjat`) uses **password-required sudo** via the `sudo` group; their
former `NOPASSWD` drop-in was removed 2026-06-18 as redundant once `claude` had sudo
(least-privilege cleanup). The accepted risk is registered as R7 in
`docs/security/accepted-risks.md`. ADR-021 records the resulting sudo model for both
accounts.
- **Disk encryption:** `ubongo`'s SSD is **not encrypted at rest** — the SanDisk X600 is
TCG-Opal-capable but Opal is unused. This is an accepted risk recorded in
`docs/security/accepted-risks.md` (control-node disk not encrypted at rest),

View file

@ -3,7 +3,9 @@
## Status
Accepted (2026-06-09). Resolves TODO 7.2 (what to set up on hosts given direct access
will be rare) and TODO 3.2 (the service admin-API access question).
will be rare) and TODO 3.2 (the service admin-API access question). **Amended
2026-06-18:** the on-`ubongo` sudo model for the two local accounts is now settled
(see §Sudo model on `ubongo` below).
**Doctrine ADR.** It pins the operational-access doctrine, the declarative `access__*`
data model, the rendered `ACCESS.md` record, and the `/check-access` verifier. It does
@ -163,6 +165,36 @@ exists and `/check-access` is green (or a deviation is recorded in `accepted-ris
No scaffold change — same manual-copy-plus-review pattern the sibling records
(`SECURITY.md`/`VERIFY.md`) use.
### Sudo model on `ubongo` (amendment 2026-06-18)
The original ADR left on-`ubongo` local sudo unspecified. The integration-testing
harness shakedown settled it:
| Account | Role | Sudo |
|---|---|---|
| `claude` | Automated AI-worker | `NOPASSWD:ALL` via repo-managed drop-in (`base__ai_worker_user`) |
| `sjat` | Human operator | Password-required sudo via the `sudo` group |
**Rationale for `claude NOPASSWD`.** No-sudo blocked the AI-worker from diagnosing a
failed test VM: `virsh`, `virt-install`, `cloud-localds`, `nft`, `journalctl`
almost every low-level diagnostic tool — require root. The harness's core value is
autonomous spin-up → apply → reboot → assert → diagnose; that loop collapses without
local root access.
**Compensating controls (R7 in `docs/security/accepted-risks.md`):**
- `claude`'s password is locked — `NOPASSWD` is the account's *only* sudo path; no
interactive login is possible.
- `auditd` + Loki attribution (ADR-018) separates human from agent root actions in the
audit trail.
- The drop-in is repo-managed and revocable in one commit + one deploy.
- Single-operator homelab; everything in git; off-machine backups (ADR-022).
**`sjat` NOPASSWD removed.** The operator's former `NOPASSWD` drop-in
(`/etc/sudoers.d/sjat-ansible`, added as an interim measure during M5 NetBird
enrolment) was removed 2026-06-18. It was redundant once `claude` held sudo, and its
removal restores least-privilege for the human operator. `sjat` retains full sudo
capability via the `sudo` group (password required).
## Consequences
- Every host and service has at least one documented, verifiable way in — and a verifier

View file

@ -19,8 +19,9 @@ revisit (trigger).
| R4 | **No cryptographic WORM for logs** — shipped logs are append-only via Loki's push API and copied off-site to `askari` (ADR-018), but the stored chunks are not object-locked/immutable; a root-on-`askari` attacker could edit history | Append-only push + off-site copy already defeats the realistic threat (a host attacker covering tracks survives even full-cluster compromise). True WORM (object-lock) is forensic-grade cost for boma's opportunistic threat model (R1) | Threat model shifts toward targeted/forensic; a regulatory/evidentiary need appears; `askari` itself is assessed as a likely target |
| R5 | **No disk encryption on `ubongo`** — the control node's SSD (SanDisk X600 256 GB, TCG-Opal-capable but Opal unused) is unencrypted at rest, so it holds recovery-critical secrets in plaintext: the Ansible Vault password's `rbw` local cache and (future) Terraform state. Physical theft of the box would expose them | `ubongo` is always-on in a physically controlled location; compensating controls are a **BIOS supervisor password** and **disabled external/USB + PXE boot** (an attacker cannot trivially boot another OS to read the disk), and the offline-recoverable design means the irreducible root secret (Vaultwarden master password) is never stored on the box anyway. Full-disk encryption was weighed against the always-on/unattended-reboot requirement (LUKS+TPM auto-unlock or passphrase) and deferred for simplicity at this trust level | `ubongo` is relocated to a less-trusted physical location; the box starts holding additional high-value secrets; or a reinstall onto LUKS (TPM-sealed) is undertaken |
| R6 | **`le-prod-wildcard` integration runs** — when `CERTS=le-prod-wildcard` is passed to `make test-integration`, the production Gandi PAT (`vault.gandi.pat`) is passed to an ephemeral local test VM via the var overlay, and transient `_acme-challenge` TXT records are written into the real `wingu.me` DNS zone to satisfy the Let's Encrypt DNS-01 challenge. A compromised or long-lived test VM could exfiltrate the PAT; the real zone is briefly (seconds) modified | Scope is **on-demand only**`le-staging` is the default cert tier (`CERTS=internal` for incident repro); `le-prod-wildcard` is an explicit opt-in. Compensating controls: the VM is ephemeral and destroyed on success; it sits on an isolated libvirt NAT network (no LAN/mesh access); TXT records are auto-removed by Caddy immediately after validation; the PAT is not persisted inside the VM after the run. ADR-025 documents the cert-tier design and the three isolation invariants | The PAT is exfiltrated from a test VM; the `wingu.me` zone shows unexpected records; a `CERTS=le-prod-wildcard` run must be audited or the tier must be revoked |
| R7 | **`claude` AI-worker has `NOPASSWD:ALL` sudo on `ubongo`** — the automated AI-worker account can execute any command as root on the control node without a password prompt. A compromised or misbehaving agent session could make arbitrary root-level changes to ubongo | The account is **password-locked** (no interactive `claude` login; `NOPASSWD` sudo is the account's only escalation path, so there is no "su to claude + sudo" attack). `auditd` + Loki attribution (ADR-018) logs every `sudo` invocation with the originating user. The drop-in (`/etc/sudoers.d/claude-ai-worker`) is repo-managed via `base__ai_worker_user` — revocable in one commit + one deploy. Single-operator homelab; all changes in git; off-machine backups (ADR-022). Full rationale: ADR-015 amendment (2026-06-18) + ADR-021 §Sudo model. | The AI-worker executes a destructive action that cannot be rolled back via git; the account key is compromised; the threat model shifts toward targeted remote attackers |
_Last reviewed: 2026-06-11. The prior gaps (full CIS hardening, SELinux/AppArmor,
_Last reviewed: 2026-06-18. The prior gaps (full CIS hardening, SELinux/AppArmor,
IDS) were re-challenged and **adopted rather than accepted**: CIS Debian L1+L2 + CIS
Docker, AppArmor (enforce), AIDE file-integrity, and Suricata network IDS are now
part of the security strategy (ADR-002). See STATUS.md / `docs/TODO.md` for build