docs(adr/security): record claude NOPASSWD sudo model (ADR-015 amend + R7)
The integration-testing shakedown reversed ADR-015's "no local sudo" sub-decision:
the claude AI-worker now has NOPASSWD:ALL sudo on ubongo. Without it, virsh, nft, and
journalctl all fail for lack of root during VM diagnosis. Compensating controls:
password-locked account, auditd/Loki attribution, repo-managed revocable drop-in.

ADR-015: dated amendment note in Status + expanded AI-worker identity section.
ADR-021: new §Sudo model (amendment 2026-06-18), with claude=NOPASSWD and sjat=password
required; former sjat NOPASSWD drop-in removed 2026-06-18 (least-privilege cleanup).
accepted-risks.md: R7 added (claude NOPASSWD:ALL on ubongo); last-reviewed updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
parent 3fe6f68316
commit cc772ff845
3 changed files with 66 additions and 9 deletions

@@ -2,7 +2,10 @@
 
 ## Status
 
-Accepted (2026-06-05)
+Accepted (2026-06-05). **Amended 2026-06-18:** the `claude` AI-worker account now has
+`NOPASSWD:ALL` sudo on `ubongo` — reversing the original "no local sudo" sub-decision.
+The amendment is recorded in §Access & security below; rationale and accepted risk are
+in ADR-021 and `docs/security/accepted-risks.md` (R7).
 
 ## Context
 
@@ -88,12 +91,33 @@ Manual, on bare metal:
 only** — key-only, with password auth and root login disabled — until the NetBird mesh
 (ADR-016) is stood up.
 - **AI-worker identity:** `ubongo` runs the AI worker under a dedicated,
-password-locked `claude` user (in the `docker` group for Molecule; **no local sudo** —
-boma deploys reach the fleet over SSH as the `ansible` user, not via local root). It is
-reached via `sudo -iu claude` or its own SSH key. The rationale is **attribution +
-revocation, not containment**: auditd/Loki (ADR-018) can separate human from agent
-actions, and the account/key can be revoked without touching the operator's access.
-(ADR-021 left the on-`ubongo` agent identity unspecified; this records it.)
+password-locked `claude` user (in the `docker` and `libvirt` groups; **`NOPASSWD:ALL`
+sudo** via a repo-managed drop-in — see amendment below). It is reached via `sudo -iu
+claude` or its own SSH key. The rationale is **attribution + revocation, not
+containment**: auditd/Loki (ADR-018) can separate human from agent actions, and the
+account/key can be revoked without touching the operator's access. (ADR-021 left the
+on-`ubongo` agent identity unspecified; this records it.)
+
+**Amendment (2026-06-18) — `claude` now has `NOPASSWD:ALL` sudo.** During the
+integration-testing harness shakedown, the original "no local sudo" sub-decision was
+reversed. No-sudo blocked the AI-worker from diagnosing a failed VM: `virsh`,
+`virt-install`, `cloud-localds`, `journalctl`, `nft` — nearly all low-level
+diagnostic commands — require root. The AI-worker must autonomously spin up,
+inspect, and tear down test VMs without operator hand-holding; that is the harness's
+core value proposition. Compensating controls make the risk acceptable:
+
+1. `claude`'s password is **locked** (no interactive login, no `su claude` without the
+operator's own credentials) — `NOPASSWD` sudo is the *only* sudo path.
+2. `auditd` + Loki attribution (ADR-018) separates human from agent root actions.
+3. The drop-in is **repo-managed** via `base__ai_worker_user` — revocable in one commit
+and one deploy.
+4. Single-operator homelab: everything in git, off-machine backups (ADR-022).
+
+The operator (`sjat`) uses **password-required sudo** via the `sudo` group; their
+former `NOPASSWD` drop-in was removed 2026-06-18 as redundant once `claude` had sudo
+(least-privilege cleanup). The accepted risk is registered as R7 in
+`docs/security/accepted-risks.md`. ADR-021 records the resulting sudo model for both
+accounts.
 - **Disk encryption:** `ubongo`'s SSD is **not encrypted at rest** — the SanDisk X600 is
 TCG-Opal-capable but Opal is unused. This is an accepted risk recorded in
 `docs/security/accepted-risks.md` (control-node disk not encrypted at rest),
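The "password-locked" control is what makes `NOPASSWD` the `claude` account's only sudo path. Sudo's own password prompt and `su claude` both need a password that can never match. The check below is a minimal sketch of verifying this. It assumes the standard `/etc/shadow` layout (colon-separated fields, password hash in the second field) and the shadow(5) convention that a hash starting with `!` is locked (as `passwd -l` / `usermod -L` leave it) while `*` never matches any password. The function names and sample entries are hypothetical, not part of `base__ai_worker_user`.

```python
def shadow_entry(shadow_text: str, user: str) -> str:
    """Return the /etc/shadow line for `user`; raise KeyError if it is absent."""
    for line in shadow_text.splitlines():
        if line.split(":", 1)[0] == user:
            return line
    raise KeyError(user)


def password_locked(entry: str) -> bool:
    """True if this shadow entry can never authenticate with a password.

    Per shadow(5), a hash starting with '!' is locked (passwd -l, usermod -L),
    and a value that is not a valid crypt(3) result, such as '*', never matches.
    This sketch recognises only those two markers. An EMPTY field means no
    password is needed at all, so it is reported as not locked.
    """
    fields = entry.split(":")
    if len(fields) < 2:
        raise ValueError(f"not a shadow entry: {entry!r}")
    return fields[1].startswith(("!", "*"))


# Illustrative entries only: real hashes and dates will differ.
SAMPLE = """\
sjat:$y$j9T$examplesalt$examplehash:19890:0:99999:7:::
claude:!:19890:0:99999:7:::
"""

print(password_locked(shadow_entry(SAMPLE, "claude")))  # True
print(password_locked(shadow_entry(SAMPLE, "sjat")))    # False
```

Reading the real file requires root. On the host, `passwd -S claude` reports the same status as `L` in its second field, without parsing the file.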

@@ -3,7 +3,9 @@
 ## Status
 
 Accepted (2026-06-09). Resolves TODO 7.2 (what to set up on hosts given direct access
-will be rare) and TODO 3.2 (the service admin-API access question).
+will be rare) and TODO 3.2 (the service admin-API access question). **Amended
+2026-06-18:** the on-`ubongo` sudo model for the two local accounts is now settled
+(see §Sudo model on `ubongo` below).
 
 **Doctrine ADR.** It pins the operational-access doctrine, the declarative `access__*`
 data model, the rendered `ACCESS.md` record, and the `/check-access` verifier. It does
@@ -163,6 +165,36 @@ exists and `/check-access` is green (or a deviation is recorded in `accepted-ris
 No scaffold change — same manual-copy-plus-review pattern the sibling records
 (`SECURITY.md`/`VERIFY.md`) use.
 
+### Sudo model on `ubongo` (amendment 2026-06-18)
+
+The original ADR left on-`ubongo` local sudo unspecified. The integration-testing
+harness shakedown settled it:
+
+| Account | Role | Sudo |
+|---|---|---|
+| `claude` | Automated AI-worker | `NOPASSWD:ALL` via repo-managed drop-in (`base__ai_worker_user`) |
+| `sjat` | Human operator | Password-required sudo via the `sudo` group |
+
+**Rationale for `claude NOPASSWD`.** No-sudo blocked the AI-worker from diagnosing a
+failed test VM: `virsh`, `virt-install`, `cloud-localds`, `nft`, `journalctl` —
+almost every low-level diagnostic tool — require root. The harness's core value is
+autonomous spin-up → apply → reboot → assert → diagnose; that loop collapses without
+local root access.
+
+**Compensating controls (R7 in `docs/security/accepted-risks.md`):**
+- `claude`'s password is locked — `NOPASSWD` is the account's *only* sudo path; no
+interactive login is possible.
+- `auditd` + Loki attribution (ADR-018) separates human from agent root actions in the
+audit trail.
+- The drop-in is repo-managed and revocable in one commit + one deploy.
+- Single-operator homelab; everything in git; off-machine backups (ADR-022).
+
+**`sjat` NOPASSWD removed.** The operator's former `NOPASSWD` drop-in
+(`/etc/sudoers.d/sjat-ansible`, added as an interim measure during M5 NetBird
+enrolment) was removed 2026-06-18. It was redundant once `claude` held sudo, and its
+removal restores least-privilege for the human operator. `sjat` retains full sudo
+capability via the `sudo` group (password required).
+
 ## Consequences
 
 - Every host and service has at least one documented, verifiable way in — and a verifier
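The sudo model table comes down to two sudoers rules. The sketch below classifies a single-line sudoers user specification so that the model could be asserted in a test. It rests on two assumptions. First, the `claude` drop-in holds a rule shaped like `claude ALL=(ALL) NOPASSWD: ALL`; the real drop-in text is not shown in this commit. Second, `sjat` is covered by Debian's stock `%sudo ALL=(ALL:ALL) ALL` rule. The parser is deliberately simplified: it handles no aliases, `Defaults` lines, continuation lines, or multiple command lists. The function names are hypothetical.

```python
import re

# One sudoers user specification, simplified:
#   <user or %group> <hosts> = [(<runas>)] [TAG: ...] <commands>
SPEC = re.compile(
    r"^\s*(?P<principal>\S+)\s+(?P<hosts>[^=\s]+)\s*=\s*"
    r"(?:\((?P<runas>[^)]*)\)\s*)?(?P<rest>.+?)\s*$"
)
# Tags that may precede the command list, each followed by a colon.
TAG = re.compile(
    r"(NOPASSWD|PASSWD|NOEXEC|EXEC|SETENV|NOSETENV|"
    r"LOG_INPUT|NOLOG_INPUT|LOG_OUTPUT|NOLOG_OUTPUT)\s*:\s*"
)


def parse_spec(line: str) -> dict:
    """Split one simplified sudoers rule into principal, runas, tags, commands."""
    m = SPEC.match(line)
    if m is None:
        raise ValueError(f"unrecognised sudoers rule: {line!r}")
    rest, tags = m.group("rest"), []
    while (t := TAG.match(rest)) is not None:  # peel off leading TAG: prefixes
        tags.append(t.group(1))
        rest = rest[t.end():]
    return {
        "principal": m.group("principal"),
        "runas": m.group("runas"),
        "tags": tags,
        "commands": rest,
    }


def requires_password(spec: dict) -> bool:
    """sudo authenticates by default; if both tags appear, the last one wins here.

    Assumes no `Defaults !authenticate` elsewhere in the policy.
    """
    auth = [t for t in spec["tags"] if t in ("PASSWD", "NOPASSWD")]
    return not auth or auth[-1] == "PASSWD"


def grants_all(spec: dict) -> bool:
    return spec["commands"] == "ALL"


claude = parse_spec("claude ALL=(ALL) NOPASSWD: ALL")    # hypothetical drop-in text
operator = parse_spec("%sudo   ALL=(ALL:ALL) ALL")       # Debian's stock group rule

print(grants_all(claude), requires_password(claude))      # True False
print(grants_all(operator), requires_password(operator))  # True True
```

On the host itself, two standard tools cover the authoritative checks. `sudo -l -U claude` (run as root) lists the effective rules, including any `NOPASSWD` tag, and `visudo -cf` validates a drop-in's syntax before it is installed. A parser like this is only useful for asserting the intended shape in a test, for example that no rule naming `sjat` regains `NOPASSWD`.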

@@ -19,8 +19,9 @@ revisit (trigger).
 | R4 | **No cryptographic WORM for logs** — shipped logs are append-only via Loki's push API and copied off-site to `askari` (ADR-018), but the stored chunks are not object-locked/immutable; a root-on-`askari` attacker could edit history | Append-only push + off-site copy already defeats the realistic threat (a host attacker covering tracks survives even full-cluster compromise). True WORM (object-lock) is forensic-grade cost for boma's opportunistic threat model (R1) | Threat model shifts toward targeted/forensic; a regulatory/evidentiary need appears; `askari` itself is assessed as a likely target |
 | R5 | **No disk encryption on `ubongo`** — the control node's SSD (SanDisk X600 256 GB, TCG-Opal-capable but Opal unused) is unencrypted at rest, so it holds recovery-critical secrets in plaintext: the Ansible Vault password's `rbw` local cache and (future) Terraform state. Physical theft of the box would expose them | `ubongo` is always-on in a physically controlled location; compensating controls are a **BIOS supervisor password** and **disabled external/USB + PXE boot** (an attacker cannot trivially boot another OS to read the disk), and the offline-recoverable design means the irreducible root secret (Vaultwarden master password) is never stored on the box anyway. Full-disk encryption was weighed against the always-on/unattended-reboot requirement (LUKS+TPM auto-unlock or passphrase) and deferred for simplicity at this trust level | `ubongo` is relocated to a less-trusted physical location; the box starts holding additional high-value secrets; or a reinstall onto LUKS (TPM-sealed) is undertaken |
 | R6 | **`le-prod-wildcard` integration runs** — when `CERTS=le-prod-wildcard` is passed to `make test-integration`, the production Gandi PAT (`vault.gandi.pat`) is passed to an ephemeral local test VM via the var overlay, and transient `_acme-challenge` TXT records are written into the real `wingu.me` DNS zone to satisfy the Let's Encrypt DNS-01 challenge. A compromised or long-lived test VM could exfiltrate the PAT; the real zone is briefly (seconds) modified | Scope is **on-demand only** — `le-staging` is the default cert tier (`CERTS=internal` for incident repro); `le-prod-wildcard` is an explicit opt-in. Compensating controls: the VM is ephemeral and destroyed on success; it sits on an isolated libvirt NAT network (no LAN/mesh access); TXT records are auto-removed by Caddy immediately after validation; the PAT is not persisted inside the VM after the run. ADR-025 documents the cert-tier design and the three isolation invariants | The PAT is exfiltrated from a test VM; the `wingu.me` zone shows unexpected records; a `CERTS=le-prod-wildcard` run must be audited or the tier must be revoked |
+| R7 | **`claude` AI-worker has `NOPASSWD:ALL` sudo on `ubongo`** — the automated AI-worker account can execute any command as root on the control node without a password prompt. A compromised or misbehaving agent session could make arbitrary root-level changes to ubongo | The account is **password-locked** (no interactive `claude` login; `NOPASSWD` sudo is the account's only escalation path, so there is no "su to claude + sudo" attack). `auditd` + Loki attribution (ADR-018) logs every `sudo` invocation with the originating user. The drop-in (`/etc/sudoers.d/claude-ai-worker`) is repo-managed via `base__ai_worker_user` — revocable in one commit + one deploy. Single-operator homelab; all changes in git; off-machine backups (ADR-022). Full rationale: ADR-015 amendment (2026-06-18) + ADR-021 §Sudo model. | The AI-worker executes a destructive action that cannot be rolled back via git; the account key is compromised; the threat model shifts toward targeted remote attackers |
 
-_Last reviewed: 2026-06-11. The prior gaps (full CIS hardening, SELinux/AppArmor,
+_Last reviewed: 2026-06-18. The prior gaps (full CIS hardening, SELinux/AppArmor,
 IDS) were re-challenged and **adopted rather than accepted**: CIS Debian L1+L2 + CIS
 Docker, AppArmor (enforce), AIDE file-integrity, and Suricata network IDS are now
 part of the security strategy (ADR-002). See STATUS.md / `docs/TODO.md` for build
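R7's attribution control depends on auditd recording which account invoked each `sudo`. The sketch below extracts that from one record. It assumes the `key=value` layout of the `USER_CMD` records sudo emits, in which:

- `uid` is the invoking account.
- `auid` is the login UID.
- Untrusted strings such as `cmd` are logged double-quoted or, when they contain spaces or other special characters, hex-encoded.

The record, UIDs, and function names are illustrative. These ADRs do not state which field the repo's Loki queries key on. On the host, `ausearch -m USER_CMD -i` performs the same interpretation natively.

```python
import re

# Audit records are space-separated key=value pairs; user-space records such as
# USER_CMD carry a second set of pairs inside msg='...'.
INNER = re.compile(r"msg='(?P<body>[^']*)'")
KV = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_record(line: str) -> dict:
    """Flatten one audit record (outer and msg='...' pairs) into raw values."""
    inner = INNER.search(line)
    outer = line[: inner.start()] + line[inner.end():] if inner else line
    fields = dict(KV.findall(outer))
    if inner:
        for key, value in KV.findall(inner.group("body")):
            fields.setdefault(key, value)
    return fields


def untrusted_string(raw: str) -> str:
    """Decode a string field logged either double-quoted or hex-encoded."""
    if len(raw) >= 2 and raw[0] == raw[-1] == '"':
        return raw[1:-1]
    return bytes.fromhex(raw).decode("utf-8", errors="replace")


def attribute(line: str, names: dict) -> tuple:
    """Return (invoking account, login account, command, result).

    `uid` is the account that ran sudo. `auid` is the login UID, set once at
    login (pam_loginuid) and unchanged by sudo or su, so an agent session entered
    via `sudo -iu claude` shows uid=claude but auid=<the operator>.
    """
    f = parse_record(line)
    cmd = untrusted_string(f["cmd"]) if "cmd" in f else None
    return (
        names.get(f["uid"], f["uid"]),
        names.get(f["auid"], f["auid"]),
        cmd,
        f.get("res"),
    )


# Hypothetical UIDs and a made-up, abbreviated record, for illustration only.
NAMES = {"1000": "sjat", "1001": "claude"}
RECORD = (
    "type=USER_CMD msg=audit(1781784000.123:901): pid=4242 uid=1001 auid=1001 "
    "ses=7 msg='cwd=\"/home/claude\" cmd=6A6F75726E616C63746C202D62 "
    "exe=\"/usr/bin/sudo\" terminal=pts/1 res=success'"
)

print(attribute(RECORD, NAMES))  # ('claude', 'claude', 'journalctl -b', 'success')
```

The `uid`/`auid` split is what answers the human-versus-agent question. The AI-worker logging in with its own SSH key shows `claude` in both fields. An operator who drops into `claude` with `sudo -iu claude` is still visible through `auid`.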