Commit graph

21 commits

Author SHA1 Message Date
77a20b8d40 docs(runbook): netbird-client mesh-drop / DNS troubleshooting
Document the 2026-06-18 incident class: a road-warrior laptop losing DNS on a network transition strands NetBird (can't resolve the coordinator FQDN), taking ubongo unreachable until DNS recovers. Adds triage (local DNS vs coordinator), device mitigations (reliable resolvers + hosts-file pin), the non-mesh LAN break-glass to ubongo, and why ubongo is relay-only (deferred mesh-hardening, not a bug) — including the break-glass rule that hardening must preserve.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 22:30:41 +02:00
f51ae1a13d docs(runbook): integration-testing runbook + pre-flight cross-links
- New docs/runbooks/integration-testing.md: when to use (firewall/
  sshd/boot/Docker changes); make test-integration commands; lower-
  level driver sub-commands; cert tier guidance; diagnostics dir;
  VM inspection (virsh console / SSH); safety invariants; resource
  constraints; adding a new profile; self-validating acceptance test.
- docs/runbooks/new-host.md: pre-flight warning before deploying
  lockout-risky changes (firewall/sshd/boot) while break-glass is open
- docs/runbooks/new-role.md: step 13 pre-flight for lockout-risky roles

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 12:59:06 +02:00
c1323a3f29 feat(make): registry-login via vaulted Forgejo token (kaizen)
scripts/registry-login.sh reads vault.forgejo.registry_token and pipes it to
docker login --password-stdin (never echoed, never on argv); 'make registry-login'
wires it with the venv binaries. Adds the operator-minted CHANGEME vault stub
(fill via make edit-vault) and a per-machine prereq note in the claude-code-setup
runbook, so 'make caddy-image-push'/'molecule-image-push' become agent-completable
non-interactively. Consumes the 2026-06-15 signal in docs/FRICTION.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 17:50:07 +02:00
5d14efc864 docs: Phase 1 complete — clients enrolled + NetBird client runbook
mamba + work laptop enrolled in the mesh → ubongo reachable from anywhere; the
mobile-access goal is met and Phase 1 (remote access) is complete. Adds
docs/runbooks/netbird-client.md (reusable client-enrollment runbook) + STATUS/
ROADMAP flips + CLAUDE.md reading-table entry.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 17:11:32 +02:00
175777e36a docs: reconcile 2026-06-14 review findings (O1-O7,O18,O22)
- STATUS: docker_host is built+applied, not scaffold-only (O1)
- ADR-004: backup points to ADR-022, not "out of scope"; service-role file
  table gains ACCESS.md + BACKUP.md rows (O2, O5)
- Finish Traefik->Caddy: ADR-008/011/017/019, CAPABILITIES, TODO (O3); scope
  ADR-024's custom-image/NetBird claims to the deferred DNS-01/M4b paths (O22)
- ADR-016/017/018 now lead with ## Status per ADR-023 (O4)
- ADR-002: caveat `PLAYBOOK=upgrade` as planned/unbuilt (O6)
- CAPABILITIES: carve out ubongo's dev_env from the nvim/tmux exclusion (O7)
- ADR-007: one authoritative boma.baobab.band -> boma.wingu.me transition note (O18)
- new-host Part E: note ubongo is managed as sjat, ansible-user bootstrap pending (O15)

O9 (hosts.yml header) left open: the file is generator-owned (hook-protected);
fixing it needs a tf_to_inventory.py change or a tf-inventory run, not a hand-edit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 19:06:33 +02:00
349d10d65c docs: record ubongo physical build (2026-06-11)
Move ubongo to 'Built (partial)' in STATUS; fill real M70q hardware specs
(i3-10100T, 16 GB, 256 GB SanDisk X600 SATA, no disk encryption). Record in
ADR-015 the dedicated claude AI-worker identity, LAN-SSH-only operational
reality, and the no-encryption decision; close the rbw offline-cache
recovery-verification item (ADR-015 + rotate-secrets). Add accepted-risk R5
(control-node disk unencrypted at rest) with its compensating controls.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 10:32:26 +02:00
91713127cb docs(kaizen): migrate gotchas to docs; curate FRICTION log (2026-06-10 review)
- New docs/testing/gotchas.md (nft iif/iifname, Molecule ansible_host,
  apply-path coverage blind spot, render-nft-c pattern); pointer from ADR-008.
- claude-code-setup.md gains "Environment gotchas" (hooks-need-restart,
  pre-commit stashes unstaged, rbw sync cache, zsh word-split).
- FRICTION.md restructured into Open signals + a decisions ledger; consumed
  signals archived with where their resolution now lives.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 12:51:39 +02:00
01e47d0890 docs(backup): add BACKUP.md step to new-role runbook (ADR-022)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 11:21:56 +02:00
649925b303 docs(access): gate ACCESS.md in checklist + new-role runbook (ADR-021)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-09 17:46:51 +02:00
f0d189ca09 Thread the VERIFY.md convention through ADR-004/new-role/README
Review O1-O3: ADR-017's per-service VERIFY.md requirement now appears in the
ADR-004 service-role file table, as a new-role runbook step, and the README
docs index/tree are refreshed (ADRs 010-017, security/testing/hardware dirs).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 18:52:42 +02:00
666ad42634 review-repo: fix DNS-write contradictions + stale control-node/template refs
Auto-fixes from /review-repo:
- ADR-005 + new-host.md: drop "Terraform writes the host's DNS A record"
  (contradicts ADR-009 — dns role owns the zone; recurs from the 2026-05-30 run)
- ADR-005: control node is physical ubongo, not cloned from the template (ADR-015)
- CLAUDE.md: add the VERIFY.md template to Further reading
- TODO.md: typo fixes (we we / seperate)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 18:23:16 +02:00
cd62c5e098 new-host runbook: mesh VPN resolved to NetBird (ADR-016) 2026-06-05 11:52:22 +02:00
a2db8058e7 rotate-secrets: document offline vault break-glass for ubongo 2026-06-05 09:45:27 +02:00
b89ca8835a new-host runbook: control node ubongo is bare-metal
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-05 09:44:31 +02:00
abb5c7a12f Make the Claude Code toolchain reproducible (TODO 10.7)
Reviewed the Claude Code config against boma's capabilities and committed a
reproducible, leaner toolchain:

- .claude/settings.json now declares extraKnownMarketplaces + enabledPlugins so a
  fresh clone prompts to install the active set: superpowers, context7, terraform
  (we use TF, ADR-006), claude-md-management (doc/ADR-heavy). Drops code-simplifier.
- Adds a conservative, read-only/verify permissions allowlist (git status/diff/log,
  make lint/test/check, pytest, rbw unlocked, ls/cat/rg/find) — mutations and
  outward/destructive commands stay gated, consistent with ADR-002.
- docs/runbooks/claude-code-setup.md: per-machine bootstrap, the deferred
  enable-when plugins (security-guidance/semgrep, playwright, hookify, skill-creator),
  rbw/venv prerequisites, and a note to keep the dangerous-mode prompt on.

Closes TODO 10.7. Plugin install remains a per-machine /plugin action (no native
auto-install).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 21:41:54 +02:00
3b029352b6 Add per-service SECURITY.md convention; one role per service
Revise ADR-004 to a service-role standard: every service is its own
self-contained role with a required file set including SECURITY.md, uniform
deploy mechanics, and a deferred shared-engine option (with revisit trigger)
recorded in the ADR.

Add the per-service security record:
- docs/security/service-security-template.md — canonical SECURITY.md template
  (exposure, checklist status, service-specific hardening, residual risks)
- roles/<service>/SECURITY.md is where each service records how it meets the bar;
  /security-review aggregates roles/*/SECURITY.md and cross-checks against config
- service-checklist.md noted as the generic bar the record answers

Wire-up: new-role runbook step writes SECURITY.md from the template; ADR-002
governance bullet points at it; CLAUDE.md role conventions require it and mandate
one-role-per-service; STATUS records the convention as defined-not-yet-applied.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 16:09:33 +02:00
f338bccd46 Expand ADR-002 into a security baseline + strategy
Add a managerial security frame on top of the host baseline: explicit threat
model (opportunistic external, lateral movement/blast radius, operator/agent
error; supply chain accepted-lower-priority), security principles, and four
governance mechanisms that ADR-002 establishes and links out to:

- docs/security/service-checklist.md — per-service security bar (referenced
  from the new-role runbook)
- docs/security/accepted-risks.md — living accepted-risk register (R1-R4)
- planned /security-review skill (TODO 8.5)
- agent guardrails in CLAUDE.md "what Claude must not do"

STATUS.md records the frame as present (manual enforcement) and /security-review
as planned-not-built.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 14:39:51 +02:00
45ab6ced01 Purge residual .vault_pass references (review R1-R5)
Point ADR-005, the new-host runbook, CONTRIBUTING, and AGENTS at the
rbw/Vaultwarden flow instead of a .vault_pass file. Also record the cron-section
idea in docs/TODO.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 19:17:25 +02:00
703f1716e5 review-repo: harden scanner, apply safe fixes, record first review
First /review-repo run on boma. Hardened repo-scan.py (no TODO.md/prose false
positives). Applied 7 safe fixes (DNS staleness x2, STATUS factual correction,
hosts.yml path generalisation, trunk-based wording x2, scripts/README). Recorded
the run and 17 open findings in docs/reviews/2026-05-30-*.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 19:10:58 +02:00
4ee1b66e23 Source vault password from Vaultwarden via rbw; nest vault structure
Master vault password is fetched from Vaultwarden via the rbw agent
(scripts/vault-pass-client.sh, wired as vault_password_file) instead of a
plaintext .vault_pass. Vault secrets use a nested vault.<service>.<key> map.
Encrypted vault.yml files are excluded from lint. Includes the host rename in
Makefile and STATUS.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 18:16:35 +02:00
fe4228fb38 Add architecture decision records and runbooks
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 14:10:01 +02:00