Commit graph

36 commits

Author SHA1 Message Date
4732730515 docs: wire ADR-025 into testing/control-host/risks/status/capacity
- ADR-008: add reboot-survivability gap row + ADR-025 pointer to the
  "not tested in Molecule" table
- ADR-015: reconcile "not a hypervisor" with ephemeral KVM test VMs
  (ADR-025); note ~3 GiB test-VM RAM against the 16 GiB sizing
- accepted-risks: add R6 (le-prod-wildcard PAT + transient TXT records)
- CLAUDE.md: add make test-integration[/-clean] to key-commands;
  add ADR-025 + runbook rows to further-reading
- hardware/reference.md: note one ephemeral KVM test VM on ubongo
- STATUS.md: add integration harness entry (built, lint+pytest clean;
  RED/GREEN acceptance PENDING ubongo live pass); TODO 2.4 stays open

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 12:51:22 +02:00
5d14efc864 docs: Phase 1 complete — clients enrolled + NetBird client runbook
mamba + work laptop enrolled in the mesh → ubongo reachable from anywhere; the
mobile-access goal is met and Phase 1 (remote access) is complete. Adds
docs/runbooks/netbird-client.md (reusable client-enrollment runbook) + STATUS/
ROADMAP flips + CLAUDE.md reading-table entry.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 17:11:32 +02:00
4c8fb9e03b docs: M5 mesh enrollment — ubongo + askari on the mesh
STATUS: base mesh concern built + applied; ubongo (100.99.146.14) + askari
(100.99.226.39) enrolled, link verified; ubongo agent-management access (sjat key
+ NOPASSWD sudo) recorded. ROADMAP M5: infra done, laptops = operator step,
mesh-hardening split out as the deferred follow-on. FRICTION: docs-only-commit rbw
guard + control-node self-management access gap.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 16:40:02 +02:00
684718f4a5 docs(netbird): M4b done — STATUS/ROADMAP/risks/friction
netbird_coordinator built + applied to askari (first service role, dashboard live).
STATUS: new "real and working" row + askari/coordinator rows updated. ROADMAP: M4b
done, M5 (peer enrol) next, recorded the v0.72.4 combined-container/embedded-Dex/
no-Coturn reality. accepted-risks R3: Coturn -> STUN wording. FRICTION: single-file
bind-mount stale-inode gotcha + check-before-first-deploy artifact.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 07:48:53 +02:00
b3468b34e4 docs: record Caddy/Gandi DNS-01 as resolved + proven (was M4a deferral)
ADR-024 Status/Consequences, STATUS.md, ROADMAP M4a, and the FRICTION ledger now
record that the DNS-01 path is built and proven, with the root cause of the M4a
failure (version skew: pre-Bearer libdns/gandi sent the deprecated Apikey header;
plus building on a Hetzner IP). Traefik was reconsidered and rejected again — lego's
Gandi provider has the same PAT-vs-Apikey question, so it would not have helped.

Dated review reports and spec/plan snapshots are left as historical records.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 06:57:55 +02:00
d1e1e38879 feat(kaizen): nudge in /review-repo; STATUS + TODO
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 21:27:23 +02:00
175777e36a docs: reconcile 2026-06-14 review findings (O1-O7,O18,O22)
- STATUS: docker_host is built+applied, not scaffold-only (O1)
- ADR-004: backup points to ADR-022, not "out of scope"; service-role file
  table gains ACCESS.md + BACKUP.md rows (O2, O5)
- Finish Traefik->Caddy: ADR-008/011/017/019, CAPABILITIES, TODO (O3); scope
  ADR-024's custom-image/NetBird claims to the deferred DNS-01/M4b paths (O22)
- ADR-016/017/018 now lead with ## Status per ADR-023 (O4)
- ADR-002: caveat `PLAYBOOK=upgrade` as planned/unbuilt (O6)
- CAPABILITIES: carve out ubongo's dev_env from the nvim/tmux exclusion (O7)
- ADR-007: one authoritative boma.baobab.band -> boma.wingu.me transition note (O18)
- new-host Part E: note ubongo is managed as sjat, ansible-user bootstrap pending (O15)

O9 (hosts.yml header) left open: the file is generator-owned (hook-protected);
fixing it needs a tf_to_inventory.py change or a tf-inventory run, not a hand-edit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 19:06:33 +02:00
1862b7a828 docs(m4a): HTTP-01 for askari; ADR-024 cert-method-follows-exposure; STATUS/roadmap/friction
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 18:14:38 +02:00
9d787a4f53 docs(base): M3 done — ssh hardening + fail2ban applied to askari; STATUS + roadmap
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 16:55:22 +02:00
a1c0f4814b feat(askari): publish askari.wingu.me; mark M2 applied (askari live)
askari provisioned + bootstrapped (cx23/hel1/Debian 13.5, cloud-init ansible user
+ sudo, cloud firewall SSH-from-ubongo-WAN, reachable, in offsite_hosts). Added
askari.wingu.me A -> 77.42.120.136 via public_dns. STATUS: askari moves to
'Real and working today'.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 16:26:26 +02:00
3588904528 docs(askari): amend ADR-006/009/020/007/016 for TF-provisioned offsite host; STATUS (apply pending)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 12:09:20 +02:00
993d7885e4 docs: mark M1 applied (STATUS); log item.values + Gandi null-MX gotchas
M1 public_dns applied to wingu.me (purge + SPF/DMARC, idempotent). Friction:
item.values dict-method collision, Gandi null-MX rejection, and the apply=false-
Molecule/data-only-pytest gap that let both bugs reach a live apply.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 10:58:03 +02:00
f170ffd936 docs(public_dns): amend ADR-007 to wingu.me/Gandi; resolve TODO 4; STATUS + CAPABILITIES
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 10:38:45 +02:00
03d33f83dd fix(O1): scaffold docker_host role so make lint passes on main
playbooks/site.yml imports the docker_host role, but it didn't exist, so
ansible-lint's syntax-check failed on a clean checkout — breaking CLAUDE.md's
"main must always work" / "Never skip lint" (top open finding O1 from the
2026-06-11 review).

Scaffold docker_host as a proper placeholder via the prescribed mechanism
(make new-role): filled meta/main.yml + README, an honest no-task tasks/main.yml
documenting planned scope (Docker engine + Compose, daemon hardening, nftables.d
container rules per ADR-004/020), and the standard molecule scenario. This
preserves site.yml's full-standard-state intent rather than dropping the play.

Update STATUS.md (docker_host moves from "Not in git" to "scaffolded, no tasks")
and the role/playbook READMEs to match.

make lint: 0 failures, 0 warnings; check-tags OK.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 14:53:55 +02:00
67f2aba9d8 STATUS: record dev_env (built+applied) and working deploy path
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 14:21:36 +02:00
349d10d65c docs: record ubongo physical build (2026-06-11)
Move ubongo to 'Built (partial)' in STATUS; fill real M70q hardware specs
(i3-10100T, 16 GB, 256 GB SanDisk X600 SATA, no disk encryption). Record in
ADR-015 the dedicated claude AI-worker identity, LAN-SSH-only operational
reality, and the no-encryption decision; close the rbw offline-cache
recovery-verification item (ADR-015 + rotate-secrets). Add accepted-risk R5
(control-node disk unencrypted at rest) with its compensating controls.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 10:32:26 +02:00
f5c97d1f36 docs(backup): record ADR-022; wire into CLAUDE.md, STATUS, TODO
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-10 11:19:01 +02:00
13f0d482bd docs(access): wire ADR-021 into CLAUDE.md, STATUS, TODO
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-09 17:48:31 +02:00
402913efb3 fix(base): make rollback snapshot restorable (flush-prefixed)
Bare 'nft list ruleset' has no leading flush, so the timer's 'nft -f rollback'
was a no-op on first apply (empty file) and errored ('table exists') on later
applies — the auto-rollback silently did nothing, defeating the askari lockout
safeguard. Prepend 'flush ruleset' so the revert is atomic + self-contained.
Verified the snapshot->lockout->revert round-trip in an isolated netns.
Also fix stale STATUS prose (base is partially built, not absent).
2026-06-06 19:15:38 +02:00
90683c7912 docs: record base firewall concern built (ADR-020 host layer)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 19:10:27 +02:00
86bb3559ad STATUS: record tag standard + enforcement (ADR-019)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 15:23:58 +02:00
1021c6d25d STATUS: record logging pipeline + security alerting (ADR-018) 2026-06-06 07:06:06 +02:00
d5c62c99ad STATUS/ADR-015: mark the three deferred design threads resolved
ubongo, the NetBird mesh, and Level 4 verification are design-resolved
(ADR-015/016/017 + specs + plans); STATUS now says so while keeping build
status honest. Also resolves ADR-015 deferred #2 (browser harness), which
was left open when ADR-017 landed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 18:01:14 +02:00
01e4f96983 STATUS: record Level 4 service-UI verification (ADR-017) 2026-06-05 13:19:53 +02:00
787aa3b8e1 STATUS: record NetBird mesh (coordinator + base enrollment) 2026-06-05 11:50:53 +02:00
9653a34241 STATUS: record ubongo control host as designed, not built 2026-06-05 09:47:24 +02:00
3b029352b6 Add per-service SECURITY.md convention; one role per service
Revise ADR-004 to a service-role standard: every service is its own
self-contained role with a required file set including SECURITY.md, uniform
deploy mechanics, and a deferred shared-engine option (with revisit trigger)
recorded in the ADR.

Add the per-service security record:
- docs/security/service-security-template.md — canonical SECURITY.md template
  (exposure, checklist status, service-specific hardening, residual risks)
- roles/<service>/SECURITY.md is where each service records how it meets the bar;
  /security-review aggregates roles/*/SECURITY.md and cross-checks against config
- service-checklist.md noted as the generic bar the record answers

Wire-up: new-role runbook step writes SECURITY.md from the template; ADR-002
governance bullet points at it; CLAUDE.md role conventions require it and mandate
one-role-per-service; STATUS records the convention as defined-not-yet-applied.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 16:09:33 +02:00
19dd89b875 Re-challenge accepted risks; adopt CIS hardening + IDS
Walked the seeded accepted-risk register (R1-R4) and turned inherited gaps into
deliberate decisions:

- Supply chain (R1): tightened to required baseline hygiene (digest pinning,
  official/verified images); active scanning deferred — stays an accepted risk
- CIS (R2): adopted as a positive decision — CIS Debian L1+L2 (base role) + CIS
  Docker (docker_host + service checklist); app layer via the checklist
- SELinux/AppArmor (R3): AppArmor becomes a baseline control (CIS-enforced);
  register keeps a clean "no SELinux" accept
- IDS (R4): adopt AIDE (baseline via CIS) + Suricata on OPNsense + active alerting

Register shrinks from 4 inherited gaps to 2 deliberate accepts. ADR-002 gains a
Hardening standard section; STATUS + TODO 15 track the (unbuilt) implementation,
including the CIS L2 partition impact on VM provisioning (ADR-006).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 15:15:39 +02:00
f338bccd46 Expand ADR-002 into a security baseline + strategy
Add a managerial security frame on top of the host baseline: explicit threat
model (opportunistic external, lateral movement/blast radius, operator/agent
error; supply chain accepted-lower-priority), security principles, and four
governance mechanisms that ADR-002 establishes and links out to:

- docs/security/service-checklist.md — per-service security bar (referenced
  from the new-role runbook)
- docs/security/accepted-risks.md — living accepted-risk register (R1-R4)
- planned /security-review skill (TODO 8.5)
- agent guardrails in CLAUDE.md "what Claude must not do"

STATUS.md records the frame as present (manual enforcement) and /security-review
as planned-not-built.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 14:39:51 +02:00
4c535c908e Record ADR-012 + STATUS/CLAUDE/scripts docs for capacity tooling
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-01 10:34:38 +02:00
93f2a847c7 Reconcile CI to trunk-based; mark base/docker_host not-built (R6-R8,R15-R16)
R6/R7: ADR-003 & ADR-008 CI pipelines rewritten trunk-based (push to main ->
test -> staging -> [manual gate] production); CLAUDE.md no longer forbids pushing
to main. R8: STATUS/roles-README/site.yml now say base & docker_host are not built
(not in git), so a clean clone errors. R15/R16: ADR-001 table flagged as intended
design; dropped the unbuilt 'monitoring agent' from the baseline.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 19:32:37 +02:00
703f1716e5 review-repo: harden scanner, apply safe fixes, record first review
First /review-repo run on boma. Hardened repo-scan.py (no TODO.md/prose false
positives). Applied 7 safe fixes (DNS staleness x2, STATUS factual correction,
hosts.yml path generalisation, trunk-based wording x2, scripts/README). Recorded
the run and 17 open findings in docs/reviews/2026-05-30-*.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 19:10:58 +02:00
de38d1c68b Rename backlog to docs/TODO.md and fix references
Match the uppercase convention of the other top-level docs; includes the new
Scheduled-work and sanity-check items, and repoints references in STATUS.md and
the /review-repo command.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 19:01:22 +02:00
b33130eea9 Add /review-repo command with deterministic pre-scan and reviews store
New on-demand repo audit: scripts/repo-scan.py does the cheap deterministic
checks (markers, broken refs, unencrypted vaults) and inventory; the command
fans out judgement reviewers across four dimensions, applies only safe/obvious
fixes, and writes a tracked report to docs/reviews/. Cron + email deferred.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 18:56:01 +02:00
4ee1b66e23 Source vault password from Vaultwarden via rbw; nest vault structure
Master vault password is fetched from Vaultwarden via the rbw agent
(scripts/vault-pass-client.sh, wired as vault_password_file) instead of a
plaintext .vault_pass. Vault secrets use a nested vault.<service>.<key> map.
Encrypted vault.yml files are excluded from lint. Includes the host rename in
Makefile and STATUS.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 18:16:35 +02:00
19d93d32dc Add project orientation and contributor docs
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 14:10:01 +02:00