Commit graph

376 commits

Author SHA1 Message Date
cc3337502f Add ADR-017 (service-UI acceptance verification, Level 4) 2026-06-05 13:13:09 +02:00
be6a064f44 Add implementation plan for service-UI verification (Level 4)
Task-by-task: author ADR-017, expand ADR-008 Level 4, create the VERIFY.md
template + /verify-service skill, and reconcile the checklist/CLAUDE.md/
gitignore/STATUS/TODO. Buildable-now artifacts; live run stays deferred.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 13:11:43 +02:00
2bd11b5aa9 Add design spec for service-UI verification (ADR-008 Level 4)
Resolves ADR-015 deferred item #2 + TODO 2.2/2.3: a Claude-driven exploratory
browser harness (/verify-service) that exercises staging service UIs through
real SSO, backed by a per-service VERIFY.md, with test users in staging
Authentik and a manual-test handoff. Basis for ADR-017.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 13:05:11 +02:00
5322cce5c6 FRICTION: resolving a deferred decision needs a doc-wide grep sweep
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 12:20:20 +02:00
cd62c5e098 new-host runbook: mesh VPN resolved to NetBird (ADR-016) 2026-06-05 11:52:22 +02:00
ed9fdcc10a CLAUDE.md: link ADR-016 (mesh VPN) 2026-06-05 11:51:36 +02:00
787aa3b8e1 STATUS: record NetBird mesh (coordinator + base enrollment) 2026-06-05 11:50:53 +02:00
841f666de9 CAPABILITIES: VPN decided — NetBird self-hosted (ADR-016) 2026-06-05 11:50:04 +02:00
08165ffb68 accepted-risks: R3 now the concrete NetBird coordinator risk 2026-06-05 11:48:58 +02:00
2ae5cf4535 ADR-015: resolve mesh-VPN deferral — NetBird on askari (ADR-016) 2026-06-05 11:48:04 +02:00
5a32dd46d3 ADR-007: retire VLAN-99 WireGuard for the NetBird mesh (ADR-016) 2026-06-05 11:47:03 +02:00
ff796c64ca Add ADR-016 (mesh VPN — NetBird self-hosted on askari) 2026-06-05 11:45:45 +02:00
4b85b14f1f Add implementation plan for NetBird mesh VPN
Task-by-task docs plan: author ADR-016 and reconcile ADR-007 (retire VLAN-99
WireGuard), ADR-015 (resolve deferred #1), accepted-risks R3, CAPABILITIES,
STATUS, CLAUDE.md. Documentation-only; role/deployment waits on the base role.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 11:44:05 +02:00
99ace3eb48 Add design spec for mesh VPN (NetBird self-hosted on askari)
Resolves ADR-015 deferred item #1: the mesh VPN is NetBird, self-hosted on
askari, replacing ADR-007's VLAN-99 OPNsense WireGuard. Agent-per-host
enrollment via base, embedded local-user IdP, coordinator off-site for
outage survival. Basis for ADR-016.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 10:58:35 +02:00
a53941dffe CLAUDE.md: fix capabilities doc link after rename to CAPABILITIES.md 2026-06-05 09:50:28 +02:00
7a48a60f14 CLAUDE.md: fix project summary — control node is physical ubongo 2026-06-05 09:49:23 +02:00
a30c1af3f0 CLAUDE.md: link ADR-015; note ubongo as physical control node 2026-06-05 09:48:09 +02:00
9653a34241 STATUS: record ubongo control host as designed, not built 2026-06-05 09:47:24 +02:00
55a3666d16 accepted-risks: reserve R3 mesh-VPN coordinator (pending choice) 2026-06-05 09:46:40 +02:00
a2db8058e7 rotate-secrets: document offline vault break-glass for ubongo 2026-06-05 09:45:27 +02:00
b89ca8835a new-host runbook: control node ubongo is bare-metal
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-05 09:44:31 +02:00
3fb780c286 ADR-012/hardware: add ubongo as physical control node
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-05 09:43:09 +02:00
66064be7b2 ADR-008: tests run on ubongo; stub Level 4 service-UI acceptance 2026-06-05 09:42:01 +02:00
07bc1c83f0 ADR-009: control-node exception is a physical box, not a VM 2026-06-05 09:41:03 +02:00
1064716d49 ADR-005: control node bootstrap is bare-metal Debian on ubongo
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-05 09:40:15 +02:00
15779be086 ADR-001: control node is physical ubongo outside cluster 2026-06-05 09:39:18 +02:00
5aca796fa0 Add ADR-015 (control/AI-worker host ubongo) 2026-06-05 09:37:56 +02:00
4cf4aaa12e Renamed capabilities doc to capital letters to comform with other. 2026-06-05 09:36:55 +02:00
d96cf9f846 FRICTION: default to subagent-driven execution, don't ask
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 09:35:13 +02:00
0e9f179bfc Add implementation plan for ubongo control host
Task-by-task docs plan: author ADR-015 and reconcile ADR-001/005/008/009/012,
the new-host and rotate-secrets runbooks, accepted-risks, STATUS, and CLAUDE.md.
Documentation-only; the physical box stays "designed, not built".

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 09:29:10 +02:00
c1b21c9b2b Add design spec for ubongo control/AI-worker host
Records the decision to replace the cluster-resident control VM with a
dedicated always-on physical mini-PC (ubongo) outside the Proxmox
cluster, collapsing control plane, AI-worker host, dev home, and local
test runner into one box. Basis for ADR-015.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 09:19:02 +02:00
fc0d49f1c4 Link ADR-011 from CLAUDE.md Further reading
CLAUDE.md indexed every ADR except 011 (update management), which is now tracked.
Found by the claude-md-improver audit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 21:53:57 +02:00
abb5c7a12f Make the Claude Code toolchain reproducible (TODO 10.7)
Reviewed the Claude Code config against boma's capabilities and committed a
reproducible, leaner toolchain:

- .claude/settings.json now declares extraKnownMarketplaces + enabledPlugins so a
  fresh clone prompts to install the active set: superpowers, context7, terraform
  (we use TF, ADR-006), claude-md-management (doc/ADR-heavy). Drops code-simplifier.
- Adds a conservative, read-only/verify permissions allowlist (git status/diff/log,
  make lint/test/check, pytest, rbw unlocked, ls/cat/rg/find) — mutations and
  outward/destructive commands stay gated, consistent with ADR-002.
- docs/runbooks/claude-code-setup.md: per-machine bootstrap, the deferred
  enable-when plugins (security-guidance/semgrep, playwright, hookify, skill-creator),
  rbw/venv prerequisites, and a note to keep the dangerous-mode prompt on.

Closes TODO 10.7. Plugin install remains a per-machine /plugin action (no native
auto-install).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 21:41:54 +02:00
f22ff4b752 Add capabilities overview (docs/capabilities.md)
A living, high-level map of boma's intended capabilities, organised by domain
(10 domains) with tier tags (platform/app/support) and commitment tags
(core/planned/candidate/maybe-later), mirroring STATUS.md's honesty.

Built capabilities-first on boma's own terms per ADR-013 — V4 was consulted only
as a completeness check (last), not as a source of scope. The check confirmed
strong alignment with V4's actual services, surfaced re-justified gaps (ebook/
audiobook library, Collabora, Samba/NFS, service portal, self-hosted email as
maybe-later), and confirmed deliberate exclusions (V4's desktop/workstation roles;
the dropped Knowledge domain) — boma is server-only by design.

Frames downstream decisions (service picks, Netbird-vs-WireGuard reconcile with
ADR-007, node placement, plugin relevance) without resolving them. Linked from
CLAUDE.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 20:52:08 +02:00
68a37d51f1 Add ADR-014 (sourcing technical knowledge)
Policy for how agents source authoritative knowledge and translate it to boma:
- Facts (version-specific syntax/options) → consult version-matched official docs,
  cite, and stamp `verified: subject · tool ver · source · date`. Training memory
  is the draft, never the authority.
- Judgments (best practices) → evidence to translate through boma's principles
  (extends ADR-013); never adopted because the docs/a blog say so.
- Consultation is risk-based (security / unfamiliar-or-fast-moving / can't quote a
  flag with confidence), with a "from memory, unverified" transparency backstop.
- Sources: context7 → WebFetch upstream → claude-code-guide → deep-research; match
  the PINNED version, not "latest".
- Graceful degradation: the plugin accelerators may be absent on a fresh checkout,
  so the policy commits to the principle and falls back to WebFetch/WebSearch.

CLAUDE.md gains a concise operating rule + pointer. TODO 10.4 decided. New TODO
10.7 tracks making the plugin/MCP toolchain reproducible from the repo (it is not
synced by account nor carried by git) — surfaced by a portability check this run.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 20:07:18 +02:00
2f4218814a Reconcile image pinning to a tiered tag@digest rule
Resolve the conflict between ADR-011 (tags-not-digests) and the security work
(digest pinning) with one coherent rule that respects ADR-011's stateless/stateful
split:

- Stateful → pin `tag@digest` (readable tag + integrity digest): legible diffs AND
  tamper-evidence. Snapshots cover broken updates; the digest covers swapped images.
- Stateless → rolling tags (latest/stable); digest-pinning would defeat the rolling
  design. Integrity rests on official/verified images + disposability.

Aligned across ADR-011 (decision 2), ADR-004 (image management), ADR-002
(supply-chain row), accepted-risk R1, the service checklist, and TODO 15.6.
TODO 16.7 marked decided.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 19:21:36 +02:00
0e4050fa59 Add ADR-013 (V4 heritage policy); track ADR-011
ADR-013 sets how boma draws on AnsibleBaobabV4 without inheriting it:
translate-don't-transplant — V4 is evidence, never authority. It is a legitimate
source only of operational gotchas and working config snippets (re-derived on
boma's terms); never requirements, domain values, structure, or conventions.
Provenance stays transient (commits/conversation), durable docs stay clean. AI
consultation guardrails included. Resolves TODO 3.3 and 10.1.

Also bring ADR-011 (update management, Proposed draft) under version control:
- fix its "reuse V4's ntfy topics" line to "boma defines its own" (ADR-013)
- track its 6 open questions in TODO 16, plus a 7th: reconcile its tags-not-digests
  pinning with the digest-pinning the security work now mandates (R1 / checklist /
  15.6) — they currently conflict.

CLAUDE.md gains a V4 guardrail + ADR-013 pointer.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 19:07:48 +02:00
3b029352b6 Add per-service SECURITY.md convention; one role per service
Revise ADR-004 to a service-role standard: every service is its own
self-contained role with a required file set including SECURITY.md, uniform
deploy mechanics, and a deferred shared-engine option (with revisit trigger)
recorded in the ADR.

Add the per-service security record:
- docs/security/service-security-template.md — canonical SECURITY.md template
  (exposure, checklist status, service-specific hardening, residual risks)
- roles/<service>/SECURITY.md is where each service records how it meets the bar;
  /security-review aggregates roles/*/SECURITY.md and cross-checks against config
- service-checklist.md noted as the generic bar the record answers

Wire-up: new-role runbook step writes SECURITY.md from the template; ADR-002
governance bullet points at it; CLAUDE.md role conventions require it and mandate
one-role-per-service; STATUS records the convention as defined-not-yet-applied.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 16:09:33 +02:00
19dd89b875 Re-challenge accepted risks; adopt CIS hardening + IDS
Walked the seeded accepted-risk register (R1-R4) and turned inherited gaps into
deliberate decisions:

- Supply chain (R1): tightened to required baseline hygiene (digest pinning,
  official/verified images); active scanning deferred — stays an accepted risk
- CIS (R2): adopted as a positive decision — CIS Debian L1+L2 (base role) + CIS
  Docker (docker_host + service checklist); app layer via the checklist
- SELinux/AppArmor (R3): AppArmor becomes a baseline control (CIS-enforced);
  register keeps a clean "no SELinux" accept
- IDS (R4): adopt AIDE (baseline via CIS) + Suricata on OPNsense + active alerting

Register shrinks from 4 inherited gaps to 2 deliberate accepts. ADR-002 gains a
Hardening standard section; STATUS + TODO 15 track the (unbuilt) implementation,
including the CIS L2 partition impact on VM provisioning (ADR-006).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 15:15:39 +02:00
f338bccd46 Expand ADR-002 into a security baseline + strategy
Add a managerial security frame on top of the host baseline: explicit threat
model (opportunistic external, lateral movement/blast radius, operator/agent
error; supply chain accepted-lower-priority), security principles, and four
governance mechanisms that ADR-002 establishes and links out to:

- docs/security/service-checklist.md — per-service security bar (referenced
  from the new-role runbook)
- docs/security/accepted-risks.md — living accepted-risk register (R1-R4)
- planned /security-review skill (TODO 8.5)
- agent guardrails in CLAUDE.md "what Claude must not do"

STATUS.md records the frame as present (manual enforcement) and /security-review
as planned-not-built.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 14:39:51 +02:00
c57910eda8 Log Forgejo no-PR-workflow friction in FRICTION.md
Forgejo origin is trunk-based with no merge-request gate, so the
finishing-a-development-branch "open a PR" option doesn't apply — merge
locally then push. Also carries earlier uncommitted FRICTION.md edits
(emphasis normalization + 2026-05-31 ADR-status entry).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 11:22:26 +02:00
e12326148c Note latest.md report mirror in ADR-012
Final-review minor: the /capacity-review skill overwrites a latest.md
pointer alongside the dated report; record that in the ADR.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 10:40:16 +02:00
4c535c908e Record ADR-012 + STATUS/CLAUDE/scripts docs for capacity tooling
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-01 10:34:38 +02:00
1060a9c08a Add /capacity-review skill
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-01 10:32:07 +02:00
05694f6ea4 Complete capacity-scan.py: usage stub, subprocess glue, main()
Adds gather_usage() (stubbed, returns available:false), known_hostnames()
with graceful degradation when terraform/ansible-inventory are absent,
_run_json() helper, and main() that parses reference.md and emits JSON.
Three new TDD tests (12 total, all passing). Script exits 0 with valid
JSON even when no cluster is provisioned.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-01 10:30:45 +02:00
8ed00c9206 Add hostname parsers + find_drift() to capacity-scan.py
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-01 10:24:11 +02:00
b240fa8bfe Add compute_rollup() to capacity-scan.py
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-01 10:21:22 +02:00
07ecbb2789 Add capacity-scan.py with parse_table()
Implements the parse_table() function and pytest test harness for the
capacity-scan script. Tests cover header matching and graceful empty
return when the required header is absent.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-01 10:20:10 +02:00
3ea9109ba2 Add hardware reference doc skeleton + reviews dir
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-01 10:14:53 +02:00
6ff5d55810 Add implementation plan for hardware capacity tooling
Task-by-task TDD plan: reference.md skeleton, stdlib-only capacity-scan.py
(parse_table, compute_rollup, drift, usage stub, main), /capacity-review skill,
and ADR-012 + STATUS/CLAUDE/scripts-README updates.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 10:04:59 +02:00