Two tasks: a base mesh coordinator-FQDN /etc/hosts pin (Molecule TDD) + the accept-and-document docs (R8, ADR-016 availability amendment, STATUS/ROADMAP). Coordinator backup deferred to ADR-022.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sub-project 3 of the mesh-hardening follow-on. Accepts the single off-site coordinator as a documented availability SPOF (R8 + ADR-016 amendment) given the narrow blast radius (LAN/intra-cluster/local traffic unaffected; only remote relayed mesh access breaks). Hardens the one real gap: a base mesh coordinator-FQDN /etc/hosts pin so managed hosts survive a local-DNS hiccup. Coordinator off-site backup explicitly deferred to an ADR-022 kickoff (no throwaway infra).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Four tasks: netbird_coordinator geolocation disable (TDD via Molecule) -> inventory enablement (INPUT-only firewall + WAN break-glass + manage over wt0) -> an askari_inputonly integration profile (the reboot-safety GREEN gate) -> the operator-gated supervised live cutover + STATUS/ROADMAP update. Tasks 1-3 are autonomously implementable; Task 4 is operator-gated (live off-site host, lockout risk).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Redesign of the backed-out 2026-06-17 askari SSH->wt0 attempt. Mirrors the proven ubongo 2/3 pattern (INPUT-only default-deny, SSH scoped by iifname wt0, no sshd ListenAddress change -> no boot-race) and adds the coordinator-host exception the incident demanded: a permanent non-mesh break-glass (WAN :22 from ubongo's static WAN IP + the Hetzner console), WAN :22 deliberately left open. Folds in the netbird_coordinator geo-DB robustness fix (FRICTION #4) so a transient egress blip can't FATAL the control plane. Harness-GREEN gate before a supervised live cutover.
Operator decision (2026-06-19): do this redesign first, then a separate sub-project to reduce askari's SPOF role.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Allow a second operator workstation (10.20.10.17) onto ubongo's LAN SSH
alongside mamba (10.20.10.50). Both are raw DHCP leases; recorded a FRICTION
open signal to replace them with MAC-pinned OPNsense reservations when
OPNsense-as-code lands (ADR-020 / TODO 3.5).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Five tasks: base knobs (input-only forward policy + admin-addr SSH allow,
TDD via Molecule) → enable on the control group → a 'be ubongo' integration
profile (profile-aware verify) → the real-VM harness GREEN gate → the
operator-supervised live cutover (signal-6 order, physical-console break-glass).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sub-project 2 of the mesh-hardening follow-on (the post-incident roadmap
ordering puts ubongo first). Harden the control node's inbound surface via
base's nftables firewall as INPUT-only default-deny: the forward chain stays
permissive (new base__firewall_input_only knob) so Docker egress + the
libvirt-NAT integration harness keep working, and there is no sshd ListenAddress
change — sidestepping the ip_nonlocal_bind boot-race that sank askari. SSH
allowed from wt0, ssh-from-control (Ansible self), and mamba on the LAN (new
base__firewall_admin_addrs). Harness-validated before an operator-supervised
cutover; the physical console is the permanent break-glass.
Design maps to the four relevant 2026-06-17 incident lessons (FRICTION signals
1/2/3/6).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Throwaway KVM VMs on ubongo (libvirt, Approach A) that mirror a real host (real Docker, real reboot, real role apply) to catch the reboot/firewall/boot-order class Molecule cannot - the 2026-06-17 mesh-hardening incident. First profile: be askari; tiered certs (internal + le-staging built, le-prod-wildcard on-demand). Concrete build of ADR-008 Level 2/3; to be recorded as ADR-025.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
5 tasks: base sshd ListenAddress+ip_nonlocal_bind (Molecule), firewall public
zone + askari catalog, inventory wt0 override, TF retire WAN :22, then the live
operator-supervised staged cutover.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Decomposes the M5 mesh-hardening follow-on into 3 independent sub-specs; this
is sub-project 1. Three-layer SSH-on-wt0 (sshd ListenAddress=mesh + nftables
iifname wt0 + retire the Hetzner WAN :22), ip_nonlocal_bind to beat the
post-boot wt0 bind race (fail-closed), live wt0 fact for the listen addr,
staged cutover with the firewall auto-rollback as the safety gate.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
base 'mesh' concern enrols NetBird agents on ubongo + askari via a reusable scoped
setup key (vault); laptops enrolled by the operator. Reachability via the default
peer policy; the base nftables default-deny on ubongo + ACL tightening are deferred
to a follow-on. Resolves ROADMAP M5 design; next: writing-plans.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Curate-only consume pass over FRICTION.md Open signals: interactive guided
session, add/change/park/remove verdicts (park-with-resurrection-trigger to
protect out-of-phase tooling on a solo project), single source = FRICTION.md,
ledger is the durable record. Mirrors /review-repo (command md + stdlib scanner).
Stage 1 on-demand + stage-2 nudge; headless/cron deferred (TODO 11.3).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- ADR-007: document ubongo on the legacy V4 net at 10.20.10.151 (transitional,
outside the planned srv /24 until the LAN is re-cut) (O10); single authoritative
boma.baobab.band -> boma.wingu.me transition note already added earlier
- terraform tfvars.example + variables.tf (both envs): pve01 -> pve0 and
<host>.boma.baobab.band per ADR-007 naming (O11)
- ADR-012/013/015/016/017/018: convert "See also:" prose to `## Related` sections
placed after Consequences, matching ADR-014/019-023 (O13)
- docs/README + inventories/README: list the missing subdirs / offsite_hosts +
offsite.yml merge behaviour (O14, O29 note)
- ADR-009: drop the retired `nyumbani` example; use vaultwarden.wingu.me split-horizon (O19)
- ROADMAP M2: askari shipped as cx23/x86 (CAX11/ARM out of stock) (O20)
- ADR-020: 80/443/3478 opened in M4a (past tense); coordinator role is M4b (O21)
- netbird -> netbird_coordinator across ROADMAP M4b, the M4b plan, ADR-024 (O23)
- ADR-024: align the M1 DNS-01 wildcard scope wording with ROADMAP (O24)
- capacity-scan.py: read the inventory directory so offsite.yml (askari) is seen (O28)
- tf_to_inventory.py: generated header now warns it overwrites the manual control node (O9)
- tests/tags.yml: proxy concern comment Traefik -> Caddy (missed in the O3 sweep)
O9's existing stub hosts.yml header stays as-is (generator-owned, hook-protected);
the fix lives in the generator for the next regeneration. make lint + pytest (57) green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Capture NetBird's configure.sh reference for a pinned version → translate into
boma role templates (compose + management.json + dex/openid + turnserver),
external-proxy mode behind the M4a Caddy (netbird.askari.wingu.me). First service
role: full ADR-004 standard files; secrets generated/CHANGEME-stubbed (setup key
for M5). Gated live deploy + verify.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
First of M4's two build phases: docker_host (Docker engine), custom xcaddy Caddy
image (caddy-dns/gandi), reverse_proxy role (Caddyfile from a route catalog,
DNS-01 wildcard cert for *.askari.wingu.me via vault.gandi.pat), ADR-024 (Caddy is
boma's reverse proxy), firewall 80/443 + DNS, proven by serving a test route over
TLS. M4b (NetBird) follows, reading NetBird's current self-host compose then.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Caddy becomes boma's standard reverse proxy (amends the soft Traefik assumption;
new ADR) with Gandi DNS-01 certs (custom xcaddy image, reuses vault.gandi.pat) —
the only cert path for mesh/LAN-only services. NetBird self-hosted in
external-proxy mode (embedded Dex), compose rendered from boma templates
(ADR-004/013). Three roles: docker_host (first real content), reverse_proxy (new,
Caddy), netbird (first service role w/ full ADR-004 standard files). Firewall +
DNS amendments; backup execution deferred (fisi). caddy-dns/gandi + NetBird
self-host facts verified.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ADR-002 baseline (key-only, no root, fail2ban 5/1h) as two base task files under
the existing 'hardening' concern tag; applied to askari by tag (NOT the host
firewall — that's mesh-gated to avoid lockout; Hetzner Cloud Firewall is the
perimeter until M5). NetBird agent deferred to M4. Adds a LIMIT=/TAGS= passthrough
to make check/deploy.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bite-sized TDD plan: add community.general; scaffold public_dns; wingu.me record
data + pytest; role tasks (gandi_livedns present/absent loops, apply toggle);
Molecule (apply=false, no live API); dns.yml play; gated live run on ubongo
(purge Gandi defaults + anti-spoof baseline + dig verify); ADR-007 amendment +
TODO 4 resolution + STATUS/CAPABILITIES.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
askari is provisioned as IaC: Terraform owns its existence too, generalizing
ADR-006 from "Proxmox VM existence" to Proxmox + Hetzner (new hetznercloud/hcloud
provider, hetzner_vm module, offsite stack with local state). CAX11 (ARM) in
Helsinki on Debian 13, behind a TF-managed Hetzner Cloud Firewall (SSH-from-ubongo
now; NetBird ports in M4). Token via TF_VAR_hcloud_token from vault.hetzner.token.
Handoff stays ADR-009-shaped (tf_to_inventory.py extended to emit askari into
offsite_hosts). State in the ADR-022 backup scope; DR via terraform import.
Amends ADR-006/009/020/007/016. Point ROADMAP.md M2 at the spec.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Decided to keep the project named boma with wingu.me as its domain (boma was not
available as a domain). Record why the infra tier reads <host>.boma.wingu.me so it
isn't re-litigated; folds into the ADR-007 amendment.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
boma's domain is wingu.me (registered at Gandi; 'wingu' = Swahili for cloud).
Replace the parametric <boma-domain> placeholder with wingu.me throughout. The
zone was NOT empty — Gandi auto-seeded 13 default records (parking A, www redirect,
a full Gandi mailbox set), so M1 includes a one-time purge to a clean baseline plus
an anti-spoof null-mail set (null MX, SPF -all, DMARC reject) since wingu.me sends
no mail. Domain-pick open item closed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Settles the M1 design: full registrar transfer Cloudflare -> Gandi; three-tier
naming scheme (host.boma / service.bare / service.askari), nyumbani dropped,
mesh/LAN-only default; public-DNS-as-code via a control-node `public_dns` role
driven by group_vars data, using community.general.gandi_livedns with a PAT
(api_key is deprecated/rejected by Gandi — verified per ADR-014). Stale records +
unused MX cleaned by omission. Cert scope is DNS+PAT only (issuance deferred to
M4/Phase 2). Human/agent division of labour + token-scoping recorded.
Resolves TODO 4 and review finding O12 once the ADR-007 amendment lands. Point
ROADMAP.md M1 at the spec.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A new role (separate from base) that gives workstation-class hosts (ubongo
now, mamba later) a clean interactive environment: zsh + oh-my-zsh +
oh-my-posh, tmux + TPM plugins, and neovim. Dotfiles are real files deployed
via GNU stow (not templated); pinned nvim v0.12.2 + oh-my-posh 29.0.1.
Configs re-derived (ADR-013) from AnsibleBaobabV4 + the operator's fisi setup
on boma's terms: no Nerd Font (headless host), no system LSP suite (nvim uses
mason), versions pinned (V4 tracks latest). Applied via playbooks/workstation.yml
to the control group for users sjat + claude. Lint + Molecule (idempotent) green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Captures the interactive build decisions (no-encryption + accepted risk,
simple partition, dedicated claude identity, LAN-only access, pinned
versions) and the A-F + H task breakdown. Sequel to the 2026-06-05
docs-only ADR-015 plan.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Revisits the lifecycle decision on the evidence of ADR-011 (a real draft
with open questions). Adds a fourth state, Proposed (YYYY-MM-DD), to ADR-023,
the template, the adr-structure check (+test), spec and plan. Sets ADR-011's
Status to Proposed and removes its now-redundant inline 'Proposed' line.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replaces the Status-only backfill with a faithful presentational
restructure bringing the whole back-catalogue to 4-section conformance
(no grandfathering). Adds the faithfulness rule and per-file worklist.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Codifies the structure ADRs 019-022 converged on, pins an
Accepted/Superseded/Deprecated lifecycle with a no-silent-rewrite rule,
adds an adr-template.md scaffold, and plans a Status-header backfill of
ADRs 001-018. Basis for ADR-023.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Data-only restic backups, rebuild-from-code recovery (Model A); central
off-cluster pull node (fisi) with 8TB mirror; 3-2-1 via pCloud (rclone)
+ rotated USB air-gap. Per-service backup__* contract + BACKUP.md as a
hard convention. Two-tier restore testing (ubongo container restore-verify
+ semi-annual staging DR rehearsal). One restic password escrowed to
Vaultwarden + paper (restic + vault passwords) for a non-circular
break-glass. Dead-man's-switch alerting via Uptime Kuma.
Resolves TODO 3.8; grounds ADR-011's backup-first assumption.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Splits the work into Tranche A (land now: ADR-021, ADR-016/020
reconciliation, ssh-from-control firewall source, ACCESS.md template,
/check-access command, governance + index wiring) and Tranche B
(build-pending on infra: per-service access__* + rendered ACCESS.md,
/check-access running).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Brainstorming spec for ADR-021: operational access as a deployment
deliverable. Two layers (host baseline + per-service), a three-tier
access ladder (mesh SSH -> LAN SSH from ubongo -> console break-glass),
declarative access__* data rendering ACCESS.md and driving a
/check-access verifier. Resolves TODO 3.2 (API access) and 7.2 (host
access); amends ADR-016 (SSH also from ubongo) and ADR-020
(ssh-from-control source).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Task-by-task docs plan: author ADR-018 and reconcile ADR-002, accepted-risks
(R4), CAPABILITIES, ADR-012, STATUS, TODO, CLAUDE.md. Roles/pipeline deferred
on the base + service-role machinery.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
All logs -> on-cluster Loki for troubleshooting/trends; a security-relevant
subset also ships write-only off-site to askari (append-only, tamper-resistant
against full-cluster compromise); skip WORM (accepted-risk R4). Alloy agent in
base; loki/grafana service roles; disk-wear handled as a design parameter.
Basis for ADR-018.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Task-by-task: author ADR-017, expand ADR-008 Level 4, create the VERIFY.md
template + /verify-service skill, and reconcile the checklist/CLAUDE.md/
gitignore/STATUS/TODO. Buildable-now artifacts; live run stays deferred.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Resolves ADR-015 deferred item #2 + TODO 2.2/2.3: a Claude-driven exploratory
browser harness (/verify-service) that exercises staging service UIs through
real SSO, backed by a per-service VERIFY.md, with test users in staging
Authentik and a manual-test handoff. Basis for ADR-017.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>