wait_for_ip now tries --source lease first then --source arp; both produce
identical output handled by parse_lease_ip. Removes the suid leaseshelper
dependency introduced and backed out in Task 3. New unit test confirms
parse_lease_ip works on --source arp output format.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Three fixes found during askari_inputonly integration-test development:
1. Hostname sanitization: cloud-init rejects underscores in local-hostname
(silently skips network-config → VM never gets DHCP). Sanitize with
name.replace("_", "-") for the meta-data hostname; paths/domain names
keep the original (underscore is valid there).
2. Netplan explicit interface: match.name: en* with a named key produces a
.network file that networkd never DHCPs. Use explicit enp1s0 (all virtio
NICs in these KVM VMs) + renderer: networkd to bypass the bug.
3. ansible_ssh_common_args in the generated hosts.yml: integration VMs
reuse IPs (different VMs at same 192.168.150.x lease). StrictHostKey
accept-new from ansible.cfg blocks changed keys. Add StrictHostKeyChecking=no
+ UserKnownHostsFile=/dev/null per-host to the generated inventory so
stale known_hosts entries never block the apply step.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The Makefile prepends .venv/bin to PATH (so the venv's ansible tools resolve),
but virt-install's `#!/usr/bin/env python3` shebang then resolved to the
isolated venv, which lacks system PyGObject (gi) -> ModuleNotFoundError. Strip
.venv/bin from PATH for the virt-install call so its shebang finds
/usr/bin/python3 (which has gi); ansible runs via its absolute .venv path and is
unaffected. Surfaced running `make test-integration HOST=ubongo`.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- ADR-023 §4: ADR-015 no-sudo sub-decision now Superseded-by ADR-025 (bidirectional), not just an in-place amendment.
- STATUS: drop the deferred `reset` verb; honest integration_test (molecule not run in this env; applied to ubongo) + verify (forward/DNAT, not wt0); RED->GREEN validated.
- driver: remove unused `import shutil`.
- README: fix the ADR-025 link filename.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Running the harness leaves tests/integration/.run/ (gitignored, generated); exclude it from yamllint + ansible-lint so a post-run 'make lint' passes. Also emit a --- doc-start in the generated inventory.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
cloud-init package_update:true + block on 'cloud-init status --wait' in up() so apply sees populated apt lists (fresh genericcloud images ship empty lists); dump_diagnostics()/console() read the root:0600 serial log via sudo instead of shutil.copy, which raised PermissionError mid-diagnostics.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Debian 13 genericcloud image triple-faults at the legacy real-mode kernel
handoff under SeaBIOS/q35 (boot-loops at GRUB, no 'Decompressing Linux', no DHCP
lease). Booting via UEFI (OVMF -> efistub) bypasses the legacy entry and boots
cleanly: cloud-init runs, DHCP lease obtained, SSH reachable. Verified end-to-end.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Don't rely on the genericcloud image's network fallback; the seed now carries a
network-config forcing dhcp4 on en* interfaces. A correct prerequisite for the VM
to network once cloud-init processes the seed. (Note: a separate no-DHCP-lease
issue on first real boot is still under investigation — the guest isn't networking
and, under the no-sudo claude model, the VM console/logs aren't introspectable
without libguestfs; see next steps.)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Under qemu:///system the hypervisor runs as libvirt-qemu, which cannot traverse
/home/claude — so the overlay/seed/console must live in /var/lib/boma-integration
(group libvirt, world-traversable, created by the integration_test role), not the
repo/home RUN_DIR. The inventory (hosts.yml + group_vars symlink, read by ansible
as claude) stays in RUN_DIR. Verified: virt-install now creates the domain.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bare virsh/virt-install default to qemu:///session for a non-root caller, but
the substrate, /dev/kvm, and the boma-it NAT network live on the SYSTEM libvirtd.
Pin the URI so the driver targets system regardless of who runs it.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The driver passed -i <RUN_DIR>/ (a directory); ansible's directory-inventory
loader then parsed sibling files (notably 'current', which holds the real host
string 'askari') as INI inventory, creating phantom hosts incl. the real askari
with its full hostvars — violating the single-host safety invariant (and a hard
error in ansible 2.18 on the binary qcow2/seed files). Point -i at the single
hosts.yml file; ansible still loads the adjacent group_vars symlink. (review C1)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
scripts/registry-login.sh reads vault.forgejo.registry_token and pipes it to
docker login --password-stdin (never echoed, never on argv); 'make registry-login'
wires it with the venv binaries. Adds the operator-minted CHANGEME vault stub
(fill via make edit-vault) and a per-machine prereq note in the claude-code-setup
runbook, so 'make caddy-image-push'/'molecule-image-push' become agent-completable
non-interactively. Consumes the 2026-06-15 signal in docs/FRICTION.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
When a numbered ADR announces a rename Old->New, flag design-doc lines where
Old still appears in present tense — skipping the announcing ADR, lines that
also name New, and historical/negation cues, and rejecting ADR-NNN tokens as
terms. Structural cousin of stale-deferred; run by /review-repo. Zero findings
on the current tree (the Traefik->Caddy ripple edits have landed). Consumes the
2026-06-14 KEEP-OPEN signal in docs/FRICTION.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Add REPO_DIRS constant; still_exists now only checks tokens that start
with a known repo top-level dir, ignoring plugin names (caddy-dns/gandi),
make command fragments (tf-init/plan), and role-relative paths.
- Add test_still_exists_ignores_non_repo_tokens (was failing before fix).
- Add test_nudge_line_overdue_on_age to close coverage gap on age threshold.
- Add load_signals docstring.
- Replace manual --today date parsing with datetime.date.fromisoformat type
converter so malformed dates give a clean argparse error.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- ADR-007: document ubongo on the legacy V4 net at 10.20.10.151 (transitional,
outside the planned srv /24 until the LAN is re-cut) (O10); single authoritative
boma.baobab.band -> boma.wingu.me transition note already added earlier
- terraform tfvars.example + variables.tf (both envs): pve01 -> pve0 and
<host>.boma.baobab.band per ADR-007 naming (O11)
- ADR-012/013/015/016/017/018: convert "See also:" prose to `## Related` sections
placed after Consequences, matching ADR-014/019-023 (O13)
- docs/README + inventories/README: list the missing subdirs / offsite_hosts +
offsite.yml merge behaviour (O14, O29 note)
- ADR-009: drop the retired `nyumbani` example; use vaultwarden.wingu.me split-horizon (O19)
- ROADMAP M2: askari shipped as cx23/x86 (CAX11/ARM out of stock) (O20)
- ADR-020: 80/443/3478 opened in M4a (past tense); coordinator role is M4b (O21)
- netbird -> netbird_coordinator across ROADMAP M4b, the M4b plan, ADR-024 (O23)
- ADR-024: align the M1 DNS-01 wildcard scope wording with ROADMAP (O24)
- capacity-scan.py: read the inventory directory so offsite.yml (askari) is seen (O28)
- tf_to_inventory.py: generated header now warns it overwrites the manual control node (O9)
- tests/tags.yml: proxy concern comment Traefik -> Caddy (missed in the O3 sweep)
O9's existing stub hosts.yml header stays as-is (generator-owned, hook-protected);
the fix lives in the generator for the next regeneration. make lint + pytest (57) green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
11 safe auto-fixes (docs/comments only): reverse_proxy meta stale DNS-01
description, base/playbooks/scripts/terraform/public_dns README build-state,
CAPABILITIES reverse-proxy Traefik→Caddy, README ADR list → 024, TF cax11→cx23
stamps, public_dns wildcard DNS-01→HTTP-01 comment. 29 open findings reported.
make lint green. No stale-deferred (ADR-011 open questions still open).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Streamline the recurring secret-entry friction: the agent stubs a needed secret as
vault.<service>.<key>=CHANGEME with a what/how-to-obtain comment, wires the code,
and commits; the operator fills it via make edit-vault (real value never hits chat).
check-vault now lists outstanding CHANGEME placeholders so none are forgotten.
Convention documented in CLAUDE.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`make edit-vault` runs `ansible-vault edit` (decrypt → nvim → re-encrypt on :wq,
abort on :cq) so editing the vault is one step with no plaintext left in the work
tree, then validates structure. `make check-vault` runs scripts/check-vault.py:
decrypts in-memory, asserts valid YAML with secrets under the nested `vault:` map
and no empty leaves, and prints a values-masked structure view (comments visible,
secrets never printed). Both default to the production all-vault; override VAULT=.
Update the vault header comment, CLAUDE.md (command table + Secrets section), and
scripts/README to point at edit-vault (note check-vault.py is the one venv-
dependent helper, by design).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Final-review polish: demote the sub-headings under the demoted 'IP addressing'
(007) and 'Three testing levels'/'What Molecule tests' (008) to #### so they
nest correctly instead of flattening to siblings. Tighten the adr-structure
Superseded pattern to require '(YYYY-MM-DD)' per ADR-023.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Revisits the lifecycle decision on the evidence of ADR-011 (a real draft
with open questions). Adds a fourth state, Proposed (YYYY-MM-DD), to ADR-023,
the template, the adr-structure check (+test), spec and plan. Sets ADR-011's
Status to Proposed and removes its now-redundant inline 'Proposed' line.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Flags numbered ADRs missing a mandatory section (Status/Context/Decision/
Consequences) or with an unparseable Status line. Presence only, not order.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds role_tag_problems() to check-tags.py: every role imported in a
play's roles: block must carry its own role name as a tag (extra tags
allowed; templated role names skipped). Wires the check into main() so
make lint catches violations. 6 new unit tests (29 total, all passing).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- broken-path-ref: skip template/generated-report paths — a placeholder
(<service>) immediately following the match, a YYYY-MM-DD date token, or a
path under a generated-report reviews/ dir (14 -> 0 on the current tree).
- marker: skip numbered-backlog references (TODO 8.2, TODO-3.1, TODO (2.2,
TODO item 16) which point at the backlog, not code markers (35 -> 2; the
remaining two are literal "TODO:" strings in a plan doc). Real code markers
(TODO:, FIXME, etc.) still caught — verified with a synthetic fixture.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Review O4: ADR-016 said askari gets "its own inventory group" but never named it.
Settled as offsite_hosts (off-site, distinct from on-site-but-off-cluster ubongo).
Added to VALID_GROUPS (tf_to_inventory.py), ADR-009 valid groups, ADR-001/ADR-016
host-group enumerations, and CLAUDE.md. Generated hosts.yml picks up the section on
the next make tf-inventory (a manual-exception group like control).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
repo-scan.py now enumerates open ADR "Deferred/Open" items and flags any that
another file describes as resolved but which isn't marked resolved in place
(the recurring miss in docs/FRICTION.md). review-repo.md's Phase 2 reviewer
confirms each open item against later ADRs/STATUS.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds gather_usage() (stubbed, returns available:false), known_hostnames()
with graceful degradation when terraform/ansible-inventory are absent,
_run_json() helper, and main() that parses reference.md and emits JSON.
Three new TDD tests (12 total, all passing). Script exits 0 with valid
JSON even when no cluster is provisioned.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements the parse_table() function and pytest test harness for the
capacity-scan script. Tests cover header matching and graceful empty
return when the required header is absent.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
First /review-repo run on boma. Hardened repo-scan.py (no TODO.md/prose false
positives). Applied 7 safe fixes (DNS staleness x2, STATUS factual correction,
hosts.yml path generalisation, trunk-based wording x2, scripts/README). Recorded
the run and 17 open findings in docs/reviews/2026-05-30-*.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
New on-demand repo audit: scripts/repo-scan.py does the cheap deterministic
checks (markers, broken refs, unencrypted vaults) and inventory; the command
fans out judgement reviewers across four dimensions, applies only safe/obvious
fixes, and writes a tracked report to docs/reviews/. Cron + email deferred.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Master vault password is fetched from Vaultwarden via the rbw agent
(scripts/vault-pass-client.sh, wired as vault_password_file) instead of a
plaintext .vault_pass. Vault secrets use a nested vault.<service>.<key> map.
Encrypted vault.yml files are excluded from lint. Includes the host rename in
Makefile and STATUS.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>