Commit graph

31 commits

Author SHA1 Message Date
718781053f fix(dev_env): make concern tags reach included tasks (O8)
Dynamic include_tasks only filter on the include's own tags, not their
(untagged) contents — so `--tags packages` ran none of the neovim/oh-my-posh/
nodejs installs, and `--tags users|config` never entered per_user.yml. Add
`apply: tags:` to all four includes (mirroring base/tasks/main.yml) and tag the
dev_env__home getent+set_fact preflight `always` so a partial run still resolves
the home dir before the dotfile/stow tasks consume it.

Molecule: add a config-only converge play for a fresh user + a verify assertion.
Proven with `molecule converge -- --tags config` (idempotent, home resolved).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 19:06:15 +02:00
64f1e821d8 docs(review): 2026-06-14 repo audit — M4a doc drift + Traefik→Caddy lag
11 safe auto-fixes (docs/comments only): reverse_proxy meta stale DNS-01
description, base/playbooks/scripts/terraform/public_dns README build-state,
CAPABILITIES reverse-proxy Traefik→Caddy, README ADR list → 024, TF cax11→cx23
stamps, public_dns wildcard DNS-01→HTTP-01 comment. 29 open findings reported.
make lint green. No stale-deferred (ADR-011 open questions still open).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 18:37:54 +02:00
b7e919d6b3 refactor(reverse_proxy): vanilla Caddy + HTTP-01 (drop DNS-01 custom image)
Switch from a custom caddy-dns/gandi image built on-host to the official
caddy:2 image with per-host ACME HTTP-01 certificates. Removes the
Dockerfile, env.j2 (Gandi token), on-host image build/ship/load tasks,
the caddy-image Makefile target, and the wildcard DNS-01 Caddyfile.
Each route now gets its own server block and automatic certificate.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 18:11:20 +02:00
50b6445bdd feat(reverse_proxy): Caddy role (Gandi DNS-01, on-host image build, route catalog)
Implements the Caddy reverse proxy role (ADR-024): builds boma/caddy-gandi:latest
on-host (caddy-dns/gandi plugin), renders Caddyfile from route catalog, brings
Compose project up. Adds community.docker to requirements.yml, production group_vars,
and a caddy-image Makefile target.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 17:36:58 +02:00
456c27d12b feat(docker_host): install Docker engine + compose plugin
Implements the docker_host role tasks: prerequisites, /etc/apt/keyrings
directory (ordered before the GPG key write), Docker APT key + repo, and
docker-ce/cli/containerd.io/compose-plugin install. Daemon hardening and
nftables.d integration remain deferred to Phase 2 (cluster + base firewall).
Updates defaults, README, and molecule verify to assert docker --version.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 17:28:51 +02:00
db1e5db138 fix(base): propagate hardening tag to included tasks; check-mode-safe fail2ban
Two bugs caught by the live make check/deploy on askari:
- include_tasks with a tag selects the include but NOT its tasks, so --tags hardening
  ran nothing. Use apply: {tags:} to propagate (also fixed the firewall include).
- fail2ban service start + restart handler fail in a first-run --check (package not
  installed yet); guard both with when: not ansible_check_mode so check is clean.
Applied to askari: SSH hardened, fail2ban active, ping still works (no lockout).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 16:54:23 +02:00
a111a20cc8 test(base): Molecule coverage for ssh hardening + fail2ban
Add explicit base__ssh_authorised_keys: [] default to prevent
undefined-var errors in Molecule. Extend verify.yml with sshd
drop-in validation, PasswordAuthentication check, and fail2ban
jail assertion. Pre-create /run/sshd in ssh.yml so sshd -t
works in containers before the service has ever started.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 16:47:42 +02:00
deec75de0f feat(base): ssh hardening + fail2ban (hardening concern, ADR-002)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 16:42:56 +02:00
76bd1d63fc fix(public_dns): index loop keys with item['key'] not item.key
item.values resolved to the dict's built-in .values() METHOD, not the 'values'
key, so gandi_livedns received '<built-in method values of dict object at 0x..>'
as the TXT value — garbage AND non-idempotent (the address changes each run).
Bracket-index all loop fields. Caught only by the live apply (apply=false Molecule
+ data-only pytest both missed it).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 10:57:23 +02:00
078d1ad9d9 fix(public_dns): drop null-MX (Gandi rejects '0 .'); remove MX instead
Gandi LiveDNS rejects the RFC-7505 null-MX value '0 .' ('invalid format for MX
record'), which failed the live apply. No MX + no apex A = no mail delivery, and
SPF -all + DMARC reject still prevent spoofing — so remove Gandi's seeded MX (add
@/MX to absent) rather than declare a null-MX present. Assert now requires an SPF
@/TXT record; tests + Molecule sample updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 10:53:54 +02:00
e247af6e55 test(public_dns): Molecule scenario (apply disabled, no live API)
Converge runs in CI; the no-op apply=false scenario adds no local signal over
the pytest, and the test image is on an unreachable registry.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 10:36:40 +02:00
bd84dd0213 feat(public_dns): role tasks, defaults, meta, README
Implement M1: manage wingu.me public DNS zone at Gandi LiveDNS via
community.general.gandi_livedns (PAT from vault.gandi.pat). Adds
assertion guard for domain + null-MX, present/absent record loops
with run_once, and apply-gate for Molecule dry-run mode.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 10:34:42 +02:00
70c302d7e5 scaffold(public_dns): empty role structure
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 10:30:02 +02:00
03d33f83dd fix(O1): scaffold docker_host role so make lint passes on main
playbooks/site.yml imports the docker_host role, but it didn't exist, so
ansible-lint's syntax-check failed on a clean checkout — breaking CLAUDE.md's
"main must always work" / "Never skip lint" (top open finding O1 from the
2026-06-11 review).

Scaffold docker_host as a proper placeholder via the prescribed mechanism
(make new-role): filled meta/main.yml + README, an honest no-task tasks/main.yml
documenting planned scope (Docker engine + Compose, daemon hardening, nftables.d
container rules per ADR-004/020), and the standard molecule scenario. This
preserves site.yml's full-standard-state intent rather than dropping the play.

Update STATUS.md (docker_host moves from "Not in git" to "scaffolded, no tasks")
and the role/playbook READMEs to match.

make lint: 0 failures, 0 warnings; check-tags OK.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 14:53:55 +02:00
1da117d65b docs(review): 2026-06-11 repo audit — fix build-wave doc drift
/review-repo run at 67f2aba. Auto-fixed 5 safe doc-drift items left by the
base(firewall)+dev_env build wave: README/playbook/role notes that still called
the roles "empty/not built", plus README tree gaps and the reciprocal ADR-021
cross-links in ADR-016/020.

18 open findings reported (not fixed). Headline: `make lint` is red on `main`
(site.yml imports the non-existent docker_host role) and an ADR-004 <-> ADR-022
backup-scope contradiction. Deferral checklist clean (0 stale-deferred); 7 of
12 prior findings confirmed resolved. See docs/reviews/2026-06-11-review.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 14:48:00 +02:00
aea4f8c3d6 dev_env: install Node.js from pinned tarball, drop npm bloat
Debian's npm package pulls a ~400-package node-* tree (the first deploy
installed 527 packages). Replace apt nodejs+npm with a pinned upstream Node
tarball (v20.19.2) installed to /opt + symlinked, mirroring the nvim install
pattern (ADR-014 pinning). npm/npx come bundled. Molecule verifies node/npm
on PATH; lint + idempotent converge green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 14:21:33 +02:00
607423d0e7 dev_env: install acl for become_user file copies
When the login user differs from the become_user (ubongo connects as sjat,
the role copies files as claude), Ansible needs ACLs on its temp files;
without the acl package it falls back to an unsupported chmod syntax and
fails. Molecule didn't catch it (root login can chown directly).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 14:09:12 +02:00
f3f382ae69 Add dev_env role: zsh/tmux/nvim for workstation-class hosts
A new role (separate from base) that gives workstation-class hosts (ubongo
now, mamba later) a clean interactive environment: zsh + oh-my-zsh +
oh-my-posh, tmux + TPM plugins, and neovim. Dotfiles are real files deployed
via GNU stow (not templated); pinned nvim v0.12.2 + oh-my-posh 29.0.1.

Configs re-derived (ADR-013) from AnsibleBaobabV4 + the operator's fisi setup
on boma's terms: no Nerd Font (headless host), no system LSP suite (nvim uses
mason), versions pinned (V4 tracks latest). Applied via playbooks/workstation.yml
to the control group for users sjat + claude. Lint + Molecule (idempotent) green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 13:50:11 +02:00
0c507bbace feat(base): add ssh-from-control management-plane source (ADR-021)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-09 17:43:55 +02:00
402913efb3 fix(base): make rollback snapshot restorable (flush-prefixed)
Bare 'nft list ruleset' has no leading flush, so the timer's 'nft -f rollback'
was a no-op on first apply (empty file) and errored ('table exists') on later
applies — the auto-rollback silently did nothing, defeating the askari lockout
safeguard. Prepend 'flush ruleset' so the revert is atomic + self-contained.
Verified the snapshot->lockout->revert round-trip in an isolated netns.
Also fix stale STATUS prose (base is partially built, not absent).
2026-06-06 19:15:38 +02:00
6fb104e934 test(base): molecule verify asserts rendered firewall rules + nft -c
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 19:07:24 +02:00
b006196cc5 fix(base): confirm firewall apply over a FRESH connection
established/related keeps the in-flight session alive across the swap, so the
prior 'next task runs' confirm always passed even if new connections were
bricked — the rollback was theater. reset_connection + wait_for_connection now
force a fresh handshake through the new ruleset; failure aborts the play and the
armed timer reverts. (meta: reset_connection ignores 'when' — benign extra
reconnect on no-op runs; verified idempotent in molecule.)
2026-06-06 19:06:39 +02:00
026a29f609 feat(base): safe nftables apply with systemd-run auto-rollback
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 19:03:58 +02:00
bca74458fb fix(base): iifname for load-time safety; zone-source molecule fixture
nft -c rejects iif "wt0" when the interface is absent (container, or any host
before NetBird); iifname matches by name and is robust to wt0 coming/going.
Drop the ansible_host fixture override (the docker connection uses it as the
container name) — molecule covers zone resolution, pytest covers service->IP.
2026-06-06 19:02:50 +02:00
eeab5ed8de feat(base): render nftables ruleset from catalog (+ molecule fixture)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 18:57:44 +02:00
7dae93e4e1 fix(base): firewall resolver fails fast on empty/malformed sources; cover hosts: + proto default
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 18:56:04 +02:00
4127f8bc6b feat(base): firewall catalog resolver filter plugin + tests
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 18:51:10 +02:00
390cd3b335 feat(base): shared firewall catalog/zones + firewall defaults
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 18:49:40 +02:00
2486e31f7d feat(base): scaffold role + meta/README (firewall concern incoming)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 18:48:35 +02:00
93f2a847c7 Reconcile CI to trunk-based; mark base/docker_host not-built (R6-R8,R15-R16)
R6/R7: ADR-003 & ADR-008 CI pipelines rewritten trunk-based (push to main ->
test -> staging -> [manual gate] production); CLAUDE.md no longer forbids pushing
to main. R8: STATUS/roles-README/site.yml now say base & docker_host are not built
(not in git), so a clean clone errors. R15/R16: ADR-001 table flagged as intended
design; dropped the unbuilt 'monitoring agent' from the baseline.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 19:32:37 +02:00
3f1d7eb128 Add core Ansible scaffold, tooling, and pre-commit guards
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 14:10:01 +02:00