diff --git a/STATUS.md b/STATUS.md
index 134a8bb..88907a3 100644
--- a/STATUS.md
+++ b/STATUS.md
@@ -28,7 +28,7 @@ _Last reviewed: 2026-06-11._
 | Tag standard + enforcement (ADR-019) | Works — `tests/tags.yml` (closed vocabulary) + `scripts/check-tags.py` (run by `make lint`, unit-tested): enforces the tag vocabulary and that each role import in a play's `roles:` block carries its role-name tag. Governs mostly-unbuilt roles, but the linter is live now. Proxmox VM tag convention (``, group, `managed-by=terraform`) is in the Terraform HCL but unprovisioned. |
 | `roles/dev_env/` — interactive developer environment | **Built + applied.** zsh + oh-my-zsh + oh-my-posh, tmux + TPM plugins, neovim; dotfiles deployed via GNU stow (re-derived from V4/fisi per ADR-013). Node.js from a pinned upstream tarball (not Debian's npm). Lint + Molecule (idempotent) green. **Applied to `ubongo`** for users `sjat` + `claude` (verified: zsh login shells, stow-symlinked `.zshrc`/`.tmux.conf` + nvim config, oh-my-zsh, tmux plugins; nvim v0.12.2, oh-my-posh 29.0.1). Run via `playbooks/workstation.yml` against the `control` group (no dedicated `workstations` group yet). |
 | `make check` / `make deploy PLAYBOOK=` | **Works.** First end-to-end run (applying `dev_env`) surfaced + fixed latent bugs: Makefile `PLAYBOOK` var collision (binary path vs playbook-name arg) meant the targets never ran; `ansible.cfg` referenced uninstalled community.general callbacks (now built-in `default` + `ansible.posix.profile_tasks`); `acl` package added so Ansible can `become_user` an unprivileged user. The make targets now function — though `site`/`base`/`docker_host` content is still incomplete (see below). |
-| `roles/public_dns/` + `playbooks/dns.yml` | **Built — not yet applied.** Manages wingu.me at Gandi LiveDNS as code (`community.general.gandi_livedns`, PAT from `vault.gandi.pat`); record data, anti-spoof baseline (null MX, SPF `-all`, DMARC reject), and the Gandi-defaults purge list are defined + unit-tested (`tests/test_public_dns.py`). The live `make deploy PLAYBOOK=dns` (purge + baseline) is **pending — run on ubongo**. M1 of the roadmap. |
+| `roles/public_dns/` + `playbooks/dns.yml` | **Built + applied.** Manages wingu.me at Gandi LiveDNS as code (`community.general.gandi_livedns`, PAT from `vault.gandi.pat`); record data, anti-spoof baseline (SPF `-all` + DMARC reject), and the Gandi-defaults purge are defined + unit-tested (`tests/test_public_dns.py`). **Applied to wingu.me (2026-06-14):** purged Gandi's 13 seeded defaults; zone now holds only the SPF + DMARC TXT records; idempotent re-run clean. No null-MX (Gandi rejects `0 .`) — the MX is removed, so no MX + no apex A = no mail. M1 of the roadmap. |
 | `ubongo` — physical control / AI-worker host (ADR-015) | **Built (partial).** Debian 13.5 on a Lenovo M70q (i3-10100T, 16 GB, 256 GB SSD; no disk encryption — accepted risk). Full toolchain installed + pinned to `fisi` (Docker 29.5.3, rbw 1.15.0, Claude Code 2.1.173, ansible-core 2.17.14 + molecule via `make setup`/`make collections`). Repo cloned under a dedicated `claude` user (docker group, no sudo). Vault works via rbw (offline-cache decryption verified). SSH key-only (password + root login disabled). In the production inventory `control` group at 10.20.10.151. **`dev_env` now applied here** (zsh/tmux/nvim for `sjat` + `claude`, via `playbooks/workstation.yml`). Managed as the operator account `sjat` (`group_vars/control` sets `ansible_user: sjat`), not the `ansible` service user `group_vars/all` assumes — ubongo has no bootstrapped `ansible` user. **Pending:** NetBird mesh enrollment (so SSH is LAN-only); full `base` hardening (only the `firewall` concern exists, and it is NOT applied here — applying default-deny with no mesh would lock out inbound SSH on the physical NIC); proper `ansible`-user bootstrap (currently managed as `sjat`); OPNsense DHCP reservation for 10.20.10.151 (MAC `88:a4:c2:e0:ee:da`); Terraform state backup (no TF state yet). |
 
 ## Scaffolded but empty — NOT implemented
diff --git a/docs/FRICTION.md b/docs/FRICTION.md
index accefb9..7a188bd 100644
--- a/docs/FRICTION.md
+++ b/docs/FRICTION.md
@@ -21,6 +21,28 @@ earning its keep.
 
 _(append new raw signals here; the next kaizen review consumes them)_
 
+- `[gotcha]` **`item.values` in a loop sends the dict's `.values()` METHOD, not the
+  key** (2026-06-14): the `public_dns` role looped over records that have a `values:`
+  key and used `{{ item.values }}` in the `gandi_livedns` task. Jinja attribute access
+  resolved `item.values` to the built-in dict method, so Gandi received
+  `""` as the live TXT value — corrupt
+  **and** non-idempotent (the address changes each run → always "changed"). The fix is
+  bracket-indexing: `item['values']` (same risk for any key named `keys`/`items`/`get`/
+  `update`/...). → convention: in loops, index loop-var keys with `item['key']`, never
+  `item.key`; consider an ansible-lint guard.
+- `[gotcha]` **Gandi LiveDNS rejects RFC-7505 null-MX `0 .`** (2026-06-14): "invalid
+  format for MX record." Used "no MX + no apex A" + SPF `-all` + DMARC reject instead.
+  Minor, but worth a note for any future no-mail domain on Gandi.
+- `[recurring]` **apply=false Molecule + data-only pytest leave a real gap for
+  API/templating roles** (2026-06-14): both the null-MX and the `item.values` bugs sailed
+  through the spec, BOTH review subagents, the pytest (validates the data file, not the
+  rendered template), and the Molecule scenario (`apply=false`, so the API tasks never
+  run) — only the **live `make check`/`deploy`** against the real Gandi API surfaced them.
+  For roles whose payload is "render data → external API call", the rendered template is
+  the thing that breaks, and nothing short of a real (or check-mode) API call exercises it.
+  → for such roles, treat a check-mode run against the real API as a required gate, not an
+  optional final step; or build a render-only assertion that materializes the module args.
+
 - `[recurring]` **Execution-mode menu asked AGAIN despite the 2026-06-10 "mechanical fix"** (2026-06-14): at the M1 (`public_dns`) plan handoff I presented the "1. Subagent-Driven / 2. Inline Execution — which approach?" menu and asked the user to