From f170ffd936a1fb3b0296ca23b29b55be93687e86 Mon Sep 17 00:00:00 2001
From: sjat
Date: Sun, 14 Jun 2026 10:38:45 +0200
Subject: [PATCH] docs(public_dns): amend ADR-007 to wingu.me/Gandi; resolve TODO 4; STATUS + CAPABILITIES

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 STATUS.md                     |  1 +
 docs/CAPABILITIES.md          |  1 +
 docs/TODO.md                  |  4 +++-
 docs/decisions/007-network.md | 17 ++++++++++++-----
 4 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/STATUS.md b/STATUS.md
index bdba503..134a8bb 100644
--- a/STATUS.md
+++ b/STATUS.md
@@ -28,6 +28,7 @@ _Last reviewed: 2026-06-11._
 | Tag standard + enforcement (ADR-019) | Works — `tests/tags.yml` (closed vocabulary) + `scripts/check-tags.py` (run by `make lint`, unit-tested): enforces the tag vocabulary and that each role import in a play's `roles:` block carries its role-name tag. Governs mostly-unbuilt roles, but the linter is live now. Proxmox VM tag convention (``, group, `managed-by=terraform`) is in the Terraform HCL but unprovisioned. |
 | `roles/dev_env/` — interactive developer environment | **Built + applied.** zsh + oh-my-zsh + oh-my-posh, tmux + TPM plugins, neovim; dotfiles deployed via GNU stow (re-derived from V4/fisi per ADR-013). Node.js from a pinned upstream tarball (not Debian's npm). Lint + Molecule (idempotent) green. **Applied to `ubongo`** for users `sjat` + `claude` (verified: zsh login shells, stow-symlinked `.zshrc`/`.tmux.conf` + nvim config, oh-my-zsh, tmux plugins; nvim v0.12.2, oh-my-posh 29.0.1). Run via `playbooks/workstation.yml` against the `control` group (no dedicated `workstations` group yet). |
 | `make check` / `make deploy PLAYBOOK=` | **Works.** First end-to-end run (applying `dev_env`) surfaced + fixed latent bugs: Makefile `PLAYBOOK` var collision (binary path vs playbook-name arg) meant the targets never ran; `ansible.cfg` referenced uninstalled community.general callbacks (now built-in `default` + `ansible.posix.profile_tasks`); `acl` package added so Ansible can `become_user` an unprivileged user. The make targets now function — though `site`/`base`/`docker_host` content is still incomplete (see below). |
+| `roles/public_dns/` + `playbooks/dns.yml` | **Built — not yet applied.** Manages wingu.me at Gandi LiveDNS as code (`community.general.gandi_livedns`, PAT from `vault.gandi.pat`); record data, anti-spoof baseline (null MX, SPF `-all`, DMARC reject), and the Gandi-defaults purge list are defined + unit-tested (`tests/test_public_dns.py`). The live `make deploy PLAYBOOK=dns` (purge + baseline) is **pending — run on ubongo**. M1 of the roadmap. |
 | `ubongo` — physical control / AI-worker host (ADR-015) | **Built (partial).** Debian 13.5 on a Lenovo M70q (i3-10100T, 16 GB, 256 GB SSD; no disk encryption — accepted risk). Full toolchain installed + pinned to `fisi` (Docker 29.5.3, rbw 1.15.0, Claude Code 2.1.173, ansible-core 2.17.14 + molecule via `make setup`/`make collections`). Repo cloned under a dedicated `claude` user (docker group, no sudo). Vault works via rbw (offline-cache decryption verified). SSH key-only (password + root login disabled). In the production inventory `control` group at 10.20.10.151. **`dev_env` now applied here** (zsh/tmux/nvim for `sjat` + `claude`, via `playbooks/workstation.yml`). Managed as the operator account `sjat` (`group_vars/control` sets `ansible_user: sjat`), not the `ansible` service user `group_vars/all` assumes — ubongo has no bootstrapped `ansible` user. **Pending:** NetBird mesh enrollment (so SSH is LAN-only); full `base` hardening (only the `firewall` concern exists, and it is NOT applied here — applying default-deny with no mesh would lock out inbound SSH on the physical NIC); proper `ansible`-user bootstrap (currently managed as `sjat`); OPNsense DHCP reservation for 10.20.10.151 (MAC `88:a4:c2:e0:ee:da`); Terraform state backup (no TF state yet). |
 
 ## Scaffolded but empty — NOT implemented
diff --git a/docs/CAPABILITIES.md b/docs/CAPABILITIES.md
index 56cee84..5a2caa5 100644
--- a/docs/CAPABILITIES.md
+++ b/docs/CAPABILITIES.md
@@ -26,6 +26,7 @@ decisions this frame enables.
 |---|---|---|---|---|---|
 | Reverse proxy / TLS | Traefik | P | core | Edge routing + ACME certs for everything exposed | Spin-up order names it (TODO 12) |
 | Internal DNS | `dns` role → dns1/dns2 | P | core | Authoritative internal zone (ADR-007) | Ansible-rendered zone |
+| Public DNS | `public_dns` role → Gandi LiveDNS | P | core | wingu.me zone as code (ADR-007) | anti-spoof baseline; mesh/LAN-only default; apply pending |
 | VPN / remote access | NetBird (self-hosted on `askari`) | P | core | Secure mesh remote access to `srv`/`mgmt` | **Decided (ADR-016):** NetBird mesh replaces ADR-007 OPNsense WireGuard |
 | Service portal / dashboard | Homepage | A | candidate | One landing page listing all services — a "what does what" front door | Gap surfaced by V4; fits boma's legibility goal |
 
diff --git a/docs/TODO.md b/docs/TODO.md
index 178038a..0386a8c 100644
--- a/docs/TODO.md
+++ b/docs/TODO.md
@@ -50,7 +50,9 @@
    10. Should we keep the custom base-container (Molecule test image) method for role testing, or revisit it as boma's testing approach matures (ADR-008)?
    11. ~~Deliberate tagging strategy.~~ DECIDED (ADR-019) — folded into 3.7.
 
-4. **Split-horizon FQDN** — adopt split-horizon FQDN with or without nyumbani?
+4. ~~**Split-horizon FQDN** — adopt split-horizon FQDN with or without nyumbani?~~
+   DECIDED (M1): three-tier scheme on `wingu.me`; `nyumbani` dropped; mesh/LAN-only
+   default. See `docs/decisions/007-network.md` + the M1 spec.
 5. **Control node**
    1. Set up and test the control node while waiting for hardware.
 
diff --git a/docs/decisions/007-network.md b/docs/decisions/007-network.md
index 0963685..fa1e00b 100644
--- a/docs/decisions/007-network.md
+++ b/docs/decisions/007-network.md
@@ -157,7 +157,8 @@ IoT devices cannot initiate connections to `srv`.
 | Infrastructure VMs | `` | `dns1`, `dns2`, `proxy` |
 | Hetzner VPS | `askari` | Swahili for guard/sentinel |
 | Internal FQDN | `.boma.baobab.band` | `dns1.boma.baobab.band` |
-| Public service FQDN | `.baobab.band` | `forgejo.nyumbani.baobab.band` |
+| Public service FQDN | `.wingu.me` | `vaultwarden.wingu.me` |
+| Off-site (VPS) FQDN | `.askari.wingu.me` | `netbird.askari.wingu.me` |
 
 ---
 
@@ -169,12 +170,18 @@ inventory (which derives from Terraform's `local.vms` via `make tf-inventory`),
 and service/alias/split-horizon records are explicit zone data in `group_vars`.
 Terraform itself writes no DNS records — see ADR-009.
 
-**Public zone**: `baobab.band` — served by external DNS (Cloudflare or equivalent).
-Public-facing services resolve to the public IP or Cloudflare proxy.
+**Public zone**: `wingu.me` — Gandi LiveDNS, **managed as code** by the `public_dns`
+role (`vault.gandi.pat`). Three-tier naming: infra `.boma.wingu.me` (internal),
+services `.wingu.me` (split-horizon), off-site `.askari.wingu.me`.
+`nyumbani` is retired. **Mesh/LAN-only by default**: home services have no public record
+(reached over LAN or the NetBird mesh); only deliberate exceptions are published. The
+project is `boma`; the domain is `wingu.me`. The legacy `baobab.band` zone (Cloudflare)
+is out of scope here.
 
 **Split-horizon**: `dns1`/`dns2` serve internal answers for any hostname that has
-both a public and private face. Example: `forgejo.nyumbani.baobab.band` resolves to
-`10.20.0.12` (proxy) internally and to the public IP externally.
+both a public and private face. Example: `vaultwarden.wingu.me` resolves to
+`10.20.0.12` (proxy) internally and to the public IP externally (the internal
+zone will be renamed to `boma.wingu.me` when the `dns` role is built — Phase 2).
 
 OPNsense DNS resolver forwards `boma.baobab.band` queries to `dns1`/`dns2`.
 All other queries go upstream (e.g., `1.1.1.1`, `9.9.9.9`).