diff --git a/docs/ROADMAP.md b/docs/ROADMAP.md index ba35b03..67efb16 100644 --- a/docs/ROADMAP.md +++ b/docs/ROADMAP.md @@ -47,30 +47,34 @@ this collapses into interleaving with extra context-switching cost). Delivers mobile access to `ubongo`; proves the machinery. Ordered by *real* dependencies. -### M1 · Gandi DNS migration — managed as code +### M1 · boma's DNS home — a new domain at Gandi, managed as code -Move `baobab.band` authoritative DNS (and registrar) off Cloudflare to **Gandi**, with -records **managed as code (IaC)**, not hand-edited in a panel. +Register a **new Swahili-themed domain at Gandi** for boma and manage its records **as +code (IaC)**. Greenfield, not a migration: investigating the existing domains ruled them +out as boma's home — `baobab.band` is the **live legacy homelab** (Cloudflare; vaultwarden +/ nextcloud / matrix in daily use), and `ziethen.dk` is the **family's primary email** +(Fastmail); moving either's authoritative DNS risks breaking production. A fresh domain is +zero-risk and *born at Gandi*. -- **Driver:** values/sovereignty (Gandi over Cloudflare) — *not* a NetBird technical - prerequisite. Sequenced **first** anyway, so `askari`'s records are born at Gandi and - Cloudflare is never touched again. +- **Driver:** values/sovereignty (Gandi) + a clean, decoupled home so boma builds without + endangering anything live. `baobab.band`'s Cloudflare exit / V4 decommission is a + **separate, later track**, not part of this build. `ziethen.dk` is untouched. - **IaC approach:** follow boma's grain — internal DNS is already Ansible-rendered and Terraform owns *no* DNS (CLAUDE.md), so **public DNS is Ansible-managed too** (Gandi LiveDNS via an Ansible module — exact module pinned in M1's spec, verified per ADR-014). -- **Naming scheme (decided):** three tiers — `.boma.baobab.band` (infra, - internal-only) · `.baobab.band` (home/cluster services, split-horizon) · - `.askari.baobab.band` (off-site/VPS, public). 
**`nyumbani` dropped.** Home - services are **mesh/LAN-only by default** (no public record; reached over LAN or the - NetBird mesh), with public Gandi records only for deliberate exceptions. The NetBird - mesh carries the `baobab.band` match-domain to road-warriors (resolver = dns1/dns2 over - `wt0`); a `*.baobab.band` ACME **DNS-01** wildcard cert (Gandi API) gives even - unexposed services real TLS. Resolves TODO 4 and review finding O12. -- **Care:** the live record `forgejo.nyumbani.baobab.band` (the git `origin` / Forgejo - remote, :7577) becomes `forgejo.baobab.band` — cutover must update the remote + CI - without breaking pushes. -- **Records as a new/updated ADR:** amends ADR-007 — public DNS provider → Gandi LiveDNS - managed as code; the three-tier naming scheme; `nyumbani` removed; mesh/LAN-only default. +- **Naming scheme (decided):** three tiers (on boma's new domain, ``) — + `.boma.` (infra, internal-only) · `.` + (home/cluster services, split-horizon) · `.askari.` (off-site/VPS, + public). **`nyumbani` dropped.** Home services are **mesh/LAN-only by default** (no + public record; reached over LAN or the NetBird mesh), with public Gandi records only for + deliberate exceptions. The NetBird mesh carries the `` match-domain to + road-warriors (resolver = dns1/dns2 over `wt0`); a `*.` ACME **DNS-01** + wildcard cert (Gandi API) gives even unexposed services real TLS. Resolves TODO 4 and + review finding O12. +- **Records as a new/updated ADR:** amends ADR-007 — boma's public zone is + `` at Gandi LiveDNS managed as code; the three-tier naming scheme; + `nyumbani` removed; mesh/LAN-only default; `baobab.band` (legacy, Cloudflare) is out of + scope. - **Maps to:** ADR-007 (network/DNS), ADR-016 (mesh DNS), TODO 4 (**resolved here**). 
### M2 · `askari` provisioned + under Ansible diff --git a/docs/superpowers/specs/2026-06-11-public-dns-gandi-migration-design.md b/docs/superpowers/specs/2026-06-11-public-dns-gandi-migration-design.md index 4a91c23..40f9785 100644 --- a/docs/superpowers/specs/2026-06-11-public-dns-gandi-migration-design.md +++ b/docs/superpowers/specs/2026-06-11-public-dns-gandi-migration-design.md @@ -1,53 +1,64 @@ -# Design — Public DNS migration to Gandi (DNS-as-code) +# Design — boma's DNS home: a new domain at Gandi (DNS-as-code) -- **Date:** 2026-06-11 +- **Date:** 2026-06-11 · **Revised:** 2026-06-12 (Option B — boma gets its own new domain; + supersedes this spec's original "migrate `baobab.band` off Cloudflare" framing) - **Status:** Draft for review — design settled in brainstorming; pending user review, then implementation plan - **Roadmap milestone:** M1 (`docs/ROADMAP.md`) -- **Resolves:** TODO 4 (split-horizon FQDN — with/without `nyumbani`); review finding - O12 (ADR-007 FQDN convention contradicts its own example) -- **Amends:** ADR-007 — public DNS provider → **Gandi LiveDNS, managed as code**; the - three-tier naming scheme; `nyumbani` removed; mesh/LAN-only default exposure -- **Becomes:** an ADR-007 amendment (no new ADR unless the `public_dns` role grows - concerns of its own) +- **Resolves:** TODO 4 (split-horizon FQDN — with/without `nyumbani`); review finding O12 +- **Amends:** ADR-007 — boma's public zone is a **new domain at Gandi LiveDNS, managed as + code**; the three-tier naming scheme; `nyumbani` removed; mesh/LAN-only default +- **Becomes:** an ADR-007 amendment (no new ADR unless `public_dns` grows its own concerns) --- ## Problem -Move `baobab.band` authoritative DNS **and registration** off Cloudflare to **Gandi**. -The driver is values/sovereignty (Gandi over Cloudflare) — it is **not** a NetBird -prerequisite, but it is sequenced first (roadmap M1) so `askari`'s records are born at -Gandi and Cloudflare is never touched again. 
Do it **as code**, consistent with boma's -grain: internal DNS is already Ansible-rendered and Terraform owns *no* DNS (CLAUDE.md). -While in here, settle the long-open naming question (`nyumbani`, TODO 4 / O12). +boma needs a DNS home. Investigating the obvious candidates ruled them out as *boma's* +home: + +- **`baobab.band`** is the **live legacy homelab** (on Cloudflare): `vaultwarden`, + `nextcloud`, `matrix`/`element`, `collabora`, `ntfy`, `radio`, … in daily use, much of + it riding `*.baobab.band` / `*.nyumbani.baobab.band` wildcards. Moving its authoritative + DNS risks breaking production. +- **`ziethen.dk`** is the **family's primary email** (Fastmail). Moving a live email + domain's DNS is the highest-stakes DNS operation there is — worse, not better. + +**Decision: register a NEW Swahili-themed domain at Gandi for boma.** Greenfield, +zero-risk, *born at Gandi* — so it satisfies the DNS-as-code + sovereignty goal natively +with **no migration at all**. The existing domains are decoupled: `baobab.band`'s +Cloudflare exit / V4 decommission is a **separate, later track** (handled when boma +replaces what it hosts), and `ziethen.dk` is untouched. + +boma's domain is **`wingu.me`** (registered at Gandi 2026-06-14; *wingu* = Swahili for +*cloud*). The `public_dns` role keeps it as a variable (`public_dns__domain`) so it stays +swappable. + +**Starting state (verified 2026-06-14):** Gandi auto-seeded the zone with **13 default +records** — apex parking `A`, `www` web-redirect, and a full Gandi mailbox set (`MX`, SPF, +three `*._domainkey` DKIM CNAMEs, `webmail`, IMAP/POP/submission `SRV`). None are boma's; +wingu.me sends no mail (email stays at `ziethen.dk`). See the setup sequence for the +one-time purge + anti-spoof baseline. ## Decisions (as settled) -1. **Full registrar transfer.** Registration *and* authoritative DNS move to Gandi — - fully exits Cloudflare. 
(DNS-only would strand the registration at Cloudflare and is - likely impossible anyway, since Cloudflare Registrar requires Cloudflare nameservers.) -2. **Three-tier naming scheme** (the convention — see table below). `nyumbani` is +1. **New domain, registered at Gandi.** No transfer, no migration, no Cloudflare/Fastmail + entanglement. (Human registers + pays — see division of labour.) +2. **Three-tier naming scheme** (re-homed to `wingu.me`) — see table. `nyumbani` **dropped**. -3. **Mesh/LAN-only by default.** Home/cluster services have **no public record**; they - are reached over LAN or the NetBird mesh. Public Gandi records exist only for - deliberate exceptions (today: `forgejo`, the `askari` tier). -4. **DNS-as-code via a control-node `public_dns` role** driven by structured record data - in `group_vars` — the same pattern as the firewall catalog, and exactly what ADR-007 - already calls "service/alias/split-horizon records … explicit zone data in - `group_vars`." Name is provider-agnostic on purpose. +3. **Mesh/LAN-only by default.** Home/cluster services have **no public record**; reached + over LAN or the NetBird mesh. Public Gandi records only for deliberate exceptions. +4. **DNS-as-code via a control-node `public_dns` role** driven by record data in + `group_vars` (same pattern as the firewall catalog). Name is provider-agnostic. 5. **Tooling: `community.general.gandi_livedns` with `personal_access_token`** (PAT). - Re-adds `community.general` to `requirements.yml` under the collections-on-demand - policy (a committed role now uses `gandi_livedns`), pinned `>=9.0.0`, with the naming - comment. -6. **Clean by omission.** Stale records and the (unused) MX are *not* deleted at - Cloudflare — the zone is abandoned. Only wanted records are carried to Gandi. -7. **Cert scope: DNS + PAT only.** M1 ends at the migrated zone + the PAT in vault, which - *enables* ACME DNS-01 later. 
**No certificate issuance in M1** — that lands with a - reverse proxy (askari in M4, home in Phase 2). -8. **Human/agent division of labour** (see table) — account, payment, registrar - transfer, and the go-live nameserver flip are human; all record-wrangling, the IaC, - and the post-flip cutover are the agent's, executed from `ubongo`. + Re-adds `community.general` to `requirements.yml` (collections-on-demand; a committed + role uses `gandi_livedns`), pinned `>=9.0.0`, with the naming comment. +6. **Cert scope: DNS + PAT only.** M1 ends at the zone + PAT in vault, which *enables* + ACME DNS-01 later. No cert issuance in M1 (reverse proxy → askari M4 / home Phase 2). +7. **Human/agent division of labour** (see table) — register + pay + PAT are human; all + record/IaC work is the agent's, from `ubongo`. +8. **Explicitly out of scope:** `baobab.band` (and its Cloudflare exit / V4 decommission) + and `ziethen.dk` — separate later tracks. ## Verified facts (ADR-014) @@ -58,139 +69,123 @@ While in here, settle the long-open naming question (`nyumbani`, TODO 4 / O12). > 2026-06-11 > - Module params: `domain`, `record`, `type`, `values` (list), `ttl`, `state` > (`present`/`absent`). Supports **check mode + diff**. -> - Auth is per-task: pass `personal_access_token: "{{ vault.gandi.pat }}"`. - -> unverified (from memory — confirm during implementation): the current registrar of -> `baobab.band` (WHOIS) — determines whether the transfer is Cloudflare→Gandi or -> elsewhere→Gandi, and the exact unlock/EPP steps. +> - Auth is per-task: `personal_access_token: "{{ vault.gandi.pat }}"`. ## Naming scheme (the convention) | Tier | Pattern | Authoritative source | Public? 
| |---|---|---|---| -| Infrastructure / hosts | `.boma.baobab.band` | internal zone (`dns1`/`dns2`, Phase 2) | never | -| Home / cluster services | `.baobab.band` | internal zone (split-horizon) | only deliberate exceptions | -| Off-site / VPS services | `.askari.baobab.band` | Gandi LiveDNS | yes (askari has a stable public IP) | +| Infrastructure / hosts | `.boma.wingu.me` | internal zone (`dns1`/`dns2`, Phase 2) | never | +| Home / cluster services | `.wingu.me` | internal zone (split-horizon) | only deliberate exceptions | +| Off-site / VPS services | `.askari.wingu.me` | Gandi LiveDNS | yes (askari has a stable public IP) | -- **`nyumbani` removed.** It namespaced "home," but home is the default; only the - *exception* needs naming, and `askari.baobab.band` does that, self-documenting. +- **`nyumbani` removed** — home is the default; only the exception (`askari`) needs naming. - **The mesh carries "internal" to road-warriors.** NetBird pushes `dns1`/`dns2` (over - `wt0`) as the resolver for the `baobab.band` match-domain, so on-LAN-or-on-mesh → - internal answer; truly public → Gandi (ties M1 ↔ ADR-016 / M5). -- **Wildcard TLS later.** A `*.baobab.band` (and `*.askari.baobab.band`) ACME **DNS-01** - cert via the Gandi PAT gives even unexposed services real public-CA TLS — without a - public A record. Enabled by M1, issued in M4/Phase 2. + `wt0`) as resolver for the `wingu.me` match-domain → on-LAN-or-on-mesh resolves + internal; truly public resolves at Gandi (ties M1 ↔ ADR-016 / M5). +- **Wildcard TLS later.** `*.wingu.me` ACME DNS-01 (Gandi PAT) gives even unexposed + services real TLS without a public A record. Enabled by M1, issued in M4/Phase 2. -## Architecture — two deliverables (kept separate on purpose) +## Architecture — two deliverables -### (A) One-time migration — a runbook (`docs/runbooks/`) +### (A) One-time setup — a short runbook (`docs/runbooks/`) -Registrar transfers and the nameserver flip cannot be IaC'd. 
This is a human-gated -procedure (sequence below), executed once. +Greenfield, so this is small and low-risk (contrast the abandoned migration framing): +register the domain, create the LiveDNS zone, issue the PAT. No transfer, no live-zone +cutover. ### (B) `public_dns` — the reusable IaC role - Runs **from the control node** (`delegate_to: localhost`, or a `dns.yml` play targeting - `control`) against the Gandi LiveDNS API — there is no managed *host*, only API calls. + `control`) against the Gandi LiveDNS API — no managed *host*, only API calls. - Reconciles records from **`group_vars` data** via `community.general.gandi_livedns`, - PAT from `vault.gandi.pat`. -- **Check-mode/diff first**, always (boma's check-before-deploy; the module supports it). -- Carries only the public-tier records (exceptions + `askari` tier); the mesh/LAN-only - default keeps this set small. + PAT from `vault.gandi.pat`. **Check-mode/diff first**, always. #### Data model (sketch) ```yaml # inventories/production/group_vars/all/public_dns.yml -public_dns__domain: baobab.band +public_dns__domain: "wingu.me" public_dns__records: - - { record: forgejo, type: A, values: [""], ttl: 1800 } - - { record: askari, type: A, values: [""], ttl: 1800 } - # mesh/LAN-only services are intentionally ABSENT — they live only in the internal zone. + # Anti-spoof baseline for a no-mail domain (replaces Gandi's seeded mail set): + - { record: "@", type: MX, values: ["0 ."], ttl: 3600 } + - { record: "@", type: TXT, values: ['"v=spf1 -all"'], ttl: 3600 } + - { record: _dmarc, type: TXT, values: ['"v=DMARC1; p=reject;"'], ttl: 3600 } + # Service records appear as public-tier needs arise; near-empty at M1. + # askari / NetBird records land in M4, e.g.: + # - { record: askari, type: A, values: [""], ttl: 1800 } + # mesh/LAN-only services are intentionally ABSENT — internal zone only. # PAT referenced as {{ vault.gandi.pat }} (nested vault.., CLAUDE.md). 
``` #### Open design nuance — additive vs authoritative -`gandi_livedns` is **per-record** (`present`/`absent`); it does not whole-zone sync. To -make the repo *authoritative* (prune undeclared records — cf. TODO 8.3's prune question), -the role would need to GET existing records and remove those not declared. **M1 decision:** -start **additive** (declare what we want; remove the old via explicit `absent` entries -during cutover); flag full-zone pruning as a possible later enhancement. Avoids -accidentally deleting a record someone added out-of-band before the repo is the single -source of truth. +`gandi_livedns` is **per-record** (`present`/`absent`), not whole-zone sync. Gandi seeded +`wingu.me` with 13 default records (above), so M1 needs a **one-time purge** of those to a +clean baseline (declare them `state: absent`, or a one-shot scripted delete), then manage +**additively**. Full-zone authoritative sync (GET existing → remove undeclared — the +proper end-state, and TODO 8.3's prune question) is flagged as a later enhancement. -## Cutover sequence (the runbook) +## Setup sequence (the runbook) Legend: **[H]** human · **[A]** agent (from `ubongo`, committed code + check-mode). -1. **[A]** Inventory: parse the **Cloudflare zone export** (BIND file the user downloads, - tokenless) → full record list; classify keep / rename / drop (incl. unused MX + stale). -2. **[A]** Draft `public_dns__records` (new scheme) + the `public_dns` role; PR/commit; - `make check` shows the intended Gandi state as a diff. -3. **[H]** Create/verify the Gandi account; issue a **LiveDNS-scoped PAT** for - `baobab.band`; store it in vault (`vault.gandi.pat`) via rbw. **[H]** Lower TTLs on the - *old* Cloudflare zone ~24–48h ahead. -4. **[A]** Create the zone in Gandi LiveDNS and load records (`make deploy`, after a clean - `make check`). Validate with `dig @`. -5. 
**[H]** Initiate the **registrar transfer** to Gandi (unlock at Cloudflare, get - EPP/auth code, start at Gandi, ACK to expedite; ~5 days — DNS keeps resolving). -6. **[H, go-live]** **Flip nameservers** to Gandi LiveDNS. (Irreversible/outward-facing — - explicit human go.) -7. **[A]** Post-flip: validate resolution; **rename the Forgejo remote + CI** - (`forgejo.nyumbani.baobab.band` → `forgejo.baobab.band`); verify a push. -8. **[A/H]** Confirm propagation; **[H]** decommission the Cloudflare zone. +1. **[H]** Register `wingu.me` at Gandi; pay. **[H]** Issue a **LiveDNS-scoped PAT** + for it; store in vault (`vault.gandi.pat`) via rbw. +2. **[A]** Author the `public_dns` role + `public_dns__records` data (incl. the anti-spoof + baseline); add `community.general` to `requirements.yml` (≥9.0.0, with comment); commit. +3. **[A]** One-time: **purge Gandi's 13 seeded defaults** (parking `A`, `www` redirect, + Gandi mail `MX`/SPF/DKIM/`webmail`/`SRV`) down to the boma baseline. +4. **[A]** `make check` (diff vs live Gandi) → `make deploy` to load records → `dig` + verify. Re-run `make deploy` to confirm idempotence. +5. Thereafter the zone is reconciled as code; M4 adds the `askari`/NetBird records. + +No registrar transfer, no nameserver flip of a live zone, no service-preservation, +no Forgejo rename — all of that belonged to the abandoned `baobab.band` framing. ## Division of labour & access (security posture) | Task | Who | How | |---|---|---| -| Zone inventory | Agent | From the Cloudflare **export** (tokenless). | -| New record set + `public_dns` role + data | Agent | Committed IaC; `make check` diff. | -| Gandi account, transfer, payment | Human | Identity/billing/e-mail/ToS — not automatable. | -| Create zone + load records + reconcile | Agent | `public_dns` role on `ubongo`, PAT from vault, check-mode first. | -| Nameserver flip / go-live | Human-gated | Agent preps + validates; human flips. | -| Forgejo remote + CI cutover | Agent | After flip; verify push.
| -| Delete stale Cloudflare records | Nobody | Cleaned by omission. | +| Register domain + pay | Human | Identity/billing/ToS — not automatable. | +| Issue + store the PAT | Human | LiveDNS-scoped, single-domain; into vault via rbw. | +| `public_dns` role + record data | Agent | Committed IaC; `make check` diff. | +| Create zone + load records + reconcile | Agent | `public_dns` on `ubongo`, PAT from vault, check-mode first. | -- **Minimal token scope.** Gandi PAT: **LiveDNS-only**, restricted to `baobab.band`. - Cloudflare: prefer the **tokenless export**; if an API token is used, **read-only, - single-zone, throwaway** — revoke once inventory is captured. -- **Tokens live in boma's vault** (`vault.gandi.pat`) via rbw — never pasted in chat. -- **Execution on `ubongo`**, not in any agent sandbox: committed role + `make check` → - `make deploy`. Irreversible/outward steps (NS flip, go-live) require explicit human - confirmation. +- **Minimal token scope.** Gandi PAT: **LiveDNS-only**, restricted to `wingu.me`. +- **Token in vault** (`vault.gandi.pat`) via rbw — never pasted in chat. +- **Execution on `ubongo`**, committed role + `make check` → `make deploy`. No agent + sandbox holds production credentials. ## Testing & verification External-API reconciliation does not fit container Molecule cleanly (a nuance against -ADR-008 — not every role gets a converge-in-a-container scenario). Instead: - -- **`make check` (check-mode + diff)** against live Gandi before any apply. -- **Idempotence:** a second `make deploy` reports no changes. -- **`dig` assertions** post-cutover: new names resolve to expected values; a Forgejo - push over `forgejo.baobab.band` succeeds. -- Optionally a small pytest over the `public_dns__records` data shape (types, no - duplicate record/type pairs), mirroring `test_firewall_rules.py`. +ADR-008). 
Instead: **`make check` (check-mode + diff)**, **idempotence** (second deploy = +no changes), **`dig` assertions** post-load, and optionally a small pytest over the +`public_dns__records` data shape (mirrors `test_firewall_rules.py`). ## Scope boundaries — what M1 is NOT -- **Not** the internal split-horizon `dns` role (renders `.baobab.band` +- **Not** a migration of `baobab.band` or `ziethen.dk` — and **not** the Cloudflare exit / + V4 decommission. Those are separate, later tracks. +- **Not** the internal split-horizon `dns` role (renders `.wingu.me` privately) — that needs the `dns` role + actual home services → **Phase 2**. - **Not** certificate issuance or the reverse proxy — **M4 (askari) / Phase 2 (home)**. -- **Not** authoritative whole-zone pruning — additive for now (see nuance above). +- **Not** authoritative whole-zone pruning — additive for now. ## ADR work -Amend **ADR-007**: public zone provider → **Gandi LiveDNS, managed as code** (replaces -"Cloudflare or equivalent"); record the **three-tier naming scheme**; remove the -`nyumbani` example; state the **mesh/LAN-only default**. Note `public_dns` as the -control-node role that renders the public zone (sibling to the internal `dns` role). +Amend **ADR-007**: boma's public zone is **`wingu.me` at Gandi LiveDNS, managed as +code** (replaces "Cloudflare or equivalent"); record the **three-tier naming scheme**; +remove the `nyumbani` example; state the **mesh/LAN-only default**; note `public_dns` as +the control-node role rendering the public zone (sibling to the internal `dns` role). Note +that `baobab.band` (legacy, Cloudflare) is **not** boma's zone and is out of ADR-007's +scope going forward. ## Open items (resolve during the plan / implementation) -- **Cloudflare zone export** → the exact record list (execution input, not a design gap). -- **WHOIS** the current registrar → confirm transfer source + unlock/EPP steps. 
+- ~~Pick the domain~~ **DONE:** `wingu.me` registered at Gandi; LiveDNS PAT verified + (2026-06-14) and stored in vault as `vault.gandi.pat`. - **Pin** the `community.general` version in `requirements.yml` (≥9.0.0). - **Play wiring:** a dedicated `dns.yml` play (control-targeted) vs folding into an existing play — decide in the plan.
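The optional pytest over the `public_dns__records` data shape (types, no duplicate record/type pairs) mentioned under *Testing & verification* could start from something like the sketch below. It is a hypothetical helper, not part of the spec. `validate_records`, `ALLOWED_TYPES` and the 300 s TTL floor are assumptions; the floor is meant to match Gandi's LiveDNS minimum, so check it before relying on it. The records are passed in as plain dicts. A real test would load them from `group_vars/all/public_dns.yml`, for example with PyYAML's `yaml.safe_load`.

```python
"""Sketch: sanity checks over public_dns__records (hypothetical test helper)."""

from collections import Counter

# Record types this sketch accepts; an assumption, extend as the public tier grows.
ALLOWED_TYPES = {"A", "AAAA", "CNAME", "MX", "TXT", "SRV", "CAA"}
REQUIRED_KEYS = {"record", "type", "values", "ttl"}
MIN_TTL = 300  # assumed to match Gandi LiveDNS's minimum TTL


def validate_records(records: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means the data looks clean."""
    problems: list[str] = []
    for i, rec in enumerate(records):
        missing = REQUIRED_KEYS - set(rec)
        if missing:
            problems.append(f"entry {i}: missing keys {sorted(missing)}")
            continue  # remaining checks need every key
        if rec["type"] not in ALLOWED_TYPES:
            problems.append(f"entry {i}: unsupported type {rec['type']!r}")
        if not isinstance(rec["values"], list) or not rec["values"]:
            problems.append(f"entry {i}: values must be a non-empty list")
        if not isinstance(rec["ttl"], int) or rec["ttl"] < MIN_TTL:
            problems.append(f"entry {i}: ttl must be an int >= {MIN_TTL}")
    # gandi_livedns addresses one (record, type) pair per task, so a duplicated
    # pair would leave two tasks fighting over the same record set.
    pairs = Counter(
        (r["record"], r["type"]) for r in records if REQUIRED_KEYS <= set(r)
    )
    for (name, rtype), count in sorted(pairs.items()):
        if count > 1:
            problems.append(f"duplicate record/type pair: {name} {rtype} x{count}")
    return problems
```

A test function would then just assert `validate_records(...) == []` on the loaded list. Returning every problem at once, rather than failing on the first one, keeps the pytest output useful when several entries are wrong.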
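The one-time purge of Gandi's seeded defaults and the later authoritative sync ("GET existing → remove undeclared") from *Open design nuance* both reduce to a set difference over `(record, type)` pairs. The sketch below is a minimal illustration, and `plan_purge` is a hypothetical name. It assumes a `(record, type)` pair is the unit of identity, as with LiveDNS record sets. It also assumes `gandi_livedns` with `state: present` replaces the whole value set for that pair; confirm that against the module docs. The live set in the usage example is an illustrative subset, not the verified contents of the `wingu.me` zone.

```python
"""Sketch: which live (record, type) pairs would need `state: absent`."""

Pair = tuple[str, str]  # (record name relative to the zone, record type)


def plan_purge(live: set[Pair], declared: list[dict]) -> list[Pair]:
    """Return live pairs that no declared record will overwrite, sorted for a stable diff."""
    wanted = {(r["record"], r["type"]) for r in declared}
    # Pairs in both sets need no delete: `present` rewrites their values
    # (e.g. the seeded apex MX/SPF become the null MX and `v=spf1 -all`).
    # Declared-but-not-live pairs (e.g. _dmarc) are simply created by `present`.
    return sorted(live - wanted)


# Usage with an illustrative subset of the seeded zone (hypothetical data).
live = {("@", "A"), ("www", "CNAME"), ("@", "MX"), ("@", "TXT"), ("webmail", "CNAME")}
declared = [
    {"record": "@", "type": "MX", "values": ["0 ."], "ttl": 3600},
    {"record": "@", "type": "TXT", "values": ['"v=spf1 -all"'], "ttl": 3600},
    {"record": "_dmarc", "type": "TXT", "values": ['"v=DMARC1; p=reject;"'], "ttl": 3600},
]
print(plan_purge(live, declared))  # [('@', 'A'), ('webmail', 'CNAME'), ('www', 'CNAME')]
```

Each returned pair would become one `state: absent` entry for the one-time purge. While the role stays additive, that list is reviewed by a human (or surfaced by `make check`) rather than applied automatically, so a record added out-of-band is never deleted silently.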