boma/docs/superpowers/plans/2026-06-14-public-dns-m1.md
sjat b131ee317e docs(plan): M1 — public_dns implementation plan
Bite-sized TDD plan: add community.general; scaffold public_dns; wingu.me record
data + pytest; role tasks (gandi_livedns present/absent loops, apply toggle);
Molecule (apply=false, no live API); dns.yml play; gated live run on ubongo
(purge Gandi defaults + anti-spoof baseline + dig verify); ADR-007 amendment +
TODO 4 resolution + STATUS/CAPABILITIES.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 10:23:26 +02:00


# Public DNS (M1) Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Build the `public_dns` role that manages `wingu.me`'s records at Gandi LiveDNS as code, purging Gandi's seeded defaults and applying boma's anti-spoof baseline.
**Architecture:** A control-node role drives `community.general.gandi_livedns` over declarative record lists in `group_vars/all/public_dns.yml` (mirroring the firewall-catalog pattern). Records to keep are `state: present`; Gandi's auto-seeded defaults are `state: absent`. A `public_dns__apply` toggle lets Molecule converge without calling the API; a pytest validates the data shape; the live run happens via `make check`/`deploy PLAYBOOK=dns` on ubongo.
**Tech Stack:** Ansible (`community.general.gandi_livedns`, PAT auth), pytest, Gandi LiveDNS API. Secrets from `vault.gandi.pat`.
**Spec:** `docs/superpowers/specs/2026-06-11-public-dns-gandi-migration-design.md`
**Execution context:** Tasks 1-6 and 8 are authoring (any machine with the venv). **Task 7 runs on ubongo** (has the vault + Gandi egress) and is the only one that touches live Gandi.
---
## File Structure
- `requirements.yml` (modify) — add `community.general` (≥9.0.0) for `gandi_livedns`.
- `roles/public_dns/` (create) — `defaults/main.yml`, `tasks/main.yml`, `meta/main.yml`, `README.md`, `molecule/default/`.
- `inventories/production/group_vars/all/public_dns.yml` (create) — `public_dns__domain` + `public_dns__records` (present) + `public_dns__absent` (Gandi defaults).
- `playbooks/dns.yml` (create) — control-node play running the role.
- `tests/test_public_dns.py` (create) — pytest over the record data.
- `docs/decisions/007-network.md`, `STATUS.md`, `docs/TODO.md`, `docs/CAPABILITIES.md` (modify) — doc reconciliation.
---
### Task 1: Add the `community.general` collection
**Files:**
- Modify: `requirements.yml`
- [ ] **Step 1: Add the collection with the on-demand comment**
In `requirements.yml`, under `collections:`, append:
```yaml
# community.general — gandi_livedns (public_dns role manages wingu.me at Gandi
# LiveDNS). PAT auth requires >= 9.0.0.
- name: community.general
  version: ">=9.0.0"
```
- [ ] **Step 2: Install it**
Run: `make collections`
Expected: installs `community.general` (≥9.0.0) with no errors.
- [ ] **Step 3: Verify the module is available**
Run: `.venv/bin/ansible-doc community.general.gandi_livedns | head -5`
Expected: prints the module doc header (confirms the module resolves), mentioning `personal_access_token`.
- [ ] **Step 4: Commit**
```bash
git add requirements.yml
git commit -m "deps: add community.general for gandi_livedns (public_dns)"
```
---
### Task 2: Scaffold the role
**Files:**
- Create: `roles/public_dns/` (via the scaffolder)
- [ ] **Step 1: Scaffold**
Run: `make new-role NAME=public_dns`
Expected: `Role public_dns scaffolded at roles/public_dns/` (creates `tasks/`, `handlers/`, `defaults/`, `meta/`, `templates/`, `files/`, `molecule/default/`, `README.md`).
- [ ] **Step 2: Commit the scaffold**
```bash
git add roles/public_dns
git commit -m "scaffold(public_dns): empty role structure"
```
---
### Task 3: Record data + validation test (TDD)
**Files:**
- Test: `tests/test_public_dns.py`
- Create: `inventories/production/group_vars/all/public_dns.yml`
- [ ] **Step 1: Write the failing test**
Create `tests/test_public_dns.py`:
```python
import pathlib

import yaml

_DATA = (
    pathlib.Path(__file__).resolve().parent.parent
    / "inventories" / "production" / "group_vars" / "all" / "public_dns.yml"
)

# Gandi auto-seeds these on a fresh .me zone; boma purges them (verified 2026-06-14).
GANDI_DEFAULTS_ABSENT = {
    ("@", "A"), ("www", "CNAME"), ("webmail", "CNAME"),
    ("gm1._domainkey", "CNAME"), ("gm2._domainkey", "CNAME"), ("gm3._domainkey", "CNAME"),
    ("_imap._tcp", "SRV"), ("_imaps._tcp", "SRV"), ("_pop3._tcp", "SRV"),
    ("_pop3s._tcp", "SRV"), ("_submission._tcp", "SRV"),
}


def _load():
    return yaml.safe_load(_DATA.read_text())


def test_domain_is_wingu():
    assert _load()["public_dns__domain"] == "wingu.me"


def test_present_records_well_formed():
    for r in _load()["public_dns__records"]:
        assert r["record"] and r["type"]
        assert isinstance(r["values"], list) and r["values"]


def test_anti_spoof_baseline_present():
    recs = {(r["record"], r["type"]): r["values"] for r in _load()["public_dns__records"]}
    assert recs[("@", "MX")] == ["0 ."]  # null MX
    assert recs[("@", "TXT")] == ['"v=spf1 -all"']  # SPF deny-all
    assert recs[("_dmarc", "TXT")] == ['"v=DMARC1; p=reject;"']


def test_gandi_defaults_marked_absent():
    absent = {(r["record"], r["type"]) for r in _load()["public_dns__absent"]}
    assert GANDI_DEFAULTS_ABSENT <= absent


def test_no_record_both_present_and_absent():
    present = {(r["record"], r["type"]) for r in _load()["public_dns__records"]}
    absent = {(r["record"], r["type"]) for r in _load()["public_dns__absent"]}
    assert present.isdisjoint(absent)


def test_no_duplicate_present_records():
    keys = [(r["record"], r["type"]) for r in _load()["public_dns__records"]]
    assert len(keys) == len(set(keys))
```
- [ ] **Step 2: Run it to verify it fails**
Run: `.venv/bin/python -m pytest tests/test_public_dns.py -v`
Expected: FAIL (the data file does not exist yet — `FileNotFoundError`).
- [ ] **Step 3: Create the record data**
Create `inventories/production/group_vars/all/public_dns.yml`:
```yaml
---
# Public DNS — wingu.me at Gandi LiveDNS, managed by the public_dns role (M1).
# Mesh/LAN-only by default: only deliberate public records live here. PAT in
# vault.gandi.pat. See docs/decisions/007-network.md and the M1 spec.
public_dns__domain: wingu.me
# Present — anti-spoof baseline for a no-mail domain (overwrites Gandi's seeded mail set).
public_dns__records:
  - { record: "@", type: MX, values: ["0 ."], ttl: 3600 }
  - { record: "@", type: TXT, values: ['"v=spf1 -all"'], ttl: 3600 }
  - { record: _dmarc, type: TXT, values: ['"v=DMARC1; p=reject;"'], ttl: 3600 }
  # Service records appear as public-tier needs arise (askari A in M4).
  # Mesh/LAN-only services never appear here.
# Absent — Gandi's auto-seeded defaults we don't want (purged once, idempotent thereafter).
public_dns__absent:
  - { record: "@", type: A } # Gandi parking IP
  - { record: www, type: CNAME } # Gandi web-redirect
  - { record: webmail, type: CNAME } # Gandi webmail
  - { record: gm1._domainkey, type: CNAME } # Gandi DKIM
  - { record: gm2._domainkey, type: CNAME }
  - { record: gm3._domainkey, type: CNAME }
  - { record: _imap._tcp, type: SRV } # Gandi mail autodiscovery
  - { record: _imaps._tcp, type: SRV }
  - { record: _pop3._tcp, type: SRV }
  - { record: _pop3s._tcp, type: SRV }
  - { record: _submission._tcp, type: SRV }
```
- [ ] **Step 4: Run the test to verify it passes**
Run: `.venv/bin/python -m pytest tests/test_public_dns.py -v`
Expected: PASS (6 passed).
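To see what the disjointness and duplicate checks guard against, the same set logic can be exercised on in-memory data. This is a minimal sketch: `find_conflicts`, `find_duplicates`, and the sample lists are hypothetical helpers for illustration and are not part of the repo or the test file.

```python
def find_conflicts(present, absent):
    """Return (record, type) keys declared both present and absent."""
    present_keys = {(r["record"], r["type"]) for r in present}
    absent_keys = {(r["record"], r["type"]) for r in absent}
    return sorted(present_keys & absent_keys)


def find_duplicates(present):
    """Return (record, type) keys that occur more than once in the present list."""
    seen, dupes = set(), set()
    for r in present:
        key = (r["record"], r["type"])
        if key in seen:
            dupes.add(key)
        seen.add(key)
    return sorted(dupes)


# Hypothetical sample data, shaped like public_dns__records / public_dns__absent.
good_present = [
    {"record": "@", "type": "MX", "values": ["0 ."]},
    {"record": "@", "type": "TXT", "values": ['"v=spf1 -all"']},
]
good_absent = [{"record": "www", "type": "CNAME"}]
# A mistake the suite should catch: the null MX is also listed for removal.
bad_absent = good_absent + [{"record": "@", "type": "MX"}]
```

With the clean data both helpers return an empty list; with `bad_absent`, `find_conflicts` reports `("@", "MX")`, which is the situation `test_no_record_both_present_and_absent` is meant to reject.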
- [ ] **Step 5: Commit**
```bash
git add tests/test_public_dns.py inventories/production/group_vars/all/public_dns.yml
git commit -m "feat(public_dns): wingu.me record data + validation test"
```
---
### Task 4: Role implementation (defaults, tasks, meta, README)
**Files:**
- Modify: `roles/public_dns/defaults/main.yml`
- Modify: `roles/public_dns/tasks/main.yml`
- Modify: `roles/public_dns/meta/main.yml`
- Modify: `roles/public_dns/README.md`
- [ ] **Step 1: Write `defaults/main.yml`**
```yaml
---
# public_dns — manage the public zone at Gandi LiveDNS as code (M1).
# Record data (public_dns__domain / __records / __absent) lives in group_vars/all.
# See docs/decisions/007-network.md.
public_dns__apply: true # set false to validate without calling the Gandi API (Molecule)
public_dns__default_ttl: 1800 # TTL when a record omits one
public_dns__domain: "" # overridden in group_vars/all
public_dns__records: [] # present records
public_dns__absent: [] # records to remove
```
- [ ] **Step 2: Write `tasks/main.yml`**
```yaml
---
- name: Assert public DNS data is sane
  ansible.builtin.assert:
    that:
      - public_dns__domain | length > 0
      - public_dns__records | selectattr('type', 'equalto', 'MX') | list | length > 0
    fail_msg: >-
      public_dns__domain must be set and a null-MX anti-spoof record declared in
      public_dns__records (group_vars/all/public_dns.yml).
  run_once: true

- name: Ensure desired records are present (Gandi LiveDNS)
  community.general.gandi_livedns:
    domain: "{{ public_dns__domain }}"
    record: "{{ item.record }}"
    type: "{{ item.type }}"
    # Bracket lookup: item.values would resolve to the dict.values() method.
    values: "{{ item['values'] }}"
    ttl: "{{ item.ttl | default(public_dns__default_ttl) }}"
    state: present
    personal_access_token: "{{ vault.gandi.pat }}"
  loop: "{{ public_dns__records }}"
  loop_control:
    label: "{{ item.record }} {{ item.type }}"
  run_once: true
  when: public_dns__apply | bool

- name: Ensure unwanted records are absent (Gandi LiveDNS)
  community.general.gandi_livedns:
    domain: "{{ public_dns__domain }}"
    record: "{{ item.record }}"
    type: "{{ item.type }}"
    state: absent
    personal_access_token: "{{ vault.gandi.pat }}"
  loop: "{{ public_dns__absent }}"
  loop_control:
    label: "{{ item.record }} {{ item.type }}"
  run_once: true
  when: public_dns__apply | bool
```
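One thing worth knowing when a loop item carries a key named `values`: in Jinja, dot notation tries a Python attribute before a dictionary key, so a key that collides with a dict method (`values`, `keys`, `items`) is only reachable reliably with bracket notation. The sketch below is a simplified plain-Python model of that lookup order, not Jinja's actual code; `dot_lookup` and `bracket_lookup` are hypothetical names.

```python
def dot_lookup(obj, name):
    """Simplified model of Jinja's `obj.name`: attribute first, then item."""
    try:
        return getattr(obj, name)
    except AttributeError:
        return obj[name]


def bracket_lookup(obj, name):
    """Simplified model of Jinja's `obj['name']`: item first, then attribute."""
    try:
        return obj[name]
    except (KeyError, IndexError, TypeError):
        return getattr(obj, name)


# A record shaped like one entry of public_dns__records.
item = {"record": "@", "type": "MX", "values": ["0 ."]}

record_via_dot = dot_lookup(item, "record")          # no such attribute, falls back to the key
values_via_dot = dot_lookup(item, "values")          # hits the dict.values method instead
values_via_bracket = bracket_lookup(item, "values")  # the list stored under the key
```

Keys such as `record`, `type`, and `ttl` are safe with dot notation because dicts have no attributes by those names; only `values` needs the bracket form.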
- [ ] **Step 3: Write `meta/main.yml`**
```yaml
---
galaxy_info:
  author: sjat
  description: Manage boma's public DNS zone (wingu.me) at Gandi LiveDNS as code.
  license: MIT
  min_ansible_version: "2.17"
  platforms:
    - name: Debian
      versions:
        - trixie
dependencies: []
```
- [ ] **Step 4: Write `README.md`**
```markdown
# public_dns
Manages boma's public DNS zone (**wingu.me**) at **Gandi LiveDNS** as code, via
`community.general.gandi_livedns` (PAT auth from `vault.gandi.pat`). Provider-agnostic
name on purpose. Run from the control node: `make check/deploy PLAYBOOK=dns`.
Mesh/LAN-only by default — only deliberate public records live in the zone (the
anti-spoof baseline now; `askari` in M4). Everything else is reached over LAN/mesh and
never appears here.
## Data (in `group_vars/all/public_dns.yml`)
| Var | Meaning |
|---|---|
| `public_dns__domain` | the zone (`wingu.me`) |
| `public_dns__records` | records to ensure **present** (`record`, `type`, `values`, optional `ttl`) |
| `public_dns__absent` | records to ensure **absent** (Gandi's auto-seeded defaults) |
## Behaviour knobs (`defaults/main.yml`)
| Var | Default | Meaning |
|---|---|---|
| `public_dns__apply` | `true` | set `false` to validate without calling the Gandi API (Molecule) |
| `public_dns__default_ttl` | `1800` | TTL when a record omits one |
## Notes
The zone is reconciled **additively** plus an explicit `absent` list (Gandi seeds 13
default records on a new `.me`; we purge the unwanted 11 and overwrite MX/SPF with the
anti-spoof baseline). Full-zone authoritative pruning is a future enhancement (TODO 8.3).
```
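The "additive plus explicit absent list" behaviour described in the README notes can be modelled in a few lines. This is only an illustrative model under stated assumptions: `reconcile` is a hypothetical function, the zone is a plain dict keyed by `(record, type)`, and the starting values are invented placeholders, not Gandi's real defaults or the module's implementation.

```python
def reconcile(zone, present, absent):
    """Model additive reconciliation: set `present`, drop `absent`, keep the rest."""
    result = dict(zone)  # records named in neither list are left untouched
    for rec in absent:
        result.pop((rec["record"], rec["type"]), None)
    for rec in present:
        result[(rec["record"], rec["type"])] = list(rec["values"])
    return result


# Hypothetical starting zone: two seeded defaults plus one hand-made record.
zone = {
    ("@", "MX"): ["10 spool.mail.example."],
    ("www", "CNAME"): ["webredir.example."],
    ("manual", "A"): ["192.0.2.10"],
}
present = [{"record": "@", "type": "MX", "values": ["0 ."]}]
absent = [{"record": "www", "type": "CNAME"}]

after = reconcile(zone, present, absent)
```

In this model the MX is overwritten, the `www` CNAME is removed, and the unlisted `manual` record survives, which is why full-zone pruning is a separate enhancement. Running `reconcile` again on `after` returns the same mapping, mirroring the idempotence expected of the role.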
- [ ] **Step 5: Lint**
Run: `make lint`
Expected: `Passed: 0 failure(s)` and `check-tags: OK`.
- [ ] **Step 6: Commit**
```bash
git add roles/public_dns
git commit -m "feat(public_dns): role tasks, defaults, meta, README"
```
---
### Task 5: Molecule scenario (no live API)
**Files:**
- Modify: `roles/public_dns/molecule/default/converge.yml`
- Modify: `roles/public_dns/molecule/default/verify.yml`
- [ ] **Step 1: Write `converge.yml` (apply disabled, sample data)**
```yaml
---
- name: Converge
  hosts: all
  gather_facts: true
  vars:
    public_dns__apply: false # never call the Gandi API from a container
    public_dns__domain: example.test
    public_dns__records:
      - { record: "@", type: MX, values: ["0 ."], ttl: 3600 }
      - { record: "@", type: TXT, values: ['"v=spf1 -all"'], ttl: 3600 }
    public_dns__absent:
      - { record: www, type: CNAME }
  roles:
    - role: public_dns
```
- [ ] **Step 2: Write `verify.yml`**
```yaml
---
- name: Verify
  hosts: all
  gather_facts: false
  tasks:
    - name: Role variables resolved
      ansible.builtin.assert:
        that:
          - public_dns__domain == "example.test"
          - public_dns__apply | bool == false
        msg: "public_dns defaults/vars did not resolve as expected"
      tags: [verify]
```
- [ ] **Step 3: Run Molecule**
Run: `make test ROLE=public_dns`
Expected: PASS — converge applies the role (the `assert` passes; the `gandi_livedns` tasks are skipped because `public_dns__apply: false`), verify passes, idempotence clean.
- [ ] **Step 4: Commit**
```bash
git add roles/public_dns/molecule
git commit -m "test(public_dns): Molecule scenario (apply disabled, no live API)"
```
---
### Task 6: The `dns.yml` playbook
**Files:**
- Create: `playbooks/dns.yml`
- [ ] **Step 1: Write the play**
```yaml
---
# dns.yml — manage the public DNS zone (wingu.me) at Gandi LiveDNS as code.
# Runs on the control node (ubongo) against the Gandi API — no host config.
# Run: make check PLAYBOOK=dns then make deploy PLAYBOOK=dns
- name: Manage public DNS (Gandi LiveDNS)
  hosts: control
  connection: local
  gather_facts: false
  become: false
  roles:
    - role: public_dns
      tags: [public_dns]
```
- [ ] **Step 2: Lint (verifies the role-name tag on the import)**
Run: `make lint`
Expected: `Passed: 0 failure(s)` and `check-tags: OK (... role imports verified)`.
- [ ] **Step 3: Commit**
```bash
git add playbooks/dns.yml
git commit -m "feat(public_dns): dns.yml play (control-node, Gandi LiveDNS)"
```
---
### Task 7: Live run on ubongo (purge + baseline) — gated
> **Runs on ubongo only** (vault + Gandi egress). `rbw unlock` first. This is the one
> task that mutates live Gandi; review the check-mode diff before deploying.
- [ ] **Step 1: Dry-run (check mode + diff)**
Run: `make check PLAYBOOK=dns`
Expected: the diff shows the 3 present records being set (null MX, SPF `-all`, DMARC `reject`) and the 11 Gandi defaults being removed. **Review it.**
- [ ] **Step 2: Apply**
Run: `make deploy PLAYBOOK=dns`
Expected: `changed` for the present + absent records; no errors.
- [ ] **Step 3: Verify idempotence**
Run: `make deploy PLAYBOOK=dns`
Expected: `ok=... changed=0` — a second run makes no changes.
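If the idempotence check is ever scripted rather than read by eye, the `PLAY RECAP` line can be parsed for its counters. A minimal sketch, assuming the usual `host : ok=N changed=N ...` recap layout; `parse_recap`, `is_idempotent`, and the sample line (including its `ok=3` count) are hypothetical and not part of the repo's tooling.

```python
import re


def parse_recap(line):
    """Split one PLAY RECAP line into (host, {counter: int})."""
    host, _, counters = line.partition(":")
    stats = {key: int(val) for key, val in re.findall(r"(\w+)=(\d+)", counters)}
    return host.strip(), stats


def is_idempotent(recap_lines):
    """True when every recap line reports no changes, failures, or unreachable hosts."""
    for line in recap_lines:
        _, stats = parse_recap(line)
        if stats.get("changed", 0) or stats.get("failed", 0) or stats.get("unreachable", 0):
            return False
    return True


# Hypothetical recap line from a clean second run.
sample = "ubongo : ok=3 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0"
host, stats = parse_recap(sample)
```

Here `is_idempotent([sample])` is true, while a line carrying `changed=2` would make it false.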
- [ ] **Step 4: Verify with dig**
```bash
dig +short MX wingu.me # expect: 0 .
dig +short TXT wingu.me # expect: "v=spf1 -all"
dig +short TXT _dmarc.wingu.me # expect: "v=DMARC1; p=reject;"
dig +short www.wingu.me # expect: empty (CNAME removed)
```
Expected: as annotated (allow for TTL/propagation).
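The same four checks can be expressed as data plus a comparison, which makes them repeatable after propagation. This is a sketch under assumptions: `EXPECTED`, `dig_short`, and `check_zone` are hypothetical names, `dig` must be on `PATH` for the default resolver, and the `www` lookup uses type `A` because that is what `dig` queries when no type is given.

```python
import subprocess

# Expected `dig +short` answers, taken from the annotations above.
EXPECTED = {
    ("MX", "wingu.me"): ["0 ."],
    ("TXT", "wingu.me"): ['"v=spf1 -all"'],
    ("TXT", "_dmarc.wingu.me"): ['"v=DMARC1; p=reject;"'],
    ("A", "www.wingu.me"): [],  # empty once the CNAME is removed
}


def dig_short(rtype, name):
    """Run `dig +short <type> <name>` and return the non-empty answer lines."""
    proc = subprocess.run(
        ["dig", "+short", rtype, name],
        capture_output=True, text=True, check=True,
    )
    return [ln.strip() for ln in proc.stdout.splitlines() if ln.strip()]


def check_zone(expected, resolver=dig_short):
    """Return {(type, name): actual_answers} for every query that differs."""
    mismatches = {}
    for (rtype, name), want in expected.items():
        got = resolver(rtype, name)
        if sorted(got) != sorted(want):
            mismatches[(rtype, name)] = got
    return mismatches
```

Calling `check_zone(EXPECTED)` on ubongo returns an empty dict when the zone matches. The `resolver` parameter exists so the comparison can be exercised with a stub instead of live DNS.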
- [ ] **Step 5: No commit** — this task changes live Gandi, not the repo.
---
### Task 8: Documentation reconciliation
**Files:**
- Modify: `docs/decisions/007-network.md`
- Modify: `STATUS.md`
- Modify: `docs/TODO.md`
- Modify: `docs/CAPABILITIES.md`
- [ ] **Step 1: Amend ADR-007 — naming scheme row**
Replace the `Public service FQDN` row of the naming-scheme table:
```
| Public service FQDN | `<service>.baobab.band` | `forgejo.nyumbani.baobab.band` |
```
with:
```
| Public service FQDN | `<service>.wingu.me` | `vaultwarden.wingu.me` |
| Off-site (VPS) FQDN | `<service>.askari.wingu.me` | `netbird.askari.wingu.me` |
```
- [ ] **Step 2: Amend ADR-007 — public zone + scheme**
Replace the **Public zone** paragraph:
```
**Public zone**: `baobab.band` — served by external DNS (Cloudflare or equivalent).
Public-facing services resolve to the public IP or Cloudflare proxy.
```
with:
```
**Public zone**: `wingu.me` — Gandi LiveDNS, **managed as code** by the `public_dns`
role (`vault.gandi.pat`). Three-tier naming: infra `<host>.boma.wingu.me` (internal),
services `<service>.wingu.me` (split-horizon), off-site `<service>.askari.wingu.me`.
`nyumbani` is retired. **Mesh/LAN-only by default**: home services have no public record
(reached over LAN or the NetBird mesh); only deliberate exceptions are published.
The project is `boma`; the domain is `wingu.me` (see the M1 spec). The legacy
`baobab.band` zone (Cloudflare) is out of scope here.
```
- [ ] **Step 3: Update the split-horizon example**
In the **Split-horizon** paragraph, replace the example `forgejo.nyumbani.baobab.band`
with `vaultwarden.wingu.me` (internal → private proxy IP; public → published only as a
deliberate exception). Leave the internal-zone wording (`boma.baobab.band`) as it is for now,
since that zone is renamed only when the `dns` role lands in Phase 2, and add a parenthetical: *(internal zone is renamed
to `boma.wingu.me` when the `dns` role is built — Phase 2)*.
- [ ] **Step 4: Mark STATUS — public_dns built**
In `STATUS.md`, under "Real and working today", add a row:
```
| `roles/public_dns/` + `playbooks/dns.yml` | **Built + applied.** Manages wingu.me at Gandi LiveDNS as code (`community.general.gandi_livedns`, PAT from `vault.gandi.pat`); purged Gandi's seeded defaults, applied the anti-spoof baseline (null MX, SPF `-all`, DMARC reject). Mesh/LAN-only default. M1 of the roadmap. |
```
- [ ] **Step 5: Resolve TODO 4**
In `docs/TODO.md`, change item 4 to struck-through/decided:
```
4. ~~**Split-horizon FQDN** — adopt split-horizon FQDN with or without nyumbani?~~
DECIDED (M1): three-tier scheme on `wingu.me`; `nyumbani` dropped; mesh/LAN-only
default. See `docs/decisions/007-network.md` + the M1 spec.
```
- [ ] **Step 6: Add a CAPABILITIES row**
In `docs/CAPABILITIES.md`, near the Internal DNS row, add:
```
| Public DNS | `public_dns` role → Gandi LiveDNS | P | core | wingu.me zone as code (ADR-007) | anti-spoof baseline; mesh/LAN-only |
```
(Match the surrounding table's column shape; adjust the status letter to the table's convention.)
- [ ] **Step 7: Lint + commit**
Run: `make lint`
Expected: clean.
```bash
git add docs/decisions/007-network.md STATUS.md docs/TODO.md docs/CAPABILITIES.md
git commit -m "docs(public_dns): amend ADR-007 to wingu.me/Gandi; resolve TODO 4; STATUS + CAPABILITIES"
```
---
## Self-Review (completed)
- **Spec coverage:** role + group_vars data (Decisions 4,5) → Tasks 3,4; `gandi_livedns` + PAT (Decision 5, Verified facts) → Task 4; collections-on-demand (Decision 5) → Task 1; anti-spoof baseline + Gandi-defaults purge (Problem, Data model) → Tasks 3,7; cert scope (Decision 6) → out of scope (no cert tasks, correct); testing (check-mode/idempotence/dig + pytest) → Tasks 5,7,3; ADR-007 amendment + TODO 4/O12 → Task 8. All covered.
- **Placeholder scan:** none — every code/content step is concrete.
- **Type/name consistency:** `public_dns__domain`/`__records`/`__absent`/`__apply`/`__default_ttl` and `vault.gandi.pat` used identically across data, role, play, and tests. `gandi_livedns` params match the verified module signature.
- **Note for the implementer:** Task 7 assumes ubongo. If the `gandi_livedns` `absent` call needs `values` for some record types, add them from `public_dns__absent` (verify against the pinned `community.general` version per ADR-014).