docs(plan): M1 — public_dns implementation plan

Bite-sized TDD plan: add community.general; scaffold public_dns; wingu.me record
data + pytest; role tasks (gandi_livedns present/absent loops, apply toggle);
Molecule (apply=false, no live API); dns.yml play; gated live run on ubongo
(purge Gandi defaults + anti-spoof baseline + dig verify); ADR-007 amendment +
TODO 4 resolution + STATUS/CAPABILITIES.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
sjat 2026-06-14 10:23:26 +02:00
parent 602550fdaa
commit b131ee317e


@ -0,0 +1,551 @@
# Public DNS (M1) Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Build the `public_dns` role that manages `wingu.me`'s records at Gandi LiveDNS as code, purging Gandi's seeded defaults and applying boma's anti-spoof baseline.
**Architecture:** A control-node role drives `community.general.gandi_livedns` over declarative record lists in `group_vars/all/public_dns.yml` (mirroring the firewall-catalog pattern). Records to keep are `state: present`; Gandi's auto-seeded defaults are `state: absent`. A `public_dns__apply` toggle lets Molecule converge without calling the API; a pytest validates the data shape; the live run happens via `make check`/`deploy PLAYBOOK=dns` on ubongo.
**Tech Stack:** Ansible (`community.general.gandi_livedns`, PAT auth), pytest, Gandi LiveDNS API. Secrets from `vault.gandi.pat`.
**Spec:** `docs/superpowers/specs/2026-06-11-public-dns-gandi-migration-design.md`
**Execution context:** Tasks 1–6 and 8 are authoring (any machine with the venv). **Task 7 runs on ubongo** (has the vault + Gandi egress) and is the only one that touches live Gandi.
---
## File Structure
- `requirements.yml` (modify) — add `community.general` (≥9.0.0) for `gandi_livedns`.
- `roles/public_dns/` (create) — `defaults/main.yml`, `tasks/main.yml`, `meta/main.yml`, `README.md`, `molecule/default/`.
- `inventories/production/group_vars/all/public_dns.yml` (create) — `public_dns__domain` + `public_dns__records` (present) + `public_dns__absent` (Gandi defaults).
- `playbooks/dns.yml` (create) — control-node play running the role.
- `tests/test_public_dns.py` (create) — pytest over the record data.
- `docs/decisions/007-network.md`, `STATUS.md`, `docs/TODO.md`, `docs/CAPABILITIES.md` (modify) — doc reconciliation.
---
### Task 1: Add the `community.general` collection
**Files:**
- Modify: `requirements.yml`
- [ ] **Step 1: Add the collection with the on-demand comment**
In `requirements.yml`, under `collections:`, append:
```yaml
# community.general — gandi_livedns (public_dns role manages wingu.me at Gandi
# LiveDNS). PAT auth requires >= 9.0.0.
  - name: community.general
    version: ">=9.0.0"
```
- [ ] **Step 2: Install it**
Run: `make collections`
Expected: installs `community.general` (≥9.0.0) with no errors.
- [ ] **Step 3: Verify the module is available**
Run: `.venv/bin/ansible-doc community.general.gandi_livedns | head -5`
Expected: prints the module doc header (confirms the module resolves), mentioning `personal_access_token`.
- [ ] **Step 4: Commit**
```bash
git add requirements.yml
git commit -m "deps: add community.general for gandi_livedns (public_dns)"
```
---
### Task 2: Scaffold the role
**Files:**
- Create: `roles/public_dns/` (via the scaffolder)
- [ ] **Step 1: Scaffold**
Run: `make new-role NAME=public_dns`
Expected: `Role public_dns scaffolded at roles/public_dns/` (creates `tasks/`, `handlers/`, `defaults/`, `meta/`, `templates/`, `files/`, `molecule/default/`, `README.md`).
- [ ] **Step 2: Commit the scaffold**
```bash
git add roles/public_dns
git commit -m "scaffold(public_dns): empty role structure"
```
---
### Task 3: Record data + validation test (TDD)
**Files:**
- Test: `tests/test_public_dns.py`
- Create: `inventories/production/group_vars/all/public_dns.yml`
- [ ] **Step 1: Write the failing test**
Create `tests/test_public_dns.py`:
```python
import pathlib

import yaml

_DATA = (
    pathlib.Path(__file__).resolve().parent.parent
    / "inventories" / "production" / "group_vars" / "all" / "public_dns.yml"
)

# Gandi auto-seeds these on a fresh .me zone; boma purges them (verified 2026-06-14).
GANDI_DEFAULTS_ABSENT = {
    ("@", "A"), ("www", "CNAME"), ("webmail", "CNAME"),
    ("gm1._domainkey", "CNAME"), ("gm2._domainkey", "CNAME"), ("gm3._domainkey", "CNAME"),
    ("_imap._tcp", "SRV"), ("_imaps._tcp", "SRV"), ("_pop3._tcp", "SRV"),
    ("_pop3s._tcp", "SRV"), ("_submission._tcp", "SRV"),
}


def _load():
    return yaml.safe_load(_DATA.read_text())


def test_domain_is_wingu():
    assert _load()["public_dns__domain"] == "wingu.me"


def test_present_records_well_formed():
    for r in _load()["public_dns__records"]:
        assert r["record"] and r["type"]
        assert isinstance(r["values"], list) and r["values"]


def test_anti_spoof_baseline_present():
    recs = {(r["record"], r["type"]): r["values"] for r in _load()["public_dns__records"]}
    assert recs[("@", "MX")] == ["0 ."]  # null MX
    assert recs[("@", "TXT")] == ['"v=spf1 -all"']  # SPF deny-all
    assert recs[("_dmarc", "TXT")] == ['"v=DMARC1; p=reject;"']


def test_gandi_defaults_marked_absent():
    absent = {(r["record"], r["type"]) for r in _load()["public_dns__absent"]}
    assert GANDI_DEFAULTS_ABSENT <= absent


def test_no_record_both_present_and_absent():
    present = {(r["record"], r["type"]) for r in _load()["public_dns__records"]}
    absent = {(r["record"], r["type"]) for r in _load()["public_dns__absent"]}
    assert present.isdisjoint(absent)


def test_no_duplicate_present_records():
    keys = [(r["record"], r["type"]) for r in _load()["public_dns__records"]]
    assert len(keys) == len(set(keys))
```
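The tests above read the real group_vars file. As a rough illustration of the same shape rules on inline data, the sketch below uses a hypothetical helper, `record_problems`, that is not part of the plan. It also flags TXT values that are not wrapped in double quotes, which mirrors the convention the data file follows; whether to enforce that is an assumption, not a stated requirement.

```python
def record_problems(records):
    """Return human-readable problems for a list of present-record dicts.

    Hypothetical helper for illustration only; the plan's real checks live
    in tests/test_public_dns.py.
    """
    problems = []
    seen = set()
    for r in records:
        key = (r.get("record"), r.get("type"))
        if not all(key):
            problems.append(f"{key}: record and type are required")
        if key in seen:
            problems.append(f"{key}: duplicate (record, type) pair")
        seen.add(key)
        values = r.get("values")
        if not isinstance(values, list) or not values:
            problems.append(f"{key}: values must be a non-empty list")
            continue
        if r.get("type") == "TXT":
            for v in values:
                # Assumed convention: TXT payloads carry their own double quotes.
                if not (len(v) >= 2 and v.startswith('"') and v.endswith('"')):
                    problems.append(f"{key}: TXT value {v!r} is not double-quoted")
    return problems


sample = [
    {"record": "@", "type": "MX", "values": ["0 ."], "ttl": 3600},
    {"record": "@", "type": "TXT", "values": ['"v=spf1 -all"'], "ttl": 3600},
    {"record": "_dmarc", "type": "TXT", "values": ["v=DMARC1; p=reject;"]},
]
for problem in record_problems(sample):
    print(problem)
```

Only the last sample entry is reported, because its TXT value lacks the surrounding quotes.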
- [ ] **Step 2: Run it to verify it fails**
Run: `.venv/bin/python -m pytest tests/test_public_dns.py -v`
Expected: FAIL (the data file does not exist yet — `FileNotFoundError`).
- [ ] **Step 3: Create the record data**
Create `inventories/production/group_vars/all/public_dns.yml`:
```yaml
---
# Public DNS — wingu.me at Gandi LiveDNS, managed by the public_dns role (M1).
# Mesh/LAN-only by default: only deliberate public records live here. PAT in
# vault.gandi.pat. See docs/decisions/007-network.md and the M1 spec.
public_dns__domain: wingu.me
# Present — anti-spoof baseline for a no-mail domain (overwrites Gandi's seeded mail set).
public_dns__records:
- { record: "@", type: MX, values: ["0 ."], ttl: 3600 }
- { record: "@", type: TXT, values: ['"v=spf1 -all"'], ttl: 3600 }
- { record: _dmarc, type: TXT, values: ['"v=DMARC1; p=reject;"'], ttl: 3600 }
# Service records appear as public-tier needs arise (askari A in M4).
# Mesh/LAN-only services never appear here.
# Absent — Gandi's auto-seeded defaults we don't want (purged once, idempotent thereafter).
public_dns__absent:
- { record: "@", type: A } # Gandi parking IP
- { record: www, type: CNAME } # Gandi web-redirect
- { record: webmail, type: CNAME } # Gandi webmail
- { record: gm1._domainkey, type: CNAME } # Gandi DKIM
- { record: gm2._domainkey, type: CNAME }
- { record: gm3._domainkey, type: CNAME }
- { record: _imap._tcp, type: SRV } # Gandi mail autodiscovery
- { record: _imaps._tcp, type: SRV }
- { record: _pop3._tcp, type: SRV }
- { record: _pop3s._tcp, type: SRV }
- { record: _submission._tcp, type: SRV }
```
- [ ] **Step 4: Run the test to verify it passes**
Run: `.venv/bin/python -m pytest tests/test_public_dns.py -v`
Expected: PASS (6 passed).
- [ ] **Step 5: Commit**
```bash
git add tests/test_public_dns.py inventories/production/group_vars/all/public_dns.yml
git commit -m "feat(public_dns): wingu.me record data + validation test"
```
---
### Task 4: Role implementation (defaults, tasks, meta, README)
**Files:**
- Modify: `roles/public_dns/defaults/main.yml`
- Modify: `roles/public_dns/tasks/main.yml`
- Modify: `roles/public_dns/meta/main.yml`
- Modify: `roles/public_dns/README.md`
- [ ] **Step 1: Write `defaults/main.yml`**
```yaml
---
# public_dns — manage the public zone at Gandi LiveDNS as code (M1).
# Record data (public_dns__domain / __records / __absent) lives in group_vars/all.
# See docs/decisions/007-network.md.
public_dns__apply: true # set false to validate without calling the Gandi API (Molecule)
public_dns__default_ttl: 1800 # TTL when a record omits one
public_dns__domain: "" # overridden in group_vars/all
public_dns__records: [] # present records
public_dns__absent: [] # records to remove
```
- [ ] **Step 2: Write `tasks/main.yml`**
```yaml
---
- name: Assert public DNS data is sane
  ansible.builtin.assert:
    that:
      - public_dns__domain | length > 0
      - public_dns__records | selectattr('type', 'equalto', 'MX') | list | length > 0
    fail_msg: >-
      public_dns__domain must be set and a null-MX anti-spoof record declared in
      public_dns__records (group_vars/all/public_dns.yml).
  run_once: true

- name: Ensure desired records are present (Gandi LiveDNS)
  community.general.gandi_livedns:
    domain: "{{ public_dns__domain }}"
    record: "{{ item.record }}"
    type: "{{ item.type }}"
    # Bracket access: item.values would resolve to the dict method, not the key.
    values: "{{ item['values'] }}"
    ttl: "{{ item.ttl | default(public_dns__default_ttl) }}"
    state: present
    personal_access_token: "{{ vault.gandi.pat }}"
  loop: "{{ public_dns__records }}"
  loop_control:
    label: "{{ item.record }} {{ item.type }}"
  run_once: true
  when: public_dns__apply | bool

- name: Ensure unwanted records are absent (Gandi LiveDNS)
  community.general.gandi_livedns:
    domain: "{{ public_dns__domain }}"
    record: "{{ item.record }}"
    type: "{{ item.type }}"
    state: absent
    personal_access_token: "{{ vault.gandi.pat }}"
  loop: "{{ public_dns__absent }}"
  loop_control:
    label: "{{ item.record }} {{ item.type }}"
  run_once: true
  when: public_dns__apply | bool
```
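Each record in the loop is a dict with a key named `values`, which is also the name of a built-in dict method. Jinja2 resolves dotted access attribute-first and bracket access item-first, so it is worth checking how a `values` key is read in templates. The sketch below is a simplified, stdlib-only model of those two lookup orders; `dot_lookup` and `bracket_lookup` are hypothetical names, not Jinja2's own code.

```python
def dot_lookup(obj, name):
    """Simplified model of Jinja2 `obj.name`: attribute first, then item."""
    try:
        return getattr(obj, name)
    except AttributeError:
        return obj[name]


def bracket_lookup(obj, name):
    """Simplified model of Jinja2 `obj['name']`: item first, then attribute."""
    try:
        return obj[name]
    except (KeyError, IndexError, TypeError):
        return getattr(obj, name)


item = {"record": "@", "type": "MX", "values": ["0 ."], "ttl": 3600}

print(dot_lookup(item, "record"))            # @
print(callable(dot_lookup(item, "values")))  # True: the dict.values method, not the list
print(bracket_lookup(item, "values"))        # ['0 .']
```

Keys such as `record`, `type`, and `ttl` do not collide with dict attributes, so dotted access is safe for them; only `values` needs the bracket form.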
- [ ] **Step 3: Write `meta/main.yml`**
```yaml
---
galaxy_info:
  author: sjat
  description: Manage boma's public DNS zone (wingu.me) at Gandi LiveDNS as code.
  license: MIT
  min_ansible_version: "2.17"
  platforms:
    - name: Debian
      versions:
        - trixie

dependencies: []
```
- [ ] **Step 4: Write `README.md`**
```markdown
# public_dns
Manages boma's public DNS zone (**wingu.me**) at **Gandi LiveDNS** as code, via
`community.general.gandi_livedns` (PAT auth from `vault.gandi.pat`). Provider-agnostic
name on purpose. Run from the control node: `make check/deploy PLAYBOOK=dns`.
Mesh/LAN-only by default — only deliberate public records live in the zone (the
anti-spoof baseline now; `askari` in M4). Everything else is reached over LAN/mesh and
never appears here.
## Data (in `group_vars/all/public_dns.yml`)
| Var | Meaning |
|---|---|
| `public_dns__domain` | the zone (`wingu.me`) |
| `public_dns__records` | records to ensure **present** (`record`, `type`, `values`, optional `ttl`) |
| `public_dns__absent` | records to ensure **absent** (Gandi's auto-seeded defaults) |
## Behaviour knobs (`defaults/main.yml`)
| Var | Default | Meaning |
|---|---|---|
| `public_dns__apply` | `true` | set `false` to validate without calling the Gandi API (Molecule) |
| `public_dns__default_ttl` | `1800` | TTL when a record omits one |
## Notes
The zone is reconciled **additively** plus an explicit `absent` list (Gandi seeds 13
default records on a new `.me`; we purge the unwanted 11 and overwrite MX/SPF with the
anti-spoof baseline). Full-zone authoritative pruning is a future enhancement (TODO 8.3).
```
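The README's "additive plus explicit `absent` list" model can be pictured with a small in-memory sketch. This is not how `gandi_livedns` works internally; `reconcile` is a hypothetical function, and the toy zone uses placeholder `example.test` values rather than Gandi's real seeded records.

```python
def reconcile(zone, present, absent):
    """Return (new_zone, changes) for an additive reconcile with explicit removals.

    zone:    dict mapping (record, type) -> list of values currently live
    present: list of dicts with record/type/values to enforce
    absent:  list of dicts with record/type to delete
    Anything in `zone` that neither list names is left untouched.
    """
    new_zone = dict(zone)
    changes = []
    for r in present:
        key = (r["record"], r["type"])
        if new_zone.get(key) != r["values"]:
            changes.append(("set", key))
            new_zone[key] = list(r["values"])
    for r in absent:
        key = (r["record"], r["type"])
        if key in new_zone:
            changes.append(("delete", key))
            del new_zone[key]
    return new_zone, changes


live = {
    ("@", "NS"): ["ns1.example.test."],           # named in neither list: kept
    ("@", "MX"): ["10 mail.example.test."],       # overwritten by the baseline
    ("www", "CNAME"): ["parking.example.test."],  # purged
}
present = [{"record": "@", "type": "MX", "values": ["0 ."]}]
absent = [{"record": "www", "type": "CNAME"}, {"record": "webmail", "type": "CNAME"}]

after, first = reconcile(live, present, absent)
_, second = reconcile(after, present, absent)
print(first)   # [('set', ('@', 'MX')), ('delete', ('www', 'CNAME'))]
print(second)  # []
```

The empty second result is the idempotence property the live run checks for, and the surviving NS entry shows why full-zone pruning is a separate, future concern.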
- [ ] **Step 5: Lint**
Run: `make lint`
Expected: `Passed: 0 failure(s)` and `check-tags: OK`.
- [ ] **Step 6: Commit**
```bash
git add roles/public_dns
git commit -m "feat(public_dns): role tasks, defaults, meta, README"
```
---
### Task 5: Molecule scenario (no live API)
**Files:**
- Modify: `roles/public_dns/molecule/default/converge.yml`
- Modify: `roles/public_dns/molecule/default/verify.yml`
- [ ] **Step 1: Write `converge.yml` (apply disabled, sample data)**
```yaml
---
- name: Converge
  hosts: all
  gather_facts: true
  vars:
    public_dns__apply: false # never call the Gandi API from a container
    public_dns__domain: example.test
    public_dns__records:
      - { record: "@", type: MX, values: ["0 ."], ttl: 3600 }
      - { record: "@", type: TXT, values: ['"v=spf1 -all"'], ttl: 3600 }
    public_dns__absent:
      - { record: www, type: CNAME }
  roles:
    - role: public_dns
```
- [ ] **Step 2: Write `verify.yml`**
```yaml
---
- name: Verify
  hosts: all
  gather_facts: false
  tasks:
    - name: Role variables resolved
      ansible.builtin.assert:
        that:
          - public_dns__domain == "example.test"
          - public_dns__apply | bool == false
        msg: "public_dns defaults/vars did not resolve as expected"
      tags: [verify]
```
- [ ] **Step 3: Run Molecule**
Run: `make test ROLE=public_dns`
Expected: PASS — converge applies the role (the `assert` passes; the `gandi_livedns` tasks are skipped because `public_dns__apply: false`), verify passes, idempotence clean.
- [ ] **Step 4: Commit**
```bash
git add roles/public_dns/molecule
git commit -m "test(public_dns): Molecule scenario (apply disabled, no live API)"
```
---
### Task 6: The `dns.yml` playbook
**Files:**
- Create: `playbooks/dns.yml`
- [ ] **Step 1: Write the play**
```yaml
---
# dns.yml — manage the public DNS zone (wingu.me) at Gandi LiveDNS as code.
# Runs on the control node (ubongo) against the Gandi API — no host config.
# Run: make check PLAYBOOK=dns then make deploy PLAYBOOK=dns
- name: Manage public DNS (Gandi LiveDNS)
  hosts: control
  connection: local
  gather_facts: false
  become: false
  roles:
    - role: public_dns
      tags: [public_dns]
```
- [ ] **Step 2: Lint (verifies the role-name tag on the import)**
Run: `make lint`
Expected: `Passed: 0 failure(s)` and `check-tags: OK (... role imports verified)`.
- [ ] **Step 3: Commit**
```bash
git add playbooks/dns.yml
git commit -m "feat(public_dns): dns.yml play (control-node, Gandi LiveDNS)"
```
---
### Task 7: Live run on ubongo (purge + baseline) — gated
> **Runs on ubongo only** (vault + Gandi egress). `rbw unlock` first. This is the one
> task that mutates live Gandi; review the check-mode diff before deploying.
- [ ] **Step 1: Dry-run (check mode + diff)**
Run: `make check PLAYBOOK=dns`
Expected: the diff shows the 3 present records being set (null MX, SPF `-all`, DMARC `reject`) and the 11 Gandi defaults being removed. **Review it.**
- [ ] **Step 2: Apply**
Run: `make deploy PLAYBOOK=dns`
Expected: `changed` for the present + absent records; no errors.
- [ ] **Step 3: Verify idempotence**
Run: `make deploy PLAYBOOK=dns`
Expected: `ok=... changed=0` — a second run makes no changes.
- [ ] **Step 4: Verify with dig**
```bash
dig +short MX wingu.me # expect: 0 .
dig +short TXT wingu.me # expect: "v=spf1 -all"
dig +short TXT _dmarc.wingu.me # expect: "v=DMARC1; p=reject;"
dig +short www.wingu.me # expect: empty (CNAME removed)
```
Expected: as annotated (allow for TTL/propagation).
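If the dig checks are repeated often, comparing captured output against the expected answers can be scripted. The sketch below is one possible shape, assuming the `dig +short` output is pasted or piped in as text; `matches` and `EXPECTED` are hypothetical names, and the sample strings are illustrative, not captured results.

```python
# Expected `dig +short` answers, keyed by (query type, name), taken from Step 4.
EXPECTED = {
    ("MX", "wingu.me"): ["0 ."],
    ("TXT", "wingu.me"): ['"v=spf1 -all"'],
    ("TXT", "_dmarc.wingu.me"): ['"v=DMARC1; p=reject;"'],
    ("A", "www.wingu.me"): [],  # empty once the CNAME is removed
}


def matches(expected, dig_output):
    """True when the non-empty lines of `dig +short` output equal `expected`, in any order."""
    got = sorted(line.strip() for line in dig_output.splitlines() if line.strip())
    return got == sorted(expected)


# Illustrative output strings standing in for real dig runs.
print(matches(EXPECTED[("MX", "wingu.me")], "0 .\n"))                     # True
print(matches(EXPECTED[("A", "www.wingu.me")], ""))                       # True
print(matches(EXPECTED[("TXT", "wingu.me")], '"v=spf1 include:old ~all"\n'))  # False
```

A `False` shortly after the deploy may only reflect cached answers, so it is worth re-checking once the old TTL has expired before treating it as a failure.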
- [ ] **Step 5: No commit** — this task changes live Gandi, not the repo.
---
### Task 8: Documentation reconciliation
**Files:**
- Modify: `docs/decisions/007-network.md`
- Modify: `STATUS.md`
- Modify: `docs/TODO.md`
- Modify: `docs/CAPABILITIES.md`
- [ ] **Step 1: Amend ADR-007 — naming scheme row**
Replace the `Public service FQDN` row of the naming-scheme table:
```
| Public service FQDN | `<service>.baobab.band` | `forgejo.nyumbani.baobab.band` |
```
with:
```
| Public service FQDN | `<service>.wingu.me` | `vaultwarden.wingu.me` |
| Off-site (VPS) FQDN | `<service>.askari.wingu.me` | `netbird.askari.wingu.me` |
```
- [ ] **Step 2: Amend ADR-007 — public zone + scheme**
Replace the **Public zone** paragraph:
```
**Public zone**: `baobab.band` — served by external DNS (Cloudflare or equivalent).
Public-facing services resolve to the public IP or Cloudflare proxy.
```
with:
```
**Public zone**: `wingu.me` — Gandi LiveDNS, **managed as code** by the `public_dns`
role (`vault.gandi.pat`). Three-tier naming: infra `<host>.boma.wingu.me` (internal),
services `<service>.wingu.me` (split-horizon), off-site `<service>.askari.wingu.me`.
`nyumbani` is retired. **Mesh/LAN-only by default**: home services have no public record
(reached over LAN or the NetBird mesh); only deliberate exceptions are published.
The project is `boma`; the domain is `wingu.me` (see the M1 spec). The legacy
`baobab.band` zone (Cloudflare) is out of scope here.
```
- [ ] **Step 3: Update the split-horizon example**
In the **Split-horizon** paragraph, replace the example `forgejo.nyumbani.baobab.band`
with `vaultwarden.wingu.me` (internal → private proxy IP; public → only if a deliberate
exception). Leave the internal-zone wording as it is (`boma.baobab.band`, which becomes `boma.wingu.me`
when the `dns` role lands in Phase 2), and add a parenthetical: *(internal zone is renamed
to `boma.wingu.me` when the `dns` role is built — Phase 2)*.
- [ ] **Step 4: Mark STATUS — public_dns built**
In `STATUS.md`, under "Real and working today", add a row:
```
| `roles/public_dns/` + `playbooks/dns.yml` | **Built + applied.** Manages wingu.me at Gandi LiveDNS as code (`community.general.gandi_livedns`, PAT from `vault.gandi.pat`); purged Gandi's seeded defaults, applied the anti-spoof baseline (null MX, SPF `-all`, DMARC reject). Mesh/LAN-only default. M1 of the roadmap. |
```
- [ ] **Step 5: Resolve TODO 4**
In `docs/TODO.md`, change item 4 to struck-through/decided:
```
4. ~~**Split-horizon FQDN** — adopt split-horizon FQDN with or without nyumbani?~~
DECIDED (M1): three-tier scheme on `wingu.me`; `nyumbani` dropped; mesh/LAN-only
default. See `docs/decisions/007-network.md` + the M1 spec.
```
- [ ] **Step 6: Add a CAPABILITIES row**
In `docs/CAPABILITIES.md`, near the Internal DNS row, add:
```
| Public DNS | `public_dns` role → Gandi LiveDNS | P | core | wingu.me zone as code (ADR-007) | anti-spoof baseline; mesh/LAN-only |
```
(Match the surrounding table's column shape; adjust the status letter to the table's convention.)
- [ ] **Step 7: Lint + commit**
Run: `make lint`
Expected: clean.
```bash
git add docs/decisions/007-network.md STATUS.md docs/TODO.md docs/CAPABILITIES.md
git commit -m "docs(public_dns): amend ADR-007 to wingu.me/Gandi; resolve TODO 4; STATUS + CAPABILITIES"
```
---
## Self-Review (completed)
- **Spec coverage:** role + group_vars data (Decisions 4,5) → Tasks 3,4; `gandi_livedns` + PAT (Decision 5, Verified facts) → Task 4; collections-on-demand (Decision 5) → Task 1; anti-spoof baseline + Gandi-defaults purge (Problem, Data model) → Tasks 3,7; cert scope (Decision 6) → out of scope (no cert tasks, correct); testing (check-mode/idempotence/dig + pytest) → Tasks 5,7,3; ADR-007 amendment + TODO 4/O12 → Task 8. All covered.
- **Placeholder scan:** none — every code/content step is concrete.
- **Type/name consistency:** `public_dns__domain`/`__records`/`__absent`/`__apply`/`__default_ttl` and `vault.gandi.pat` used identically across data, role, play, and tests. `gandi_livedns` params match the verified module signature.
- **Note for the implementer:** Task 7 assumes ubongo. If the `gandi_livedns` `absent` call needs `values` for some record types, add them from `public_dns__absent` (verify against the pinned `community.general` version per ADR-014).