ADR-007 — Network topology and addressing

Status

Accepted (2026-05-30)

Context

The boma homelab is a Proxmox cluster on a dedicated private network behind an OPNsense firewall. This document records the agreed physical topology, VLAN design, IP addressing conventions, naming scheme, and DNS zone structure. Everything here feeds directly into Terraform variables, Ansible inventory, and OPNsense configuration.


Decision

Physical topology

ISP
 └── OPNsense (dedicated hardware)
      ├── WAN — ISP uplink
      └── LAN — 802.1q trunk to managed switch
                         │
          ┌──────────────┼──────────────────────────┐
          │              │              │            │
        pve0           pve1           pve2        AP1 / AP2
     (eno1 trunk)   (eno1 trunk)  (eno1 trunk)   (trunk)
     (eno2 corosync)(eno2 corosync)(eno2 corosync)
          └──────────────┴──────────────┘
               172.16.0.0/24  (corosync ring — not on managed switch)

Dual NICs per Proxmox node:

  • eno1 — VLAN-aware trunk. Carries all VLANs via a single VLAN-aware bridge (vmbr0). VMs get their VLAN tag assigned in Proxmox.
  • eno2 — Dedicated corosync ring (vmbr1). Direct link or tiny unmanaged switch between the three nodes only. Never touches the main switch fabric.

Access points broadcast multiple SSIDs, each tagged to its corresponding VLAN (trusted WiFi → VLAN 30, IoT → VLAN 40, guest → VLAN 50).


VLAN design

| VLAN | Name | Subnet | Purpose |
|------|------|--------|---------|
| 10 | mgmt | 10.10.0.0/24 | Proxmox hosts, OPNsense, managed switch. No internet except update repos. |
| 20 | srv | 10.20.0.0/24 | All Debian VMs and Docker services. 100% static. Terraform provisions here. |
| 30 | lan | 10.30.0.0/24 | Trusted home devices. DHCP. Access to selected srv services via OPNsense. |
| 40 | iot | 10.40.0.0/24 | Smart home, cameras, printers. DHCP. Internet egress only + HA exception. |
| 50 | guest | 10.50.0.0/24 | Guest WiFi. DHCP. Internet only, fully isolated. |
| 99 | vpn | (retired) | Replaced by the NetBird mesh (ADR-016). Remote access for ubongo, askari, and road-warrior clients rides a self-hosted NetBird overlay, not an OPNsense WireGuard subnet. 10.99.0.0/24 is freed. |
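
The subnet plan above can be checked mechanically. The following is a minimal sketch, using only Python's standard `ipaddress` module, that confirms the five active VLAN subnets and the corosync ring do not overlap. The `SUBNETS` mapping and the `find_overlaps` helper are illustrative names and are not part of the boma repository.

```python
from ipaddress import ip_network
from itertools import combinations

# Subnets copied from the VLAN design table, plus the corosync ring from the
# topology diagram. The dict layout is an assumption made for illustration.
SUBNETS = {
    "mgmt": ip_network("10.10.0.0/24"),
    "srv": ip_network("10.20.0.0/24"),
    "lan": ip_network("10.30.0.0/24"),
    "iot": ip_network("10.40.0.0/24"),
    "guest": ip_network("10.50.0.0/24"),
    "corosync": ip_network("172.16.0.0/24"),
}


def find_overlaps(subnets):
    """Return every pair of names whose networks share at least one address."""
    return [
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b) in combinations(subnets.items(), 2)
        if net_a.overlaps(net_b)
    ]


print(find_overlaps(SUBNETS))  # prints []
```

A check of this kind could run in CI next to the Terraform variables, so that a future subnet change that collides with an existing VLAN fails early.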

IP addressing

VLAN 10 — mgmt (10.10.0.0/24) — no DHCP

| Address | Host |
|---------|------|
| 10.10.0.1 | OPNsense LAN (mgmt) |
| 10.10.0.2 | Managed switch |
| 10.10.0.200 | pve0 |
| 10.10.0.201 | pve1 |
| 10.10.0.202 | pve2 |

VLAN 20 — srv (10.20.0.0/24) — no DHCP, all static

| Range | Purpose |
|-------|---------|
| 10.20.0.1 | OPNsense gateway |
| 10.20.0.10 to 10.20.0.19 | Core infrastructure VMs (DNS, proxy) |
| 10.20.0.20 to 10.20.0.49 | Additional static infrastructure |
| 10.20.0.50 to 10.20.0.249 | Terraform-provisioned VMs |

Assigned infrastructure addresses:

| Address | Host | Role |
|---------|------|------|
| 10.20.0.10 | dns1 | Primary DNS server |
| 10.20.0.11 | dns2 | Secondary DNS server |
| 10.20.0.12 | proxy | Reverse proxy |
| 10.20.0.13 | homeassistant | Home Assistant (IoT controller) |

Control node ubongo — legacy V4 network (transitional). ubongo (ADR-015) is the manually-provisioned physical control node and currently lives on the legacy V4 homelab network at 10.20.10.151 — boma is being built up from the V4 base, and the physical LAN has not yet been re-cut to this VLAN scheme. That address is therefore outside the planned srv 10.20.0.0/24; base__firewall_control_addr and the inventory point at the real (V4) address. When the network is migrated to these VLANs, ubongo moves into mgmt/srv and this note is retired.
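
To make the srv allocation and the ubongo exception concrete, here is a small sketch that classifies an address against the srv ranges listed above. It assumes the ranges are inclusive on both ends; `SRV_RANGES` and `classify_srv` are hypothetical names used only for this example.

```python
from ipaddress import ip_address, ip_network

SRV = ip_network("10.20.0.0/24")

# (first host octet, last host octet, purpose), taken from the srv range table.
# Treating both ends as inclusive is an assumption.
SRV_RANGES = [
    (1, 1, "OPNsense gateway"),
    (10, 19, "Core infrastructure VMs"),
    (20, 49, "Additional static infrastructure"),
    (50, 249, "Terraform-provisioned VMs"),
]


def classify_srv(addr):
    """Name the srv range an address falls in, or report that it is outside srv."""
    ip = ip_address(addr)
    if ip not in SRV:
        return "outside srv"
    last_octet = int(ip) & 0xFF
    for low, high, purpose in SRV_RANGES:
        if low <= last_octet <= high:
            return purpose
    return "unallocated"


print(classify_srv("10.20.0.13"))    # prints Core infrastructure VMs
print(classify_srv("10.20.10.151"))  # prints outside srv
```

The second call shows why ubongo needs its own note: 10.20.10.151 shares the first two octets with srv but is not inside 10.20.0.0/24.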

VLAN 30 — lan (10.30.0.0/24)

| Range | Purpose |
|-------|---------|
| 10.30.0.1 | OPNsense gateway |
| 10.30.0.100 to 10.30.0.249 | DHCP pool |

VLAN 40 — iot (10.40.0.0/24)

| Range | Purpose |
|-------|---------|
| 10.40.0.1 | OPNsense gateway |
| 10.40.0.100 to 10.40.0.249 | DHCP pool |

VLAN 50 — guest (10.50.0.0/24)

| Range | Purpose |
|-------|---------|
| 10.50.0.1 | OPNsense gateway |
| 10.50.0.100 to 10.50.0.249 | DHCP pool |

VLAN 99 — vpn — retired

The OPNsense WireGuard VPN (10.99.0.0/24) is replaced by the NetBird mesh (ADR-016). Remote access for ubongo, askari, and road-warrior clients rides a self-hosted NetBird overlay — data plane peer-to-peer WireGuard, control plane NetBird self-hosted on askari. NetBird manages its own overlay addressing (default 100.64.0.0/10); no boma VLAN/subnet is allocated for it, and 10.99.0.0/24 is freed.

Corosync ring (172.16.0.0/24) — not on managed switch

| Address | Host |
|---------|------|
| 172.16.0.200 | pve0 |
| 172.16.0.201 | pve1 |
| 172.16.0.202 | pve2 |
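
In both the mgmt and corosync tables, node pve<n> takes host octet 200 + n. This pattern is read off the tables and is not stated as a rule in this ADR. Under that assumption, a node's addresses can be derived from its index, as in the sketch below. `NODE_OCTET_BASE` and `node_addresses` are hypothetical names.

```python
from ipaddress import ip_network

MGMT = ip_network("10.10.0.0/24")
COROSYNC = ip_network("172.16.0.0/24")

# Assumption inferred from the address tables: pve<n> uses host octet 200 + n.
NODE_OCTET_BASE = 200


def node_addresses(index):
    """Return the name, mgmt address and corosync address for Proxmox node pve<index>."""
    offset = NODE_OCTET_BASE + index
    return {
        "name": f"pve{index}",
        "mgmt": str(MGMT.network_address + offset),
        "corosync": str(COROSYNC.network_address + offset),
    }


print(node_addresses(1))
# prints {'name': 'pve1', 'mgmt': '10.10.0.201', 'corosync': '172.16.0.201'}
```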

OPNsense firewall rules (intent)

| Source | Destination | Policy |
|--------|-------------|--------|
| mgmt | anywhere | allow (administrator access) |
| srv | srv | allow (inter-service communication) |
| srv | internet | allow (updates, image pulls) |
| lan | srv (allow-list) | allow specific published ports only |
| lan | internet | allow |
| iot | internet | allow egress only |
| iot | srv (HA IP only) | allow on integration ports |
| guest | internet | allow, isolated from all internal |
| mesh peers | srv (metrics ports) | allow (monitoring); enforced by NetBird ACLs, not OPNsense (ADR-016) |
| mesh peers | mgmt | allow (administration); enforced by NetBird ACLs (ADR-016) |

Home Assistant ↔ IoT: HA VM at 10.20.0.13 can reach IoT VLAN on required ports. OPNsense Avahi (mDNS reflector) bridges srv↔iot for device discovery. IoT devices cannot initiate connections to srv.
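
The OPNsense rows of the intent table can be read as a zone-to-zone lookup. The sketch below encodes them as written and assumes that any pair not listed is denied. The table implies a default deny but does not state it. Mesh-peer rows are left out because NetBird ACLs enforce them, not OPNsense. `INTENT` and `policy` are illustrative names, and the strings are summaries, not real OPNsense rule syntax.

```python
# Source zone -> destination zone -> policy summary, copied from the intent table.
# "*" stands for "anywhere". Unlisted pairs fall through to "deny" (an assumption).
INTENT = {
    "mgmt": {"*": "allow"},
    "srv": {"srv": "allow", "internet": "allow"},
    "lan": {"srv": "allow-listed published ports only", "internet": "allow"},
    "iot": {"internet": "allow", "srv": "HA IP only, integration ports"},
    "guest": {"internet": "allow"},
}


def policy(src, dst):
    """Look up the intended policy for traffic from zone src to zone dst."""
    rules = INTENT.get(src, {})
    return rules.get(dst, rules.get("*", "deny"))


print(policy("guest", "srv"))  # prints deny
print(policy("mgmt", "iot"))   # prints allow
print(policy("lan", "srv"))    # prints allow-listed published ports only
```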


Naming scheme

| Layer | Convention | Examples |
|-------|------------|----------|
| Homelab name | boma | |
| Proxmox nodes | `pve<n>` | pve0, pve1, pve2 |
| Infrastructure VMs | `<role><n>` | dns1, dns2, proxy |
| Hetzner VPS | askari | Swahili for guard/sentinel |
| Internal FQDN | `<host>.boma.baobab.band` | dns1.boma.baobab.band |
| Public service FQDN | `<service>.wingu.me` | vaultwarden.wingu.me |
| Off-site (VPS) FQDN | `<service>.askari.wingu.me` | netbird.askari.wingu.me |
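
The three FQDN conventions can be expressed as small helpers. This is a sketch only: the function and constant names are hypothetical, and the internal zone is kept in one constant so that a later rename touches a single line.

```python
# Zone names from the naming table. INTERNAL_ZONE is the current internal zone.
INTERNAL_ZONE = "boma.baobab.band"
PUBLIC_ZONE = "wingu.me"
OFFSITE_HOST = "askari"


def internal_fqdn(host):
    """<host>.boma.baobab.band, the internal name of a host."""
    return f"{host}.{INTERNAL_ZONE}"


def service_fqdn(service):
    """<service>.wingu.me, the public service name."""
    return f"{service}.{PUBLIC_ZONE}"


def offsite_fqdn(service):
    """<service>.askari.wingu.me, a service hosted on the VPS."""
    return f"{service}.{OFFSITE_HOST}.{PUBLIC_ZONE}"


print(internal_fqdn("dns1"))        # prints dns1.boma.baobab.band
print(service_fqdn("vaultwarden"))  # prints vaultwarden.wingu.me
print(offsite_fqdn("netbird"))      # prints netbird.askari.wingu.me
```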

DNS zones and split-horizon

Internal zone: boma.baobab.band today (the dns role is unbuilt), served by dns1 and dns2. Target: the zone is renamed to boma.wingu.me in Phase 2, when the dns role lands. Until then, boma.baobab.band is the authoritative internal name everywhere it appears (the naming table above, split-horizon below, the OPNsense forwarder, and ADR-009/016). This paragraph is the single source for that transition; other references use the current name and inherit this caveat. The zone is rendered by the Ansible dns role: host A records come from the inventory (which derives from Terraform's local.vms via make tf-inventory), and service/alias/split-horizon records are explicit zone data in group_vars. Terraform itself writes no DNS records; see ADR-009.

Public zone: wingu.me — Gandi LiveDNS, managed as code by the public_dns role (vault.gandi.pat). Three-tier naming: infra <host>.boma.wingu.me (internal — the Phase-2 target; currently boma.baobab.band, see Internal zone above), services <service>.wingu.me (split-horizon), off-site <service>.askari.wingu.me. nyumbani is retired. Mesh/LAN-only by default: home services have no public record (reached over LAN or the NetBird mesh); only deliberate exceptions are published. The project is boma; the domain is wingu.me. The legacy baobab.band zone (Cloudflare) is out of scope here.

Split-horizon: dns1/dns2 serve internal answers for any hostname that has both a public and private face. Example: vaultwarden.wingu.me resolves to 10.20.0.12 (proxy) internally and to the public IP externally. The internal zone will be renamed to boma.wingu.me when the dns role is built (Phase 2).
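
As a rough illustration of split-horizon, the sketch below returns a different answer for the same name depending on whether the client sits on one of the internal VLAN subnets. It is a simplification: in practice the two views come from different name servers, not from one function. The external address `203.0.113.10` is a placeholder from the documentation range, since the real public IP is not given here. `RECORDS` and `resolve` are hypothetical names.

```python
from ipaddress import ip_address, ip_network

# Internal client networks, taken from the VLAN design.
INTERNAL_NETS = [
    ip_network("10.10.0.0/24"),
    ip_network("10.20.0.0/24"),
    ip_network("10.30.0.0/24"),
    ip_network("10.40.0.0/24"),
    ip_network("10.50.0.0/24"),
]

# The external value is a placeholder (TEST-NET-3), not the real public IP.
RECORDS = {
    "vaultwarden.wingu.me": {"internal": "10.20.0.12", "external": "203.0.113.10"},
}


def resolve(name, client_ip):
    """Return the internal answer for internal clients, else the external one."""
    client = ip_address(client_ip)
    is_internal = any(client in net for net in INTERNAL_NETS)
    return RECORDS[name]["internal" if is_internal else "external"]


print(resolve("vaultwarden.wingu.me", "10.30.0.120"))   # prints 10.20.0.12
print(resolve("vaultwarden.wingu.me", "198.51.100.7"))  # prints 203.0.113.10
```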

OPNsense DNS resolver forwards boma.baobab.band queries to dns1/dns2. All other queries go upstream (e.g., 1.1.1.1, 9.9.9.9).
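
The forwarding rule amounts to a suffix match on the query name. Here is a minimal sketch of that decision, assuming dns1 and dns2 are addressed by their srv IPs from the table above. `pick_resolvers` is a hypothetical helper and does not reflect how OPNsense is actually configured.

```python
INTERNAL_ZONE = "boma.baobab.band"
INTERNAL_RESOLVERS = ["10.20.0.10", "10.20.0.11"]  # dns1, dns2
UPSTREAM_RESOLVERS = ["1.1.1.1", "9.9.9.9"]        # the example upstreams named in the text


def pick_resolvers(qname):
    """Send internal-zone names to dns1/dns2 and everything else upstream."""
    name = qname.rstrip(".").lower()
    if name == INTERNAL_ZONE or name.endswith("." + INTERNAL_ZONE):
        return INTERNAL_RESOLVERS
    return UPSTREAM_RESOLVERS


print(pick_resolvers("dns1.boma.baobab.band"))  # prints ['10.20.0.10', '10.20.0.11']
print(pick_resolvers("example.org"))            # prints ['1.1.1.1', '9.9.9.9']
```

Matching on `"." + INTERNAL_ZONE` instead of the bare suffix avoids treating a name such as `notboma.baobab.band` as internal.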


External monitoring — askari

askari (Hetzner VPS) is a peer on the NetBird mesh (ADR-016) and also hosts the self-hosted NetBird coordinator (management/signal/relay). It reaches srv metrics endpoints and mgmt for administration over the mesh, scoped by NetBird ACLs — no OPNsense WireGuard tunnel and no 10.99.0.0/24 routing.

askari is provisioned as Terraform IaC (hetznercloud/hcloud), managed independently of the Proxmox cluster (its own provider + local state in terraform/environments/offsite/). It must be reachable even when the homelab is down (its entire purpose), which is also why the mesh coordinator lives here: an off-site control plane survives a homelab outage. FQDN: askari.wingu.me (off-site tier; record added by public_dns when askari exists — M2/M4).


Consequences

Drawn from the implications already stated above:

  • VLAN 99 (vpn, 10.99.0.0/24) is retired and the subnet freed; remote access is carried by the self-hosted NetBird mesh instead of an OPNsense WireGuard subnet (VLAN design; IP addressing — VLAN 99 retired).
  • Mesh-peer firewall allowances (to srv metrics ports and mgmt) are enforced by NetBird ACLs, not OPNsense rules (OPNsense firewall rules (intent)).
  • IoT devices cannot initiate connections to srv; only Home Assistant at 10.20.0.13 may reach the IoT VLAN, with OPNsense Avahi bridging srv↔iot for discovery (OPNsense firewall rules (intent)).
  • Terraform writes no DNS records; the Ansible dns role renders the internal zone from inventory plus group_vars, with dns1/dns2 serving split-horizon answers (DNS zones and split-horizon).
  • askari runs independently of the cluster so it survives a homelab outage, which is why the off-site NetBird control plane lives there (External monitoring — askari).