boma/docs/decisions/007-network.md
sjat 175777e36a docs: reconcile 2026-06-14 review findings (O1-O7,O18,O22)
- STATUS: docker_host is built+applied, not scaffold-only (O1)
- ADR-004: backup points to ADR-022, not "out of scope"; service-role file
  table gains ACCESS.md + BACKUP.md rows (O2, O5)
- Finish Traefik->Caddy: ADR-008/011/017/019, CAPABILITIES, TODO (O3); scope
  ADR-024's custom-image/NetBird claims to the deferred DNS-01/M4b paths (O22)
- ADR-016/017/018 now lead with ## Status per ADR-023 (O4)
- ADR-002: caveat `PLAYBOOK=upgrade` as planned/unbuilt (O6)
- CAPABILITIES: carve out ubongo's dev_env from the nvim/tmux exclusion (O7)
- ADR-007: one authoritative boma.baobab.band -> boma.wingu.me transition note (O18)
- new-host Part E: note ubongo is managed as sjat, ansible-user bootstrap pending (O15)

O9 (hosts.yml header) left open: the file is generator-owned (hook-protected);
fixing it needs a tf_to_inventory.py change or a tf-inventory run, not a hand-edit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 19:06:33 +02:00


ADR-007 — Network topology and addressing

Status

Accepted (2026-05-30)

Context

The boma homelab is a Proxmox cluster on a dedicated private network behind an OPNsense firewall. This document records the agreed physical topology, VLAN design, IP addressing conventions, naming scheme, and DNS zone structure. Everything here feeds directly into Terraform variables, Ansible inventory, and OPNsense configuration.


Decision

Physical topology

ISP
 └── OPNsense (dedicated hardware)
      ├── WAN — ISP uplink
      └── LAN — 802.1q trunk to managed switch
                         │
          ┌──────────────┼──────────────────────────┐
          │              │              │            │
        pve0           pve1           pve2        AP1 / AP2
     (eno1 trunk)   (eno1 trunk)  (eno1 trunk)   (trunk)
     (eno2 corosync)(eno2 corosync)(eno2 corosync)
          └──────────────┴──────────────┘
               172.16.0.0/24  (corosync ring — not on managed switch)

Dual NICs per Proxmox node:

  • eno1 — VLAN-aware trunk. Carries all VLANs via a single VLAN-aware bridge (vmbr0). VMs get their VLAN tag assigned in Proxmox.
  • eno2 — Dedicated corosync ring (vmbr1). Direct link or tiny unmanaged switch between the three nodes only. Never touches the main switch fabric.

Access points broadcast multiple SSIDs, each tagged to its corresponding VLAN (trusted WiFi → VLAN 30, IoT → VLAN 40, guest → VLAN 50).


VLAN design

| VLAN | Name | Subnet | Purpose |
| --- | --- | --- | --- |
| 10 | mgmt | 10.10.0.0/24 | Proxmox hosts, OPNsense, managed switch. No internet except update repos. |
| 20 | srv | 10.20.0.0/24 | All Debian VMs and Docker services. 100% static. Terraform provisions here. |
| 30 | lan | 10.30.0.0/24 | Trusted home devices. DHCP. Access to selected srv services via OPNsense. |
| 40 | iot | 10.40.0.0/24 | Smart home, cameras, printers. DHCP. Internet egress only + HA exception. |
| 50 | guest | 10.50.0.0/24 | Guest WiFi. DHCP. Internet only, fully isolated. |
| 99 | vpn | (retired) | Replaced by the NetBird mesh (ADR-016). Remote access for ubongo, askari, and road-warrior clients rides a self-hosted NetBird overlay, not an OPNsense WireGuard subnet. 10.99.0.0/24 is freed. |

IP addressing

VLAN 10 — mgmt (10.10.0.0/24) — no DHCP

| Address | Host |
| --- | --- |
| 10.10.0.1 | OPNsense LAN (mgmt) |
| 10.10.0.2 | Managed switch |
| 10.10.0.200 | pve0 |
| 10.10.0.201 | pve1 |
| 10.10.0.202 | pve2 |

VLAN 20 — srv (10.20.0.0/24) — no DHCP, all static

| Range | Purpose |
| --- | --- |
| 10.20.0.1 | OPNsense gateway |
| 10.20.0.10 - 10.20.0.19 | Core infrastructure VMs (DNS, proxy) |
| 10.20.0.20 - 10.20.0.49 | Additional static infrastructure |
| 10.20.0.50 - 10.20.0.249 | Terraform-provisioned VMs |

Assigned infrastructure addresses:

| Address | Host | Role |
| --- | --- | --- |
| 10.20.0.10 | dns1 | Primary DNS server |
| 10.20.0.11 | dns2 | Secondary DNS server |
| 10.20.0.12 | proxy | Reverse proxy |
| 10.20.0.13 | homeassistant | Home Assistant (IoT controller) |
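
The srv allocation above can be checked mechanically. The following is a minimal sketch, not part of the boma tooling: `SRV_RANGES` and `classify_srv` are hypothetical names, and the ranges are copied by hand from the table rather than read from Terraform or the inventory.

```python
import ipaddress

SRV_NET = ipaddress.ip_network("10.20.0.0/24")

# (first last-octet, final last-octet, purpose), copied from the srv range table.
SRV_RANGES = [
    (1, 1, "OPNsense gateway"),
    (10, 19, "Core infrastructure VMs (DNS, proxy)"),
    (20, 49, "Additional static infrastructure"),
    (50, 249, "Terraform-provisioned VMs"),
]


def classify_srv(address: str) -> str:
    """Return the purpose of the srv range that contains `address`."""
    ip = ipaddress.ip_address(address)
    if ip not in SRV_NET:
        raise ValueError(f"{address} is not in {SRV_NET}")
    last_octet = int(ip) & 0xFF  # host part of a /24
    for first, last, purpose in SRV_RANGES:
        if first <= last_octet <= last:
            return purpose
    # Addresses outside every listed range have no stated purpose.
    return "unallocated"
```

For example, `classify_srv("10.20.0.12")` (proxy) falls in the core infrastructure range, while an address such as 10.20.0.5 is reported as unallocated because the table assigns it no purpose.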

VLAN 30 — lan (10.30.0.0/24)

| Range | Purpose |
| --- | --- |
| 10.30.0.1 | OPNsense gateway |
| 10.30.0.100 - 10.30.0.249 | DHCP pool |

VLAN 40 — iot (10.40.0.0/24)

| Range | Purpose |
| --- | --- |
| 10.40.0.1 | OPNsense gateway |
| 10.40.0.100 - 10.40.0.249 | DHCP pool |

VLAN 50 — guest (10.50.0.0/24)

| Range | Purpose |
| --- | --- |
| 10.50.0.1 | OPNsense gateway |
| 10.50.0.100 - 10.50.0.249 | DHCP pool |

VLAN 99 — vpn — retired

The OPNsense WireGuard VPN (10.99.0.0/24) is replaced by the NetBird mesh (ADR-016). Remote access for ubongo, askari, and road-warrior clients rides a self-hosted NetBird overlay — data plane peer-to-peer WireGuard, control plane NetBird self-hosted on askari. NetBird manages its own overlay addressing (default 100.64.0.0/10); no boma VLAN/subnet is allocated for it, and 10.99.0.0/24 is freed.
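
Because NetBird allocates its own overlay range, it is worth confirming that the default overlay does not collide with any boma subnet. The sketch below assumes the subnets listed in this document and the NetBird default of 100.64.0.0/10; the `SUBNETS` mapping and `find_overlaps` helper are illustrative names, not existing code.

```python
import ipaddress
from itertools import combinations

# Subnets named in this ADR, plus the NetBird default overlay range.
SUBNETS = {
    "mgmt": "10.10.0.0/24",
    "srv": "10.20.0.0/24",
    "lan": "10.30.0.0/24",
    "iot": "10.40.0.0/24",
    "guest": "10.50.0.0/24",
    "corosync": "172.16.0.0/24",
    "netbird-overlay": "100.64.0.0/10",
}


def find_overlaps(subnets):
    """Return every pair of subnet names whose address ranges overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(nets.items(), 2)
        if net_a.overlaps(net_b)
    ]
```

With the values above, `find_overlaps(SUBNETS)` returns an empty list. If the overlay range is ever changed from the default, rerunning the check with the new CIDR would show whether it still stays clear of the VLANs.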

Corosync ring (172.16.0.0/24) — not on managed switch

| Address | Host |
| --- | --- |
| 172.16.0.200 | pve0 |
| 172.16.0.201 | pve1 |
| 172.16.0.202 | pve2 |

OPNsense firewall rules (intent)

| Source | Destination | Policy |
| --- | --- | --- |
| mgmt | anywhere | allow (administrator access) |
| srv | srv | allow (inter-service communication) |
| srv | internet | allow (updates, image pulls) |
| lan | srv (allow-list) | allow specific published ports only |
| lan | internet | allow |
| iot | internet | allow egress only |
| iot | srv (HA IP only) | allow on integration ports |
| guest | internet | allow, isolated from all internal |
| mesh peers | srv (metrics ports) | allow (monitoring); enforced by NetBird ACLs, not OPNsense (ADR-016) |
| mesh peers | mgmt | allow (administration); enforced by NetBird ACLs (ADR-016) |

Home Assistant ↔ IoT: the HA VM at 10.20.0.13 can reach the IoT VLAN on required ports. OPNsense Avahi (mDNS reflector) bridges srv and iot for device discovery. IoT devices cannot initiate connections to srv.
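
The intent table can be read as a lookup from a (source, destination) pair to a policy. The sketch below encodes only the OPNsense rows (the mesh-peer rows are enforced by NetBird ACLs and are left out). The `INTENT` mapping and `intent_for` function are hypothetical, and the default-deny fallback is an assumption: this document states the allowed flows but does not spell out a default policy.

```python
# (source zone, destination zone) -> stated intent, copied from the table.
INTENT = {
    ("mgmt", "anywhere"): "allow (administrator access)",
    ("srv", "srv"): "allow (inter-service communication)",
    ("srv", "internet"): "allow (updates, image pulls)",
    ("lan", "srv"): "allow specific published ports only",
    ("lan", "internet"): "allow",
    ("iot", "internet"): "allow egress only",
    ("iot", "srv"): "allow on integration ports (HA IP only)",
    ("guest", "internet"): "allow, isolated from all internal",
}

# Assumption: anything not listed is denied. The ADR does not say this explicitly.
DEFAULT = "deny (no stated intent)"


def intent_for(source: str, destination: str) -> str:
    """Look up the stated intent, falling back to a source-wide 'anywhere' rule."""
    for key in ((source, destination), (source, "anywhere")):
        if key in INTENT:
            return INTENT[key]
    return DEFAULT
```

For instance, `intent_for("mgmt", "iot")` resolves through the mgmt "anywhere" row, while `intent_for("guest", "srv")` falls through to the assumed default.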


Naming scheme

| Layer | Convention | Examples |
| --- | --- | --- |
| Homelab name | boma | |
| Proxmox nodes | pve<n> | pve0, pve1, pve2 |
| Infrastructure VMs | <role><n> | dns1, dns2, proxy |
| Hetzner VPS | askari | Swahili for guard/sentinel |
| Internal FQDN | <host>.boma.baobab.band | dns1.boma.baobab.band |
| Public service FQDN | <service>.wingu.me | vaultwarden.wingu.me |
| Off-site (VPS) FQDN | <service>.askari.wingu.me | netbird.askari.wingu.me |
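
As an illustration of the three FQDN conventions, the sketch below builds names for each tier. The function and constant names are hypothetical. Keeping the internal zone in a single constant means the planned Phase-2 rename described in the next section would be a one-line change here.

```python
# Zone names as used in this ADR today.
INTERNAL_ZONE = "boma.baobab.band"  # Phase-2 target: boma.wingu.me
PUBLIC_ZONE = "wingu.me"
OFFSITE_ZONE = "askari.wingu.me"


def internal_fqdn(host: str) -> str:
    """<host>.<internal zone>, e.g. for Proxmox nodes and infrastructure VMs."""
    return f"{host}.{INTERNAL_ZONE}"


def service_fqdn(service: str) -> str:
    """<service>.<public zone>, the split-horizon service name."""
    return f"{service}.{PUBLIC_ZONE}"


def offsite_fqdn(service: str) -> str:
    """<service>.<off-site zone>, for services hosted on the VPS."""
    return f"{service}.{OFFSITE_ZONE}"
```

These reproduce the examples in the table: `internal_fqdn("dns1")`, `service_fqdn("vaultwarden")`, and `offsite_fqdn("netbird")`.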

DNS zones and split-horizon

Internal zone: boma.baobab.band today, served by dns1 and dns2 (the dns role is not yet built). The zone will be renamed to boma.wingu.me in Phase 2, when the dns role lands. Until then, boma.baobab.band is the authoritative internal name everywhere it appears (the naming table above, split-horizon below, the OPNsense forwarder, and ADR-009/016). This paragraph is the single source for that transition; other references use the current name and inherit this caveat. The zone is rendered by the Ansible dns role: host A records come from the inventory (which derives from Terraform's local.vms via make tf-inventory), and service, alias, and split-horizon records are explicit zone data in group_vars. Terraform itself writes no DNS records; see ADR-009.

Public zone: wingu.me — Gandi LiveDNS, managed as code by the public_dns role (vault.gandi.pat). Three-tier naming: infra <host>.boma.wingu.me (internal — the Phase-2 target; currently boma.baobab.band, see Internal zone above), services <service>.wingu.me (split-horizon), off-site <service>.askari.wingu.me. nyumbani is retired. Mesh/LAN-only by default: home services have no public record (reached over LAN or the NetBird mesh); only deliberate exceptions are published. The project is boma; the domain is wingu.me. The legacy baobab.band zone (Cloudflare) is out of scope here.

Split-horizon: dns1/dns2 serve internal answers for any hostname that has both a public and a private face. Example: vaultwarden.wingu.me resolves to 10.20.0.12 (proxy) internally and to the public IP externally. The internal zone will be renamed to boma.wingu.me when the dns role is built (Phase 2).

OPNsense DNS resolver forwards boma.baobab.band queries to dns1/dns2. All other queries go upstream (e.g., 1.1.1.1, 9.9.9.9).
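
The forwarding decision and the split-horizon override can be sketched as two small lookups. This is an illustration only: the function names and the `SPLIT_HORIZON` mapping are hypothetical, the upstream addresses are the examples given above rather than a confirmed configuration, and the real behaviour lives in OPNsense and the (not yet built) dns role.

```python
INTERNAL_ZONE = "boma.baobab.band"
INTERNAL_SERVERS = ["10.20.0.10", "10.20.0.11"]  # dns1, dns2
UPSTREAM_SERVERS = ["1.1.1.1", "9.9.9.9"]  # example upstreams from the text

# Names with both a public and a private face, mapped to their internal answer.
SPLIT_HORIZON = {"vaultwarden.wingu.me": "10.20.0.12"}  # proxy


def _normalise(qname: str) -> str:
    """Lower-case the query name and drop a trailing root dot."""
    return qname.rstrip(".").lower()


def forward_targets(qname: str) -> list[str]:
    """Pick the servers a query is forwarded to, by zone suffix."""
    name = _normalise(qname)
    if name == INTERNAL_ZONE or name.endswith("." + INTERNAL_ZONE):
        return INTERNAL_SERVERS
    return UPSTREAM_SERVERS


def internal_answer(qname: str):
    """Return the internal address for a split-horizon name, or None."""
    return SPLIT_HORIZON.get(_normalise(qname))
```

Matching on `"." + INTERNAL_ZONE` rather than the bare suffix avoids treating an unrelated name such as notboma.baobab.band as part of the internal zone.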


External monitoring — askari

askari (Hetzner VPS) is a peer on the NetBird mesh (ADR-016) and also hosts the self-hosted NetBird coordinator (management/signal/relay). It reaches srv metrics endpoints and mgmt for administration over the mesh, scoped by NetBird ACLs — no OPNsense WireGuard tunnel and no 10.99.0.0/24 routing.

askari is provisioned as Terraform IaC (hetznercloud/hcloud), managed independently of the Proxmox cluster (its own provider + local state in terraform/environments/offsite/). It must be reachable even when the homelab is down (its entire purpose), which is also why the mesh coordinator lives here: an off-site control plane survives a homelab outage. FQDN: askari.wingu.me (off-site tier; record added by public_dns when askari exists — M2/M4).


Consequences

Drawn from the implications already stated above:

  • VLAN 99 (vpn, 10.99.0.0/24) is retired and the subnet freed; remote access is carried by the self-hosted NetBird mesh instead of an OPNsense WireGuard subnet (VLAN design; IP addressing — VLAN 99 retired).
  • Mesh-peer firewall allowances (to srv metrics ports and mgmt) are enforced by NetBird ACLs, not OPNsense rules (OPNsense firewall rules (intent)).
  • IoT devices cannot initiate connections to srv; only Home Assistant at 10.20.0.13 may reach the IoT VLAN, with OPNsense Avahi bridging srv and iot for discovery (OPNsense firewall rules (intent)).
  • Terraform writes no DNS records; the Ansible dns role renders the internal zone from inventory plus group_vars, with dns1/dns2 serving split-horizon answers (DNS zones and split-horizon).
  • askari runs independently of the cluster so it survives a homelab outage, which is why the off-site NetBird control plane lives there (External monitoring — askari).