sjat 5af0cf8582 Add design decisions list for V5
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 09:00:59 +02:00

Design Decisions — Homelab V5

Subjects to discuss and decide before building V5. Ordered so that each decision can be made with stable answers to everything above it. Each has a unique ID for tracking.

Status values: undecided · decided · deferred
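
The ordering rule above and the "Depends on" notes on individual decisions can be checked mechanically. Below is a minimal Python sketch. It assumes a hypothetical tracker that stores each decision's dependencies as a dict. The dependency data is copied from the "Depends on" lines in this list. `DEPENDS_ON`, `ordering_violations`, and `blocked` are illustrative names, not existing tooling.

```python
# Hypothetical tracker data: decision ID -> decisions it depends on.
# The entries mirror the "Depends on" lines in this document.
DEPENDS_ON: dict[str, list[str]] = {
    "D-04": ["D-03"],
    "D-13": ["D-03", "D-04", "D-08"],
    "D-14": ["D-02", "D-03"],
    "D-15": ["D-14"],
    "D-17": ["D-16"],
}

STATUSES = {"undecided", "decided", "deferred"}


def id_number(decision_id: str) -> int:
    """Turn 'D-07' into 7 so IDs compare by their position in the list."""
    return int(decision_id.split("-")[1])


def ordering_violations(depends_on: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (decision, dependency) pairs where the dependency is not listed earlier."""
    return [
        (decision, dep)
        for decision, deps in depends_on.items()
        for dep in deps
        if id_number(dep) >= id_number(decision)
    ]


def blocked(depends_on: dict[str, list[str]], status: dict[str, str]) -> list[str]:
    """Decisions with at least one dependency that is not yet 'decided'.

    IDs missing from `status` are treated as 'undecided'.
    """
    for decision_id, value in status.items():
        if value not in STATUSES:
            raise ValueError(f"{decision_id}: unknown status {value!r}")
    return sorted(
        decision
        for decision, deps in depends_on.items()
        if any(status.get(dep, "undecided") != "decided" for dep in deps)
    )


if __name__ == "__main__":
    print(ordering_violations(DEPENDS_ON))           # []
    print(blocked(DEPENDS_ON, {"D-03": "decided"}))  # ['D-13', 'D-14', 'D-15', 'D-17']
```

Deciding D-03 alone unblocks D-04. The other dependent decisions still wait on something else.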


Foundation

D-01 · Goals and guiding principles

Status: undecided

What V5 should optimise for. Every later trade-off will be made against this.

  • What are the top 3 to 5 priorities? (e.g. reliability, simplicity, maintainability, capability, cost, family usability)
  • Are there things V4 got wrong that V5 must not repeat?
  • What is explicitly out of scope?

D-02 · Hardware — what to keep, retire, replace, or add

Status: undecided

Which physical machines carry forward into V5 and in what role. Decisions here determine what compute, storage, and network capacity the rest of the design has to work with.

  • fisi: keep as primary server? Upgrade? Replace?
  • tembo: keep kiosk+monitoring combo? Split the roles?
  • papa: keep as dedicated NAS?
  • kobe: keep as dedicated backup target? Consolidate with papa?
  • kuku/faru: keep as Pi-based roles? Upgrade to newer Pi hardware?
  • simba: keep OPNsense on current hardware?
  • Any new hardware to introduce?

D-03 · Virtualisation strategy

Status: undecided

Whether to introduce a hypervisor layer, and if so which one. This decision shapes host OS choices, service isolation, and migration paths.

  • Stay bare-metal containers only (current approach)?
  • Introduce a hypervisor (Proxmox, ESXi, bhyve)?
  • If yes: which hosts get it, and which remain bare metal?
  • What is the unit of deployment — VM, LXC, container, or a mix?

D-04 · Host OS strategy

Status: undecided

What OS runs on each category of machine. Depends on D-03.

  • Debian everywhere (current)? Or specialised OS per role (TrueNAS for NAS, Proxmox for compute, etc.)?
  • Minimum Debian version to target?
  • Should all managed hosts run the same base OS?

Network

D-05 · IP addressing and VLAN design

Status: undecided

The logical network topology. Sets the stage for firewall rules and service addressing.

  • Keep the 10.20.x.x scheme?
  • Are the current VLANs (10.20.10, .1, .2, .30) the right boundaries, or does V5 need more/fewer segments?
  • Should the WireGuard tunnel subnet (10.8.0.0/24) change?
  • DHCP reservation strategy — reserve all infrastructure IPs statically?
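
V5 can use any addressing scheme, but the VLAN subnets and the WireGuard tunnel subnet must not overlap. Overlapping prefixes make routing across the tunnel ambiguous. Below is a minimal Python sketch that uses the standard `ipaddress` module. Two parts of it are hypothetical: the segment names, and the assumption that each VLAN is a /24 (10.20.1.0/24, 10.20.2.0/24, 10.20.10.0/24, 10.20.30.0/24). Substitute the real prefixes from OPNsense.

```python
import ipaddress
from itertools import combinations
from typing import Optional

# Assumed prefixes: one /24 per VLAN in the 10.20.x.x scheme, plus the tunnel subnet.
SUBNETS = {
    "vlan-1": "10.20.1.0/24",
    "vlan-2": "10.20.2.0/24",
    "vlan-10": "10.20.10.0/24",
    "vlan-30": "10.20.30.0/24",
    "wireguard": "10.8.0.0/24",
}


def overlapping_pairs(named_subnets: dict[str, str]) -> list[tuple[str, str]]:
    """Return every pair of segment names whose prefixes share at least one address."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in named_subnets.items()}
    return [(a, b) for a, b in combinations(nets, 2) if nets[a].overlaps(nets[b])]


def segment_of(ip: str, named_subnets: dict[str, str]) -> Optional[str]:
    """Name of the segment containing a (reserved) address, or None if it fits nowhere."""
    addr = ipaddress.ip_address(ip)
    for name, cidr in named_subnets.items():
        if addr in ipaddress.ip_network(cidr):
            return name
    return None


if __name__ == "__main__":
    print(overlapping_pairs(SUBNETS))           # []
    print(segment_of("10.20.10.5", SUBNETS))    # vlan-10
```

The same `segment_of` check can catch a static DHCP reservation that was typed into the wrong VLAN.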

D-06 · Firewall and router

Status: undecided

What handles routing, NAT, DHCP, and inter-VLAN policy.

  • Keep OPNsense on simba?
  • Any changes to hardware or OPNsense version?
  • Are the current inter-VLAN policies correct, or does V5 need stricter segmentation (e.g. IoT fully isolated)?

D-07 · WiFi

Status: undecided

  • Keep the two EAP610 APs (tai1/tai2) as-is?
  • Add a third AP?
  • Keep using standalone mode, or move to an Omada controller?

Platform Services

D-08 · Container orchestration

Status: undecided

How containers are defined, deployed, and managed. One of the most consequential decisions — affects the IaC model, tooling, and complexity.

  • Keep Docker + Docker Compose (current)?
  • Move to Podman/Quadlets?
  • Introduce an orchestrator (Nomad, K3s)?
  • If staying with Compose: keep the container_base Ansible role pattern?

D-09 · Internal DNS

Status: undecided

How internal names resolve and how ad blocking is handled.

  • Keep Technitium on fisi?
  • What are the risks of single-server DNS (if DNS on fisi is down, internal names do not resolve)?
  • Should DNS be moved to a more reliable host, or should there be a secondary?
  • Keep the *.nyumbani.baobab.band wildcard pattern?

D-10 · Reverse proxy

Status: undecided

How HTTPS termination and routing work for internal and public services.

  • Keep Traefik (current, on fisi)?
  • Any reason to consider Caddy?
  • Certificate strategy: keep DNS-01 wildcards via Cloudflare? Keep per-VPS Traefik instances?
  • Is a single Traefik instance on fisi the right topology, or should tembo have its own?
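
The wildcard pattern in D-09 and the DNS-01 wildcard certificates above do not cover the same set of names. A DNS wildcard record can also answer for names several labels deep. A certificate for `*.nyumbani.baobab.band` covers exactly one extra label. Below is a minimal Python sketch of a simplified form of the RFC 6125 matching rule, in which `*` is honoured only as the entire leftmost label. `wildcard_covers` is a hypothetical helper, not part of Traefik or any ACME client, and the example hostnames are illustrative.

```python
def wildcard_covers(cert_name: str, hostname: str) -> bool:
    """Return True if a certificate name (possibly a wildcard) covers hostname.

    Simplified rule: '*' is only honoured as the whole leftmost label and
    matches exactly one non-empty label. '*.a.b' covers 'x.a.b' but neither
    'a.b' nor 'y.x.a.b'. Comparison is case-insensitive and ignores a
    trailing dot.
    """
    cert_labels = cert_name.lower().rstrip(".").split(".")
    host_labels = hostname.lower().rstrip(".").split(".")
    if len(cert_labels) != len(host_labels):
        return False  # a wildcard never spans more or fewer labels
    if cert_labels[0] == "*":
        return host_labels[0] != "" and cert_labels[1:] == host_labels[1:]
    return cert_labels == host_labels


if __name__ == "__main__":
    cert = "*.nyumbani.baobab.band"
    for name in ("grafana.nyumbani.baobab.band",
                 "nyumbani.baobab.band",
                 "a.grafana.nyumbani.baobab.band"):
        print(name, wildcard_covers(cert, name))
```

In practice, a deeper naming scheme such as `app.host.nyumbani.baobab.band` would need additional wildcard certificates. Keeping names flat avoids that.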

D-11 · Remote access and VPN

Status: undecided

How family members and VPS hosts reach the homelab network from outside.

  • Keep WireGuard on kuku (Raspberry Pi hub)?
  • Is kuku a single point of failure worth addressing?
  • Consider Tailscale or Headscale instead of self-managed WireGuard?
  • VPS integration: keep VPS hosts as WireGuard spokes?
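
In a hub-and-spoke setup, the `AllowedIPs` line on a spoke's hub peer decides what that spoke can reach. It controls which destinations are routed into the tunnel. Below is a minimal Python sketch that renders a wg-quick style spoke config. The tunnel address, hub endpoint, and prefix choices are assumptions for illustration, and the keys are placeholders. `PersistentKeepalive = 25` is the value commonly suggested for peers behind NAT.

```python
from textwrap import dedent


def spoke_config(address: str, private_key: str, hub_public_key: str,
                 hub_endpoint: str, allowed_ips: list[str]) -> str:
    """Render a wg-quick style config for a spoke whose only peer is the hub."""
    allowed = ", ".join(allowed_ips)
    return dedent(f"""\
        [Interface]
        Address = {address}
        PrivateKey = {private_key}

        [Peer]
        # Hub peer: traffic to the AllowedIPs prefixes is sent through the tunnel.
        PublicKey = {hub_public_key}
        Endpoint = {hub_endpoint}
        AllowedIPs = {allowed}
        PersistentKeepalive = 25
        """)


if __name__ == "__main__":
    # Hypothetical VPS spoke: tunnel subnet only, or add "10.20.0.0/16"
    # to route the home VLANs through kuku as well.
    print(spoke_config("10.8.0.20/32", "<spoke-private-key>", "<kuku-public-key>",
                       "vpn.example.invalid:51820", ["10.8.0.0/24"]))
```

The VPS question in D-18, about bringing VPS hosts fully into the mesh, largely comes down to whether the VPS spokes get the home prefixes in this list.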

D-12 · Secrets management

Status: undecided

Where secrets live and how they are accessed at deploy time.

  • Keep Ansible Vault (current)?
  • Move to SOPS + age?
  • Introduce a dedicated secrets manager (Infisical, Doppler, HashiCorp Vault)?
  • Is the single vault file per inventory environment the right structure?

D-13 · IaC tooling

Status: undecided

The tools used to define and apply infrastructure state. Depends on D-03, D-04, D-08.

  • Keep Ansible as the primary tool?
  • Add Terraform/OpenTofu for VPS provisioning?
  • Keep the two-inventory (prod/lab) structure?
  • Role naming and structure: evolve AnsibleBaobabV4 in place, or start fresh?

Storage and Data

D-14 · Storage architecture

Status: undecided

How storage is organised across machines. Depends on D-02 and D-03.

  • Keep papa as a dedicated NAS with ZFS mirror exported via NFS?
  • Is the NVMe on fisi the right place for all container state?
  • Should media and container data live on the same host?
  • Any need for larger storage capacity in V5?

D-15 · Backup strategy

Status: undecided

What is protected, how, and where the backups land. Depends on D-14.

  • Keep Borg as primary? Keep papa as the backup target?
  • Simplify: consolidate kobe (rsnapshot) into the Borg model?
  • Off-site: keep pCloud sync via rclone? Any other off-site approach?
  • Backup for network devices (simba, APs, switch) — keep pull model from papa?
  • RTO/RPO expectations: how much downtime and data loss is acceptable?
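
Once schedules are fixed, the RPO question reduces to simple arithmetic. In the worst case, data written just after a backup run starts is only protected when the next run finishes. An off-site copy can lag a further full sync interval behind that. Below is a minimal Python sketch. The nightly schedule, one-hour backup duration, and daily off-site sync are hypothetical numbers, not the current V4 timings.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta, backup_duration: timedelta,
                         offsite_sync_interval: timedelta = timedelta(0)) -> timedelta:
    """Upper bound on lost data (RPO) for a periodic backup schedule.

    Data written just after a backup run starts is only captured by the next
    run, which finishes backup_interval + backup_duration later. If only the
    off-site copy survives, add up to one more off-site sync interval.
    Ignores the time the sync itself takes.
    """
    return backup_interval + backup_duration + offsite_sync_interval


if __name__ == "__main__":
    nightly, run_time, daily_sync = timedelta(hours=24), timedelta(hours=1), timedelta(hours=24)
    print(worst_case_data_loss(nightly, run_time))              # 1 day, 1:00:00 (local restore)
    print(worst_case_data_loss(nightly, run_time, daily_sync))  # 2 days, 1:00:00 (off-site only)
```

This only bounds RPO. RTO depends on how long a restore actually takes, which is best measured with a test restore.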

Observability

D-16 · Observability stack placement

Status: undecided

Which host runs the monitoring stack and how resilient it needs to be.

  • Keep monitoring on tembo (same machine as kiosk)?
  • Should monitoring be on a host that is not also a kiosk / display?
  • What happens to observability if the monitoring host goes down?
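
A common answer to the last question is a dead man's switch. The monitoring stack sends a regular heartbeat to a checker running somewhere else, and the absence of heartbeats is what raises the alarm. Below is a minimal Python sketch of the checking side. It assumes a hypothetical checker on another host (for example a VPS) that records when each heartbeat arrived. Heartbeat delivery and alert sending are left out, and the class and source names are illustrative.

```python
from datetime import datetime, timedelta, timezone


class HeartbeatWatch:
    """Tracks the last heartbeat per source and reports sources that went quiet."""

    def __init__(self, max_silence: timedelta) -> None:
        self.max_silence = max_silence
        self.last_seen: dict[str, datetime] = {}

    def record(self, source: str, when: datetime) -> None:
        """Remember the most recent heartbeat time for a source (older ones are ignored)."""
        previous = self.last_seen.get(source)
        if previous is None or when > previous:
            self.last_seen[source] = when

    def stale(self, now: datetime) -> list[str]:
        """Sources silent for longer than max_silence, i.e. candidates for an alert."""
        return sorted(
            source for source, seen in self.last_seen.items()
            if now - seen > self.max_silence
        )


if __name__ == "__main__":
    start = datetime(2026, 1, 1, tzinfo=timezone.utc)
    watch = HeartbeatWatch(max_silence=timedelta(minutes=10))
    watch.record("tembo-monitoring", start)
    print(watch.stale(start + timedelta(minutes=5)))   # []
    print(watch.stale(start + timedelta(minutes=15)))  # ['tembo-monitoring']
```

A source that has never reported is not tracked here. A real checker would register its expected sources up front.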

D-17 · Metrics, logs, and alerting

Status: undecided

The specific tools and data flows for observability. Depends on D-16.

  • Keep Prometheus + Loki + Grafana?
  • Keep Grafana Alloy as the shipping agent?
  • Keep the Matrix bot for alerts, or move to ntfy?
  • Metrics retention: is the 15-day Prometheus retention enough?
  • Any gaps in current coverage to address in V5?
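
Whether 15 days is enough is partly a disk question. The Prometheus storage documentation gives a rule of thumb: needed disk space ≈ retention time in seconds × ingested samples per second × bytes per sample, at roughly 1 to 2 bytes per sample. Below is a minimal Python sketch. The 75,000 active series and 15 s scrape interval are hypothetical. The real ingestion rate can be read from `rate(prometheus_tsdb_head_samples_appended_total[1h])` on the running server.

```python
def samples_per_second(active_series: int, scrape_interval_s: float) -> float:
    """Ingestion rate when every series is scraped once per interval."""
    return active_series / scrape_interval_s


def prometheus_disk_bytes(retention_days: float, ingest_rate: float,
                          bytes_per_sample: float = 2.0) -> float:
    """Rule-of-thumb TSDB size: retention seconds * samples/s * bytes/sample.

    Defaults to the pessimistic end (2 bytes) of the 1-2 bytes/sample range.
    This is an estimate only; leave headroom on the disk.
    """
    return retention_days * 86_400 * ingest_rate * bytes_per_sample


if __name__ == "__main__":
    rate = samples_per_second(active_series=75_000, scrape_interval_s=15)
    for days in (15, 90):
        print(days, "days:", round(prometheus_disk_bytes(days, rate) / 1e9, 1), "GB")
```

Because the estimate scales linearly with retention, it is easy to see what a longer retention would cost on the monitoring host's disk.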

Public Exposure

D-18 · VPS strategy

Status: undecided

How many VPS hosts, at which providers, and for what roles.

  • Keep three VPS (baobab.band, makerfloss, rullebiler.dk)?
  • makerfloss is currently isolated (no WireGuard, no backup) — is that intentional?
  • Should VPS hosts be brought fully into the homelab WireGuard mesh?
  • Cost and provider consolidation: any reason to move hosts?

D-19 · Public services and exposure model

Status: undecided

What is reachable from the internet and how traffic gets there.

  • Which services need to be publicly accessible (vs. VPN-only)?
  • Keep the current model of public services pointing to fisi's public IP via Cloudflare?
  • Should any services move behind a relay (Cloudflare Tunnel, or an nginx stream proxy on a VPS)?
  • Port exposure policy: what can be opened directly vs. must go through a VPS?

D-20 · Domain and DNS provider strategy

Status: undecided

How many domains, managed where.

  • Keep baobab.band + makerfloss.eu + rullebiler.dk?
  • Keep split between Cloudflare and Gandi for DNS management?
  • Any consolidation desired?

Services

D-21 · Core service catalogue

Status: undecided

Which services are first-class citizens in V5 — things that must be reliable and are worth complexity to maintain.

  • Define the "core" tier: services that must survive a host rebuild before anything else is restored (e.g. Vaultwarden, Nextcloud, Forgejo, DNS, Grafana).
  • Define the "nice-to-have" tier.
  • Are there V4 services that should be dropped in V5?

D-22 · Media stack

Status: undecided

  • Keep the full *arr stack (Sonarr, Radarr, Lidarr, Prowlarr, LazyLibrarian)?
  • Keep Jellyfin + Audiobookshelf + Calibre-Web?
  • Any services to add or drop?
  • Gluetun VPN for qBittorrent: keep PIA, or change provider?

D-23 · Communication services

Status: undecided

  • Keep self-hosted Matrix (conduwuit + Element Web)?
  • Keep Poste.io for mail — three separate instances across three hosts is the current pattern; is that the right structure?
  • Keep ntfy for push notifications?
  • Any desire to consolidate or simplify the comms stack?

D-24 · Photo management

Status: undecided

PhotoPrism is currently deployed on both fisi and tembo (partially migrated). This is unresolved technical debt.

  • Settle on a single host for PhotoPrism.
  • Is PhotoPrism the right tool long-term, or is there an alternative to consider?
  • Confirm GPU passthrough requirements (Intel Quick Sync for transcoding).

D-25 · Home automation

Status: undecided

  • Keep HAOS on twiga?
  • Is twiga's current hardware sufficient?
  • How tightly should Home Assistant integrate with the rest of the homelab in V5 (monitoring, VPN, etc.)?

D-26 · Kiosk

Status: undecided

  • Keep tembo as a dedicated kiosk display machine?
  • Is GNOME the right desktop environment for a kiosk, or something lighter?
  • Keep the current tab rotation + physical button handler?
  • Should the kiosk and monitoring stack remain co-located on tembo?

Laptops and Clients

D-27 · Laptop management strategy

Status: undecided

  • Keep Debian + XFCE on all laptops, managed by Ansible?
  • Any laptops to replace or add?
  • mbuzi currently has no WireGuard config — is that intentional?
  • Is the multi-user XFCE model on mamba working, or is it a source of friction?

D-28 · Client software stack

Status: undecided

  • Keep the current Ansible-managed flatpak + APT stack?
  • Any applications to add, replace, or drop?
  • pCloud: keep as the family cloud sync provider?
  • PIA VPN on laptops: keep alongside WireGuard, or consolidate?