Compare commits
3 commits
3dd03d4198...8e4bf3dd88
| Author | SHA1 | Date | |
|---|---|---|---|
| | 8e4bf3dd88 | | |
| | d8afa94c4b | | |
| | f0d189ca09 | | |
10 changed files with 55 additions and 15 deletions

@@ -101,14 +101,17 @@ inventories/
 vault.yml
 docker_hosts/ # hosts running Docker services
 proxmox_hosts/ # Proxmox nodes themselves
+offsite_hosts/ # off-site hosts (askari) — NetBird coordinator + watchdog
 host_vars/ # per-host overrides
 staging/ # safe to run freely
 ```
 
-Host groups: `all`, `control`, `docker_hosts`, `proxmox_hosts`
+Host groups: `all`, `control`, `docker_hosts`, `proxmox_hosts`, `offsite_hosts`
 
 (`control` holds `ubongo`, the one manually-provisioned **physical** control node
-outside the cluster — see ADR-009 and ADR-015.)
+outside the cluster; `offsite_hosts` holds `askari`, the off-site Hetzner host that
+runs the NetBird coordinator + watchdog — also added manually. See ADR-009, ADR-015,
+ADR-016.)
 
 ---
 

17
README.md
@@ -57,7 +57,11 @@ See `Makefile` for the full list of targets.
 │
 ├── docs/
 │ ├── decisions/ # Architecture decision records (ADRs)
-│ └── runbooks/ # Step-by-step operational procedures
+│ ├── runbooks/ # Step-by-step operational procedures
+│ ├── security/ # Per-service security checklist + templates + accepted risks
+│ ├── testing/ # VERIFY.md template + service-UI verification reports
+│ ├── hardware/ # Physical capacity reference + reviews
+│ └── reviews/ # /review-repo reports
 │
 ├── inventories/
 │ ├── production/ # Live hosts — edit carefully

@@ -92,6 +96,17 @@ See `Makefile` for the full list of targets.
 - Network topology: `docs/decisions/007-network.md`
 - Testing methodology: `docs/decisions/008-testing.md`
 - Terraform ↔ Ansible handoff: `docs/decisions/009-provisioning-handoff.md`
+- Forgejo & CI: `docs/decisions/010-forgejo-ci.md`
+- Update management: `docs/decisions/011-update-management.md`
+- Hardware & capacity: `docs/decisions/012-hardware-capacity.md`
+- Heritage / V4 policy: `docs/decisions/013-heritage-v4.md`
+- Sourcing technical knowledge: `docs/decisions/014-knowledge-sourcing.md`
+- Control / AI-worker host (`ubongo`): `docs/decisions/015-control-host.md`
+- Mesh VPN (NetBird): `docs/decisions/016-mesh-vpn.md`
+- Service-UI verification (Level 4): `docs/decisions/017-service-ui-verification.md`
+
+(CLAUDE.md carries the full cross-referenced table, including the runbooks and
+security/testing docs.)
 
 ## Contributing
 

@@ -35,12 +35,15 @@ describes the *intended* design — see STATUS.md for what is actually built.
 all
 ├── control # ubongo — physical control node outside the cluster; baseline config only, runs no services
 ├── docker_hosts # VMs running Docker services (most hosts)
-└── proxmox_hosts # Proxmox nodes themselves (limited management scope)
+├── proxmox_hosts # Proxmox nodes themselves (limited management scope)
+└── offsite_hosts # askari (off-site Hetzner) — NetBird coordinator + external watchdog
 ```
 
 The `control` group holds the single manually-provisioned control node; it is
 managed for baseline config (SSH, firewall, updates) but never runs the
-`docker_host` role. Proxmox nodes are managed only for basic baseline tasks (SSH).
+`docker_host` role. The `offsite_hosts` group holds `askari`, the off-site Hetzner
+host — also manually provisioned (ADR-016), managed for baseline config plus the
+`netbird_coordinator` service role. Proxmox nodes are managed only for basic baseline tasks (SSH).
 Proxmox configuration itself (storage, clustering, networking)
 is out of scope.
 

@@ -42,6 +42,7 @@ below). Each service role contains a standard set of files:
 | `defaults/main.yml` | Tuneables, `rolename__` namespace |
 | `README.md` | Purpose, variables, usage (role convention) |
 | `SECURITY.md` | Per-service security record — see ADR-002 and `docs/security/service-security-template.md` |
+| `VERIFY.md` | Per-service UI acceptance spec — see ADR-008 Level 4 / ADR-017 and `docs/testing/service-verify-template.md` |
 | `meta/main.yml`, `molecule/default/` | Metadata + Debian 13 test scenario |
 
 ### Standard deploy mechanics

@@ -75,7 +75,7 @@ isolation — no risk of accidentally applying the wrong state.
 
 Each environment directory contains:
 - `providers.tf` — provider version pins and configuration
-- `backend.tf` — Forgejo state backend (environment-specific path)
+- `backend.tf` — backend configuration (local state on the control node; no remote backend — see "State backend" above)
 - `variables.tf` — input declarations
 - `terraform.tfvars.example` — tracked template; copy to `terraform.tfvars` for actual values
 - `main.tf` — `local.vms` map and module calls (no DNS resources)

@@ -75,7 +75,12 @@ The seam's interface is a single Terraform output consumed by a single script.
 `terraform output -json` and writes `inventories/<env>/hosts.yml`. It validates the
 group against the allowed set and fails loudly on an unknown group.
 
-**Valid groups**: `control`, `docker_hosts`, `proxmox_hosts`.
+**Valid groups**: `control`, `docker_hosts`, `proxmox_hosts`, `offsite_hosts`.
+
+`control` and `offsite_hosts` are not produced by Terraform — they hold manually
+provisioned hosts (`ubongo` and `askari` respectively) added to the inventory by hand
+(see the control-node exception below and ADR-015/ADR-016). They are valid groups so
+the generated `hosts.yml` carries their (otherwise empty) sections.
 
 The generated `hosts.yml` carries a "do not edit manually" header and is owned by
 the generator. Treat it as a build artifact: the source of truth is `local.vms` in

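Review note: the validation behaviour described in the hunk above (reject unknown groups loudly, still emit empty sections for `control` and `offsite_hosts`) could look roughly like the sketch below. Only `VALID_GROUPS` is taken from this diff. The function name `group_hosts` and the input shape `{"<hostname>": {"group": "<group>"}}` are assumptions for illustration, since the real Terraform output shape is not visible in this comparison.

```python
import sys

# Taken from the diff: the allowed inventory groups after this change.
VALID_GROUPS = {"control", "docker_hosts", "proxmox_hosts", "offsite_hosts"}


def group_hosts(vms: dict[str, dict]) -> dict[str, list[str]]:
    """Bucket hosts by group, rejecting any group outside VALID_GROUPS.

    `vms` uses an assumed shape: {"<hostname>": {"group": "<group>"}}.
    """
    # Pre-seed every valid group so the manually provisioned groups
    # (control, offsite_hosts) still get an empty section.
    grouped: dict[str, list[str]] = {g: [] for g in sorted(VALID_GROUPS)}
    for host, attrs in vms.items():
        group = attrs.get("group")
        if group not in VALID_GROUPS:
            # Fail loudly instead of silently dropping the host.
            raise ValueError(
                f"host {host!r}: unknown group {group!r}; "
                f"valid groups: {sorted(VALID_GROUPS)}"
            )
        grouped[group].append(host)
    return grouped


if __name__ == "__main__":
    try:
        # "web01" is a made-up host name for the demo.
        print(group_hosts({"web01": {"group": "docker_hosts"}}))
    except ValueError as exc:
        sys.exit(f"error: {exc}")
```

Pre-seeding the result with every valid group is one way to get the "otherwise empty" sections the hunk mentions; the actual generator may do this differently.
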
@@ -85,10 +85,11 @@ The accelerators this policy prefers (`context7`, `deep-research`, `superpowers`
 `claude-code-guide`) are **plugins under `~/.claude/`** — local per machine, **not**
 synced by Claude account and **not** carried by the git repo (only `.claude/commands`,
 `.claude/hooks`, `.claude/settings.json` travel). A fresh clone therefore lacks the
-plugin toolchain until it is reinstalled. Making it reproducible from the repo
-(`extraKnownMarketplaces` + `enabledPlugins` in `.claude/settings.json`, plus a
-bootstrap step) is tracked in `docs/TODO.md` and tied to control-node/AI setup. Until
-then, the graceful-degradation fallback above keeps the policy working.
+plugin toolchain until it is reinstalled. Making it reproducible from the repo is
+**done** (TODO 10.7): `.claude/settings.json` declares `extraKnownMarketplaces` +
+`enabledPlugins`, and `docs/runbooks/claude-code-setup.md` documents the per-machine
+bootstrap. Until a fresh clone runs that bootstrap, the graceful-degradation fallback
+above keeps the policy working.
 
 ## Decision
 

@@ -77,7 +77,8 @@ allocated for it.
 - **Coordinator survival:** off-site on `askari` ⇒ mesh survives a homelab outage.
 NetBird's management datastore is backed up encrypted off `askari` (synced to
 `ubongo`/`mamba`); peers keep last-known config through a brief coordinator outage.
-- **`askari` is Ansible-managed:** its own inventory group, `base` role, plus a
+- **`askari` is Ansible-managed:** its own inventory group `offsite_hosts` (added
+manually like the control node — it is not Terraform-managed), `base` role, plus a
 dedicated `netbird_coordinator` service role (one service = one role, ADR-004; with
 `SECURITY.md`). Agent install/enrollment lives in `base`. NetBird server + agents are
 version-pinned (ADR-011). boma's `dns` role stays authoritative for

@@ -82,7 +82,16 @@ service clears the security bar — record any conscious deviation in
 manual in review today, with the planned `/security-review` aggregating every
 `roles/*/SECURITY.md` to automate it.
 
-### 10. Commit
+### 10. Write the per-service verification spec (services)
+
+For a **service** role, copy `docs/testing/service-verify-template.md` to
+`roles/<rolename>/VERIFY.md` and fill it in: the critical user journeys that define
+"working" for this service, what good looks like, what is not browser-verifiable
+(→ manual handoff), and the test data needed. This is the per-service backbone for the
+Level 4 `/verify-service` check (ADR-008 / ADR-017) and is part of the pre-production
+service-clearance gate (`docs/security/service-checklist.md`).
+
+### 11. Commit
 
 ```bash
 git checkout -b role/<rolename>

@@ -15,13 +15,15 @@ Expected Terraform output shape:
 }
 }
 
-Valid groups: control, docker_hosts, proxmox_hosts
+Valid groups: control, docker_hosts, proxmox_hosts, offsite_hosts
+(control and offsite_hosts hold manually-provisioned hosts not in Terraform; they
+are valid so their sections appear in the generated inventory — see ADR-009.)
 """
 
 import json
 import sys
 
-VALID_GROUPS = {"control", "docker_hosts", "proxmox_hosts"}
+VALID_GROUPS = {"control", "docker_hosts", "proxmox_hosts", "offsite_hosts"}
 
 
 def main() -> None: