Compare commits

No commits in common. "8e4bf3dd88f7e1bdd934db6f41c0f1bb3d89ea31" and "3dd03d4198c91aefefe84491d3185989f350ef4f" have entirely different histories.

10 changed files with 15 additions and 55 deletions

@ -101,17 +101,14 @@ inventories/
vault.yml
docker_hosts/ # hosts running Docker services
proxmox_hosts/ # Proxmox nodes themselves
offsite_hosts/ # off-site hosts (askari) — NetBird coordinator + watchdog
host_vars/ # per-host overrides
staging/ # safe to run freely
```
Host groups: `all`, `control`, `docker_hosts`, `proxmox_hosts`, `offsite_hosts`
Host groups: `all`, `control`, `docker_hosts`, `proxmox_hosts`
(`control` holds `ubongo`, the one manually-provisioned **physical** control node
outside the cluster; `offsite_hosts` holds `askari`, the off-site Hetzner host that
runs the NetBird coordinator + watchdog — also added manually. See ADR-009, ADR-015,
ADR-016.)
outside the cluster — see ADR-009 and ADR-015.)
---

@ -57,11 +57,7 @@ See `Makefile` for the full list of targets.
├── docs/
│ ├── decisions/ # Architecture decision records (ADRs)
│ ├── runbooks/ # Step-by-step operational procedures
│ ├── security/ # Per-service security checklist + templates + accepted risks
│ ├── testing/ # VERIFY.md template + service-UI verification reports
│ ├── hardware/ # Physical capacity reference + reviews
│ └── reviews/ # /review-repo reports
│ └── runbooks/ # Step-by-step operational procedures
├── inventories/
│ ├── production/ # Live hosts — edit carefully
@ -96,17 +92,6 @@ See `Makefile` for the full list of targets.
- Network topology: `docs/decisions/007-network.md`
- Testing methodology: `docs/decisions/008-testing.md`
- Terraform ↔ Ansible handoff: `docs/decisions/009-provisioning-handoff.md`
- Forgejo & CI: `docs/decisions/010-forgejo-ci.md`
- Update management: `docs/decisions/011-update-management.md`
- Hardware & capacity: `docs/decisions/012-hardware-capacity.md`
- Heritage / V4 policy: `docs/decisions/013-heritage-v4.md`
- Sourcing technical knowledge: `docs/decisions/014-knowledge-sourcing.md`
- Control / AI-worker host (`ubongo`): `docs/decisions/015-control-host.md`
- Mesh VPN (NetBird): `docs/decisions/016-mesh-vpn.md`
- Service-UI verification (Level 4): `docs/decisions/017-service-ui-verification.md`
(CLAUDE.md carries the full cross-referenced table, including the runbooks and
security/testing docs.)
## Contributing

@ -35,15 +35,12 @@ describes the *intended* design — see STATUS.md for what is actually built.
all
├── control # ubongo — physical control node outside the cluster; baseline config only, runs no services
├── docker_hosts # VMs running Docker services (most hosts)
├── proxmox_hosts # Proxmox nodes themselves (limited management scope)
└── offsite_hosts # askari (off-site Hetzner) — NetBird coordinator + external watchdog
└── proxmox_hosts # Proxmox nodes themselves (limited management scope)
```
The `control` group holds the single manually-provisioned control node; it is
managed for baseline config (SSH, firewall, updates) but never runs the
`docker_host` role. The `offsite_hosts` group holds `askari`, the off-site Hetzner
host — also manually provisioned (ADR-016), managed for baseline config plus the
`netbird_coordinator` service role. Proxmox nodes are managed only for basic baseline tasks (SSH).
`docker_host` role. Proxmox nodes are managed only for basic baseline tasks (SSH).
Proxmox configuration itself (storage, clustering, networking)
is out of scope.

@ -42,7 +42,6 @@ below). Each service role contains a standard set of files:
| `defaults/main.yml` | Tuneables, `rolename__` namespace |
| `README.md` | Purpose, variables, usage (role convention) |
| `SECURITY.md` | Per-service security record — see ADR-002 and `docs/security/service-security-template.md` |
| `VERIFY.md` | Per-service UI acceptance spec — see ADR-008 Level 4 / ADR-017 and `docs/testing/service-verify-template.md` |
| `meta/main.yml`, `molecule/default/` | Metadata + Debian 13 test scenario |
### Standard deploy mechanics

@ -75,7 +75,7 @@ isolation — no risk of accidentally applying the wrong state.
Each environment directory contains:
- `providers.tf` — provider version pins and configuration
- `backend.tf` — backend configuration (local state on the control node; no remote backend — see "State backend" above)
- `backend.tf` — Forgejo state backend (environment-specific path)
- `variables.tf` — input declarations
- `terraform.tfvars.example` — tracked template; copy to `terraform.tfvars` for actual values
- `main.tf` — `local.vms` map and module calls (no DNS resources)

@ -75,12 +75,7 @@ The seam's interface is a single Terraform output consumed by a single script.
`terraform output -json` and writes `inventories/<env>/hosts.yml`. It validates the
group against the allowed set and fails loudly on an unknown group.
**Valid groups**: `control`, `docker_hosts`, `proxmox_hosts`, `offsite_hosts`.
`control` and `offsite_hosts` are not produced by Terraform — they hold manually
provisioned hosts (`ubongo` and `askari` respectively) added to the inventory by hand
(see the control-node exception below and ADR-015/ADR-016). They are valid groups so
the generated `hosts.yml` carries their (otherwise empty) sections.
**Valid groups**: `control`, `docker_hosts`, `proxmox_hosts`.
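As a rough illustration of the "validate the group and fail loudly" behaviour described above, a minimal Python sketch might look like the following. It is not the repository's generator: the function name `group_hosts`, the input shape (`{"<host>": {"group": "<group>"}}`), and the host names in the usage note are assumptions made for this example. Only the group names come from the text.

```python
from typing import Dict, Iterable, List, Mapping

# Default allowed set, copied from the "Valid groups" line above.
VALID_GROUPS = frozenset({"control", "docker_hosts", "proxmox_hosts"})


def group_hosts(
    vms: Mapping[str, Mapping[str, str]],
    valid_groups: Iterable[str] = VALID_GROUPS,
) -> Dict[str, List[str]]:
    """Bucket host names by group; raise on any group outside the allowed set.

    `vms` uses a hypothetical shape: {"<host>": {"group": "<group>"}}.
    """
    allowed = set(valid_groups)
    # Every allowed group starts empty, so a group with no generated hosts
    # still appears in the result.
    grouped: Dict[str, List[str]] = {g: [] for g in sorted(allowed)}
    for host, attrs in sorted(vms.items()):
        group = attrs.get("group")
        if group not in allowed:
            # Fail loudly: name the host, the bad group, and the allowed set.
            raise ValueError(
                f"host {host!r} has unknown group {group!r}; "
                f"allowed: {sorted(allowed)}"
            )
        grouped[group].append(host)
    return grouped
```

Calling `group_hosts({"web1": {"group": "docker_hosts"}})` with the hypothetical host `web1` returns all three groups, with `control` and `proxmox_hosts` empty. A host tagged with a group outside the set raises `ValueError` instead of being dropped silently.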
The generated `hosts.yml` carries a "do not edit manually" header and is owned by
the generator. Treat it as a build artifact: the source of truth is `local.vms` in

@ -85,11 +85,10 @@ The accelerators this policy prefers (`context7`, `deep-research`, `superpowers`
`claude-code-guide`) are **plugins under `~/.claude/`** — local per machine, **not**
synced by Claude account and **not** carried by the git repo (only `.claude/commands`,
`.claude/hooks`, `.claude/settings.json` travel). A fresh clone therefore lacks the
plugin toolchain until it is reinstalled. Making it reproducible from the repo is
**done** (TODO 10.7): `.claude/settings.json` declares `extraKnownMarketplaces` +
`enabledPlugins`, and `docs/runbooks/claude-code-setup.md` documents the per-machine
bootstrap. Until a fresh clone runs that bootstrap, the graceful-degradation fallback
above keeps the policy working.
plugin toolchain until it is reinstalled. Making it reproducible from the repo
(`extraKnownMarketplaces` + `enabledPlugins` in `.claude/settings.json`, plus a
bootstrap step) is tracked in `docs/TODO.md` and tied to control-node/AI setup. Until
then, the graceful-degradation fallback above keeps the policy working.
## Decision

@ -77,8 +77,7 @@ allocated for it.
- **Coordinator survival:** off-site on `askari` ⇒ mesh survives a homelab outage.
NetBird's management datastore is backed up encrypted off `askari` (synced to
`ubongo`/`mamba`); peers keep last-known config through a brief coordinator outage.
- **`askari` is Ansible-managed:** its own inventory group `offsite_hosts` (added
manually like the control node — it is not Terraform-managed), `base` role, plus a
- **`askari` is Ansible-managed:** its own inventory group, `base` role, plus a
dedicated `netbird_coordinator` service role (one service = one role, ADR-004; with
`SECURITY.md`). Agent install/enrollment lives in `base`. NetBird server + agents are
version-pinned (ADR-011). boma's `dns` role stays authoritative for

@ -82,16 +82,7 @@ service clears the security bar — record any conscious deviation in
manual in review today, with the planned `/security-review` aggregating every
`roles/*/SECURITY.md` to automate it.
### 10. Write the per-service verification spec (services)
For a **service** role, copy `docs/testing/service-verify-template.md` to
`roles/<rolename>/VERIFY.md` and fill it in: the critical user journeys that define
"working" for this service, what good looks like, what is not browser-verifiable
(→ manual handoff), and the test data needed. This is the per-service backbone for the
Level 4 `/verify-service` check (ADR-008 / ADR-017) and is part of the pre-production
service-clearance gate (`docs/security/service-checklist.md`).
### 11. Commit
### 10. Commit
```bash
git checkout -b role/<rolename>

@ -15,15 +15,13 @@ Expected Terraform output shape:
}
}
Valid groups: control, docker_hosts, proxmox_hosts, offsite_hosts
(control and offsite_hosts hold manually-provisioned hosts not in Terraform; they
are valid so their sections appear in the generated inventory; see ADR-009.)
Valid groups: control, docker_hosts, proxmox_hosts
"""
import json
import sys
VALID_GROUPS = {"control", "docker_hosts", "proxmox_hosts", "offsite_hosts"}
VALID_GROUPS = {"control", "docker_hosts", "proxmox_hosts"}
def main() -> None: