Compare commits

3 commits

Author SHA1 Message Date
8e4bf3dd88 ADR-006/014: clear two stale labels
Review O5/O6: ADR-006 mislabeled backend.tf as "Forgejo state backend" (its own
State-backend section chooses local state — Forgejo's API is read-only); ADR-014
called plugin reproducibility open though TODO 10.7 is done.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 18:55:17 +02:00
d8afa94c4b Name and propagate the offsite_hosts inventory group (askari)
Review O4: ADR-016 said askari gets "its own inventory group" but never named it.
Settled as offsite_hosts (off-site, distinct from on-site-but-off-cluster ubongo).
Added to VALID_GROUPS (tf_to_inventory.py), ADR-009 valid groups, ADR-001/ADR-016
host-group enumerations, and CLAUDE.md. Generated hosts.yml picks up the section on
the next make tf-inventory (a manual-exception group like control).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 18:54:54 +02:00
f0d189ca09 Thread the VERIFY.md convention through ADR-004/new-role/README
Review O1-O3: ADR-017's per-service VERIFY.md requirement now appears in the
ADR-004 service-role file table, as a new-role runbook step, and the README
docs index/tree are refreshed (ADRs 010-017, security/testing/hardware dirs).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 18:52:42 +02:00
10 changed files with 55 additions and 15 deletions

@ -101,14 +101,17 @@ inventories/
vault.yml
docker_hosts/ # hosts running Docker services
proxmox_hosts/ # Proxmox nodes themselves
offsite_hosts/ # off-site hosts (askari) — NetBird coordinator + watchdog
host_vars/ # per-host overrides
staging/ # safe to run freely
```
Host groups: `all`, `control`, `docker_hosts`, `proxmox_hosts`
Host groups: `all`, `control`, `docker_hosts`, `proxmox_hosts`, `offsite_hosts`
(`control` holds `ubongo`, the one manually-provisioned **physical** control node
outside the cluster — see ADR-009 and ADR-015.)
outside the cluster; `offsite_hosts` holds `askari`, the off-site Hetzner host that
runs the NetBird coordinator + watchdog — also added manually. See ADR-009, ADR-015,
ADR-016.)
---

@ -57,7 +57,11 @@ See `Makefile` for the full list of targets.
├── docs/
│ ├── decisions/ # Architecture decision records (ADRs)
│ └── runbooks/ # Step-by-step operational procedures
│ ├── runbooks/ # Step-by-step operational procedures
│ ├── security/ # Per-service security checklist + templates + accepted risks
│ ├── testing/ # VERIFY.md template + service-UI verification reports
│ ├── hardware/ # Physical capacity reference + reviews
│ └── reviews/ # /review-repo reports
├── inventories/
│ ├── production/ # Live hosts — edit carefully
@ -92,6 +96,17 @@ See `Makefile` for the full list of targets.
- Network topology: `docs/decisions/007-network.md`
- Testing methodology: `docs/decisions/008-testing.md`
- Terraform ↔ Ansible handoff: `docs/decisions/009-provisioning-handoff.md`
- Forgejo & CI: `docs/decisions/010-forgejo-ci.md`
- Update management: `docs/decisions/011-update-management.md`
- Hardware & capacity: `docs/decisions/012-hardware-capacity.md`
- Heritage / V4 policy: `docs/decisions/013-heritage-v4.md`
- Sourcing technical knowledge: `docs/decisions/014-knowledge-sourcing.md`
- Control / AI-worker host (`ubongo`): `docs/decisions/015-control-host.md`
- Mesh VPN (NetBird): `docs/decisions/016-mesh-vpn.md`
- Service-UI verification (Level 4): `docs/decisions/017-service-ui-verification.md`
(CLAUDE.md carries the full cross-referenced table, including the runbooks and
security/testing docs.)
## Contributing

@ -35,12 +35,15 @@ describes the *intended* design — see STATUS.md for what is actually built.
all
├── control # ubongo — physical control node outside the cluster; baseline config only, runs no services
├── docker_hosts # VMs running Docker services (most hosts)
└── proxmox_hosts # Proxmox nodes themselves (limited management scope)
├── proxmox_hosts # Proxmox nodes themselves (limited management scope)
└── offsite_hosts # askari (off-site Hetzner) — NetBird coordinator + external watchdog
```
The `control` group holds the single manually-provisioned control node; it is
managed for baseline config (SSH, firewall, updates) but never runs the
`docker_host` role. Proxmox nodes are managed only for basic baseline tasks (SSH).
`docker_host` role. The `offsite_hosts` group holds `askari`, the off-site Hetzner
host — also manually provisioned (ADR-016), managed for baseline config plus the
`netbird_coordinator` service role. Proxmox nodes are managed only for basic baseline tasks (SSH).
Proxmox configuration itself (storage, clustering, networking)
is out of scope.

@ -42,6 +42,7 @@ below). Each service role contains a standard set of files:
| `defaults/main.yml` | Tuneables, `rolename__` namespace |
| `README.md` | Purpose, variables, usage (role convention) |
| `SECURITY.md` | Per-service security record — see ADR-002 and `docs/security/service-security-template.md` |
| `VERIFY.md` | Per-service UI acceptance spec — see ADR-008 Level 4 / ADR-017 and `docs/testing/service-verify-template.md` |
| `meta/main.yml`, `molecule/default/` | Metadata + Debian 13 test scenario |
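The file convention in this table lends itself to a mechanical check. The following is a minimal sketch, not part of the repository: the function name `missing_role_files` and the `EXPECTED_FILES` tuple are hypothetical, and the tuple covers only the rows visible in this hunk (the full table may list more).

```python
from pathlib import Path

# Files named in the table rows shown in this hunk; the full table may list more.
# This tuple and the function below are illustrative assumptions.
EXPECTED_FILES = (
    "defaults/main.yml",
    "README.md",
    "SECURITY.md",
    "VERIFY.md",
    "meta/main.yml",
)


def missing_role_files(role_dir: Path) -> list[str]:
    """Return the expected files that do not exist under role_dir."""
    return [rel for rel in EXPECTED_FILES if not (role_dir / rel).is_file()]
```

Calling `missing_role_files(Path("roles/<rolename>"))` returns an empty list when the role carries every listed file, so it could back a lint step or a review checklist.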
### Standard deploy mechanics

@ -75,7 +75,7 @@ isolation — no risk of accidentally applying the wrong state.
Each environment directory contains:
- `providers.tf` — provider version pins and configuration
- `backend.tf` — Forgejo state backend (environment-specific path)
- `backend.tf` — backend configuration (local state on the control node; no remote backend — see "State backend" above)
- `variables.tf` — input declarations
- `terraform.tfvars.example` — tracked template; copy to `terraform.tfvars` for actual values
- `main.tf` — `local.vms` map and module calls (no DNS resources)

@ -75,7 +75,12 @@ The seam's interface is a single Terraform output consumed by a single script.
`terraform output -json` and writes `inventories/<env>/hosts.yml`. It validates the
group against the allowed set and fails loudly on an unknown group.
**Valid groups**: `control`, `docker_hosts`, `proxmox_hosts`.
**Valid groups**: `control`, `docker_hosts`, `proxmox_hosts`, `offsite_hosts`.
`control` and `offsite_hosts` are not produced by Terraform — they hold manually
provisioned hosts (`ubongo` and `askari` respectively) added to the inventory by hand
(see the control-node exception below and ADR-015/ADR-016). They are valid groups so
the generated `hosts.yml` carries their (otherwise empty) sections.
The generated `hosts.yml` carries a "do not edit manually" header and is owned by
the generator. Treat it as a build artifact: the source of truth is `local.vms` in
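The group validation described in this hunk can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `tf_to_inventory.py`: the function name `build_inventory` and the input shape (host name mapped to a dict with a `group` key) are hypothetical. Only the group names, the fail-on-unknown-group behaviour, and the empty sections for manually managed groups come from the text above.

```python
# Group names taken from the ADR text; everything else here is hypothetical.
VALID_GROUPS = {"control", "docker_hosts", "proxmox_hosts", "offsite_hosts"}


def build_inventory(hosts: dict[str, dict]) -> dict[str, dict]:
    """Group hosts by their declared group, rejecting unknown groups.

    `hosts` is an assumed shape: {"<hostname>": {"group": "<group>"}}.
    """
    # Start with every valid group so the manually managed groups
    # (control, offsite_hosts) still get an (empty) section.
    inventory = {group: {"hosts": {}} for group in sorted(VALID_GROUPS)}
    for name, attrs in hosts.items():
        group = attrs.get("group")
        if group not in VALID_GROUPS:
            # Fail loudly rather than silently dropping the host.
            raise ValueError(f"unknown group {group!r} for host {name!r}")
        inventory[group]["hosts"][name] = {}
    return inventory
```

Pre-seeding every valid group is one way to get the "otherwise empty" `control` and `offsite_hosts` sections; the real generator may achieve this differently.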

@ -85,10 +85,11 @@ The accelerators this policy prefers (`context7`, `deep-research`, `superpowers`
`claude-code-guide`) are **plugins under `~/.claude/`** — local per machine, **not**
synced by Claude account and **not** carried by the git repo (only `.claude/commands`,
`.claude/hooks`, `.claude/settings.json` travel). A fresh clone therefore lacks the
plugin toolchain until it is reinstalled. Making it reproducible from the repo
(`extraKnownMarketplaces` + `enabledPlugins` in `.claude/settings.json`, plus a
bootstrap step) is tracked in `docs/TODO.md` and tied to control-node/AI setup. Until
then, the graceful-degradation fallback above keeps the policy working.
plugin toolchain until it is reinstalled. Making it reproducible from the repo is
**done** (TODO 10.7): `.claude/settings.json` declares `extraKnownMarketplaces` +
`enabledPlugins`, and `docs/runbooks/claude-code-setup.md` documents the per-machine
bootstrap. Until a fresh clone runs that bootstrap, the graceful-degradation fallback
above keeps the policy working.
## Decision

@ -77,7 +77,8 @@ allocated for it.
- **Coordinator survival:** off-site on `askari` ⇒ mesh survives a homelab outage.
NetBird's management datastore is backed up encrypted off `askari` (synced to
`ubongo`/`mamba`); peers keep last-known config through a brief coordinator outage.
- **`askari` is Ansible-managed:** its own inventory group, `base` role, plus a
- **`askari` is Ansible-managed:** its own inventory group `offsite_hosts` (added
manually like the control node — it is not Terraform-managed), `base` role, plus a
dedicated `netbird_coordinator` service role (one service = one role, ADR-004; with
`SECURITY.md`). Agent install/enrollment lives in `base`. NetBird server + agents are
version-pinned (ADR-011). boma's `dns` role stays authoritative for

@ -82,7 +82,16 @@ service clears the security bar — record any conscious deviation in
manual in review today, with the planned `/security-review` aggregating every
`roles/*/SECURITY.md` to automate it.
### 10. Commit
### 10. Write the per-service verification spec (services)
For a **service** role, copy `docs/testing/service-verify-template.md` to
`roles/<rolename>/VERIFY.md` and fill it in: the critical user journeys that define
"working" for this service, what good looks like, what is not browser-verifiable
(→ manual handoff), and the test data needed. This is the per-service backbone for the
Level 4 `/verify-service` check (ADR-008 / ADR-017) and is part of the pre-production
service-clearance gate (`docs/security/service-checklist.md`).
### 11. Commit
```bash
git checkout -b role/<rolename>

@ -15,13 +15,15 @@ Expected Terraform output shape:
}
}
Valid groups: control, docker_hosts, proxmox_hosts
Valid groups: control, docker_hosts, proxmox_hosts, offsite_hosts
(control and offsite_hosts hold manually-provisioned hosts not in Terraform; they
are valid so their sections appear in the generated inventory; see ADR-009.)
"""
import json
import sys
VALID_GROUPS = {"control", "docker_hosts", "proxmox_hosts"}
VALID_GROUPS = {"control", "docker_hosts", "proxmox_hosts", "offsite_hosts"}
def main() -> None: