Compare commits

3 commits

Author SHA1 Message Date
8e4bf3dd88 ADR-006/014: clear two stale labels
Review O5/O6: ADR-006 mislabeled backend.tf as "Forgejo state backend" (its own
State-backend section chooses local state — Forgejo's API is read-only); ADR-014
called plugin reproducibility open though TODO 10.7 is done.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 18:55:17 +02:00
d8afa94c4b Name and propagate the offsite_hosts inventory group (askari)
Review O4: ADR-016 said askari gets "its own inventory group" but never named it.
Settled as offsite_hosts (off-site, distinct from on-site-but-off-cluster ubongo).
Added to VALID_GROUPS (tf_to_inventory.py), ADR-009 valid groups, ADR-001/ADR-016
host-group enumerations, and CLAUDE.md. Generated hosts.yml picks up the section on
the next make tf-inventory (a manual-exception group like control).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 18:54:54 +02:00
f0d189ca09 Thread the VERIFY.md convention through ADR-004/new-role/README
Review O1-O3: ADR-017's per-service VERIFY.md requirement now appears in the
ADR-004 service-role file table, as a new-role runbook step, and the README
docs index/tree are refreshed (ADRs 010-017, security/testing/hardware dirs).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 18:52:42 +02:00
10 changed files with 55 additions and 15 deletions

@@ -101,14 +101,17 @@ inventories/
 vault.yml
 docker_hosts/ # hosts running Docker services
 proxmox_hosts/ # Proxmox nodes themselves
+offsite_hosts/ # off-site hosts (askari) — NetBird coordinator + watchdog
 host_vars/ # per-host overrides
 staging/ # safe to run freely
``` ```
-Host groups: `all`, `control`, `docker_hosts`, `proxmox_hosts`
+Host groups: `all`, `control`, `docker_hosts`, `proxmox_hosts`, `offsite_hosts`
 (`control` holds `ubongo`, the one manually-provisioned **physical** control node
-outside the cluster — see ADR-009 and ADR-015.)
+outside the cluster; `offsite_hosts` holds `askari`, the off-site Hetzner host that
+runs the NetBird coordinator + watchdog — also added manually. See ADR-009, ADR-015,
+ADR-016.)
 ---

@@ -57,7 +57,11 @@ See `Makefile` for the full list of targets.
 ├── docs/
 │ ├── decisions/ # Architecture decision records (ADRs)
-│ └── runbooks/ # Step-by-step operational procedures
+│ ├── runbooks/ # Step-by-step operational procedures
+│ ├── security/ # Per-service security checklist + templates + accepted risks
+│ ├── testing/ # VERIFY.md template + service-UI verification reports
+│ ├── hardware/ # Physical capacity reference + reviews
+│ └── reviews/ # /review-repo reports
 ├── inventories/
 │ ├── production/ # Live hosts — edit carefully
@@ -92,6 +96,17 @@ See `Makefile` for the full list of targets.
 - Network topology: `docs/decisions/007-network.md`
 - Testing methodology: `docs/decisions/008-testing.md`
 - Terraform ↔ Ansible handoff: `docs/decisions/009-provisioning-handoff.md`
+- Forgejo & CI: `docs/decisions/010-forgejo-ci.md`
+- Update management: `docs/decisions/011-update-management.md`
+- Hardware & capacity: `docs/decisions/012-hardware-capacity.md`
+- Heritage / V4 policy: `docs/decisions/013-heritage-v4.md`
+- Sourcing technical knowledge: `docs/decisions/014-knowledge-sourcing.md`
+- Control / AI-worker host (`ubongo`): `docs/decisions/015-control-host.md`
+- Mesh VPN (NetBird): `docs/decisions/016-mesh-vpn.md`
+- Service-UI verification (Level 4): `docs/decisions/017-service-ui-verification.md`
+(CLAUDE.md carries the full cross-referenced table, including the runbooks and
+security/testing docs.)
 ## Contributing

@@ -35,12 +35,15 @@ describes the *intended* design — see STATUS.md for what is actually built.
 all
 ├── control # ubongo — physical control node outside the cluster; baseline config only, runs no services
 ├── docker_hosts # VMs running Docker services (most hosts)
-└── proxmox_hosts # Proxmox nodes themselves (limited management scope)
+├── proxmox_hosts # Proxmox nodes themselves (limited management scope)
+└── offsite_hosts # askari (off-site Hetzner) — NetBird coordinator + external watchdog
``` ```
 The `control` group holds the single manually-provisioned control node; it is
 managed for baseline config (SSH, firewall, updates) but never runs the
-`docker_host` role. Proxmox nodes are managed only for basic baseline tasks (SSH).
+`docker_host` role. The `offsite_hosts` group holds `askari`, the off-site Hetzner
+host — also manually provisioned (ADR-016), managed for baseline config plus the
+`netbird_coordinator` service role. Proxmox nodes are managed only for basic baseline tasks (SSH).
 Proxmox configuration itself (storage, clustering, networking)
 is out of scope.

@@ -42,6 +42,7 @@ below). Each service role contains a standard set of files:
 | `defaults/main.yml` | Tuneables, `rolename__` namespace |
 | `README.md` | Purpose, variables, usage (role convention) |
 | `SECURITY.md` | Per-service security record — see ADR-002 and `docs/security/service-security-template.md` |
+| `VERIFY.md` | Per-service UI acceptance spec — see ADR-008 Level 4 / ADR-017 and `docs/testing/service-verify-template.md` |
 | `meta/main.yml`, `molecule/default/` | Metadata + Debian 13 test scenario |
 ### Standard deploy mechanics

@@ -75,7 +75,7 @@ isolation — no risk of accidentally applying the wrong state.
 Each environment directory contains:
 - `providers.tf` — provider version pins and configuration
-- `backend.tf` — Forgejo state backend (environment-specific path)
+- `backend.tf` — backend configuration (local state on the control node; no remote backend — see "State backend" above)
 - `variables.tf` — input declarations
 - `terraform.tfvars.example` — tracked template; copy to `terraform.tfvars` for actual values
 - `main.tf` — `local.vms` map and module calls (no DNS resources)

@@ -75,7 +75,12 @@ The seam's interface is a single Terraform output consumed by a single script.
 `terraform output -json` and writes `inventories/<env>/hosts.yml`. It validates the
 group against the allowed set and fails loudly on an unknown group.
-**Valid groups**: `control`, `docker_hosts`, `proxmox_hosts`.
+**Valid groups**: `control`, `docker_hosts`, `proxmox_hosts`, `offsite_hosts`.
+`control` and `offsite_hosts` are not produced by Terraform — they hold manually
+provisioned hosts (`ubongo` and `askari` respectively) added to the inventory by hand
+(see the control-node exception below and ADR-015/ADR-016). They are valid groups so
+the generated `hosts.yml` carries their (otherwise empty) sections.
 The generated `hosts.yml` carries a "do not edit manually" header and is owned by
 the generator. Treat it as a build artifact: the source of truth is `local.vms` in

@@ -85,10 +85,11 @@ The accelerators this policy prefers (`context7`, `deep-research`, `superpowers`
 `claude-code-guide`) are **plugins under `~/.claude/`** — local per machine, **not**
 synced by Claude account and **not** carried by the git repo (only `.claude/commands`,
 `.claude/hooks`, `.claude/settings.json` travel). A fresh clone therefore lacks the
-plugin toolchain until it is reinstalled. Making it reproducible from the repo
-(`extraKnownMarketplaces` + `enabledPlugins` in `.claude/settings.json`, plus a
-bootstrap step) is tracked in `docs/TODO.md` and tied to control-node/AI setup. Until
-then, the graceful-degradation fallback above keeps the policy working.
+plugin toolchain until it is reinstalled. Making it reproducible from the repo is
+**done** (TODO 10.7): `.claude/settings.json` declares `extraKnownMarketplaces` +
+`enabledPlugins`, and `docs/runbooks/claude-code-setup.md` documents the per-machine
+bootstrap. Until a fresh clone runs that bootstrap, the graceful-degradation fallback
+above keeps the policy working.
 ## Decision

@@ -77,7 +77,8 @@ allocated for it.
 - **Coordinator survival:** off-site on `askari` ⇒ mesh survives a homelab outage.
 NetBird's management datastore is backed up encrypted off `askari` (synced to
 `ubongo`/`mamba`); peers keep last-known config through a brief coordinator outage.
-- **`askari` is Ansible-managed:** its own inventory group, `base` role, plus a
+- **`askari` is Ansible-managed:** its own inventory group `offsite_hosts` (added
+manually like the control node — it is not Terraform-managed), `base` role, plus a
 dedicated `netbird_coordinator` service role (one service = one role, ADR-004; with
 `SECURITY.md`). Agent install/enrollment lives in `base`. NetBird server + agents are
 version-pinned (ADR-011). boma's `dns` role stays authoritative for

@@ -82,7 +82,16 @@ service clears the security bar — record any conscious deviation in
 manual in review today, with the planned `/security-review` aggregating every
 `roles/*/SECURITY.md` to automate it.
-### 10. Commit
+### 10. Write the per-service verification spec (services)
+For a **service** role, copy `docs/testing/service-verify-template.md` to
+`roles/<rolename>/VERIFY.md` and fill it in: the critical user journeys that define
+"working" for this service, what good looks like, what is not browser-verifiable
+(→ manual handoff), and the test data needed. This is the per-service backbone for the
+Level 4 `/verify-service` check (ADR-008 / ADR-017) and is part of the pre-production
+service-clearance gate (`docs/security/service-checklist.md`).
+### 11. Commit
```bash ```bash
git checkout -b role/<rolename> git checkout -b role/<rolename>
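
Reviewer note: the new step 10 in the hunk above is a copy-then-edit step, so it could be scripted to avoid typos in the target path. A minimal sketch, assuming the two paths named in the runbook text; the helper name `scaffold_verify` and its refuse-to-overwrite behavior are hypothetical and not part of the repository.

```python
import shutil
from pathlib import Path

# Template path as named in the runbook step; relative to the repo root.
TEMPLATE = Path("docs/testing/service-verify-template.md")


def scaffold_verify(rolename: str, repo_root: Path = Path(".")) -> Path:
    """Copy the VERIFY.md template into roles/<rolename>/ and return its path.

    Hypothetical helper for illustration; it only performs the copy, the
    spec itself still has to be filled in by hand.
    """
    role_dir = repo_root / "roles" / rolename
    if not role_dir.is_dir():
        raise FileNotFoundError(f"role directory {role_dir} does not exist")
    target = role_dir / "VERIFY.md"
    if target.exists():
        # Never clobber a spec that someone has already filled in.
        raise FileExistsError(f"{target} already exists; not overwriting")
    shutil.copyfile(repo_root / TEMPLATE, target)
    return target
```

Refusing to overwrite an existing `VERIFY.md` is a deliberate choice in this sketch: a filled-in spec is hand-written content, and a rerun should not be able to erase it.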

@@ -15,13 +15,15 @@ Expected Terraform output shape:
 }
 }
-Valid groups: control, docker_hosts, proxmox_hosts
+Valid groups: control, docker_hosts, proxmox_hosts, offsite_hosts
+(control and offsite_hosts hold manually-provisioned hosts not in Terraform; they
+are valid so their sections appear in the generated inventory; see ADR-009.)
 """
 import json
 import sys
-VALID_GROUPS = {"control", "docker_hosts", "proxmox_hosts"}
+VALID_GROUPS = {"control", "docker_hosts", "proxmox_hosts", "offsite_hosts"}
 def main() -> None: