docs(backup): add foundation-layer implementation plan (ADR-022)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
parent eaffd8d900
commit 2041bd3b70
1 changed file with 476 additions and 0 deletions
476
docs/superpowers/plans/2026-06-10-backup-strategy.md
Normal file
|
|
@@ -0,0 +1,476 @@
|
|||
# Backup & DR Strategy — Implementation Plan
|
||||
|
||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
||||
|
||||
**Goal:** Land the *foundation layer* of the backup strategy (ADR-022, the per-service `backup__*` data contract, the four-part `BACKUP.md` governance set of template + checklist gate + runbook step + dormant verifier, and the doc/inventory updates) so that every future service role is born backup-aware, before any live infrastructure exists.
|
||||
|
||||
**Architecture:** This is the first of three sequenced plans (see *Decomposition & roadmap* below). It is **doc/governance only** — no Ansible role, no live restic/rclone, no host contact. It mirrors exactly how ADR-021 delivered operational-access governance: a template under `docs/<concern>/`, one line in `docs/security/service-checklist.md`, a step in `docs/runbooks/new-role.md`, and a *dormant* verifier command (`/check-access` → here `/check-backup`). boma deliberately gates these per-service docs via checklist+runbook, **not** an automated lint script — so this plan adds **no** `scripts/check-*.py`. (This reconciles the design doc's casual "make lint gates its presence" phrasing with boma's actual governance choice; the ADR records the reconciliation.)
|
||||
|
||||
**Tech Stack:** Markdown docs, Ansible role-var conventions (`backup__*`, double-underscore namespace per CLAUDE.md), `make lint` (yamllint + ansible-lint + `check-tags.py`) as the only automated gate, `git` trunk-based on a feature branch.
|
||||
|
||||
**Source spec:** `docs/superpowers/specs/2026-06-10-backup-strategy-design.md` (Decisions 1–13 referenced by number throughout).
|
||||
|
||||
---
|
||||
|
||||
## Decomposition & roadmap
|
||||
|
||||
The full spec spans three subsystems with hard ordering dependencies (STATUS.md: no service roles exist, `fisi` unprovisioned, Terraform never `init`ed, no staging cluster, no Uptime Kuma/pCloud). Each becomes its own plan and produces working, testable software on its own:
|
||||
|
||||
- **Plan 1 — Foundation (THIS PLAN).** ADR + `backup__*` contract + `BACKUP.md` governance + doc/inventory updates. Buildable and verifiable **today** with zero live infra. Unblocks every service role.
|
||||
- **Plan 2 — The `backup` role (FUTURE).** `make new-role NAME=backup`: pull orchestrator, restic wrapper, `rclone→pCloud`, retention prune, udev air-gap unit + `restic copy`, systemd timers, ntfy + Uptime-Kuma heartbeat. Built with Molecule render/syntax tests + pytest, the way the `firewall` concern was — buildable now, *functionally* testable only once `fisi` + hosts exist. **Blocked on:** `fisi` provisioned (SATA power cable), `backup_hosts` inventory group, at least one service role declaring `backup__*`.
|
||||
- **Plan 3 — Live wire-up + restore testing (FUTURE).** Deploy the role, pCloud rclone auth, Uptime Kuma push monitor, Tier-1 restore-verify on `ubongo`, semi-annual Tier-2 DR rehearsal on staging, the printed break-glass runbook + its annual drill. **Blocked on:** Plan 2 deployed, real VMs/staging, services with `VERIFY.md`, Vaultwarden live.
|
||||
|
||||
Write Plans 2 and 3 with this same skill when their prerequisites land. Everything below is Plan 1.
|
||||
|
||||
---
|
||||
|
||||
## Plan 1 file map
|
||||
|
||||
| File | Action | Responsibility |
|
||||
|---|---|---|
|
||||
| `docs/decisions/022-backup.md` | create | ADR of record; distils the spec's Decisions 1–13 |
|
||||
| `docs/backup/service-backup-template.md` | create | `BACKUP.md` template; defines the `backup__*` contract shape |
|
||||
| `.claude/commands/check-backup.md` | create | Dormant verifier (mirrors `check-access.md`) |
|
||||
| `CLAUDE.md` | modify | Role-conventions: BACKUP.md required for service roles; Further-reading row |
|
||||
| `docs/security/service-checklist.md` | modify | Strengthen the Operability backup line to the ADR-022 gate |
|
||||
| `docs/runbooks/new-role.md` | modify | Add the per-service BACKUP.md step (new §12, renumber commit) |
|
||||
| `docs/hardware/reference.md` | modify | `ubongo` → M70q/1TB; add `fisi` node + capacity row |
|
||||
| `docs/CAPABILITIES.md` | modify | §9: restic+rclone+USB committed; PBS deferred; ref ADR-022 |
|
||||
| `STATUS.md` | modify | Add "Designed but not built" rows for backup role + contract |
|
||||
| `docs/TODO.md` | modify | Mark item 3.8 decided; reference ADR-022 |
|
||||
|
||||
**Working branch (all tasks):** AI-driven multi-file change → review as one diff (CLAUDE.md git conventions).
|
||||
|
||||
```bash
|
||||
git checkout -b feat/backup-foundation
|
||||
```
|
||||
|
||||
Before any commit, confirm `rbw unlocked` exits 0 (the pre-commit hook decrypts `vault.yml`); if not, stop and ask the operator to `rbw unlock`.
|
||||
|
||||
---
|
||||
|
||||
### Task 1: Author ADR-022 and wire the decision into CLAUDE.md / STATUS.md / TODO.md
|
||||
|
||||
**Files:**
|
||||
- Create: `docs/decisions/022-backup.md`
|
||||
- Modify: `CLAUDE.md` (Further-reading table; role-conventions block)
|
||||
- Modify: `STATUS.md` ("Designed but not built" table)
|
||||
- Modify: `docs/TODO.md` (item 3.8)
|
||||
|
||||
- [ ] **Step 1: Write `docs/decisions/022-backup.md`**
|
||||
|
||||
Mirror the structure of `docs/decisions/021-operational-access.md` (`## Context`, `## Decision`, subsections, `## Consequences`). Transcribe the spec's settled decisions — do not re-derive. The ADR body must state, each as its own labelled decision:
|
||||
|
||||
1. **Recovery model A** — data-only restic backups, rebuild-from-code; no PBS in v1 (deferred as Model B/C). (spec Decision 1)
|
||||
2. **One tier, ~24 h RPO.** (Decision 2)
|
||||
3. **Engine:** restic (data) + rclone (pCloud off-site); restic encrypts → rclone moves ciphertext only, no second layer. (Decision 3)
|
||||
4. **Topology:** central off-cluster **pull** node (`fisi`, provisional), 2×8 TB mirror, owns the repo, runs rclone + the USB dock; hosts hold no backup creds. New `backup_hosts` inventory group, `base` role applies. (Decision 4)
|
||||
5. **3-2-1 mapping** incl. USB air-gap as the immutable backstop. (Decision 5)
|
||||
6. **Per-service contract:** `backup__*` role vars + required `BACKUP.md`, rendered from the data (the ADR-021 pattern). **Governance reconciliation:** gated via the per-service checklist + new-role runbook + dormant `/check-backup` verifier — **not** an automated lint script (consistent with ADR-021's "runbook+gate, not scaffold" choice). State this explicitly so it supersedes the design doc's "make lint gates its presence" wording. (Decision 6)
|
||||
7. **Consistency:** logical dumps first (`pg_dump`/`mysqldump`), `quiesce` escape hatch; FS snapshots not the sole DB method. (Decision 7)
|
||||
8. **Restore testing:** Tier-1 weekly rolling container restore-verify on `ubongo` (reuses `VERIFY.md`); Tier-2 semi-annual full DR rehearsal on staging, ≥1/yr exercises the paper break-glass. `ubongo` stays bare Debian, not a hypervisor (ADR-015 unchanged). (Decision 8)
|
||||
9. **Retention (GFS):** `--keep-daily 7 --keep-weekly 4 --keep-monthly 6 --keep-yearly 1`. (Decision 9)
|
||||
10. **Encryption + escrow + break-glass:** one restic password protects all copies; escrowed to `fisi`(+vault) / Vaultwarden / **paper**; paper holds **both** the restic password **and** the Ansible vault password (breaks the Model-A circular dependency); `mamba` is the break-glass clone (ADR-015). (Decision 10)
|
||||
11. **USB air-gap:** udev serial-allowlist → `restic copy` to a USB restic repo → `restic check` → ntfy; rotate off-site. (Decision 11)
|
||||
12. **Failure alerting:** Uptime-Kuma dead-man's-switch + ntfy on failure + weekly `restic check`. (Decision 12)
|
||||
13. **Schedule.** (Decision 13)
|
||||
|
||||
`## Consequences` must note: pCloud is off-site but **sync-coupled** (deletes propagate) → USB is the only immutable copy; `fisi` is the crown-jewel host (full base hardening); pCloud's 1 TB is the off-site capacity ceiling. End with a one-line pointer back to the design doc and to Plans 2–3 as the build path.
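To make Decision 9's retention flags concrete for the ADR author, here is a minimal sketch of grandfather-father-son bucketing. It is a simplified illustration, not restic's actual `forget` implementation: the function name `gfs_keep` and the bucketing rules (newest snapshot per calendar day, ISO week, month, and year) are assumptions chosen for clarity.

```python
from datetime import datetime, timedelta


def gfs_keep(snapshots, daily=7, weekly=4, monthly=6, yearly=1):
    """Return the snapshot times a simplified GFS policy would keep.

    Hypothetical helper: for each rule, walk snapshots newest-first and keep
    the newest snapshot of each distinct bucket until `limit` buckets are used.
    """
    rules = [
        (daily, lambda t: t.strftime("%Y-%m-%d")),
        (weekly, lambda t: "%d-W%02d" % tuple(t.isocalendar()[:2])),
        (monthly, lambda t: t.strftime("%Y-%m")),
        (yearly, lambda t: t.strftime("%Y")),
    ]
    keep = set()
    newest_first = sorted(snapshots, reverse=True)
    for limit, bucket_of in rules:
        seen = []
        for t in newest_first:
            bucket = bucket_of(t)
            if bucket in seen:
                continue  # an even newer snapshot already represents this bucket
            if len(seen) == limit:
                break  # this rule has used up its quota
            seen.append(bucket)
            keep.add(t)
    return keep


if __name__ == "__main__":
    # Sixty nightly snapshots at 02:00, starting 2026-01-01.
    nightly = [datetime(2026, 1, 1, 2, 0) + timedelta(days=i) for i in range(60)]
    for kept in sorted(gfs_keep(nightly)):
        print(kept.date())
```

The union across rules is what matters: a snapshot survives if any one rule claims it, which is why the retained set stays small even though four rules apply.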
|
||||
|
||||
- [ ] **Step 2: Add the Further-reading row in `CLAUDE.md`**
|
||||
|
||||
In the Further-reading table, immediately after the `Operational access … 021-operational-access.md` row, add:
|
||||
|
||||
```
|
||||
| Backup & disaster recovery | `docs/decisions/022-backup.md` |
|
||||
```
|
||||
|
||||
- [ ] **Step 3: Add the BACKUP.md role-convention in `CLAUDE.md`**
|
||||
|
||||
In the "Role conventions" list, immediately after the `ACCESS.md (ADR-021)` bullet, add:
|
||||
|
||||
```
|
||||
- Every **service** role that holds state must have a populated `BACKUP.md` (ADR-022) —
|
||||
copy `docs/backup/service-backup-template.md`; rendered from the role's `backup__*`
|
||||
data. A stateless service records `backup__state: false` with a reason.
|
||||
```
|
||||
|
||||
- [ ] **Step 4: Add STATUS.md rows**
|
||||
|
||||
In the "Designed but not built" table in `STATUS.md`, add two rows:
|
||||
|
||||
```
|
||||
| Backup `backup` role + `backup_hosts` group | ADR-022 | Does not exist. Pull node (`fisi`), restic repo, rclone→pCloud, USB air-gap — Plan 2. |
|
||||
| Per-service `backup__*` contract + `BACKUP.md` | ADR-022 | Convention defined; inert until service roles exist to declare against. |
|
||||
```
|
||||
|
||||
- [ ] **Step 5: Update TODO item 3.8**
|
||||
|
||||
In `docs/TODO.md`, change the item-3.8 line:
|
||||
|
||||
From:
|
||||
```
|
||||
8. Ensure the right things are backed up (incl. database dumps if we land on PBS).
|
||||
```
|
||||
To:
|
||||
```
|
||||
8. ~~Ensure the right things are backed up (incl. database dumps if we land on PBS).~~
|
||||
DECIDED (ADR-022): data-only restic (Model A, no PBS) pulled by an off-cluster
|
||||
node (`fisi`); per-service `backup__*` + `BACKUP.md`; logical DB dumps; 3-2-1 via
|
||||
pCloud + rotated USB air-gap. Build: Plans 2–3.
|
||||
```
|
||||
|
||||
- [ ] **Step 6: Verify**
|
||||
|
||||
Run: `make lint`
|
||||
Expected: PASS (yamllint, ansible-lint, `check-tags: OK …`). No new YAML/tags introduced, so this confirms nothing regressed.
|
||||
|
||||
Run: `grep -n "022-backup" CLAUDE.md && grep -rn "ADR-022" docs/decisions/022-backup.md STATUS.md docs/TODO.md`
|
||||
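Expected: at least one match per listed file (cross-references resolve). `grep` exits 0 as soon as any file matches, so confirm each file name appears in the output rather than trusting the exit code.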
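The per-file nature of this check can be sketched in a few lines of Python. This is an illustrative aid only, not a deliverable of this plan (which deliberately adds no check script); the helper name `files_missing_reference` is hypothetical.

```python
from pathlib import Path


def files_missing_reference(paths, needle):
    """Return the paths whose text lacks `needle`; absent files count as missing."""
    missing = []
    for path in map(Path, paths):
        if not path.is_file() or needle not in path.read_text(encoding="utf-8"):
            missing.append(str(path))
    return missing


if __name__ == "__main__":
    targets = ["docs/decisions/022-backup.md", "STATUS.md", "docs/TODO.md"]
    # An empty list means every file carries the cross-reference.
    print(files_missing_reference(targets, "ADR-022"))
```

Unlike a single multi-file `grep`, the result names exactly which files still lack the reference.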
|
||||
|
||||
- [ ] **Step 7: Commit**
|
||||
|
||||
```bash
|
||||
git add docs/decisions/022-backup.md CLAUDE.md STATUS.md docs/TODO.md
|
||||
git commit -m "docs(backup): record ADR-022; wire into CLAUDE.md, STATUS, TODO"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Task 2: Create the `BACKUP.md` template and define the `backup__*` contract
|
||||
|
||||
**Files:**
|
||||
- Create: `docs/backup/service-backup-template.md`
|
||||
|
||||
- [ ] **Step 1: Create the template**
|
||||
|
||||
Mirror `docs/access/service-access-template.md` (preamble that says copy-to-role-and-delete; structured tables rendered from data; a hand-written prose tail). Write exactly:
|
||||
|
||||
````markdown
|
||||
# Per-service backup record — template
|
||||
|
||||
Copy this file to `roles/<service>/BACKUP.md` when building a **stateful** service
|
||||
role (ADR-022). It is the per-service **backup record**: what state the service holds,
|
||||
how it is captured consistently, and how it is restored. The structured parts are
|
||||
**rendered from the role's `backup__*` data** (the single source of truth that also
|
||||
drives `/check-backup`) — keep the data authoritative and regenerate this file rather
|
||||
than hand-editing the tables. The prose "Restore notes" tail is hand-written.
|
||||
|
||||
A **stateless** service (holds no persistent data) does not get a `BACKUP.md`; it sets
|
||||
`backup__state: false` with a reason in its role defaults instead.
|
||||
|
||||
Delete this preamble in the copy and start from the heading below.
|
||||
|
||||
---
|
||||
|
||||
# Backup — <service>
|
||||
|
||||
## State captured
|
||||
|
||||
Rendered from `backup__*`:
|
||||
|
||||
| What | Source | How captured |
|
||||
|---|---|---|
|
||||
| data dir(s) | `<backup__paths[*]>` | file-level, pulled read-only |
|
||||
| database | `<backup__dumps[*].cmd>` → `<backup__dumps[*].dest>` | logical dump (default; ADR-022 Decision 7) |
|
||||
|
||||
- **Quiesce:** `<backup__quiesce>` — `true` means the service is stopped → backed up →
|
||||
restarted (escape hatch for data that cannot be dumped live; ADR-022 Decision 7 B).
|
||||
- **RPO:** ~24 h (nightly; ADR-022 Decision 2).
|
||||
|
||||
## Restore procedure
|
||||
|
||||
1. Re-provision the host (Terraform) and redeploy this role (Ansible) — Model A.
|
||||
2. `restic restore` the latest snapshot for `<backup__service>` into `<backup__paths>`.
|
||||
3. Replay each `<backup__dumps[*].dest>` into its database.
|
||||
4. Confirm with this role's `VERIFY.md` checks (ADR-008/017).
|
||||
|
||||
## Restore notes
|
||||
|
||||
Prose the data can't capture — ordering gotchas, "restore the DB before the data dir",
|
||||
known-tricky migrations.
|
||||
|
||||
- <none yet>
|
||||
````
|
||||
|
||||
The `backup__*` contract this template renders from (document it here and in the ADR; the role in Plan 2 consumes it):
|
||||
|
||||
```yaml
|
||||
backup__service: <name> # identifier; matches the role / compose project
|
||||
backup__state: true # false = stateless → no BACKUP.md (pair with a reason)
|
||||
backup__paths: # bind-mount dirs/files holding state ([] = none)
|
||||
- /srv/<service>/data
|
||||
backup__dumps: # logical app-consistent dumps (Decision 7 default; [] = none)
|
||||
- cmd: "docker compose -p <service> exec -T db pg_dump -U {{ vault.<service>.db_user }} <db>"
|
||||
dest: <service>-db.sql
|
||||
backup__quiesce: false # true = stop→back up→restart escape hatch (Decision 7 B)
|
||||
```
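As a reading aid for the contract shape above, the following sketch checks a role's `backup__*` data once it has been loaded into a plain dict. It is not part of this plan's deliverables (the plan adds no automated check), and several rules are assumptions rather than anything the spec states: that an omitted `backup__state` means stateful, that paths must be absolute, and that a stateful service must declare at least one path or dump.

```python
def validate_backup_contract(data):
    """Return a list of problems in a role's backup__* data (empty list = OK)."""
    problems = []
    service = data.get("backup__service")
    if not isinstance(service, str) or not service:
        problems.append("backup__service must be a non-empty string")

    state = data.get("backup__state", True)  # assumption: omitted means stateful
    if not isinstance(state, bool):
        problems.append("backup__state must be true or false")
        return problems
    if not state:
        return problems  # stateless: no paths or dumps expected

    paths = data.get("backup__paths", [])
    dumps = data.get("backup__dumps", [])
    if not isinstance(paths, list) or not all(
        isinstance(p, str) and p.startswith("/") for p in paths
    ):
        problems.append("backup__paths must be a list of absolute paths")
    if not isinstance(dumps, list):
        problems.append("backup__dumps must be a list")
    else:
        for i, dump in enumerate(dumps):
            if not isinstance(dump, dict) or not dump.get("cmd") or not dump.get("dest"):
                problems.append(f"backup__dumps[{i}] needs both cmd and dest")
    if paths == [] and dumps == []:
        problems.append("stateful service declares neither paths nor dumps")
    if not isinstance(data.get("backup__quiesce", False), bool):
        problems.append("backup__quiesce must be true or false")
    return problems


if __name__ == "__main__":
    example = {
        "backup__service": "nextcloud",
        "backup__state": True,
        "backup__paths": ["/srv/nextcloud/data"],
        "backup__dumps": [{"cmd": "pg_dump nextcloud", "dest": "nextcloud-db.sql"}],
        "backup__quiesce": False,
    }
    print(validate_backup_contract(example))
```

Returning a list of messages rather than raising on the first error lets a reviewer see every gap in one pass.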
|
||||
|
||||
- [ ] **Step 2: Verify**
|
||||
|
||||
Run: `test -f docs/backup/service-backup-template.md && echo PRESENT`
|
||||
Expected: `PRESENT`
|
||||
|
||||
Run: `make lint`
|
||||
Expected: PASS (markdown only; confirms no regression).
|
||||
|
||||
- [ ] **Step 3: Commit**
|
||||
|
||||
```bash
|
||||
git add docs/backup/service-backup-template.md
|
||||
git commit -m "docs(backup): add BACKUP.md template + backup__* contract (ADR-022)"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Task 3: Strengthen the per-service checklist gate
|
||||
|
||||
**Files:**
|
||||
- Modify: `docs/security/service-checklist.md` (Operability section)
|
||||
|
||||
- [ ] **Step 1: Replace the weak backup line with the ADR-022 gate**
|
||||
|
||||
In the "Operability (security-adjacent)" section, replace this line:
|
||||
|
||||
```
|
||||
- [ ] Backup/restore is covered if the service holds state
|
||||
```
|
||||
|
||||
with (mirroring the existing ADR-021 access line directly below it):
|
||||
|
||||
```
|
||||
- [ ] Backup/restore recorded and verifiable (ADR-022): a stateful service carries
|
||||
`backup__*` data, `roles/<service>/BACKUP.md` is rendered, and `/check-backup`
|
||||
reports the declared paths/dumps captured in the latest snapshot — or the service
|
||||
sets `backup__state: false` with a reason. Deviations → `docs/security/accepted-risks.md`.
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Verify**
|
||||
|
||||
Run: `grep -n "ADR-022" docs/security/service-checklist.md`
|
||||
Expected: one match (the new gate line).
|
||||
|
||||
Run: `grep -c "Backup/restore is covered if the service holds state" docs/security/service-checklist.md`
|
||||
Expected: `0` (old weak line gone).
|
||||
|
||||
- [ ] **Step 3: Commit**
|
||||
|
||||
```bash
|
||||
git add docs/security/service-checklist.md
|
||||
git commit -m "docs(backup): gate BACKUP.md in service checklist (ADR-022)"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Task 4: Add the BACKUP.md step to the new-role runbook
|
||||
|
||||
**Files:**
|
||||
- Modify: `docs/runbooks/new-role.md` (insert a new step after the §11 ACCESS step; renumber the commit step)
|
||||
|
||||
- [ ] **Step 1: Insert the new step**
|
||||
|
||||
Immediately after the §11 "Write the per-service operational-access record" block and before "### 12. Commit", insert:
|
||||
|
||||
```markdown
|
||||
### 12. Write the per-service backup record (stateful services)
|
||||
|
||||
For a **stateful** service role, copy `docs/backup/service-backup-template.md` to
|
||||
`roles/<rolename>/BACKUP.md` and populate the role's `backup__*` data (`backup__service`,
|
||||
`backup__paths`, `backup__dumps` — `cmd` + `dest` per logical dump — and `backup__quiesce`;
|
||||
ADR-022). Prefer logical dumps (`pg_dump`/`mysqldump`) over file-level DB copies. `BACKUP.md`
|
||||
is rendered from that data. A **stateless** service sets `backup__state: false` with a
|
||||
reason and gets no `BACKUP.md`. Once the backup node exists, `/check-backup <rolename>`
|
||||
proves the declared state is captured — part of the service-clearance gate
|
||||
(`docs/security/service-checklist.md`).
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Renumber the commit step**
|
||||
|
||||
Change the heading `### 12. Commit` (now the following heading) to `### 13. Commit`.
|
||||
|
||||
- [ ] **Step 3: Verify**
|
||||
|
||||
Run: `grep -nE "^### (11|12|13)\." docs/runbooks/new-role.md`
|
||||
Expected: §11 access, §12 backup, §13 commit — in that order, no duplicate numbers.
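The "in order, no duplicate numbers" expectation can also be expressed as a small check. This is a hedged sketch for illustration, not something the plan ships; the function name `numbered_heading_problems` is hypothetical and it assumes the runbook's steps use `### N.` headings, as the grep above does.

```python
import re


def numbered_heading_problems(markdown):
    """Report any `### N.` heading that does not follow its predecessor by one."""
    numbers = [
        int(match.group(1))
        for match in re.finditer(r"^### (\d+)\.", markdown, flags=re.MULTILINE)
    ]
    problems = []
    for previous, current in zip(numbers, numbers[1:]):
        if current != previous + 1:
            problems.append(f"heading {current} follows {previous}")
    return problems


if __name__ == "__main__":
    forgot_renumber = "### 11. Access\n\n### 12. Backup\n\n### 12. Commit\n"
    print(numbered_heading_problems(forgot_renumber))
```

A forgotten renumber of the commit step shows up as a duplicate `12`, which is the failure this verify step guards against.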
|
||||
|
||||
- [ ] **Step 4: Commit**
|
||||
|
||||
```bash
|
||||
git add docs/runbooks/new-role.md
|
||||
git commit -m "docs(backup): add BACKUP.md step to new-role runbook (ADR-022)"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Task 5: Create the dormant `/check-backup` verifier command
|
||||
|
||||
**Files:**
|
||||
- Create: `.claude/commands/check-backup.md`
|
||||
|
||||
- [ ] **Step 1: Write the command**
|
||||
|
||||
Mirror the sibling `.claude/commands/check-access.md` (same frontmatter/sections, same "dormant until infra exists" framing). Write:
|
||||
|
||||
````markdown
|
||||
---
|
||||
description: Backup-coverage verification (ADR-022) — proves a service's declared backup state is actually captured.
|
||||
---
|
||||
|
||||
Verify that a service's **declared** backup data (`backup__*`) is actually captured in
|
||||
the backup repo, so the verifier and `BACKUP.md` can never disagree (the ADR-021 pattern,
|
||||
applied to backups). Argument: a service/role name (e.g. `/check-backup nextcloud`).
|
||||
|
||||
**Dormant until the backup node exists** (Plan 2/3): with no `fisi` repo to query, this
|
||||
command reports `not-yet-available` rather than failing.
|
||||
|
||||
## Preconditions
|
||||
|
||||
- `roles/<name>/` carries `backup__*` data (or `backup__state: false` with a reason).
|
||||
- The backup node (`fisi`) is reachable and its restic repo exists. If not → report
|
||||
`not-yet-available` and stop.
|
||||
|
||||
## Checks (when live)
|
||||
|
||||
Load the `backup__*` data for the resolved role, then:
|
||||
|
||||
| Check | How | Green when |
|
||||
|---|---|---|
|
||||
| snapshot freshness | `restic snapshots --tag <backup__service> --latest 1` | a snapshot ≤ ~24 h old exists |
|
||||
| paths present | the latest snapshot contains every `backup__paths` entry | all declared paths present |
|
||||
| dumps present | the snapshot contains every `backup__dumps[*].dest` | all declared dumps present |
|
||||
| integrity | `restic check --read-data-subset=<subset>` (sampled; the flag takes a value such as `5%` or `1/10`) | no errors |
|
||||
|
||||
Report per-check pass/fail; a stateless role (`backup__state: false`) reports `n/a (stateless)`.
|
||||
````
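The freshness, paths, and dumps checks in the table reduce to simple set logic once the snapshot's timestamp and file listing are in hand. The sketch below assumes those have already been obtained from the repo (how is left to Plan 2/3); the function name `coverage_report`, the 24-hour default threshold, and matching dumps by file name are all illustrative assumptions.

```python
from datetime import datetime, timedelta


def coverage_report(contract, snapshot_time, snapshot_files, now,
                    max_age=timedelta(hours=24)):
    """Compare declared backup__* data with what one snapshot actually contains."""
    if contract.get("backup__state") is False:
        return {"status": "n/a (stateless)"}

    files = set(snapshot_files)

    def present(path):
        # A declared path counts if it, or anything beneath it, is in the snapshot.
        root = path.rstrip("/")
        return any(f == root or f.startswith(root + "/") for f in files)

    # Assumption: a dump is identified by its file name anywhere in the snapshot.
    names = {f.rsplit("/", 1)[-1] for f in files}
    return {
        "fresh": now - snapshot_time <= max_age,
        "missing_paths": [p for p in contract.get("backup__paths", []) if not present(p)],
        "missing_dumps": [
            d["dest"] for d in contract.get("backup__dumps", []) if d["dest"] not in names
        ],
    }


if __name__ == "__main__":
    contract = {
        "backup__state": True,
        "backup__paths": ["/srv/app/data"],
        "backup__dumps": [{"cmd": "pg_dump app", "dest": "app-db.sql"}],
    }
    listing = ["/srv/app/data/a.txt", "/var/backup/app-db.sql"]
    print(coverage_report(contract, datetime(2026, 6, 11, 2, 0), listing,
                          now=datetime(2026, 6, 11, 3, 0)))
```

Because the report is driven by the same `backup__*` data that renders `BACKUP.md`, the verifier and the document cannot drift apart.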
|
||||
|
||||
- [ ] **Step 2: Verify**
|
||||
|
||||
Run: `test -f .claude/commands/check-backup.md && head -1 .claude/commands/check-backup.md`
|
||||
Expected: file present, first line `---` (valid frontmatter).
|
||||
|
||||
Run: `grep -n "not-yet-available" .claude/commands/check-backup.md`
|
||||
Expected: matches (dormancy explicit).
|
||||
|
||||
- [ ] **Step 3: Commit**
|
||||
|
||||
```bash
|
||||
git add .claude/commands/check-backup.md
|
||||
git commit -m "feat(backup): add dormant /check-backup verifier (ADR-022)"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Task 6: Update hardware reference and capabilities
|
||||
|
||||
**Files:**
|
||||
- Modify: `docs/hardware/reference.md` (`ubongo` spec; new `fisi` node; capacity table)
|
||||
- Modify: `docs/CAPABILITIES.md` (§9 Data & backup)
|
||||
|
||||
- [ ] **Step 1: Update the `ubongo` prose block**
|
||||
|
||||
In `docs/hardware/reference.md` §1, replace the placeholder on the `ubongo` Storage line with the real machine's spec:
|
||||
|
||||
From:
|
||||
```
|
||||
- **Storage:** _TBD (target 250 GB SSD/NVMe)_
|
||||
```
|
||||
To:
|
||||
```
|
||||
- **Storage:** 1 TB NVMe (ThinkCentre M70q Tiny; i3-10100T, 16 GB) — over-spec for Tier-1 restore-verify (ADR-022)
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Add a `fisi` prose block**
|
||||
|
||||
After the `ubongo` block in §1, add:
|
||||
|
||||
```
|
||||
### fisi (backup node — outside the cluster; provisional)
|
||||
- **Model / form factor:** HP Elite 600 G9 (tower)
|
||||
- **CPU:** i-series (12th-gen), x86-64 — featherweight for a data-only restic node
|
||||
- **RAM:** 16 GB+ (TBD exact)
|
||||
- **Storage:** OS NVMe + **2× 8 TB HDD in a mirror** (ZFS/mdraid → 8 TB usable, survives one disk failure)
|
||||
- **NICs:** wired GbE
|
||||
- **Notes:** off-cluster pull backup node (ADR-022); owns the restic repo, runs rclone→pCloud,
|
||||
docks the rotated USB air-gap drives. **Pending:** SATA power cable to the HDDs.
|
||||
Crown-jewel host → full `base` hardening. Assignment provisional (revisit when all hardware on hand).
|
||||
```
|
||||
|
||||
- [ ] **Step 3: Update the machine-readable capacity table**
|
||||
|
||||
In §4 "Node capacity", change the `ubongo` row disk from `250` to `1000` and add a `fisi` row. Keep the header and integer/decimal format intact (parsed by `capacity-scan.py`):
|
||||
|
||||
From:
|
||||
```
|
||||
| ubongo | 4 | 16 | 250 |
|
||||
```
|
||||
To:
|
||||
```
|
||||
| ubongo | 4 | 16 | 1000 |
|
||||
| fisi | 4 | 16 | 8000 |
|
||||
```
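To show why the row format matters, here is a hypothetical parser for a table of this shape. `capacity-scan.py` itself is not shown in this plan, so this is not its implementation; the column meanings (CPU cores, RAM in GB, disk in GB) and the header names used in the example are assumptions inferred from the rows above.

```python
def parse_capacity_rows(markdown):
    """Parse `| node | cpu | ram | disk |` rows into a dict keyed by node name."""
    rows = {}
    for line in markdown.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != 4:
            continue  # not a four-column table row
        name, *numbers = cells
        try:
            cpu, ram, disk = (float(n) for n in numbers)
        except ValueError:
            continue  # header or separator row, not numeric data
        rows[name] = {"cpu": cpu, "ram_gb": ram, "disk_gb": disk}
    return rows


if __name__ == "__main__":
    table = """
| Node | CPU | RAM | Disk |
|---|---|---|---|
| ubongo | 4 | 16 | 1000 |
| fisi | 4 | 16 | 8000 |
"""
    print(parse_capacity_rows(table))
```

A parser like this silently skips any row whose numeric cells stop being plain integers or decimals, which is the reason to keep units and annotations out of the table.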
|
||||
|
||||
- [ ] **Step 4: Update CAPABILITIES §9**
|
||||
|
||||
In `docs/CAPABILITIES.md` §9 table, replace the three backup rows:
|
||||
|
||||
From:
|
||||
```
|
||||
| Backup engine | Proxmox Backup Server · restic | P | planned | VM backups (PBS) + file/DB dumps (restic) | TODO 3.8 |
|
||||
| Off-site target | pCloud | S | planned | Off-site copy of backups (3-2-1) | |
|
||||
| Air-gap target | USB hard drives | S | maybe-later | Periodic cold/air-gapped copy | Manual rotation |
|
||||
```
|
||||
To:
|
||||
```
|
||||
| Backup engine | restic (data-only) | S | committed | Per-service state: file dirs + logical DB dumps, pulled by `fisi` | ADR-022 (PBS deferred) |
|
||||
| Off-site target | pCloud (via rclone) | S | committed | Encrypted off-site copy of the restic repo (3-2-1) | ADR-022; sync-coupled |
|
||||
| Air-gap target | USB hard drives | S | committed | Rotated offline cold copy — the immutable backstop | ADR-022; udev-triggered `restic copy` |
|
||||
```
|
||||
|
||||
- [ ] **Step 5: Verify**
|
||||
|
||||
Run: `make lint`
|
||||
Expected: PASS.
|
||||
|
||||
Run: `python3 scripts/capacity-scan.py >/dev/null && echo CAPACITY_OK`
|
||||
Expected: `CAPACITY_OK` (the capacity table headers are still parseable; new `fisi` row accepted).
|
||||
|
||||
Run: `grep -n "ADR-022" docs/CAPABILITIES.md`
|
||||
Expected: three matches (the updated backup rows).
|
||||
|
||||
- [ ] **Step 6: Commit**
|
||||
|
||||
```bash
|
||||
git add docs/hardware/reference.md docs/CAPABILITIES.md
|
||||
git commit -m "docs(backup): update hardware ref (ubongo M70q, add fisi) + CAPABILITIES §9 (ADR-022)"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Task 7: Final review and merge
|
||||
|
||||
- [ ] **Step 1: Full lint + capacity sanity**
|
||||
|
||||
Run: `make lint && python3 scripts/capacity-scan.py >/dev/null && echo ALL_GREEN`
|
||||
Expected: `ALL_GREEN`.
|
||||
|
||||
- [ ] **Step 2: Cross-reference audit**
|
||||
|
||||
Run: `grep -rln "ADR-022\|022-backup" CLAUDE.md STATUS.md docs/ .claude/`
|
||||
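Expected: the ADR file, CLAUDE.md, STATUS.md, TODO.md, service-checklist.md, new-role.md, CAPABILITIES.md, check-backup.md, service-backup-template.md, and hardware/reference.md are all listed, with no dangling reference and no file missed. Because the search covers all of `docs/`, files under `docs/superpowers/` (this plan, and the design spec if it names the ADR) may also appear; those are expected.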
|
||||
|
||||
- [ ] **Step 3: Merge to main and delete the branch**
|
||||
|
||||
```bash
|
||||
git checkout main
|
||||
git merge --no-ff feat/backup-foundation -m "feat(backup): backup strategy foundation layer (ADR-022)"
|
||||
git branch -d feat/backup-foundation
|
||||
git push origin main
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Self-review (completed by plan author)
|
||||
|
||||
- **Spec coverage:** All 13 decisions are recorded in ADR-022 (Task 1, Step 1). The *foundation* obligations of Decisions 6 (contract + BACKUP.md), 7 (dumps-first wording in template/runbook), and the doc/inventory facts (Decisions 4/8 hardware) are implemented as concrete files in Tasks 2–6. Decisions whose *implementation* is live infra — 1/3/9/11/12/13 (engine, retention, air-gap mechanism, alerting, schedule) and 8's restore-testing — are explicitly deferred to Plans 2–3 (see *Decomposition & roadmap*), not silently dropped.
|
||||
- **Placeholder scan:** No "TBD/implement later" steps; every edit shows exact from→to text or full file content. (`<service>`/`<name>` inside template/contract bodies are intentional doc placeholders for the eventual role author, not plan gaps.)
|
||||
- **Consistency:** `backup__*` field names (`backup__service`, `backup__state`, `backup__paths`, `backup__dumps[].cmd/.dest`, `backup__quiesce`) are identical across the ADR (Task 1), template + contract (Task 2), checklist (Task 3), runbook (Task 4), and `/check-backup` (Task 5). The four-part governance set matches ADR-021's (template / checklist line / runbook step / dormant verifier), and the "no lint script" choice is stated in both the plan header and the ADR.
|
||||