Compare commits
10 commits
77a20b8d40
...
180af46879
| Author | SHA1 | Date | |
|---|---|---|---|
| | 180af46879 | | |
| | 8d8c86fa39 | | |
| | 468f8c3a92 | | |
| | 26bb7e442d | | |
| | 6ac5afaf67 | | |
| | b3e14decb4 | | |
| | b10a33f439 | | |
| | 66a9a0af08 | | |
| | e14e347047 | | |
| | 24a1d909c9 | | |
13 changed files with 850 additions and 10 deletions
@@ -146,6 +146,57 @@ harness on ubongo and shaking it down against real KVM (spec/plan in docs/superp
  the holistic cross-file review. → for infra this novel, budget for BOTH an adversarial
  cross-file review AND a real-hardware run; neither alone would have shipped it working.

<!-- From the 2026-06-19 mesh-hardening-2/3 design (ubongo INPUT-only default-deny). -->

- `[friction]` **Raw DHCP leases pinned in ubongo's host firewall (admin-addr SSH allows)**
  (2026-06-19): mesh-hardening 2/3 lets the operator workstations reach ubongo's LAN SSH by
  *raw lease* — `base__firewall_admin_addrs: ["10.20.10.50" (mamba), "10.20.10.17"]` — because
  there is no DHCP reservation yet (OPNsense isn't managed as code). A lease reassignment
  silently moves the allow to whatever host next holds the IP (still SSH-key-gated) and drops
  the workstation's *LAN* path (mesh still works, so never a full lockout). → when
  OPNsense-as-code lands (ADR-020 perimeter / TODO 3.5), replace both with **MAC-pinned DHCP
  reservations** (`10.20.10.17` = MAC `bc:0f:f3:c8:4a:8a`; mamba's MAC TBD) and allow the
  reserved IPs. Spec: `docs/superpowers/specs/2026-06-19-mesh-hardening-ubongo-default-deny-design.md`.

- `[gotcha]` **`make test-integration` on ubongo fails (`qemu-img` "Permission denied") when
  the agent session predates the `libvirt` group grant** (2026-06-19): the `integration_test`
  role adds `claude` to `libvirt`+`kvm` and makes the cache dir `/var/lib/boma-integration`
  `root:libvirt 2775` — correct — but a `claude` session whose shell started *before* that
  grant carries a stale process group set (`id` → `claude,docker` only, no `libvirt`), so
  `qemu-img create` of the VM overlay into the group-owned dir is denied. `virsh`/`virt-install`
  still work (they reach system libvirtd via polkit/socket, and the real KVM runs server-side
  as `libvirt-qemu`), so ONLY claude's own file-writes break. Unblock without restarting the
  session: **`sg libvirt -c 'make test-integration HOST=<name>'`** (claude needs only `libvirt`
  for the dir; `kvm` is server-side; note `sg` adds one group, not the full set). → self-heal
  in `scripts/integration-vm.py`: if the `libvirt` gid is absent from `os.getgroups()`, re-exec
  under `sg libvirt` (or have the Makefile target do it), so a stale-session agent never hits
  this opaque symptom. New agent sessions pick the groups up on login, so it's a stale-session
  transient — but high-confusion, worth self-healing.

- `[friction]` **No standard for when the agent may run local-VM integration tests on ubongo
  without asking** (2026-06-19): `make test-integration HOST=<name>` spins an ISOLATED throwaway
  KVM VM (its own libvirt NAT; never touches the real host's firewall/network; guards:
  one-VM-at-a-time + a 4 GiB free-RAM floor + auto-destroy on success), so it is safe and
  self-contained — yet the agent paused for a go-ahead before running it (mesh-hardening 2/3,
  Task 4). The operator wants a STANDARD that pre-authorises VM-testing on ubongo so the agent
  just runs it. → decide + record the rule: e.g. a `.claude/settings.json` permission allow for
  `make test-integration*` / `scripts/integration-vm.py` (and the `sg libvirt -c '…'` form per
  the gotcha above), plus a CLAUDE.md line distinguishing the pre-authorised isolated VM tests
  from the genuinely-gated live steps (`make deploy` to real hosts, host reboots, cutovers —
  still need a go-ahead). Ties to the `test-risky-infra-before-live-deploy` +
  `dont-reask-settled-defaults` memories + ADR-025.

- `[gotcha]` **Molecule covers only the `input_only`-OFF (forward drop) branch of the base
  firewall** (2026-06-19): mesh-hardening 2/3 added `base__firewall_input_only` (forward policy
  drop↔accept). The `default` Molecule scenario renders ONE fixture, set to the secure default
  (drop) — so the fast `make test ROLE=base` gate locks the drop default (security-critical for
  service hosts) but does NOT exercise the `=true` → forward-`accept` rendering; only `make
  test-integration HOST=ubongo` does (passed GREEN). An in-converge re-render can't cheaply
  cover it (role defaults aren't in scope outside the role run). → decide in kaizen: a second
  Molecule scenario (`molecule/input-only/`) asserting forward `policy accept`, vs accepting the
  integration-only coverage. Final-review finding; not a cutover blocker (the accept branch is a
  literal, and a var-name break would fail the drop branch too → caught).

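A minimal sketch of the self-heal proposed in the stale-`libvirt`-group gotcha above, assuming it would sit near the top of `scripts/integration-vm.py`. The helper names (`missing_group`, `sg_argv`, `reexec_under_group_if_stale`) are hypothetical and not taken from the script. The effective gid is checked alongside `os.getgroups()` as a guard against a re-exec loop, in case the process started by `sg` reports the group only as its current gid.

```python
import grp
import os
import shlex
import sys


def missing_group(required_gid: int, current_gids: list[int]) -> bool:
    """True when the process's group set lacks the required gid (stale session)."""
    return required_gid not in current_gids


def sg_argv(group: str, argv: list[str]) -> list[str]:
    """Build the `sg <group> -c '<command>'` argv that re-runs argv under the group."""
    return ["sg", group, "-c", shlex.join(argv)]


def reexec_under_group_if_stale(group: str = "libvirt") -> None:
    """Re-exec the current script under `sg <group>` if the gid is absent."""
    try:
        gid = grp.getgrnam(group).gr_gid
    except KeyError:
        return  # the group does not exist on this host; nothing to heal
    # Include the effective gid so a process already started by `sg` is not re-exec'd again.
    if missing_group(gid, os.getgroups() + [os.getegid()]):
        cmd = sg_argv(group, [sys.executable, *sys.argv])
        os.execvp(cmd[0], cmd)  # replaces this process; does not return on success
```

Calling `reexec_under_group_if_stale()` before the first `qemu-img` invocation would make the stale-session case transparent; if the account is not a member of the group at all, `sg` itself fails and the original symptom is replaced by a clearer one.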
---

## Kaizen reviews — decisions ledger

@@ -0,0 +1,470 @@
# Mesh-hardening 2/3 — ubongo INPUT-only default-deny — Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Apply base's nftables firewall to the control node (ubongo) as an INPUT-only default-deny — hardening its inbound surface — while leaving the forward chain permissive so Docker egress and the libvirt-NAT integration harness keep working, and without any sshd `ListenAddress` change.

**Architecture:** Two new `base` knobs make the existing firewall concern fit a control node: `base__firewall_input_only` flips the forward chain to `policy accept` (host-local input filtering only), and `base__firewall_admin_addrs` adds operator-workstation LAN sources to the SSH allow-list (alongside `wt0` and `ssh-from-control`). sshd is untouched (nftables does the scoping → no `ip_nonlocal_bind` boot-race). The change is validated on a throwaway VM via the ADR-025 integration harness (a new "be ubongo" profile) before an operator-supervised live cutover whose safety net is the firewall auto-rollback timer plus the permanent on-prem physical console.

**Tech Stack:** Ansible (role `base`, FQCN), nftables, Jinja2, Molecule on Debian 13, pytest (none new), the ADR-025 integration harness (`scripts/integration-vm.py`, JSON profiles, `-e @` overlays).

**Spec:** `docs/superpowers/specs/2026-06-19-mesh-hardening-ubongo-default-deny-design.md`

**Conventions:** `make lint` and `make test ROLE=base` before each commit; `make check` before `make deploy`; never hand-edit the generated `offsite.yml`; `rbw unlocked` for any commit touching Ansible content and for the integration/live applies (the production `group_vars/all/vault.yml` is in inventory scope and gets decrypted at playbook load). Tasks 1–3 are code (subagent-driven, each lint/Molecule-verified). Task 4 is a real-VM validation gate on ubongo. Task 5 is the live, operator-supervised cutover.

---

## File Structure

| File | Create/Modify | Responsibility |
|---|---|---|
| `roles/base/defaults/main.yml` | Modify | Declare `base__firewall_input_only` + `base__firewall_admin_addrs` (defaults: off / empty). |
| `roles/base/templates/nftables.conf.j2` | Modify | Conditional forward policy; render an SSH-allow rule per admin address. |
| `roles/base/molecule/default/converge.yml` | Modify | Fixture: an admin-addr source (input-only stays at its default → forward drop). |
| `roles/base/molecule/default/verify.yml` | Modify | Assert forward-drop default + the admin-addr rule render. |
| `inventories/production/group_vars/control/vars.yml` | Modify | Turn the knobs on for ubongo (input-only; mamba's LAN IP). |
| `tests/integration/overrides/ubongo.yml` | Create | The "be ubongo" overlay (input-only firewall; harness SSH lifeline). |
| `tests/integration/profiles/ubongo.json` | Create | The "be ubongo" VM profile (group `control`, applies `site.yml:base`). |
| `tests/integration/overrides/askari.yml` | Modify | Add the `integration_profile` marker (verify is now profile-aware). |
| `tests/integration/verify.yml` | Modify | Gate the askari (Docker/DNAT) block; add the ubongo (input-only) block + a guard. |
| `STATUS.md`, `docs/ROADMAP.md` | Modify (Task 5) | Record mesh-hardening 2/3 done. |

---

### Task 1: base role — `base__firewall_input_only` (forward policy) + `base__firewall_admin_addrs` (LAN SSH allow)

**Files:**
- Modify: `roles/base/defaults/main.yml`
- Modify: `roles/base/templates/nftables.conf.j2`
- Modify: `roles/base/molecule/default/converge.yml`
- Modify: `roles/base/molecule/default/verify.yml`

> **Test strategy (note):** Molecule renders one fixture, so it locks the *secure default* —
> `input_only` **off** → forward `policy drop` — plus the new admin-addr rule (red→green). The
> `input_only` **on** → forward `policy accept` path is exercised on a real VM by the
> integration "be ubongo" profile (Tasks 3–4), whose verify fails red until this template
> conditional exists. Both branches are covered, across the two test layers.

- [ ] **Step 1: Write the failing test (extend Molecule verify)**

In `roles/base/molecule/default/verify.yml`, after the `Assert the docker_host extension hook is present` block, add:

```yaml
    - name: Assert the forward chain defaults to policy drop (input_only off)
      ansible.builtin.assert:
        that:
          - "'hook forward priority 0; policy drop;' in nft"
        fail_msg: >-
          forward chain must default to policy drop when base__firewall_input_only is
          false (container isolation stays the norm on real service hosts)

    - name: Assert the admin-addr SSH allow rule (operator workstation on the LAN)
      ansible.builtin.assert:
        that:
          - "'ip saddr 10.30.0.77 tcp dport 22 accept' in nft"
        fail_msg: "missing admin-addr SSH allow rule from base__firewall_admin_addrs"
```

- [ ] **Step 2: Add the fixture that drives it (Molecule converge)**

In `roles/base/molecule/default/converge.yml`, add to the `vars:` block (after the `base__firewall_control_addr` line):

```yaml
    base__firewall_admin_addrs:
      - "10.30.0.77"   # fixture: an operator-workstation LAN source (admin-addr SSH allow)
```

- [ ] **Step 3: Run the test to verify it fails**

Run: `make test ROLE=base`
Expected: FAIL on `Assert the admin-addr SSH allow rule` (the template does not consume `base__firewall_admin_addrs` yet, so the `ip saddr 10.30.0.77 …` rule is absent). The forward-drop assertion passes already (the template currently hardcodes `policy drop`).

- [ ] **Step 4: Add the defaults**

In `roles/base/defaults/main.yml`, after the `base__firewall_apply: true` line (end of the firewall behaviour block, currently line 13), add:

```yaml
base__firewall_input_only: false   # true → the forward chain is `policy accept` (host-local
                                   # INPUT filtering only). For hosts that forward/route
                                   # container or NAT traffic (the control node's Docker +
                                   # libvirt-NAT) where a forward default-deny would break
                                   # them. Real service hosts keep this false (forward drop).
base__firewall_admin_addrs: []     # extra LAN source IPs allowed to SSH, besides wt0 +
                                   # ssh-from-control. For an operator workstation reaching
                                   # the host over the LAN (no mesh). Key-gated. (ADR-021)
```

- [ ] **Step 5: Make the forward policy conditional + render the admin-addr rules**

In `roles/base/templates/nftables.conf.j2`:

(a) Replace the forward-chain line (currently line 21):

```jinja
  chain forward { type filter hook forward priority 0; policy {{ 'accept' if base__firewall_input_only | bool else 'drop' }}; }
```

(b) After the `ssh-from-control` `{% endif %}` (currently line 14) and before the `ip protocol icmp accept` line, add the admin-addr loop:

```jinja
{% for addr in base__firewall_admin_addrs %}
    ip saddr {{ addr }} tcp dport {{ base__firewall_ssh_port }} accept
{% endfor %}
```

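As a quick way to reason about what the two template changes in Step 5 should render, the sketch below models them in plain Python. It is an illustration only: the function names are hypothetical, the real rendering is done by Jinja2 inside the `base` role, and the default SSH port of 22 is assumed from the Molecule assertions rather than read from the role defaults.

```python
def render_forward_chain(input_only: bool) -> str:
    """Mirror the conditional forward policy: accept when input-only, else drop."""
    policy = "accept" if input_only else "drop"
    return f"chain forward {{ type filter hook forward priority 0; policy {policy}; }}"


def render_admin_rules(admin_addrs: list[str], ssh_port: int = 22) -> list[str]:
    """Mirror the admin-addr loop: one SSH allow rule per source address."""
    return [f"ip saddr {addr} tcp dport {ssh_port} accept" for addr in admin_addrs]


if __name__ == "__main__":
    # Secure default (Molecule fixture): forward drop plus one admin address.
    print(render_forward_chain(False))
    for rule in render_admin_rules(["10.30.0.77"]):
        print(rule)
```

An empty `admin_addrs` list yields no rules, which matches the role default of `[]` leaving the rendered ruleset unchanged for hosts that do not set the variable.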
- [ ] **Step 6: Run the test to verify it passes**

Run: `make test ROLE=base`
Expected: PASS — converge renders the ruleset; verify confirms the forward chain is `policy drop` (input_only defaults false) and the `ip saddr 10.30.0.77 tcp dport 22 accept` rule is present; all pre-existing assertions stay green.

- [ ] **Step 7: Lint**

Run: `make lint`
Expected: `Passed: 0 failure(s)` and `check-tags: OK`.

- [ ] **Step 8: Commit**

```bash
git add roles/base/defaults/main.yml roles/base/templates/nftables.conf.j2 \
  roles/base/molecule/default/converge.yml roles/base/molecule/default/verify.yml
git commit -m "feat(base): input-only forward policy + admin-addr SSH allow

base__firewall_input_only renders the forward chain policy accept (host-local
INPUT filtering only) for hosts that forward container/NAT traffic; defaults
false so real service hosts keep the forward default-deny. base__firewall_admin_addrs
adds operator-workstation LAN sources to the SSH allow-list alongside wt0 +
ssh-from-control. Molecule locks the secure default + the admin rule.
Mesh-hardening 2/3 (ADR-020/021).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>"
```

---

### Task 2: inventory — enable input-only default-deny + mamba on ubongo (control group)

**Files:**
- Modify: `inventories/production/group_vars/control/vars.yml`

- [ ] **Step 1: Turn the knobs on for the control group**

Append to `inventories/production/group_vars/control/vars.yml`:

```yaml

# Mesh-hardening 2/3 (2026-06-19, ADR-020/021): apply base's host firewall to ubongo as
# INPUT-only default-deny — harden the inbound surface, leave the forward chain permissive so
# Docker egress + the libvirt-NAT integration harness keep working. sshd is unchanged
# (nftables scopes inbound), so there is no boot-race. Reach ubongo over wt0 (mesh), the
# ssh-from-control self-path (base__firewall_control_addr, group_vars/all = 10.20.10.151), or
# mamba on the LAN. Break-glass: the physical console. (base__firewall_apply defaults true.)
base__firewall_input_only: true
base__firewall_admin_addrs:
  - "10.20.10.50"   # mamba over the LAN (NetBird off). Raw DHCP lease — revisit with an
                    # OPNsense reservation when OPNsense-as-code lands; backstopped by wt0.
  - "10.20.10.17"   # 2nd operator workstation (MAC bc:0f:f3:c8:4a:8a). Raw lease — ditto.
```

- [ ] **Step 2: Verify the vars resolve for ubongo**

Run: `.venv/bin/ansible-inventory -i inventories/production/ --host ubongo 2>/dev/null | grep -E 'firewall_input_only|firewall_admin_addrs|10.20.10.(50|17)'`
Expected: shows `"base__firewall_input_only": true` and `"base__firewall_admin_addrs": ["10.20.10.50", "10.20.10.17"]`.

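The `grep` in Step 2 is an eyeball check. A stricter variant, sketched below under the assumption that the `ansible-inventory --host` JSON output is piped or pasted in as a string, compares the two variables exactly. The function name `check_control_vars` is hypothetical and not part of the repository.

```python
import json


def check_control_vars(host_vars_json: str, expected_addrs: list[str]) -> list[str]:
    """Return a list of problems; an empty list means both vars resolved as expected."""
    host_vars = json.loads(host_vars_json)
    problems = []
    if host_vars.get("base__firewall_input_only") is not True:
        problems.append("base__firewall_input_only is not true")
    if host_vars.get("base__firewall_admin_addrs") != expected_addrs:
        problems.append("base__firewall_admin_addrs does not match the expected list")
    return problems


if __name__ == "__main__":
    # Sample payload shaped like the Step 2 expectation (not captured from a real run).
    sample = json.dumps({
        "base__firewall_input_only": True,
        "base__firewall_admin_addrs": ["10.20.10.50", "10.20.10.17"],
    })
    print(check_control_vars(sample, ["10.20.10.50", "10.20.10.17"]) or "OK")
```

Comparing the whole list, rather than matching individual addresses, also catches an accidental extra entry or a reordered one.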
- [ ] **Step 3: Lint**

Run: `make lint`
Expected: clean pass (`check-tags: OK`).

- [ ] **Step 4: Commit**

```bash
git add inventories/production/group_vars/control/vars.yml
git commit -m "feat(inventory): ubongo gets INPUT-only host firewall + mamba LAN SSH

Enables base__firewall_input_only on the control group (forward chain stays
permissive so Docker egress + the integration-test libvirt NAT survive) and
allows the operator workstations' LAN IPs (mamba 10.20.10.50 + 10.20.10.17;
raw leases, backstopped by wt0). Mesh-hardening 2/3.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>"
```

---

### Task 3: integration harness — "be ubongo" profile (overlay + profile + profile-aware verify)

**Files:**
- Create: `tests/integration/overrides/ubongo.yml`
- Create: `tests/integration/profiles/ubongo.json`
- Modify: `tests/integration/overrides/askari.yml`
- Modify: `tests/integration/verify.yml`

- [ ] **Step 1: Create the "be ubongo" overlay**

Create `tests/integration/overrides/ubongo.yml`:

```yaml
---
# Integration-test overlay for the "ubongo" profile (ADR-025). Passed via `-e @`.
# Exercises mesh-hardening 2/3: base's INPUT-only default-deny on the control node — input
# chain default-deny, forward chain left permissive (Docker/libvirt-NAT safe), no sshd
# ListenAddress change (so no boot-race).
integration_profile: ubongo
base__firewall_apply: true
base__firewall_input_only: true   # forward chain renders `policy accept`
base__firewall_admin_addrs:
  - "192.168.150.98"   # two representative LAN sources — exercises the
  - "192.168.150.99"   # admin-addr loop with a multi-entry list (like ubongo)
# Never wt0-only; never touch the real mesh from a throwaway VM.
base__ssh_listen_mesh_only: false
base__mesh_enabled: false
# Allow SSH from the libvirt-NAT gateway (where the driver/ansible connect from) so the
# default-deny apply + the reboot don't lock out the harness. By source IP (interface-
# independent). This is the harness's lifeline; the admin-addr above is only exercised.
base__firewall_control_addr: "192.168.150.1"
```

- [ ] **Step 2: Create the "be ubongo" VM profile**

Create `tests/integration/profiles/ubongo.json`:

```json
{
  "groups": ["control"],
  "applies": [
    {"playbook": "site.yml", "tags": ["base"]}
  ],
  "extra_vars_files": ["overrides/ubongo.yml"],
  "mem_mib": 2048,
  "vcpus": 2
}
```

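A shape check for the profile can catch a typo before a VM is booted. The sketch below is an assumption-laden illustration: the required keys and their types are inferred only from the `ubongo.json` shown in Step 2, not from the harness's actual schema, and `validate_profile` is a hypothetical helper rather than something `scripts/integration-vm.py` is known to provide.

```python
import json

# Keys and types inferred from the profile shown above (an assumption, not the real schema).
REQUIRED_KEYS = {
    "groups": list,
    "applies": list,
    "extra_vars_files": list,
    "mem_mib": int,
    "vcpus": int,
}


def validate_profile(text: str) -> list[str]:
    """Return human-readable errors; an empty list means the shape looks right."""
    profile = json.loads(text)
    errors = []
    for key, kind in REQUIRED_KEYS.items():
        if key not in profile:
            errors.append(f"missing key: {key}")
        elif not isinstance(profile[key], kind):
            errors.append(f"{key} should be of type {kind.__name__}")
    applies = profile.get("applies", [])
    if isinstance(applies, list):
        for index, entry in enumerate(applies):
            if not isinstance(entry, dict) or "playbook" not in entry:
                errors.append(f"applies[{index}] needs a 'playbook' entry")
    return errors
```

This goes one step beyond the `json.tool` syntax check used later in the task: a file can be valid JSON and still be missing `extra_vars_files`, in which case the overlay would silently not be applied.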
- [ ] **Step 3: Mark the askari overlay with its profile name**

In `tests/integration/overrides/askari.yml`, after the two header comment lines (before `base__firewall_apply: true`), add:

```yaml
integration_profile: askari
```

- [ ] **Step 4: Make `verify.yml` profile-aware (the test)**

Replace the entire contents of `tests/integration/verify.yml` with:

```yaml
---
# Integration verify (ADR-025). Outcome-based, profile-aware: the active profile is named by
# `integration_profile` (set in each profile's overlay). Each profile asserts its own success
# criteria; an unknown/unset profile fails loudly (never a silent pass).
- name: Verify the rebooted host
  hosts: all
  become: true
  gather_facts: false
  tasks:
    - name: A known integration_profile must be set (no silent pass)
      ansible.builtin.assert:
        that:
          - integration_profile is defined
          - integration_profile in ['askari', 'ubongo']
        fail_msg: "integration_profile must be set in the profile overlay (askari|ubongo)"

    # ── askari profile — Docker host: published-port forwarding survives the reboot ──
    # The load-bearing check probes the VM's published :80 FROM the controller (ubongo) — if
    # base's forward-drop killed DNAT, this times out (the FRICTION 2026-06-17 #1 bug).
    - name: (askari) Gather service facts
      when: integration_profile == 'askari'
      ansible.builtin.service_facts:

    - name: (askari) Docker daemon is active
      when: integration_profile == 'askari'
      ansible.builtin.assert:
        that: "ansible_facts.services['docker.service'].state == 'running'"
        fail_msg: "docker.service is not running"

    - name: (askari) Forward chain permits container traffic (drop-in loaded)
      when: integration_profile == 'askari'
      ansible.builtin.command: nft list chain inet filter forward
      register: _fwd
      changed_when: false

    - name: (askari) Assert container forwarding is allowed (not pure drop)
      when: integration_profile == 'askari'
      ansible.builtin.assert:
        that: "'accept' in _fwd.stdout"
        fail_msg: >-
          forward chain is pure drop — container forwarding will die on reboot
          (FRICTION 2026-06-17 #1). docker_host container-forward drop-in missing.

    - name: (askari) Published port answers from the controller (DNAT + forward alive)
      when: integration_profile == 'askari'
      delegate_to: localhost
      become: false
      ansible.builtin.uri:
        url: "http://{{ ansible_host }}/"
        follow_redirects: none
        status_code: [200, 301, 308, 404, 502, 503]
        timeout: 10
      register: _probe
      retries: 5
      delay: 6
      until: _probe is succeeded

    # ── ubongo profile — control node: INPUT-only default-deny survives the reboot ──
    # SSH reachability across the reboot is proven by the harness itself (it re-SSHes and
    # checks boot_id changed before this verify runs). Here we assert the ruleset shape.
    - name: (ubongo) Read the live nftables ruleset
      when: integration_profile == 'ubongo'
      ansible.builtin.command: nft list ruleset
      register: _nft
      changed_when: false

    - name: (ubongo) INPUT default-deny, forward permissive, admin-addr allow
      when: integration_profile == 'ubongo'
      ansible.builtin.assert:
        that:
          - "'hook input priority 0; policy drop;' in _nft.stdout"
          - "'hook forward priority 0; policy accept;' in _nft.stdout"
          - "'ip saddr 192.168.150.98 tcp dport 22 accept' in _nft.stdout"
          - "'ip saddr 192.168.150.99 tcp dport 22 accept' in _nft.stdout"
        fail_msg: >-
          ubongo profile: expected input policy drop, forward policy accept (input-only),
          and both admin-addr (192.168.150.98/99) SSH allows in the live ruleset.
```

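The two ideas in the profile-aware verify (an unknown profile must fail loudly, and the ubongo profile is judged by fragments of the live ruleset) can be sketched outside Ansible as follows. This is a hedged illustration only: the function names are hypothetical, and the fragment list simply copies the four strings asserted in the ubongo block above.

```python
KNOWN_PROFILES = ("askari", "ubongo")

# The four fragments the ubongo block asserts against `nft list ruleset` output.
UBONGO_FRAGMENTS = (
    "hook input priority 0; policy drop;",
    "hook forward priority 0; policy accept;",
    "ip saddr 192.168.150.98 tcp dport 22 accept",
    "ip saddr 192.168.150.99 tcp dport 22 accept",
)


def require_known_profile(profile):
    """Raise on an unset or unknown profile so a misconfigured run never passes silently."""
    if profile not in KNOWN_PROFILES:
        raise ValueError(f"integration_profile must be one of {KNOWN_PROFILES}, got {profile!r}")
    return profile


def ubongo_ruleset_gaps(ruleset: str) -> list[str]:
    """Return the expected fragments that are absent from the ruleset text."""
    return [fragment for fragment in UBONGO_FRAGMENTS if fragment not in ruleset]
```

Returning the list of missing fragments, rather than a bare boolean, makes a failure report say which of the four expectations was not met.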
- [ ] **Step 5: Validate the JSON + lint**

Run: `.venv/bin/python -m json.tool tests/integration/profiles/ubongo.json >/dev/null && echo OK` then `make lint`
Expected: `OK`, then a clean lint pass (`check-tags: OK`).

- [ ] **Step 6: Commit**

```bash
git add tests/integration/overrides/ubongo.yml tests/integration/profiles/ubongo.json \
  tests/integration/overrides/askari.yml tests/integration/verify.yml
git commit -m "test(integration): add the 'be ubongo' profile (input-only default-deny)

A control-group VM that applies base with INPUT-only default-deny (forward
policy accept; admin-addr SSH allow). verify.yml is now profile-aware via an
integration_profile marker — the askari Docker/DNAT block is gated, and a ubongo
block asserts input drop + forward accept + the admin-addr rule. Enables
\`make test-integration HOST=ubongo\`. Mesh-hardening 2/3 (ADR-025).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>"
```

---

### Task 4: Validate on the integration harness (`make test-integration HOST=ubongo`) — the GREEN gate

> Runs a throwaway UEFI VM on ubongo: boots it, applies the base role with the ubongo
> overlay (INPUT-only default-deny), **reboots it**, and asserts the ruleset + SSH-returns.
> This proves the change survives a reboot before the real control node is ever touched
> (spec §cutover step 1; FRICTION signal-6). No code change / no commit — a validation gate.

- [ ] **Step 1: Ensure the vault is unlocked**

The run loads `inventories/production/group_vars/all/vault.yml` (symlinked into the run dir), which is decrypted at playbook load.

Run: `rbw unlocked || rbw unlock`
Expected: exits 0 (unlocked). If it prompts, the operator unlocks.

- [ ] **Step 2: Run the integration cycle**

Run: `make test-integration HOST=ubongo`
Expected (the `cycle`: up → apply → reboot → assert): the VM gets a `192.168.150.x` lease; `site.yml --tags base` applies cleanly; `… rebooted (boot_id changed), SSH back at 192.168.150.x`; then `VERIFY PASSED for boma-it-ubongo-…`. The VM is destroyed on success.

- [ ] **Step 3: On failure, read the diagnostics**

If it prints `VERIFY FAILED`, diagnostics are in `~/integration-runs/boma-it-ubongo-<id>/` (`nft.txt`, `console.log`, `journal.txt`). The likely suspects: the admin-addr/forward assertion (Task 1/3 wiring) or SSH not returning post-reboot (the `base__firewall_control_addr: 192.168.150.1` lifeline in the overlay). Fix the implicated task, re-commit, and re-run Step 2. If a VM was left defined, run `make test-integration-clean` first.

- [ ] **Step 4: Record the result**

Capture the `VERIFY PASSED` line in the task notes (this is the gate Task 5 step 1 depends on). No commit.

---

### Task 5: Live staged cutover (operator-supervised — NOT a subagent task)

> Touches the **real ubongo** (the control node Ansible runs from) and reboots it — lockout-
> risky. Run it interactively with the operator, in order, verifying each step before the
> next. The firewall auto-rollback timer (`base__firewall_rollback_timeout`, 45 s) +
> `wait_for_connection` over the live path is the safety net; the **on-prem physical console**
> is the permanent break-glass. Do NOT hand this to an unattended agent.

- [ ] **Step 1: Pre-checks (gate: Task 4 GREEN)**

- `rbw unlocked || rbw unlock`.
- SSH to ubongo over `wt0` from a road-warrior succeeds.
- SSH to ubongo from mamba on the LAN (`10.20.10.50`) succeeds.
- `.venv/bin/ansible ubongo -i inventories/production/ -m ping` → `SUCCESS` (over `10.20.10.151`).
- The physical console is reachable. If any path fails, STOP.

- [ ] **Step 2: Dry-run the firewall apply**

Run: `make check PLAYBOOK=site LIMIT=ubongo TAGS=firewall`
Expected: the nftables diff shows `policy drop` on input, `iifname "wt0" … accept`, `ip saddr 10.20.10.151 … accept`, `ip saddr 10.20.10.50 … accept`, `ip saddr 10.20.10.17 … accept`, and the forward chain as `policy accept`. No errors.

- [ ] **Step 3: Apply the host firewall (auto-rollback armed)**

Run: `make deploy PLAYBOOK=site LIMIT=ubongo TAGS=firewall`
Expected: the firewall concern snapshots `/etc/nftables.rollback`, arms the 45 s `systemd-run` revert, applies the ruleset, `reset_connection` → `wait_for_connection` over `10.20.10.151` succeeds, then cancels the timer. If connectivity is lost, the timer reverts the ruleset within 45 s and the console is the fallback.

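The ordering in Step 3 (snapshot, apply, confirm reachability, then keep or revert) can be modelled in a few lines. The sketch below is a deliberately simplified in-process model with hypothetical names; the mechanism described above arms an out-of-process `systemd-run` timer precisely so that the revert still fires when the controller has lost its connection, which this model does not reproduce.

```python
from typing import Callable


def apply_with_rollback(
    snapshot: Callable[[], str],
    apply: Callable[[], None],
    still_reachable: Callable[[], bool],
    restore: Callable[[str], None],
) -> str:
    """Apply a change, then keep it only if the management path is confirmed alive."""
    saved = snapshot()        # 1. keep the known-good ruleset
    apply()                   # 2. load the new ruleset
    if still_reachable():     # 3. confirm the management path survived
        return "kept"         # 4a. confirmed: the pending revert would be cancelled
    restore(saved)            # 4b. not confirmed: fall back to the snapshot
    return "reverted"


if __name__ == "__main__":
    state = {"rules": "old"}
    outcome = apply_with_rollback(
        snapshot=lambda: state["rules"],
        apply=lambda: state.update(rules="new"),
        still_reachable=lambda: False,  # simulate a lockout
        restore=lambda saved: state.update(rules=saved),
    )
    print(outcome, state["rules"])
```

The design point the model preserves is that the snapshot is taken before the apply and the new ruleset is only considered final after a positive reachability check.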
- [ ] **Step 4: Verify every path + forwarding still works**

```bash
# from a road-warrior over wt0, and from mamba on the LAN:
ssh sjat@100.99.146.14 true && echo "wt0 OK"
ssh sjat@10.20.10.151 true && echo "mamba-LAN OK"   # run from mamba (10.20.10.50)
# Ansible self-path:
.venv/bin/ansible ubongo -i inventories/production/ -m ping
# a disallowed LAN host (any LAN IP outside base__firewall_admin_addrs; note that
# 10.20.10.17 is allowed by Task 2) must now be refused/timeout on :22
# Docker egress (forward chain still permissive):
docker run --rm busybox wget -qO- https://cloudflare.com/cdn-cgi/trace | head -1
# libvirt-NAT forwarding intact — a fresh integration VM still reaches apt:
make test-integration HOST=ubongo   # expect VERIFY PASSED (proves the NAT path survived)
```
Expected: `wt0 OK`, `mamba-LAN OK`, Ansible `SUCCESS`, the disallowed host refused, the Docker egress line returns, and the integration cycle passes.

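The "disallowed host must be refused or time out on :22" check in Step 4 has no command attached. One way to run it from a LAN host outside the allow-list is a plain TCP connect probe, sketched below with only the Python standard library. The function name and the 3-second default timeout are assumptions for illustration.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection succeeds; False on refusal, timeout, or no route."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError, timeouts, and unreachable-network errors alike.
        return False


if __name__ == "__main__":
    # Run from a host that is NOT in base__firewall_admin_addrs; expect False after cutover.
    print(tcp_port_open("10.20.10.151", 22))
```

A default-deny input chain with `policy drop` silently discards the packet, so the expected failure mode is a timeout rather than an immediate refusal; the probe treats both as "closed".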
- [ ] **Step 5: Reboot resilience — while the console is present (FRICTION signal-6)**

With the operator at the physical console, reboot ubongo (`sudo systemctl reboot`). After it returns, confirm SSH comes back on all paths **unaided**:

```bash
ssh sjat@100.99.146.14 true && echo "wt0 OK after reboot"
.venv/bin/ansible ubongo -i inventories/production/ -m ping
```
Expected: SSH returns with no manual intervention (no `ListenAddress`, so nothing to race). Only now is the cutover complete.

- [ ] **Step 6: Update STATUS + ROADMAP**

- In `STATUS.md`: in the `roles/base/` row of "Scaffolded but empty", change the firewall note — the `firewall` concern is now **applied to ubongo** as INPUT-only default-deny (it is no longer "not yet applied to any host"); note the `base__firewall_input_only` knob and that the forward default-deny still awaits the `docker_host` drop-in for real service hosts. Mark the ubongo control-node row's "Pending" item for default-deny as done.
- In `docs/ROADMAP.md`: mark **mesh-hardening sub-project 2 (ubongo default-deny) done**; the remaining follow-ons are sub-project 1 (askari SSH→`wt0` *redesign*) and sub-project 3 (NetBird ACL). Update the "Next step" section accordingly.

```bash
git add STATUS.md docs/ROADMAP.md
git commit -m "docs: ubongo INPUT-only default-deny applied (mesh-hardening 2/3 done)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>"
```

- [ ] **Step 7: Push**

Run: `git push origin main`

---

## Self-review (against the spec)
|
||||||
|
|
||||||
|
- **§ Design — INPUT-only default-deny** → Task 1 (forward-policy knob) + Task 2 (enabled on ubongo). ✓
|
||||||
|
- **§ Design — admin-addrs (operator workstations on LAN)** → Task 1 (`base__firewall_admin_addrs` + template loop) + Task 2 (`10.20.10.50` mamba, `10.20.10.17`). ✓
|
||||||
|
- **§ Design — no sshd ListenAddress change** → nothing touches `ssh.yml`/`sshd_hardening.conf.j2`; only nftables. ✓ (verified: Tasks 1–3 file lists exclude them).
|
||||||
|
- **§ allow-list** (lo, established, wt0, ssh-from-control, admin-addr, icmp; forward accept) → template already renders lo/established/wt0/control/icmp; Task 1 adds admin-addr + forward-accept. ✓
|
||||||
|
- **§ Why-safe (incident signals 1/2/3/6)** → signal 1 (forward accept, Task 1); signal 2 (no ListenAddress); signal 3 (ubongo keeps LAN + console); signal 6 (Task 4 harness reboot + Task 5 step 5 reboot-while-console). ✓
|
||||||
|
- **§ New & changed code** (defaults, template, molecule, group_vars/control, integration profile) → Tasks 1–3. ✓
|
||||||
|
- **§ admin raw-leases + revisit** → Task 2 comments record both leases + the OPNsense-reservation revisit trigger; backstop (wt0) noted; flagged in `FRICTION.md`. ✓
|
||||||
|
- **§ Testing** (Molecule render asserts; `make test-integration HOST=ubongo`; live checks) → Task 1 (Molecule), Task 4 (harness), Task 5 step 4 (live). ✓ Coverage split (default in Molecule, input_only on the VM) noted in Task 1.
|
||||||
|
- **§ Staged cutover (signal-6 order)** → Task 5 steps 1–7; reboot-recovery (step 5) runs while the break-glass is still in place, and no step retires it (the console is permanent). ✓
|
||||||
|
- **§ Risks/rollback** → auto-rollback (Task 5 step 3), redundant paths + physical console, raw-lease backstop. ✓
|
||||||
|
- **Type/name consistency:** `base__firewall_input_only` (bool) and `base__firewall_admin_addrs` (list) are spelled identically in defaults, template, converge, group_vars, and the overlay. `integration_profile` is spelled identically in both overlays and the three gates in `verify.yml`. ✓
|
||||||
|
- **Placeholder scan:** no TBD/TODO; every code/command step shows the actual content. ✓
|
||||||
|
|
@ -0,0 +1,203 @@
|
||||||
|
# Spec — Mesh-hardening (2 of 3): ubongo INPUT-only default-deny + `ssh-from-control`
|
||||||
|
|
||||||
|
Status: Accepted (2026-06-19)
|
||||||
|
|
||||||
|
## Context & scope
|
||||||
|
|
||||||
|
The **mesh-hardening follow-on** (deferred from M5, ROADMAP) was decomposed into three
|
||||||
|
independent sub-projects, each its own spec → plan → implementation cycle:
|
||||||
|
|
||||||
|
1. askari SSH → `wt0` — spec/plan written 2026-06-17, **attempted and backed out the same day**
|
||||||
|
(the incident; six lessons in `FRICTION.md`). Needs a redesign — **not** this spec.
|
||||||
|
2. **ubongo nftables default-deny + `ssh-from-control`** ← *this spec*
|
||||||
|
3. NetBird ACL off Allow-All → scoped policies (its own later spec; open mechanism question —
|
||||||
|
no headless API path).
|
||||||
|
|
||||||
|
ROADMAP (re-ordered after the 2026-06-17 incident) puts **ubongo first**: it is the clean,
|
||||||
|
low-risk case — a physical box with a permanent console break-glass, and *not* the coordinator
|
||||||
|
host that the incident proved you must not corner.
|
||||||
|
|
||||||
|
This spec hardens **ubongo's inbound surface only**. It does **not** change sshd's
|
||||||
|
`ListenAddress` (so no boot-race), does **not** apply a forward-chain default-deny (so Docker +
|
||||||
|
the libvirt NAT keep working), and does **not** touch askari or the NetBird ACL.
|
||||||
|
|
||||||
|
Current state (verified on ubongo, 2026-06-19): **no host firewall** — sshd listens on
|
||||||
|
`0.0.0.0:22`, reachable from LAN, mesh, and anything routable; only Docker's + libvirt's own
|
||||||
|
`iptables-nft` tables exist. Interfaces: `eno1` `10.20.10.151` (LAN, = `ansible_host`), `wt0`
|
||||||
|
`100.99.146.14` (mesh), `docker0` (one container, no published ports), `virbr-boma`
|
||||||
|
`192.168.150.1/24` (the libvirt NAT that `make test-integration` uses), `ip_forward=1`.
|
||||||
|
|
||||||
|
## Goal / success criteria
|
||||||
|
|
||||||
|
- SSH to ubongo succeeds over **`wt0`** (road-warriors, askari), from the **operator workstations on the LAN**
(mamba `10.20.10.50` and `10.20.10.17`), and via the **`ssh-from-control` self-path** (Ansible; source `10.20.10.151`).
|
||||||
|
- SSH from any **other** LAN source is **dropped** (default-deny on `input`).
|
||||||
|
- **Docker container egress and `make test-integration` (libvirt NAT) keep working** — the
|
||||||
|
forward chain is untouched.
|
||||||
|
- A **reboot** does not lock SSH out (no `ListenAddress`, so no bind race).
|
||||||
|
- Break-glass is the **on-prem physical console** (permanent, non-mesh). The live apply is
|
||||||
|
additionally gated by the firewall **auto-rollback** timer.
|
||||||
|
|
||||||
|
## Design
|
||||||
|
|
||||||
|
Apply base's nftables `firewall` concern to ubongo, with two adjustments and one deliberate
|
||||||
|
non-change:
|
||||||
|
|
||||||
|
1. **INPUT-only default-deny.** The `input` chain keeps `policy drop` with the guaranteed
|
||||||
|
management plane: `lo`, `established,related`, ICMP, SSH on `wt0`, and SSH from
|
||||||
|
`ssh-from-control` (`10.20.10.151`). We add **two operator-workstation sources** (mamba,
`10.20.10.50`, and a second workstation, `10.20.10.17`) via a new `base__firewall_admin_addrs` list. Everything else on `eno1` drops.
|
||||||
|
2. **Forward chain left permissive.** base hardcodes `chain forward { … policy drop; }` for
|
||||||
|
inter-container isolation. On ubongo that would break Docker egress **and** the libvirt NAT
|
||||||
|
the integration harness depends on — the same class of failure that sank askari (FRICTION
|
||||||
|
2026-06-17, signal 1). A new `base__firewall_input_only` knob renders the forward chain
|
||||||
|
`policy accept` instead. Docker's and libvirt's own `iptables-nft` forward rules continue to
|
||||||
|
apply (separate tables); base simply does not add a default-deny on top.
|
||||||
|
3. **No sshd `ListenAddress` change.** sshd keeps listening on `0.0.0.0:22`; nftables does all
|
||||||
|
inbound scoping. This deliberately avoids the `ip_nonlocal_bind` boot-race that broke askari
|
||||||
|
(FRICTION signal 2) — there is nothing to bind before `wt0` exists.
|
||||||
|
|
||||||
|
Resulting `input` allow-list:
|
||||||
|
|
||||||
|
```
|
||||||
|
iif "lo" accept
|
||||||
|
ct state established,related accept
|
||||||
|
ct state invalid drop
|
||||||
|
iifname "wt0" tcp dport 22 accept # mesh (road-warriors, askari)
|
||||||
|
ip saddr 10.20.10.151 tcp dport 22 accept # ssh-from-control (Ansible self) — group_vars/all
|
||||||
|
ip saddr 10.20.10.50 tcp dport 22 accept # mamba on the LAN — base__firewall_admin_addrs
|
||||||
|
ip saddr 10.20.10.17 tcp dport 22 accept # 2nd operator wkstn — base__firewall_admin_addrs
|
||||||
|
ip protocol icmp accept ; ip6 nexthdr ipv6-icmp accept
|
||||||
|
# (no catalog services on ubongo) → default drop
|
||||||
|
chain forward: policy accept # Docker + libvirt-NAT forwarding preserved
|
||||||
|
```
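
As a rough illustration of how the two new variables shape the rendered ruleset, the allow-list above can be reproduced in a few lines of Python. This is a sketch, not the role's template: `render_input_rules` is a hypothetical helper, the IPv4 validation is an added assumption, and the real rendering is done by `nftables.conf.j2`.

```python
import ipaddress


def render_input_rules(control_addr, admin_addrs, ssh_port=22, input_only=False):
    """Return (input_rules, forward_policy) in the order of the allow-list above.

    Hypothetical helper for illustration; the role renders these rules from
    nftables.conf.j2, not from Python.
    """
    rules = [
        'iif "lo" accept',
        "ct state established,related accept",
        "ct state invalid drop",
        f'iifname "wt0" tcp dport {ssh_port} accept',
    ]
    if control_addr:
        rules.append(f"ip saddr {control_addr} tcp dport {ssh_port} accept")
    for addr in admin_addrs:
        # Assumption: reject anything that is not a plain IPv4 address, since
        # the rule uses `ip saddr`. The template itself does no validation.
        ipaddress.IPv4Address(addr)
        rules.append(f"ip saddr {addr} tcp dport {ssh_port} accept")
    rules += ["ip protocol icmp accept", "ip6 nexthdr ipv6-icmp accept"]
    # base__firewall_input_only flips only the forward policy.
    forward_policy = "accept" if input_only else "drop"
    return rules, forward_policy


if __name__ == "__main__":
    rules, forward = render_input_rules(
        "10.20.10.151", ["10.20.10.50", "10.20.10.17"], input_only=True
    )
    print("\n".join(rules))
    print(f"chain forward: policy {forward}")
```

With the defaults (`admin_addrs=[]`, `input_only=False`) the sketch yields no admin-addr rules and a `drop` forward policy, which is the behaviour real service hosts are meant to keep.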
|
||||||
|
|
||||||
|
## Why ubongo is the safe case (maps to the 2026-06-17 incident)
|
||||||
|
|
||||||
|
- **Signal 1** (forward-drop breaks Docker hosts): sidestepped — INPUT-only leaves forwarding alone.
|
||||||
|
- **Signal 2** (`ip_nonlocal_bind` boot-race): sidestepped — no `ListenAddress`; sshd binds nothing new.
|
||||||
|
- **Signal 3** (a host's only mgmt path must not depend on a service it hosts): satisfied —
|
||||||
|
ubongo is not the coordinator and keeps three independent paths (mesh, LAN, physical console).
|
||||||
|
- **Signal 6** (recovery tested after the break-glass was removed): the physical console is
|
||||||
|
permanent (nothing to retire), and reboot-recovery is proven on a throwaway VM first.
|
||||||
|
|
||||||
|
## New & changed code
|
||||||
|
|
||||||
|
**Role `base`:**
|
||||||
|
|
||||||
|
- `roles/base/defaults/main.yml` — add:
|
||||||
|
- `base__firewall_input_only: false` — when true, the forward chain is `policy accept`
|
||||||
|
(host-local input filtering only), for hosts that route/forward container or NAT traffic
|
||||||
|
(e.g. the control node's Docker + libvirt-NAT) where a forward default-deny would break them.
|
||||||
|
- `base__firewall_admin_addrs: []` — extra LAN source IPs allowed to SSH (besides `wt0` +
|
||||||
|
`ssh-from-control`); for an operator workstation reaching the host over the LAN. Key-gated.
|
||||||
|
- `roles/base/templates/nftables.conf.j2`:
|
||||||
|
- the forward line (currently line 21) →
|
||||||
|
`chain forward { type filter hook forward priority 0; policy {{ "accept" if base__firewall_input_only | bool else "drop" }}; }`
|
||||||
|
- after the `ssh-from-control` block (currently lines 12-14), add a loop:
|
||||||
|
`{% for addr in base__firewall_admin_addrs %}` →
|
||||||
|
`ip saddr {{ addr }} tcp dport {{ base__firewall_ssh_port }} accept`
|
||||||
|
- `roles/base/molecule/default/{converge,verify}.yml` — fixture sets `input_only: true` + an
|
||||||
|
`admin_addrs` entry; assert (a) `forward` renders `policy accept`, (b) the admin-addr accept
|
||||||
|
rule renders, (c) existing input default-deny + `wt0` + control-addr assertions stay green.
|
||||||
|
|
||||||
|
**Inventory** (`inventories/production/group_vars/control/vars.yml`, append):
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
# Mesh-hardening 2/3 (2026-06-19, ADR-020/021): apply base's host firewall to ubongo as
|
||||||
|
# INPUT-only default-deny — harden the inbound surface, leave the forward chain permissive so
|
||||||
|
# Docker egress + the libvirt-NAT integration harness keep working. sshd is unchanged
|
||||||
|
# (nftables scopes inbound), so there is no boot-race. Reach ubongo over wt0, the
|
||||||
|
# ssh-from-control self-path (base__firewall_control_addr in group_vars/all), or mamba on the
|
||||||
|
# LAN. Break-glass: the physical console.
|
||||||
|
base__firewall_input_only: true
|
||||||
|
base__firewall_admin_addrs:
|
||||||
|
- "10.20.10.50" # mamba over the LAN (NetBird off). Raw DHCP lease — see note below.
|
||||||
|
- "10.20.10.17" # a 2nd operator workstation (MAC bc:0f:f3:c8:4a:8a). Raw lease — ditto.
|
||||||
|
# base__firewall_apply defaults true; base__firewall_control_addr (= ubongo's own 10.20.10.151)
|
||||||
|
# is set in group_vars/all and covers Ansible's self-connection.
|
||||||
|
```
|
||||||
|
|
||||||
|
**Integration harness** (ADR-025) — a "be ubongo" profile, mirroring "be askari":
|
||||||
|
|
||||||
|
- `tests/integration/overrides/ubongo.yml` — `firewall_apply: true`, `input_only: true`,
|
||||||
|
`admin_addrs: ["192.168.150.99"]` (a representative LAN addr to exercise the rule),
|
||||||
|
`firewall_control_addr: "192.168.150.1"` (the libvirt-NAT gateway = the harness's own SSH
|
||||||
|
path, so the apply + reboot don't lock it out), `ssh_listen_mesh_only: false`,
|
||||||
|
`mesh_enabled: false`.
|
||||||
|
- `tests/integration/profiles/ubongo.json` — mirror `profiles/askari.json` (VM resources/image).
|
||||||
|
- `tests/integration/verify.yml` — make the assertions **profile-aware** (gated on the active
|
||||||
|
profile, since `verify.yml` is shared): for ubongo assert `input` policy drop, `forward`
|
||||||
|
policy **accept**, and the admin-addr rule present. Reachability across the reboot is the
|
||||||
|
harness's existing cycle. The askari assertions (Docker/forward-DNAT) must **not** run for the
|
||||||
|
ubongo profile, nor vice-versa.
|
||||||
|
|
||||||
|
Enables `make test-integration HOST=ubongo`.
|
||||||
|
|
||||||
|
## The admin-addrs — deliberately interim values
|
||||||
|
|
||||||
|
`base__firewall_admin_addrs: ["10.20.10.50", "10.20.10.17"]` are the operator workstations'
|
||||||
|
**current raw DHCP leases** (mamba + a second box), not reservations (operator decision,
|
||||||
|
2026-06-19). Both share the operator's `sjat` SSH key. Caveats, accepted for now:
|
||||||
|
|
||||||
|
- **Lease drift:** if DHCP reassigns either IP, the rule allows whatever host then holds it
|
||||||
|
(still SSH-key-gated, so low risk) and that workstation loses its *LAN* path. **Backstop:**
|
||||||
|
the workstations also reach ubongo over `wt0` (mesh), so they are never cut off — only the
|
||||||
|
off-mesh LAN convenience lapses until the IP is corrected.
|
||||||
|
- **Revisit trigger (flagged for follow-up):** when OPNsense-as-code lands (ADR-020 perimeter /
|
||||||
|
TODO 3.5), replace both raw leases with **MAC-pinned DHCP reservations** (`10.20.10.17` =
|
||||||
|
MAC `bc:0f:f3:c8:4a:8a`) and allow the reserved addresses. Recorded as a `FRICTION.md` open
|
||||||
|
signal so the next `/kaizen` surfaces it.
|
||||||
|
|
||||||
|
## Testing
|
||||||
|
|
||||||
|
- **Molecule** (base `default`, render-only, `firewall_apply: false`): the new forward-accept +
|
||||||
|
admin-addr assertions above, with existing assertions green.
|
||||||
|
- **Integration harness** (`make test-integration HOST=ubongo`): on a throwaway UEFI VM, apply
|
||||||
|
the ubongo overlay, assert the ruleset shape, and prove **SSH survives a reboot** from an
|
||||||
|
allowed source (the existing assert/cycle). This is the gate before touching the real control
|
||||||
|
node.
|
||||||
|
- **Live** (during cutover): SSH over `wt0` ✓, from mamba LAN ✓, Ansible self-ping ✓; SSH from a
|
||||||
|
disallowed LAN host dropped ✓; `docker run … ` egress ✓; a fresh `make test-integration`
|
||||||
|
still spins a VM (libvirt NAT intact) ✓.
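
One wrinkle for the ruleset-shape assertions: the rendered `/etc/nftables.conf` spells the hook priority as `0`, while a live `nft list ruleset` prints the symbolic name `filter` for the same value. A small sketch of a check that accepts either form follows; `chain_policy` is a hypothetical helper and is not part of the harness.

```python
import re


def chain_policy(ruleset, hook):
    """Return the policy of the base chain on `hook`, or None if not found.

    Accepts both `priority 0` (rendered file) and `priority filter` (live
    `nft list ruleset` output).
    """
    pattern = rf"hook {re.escape(hook)} priority (?:filter|0); policy (\w+);"
    match = re.search(pattern, ruleset)
    return match.group(1) if match else None


if __name__ == "__main__":
    rendered = "chain forward { type filter hook forward priority 0; policy accept; }"
    live = "type filter hook input priority filter; policy drop;"
    print(chain_policy(rendered, "forward"))
    print(chain_policy(live, "input"))
```

The same helper could back both the Molecule render check and the live check, so the two assertion styles do not drift apart.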
|
||||||
|
|
||||||
|
## Staged cutover (operator-supervised — lockout-aware, FRICTION signal-6 order)
|
||||||
|
|
||||||
|
ubongo is managed as `sjat` (password sudo), so the live apply needs the operator present
|
||||||
|
anyway. The physical console is open throughout.
|
||||||
|
|
||||||
|
1. **Harness GREEN:** `make test-integration HOST=ubongo` passes (incl. the reboot).
|
||||||
|
2. **Pre-check the real paths** *before* applying: SSH over `wt0`, SSH from mamba
|
||||||
|
(`10.20.10.50`), `ansible ubongo -m ping`. Confirm the physical console is reachable.
|
||||||
|
3. **Dry-run:** `make check PLAYBOOK=site LIMIT=ubongo TAGS=firewall` — review the nftables diff
|
||||||
|
(input default-deny + `wt0` + `10.20.10.151` + `10.20.10.50` + `10.20.10.17`; forward `policy accept`).
|
||||||
|
4. **Apply (auto-rollback armed):** `make deploy PLAYBOOK=site LIMIT=ubongo TAGS=firewall` — the
|
||||||
|
firewall concern snapshots, arms the 45 s revert, applies, `reset_connection` →
|
||||||
|
`wait_for_connection` over the live path (`10.20.10.151`), then cancels the timer. A bad
|
||||||
|
ruleset reverts itself; the console is the ultimate fallback.
|
||||||
|
5. **Verify** every path + Docker egress + a fresh integration-VM spin (above).
|
||||||
|
6. **Reboot ubongo; confirm SSH returns on all paths unaided** (console present). Only now is it
|
||||||
|
done — recovery is proven *while the break-glass is still there*.
|
||||||
|
7. **Docs:** update `STATUS.md` (ubongo row: input-only default-deny applied) and `ROADMAP.md`
|
||||||
|
(mesh-hardening 2/3 done; next is sub-project 1 askari redesign or 3 NetBird ACL).
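
The apply in step 4 follows an arm, apply, confirm, cancel pattern. The sketch below shows only that control flow, with hypothetical callables standing in for the real actions. In the role the revert timer runs on the host itself, not in a Python thread, so treat the names and the threading model here as illustrative assumptions.

```python
import threading


def apply_with_rollback(apply, confirm, revert, timeout_s=45.0):
    """Arm a revert timer, apply, and disarm it only if confirm() succeeds.

    Returns True when the change is kept, False when it was reverted.
    If apply() raises, the timer stays armed and the exception propagates.
    """
    timer = threading.Timer(timeout_s, revert)
    timer.daemon = True
    timer.start()        # armed before the risky change
    apply()
    if confirm():        # e.g. a fresh connection over the live path
        timer.cancel()   # change is good: disarm the revert
        return True
    timer.join()         # unconfirmed: let the timer fire and revert
    return False


if __name__ == "__main__":
    log = []
    kept = apply_with_rollback(
        apply=lambda: log.append("apply"),
        confirm=lambda: False,          # simulate a lockout
        revert=lambda: log.append("revert"),
        timeout_s=0.1,
    )
    print(kept, log)
```

Arming the timer before the apply is the point of the design: a ruleset that cuts off the controller cannot also cut off its own undo.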
|
||||||
|
|
||||||
|
## Risks & rollback
|
||||||
|
|
||||||
|
- **Self-referential apply** (ubongo runs Ansible against itself): mitigated by the auto-rollback
|
||||||
|
timer, the `wait_for_connection` over the real path, three redundant allowed sources, and the
|
||||||
|
permanent physical console, so any lockout stays recoverable on site.
|
||||||
|
- **Raw-lease fragility:** documented above; backstopped by the mesh path; revisit with OPNsense.
|
||||||
|
- **No new container isolation** (forward stays accept): accepted — ubongo is a single-tenant
|
||||||
|
control node, not a service host; Docker/libvirt keep their own forward rules. The forward
|
||||||
|
default-deny remains the norm for real service hosts (`base__firewall_input_only: false`).
|
||||||
|
|
||||||
|
## Out of scope / follow-ons
|
||||||
|
|
||||||
|
- askari SSH → `wt0` redesign (sub-project 1) — needs the boot-race + coordinator-bootstrap
|
||||||
|
resolved; folds in the coordinator-robustness (geo-DB FATAL-loop) + off-site backup lessons.
|
||||||
|
- NetBird ACL off Allow-All (sub-project 3) — open mechanism question (no headless API path).
|
||||||
|
- OPNsense DHCP reservations for the admin workstations (`10.20.10.50` mamba, `10.20.10.17`)
and ubongo: replace the raw leases with MAC-pinned reservations once OPNsense-as-code lands
(flagged in `FRICTION.md`).
|
||||||
|
- Forward-chain container isolation on ubongo — deliberately not done here.
|
||||||
|
- `STATUS.md` / `ROADMAP.md` edits land with the implementation, not this spec.
|
||||||
|
|
@ -19,3 +19,15 @@ base__ai_worker_user: claude
|
||||||
# Enrollment only; the host firewall default-deny stays deferred (the mesh-hardening
|
# Enrollment only; the host firewall default-deny stays deferred (the mesh-hardening
|
||||||
# follow-on), so this brings up wt0 without changing SSH exposure.
|
# follow-on), so this brings up wt0 without changing SSH exposure.
|
||||||
base__mesh_enabled: true
|
base__mesh_enabled: true
|
||||||
|
|
||||||
|
# Mesh-hardening 2/3 (2026-06-19, ADR-020/021): apply base's host firewall to ubongo as
|
||||||
|
# INPUT-only default-deny — harden the inbound surface, leave the forward chain permissive so
|
||||||
|
# Docker egress + the libvirt-NAT integration harness keep working. sshd is unchanged
|
||||||
|
# (nftables scopes inbound), so there is no boot-race. Reach ubongo over wt0 (mesh), the
|
||||||
|
# ssh-from-control self-path (base__firewall_control_addr, group_vars/all = 10.20.10.151), or
|
||||||
|
# mamba on the LAN. Break-glass: the physical console. (base__firewall_apply defaults true.)
|
||||||
|
base__firewall_input_only: true
|
||||||
|
base__firewall_admin_addrs:
|
||||||
|
- "10.20.10.50" # mamba over the LAN (NetBird off). Raw DHCP lease — revisit with an
|
||||||
|
# OPNsense reservation when OPNsense-as-code lands; backstopped by wt0.
|
||||||
|
- "10.20.10.17" # 2nd operator workstation (MAC bc:0f:f3:c8:4a:8a). Raw lease — ditto.
|
||||||
|
|
|
||||||
|
|
@ -11,6 +11,14 @@ base__firewall_rollback_timeout: 45 # seconds before the auto-revert fires on a
|
||||||
base__firewall_confirm_timeout: 20 # seconds to re-establish a fresh connection post-apply
|
base__firewall_confirm_timeout: 20 # seconds to re-establish a fresh connection post-apply
|
||||||
base__firewall_dropin_dir: /etc/nftables.d
|
base__firewall_dropin_dir: /etc/nftables.d
|
||||||
base__firewall_apply: true # set false to render+validate without applying (CI/Molecule)
|
base__firewall_apply: true # set false to render+validate without applying (CI/Molecule)
|
||||||
|
base__firewall_input_only: false # true → the forward chain is `policy accept` (host-local
|
||||||
|
# INPUT filtering only). For hosts that forward/route
|
||||||
|
# container or NAT traffic (the control node's Docker +
|
||||||
|
# libvirt-NAT) where a forward default-deny would break
|
||||||
|
# them. Real service hosts keep this false (forward drop).
|
||||||
|
base__firewall_admin_addrs: [] # extra LAN source IPs allowed to SSH, besides wt0 +
|
||||||
|
# ssh-from-control. For an operator workstation reaching
|
||||||
|
# the host over the LAN (no mesh). Key-gated. (ADR-021)
|
||||||
|
|
||||||
# SSH hardening + fail2ban (ADR-002) — `hardening` concern.
|
# SSH hardening + fail2ban (ADR-002) — `hardening` concern.
|
||||||
base__ssh_password_authentication: "no"
|
base__ssh_password_authentication: "no"
|
||||||
|
|
|
||||||
|
|
@ -6,6 +6,8 @@
|
||||||
vars:
|
vars:
|
||||||
base__firewall_apply: false
|
base__firewall_apply: false
|
||||||
base__firewall_control_addr: 10.10.0.99 # test control-node LAN address
|
base__firewall_control_addr: 10.10.0.99 # test control-node LAN address
|
||||||
|
base__firewall_admin_addrs:
|
||||||
|
- "10.30.0.77" # fixture: an operator-workstation LAN source (admin-addr SSH allow)
|
||||||
# Exercise the mesh concern's include path with the live actions gated off, so it
|
# Exercise the mesh concern's include path with the live actions gated off, so it
|
||||||
# runs hermetically (no coordinator/key needed) and must be a clean no-op.
|
# runs hermetically (no coordinator/key needed) and must be a clean no-op.
|
||||||
base__mesh_enabled: true
|
base__mesh_enabled: true
|
||||||
|
|
|
||||||
|
|
@ -51,6 +51,20 @@
|
||||||
- "'include \"/etc/nftables.d/*.nft\"' in nft"
|
- "'include \"/etc/nftables.d/*.nft\"' in nft"
|
||||||
fail_msg: "missing drop-in include hook"
|
fail_msg: "missing drop-in include hook"
|
||||||
|
|
||||||
|
- name: Assert the forward chain defaults to policy drop (input_only off)
|
||||||
|
ansible.builtin.assert:
|
||||||
|
that:
|
||||||
|
- "'hook forward priority 0; policy drop;' in nft"
|
||||||
|
fail_msg: >-
|
||||||
|
forward chain must default to policy drop when base__firewall_input_only is
|
||||||
|
false (container isolation stays the norm on real service hosts)
|
||||||
|
|
||||||
|
- name: Assert the admin-addr SSH allow rule (operator workstation on the LAN)
|
||||||
|
ansible.builtin.assert:
|
||||||
|
that:
|
||||||
|
- "'ip saddr 10.30.0.77 tcp dport 22 accept' in nft"
|
||||||
|
fail_msg: "missing admin-addr SSH allow rule from base__firewall_admin_addrs"
|
||||||
|
|
||||||
- name: Syntax-check the rendered ruleset (no apply)
|
- name: Syntax-check the rendered ruleset (no apply)
|
||||||
ansible.builtin.command: nft -c -f /etc/nftables.conf
|
ansible.builtin.command: nft -c -f /etc/nftables.conf
|
||||||
changed_when: false
|
changed_when: false
|
||||||
|
|
|
||||||
|
|
@ -12,13 +12,16 @@ table inet filter {
|
||||||
{% if base__firewall_control_addr %}
|
{% if base__firewall_control_addr %}
|
||||||
ip saddr {{ base__firewall_control_addr }} tcp dport {{ base__firewall_ssh_port }} accept
|
ip saddr {{ base__firewall_control_addr }} tcp dport {{ base__firewall_ssh_port }} accept
|
||||||
{% endif %}
|
{% endif %}
|
||||||
|
{% for addr in base__firewall_admin_addrs %}
|
||||||
|
ip saddr {{ addr }} tcp dport {{ base__firewall_ssh_port }} accept
|
||||||
|
{% endfor %}
|
||||||
ip protocol icmp accept
|
ip protocol icmp accept
|
||||||
ip6 nexthdr ipv6-icmp accept
|
ip6 nexthdr ipv6-icmp accept
|
||||||
{% for r in base__firewall_resolved %}
|
{% for r in base__firewall_resolved %}
|
||||||
ip saddr { {{ r.sources | join(', ') }} } {{ r.proto }} dport {{ r.port }} accept
|
ip saddr { {{ r.sources | join(', ') }} } {{ r.proto }} dport {{ r.port }} accept
|
||||||
{% endfor %}
|
{% endfor %}
|
||||||
}
|
}
|
||||||
chain forward { type filter hook forward priority 0; policy drop; }
|
chain forward { type filter hook forward priority 0; policy {{ 'accept' if base__firewall_input_only | bool else 'drop' }}; }
|
||||||
chain output { type filter hook output priority 0; policy accept; }
|
chain output { type filter hook output priority 0; policy accept; }
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -201,6 +201,13 @@ def up(host, name=None, mem_mib=DEFAULT_MEM_MIB, vcpus=DEFAULT_VCPUS):
|
||||||
sh(["cloud-localds", "--network-config", str(RUN_DIR / "network-config"),
|
sh(["cloud-localds", "--network-config", str(RUN_DIR / "network-config"),
|
||||||
str(seed), str(RUN_DIR / "user-data"), str(RUN_DIR / "meta-data")])
|
str(seed), str(RUN_DIR / "user-data"), str(RUN_DIR / "meta-data")])
|
||||||
console = CACHE_DIR / f"{name}-console.log"
|
console = CACHE_DIR / f"{name}-console.log"
|
||||||
|
# virt-install has a `#!/usr/bin/env python3` shebang; the Makefile prepends .venv/bin to
|
||||||
|
# PATH (so the venv's ansible tools resolve), which would hijack virt-install into the
|
||||||
|
# isolated venv — it lacks system PyGObject (`gi`) and crashes. Strip the venv from PATH
|
||||||
|
# for this system tool so its shebang finds /usr/bin/python3 (which has gi). Ansible is
|
||||||
|
# invoked via its absolute .venv path elsewhere, so it is unaffected.
|
||||||
|
sys_path = ":".join(p for p in os.environ.get("PATH", "").split(":")
|
||||||
|
if "/.venv/bin" not in p)
|
||||||
sh(["virt-install", "--name", name, "--memory", str(mem_mib), "--vcpus", str(vcpus),
|
sh(["virt-install", "--name", name, "--memory", str(mem_mib), "--vcpus", str(vcpus),
|
||||||
"--boot", "uefi", # genericcloud triple-faults on legacy BIOS handoff; UEFI boots
|
"--boot", "uefi", # genericcloud triple-faults on legacy BIOS handoff; UEFI boots
|
||||||
"--import",
|
"--import",
|
||||||
|
|
@ -210,7 +217,8 @@ def up(host, name=None, mem_mib=DEFAULT_MEM_MIB, vcpus=DEFAULT_VCPUS):
|
||||||
"--osinfo", "debian13",
|
"--osinfo", "debian13",
|
||||||
"--graphics", "none",
|
"--graphics", "none",
|
||||||
"--serial", f"file,path={console}",
|
"--serial", f"file,path={console}",
|
||||||
"--noautoconsole"])
|
"--noautoconsole"],
|
||||||
|
env=dict(os.environ, PATH=sys_path))
|
||||||
ip = wait_for_ip(name)
|
ip = wait_for_ip(name)
|
||||||
wait_for_ssh(ip, "ansible")
|
wait_for_ssh(ip, "ansible")
|
||||||
# Block until cloud-init finishes (incl. apt-get update) so apply sees a ready system.
|
# Block until cloud-init finishes (incl. apt-get update) so apply sees a ready system.
|
||||||
|
|
|
||||||
|
|
@ -1,6 +1,7 @@
|
||||||
---
|
---
|
||||||
# Integration-test overlay for the "askari" profile (ADR-025). Passed via `-e @`.
|
# Integration-test overlay for the "askari" profile (ADR-025). Passed via `-e @`.
|
||||||
# Reproduces the 2026-06-17 incident: apply base's nftables default-deny to a Docker host.
|
# Reproduces the 2026-06-17 incident: apply base's nftables default-deny to a Docker host.
|
||||||
|
integration_profile: askari
|
||||||
base__firewall_apply: true
|
base__firewall_apply: true
|
||||||
# Keep a break-glass: sshd stays on all interfaces (never wt0-only in a throwaway VM).
|
# Keep a break-glass: sshd stays on all interfaces (never wt0-only in a throwaway VM).
|
||||||
base__ssh_listen_mesh_only: false
|
base__ssh_listen_mesh_only: false
|
||||||
|
|
|
||||||
18
tests/integration/overrides/ubongo.yml
Normal file
|
|
@ -0,0 +1,18 @@
|
||||||
|
---
|
||||||
|
# Integration-test overlay for the "ubongo" profile (ADR-025). Passed via `-e @`.
|
||||||
|
# Exercises mesh-hardening 2/3: base's INPUT-only default-deny on the control node — input
|
||||||
|
# chain default-deny, forward chain left permissive (Docker/libvirt-NAT safe), no sshd
|
||||||
|
# ListenAddress change (so no boot-race).
|
||||||
|
integration_profile: ubongo
|
||||||
|
base__firewall_apply: true
|
||||||
|
base__firewall_input_only: true # forward chain renders `policy accept`
|
||||||
|
base__firewall_admin_addrs:
|
||||||
|
- "192.168.150.98" # two representative LAN sources — exercises the
|
||||||
|
- "192.168.150.99" # admin-addr loop with a multi-entry list (like ubongo)
|
||||||
|
# Never wt0-only; never touch the real mesh from a throwaway VM.
|
||||||
|
base__ssh_listen_mesh_only: false
|
||||||
|
base__mesh_enabled: false
|
||||||
|
# Allow SSH from the libvirt-NAT gateway (where the driver/ansible connect from) so the
|
||||||
|
# default-deny apply + the reboot don't lock out the harness. By source IP (interface-
|
||||||
|
# independent). This is the harness's lifeline; the admin-addr above is only exercised.
|
||||||
|
base__firewall_control_addr: "192.168.150.1"
|
||||||
9
tests/integration/profiles/ubongo.json
Normal file
|
|
@ -0,0 +1,9 @@
|
||||||
|
{
|
||||||
|
"groups": ["control"],
|
||||||
|
"applies": [
|
||||||
|
{"playbook": "site.yml", "tags": ["base"]}
|
||||||
|
],
|
||||||
|
"extra_vars_files": ["overrides/ubongo.yml"],
|
||||||
|
"mem_mib": 2048,
|
||||||
|
"vcpus": 2
|
||||||
|
}
|
||||||
|
|
@ -1,33 +1,48 @@
|
||||||
---
|
---
|
||||||
# Integration verify (ADR-025). Outcome-based: proves Docker forwarding survives the
|
# Integration verify (ADR-025). Outcome-based, profile-aware: the active profile is named by
|
||||||
# reboot. The load-bearing check probes the VM's published :80 FROM the controller
|
# `integration_profile` (set in each profile's overlay). Each profile asserts its own success
|
||||||
# (ubongo) — if base's forward-drop killed DNAT, this times out (the FRICTION #1 bug).
|
# criteria; an unknown/unset profile fails loudly (never a silent pass).
|
||||||
- name: Verify the rebooted host
|
- name: Verify the rebooted host
|
||||||
hosts: all
|
hosts: all
|
||||||
become: true
|
become: true
|
||||||
gather_facts: false
|
gather_facts: false
|
||||||
tasks:
|
tasks:
|
||||||
- name: Gather service facts
|
- name: A known integration_profile must be set (no silent pass)
|
||||||
|
ansible.builtin.assert:
|
||||||
|
that:
|
||||||
|
- integration_profile is defined
|
||||||
|
- integration_profile in ['askari', 'ubongo']
|
||||||
|
fail_msg: "integration_profile must be set in the profile overlay (askari|ubongo)"
|
||||||
|
|
||||||
|
# ── askari profile — Docker host: published-port forwarding survives the reboot ──
|
||||||
|
# The load-bearing check probes the VM's published :80 FROM the controller (ubongo) — if
|
||||||
|
# base's forward-drop killed DNAT, this times out (the FRICTION 2026-06-17 #1 bug).
|
||||||
|
- name: (askari) Gather service facts
|
||||||
|
when: integration_profile == 'askari'
|
||||||
ansible.builtin.service_facts:
|
ansible.builtin.service_facts:
|
||||||
|
|
||||||
- name: Docker daemon is active
|
- name: (askari) Docker daemon is active
|
||||||
|
when: integration_profile == 'askari'
|
||||||
ansible.builtin.assert:
|
ansible.builtin.assert:
|
||||||
that: "ansible_facts.services['docker.service'].state == 'running'"
|
that: "ansible_facts.services['docker.service'].state == 'running'"
|
||||||
fail_msg: "docker.service is not running"
|
fail_msg: "docker.service is not running"
|
||||||
|
|
||||||
- name: Forward chain permits container traffic (drop-in loaded)
|
- name: (askari) Forward chain permits container traffic (drop-in loaded)
|
||||||
|
when: integration_profile == 'askari'
|
||||||
ansible.builtin.command: nft list chain inet filter forward
|
ansible.builtin.command: nft list chain inet filter forward
|
||||||
register: _fwd
|
register: _fwd
|
||||||
changed_when: false
|
changed_when: false
|
||||||
|
|
||||||
- name: Assert container forwarding is allowed (not pure drop)
|
- name: (askari) Assert container forwarding is allowed (not pure drop)
|
||||||
|
when: integration_profile == 'askari'
|
||||||
ansible.builtin.assert:
|
ansible.builtin.assert:
|
||||||
that: "'accept' in _fwd.stdout"
|
that: "'accept' in _fwd.stdout"
|
||||||
fail_msg: >-
|
fail_msg: >-
|
||||||
forward chain is pure drop — container forwarding will die on reboot
|
forward chain is pure drop — container forwarding will die on reboot
|
||||||
(FRICTION 2026-06-17 #1). docker_host container-forward drop-in missing.
|
(FRICTION 2026-06-17 #1). docker_host container-forward drop-in missing.
|
||||||
|
|
||||||
- name: Published port answers from the controller (DNAT + forward alive)
|
- name: (askari) Published port answers from the controller (DNAT + forward alive)
|
||||||
|
when: integration_profile == 'askari'
|
||||||
delegate_to: localhost
|
delegate_to: localhost
|
||||||
become: false
|
become: false
|
||||||
ansible.builtin.uri:
|
ansible.builtin.uri:
|
||||||
|
|
@ -42,3 +57,29 @@
|
||||||
retries: 5
|
retries: 5
|
||||||
delay: 6
|
delay: 6
|
||||||
until: _probe is succeeded
|
until: _probe is succeeded
|
||||||
|
|
||||||
|
# ── ubongo profile — control node: INPUT-only default-deny survives the reboot ──
|
||||||
|
# SSH reachability across the reboot is proven by the harness itself (it re-SSHes and
|
||||||
|
# checks boot_id changed before this verify runs). Here we assert the ruleset shape.
|
||||||
|
- name: (ubongo) Read the live nftables ruleset
|
||||||
|
when: integration_profile == 'ubongo'
|
||||||
|
ansible.builtin.command: nft list ruleset
|
||||||
|
register: _nft
|
||||||
|
changed_when: false
|
||||||
|
|
||||||
|
- name: (ubongo) INPUT default-deny, forward permissive, lifeline + admin-addr allow
|
||||||
|
when: integration_profile == 'ubongo'
|
||||||
|
ansible.builtin.assert:
|
||||||
|
that:
|
||||||
|
# live `nft list ruleset` prints the SYMBOLIC priority (`filter` = 0), unlike the
|
||||||
|
# rendered /etc/nftables.conf (`priority 0`) that the Molecule scenario asserts against.
|
||||||
|
- "'hook input priority filter; policy drop;' in _nft.stdout"
|
||||||
|
- "'hook forward priority filter; policy accept;' in _nft.stdout"
|
||||||
|
# the ssh-from-control lifeline (base__firewall_control_addr) — the reconnect path
|
||||||
|
- "'ip saddr 192.168.150.1 tcp dport 22 accept' in _nft.stdout"
|
||||||
|
- "'ip saddr 192.168.150.98 tcp dport 22 accept' in _nft.stdout"
|
||||||
|
- "'ip saddr 192.168.150.99 tcp dport 22 accept' in _nft.stdout"
|
||||||
|
fail_msg: >-
|
||||||
|
ubongo profile: expected input policy drop, forward policy accept (input-only),
|
||||||
|
the ssh-from-control lifeline (192.168.150.1), and both admin-addr
|
||||||
|
(192.168.150.98/99) SSH allows in the live ruleset.
|
||||||
|
|
|
||||||