diff --git a/clients/cascades-tucson/reports/2026-06-16-2.4ghz-remediation-runbook.md b/clients/cascades-tucson/reports/2026-06-16-2.4ghz-remediation-runbook.md
new file mode 100644
index 0000000..a614535
--- /dev/null
+++ b/clients/cascades-tucson/reports/2026-06-16-2.4ghz-remediation-runbook.md
@@ -0,0 +1,124 @@
+# Cascades of Tucson — 2.4 GHz Remediation Runbook (execute 2026-06-16 night)
+
+**Goal:** cut 2.4 GHz airtime/co-channel contention (the "bad for some users" cause) by (1) powering down
+over-strong 2.4 cells and (2) disabling redundant 2.4 radios where a nearby **active-2.4** neighbor still
+covers the area — **without** breaking mesh backhaul, orphaning SSIDs, or opening coverage holes.
+
+**Reference data:** SNR matrix `.claude/tmp/cascades-nbr.json` (fresh tonight), audit
+`reports/2026-06-16-unifi-full-audit.md`. Regenerate any list with the commands in each phase.
+
+---
+
+## Safety facts established before touching anything
+- **Mesh topology (live):** `2nd Floor Atrium` is the wireless-mesh **parent** for **CC Bridge** + **salon**
+  (backhaul ch36 / 5 GHz); `206 U7 Pro` carries `108`. **MESH-PROTECTED — do NOT disable:** `2nd Floor
+  Atrium, CC Bridge, salon, 206 U7 Pro, 108`. (coverage-thin now auto-excludes these.)
+- **SSID pinning:** none — CSC ENT / CSCNet / Guest broadcast on **all** APs (`broadcasting_aps` off). No
+  client is locked to one AP; disabling a radio just makes clients roam to a neighbor. **No orphaning risk.**
+- **Reversibility:** every change is `apply-radio` with a rollback JSON saved to `.claude/tmp/`. Re-enable =
+  `apply-radio ng enable --ap "" --apply`; restore power = `... ng power auto --ap ""`.
+- **AP-hang recovery:** if an AP goes/stays offline after a change, do NOT force-provision (that took 445
+  offline on 06-16). Recover with `device-control.sh cascades poe-cycle "" --apply`.
+
+---
+
+## Phase 0 — Pre-flight (do first, ~5 min)
+1. Site VPN up + stable; controller + APs reachable:
+   - `bash .claude/skills/unifi-wifi/scripts/live-stats.sh cascades | head -3` (expect 77 APs reporting)
+2. **Baseline snapshot** (to compare after):
+   - `bash .claude/skills/unifi-wifi/scripts/live-stats.sh cascades > .claude/tmp/baseline-pre.txt`
+   - `bash .claude/skills/unifi-wifi/scripts/radio-usage.sh cascades ng 77 > .claude/tmp/usage-pre.txt`
+3. Pick a **watch AP** per floor (a KEEP neighbor) to eyeball during changes, e.g.
+   `bash .claude/skills/unifi-wifi/scripts/watch-ap.sh ` in a second terminal.
+
+---
+
+## Phase 1 — Power-down 2.4 to Low (biggest safe win, fully reversible)
+Smaller 2.4 cells = less mutual interference, **no coverage loss, no radios turned off.** Do per floor;
+DRY-RUN first (omit `--apply`), eyeball, then `--apply`.
+
+```
+# clean floors (no mesh parent) — safe to do by zone:
+bash .claude/skills/unifi-wifi/scripts/apply-radio.sh cascades ng power low --zone "Floor 1" --apply
+bash .claude/skills/unifi-wifi/scripts/apply-radio.sh cascades ng power low --zone "Floor 3" --apply
+bash .claude/skills/unifi-wifi/scripts/apply-radio.sh cascades ng power low --zone "Floor 4" --apply
+bash .claude/skills/unifi-wifi/scripts/apply-radio.sh cascades ng power low --zone "Floor 5" --apply
+bash .claude/skills/unifi-wifi/scripts/apply-radio.sh cascades ng power low --zone "Floor 6" --apply
+# Floor 1 caveat: this re-enables the deliberately-disabled 128 (sets it Low). Re-disable it after:
+bash .claude/skills/unifi-wifi/scripts/apply-radio.sh cascades ng disable --ap "128" --apply
+```
+For **Floor 2** and **misc** (contain mesh APs), power-down **per-AP** to skip the mesh parents — do the
+non-mesh APs only, and WATCH CC Bridge/salon stay online:
+```
+# Floor 2 (skip 2nd Floor Atrium, 206 U7 Pro):
+for ap in 203 204 209 210 217 221 222 229 237 240 241 248 236; do \
+  bash .claude/skills/unifi-wifi/scripts/apply-radio.sh cascades ng power low --ap "$ap" --apply; done
+# misc (skip CC Bridge, salon): Kitchen, Memcare*, memcare piano, Dining Room
+for ap in "Kitchen" "Dining Room" "Memcare Nurse Station" "memcare piano" "Memcare TV room"; do \
+  bash .claude/skills/unifi-wifi/scripts/apply-radio.sh cascades ng power low --ap "$ap" --apply; done
+```
+**GATE:** after each floor, confirm client counts hold and no dead spot (`live-stats`, watch-ap). If CC
+Bridge/salon blip during Floor 2/misc, STOP and reassess.
+
+---
+
+## Phase 2 — Settle + re-measure (~15 min)
+Let it converge, then:
+```
+bash .claude/skills/unifi-wifi/scripts/live-stats.sh cascades > .claude/tmp/baseline-postA.txt
+```
+Compare retry% / cu_total vs `baseline-pre.txt`. Expect 2.4 retry + cu_interf to drop. **If Phase 1 alone
+looks good, it is a legitimate stopping point for tonight** — disables can be a separate night.
+
+---
+
+## Phase 3 — Disable redundant 2.4 radios (per-AP, floor by floor, with gates)
+Each AP below is low-client (<2 avg), non-mesh, non-memcare, and stays covered by **≥2 active-2.4**
+neighbors. **Do ONE FLOOR, validate 15 min, then the next.** Recommend Floor 1 or Floor 4 as the pilot.
+
+**Floor 1 (7):** `139 115 102 145 127 116 109`
+**Floor 2 (8):** `240 221 209 229 241 248 203 236`
+**Floor 3 (5):** `342 317 304 336 347`
+**Floor 4 (5):** `445 441 403 407 450`
+
+```
+# one floor at a time, e.g. Floor 4 pilot:
+for ap in 445 441 403 407 450; do \
+  bash .claude/skills/unifi-wifi/scripts/apply-radio.sh cascades ng disable --ap "$ap" --apply; \
+  sleep 5; done
+# then VALIDATE before the next floor:
+bash .claude/skills/unifi-wifi/scripts/live-stats.sh cascades --clients | head -40
+bash .claude/skills/unifi-wifi/scripts/coverage-thin.sh cascades 14 # re-check remaining coverage (NEIGHBOR_JSON=...)
+```
+**GATE (per floor):** clients on disabled APs moved to neighbors; no AP shows a sudden client-drop to ~0
+in a covered area; no user complaints. If a dead spot appears → re-enable that AP:
+`apply-radio.sh cascades ng enable --ap "" --apply`.
+
+---
+
+## NOT doing tonight (explicit — so nothing is ambiguous)
+- **Power-DOWN instead of disable** (redundant but high-client, keep serving): `3rd Floor Atrium, 318, 303,
+  4th Floor Atrium, 406-408, 505, Dining Room` — already set Low in Phase 1; leave ON.
+- **HOLD (leave fully on):** `Memcare TV room` (memory-care area); all **MESH-PROTECTED** APs.
+- **Companion 2.4 levers deferred to a later night** (isolate cause/effect): min-data-rate raise
+  (1→12 Mbps), band-steering (`apply-wlan bandsteer`), 2.4 min-RSSI on the 6 OFF APs
+  (615,608,505,517,622,salon). These are the next big airtime levers once disables are validated.
+- **5 GHz / DFS / channel-plan:** separate effort (see full audit).
+
+---
+
+## Phase 4 — Wrap
+```
+bash .claude/skills/unifi-wifi/scripts/live-stats.sh cascades > .claude/tmp/baseline-post.txt
+# re-snapshot 2.4 usage + audit; then /save the session log.
+```
+**Success = 2.4 retry% and cu_interf down on kept radios, client counts steady, zero coverage complaints.**
+**Abort/rollback at any point:** re-enable/raise-power the affected APs (rollback JSONs in `.claude/tmp/`),
+or `device-control poe-cycle` a hung AP. Power-down is always safe to leave; only disables can hole coverage.
+
+---
+### Tunable to regenerate the disable plan live
+```
+NBR=.claude/tmp/cascades-nbr.json; NBR_JSON=$NBR bash .claude/skills/unifi-wifi/scripts/neighbor-collect.sh cascades
+MINCOV=2 CLIENT_CAP=8 NEIGHBOR_JSON=$NBR bash .claude/skills/unifi-wifi/scripts/coverage-thin.sh cascades 14
+```
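
The runbook notes that `coverage-thin` auto-excludes the MESH-PROTECTED APs, but the Phase 3 loops take hand-typed AP lists. As an extra guard, a candidate list could be filtered before it reaches `apply-radio.sh`. What follows is a minimal sketch under stated assumptions, not part of the unifi-wifi scripts: `is_protected` and `filter_protected` are hypothetical helper names, and the protected names are copied from the "Safety facts" section. The example at the end only prints dry-run commands (no `--apply`).

```shell
# Hypothetical guard, NOT part of the unifi-wifi scripts.
# Protected AP names copied from the runbook's MESH-PROTECTED list, one per line.
PROTECTED='2nd Floor Atrium
CC Bridge
salon
206 U7 Pro
108'

# Succeeds (exit 0) only if $1 exactly matches one protected AP name.
is_protected() {
  printf '%s\n' "$PROTECTED" | grep -Fxq -- "$1"
}

# Prints the non-protected AP names from the arguments, one per line;
# reports each skipped AP on stderr.
filter_protected() {
  for ap in "$@"; do
    if is_protected "$ap"; then
      echo "SKIP (mesh-protected): $ap" >&2
    else
      printf '%s\n' "$ap"
    fi
  done
}

# Example: print (do not run) the dry-run disable command for each surviving AP.
filter_protected 445 441 "206 U7 Pro" 403 | while IFS= read -r ap; do
  echo "bash .claude/skills/unifi-wifi/scripts/apply-radio.sh cascades ng disable --ap \"$ap\""
done
```

Keeping the filter separate from the apply step preserves the dry-run-first habit from Phase 1: the printed commands can be reviewed, and `--apply` added only once the list looks right.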