diff --git a/clients/cascades-tucson/session-logs/2026-06/2026-06-15-howard-cascades-wifi-rf-audit.md b/clients/cascades-tucson/session-logs/2026-06/2026-06-15-howard-cascades-wifi-rf-audit.md index 1728ac8..5be0d20 100644 --- a/clients/cascades-tucson/session-logs/2026-06/2026-06-15-howard-cascades-wifi-rf-audit.md +++ b/clients/cascades-tucson/session-logs/2026-06/2026-06-15-howard-cascades-wifi-rf-audit.md @@ -721,3 +721,52 @@ Commits: 58ecc5a (pfSense-ssh + health + off-hold), e42ad8f (memory). ### Still open (the actual WiFi win) Tonight's 2.4 remediation per the runbook (reports/2026-06-16-2.4ghz-remediation-runbook.md): scope (power-down-all + pilot one floor vs all 25) + whether Claude runs validation gates live during execution. + +## Update: 23:53 PT (2026-06-16) — 2.4 remediation EXECUTED (power-down + partial disables); PAUSED before Floor 2 + +Howard rebooted the Cascades pfSense, then gave the go to push the 2.4 changes. (Changes go through the +UOS controller 172.16.3.29, NOT the pfSense/VPN — reboot didn't block them. Verified fleet healthy +post-reboot: 77 adopted / 2 disconnected, unchanged.) + +### Phase 1 — 2.4 power-down to Low: COMPLETE (65 radios) +Applied auto->low across Floors 1,2,3 + misc (Floor 4 was already done). Floor 3 by-zone (17); Floors 1/2 ++ misc PER-AP to skip mesh + offline + 128. Final ng state: low=65, auto=11 (correctly left: mesh parents +2nd Floor Atrium/206 U7 Pro, mesh children CC Bridge/salon, offline 108/108U7 Pro, EXCLUDED Floors 5/6 = +505/517/608/615/622), disabled=1 (128). 0 failures, no AP knocked offline. +Initially missed 247 (Floor 2) and 4 Floor-1 stragglers (121/122/109/102) — see Problems; all fixed. + +### Phase 2 — re-measure (15-min settle): the cumulative WIN +Site-wide 2.4 averages (live-stats, ~74 radios): + PRE power-down: cu_total 77% cu_interf 64% retry 12.2% clients 134 + POST settle: cu_total 68% cu_interf 53% retry 13.8% clients 111 +=> cu_interf -11 pts, cu_total -9 pts site-wide (vs only ~4 pts from the Floor-4-only test earlier -> +confirms the benefit is cumulative/site-wide). Retry +1.6 is midnight client-mix noise (fewer clients). + +### Phase 3 — disables: 15 of 23 done, PAUSED before Floor 2 +Refreshed coverage-thin (MINCOV=2, mesh-aware) -> conservative set = 23 (low-client <2 avg, >=2 active +coverers, non-mesh, non-memcare, Floors 1-4). Disabled (apply-radio ng disable --ap, spaced): + Floor 4: 445 442 407 450 Floor 3: 342 317 304 336 Floor 1: 139 115 102 145 127 122 116 +REMAINING (Howard said "lets pause"): **Floor 2: 240 229 241 209 203 248 204 236** (8 APs). +Fleet after disables: 77 adopted / 2 disconnected — disabling 2.4 just drops that radio; AP stays up on +5/6 GHz, every disabled radio's area keeps >=2 active 2.4 coverers. The force-provision hazard does NOT +apply to `disable` (clean radio-off, unlike the 445 force-provision incident). + +### Other this turn +- apply-wlan.sh BUG FIXED: wlan_bands token was "6e" but THIS controller stores "6g" (verified live on + Guest SSID) -> 6 GHz enablement would have failed. Fixed (5g6g/6g/all). Commit 5ad25d1. +- Runbook: folded in Phase 5 (5 GHz: 80->40 width on 76 radios + channel plan with DFS decision flagged -- + DFS empirically CLEAN here so including clean-DFS gives ~20 ch vs ~5 non-DFS-only) and Phase 6 (6 GHz: + ROOT CAUSE 6 GHz empty = production SSID CSCNet not on 6 GHz [bands 2g,5g]; add 6g + bss-transition). +- Floors 5 & 6 EXCLUDED from the whole plan (Howard). + +### RESUME / next +1. Finish Phase 3: disable Floor 2 (240 229 241 209 203 248 204 236), spaced, validate. +2. Proper site-wide re-measure after settle (compare cu_interf again). +3. Phase 5 (5 GHz width+channel) + Phase 6 (6 GHz CSCNet) when ready. +ROLLBACK: re-enable a radio = apply-radio cascades ng enable --ap "" --apply; restore power = +apply-radio cascades ng power auto --zone "Floor X" --apply. (apply-radio rollback JSON is per-call in +.claude/tmp, overwritten each call -> for power-down revert just set power auto.) + +### Friction logged +apply-radio per-AP --apply loop: each call re-logs-in to the controller; rapid succession throttles and +the write silently skips (no [ok]). Fix: space calls (sleep ~4) or add multi-AP/one-login support.