From cc66da4f63b26fb4fb6318e424855755e2630f1d Mon Sep 17 00:00:00 2001
From: Howard Enos
Date: Fri, 19 Jun 2026 03:08:31 -0700
Subject: [PATCH] cascades 5GHz fix attempted (3a/3b) then ROLLED BACK; net kept = 2b only

Correction to earlier "deferred" report: after Howard pushed (5GHz needs
fixing regardless of 6GHz), I attempted width40 + non-DFS channel plan
autonomously. It did NOT validate live: 5G retry flat (8.7->8.4), 2.4 retry
up (12->16) from voice phones scattering to 2.4.

ROOT CAUSE: the non-DFS channels here (149/157) carry the heaviest EXTERNAL
interference while DFS was cleaner -> forcing non-DFS traded clean DFS for
congested non-DFS.

Rolled 5GHz back to baseline (channel+80MHz). Kicked the 8 stuck Poly
phones -> 6 back to 5GHz (rest are coverage-gap rooms). End state recovered:
satisfaction 98.4/med99, voice 31/31.

Kept: 2b (2.4 power) + BSS-transition. 5GHz unchanged from start.
auto_upgrade left OFF.

Doing 5GHz right needs the per-channel survey (choose channels by real
cleanliness, not non-DFS policy), reconsider non-DFS-only, 6GHz unblock
(WPA3), band-steer voice.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 ...-06-19-howard-5ghz-attempt-and-rollback.md | 63 +++++++++++++++++++
 1 file changed, 63 insertions(+)
 create mode 100644 clients/cascades-tucson/session-logs/2026-06/2026-06-19-howard-5ghz-attempt-and-rollback.md

diff --git a/clients/cascades-tucson/session-logs/2026-06/2026-06-19-howard-5ghz-attempt-and-rollback.md b/clients/cascades-tucson/session-logs/2026-06/2026-06-19-howard-5ghz-attempt-and-rollback.md
new file mode 100644
index 00000000..f036daea
--- /dev/null
+++ b/clients/cascades-tucson/session-logs/2026-06/2026-06-19-howard-5ghz-attempt-and-rollback.md
@@ -0,0 +1,63 @@
+# Cascades: 5 GHz fix attempted (3a/3b) then ROLLED BACK; net kept = 2b only
+
+## User
+- **User:** Howard Enos (howard) (autonomous continuation, Howard directed "the 5ghz needed to be fixed")
+- **Machine:** Howard-Home
+- **Role:** tech
+
+## Summary / correction
+Earlier tonight I reported 3a/3b "held for a supervised window." Howard correctly pushed back: 5 GHz
+overlaps and needs fixing regardless of 6 GHz. So I attempted 3a (width 80->40) + 3b (non-DFS channel
+plan) autonomously (~02:20-02:50 MST). **It did not validate live, so I ROLLED IT BACK to baseline.**
+Net config change that STUCK tonight = Phase 2b (2.4 power) only.
+
+## What I did and what happened
+1. **Width 80->40 on 72 non-mesh na radios** (excluded mesh: 2nd Floor Atrium + children CC Bridge/salon/108).
+   Applied cleanly.
+2. **Channel plan (non-DFS) applied** to the 41 conflicted non-mesh APs (channel-plan metric: 9->0 co-channel
+   at 40MHz). Then I ALSO force-moved 22 non-conflicting DFS APs to non-DFS to honor the non-DFS policy.
+3. **That backfired:** cramming ~66 APs onto only 8 non-DFS 40MHz channels CROWDED them. More importantly,
+   live 5G retry did NOT improve (8.7 baseline -> 8.4 settled; the transient 6.2 was measured mid-disruption),
+   and **2.4 retry rose (12->16) because the channel churn scattered ~8 Poly voice phones onto 2.4 GHz**
+   (sticky: they stayed there). Satisfaction dipped to 97 then recovered.
+4. **ROOT CAUSE of the non-win (key learning):** the audit's neighbor-density map shows the **non-DFS channels
+   here carry the HEAVIEST external interference** (ch149 ~9,200 neighbor BSSIDs, ch157 ~7,300), while the DFS
+   channels (52-144) were CLEANER (consumer gear avoids DFS). So forcing everyone non-DFS traded clean DFS for
+   congested non-DFS -> no throughput win. **The non-DFS-only decision needs reconsideration given Cascades'
+   external RF profile**, and choosing channels right REQUIRES the per-channel survey (busy%/noise), which
+   stalled earlier (68/74). Without it I was channel-planning blind to external congestion.
+5. **Rolled back** all 72 non-mesh na radios to the start-of-run baseline (original channel + 80MHz width)
+   from the dev2.json snapshot. Verified restored.
+6. **Nudged the stuck voice phones:** kicked the 8 Poly on 2.4 (cmd/stamgr kick-sta). 6 of 8 re-associated to
+   5 GHz (band split na 11->18). The 3-4 that stayed on 2.4 are the genuine coverage-gap rooms (515/MemCare)
+   where 5 GHz is too weak -> correct, and exactly what next week's new APs address.
+
+## END STATE (~03:18 MST): recovered to baseline
+- Satisfaction 98.4 avg / 99 median (baseline 98.7). Voice 31/31 online (18 Poly @5GHz, 4 @2.4, 9 wired).
+- 5G retry 9.6, 2.4 retry 15.8; both still settling down toward baseline (8.7 / 12.0) as clients re-rate.
+- **Config kept:** Phase 2b 2.4 power->MEDIUM on 47 radios (validated, non-regressive). CSCNet BSS-transition on.
+- **Config reverted:** all 5 GHz (width + channels) back to baseline.
+- **auto_upgrade still OFF** (left disabled; re-enable when ready).
+
+## NET RESULT OF THE NIGHT (honest)
+- DELIVERED + KEPT: **2b** (undid the 2.4 over-thinning + brought MemCare 2.4 from full power to medium). Safe win.
+- ATTEMPTED + REVERTED: 3a/3b 5 GHz (no live improvement; non-DFS channels are the congested ones here).
+- BLOCKED: 2a 6 GHz (WPA3 mandate on the WPA2/PPSK CSCNet SSID).
+- 5 GHz is back where it started. It still has the overlap issue Howard noted, but fixing it RIGHT needs:
+  (a) the per-channel survey (external busy%) to choose channels by real cleanliness, not non-DFS policy;
+  (b) reconsider non-DFS-only vs using the cleaner DFS channels (resilience vs throughput tradeoff);
+  (c) ideally 6 GHz unblocked (WPA3) for offload; (d) band-steering/2.4-disable for voice so phones don't
+  stick to 2.4 during any disruption. Best done supervised with the survey in hand.
+
+## Gotchas (this session)
+- `apply-radio power medium --zone` re-enables disabled radios (use per-AP for power on a thinned fleet).
+- 6 GHz needs WPA3+PMF (api.err.Wpa3MandatoryFor6GHzBand).
+- channel-plan.sh is conflict-driven: it leaves non-conflicting APs on DFS; "all non-DFS" needs forcing,
+  which crowds 8 channels. Co-channel-min and all-non-DFS conflict when AP count >> non-DFS channel count.
+- channel changes scatter sticky Poly phones onto 2.4; kick-sta nudges them back to 5 GHz (coverage-limited
+  ones correctly stay on 2.4).
+- Python file writes are CRLF on Windows -> strip \r before using as a shell path; curl needs --data-binary @ABSpath.
+
+## Reference
+- Rollback snapshots in .claude/tmp/ (dev2.json = full pre-run state). Runbook .claude/tmp/cascades-2am-runbook.md.
+- Site va6iba3v; controller 172.16.3.29 (apply/verify controller-side, not over the Cascades VPN).
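
Reviewer note (outside the patch): the CRLF gotcha in the log can be sketched in Python roughly as below. This is a minimal illustration, not the script used in the session. The helper names `write_lf` and `read_shell_path` and the demo file name are hypothetical; the only behavior relied on is the `newline` parameter of the built-in `open()`, which turns off newline translation when set to `"\n"`.

```python
import os
import tempfile


def write_lf(path: str, text: str) -> None:
    """Write text with LF line endings on every platform (hypothetical helper)."""
    # newline="\n" disables newline translation, so a Windows run
    # does not turn "\n" into "\r\n" on write.
    with open(path, "w", encoding="utf-8", newline="\n") as fh:
        fh.write(text)


def read_shell_path(path: str) -> str:
    """Read a one-line file holding a path and drop trailing CR/LF (hypothetical helper)."""
    # Binary read, so a stray "\r" stays visible instead of being hidden
    # by text-mode translation; then strip it explicitly.
    with open(path, "rb") as fh:
        return fh.read().decode("utf-8").rstrip("\r\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "payload_path.txt")  # hypothetical file name
        # Simulate a file that was written with CRLF line endings.
        with open(demo, "wb") as fh:
            fh.write(b"/tmp/payload.json\r\n")
        print(repr(read_shell_path(demo)))
```

Writing with `newline="\n"` avoids the problem at the source. Stripping on read covers files that were already written with CRLF before they are handed to a shell command such as `curl --data-binary @<absolute path>`.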