cascades 5GHz fix attempted (3a/3b) then ROLLED BACK; net kept = 2b only

Correction to earlier "deferred" report: after Howard pushed (5GHz needs fixing
regardless of 6GHz), I attempted width40 + non-DFS channel plan autonomously.
It did NOT validate live: 5G retry flat (8.7->8.4), 2.4 retry up (12->16) from
voice phones scattering to 2.4. ROOT CAUSE: the non-DFS channels here (149/157)
carry the heaviest EXTERNAL interference while DFS was cleaner -> forcing non-DFS
traded clean DFS for congested non-DFS. Rolled 5GHz back to baseline (channel+80MHz).
Kicked the 8 stuck Poly phones -> 6 back to 5GHz (rest are coverage-gap rooms).

End state recovered: satisfaction 98.4/med99, voice 31/31. Kept: 2b (2.4 power)
+ BSS-transition. 5GHz unchanged from start. auto_upgrade left OFF.

Doing 5GHz right needs: the per-channel survey (choose channels by real cleanliness,
not non-DFS policy), a rethink of non-DFS-only, 6GHz unblocked (WPA3), and band-steering for voice.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 03:08:31 -07:00
parent 3c85d2cfda
commit cc66da4f63

@@ -0,0 +1,63 @@
# Cascades — 5 GHz fix attempted (3a/3b) then ROLLED BACK; net kept = 2b only
## User
- **User:** Howard Enos (howard) (autonomous continuation, Howard directed "the 5ghz needed to be fixed")
- **Machine:** Howard-Home
- **Role:** tech
## Summary / correction
Earlier tonight I reported 3a/3b as "held for a supervised window." Howard correctly pushed back: 5 GHz
has a channel-overlap problem and needs fixing regardless of 6 GHz. So I attempted 3a (width 80->40) + 3b (non-DFS channel
plan) autonomously (~02:20-02:50 MST). **It did not validate live, so I ROLLED IT BACK to baseline.**
Net config change that STUCK tonight = Phase 2b (2.4 power) only.
## What I did and what happened
1. **Width 80->40 on 72 non-mesh na (5 GHz) radios** (excluded mesh: 2nd Floor Atrium + children CC Bridge/salon/108).
Applied cleanly.
2. **Channel plan (non-DFS) applied** to the 41 conflicted non-mesh APs (channel-plan metric: co-channel conflicts
9->0 at 40 MHz). Then I ALSO force-moved 22 non-conflicting DFS APs to non-DFS to honor the non-DFS policy.
3. **That backfired:** cramming ~66 APs onto only 8 non-DFS 40MHz channels CROWDED them. More importantly,
live 5G retry did NOT improve (8.7 baseline -> 8.4 settled; the transient 6.2 was measured mid-disruption),
and **2.4 retry rose (12->16) because the channel churn scattered ~8 Poly voice phones onto 2.4 GHz**
(sticky — stayed there). Satisfaction dipped to 97 then recovered.
4. **ROOT CAUSE of the non-win (key learning):** the audit's neighbor-density map shows the **non-DFS channels
here carry the HEAVIEST external interference** (ch149 ~9,200 neighbor BSSIDs, ch157 ~7,300), while the DFS
channels (52-144) were CLEANER (consumer gear avoids DFS). So forcing everyone non-DFS traded clean DFS for
congested non-DFS -> no throughput win. **The non-DFS-only decision needs reconsideration given Cascades'
external RF profile**, and choosing channels right REQUIRES the per-channel survey (busy%/noise), which
stalled earlier (68/74). Without it I was channel-planning blind to external congestion.
5. **Rolled back** all 72 non-mesh na radios to the start-of-run baseline (original channel + 80MHz width)
from the dev2.json snapshot. Verified restored.
6. **Nudged the stuck voice phones:** kicked the 8 Poly phones on 2.4 GHz (cmd/stamgr kick-sta). 6 of 8 re-associated to
5 GHz (band split na 11->18). The 3-4 that stayed on 2.4 are in the genuine coverage-gap rooms (515/MemCare)
where 5 GHz is too weak. Staying on 2.4 is correct there, and it is exactly the gap next week's new APs address.
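
The crowding in step 3 follows from simple counting. A minimal sketch of that arithmetic, using the figures from this log (~66 APs, 8 non-DFS channels); the function name is illustrative and is not part of channel-plan.sh:

```python
import math


def min_aps_on_busiest_channel(ap_count: int, channel_count: int) -> int:
    """Pigeonhole bound: even a perfectly even spread puts at least
    this many APs on the most heavily used channel."""
    if channel_count <= 0:
        raise ValueError("channel_count must be positive")
    return math.ceil(ap_count / channel_count)


# Figures from this session: ~66 APs forced onto 8 non-DFS channels.
print(min_aps_on_busiest_channel(66, 8))  # 9
```

This is only a lower bound on channel reuse. Whether two APs on the same channel actually conflict depends on whether they can hear each other, which the count alone does not capture.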
## END STATE (~03:18 MST) — recovered to baseline
- Satisfaction 98.4 avg / 99 median (baseline 98.7). Voice 31/31 online (18 Poly @5GHz, 4 @2.4, 9 wired).
- 5G retry 9.6, 2.4 retry 15.8 — both still settling down toward baseline (8.7 / 12.0) as clients re-rate.
- **Config kept:** Phase 2b 2.4 power->MEDIUM on 47 radios (validated, non-regressive). CSCNet BSS-transition on.
- **Config reverted:** all 5 GHz (width + channels) back to baseline.
- **auto_upgrade still OFF** (left disabled; re-enable when ready).
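
For tracking how far the end state still sits from baseline, the figures above can be compared directly. A rough sketch, assuming the values are in the same units the log reports; the dictionary keys are hypothetical names, not controller fields:

```python
# Values copied from this log; key names are illustrative only.
BASELINE = {"satisfaction_avg": 98.7, "retry_5g": 8.7, "retry_24": 12.0}
END_STATE = {"satisfaction_avg": 98.4, "retry_5g": 9.6, "retry_24": 15.8}


def deltas(baseline: dict, current: dict) -> dict:
    """Return current minus baseline for every baseline metric,
    rounded to one decimal to match the precision in the log."""
    return {name: round(current[name] - baseline[name], 1) for name in baseline}


for name, delta in deltas(BASELINE, END_STATE).items():
    print(f"{name}: {delta:+.1f}")
```

Re-running the same comparison after clients finish re-rating would show whether the retry figures have actually returned to baseline.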
## NET RESULT OF THE NIGHT (honest)
- DELIVERED + KEPT: **2b** — undid the 2.4 over-thinning + brought MemCare 2.4 from full power to medium. Safe win.
- ATTEMPTED + REVERTED: 3a/3b 5 GHz — no live improvement; non-DFS channels are the congested ones here.
- BLOCKED: 2a 6 GHz (WPA3 mandate on the WPA2/PPSK CSCNet SSID).
- 5 GHz is back where it started. It still has the overlap issue Howard noted, but fixing it RIGHT needs:
(a) the per-channel survey (external busy%) to choose channels by real cleanliness, not non-DFS policy;
(b) reconsidering non-DFS-only vs using the cleaner DFS channels (resilience vs throughput tradeoff);
(c) ideally 6 GHz unblocked (WPA3) for offload;
(d) band-steering/2.4-disable for voice so phones don't stick to 2.4 during any disruption.
Best done supervised with the survey in hand.
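
Point (a) amounts to ranking candidate channels by measured external load instead of by DFS status. A minimal sketch, assuming neighbor-BSSID count as a stand-in for cleanliness (the survey's busy%/noise figures would be a better key once available). The function name is hypothetical, and only the two counts reported in this log are used:

```python
def rank_channels_by_cleanliness(neighbor_bssids: dict[int, int]) -> list[int]:
    """Order channels from fewest to most neighbor BSSIDs.
    Ties are broken by channel number so the result is deterministic."""
    return sorted(neighbor_bssids, key=lambda ch: (neighbor_bssids[ch], ch))


# Only the counts recorded in this log; the DFS channel counts are not
# listed here, so they are deliberately left out of the example.
audit = {149: 9200, 157: 7300}
print(rank_channels_by_cleanliness(audit))  # [157, 149]
```

With the full per-channel survey loaded into the same mapping, DFS and non-DFS channels would be ranked together, so policy could then be applied as a filter on top of the measured data.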
## Gotchas (this session)
- `apply-radio power medium --zone` re-enables disabled radios (use per-AP for power on a thinned fleet).
- 6 GHz needs WPA3+PMF (api.err.Wpa3MandatoryFor6GHzBand).
- channel-plan.sh is conflict-driven: it leaves non-conflicting APs on DFS; "all non-DFS" needs forcing,
which crowds 8 channels. Minimizing co-channel overlap and going all-non-DFS are conflicting goals when AP count >> non-DFS channel count.
- channel changes scatter sticky Poly phones onto 2.4; kick-sta nudges them back to 5 GHz (coverage-limited
ones correctly stay on 2.4).
- Python file writes are CRLF on Windows -> strip \r before using as a shell path; curl needs --data-binary @ABSpath.
## Reference
- Rollback snapshots in .claude/tmp/ (dev2.json = full pre-run state). Runbook .claude/tmp/cascades-2am-runbook.md.
- Site va6iba3v; controller 172.16.3.29 (apply/verify controller-side, not over the Cascades VPN).