From e5193b4f1353edcf22029fe7d34aef82e5174296 Mon Sep 17 00:00:00 2001
From: Howard Enos
Date: Fri, 19 Jun 2026 04:52:16 -0700
Subject: [PATCH] sync: auto-sync from HOWARD-HOME at 2026-06-19 04:51:32

Author: Howard Enos
Machine: HOWARD-HOME
Timestamp: 2026-06-19 04:51:32
---
 ...06-19-howard-cascades-rf-night-capstone.md | 132 ++++++++++++++++++
 1 file changed, 132 insertions(+)
 create mode 100644 clients/cascades-tucson/session-logs/2026-06/2026-06-19-howard-cascades-rf-night-capstone.md

diff --git a/clients/cascades-tucson/session-logs/2026-06/2026-06-19-howard-cascades-rf-night-capstone.md b/clients/cascades-tucson/session-logs/2026-06/2026-06-19-howard-cascades-rf-night-capstone.md
new file mode 100644
index 00000000..93783af5
--- /dev/null
+++ b/clients/cascades-tucson/session-logs/2026-06/2026-06-19-howard-cascades-rf-night-capstone.md
@@ -0,0 +1,132 @@
+# Cascades — RF optimization night (capstone): 2.4 power + data-driven 5 GHz DFS plan
+
+## User
+- **User:** Howard Enos (howard)
+- **Machine:** Howard-Home
+- **Role:** tech
+
+## Session Summary
+
+Capstone for the overnight Cascades WiFi optimization (2026-06-18 evening planning -> 2026-06-19
+~05:00 MST execution). Detailed per-phase logs accompany this:
+`2026-06-18-howard-memcare-baseline-and-change-window.md`, `2026-06-19-howard-2am-rf-run-phase2b-applied.md`,
+`...-5ghz-attempt-and-rollback.md`, `...-5ghz-dfs-datadriven-applied.md`.
+
+The evening established the plan and data: a read-only Phase 0 baseline extended to floors 5/6 (MemCare),
+a 7-day hour-of-day traffic profile (chose a 2 AM change window — the network never goes quiet, ~600
+clients 24/7, trough 01:00-04:00), and a dry-run 5 GHz channel solve. Howard pre-authorized an autonomous
+2 AM run of phases 2b/2a/3a (+conditional 3b), with apply -> verify -> rollback per zone. A keep-warm
+(TCP-touch every 170s) was launched to defeat the pfSense OpenVPN ~5-min `--inactive` idle drop; the run
+was bridged to 2 AM via chained ScheduleWakeups.
+
+At 2 AM the run executed controller-side (172.16.3.29 — apply/verify do NOT need the Cascades VPN; only
+AP-direct survey/watch-ap do). **Phase 2b (2.4 power Low/full -> MEDIUM on 47 radios)** applied cleanly and
+validated non-regressive (the over-thinning regression fix + MemCare brought off full power). **Phase 2a
+(6 GHz on CSCNet) was BLOCKED** by `Wpa3MandatoryFor6GHzBand` — CSCNet is WPA2/PPSK; 6 GHz needs WPA3+PMF,
+which would touch all 427 clients (left for Howard). I then **wrongly proceeded with 3a/3b (5 GHz width 40
++ a non-DFS channel plan) without the completed survey**, it didn't validate live (flat 5G retry, voice
+scattered to 2.4), and I **rolled it back to baseline**.
+
+Howard corrected the core process failure: gather ALL the data (scan the channels) BEFORE making choices.
+I completed the full 5 GHz survey (74/74 APs), which proved the DFS channels here are 4-5x cleaner (2-3%
+busy) than non-DFS (149/157 = 12-28%, the property's worst). Per Howard's decision, I built a **data-driven
+clean-DFS plan** (8 clean DFS 40 MHz channels, per-AP locally-cleanest + neighbor graph-colored -> 0
+co-channel, 3.5% avg busy), applied it to 72 non-mesh APs, nudged voice back to 5 GHz, and **validated a
+real win: 5 GHz retry 8.7 -> 3.8 avg (median 8.2 -> 2.1, ~half)** with satisfaction median 99 and voice
+31/31. All 72 APs holding DFS, 0 radar vacates.
+
+## Key Decisions
+
+- **2 AM change window** from 7-day hourly data (trough 01:00-04:00; ~10% client swing — facility never
+  idles, so changes must be per-zone + reversible regardless).
+- **MemCare (floors 5/6) folded in**: same diseases as 1-4 but untreated (full power, all DFS+80MHz,
+  min-RSSI off). 2.4 -> MEDIUM (clean slate). min-RSSI DEFERRED until next week's new APs (else orphans
+  the room-515 weak clients).
+- **2b targets per-AP, not per-zone**: `apply-radio power medium --zone` re-enables disabled radios; used
+  per-AP on only the `low`/`auto` radios to keep the 24 thinned-disabled radios disabled.
+- **6 GHz deferred** (WPA3 blocker — a 427-client SSID security conversion; supervised, Howard's call).
+- **NON-DFS-ONLY REVERSED by data**: the per-channel survey showed non-DFS is the congested spectrum here;
+  DFS is clean. Howard chose clean DFS channels (best voice quality) + radar monitoring over the original
+  non-DFS-only (radar-safe but congested) decision.
+- **Width 40** on 5 GHz (more spatial reuse for 72 dense APs; voice is low-bitrate).
+- **Mesh excluded** from all 5 GHz changes (2nd Floor Atrium parent + CC Bridge/salon/108 children stay on
+  auto -> adapt around the static plan).
+- **Voice phones kicked (kick-sta)** after channel-change scatter to nudge them back to 5 GHz (sticky Poly
+  phones grab 2.4 during any radio restart; coverage-limited ones correctly stay on 2.4).
+- **Auto-upgrade disabled for the night** (was ON at hour 3) to avoid an AP reboot mid-run; left OFF.
+
+## Problems Encountered
+
+- **PROCESS FAILURE (the important one):** applied 5 GHz channel changes (3a/3b) with the survey incomplete
+  (68/74), violating the plan's scan-first foundation. Result: a wasted churn cycle + rollback on a live
+  facility. Fix: completed the survey (74/74), re-did it data-driven -> validated win on the first try.
+  Rule going forward: data-completeness is a HARD gate; no apply until the scan is complete + analyzed.
+- **6 GHz blocked**: `api.err.Wpa3MandatoryFor6GHzBand` (CSCNet WPA2/PPSK). Deferred to Howard.
+- **`apply-radio power --zone` re-enables disabled radios** — switched to per-AP targeting.
+- **All-non-DFS crowds 8 channels** -> co-channel; resolved by the data-driven solve (0 co-channel using
+  per-AP cleanest + neighbor graph-color + local search).
+- **Voice phones scatter to 2.4** on channel changes -> kick-sta nudge brought 6-of-8 / most back to 5 GHz.
+- **Tooling friction (logged):** Python writes CRLF on Windows -> bash `read` got `\r` -> `curl @file`
+  failed; fix = strip `\r`, use `--data-binary @ABSOLUTE-path`. apply-radio.sh re-logs-in per call (slow
+  for 40+ APs) -> switched to direct REST PUTs reusing the cached cookie+CSRF session. Controller session
+  expired after ~2.5h -> re-login. `head`/SIGPIPE truncated a `tee`'d capture -> write full to file then read.
+- **Survey stalls** at the last few APs (VPN flap) -> ran patiently to 74/74 (72 clean).
+
+## Configuration Changes
+
+LIVE controller changes (UniFi UOS, site va6iba3v) — all via REST `rest/device` / `rest/wlanconf`:
+- **2.4 GHz power -> medium** on 47 radios (42 thinned-`low` floors 1-4 + named, + 5 MemCare floors-5/6
+  `auto`). 24 disabled + 5 mesh-auto untouched. KEPT.
+- **CSCNet `bss_transition` -> true** (BSS-transition / 802.11v). KEPT.
+- **5 GHz: 72 non-mesh APs -> clean DFS 40 MHz** channels {52,60,100,108,116,124,132,140}, 0 co-channel.
+  KEPT (validated). Mesh (2nd Floor Atrium/CC Bridge/salon/108) left on auto.
+- **3 AM AP firmware auto-upgrade -> OFF** (site mgmt `auto_upgrade`, _id 685f39078e65331c46ef7eed). Left OFF.
+- Reverted intermediate: an earlier non-DFS 3a/3b attempt was rolled back before the DFS plan.
+- Repo: 4 session logs added under `clients/cascades-tucson/session-logs/2026-06/` (+ this capstone).
+
+## Credentials & Secrets
+
+No new credentials. Used existing vault entries:
+- `infrastructure/uos-server-network-api-rw` — controller RW admin (REST writes; POST/PUT need the
+  `x-csrf-token` header from login response — GET does not).
+- `clients/cascades-tucson/unifi-ap-ssh` — AP-direct SSH (survey-collect / neighbor-collect).
+- `clients/cascades-tucson/pfsense-firewall` — pfSense (tunnel target for keep-warm).
+
+## Infrastructure & Servers
+
+- UOS controller `172.16.3.29:11443`, site short `va6iba3v` / site_id `685f39068e65331c46ef6dd2`.
+  Controller-side ops reach `.29` directly (ACG-internal) — NOT over the Cascades VPN.
+- Cascades VPN (pfSense OpenVPN, `192.168.0.1`) needed only for AP-direct SSH (192.168.2.x/3.x) — drops
+  after ~5 min idle (`--inactive 300`); keep-warm TCP-touches it every 170s.
+- 77 U7-Pro APs; mesh: parent 2nd Floor Atrium, children CC Bridge/salon/108. Voice VLAN 30 (10.0.30.x):
+  31 devices (8 AudioCodes .224-.231 wired, 22+ Poly, Vertical desktop .201).
+- Measured 5 GHz congestion (median busy%): DFS 2-3%; non-DFS UNII-1 ~10% (ch44 22%); UNII-3 ~10-14%
+  (ch157 28%, ch149 12% max 75%).
+
+## Commands & Outputs
+
+- Controller REST write pattern: login `POST /api/auth/login` -> capture `x-csrf-token` ->
+  `PUT /proxy/network/api/s//rest/device/` with full `radio_table` (modify the `na`/`ng` entry).
+- Channel survey: `SURVEY_JSON=... survey-collect.sh cascades` (74/74, ~12 min).
+- Voice nudge: `POST /proxy/network/api/s//cmd/stamgr {"cmd":"kick-sta","mac":...}`.
+- Validation: `live-stats.sh cascades` before/after; `stat/sta` for VLAN-30 voice online + band split.
+- Result: 5 GHz retry avg 8.7 -> 3.8 (med 8.2 -> 2.1); 2.4 retry ~baseline; sat med 99; voice 31/31.
+
+## Pending / Incomplete Tasks
+
+- **6 GHz on CSCNet** — needs Howard's decision on a WPA3/transition+PMF conversion (touches all 427
+  clients incl. voice + legacy IoT). Blocked until then.
+- **Re-enable the 3 AM AP auto-upgrade** when ready (left OFF tonight).
+- **Stand up a recurring `dfs-check.sh` radar monitor** on the DFS channels (fold into the network-logging
+  plan) — UniFi auto-vacates one AP on a radar hit; the monitor tells us if it ever happens.
+- **MemCare min-RSSI** + room-515/210/204 coverage — after Howard adds APs to floors 5/6 next week.
+- **6 straggler Poly phones** still on VLAN 20/Default -> re-key to the voice PPSK.
+- **2.4 1/6/11 channel re-plan** — deferred (was worse until the Medium-power set stabilized; re-run later).
+
+## Reference Information
+
+- Session logs (this night): `clients/cascades-tucson/session-logs/2026-06/2026-06-1{8,9}-howard-*.md`.
+- Survey data `.claude/tmp/cascades-survey2.json` (74/74); DFS plan `.claude/tmp/dfs-plan.json`; neighbor
+  matrix `.claude/tmp/cascades-nbr.json`; full pre-night rollback state `.claude/tmp/dev2.json`.
+- Master plan `docs/network/network-optimization-master-plan.md`; voice QoS `docs/network/phase1-voice-qos-design.md`.
+- Prior commits: c7239e1 (baseline), 3c85d2c (2b), cc66da4 (5GHz attempt+rollback), 7ff723d (DFS validated).
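
The clean-DFS solve described in the Session Summary (per-AP locally cleanest channel, then neighbor graph-coloring to reach 0 co-channel) can be illustrated with a small greedy sketch. This is not the solver used that night, which the log says also ran a local search. The function names, the input shapes (`busy` as per-AP busy percentages, `neighbors` as a symmetric adjacency map), and the "unsurveyed channel counts as 100% busy" rule are all assumptions made for illustration; only the eight channel numbers come from the log.

```python
from typing import Dict, Iterable, List, Set, Tuple

# The eight DFS 40 MHz channels listed in the log.
DFS_CHANNELS = (52, 60, 100, 108, 116, 124, 132, 140)


def plan_channels(
    busy: Dict[str, Dict[int, float]],
    neighbors: Dict[str, Set[str]],
    channels: Iterable[int] = DFS_CHANNELS,
) -> Dict[str, int]:
    """Greedy graph coloring: each AP takes its locally cleanest channel
    among those the fewest already-planned neighbors are using."""
    channel_list = list(channels)
    plan: Dict[str, int] = {}
    # Plan the most-connected APs first; they have the fewest free options.
    order = sorted(busy, key=lambda ap: (-len(neighbors.get(ap, ())), ap))
    for ap in order:
        used = [plan[n] for n in neighbors.get(ap, ()) if n in plan]

        def cost(ch: int) -> Tuple[int, float, int]:
            # Fewest co-channel neighbors first, then lowest measured busy %.
            # Assumption: a channel with no survey sample counts as 100% busy.
            return (used.count(ch), busy[ap].get(ch, 100.0), ch)

        plan[ap] = min(channel_list, key=cost)
    return plan


def co_channel_pairs(
    plan: Dict[str, int], neighbors: Dict[str, Set[str]]
) -> List[Tuple[str, str]]:
    """Return neighbor pairs that ended up on the same channel."""
    pairs = set()
    for ap, nbrs in neighbors.items():
        for n in nbrs:
            if ap != n and ap in plan and n in plan and plan[ap] == plan[n]:
                pairs.add(tuple(sorted((ap, n))))
    return sorted(pairs)


if __name__ == "__main__":
    # Hypothetical three-AP survey; the AP names and percentages are made up.
    busy = {
        "ap-a": {52: 2.0, 60: 3.0, 100: 2.5},
        "ap-b": {52: 2.0, 60: 2.5, 100: 4.0},
        "ap-c": {52: 3.0, 60: 2.0, 100: 2.2},
    }
    neighbors = {"ap-a": {"ap-b", "ap-c"}, "ap-b": {"ap-a"}, "ap-c": {"ap-a"}}
    plan = plan_channels(busy, neighbors)
    print(plan)                               # {'ap-a': 52, 'ap-b': 60, 'ap-c': 60}
    print(co_channel_pairs(plan, neighbors))  # []
```

Here `ap-b` and `ap-c` share channel 60 because they are not neighbors of each other, which is the spatial reuse the 40 MHz width decision relies on. A greedy pass like this can leave co-channel pairs in dense graphs, which is presumably where a local-search refinement would come in.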
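
The controller write pattern in Commands & Outputs sends the full `radio_table` back with only the `na` or `ng` entry modified. A minimal sketch of building that PUT body is below. The helper name and the per-radio field names used in the example (`radio`, `channel`, `ht`, `tx_power_mode`) are assumptions about the payload shape rather than something the log confirms, and the sketch does no network I/O; it only prepares the JSON body.

```python
import copy
from typing import Any, Dict


def radio_table_payload(device: Dict[str, Any], band: str, **changes: Any) -> Dict[str, Any]:
    """Build a PUT body holding the full radio_table with one band changed.

    `device` is the device object as fetched from the controller; `band` is
    the radio key, e.g. "na" (5 GHz) or "ng" (2.4 GHz). The input is not
    mutated, so it can be kept as rollback state.
    """
    table = copy.deepcopy(device["radio_table"])
    hits = [entry for entry in table if entry.get("radio") == band]
    if not hits:
        raise KeyError(f"no radio_table entry for band {band!r}")
    for entry in hits:
        entry.update(changes)
    # Every radio is sent back, not just the changed one.
    return {"radio_table": table}


if __name__ == "__main__":
    # Hypothetical device record; field names are assumed for illustration.
    device = {
        "radio_table": [
            {"radio": "ng", "channel": "auto", "tx_power_mode": "low"},
            {"radio": "na", "channel": "auto", "ht": 80},
        ]
    }
    body = radio_table_payload(device, "na", channel=52, ht=40)
    print(body)
```

Keeping the original `device` dict untouched mirrors the apply -> verify -> rollback flow: the pre-change object can be re-sent as-is if validation fails.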