From c99615df7e222874632860c033265a4640bb34f2 Mon Sep 17 00:00:00 2001
From: Howard Enos
Date: Mon, 15 Jun 2026 20:40:57 -0700
Subject: [PATCH] sync: auto-sync from HOWARD-HOME at 2026-06-15 20:40:48

Author: Howard Enos
Machine: HOWARD-HOME
Timestamp: 2026-06-15 20:40:48
---
 .../skills/unifi-wifi/scripts/live-stats.sh   | 36 +++++++------
 ...026-06-15-howard-cascades-wifi-rf-audit.md | 52 +++++++++++++++++++
 2 files changed, 73 insertions(+), 15 deletions(-)

diff --git a/.claude/skills/unifi-wifi/scripts/live-stats.sh b/.claude/skills/unifi-wifi/scripts/live-stats.sh
index 65cfa46..f543f51 100644
--- a/.claude/skills/unifi-wifi/scripts/live-stats.sh
+++ b/.claude/skills/unifi-wifi/scripts/live-stats.sh
@@ -1,8 +1,12 @@
 #!/usr/bin/env bash
 # live-stats.sh — Plane-2 live RF/airtime from the UOS Network API (classic session API).
-# Gives CURRENT per-AP per-radio cu_total / cu_self / num_sta / satisfaction / tx_retries and the
-# AP RF-neighbor table — for before/after validation of changes and (the neighbor table) the
-# materials-aware AP-to-AP coverage graph that unlocks confident radio DISABLES.
+# Gives CURRENT per-AP per-radio cu_total / cu_self / num_sta / retry% and device-level
+# satisfaction, plus the worst-clients view (signal / retry% / satisfaction_reason) — for
+# before/after validation of changes.
+# NOTE: satisfaction is populated at the DEVICE level (per-radio is -1 on this controller),
+# and tx_retries is a cumulative counter so we report radio_table_stats.tx_retries_pct (a rate).
+# TODO: AP-to-AP RF-neighbor table (for confident radio DISABLES) — not a single API field;
+# build it from the `rogue` collection by matching our own APs' vap_table BSSIDs. Not done yet.
 #
 # AUTH (provision once): the classic API needs a controller admin session. Create a dedicated
 # READ-ONLY admin in the UniFi UI (OS Settings -> Admins -> add a Viewer), then vault it:
@@ -52,21 +56,23 @@ echo "[INFO] site short=$SHORT"
 
 curl -sk -b "$CJ" "$base/proxy/network/api/s/$SHORT/stat/device" | python -c "
 import sys,json
-for d in json.load(sys.stdin).get('data',[]):
-    if d.get('type')!='uap': continue
-    print('AP',d.get('name'),'clients=',d.get('num_sta'))
+aps=[d for d in json.load(sys.stdin).get('data',[]) if d.get('type')=='uap']
+print('# APs reporting:',len(aps))
+for d in sorted(aps,key=lambda a:str(a.get('name'))):
+    # device-level satisfaction is the populated one (per-radio satisfaction is -1 on this controller)
+    print('AP',d.get('name'),'clients=',d.get('num_sta'),'satisfaction=',d.get('satisfaction'))
     for r in d.get('radio_table_stats',[]):
-        print(' ',r.get('radio'),'ch',r.get('channel'),'cu_total',r.get('cu_total'),'cu_self_rx',r.get('cu_self_rx'),'cu_self_tx',r.get('cu_self_tx'),'num_sta',r.get('num_sta'),'tx_retries',r.get('tx_retries'),'satisfaction',r.get('satisfaction'))
-    # RF neighbor table (materials-aware AP-to-AP visibility) if present
-    for n in (d.get('radio_table') or []):
-        pass
-" 2>&1 | head -60
+        print(' ',r.get('radio'),'ch',r.get('channel'),'cu_total',r.get('cu_total'),'cu_self_rx',r.get('cu_self_rx'),'cu_self_tx',r.get('cu_self_tx'),'num_sta',r.get('num_sta'),'retry%',r.get('tx_retries_pct'))
+" 2>&1
 
 if [ "$WANT_CLIENTS" = "--clients" ]; then
-  echo "=== clients (rssi/rate/retries) ==="
+  echo "=== worst wireless clients by satisfaction (signal / retry% / why) ==="
   curl -sk -b "$CJ" "$base/proxy/network/api/s/$SHORT/stat/sta" | python -c "
 import sys,json
-for c in json.load(sys.stdin).get('data',[])[:40]:
-    print(' ',c.get('hostname') or c.get('mac'),'ap',c.get('ap_mac'),'rssi',c.get('rssi'),'signal',c.get('signal'),'tx_rate',c.get('tx_rate'),'retries',c.get('tx_retries'),'sat',c.get('satisfaction'))
-" 2>&1 | head -45
+cs=[c for c in json.load(sys.stdin).get('data',[]) if not c.get('is_wired')]
+cs.sort(key=lambda c:(c.get('satisfaction') if isinstance(c.get('satisfaction'),(int,float)) else 999))
+print('# wireless clients:',len(cs),' (worst 40 by satisfaction)')
+for c in cs[:40]:
+    print(' ',(c.get('hostname') or c.get('mac')),'sat',c.get('satisfaction'),'signal',c.get('signal'),'noise',c.get('noise'),'retry%',c.get('wifi_tx_retries_percentage'),'band',c.get('radio'),'ch',c.get('channel'),'why',c.get('satisfaction_reason'))
+" 2>&1
 fi
diff --git a/clients/cascades-tucson/session-logs/2026-06/2026-06-15-howard-cascades-wifi-rf-audit.md b/clients/cascades-tucson/session-logs/2026-06/2026-06-15-howard-cascades-wifi-rf-audit.md
index 25b99d1..ec16cc7 100644
--- a/clients/cascades-tucson/session-logs/2026-06/2026-06-15-howard-cascades-wifi-rf-audit.md
+++ b/clients/cascades-tucson/session-logs/2026-06/2026-06-15-howard-cascades-wifi-rf-audit.md
@@ -71,3 +71,55 @@ Key audit output: 2.4 cu_total 74–94% / interf 61–81% / ~1 client; retry 40
 - unifi-wifi skill: `.claude/skills/unifi-wifi/` (methodology.md, data-access.md, interference-model.md).
 - Prior wireless log: `clients/cascades-tucson/session-logs/2026-05-16-howard-wireless-diagnostic.md`.
 - UOS system wiki: `wiki/systems/uos-server.md`.
+
+---
+
+## Update: 20:40 PT — RW cred arrived, live (Plane 2) re-look, live-stats.sh accuracy fixes
+
+Mike vaulted the RW controller admin (`infrastructure/uos-server-network-api-rw`) within ~30 min
+of the request, so the live Network API (Plane 2) became available. Re-audited Cascades live and,
+in doing so, found + fixed accuracy bugs in `unifi-wifi/scripts/live-stats.sh` that had skewed the
+earlier read.
+
+### live-stats.sh bugs fixed (held a coord lock; messaged Mike e8be889f)
+1. `stat/device` output hard-capped at `head -60` -> only ~15 of 77 APs were shown (we'd been
+   judging the whole site from a 15-AP sample). Removed the cap; verified 77/77 now.
+2. `satisfaction` was read per-radio (always `-1` on this controller). DEVICE-level satisfaction
+   IS populated -> switched to `d['satisfaction']`.
+3. `tx_retries` was the raw cumulative counter (scales with traffic, misleading). Switched to
+   `radio_table_stats.tx_retries_pct` (a true rate).
+4. `--clients` was `[:40]` unsorted (hid 90% of 574 clients). Now sorts worst-by-satisfaction and
+   prints signal/noise/retry%/`satisfaction_reason`.
+5. RF-neighbor table left as a documented TODO in the header (not a single API field; must be built
+   from the `rogue` BSSID cross-ref vs each AP's `vap_table`). It's what unlocks confident radio
+   DISABLES; until then power-down/channel/width are the safe levers.
+
+### Corrected diagnosis (the data fix changed a conclusion)
+- Accurate avg retry RATE: **2.4GHz 11.2%** > 5GHz clear 9.0% ~= 5GHz DFS 8.4%. My mid-session
+  claim that "5GHz/DFS is now the #1 problem" was an ARTIFACT of the raw counter + 15-AP sample and
+  is WITHDRAWN. On the rate, DFS is NOT retrying worse than clear channels.
+- **2.4GHz is the primary pain band** (highest retry; 27 of the 40 worst clients are on 2.4, retry
+  11-42%, mostly IoT/legacy: Ring cams, robotic cleaner, smart plugs, EPSON printer, Poly phone,
+  handheld scanners, Watch). The original 2.4 power-down/prune plan stands as #1.
+- **DFS = resilience risk, not throughput killer.** 55/77 5GHz radios on DFS near Davis-Monthan =
+  radar-vacate exposure (client drops), worth moving off DFS for stability, but not urgent for
+  performance.
+- **6GHz still dead (1 client of 574)** — top untapped clean/non-DFS capacity; steering remains a
+  key opportunity.
+- **AP-level satisfaction 95-100 across the fleet** — network is healthy on average; pain is in the
+  client tail = consistent with "bad for SOME users."
+
+### Cascades live snapshot (2026-06-15 ~20:30 PT)
+- 77/77 APs reporting; 574 wireless clients. Band split: na (5GHz) ~87 / ng (2.4) ~21 sampled per
+  pull / 6e ~1.
+- 2.4 cu_total 69-94% live (saturation confirmed). 5GHz cu_total mostly <40%.
+- Worst clients: RingStickupCam (sat 60), Galaxy-A32-5G (60), DIRECTV (71), Samsung on ch1 (76,
+  retry 42%).
+
+### Updated next steps
+- [ ] Build Floor-pilot DRY-RUN: 2.4 power-down to Low on one floor (e.g. Floor 4), validate live
+      cu_total/retry% before+after via the fixed live-stats.sh. Get explicit go before `--apply`.
+- [ ] Implement the AP-to-AP RF-neighbor table (rogue BSSID x vap_table) to enable safe DISABLEs.
+- [ ] 6GHz steering plan; 5GHz 80->40MHz + non-DFS channel plan (resilience).
+- [ ] Coord msgs this update: RW-cred request 6b98282f (+todo cbb355ef); live-stats fix e8be889f.
+- [ ] pfSense `.ovpn` (Howard handling) — needed for per-AP watch-ap.sh live stream.
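
Note on the RF-neighbor TODO: the patch describes the table as a cross-reference between the `rogue` collection and each AP's `vap_table` BSSIDs, but does not implement it. A minimal sketch of that join follows, not the script's actual code. It assumes each rogue record carries `bssid` (the network that was heard), `ap_mac` (the AP that heard it) and `rssi` (higher meaning stronger), and that each device record carries `mac` and `vap_table[].bssid`. Those field names are assumptions to verify against a real controller response; `own_bssids` and `neighbor_table` are hypothetical helper names.

```python
from collections import defaultdict


def own_bssids(devices):
    """Map every BSSID broadcast by our own APs (lower-cased) to the AP name."""
    owned = {}
    for d in devices:
        if d.get('type') != 'uap':
            continue
        for vap in d.get('vap_table') or []:
            bssid = (vap.get('bssid') or '').lower()
            if bssid:
                owned[bssid] = d.get('name')
    return owned


def neighbor_table(devices, rogues):
    """Return {hearing_ap: {heard_ap: strongest_rssi}} restricted to our own APs.

    Assumed rogue fields: 'bssid', 'ap_mac', 'rssi' (not confirmed by the patch).
    """
    owned = own_bssids(devices)
    name_by_mac = {
        (d.get('mac') or '').lower(): d.get('name')
        for d in devices
        if d.get('type') == 'uap'
    }
    table = defaultdict(dict)
    for r in rogues:
        heard = owned.get((r.get('bssid') or '').lower())
        hearer = name_by_mac.get((r.get('ap_mac') or '').lower())
        rssi = r.get('rssi')
        # Skip foreign networks, unknown reporters, and records without a numeric RSSI.
        if heard is None or hearer is None or not isinstance(rssi, (int, float)):
            continue
        if heard == hearer:
            continue  # an AP hearing its own BSSID is not a neighbor edge
        prev = table[hearer].get(heard)
        if prev is None or rssi > prev:
            table[hearer][heard] = rssi  # keep the strongest sighting per AP pair
    return dict(table)
```

Fed the parsed `data` arrays from `stat/device` and from the rogue endpoint, this would yield one row per hearing AP. Whether a given RSSI is strong enough to justify disabling a radio remains a separate judgment that the sketch does not attempt.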