sync: auto-sync from HOWARD-HOME at 2026-06-15 20:40:48

Author: Howard Enos
Machine: HOWARD-HOME
Timestamp: 2026-06-15 20:40:48
2026-06-15 20:40:57 -07:00
parent f341ee9398
commit c99615df7e
2 changed files with 73 additions and 15 deletions

@@ -1,8 +1,12 @@
#!/usr/bin/env bash
# live-stats.sh — Plane-2 live RF/airtime from the UOS Network API (classic session API).
# Gives CURRENT per-AP per-radio cu_total / cu_self / num_sta / satisfaction / tx_retries and the
# AP RF-neighbor table — for before/after validation of changes and (the neighbor table) the
# materials-aware AP-to-AP coverage graph that unlocks confident radio DISABLES.
# Gives CURRENT per-AP per-radio cu_total / cu_self / num_sta / retry% and device-level
# satisfaction, plus the worst-clients view (signal / retry% / satisfaction_reason) — for
# before/after validation of changes.
# NOTE: satisfaction is populated at the DEVICE level (per-radio is -1 on this controller),
# and tx_retries is a cumulative counter so we report radio_table_stats.tx_retries_pct (a rate).
# TODO: AP-to-AP RF-neighbor table (for confident radio DISABLES) — not a single API field;
# build it from the `rogue` collection by matching our own APs' vap_table BSSIDs. Not done yet.
#
# AUTH (provision once): the classic API needs a controller admin session. Create a dedicated
# READ-ONLY admin in the UniFi UI (OS Settings -> Admins -> add a Viewer), then vault it:
@@ -52,21 +56,23 @@ echo "[INFO] site short=$SHORT"
curl -sk -b "$CJ" "$base/proxy/network/api/s/$SHORT/stat/device" | python -c "
import sys,json
for d in json.load(sys.stdin).get('data',[]):
if d.get('type')!='uap': continue
print('AP',d.get('name'),'clients=',d.get('num_sta'))
aps=[d for d in json.load(sys.stdin).get('data',[]) if d.get('type')=='uap']
print('# APs reporting:',len(aps))
for d in sorted(aps,key=lambda a:str(a.get('name'))):
# device-level satisfaction is the populated one (per-radio satisfaction is -1 on this controller)
print('AP',d.get('name'),'clients=',d.get('num_sta'),'satisfaction=',d.get('satisfaction'))
for r in d.get('radio_table_stats',[]):
print(' ',r.get('radio'),'ch',r.get('channel'),'cu_total',r.get('cu_total'),'cu_self_rx',r.get('cu_self_rx'),'cu_self_tx',r.get('cu_self_tx'),'num_sta',r.get('num_sta'),'tx_retries',r.get('tx_retries'),'satisfaction',r.get('satisfaction'))
# RF neighbor table (materials-aware AP-to-AP visibility) if present
for n in (d.get('radio_table') or []):
pass
" 2>&1 | head -60
print(' ',r.get('radio'),'ch',r.get('channel'),'cu_total',r.get('cu_total'),'cu_self_rx',r.get('cu_self_rx'),'cu_self_tx',r.get('cu_self_tx'),'num_sta',r.get('num_sta'),'retry%',r.get('tx_retries_pct'))
" 2>&1
if [ "$WANT_CLIENTS" = "--clients" ]; then
echo "=== clients (rssi/rate/retries) ==="
echo "=== worst wireless clients by satisfaction (signal / retry% / why) ==="
curl -sk -b "$CJ" "$base/proxy/network/api/s/$SHORT/stat/sta" | python -c "
import sys,json
for c in json.load(sys.stdin).get('data',[])[:40]:
print(' ',c.get('hostname') or c.get('mac'),'ap',c.get('ap_mac'),'rssi',c.get('rssi'),'signal',c.get('signal'),'tx_rate',c.get('tx_rate'),'retries',c.get('tx_retries'),'sat',c.get('satisfaction'))
" 2>&1 | head -45
cs=[c for c in json.load(sys.stdin).get('data',[]) if not c.get('is_wired')]
cs.sort(key=lambda c:(c.get('satisfaction') if isinstance(c.get('satisfaction'),(int,float)) else 999))
print('# wireless clients:',len(cs),' (worst 40 by satisfaction)')
for c in cs[:40]:
print(' ',(c.get('hostname') or c.get('mac')),'sat',c.get('satisfaction'),'signal',c.get('signal'),'noise',c.get('noise'),'retry%',c.get('wifi_tx_retries_percentage'),'band',c.get('radio'),'ch',c.get('channel'),'why',c.get('satisfaction_reason'))
" 2>&1
fi

@@ -71,3 +71,55 @@ Key audit output: 2.4 cu_total 74-94% / interf 61-81% / ~1 client; retry 40
- unifi-wifi skill: `.claude/skills/unifi-wifi/` (methodology.md, data-access.md, interference-model.md).
- Prior wireless log: `clients/cascades-tucson/session-logs/2026-05-16-howard-wireless-diagnostic.md`.
- UOS system wiki: `wiki/systems/uos-server.md`.
---
## Update: 20:40 PT — RW cred arrived, live (Plane 2) re-look, live-stats.sh accuracy fixes
Mike vaulted the RW controller admin (`infrastructure/uos-server-network-api-rw`) within ~30 min
of the request, so the live Network API (Plane 2) became available. Re-audited Cascades live and,
in doing so, found + fixed accuracy bugs in `unifi-wifi/scripts/live-stats.sh` that had skewed the
earlier read.
### live-stats.sh bugs fixed (held a coord lock; messaged Mike e8be889f)
1. `stat/device` output hard-capped at `head -60` -> only ~15 of 77 APs were shown (we'd been
judging the whole site from a 15-AP sample). Removed the cap; verified 77/77 now.
2. `satisfaction` was read per-radio (always `-1` on this controller). DEVICE-level satisfaction
IS populated -> switched to `d['satisfaction']`.
3. `tx_retries` was the raw cumulative counter (scales with traffic, misleading). Switched to
`radio_table_stats.tx_retries_pct` (a true rate).
4. `--clients` took an unsorted `[:40]` slice, so at most 40 of the 574 clients were shown (~93%
hidden). Now sorts worst-by-satisfaction and prints signal/noise/retry%/`satisfaction_reason`.
5. RF-neighbor table left as a documented TODO in the header (not a single API field; must be built
from the `rogue` BSSID cross-ref vs each AP's `vap_table`). It's what unlocks confident radio
DISABLES; until then power-down/channel/width are the safe levers.
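
A minimal sketch of how the item-5 TODO could be approached, not what the script does today. It assumes each `rogue` entry carries `ap_mac` (the AP that heard the beacon), `bssid`, and `signal`, and that each `vap_table` entry carries `bssid`. Those field names are assumptions to verify against a real pull, and `build_neighbor_table` is a hypothetical helper name.

```python
from collections import defaultdict

def build_neighbor_table(devices, rogues):
    """Map each of our APs to the sibling APs it hears, strongest first.

    devices: 'data' list from stat/device; rogues: 'data' list from the rogue
    collection. Field names (ap_mac, bssid, signal) are ASSUMED, not verified.
    """
    aps = [d for d in devices if d.get('type') == 'uap']
    # BSSID -> owning AP name, taken from every AP's vap_table.
    own_bssids = {}
    for d in aps:
        for vap in d.get('vap_table') or []:
            bssid = (vap.get('bssid') or '').lower()
            if bssid:
                own_bssids[bssid] = d.get('name') or d.get('mac')
    # Hearing AP's MAC -> name, so both sides of an edge use the same label.
    mac_to_name = {(d.get('mac') or '').lower(): d.get('name') or d.get('mac')
                   for d in aps}

    best = defaultdict(dict)  # hearer -> heard -> strongest signal seen
    for r in rogues:
        heard = own_bssids.get((r.get('bssid') or '').lower())
        hearer = mac_to_name.get((r.get('ap_mac') or '').lower())
        if heard is None or hearer is None or heard == hearer:
            continue  # foreign network, unknown hearer, or self-sighting
        sig = r.get('signal')
        if not isinstance(sig, (int, float)):
            continue
        if heard not in best[hearer] or sig > best[hearer][heard]:
            best[hearer][heard] = sig
    return {h: sorted(n.items(), key=lambda kv: kv[1], reverse=True)
            for h, n in best.items()}
```

Keeping only the strongest sighting per AP pair collapses the several BSSIDs one AP broadcasts into a single edge, which is the shape a DISABLE decision would need. An AP with no entry in the result hears none of its siblings in that pull.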
### Corrected diagnosis (the data fix changed a conclusion)
- Accurate avg retry RATE: **2.4GHz 11.2%** > 5GHz clear 9.0% ~= 5GHz DFS 8.4%. My mid-session
claim that "5GHz/DFS is now the #1 problem" was an ARTIFACT of the raw counter + 15-AP sample and
is WITHDRAWN. On the rate, DFS is NOT retrying worse than clear channels.
- **2.4GHz is the primary pain band** (highest retry; 27 of the 40 worst clients are on 2.4, retry
11-42%, mostly IoT/legacy: Ring cams, robotic cleaner, smart plugs, EPSON printer, Poly phone,
handheld scanners, Watch). The original 2.4 power-down/prune plan stands as #1.
- **DFS = resilience risk, not throughput killer.** 55/77 5GHz radios on DFS near Davis-Monthan =
radar-vacate exposure (client drops), worth moving off DFS for stability, but not urgent for
performance.
- **6GHz still dead (1 client of 574)** — top untapped clean/non-DFS capacity; steering remains a
key opportunity.
- **AP-level satisfaction 95-100 across the fleet** — network is healthy on average; pain is in the
client tail = consistent with "bad for SOME users."
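
For reference, a sketch of one way the per-band retry-rate comparison above could be reproduced from a `stat/device` pull. It assumes an unweighted mean over radios (the log does not say how its averages were weighted) and classifies 5GHz channels 52-144 as DFS per the US channel plan. `band_label` and `avg_retry_by_band` are hypothetical helper names.

```python
from collections import defaultdict

def band_label(radio, channel):
    """Bucket a radio into 2.4GHz / 5GHz clear / 5GHz DFS / 6GHz."""
    if radio == 'ng':
        return '2.4GHz'
    if radio == '6e':
        return '6GHz'
    if radio == 'na':
        try:
            ch = int(channel)
        except (TypeError, ValueError):
            return '5GHz unknown'
        # US DFS range (U-NII-2A and U-NII-2C): channels 52-144.
        return '5GHz DFS' if 52 <= ch <= 144 else '5GHz clear'
    return str(radio)

def avg_retry_by_band(devices):
    """Return {band: (mean tx_retries_pct, radio count)}.

    ASSUMPTION: simple per-radio mean, not weighted by traffic or clients.
    """
    acc = defaultdict(lambda: [0.0, 0])  # band -> [running sum, count]
    for d in devices:
        if d.get('type') != 'uap':
            continue
        for r in d.get('radio_table_stats') or []:
            pct = r.get('tx_retries_pct')
            if not isinstance(pct, (int, float)):
                continue  # skip radios that report no rate
            slot = acc[band_label(r.get('radio'), r.get('channel'))]
            slot[0] += pct
            slot[1] += 1
    return {band: (total / n, n) for band, (total, n) in acc.items()}
```

Averaging the rate rather than the raw `tx_retries` counter is what keeps a busy radio from looking worse than an idle one simply because it has sent more frames.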
### Cascades live snapshot (2026-06-15 ~20:30 PT)
- 77/77 APs reporting; 574 wireless clients. Band split: na (5GHz) ~87 / ng (2.4) ~21 sampled per
pull / 6e ~1.
- 2.4 cu_total 69-94% live (saturation confirmed). 5GHz cu_total mostly <40%.
- Worst clients: RingStickupCam (sat 60), Galaxy-A32-5G (60), DIRECTV (71), Samsung on ch1 (76,
retry 42%).
### Updated next steps
- [ ] Build Floor-pilot DRY-RUN: 2.4 power-down to Low on one floor (e.g. Floor 4), validate live
cu_total/retry% before+after via the fixed live-stats.sh. Get explicit go before `--apply`.
- [ ] Implement the AP-to-AP RF-neighbor table (rogue BSSID x vap_table) to enable safe DISABLEs.
- [ ] 6GHz steering plan; 5GHz 80->40MHz + non-DFS channel plan (resilience).
- [ ] Coord msgs this update: RW-cred request 6b98282f (+todo cbb355ef); live-stats fix e8be889f.
- [ ] pfSense `.ovpn` (Howard handling) — needed for per-AP watch-ap.sh live stream.
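
The before/after validation in the first step could be scripted along these lines. This is a sketch only: it assumes two saved `stat/device` `data` lists (one pulled before the change, one after) and compares `cu_total` and `tx_retries_pct` per AP radio. `radio_metrics` and `compare_snapshots` are hypothetical names, not part of `live-stats.sh`.

```python
def radio_metrics(devices):
    """Index (AP name, radio) -> (cu_total, tx_retries_pct) for one snapshot."""
    out = {}
    for d in devices:
        if d.get('type') != 'uap':
            continue
        for r in d.get('radio_table_stats') or []:
            out[(d.get('name'), r.get('radio'))] = (
                r.get('cu_total'), r.get('tx_retries_pct'))
    return out

def compare_snapshots(before, after):
    """Return (ap, radio, cu_total delta, retry% delta) for radios in both pulls."""
    b, a = radio_metrics(before), radio_metrics(after)
    rows = []
    for key in sorted(b.keys() & a.keys(), key=str):
        values = b[key] + a[key]
        if not all(isinstance(v, (int, float)) for v in values):
            continue  # a metric was missing in one of the pulls
        cu0, rt0, cu1, rt1 = values
        rows.append((key[0], key[1], cu1 - cu0, rt1 - rt0))
    return rows
```

Negative deltas mean the change reduced channel utilisation or retries. A single pair of pulls is noisy, so averaging several pulls on each side would give a steadier read.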