sync: auto-sync from HOWARD-HOME at 2026-06-15 20:40:48

Author: Howard Enos
Machine: HOWARD-HOME
Timestamp: 2026-06-15 20:40:48
2026-06-15 20:40:57 -07:00
parent f341ee9398
commit c99615df7e
2 changed files with 73 additions and 15 deletions

@@ -71,3 +71,55 @@ Key audit output: 2.4 cu_total 7494% / interf 6181% / ~1 client; retry 40
- unifi-wifi skill: `.claude/skills/unifi-wifi/` (methodology.md, data-access.md, interference-model.md).
- Prior wireless log: `clients/cascades-tucson/session-logs/2026-05-16-howard-wireless-diagnostic.md`.
- UOS system wiki: `wiki/systems/uos-server.md`.
---
## Update: 20:40 PT — RW cred arrived, live (Plane 2) re-look, live-stats.sh accuracy fixes
Mike vaulted the RW controller admin credential (`infrastructure/uos-server-network-api-rw`) within
~30 min of the request, so the live Network API (Plane 2) became available. Re-audited Cascades live
and, in doing so, found and fixed accuracy bugs in `unifi-wifi/scripts/live-stats.sh` that had skewed
the earlier read.
### live-stats.sh bugs fixed (held a coord lock; messaged Mike e8be889f)
1. `stat/device` output hard-capped at `head -60` -> only ~15 of 77 APs were shown (we'd been
judging the whole site from a 15-AP sample). Removed the cap; verified 77/77 now.
2. `satisfaction` was read per-radio (always `-1` on this controller). DEVICE-level satisfaction
IS populated -> switched to `d['satisfaction']`.
3. `tx_retries` was the raw cumulative counter (scales with traffic, misleading). Switched to
`radio_table_stats.tx_retries_pct` (a true rate).
4. `--clients` output was an unsorted `[:40]` slice (hid ~93% of the 574 clients). Now sorts
worst-by-satisfaction and prints signal/noise/retry%/`satisfaction_reason`.
5. RF-neighbor table left as a documented TODO in the header (not a single API field; must be built
from the `rogue` BSSID cross-ref vs each AP's `vap_table`). It's what unlocks confident radio
DISABLES; until then power-down/channel/width are the safe levers.
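
Fixes 2-4 above can be sketched in Python roughly as follows. This is a minimal illustration, not the actual contents of `live-stats.sh`: the function names `radio_rows` and `worst_clients` are hypothetical, the sample values are made up, and the record layout (a device dict carrying `satisfaction` plus a `radio_table_stats` list with `tx_retries_pct`) is assumed from the field names mentioned in the list.

```python
def radio_rows(devices):
    """One row per radio: device-level satisfaction plus per-radio retry rate."""
    rows = []
    for d in devices:  # no head/slice cap, so every AP is included
        for rs in d.get("radio_table_stats", []):
            rows.append({
                "ap": d.get("name", "?"),
                "radio": rs.get("radio"),
                # Device-level value; the per-radio one was always -1 here.
                "satisfaction": d.get("satisfaction"),
                # A rate, not the cumulative tx_retries counter.
                "retry_pct": rs.get("tx_retries_pct"),
            })
    return rows


def worst_clients(clients, limit=40):
    """Sort clients worst-first by satisfaction, then truncate to `limit`."""
    def sort_key(c):
        sat = c.get("satisfaction")
        unknown = sat is None or sat < 0  # missing or -1 sorts last
        return (unknown, 0 if unknown else sat)
    return sorted(clients, key=sort_key)[:limit]


# Made-up sample records, for illustration only.
sample_devices = [{
    "name": "ap-example",
    "satisfaction": 97,
    "radio_table_stats": [
        {"radio": "ng", "satisfaction": -1, "tx_retries": 123456, "tx_retries_pct": 11.2},
    ],
}]
sample_rows = radio_rows(sample_devices)
```

Sorting before slicing is the point of fix 4: the `limit` then trims the healthy end of the list rather than an arbitrary subset.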
### Corrected diagnosis (the data fix changed a conclusion)
- Accurate avg retry RATE: **2.4GHz 11.2%** > 5GHz clear 9.0% ~= 5GHz DFS 8.4%. My mid-session
claim that "5GHz/DFS is now the #1 problem" was an ARTIFACT of the raw counter + 15-AP sample and
is WITHDRAWN. On the rate, DFS is NOT retrying worse than clear channels.
- **2.4GHz is the primary pain band** (highest retry; 27 of the 40 worst clients are on 2.4, retry
11-42%, mostly IoT/legacy: Ring cams, robotic cleaner, smart plugs, EPSON printer, Poly phone,
handheld scanners, Watch). The original 2.4 power-down/prune plan stands as #1.
- **DFS = resilience risk, not throughput killer.** 55/77 5GHz radios are on DFS channels near
Davis-Monthan AFB, which means radar-vacate exposure (client drops). Worth moving off DFS for
stability, but not urgent for performance.
- **6GHz still dead (1 client of 574)** — top untapped clean/non-DFS capacity; steering remains a
key opportunity.
- **AP-level satisfaction 95-100 across the fleet** — network is healthy on average; pain is in the
client tail = consistent with "bad for SOME users."
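
The per-band retry averages above could be reproduced with something like the sketch below. It is an assumption-laden illustration, not the audit code: `band_group` and `avg_retry_by_group` are hypothetical names, radios are assumed to carry `radio` (`ng`/`na`/`6e`), `channel`, and `tx_retries_pct`, and the DFS set is the FCC 5GHz list (channels 52-64 and 100-144) keyed on the primary channel only.

```python
from collections import defaultdict
from statistics import mean

# FCC 5GHz DFS primary channels: 52-64 (U-NII-2A) and 100-144 (U-NII-2C).
DFS_CHANNELS = set(range(52, 65, 4)) | set(range(100, 145, 4))


def band_group(radio, channel):
    """Classify a radio as 2.4GHz, 5GHz clear, 5GHz DFS, 6GHz, or other."""
    if radio == "ng":
        return "2.4GHz"
    if radio == "6e":
        return "6GHz"
    if radio == "na":
        try:
            ch = int(channel)
        except (TypeError, ValueError):
            return "other"  # e.g. channel missing or non-numeric
        return "5GHz DFS" if ch in DFS_CHANNELS else "5GHz clear"
    return "other"


def avg_retry_by_group(devices):
    """Mean tx_retries_pct per band group, across all radios of all APs."""
    buckets = defaultdict(list)
    for d in devices:
        for rs in d.get("radio_table_stats", []):
            pct = rs.get("tx_retries_pct")
            if pct is None:
                continue  # skip radios with no rate reported
            buckets[band_group(rs.get("radio"), rs.get("channel"))].append(pct)
    return {group: round(mean(vals), 1) for group, vals in buckets.items()}
```

Averaging the rate across every radio, rather than comparing raw counters from a partial AP sample, is what makes the bands comparable.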
### Cascades live snapshot (2026-06-15 ~20:30 PT)
- 77/77 APs reporting; 574 wireless clients. Band split: na (5GHz) ~87 / ng (2.4) ~21 sampled per
pull / 6e ~1.
- 2.4 cu_total 69-94% live (saturation confirmed). 5GHz cu_total mostly <40%.
- Worst clients: RingStickupCam (sat 60), Galaxy-A32-5G (60), DIRECTV (71), Samsung on ch1 (76,
retry 42%).
### Updated next steps
- [ ] Build Floor-pilot DRY-RUN: 2.4 power-down to Low on one floor (e.g. Floor 4), validate live
cu_total/retry% before+after via the fixed live-stats.sh. Get explicit go before `--apply`.
- [ ] Implement the AP-to-AP RF-neighbor table (rogue BSSID x vap_table) to enable safe DISABLEs.
- [ ] 6GHz steering plan; 5GHz 80->40MHz + non-DFS channel plan (resilience).
- [ ] Coord msgs this update: RW-cred request 6b98282f (+todo cbb355ef); live-stats fix e8be889f.
- [ ] pfSense `.ovpn` (Howard handling) — needed for per-AP watch-ap.sh live stream.
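
For the RF-neighbor table item above, one possible shape of the cross-reference is sketched below. Nothing here is implemented yet and the details are assumptions: `own_bssids` and `rf_neighbors` are hypothetical names, each rogue record is assumed to carry `ap_mac` (the AP that heard it), `bssid`, and `rssi`, and each device is assumed to carry `mac`, `name`, and a `vap_table` of `bssid` entries. Higher `rssi` is treated as a stronger signal.

```python
from collections import defaultdict


def own_bssids(devices):
    """Map every BSSID our APs broadcast to the owning AP's name."""
    owner = {}
    for d in devices:
        for vap in d.get("vap_table", []):
            bssid = vap.get("bssid")
            if bssid:
                owner[bssid.lower()] = d.get("name", d.get("mac", "?"))
    return owner


def rf_neighbors(devices, rogues):
    """Return {listener_ap: {heard_ap: strongest_rssi}} for our own APs only."""
    owner = own_bssids(devices)
    name_by_mac = {
        d["mac"].lower(): d.get("name", d["mac"]) for d in devices if d.get("mac")
    }
    table = defaultdict(dict)
    for r in rogues:
        heard = owner.get(str(r.get("bssid", "")).lower())
        listener = name_by_mac.get(str(r.get("ap_mac", "")).lower())
        rssi = r.get("rssi")
        if heard is None or listener is None or rssi is None:
            continue  # not one of ours, or incomplete record
        if heard == listener:
            continue  # an AP hearing its own BSSID is not a neighbor
        prev = table[listener].get(heard)
        if prev is None or rssi > prev:
            table[listener][heard] = rssi  # keep the strongest sighting
    return dict(table)
```

An AP that several neighbors hear strongly would be a candidate for a radio disable; an AP nobody hears is covering ground alone and should stay on.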