diff --git a/.claude/skills/unifi-wifi/SKILL.md b/.claude/skills/unifi-wifi/SKILL.md new file mode 100644 index 0000000..9c86d51 --- /dev/null +++ b/.claude/skills/unifi-wifi/SKILL.md @@ -0,0 +1,61 @@ +--- +name: unifi-wifi +description: "Analyze and tune UniFi WiFi for performance + stability, especially in dense/congested environments. Audits AP/radio config and the neighbor-interference map from the UOS controller, flags issues (2.4GHz over-provisioning, channel width, min-RSSI/sticky clients, channel plan), and recommends prioritized changes. Works for any UniFi site on the UOS (172.16.3.29); Cascades is the hard case. Triggers: unifi wifi tuning, RF/airtime/channel analysis, 2.4GHz congestion, AP channel plan, sticky clients, wireless performance." +--- + +# UniFi WiFi tuning (UOS sites) + +Data-driven WiFi tuning for UniFi sites on the **UOS Server** (`172.16.3.29`). Goal: solid +performance + stability for connected devices in congested environments by analyzing what the +controller knows and making prioritized, validated changes. Built for any site; **Cascades** +(77 APs, ~550 clients, brutal 2.4GHz) is the reference hard case. + +## First, load context +- **[references/data-access.md](references/data-access.md)** — what data the UOS exposes and how + to read it (the two planes: Mongo config/interference now, live Network API later). +- **[references/methodology.md](references/methodology.md)** — the prioritized tuning playbook + (synthesized from a multi-model pass + live recon). Read before recommending any change. + +## What it does (current = Plane 1, read-only) +1. 
**Audit a site** — config + interference, no live plane needed: + ```bash + bash .claude/skills/unifi-wifi/scripts/audit-site.sh <site-name-or-id> + bash .claude/skills/unifi-wifi/scripts/audit-site.sh cascades + ``` + Outputs 2.4/5/6 config summary, the per-channel neighbor-density (interference) map, and flagged + issues (2.4 over-provisioning, 40/80/160MHz width, off-1/6/11 channels, min-RSSI off, high power). +2. **Interpret** the flags against `methodology.md` (fix order: prune 2.4 -> shrink cells/power -> + min data rates -> manual 1/6/11 plan -> min-RSSI + roaming -> steer to 6GHz). +3. **Recommend** a prioritized, per-zone change plan. Roll out per zone, not site-wide at once. + +Ad-hoc Mongo queries: `.claude/scripts/uos-mongo.sh` (recipes in data-access.md). Access is the +vaulted dedicated key `infrastructure/uos-server-ssh-key` (works from any fleet machine). + +## The two data planes (know which you're using) +- **Plane 1 — Mongo `ace`** (available now, read-only via SSH): radio config (`radio_table`), the + `rogue` neighbor-interference map, `channelplan`, AP/client inventory. Enough for the full config + audit + channel/interference plan — covers the 2.4GHz "first problem". +- **Plane 2 — live Network API** (`stat/device`, `stat/sta`; NOT yet wired): live channel + utilization (`cu_total`), per-client RSSI/SNR/retries, AP satisfaction. Needed to **validate** + changes (before/after) and find the worst APs by live airtime. Wiring it needs a dedicated + read-only UniFi admin or integration API key on `.29`, vaulted as + `infrastructure/uos-server-network-api`. See data-access.md "Plane 2". + +## Applying changes — IMPORTANT boundary +This skill is **audit + advisory** today. **Writing config to the controller is not wired** and is +high-stakes (a bad channel/power/disable push degrades a live facility). 
When change-application is +added it MUST: go through the controller API with an authed account; change **one zone at a time**; +capture live `cu_total`/satisfaction **before and after** each change; and never run nightly/auto +channel optimization in ultra-dense sites (pin a manual plan). Until then, hand the recommended +plan to a tech to apply in the UniFi UI, or get explicit go before any write path is built. + +## Roadmap +- **Phase 1 (done):** config + interference audit, flags, methodology. Read-only. +- **Phase 2:** wire the live Network API (Plane 2) for `cu_total`/satisfaction/per-client RF → + before/after validation + "worst APs by airtime" ranking. +- **Phase 3:** assisted change application (per-zone, API-driven, with live before/after gating). + +## Notes +- The methodology is independent-model guidance + config-plane recon — **validate against live + stats before trusting any single recommendation**, and roll out per zone. +- Multi-site: pass any site name/id; `uos-mongo.sh --sites` lists them. Cascades = `685f39068e65331c46ef6dd2`. diff --git a/.claude/skills/unifi-wifi/references/data-access.md b/.claude/skills/unifi-wifi/references/data-access.md new file mode 100644 index 0000000..d6de6c0 --- /dev/null +++ b/.claude/skills/unifi-wifi/references/data-access.md @@ -0,0 +1,107 @@ +# UniFi WiFi data access — what the UOS controller exposes and how to read it + +This is the data-capability map for the `unifi-wifi` tuning skill, built from live recon of the +**UOS Server** (`172.16.3.29`, self-hosted UniFi OS / classic Network app, Mongo DB `ace`) using +the **Cascades** site (`site_id 685f39068e65331c46ef6dd2`) as the hard case (77 APs / 12 switches, +~550 concurrent wireless clients, severe 2.4GHz neighbor congestion). Access: `infrastructure/uos-server-ssh-key` +(vaulted) + `.claude/scripts/uos-mongo.sh`. See also [[uos-server]]. 
+ +## Two data planes (the key finding) + +| Plane | Source | Reach | Holds | +|---|---|---|---| +| **Config + history** | Mongo `ace` via `uos-mongo.sh` | root SSH, fully available now | radio config, the interference map, channel-plan settings, AP/client inventory | +| **Live RF/airtime** | Controller Network API (`stat/device`, `stat/sta`) | needs a session / integration key — NOT yet wired | current channel utilization, per-client RSSI/retries/tx-rate, AP satisfaction, num_sta | + +The live per-AP utilization and per-client RF stats are **NOT persisted in Mongo** (the `device` +collection carries config but no `radio_table_stats`; the `user`/client collection only keeps +`last_radio`). So a first-pass audit + channel/interference plan comes entirely from Mongo; the +live "current airtime / who's unhappy right now" feedback loop needs the local Network API (see +"Plane 2" below). + +## Plane 1 — Mongo `ace` (available now) + +### `device` collection (per AP/switch; filter `type:'uap'` for APs) +Per-AP `radio_table[]` (the config we tune), one entry per radio: +- `radio`: `ng` = 2.4GHz, `na` = 5GHz, `6e` = 6GHz +- `channel`, `ht` (width: 20/40/80/160), `tx_power_mode` (auto/low/medium/high/custom) +- `min_rssi_enabled`, `min_rssi` (sticky-client / roaming floor, e.g. -77) +- plus `atf_enabled` (airtime fairness), `country_code`, `antenna_table`, `scan_radio_table`, + `support_wifi6e`, `wifi_caps`, and model/firmware. `num_sta` is also present at the device level (last reported). + +Cascades is a **U7-Pro / WiFi-6E** fleet (tri-band ng/na/6e). Example AP: 2.4 ch11/20MHz, +5GHz ch153/80MHz, 6GHz ch145/160MHz, min_rssi -77 enabled. + +### `rogue` collection — the interference map (586,688 docs fleet-wide; the gold) +Every neighbor/over-the-air BSSID an AP has seen, with `band`, `channel`, `ap_mac` (which of our +APs saw it), `bssid`, `essid`, `rssi`, `age`, `site_id`. Aggregate by channel to quantify +**co-channel congestion**. 
Cascades 2.4GHz is brutal: + +| Band | Channel | Neighbor BSSIDs seen | +|---|---|---| +| 2.4 (ng) | **6** | 33,359 | +| 2.4 (ng) | **1** | 19,275 | +| 2.4 (ng) | **11** | 16,578 | +| 5 (na) | 149 | 8,889 | +| 5 (na) | 157 | 6,964 | +| 5 (na) | 44 | 5,477 | +| 6 (6e) | 69 | 86 | + +Takeaways the skill uses: 2.4GHz is saturated on all three usable channels (so the fix is fewer +2.4 radios + tight power, not "find a clean channel"); 6GHz is nearly empty (steer capable +clients up); 5GHz upper band (149/157) is busier than the UNII-1/DFS lower band. + +### `channelplan` collection +The controller's auto-channel-plan inputs/outputs per site: `channels_ng`/`channels_na`/`channels_6e` +(allowed channel lists), `ht_modes_ng/na/6e`, `method`, `optimize`, `exclude_devices`, +`high_priority_devices`, `date`. Lets the skill read/propose the channel plan the controller will honor. + +### `user` collection — client inventory/history +~1,807 client records for Cascades. Holds identity + `last_radio` (band) but **not** live RF; use +for inventory/segmentation, not live signal. + +### Config-audit signals already computable from Mongo +- 2.4GHz width != 20MHz (40MHz on 2.4 in density = self-inflicted overlap). +- `min_rssi_enabled=false` on 2.4 radios → sticky-client risk. (Cascades: **6 of 77** APs have 2.4 min_rssi disabled.) +- 2.4 channels not on the 1/6/11 plan; adjacent APs on the same channel. +- TX-power mode (auto/high) on 2.4 in dense clusters (should be low/medium). +- Per-AP radio enable: which dense-cluster APs should have their 2.4 radio disabled entirely. 
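
The 2.4GHz signals in the list above can be checked against a single AP document. What follows is a minimal sketch rather than the skill's own code. It assumes the `radio_table[]` field names described earlier in this file. The function name `audit24` and the sample document are made up for illustration, and coercing `channel` with `Number()` is a guess that the value may be stored as a string.

```javascript
// Sketch only: flag the 2.4GHz config-audit signals for one AP document.
// Field names follow radio_table[] as described in this file; sampleAp is invented.
function audit24(ap) {
  const flags = [];
  for (const r of ap.radio_table || []) {
    if (r.radio !== 'ng') continue; // 'ng' = 2.4GHz radio
    const ch = Number(r.channel);   // assumption: channel may arrive as a string
    if (String(r.ht) !== '20') flags.push(`width ${r.ht}MHz (want 20)`);
    if (![1, 6, 11].includes(ch)) flags.push(`off-plan ch${r.channel} (want 1/6/11)`);
    if (!r.min_rssi_enabled) flags.push('min_rssi OFF');
    if (/^(auto|high)$/i.test(String(r.tx_power_mode))) {
      flags.push(`power=${r.tx_power_mode} (want low/medium)`);
    }
  }
  return flags;
}

// Hypothetical AP: a misconfigured 2.4GHz radio plus a 5GHz radio that is ignored here.
const sampleAp = {
  name: 'example-ap',
  radio_table: [
    { radio: 'ng', channel: 3, ht: 40, tx_power_mode: 'high', min_rssi_enabled: false },
    { radio: 'na', channel: 153, ht: 80, tx_power_mode: 'medium', min_rssi_enabled: true, min_rssi: -77 },
  ],
};
console.log(audit24(sampleAp));
```

The remaining signals (adjacent APs sharing a channel, which 2.4 radios to disable) need AP placement data and are out of scope for this sketch.
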
+ +## Plane 2 — Live RF/airtime (the gap to wire next) + +Live data the tuner ideally also wants (current utilization, satisfaction, per-client RSSI/SNR/ +retries, roam events) lives in the **classic Network API**, session-authenticated: +- `GET /proxy/network/api/s/<site>/stat/device` → per-AP `radio_table_stats[]`: + `cu_total` (channel utilization %), `cu_self_rx`/`cu_self_tx` (our own airtime), `num_sta`, + `tx_retries`, `satisfaction`, per radio. +- `GET /proxy/network/api/s/<site>/stat/sta` → per-client: `rssi`, `signal`, `noise`, + `tx_rate`/`rx_rate`, `tx_retries`, `satisfaction`, `radio`/`channel`, `nss`, anomalies. + +Auth options (none wired yet): (a) a **dedicated read-only local UniFi admin** → login for a +session cookie; (b) a **Network integration API key** (`X-API-KEY` header against `/proxy/network/integration/v1/...`). +The cloud Site Manager key does NOT authenticate the local API (401); the existing local "Claude" +integration key's value is hashed/unrecoverable. **Action for phase 2:** create a dedicated +read-only admin or integration key on `.29`, vault it (`infrastructure/uos-server-network-api`), +and read live stats from it. Until then the skill runs config+interference analysis (Plane 1), +which already covers the 2.4GHz "first problem". 
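
Once Plane 2 is wired, the "worst APs by airtime" ranking could look roughly like this. It is a sketch under assumptions, not a wired client: it makes no network call and takes an already-parsed list of devices. It assumes each device carries `name` and `radio_table_stats[]`, and that each stats entry carries `radio`, `cu_total`, and `num_sta`. The helper name `worstRadios` and the sample payload are invented for illustration.

```javascript
// Sketch only: rank radios by live channel utilization from a stat/device-style payload.
// Assumed shape: device { name, radio_table_stats: [{ radio, cu_total, num_sta }] }.
// sampleDevices is invented data, not a capture from the controller.
function worstRadios(devices, band, topN) {
  const rows = [];
  for (const d of devices) {
    for (const s of d.radio_table_stats || []) {
      // Skip other bands and entries without a numeric utilization value.
      if (s.radio === band && typeof s.cu_total === 'number') {
        rows.push({ ap: d.name, cu_total: s.cu_total, num_sta: s.num_sta });
      }
    }
  }
  rows.sort((a, b) => b.cu_total - a.cu_total); // highest utilization first
  return rows.slice(0, topN);
}

const sampleDevices = [
  { name: 'ap-a', radio_table_stats: [
    { radio: 'ng', cu_total: 72, num_sta: 31 },
    { radio: 'na', cu_total: 18, num_sta: 12 },
  ] },
  { name: 'ap-b', radio_table_stats: [{ radio: 'ng', cu_total: 41, num_sta: 9 }] },
  { name: 'ap-c', radio_table_stats: [{ radio: 'ng', cu_total: 88, num_sta: 27 }] },
];
console.log(worstRadios(sampleDevices, 'ng', 2));
```

Running the same ranking before and after a change, per zone, gives the before/after comparison the skill requires for validation.
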
+ +## Quick recipes + +```bash +# Cascades site id +bash .claude/scripts/uos-mongo.sh --sites | grep -i casc # 685f39068e65331c46ef6dd2 + +# 2.4GHz neighbor congestion per channel for a site (the interference map) +cat <<'JS' | bash .claude/scripts/uos-mongo.sh +db.rogue.aggregate([{$match:{site_id:'685f39068e65331c46ef6dd2',band:'ng'}}, + {$group:{_id:'$channel',neighbors:{$sum:1}}},{$sort:{neighbors:-1}}]).forEach(printjson) +JS + +# Per-AP 2.4GHz config audit (channel, width, power, min_rssi) +cat <<'JS' | bash .claude/scripts/uos-mongo.sh +db.device.find({site_id:'685f39068e65331c46ef6dd2',type:'uap'},{name:1,radio_table:1}).forEach(function(a){ + (a.radio_table||[]).forEach(function(r){ if(r.radio=='ng') + print(a.name+" ch="+r.channel+" ht="+r.ht+" pwr="+r.tx_power_mode+" min_rssi="+(r.min_rssi_enabled?r.min_rssi:'OFF')); }); +}); +JS +``` diff --git a/.claude/skills/unifi-wifi/references/methodology.md b/.claude/skills/unifi-wifi/references/methodology.md new file mode 100644 index 0000000..73e2300 --- /dev/null +++ b/.claude/skills/unifi-wifi/references/methodology.md @@ -0,0 +1,63 @@ +# High-density UniFi RF tuning — methodology + +Synthesized from a multi-model pass (Grok 4.3 + Gemini 3.1 Pro, which converged strongly) plus +live recon of Cascades. The governing principle in a congested environment: **conserve airtime +and shrink 2.4GHz participation** — you cannot out-tune ambient interference (Cascades sees ~33k +neighbor BSSIDs on 2.4 ch6), so the win is moving capable clients onto smaller 5/6GHz cells and +using 2.4 for legacy coverage only. Applies to any UOS site; thresholds below are starting points +to tune from each site's data. + +## Fix order (do in this sequence — earlier steps de-risk later ones) +1. **Prune 2.4GHz radios.** Disable the 2.4 radio on ~40–60% of APs in dense clusters; keep 2.4 + only where needed for coverage/legacy (perimeter, stairwells, elevators, rooms with 2.4-only + medical pendants/tablets). 
77 APs all on 2.4 in this density is catastrophic self-interference. + Target **≤15–25 STAs per active 2.4 radio**. +2. **Power + width.** 2.4 → **Low / custom ~6–11 dBm** (smallest cells so a client hears 2–3 + BSSIDs, not 20). 5GHz → **Medium / ~12–15 dBm, 40MHz** (avoid 80/160 in density — wide channels + destroy spatial reuse). 6GHz → **80MHz, higher power ~18–20 dBm** as a "staircase" that pulls + 6E-capable clients up into the clean lane. +3. **Minimum data rates.** Disable 1–11 Mbps; set 2.4 minimum to **12 or 24 Mbps**. Kills the + management-frame overhead from distant/legacy stations and effectively shrinks cells. +4. **Channel plan (manual).** Strict **1/6/11, 20MHz** on 2.4; assign per-zone so adjacent APs + differ, weighting toward the channel where *our* APs dominate vs neighbor density. 5GHz: spread + across UNII-1 + **DFS** (UNII-2/2e) for more non-overlapping channels; U7-Pro DFS filtering is + good. 6GHz: **PSC channels**. **Do NOT use nightly/auto channel optimization in ultra-dense** — + it can't model intermittent neighbor interference; pin a manual plan. +5. **min-RSSI + roaming.** 2.4 min-RSSI **−75/−76**, 5GHz **−70/−72** (start −72, tighten only if + satisfaction stays high). Enable **802.11r (fast roaming)** + **802.11v (BSS transition)** — + but **test legacy medical/IoT devices first**; if pendants drop, scope 802.11r to a separate + SSID or disable. Senior users pause mid-hallway — don't over-aggress (−65 causes drops). +6. **Steering.** Band steering **prefer 5GHz** globally; enable 6GHz for capable clients. Consider a + separate legacy SSID for fragile 2.4-only devices so the main SSID can be tuned aggressively. +7. **Monitor + iterate** (needs live-stats plane). Re-audit neighbor density monthly — external + interference won't go away; the job is to keep shrinking 2.4 participation and packing capable + clients onto small 5/6GHz cells. 
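
The arithmetic behind step 1 can be sketched as follows. This simplification applies the 40–60% disable fraction to the whole 77-AP fleet, whereas the step itself targets dense clusters only. The function name `pruningPlan` is invented for illustration. The capacity figure is a ceiling implied by the per-radio STA target, not a measured 2.4GHz client count.

```javascript
// Sketch only: how many 2.4GHz radios stay active after pruning, and the
// client ceiling implied by a per-radio STA target. Inputs are illustrative.
function pruningPlan(totalAps, disableFraction, maxStaPerRadio) {
  const disabled = Math.round(totalAps * disableFraction); // 2.4 radios switched off
  const active = totalAps - disabled;                      // 2.4 radios left on
  return { disabled, active, capacity24: active * maxStaPerRadio };
}

// Light pruning with the loose target, then heavy pruning with the tight target.
console.log(pruningPlan(77, 0.4, 25));
console.log(pruningPlan(77, 0.6, 15));
```

With 77 APs this leaves between 31 and 46 active 2.4GHz radios. Whether that is enough depends on how many clients actually remain on 2.4GHz, which has to come from live `num_sta` data.
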
+ +## Metrics → action thresholds (drive every change from data, not vibes) +| Signal (source) | Threshold | Action | +|---|---|---| +| `cu_total` (channel utilization) | > 50–60% sustained | reduce width, disable a 2.4 radio, or move AP off that band | +| `cu_self_rx`/`cu_self_tx` vs `cu_total` | self low but total high | neighbors eating airtime → raise min data rates, shrink cell, change channel | +| `tx_retries` per radio/client | > 15–20% | co-channel/hidden-node or power too high or sticky client; try a DFS channel | +| `satisfaction` (AP/client) | < 80–90% | investigate that AP/zone | +| `num_sta` per radio | > 40 on a 5GHz radio / > 25 on 2.4 | add steering pressure or another AP; prune nearby 2.4 | +| neighbor BSSIDs per channel (`rogue`) | >> 500 on a channel | don't site critical 2.4 there; consider disabling 2.4 in that wing | +| roam quality | edge RSSI −68..−72, SNR ≥ 20 dB | healthy handoff target | + +Plane 1 (Mongo) gives: config audit, the neighbor-density (`rogue`) map, channel plan, num_sta +(last-reported), min_rssi state. Plane 2 (live Network API, not yet wired) gives: live `cu_total`, +`tx_retries`, `satisfaction`, per-client RSSI/SNR. See [data-access.md](data-access.md). + +## 30-day success criteria +Median client satisfaction > 90%; 2.4 `cu_total` < 40% on active radios; 5/6GHz carrying > 75% of +associations; tx-retry < 10% on the primary SSID. + +## Cascades-specific (the hard case) +77 U7-Pro APs / ~550 clients / 6 of 77 APs currently have 2.4 min-RSSI OFF. 2.4 is saturated on +1/6/11 (16k–33k neighbors each) → aggressive pruning is mandatory, not optional. 6GHz is nearly +empty (ch69: 86 neighbors) → the biggest untapped win is steering 6E-capable clients to 6GHz. +5GHz upper (149/157) is busier than UNII-1/DFS lower → bias the 5GHz plan toward 36–48 + DFS. + +> Caveat: these are independent-model recommendations + config-plane recon. 
Validate against live +> `cu_total`/satisfaction before/after each change (Plane 2), and roll out per-zone, not site-wide +> at once, so a bad assumption is contained. diff --git a/.claude/skills/unifi-wifi/scripts/audit-site.sh b/.claude/skills/unifi-wifi/scripts/audit-site.sh new file mode 100644 index 0000000..ce79ad2 --- /dev/null +++ b/.claude/skills/unifi-wifi/scripts/audit-site.sh @@ -0,0 +1,58 @@ +#!/usr/bin/env bash +# audit-site.sh — Plane-1 (config + interference) WiFi audit for a UniFi site on the UOS. +# Reads the controller Mongo (ace) via .claude/scripts/uos-mongo.sh; no live-stats plane needed. +# +# Usage: +# bash .claude/skills/unifi-wifi/scripts/audit-site.sh <site-name-or-id> +# bash .claude/skills/unifi-wifi/scripts/audit-site.sh cascades +# +# Output: 2.4/5/6 config summary, the per-channel neighbor-density (interference) map, and +# flagged issues (min_rssi off, 40MHz on 2.4, off-1/6/11 channels, high power). Pair with +# references/methodology.md to turn flags into changes; validate with live stats (Plane 2). 
+set -euo pipefail +REPO="$(git rev-parse --show-toplevel 2>/dev/null || echo .)" +UOS="$REPO/.claude/scripts/uos-mongo.sh" +arg="${1:?usage: audit-site.sh <site-name-or-id>}" + +if [[ "$arg" =~ ^[0-9a-f]{24}$ ]]; then + SITE="$arg" +else + SITE="$(bash "$UOS" --sites 2>/dev/null | grep -vi 'pq.html' | grep -i "$arg" | awk '{print $1}' | head -1 || true)" + [ -n "$SITE" ] || { echo "[ERROR] no site matching '$arg' (try: uos-mongo.sh --sites)"; exit 1; } +fi +echo "[INFO] auditing site_id=$SITE" + +cat <<JS | bash "$UOS" 2>&1 | grep -viE 'pq.html|post-quantum|store now|server may need' +var SITE='$SITE'; +var aps=db.device.find({site_id:SITE,type:'uap'},{name:1,radio_table:1}).toArray(); +function tally(o,k){o[k]=(o[k]||0)+1;} +var ng_ch={}, ng_w={}, ng_pwr={}, na_w={}, na_used=0, sixe_used=0, ngOffRssi=0, flags=[]; +aps.forEach(function(a){ + (a.radio_table||[]).forEach(function(r){ + if(r.radio=='ng'){ + tally(ng_ch,r.channel); tally(ng_w,r.ht); tally(ng_pwr,r.tx_power_mode); + if(!r.min_rssi_enabled){ ngOffRssi++; flags.push("2.4 min_rssi OFF: "+a.name); } + if(String(r.ht)!='20') flags.push("2.4 width "+r.ht+"MHz (want 20): "+a.name); + if([1,6,11].indexOf(r.channel)<0) flags.push("2.4 off-plan ch"+r.channel+" (want 1/6/11): "+a.name); + if(/high/i.test(String(r.tx_power_mode))) flags.push("2.4 power=high (want low/medium): "+a.name); + } else if(r.radio=='na'){ na_used++; tally(na_w,r.ht); + } else { sixe_used++; } + }); +}); +print("==== CONFIG SUMMARY ("+aps.length+" APs) ===="); +print(" 2.4GHz channels: "+JSON.stringify(ng_ch)+" (want only 1/6/11)"); +print(" 2.4GHz widths: "+JSON.stringify(ng_w)+" (want only 20)"); +print(" 2.4GHz power: "+JSON.stringify(ng_pwr)+" (want low/medium/custom in density)"); +print(" 2.4GHz min_rssi OFF on "+ngOffRssi+" radios"); +print(" 5GHz radios: "+na_used+" widths: "+JSON.stringify(na_w)+" (want 40 in density, not 80/160)"); +print(" 6GHz radios active: "+sixe_used+" (steer 6E-capable clients here — usually the clean band)"); +print("\n==== NEIGHBOR-DENSITY MAP 
(rogue = co-channel interference) ===="); +print(" 2.4GHz:"); +db.rogue.aggregate([{\$match:{site_id:SITE,band:'ng'}},{\$group:{_id:'\$channel',n:{\$sum:1}}},{\$sort:{n:-1}},{\$limit:6}]).forEach(function(d){print(" ch"+d._id+": "+d.n+" neighbor BSSIDs")}); +print(" 5GHz (top):"); +db.rogue.aggregate([{\$match:{site_id:SITE,band:'na'}},{\$group:{_id:'\$channel',n:{\$sum:1}}},{\$sort:{n:-1}},{\$limit:6}]).forEach(function(d){print(" ch"+d._id+": "+d.n)}); +print("\n==== FLAGS ("+flags.length+") ===="); +if(flags.length==0) print(" (none from config plane)"); else flags.forEach(function(f){print(" [!] "+f)}); +print("\n[next] map flags -> changes via references/methodology.md (prune 2.4, shrink cells, steer to 6GHz, manual 1/6/11)."); +print("[next] validate cu_total / satisfaction / tx_retries before+after via the live Network API (Plane 2, not yet wired)."); +JS