diff --git a/.claude/skills/unifi-wifi/SKILL.md b/.claude/skills/unifi-wifi/SKILL.md index 162feb3..eed1fcb 100644 --- a/.claude/skills/unifi-wifi/SKILL.md +++ b/.claude/skills/unifi-wifi/SKILL.md @@ -27,7 +27,14 @@ controller knows and making prioritized, validated changes. Built for any site; ``` Outputs 2.4/5/6 config summary, the per-channel neighbor-density (interference) map, and flagged issues (2.4 over-provisioning, 40/80/160MHz width, off-1/6/11 channels, min-RSSI off, high power). -2. **Interpret** the flags against `methodology.md` (fix order: prune 2.4 -> shrink cells/power -> +2. **Rank airtime-reduction candidates** — which radios to disable / power down, from real history + (`ace_stat`: per-AP airtime `cu_total`/`cu_interf`/`num_sta` + the `wifi_connectivity_event` roam + graph). Works for any band: + ```bash + bash .claude/skills/unifi-wifi/scripts/model-rank.sh [days=7] [band=ng|na|6e|all] + ``` + (Cascades 2.4: 75 radios at 74–94% utilization, 61–81% interference, ~1 client each → disable/power-down.) +3. **Interpret** the flags against `methodology.md` (fix order: prune 2.4 -> shrink cells/power -> min data rates -> manual 1/6/11 plan -> min-RSSI + roaming -> steer to 6GHz). 3. **Recommend** a prioritized, per-zone change plan. Roll out per zone, not site-wide at once. diff --git a/.claude/skills/unifi-wifi/references/data-access.md b/.claude/skills/unifi-wifi/references/data-access.md index d6de6c0..458d610 100644 --- a/.claude/skills/unifi-wifi/references/data-access.md +++ b/.claude/skills/unifi-wifi/references/data-access.md @@ -10,8 +10,12 @@ the **Cascades** site (`site_id 685f39068e65331c46ef6dd2`) as the hard case (77 | Plane | Source | Reach | Holds | |---|---|---|---| -| **Config + history** | Mongo `ace` via `uos-mongo.sh` | root SSH, fully available now | radio config, the interference map, channel-plan settings, AP/client inventory | -| **Live RF/airtime** | Controller Network API (`stat/device`, `stat/sta`) | needs a session / integration key — NOT yet wired | current channel utilization, per-client RSSI/retries/tx-rate, AP satisfaction, num_sta | +| **Config** | Mongo `ace` via `uos-mongo.sh` | root SSH, available now | radio config, foreign-interference map (`rogue`), channel-plan, device/floorplan | +| **History** | Mongo `ace_stat` (`db.getSiblingDB('ace_stat')`) | root SSH, available now | per-AP/per-band airtime time-series (`-cu_total/cu_interf/num_sta`, satisfaction, retries) in `stat_hourly`/`stat_daily`; the **roam graph** in `wifi_connectivity_event`; per-client history | +| **Live (optional)** | Controller Network API (`stat/device`, `stat/sta`) | needs a session / integration key — not wired | *current* utilization + the live RF-neighbor table; nice for before/after validation | + +> The accumulated **history plane (`ace_stat`)** is the key source for the interference / airtime +> model — it already holds what the UniFi UI shows. See [interference-model.md](interference-model.md). The live per-AP utilization and per-client RF stats are **NOT persisted in Mongo** (the `device` collection carries config but no `radio_table_stats`; the `user`/client collection only keeps diff --git a/.claude/skills/unifi-wifi/references/interference-model.md b/.claude/skills/unifi-wifi/references/interference-model.md index cc8195e..e45344c 100644 --- a/.claude/skills/unifi-wifi/references/interference-model.md +++ b/.claude/skills/unifi-wifi/references/interference-model.md @@ -1,61 +1,66 @@ -# AP interference / airtime-reduction model — design + data feasibility +# AP interference / airtime-reduction model — design + data Goal (per Mike): a fleet/site-level model that decides **which AP radios to disable, and where to reduce power**, to cut total airtime contention while preserving client coverage. Per AP **per -radio, all bands** (not 2.4-only, not per-client). Inputs: each AP's view of neighboring APs (RF) -+ historical client connections. +radio, all bands** (not 2.4-only, not per-client). Inputs: each AP's neighbor/overlap relationships ++ historical client connections + airtime. -## Data feasibility (probed on Cascades 2026-06-15) -| Signal the model wants | In Mongo `ace`? | Source to use | +## Data — it's ALL in the controller (corrected 2026-06-15) +First pass only checked the `ace` DB (config/current state) and wrongly concluded the history +"isn't there." It is — in **`ace_stat`** (1.5 GB of accumulated time-series) and the +`wifi_connectivity_event` collection. **No external collector is needed**; the controller already +retains it. Three databases on the UOS Mongo (port 27117): + +| DB | Holds | Use | |---|---|---| -| **Our AP ↔ our AP RF visibility** (A hears B at RSSI r) | **NO** — `rogue` is FOREIGN APs only; our managed APs are filtered out (0 rows match our SSIDs) | **Live Network API** `stat/device` neighbor table / triggered RF scan (Plane 2) | -| **Historical client→AP connections / roam overlap** | **NO** — `user` keeps only `last_uplink_mac` (last AP); no sessions, `alarm` empty, `stat` collections empty | **Accumulated `stat/sta` polling** over time (Plane 2 + a collector) | -| **Physical AP coordinates** | **NO** — 0 APs placed on the 1 floorplan | derive coarse topology from **AP names** (room#/floor encoded) | -| Radio config (channel/band/width/power/min_rssi) | **YES** | Mongo `device.radio_table` (Plane 1) | -| Foreign interference per channel | **YES** | Mongo `rogue` aggregate (Plane 1) | +| `ace` | config + current state | radio_table (channel/band/width/power/min_rssi), `rogue` (FOREIGN interference), `channelplan`, device/floorplan | +| **`ace_stat`** | **time-series history** | the model's airtime + client history (below) | +| `ace_audit` | audit log | — | -**Conclusion:** the interference graph the model needs (our-AP mutual RSSI + client overlap) -**cannot** be built from Mongo. It requires **Plane 2 (the live Network API)** plus a **collector** -that accumulates snapshots over time. Mongo gives config + foreign-interference + (via names) a -coarse topology prior to seed the model before enough live data is collected. +### `ace_stat.stat_hourly` / `stat_daily` (o:'ap') — per-AP per-band airtime history +Flat per-band fields (`ng`=2.4, `na`=5, `6e`=6): **`-cu_total`** (channel utilization %), +**`-cu_interf`** (airtime lost to interference %), **`-cu_self_rx/tx`**, **`-num_sta`**, +`-satisfaction`, `-tx_retries`. Plus `ap` (AP mac), `site_id`, `time` (ms epoch). This +is the historical airtime/interference/load profile per AP per band — the core "who's contending" +signal. (Also o:'user' per-client retries/anomalies; o:'site' rollups.) -## Model design -Per band `b` (ng/na/6e), build a weighted graph over AP radios: +### `ace_stat.wifi_connectivity_event` — the roam graph (empirical AP adjacency) +Each doc: `from_endpoint{mac(AP), channel, channel_width, band, rssi}`, `to_endpoint{mac(AP), ...}`, +`client_mac`, `successful`, `time`, `_class:WIFI_ROAMING`. Every roam is an **edge between two APs a +real client handed off between** → empirical coverage overlap/adjacency, weighted by volume, with +the handoff RSSI (coverage quality). This is the "historical connections → which APs cover the same +space" signal, and it's better than a raw RF scan because it reflects where clients actually move. -- **Nodes:** each AP's radio on band `b`. -- **RF edges** `w_rf(A,B)`: from the live neighbor table — how strongly A hears B (and vice-versa), - scaled up when same/overlapping channel. Strong mutual RSSI on the same channel = high - co-channel interference. -- **Overlap edges** `w_ov(A,B)`: fraction of clients that have associated with BOTH A and B over - the collection window (built by snapshotting `stat/sta` every N minutes). High overlap = they - cover the same space → one is redundant. -- **Per-radio metrics:** `load` (num_sta, live `cu_total`), `unique_coverage` (clients only this - radio serves at good RSSI), `interference_contribution` (Σ strong RF edges on same channel). +### Still useful from `ace` / setup +- `rogue` aggregate = FOREIGN co-channel interference per channel (Plane-1 audit). +- **Floorplan coords**: not set today (0/77 APs placed). Mike is willing to use the floorplan + feature → once APs are placed, `device` x/y gives true physical distance for adjacency edges + (complements the roam graph, esp. for APs that rarely roam-share). +- Live Network API (`stat/device`/`stat/sta`) still adds *current* utilization + the AP's live + RF-neighbor table; nice-to-have for before/after validation, not required to build the model. -**Recommendation logic (greedy, coverage-safe):** -1. **Disable a radio** when: high `interference_contribution` AND high `coverage_redundancy` - (its clients keep good signal from neighbors) AND `unique_coverage ≈ 0`. Disable the worst - offender, recompute the graph, repeat until a redundancy floor is hit (don't open holes). -2. **Reduce power** when interference is high but `unique_coverage > 0` (can't disable without a - hole) — shrink the cell to cut contention while keeping coverage. -3. **Leave** radios that carry unique coverage and contribute little interference. -Band weighting: 2.4 prunes most aggressively (most redundant + least capacity value); 5/6 lighter; -6GHz usually keep (clean band, steer up). Output = ranked per-AP-per-radio actions with the metric -that justified each, applied **per zone** with live before/after validation. +## Model +Per band `b`, over a history window (default 7d): +- **Per-radio airtime**: avg `cu_total`, avg `cu_interf`, avg `num_sta` from `stat_hourly`. +- **Overlap/redundancy**: roam volume per AP and per AP-pair from `wifi_connectivity_event` (high + roam-share to neighbors ⇒ a client leaving this radio has somewhere to land ⇒ redundant). +- **Score (v1, `model-rank.sh`)**: `(cu_total + cu_interf) * log(1+roams) / (clients+1)` — high = + this radio burns airtime in a contended band AND its clients are mobile/redundant ⇒ shrink or + disable. Hint: **DISABLE** when high roam + near-zero clients; **POWER-DOWN** when busy but holds + some unique load. +- **v2 (greedy, coverage-safe)**: iteratively disable the top candidate, recompute overlap so we + never open a hole (stop when a radio's clients would lose their only good-RSSI neighbor), and add + floorplan-distance edges once APs are placed. Output = ranked per-AP-per-radio actions with the + metric that justified each, applied per zone with before/after validation. -## Prerequisites to build it (the real next step) -1. **Wire Plane 2** — provision a dedicated **read-only UniFi admin or Network integration API key** - on `.29` (doable with our root SSH), vault as `infrastructure/uos-server-network-api`. Gives - `stat/device` (live neighbor RSSI, `cu_total`, `num_sta`, satisfaction) + `stat/sta` (client→AP). -2. **Stand up a collector** — a periodic job (cron on `.30`/a fleet host) snapshotting - `stat/device` + `stat/sta` into a small store (sqlite/postgres). The **overlap + RF matrix - accrue over the collection window** (a week+ gives a usable model; longer = better). This is the - "historical look at devices connected" Mike asked for — the controller doesn't retain it, so we - accumulate it ourselves. -3. **Build the model** on the accumulated data; seed early recommendations from the Mongo config + - AP-name topology prior until enough live data exists. +## v1 result (Cascades 2.4GHz, 7d) — the smoking gun +`model-rank.sh cascades 7 ng`: 2.4 radios run **cu_total 74–94%, cu_interf 61–81%, ~0.3–2.6 clients +each** across 75 APs. Translation: 2.4 is saturated and mostly interference, serving almost no one — +a textbook case to disable 2.4 on most APs and power down the rest. Run `na`/`6e` for the 5/6GHz +picture (expected: keep, with 6GHz the clean capacity band). ## Status -Phase 1 (config + foreign-interference audit) is built (`scripts/audit-site.sh`). The interference -model is **blocked on Plane 2 + the collector** — needs a go to provision the UniFi API account and -stand up the collector. +- `scripts/audit-site.sh` — config + foreign-interference audit (Plane 1). +- `scripts/model-rank.sh` — **v1 airtime-reduction ranker from real history** (this doc). Works now. +- Next: v2 greedy coverage-safe optimizer + floorplan-distance edges (after APs are placed on the + floorplan) + optional live-API before/after validation. diff --git a/.claude/skills/unifi-wifi/scripts/model-rank.sh b/.claude/skills/unifi-wifi/scripts/model-rank.sh new file mode 100644 index 0000000..2c0a7c4 --- /dev/null +++ b/.claude/skills/unifi-wifi/scripts/model-rank.sh @@ -0,0 +1,63 @@ +#!/usr/bin/env bash +# model-rank.sh — rank AP radios as airtime-reduction candidates from ACCUMULATED history. +# Data (all from ace_stat, already collected by the controller — no new collector needed): +# - airtime/interference: stat_hourly (o:'ap') -cu_total, -cu_interf, -num_sta +# - coverage overlap: wifi_connectivity_event (client roams between AP pairs) +# Per band (ng/na/6e). A radio is a strong DISABLE candidate when it carries high interference +# airtime AND its clients heavily roam to other APs (redundant coverage); a POWER-DOWN candidate +# when busy/interfering but with less roam redundancy. This is a v1 ranker, not the final greedy +# optimizer — see references/interference-model.md. +# +# Usage: bash .claude/skills/unifi-wifi/scripts/model-rank.sh [days=7] [band=ng|na|6e|all] +set -euo pipefail +REPO="$(git rev-parse --show-toplevel 2>/dev/null || echo .)" +UOS="$REPO/.claude/scripts/uos-mongo.sh" +arg="${1:?usage: model-rank.sh [days] [band]}"; DAYS="${2:-7}"; BAND="${3:-ng}" +if [[ "$arg" =~ ^[0-9a-f]{24}$ ]]; then SITE="$arg"; else + SITE="$(bash "$UOS" --sites 2>/dev/null | grep -vi 'pq.html' | grep -i "$arg" | awk '{print $1}' | head -1)" + [ -n "$SITE" ] || { echo "[ERROR] no site matching '$arg'"; exit 1; } +fi +echo "[INFO] site=$SITE window=${DAYS}d band=$BAND" + +cat <&1 | grep -viE 'pq.html|post-quantum|store now|server may need' +var SITE='$SITE', DAYS=$DAYS, BAND='$BAND'; +var ace=db.getSiblingDB('ace'), st=db.getSiblingDB('ace_stat'); +var since = new Date().getTime() - DAYS*86400000; +var bands = (BAND=='all') ? ['ng','na','6e'] : [BAND]; +// ap mac -> name +var name={}; ace.device.find({site_id:SITE,type:'uap'},{mac:1,name:1}).forEach(function(a){name[a.mac]=a.name||a.mac;}); +// roam volume per AP (coverage-overlap proxy: high roam = redundant neighbors exist) +var roam={}; +st.wifi_connectivity_event.find({site_id:SITE, time:{\$gte:since}},{from_endpoint:1,to_endpoint:1}).forEach(function(e){ + [e.from_endpoint&&e.from_endpoint.mac, e.to_endpoint&&e.to_endpoint.mac].forEach(function(m){ if(m) roam[m]=(roam[m]||0)+1; }); +}); +// airtime profile per AP per band from stat_hourly +var prof={}; +st.stat_hourly.find({o:'ap', site_id:SITE, time:{\$gte:since}}).forEach(function(d){ + var ap=d.ap; if(!ap) return; if(!prof[ap])prof[ap]={}; + bands.forEach(function(b){ + var cu=d[b+'-cu_total'], intf=d[b+'-cu_interf'], sta=d[b+'-num_sta']; + if(cu==null && intf==null) return; + if(!prof[ap][b])prof[ap][b]={cu:0,intf:0,sta:0,n:0}; + var p=prof[ap][b]; p.cu+=(cu||0); p.intf+=(intf||0); p.sta+=(sta||0); p.n++; + }); +}); +bands.forEach(function(b){ + var rows=[]; + for(var ap in prof){ var p=prof[ap][b]; if(!p||!p.n) continue; + var avgCu=p.cu/p.n, avgIntf=p.intf/p.n, avgSta=p.sta/p.n, rm=roam[ap]||0; + // score: airtime pressure (cu+interf) weighted, * redundancy(roam) / (load+1) + var score = (avgCu + avgIntf) * Math.log(1+rm) / (1+avgSta); + rows.push({ap:ap, name:name[ap]||ap, cu:avgCu, intf:avgIntf, sta:avgSta, roam:rm, score:score}); + } + rows.sort(function(a,b){return b.score-a.score;}); + print("\\n==== band="+b+" top airtime-reduction candidates (disable/power-down) ===="); + print(" rank AP cu% interf% ~clients roams score hint"); + rows.slice(0,15).forEach(function(r,i){ + var hint = (r.roam>50 && r.sta<3) ? "DISABLE (redundant, low load)" : (r.cu+r.intf>40 ? "POWER-DOWN (busy)" : "review"); + print(" "+(i+1)+"\\t"+(r.name.substring(0,24)+" ").substring(0,24)+" "+r.cu.toFixed(0)+"\\t"+r.intf.toFixed(0)+"\\t"+r.sta.toFixed(1)+"\\t"+r.roam+"\\t"+r.score.toFixed(1)+"\\t"+hint); + }); + print(" (APs profiled on "+b+": "+rows.length+")"); +}); +print("\\n[note] v1 heuristic: score = (cu_total + cu_interf) * log(1+roams) / (clients+1). High = busy+interfered AND clients have somewhere else to roam = safe to shrink/disable. Validate per-zone before applying. Full greedy coverage-safe optimizer = v2 (interference-model.md)."); +JS