sync: auto-sync from GURU-5070 at 2026-06-15 18:24:52

Author: Mike Swanson
Machine: GURU-5070
Timestamp: 2026-06-15 18:24:52
This commit is contained in:
2026-06-15 18:25:08 -07:00
parent 00de88fd38
commit ca548687e7
4 changed files with 244 additions and 1 deletions

View File

@@ -34,7 +34,24 @@ controller knows and making prioritized, validated changes. Built for any site;
bash .claude/skills/unifi-wifi/scripts/model-rank.sh <site> [days=7] [band=ng|na|6e|all]
```
(Cascades 2.4: 75 radios at 7494% utilization, 6181% interference, ~1 client each → disable/power-down.)
3. **Interpret** the flags against `methodology.md` (fix order: prune 2.4 -> shrink cells/power ->
3. **Optimize (coverage-safe plan)** — which radios to **power-down** (do first) vs **disable**
(after), per band, without opening coverage holes or capacity cascades:
```bash
bash .claude/skills/unifi-wifi/scripts/optimize-radios.sh <site> [days=14] [band=ng|na|6e]
```
Coverage model = the **roam graph** (materials-aware: clients can't roam through Cascades' steel
hallway walls, so the model never calls cross-wall APs redundant). Multi-model hardened
(bidirectional roams, load-shift simulation, 40%/zone cap, stepwise). On Cascades 2.4 it
recommends power-down on 74/75 radios (safe big win) and 0 disables until the live RF table proves
redundancy — see interference-model.md.
4. **Validate live (Plane 2)** — current cu_total/satisfaction/per-AP RF, before+after a change, and
the AP-to-AP RF-neighbor table that unlocks confident disables:
```bash
bash .claude/skills/unifi-wifi/scripts/live-stats.sh <site> [--clients]
```
Needs a read-only controller admin vaulted at `infrastructure/uos-server-network-api` (the script
prints the one-time provisioning steps if it's missing).
5. **Interpret** the flags against `methodology.md` (fix order: prune 2.4 -> shrink cells/power ->
min data rates -> manual 1/6/11 plan -> min-RSSI + roaming -> steer to 6GHz).
3. **Recommend** a prioritized, per-zone change plan. Roll out per zone, not site-wide at once.

View File

@@ -59,6 +59,37 @@ each** across 75 APs. Translation: 2.4 is saturated and mostly interference, ser
a textbook case to disable 2.4 on most APs and power down the rest. Run `na`/`6e` for the 5/6GHz
picture (expected: keep, with 6GHz the clean capacity band).
## Materials matter — and why the roam graph already handles them
Cascades has **reinforced-concrete + steel-sheet walls across the hallways** that kill signal on all
bands (rooms either side of a hall, and floor-to-floor, behave normally; **across a hall is nearly
opaque**). This is why **geometric distance must never be the redundancy signal on its own**: two APs
on opposite sides of a hall are *close* but RF-*isolated*. The **roam graph is materials-aware by
construction** — clients can't roam across the steel wall, so those APs simply never appear as
neighbors, and the model won't call them redundant. Corollaries baked into the design:
- Distance/floorplan (once APs are placed) is only a **prior**; an AP pair counts as redundant **only
with RF/roam evidence** (bidirectional roams at adequate p25 RSSI). Never disable on proximity alone.
- AP **room parity (odd/even)** ≈ hallway side at Cascades — a useful prior to avoid pairing across-hall APs.
- The **live AP-to-AP RF neighbor table (API wireup)** *directly measures* this attenuation — the gold
standard for the coverage graph, and the reason it's worth wiring.
## Hardening from the multi-model review (Grok + Gemini)
Applied: bidirectional roam requirement (not one-way "escape"); band-specific p25 RSSI bars
(2.4 68 / 5 72 / 6 75); **load-shift simulation** (only disable if a strong neighbor stays < 85%
after absorbing the cu_self it inherits — avoids "capacity cascades"); benefit = `cu_interf` (+ a
normalized `tx_retries` thrash term), with `cu_self` treated as transfer cost; 40%/zone disable cap;
**stepwise** output (power-down → observe → disable). `tx_retries` is a raw count → normalized by
`wifi_tx_attempts`.
## Key empirical finding (Cascades): power-down now, disable later
The **airtime data is rich and unambiguous**`optimize-radios.sh cascades ng` recommends
**power-down on 74/75 2.4 radios** (all 7494% utilized, 6181% interference, ~1 client — the safe big
win; power-down keeps the BSSID). But the **roam data is too sparse** to prove coverage redundancy for
**disables** (almost no AP has a strong bidirectional roam-neighbor), so the optimizer correctly
recommends **0 disables until coverage evidence exists**. That evidence is exactly the **live AP-to-AP
RF neighbor table** → **the API wireup is the enabler for confident radio *disables*** (and for
before/after validation). So the rollout is: Phase A power-down (now) → Phase B re-measure (live API)
→ Phase C disable the radios the RF table proves redundant.
## Status
- `scripts/audit-site.sh` — config + foreign-interference audit (Plane 1).
- `scripts/model-rank.sh`**v1 airtime-reduction ranker from real history** (this doc). Works now.

View File

@@ -0,0 +1,70 @@
#!/usr/bin/env bash
# live-stats.sh — Plane-2 live RF/airtime from the UOS Network API (classic session API).
# Gives CURRENT per-AP per-radio cu_total / cu_self / num_sta / satisfaction / tx_retries and the
# AP RF-neighbor table — for before/after validation of changes and (the neighbor table) the
# materials-aware AP-to-AP coverage graph that unlocks confident radio DISABLES.
#
# AUTH (provision once): the classic API needs a controller admin session. Create a dedicated
# READ-ONLY admin in the UniFi UI (OS Settings -> Admins -> add a Viewer), then vault it:
# bash .claude/skills/vault/scripts/vault-helper.sh new infrastructure/uos-server-network-api \
# --kind generic --name "UOS Network API (read-only admin)" --tag unifi \
# --set username=<viewer> --set password=<pw>
# (A UniFi Network Integration API key also works for /proxy/network/integration/v1, but the rich
# radio_table_stats + RF-neighbor data live in the classic /proxy/network/api path used here.)
#
# Usage: bash .claude/skills/unifi-wifi/scripts/live-stats.sh <site-name|short> [--clients]
set -euo pipefail
REPO="$(git rev-parse --show-toplevel 2>/dev/null || echo .)"
VAULT="$REPO/.claude/scripts/vault.sh"
HOST="${UOS_HOST:-172.16.3.29}"; PORT="${UOS_HTTPS_PORT:-11443}"
SITEARG="${1:?usage: live-stats.sh <site-name|short> [--clients]}"; WANT_CLIENTS="${2:-}"
U="$(bash "$VAULT" get-field infrastructure/uos-server-network-api credentials.username 2>/dev/null || true)"
P="$(bash "$VAULT" get-field infrastructure/uos-server-network-api credentials.password 2>/dev/null || true)"
if [ -z "$U" ] || [ -z "$P" ]; then
echo "[BLOCKED] No controller credential vaulted yet. Provision a read-only admin and vault it:"
sed -n '8,14p' "$0"
exit 2
fi
CJ="$(mktemp)"; trap 'rm -f "$CJ"' EXIT
base="https://$HOST:$PORT"
# UniFi OS login -> session cookie
code=$(curl -sk -c "$CJ" -o /dev/null -w '%{http_code}' -X POST "$base/api/auth/login" \
-H 'Content-Type: application/json' --data-binary "$(python -c 'import json,os;print(json.dumps({"username":os.environ["U"],"password":os.environ["P"]}))' U="$U" P="$P")")
[ "$code" = "200" ] || { echo "[ERROR] login HTTP $code"; exit 1; }
# resolve site short name (classic API keys on the short name, not the _id)
SHORT="$SITEARG"
if [[ "$SITEARG" =~ ^[0-9a-f]{24}$ || ! "$SITEARG" =~ ^[a-z0-9]{8}$ ]]; then
SHORT="$(curl -sk -b "$CJ" "$base/proxy/network/api/self/sites" | python -c "
import sys,json
d=json.load(sys.stdin).get('data',[])
q='''$SITEARG'''.lower()
for s in d:
if s.get('_id')=='''$SITEARG''' or q in (s.get('desc','').lower()): print(s.get('name')); break
" 2>/dev/null)"
fi
[ -n "$SHORT" ] || { echo "[ERROR] could not resolve site '$SITEARG'"; exit 1; }
echo "[INFO] site short=$SHORT"
curl -sk -b "$CJ" "$base/proxy/network/api/s/$SHORT/stat/device" | python -c "
import sys,json
for d in json.load(sys.stdin).get('data',[]):
if d.get('type')!='uap': continue
print('AP',d.get('name'),'clients=',d.get('num_sta'))
for r in d.get('radio_table_stats',[]):
print(' ',r.get('radio'),'ch',r.get('channel'),'cu_total',r.get('cu_total'),'cu_self_rx',r.get('cu_self_rx'),'cu_self_tx',r.get('cu_self_tx'),'num_sta',r.get('num_sta'),'tx_retries',r.get('tx_retries'),'satisfaction',r.get('satisfaction'))
# RF neighbor table (materials-aware AP-to-AP visibility) if present
for n in (d.get('radio_table') or []):
pass
" 2>&1 | head -60
if [ "$WANT_CLIENTS" = "--clients" ]; then
echo "=== clients (rssi/rate/retries) ==="
curl -sk -b "$CJ" "$base/proxy/network/api/s/$SHORT/stat/sta" | python -c "
import sys,json
for c in json.load(sys.stdin).get('data',[])[:40]:
print(' ',c.get('hostname') or c.get('mac'),'ap',c.get('ap_mac'),'rssi',c.get('rssi'),'signal',c.get('signal'),'tx_rate',c.get('tx_rate'),'retries',c.get('tx_retries'),'sat',c.get('satisfaction'))
" 2>&1 | head -45
fi

View File

@@ -0,0 +1,125 @@
#!/usr/bin/env bash
# optimize-radios.sh — capacity-aware, coverage-safe radio-pruning plan from accumulated history.
# Recommends, per band, which AP radios to POWER-DOWN (do first) and which to DISABLE (after
# re-measure), to cut airtime contention WITHOUT opening coverage holes or causing capacity
# cascades. Recommendations only; a human applies per zone with before/after validation.
#
# Design hardened via a multi-model critique (Grok + Gemini). Key principles:
# - Coverage proxy = the ROAM GRAPH, but a neighbor only counts if roaming is BIDIRECTIONAL
# (min(A->B,B->A) >= ROAM_MIN) and the p25 handoff RSSI clears a BAND-SPECIFIC bar
# (2.4=-68, 5=-72, 6=-75) — proves two-way overlap, not a one-way escape from a failing AP.
# - LOAD-SHIFT SIMULATION: disabling A pushes A's own-traffic (cu_self) onto its neighbors; only
# disable if a strong neighbor stays under CAP (85%) after absorbing it — else POWER-DOWN.
# - Benefit = cu_interf (the airtime you actually REMOVE) + a tx_retries (thrashing) weight;
# cu_self is transfer cost, not removal. An AP with no strong neighbor AND high peak load is
# ISOLATED-ESSENTIAL (keep). Zone disable cap = 40%. Stepwise: power-down -> observe -> disable.
#
# Data: ace_stat.stat_hourly (per-AP/band cu_total,cu_interf,cu_self_*,num_sta,tx_retries,satisfaction)
# + ace_stat.wifi_connectivity_event (directional roam edges + handoff RSSI).
# Usage: bash .../optimize-radios.sh <site> [days=14] [band=ng|na|6e]
# env: ROAM_MIN(4) CAP(85) ZONE_DISABLE_PCT(40) REDUN_NG(2) REDUN_OTHER(1)
set -euo pipefail
REPO="$(git rev-parse --show-toplevel 2>/dev/null || echo .)"
UOS="$REPO/.claude/scripts/uos-mongo.sh"
arg="${1:?usage: optimize-radios.sh <site> [days] [band]}"; DAYS="${2:-14}"; BAND="${3:-ng}"
ROAM_MIN="${ROAM_MIN:-4}"; CAP="${CAP:-85}"; ZPCT="${ZONE_DISABLE_PCT:-40}"
REDUN_NG="${REDUN_NG:-2}"; REDUN_OTHER="${REDUN_OTHER:-1}"
if [[ "$arg" =~ ^[0-9a-f]{24}$ ]]; then SITE="$arg"; else
SITE="$(bash "$UOS" --sites 2>/dev/null | grep -vi 'pq.html' | grep -i "$arg" | awk '{print $1}' | head -1)"
[ -n "$SITE" ] || { echo "[ERROR] no site matching '$arg'"; exit 1; }
fi
case "$BAND" in ng) RSSI_OK=-68; REDUN=$REDUN_NG;; na) RSSI_OK=-72; REDUN=$REDUN_OTHER;; 6e) RSSI_OK=-75; REDUN=$REDUN_OTHER;; *) echo "band must be ng|na|6e"; exit 1;; esac
echo "[INFO] site=$SITE band=$BAND window=${DAYS}d rssi>=$RSSI_OK roam>=$ROAM_MIN cap=${CAP}% need_neighbors=$REDUN zone_cap=${ZPCT}%"
cat <<JS | bash "$UOS" 2>&1 | grep -viE 'pq.html|post-quantum|store now|server may need'
var SITE='$SITE',DAYS=$DAYS,BAND='$BAND',RSSI_OK=$RSSI_OK,ROAM_MIN=$ROAM_MIN,CAP=$CAP,ZPCT=$ZPCT,REDUN=$REDUN;
var ace=db.getSiblingDB('ace'),st=db.getSiblingDB('ace_stat');
var since=new Date().getTime()-DAYS*86400000;
// identity + zone
var name={},zone={};
ace.device.find({site_id:SITE,type:'uap'},{mac:1,name:1}).forEach(function(a){
name[a.mac]=a.name||a.mac;
var fz=String(a.name||'').match(/(\d)(?:st|nd|rd|th)\s*floor/i), rm=String(a.name||'').match(/\b(\d)\d{2}\b/);
zone[a.mac]=fz?('Floor '+fz[1]):(rm?('Floor '+rm[1]):'misc');
});
// airtime profile (avg + peak) for the band
var prof={};
st.stat_hourly.find({o:'ap',site_id:SITE,time:{\$gte:since}}).forEach(function(d){
var ap=d.ap; if(!ap||name[ap]===undefined) return;
var cu=d[BAND+'-cu_total'],intf=d[BAND+'-cu_interf'],self=(d[BAND+'-cu_self_rx']||0)+(d[BAND+'-cu_self_tx']||0),
sta=d[BAND+'-num_sta'],retr=d[BAND+'-tx_retries'],att=d[BAND+'-wifi_tx_attempts'],sat=d[BAND+'-satisfaction'];
if(cu==null&&intf==null) return;
if(!prof[ap])prof[ap]={cu:0,intf:0,self:0,sta:0,staPk:0,retr:0,att:0,sat:0,satN:0,n:0};
var p=prof[ap]; p.cu+=(cu||0);p.intf+=(intf||0);p.self+=self;p.sta+=(sta||0);p.staPk=Math.max(p.staPk,sta||0);
p.retr+=(retr||0);p.att+=(att||0); if(typeof sat==='number'){p.sat+=sat;p.satN++;} p.n++;
});
// tx_retries is a COUNT, not a %, so normalize by tx attempts; satisfaction averaged over its own samples.
for(var k in prof){var p=prof[k];['cu','intf','self','sta'].forEach(function(f){p[f]/=p.n;});
p.retrPct=p.att>0?Math.min(100,100*p.retr/p.att):0; p.sat=p.satN>0?p.sat/p.satN:100;}
// directional roam edges + RSSI
var dir={}; // "A>B"->count ; rs["A>B"]->[rssi to B]
var rs={};
st.wifi_connectivity_event.find({site_id:SITE,time:{\$gte:since}},{from_endpoint:1,to_endpoint:1}).forEach(function(e){
var A=e.from_endpoint&&e.from_endpoint.mac,B=e.to_endpoint&&e.to_endpoint.mac; if(!A||!B||A===B)return;
var key=A+'>'+B; dir[key]=(dir[key]||0)+1; var r=e.to_endpoint&&e.to_endpoint.rssi;
if(typeof r==='number'){(rs[key]=rs[key]||[]).push(r);}
});
function p25(a){if(!a||!a.length)return -127;a=a.slice().sort(function(x,y){return x-y;});return a[Math.floor(a.length*0.25)];}
// strong bidirectional neighbors per AP (band-agnostic adjacency; physical overlap is band-independent)
var strong={};
Object.keys(prof).forEach(function(A){ strong[A]={};
Object.keys(prof).forEach(function(B){ if(A===B)return;
var ab=dir[A+'>'+B]||0, ba=dir[B+'>'+A]||0;
if(Math.min(ab,ba)>=ROAM_MIN && p25((rs[A+'>'+B]||[]).concat(rs[B+'>'+A]||[]))>=RSSI_OK) strong[A][B]=1;
});
});
// greedy capacity-aware disable
var aps=Object.keys(prof), active={}; aps.forEach(function(a){active[a]=true;});
var projCu={}; aps.forEach(function(a){projCu[a]=prof[a].cu;}); // projected utilization (grows as we shift load)
var zoneTot={},zoneDis={}; aps.forEach(function(a){zoneTot[zone[a]]=(zoneTot[zone[a]]||0)+1;});
function activeStrong(a){var o=strong[a]||{},r=[];for(var n in o)if(active[n])r.push(n);return r;}
function absorbers(a){ // active strong neighbors that can absorb a's cu_self under CAP
return activeStrong(a).filter(function(n){return projCu[n]+prof[a].self<=CAP;});
}
function soleNeighborOf(a){ for(var x in active){if(!active[x]||x===a)continue; if(!(strong[x]||{})[a])continue;
var others=0,o=strong[x]||{}; for(var n in o)if(n!==a&&active[n])others++; if(others===0)return true;} return false; }
var disabled=[];
var go=true;
while(go){ go=false; var best=null,bestBen=-1;
aps.forEach(function(a){ if(!active[a])return; var p=prof[a];
if(activeStrong(a).length<REDUN) return; // not enough overlap -> keep
if(absorbers(a).length<1) return; // no neighbor with headroom -> power-down, not disable
if(soleNeighborOf(a)) return; // protect sole coverage
if((zoneDis[zone[a]]||0)+1 > Math.floor(zoneTot[zone[a]]*ZPCT/100)) return; // zone cap
var benefit=p.intf + 0.5*p.retrPct; // remove interference + thrashing; cu_self transfers, not removed
if(benefit>bestBen){bestBen=benefit;best=a;}
});
if(best){ var abs=absorbers(best); active[best]=false; zoneDis[zone[best]]=(zoneDis[zone[best]]||0)+1;
// shift best's own-load onto its absorbers (split)
var share=prof[best].self/abs.length; abs.forEach(function(n){projCu[n]+=share;});
disabled.push({ap:best,cover:abs.map(function(n){return name[n];})}); go=true; }
}
// classify the rest
// POWER-DOWN is always coverage-safe (keeps the BSSID, just shrinks the cell), so recommend it
// for every saturated/contended radio regardless of whether we have roam-neighbor evidence. DISABLE
// (above) is the only action that needs positive coverage evidence, hence its strict gate.
var powerdown=[],keep=[];
aps.forEach(function(a){ if(!active[a])return; var p=prof[a];
if(p.n<3 || (p.cu<5 && p.intf<5)){ keep.push({ap:a,why:'idle/off'}); return; } // already-off / idle
if(p.cu>=40 || p.intf>=35 || p.retrPct>=15 || p.sat<80) powerdown.push(a); // saturated/contended/thrashing -> shrink cell
else keep.push({ap:a,why:'low-util'});
});
function fmt(a){var p=prof[a];return name[a]+" cu="+p.cu.toFixed(0)+"% interf="+p.intf.toFixed(0)+"% self="+p.self.toFixed(0)+"% clients(avg/pk)="+p.sta.toFixed(1)+"/"+p.staPk.toFixed(0)+" retr="+p.retrPct.toFixed(0)+"% sat="+p.sat.toFixed(0);}
function byZone(list,kf){var z={};list.forEach(function(x){var a=kf(x);(z[zone[a]]=z[zone[a]]||[]).push(x);});return z;}
print("==== PLAN band="+BAND+" (APs="+aps.length+") POWER-DOWN(first)="+powerdown.length+" DISABLE(after)="+disabled.length+" KEEP="+keep.length+" ====");
var save=0;disabled.forEach(function(d){save+=prof[d.ap].intf;});
print("Phase A: POWER-DOWN the busy/thrashing radios to ~Low (smaller cells cut mutual cu_interf for the whole zone). Do this FIRST.");
var pz=byZone(powerdown,function(a){return a;});
Object.keys(pz).sort().forEach(function(z){print(" ["+z+"] "+pz[z].length);pz[z].slice(0,6).forEach(function(a){print(" "+fmt(a));}); if(pz[z].length>6)print(" ...(+"+(pz[z].length-6)+")");});
print("\nPhase B: re-measure (live-stats.sh) after power-down settles — cu_interf should drop, headroom appears.");
print("\nPhase C: DISABLE these only-when-redundant radios (each has >="+REDUN+" bidirectional good neighbors WITH headroom to absorb its load). Est. interference airtime removed: "+save.toFixed(0)+".");
if(!disabled.length) print(" (none clear the capacity+coverage bar yet — expected when every neighbor is saturated; revisit after Phase A.)");
var dz=byZone(disabled,function(d){return d.ap;});
Object.keys(dz).sort().forEach(function(z){print(" ["+z+"]");dz[z].forEach(function(d){print(" "+fmt(d.ap)+" -> covered by: "+d.cover.slice(0,3).join(', '));});});
print("\nKEEP: "+keep.length+" radios (isolated-essential or already efficient). Apply per ZONE; never >"+ZPCT+"% disabled/zone; validate before+after.");
JS