From fb835fe75644f5092fe03e1e78e57811f79b92c5 Mon Sep 17 00:00:00 2001
From: Howard Enos
Date: Fri, 19 Jun 2026 05:00:47 -0700
Subject: [PATCH] =?UTF-8?q?unifi-wifi:=20data-driven=20channel=20selection?=
 =?UTF-8?q?=20=E2=80=94=20add=20survey-report,=20kill=20non-DFS=20bias?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Codifies the scan-first/data-driven workflow proven on Cascades (where the
baked-in non-DFS bias picked the congested channels and a data-driven DFS plan
halved 5GHz retry):

- NEW survey-report.py: rolls survey-collect JSON into the fleet
  per-channel/per-band-group measured busy% table + cleanest/dirtiest ranking
  + a suggested clean 40MHz palette. The decision-driver that was missing (we
  built it by hand).
- channel-plan.sh: na palette is now DATA-DRIVEN, not hardcoded non-DFS. Adds
  --channels (explicit palette) + --dfs ok|avoid|only; default considers ALL
  40MHz primaries and lets measured busy% choose. Adds load-balancing + a
  local-search pass -> strong co-channel to 0.
- survey-collect.sh: per-AP "cleanest" report no longer pre-filters out DFS
  (DFS is usually cleanest here); marks DFS with *, points at survey-report.
- SKILL.md: documents the mandatory scan -> survey-report -> channel-plan
  --channels -> apply -> validate order + the Cascades lesson.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 .claude/skills/unifi-wifi/SKILL.md            | 28 ++++++-
 .../skills/unifi-wifi/scripts/channel-plan.sh | 50 ++++++++---
 .../unifi-wifi/scripts/survey-collect.sh      | 10 ++-
 .../unifi-wifi/scripts/survey-report.py       | 83 +++++++++++++++++++
 4 files changed, 153 insertions(+), 18 deletions(-)
 create mode 100644 .claude/skills/unifi-wifi/scripts/survey-report.py

diff --git a/.claude/skills/unifi-wifi/SKILL.md b/.claude/skills/unifi-wifi/SKILL.md
index 1f2a58e5..14b3e1f0 100644
--- a/.claude/skills/unifi-wifi/SKILL.md
+++ b/.claude/skills/unifi-wifi/SKILL.md
@@ -255,14 +255,34 @@ Closing an internet-facing PPTP usually = `pf-set-ports VPN 80,443` (drop tcp 17
 admin REST (`rest/portforward|firewallrule|firewallgroup`). `block-ips` clones an existing WAN_IN rule's schema for firmware compatibility — verify the new rule's precedence in the UI. Dry-run validated 2026-06-16 on Grabb & Durando (USG-3P): identified the live `VPN` forward (80,443,1723→.200) + `GRE` WAN_IN accept.
+**Channel data-gathering — the MANDATORY scan-first workflow (do this BEFORE any channel change).**
+Choosing channels without the measured scan is how you pick the *congested* channels. Order:
+```bash
+# 1. AP-to-AP SNR neighbor matrix (who-hears-who = the co-channel graph)
+NBR_JSON=.claude/tmp/-nbr.json neighbor-collect.sh
+# 2. measured per-channel busy%/noise for EVERY AP (iw survey) -> SURVEY_JSON
+SURVEY_JSON=.claude/tmp/-survey.json survey-collect.sh # be patient: run to 74/74, not partial
+# 3. FLEET congestion analysis -> the data that makes the channel choice (incl. DFS-vs-non-DFS)
+python scripts/survey-report.py .claude/tmp/-survey.json na # per-channel busy% + suggested clean palette
+```
+`survey-report.py` is the **decision-driver**: it rolls the survey up into a per-channel / per-band-group
+measured busy% table, ranks cleanest↔dirtiest, and prints a suggested clean 40MHz palette to feed
+channel-plan via `--channels`.
**DFS channels are usually the cleanest** (consumer gear avoids them) — do
+NOT assume non-DFS; let this report decide (then weigh the radar-vacate tradeoff via `dfs-check.sh`).
+
 **Channel plan — `scripts/channel-plan.sh`** (computes + applies a co-channel-minimizing plan):
 ```bash
 NEIGHBOR_JSON=...nbr.json SURVEY_JSON=...survey.json \
-  channel-plan.sh ng|na [--apply] # ng: 1/6/11 graph-color; na: cleanest NON-DFS + separation
+  channel-plan.sh ng|na [--channels 52,60,100,108,116,124,132,140] [--dfs ok|avoid|only] [--apply]
 ```
-ng uses the neighbor matrix to graph-color 1/6/11; na picks each AP's lowest-cost non-DFS channel
-(measured busy% + neighbor-collision penalty). Reports co-channel pairs before/after. (Cascades dry-run:
-ng 92→35 pairs; na 20→0 + all off DFS.) `survey-collect.sh` emits its JSON via `SURVEY_JSON=`.
+ng graph-colors 1/6/11. **na palette is DATA-DRIVEN, not a hardcoded non-DFS list** (that bias picked the
+congested 149/157 at Cascades): pass `--channels` from survey-report, or `--dfs only|avoid`, else the
+default considers ALL 40MHz primaries and the measured busy% chooses. Cost = co-channel collision (dominant)
++ load-balance + measured busy%; a local-search pass drives strong co-channel pairs to **0**. Reports
+before/after. `survey-collect.sh` emits its JSON via `SURVEY_JSON=`.
+**LESSON (Cascades 2026-06-19):** a blind non-DFS reshuffle moved APs onto the 2 dirtiest channels (149=12%,
+157=28% busy) and did nothing; the data-driven clean-DFS plan (52/60/100/108/116/124/132/140, all ≤3% busy,
+0 co-channel) **halved 5GHz retry (8.7→3.8)**. Always: scan → survey-report → channel-plan --channels → apply → validate.
 GOTCHA (handled): a manual min rate is only honored when `minrate_setting_preference=manual` — the script sets it; `minrate ... auto` hands rate management back to the controller. Write path validated 2026-06-16 on a 0-client WLAN (Green Valley Computer Club) — apply->verify->restore.
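Reviewer note on the cost model documented in the SKILL.md hunk above (co-channel collision dominant, then load-balance, then measured busy%, followed by a local-search pass). The sketch below is a minimal, standalone illustration of that idea and is not the code in channel-plan.sh. The function names (`cost`, `plan_channels`, `co_channel_pairs`), the toy AP names, the sample busy% values, and the choice to exclude the AP itself from the load count are all assumptions made for illustration. The weights 1000 and 10 and the fallback busy value of 50 are taken from the patch.

```python
from collections import Counter

def cost(ap, ch, plan, adj, busy):
    """Cost of placing `ap` on `ch`, given everyone else's current assignment."""
    others = {a: c for a, c in plan.items() if a != ap}
    coll = sum(1 for n in adj.get(ap, ()) if others.get(n) == ch)  # strong-neighbor co-channel
    load = Counter(others.values()).get(ch, 0)                     # other APs already on ch
    bz = busy.get(ap, {}).get(ch, 50)                              # measured busy%, 50 if unknown
    return coll * 1000 + load * 10 + bz

def plan_channels(aps, adj, busy, palette, max_rounds=100):
    plan = {}
    # Greedy pass: most-constrained AP (most strong neighbors) first.
    for ap in sorted(aps, key=lambda a: -len(adj.get(a, ()))):
        plan[ap] = min(palette, key=lambda ch: cost(ap, ch, plan, adj, busy))
    # Local search: move an AP only when a strictly cheaper channel exists.
    for _ in range(max_rounds):
        moved = False
        for ap in aps:
            best = min(palette, key=lambda ch: cost(ap, ch, plan, adj, busy))
            if cost(ap, best, plan, adj, busy) < cost(ap, plan[ap], plan, adj, busy):
                plan[ap] = best
                moved = True
        if not moved:
            break
    return plan

def co_channel_pairs(plan, adj):
    """Count strong-neighbor pairs that share a channel (each pair once)."""
    return sum(1 for a in plan for n in adj.get(a, ()) if a < n and plan[a] == plan[n])

# Toy fleet (hypothetical): symmetric strong-neighbor graph and per-AP busy%.
aps = ["ap1", "ap2", "ap3", "ap4"]
adj = {"ap1": {"ap2", "ap3"}, "ap2": {"ap1", "ap3"},
       "ap3": {"ap1", "ap2", "ap4"}, "ap4": {"ap3"}}
busy = {ap: {52: 2, 60: 3, 100: 1, 149: 28} for ap in aps}
plan = plan_channels(aps, adj, busy, palette=[52, 60, 100, 149])
print(plan, "strong co-channel pairs:", co_channel_pairs(plan, adj))
```

With a symmetric neighbor graph every accepted move lowers a shared potential, so the local search settles; `max_rounds` is only a safety bound.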
diff --git a/.claude/skills/unifi-wifi/scripts/channel-plan.sh b/.claude/skills/unifi-wifi/scripts/channel-plan.sh
index fbaac6ca..9b238386 100644
--- a/.claude/skills/unifi-wifi/scripts/channel-plan.sh
+++ b/.claude/skills/unifi-wifi/scripts/channel-plan.sh
@@ -15,7 +15,12 @@ REPO="$(git rev-parse --show-toplevel 2>/dev/null || echo .)"
 UOS="$REPO/.claude/scripts/uos-mongo.sh"; VAULT="$REPO/.claude/scripts/vault.sh"
 HOST="${UOS_HOST:-172.16.3.29}"; PORT="${UOS_HTTPS_PORT:-11443}"
 SITEARG="${1:?usage: channel-plan.sh [--apply]}"; BAND="${2:?band ng|na}"; APPLY=0
-shift 2; while [ $# -gt 0 ]; do case "$1" in --apply) APPLY=1; shift;; *) shift;; esac; done
+CHANS=""; DFSPOL=""
+shift 2; while [ $# -gt 0 ]; do case "$1" in
+  --apply) APPLY=1; shift;;
+  --channels) CHANS="${2:-}"; shift 2;; # explicit palette, e.g. 52,60,100,108,116,124,132,140
+  --dfs) DFSPOL="${2:-}"; shift 2;; # ok|avoid|only — data-driven policy (default: ok = all)
+  *) shift;; esac; done
 case "$BAND" in ng|na) ;; *) echo "band must be ng|na (channel planning is for 2.4/5GHz)"; exit 1;; esac
 NEIGHBOR_JSON="${NEIGHBOR_JSON:-}"; SURVEY_JSON="${SURVEY_JSON:-}"; SNR_MIN="${NBR_SNR_MIN:-20}"
 [ -n "$NEIGHBOR_JSON" ] && [ -f "$NEIGHBOR_JSON" ] || { echo "[ERROR] NEIGHBOR_JSON required (run neighbor-collect.sh with NBR_JSON=...)"; exit 1; }
@@ -26,14 +31,26 @@ if [[ "$SITEARG" =~ ^[0-9a-f]{24}$ ]]; then SITE="$SITEARG"; else
 SITE="$(bash "$UOS" --sites 2>/dev/null | grep -vi 'pq.html' | grep -i "$SITEARG" | awk '{print $1}' | head -1)"; fi
 [ -n "$SITE" ] || { echo "[ERROR] site not found"; exit 1; }
 echo "[INFO] channel-plan site=$SITE band=$BAND mode=$([ $APPLY = 1 ] && echo APPLY || echo DRY-RUN) (neighbor=$NEIGHBOR_JSON survey=${SURVEY_JSON:-none})"
-export CP_SITE="$SITE" CP_BAND="$BAND" CP_APPLY="$APPLY" CP_NBR="$NEIGHBOR_JSON" CP_SURVEY="$SURVEY_JSON" CP_SNR="$SNR_MIN" RW_U="$U" RW_P="$P" REPO
+export CP_SITE="$SITE" CP_BAND="$BAND" CP_APPLY="$APPLY" CP_NBR="$NEIGHBOR_JSON" CP_SURVEY="$SURVEY_JSON" 
CP_SNR="$SNR_MIN" CP_CHANS="$CHANS" CP_DFS="$DFSPOL" RW_U="$U" RW_P="$P" REPO
 python - <<'PY'
 import os,sys,json,ssl,urllib.request,http.cookiejar
 band=os.environ['CP_BAND']; apply=os.environ['CP_APPLY']=='1'; SNR=int(os.environ['CP_SNR'])
 nbr=json.load(open(os.environ['CP_NBR']))
 survey=json.load(open(os.environ['CP_SURVEY'])) if os.environ.get('CP_SURVEY') and os.path.exists(os.environ['CP_SURVEY']) else {}
 sb={'ng':'2.4','na':'5'}[band]
-CH = [1,6,11] if band=='ng' else [36,40,44,48,149,153,157,161] # non-DFS only (radar-safe)
+# Palette: data-driven. ng is always 1/6/11. na: --channels overrides; else --dfs policy decides.
+# Default (na) is NOT a hardcoded non-DFS list anymore -- that baked-in bias picked the congested
+# channels at Cascades. Run survey-report.py first and pass its suggested palette via --channels.
+_chans=os.environ.get('CP_CHANS','').strip()
+_dfs=os.environ.get('CP_DFS','').strip().lower()
+NONDFS_NA=[36,40,44,48,149,153,157,161]; DFS_NA=[52,60,100,108,116,124,132,140]; ALL_NA=sorted(NONDFS_NA+DFS_NA)
+if band=='ng':
+    CH=[1,6,11]
+elif _chans:
+    CH=[int(x) for x in _chans.replace(' ','').split(',') if x]
+elif _dfs=='only': CH=DFS_NA
+elif _dfs=='avoid': CH=NONDFS_NA
+else: CH=ALL_NA # 'ok'/default: consider ALL 40MHz primaries; measured busy% chooses
 H="172.16.3.29";PORT=11443;base=f"https://{H}:{PORT}"
 ctx=ssl.create_default_context();ctx.check_hostname=False;ctx.verify_mode=ssl.CERT_NONE
 cj=http.cookiejar.CookieJar();op=urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj),urllib.request.HTTPSHandler(context=ctx))
@@ -70,16 +87,29 @@ plan={}
 def busy(ap,ch):
     try: return survey.get(ap,{}).get(sb,{}).get(str(ch),50)
     except: return 50
+from collections import Counter
+def cost(a,ch,pl):
+    coll=sum(1 for n in adj[a] if pl.get(n)==ch) # strong-neighbor co-channel (dominant)
+    load=Counter(pl.values()).get(ch,0) # balance APs across the palette
+    bz=busy(a,ch) if band=='na' else 0 # measured congestion (na)
+    return coll*1000 + 
load*10 + bz
+# greedy: most-constrained first
 for a in order:
-    best=None;bestcost=1e9
-    for ch in CH:
-        coll=sum(1 for n in adj[a] if plan.get(n)==ch) # strong neighbors already on this channel
-        cost = coll*1000 + (busy(a,ch) if band=='na' else 0) # ng: pure separation; na: separation + measured busy
-        if cost4} -> {c1} (strong-neighbor adj={len(adj[a])})")
 if len(changes)>60: print(f" ...(+{len(changes)-60} more)")
diff --git a/.claude/skills/unifi-wifi/scripts/survey-collect.sh b/.claude/skills/unifi-wifi/scripts/survey-collect.sh
index d2c0498a..f83cb025 100644
--- a/.claude/skills/unifi-wifi/scripts/survey-collect.sh
+++ b/.claude/skills/unifi-wifi/scripts/survey-collect.sh
@@ -101,7 +101,9 @@ for ln in open(sys.argv[1],encoding='utf-8',errors='replace'):
     elif 'channel busy time' in ln: m=re.search(r'(\d+) ms',ln); rec['busy']=int(m.group(1)) if m else 0
 flush()
 print(f"\n==== MEASURED RF OCCUPANCY - cleanest channels per AP ({len(data)} APs) ====")
-print("(in-use channel busy%, then 3 lowest-busy NON-DFS channels measured; * = DFS)\n")
+print("(in-use channel busy%, then 3 lowest-busy channels measured -- ALL channels; * = DFS)")
+print("NOTE: DFS channels are often the cleanest (consumer gear avoids them). Do NOT pre-filter them out;")
+print(" run survey-report.py for the fleet rollup + a data-driven palette, then channel-plan --channels.\n")
 for ap in sorted(data):
     print(f"{ap}:")
     for b in ('2.4','5','6'):
@@ -109,9 +111,9 @@ for ap in sorted(data):
         if not rows: continue
         inuse=[r for r in rows if r[2]]
         iu=f"ch{inuse[0][1]}={inuse[0][0]}%" if inuse else "?"
-        nondfs=sorted([r for r in rows if r[1] not in DFS], key=lambda r:r[0])[:3]
-        clean=", ".join(f"ch{c}({bz}%)" for bz,c,_,_ in nondfs)
-        print(f" {b}GHz in-use {iu} | cleanest non-DFS: {clean}")
+        cleanest=sorted(rows, key=lambda r:r[0])[:3] # ALL channels (DFS + non-DFS), * marks DFS
+        clean=", ".join(f"ch{c}{'*' if c in DFS else ''}({bz}%)" for bz,c,_,_ in cleanest)
+        print(f" {b}GHz in-use {iu} | cleanest (all): {clean}")
 OUT=sys.argv[2] if len(sys.argv)>2 else 'NONE'
 if OUT!='NONE':
     j={}
diff --git a/.claude/skills/unifi-wifi/scripts/survey-report.py b/.claude/skills/unifi-wifi/scripts/survey-report.py
new file mode 100644
index 00000000..1fb6219e
--- /dev/null
+++ b/.claude/skills/unifi-wifi/scripts/survey-report.py
@@ -0,0 +1,83 @@
+#!/usr/bin/env python3
+# survey-report.py -- fleet-wide channel-congestion analysis from a survey-collect JSON.
+#
+# THE DECISION-DRIVER: turns the raw per-AP survey (SURVEY_JSON from survey-collect.sh) into the
+# fleet-wide per-channel measured busy% table + per-band-group rollup + cleanest/dirtiest ranking,
+# so the channel plan is chosen from MEASURED FACTS, not policy. This is what makes the
+# DFS-vs-non-DFS (and any channel) call obvious instead of assumed.
+#
+# WHY THIS EXISTS (Cascades 2026-06-19): the skill collected the survey but never aggregated it, and
+# both survey-collect's report and channel-plan's palette were hardcoded to NON-DFS. On this site the
+# non-DFS channels (149/157) measured 12-28% busy while DFS measured 2-3% -- the opposite of the
+# baked-in assumption. A blind non-DFS plan made things worse; the data-driven DFS plan halved 5GHz
+# retry. Always run THIS before channel-plan, and feed channel-plan a palette derived from it.
+#
+# Usage:
+#   python survey-report.py [band=na|ng]
+#   SURVEY_JSON=.claude/tmp/-survey.json python survey-report.py - na
+#
+# Output: per-channel median/mean/max busy% (n APs), band-group rollup (UNII-1/UNII-2a-DFS/
+# UNII-2c-DFS/UNII-3), the cleanest + dirtiest channels, and a suggested clean 40MHz palette to
+# hand to channel-plan via --channels.
+
+import json, os, sys, statistics as st
+from collections import defaultdict
+
+path = sys.argv[1] if len(sys.argv) > 1 and sys.argv[1] != '-' else os.environ.get('SURVEY_JSON', '')
+band = sys.argv[2] if len(sys.argv) > 2 else 'na'
+if not path or not os.path.exists(path):
+    sys.exit("usage: survey-report.py [na|ng] (or SURVEY_JSON env). file not found.")
+d = json.load(open(path))
+sb = {'na': '5', 'ng': '2.4'}[band]
+
+def band_of(c):
+    if band == 'ng':
+        return '2.4 GHz'
+    if 36 <= c <= 48: return 'UNII-1 (non-DFS)'
+    if 52 <= c <= 64: return 'UNII-2a (DFS)'
+    if 100 <= c <= 144: return 'UNII-2c (DFS)'
+    if 149 <= c <= 165: return 'UNII-3 (non-DFS)'
+    return '?'
+
+ch = defaultdict(list)
+for ap, bands in d.items():
+    for c, busy in bands.get(sb, {}).items():
+        try: ch[int(c)].append(busy)
+        except: pass
+if not ch:
+    sys.exit(f"[survey-report] no {band} ({sb}GHz) data in {path}")
+
+print(f"==== FLEET CHANNEL CONGESTION ({band}, {len(d)} APs scanned) -- measured busy% (lower=cleaner) ====")
+print(f"{'ch':>4} {'band':<18} {'median':>7} {'mean':>6} {'max':>5} n")
+rows = []
+for c in sorted(ch):
+    v = ch[c]; rows.append((st.median(v), c, st.mean(v), max(v), len(v)))
+    print(f"{c:>4} {band_of(c):<18} {st.median(v):>6.0f}% {st.mean(v):>5.0f}% {max(v):>4.0f}% {len(v):>3}")
+
+grp = defaultdict(list)
+for ap, bands in d.items():
+    for c, busy in bands.get(sb, {}).items():
+        try: grp[band_of(int(c))].append(busy)
+        except: pass
+print("\nBY BAND GROUP (median busy% across all AP-channel samples):")
+for g in sorted(grp):
+    v = grp[g]
+    print(f" {g:<18} median={st.median(v):>4.0f}% mean={st.mean(v):>4.0f}% (n={len(v)})")
+
+clean = sorted(rows) # by median busy asc
+print("\nCLEANEST channels:", ", ".join(f"ch{c}({m:.0f}%)" for m, c, *_ in clean[:8]))
+print("DIRTIEST channels:", ", ".join(f"ch{c}({m:.0f}%)" for m, c, *_ in clean[::-1][:5]))
+
+if band == 'na':
+    # suggest a clean 40MHz palette: lower-primary channels whose 40MHz pair (c, c+4) are both clean
+    medbusy = {c: st.median(v) for c, v in ch.items()}
+    THRESH = max(8, st.median([m for m in medbusy.values()])) # "clean" = <= this
+    pairs = [(52, 56), (60, 64), (100, 104), (108, 112), (116, 120), (124, 128), (132, 136), (140, 144),
+             (36, 40), (44, 48), (149, 153), (157, 161)]
+    palette = [lo for lo, hi in pairs
+               if medbusy.get(lo, 99) <= THRESH and medbusy.get(hi, 99) <= THRESH]
+    dfs_clean = [c for c in palette if 52 <= c <= 144]
+    print(f"\nSUGGESTED clean 40MHz palette (both halves <= {THRESH:.0f}% busy): {palette}")
+    print(f" of those, DFS (cleaner here but radar-vacate risk): {dfs_clean}")
+    print(f" -> feed channel-plan: channel-plan.sh na --channels 
{','.join(map(str,palette))} --apply")
+    print(" NOTE: choose DFS vs non-DFS from THIS data + the radar tradeoff, not a hardcoded policy.")
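
Reviewer note on the palette rule at the end of survey-report.py: a 40MHz pair qualifies when the fleet-median busy% of both halves is at or below max(8, median of the per-channel medians), and unmeasured channels count as 99%. The sketch below isolates that rule so it can be checked without a real survey file. The function name `suggest_palette` and the two-AP sample survey are hypothetical. The JSON shape (AP -> band key -> channel string -> busy%) is assumed from how the script reads `SURVEY_JSON`.

```python
import statistics as st

# 40MHz (lower, upper) pairs, in the same order as the patch: DFS first, then non-DFS.
PAIRS_40MHZ = [(52, 56), (60, 64), (100, 104), (108, 112), (116, 120), (124, 128),
               (132, 136), (140, 144), (36, 40), (44, 48), (149, 153), (157, 161)]

def suggest_palette(survey, band_key="5", floor=8):
    """Return (palette, threshold) from a survey dict shaped AP -> band -> channel -> busy%."""
    samples = {}
    for bands in survey.values():
        for ch, busy in bands.get(band_key, {}).items():
            samples.setdefault(int(ch), []).append(busy)
    if not samples:
        return [], floor
    medbusy = {c: st.median(v) for c, v in samples.items()}
    thresh = max(floor, st.median(list(medbusy.values())))
    palette = [lo for lo, hi in PAIRS_40MHZ
               if medbusy.get(lo, 99) <= thresh and medbusy.get(hi, 99) <= thresh]
    return palette, thresh

# Hypothetical two-AP survey: DFS channels quiet, UNII-3 channels busy.
survey = {
    "ap-a": {"5": {"52": 2, "56": 3, "60": 2, "64": 2,
                   "149": 12, "153": 10, "157": 28, "161": 25}},
    "ap-b": {"5": {"52": 4, "56": 1, "60": 3, "64": 3,
                   "149": 14, "153": 12, "157": 30, "161": 27}},
}
palette, thresh = suggest_palette(survey)
print(f"palette={palette} threshold={thresh}%")
```

Because unmeasured halves default to 99%, a pair only enters the palette when both of its 20MHz halves were actually surveyed, which keeps partial scans from producing an optimistic palette.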