unifi-wifi: coverage-thin mesh-awareness — never disable wireless-mesh APs or their parents

Howard caught a real hazard: coverage-thin was mesh-blind. At Cascades, 2nd Floor Atrium is the
wireless-mesh PARENT for CC Bridge + salon (backhaul ch36/5GHz), and 206 U7 Pro carries 108. The tool
had listed 2nd Floor Atrium / CC Bridge / 206 as 2.4 disable targets. Although the backhaul is 5GHz
(so a 2.4-radio disable wouldn't drop it), touching infra APs that feed others is needless risk.

Fix: fetch live uplink topology (stat/device); build the mesh set = wireless-uplink APs UNION their
parents; exclude that set from disable (members still count as coverers if their 2.4 is on); print a
MESH-PROTECTED line. Falls back with a clear WARNING if no controller cred is available. Cascades now
auto-excludes 108/206/2nd Atrium/CC Bridge/salon; resilient plan 34->33. Also verified SSIDs are not
AP-pinned (broadcasting_aps off), so no client is orphaned by a radio disable.
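
A minimal standalone sketch of the mesh-set construction described above (wireless-uplink APs UNION their parents). The field names (`type`, `mac`, `name`, `uplink.type`, `uplink.uplink_mac`) are the ones the diff reads from stat/device; the function name `build_mesh_set`, the MAC addresses, and the wired AP "AP-X" are hypothetical, made up for illustration. One assumed difference from the script: an unnamed wireless child falls back to its MAC here instead of being dropped.

```python
def build_mesh_set(devices):
    """Return display names of wireless-uplink APs plus the parents they mesh to."""
    aps = [d for d in devices if d.get("type") == "uap"]
    # Map MAC -> display name (fall back to the MAC when the AP is unnamed).
    name_by_mac = {d.get("mac"): d.get("name") or d.get("mac") for d in aps}
    mesh = set()
    for ap in aps:
        uplink = ap.get("uplink") or {}
        if uplink.get("type") != "wireless":
            continue  # wired AP: not a mesh child
        mesh.add(ap.get("name") or ap.get("mac"))  # the wireless child
        parent = name_by_mac.get(uplink.get("uplink_mac"))
        if parent:
            mesh.add(parent)  # its parent, which must stay up for the backhaul
    return mesh


# Hypothetical sample payload: MACs and "AP-X" are invented for this sketch.
devices = [
    {"type": "uap", "mac": "aa:00:00:00:00:01", "name": "2nd Floor Atrium",
     "uplink": {"type": "wire"}},
    {"type": "uap", "mac": "aa:00:00:00:00:02", "name": "CC Bridge",
     "uplink": {"type": "wireless", "uplink_mac": "aa:00:00:00:00:01"}},
    {"type": "uap", "mac": "aa:00:00:00:00:03", "name": "AP-X",
     "uplink": {"type": "wire"}},
    {"type": "usw", "mac": "aa:00:00:00:00:04", "name": "core switch"},
]

print(sorted(build_mesh_set(devices)))
```

In this sample only "CC Bridge" has a wireless uplink, so the set contains it and its parent "2nd Floor Atrium"; the wired "AP-X" stays eligible for thinning.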

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 16:43:08 -07:00
parent 6f77222bcb
commit 3374483cd6


@@ -18,7 +18,8 @@
 # ZPCT=50 (max % of a zone's 2.4 radios off), CLIENT_CAP=12 (max projected avg clients on a coverer).
 set -uo pipefail
 REPO="$(git rev-parse --show-toplevel 2>/dev/null || echo .)"
-UOS="$REPO/.claude/scripts/uos-mongo.sh"
+UOS="$REPO/.claude/scripts/uos-mongo.sh"; VAULT="$REPO/.claude/scripts/vault.sh"
+HOST="${UOS_HOST:-172.16.3.29}"; PORT="${UOS_HTTPS_PORT:-11443}"
 SITEARG="${1:?usage: coverage-thin.sh <site> [days=14] (NEIGHBOR_JSON=<matrix> required)}"; DAYS="${2:-14}"
 NJ="${NEIGHBOR_JSON:-}"; [ -n "$NJ" ] && [ -f "$NJ" ] || { echo "[ERROR] NEIGHBOR_JSON=<matrix.json> required (run neighbor-collect.sh with NBR_JSON=...)"; exit 1; }
 if [[ "$SITEARG" =~ ^[0-9a-f]{24}$ ]]; then SITE="$SITEARG"; else
@@ -27,6 +28,35 @@ if [[ "$SITEARG" =~ ^[0-9a-f]{24}$ ]]; then SITE="$SITEARG"; else
 echo "[INFO] coverage-thin site=$SITE window=${DAYS}d matrix=$NJ"
 TMP="$(mktemp -d)"; trap 'rm -rf "$TMP"' EXIT
+# ---- MESH SAFETY: wireless-mesh APs depend on a parent's backhaul; never disable a mesh AP or a mesh
+# parent (even 2.4-only changes risk the children). Fetch live uplink topology (controller) -> exclude. ----
+MESH="$TMP/mesh.txt"; : > "$MESH"
+CU="$(bash "$VAULT" get-field infrastructure/uos-server-network-api-rw credentials.username 2>/dev/null || true)"
+CP="$(bash "$VAULT" get-field infrastructure/uos-server-network-api-rw credentials.password 2>/dev/null || true)"
+if [ -n "$CU" ] && [ -n "$CP" ]; then
+  base="https://$HOST:$PORT"; CJ="$TMP/cj"
+  curl -sk -c "$CJ" -o /dev/null -X POST "$base/api/auth/login" -H 'Content-Type: application/json' \
+    --data-binary "$(python -c 'import json,sys;print(json.dumps({"username":sys.argv[1],"password":sys.argv[2]}))' "$CU" "$CP")" 2>/dev/null
+  SHORT="$(curl -sk -b "$CJ" "$base/proxy/network/api/self/sites" | python -c "import sys,json;[print(s['name']) for s in json.load(sys.stdin).get('data',[]) if s.get('_id')=='$SITE']" 2>/dev/null)"
+  curl -sk -b "$CJ" "$base/proxy/network/api/s/${SHORT:-$SITEARG}/stat/device" -o "$TMP/dev.json" 2>/dev/null
+  python - "$TMP/dev.json" "$MESH" <<'PY' 2>/dev/null || true
+import sys,json
+try: dev=json.load(open(sys.argv[1])).get('data',[])
+except Exception: dev=[]
+aps=[d for d in dev if d.get('type')=='uap']; bymac={d.get('mac'):(d.get('name') or d.get('mac')) for d in aps}
+mesh=set()
+for a in aps:
+    up=a.get('uplink') or {}
+    if up.get('type')=='wireless':
+        mesh.add(a.get('name'))   # the wireless child
+        if up.get('uplink_mac') in bymac: mesh.add(bymac[up['uplink_mac']])   # its parent
+open(sys.argv[2],'w',newline='\n').write('\n'.join(sorted(x for x in mesh if x)))
+PY
+  [ -s "$MESH" ] && echo "[INFO] mesh APs auto-excluded from disable: $(tr '\n' ' ' < "$MESH")" || echo "[INFO] no wireless-mesh APs detected (all wired)."
+else
+  echo "[WARNING] no controller cred -> CANNOT detect mesh topology; mesh APs are NOT auto-excluded. Verify manually before disabling!"
+fi
 # ---- per-AP 2.4 state + airtime/clients (Mongo) -> TSV ----
 cat <<JS | bash "$UOS" 2>&1 | grep -viE 'pq.html|post-quantum|store now|server may need' > "$TMP/ap.tsv"
 var ace=db.getSiblingDB('ace'), st=db.getSiblingDB('ace_stat'), SITE="$SITE";
@@ -54,8 +84,11 @@ JS
 # ---- greedy coverage-thinning on the 2.4 SNR layer ----
 COVER_SNR="${COVER_SNR:-28}"; MINCOV="${MINCOV:-1}"; ZPCT="${ZPCT:-50}"; CLIENT_CAP="${CLIENT_CAP:-12}"
-python - "$TMP/ap.tsv" "$NJ" "$COVER_SNR" "$MINCOV" "$ZPCT" "$CLIENT_CAP" <<'PY'
+python - "$TMP/ap.tsv" "$NJ" "$COVER_SNR" "$MINCOV" "$ZPCT" "$CLIENT_CAP" "$MESH" <<'PY'
 import sys,json
+MESH=set()
+try: MESH={l.strip() for l in open(sys.argv[7],encoding='utf-8') if l.strip()}
+except Exception: pass
 ap={}
 for ln in open(sys.argv[1],encoding='utf-8',errors='replace'):
     if not ln.startswith('ROW\t'): continue
@@ -85,6 +118,7 @@ while changed:
     changed=False
     for a in order:
         if not on[a]: continue
+        if a in MESH: continue   # never disable a mesh AP/parent (backhaul risk)
         cov=coverers(a)
         if len(cov)<max(1,MINCOV): continue   # area must stay covered by active 2.4
         # disabling 'a' must not strand an already-OFF neighbor that relied on it
@@ -103,7 +137,9 @@ keep=[a for a in nodes if on[a]]
 intf_removed=sum(ap[a]['intf'] for a in disabled)
 def pad(s,w): s=str(s); return s+' '*max(0,w-len(s))
 print(f"\n==== 2.4 COVERAGE-THINNING PLAN (active 2.4 radios={len(nodes)}) DISABLE={len(disabled)} KEEP={len(keep)} ====")
-print(f"SNR>={COVER:.0f} = 'covers same area'; only ACTIVE-2.4 neighbors count; <{ '2' if MINCOV<2 else MINCOV} coverers flagged.\n")
+print(f"SNR>={COVER:.0f} = 'covers same area'; only ACTIVE-2.4 neighbors count; <{ '2' if MINCOV<2 else MINCOV} coverers flagged.")
+if MESH: print(f"MESH-PROTECTED (never disabled - wireless backhaul): {', '.join(sorted(MESH))}")
+print("")
 from collections import defaultdict
 byz=defaultdict(list)
 for a in disabled: byz[ap[a]['zone']].append(a)
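
The effect of the `if a in MESH: continue` guard in the thinning loop above can be illustrated in isolation. This is a simplified sketch, not the script's actual algorithm: the real loop also applies the zone-percentage and client-cap limits, which the diff elides. The function name `thin`, the `coverers` mapping, and the AP names "A"/"B"/"C" are hypothetical.

```python
def thin(coverers, protected, min_cov=1):
    """Greedy 2.4 thinning sketch.

    coverers:  dict mapping each AP to the set of neighbors that cover its area.
    protected: APs that must never be disabled (e.g. mesh children and parents).
    Returns the sorted list of APs chosen for disable.
    """
    need = max(1, min_cov)
    on = {a: True for a in coverers}
    changed = True
    while changed:
        changed = False
        for a in sorted(coverers):
            if not on[a]:
                continue
            if a in protected:
                continue  # protected APs stay on, but still count as coverers
            active = [n for n in coverers[a] if on.get(n)]
            if len(active) < need:
                continue  # area must stay covered by an active radio
            # Disabling 'a' must not strand an already-off AP that relied on it.
            stranded = any(
                sum(1 for m in coverers[n] if m != a and on.get(m)) < need
                for n in on
                if not on[n] and a in coverers[n]
            )
            if stranded:
                continue
            on[a] = False
            changed = True
    return [a for a in sorted(on) if not on[a]]


# Three hypothetical APs that all hear each other well enough to cover one area.
cov = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}}
print(thin(cov, protected=set()))   # no protection
print(thin(cov, protected={"A"}))   # "A" treated as a mesh parent
```

With no protection the sketch disables "A" and "B" and keeps "C"; with "A" protected it keeps "A" as the coverer and disables "B" and "C" instead, so the plan shifts rather than simply shrinking.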