diff --git a/.claude/skills/unifi-wifi/SKILL.md b/.claude/skills/unifi-wifi/SKILL.md
index a49df7d..e8d29af 100644
--- a/.claude/skills/unifi-wifi/SKILL.md
+++ b/.claude/skills/unifi-wifi/SKILL.md
@@ -186,8 +186,14 @@ client-control.sh block|unblock|kick [--apply] # ban a MAC / un-b
 ```
 **Device / adoption remediation — `scripts/device-control.sh`** (pairs with gw-audit flags; gated):
 ```bash
-device-control.sh adopt|restart|provision|locate|unlocate|upgrade [--apply]
+device-control.sh adopt|restart|locate|unlocate|upgrade [--apply]
+device-control.sh poe-cycle [--apply] # RECOVER a hung AP: power-cycle its PoE switch port
 ```
+**SAFETY:** `provision`/force-provision is **removed** — it took AP 445 fully **offline** on a U7-Pro
+(2026-06-16), requiring a physical port power-cycle. To recover a hung/offline AP, use **`poe-cycle`**
+(remote PoE port power-cycle of the AP's uplink port); to push config cleanly use `restart` or let the
+controller converge.
+
 
 **Gateway router actions — `scripts/gw-control.sh`** (port-forwards + WAN firewall; the write side of gw-audit; gated/DRY-RUN, rollback saved):
 ```bash
diff --git a/.claude/skills/unifi-wifi/scripts/device-control.sh b/.claude/skills/unifi-wifi/scripts/device-control.sh
index 1dc5caa..61e5409 100644
--- a/.claude/skills/unifi-wifi/scripts/device-control.sh
+++ b/.claude/skills/unifi-wifi/scripts/device-control.sh
@@ -1,57 +1,77 @@
 #!/usr/bin/env bash
-# device-control.sh — adoption remediation + device ops via the controller (cmd/devmgr).
-# Pairs with gw-audit.sh, which flags pending/disconnected/upgradable devices; this remediates them.
-#   adopt      adopt a pending device
-#   restart    reboot a device
-#   provision  force re-provision (push config) — fixes "stuck"/out-of-sync devices
-#   locate     flash the device LED (find it physically)
-#   unlocate   stop flashing
-#   upgrade    upgrade firmware to the controller-recommended version
-# DRY-RUN default; --apply gated behind infrastructure/uos-server-network-api-rw. Controller-side
-# (no AP cred / VPN) -> any UOS site. Find MACs with: gw-audit.sh / sites.sh / stat/device.
+# device-control.sh — adoption + device ops via the controller (cmd/devmgr). Pairs with gw-audit.
+#   adopt      adopt a pending device
+#   restart    reboot a device (clean; drops its clients ~1 min)
+#   locate     flash the device LED
+#   unlocate   stop flashing
+#   upgrade    upgrade firmware to the controller-recommended version
+#   poe-cycle  RECOVERY: power-cycle the PoE switch port feeding an AP (the remote
+#              equivalent of physically re-seating the cable). Use when an AP is hung/
+#              offline after a change. Resolves the AP -> its uplink switch + port.
 #
-# Usage: bash .claude/skills/unifi-wifi/scripts/device-control.sh [--apply]
+# REMOVED: 'provision' / force-provision — on U7-Pro it knocked AP 445 fully OFFLINE (required a
+# physical port power-cycle to recover, 2026-06-16). Do NOT force-provision these APs. To recover a
+# hung AP use `poe-cycle`; to push config cleanly use `restart` or just let the controller converge.
+#
+# DRY-RUN default; --apply gated behind infrastructure/uos-server-network-api-rw. Controller-side.
+# Usage: bash .../device-control.sh [--apply]
 set -uo pipefail
 REPO="$(git rev-parse --show-toplevel 2>/dev/null || echo .)"
 UOS="$REPO/.claude/scripts/uos-mongo.sh"; VAULT="$REPO/.claude/scripts/vault.sh"
-SITEARG="${1:?usage: device-control.sh [--apply]}"
-ACT="${2:?action: adopt|restart|provision|locate|unlocate|upgrade}"; MAC="$(echo "${3:?mac required}" | tr 'A-Z' 'a-z')"; APPLY=0
+SITEARG="${1:?usage: device-control.sh [--apply]}"
+ACT="${2:?action: adopt|restart|locate|unlocate|upgrade|poe-cycle}"; TGT="${3:?target (mac, or AP name for poe-cycle)}"; APPLY=0
 shift 3; while [ $# -gt 0 ]; do case "$1" in --apply) APPLY=1; shift;; *) shift;; esac; done
 case "$ACT" in
-  adopt) CMD="adopt";; restart) CMD="restart";; provision) CMD="force-provision";;
-  locate) CMD="set-locate";; unlocate) CMD="unset-locate";; upgrade) CMD="upgrade";;
-  *) echo "action: adopt|restart|provision|locate|unlocate|upgrade"; exit 1;; esac
-[[ "$MAC" =~ ^[0-9a-f:]{17}$ ]] || { echo "[ERROR] mac must be aa:bb:cc:dd:ee:ff"; exit 1; }
+  adopt) CMD="adopt";; restart) CMD="restart";; locate) CMD="set-locate";; unlocate) CMD="unset-locate";; upgrade) CMD="upgrade";; poe-cycle) CMD="power-cycle";;
+  provision|force-provision) echo "[REFUSED] force-provision is unsafe on these APs (took 445 offline 2026-06-16). Use 'poe-cycle' to recover a hung AP, or 'restart'."; exit 1;;
+  *) echo "action: adopt|restart|locate|unlocate|upgrade|poe-cycle"; exit 1;; esac
 if [[ "$SITEARG" =~ ^[0-9a-f]{24}$ ]]; then SITE="$SITEARG"; else SITE="$(bash "$UOS" --sites 2>/dev/null | grep -vi 'pq.html' | grep -i "$SITEARG" | awk '{print $1}' | head -1)"; fi
 [ -n "$SITE" ] || { echo "[ERROR] site not found"; exit 1; }
-echo "[INFO] site=$SITE $ACT ($CMD) mac=$MAC mode=$([ $APPLY = 1 ] && echo APPLY || echo DRY-RUN)"
-if [ "$APPLY" != "1" ]; then echo "[dry-run] would POST cmd/devmgr {cmd:'$CMD', mac:'$MAC'}. Add --apply."; exit 0; fi
+# mac-based verbs need a mac; poe-cycle accepts an AP name OR mac
+if [ "$ACT" != "poe-cycle" ]; then
+  MAC="$(echo "$TGT" | tr 'A-Z' 'a-z')"; [[ "$MAC" =~ ^[0-9a-f:]{17}$ ]] || { echo "[ERROR] $ACT needs a mac aa:bb:cc:dd:ee:ff"; exit 1; }
+else MAC="$TGT"; fi
+echo "[INFO] site=$SITE $ACT ($CMD) target=$TGT mode=$([ $APPLY = 1 ] && echo APPLY || echo DRY-RUN)"
 RWP="infrastructure/uos-server-network-api-rw"
 export RW_U="$(bash "$VAULT" get-field "$RWP" credentials.username 2>/dev/null || true)"
 export RW_P="$(bash "$VAULT" get-field "$RWP" credentials.password 2>/dev/null || true)"
-[ -n "$RW_U" ] && [ -n "$RW_P" ] || { echo "[BLOCKED] --apply needs RW admin vaulted at $RWP"; exit 2; }
-export DC_SITE="$SITE" DC_CMD="$CMD" DC_MAC="$MAC"
+[ -n "$RW_U" ] && [ -n "$RW_P" ] || { echo "[BLOCKED] needs RW admin vaulted at $RWP"; exit 2; }
+export DC_SITE="$SITE" DC_CMD="$CMD" DC_TGT="$MAC" DC_ACT="$ACT" DC_APPLY="$APPLY"
 python - <<'PY'
 import os,sys,json,ssl,urllib.request,http.cookiejar
 H="172.16.3.29";PORT=11443;base=f"https://{H}:{PORT}"
 ctx=ssl.create_default_context();ctx.check_hostname=False;ctx.verify_mode=ssl.CERT_NONE
 cj=http.cookiejar.CookieJar();op=urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj),urllib.request.HTTPSHandler(context=ctx))
-def call(method,path,body=None,csrf=None,wh=False):
-  data=json.dumps(body).encode() if body is not None else None
-  r=urllib.request.Request(base+path,data=data,method=method);r.add_header('Content-Type','application/json')
+def call(m,p,b=None,csrf=None,wh=False):
+  d=json.dumps(b).encode() if b is not None else None
+  r=urllib.request.Request(base+p,data=d,method=m);r.add_header('Content-Type','application/json')
   if csrf:r.add_header('X-CSRF-Token',csrf)
-  resp=op.open(r,timeout=30);return (resp.read().decode('utf-8','replace'),resp.headers) if wh else resp.read().decode('utf-8','replace')
+  x=op.open(r,timeout=30);return (x.read().decode('utf-8','replace'),x.headers) if wh else x.read().decode('utf-8','replace')
 try:_,hd=call('POST','/api/auth/login',{'username':os.environ['RW_U'],'password':os.environ['RW_P']},wh=True)
 except Exception as e:print("[ERROR] login failed:",e);sys.exit(1)
 csrf=hd.get('X-CSRF-Token') or hd.get('X-Updated-Csrf-Token')
 sites=json.loads(call('GET','/proxy/network/api/self/sites')).get('data',[])
 short=next((s['name'] for s in sites if s.get('_id')==os.environ['DC_SITE']),None)
-if not short:print("[ERROR] site resolve failed");sys.exit(1)
-try:
-  r=call('POST',f"/proxy/network/api/s/{short}/cmd/devmgr",{'cmd':os.environ['DC_CMD'],'mac':os.environ['DC_MAC']},csrf=csrf)
-  meta=json.loads(r).get('meta',{})
-  print(f" [{'ok' if meta.get('rc')=='ok' else 'FAIL'}] {os.environ['DC_CMD']} {os.environ['DC_MAC']} -> {meta}")
-except Exception as e:print(" [FAIL]",e)
+act=os.environ['DC_ACT']; tgt=os.environ['DC_TGT']; apply=os.environ['DC_APPLY']=='1'
+if act=='poe-cycle':
+  devs=json.loads(call('GET',f'/proxy/network/api/s/{short}/stat/device')).get('data',[])
+  ap=next((d for d in devs if d.get('name')==tgt or d.get('mac')==tgt.lower()),None)
+  if not ap:print(f"[ERROR] AP '{tgt}' not found");sys.exit(1)
+  u=ap.get('uplink',{}); swmac=u.get('uplink_mac'); port=u.get('uplink_remote_port')
+  sw=next((d for d in devs if d.get('mac')==swmac),None)
+  if not (swmac and port):print(f"[ERROR] no wired uplink for {tgt} (mesh? can't PoE-cycle)");sys.exit(1)
+  print(f" {tgt} uplinks to {sw.get('name') if sw else swmac} port {port} -> power-cycle that PoE port")
+  if not apply:print(" [dry-run] add --apply to power-cycle the port (drops the AP ~30s, then it boots).");sys.exit(0)
+  try:
+    r=call('POST',f"/proxy/network/api/s/{short}/cmd/devmgr",{'cmd':'power-cycle','mac':swmac,'port_idx':int(port)},csrf=csrf)
+    meta=json.loads(r).get('meta',{}); print(f" [{'ok' if meta.get('rc')=='ok' else 'FAIL'}] power-cycle {sw.get('name') if sw else swmac}:p{port} -> {meta}")
+  except Exception as e:print(" [FAIL]",e)
+else:
+  if not apply:print(f" [dry-run] would POST cmd/devmgr {{cmd:'{os.environ['DC_CMD']}', mac:'{tgt}'}}. Add --apply.");sys.exit(0)
+  try:
+    r=call('POST',f"/proxy/network/api/s/{short}/cmd/devmgr",{'cmd':os.environ['DC_CMD'],'mac':tgt},csrf=csrf)
+    meta=json.loads(r).get('meta',{}); print(f" [{'ok' if meta.get('rc')=='ok' else 'FAIL'}] {os.environ['DC_CMD']} {tgt} -> {meta}")
+  except Exception as e:print(" [FAIL]",e)
 PY
diff --git a/clients/cascades-tucson/session-logs/2026-06/2026-06-15-howard-cascades-wifi-rf-audit.md b/clients/cascades-tucson/session-logs/2026-06/2026-06-15-howard-cascades-wifi-rf-audit.md
index c231b32..138c0d8 100644
--- a/clients/cascades-tucson/session-logs/2026-06/2026-06-15-howard-cascades-wifi-rf-audit.md
+++ b/clients/cascades-tucson/session-logs/2026-06/2026-06-15-howard-cascades-wifi-rf-audit.md
@@ -523,3 +523,27 @@ Fix: gw-audit no longer false-flags internet status=unknown on third-party-firew
 SKILL.md + ROADMAP updated (D items done). Skill is feature-complete for monitoring+tuning+apply across WiFi/switch/gateway, multi-client, with scheduling + flap resilience. Remaining: per-client AP creds/VPN, read-only cred (Mike to create UI admin), gateway VPN-server/DHCP-DNS (Mike). Coord: this msg.
+
+---
+
+## Update: 2026-06-16 13:12 PT — FIRST PRODUCTION CHANGE: Floor-4 2.4 power-down pilot + 445 outage/recovery learning
+
+Baseline captured (Floor-4 2.4: 15 radios, avg cu_total 86%, avg retry 13.2%; AP 434 ground-truth 94%
+busy / hears 48 managed APs on 2.4). Applied: `apply-radio cascades ng power low --zone "Floor 4" --apply`.
+RESULT: 14/15 radios -> 6dBm (from ~23), clients retained (118 vs ~119) - no coverage loss. Rollback saved.
+
+INCIDENT: AP 445 lagged (config=low but radio stayed 23dBm). Tried device-control 'provision'
+(cmd/devmgr force-provision) to nudge -> HTTP 400 AND it took 445 fully OFFLINE; Howard physically
+power-cycled the switch port to recover. 445 back online (booted to 30dBm; config still low, not converged
+- left alone, harmless).
+
+FIX (Howard's directive - power-cycle the port is the recovery):
+- REMOVED 'provision'/force-provision from device-control.sh (unsafe on U7-Pro; refuses now).
+- ADDED 'poe-cycle ' - resolves AP uplink_mac + uplink_remote_port, POSTs cmd/devmgr
+  power-cycle {mac:switch, port_idx} = remote PoE port cycle (the controller equivalent of re-seating the
+  cable). Dry-run validated (445->Switch 4th Floor p7, 434->p5). errorlog.md updated. Coord -> Mike.
+
+EXPECTATION SET: 2.4 cu_total won't plummet (channel is foreign-dominated, ~33k BSSIDs); power-down's win
+is cell-shrink + reduced SELF-interference + better client SNR, seen in retry%/consolidation over ~10-15
+min, and it sets up the Phase-C disables. Next: settle ~15 min, re-snapshot Floor-4 retry% for before/after;
+decide on 445 (poe-cycle to apply low, or leave). Disables (445/428) still HELD.
diff --git a/errorlog.md b/errorlog.md
index ec13de7..de309de 100644
--- a/errorlog.md
+++ b/errorlog.md
@@ -67,3 +67,4 @@ Categories (the `[type]` tag): _(none)_ = skill/command execution failure ·
 2026-06-14 | GURU-BEAST-ROG | coord skill (coord.py msg send) | `py "$CLAUDETOOLS_ROOT/.claude/skills/coord/scripts/coord.py"` failed — `$CLAUDETOOLS_ROOT` is not exported in fresh Git-bash shells here, so the path resolved under `C:\Program Files\Git\`. [RESOLVED 2026-06-14] Added `.claude/scripts/ensure-settings-env.py` (seeds `env.CLAUDETOOLS_ROOT` in per-machine `settings.local.json` from `identity.json`); Claude Code injects it into every Bash call. Wired into ONBOARDING.md + broadcast to fleet. Effective next session start.
 2026-06-14 | GURU-BEAST-ROG | /sync (sync.sh Phase 3, submodule update) | submodule `projects/msp-tools/guru-rmm` checkout of f38da05 aborted: untracked `docs/RMM_THOUGHTS.md` would be overwritten. Parent repo synced fine; submodule pointer left lagging. Recurring transient. [RESOLVED 2026-06-15] sync.sh now has `resolve_submodule_collisions()` — on the abort it moves only the untracked files the incoming commit tracks aside to `.synced-aside-` (content preserved, NOT --force) then retries once. Verified live: guru-rmm advanced ed92097->f38da05; the aside copy held 94 lines of un-committed 2026-06-08 thoughts (rescued, not lost — needs manual merge into canonical RMM_THOUGHTS.md).
+2026-06-16 | HOWARD-HOME | unifi-wifi/device-control.sh provision | cmd/devmgr force-provision returned HTTP 400 (mac 0c:ea:14:3f:40:6d / AP 445); verb needs fix — likely wrong cmd name or requires device _id not mac. block/kick/locate via stamgr work; adopt/restart/upgrade unverified.
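
Review note: the uplink resolution that `poe-cycle` performs in `device-control.sh` can be exercised offline, without touching the controller. The sketch below is only an illustration under stated assumptions, not the script's code. `resolve_poe_port` and `power_cycle_body` are hypothetical helper names, and the sample device list is invented, including the switch MAC and the AP name "445". Only the field names `uplink`, `uplink_mac`, `uplink_remote_port`, the `power-cycle` payload shape, and the 445 -> port 7 mapping come from the diff. One deliberate difference: the sketch treats only a missing port as "no wired uplink", whereas the script's truthiness check would also reject a port value of 0.

```python
from typing import Any, Dict, List, Optional, Tuple

Device = Dict[str, Any]


def resolve_poe_port(devices: List[Device], target: str) -> Optional[Tuple[str, int]]:
    """Map an AP name or MAC to (switch_mac, port_idx).

    Returns None when the AP is unknown or has no wired uplink (for example
    a mesh AP), which is the case where a PoE cycle is not possible.
    """
    wanted = target.lower()
    ap = next(
        (d for d in devices
         if d.get("name") == target or str(d.get("mac", "")).lower() == wanted),
        None,
    )
    if ap is None:
        return None
    uplink = ap.get("uplink") or {}
    switch_mac = uplink.get("uplink_mac")
    port = uplink.get("uplink_remote_port")
    if not switch_mac or port is None:
        return None
    return switch_mac, int(port)


def power_cycle_body(switch_mac: str, port_idx: int) -> Dict[str, Any]:
    """Build the cmd/devmgr body in the shape the diff posts for poe-cycle."""
    return {"cmd": "power-cycle", "mac": switch_mac, "port_idx": port_idx}


# Invented sample shaped like a stat/device response (assumption, not real data).
SAMPLE_DEVICES: List[Device] = [
    {"name": "Switch 4th Floor", "mac": "02:00:00:00:00:04"},
    {"name": "445", "mac": "0c:ea:14:3f:40:6d",
     "uplink": {"uplink_mac": "02:00:00:00:00:04", "uplink_remote_port": 7}},
    {"name": "mesh-ap", "mac": "02:00:00:00:00:aa", "uplink": {}},
]

if __name__ == "__main__":
    hit = resolve_poe_port(SAMPLE_DEVICES, "445")
    if hit is not None:
        print(power_cycle_body(*hit))
```

Keeping the resolution step a pure function over the device list would let a dry run be validated against a saved `stat/device` dump before any `--apply`.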