diff --git a/.claude/skills/unifi-wifi/SKILL.md b/.claude/skills/unifi-wifi/SKILL.md
index 969f7a0..d8efebf 100644
--- a/.claude/skills/unifi-wifi/SKILL.md
+++ b/.claude/skills/unifi-wifi/SKILL.md
@@ -26,10 +26,12 @@ path is Cascades — override with the script's vault-path arg per client.
   empirical DFS radar history (`dfs-check`), the **AP-to-AP SNR neighbor matrix** from
   `/proc/ui_neighbor` (`neighbor-collect`), per-AP live stream (`watch-ap`), and gated config apply
   (`apply-radio`). All site-parameterized → works on every UniFi site we monitor.
-- **[WIP] Switches / PoE, gateways / WAN / firewall, adoption, client DHCP/DNS** — not yet wrapped
-  as collectors. The access layer reaches them (the same `uos-mongo.sh` covers the whole `ace` DB;
-  the controller API + device SSH reach switches/gateways) — these just need dedicated scripts.
-  Use ad-hoc for now; productize as the need arises (or split into a sibling `unifi` skill).
+- **[WORKING] Switch / PoE audit** — `scripts/switch-audit.sh [--all-ports]`: per-switch PoE
+  budget, port up/total, and flags (UNDERSPEED gig-port-at-10/100, rate-based DROPS, ERRORS, PoE
+  faults, PoE-budget pressure). Controller-side, any site. (Found ~25 100M-linked gig ports at Cascades.)
+- **[WIP] Gateways / WAN / firewall, adoption, client DHCP/DNS** — not yet wrapped as collectors.
+  The access layer reaches them (`uos-mongo.sh` = whole `ace` DB; controller API + device SSH) —
+  these just need dedicated scripts. Productize as the need arises (or split into a sibling `unifi` skill).
 - **Per-client requirement:** `watch-ap`/`neighbor-collect`/`survey-collect`/`dfs-check` default
   the AP device-auth SSH cred to `clients/cascades-tucson/unifi-ap-ssh`; for another client, vault
   its own `clients//unifi-ap-ssh` and pass it as the script's vault-path arg.
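
Reviewer note: the per-switch PoE summary described in the SKILL.md entry above can be sketched as a small pure function, which makes the 85% budget-pressure rule easy to check in isolation. This is a minimal sketch, assuming the `port_table`, `poe_enable`, `poe_power`, and `total_max_power` fields behave the way `switch-audit.sh` reads them. The function name `poe_summary`, the `pressure_ratio` parameter, and the sample record are hypothetical and are not part of this change.

```python
def poe_summary(device, pressure_ratio=0.85):
    """Return (used_watts, budget_watts, under_pressure) for one switch record."""
    ports = device.get("port_table", [])
    # Sum draw only over PoE-enabled ports; float() tolerates string or numeric values.
    used = sum(float(p.get("poe_power") or 0) for p in ports if p.get("poe_enable"))
    budget = device.get("total_max_power") or 0
    # A switch that reports no PoE budget is never treated as under pressure.
    under_pressure = bool(budget) and used > pressure_ratio * budget
    return used, budget, under_pressure


# Hypothetical sample record for illustration, not real controller output.
sample = {
    "total_max_power": 100,
    "port_table": [
        {"poe_enable": True, "poe_power": "45.5"},
        {"poe_enable": True, "poe_power": "42.0"},
        {"poe_enable": False, "poe_power": "9.9"},  # ignored: PoE disabled
    ],
}
print(poe_summary(sample))
```

Keeping the calculation separate from the printing would let the threshold be unit-tested without a controller login, if the script is later split up.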
diff --git a/.claude/skills/unifi-wifi/references/ROADMAP.md b/.claude/skills/unifi-wifi/references/ROADMAP.md
index bb1e163..efc8509 100644
--- a/.claude/skills/unifi-wifi/references/ROADMAP.md
+++ b/.claude/skills/unifi-wifi/references/ROADMAP.md
@@ -50,8 +50,9 @@ side, multi-client enablement, and non-WiFi scope. Build/validate new apply acti
   scripts print the exact vault command when a client's cred is missing (and note controller-side works).
 
 ## C. Non-WiFi UniFi (currently WIP / out of scope)
-- [ ] **Switch/PoE collector** — port up/down, PoE budget + per-port draw, errors, **uplink negotiated
-  speed** (the FastEthernet-uplink issue is still not scriptable).
+- [x] **Switch/PoE collector** — DONE: `switch-audit.sh` (port up/down, PoE budget+draw, ERRORS,
+  rate-based DROPS, PoE faults, and UNDERSPEED = gig-capable port linked at 10/100 = the FastEthernet
+  issue, now systematically found — ~25 such ports at Cascades). Controller-side, any site.
 - [ ] **Gateway/WAN/firewall + adoption** — WAN health/failover, pending-adoption devices.
 - The access layer already reaches these (`uos-mongo.sh` = whole `ace` DB; controller API + device SSH);
   they just need dedicated scripts. Consider a sibling `unifi` skill if scope grows.
diff --git a/.claude/skills/unifi-wifi/scripts/switch-audit.sh b/.claude/skills/unifi-wifi/scripts/switch-audit.sh
new file mode 100644
index 0000000..1470626
--- /dev/null
+++ b/.claude/skills/unifi-wifi/scripts/switch-audit.sh
@@ -0,0 +1,65 @@
+#!/usr/bin/env bash
+# switch-audit.sh - UniFi switch / PoE / port-health audit for a site (controller-side, read-only).
+# Per switch: PoE budget used vs total, port up/total, and FLAGS: underspeed links (a gig-capable
+# port negotiated at 10/100 - the classic AP "FastEthernet uplink" problem), ports with rx/tx errors
+# or drops, PoE faults (enabled but not good), and PoE budget pressure. No AP cred / VPN needed ->
+# works on ANY UOS site. First non-WiFi collector (ROADMAP C).
+#
+# Usage: bash .claude/skills/unifi-wifi/scripts/switch-audit.sh [--all-ports]
+set -uo pipefail
+REPO="$(git rev-parse --show-toplevel 2>/dev/null || echo .)"
+UOS="$REPO/.claude/scripts/uos-mongo.sh"; VAULT="$REPO/.claude/scripts/vault.sh"
+HOST="${UOS_HOST:-172.16.3.29}"; PORT="${UOS_HTTPS_PORT:-11443}"
+SITEARG="${1:?usage: switch-audit.sh [--all-ports]}"; ALL="${2:-}"
+TMP="$(mktemp -d)"; trap 'rm -rf "$TMP"' EXIT
+U="$(bash "$VAULT" get-field infrastructure/uos-server-network-api-rw credentials.username 2>/dev/null)"
+P="$(bash "$VAULT" get-field infrastructure/uos-server-network-api-rw credentials.password 2>/dev/null)"
+[ -n "$U" ] && [ -n "$P" ] || { echo "[ERROR] no controller cred (infrastructure/uos-server-network-api-rw)"; exit 1; }
+base="https://$HOST:$PORT"; CJ="$TMP/cj"
+code=$(curl -sk -c "$CJ" -o /dev/null -w '%{http_code}' -X POST "$base/api/auth/login" -H 'Content-Type: application/json' \
+  --data-binary "$(python -c 'import json,sys;print(json.dumps({"username":sys.argv[1],"password":sys.argv[2]}))' "$U" "$P")")
+[ "$code" = "200" ] || { echo "[ERROR] controller login HTTP $code"; exit 1; }
+SHORT="$(curl -sk -b "$CJ" "$base/proxy/network/api/self/sites" | python -c "
+import sys,json; d=json.load(sys.stdin).get('data',[]); q='''$SITEARG'''.lower()
+for s in d:
+    if s.get('_id')=='''$SITEARG''' or s.get('name')=='''$SITEARG''' or q in (s.get('desc','').lower()): print(s.get('name')); break
+")"; [ -n "$SHORT" ] || SHORT="$SITEARG"
+echo "[INFO] switch audit: site=$SHORT"
+curl -sk -b "$CJ" "$base/proxy/network/api/s/$SHORT/stat/device" -o "$TMP/dev.json"
+python - "$TMP/dev.json" "$ALL" <<'PY'
+import sys,json
+d=json.load(open(sys.argv[1])).get('data',[]); ALL=(sys.argv[2]=='--all-ports')
+sw=[x for x in d if x.get('type')=='usw']
+print(f"==== {len(sw)} switches ====")
+tot_flags=0
+for s in sorted(sw,key=lambda x:str(x.get('name'))):
+    pt=s.get('port_table',[])
+    up=[p for p in pt if p.get('up')]
+    poe_used=sum(float(p.get('poe_power') or 0) for p in pt if p.get('poe_enable'))
+    budget=s.get('total_max_power') or 0
+    online = s.get('state')==1
+    hdr=f"\n{s.get('name')} [{s.get('model')}] {'OFFLINE ' if not online else ''}ports up {len(up)}/{len(pt)}"
+    if budget: hdr+=f" PoE {poe_used:.0f}/{budget}W"
+    print(hdr)
+    flags=[]
+    for p in up:
+        nm=f"p{p.get('port_idx')}({p.get('name')})"; sp=p.get('speed'); cap=p.get('speed_caps') or 0
+        # underspeed: up at 10/100 while the port supports 1000+ (speed_caps bit for 1000 = 0x10)
+        if sp in (10,100) and cap and cap>=16:
+            flags.append(f" [UNDERSPEED] {nm}: linked {sp}M but gig-capable - check cable/NIC/port")
+        if (p.get('rx_errors') or 0)+(p.get('tx_errors') or 0) > 1000:
+            flags.append(f" [ERRORS] {nm}: rx_err={p.get('rx_errors')} tx_err={p.get('tx_errors')}")
+        txp=p.get('tx_packets') or 0; txd=p.get('tx_dropped') or 0 # rate-based: cumulative counts are noisy
+        if txp>50000 and txd/(txp+1) > 0.02:
+            flags.append(f" [DROPS] {nm}: tx_drop {100*txd/(txp+1):.1f}% ({txd}/{txp})")
+        if p.get('poe_enable') and p.get('poe_good') is False:
+            flags.append(f" [POE-FAULT] {nm}: PoE enabled but not good (poe_class={p.get('poe_class')})")
+    if budget and poe_used>0.85*budget:
+        flags.append(f" [POE-BUDGET] {poe_used:.0f}/{budget}W (>85%) - near capacity")
+    if flags: tot_flags+=len(flags); print("\n".join(flags))
+    else: print(" [OK] no port issues")
+    if ALL:
+        for p in sorted(up,key=lambda x:x.get('port_idx') or 0):
+            print(f" p{p.get('port_idx'):>2} {str(p.get('name'))[:18]:<18} {str(p.get('speed'))+'M':>6} {'PoE '+str(p.get('poe_power'))+'W' if p.get('poe_enable') else '':>10}")
+print(f"\n==== {tot_flags} flag(s) across {len(sw)} switches ====")
+PY
diff --git a/clients/cascades-tucson/session-logs/2026-06/2026-06-15-howard-cascades-wifi-rf-audit.md b/clients/cascades-tucson/session-logs/2026-06/2026-06-15-howard-cascades-wifi-rf-audit.md
index a19e7d9..39e6019 100644
--- a/clients/cascades-tucson/session-logs/2026-06/2026-06-15-howard-cascades-wifi-rf-audit.md
+++ b/clients/cascades-tucson/session-logs/2026-06/2026-06-15-howard-cascades-wifi-rf-audit.md
@@ -446,3 +446,21 @@ apply-wlan restructured: SITE resolved before the action case (so aps can map na
 WiFi APPLY SURFACE COMPLETE: apply-radio (radio_table) + apply-wlan (wlanconf, 15 actions) +
 client-control (cmd/stamgr), all gated + rollback + validated on 0-client sandboxes / dummy MAC.
 SKILL.md + ROADMAP updated. Coord: this msg. NEXT: switches/PoE collector (ROADMAP C).
+
+---
+
+## Update: 2026-06-16 01:13 PT — switches: switch-audit.sh shipped (first non-WiFi collector, ROADMAP C)
+
+NEW scripts/switch-audit.sh [--all-ports] (controller-side, read-only, any site): per switch ->
+PoE budget used/total, ports up/total, FLAGS: UNDERSPEED (gig-capable port linked 10/100 = FastEthernet
+uplink problem), DROPS (rate-based tx_dropped/tx_packets>2%), ERRORS (>1000), POE-FAULT (poe_enable &
+!poe_good), POE-BUDGET (>85%), + OFFLINE switches. port_table fields: speed/speed_caps/poe_*/rx_tx_errors.
+
+VALIDATED on Cascades (12 switches): found ~25 ports linked at 100M but gig-capable (systematic
+cabling/NIC issue = the long-suspected FastEthernet uplink problem, now fleet-visible), offline switches
+(2nd Floor #2, 4th Floor #2, USW Pro Max 16), PoE budgets healthy, one real 4.1% tx-drop port (p38 on
+1st Floor USW). Refined DROPS to rate-based (cumulative counters were noisy).
+
+SKILL coverage now: WiFi (monitor+tune+full apply incl. device lock/client controls) + switch/PoE audit,
+all controller-side multi-client + AP-side where cred vaulted. ROADMAP remaining: gateway/WAN/firewall +
+adoption collectors. Coord: this msg.
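
Reviewer note: the per-port flag rules listed in the session log (UNDERSPEED, ERRORS, DROPS, POE-FAULT) can be restated as one pure function, which may help when tuning thresholds. This is a minimal sketch that copies the thresholds from `switch-audit.sh`. The function name `port_flags`, the constant names, and the sample port are hypothetical, and the `speed_caps >= 16` test simply mirrors the script's own reading of that field rather than a documented bitmask.

```python
ERROR_THRESHOLD = 1000      # cumulative rx+tx errors before flagging
MIN_TX_PACKETS = 50000      # ignore drop rates on low-traffic ports
DROP_RATE_THRESHOLD = 0.02  # flag when more than 2% of tx packets are dropped


def port_flags(port):
    """Return the flag names that apply to a single up port record."""
    flags = []
    speed = port.get("speed")
    caps = port.get("speed_caps") or 0
    # UNDERSPEED: linked at 10/100 although speed_caps suggests 1000+ support.
    if speed in (10, 100) and caps >= 16:
        flags.append("UNDERSPEED")
    # ERRORS: combined rx/tx error counters above a fixed threshold.
    if (port.get("rx_errors") or 0) + (port.get("tx_errors") or 0) > ERROR_THRESHOLD:
        flags.append("ERRORS")
    # DROPS: rate-based, because raw cumulative drop counters are noisy.
    tx_packets = port.get("tx_packets") or 0
    tx_dropped = port.get("tx_dropped") or 0
    if tx_packets > MIN_TX_PACKETS and tx_dropped / (tx_packets + 1) > DROP_RATE_THRESHOLD:
        flags.append("DROPS")
    # POE-FAULT: only when poe_good is explicitly False, not merely missing.
    if port.get("poe_enable") and port.get("poe_good") is False:
        flags.append("POE-FAULT")
    return flags


# Hypothetical port for illustration: 100M link on a gig-capable port with ~4.1% tx drops.
sample_port = {
    "speed": 100, "speed_caps": 0x30,
    "rx_errors": 0, "tx_errors": 0,
    "tx_packets": 100000, "tx_dropped": 4100,
    "poe_enable": True, "poe_good": True,
}
print(port_flags(sample_port))
```

The `+ 1` in the drop-rate denominator follows the script and only guards against division by zero; at the 50000-packet floor it does not meaningfully change the rate.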