sync: auto-sync from HOWARD-HOME at 2026-06-16 01:13:51

Author: Howard Enos
Machine: HOWARD-HOME
Timestamp: 2026-06-16 01:13:51
2026-06-16 01:13:59 -07:00
parent 6557cdb5bb
commit 4e797dbf61
4 changed files with 92 additions and 6 deletions


@@ -26,10 +26,12 @@ path is Cascades — override with the script's vault-path arg per client.
empirical DFS radar history (`dfs-check`), the **AP-to-AP SNR neighbor matrix** from
`/proc/ui_neighbor` (`neighbor-collect`), per-AP live stream (`watch-ap`), and gated config apply
(`apply-radio`). All site-parameterized → works on every UniFi site we monitor.
- **[WIP] Switches / PoE, gateways / WAN / firewall, adoption, client DHCP/DNS** — not yet wrapped
as collectors. The access layer reaches them (the same `uos-mongo.sh` covers the whole `ace` DB;
the controller API + device SSH reach switches/gateways) — these just need dedicated scripts.
Use ad-hoc for now; productize as the need arises (or split into a sibling `unifi` skill).
- **[WORKING] Switch / PoE audit** — `scripts/switch-audit.sh <site> [--all-ports]`: per-switch PoE
budget, port up/total, and flags (UNDERSPEED gig-port-at-10/100, rate-based DROPS, ERRORS, PoE
faults, PoE-budget pressure). Controller-side, any site. (Found ~25 gig-capable ports linked at 100M at Cascades.)
- **[WIP] Gateways / WAN / firewall, adoption, client DHCP/DNS** — not yet wrapped as collectors.
The access layer reaches them (`uos-mongo.sh` = whole `ace` DB; controller API + device SSH) —
these just need dedicated scripts. Productize as the need arises (or split into a sibling `unifi` skill).
- **Per-client requirement:** `watch-ap`/`neighbor-collect`/`survey-collect`/`dfs-check` default the
AP device-auth SSH cred to `clients/cascades-tucson/unifi-ap-ssh`; for another client, vault its
own `clients/<x>/unifi-ap-ssh` and pass it as the script's vault-path arg.


@@ -50,8 +50,9 @@ side, multi-client enablement, and non-WiFi scope. Build/validate new apply acti
scripts print the exact vault command when a client's cred is missing (and note controller-side works).
## C. Non-WiFi UniFi (currently WIP / out of scope)
- [ ] **Switch/PoE collector**: port up/down, PoE budget + per-port draw, errors, **uplink negotiated
speed** (the FastEthernet-uplink issue is still not scriptable).
- [x] **Switch/PoE collector** DONE: `switch-audit.sh` (port up/down, PoE budget+draw, ERRORS,
rate-based DROPS, PoE faults, and UNDERSPEED = gig-capable port linked at 10/100 = the FastEthernet
issue, now systematically found — ~25 such ports at Cascades). Controller-side, any site.
- [ ] **Gateway/WAN/firewall + adoption** — WAN health/failover, pending-adoption devices.
- The access layer already reaches these (`uos-mongo.sh` = whole `ace` DB; controller API + device SSH);
they just need dedicated scripts. Consider a sibling `unifi` skill if scope grows.


@@ -0,0 +1,65 @@
#!/usr/bin/env bash
# switch-audit.sh - UniFi switch / PoE / port-health audit for a site (controller-side, read-only).
# Per switch: PoE budget used vs total, port up/total, and FLAGS: underspeed links (a gig-capable
# port negotiated at 10/100 - the classic AP "FastEthernet uplink" problem), ports with rx/tx errors
# or drops, PoE faults (enabled but not good), and PoE budget pressure. No AP cred / VPN needed ->
# works on ANY UOS site. First non-WiFi collector (ROADMAP C).
#
# Usage: bash .claude/skills/unifi-wifi/scripts/switch-audit.sh <site-name|id> [--all-ports]
set -uo pipefail
REPO="$(git rev-parse --show-toplevel 2>/dev/null || echo .)"
UOS="$REPO/.claude/scripts/uos-mongo.sh"; VAULT="$REPO/.claude/scripts/vault.sh"
HOST="${UOS_HOST:-172.16.3.29}"; PORT="${UOS_HTTPS_PORT:-11443}"
SITEARG="${1:?usage: switch-audit.sh <site-name|id> [--all-ports]}"; ALL="${2:-}"
TMP="$(mktemp -d)"; trap 'rm -rf "$TMP"' EXIT
U="$(bash "$VAULT" get-field infrastructure/uos-server-network-api-rw credentials.username 2>/dev/null)"
P="$(bash "$VAULT" get-field infrastructure/uos-server-network-api-rw credentials.password 2>/dev/null)"
[ -n "$U" ] && [ -n "$P" ] || { echo "[ERROR] no controller cred (infrastructure/uos-server-network-api-rw)"; exit 1; }
base="https://$HOST:$PORT"; CJ="$TMP/cj"
code=$(curl -sk -c "$CJ" -o /dev/null -w '%{http_code}' -X POST "$base/api/auth/login" -H 'Content-Type: application/json' \
--data-binary "$(python -c 'import json,sys;print(json.dumps({"username":sys.argv[1],"password":sys.argv[2]}))' "$U" "$P")")
[ "$code" = "200" ] || { echo "[ERROR] controller login HTTP $code"; exit 1; }
SHORT="$(curl -sk -b "$CJ" "$base/proxy/network/api/self/sites" | python -c '
import sys,json
# The site argument arrives via argv instead of being spliced into the source, so quotes in it cannot break the lookup.
arg=sys.argv[1]; q=arg.lower(); d=json.load(sys.stdin).get("data",[])
for s in d:
    if s.get("_id")==arg or s.get("name")==arg or q in (s.get("desc") or "").lower(): print(s.get("name")); break
' "$SITEARG")"; [ -n "$SHORT" ] || SHORT="$SITEARG"
echo "[INFO] switch audit: site=$SHORT"
curl -sk -b "$CJ" "$base/proxy/network/api/s/$SHORT/stat/device" -o "$TMP/dev.json"
python - "$TMP/dev.json" "$ALL" <<'PY'
import sys,json
d=json.load(open(sys.argv[1])).get('data',[]); ALL=(sys.argv[2]=='--all-ports')
sw=[x for x in d if x.get('type')=='usw']
print(f"==== {len(sw)} switches ====")
tot_flags=0
for s in sorted(sw,key=lambda x:str(x.get('name'))):
    pt=s.get('port_table',[])
    up=[p for p in pt if p.get('up')]
    poe_used=sum(float(p.get('poe_power') or 0) for p in pt if p.get('poe_enable'))
    budget=s.get('total_max_power') or 0
    online = s.get('state')==1
    hdr=f"\n{s.get('name')} [{s.get('model')}] {'OFFLINE ' if not online else ''}ports up {len(up)}/{len(pt)}"
    if budget: hdr+=f" PoE {poe_used:.0f}/{budget}W"
    print(hdr)
    flags=[]
    for p in up:
        nm=f"p{p.get('port_idx')}({p.get('name')})"; sp=p.get('speed'); cap=p.get('speed_caps') or 0
        # underspeed: up at 10/100 while the port supports 1000+ (speed_caps bit for 1000 = 0x10)
        if sp in (10,100) and cap and cap>=16:
            flags.append(f" [UNDERSPEED] {nm}: linked {sp}M but gig-capable - check cable/NIC/port")
        if (p.get('rx_errors') or 0)+(p.get('tx_errors') or 0) > 1000:
            flags.append(f" [ERRORS] {nm}: rx_err={p.get('rx_errors')} tx_err={p.get('tx_errors')}")
        txp=p.get('tx_packets') or 0; txd=p.get('tx_dropped') or 0 # rate-based: cumulative counts are noisy
        if txp>50000 and txd/(txp+1) > 0.02:
            flags.append(f" [DROPS] {nm}: tx_drop {100*txd/(txp+1):.1f}% ({txd}/{txp})")
        if p.get('poe_enable') and p.get('poe_good') is False:
            flags.append(f" [POE-FAULT] {nm}: PoE enabled but not good (poe_class={p.get('poe_class')})")
    if budget and poe_used>0.85*budget:
        flags.append(f" [POE-BUDGET] {poe_used:.0f}/{budget}W (>85%) - near capacity")
    if flags: tot_flags+=len(flags); print("\n".join(flags))
    else: print(" [OK] no port issues")
    if ALL:
        for p in sorted(up,key=lambda x:x.get('port_idx') or 0):
            print(f" p{p.get('port_idx'):>2} {str(p.get('name'))[:18]:<18} {str(p.get('speed'))+'M':>6} {'PoE '+str(p.get('poe_power'))+'W' if p.get('poe_enable') else '':>10}")
print(f"\n==== {tot_flags} flag(s) across {len(sw)} switches ====")
PY
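
The per-port flag rules in the script above can be exercised without a controller. The following is a minimal sketch that lifts them into a standalone function and runs it on synthetic port records. The name `port_flags` and the sample dicts (including their `speed_caps` values) are illustrative assumptions, not part of the committed script; the field names and thresholds are the ones the script itself uses.

```python
def port_flags(p):
    """Return the flag names the audit would raise for one 'up' port record.

    Hypothetical helper for illustration; thresholds mirror switch-audit.sh.
    """
    flags = []
    speed = p.get("speed")
    caps = p.get("speed_caps") or 0
    # UNDERSPEED: linked at 10/100 although speed_caps is >= 16 (the script's gig-capable test)
    if speed in (10, 100) and caps >= 16:
        flags.append("UNDERSPEED")
    # ERRORS: cumulative rx + tx errors above 1000
    if (p.get("rx_errors") or 0) + (p.get("tx_errors") or 0) > 1000:
        flags.append("ERRORS")
    # DROPS: rate-based, only judged once the port has sent more than 50000 packets
    txp = p.get("tx_packets") or 0
    txd = p.get("tx_dropped") or 0
    if txp > 50000 and txd / (txp + 1) > 0.02:
        flags.append("DROPS")
    # POE-FAULT: PoE enabled and poe_good explicitly False (a missing field is not a fault)
    if p.get("poe_enable") and p.get("poe_good") is False:
        flags.append("POE-FAULT")
    return flags


# Synthetic records; all values are made up for the example.
samples = {
    "underspeed": {"speed": 100, "speed_caps": 48},
    "dropping": {"speed": 1000, "speed_caps": 48, "tx_packets": 100000, "tx_dropped": 4100},
    "low-traffic": {"speed": 1000, "speed_caps": 48, "tx_packets": 1000, "tx_dropped": 500},
    "poe-fault": {"speed": 1000, "speed_caps": 48, "poe_enable": True, "poe_good": False},
}
for name, rec in samples.items():
    print(name, port_flags(rec))
```

The `low-traffic` record shows why the packet floor matters: half its packets were dropped, but with only 1000 packets sent the rate is not yet treated as meaningful.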


@@ -446,3 +446,21 @@ apply-wlan restructured: SITE resolved before the action case (so aps can map na
WiFi APPLY SURFACE COMPLETE: apply-radio (radio_table) + apply-wlan (wlanconf, 15 actions) +
client-control (cmd/stamgr), all gated + rollback + validated on 0-client sandboxes / dummy MAC.
SKILL.md + ROADMAP updated. Coord: this msg. NEXT: switches/PoE collector (ROADMAP C).
---
## Update: 2026-06-16 01:13 PT — switches: switch-audit.sh shipped (first non-WiFi collector, ROADMAP C)
NEW scripts/switch-audit.sh <site> [--all-ports] (controller-side, read-only, any site): per switch ->
PoE budget used/total, ports up/total, FLAGS: UNDERSPEED (gig-capable port linked 10/100 = FastEthernet
uplink problem), DROPS (rate-based: tx_dropped/tx_packets >2% once tx_packets >50000), ERRORS (rx+tx
errors >1000), POE-FAULT (poe_enable & !poe_good), POE-BUDGET (>85%), + OFFLINE switches. port_table
fields: speed/speed_caps/poe_*/rx_errors/tx_errors/tx_packets/tx_dropped.
VALIDATED on Cascades (12 switches): found ~25 ports linked at 100M but gig-capable (systematic
cabling/NIC issue = the long-suspected FastEthernet uplink problem, now fleet-visible), offline switches
(2nd Floor #2, 4th Floor #2, USW Pro Max 16), PoE budgets healthy, one real 4.1% tx-drop port (p38 on
1st Floor USW). Refined DROPS to rate-based (cumulative counters were noisy).
SKILL coverage now: WiFi (monitor+tune+full apply incl. device lock/client controls) + switch/PoE audit,
all controller-side multi-client + AP-side where cred vaulted. ROADMAP remaining: gateway/WAN/firewall +
adoption collectors. Coord: this msg.
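
The switch-level part of the audit (OFFLINE and POE-BUDGET) reduces to a small calculation over one device record. A minimal sketch, assuming the same `stat/device` fields the script reads (`state`, `port_table`, `total_max_power`, `poe_power`, `poe_enable`); the function name `switch_summary` and the sample record are hypothetical:

```python
def switch_summary(sw, pressure=0.85):
    """Summarize one switch record the way the audit header does.

    Hypothetical helper for illustration; the 0.85 threshold and the
    state == 1 online test are taken from switch-audit.sh.
    """
    ports = sw.get("port_table", [])
    up = [p for p in ports if p.get("up")]
    # poe_power may arrive as a string, so coerce it before summing
    poe_used = sum(float(p.get("poe_power") or 0) for p in ports if p.get("poe_enable"))
    budget = sw.get("total_max_power") or 0
    return {
        "online": sw.get("state") == 1,
        "ports_up": len(up),
        "ports_total": len(ports),
        "poe_used_w": poe_used,
        "poe_budget_w": budget,
        # a switch without a PoE budget can never be under budget pressure
        "poe_pressure": bool(budget) and poe_used > pressure * budget,
    }


# Synthetic switch record; every value here is made up for the example.
example = {
    "state": 1,
    "total_max_power": 100,
    "port_table": [
        {"up": True, "poe_enable": True, "poe_power": "40.5"},
        {"up": True, "poe_enable": True, "poe_power": "46.0"},
        {"up": False, "poe_enable": False, "poe_power": "0.00"},
    ],
}
print(switch_summary(example))
```

Returning a dict rather than printing keeps the thresholds easy to check in isolation before they are wired into a report.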