sync: auto-sync from HOWARD-HOME at 2026-06-16 01:20:51
Author: Howard Enos Machine: HOWARD-HOME Timestamp: 2026-06-16 01:20:51
@@ -29,9 +29,12 @@ path is Cascades — override with the script's vault-path arg per client.
 - **[WORKING] Switch / PoE audit** — `scripts/switch-audit.sh <site> [--all-ports]`: per-switch PoE
 budget, port up/total, and flags (UNDERSPEED gig-port-at-10/100, rate-based DROPS, ERRORS, PoE
 faults, PoE-budget pressure). Controller-side, any site. (Found ~25 100M-linked gig ports at Cascades.)
-- **[WIP] Gateways / WAN / firewall, adoption, client DHCP/DNS** — not yet wrapped as collectors.
-The access layer reaches them (`uos-mongo.sh` = whole `ace` DB; controller API + device SSH) —
-these just need dedicated scripts. Productize as the need arises (or split into a sibling `unifi` skill).
+- **[WORKING] Gateway / WAN / internet + site health** — `scripts/gw-audit.sh <site>`: WAN status/IP/
+uplink, internet latency/drops/speedtest, gateway CPU/mem/uptime, and the adoption/health rollup
+(APs/switches adopted vs disconnected vs pending, client counts, firmware-upgradable) with flags.
+Handles third-party-firewall sites (num_gw=0). Controller-side, any site.
+- **[WIP] Deeper firewall/VPN policy, client DHCP/DNS, adoption *remediation*** — read/health is covered
+by gw-audit; deeper config + remediation actions are future. Access layer reaches them already.
 - **Per-client requirement:** `watch-ap`/`neighbor-collect`/`survey-collect`/`dfs-check` default the
 AP device-auth SSH cred to `clients/cascades-tucson/unifi-ap-ssh`; for another client, vault its
 own `clients/<x>/unifi-ap-ssh` and pass it as the script's vault-path arg.
@@ -53,9 +53,11 @@ side, multi-client enablement, and non-WiFi scope. Build/validate new apply acti
 - [x] **Switch/PoE collector** — DONE: `switch-audit.sh` (port up/down, PoE budget+draw, ERRORS,
 rate-based DROPS, PoE faults, and UNDERSPEED = gig-capable port linked at 10/100 = the FastEthernet
 issue, now systematically found — ~25 such ports at Cascades). Controller-side, any site.
-- [ ] **Gateway/WAN/firewall + adoption** — WAN health/failover, pending-adoption devices.
-- The access layer already reaches these (`uos-mongo.sh` = whole `ace` DB; controller API + device SSH);
-they just need dedicated scripts. Consider a sibling `unifi` skill if scope grows.
+- [x] **Gateway/WAN + site health** — DONE: `gw-audit.sh` (WAN status/IP/uplink, internet latency/
+drops/speedtest, gw CPU/mem/uptime, adoption rollup: adopted/disconnected/pending + firmware-upgradable,
+with flags). Handles third-party-firewall sites. Validated on a USG site + the pfSense Cascades site.
+- [ ] **Deeper firewall/VPN policy + adoption remediation** — read/health covered; config + remediation
+actions (adopt a pending device, restart, etc.) are future. Access layer already reaches these.
 
 ## D. Robustness / ops
 - [ ] **VPN-flap resilience** in the AP-side loops (resume/retry so a mid-run tunnel drop doesn't waste
92
.claude/skills/unifi-wifi/scripts/gw-audit.sh
Normal file
@@ -0,0 +1,92 @@
+#!/usr/bin/env bash
+# gw-audit.sh — gateway / WAN / internet + overall site-health audit (controller-side, read-only).
+# Uses the controller's own stat/health (wan/www/wlan/lan/vpn subsystems) + the gateway device's
+# per-WAN detail. Reports WAN status + IP + uplink speed, internet latency/drops/last-speedtest,
+# gateway CPU/mem/uptime, and the adoption/health rollup (APs/switches adopted vs disconnected vs
+# pending, client counts). Flags: WAN/internet not-ok, high latency/drops, disconnected or pending
+# devices, gateway resource pressure. No AP cred / VPN needed -> any UOS site. (ROADMAP C)
+#
+# Sites with a third-party firewall (e.g. pfSense) show num_gw=0 -> "no UniFi gateway" (still reports
+# wlan/lan/adoption health).
+#
+# Usage: bash .claude/skills/unifi-wifi/scripts/gw-audit.sh <site-name|id>
+set -uo pipefail
+REPO="$(git rev-parse --show-toplevel 2>/dev/null || echo .)"
+UOS="$REPO/.claude/scripts/uos-mongo.sh"; VAULT="$REPO/.claude/scripts/vault.sh"
+HOST="${UOS_HOST:-172.16.3.29}"; PORT="${UOS_HTTPS_PORT:-11443}"
+SITEARG="${1:?usage: gw-audit.sh <site-name|id>}"
+TMP="$(mktemp -d)"; trap 'rm -rf "$TMP"' EXIT
+U="$(bash "$VAULT" get-field infrastructure/uos-server-network-api-rw credentials.username 2>/dev/null)"
+P="$(bash "$VAULT" get-field infrastructure/uos-server-network-api-rw credentials.password 2>/dev/null)"
+[ -n "$U" ] && [ -n "$P" ] || { echo "[ERROR] no controller cred (infrastructure/uos-server-network-api-rw)"; exit 1; }
+base="https://$HOST:$PORT"; CJ="$TMP/cj"
+code=$(curl -sk -c "$CJ" -o /dev/null -w '%{http_code}' -X POST "$base/api/auth/login" -H 'Content-Type: application/json' \
+  --data-binary "$(python -c 'import json,sys;print(json.dumps({"username":sys.argv[1],"password":sys.argv[2]}))' "$U" "$P")")
+[ "$code" = "200" ] || { echo "[ERROR] controller login HTTP $code"; exit 1; }
+SHORT="$(curl -sk -b "$CJ" "$base/proxy/network/api/self/sites" | python -c "
+import sys,json; d=json.load(sys.stdin).get('data',[]); q='''$SITEARG'''.lower()
+for s in d:
+    if s.get('_id')=='''$SITEARG''' or s.get('name')=='''$SITEARG''' or q in (s.get('desc','').lower()): print(s.get('name')); break
+")"; [ -n "$SHORT" ] || SHORT="$SITEARG"
+echo "[INFO] gateway/health audit: site=$SHORT"
+curl -sk -b "$CJ" "$base/proxy/network/api/s/$SHORT/stat/health" -o "$TMP/health.json"
+curl -sk -b "$CJ" "$base/proxy/network/api/s/$SHORT/stat/device" -o "$TMP/dev.json"
+python - "$TMP/health.json" "$TMP/dev.json" <<'PY'
+import sys,json
+H={h['subsystem']:h for h in json.load(open(sys.argv[1])).get('data',[])}
+devs=json.load(open(sys.argv[2])).get('data',[])
+flags=[]
+def st(x): return (x or 'unknown')
+wan=H.get('wan',{}); www=H.get('www',{}); wlan=H.get('wlan',{}); lan=H.get('lan',{}); vpn=H.get('vpn',{})
+
+print("\n== WAN / Internet ==")
+ngw=wan.get('num_gw',0)
+if not ngw:
+    print(" no UniFi gateway at this site (third-party firewall, e.g. pfSense)")
+else:
+    gwdev=next((d for d in devs if d.get('type') in ('ugw','uxg','udm')),None)
+    gwname=wan.get('gw_name') or (gwdev and (gwdev.get('name') or gwdev.get('model'))) or '?'
+    gwmodel=(gwdev and gwdev.get('model')) or ''
+    print(f" gateway: {gwname} [{gwmodel}] ({wan.get('gw_version')}) status={st(wan.get('status'))}")
+    print(f" WAN IP: {wan.get('wan_ip')} gateways: {wan.get('gateways')} nameservers: {wan.get('nameservers')}")
+    gs=wan.get('gw_system-stats') or {}
+    if gs: print(f" gw load: cpu={gs.get('cpu')}% mem={gs.get('mem')}% uptime={int(int(gs.get('uptime',0))/86400)}d")
+    if st(wan.get('status'))!='ok': flags.append(f"WAN status={wan.get('status')}")
+    try:
+        if float(gs.get('cpu',0))>85: flags.append(f"gateway CPU {gs.get('cpu')}%")
+        if float(gs.get('mem',0))>90: flags.append(f"gateway MEM {gs.get('mem')}%")
+    except: pass
+    # internet (www)
+    print(f" internet: status={st(www.get('status'))} latency={www.get('latency')}ms drops={www.get('drops')} "
+          f"speedtest={www.get('xput_down')}/{www.get('xput_up')} Mbps ping={www.get('speedtest_ping')}ms")
+    if www and st(www.get('status'))!='ok': flags.append(f"internet(www) status={www.get('status')}")
+    try:
+        if www.get('latency') and float(www.get('latency'))>80: flags.append(f"internet latency {www.get('latency')}ms")
+        if www.get('drops') and float(www.get('drops'))>5: flags.append(f"internet drops={www.get('drops')}")
+    except: pass
+    # per-WAN device detail (multi-WAN)
+    for d in devs:
+        if d.get('type') in ('ugw','uxg','udm'):
+            for wk in ('wan1','wan2'):
+                w=d.get(wk) or {}
+                if w.get('enable'): print(f" {wk}: ip={w.get('ip')} up={w.get('up')} {w.get('speed')}M{'' if w.get('full_duplex',True) else ' HALF-DUPLEX'}")
+                if w.get('enable') and not w.get('up'): flags.append(f"{wk} DOWN")
+
+print("\n== Adoption / device health ==")
+for label,sub in (('APs (wlan)',wlan),('switches (lan)',lan),('gateway (wan)',wan)):
+    a=sub.get('num_adopted'); dc=sub.get('num_disconnected'); pe=sub.get('num_pending')
+    if a is None and dc is None: continue
+    print(f" {label}: adopted={a} disconnected={dc} pending={pe}")
+    if dc: flags.append(f"{label}: {dc} disconnected")
+    if pe: flags.append(f"{label}: {pe} pending adoption")
+print(f" clients: users={wlan.get('num_user')} guests={wlan.get('num_guest')} iot={wlan.get('num_iot')}")
+# outdated firmware (from device list)
+outd=[d.get('name') for d in devs if d.get('upgradable')]
+if outd: flags.append(f"{len(outd)} device(s) with firmware upgrade available")
+
+print("\n== FLAGS ==")
+if flags:
+    for f in flags: print(f" [!] {f}")
+else:
+    print(" [OK] gateway/WAN/adoption all healthy")
+PY
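The site lookup in `gw-audit.sh` interpolates `$SITEARG` directly into an inline Python string, so a site name containing quotes could break the snippet. Below is a minimal sketch of the same matching rule (exact `_id`, exact `name`, or case-insensitive substring of `desc`, first hit wins) written as a standalone function that receives the query as a plain value. The function name `resolve_site` and the sample records are hypothetical and are not part of this commit; the record shape is assumed from the fields the script reads.

```python
def resolve_site(sites, query):
    """Return the short site name for query, or query itself if nothing matches.

    Mirrors the rule used in gw-audit.sh: exact _id, exact name, or a
    case-insensitive substring match on desc. The first matching site wins.
    """
    q = query.lower()
    for s in sites:
        if s.get('_id') == query or s.get('name') == query or q in s.get('desc', '').lower():
            return s.get('name')
    # Same fallback as the script: use the argument unchanged.
    return query


# Hypothetical sample shaped like the `data` list the script reads from self/sites.
SAMPLE_SITES = [
    {'_id': 'aaa111', 'name': 'default', 'desc': 'Main Office'},
    {'_id': 'bbb222', 'name': 'x7k2m9qp', 'desc': 'Cascades Tucson'},
]

if __name__ == '__main__':
    print(resolve_site(SAMPLE_SITES, 'cascades'))
    print(resolve_site(SAMPLE_SITES, 'unknown-site'))
```

In the shell script this could be fed from `sys.argv[1]` rather than string interpolation, in the same way the login step already passes `$U` and `$P` as arguments.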
@@ -464,3 +464,22 @@ cabling/NIC issue = the long-suspected FastEthernet uplink problem, now fleet-vi
 SKILL coverage now: WiFi (monitor+tune+full apply incl. device lock/client controls) + switch/PoE audit,
 all controller-side multi-client + AP-side where cred vaulted. ROADMAP remaining: gateway/WAN/firewall +
 adoption collectors. Coord: this msg.
+
+---
+
+## Update: 2026-06-16 01:20 PT — gateway/WAN + site-health collector (gw-audit.sh); ROADMAP C largely done
+
+NEW scripts/gw-audit.sh <site> (controller-side, any site, read-only) via stat/health + gateway device:
+WAN status/IP/uplink/duplex + gw CPU/mem/uptime; internet(www) status/latency/drops/last-speedtest;
+health rollup (APs/switches/gw adopted vs disconnected vs pending, client user/guest/iot, firmware-
+upgradable); flags WAN/internet not-ok, latency>80/drops>5, WAN down, disconnected/pending, gw cpu>85/
+mem>90. Third-party-firewall sites (num_gw=0) handled. Fixed: gw_name fallback to device name; drops
+threshold >5 (was >0, too sensitive).
+
+Validated: Sonoran Glass (USG 3P [UGW3]) -> all healthy (cpu2%/mem21%, latency 14-24ms); Cascades
+(pfSense) -> "no UniFi gateway" + correctly caught 2 disconnected APs + 3 disconnected switches.
+
+SKILL now end-to-end for any client: WiFi (monitor + tune + full apply incl device-lock/client-control),
+switch/PoE audit, gateway/WAN/site-health audit, sites.sh discovery. ROADMAP remaining: deeper
+firewall/VPN policy + adoption REMEDIATION (adopt/restart), channel-plan apply, per-client AP creds/VPN.
+Coord: this msg.
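The thresholds named in the update note (latency > 80 ms, drops > 5, gateway CPU > 85 %, memory > 90 %) can be expressed as a small table-driven check. This is a sketch under assumptions, not the code in the commit: the function name `health_flags` is hypothetical, and the inputs are assumed to be the `www` health record and the `gw_system-stats` dict that `gw-audit.sh` reads.

```python
def health_flags(www, gw_stats):
    """Return flag strings for metrics that exceed the thresholds from the note.

    www      -- dict with optional 'latency' and 'drops'
    gw_stats -- dict with optional 'cpu' and 'mem' (numbers or numeric strings)
    """
    flags = []
    checks = (
        (www, 'latency', 80, 'internet latency {}ms'),
        (www, 'drops', 5, 'internet drops={}'),
        (gw_stats, 'cpu', 85, 'gateway CPU {}%'),
        (gw_stats, 'mem', 90, 'gateway MEM {}%'),
    )
    for source, key, limit, template in checks:
        raw = source.get(key)
        try:
            value = float(raw)
        except (TypeError, ValueError):
            continue  # missing or non-numeric metric: skip it rather than flag it
        if value > limit:
            flags.append(template.format(raw))
    return flags


if __name__ == '__main__':
    for flag in health_flags({'latency': 120, 'drops': 2}, {'cpu': '2.0', 'mem': '95'}):
        print(f" [!] {flag}")
```

One difference from the script is deliberate: each metric is converted on its own, so a single malformed value does not suppress the remaining checks, whereas the script wraps each pair of comparisons in one `try`.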