sync: auto-sync from HOWARD-HOME at 2026-06-16 01:20:51

Author: Howard Enos
Machine: HOWARD-HOME
Timestamp: 2026-06-16 01:20:51
2026-06-16 01:21:00 -07:00
parent 4e797dbf61
commit 03b429d10a
4 changed files with 122 additions and 6 deletions

@@ -29,9 +29,12 @@ path is Cascades — override with the script's vault-path arg per client.
- **[WORKING] Switch / PoE audit** — `scripts/switch-audit.sh <site> [--all-ports]`: per-switch PoE
budget, port up/total, and flags (UNDERSPEED gig-port-at-10/100, rate-based DROPS, ERRORS, PoE
faults, PoE-budget pressure). Controller-side, any site. (Found ~25 100M-linked gig ports at Cascades.)
- **[WIP] Gateways / WAN / firewall, adoption, client DHCP/DNS** — not yet wrapped as collectors.
The access layer reaches them (`uos-mongo.sh` = whole `ace` DB; controller API + device SSH) —
these just need dedicated scripts. Productize as the need arises (or split into a sibling `unifi` skill).
- **[WORKING] Gateway / WAN / internet + site health** — `scripts/gw-audit.sh <site>`: WAN status/IP/
uplink, internet latency/drops/speedtest, gateway CPU/mem/uptime, and the adoption/health rollup
(APs/switches adopted vs disconnected vs pending, client counts, firmware-upgradable) with flags.
Handles third-party-firewall sites (num_gw=0). Controller-side, any site.
- **[WIP] Deeper firewall/VPN policy, client DHCP/DNS, adoption *remediation*** — read/health is covered
by gw-audit; deeper config + remediation actions are future. Access layer reaches them already.
- **Per-client requirement:** `watch-ap`/`neighbor-collect`/`survey-collect`/`dfs-check` default the
AP device-auth SSH cred to `clients/cascades-tucson/unifi-ap-ssh`; for another client, vault its
own `clients/<x>/unifi-ap-ssh` and pass it as the script's vault-path arg.
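The adoption/health rollup described in the `gw-audit.sh` bullet above can be sketched in isolation. This is a minimal illustration, not the script itself: the field names (`subsystem`, `num_adopted`, `num_disconnected`, `num_pending`) are the ones the script reads from the controller's `stat/health` response, while the function name `adoption_flags` and the sample rows are hypothetical.

```python
def adoption_flags(health_rows):
    """Return flag strings for subsystems with disconnected or pending devices."""
    labels = {"wlan": "APs (wlan)", "lan": "switches (lan)", "wan": "gateway (wan)"}
    by_subsystem = {row.get("subsystem"): row for row in health_rows}
    flags = []
    for key, label in labels.items():
        sub = by_subsystem.get(key, {})
        disconnected = sub.get("num_disconnected")
        pending = sub.get("num_pending")
        # A subsystem with no adoption counters at all (e.g. no UniFi gateway) is skipped.
        if sub.get("num_adopted") is None and disconnected is None:
            continue
        if disconnected:
            flags.append(f"{label}: {disconnected} disconnected")
        if pending:
            flags.append(f"{label}: {pending} pending adoption")
    return flags


if __name__ == "__main__":
    # Made-up rows for illustration only; not captured from a real controller.
    sample = [
        {"subsystem": "wlan", "num_adopted": 10, "num_disconnected": 2, "num_pending": 0},
        {"subsystem": "lan", "num_adopted": 4, "num_disconnected": 0, "num_pending": 1},
        {"subsystem": "wan", "num_gw": 0},
    ]
    for flag in adoption_flags(sample):
        print(f"[!] {flag}")
```

Skipping a subsystem that reports no counters is what lets a third-party-firewall site (num_gw=0) still produce a clean wlan/lan rollup.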

@@ -53,9 +53,11 @@ side, multi-client enablement, and non-WiFi scope. Build/validate new apply acti
- [x] **Switch/PoE collector** — DONE: `switch-audit.sh` (port up/down, PoE budget+draw, ERRORS,
rate-based DROPS, PoE faults, and UNDERSPEED = gig-capable port linked at 10/100 = the FastEthernet
issue, now systematically found — ~25 such ports at Cascades). Controller-side, any site.
- [ ] **Gateway/WAN/firewall + adoption** — WAN health/failover, pending-adoption devices.
- The access layer already reaches these (`uos-mongo.sh` = whole `ace` DB; controller API + device SSH);
they just need dedicated scripts. Consider a sibling `unifi` skill if scope grows.
- [x] **Gateway/WAN + site health** — DONE: `gw-audit.sh` (WAN status/IP/uplink, internet latency/
drops/speedtest, gw CPU/mem/uptime, adoption rollup: adopted/disconnected/pending + firmware-upgradable,
with flags). Handles third-party-firewall sites. Validated on a USG site + the pfSense Cascades site.
- [ ] **Deeper firewall/VPN policy + adoption remediation** — read/health covered; config + remediation
actions (adopt a pending device, restart, etc.) are future. Access layer already reaches these.
## D. Robustness / ops
- [ ] **VPN-flap resilience** in the AP-side loops (resume/retry so a mid-run tunnel drop doesn't waste

@@ -0,0 +1,92 @@
#!/usr/bin/env bash
# gw-audit.sh — gateway / WAN / internet + overall site-health audit (controller-side, read-only).
# Uses the controller's own stat/health (wan/www/wlan/lan/vpn subsystems) + the gateway device's
# per-WAN detail. Reports WAN status + IP + uplink speed, internet latency/drops/last-speedtest,
# gateway CPU/mem/uptime, and the adoption/health rollup (APs/switches adopted vs disconnected vs
# pending, client counts). Flags: WAN/internet not-ok, high latency/drops, disconnected or pending
# devices, gateway resource pressure. No AP cred / VPN needed -> any UOS site. (ROADMAP C)
#
# Sites with a third-party firewall (e.g. pfSense) show num_gw=0 -> "no UniFi gateway" (still reports
# wlan/lan/adoption health).
#
# Usage: bash .claude/skills/unifi-wifi/scripts/gw-audit.sh <site-name|id>
set -uo pipefail
REPO="$(git rev-parse --show-toplevel 2>/dev/null || echo .)"
UOS="$REPO/.claude/scripts/uos-mongo.sh"; VAULT="$REPO/.claude/scripts/vault.sh"
HOST="${UOS_HOST:-172.16.3.29}"; PORT="${UOS_HTTPS_PORT:-11443}"
SITEARG="${1:?usage: gw-audit.sh <site-name|id>}"
TMP="$(mktemp -d)"; trap 'rm -rf "$TMP"' EXIT
U="$(bash "$VAULT" get-field infrastructure/uos-server-network-api-rw credentials.username 2>/dev/null)"
P="$(bash "$VAULT" get-field infrastructure/uos-server-network-api-rw credentials.password 2>/dev/null)"
[ -n "$U" ] && [ -n "$P" ] || { echo "[ERROR] no controller cred (infrastructure/uos-server-network-api-rw)"; exit 1; }
base="https://$HOST:$PORT"; CJ="$TMP/cj"
code=$(curl -sk -c "$CJ" -o /dev/null -w '%{http_code}' -X POST "$base/api/auth/login" -H 'Content-Type: application/json' \
--data-binary "$(python -c 'import json,sys;print(json.dumps({"username":sys.argv[1],"password":sys.argv[2]}))' "$U" "$P")")
[ "$code" = "200" ] || { echo "[ERROR] controller login HTTP $code"; exit 1; }
SHORT="$(curl -sk -b "$CJ" "$base/proxy/network/api/self/sites" | python -c "
import sys,json; d=json.load(sys.stdin).get('data',[]); q='''$SITEARG'''.lower()
for s in d:
    if s.get('_id')=='''$SITEARG''' or s.get('name')=='''$SITEARG''' or q in (s.get('desc','').lower()): print(s.get('name')); break
")"; [ -n "$SHORT" ] || SHORT="$SITEARG"
echo "[INFO] gateway/health audit: site=$SHORT"
curl -sk -b "$CJ" "$base/proxy/network/api/s/$SHORT/stat/health" -o "$TMP/health.json"
curl -sk -b "$CJ" "$base/proxy/network/api/s/$SHORT/stat/device" -o "$TMP/dev.json"
python - "$TMP/health.json" "$TMP/dev.json" <<'PY'
import sys,json
H={h['subsystem']:h for h in json.load(open(sys.argv[1])).get('data',[])}
devs=json.load(open(sys.argv[2])).get('data',[])
flags=[]
def st(x): return (x or 'unknown')
wan=H.get('wan',{}); www=H.get('www',{}); wlan=H.get('wlan',{}); lan=H.get('lan',{}); vpn=H.get('vpn',{})
print("\n== WAN / Internet ==")
ngw=wan.get('num_gw',0)
if not ngw:
    print(" no UniFi gateway at this site (third-party firewall, e.g. pfSense)")
else:
    gwdev=next((d for d in devs if d.get('type') in ('ugw','uxg','udm')),None)
    gwname=wan.get('gw_name') or (gwdev and (gwdev.get('name') or gwdev.get('model'))) or '?'
    gwmodel=(gwdev and gwdev.get('model')) or ''
    print(f" gateway: {gwname} [{gwmodel}] ({wan.get('gw_version')}) status={st(wan.get('status'))}")
    print(f" WAN IP: {wan.get('wan_ip')} gateways: {wan.get('gateways')} nameservers: {wan.get('nameservers')}")
    gs=wan.get('gw_system-stats') or {}
    if gs: print(f" gw load: cpu={gs.get('cpu')}% mem={gs.get('mem')}% uptime={int(int(gs.get('uptime',0))/86400)}d")
    if st(wan.get('status'))!='ok': flags.append(f"WAN status={wan.get('status')}")
    try:
        if float(gs.get('cpu',0))>85: flags.append(f"gateway CPU {gs.get('cpu')}%")
        if float(gs.get('mem',0))>90: flags.append(f"gateway MEM {gs.get('mem')}%")
    except: pass
    # internet (www)
    print(f" internet: status={st(www.get('status'))} latency={www.get('latency')}ms drops={www.get('drops')} "
          f"speedtest={www.get('xput_down')}/{www.get('xput_up')} Mbps ping={www.get('speedtest_ping')}ms")
    if www and st(www.get('status'))!='ok': flags.append(f"internet(www) status={www.get('status')}")
    try:
        if www.get('latency') and float(www.get('latency'))>80: flags.append(f"internet latency {www.get('latency')}ms")
        if www.get('drops') and float(www.get('drops'))>5: flags.append(f"internet drops={www.get('drops')}")
    except: pass
    # per-WAN device detail (multi-WAN)
    for d in devs:
        if d.get('type') in ('ugw','uxg','udm'):
            for wk in ('wan1','wan2'):
                w=d.get(wk) or {}
                if w.get('enable'): print(f" {wk}: ip={w.get('ip')} up={w.get('up')} {w.get('speed')}M{'' if w.get('full_duplex',True) else ' HALF-DUPLEX'}")
                if w.get('enable') and not w.get('up'): flags.append(f"{wk} DOWN")
print("\n== Adoption / device health ==")
for label,sub in (('APs (wlan)',wlan),('switches (lan)',lan),('gateway (wan)',wan)):
    a=sub.get('num_adopted'); dc=sub.get('num_disconnected'); pe=sub.get('num_pending')
    if a is None and dc is None: continue
    print(f" {label}: adopted={a} disconnected={dc} pending={pe}")
    if dc: flags.append(f"{label}: {dc} disconnected")
    if pe: flags.append(f"{label}: {pe} pending adoption")
print(f" clients: users={wlan.get('num_user')} guests={wlan.get('num_guest')} iot={wlan.get('num_iot')}")
# outdated firmware (from device list)
outd=[d.get('name') for d in devs if d.get('upgradable')]
if outd: flags.append(f"{len(outd)} device(s) with firmware upgrade available")
print("\n== FLAGS ==")
if flags:
    for f in flags: print(f" [!] {f}")
else:
    print(" [OK] gateway/WAN/adoption all healthy")
PY

@@ -464,3 +464,22 @@ cabling/NIC issue = the long-suspected FastEthernet uplink problem, now fleet-vi
SKILL coverage now: WiFi (monitor+tune+full apply incl. device lock/client controls) + switch/PoE audit,
all controller-side multi-client + AP-side where cred vaulted. ROADMAP remaining: gateway/WAN/firewall +
adoption collectors. Coord: this msg.
---
## Update: 2026-06-16 01:20 PT — gateway/WAN + site-health collector (gw-audit.sh); ROADMAP C largely done
NEW scripts/gw-audit.sh <site> (controller-side, any site, read-only) via stat/health + gateway device:
WAN status/IP/uplink/duplex + gw CPU/mem/uptime; internet(www) status/latency/drops/last-speedtest;
health rollup (APs/switches/gw adopted vs disconnected vs pending, client user/guest/iot, firmware-
upgradable); flags WAN/internet not-ok, latency>80/drops>5, WAN down, disconnected/pending, gw cpu>85/
mem>90. Third-party-firewall sites (num_gw=0) handled. Fixed: gw_name fallback to device name; drops
threshold >5 (was >0, too sensitive).
Validated: Sonoran Glass (USG 3P [UGW3]) -> all healthy (cpu2%/mem21%, latency 14-24ms); Cascades
(pfSense) -> "no UniFi gateway" + correctly caught 2 disconnected APs + 3 disconnected switches.
SKILL now end-to-end for any client: WiFi (monitor + tune + full apply incl device-lock/client-control),
switch/PoE audit, gateway/WAN/site-health audit, sites.sh discovery. ROADMAP remaining: deeper
firewall/VPN policy + adoption REMEDIATION (adopt/restart), channel-plan apply, per-client AP creds/VPN.
Coord: this msg.
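
For reference, the flag thresholds listed in this update (latency>80, drops>5, gw cpu>85, mem>90) can be expressed as a small table-driven check. This is a rough sketch under stated assumptions, not the code in `gw-audit.sh`: the helper names `_num` and `threshold_flags`, the constant names, and the sample values are hypothetical; only the limits and the `latency`/`drops`/`cpu`/`mem` keys come from the notes and script above.

```python
# Limits as quoted in the update note; the constant names are illustrative.
LATENCY_MS_MAX = 80
DROPS_MAX = 5
CPU_PCT_MAX = 85
MEM_PCT_MAX = 90


def _num(value):
    """Return value as a float, or None when it is missing or not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def threshold_flags(www, gw_stats):
    """Build flag strings from the www subsystem and the gateway system stats."""
    checks = (
        ("internet latency", _num(www.get("latency")), LATENCY_MS_MAX, "ms"),
        ("internet drops", _num(www.get("drops")), DROPS_MAX, ""),
        ("gateway CPU", _num(gw_stats.get("cpu")), CPU_PCT_MAX, "%"),
        ("gateway MEM", _num(gw_stats.get("mem")), MEM_PCT_MAX, "%"),
    )
    flags = []
    for label, value, limit, unit in checks:
        # Strictly greater than the limit, matching the ">" comparisons in the note.
        if value is not None and value > limit:
            flags.append(f"{label} {value:g}{unit} (limit {limit}{unit})")
    return flags


if __name__ == "__main__":
    # Made-up sample values, not taken from a real controller.
    for flag in threshold_flags({"latency": 120, "drops": 3}, {"cpu": "91.5", "mem": "40"}):
        print(f"[!] {flag}")
```

Keeping the limits in one table makes a later retune (like the drops change from >0 to >5) a one-line edit, and converting values up front means a missing or non-numeric reading is skipped explicitly rather than swallowed by a broad exception handler.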