From 03b429d10a81a7cf713d3b50733959618b7a016e Mon Sep 17 00:00:00 2001
From: Howard Enos
Date: Tue, 16 Jun 2026 01:21:00 -0700
Subject: [PATCH] sync: auto-sync from HOWARD-HOME at 2026-06-16 01:20:51

Author: Howard Enos
Machine: HOWARD-HOME
Timestamp: 2026-06-16 01:20:51
---
 .claude/skills/unifi-wifi/SKILL.md                 |  9 +-
 .../skills/unifi-wifi/references/ROADMAP.md        |  8 +-
 .claude/skills/unifi-wifi/scripts/gw-audit.sh      | 92 +++++++++++++++++++
 ...026-06-15-howard-cascades-wifi-rf-audit.md      | 19 ++++
 4 files changed, 122 insertions(+), 6 deletions(-)
 create mode 100644 .claude/skills/unifi-wifi/scripts/gw-audit.sh

diff --git a/.claude/skills/unifi-wifi/SKILL.md b/.claude/skills/unifi-wifi/SKILL.md
index d8efebf..14e6aa6 100644
--- a/.claude/skills/unifi-wifi/SKILL.md
+++ b/.claude/skills/unifi-wifi/SKILL.md
@@ -29,9 +29,12 @@ path is Cascades — override with the script's vault-path arg per client.
 - **[WORKING] Switch / PoE audit** — `scripts/switch-audit.sh [--all-ports]`: per-switch PoE budget,
   port up/total, and flags (UNDERSPEED gig-port-at-10/100, rate-based DROPS, ERRORS, PoE faults,
   PoE-budget pressure). Controller-side, any site. (Found ~25 100M-linked gig ports at Cascades.)
-- **[WIP] Gateways / WAN / firewall, adoption, client DHCP/DNS** — not yet wrapped as collectors.
-  The access layer reaches them (`uos-mongo.sh` = whole `ace` DB; controller API + device SSH) —
-  these just need dedicated scripts. Productize as the need arises (or split into a sibling `unifi` skill).
+- **[WORKING] Gateway / WAN / internet + site health** — `scripts/gw-audit.sh `: WAN status/IP/
+  uplink, internet latency/drops/speedtest, gateway CPU/mem/uptime, and the adoption/health rollup
+  (APs/switches adopted vs disconnected vs pending, client counts, firmware-upgradable) with flags.
+  Handles third-party-firewall sites (num_gw=0). Controller-side, any site.
+- **[WIP] Deeper firewall/VPN policy, client DHCP/DNS, adoption *remediation*** — read/health is covered
+  by gw-audit; deeper config + remediation actions are future. Access layer reaches them already.
 - **Per-client requirement:** `watch-ap`/`neighbor-collect`/`survey-collect`/`dfs-check` default the AP
   device-auth SSH cred to `clients/cascades-tucson/unifi-ap-ssh`; for another client, vault its own
   `clients//unifi-ap-ssh` and pass it as the script's vault-path arg.
diff --git a/.claude/skills/unifi-wifi/references/ROADMAP.md b/.claude/skills/unifi-wifi/references/ROADMAP.md
index efc8509..d2bc949 100644
--- a/.claude/skills/unifi-wifi/references/ROADMAP.md
+++ b/.claude/skills/unifi-wifi/references/ROADMAP.md
@@ -53,9 +53,11 @@ side, multi-client enablement, and non-WiFi scope. Build/validate new apply acti
 - [x] **Switch/PoE collector** — DONE: `switch-audit.sh` (port up/down, PoE budget+draw, ERRORS,
   rate-based DROPS, PoE faults, and UNDERSPEED = gig-capable port linked at 10/100 = the FastEthernet
   issue, now systematically found — ~25 such ports at Cascades). Controller-side, any site.
-- [ ] **Gateway/WAN/firewall + adoption** — WAN health/failover, pending-adoption devices.
-- The access layer already reaches these (`uos-mongo.sh` = whole `ace` DB; controller API + device SSH);
-  they just need dedicated scripts. Consider a sibling `unifi` skill if scope grows.
+- [x] **Gateway/WAN + site health** — DONE: `gw-audit.sh` (WAN status/IP/uplink, internet latency/
+  drops/speedtest, gw CPU/mem/uptime, adoption rollup: adopted/disconnected/pending + firmware-upgradable,
+  with flags). Handles third-party-firewall sites. Validated on a USG site + the pfSense Cascades site.
+- [ ] **Deeper firewall/VPN policy + adoption remediation** — read/health covered; config + remediation
+  actions (adopt a pending device, restart, etc.) are future. Access layer already reaches these.
 
 ## D. Robustness / ops
 - [ ] **VPN-flap resilience** in the AP-side loops (resume/retry so a mid-run tunnel drop doesn't waste
diff --git a/.claude/skills/unifi-wifi/scripts/gw-audit.sh b/.claude/skills/unifi-wifi/scripts/gw-audit.sh
new file mode 100644
index 0000000..190dcfa
--- /dev/null
+++ b/.claude/skills/unifi-wifi/scripts/gw-audit.sh
@@ -0,0 +1,92 @@
+#!/usr/bin/env bash
+# gw-audit.sh — gateway / WAN / internet + overall site-health audit (controller-side, read-only).
+# Uses the controller's own stat/health (wan/www/wlan/lan/vpn subsystems) + the gateway device's
+# per-WAN detail. Reports WAN status + IP + uplink speed, internet latency/drops/last-speedtest,
+# gateway CPU/mem/uptime, and the adoption/health rollup (APs/switches adopted vs disconnected vs
+# pending, client counts). Flags: WAN/internet not-ok, high latency/drops, disconnected or pending
+# devices, gateway resource pressure. No AP cred / VPN needed -> any UOS site. (ROADMAP C)
+#
+# Sites with a third-party firewall (e.g. pfSense) show num_gw=0 -> "no UniFi gateway" (still reports
+# wlan/lan/adoption health).
+#
+# Usage: bash .claude/skills/unifi-wifi/scripts/gw-audit.sh
+set -uo pipefail
+REPO="$(git rev-parse --show-toplevel 2>/dev/null || echo .)"
+UOS="$REPO/.claude/scripts/uos-mongo.sh"; VAULT="$REPO/.claude/scripts/vault.sh"
+HOST="${UOS_HOST:-172.16.3.29}"; PORT="${UOS_HTTPS_PORT:-11443}"
+SITEARG="${1:?usage: gw-audit.sh }"
+TMP="$(mktemp -d)"; trap 'rm -rf "$TMP"' EXIT
+U="$(bash "$VAULT" get-field infrastructure/uos-server-network-api-rw credentials.username 2>/dev/null)"
+P="$(bash "$VAULT" get-field infrastructure/uos-server-network-api-rw credentials.password 2>/dev/null)"
+[ -n "$U" ] && [ -n "$P" ] || { echo "[ERROR] no controller cred (infrastructure/uos-server-network-api-rw)"; exit 1; }
+base="https://$HOST:$PORT"; CJ="$TMP/cj"
+code=$(curl -sk -c "$CJ" -o /dev/null -w '%{http_code}' -X POST "$base/api/auth/login" -H 'Content-Type: application/json' \
+  --data-binary "$(python -c 'import json,sys;print(json.dumps({"username":sys.argv[1],"password":sys.argv[2]}))' "$U" "$P")")
+[ "$code" = "200" ] || { echo "[ERROR] controller login HTTP $code"; exit 1; }
+SHORT="$(curl -sk -b "$CJ" "$base/proxy/network/api/self/sites" | python -c "
+import sys,json; d=json.load(sys.stdin).get('data',[]); q='''$SITEARG'''.lower()
+for s in d:
+  if s.get('_id')=='''$SITEARG''' or s.get('name')=='''$SITEARG''' or q in (s.get('desc','').lower()): print(s.get('name')); break
+")"; [ -n "$SHORT" ] || SHORT="$SITEARG"
+echo "[INFO] gateway/health audit: site=$SHORT"
+curl -sk -b "$CJ" "$base/proxy/network/api/s/$SHORT/stat/health" -o "$TMP/health.json"
+curl -sk -b "$CJ" "$base/proxy/network/api/s/$SHORT/stat/device" -o "$TMP/dev.json"
+python - "$TMP/health.json" "$TMP/dev.json" <<'PY'
+import sys,json
+H={h['subsystem']:h for h in json.load(open(sys.argv[1])).get('data',[])}
+devs=json.load(open(sys.argv[2])).get('data',[])
+flags=[]
+def st(x): return (x or 'unknown')
+wan=H.get('wan',{}); www=H.get('www',{}); wlan=H.get('wlan',{}); lan=H.get('lan',{}); vpn=H.get('vpn',{})
+
+print("\n== WAN / Internet ==")
+ngw=wan.get('num_gw',0)
+if not ngw:
+  print(" no UniFi gateway at this site (third-party firewall, e.g. pfSense)")
+else:
+  gwdev=next((d for d in devs if d.get('type') in ('ugw','uxg','udm')),None)
+  gwname=wan.get('gw_name') or (gwdev and (gwdev.get('name') or gwdev.get('model'))) or '?'
+  gwmodel=(gwdev and gwdev.get('model')) or ''
+  print(f" gateway: {gwname} [{gwmodel}] ({wan.get('gw_version')}) status={st(wan.get('status'))}")
+  print(f" WAN IP: {wan.get('wan_ip')} gateways: {wan.get('gateways')} nameservers: {wan.get('nameservers')}")
+  gs=wan.get('gw_system-stats') or {}
+  if gs: print(f" gw load: cpu={gs.get('cpu')}% mem={gs.get('mem')}% uptime={int(int(gs.get('uptime',0))/86400)}d")
+  if st(wan.get('status'))!='ok': flags.append(f"WAN status={wan.get('status')}")
+  try:
+    if float(gs.get('cpu',0))>85: flags.append(f"gateway CPU {gs.get('cpu')}%")
+    if float(gs.get('mem',0))>90: flags.append(f"gateway MEM {gs.get('mem')}%")
+  except: pass
+# internet (www)
+print(f" internet: status={st(www.get('status'))} latency={www.get('latency')}ms drops={www.get('drops')} "
+      f"speedtest={www.get('xput_down')}/{www.get('xput_up')} Mbps ping={www.get('speedtest_ping')}ms")
+if www and st(www.get('status'))!='ok': flags.append(f"internet(www) status={www.get('status')}")
+try:
+  if www.get('latency') and float(www.get('latency'))>80: flags.append(f"internet latency {www.get('latency')}ms")
+  if www.get('drops') and float(www.get('drops'))>5: flags.append(f"internet drops={www.get('drops')}")
+except: pass
+# per-WAN device detail (multi-WAN)
+for d in devs:
+  if d.get('type') in ('ugw','uxg','udm'):
+    for wk in ('wan1','wan2'):
+      w=d.get(wk) or {}
+      if w.get('enable'): print(f" {wk}: ip={w.get('ip')} up={w.get('up')} {w.get('speed')}M{'' if w.get('full_duplex',True) else ' HALF-DUPLEX'}")
+      if w.get('enable') and not w.get('up'): flags.append(f"{wk} DOWN")
+
+print("\n== Adoption / device health ==")
+for label,sub in (('APs (wlan)',wlan),('switches (lan)',lan),('gateway (wan)',wan)):
+  a=sub.get('num_adopted'); dc=sub.get('num_disconnected'); pe=sub.get('num_pending')
+  if a is None and dc is None: continue
+  print(f" {label}: adopted={a} disconnected={dc} pending={pe}")
+  if dc: flags.append(f"{label}: {dc} disconnected")
+  if pe: flags.append(f"{label}: {pe} pending adoption")
+print(f" clients: users={wlan.get('num_user')} guests={wlan.get('num_guest')} iot={wlan.get('num_iot')}")
+# outdated firmware (from device list)
+outd=[d.get('name') for d in devs if d.get('upgradable')]
+if outd: flags.append(f"{len(outd)} device(s) with firmware upgrade available")
+
+print("\n== FLAGS ==")
+if flags:
+  for f in flags: print(f" [!] {f}")
+else:
+  print(" [OK] gateway/WAN/adoption all healthy")
+PY
diff --git a/clients/cascades-tucson/session-logs/2026-06/2026-06-15-howard-cascades-wifi-rf-audit.md b/clients/cascades-tucson/session-logs/2026-06/2026-06-15-howard-cascades-wifi-rf-audit.md
index 39e6019..a7286a9 100644
--- a/clients/cascades-tucson/session-logs/2026-06/2026-06-15-howard-cascades-wifi-rf-audit.md
+++ b/clients/cascades-tucson/session-logs/2026-06/2026-06-15-howard-cascades-wifi-rf-audit.md
@@ -464,3 +464,22 @@ cabling/NIC issue = the long-suspected FastEthernet uplink problem, now fleet-vi
 SKILL coverage now: WiFi (monitor+tune+full apply incl. device lock/client controls) + switch/PoE audit,
 all controller-side multi-client + AP-side where cred vaulted. ROADMAP remaining: gateway/WAN/firewall +
 adoption collectors. Coord: this msg.
+
+---
+
+## Update: 2026-06-16 01:20 PT — gateway/WAN + site-health collector (gw-audit.sh); ROADMAP C largely done
+
+NEW scripts/gw-audit.sh (controller-side, any site, read-only) via stat/health + gateway device:
+WAN status/IP/uplink/duplex + gw CPU/mem/uptime; internet(www) status/latency/drops/last-speedtest;
+health rollup (APs/switches/gw adopted vs disconnected vs pending, client user/guest/iot, firmware-
+upgradable); flags WAN/internet not-ok, latency>80/drops>5, WAN down, disconnected/pending, gw cpu>85/
+mem>90. Third-party-firewall sites (num_gw=0) handled. Fixed: gw_name fallback to device name; drops
+threshold >5 (was >0, too sensitive).
+
+Validated: Sonoran Glass (USG 3P [UGW3]) -> all healthy (cpu2%/mem21%, latency 14-24ms); Cascades
+(pfSense) -> "no UniFi gateway" + correctly caught 2 disconnected APs + 3 disconnected switches.
+
+SKILL now end-to-end for any client: WiFi (monitor + tune + full apply incl device-lock/client-control),
+switch/PoE audit, gateway/WAN/site-health audit, sites.sh discovery. ROADMAP remaining: deeper
+firewall/VPN policy + adoption REMEDIATION (adopt/restart), channel-plan apply, per-client AP creds/VPN.
+Coord: this msg.
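
Reviewer note: the flag thresholds described in the session-log update (latency > 80 ms, drops > 5, gateway CPU > 85%, MEM > 90%) can be sketched as a standalone function for quick testing outside the controller. This is a minimal illustration and not the code in gw-audit.sh: `to_float`, `evaluate_flags`, and the sample dictionaries are hypothetical names and data introduced here, while the field names (`num_gw`, `status`, `latency`, `drops`, `gw_system-stats`) are the ones the script reads from stat/health. Unlike the script's bare `except: pass`, this sketch converts each value separately, so one non-numeric field would not suppress the remaining checks.

```python
def to_float(value):
    """Return value as a float, or None when it is missing or non-numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def evaluate_flags(wan, www):
    """Hypothetical helper: apply the audit thresholds to the wan/www health dicts."""
    flags = []

    # Gateway checks only apply when the site has a UniFi gateway (num_gw > 0).
    if wan.get("num_gw"):
        if (wan.get("status") or "unknown") != "ok":
            flags.append(f"WAN status={wan.get('status')}")
        stats = wan.get("gw_system-stats") or {}
        cpu = to_float(stats.get("cpu"))
        mem = to_float(stats.get("mem"))
        if cpu is not None and cpu > 85:
            flags.append(f"gateway CPU {stats.get('cpu')}%")
        if mem is not None and mem > 90:
            flags.append(f"gateway MEM {stats.get('mem')}%")

    # Internet (www) checks run for every site that reports the subsystem.
    if www and (www.get("status") or "unknown") != "ok":
        flags.append(f"internet(www) status={www.get('status')}")
    latency = to_float(www.get("latency"))
    drops = to_float(www.get("drops"))
    if latency is not None and latency > 80:
        flags.append(f"internet latency {www.get('latency')}ms")
    if drops is not None and drops > 5:
        flags.append(f"internet drops={www.get('drops')}")
    return flags


# Made-up sample data, for illustration only.
sample_wan = {"num_gw": 1, "status": "ok", "gw_system-stats": {"cpu": "92", "mem": "21"}}
sample_www = {"status": "ok", "latency": 14, "drops": 7}
print(evaluate_flags(sample_wan, sample_www))
```

With the sample above, only the CPU and drops thresholds are exceeded; a third-party-firewall site (`num_gw` of 0 and no `www` data) yields an empty list.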