sync: auto-sync from HOWARD-HOME at 2026-06-16 07:44:03

Author: Howard Enos
Machine: HOWARD-HOME
Timestamp: 2026-06-16 07:44:03
2026-06-16 07:44:13 -07:00
parent 69987190fc
commit d4d526ae26
8 changed files with 84 additions and 11 deletions

@@ -37,6 +37,10 @@ path is Cascades — override with the script's vault-path arg per client.
list/disable/enable/delete/re-scope port-forwards (`pf-*`), toggle WAN firewall rules (`fw-disable`/
`fw-enable`), and drop attacker IPs at the edge (`block-ips`). The write companion to gw-audit; closes
an internet-facing exposure (e.g. a brute-forced PPTP). Gated/DRY-RUN, rollback saved. Controller-side.
- **[WORKING] Scheduled fleet monitoring** — `scripts/monitor-run.sh <site|all>`: controller-side
read-only health digest per site (gateway/WAN flags + switch/PoE flags + WiFi config flag count),
cron-friendly for ongoing monitoring of every client. AP-side collectors now **retry per-AP** (3x)
to ride out transient VPN flaps without wasting a sweep.
- **[WIP] Client DHCP/DNS policy, deeper VPN (server) config, adoption *remediation* depth** — port-forward
+ WAN firewall is now covered (gw-control); remaining gateway config (VPN server stand-up, DHCP/DNS) is future.
- **[PROPOSED] pfSense gateway compatibility layer** — the gateway verbs (gw-audit / gw-control / VPN) speak

@@ -69,9 +69,10 @@ side, multi-client enablement, and non-WiFi scope. Build/validate new apply acti
client DHCP/DNS policy. Config beyond port-forward/WAN-firewall; future. Access layer reaches it.
## D. Robustness / ops
- [ ] **VPN-flap resilience** in the AP-side loops (resume/retry so a mid-run tunnel drop doesn't waste
a 4-min sweep). Background runs can't spawn the SSH_ASKPASS helper — must run foreground.
- [ ] **Scheduling** — periodic `dfs-check` + neighbor/survey refresh (DFS is time-varying).
- [x] **VPN-flap resilience**: AP-side loops now retry per AP (3x, capture-to-var so a failed try never
appends partial data); dfs-check distinguishes unreachable from no-events. Validated (74/74 APs reachable).
Still must run in the foreground.
- [x] **Scheduling / fleet monitoring**: `monitor-run.sh <site|all>` = cron-friendly read-only health
digest (gateway + switch/PoE flags + WiFi flag count) per site. Validated. (Cron the `all` sweep nightly.)
- [ ] Vault read-only `infrastructure/uos-server-network-api` (least-privilege; RW does double duty now).
## E. pfSense gateway support — gateway "compatibility layer" (NEW, proposed 2026-06-16)

@@ -70,7 +70,9 @@ n=0; tot=$(wc -l < "$TMP/aps.tsv"); hits=0
while IFS=$'\t' read -r name ip ch dfs; do
ip="${ip%$'\r'}"; dfs="${dfs%$'\r'}"; [ -z "$ip" ] && continue; n=$((n+1))
printf '\r[INFO] checking %d/%d ' "$n" "$tot" >&2
ev=$(ap_ssh "$AU@$ip" "dmesg 2>/dev/null | grep -iE '$PAT' | grep -iv 'cached' | tail -4" 2>/dev/null)
ev=""; t=0; reach=0 # retry per AP (transient VPN flaps); ap_ssh rc distinguishes unreachable from no-events
while [ $t -lt 3 ]; do if ev="$(ap_ssh "$AU@$ip" "dmesg 2>/dev/null | grep -iE '$PAT' | grep -iv 'cached' | tail -4" 2>/dev/null)"; then reach=1; break; fi; t=$((t+1)); sleep 2; done
[ "$reach" = 1 ] || { echo "[UNREACHABLE] $name ($ip) - skipped"; continue; }
cnt=$(printf '%s' "$ev" | grep -c . )
mark=$([ "$dfs" = "DFS" ] && echo ' *' || echo '')
if [ "$cnt" -gt 0 ]; then

@@ -59,7 +59,7 @@ else:
# internet (www)
print(f" internet: status={st(www.get('status'))} latency={www.get('latency')}ms drops={www.get('drops')} "
f"speedtest={www.get('xput_down')}/{www.get('xput_up')} Mbps ping={www.get('speedtest_ping')}ms")
if www and st(www.get('status'))!='ok': flags.append(f"internet(www) status={www.get('status')}")
if ngw and www and st(www.get('status'))!='ok': flags.append(f"internet(www) status={www.get('status')}")
try:
if www.get('latency') and float(www.get('latency'))>80: flags.append(f"internet latency {www.get('latency')}ms")
if www.get('drops') and float(www.get('drops'))>5: flags.append(f"internet drops={www.get('drops')}")

@@ -0,0 +1,44 @@
#!/usr/bin/env bash
# monitor-run.sh — scheduled fleet/site health sweep. Runs the CONTROLLER-SIDE audits (no AP cred /
# VPN needed) and prints a compact per-site digest of flags only, so it's cron-friendly for ongoing
# monitoring of every client we manage. Pure read-only.
# per site -> gw-audit flags (WAN/internet/disconnected/pending/firmware) + switch-audit flags
# (underspeed/PoE/errors/offline) + audit-site WiFi flag count.
#
# Usage:
# bash .claude/skills/unifi-wifi/scripts/monitor-run.sh <site-name|id> # one site
# bash .claude/skills/unifi-wifi/scripts/monitor-run.sh all # every UOS site (slow)
# Cron example (nightly digest): 0 6 * * * bash .../monitor-run.sh all >> /var/log/unifi-monitor.log 2>&1
set -uo pipefail
REPO="$(git rev-parse --show-toplevel 2>/dev/null || echo .)"
UOS="$REPO/.claude/scripts/uos-mongo.sh"; D="$REPO/.claude/skills/unifi-wifi/scripts"
ARG="${1:?usage: monitor-run.sh <site|all>}"
clean(){ grep -viE 'post-quantum|store now|upgraded|openssh.com|WARNING: connection'; }
sweep_one(){ # $1 = site id, $2 = display name
local sid="$1" nm="$2"
echo "================================================================"
echo "SITE: ${nm:-$sid} ($sid)"
# gateway/WAN/health flags
local g; g="$(bash "$D/gw-audit.sh" "$sid" 2>&1 | clean | sed -n '/^== FLAGS/,$p' | grep -E '^\s*\[' )"
echo " gateway/health:"; [ -n "$g" ] && echo "$g" | sed 's/^/ /' || echo " [OK]"
# switch/PoE flags (summary line + any flags)
local sflag; sflag="$(bash "$D/switch-audit.sh" "$sid" 2>&1 | clean | grep -E 'UNDERSPEED|ERRORS|DROPS|POE-|OFFLINE|flag\(s\)' )"
echo " switches:"; [ -n "$sflag" ] && echo "$sflag" | sed 's/^/ /' | head -20 || echo " [OK] no switches / no issues"
# wifi config flag count
local wc; wc="$(bash "$D/audit-site.sh" "$sid" 2>&1 | clean | grep -cE '^\s*\[!\]' )"
echo " wifi config flags: ${wc:-0}"
}
if [ "$ARG" = all ]; then
echo "[INFO] fleet health sweep — $(date '+%Y-%m-%d %H:%M') — all UOS sites (controller-side, read-only)"
bash "$UOS" --sites 2>/dev/null | clean | grep -E '^[0-9a-f]{24}' | while read -r sid rest; do
sweep_one "$sid" "$rest"
done
else
if [[ "$ARG" =~ ^[0-9a-f]{24}$ ]]; then SID="$ARG"; NM=""; else
line="$(bash "$UOS" --sites 2>/dev/null | clean | grep -i "$ARG" | head -1)"; SID="${line%% *}"; NM="${line#* }"; fi
[ -n "$SID" ] || { echo "[ERROR] site not found: $ARG"; exit 1; }
echo "[INFO] health sweep — $(date '+%Y-%m-%d %H:%M')${NM:+ ($NM)}"
sweep_one "$SID" "$NM"
fi
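
The flag extraction in `sweep_one` can be tried in isolation. The following is a minimal sketch, assuming audit output shaped roughly like the sample: the `sample_audit` function and its text are made up for illustration, and only the `== FLAGS` marker and the leading `[` convention come from the script above. It uses `[[:space:]]` as a portable stand-in for `\s`, and an explicit `if` rather than the `&& ... ||` chain.

```shell
#!/usr/bin/env bash
# Hypothetical audit output; NOT real gw-audit.sh output.
sample_audit() {
  cat <<'EOF'
== WAN
  [info] wan1 up
== FLAGS
  [!] internet latency 120ms
  [!] 2 device(s) disconnected
summary: 2 flag(s)
EOF
}

# Keep everything from the "== FLAGS" marker to the end, then only bracketed lines.
flags="$(sample_audit | sed -n '/^== FLAGS/,$p' | grep -E '^[[:space:]]*\[')"

# Print the flags with a uniform indent, or [OK] when nothing was flagged.
if [ -n "$flags" ]; then
  echo "$flags" | sed 's/^[[:space:]]*/    /'
else
  echo "    [OK]"
fi
```

The bracketed `[info]` line above the marker is dropped by the `sed` range, so only lines in the flags section reach the digest.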

@@ -89,11 +89,10 @@ while IFS=$'\t' read -r name ip; do
ip="${ip%$'\r'}"; name="${name%$'\r'}" # strip any CR (Windows line endings) so ssh target is valid
[ -z "$ip" ] && continue; n=$((n+1))
echo "###AP $name $ip" >> "$RAW"
if ap_ssh "$AU@$ip" 'echo "@@ESS"; cat /proc/ui_neighbor/ess_ap_list 2>/dev/null; for s in /proc/ui_neighbor/ssid/*; do echo "@@SSID $s"; cat "$s" 2>/dev/null; done' >> "$RAW" 2>/dev/null; then
ok=$((ok+1))
else
echo "@@UNREACHABLE" >> "$RAW"
fi
# retry per AP (transient VPN flaps); capture to var so a failed try never appends partial data
out=""; t=0
while [ $t -lt 3 ]; do if out="$(ap_ssh "$AU@$ip" 'echo "@@ESS"; cat /proc/ui_neighbor/ess_ap_list 2>/dev/null; for s in /proc/ui_neighbor/ssid/*; do echo "@@SSID $s"; cat "$s" 2>/dev/null; done' 2>/dev/null)" && [ -n "$out" ]; then break; fi; t=$((t+1)); sleep 2; done
if [ -n "$out" ]; then printf '%s\n' "$out" >> "$RAW"; ok=$((ok+1)); else echo "@@UNREACHABLE" >> "$RAW"; fi
printf '\r[INFO] harvested %d/%d (reachable %d) ' "$n" "$(wc -l < "$TMP/aps.tsv")" "$ok" >&2
done < "$TMP/aps.tsv"
echo "" >&2
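
The CR stripping at the top of this loop matters when `aps.tsv` was written with Windows line endings: the last field on each row then ends in a carriage return, which would otherwise become part of the ssh target. A small sketch of the effect, with invented AP names, addresses, and an `admin` user standing in for `$AU`:

```shell
#!/usr/bin/env bash
# Made-up sample rows with CRLF line endings; the last row has no IP.
TSV="$(mktemp)"
printf 'lobby-ap\t10.0.0.11\r\noffice-ap\t10.0.0.12\r\nno-ip-ap\r\n' > "$TSV"

targets=()
while IFS=$'\t' read -r name ip; do
  # Drop one trailing CR from each field, if present.
  ip="${ip%$'\r'}"; name="${name%$'\r'}"
  [ -z "$ip" ] && continue          # skip rows without an address
  targets+=("admin@$ip")            # "admin" is a placeholder user
done < "$TSV"

printf '%s\n' "${targets[@]}"
rm -f "$TSV"
```

Without the `${ip%$'\r'}` expansion, each target would carry an invisible trailing `\r`.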

@@ -64,7 +64,9 @@ RAW="$TMP/raw.txt"; n=0; ok=0; tot=$(wc -l < "$TMP/aps.tsv")
while IFS=$'\t' read -r name ip; do
ip="${ip%$'\r'}"; name="${name%$'\r'}"; [ -z "$ip" ] && continue; n=$((n+1))
echo "###AP $name" >> "$RAW"
if ap_ssh "$AU@$ip" 'for r in wifi0 wifi1 wifi2 ath0 ath1; do iw dev $r survey dump 2>/dev/null; done' >> "$RAW" 2>/dev/null; then ok=$((ok+1)); else echo "@@UNREACHABLE" >> "$RAW"; fi
out=""; t=0 # retry per AP (transient VPN flaps); capture-to-var avoids partial appends
while [ $t -lt 3 ]; do if out="$(ap_ssh "$AU@$ip" 'for r in wifi0 wifi1 wifi2 ath0 ath1; do iw dev $r survey dump 2>/dev/null; done' 2>/dev/null)" && [ -n "$out" ]; then break; fi; t=$((t+1)); sleep 2; done
if [ -n "$out" ]; then printf '%s\n' "$out" >> "$RAW"; ok=$((ok+1)); else echo "@@UNREACHABLE" >> "$RAW"; fi
printf '\r[INFO] surveyed %d/%d (ok %d) ' "$n" "$tot" "$ok" >&2
done < "$TMP/aps.tsv"; echo "" >&2

@@ -502,3 +502,24 @@ NEW channel-plan.sh <site> ng|na [--apply] (NEIGHBOR_JSON + SURVEY_JSON):
SKILL now: WiFi (monitor+tune+full apply+device-lock+client/device control+channel-plan) + switch/PoE
audit + gateway/WAN/site-health + multi-client. ROADMAP nearly clear (deeper firewall/VPN policy +
per-client AP creds/VPN remain). Coord: this msg.
---
## Update: 2026-06-16 07:44 PT — robustness (ROADMAP D): monitor-run.sh + per-AP retry; gw-audit pfSense fix
Synced first to pick up Mike's gw-control.sh (eb87710, firewall/port-forward router actions — the
"deeper firewall/VPN policy" item; no dup with my robustness work).
NEW scripts/monitor-run.sh <site|all> — cron-friendly controller-side read-only fleet health digest:
per site -> gateway/WAN flags + switch/PoE flags + WiFi config flag count. Validated Sonoran (healthy)
+ Cascades (flags 2 disc APs / 3 disc switches / underspeed / firmware). Cron 'all' nightly.
VPN-flap resilience: neighbor-collect / survey-collect / dfs-check now RETRY per AP (3x, capture-to-var
so a failed attempt never appends partial data; dfs-check distinguishes UNREACHABLE vs no-events).
Validated neighbor-collect end-to-end (reachable 74/74, redundancy 73/74, JSON 74 APs - identical).
Fix: gw-audit no longer false-flags internet status=unknown on third-party-firewall sites (gated on num_gw).
SKILL.md + ROADMAP updated (D items done). Skill is feature-complete for monitoring+tuning+apply across
WiFi/switch/gateway, multi-client, with scheduling + flap resilience. Remaining: per-client AP creds/VPN,
read-only cred (Mike to create UI admin), gateway VPN-server/DHCP-DNS (Mike). Coord: this msg.
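
For reference, the retry + capture-to-var pattern described above can be sketched as a standalone helper. This is an illustration under assumptions, not code from the repo: `retry_capture` and `flaky` are hypothetical names, `flaky` stands in for `ap_ssh`, and unlike the inline loops this version skips the sleep after the final failed attempt.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry_capture <tries> <delay-seconds> <command...>
# Prints the command's stdout only when an attempt exits 0 with non-empty output.
retry_capture() {
  local tries="$1" delay="$2" out="" t=0
  shift 2
  while [ "$t" -lt "$tries" ]; do
    # Capture into a variable first, so a failed attempt writes nothing downstream.
    if out="$("$@" 2>/dev/null)" && [ -n "$out" ]; then
      printf '%s\n' "$out"
      return 0
    fi
    t=$((t+1))
    if [ "$t" -lt "$tries" ]; then sleep "$delay"; fi
  done
  return 1
}

# Stand-in for ap_ssh: emits partial output and fails twice, then succeeds.
# The attempt counter lives in a file because each try runs in a subshell.
STATE="$(mktemp)"; echo 0 > "$STATE"
flaky() {
  local n; n=$(( $(cat "$STATE") + 1 )); echo "$n" > "$STATE"
  if [ "$n" -lt 3 ]; then echo "partial"; return 1; fi
  echo "@@ESS ok"
}

RAW="$(mktemp)"
if out="$(retry_capture 3 0 flaky)"; then
  printf '%s\n' "$out" >> "$RAW"
else
  echo "@@UNREACHABLE" >> "$RAW"
fi
cat "$RAW"
```

The two failed attempts each produce the text `partial`, but because output is only appended after a successful try, none of it reaches `$RAW`.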