diff --git a/.claude/skills/unifi-wifi/SKILL.md b/.claude/skills/unifi-wifi/SKILL.md
index 2b7b548..502276a 100644
--- a/.claude/skills/unifi-wifi/SKILL.md
+++ b/.claude/skills/unifi-wifi/SKILL.md
@@ -43,9 +43,12 @@ path is Cascades — override with the script's vault-path arg per client.
   to ride out transient VPN flaps without wasting a sweep.
 - **[WIP] Client DHCP/DNS policy, deeper VPN (server) config, adoption *remediation* depth** — port-forward +
   WAN firewall is now covered (gw-control); remaining gateway config (VPN server stand-up, DHCP/DNS) is future.
-- **[SCAFFOLDED — ON HOLD] pfSense gateway compatibility layer** — `scripts/pfsense-backend.sh` (REST API pkg backend).
-  ON HOLD (Howard 2026-06-16): the RESTAPI package needs a newer pfSense than Cascades runs — **blocked on a
-  pfSense upgrade** before any live use. Code is complete; see ROADMAP §E "BLOCKER / Resume trigger".
+- **[WORKING] pfSense gateway access via SSH** — `scripts/pfsense-ssh.sh audit|dhcp|run ""`.
+  DECISION (Mike 2026-06-16): **no RESTAPI package needed** — VPN + SSH shell reads the same data and makes
+  changes. Cred = `clients//pfsense-firewall`. Validated live on Cascades (pfSense Plus 25.07; admin
+  SSH = real shell). `audit`/`dhcp` are read-only; `run` executes arbitrary commands (incl. changes —
+  operator-gated, no dry-run). Structured/gated CONTROL verbs (block-ips via easyrule, pf/fw toggles) are
+  the remaining build — ROADMAP §E. (REST `pfsense-backend.sh` kept as a dormant optional alternative.)
   `gw-audit.sh`/`gw-control.sh` now **auto-dispatch** to it when a site has no UniFi gateway (num_gw=0) AND a
   pfSense API cred is vaulted at `clients//pfsense-api` (or pass `--pfsense ` when the UOS site name
   differs from the client slug) — the SAME verbs (`gw-audit`, `pf-list/disable/enable/set-ports`,
diff --git a/.claude/skills/unifi-wifi/references/ROADMAP.md b/.claude/skills/unifi-wifi/references/ROADMAP.md
index 4ae90f6..29f6f9a 100644
--- a/.claude/skills/unifi-wifi/references/ROADMAP.md
+++ b/.claude/skills/unifi-wifi/references/ROADMAP.md
@@ -119,22 +119,26 @@ exists for at least two sites; per-client pfSense cred vaulting mirrors the AP-S
 collectors). DONE: writes are `--apply`-gated and save a per-object rollback to `.claude/tmp/`, and pfSense
 `firewall/apply` is called after each change. config.xml backup-first is the SSH-fallback's job.
 
-**STATUS: SCAFFOLDED — ON HOLD (blocked on pfSense upgrade).** Build complete (backend + dispatch +
-setup helper); the BLOCKED/setup/no-cred-hint paths are tested. The live REST calls
-(audit/pf-*/fw-*/block-ips) need a reachable pfSense with the API pkg installed + a key vaulted; REST
-endpoint paths follow the v2 schema and must be verified against the installed API version on first live run.
+**DECISION (Mike, 2026-06-16): backend = SSH, NOT the REST API package.** "We don't need the RESTAPI —
+with VPN + SSH we can read the same data and make changes." Confirmed: Cascades pfSense is **Plus
+25.07-RELEASE** (current, not old — the earlier "too old" premise was wrong) and **admin SSH drops
+straight to a shell** (no menu gotcha). So the upgrade/package blocker is **MOOT** and the layer is
+**OFF HOLD**.
 
-**[BLOCKER — Howard 2026-06-16]** `pfSense-pkg-RESTAPI` is third-party and the **Cascades pfSense is too
-old to install it**. PREREQUISITE: **upgrade the Cascades pfSense** (firmware) before the package will
-install. Work is ON HOLD until that upgrade is done. After the upgrade: install RESTAPI → mint a read-only
-key (write-capable for control) → `pfsense-backend.sh clients/cascades-tucson/pfsense-api setup` →
-vault url+apikey at `clients/cascades-tucson/pfsense-api` → first live `gw-audit cascades` to verify
-v2 endpoints. (Also blocked from Howard-Home by the `.0.0/24` home-LAN shadow over pfSense `192.168.0.1` —
-run the first live validation from/through the Cascades network.) ACG office pfSense (`infrastructure/
-pfsense-firewall`) may be a newer box usable as the first live test once it has the pkg + a vaulted key.
+**STATUS: WORKING (read) via `scripts/pfsense-ssh.sh` — control verbs WIP.**
+- `pfsense-ssh.sh audit` — version/WAN-media/gateway-events/DHCP-exhaustion/states/DNS/load/NIC-errors.
+- `pfsense-ssh.sh dhcp` — pool utilization + "no free leases" check.
+- `pfsense-ssh.sh run ""` — arbitrary command (reads OR changes; operator-gated, no dry-run).
+- Cred = `clients//pfsense-firewall` (host + admin user/pass), system OpenSSH via askpass. Validated
+  live on Cascades 2026-06-16 (the pfSense-health audit in the unifi-full-audit report came from this).
 
-**Resume trigger:** Cascades (or another client) pfSense upgraded + RESTAPI installable. The code is done;
-resuming = the setup/vault steps above + endpoint verification, no further build expected unless v2 paths differ.
+**Remaining build (SSH backend):** named, reviewed, gated CONTROL verbs mapping the gw-control contract to
+SSH primitives — `block-ips` → `easyrule block wan `; `pf-list`/`fw-list` → read config.xml / `pfSsh.php`;
+toggles → config edit + `filter_configure`/`rc.reload_all`; backup config.xml first. Then optionally wire
+gw-audit/gw-control dispatch to the SSH backend when `clients//pfsense-firewall` exists + num_gw=0.
+
+**Superseded/optional:** the REST `pfsense-backend.sh` + `clients//pfsense-api` path stays in-tree
+as a dormant alternative (works if a site ever installs the pkg) but is no longer the plan.
 - [ ] **Site→gateway map:** record per-site gateway type + access (UOS site_id ↔ pfSense host/cred) so the
   driver auto-selects. Could live alongside `sites.sh` output.
 - [ ] **VPN convergence:** the "Deeper VPN — gateway-hosted VPN server" item (C) is *easier and better* on
diff --git a/.claude/skills/unifi-wifi/scripts/pfsense-ssh.sh b/.claude/skills/unifi-wifi/scripts/pfsense-ssh.sh
new file mode 100644
index 0000000..7a7b217
--- /dev/null
+++ b/.claude/skills/unifi-wifi/scripts/pfsense-ssh.sh
@@ -0,0 +1,68 @@
+#!/usr/bin/env bash
+# pfsense-ssh.sh — talk to a client's pfSense over SSH (site VPN reachable). This is the SSH backend for
+# the gateway compatibility layer. DECISION (Mike, 2026-06-16): the RESTAPI package is NOT needed — with
+# VPN + SSH shell we can read the same data and make changes directly. (Confirmed on Cascades pfSense Plus
+# 25.07: admin SSH drops straight to a shell, no menu gotcha.)
+#
+# Cred from the vault: clients//pfsense-firewall (top-level `host`, credentials.username/password).
+# Uses SYSTEM OpenSSH via an SSH_ASKPASS helper (no sshpass dependency); runs each call as `sh -s` over a
+# heredoc so awk/quoting is clean.
+#
+# Usage:
+#   pfsense-ssh.sh audit     # read-only health: version/WAN/DHCP-exhaustion/DNS/states/load
+#   pfsense-ssh.sh dhcp      # DHCP pool utilization + "no free leases" check
+#   pfsense-ssh.sh run ""    # arbitrary command (CAN mutate — operator-gated; e.g. run "pfctl -si")
+#   pfsense-ssh.sh shell     # (prints the interactive ssh command to paste)
+# NOTE: `run` executes whatever you pass, including changes — there is no dry-run for it. For repeatable
+# changes prefer adding a named, reviewed verb here over ad-hoc `run`.
+set -uo pipefail
+REPO="$(git rev-parse --show-toplevel 2>/dev/null || echo .)"
+VAULT="$REPO/.claude/scripts/vault.sh"
+SLUG="${1:?usage: pfsense-ssh.sh [args]}"
+ACT="${2:?action: audit|dhcp|run|shell}"; shift 2 || true
+VP="clients/$SLUG/pfsense-firewall"
+HOST="$(bash "$VAULT" get-field "$VP" host 2>/dev/null || bash "$VAULT" get-field "$VP" credentials.host 2>/dev/null || true)"
+U="$(bash "$VAULT" get-field "$VP" credentials.username 2>/dev/null || true)"
+PP="$(bash "$VAULT" get-field "$VP" credentials.password 2>/dev/null || true)"; export PP
+if [ -z "$HOST" ] || [ -z "$U" ] || [ -z "$PP" ]; then
+  echo "[BLOCKED] need host + admin creds at vault:$VP (fields: host, credentials.username, credentials.password)"; exit 2; fi
+if [ "$ACT" = "shell" ]; then echo "ssh ${U}@${HOST} # password in vault:$VP"; exit 0; fi
+
+TMP="$(mktemp -d)"; trap 'rm -rf "$TMP"' EXIT
+ASKP="$TMP/a.sh"; printf '#!/bin/sh\nprintf "%%s\\n" "$PP"\n' >"$ASKP"; chmod +x "$ASKP"
+# pfssh: feed remote sh a script on stdin; password via askpass (stderr noise dropped)
+pfssh(){ SSH_ASKPASS="$ASKP" SSH_ASKPASS_REQUIRE=force DISPLAY=:0 ssh \
+  -o ConnectTimeout=12 -o StrictHostKeyChecking=accept-new -o UserKnownHostsFile=/dev/null \
+  -o PreferredAuthentications=password -o PubkeyAuthentication=no -o NumberOfPasswordPrompts=1 \
+  "$U@$HOST" 'sh -s' 2>/dev/null; }
+
+echo "[INFO] pfSense $ACT @ $U@$HOST (vault:$VP)"
+case "$ACT" in
+  run)
+    CMD="$*"; [ -n "$CMD" ] || { echo "[ERROR] run needs a command"; exit 1; }
+    printf '%s\n' "$CMD" | pfssh ;;
+  dhcp)
+    pfssh <<'RSCRIPT'
+echo "## DHCP backend"; { pgrep -lf dhcpd >/dev/null && echo "ISC dhcpd active"; }; { pgrep -lf kea >/dev/null && echo "Kea active"; }
+echo "## 'no free leases' events (exhaustion)"; clog /var/log/dhcpd.log 2>/dev/null | grep -ic 'no free leases'
+echo "## active leases per /24 (top 20)"
+awk '/^lease /{ip=$2} /binding state active/{a[ip]=1} END{for(i in a){n=i; sub(/\.[0-9]+$/,"",n); c[n]++} for(k in c) print c[k], k}' /var/dhcpd/var/db/dhcpd.leases 2>/dev/null | sort -rn | head -20
+echo "## pool ranges (subnet -> range)"; grep -hE 'subnet|range ' /var/dhcpd/etc/dhcpd.conf 2>/dev/null | paste - - | head -40
+RSCRIPT
+    ;;
+  audit)
+    pfssh <<'RSCRIPT'
+echo "## VERSION"; cat /etc/version 2>/dev/null
+echo "## UPTIME/LOAD"; uptime
+echo "## physical interfaces (media/status; VLAN sub-ifs skipped)"; for i in $(ifconfig -l); do case $i in *.*) continue;; igc[0-9]|em[0-9]|ix[0-9]|vmx[0-9]) m=$(ifconfig $i 2>/dev/null | grep -E 'media|status' | tr '\n' ' '); [ -n "$m" ] && echo " $i: $m";; esac; done
+echo "## GATEWAY loss/down events (last 8)"; clog /var/log/gateways.log 2>/dev/null | tail -8
+echo "## DHCP exhaustion ('no free leases' count)"; clog /var/log/dhcpd.log 2>/dev/null | grep -ic 'no free leases'
+echo "## DHCP busiest /24s (top 8)"; awk '/^lease /{ip=$2} /binding state active/{a[ip]=1} END{for(i in a){n=i; sub(/\.[0-9]+$/,"",n); c[n]++} for(k in c) print c[k], k}' /var/dhcpd/var/db/dhcpd.leases 2>/dev/null | sort -rn | head -8
+echo "## PF states"; pfctl -si 2>/dev/null | grep -iE 'current entries|searches'; pfctl -sm 2>/dev/null | grep -E '^states'
+echo "## DNS resolver"; pgrep -lf unbound >/dev/null && echo "unbound running" || echo "unbound NOT running"
+echo "## mbuf"; netstat -m 2>/dev/null | head -1
+echo "## NIC errors (Ierrs/Oerrs/Coll)"; netstat -i 2>/dev/null | awk 'NR==1 || ($1 ~ /^(igc|em|ix|vmx)[0-9]$/)'
+RSCRIPT
+    ;;
+  *) echo "action: audit|dhcp|run|shell"; exit 1;;
+esac
diff --git a/clients/cascades-tucson/reports/2026-06-16-unifi-full-audit.md b/clients/cascades-tucson/reports/2026-06-16-unifi-full-audit.md
index f2c28eb..3350c93 100644
--- a/clients/cascades-tucson/reports/2026-06-16-unifi-full-audit.md
+++ b/clients/cascades-tucson/reports/2026-06-16-unifi-full-audit.md
@@ -47,3 +47,30 @@ no UniFi gateway (pfSense firewall). All collectors ran clean.
 
 (All changes via the gated `apply-radio`/`apply-wlan`/`channel-plan` scripts — per zone, with rollback +
 live validation. Nothing applied in this audit.)
+
+---
+
+## pfSense health check (2026-06-16) — ruling out the gateway as a WiFi factor
+
+Investigated the Cascades pfSense (`192.168.0.1`, **pfSense Plus 25.07-RELEASE**, Netgate) over the site
+VPN via SSH, to confirm whether any gateway-side issue contributes to the "WiFi bad for some users"
+symptom. **Verdict: pfSense is healthy and is NOT a contributor — the problem is RF-side (2.4 GHz).**
+
+| Area | Finding | WiFi impact |
+|---|---|---|
+| **DHCP exhaustion** | **0** "no free leases" events in dhcpd.log. WiFi/AP pool `192.168.0.0/22` (range 192.168.2.2–3.254, cap ~507) only **270 active (~53%)**; per-unit /28s + `10.0.20/.50` all have headroom | **Ruled out** (was the top suspect) |
+| **DNS** | unbound resolver running | Fine |
+| **WAN** | Dual Cox — WAN1 `184.191.143.62/30`, WAN2 `72.211.21.217/27`, both active **full-duplex**, `WAN_Group` gateway group, **no loss/down events** logged | Fine |
+| **Firewall states** | 28,368 / 790,000 limit | Fine |
+| **CPU / mbuf / uptime** | load 0.6, mbufs nominal, 10-day uptime | Healthy |
+
+**Architecture:** per-unit design — **199 DHCP subnets**, mostly `10.x.y.0/28` per apartment (assisted-
+living L2 isolation) + the `192.168.0.0/22` staff/AP network (APs + most WiFi clients). Active DHCP
+backend is **ISC** (Kea config present but dormant).
+
+**Minor (not WiFi-related):** `igc3`/WAN2 logged 1707 input-errors + 1707 "collisions", but the link is
+2.5GbE full-duplex/active with zero gateway loss — consistent with the known Intel I225/I226 2.5G counter
+quirk, not a real fault. No action needed unless WAN2 misbehaves.
+
+**Conclusion:** gateway/DHCP/DNS/WAN are not bottlenecking the wireless. The 2.4 GHz remediation
+(power-down + coverage-redundancy disables) remains the correct and sole fix for the client-experience tail.
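
Review note: the pool-utilization figure in the report's DHCP row can be sanity-checked with a few lines of bash. This is a minimal sketch, not part of `pfsense-ssh.sh`. The helper names `ip_to_int`, `pool_size`, and `pool_pct` are hypothetical, and the inputs (range `192.168.2.2` to `192.168.3.254`, 270 active leases) are copied from the report rather than read from a live `dhcpd.conf`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (assumption: not in the repo) to sanity-check a DHCP pool's
# capacity and utilization from its first/last address and an active-lease count.
set -euo pipefail

# ip_to_int A.B.C.D -> the address as a 32-bit integer
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<<"$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# pool_size FIRST LAST -> number of addresses in the inclusive range
pool_size() {
  echo $(( $(ip_to_int "$2") - $(ip_to_int "$1") + 1 ))
}

# pool_pct ACTIVE FIRST LAST -> integer percent of the range in use (truncated)
pool_pct() {
  echo $(( $1 * 100 / $(pool_size "$2" "$3") ))
}

# Figures taken from the report's DHCP row.
size="$(pool_size 192.168.2.2 192.168.3.254)"
pct="$(pool_pct 270 192.168.2.2 192.168.3.254)"
echo "pool size: $size, in use: ${pct}%"
```

The raw inclusive range works out to 509 addresses, which agrees with the report's "cap ~507" to within a couple of addresses. The small gap may reflect reserved or static entries, which is an assumption and not something the report states. Either way the utilization rounds to the same ~53%.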