unifi-wifi: pfSense gateway access via SSH (pfsense-ssh.sh) + pfSense health section; layer OFF HOLD
DECISION (Mike, 2026-06-16): drop the RESTAPI package; VPN + SSH shell reads the same data and makes changes. Confirmed Cascades pfSense is Plus 25.07-RELEASE (current; the "too old" premise was wrong) and admin SSH = real shell (no menu). The upgrade/package blocker is moot; compat layer is off hold.

- NEW scripts/pfsense-ssh.sh: audit (version/WAN-media/gateway-events/DHCP-exhaustion/states/DNS/load/NIC), dhcp (pool utilization + no-free-leases), run "<cmd>" (arbitrary, incl changes; operator-gated). Cred from clients/<slug>/pfsense-firewall; system OpenSSH via askpass. Validated live on Cascades.
- audit report: added "pfSense health check (2026-06-16)": DHCP NOT exhausted (192.168.0.0/22 pool 270/507, 0 no-free-leases), DNS up, dual-WAN stable (no gateway flaps), states/load healthy => gateway is NOT a WiFi factor; the 2.4 GHz RF work is the sole fix. (Minor: igc3/WAN2 I225 2.5G counter quirk, not a fault.)
- ROADMAP §E + SKILL.md updated to the SSH backend decision; REST pfsense-backend.sh kept dormant/optional.
- Remaining: named gated CONTROL verbs over SSH (easyrule block-ips, pf/fw toggles) + optional gw-* dispatch.
- Closed obsolete coord todo (upgrade-pfSense-for-RESTAPI).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -43,9 +43,12 @@ path is Cascades — override with the script's vault-path arg per client.
to ride out transient VPN flaps without wasting a sweep.
- **[WIP] Client DHCP/DNS policy, deeper VPN (server) config, adoption *remediation* depth**: port-forward
  + WAN firewall is now covered (gw-control); remaining gateway config (VPN server stand-up, DHCP/DNS) is future.
- **[SCAFFOLDED, ON HOLD] pfSense gateway compatibility layer**: `scripts/pfsense-backend.sh` (REST API pkg backend).
  ON HOLD (Howard 2026-06-16): the RESTAPI package needs a newer pfSense than Cascades runs, so it is **blocked on a
  pfSense upgrade** before any live use. Code is complete; see ROADMAP §E "BLOCKER / Resume trigger".
- **[WORKING] pfSense gateway access via SSH**: `scripts/pfsense-ssh.sh <slug> audit|dhcp|run "<cmd>"`.
  DECISION (Mike 2026-06-16): **no RESTAPI package needed**; VPN + SSH shell reads the same data and makes
  changes. Cred = `clients/<slug>/pfsense-firewall`. Validated live on Cascades (pfSense Plus 25.07; admin
  SSH = real shell). `audit`/`dhcp` are read-only; `run` executes arbitrary commands (incl. changes;
  operator-gated, no dry-run). Structured/gated CONTROL verbs (block-ips via easyrule, pf/fw toggles) are
  the remaining build; see ROADMAP §E. (REST `pfsense-backend.sh` kept as a dormant optional alternative.)
  `gw-audit.sh`/`gw-control.sh` now **auto-dispatch** to it when a site has no UniFi gateway (num_gw=0) AND a
  pfSense API cred is vaulted at `clients/<slug>/pfsense-api` (or pass `--pfsense <slug>` when the UOS site
  name differs from the client slug); the SAME verbs (`gw-audit`, `pf-list/disable/enable/set-ports`,

@@ -119,22 +119,26 @@ exists for at least two sites; per-client pfSense cred vaulting mirrors the AP-S
collectors). DONE: writes are `--apply`-gated and save a per-object rollback to `.claude/tmp/`, and
pfSense `firewall/apply` is called after each change. config.xml backup-first is the SSH-fallback's job.

**STATUS: SCAFFOLDED, ON HOLD (blocked on pfSense upgrade).** Build complete (backend + dispatch +
setup helper); the BLOCKED/setup/no-cred-hint paths are tested. The live REST calls
(audit/pf-*/fw-*/block-ips) need a reachable pfSense with the API pkg installed + a key vaulted; REST
endpoint paths follow the v2 schema and must be verified against the installed API version on first live run.
**DECISION (Mike, 2026-06-16): backend = SSH, NOT the REST API package.** "We don't need the RESTAPI:
with VPN + SSH we can read the same data and make changes." Confirmed: Cascades pfSense is **Plus
25.07-RELEASE** (current, not old; the earlier "too old" premise was wrong) and **admin SSH drops
straight to a shell** (no menu gotcha). So the upgrade/package blocker is **MOOT** and the layer is
**OFF HOLD**.

**[BLOCKER, Howard 2026-06-16]** `pfSense-pkg-RESTAPI` is third-party and the **Cascades pfSense is too
old to install it**. PREREQUISITE: **upgrade the Cascades pfSense** (firmware) before the package will
install. Work is ON HOLD until that upgrade is done. After the upgrade: install RESTAPI → mint a read-only
key (write-capable for control) → `pfsense-backend.sh clients/cascades-tucson/pfsense-api setup` →
vault url+apikey at `clients/cascades-tucson/pfsense-api` → first live `gw-audit cascades` to verify
v2 endpoints. (Also blocked from Howard-Home by the `.0.0/24` home-LAN shadow over pfSense `192.168.0.1`;
run the first live validation from/through the Cascades network.) ACG office pfSense
(`infrastructure/pfsense-firewall`) may be a newer box usable as the first live test once it has the pkg + a vaulted key.
**STATUS: WORKING (read) via `scripts/pfsense-ssh.sh`; control verbs WIP.**
- `pfsense-ssh.sh <slug> audit`: version/WAN-media/gateway-events/DHCP-exhaustion/states/DNS/load/NIC-errors.
- `pfsense-ssh.sh <slug> dhcp`: pool utilization + "no free leases" check.
- `pfsense-ssh.sh <slug> run "<cmd>"`: arbitrary command (reads OR changes; operator-gated, no dry-run).
- Cred = `clients/<slug>/pfsense-firewall` (host + admin user/pass), system OpenSSH via askpass. Validated
  live on Cascades 2026-06-16 (the pfSense-health audit in the unifi-full-audit report came from this).
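
The "system OpenSSH via askpass" mechanism in the last bullet can be shown in isolation. This is a minimal sketch, assuming the password is held in an environment variable named `PP` as in the script; `make_askpass` is a hypothetical helper name, and the user and host in the commented-out `ssh` line are placeholders.

```bash
#!/usr/bin/env bash
# Sketch of the SSH_ASKPASS pattern. `make_askpass` is a hypothetical helper name.
set -u

# Write a tiny helper that prints $PP; OpenSSH runs it and reads the password from stdout.
# The secret stays in the environment and is never written into the helper file.
make_askpass() {
  local dir="$1" helper
  helper="$dir/askpass.sh"
  printf '#!/bin/sh\nprintf "%%s\\n" "$PP"\n' >"$helper"
  chmod 700 "$helper"
  printf '%s\n' "$helper"
}

TMP="$(mktemp -d)"
trap 'rm -rf "$TMP"' EXIT
ASKP="$(make_askpass "$TMP")"

# Local check that needs no network: the helper echoes whatever PP holds.
PP='example-only' "$ASKP"

# Real use would look roughly like this (not executed here; user/host are placeholders):
#   PP="$secret" SSH_ASKPASS="$ASKP" SSH_ASKPASS_REQUIRE=force \
#     ssh -o PreferredAuthentications=password -o NumberOfPasswordPrompts=1 \
#     admin@192.0.2.1 'sh -s' <<<'uname -a'
```

`SSH_ASKPASS_REQUIRE=force` needs a reasonably recent OpenSSH; older clients only consult the helper when `DISPLAY` is set and no terminal is attached, which is presumably why the script also sets `DISPLAY=:0`.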

**Resume trigger:** Cascades (or another client) pfSense upgraded + RESTAPI installable. The code is done;
resuming = the setup/vault steps above + endpoint verification, no further build expected unless v2 paths differ.
**Remaining build (SSH backend):** named, reviewed, gated CONTROL verbs mapping the gw-control contract to
SSH primitives: `block-ips` → `easyrule block wan <ip>`; `pf-list`/`fw-list` → read config.xml / `pfSsh.php`;
toggles → config edit + `filter_configure`/`rc.reload_all`; backup config.xml first. Then optionally wire
gw-audit/gw-control dispatch to the SSH backend when `clients/<slug>/pfsense-firewall` exists + num_gw=0.
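
One possible shape for the first of those verbs, as a sketch only: `block_ips`, `build_block_cmds` and `is_ipv4` are hypothetical names, the relative path to `pfsense-ssh.sh` is an assumption, and the dry-run-unless-`--apply` default borrows the gating convention used by the REST backend. Nothing here is the committed implementation.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a gated block-ips verb; not part of the committed script.
set -u

# Accept only dotted-quad IPv4 with each octet in 0..255.
is_ipv4() {
  local ip="$1" octet
  [[ "$ip" =~ ^([0-9]{1,3}\.){3}[0-9]{1,3}$ ]] || return 1
  local IFS=.
  for octet in $ip; do
    (( 10#$octet <= 255 )) || return 1
  done
}

# Emit the remote commands: back up config.xml first, then one easyrule per address.
build_block_cmds() {
  local ip
  [ "$#" -gt 0 ] || { echo "[ERROR] block-ips needs at least one IP" >&2; return 1; }
  for ip in "$@"; do
    is_ipv4 "$ip" || { echo "[ERROR] not an IPv4 address: $ip" >&2; return 1; }
  done
  echo 'set -e'   # stop on the remote side if the backup fails
  # Single quotes keep $(date ...) literal so it expands on the pfSense side.
  echo 'cp /conf/config.xml "/conf/config.xml.bak.$(date +%Y%m%d%H%M%S)"'
  for ip in "$@"; do
    echo "easyrule block wan $ip"
  done
}

# block_ips [--apply] <slug> <ip>...   (dry-run unless --apply is given)
block_ips() {
  local apply=0 slug cmds
  if [ "${1:-}" = "--apply" ]; then apply=1; shift; fi
  slug="${1:?slug required}"; shift
  cmds="$(build_block_cmds "$@")" || return 1
  if [ "$apply" -eq 1 ]; then
    # Assumption: pfsense-ssh.sh is reachable at this relative path.
    scripts/pfsense-ssh.sh "$slug" run "$cmds"
  else
    printf '%s\n' "$cmds" | sed 's/^/[DRY-RUN] /'
  fi
}

block_ips cascades-tucson 203.0.113.7 198.51.100.20
```

Validating every address before emitting any command means a typo in the third IP cannot leave the first two half-applied.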

**Superseded/optional:** the REST `pfsense-backend.sh` + `clients/<slug>/pfsense-api` path stays in-tree
as a dormant alternative (works if a site ever installs the pkg) but is no longer the plan.
- [ ] **Site→gateway map:** record per-site gateway type + access (UOS site_id ↔ pfSense host/cred) so the
  driver auto-selects. Could live alongside `sites.sh` output.
- [ ] **VPN convergence:** the "Deeper VPN: gateway-hosted VPN server" item (C) is *easier and better* on

||||
68
.claude/skills/unifi-wifi/scripts/pfsense-ssh.sh
Normal file
@@ -0,0 +1,68 @@
#!/usr/bin/env bash
# pfsense-ssh.sh: talk to a client's pfSense over SSH (site VPN reachable). This is the SSH backend for
# the gateway compatibility layer. DECISION (Mike, 2026-06-16): the RESTAPI package is NOT needed; with
# VPN + SSH shell we can read the same data and make changes directly. (Confirmed on Cascades pfSense Plus
# 25.07: admin SSH drops straight to a shell, no menu gotcha.)
#
# Cred from the vault: clients/<slug>/pfsense-firewall (top-level `host`, credentials.username/password).
# Uses SYSTEM OpenSSH via an SSH_ASKPASS helper (no sshpass dependency); runs each call as `sh -s` over a
# heredoc so awk/quoting is clean.
#
# Usage:
#   pfsense-ssh.sh <slug> audit            # read-only health: version/WAN/DHCP-exhaustion/DNS/states/load
#   pfsense-ssh.sh <slug> dhcp             # DHCP pool utilization + "no free leases" check
#   pfsense-ssh.sh <slug> run "<command>"  # arbitrary command (CAN mutate, operator-gated; e.g. run "pfctl -si")
#   pfsense-ssh.sh <slug> shell            # (prints the interactive ssh command to paste)
# NOTE: `run` executes whatever you pass, including changes; there is no dry-run for it. For repeatable
# changes prefer adding a named, reviewed verb here over ad-hoc `run`.
set -uo pipefail
REPO="$(git rev-parse --show-toplevel 2>/dev/null || echo .)"
VAULT="$REPO/.claude/scripts/vault.sh"
SLUG="${1:?usage: pfsense-ssh.sh <slug> <audit|dhcp|run|shell> [args]}"
ACT="${2:?action: audit|dhcp|run|shell}"; shift 2 || true
VP="clients/$SLUG/pfsense-firewall"
HOST="$(bash "$VAULT" get-field "$VP" host 2>/dev/null || bash "$VAULT" get-field "$VP" credentials.host 2>/dev/null || true)"
U="$(bash "$VAULT" get-field "$VP" credentials.username 2>/dev/null || true)"
PP="$(bash "$VAULT" get-field "$VP" credentials.password 2>/dev/null || true)"; export PP
if [ -z "$HOST" ] || [ -z "$U" ] || [ -z "$PP" ]; then
  echo "[BLOCKED] need host + admin creds at vault:$VP (fields: host, credentials.username, credentials.password)"; exit 2; fi
if [ "$ACT" = "shell" ]; then echo "ssh ${U}@${HOST} # password in vault:$VP"; exit 0; fi

TMP="$(mktemp -d)"; trap 'rm -rf "$TMP"' EXIT
ASKP="$TMP/a.sh"; printf '#!/bin/sh\nprintf "%%s\\n" "$PP"\n' >"$ASKP"; chmod +x "$ASKP"
# pfssh: feed remote sh a script on stdin; password via askpass (stderr noise dropped, so ssh/auth
# failures show up only as empty output). Host keys are not persisted (UserKnownHostsFile=/dev/null),
# so accept-new trusts whatever key is presented on each run.
pfssh(){ SSH_ASKPASS="$ASKP" SSH_ASKPASS_REQUIRE=force DISPLAY=:0 ssh \
  -o ConnectTimeout=12 -o StrictHostKeyChecking=accept-new -o UserKnownHostsFile=/dev/null \
  -o PreferredAuthentications=password -o PubkeyAuthentication=no -o NumberOfPasswordPrompts=1 \
  "$U@$HOST" 'sh -s' 2>/dev/null; }

echo "[INFO] pfSense $ACT @ $U@$HOST (vault:$VP)"
case "$ACT" in
  run)
    CMD="$*"; [ -n "$CMD" ] || { echo "[ERROR] run needs a command"; exit 1; }
    printf '%s\n' "$CMD" | pfssh ;;
  dhcp)
    pfssh <<'RSCRIPT'
# rlog: read a log via clog where that tool exists, else as plain text. Assumption to verify on the
# target box: current pfSense releases keep plain-text logs and no longer ship clog.
rlog(){ clog "$1" 2>/dev/null || cat "$1" 2>/dev/null; }
echo "## DHCP backend"; { pgrep -lf dhcpd >/dev/null && echo "ISC dhcpd active"; }; { pgrep -lf kea >/dev/null && echo "Kea active"; }
echo "## 'no free leases' events (exhaustion)"; rlog /var/log/dhcpd.log | grep -ic 'no free leases'
echo "## active leases per /24 (top 20)"
awk '/^lease /{ip=$2} /binding state active/{a[ip]=1} END{for(i in a){n=i; sub(/\.[0-9]+$/,"",n); c[n]++} for(k in c) print c[k], k}' /var/dhcpd/var/db/dhcpd.leases 2>/dev/null | sort -rn | head -20
echo "## pool ranges (subnet -> range)"; grep -hE 'subnet|range ' /var/dhcpd/etc/dhcpd.conf 2>/dev/null | paste - - | head -40
RSCRIPT
    ;;
  audit)
    pfssh <<'RSCRIPT'
# rlog: same clog-or-plain-text fallback as in the dhcp action (see the assumption noted there).
rlog(){ clog "$1" 2>/dev/null || cat "$1" 2>/dev/null; }
echo "## VERSION"; cat /etc/version 2>/dev/null
echo "## UPTIME/LOAD"; uptime
echo "## physical interfaces (media/status; VLAN sub-ifs skipped)"; for i in $(ifconfig -l); do case $i in *.*) continue;; igc[0-9]|em[0-9]|ix[0-9]|vmx[0-9]) m=$(ifconfig $i 2>/dev/null | grep -E 'media|status' | tr '\n' ' '); [ -n "$m" ] && echo " $i: $m";; esac; done
echo "## GATEWAY loss/down events (last 8)"; rlog /var/log/gateways.log | tail -8
echo "## DHCP exhaustion ('no free leases' count)"; rlog /var/log/dhcpd.log | grep -ic 'no free leases'
echo "## DHCP busiest /24s (top 8)"; awk '/^lease /{ip=$2} /binding state active/{a[ip]=1} END{for(i in a){n=i; sub(/\.[0-9]+$/,"",n); c[n]++} for(k in c) print c[k], k}' /var/dhcpd/var/db/dhcpd.leases 2>/dev/null | sort -rn | head -8
echo "## PF states"; pfctl -si 2>/dev/null | grep -iE 'current entries|searches'; pfctl -sm 2>/dev/null | grep -E '^states'
echo "## DNS resolver"; pgrep -lf unbound >/dev/null && echo "unbound running" || echo "unbound NOT running"
echo "## mbuf"; netstat -m 2>/dev/null | head -1
echo "## NIC errors (Ierrs/Oerrs/Coll)"; netstat -i 2>/dev/null | awk 'NR==1 || ($1 ~ /^(igc|em|ix|vmx)[0-9]$/)'
RSCRIPT
    ;;
  *) echo "action: audit|dhcp|run|shell"; exit 1;;
esac
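
The per-/24 lease count in the script marks an address active if any record for it in `dhcpd.leases` was ever active. Because ISC dhcpd appends records, a stricter variant keeps only the last binding state seen per address. The sketch below illustrates that variant on made-up sample data; `active_per_24` is a hypothetical name and this is not the awk the script ships.

```bash
#!/usr/bin/env bash
# Sketch: count active leases per /24 using only the LAST binding state seen for each IP.
set -u

active_per_24() {
  awk '
    /^lease /               { ip = $2 }
    /^[ \t]*binding state / { state[ip] = $3 }   # does not match "next binding state" lines
    END {
      for (i in state) if (state[i] == "active;") { n = i; sub(/\.[0-9]+$/, "", n); c[n]++ }
      for (k in c) print c[k], k
    }' "$1" | sort -rn
}

# Made-up sample: 192.168.2.11 was active, then released in a later record.
SAMPLE="$(mktemp)"
trap 'rm -f "$SAMPLE"' EXIT
cat >"$SAMPLE" <<'EOF'
lease 192.168.2.10 {
  binding state active;
}
lease 192.168.2.11 {
  binding state active;
  next binding state free;
}
lease 192.168.2.11 {
  binding state free;
}
lease 192.168.2.12 {
  binding state active;
}
lease 192.168.3.5 {
  binding state active;
}
EOF

active_per_24 "$SAMPLE"
# prints:
# 2 192.168.2
# 1 192.168.3
```

On the sample, the script's original pattern would report 3 for `192.168.2` because the released address still has an older active record.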
@@ -47,3 +47,30 @@ no UniFi gateway (pfSense firewall). All collectors ran clean.

(All changes via the gated `apply-radio`/`apply-wlan`/`channel-plan` scripts, per zone, with rollback +
live validation. Nothing applied in this audit.)

---

## pfSense health check (2026-06-16): ruling out the gateway as a WiFi factor

Investigated the Cascades pfSense (`192.168.0.1`, **pfSense Plus 25.07-RELEASE**, Netgate) over the site
VPN via SSH, to confirm whether any gateway-side issue contributes to the "WiFi bad for some users"
symptom. **Verdict: pfSense is healthy and is NOT a contributor; the problem is RF-side (2.4 GHz).**

| Area | Finding | WiFi impact |
|---|---|---|
| **DHCP exhaustion** | **0** "no free leases" events in dhcpd.log. WiFi/AP pool `192.168.0.0/22` (range 192.168.2.2–3.254, cap ~507) only **270 active (~53%)**; per-unit /28s + `10.0.20/.50` all have headroom | **Ruled out** (was the top suspect) |
| **DNS** | unbound resolver running | Fine |
| **WAN** | Dual Cox: WAN1 `184.191.143.62/30`, WAN2 `72.211.21.217/27`, both active **full-duplex**, `WAN_Group` gateway group, **no loss/down events** logged | Fine |
| **Firewall states** | 28,368 / 790,000 limit | Fine |
| **CPU / mbuf / uptime** | load 0.6, mbufs nominal, 10-day uptime | Healthy |
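
The headroom figures in the table reduce to simple used/capacity ratios. A small sketch of that arithmetic, using only the numbers quoted above; `pct` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Sketch: turn the used/capacity pairs quoted in the table into percentages.
set -u

# pct <used> <capacity> -> utilization with one decimal place
pct() {
  awk -v u="$1" -v c="$2" 'BEGIN { if (c <= 0) exit 1; printf "%.1f\n", 100 * u / c }'
}

echo "DHCP pool: $(pct 270 507)%"        # 270 active of ~507 addresses
echo "PF states: $(pct 28368 790000)%"   # 28,368 of the 790,000 limit
# prints:
# DHCP pool: 53.3%
# PF states: 3.6%
```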

**Architecture:** per-unit design with **199 DHCP subnets**, mostly `10.x.y.0/28` per apartment
(assisted-living L2 isolation) + the `192.168.0.0/22` staff/AP network (APs + most WiFi clients). Active DHCP
backend is **ISC** (Kea config present but dormant).

**Minor (not WiFi-related):** `igc3`/WAN2 logged 1707 input-errors + 1707 "collisions", but the link is
2.5GbE full-duplex/active with zero gateway loss, consistent with the known Intel I225/I226 2.5G counter
quirk, not a real fault. No action needed unless WAN2 misbehaves.

**Conclusion:** gateway/DHCP/DNS/WAN are not bottlenecking the wireless. The 2.4 GHz remediation
(power-down + coverage-redundancy disables) remains the correct and sole fix for the client-experience tail.
