sync: auto-sync from HOWARD-HOME at 2026-06-17 22:46:27

Author: Howard Enos
Machine: HOWARD-HOME
Timestamp: 2026-06-17 22:46:27
2026-06-17 22:46:37 -07:00
parent dc4560cf27
commit f36fb97eb8
4 changed files with 136 additions and 0 deletions
--- a/.claude/memory/MEMORY.md
+++ b/.claude/memory/MEMORY.md
@@ -101,6 +101,7 @@
 ### Cascades
 - [Cascades operational rules](feedback_cascades.md) — Active rules: (1) folder redirection (fdeploy) needs subfolders PRE-CREATED before first logon or it caches a failure forever; recovery via fix-shell-redirect.ps1. (2) ALWAYS ask which security group(s) a new user goes into — never auto-derive from OU. (3) Do NOT lock down the legacy Main\Company Web Docs\Accounting (Everyone:Full) folder — still in active use. (4) NEVER change Cascades production infra (pfSense/UniFi/switches/DHCP) without discussing it + explicit per-change go — read-only/dry-run until then.
 - [Cascades FR GPO fix](reference_cascades_fr_gpo_fix.md) — Native Folder Redirection was DOA on every machine: redirect targets were in a misnamed `fdeploy1.ini` (Windows reads `fdeploy.ini`) → empty target path → silent no-op → per-user registry workaround every time. Fixed 2026-06-08 (correct fdeploy.ini + version bump). Also: CS-SERVER live RMM agent is `c39f1de7...` (old `6766e973` stale).
+- [pfSense 25.07 ops quirks](reference_pfsense_25_07_ops.md) — Cascades pfSense Plus 25.07: logs are PLAIN TEXT (use tail/grep, NOT clog → clog returns empty); clean dhcpd restart = `services_dhcpd_configure()` via slow pfSsh.php (needs 50s+ timeout); dirty boot can leave 2 dhcpd → DISCOVER/OFFER but no ACK; reboot the Cox modem after a config restore; ZFS survives power loss. From the 2026-06-17 power-outage incident.
 - [feedback_ascii_only_api_payloads](feedback_ascii_only_api_payloads.md) -- On Windows/Git-bash, non-ASCII chars (em-dash, arrow, smart quotes) in JSON payload TEXT passed to curl get mangled and rejected — Discord bot-alert returns 400, the coord API returns "error parsing the body". Use ASCII-only in API payload text, or a single-quoted heredoc.

 ## Machine
--- a/.claude/memory/reference_pfsense_25_07_ops.md
+++ b/.claude/memory/reference_pfsense_25_07_ops.md
@@ -0,0 +1,28 @@
+---
+name: reference_pfsense_25_07_ops
+description: pfSense Plus 25.07 operational quirks learned during the Cascades power-outage recovery — plain-text logs (NOT clog), clean dhcpd restart via pfSsh.php, reboot the upstream modem after a config restore, ZFS power-loss resilience
+metadata:
+  type: reference
+---
+
+Learned on the Cascades pfSense (`192.168.0.1`, Plus 25.07-RELEASE, ZFS) during the 2026-06-17
+power-outage recovery. Access: `bash .claude/skills/unifi-wifi/scripts/pfsense-ssh.sh cascades-tucson run "<cmd>"`
+(admin SSH = real shell). Incident report: `clients/cascades-tucson/reports/2026-06-17-power-outage-incident.md`.
+
+- **Logs are PLAIN TEXT (ASCII), not clog binary.** `clog /var/log/dhcpd.log` returns EMPTY on 25.07
+  → do NOT conclude "logs are empty / service not logging." Read with `tail`/`grep`/`cat` directly.
+  (Burned a whole hypothesis on this — the DHCP server was actually fine.) `file /var/log/*.log` → ASCII text.
+- **Clean single-instance DHCP restart from shell:** `echo "services_dhcpd_configure();" | /usr/local/sbin/pfSsh.php`
+  (regenerates `/var/dhcpd/etc/dhcpd.conf` + restarts ONE dhcpd; kills duplicates). A power-loss/dirty boot
+  can leave **two `dhcpd` processes** fighting → the exchange stalls at DISCOVER→OFFER and never reaches REQUEST/ACK.
+  Verify: `pgrep -f "dhcpd -user" | wc -l` should be **1**. Test config: `dhcpd -t -cf /var/dhcpd/etc/dhcpd.conf`.
+- **`pfSsh.php` is SLOW to load (~20-40s).** SSH commands that invoke it need a long timeout (50s+) or they
+  time out mid-run, leaving no way to tell whether the action took effect.
+- **After a pfSense config restore/replace, REBOOT the upstream modem** (Cox at Cascades) to re-sync the WAN —
+  skipping this prolongs post-restore issues. Add to any restore runbook.
+- **ZFS root is power-loss resistant** — `zpool status -x` → "all pools are healthy"; `config.xml` survived an
+  unclean power-off intact. An HTTP 50x error from the web GUI right after a dirty boot is usually transient (services still starting).
+- **DHCP "offers but never completes" on ONE segment/switch** = asymmetric L2 forwarding (DISCOVER reaches
+  pfSense and the OFFER goes out on the right iface/subnet, but REQUEST=0/ACK=0 because the OFFER never reaches the client).
+  Root cause is the switch (re-adopted with stale forwarding/bad port profile), NOT pfSense — fix = reset/re-adopt
+  that switch. See [[reference_cascades_fr_gpo_fix]] for other Cascades infra notes.
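
The single-instance check described in the pfSense note above could be wrapped in a small helper. This is a minimal sketch and is not part of the commit: the function name `dhcpd_state` and its output labels are hypothetical, and it only classifies a process count that you pass in, so it runs anywhere. On the firewall the count would come from the `pgrep -f "dhcpd -user" | wc -l` command quoted in the note.

```shell
#!/bin/sh
# Hypothetical helper: classify the number of running dhcpd processes.
# Input: the output of `pgrep -f "dhcpd -user" | wc -l` (command taken from the note).
# Output labels (down/ok/duplicate/unknown) are illustrative, not a pfSense convention.
dhcpd_state() {
    # wc -l may pad its output with spaces, so strip all whitespace first.
    count=$(printf '%s' "$1" | tr -d '[:space:]')
    case "$count" in
        0) echo "down" ;;                 # no dhcpd running
        1) echo "ok" ;;                   # the expected single instance
        ''|*[!0-9]*) echo "unknown" ;;    # empty or non-numeric input
        *) echo "duplicate" ;;            # two or more instances
    esac
}

# Intended use on the firewall (shown as comments so the sketch stays runnable elsewhere):
#   state=$(dhcpd_state "$(pgrep -f "dhcpd -user" | wc -l)")
#   [ "$state" = "duplicate" ] && \
#       echo "services_dhcpd_configure();" | /usr/local/sbin/pfSsh.php
```

Keeping the classification separate from the `pgrep` call means the logic can be checked off-box; the restart line reuses the exact command from the note and still needs the long SSH timeout mentioned there.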