sync: auto-sync from HOWARD-HOME at 2026-06-18 08:29:03

Author: Howard Enos Machine: HOWARD-HOME Timestamp: 2026-06-18 08:29:03
2026-06-18 08:29:11 -07:00
parent 95a29da79c
commit 7a10dff74c
1 changed files with 81 additions and 0 deletions
--- a/clients/cascades-tucson/session-logs/2026-06/2026-06-18-howard-cascades-outage-followup-openvpn-printer.md
+++ b/clients/cascades-tucson/session-logs/2026-06/2026-06-18-howard-cascades-outage-followup-openvpn-printer.md
@@ -0,0 +1,81 @@
+# Cascades — power-outage follow-up: OpenVPN flapping root cause + kitchen printer post-outage casualty
+
+- **Date:** 2026-06-18
+- **Machine:** Howard-Home
+- **Client:** Cascades of Tucson
+- **Continuation of:** 2026-06-17 power-outage incident (`clients/cascades-tucson/reports/2026-06-17-power-outage-incident.md`)
+
+## User
+- **User:** Howard Enos (howard)
+- **Machine:** Howard-Home
+- **Role:** tech
+
+## Session Summary
+
+Short follow-up session on the 2026-06-17 Cascades power outage. Two items.
+
+Diagnosed why the Howard-Home OpenVPN Connect tunnel to Cascades pfSense kept disconnecting and
+reconnecting. Read the pfSense OpenVPN server log (`/var/log/openvpn.log`): the disconnects are
+caused by a configured **inactivity timeout** — `Howard/... Inactivity timeout (--inactive),
+exiting` firing at ~5 min (connected 23:23:52 -> dropped 23:28:57 ~= 305s), after which OpenVPN
+Connect auto-reconnects. Ruled OUT duplicate-CN (0 "will cause previous active session" events),
+WAN instability (Cox gateway stable since the 20:47 recovery), and TLS/auth errors (clean auth each
+time; the "IP packet with unknown IP version=0" line is cosmetic). It is a configured idle-disconnect,
+not a fault. Fix = raise/disable the OpenVPN server `--inactive` timeout (keepalive pings do NOT
+reset it: `--inactive` counts only real tunnel traffic, not OpenVPN's internal pings or TLS control packets). Proposed, not applied (standing no-change-without-go rule).
+
+Second: the kitchen thermal printer (iPad POS ticket printer) reported "disconnected from the network"
+and would not print the morning after the outage; Howard power-cycled it and it resumed printing iPad
+tickets. Root cause: it powered up DURING the DHCP-down window of the recovery (duplicate dhcpd +
+2nd-floor switch not passing offers), never got an IP, cached a disconnected state, and did not retry
+once the network was healthy. The power-cycle forced a fresh DHCP request against the now-healthy
+network. Not a printer or network fault. Ran a read-only straggler sweep on pfSense (pulled recent
+dhcpd.log, per-MAC DISCOVER vs ACK): 13/13 active DISCOVER senders are completing, 0 stuck — network
+healthy. Noted that "gave-up" casualties like the printer are INVISIBLE to a DHCP scan (they stopped
+requesting), so expect a few more "won't connect" reports today, each fixed by a power-cycle.
+Updated the incident report with the printer casualty + a recovery-checklist lesson; synced.
+
+## Key Decisions
+
+- **OpenVPN flapping = configured `--inactive` idle timeout, not instability** — diagnosed from the
+  server log rather than guessing; ruled out duplicate-CN / WAN / TLS. Fix proposed (raise/disable inactive), not applied.
+- **Printer = power-outage DHCP-down-window casualty** — correct fix was the power-cycle (re-DHCP);
+  no network change needed. Captured as a recovery-checklist item (power-cycle devices that booted during the DHCP-down window).
+- **A DHCP-log scan cannot find gave-up casualties** (they stop requesting) — so the realistic plan is
+  reactive (power-cycle as reports come in), not a proactive scan.
+
+## Configuration Changes
+
+- No infrastructure changes. pfSense access was read-only (OpenVPN log, dhcpd log).
+- Repo: updated `clients/cascades-tucson/reports/2026-06-17-power-outage-incident.md` (added "Post-Recovery
+  Casualties / Lessons" section — kitchen printer + the power-cycle-stragglers checklist item).
+
+## Infrastructure & Servers
+
+- **Cascades pfSense** `192.168.0.1`, pfSense Plus 25.07. OpenVPN server `ovpns1`, user `Howard` (client IP pool
+  192.168.10.x; this session it got 192.168.10.2). Server has an **`--inactive` idle timeout ~300s** that
+  drops idle clients. WAN = Cox (igc0, 184.191.143.x / dpinger WAN_DHCP + WANCOAX_DHCP). pfSense logs are
+  PLAIN TEXT (read with tail/grep, not clog).
+- OpenVPN client on Howard-Home: OpenVPN Connect (IV_GUI_VER=OCWindows_3.9.0-5008), public src 98.168.18.21.
+- **Kitchen thermal printer:** iPad POS ticket printer (exact IP/MAC not captured); resolved by power-cycle.
+
+## Commands & Outputs
+
+- OpenVPN flap cause: `grep -i "inactivity timeout" /var/log/openvpn.log` -> `Howard/... Inactivity timeout (--inactive), exiting`; duplicate-CN count = 0.
+- Straggler sweep: pulled `tail -5000 /var/log/dhcpd.log` locally -> python per-MAC DISCOVER vs ACK -> 13 senders, 13 completing, 0 stuck.
+
+## Pending / Incomplete Tasks
+
+- **OpenVPN flapping fix:** raise/disable the pfSense OpenVPN server `--inactive` timeout (proposed; needs go).
+- **Watch for more post-outage stragglers** (printers/POS/IoT that gave up) — power-cycle each as reported.
+- Carryover from the outage (unchanged): rotate the exposed Synology credential (vault history commit 1fbc0e1);
+  enable AutoConfigBackup; UPS coverage/runtime/clean-shutdown review; 5GHz Option B + 2.4GHz Low->Medium bump
+  (the auto-channel change also still needs a proper data-driven re-plan).
+
+## Reference Information
+
+- Incident report: `clients/cascades-tucson/reports/2026-06-17-power-outage-incident.md` (updated).
+- Prior session logs (same outage): `2026-06-17-howard-cascades-power-outage-recovery-and-5ghz.md`,
+  `2026-06-17-howard-cascades-poly-phone-drops-network-smoothing.md`.
+- Memory: `reference_pfsense_25_07_ops.md`, `feedback_cascades.md` #4 (no prod change without discussing).
+- pfSense access: `bash .claude/skills/unifi-wifi/scripts/pfsense-ssh.sh cascades-tucson run "<cmd>"`.
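
The straggler sweep recorded under "Commands & Outputs" (per-MAC DISCOVER vs ACK over a `dhcpd.log` tail) could look roughly like the sketch below. The script actually run in the session was not captured in the log, so this is a reconstruction under stated assumptions: it assumes ISC dhcpd-style lines (`DHCPDISCOVER from <mac> via <if>`, `DHCPACK on <ip> to <mac> ...`), and the names `EVENT_RE`, `sweep`, and `stuck` are hypothetical.

```python
import re
import sys
from collections import defaultdict

# ASSUMPTION: ISC dhcpd-style log lines, for example:
#   ... dhcpd[123]: DHCPDISCOVER from aa:bb:cc:dd:ee:ff via igc1
#   ... dhcpd[123]: DHCPACK on 192.168.0.50 to aa:bb:cc:dd:ee:ff (host) via igc1
# These shapes are illustrative and were not copied from the Cascades log.
EVENT_RE = re.compile(
    r"\b(DHCPDISCOVER|DHCPACK)\b.*?\b(?:from|to)\s+((?:[0-9a-f]{2}:){5}[0-9a-f]{2})",
    re.IGNORECASE,
)


def sweep(lines):
    """Count DISCOVER and ACK events per MAC, keeping only MACs that sent a DISCOVER."""
    counts = defaultdict(lambda: {"discover": 0, "ack": 0})
    for line in lines:
        match = EVENT_RE.search(line)
        if match is None:
            continue  # OFFER, REQUEST, and unrelated lines are ignored
        kind = "discover" if match.group(1).upper() == "DHCPDISCOVER" else "ack"
        counts[match.group(2).lower()][kind] += 1
    # Renew-only clients (ACK without DISCOVER) are not "DISCOVER senders".
    return {mac: c for mac, c in counts.items() if c["discover"] > 0}


def stuck(summary):
    """Return MACs that kept sending DISCOVER but never received an ACK."""
    return sorted(mac for mac, c in summary.items() if c["ack"] == 0)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python sweep.py dhcpd-tail.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        summary = sweep(handle)
    bad = stuck(summary)
    print(f"{len(summary)} senders, {len(summary) - len(bad)} completing, {len(bad)} stuck")
    for mac in bad:
        print("stuck:", mac)
```

As the log notes, a sweep like this only sees devices that are still requesting. A device that gave up during the DHCP-down window sends no DISCOVER at all, so it never appears in the summary and has to be found from user reports.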