sync: auto-sync from HOWARD-HOME at 2026-06-18 08:29:03
Author: Howard Enos Machine: HOWARD-HOME Timestamp: 2026-06-18 08:29:03
@@ -0,0 +1,81 @@
# Cascades — power-outage follow-up: OpenVPN flapping root cause + kitchen printer post-outage casualty

- **Date:** 2026-06-18
- **Machine:** Howard-Home
- **Client:** Cascades of Tucson
- **Continuation of:** 2026-06-17 power-outage incident (`clients/cascades-tucson/reports/2026-06-17-power-outage-incident.md`)

## User
- **User:** Howard Enos (howard)
- **Machine:** Howard-Home
- **Role:** tech

## Session Summary

Short follow-up session on the 2026-06-17 Cascades power outage. Two items.

First: diagnosed why the Howard-Home OpenVPN Connect tunnel to Cascades pfSense kept disconnecting and
reconnecting. Read the pfSense OpenVPN server log (`/var/log/openvpn.log`): the disconnects are
caused by a configured **inactivity timeout**. `Howard/... Inactivity timeout (--inactive),
exiting` fires at ~5 min (connected 23:23:52 -> dropped 23:28:57, about 305s), after which OpenVPN
Connect auto-reconnects. Ruled OUT duplicate-CN (0 "will cause previous active session" events),
WAN instability (Cox gateway stable since the 20:47 recovery), and TLS/auth errors (clean auth each
time; the "IP packet with unknown IP version=0" line is cosmetic). It is a configured idle-disconnect,
not a fault. Fix = raise/disable the OpenVPN server `--inactive` timeout (keepalive pings do NOT
reset it; `--inactive` measures tunnel data). Proposed, not applied (standing no-change-without-go rule).

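The ~305s figure can be checked directly from the two timestamps quoted above. This is a minimal sketch: the 300s value is the approximate `--inactive` setting noted in this log, treated here as an assumption, and the function name is illustrative only.

```python
from datetime import datetime

# Timestamps quoted in the session summary (same evening, so no date rollover).
CONNECTED = "23:23:52"
DROPPED = "23:28:57"

# Assumption: the configured --inactive value is the ~300s noted in this log.
ASSUMED_INACTIVE_SECONDS = 300


def session_seconds(start: str, end: str) -> int:
    """Return the elapsed seconds between two HH:MM:SS times on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


elapsed = session_seconds(CONNECTED, DROPPED)
overshoot = elapsed - ASSUMED_INACTIVE_SECONDS
print(f"session lasted {elapsed}s, {overshoot}s past the assumed {ASSUMED_INACTIVE_SECONDS}s timeout")
```

A session that ends a few seconds after the configured value on every cycle is what an idle timer would produce; WAN or TLS trouble would be expected to give irregular session lengths.
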
Second: the kitchen thermal printer (iPad POS ticket printer) reported "disconnected from the network"
and would not print the morning after the outage; Howard power-cycled it and it resumed printing iPad
tickets. Root cause: it powered up DURING the DHCP-down window of the recovery (duplicate dhcpd +
2nd-floor switch not passing offers), never got an IP, cached a disconnected state, and did not retry
once the network was healthy. The power-cycle forced a fresh DHCP request against the now-healthy
network. Not a printer or network fault. Ran a read-only straggler sweep on pfSense (pulled recent
dhcpd.log, per-MAC DISCOVER vs ACK): 13/13 active DISCOVER senders are completing, 0 stuck, so the
network is healthy. Noted that "gave-up" casualties like the printer are INVISIBLE to a DHCP scan (they
stopped requesting), so expect a few more "won't connect" reports today, each fixed by a power-cycle.
Updated the incident report with the printer casualty + a recovery-checklist lesson; synced.

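The per-MAC DISCOVER vs ACK sweep can be sketched roughly as follows. This is not the script used in the session (that script is not recorded here); it assumes the usual ISC dhcpd message shapes (`DHCPDISCOVER from <mac> via <if>`, `DHCPACK on <ip> to <mac> ... via <if>`), and the sample lines, MAC addresses, and interface name are hypothetical.

```python
import re
from collections import defaultdict

# Assumed ISC dhcpd message shapes; adjust the patterns if the log differs.
MAC = r"(?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2})"
DISCOVER_RE = re.compile(r"DHCPDISCOVER from " + MAC, re.IGNORECASE)
ACK_RE = re.compile(r"DHCPACK on \S+ to " + MAC, re.IGNORECASE)


def sweep(lines):
    """Count DISCOVER and ACK events per MAC; return (senders, completing, stuck)."""
    counts = defaultdict(lambda: {"discover": 0, "ack": 0})
    for line in lines:
        m = DISCOVER_RE.search(line)
        if m:
            counts[m.group("mac").lower()]["discover"] += 1
            continue
        m = ACK_RE.search(line)
        if m:
            counts[m.group("mac").lower()]["ack"] += 1
    senders = {mac for mac, c in counts.items() if c["discover"] > 0}
    completing = {mac for mac in senders if counts[mac]["ack"] > 0}
    return senders, completing, senders - completing


# Hypothetical sample lines, for illustration only (not taken from the real log).
SAMPLE = [
    "dhcpd: DHCPDISCOVER from aa:bb:cc:00:00:01 via igc1",
    "dhcpd: DHCPOFFER on 192.168.0.50 to aa:bb:cc:00:00:01 via igc1",
    "dhcpd: DHCPREQUEST for 192.168.0.50 from aa:bb:cc:00:00:01 via igc1",
    "dhcpd: DHCPACK on 192.168.0.50 to aa:bb:cc:00:00:01 via igc1",
    "dhcpd: DHCPDISCOVER from aa:bb:cc:00:00:02 via igc1",
    "dhcpd: DHCPDISCOVER from aa:bb:cc:00:00:02 via igc1",
]

senders, completing, stuck = sweep(SAMPLE)
print(f"{len(senders)} senders, {len(completing)} completing, {len(stuck)} stuck")
```

A device that has stopped sending DISCOVERs never enters `senders`, which is why a sweep of this kind reports a healthy network while a gave-up device is still offline.
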
## Key Decisions

- **OpenVPN flapping = configured `--inactive` idle timeout, not instability:** diagnosed from the
  server log rather than guessing; ruled out duplicate-CN / WAN / TLS. Fix proposed (raise/disable inactive), not applied.
- **Printer = power-outage DHCP-down-window casualty:** correct fix was the power-cycle (re-DHCP);
  no network change needed. Captured as a recovery-checklist item (power-cycle devices that booted during the DHCP-down window).
- **A DHCP-log scan cannot find gave-up casualties** (they stop requesting), so the realistic plan is
  reactive (power-cycle as reports come in), not a proactive scan.

## Configuration Changes

- No infrastructure changes. pfSense access was read-only (OpenVPN log, dhcpd log).
- Repo: updated `clients/cascades-tucson/reports/2026-06-17-power-outage-incident.md` (added "Post-Recovery
  Casualties / Lessons" section: kitchen printer + the power-cycle-stragglers checklist item).

## Infrastructure & Servers

- **Cascades pfSense** `192.168.0.1`, Plus 25.07. OpenVPN server `ovpns1`, user `Howard` (client IP pool
  192.168.10.x; this session it got 192.168.10.2). Server has an **`--inactive` idle timeout ~300s** that
  drops idle clients. WAN = Cox (igc0, 184.191.143.x / dpinger WAN_DHCP + WANCOAX_DHCP). pfSense logs are
  PLAIN TEXT (read with tail/grep, not clog).
- OpenVPN client on Howard-Home: OpenVPN Connect (IV_GUI_VER=OCWindows_3.9.0-5008), public src 98.168.18.21.
- **Kitchen thermal printer:** iPad POS ticket printer (exact IP/MAC not captured); resolved by power-cycle.

## Commands & Outputs

- OpenVPN flap cause: `grep -i "inactivity timeout" /var/log/openvpn.log` -> `Howard/... Inactivity timeout (--inactive), exiting`; duplicate-CN count = 0.
- Straggler sweep: pulled `tail -5000 /var/log/dhcpd.log` locally -> python per-MAC DISCOVER vs ACK -> 13 senders, 13 completing, 0 stuck.

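For a locally pulled copy of the OpenVPN log, the two counts above (inactivity exits and duplicate-CN events) could be tallied in one pass. This is a rough sketch: the marker strings are the ones quoted in this log, while the function name and the sample lines are illustrative only (the elided `Howard/...` prefix is kept as quoted).

```python
# Marker strings quoted in this log; matching is case-insensitive, like `grep -i`.
INACTIVITY_MARKER = "inactivity timeout (--inactive)"
DUPLICATE_CN_MARKER = "will cause previous active session"


def count_markers(lines):
    """Return how many lines contain each marker string."""
    counts = {"inactivity": 0, "duplicate_cn": 0}
    for line in lines:
        lowered = line.lower()
        if INACTIVITY_MARKER in lowered:
            counts["inactivity"] += 1
        if DUPLICATE_CN_MARKER in lowered:
            counts["duplicate_cn"] += 1
    return counts


# Illustrative sample built from the strings quoted in this log, not real log output.
SAMPLE = [
    "Howard/... Inactivity timeout (--inactive), exiting",
    "IP packet with unknown IP version=0",
    "Howard/... Inactivity timeout (--inactive), exiting",
]

result = count_markers(SAMPLE)
print(result)
```

Against a real file, the same function can be fed `open(path, errors="replace")`, since pfSense logs are plain text.
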
## Pending / Incomplete Tasks

- **OpenVPN flapping fix:** raise/disable the pfSense OpenVPN server `--inactive` timeout (proposed; needs go).
- **Watch for more post-outage stragglers** (printers/POS/IoT that gave up); power-cycle each as reported.
- Carryover from the outage (unchanged): rotate the exposed Synology credential (vault history commit 1fbc0e1);
  enable AutoConfigBackup; UPS coverage/runtime/clean-shutdown review; 5GHz Option B + 2.4 Low->Medium bump
  (plus the auto-channel change still needs a proper data-driven re-plan).

## Reference Information

- Incident report: `clients/cascades-tucson/reports/2026-06-17-power-outage-incident.md` (updated).
- Prior session logs (same outage): `2026-06-17-howard-cascades-power-outage-recovery-and-5ghz.md`,
  `2026-06-17-howard-cascades-poly-phone-drops-network-smoothing.md`.
- Memory: `reference_pfsense_25_07_ops.md`, `feedback_cascades.md` #4 (no prod change without discussing).
- pfSense access: `bash .claude/skills/unifi-wifi/scripts/pfsense-ssh.sh cascades-tucson run "<cmd>"`.