Peaceful Spirit — Session Log 2026-06-04
User
- User: Mike Swanson (mike)
- Machine: GURU-5070
- Role: admin
Session Summary
Investigated a report that Bridgette's home machine (BridgettePSHomeComputer) was throwing VPN errors. The investigation established that this was not a Bridgette-specific or account-specific problem but a site-wide VPN outage: all Peaceful Spirit L2TP/IPsec clients were failing at the IPsec negotiation layer (Windows RAS error 789), which occurs before user authentication. MaraHomeNew failed identically while connecting as a different user (pst-admin), confirming the fault was below the per-user layer.
Diagnosis proceeded by elimination via GuruRMM remote commands. The RRAS endpoint on PST-SERVER was confirmed fully healthy: 30-day uptime (no overnight reboot), services running (RemoteAccess, IKEEXT, PolicyAgent, RasMan), IKE listening on UDP 500/4500, firewall allow-rules present, PSK matching the vault, and the public/egress IP unchanged at 98.190.129.150 — exactly what clients dial. The client-side NAT-T registry key was correct (value 2). IPsec auditing was temporarily enabled on the server and a live dial triggered from Bridgette; the server logged zero IKE/IPsec security events, proving the clients' negotiation packets were not reaching the server at all.
Root cause was isolated to the edge gateway. The site router (UDR Ultra, hostname UCG-PST-CC, 192.168.0.10) was reached read-only via an SSH key jump through PST-SERVER, because the device's WAN SSH on :22 had also become unreachable. Its live iptables ruleset contained no DNAT/port-forward for the VPN, and the output of last reboot showed the device had rebooted on 2026-06-04 at 03:59, which matches the outage cutoff. It came back from that reboot without the UDP 500/4500 to 192.168.0.2 port-forward, so all inbound IPsec was silently dropped at the edge. The UniFi config's legacy Mongo collections (portforward/network/firewallrule/routing) all read 0, which suggests this UniFi OS build (5.1.15) stores these settings in a migrated schema, so the controller UI was the authoritative place to confirm and restore the rule.
Mike re-added the Port Forward (UDP 500 + 4500 to 192.168.0.2) in the UniFi controller. Verification confirmed the DNAT rules were live in the UDR ruleset, and a live dial test showed Bridgette fully connected (assigned VPN IP 192.168.0.242, RAS event 20224 link established, no 789). Mara's IPsec link also established (no 789); her test returned 691 only because the test rasdial ran as SYSTEM, the documented wrong-principal artifact for this site, not a real fault. The outage was resolved end-to-end. Resolution and root cause were posted to #dev-alerts.
Key Decisions
- Treated the report as a whole-site diagnosis rather than a single-machine fix once MaraHomeNew showed the identical 789, since error 789 is a pre-authentication IPsec failure.
- Used IPsec auditing + a live dial as the decisive test to distinguish "packets not arriving (edge)" from "PSK/policy mismatch (server)" — zero IKE events conclusively pointed upstream of the server.
- Reached the UDR read-only via the PST-SERVER LAN jump (SSH key pushed to the server temporarily, used, then deleted) because the UDR's WAN SSH was unreachable and RMM was the only live channel to the LAN.
- Did not edit the UDR firewall over SSH; the UniFi controller re-pushes config, so iptables edits would not persist. The fix was directed to the controller UI.
- Left the controller change to Mike (per his choice) rather than driving the UniFi browser, then verified via RMM.
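The elimination logic behind these decisions can be sketched as a small Python function. This is an illustrative sketch, not tooling used in the session: the function name, parameters, and return labels are hypothetical. It encodes only what this log states: error 789 is a pre-authentication IPsec failure, 691 appears only after the IPsec link is up, and zero IKE events on the server during an audited live dial mean the negotiation packets never arrived.

```python
from typing import Optional

RAS_AUTH_FAILED = 691               # user authentication rejected (IPsec link already up)
RAS_IPSEC_NEGOTIATION_FAILED = 789  # L2TP/IPsec failure before user authentication


def locate_fault(ras_error: Optional[int], server_ike_events: int) -> str:
    """Hypothetical triage helper: map one live-dial observation to a fault layer.

    ras_error: RAS error code seen by the client, or None if it connected.
    server_ike_events: IKE/IPsec audit events the server logged during the
        dial. Only meaningful while IPsec auditing is enabled on the server.
    """
    if ras_error is None:
        return "connected"
    if ras_error == RAS_AUTH_FAILED:
        # IPsec succeeded, so the problem is per-user or a wrong-principal dial.
        return "authentication"
    if ras_error == RAS_IPSEC_NEGOTIATION_FAILED:
        if server_ike_events == 0:
            # The server never saw the negotiation: look upstream of it.
            return "edge"
        # The server saw IKE traffic and still failed: PSK or policy mismatch.
        return "server-ipsec"
    return "unknown"
```

With the observations from this session, `locate_fault(789, 0)` yields `"edge"`, which is the branch that sent the investigation to the gateway.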
Problems Encountered
- Two GuruRMM agents matched "BridgettePSHomeComputer"; the wiki UUID (074141d7…) was stale/offline. The machine had re-enrolled; the live agent is 01160fc8… (v0.6.49). Resolved by always resolving the hostname to a UUID live instead of relying on the wiki value.
- Remote command sent to the UDR over PowerShell→ssh→bash lost its quoting and the remote shell choked on parentheses/pipes. Resolved by base64-encoding the remote script and running echo <b64> | base64 -d | sh.
- A transient "/mingw64/bin/curl: Permission denied" during one dispatch. Resolved by writing the JSON payload to a file and using --data-binary @file.
- Mara's verification dial returned 691; identified as the known SYSTEM-rasdial wrong-principal artifact, not a real failure (IPsec link established successfully).
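The base64 workaround for the quoting problem can be sketched in Python as follows. This is a minimal illustration, not the script used in the session: the function name and the sample script are hypothetical. It relies on the standard base64 alphabet containing only letters, digits, "+", "/" and "=", so the payload has nothing for an intermediate shell to reinterpret. The wrapper itself still contains two pipes, so it is assumed to reach ssh as a single quoted argument.

```python
import base64


def wrap_remote_script(script: str) -> str:
    """Return a one-line command that decodes `script` remotely and runs it with sh."""
    payload = base64.b64encode(script.encode("utf-8")).decode("ascii")
    # The payload carries no quotes, parentheses, pipes or spaces, so the
    # script body survives every quoting layer between here and the remote sh.
    return f"echo {payload} | base64 -d | sh"


if __name__ == "__main__":
    # Hypothetical remote script containing the characters that broke the
    # unencoded dispatch (parentheses, pipes, quotes).
    sample = "iptables-save -t nat | grep -E '(DNAT|dport)'"
    print(wrap_remote_script(sample))
```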
Configuration Changes
- UniFi controller (UDR Ultra, 192.168.0.10): Re-added Port Forward UDP 500 + 4500 → 192.168.0.2 (done by Mike in UI). This is the fix.
- PST-SERVER: IPsec auditing (Main/Extended/Quick Mode) temporarily enabled for the live-dial test, then restored to "No Auditing". No persistent change.
- No repo code changes. Session log + wiki article only.
Credentials & Secrets
- No new credentials created. Existing vault entries referenced:
  - clients/peaceful-spirit/server.sops.yaml → credentials.ucg (UDR SSH key ~/.ssh/pst-cc-ucg, ssh_password, vpn_psk).
  - clients/peaceful-spirit/vpn.sops.yaml → L2TP PSK z5zkNBds2V9eIkdey09Zm6Khil3DAZs8 (matches server).
- Vault drift noted (not yet fixed): vpn.sops.yaml lists pst-admin password 24Hearts$, but the wiki records a reset to SpiritWalk26! on 2026-05-22. Needs reconciliation.
Infrastructure & Servers
- PST-SERVER — 192.168.0.2, Windows Server 2016 Essentials, PEACEFULSPIRIT.local DC, RRAS L2TP/IPsec endpoint. Public IP 98.190.129.150. RMM agent 87293069-33b6-45e8-a68f-6811216cdb96 (v0.6.52).
- UCG-PST-CC — UDR Ultra, LAN 192.168.0.10 (default gateway), WAN 98.190.129.150. UniFi OS 5.1.15, kernel 5.4.213-ui-ipq5322 (aarch64). Rebooted 2026-06-04 03:59.
- BridgettePSHomeComputer — RMM agent 01160fc8-4c2e-4e47-a591-e4e0f9ba5ea7 (v0.6.49). Connects as PEACEFULSPIRIT\BridgetteSH (SSO) via logon task "Connect Peaceful Spirit VPN". Got VPN IP 192.168.0.242 after fix.
- MaraHomeNew — RMM agent e9645594-6d7c-4c97-8cb4-920cb5d06c8e (v0.6.52). Connects as pst-admin via AllUserConnection cmdkey path.
- VPN: L2TP/IPsec, MSCHAPv2 + PSK, pool 192.168.0.240+, DNS 192.168.0.2.
Commands & Outputs
- Client error: error code returned on failure is 789 (RasClient event 20227) on every dial; one earlier termination 832.
- Server during live dial w/ auditing on: NO IKE/IPsec security events in last 5 min -> negotiation packets not reaching server.
- UDR last reboot: reboot ... Thu Jun 4 03:59 still running (prior boot May 22).
- UDR post-fix NAT: -A UBIOS_PREROUTING_USER_HOOK -p udp ... --dport 500 ... -j DNAT --to-destination 192.168.0.2:500 and ...4500 ... -j DNAT --to-destination 192.168.0.2:4500.
- Bridgette post-fix: ConnectionStatus: Connected, IPAddress: 192.168.0.242, event 20224 "link ... established by user PEACEFULSPIRIT\BridgetteSH".
- Mara post-fix: event 20224 link established; rasdial as SYSTEM → 691 (expected wrong-principal artifact).
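The post-fix NAT verification above amounts to checking that a DNAT rule exists for each VPN port. A minimal Python sketch of that check over saved ruleset text follows. The function names and the regular expression are assumptions for illustration, not the command run in the session. The pattern only looks for the fields visible in the rules quoted in this log (-p udp, --dport, -j DNAT --to-destination) and tolerates anything in between.

```python
import re
from typing import Iterable, List, Set

# Matches e.g. "... -p udp ... --dport 500 ... -j DNAT --to-destination 192.168.0.2:500"
DNAT_RULE = re.compile(
    r"-p\s+udp\b.*--dport\s+(?P<dport>\d+)\b.*"
    r"-j\s+DNAT\s+--to-destination\s+(?P<dest>[\d.]+):(?P<to_port>\d+)"
)


def forwarded_udp_ports(rules: Iterable[str], target_ip: str) -> Set[int]:
    """Collect UDP ports that are DNATed to the same port on target_ip."""
    ports: Set[int] = set()
    for line in rules:
        m = DNAT_RULE.search(line)
        if m and m.group("dest") == target_ip and m.group("dport") == m.group("to_port"):
            ports.add(int(m.group("dport")))
    return ports


def missing_vpn_forwards(
    rules: Iterable[str],
    target_ip: str = "192.168.0.2",
    required: Iterable[int] = (500, 4500),
) -> List[int]:
    """Return the required ports that have no matching DNAT rule (empty list = healthy)."""
    return sorted(set(required) - forwarded_udp_ports(rules, target_ip))
```

Fed the pre-fix ruleset, which had no VPN DNAT entries, `missing_vpn_forwards` would return `[500, 4500]`; an empty list corresponds to the post-fix state recorded above.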
Pending / Incomplete Tasks
- Update original Syncro ticket with resolution + 1hr warranty labor (in progress this session).
- Reboot-persistence test: confirm the re-added port-forward survives a deliberate UDR reboot — a port-forward vanishing on reboot is abnormal (possible firmware bug or uncommitted rule).
- DDNS for client profiles: clients hardcode 98.190.129.150; a DDNS hostname would future-proof against a Cox WAN-IP change.
- Vault reconcile: fix pst-admin password drift in vpn.sops.yaml (24Hearts$ vs SpiritWalk26!).
- Wiki: record re-enrolled BridgettePSHomeComputer agent UUID and the "UDR reboot drops VPN port-forward" known issue.
- Optional real-world confirmation of Mara's auto-connect when a user is at the machine.
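For the reboot-persistence test, one possible approach is to capture the NAT ruleset before and after the deliberate reboot and report any DNAT rule that disappeared. The Python sketch below illustrates that comparison. The function name is hypothetical, and it assumes the two snapshots are plain lists of rule lines obtained by whatever read-only method is used to dump the ruleset.

```python
from typing import Iterable, List, Set


def dnat_rules_lost(before: Iterable[str], after: Iterable[str]) -> List[str]:
    """Return DNAT rule lines present before the reboot but absent after it."""

    def dnat_lines(lines: Iterable[str]) -> Set[str]:
        # Collapse runs of whitespace so formatting differences between the
        # two snapshots do not register as lost rules.
        return {" ".join(line.split()) for line in lines if "-j DNAT" in line}

    return sorted(dnat_lines(before) - dnat_lines(after))
```

An empty result means every port-forward survived the reboot; any returned line is a rule that would need to be re-added and investigated.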
Reference Information
- Syncro customer: Peaceful Spirit Massage, ID 278525.
- GuruRMM API: http://172.16.3.30:3001.
- UDR read-only access path: SSH key ~/.ssh/pst-cc-ucg via PST-SERVER LAN jump to root@192.168.0.10.
- #dev-alerts messages: outage root cause 1512129381093474324; resolution 1512133532221444348.