claudetools/clients/peaceful-spirit/session-logs/2026-06-04-session.md
Mike Swanson 5a78c56f36 sync: auto-sync from GURU-5070 at 2026-06-04 09:45:37
Author: Mike Swanson
Machine: GURU-5070
Timestamp: 2026-06-04 09:45:37
2026-06-04 09:45:42 -07:00

Peaceful Spirit — Session Log 2026-06-04

User

  • User: Mike Swanson (mike)
  • Machine: GURU-5070
  • Role: admin

Session Summary

Investigated a report that Bridgette's home machine (BridgettePSHomeComputer) was throwing VPN errors. The investigation established that this was not a Bridgette-specific or account-specific problem but a site-wide VPN outage: all Peaceful Spirit L2TP/IPsec clients were failing at the IPsec negotiation layer (Windows RAS error 789), which occurs before user authentication. MaraHomeNew failed identically while connecting as a different user (pst-admin), confirming the fault was below the per-user layer.

Diagnosis proceeded by elimination via GuruRMM remote commands. The RRAS endpoint on PST-SERVER was confirmed fully healthy: 30-day uptime (no overnight reboot), services running (RemoteAccess, IKEEXT, PolicyAgent, RasMan), IKE listening on UDP 500/4500, firewall allow-rules present, PSK matching the vault, and the public/egress IP unchanged at 98.190.129.150 — exactly what clients dial. The client-side NAT-T registry key was correct (value 2). IPsec auditing was temporarily enabled on the server and a live dial triggered from Bridgette; the server logged zero IKE/IPsec security events, proving the clients' negotiation packets were not reaching the server at all.

Root cause was isolated to the edge gateway. The site router (UDR Ultra, hostname UCG-PST-CC, 192.168.0.10) was reached read-only via an SSH key jump through PST-SERVER (the device's WAN SSH on :22 had also gone unreachable). Its live iptables ruleset contained no DNAT/port-forward rule for the VPN, and the output of last reboot showed the device had rebooted at 2026-06-04 03:59, which matches the outage cutoff. It came back up without the UDP 500/4500 port-forward to 192.168.0.2, so all inbound IPsec traffic was silently dropped at the edge. The UniFi config's legacy Mongo collections (portforward/network/firewallrule/routing) all returned a count of 0, which suggests this UniFi OS build (5.1.15) stores these settings in a migrated schema; the controller UI was therefore the authoritative place to confirm and restore the rule.

Mike re-added the Port Forward (UDP 500 + 4500 to 192.168.0.2) in the UniFi controller. Verification confirmed the DNAT rules were live in the UDR ruleset, and a live dial test showed Bridgette fully connected (assigned VPN IP 192.168.0.242, RAS event 20224 link established, no 789). Mara's IPsec link also established (no 789); her test returned 691 only because the test rasdial ran as SYSTEM, the documented wrong-principal artifact for this site, not a real fault. The outage was resolved end-to-end. Resolution and root cause were posted to #dev-alerts.

Key Decisions

  • Treated the report as a whole-site diagnosis rather than a single-machine fix once MaraHomeNew showed the identical 789, since error 789 is a pre-authentication IPsec failure.
  • Used IPsec auditing + a live dial as the decisive test to distinguish "packets not arriving (edge)" from "PSK/policy mismatch (server)" — zero IKE events conclusively pointed upstream of the server.
  • Reached the UDR read-only via the PST-SERVER LAN jump (SSH key pushed to the server temporarily, used, then deleted) because the UDR's WAN SSH was unreachable and RMM was the only live channel to the LAN.
  • Did not edit the UDR firewall over SSH; the UniFi controller re-pushes config, so iptables edits would not persist. The fix was directed to the controller UI.
  • Left the controller change to Mike (per his choice) rather than driving the UniFi browser, then verified via RMM.
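
The elimination logic behind the first two decisions can be sketched as a small classifier. This is an illustration only, not a tool used in the session: the names DialObservation and locate_fault and the returned labels are hypothetical, and the only inputs are the two signals this log relies on (the client's RAS error code and the count of IKE/IPsec audit events the server logged during the dial).

```python
from dataclasses import dataclass
from typing import Optional

# RAS error codes as they are used in this log.
IPSEC_NEGOTIATION_FAILED = 789  # fails before user authentication
AUTH_REJECTED = 691             # IPsec link is up, credentials/principal rejected


@dataclass
class DialObservation:
    """Hypothetical record of one test dial (not a GuruRMM structure)."""
    client_error: Optional[int]  # RAS error code, or None if the dial connected
    server_ike_events: int       # IKE/IPsec audit events seen on the server


def locate_fault(obs: DialObservation) -> str:
    """Return a coarse label for where the failure most likely sits."""
    if obs.client_error is None:
        return "none"
    if obs.client_error == IPSEC_NEGOTIATION_FAILED:
        # Zero audit events means the packets never reached the server,
        # so the problem is upstream (edge gateway / port-forward).
        if obs.server_ike_events == 0:
            return "edge"
        # The server saw the negotiation, so suspect PSK or policy mismatch.
        return "server-ipsec"
    if obs.client_error == AUTH_REJECTED:
        # Per-user layer; at this site also the SYSTEM-rasdial artifact.
        return "auth"
    return "unknown"
```

With the values observed during the outage (error 789, zero server-side events) this returns "edge", which is the conclusion the session reached.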

Problems Encountered

  • Two GuruRMM agents matched "BridgettePSHomeComputer"; the wiki UUID (074141d7…) was stale/offline. The machine had re-enrolled, and the live agent is 01160fc8… (v0.6.49). Resolved by always looking up the UUID from the hostname live rather than trusting the wiki entry.
  • Remote command sent to the UDR over PowerShell→ssh→bash lost its quoting and the remote shell choked on parentheses/pipes. Resolved by base64-encoding the remote script and running echo <b64> | base64 -d | sh.
  • A transient "/mingw64/bin/curl: Permission denied" during one dispatch. Resolved by writing the JSON payload to a file and using --data-binary @file.
  • Mara's verification dial returned 691; identified as the known SYSTEM-rasdial wrong-principal artifact, not a real failure (IPsec link established successfully).
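
The base64 workaround for the quoting problem above can be sketched as follows. This is a minimal illustration, assuming the remote side has base64 and sh available (as the UDR did in this session); the helper name wrap_for_remote_sh is hypothetical and was not part of the session tooling.

```python
import base64


def wrap_for_remote_sh(script: str) -> str:
    """Build a one-liner that decodes and runs `script` on the remote host.

    The base64 payload uses only A-Z, a-z, 0-9, '+', '/' and '=', so it
    carries no quotes, parentheses, pipes, or spaces for the intermediate
    shells (PowerShell, ssh, bash) to mangle.
    """
    payload = base64.b64encode(script.encode("utf-8")).decode("ascii")
    return f"echo {payload} | base64 -d | sh"
```

The returned string still contains two pipes of its own, so it should be handed to ssh as a single quoted argument; only the script body is protected by the encoding.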

Configuration Changes

  • UniFi controller (UDR Ultra, 192.168.0.10): Re-added Port Forward UDP 500 + 4500 → 192.168.0.2 (done by Mike in UI). This is the fix.
  • PST-SERVER: IPsec auditing (Main/Extended/Quick Mode) temporarily enabled for the live-dial test, then restored to "No Auditing". No persistent change.
  • No repo code changes. Session log + wiki article only.

Credentials & Secrets

  • No new credentials created. Existing vault entries referenced:
    • clients/peaceful-spirit/server.sops.yaml → credentials.ucg (UDR SSH key ~/.ssh/pst-cc-ucg, ssh_password, vpn_psk).
    • clients/peaceful-spirit/vpn.sops.yaml → L2TP PSK z5zkNBds2V9eIkdey09Zm6Khil3DAZs8 (matches server).
  • Vault drift noted (not yet fixed): vpn.sops.yaml lists pst-admin password 24Hearts$, but the wiki records a reset to SpiritWalk26! on 2026-05-22. Needs reconciliation.

Infrastructure & Servers

  • PST-SERVER — 192.168.0.2, Windows Server 2016 Essentials, PEACEFULSPIRIT.local DC, RRAS L2TP/IPsec endpoint. Public IP 98.190.129.150. RMM agent 87293069-33b6-45e8-a68f-6811216cdb96 (v0.6.52).
  • UCG-PST-CC — UDR Ultra, LAN 192.168.0.10 (default gateway), WAN 98.190.129.150. UniFi OS 5.1.15, kernel 5.4.213-ui-ipq5322 (aarch64). Rebooted 2026-06-04 03:59.
  • BridgettePSHomeComputer — RMM agent 01160fc8-4c2e-4e47-a591-e4e0f9ba5ea7 (v0.6.49). Connects as PEACEFULSPIRIT\BridgetteSH (SSO) via logon task "Connect Peaceful Spirit VPN". Got VPN IP 192.168.0.242 after fix.
  • MaraHomeNew — RMM agent e9645594-6d7c-4c97-8cb4-920cb5d06c8e (v0.6.52). Connects as pst-admin via AllUserConnection cmdkey path.
  • VPN: L2TP/IPsec, MSCHAPv2 + PSK, pool 192.168.0.240+, DNS 192.168.0.2.

Commands & Outputs

  • Client error: every dial failed with error code 789 (RasClient event 20227); one earlier termination logged 832.
  • Server during live dial w/ auditing on: NO IKE/IPsec security events in last 5 min -> negotiation packets not reaching server.
  • UDR last reboot: reboot ... Thu Jun 4 03:59 still running (prior boot May 22).
  • UDR post-fix NAT: -A UBIOS_PREROUTING_USER_HOOK -p udp ... --dport 500 ... -j DNAT --to-destination 192.168.0.2:500 and ...4500 ... -j DNAT --to-destination 192.168.0.2:4500.
  • Bridgette post-fix: ConnectionStatus: Connected, IPAddress: 192.168.0.242, event 20224 "link ... established by user PEACEFULSPIRIT\BridgetteSH".
  • Mara post-fix: event 20224 link established; rasdial as SYSTEM → 691 (expected wrong-principal artifact).
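
The post-fix NAT check could be automated along these lines. This is a sketch under assumptions: it expects rule text in iptables -S style, the function names are hypothetical, and the sample rules in the usage note are illustrative rather than captured output (the log elides parts of the real rules).

```python
import re
from typing import Iterable, Set

# Matches a UDP DNAT rule and captures the matched port and the target.
DNAT_RE = re.compile(
    r"-p\s+udp\b.*--dport\s+(?P<dport>\d+)\b.*"
    r"-j\s+DNAT\s+--to-destination\s+(?P<dest>[\d.]+):(?P<to_port>\d+)"
)


def forwarded_udp_ports(rules_text: str, target_ip: str) -> Set[int]:
    """Return UDP ports that are DNATed to the same port on target_ip."""
    ports = set()
    for line in rules_text.splitlines():
        m = DNAT_RE.search(line)
        if m and m.group("dest") == target_ip \
                and m.group("dport") == m.group("to_port"):
            ports.add(int(m.group("dport")))
    return ports


def missing_vpn_forwards(
    rules_text: str,
    target_ip: str = "192.168.0.2",
    required: Iterable[int] = (500, 4500),
) -> Set[int]:
    """Return the required IKE/NAT-T ports that have no DNAT rule."""
    return set(required) - forwarded_udp_ports(rules_text, target_ip)
```

Fed a ruleset that contains both forwards, missing_vpn_forwards returns an empty set; fed the ruleset as it stood during the outage (no VPN DNAT rules), it would return both 500 and 4500. The same check could serve the reboot-persistence test listed under pending tasks.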

Pending / Incomplete Tasks

  • Update the original Syncro ticket with the resolution and 1 hr of warranty labor (in progress this session).
  • Reboot-persistence test: confirm the re-added port-forward survives a deliberate UDR reboot — a port-forward vanishing on reboot is abnormal (possible firmware bug or uncommitted rule).
  • DDNS for client profiles: clients hardcode 98.190.129.150; a DDNS hostname would future-proof against a Cox WAN-IP change.
  • Vault reconcile: fix pst-admin password drift in vpn.sops.yaml (24Hearts$ vs SpiritWalk26!).
  • Wiki: record re-enrolled BridgettePSHomeComputer agent UUID and the "UDR reboot drops VPN port-forward" known issue.
  • Optional real-world confirmation of Mara's auto-connect when a user is at the machine.

Reference Information

  • Syncro customer: Peaceful Spirit Massage, ID 278525.
  • GuruRMM API: http://172.16.3.30:3001.
  • UDR read-only access path: SSH key ~/.ssh/pst-cc-ucg via PST-SERVER LAN jump to root@192.168.0.10.
  • #dev-alerts messages: outage root cause 1512129381093474324; resolution 1512133532221444348.