# Peaceful Spirit — Session Log 2026-06-04

## User
- **User:** Mike Swanson (mike)
- **Machine:** GURU-5070
- **Role:** admin

## Session Summary

Investigated a report that Bridgette's home machine (BridgettePSHomeComputer) was throwing VPN errors. The problem turned out to be neither Bridgette-specific nor account-specific but a site-wide VPN outage: every Peaceful Spirit L2TP/IPsec client was failing at the IPsec negotiation layer (Windows RAS error 789), which occurs before user authentication. MaraHomeNew failed identically while connecting as a different user (pst-admin), confirming the fault sat below the per-user layer.

Diagnosis proceeded by elimination via GuruRMM remote commands. The RRAS endpoint on PST-SERVER was confirmed healthy: 30-day uptime (no overnight reboot), required services running (RemoteAccess, IKEEXT, PolicyAgent, RasMan), IKE listening on UDP 500/4500, firewall allow-rules present, PSK matching the vault, and the public/egress IP unchanged at 98.190.129.150, the address clients dial. The client-side NAT-T registry key was correct (value 2). IPsec auditing was then temporarily enabled on the server and a live dial triggered from Bridgette's machine; the server logged zero IKE/IPsec security events, showing that the clients' negotiation packets were not reaching the server at all.

Root cause was isolated to the edge gateway. The site router (UDR Ultra, hostname UCG-PST-CC, 192.168.0.10) was reached read-only via an SSH key jump through PST-SERVER, because the device's WAN SSH on :22 had also become unreachable. Its live iptables ruleset contained no DNAT/port-forward for the VPN, and `last reboot` showed the device had rebooted on 2026-06-04 at 03:59, matching the time the outage began. It came back up without the port-forward for UDP 500/4500 to 192.168.0.2, so all inbound IPsec traffic was silently dropped at the edge. The legacy Mongo collections in the UniFi config (portforward/network/firewallrule/routing) all read 0, indicating that this UniFi OS build (5.1.15) stores these settings in a migrated schema; the controller UI was therefore the authoritative place to confirm and restore the rule.

Mike re-added the Port Forward (UDP 500 + 4500 to 192.168.0.2) in the UniFi controller. Verification confirmed the DNAT rules were live in the UDR ruleset, and a live dial test showed Bridgette fully connected (assigned VPN IP 192.168.0.242, RAS event 20224 link established, no 789). Mara's IPsec link also established (no 789); her test returned 691 only because the test `rasdial` ran as SYSTEM, which is the documented wrong-principal artifact for this site and not a real fault. The outage was resolved end-to-end, and the resolution and root cause were posted to #dev-alerts.

## Key Decisions

- Treated the report as a whole-site diagnosis rather than a single-machine fix once MaraHomeNew showed the identical 789, since error 789 is a pre-authentication IPsec failure.
- Used IPsec auditing + a live dial as the decisive test to distinguish "packets not arriving (edge)" from "PSK/policy mismatch (server)" — zero IKE events conclusively pointed upstream of the server.
- Reached the UDR read-only via the PST-SERVER LAN jump (SSH key pushed to the server temporarily, used, then deleted) because the UDR's WAN SSH was unreachable and RMM was the only live channel to the LAN.
- Did not edit the UDR firewall over SSH; the UniFi controller re-pushes config, so iptables edits would not persist. The fix was directed to the controller UI.
- Left the controller change to Mike (per his choice) rather than driving the UniFi browser, then verified via RMM.
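
The triage logic behind the first two decisions can be written down as a small lookup. This is only a sketch of the reasoning recorded in this log, not a general RAS error classifier: the function name, the enum, and the `server_ike_events` parameter (the count of IKE/IPsec security events the server logged during a live dial with auditing enabled) are illustrative assumptions.

```python
from enum import Enum


class FaultLayer(Enum):
    EDGE = "negotiation packets never reached the server (look at the gateway)"
    SERVER_IPSEC = "IKE reached the server but negotiation failed (PSK/policy)"
    USER_AUTH = "IPsec came up; the failure is at user authentication"
    UNKNOWN = "not covered by this sketch"


def classify_vpn_failure(ras_error: int, server_ike_events: int) -> FaultLayer:
    """Map a client RAS error plus server-side audit evidence to a fault layer.

    Assumes IPsec auditing was enabled on the server while the client dialed,
    so that a count of zero really means "nothing arrived".
    """
    if ras_error == 789:
        # 789 is a pre-authentication IPsec failure, so the user account is
        # irrelevant; the server's audit log says which side of the edge broke.
        if server_ike_events == 0:
            return FaultLayer.EDGE
        return FaultLayer.SERVER_IPSEC
    if ras_error == 691:
        # 691 is an authentication rejection, which can only happen after the
        # IPsec link is already established.
        return FaultLayer.USER_AUTH
    return FaultLayer.UNKNOWN
```

Under this sketch, the pre-fix observation (789 with zero server events) maps to `FaultLayer.EDGE`, and Mara's post-fix 691 maps to `FaultLayer.USER_AUTH`, consistent with the SYSTEM-rasdial artifact noted in this log.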

## Problems Encountered

- Two GuruRMM agents matched "BridgettePSHomeComputer"; the wiki UUID (074141d7…) was stale/offline. The machine re-enrolled — live agent is 01160fc8… (v0.6.49). Resolved by always resolving hostname to UUID live.
- Remote command sent to the UDR over PowerShell→ssh→bash lost its quoting and the remote shell choked on parentheses/pipes. Resolved by base64-encoding the remote script and running `echo <b64> | base64 -d | sh`.
- A transient "/mingw64/bin/curl: Permission denied" during one dispatch. Resolved by writing the JSON payload to a file and using `--data-binary @file`.
- Mara's verification dial returned 691; identified as the known SYSTEM-rasdial wrong-principal artifact, not a real failure (IPsec link established successfully).
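
The base64 workaround for the quoting problem can be sketched as follows. The helper name `wrap_remote_script` is hypothetical; the only part taken from this session is the resulting command shape, `echo <b64> | base64 -d | sh`. The encoded payload uses only letters, digits, `+`, `/`, and `=`, so it contains no quotes, parentheses, pipes, or whitespace for the intermediate shells to mangle.

```python
import base64


def wrap_remote_script(script: str) -> str:
    """Return a one-line command that decodes and runs `script` remotely.

    The script text itself never appears in the command line, so quotes,
    parentheses, and pipes inside it cannot be reinterpreted by any shell
    it passes through on the way to the remote host.
    """
    payload = base64.b64encode(script.encode("utf-8")).decode("ascii")
    return f"echo {payload} | base64 -d | sh"
```

The returned string would then be passed as the remote command to `ssh`; how that is dispatched (RMM payload, PowerShell, and so on) is outside this sketch.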

## Configuration Changes

- **UniFi controller (UDR Ultra, 192.168.0.10):** Re-added Port Forward UDP 500 + 4500 → 192.168.0.2 (done by Mike in UI). This is the fix.
- **PST-SERVER:** IPsec auditing (Main/Extended/Quick Mode) temporarily enabled for the live-dial test, then restored to "No Auditing". No persistent change.
- No repo code changes. Session log + wiki article only.

## Credentials & Secrets

- No new credentials created. Existing vault entries referenced:
- `clients/peaceful-spirit/server.sops.yaml` → `credentials.ucg` (UDR SSH key `~/.ssh/pst-cc-ucg`, ssh_password, vpn_psk).
- `clients/peaceful-spirit/vpn.sops.yaml` → L2TP PSK `z5zkNBds2V9eIkdey09Zm6Khil3DAZs8` (matches server).
- **Vault drift noted (not yet fixed):** `vpn.sops.yaml` lists pst-admin password `24Hearts$`, but the wiki records a reset to `SpiritWalk26!` on 2026-05-22. Needs reconciliation.

## Infrastructure & Servers

- **PST-SERVER** — 192.168.0.2, Windows Server 2016 Essentials, PEACEFULSPIRIT.local DC, RRAS L2TP/IPsec endpoint. Public IP 98.190.129.150. RMM agent 87293069-33b6-45e8-a68f-6811216cdb96 (v0.6.52).
- **UCG-PST-CC** — UDR Ultra, LAN 192.168.0.10 (default gateway), WAN 98.190.129.150. UniFi OS 5.1.15, kernel 5.4.213-ui-ipq5322 (aarch64). Rebooted 2026-06-04 03:59.
- **BridgettePSHomeComputer** — RMM agent 01160fc8-4c2e-4e47-a591-e4e0f9ba5ea7 (v0.6.49). Connects as PEACEFULSPIRIT\BridgetteSH (SSO) via logon task "Connect Peaceful Spirit VPN". Got VPN IP 192.168.0.242 after fix.
- **MaraHomeNew** — RMM agent e9645594-6d7c-4c97-8cb4-920cb5d06c8e (v0.6.52). Connects as pst-admin via AllUserConnection cmdkey path.
- VPN: L2TP/IPsec, MSCHAPv2 + PSK, pool 192.168.0.240+, DNS 192.168.0.2.
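
Because agent UUIDs can go stale after a re-enrollment (as happened with BridgettePSHomeComputer), the hostname should be resolved to a UUID at the time of use. The sketch below assumes agent records are available as dictionaries with `hostname`, `id`, and `online` keys; that record shape and the function name are assumptions for illustration, not the GuruRMM API's documented response format.

```python
def resolve_live_agent(agents: list[dict], hostname: str) -> str:
    """Return the id of the single online agent matching `hostname`.

    Raises LookupError when no online agent matches or when more than one
    does, so a stale or ambiguous record is never used silently.
    """
    wanted = hostname.lower()
    live = [
        agent for agent in agents
        if agent["hostname"].lower() == wanted and agent["online"]
    ]
    if len(live) != 1:
        raise LookupError(
            f"expected exactly one online agent for {hostname!r}, found {len(live)}"
        )
    return live[0]["id"]
```

Failing loudly on zero or multiple online matches is a deliberate choice here: it forces a human decision instead of dispatching commands to whichever record happens to come first.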

## Commands & Outputs

- Client error: `error code returned on failure is 789` (RasClient event 20227) on every dial; one earlier termination 832.
- Server during live dial w/ auditing on: `NO IKE/IPsec security events in last 5 min -> negotiation packets not reaching server`.
- UDR `last reboot`: `reboot ... Thu Jun 4 03:59 still running` (prior boot May 22).
- UDR post-fix NAT: `-A UBIOS_PREROUTING_USER_HOOK -p udp ... --dport 500 ... -j DNAT --to-destination 192.168.0.2:500` and `...4500 ... -j DNAT --to-destination 192.168.0.2:4500`.
- Bridgette post-fix: `ConnectionStatus: Connected`, `IPAddress: 192.168.0.242`, event 20224 "link ... established by user PEACEFULSPIRIT\BridgetteSH".
- Mara post-fix: event 20224 link established; `rasdial` as SYSTEM → 691 (expected wrong-principal artifact).
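
The post-fix NAT check can be automated by scanning a saved ruleset for the two DNAT rules. This is a minimal sketch: it assumes the ruleset text is in `iptables-save` style, one rule per line, and the function names are hypothetical. `SAMPLE_RULES` is a simplified illustration and not the device's actual output; the real rules carry additional match fields that are elided in this log, which the pattern tolerates.

```python
import re

# Matches a UDP DNAT rule and captures (destination port, forward target).
DNAT_RE = re.compile(
    r"-p\s+udp\b.*?--dport\s+(\d+)\b.*?-j\s+DNAT\s+--to-destination\s+(\S+)"
)

# Hypothetical, simplified ruleset for illustration only.
SAMPLE_RULES = """\
-A UBIOS_PREROUTING_USER_HOOK -p udp --dport 500 -j DNAT --to-destination 192.168.0.2:500
-A UBIOS_PREROUTING_USER_HOOK -p udp --dport 4500 -j DNAT --to-destination 192.168.0.2:4500
"""


def forwarded_udp_ports(ruleset: str, target_ip: str) -> set[int]:
    """Return UDP ports that are DNAT-forwarded to the same port on target_ip."""
    ports = set()
    for line in ruleset.splitlines():
        match = DNAT_RE.search(line)
        if match and match.group(2) == f"{target_ip}:{match.group(1)}":
            ports.add(int(match.group(1)))
    return ports


def missing_vpn_forwards(ruleset: str, target_ip: str,
                         required: tuple[int, ...] = (500, 4500)) -> set[int]:
    """Return the required UDP ports that have no DNAT rule to target_ip."""
    return set(required) - forwarded_udp_ports(ruleset, target_ip)
```

An empty result from `missing_vpn_forwards(ruleset, "192.168.0.2")` means both forwards are present; a non-empty result after a reboot would reproduce the condition that caused this outage.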

## Pending / Incomplete Tasks

- **Update original Syncro ticket** with resolution + 1hr warranty labor (in progress this session).
- **Reboot-persistence test:** confirm the re-added port-forward survives a deliberate UDR reboot — a port-forward vanishing on reboot is abnormal (possible firmware bug or uncommitted rule).
- **DDNS for client profiles:** clients hardcode 98.190.129.150; a DDNS hostname would future-proof against a Cox WAN-IP change.
- **Vault reconcile:** fix pst-admin password drift in `vpn.sops.yaml` (24Hearts$ vs SpiritWalk26!).
- **Wiki:** record re-enrolled BridgettePSHomeComputer agent UUID and the "UDR reboot drops VPN port-forward" known issue.
- Optional real-world confirmation of Mara's auto-connect when a user is at the machine.

## Reference Information

- Syncro customer: Peaceful Spirit Massage, ID 278525.
- GuruRMM API: http://172.16.3.30:3001.
- UDR read-only access path: SSH key `~/.ssh/pst-cc-ucg` via PST-SERVER LAN jump to root@192.168.0.10.
- #dev-alerts messages: outage root cause 1512129381093474324; resolution 1512133532221444348.