claudetools/clients/ucryo/session-logs/2026-06-02-session.md
Mike Swanson e0643310a0 sync: auto-sync from GURU-5070 at 2026-06-02 19:53:08
Author: Mike Swanson
Machine: GURU-5070
Timestamp: 2026-06-02 19:53:08
2026-06-02 19:53:12 -07:00

# Universal Cryogenics (UCRYO) — Session 2026-06-02
## User
- **User:** Mike Swanson (mike)
- **Machine:** GURU-5070
- **Role:** admin
## Session Summary
Onboarded a new client, Universal Cryogenics (shortname UCRYO), into GuruRMM with a single site "Main" (site_code LIGHT-WOLF-2305), vaulting the one-time agent enrollment key. Over the session eight Windows agents enrolled under the site: the domain controller UC2-SERVER, the Hyper-V/Veeam backup host WIN-709JUVCJ2DQ, and six workstations (DESKTOP-PMML1JC, KIRBY, gromit, hobbes, hoborg, lilo).
Investigated reported "remnants of a previous cryptolocker infection" on UC2-SERVER. Read-only recon identified a December 2019 TrickBot infection: a hidden SYSTEM scheduled task "System Health Application" (boot + every 12 min) pointing at a launcher EXE that was already gone, plus the TrickBot module/config folder under the SYSTEM profile. The task had been failing every run with 0x80070002 (FILE_NOT_FOUND). Quarantined the module folder, deleted the task, removed the folder, and verified. Swept the second server clean. Flagged the real outstanding risk: TrickBot ran pwgrab64 (credential theft) on a domain controller in 2019, so domain credentials/KRBTGT were exposed then — confirmation of a post-incident reset is the open item. Confirmed no free Ryuk decryptor exists or is forthcoming. A reported "crypto" folder of held encrypted data could not be located on either server; the user concluded it was misremembered.
Ran the onboarding health/security diagnostic across all eight boxes. A first parallel run had 7 of 8 agents return "interrupted" (agent restarted mid-probe under concurrent load); a gentler sequential re-run completed all eight. All graded RED (typical SMB fleet: missing BitLocker, EOL OS builds, pending patches, RDP enabled). Required a one-line change to the diagnostic runner to make the per-probe exec timeout overridable.
Filed a GuruRMM bug (#39) for the agent spawning duplicate system-tray icons (5 gururmm-tray.exe processes on GURU-5070, no single-instance guard). Diagnosed and fixed a Backblaze-bound backup failure on UC2-SERVER's MSP360 plan: the agent was failing TLS to Backblaze because the 64-bit .NET TLS keys were unset on Server 2012 R2; added the keys, restarted services, and confirmed uploads resumed. Established via a controlled comparison (Seth-PC on Win11 with identical missing keys but zero TLS errors) that the issue is legacy-OS-specific, so did not mass-apply the fix to modern boxes. Traced the mspbackups console "disagreement" to a combination of a stalled session never reporting a terminal result and an outdated agent degrading dashboard status reporting. Finally, produced SPEC-024 for a ScreenConnect auto-deploy GuruRMM module and committed it.
## Key Decisions
- **Client slug `ucryo`, client code `UCRYO`.** Used the user-provided shortname as the GuruRMM client `code` and lowercase as the vault slug, matching existing per-client vault conventions.
- **Read-before-write on the DC.** All TrickBot investigation was read-only; cleanup (quarantine + task delete + folder removal) was gated on explicit user confirmation given UC2-SERVER is a domain controller.
- **Quarantine-then-remove** rather than outright delete, preserving the TrickBot modules at C:\Quarantine\syshealth-trickbot-20260602-170235 for IR record.
- **Sequential diagnostic re-run** after the parallel run caused agent interruptions — isolated the cause as concurrent-load contention (not an agent-stability bug), since the gentle pass completed cleanly.
- **Did NOT mass-apply the .NET TLS fix** to the 9 RMM-reachable MSP360 boxes. The sweep proved they are all modern OS (2016/2019/2022/Win10) where .NET already negotiates TLS 1.2 by default; the missing keys are benign there. Restarting backup services on healthy production servers across multiple clients was not justified.
- **TLS root cause is legacy-OS-specific.** Confirmed by controlled comparison: Seth-PC (Win11) has the identical missing keys but 0 secure-channel errors, vs UC2-SERVER (2012 R2) which had many. The fix is only needed on 2012 R2 / Win7-8 era boxes.
- **Session log placed under `clients/ucryo/`** (primary subject = UCRYO onboarding/infra). GuruRMM bug #39 and SPEC-024 are GuruRMM-scoped cross-references; the fleet-wide MSP360 TLS/agent-version findings are noted but are not UCRYO-specific.
- **ScreenConnect spec modeled on the existing MSPBackups integration** pattern, with the labeled installer URL built server-side (labels = ScreenConnect c0..c7 custom properties applied at download time).
## Problems Encountered
- **PowerShell parser error** (`An empty pipe element is not allowed`) from piping a `foreach(){}` statement directly into `Sort-Object`/`Format-Table`. Aborted whole probes silently (empty stdout). Fixed by collecting into a variable first, then piping.
- **Empty Defender section** on the recon — expected: Server 2012 R2 does not ship the Defender AV PowerShell cmdlets.
- **Diagnostic probe timeout (240s)** on UC2-SERVER (slow 2012 R2, installed-software enumeration). Made the runner's exec timeout overridable via `DIAG_EXEC_TIMEOUT` env var (default unchanged at 240) and used 480s for servers.
- **7/8 diagnostic agents "interrupted"** on the parallel run (agent restarted mid-probe under load). Resolved by re-running sequentially — all completed.
- **MSP360 monitoring API field/enum guessing.** Initial jq used wrong field names (Result/LastBackup null); correct fields are Status/ErrorMessage/FilesCopied/BuildVersion etc. Calibrated the Status enum empirically across 66 records.
- **Coord todos POST schema mismatch** — endpoint requires `text`, `created_by_user`, `created_by_machine` (not title/description); todo creation returned null and was not reliably persisted. Follow-up captured in this log instead.
- **Over-generalized the TLS hypothesis** to the Tucson Coin Win11 boxes from the shared "Status 3 stuck" symptom; corrected after the user pointed out they are Win11 and endpoint evidence showed 0 secure-channel errors. The stuck-Status-3 signature is not TLS-specific.
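The empirical Status calibration mentioned above can be sketched as a simple tally over monitoring records. This is a minimal illustration, not the jq that was actually run: the `Status` field name and the labels come from this log, while `tally_status` and the sample records are hypothetical.

```python
from collections import Counter

# Labels follow the empirical mapping recorded in this log (Commands & Outputs).
# They are observations from the session, not a documented MSP360 enum.
STATUS_LABELS = {
    0: "completed/idle",
    1: "Success",
    2: "Warning",
    3: "Running (in-progress)",
    4: "Scheduled/never-run",
    7: "completed-with-errors",
}


def tally_status(records):
    """Count records per Status value and attach a label ("unknown" if unmapped)."""
    counts = Counter(rec.get("Status") for rec in records)
    return {code: (STATUS_LABELS.get(code, "unknown"), n) for code, n in counts.items()}


# Hypothetical sample records; real ones would come from GET /api/Monitoring.
sample = [
    {"Status": 1, "ErrorMessage": None},
    {"Status": 3, "ErrorMessage": None},
    {"Status": 3, "ErrorMessage": None},
    {"Status": 9, "ErrorMessage": "example only"},
]
print(tally_status(sample))
```

Surfacing unmapped codes as "unknown" rather than dropping them is what makes a tally like this useful for calibration: any value not yet seen stands out immediately.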
## Configuration Changes
**Created:**
- `clients/ucryo/gururmm-site-main.sops.yaml` (vault repo) — UCRYO Main site GuruRMM enrollment key (SOPS-encrypted).
- `clients/ucryo/onboarding-baselines/*.{json,md}` — 8 immutable diagnostic baselines (UC2-SERVER, WIN-709JUVCJ2DQ, DESKTOP-PMML1JC, KIRBY, gromit, hobbes, hoborg, lilo), timestamped 20260603T00xxxx UTC.
- `projects/msp-tools/guru-rmm/docs/specs/SPEC-024-screenconnect-auto-deploy.md` — ScreenConnect auto-deploy module spec (committed gururmm 1e24b71).
**Modified:**
- `.claude/scripts/run-onboarding-diagnostic.sh` — added `EXEC_TIMEOUT="${DIAG_EXEC_TIMEOUT:-240}"` and used it for the probe-exec dispatch (was hardcoded 240).
- `projects/msp-tools/guru-rmm/docs/FEATURE_ROADMAP.md` — added Integration Features → "Remote Access Tools (Auto-Deploy)" subsection linking SPEC-024.
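For readers less familiar with the shell default expansion used in the runner change, the same override semantics can be restated in Python. This is only an illustrative sketch: the runner itself is a shell script, and `exec_timeout` and `run_probe` are hypothetical names, not part of it.

```python
import os
import subprocess


def exec_timeout(env=os.environ, default=240):
    """Mirror ${DIAG_EXEC_TIMEOUT:-240}: use the default when unset OR empty."""
    raw = env.get("DIAG_EXEC_TIMEOUT") or str(default)
    return int(raw)


def run_probe(cmd, env=os.environ):
    """Run one probe command; return its stdout, or None if it times out."""
    try:
        done = subprocess.run(
            cmd, capture_output=True, text=True, timeout=exec_timeout(env)
        )
        return done.stdout
    except subprocess.TimeoutExpired:
        return None
```

With nothing set, the timeout stays at 240 seconds. Setting `DIAG_EXEC_TIMEOUT=480` for a server run raises it without touching the default for other callers.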
**On endpoint UC2-SERVER (Server 2012 R2):**
- Added DWORD `SchUseStrongCrypto=1` and `SystemDefaultTlsVersions=1` to BOTH `HKLM\SOFTWARE\Microsoft\.NETFramework\v4.0.30319` and `HKLM\SOFTWARE\WOW6432Node\Microsoft\.NETFramework\v4.0.30319`.
- Restarted services "Online Backup Service" and "Online Backup Service Remote Management".
- Deleted scheduled task "System Health Application"; removed `C:\Windows\system32\config\systemprofile\AppData\Roaming\syshealth\`; quarantine copy at `C:\Quarantine\syshealth-trickbot-20260602-170235\`.
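To make the TLS change repeatable on another legacy box, the two registry values listed above could be rendered as a `.reg` file. The log does not record how the keys were actually added, so this is one possible approach, and `build_reg_file` is a hypothetical helper.

```python
# Key paths and value names are the ones recorded in this log.
KEY_PATHS = [
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework\v4.0.30319",
    r"HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Microsoft\.NETFramework\v4.0.30319",
]
VALUES = {"SchUseStrongCrypto": 1, "SystemDefaultTlsVersions": 1}


def build_reg_file(paths=KEY_PATHS, values=VALUES):
    """Render the DWORD values under every key path in .reg text format."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for path in paths:
        lines.append(f"[{path}]")
        for name, data in values.items():
            # DWORDs are written as eight lowercase hex digits.
            lines.append(f'"{name}"=dword:{data:08x}')
        lines.append("")
    return "\r\n".join(lines)


print(build_reg_file())
```

The sketch only produces text. Writing it to disk, importing it, and restarting the Online Backup services remain separate, deliberate steps, and should only be applied to 2012 R2 / Win7-8 era machines per the decision recorded above.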
**GitHub/Gitea:**
- gururmm#39 — bug: duplicate system-tray icons (no single-instance guard).
## Credentials & Secrets
- **UCRYO GuruRMM enrollment key** — vaulted at `clients/ucryo/gururmm-site-main.sops.yaml` (fields: client_id, site_id, site_code, api_key, installer_url, msi_url).
- **MSP360 Managed Backup Service API** — vault `msp-tools/msp360-api.sops.yaml`. Base URL `https://api.mspbackups.com`; login `kY9PvDdWki` (password vaulted). Auth: `POST /api/Provider/Login` (body `{"UserName","Password"}`) → `access_token`; then `GET /api/Monitoring` with Bearer token.
- **GuruRMM admin API** — vault `infrastructure/gururmm-server.sops.yaml` (credentials.gururmm-api.admin-email / admin-password). Base `http://172.16.3.30:3001`.
- **ScreenConnect instance (ACG)** — relay host `instance-kgc7jt-relay.screenconnect.com`, port 443, instance GUID `s=9f3db089-eb29-441d-a9d2-2c441bde8c78` (observed in UC2-SERVER client launch string; public key `k` also in that string). Not high-sensitivity but record for SPEC-024 implementation.
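The MSP360 auth flow noted above (login, then a Bearer-authenticated monitoring call) can be sketched with the standard library. The endpoints, body field names, and `access_token` key come from this log; the JSON `Content-Type` header, the timeouts, and the function names are assumptions.

```python
import json
import urllib.request

BASE_URL = "https://api.mspbackups.com"


def build_login_request(username, password):
    """POST /api/Provider/Login with a {"UserName", "Password"} JSON body."""
    body = json.dumps({"UserName": username, "Password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/Provider/Login",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed, not confirmed by the log
        method="POST",
    )


def build_monitoring_request(access_token):
    """GET /api/Monitoring with the Bearer token returned by the login call."""
    return urllib.request.Request(
        f"{BASE_URL}/api/Monitoring",
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


def fetch_monitoring(username, password):
    """Run both calls over the network; credentials should come from the vault."""
    with urllib.request.urlopen(build_login_request(username, password), timeout=30) as resp:
        token = json.load(resp)["access_token"]
    with urllib.request.urlopen(build_monitoring_request(token), timeout=30) as resp:
        return json.load(resp)
```

Keeping request construction separate from the network call makes the shape of each request easy to inspect without hitting the live API.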
## Infrastructure & Servers
**Universal Cryogenics — domain `ucryo.local`**
- **UC2-SERVER** — Windows Server 2012 R2 Essentials (build 9600), domain controller (AD DS, DNS, DHCP, WSUS, AD CS installed). Drives C: (500GB) and E: (931GB, shares: OFFICE DOCS, Projects, QB2020, UCDATA, x-files; Offsite Archive). MSP360 plan "Ucryo Files" (user richard@ucryo.com). RMM agent id `64cff183-429c-44bf-aebd-55386417a494`.
- **WIN-709JUVCJ2DQ** — Windows Server 2012 R2 Essentials, Hyper-V + Veeam backup host (VBRCatalog, Veeam-Scripts). Drives C:/E: Hyper-V/V-Hard-Disks / F: Hyper-Data-Disks / M: 4.7TB MWF-Backup. RMM agent id `b7311d8a-6c5e-4aa5-9abf-79212d344009`. UC2-SERVER is likely a guest VM on this host.
- Workstations: DESKTOP-PMML1JC, KIRBY (Win10 Pro 19045 laptop), gromit, hobbes, hoborg, lilo — all GuruRMM v0.6.54.
- Management stack present (legit): Syncro, ScreenConnect, Splashtop, ACG Online Backup (MSP360), GuruRMM.
**GuruRMM site:** client_id `f954f150-3605-4ef7-82e7-6b942883cb00`, site Main, site_id `345e59d2-ca30-4b9c-b703-c19915b47753`, site_code **LIGHT-WOLF-2305**.
**Other (fleet/cross-client):**
- Seth-PC — Windows 11 Home (build 26200), client "Tucson Coin and Autograph". RMM agent id `4267e35a-cd14-424d-ab82-3da4f9baa0dc`. MSP360 build 8.6.0.290.
- MSP360 fleet: 47 computers; newest deployed build 8.6.0.290 (34 boxes, still flagged outdated by console); oldest 4.4.2.221 (2 boxes).
## Commands & Outputs
- TrickBot task: `schtasks /query /tn "System Health Application" /xml` → hidden, RunLevel HighestAvailable, UserId SYSTEM, BootTrigger + 12-min repetition; Last Result `-2147024894` (0x80070002 FILE_NOT_FOUND).
- TrickBot modules confirmed: `injectDll64`, `pwgrab64`, `psfin64`, `importDll64`, `tabDll64`, `mwormDll64`, `mshareDll64`, `networkDll64`, `NewBCtestnDll64` + `dinj`/`dpost`/`sinj` configs + `settings.ini` under `...systemprofile\AppData\Roaming\syshealth\`.
- Backup failure (UC2 plan log `5a44fc46-...log`): `LightWebException: The request was aborted: Could not create SSL/TLS secure channel.` against `api001.backblazeb2.com`. First secure-channel error 2025-10-15; intermittent through May; hard-failing 2026-06-02.
- Post-fix verify: `cbb plan -r "Ucryo Files"` → "Plan is started"; `secure-channel errors in last 5 min: 0`; `Scanned 474.9 GB ... Uploaded 2.15 GB`.
- MSP360 Status enum (empirical): 0=completed/idle, 1=Success, 2=Warning, 3=Running(in-progress), 4=Scheduled/never-run, 7=completed-with-errors. Counters (FilesCopied/DataCopied/Duration) populate only at session completion, not during a run.
- Tray bug evidence (GURU-5070): 5 × `gururmm-tray.exe` PIDs (26224, 11424, 14524, 15928, 4076) with distinct StartTimes spanning 2 days; 2 × `gururmm-agent.exe` (expected: agent + watchdog).
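The scheduled task's Last Result appears as a signed decimal (`-2147024894`), which is easy to misread. A short sketch of how that value maps to `0x80070002` and its component fields, using the standard HRESULT bit layout; `decode_hresult` is a hypothetical helper name.

```python
def decode_hresult(signed_value):
    """Split a signed 32-bit task result into its HRESULT fields."""
    unsigned = signed_value & 0xFFFFFFFF  # reinterpret as unsigned 32-bit
    return {
        "hex": f"0x{unsigned:08X}",
        "failure": bool(unsigned >> 31),       # severity bit (bit 31)
        "facility": (unsigned >> 16) & 0x7FF,  # 11-bit facility; 7 = FACILITY_WIN32
        "code": unsigned & 0xFFFF,             # low 16 bits; 2 = ERROR_FILE_NOT_FOUND
    }


print(decode_hresult(-2147024894))
```

Facility 7 with code 2 is the Win32 "file not found" error wrapped as an HRESULT, consistent with the task pointing at a launcher EXE that no longer existed.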
## Pending / Incomplete Tasks
- **UCRYO 2019 incident — confirm domain credential / KRBTGT reset.** TrickBot pwgrab64 ran on the DC in 2019; verify with client/records whether a full post-incident reset was done. If not, this is the primary residual risk.
- **AD2** (ACG internal) TLS key check is queued — agent was offline; re-check when it reconnects. It is the only RMM-reachable box that might be legacy OS.
- **Tucson Coin agent update** — Seth-PC + DESKTOP-P36LUUN: update the outdated MSP360 agent (clears the grey dashboard indicator). Do it AFTER the current first-full completes (avoid restarting the ~20GB upload). Now that Seth-PC is RMM-enabled it can be driven via RMM.
- **Fleet MSP360 agent-update pass** — 47 boxes lagging; prioritize the 4.4.2.221 / 7.8.x / 7.9.x stragglers. Worklist (client+host+build) can be pulled from the MSP360 API.
- **GuruRMM bug #39** (tray icons) — awaiting triage/fix; repo has zero labels (offered to create a `bug` label).
- **SPEC-024 open questions** — instance GUID per-node?, slot-name auto-fetch?, per-OS existing-client detection strings, force_relabel semantics, Linux installer variant, which fields fill remaining c-slots (no tags model in GuruRMM yet).
- **All 8 UCRYO boxes graded RED** — remediation backlog: BitLocker (KIRBY laptop unencrypted), Win10 22H2 EOL, pending patches, RDP exposure review.
## Reference Information
- GuruRMM API: `http://172.16.3.30:3001` · Coord API: `http://172.16.3.30:8001/api/coord`
- UCRYO installer page: `https://rmm.azcomputerguru.com/install/LIGHT-WOLF-2305` · MSI: `https://rmm.azcomputerguru.com/api/sites/345e59d2-ca30-4b9c-b703-c19915b47753/installer`
- MSP360 API: `https://api.mspbackups.com` (`/api/Provider/Login`, `/api/Monitoring`)
- UC2-SERVER MSP360 plan id: `5a44fc46-ca94-4095-a645-889eaf754389` ("Ucryo Files", richard@ucryo.com)
- gururmm#39: `https://git.azcomputerguru.com/azcomputerguru/gururmm/issues/39`
- SPEC-024: `projects/msp-tools/guru-rmm/docs/specs/SPEC-024-screenconnect-auto-deploy.md` (gururmm commit `1e24b71`)
- ScreenConnect ClientSetup build URL form: `https://<instance>.screenconnect.com/Bin/ScreenConnect.ClientSetup.msi?e=Access&y=Guest&c=<c0>..&c=<c7>` (c0..c7 = 8 custom org properties, applied at download time)
- TLS fix (legacy Windows + Backblaze): set `SchUseStrongCrypto=1` + `SystemDefaultTlsVersions=1` (DWORD) under both `.NETFramework\v4.0.30319` and `WOW6432Node\...\v4.0.30319`, restart Online Backup services. Only needed on 2012 R2 / Win7-8; modern OS unaffected.
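The ScreenConnect ClientSetup URL form listed above lends itself to a small server-side builder, in the spirit of SPEC-024. This is a sketch under assumptions: the instance name `example` and the label values are placeholders, padding unused slots with empty values is a guess at reasonable behavior, and `build_client_setup_url` is not taken from the spec.

```python
from urllib.parse import quote, urlencode

MAX_SLOTS = 8  # c0..c7 custom properties


def build_client_setup_url(instance, labels):
    """Build the labeled MSI download URL for up to eight custom properties."""
    if len(labels) > MAX_SLOTS:
        raise ValueError("only c0..c7 (8 slots) are available")
    # Assumption: unused slots are sent as empty values to keep positions fixed.
    padded = list(labels) + [""] * (MAX_SLOTS - len(labels))
    query = [("e", "Access"), ("y", "Guest")] + [("c", value) for value in padded]
    return (
        f"https://{instance}.screenconnect.com/Bin/ScreenConnect.ClientSetup.msi?"
        + urlencode(query, quote_via=quote)
    )


# Placeholder instance and labels, for illustration only.
print(build_client_setup_url("example", ["Universal Cryogenics", "Main"]))
```

Which fields should fill the remaining c-slots is still an open SPEC-024 question, so the label list here is deliberately left to the caller.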
---
## Update: 19:52 PT — Fleet-wide MSP360 outdated-agent worklist
Pulled the full outdated MSP360 (ACG Online Backup) agent worklist across all clients via the MBS API (`api.mspbackups.com` `/api/Monitoring`), deduped per computer, attached client names, sorted oldest-first, and cross-referenced GuruRMM reachability. (Fleet-wide MSP-tools inventory; recorded here as a continuation of the UCRYO backup investigation that surfaced it. Closes the "Fleet MSP360 agent-update pass" pending item from above by producing the actual list.)
**Latest available build: `8.6.0.338`** (one box already on it). Everything below is outdated: **46 agents across 29 clients.**
Build distribution (47 distinct computers): `4.4.2.221`×2, `7.9.7.69`×1, `8.0.0.269`×2, `8.1.0.619`×1, `8.1.2.172`×2, `8.1.3.72`×2, `8.1.4.97`×2, `8.2.0.122`×5, `8.6.0.290`×29 (one bump behind), `8.6.0.338`×1 (current).
**Priority tiers:**
- **Ancient (4.x/7.x), do first, none RMM-reachable:** `Julies-Mini-2` (LaHC, 4.4.2.221), `pbx.intranet.dataforth.com` (Dataforth, 4.4.2.221), `DesertRVServer` (Desert RV, 7.9.7.69).
- **Behind (8.0 to 8.2), 13 boxes,** only `LAB-SVR` (Len's Auto, 8.2.0.122) is RMM-reachable. Others include Dataforth (SAGE-SQL, DF-HYPERV-B), Saguaro Conveyor ×3, Glaztech (GTI-INV-VMHOST), Martell, Tedards, Russo, Jimmy Co, Len's Auto (DESKTOP-BMBTQLI), Tucson Safety & Medical.
- **One bump behind (8.6.0.290), 29 boxes,** low urgency.
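The tiering above can be sketched as a numeric comparison on parsed build strings, which also gives a correct oldest-first sort (plain string sorting would misorder multi-digit components). The tier boundaries follow this log; `parse_build` and `tier` are hypothetical names, and treating every 8.3+ build below the latest as "one bump behind" is a simplification that happens to hold for the builds listed here.

```python
LATEST = (8, 6, 0, 338)  # latest available build per this log


def parse_build(build):
    """Turn '8.6.0.290' into (8, 6, 0, 290) so builds compare numerically."""
    return tuple(int(part) for part in build.split("."))


def tier(build):
    """Classify a build string into the priority tiers used in this worklist."""
    version = parse_build(build)
    if version >= LATEST:
        return "current"
    if version[0] < 8:
        return "ancient"
    if version[:2] <= (8, 2):
        return "behind"
    return "one bump behind"


builds = ["8.6.0.290", "4.4.2.221", "8.2.0.122", "7.9.7.69", "8.6.0.338"]
for build in sorted(builds, key=parse_build):  # oldest first
    print(build, tier(build))
```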
**RMM-reachable & outdated (can push/verify via GuruRMM) — 10:** `LAB-SVR` (Len's Auto, 8.2.0.122), `AD2` (Dataforth), `GND-SERVER` (Grabb & Durando), `HSM-NewServer` (Horseshoe Mgmt), `IMC1` (Instrumental Music), `LAB-Becky` (Len's Auto), `NEPTUNE` (ACG-Internal), `PST-SERVER` (Peaceful Spirit), `UC2-SERVER` (UCRYO), `rednourcarrievirt` (Rednour Law) — all 8.6.0.290 except LAB-SVR.
**Worst-hit clients:** Dataforth (7), then Saguaro Conveyor / Len's Auto / Glaztech / Desert RV (3 each).
**Recommendation:** bulk-update from the MSP360 console (reaches all 46, including the 36 not in RMM — the ancient 4.x/7.x boxes can *only* be updated that way); optionally trial the RMM-driven path on a low-risk reachable box (NEPTUNE, ACG-internal) first. Updating mostly fixes the grey-dashboard reporting glitch; not an emergency except the 3 ancient boxes. No changes were made — worklist only.
(Also: answered a capability question — no native text-to-image generation available in this environment; can produce SVG / HTML-CSS / matplotlib / Mermaid / Graphviz / ASCII instead. No deliverable.)