# Universal Cryogenics (UCRYO) — Session 2026-06-02

## User
- **User:** Mike Swanson (mike)
- **Machine:** GURU-5070
- **Role:** admin

## Session Summary

Onboarded a new client, Universal Cryogenics (shortname UCRYO), into GuruRMM with a single site "Main" (site_code LIGHT-WOLF-2305) and vaulted the one-time agent enrollment key. Over the session, eight Windows agents enrolled under the site: the domain controller UC2-SERVER, the Hyper-V/Veeam backup host WIN-709JUVCJ2DQ, and six workstations (DESKTOP-PMML1JC, KIRBY, gromit, hobbes, hoborg, lilo).

Investigated reported "remnants of a previous cryptolocker infection" on UC2-SERVER. Read-only recon identified a December 2019 TrickBot infection: a hidden SYSTEM scheduled task "System Health Application" (boot + every 12 min) pointing at a launcher EXE that was already gone, plus the TrickBot module/config folder under the SYSTEM profile. The task had been failing every run with 0x80070002 (FILE_NOT_FOUND). Quarantined the module folder, deleted the task, removed the folder, and verified. Swept the second server (WIN-709JUVCJ2DQ); it was clean. Flagged the real outstanding risk: TrickBot ran pwgrab64 (credential theft) on a domain controller in 2019, so domain credentials/KRBTGT were exposed then; confirmation of a post-incident reset is the open item. Confirmed no free Ryuk decryptor exists or is forthcoming. A reported "crypto" folder of held encrypted data could not be located on either server; the user concluded it was misremembered.

Ran the onboarding health/security diagnostic across all eight boxes. A first parallel run had 7 of 8 agents return "interrupted" (agent restarted mid-probe under concurrent load); a gentler sequential re-run completed all eight. All graded RED (typical SMB fleet: missing BitLocker, EOL OS builds, pending patches, RDP enabled). The run required a one-line change to the diagnostic runner to make the per-probe exec timeout overridable.

Filed a GuruRMM bug (#39) for the agent spawning duplicate system-tray icons (5 gururmm-tray.exe processes on GURU-5070, no single-instance guard). Diagnosed and fixed a Backblaze-bound backup failure on UC2-SERVER's MSP360 plan: the agent was failing TLS to Backblaze because the 64-bit .NET TLS keys were unset on Server 2012 R2; added the keys, restarted services, and confirmed uploads resumed. Established via a controlled comparison (Seth-PC on Win11 with identical missing keys but zero TLS errors) that the issue is legacy-OS-specific, so did not mass-apply the fix to modern boxes. Traced the mspbackups console "disagreement" to a combination of a stalled session never reporting a terminal result and an outdated agent degrading dashboard status reporting. Finally, produced SPEC-024 for a ScreenConnect auto-deploy GuruRMM module and committed it.

## Key Decisions

- **Client slug `ucryo`, client code `UCRYO`.** Used the user-provided shortname as the GuruRMM client `code` and lowercase as the vault slug, matching existing per-client vault conventions.
- **Read-before-write on the DC.** All TrickBot investigation was read-only; cleanup (quarantine + task delete + folder removal) was gated on explicit user confirmation given UC2-SERVER is a domain controller.
- **Quarantine-then-remove** rather than outright delete, preserving the TrickBot modules at C:\Quarantine\syshealth-trickbot-20260602-170235 for IR record.
- **Sequential diagnostic re-run** after the parallel run caused agent interruptions. Isolated the cause as concurrent-load contention (not an agent-stability bug), since the gentle pass completed cleanly.
- **Did NOT mass-apply the .NET TLS fix** to the 9 RMM-reachable MSP360 boxes. The sweep proved they are all modern OS (2016/2019/2022/Win10) where .NET already negotiates TLS 1.2 by default; the missing keys are benign there. Restarting backup services on healthy production servers across multiple clients was not justified.
- **TLS root cause is legacy-OS-specific.** Confirmed by controlled comparison: Seth-PC (Win11) has the identical missing keys but 0 secure-channel errors, vs UC2-SERVER (2012 R2) which had many. The fix is only needed on 2012 R2 / Win7-8 era boxes.
- **Session log placed under `clients/ucryo/`** (primary subject = UCRYO onboarding/infra). GuruRMM bug #39 and SPEC-024 are GuruRMM-scoped cross-references; the fleet-wide MSP360 TLS/agent-version findings are noted but are not UCRYO-specific.
- **ScreenConnect spec modeled on the existing MSPBackups integration** pattern, with the labeled installer URL built server-side (labels = ScreenConnect c0..c7 custom properties applied at download time).

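As an illustration of the last decision, here is a minimal sketch of how a server-side helper might assemble the labeled installer URL from the ClientSetup URL form recorded under Reference Information. The function name, the padding of unused slots with empty strings, and the example label values are assumptions; SPEC-024 still lists the slot contents as an open question.

```python
from urllib.parse import urlencode


def build_installer_url(instance: str, labels: list[str]) -> str:
    """Build a labeled ClientSetup MSI URL (hypothetical helper, not SPEC-024 code).

    `labels` fills the c0..c7 custom property slots in order; unused slots
    are padded with empty strings so slot positions stay fixed (an assumption).
    """
    if len(labels) > 8:
        raise ValueError("only 8 custom property slots (c0..c7) are available")
    slots = list(labels) + [""] * (8 - len(labels))
    # Repeated "c" keys keep their order when urlencode is given a list of pairs.
    query = [("e", "Access"), ("y", "Guest")] + [("c", value) for value in slots]
    return (
        f"https://{instance}.screenconnect.com"
        f"/Bin/ScreenConnect.ClientSetup.msi?{urlencode(query)}"
    )
```

For example, `build_installer_url("example-instance", ["UCRYO", "Main"])` would put the client code in c0 and the site name in c1; which GuruRMM fields belong in which slot is not decided in this log.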
## Problems Encountered

- **PowerShell parser error** (`An empty pipe element is not allowed`) from piping a `foreach(){}` statement directly into `Sort-Object`/`Format-Table`. This aborted whole probes silently (empty stdout). Fixed by collecting into a variable first, then piping.
- **Empty Defender section** on the recon. Expected: Server 2012 R2 does not ship the Defender AV PowerShell cmdlets.
- **Diagnostic probe timeout (240s)** on UC2-SERVER (slow 2012 R2, installed-software enumeration). Made the runner's exec timeout overridable via the `DIAG_EXEC_TIMEOUT` env var (default unchanged at 240) and used 480s for servers.
- **7/8 diagnostic agents "interrupted"** on the parallel run (agent restarted mid-probe under load). Resolved by re-running sequentially; all completed.
- **MSP360 monitoring API field/enum guessing.** Initial jq used wrong field names (Result/LastBackup returned null); correct fields are Status/ErrorMessage/FilesCopied/BuildVersion etc. Calibrated the Status enum empirically across 66 records.
- **Coord todos POST schema mismatch:** the endpoint requires `text`, `created_by_user`, `created_by_machine` (not title/description); todo creation returned null and was not reliably persisted. Follow-up captured in this log instead.
- **Over-generalized the TLS hypothesis** to the Tucson Coin Win11 boxes from the shared "Status 3 stuck" symptom; corrected after the user pointed out they are Win11 and endpoint evidence showed 0 secure-channel errors. The stuck-Status-3 signature is not TLS-specific.

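The timeout override and the sequential re-run described above can be sketched together as follows. This is a Python analogue for illustration only: the real runner is a shell script, and `run_sequentially` and the `probe` callable are hypothetical names, not part of it. Only the `DIAG_EXEC_TIMEOUT` variable and the 240-second default come from this log.

```python
import os

DEFAULT_EXEC_TIMEOUT = 240  # seconds; the runner's previous hardcoded value


def exec_timeout(env=os.environ) -> int:
    """Return the per-probe timeout.

    Mirrors the shell idiom ${DIAG_EXEC_TIMEOUT:-240}: the default applies
    when the variable is unset or empty.
    """
    raw = env.get("DIAG_EXEC_TIMEOUT") or DEFAULT_EXEC_TIMEOUT
    return int(raw)


def run_sequentially(agents, probe, timeout):
    """Run `probe(agent, timeout)` one agent at a time so probes never overlap."""
    results = {}
    for agent in agents:
        try:
            results[agent] = probe(agent, timeout)
        except Exception as exc:  # record the failure and continue with the next agent
            results[agent] = f"error: {exc}"
    return results
```

Running with `DIAG_EXEC_TIMEOUT=480` set in the environment corresponds to the 480s value used for the servers; leaving it unset keeps the original 240s behavior.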
## Configuration Changes

**Created:**
- `clients/ucryo/gururmm-site-main.sops.yaml` (vault repo): UCRYO Main site GuruRMM enrollment key (SOPS-encrypted).
- `clients/ucryo/onboarding-baselines/*.{json,md}`: 8 immutable diagnostic baselines (UC2-SERVER, WIN-709JUVCJ2DQ, DESKTOP-PMML1JC, KIRBY, gromit, hobbes, hoborg, lilo), timestamped 20260603T00xxxx UTC.
- `projects/msp-tools/guru-rmm/docs/specs/SPEC-024-screenconnect-auto-deploy.md`: ScreenConnect auto-deploy module spec (committed gururmm 1e24b71).

**Modified:**
- `.claude/scripts/run-onboarding-diagnostic.sh`: added `EXEC_TIMEOUT="${DIAG_EXEC_TIMEOUT:-240}"` and used it for the probe-exec dispatch (was hardcoded 240).
- `projects/msp-tools/guru-rmm/docs/FEATURE_ROADMAP.md`: added Integration Features → "Remote Access Tools (Auto-Deploy)" subsection linking SPEC-024.

**On endpoint UC2-SERVER (Server 2012 R2):**
- Added DWORD `SchUseStrongCrypto=1` and `SystemDefaultTlsVersions=1` to BOTH `HKLM\SOFTWARE\Microsoft\.NETFramework\v4.0.30319` and `HKLM\SOFTWARE\WOW6432Node\Microsoft\.NETFramework\v4.0.30319`.
- Restarted services "Online Backup Service" and "Online Backup Service Remote Management".
- Deleted scheduled task "System Health Application"; removed `C:\Windows\system32\config\systemprofile\AppData\Roaming\syshealth\`; quarantine copy at `C:\Quarantine\syshealth-trickbot-20260602-170235\`.

**GitHub/Gitea:**
- gururmm#39: bug, duplicate system-tray icons (no single-instance guard).

## Credentials & Secrets

- **UCRYO GuruRMM enrollment key:** vaulted at `clients/ucryo/gururmm-site-main.sops.yaml` (fields: client_id, site_id, site_code, api_key, installer_url, msi_url).
- **MSP360 Managed Backup Service API:** vault `msp-tools/msp360-api.sops.yaml`. Base URL `https://api.mspbackups.com`; login `kY9PvDdWki` (password vaulted). Auth: `POST /api/Provider/Login` (body `{"UserName","Password"}`) → `access_token`; then `GET /api/Monitoring` with Bearer token.
- **GuruRMM admin API:** vault `infrastructure/gururmm-server.sops.yaml` (credentials.gururmm-api.admin-email / admin-password). Base `http://172.16.3.30:3001`.
- **ScreenConnect instance (ACG):** relay host `instance-kgc7jt-relay.screenconnect.com`, port 443, instance GUID `s=9f3db089-eb29-441d-a9d2-2c441bde8c78` (observed in UC2-SERVER client launch string; public key `k` also in that string). Not high-sensitivity, but recorded for SPEC-024 implementation.

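The MSP360 auth flow noted above (login for an `access_token`, then a Bearer-authenticated monitoring call) can be sketched with the standard library. The endpoints and body keys are taken from this log; the function names and the JSON `Content-Type` header are assumptions, and the credentials are placeholders to be read from the vault.

```python
import json
from urllib.request import Request

BASE_URL = "https://api.mspbackups.com"


def login_request(username: str, password: str) -> Request:
    """Build the POST /api/Provider/Login request (JSON body is an assumption)."""
    body = json.dumps({"UserName": username, "Password": password}).encode("utf-8")
    return Request(
        f"{BASE_URL}/api/Provider/Login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def monitoring_request(access_token: str) -> Request:
    """Build the GET /api/Monitoring request using the token from the login response."""
    return Request(
        f"{BASE_URL}/api/Monitoring",
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )
```

Each request would be sent with `urllib.request.urlopen(...)`; the login response is expected to carry the token in its `access_token` field, per the note above.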
## Infrastructure & Servers

**Universal Cryogenics (domain `ucryo.local`)**
- **UC2-SERVER:** Windows Server 2012 R2 Essentials (build 9600), domain controller (AD DS, DNS, DHCP, WSUS, AD CS installed). Drives C: (500GB) and E: (931GB, shares: OFFICE DOCS, Projects, QB2020, UCDATA, x-files; Offsite Archive). MSP360 plan "Ucryo Files" (user richard@ucryo.com). RMM agent id `64cff183-429c-44bf-aebd-55386417a494`.
- **WIN-709JUVCJ2DQ:** Windows Server 2012 R2 Essentials, Hyper-V + Veeam backup host (VBRCatalog, Veeam-Scripts). Drives C:/E: Hyper-V/V-Hard-Disks / F: Hyper-Data-Disks / M: 4.7TB MWF-Backup. RMM agent id `b7311d8a-6c5e-4aa5-9abf-79212d344009`. UC2-SERVER is likely a guest VM on this host.
- Workstations: DESKTOP-PMML1JC, KIRBY (Win10 Pro 19045 laptop), gromit, hobbes, hoborg, lilo; all on GuruRMM v0.6.54.
- Management stack present (legit): Syncro, ScreenConnect, Splashtop, ACG Online Backup (MSP360), GuruRMM.

**GuruRMM site:** client_id `f954f150-3605-4ef7-82e7-6b942883cb00`, site Main, site_id `345e59d2-ca30-4b9c-b703-c19915b47753`, site_code **LIGHT-WOLF-2305**.

**Other (fleet/cross-client):**
- Seth-PC: Windows 11 Home (build 26200), client "Tucson Coin and Autograph". RMM agent id `4267e35a-cd14-424d-ab82-3da4f9baa0dc`. MSP360 build 8.6.0.290.
- MSP360 fleet: 47 computers; newest deployed build 8.6.0.290 (34 boxes, still flagged outdated by console); oldest 4.4.2.221 (2 boxes).

## Commands & Outputs

- TrickBot task: `schtasks /query /tn "System Health Application" /xml` → hidden, RunLevel HighestAvailable, UserId SYSTEM, BootTrigger + 12-min repetition; Last Result `-2147024894` (0x80070002 FILE_NOT_FOUND).
- TrickBot modules confirmed: `injectDll64`, `pwgrab64`, `psfin64`, `importDll64`, `tabDll64`, `mwormDll64`, `mshareDll64`, `networkDll64`, `NewBCtestnDll64` + `dinj`/`dpost`/`sinj` configs + `settings.ini` under `...systemprofile\AppData\Roaming\syshealth\`.
- Backup failure (UC2 plan log `5a44fc46-...log`): `LightWebException: The request was aborted: Could not create SSL/TLS secure channel.` against `api001.backblazeb2.com`. First secure-channel error 2025-10-15; intermittent through May; hard-failing 2026-06-02.
- Post-fix verify: `cbb plan -r "Ucryo Files"` → "Plan is started"; `secure-channel errors in last 5 min: 0`; `Scanned 474.9 GB ... Uploaded 2.15 GB`.
- MSP360 Status enum (empirical): 0=completed/idle, 1=Success, 2=Warning, 3=Running (in progress), 4=Scheduled/never-run, 7=completed-with-errors. Counters (FilesCopied/DataCopied/Duration) populate only at session completion, not during a run.
- Tray bug evidence (GURU-5070): 5 × `gururmm-tray.exe` PIDs (26224, 11424, 14524, 15928, 4076) with distinct StartTimes spanning 2 days; 2 × `gururmm-agent.exe` (expected: agent + watchdog).

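Two numeric values in the list above are easy to misread: the signed task result and the MSP360 `Status` integer. The sketch below shows how the signed `Last Result` maps to its HRESULT hex form and low-word Win32 code, and wraps the empirically calibrated Status values in a lookup. The helper names are illustrative, and the Status labels are this log's observations, not documented API values.

```python
def hresult_hex(signed_result: int) -> str:
    """Reinterpret a signed 32-bit task result as an unsigned HRESULT in hex."""
    return f"0x{signed_result & 0xFFFFFFFF:08X}"


def win32_code(signed_result: int) -> int:
    """Low 16 bits of the HRESULT; for Win32-facility results this is the Win32 error code."""
    return signed_result & 0xFFFF


# Empirical mapping from this session (calibrated against 66 records); values
# not observed are reported as unknown rather than guessed.
STATUS_LABELS = {
    0: "completed/idle",
    1: "Success",
    2: "Warning",
    3: "Running (in progress)",
    4: "Scheduled/never-run",
    7: "completed-with-errors",
}


def status_label(status: int) -> str:
    return STATUS_LABELS.get(status, f"unknown ({status})")
```

`hresult_hex(-2147024894)` gives `0x80070002`, and `win32_code(-2147024894)` gives `2`, the Win32 code for file-not-found, matching the result recorded for the orphaned task.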
## Pending / Incomplete Tasks

- **UCRYO 2019 incident: confirm domain credential / KRBTGT reset.** TrickBot pwgrab64 ran on the DC in 2019; verify with client/records whether a full post-incident reset was done. If not, this is the primary residual risk.
- **AD2** (ACG internal): TLS key check is queued. The agent was offline; re-check when it reconnects. It is the only RMM-reachable box that might be legacy OS.
- **Tucson Coin agent update** (Seth-PC + DESKTOP-P36LUUN): update the outdated MSP360 agent (clears the grey dashboard indicator). Do it AFTER the current first-full backup completes (to avoid restarting the ~20GB upload). Now that Seth-PC is RMM-enabled it can be driven via RMM.
- **Fleet MSP360 agent-update pass:** 47 boxes lagging; prioritize the 4.4.2.221 / 7.8.x / 7.9.x stragglers. Worklist (client+host+build) can be pulled from the MSP360 API.
- **GuruRMM bug #39** (tray icons): awaiting triage/fix; repo has zero labels (offered to create a `bug` label).
- **SPEC-024 open questions:** instance GUID per-node?, slot-name auto-fetch?, per-OS existing-client detection strings, force_relabel semantics, Linux installer variant, which fields fill remaining c-slots (no tags model in GuruRMM yet).
- **All 8 UCRYO boxes graded RED.** Remediation backlog: BitLocker (KIRBY laptop unencrypted), Win10 22H2 EOL, pending patches, RDP exposure review.

## Reference Information

- GuruRMM API: `http://172.16.3.30:3001` · Coord API: `http://172.16.3.30:8001/api/coord`
- UCRYO installer page: `https://rmm.azcomputerguru.com/install/LIGHT-WOLF-2305` · MSI: `https://rmm.azcomputerguru.com/api/sites/345e59d2-ca30-4b9c-b703-c19915b47753/installer`
- MSP360 API: `https://api.mspbackups.com` (`/api/Provider/Login`, `/api/Monitoring`)
- UC2-SERVER MSP360 plan id: `5a44fc46-ca94-4095-a645-889eaf754389` ("Ucryo Files", richard@ucryo.com)
- gururmm#39: `https://git.azcomputerguru.com/azcomputerguru/gururmm/issues/39`
- SPEC-024: `projects/msp-tools/guru-rmm/docs/specs/SPEC-024-screenconnect-auto-deploy.md` (gururmm commit `1e24b71`)
- ScreenConnect ClientSetup build URL form: `https://<instance>.screenconnect.com/Bin/ScreenConnect.ClientSetup.msi?e=Access&y=Guest&c=<c0>..&c=<c7>` (c0..c7 = 8 custom org properties, applied at download time)
- TLS fix (legacy Windows + Backblaze): set `SchUseStrongCrypto=1` + `SystemDefaultTlsVersions=1` (DWORD) under both `.NETFramework\v4.0.30319` and `WOW6432Node\...\v4.0.30319`, restart Online Backup services. Only needed on 2012 R2 / Win7-8; modern OS unaffected.

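For repeatability, the TLS fix in the last bullet can be expressed as the four `reg add` commands it implies (two values under each of the two registry views). This log does not record which tool was actually used on UC2-SERVER, so the use of `reg.exe` here is an assumption; the sketch only generates the command strings and does not touch the registry.

```python
# Registry paths and value names are taken from this log; generating
# reg.exe command lines is an illustrative choice, not the recorded method.
KEYS = [
    r"HKLM\SOFTWARE\Microsoft\.NETFramework\v4.0.30319",
    r"HKLM\SOFTWARE\WOW6432Node\Microsoft\.NETFramework\v4.0.30319",
]
VALUES = ["SchUseStrongCrypto", "SystemDefaultTlsVersions"]


def reg_add_commands() -> list[str]:
    """Return one `reg add` command per (key, value) pair, each setting DWORD 1."""
    return [
        f'reg add "{key}" /v {name} /t REG_DWORD /d 1 /f'
        for key in KEYS
        for name in VALUES
    ]


if __name__ == "__main__":
    # Print the commands for review before running them on a legacy box.
    for command in reg_add_commands():
        print(command)
```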
---
## Update: 19:52 PT — Fleet-wide MSP360 outdated-agent worklist

Pulled the full outdated MSP360 (ACG Online Backup) agent worklist across all clients via the MBS API (`api.mspbackups.com` `/api/Monitoring`), deduped per computer, attached client names, sorted oldest-first, and cross-referenced GuruRMM reachability. (Fleet-wide MSP-tools inventory; recorded here as a continuation of the UCRYO backup investigation that surfaced it. Closes the "Fleet MSP360 agent-update pass" pending item from above by producing the actual list.)

**Latest available build: `8.6.0.338`** (one box already on it). Everything below is outdated: **46 agents across 29 clients.**

Build distribution (47 distinct computers): `4.4.2.221`×2, `7.9.7.69`×1, `8.0.0.269`×2, `8.1.0.619`×1, `8.1.2.172`×2, `8.1.3.72`×2, `8.1.4.97`×2, `8.2.0.122`×5, `8.6.0.290`×29 (one bump behind), `8.6.0.338`×1 (current).

**Priority tiers:**
- **Ancient (4.x/7.x), do first, none RMM-reachable:** `Julies-Mini-2` (LaHC, 4.4.2.221), `pbx.intranet.dataforth.com` (Dataforth, 4.4.2.221), `DesertRVServer` (Desert RV, 7.9.7.69).
- **Behind (8.0–8.2), 13 boxes:** only `LAB-SVR` (Len's Auto, 8.2.0.122) is RMM-reachable. Others include Dataforth (SAGE-SQL, DF-HYPERV-B), Saguaro Conveyor ×3, Glaztech (GTI-INV-VMHOST), Martell, Tedards, Russo, Jimmy Co, Len's Auto (DESKTOP-BMBTQLI), Tucson Safety & Medical.
- **One bump behind (8.6.0.290), 29 boxes:** low urgency.

**RMM-reachable & outdated (can push/verify via GuruRMM), 10 boxes:** `LAB-SVR` (Len's Auto, 8.2.0.122), `AD2` (Dataforth), `GND-SERVER` (Grabb & Durando), `HSM-NewServer` (Horseshoe Mgmt), `IMC1` (Instrumental Music), `LAB-Becky` (Len's Auto), `NEPTUNE` (ACG-Internal), `PST-SERVER` (Peaceful Spirit), `UC2-SERVER` (UCRYO), `rednourcarrievirt` (Rednour Law); all on 8.6.0.290 except LAB-SVR.

**Worst-hit clients:** Dataforth (7), then Saguaro Conveyor / Len's Auto / Glaztech / Desert RV (3 each).

**Recommendation:** bulk-update from the MSP360 console (reaches all 46, including the 36 not in RMM; the ancient 4.x/7.x boxes can *only* be updated that way); optionally trial the RMM-driven path on a low-risk reachable box (NEPTUNE, ACG-internal) first. Updating mostly fixes the grey-dashboard reporting glitch; it is not an emergency except for the 3 ancient boxes. No changes were made; this is a worklist only.

(Also: answered a capability question. No native text-to-image generation is available in this environment; it can produce SVG / HTML-CSS / matplotlib / Mermaid / Graphviz / ASCII instead. No deliverable.)
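The dedupe-and-sort step behind the worklist in this update can be sketched as below. Build strings need a numeric comparison, because a plain string sort would place a build like `8.10.0.1` before `8.2.0.122`. The record field names (`computer`, `build`) and the "last record wins" dedupe rule are assumptions for illustration; the log does not record the exact jq or script used.

```python
def build_key(build: str) -> tuple[int, ...]:
    """Turn a dotted build string such as '8.6.0.290' into a tuple for numeric sorting."""
    return tuple(int(part) for part in build.split("."))


def oldest_first(records: list[dict]) -> list[dict]:
    """Keep one record per computer (last one seen wins), sorted by build, oldest first."""
    latest = {}
    for rec in records:
        latest[rec["computer"]] = rec
    return sorted(latest.values(), key=lambda rec: build_key(rec["build"]))


def is_outdated(build: str, latest_build: str) -> bool:
    """True when `build` is older than the latest available build."""
    return build_key(build) < build_key(latest_build)
```

With `latest_build="8.6.0.338"`, a box on `8.6.0.290` is reported as outdated, which is consistent with the "one bump behind" tier above.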