From fb12cd1dcb96842404393f66939ee342cb52b2e3 Mon Sep 17 00:00:00 2001
From: Mike Swanson
Date: Wed, 27 May 2026 20:54:23 -0700
Subject: [PATCH] sync: auto-sync from GURU-5070 at 2026-05-27 20:54:20

Author: Mike Swanson
Machine: GURU-5070
Timestamp: 2026-05-27 20:54:20
---
 session-logs/2026-05-27-session.md | 89 ++++++++++++++++++++++++++++++
 1 file changed, 89 insertions(+)

diff --git a/session-logs/2026-05-27-session.md b/session-logs/2026-05-27-session.md
index 070e86f..ef1d01d 100644
--- a/session-logs/2026-05-27-session.md
+++ b/session-logs/2026-05-27-session.md
@@ -547,3 +547,92 @@ User requested comprehensive interview for Howard about log analysis feature des

 - ADR-007 documentation
 - Dashboard rebuild (after feature design clear)
+
+---
+
+## Update: 20:49 PT — LHM/WinRing0 pulled from GuruRMM agent install (GURU-5070)
+
+## User
+- **User:** Mike Swanson (mike)
+- **Machine:** GURU-5070
+- **Role:** admin
+
+### Session Summary
+
+Microsoft Defender alerted on a managed Windows endpoint with `VulnerableDriver:WinNT/Winring0` quarantining `C:\Program Files\GuruRMM\lhm\LibreHardwareMonitor.sys` (Trojan / Severe). Connected to Howard's earlier LHM WMI crash report on Cascades MAINTENANCE-PC — same component, deeper root cause: LHM bundles WinRing0 (CVE-2020-14979 ring-0 arbitrary R/W, on Microsoft's vulnerable-driver blocklist). GuruRMM was effectively shipping a known-vulnerable kernel LPE primitive to every managed Windows endpoint, and on Win11 with HVCI/blocklist active the driver also fails to load.
+
+Scoped blast radius: 58 of 64 agents are Windows (5 linux, 1 macos; versions mostly 0.6.39 with a tail to 0.6.2; 39 online / 19 offline at the time). All 58 carry the bundled LHM. Owner decision: pull LHM from the install now and defer the proper headless temperature-monitoring replacement. Spawned a Coding Agent which removed LHM bundling from `installer/gururmm-agent.wxs` (10 components — the Pluto-built WiX MSI), `installer/gururmm-agent-linux.wxs` (32 components — the Linux-built CI variant that also shipped LHM, surprise finding), and `scripts/setup-build-server.ps1` (no longer downloads/stages LHM 0.9.4 net472). `agent/src/ohw.rs::LhmGuard::start()` is now a clean no-op; the module + `LHM_RUNNING` flag are kept so re-enabling later is a small change. `metrics::collect_temps_from_lhm()` is already gated on `LHM_RUNNING` (now permanently false) — temps simply return None and no other agent code needed changing.
+
+Push reconciled across parallel commits. The gururmm push initially rejected non-fast-forward against a parallel SPEC-011 spec commit from another machine; rebase (non-overlapping markdown) → `bc3c2bd` live. CI auto-bump pushed `6326ec6` on top. The ClaudeTools parent push then rejected against Howard's auto-sync that bumped the same submodule pointer to the earlier (parallel) SHA — Git auto-resolved by taking the newer pointer (transitive supersede: `fae47f2` is `bc3c2bd`'s direct parent, so the newer pointer includes Howard's transitively); `6902645` pushed. The webhook fired the agent build; v0.6.46 built clean on Pluto in 1126s, signed, marked beta, latest symlinks updated, old v0.6.45 removed.
+
+Coordinated the change fleet-wide while the build was in flight: lock claimed on `gururmm/agents` (4h TTL), component state set to `building`, broadcast sent to ALL_SESSIONS describing the in-flight change plus a don't-edit-these-files ask. After build completed: component flipped to `deployed v0.6.46`, lock released, follow-up broadcast sent. Cleanup todo logged for the runtime-registered WinRing0 kernel service / extracted `.sys` on already-affected endpoints — MSI MajorUpgrade removes the bundled `lhm` folder when agents update but does NOT touch the runtime-registered kernel service (LHM extracts/registers it at runtime, not via MSI File components).
+
+Side review: a new `check-messages.sh` arrived via sync (`a35b583`, from GURU-KALI). Substantive improvements — per-machine broadcast seen-tracking via `.claude/coord-broadcasts-seen` (eliminates the PUT `/read` clobber on broadcasts that share one server-side `read_at`), alias-merge query (catches `to=mike`/`to=howard`), JSON control-char sanitization, mode-file auto-init, Windows toast. Flagged one cross-platform concern: `sanitize_json` calls `python3` unconditionally; on Windows boxes without `python3` on PATH (per `feedback_python_windows`, ACG uses `py`) the function silently fails and unread messages get dropped. Fix awaiting owner go-ahead.
+
+### Key Decisions
+
+- **Pull LHM, do NOT add a Defender exclusion.** Excluding a genuinely-vulnerable ring-0 driver leaves an LPE primitive on every managed endpoint; unacceptable for an MSP. The Win11 vulnerable-driver blocklist also prevents driver load regardless of AV exclusion.
+- **Keep `LhmGuard` module + `LHM_RUNNING` flag as a no-op rather than delete.** Re-enabling a future replacement temp path is a small change later; gutting the temp-monitoring scaffolding would mean rebuilding from scratch.
+- **Trust Git's auto-resolution of the submodule-pointer collision** on the parent rebase: Howard's pointer `fae47f2` is `bc3c2bd`'s direct parent, so newer-wins is a strict supersede, not a divergence. Verified post-rebase via `git ls-tree HEAD projects/msp-tools/guru-rmm` = `bc3c2bd`.
+- **Removed LHM from both `gururmm-agent.wxs` AND `gururmm-agent-linux.wxs`.** Surprise finding: the Linux-built CI MSI variant also bundled LHM (32 components vs the 10-component hand-curated Pluto WXS). Removing only one would leave the other build path producing a vulnerable MSI.
+- **Coord broadcast + lock during the rollout.** Howard's parallel SPEC-010 work touches the same agent areas; a fleet-wide rollout warranted explicit "don't concurrently edit these files" coordination.
+
+### Problems Encountered
+
+- **gururmm push rejected non-fast-forward** against a parallel SPEC-011 spec commit (`fae47f2`). Resolved: verified non-overlapping (markdown only via `git log --stat fae47f2`), rebased, pushed `bc3c2bd`.
+- **ClaudeTools parent push rejected non-fast-forward** against Howard's auto-sync `47d6519` which bumped the same submodule pointer to the parallel (earlier) SHA. Resolved: Git auto-resolved by taking the newer pointer on rebase; `git ls-tree` verified the resulting pointer = `bc3c2bd`.
+- **Coord todo POST returned `"error parsing the body"` with the full ~1.5KB description**, despite the JSON validating locally with jq. A compressed ~700-char text body POSTed cleanly via the same code path. Likely an undocumented body-size limit or content-specific parser issue on `/api/coord/todos` somewhere between those sizes; the long version failed on both POST and PUT identically. Worth knowing for future long todos.
+- **Coord todo POST validation missed required fields.** Schema (`api/schemas/coord_todo.py`) requires `created_by_user` + `created_by_machine`; first attempt missed both. Returned structured `Request validation failed` — added and retried.
+- **Initial `/api/agents` response shape misparse.** Endpoint returns a bare array of 64, not `{agents:[...]}` or `{data:[...]}` — first jq filter silently returned empty. Re-parsed correctly after dumping the shape.
+- **`cargo check` could not run locally** (no Rust toolchain on GURU-5070's Git Bash). Verified the change by inspection; CI builds the agent crate on Pluto, so a compile error would fail the build rather than deploy badly. v0.6.46 built clean, confirming.
+
+### Configuration Changes
+
+- `projects/msp-tools/guru-rmm` commit **`bc3c2bd`** ("fix(agent): remove LibreHardwareMonitor bundling — Defender flags WinRing0 driver"): 6 files, +57/-331.
+  - `agent/src/ohw.rs` — `LhmGuard::start()` is now a clean no-op (no spawn, no `taskkill`, no 30s WMI poll). Module + `LHM_RUNNING` flag preserved.
+  - `agent/src/main.rs`, `agent/src/service.rs` — stale "Start LibreHardwareMonitor / Drop kills the child" comments updated to reflect no-op.
+  - `installer/gururmm-agent.wxs` — removed `ComponentGroupRef Id="LhmComponents"`, `LHM_DIR`, and the 10-File `ComponentGroup` (LibreHardwareMonitor.exe, .exe.config, LibreHardwareMonitorLib.dll, HidSharp.dll, Aga.Controls.dll, Newtonsoft.Json.dll, OxyPlot.dll, OxyPlot.WindowsForms.dll, Microsoft.Win32.TaskScheduler.dll, System.CodeDom.dll).
+  - `installer/gururmm-agent-linux.wxs` — removed 32 `ComponentRef`/`` entries + `LHM_DIR` (full LHM 0.9.4 net472 file set).
+  - `scripts/setup-build-server.ps1` — LHM download/stage replaced with "delete stale `C:\gururmm\lhm` if present" cleanup step.
+- ClaudeTools commit **`6902645`** ("chore(submodule): advance guru-rmm — LHM removed from agent install"): submodule pointer `495575d → bc3c2bd`.
+- Coord state writes (all from session `GURU-5070/claude-main`):
+  - PUT `components/gururmm/agents` → `building` → then `deployed v0.6.46`.
+  - POST `locks` on `gururmm/agents` (id `04fb5c30`, ttl 4h) → DELETE after build.
+  - POST `messages` to ALL_SESSIONS (`620af7f5` in-flight + `e032f029` done).
+  - POST `todos` `42c08298` (project_key `gururmm`, assigned_to_user `mike`).
+
+### Credentials & Secrets
+
+None new.
+
+### Infrastructure & Servers
+
+- **GuruRMM build server:** `guru@172.16.3.30` (Linux). Build logs `/var/log/gururmm-build-{windows,linux}.log`; last-built-commit at `/opt/gururmm/last-built-commit-{windows,linux,mac}`; artifacts under `/var/www/gururmm/downloads/`; webhook handler `localhost:9000` (PID 518498, ~3.2 days uptime, `ok`).
+- **Pluto build VM:** `172.16.3.36` (Windows Server 2019 VM on Jupiter) — does the actual Windows MSI/EXE build via `build-windows.sh` driven from `guru@172.16.3.30`.
+- **Coord API:** `http://172.16.3.30:8001/api/coord` — messages, locks, components, todos.
+- **GuruRMM control API:** `http://172.16.3.30:3001` — `claude-api@azcomputerguru.com / ClaudeAPI2026!@#`.
+- **Fleet at action time:** 64 agents total = 58 Windows + 5 Linux + 1 macOS; Windows 39 online / 19 offline.
+
+### Commands & Outputs
+
+- Fleet count: `curl /api/agents -H "Authorization: Bearer $TOKEN" | jq -r 'group_by(.os_type)[] | "\(.[0].os_type): \(length)"'` → linux 5, macos 1, windows 58.
+- Build complete log line: `2026-05-28 03:07:42 [WINDOWS] === Windows build complete: v0.6.46 in 1126s ===`.
+- last-built-commit post-build: windows = linux = `6326ec6ca90e853021ae59fc75c6d031307cdca7` (includes `bc3c2bd`).
+- Coord todo POST error sample (long text): `{"detail":"There was an error parsing the body"}`. Validation-error sample (minimal POST): `{"error":"Request validation failed","details":{"validation_errors":[{"field":"body.created_by_user","message":"Field required","type":"missing"},{"field":"body.created_by_machine","message":"Field required","type":"missing"}]}}`.
+- Coord todo schema (`api/schemas/coord_todo.py`): `text: str` (required, no max_length declared), `project_key: Optional[str] <=100`, `assigned_to_user: <=50`, `created_by_user: str <=50` (required), `created_by_machine: str <=100` (required), `auto_created: bool`, `source_context: Optional[str]`, `due_at: Optional[datetime]`, `parent_id: Optional[UUID]`.
+
+### Pending / Incomplete Tasks
+
+- **Endpoint WinRing0 service cleanup** (coord todo `42c08298`, gururmm): runtime-registered kernel service + any non-quarantined `.sys` on machines that ran the old agent are not MSI-tracked. Options: extend `installer/cleanup/gururmm-cleanup.exe` to stop+delete the WinRing0 service, or push a one-shot RMM `sc.exe stop/delete WinRing0_1_2_0; rm ` command across the 58 affected Windows endpoints. Ties to deferred temperature-monitoring replacement decision.
+- **Proper headless temperature monitoring** (deferred — "revisit later"): replace LHM with a non-driver path (ACPI `MSAcpi_ThermalZoneTemperature` / vendor WMI) or accept no detailed temps fleet-wide.
+- **`check-messages.sh` Windows `python3` compatibility** (open question to owner at end of /sync review): hook calls `python3` unconditionally inside `sanitize_json`; on Windows boxes without `python3` on PATH (most of ACG uses `py` per `feedback_python_windows`) the function silently fails and unread messages get dropped through the empty-result fallback. Proposed fix: `command -v python3 || command -v py || command -v python` or read `.python.command` from `identity.json` like the other scripts.
+- Beta-channel rollout: agents pick up v0.6.46 on next check-in; verify a couple of online endpoints actually update and the `lhm` folder gets removed by MajorUpgrade.
+
+### Reference Information
+
+- Commits: gururmm **`bc3c2bd`** (LHM removal), gururmm **`6326ec6`** (CI auto-bump), ClaudeTools **`6902645`** (submodule advance).
+- Coord IDs: lock `04fb5c30-34ff-4dd0-9696-70b380f93def`, todo `42c08298-be75-4f6d-954c-c9a9fd1138ee`, broadcast messages `620af7f5-f238-469b-a595-16f86e861458` (in-flight) + `e032f029-4aa2-4a3e-985f-f668ea174d61` (done).
+- Build artifacts (v0.6.46, beta, signed): `gururmm-agent-base-0.6.46.msi`, `gururmm-agent-windows-amd64-0.6.46.exe`, `gururmm-agent-windows-x86-0.6.46.exe`, `gururmm-agent-windows-legacy-{amd64,x86}-0.6.46.exe`. `*-latest.{exe,msi}` symlinks all point at 0.6.46.
+- Driver context: WinRing0 / CVE-2020-14979 (ring-0 arbitrary R/W LPE) / Microsoft vulnerable-driver blocklist / Defender signature `VulnerableDriver:WinNT/Winring0`.
+- **Related earlier session log on this calendar day:** `clients/peaceful-spirit/session-logs/2026-05-27-session.md` (Bridgette VPN deployment + Syncro #32271 customer-visible completion note — different work, captured separately).
+- New hook reviewed: `.claude/scripts/check-messages.sh` (commit `a35b583`, authored from GURU-KALI).
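
Review note on the `check-messages.sh` `python3` item: the fallback proposed in the log (`command -v python3 || command -v py || command -v python`) could look roughly like the sketch below. This is not the hook's actual code. The helper names `first_available` and `resolve_python` are hypothetical, and the behaviour when no launcher is found (warn and keep the messages rather than drop them) is an assumption about what the owner would want.

```shell
#!/bin/sh
# Hypothetical helper: print the first candidate command that exists on PATH.
# Returns 0 on a hit, 1 if none of the candidates resolve.
first_available() {
  for candidate in "$@"; do
    if command -v "$candidate" >/dev/null 2>&1; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Candidate order follows the fix proposed in the session log.
resolve_python() {
  first_available python3 py python
}

if PYTHON_CMD="$(resolve_python)"; then
  echo "sanitize_json would use: $PYTHON_CMD"
else
  # Assumed fallback: warn loudly instead of failing silently.
  echo "no Python launcher on PATH; keep messages unsanitized rather than dropping them" >&2
fi
```

One caveat worth checking before adopting this: on some Windows installs `python3` can resolve to the Microsoft Store app-execution alias, which is on PATH but does not run Python. A stricter variant might also execute each candidate (for example with `-c ''`) before accepting it, or read `.python.command` from `identity.json` as the log suggests.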