Files
claudetools/session-logs/2026-05-28-session.md
Mike Swanson 7036135a44 sync: auto-sync from GURU-5070 at 2026-05-28 09:47:53
Author: Mike Swanson
Machine: GURU-5070
Timestamp: 2026-05-28 09:47:53
2026-05-28 09:47:59 -07:00

20 KiB
Raw Blame History

Session Log — 2026-05-28

User

  • User: Mike Swanson (mike)
  • Machine: GURU-5070
  • Role: admin

Session Summary

Session opened with two unread coord messages from Howard (Howard-Home/claude-main): SPEC-010 (Agent UX Improvements & Bug Fixes, 6 items) and SPEC-011 (ARP Programs and Features registration). Both messages were marked as read. Work focused on the two P1 bugs from SPEC-010, then shifted to resolving the LHM/WinRing0 fleet cleanup that had been deferred since the 0.6.46 build.

BUG-013 (agent/src/metrics/mod.rs:538): logged_in_username() was using sysinfo::Users::iter().next() which returns the first enumerated OS account — almost always the built-in Administrator — instead of the active console session user. Fixed with a platform-split implementation: Windows uses WTSGetActiveConsoleSessionId + WTSQuerySessionInformationW(WTSUserName) via the existing windows crate (same imports pattern already in watchdog/wts.rs); non-Windows uses max_by_key(process_count) as a better heuristic than .next(). BUG-014 (dashboard/src/pages/SiteDetail.tsx): the Site Detail agents table had no search bar, unlike every other list page. Fixed by adding agentSearch state, a filteredSiteAgents derived array, a search <Input> above the table, and a "No agents match your search." empty state. Both committed together as 94234af and pushed to gururmm main.

The LHM fleet cleanup required diagnosing why 63 of 64 agents had never auto-updated off 0.6.39. Root cause: every build since the update channel system was introduced wrote beta to channel files, while all agents had null channel which the server resolves to stableneeds_update therefore always returned "already current." Fixed by: (1) promoting 0.6.47 to stable via the API (5 Windows channel files updated), (2) promoting 0.6.46 for Linux, (3) triggering updates for 42 of 46 online agents (44 total; 2 had gone offline between the query and the trigger), (4) patching build-windows.sh and build-linux.sh on the server to write stable instead of beta going forward. The stray n# artifact on build-linux.sh line 54 was also corrected in the same pass.

To complete the LHM cleanup, sc.exe stop WinRing0_1_2_0 & sc.exe delete WinRing0_1_2_0 was pushed via the command API to all 76 online Windows agents — 37 delivered immediately, 39 queued for offline agents. The WinRing0 kernel service was registered at runtime by old LHM code and is not MSI-tracked, so the 0.6.47 MajorUpgrade removes the lhm folder but leaves the service registration; this command handles it. Todo 42c08298 was closed.

Session closed with documentation work: the Windows thermal collection roadblocks were written into docs/FEATURE_ROADMAP.md (BUG-001 section updated to remove stale LHM/sysinfo reference, new "Windows Thermal Collection Roadblocks" section added detailing three approaches — WMI ACPI, vendor GPU SDKs, custom kernel driver — with effort estimates, blockers, and recommended implementation order). Committed as d4a5c13.


Key Decisions

  • WTS API over sysinfo for Windows logged-in user: sysinfo::Users enumerates all local/domain accounts non-deterministically; WTS console session query is the only reliable way to get the interactive user. Used the windows crate already in the dependency tree rather than adding windows-sys.
  • #[cfg(all(windows, feature = "native-service"))] scope for WTS impl: The windows crate is an optional dep only pulled in by the native-service feature. The cfg gates the WTS impl to exactly the build configuration where the dep is available; the legacy Windows build falls through to the non-windows fallback.
  • Promoted 0.6.47 not 0.6.46 for Windows: 0.6.47 was the current build at promotion time (two dashboard/server commits had landed after 0.6.46). Linux remained at 0.6.46 (no newer Linux binary had been produced yet).
  • & not && for sc.exe chain: sc.exe stop exits non-zero if the service is already stopped or doesn't exist (Defender may have already cleaned it). Using & ensures sc.exe delete runs regardless of stop's exit code.
  • Kernel driver deferred: Custom KMDF driver for full temperature sensor coverage blocked on EV cert (~$500/yr), Microsoft attestation signing per-version, and Defender BYOVD scrutiny. WMI ACPI + vendor GPU SDKs (NVAPI/ADLX) documented as the unblocked first path.
  • Build pipeline default changed server-side not in repo: Build scripts live at /opt/gururmm/ on the server, not in the gururmm git repo. Changes were made directly on the server via ssh+sudo.

Problems Encountered

  • sed -i permission denied in /opt/gururmm/: sed -i creates a temp file in the same directory; the guru user lacks write permission there. Resolved by editing copies in /tmp then sudo cp back.
  • Coord message IDs not visible from system-reminder: The hook injects coord messages into context but doesn't include their database IDs. Had to query the API to retrieve IDs before marking as read. The SPEC-010/SPEC-011 messages were addressed by matching subject lines.
  • Update trigger script misreported all as FAIL: The trigger_update API returns success/failure in the message field, not in a status field with the literal values "sent"/"queued". The shell script checked the wrong field — all 42 updates actually succeeded. Similarly, the sc.exe command push script had the same problem: all 76 dispatches succeeded, 37 immediate and 39 queued.
  • Fleet frozen — unexpected agent count on 0.6.47: Only 1 of 64 agents was on 0.6.47 before the channel fix, despite 0.6.47 having been built. Traced to the beta/stable mismatch: the build pipeline always wrote beta, all agents had null channel (→ stable), so no update was ever offered. Root cause was in the build scripts, not the server or agents.

Configuration Changes

  • projects/msp-tools/guru-rmm/agent/src/metrics/mod.rslogged_in_username() replaced with WTS-based Windows impl + max-process-count non-Windows fallback (BUG-013)
  • projects/msp-tools/guru-rmm/dashboard/src/pages/SiteDetail.tsx — added agentSearch state, filteredSiteAgents, search Input, no-match empty state (BUG-014)
  • projects/msp-tools/guru-rmm/docs/FEATURE_ROADMAP.md — BUG-001 Windows note corrected; "Windows Thermal Collection Roadblocks" section added
  • /opt/gururmm/build-windows.sh (server, not in git) — channel default changed from beta to stable
  • /opt/gururmm/build-linux.sh (server, not in git) — channel default changed from beta to stable; stray n# on line 54 fixed to #
  • Server channel files promoted: gururmm-agent-windows-amd64-0.6.47.exe.channel, gururmm-agent-windows-x86-0.6.47.exe.channel, gururmm-agent-windows-legacy-amd64-0.6.47.exe.channel, gururmm-agent-windows-legacy-x86-0.6.47.exe.channel, gururmm-agent-base-0.6.47.msi.channel, gururmm-agent-linux-amd64-0.6.46.channel — all changed from beta to stable

Credentials & Secrets

  • GuruRMM API admin: claude-api@azcomputerguru.com / ClaudeAPI2026!@# (vaulted at infrastructure/gururmm-server.sops.yamlcredentials.gururmm-api)
  • JWT secret: ZNzGxghru2XUdBVlaf2G2L1YUBVcl5xH0lr/Gpf/QmE= (vaulted at projects/gururmm/api-server.sops.yaml)

Infrastructure & Servers

  • GuruRMM server: 172.16.3.30:3001 (Rust/Axum API), 172.16.3.30:80/443 (nginx dashboard)
  • Build scripts: /opt/gururmm/build-windows.sh, /opt/gururmm/build-linux.sh on 172.16.3.30
  • Downloads dir: /var/www/gururmm/downloads/ on 172.16.3.30
  • Gitea (internal): http://172.16.3.20:3000/azcomputerguru/gururmm
  • Coord API: http://172.16.3.30:8001/api/coord

Commands & Outputs

# Promote 0.6.47 to stable (Windows)
POST http://172.16.3.30:3001/api/updates/rollouts/0.6.47/promote
{"os":"windows","arch":"amd64","force":false}
# Response: {"success":true,"message":"Promoted 0.6.47 to stable channel (5 files updated)","files_updated":5}

# Promote 0.6.46 to stable (Linux)
POST http://172.16.3.30:3001/api/updates/rollouts/0.6.46/promote
{"os":"linux","arch":"amd64","force":false}
# Response: {"success":true,"message":"Promoted 0.6.46 to stable channel (2 files updated)","files_updated":2}

# Trigger fleet update (per-agent loop, 42 of 46 online Windows agents succeeded)
POST http://172.16.3.30:3001/api/agents/<id>/update

# WinRing0 cleanup command pushed to all 76 Windows agents
POST http://172.16.3.30:3001/api/agents/<id>/command
{"command_type":"shell","command":"sc.exe stop WinRing0_1_2_0 & sc.exe delete WinRing0_1_2_0","timeout_seconds":30}
# 37 delivered immediately, 39 queued for offline agents

# Fix build pipeline channel default (on 172.16.3.30)
cp /opt/gururmm/build-windows.sh /tmp/build-windows.sh
sed -i -e 's/echo "beta"/echo "stable"/' ... /tmp/build-windows.sh
sudo cp /tmp/build-windows.sh /opt/gururmm/build-windows.sh
# Same for build-linux.sh

Fleet state before this session: 64 agents; 1 on 0.6.47, 38 on 0.6.39, remainder on 0.6.20.6.43. All on null channel (→ stable). All builds beta. Net: zero auto-updates ever delivered.

Fleet state after: 0.6.47 stable promoted. 42 updates in flight. 18 offline agents will update on reconnect. WinRing0 service deletion queued on all 76 Windows agents.


Pending / Incomplete Tasks

  • 18 offline agents have updates queued (0.6.39→0.6.47) but haven't reconnected yet. Will apply automatically on reconnect.
  • 39 offline Windows agents have sc.exe WinRing0 cleanup queued. Will run on reconnect.
  • macOS agents (0.6.41) have no newer stable build available yet. Next macOS build will promote to stable automatically with the pipeline fix.
  • SPEC-010 items D, E, C, F remain unimplemented (logged-in user on agent cards, alert badges, process kill, inline notes). BUG-013+014 prerequisite done.
  • SPEC-011 (ARP Programs and Features registration in installer) — P2, installer-only, not started.
  • BUG-001 Windows thermal collection — WMI ACPI + NVAPI paths documented as unblocked; implementation deferred.
  • build-linux.sh duplicate "MARK AS STABLE CHANNEL" block (lines 55-72 appear twice) — pre-existing issue, noted in audit todo 54239760, not addressed this session.
  • Coord message IDs for SPEC-010/SPEC-011 were from Howard-Home/claude-main ALL_SESSIONS broadcasts — marked read on GURU-5070 session ID but the messages were broadcast so may appear unread in other sessions.

Reference Information

  • gururmm commits this session:
    • 94234af — fix(agent,dashboard): fix logged-in user detection and add site agent search (BUG-013 + BUG-014)
    • d4a5c13 — docs(roadmap): document Windows thermal collection roadblocks
  • Coord todos closed: 42c08298 (WinRing0 kernel-driver cleanup)
  • Coord todos referenced: bde31c52 (LHM headless temp fix — now superseded by LHM removal), 54239760 (Phase 3 audit remediation)
  • Coord messages marked read: 7bdc6d3c (SPEC-011), 3fe667e1 (SPEC-010)
  • SPEC files: docs/specs/SPEC-010-agent-ux-improvements.md, docs/specs/SPEC-011-arp-programs-features-registration.md
  • Fleet update API: POST http://172.16.3.30:3001/api/agents/:id/update
  • Command dispatch API: POST http://172.16.3.30:3001/api/agents/:id/command
  • Channel promotion API: POST http://172.16.3.30:3001/api/updates/rollouts/:version/promote
  • Downloads dir channel files: /var/www/gururmm/downloads/*.channel
  • Build scripts (server-local, not in git): /opt/gururmm/build-linux.sh, /opt/gururmm/build-windows.sh

Update: Evening — GuruRMM fleet dedup + install script fixes + I/O optimization + Birth Biologic Datto SmartBadge

User

  • User: Mike Swanson (mike)
  • Machine: GURU-5070
  • Role: admin

Session Summary

Continued from a context-compacted session. Three main workstreams completed.

GuruRMM fleet cleanup (duplicate agents): The v0.6.39 to v0.6.47 update created a duplicate-agent problem. Agents on v0.6.39 had no .device-id file; after updating to v0.6.47 the new binary generated a fresh device_id, and the server — finding no matching existing record — created new enrollments. Fleet grew from 64 to 101 agents. Multiple cleanup passes via the RMM API at localhost:3001 using claude-api@azcomputerguru.com credentials identified the lowest last_seen record in each hostname+site_id duplicate pair and deleted the stale records. Final count landed in the mid-60s.

GuruRMM install script fixes (two bugs): Bug 1: the install script was downloading the agent binary to %TEMP% and executing from there — blocked by Smart App Control and AppLocker execution policies on Windows 11. Fixed in server/src/api/install.rs by staging the download to $InstallPath\gururmm-agent-new.exe (Program Files, a trusted execution path) instead of %TEMP%. Committed as 8e07767, pushed as e239b27. Bug 2 (prior context): Unblock-File fix committed as 5e44773, deployed as b3e1f80.

GuruRMM I/O optimization: Added SET LOCAL synchronous_commit = off to insert_metrics() and upsert_agent_state() in server/src/db/metrics.rs. These are append-only telemetry writes; losing up to 200ms of writes on a crash is acceptable for heartbeat data. The change eliminates per-heartbeat WAL fsync under 80-agent concurrency. Committed as e729a9d (rebased to ebfb997).

Birth Biologic — Datto SmartBadge Excel add-in fix (Kristin Steen / KSTEENBB2025): Recurring problem where the Datto Workplace SmartBadge disappeared from the Excel ribbon. Investigated via GuruRMM RMM commands against agent ee3c6aea. Root cause: both Datto Workplace2 (v10.53.4) and Workplace Desktop (v8.50.13) were installed simultaneously. Workplace Desktop's installer added a new Datto.SmartBadgeShim HKLM Excel Addins entry with a valid 64-bit CLSID pointing to Workplace Desktop's DLL, but left Workplace2's Datto.SmartBadgeShim_CC entry in place. The _CC CLSID ({3C639243-95A2-400D-B4B4-4384DA7F61D3}) had no 64-bit InprocServer32 in HKLM\SOFTWARE\Classes\CLSID — only a WOW64 (x86) entry pointing to Workplace2's x86 DLL. 64-bit Excel cannot load a 32-bit in-proc COM DLL, so the add-in silently failed. Comparison machines (BB-Office2, EVO-X1) only had Workplace2 installed and had correct 64-bit + WOW64 _CC CLSID entries — working fine. All three machines had the same Office build: M365 C2R 16.0.19929.20172.

Remediation: (1) registered {3C639243} 64-bit path in HKLM\SOFTWARE\Classes\CLSID pointing to Workplace Desktop's DattoSmartBadgeShim_x64.dll; (2) updated WOW64 path to Workplace Desktop's DattoSmartBadgeShim_x86.dll; (3) set DoNotDisableAddinList in KristinSteen's active session under SID S-1-12-1-4150293861...; (4) silently uninstalled Datto Workplace2 v10.53.4 via RMM — exit 0, clean removal, directory gone. Post-fix both add-in entries showed LoadBehavior=3 with valid DLL paths. User instructed to close and reopen Excel. Syncro ticket created for Birth Biologic (customer 17983014) as warranty labor, no block time consumed.


Key Decisions

  • Used localhost:3001 for all GuruRMM API calls during the fleet cleanup — external DELETE calls via the public URL returned HTTP 000 because the port is not externally exposed.
  • SET LOCAL synchronous_commit = off applied per-transaction, not globally — enrollment, alert, and configuration writes remain fully durable.
  • Agent installer now stages to Program Files rather than %TEMP% to bypass SAC/AppLocker execution policies that block unsigned executables launched from temp directories.
  • Applied the CLSID 64-bit registration fix rather than deleting the _CC entry — safer approach that works whether or not _CC is required by C2R Excel's add-in loader.
  • Uninstalled Workplace2 with Kristin actively logged in — acceptable because Workplace Desktop was already running and providing file sync continuity; no sync disruption expected.

Problems Encountered

  • Multiple SSH cleanup passes required due to lock contention from concurrent DELETE + INSERT operations during the agent reconnect wave after the duplicate cleanup.
  • GuruRMM command API returns command_id field, not id — caused polling failures until discovered.
  • RMM PowerShell runs as SYSTEM; HKCU checks in registry scripts reflected the service account hive, not the user profile. Worked around by attempting reg load on NTUSER.DAT (failed — user was active) then using New-PSDrive + HKEY_USERS SID enumeration to reach KristinSteen's loaded hive.
  • reg load on KristinSteen's NTUSER.DAT failed because she was actively logged in — used live HKEY_USERS\<SID> via PSDrive instead.

Configuration Changes

  • server/src/api/install.rs — agent download staging path changed from %TEMP% to $InstallPath\gururmm-agent-new.exe (commits 8e07767 / e239b27)
  • server/src/db/metrics.rsSET LOCAL synchronous_commit = off added to insert_metrics() and upsert_agent_state() (commits e729a9d / ebfb997)
  • Birth Biologic / KSTEENBB2025 registry: HKLM\SOFTWARE\Classes\CLSID\{3C639243-95A2-400D-B4B4-4384DA7F61D3}\InprocServer32 (Default) set to Workplace Desktop x64 DLL; ThreadingModel = Apartment
  • Birth Biologic / KSTEENBB2025 registry: HKLM\SOFTWARE\Classes\WOW6432Node\CLSID\{3C639243-95A2-400D-B4B4-4384DA7F61D3}\InprocServer32 (Default) updated to Workplace Desktop x86 DLL
  • Birth Biologic / KSTEENBB2025: Datto Workplace2 v10.53.4 uninstalled silently via RMM

Credentials & Secrets

  • GuruRMM API admin: claude-api@azcomputerguru.com / ClaudeAPI2026!@# — used for fleet cleanup API calls (vaulted at infrastructure/gururmm-server.sops.yamlcredentials.gururmm-api)

Infrastructure & Servers

  • GuruRMM API: localhost:3001 (used for fleet cleanup — port not externally exposed)
  • GuruRMM agent under investigation: ee3c6aea (KSTEENBB2025 at Birth Biologic)
  • Birth Biologic Syncro customer ID: 17983014
  • Datto Workplace Desktop DLL path (KSTEENBB2025): C:\Program Files\Datto\Workplace\DattoSmartBadgeShim_x64.dll and _x86.dll
  • KristinSteen SID: S-1-12-1-4150293861-... (partial; full SID enumerated at runtime via HKEY_USERS PSDrive)

Commands & Outputs

# Fleet duplicate cleanup — identify stale records (lowest last_seen per hostname+site_id pair)
GET http://localhost:3001/api/agents?per_page=200
# Group by hostname+site_id, delete the older record in each pair via:
DELETE http://localhost:3001/api/agents/<id>

# Push DoNotDisableAddinList to KristinSteen's session (via RMM command to ee3c6aea)
$regPath = "HKCU:\SOFTWARE\Microsoft\Office\16.0\Excel\Resiliency\DoNotDisableAddinList"
New-Item -Path $regPath -Force | Out-Null
Set-ItemProperty -Path $regPath -Name "Datto.SmartBadgeShim_CC" -Value 1 -Type DWord
Set-ItemProperty -Path $regPath -Name "Datto.SmartBadgeShim" -Value 1 -Type DWord

# Silent Workplace2 uninstall (via RMM shell command)
$pkg = Get-WmiObject Win32_Product | Where-Object { $_.Name -like "*Workplace*" -and $_.Version -like "10.*" }
$pkg.Uninstall()
# Exit 0, directory removed cleanly

Key outcomes:

  • Fleet deduplication: 101 agents → mid-60s (clean count)
  • Install script: agent binary now executes from Program Files, bypassing SAC/AppLocker
  • I/O optimization: WAL fsync eliminated on telemetry writes; durability preserved on config/enrollment
  • Birth Biologic: SmartBadge functional after CLSID fix + Workplace2 removal; Syncro ticket filed

Pending / Incomplete Tasks

  • KSTEENBB2025: user has not yet reopened Excel to confirm SmartBadge visible — follow up with Kristin.
  • Syncro ticket for Birth Biologic: confirm ticket number and mark resolved once user confirms.
  • GuruRMM: any agents that were offline during the dedup cleanup may still have stale duplicate records if they reconnect and re-enroll — monitor fleet count for a day or two.
  • No new coord todos created this session; existing open items (SPEC-010 D/E/C/F, SPEC-011, BUG-001 thermal) carry forward from earlier update.

Reference Information

  • gururmm commits this update:
    • 8e07767 / e239b27 — fix(install): stage download to Program Files instead of %TEMP%
    • 5e44773 / b3e1f80 — fix(install): Unblock-File added (prior context)
    • e729a9d / ebfb997 — perf(db): SET LOCAL synchronous_commit=off for telemetry writes
  • Birth Biologic Syncro customer ID: 17983014
  • GuruRMM agent (KSTEENBB2025): ee3c6aea
  • Datto CLSID fixed: {3C639243-95A2-400D-B4B4-4384DA7F61D3}
  • Office build across all three BB machines: M365 C2R 16.0.19929.20172