Session Log — 2026-05-28
User
- User: Mike Swanson (mike)
- Machine: GURU-5070
- Role: admin
Session Summary
Session opened with two unread coord messages from Howard (Howard-Home/claude-main): SPEC-010 (Agent UX Improvements & Bug Fixes, 6 items) and SPEC-011 (ARP Programs and Features registration). Both messages were marked as read. Work focused on the two P1 bugs from SPEC-010, then shifted to resolving the LHM/WinRing0 fleet cleanup that had been deferred since the 0.6.46 build.
BUG-013 (agent/src/metrics/mod.rs:538): logged_in_username() was using sysinfo::Users::iter().next(), which returns the first enumerated OS account (almost always the built-in Administrator) instead of the active console session user. Fixed with a platform-split implementation: Windows uses WTSGetActiveConsoleSessionId + WTSQuerySessionInformationW(WTSUserName) via the existing windows crate (the same import pattern already used in watchdog/wts.rs); non-Windows uses max_by_key(process_count) as a better heuristic than .next().
BUG-014 (dashboard/src/pages/SiteDetail.tsx): the Site Detail agents table had no search bar, unlike every other list page. Fixed by adding agentSearch state, a filteredSiteAgents derived array, a search <Input> above the table, and a "No agents match your search." empty state. Both fixes were committed together as 94234af and pushed to gururmm main.
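The non-Windows fallback described for BUG-013 can be illustrated with a small sketch. It is written in Python rather than the agent's Rust, and the account names and process counts are invented for illustration. It only shows why picking the account that owns the most processes is a better guess than taking the first enumerated one; it is not the committed implementation.

```python
from typing import Optional

# Each entry is a hypothetical (account_name, process_count) pair,
# listed in whatever order the OS happens to enumerate accounts.
UserList = list[tuple[str, int]]


def first_enumerated(users: UserList) -> Optional[str]:
    """Old behaviour: return whichever account is listed first."""
    return users[0][0] if users else None


def most_active(users: UserList) -> Optional[str]:
    """Heuristic: return the account that owns the most running processes."""
    if not users:
        return None
    return max(users, key=lambda entry: entry[1])[0]


# Illustrative data only; not taken from a real machine.
sample: UserList = [("nobody", 0), ("svc-backup", 2), ("mike", 57)]

print(first_enumerated(sample))
print(most_active(sample))
```

The heuristic can still be wrong on hosts where a service account runs more processes than the interactive user, which is why the Windows build uses the WTS console session query instead.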
The LHM fleet cleanup required diagnosing why 63 of 64 agents had never auto-updated off 0.6.39. Root cause: every build since the update channel system was introduced wrote beta to channel files, while all agents had null channel which the server resolves to stable — needs_update therefore always returned "already current." Fixed by: (1) promoting 0.6.47 to stable via the API (5 Windows channel files updated), (2) promoting 0.6.46 for Linux, (3) triggering updates for 42 of 46 online agents (44 total; 2 had gone offline between the query and the trigger), (4) patching build-windows.sh and build-linux.sh on the server to write stable instead of beta going forward. The stray n# artifact on build-linux.sh line 54 was also corrected in the same pass.
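The channel mismatch can be modelled in a few lines. This is a sketch under stated assumptions: the name needs_update comes from the log, but its signature, the resolve_channel helper, and the version comparison shown here are hypothetical stand-ins for whatever the server actually does.

```python
from typing import Optional

DEFAULT_CHANNEL = "stable"  # per the log, a null agent channel resolves to stable


def resolve_channel(agent_channel: Optional[str]) -> str:
    """Hypothetical helper: map a null channel to the default."""
    return agent_channel or DEFAULT_CHANNEL


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '0.6.47' into (0, 6, 47) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def needs_update(agent_version: str,
                 agent_channel: Optional[str],
                 builds: dict[str, str]) -> Optional[str]:
    """Return the newest build on the agent's channel that is newer than
    the agent, or None when the agent counts as already current."""
    channel = resolve_channel(agent_channel)
    candidates = [
        version for version, build_channel in builds.items()
        if build_channel == channel
        and parse_version(version) > parse_version(agent_version)
    ]
    return max(candidates, key=parse_version) if candidates else None


# Before the fix: every build was written to the beta channel.
builds_before = {"0.6.46": "beta", "0.6.47": "beta"}
# After promotion: 0.6.47 is on stable.
builds_after = {"0.6.46": "beta", "0.6.47": "stable"}

print(needs_update("0.6.39", None, builds_before))
print(needs_update("0.6.39", None, builds_after))
```

With only beta builds published, an agent on the default channel never sees a candidate, which matches the "already current" result described above.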
To complete the LHM cleanup, sc.exe stop WinRing0_1_2_0 & sc.exe delete WinRing0_1_2_0 was pushed via the command API to all 76 Windows agents: 37 delivered immediately, 39 queued for offline agents. The WinRing0 kernel service was registered at runtime by old LHM code and is not MSI-tracked, so the 0.6.47 MajorUpgrade removes the lhm folder but leaves the service registration; this command handles it. Todo 42c08298 was closed.
Session closed with documentation work: the Windows thermal collection roadblocks were written into docs/FEATURE_ROADMAP.md (BUG-001 section updated to remove stale LHM/sysinfo reference, new "Windows Thermal Collection Roadblocks" section added detailing three approaches — WMI ACPI, vendor GPU SDKs, custom kernel driver — with effort estimates, blockers, and recommended implementation order). Committed as d4a5c13.
Key Decisions
- WTS API over sysinfo for Windows logged-in user: sysinfo::Users enumerates all local/domain accounts non-deterministically; WTS console session query is the only reliable way to get the interactive user. Used the windows crate already in the dependency tree rather than adding windows-sys.
- #[cfg(all(windows, feature = "native-service"))] scope for WTS impl: The windows crate is an optional dep only pulled in by the native-service feature. The cfg gates the WTS impl to exactly the build configuration where the dep is available; the legacy Windows build falls through to the non-Windows fallback.
- Promoted 0.6.47 not 0.6.46 for Windows: 0.6.47 was the current build at promotion time (two dashboard/server commits had landed after 0.6.46). Linux remained at 0.6.46 (no newer Linux binary had been produced yet).
- & not && for sc.exe chain: sc.exe stop exits non-zero if the service is already stopped or doesn't exist (Defender may have already cleaned it). Using & ensures sc.exe delete runs regardless of stop's exit code.
- Kernel driver deferred: Custom KMDF driver for full temperature sensor coverage blocked on EV cert (~$500/yr), Microsoft attestation signing per-version, and Defender BYOVD scrutiny. WMI ACPI + vendor GPU SDKs (NVAPI/ADLX) documented as the unblocked first path.
- Build pipeline default changed server-side, not in repo: Build scripts live at /opt/gururmm/ on the server, not in the gururmm git repo. Changes were made directly on the server via ssh+sudo.
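The difference between the two chaining operators in the sc.exe decision can be shown with a small simulation. This is a Python model of the cmd.exe semantics, not something that runs sc.exe; the exit code of 1 is a placeholder for any non-zero result, and run_chain is a hypothetical helper.

```python
def run_chain(steps: list[tuple[str, int]], operator: str) -> list[str]:
    """Simulate cmd.exe command chaining.

    steps    -- (command_name, simulated_exit_code) pairs, in order
    operator -- '&'  runs every command unconditionally
                '&&' runs the next command only if the previous one exited 0
    Returns the names of the commands that actually ran.
    """
    if operator not in ("&", "&&"):
        raise ValueError(f"unsupported operator: {operator!r}")
    ran = []
    for name, exit_code in steps:
        ran.append(name)
        if operator == "&&" and exit_code != 0:
            break  # '&&' short-circuits after the first failure
    return ran


# Simulated case: the service is already stopped, so 'stop' fails.
steps = [("sc.exe stop", 1), ("sc.exe delete", 0)]

print(run_chain(steps, "&"))
print(run_chain(steps, "&&"))
```

In the simulated failure case only the & chain reaches the delete step, which is the behaviour the cleanup command relies on.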
Problems Encountered
- sed -i permission denied in /opt/gururmm/: sed -i creates a temp file in the same directory; the guru user lacks write permission there. Resolved by editing copies in /tmp then sudo cp back.
- Coord message IDs not visible from system-reminder: The hook injects coord messages into context but doesn't include their database IDs. Had to query the API to retrieve IDs before marking as read. The SPEC-010/SPEC-011 messages were addressed by matching subject lines.
- Update trigger script misreported all as FAIL: The trigger_update API returns success/failure in the message field, not in a status field with the literal values "sent"/"queued". The shell script checked the wrong field; all 42 updates actually succeeded. Similarly, the sc.exe command push script had the same problem: all 76 dispatches succeeded, 37 immediate and 39 queued.
- Fleet frozen, unexpected agent count on 0.6.47: Only 1 of 64 agents was on 0.6.47 before the channel fix, despite 0.6.47 having been built. Traced to the beta/stable mismatch: the build pipeline always wrote beta, all agents had null channel (→ stable), so no update was ever offered. Root cause was in the build scripts, not the server or agents.
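The sed -i workaround from the first item (edit a copy in a writable directory, then copy it back with elevated rights) can be sketched as follows. This is a Python illustration using temporary directories in place of /opt/gururmm/ and /tmp; the script contents and the patch_via_copy helper are hypothetical, and the final privileged copy is left as a comment.

```python
import shutil
import tempfile
from pathlib import Path


def patch_via_copy(target: Path, old: str, new: str, workdir: Path) -> Path:
    """Copy target into workdir and apply the replacement to the copy.

    Nothing is written next to the original, so no write permission is
    needed on the original's directory. Note that str.replace changes
    every occurrence, whereas sed's s/// without the g flag changes only
    the first match on each line.
    """
    work_copy = workdir / target.name
    shutil.copy2(target, work_copy)
    work_copy.write_text(work_copy.read_text().replace(old, new))
    return work_copy


with tempfile.TemporaryDirectory() as protected, \
        tempfile.TemporaryDirectory() as scratch:
    # Stand-in for the server-side build script; the line is invented.
    script = Path(protected) / "build-windows.sh"
    script.write_text('echo "beta" > "$OUT.channel"\n')

    patched = patch_via_copy(script, 'echo "beta"', 'echo "stable"', Path(scratch))
    patched_text = patched.read_text()
    original_text = script.read_text()
    # On the real server the last step would be: sudo cp <patched> <target>

print(patched_text, end="")
```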
Configuration Changes
- projects/msp-tools/guru-rmm/agent/src/metrics/mod.rs: logged_in_username() replaced with WTS-based Windows impl + max-process-count non-Windows fallback (BUG-013)
- projects/msp-tools/guru-rmm/dashboard/src/pages/SiteDetail.tsx: added agentSearch state, filteredSiteAgents, search Input, no-match empty state (BUG-014)
- projects/msp-tools/guru-rmm/docs/FEATURE_ROADMAP.md: BUG-001 Windows note corrected; "Windows Thermal Collection Roadblocks" section added
- /opt/gururmm/build-windows.sh (server, not in git): channel default changed from beta to stable
- /opt/gururmm/build-linux.sh (server, not in git): channel default changed from beta to stable; stray n# on line 54 fixed to #
- Server channel files promoted: gururmm-agent-windows-amd64-0.6.47.exe.channel, gururmm-agent-windows-x86-0.6.47.exe.channel, gururmm-agent-windows-legacy-amd64-0.6.47.exe.channel, gururmm-agent-windows-legacy-x86-0.6.47.exe.channel, gururmm-agent-base-0.6.47.msi.channel, gururmm-agent-linux-amd64-0.6.46.channel; all changed from beta to stable
Credentials & Secrets
- GuruRMM API admin: claude-api@azcomputerguru.com / [redacted] (vaulted at infrastructure/gururmm-server.sops.yaml → credentials.gururmm-api)
- JWT secret: [redacted] (vaulted at projects/gururmm/api-server.sops.yaml)
Infrastructure & Servers
- GuruRMM server: 172.16.3.30:3001 (Rust/Axum API), 172.16.3.30:80/443 (nginx dashboard)
- Build scripts: /opt/gururmm/build-windows.sh, /opt/gururmm/build-linux.sh on 172.16.3.30
- Downloads dir: /var/www/gururmm/downloads/ on 172.16.3.30
- Gitea (internal): http://172.16.3.20:3000/azcomputerguru/gururmm
- Coord API: http://172.16.3.30:8001/api/coord
Commands & Outputs
# Promote 0.6.47 to stable (Windows)
POST http://172.16.3.30:3001/api/updates/rollouts/0.6.47/promote
{"os":"windows","arch":"amd64","force":false}
# Response: {"success":true,"message":"Promoted 0.6.47 to stable channel (5 files updated)","files_updated":5}
# Promote 0.6.46 to stable (Linux)
POST http://172.16.3.30:3001/api/updates/rollouts/0.6.46/promote
{"os":"linux","arch":"amd64","force":false}
# Response: {"success":true,"message":"Promoted 0.6.46 to stable channel (2 files updated)","files_updated":2}
# Trigger fleet update (per-agent loop, 42 of 46 online Windows agents succeeded)
POST http://172.16.3.30:3001/api/agents/<id>/update
# WinRing0 cleanup command pushed to all 76 Windows agents
POST http://172.16.3.30:3001/api/agents/<id>/command
{"command_type":"shell","command":"sc.exe stop WinRing0_1_2_0 & sc.exe delete WinRing0_1_2_0","timeout_seconds":30}
# 37 delivered immediately, 39 queued for offline agents
# Fix build pipeline channel default (on 172.16.3.30)
cp /opt/gururmm/build-windows.sh /tmp/build-windows.sh
sed -i -e 's/echo "beta"/echo "stable"/' ... /tmp/build-windows.sh
sudo cp /tmp/build-windows.sh /opt/gururmm/build-windows.sh
# Same for build-linux.sh
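For reference, the promotion calls above could be constructed as follows. This is a minimal sketch that only builds the request object and does not send it; the URL and JSON body come from the commands in this log, while build_promote_request is a hypothetical helper. Authentication is omitted because the log does not show how the requests were authorized.

```python
import json
from urllib.request import Request

API_BASE = "http://172.16.3.30:3001"  # GuruRMM API address from this log


def build_promote_request(version: str, os_name: str,
                          arch: str = "amd64", force: bool = False) -> Request:
    """Build (but do not send) a channel promotion request."""
    body = json.dumps({"os": os_name, "arch": arch, "force": force}).encode("utf-8")
    return Request(
        f"{API_BASE}/api/updates/rollouts/{version}/promote",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_promote_request("0.6.47", "windows")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Sending it would be a matter of passing req to urllib.request.urlopen once the required auth header is added.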
Fleet state before this session: 64 agents; 1 on 0.6.47, 38 on 0.6.39, remainder on 0.6.2–0.6.43. All on null channel (→ stable). All builds beta. Net: zero auto-updates ever delivered.
Fleet state after: 0.6.47 stable promoted. 42 updates in flight. 18 offline agents will update on reconnect. WinRing0 service deletion dispatched to all 76 Windows agents (37 delivered immediately, 39 queued for reconnect).
Pending / Incomplete Tasks
- 18 offline agents have updates queued (0.6.39→0.6.47) but haven't reconnected yet. Will apply automatically on reconnect.
- 39 offline Windows agents have sc.exe WinRing0 cleanup queued. Will run on reconnect.
- macOS agents (0.6.41) have no newer stable build available yet. Next macOS build will promote to stable automatically with the pipeline fix.
- SPEC-010 items D, E, C, F remain unimplemented (logged-in user on agent cards, alert badges, process kill, inline notes). BUG-013+014 prerequisite done.
- SPEC-011 (ARP Programs and Features registration in installer) — P2, installer-only, not started.
- BUG-001 Windows thermal collection — WMI ACPI + NVAPI paths documented as unblocked; implementation deferred.
- build-linux.sh duplicate "MARK AS STABLE CHANNEL" block (lines 55-72 appear twice): pre-existing issue, noted in audit todo 54239760, not addressed this session.
- The SPEC-010/SPEC-011 coord messages were ALL_SESSIONS broadcasts from Howard-Home/claude-main. They were marked read under the GURU-5070 session ID, so they may still appear unread in other sessions.
Reference Information
- gururmm commits this session:
  - 94234af: fix(agent,dashboard): fix logged-in user detection and add site agent search (BUG-013 + BUG-014)
  - d4a5c13: docs(roadmap): document Windows thermal collection roadblocks
- Coord todos closed: 42c08298 (WinRing0 kernel-driver cleanup)
- Coord todos referenced: bde31c52 (LHM headless temp fix, now superseded by LHM removal), 54239760 (Phase 3 audit remediation)
- Coord messages marked read: 7bdc6d3c (SPEC-011), 3fe667e1 (SPEC-010)
- SPEC files: docs/specs/SPEC-010-agent-ux-improvements.md, docs/specs/SPEC-011-arp-programs-features-registration.md
- Fleet update API: POST http://172.16.3.30:3001/api/agents/:id/update
- Command dispatch API: POST http://172.16.3.30:3001/api/agents/:id/command
- Channel promotion API: POST http://172.16.3.30:3001/api/updates/rollouts/:version/promote
- Downloads dir channel files: /var/www/gururmm/downloads/*.channel
- Build scripts (server-local, not in git): /opt/gururmm/build-linux.sh, /opt/gururmm/build-windows.sh