claudetools/wiki/projects/gururmm-agent.md
Mike Swanson 70c496bb30 wiki: compile 5 missing articles + dedupe neptune queue entry
Seeded via /wiki-compile (parallel sub-agents):
- clients: gonzvar-tax-services, tohono-oodham-doit (Syncro 33069069),
  tucson-golden-corral (Syncro 3859123)
- projects: gururmm-agent (artifact-based, agent/ @ origin/main), msp-tools (umbrella)
Index rows added for all five. Deduped the duplicate system:neptune compile-queue
entry (merged the cert/DkimSigner note into one).

Left as-is (intentional, not duplicates/dead): wiki/projects/guru-rmm.md is a
redirect tombstone; the patterns/tailscale-client-enroll.ps1 index link is valid
(the .ps1 script exists).
2026-06-12 08:06:07 -07:00

type: project
name: gururmm-agent
display_name: GuruRMM Agent
last_compiled: 2026-06-12
compiled_by: GURU-5070/claude-main
sources:
  - gururmm@a794a7f: agent/src/main.rs
  - gururmm@a794a7f: agent/src/commands/mod.rs
  - gururmm@a794a7f: agent/src/transport/mod.rs
  - gururmm@a794a7f: agent/src/metrics/mod.rs
  - gururmm@a794a7f: agent/src/checks.rs
  - gururmm@a794a7f: agent/src/updater/mod.rs
  - gururmm@a794a7f: agent/src/bsod.rs
  - gururmm@a794a7f: agent/src/watchdog/mod.rs
  - gururmm@a794a7f: agent/src/inventory.rs
  - gururmm@a794a7f: agent/src/users.rs
  - gururmm@a794a7f: agent/src/tunnel/mod.rs
  - gururmm@a794a7f: agent/src/event_log.rs
  - gururmm@a794a7f: agent/src/compliance.rs
  - gururmm@a794a7f: agent/src/discovery/mod.rs
  - gururmm@a794a7f: agent/src/registry_ops/mod.rs
  - gururmm@a794a7f: agent/src/vss.rs
  - gururmm@a794a7f: agent/Cargo.toml
  - gururmm@a794a7f: agent/agent.toml.example
  - gururmm@a794a7f: server/migrations/ (059 migrations, filenames as capability timeline)
  - gururmm@a794a7f: docs/FEATURE_ROADMAP.md
  - gururmm@a794a7f: git log origin/main -- agent/ (recent 30 commits)
  - projects/gururmm-agent/session-logs/2026-05-25-recovered-review-fix-audit-2-remediation-branch-status.md
  - projects/gururmm-agent/session-logs/2026-06-01-recovered-investigate-blue-screen-and-test-detection-featu.md
  - projects/gururmm-agent/session-logs/2026-06-01-recovered-investigate-blue-screen-and-test-detection-featu-f5631414.md
  - wiki/projects/gururmm.md (cross-reference)
backlinks:
  - projects/gururmm

GuruRMM Agent

Summary

The GuruRMM agent is the endpoint component of the gururmm platform. It is a Rust binary that installs as a long-running system service on managed Windows, Linux, and macOS endpoints. The agent connects to the GuruRMM server over an authenticated WebSocket, reports system metrics, executes remote commands, manages self-updates, and runs a variety of monitoring and compliance tasks.

Current version: 0.6.66 at agent/Cargo.toml HEAD (a794a7f). Fleet is converged to 0.6.63 stable as of 2026-06-11 (see gururmm for live fleet state). This article covers agent capabilities only — server, dashboard, and platform architecture are documented in gururmm.

Crate name: gururmm-agent. Single binary; CLI subcommands select the operational mode (run, install, uninstall, start, stop, status, generate-config, watchdog, vss-snapshot, service, watchdog-service).


Capabilities / Feature Set

Monitoring and Telemetry

Periodic metrics (default interval 60s, configurable via policy ConfigUpdate):

| Metric | Notes |
| --- | --- |
| CPU usage % | Cross-platform via sysinfo |
| Memory used/total bytes + % | Cross-platform |
| Disk used/total bytes + % (primary disk) | Cross-platform |
| Network RX/TX bytes (delta) | Cross-platform |
| OS type, version, hostname | Cross-platform |
| Uptime seconds, boot time | Cross-platform |
| Logged-in user + idle seconds | Win: GetLastInputInfo; Linux: xprintidle; macOS: CGEventSource (verify coverage) |
| Public/WAN IP | Cached; fetched periodically from external service |
| Top 10 processes by CPU | Cross-platform via sysinfo::Processes |
| Top 10 processes by memory | Cross-platform |
| CPU package temperature | Via sysinfo::Components. Windows: thermal zones are only visible when the BIOS exports them over ACPI, so coverage is firmware-dependent. LibreHardwareMonitor was removed 2026-05-27 (CVE-2020-14979, WinRing0 Defender quarantine). Windows thermal collection is currently partial or None on most hosts. |
| GPU temperature | Via sysinfo::Components. Same Windows caveat as above. |
| All hardware sensor readings (label, value, unit, critical threshold) | Via sysinfo::Components. Reliable on Linux; variable on Windows/macOS. |

Network state (sent on connect and on interface change): per-interface IPv4/IPv6 addresses, MAC, derived CIDR subnets.

Log upload: Agent log bundle sent every 12 hours (LogUpload message). The upload task uses a watch::Sender for the current WS sender so it never holds a stale handle from a prior connection.


Remote Execution

Commands are dispatched by the server as CommandPayload messages over the WebSocket.

Command types (from transport::CommandType enum):

| Type | Notes |
| --- | --- |
| shell | cmd.exe on Windows, bash on Unix. Also accepts the alias "cmd" (backward compat: servers that send command_type: "cmd" historically triggered a parse failure that silently dropped the message; fixed in commit 3de9faf). |
| powershell | Windows PowerShell. Also accepts the alias "powershell". |

Extended types (python, script, claude_task) are referenced in the parent article at gururmm and in agent/src/scripts.rs / agent/src/claude.rs; confirm against those modules for current state.
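
The alias handling described above can be illustrated with a plain FromStr implementation. This is a minimal sketch, not the agent's code: the real transport::CommandType is presumably deserialized from JSON, and the UnknownCommandType error type is a hypothetical name introduced here. Only the wire strings "shell", "cmd", and "powershell" come from this article.

```rust
use std::str::FromStr;

/// Sketch of the two documented command types (extended types omitted).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum CommandType {
    Shell,
    PowerShell,
}

/// Hypothetical error carrying the unrecognized wire string, so the caller
/// can report it back instead of silently dropping the message.
#[derive(Debug, PartialEq, Eq)]
pub struct UnknownCommandType(pub String);

impl FromStr for CommandType {
    type Err = UnknownCommandType;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            // "cmd" is accepted as a backward-compatible alias for "shell".
            "shell" | "cmd" => Ok(CommandType::Shell),
            "powershell" => Ok(CommandType::PowerShell),
            other => Err(UnknownCommandType(other.to_string())),
        }
    }
}
```

Returning an explicit error for unknown strings mirrors the behavior the article describes after commit 3de9faf, where an unparseable command is reported rather than dropped.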

Execution contexts (from transport::CommandContext, added migration 041):

| Context | Behavior |
| --- | --- |
| system (default) | Runs in Session 0, in the service's own process context (LocalSystem on Windows). |
| user_session | Windows only. Impersonates the active logged-on user's desktop session via WTSQueryUserToken + CreateProcessAsUserW + per-user environment block. Requires an active console/RDS session. Implemented in agent/src/watchdog/wts.rs. Returns an error on non-Windows platforms. |

Command options: timeout_seconds (optional), elevated (bool), context (defaulting to system).

In-flight management:

  • Individual cancellation: server sends CancelCommand { command_id }; agent aborts the Tokio JoinHandle immediately.
  • All in-flight commands aborted on disconnect (abort_all called by WS reconnect loop).

Comms durability (Phase 1, shipped 2026-06-11 as v0.6.63):

  • Agent sends CommandAck { command_id } immediately on receipt of a dispatched command (before execution begins). Server stamps acked_at (migration 058).
  • Re-delivery dedup: CommandExecutor keeps a FIFO-bounded cache of recently-completed results (capacity 64, max 256 KB per result). A re-delivered command that is already in-flight is ignored; one that already completed re-reports the cached result without re-executing.
  • Result is cached BEFORE deregistering the running task (record_result -> complete) so a re-delivery in the race window always finds it cached.
  • Server reaper re-delivers un-acked commands past a 60s ACK deadline (returns to pending) instead of failing them. Pending commands re-offered on every heartbeat (rides the refreshed NAT conntrack). Capability gate: reaper only re-delivers for agents that have demonstrably ACK'd at least once (partial index on acked_at); old agents keep the legacy fail-on-timeout path.
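
The re-delivery dedup rules above can be sketched as a small state holder. This is an illustration under stated assumptions, not the actual CommandExecutor: the type and method names (DedupCache, Delivery, on_delivery, finish) are hypothetical, command IDs are modeled as strings, and truncating oversized results to 256 KB is an assumption (the article only states the per-result limit, not how it is enforced).

```rust
use std::collections::{HashMap, HashSet, VecDeque};

const CACHE_CAPACITY: usize = 64; // from the article
const MAX_RESULT_BYTES: usize = 256 * 1024; // from the article

/// What the agent should do with an incoming (possibly re-delivered) command.
#[derive(Debug, PartialEq)]
pub enum Delivery {
    Execute,
    IgnoreInFlight,
    Replay(Vec<u8>),
}

#[derive(Default)]
pub struct DedupCache {
    running: HashSet<String>,
    order: VecDeque<String>, // insertion order for FIFO eviction
    results: HashMap<String, Vec<u8>>,
}

impl DedupCache {
    pub fn on_delivery(&mut self, id: &str) -> Delivery {
        if let Some(cached) = self.results.get(id) {
            return Delivery::Replay(cached.clone());
        }
        // insert() returns false when the id is already registered as running.
        if !self.running.insert(id.to_string()) {
            return Delivery::IgnoreInFlight;
        }
        Delivery::Execute
    }

    pub fn finish(&mut self, id: &str, mut output: Vec<u8>) {
        output.truncate(MAX_RESULT_BYTES); // assumption: oversized results are cut
        // Record the result first...
        if self.results.insert(id.to_string(), output).is_none() {
            self.order.push_back(id.to_string());
            if self.order.len() > CACHE_CAPACITY {
                if let Some(oldest) = self.order.pop_front() {
                    self.results.remove(&oldest);
                }
            }
        }
        // ...and only then deregister the running task, so a racing
        // re-delivery never sees "not running and not cached".
        self.running.remove(id);
    }
}
```

In the real agent these two steps run across async tasks, which is why the ordering matters; in this single-threaded sketch the ordering is shown only to make the invariant visible.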

Hardware Inventory

agent/src/inventory.rs: HardwareInventory struct, sent on connect and on server request (InventoryReport message).

| Field group | Fields |
| --- | --- |
| System identity | manufacturer, model, serial_number, bios_version |
| CPU | cpu_model, cpu_cores, cpu_threads, cpu_speed_mhz |
| Memory | total_memory_mb |
| Disks | name, mount, total_gb, fs_type (Vec) |
| Network | name, IP, MAC, speed_mbps (Vec) |
| Software | name, version, publisher (Vec, installed applications) |
| Services | name, display_name, status, start_type (Vec) |
| OS detail | os_name, os_version, os_build |
| Windows OS product type | os_product_type: 1=Workstation, 2=DC, 3=Server (None on Linux/macOS) |
| Windows OS edition | os_edition: "Pro", "Standard", "Datacenter", etc. (None on Linux/macOS) |
| VM / container | is_virtual_machine, hypervisor_type, vm_uuid, is_hypervisor, hosted_vm_uuids, is_container, is_unraid |
| Agent version | agent_version |

Collection uses platform-specific subprocess calls (wmic/dmidecode/system_profiler) and sysinfo. Fields are Option<T> to avoid panics on platforms that cannot supply them.


User Inventory

agent/src/users.rs: UserInventory struct, sent on connect and on server request (UserInventoryReport message). Policy-scheduled (default 24h interval).

Per-user fields: username, display_name, account_type (local/domain/aad), enabled, password_never_expires, password_expired, last_logon, is_admin, email (AD), upn (AD), department (AD), group membership.

Collection per platform:

  • Windows: Get-LocalUser, Get-ADUser (if domain-joined), dsregcmd /status (Azure AD/hybrid). Group membership from Get-LocalGroupMember. DC detection (is_dc) from AD role query.
  • Linux: getent passwd + /etc/group.
  • macOS: dscl, dsconfigad, dseditgroup.

User actions dispatched by server: enable/disable accounts, expire/un-expire password, create accounts, reset passwords. Executed via PowerShell on Windows, shell commands on Linux/macOS. Results reported as UserActionResult.


Checks

agent/src/checks.rs — server-defined check configurations (CheckPayload) executed on demand or on schedule.

| Check type | Implementation |
| --- | --- |
| cpu | sysinfo global CPU usage; value returned, threshold evaluated server-side |
| memory | sysinfo total/used memory; percentage computed |
| disk | sysinfo disks; usage percentage |
| ping | tokio::process::Command calls platform ping binary |
| port | TcpStream::connect with timeout |
| script | Arbitrary shell/script command; stdout/stderr/exit code captured |
| service | Win: sc.exe query; Linux: systemctl is-active; reports running/stopped |

CPU, memory, and disk checks wrap blocking sysinfo calls in spawn_blocking. Unknown check types return status failing with an error message (no silent no-op).


Self-Update

agent/src/updater/mod.rs.

Flow:

  1. Server sends UpdatePayload (version, download URL, SHA-256 checksum).
  2. Agent downloads the binary via HTTPS (300s timeout).
  3. SHA-256 checksum verified before touching the live binary.
  4. Current binary copied to backup path (gururmm-agent.backup in config dir).
  5. New binary atomically replaces current (via temp write + rename).
  6. Agent restarts itself.
  7. On reconnect, agent reports UpdateResult with outcome. If the agent fails to reconnect within the ~180s rollback window, the backup binary is restored and the service restarts again.
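
Steps 4, 5, and the rollback half of step 7 can be sketched with the standard library alone. This is a simplified illustration, not the updater's code: the function names are hypothetical, the bytes passed in are assumed to have already passed the SHA-256 check from step 3, and the ".new" temp-file suffix is an assumption.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Back up the current binary, then replace it via temp write + rename.
/// `verified_new` is assumed to have passed checksum verification already.
pub fn swap_binary(current: &Path, backup: &Path, verified_new: &[u8]) -> io::Result<()> {
    // Step 4: keep a copy of the running version for rollback.
    fs::copy(current, backup)?;
    // Step 5: write next to the target so the rename stays on one filesystem.
    let tmp = current.with_extension("new");
    fs::write(&tmp, verified_new)?;
    fs::rename(&tmp, current)
}

/// Restore the backup if the new version never reconnects.
pub fn rollback(current: &Path, backup: &Path) -> io::Result<()> {
    fs::copy(backup, current).map(|_| ())
}
```

On Unix the rename replaces the target atomically. On Windows a running executable generally cannot be overwritten in place, which is consistent with the detached-child approach the article describes for that platform.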

Windows: backup path C:\ProgramData\GuruRMM\gururmm-agent.backup (or GuruRMM-Debug\ in debug builds). On restart, uses a detached child process that waits for parent exit before swapping.

Linux/macOS: backup at /etc/gururmm/gururmm-agent.backup.

Update channels: stable / beta / null (inherits default stable). Agent reports previous_version and pending_update_id in its next Auth payload so the server can correlate the update result. New builds are tagged beta by default; promotion to stable is a deliberate re-tag of the .channel sidecar file on the server's downloads directory.


Watchdog (Windows Only)

agent/src/watchdog/ — a second, separate Windows SCM service (GuruRMMWatchdog) that runs from the same binary via gururmm-agent watchdog.

Responsibilities:

  1. Polls SCM every 30s for GuruRMMAgent. On unexpected stop: restart with backoff 30s / 60s / 120s. After 3 failed attempts, POSTs a watchdog alert to the server and continues monitoring.
  2. Serves a named-pipe IPC channel so the main agent can request a clean service restart without racing against its own SCM stop signal.
  3. watchdog/wts.rs: WTS (Windows Terminal Services) token management for the user_session command context — acquires the active console session's user token, creates the child process with CreateProcessAsUserW, and provides the per-user environment block.
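
The restart schedule in item 1 can be expressed as a small pure function. This is a sketch under one reading of the text: the three delays are assumed to precede restart attempts 1 to 3, and the alert fires once all three have failed. The names WatchdogStep and next_step are hypothetical.

```rust
use std::time::Duration;

#[derive(Debug, PartialEq, Eq)]
pub enum WatchdogStep {
    RestartAfter(Duration),
    AlertServer,
}

/// Decide what to do after `failed_attempts` consecutive failed restarts.
pub fn next_step(failed_attempts: u32) -> WatchdogStep {
    const BACKOFF_SECS: [u64; 3] = [30, 60, 120]; // values from the article
    match BACKOFF_SECS.get(failed_attempts as usize) {
        Some(&secs) => WatchdogStep::RestartAfter(Duration::from_secs(secs)),
        // Backoff schedule exhausted: alert, then keep monitoring.
        None => WatchdogStep::AlertServer,
    }
}
```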

The ensure_watchdog_running() function is called by the main agent on startup. It uses a two-step SCM privilege model: opens with CONNECT only first (least privilege); escalates to CREATE_SERVICE only if the service does not yet exist.

Platform: The run_watchdog_service() entrypoint is a no-op stub on Linux/macOS (#[cfg(not(windows))] returns a warning).


Event Log Watches (Windows Only)

agent/src/event_log.rs — added migration 047.

Mechanism: Server pushes EventLogWatchRule configs as part of ConfigUpdate. The agent polls matching rules at a configured interval. Each rule specifies log_name, optional event_id, optional source, and optional level (critical/error/warning/information). Matches are queried via Get-WinEvent PowerShell with a since timestamp watermark. Matching entries are batched and sent as EventLogMatches messages.

Platform: Windows only. query_watch_rule is #[cfg(target_os = "windows")]. Non-Windows builds compile with a stub that returns an empty Vec.

Injection guard: log_name has single-quote characters stripped before inserting into the PowerShell filter hash.
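
A minimal sketch of that guard, assuming the rule is rendered into a Get-WinEvent -FilterHashtable literal. The function names and the way the watermark timestamp is passed are assumptions; LogName, Id, and StartTime are standard FilterHashtable keys.

```rust
/// Drop single quotes so the value cannot close the PowerShell string literal.
pub fn strip_single_quotes(value: &str) -> String {
    value.chars().filter(|c| *c != '\'').collect()
}

/// Build the hashtable literal passed to Get-WinEvent -FilterHashtable.
/// `since` is the watermark timestamp as a string (format is an assumption).
pub fn build_filter(log_name: &str, event_id: Option<u32>, since: &str) -> String {
    let mut filter = format!("@{{LogName='{}'", strip_single_quotes(log_name));
    if let Some(id) = event_id {
        filter.push_str(&format!("; Id={}", id));
    }
    filter.push_str(&format!(
        "; StartTime=[datetime]'{}'}}",
        strip_single_quotes(since)
    ));
    filter
}
```

Stripping rather than escaping is the simpler choice for log names, since legitimate Windows log names are not expected to contain quotes.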


BSOD / Kernel Crash Detection (Windows Only)

agent/src/bsod.rs — added migration 048. Shipped v0.6.51 (June 2026).

How it works:

  • Runs one-shot on agent startup, then polls on a periodic interval.
  • Enumerates C:\Windows\Minidump\*.dmp.
  • Parses the kernel dump header by hand:
    • 64-bit PAGEDU64 (DUMP_HEADER64): bugcheck code at offset 0x38 (u32), 4 parameters at 0x40 (4x u64), system filetime at 0xFA8.
    • 32-bit PAGEDUMP (DUMP_HEADER): bugcheck code at 0x28 (u32), 4 parameters at 0x2C (4x u32).
    • The minidump crate is intentionally not used — it parses Breakpad MDMP format, not Windows kernel PAGEDU64 dumps.
  • Cross-references the System event log (WER event 1001, Kernel-Power event 41) for faulting driver name and WER Report Id.
  • Computes SHA-256 of each dump file. Already-seen hashes are stored in a watermark file (C:\ProgramData\GuruRMM\bsod-seen.json). Watermark writes use a tmp-then-rename atomic pattern to survive mid-write crashes.
  • First-run suppression: On the first run (no watermark file), all existing dumps are baselined as already-seen — no retroactive alerts for crashes that predate the agent install.
  • Sends one BsodEvent per newly-detected crash. Server-side: bsod_events table (migration 048), deduplicated by (agent_id, dump_sha256), always Critical severity. Dashboard Crashes tab shipped 2026-06-07.
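
The 64-bit header read described above can be sketched as follows. The offsets are the ones listed in this article; the struct and function names are hypothetical, and the signature check at offset 0 is an addition for the sketch (kernel dumps begin with the 8-byte PAGEDU64 marker). The real bsod.rs may validate more fields.

```rust
const SIGNATURE_64: &[u8; 8] = b"PAGEDU64";
const OFF_BUGCHECK_CODE: usize = 0x38;
const OFF_BUGCHECK_PARAMS: usize = 0x40;
const OFF_SYSTEM_TIME: usize = 0xFA8;

#[derive(Debug, PartialEq, Eq)]
pub struct Bugcheck {
    pub code: u32,
    pub params: [u64; 4],
    pub system_filetime: u64,
}

fn le_u32(buf: &[u8], off: usize) -> u32 {
    let mut b = [0u8; 4];
    b.copy_from_slice(&buf[off..off + 4]);
    u32::from_le_bytes(b)
}

fn le_u64(buf: &[u8], off: usize) -> u64 {
    let mut b = [0u8; 8];
    b.copy_from_slice(&buf[off..off + 8]);
    u64::from_le_bytes(b)
}

/// Returns None when the buffer is too short or is not a 64-bit kernel dump.
pub fn parse_dump_header64(buf: &[u8]) -> Option<Bugcheck> {
    if buf.len() < OFF_SYSTEM_TIME + 8 || buf[..8] != SIGNATURE_64[..] {
        return None;
    }
    let mut params = [0u64; 4];
    for (i, p) in params.iter_mut().enumerate() {
        *p = le_u64(buf, OFF_BUGCHECK_PARAMS + i * 8);
    }
    Some(Bugcheck {
        code: le_u32(buf, OFF_BUGCHECK_CODE),
        params,
        system_filetime: le_u64(buf, OFF_SYSTEM_TIME),
    })
}
```

Only the first 0xFB0 bytes of a dump are needed for this, so a caller can read a fixed-size prefix instead of loading the whole file.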

Platform: #[cfg(target_os = "windows")]. Non-Windows compiles to an empty stub.

Validated against: A real 0x116 VIDEO_TDR_FAILURE (nvlddmkm.sys) on GURU-5070 during the 2026-06-01 session.


VSS Shadow Copy Management (Windows Only, SPEC-016)

agent/src/vss.rs + agent/src/compliance.rs — migrations 050, 051.

Mechanism: Driven entirely via PowerShell relay (crate::powershell::run_ps()); no native COM. Non-Windows platforms compile thin stubs.

Policy-driven snapshot scheduling:

  • The agent receives VSS policy as part of ConfigUpdate. It mirrors the policy to C:\ProgramData\GuruRMM\vss-policy.json on every ConfigUpdate.
  • Snapshot execution is performed by a separate short-lived process invocation (gururmm-agent vss-snapshot), triggered by a Windows Scheduled Task named GuruRMM-VSS-Snapshot. The task is registered/updated by the agent when the policy hash changes.
  • Shadow storage is always bounded: agent runs vssadmin resize shadowstorage /maxsize=N% — never left unbounded.
  • Retention governed by max count and max age, as upper bounds on top of Windows' own storage-cap eviction.
  • Per-volume first-run staggering: a vss-firstrun-<driveLetter>.flag marker gates when each additional volume starts snapshotting (C: first, then others).

DeviceObject handle: The \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopyN string is the stable identifier and is persisted at creation time. The trailing index N is not stable across reboots.

Compliance reporting (SPEC-025): agent/src/compliance.rs implements evaluate_all(), which calls each policy domain's evaluator. VSS is Domain #1. Evaluation is read-only — it reports posture (compliant, pending, non_compliant, not_applicable) without mutating the machine. VSS is the only domain in v1 that performs opt-in, kill-switchable self-heal. A compliance batch is sent as ComplianceReport. Serde-tolerant: older servers ignore the variant.

Status reporting: VssStatus message sent on a slow cadence — per-volume shadow storage usage for server cache and dashboard display.

Platform notes in code (TODO markers):

  • Linux: LVM snapshots (lvcreate --snapshot) — not implemented.
  • macOS: APFS snapshots (tmutil localsnapshot) — not implemented.

Network Discovery

agent/src/discovery/mod.rs.

How it works: Server dispatches DiscoveryScanConfig (IP ranges, ports, timeout_ms, concurrency, exclusions). Agent expands ranges to individual IPs and probes:

  1. TCP connect on each configured port (async, up to 200 concurrent probes via Semaphore).
  2. ICMP ping fallback for hosts where all TCP ports are firewalled but the host is up.
  3. ARP table lookup for MAC address.
  4. Reverse DNS lookup for hostname.
  5. OS fingerprinting from open port set.

Results are streamed back as DiscoveryResult messages (one per found device) and finalized with DiscoveryComplete (total found, duration ms).
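
The range-expansion step can be sketched for IPv4, assuming ranges arrive as CIDR blocks (the article does not state the wire format). The function name, the /16 lower bound on block size, and the skipping of network and broadcast addresses are choices made for this sketch only.

```rust
use std::net::Ipv4Addr;

/// Expand an IPv4 CIDR block into probe targets, minus exclusions.
/// Returns None for invalid prefixes or blocks larger than /16.
pub fn expand_cidr(base: Ipv4Addr, prefix: u8, exclusions: &[Ipv4Addr]) -> Option<Vec<Ipv4Addr>> {
    if !(16..=32).contains(&prefix) {
        return None;
    }
    let mask = u32::MAX << (32 - u32::from(prefix));
    let network = u32::from(base) & mask;
    let broadcast = network | !mask;
    // For /30 and larger blocks, skip the network and broadcast addresses.
    let (first, last) = if prefix < 31 {
        (network + 1, broadcast - 1)
    } else {
        (network, broadcast)
    };
    Some(
        (first..=last)
            .map(Ipv4Addr::from)
            .filter(|ip| !exclusions.contains(ip))
            .collect(),
    )
}
```

Each resulting address would then be handed to the concurrent TCP probe stage; bounding the block size keeps a mistyped prefix from queueing millions of probes.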


Registry Operations (Windows Only)

agent/src/registry_ops/ via the winreg crate.

Operations:

| Function | Notes |
| --- | --- |
| enumerate_keys(path) | Lists subkeys under a registry path |
| enumerate_values(path) | Lists values under a registry path |
| read_value(path, name) | Reads a single registry value |
| set_value(path, name, value_type, value_data) | Writes a registry value (typed, raw bytes) |
| create_key(path) | Creates a registry key |

All operations return anyhow::Error on non-Windows platforms (compile-time stubs; no silent no-op). Note from parent article gururmm: the HTTP API currently exposes read-only paths (enumerate, read_value); write paths exist in the agent but are not yet routed server-side. (verify current state)


Tunnel

agent/src/tunnel/mod.rs. Agent-side TunnelManager state machine.

Modes:

  • Heartbeat (default): periodic metrics and heartbeats.
  • Tunnel: active session with a tech (triggered by TunnelOpen { session_id, tech_id } from server). Bidirectional data relay over the existing WebSocket connection.

Channel types defined in code: Terminal, File (Phase 2+), Registry (Phase 2+), Service (Phase 2+). Currently only Terminal is operationally relevant.

Server status: The server-side tunnel skeleton exists but is not production-ready (no /tunnel API routes declared, WS handler logs "not yet implemented"). Live TTY is planned as Phase 2 of the agent-comms-durability spec.


IPC and Tray Integration

agent/src/ipc.rs.

Windows: Named-pipe IPC server for the GuruRMM Tray companion process. Linux: Unix socket IPC.

Operations available over IPC:

  • Subscribe/unsubscribe: tray receives IpcStatusUpdate broadcasts when WS connection state changes.
  • Force check-in: tray requests immediate metrics collection (wakes the metrics task via AppState::force_checkin Notify).
  • Per-section policy update: server's ConfigUpdate is relayed to the tray for display.

IPC subscribers are tracked in AppState::ipc_subscribers (RwLock<Vec<UnboundedSender<IpcStatusUpdate>>>).


Architecture

Components

| Component | Location | State |
| --- | --- | --- |
| Agent service (Windows) | C:\Program Files\GuruRMM\gururmm-agent.exe, SCM service GuruRMMAgent | Deployed; 0.6.63 stable |
| Watchdog service (Windows) | Same binary, SCM service GuruRMMWatchdog | Deployed |
| Agent service (Linux) | /usr/local/bin/gururmm-agent, systemd gururmm-agent.service | Deployed |
| Agent service (macOS) | /usr/local/bin/gururmm-agent, LaunchDaemon com.azcomputerguru.gururmm-agent.plist | Phase 1 deployed 2026-05-12 |
| Tray (Windows) | Named-pipe IPC, separate binary | Deployed; BUG-020 ghost-icon fix in beta |
| Tray (Linux) | Unix socket IPC, libappindicator/GTK | Deployed (PR #13+#14, 2026-05-24) |
| Tray (macOS) | Menu bar stub | TODO (issue #18) |

Key Files and Repos

  • Repo: azcomputerguru/gururmm, internal Gitea at http://172.16.3.20:3000
  • Submodule (dev): D:\claudetools\projects\msp-tools\guru-rmm
  • Agent source: agent/src/ within repo
  • Windows config dir: C:\ProgramData\GuruRMM\ (service files, device_id, BSOD watermark, VSS policy cache, VSS first-run flags)
  • Windows registry: HKLM\SOFTWARE\GuruRMM\SiteId (set by MSI), HKLM\SOFTWARE\GuruRMM\DeviceId (set by agent, Phase 1 durable identity)
  • Linux config: /etc/gururmm/agent.toml (root, mode 600); /var/lib/gururmm/.device-id + /etc/gururmm/.device-id (durable identity mirrors)
  • macOS config: /usr/local/etc/gururmm/site.plist (site_id/agent_key via plist crate)
  • Downloads dir (server): /var/www/gururmm/downloads/ on 172.16.3.30 — agent binaries + .channel sidecars + .sha256 checksums

Build Variants

| Artifact | Platform | Notes |
| --- | --- | --- |
| gururmm-agent-windows-amd64-<ver>.exe | Windows 10+/Server 2016+ (64-bit) | Native Windows Service (windows-service crate) |
| gururmm-agent-windows-x86-<ver>.exe | Windows 10+/Server 2016+ (32-bit) | Same, 32-bit |
| gururmm-agent-windows-legacy-amd64-<ver>.exe | Windows 7/Server 2008 R2 (64-bit) | legacy feature flag; no windows-service dependency; NSSM-based service |
| gururmm-agent-windows-legacy-x86-<ver>.exe | Windows 7/Server 2008 R2 (32-bit) | Same, 32-bit |
| gururmm-agent-base-<ver>.msi | Windows (all) | WiX v4 MSI installer; SITEKEY baked per-site on download |
| gururmm-agent-linux-amd64-<ver> | Linux (x86_64) | musl static; systemd service |
| gururmm-agent-macos-amd64-<ver> | macOS (Intel) | Mach-O, LaunchDaemon |
| gururmm-agent-macos-arm64-<ver> | macOS (Apple Silicon) | Same, arm64 |

macOS builds are manual (no CI pipeline; no build host); tagged .channel files for macOS are managed manually. (verify)

Platform Coverage

| Capability | Windows | Linux | macOS |
| --- | --- | --- | --- |
| Metrics (CPU/mem/disk/net) | Full | Full | Full |
| Temperature sensors | Partial (ACPI thermal only; LHM removed) | Full (hwmon) | Partial (SMC; Apple Silicon inconsistent) |
| Hardware/software/service inventory | Full | Full | Full |
| User/group inventory | Full (local + AD + AAD) | Full (getent) | Full (dscl/dsconfigad) |
| Checks (cpu/mem/disk/ping/port/script) | Full | Full | Full |
| Service checks | Full | Full | (verify) |
| Shell command execution | Full (cmd.exe) | Full (bash) | Full (bash) |
| PowerShell execution | Full | N/A | N/A |
| user_session execution context | Full (WTS impersonation) | Not implemented | Not implemented |
| Self-update | Full | Full | Full |
| Watchdog SCM supervision | Full | N/A (systemd handles) | N/A (launchd handles) |
| BSOD detection | Full | N/A | N/A |
| Event Log watches | Full (Get-WinEvent) | N/A | N/A |
| VSS shadow copy management | Full (SPEC-016) | Planned (LVM) | Planned (APFS) |
| Registry operations | Full | Error stub | Error stub |
| Network discovery | Full | Full | Full |
| Tunnel (terminal) | Partial (agent side built; server not production-ready) | Partial | Partial |
| Compliance reporting | Full (VSS domain) | N/A (VSS only in v1) | N/A (VSS only in v1) |
| Tray IPC | Full (named pipe) | Full (Unix socket) | Stub |

Communication Protocol

The agent communicates with the server over a persistent TLS WebSocket (wss://). All messages are JSON-serialized using a tagged enum (type/payload fields, snake_case).

Agent -> Server message types (AgentMessage): Auth, Metrics, NetworkState, CommandResult, WatchdogEvent, UpdateResult, Heartbeat, LogUpload, CommandCancelled, CommandAck, TunnelReady, TunnelData, TunnelError, ScriptResult, RequestChecks, InventoryReport, UserInventoryReport, UserActionResult, CheckResult, DiscoveryResult, DiscoveryComplete, RegistryResult, EventLogMatches, BsodEvent, VssResult, VssStatus, ComplianceReport.

Server -> Agent message types (ServerMessage): Command, ConfigUpdate, Update, Ack, Error, RequestLogUpload, CancelCommand, TunnelOpen, TunnelClose, TunnelData, RequestInventory, UserAction, RunChecks, DiscoveryScan, RegistryOp, VssOp.

Serde tolerance: Newer message types (VssStatus, ComplianceReport) are variants the server may ignore if it is running an older version. Unrecognized ServerMessage variants are NAK'd with an error response rather than silently dropped (agent: commit 3de9faf).

Auto-reconnect: The WS client loop reconnects on disconnect with exponential backoff. On reconnect: pending commands are re-dispatched, pending updates re-offered, agent requests current check configs.
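
The backoff schedule can be sketched as a capped doubling function. The base delay, the cap, and the absence of jitter are all assumptions here, since the article only says the backoff is exponential; reconnect_delay is a hypothetical name.

```rust
use std::time::Duration;

/// Delay before reconnect attempt number `attempt` (0-based): base * 2^attempt,
/// never exceeding `cap`.
pub fn reconnect_delay(attempt: u32, base: Duration, cap: Duration) -> Duration {
    // Saturate the multiplier instead of overflowing on very large attempt counts.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).map_or(cap, |d| d.min(cap))
}
```

A caller would reset the attempt counter to zero after a successful connection, so a brief drop does not inherit a long delay from an earlier outage.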


Development

Current Focus

As of 2026-06-12 (agent 0.6.66 HEAD / fleet on 0.6.63 stable):

  • Agent-comms-durability Phase 2 (planned): Live TTY over WS — seq/resume frames, single-use session token, AIMD keepalive. Not yet started.
  • Durable agent identity Phase 1 Tasks 2-3 (pending): Hardware fingerprint capture (inventory.rs baseboard serial + primary MAC); server migration for hardware_fingerprint; dashboard duplicate-hostname surfacing (read-only).
  • BSOD Phase 2/3 (deferred): BSOD events in the Alerts stream, on-demand dump upload (fetch_bsod_dump), full ~350-entry bugcheck name table (Phase 1 ships a 10-code map).
  • Windows thermal collection: WMI ACPI (MSAcpi_ThermalZoneTemperature) recommended as first unblocked path (Approach 1 in FEATURE_ROADMAP.md). NVAPI (NVIDIA GPU temps) as Approach 2. Custom kernel driver deferred.
  • Tray IPC peer authorization (Windows issue #16), logind console-user resolution (issue #17), macOS tray (issue #18), subscriber broadcast (issue #19).
  • Linux fleet unit drift: Auto-updater replaces binary but does NOT refresh the systemd unit file. Pre-BUG-016-fix agents have new binary + old unit missing StateDirectory=gururmm. Needs an ops-script pass.
  • VSS Linux/macOS: Stubs remain; LVM (Linux) and APFS (macOS) snapshots are design-level TODOs only.

Patterns and Anti-Patterns

Never repeat:

| Pattern | What Went Wrong |
| --- | --- |
| Using the minidump crate for Windows kernel dumps | Parses only Breakpad MDMP format; Windows kernel PAGEDU64 dumps require direct offset reads from DUMP_HEADER64. |
| command_type: "cmd" sent from server without agent alias | Agent did not recognize "cmd" as the Shell variant; command message was silently dropped instead of executed. Fixed commit 3de9faf. |
| Restart-Service GuruRMMAgent -Force in a remote command | Kills the agent before it can report the command result; command stays in running state forever. Use a scheduled task with a delay. |
| LHM (LibreHardwareMonitor) for Windows thermal | WinRing0 kernel driver (CVE-2020-14979); Defender quarantined it fleet-wide. Do not re-add. |
| Installer using & $stagingPath install 2>&1 \| Out-Null | Swallows all output under $ErrorActionPreference='Stop'; surfaces misleading NativeCommandError on non-zero exit. Use Start-Process -Wait -PassThru + explicit ExitCode check. Fixed commit 5c0d004. |
| Agent-level channel pin for a beta canary | Agent update_channel is lost on re-enrollment. Use a site-level or client-level channel override instead; those survive re-enrollment. |
| New agent builds tagged stable by default | Races the entire fleet to auto-update before any beta soak. All new builds default to beta; promotion to stable requires explicit re-tag of the .channel sidecar. |
| Reaper failing un-acked commands on timeout | False failures for commands black-holed by NAT conntrack gap. Reaper must only fail commands that were ACK'd but exceeded real execution timeout. Un-acked commands requeue to pending. |
| Legacy builds on the +1.77 toolchain without --ignore-rust-version | Fail the MSRV check after rust-version was added to Cargo.toml. Legacy build lines need --ignore-rust-version. |
| CRLF line endings in migration SQL files | sqlx SHA-384 checksum mismatch crashes server on start. .gitattributes + core.autocrlf=false + pre-commit hook prevents this. |
| Omitting git config --system --add safe.directory when building as root | Webhook builds run as root on a guru-owned repo; git rejects the repo as dubious ownership without this setting. Fixed 2026-06-11. |

Good patterns:

  • Platform parity rule: Any agent feature ships on Windows + Linux + macOS in the same commit. Stubs with // TODO(platform): <os> — <reason> are acceptable; silent no-ops are not.
  • Serde-tolerant messages: New AgentMessage / ServerMessage variants must not break older server/agent versions. Use #[serde(default)] on new fields; new enum variants are simply ignored by old deserializers.
  • SHA-256 watermark atomic write (bsod, vss): Always write to a .tmp file then rename over the target to avoid corrupt-on-crash.
  • CommandAck before execution: ACK is sent on RECEIPT, not on completion. The server can distinguish "never reached agent" from "still running" based on acked_at.
  • record_result before complete: Caching the result before deregistering the task handle ensures a racing re-delivery always finds the command in one of: running (ignore), completed+cached (re-report), or about-to-run (de-dup before spawn). Never in a "finished but not yet cached" gap.

Build and Deploy

  • Build trigger: Push to gururmm Gitea main branch fires a Gitea webhook to http://172.16.3.30:9000/webhook, handled by webhook-handler.py. Detects which component changed (agent vs. server) via last-built-commit-{linux,windows,mac,server} marker files.
  • Linux agent: build-linux.sh on the build server; musl static binary.
  • Windows agent: build-windows.sh dispatches SSH to Beast (GURU-BEAST-ROG, i9-14900K, primary) or Pluto (172.16.3.36, fallback). MSVC + WiX v4. Legacy builds use --ignore-rust-version. Signing via jsign + Azure Trusted Signing (Arizona Computer Guru LLC cert). Binary tagged beta in downloads .channel sidecar.
  • macOS agent: Manual build; build-macos.sh / build-macos-pkg.sh on an Apple machine. No CI pipeline yet. (verify)
  • Promotion to stable: POST /api/updates/rollouts/:version/promote body {"os","arch"} re-tags .channel sidecars. Rollback: POST /api/updates/rollouts/:version/rollback. This is intentionally a manual gate — no automated health-gated promotion yet (Phase 2 of safe-rollout spec, migration 046, is written but unwired).
  • Cargo.toml version pinning: Several crates pinned for Rust 1.77 legacy-build compatibility (edition 2024 crates, MSRV bumps). See agent/Cargo.toml comments for rationale. New deps must not pull in edition-2024 or MSRV >1.77 crates if legacy builds are still required.

Active State

Fleet on agent 0.6.63 stable as of 2026-06-11. ~168-182 agents typically online (215 enrolled). HEAD at a794a7f is v0.6.66 (not yet released). See gururmm Summary section for live fleet state, server version, and recent deployment notes.


History Highlights

  • 2025-12-15: Initial agent: WebSocket transport, basic metrics (CPU/mem/disk/net), command execution (shell/PowerShell), self-updater with rollback.
  • 2026-04-19: Temperature collection added (sysinfo Components); checks engine (cpu/mem/disk/ping/port/script/service).
  • 2026-04-29: Hardware inventory, software inventory, service inventory; VM/container/Unraid detection.
  • 2026-05-12: macOS agent Phase 1 deployed (LaunchDaemon, plist-based config, cross-compiled amd64/arm64).
  • 2026-05-15: User/group inventory (migration 037-040); DC detection; domain/AAD classification.
  • 2026-05-17: Network discovery (TCP + ICMP + ARP + rDNS + OS fingerprint, concurrent probes).
  • 2026-05-19: user_session execution context (migration 041) — WTS token impersonation for active user desktop on Windows.
  • 2026-05-21: Agent events table (migration 042); interrupted command status (migration 043).
  • 2026-05-24: Linux tray (PRs #13+#14, libappindicator/GTK + Unix socket IPC).
  • 2026-05-25 (audit-2-remediation): BUG-002 crash-detection dead code fixed (re-keyed to update_success). BUG-003 build-server.sh hardened (build lock + binary backup + auto-rollback). update_channel added to all agent API responses.
  • 2026-05-27: LibreHardwareMonitor removed fleet-wide — WinRing0 driver flagged by Microsoft Defender (CVE-2020-14979). Windows temperature collection reduced to ACPI/WMI partial coverage only.
  • 2026-05-27: Event log watches implemented (migration 047); BSOD detection spec written.
  • 2026-06-01: BSOD detection shipped (migration 048, agent v0.6.51). Validated against real 0x116 VIDEO_TDR_FAILURE (nvlddmkm.sys) on GURU-5070. First-run suppression; SHA-256 atomic watermark; DUMP_HEADER64 hand-parser.
  • 2026-06-04: VSS shadow copy management merged (SPEC-016, migration 050); SPEC-025 compliance posture (migration 051, VSS domain #1).
  • 2026-06-04: BUG-020 tray ghost/duplicate icons fixed (commit 137dd85); fix in beta.
  • 2026-06-07: Durable agent identity Phase 1 Task 1 (commit 0b81d33, v0.6.62): registry + file mirror for device_id; cleanup.ps1 whitelisted identity files. Addresses ~99% of new ghost-agent creation.
  • 2026-06-07: command_type: "cmd" alias added + unparseable commands NAK'd instead of silently dropped (commit 3de9faf, v0.6.63-pre).
  • 2026-06-11: Agent Comms Durability Phase 1 shipped (v0.6.63): CommandAck on receipt, re-delivery dedup cache, server reaper re-queues un-acked commands, pending commands re-offered on every heartbeat. Verified at PST-SERVER (behind UDR Ultra NAT).

  • gururmm — parent project article (server, dashboard, build pipeline, full architecture, deployment state)
  • gururmm-build — the production server host at 172.16.3.30 where agent binaries are built and served