---
type: project
name: gururmm-agent
display_name: GuruRMM Agent
last_compiled: 2026-06-12
compiled_by: GURU-5070/claude-main
sources:
  - "gururmm@a794a7f: agent/src/main.rs"
  - "gururmm@a794a7f: agent/src/commands/mod.rs"
  - "gururmm@a794a7f: agent/src/transport/mod.rs"
  - "gururmm@a794a7f: agent/src/metrics/mod.rs"
  - "gururmm@a794a7f: agent/src/checks.rs"
  - "gururmm@a794a7f: agent/src/updater/mod.rs"
  - "gururmm@a794a7f: agent/src/bsod.rs"
  - "gururmm@a794a7f: agent/src/watchdog/mod.rs"
  - "gururmm@a794a7f: agent/src/inventory.rs"
  - "gururmm@a794a7f: agent/src/users.rs"
  - "gururmm@a794a7f: agent/src/tunnel/mod.rs"
  - "gururmm@a794a7f: agent/src/event_log.rs"
  - "gururmm@a794a7f: agent/src/compliance.rs"
  - "gururmm@a794a7f: agent/src/discovery/mod.rs"
  - "gururmm@a794a7f: agent/src/registry_ops/mod.rs"
  - "gururmm@a794a7f: agent/src/vss.rs"
  - "gururmm@a794a7f: agent/Cargo.toml"
  - "gururmm@a794a7f: agent/agent.toml.example"
  - "gururmm@a794a7f: server/migrations/ (059 migrations, filenames as capability timeline)"
  - "gururmm@a794a7f: docs/FEATURE_ROADMAP.md"
  - "gururmm@a794a7f: git log origin/main -- agent/ (recent 30 commits)"
  - projects/gururmm-agent/session-logs/2026-05-25-recovered-review-fix-audit-2-remediation-branch-status.md
  - projects/gururmm-agent/session-logs/2026-06-01-recovered-investigate-blue-screen-and-test-detection-featu.md
  - projects/gururmm-agent/session-logs/2026-06-01-recovered-investigate-blue-screen-and-test-detection-featu-f5631414.md
  - wiki/projects/gururmm.md (cross-reference)
backlinks:
  - projects/gururmm
---

# GuruRMM Agent

## Summary

The GuruRMM agent is the endpoint component of the [[gururmm]] platform. It is a Rust binary that installs as a long-running system service on managed Windows, Linux, and macOS endpoints. The agent connects to the GuruRMM server over an authenticated WebSocket, reports system metrics, executes remote commands, manages self-updates, and runs a variety of monitoring and compliance tasks.

**Current version:** 0.6.66 at `agent/Cargo.toml` HEAD (a794a7f). Fleet is converged to 0.6.63 stable as of 2026-06-11 (see [[gururmm]] for live fleet state).

This article covers agent capabilities only — server, dashboard, and platform architecture are documented in [[gururmm]].

**Crate name:** `gururmm-agent`. Single binary; CLI subcommands select the operational mode (run, install, uninstall, start, stop, status, generate-config, watchdog, vss-snapshot, service, watchdog-service).

---

## Capabilities / Feature Set

### Monitoring and Telemetry

**Periodic metrics** (default interval 60s, configurable via policy `ConfigUpdate`):

| Metric | Notes |
|---|---|
| CPU usage % | Cross-platform via `sysinfo` |
| Memory used/total bytes + % | Cross-platform |
| Disk used/total bytes + % (primary disk) | Cross-platform |
| Network RX/TX bytes (delta) | Cross-platform |
| OS type, version, hostname | Cross-platform |
| Uptime seconds, boot time | Cross-platform |
| Logged-in user + idle seconds | Win: `GetLastInputInfo`; Linux: `xprintidle`; macOS: `CGEventSource` (verify coverage) |
| Public/WAN IP | Cached; fetched periodically from external service |
| Top 10 processes by CPU | Cross-platform via `sysinfo::Processes` |
| Top 10 processes by memory | Cross-platform |
| CPU package temperature | Via `sysinfo::Components`. **Windows: thermal zones depend on BIOS ACPI export — firmware-less coverage; LibreHardwareMonitor removed 2026-05-27 (CVE-2020-14979, WinRing0 Defender quarantine). Windows thermal collection is currently partial or None on most hosts.** |
| GPU temperature | Via `sysinfo::Components`. Same Windows caveat as above. |
| All hardware sensor readings (label, value, unit, critical threshold) | Via `sysinfo::Components`. Reliable on Linux; variable on Windows/macOS. |

**Network state** (sent on connect and on interface change): per-interface IPv4/IPv6 addresses, MAC, derived CIDR subnets.

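
The "derived CIDR subnets" in the network state report can be illustrated with a minimal sketch. It assumes the agent has an interface's IPv4 address and prefix length in hand; the function name `derive_cidr_v4` is hypothetical and not taken from the agent source, and IPv6 is left out for brevity.

```rust
use std::net::Ipv4Addr;

/// Hypothetical helper (not from the agent source): derive the network in
/// CIDR notation from an interface address and its prefix length.
/// Returns `None` for prefix lengths above 32.
pub fn derive_cidr_v4(addr: Ipv4Addr, prefix_len: u8) -> Option<String> {
    if prefix_len > 32 {
        return None;
    }
    // Shifting a u32 by 32 bits would overflow, so /0 is handled explicitly.
    let mask: u32 = if prefix_len == 0 {
        0
    } else {
        u32::MAX << (32 - u32::from(prefix_len))
    };
    // Clearing the host bits leaves the network address.
    let network = Ipv4Addr::from(u32::from(addr) & mask);
    Some(format!("{network}/{prefix_len}"))
}
```

Under these assumptions, an interface at `192.168.1.37` with a 24-bit prefix maps to the subnet `192.168.1.0/24`.
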
**Log upload:** Agent log bundle sent every 12 hours (`LogUpload` message). The upload task uses a `watch::Sender` for the current WS sender so it never holds a stale handle from a prior connection.

---

### Remote Execution

Commands are dispatched by the server as `CommandPayload` messages over the WebSocket.

**Command types** (from `transport::CommandType` enum):

| Type | Notes |
|---|---|
| `shell` | `cmd.exe` on Windows, `bash` on Unix. Also accepts the alias `"cmd"` (backward compat — servers that send `command_type: "cmd"` historically triggered a parse failure that silently dropped the message; fixed in commit `3de9faf`). |
| `powershell` | Windows PowerShell. Also accepts the alias `"powershell"`. |

Extended types (python, script, claude_task) are referenced in the parent article at [[gururmm]] and in `agent/src/scripts.rs` / `agent/src/claude.rs`; confirm against those modules for current state.

**Execution contexts** (from `transport::CommandContext`, added migration 041):

| Context | Behavior |
|---|---|
| `system` (default) | Runs in Session 0, in the service's own process context (LocalSystem on Windows). |
| `user_session` | **Windows only.** Impersonates the active logged-on user's desktop session via `WTSQueryUserToken` + `CreateProcessAsUserW` + per-user environment block. Requires an active console/RDS session. Implemented in `agent/src/watchdog/wts.rs`. Returns an error on non-Windows platforms. |

**Command options:** `timeout_seconds` (optional), `elevated` (bool), `context` (defaulting to `system`).

**In-flight management:**

- Individual cancellation: server sends `CancelCommand { command_id }`; agent aborts the Tokio `JoinHandle` immediately.
- All in-flight commands aborted on disconnect (`abort_all` called by WS reconnect loop).

**Comms durability (Phase 1, shipped 2026-06-11 as v0.6.63):**

- Agent sends `CommandAck { command_id }` immediately on receipt of a dispatched command (before execution begins). Server stamps `acked_at` (migration 058).
- Re-delivery dedup: `CommandExecutor` keeps a FIFO-bounded cache of recently-completed results (capacity 64, max 256 KB per result). A re-delivered command that is already in-flight is ignored; one that already completed re-reports the cached result without re-executing.
- Result is cached BEFORE deregistering the running task (`record_result` -> `complete`) so a re-delivery in the race window always finds it cached.
- Server reaper re-delivers un-acked commands past a 60s ACK deadline (returns to `pending`) instead of failing them. Pending commands re-offered on every heartbeat (rides the refreshed NAT conntrack). Capability gate: reaper only re-delivers for agents that have demonstrably ACK'd at least once (partial index on `acked_at`); old agents keep the legacy fail-on-timeout path.

---

### Hardware Inventory

`agent/src/inventory.rs` — `HardwareInventory` struct, sent on connect and on server request (`InventoryReport` message).

| Field group | Fields |
|---|---|
| System identity | manufacturer, model, serial_number, bios_version |
| CPU | cpu_model, cpu_cores, cpu_threads, cpu_speed_mhz |
| Memory | total_memory_mb |
| Disks | name, mount, total_gb, fs_type (Vec) |
| Network | name, IP, MAC, speed_mbps (Vec) |
| Software | name, version, publisher (Vec, installed applications) |
| Services | name, display_name, status, start_type (Vec) |
| OS detail | os_name, os_version, os_build |
| Windows OS product type | os_product_type: 1=Workstation, 2=DC, 3=Server (None on Linux/macOS) |
| Windows OS edition | os_edition: "Pro", "Standard", "Datacenter", etc. (None on Linux/macOS) |
| VM / container | is_virtual_machine, hypervisor_type, vm_uuid, is_hypervisor, hosted_vm_uuids, is_container, is_unraid |
| Agent version | agent_version |

Collection uses platform-specific subprocess calls (`wmic`/`dmidecode`/`system_profiler`) and `sysinfo`. Fields are `Option` to avoid panics on platforms that cannot supply them.

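
As an illustration of the optional-field approach, the sketch below interprets `os_product_type` using the 1/2/3 mapping from the table above. The enum and function names are hypothetical, and treating an unrecognized code as absent is an assumption rather than confirmed agent behavior.

```rust
/// Hypothetical typed view of `os_product_type` (names are illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OsProductType {
    Workstation,
    DomainController,
    Server,
}

/// Map the raw inventory value to a typed variant.
/// `None` in gives `None` out: Linux/macOS hosts never supply this field.
pub fn classify_product_type(raw: Option<u32>) -> Option<OsProductType> {
    match raw? {
        1 => Some(OsProductType::Workstation),
        2 => Some(OsProductType::DomainController),
        3 => Some(OsProductType::Server),
        // Assumption: an unknown code is reported as absent instead of panicking.
        _ => None,
    }
}

/// Build a display string that degrades gracefully when fields are missing.
pub fn describe_os(raw_type: Option<u32>, edition: Option<&str>) -> String {
    let kind = match classify_product_type(raw_type) {
        Some(OsProductType::Workstation) => "workstation",
        Some(OsProductType::DomainController) => "domain controller",
        Some(OsProductType::Server) => "server",
        None => "unknown",
    };
    match edition {
        Some(e) => format!("{kind} ({e})"),
        None => kind.to_string(),
    }
}
```

A consumer written this way never unwraps a field that a given platform cannot provide; a Linux host with neither field set simply renders as `unknown`.
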

---

### User Inventory

`agent/src/users.rs` — `UserInventory` struct, sent on connect and on server request (`UserInventoryReport` message). Policy-scheduled (default 24h interval).

**Per-user fields:** username, display_name, account_type (local/domain/aad), enabled, password_never_expires, password_expired, last_logon, is_admin, email (AD), upn (AD), department (AD), group membership.

**Collection per platform:**

- Windows: `Get-LocalUser`, `Get-ADUser` (if domain-joined), `dsregcmd /status` (Azure AD/hybrid). Group membership from `Get-LocalGroupMember`. DC detection (`is_dc`) from AD role query.
- Linux: `getent passwd` + `/etc/group`.
- macOS: `dscl`, `dsconfigad`, `dseditgroup`.

**User actions** dispatched by server: enable/disable accounts, expire/un-expire password, create accounts, reset passwords. Executed via PowerShell on Windows, shell commands on Linux/macOS. Results reported as `UserActionResult`.

---

### Checks

`agent/src/checks.rs` — server-defined check configurations (`CheckPayload`) executed on demand or on schedule.

| Check type | Implementation |
|---|---|
| `cpu` | `sysinfo` global CPU usage; value returned, threshold evaluated server-side |
| `memory` | `sysinfo` total/used memory; percentage computed |
| `disk` | `sysinfo` disks; usage percentage |
| `ping` | `tokio::process::Command` calls platform ping binary |
| `port` | `TcpStream::connect` with timeout |
| `script` | Arbitrary shell/script command; stdout/stderr/exit code captured |
| `service` | Win: `sc.exe query`; Linux: `systemctl is-active`; reports running/stopped |

CPU, memory, and disk checks wrap blocking `sysinfo` calls in `spawn_blocking`. Unknown check types return status `failing` with an error message (no silent no-op).

---

### Self-Update

`agent/src/updater/mod.rs`.

**Flow:**

1. Server sends `UpdatePayload` (version, download URL, SHA-256 checksum).
2. Agent downloads the binary via HTTPS (300s timeout).
3. SHA-256 checksum verified before touching the live binary.
4. Current binary copied to backup path (`gururmm-agent.backup` in config dir).
5. New binary atomically replaces current (via temp write + rename).
6. Agent restarts itself.
7. On reconnect, agent reports `UpdateResult` with outcome.

If the agent fails to reconnect within the ~180s rollback window, the backup binary is restored and the service restarts again.

**Windows:** backup path `C:\ProgramData\GuruRMM\gururmm-agent.backup` (or `GuruRMM-Debug\` in debug builds). On restart, uses a detached child process that waits for parent exit before swapping.

**Linux/macOS:** backup at `/etc/gururmm/gururmm-agent.backup`.

**Update channels:** stable / beta / null (inherits default `stable`). Agent reports previous_version and pending_update_id in its next `Auth` payload so the server can correlate the update result. New builds are tagged `beta` by default; promotion to `stable` is a deliberate re-tag of the `.channel` sidecar file in the server's downloads directory.

---

### Watchdog (Windows Only)

`agent/src/watchdog/` — a second, separate Windows SCM service (`GuruRMMWatchdog`) that runs from the same binary via `gururmm-agent watchdog`.

**Responsibilities:**

1. Polls SCM every 30s for `GuruRMMAgent`. On unexpected stop: restart with backoff 30s / 60s / 120s. After 3 failed attempts, POSTs a watchdog alert to the server and continues monitoring.
2. Serves a named-pipe IPC channel so the main agent can request a clean service restart without racing against its own SCM stop signal.
3. `watchdog/wts.rs`: WTS (Windows Terminal Services) token management for the `user_session` command context — acquires the active console session's user token, creates the child process with `CreateProcessAsUserW`, and provides the per-user environment block.

The `ensure_watchdog_running()` function is called by the main agent on startup. It uses a two-step SCM privilege model: opens with `CONNECT` only first (least privilege); escalates to `CREATE_SERVICE` only if the service does not yet exist.

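
The restart backoff from responsibility 1 can be sketched as a small decision function. This is a hedged illustration, not the watchdog's actual code: `WatchdogAction` and `next_action` are hypothetical names, and the sketch assumes the 30s delay applies before the first restart attempt.

```rust
use std::time::Duration;

/// What the watchdog should do next for a stopped agent service
/// (hypothetical type, for illustration only).
#[derive(Debug, PartialEq, Eq)]
pub enum WatchdogAction {
    /// Wait this long, then try to restart the service.
    RestartAfter(Duration),
    /// Backoff schedule exhausted: alert the server and keep monitoring.
    AlertServer,
}

/// Backoff schedule from the text: 30s / 60s / 120s.
const BACKOFF_SECS: [u64; 3] = [30, 60, 120];

/// `failed_attempts` is the number of restart attempts that have already
/// failed during the current outage.
pub fn next_action(failed_attempts: usize) -> WatchdogAction {
    match BACKOFF_SECS.get(failed_attempts) {
        Some(&secs) => WatchdogAction::RestartAfter(Duration::from_secs(secs)),
        None => WatchdogAction::AlertServer,
    }
}
```

Indexing the schedule with `get` keeps the "after 3 failed attempts" rule in one place: once the attempt count runs past the table, the only remaining action is the alert.
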
**Platform:** The `run_watchdog_service()` entrypoint is a no-op stub on Linux/macOS (`#[cfg(not(windows))]` returns a warning).

---

### Event Log Watches (Windows Only)

`agent/src/event_log.rs` — added migration 047.

**Mechanism:** Server pushes `EventLogWatchRule` configs as part of `ConfigUpdate`. The agent polls matching rules at a configured interval. Each rule specifies `log_name`, optional `event_id`, optional `source`, and optional `level` (critical/error/warning/information). Matches are queried via `Get-WinEvent` PowerShell with a `since` timestamp watermark. Matching entries are batched and sent as `EventLogMatches` messages.

**Platform:** Windows only. `query_watch_rule` is `#[cfg(target_os = "windows")]`. Non-Windows builds compile with a stub that returns an empty `Vec`.

**Injection guard:** `log_name` has single-quote characters stripped before inserting into the PowerShell filter hash.

---

### BSOD / Kernel Crash Detection (Windows Only)

`agent/src/bsod.rs` — added migration 048. Shipped v0.6.51 (June 2026).

**How it works:**

- Runs one-shot on agent startup, then polls on a periodic interval.
- Enumerates `C:\Windows\Minidump\*.dmp`.
- Parses the kernel dump header by hand:
  - 64-bit `PAGEDU64` (`DUMP_HEADER64`): bugcheck code at offset 0x38 (u32), 4 parameters at 0x40 (4x u64), system filetime at 0xFA8.
  - 32-bit `PAGEDUMP` (`DUMP_HEADER`): bugcheck code at 0x28 (u32), 4 parameters at 0x2C (4x u32).
- The `minidump` crate is intentionally not used — it parses Breakpad MDMP format, not Windows kernel PAGEDU64 dumps.
- Cross-references the System event log (WER event 1001, Kernel-Power event 41) for faulting driver name and WER Report Id.
- Computes SHA-256 of each dump file. Already-seen hashes are stored in a watermark file (`C:\ProgramData\GuruRMM\bsod-seen.json`). Watermark writes use a tmp-then-rename atomic pattern to survive mid-write crashes.
- **First-run suppression:** On the first run (no watermark file), all existing dumps are baselined as already-seen — no retroactive alerts for crashes that predate the agent install.
- Sends one `BsodEvent` per newly-detected crash.

Server-side: `bsod_events` table (migration 048), deduplicated by `(agent_id, dump_sha256)`, always Critical severity. Dashboard Crashes tab shipped 2026-06-07.

**Platform:** `#[cfg(target_os = "windows")]`. Non-Windows compiles to an empty stub.

**Validated against:** A real `0x116 VIDEO_TDR_FAILURE` (nvlddmkm.sys) on GURU-5070 during the 2026-06-01 session.

---

### VSS Shadow Copy Management (Windows Only, SPEC-016)

`agent/src/vss.rs` + `agent/src/compliance.rs` — migrations 050, 051.

**Mechanism:** Driven entirely via PowerShell relay (`crate::powershell::run_ps()`); no native COM. Non-Windows platforms compile thin stubs.

**Policy-driven snapshot scheduling:**

- The agent receives VSS policy as part of `ConfigUpdate`. It mirrors the policy to `C:\ProgramData\GuruRMM\vss-policy.json` on every `ConfigUpdate`.
- Snapshot execution is performed by a separate short-lived process invocation (`gururmm-agent vss-snapshot`), triggered by a Windows Scheduled Task named `GuruRMM-VSS-Snapshot`. The task is registered/updated by the agent when the policy hash changes.
- Shadow storage is always bounded: agent runs `vssadmin resize shadowstorage /maxsize=N%` — never left unbounded.
- Retention governed by max count and max age, as upper bounds on top of Windows' own storage-cap eviction.
- Per-volume first-run staggering: a `vss-firstrun-.flag` marker gates when each additional volume starts snapshotting (C: first, then others).

**DeviceObject handle:** The `\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopyN` string is the stable identifier and is persisted at creation time. The trailing index `N` is not stable across reboots.

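
The retention rule from the scheduling list (max count and max age as upper bounds) can be sketched as a pure selection function over the persisted snapshot records. This is a sketch under stated assumptions: the `Snapshot` struct, its field names, and `snapshots_to_prune` are hypothetical, and timestamps are taken to be Unix seconds.

```rust
/// Hypothetical persisted record for one shadow copy.
#[derive(Debug, Clone)]
pub struct Snapshot {
    /// Persisted DeviceObject string identifying the shadow copy.
    pub device_object: String,
    /// Creation time in Unix seconds (assumed representation).
    pub created_unix: u64,
}

/// Return the identifiers of snapshots that exceed either bound.
/// The newest `max_count` snapshots are kept unless they are also older
/// than `max_age_secs`. Output is ordered newest first.
pub fn snapshots_to_prune(
    snaps: &[Snapshot],
    max_count: usize,
    max_age_secs: u64,
    now_unix: u64,
) -> Vec<String> {
    // Sort newest first so the index doubles as the "rank by age".
    let mut sorted: Vec<&Snapshot> = snaps.iter().collect();
    sorted.sort_by(|a, b| b.created_unix.cmp(&a.created_unix));

    sorted
        .iter()
        .enumerate()
        .filter(|(idx, s)| {
            let over_count = *idx >= max_count;
            // saturating_sub guards against clock skew producing a "future" snapshot.
            let over_age = now_unix.saturating_sub(s.created_unix) > max_age_secs;
            over_count || over_age
        })
        .map(|(_, s)| s.device_object.clone())
        .collect()
}
```

Keeping the selection separate from the deletion call makes the policy testable without touching the machine; the actual removal would still go through whatever mechanism the agent uses.
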
**Compliance reporting (SPEC-025):** `agent/src/compliance.rs` implements `evaluate_all()`, which calls each policy domain's evaluator. VSS is Domain #1. Evaluation is **read-only** — it reports posture (`compliant`, `pending`, `non_compliant`, `not_applicable`) without mutating the machine. VSS is the only domain in v1 that performs opt-in, kill-switchable self-heal. A compliance batch is sent as `ComplianceReport`. Serde-tolerant: older servers ignore the variant.

**Status reporting:** `VssStatus` message sent on a slow cadence — per-volume shadow storage usage for server cache and dashboard display.

**Platform notes in code (TODO markers):**

- Linux: LVM snapshots (`lvcreate --snapshot`) — not implemented.
- macOS: APFS snapshots (`tmutil localsnapshot`) — not implemented.

---

### Network Discovery

`agent/src/discovery/mod.rs`.

**How it works:** Server dispatches `DiscoveryScanConfig` (IP ranges, ports, timeout_ms, concurrency, exclusions). Agent expands ranges to individual IPs and probes:

1. TCP connect on each configured port (async, up to 200 concurrent probes via `Semaphore`).
2. ICMP ping fallback for hosts where all TCP ports are firewalled but the host is up.
3. ARP table lookup for MAC address.
4. Reverse DNS lookup for hostname.
5. OS fingerprinting from open port set.

Results are streamed back as `DiscoveryResult` messages (one per found device) and finalized with `DiscoveryComplete` (total found, duration ms).

---

### Registry Operations (Windows Only)

`agent/src/registry_ops/` via the `winreg` crate.

**Operations:**

| Function | Notes |
|---|---|
| `enumerate_keys(path)` | Lists subkeys under a registry path |
| `enumerate_values(path)` | Lists values under a registry path |
| `read_value(path, name)` | Reads a single registry value |
| `set_value(path, name, value_type, value_data)` | Writes a registry value (typed, raw bytes) |
| `create_key(path)` | Creates a registry key |

All operations return `anyhow::Error` on non-Windows platforms (compile-time stubs; no silent no-op).

**Note from parent article [[gururmm]]: the HTTP API currently exposes read-only paths (enumerate, read_value); write paths exist in the agent but are not yet routed server-side. (verify current state)**

---

### Tunnel

`agent/src/tunnel/mod.rs`. Agent-side TunnelManager state machine.

**Modes:**

- `Heartbeat` (default): periodic metrics and heartbeats.
- `Tunnel`: active session with a tech (triggered by `TunnelOpen { session_id, tech_id }` from server). Bidirectional data relay over the existing WebSocket connection.

**Channel types defined in code:** `Terminal`, `File` (Phase 2+), `Registry` (Phase 2+), `Service` (Phase 2+). Currently only `Terminal` is operationally relevant.

**Server status:** The server-side tunnel skeleton exists but is not production-ready (no `/tunnel` API routes declared, WS handler logs "not yet implemented"). Live TTY is planned as Phase 2 of the agent-comms-durability spec.

---

### IPC and Tray Integration

`agent/src/ipc.rs`.

**Windows:** Named-pipe IPC server for the `GuruRMM Tray` companion process.

**Linux:** Unix socket IPC.

**Operations available over IPC:**

- Subscribe/unsubscribe: tray receives `IpcStatusUpdate` broadcasts when WS connection state changes.
- Force check-in: tray requests immediate metrics collection (wakes the metrics task via `AppState::force_checkin` Notify).
- Per-section policy update: server's `ConfigUpdate` is relayed to the tray for display.

IPC subscribers are tracked in `AppState::ipc_subscribers` (`RwLock>>`).

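
The subscribe-and-broadcast behavior can be sketched with standard-library channels. This is a simplified, synchronous illustration only: the real agent is Tokio-based, and the `Subscribers` type, its element type, and the two-variant `IpcStatusUpdate` below are assumptions made for the example rather than the contents of `AppState::ipc_subscribers`.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::RwLock;

/// Simplified stand-in for the status message (variants are assumed).
#[derive(Clone, Debug, PartialEq)]
pub enum IpcStatusUpdate {
    Connected,
    Disconnected,
}

/// Hypothetical subscriber registry guarded by a lock.
pub struct Subscribers {
    inner: RwLock<Vec<Sender<IpcStatusUpdate>>>,
}

impl Subscribers {
    pub fn new() -> Self {
        Subscribers { inner: RwLock::new(Vec::new()) }
    }

    /// Register a tray client and hand back its receiving end.
    pub fn subscribe(&self) -> Receiver<IpcStatusUpdate> {
        let (tx, rx) = channel();
        self.inner.write().unwrap().push(tx);
        rx
    }

    /// Send the update to every subscriber and drop any whose receiver
    /// has gone away. Returns the number of live subscribers.
    pub fn broadcast(&self, update: &IpcStatusUpdate) -> usize {
        let mut subs = self.inner.write().unwrap();
        // `send` fails only when the receiving end was dropped.
        subs.retain(|tx| tx.send(update.clone()).is_ok());
        subs.len()
    }
}
```

Pruning dead senders during the broadcast means a tray process that exits without unsubscribing does not leave a stale entry behind.
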

---

## Architecture

### Components

| Component | Location | State |
|---|---|---|
| Agent service (Windows) | `C:\Program Files\GuruRMM\gururmm-agent.exe`, SCM service `GuruRMMAgent` | Deployed; 0.6.63 stable |
| Watchdog service (Windows) | Same binary, SCM service `GuruRMMWatchdog` | Deployed |
| Agent service (Linux) | `/usr/local/bin/gururmm-agent`, systemd `gururmm-agent.service` | Deployed |
| Agent service (macOS) | `/usr/local/bin/gururmm-agent`, LaunchDaemon `com.azcomputerguru.gururmm-agent.plist` | Phase 1 deployed 2026-05-12 |
| Tray (Windows) | Named-pipe IPC, separate binary | Deployed; BUG-020 ghost-icon fix in beta |
| Tray (Linux) | Unix socket IPC, libappindicator/GTK | Deployed (PR #13+#14, 2026-05-24) |
| Tray (macOS) | Menu bar stub | TODO (issue #18) |

### Key Files and Repos

- **Repo:** `azcomputerguru/gururmm`, internal Gitea at http://172.16.3.20:3000
- **Submodule (dev):** `D:\claudetools\projects\msp-tools\guru-rmm`
- **Agent source:** `agent/src/` within repo
- **Windows config dir:** `C:\ProgramData\GuruRMM\` (service files, device_id, BSOD watermark, VSS policy cache, VSS first-run flags)
- **Windows registry:** `HKLM\SOFTWARE\GuruRMM\SiteId` (set by MSI), `HKLM\SOFTWARE\GuruRMM\DeviceId` (set by agent, Phase 1 durable identity)
- **Linux config:** `/etc/gururmm/agent.toml` (root, mode 600); `/var/lib/gururmm/.device-id` + `/etc/gururmm/.device-id` (durable identity mirrors)
- **macOS config:** `/usr/local/etc/gururmm/site.plist` (site_id/agent_key via plist crate)
- **Downloads dir (server):** `/var/www/gururmm/downloads/` on 172.16.3.30 — agent binaries + `.channel` sidecars + `.sha256` checksums

### Build Variants

| Artifact | Platform | Notes |
|---|---|---|
| `gururmm-agent-windows-amd64-.exe` | Windows 10+/Server 2016+ (64-bit) | Native Windows Service (windows-service crate) |
| `gururmm-agent-windows-x86-.exe` | Windows 10+/Server 2016+ (32-bit) | Same, 32-bit |
| `gururmm-agent-windows-legacy-amd64-.exe` | Windows 7/Server 2008 R2 (64-bit) | `legacy` feature flag; no windows-service dependency; NSSM-based service |
| `gururmm-agent-windows-legacy-x86-.exe` | Windows 7/Server 2008 R2 (32-bit) | Same, 32-bit |
| `gururmm-agent-base-.msi` | Windows (all) | WiX v4 MSI installer; SITEKEY baked per-site on download |
| `gururmm-agent-linux-amd64-` | Linux (x86_64) | musl static; systemd service |
| `gururmm-agent-macos-amd64-` | macOS (Intel) | Mach-O, LaunchDaemon |
| `gururmm-agent-macos-arm64-` | macOS (Apple Silicon) | Same, arm64 |

macOS builds are manual (no CI pipeline; no build host); tagged `.channel` files for macOS are managed manually. (verify)

### Platform Coverage

| Capability | Windows | Linux | macOS |
|---|---|---|---|
| Metrics (CPU/mem/disk/net) | Full | Full | Full |
| Temperature sensors | Partial (ACPI thermal only; LHM removed) | Full (hwmon) | Partial (SMC; Apple Silicon inconsistent) |
| Hardware/software/service inventory | Full | Full | Full |
| User/group inventory | Full (local + AD + AAD) | Full (getent) | Full (dscl/dsconfigad) |
| Checks (cpu/mem/disk/ping/port/script) | Full | Full | Full |
| Service checks | Full | Full | (verify) |
| Shell command execution | Full (cmd.exe) | Full (bash) | Full (bash) |
| PowerShell execution | Full | N/A | N/A |
| `user_session` execution context | Full (WTS impersonation) | Not implemented | Not implemented |
| Self-update | Full | Full | Full |
| Watchdog SCM supervision | Full | N/A (systemd handles) | N/A (launchd handles) |
| BSOD detection | Full | N/A | N/A |
| Event Log watches | Full (Get-WinEvent) | N/A | N/A |
| VSS shadow copy management | Full (SPEC-016) | Planned (LVM) | Planned (APFS) |
| Registry operations | Full | Error stub | Error stub |
| Network discovery | Full | Full | Full |
| Tunnel (terminal) | Partial (agent side built; server not production-ready) | Partial | Partial |
| Compliance reporting | Full (VSS domain) | N/A (VSS only in v1) | N/A (VSS only in v1) |
| Tray IPC | Full (named pipe) | Full (Unix socket) | Stub |

---

### Communication Protocol

The agent communicates with the server over a persistent TLS WebSocket (`wss://`). All messages are JSON-serialized using a tagged enum (`type`/`payload` fields, `snake_case`).

**Agent -> Server message types** (`AgentMessage`): Auth, Metrics, NetworkState, CommandResult, WatchdogEvent, UpdateResult, Heartbeat, LogUpload, CommandCancelled, CommandAck, TunnelReady, TunnelData, TunnelError, ScriptResult, RequestChecks, InventoryReport, UserInventoryReport, UserActionResult, CheckResult, DiscoveryResult, DiscoveryComplete, RegistryResult, EventLogMatches, BsodEvent, VssResult, VssStatus, ComplianceReport.

**Server -> Agent message types** (`ServerMessage`): Command, ConfigUpdate, Update, Ack, Error, RequestLogUpload, CancelCommand, TunnelOpen, TunnelClose, TunnelData, RequestInventory, UserAction, RunChecks, DiscoveryScan, RegistryOp, VssOp.

**Serde tolerance:** Newer message types (VssStatus, ComplianceReport) are variants the server may ignore if it is running an older version. Unrecognized `ServerMessage` variants are NAK'd with an error response rather than silently dropped (agent: commit `3de9faf`).

**Auto-reconnect:** The WS client loop reconnects on disconnect with exponential backoff. On reconnect: pending commands are re-dispatched, pending updates re-offered, agent requests current check configs.

---

## Development

### Current Focus

As of 2026-06-12 (agent 0.6.66 HEAD / fleet on 0.6.63 stable):

- **Agent-comms-durability Phase 2 (planned):** Live TTY over WS — seq/resume frames, single-use session token, AIMD keepalive. Not yet started.
- **Durable agent identity Phase 1 Tasks 2-3 (pending):** Hardware fingerprint capture (`inventory.rs` baseboard serial + primary MAC); server migration for `hardware_fingerprint`; dashboard duplicate-hostname surfacing (read-only).
- **BSOD Phase 2/3 (deferred):** BSOD events in the Alerts stream, on-demand dump upload (`fetch_bsod_dump`), full ~350-entry bugcheck name table (Phase 1 ships a 10-code map).
- **Windows thermal collection:** WMI ACPI (`MSAcpi_ThermalZoneTemperature`) recommended as first unblocked path (Approach 1 in FEATURE_ROADMAP.md). NVAPI (NVIDIA GPU temps) as Approach 2. Custom kernel driver deferred.
- **Tray IPC peer authorization** (Windows issue #16), logind console-user resolution (issue #17), macOS tray (issue #18), subscriber broadcast (issue #19).
- **Linux fleet unit drift:** Auto-updater replaces binary but does NOT refresh the systemd unit file. Pre-BUG-016-fix agents have new binary + old unit missing `StateDirectory=gururmm`. Needs an ops-script pass.
- **VSS Linux/macOS:** Stubs remain; LVM (Linux) and APFS (macOS) snapshots are design-level TODOs only.

### Patterns and Anti-Patterns

**Never repeat:**

| Pattern | What Went Wrong |
|---|---|
| Using the `minidump` crate for Windows kernel dumps | Parses only Breakpad MDMP format; Windows kernel PAGEDU64 dumps require direct offset reads from DUMP_HEADER64. |
| `command_type: "cmd"` sent from server without agent alias | Agent did not recognize "cmd" as the Shell variant; command message was silently dropped instead of executed. Fixed commit `3de9faf`. |
| `Restart-Service GuruRMMAgent -Force` in a remote command | Kills the agent before it can report the command result; command stays in `running` state forever. Use a scheduled task with a delay. |
| LHM (LibreHardwareMonitor) for Windows thermal | WinRing0 kernel driver (CVE-2020-14979); Defender quarantined it fleet-wide. Do not re-add. |
| Installer using `& $stagingPath install 2>&1 \| Out-Null` | Swallows all output under `$ErrorActionPreference='Stop'`; surfaces misleading NativeCommandError on non-zero exit. Use `Start-Process -Wait -PassThru` + explicit ExitCode check. Fixed commit `5c0d004`. |
| Agent-level channel pin for a beta canary | Agent `update_channel` is lost on re-enrollment. Use site-level or client-level channel override — they survive re-enrollment. |
| New agent builds tagged stable by default | Races the entire fleet to auto-update before any beta soak. All new builds default to beta; promotion to stable requires explicit re-tag of the `.channel` sidecar. |
| Reaper failing un-acked commands on timeout | False failures for commands black-holed by NAT conntrack gap. Reaper must only fail commands that were ACK'd but exceeded real execution timeout. Un-acked commands requeue to `pending`. |
| `+1.77` legacy builds without `--ignore-rust-version` | Fail MSRV check after adding `rust-version` to Cargo.toml. Legacy build lines need `--ignore-rust-version`. |
| CRLF line endings in migration SQL files | sqlx SHA-384 checksum mismatch crashes server on start. `.gitattributes` + `core.autocrlf=false` + pre-commit hook prevents this. |
| `git config --system --add safe.directory` omission when building as root | Webhook builds run as root on guru-owned repo; git rejects repo as dubious ownership without this setting. Fixed 2026-06-11. |

**Good patterns:**

- **Platform parity rule:** Any agent feature ships on Windows + Linux + macOS in the same commit. Stubs with `// TODO(platform): ` are acceptable; silent no-ops are not.
- **Serde-tolerant messages:** New `AgentMessage` / `ServerMessage` variants must not break older server/agent versions. Use `#[serde(default)]` on new fields; new enum variants are simply ignored by old deserializers.
- **SHA-256 watermark atomic write (bsod, vss):** Always write to a `.tmp` file then `rename` over the target to avoid corrupt-on-crash.
- **CommandAck before execution:** ACK is sent on RECEIPT, not on completion. The server can distinguish "never reached agent" from "still running" based on `acked_at`.
- **`record_result` before `complete`:** Caching the result before deregistering the task handle ensures a racing re-delivery always finds the command in one of: running (ignore), completed+cached (re-report), or about-to-run (de-dup before spawn). Never in a "finished but not yet cached" gap.

### Build and Deploy

- **Build trigger:** Push to `gururmm` Gitea main branch fires a Gitea webhook to `http://172.16.3.30:9000/webhook` → `webhook-handler.py`. Detects which component changed (agent vs. server) via `last-built-commit-{linux,windows,mac,server}` marker files.
- **Linux agent:** `build-linux.sh` on the build server; musl static binary.
- **Windows agent:** `build-windows.sh` dispatches SSH to Beast (GURU-BEAST-ROG, i9-14900K, primary) or Pluto (172.16.3.36, fallback). MSVC + WiX v4. Legacy builds use `--ignore-rust-version`. Signing via jsign + Azure Trusted Signing (Arizona Computer Guru LLC cert). Binary tagged `beta` in downloads `.channel` sidecar.
- **macOS agent:** Manual build; `build-macos.sh` / `build-macos-pkg.sh` on an Apple machine. No CI pipeline yet. (verify)
- **Promotion to stable:** `POST /api/updates/rollouts/:version/promote` body `{"os","arch"}` re-tags `.channel` sidecars. Rollback: `POST /api/updates/rollouts/:version/rollback`. This is intentionally a manual gate — no automated health-gated promotion yet (Phase 2 of safe-rollout spec, migration 046, is written but unwired).
- **Cargo.toml version pinning:** Several crates pinned for Rust 1.77 legacy-build compatibility (edition 2024 crates, MSRV bumps). See `agent/Cargo.toml` comments for rationale. New deps must not pull in edition-2024 or MSRV >1.77 crates if legacy builds are still required.

---

## Active State

Fleet on agent 0.6.63 stable as of 2026-06-11. ~168-182 agents typically online (215 enrolled). HEAD at a794a7f is v0.6.66 (not yet released). See [[gururmm]] Summary section for live fleet state, server version, and recent deployment notes.


---

## History Highlights

- **2025-12-15:** Initial agent: WebSocket transport, basic metrics (CPU/mem/disk/net), command execution (shell/PowerShell), self-updater with rollback.
- **2026-04-19:** Temperature collection added (sysinfo Components); checks engine (cpu/mem/disk/ping/port/script/service).
- **2026-04-29:** Hardware inventory, software inventory, service inventory; VM/container/Unraid detection.
- **2026-05-12:** macOS agent Phase 1 deployed (LaunchDaemon, plist-based config, cross-compiled amd64/arm64).
- **2026-05-15:** User/group inventory (migration 037-040); DC detection; domain/AAD classification.
- **2026-05-17:** Network discovery (TCP + ICMP + ARP + rDNS + OS fingerprint, concurrent probes).
- **2026-05-19:** `user_session` execution context (migration 041) — WTS token impersonation for active user desktop on Windows.
- **2026-05-21:** Agent events table (migration 042); interrupted command status (migration 043).
- **2026-05-24:** Linux tray (PRs #13+#14, libappindicator/GTK + Unix socket IPC).
- **2026-05-25 (audit-2-remediation):** BUG-002 crash-detection dead code fixed (re-keyed to `update_success`). BUG-003 build-server.sh hardened (build lock + binary backup + auto-rollback). `update_channel` added to all agent API responses.
- **2026-05-27:** LibreHardwareMonitor removed fleet-wide — WinRing0 driver flagged by Microsoft Defender (CVE-2020-14979). Windows temperature collection reduced to ACPI/WMI partial coverage only.
- **2026-05-27:** Event log watches implemented (migration 047); BSOD detection spec written.
- **2026-06-01:** BSOD detection shipped (migration 048, agent v0.6.51). Validated against real `0x116 VIDEO_TDR_FAILURE` (nvlddmkm.sys) on GURU-5070. First-run suppression; SHA-256 atomic watermark; DUMP_HEADER64 hand-parser.
- **2026-06-04:** VSS shadow copy management merged (SPEC-016, migration 050); SPEC-025 compliance posture (migration 051, VSS domain #1).
- **2026-06-04:** BUG-020 tray ghost/duplicate icons fixed (commit `137dd85`); fix in beta.
- **2026-06-07:** Durable agent identity Phase 1 Task 1 (commit `0b81d33`, v0.6.62): registry + file mirror for device_id; `cleanup.ps1` whitelisted identity files. Addresses ~99% of new ghost-agent creation.
- **2026-06-07:** `command_type: "cmd"` alias added + unparseable commands NAK'd instead of silently dropped (commit `3de9faf`, v0.6.63-pre).
- **2026-06-11:** Agent Comms Durability Phase 1 shipped (v0.6.63): `CommandAck` on receipt, re-delivery dedup cache, server reaper re-queues un-acked commands, pending commands re-offered on every heartbeat. Verified at PST-SERVER (behind UDR Ultra NAT).

---

## Backlinks

- [[gururmm]] — parent project article (server, dashboard, build pipeline, full architecture, deployment state)
- [[gururmm-build]] — the production server host at 172.16.3.30 where agent binaries are built and served