sync: auto-sync from GURU-5070 at 2026-06-04 19:08:11

Author: Mike Swanson
Machine: GURU-5070
Timestamp: 2026-06-04 19:08:11
2026-06-04 19:08:16 -07:00
parent e95fa07cfe
commit e08488ae5e
6 changed files with 98 additions and 11 deletions

wiki/projects/guru-rmm.md Normal file

@@ -0,0 +1,29 @@
---
type: redirect
name: guru-rmm
display_name: "GuruRMM (redirect → gururmm)"
canonical: gururmm
tombstone: true
last_compiled: 2026-06-04
compiled_by: GURU-5070/claude-main
---
# guru-rmm → **gururmm** (redirect)
**This is not the article. The GuruRMM project article is [[gururmm]] (`wiki/projects/gururmm.md`).**
## Why this file exists
There are two spellings of the project slug, and they do not match:
| Context | Spelling |
|---|---|
| On-disk project directory / submodule | `projects/msp-tools/guru-rmm/` (**hyphenated**) |
| Gitea repo | `azcomputerguru/gururmm` (**no hyphen**) |
| Wiki article slug | `gururmm` (**no hyphen**) |
Anyone (human or Claude) who infers the wiki slug from the directory name searches
`guru-rmm` and gets nothing — the article is at `gururmm.md`. This tombstone makes the
hyphenated lookup resolve instead of dead-ending.
**Go to [[gururmm]].**


@@ -2,8 +2,10 @@
type: project
name: gururmm
display_name: GuruRMM
last_compiled: 2026-06-02
last_compiled: 2026-06-04
compiled_by: GURU-5070/claude-main
aliases:
- guru-rmm
sources:
- "gururmm@main: server/src/api/*.rs (REST API surface, ~30 route modules)"
- "gururmm@main: agent/src/ (agent capabilities; transport/CommandContext, ohw.rs, watchdog/wts.rs, bsod.rs)"
@@ -14,6 +16,7 @@ sources:
- "gururmm@main: agent/src/bsod.rs"
- "gururmm@main: deploy/build-pipeline/webhook-handler.py"
- "gururmm@main: deploy/build-pipeline/build-server.sh"
- "gururmm@main: commit 137dd85 (BUG-020 tray fix: single-instance mutex + WTSEnumerateProcessesW reconciliation + graceful shutdown event)"
- projects/msp-tools/guru-rmm/CONTEXT.md
- projects/msp-tools/guru-rmm/docs/FEATURE_ROADMAP.md
- projects/msp-tools/guru-rmm/docs/UI_GAPS.md
@@ -46,6 +49,7 @@ sources:
- session-logs/2026-05-24-GURU-KALI-session.md
- session-logs/2026-05-31-howard-gururmm-roadmap-and-features.md
- session-logs/2026-06-02-mike-bsod-detection-and-pipeline.md
- "live GuruRMM Postgres query 2026-06-04: agents/sites/update_rollouts/agent_updates tables (channel verification)"
backlinks:
- clients/cascades-tucson
- systems/gururmm-build
@@ -59,7 +63,9 @@ backlinks:
GuruRMM is a Remote Monitoring & Management platform built by Arizona Computer Guru LLC for internal MSP operations and eventual productization. The server (Rust/Axum) and dashboard (React/TypeScript) are production-deployed at https://rmm.azcomputerguru.com with approximately 55 enrolled agents across multiple client sites. The agent runs on managed Windows, Linux, and macOS endpoints.
**Current version:** agent 0.6.51 / server 0.3.37 as of 2026-06-02. Fleet converged to 0.6.51. Note: committed changelogs are stale (stop at agent v0.6.22 / server v0.3.1) — migrations + commit log are the authoritative feature record, not changelogs.
**Current version:** agent 0.6.54 (beta) / 0.6.47 (stable) / server 0.3.37 as of 2026-06-04. Fleet on stable target 0.6.47 (pinned 2026-05-28); GURU-5070 is the lone beta agent (explicit per-agent override), running 0.6.54 and auto-riding each new beta build. Note: committed changelogs are stale (stop at agent v0.6.22 / server v0.3.1) — migrations + commit log are the authoritative feature record, not changelogs.
**See also:** `wiki/projects/guru-rmm.md` is a redirect tombstone pointing here (slug disambiguation: on-disk directory is `guru-rmm` hyphenated; wiki and Gitea repo use `gururmm` no-hyphen).
**Repo:** `azcomputerguru/gururmm` on Gitea (internal: http://172.16.3.20:3000). The copy at `D:\claudetools\projects\msp-tools\guru-rmm` is a git submodule tracking the active `azcomputerguru/gururmm` repo; the pinned pointer normally lags `main` (expected). Development happens in the submodule working tree and changes are committed and pushed to Gitea from there.
@@ -135,7 +141,7 @@ Agent↔server communication is a persistent authenticated WebSocket with auto-r
|---|---|---|---|
| Server | 172.16.3.30:3001, systemd `gururmm-server`, binary `/usr/local/bin/gururmm-server` | Rust, Axum | deployed, production |
| Dashboard | https://rmm.azcomputerguru.com, nginx at `/var/www/gururmm/dashboard/` | React + TypeScript + Vite, shadcn/ui, Tailwind CSS v4 | deployed, production |
| Agent (Windows) | Endpoints, installed as `GuruRMMAgent` Windows service via WiX MSI | Rust, Windows MSVC | deployed, fleet on 0.6.51 |
| Agent (Windows) | Endpoints, installed as `GuruRMMAgent` Windows service via WiX MSI | Rust, Windows MSVC | deployed; stable fleet on 0.6.47; GURU-5070 (beta) on 0.6.54 |
| Agent (Linux) | Endpoints, systemd `gururmm-agent`, binary `/usr/local/bin/gururmm-agent` | Rust, musl static | deployed |
| Agent (macOS) | Endpoints, LaunchDaemon `com.azcomputerguru.gururmm-agent.plist` | Rust, aarch64/x86_64 | Phase 1 deployed 2026-05-12; code signing issue on Apple Silicon |
| Tray (Windows) | System tray, named pipe IPC | Rust | deployed |
@@ -204,8 +210,9 @@ gururmm/
### Current Focus
As of 2026-06-02 (agent 0.6.51 / server 0.3.37):
As of 2026-06-04 (agent 0.6.54 beta / 0.6.47 stable / server 0.3.37):
- **BUG-020 — tray duplicate/ghost icons (fixed to beta, 2026-06-04):** Commit `137dd85` shipped to main → beta. Fix #1: per-session `Local\GuruRMM_Tray` single-instance mutex in the tray binary. Fix #2: `TrayLauncher` reconciliation via `WTSEnumerateProcessesW` (idempotent across watchdog restarts). Fix #3: graceful `Global\GuruRMM_TrayShutdown_{sid}` event → 3s wait → `TerminateProcess` fallback (so `NIM_DELETE` fires and ghost icon is cleaned). [NOTE: Fix #3 is implemented but dormant — `terminate_all` has no caller in the agent yet. Tracked in coord todo `25fdf31a` to wire into the watchdog policy-disable/uninstall path.]
- **BSOD detection Phase 2/3 (deferred):** Dashboard "Crashes" tab + BSOD in Alerts stream (issue #10, dashboard bullets unchecked); `fetch_bsod_dump` on-demand upload; full ~350-entry bugcheck name table (Phase 1 ships a 10-code map).
- **Linux fleet unit drift:** Auto-updater replaces the binary but does NOT refresh the systemd unit file. Pre-BUG-016-fix Linux agents have new binary + old unit (missing `StateDirectory=gururmm`). Needs an ops-script pass via `/rmm`, or it resolves organically at the next reinstall.
- **Tray IPC + peer authorization** — Linux tray merged (PR #13+#14). Open: Windows peer authz (#16), logind console-user resolution (#17), macOS tray (#18), subscriber broadcast (#19).
@@ -254,7 +261,7 @@ As of 2026-06-02 (agent 0.6.51 / server 0.3.37):
- **`interrupt_running_commands()` at reconnect** — flips all `status='running'` commands for reconnecting agent to `status='interrupted'`.
- **Build change-gate + backup/rollback in `build-server.sh`** — skips rebuild when `server/` is unchanged (marker `last-built-commit-server`); backs up previous binary; restores it if the new binary fails `is-active`. Prevents unnecessary rebuilds and covers the BUG-003 no-rollback gap for server.
- **Server's own root RMM agent for privileged ops** — the server (172.16.3.30) runs the GuruRMM Linux agent as root (hostname `gururmm`); it can read/write `/var/www/gururmm/downloads`, re-tag `.channel` sidecars, and trigger `build-server.sh` without SSH or `sshpass`.
- **GURU-5070 as permanent beta-channel canary** — always on `beta`, gets new builds first; meaningful now that builds default to beta.
- **GURU-5070 as permanent beta-channel canary** — per-agent `update_channel = 'beta'` override (only agent in the fleet with an explicit channel; site/all-other-agents default to `NULL` = stable). Gets every new beta build immediately; stable fleet is protected by the explicit `update_rollouts` pin.
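
The build change-gate from the list above can be sketched as a single decision function. This is a minimal illustration, not the logic in `build-server.sh`: it assumes the marker file `last-built-commit-server` holds one commit hash and that the script can obtain the newest commit touching `server/`. `BuildDecision` and `decide_build` are hypothetical names.

```rust
/// What the build script should do for the server component.
#[derive(Debug, PartialEq, Eq)]
pub enum BuildDecision {
    /// `server/` is unchanged since the last successful build.
    Skip,
    /// No marker yet, or the marker points at an older commit.
    Rebuild,
}

/// Decide whether the server needs rebuilding.
///
/// `marker` is the content of `last-built-commit-server`, or `None` when the
/// file does not exist (assumed layout: a single commit hash).
/// `latest_server_commit` is the newest commit that touched `server/`.
pub fn decide_build(marker: Option<&str>, latest_server_commit: &str) -> BuildDecision {
    match marker {
        // Trim so a trailing newline in the marker file does not force a rebuild.
        Some(built) if built.trim() == latest_server_commit.trim() => BuildDecision::Skip,
        _ => BuildDecision::Rebuild,
    }
}
```

A missing marker always yields `Rebuild`, so a fresh checkout builds once and only then starts skipping.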
### Build & Deploy
@@ -311,10 +318,11 @@ Gitea push to main
## Active State
**Fleet (as of 2026-06-02, live API verified):**
- 55 enrolled agents total; fleet converged to 0.6.51
- GURU-5070 on beta channel (permanent canary)
- Stragglers still catching up as they reconnect
**Fleet (as of 2026-06-04, live Postgres verified):**
- 55 enrolled agents total
- Stable channel: pinned at 0.6.47 windows/amd64 (promoted 2026-05-28); 0.6.46 linux. All 39 sites and 118 agents are on stable (channel NULL = stable default).
- Beta channel: **GURU-5070 only** — per-agent `update_channel = 'beta'` override (site "Mike's Car" / `103c10b9-c1de-4dd8-b382-b8362ed3143e` has `update_channel = NULL`, so stable is the site default; GURU-5070 is the explicit per-agent exception). Beta has no `update_rollouts` pin — server dispatches the newest signed beta artifact straight from the build pipeline.
- GURU-5070 running 0.6.54 (beta). Permanent canary; gets every new beta build immediately upon reconnect.
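
The channel rules in the list above can be sketched as a small resolution function. This is a minimal sketch, not the server's code: the precedence (agent override, then site value, then stable) is inferred from the bullets above, and `Channel`, `parse_channel`, and `effective_channel` are hypothetical names. Accepting the literal string `stable` as a column value is also an assumption; the article only confirms `'beta'` and `NULL`.

```rust
/// Update channels named in this article.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Channel {
    Stable,
    Beta,
}

/// Parse a nullable `update_channel` column value.
/// Assumption: unrecognized strings are treated the same as NULL.
pub fn parse_channel(raw: Option<&str>) -> Option<Channel> {
    match raw {
        Some("beta") => Some(Channel::Beta),
        Some("stable") => Some(Channel::Stable),
        _ => None,
    }
}

/// Resolve the effective channel for one agent.
/// Order: per-agent override, then the site's value, then the stable default.
pub fn effective_channel(agent: Option<&str>, site: Option<&str>) -> Channel {
    parse_channel(agent)
        .or_else(|| parse_channel(site))
        .unwrap_or(Channel::Stable)
}
```

Under these assumptions, GURU-5070 (agent `'beta'`, site `NULL`) resolves to `Channel::Beta`, while any agent with both columns `NULL` resolves to `Channel::Stable`.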
**Enrolled clients/sites (live API, 2026-05-24 baseline; no removals since):**
@@ -358,6 +366,13 @@ Agents management, Clients/Sites CRUD, Commands execution + terminal, Logs + AI
- #18 — macOS tray
- #19 — subscriber broadcast
**BUG-020 — tray duplicate/ghost icons (fixed to beta 2026-06-04; dormant follow-up open):**
- Symptom: duplicate AND ghost `gururmm-tray.exe` tray icons. Live evidence: 5 stacked tray processes in Session 1 on GURU-5070 (one per watchdog restart over 6/1 to 6/2).
- Root cause: `TrayLauncher` (`agent/src/watchdog/wts.rs`) tracked launches only in an in-memory `HashMap<sid,HANDLE>` that resets on watchdog restart (esp. agent auto-update), so it relaunched trays into sessions that already had one; no single-instance guard in the tray; `terminate_all` hard-killed via `TerminateProcess`, skipping the tray's `Drop` → `NIM_DELETE` (ghost).
- Fix (commit `137dd85`, gururmm@main → beta): (1) per-session `Local\GuruRMM_Tray` single-instance mutex; (2) launcher reconciliation via `WTSEnumerateProcessesW` (idempotent); (3) graceful `Global\GuruRMM_TrayShutdown_{sid}` event → 3s wait → `TerminateProcess` fallback.
- Verified: independent Grok review + Code Review Agent APPROVE.
- Follow-up (coord todo `25fdf31a`): wire `terminate_all` graceful-shutdown into the watchdog policy-disable/uninstall path so fix #3 becomes active.
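
The reconciliation idea behind fix (2) can be illustrated without any Windows calls. The sketch below is not the code in `agent/src/watchdog/wts.rs`: it models the result of a process enumeration (which the real fix obtains from `WTSEnumerateProcessesW`) as plain data, and `ProcessEntry` and `sessions_needing_tray` are hypothetical names. The point it shows is that the launch decision derives from what is running now, not from in-memory state that is lost on watchdog restart.

```rust
use std::collections::BTreeSet;

/// One row of a process enumeration, reduced to the fields this sketch needs.
pub struct ProcessEntry {
    pub session_id: u32,
    pub image_name: String,
}

/// Tray binary name as reported in the BUG-020 symptom description.
const TRAY_IMAGE: &str = "gururmm-tray.exe";

/// Return the sessions that still need a tray, in ascending order.
/// Running this twice against the same snapshot gives the same answer,
/// so a restarted watchdog cannot stack a second tray into a session.
pub fn sessions_needing_tray(active_sessions: &[u32], processes: &[ProcessEntry]) -> Vec<u32> {
    // Sessions that already host at least one tray process.
    let with_tray: BTreeSet<u32> = processes
        .iter()
        .filter(|p| p.image_name.eq_ignore_ascii_case(TRAY_IMAGE))
        .map(|p| p.session_id)
        .collect();

    // Active sessions minus the ones that already have a tray.
    let wanted: BTreeSet<u32> = active_sessions.iter().copied().collect();
    wanted.difference(&with_tray).copied().collect()
}
```

With a snapshot where Session 1 already hosts a tray and Session 2 does not, only Session 2 is returned, regardless of how many times the watchdog has restarted.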
**Security backlog (HIGH):**
- `credentials/:id/reveal` — horizontal privilege escalation (no ownership scope check)
- `internal_err()` — ~130 call sites returning raw DB errors to callers
@@ -399,7 +414,8 @@ These decisions are locked. Do not reverse without explicit user approval.
| 2026-05-24 | Linux tray IPC + GTK (PR #13+#14) and peer-cred authz (PR #14) merged. PR #21 (ReadWritePaths fix) merged. Build pipeline split into per-platform scripts. Pluto known-hosts pinned. Fleet converged to 0.6.38. |
| 2026-05-31 | Roadmap reconciliation (17 corrections — roadmap understated built state). MSPBackups mapping/verify UI + dev-admin impersonation UI deployed (dashboard v0.2.32). BUG-008/013/014 status corrected to fixed. SPEC-021 (logged-in user domain detection) written after Howard feature request. |
| 2026-06-01 | BUG-016 (Linux systemd missing StateDirectory=gururmm) + BUG-017 (device_id OnceLock cache) fixed (commit 30da053). GURU-KALI had 11 ghost agent rows from repeated UUID churn — fixed and verified. BSOD forensics: GURU-5070 bluescreened with `0x116 VIDEO_TDR_FAILURE` (nvlddmkm.sys, NVIDIA driver 32.0.15.9201 on RTX 5070 Ti Laptop GPU); GuruConnect cleared on three grounds; root cause one-off driver TDR. BSOD detection feature (issue #10 Phase 1) implemented: bsod.rs + migration 048 + ws/mod.rs handler; code review caught and fixed SF-1 (watermark before send) + SF-2 (non-atomic watermark write); merged to main (0ec55cf), agent versioned 0.6.51. |
| 2026-06-02 | Server 0.3.37 + migration 048 deployed. Build channel default-beta fix applied to build-windows.sh + build-linux.sh (macOS already correct). Webhook wired to dispatch build-server.sh with change-gate (last-built-commit-server) + backup/rollback. Fleet converged to 0.6.51; GURU-5070 promoted to stable after beta soak was effectively lost due to auto-update race. GURU-KALI BUG-016 unit file refreshed, override removed, verified clean. |
| 2026-06-02 | Server 0.3.37 + migration 048 deployed. Build channel default-beta fix applied to build-windows.sh + build-linux.sh (macOS already correct). Webhook wired to dispatch build-server.sh with change-gate (last-built-commit-server) + backup/rollback. Fleet converged to 0.6.51. GURU-KALI BUG-016 unit file refreshed, override removed, verified clean. [NOTE: the session log recorded "GURU-5070 promoted to stable" — contradicted by live DB; see 2026-06-04 entry.] |
| 2026-06-04 | Channel correction confirmed via live Postgres query: GURU-5070 `agents.update_channel = 'beta'` (explicit per-agent override). Site "Mike's Car" and all 39 sites are `update_channel = NULL` (stable default); GURU-5070 is the only beta agent in the 119-agent fleet. Stable channel pinned at 0.6.47 windows/amd64 + 0.6.46 linux via `update_rollouts` (promoted 2026-05-28); beta channel has 0 `update_rollouts` rows (server dispatches newest signed beta artifact directly). GURU-5070 running 0.6.54. BUG-020 (duplicate/ghost tray icons) fixed in commit `137dd85` to beta: per-session single-instance mutex + `WTSEnumerateProcessesW` reconciliation + graceful shutdown event (fix #3 dormant pending `terminate_all` wiring — coord todo `25fdf31a`). Verified by Grok + Code Review Agent. |
---
@@ -411,6 +427,7 @@ These decisions are locked. Do not reverse without explicit user approval.
- Pre-commit hook on 172.16.3.30 lacks execute bit (noted 2026-05-23) — likely still unfixed. [unverified]
- Auto-update reliability fix for BB-SERVER and RECEPTIONIST-PC was incomplete at 2026-05-24 save. [unverified]
- **2026-06-02 recompile:** Folded in BSOD detection feature (Phase 1 shipped — agent/src/bsod.rs, migration 048, ws handler, always-Critical alerts, verified against real 0x116 dump); server build now wired into webhook (change-gated + rollback); build channel default changed to beta (stable is explicit promote); versions updated to agent 0.6.51 / server 0.3.37; fleet converged. Corrected submodule framing (tracks active repo, develop here + push to Gitea — not "stale, do not develop"). Added build-server.sh change-gate marker and server build log to Key Files. Added server's root RMM agent as a good pattern. Updated Current Focus with BSOD Phase 2/3 and Linux fleet unit drift. Added four new anti-patterns (minidump crate, default-stable builds, webhook agent-only gap, auto-update race). Migration count updated 46 → 48.
- **2026-06-04 recompile:** Corrected GURU-5070 channel state — live Postgres confirms `update_channel = 'beta'` per-agent (not stable as the 2026-06-02 session log implied). Stable fleet pinned at 0.6.47 (not 0.6.51). GURU-5070 on 0.6.54 beta. Beta channel has no `update_rollouts` pin. Added BUG-020 (tray duplicate/ghost icons) — symptom, root cause, fix commit `137dd85`, dormant follow-up for fix #3 wiring. Updated Summary, Components table, Active State, Current Focus, History, Good Patterns, and Compilation Notes. Added sources entry for live Postgres query + commit 137dd85. Added `aliases: [guru-rmm]` frontmatter to cross-reference the tombstone at `wiki/projects/guru-rmm.md`.
## Backlinks