claudetools/wiki/projects/gururmm.md
Mike Swanson 2f99a01e7e wiki: correct GuruRMM fleet state and enrolled client list from live API
- Remove stale BB-SERVER/RECEPTIONIST-PC laggard note (both on 0.6.38)
- Add actual laggards (15 offline agents on older versions)
- Replace 4-entry enrolled sites list with full 12-client table from live API
- Note Saturn agent not present in API (concern resolved)
- Update overview.md fleet count and client table to match

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-24 16:48:24 -07:00


type: project
name: gururmm
display_name: GuruRMM
last_compiled: 2026-05-24
compiled_by: DESKTOP-0O8A1RL/claude-main
sources:
  - projects/msp-tools/guru-rmm/CONTEXT.md
  - projects/msp-tools/guru-rmm/docs/FEATURE_ROADMAP.md
  - projects/msp-tools/guru-rmm/docs/UI_GAPS.md
  - projects/msp-tools/guru-rmm/docs/ARCHITECTURE_DECISIONS.md
  - projects/msp-tools/guru-rmm/docs/tech-stack.md
  - projects/msp-tools/guru-rmm/docs/DESIGN.md
  - .claude/memory/reference_gururmm_server.md
  - .claude/memory/reference_gururmm_api.md
  - .claude/memory/gururmm-development-principles.md
  - .claude/memory/feedback_gururmm_agent_parity.md
  - .claude/memory/reference_pluto_build_server.md
  - .claude/memory/project_mac_gururmm_setup_pending.md
  - credentials.md
  - session-logs/2025-12-15-session.md
  - session-logs/2025-12-20-session.md
  - session-logs/2026-04-19-session.md
  - session-logs/2026-04-21-session.md
  - session-logs/2026-04-29-session.md
  - session-logs/2026-05-12-guru-rmm-macos-agent-phase1.md
  - session-logs/2026-05-15-session.md
  - session-logs/2026-05-16-session.md
  - session-logs/2026-05-17-session.md
  - session-logs/2026-05-19-gururmm-backup-fixes.md
  - session-logs/2026-05-19-session.md
  - session-logs/2026-05-21-session.md
  - session-logs/2026-05-23-session.md
  - session-logs/2026-05-24-session.md
  - session-logs/2026-05-24-GURU-KALI-session.md
backlinks:
  - clients/cascades-tucson
  - systems/gururmm-build
  - systems/jupiter
  - systems/pluto

GuruRMM

Summary

GuruRMM is a Remote Monitoring & Management platform built by Arizona Computer Guru LLC for internal MSP operations and eventual productization. The server (Rust/Axum) and dashboard (React/TypeScript) are production-deployed at https://rmm.azcomputerguru.com with approximately 55 enrolled agents across multiple client sites. The agent runs on managed Windows, Linux, and macOS endpoints.

Current version: 0.6.38 (as of 2026-05-24; fleet converged within ~10 minutes of publish)

Repo: azcomputerguru/gururmm on Gitea (internal: http://172.16.3.20:3000). The copy at D:\claudetools\projects\msp-tools\guru-rmm is a stale reference submodule — do NOT develop there; all real work happens in the Gitea repo.

Goal: Full-featured MSP platform rivaling commercial RMMs, with a companion PSA (GuruPSA, separate future repo) designed as a truly integrated unified system — not bolted-together products.


Architecture

Components

| Component | Location | Tech | State |
|---|---|---|---|
| Server | 172.16.3.30:3001, systemd gururmm-server, binary /usr/local/bin/gururmm-server | Rust, Axum | deployed, production |
| Dashboard | https://rmm.azcomputerguru.com, nginx at /var/www/gururmm/dashboard/ | React + TypeScript + Vite, shadcn/ui, Tailwind CSS v4 | deployed, production |
| Agent (Windows) | Endpoints, installed as GuruRMMAgent Windows service via WiX MSI | Rust, Windows MSVC | deployed, fleet on 0.6.38 |
| Agent (Linux) | Endpoints, systemd gururmm-agent, binary /usr/local/bin/gururmm-agent | Rust, musl static | deployed |
| Agent (macOS) | Endpoints, LaunchDaemon com.azcomputerguru.gururmm-agent.plist | Rust, aarch64/x86_64 | Phase 1 deployed 2026-05-12; code signing issue on Apple Silicon |
| Tray (Windows) | System tray, named pipe IPC | Rust | deployed |
| Tray (Linux) | System tray, Unix socket IPC, libappindicator/GTK | Rust, GTK | deployed 2026-05-24 (PR #13+#14 merged) |
| Tray (macOS) | Menu bar | Rust | stub/TODO (issue #18) |
| PostgreSQL DB | localhost:5432 on 172.16.3.30, database gururmm | PostgreSQL | deployed |
| Coord API | 172.16.3.30:8001/api/coord | FastAPI (part of ClaudeTools API) | deployed |
| Build pipeline | 172.16.3.30:9000 webhook + /opt/gururmm/ scripts | Python (webhook-handler.py), Bash | deployed; split into per-platform scripts 2026-05-24 |
| Pluto (Windows build VM) | 172.16.3.36, Windows Server 2019 VM on Jupiter (Unraid) | Rust MSVC, WiX v4 | operational |

Key Files & Repos

  • Active repo: azcomputerguru/gururmm (http://172.16.3.20:3000/azcomputerguru/gururmm)
  • Reference clone: D:\claudetools\projects\msp-tools\guru-rmm — stale submodule, do not develop here
  • Server binary: /usr/local/bin/gururmm-server on 172.16.3.30
  • Agent binary (Linux): /usr/local/bin/gururmm-agent
  • Agent config (Linux/macOS): /etc/gururmm/agent.toml (root, mode 600); macOS uses /usr/local/etc/gururmm/site.plist
  • Agent registry (Windows): HKLM\SOFTWARE\GuruRMM\SiteId (written by MSI)
  • Windows service name: GuruRMMAgent (NOT gururmm-agent)
  • Downloads dir: /var/www/gururmm/downloads/ on 172.16.3.30
  • Webhook handler: /opt/gururmm/webhook-handler.py (port 9000, systemd gururmm-webhook)
  • Build scripts: /opt/gururmm/build-shared.sh, build-linux.sh, build-windows.sh, build-mac.sh (split 2026-05-24; build-agents.sh is now a compat wrapper)
  • Server build script: /opt/gururmm/build-server.sh (separate pipeline — manual trigger required for server code changes)
  • Per-platform SHA tracking: /opt/gururmm/last-built-commit-{linux,windows,mac}
  • Pluto known-hosts: /opt/gururmm/pluto_known_hosts (pinned SSH keys; installed 2026-05-24)
  • Build log (Linux): /var/log/gururmm-build-linux.log
  • Build log (Windows): /var/log/gururmm-build-windows.log
  • API (internal): http://172.16.3.30:3001
  • API (external): https://rmm-api.azcomputerguru.com (Cloudflare)
  • Dashboard: https://rmm.azcomputerguru.com
  • DB URL: postgres://gururmm:43617ebf7eb242e814ca9988cc4df5ad@localhost:5432/gururmm
  • Vault path: infrastructure/gururmm-server.sops.yaml

Repo Structure

gururmm/
├── agent/          Rust agent (managed endpoints)
│   └── src/
│       ├── ipc.rs      Unix socket IPC (Linux); Windows named pipe
│       ├── tunnel/     TunnelManager state machine
│       ├── metrics/    sysinfo-based collection (temp NOT yet wired — BUG-001)
│       ├── registry_ops/ Windows registry read/write
│       ├── updater/    Self-update handler
│       └── main.rs     systemd unit template generation
├── server/         Rust/Axum API server
│   └── src/
│       ├── api/        REST handlers
│       ├── db/         Database layer (sqlx)
│       ├── ws/         WebSocket handler
│       └── mspbackups/ MSP360 backup integration
├── tray/           System tray binary
├── installer/      WiX v4 MSI (gururmm-agent.wxs)
├── scripts/        Build/ops scripts
└── docs/           FEATURE_ROADMAP.md, UI_GAPS.md, ARCHITECTURE_DECISIONS.md, tech-stack.md, DESIGN.md, specs/

Development

Current Focus

As of 2026-05-24 (v0.6.38):

  • Tray IPC + peer authorization — Linux tray merged (PR #13+#14). Open: Windows peer authz (#16), logind console-user resolution (#17), macOS tray (#18), subscriber broadcast (#19).
  • Agent self-update hardening — ProtectSystem=strict needs ReadWritePaths=/var/log /usr/local/bin /etc/gururmm and RuntimeDirectory=gururmm. Fixed in PR #21.
  • Auto-update reliability — BB-SERVER and RECEPTIONIST-PC (Cascades) miss dispatch windows due to flaky WebSockets. Re-querying pending updates on reconnect: incomplete as of 2026-05-24.
  • Watchdog alerts UI — backend complete but PUT /watchdog-alerts/:id/resolve and DELETE /watchdog-alerts/:id routes missing on server (found in 2026-05-23 audit).
  • MSP360 backup integration — Phase 1 complete (monitoring, alerts, mapping, storage thresholds). Phase 2 (management) not started.
  • Security audit backlog: credentials/:id/reveal horizontal privilege escalation (HIGH), internal_err() raw DB errors at ~130 call sites (HIGH).

Patterns & Anti-Patterns

Anti-patterns — never repeat:

| Pattern | What Went Wrong |
|---|---|
| useMemo with stable deps for data-dependent values | queryClient is stable, so the memo never recomputes after queries resolve. Use useQuery instead. |
| CSS variable text colors inside the sidebar | Sidebar bg is hardcoded dark; CSS vars flip in light mode. Use text-white explicitly inside sidebar. |
| Deploying without stopping the server first | "text file busy" kernel error. Always systemctl stop before cp. |
| Building without DATABASE_URL | sqlx compile-time macros fail. DATABASE_URL is in /home/guru/.cargo/env. |
| DB migrations without inserting into _sqlx_migrations | Server crashes on start. Must insert SHA-384 checksum manually. |
| WiX MSI builds on Linux | WiX requires msi.dll. MSI must be built on Pluto (Windows). |
| Manual builds via SSH | All builds go through webhook-handler.py. Never SSH and run cargo build + artifact copy manually. |
| TOML/config for agent endpoint or site_id | Server URL compiled into binary, site_id baked into MSI. No runtime config files for these values. |
| path.find('\') in #[cfg(windows)] files | Passes silently on Linux (the file is not compiled there), fails on Pluto MSVC with unterminated char literal. Use '\\'. |
| STATUS_BADGE_CLASSES Record const | Vite/Rollup may optimize away the lookup. Use explicit getStatusBadgeClass() if/else function. |
| SSH heredoc for TypeScript edits | Shell strips double-quote characters. Edit locally in submodule, push to Gitea, pull on server. |
| Restart-Service GuruRMMAgent -Force in command scripts | Kills agent before it can report result. Commands stay in running status forever. Use scheduled task with delay instead. |
| Running git as root in systemd build context | git rejects repo as dubious ownership when running as root on guru-owned repo. Use safe.directory config or sudo -u guru git. |
| Self-updating running bash script | bash reads line-by-line from disk; replacing mid-execution silently skips remaining blocks. |
| +1.77 legacy builds without --ignore-rust-version | Fail MSRV check after adding rust-version to Cargo.toml. Add --ignore-rust-version to legacy build lines only. |
| StrictHostKeyChecking=no for Pluto SSH | MITM would compromise build artifacts. Replaced with pinned known-hosts at /opt/gururmm/pluto_known_hosts. |
| CRLF line endings in migration files | sqlx SHA-384 checksum mismatch causes server crash on start. .gitattributes + core.autocrlf=false + pre-commit hook prevents this. |
| Dead WebSocket write half | WS write fails, send task dies, receive loop keeps agent in ConnectedAgents with dead write half. Commands silently fail. Fix: tokio::select! monitoring both tasks. |
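
The dead write-half row describes a supervision problem: two tasks share one connection, and the connection must be torn down as soon as either task ends. The actual fix uses tokio::select! in Rust; the sketch below is only a Python asyncio analog of the same idea, and supervise_connection, send_loop, and recv_loop are hypothetical names rather than anything in the GuruRMM codebase.

```python
import asyncio


async def supervise_connection(send_loop, recv_loop):
    """Run both halves of a connection and stop both as soon as either ends."""
    send_task = asyncio.create_task(send_loop())
    recv_task = asyncio.create_task(recv_loop())

    # FIRST_COMPLETED also fires when a task ends by raising an exception.
    done, pending = await asyncio.wait(
        {send_task, recv_task}, return_when=asyncio.FIRST_COMPLETED
    )

    # Cancel the surviving half so the connection cannot linger half-open.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)

    # Collect failures from the finished half so they are not left unobserved.
    errors = [t.exception() for t in done if not t.cancelled() and t.exception()]
    first_to_end = "send" if send_task in done else "recv"
    return first_to_end, errors
```

With this shape, a caller can remove the agent from its connected set once supervise_connection returns, whichever half ended first.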

Good patterns:

  • Platform parity rule — any agent feature goes on Windows + Linux + macOS in the same commit. If a real implementation isn't feasible, add a working stub + // TODO(platform): <os> — <reason>. No silent no-ops.
  • Per-platform last-built-commit tracking — Linux builds succeed and record progress independently of Windows builds.
  • Holistic feature development — every feature ships backend + API + dashboard UI + docs together. Backend-only features are rejected.
  • sqlx offline mode — compile-time query validation requires DB reachable or offline cache present.
  • RuntimeDirectory=gururmm in systemd unit — systemd-native way to give agent writable /run/gururmm/ for IPC socket.
  • Registry-first path resolution — read HKLM:\SOFTWARE\GuruRMM for install dir, fall back to service PathName, then hardcoded default.
  • interrupt_running_commands() at reconnect — flips all status='running' commands for reconnecting agent to status='interrupted'.
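
The registry-first path resolution above is a three-step fallback chain. A minimal sketch, assuming the registry value and the service PathName have already been read by the caller and are passed in as strings (or None when missing); DEFAULT_INSTALL_DIR and both function names are hypothetical, since the source does not state the hardcoded default:

```python
from ntpath import dirname

# Hypothetical default; the real hardcoded path is not given in this page.
DEFAULT_INSTALL_DIR = r"C:\Program Files\GuruRMM"


def exe_from_service_path(path_name):
    """Extract the executable from a service PathName, which may be quoted and carry arguments."""
    path_name = path_name.strip()
    if path_name.startswith('"'):
        end = path_name.find('"', 1)
        return path_name[1:end] if end != -1 else path_name[1:]
    # Unquoted: keep everything up to and including the first ".exe".
    idx = path_name.lower().find(".exe")
    return path_name[: idx + 4] if idx != -1 else path_name


def resolve_install_dir(registry_value, service_path_name, default=DEFAULT_INSTALL_DIR):
    """Registry value first, then the service executable's directory, then the default."""
    if registry_value:
        return registry_value
    if service_path_name:
        return dirname(exe_from_service_path(service_path_name))
    return default
```

Keeping the lookups outside the function makes the ordering easy to test without a Windows host.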

Build & Deploy

CRITICAL: Never trigger builds manually via SSH. All builds go through the webhook pipeline.

Gitea push to main
  -> webhook-handler.py (172.16.3.30:9000, parallel threads per platform)
    -> build-shared.sh   (auto-version bump, git sync — runs once)
    -> build-linux.sh    (cargo build on .30; log: /var/log/gururmm-build-linux.log)
    -> build-windows.sh  (SSH -> Pluto 172.16.3.36 via pinned known-hosts
                          cargo build --release x64 MSVC + i686 MSVC
                          +1.77 legacy builds with --ignore-rust-version
                          WiX MSI build for site-specific base
                          sign-windows.sh (jsign + Azure Trusted Signing)
                          SCP artifacts back; log: /var/log/gururmm-build-windows.log)
    -> build-mac.sh      (stub — no build machine configured yet)
  -> artifacts -> /var/www/gururmm/downloads/ with sha256 + -latest symlinks
  -> per-platform last-built-commit files updated
  -> systemctl restart gururmm-agent (local agent on .30)

Auto-version: build-shared.sh diffs agent/, server/, dashboard/ against last built SHA. For each changed component, bumps patch version in Cargo.toml or package.json, commits [ci-version-bump], pushes. Webhook skips builds where all commits are version bumps.
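
The auto-version rules can be sketched as three small functions. This is an illustration in Python, not the contents of build-shared.sh (which is Bash); the function names are hypothetical, and it assumes changed paths are repo-relative and versions are plain MAJOR.MINOR.PATCH strings.

```python
import re

COMPONENTS = ("agent", "server", "dashboard")
BUMP_MARKER = "[ci-version-bump]"


def changed_components(changed_paths):
    """Return the components whose directory contains at least one changed path."""
    return [c for c in COMPONENTS if any(p.startswith(c + "/") for p in changed_paths)]


def bump_patch(version):
    """Increment the patch field of a MAJOR.MINOR.PATCH version string."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(g) for g in match.groups())
    return f"{major}.{minor}.{patch + 1}"


def should_skip_build(commit_messages):
    """Skip only when there is at least one commit and every one is a version bump."""
    return bool(commit_messages) and all(BUMP_MARKER in m for m in commit_messages)
```

The skip check is what keeps the bump commit pushed by the pipeline from triggering another build of itself.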

Server code changes — separate manual step, NOT in agent pipeline:

sudo /opt/gururmm/build-server.sh

Dashboard deploy — also separate:

cd /home/guru/gururmm/dashboard && sudo -u guru npm run build
sudo rsync -av --delete /home/guru/gururmm/dashboard/dist/ /var/www/gururmm/dashboard/

DB migrations — manual; must insert SHA-384 checksum into _sqlx_migrations or server crashes on start.
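
Since the checksum is computed over the migration file's bytes, a line-ending change alone is enough to produce a mismatch. A small sketch, assuming the checksum is SHA-384 over the raw file contents as the notes above imply; the SQL statement and the function name are hypothetical, and how the value is stored in _sqlx_migrations is not covered here:

```python
import hashlib


def migration_checksum(sql_bytes):
    """SHA-384 digest of the raw migration file bytes, as a hex string."""
    return hashlib.sha384(sql_bytes).hexdigest()


# Same statement, differing only in line endings (hypothetical migration).
lf_version = b"ALTER TABLE agents ADD COLUMN note TEXT;\n"
crlf_version = lf_version.replace(b"\n", b"\r\n")

lf_sum = migration_checksum(lf_version)
crlf_sum = migration_checksum(crlf_version)
print("LF  :", lf_sum)
print("CRLF:", crlf_sum)
print("match:", lf_sum == crlf_sum)
```

The two digests differ, which is why a checkout with CRLF conversion makes an already-applied migration look modified.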

Pluto (172.16.3.36):

  • Windows Server 2019 VM on Jupiter (Unraid)
  • SSH: ssh -o UserKnownHostsFile=/opt/gururmm/pluto_known_hosts Administrator@172.16.3.36
  • Rust stable 1.95.0 + 1.77 pinned for legacy builds
  • VS Build Tools (MSVC), sccache at C:\sccache, WiX v4, Gitea clone at C:\gururmm\

Auto-update delivery:

  • Server scans every 300s; dispatches update command on agent heartbeat
  • Gated on effective policy auto_update (default on when policy is null)
  • Agent: downloads to PrivateTmp, verifies SHA-256, replaces binary, restarts service
  • Force-trigger: POST /api/agents/:id/update
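
The agent-side verify-then-replace step can be sketched as follows. This is a Python illustration of the sequence, not the agent's Rust updater; install_if_verified and sha256_file are hypothetical names, and the service restart is left out. It stages a copy next to the target before renaming, on the assumption that the download directory may sit on a different filesystem than the binary.

```python
import hashlib
import hmac
import os
import shutil


def sha256_file(path, chunk_size=65536):
    """Hash a file in chunks so large binaries are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def install_if_verified(downloaded, target, expected_sha256):
    """Replace target with downloaded only when the digest matches; return True on success."""
    actual = sha256_file(downloaded)
    if not hmac.compare_digest(actual, expected_sha256.strip().lower()):
        os.remove(downloaded)  # never leave an unverified binary behind
        return False
    staged = target + ".new"
    shutil.copyfile(downloaded, staged)  # copy onto the target's filesystem
    os.chmod(staged, 0o755)
    os.replace(staged, target)  # rename within one filesystem is atomic
    os.remove(downloaded)
    return True
```

A mismatch leaves the running binary untouched, so a corrupted or tampered download costs one retry rather than a broken agent.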

Active State

Fleet (as of 2026-05-24, live API verified):

  • 55 enrolled agents total; 37 online
  • 40/55 on 0.6.38 (current). 15 laggards — all offline; will self-update on reconnect.
  • Laggards by version: 6× v0.6.27, 4× v0.6.3, 2× v0.6.29, 1× v0.6.28, 1× v0.6.2, 1× v0.6.1 (Mikes-MacBook-Air.local — macOS, significant lag)
  • "Saturn" agent not present in API as of 2026-05-24 — concern resolved or entry was removed.

Enrolled clients/sites (live API, 2026-05-24):

| Client | Type | Sites | Notable agents |
|---|---|---|---|
| AZ Computer Guru (internal) | Internal | DF Server Storage, Howard-VM, Main Office, Mike's Car, Mikes House | Jupiter, PLUTO, gururmm, GURU-KALI, GURU-5070, Mikes-MacBook-Air.local, ACG-DC16, NEPTUNE, ix.azcomputerguru.com |
| BirthBiologic | Corporate | Main Office | BB-SERVER |
| Cascades of Tucson | Corporate | CascadesTucson | 27 agents: CS-SERVER, RECEPTIONIST-PC, ASSISTMAN-PC, MDIRECTOR-PC, MEMRECEPT-PC, and ~22 others |
| Dataforth Corp | Corporate | D1 | AD2, DF-GAGETRAK |
| Grabb & Durando Law Office | Corporate | Main Office | GND-SERVER |
| Instrumental Music Center | Corporate | IMCMain | IMC1 |
| Key, Paul | Residential | Home | KEY-MEDIA |
| Peaceful Spirit | Residential | Bridgette Home, Country Club, Mara Home | BridgettePSHomeComputer, PST-SERVER, PST-SURFACE, Maras-HP-Laptop, MaraHomeNew |
| Safesite | Corporate | Glendale | MSI |
| Sombra Residential LLC | Corporate | main office | DESKTOP-UQRN4K3, Server2013 |
| Stamback Septic | Corporate | StambackSeptic | DESKTOP-BTR2AM3, StambackLaptopNew |
| Swanson, Len | Residential | Home | LAS-GAMER |

API auth:

  • POST /api/auth/login → JWT (~24h)
  • Creds: vault infrastructure/gururmm-server.sops.yaml → credentials.gururmm-api.admin-email / admin-password
  • Key endpoints: GET /api/agents, POST /api/agents/:id/command, GET /api/commands/:id, POST /api/agents/:id/update
  • Command fields: command_type (powershell/shell/exec), command (script text, JSON-encoded). Windows agent runs as LocalSystem.
  • Response: stdout, stderr, exit_code, status (running/completed/failed/timeout/interrupted)
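
A minimal sketch of the command flow described above: build the request for POST /api/agents/:id/command, then poll GET /api/commands/:id until the status is terminal. The endpoint paths, field names, and status values come from the list above; build_command_request, wait_for_result, and the fetch_command callable are hypothetical. The login body shape and how the JWT is attached to requests are not stated here, so the HTTP call is left to the caller.

```python
import json
import time

COMMAND_TYPES = ("powershell", "shell", "exec")
TERMINAL_STATUSES = {"completed", "failed", "timeout", "interrupted"}


def build_command_request(agent_id, command_type, script):
    """Return (path, JSON body) for dispatching a command to one agent."""
    if command_type not in COMMAND_TYPES:
        raise ValueError(f"unsupported command_type: {command_type!r}")
    path = f"/api/agents/{agent_id}/command"
    # json.dumps handles the quoting of the script text inside the body.
    body = json.dumps({"command_type": command_type, "command": script})
    return path, body


def wait_for_result(fetch_command, command_id, poll_seconds=2.0, max_polls=30, sleep=time.sleep):
    """Poll until the command leaves 'running'.

    fetch_command is a caller-supplied callable that performs the
    authenticated GET /api/commands/:id and returns the decoded JSON dict.
    """
    for _ in range(max_polls):
        result = fetch_command(command_id)
        if result["status"] in TERMINAL_STATUSES:
            return result
        sleep(poll_seconds)
    raise TimeoutError(f"command {command_id} still running after {max_polls} polls")
```

Treating interrupted as terminal matters here: a command flipped by interrupt_running_commands() at reconnect will never report completed.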

Dashboard — complete and working: Agents management, Clients/Sites CRUD, Commands execution + terminal, Logs + AI analysis, Alerts, Metrics (CPU/RAM/disk/network, process drill-down modal), Auto-update triggering, Network state, Entra ID SSO, Policies Dashboard (all tabs), Registry editor, MSP360 backup status card.

Dashboard — incomplete (see UI_GAPS.md):

  • Temperature monitoring (BUG-001) — UI ready, agent-side collection never wired
  • Enrollment management UI (revoke keys, audit log, duplicate hostname warnings)
  • Watchdog alerts UI — blocked on 2 missing server routes
  • MSPBackups management UI — backend complete, no frontend
  • Organizations management UI — multi-tenancy backend done, no frontend
  • Tunnel session management (interactive terminal — backend skeleton, not production-ready)

Open Gitea issues:

  • #15 — Pipeline tray build (publish tray binary to downloads)
  • #16 — Windows IPC peer authz
  • #17 — logind console user resolution
  • #18 — macOS tray
  • #19 — subscriber broadcast

Security backlog (HIGH):

  • credentials/:id/reveal — horizontal privilege escalation (no ownership scope check)
  • internal_err() — ~130 call sites returning raw DB errors to callers

Key Architecture Decisions (LOCKED)

These decisions are locked. Do not reverse without explicit user approval.

  1. Per-agent enrollment keys — MSI contains server URL + site_id only. Agent calls POST /api/enroll on first run; server issues unique per-agent key stored hashed. Enables revocation, clone detection, audit trail.
  2. Site-specific MSI generation — Universal base MSI from CI; dashboard endpoint generates site-specific MSI with site_id baked in via WiX property → HKLM\SOFTWARE\GuruRMM\SiteId.
  3. No TOML/config for endpoints — Server URL compiled into binary. No runtime config files for server URL or site_id.
  4. Policy inheritance chain — global → site → client → agent. Server computes merged effective policy and pushes via ConfigUpdate WebSocket message.
  5. Platform parity rule — Any agent feature ships on Windows, Linux, and macOS in the same change. Stub + TODO required if a real implementation is not yet feasible.
  6. Watchdog as separate process — Main agent cannot reliably restart itself after a crash.
  7. Build pipeline is the only path to production — Enforces signing, checksum generation, consistent artifact layout.
  8. Multi-tenancy identity model (ADR-001) — Dev team with partner impersonation. Three levels: Dev → Partner → Client. Computer Guru is partner #1.
  9. Holistic feature development (DESIGN.md) — Every feature requires backend + API + dashboard UI + documentation. Backend-only features are rejected.
  10. AI-optional operation — GuruRMM must be fully functional without AI. AI features are enhancements, not requirements.
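
Decision 4 (policy inheritance) can be illustrated with a small merge function. This is a sketch under stated assumptions, not the server's implementation: it treats each level's policy as a flat dict, lets later levels override earlier ones in the order given in decision 4, and treats a missing or None value as "inherit". Only auto_update and its default-on behaviour come from this page; metrics_interval_secs and both function names are hypothetical.

```python
# Order as listed in decision 4; later levels override earlier ones (assumption).
INHERITANCE_ORDER = ("global", "site", "client", "agent")


def effective_policy(layers):
    """Merge per-level policy dicts into one effective policy."""
    merged = {}
    for level in INHERITANCE_ORDER:
        for key, value in (layers.get(level) or {}).items():
            if value is not None:  # None means "inherit from the level above"
                merged[key] = value
    return merged


def auto_update_enabled(policy):
    """auto_update defaults to on when the policy, or the setting, is null."""
    value = (policy or {}).get("auto_update")
    return True if value is None else bool(value)


layers = {
    "global": {"auto_update": True, "metrics_interval_secs": 60},
    "site": {"metrics_interval_secs": 30},
    "client": None,
    "agent": {"auto_update": False, "metrics_interval_secs": None},
}
merged = effective_policy(layers)
```

In this example the agent-level auto_update wins while the interval falls back to the site-level value, which is the kind of merged result the server would push in a ConfigUpdate message.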

History Highlights

| Date | Event |
|---|---|
| 2025-12-15 | Project genesis: Windows service + Linux installer + site code auth + build server. DB migrated from Jupiter Docker to local PostgreSQL. |
| 2026-04-19 | Full drill-down navigation, auto-install on first run, Pluto build VM setup started. |
| 2026-04-21 | MSI build fix (missing WiX extension flag). DESIGN.md created (holistic development mandate). BirthBiologic onboarded. |
| 2026-04-29 | UI_GAPS.md created. Holistic development principle formalized. |
| 2026-05-12 | macOS agent Phase 1 deployed from Mikes-MacBook-Air. Code signing issue on Apple Silicon noted. |
| 2026-05-15 | Dead WebSocket write-half bug fixed. Temperature struct field name mismatch fixed. |
| 2026-05-16 | Watchdog bugs fixed (sc.exe fallback, suppress_until, hypervisor detection). /feature-request skill created. |
| 2026-05-17 | Syncro PSA Integration added to roadmap (P1) after Howard /feature-request. Office power failure recovery: all VMs recovered. |
| 2026-05-18 | Multi-tenancy architecture (ADR-001) decided. 5 SPEC documents created (SPEC-001 through SPEC-006). |
| 2026-05-19 | 4-bug fix for AD2 crash loop. MSP360 backup integration completed (6 fixes). Clickable CPU/Memory gauge cards + process drill-down modal. |
| 2026-05-23 | /rmm-audit pass. Agent optimization Phases 1A-3. Auto-version bump mechanism. MSRV bumped to 1.85. Fleet at 0.6.29. |
| 2026-05-24 | Linux tray IPC + GTK (PR #13+#14) and peer-cred authz (PR #14) merged. PR #21 (ReadWritePaths fix) merged. Build pipeline split into per-platform scripts. Pluto known-hosts pinned. Fleet converged to 0.6.38. |

Compilation Notes

  • macOS build status: Phase 1 was deployed manually from Mikes-MacBook-Air (2026-05-12). build-mac.sh is a stub as of 2026-05-24 — unclear if automated pipeline includes macOS yet. [unverified]
  • Tunnel subsystem: agent-side substantially complete; server-side is dead-code skeleton. Current live status unconfirmed. [unverified]
  • Pre-commit hook on 172.16.3.30 lacks execute bit (noted 2026-05-23) — likely still unfixed. [unverified]
  • Auto-update reliability fix for BB-SERVER and RECEPTIONIST-PC was incomplete at 2026-05-24 save. [unverified]
  • clients/cascades-tucson — RECEPTIONIST-PC enrolled (site CascadesTucson)
  • systems/gururmm-build — Linux VM at 172.16.3.30 on Jupiter; GuruRMM API 3001, ClaudeTools API 8001, Coord API, MariaDB, PostgreSQL, build pipeline; originally a container on Jupiter, migrated to own VM
  • systems/jupiter — Unraid host at 172.16.3.20; virsh host for all VMs (GuruRMM VM, Unifi, OwnCloud, Pluto/Claude-Builder); Docker: Gitea port 3000, NPM, Seafile; iptables PREROUTING routes :443 to NPM (NPM proxy rmm-api -> 172.16.3.20:3001 in credentials.md is STALE — actual GuruRMM API is on 172.16.3.30)
  • systems/pluto — Windows build server (MSI, WiX) at 172.16.3.36