---
type: project
name: gururmm
display_name: GuruRMM
last_compiled: 2026-05-24
compiled_by: DESKTOP-0O8A1RL/claude-main
sources:
- projects/msp-tools/guru-rmm/CONTEXT.md
- projects/msp-tools/guru-rmm/docs/FEATURE_ROADMAP.md
- projects/msp-tools/guru-rmm/docs/UI_GAPS.md
- projects/msp-tools/guru-rmm/docs/ARCHITECTURE_DECISIONS.md
- projects/msp-tools/guru-rmm/docs/tech-stack.md
- projects/msp-tools/guru-rmm/docs/DESIGN.md
- .claude/memory/reference_gururmm_server.md
- .claude/memory/reference_gururmm_api.md
- .claude/memory/gururmm-development-principles.md
- .claude/memory/feedback_gururmm_agent_parity.md
- .claude/memory/reference_pluto_build_server.md
- .claude/memory/project_mac_gururmm_setup_pending.md
- credentials.md
- session-logs/2025-12-15-session.md
- session-logs/2025-12-20-session.md
- session-logs/2026-04-19-session.md
- session-logs/2026-04-21-session.md
- session-logs/2026-04-29-session.md
- session-logs/2026-05-12-guru-rmm-macos-agent-phase1.md
- session-logs/2026-05-15-session.md
- session-logs/2026-05-16-session.md
- session-logs/2026-05-17-session.md
- session-logs/2026-05-19-gururmm-backup-fixes.md
- session-logs/2026-05-19-session.md
- session-logs/2026-05-21-session.md
- session-logs/2026-05-23-session.md
- session-logs/2026-05-24-session.md
- session-logs/2026-05-24-GURU-KALI-session.md
backlinks:
- clients/cascades-tucson
- systems/gururmm-build
- systems/jupiter
- systems/pluto
---
# GuruRMM
## Summary
GuruRMM is a Remote Monitoring & Management platform built by Arizona Computer Guru LLC for internal MSP operations and eventual productization. The server (Rust/Axum) and dashboard (React/TypeScript) are production-deployed at https://rmm.azcomputerguru.com with 55 enrolled agents (37 online as of 2026-05-24) across multiple client sites. The agent runs on managed Windows, Linux, and macOS endpoints.
**Current version:** 0.6.38 (as of 2026-05-24; all online agents converged within ~10 minutes of publish, and 15 offline agents still run older versions)
**Repo:** `azcomputerguru/gururmm` on Gitea (internal: http://172.16.3.20:3000). The copy at `D:\claudetools\projects\msp-tools\guru-rmm` is a stale reference submodule — do NOT develop there; all real work happens in the Gitea repo.
**Goal:** Full-featured MSP platform rivaling commercial RMMs, with a companion PSA (GuruPSA, separate future repo) designed as a truly integrated unified system — not bolted-together products.
---
## Architecture
### Components
| Component | Location | Tech | State |
|---|---|---|---|
| Server | 172.16.3.30:3001, systemd `gururmm-server`, binary `/usr/local/bin/gururmm-server` | Rust, Axum | deployed, production |
| Dashboard | https://rmm.azcomputerguru.com, nginx at `/var/www/gururmm/dashboard/` | React + TypeScript + Vite, shadcn/ui, Tailwind CSS v4 | deployed, production |
| Agent (Windows) | Endpoints, installed as `GuruRMMAgent` Windows service via WiX MSI | Rust, Windows MSVC | deployed; online fleet on 0.6.38 |
| Agent (Linux) | Endpoints, systemd `gururmm-agent`, binary `/usr/local/bin/gururmm-agent` | Rust, musl static | deployed |
| Agent (macOS) | Endpoints, LaunchDaemon `com.azcomputerguru.gururmm-agent.plist` | Rust, aarch64/x86_64 | Phase 1 deployed 2026-05-12; code signing issue on Apple Silicon |
| Tray (Windows) | System tray, named pipe IPC | Rust | deployed |
| Tray (Linux) | System tray, Unix socket IPC, libappindicator/GTK | Rust, GTK | deployed 2026-05-24 (PR #13+#14 merged) |
| Tray (macOS) | Menu bar | Rust | stub/TODO (issue #18) |
| PostgreSQL DB | localhost:5432 on 172.16.3.30, database `gururmm` | PostgreSQL | deployed |
| Coord API | 172.16.3.30:8001/api/coord | FastAPI (part of ClaudeTools API) | deployed |
| Build pipeline | 172.16.3.30:9000 webhook + `/opt/gururmm/` scripts | Python (webhook-handler.py), Bash | deployed; split into per-platform scripts 2026-05-24 |
| Pluto (Windows build VM) | 172.16.3.36, Windows Server 2019 VM on Jupiter (Unraid) | Rust MSVC, WiX v4 | operational |
### Key Files & Repos
- **Active repo:** `azcomputerguru/gururmm` — http://172.16.3.20:3000/azcomputerguru/gururmm
- **Reference clone:** `D:\claudetools\projects\msp-tools\guru-rmm` — stale submodule, do not develop here
- **Server binary:** `/usr/local/bin/gururmm-server` on 172.16.3.30
- **Agent binary (Linux):** `/usr/local/bin/gururmm-agent`
- **Agent config (Linux/macOS):** `/etc/gururmm/agent.toml` (root, mode 600); macOS uses `/usr/local/etc/gururmm/site.plist`
- **Agent registry (Windows):** `HKLM\SOFTWARE\GuruRMM\SiteId` (written by MSI)
- **Windows service name:** `GuruRMMAgent` (NOT `gururmm-agent`)
- **Downloads dir:** `/var/www/gururmm/downloads/` on 172.16.3.30
- **Webhook handler:** `/opt/gururmm/webhook-handler.py` (port 9000, systemd `gururmm-webhook`)
- **Build scripts:** `/opt/gururmm/build-shared.sh`, `build-linux.sh`, `build-windows.sh`, `build-mac.sh` (split 2026-05-24; `build-agents.sh` is now a compat wrapper)
- **Server build script:** `/opt/gururmm/build-server.sh` (separate pipeline — manual trigger required for server code changes)
- **Per-platform SHA tracking:** `/opt/gururmm/last-built-commit-{linux,windows,mac}`
- **Pluto known-hosts:** `/opt/gururmm/pluto_known_hosts` (pinned SSH keys; installed 2026-05-24)
- **Build log (Linux):** `/var/log/gururmm-build-linux.log`
- **Build log (Windows):** `/var/log/gururmm-build-windows.log`
- **API (internal):** http://172.16.3.30:3001
- **API (external):** https://rmm-api.azcomputerguru.com (Cloudflare)
- **Dashboard:** https://rmm.azcomputerguru.com
- **DB URL:** `postgres://gururmm:<password>@localhost:5432/gururmm` (password omitted here; the full URL is set as `DATABASE_URL` in `/home/guru/.cargo/env`)
- **Vault path:** `infrastructure/gururmm-server.sops.yaml`
### Repo Structure
```
gururmm/
├── agent/ Rust agent (managed endpoints)
│ └── src/
│ ├── ipc.rs Unix socket IPC (Linux); Windows named pipe
│ ├── tunnel/ TunnelManager state machine
│ ├── metrics/ sysinfo-based collection (temp NOT yet wired — BUG-001)
│ ├── registry_ops/ Windows registry read/write
│ ├── updater/ Self-update handler
│ └── main.rs systemd unit template generation
├── server/ Rust/Axum API server
│ └── src/
│ ├── api/ REST handlers
│ ├── db/ Database layer (sqlx)
│ ├── ws/ WebSocket handler
│ └── mspbackups/ MSP360 backup integration
├── tray/ System tray binary
├── installer/ WiX v4 MSI (gururmm-agent.wxs)
├── scripts/ Build/ops scripts
└── docs/ FEATURE_ROADMAP.md, UI_GAPS.md, ARCHITECTURE_DECISIONS.md, tech-stack.md, DESIGN.md, specs/
```
---
## Development
### Current Focus
As of 2026-05-24 (v0.6.38):
- **Tray IPC + peer authorization** — Linux tray merged (PR #13+#14). Open: Windows peer authz (#16), logind console-user resolution (#17), macOS tray (#18), subscriber broadcast (#19).
- **Agent self-update hardening** — under `ProtectSystem=strict`, the agent's systemd unit needs `ReadWritePaths=/var/log /usr/local/bin /etc/gururmm` and `RuntimeDirectory=gururmm`. Fixed in PR #21.
- **Auto-update reliability** — agents with flaky WebSockets miss dispatch windows (first seen on BB-SERVER and RECEPTIONIST-PC at Cascades, both now on 0.6.38). Re-querying pending updates on reconnect: incomplete as of 2026-05-24.
- **Watchdog alerts UI** — the alerting backend is complete, but the `PUT /watchdog-alerts/:id/resolve` and `DELETE /watchdog-alerts/:id` routes the UI needs are missing on the server (found in the 2026-05-23 audit).
- **MSP360 backup integration** — Phase 1 complete (monitoring, alerts, mapping, storage thresholds). Phase 2 (management) not started.
- **Security audit backlog:** `credentials/:id/reveal` horizontal privilege escalation (HIGH), `internal_err()` raw DB errors at ~130 call sites (HIGH).
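
A quick way to confirm that an installed unit carries the PR #21 directives is a check like the one below. It is a sketch, not code from the repo: `missing_unit_directives` is a hypothetical name, and only plain paths are matched (systemd's `-`/`+` prefixes on `ReadWritePaths=` entries are not handled).

```bash
# Hypothetical helper: print each PR #21 directive missing from a systemd unit
# file; no output means the unit has everything. Plain paths only: '-'/'+'
# prefixes on ReadWritePaths entries are not handled in this sketch.
missing_unit_directives() {
  local unit=$1 rw p
  # systemd allows several ReadWritePaths= lines; gather all entries, one per line.
  rw=$(sed -n 's/^[[:space:]]*ReadWritePaths=//p' "$unit" | tr ' ' '\n')
  for p in /var/log /usr/local/bin /etc/gururmm; do
    if ! printf '%s\n' "$rw" | grep -qxF -- "$p"; then
      echo "ReadWritePaths missing $p"
    fi
  done
  # RuntimeDirectory= may list several names; gururmm must be one of them.
  if ! grep -qxE '[[:space:]]*RuntimeDirectory=(.*[[:space:]])?gururmm([[:space:]].*)?' "$unit"; then
    echo "RuntimeDirectory=gururmm missing"
  fi
}
```

For example, `systemctl cat gururmm-agent > /tmp/gururmm-agent.unit && missing_unit_directives /tmp/gururmm-agent.unit`. The unit is copied to a file first because the function reads it twice. Empty output means all four items are present.
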
### Patterns & Anti-Patterns
**Anti-patterns — never repeat:**
| Pattern | What Went Wrong |
|---|---|
| `useMemo` with stable deps for data-dependent values | queryClient is stable, memo never recomputes after queries resolve. Use `useQuery` instead. |
| CSS variable text colors inside the sidebar | Sidebar bg is hardcoded dark; CSS vars flip in light mode. Use `text-white` explicitly inside sidebar. |
| Deploying without stopping the server first | "text file busy" kernel error. Always `systemctl stop` before `cp`. |
| Building without `DATABASE_URL` | sqlx compile-time macros fail. `DATABASE_URL` is in `/home/guru/.cargo/env`. |
| DB migrations without inserting into `_sqlx_migrations` | Server crashes on start. Must insert SHA-384 checksum manually. |
| WiX MSI builds on Linux | WiX requires `msi.dll`. MSI must be built on Pluto (Windows). |
| Manual builds via SSH | All builds go through `webhook-handler.py`. Never SSH and run `cargo build` + artifact copy manually. |
| TOML/config for agent endpoint or site_id | Server URL compiled into binary, site_id baked into MSI. No runtime config files for these values. |
| `path.find('\\')` in `#[cfg(windows)]` files | Compiles on Linux silently, fails on Pluto MSVC with unterminated char literal. Use `'\\\\'`. |
| `STATUS_BADGE_CLASSES` Record const | Vite/Rollup may optimize away the lookup. Use explicit `getStatusBadgeClass()` if/else function. |
| SSH heredoc for TypeScript edits | Shell strips double-quote characters. Edit locally in submodule, push to Gitea, pull on server. |
| `Restart-Service GuruRMMAgent -Force` in command scripts | Kills the agent before it can report the result, so the command stays stuck in `running`. Use a scheduled task with a delay instead. |
| Running `git` as root in systemd build context | git rejects the guru-owned repo for dubious ownership. Use `safe.directory` config or `sudo -u guru git`. |
| Self-updating running bash script | bash reads line-by-line from disk; replacing mid-execution silently skips remaining blocks. |
| `+1.77` legacy builds without `--ignore-rust-version` | Fail MSRV check after adding `rust-version` to Cargo.toml. Add `--ignore-rust-version` to legacy build lines only. |
| `StrictHostKeyChecking=no` for Pluto SSH | Replaced with pinned known-hosts at `/opt/gururmm/pluto_known_hosts`. MITM would compromise build artifacts. |
| CRLF line endings in migration files | sqlx SHA-384 checksum mismatch causes server crash on start. `.gitattributes` + `core.autocrlf=false` + pre-commit hook prevents this. |
| Dead WebSocket write half | WS write fails, send task dies, receive loop keeps agent in `ConnectedAgents` with dead write half. Commands silently fail. Fix: `tokio::select!` monitoring both tasks. |
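
A common bash idiom avoids the self-updating-script trap, though it is not necessarily what the GuruRMM scripts use. Put all the logic in a function and call it on the script's last line together with `exit`. Bash parses a whole line before running it, so once `main "$@"; exit $?` starts, nothing later in the file is read again. `write_guarded_script` is a hypothetical demo that writes such a script, and the generated script overwrites itself mid-run.

```bash
# Hypothetical demo: write a script to $1 that replaces itself while running.
# The main-function guard means bash already holds the full body in memory.
write_guarded_script() {
  local path=$1
  cat > "$path" <<'EOF'
#!/usr/bin/env bash
main() {
  echo "before-update"
  # Simulate a git pull or artifact copy overwriting this very file.
  printf '#!/usr/bin/env bash\necho replaced\n' > "$0"
  echo "after-update"
}
# Call and exit on one line: bash parses the line fully before running main,
# so the replaced file is never read again.
main "$@"; exit $?
EOF
  chmod +x "$path"
}
```

Running the generated script prints both `before-update` and `after-update`, even though the file on disk has been replaced by the time the second `echo` runs.
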
**Good patterns:**
- **Platform parity rule** — any agent feature goes on Windows + Linux + macOS in the same commit. If a real implementation isn't feasible, add a working stub + `// TODO(platform): <os> — <reason>`. No silent no-ops.
- **Per-platform last-built-commit tracking** — Linux builds succeed and record progress independently of Windows builds.
- **Holistic feature development** — every feature ships backend + API + dashboard UI + docs together. Backend-only features are rejected.
- **sqlx offline mode** — compile-time query validation requires DB reachable or offline cache present.
- **`RuntimeDirectory=gururmm` in systemd unit** — systemd-native way to give agent writable `/run/gururmm/` for IPC socket.
- **Registry-first path resolution** — read `HKLM:\SOFTWARE\GuruRMM` for install dir, fall back to service PathName, then hardcoded default.
- **`interrupt_running_commands()` at reconnect** — flips all `status='running'` commands for reconnecting agent to `status='interrupted'`.
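
The per-platform tracking can be sketched as two shell helpers. Only the `last-built-commit-<platform>` file layout comes from this page. The function names and the optional `state_dir` argument are hypothetical.

```bash
# Hypothetical sketch of per-platform SHA tracking. State files follow the
# layout above: $state_dir/last-built-commit-{linux,windows,mac}.

# Succeeds (returns 0) when $platform has not yet been built at $head_sha.
needs_build() {
  local platform=$1 head_sha=$2 state_dir=${3:-/opt/gururmm}
  local file="$state_dir/last-built-commit-$platform"
  [ ! -f "$file" ] || [ "$(cat "$file")" != "$head_sha" ]
}

# Records a successful build; call only after that platform's artifacts publish.
record_build() {
  local platform=$1 head_sha=$2 state_dir=${3:-/opt/gururmm}
  local file="$state_dir/last-built-commit-$platform"
  # Write to a temp file, then rename, so a crash never leaves a half-written SHA.
  printf '%s\n' "$head_sha" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

In this sketch, each platform records its SHA only after its own artifacts publish. A Windows failure therefore leaves `last-built-commit-windows` stale, and a rerun at the same commit retries Windows while skipping Linux.
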
### Build & Deploy
**CRITICAL: Never trigger builds manually via SSH. All builds go through the webhook pipeline.**
```
Gitea push to main
-> webhook-handler.py (172.16.3.30:9000, parallel threads per platform)
-> build-shared.sh (auto-version bump, git sync — runs once)
-> build-linux.sh (cargo build on .30; log: /var/log/gururmm-build-linux.log)
-> build-windows.sh (SSH -> Pluto 172.16.3.36 via pinned known-hosts
cargo build --release x64 MSVC + i686 MSVC
+1.77 legacy builds with --ignore-rust-version
WiX MSI build for site-specific base
sign-windows.sh (jsign + Azure Trusted Signing)
SCP artifacts back; log: /var/log/gururmm-build-windows.log)
-> build-mac.sh (stub — no build machine configured yet)
-> artifacts -> /var/www/gururmm/downloads/ with sha256 + -latest symlinks
-> per-platform last-built-commit files updated
-> systemctl restart gururmm-agent (local agent on .30)
```
**Auto-version:** `build-shared.sh` diffs `agent/`, `server/`, `dashboard/` against last built SHA. For each changed component, bumps patch version in `Cargo.toml` or `package.json`, commits `[ci-version-bump]`, pushes. Webhook skips builds where all commits are version bumps.
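
The auto-version rules reduce to three small decisions. This shell sketch mirrors them for illustration only. The real logic lives in `build-shared.sh` and `webhook-handler.py` (Python) and may differ. All function names are hypothetical, and `bump_patch` assumes a plain numeric `MAJOR.MINOR.PATCH` version.

```bash
# Bump the patch component of a MAJOR.MINOR.PATCH version: 0.6.37 -> 0.6.38.
bump_patch() {
  local major minor patch
  IFS=. read -r major minor patch <<< "$1"
  printf '%s.%s.%s\n' "$major" "$minor" "$((patch + 1))"
}

# Read changed file paths on stdin (e.g. from `git diff --name-only OLD NEW`)
# and print each component that needs a bump, once, sorted.
changed_components() {
  local path
  while IFS= read -r path; do
    case $path in
      agent/*) echo agent ;;
      server/*) echo server ;;
      dashboard/*) echo dashboard ;;
    esac
  done | sort -u
}

# Succeeds when every commit subject on stdin is a version-bump commit, i.e.
# the build should be skipped. An empty list counts as "build" in this sketch.
all_version_bumps() {
  local subject seen=0
  while IFS= read -r subject; do
    seen=1
    case $subject in
      *'[ci-version-bump]'*) ;;
      *) return 1 ;;
    esac
  done
  [ "$seen" -eq 1 ]
}
```

A typical call would be `git diff --name-only "$LAST_SHA" HEAD | changed_components`, where `LAST_SHA` stands in for whichever last-built SHA the pipeline compares against.
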
**Server code changes** — separate manual step, NOT in agent pipeline:
```bash
sudo /opt/gururmm/build-server.sh
```
**Dashboard deploy** — also separate:
```bash
cd /home/guru/gururmm/dashboard && sudo -u guru npm run build
sudo rsync -av --delete /home/guru/gururmm/dashboard/dist/ /var/www/gururmm/dashboard/
```
**DB migrations** — manual; must insert SHA-384 checksum into `_sqlx_migrations` or server crashes on start.
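
Computing the checksum by hand is error-prone. Below is a minimal sketch that assumes sqlx's default `<version>_<description>.sql` naming, where the description is the rest of the file name with `_` turned into spaces. It also assumes the standard `_sqlx_migrations` columns, with `installed_on` defaulting to `now()`. Compare its output with an existing row before trusting it. `sqlx_migration_row` is a hypothetical name. The function refuses CRLF files because a line-ending conversion changes the SHA-384.

```bash
# Hypothetical helper: print the _sqlx_migrations row for a migration whose SQL
# has already been applied by hand. Descriptions containing a single quote
# would need extra SQL escaping, which this sketch does not do.
sqlx_migration_row() {
  local file=$1 name version description checksum
  # CRLF changes the file bytes, and therefore the SHA-384, so refuse such files.
  if grep -q $'\r' "$file"; then
    echo "error: $file contains CRLF line endings" >&2
    return 1
  fi
  name=$(basename "$file" .sql)
  version=${name%%_*}
  description=${name#*_}
  description=${description//_/ }
  checksum=$(sha384sum "$file" | cut -d' ' -f1)
  printf "INSERT INTO _sqlx_migrations (version, description, success, checksum, execution_time) VALUES (%s, '%s', true, decode('%s', 'hex'), 0);\n" \
    "$version" "$description" "$checksum"
}
```

After applying the migration SQL itself, something like `sqlx_migration_row <migration>.sql | psql "$DATABASE_URL"` would record it. The placeholder path is illustrative.
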
**Pluto (172.16.3.36):**
- Windows Server 2019 VM on Jupiter (Unraid)
- SSH: `ssh -o UserKnownHostsFile=/opt/gururmm/pluto_known_hosts Administrator@172.16.3.36`
- Rust stable 1.95.0 + 1.77 pinned for legacy builds
- VS Build Tools (MSVC), sccache at `C:\sccache`, WiX v4, Gitea clone at `C:\gururmm\`
**Auto-update delivery:**
- Server scans every 300s; dispatches update command on agent heartbeat
- Gated on effective policy `auto_update` (default on when policy is null)
- Agent: downloads to PrivateTmp, verifies SHA-256, replaces binary, restarts service
- Force-trigger: `POST /api/agents/:id/update`
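
The agent's download, verify, and replace sequence is rendered in shell below for illustration. The agent itself is Rust, and `install_verified_binary` is a hypothetical name. The expected hash is assumed to be the hex SHA-256 published next to the artifact in the downloads directory.

```bash
# Hypothetical shell rendering of verify-then-replace.
install_verified_binary() {
  local downloaded=$1 expected=$2 target=$3 actual staged
  actual=$(sha256sum "$downloaded" | cut -d' ' -f1)
  if [ "$actual" != "$expected" ]; then
    echo "checksum mismatch: expected $expected, got $actual" >&2
    return 1
  fi
  # Stage next to the target: PrivateTmp is a separate mount, and rename(2)
  # is atomic only within a single mounted filesystem.
  staged="$target.new"
  cp "$downloaded" "$staged" && chmod 0755 "$staged" && mv -f "$staged" "$target"
}
```

Renaming over the target, rather than copying onto it, also sidesteps the "text file busy" error from the anti-patterns table. The running process keeps the old inode until the service restarts.
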
---
## Active State
**Fleet (as of 2026-05-24, live API verified):**
- 55 enrolled agents total; 37 online
- 40/55 on 0.6.38 (current). 15 laggards — all offline; will self-update on reconnect.
- Laggards by version: 6× v0.6.27, 4× v0.6.3, 2× v0.6.29, 1× v0.6.28, 1× v0.6.2, 1× v0.6.1 (Mikes-MacBook-Air.local — macOS, significant lag)
- "Saturn" agent not present in API as of 2026-05-24 — concern resolved or entry was removed.
**Enrolled clients/sites (live API, 2026-05-24):**
| Client | Type | Sites | Notable agents |
|---|---|---|---|
| AZ Computer Guru (internal) | Internal | DF Server Storage, Howard-VM, Main Office, Mike's Car, Mikes House | Jupiter, PLUTO, gururmm, GURU-KALI, GURU-5070, Mikes-MacBook-Air.local, ACG-DC16, NEPTUNE, ix.azcomputerguru.com |
| BirthBiologic | Corporate | Main Office | BB-SERVER |
| Cascades of Tucson | Corporate | CascadesTucson | 27 agents: CS-SERVER, RECEPTIONIST-PC, ASSISTMAN-PC, MDIRECTOR-PC, MEMRECEPT-PC, and 22 others |
| Dataforth Corp | Corporate | D1 | AD2, DF-GAGETRAK |
| Grabb & Durando Law Office | Corporate | Main Office | GND-SERVER |
| Instrumental Music Center | Corporate | IMCMain | IMC1 |
| Key, Paul | Residential | Home | KEY-MEDIA |
| Peaceful Spirit | Residential | Bridgette Home, Country Club, Mara Home | BridgettePSHomeComputer, PST-SERVER, PST-SURFACE, Maras-HP-Laptop, MaraHomeNew |
| Safesite | Corporate | Glendale | MSI |
| Sombra Residential LLC | Corporate | main office | DESKTOP-UQRN4K3, Server2013 |
| Stamback Septic | Corporate | StambackSeptic | DESKTOP-BTR2AM3, StambackLaptopNew |
| Swanson, Len | Residential | Home | LAS-GAMER |
**API auth:**
- `POST /api/auth/login` → JWT (~24h)
- Creds: vault `infrastructure/gururmm-server.sops.yaml`, keys `credentials.gururmm-api.admin-email` / `admin-password`
- Key endpoints: `GET /api/agents`, `POST /api/agents/:id/command`, `GET /api/commands/:id`, `POST /api/agents/:id/update`
- Command fields: `command_type` (powershell/shell/exec), `command` (script text, JSON-encoded). Windows agent runs as LocalSystem.
- Response: `stdout`, `stderr`, `exit_code`, `status` (running/completed/failed/timeout/interrupted)
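
Below is a minimal client sketch for these endpoints. Beyond the paths and fields listed above, everything here is an assumption:

- the login body field names `email`/`password`
- the `token` field in the login response
- `Authorization: Bearer` auth
- a command `id` in the POST response

It needs `curl` and `jq`, and all function names are hypothetical.

```bash
# Hypothetical GuruRMM API helpers; see the assumptions listed above.
GURURMM_API=${GURURMM_API:-https://rmm-api.azcomputerguru.com}

# Print a JWT for the given admin email and password (assumed field names).
gururmm_login() {
  curl -fsS -X POST "$GURURMM_API/api/auth/login" \
    -H 'Content-Type: application/json' \
    -d "$(jq -n --arg e "$1" --arg p "$2" '{email: $e, password: $p}')" |
    jq -r '.token'
}

# Queue a command; jq --arg JSON-encodes the script text, quotes and newlines included.
gururmm_run() {
  local token=$1 agent_id=$2 type=$3 script=$4
  curl -fsS -X POST "$GURURMM_API/api/agents/$agent_id/command" \
    -H "Authorization: Bearer $token" \
    -H 'Content-Type: application/json' \
    -d "$(jq -n --arg t "$type" --arg c "$script" '{command_type: $t, command: $c}')"
}

# True for the final statuses listed above; `running` (or anything unknown) is not final.
is_terminal_status() {
  case $1 in
    completed|failed|timeout|interrupted) return 0 ;;
    *) return 1 ;;
  esac
}

# Poll a command until it reaches a final status, then print the full record
# (stdout, stderr, exit_code, status). Gives up after $3 polls (default 150).
gururmm_wait() {
  local token=$1 command_id=$2 tries=${3:-150} body status
  while [ "$tries" -gt 0 ]; do
    body=$(curl -fsS -H "Authorization: Bearer $token" \
      "$GURURMM_API/api/commands/$command_id") || return 1
    status=$(jq -r '.status' <<< "$body")
    if is_terminal_status "$status"; then
      printf '%s\n' "$body"
      return 0
    fi
    tries=$((tries - 1))
    sleep 2
  done
  echo "command $command_id still not final" >&2
  return 1
}
```

A session might run in three steps:

1. `token=$(gururmm_login "$EMAIL" "$PASSWORD")`
2. `id=$(gururmm_run "$token" "$AGENT_ID" powershell 'Get-Date' | jq -r '.id')`
3. `gururmm_wait "$token" "$id"`
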
**Dashboard — complete and working:**
Agents management, Clients/Sites CRUD, Commands execution + terminal, Logs + AI analysis, Alerts, Metrics (CPU/RAM/disk/network, process drill-down modal), Auto-update triggering, Network state, Entra ID SSO, Policies Dashboard (all tabs), Registry editor, MSP360 backup status card.
**Dashboard — incomplete (see UI_GAPS.md):**
- Temperature monitoring (BUG-001) — UI ready, agent-side collection never wired
- Enrollment management UI (revoke keys, audit log, duplicate hostname warnings)
- Watchdog alerts UI — blocked on 2 missing server routes
- MSPBackups management UI — backend complete, no frontend
- Organizations management UI — multi-tenancy backend done, no frontend
- Tunnel session management (interactive terminal — backend skeleton, not production-ready)
**Open Gitea issues:**
- #15 — Pipeline tray build (publish tray binary to downloads)
- #16 — Windows IPC peer authz
- #17 — logind console user resolution
- #18 — macOS tray
- #19 — subscriber broadcast
**Security backlog (HIGH):**
- `credentials/:id/reveal` — horizontal privilege escalation (no ownership scope check)
- `internal_err()` — ~130 call sites returning raw DB errors to callers
---
## Key Architecture Decisions (LOCKED)
These decisions are locked. Do not reverse without explicit user approval.
1. **Per-agent enrollment keys** — MSI contains server URL + site_id only. Agent calls `POST /api/enroll` on first run; server issues unique per-agent key stored hashed. Enables revocation, clone detection, audit trail.
2. **Site-specific MSI generation** — Universal base MSI from CI; dashboard endpoint generates site-specific MSI with site_id baked in via WiX property → `HKLM\SOFTWARE\GuruRMM\SiteId`.
3. **No TOML/config for endpoints** — Server URL compiled into binary. No runtime config files for server URL or site_id.
4. **Policy inheritance chain** — global → site → client → agent. Server computes merged effective policy and pushes via `ConfigUpdate` WebSocket message.
5. **Platform parity rule** — Any agent feature ships on Windows, Linux, and macOS in the same change. Stub + TODO required if a real implementation is not yet feasible.
6. **Watchdog as separate process** — Main agent cannot reliably restart itself after a crash.
7. **Build pipeline is the only path to production** — Enforces signing, checksum generation, consistent artifact layout.
8. **Multi-tenancy identity model (ADR-001)** — Dev team with partner impersonation. Three levels: Dev → Partner → Client. Computer Guru is partner #1.
9. **Holistic feature development (DESIGN.md)** — Every feature requires backend + API + dashboard UI + documentation. Backend-only features are rejected.
10. **AI-optional operation** — GuruRMM must be fully functional without AI. AI features are enhancements, not requirements.
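
The merge in decision 4 can be pictured with a small sketch. Pass the level files in the inheritance order given in decision 4, and later levels override earlier ones key by key. The key=value file format and the function names are hypothetical. The server computes the real merge itself and pushes the result via `ConfigUpdate`.

```bash
# Hypothetical sketch: merge policy levels given as key=value files, applied in
# argument order so later levels override earlier ones. Missing files (a level
# with no policy) are skipped. The real server merges structured policies.
effective_policy() {
  local file line key
  local -A merged=()
  local -a order=()
  for file in "$@"; do
    [ -f "$file" ] || continue
    while IFS= read -r line || [ -n "$line" ]; do
      case $line in ''|'#'*) continue ;; esac
      key=${line%%=*}
      # Remember first-seen order so the output is stable.
      [ -n "${merged[$key]+set}" ] || order+=("$key")
      merged[$key]=${line#*=}
    done < "$file"
  done
  for key in "${order[@]}"; do
    printf '%s=%s\n' "$key" "${merged[$key]}"
  done
}

# Auto-update is on unless the merged policy explicitly sets auto_update=false,
# in line with "default on when policy is null" in the auto-update notes.
auto_update_enabled() {
  local value
  value=$(effective_policy "$@" | sed -n 's/^auto_update=//p')
  [ "${value:-true}" != false ]
}
```
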
---
## History Highlights
| Date | Event |
|---|---|
| 2025-12-15 | Project genesis: Windows service + Linux installer + site code auth + build server. DB migrated from Jupiter Docker to local PostgreSQL. |
| 2026-04-19 | Full drill-down navigation, auto-install on first run, Pluto build VM setup started. |
| 2026-04-21 | MSI build fix (missing WiX extension flag). DESIGN.md created (holistic development mandate). BirthBiologic onboarded. |
| 2026-04-29 | UI_GAPS.md created. Holistic development principle formalized. |
| 2026-05-12 | macOS agent Phase 1 deployed from Mikes-MacBook-Air. Code signing issue on Apple Silicon noted. |
| 2026-05-15 | Dead WebSocket write-half bug fixed. Temperature struct field name mismatch fixed. |
| 2026-05-16 | Watchdog bugs fixed (sc.exe fallback, suppress_until, hypervisor detection). /feature-request skill created. |
| 2026-05-17 | Syncro PSA Integration added to roadmap (P1) after Howard /feature-request. Office power failure recovery — all VMs recovered. |
| 2026-05-18 | Multi-tenancy architecture (ADR-001) decided. SPEC documents SPEC-001 through SPEC-006 created. |
| 2026-05-19 | 4-bug fix for AD2 crash loop. MSP360 backup integration completed (6 fixes). Clickable CPU/Memory gauge cards + process drill-down modal. |
| 2026-05-23 | /rmm-audit pass. Agent optimization Phases 1A-3. Auto-version bump mechanism. MSRV bumped to 1.85. Fleet at 0.6.29. |
| 2026-05-24 | Linux tray IPC + GTK (PR #13+#14) and peer-cred authz (PR #14) merged. PR #21 (ReadWritePaths fix) merged. Build pipeline split into per-platform scripts. Pluto known-hosts pinned. All online agents converged to 0.6.38. |
---
## Compilation Notes
- macOS build status: Phase 1 was deployed manually from Mikes-MacBook-Air (2026-05-12). `build-mac.sh` is a stub as of 2026-05-24 — unclear if automated pipeline includes macOS yet. [unverified]
- Tunnel subsystem: agent-side substantially complete; server-side is dead-code skeleton. Current live status unconfirmed. [unverified]
- Pre-commit hook on 172.16.3.30 lacks execute bit (noted 2026-05-23) — likely still unfixed. [unverified]
- Auto-update reliability fix for BB-SERVER and RECEPTIONIST-PC was incomplete at 2026-05-24 save. [unverified]
## Backlinks
- [[clients/cascades-tucson]] — RECEPTIONIST-PC enrolled (site CascadesTucson)
- [[systems/gururmm-build]] — Linux VM at 172.16.3.30 on Jupiter; GuruRMM API 3001, ClaudeTools API 8001, Coord API, MariaDB, PostgreSQL, build pipeline; originally a container on Jupiter, migrated to own VM
- [[systems/jupiter]] — Unraid host at 172.16.3.20; virsh host for all VMs (GuruRMM VM, Unifi, OwnCloud, Pluto/Claude-Builder); Docker: Gitea port 3000, NPM, Seafile; iptables PREROUTING routes :443 to NPM (NPM proxy `rmm-api -> 172.16.3.20:3001` in credentials.md is STALE — actual GuruRMM API is on 172.16.3.30)
- [[systems/pluto]] — Windows build server (MSI, WiX) at 172.16.3.36