Files
claudetools/wiki/projects/gururmm.md
Mike Swanson d4eb8358ce wiki: add capability synthesis to wiki-compile; recompile GuruRMM
Skill + template:
- wiki-compile Phase 2P: type-aware authoritative-artifact discovery for
  projects (migrations, API routes, agent modules, roadmap-done, commit log),
  with a stale-submodule guard that reads origin/main when the pinned
  submodule lags. Changelogs treated as incomplete, not authoritative.
- project template: add a Capabilities / Feature Set section.

GuruRMM recompile (from live main artifacts, not session logs):
- Added Capabilities / Feature Set section covering monitoring, remote
  execution (incl. system vs user_session contexts), inventory/discovery,
  update mgmt, policy, alerting/watchdog, backup, tunnel, identity/security.
- Fixed the misleading "runs as LocalSystem" command-fields line (the gap
  that started this) and the stale BUG-001 temperature claim (now shipped).
- Qualified Entra-only SSO; noted safe-rollout is unwired scaffolding.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 18:16:03 -07:00

30 KiB
Raw Blame History

type, name, display_name, last_compiled, compiled_by, sources, backlinks
type name display_name last_compiled compiled_by sources backlinks
project gururmm GuruRMM 2026-05-26 GURU-5070/claude-main
gururmm@main: server/src/api/*.rs (REST API surface, ~30 route modules)
gururmm@main: agent/src/ (agent capabilities; transport/CommandContext, ohw.rs, watchdog/wts.rs)
gururmm@main: server/migrations/*.sql (46 migrations — feature checkpoints, incl. 041_add_command_context)
gururmm@main: docs/FEATURE_ROADMAP.md, docs/specs/
gururmm@main: git log feat/perf history (changelogs incomplete past v0.6.22)
projects/msp-tools/guru-rmm/CONTEXT.md
projects/msp-tools/guru-rmm/docs/FEATURE_ROADMAP.md
projects/msp-tools/guru-rmm/docs/UI_GAPS.md
projects/msp-tools/guru-rmm/docs/ARCHITECTURE_DECISIONS.md
projects/msp-tools/guru-rmm/docs/tech-stack.md
projects/msp-tools/guru-rmm/docs/DESIGN.md
.claude/memory/reference_gururmm_server.md
.claude/memory/reference_gururmm_api.md
.claude/memory/gururmm-development-principles.md
.claude/memory/feedback_gururmm_agent_parity.md
.claude/memory/reference_pluto_build_server.md
.claude/memory/project_mac_gururmm_setup_pending.md
credentials.md
session-logs/2025-12-15-session.md
session-logs/2025-12-20-session.md
session-logs/2026-04-19-session.md
session-logs/2026-04-21-session.md
session-logs/2026-04-29-session.md
session-logs/2026-05-12-guru-rmm-macos-agent-phase1.md
session-logs/2026-05-15-session.md
session-logs/2026-05-16-session.md
session-logs/2026-05-17-session.md
session-logs/2026-05-19-gururmm-backup-fixes.md
session-logs/2026-05-19-session.md
session-logs/2026-05-21-session.md
session-logs/2026-05-23-session.md
session-logs/2026-05-24-session.md
session-logs/2026-05-24-GURU-KALI-session.md
clients/cascades-tucson
systems/gururmm-build
systems/jupiter
systems/pluto

GuruRMM

Summary

GuruRMM is a Remote Monitoring & Management platform built by Arizona Computer Guru LLC for internal MSP operations and eventual productization. The server (Rust/Axum) and dashboard (React/TypeScript) are production-deployed at https://rmm.azcomputerguru.com with approximately 55 enrolled agents across multiple client sites. The agent runs on managed Windows, Linux, and macOS endpoints.

Current version: agent 0.6.39 / 0.6.41 in fleet as of 2026-05-26 (server v0.3.x). Note: committed changelogs are stale (stop at agent v0.6.22 / server v0.3.1) — migrations + commit log are the authoritative feature record, not changelogs.

Repo: azcomputerguru/gururmm on Gitea (internal: http://172.16.3.20:3000). The copy at D:\claudetools\projects\msp-tools\guru-rmm is a stale reference submodule — do NOT develop there; all real work happens in the Gitea repo.

Goal: Full-featured MSP platform rivaling commercial RMMs, with a companion PSA (GuruPSA, separate future repo) designed as a truly integrated unified system — not bolted-together products.


Capabilities / Feature Set

Synthesized from authoritative artifacts (API routes, agent modules, 46 migrations, roadmap, commit log) at live main — not from session logs. See Compilation Notes.

Agent↔server communication is a persistent authenticated WebSocket with auto-reconnect + heartbeat; on reconnect, in-flight commands flip to interrupted. Platform-parity rule: agent features ship on Windows/Linux/macOS in the same change (stub + TODO where a real impl isn't yet feasible).

Monitoring & Telemetry

  • Core metrics per interval (policy-tunable per section): CPU %, memory %/bytes, disk %/bytes, network rx/tx deltas, uptime, logged-in user, user idle time (Win GetLastInputInfo, Linux xprintidle), public/WAN IP (cached, multi-service fallback). Cross-platform via sysinfo.
  • Hardware sensor telemetry: CPU/GPU temps + full sensor array (temperature, voltage, fan RPM, power). Windows via bundled LibreHardwareMonitor + WMI (ohw.rs); Linux via /sys/class/thermal; sysinfo::Components fallback.
  • Process drill-down: top 10 by CPU + top 10 by memory in every metrics payload (cross-platform).
  • Network state: per-interface IPv4/IPv6, MAC, derived CIDR subnets — sent on change only.

Remote Execution

  • Command types: shell, powershell, python, raw script{interpreter}, and claude_task. Options: timeout_seconds, elevated.
  • Execution context (041_add_command_context): system (default — Session 0 / service SYSTEM) or user_session (runs in the active logged-on user's desktop session via WTS token impersonation: WTSQueryUserToken + CreateProcessAsUserW + per-user env block). Windows-only; requires an active session.
  • Commands individually cancellable (POST /commands/:id/cancel) and aborted on disconnect. Auditable history; status running/completed/failed/timeout/interrupted.
  • Script library (017_scripts): stored scripts dispatched to agents with args/env/timeout/run_as_user; per-run history.

Inventory & Discovery

  • Hardware inventory (mfr/model/serial/BIOS, CPU, memory, disks, NICs, OS), software inventory (installed apps), service inventory. On-demand refresh.
  • VM / hypervisor / container detection (032/033): is_virtual_machine, hypervisor_type, vm_uuid, is_hypervisor + hosted VM UUIDs, is_container, is_unraid.
  • User/group inventory (037040): local + domain + Azure AD accounts (enabled, pw-never-expires, last-logon, is_admin, AD email/UPN/dept), domain-join classification (none/ad/aad/hybrid), domain name, M365 tenant ID, domain-controller detection (is_dc), group membership. Policy-scheduled (default 24h).
  • Network discovery: server-dispatched scans — TCP probes over configurable ranges/ports, ARP MAC, reverse DNS, basic OS fingerprint; devices stream back and are persisted/editable.

Patch / Agent Update Management

  • Self-updater: server sends version + URL + SHA256; agent downloads, verifies checksum, atomically swaps binary, restarts, and auto-rolls back to a kept backup if it fails to reconnect (~180s window).
  • Auto-update gated on effective policy auto_update (channel + defer_hours; maintenance-window field received but not yet enforced [verify]). Force via POST /agents/:id/update. Update channels at agent/site/client (026).
  • Safe-rollout (046): update_rollouts/health-metrics/events tables + /updates/rollouts promote/rollback. Scaffolding only — promotion is manual; health-gated automation is written-but-unwired (Phase 2).

Policy & Configuration Management

  • Inheritance chain global → client → site → agent; server computes merged effective policy, pushes via ConfigUpdate. Effective policy queryable per scope.
  • Checks engine (018/019): cpu, memory, disk, ping, port, script, service (restart-if-stopped, pass-if-not-exist; Win sc.exe, Linux systemctl). Policy-attached check templates (024) with push-to-agent sync. On-demand run-checks.
  • Remote registry (Windows, winreg): agent supports enumerate/read/write (typed)/create/delete. HTTP API currently exposes read-only (enumerate, read_value); write paths exist in the agent but aren't routed yet [verify].

Alerting & Watchdog

  • Threshold alerts (ack/resolve, per-agent + fleet summary, dashboard filter). Alert templates (022) with effective resolution; per-client email settings (020). Maintenance mode (021) to suppress alerting per scope.
  • Watchdog: separate supervising process (polls GuruRMMAgent every 30s, restart backoff, alert after 3 fails) + launches/reaps the tray into active user sessions via WTS. Full alert CRUD + ack/resolve.

Backup Integration (MSP360 / MSPBackups)

  • Multi-provider config (034/035) with connection test, scheduled sync, per-agent + all-providers status, fleet coverage report, and agent↔MSP360 mapping (044) with confidence scoring + manual verification.

Remote Access (Tunnel)

  • Agent side substantially built (TunnelManager state machine; Open/Close/Data). Server side is a dead-code skeleton — not declared in api/mod.rs, no /tunnel routes, WS handler logs "not yet implemented." Not production-ready.

Identity / Multi-tenancy / Security

  • Auth: JWT (login/register/me); agents auth over WS via per-agent API key + hardware device_id.
  • Microsoft Entra ID SSO (OAuth2/OIDC + PKCE), gated on server config. Multi-provider incl. Google is spec'd (SPEC-008) but Google not implemented [verify].
  • Organizations / multi-tenancy: org CRUD, per-org membership + roles, limits, dev-admin user impersonation (/auth/impersonate/:id). Backend present; dashboard UI partial.
  • Encrypted credentials vault (016): scoped global/client/site, typed (password, SSH key, SNMP), metadata-only by default with separate /reveal decrypt endpoint (known HIGH item: /reveal ownership-scope check — [verify current state]).
  • Enrollment & keys (012): per-agent key issuance on first run, site API keys (regenerable), site-specific MSI with SITEKEY injected at download, public install-report ingestion. Legacy PowerShell agent path for Server 2008 R2.
  • Logs: agent log upload (periodic + on-demand), per-agent events (042), fleet log view, AI-assisted log analysis (/logs/analyze) — AI-optional per locked decision.

Platform coverage

  • Cross-platform (Win/Linux/macOS): metrics, network state, hardware/software/service inventory, user/group + DC/domain detection, checks (service checks Win+Linux), discovery, self-update, scripts/commands.
  • Windows-only: user_session command context (WTS), LibreHardwareMonitor temps, remote registry, tray-into-session, watchdog SCM supervision.
  • macOS: agent deployed (Phase 1); tray is a stub; automated Mac build pipeline is an intentional stub (no build host) [verify before claiming CI Mac releases].

Architecture

Components

Component Location Tech State
Server 172.16.3.30:3001, systemd gururmm-server, binary /usr/local/bin/gururmm-server Rust, Axum deployed, production
Dashboard https://rmm.azcomputerguru.com, nginx at /var/www/gururmm/dashboard/ React + TypeScript + Vite, shadcn/ui, Tailwind CSS v4 deployed, production
Agent (Windows) Endpoints, installed as GuruRMMAgent Windows service via WiX MSI Rust, Windows MSVC deployed, fleet on 0.6.38
Agent (Linux) Endpoints, systemd gururmm-agent, binary /usr/local/bin/gururmm-agent Rust, musl static deployed
Agent (macOS) Endpoints, LaunchDaemon com.azcomputerguru.gururmm-agent.plist Rust, aarch64/x86_64 Phase 1 deployed 2026-05-12; code signing issue on Apple Silicon
Tray (Windows) System tray, named pipe IPC Rust deployed
Tray (Linux) System tray, Unix socket IPC, libappindicator/GTK Rust, GTK deployed 2026-05-24 (PR #13+#14 merged)
Tray (macOS) Menu bar Rust stub/TODO (issue #18)
PostgreSQL DB localhost:5432 on 172.16.3.30, database gururmm PostgreSQL deployed
Coord API 172.16.3.30:8001/api/coord FastAPI (part of ClaudeTools API) deployed
Build pipeline 172.16.3.30:9000 webhook + /opt/gururmm/ scripts Python (webhook-handler.py), Bash deployed; split into per-platform scripts 2026-05-24
Pluto (Windows build VM) 172.16.3.36, Windows Server 2019 VM on Jupiter (Unraid) Rust MSVC, WiX v4 operational

Key Files & Repos

  • Active repo: azcomputerguru/gururmmhttp://172.16.3.20:3000/azcomputerguru/gururmm
  • Reference clone: D:\claudetools\projects\msp-tools\guru-rmm — stale submodule, do not develop here
  • Server binary: /usr/local/bin/gururmm-server on 172.16.3.30
  • Agent binary (Linux): /usr/local/bin/gururmm-agent
  • Agent config (Linux/macOS): /etc/gururmm/agent.toml (root, mode 600); macOS uses /usr/local/etc/gururmm/site.plist
  • Agent registry (Windows): HKLM\SOFTWARE\GuruRMM\SiteId (written by MSI)
  • Windows service name: GuruRMMAgent (NOT gururmm-agent)
  • Downloads dir: /var/www/gururmm/downloads/ on 172.16.3.30
  • Webhook handler: /opt/gururmm/webhook-handler.py (port 9000, systemd gururmm-webhook)
  • Build scripts: /opt/gururmm/build-shared.sh, build-linux.sh, build-windows.sh, build-mac.sh (split 2026-05-24; build-agents.sh is now a compat wrapper)
  • Server build script: /opt/gururmm/build-server.sh (separate pipeline — manual trigger required for server code changes)
  • Per-platform SHA tracking: /opt/gururmm/last-built-commit-{linux,windows,mac}
  • Pluto known-hosts: /opt/gururmm/pluto_known_hosts (pinned SSH keys; installed 2026-05-24)
  • Build log (Linux): /var/log/gururmm-build-linux.log
  • Build log (Windows): /var/log/gururmm-build-windows.log
  • API (internal): http://172.16.3.30:3001
  • API (external): https://rmm-api.azcomputerguru.com (Cloudflare)
  • Dashboard: https://rmm.azcomputerguru.com
  • DB URL: postgres://gururmm:43617ebf7eb242e814ca9988cc4df5ad@localhost:5432/gururmm
  • Vault path: infrastructure/gururmm-server.sops.yaml

Repo Structure

gururmm/
├── agent/          Rust agent (managed endpoints)
│   └── src/
│       ├── ipc.rs      Unix socket IPC (Linux); Windows named pipe
│       ├── tunnel/     TunnelManager state machine
│       ├── metrics/    sysinfo collection + temps (LibreHardwareMonitor/WMI on Win, /sys/class/thermal on Linux) — BUG-001 resolved
│       ├── registry_ops/ Windows registry read/write
│       ├── updater/    Self-update handler
│       └── main.rs     systemd unit template generation
├── server/         Rust/Axum API server
│   └── src/
│       ├── api/        REST handlers
│       ├── db/         Database layer (sqlx)
│       ├── ws/         WebSocket handler
│       └── mspbackups/ MSP360 backup integration
├── tray/           System tray binary
├── installer/      WiX v4 MSI (gururmm-agent.wxs)
├── scripts/        Build/ops scripts
└── docs/           FEATURE_ROADMAP.md, UI_GAPS.md, ARCHITECTURE_DECISIONS.md, tech-stack.md, DESIGN.md, specs/

Development

Current Focus

As of 2026-05-24 (v0.6.38):

  • Tray IPC + peer authorization — Linux tray merged (PR #13+#14). Open: Windows peer authz (#16), logind console-user resolution (#17), macOS tray (#18), subscriber broadcast (#19).
  • Agent self-update hardening — ProtectSystem=strict needs ReadWritePaths=/var/log /usr/local/bin /etc/gururmm and RuntimeDirectory=gururmm. Fixed in PR #21.
  • Auto-update reliability — BB-SERVER and RECEPTIONIST-PC (Cascades) miss dispatch windows due to flaky WebSockets. Re-querying pending updates on reconnect: incomplete as of 2026-05-24.
  • Watchdog alerts UI — backend complete but PUT /watchdog-alerts/:id/resolve and DELETE /watchdog-alerts/:id routes missing on server (found in 2026-05-23 audit).
  • MSP360 backup integration — Phase 1 complete (monitoring, alerts, mapping, storage thresholds). Phase 2 (management) not started.
  • Security audit backlog: credentials/:id/reveal horizontal privilege escalation (HIGH), internal_err() raw DB errors at ~130 call sites (HIGH).

Patterns & Anti-Patterns

Anti-patterns — never repeat:

Pattern What Went Wrong
useMemo with stable deps for data-dependent values queryClient is stable, memo never recomputes after queries resolve. Use useQuery instead.
CSS variable text colors inside the sidebar Sidebar bg is hardcoded dark; CSS vars flip in light mode. Use text-white explicitly inside sidebar.
Deploying without stopping the server first "text file busy" kernel error. Always systemctl stop before cp.
Building without DATABASE_URL sqlx compile-time macros fail. DATABASE_URL is in /home/guru/.cargo/env.
DB migrations without inserting into _sqlx_migrations Server crashes on start. Must insert SHA-384 checksum manually.
WiX MSI builds on Linux WiX requires msi.dll. MSI must be built on Pluto (Windows).
Manual builds via SSH All builds go through webhook-handler.py. Never SSH and run cargo build + artifact copy manually.
TOML/config for agent endpoint or site_id Server URL compiled into binary, site_id baked into MSI. No runtime config files for these values.
path.find('\\') in #[cfg(windows)] files Compiles on Linux silently, fails on Pluto MSVC with unterminated char literal. Use '\\\\'.
STATUS_BADGE_CLASSES Record const Vite/Rollup may optimize away the lookup. Use explicit getStatusBadgeClass() if/else function.
SSH heredoc for TypeScript edits Shell strips double-quote characters. Edit locally in submodule, push to Gitea, pull on server.
Restart-Service GuruRMMAgent -Force in command scripts Kills agent before it can report result. Commands stay forever running. Use scheduled task with delay instead.
sudo -u guru git in systemd build context git rejects repo as dubious ownership when running as root on guru-owned repo. Use safe.directory config or sudo -u guru git.
Self-updating running bash script bash reads line-by-line from disk; replacing mid-execution silently skips remaining blocks.
+1.77 legacy builds without --ignore-rust-version Fail MSRV check after adding rust-version to Cargo.toml. Add --ignore-rust-version to legacy build lines only.
StrictHostKeyChecking=no for Pluto SSH Replaced with pinned known-hosts at /opt/gururmm/pluto_known_hosts. MITM would compromise build artifacts.
CRLF line endings in migration files sqlx SHA-384 checksum mismatch causes server crash on start. .gitattributes + core.autocrlf=false + pre-commit hook prevents this.
Dead WebSocket write half WS write fails, send task dies, receive loop keeps agent in ConnectedAgents with dead write half. Commands silently fail. Fix: tokio::select! monitoring both tasks.

Good patterns:

  • Platform parity rule — any agent feature goes on Windows + Linux + macOS in the same commit. If a real implementation isn't feasible, add a working stub + // TODO(platform): <os> — <reason>. No silent no-ops.
  • Per-platform last-built-commit tracking — Linux builds succeed and record progress independently of Windows builds.
  • Holistic feature development — every feature ships backend + API + dashboard UI + docs together. Backend-only features are rejected.
  • sqlx offline mode — compile-time query validation requires DB reachable or offline cache present.
  • RuntimeDirectory=gururmm in systemd unit — systemd-native way to give agent writable /run/gururmm/ for IPC socket.
  • Registry-first path resolution — read HKLM:\SOFTWARE\GuruRMM for install dir, fall back to service PathName, then hardcoded default.
  • interrupt_running_commands() at reconnect — flips all status='running' commands for reconnecting agent to status='interrupted'.

Build & Deploy

CRITICAL: Never trigger builds manually via SSH. All builds go through the webhook pipeline.

Gitea push to main
  -> webhook-handler.py (172.16.3.30:9000, parallel threads per platform)
    -> build-shared.sh   (auto-version bump, git sync — runs once)
    -> build-linux.sh    (cargo build on .30; log: /var/log/gururmm-build-linux.log)
    -> build-windows.sh  (SSH -> Pluto 172.16.3.36 via pinned known-hosts
                          cargo build --release x64 MSVC + i686 MSVC
                          +1.77 legacy builds with --ignore-rust-version
                          WiX MSI build for site-specific base
                          sign-windows.sh (jsign + Azure Trusted Signing)
                          SCP artifacts back; log: /var/log/gururmm-build-windows.log)
    -> build-mac.sh      (stub — no build machine configured yet)
  -> artifacts -> /var/www/gururmm/downloads/ with sha256 + -latest symlinks
  -> per-platform last-built-commit files updated
  -> systemctl restart gururmm-agent (local agent on .30)

Auto-version: build-shared.sh diffs agent/, server/, dashboard/ against last built SHA. For each changed component, bumps patch version in Cargo.toml or package.json, commits [ci-version-bump], pushes. Webhook skips builds where all commits are version bumps.

Server code changes — separate manual step, NOT in agent pipeline:

sudo /opt/gururmm/build-server.sh

Dashboard deploy — also separate:

cd /home/guru/gururmm/dashboard && sudo -u guru npm run build
sudo rsync -av --delete /home/guru/gururmm/dashboard/dist/ /var/www/gururmm/dashboard/

DB migrations — manual; must insert SHA-384 checksum into _sqlx_migrations or server crashes on start.

Pluto (172.16.3.36):

  • Windows Server 2019 VM on Jupiter (Unraid)
  • SSH: ssh -o UserKnownHostsFile=/opt/gururmm/pluto_known_hosts Administrator@172.16.3.36
  • Rust stable 1.95.0 + 1.77 pinned for legacy builds
  • VS Build Tools (MSVC), sccache at C:\sccache, WiX v4, Gitea clone at C:\gururmm\

Auto-update delivery:

  • Server scans every 300s; dispatches update command on agent heartbeat
  • Gated on effective policy auto_update (default on when policy is null)
  • Agent: downloads to PrivateTmp, verifies SHA-256, replaces binary, restarts service
  • Force-trigger: POST /api/agents/:id/update

Active State

Fleet (as of 2026-05-24, live API verified):

  • 55 enrolled agents total; 37 online
  • 40/55 on 0.6.38 (current). 15 laggards — all offline; will self-update on reconnect.
  • Laggards by version: 6× v0.6.27, 4× v0.6.3, 2× v0.6.29, 1× v0.6.28, 1× v0.6.2, 1× v0.6.1 (Mikes-MacBook-Air.local — macOS, significant lag)
  • "Saturn" agent not present in API as of 2026-05-24 — concern resolved or entry was removed.

Enrolled clients/sites (live API, 2026-05-24):

Client Type Sites Notable agents
AZ Computer Guru (internal) Internal DF Server Storage, Howard-VM, Main Office, Mike's Car, Mikes House Jupiter, PLUTO, gururmm, GURU-KALI, GURU-5070, Mikes-MacBook-Air.local, ACG-DC16, NEPTUNE, ix.azcomputerguru.com
BirthBiologic Corporate Main Office BB-SERVER
Cascades of Tucson Corporate CascadesTucson 27 agents — CS-SERVER, RECEPTIONIST-PC, ASSISTMAN-PC, MDIRECTOR-PC, MEMRECEPT-PC, and ~22 others
Dataforth Corp Corporate D1 AD2, DF-GAGETRAK
Grabb & Durando Law Office Corporate Main Office GND-SERVER
Instrumental Music Center Corporate IMCMain IMC1
Key, Paul Residential Home KEY-MEDIA
Peaceful Spirit Residential Bridgette Home, Country Club, Mara Home BridgettePSHomeComputer, PST-SERVER, PST-SURFACE, Maras-HP-Laptop, MaraHomeNew
Safesite Corporate Glendale MSI
Sombra Residential LLC Corporate main office DESKTOP-UQRN4K3, Server2013
Stamback Septic Corporate StambackSeptic DESKTOP-BTR2AM3, StambackLaptopNew
Swanson, Len Residential Home LAS-GAMER

API auth:

  • POST /api/auth/login → JWT (~24h)
  • Creds: vault infrastructure/gururmm-server.sops.yamlcredentials.gururmm-api.admin-email / admin-password
  • Key endpoints: GET /api/agents, POST /api/agents/:id/command, GET /api/commands/:id, POST /api/agents/:id/update
  • Command fields: command_type (shell/powershell/python/script/claude_task), command (script text, JSON-encoded), optional contextsystem (default; Session 0 / SYSTEM) or user_session (runs in the logged-on user's desktop session via WTS token impersonation; Windows-only, needs an active session), plus timeout_seconds/elevated. The agent does NOT run everything as LocalSystem — user_session is the per-user path (migration 041_add_command_context, agent/src/watchdog/wts.rs).
  • Response: stdout, stderr, exit_code, status (running/completed/failed/timeout/interrupted)

Dashboard — complete and working: Agents management, Clients/Sites CRUD, Commands execution + terminal, Logs + AI analysis, Alerts, Metrics (CPU/RAM/disk/network, process drill-down modal), Auto-update triggering, Network state, Entra ID SSO (Entra only — Google planned per SPEC-008, not implemented), Policies Dashboard (all tabs), Registry editor, MSP360 backup status card.

Dashboard — incomplete (see UI_GAPS.md):

  • Temperature monitoring (BUG-001) — RESOLVED: agent-side collection is wired (LibreHardwareMonitor/WMI on Windows, /sys/class/thermal on Linux, sysinfo fallback). Roadmap BUG-001 text is stale.
  • Enrollment management UI (revoke keys, audit log, duplicate hostname warnings)
  • Watchdog alerts UI — blocked on 2 missing server routes
  • MSPBackups management UI — backend complete, no frontend
  • Organizations management UI — multi-tenancy backend done, no frontend
  • Tunnel session management (interactive terminal — backend skeleton, not production-ready)

Open Gitea issues:

  • #15 — Pipeline tray build (publish tray binary to downloads)
  • #16 — Windows IPC peer authz
  • #17 — logind console user resolution
  • #18 — macOS tray
  • #19 — subscriber broadcast

Security backlog (HIGH):

  • credentials/:id/reveal — horizontal privilege escalation (no ownership scope check)
  • internal_err() — ~130 call sites returning raw DB errors to callers

Key Architecture Decisions (LOCKED)

These decisions are locked. Do not reverse without explicit user approval.

  1. Per-agent enrollment keys — MSI contains server URL + site_id only. Agent calls POST /api/enroll on first run; server issues unique per-agent key stored hashed. Enables revocation, clone detection, audit trail.
  2. Site-specific MSI generation — Universal base MSI from CI; dashboard endpoint generates site-specific MSI with site_id baked in via WiX property → HKLM\SOFTWARE\GuruRMM\SiteId.
  3. No TOML/config for endpoints — Server URL compiled into binary. No runtime config files for server URL or site_id.
  4. Policy inheritance chain — global → site → client → agent. Server computes merged effective policy and pushes via ConfigUpdate WebSocket message.
  5. Platform parity rule — Any agent feature ships on Windows, Linux, and macOS in the same change. Stub + TODO required if a real implementation is not yet feasible.
  6. Watchdog as separate process — Main agent cannot reliably restart itself after a crash.
  7. Build pipeline is the only path to production — Enforces signing, checksum generation, consistent artifact layout.
  8. Multi-tenancy identity model (ADR-001) — Dev team with partner impersonation. Three levels: Dev → Partner → Client. Computer Guru is partner #1.
  9. Holistic feature development (DESIGN.md) — Every feature requires backend + API + dashboard UI + documentation. Backend-only features are rejected.
  10. AI-optional operation — GuruRMM must be fully functional without AI. AI features are enhancements, not requirements.

History Highlights

Date Event
2025-12-15 Project genesis: Windows service + Linux installer + site code auth + build server. DB migrated from Jupiter Docker to local PostgreSQL.
2026-04-19 Full drill-down navigation, auto-install on first run, Pluto build VM setup started.
2026-04-21 MSI build fix (missing WiX extension flag). DESIGN.md created (holistic development mandate). BirthBiologic onboarded.
2026-04-29 UI_GAPS.md created. Holistic development principle formalized.
2026-05-12 macOS agent Phase 1 deployed from Mikes-MacBook-Air. Code signing issue on Apple Silicon noted.
2026-05-15 Dead WebSocket write-half bug fixed. Temperature struct field name mismatch fixed.
2026-05-16 Watchdog bugs fixed (sc.exe fallback, suppress_until, hypervisor detection). /feature-request skill created.
2026-05-17 Syncro PSA Integration added to roadmap (P1) after Howard /feature-request. Office power failure recovery — all VMs recovered.
2026-05-18 Multi-tenancy architecture (ADR-001) decided. 5 SPEC documents created (SPEC-001 through SPEC-006).
2026-05-19 4-bug fix for AD2 crash loop. MSP360 backup integration completed (6 fixes). Clickable CPU/Memory gauge cards + process drill-down modal.
2026-05-23 /rmm-audit pass. Agent optimization Phases 1A-3. Auto-version bump mechanism. MSRV bumped to 1.85. Fleet at 0.6.29.
2026-05-24 Linux tray IPC + GTK (PR #13+#14) and peer-cred authz (PR #14) merged. PR #21 (ReadWritePaths fix) merged. Build pipeline split into per-platform scripts. Pluto known-hosts pinned. Fleet converged to 0.6.38.

Compilation Notes

  • 2026-05-26 recompile: Added the Capabilities / Feature Set section, synthesized from authoritative artifacts at live main (cd27a59) — API route modules, agent source tree, 46 migrations, roadmap, and the feat/perf commit log — NOT from session logs. This was prompted by the prior seeding missing the user_session command context entirely (it had only ever stated "runs as LocalSystem"). Corrected: command execution contexts, temperature monitoring (BUG-001 is resolved, not pending), Entra-only SSO, and added user-inventory/discovery/VM-detection/safe-rollout surfaces. Changelogs are an unreliable capability source here — committed changelogs stop at agent v0.6.22 while the fleet runs 0.6.39+; migrations + commit log are authoritative.
  • Tunnel subsystem (verified against live main): agent side substantially built; server side is a dead-code skeleton (not declared in api/mod.rs, no routes, WS handler logs "not yet implemented"). Confirmed, not unverified.
  • macOS build status: Phase 1 was deployed manually from Mikes-MacBook-Air (2026-05-12). build-mac.sh is a stub as of 2026-05-24 — unclear if automated pipeline includes macOS yet. [unverified]
  • Tunnel subsystem: agent-side substantially complete; server-side is dead-code skeleton. Current live status unconfirmed. [unverified]
  • Pre-commit hook on 172.16.3.30 lacks execute bit (noted 2026-05-23) — likely still unfixed. [unverified]
  • Auto-update reliability fix for BB-SERVER and RECEPTIONIST-PC was incomplete at 2026-05-24 save. [unverified]
  • clients/cascades-tucson — RECEPTIONIST-PC enrolled (site CascadesTucson)
  • systems/gururmm-build — Linux VM at 172.16.3.30 on Jupiter; GuruRMM API 3001, ClaudeTools API 8001, Coord API, MariaDB, PostgreSQL, build pipeline; originally a container on Jupiter, migrated to own VM
  • systems/jupiter — Unraid host at 172.16.3.20; virsh host for all VMs (GuruRMM VM, Unifi, OwnCloud, Pluto/Claude-Builder); Docker: Gitea port 3000, NPM, Seafile; iptables PREROUTING routes :443 to NPM (NPM proxy rmm-api -> 172.16.3.20:3001 in credentials.md is STALE — actual GuruRMM API is on 172.16.3.30)
  • systems/pluto — Windows build server (MSI, WiX) at 172.16.3.36