Files
claudetools/wiki/projects/gururmm.md
Mike Swanson d4eb8358ce wiki: add capability synthesis to wiki-compile; recompile GuruRMM
Skill + template:
- wiki-compile Phase 2P: type-aware authoritative-artifact discovery for
  projects (migrations, API routes, agent modules, roadmap-done, commit log),
  with a stale-submodule guard that reads origin/main when the pinned
  submodule lags. Changelogs treated as incomplete, not authoritative.
- project template: add a Capabilities / Feature Set section.

GuruRMM recompile (from live main artifacts, not session logs):
- Added Capabilities / Feature Set section covering monitoring, remote
  execution (incl. system vs user_session contexts), inventory/discovery,
  update mgmt, policy, alerting/watchdog, backup, tunnel, identity/security.
- Fixed the misleading "runs as LocalSystem" command-fields line (the gap
  that started this) and the stale BUG-001 temperature claim (now shipped).
- Qualified Entra-only SSO; noted safe-rollout is unwired scaffolding.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 18:16:03 -07:00

394 lines
30 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
---
type: project
name: gururmm
display_name: GuruRMM
last_compiled: 2026-05-26
compiled_by: GURU-5070/claude-main
sources:
- "gururmm@main: server/src/api/*.rs (REST API surface, ~30 route modules)"
- "gururmm@main: agent/src/ (agent capabilities; transport/CommandContext, ohw.rs, watchdog/wts.rs)"
- "gururmm@main: server/migrations/*.sql (46 migrations — feature checkpoints, incl. 041_add_command_context)"
- "gururmm@main: docs/FEATURE_ROADMAP.md, docs/specs/"
- "gururmm@main: git log feat/perf history (changelogs incomplete past v0.6.22)"
- projects/msp-tools/guru-rmm/CONTEXT.md
- projects/msp-tools/guru-rmm/docs/FEATURE_ROADMAP.md
- projects/msp-tools/guru-rmm/docs/UI_GAPS.md
- projects/msp-tools/guru-rmm/docs/ARCHITECTURE_DECISIONS.md
- projects/msp-tools/guru-rmm/docs/tech-stack.md
- projects/msp-tools/guru-rmm/docs/DESIGN.md
- .claude/memory/reference_gururmm_server.md
- .claude/memory/reference_gururmm_api.md
- .claude/memory/gururmm-development-principles.md
- .claude/memory/feedback_gururmm_agent_parity.md
- .claude/memory/reference_pluto_build_server.md
- .claude/memory/project_mac_gururmm_setup_pending.md
- credentials.md
- session-logs/2025-12-15-session.md
- session-logs/2025-12-20-session.md
- session-logs/2026-04-19-session.md
- session-logs/2026-04-21-session.md
- session-logs/2026-04-29-session.md
- session-logs/2026-05-12-guru-rmm-macos-agent-phase1.md
- session-logs/2026-05-15-session.md
- session-logs/2026-05-16-session.md
- session-logs/2026-05-17-session.md
- session-logs/2026-05-19-gururmm-backup-fixes.md
- session-logs/2026-05-19-session.md
- session-logs/2026-05-21-session.md
- session-logs/2026-05-23-session.md
- session-logs/2026-05-24-session.md
- session-logs/2026-05-24-GURU-KALI-session.md
backlinks:
- clients/cascades-tucson
- systems/gururmm-build
- systems/jupiter
- systems/pluto
---
# GuruRMM
## Summary
GuruRMM is a Remote Monitoring & Management platform built by Arizona Computer Guru LLC for internal MSP operations and eventual productization. The server (Rust/Axum) and dashboard (React/TypeScript) are production-deployed at https://rmm.azcomputerguru.com with approximately 55 enrolled agents across multiple client sites. The agent runs on managed Windows, Linux, and macOS endpoints.
**Current version:** agent 0.6.39 / 0.6.41 in fleet as of 2026-05-26 (server v0.3.x). Note: committed changelogs are stale (stop at agent v0.6.22 / server v0.3.1) — migrations + commit log are the authoritative feature record, not changelogs.
**Repo:** `azcomputerguru/gururmm` on Gitea (internal: http://172.16.3.20:3000). The copy at `D:\claudetools\projects\msp-tools\guru-rmm` is a stale reference submodule — do NOT develop there; all real work happens in the Gitea repo.
**Goal:** Full-featured MSP platform rivaling commercial RMMs, with a companion PSA (GuruPSA, separate future repo) designed as a truly integrated unified system — not bolted-together products.
---
## Capabilities / Feature Set
*Synthesized from authoritative artifacts (API routes, agent modules, 46 migrations, roadmap, commit log) at live `main` — not from session logs. See Compilation Notes.*
Agent↔server communication is a persistent authenticated WebSocket with auto-reconnect + heartbeat; on reconnect, in-flight commands flip to `interrupted`. Platform-parity rule: agent features ship on Windows/Linux/macOS in the same change (stub + TODO where a real impl isn't yet feasible).
### Monitoring & Telemetry
- Core metrics per interval (policy-tunable per section): CPU %, memory %/bytes, disk %/bytes, network rx/tx deltas, uptime, logged-in user, user idle time (Win `GetLastInputInfo`, Linux `xprintidle`), public/WAN IP (cached, multi-service fallback). Cross-platform via `sysinfo`.
- Hardware sensor telemetry: CPU/GPU temps + full sensor array (temperature, voltage, fan RPM, power). Windows via bundled **LibreHardwareMonitor** + WMI (`ohw.rs`); Linux via `/sys/class/thermal`; `sysinfo::Components` fallback.
- Process drill-down: top 10 by CPU + top 10 by memory in every metrics payload (cross-platform).
- Network state: per-interface IPv4/IPv6, MAC, derived CIDR subnets — sent on change only.
### Remote Execution
- Command types: `shell`, `powershell`, `python`, raw `script{interpreter}`, and `claude_task`. Options: `timeout_seconds`, `elevated`.
- **Execution context** (`041_add_command_context`): `system` (default — Session 0 / service SYSTEM) **or `user_session`** (runs in the active logged-on user's desktop session via WTS token impersonation: `WTSQueryUserToken` + `CreateProcessAsUserW` + per-user env block). Windows-only; requires an active session.
- Commands individually cancellable (`POST /commands/:id/cancel`) and aborted on disconnect. Auditable history; status running/completed/failed/timeout/interrupted.
- Script library (`017_scripts`): stored scripts dispatched to agents with args/env/timeout/`run_as_user`; per-run history.
### Inventory & Discovery
- Hardware inventory (mfr/model/serial/BIOS, CPU, memory, disks, NICs, OS), software inventory (installed apps), service inventory. On-demand refresh.
- VM / hypervisor / container detection (`032/033`): `is_virtual_machine`, `hypervisor_type`, `vm_uuid`, `is_hypervisor` + hosted VM UUIDs, `is_container`, `is_unraid`.
- User/group inventory (`037``040`): local + domain + Azure AD accounts (enabled, pw-never-expires, last-logon, is_admin, AD email/UPN/dept), domain-join classification (none/ad/aad/hybrid), domain name, M365 tenant ID, **domain-controller detection (`is_dc`)**, group membership. Policy-scheduled (default 24h).
- Network discovery: server-dispatched scans — TCP probes over configurable ranges/ports, ARP MAC, reverse DNS, basic OS fingerprint; devices stream back and are persisted/editable.
### Patch / Agent Update Management
- Self-updater: server sends version + URL + SHA256; agent downloads, verifies checksum, atomically swaps binary, restarts, and **auto-rolls back** to a kept backup if it fails to reconnect (~180s window).
- Auto-update gated on effective policy `auto_update` (channel + defer_hours; maintenance-window field received but not yet enforced [verify]). Force via `POST /agents/:id/update`. Update channels at agent/site/client (`026`).
- Safe-rollout (`046`): `update_rollouts`/health-metrics/events tables + `/updates/rollouts` promote/rollback. **Scaffolding only — promotion is manual; health-gated automation is written-but-unwired (Phase 2).**
### Policy & Configuration Management
- Inheritance chain global → client → site → agent; server computes merged effective policy, pushes via `ConfigUpdate`. Effective policy queryable per scope.
- Checks engine (`018`/`019`): cpu, memory, disk, ping, port, script, service (restart-if-stopped, pass-if-not-exist; Win `sc.exe`, Linux `systemctl`). Policy-attached check templates (`024`) with push-to-agent sync. On-demand `run-checks`.
- Remote registry (Windows, `winreg`): agent supports enumerate/read/write (typed)/create/delete. **HTTP API currently exposes read-only (enumerate, read_value); write paths exist in the agent but aren't routed yet [verify].**
### Alerting & Watchdog
- Threshold alerts (ack/resolve, per-agent + fleet summary, dashboard filter). Alert templates (`022`) with effective resolution; per-client email settings (`020`). Maintenance mode (`021`) to suppress alerting per scope.
- Watchdog: **separate** supervising process (polls `GuruRMMAgent` every 30s, restart backoff, alert after 3 fails) + launches/reaps the tray into active user sessions via WTS. Full alert CRUD + ack/resolve.
### Backup Integration (MSP360 / MSPBackups)
- Multi-provider config (`034`/`035`) with connection test, scheduled sync, per-agent + all-providers status, fleet coverage report, and agent↔MSP360 mapping (`044`) with confidence scoring + manual verification.
### Remote Access (Tunnel)
- Agent side substantially built (`TunnelManager` state machine; Open/Close/Data). **Server side is a dead-code skeleton — not declared in `api/mod.rs`, no `/tunnel` routes, WS handler logs "not yet implemented." Not production-ready.**
### Identity / Multi-tenancy / Security
- Auth: JWT (login/register/me); agents auth over WS via per-agent API key + hardware device_id.
- **Microsoft Entra ID SSO** (OAuth2/OIDC + PKCE), gated on server config. Multi-provider incl. Google is spec'd (SPEC-008) but **Google not implemented [verify]**.
- Organizations / multi-tenancy: org CRUD, per-org membership + roles, limits, dev-admin **user impersonation** (`/auth/impersonate/:id`). Backend present; dashboard UI partial.
- Encrypted credentials vault (`016`): scoped global/client/site, typed (password, SSH key, SNMP), metadata-only by default with separate `/reveal` decrypt endpoint (known HIGH item: `/reveal` ownership-scope check — [verify current state]).
- Enrollment & keys (`012`): per-agent key issuance on first run, site API keys (regenerable), site-specific MSI with SITEKEY injected at download, public install-report ingestion. Legacy PowerShell agent path for Server 2008 R2.
- Logs: agent log upload (periodic + on-demand), per-agent events (`042`), fleet log view, AI-assisted log analysis (`/logs/analyze`) — AI-optional per locked decision.
### Platform coverage
- **Cross-platform (Win/Linux/macOS):** metrics, network state, hardware/software/service inventory, user/group + DC/domain detection, checks (service checks Win+Linux), discovery, self-update, scripts/commands.
- **Windows-only:** `user_session` command context (WTS), LibreHardwareMonitor temps, remote registry, tray-into-session, watchdog SCM supervision.
- **macOS:** agent deployed (Phase 1); tray is a stub; automated Mac build pipeline is an intentional stub (no build host) [verify before claiming CI Mac releases].
---
## Architecture
### Components
| Component | Location | Tech | State |
|---|---|---|---|
| Server | 172.16.3.30:3001, systemd `gururmm-server`, binary `/usr/local/bin/gururmm-server` | Rust, Axum | deployed, production |
| Dashboard | https://rmm.azcomputerguru.com, nginx at `/var/www/gururmm/dashboard/` | React + TypeScript + Vite, shadcn/ui, Tailwind CSS v4 | deployed, production |
| Agent (Windows) | Endpoints, installed as `GuruRMMAgent` Windows service via WiX MSI | Rust, Windows MSVC | deployed, fleet on 0.6.38 |
| Agent (Linux) | Endpoints, systemd `gururmm-agent`, binary `/usr/local/bin/gururmm-agent` | Rust, musl static | deployed |
| Agent (macOS) | Endpoints, LaunchDaemon `com.azcomputerguru.gururmm-agent.plist` | Rust, aarch64/x86_64 | Phase 1 deployed 2026-05-12; code signing issue on Apple Silicon |
| Tray (Windows) | System tray, named pipe IPC | Rust | deployed |
| Tray (Linux) | System tray, Unix socket IPC, libappindicator/GTK | Rust, GTK | deployed 2026-05-24 (PR #13+#14 merged) |
| Tray (macOS) | Menu bar | Rust | stub/TODO (issue #18) |
| PostgreSQL DB | localhost:5432 on 172.16.3.30, database `gururmm` | PostgreSQL | deployed |
| Coord API | 172.16.3.30:8001/api/coord | FastAPI (part of ClaudeTools API) | deployed |
| Build pipeline | 172.16.3.30:9000 webhook + `/opt/gururmm/` scripts | Python (webhook-handler.py), Bash | deployed; split into per-platform scripts 2026-05-24 |
| Pluto (Windows build VM) | 172.16.3.36, Windows Server 2019 VM on Jupiter (Unraid) | Rust MSVC, WiX v4 | operational |
### Key Files & Repos
- **Active repo:** `azcomputerguru/gururmm` — http://172.16.3.20:3000/azcomputerguru/gururmm
- **Reference clone:** `D:\claudetools\projects\msp-tools\guru-rmm` — stale submodule, do not develop here
- **Server binary:** `/usr/local/bin/gururmm-server` on 172.16.3.30
- **Agent binary (Linux):** `/usr/local/bin/gururmm-agent`
- **Agent config (Linux/macOS):** `/etc/gururmm/agent.toml` (root, mode 600); macOS uses `/usr/local/etc/gururmm/site.plist`
- **Agent registry (Windows):** `HKLM\SOFTWARE\GuruRMM\SiteId` (written by MSI)
- **Windows service name:** `GuruRMMAgent` (NOT `gururmm-agent`)
- **Downloads dir:** `/var/www/gururmm/downloads/` on 172.16.3.30
- **Webhook handler:** `/opt/gururmm/webhook-handler.py` (port 9000, systemd `gururmm-webhook`)
- **Build scripts:** `/opt/gururmm/build-shared.sh`, `build-linux.sh`, `build-windows.sh`, `build-mac.sh` (split 2026-05-24; `build-agents.sh` is now a compat wrapper)
- **Server build script:** `/opt/gururmm/build-server.sh` (separate pipeline — manual trigger required for server code changes)
- **Per-platform SHA tracking:** `/opt/gururmm/last-built-commit-{linux,windows,mac}`
- **Pluto known-hosts:** `/opt/gururmm/pluto_known_hosts` (pinned SSH keys; installed 2026-05-24)
- **Build log (Linux):** `/var/log/gururmm-build-linux.log`
- **Build log (Windows):** `/var/log/gururmm-build-windows.log`
- **API (internal):** http://172.16.3.30:3001
- **API (external):** https://rmm-api.azcomputerguru.com (Cloudflare)
- **Dashboard:** https://rmm.azcomputerguru.com
- **DB URL:** `postgres://gururmm:43617ebf7eb242e814ca9988cc4df5ad@localhost:5432/gururmm`
- **Vault path:** `infrastructure/gururmm-server.sops.yaml`
### Repo Structure
```
gururmm/
├── agent/ Rust agent (managed endpoints)
│ └── src/
│ ├── ipc.rs Unix socket IPC (Linux); Windows named pipe
│ ├── tunnel/ TunnelManager state machine
│ ├── metrics/ sysinfo collection + temps (LibreHardwareMonitor/WMI on Win, /sys/class/thermal on Linux) — BUG-001 resolved
│ ├── registry_ops/ Windows registry read/write
│ ├── updater/ Self-update handler
│ └── main.rs systemd unit template generation
├── server/ Rust/Axum API server
│ └── src/
│ ├── api/ REST handlers
│ ├── db/ Database layer (sqlx)
│ ├── ws/ WebSocket handler
│ └── mspbackups/ MSP360 backup integration
├── tray/ System tray binary
├── installer/ WiX v4 MSI (gururmm-agent.wxs)
├── scripts/ Build/ops scripts
└── docs/ FEATURE_ROADMAP.md, UI_GAPS.md, ARCHITECTURE_DECISIONS.md, tech-stack.md, DESIGN.md, specs/
```
---
## Development
### Current Focus
As of 2026-05-24 (v0.6.38):
- **Tray IPC + peer authorization** — Linux tray merged (PR #13+#14). Open: Windows peer authz (#16), logind console-user resolution (#17), macOS tray (#18), subscriber broadcast (#19).
- **Agent self-update hardening** — ProtectSystem=strict needs `ReadWritePaths=/var/log /usr/local/bin /etc/gururmm` and `RuntimeDirectory=gururmm`. Fixed in PR #21.
- **Auto-update reliability** — BB-SERVER and RECEPTIONIST-PC (Cascades) miss dispatch windows due to flaky WebSockets. Re-querying pending updates on reconnect: incomplete as of 2026-05-24.
- **Watchdog alerts UI** — backend complete but `PUT /watchdog-alerts/:id/resolve` and `DELETE /watchdog-alerts/:id` routes missing on server (found in 2026-05-23 audit).
- **MSP360 backup integration** — Phase 1 complete (monitoring, alerts, mapping, storage thresholds). Phase 2 (management) not started.
- **Security audit backlog:** `credentials/:id/reveal` horizontal privilege escalation (HIGH), `internal_err()` raw DB errors at ~130 call sites (HIGH).
### Patterns & Anti-Patterns
**Anti-patterns — never repeat:**
| Pattern | What Went Wrong |
|---|---|
| `useMemo` with stable deps for data-dependent values | queryClient is stable, memo never recomputes after queries resolve. Use `useQuery` instead. |
| CSS variable text colors inside the sidebar | Sidebar bg is hardcoded dark; CSS vars flip in light mode. Use `text-white` explicitly inside sidebar. |
| Deploying without stopping the server first | "text file busy" kernel error. Always `systemctl stop` before `cp`. |
| Building without `DATABASE_URL` | sqlx compile-time macros fail. `DATABASE_URL` is in `/home/guru/.cargo/env`. |
| DB migrations without inserting into `_sqlx_migrations` | Server crashes on start. Must insert SHA-384 checksum manually. |
| WiX MSI builds on Linux | WiX requires `msi.dll`. MSI must be built on Pluto (Windows). |
| Manual builds via SSH | All builds go through `webhook-handler.py`. Never SSH and run `cargo build` + artifact copy manually. |
| TOML/config for agent endpoint or site_id | Server URL compiled into binary, site_id baked into MSI. No runtime config files for these values. |
| `path.find('\\')` in `#[cfg(windows)]` files | Compiles on Linux silently, fails on Pluto MSVC with unterminated char literal. Use `'\\\\'`. |
| `STATUS_BADGE_CLASSES` Record const | Vite/Rollup may optimize away the lookup. Use explicit `getStatusBadgeClass()` if/else function. |
| SSH heredoc for TypeScript edits | Shell strips double-quote characters. Edit locally in submodule, push to Gitea, pull on server. |
| `Restart-Service GuruRMMAgent -Force` in command scripts | Kills agent before it can report result. Commands stay forever `running`. Use scheduled task with delay instead. |
| `sudo -u guru git` in systemd build context | git rejects repo as dubious ownership when running as root on guru-owned repo. Use `safe.directory` config or `sudo -u guru git`. |
| Self-updating running bash script | bash reads line-by-line from disk; replacing mid-execution silently skips remaining blocks. |
| `+1.77` legacy builds without `--ignore-rust-version` | Fail MSRV check after adding `rust-version` to Cargo.toml. Add `--ignore-rust-version` to legacy build lines only. |
| `StrictHostKeyChecking=no` for Pluto SSH | Replaced with pinned known-hosts at `/opt/gururmm/pluto_known_hosts`. MITM would compromise build artifacts. |
| CRLF line endings in migration files | sqlx SHA-384 checksum mismatch causes server crash on start. `.gitattributes` + `core.autocrlf=false` + pre-commit hook prevents this. |
| Dead WebSocket write half | WS write fails, send task dies, receive loop keeps agent in `ConnectedAgents` with dead write half. Commands silently fail. Fix: `tokio::select!` monitoring both tasks. |
**Good patterns:**
- **Platform parity rule** — any agent feature goes on Windows + Linux + macOS in the same commit. If a real implementation isn't feasible, add a working stub + `// TODO(platform): <os> — <reason>`. No silent no-ops.
- **Per-platform last-built-commit tracking** — Linux builds succeed and record progress independently of Windows builds.
- **Holistic feature development** — every feature ships backend + API + dashboard UI + docs together. Backend-only features are rejected.
- **sqlx offline mode** — compile-time query validation requires DB reachable or offline cache present.
- **`RuntimeDirectory=gururmm` in systemd unit** — systemd-native way to give agent writable `/run/gururmm/` for IPC socket.
- **Registry-first path resolution** — read `HKLM:\SOFTWARE\GuruRMM` for install dir, fall back to service PathName, then hardcoded default.
- **`interrupt_running_commands()` at reconnect** — flips all `status='running'` commands for reconnecting agent to `status='interrupted'`.
### Build & Deploy
**CRITICAL: Never trigger builds manually via SSH. All builds go through the webhook pipeline.**
```
Gitea push to main
-> webhook-handler.py (172.16.3.30:9000, parallel threads per platform)
-> build-shared.sh (auto-version bump, git sync — runs once)
-> build-linux.sh (cargo build on .30; log: /var/log/gururmm-build-linux.log)
-> build-windows.sh (SSH -> Pluto 172.16.3.36 via pinned known-hosts
cargo build --release x64 MSVC + i686 MSVC
+1.77 legacy builds with --ignore-rust-version
WiX MSI build for site-specific base
sign-windows.sh (jsign + Azure Trusted Signing)
SCP artifacts back; log: /var/log/gururmm-build-windows.log)
-> build-mac.sh (stub — no build machine configured yet)
-> artifacts -> /var/www/gururmm/downloads/ with sha256 + -latest symlinks
-> per-platform last-built-commit files updated
-> systemctl restart gururmm-agent (local agent on .30)
```
**Auto-version:** `build-shared.sh` diffs `agent/`, `server/`, `dashboard/` against last built SHA. For each changed component, bumps patch version in `Cargo.toml` or `package.json`, commits `[ci-version-bump]`, pushes. Webhook skips builds where all commits are version bumps.
**Server code changes** — separate manual step, NOT in agent pipeline:
```bash
sudo /opt/gururmm/build-server.sh
```
**Dashboard deploy** — also separate:
```bash
cd /home/guru/gururmm/dashboard && sudo -u guru npm run build
sudo rsync -av --delete /home/guru/gururmm/dashboard/dist/ /var/www/gururmm/dashboard/
```
**DB migrations** — manual; must insert SHA-384 checksum into `_sqlx_migrations` or server crashes on start.
**Pluto (172.16.3.36):**
- Windows Server 2019 VM on Jupiter (Unraid)
- SSH: `ssh -o UserKnownHostsFile=/opt/gururmm/pluto_known_hosts Administrator@172.16.3.36`
- Rust stable 1.95.0 + 1.77 pinned for legacy builds
- VS Build Tools (MSVC), sccache at `C:\sccache`, WiX v4, Gitea clone at `C:\gururmm\`
**Auto-update delivery:**
- Server scans every 300s; dispatches update command on agent heartbeat
- Gated on effective policy `auto_update` (default on when policy is null)
- Agent: downloads to PrivateTmp, verifies SHA-256, replaces binary, restarts service
- Force-trigger: `POST /api/agents/:id/update`
---
## Active State
**Fleet (as of 2026-05-24, live API verified):**
- 55 enrolled agents total; 37 online
- 40/55 on 0.6.38 (current). 15 laggards — all offline; will self-update on reconnect.
- Laggards by version: 6× v0.6.27, 4× v0.6.3, 2× v0.6.29, 1× v0.6.28, 1× v0.6.2, 1× v0.6.1 (Mikes-MacBook-Air.local — macOS, significant lag)
- "Saturn" agent not present in API as of 2026-05-24 — concern resolved or entry was removed.
**Enrolled clients/sites (live API, 2026-05-24):**
| Client | Type | Sites | Notable agents |
|---|---|---|---|
| AZ Computer Guru (internal) | Internal | DF Server Storage, Howard-VM, Main Office, Mike's Car, Mikes House | Jupiter, PLUTO, gururmm, GURU-KALI, GURU-5070, Mikes-MacBook-Air.local, ACG-DC16, NEPTUNE, ix.azcomputerguru.com |
| BirthBiologic | Corporate | Main Office | BB-SERVER |
| Cascades of Tucson | Corporate | CascadesTucson | 27 agents — CS-SERVER, RECEPTIONIST-PC, ASSISTMAN-PC, MDIRECTOR-PC, MEMRECEPT-PC, and ~22 others |
| Dataforth Corp | Corporate | D1 | AD2, DF-GAGETRAK |
| Grabb & Durando Law Office | Corporate | Main Office | GND-SERVER |
| Instrumental Music Center | Corporate | IMCMain | IMC1 |
| Key, Paul | Residential | Home | KEY-MEDIA |
| Peaceful Spirit | Residential | Bridgette Home, Country Club, Mara Home | BridgettePSHomeComputer, PST-SERVER, PST-SURFACE, Maras-HP-Laptop, MaraHomeNew |
| Safesite | Corporate | Glendale | MSI |
| Sombra Residential LLC | Corporate | main office | DESKTOP-UQRN4K3, Server2013 |
| Stamback Septic | Corporate | StambackSeptic | DESKTOP-BTR2AM3, StambackLaptopNew |
| Swanson, Len | Residential | Home | LAS-GAMER |
**API auth:**
- `POST /api/auth/login` → JWT (~24h)
- Creds: vault `infrastructure/gururmm-server.sops.yaml``credentials.gururmm-api.admin-email` / `admin-password`
- Key endpoints: `GET /api/agents`, `POST /api/agents/:id/command`, `GET /api/commands/:id`, `POST /api/agents/:id/update`
- Command fields: `command_type` (`shell`/`powershell`/`python`/`script`/`claude_task`), `command` (script text, JSON-encoded), optional `context`**`system`** (default; Session 0 / SYSTEM) or **`user_session`** (runs in the logged-on user's desktop session via WTS token impersonation; Windows-only, needs an active session), plus `timeout_seconds`/`elevated`. The agent does NOT run everything as LocalSystem — `user_session` is the per-user path (migration `041_add_command_context`, `agent/src/watchdog/wts.rs`).
- Response: `stdout`, `stderr`, `exit_code`, `status` (running/completed/failed/timeout/interrupted)
**Dashboard — complete and working:**
Agents management, Clients/Sites CRUD, Commands execution + terminal, Logs + AI analysis, Alerts, Metrics (CPU/RAM/disk/network, process drill-down modal), Auto-update triggering, Network state, Entra ID SSO (Entra only — Google planned per SPEC-008, not implemented), Policies Dashboard (all tabs), Registry editor, MSP360 backup status card.
**Dashboard — incomplete (see UI_GAPS.md):**
- ~~Temperature monitoring (BUG-001)~~ — RESOLVED: agent-side collection is wired (LibreHardwareMonitor/WMI on Windows, /sys/class/thermal on Linux, sysinfo fallback). Roadmap BUG-001 text is stale.
- Enrollment management UI (revoke keys, audit log, duplicate hostname warnings)
- Watchdog alerts UI — blocked on 2 missing server routes
- MSPBackups management UI — backend complete, no frontend
- Organizations management UI — multi-tenancy backend done, no frontend
- Tunnel session management (interactive terminal — backend skeleton, not production-ready)
**Open Gitea issues:**
- #15 — Pipeline tray build (publish tray binary to downloads)
- #16 — Windows IPC peer authz
- #17 — logind console user resolution
- #18 — macOS tray
- #19 — subscriber broadcast
**Security backlog (HIGH):**
- `credentials/:id/reveal` — horizontal privilege escalation (no ownership scope check)
- `internal_err()` — ~130 call sites returning raw DB errors to callers
---
## Key Architecture Decisions (LOCKED)
These decisions are locked. Do not reverse without explicit user approval.
1. **Per-agent enrollment keys** — MSI contains server URL + site_id only. Agent calls `POST /api/enroll` on first run; server issues unique per-agent key stored hashed. Enables revocation, clone detection, audit trail.
2. **Site-specific MSI generation** — Universal base MSI from CI; dashboard endpoint generates site-specific MSI with site_id baked in via WiX property → `HKLM\SOFTWARE\GuruRMM\SiteId`.
3. **No TOML/config for endpoints** — Server URL compiled into binary. No runtime config files for server URL or site_id.
4. **Policy inheritance chain** — global → site → client → agent. Server computes merged effective policy and pushes via `ConfigUpdate` WebSocket message.
5. **Platform parity rule** — Any agent feature ships on Windows, Linux, and macOS in the same change. Stub + TODO required if a real implementation is not yet feasible.
6. **Watchdog as separate process** — Main agent cannot reliably restart itself after a crash.
7. **Build pipeline is the only path to production** — Enforces signing, checksum generation, consistent artifact layout.
8. **Multi-tenancy identity model (ADR-001)** — Dev team with partner impersonation. Three levels: Dev → Partner → Client. Computer Guru is partner #1.
9. **Holistic feature development (DESIGN.md)** — Every feature requires backend + API + dashboard UI + documentation. Backend-only features are rejected.
10. **AI-optional operation** — GuruRMM must be fully functional without AI. AI features are enhancements, not requirements.
---
## History Highlights
| Date | Event |
|---|---|
| 2025-12-15 | Project genesis: Windows service + Linux installer + site code auth + build server. DB migrated from Jupiter Docker to local PostgreSQL. |
| 2026-04-19 | Full drill-down navigation, auto-install on first run, Pluto build VM setup started. |
| 2026-04-21 | MSI build fix (missing WiX extension flag). DESIGN.md created (holistic development mandate). BirthBiologic onboarded. |
| 2026-04-29 | UI_GAPS.md created. Holistic development principle formalized. |
| 2026-05-12 | macOS agent Phase 1 deployed from Mikes-MacBook-Air. Code signing issue on Apple Silicon noted. |
| 2026-05-15 | Dead WebSocket write-half bug fixed. Temperature struct field name mismatch fixed. |
| 2026-05-16 | Watchdog bugs fixed (sc.exe fallback, suppress_until, hypervisor detection). /feature-request skill created. |
| 2026-05-17 | Syncro PSA Integration added to roadmap (P1) after Howard /feature-request. Office power failure recovery — all VMs recovered. |
| 2026-05-18 | Multi-tenancy architecture (ADR-001) decided. 5 SPEC documents created (SPEC-001 through SPEC-006). |
| 2026-05-19 | 4-bug fix for AD2 crash loop. MSP360 backup integration completed (6 fixes). Clickable CPU/Memory gauge cards + process drill-down modal. |
| 2026-05-23 | /rmm-audit pass. Agent optimization Phases 1A-3. Auto-version bump mechanism. MSRV bumped to 1.85. Fleet at 0.6.29. |
| 2026-05-24 | Linux tray IPC + GTK (PR #13+#14) and peer-cred authz (PR #14) merged. PR #21 (ReadWritePaths fix) merged. Build pipeline split into per-platform scripts. Pluto known-hosts pinned. Fleet converged to 0.6.38. |
---
## Compilation Notes
- **2026-05-26 recompile:** Added the Capabilities / Feature Set section, synthesized from authoritative artifacts at live `main` (cd27a59) — API route modules, agent source tree, 46 migrations, roadmap, and the feat/perf commit log — NOT from session logs. This was prompted by the prior seeding missing the `user_session` command context entirely (it had only ever stated "runs as LocalSystem"). Corrected: command execution contexts, temperature monitoring (BUG-001 is resolved, not pending), Entra-only SSO, and added user-inventory/discovery/VM-detection/safe-rollout surfaces. **Changelogs are an unreliable capability source here** — committed changelogs stop at agent v0.6.22 while the fleet runs 0.6.39+; migrations + commit log are authoritative.
- Tunnel subsystem (verified against live main): agent side substantially built; server side is a dead-code skeleton (not declared in `api/mod.rs`, no routes, WS handler logs "not yet implemented"). Confirmed, not unverified.
- macOS build status: Phase 1 was deployed manually from Mikes-MacBook-Air (2026-05-12). `build-mac.sh` is a stub as of 2026-05-24 — unclear if automated pipeline includes macOS yet. [unverified]
- Tunnel subsystem: agent-side substantially complete; server-side is dead-code skeleton. Current live status unconfirmed. [unverified]
- Pre-commit hook on 172.16.3.30 lacks execute bit (noted 2026-05-23) — likely still unfixed. [unverified]
- Auto-update reliability fix for BB-SERVER and RECEPTIONIST-PC was incomplete at 2026-05-24 save. [unverified]
## Backlinks
- [[clients/cascades-tucson]] — RECEPTIONIST-PC enrolled (site CascadesTucson)
- [[systems/gururmm-build]] — Linux VM at 172.16.3.30 on Jupiter; GuruRMM API 3001, ClaudeTools API 8001, Coord API, MariaDB, PostgreSQL, build pipeline; originally a container on Jupiter, migrated to own VM
- [[systems/jupiter]] — Unraid host at 172.16.3.20; virsh host for all VMs (GuruRMM VM, Unifi, OwnCloud, Pluto/Claude-Builder); Docker: Gitea port 3000, NPM, Seafile; iptables PREROUTING routes :443 to NPM (NPM proxy `rmm-api -> 172.16.3.20:3001` in credentials.md is STALE — actual GuruRMM API is on 172.16.3.30)
- [[systems/pluto]] — Windows build server (MSI, WiX) at 172.16.3.36