wiki: add capability synthesis to wiki-compile; recompile GuruRMM

Skill + template: - wiki-compile Phase 2P: type-aware authoritative-artifact discovery for projects (migrations, API routes, agent modules, roadmap-done, commit log), with a stale-submodule guard that reads origin/main when the pinned submodule lags. Changelogs treated as incomplete, not authoritative. - project template: add a Capabilities / Feature Set section. GuruRMM recompile (from live main artifacts, not session logs): - Added Capabilities / Feature Set section covering monitoring, remote execution (incl. system vs user_session contexts), inventory/discovery, update mgmt, policy, alerting/watchdog, backup, tunnel, identity/security. - Fixed the misleading "runs as LocalSystem" command-fields line (the gap that started this) and the stale BUG-001 temperature claim (now shipped). - Qualified Entra-only SSO; noted safe-rollout is unwired scaffolding. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 18:16:03 -07:00
parent 28e9ecd650
commit d4eb8358ce
3 changed files with 119 additions and 7 deletions
--- a/wiki/projects/gururmm.md
+++ b/wiki/projects/gururmm.md
@@ -2,9 +2,14 @@
 type: project
 name: gururmm
 display_name: GuruRMM
-last_compiled: 2026-05-24
-compiled_by: DESKTOP-0O8A1RL/claude-main
+last_compiled: 2026-05-26
+compiled_by: GURU-5070/claude-main
 sources:
+  - "gururmm@main: server/src/api/*.rs (REST API surface, ~30 route modules)"
+  - "gururmm@main: agent/src/ (agent capabilities; transport/CommandContext, ohw.rs, watchdog/wts.rs)"
+  - "gururmm@main: server/migrations/*.sql (46 migrations — feature checkpoints, incl. 041_add_command_context)"
+  - "gururmm@main: docs/FEATURE_ROADMAP.md, docs/specs/"
+  - "gururmm@main: git log feat/perf history (changelogs incomplete past v0.6.22)"
  - projects/msp-tools/guru-rmm/CONTEXT.md
  - projects/msp-tools/guru-rmm/docs/FEATURE_ROADMAP.md
  - projects/msp-tools/guru-rmm/docs/UI_GAPS.md
@@ -46,7 +51,7 @@ backlinks:

 GuruRMM is a Remote Monitoring & Management platform built by Arizona Computer Guru LLC for internal MSP operations and eventual productization. The server (Rust/Axum) and dashboard (React/TypeScript) are production-deployed at https://rmm.azcomputerguru.com with approximately 55 enrolled agents across multiple client sites. The agent runs on managed Windows, Linux, and macOS endpoints.

-**Current version:** 0.6.38 (as of 2026-05-24; fleet converged within ~10 minutes of publish)
+**Current version:** agent 0.6.39 / 0.6.41 in fleet as of 2026-05-26 (server v0.3.x). Note: committed changelogs are stale (stop at agent v0.6.22 / server v0.3.1) — migrations + commit log are the authoritative feature record, not changelogs.

 **Repo:** `azcomputerguru/gururmm` on Gitea (internal: http://172.16.3.20:3000). The copy at `D:\claudetools\projects\msp-tools\guru-rmm` is a stale reference submodule — do NOT develop there; all real work happens in the Gitea repo.

@@ -54,6 +59,65 @@ GuruRMM is a Remote Monitoring & Management platform built by Arizona Computer G

 ---

+## Capabilities / Feature Set
+
+*Synthesized from authoritative artifacts (API routes, agent modules, 46 migrations, roadmap, commit log) at live `main` — not from session logs. See Compilation Notes.*
+
+Agent↔server communication is a persistent authenticated WebSocket with auto-reconnect + heartbeat; on reconnect, in-flight commands flip to `interrupted`. Platform-parity rule: agent features ship on Windows/Linux/macOS in the same change (stub + TODO where a real impl isn't yet feasible).
+
+### Monitoring & Telemetry
+- Core metrics per interval (policy-tunable per section): CPU %, memory %/bytes, disk %/bytes, network rx/tx deltas, uptime, logged-in user, user idle time (Win `GetLastInputInfo`, Linux `xprintidle`), public/WAN IP (cached, multi-service fallback). Cross-platform via `sysinfo`.
+- Hardware sensor telemetry: CPU/GPU temps + full sensor array (temperature, voltage, fan RPM, power). Windows via bundled **LibreHardwareMonitor** + WMI (`ohw.rs`); Linux via `/sys/class/thermal`; `sysinfo::Components` fallback.
+- Process drill-down: top 10 by CPU + top 10 by memory in every metrics payload (cross-platform).
+- Network state: per-interface IPv4/IPv6, MAC, derived CIDR subnets — sent on change only.
+
+### Remote Execution
+- Command types: `shell`, `powershell`, `python`, raw `script{interpreter}`, and `claude_task`. Options: `timeout_seconds`, `elevated`.
+- **Execution context** (`041_add_command_context`): `system` (default — Session 0 / service SYSTEM) **or `user_session`** (runs in the active logged-on user's desktop session via WTS token impersonation: `WTSQueryUserToken` + `CreateProcessAsUserW` + per-user env block). Windows-only; requires an active session.
+- Commands individually cancellable (`POST /commands/:id/cancel`) and aborted on disconnect. Auditable history; status running/completed/failed/timeout/interrupted.
+- Script library (`017_scripts`): stored scripts dispatched to agents with args/env/timeout/`run_as_user`; per-run history.
+
+### Inventory & Discovery
+- Hardware inventory (mfr/model/serial/BIOS, CPU, memory, disks, NICs, OS), software inventory (installed apps), service inventory. On-demand refresh.
+- VM / hypervisor / container detection (`032/033`): `is_virtual_machine`, `hypervisor_type`, `vm_uuid`, `is_hypervisor` + hosted VM UUIDs, `is_container`, `is_unraid`.
+- User/group inventory (`037`–`040`): local + domain + Azure AD accounts (enabled, pw-never-expires, last-logon, is_admin, AD email/UPN/dept), domain-join classification (none/ad/aad/hybrid), domain name, M365 tenant ID, **domain-controller detection (`is_dc`)**, group membership. Policy-scheduled (default 24h).
+- Network discovery: server-dispatched scans — TCP probes over configurable ranges/ports, ARP MAC, reverse DNS, basic OS fingerprint; devices stream back and are persisted/editable.
+
+### Patch / Agent Update Management
+- Self-updater: server sends version + URL + SHA256; agent downloads, verifies checksum, atomically swaps binary, restarts, and **auto-rolls back** to a kept backup if it fails to reconnect (~180s window).
+- Auto-update gated on effective policy `auto_update` (channel + defer_hours; maintenance-window field received but not yet enforced [verify]). Force via `POST /agents/:id/update`. Update channels at agent/site/client (`026`).
+- Safe-rollout (`046`): `update_rollouts`/health-metrics/events tables + `/updates/rollouts` promote/rollback. **Scaffolding only — promotion is manual; health-gated automation is written-but-unwired (Phase 2).**
+
+### Policy & Configuration Management
+- Inheritance chain global → client → site → agent; server computes merged effective policy, pushes via `ConfigUpdate`. Effective policy queryable per scope.
+- Checks engine (`018`/`019`): cpu, memory, disk, ping, port, script, service (restart-if-stopped, pass-if-not-exist; Win `sc.exe`, Linux `systemctl`). Policy-attached check templates (`024`) with push-to-agent sync. On-demand `run-checks`.
+- Remote registry (Windows, `winreg`): agent supports enumerate/read/write (typed)/create/delete. **HTTP API currently exposes read-only (enumerate, read_value); write paths exist in the agent but aren't routed yet [verify].**
+
+### Alerting & Watchdog
+- Threshold alerts (ack/resolve, per-agent + fleet summary, dashboard filter). Alert templates (`022`) with effective resolution; per-client email settings (`020`). Maintenance mode (`021`) to suppress alerting per scope.
+- Watchdog: **separate** supervising process (polls `GuruRMMAgent` every 30s, restart backoff, alert after 3 fails) + launches/reaps the tray into active user sessions via WTS. Full alert CRUD + ack/resolve.
+
+### Backup Integration (MSP360 / MSPBackups)
+- Multi-provider config (`034`/`035`) with connection test, scheduled sync, per-agent + all-providers status, fleet coverage report, and agent↔MSP360 mapping (`044`) with confidence scoring + manual verification.
+
+### Remote Access (Tunnel)
+- Agent side substantially built (`TunnelManager` state machine; Open/Close/Data). **Server side is a dead-code skeleton — not declared in `api/mod.rs`, no `/tunnel` routes, WS handler logs "not yet implemented." Not production-ready.**
+
+### Identity / Multi-tenancy / Security
+- Auth: JWT (login/register/me); agents auth over WS via per-agent API key + hardware device_id.
+- **Microsoft Entra ID SSO** (OAuth2/OIDC + PKCE), gated on server config. Multi-provider incl. Google is spec'd (SPEC-008) but **Google not implemented [verify]**.
+- Organizations / multi-tenancy: org CRUD, per-org membership + roles, limits, dev-admin **user impersonation** (`/auth/impersonate/:id`). Backend present; dashboard UI partial.
+- Encrypted credentials vault (`016`): scoped global/client/site, typed (password, SSH key, SNMP), metadata-only by default with separate `/reveal` decrypt endpoint (known HIGH item: `/reveal` ownership-scope check — [verify current state]).
+- Enrollment & keys (`012`): per-agent key issuance on first run, site API keys (regenerable), site-specific MSI with SITEKEY injected at download, public install-report ingestion. Legacy PowerShell agent path for Server 2008 R2.
+- Logs: agent log upload (periodic + on-demand), per-agent events (`042`), fleet log view, AI-assisted log analysis (`/logs/analyze`) — AI-optional per locked decision.
+
+### Platform coverage
+- **Cross-platform (Win/Linux/macOS):** metrics, network state, hardware/software/service inventory, user/group + DC/domain detection, checks (service checks Win+Linux), discovery, self-update, scripts/commands.
+- **Windows-only:** `user_session` command context (WTS), LibreHardwareMonitor temps, remote registry, tray-into-session, watchdog SCM supervision.
+- **macOS:** agent deployed (Phase 1); tray is a stub; automated Mac build pipeline is an intentional stub (no build host) [verify before claiming CI Mac releases].
+
+---
+
 ## Architecture

 ### Components
@@ -104,7 +168,7 @@ gururmm/
 │   └── src/
 │       ├── ipc.rs      Unix socket IPC (Linux); Windows named pipe
 │       ├── tunnel/     TunnelManager state machine
-│       ├── metrics/    sysinfo-based collection (temp NOT yet wired — BUG-001)
+│       ├── metrics/    sysinfo collection + temps (LibreHardwareMonitor/WMI on Win, /sys/class/thermal on Linux) — BUG-001 resolved
 │       ├── registry_ops/ Windows registry read/write
 │       ├── updater/    Self-update handler
 │       └── main.rs     systemd unit template generation
@@ -249,14 +313,14 @@ sudo rsync -av --delete /home/guru/gururmm/dashboard/dist/ /var/www/gururmm/dash
 - `POST /api/auth/login` → JWT (~24h)
 - Creds: vault `infrastructure/gururmm-server.sops.yaml` → `credentials.gururmm-api.admin-email` / `admin-password`
 - Key endpoints: `GET /api/agents`, `POST /api/agents/:id/command`, `GET /api/commands/:id`, `POST /api/agents/:id/update`
- Command fields: `command_type` (powershell/shell/exec), `command` (script text, JSON-encoded). Windows agent runs as LocalSystem.
+- Command fields: `command_type` (`shell`/`powershell`/`python`/`script`/`claude_task`), `command` (script text, JSON-encoded), optional `context` — **`system`** (default; Session 0 / SYSTEM) or **`user_session`** (runs in the logged-on user's desktop session via WTS token impersonation; Windows-only, needs an active session), plus `timeout_seconds`/`elevated`. The agent does NOT run everything as LocalSystem — `user_session` is the per-user path (migration `041_add_command_context`, `agent/src/watchdog/wts.rs`).
 - Response: `stdout`, `stderr`, `exit_code`, `status` (running/completed/failed/timeout/interrupted)

 **Dashboard — complete and working:**
-Agents management, Clients/Sites CRUD, Commands execution + terminal, Logs + AI analysis, Alerts, Metrics (CPU/RAM/disk/network, process drill-down modal), Auto-update triggering, Network state, Entra ID SSO, Policies Dashboard (all tabs), Registry editor, MSP360 backup status card.
+Agents management, Clients/Sites CRUD, Commands execution + terminal, Logs + AI analysis, Alerts, Metrics (CPU/RAM/disk/network, process drill-down modal), Auto-update triggering, Network state, Entra ID SSO (Entra only — Google planned per SPEC-008, not implemented), Policies Dashboard (all tabs), Registry editor, MSP360 backup status card.

 **Dashboard — incomplete (see UI_GAPS.md):**
- Temperature monitoring (BUG-001) — UI ready, agent-side collection never wired
+- ~~Temperature monitoring (BUG-001)~~ — RESOLVED: agent-side collection is wired (LibreHardwareMonitor/WMI on Windows, /sys/class/thermal on Linux, sysinfo fallback). Roadmap BUG-001 text is stale.
 - Enrollment management UI (revoke keys, audit log, duplicate hostname warnings)
 - Watchdog alerts UI — blocked on 2 missing server routes
 - MSPBackups management UI — backend complete, no frontend
@@ -314,6 +378,8 @@ These decisions are locked. Do not reverse without explicit user approval.

 ## Compilation Notes

+- **2026-05-26 recompile:** Added the Capabilities / Feature Set section, synthesized from authoritative artifacts at live `main` (cd27a59) — API route modules, agent source tree, 46 migrations, roadmap, and the feat/perf commit log — NOT from session logs. This was prompted by the prior seeding missing the `user_session` command context entirely (it had only ever stated "runs as LocalSystem"). Corrected: command execution contexts, temperature monitoring (BUG-001 is resolved, not pending), Entra-only SSO, and added user-inventory/discovery/VM-detection/safe-rollout surfaces. **Changelogs are an unreliable capability source here** — committed changelogs stop at agent v0.6.22 while the fleet runs 0.6.39+; migrations + commit log are authoritative.
+- Tunnel subsystem (verified against live main): agent side substantially built; server side is a dead-code skeleton (not declared in `api/mod.rs`, no routes, WS handler logs "not yet implemented"). Confirmed, not unverified.
 - macOS build status: Phase 1 was deployed manually from Mikes-MacBook-Air (2026-05-12). `build-mac.sh` is a stub as of 2026-05-24 — unclear if automated pipeline includes macOS yet. [unverified]
 - Tunnel subsystem: agent-side substantially complete; server-side is dead-code skeleton. Current live status unconfirmed. [unverified]
 - Pre-commit hook on 172.16.3.30 lacks execute bit (noted 2026-05-23) — likely still unfixed. [unverified]