Files
claudetools/wiki/projects/gururmm.md
Mike Swanson cd80f5e447 feat: add wiki knowledge layer (Phase 0 + Phase 1 seed)
Implements LLM-compiled wiki layer between raw session logs and live
CONTEXT.md, inspired by Karpathy's knowledge base workflow. Adds wiki/
directory structure, article templates, spec docs, and seeds first two
articles (Cascades of Tucson, GuruRMM) from 60+ session logs.

Updates CLAUDE.md to check wiki first on all context-loading triggers.
Captures verified ACG IP/hostname map and Neptune physical-location
clarification (Dataforth D2, subnet overlap TODO) in memory.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-24 15:42:38 -07:00

316 lines
21 KiB
Markdown

---
type: project
name: gururmm
display_name: GuruRMM
last_compiled: 2026-05-24
compiled_by: DESKTOP-0O8A1RL/claude-main
sources:
- projects/msp-tools/guru-rmm/CONTEXT.md
- projects/msp-tools/guru-rmm/docs/FEATURE_ROADMAP.md
- projects/msp-tools/guru-rmm/docs/UI_GAPS.md
- projects/msp-tools/guru-rmm/docs/ARCHITECTURE_DECISIONS.md
- projects/msp-tools/guru-rmm/docs/tech-stack.md
- projects/msp-tools/guru-rmm/docs/DESIGN.md
- .claude/memory/reference_gururmm_server.md
- .claude/memory/reference_gururmm_api.md
- .claude/memory/gururmm-development-principles.md
- .claude/memory/feedback_gururmm_agent_parity.md
- .claude/memory/reference_pluto_build_server.md
- .claude/memory/project_mac_gururmm_setup_pending.md
- credentials.md
- session-logs/2025-12-15-session.md
- session-logs/2025-12-20-session.md
- session-logs/2026-04-19-session.md
- session-logs/2026-04-21-session.md
- session-logs/2026-04-29-session.md
- session-logs/2026-05-12-guru-rmm-macos-agent-phase1.md
- session-logs/2026-05-15-session.md
- session-logs/2026-05-16-session.md
- session-logs/2026-05-17-session.md
- session-logs/2026-05-19-gururmm-backup-fixes.md
- session-logs/2026-05-19-session.md
- session-logs/2026-05-21-session.md
- session-logs/2026-05-23-session.md
- session-logs/2026-05-24-session.md
- session-logs/2026-05-24-GURU-KALI-session.md
backlinks:
- clients/cascades-tucson
- systems/gururmm-build
- systems/jupiter
- systems/pluto
---
# GuruRMM
## Summary
GuruRMM is a Remote Monitoring & Management platform built by Arizona Computer Guru LLC for internal MSP operations and eventual productization. The server (Rust/Axum) and dashboard (React/TypeScript) are production-deployed at https://rmm.azcomputerguru.com with approximately 55 enrolled agents across multiple client sites. The agent runs on managed Windows, Linux, and macOS endpoints.
**Current version:** 0.6.38 (as of 2026-05-24; fleet converged within ~10 minutes of publish)
**Repo:** `azcomputerguru/gururmm` on Gitea (internal: http://172.16.3.20:3000). The copy at `D:\claudetools\projects\msp-tools\guru-rmm` is a stale reference submodule — do NOT develop there; all real work happens in the Gitea repo.
**Goal:** Full-featured MSP platform rivaling commercial RMMs, with a companion PSA (GuruPSA, separate future repo) designed as a truly integrated unified system — not bolted-together products.
---
## Architecture
### Components
| Component | Location | Tech | State |
|---|---|---|---|
| Server | 172.16.3.30:3001, systemd `gururmm-server`, binary `/usr/local/bin/gururmm-server` | Rust, Axum | deployed, production |
| Dashboard | https://rmm.azcomputerguru.com, nginx at `/var/www/gururmm/dashboard/` | React + TypeScript + Vite, shadcn/ui, Tailwind CSS v4 | deployed, production |
| Agent (Windows) | Endpoints, installed as `GuruRMMAgent` Windows service via WiX MSI | Rust, Windows MSVC | deployed, fleet on 0.6.38 |
| Agent (Linux) | Endpoints, systemd `gururmm-agent`, binary `/usr/local/bin/gururmm-agent` | Rust, musl static | deployed |
| Agent (macOS) | Endpoints, LaunchDaemon `com.azcomputerguru.gururmm-agent.plist` | Rust, aarch64/x86_64 | Phase 1 deployed 2026-05-12; code signing issue on Apple Silicon |
| Tray (Windows) | System tray, named pipe IPC | Rust | deployed |
| Tray (Linux) | System tray, Unix socket IPC, libappindicator/GTK | Rust, GTK | deployed 2026-05-24 (PR #13+#14 merged) |
| Tray (macOS) | Menu bar | Rust | stub/TODO (issue #18) |
| PostgreSQL DB | localhost:5432 on 172.16.3.30, database `gururmm` | PostgreSQL | deployed |
| Coord API | 172.16.3.30:8001/api/coord | FastAPI (part of ClaudeTools API) | deployed |
| Build pipeline | 172.16.3.30:9000 webhook + `/opt/gururmm/` scripts | Python (webhook-handler.py), Bash | deployed; split into per-platform scripts 2026-05-24 |
| Pluto (Windows build VM) | 172.16.3.36, Windows Server 2019 VM on Jupiter (Unraid) | Rust MSVC, WiX v4 | operational |
### Key Files & Repos
- **Active repo:** `azcomputerguru/gururmm` — http://172.16.3.20:3000/azcomputerguru/gururmm
- **Reference clone:** `D:\claudetools\projects\msp-tools\guru-rmm` — stale submodule, do not develop here
- **Server binary:** `/usr/local/bin/gururmm-server` on 172.16.3.30
- **Agent binary (Linux):** `/usr/local/bin/gururmm-agent`
- **Agent config (Linux/macOS):** `/etc/gururmm/agent.toml` (root, mode 600); macOS uses `/usr/local/etc/gururmm/site.plist`
- **Agent registry (Windows):** `HKLM\SOFTWARE\GuruRMM\SiteId` (written by MSI)
- **Windows service name:** `GuruRMMAgent` (NOT `gururmm-agent`)
- **Downloads dir:** `/var/www/gururmm/downloads/` on 172.16.3.30
- **Webhook handler:** `/opt/gururmm/webhook-handler.py` (port 9000, systemd `gururmm-webhook`)
- **Build scripts:** `/opt/gururmm/build-shared.sh`, `build-linux.sh`, `build-windows.sh`, `build-mac.sh` (split 2026-05-24; `build-agents.sh` is now a compat wrapper)
- **Server build script:** `/opt/gururmm/build-server.sh` (separate pipeline — manual trigger required for server code changes)
- **Per-platform SHA tracking:** `/opt/gururmm/last-built-commit-{linux,windows,mac}`
- **Pluto known-hosts:** `/opt/gururmm/pluto_known_hosts` (pinned SSH keys; installed 2026-05-24)
- **Build log (Linux):** `/var/log/gururmm-build-linux.log`
- **Build log (Windows):** `/var/log/gururmm-build-windows.log`
- **API (internal):** http://172.16.3.30:3001
- **API (external):** https://rmm-api.azcomputerguru.com (Cloudflare)
- **Dashboard:** https://rmm.azcomputerguru.com
- **DB URL:** `postgres://gururmm:43617ebf7eb242e814ca9988cc4df5ad@localhost:5432/gururmm`
- **Vault path:** `infrastructure/gururmm-server.sops.yaml`
### Repo Structure
```
gururmm/
├── agent/ Rust agent (managed endpoints)
│ └── src/
│ ├── ipc.rs Unix socket IPC (Linux); Windows named pipe
│ ├── tunnel/ TunnelManager state machine
│ ├── metrics/ sysinfo-based collection (temp NOT yet wired — BUG-001)
│ ├── registry_ops/ Windows registry read/write
│ ├── updater/ Self-update handler
│ └── main.rs systemd unit template generation
├── server/ Rust/Axum API server
│ └── src/
│ ├── api/ REST handlers
│ ├── db/ Database layer (sqlx)
│ ├── ws/ WebSocket handler
│ └── mspbackups/ MSP360 backup integration
├── tray/ System tray binary
├── installer/ WiX v4 MSI (gururmm-agent.wxs)
├── scripts/ Build/ops scripts
└── docs/ FEATURE_ROADMAP.md, UI_GAPS.md, ARCHITECTURE_DECISIONS.md, tech-stack.md, DESIGN.md, specs/
```
---
## Development
### Current Focus
As of 2026-05-24 (v0.6.38):
- **Tray IPC + peer authorization** — Linux tray merged (PR #13+#14). Open: Windows peer authz (#16), logind console-user resolution (#17), macOS tray (#18), subscriber broadcast (#19).
- **Agent self-update hardening** — ProtectSystem=strict needs `ReadWritePaths=/var/log /usr/local/bin /etc/gururmm` and `RuntimeDirectory=gururmm`. Fixed in PR #21.
- **Auto-update reliability** — BB-SERVER and RECEPTIONIST-PC (Cascades) miss dispatch windows due to flaky WebSockets. Re-querying pending updates on reconnect: incomplete as of 2026-05-24.
- **Watchdog alerts UI** — backend complete but `PUT /watchdog-alerts/:id/resolve` and `DELETE /watchdog-alerts/:id` routes missing on server (found in 2026-05-23 audit).
- **MSP360 backup integration** — Phase 1 complete (monitoring, alerts, mapping, storage thresholds). Phase 2 (management) not started.
- **Security audit backlog:** `credentials/:id/reveal` horizontal privilege escalation (HIGH), `internal_err()` raw DB errors at ~130 call sites (HIGH).
### Patterns & Anti-Patterns
**Anti-patterns — never repeat:**
| Pattern | What Went Wrong |
|---|---|
| `useMemo` with stable deps for data-dependent values | queryClient is stable, memo never recomputes after queries resolve. Use `useQuery` instead. |
| CSS variable text colors inside the sidebar | Sidebar bg is hardcoded dark; CSS vars flip in light mode. Use `text-white` explicitly inside sidebar. |
| Deploying without stopping the server first | "text file busy" kernel error. Always `systemctl stop` before `cp`. |
| Building without `DATABASE_URL` | sqlx compile-time macros fail. `DATABASE_URL` is in `/home/guru/.cargo/env`. |
| DB migrations without inserting into `_sqlx_migrations` | Server crashes on start. Must insert SHA-384 checksum manually. |
| WiX MSI builds on Linux | WiX requires `msi.dll`. MSI must be built on Pluto (Windows). |
| Manual builds via SSH | All builds go through `webhook-handler.py`. Never SSH and run `cargo build` + artifact copy manually. |
| TOML/config for agent endpoint or site_id | Server URL compiled into binary, site_id baked into MSI. No runtime config files for these values. |
| `path.find('\\')` in `#[cfg(windows)]` files | Compiles on Linux silently, fails on Pluto MSVC with unterminated char literal. Use `'\\\\'`. |
| `STATUS_BADGE_CLASSES` Record const | Vite/Rollup may optimize away the lookup. Use explicit `getStatusBadgeClass()` if/else function. |
| SSH heredoc for TypeScript edits | Shell strips double-quote characters. Edit locally in submodule, push to Gitea, pull on server. |
| `Restart-Service GuruRMMAgent -Force` in command scripts | Kills agent before it can report result. Commands stay forever `running`. Use scheduled task with delay instead. |
| `sudo -u guru git` in systemd build context | git rejects repo as dubious ownership when running as root on guru-owned repo. Use `safe.directory` config or `sudo -u guru git`. |
| Self-updating running bash script | bash reads line-by-line from disk; replacing mid-execution silently skips remaining blocks. |
| `+1.77` legacy builds without `--ignore-rust-version` | Fail MSRV check after adding `rust-version` to Cargo.toml. Add `--ignore-rust-version` to legacy build lines only. |
| `StrictHostKeyChecking=no` for Pluto SSH | Replaced with pinned known-hosts at `/opt/gururmm/pluto_known_hosts`. MITM would compromise build artifacts. |
| CRLF line endings in migration files | sqlx SHA-384 checksum mismatch causes server crash on start. `.gitattributes` + `core.autocrlf=false` + pre-commit hook prevents this. |
| Dead WebSocket write half | WS write fails, send task dies, receive loop keeps agent in `ConnectedAgents` with dead write half. Commands silently fail. Fix: `tokio::select!` monitoring both tasks. |
**Good patterns:**
- **Platform parity rule** — any agent feature goes on Windows + Linux + macOS in the same commit. If a real implementation isn't feasible, add a working stub + `// TODO(platform): <os> — <reason>`. No silent no-ops.
- **Per-platform last-built-commit tracking** — Linux builds succeed and record progress independently of Windows builds.
- **Holistic feature development** — every feature ships backend + API + dashboard UI + docs together. Backend-only features are rejected.
- **sqlx offline mode** — compile-time query validation requires DB reachable or offline cache present.
- **`RuntimeDirectory=gururmm` in systemd unit** — systemd-native way to give agent writable `/run/gururmm/` for IPC socket.
- **Registry-first path resolution** — read `HKLM:\SOFTWARE\GuruRMM` for install dir, fall back to service PathName, then hardcoded default.
- **`interrupt_running_commands()` at reconnect** — flips all `status='running'` commands for reconnecting agent to `status='interrupted'`.
### Build & Deploy
**CRITICAL: Never trigger builds manually via SSH. All builds go through the webhook pipeline.**
```
Gitea push to main
-> webhook-handler.py (172.16.3.30:9000, parallel threads per platform)
-> build-shared.sh (auto-version bump, git sync — runs once)
-> build-linux.sh (cargo build on .30; log: /var/log/gururmm-build-linux.log)
-> build-windows.sh (SSH -> Pluto 172.16.3.36 via pinned known-hosts
cargo build --release x64 MSVC + i686 MSVC
+1.77 legacy builds with --ignore-rust-version
WiX MSI build for site-specific base
sign-windows.sh (jsign + Azure Trusted Signing)
SCP artifacts back; log: /var/log/gururmm-build-windows.log)
-> build-mac.sh (stub — no build machine configured yet)
-> artifacts -> /var/www/gururmm/downloads/ with sha256 + -latest symlinks
-> per-platform last-built-commit files updated
-> systemctl restart gururmm-agent (local agent on .30)
```
**Auto-version:** `build-shared.sh` diffs `agent/`, `server/`, `dashboard/` against last built SHA. For each changed component, bumps patch version in `Cargo.toml` or `package.json`, commits `[ci-version-bump]`, pushes. Webhook skips builds where all commits are version bumps.
**Server code changes** — separate manual step, NOT in agent pipeline:
```bash
sudo /opt/gururmm/build-server.sh
```
**Dashboard deploy** — also separate:
```bash
cd /home/guru/gururmm/dashboard && sudo -u guru npm run build
sudo rsync -av --delete /home/guru/gururmm/dashboard/dist/ /var/www/gururmm/dashboard/
```
**DB migrations** — manual; must insert SHA-384 checksum into `_sqlx_migrations` or server crashes on start.
**Pluto (172.16.3.36):**
- Windows Server 2019 VM on Jupiter (Unraid)
- SSH: `ssh -o UserKnownHostsFile=/opt/gururmm/pluto_known_hosts Administrator@172.16.3.36`
- Rust stable 1.95.0 + 1.77 pinned for legacy builds
- VS Build Tools (MSVC), sccache at `C:\sccache`, WiX v4, Gitea clone at `C:\gururmm\`
**Auto-update delivery:**
- Server scans every 300s; dispatches update command on agent heartbeat
- Gated on effective policy `auto_update` (default on when policy is null)
- Agent: downloads to PrivateTmp, verifies SHA-256, replaces binary, restarts service
- Force-trigger: `POST /api/agents/:id/update`
---
## Active State
**Fleet (as of 2026-05-24 12:33 MST):**
- ~55 enrolled agents total; ~39 online
- 37 of 39 online agents on 0.6.38
- Laggards on 0.6.37: BB-SERVER (BirthBiologic, `6c02baa7-...`) and RECEPTIONIST-PC (Cascades of Tucson, `9c91d324-...`) — flaky WebSockets, miss dispatch windows. Force-update via API when their WS is up.
**Known enrolled clients/sites:**
- Cascades of Tucson — site CascadesTucson (RECEPTIONIST-PC, CS-SERVER)
- BirthBiologic — site BRIGHT-PEAK-5980
- Paul Key — site IRON-WOLF-5819
- ACG internal — AD2, DESKTOP-0O8A1RL, GURU-KALI, and one agent labeled "Saturn" [Saturn is DECOMMISSIONED as of Apr 2026; IP 172.16.3.21 reused by Uranus — this agent entry may be stale or may actually be Uranus]
**API auth:**
- `POST /api/auth/login` → JWT (~24h)
- Creds: vault `infrastructure/gururmm-server.sops.yaml``credentials.gururmm-api.admin-email` / `admin-password`
- Key endpoints: `GET /api/agents`, `POST /api/agents/:id/command`, `GET /api/commands/:id`, `POST /api/agents/:id/update`
- Command fields: `command_type` (powershell/shell/exec), `command` (script text, JSON-encoded). Windows agent runs as LocalSystem.
- Response: `stdout`, `stderr`, `exit_code`, `status` (running/completed/failed/timeout/interrupted)
**Dashboard — complete and working:**
Agents management, Clients/Sites CRUD, Commands execution + terminal, Logs + AI analysis, Alerts, Metrics (CPU/RAM/disk/network, process drill-down modal), Auto-update triggering, Network state, Entra ID SSO, Policies Dashboard (all tabs), Registry editor, MSP360 backup status card.
**Dashboard — incomplete (see UI_GAPS.md):**
- Temperature monitoring (BUG-001) — UI ready, agent-side collection never wired
- Enrollment management UI (revoke keys, audit log, duplicate hostname warnings)
- Watchdog alerts UI — blocked on 2 missing server routes
- MSPBackups management UI — backend complete, no frontend
- Organizations management UI — multi-tenancy backend done, no frontend
- Tunnel session management (interactive terminal — backend skeleton, not production-ready)
**Open Gitea issues:**
- #15 — Pipeline tray build (publish tray binary to downloads)
- #16 — Windows IPC peer authz
- #17 — logind console user resolution
- #18 — macOS tray
- #19 — subscriber broadcast
**Security backlog (HIGH):**
- `credentials/:id/reveal` — horizontal privilege escalation (no ownership scope check)
- `internal_err()` — ~130 call sites returning raw DB errors to callers
---
## Key Architecture Decisions (LOCKED)
These decisions are locked. Do not reverse without explicit user approval.
1. **Per-agent enrollment keys** — MSI contains server URL + site_id only. Agent calls `POST /api/enroll` on first run; server issues unique per-agent key stored hashed. Enables revocation, clone detection, audit trail.
2. **Site-specific MSI generation** — Universal base MSI from CI; dashboard endpoint generates site-specific MSI with site_id baked in via WiX property → `HKLM\SOFTWARE\GuruRMM\SiteId`.
3. **No TOML/config for endpoints** — Server URL compiled into binary. No runtime config files for server URL or site_id.
4. **Policy inheritance chain** — global → site → client → agent. Server computes merged effective policy and pushes via `ConfigUpdate` WebSocket message.
5. **Platform parity rule** — Any agent feature ships on Windows, Linux, and macOS in the same change. Stub + TODO required if a real implementation is not yet feasible.
6. **Watchdog as separate process** — Main agent cannot reliably restart itself after a crash.
7. **Build pipeline is the only path to production** — Enforces signing, checksum generation, consistent artifact layout.
8. **Multi-tenancy identity model (ADR-001)** — Dev team with partner impersonation. Three levels: Dev → Partner → Client. Computer Guru is partner #1.
9. **Holistic feature development (DESIGN.md)** — Every feature requires backend + API + dashboard UI + documentation. Backend-only features are rejected.
10. **AI-optional operation** — GuruRMM must be fully functional without AI. AI features are enhancements, not requirements.
---
## History Highlights
| Date | Event |
|---|---|
| 2025-12-15 | Project genesis: Windows service + Linux installer + site code auth + build server. DB migrated from Jupiter Docker to local PostgreSQL. |
| 2026-04-19 | Full drill-down navigation, auto-install on first run, Pluto build VM setup started. |
| 2026-04-21 | MSI build fix (missing WiX extension flag). DESIGN.md created (holistic development mandate). BirthBiologic onboarded. |
| 2026-04-29 | UI_GAPS.md created. Holistic development principle formalized. |
| 2026-05-12 | macOS agent Phase 1 deployed from Mikes-MacBook-Air. Code signing issue on Apple Silicon noted. |
| 2026-05-15 | Dead WebSocket write-half bug fixed. Temperature struct field name mismatch fixed. |
| 2026-05-16 | Watchdog bugs fixed (sc.exe fallback, suppress_until, hypervisor detection). /feature-request skill created. |
| 2026-05-17 | Syncro PSA Integration added to roadmap (P1) after Howard /feature-request. Office power failure recovery — all VMs recovered. |
| 2026-05-18 | Multi-tenancy architecture (ADR-001) decided. 5 SPEC documents created (SPEC-001 through SPEC-006). |
| 2026-05-19 | 4-bug fix for AD2 crash loop. MSP360 backup integration completed (6 fixes). Clickable CPU/Memory gauge cards + process drill-down modal. |
| 2026-05-23 | /rmm-audit pass. Agent optimization Phases 1A-3. Auto-version bump mechanism. MSRV bumped to 1.85. Fleet at 0.6.29. |
| 2026-05-24 | Linux tray IPC + GTK (PR #13+#14) and peer-cred authz (PR #14) merged. PR #21 (ReadWritePaths fix) merged. Build pipeline split into per-platform scripts. Pluto known-hosts pinned. Fleet converged to 0.6.38. |
---
## Compilation Notes
- macOS build status: Phase 1 was deployed manually from Mikes-MacBook-Air (2026-05-12). `build-mac.sh` is a stub as of 2026-05-24 — unclear if automated pipeline includes macOS yet. [unverified]
- Tunnel subsystem: agent-side substantially complete; server-side is dead-code skeleton. Current live status unconfirmed. [unverified]
- Pre-commit hook on 172.16.3.30 lacks execute bit (noted 2026-05-23) — likely still unfixed. [unverified]
- Auto-update reliability fix for BB-SERVER and RECEPTIONIST-PC was incomplete at 2026-05-24 save. [unverified]
## Backlinks
- [[clients/cascades-tucson]] — RECEPTIONIST-PC enrolled (site CascadesTucson)
- [[systems/gururmm-build]] — Linux VM at 172.16.3.30 on Jupiter; GuruRMM API 3001, ClaudeTools API 8001, Coord API, MariaDB, PostgreSQL, build pipeline; originally a container on Jupiter, migrated to own VM
- [[systems/jupiter]] — Unraid host at 172.16.3.20; virsh host for all VMs (GuruRMM VM, Unifi, OwnCloud, Pluto/Claude-Builder); Docker: Gitea port 3000, NPM, Seafile; iptables PREROUTING routes :443 to NPM (NPM proxy `rmm-api -> 172.16.3.20:3001` in credentials.md is STALE — actual GuruRMM API is on 172.16.3.30)
- [[systems/pluto]] — Windows build server (MSI, WiX) at 172.16.3.36