# GuruRMM - Project Context

**Last Updated:** 2026-04-15
**Status:** Active Development - Tunnel Phase 1 Verified Live; Phase 2 Unblocked

## Quick Start - Infrastructure Overview

| Component | Location | Access |
|-----------|----------|--------|
| **Production Server** | 172.16.3.30 (gururmm) | SSH: op://Infrastructure/GuruRMM Server/username |
| **Public API** | https://rmm-api.azcomputerguru.com | Via Cloudflare Tunnel |
| **Internal API** | http://172.16.3.30:3001 | Direct access |
| **Database** | PostgreSQL @ 172.16.3.30:5432/gururmm | op://Infrastructure/GuruRMM Server/PostgreSQL * |
| **Build Server** | Same host (gururmm-build) | Linux native builds only |
| **Agent Downloads** | /var/www/gururmm/downloads/ | Nginx on port 80 |
| **Gitea Repo** | git.azcomputerguru.com/azcomputerguru/gururmm | Active (NOT guru-rmm) |

**All credentials:** `op read "op://Infrastructure/GuruRMM Server/[field]"`

## Current State (READ THIS FIRST)

### Version & Deployment

- **Server:** v0.6.0 (commit c7c8317) - deployed 2026-04-14
- **Agent:** v0.6.0 (Linux + Windows builds) - deployed 2026-04-14
- **Database:** Migrations 001-010 applied
- **Service Status:** gururmm-server.service running (PID 944198)

### Active Work

- **Phase 1 Complete:** Tunnel infrastructure (REST API, WebSocket protocol, database schema, agent state machine)
- **Phase 2 Pending:** Channel implementation (Terminal, File, Registry, Service)
- **Phase 3 Not Started:** Production hardening (rate limiting, timeouts, metrics)

### Agent Fleet Status (as of 2026-04-15 03:20 UTC)

- **Online:** 2/6 agents
  - AD2 (Windows 10, v0.6.0) - ID: d28a1c90-47d7-448f-a287-197bc8892234
  - DESKTOP-0O8A1RL (Windows 11, v0.6.0) - ID: 0b2527cc-ab3f-49d9-9a06-bfd0b4a613a7
- **Offline:** 4/6 agents
  - SL-SERVER: **STUCK IN PENDING UPDATE** - requires manual service restart

### Recent Session Logs (MUST READ BEFORE CONTINUING WORK)

- **2026-04-15:** End-to-end tunnel lifecycle verified via public API.
  Three actionable findings - `session-logs/2026-04-15-session.md`
- **2026-04-14:** Tunnel API testing, authentication fix - `session-logs/2026-04-14-session.md`
- **2026-04-02:** Tunnel implementation, update bug fixes - See git history
- **2026-04-01:** Cloudflare Tunnel configuration - See credentials.md

### What To Do Next (priority order, revised 2026-04-15)

**Architectural pivot:** multi-tenancy is now a core requirement (product going to MSP market). Logging split into three tiers (agent OS-native / client event pull / tunnel audit to DB). Detailed breakdown in ROADMAP.md (sections: Logging & Audit, Multi-tenancy, Tunnel Channels).

1. **Fix `/api/v1/tunnel/status/{id}` 403 bug** - `server/src/db/tunnel.rs:94-103`. Small PR. Blocks Phase 2 integration tests. (Roadmap S8.)
2. **Agent self-logging via OS-native sinks** - Windows Event Log provider, Linux journald, macOS os_log. Ship before anything else touches Phase 2. (Roadmap L1.)
3. **Tech-side tunnel subscriber design** - browser needs a WS endpoint to receive tunnel data; `server/src/ws/mod.rs:808-825` currently discards `AgentMessage::TunnelData`. Decide pub-sub shape before implementing any channel. (Roadmap T5.)
4. **Multi-tenancy schema** - `tenant_id` on every table. Auth middleware filters by tenant. Do this before building more features because retroactive migration cost scales with schema size. (Roadmap M1-M2.)
5. **Terminal channel** - only after 1-4. `tokio::process::Command` in `agent/src/transport/websocket.rs:handle_tunnel_data()`. (Roadmap T1.)
6. **Client event pull (`client_events` table)** - 15-min delta + on-tunnel-open/close. Windows Get-WinEvent, Linux journalctl, macOS log show. (Roadmap L2-L4.)

**Housekeeping:**

- Update 1Password `Infrastructure/GuruRMM Server/Admin Password` to `GuruRMM2025` (stored value is stale and fails login).
- Add agent file logging (`C:\ProgramData\GuruRMM\agent.log`) as bridge until OS-native sinks land - lets Phase 2 work proceed with visibility.
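For item 1 in the priority list above, the intended behavior of the status lookup might be sketched as below. This is an assumption-laden illustration, not the code in `server/src/db/tunnel.rs`: the `TechSession`, `SessionStatus`, `StatusError`, and `status_for` names are hypothetical, a `u32` stands in for the real UUID `tech_id`, and the actual cause of the 403 is not shown in this document.

```rust
/// Hypothetical session status; the real column is a VARCHAR in tech_sessions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SessionStatus {
    Active,
    Closed,
}

/// Hypothetical error type mapping to HTTP 404 / 403.
#[derive(Debug, PartialEq, Eq)]
enum StatusError {
    NotFound,
    Forbidden,
}

/// Hypothetical in-memory stand-in for a tech_sessions row.
struct TechSession {
    tech_id: u32, // stand-in for the UUID foreign key to users
    status: SessionStatus,
}

/// Ownership decides access; the session's status does not.
/// A closed session owned by the caller is returned, not rejected.
fn status_for(
    session: Option<&TechSession>,
    requester: u32,
) -> Result<SessionStatus, StatusError> {
    let session = session.ok_or(StatusError::NotFound)?;
    if session.tech_id != requester {
        return Err(StatusError::Forbidden);
    }
    Ok(session.status)
}

fn main() {
    let closed = TechSession { tech_id: 7, status: SessionStatus::Closed };
    println!("{:?}", status_for(Some(&closed), 7));
    println!("{:?}", status_for(Some(&closed), 8));
    println!("{:?}", status_for(None, 7));
}
```

The point of the sketch is the ordering: look the session up without filtering on status, check ownership, then report whatever status the record holds.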
## Anti-Patterns (DON'T DO THIS)

❌ **DO NOT build on macOS** - Binaries won't run on Linux server. SSH to 172.16.3.30 and build natively.

❌ **DO NOT query database directly** - Use Database Agent for ALL database operations (coordinator role).

❌ **DO NOT point downloads URL to port 3001** - API server doesn't serve /downloads. Use nginx (port 80) or public URL.

❌ **DO NOT hardcode credentials** - Always fetch from 1Password: `op read "op://Infrastructure/GuruRMM Server/..."`

❌ **DO NOT create new password utilities** - Use `/tmp/hash_password` (already compiled):

```bash
/tmp/target/release/hash_password "password_here"
# Output: $argon2id$v=19$m=19456,t=2,p=1$...[97 chars]
```

❌ **DO NOT build in CloudeTools repo** - Active repo is `gururmm` on Gitea, not `guru-rmm`.

❌ **DO NOT use emojis** - ASCII markers only: [OK], [ERROR], [WARNING], [SUCCESS], [INFO]

❌ **DO NOT make breaking changes to `/api/v1/bootstrap/hello`** - This is the anchor that lets long-offline agents reconnect and self-upgrade. Input and output schemas are **additive-only forever**. An agent from v0.1 must be able to hit this endpoint in 2030 and get a meaningful response telling it how to update. Every other endpoint/message is free to evolve; this one is not. See ROADMAP.md V1-V10.

❌ **DO NOT cross module boundaries by importing another module's internals** - The product is architected modularly (core + PSA + backups + syslog + ...). Modules own their schema namespace and never touch another module's tables. Cross-module communication goes through the event bus or that module's exposed API only. Core and modules are separate Rust crates by design; enforce via `use` restrictions. Breaking this discipline once poisons the whole architecture. See ROADMAP.md X1-X12.
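The additive-only rule for `/api/v1/bootstrap/hello` can be made mechanical. The following is a minimal sketch under stated assumptions: the `Schema` alias, the `breaking_changes` helper, and the field names are all hypothetical and do not describe the real endpoint schema. It only illustrates the check "every old field still exists with the same type".

```rust
use std::collections::BTreeMap;

/// Hypothetical description of one schema version: field name -> type name.
type Schema = BTreeMap<&'static str, &'static str>;

/// Returns the fields of `old` that `new` removed or retyped.
/// An empty result means the change is additive-only.
fn breaking_changes(old: &Schema, new: &Schema) -> Vec<&'static str> {
    let mut broken = Vec::new();
    for (name, ty) in old {
        // A field is preserved only if it still exists with the same type.
        if new.get(name) != Some(ty) {
            broken.push(*name);
        }
    }
    broken
}

fn main() {
    // Field names below are illustrative assumptions, not the real schema.
    let v1: Schema = BTreeMap::from([("agent_id", "string"), ("version", "string")]);
    let additive: Schema = BTreeMap::from([
        ("agent_id", "string"),
        ("version", "string"),
        ("os", "string"),
    ]);
    let removal: Schema = BTreeMap::from([("agent_id", "string")]);

    println!("additive change breaks: {:?}", breaking_changes(&v1, &additive));
    println!("removal breaks: {:?}", breaking_changes(&v1, &removal));
}
```

A check of this shape could run in CI against a frozen copy of every previously shipped schema, so a removal or retype fails the build instead of stranding old agents.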
### Hierarchy Terminology (use these exact terms)

| Tier | Term | DB | Meaning |
|---|---|---|---|
| 1 | Platform | - | The software author (us, GuruRMM) |
| 2 | Partner | `tenant_id` | An MSP - a paying customer of the Platform |
| 3 | Client | `client_id` | A Partner's customer |
| 4 | Site | `site_id` | A location within a Client (physical or logical) |
| 5 | Agent | `agent_id` | An endpoint at a Site |

UI/API says "Partner"; DB column is `tenant_id`. Do not rename. Do not use "sub-tenant" or bare "customer". Full canonical definition + API path convention + event topic naming in ROADMAP.md Terminology section.

## Where to Find Things

### Codebase Structure

```
projects/msp-tools/guru-rmm/
├── server/                          # Rust API server
│   ├── src/
│   │   ├── api/                     # REST endpoints
│   │   │   ├── tunnel.rs            # Tunnel API (Phase 1 complete)
│   │   │   ├── agents.rs            # Agent management
│   │   │   └── auth.rs              # Login/JWT
│   │   ├── db/                      # Database operations
│   │   │   ├── tunnel.rs            # Tunnel queries
│   │   │   └── agents.rs            # Agent queries
│   │   ├── ws/                      # WebSocket protocol
│   │   │   └── mod.rs               # ServerMessage/AgentMessage enums
│   │   └── auth/                    # Password hashing (Argon2id)
│   └── migrations/                  # Database schema (001-010)
│       └── 010_tunnel_sessions.sql  # Tunnel tables (tech_sessions, tunnel_audit)
├── agent/                           # Rust agent binary
│   ├── src/
│   │   ├── tunnel/                  # Tunnel manager (Phase 1 complete)
│   │   │   └── mod.rs               # AgentMode state machine
│   │   ├── updater/                 # Self-update system (v0.6.0 fixes applied)
│   │   └── transport/               # WebSocket client
│   └── Cargo.toml
├── session-logs/                    # Work history (READ BEFORE STARTING)
└── ROADMAP.md                       # Feature roadmap
```

### Production Files on Server (172.16.3.30)

- **Binary:** /opt/gururmm/gururmm-server
- **Config:** /opt/gururmm/.env
- **Service:** systemctl status gururmm-server
- **Logs:** journalctl -u gururmm-server -n 100
- **Downloads:** /var/www/gururmm/downloads/ (served by nginx)

### Cloudflare Tunnel Config (Jupiter NAS)

- **Location:** /mnt/cache/appdata/cloudflared/config.yml
- **Hostname:**
  rmm-api.azcomputerguru.com
- **Target:** http://172.16.3.30 (nginx port 80, NOT API port 3001)
- **Container:** cloudflared (restart to apply changes)

## Common Operations

### Deploy Server Binary

```bash
# SSH to build server
SSH_USER=$(op read "op://Infrastructure/GuruRMM Server/username")
SSH_PASS=$(op read "op://Infrastructure/GuruRMM Server/password")
sshpass -p "${SSH_PASS}" ssh -o StrictHostKeyChecking=no ${SSH_USER}@172.16.3.30

# Build on Linux (native)
cd /opt/gururmm/server
cargo build --release

# Install
sudo systemctl stop gururmm-server
sudo cp target/release/gururmm-server /opt/gururmm/
sudo systemctl start gururmm-server

# Verify
systemctl status gururmm-server
curl http://localhost:3001/health  # Should return "OK"
```

### Deploy Agent Binaries

```bash
# SSH to build server
ssh ${SSH_USER}@172.16.3.30

# Build Linux agent
cd /opt/gururmm/agent
cargo build --release --target x86_64-unknown-linux-gnu

# Build Windows agent (cross-compile)
cargo build --release --target x86_64-pc-windows-gnu

# Generate checksums
cd /var/www/gururmm/downloads/
sha256sum gururmm-agent-linux-x64 > gururmm-agent-linux-x64.sha256
sha256sum gururmm-agent-windows-x64.exe > gururmm-agent-windows-x64.exe.sha256

# Agents will auto-update on next heartbeat
```

### Test Tunnel API Endpoints

```bash
# Get JWT token
ADMIN_PASS=$(op read "op://Infrastructure/GuruRMM Server/Admin Password")
TOKEN=$(curl -s http://172.16.3.30:3001/api/auth/login \
  -H "Content-Type: application/json" \
  -d "{\"email\":\"admin@azcomputerguru.com\",\"password\":\"${ADMIN_PASS}\"}" | \
  python3 -c "import sys, json; print(json.load(sys.stdin)['token'])")

# Open tunnel to AD2
curl -s http://172.16.3.30:3001/api/v1/tunnel/open \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"agent_id":"d28a1c90-47d7-448f-a287-197bc8892234"}' | jq '.'

# Get status (save session_id from above)
curl -s http://172.16.3.30:3001/api/v1/tunnel/status/SESSION_ID \
  -H "Authorization: Bearer ${TOKEN}" | jq '.'

# Close tunnel
curl -s http://172.16.3.30:3001/api/v1/tunnel/close \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"session_id":"SESSION_ID"}' | jq '.'
```

**Full examples with output:** See session-logs/2026-04-14-session.md (lines 170-230)

### Check Agent Status

```bash
# Get list of agents
curl -s http://172.16.3.30:3001/api/agents \
  -H "Authorization: Bearer ${TOKEN}" | jq '.'

# Filter online agents only
curl -s http://172.16.3.30:3001/api/agents \
  -H "Authorization: Bearer ${TOKEN}" | \
  jq '[.[] | select(.status == "online") | {hostname, agent_version, last_seen}]'
```

### Database Operations (USE DATABASE AGENT)

```bash
# DO NOT query directly - delegate to Database Agent
# Agent will handle credentials and connection automatically

# Example request to Database Agent:
# "Use Database Agent to query tech_sessions table for active tunnels"
```

### Access Database Manually (Emergency Only)

```bash
SSH_USER=$(op read "op://Infrastructure/GuruRMM Server/username")
SSH_PASS=$(op read "op://Infrastructure/GuruRMM Server/password")
PGPASS=$(op read "op://Infrastructure/GuruRMM Server/PostgreSQL Password")

sshpass -p "${SSH_PASS}" ssh -o StrictHostKeyChecking=no ${SSH_USER}@172.16.3.30 \
  "PGPASSWORD='${PGPASS}' psql -h localhost -U gururmm -d gururmm"
```

## Key Technical Decisions (ADRs)

**2026-04-14:** Use Argon2id for password hashing (not bcrypt)
- Library: argon2 crate v0.5
- Config: m=19456, t=2, p=1
- Output: 97-character hash string

**2026-04-02:** Tunnel sessions use tech_id FK to users table
- Enables session ownership validation
- Prevents cross-tech session access in multi-tenant environment
- Session status query returns 403 if not owned by requesting tech

**2026-04-01:** Downloads URL points to nginx (port 80), not API (port 3001)
- API server doesn't serve static files
- Nginx configured at /var/www/gururmm/downloads/
- Cloudflare Tunnel routes rmm-api.azcomputerguru.com to nginx

**2026-04-01:** Agent update system uses atomic rename pattern (Unix)
- Eliminates race condition between backup and install
- Copy to temp → chmod +x → rename (atomic)
- Includes rollback on restart failure (v0.6.0 fix)

## Tunnel Architecture (Phase 1 Complete)

### Session Lifecycle

1. Tech opens tunnel: POST /api/v1/tunnel/open → creates tech_session record
2. Server sends TunnelOpen via WebSocket → agent receives
3. Agent transitions Heartbeat → Tunnel mode → sends TunnelReady
4. Tech can now send channel operations (Phase 2, not implemented)
5. Tech closes tunnel: POST /api/v1/tunnel/close → updates tech_session.status='closed'
6. Server sends TunnelClose → agent transitions back to Heartbeat mode

### Database Schema

```sql
-- tech_sessions: Active tunnel sessions
CREATE TABLE tech_sessions (
    id SERIAL PRIMARY KEY,
    session_id VARCHAR(36) UNIQUE NOT NULL,
    tech_id UUID REFERENCES users(id),
    agent_id UUID REFERENCES agents(id),
    status VARCHAR(20) DEFAULT 'active',
    opened_at TIMESTAMPTZ DEFAULT NOW(),
    closed_at TIMESTAMPTZ
);

-- Unique constraint: one active session per tech+agent
CREATE UNIQUE INDEX idx_tech_sessions_active
    ON tech_sessions(tech_id, agent_id, status)
    WHERE status = 'active';

-- tunnel_audit: Audit log for tunnel operations
CREATE TABLE tunnel_audit (
    id BIGSERIAL PRIMARY KEY,
    session_id VARCHAR(36) REFERENCES tech_sessions(session_id),
    channel_id VARCHAR(36),
    operation VARCHAR(50),
    details JSONB,
    created_at TIMESTAMPTZ DEFAULT NOW()
);
```

### WebSocket Protocol

```rust
// Server → Agent
enum ServerMessage {
    TunnelOpen { session_id: String, tech_id: Uuid },
    TunnelClose { session_id: String },
    TunnelData { channel_id: String, data: TunnelDataPayload },
}

// Agent → Server
enum AgentMessage {
    TunnelReady { session_id: String },
    TunnelData { channel_id: String, data: TunnelDataPayload },
    TunnelError { channel_id: String, error: String },
}
```

## Roadmap

### Phase 2: Channel Implementation (Next)

- [ ] Terminal channel (shell command execution)
- [ ] File channel (upload/download with progress)
- [ ] Registry channel (Windows registry access)
- [ ] Service channel (Windows service management)
- [ ] WebSocket data forwarding (tech ↔ server ↔ agent)
- [ ] Dashboard UI for tunnel management

### Phase 3: Production Hardening

- [ ] Rate limiting on tunnel operations
- [ ] Session timeout enforcement (max duration)
- [ ] Concurrent session limits per tech
- [ ] Audit log cleanup/archival (retention policy)
- [ ] Metrics collection (session duration, data transferred)
- [ ] Alerting on suspicious tunnel activity

### Backlog

- [ ] Fix SL-SERVER stuck update (manual restart required)
- [ ] Investigate 4 duplicate agent records in database (2x SL-SERVER seen)
- [ ] Windows update system testing (scheduled task timing)
- [ ] Agent reconnection on network failure
- [ ] Multi-tenant access control audit
- [ ] **[2026-04-15] Status endpoint returns 403 for closed sessions** - should return `{status: closed}` with session record when caller owns it. See session log. (Tracked as Roadmap S8.)
- [ ] **[2026-04-15] Agent writes no logs** - add tracing+file appender to `agent/src/main.rs`; logs to `C:\ProgramData\GuruRMM\agent.log`. (Bridge to Roadmap L1 OS-native sinks.)
- [ ] **[2026-04-15] Logging redesign - three-tier architecture.** See ROADMAP.md "Logging, Audit & Observability" section (L1-L10).
- [ ] **[2026-04-15] Multi-tenancy schema refactor.** See ROADMAP.md "Multi-tenancy / MSP SaaS" section (M1-M7). Blocks scaling to other MSPs.
- [ ] **[2026-04-15] Tunnel Channels (Phase 2).** See ROADMAP.md "Tunnel Channels" section (T1-T8). T5 (tech-side subscriber) is the gating design decision.
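As background for the Phase 2 channel work listed above, the Phase 1 mode transitions from the Session Lifecycle section can be sketched as a small pure state machine. This is an illustration only, not the contents of `agent/src/tunnel/mod.rs`: the `step` function is hypothetical, the message enums are trimmed subsets (`tech_id` and the data variants are omitted), and the handling of unexpected messages is an assumption.

```rust
/// Agent operating mode, mirroring the Heartbeat/Tunnel modes in the lifecycle.
#[derive(Debug, Clone, PartialEq, Eq)]
enum AgentMode {
    Heartbeat,
    Tunnel { session_id: String },
}

/// Trimmed subset of the server-to-agent messages (tech_id omitted).
#[derive(Debug, Clone, PartialEq, Eq)]
enum ServerMessage {
    TunnelOpen { session_id: String },
    TunnelClose { session_id: String },
}

/// Trimmed subset of the agent-to-server replies.
#[derive(Debug, Clone, PartialEq, Eq)]
enum AgentMessage {
    TunnelReady { session_id: String },
}

/// Applies one server message; returns the next mode and an optional reply.
fn step(mode: AgentMode, msg: ServerMessage) -> (AgentMode, Option<AgentMessage>) {
    match (mode, msg) {
        // Lifecycle steps 2-3: enter Tunnel mode and acknowledge with TunnelReady.
        (AgentMode::Heartbeat, ServerMessage::TunnelOpen { session_id }) => (
            AgentMode::Tunnel { session_id: session_id.clone() },
            Some(AgentMessage::TunnelReady { session_id }),
        ),
        // Lifecycle step 6: a close for the current session returns to Heartbeat.
        (AgentMode::Tunnel { session_id: current }, ServerMessage::TunnelClose { session_id })
            if current == session_id =>
        {
            (AgentMode::Heartbeat, None)
        }
        // Assumption: duplicate opens and stale closes leave the mode unchanged.
        (mode, _) => (mode, None),
    }
}

fn main() {
    let open = ServerMessage::TunnelOpen { session_id: "s-1".to_string() };
    let (mode, reply) = step(AgentMode::Heartbeat, open);
    println!("{:?} {:?}", mode, reply);

    let close = ServerMessage::TunnelClose { session_id: "s-1".to_string() };
    let (mode, reply) = step(mode, close);
    println!("{:?} {:?}", mode, reply);
}
```

Keeping the transition function free of I/O makes it straightforward to unit test before channel handling is layered on top.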
## Useful Links

- **Roadmap:** projects/msp-tools/guru-rmm/ROADMAP.md
- **Latest Session:** session-logs/2026-04-15-session.md
- **Gitea Repo:** http://172.16.3.20:3000/azcomputerguru/gururmm
- **Credentials:** credentials.md (search for "GuruRMM Server")

## Quick Reference - API Endpoints

### Authentication

- POST /api/auth/login - Get JWT token
- POST /api/auth/register - Create first admin (disabled after first user)
- GET /api/auth/me - Get current user info

### Tunnel Management (Phase 1)

- POST /api/v1/tunnel/open - Open tunnel session
- GET /api/v1/tunnel/status/:session_id - Get session status
- POST /api/v1/tunnel/close - Close tunnel session

### Agents

- GET /api/agents - List all agents with details
- GET /api/agents/:id - Get specific agent
- POST /api/agents/:id/move - Move agent to different site
- DELETE /api/agents/:id - Delete agent

### Commands

- POST /api/agents/:id/command - Send command to agent
- GET /api/commands - List command history
- GET /api/commands/:id - Get command result

---

**Before starting work:** Read latest session log in session-logs/ directory
**For context recovery:** Use /context skill to search previous work
**For credentials:** Always use 1Password - never hardcode