GuruRMM - Project Context

Last Updated: 2026-04-14
Status: Active Development - Tunnel Phase 1 Complete

Quick Start - Infrastructure Overview

| Component | Location | Access |
|---|---|---|
| Production Server | 172.16.3.30 (gururmm) | SSH: op://Infrastructure/GuruRMM Server/username |
| Public API | https://rmm-api.azcomputerguru.com | Via Cloudflare Tunnel |
| Internal API | http://172.16.3.30:3001 | Direct access |
| Database | PostgreSQL @ 172.16.3.30:5432/gururmm | op://Infrastructure/GuruRMM Server/PostgreSQL * |
| Build Server | Same host (gururmm-build) | Linux native builds only |
| Agent Downloads | /var/www/gururmm/downloads/ | Nginx on port 80 |
| Gitea Repo | git.azcomputerguru.com/azcomputerguru/gururmm | Active (NOT guru-rmm) |

All credentials: op read "op://Infrastructure/GuruRMM Server/[field]"

Current State (READ THIS FIRST)

Version & Deployment

  • Server: v0.6.0 (commit c7c8317) - deployed 2026-04-14
  • Agent: v0.6.0 (Linux + Windows builds) - deployed 2026-04-14
  • Database: Migrations 001-010 applied
  • Service Status: gururmm-server.service running (PID 944198)

Active Work

  • Phase 1 Complete: Tunnel infrastructure (REST API, WebSocket protocol, database schema, agent state machine)
  • Phase 2 Pending: Channel implementation (Terminal, File, Registry, Service)
  • Phase 3 Not Started: Production hardening (rate limiting, timeouts, metrics)

Agent Fleet Status (as of 2026-04-15 03:20 UTC)

  • Online: 2/6 agents
    • AD2 (Windows 10, v0.6.0) - ID: d28a1c90-47d7-448f-a287-197bc8892234
    • DESKTOP-0O8A1RL (Windows 11, v0.6.0) - ID: 0b2527cc-ab3f-49d9-9a06-bfd0b4a613a7
  • Offline: 4/6 agents
    • SL-SERVER: STUCK IN PENDING UPDATE - requires manual service restart

Recent Session Logs (MUST READ BEFORE CONTINUING WORK)

  • 2026-04-14: Tunnel API testing, authentication fix - session-logs/2026-04-14-session.md
  • 2026-04-02: Tunnel implementation, update bug fixes - See git history
  • 2026-04-01: Cloudflare Tunnel configuration - See credentials.md

Anti-Patterns (DON'T DO THIS)

DO NOT build on macOS - Binaries won't run on Linux server. SSH to 172.16.3.30 and build natively.

DO NOT query database directly - Use Database Agent for ALL database operations (coordinator role).

DO NOT point downloads URL to port 3001 - API server doesn't serve /downloads. Use nginx (port 80) or public URL.

DO NOT hardcode credentials - Always fetch from 1Password: op read "op://Infrastructure/GuruRMM Server/..."

DO NOT create new password utilities - Use the existing hash_password binary (already compiled), invoked as follows:

/tmp/target/release/hash_password "password_here"
# Output: $argon2id$v=19$m=19456,t=2,p=1$...[97 chars]

DO NOT build in the ClaudeTools repo - The active repo is gururmm on Gitea, not guru-rmm.

DO NOT use emojis - ASCII markers only: [OK], [ERROR], [WARNING], [SUCCESS], [INFO]

Where to Find Things

Codebase Structure

projects/msp-tools/guru-rmm/
├── server/                      # Rust API server
│   ├── src/
│   │   ├── api/                 # REST endpoints
│   │   │   ├── tunnel.rs        # Tunnel API (Phase 1 complete)
│   │   │   ├── agents.rs        # Agent management
│   │   │   └── auth.rs          # Login/JWT
│   │   ├── db/                  # Database operations
│   │   │   ├── tunnel.rs        # Tunnel queries
│   │   │   └── agents.rs        # Agent queries
│   │   ├── ws/                  # WebSocket protocol
│   │   │   └── mod.rs           # ServerMessage/AgentMessage enums
│   │   └── auth/                # Password hashing (Argon2id)
│   └── migrations/              # Database schema (001-010)
│       └── 010_tunnel_sessions.sql  # Tunnel tables (tech_sessions, tunnel_audit)
├── agent/                       # Rust agent binary
│   ├── src/
│   │   ├── tunnel/              # Tunnel manager (Phase 1 complete)
│   │   │   └── mod.rs           # AgentMode state machine
│   │   ├── updater/             # Self-update system (v0.6.0 fixes applied)
│   │   └── transport/           # WebSocket client
│   └── Cargo.toml
├── session-logs/                # Work history (READ BEFORE STARTING)
└── ROADMAP.md                   # Feature roadmap

Production Files on Server (172.16.3.30)

  • Binary: /opt/gururmm/gururmm-server
  • Config: /opt/gururmm/.env
  • Service: systemctl status gururmm-server
  • Logs: journalctl -u gururmm-server -n 100
  • Downloads: /var/www/gururmm/downloads/ (served by nginx)

Cloudflare Tunnel Config (Jupiter NAS)

  • Location: /mnt/cache/appdata/cloudflared/config.yml
  • Hostname: rmm-api.azcomputerguru.com
  • Target: http://172.16.3.30 (nginx port 80, NOT API port 3001)
  • Container: cloudflared (restart to apply changes)

Common Operations

Deploy Server Binary

# SSH to build server
SSH_USER=$(op read "op://Infrastructure/GuruRMM Server/username")
SSH_PASS=$(op read "op://Infrastructure/GuruRMM Server/password")
sshpass -p "${SSH_PASS}" ssh -o StrictHostKeyChecking=no ${SSH_USER}@172.16.3.30

# Build on Linux (native)
cd /opt/gururmm/server
cargo build --release

# Install
sudo systemctl stop gururmm-server
sudo cp target/release/gururmm-server /opt/gururmm/
sudo systemctl start gururmm-server

# Verify
systemctl status gururmm-server
curl http://localhost:3001/health  # Should return "OK"
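
Right after a restart the API may need a moment before it accepts connections, so a single health check can fail spuriously. A minimal sketch of a retry wrapper follows; `retry` is a hypothetical helper name, not something in the repo, and the attempt count and delay are arbitrary.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND until it succeeds or ATTEMPTS is used up.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while [ "$n" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    n=$((n + 1))
    # Only pause if another attempt is coming.
    [ "$n" -le "$attempts" ] && sleep "$delay"
  done
  return 1
}

# Usage on the server after `systemctl start` (curl -f turns HTTP errors into failures):
#   retry 10 2 curl -fsS http://localhost:3001/health
```

A non-zero exit from `retry` means the service never answered; `journalctl -u gururmm-server -n 100` is the next place to look.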

Deploy Agent Binaries

# SSH to build server (SSH_USER must already be set, as in "Deploy Server Binary")
ssh "${SSH_USER}@172.16.3.30"

# Build Linux agent
cd /opt/gururmm/agent
cargo build --release --target x86_64-unknown-linux-gnu

# Build Windows agent (cross-compile)
cargo build --release --target x86_64-pc-windows-gnu

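# Generate checksums (the built binaries must already be in this directory
# under the names below; the copy step is not recorded in this document)
cd /var/www/gururmm/downloads/
sha256sum gururmm-agent-linux-x64 > gururmm-agent-linux-x64.sha256
sha256sum gururmm-agent-windows-x64.exe > gururmm-agent-windows-x64.exe.sha256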

# Agents will auto-update on next heartbeat
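
Before agents pick up a new build, it can help to confirm that each published binary still matches its .sha256 file. A sketch, assuming `sha256sum` from coreutils is available; `verify_download` is a hypothetical helper name.

```shell
# verify_download DIR FILE
# Checks FILE against FILE.sha256 inside DIR
# (the format written by `sha256sum FILE > FILE.sha256`).
verify_download() {
  dir=$1
  file=$2
  # The .sha256 file stores a relative filename, so the check must run inside DIR.
  # The subshell keeps the caller's working directory unchanged.
  ( cd "$dir" && sha256sum -c "$file.sha256" >/dev/null )
}

# Usage:
#   verify_download /var/www/gururmm/downloads gururmm-agent-linux-x64 \
#     && echo "[OK] checksum matches" || echo "[ERROR] checksum mismatch"
```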

Test Tunnel API Endpoints

# Get JWT token (body built with jq so quotes or backslashes in the password stay valid JSON)
ADMIN_PASS=$(op read "op://Infrastructure/GuruRMM Server/Admin Password")
TOKEN=$(jq -cn --arg password "${ADMIN_PASS}" \
  '{email: "admin@azcomputerguru.com", password: $password}' | \
  curl -s http://172.16.3.30:3001/api/auth/login \
  -H "Content-Type: application/json" \
  -d @- | \
  python3 -c "import sys, json; print(json.load(sys.stdin)['token'])")

# Open tunnel to AD2
curl -s http://172.16.3.30:3001/api/v1/tunnel/open \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"agent_id":"d28a1c90-47d7-448f-a287-197bc8892234"}' | jq '.'

# Get status (save session_id from above)
curl -s http://172.16.3.30:3001/api/v1/tunnel/status/SESSION_ID \
  -H "Authorization: Bearer ${TOKEN}" | jq '.'

# Close tunnel
curl -s http://172.16.3.30:3001/api/v1/tunnel/close \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"session_id":"SESSION_ID"}' | jq '.'

Full examples with output: See session-logs/2026-04-14-session.md (lines 170-230)
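
To avoid pasting SESSION_ID by hand, the open response can be parsed into a variable. This sketch assumes the response carries a top-level `session_id` field, which the document implies but does not show; `extract_session_id` is a hypothetical helper name.

```shell
# extract_session_id: reads a JSON response on stdin and prints its session_id.
# Assumption: the open-tunnel response has a top-level "session_id" field.
# Prints an empty string if the field is absent (for example, an error body).
extract_session_id() {
  python3 -c 'import sys, json; print(json.load(sys.stdin).get("session_id", ""))'
}

# Usage:
#   SESSION_ID=$(curl -s http://172.16.3.30:3001/api/v1/tunnel/open \
#     -H "Authorization: Bearer ${TOKEN}" \
#     -H "Content-Type: application/json" \
#     -d '{"agent_id":"d28a1c90-47d7-448f-a287-197bc8892234"}' | extract_session_id)
#   curl -s "http://172.16.3.30:3001/api/v1/tunnel/status/${SESSION_ID}" \
#     -H "Authorization: Bearer ${TOKEN}" | jq '.'
```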

Check Agent Status

# Get list of agents
curl -s http://172.16.3.30:3001/api/agents \
  -H "Authorization: Bearer ${TOKEN}" | jq '.'

# Filter online agents only
curl -s http://172.16.3.30:3001/api/agents \
  -H "Authorization: Bearer ${TOKEN}" | \
  jq '[.[] | select(.status == "online") | {hostname, agent_version, last_seen}]'

Database Operations (USE DATABASE AGENT)

# DO NOT query directly - delegate to Database Agent
# Agent will handle credentials and connection automatically

# Example request to Database Agent:
# "Use Database Agent to query tech_sessions table for active tunnels"

Access Database Manually (Emergency Only)

SSH_USER=$(op read "op://Infrastructure/GuruRMM Server/username")
SSH_PASS=$(op read "op://Infrastructure/GuruRMM Server/password")
PGPASS=$(op read "op://Infrastructure/GuruRMM Server/PostgreSQL Password")

sshpass -p "${SSH_PASS}" ssh -o StrictHostKeyChecking=no ${SSH_USER}@172.16.3.30 \
  "PGPASSWORD='${PGPASS}' psql -h localhost -U gururmm -d gururmm"

Key Technical Decisions (ADRs)

2026-04-14: Use Argon2id for password hashing (not bcrypt)

  • Library: argon2 crate v0.5
  • Config: m=19456, t=2, p=1
  • Output: 97-character hash string
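
The 97-character length is consistent with the PHC string layout if the hash uses a 16-byte salt and a 32-byte digest (an assumption; this document states only the parameters and the total length): the prefix `$argon2id$v=19$m=19456,t=2,p=1$` is 31 characters, the unpadded base64 salt is 22, one `$` separator, and the digest is 43, giving 31 + 22 + 1 + 43 = 97. A quick shape check along those lines, with `looks_like_argon2id` as a hypothetical helper name:

```shell
# looks_like_argon2id HASH
# Shape check only (expected prefix and total length); it does NOT verify a password.
looks_like_argon2id() {
  case $1 in
    '$argon2id$v=19$m=19456,t=2,p=1$'*) [ "${#1}" -eq 97 ] ;;
    *) return 1 ;;
  esac
}

# Usage:
#   looks_like_argon2id "$HASH" && echo "[OK] hash format" || echo "[ERROR] unexpected hash format"
```

This catches a bcrypt hash or a truncated paste before it is written to the users table; it says nothing about whether the hash matches a given password.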

2026-04-02: Tunnel sessions use tech_id FK to users table

  • Enables session ownership validation
  • Prevents cross-tech session access in multi-tenant environment
  • Session status query returns 403 if not owned by requesting tech

2026-04-01: Downloads URL points to nginx (port 80), not API (port 3001)

  • API server doesn't serve static files
  • Nginx configured at /var/www/gururmm/downloads/
  • Cloudflare Tunnel routes rmm-api.azcomputerguru.com to nginx

2026-04-01: Agent update system uses atomic rename pattern (Unix)

  • Eliminates race condition between backup and install
  • Copy to temp → chmod +x → rename (atomic)
  • Includes rollback on restart failure (v0.6.0 fix)
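
The copy, chmod, rename sequence can be illustrated in shell. This is not the agent's code (the updater is written in Rust), only a sketch of the same pattern; `install_atomically` and the `.new.$$` suffix are hypothetical, and the rollback step is not shown.

```shell
# install_atomically NEW_BINARY TARGET_PATH
# Stages the new file next to the target, then swaps it in with a single rename.
install_atomically() {
  src=$1
  dest=$2
  # Same directory as dest, so mv is a rename on one filesystem rather than a copy.
  tmp="$dest.new.$$"
  cp "$src" "$tmp" || return 1
  chmod +x "$tmp" || { rm -f "$tmp"; return 1; }
  mv -f "$tmp" "$dest" || { rm -f "$tmp"; return 1; }
}

# Usage (paths are illustrative):
#   install_atomically /tmp/agent.download /usr/local/bin/agent
```

Because the rename replaces the target in one step, a reader of the target path sees either the complete old binary or the complete new one, never a partially written file.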

Tunnel Architecture (Phase 1 Complete)

Session Lifecycle

  1. Tech opens tunnel: POST /api/v1/tunnel/open → creates a tech_sessions record
  2. Server sends TunnelOpen via WebSocket → agent receives
  3. Agent transitions Heartbeat → Tunnel mode → sends TunnelReady
  4. Tech can now send channel operations (Phase 2, not implemented)
  5. Tech closes tunnel: POST /api/v1/tunnel/close → sets tech_sessions.status = 'closed'
  6. Server sends TunnelClose → agent transitions back to Heartbeat mode
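
The agent-side transitions in steps 3 and 6 reduce to a small lookup. The real state machine is the Rust AgentMode in agent/src/tunnel/mod.rs; the shell sketch below only mirrors the two transitions listed here, and leaving the mode unchanged on any other message is an assumption. `next_mode` is a hypothetical name.

```shell
# next_mode CURRENT_MODE MESSAGE
# Prints the agent mode after receiving MESSAGE.
# Assumption: unexpected combinations leave the mode unchanged.
next_mode() {
  case "$1:$2" in
    Heartbeat:TunnelOpen) echo Tunnel ;;
    Tunnel:TunnelClose)   echo Heartbeat ;;
    *)                    echo "$1" ;;
  esac
}

# Usage:
#   mode=Heartbeat
#   mode=$(next_mode "$mode" TunnelOpen)    # Tunnel
#   mode=$(next_mode "$mode" TunnelClose)   # Heartbeat
```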

Database Schema

-- tech_sessions: Active tunnel sessions
CREATE TABLE tech_sessions (
    id SERIAL PRIMARY KEY,
    session_id VARCHAR(36) UNIQUE NOT NULL,
    tech_id UUID REFERENCES users(id),
    agent_id UUID REFERENCES agents(id),
    status VARCHAR(20) DEFAULT 'active',
    opened_at TIMESTAMPTZ DEFAULT NOW(),
    closed_at TIMESTAMPTZ
);

-- Unique constraint: one active session per tech+agent
CREATE UNIQUE INDEX idx_tech_sessions_active
ON tech_sessions(tech_id, agent_id, status) WHERE status = 'active';

-- tunnel_audit: Audit log for tunnel operations
CREATE TABLE tunnel_audit (
    id BIGSERIAL PRIMARY KEY,
    session_id VARCHAR(36) REFERENCES tech_sessions(session_id),
    channel_id VARCHAR(36),
    operation VARCHAR(50),
    details JSONB,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

WebSocket Protocol

// Server → Agent
enum ServerMessage {
    TunnelOpen { session_id: String, tech_id: Uuid },
    TunnelClose { session_id: String },
    TunnelData { channel_id: String, data: TunnelDataPayload },
}

// Agent → Server
enum AgentMessage {
    TunnelReady { session_id: String },
    TunnelData { channel_id: String, data: TunnelDataPayload },
    TunnelError { channel_id: String, error: String },
}

Roadmap

Phase 2: Channel Implementation (Next)

  • Terminal channel (shell command execution)
  • File channel (upload/download with progress)
  • Registry channel (Windows registry access)
  • Service channel (Windows service management)
  • WebSocket data forwarding (tech ↔ server ↔ agent)
  • Dashboard UI for tunnel management

Phase 3: Production Hardening

  • Rate limiting on tunnel operations
  • Session timeout enforcement (max duration)
  • Concurrent session limits per tech
  • Audit log cleanup/archival (retention policy)
  • Metrics collection (session duration, data transferred)
  • Alerting on suspicious tunnel activity

Backlog

  • Fix SL-SERVER stuck update (manual restart required)
  • Investigate 4 duplicate agent records in database
  • Windows update system testing (scheduled task timing)
  • Agent reconnection on network failure
  • Multi-tenant access control audit

Quick Reference - API Endpoints

Authentication

  • POST /api/auth/login - Get JWT token
  • POST /api/auth/register - Create first admin (disabled after first user)
  • GET /api/auth/me - Get current user info

Tunnel Management (Phase 1)

  • POST /api/v1/tunnel/open - Open tunnel session
  • GET /api/v1/tunnel/status/:session_id - Get session status
  • POST /api/v1/tunnel/close - Close tunnel session

Agents

  • GET /api/agents - List all agents with details
  • GET /api/agents/:id - Get specific agent
  • POST /api/agents/:id/move - Move agent to different site
  • DELETE /api/agents/:id - Delete agent

Commands

  • POST /api/agents/:id/command - Send command to agent
  • GET /api/commands - List command history
  • GET /api/commands/:id - Get command result

  • Before starting work: Read the latest session log in the session-logs/ directory.
  • For context recovery: Use the /context skill to search previous work.
  • For credentials: Always use 1Password - never hardcode.