azcomputerguru 9940faf34a Add GuruRMM real-time tunnel architecture and planning

Comprehensive design for transforming agents from 30s heartbeat mode to
persistent tunnel mode, enabling Claude Code to execute commands on remote
machines through secure multiplexed WebSocket channels.

Additions:
- Complete implementation plan with 5-phase roadmap (5-7 weeks to GA)
- Detailed architecture document covering protocol, security, and MCP integration
- Database migration for tech_sessions and tunnel_audit tables

Key architectural decisions:
- Hybrid lifecycle: WebSocket persistent, tunnel is operational state
- Channel multiplexing over single WebSocket (terminal, file ops, etc.)
- Three-layer security: JWT auth, session authorization, command validation
- Custom MCP server for Claude Code integration

Next: Phase 1 implementation (tunnel open/close endpoints, agent mode state machine)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

2026-04-14 06:32:16 -07:00

GuruRMM Real-Time Tunnel Architecture Plan

Date: 2026-04-13
Status: DRAFT - Pending approval
Goal: Enable Claude Code on tech workstation to execute commands on remote machines through secure tunnel

Executive Summary

This plan designs a real-time tunnel feature that switches GuruRMM agents from periodic check-in mode (30-second heartbeats) to persistent tunnel mode when a tech opens a background session. The tunnel will support multiplexed channels for terminal access, filesystem operations, registry editing, and service management, all accessible to Claude Code running on the tech's workstation.

Current Architecture (Discovered)

Server (172.16.3.30:3001)

Framework: Axum 0.7 with Tokio async runtime
WebSocket endpoint: wss://rmm-api.azcomputerguru.com/ws
Connection registry: AgentConnections HashMap tracking active WebSocket connections
Message routing: mpsc channels with dual-channel pattern (protocol messages + WebSocket Pong frames)
Protocol: Tagged JSON enums with serde (ServerMessage/AgentMessage)

Agent

Runtime: Tokio async with multiple concurrent tasks
Heartbeat interval: 30 seconds (confirmed in code)
Concurrent tasks: 3 sender tasks (metrics: 60s, network: 30s, heartbeat: 30s)
Inactivity timeout: 90 seconds
Reconnect backoff: 10 seconds

Existing Protocol

// Server → Agent
enum ServerMessage {
    AuthAck(AuthAckPayload),
    Command(CommandPayload),
    ConfigUpdate(serde_json::Value),
    Update(UpdatePayload),
    Ack { message_id: Option<String> },
    Error { code: String, message: String },
}

// Agent → Server
enum AgentMessage {
    Auth(AuthPayload),
    Heartbeat,
    CommandResult(CommandResultPayload),
    MetricsData(MetricsPayload),
    NetworkData(NetworkPayload),
}

Architectural Decisions

1. Tunnel Lifecycle: On-Demand with Persistent Connection

Decision: Hybrid approach - WebSocket stays persistent, tunnel mode is a state change

Rationale:

Existing architecture already maintains persistent WebSocket connections
Heartbeat mode and tunnel mode are operational states, not connection states
On-demand tunnel activation avoids resource waste
Persistent WebSocket enables instant mode switching

Implementation:

enum AgentMode {
    Heartbeat,  // Default: 30-second heartbeats, metrics, network monitoring
    Tunnel {    // Active session mode
        session_id: String,
        tech_id: i32,
        channels: HashMap<String, ChannelType>,
    },
}
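
The mode switch can be sketched as a small state machine. This is a minimal illustration under assumptions, not the agent's actual code: `ChannelType`, `ModeError`, `open_tunnel`, and `close_tunnel` are hypothetical names, and the rule that a second TunnelOpen is rejected follows the plan's "one tunnel per agent" limit.

```rust
use std::collections::HashMap;

/// Kinds of logical channel a tunnel can carry (hypothetical subset).
#[derive(Debug, Clone, PartialEq)]
enum ChannelType {
    Terminal,
    File,
}

#[derive(Debug, PartialEq)]
enum AgentMode {
    Heartbeat,
    Tunnel {
        session_id: String,
        tech_id: i32,
        channels: HashMap<String, ChannelType>,
    },
}

/// Hypothetical error type for rejected transitions.
#[derive(Debug, PartialEq)]
enum ModeError {
    /// A tunnel is already open on this agent.
    AlreadyInTunnel,
    /// The close request does not name the active session.
    SessionMismatch,
}

impl AgentMode {
    /// Handle TunnelOpen: Heartbeat -> Tunnel; any other state is rejected.
    fn open_tunnel(&mut self, session_id: &str, tech_id: i32) -> Result<(), ModeError> {
        if matches!(&*self, AgentMode::Tunnel { .. }) {
            return Err(ModeError::AlreadyInTunnel);
        }
        *self = AgentMode::Tunnel {
            session_id: session_id.to_string(),
            tech_id,
            channels: HashMap::new(),
        };
        Ok(())
    }

    /// Handle TunnelClose: drop all channels and return to Heartbeat.
    fn close_tunnel(&mut self, session_id: &str) -> Result<(), ModeError> {
        let is_active = matches!(
            &*self,
            AgentMode::Tunnel { session_id: active, .. } if active.as_str() == session_id
        );
        if !is_active {
            return Err(ModeError::SessionMismatch);
        }
        *self = AgentMode::Heartbeat;
        Ok(())
    }
}
```

Because the WebSocket itself is untouched by either transition, switching modes costs only this in-memory state change.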

2. Channel Multiplexing: Unified Protocol with Channel ID Routing

Decision: Single WebSocket, multiple logical channels, channel_id field for routing

Rationale:

Maintains single WebSocket connection (simpler firewall rules, NAT traversal)
Channel IDs enable concurrent operations (multiple terminals, simultaneous file transfers)
Fits naturally into existing tagged enum protocol
Allows adding new channel types without protocol changes

Protocol Extension:

// New message types
enum ServerMessage {
    // ... existing messages ...
    TunnelOpen { session_id: String, tech_id: i32 },
    TunnelClose { session_id: String },
    TunnelData { channel_id: String, data: TunnelDataPayload },
}

enum AgentMessage {
    // ... existing messages ...
    TunnelReady { session_id: String },
    TunnelData { channel_id: String, data: TunnelDataPayload },
    TunnelError { channel_id: String, error: String },
}

#[serde(tag = "type", content = "payload")]
enum TunnelDataPayload {
    Terminal { command: String },
    TerminalOutput { stdout: String, stderr: String, exit_code: Option<i32> },
    FileRead { path: String },
    FileContent { content: Vec<u8>, mime_type: String },
    FileWrite { path: String, content: Vec<u8> },
    FileList { path: String },
    FileListResult { entries: Vec<FileEntry> },
    RegistryRead { path: String, value_name: Option<String> },
    RegistryWrite { path: String, value_name: String, value: RegistryValue },
    ServiceList,
    ServiceControl { name: String, action: ServiceAction },
}
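
Routing by channel_id could look roughly like the sketch below. It is an illustration only: `ChannelRouter` and its methods are hypothetical, frames are reduced to raw bytes, and `std::sync::mpsc` stands in for the Tokio mpsc channels the existing code uses.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Routes incoming tunnel frames to per-channel handlers by `channel_id`.
struct ChannelRouter {
    handlers: HashMap<String, Sender<Vec<u8>>>,
}

impl ChannelRouter {
    fn new() -> Self {
        Self { handlers: HashMap::new() }
    }

    /// Register a logical channel and return the receiving end for its handler.
    fn open(&mut self, channel_id: &str) -> Receiver<Vec<u8>> {
        let (tx, rx) = channel();
        self.handlers.insert(channel_id.to_string(), tx);
        rx
    }

    /// Deliver a frame to its channel. The error string is the kind of text
    /// that could be sent back in a TunnelError message.
    fn route(&self, channel_id: &str, data: Vec<u8>) -> Result<(), String> {
        let tx = self
            .handlers
            .get(channel_id)
            .ok_or_else(|| format!("unknown channel: {}", channel_id))?;
        tx.send(data)
            .map_err(|_| format!("channel closed: {}", channel_id))
    }

    /// Remove a channel; later frames for it are rejected as unknown.
    fn close(&mut self, channel_id: &str) {
        self.handlers.remove(channel_id);
    }
}
```

Two terminals or a terminal plus a file transfer simply hold different channel IDs in the same map, which is what lets them run concurrently over one WebSocket.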

3. Claude Integration: Custom MCP Server

Decision: Build GuruRMM MCP server that provides remote execution tools

Rationale:

MCP is Claude's native integration protocol
Provides fine-grained tool permissions (user can approve specific operations)
Tools appear naturally in Claude's tool list
Can reuse existing API authentication (JWT tokens)
Server can enforce rate limiting and audit logging

MCP Tools:

// MCP Server tools
{
  "run_remote_command": {
    "agent_id": "string",
    "command": "string",
    "shell": "powershell|cmd|bash",
    "working_dir": "string",
    "timeout": "number"
  },
  "read_remote_file": {
    "agent_id": "string",
    "path": "string"
  },
  "write_remote_file": {
    "agent_id": "string",
    "path": "string",
    "content": "string"
  },
  "list_remote_directory": {
    "agent_id": "string",
    "path": "string"
  },
  "get_remote_services": {
    "agent_id": "string",
    "filter": "string"
  },
  "control_remote_service": {
    "agent_id": "string",
    "service_name": "string",
    "action": "start|stop|restart"
  }
}

4. File Operations: Hybrid Approach

Decision: Dedicated file endpoints for binary/large files, PowerShell for metadata

Rationale:

Binary files (executables, images) need raw byte transfer
Text files and metadata operations can use PowerShell (simpler, reuses existing command execution)
Chunked transfer for large files (avoids exceeding WebSocket message size limits)
Base64 encoding for binary data over JSON protocol

Implementation:

Files ≤ 1 MB: Direct transfer via TunnelData.FileContent
Files > 1 MB: Chunked transfer with transfer_id for reassembly
PowerShell used for: directory listings, file metadata, permissions, ACLs
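
A minimal sketch of chunking and reassembly, assuming a hypothetical `FileChunk` message with `index` and `total` fields alongside the plan's transfer_id. The real wire format is not defined in this plan.

```rust
/// One piece of a chunked file transfer (field names are assumptions).
#[derive(Debug, Clone, PartialEq)]
struct FileChunk {
    transfer_id: String,
    index: usize,
    total: usize,
    data: Vec<u8>,
}

/// 1 MiB, mirroring the plan's size threshold.
const CHUNK_SIZE: usize = 1024 * 1024;

/// Split a payload into ordered chunks. `chunk_size` must be non-zero.
fn split_into_chunks(transfer_id: &str, bytes: &[u8], chunk_size: usize) -> Vec<FileChunk> {
    let total = bytes.chunks(chunk_size).count();
    bytes
        .chunks(chunk_size)
        .enumerate()
        .map(|(index, part)| FileChunk {
            transfer_id: transfer_id.to_string(),
            index,
            total,
            data: part.to_vec(),
        })
        .collect()
}

/// Reassemble chunks that may have arrived out of order. Returns None when
/// a chunk is missing, duplicated, or belongs to another transfer.
fn reassemble(mut chunks: Vec<FileChunk>) -> Option<Vec<u8>> {
    chunks.sort_by_key(|c| c.index);
    let total = chunks.first()?.total;
    let id = chunks.first()?.transfer_id.clone();
    if chunks.len() != total {
        return None;
    }
    let mut out = Vec::new();
    for (expected, chunk) in chunks.into_iter().enumerate() {
        if chunk.index != expected || chunk.transfer_id != id {
            return None;
        }
        out.extend_from_slice(&chunk.data);
    }
    Some(out)
}
```

Carrying `total` in every chunk lets the receiver detect an incomplete transfer without a separate end-of-transfer message; an empty file produces no chunks in this sketch and would need its own handling.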

5. Security Model

Decision: Three-layer security: JWT auth, session authorization, command validation

Layer 1: JWT Authentication

Tech authenticates to server with credentials
Server issues JWT with tech_id, permissions, expiration
MCP server includes JWT in all tunnel requests

Layer 2: Session Authorization

Database tracks: tech_sessions table (tech_id, agent_id, session_id, opened_at)
Server validates: JWT valid + session exists + tech owns session
Sessions auto-expire after 4 hours of inactivity
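
The Layer 2 check can be sketched as follows, assuming the JWT has already been verified and its tech_id extracted. `TechSession`, `AuthzError`, and `authorize` are hypothetical names, and an in-memory map stands in for the tech_sessions table.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Sessions expire after 4 hours without activity.
const INACTIVITY_LIMIT: Duration = Duration::from_secs(4 * 60 * 60);

/// In-memory stand-in for a row of the tech_sessions table.
struct TechSession {
    tech_id: i32,
    last_activity: Instant,
}

#[derive(Debug, PartialEq)]
enum AuthzError {
    NotFound,
    NotOwner,
    Expired,
}

/// Session exists + tech owns session + session has not idled out.
fn authorize(
    sessions: &HashMap<String, TechSession>,
    session_id: &str,
    tech_id: i32,
    now: Instant,
) -> Result<(), AuthzError> {
    let session = sessions.get(session_id).ok_or(AuthzError::NotFound)?;
    if session.tech_id != tech_id {
        return Err(AuthzError::NotOwner);
    }
    if now.saturating_duration_since(session.last_activity) > INACTIVITY_LIMIT {
        return Err(AuthzError::Expired);
    }
    Ok(())
}
```

Passing `now` as a parameter keeps the expiry rule testable without waiting four hours.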

Layer 3: Command Validation

Agent-side working directory restrictions (configurable per agent)
Server-side command sanitization (prevent injection)
Rate limiting: 100 commands per minute per tech per agent
Audit logging: All tunnel operations logged to database

Implementation Plan

Phase 1: Core Tunnel Infrastructure (Week 1)

Goal: Establish tunnel mode switching and channel routing

Server changes:

Add TunnelOpen, TunnelClose, TunnelData to ServerMessage enum
Create tech_sessions table (id, tech_id, agent_id, session_id, opened_at, last_activity)
Implement tunnel session lifecycle endpoints:
- POST /api/v1/tunnel/open - Create session, send TunnelOpen to agent
- POST /api/v1/tunnel/close - Send TunnelClose, delete session
- GET /api/v1/tunnel/status/:session_id - Check tunnel health
Add channel routing logic in WebSocket handler (route by channel_id)
Implement session validation middleware (JWT + session ownership)

Agent changes:

Add TunnelReady, TunnelData, TunnelError to AgentMessage enum
Implement AgentMode state machine (Heartbeat ↔ Tunnel transitions)
Add channel manager (HashMap<channel_id, ChannelHandler>)
Respond to TunnelOpen with TunnelReady confirmation
Handle TunnelClose gracefully (cleanup channels, return to heartbeat mode)

Testing:

Tech can open tunnel session via API
Agent switches to tunnel mode
Agent returns to heartbeat mode when session closes
Concurrent sessions rejected (one tunnel per agent)

Phase 2: Terminal Channel (Week 2)

Goal: Execute PowerShell/cmd/bash commands through tunnel

Implementation:

Create TerminalChannel handler on agent
- Spawn child process (powershell.exe, cmd.exe, or bash)
- Capture stdout/stderr streams
- Handle exit codes and timeouts
Implement TunnelDataPayload::Terminal on server
Add working directory validation on agent
Add command result streaming (chunked output for long-running commands)

API endpoint:

POST /api/v1/tunnel/:session_id/command
Body: {
  "command": "Get-Process | Where-Object CPU -gt 10",
  "shell": "powershell",
  "working_dir": "C:\\Shares\\test",
  "timeout": 30000
}
Response: {
  "stdout": "...",
  "stderr": "...",
  "exit_code": 0,
  "duration_ms": 1234
}

Testing:

Execute simple PowerShell command (Get-Date)
Execute long-running command (Sleep 10)
Test timeout enforcement
Verify working directory restriction
Test concurrent commands (multiple channel IDs)

Phase 3: File Operations (Week 3)

Goal: Read, write, list files through tunnel

Implementation:

Create FileChannel handler on agent
- Read file: fs::read, base64 encode if binary
- Write file: base64 decode, fs::write with backup
- List directory: fs::read_dir with metadata
Implement chunked transfer for files > 1MB
Add MIME type detection (read first bytes, use magic numbers)
Implement transfer_id tracking for multi-chunk uploads/downloads
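
MIME detection from leading bytes might be sketched like this. The function name `sniff_mime` and the short signature list are assumptions for illustration; a production agent would likely need a larger table or a dedicated crate.

```rust
/// Guess a MIME type from a file's first bytes (magic numbers).
/// Falls back to text/plain for valid UTF-8 and octet-stream otherwise.
fn sniff_mime(bytes: &[u8]) -> &'static str {
    if bytes.starts_with(b"\x89PNG\r\n\x1a\n") {
        "image/png"
    } else if bytes.starts_with(b"\xFF\xD8\xFF") {
        "image/jpeg"
    } else if bytes.starts_with(b"%PDF-") {
        "application/pdf"
    } else if bytes.starts_with(b"PK\x03\x04") {
        "application/zip"
    } else if bytes.starts_with(b"MZ") {
        // DOS/PE header used by Windows executables and DLLs.
        "application/vnd.microsoft.portable-executable"
    } else if std::str::from_utf8(bytes).is_ok() {
        "text/plain"
    } else {
        "application/octet-stream"
    }
}
```

This is a heuristic: a text file that happens to begin with "MZ" would be misclassified, so the result should inform encoding choices rather than security decisions.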

API endpoints:

GET /api/v1/tunnel/:session_id/file?path=C%3A%5Clogs%5Capp.log
PUT /api/v1/tunnel/:session_id/file?path=C%3A%5Cconfig%5Capp.json
POST /api/v1/tunnel/:session_id/file/list?path=C%3A%5CShares

Testing:

Read small text file (< 1KB)
Read large binary file (> 5MB, verify chunking)
Write configuration file
List directory with 100+ files
Verify file permissions respected

Phase 4: MCP Server Integration (Week 4)

Goal: Expose tunnel operations as MCP tools for Claude Code

Implementation:

Create new Rust project: gururmm-mcp-server
Use mcp-server-rs crate for MCP protocol
Implement 6 core tools (run_command, read_file, write_file, list_dir, get_services, control_service)
Add JWT token configuration (user provides token from GuruRMM web UI)
Build tunnel session manager (open session on first tool use, keep alive, close on idle)
Add tool result formatting (pretty-print PowerShell objects, syntax highlight code)

MCP server config:

{
  "mcpServers": {
    "gururmm": {
      "command": "gururmm-mcp-server",
      "args": [],
      "env": {
        "GURURMM_API_URL": "http://172.16.3.30:3001",
        "GURURMM_AUTH_TOKEN": "jwt-token-here"
      }
    }
  }
}

Testing:

Claude Code can list available agents
Claude Code can execute command on remote agent
Claude Code can read/write files on remote agent
Session auto-closes after 5 minutes idle
Rate limiting enforced (100 commands/min)

Phase 5: Advanced Features (Week 5+)

Registry Operations:

Add RegistryChannel handler (Windows-only)
Use winreg crate for safe registry access
Support HKLM, HKCU, read/write/delete operations

Service Management:

Add ServiceChannel handler (cross-platform)
Windows: use sc.exe or WMI
Linux: use systemctl
List services, start/stop/restart, get status

Interactive Terminal (Stretch Goal):

WebSocket-based PTY (pseudo-terminal)
Bidirectional streaming (stdin: server → agent → process; stdout/stderr: process → agent → server)
Support for interactive programs (vim, top, htop)
Terminal emulation (xterm compatibility)

Database Schema Changes

New Tables

-- Tunnel sessions
CREATE TABLE tech_sessions (
    id SERIAL PRIMARY KEY,
    session_id VARCHAR(36) UNIQUE NOT NULL,
    tech_id INTEGER NOT NULL REFERENCES techs(id),
    agent_id INTEGER NOT NULL REFERENCES agents(id),
    opened_at TIMESTAMP NOT NULL DEFAULT NOW(),
    last_activity TIMESTAMP NOT NULL DEFAULT NOW(),
    closed_at TIMESTAMP,
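    status VARCHAR(20) NOT NULL DEFAULT 'active'
);

-- At most one active tunnel per agent. PostgreSQL does not accept a WHERE
-- clause on a table-level UNIQUE constraint, so use a partial unique index.
CREATE UNIQUE INDEX idx_tech_sessions_one_active
    ON tech_sessions(agent_id) WHERE status = 'active';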

-- Tunnel audit log
CREATE TABLE tunnel_audit (
    id SERIAL PRIMARY KEY,
    session_id VARCHAR(36) NOT NULL REFERENCES tech_sessions(session_id),
    channel_id VARCHAR(36) NOT NULL,
    operation VARCHAR(50) NOT NULL,
    details JSONB,
    created_at TIMESTAMP NOT NULL DEFAULT NOW()
);

-- Indexes
CREATE INDEX idx_tech_sessions_tech ON tech_sessions(tech_id);
CREATE INDEX idx_tech_sessions_agent ON tech_sessions(agent_id);
CREATE INDEX idx_tech_sessions_status ON tech_sessions(status);
CREATE INDEX idx_tunnel_audit_session ON tunnel_audit(session_id);
CREATE INDEX idx_tunnel_audit_created ON tunnel_audit(created_at);

Security Considerations

Working Directory Restrictions

Agent config file specifies allowed paths: allowed_paths: ["C:\\Shares", "C:\\Temp"]
All file operations validated against allowlist
Path traversal attacks prevented (reject paths containing .. components, require absolute paths)
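
The allowlist check could be sketched as below. `is_path_allowed` is a hypothetical helper, and the usage example uses Unix-style paths so it behaves the same on any Unix host; on a Windows agent the roots would be entries such as C:\Shares.

```rust
use std::path::{Component, Path};

/// Accept a path only if it is absolute, contains no `..` component,
/// and sits under one of the configured roots.
fn is_path_allowed(candidate: &str, allowed_roots: &[&str]) -> bool {
    let path = Path::new(candidate);
    if !path.is_absolute() {
        return false;
    }
    if path.components().any(|c| matches!(c, Component::ParentDir)) {
        return false;
    }
    // Path::starts_with compares whole components, so "/srv/shares2"
    // does not match the root "/srv/shares".
    allowed_roots.iter().any(|root| path.starts_with(root))
}
```

For example, with roots `["/srv/shares", "/tmp"]`, `/srv/shares/docs/a.txt` is accepted while `/srv/shares/../etc/passwd` and `/srv/shares2/a.txt` are rejected. The sketch does not resolve symlinks or handle case-insensitive filesystems; both would need extra handling before this is relied on.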

Rate Limiting

Server enforces: 100 commands per minute per tech per agent
Sliding window implementation (Redis or in-memory)
429 Too Many Requests response on limit exceeded
Audit log tracks rate limit violations
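
An in-memory sliding window keyed by (tech_id, agent_id) might look like this sketch. `RateLimiter` and `check` are hypothetical names, and a Redis-backed version would differ in storage but not in the rule applied.

```rust
use std::collections::{HashMap, VecDeque};
use std::time::{Duration, Instant};

/// Sliding-window limiter: at most `limit` accepted requests per `window`
/// for each (tech_id, agent_id) pair.
struct RateLimiter {
    limit: usize,
    window: Duration,
    hits: HashMap<(i32, i32), VecDeque<Instant>>,
}

impl RateLimiter {
    fn new(limit: usize, window: Duration) -> Self {
        Self { limit, window, hits: HashMap::new() }
    }

    /// Returns true if the request is allowed. A false result is what the
    /// server would translate into 429 Too Many Requests.
    fn check(&mut self, tech_id: i32, agent_id: i32, now: Instant) -> bool {
        let queue = self.hits.entry((tech_id, agent_id)).or_default();
        // Drop timestamps that have aged out of the window.
        while let Some(&oldest) = queue.front() {
            if now.saturating_duration_since(oldest) >= self.window {
                queue.pop_front();
            } else {
                break;
            }
        }
        if queue.len() >= self.limit {
            return false;
        }
        queue.push_back(now);
        true
    }
}
```

With the plan's numbers this would be constructed as `RateLimiter::new(100, Duration::from_secs(60))`. Rejected requests are not recorded, so a client that keeps retrying does not extend its own lockout.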

Command Injection Prevention

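Agent spawns processes with tokio::process::Command, which passes arguments directly and adds no intermediate shell
PowerShell invoked with -NoProfile -NonInteractive -Command; the command string itself is still interpreted by PowerShell, so the input checks below remain necessary
Input sanitization: reject backticks, escape quotes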
Timeout enforcement: kill process after timeout
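
How the command line is assembled can be sketched as follows. `Shell` and `build_command` are hypothetical, only two of the three shells are shown, and `std::process::Command` stands in for `tokio::process::Command`, which offers the same builder methods.

```rust
use std::process::Command;

#[derive(Debug, Clone, Copy)]
enum Shell {
    PowerShell,
    Bash,
}

/// Build (but do not run) the process invocation for a tunnel command.
/// The user-supplied string is always passed as one argument.
fn build_command(shell: Shell, command: &str) -> Command {
    let mut cmd = match shell {
        Shell::PowerShell => {
            let mut c = Command::new("powershell.exe");
            c.args(["-NoProfile", "-NonInteractive", "-Command"]);
            c
        }
        Shell::Bash => {
            let mut c = Command::new("bash");
            c.arg("-c");
            c
        }
    };
    cmd.arg(command);
    cmd
}
```

Passing the string as a single argument means the agent never splits or re-quotes it, but the target shell still parses it. A command such as `echo $HOME; ls` reaches bash intact and runs as two statements, which is why the working directory restrictions, rate limits, and audit logging still matter.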

Session Management

JWT tokens expire after 24 hours
Sessions auto-expire after 4 hours inactivity
Force-close endpoint for admins: DELETE /api/v1/tunnel/:session_id/force-close
Concurrent session limit: 1 tunnel per agent (prevents session hijacking)

Audit Logging

All tunnel operations logged to tunnel_audit table
Logged fields: session_id, channel_id, operation, details (command/path/etc), timestamp
Retention: 90 days (configurable)
Suspicious activity alerts: >50 failed commands in 5 minutes

API Endpoints (New)

POST   /api/v1/tunnel/open
       Body: { "agent_id": 123 }
       Response: { "session_id": "uuid", "status": "active" }

POST   /api/v1/tunnel/close
       Body: { "session_id": "uuid" }
       Response: { "status": "closed" }

GET    /api/v1/tunnel/status/:session_id
       Response: { "session_id": "uuid", "agent_id": 123, "opened_at": "...", "last_activity": "..." }

POST   /api/v1/tunnel/:session_id/command
       Body: { "command": "...", "shell": "powershell", "working_dir": "...", "timeout": 30000 }
       Response: { "stdout": "...", "stderr": "...", "exit_code": 0, "duration_ms": 1234 }

GET    /api/v1/tunnel/:session_id/file?path=...
       Response: { "content": "base64...", "mime_type": "text/plain", "size": 1234 }

PUT    /api/v1/tunnel/:session_id/file?path=...
       Body: { "content": "base64..." }
       Response: { "success": true, "path": "...", "size": 1234 }

POST   /api/v1/tunnel/:session_id/file/list?path=...
       Response: { "entries": [{ "name": "...", "type": "file|dir", "size": 1234, "modified": "..." }] }

MCP Server Implementation

Tool Definitions

{
  "tools": [
    {
      "name": "gururmm_run_command",
      "description": "Execute a command on a remote agent through GuruRMM tunnel",
      "inputSchema": {
        "type": "object",
        "properties": {
          "agent_id": { "type": "number", "description": "Agent ID to execute on" },
          "command": { "type": "string", "description": "Command to execute" },
          "shell": { "type": "string", "enum": ["powershell", "cmd", "bash"], "default": "powershell" },
          "working_dir": { "type": "string", "description": "Working directory (optional)" },
          "timeout": { "type": "number", "description": "Timeout in milliseconds", "default": 30000 }
        },
        "required": ["agent_id", "command"]
      }
    },
    {
      "name": "gururmm_read_file",
      "description": "Read a file from a remote agent",
      "inputSchema": {
        "type": "object",
        "properties": {
          "agent_id": { "type": "number" },
          "path": { "type": "string", "description": "Full path to file" }
        },
        "required": ["agent_id", "path"]
      }
    },
    {
      "name": "gururmm_write_file",
      "description": "Write a file to a remote agent",
      "inputSchema": {
        "type": "object",
        "properties": {
          "agent_id": { "type": "number" },
          "path": { "type": "string", "description": "Full path to file" },
          "content": { "type": "string", "description": "File content" }
        },
        "required": ["agent_id", "path", "content"]
      }
    },
    {
      "name": "gururmm_list_directory",
      "description": "List files in a directory on a remote agent",
      "inputSchema": {
        "type": "object",
        "properties": {
          "agent_id": { "type": "number" },
          "path": { "type": "string", "description": "Directory path" }
        },
        "required": ["agent_id", "path"]
      }
    },
    {
      "name": "gururmm_list_agents",
      "description": "List all available agents",
      "inputSchema": {
        "type": "object",
        "properties": {},
        "required": []
      }
    }
  ]
}

Session Management

Lifecycle:

First tool call triggers tunnel open (POST /api/v1/tunnel/open)
MCP server caches session_id in memory
Subsequent tool calls reuse session
Idle timeout (5 minutes) triggers tunnel close
MCP server can handle concurrent sessions to different agents
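
The lifecycle above can be sketched as a per-agent session cache. `SessionCache`, `get_or_open`, and `evict_idle` are hypothetical names, and the `open` closure stands in for the HTTP call to POST /api/v1/tunnel/open.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

struct CachedSession {
    session_id: String,
    last_used: Instant,
}

/// Caches one tunnel session per agent and tracks idle time.
struct SessionCache {
    idle_timeout: Duration,
    by_agent: HashMap<i32, CachedSession>,
}

impl SessionCache {
    fn new(idle_timeout: Duration) -> Self {
        Self { idle_timeout, by_agent: HashMap::new() }
    }

    /// Reuse the cached session for this agent, or call `open` to create one.
    fn get_or_open<F>(&mut self, agent_id: i32, now: Instant, open: F) -> String
    where
        F: FnOnce(i32) -> String,
    {
        let entry = self.by_agent.entry(agent_id).or_insert_with(|| CachedSession {
            session_id: open(agent_id),
            last_used: now,
        });
        entry.last_used = now;
        entry.session_id.clone()
    }

    /// Remove sessions idle for at least the timeout and return their IDs
    /// so the caller can send POST /api/v1/tunnel/close for each.
    fn evict_idle(&mut self, now: Instant) -> Vec<String> {
        let timeout = self.idle_timeout;
        let mut closed = Vec::new();
        self.by_agent.retain(|_, s| {
            let idle = now.saturating_duration_since(s.last_used) >= timeout;
            if idle {
                closed.push(s.session_id.clone());
            }
            !idle
        });
        closed
    }
}
```

Keying the cache by agent is what allows concurrent sessions to different agents while still reusing a single session per agent.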

Configuration:

MCP server reads JWT token from environment variable
API URL configurable (default: http://172.16.3.30:3001)
Session timeout configurable (default: 5 minutes)

Testing Strategy

Unit Tests

Channel routing logic (correct channel receives message)
Session validation (JWT + ownership)
Command sanitization (injection prevention)
Path validation (traversal prevention)

Integration Tests

Full tunnel lifecycle (open → command → close)
Concurrent sessions to different agents
Session timeout enforcement
Rate limiting triggers correctly

End-to-End Tests

Claude Code MCP integration
Tech opens session via web UI, Claude executes command
File upload via MCP, verify on agent
Service restart via MCP, verify status change

Rollout Plan

Phase 1: Internal Testing (Week 5)

Deploy to test environment (172.16.3.30:3001)
Test with 2 agents (AD2, DESKTOP-0O8A1RL)
Tech team validates MCP integration
Load testing: 10 concurrent sessions, 100 commands/min

Phase 2: Beta Release (Week 6)

Deploy to production (rmm-api.azcomputerguru.com)
Invite 3 beta techs (power users)
Monitor audit logs for issues
Gather feedback on MCP tool UX

Phase 3: General Availability (Week 7)

Release to all techs
Documentation: MCP server setup guide
Training video: Claude Code + GuruRMM workflow
Monitor error rates, tunnel session count

Risks and Mitigations

Risk	Impact	Mitigation
Command injection allows arbitrary code execution	Critical	Input sanitization, no shell expansion, allowlist-based path validation
Session hijacking via stolen JWT	High	Short-lived tokens (24h), session ownership validation, audit logging
WebSocket connection instability	Medium	Auto-reconnect logic, session recovery on reconnect
Rate limiting too strict (blocks legitimate use)	Medium	Configurable limits per tech, burst allowance, user feedback
File transfer timeouts on large files	Medium	Chunked transfer, resumable uploads
MCP server crashes (techs lose access)	Medium	Supervisor/systemd auto-restart, health check endpoint

Open Questions

Registry operations scope: Full registry access or restrict to specific hives (HKLM\Software, HKCU)?
Interactive terminal priority: High demand or defer to Phase 6?
Multi-tech sessions: Should multiple techs be able to share a session (pair programming)?
Credential storage: Should MCP server support credential manager integration (1Password, Windows Credential Manager)?
Agent-side logging: Should agent log tunnel operations locally (compliance requirement)?

Success Metrics

Phase 1-2 (Infrastructure):

95% tunnel open success rate
<500ms average response time for short, non-blocking commands
Zero session conflicts (never more than one concurrent tunnel per agent)

Phase 3-4 (MCP Integration):

80% of techs using MCP tools within 2 weeks
50 tunnel sessions per day
<5% command error rate (excluding user errors)

Phase 5+ (Adoption):

20% reduction in remote desktop sessions (techs use tunnel instead)
90% tech satisfaction rating (survey)
<1% security incidents related to tunnel misuse

Dependencies

Server:

Axum 0.7 (existing)
PostgreSQL (existing)
JWT library (existing)
tokio-tungstenite for WebSocket (existing)

Agent:

tokio 1.x (existing)
serde/serde_json (existing)
base64 crate (for file encoding)
winreg crate (Windows registry, Phase 5)

MCP Server:

mcp-server-rs crate (new dependency)
reqwest for HTTP client (new)
tokio runtime (new)

Infrastructure:

No new servers required (runs on existing 172.16.3.30)
Cloudflare tunnel already configured
Database migrations automated (existing CI/CD)

Next Steps After Approval

Create feature branch: feature/real-time-tunnel
Implement Phase 1 database migrations
Update protocol definitions (ServerMessage/AgentMessage enums)
Create tech_sessions table
Implement tunnel open/close endpoints
Update agent to handle TunnelOpen message
Write unit tests for session validation
Deploy to test environment for validation

Estimated timeline: 5 weeks to MCP integration, 6-7 weeks to GA

Status: READY FOR REVIEW
Reviewer: User approval required
Questions: See "Open Questions" section above
