claudetools/projects/msp-tools/guru-rmm/docs/REMEDIATION_PLAN.md

# GuruRMM Remediation Plan

**Generated:** 2026-01-20
**Review Source:** Full-scale code review of all components
**Priority:** Critical issues first, then major, then warnings

---

## Executive Summary

This document provides specific code fixes for all critical and major issues identified during the comprehensive code review. Issues are organized by component and priority.

**Issue Counts:**
- CRITICAL: 16 (must fix before production)
- MAJOR: 14 (should fix before production)
- WARNING: 28 (recommended improvements)

---

## Phase 1: Critical Security Fixes (MUST DO)

### 1.1 Server: Restrict CORS Configuration

**File:** `server/src/main.rs`
**Lines:** 133-136
**Severity:** CRITICAL

**Current Code:**
```rust
let cors = CorsLayer::new()
    .allow_origin(Any)
    .allow_methods(Any)
    .allow_headers(Any);
```

**Fixed Code:**
```rust
use tower_http::cors::{CorsLayer, AllowOrigin};
use http::HeaderValue;

// Get allowed origins from environment
let dashboard_origin = std::env::var("DASHBOARD_URL")
    .unwrap_or_else(|_| "https://rmm.azcomputerguru.com".to_string());

let cors = CorsLayer::new()
    .allow_origin(AllowOrigin::exact(
        HeaderValue::from_str(&dashboard_origin).expect("Invalid DASHBOARD_URL")
    ))
    .allow_methods([
        http::Method::GET,
        http::Method::POST,
        http::Method::PUT,
        http::Method::DELETE,
        http::Method::OPTIONS,
    ])
    .allow_headers([
        http::header::AUTHORIZATION,
        http::header::CONTENT_TYPE,
        http::header::ACCEPT,
    ])
    .allow_credentials(true);
```

**Environment Variable:**
```bash
DASHBOARD_URL=https://rmm.azcomputerguru.com
```

---

### 1.2 Server: Add Authentication to All Endpoints

**File:** `server/src/api/agents.rs`
**Severity:** CRITICAL

**Problem:** Most endpoints lack the `AuthUser` extractor.

**Fix Pattern - Apply to ALL endpoint functions:**

```rust
// BEFORE (insecure)
pub async fn list_agents(
    State(state): State<AppState>,
) -> Result<Json<Vec<AgentResponse>>, (StatusCode, String)> {

// AFTER (secure)
pub async fn list_agents(
    State(state): State<AppState>,
    _user: AuthUser,  // Add this to require authentication
) -> Result<Json<Vec<AgentResponse>>, (StatusCode, String)> {
```

**Endpoints requiring AuthUser addition:**

**agents.rs:**
- `list_agents` (line 62)
- `get_agent` (line 74)
- `delete_agent` (line 87)
- `get_stats` (line 109)
- `move_agent` (line 126)
- `list_agents_with_details` (line 152)
- `list_unassigned_agents` (line 163)
- `get_agent_state` (line 175)
- `register_agent` (line 32) - Add auth OR implement rate limiting

**commands.rs:**
- `send_command` (line 46)
- `list_commands` (line 103)
- `get_command` (line 117)

**metrics.rs:**
- `get_agent_metrics` (line 29)
- `get_summary` (line 57)

**Complete Fix for commands.rs:**
```rust
// server/src/api/commands.rs

use crate::auth::AuthUser;

pub async fn send_command(
    State(state): State<AppState>,
    user: AuthUser,  // ADD THIS
    Path(agent_id): Path<Uuid>,
    Json(req): Json<SendCommandRequest>,
) -> Result<Json<CommandResponse>, (StatusCode, String)> {
    // Log who sent the command
    tracing::info!(
        user_id = %user.user_id,
        agent_id = %agent_id,
        command_type = ?req.command_type,
        "Command sent by user"
    );

    // Create command with user attribution
    let command = db::create_command(
        &state.db,
        agent_id,
        &req.command_type,
        &req.command,
        req.timeout_seconds,
        req.elevated.unwrap_or(false),
        Some(user.user_id),  // Track who created it
    )
    .await
    .map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e.to_string()))?;

    // ... rest of function
}

pub async fn list_commands(
    State(state): State<AppState>,
    _user: AuthUser,  // ADD THIS
) -> Result<Json<Vec<CommandResponse>>, (StatusCode, String)> {
    // ... existing code
}

pub async fn get_command(
    State(state): State<AppState>,
    _user: AuthUser,  // ADD THIS
    Path(command_id): Path<Uuid>,
) -> Result<Json<CommandResponse>, (StatusCode, String)> {
    // ... existing code
}
```

---

### 1.3 Server: Add Rate Limiting to Registration

**File:** `server/src/main.rs`
**Severity:** CRITICAL

**Add dependency to Cargo.toml:**
```toml
tower-governor = "0.3"
```

**Implementation:**
```rust
// server/src/main.rs

use tower_governor::{GovernorLayer, GovernorConfigBuilder, KeyExtractor};
use std::net::IpAddr;

// Custom key extractor for rate limiting by IP
#[derive(Clone)]
struct RealIpKeyExtractor;

impl KeyExtractor for RealIpKeyExtractor {
    type Key = IpAddr;

    fn extract<T>(&self, req: &http::Request<T>) -> Result<Self::Key, GovernorError> {
        // Try X-Forwarded-For first (for reverse proxy)
        if let Some(forwarded) = req.headers().get("x-forwarded-for") {
            if let Ok(s) = forwarded.to_str() {
                if let Some(ip) = s.split(',').next() {
                    if let Ok(addr) = ip.trim().parse() {
                        return Ok(addr);
                    }
                }
            }
        }

        // Fall back to connection info
        req.extensions()
            .get::<axum::extract::ConnectInfo<std::net::SocketAddr>>()
            .map(|ci| ci.0.ip())
            .ok_or(GovernorError::UnableToExtractKey)
    }
}

// In main() or router setup:
let registration_limiter = GovernorConfigBuilder::default()
    .per_second(1)  // 1 request per second
    .burst_size(5)  // Allow burst of 5
    .key_extractor(RealIpKeyExtractor)
    .finish()
    .unwrap();

let app = Router::new()
    // Rate-limited registration endpoints
    .route("/api/agents/register",
        post(register_agent).layer(GovernorLayer::new(&registration_limiter)))
    .route("/api/agents/register-legacy",
        post(register_legacy).layer(GovernorLayer::new(&registration_limiter)))
    // ... other routes without rate limiting
```

---

### 1.4 Agent: Remove Unicode Characters

**File:** `agent/src/main.rs`
**Lines:** 460, 467, 478, 480
**Severity:** CRITICAL

**Find and replace all Unicode characters:**

```rust
// BEFORE
println!("\n✓ GuruRMM Agent installed successfully!");
println!("\n⚠️  IMPORTANT: Edit {} with your server URL and API key!", config_dest);

// AFTER
println!("\n[OK] GuruRMM Agent installed successfully!");
println!("\n[WARNING] IMPORTANT: Edit {} with your server URL and API key!", config_dest);
```

**Complete list of replacements:**
| Before | After |
|--------|-------|
| `✓` | `[OK]` |
| `✗` | `[ERROR]` |
| `⚠️` | `[WARNING]` |
| `ℹ️` | `[INFO]` |
| `→` | `->` |
| `•` | `-` |

---

### 1.5 Agent: Secure Update Download with Certificate Pinning

**File:** `agent/src/updater/mod.rs`
**Severity:** CRITICAL

**Add to Cargo.toml:**
```toml
[dependencies]
sha2 = "0.10"
```

**Implementation:**
```rust
// agent/src/updater/mod.rs

use sha2::{Sha256, Digest};

impl Updater {
    /// Download binary with certificate validation
    async fn download_binary(&self, url: &str, expected_checksum: &str) -> Result<PathBuf> {
        // Validate URL is from trusted domain
        let allowed_domains = [
            "rmm-api.azcomputerguru.com",
            "downloads.azcomputerguru.com",
        ];

        let parsed_url = url::Url::parse(url)
            .context("Invalid download URL")?;

        let host = parsed_url.host_str()
            .ok_or_else(|| anyhow::anyhow!("No host in download URL"))?;

        if !allowed_domains.iter().any(|d| host == *d || host.ends_with(&format!(".{}", d))) {
            return Err(anyhow::anyhow!(
                "Download URL host '{}' not in allowed domains: {:?}",
                host, allowed_domains
            ));
        }

        // Require HTTPS
        if parsed_url.scheme() != "https" {
            return Err(anyhow::anyhow!("Download URL must use HTTPS"));
        }

        info!("Downloading update from: {}", url);

        let response = self.http_client
            .get(url)
            .send()
            .await
            .context("HTTP request failed")?;

        if !response.status().is_success() {
            return Err(anyhow::anyhow!(
                "Download failed with status: {}",
                response.status()
            ));
        }

        let bytes = response.bytes().await
            .context("Failed to read response body")?;

        // Verify checksum BEFORE writing to disk
        let mut hasher = Sha256::new();
        hasher.update(&bytes);
        let actual_checksum = format!("{:x}", hasher.finalize());

        if actual_checksum != expected_checksum {
            return Err(anyhow::anyhow!(
                "Checksum mismatch! Expected: {}, Got: {}",
                expected_checksum, actual_checksum
            ));
        }

        info!("Checksum verified: {}", actual_checksum);

        // Write to secure temp location
        let temp_dir = std::env::temp_dir();
        let temp_path = temp_dir.join(format!("gururmm-update-{}.tmp", uuid::Uuid::new_v4()));

        tokio::fs::write(&temp_path, &bytes).await
            .context("Failed to write update binary")?;

        // Set restrictive permissions on Unix
        #[cfg(unix)]
        {
            use std::os::unix::fs::PermissionsExt;
            let perms = std::fs::Permissions::from_mode(0o700);
            std::fs::set_permissions(&temp_path, perms)?;
        }

        Ok(temp_path)
    }
}
```

---

### 1.6 Agent: Fix Insecure Temp File Creation

**File:** `agent/src/updater/mod.rs`
**Lines:** 275-331
**Severity:** CRITICAL

**Problem:** Using world-writable `/tmp/` for rollback script.

**Fixed Code:**
```rust
// agent/src/updater/mod.rs

#[cfg(unix)]
async fn create_rollback_script(&self, backup_path: &Path, target_path: &Path) -> Result<PathBuf> {
    use std::os::unix::fs::PermissionsExt;

    // Use /var/run or agent's own directory instead of /tmp
    let script_dir = if Path::new("/var/run/gururmm").exists() {
        PathBuf::from("/var/run/gururmm")
    } else {
        // Fall back to agent's config directory
        PathBuf::from("/etc/gururmm")
    };

    // Ensure directory exists with proper permissions
    tokio::fs::create_dir_all(&script_dir).await?;

    let script_path = script_dir.join(format!("rollback-{}.sh", uuid::Uuid::new_v4()));

    let script = format!(r#"#!/bin/bash
set -e

# Rollback script for GuruRMM agent update
BACKUP="{}"
TARGET="{}"
SCRIPT_PATH="$0"

# Function to cleanup
cleanup() {{
    rm -f "$SCRIPT_PATH"
}}
trap cleanup EXIT

# Wait for parent process to exit
sleep 2

# Restore backup
if [ -f "$BACKUP" ]; then
    cp "$BACKUP" "$TARGET"
    chmod +x "$TARGET"

    # Restart service
    if command -v systemctl &> /dev/null; then
        systemctl restart gururmm-agent
    fi
fi
"#,
        backup_path.display(),
        target_path.display()
    );

    // Write script with restrictive permissions
    tokio::fs::write(&script_path, &script).await?;

    // Set permissions to 700 (owner only)
    let perms = std::fs::Permissions::from_mode(0o700);
    std::fs::set_permissions(&script_path, perms)?;

    // Verify ownership (should be root)
    let metadata = std::fs::metadata(&script_path)?;
    if metadata.uid() != 0 {
        warn!("Rollback script not owned by root - this may be a security issue");
    }

    Ok(script_path)
}
```

---

### 1.7 Dashboard: Improve Token Storage Security

**File:** `dashboard/src/api/client.ts`
**Severity:** CRITICAL

**Option A: Use httpOnly Cookies (Recommended)**

This requires server-side changes. The server should set the token as an httpOnly cookie.

**Server-side change (server/src/api/auth.rs):**
```rust
use axum::response::{IntoResponse, Response};
use http::header::{SET_COOKIE, HeaderValue};

pub async fn login(
    State(state): State<AppState>,
    Json(req): Json<LoginRequest>,
) -> Result<Response, (StatusCode, String)> {
    // ... existing validation ...

    let token = create_jwt(&user)?;

    // Set httpOnly cookie instead of returning token in body
    let cookie = format!(
        "auth_token={}; HttpOnly; Secure; SameSite=Strict; Path=/; Max-Age=86400",
        token
    );

    let mut response = Json(LoginResponse {
        user_id: user.id,
        email: user.email,
        name: user.name,
        role: user.role,
    }).into_response();

    response.headers_mut().insert(
        SET_COOKIE,
        HeaderValue::from_str(&cookie).unwrap()
    );

    Ok(response)
}
```

**Dashboard change (client.ts):**
```typescript
// dashboard/src/api/client.ts

const api = axios.create({
  baseURL: import.meta.env.VITE_API_URL || 'https://rmm-api.azcomputerguru.com',
  withCredentials: true,  // Send cookies with requests
});

// Remove localStorage token handling - cookies are automatic

export const authApi = {
  login: async (email: string, password: string) => {
    const response = await api.post<LoginResponse>('/api/auth/login', { email, password });
    // No need to store token - it's in httpOnly cookie
    return response.data;
  },

  logout: async () => {
    await api.post('/api/auth/logout');  // Server clears cookie
  },

  // ...
};
```

**Option B: If httpOnly cookies aren't feasible (short-term)**

Add XSS protection and token encryption:

```typescript
// dashboard/src/api/client.ts

import CryptoJS from 'crypto-js';

// Use session-specific key (generated on page load)
const SESSION_KEY = CryptoJS.lib.WordArray.random(32).toString();

const encryptToken = (token: string): string => {
  return CryptoJS.AES.encrypt(token, SESSION_KEY).toString();
};

const decryptToken = (encrypted: string): string | null => {
  try {
    const bytes = CryptoJS.AES.decrypt(encrypted, SESSION_KEY);
    return bytes.toString(CryptoJS.enc.Utf8);
  } catch {
    return null;
  }
};

export const authApi = {
  login: async (email: string, password: string) => {
    const response = await api.post<LoginResponse>('/api/auth/login', { email, password });
    if (response.data.token) {
      // Store encrypted token in sessionStorage (cleared on tab close)
      sessionStorage.setItem('auth_token', encryptToken(response.data.token));
    }
    return response.data;
  },

  getToken: (): string | null => {
    const encrypted = sessionStorage.getItem('auth_token');
    return encrypted ? decryptToken(encrypted) : null;
  },

  // ...
};

// Update interceptor
api.interceptors.request.use((config) => {
  const token = authApi.getToken();
  if (token) {
    config.headers.Authorization = `Bearer ${token}`;
  }
  return config;
});
```

---

### 1.8 Dashboard: Add Command Input Validation

**File:** `dashboard/src/pages/AgentDetail.tsx`
**Lines:** 236-240
**Severity:** CRITICAL

**Add validation and confirmation:**
```typescript
// dashboard/src/pages/AgentDetail.tsx

import { useState, useCallback } from 'react';

// Add dangerous command patterns
const DANGEROUS_PATTERNS = [
  /rm\s+-rf\s+\//i,
  /format\s+[a-z]:/i,
  /del\s+\/[fqs]/i,
  /shutdown/i,
  /reboot/i,
  /init\s+0/i,
  /mkfs/i,
  /dd\s+if=/i,
  />\s*\/dev\//i,
  /chmod\s+-R\s+777/i,
];

const WARN_PATTERNS = [
  /password/i,
  /credential/i,
  /secret/i,
  /api.?key/i,
  /token/i,
];

interface CommandValidation {
  isValid: boolean;
  isDangerous: boolean;
  hasWarnings: boolean;
  messages: string[];
}

const validateCommand = (command: string): CommandValidation => {
  const messages: string[] = [];
  let isDangerous = false;
  let hasWarnings = false;

  // Check for empty command
  if (!command.trim()) {
    return { isValid: false, isDangerous: false, hasWarnings: false, messages: ['Command cannot be empty'] };
  }

  // Check for dangerous patterns
  for (const pattern of DANGEROUS_PATTERNS) {
    if (pattern.test(command)) {
      isDangerous = true;
      messages.push(`[DANGER] Potentially destructive command detected: ${pattern.source}`);
    }
  }

  // Check for warning patterns
  for (const pattern of WARN_PATTERNS) {
    if (pattern.test(command)) {
      hasWarnings = true;
      messages.push(`[WARNING] Command may expose sensitive data: ${pattern.source}`);
    }
  }

  // Check command length
  if (command.length > 10000) {
    messages.push('[WARNING] Command is very long - consider using a script file');
    hasWarnings = true;
  }

  return {
    isValid: true,
    isDangerous,
    hasWarnings,
    messages,
  };
};

// In the component:
const [showConfirmDialog, setShowConfirmDialog] = useState(false);
const [pendingCommand, setPendingCommand] = useState<string>('');
const [validationResult, setValidationResult] = useState<CommandValidation | null>(null);

const handleSendCommand = useCallback(async () => {
  if (!command.trim() || !agent) return;

  const validation = validateCommand(command);
  setValidationResult(validation);

  if (!validation.isValid) {
    return;
  }

  if (validation.isDangerous || validation.hasWarnings) {
    setPendingCommand(command);
    setShowConfirmDialog(true);
    return;
  }

  await executeCommand(command);
}, [command, agent]);

const executeCommand = async (cmd: string) => {
  try {
    await agentsApi.sendCommand(agent!.id, {
      command_type: commandType,
      command: cmd,
      timeout_seconds: 300,
      elevated: false,
    });
    setCommand('');
    setShowConfirmDialog(false);
    setPendingCommand('');
  } catch (err) {
    console.error('Failed to send command:', err);
  }
};

// Add confirmation dialog JSX:
{showConfirmDialog && (
  <div className="fixed inset-0 bg-black/50 flex items-center justify-center z-50">
    <div className="bg-white rounded-lg p-6 max-w-lg w-full mx-4">
      <h3 className="text-lg font-semibold text-red-600 mb-4">
        {validationResult?.isDangerous ? '[DANGER] Confirm Dangerous Command' : '[WARNING] Confirm Command'}
      </h3>

      <div className="mb-4 space-y-2">
        {validationResult?.messages.map((msg, i) => (
          <p key={i} className={msg.includes('DANGER') ? 'text-red-600' : 'text-yellow-600'}>
            {msg}
          </p>
        ))}
      </div>

      <pre className="bg-gray-100 p-3 rounded text-sm mb-4 overflow-x-auto">
        {pendingCommand}
      </pre>

      <p className="text-sm text-gray-600 mb-4">
        This command will be executed on: <strong>{agent?.hostname}</strong>
      </p>

      <div className="flex gap-3 justify-end">
        <button
          onClick={() => {
            setShowConfirmDialog(false);
            setPendingCommand('');
          }}
          className="px-4 py-2 border rounded hover:bg-gray-50"
        >
          Cancel
        </button>
        <button
          onClick={() => executeCommand(pendingCommand)}
          className={`px-4 py-2 rounded text-white ${
            validationResult?.isDangerous
              ? 'bg-red-600 hover:bg-red-700'
              : 'bg-yellow-600 hover:bg-yellow-700'
          }`}
        >
          Execute Anyway
        </button>
      </div>
    </div>
  </div>
)}
```

---

### 1.9 Legacy Agent: Document Certificate Validation Bypass

**File:** `agent-legacy/GuruRMM-Agent.ps1`
**Line:** 103
**Severity:** CRITICAL

**Cannot be fully fixed** for legacy systems that lack proper certificate chains.

**Mitigation: Add explicit parameter and logging:**
```powershell
# GuruRMM-Agent.ps1

param(
    [string]$ServerUrl = "",
    [string]$SiteCode = "",
    [switch]$AllowInsecureTLS,  # NEW: Must be explicitly set
    [switch]$Register,
    [switch]$Run
)

# Near line 103, replace:
function Initialize-TLS {
    # Configure TLS
    try {
        [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.SecurityProtocolType]::Tls12
        Write-Log "TLS 1.2 configured successfully" "INFO"
    } catch {
        Write-Log "TLS 1.2 not available, falling back to TLS 1.1" "WARN"
        try {
            [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.SecurityProtocolType]::Tls11
        } catch {
            Write-Log "TLS 1.1 not available - connection may be insecure" "ERROR"
        }
    }

    # Certificate validation - ONLY disable if explicitly requested
    if ($script:AllowInsecureTLS) {
        Write-Log "[SECURITY WARNING] Certificate validation DISABLED - vulnerable to MITM attacks" "WARN"
        Write-Log "[SECURITY WARNING] This should only be used on legacy systems with self-signed certificates" "WARN"

        # Log to Windows Event Log for audit trail
        try {
            Write-EventLog -LogName Application -Source "GuruRMM" -EventId 1001 -EntryType Warning `
                -Message "GuruRMM agent started with certificate validation disabled. This is a security risk."
        } catch {
            Write-Log "Could not write to Windows Event Log" "WARN"
        }

        [System.Net.ServicePointManager]::ServerCertificateValidationCallback = { $true }
    } else {
        Write-Log "Certificate validation ENABLED (secure mode)" "INFO"
        # Ensure callback is not set (use default validation)
        [System.Net.ServicePointManager]::ServerCertificateValidationCallback = $null
    }
}

# Update installation documentation:
<#
.SYNOPSIS
    GuruRMM Legacy Agent for Windows Server 2008 R2 and older systems.

.DESCRIPTION
    This agent is designed for systems that cannot run the modern Rust-based agent.

    SECURITY WARNING: The -AllowInsecureTLS switch disables certificate validation,
    making the connection vulnerable to man-in-the-middle attacks. Only use this
    option on isolated networks or when using self-signed certificates that cannot
    be properly installed in the system certificate store.

.PARAMETER AllowInsecureTLS
    [SECURITY RISK] Disables SSL/TLS certificate validation. Required for systems
    with self-signed certificates or broken certificate chains. This flag must be
    explicitly provided - it is not enabled by default.

.EXAMPLE
    # Secure installation (recommended)
    .\GuruRMM-Agent.ps1 -Register -ServerUrl "https://rmm.example.com" -SiteCode "ACME-CORP-1234"

    # Insecure installation (legacy systems only)
    .\GuruRMM-Agent.ps1 -Register -ServerUrl "https://rmm.example.com" -SiteCode "ACME-CORP-1234" -AllowInsecureTLS
#>
```

---

## Phase 2: Major Issues (Should Fix)

### 2.1 Server: Add Missing Database Index

**File:** `server/migrations/005_add_missing_indexes.sql` (new file)

```sql
-- Migration: Add missing indexes for performance
-- This migration adds indexes that were identified during code review

-- Index for agent API key lookups (authentication)
CREATE INDEX IF NOT EXISTS idx_agents_api_key_hash ON agents(api_key_hash);

-- Index for site API key lookups
CREATE INDEX IF NOT EXISTS idx_sites_api_key_hash ON sites(api_key_hash);

-- Index for command status queries
CREATE INDEX IF NOT EXISTS idx_commands_status ON commands(status);

-- Index for metrics time-range queries
CREATE INDEX IF NOT EXISTS idx_metrics_agent_timestamp ON metrics(agent_id, timestamp DESC);

-- Index for finding online agents
CREATE INDEX IF NOT EXISTS idx_agents_status_last_seen ON agents(status, last_seen DESC);
```

---

### 2.2 Server: Fix WebSocket Ping/Pong Protocol

**File:** `server/src/ws/mod.rs`
**Lines:** 379-386

**Current (incorrect):**
```rust
Ok(Message::Ping(data)) => {
    if tx
        .send(ServerMessage::Ack { message_id: None })
        .await
        .is_err()
    {
        break;
    }
}
```

**Fixed:**
```rust
Ok(Message::Ping(data)) => {
    // WebSocket protocol requires Pong response to Ping
    if ws_sender
        .send(Message::Pong(data))
        .await
        .is_err()
    {
        warn!("Failed to send Pong response");
        break;
    }
}
```

---

### 2.3 Agent: Fix Service Path Construction

**File:** `agent/src/service.rs`
**Line:** 93

**Current:**
```rust
let config_path = PathBuf::from(format!(r"{}\\agent.toml", CONFIG_DIR));
```

**Fixed:**
```rust
let config_path = PathBuf::from(CONFIG_DIR).join("agent.toml");
```

---

### 2.4 Agent: Fix Rollback Script Backgrounding

**File:** `agent/src/updater/mod.rs`
**Lines:** 321-327

**Current (broken):**
```rust
tokio::process::Command::new("nohup")
    .arg("bash")
    .arg(&script_path)
    .arg("&")  // This doesn't work - & is passed as literal argument
    .spawn()
```

**Fixed:**
```rust
// Use setsid to create new session and properly detach
tokio::process::Command::new("setsid")
    .arg("bash")
    .arg(&script_path)
    .stdin(std::process::Stdio::null())
    .stdout(std::process::Stdio::null())
    .stderr(std::process::Stdio::null())
    .spawn()
    .context("Failed to spawn rollback script")?;

// Alternative using nohup correctly:
tokio::process::Command::new("bash")
    .arg("-c")
    .arg(format!("nohup {} </dev/null >/dev/null 2>&1 &", script_path.display()))
    .spawn()
    .context("Failed to spawn rollback script")?;
```

---

### 2.5 Agent: Replace todo!() with Proper Error

**File:** `agent/src/main.rs`
**Lines:** 227-229

**Current:**
```rust
#[cfg(target_os = "macos")]
{
    todo!("macOS launchd service installation not yet implemented");
}
```

**Fixed:**
```rust
#[cfg(target_os = "macos")]
{
    return Err(anyhow::anyhow!(
        "macOS launchd service installation is not yet implemented. \
         Please run the agent manually or create a launchd plist manually. \
         See: https://developer.apple.com/library/archive/documentation/MacOSX/Conceptual/BPSystemStartup/Chapters/CreatingLaunchdJobs.html"
    ));
}
```

---

### 2.6 Dashboard: Fix Duplicate Filter Options

**File:** `dashboard/src/pages/Agents.tsx`
**Lines:** 175-176

**Current:**
```tsx
<option value="">All Clients</option>
<option value="">Unassigned</option>
```

**Fixed:**
```tsx
<option value="">All Clients</option>
<option value="__unassigned__">Unassigned Only</option>
```

**Update filtering logic:**
```typescript
const filteredAgents = useMemo(() => {
  let result = agents || [];

  if (clientFilter === '__unassigned__') {
    result = result.filter(a => !a.site_id && !a.client_id);
  } else if (clientFilter) {
    result = result.filter(a => a.client_id === clientFilter);
  }

  if (searchTerm) {
    const term = searchTerm.toLowerCase();
    result = result.filter(a =>
      a.hostname.toLowerCase().includes(term) ||
      a.os_type?.toLowerCase().includes(term)
    );
  }

  return result;
}, [agents, clientFilter, searchTerm]);
```

---

### 2.7 Dashboard: Add Error Handlers to Mutations

**File:** `dashboard/src/pages/Agents.tsx`
**Lines:** 98-114

**Current:**
```typescript
const deleteMutation = useMutation({
  mutationFn: agentsApi.delete,
  onSuccess: () => {
    queryClient.invalidateQueries({ queryKey: ['agents'] });
    setDeleteConfirm(null);
  },
});
```

**Fixed:**
```typescript
const deleteMutation = useMutation({
  mutationFn: agentsApi.delete,
  onSuccess: () => {
    queryClient.invalidateQueries({ queryKey: ['agents'] });
    setDeleteConfirm(null);
    // Optional: Show success toast
  },
  onError: (error: AxiosError<{ error: string }>) => {
    const message = error.response?.data?.error || error.message || 'Failed to delete agent';
    // Show error to user
    alert(`[ERROR] ${message}`);
    // Or use a toast notification library
  },
});

const moveMutation = useMutation({
  mutationFn: ({ agentId, siteId }: { agentId: string; siteId: string | null }) =>
    agentsApi.move(agentId, siteId),
  onSuccess: () => {
    queryClient.invalidateQueries({ queryKey: ['agents'] });
    setMoveAgent(null);
  },
  onError: (error: AxiosError<{ error: string }>) => {
    const message = error.response?.data?.error || error.message || 'Failed to move agent';
    alert(`[ERROR] ${message}`);
  },
});
```

---

### 2.8 Dashboard: Replace `any` Types

**File:** `dashboard/src/pages/Login.tsx`
**Line:** 24

**Current:**
```typescript
} catch (err: any) {
  setError(err.response?.data?.error || 'Login failed');
}
```

**Fixed:**
```typescript
import { AxiosError } from 'axios';

interface ApiError {
  error: string;
}

// In the component:
} catch (err) {
  if (err instanceof AxiosError) {
    const apiError = err.response?.data as ApiError | undefined;
    setError(apiError?.error || err.message || 'Login failed');
  } else if (err instanceof Error) {
    setError(err.message);
  } else {
    setError('An unexpected error occurred');
  }
}
```

**Apply same pattern to Register.tsx line 37.**

---

## Phase 3: Warning Level Issues (Recommended)

### 3.1 Add Request Logging/Audit Trail

**File:** `server/src/main.rs`

```rust
use tower_http::trace::{TraceLayer, DefaultMakeSpan, DefaultOnResponse};
use tracing::Level;

// Add to router
let app = Router::new()
    // ... routes ...
    .layer(
        TraceLayer::new_for_http()
            .make_span_with(DefaultMakeSpan::new().level(Level::INFO))
            .on_response(DefaultOnResponse::new().level(Level::INFO))
    );
```

---

### 3.2 Add Health Check with Database Verification

**File:** `server/src/main.rs`

**Current:**
```rust
async fn health_check() -> &'static str {
    "OK"
}
```

**Fixed:**
```rust
use serde::Serialize;

#[derive(Serialize)]
struct HealthResponse {
    status: &'static str,
    database: &'static str,
    version: &'static str,
}

async fn health_check(State(state): State<AppState>) -> Json<HealthResponse> {
    let db_status = match sqlx::query("SELECT 1")
        .fetch_one(&state.db)
        .await
    {
        Ok(_) => "healthy",
        Err(_) => "unhealthy",
    };

    Json(HealthResponse {
        status: if db_status == "healthy" { "ok" } else { "degraded" },
        database: db_status,
        version: env!("CARGO_PKG_VERSION"),
    })
}
```

---

### 3.3 Add Graceful Shutdown

**File:** `server/src/main.rs`

```rust
use tokio::signal;

async fn shutdown_signal() {
    let ctrl_c = async {
        signal::ctrl_c()
            .await
            .expect("Failed to install Ctrl+C handler");
    };

    #[cfg(unix)]
    let terminate = async {
        signal::unix::signal(signal::unix::SignalKind::terminate())
            .expect("Failed to install signal handler")
            .recv()
            .await;
    };

    #[cfg(not(unix))]
    let terminate = std::future::pending::<()>();

    tokio::select! {
        _ = ctrl_c => {},
        _ = terminate => {},
    }

    tracing::info!("Shutdown signal received, starting graceful shutdown");
}

// In main():
let listener = tokio::net::TcpListener::bind(&addr).await?;

axum::serve(listener, app)
    .with_graceful_shutdown(shutdown_signal())
    .await?;
```

---

## Implementation Checklist

### Critical (Do First)
- [ ] 1.1 Restrict CORS configuration
- [ ] 1.2 Add AuthUser to all endpoints
- [ ] 1.3 Add rate limiting to registration
- [ ] 1.4 Remove Unicode characters from agent output
- [ ] 1.5 Add certificate pinning for updates
- [ ] 1.6 Fix insecure temp file creation
- [ ] 1.7 Improve dashboard token storage
- [ ] 1.8 Add command input validation
- [ ] 1.9 Document/mitigate legacy agent cert bypass

### Major (Do Second)
- [ ] 2.1 Add missing database indexes
- [ ] 2.2 Fix WebSocket ping/pong
- [ ] 2.3 Fix service path construction
- [ ] 2.4 Fix rollback script backgrounding
- [ ] 2.5 Replace todo!() with proper errors
- [ ] 2.6 Fix duplicate filter options
- [ ] 2.7 Add error handlers to mutations
- [ ] 2.8 Replace `any` types with proper types

### Warnings (Do Third)
- [ ] 3.1 Add request logging/audit trail
- [ ] 3.2 Add comprehensive health check
- [ ] 3.3 Add graceful shutdown handling
- [ ] Add pagination to commands endpoint
- [ ] Add input validation to all request types
- [ ] Implement token expiration handling in dashboard
- [ ] Add dark mode toggle (CSS already exists)
- [ ] Add log rotation to legacy agent

---

## Testing Requirements

After implementing fixes:

1. **Authentication Tests**
   - Verify all protected endpoints return 401 without token
   - Verify all protected endpoints work with valid token
   - Verify rate limiting on registration endpoints

2. **Security Tests**
   - Verify CORS blocks unauthorized origins
   - Verify command validation catches dangerous patterns
   - Verify update downloads only from allowed domains

3. **Functional Tests**
   - Verify agent registration still works
   - Verify command execution still works
   - Verify metrics collection still works
   - Verify WebSocket reconnection works

4. **Legacy Agent Tests**
   - Verify agent works without -AllowInsecureTLS (with valid cert)
   - Verify agent works with -AllowInsecureTLS (with self-signed)
   - Verify Windows Event Log entries are created

---

## Estimated Effort

| Phase | Items | Estimated Time |
|-------|-------|----------------|
| Phase 1 (Critical) | 9 items | 8-12 hours |
| Phase 2 (Major) | 8 items | 4-6 hours |
| Phase 3 (Warnings) | 8+ items | 4-8 hours |
| Testing | All | 4-6 hours |
| **Total** | | **20-32 hours** |

---

**Document Version:** 1.0
**Created:** 2026-01-20
**Author:** Claude Code Review System