Files
claudetools/.claude/commands/rmm.md
Mike Swanson df9be01065 feat(rmm): onboarding diagnostic (Phase 1) - probe + triage + baseline
/rmm diagnose: dispatches a Windows security/health probe to a newly onboarded
agent, grades RED/AMBER/GREEN, writes an immutable per-client baseline
(clients/<slug>/onboarding-baselines/), diffs vs prior, and alerts CRITICALs to
#dev-alerts. Probe is PS5.1/ASCII/SYSTEM-safe, never-abort, base64 chunked upload
around the agent command-size cap. Code-reviewed (no blockers); folded in
immutability guard, severity-independent finding ids, Defender-unknown sentinel,
expanded competitor/backup detection.

First baselines captured: Rednour FRONTDESKRECEPT + LEGALASST (both RED - prior
MSP ScreenConnect/Splashtop/Syncro still live; LEGALASST OS EOL).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 13:09:11 -07:00

32 KiB

description
description
Run commands, investigate agents, and execute remote scripts via the GuruRMM agent fleet.

/rmm — GuruRMM remote command execution

Interact with the GuruRMM agent fleet: list agents, run remote commands (PowerShell, shell, Python), poll for output, and cancel in-flight tasks.

Default posture: read before write. Always look up the agent by hostname before dispatching a command. Never guess a UUID.


Usage

/rmm                                    List all agents with connection status
/rmm agents [<client>]                  List agents, optionally filtered by client/site name
/rmm agent <hostname|uuid>              Show agent detail + last 10 commands
/rmm run <hostname|uuid>                Interactively compose and dispatch a script
/rmm shell <hostname|uuid> <cmd>        One-liner bash/sh command (Linux/macOS agents)
/rmm ps <hostname|uuid> <cmd>           One-liner PowerShell command (Windows agents)
/rmm status <command_id>                Check command status
/rmm output <command_id>                Fetch full stdout + stderr for a completed command
/rmm cancel <command_id>                Cancel a pending or running command
/rmm history <hostname|uuid> [N]        Recent command history (default 10, max 500)
/rmm onboard <client name> [site]       Create a new client + site, vault the one-time enrollment key
/rmm diagnose <hostname|uuid> [client]   Run onboarding health/security diagnostic + baseline

API configuration

Base URL: http://172.16.3.30:3001 Auth: JWT (24-hour expiry) — obtain via POST /api/auth/login, pass as Authorization: Bearer <token> Credentials vault path: infrastructure/gururmm-server.sops.yaml

  • credentials.gururmm-api.admin-email
  • credentials.gururmm-api.admin-password

Hard rules (incidents have occurred — no exceptions)

Never hardcode a UUID. Agent UUIDs change when an agent re-enrolls. Always resolve hostname → UUID via GET /api/agents at the start of every workflow. The UUID from memory or a prior session may be stale.

Never guess the shell from the hostname. Use os_type from the agent record: "windows"powershell, "linux"shell, "macos"shell. A machine named "WEST-MEADOW-9025" could be macOS. Check os_type.

Initial status "pending" is not a failure. It means the agent is offline and the command is queued. The agent will execute it when it reconnects. Poll patiently; do not cancel and retry.

Status "failed" with stderr = "Command timed out (server-side reaper)" means the command exceeded its timeout. The server reaper transitions runningfailed rather than introducing a separate timeout status. Read stderr before concluding the script had a bug.

Status "interrupted" means the agent restarted mid-execution. The command was orphaned. The server marks it interrupted when the agent reconnects. Re-run if needed.

context: "user_session" requires an active logged-on user. If no interactive user is signed in, the WTS impersonation call will fail. Check whether a user is logged in before choosing this context. Falls back silently to no output if no desktop session exists.

No interactive input. Agent shells are non-interactive. Scripts must not call Read-Host, pause, read -p, or any prompt. Pass all values as variables or parameters in the script body.

Output is streamed in full but stored as a string. For very large output (log files, directory trees), write the output to a file on the endpoint and fetch it with a second command.

elevated: true sends the flag to the agent but does not guarantee elevation. On Windows the agent is already running as SYSTEM; elevated is a hint for future agent behaviour. Do not rely on it for privilege escalation on endpoints where the agent is not already elevated.

Heredoc payloads — always use --data-binary @- and <<'JSON' for static payloads. On Windows, the Write tool and Git Bash resolve /tmp/foo.json to different directories. Heredoc avoids the file handoff entirely. Use <<'JSON' (single-quoted) for static payloads that contain no shell variables; use <<JSON (unquoted) when interpolating ${VARS}.

After every dispatch: post a one-line alert. RMM alerts go to the private #dev-alerts (Howard + Mike) — post-bot-alert.sh auto-routes any [RMM]/[DEPLOY]-prefixed message there. Write operations (dispatch + cancel) get an alert; read operations (list, status, output) do not.


Phase 0 — Bootstrap (run once per session)

IDENTITY_PATH="${HOME}/.claude/identity.json"
if [ ! -f "$IDENTITY_PATH" ]; then
    IDENTITY_PATH=$(git rev-parse --show-toplevel 2>/dev/null)/.claude/identity.json
fi
REPO_ROOT=$(jq -r '.claudetools_root // empty' "$IDENTITY_PATH" 2>/dev/null)
if [ -z "$REPO_ROOT" ]; then
    REPO_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
fi
VAULT="$REPO_ROOT/.claude/scripts/vault.sh"
RMM="http://172.16.3.30:3001"

RMM_EMAIL=$(bash "$VAULT" get-field infrastructure/gururmm-server.sops.yaml credentials.gururmm-api.admin-email)
RMM_PASS=$(bash "$VAULT" get-field infrastructure/gururmm-server.sops.yaml credentials.gururmm-api.admin-password)

JWT=$(curl -s -X POST "$RMM/api/auth/login" \
  -H "Content-Type: application/json" \
  --data-binary @- <<JSON
{"email": "$RMM_EMAIL", "password": "$RMM_PASS"}
JSON
)
TOKEN=$(echo "$JWT" | jq -r '.token // empty')
if [ -z "$TOKEN" ]; then
    echo "[ERROR] RMM login failed: $JWT"
    exit 1
fi
echo "[OK] Authenticated to GuruRMM"

Reuse $TOKEN, $RMM, and $REPO_ROOT for all subsequent calls in the session. Do not re-authenticate unless you get a 401.


Resolve agent by hostname

Always resolve before dispatching. Partial hostname matches are accepted (grep client-side).

AGENTS=$(curl -s "$RMM/api/agents" -H "Authorization: Bearer $TOKEN")

# Find by exact or partial hostname (case-insensitive)
AGENT=$(echo "$AGENTS" | jq --arg h "HOSTNAME" '[.[] | select(.hostname | ascii_downcase | contains($h | ascii_downcase))] | .[0]')

AGENT_ID=$(echo "$AGENT" | jq -r '.id // empty')
AGENT_HOST=$(echo "$AGENT" | jq -r '.hostname // empty')
AGENT_OS=$(echo "$AGENT" | jq -r '.os_type // empty')
AGENT_STATUS=$(echo "$AGENT" | jq -r '.status // empty')
AGENT_LAST=$(echo "$AGENT" | jq -r '.last_seen // "never"')
IS_CONNECTED=$(echo "$AGENTS" | jq -r --arg id "$AGENT_ID" '.[] | select(.id == $id) | .is_connected // false')

if [ -z "$AGENT_ID" ]; then
    echo "[ERROR] No agent found matching hostname. Run /rmm agents to list enrolled agents."
    exit 1
fi
echo "[OK] Found agent: $AGENT_HOST ($AGENT_OS) — id=$AGENT_ID, connected=$IS_CONNECTED, last_seen=$AGENT_LAST"

If multiple agents match: list the matches and ask the user to disambiguate. Never pick the first silently when there are multiple.

If is_connected = false: warn the user that the command will queue and execute when the agent comes back online. Still dispatch unless the user says to abort.


Agent list display

WEST-MEADOW-9025   macos    online    Scileppi Law          last: 2026-05-28T18:00:00Z  v0.3.4
CS-SERVER          windows  online    Cascades of Tucson    last: 2026-05-28T17:55:00Z  v0.3.4
AD2                windows  offline   ACG Internal          last: 2026-05-27T09:10:00Z  v0.3.2

Show: hostname, os_type, online/offline, client_name (from site_name/client_name), last_seen, agent_version.


Send a command

Determine command_type from os_type

os_type Default command_type Notes
windows powershell Runs via powershell.exe -NonInteractive as SYSTEM
linux shell Runs via /bin/bash
macos shell Runs via /bin/bash (or /bin/zsh on newer installs)

Use python only when explicitly writing a Python script. Use script for saved scripts (not covered in this skill).

Basic dispatch

# For PowerShell (Windows)
CMD_RESP=$(curl -s -X POST "$RMM/api/agents/$AGENT_ID/command" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  --data-binary @- <<'JSON'
{
  "command_type": "powershell",
  "command": "Get-ComputerInfo | Select-Object CsName, OsName, OsVersion, CsProcessors",
  "timeout_seconds": 60
}
JSON
)

# For shell (Linux/macOS)
CMD_RESP=$(curl -s -X POST "$RMM/api/agents/$AGENT_ID/command" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  --data-binary @- <<'JSON'
{
  "command_type": "shell",
  "command": "df -h / && uptime",
  "timeout_seconds": 60
}
JSON
)

CMD_ID=$(echo "$CMD_RESP" | jq -r '.command_id // empty')
CMD_STATUS=$(echo "$CMD_RESP" | jq -r '.status // empty')

if [ -z "$CMD_ID" ]; then
    echo "[ERROR] Dispatch failed: $CMD_RESP"
    exit 1
fi
echo "[OK] Command dispatched — id=$CMD_ID, initial_status=$CMD_STATUS"

Initial status values:

  • "running" — agent is online and received the command
  • "pending" — agent is offline; command queued

Multi-line scripts (heredoc with variable interpolation)

For scripts that need shell variable values substituted at the point of the curl call:

# Use <<JSON (unquoted) so shell expands ${VARS} in the payload
# Use jq -n --arg to safely JSON-encode the script text before embedding
SCRIPT='
$path = "C:\Users\$env:USERNAME\Downloads"
Write-Host "Size: $((Get-ChildItem $path -Recurse | Measure-Object Length -Sum).Sum / 1GB) GB"
'
PAYLOAD=$(jq -n \
  --arg ct "powershell" \
  --arg cmd "$SCRIPT" \
  --argjson to 120 \
  '{command_type: $ct, command: $cmd, timeout_seconds: $to}')

CMD_RESP=$(curl -s -X POST "$RMM/api/agents/$AGENT_ID/command" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD")

jq -n --arg cmd "$SCRIPT" correctly JSON-encodes the script including backslashes, dollar signs, and newlines. Always use this pattern for multi-line scripts — never embed raw script text into JSON by hand.

Run as logged-on user (context: user_session)

Use when the command requires a desktop session: GUI actions, Clear-RecycleBin, VPN cmdlets that need a user token, interactive-only software.

CMD_RESP=$(curl -s -X POST "$RMM/api/agents/$AGENT_ID/command" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  --data-binary @- <<'JSON'
{
  "command_type": "powershell",
  "command": "Get-WMIObject Win32_UserProfile | Where-Object { $_.Special -eq $false } | Select-Object LocalPath",
  "timeout_seconds": 30,
  "context": "user_session"
}
JSON
)

user_session requirements and limits:

  • Requires an active (non-locked) desktop session — no user logged in = command runs but produces empty output or silently no-ops
  • On Windows: runs under the WTS-impersonated token of the currently active user. If an admin is logged in, the command runs elevated; if a standard user, it does not.
  • On Linux/macOS: the agent uses sudo -u <active_user> or su — verify agent implementation supports this on the target platform before relying on it
  • Clear-RecycleBin, Set-VpnConnection -L2tpPsk, and similar COM/interactive-shell cmdlets require user_session
  • Clear-RecycleBin -Force silently no-ops as SYSTEM even with user_session if no desktop is present — enumerate C:\$Recycle.Bin\<SID>\* directly when running as SYSTEM

Preview before dispatch (for /rmm run)

When the user requests an interactive run without a pre-written script, show a preview before dispatching:

COMMAND PREVIEW
---------------
Agent:    WEST-MEADOW-9025 (macos) — online
Type:     shell
Context:  system
Timeout:  120s
Script:
  find /Users/sylvia/Library -name "*.plist" -maxdepth 3

Dispatch? (yes/no)

Wait for explicit confirmation before dispatching any command.


Poll for completion

poll_command() {
    local cmd_id="$1"
    local max_polls="${2:-60}"
    local interval=5
    local count=0

    while [ $count -lt $max_polls ]; do
        RESULT=$(curl -s "$RMM/api/commands/$cmd_id" \
          -H "Authorization: Bearer $TOKEN")
        STATUS=$(echo "$RESULT" | jq -r '.status // empty')

        case "$STATUS" in
            completed|failed|cancelled|interrupted)
                echo "$RESULT"
                return 0
                ;;
            running|pending)
                count=$((count + 1))
                sleep $interval
                ;;
            "")
                echo "[ERROR] Empty status — response: $RESULT"
                return 1
                ;;
        esac
    done

    echo "[WARNING] Polling timeout after $((max_polls * interval))s — last status: $STATUS"
    echo "[INFO] Command $cmd_id may still be running. Use /rmm status $cmd_id to check later."
}

RESULT=$(poll_command "$CMD_ID")

Polling cadence:

  • Default: check every 5s for up to 5 minutes (60 polls)
  • For long-running scripts (software installs, disk scans): pass a higher max_polls or remind the user to check with /rmm status later
  • Never spam polls at < 3s intervals

Display command output

display_output() {
    local result="$1"
    local status=$(echo "$result" | jq -r '.status')
    local exit_code=$(echo "$result" | jq -r '.exit_code // "—"')
    local stdout=$(echo "$result" | jq -r '.stdout // ""')
    local stderr=$(echo "$result" | jq -r '.stderr // ""')

    echo "Status:    $status (exit $exit_code)"
    if [ -n "$stdout" ]; then
        echo "--- stdout ---"
        echo "$stdout"
    fi
    if [ -n "$stderr" ]; then
        echo "--- stderr ---"
        echo "$stderr"
    fi
}

Always show stderr on non-zero exit — PowerShell writes errors to stderr even when the command partially succeeds. Never suppress stderr output.

exit_code = null on status = "pending" or status = "interrupted" — the command never ran or was orphaned. Do not treat null exit code as exit 0.

status = "failed" does NOT always mean a script bug:

  • Check stderr for "Command timed out (server-side reaper)" → timeout, increase timeout_seconds
  • Check stderr for "Agent restarted during execution" → interrupted, re-run
  • Only if stderr has actual script errors is it a script issue

Cancel a command

CANCEL_RESP=$(curl -s -X POST "$RMM/api/commands/$CMD_ID/cancel" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json")
# Response: {"status": "cancelled", "message": "Command cancelled"}
  • Only cancels pending or running commands
  • If already completed, failed, or cancelled: server returns 400 "Command already finished"
  • Cancellation of a running command sends a WebSocket cancel message to the agent; the agent may not honour it immediately

List command history

# All recent commands (admin only)
curl -s "$RMM/api/commands?limit=20" -H "Authorization: Bearer $TOKEN" | \
  jq '[.[] | {id, agent_id, command_type, status, exit_code, created_at, command_text: (.command_text[:60])}]'

# Agent-scoped (works for non-admin)
curl -s "$RMM/api/commands?agent_id=$AGENT_ID&limit=10" -H "Authorization: Bearer $TOKEN" | \
  jq '[.[] | {id, status, exit_code, command_type, created_at, preview: (.command_text[:80])}]'

Verified response shapes (from source)

POST /api/auth/login

{"token": "eyJ..."}

POST /api/agents/{id}/command

{
  "command_id": "uuid",
  "status": "running",
  "message": "Command sent to agent"
}

or if agent offline:

{
  "command_id": "uuid",
  "status": "pending",
  "message": "Agent is offline. Command queued for execution when agent reconnects."
}

GET /api/commands/{id}

{
  "id": "uuid",
  "agent_id": "uuid",
  "command_type": "powershell",
  "command_text": "...",
  "status": "completed",
  "exit_code": 0,
  "stdout": "...",
  "stderr": "",
  "created_at": "ISO-8601",
  "started_at": "ISO-8601",
  "completed_at": "ISO-8601",
  "created_by": "uuid",
  "timeout_seconds": 300,
  "context": "system"
}

Note: the response field is command_text, not command. Parsing .command will return null.

GET /api/agents (array)

[{
  "id": "uuid",
  "hostname": "WEST-MEADOW-9025",
  "os_type": "macos",
  "os_version": "14.5",
  "os_name": "macOS",
  "agent_version": "0.3.4",
  "last_seen": "ISO-8601",
  "status": "online",
  "is_connected": true,
  "device_id": "...",
  "site_id": "uuid",
  "site_name": "Main Office",
  "client_name": "Scileppi Law",
  "maintenance_mode": false,
  "maintenance_mode_note": null,
  "update_channel": null
}]

POST /api/commands/{id}/cancel

{"status": "cancelled", "message": "Command cancelled"}

All command status values

Status Meaning
pending Queued; agent offline
running Agent executing
completed Finished; exit_code set
failed Non-zero exit OR timed out (check stderr)
cancelled Cancelled via API
interrupted Agent restarted mid-run

Platform-specific patterns

Windows — PowerShell as SYSTEM

# Disk usage
Get-PSDrive -PSProvider FileSystem | Select-Object Name, @{n='Used(GB)';e={[math]::Round($_.Used/1GB,2)}}, @{n='Free(GB)';e={[math]::Round($_.Free/1GB,2)}}

# Services (a specific one)
Get-Service -Name "servicename" | Select-Object Name, Status, StartType

# Running processes (top by CPU)
Get-Process | Sort-Object CPU -Descending | Select-Object -First 10 Name, Id, CPU, WorkingSet

# Event log errors (last 24h)
Get-WinEvent -LogName System -MaxEvents 50 | Where-Object { $_.LevelDisplayName -eq "Error" -and $_.TimeCreated -gt (Get-Date).AddHours(-24) } | Select-Object TimeCreated, Id, Message

# Recycle bin contents (correct way — Clear-RecycleBin fails as SYSTEM)
Get-ChildItem "C:\`$Recycle.Bin" -Recurse -Force -ErrorAction SilentlyContinue | Select-Object FullName, Length

# User sessions (who is logged on)
query user

# Installed software
Get-ItemProperty HKLM:\Software\Microsoft\Windows\CurrentVersion\Uninstall\* | Select-Object DisplayName, DisplayVersion, Publisher | Sort-Object DisplayName

Common Windows gotchas:

  • Clear-RecycleBin -Force silently no-ops in SYSTEM context — use direct C:\$Recycle.Bin\<SID>\* enumeration
  • $env:USERNAME as SYSTEM = "SYSTEM" (not the logged-on user) — use query user or WMI to find the actual logged-on username
  • Get-Date is UTC-aware; use -Format "yyyy-MM-dd HH:mm:ss" for readable output
  • PowerShell Write-Host outputs to stdout; Write-Error outputs to stderr
  • Paths with spaces need quoting: "C:\Program Files\..."
  • $ErrorActionPreference = 'Stop' makes the script fail on first error (useful for idempotent scripts); without it, PowerShell continues past non-terminating errors

macOS — shell as root

# Disk usage
df -h /

# Find large files
find /Users -maxdepth 4 -size +100M -type f 2>/dev/null | sort -k5 -rn | head -20

# Logged-on user (who is using the GUI session)
stat -f "%Su" /dev/console

# LaunchDaemons check
ls /Library/LaunchDaemons/

# LaunchAgents for a user
ls /Users/sylvia/Library/LaunchAgents/ 2>/dev/null

# Remove a plist and unload (system scope)
launchctl bootout system /Library/LaunchDaemons/com.example.plist 2>/dev/null
rm /Library/LaunchDaemons/com.example.plist

# Mount AFP share (uses Keychain credentials)
osascript -e 'mount volume "afp://SL-SERVER._afpovertcp._tcp.local/Data"'

# File write via base64 (python3 is an Xcode stub on macOS without dev tools)
echo "BASE64ENCODED==" | /usr/bin/base64 -D > /path/to/file
chmod +x /path/to/file

# Strip macOS ACL that prevents directory deletion
chmod -a "group:everyone deny delete" /path/to/dir

Common macOS gotchas:

  • /usr/bin/python3 on macOS without Xcode CLI tools is a stub that triggers an installer dialog — use /usr/bin/base64 -D (capital D — BSD flag, not GNU) for file writing
  • nohup cmd & in agent shell context fails: nohup: can't detach from console: Inappropriate ioctl for device — use LaunchDaemon (launchctl bootstrap system <plist>) for truly detached background processes
  • Home directory subdirectories may have ACL 0: group:everyone deny delete — strip with chmod -a "group:everyone deny delete" before rmdir
  • pgrep rsync matches "colorsyncd" as a substring — use pgrep -f "rsync.*specific-path" for precision
  • launchctl bootstrap gui/501 <plist> targets the user session for UID 501 (first user); find UID with id -u sylvia
  • AFP automount via osascript uses Keychain; works only in user session context

Linux — shell as root

# Disk usage
df -h

# Memory
free -h

# Service status (systemd)
systemctl status servicename --no-pager -l

# Journal (last 50 lines of a service)
journalctl -u servicename -n 50 --no-pager

# Find large files
find / -xdev -size +100M -type f 2>/dev/null | sort -k5 -rn | head -20

# Open ports
ss -tlnp

Investigate → script → dispatch workflow

When the user asks for something that requires investigation before writing a script:

  1. List agents — confirm target, check online status
  2. Characterize the endpoint — run a quick recon command (OS version, disk, services) if the context is unknown
  3. Write the script — draft it inline, show to user
  4. Preview — show agent + script before dispatch
  5. Dispatch — send with appropriate command_type and context
  6. Poll — wait for completion, display output
  7. Bot alert — post one-line summary

For complex multi-step remediation (move files, install software, configure services), dispatch one logical step at a time. Confirm output before proceeding to the next step.


Error handling

HTTP / Condition Meaning Recovery
401 Unauthorized JWT expired or invalid Re-authenticate (Phase 0)
404 Not Found Agent or command UUID not found Re-run GET /api/agents to refresh list
400 "Command already finished" Tried to cancel a done command No action needed
403 Forbidden Non-admin calling fleet-wide endpoint Add ?agent_id=<uuid> filter
status = "failed", stderr = "Command timed out" Timeout Increase timeout_seconds and re-run
status = "interrupted" Agent restarted Re-run the command
status = "pending" (stuck) Agent offline for extended period Notify user; do not cancel unless user confirms
command_id = null or empty Dispatch failed Check response body for error message
jq parse error on agent list Control characters in field Use grep -o '"hostname":"[^"]*"' for simple extraction

Post to #dev-alerts (after every write)

Post after every successful dispatch or cancel. Never post for read-only operations. RMM alerts route to the private #dev-alerts channel automatically (the [RMM] prefix triggers auto-routing in post-bot-alert.sh); no channel argument needed.

ALERT_OUT=$(bash "$REPO_ROOT/.claude/scripts/post-bot-alert.sh" "<message>")
echo "$ALERT_OUT"

Message format:

[RMM] <Tech> dispatched to <hostname> (<os_type>) - <brief description> -> cmd:<command_id_short>
[RMM] <Tech> cancelled cmd <command_id_short> on <hostname>

<command_id_short> = first 8 chars of the UUID.

Examples:

bash "$REPO_ROOT/.claude/scripts/post-bot-alert.sh" \
  "[RMM] Mike dispatched to WEST-MEADOW-9025 (macos) - disk cleanup AFP symlink -> cmd:3f8a1c2e"

bash "$REPO_ROOT/.claude/scripts/post-bot-alert.sh" \
  "[RMM] Howard ran disk scan on CS-SERVER (windows) - exit 0, 14 GB freed -> cmd:9b4d7e01"

ASCII only — no Unicode dashes or arrows. Use - and ->.


Client / Site onboarding (/rmm onboard)

Provision a new client and its first site, then store the agent enrollment key in the vault. The site api_key is shown only once in the create response — capture it before anything else.

Verified API shapes (from server/src/api/clients.rs + sites.rs):

  • POST /api/clients body {"name": "...", "code"?: "...", "notes"?: "..."}{"id", "name", "code", "is_active", "site_count", ...}. partner_id is assigned server-side (default partner) — do NOT send it.
  • POST /api/sites body {"client_id": "<uuid>", "name": "...", "address"?: "...", "notes"?: "..."}{"site": {"id", "site_code", ...}, "api_key": "grmm_...", "message"}. site_code is server-generated (generate_unique_site_code, e.g. GREEN-FALCON-7214) — never supply it.
  • If the key is lost: POST /api/sites/:id/regenerate-key issues a new one (invalidates the old).

Workflow:

# (Phase 0 bootstrap done → $TOKEN, $RMM, $REPO_ROOT)
NAME="Rednour Law Offices"; SITE="Main"
SLUG="rednour"                       # lowercase, no spaces/hyphens — matches existing vault convention

# 1. Guard against a duplicate client
curl -s "$RMM/api/clients" -H "Authorization: Bearer $TOKEN" \
  | jq -r --arg n "$NAME" '.[] | select(.name|ascii_downcase==($n|ascii_downcase)) | "EXISTS id=\(.id)"'

# 2. Create client
CID=$(curl -s -X POST "$RMM/api/clients" -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  --data-binary "{\"name\":\"$NAME\"}" | jq -r '.id')

# 3. Create site — capture the ONE-TIME api_key immediately to a file
curl -s -X POST "$RMM/api/sites" -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  --data-binary "{\"client_id\":\"$CID\",\"name\":\"$SITE\"}" > /tmp/site.json
SID=$(jq -r '.site.id' /tmp/site.json); SCODE=$(jq -r '.site.site_code' /tmp/site.json); AKEY=$(jq -r '.api_key' /tmp/site.json)

# 4. Vault the enrollment key (mirror existing clients/<slug>/gururmm-site-main.sops.yaml structure)
VR=$(jq -r '.vault_path' "$REPO_ROOT/.claude/identity.json"); T="$VR/clients/$SLUG/gururmm-site-main.sops.yaml"
mkdir -p "$(dirname "$T")"
cat > "$T" <<YAML
client: $NAME
site: $SITE
created: "$(date +%F)"
credentials:
    client_id: "$CID"
    site_id: "$SID"
    site_code: "$SCODE"
    api_key: "$AKEY"
    installer_url: "https://rmm.azcomputerguru.com/install/$SCODE"
    msi_url: "https://rmm.azcomputerguru.com/api/sites/$SID/installer"
YAML
sops --config "$VR/.sops.yaml" --encrypt --in-place "$T"          # encrypt (see gotchas)
bash "$REPO_ROOT/.claude/scripts/vault.sh" get "clients/$SLUG/gururmm-site-main.sops.yaml" >/dev/null  # verify round-trip
git -C "$VR" add "clients/$SLUG/gururmm-site-main.sops.yaml" && git -C "$VR" commit -q -m "add: $NAME GuruRMM site $SITE enrollment key ($SCODE)" && git -C "$VR" push -q

Vault-encryption gotchas (learned 2026-05-29 onboarding Rednour):

  • sops --encrypt needs the vault's .sops.yaml. From outside the vault dir it errors config file not found — always pass --config "$VR/.sops.yaml". Encryption uses only the public age keys, so no private key / SOPS_AGE_KEY_FILE is required for the encrypt step.
  • Quote all date/timestamp values (created: "2026-05-29"). A bare YAML date makes sops 3.7.3 fail with Cannot walk value, unknown type: time.Time.
  • Put every secret under the credentials: block — the vault encrypted_regex covers credentials|password|secret|api_key|token|.... Fields outside it (client, site, created) stay plaintext as searchable metadata.
  • If --encrypt --in-place fails, the file is left plaintext on disk — fix and re-encrypt (or delete) immediately; never commit it unencrypted. Confirm with grep -c 'ENC\[' "$T" (should be > 0).

Report to the user + bot alert: client_id, site_id, site_code, install page https://rmm.azcomputerguru.com/install/<SCODE>, MSI https://rmm.azcomputerguru.com/api/sites/<SID>/installer, and the vault path. Onboarding is a write → post a [RMM] <tech> onboarded client '<name>' + site '<site>' (<SCODE>) bot alert.


Onboarding diagnostic (/rmm diagnose)

Run a one-shot security + health + inventory probe against a newly onboarded Windows agent and produce a prioritized "take this seriously" report plus an immutable before/after baseline. This is the Phase 1 tooling implementation; a native GuruRMM feature (DB-backed storage, scheduled re-baselines) is Phase 3.

Runner: .claude/scripts/run-onboarding-diagnostic.sh <hostname|uuid> [client-slug] Probe: .claude/scripts/onboarding-diagnostic.ps1 (Windows PowerShell 5.1, ASCII, runs as SYSTEM)

bash "$REPO_ROOT/.claude/scripts/run-onboarding-diagnostic.sh" FrontDeskReception rednour

If no client-slug is given, it is derived by slugifying the agent's client_name.

Workflow

  1. Authenticate to RMM (vault creds, same as the rest of this skill).
  2. Resolve the agent (exact UUID, exact hostname, then partial). Windows-only.
  3. Upload the probe to the endpoint base64-encoded, in <24 KB chunks (the agent caps an inline command body at ~32-40 KB; the probe is ~60 KB), then a final small command decodes it to a .ps1, runs it, and deletes both temp files. Every check in the probe is wrapped in try/catch, so one failing check becomes an unknown-severity finding instead of aborting the probe.
  4. The probe emits a single JSON object fenced by ===DIAG-JSON-START=== / ===DIAG-JSON-END===; the runner extracts it from between the markers.
  5. Grade, write two baseline files, diff against any prior baseline, alert.

Grade model

Grade Meaning
RED At least one critical finding
AMBER At least one warning, no critical
GREEN No critical and no warning

unknown-severity findings (a check that failed to run) do not change the grade but are listed in the report for manual follow-up.

What it checks

  • Security: Defender state (RTP/service/signature age/tamper), 3rd-party AV conflicts, leftover competitor RMM / remote-access agents (ScreenConnect, NinjaRMM, Datto, Atera, Kaseya, TeamViewer, AnyDesk, Splashtop, N-able, Syncro, Action1, Automate, LogMeIn), firewall profiles, BitLocker (laptop-aware), local admins / built-in Administrator / non-expiring passwords, patch posture + OS EOL, RDP/NLA, SMBv1, UAC, LAPS.
  • Health: disk free %, SMART/physical-disk health, 14-day stability (unexpected shutdown / BSOD / disk errors), pending reboot, uptime, failed auto-start services, domain secure channel, time source, battery (laptops), backup-agent presence.
  • Inventory baseline (info): model/serial, CPU/RAM, BIOS, TPM, Secure Boot, OS edition/build/activation, full installed-software list, local users/groups, network, scheduled tasks + Run-key autoruns.

Where baselines are stored

clients/<slug>/onboarding-baselines/ — two files per run, both timestamped <HOST>-<UTC-YYYYMMDDTHHMMSS>:

  • *.json — the raw immutable snapshot (do not edit; it is the source of truth for diffs).
  • *.md — the human report: grade, findings grouped critical -> warning -> info -> unknown, inventory summary, and a diff section vs the most recent prior baseline (new / resolved / regressed findings, software added/removed).

Baselines are immutable and append-only. GuruRMM-DB storage of baselines arrives with the Phase 3 native feature.

Alerting

Each CRITICAL finding and a RED overall grade auto-post a one-line [RMM] alert to #dev-alerts via post-bot-alert.sh (ASCII only, soft-fail). Example: [RMM] Onboarding diag <HOST> (<client>) = RED: <n> critical - <titles>.


Known enrolled agents (verify with GET /api/agents — UUIDs change on re-enroll)

Do not use this table as authoritative — always resolve live. Treat as a starting hint only.

Hostname Client OS
WEST-MEADOW-9025 Scileppi Law macOS
CS-SERVER Cascades of Tucson Windows
DESKTOP-DLTAGOI Cascades LE Windows
AD2 ACG Internal Windows

What NOT to use RMM for

  • Permanent file deletion without user confirmation — always show what will be deleted, get a yes
  • Credential harvesting — do not dump password stores, SAM hive, or credential manager contents
  • Software installation without stating what will be installed — always preview the installer command
  • Bulk destructive operations (rm -rf /, format, Remove-Item -Recurse C:\Windows) — require explicit written confirmation from the user in chat, not just "yes" to a vague preview
  • RMM product development — use the Coding Agent and GuruRMM project context instead