/rmm diagnose: dispatches a Windows security/health probe to a newly onboarded agent, grades RED/AMBER/GREEN, writes an immutable per-client baseline (clients/<slug>/onboarding-baselines/), diffs vs prior, and alerts CRITICALs to #dev-alerts. Probe is PS5.1/ASCII/SYSTEM-safe, never-abort, base64 chunked upload around the agent command-size cap. Code-reviewed (no blockers); folded in immutability guard, severity-independent finding ids, Defender-unknown sentinel, expanded competitor/backup detection. First baselines captured: Rednour FRONTDESKRECEPT + LEGALASST (both RED - prior MSP ScreenConnect/Splashtop/Syncro still live; LEGALASST OS EOL). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
| description |
|---|
| Run commands, investigate agents, and execute remote scripts via the GuruRMM agent fleet. |
/rmm — GuruRMM remote command execution
Interact with the GuruRMM agent fleet: list agents, run remote commands (PowerShell, shell, Python), poll for output, and cancel in-flight tasks.
Default posture: read before write. Always look up the agent by hostname before dispatching a command. Never guess a UUID.
Usage
/rmm                                    List all agents with connection status
/rmm agents [<client>]                  List agents, optionally filtered by client/site name
/rmm agent <hostname|uuid>              Show agent detail + last 10 commands
/rmm run <hostname|uuid>                Interactively compose and dispatch a script
/rmm shell <hostname|uuid> <cmd>        One-liner bash/sh command (Linux/macOS agents)
/rmm ps <hostname|uuid> <cmd>           One-liner PowerShell command (Windows agents)
/rmm status <command_id>                Check command status
/rmm output <command_id>                Fetch full stdout + stderr for a completed command
/rmm cancel <command_id>                Cancel a pending or running command
/rmm history <hostname|uuid> [N]        Recent command history (default 10, max 500)
/rmm onboard <client name> [site]       Create a new client + site, vault the one-time enrollment key
/rmm diagnose <hostname|uuid> [client]  Run onboarding health/security diagnostic + baseline
API configuration
Base URL: http://172.16.3.30:3001
Auth: JWT (24-hour expiry) — obtain via POST /api/auth/login, pass as Authorization: Bearer <token>
Credentials vault path: infrastructure/gururmm-server.sops.yaml
- credentials.gururmm-api.admin-email
- credentials.gururmm-api.admin-password
Hard rules (incidents have occurred — no exceptions)
Never hardcode a UUID. Agent UUIDs change when an agent re-enrolls. Always resolve hostname → UUID via GET /api/agents at the start of every workflow. The UUID from memory or a prior session may be stale.
Never guess the shell from the hostname. Use os_type from the agent record: "windows" → powershell, "linux" → shell, "macos" → shell. A machine named "WEST-MEADOW-9025" could be macOS. Check os_type.
Initial status "pending" is not a failure. It means the agent is offline and the command is queued. The agent will execute it when it reconnects. Poll patiently; do not cancel and retry.
Status "failed" with stderr = "Command timed out (server-side reaper)" means the command exceeded its timeout. The server reaper transitions running → failed rather than introducing a separate timeout status. Read stderr before concluding the script had a bug.
Status "interrupted" means the agent restarted mid-execution. The command was orphaned. The server marks it interrupted when the agent reconnects. Re-run if needed.
context: "user_session" requires an active logged-on user. If no interactive user is signed in, the WTS impersonation call fails and the command silently returns no output rather than an error. Check whether a user is logged in (for example with query user) before choosing this context.
No interactive input. Agent shells are non-interactive. Scripts must not call Read-Host, pause, read -p, or any prompt. Pass all values as variables or parameters in the script body.
Output is streamed in full but stored as a string. For very large output (log files, directory trees), write the output to a file on the endpoint and fetch it with a second command.
elevated: true sends the flag to the agent but does not guarantee elevation. On Windows the agent is already running as SYSTEM; elevated is a hint for future agent behaviour. Do not rely on it for privilege escalation on endpoints where the agent is not already elevated.
Heredoc payloads — always use --data-binary @- and <<'JSON' for static payloads. On Windows, the Write tool and Git Bash resolve /tmp/foo.json to different directories. Heredoc avoids the file handoff entirely. Use <<'JSON' (single-quoted) for static payloads that contain no shell variables; use <<JSON (unquoted) when interpolating ${VARS}.
After every dispatch: post a one-line alert. RMM alerts go to the private #dev-alerts (Howard + Mike) — post-bot-alert.sh auto-routes any [RMM]/[DEPLOY]-prefixed message there. Write operations (dispatch + cancel) get an alert; read operations (list, status, output) do not.
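The heredoc quoting rule above can be checked locally without touching the API. This is a minimal sketch; HOST_HINT and the "note" field are illustrative names, not part of the GuruRMM API.

```shell
HOST_HINT="CS-SERVER"   # hypothetical value, for illustration only

# Quoted delimiter: the body is passed through literally, so $HOME is NOT expanded.
static_payload=$(cat <<'JSON'
{"command_type": "shell", "command": "echo $HOME"}
JSON
)

# Unquoted delimiter: the shell expands ${HOST_HINT} before the text is used.
interpolated_payload=$(cat <<JSON
{"note": "target is ${HOST_HINT}"}
JSON
)

printf '%s\n' "$static_payload"
printf '%s\n' "$interpolated_payload"
```

The first payload still contains the literal text $HOME, which is what a script destined for the endpoint usually needs; the second has the local value substituted in.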
Phase 0 — Bootstrap (run once per session)
IDENTITY_PATH="${HOME}/.claude/identity.json"
if [ ! -f "$IDENTITY_PATH" ]; then
IDENTITY_PATH=$(git rev-parse --show-toplevel 2>/dev/null)/.claude/identity.json
fi
REPO_ROOT=$(jq -r '.claudetools_root // empty' "$IDENTITY_PATH" 2>/dev/null)
if [ -z "$REPO_ROOT" ]; then
REPO_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
fi
VAULT="$REPO_ROOT/.claude/scripts/vault.sh"
RMM="http://172.16.3.30:3001"
RMM_EMAIL=$(bash "$VAULT" get-field infrastructure/gururmm-server.sops.yaml credentials.gururmm-api.admin-email)
RMM_PASS=$(bash "$VAULT" get-field infrastructure/gururmm-server.sops.yaml credentials.gururmm-api.admin-password)
JWT=$(curl -s -X POST "$RMM/api/auth/login" \
-H "Content-Type: application/json" \
--data-binary @- <<JSON
{"email": "$RMM_EMAIL", "password": "$RMM_PASS"}
JSON
)
TOKEN=$(echo "$JWT" | jq -r '.token // empty')
if [ -z "$TOKEN" ]; then
echo "[ERROR] RMM login failed: $JWT"
exit 1
fi
echo "[OK] Authenticated to GuruRMM"
Reuse $TOKEN, $RMM, and $REPO_ROOT for all subsequent calls in the session. Do not re-authenticate unless you get a 401.
Resolve agent by hostname
Always resolve before dispatching. Partial hostname matches are accepted (grep client-side).
AGENTS=$(curl -s "$RMM/api/agents" -H "Authorization: Bearer $TOKEN")
# Find by exact or partial hostname (case-insensitive)
AGENT=$(echo "$AGENTS" | jq --arg h "HOSTNAME" '[.[] | select(.hostname | ascii_downcase | contains($h | ascii_downcase))] | .[0]')
AGENT_ID=$(echo "$AGENT" | jq -r '.id // empty')
AGENT_HOST=$(echo "$AGENT" | jq -r '.hostname // empty')
AGENT_OS=$(echo "$AGENT" | jq -r '.os_type // empty')
AGENT_STATUS=$(echo "$AGENT" | jq -r '.status // empty')
AGENT_LAST=$(echo "$AGENT" | jq -r '.last_seen // "never"')
IS_CONNECTED=$(echo "$AGENTS" | jq -r --arg id "$AGENT_ID" '.[] | select(.id == $id) | .is_connected // false')
if [ -z "$AGENT_ID" ]; then
echo "[ERROR] No agent found matching hostname. Run /rmm agents to list enrolled agents."
exit 1
fi
echo "[OK] Found agent: $AGENT_HOST ($AGENT_OS) — id=$AGENT_ID, connected=$IS_CONNECTED, last_seen=$AGENT_LAST"
If multiple agents match: list the matches and ask the user to disambiguate. Never pick the first silently when there are multiple.
If is_connected = false: warn the user that the command will queue and execute when the agent comes back online. Still dispatch unless the user says to abort.
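One way to enforce the disambiguation rule is to count matches before picking anything. The sketch below works on a plain list of hostnames (one per line) so it can be tried offline; resolve_unique is a hypothetical helper, and CS-LAPTOP is a made-up hostname used only to create an ambiguous match.

```shell
# resolve_unique <needle>: reads one hostname per line on stdin.
# Prints the single matching hostname, or returns non-zero:
#   1 = no match, 2 = ambiguous (candidates are listed on stderr).
resolve_unique() {
  local needle="$1" matches count
  # -i: case-insensitive, -F: treat the needle as a fixed string, not a regex
  matches=$(grep -iF -- "$needle" || true)
  if [ -z "$matches" ]; then
    echo "[ERROR] No agent matches '$needle'" >&2
    return 1
  fi
  count=$(printf '%s\n' "$matches" | wc -l | tr -d ' ')
  if [ "$count" -gt 1 ]; then
    echo "[WARNING] $count agents match '$needle'; ask the user which one:" >&2
    printf '%s\n' "$matches" | sed 's/^/  /' >&2
    return 2
  fi
  printf '%s\n' "$matches"
}

# Canned list for illustration; in a session the list would come from
# jq -r '.[].hostname' <<<"$AGENTS"
printf 'CS-SERVER\nCS-LAPTOP\nAD2\n' | resolve_unique "ad2"   # prints: AD2
```

A return code of 2 is the cue to show the candidates to the user instead of dispatching.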
Agent list display
WEST-MEADOW-9025 macos online Scileppi Law last: 2026-05-28T18:00:00Z v0.3.4
CS-SERVER windows online Cascades of Tucson last: 2026-05-28T17:55:00Z v0.3.4
AD2 windows offline ACG Internal last: 2026-05-27T09:10:00Z v0.3.2
Show: hostname, os_type, online/offline, client_name (from site_name/client_name), last_seen, agent_version.
Send a command
Determine command_type from os_type
| os_type | Default command_type | Notes |
|---|---|---|
| windows | powershell | Runs via powershell.exe -NonInteractive as SYSTEM |
| linux | shell | Runs via /bin/bash |
| macos | shell | Runs via /bin/bash (or /bin/zsh on newer installs) |
Use python only when explicitly writing a Python script. Use script for saved scripts (not covered in this skill).
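The os_type mapping can be captured in a small helper so the shell is never inferred from the hostname. A minimal sketch; command_type_for_os is a hypothetical name, and it covers only the three os_type values listed in this skill.

```shell
# command_type_for_os <os_type>: prints the default command_type,
# or fails for any value not listed in the table.
command_type_for_os() {
  case "$1" in
    windows)     echo "powershell" ;;
    linux|macos) echo "shell" ;;
    *)
      echo "[ERROR] Unknown os_type '$1'; check the agent record" >&2
      return 1
      ;;
  esac
}

command_type_for_os "macos"   # prints: shell
```

In a session this would typically be called as CMD_TYPE=$(command_type_for_os "$AGENT_OS") right after the agent is resolved.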
Basic dispatch
# For PowerShell (Windows)
CMD_RESP=$(curl -s -X POST "$RMM/api/agents/$AGENT_ID/command" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
--data-binary @- <<'JSON'
{
"command_type": "powershell",
"command": "Get-ComputerInfo | Select-Object CsName, OsName, OsVersion, CsProcessors",
"timeout_seconds": 60
}
JSON
)
# For shell (Linux/macOS)
CMD_RESP=$(curl -s -X POST "$RMM/api/agents/$AGENT_ID/command" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
--data-binary @- <<'JSON'
{
"command_type": "shell",
"command": "df -h / && uptime",
"timeout_seconds": 60
}
JSON
)
CMD_ID=$(echo "$CMD_RESP" | jq -r '.command_id // empty')
CMD_STATUS=$(echo "$CMD_RESP" | jq -r '.status // empty')
if [ -z "$CMD_ID" ]; then
echo "[ERROR] Dispatch failed: $CMD_RESP"
exit 1
fi
echo "[OK] Command dispatched — id=$CMD_ID, initial_status=$CMD_STATUS"
Initial status values:
- "running": agent is online and received the command
- "pending": agent is offline; command queued
Multi-line scripts (heredoc with variable interpolation)
For scripts that need shell variable values substituted at the point of the curl call:
# jq -n --arg JSON-encodes the script text safely, so no heredoc is needed for this payload
# (when a heredoc payload must interpolate ${VARS}, use unquoted <<JSON instead)
SCRIPT='
$path = "C:\Users\$env:USERNAME\Downloads"
Write-Host "Size: $((Get-ChildItem $path -Recurse | Measure-Object Length -Sum).Sum / 1GB) GB"
'
PAYLOAD=$(jq -n \
--arg ct "powershell" \
--arg cmd "$SCRIPT" \
--argjson to 120 \
'{command_type: $ct, command: $cmd, timeout_seconds: $to}')
CMD_RESP=$(curl -s -X POST "$RMM/api/agents/$AGENT_ID/command" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
--data-binary "$PAYLOAD")
jq -n --arg cmd "$SCRIPT" correctly JSON-encodes the script including backslashes, dollar signs, and newlines. Always use this pattern for multi-line scripts — never embed raw script text into JSON by hand.
Run as logged-on user (context: user_session)
Use when the command requires a desktop session: GUI actions, Clear-RecycleBin, VPN cmdlets that need a user token, interactive-only software.
CMD_RESP=$(curl -s -X POST "$RMM/api/agents/$AGENT_ID/command" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
--data-binary @- <<'JSON'
{
"command_type": "powershell",
"command": "Get-WMIObject Win32_UserProfile | Where-Object { $_.Special -eq $false } | Select-Object LocalPath",
"timeout_seconds": 30,
"context": "user_session"
}
JSON
)
user_session requirements and limits:
- Requires an active (non-locked) desktop session — no user logged in = command runs but produces empty output or silently no-ops
- On Windows: runs under the WTS-impersonated token of the currently active user. If an admin is logged in, the command runs elevated; if a standard user, it does not.
- On Linux/macOS: the agent uses sudo -u <active_user> or su; verify that the agent implementation supports this on the target platform before relying on it
- Clear-RecycleBin, Set-VpnConnection -L2tpPsk, and similar COM/interactive-shell cmdlets require user_session
- Clear-RecycleBin -Force silently no-ops as SYSTEM, even with user_session, if no desktop is present; enumerate C:\$Recycle.Bin\<SID>\* directly when running as SYSTEM
Preview before dispatch (for /rmm run)
When the user requests an interactive run without a pre-written script, show a preview before dispatching:
COMMAND PREVIEW
---------------
Agent: WEST-MEADOW-9025 (macos) — online
Type: shell
Context: system
Timeout: 120s
Script:
find /Users/sylvia/Library -name "*.plist" -maxdepth 3
Dispatch? (yes/no)
Wait for explicit confirmation before dispatching any command.
Poll for completion
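poll_command() {
  local cmd_id="$1"
  local max_polls="${2:-60}"
  local interval=5
  local count=0
  local result="" status=""
  while [ "$count" -lt "$max_polls" ]; do
    result=$(curl -s "$RMM/api/commands/$cmd_id" \
      -H "Authorization: Bearer $TOKEN")
    status=$(echo "$result" | jq -r '.status // empty')
    case "$status" in
      completed|failed|cancelled|interrupted)
        # Terminal state: print the full record on stdout for the caller
        echo "$result"
        return 0
        ;;
      running|pending)
        count=$((count + 1))
        sleep "$interval"
        ;;
      "")
        echo "[ERROR] Empty status - response: $result" >&2
        return 1
        ;;
      *)
        # Unknown status: fail fast instead of looping forever without sleeping
        echo "[ERROR] Unexpected status '$status' - response: $result" >&2
        return 1
        ;;
    esac
  done
  # Diagnostics go to stderr so RESULT=$(poll_command ...) captures only the JSON record
  echo "[WARNING] Polling timeout after $((max_polls * interval))s - last status: $status" >&2
  echo "[INFO] Command $cmd_id may still be running. Use /rmm status $cmd_id to check later." >&2
  return 2
}
RESULT=$(poll_command "$CMD_ID")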
Polling cadence:
- Default: check every 5s for up to 5 minutes (60 polls)
- For long-running scripts (software installs, disk scans): pass a higher max_polls or remind the user to check with /rmm status later
- Never spam polls at < 3s intervals
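For long-running scripts, max_polls can be derived from the command's timeout_seconds rather than guessed. The sketch below is one possible rule, not something the server prescribes: polls_for_timeout is a hypothetical helper, and the 30-second grace period is an assumption meant to leave room for the server-side reaper to mark the command failed.

```shell
# polls_for_timeout <timeout_seconds> [interval_seconds] [grace_seconds]
# Prints how many polls are needed to cover timeout + grace, rounding up.
polls_for_timeout() {
  local timeout="$1" interval="${2:-5}" grace="${3:-30}"
  if [ "$interval" -lt 3 ]; then
    echo "[ERROR] interval must be >= 3s" >&2
    return 1
  fi
  echo $(( (timeout + grace + interval - 1) / interval ))
}

polls_for_timeout 300         # prints: 66  (300s timeout + 30s grace at 5s intervals)
polls_for_timeout 1800 10 60  # prints: 186
```

The result is meant to be passed as the second argument of poll_command, which polls at a fixed 5-second interval.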
Display command output
display_output() {
local result="$1"
local status=$(echo "$result" | jq -r '.status')
local exit_code=$(echo "$result" | jq -r '.exit_code // "—"')
local stdout=$(echo "$result" | jq -r '.stdout // ""')
local stderr=$(echo "$result" | jq -r '.stderr // ""')
echo "Status: $status (exit $exit_code)"
if [ -n "$stdout" ]; then
echo "--- stdout ---"
echo "$stdout"
fi
if [ -n "$stderr" ]; then
echo "--- stderr ---"
echo "$stderr"
fi
}
Always show stderr on non-zero exit — PowerShell writes errors to stderr even when the command partially succeeds. Never suppress stderr output.
exit_code = null on status = "pending" or status = "interrupted" — the command never ran or was orphaned. Do not treat null exit code as exit 0.
status = "failed" does NOT always mean a script bug:
- Check stderr for "Command timed out (server-side reaper)" → timeout, increase timeout_seconds
- Check stderr for "Agent restarted during execution" → interrupted, re-run
- Only if stderr has actual script errors is it a script issue
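The triage above can be expressed as a small classifier over the stderr text. A minimal sketch; classify_failure is a hypothetical helper, and it matches only the two stderr strings quoted in this section.

```shell
# classify_failure <stderr text>: prints timeout | interrupted | script
classify_failure() {
  case "$1" in
    *"Command timed out (server-side reaper)"*) echo "timeout" ;;
    *"Agent restarted during execution"*)       echo "interrupted" ;;
    *)                                          echo "script" ;;
  esac
}

classify_failure "Command timed out (server-side reaper)"   # prints: timeout
```

In a session the argument would come from the polled record, for example classify_failure "$(echo "$RESULT" | jq -r '.stderr // ""')".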
Cancel a command
CANCEL_RESP=$(curl -s -X POST "$RMM/api/commands/$CMD_ID/cancel" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json")
# Response: {"status": "cancelled", "message": "Command cancelled"}
- Only cancels pending or running commands
- If already completed, failed, or cancelled: server returns 400 "Command already finished"
- Cancellation of a running command sends a WebSocket cancel message to the agent; the agent may not honour it immediately
List command history
# All recent commands (admin only)
curl -s "$RMM/api/commands?limit=20" -H "Authorization: Bearer $TOKEN" | \
jq '[.[] | {id, agent_id, command_type, status, exit_code, created_at, command_text: (.command_text[:60])}]'
# Agent-scoped (works for non-admin)
curl -s "$RMM/api/commands?agent_id=$AGENT_ID&limit=10" -H "Authorization: Bearer $TOKEN" | \
jq '[.[] | {id, status, exit_code, command_type, created_at, preview: (.command_text[:80])}]'
Verified response shapes (from source)
POST /api/auth/login
{"token": "eyJ..."}
POST /api/agents/{id}/command
{
"command_id": "uuid",
"status": "running",
"message": "Command sent to agent"
}
or if agent offline:
{
"command_id": "uuid",
"status": "pending",
"message": "Agent is offline. Command queued for execution when agent reconnects."
}
GET /api/commands/{id}
{
"id": "uuid",
"agent_id": "uuid",
"command_type": "powershell",
"command_text": "...",
"status": "completed",
"exit_code": 0,
"stdout": "...",
"stderr": "",
"created_at": "ISO-8601",
"started_at": "ISO-8601",
"completed_at": "ISO-8601",
"created_by": "uuid",
"timeout_seconds": 300,
"context": "system"
}
Note: the response field is command_text, not command. Parsing .command will return null.
GET /api/agents (array)
[{
"id": "uuid",
"hostname": "WEST-MEADOW-9025",
"os_type": "macos",
"os_version": "14.5",
"os_name": "macOS",
"agent_version": "0.3.4",
"last_seen": "ISO-8601",
"status": "online",
"is_connected": true,
"device_id": "...",
"site_id": "uuid",
"site_name": "Main Office",
"client_name": "Scileppi Law",
"maintenance_mode": false,
"maintenance_mode_note": null,
"update_channel": null
}]
POST /api/commands/{id}/cancel
{"status": "cancelled", "message": "Command cancelled"}
All command status values
| Status | Meaning |
|---|---|
| pending | Queued; agent offline |
| running | Agent executing |
| completed | Finished; exit_code set |
| failed | Non-zero exit OR timed out (check stderr) |
| cancelled | Cancelled via API |
| interrupted | Agent restarted mid-run |
Platform-specific patterns
Windows — PowerShell as SYSTEM
# Disk usage
Get-PSDrive -PSProvider FileSystem | Select-Object Name, @{n='Used(GB)';e={[math]::Round($_.Used/1GB,2)}}, @{n='Free(GB)';e={[math]::Round($_.Free/1GB,2)}}
# Services (a specific one)
Get-Service -Name "servicename" | Select-Object Name, Status, StartType
# Running processes (top by CPU)
Get-Process | Sort-Object CPU -Descending | Select-Object -First 10 Name, Id, CPU, WorkingSet
# Event log errors (last 24h); Level 2 = Error, filtered at the source so -MaxEvents caps errors only
Get-WinEvent -FilterHashtable @{ LogName = 'System'; Level = 2; StartTime = (Get-Date).AddHours(-24) } -MaxEvents 50 -ErrorAction SilentlyContinue | Select-Object TimeCreated, Id, Message
# Recycle bin contents (correct way — Clear-RecycleBin fails as SYSTEM)
Get-ChildItem "C:\`$Recycle.Bin" -Recurse -Force -ErrorAction SilentlyContinue | Select-Object FullName, Length
# User sessions (who is logged on)
query user
# Installed software
Get-ItemProperty HKLM:\Software\Microsoft\Windows\CurrentVersion\Uninstall\* | Select-Object DisplayName, DisplayVersion, Publisher | Sort-Object DisplayName
Common Windows gotchas:
- Clear-RecycleBin -Force silently no-ops in SYSTEM context; use direct C:\$Recycle.Bin\<SID>\* enumeration
- $env:USERNAME as SYSTEM = "SYSTEM" (not the logged-on user); use query user or WMI to find the actual logged-on username
- Get-Date is UTC-aware; use -Format "yyyy-MM-dd HH:mm:ss" for readable output
- PowerShell Write-Host outputs to stdout; Write-Error outputs to stderr
- Paths with spaces need quoting: "C:\Program Files\..."
- $ErrorActionPreference = 'Stop' makes the script fail on first error (useful for idempotent scripts); without it, PowerShell continues past non-terminating errors
macOS — shell as root
# Disk usage
df -h /
# Find large files (size in KiB first, largest at the top)
find /Users -maxdepth 4 -type f -size +100M -exec du -k {} + 2>/dev/null | sort -rn | head -20
# Logged-on user (who is using the GUI session)
stat -f "%Su" /dev/console
# LaunchDaemons check
ls /Library/LaunchDaemons/
# LaunchAgents for a user
ls /Users/sylvia/Library/LaunchAgents/ 2>/dev/null
# Remove a plist and unload (system scope)
launchctl bootout system /Library/LaunchDaemons/com.example.plist 2>/dev/null
rm /Library/LaunchDaemons/com.example.plist
# Mount AFP share (uses Keychain credentials)
osascript -e 'mount volume "afp://SL-SERVER._afpovertcp._tcp.local/Data"'
# File write via base64 (python3 is an Xcode stub on macOS without dev tools)
echo "BASE64ENCODED==" | /usr/bin/base64 -D > /path/to/file
chmod +x /path/to/file
# Strip macOS ACL that prevents directory deletion
chmod -a "group:everyone deny delete" /path/to/dir
Common macOS gotchas:
- /usr/bin/python3 on macOS without Xcode CLI tools is a stub that triggers an installer dialog; use /usr/bin/base64 -D (capital D: BSD flag, not GNU) for file writing
- nohup cmd & in agent shell context fails with nohup: can't detach from console: Inappropriate ioctl for device; use a LaunchDaemon (launchctl bootstrap system <plist>) for truly detached background processes
- Home directory subdirectories may have ACL 0: group:everyone deny delete; strip with chmod -a "group:everyone deny delete" before rmdir
- pgrep rsync matches "colorsyncd" as a substring; use pgrep -f "rsync.*specific-path" for precision
- launchctl bootstrap gui/501 <plist> targets the user session for UID 501 (first user); find UID with id -u sylvia
- AFP automount via osascript uses Keychain; works only in user session context
Linux — shell as root
# Disk usage
df -h
# Memory
free -h
# Service status (systemd)
systemctl status servicename --no-pager -l
# Journal (last 50 lines of a service)
journalctl -u servicename -n 50 --no-pager
# Find large files (size in KiB first, largest at the top)
find / -xdev -type f -size +100M -exec du -k {} + 2>/dev/null | sort -rn | head -20
# Open ports
ss -tlnp
Investigate → script → dispatch workflow
When the user asks for something that requires investigation before writing a script:
- List agents: confirm target, check online status
- Characterize the endpoint: run a quick recon command (OS version, disk, services) if the context is unknown
- Write the script: draft it inline, show to user
- Preview: show agent + script before dispatch
- Dispatch: send with appropriate command_type and context
- Poll: wait for completion, display output
- Bot alert: post one-line summary
For complex multi-step remediation (move files, install software, configure services), dispatch one logical step at a time. Confirm output before proceeding to the next step.
Error handling
| HTTP / Condition | Meaning | Recovery |
|---|---|---|
| 401 Unauthorized | JWT expired or invalid | Re-authenticate (Phase 0) |
| 404 Not Found | Agent or command UUID not found | Re-run GET /api/agents to refresh list |
| 400 "Command already finished" | Tried to cancel a done command | No action needed |
| 403 Forbidden | Non-admin calling fleet-wide endpoint | Add ?agent_id=<uuid> filter |
| status = "failed", stderr = "Command timed out" | Timeout | Increase timeout_seconds and re-run |
| status = "interrupted" | Agent restarted | Re-run the command |
| status = "pending" (stuck) | Agent offline for extended period | Notify user; do not cancel unless user confirms |
| command_id = null or empty | Dispatch failed | Check response body for error message |
| jq parse error on agent list | Control characters in field | Use grep -o '"hostname":"[^"]*"' for simple extraction |
Post to #dev-alerts (after every write)
Post after every successful dispatch or cancel. Never post for read-only operations. RMM alerts route to the private #dev-alerts channel automatically (the [RMM] prefix triggers auto-routing in post-bot-alert.sh); no channel argument needed.
ALERT_OUT=$(bash "$REPO_ROOT/.claude/scripts/post-bot-alert.sh" "<message>")
echo "$ALERT_OUT"
Message format:
[RMM] <Tech> dispatched to <hostname> (<os_type>) - <brief description> -> cmd:<command_id_short>
[RMM] <Tech> cancelled cmd <command_id_short> on <hostname>
<command_id_short> = first 8 chars of the UUID.
Examples:
bash "$REPO_ROOT/.claude/scripts/post-bot-alert.sh" \
"[RMM] Mike dispatched to WEST-MEADOW-9025 (macos) - disk cleanup AFP symlink -> cmd:3f8a1c2e"
bash "$REPO_ROOT/.claude/scripts/post-bot-alert.sh" \
"[RMM] Howard ran disk scan on CS-SERVER (windows) - exit 0, 14 GB freed -> cmd:9b4d7e01"
ASCII only — no Unicode dashes or arrows. Use - and ->.
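The message format and the ASCII-only rule can be enforced before anything is handed to post-bot-alert.sh. A minimal sketch; short_id, is_ascii and dispatch_alert are hypothetical helpers, and the UUID in the example is made up.

```shell
# short_id <uuid>: first 8 characters of the command UUID
short_id() { printf '%s' "${1:0:8}"; }

# is_ascii <text>: succeeds only if every byte is printable ASCII (space through ~)
is_ascii() {
  ! printf '%s' "$1" | LC_ALL=C grep -q '[^ -~]'
}

# dispatch_alert <tech> <hostname> <os_type> <description> <command_id>
# Prints the alert line, or fails if it contains non-ASCII characters.
dispatch_alert() {
  local msg
  msg="[RMM] $1 dispatched to $2 ($3) - $4 -> cmd:$(short_id "$5")"
  if ! is_ascii "$msg"; then
    echo "[ERROR] Alert contains non-ASCII characters: $msg" >&2
    return 1
  fi
  printf '%s\n' "$msg"
}

# Illustrative UUID only
dispatch_alert "Mike" "CS-SERVER" "windows" "disk usage check" "3f8a1c2e-0000-4000-8000-000000000000"
# prints: [RMM] Mike dispatched to CS-SERVER (windows) - disk usage check -> cmd:3f8a1c2e
```

The printed line can then be passed as the single argument to post-bot-alert.sh.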
Client / Site onboarding (/rmm onboard)
Provision a new client and its first site, then store the agent enrollment key in the vault. The site api_key is shown only once in the create response — capture it before anything else.
Verified API shapes (from server/src/api/clients.rs + sites.rs):
- POST /api/clients body {"name": "...", "code"?: "...", "notes"?: "..."} → {"id", "name", "code", "is_active", "site_count", ...}. partner_id is assigned server-side (default partner); do NOT send it.
- POST /api/sites body {"client_id": "<uuid>", "name": "...", "address"?: "...", "notes"?: "..."} → {"site": {"id", "site_code", ...}, "api_key": "grmm_...", "message"}. site_code is server-generated (generate_unique_site_code, e.g. GREEN-FALCON-7214); never supply it.
- If the key is lost: POST /api/sites/:id/regenerate-key issues a new one (invalidates the old).
Workflow:
# (Phase 0 bootstrap done → $TOKEN, $RMM, $REPO_ROOT)
NAME="Rednour Law Offices"; SITE="Main"
SLUG="rednour" # lowercase, no spaces/hyphens — matches existing vault convention
# 1. Guard against a duplicate client
curl -s "$RMM/api/clients" -H "Authorization: Bearer $TOKEN" \
| jq -r --arg n "$NAME" '.[] | select(.name|ascii_downcase==($n|ascii_downcase)) | "EXISTS id=\(.id)"'
# 2. Create client (jq -n JSON-encodes the name, so quotes in it cannot break the payload)
CID=$(jq -n --arg name "$NAME" '{name: $name}' | curl -s -X POST "$RMM/api/clients" \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
--data-binary @- | jq -r '.id // empty')
[ -n "$CID" ] || { echo "[ERROR] Client creation failed"; exit 1; }
# 3. Create site: capture the ONE-TIME api_key immediately to a file
jq -n --arg cid "$CID" --arg name "$SITE" '{client_id: $cid, name: $name}' | curl -s -X POST "$RMM/api/sites" \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
--data-binary @- > /tmp/site.json
SID=$(jq -r '.site.id' /tmp/site.json); SCODE=$(jq -r '.site.site_code' /tmp/site.json); AKEY=$(jq -r '.api_key' /tmp/site.json)
# 4. Vault the enrollment key (mirror existing clients/<slug>/gururmm-site-main.sops.yaml structure)
VR=$(jq -r '.vault_path' "$REPO_ROOT/.claude/identity.json"); T="$VR/clients/$SLUG/gururmm-site-main.sops.yaml"
mkdir -p "$(dirname "$T")"
cat > "$T" <<YAML
client: $NAME
site: $SITE
created: "$(date +%F)"
credentials:
client_id: "$CID"
site_id: "$SID"
site_code: "$SCODE"
api_key: "$AKEY"
installer_url: "https://rmm.azcomputerguru.com/install/$SCODE"
msi_url: "https://rmm.azcomputerguru.com/api/sites/$SID/installer"
YAML
sops --config "$VR/.sops.yaml" --encrypt --in-place "$T" # encrypt (see gotchas)
bash "$REPO_ROOT/.claude/scripts/vault.sh" get "clients/$SLUG/gururmm-site-main.sops.yaml" >/dev/null # verify round-trip
git -C "$VR" add "clients/$SLUG/gururmm-site-main.sops.yaml" && git -C "$VR" commit -q -m "add: $NAME GuruRMM site $SITE enrollment key ($SCODE)" && git -C "$VR" push -q
Vault-encryption gotchas (learned 2026-05-29 onboarding Rednour):
- sops --encrypt needs the vault's .sops.yaml. From outside the vault dir it errors with config file not found; always pass --config "$VR/.sops.yaml". Encryption uses only the public age keys, so no private key / SOPS_AGE_KEY_FILE is required for the encrypt step.
- Quote all date/timestamp values (created: "2026-05-29"). A bare YAML date makes sops 3.7.3 fail with Cannot walk value, unknown type: time.Time.
- Put every secret under the credentials: block; the vault encrypted_regex covers credentials|password|secret|api_key|token|.... Fields outside it (client, site, created) stay plaintext as searchable metadata.
- If --encrypt --in-place fails, the file is left plaintext on disk. Fix and re-encrypt (or delete) immediately; never commit it unencrypted. Confirm with grep -c 'ENC\[' "$T" (should be > 0).
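The last gotcha can be turned into a guard that runs before git add. A minimal sketch; assert_encrypted is a hypothetical helper, and the ENC[...] string in the demo is a shortened stand-in, not real sops output.

```shell
# assert_encrypted <file>: succeeds only if the file contains at least one sops ENC[ marker
assert_encrypted() {
  local file="$1" hits
  # grep -c prints 0 and exits non-zero when nothing matches; keep the count either way
  hits=$(grep -c 'ENC\[' "$file" 2>/dev/null || true)
  if [ "${hits:-0}" -gt 0 ]; then
    echo "[OK] $file has $hits encrypted value(s)"
  else
    echo "[ERROR] $file looks like plaintext; do not commit it" >&2
    return 1
  fi
}

# Demo against a throwaway file
demo=$(mktemp)
printf 'credentials:\n    api_key: ENC[AES256_GCM,data:abc,type:str]\n' > "$demo"
assert_encrypted "$demo"
rm -f "$demo"
```

In the onboarding workflow it would sit between the sops call and the commit, for example assert_encrypted "$T" && git -C "$VR" add ... so a failed encryption stops the commit.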
Report to the user + bot alert: client_id, site_id, site_code, install page https://rmm.azcomputerguru.com/install/<SCODE>, MSI https://rmm.azcomputerguru.com/api/sites/<SID>/installer, and the vault path. Onboarding is a write → post a [RMM] <tech> onboarded client '<name>' + site '<site>' (<SCODE>) bot alert.
Onboarding diagnostic (/rmm diagnose)
Run a one-shot security + health + inventory probe against a newly onboarded Windows agent and produce a prioritized "take this seriously" report plus an immutable before/after baseline. This is the Phase 1 tooling implementation; a native GuruRMM feature (DB-backed storage, scheduled re-baselines) is Phase 3.
Runner: .claude/scripts/run-onboarding-diagnostic.sh <hostname|uuid> [client-slug]
Probe: .claude/scripts/onboarding-diagnostic.ps1 (Windows PowerShell 5.1, ASCII, runs as SYSTEM)
bash "$REPO_ROOT/.claude/scripts/run-onboarding-diagnostic.sh" FrontDeskReception rednour
If no client-slug is given, it is derived by slugifying the agent's client_name.
Workflow
- Authenticate to RMM (vault creds, same as the rest of this skill).
- Resolve the agent (exact UUID, exact hostname, then partial). Windows-only.
- Upload the probe to the endpoint base64-encoded, in <24 KB chunks (the agent caps an inline command body at ~32-40 KB; the probe is ~60 KB), then a final small command decodes it to a .ps1, runs it, and deletes both temp files. Every check in the probe is wrapped in try/catch, so one failing check becomes an unknown-severity finding instead of aborting the probe.
- The probe emits a single JSON object fenced by ===DIAG-JSON-START=== / ===DIAG-JSON-END===; the runner extracts it from between the markers.
- Grade, write two baseline files, diff against any prior baseline, alert.
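The chunked upload and the marker extraction in the steps above can be sketched locally as follows. This is an illustration of the idea, not the runner's actual code: chunk_b64, extract_diag_json and the 24000-character chunk size are assumptions, and the extraction assumes each marker sits on its own line.

```shell
CHUNK_CHARS=24000   # assumed size: under 24 KB, and a multiple of 4 so chunks split on base64 boundaries

# chunk_b64 <file> <outdir>: base64-encodes the file and writes
# outdir/chunk_000, chunk_001, ... Prints the number of chunks written.
chunk_b64() {
  local file="$1" outdir="$2" b64 off=0 n=0
  b64=$(base64 < "$file" | tr -d '\n')   # strip line wraps so chunks are contiguous
  mkdir -p "$outdir"
  while [ "$off" -lt "${#b64}" ]; do
    printf '%s' "${b64:off:CHUNK_CHARS}" > "$(printf '%s/chunk_%03d' "$outdir" "$n")"
    off=$((off + CHUNK_CHARS))
    n=$((n + 1))
  done
  echo "$n"
}

# extract_diag_json: reads probe stdout on stdin and prints only the lines
# between the START and END markers.
extract_diag_json() {
  awk '/===DIAG-JSON-START===/ {inside = 1; next}
       /===DIAG-JSON-END===/   {inside = 0}
       inside'
}

# Demo with a stand-in payload
printf 'banner\n===DIAG-JSON-START===\n{"sample": 1}\n===DIAG-JSON-END===\n' | extract_diag_json
# prints: {"sample": 1}
```

A ~60 KB probe encodes to roughly 80 KB of base64, so at this chunk size it would go out as four commands followed by the final decode-and-run command.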
Grade model
| Grade | Meaning |
|---|---|
| RED | At least one critical finding |
| AMBER | At least one warning, no critical |
| GREEN | No critical and no warning |
unknown-severity findings (a check that failed to run) do not change the grade
but are listed in the report for manual follow-up.
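The grade model reduces to two counts. A minimal sketch; grade_for is a hypothetical helper, and how the runner actually counts findings is not shown in this skill.

```shell
# grade_for <critical_count> <warning_count>: prints RED, AMBER, or GREEN.
# unknown-severity findings are deliberately not an input: they never change the grade.
grade_for() {
  local critical="$1" warning="$2"
  if [ "$critical" -gt 0 ]; then
    echo "RED"
  elif [ "$warning" -gt 0 ]; then
    echo "AMBER"
  else
    echo "GREEN"
  fi
}

grade_for 0 3   # prints: AMBER
```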
What it checks
- Security: Defender state (RTP/service/signature age/tamper), 3rd-party AV conflicts, leftover competitor RMM / remote-access agents (ScreenConnect, NinjaRMM, Datto, Atera, Kaseya, TeamViewer, AnyDesk, Splashtop, N-able, Syncro, Action1, Automate, LogMeIn), firewall profiles, BitLocker (laptop-aware), local admins / built-in Administrator / non-expiring passwords, patch posture + OS EOL, RDP/NLA, SMBv1, UAC, LAPS.
- Health: disk free %, SMART/physical-disk health, 14-day stability (unexpected shutdown / BSOD / disk errors), pending reboot, uptime, failed auto-start services, domain secure channel, time source, battery (laptops), backup-agent presence.
- Inventory baseline (info): model/serial, CPU/RAM, BIOS, TPM, Secure Boot, OS edition/build/activation, full installed-software list, local users/groups, network, scheduled tasks + Run-key autoruns.
Where baselines are stored
clients/<slug>/onboarding-baselines/ — two files per run, both timestamped
<HOST>-<UTC-YYYYMMDDTHHMMSS>:
- *.json: the raw immutable snapshot (do not edit; it is the source of truth for diffs).
- *.md: the human report, with grade, findings grouped critical -> warning -> info -> unknown, inventory summary, and a diff section vs the most recent prior baseline (new / resolved / regressed findings, software added/removed).
Baselines are immutable and append-only. GuruRMM-DB storage of baselines arrives with the Phase 3 native feature.
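Append-only storage can be approximated in shell by building a timestamped name and refusing to overwrite. This is a sketch of the idea, not the runner's immutability guard: baseline_stem and write_once are hypothetical helpers, and the JSON content is a placeholder.

```shell
# baseline_stem <hostname>: prints <HOST>-<UTC-YYYYMMDDTHHMMSS>
baseline_stem() {
  printf '%s-%s\n' "$1" "$(date -u +%Y%m%dT%H%M%S)"
}

# write_once <path>: writes stdin to path, refusing to overwrite an existing file.
write_once() {
  local path="$1"
  # noclobber makes '>' fail if the file already exists; the subshell keeps the option local
  ( set -o noclobber; cat > "$path" ) 2>/dev/null || {
    echo "[ERROR] Could not create $path (it may already exist)" >&2
    return 1
  }
}

# Demo in a throwaway directory
dir=$(mktemp -d)
target="$dir/$(baseline_stem "CS-SERVER").json"
echo '{"sample": true}'  | write_once "$target"                      # first write succeeds
echo '{"sample": false}' | write_once "$target" || echo "second write rejected"
rm -rf "$dir"
```

Because the timestamp has one-second resolution, two runs for the same host within the same second would collide, and the second would be rejected rather than silently replacing the first.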
Alerting
Each CRITICAL finding and a RED overall grade auto-post a one-line
[RMM] alert to #dev-alerts via post-bot-alert.sh (ASCII only, soft-fail).
Example: [RMM] Onboarding diag <HOST> (<client>) = RED: <n> critical - <titles>.
Known enrolled agents (verify with GET /api/agents — UUIDs change on re-enroll)
Do not use this table as authoritative — always resolve live. Treat as a starting hint only.
| Hostname | Client | OS |
|---|---|---|
| WEST-MEADOW-9025 | Scileppi Law | macOS |
| CS-SERVER | Cascades of Tucson | Windows |
| DESKTOP-DLTAGOI | Cascades LE | Windows |
| AD2 | ACG Internal | Windows |
What NOT to use RMM for
- Permanent file deletion without user confirmation: always show what will be deleted, get a yes
- Credential harvesting: do not dump password stores, SAM hive, or credential manager contents
- Software installation without stating what will be installed: always preview the installer command
- Bulk destructive operations (rm -rf /, format, Remove-Item -Recurse C:\Windows): require explicit written confirmation from the user in chat, not just "yes" to a vague preview
- RMM product development: use the Coding Agent and GuruRMM project context instead