Files

Mike Swanson df9be01065 feat(rmm): onboarding diagnostic (Phase 1) - probe + triage + baseline

/rmm diagnose: dispatches a Windows security/health probe to a newly onboarded
agent, grades RED/AMBER/GREEN, writes an immutable per-client baseline
(clients/<slug>/onboarding-baselines/), diffs vs prior, and alerts CRITICALs to
#dev-alerts. Probe is PS5.1/ASCII/SYSTEM-safe, never-abort, base64 chunked upload
around the agent command-size cap. Code-reviewed (no blockers); folded in
immutability guard, severity-independent finding ids, Defender-unknown sentinel,
expanded competitor/backup detection.

First baselines captured: Rednour FRONTDESKRECEPT + LEGALASST (both RED - prior
MSP ScreenConnect/Splashtop/Syncro still live; LEGALASST OS EOL).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-05-29 13:09:11 -07:00

32 KiB

Raw Blame History

description

description
Run commands, investigate agents, and execute remote scripts via the GuruRMM agent fleet.

/rmm — GuruRMM remote command execution

Interact with the GuruRMM agent fleet: list agents, run remote commands (PowerShell, shell, Python), poll for output, and cancel in-flight tasks.

Default posture: read before write. Always look up the agent by hostname before dispatching a command. Never guess a UUID.

Usage

/rmm                                    List all agents with connection status
/rmm agents [<client>]                  List agents, optionally filtered by client/site name
/rmm agent <hostname|uuid>              Show agent detail + last 10 commands
/rmm run <hostname|uuid>                Interactively compose and dispatch a script
/rmm shell <hostname|uuid> <cmd>        One-liner bash/sh command (Linux/macOS agents)
/rmm ps <hostname|uuid> <cmd>           One-liner PowerShell command (Windows agents)
/rmm status <command_id>                Check command status
/rmm output <command_id>                Fetch full stdout + stderr for a completed command
/rmm cancel <command_id>                Cancel a pending or running command
/rmm history <hostname|uuid> [N]        Recent command history (default 10, max 500)
/rmm onboard <client name> [site]       Create a new client + site, vault the one-time enrollment key
/rmm diagnose <hostname|uuid> [client]   Run onboarding health/security diagnostic + baseline

API configuration

Base URL: http://172.16.3.30:3001 Auth: JWT (24-hour expiry) — obtain via POST /api/auth/login, pass as Authorization: Bearer <token> Credentials vault path: infrastructure/gururmm-server.sops.yaml

credentials.gururmm-api.admin-email
credentials.gururmm-api.admin-password

Hard rules (incidents have occurred — no exceptions)

Never hardcode a UUID. Agent UUIDs change when an agent re-enrolls. Always resolve hostname → UUID via GET /api/agents at the start of every workflow. The UUID from memory or a prior session may be stale.

Never guess the shell from the hostname. Use os_type from the agent record: "windows" → powershell, "linux" → shell, "macos" → shell. A machine named "WEST-MEADOW-9025" could be macOS. Check os_type.

Initial status "pending" is not a failure. It means the agent is offline and the command is queued. The agent will execute it when it reconnects. Poll patiently; do not cancel and retry.

Status "failed" with stderr = "Command timed out (server-side reaper)" means the command exceeded its timeout. The server reaper transitions running → failed rather than introducing a separate timeout status. Read stderr before concluding the script had a bug.

Status "interrupted" means the agent restarted mid-execution. The command was orphaned. The server marks it interrupted when the agent reconnects. Re-run if needed.

context: "user_session" requires an active logged-on user. If no interactive user is signed in, the WTS impersonation call will fail. Check whether a user is logged in before choosing this context. Falls back silently to no output if no desktop session exists.

No interactive input. Agent shells are non-interactive. Scripts must not call Read-Host, pause, read -p, or any prompt. Pass all values as variables or parameters in the script body.

Output is streamed in full but stored as a string. For very large output (log files, directory trees), write the output to a file on the endpoint and fetch it with a second command.

elevated: true sends the flag to the agent but does not guarantee elevation. On Windows the agent is already running as SYSTEM; elevated is a hint for future agent behaviour. Do not rely on it for privilege escalation on endpoints where the agent is not already elevated.

Heredoc payloads — always use --data-binary @- and <<'JSON' for static payloads. On Windows, the Write tool and Git Bash resolve /tmp/foo.json to different directories. Heredoc avoids the file handoff entirely. Use <<'JSON' (single-quoted) for static payloads that contain no shell variables; use <<JSON (unquoted) when interpolating ${VARS}.

After every dispatch: post a one-line alert. RMM alerts go to the private #dev-alerts (Howard + Mike) — post-bot-alert.sh auto-routes any [RMM]/[DEPLOY]-prefixed message there. Write operations (dispatch + cancel) get an alert; read operations (list, status, output) do not.

Phase 0 — Bootstrap (run once per session)

IDENTITY_PATH="${HOME}/.claude/identity.json"
if [ ! -f "$IDENTITY_PATH" ]; then
    IDENTITY_PATH=$(git rev-parse --show-toplevel 2>/dev/null)/.claude/identity.json
fi
REPO_ROOT=$(jq -r '.claudetools_root // empty' "$IDENTITY_PATH" 2>/dev/null)
if [ -z "$REPO_ROOT" ]; then
    REPO_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
fi
VAULT="$REPO_ROOT/.claude/scripts/vault.sh"
RMM="http://172.16.3.30:3001"

RMM_EMAIL=$(bash "$VAULT" get-field infrastructure/gururmm-server.sops.yaml credentials.gururmm-api.admin-email)
RMM_PASS=$(bash "$VAULT" get-field infrastructure/gururmm-server.sops.yaml credentials.gururmm-api.admin-password)

JWT=$(curl -s -X POST "$RMM/api/auth/login" \
  -H "Content-Type: application/json" \
  --data-binary @- <<JSON
{"email": "$RMM_EMAIL", "password": "$RMM_PASS"}
JSON
)
TOKEN=$(echo "$JWT" | jq -r '.token // empty')
if [ -z "$TOKEN" ]; then
    echo "[ERROR] RMM login failed: $JWT"
    exit 1
fi
echo "[OK] Authenticated to GuruRMM"

Reuse $TOKEN, $RMM, and $REPO_ROOT for all subsequent calls in the session. Do not re-authenticate unless you get a 401.

Resolve agent by hostname

Always resolve before dispatching. Partial hostname matches are accepted (grep client-side).

AGENTS=$(curl -s "$RMM/api/agents" -H "Authorization: Bearer $TOKEN")

# Find by exact or partial hostname (case-insensitive)
AGENT=$(echo "$AGENTS" | jq --arg h "HOSTNAME" '[.[] | select(.hostname | ascii_downcase | contains($h | ascii_downcase))] | .[0]')

AGENT_ID=$(echo "$AGENT" | jq -r '.id // empty')
AGENT_HOST=$(echo "$AGENT" | jq -r '.hostname // empty')
AGENT_OS=$(echo "$AGENT" | jq -r '.os_type // empty')
AGENT_STATUS=$(echo "$AGENT" | jq -r '.status // empty')
AGENT_LAST=$(echo "$AGENT" | jq -r '.last_seen // "never"')
IS_CONNECTED=$(echo "$AGENTS" | jq -r --arg id "$AGENT_ID" '.[] | select(.id == $id) | .is_connected // false')

if [ -z "$AGENT_ID" ]; then
    echo "[ERROR] No agent found matching hostname. Run /rmm agents to list enrolled agents."
    exit 1
fi
echo "[OK] Found agent: $AGENT_HOST ($AGENT_OS) — id=$AGENT_ID, connected=$IS_CONNECTED, last_seen=$AGENT_LAST"

If multiple agents match: list the matches and ask the user to disambiguate. Never pick the first silently when there are multiple.

If is_connected = false: warn the user that the command will queue and execute when the agent comes back online. Still dispatch unless the user says to abort.

Agent list display

WEST-MEADOW-9025   macos    online    Scileppi Law          last: 2026-05-28T18:00:00Z  v0.3.4
CS-SERVER          windows  online    Cascades of Tucson    last: 2026-05-28T17:55:00Z  v0.3.4
AD2                windows  offline   ACG Internal          last: 2026-05-27T09:10:00Z  v0.3.2

Show: hostname, os_type, online/offline, client_name (from site_name/client_name), last_seen, agent_version.

Send a command

Determine command_type from os_type

`os_type`	Default `command_type`	Notes
`windows`	`powershell`	Runs via `powershell.exe -NonInteractive` as SYSTEM
`linux`	`shell`	Runs via `/bin/bash`
`macos`	`shell`	Runs via `/bin/bash` (or `/bin/zsh` on newer installs)

Use python only when explicitly writing a Python script. Use script for saved scripts (not covered in this skill).

Basic dispatch

# For PowerShell (Windows)
CMD_RESP=$(curl -s -X POST "$RMM/api/agents/$AGENT_ID/command" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  --data-binary @- <<'JSON'
{
  "command_type": "powershell",
  "command": "Get-ComputerInfo | Select-Object CsName, OsName, OsVersion, CsProcessors",
  "timeout_seconds": 60
}
JSON
)

# For shell (Linux/macOS)
CMD_RESP=$(curl -s -X POST "$RMM/api/agents/$AGENT_ID/command" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  --data-binary @- <<'JSON'
{
  "command_type": "shell",
  "command": "df -h / && uptime",
  "timeout_seconds": 60
}
JSON
)

CMD_ID=$(echo "$CMD_RESP" | jq -r '.command_id // empty')
CMD_STATUS=$(echo "$CMD_RESP" | jq -r '.status // empty')

if [ -z "$CMD_ID" ]; then
    echo "[ERROR] Dispatch failed: $CMD_RESP"
    exit 1
fi
echo "[OK] Command dispatched — id=$CMD_ID, initial_status=$CMD_STATUS"

Initial status values:

"running" — agent is online and received the command
"pending" — agent is offline; command queued

Multi-line scripts (heredoc with variable interpolation)

For scripts that need shell variable values substituted at the point of the curl call:

# Use <<JSON (unquoted) so shell expands ${VARS} in the payload
# Use jq -n --arg to safely JSON-encode the script text before embedding
SCRIPT='
$path = "C:\Users\$env:USERNAME\Downloads"
Write-Host "Size: $((Get-ChildItem $path -Recurse | Measure-Object Length -Sum).Sum / 1GB) GB"
'
PAYLOAD=$(jq -n \
  --arg ct "powershell" \
  --arg cmd "$SCRIPT" \
  --argjson to 120 \
  '{command_type: $ct, command: $cmd, timeout_seconds: $to}')

CMD_RESP=$(curl -s -X POST "$RMM/api/agents/$AGENT_ID/command" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD")

jq -n --arg cmd "$SCRIPT" correctly JSON-encodes the script including backslashes, dollar signs, and newlines. Always use this pattern for multi-line scripts — never embed raw script text into JSON by hand.

Run as logged-on user (`context: user_session`)

Use when the command requires a desktop session: GUI actions, Clear-RecycleBin, VPN cmdlets that need a user token, interactive-only software.

CMD_RESP=$(curl -s -X POST "$RMM/api/agents/$AGENT_ID/command" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  --data-binary @- <<'JSON'
{
  "command_type": "powershell",
  "command": "Get-WMIObject Win32_UserProfile | Where-Object { $_.Special -eq $false } | Select-Object LocalPath",
  "timeout_seconds": 30,
  "context": "user_session"
}
JSON
)

user_session requirements and limits:

Requires an active (non-locked) desktop session — no user logged in = command runs but produces empty output or silently no-ops
On Windows: runs under the WTS-impersonated token of the currently active user. If an admin is logged in, the command runs elevated; if a standard user, it does not.
On Linux/macOS: the agent uses sudo -u <active_user> or su — verify agent implementation supports this on the target platform before relying on it
Clear-RecycleBin, Set-VpnConnection -L2tpPsk, and similar COM/interactive-shell cmdlets require user_session
Clear-RecycleBin -Force silently no-ops as SYSTEM even with user_session if no desktop is present — enumerate C:\$Recycle.Bin\<SID>\* directly when running as SYSTEM

Preview before dispatch (for `/rmm run`)

When the user requests an interactive run without a pre-written script, show a preview before dispatching:

COMMAND PREVIEW
---------------
Agent:    WEST-MEADOW-9025 (macos) — online
Type:     shell
Context:  system
Timeout:  120s
Script:
  find /Users/sylvia/Library -name "*.plist" -maxdepth 3

Dispatch? (yes/no)

Wait for explicit confirmation before dispatching any command.

Poll for completion

poll_command() {
    local cmd_id="$1"
    local max_polls="${2:-60}"
    local interval=5
    local count=0

    while [ $count -lt $max_polls ]; do
        RESULT=$(curl -s "$RMM/api/commands/$cmd_id" \
          -H "Authorization: Bearer $TOKEN")
        STATUS=$(echo "$RESULT" | jq -r '.status // empty')

        case "$STATUS" in
            completed|failed|cancelled|interrupted)
                echo "$RESULT"
                return 0
                ;;
            running|pending)
                count=$((count + 1))
                sleep $interval
                ;;
            "")
                echo "[ERROR] Empty status — response: $RESULT"
                return 1
                ;;
        esac
    done

    echo "[WARNING] Polling timeout after $((max_polls * interval))s — last status: $STATUS"
    echo "[INFO] Command $cmd_id may still be running. Use /rmm status $cmd_id to check later."
}

RESULT=$(poll_command "$CMD_ID")

Polling cadence:

Default: check every 5s for up to 5 minutes (60 polls)
For long-running scripts (software installs, disk scans): pass a higher max_polls or remind the user to check with /rmm status later
Never spam polls at < 3s intervals

Display command output

display_output() {
    local result="$1"
    local status=$(echo "$result" | jq -r '.status')
    local exit_code=$(echo "$result" | jq -r '.exit_code // "—"')
    local stdout=$(echo "$result" | jq -r '.stdout // ""')
    local stderr=$(echo "$result" | jq -r '.stderr // ""')

    echo "Status:    $status (exit $exit_code)"
    if [ -n "$stdout" ]; then
        echo "--- stdout ---"
        echo "$stdout"
    fi
    if [ -n "$stderr" ]; then
        echo "--- stderr ---"
        echo "$stderr"
    fi
}

Always show stderr on non-zero exit — PowerShell writes errors to stderr even when the command partially succeeds. Never suppress stderr output.

exit_code = null on status = "pending" or status = "interrupted" — the command never ran or was orphaned. Do not treat null exit code as exit 0.

status = "failed" does NOT always mean a script bug:

Check stderr for "Command timed out (server-side reaper)" → timeout, increase timeout_seconds
Check stderr for "Agent restarted during execution" → interrupted, re-run
Only if stderr has actual script errors is it a script issue

Cancel a command

CANCEL_RESP=$(curl -s -X POST "$RMM/api/commands/$CMD_ID/cancel" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json")
# Response: {"status": "cancelled", "message": "Command cancelled"}

Only cancels pending or running commands
If already completed, failed, or cancelled: server returns 400 "Command already finished"
Cancellation of a running command sends a WebSocket cancel message to the agent; the agent may not honour it immediately

List command history

# All recent commands (admin only)
curl -s "$RMM/api/commands?limit=20" -H "Authorization: Bearer $TOKEN" | \
  jq '[.[] | {id, agent_id, command_type, status, exit_code, created_at, command_text: (.command_text[:60])}]'

# Agent-scoped (works for non-admin)
curl -s "$RMM/api/commands?agent_id=$AGENT_ID&limit=10" -H "Authorization: Bearer $TOKEN" | \
  jq '[.[] | {id, status, exit_code, command_type, created_at, preview: (.command_text[:80])}]'

Verified response shapes (from source)

POST /api/auth/login

{"token": "eyJ..."}

POST /api/agents/{id}/command

{
  "command_id": "uuid",
  "status": "running",
  "message": "Command sent to agent"
}

or if agent offline:

{
  "command_id": "uuid",
  "status": "pending",
  "message": "Agent is offline. Command queued for execution when agent reconnects."
}

GET /api/commands/{id}

{
  "id": "uuid",
  "agent_id": "uuid",
  "command_type": "powershell",
  "command_text": "...",
  "status": "completed",
  "exit_code": 0,
  "stdout": "...",
  "stderr": "",
  "created_at": "ISO-8601",
  "started_at": "ISO-8601",
  "completed_at": "ISO-8601",
  "created_by": "uuid",
  "timeout_seconds": 300,
  "context": "system"
}

Note: the response field is command_text, not command. Parsing .command will return null.

GET /api/agents (array)

[{
  "id": "uuid",
  "hostname": "WEST-MEADOW-9025",
  "os_type": "macos",
  "os_version": "14.5",
  "os_name": "macOS",
  "agent_version": "0.3.4",
  "last_seen": "ISO-8601",
  "status": "online",
  "is_connected": true,
  "device_id": "...",
  "site_id": "uuid",
  "site_name": "Main Office",
  "client_name": "Scileppi Law",
  "maintenance_mode": false,
  "maintenance_mode_note": null,
  "update_channel": null
}]

POST /api/commands/{id}/cancel

{"status": "cancelled", "message": "Command cancelled"}

All command status values

Status	Meaning
`pending`	Queued; agent offline
`running`	Agent executing
`completed`	Finished; `exit_code` set
`failed`	Non-zero exit OR timed out (check stderr)
`cancelled`	Cancelled via API
`interrupted`	Agent restarted mid-run

Platform-specific patterns

Windows — PowerShell as SYSTEM

# Disk usage
Get-PSDrive -PSProvider FileSystem | Select-Object Name, @{n='Used(GB)';e={[math]::Round($_.Used/1GB,2)}}, @{n='Free(GB)';e={[math]::Round($_.Free/1GB,2)}}

# Services (a specific one)
Get-Service -Name "servicename" | Select-Object Name, Status, StartType

# Running processes (top by CPU)
Get-Process | Sort-Object CPU -Descending | Select-Object -First 10 Name, Id, CPU, WorkingSet

# Event log errors (last 24h)
Get-WinEvent -LogName System -MaxEvents 50 | Where-Object { $_.LevelDisplayName -eq "Error" -and $_.TimeCreated -gt (Get-Date).AddHours(-24) } | Select-Object TimeCreated, Id, Message

# Recycle bin contents (correct way — Clear-RecycleBin fails as SYSTEM)
Get-ChildItem "C:\`$Recycle.Bin" -Recurse -Force -ErrorAction SilentlyContinue | Select-Object FullName, Length

# User sessions (who is logged on)
query user

# Installed software
Get-ItemProperty HKLM:\Software\Microsoft\Windows\CurrentVersion\Uninstall\* | Select-Object DisplayName, DisplayVersion, Publisher | Sort-Object DisplayName

Common Windows gotchas:

Clear-RecycleBin -Force silently no-ops in SYSTEM context — use direct C:\$Recycle.Bin\<SID>\* enumeration
$env:USERNAME as SYSTEM = "SYSTEM" (not the logged-on user) — use query user or WMI to find the actual logged-on username
Get-Date is UTC-aware; use -Format "yyyy-MM-dd HH:mm:ss" for readable output
PowerShell Write-Host outputs to stdout; Write-Error outputs to stderr
Paths with spaces need quoting: "C:\Program Files\..."
$ErrorActionPreference = 'Stop' makes the script fail on first error (useful for idempotent scripts); without it, PowerShell continues past non-terminating errors

macOS — shell as root

# Disk usage
df -h /

# Find large files
find /Users -maxdepth 4 -size +100M -type f 2>/dev/null | sort -k5 -rn | head -20

# Logged-on user (who is using the GUI session)
stat -f "%Su" /dev/console

# LaunchDaemons check
ls /Library/LaunchDaemons/

# LaunchAgents for a user
ls /Users/sylvia/Library/LaunchAgents/ 2>/dev/null

# Remove a plist and unload (system scope)
launchctl bootout system /Library/LaunchDaemons/com.example.plist 2>/dev/null
rm /Library/LaunchDaemons/com.example.plist

# Mount AFP share (uses Keychain credentials)
osascript -e 'mount volume "afp://SL-SERVER._afpovertcp._tcp.local/Data"'

# File write via base64 (python3 is an Xcode stub on macOS without dev tools)
echo "BASE64ENCODED==" | /usr/bin/base64 -D > /path/to/file
chmod +x /path/to/file

# Strip macOS ACL that prevents directory deletion
chmod -a "group:everyone deny delete" /path/to/dir

Common macOS gotchas:

/usr/bin/python3 on macOS without Xcode CLI tools is a stub that triggers an installer dialog — use /usr/bin/base64 -D (capital D — BSD flag, not GNU) for file writing
nohup cmd & in agent shell context fails: nohup: can't detach from console: Inappropriate ioctl for device — use LaunchDaemon (launchctl bootstrap system <plist>) for truly detached background processes
Home directory subdirectories may have ACL 0: group:everyone deny delete — strip with chmod -a "group:everyone deny delete" before rmdir
pgrep rsync matches "colorsyncd" as a substring — use pgrep -f "rsync.*specific-path" for precision
launchctl bootstrap gui/501 <plist> targets the user session for UID 501 (first user); find UID with id -u sylvia
AFP automount via osascript uses Keychain; works only in user session context

Linux — shell as root

# Disk usage
df -h

# Memory
free -h

# Service status (systemd)
systemctl status servicename --no-pager -l

# Journal (last 50 lines of a service)
journalctl -u servicename -n 50 --no-pager

# Find large files
find / -xdev -size +100M -type f 2>/dev/null | sort -k5 -rn | head -20

# Open ports
ss -tlnp

Investigate → script → dispatch workflow

When the user asks for something that requires investigation before writing a script:

List agents — confirm target, check online status
Characterize the endpoint — run a quick recon command (OS version, disk, services) if the context is unknown
Write the script — draft it inline, show to user
Preview — show agent + script before dispatch
Dispatch — send with appropriate command_type and context
Poll — wait for completion, display output
Bot alert — post one-line summary

For complex multi-step remediation (move files, install software, configure services), dispatch one logical step at a time. Confirm output before proceeding to the next step.

Error handling

HTTP / Condition	Meaning	Recovery
401 Unauthorized	JWT expired or invalid	Re-authenticate (Phase 0)
404 Not Found	Agent or command UUID not found	Re-run `GET /api/agents` to refresh list
400 "Command already finished"	Tried to cancel a done command	No action needed
403 Forbidden	Non-admin calling fleet-wide endpoint	Add `?agent_id=<uuid>` filter
`status = "failed"`, stderr = "Command timed out"	Timeout	Increase `timeout_seconds` and re-run
`status = "interrupted"`	Agent restarted	Re-run the command
`status = "pending"` (stuck)	Agent offline for extended period	Notify user; do not cancel unless user confirms
`command_id = null` or empty	Dispatch failed	Check response body for error message
`jq` parse error on agent list	Control characters in field	Use `grep -o '"hostname":"[^"]*"'` for simple extraction

Post to #dev-alerts (after every write)

Post after every successful dispatch or cancel. Never post for read-only operations. RMM alerts route to the private #dev-alerts channel automatically (the [RMM] prefix triggers auto-routing in post-bot-alert.sh); no channel argument needed.

ALERT_OUT=$(bash "$REPO_ROOT/.claude/scripts/post-bot-alert.sh" "<message>")
echo "$ALERT_OUT"

Message format:

[RMM] <Tech> dispatched to <hostname> (<os_type>) - <brief description> -> cmd:<command_id_short>
[RMM] <Tech> cancelled cmd <command_id_short> on <hostname>

<command_id_short> = first 8 chars of the UUID.

Examples:

bash "$REPO_ROOT/.claude/scripts/post-bot-alert.sh" \
  "[RMM] Mike dispatched to WEST-MEADOW-9025 (macos) - disk cleanup AFP symlink -> cmd:3f8a1c2e"

bash "$REPO_ROOT/.claude/scripts/post-bot-alert.sh" \
  "[RMM] Howard ran disk scan on CS-SERVER (windows) - exit 0, 14 GB freed -> cmd:9b4d7e01"

ASCII only — no Unicode dashes or arrows. Use - and ->.

Client / Site onboarding (`/rmm onboard`)

Provision a new client and its first site, then store the agent enrollment key in the vault. The site api_key is shown only once in the create response — capture it before anything else.

Verified API shapes (from server/src/api/clients.rs + sites.rs):

POST /api/clients body {"name": "...", "code"?: "...", "notes"?: "..."} → {"id", "name", "code", "is_active", "site_count", ...}. partner_id is assigned server-side (default partner) — do NOT send it.
POST /api/sites body {"client_id": "<uuid>", "name": "...", "address"?: "...", "notes"?: "..."} → {"site": {"id", "site_code", ...}, "api_key": "grmm_...", "message"}. site_code is server-generated (generate_unique_site_code, e.g. GREEN-FALCON-7214) — never supply it.
If the key is lost: POST /api/sites/:id/regenerate-key issues a new one (invalidates the old).

Workflow:

# (Phase 0 bootstrap done → $TOKEN, $RMM, $REPO_ROOT)
NAME="Rednour Law Offices"; SITE="Main"
SLUG="rednour"                       # lowercase, no spaces/hyphens — matches existing vault convention

# 1. Guard against a duplicate client
curl -s "$RMM/api/clients" -H "Authorization: Bearer $TOKEN" \
  | jq -r --arg n "$NAME" '.[] | select(.name|ascii_downcase==($n|ascii_downcase)) | "EXISTS id=\(.id)"'

# 2. Create client
CID=$(curl -s -X POST "$RMM/api/clients" -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  --data-binary "{\"name\":\"$NAME\"}" | jq -r '.id')

# 3. Create site — capture the ONE-TIME api_key immediately to a file
curl -s -X POST "$RMM/api/sites" -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  --data-binary "{\"client_id\":\"$CID\",\"name\":\"$SITE\"}" > /tmp/site.json
SID=$(jq -r '.site.id' /tmp/site.json); SCODE=$(jq -r '.site.site_code' /tmp/site.json); AKEY=$(jq -r '.api_key' /tmp/site.json)

# 4. Vault the enrollment key (mirror existing clients/<slug>/gururmm-site-main.sops.yaml structure)
VR=$(jq -r '.vault_path' "$REPO_ROOT/.claude/identity.json"); T="$VR/clients/$SLUG/gururmm-site-main.sops.yaml"
mkdir -p "$(dirname "$T")"
cat > "$T" <<YAML
client: $NAME
site: $SITE
created: "$(date +%F)"
credentials:
    client_id: "$CID"
    site_id: "$SID"
    site_code: "$SCODE"
    api_key: "$AKEY"
    installer_url: "https://rmm.azcomputerguru.com/install/$SCODE"
    msi_url: "https://rmm.azcomputerguru.com/api/sites/$SID/installer"
YAML
sops --config "$VR/.sops.yaml" --encrypt --in-place "$T"          # encrypt (see gotchas)
bash "$REPO_ROOT/.claude/scripts/vault.sh" get "clients/$SLUG/gururmm-site-main.sops.yaml" >/dev/null  # verify round-trip
git -C "$VR" add "clients/$SLUG/gururmm-site-main.sops.yaml" && git -C "$VR" commit -q -m "add: $NAME GuruRMM site $SITE enrollment key ($SCODE)" && git -C "$VR" push -q

Vault-encryption gotchas (learned 2026-05-29 onboarding Rednour):

sops --encrypt needs the vault's .sops.yaml. From outside the vault dir it errors config file not found — always pass --config "$VR/.sops.yaml". Encryption uses only the public age keys, so no private key / SOPS_AGE_KEY_FILE is required for the encrypt step.
Quote all date/timestamp values (created: "2026-05-29"). A bare YAML date makes sops 3.7.3 fail with Cannot walk value, unknown type: time.Time.
Put every secret under the credentials: block — the vault encrypted_regex covers credentials|password|secret|api_key|token|.... Fields outside it (client, site, created) stay plaintext as searchable metadata.
If --encrypt --in-place fails, the file is left plaintext on disk — fix and re-encrypt (or delete) immediately; never commit it unencrypted. Confirm with grep -c 'ENC\[' "$T" (should be > 0).

Report to the user + bot alert: client_id, site_id, site_code, install page https://rmm.azcomputerguru.com/install/<SCODE>, MSI https://rmm.azcomputerguru.com/api/sites/<SID>/installer, and the vault path. Onboarding is a write → post a [RMM] <tech> onboarded client '<name>' + site '<site>' (<SCODE>) bot alert.

Onboarding diagnostic (`/rmm diagnose`)

Run a one-shot security + health + inventory probe against a newly onboarded Windows agent and produce a prioritized "take this seriously" report plus an immutable before/after baseline. This is the Phase 1 tooling implementation; a native GuruRMM feature (DB-backed storage, scheduled re-baselines) is Phase 3.

Runner: .claude/scripts/run-onboarding-diagnostic.sh <hostname|uuid> [client-slug] Probe: .claude/scripts/onboarding-diagnostic.ps1 (Windows PowerShell 5.1, ASCII, runs as SYSTEM)

bash "$REPO_ROOT/.claude/scripts/run-onboarding-diagnostic.sh" FrontDeskReception rednour

If no client-slug is given, it is derived by slugifying the agent's client_name.

Workflow

Authenticate to RMM (vault creds, same as the rest of this skill).
Resolve the agent (exact UUID, exact hostname, then partial). Windows-only.
Upload the probe to the endpoint base64-encoded, in <24 KB chunks (the agent caps an inline command body at ~32-40 KB; the probe is ~60 KB), then a final small command decodes it to a .ps1, runs it, and deletes both temp files. Every check in the probe is wrapped in try/catch, so one failing check becomes an unknown-severity finding instead of aborting the probe.
The probe emits a single JSON object fenced by ===DIAG-JSON-START=== / ===DIAG-JSON-END===; the runner extracts it from between the markers.
Grade, write two baseline files, diff against any prior baseline, alert.

Grade model

Grade	Meaning
RED	At least one `critical` finding
AMBER	At least one `warning`, no `critical`
GREEN	No `critical` and no `warning`

unknown-severity findings (a check that failed to run) do not change the grade but are listed in the report for manual follow-up.

What it checks

Security: Defender state (RTP/service/signature age/tamper), 3rd-party AV conflicts, leftover competitor RMM / remote-access agents (ScreenConnect, NinjaRMM, Datto, Atera, Kaseya, TeamViewer, AnyDesk, Splashtop, N-able, Syncro, Action1, Automate, LogMeIn), firewall profiles, BitLocker (laptop-aware), local admins / built-in Administrator / non-expiring passwords, patch posture + OS EOL, RDP/NLA, SMBv1, UAC, LAPS.
Health: disk free %, SMART/physical-disk health, 14-day stability (unexpected shutdown / BSOD / disk errors), pending reboot, uptime, failed auto-start services, domain secure channel, time source, battery (laptops), backup-agent presence.
Inventory baseline (info): model/serial, CPU/RAM, BIOS, TPM, Secure Boot, OS edition/build/activation, full installed-software list, local users/groups, network, scheduled tasks + Run-key autoruns.

Where baselines are stored

clients/<slug>/onboarding-baselines/ — two files per run, both timestamped <HOST>-<UTC-YYYYMMDDTHHMMSS>:

*.json — the raw immutable snapshot (do not edit; it is the source of truth for diffs).
*.md — the human report: grade, findings grouped critical -> warning -> info -> unknown, inventory summary, and a diff section vs the most recent prior baseline (new / resolved / regressed findings, software added/removed).

Baselines are immutable and append-only. GuruRMM-DB storage of baselines arrives with the Phase 3 native feature.

Alerting

Each CRITICAL finding and a RED overall grade auto-post a one-line [RMM] alert to #dev-alerts via post-bot-alert.sh (ASCII only, soft-fail). Example: [RMM] Onboarding diag <HOST> (<client>) = RED: <n> critical - <titles>.

Known enrolled agents (verify with GET /api/agents — UUIDs change on re-enroll)

Do not use this table as authoritative — always resolve live. Treat as a starting hint only.

Hostname	Client	OS
WEST-MEADOW-9025	Scileppi Law	macOS
CS-SERVER	Cascades of Tucson	Windows
DESKTOP-DLTAGOI	Cascades LE	Windows
AD2	ACG Internal	Windows

What NOT to use RMM for

Permanent file deletion without user confirmation — always show what will be deleted, get a yes
Credential harvesting — do not dump password stores, SAM hive, or credential manager contents
Software installation without stating what will be installed — always preview the installer command
Bulk destructive operations (rm -rf /, format, Remove-Item -Recurse C:\Windows) — require explicit written confirmation from the user in chat, not just "yes" to a vague preview
RMM product development — use the Coding Agent and GuruRMM project context instead

32 KiB Raw Blame History