diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md index 8c7faa5..86ce71a 100644 --- a/.claude/CLAUDE.md +++ b/.claude/CLAUDE.md @@ -267,6 +267,7 @@ Vault structure: `infrastructure/`, `clients/`, `services/`, `projects/`, `msp-t | `/sync` | Sync config from Gitea repository | | `/create-spec` | Create app specification for AutoCoder | | `/frontend-design` | Modern frontend design (auto-invoke after UI changes) | +| `/rmm` | Remote command execution on GuruRMM agents — list, run, poll, cancel | | `/remediation-tool` | M365 breach checks, tenant sweeps, gated remediation | | `/feature-request` | Howard submits a GuruRMM feature request — Claude classifies it and messages Mike | | `/shape-spec` | Pre-implementation spec for a GuruRMM feature — produces plan.md, shape.md, references.md, standards.md | diff --git a/.claude/commands/rmm.md b/.claude/commands/rmm.md new file mode 100644 index 0000000..3431103 --- /dev/null +++ b/.claude/commands/rmm.md @@ -0,0 +1,642 @@ +--- +description: Run commands, investigate agents, and execute remote scripts via the GuruRMM agent fleet. +--- + +# /rmm — GuruRMM remote command execution + +Interact with the GuruRMM agent fleet: list agents, run remote commands (PowerShell, shell, Python), poll for output, and cancel in-flight tasks. + +**Default posture: read before write.** Always look up the agent by hostname before dispatching a command. Never guess a UUID. + +--- + +## Usage + +``` +/rmm List all agents with connection status +/rmm agents [] List agents, optionally filtered by client/site name +/rmm agent Show agent detail + last 10 commands +/rmm run Interactively compose and dispatch a script +/rmm shell One-liner bash/sh command (Linux/macOS agents) +/rmm ps One-liner PowerShell command (Windows agents) +/rmm status Check command status +/rmm output Fetch full stdout + stderr for a completed command +/rmm cancel Cancel a pending or running command +/rmm history [N] Recent command history (default 10, max 500) +``` + +--- + +## API configuration + +**Base URL:** `http://172.16.3.30:3001` +**Auth:** JWT (24-hour expiry) — obtain via `POST /api/auth/login`, pass as `Authorization: Bearer ` +**Credentials vault path:** `infrastructure/gururmm-server.sops.yaml` + - `credentials.gururmm-api.admin-email` + - `credentials.gururmm-api.admin-password` + +--- + +## Hard rules (incidents have occurred — no exceptions) + +**Never hardcode a UUID.** Agent UUIDs change when an agent re-enrolls. Always resolve hostname → UUID via `GET /api/agents` at the start of every workflow. The UUID from memory or a prior session may be stale. + +**Never guess the shell from the hostname.** Use `os_type` from the agent record: `"windows"` → `powershell`, `"linux"` → `shell`, `"macos"` → `shell`. A machine named "WEST-MEADOW-9025" could be macOS. Check `os_type`. + +**Initial status `"pending"` is not a failure.** It means the agent is offline and the command is queued. The agent will execute it when it reconnects. Poll patiently; do not cancel and retry. + +**Status `"failed"` with `stderr = "Command timed out (server-side reaper)"` means the command exceeded its timeout.** The server reaper transitions `running` → `failed` rather than introducing a separate `timeout` status. Read stderr before concluding the script had a bug. + +**Status `"interrupted"` means the agent restarted mid-execution.** The command was orphaned. The server marks it interrupted when the agent reconnects. Re-run if needed. + +**`context: "user_session"` requires an active logged-on user.** If no interactive user is signed in, the WTS impersonation call will fail. Check whether a user is logged in before choosing this context. Falls back silently to no output if no desktop session exists. + +**No interactive input.** Agent shells are non-interactive. Scripts must not call `Read-Host`, `pause`, `read -p`, or any prompt. Pass all values as variables or parameters in the script body. + +**Output is streamed in full but stored as a string.** For very large output (log files, directory trees), write the output to a file on the endpoint and fetch it with a second command. + +**`elevated: true` sends the flag to the agent but does not guarantee elevation.** On Windows the agent is already running as SYSTEM; `elevated` is a hint for future agent behaviour. Do not rely on it for privilege escalation on endpoints where the agent is not already elevated. + +**Heredoc payloads — always use `--data-binary @-` and `<<'JSON'` for static payloads.** On Windows, the Write tool and Git Bash resolve `/tmp/foo.json` to different directories. Heredoc avoids the file handoff entirely. Use `<<'JSON'` (single-quoted) for static payloads that contain no shell variables; use `</dev/null)/.claude/identity.json +fi +REPO_ROOT=$(jq -r '.claudetools_root // empty' "$IDENTITY_PATH" 2>/dev/null) +if [ -z "$REPO_ROOT" ]; then + REPO_ROOT=$(git rev-parse --show-toplevel 2>/dev/null) +fi +VAULT="$REPO_ROOT/.claude/scripts/vault.sh" +RMM="http://172.16.3.30:3001" + +RMM_EMAIL=$(bash "$VAULT" get-field infrastructure/gururmm-server.sops.yaml credentials.gururmm-api.admin-email) +RMM_PASS=$(bash "$VAULT" get-field infrastructure/gururmm-server.sops.yaml credentials.gururmm-api.admin-password) + +JWT=$(curl -s -X POST "$RMM/api/auth/login" \ + -H "Content-Type: application/json" \ + --data-binary @- <` or `su` — verify agent implementation supports this on the target platform before relying on it +- `Clear-RecycleBin`, `Set-VpnConnection -L2tpPsk`, and similar COM/interactive-shell cmdlets require `user_session` +- `Clear-RecycleBin -Force` silently no-ops as SYSTEM even with `user_session` if no desktop is present — enumerate `C:\$Recycle.Bin\\*` directly when running as SYSTEM + +### Preview before dispatch (for `/rmm run`) + +When the user requests an interactive run without a pre-written script, show a preview before dispatching: + +``` +COMMAND PREVIEW +--------------- +Agent: WEST-MEADOW-9025 (macos) — online +Type: shell +Context: system +Timeout: 120s +Script: + find /Users/sylvia/Library -name "*.plist" -maxdepth 3 + +Dispatch? (yes/no) +``` + +Wait for explicit confirmation before dispatching any command. + +--- + +## Poll for completion + +```bash +poll_command() { + local cmd_id="$1" + local max_polls="${2:-60}" + local interval=5 + local count=0 + + while [ $count -lt $max_polls ]; do + RESULT=$(curl -s "$RMM/api/commands/$cmd_id" \ + -H "Authorization: Bearer $TOKEN") + STATUS=$(echo "$RESULT" | jq -r '.status // empty') + + case "$STATUS" in + completed|failed|cancelled|interrupted) + echo "$RESULT" + return 0 + ;; + running|pending) + count=$((count + 1)) + sleep $interval + ;; + "") + echo "[ERROR] Empty status — response: $RESULT" + return 1 + ;; + esac + done + + echo "[WARNING] Polling timeout after $((max_polls * interval))s — last status: $STATUS" + echo "[INFO] Command $cmd_id may still be running. Use /rmm status $cmd_id to check later." +} + +RESULT=$(poll_command "$CMD_ID") +``` + +**Polling cadence:** +- Default: check every 5s for up to 5 minutes (60 polls) +- For long-running scripts (software installs, disk scans): pass a higher `max_polls` or remind the user to check with `/rmm status` later +- Never spam polls at < 3s intervals + +--- + +## Display command output + +```bash +display_output() { + local result="$1" + local status=$(echo "$result" | jq -r '.status') + local exit_code=$(echo "$result" | jq -r '.exit_code // "—"') + local stdout=$(echo "$result" | jq -r '.stdout // ""') + local stderr=$(echo "$result" | jq -r '.stderr // ""') + + echo "Status: $status (exit $exit_code)" + if [ -n "$stdout" ]; then + echo "--- stdout ---" + echo "$stdout" + fi + if [ -n "$stderr" ]; then + echo "--- stderr ---" + echo "$stderr" + fi +} +``` + +**Always show stderr on non-zero exit** — PowerShell writes errors to stderr even when the command partially succeeds. Never suppress stderr output. + +**`exit_code = null`** on `status = "pending"` or `status = "interrupted"` — the command never ran or was orphaned. Do not treat null exit code as exit 0. + +**`status = "failed"` does NOT always mean a script bug:** +- Check `stderr` for `"Command timed out (server-side reaper)"` → timeout, increase `timeout_seconds` +- Check `stderr` for `"Agent restarted during execution"` → interrupted, re-run +- Only if stderr has actual script errors is it a script issue + +--- + +## Cancel a command + +```bash +CANCEL_RESP=$(curl -s -X POST "$RMM/api/commands/$CMD_ID/cancel" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json") +# Response: {"status": "cancelled", "message": "Command cancelled"} +``` + +- Only cancels `pending` or `running` commands +- If already `completed`, `failed`, or `cancelled`: server returns 400 "Command already finished" +- Cancellation of a `running` command sends a WebSocket cancel message to the agent; the agent may not honour it immediately + +--- + +## List command history + +```bash +# All recent commands (admin only) +curl -s "$RMM/api/commands?limit=20" -H "Authorization: Bearer $TOKEN" | \ + jq '[.[] | {id, agent_id, command_type, status, exit_code, created_at, command_text: (.command_text[:60])}]' + +# Agent-scoped (works for non-admin) +curl -s "$RMM/api/commands?agent_id=$AGENT_ID&limit=10" -H "Authorization: Bearer $TOKEN" | \ + jq '[.[] | {id, status, exit_code, command_type, created_at, preview: (.command_text[:80])}]' +``` + +--- + +## Verified response shapes (from source) + +### POST /api/auth/login +```json +{"token": "eyJ..."} +``` + +### POST /api/agents/{id}/command +```json +{ + "command_id": "uuid", + "status": "running", + "message": "Command sent to agent" +} +``` +or if agent offline: +```json +{ + "command_id": "uuid", + "status": "pending", + "message": "Agent is offline. Command queued for execution when agent reconnects." +} +``` + +### GET /api/commands/{id} +```json +{ + "id": "uuid", + "agent_id": "uuid", + "command_type": "powershell", + "command_text": "...", + "status": "completed", + "exit_code": 0, + "stdout": "...", + "stderr": "", + "created_at": "ISO-8601", + "started_at": "ISO-8601", + "completed_at": "ISO-8601", + "created_by": "uuid", + "timeout_seconds": 300, + "context": "system" +} +``` + +**Note:** the response field is `command_text`, not `command`. Parsing `.command` will return null. + +### GET /api/agents (array) +```json +[{ + "id": "uuid", + "hostname": "WEST-MEADOW-9025", + "os_type": "macos", + "os_version": "14.5", + "os_name": "macOS", + "agent_version": "0.3.4", + "last_seen": "ISO-8601", + "status": "online", + "is_connected": true, + "device_id": "...", + "site_id": "uuid", + "site_name": "Main Office", + "client_name": "Scileppi Law", + "maintenance_mode": false, + "maintenance_mode_note": null, + "update_channel": null +}] +``` + +### POST /api/commands/{id}/cancel +```json +{"status": "cancelled", "message": "Command cancelled"} +``` + +### All command status values +| Status | Meaning | +|--------|---------| +| `pending` | Queued; agent offline | +| `running` | Agent executing | +| `completed` | Finished; `exit_code` set | +| `failed` | Non-zero exit OR timed out (check stderr) | +| `cancelled` | Cancelled via API | +| `interrupted` | Agent restarted mid-run | + +--- + +## Platform-specific patterns + +### Windows — PowerShell as SYSTEM + +```powershell +# Disk usage +Get-PSDrive -PSProvider FileSystem | Select-Object Name, @{n='Used(GB)';e={[math]::Round($_.Used/1GB,2)}}, @{n='Free(GB)';e={[math]::Round($_.Free/1GB,2)}} + +# Services (a specific one) +Get-Service -Name "servicename" | Select-Object Name, Status, StartType + +# Running processes (top by CPU) +Get-Process | Sort-Object CPU -Descending | Select-Object -First 10 Name, Id, CPU, WorkingSet + +# Event log errors (last 24h) +Get-WinEvent -LogName System -MaxEvents 50 | Where-Object { $_.LevelDisplayName -eq "Error" -and $_.TimeCreated -gt (Get-Date).AddHours(-24) } | Select-Object TimeCreated, Id, Message + +# Recycle bin contents (correct way — Clear-RecycleBin fails as SYSTEM) +Get-ChildItem "C:\`$Recycle.Bin" -Recurse -Force -ErrorAction SilentlyContinue | Select-Object FullName, Length + +# User sessions (who is logged on) +query user + +# Installed software +Get-ItemProperty HKLM:\Software\Microsoft\Windows\CurrentVersion\Uninstall\* | Select-Object DisplayName, DisplayVersion, Publisher | Sort-Object DisplayName +``` + +**Common Windows gotchas:** +- `Clear-RecycleBin -Force` silently no-ops in SYSTEM context — use direct `C:\$Recycle.Bin\\*` enumeration +- `$env:USERNAME` as SYSTEM = `"SYSTEM"` (not the logged-on user) — use `query user` or WMI to find the actual logged-on username +- `Get-Date` is UTC-aware; use `-Format "yyyy-MM-dd HH:mm:ss"` for readable output +- PowerShell `Write-Host` outputs to stdout; `Write-Error` outputs to stderr +- Paths with spaces need quoting: `"C:\Program Files\..."` +- `$ErrorActionPreference = 'Stop'` makes the script fail on first error (useful for idempotent scripts); without it, PowerShell continues past non-terminating errors + +### macOS — shell as root + +```bash +# Disk usage +df -h / + +# Find large files +find /Users -maxdepth 4 -size +100M -type f 2>/dev/null | sort -k5 -rn | head -20 + +# Logged-on user (who is using the GUI session) +stat -f "%Su" /dev/console + +# LaunchDaemons check +ls /Library/LaunchDaemons/ + +# LaunchAgents for a user +ls /Users/sylvia/Library/LaunchAgents/ 2>/dev/null + +# Remove a plist and unload (system scope) +launchctl bootout system /Library/LaunchDaemons/com.example.plist 2>/dev/null +rm /Library/LaunchDaemons/com.example.plist + +# Mount AFP share (uses Keychain credentials) +osascript -e 'mount volume "afp://SL-SERVER._afpovertcp._tcp.local/Data"' + +# File write via base64 (python3 is an Xcode stub on macOS without dev tools) +echo "BASE64ENCODED==" | /usr/bin/base64 -D > /path/to/file +chmod +x /path/to/file + +# Strip macOS ACL that prevents directory deletion +chmod -a "group:everyone deny delete" /path/to/dir +``` + +**Common macOS gotchas:** +- `/usr/bin/python3` on macOS without Xcode CLI tools is a stub that triggers an installer dialog — use `/usr/bin/base64 -D` (capital D — BSD flag, not GNU) for file writing +- `nohup cmd &` in agent shell context fails: `nohup: can't detach from console: Inappropriate ioctl for device` — use LaunchDaemon (`launchctl bootstrap system `) for truly detached background processes +- Home directory subdirectories may have ACL `0: group:everyone deny delete` — strip with `chmod -a "group:everyone deny delete"` before `rmdir` +- `pgrep rsync` matches "colorsyncd" as a substring — use `pgrep -f "rsync.*specific-path"` for precision +- `launchctl bootstrap gui/501 ` targets the user session for UID 501 (first user); find UID with `id -u sylvia` +- AFP automount via `osascript` uses Keychain; works only in user session context + +### Linux — shell as root + +```bash +# Disk usage +df -h + +# Memory +free -h + +# Service status (systemd) +systemctl status servicename --no-pager -l + +# Journal (last 50 lines of a service) +journalctl -u servicename -n 50 --no-pager + +# Find large files +find / -xdev -size +100M -type f 2>/dev/null | sort -k5 -rn | head -20 + +# Open ports +ss -tlnp +``` + +--- + +## Investigate → script → dispatch workflow + +When the user asks for something that requires investigation before writing a script: + +1. **List agents** — confirm target, check online status +2. **Characterize the endpoint** — run a quick recon command (OS version, disk, services) if the context is unknown +3. **Write the script** — draft it inline, show to user +4. **Preview** — show agent + script before dispatch +5. **Dispatch** — send with appropriate `command_type` and `context` +6. **Poll** — wait for completion, display output +7. **Bot alert** — post one-line summary + +For complex multi-step remediation (move files, install software, configure services), dispatch one logical step at a time. Confirm output before proceeding to the next step. + +--- + +## Error handling + +| HTTP / Condition | Meaning | Recovery | +|-----------------|---------|----------| +| 401 Unauthorized | JWT expired or invalid | Re-authenticate (Phase 0) | +| 404 Not Found | Agent or command UUID not found | Re-run `GET /api/agents` to refresh list | +| 400 "Command already finished" | Tried to cancel a done command | No action needed | +| 403 Forbidden | Non-admin calling fleet-wide endpoint | Add `?agent_id=` filter | +| `status = "failed"`, stderr = "Command timed out" | Timeout | Increase `timeout_seconds` and re-run | +| `status = "interrupted"` | Agent restarted | Re-run the command | +| `status = "pending"` (stuck) | Agent offline for extended period | Notify user; do not cancel unless user confirms | +| `command_id = null` or empty | Dispatch failed | Check response body for error message | +| `jq` parse error on agent list | Control characters in field | Use `grep -o '"hostname":"[^"]*"'` for simple extraction | + +--- + +## Post to #bot-alerts (after every write) + +Post after every successful dispatch or cancel. Never post for read-only operations. + +```bash +ALERT_OUT=$(bash "$REPO_ROOT/.claude/scripts/post-bot-alert.sh" "") +echo "$ALERT_OUT" +``` + +**Message format:** +``` +[RMM] dispatched to () - -> cmd: +[RMM] cancelled cmd on +``` + +`` = first 8 chars of the UUID. + +**Examples:** +```bash +bash "$REPO_ROOT/.claude/scripts/post-bot-alert.sh" \ + "[RMM] Mike dispatched to WEST-MEADOW-9025 (macos) - disk cleanup AFP symlink -> cmd:3f8a1c2e" + +bash "$REPO_ROOT/.claude/scripts/post-bot-alert.sh" \ + "[RMM] Howard ran disk scan on CS-SERVER (windows) - exit 0, 14 GB freed -> cmd:9b4d7e01" +``` + +ASCII only — no Unicode dashes or arrows. Use `-` and `->`. + +--- + +## Known enrolled agents (verify with GET /api/agents — UUIDs change on re-enroll) + +Do not use this table as authoritative — always resolve live. Treat as a starting hint only. + +| Hostname | Client | OS | +|----------|--------|----| +| WEST-MEADOW-9025 | Scileppi Law | macOS | +| CS-SERVER | Cascades of Tucson | Windows | +| DESKTOP-DLTAGOI | Cascades LE | Windows | +| AD2 | ACG Internal | Windows | + +--- + +## What NOT to use RMM for + +- **Permanent file deletion without user confirmation** — always show what will be deleted, get a yes +- **Credential harvesting** — do not dump password stores, SAM hive, or credential manager contents +- **Software installation without stating what will be installed** — always preview the installer command +- **Bulk destructive operations** (`rm -rf /`, format, `Remove-Item -Recurse C:\Windows`) — require explicit written confirmation from the user in chat, not just "yes" to a vague preview +- **RMM product development** — use the Coding Agent and GuruRMM project context instead diff --git a/.claude/memory/feedback_syncro_content_type.md b/.claude/memory/feedback_syncro_content_type.md new file mode 100644 index 0000000..2a87ae8 --- /dev/null +++ b/.claude/memory/feedback_syncro_content_type.md @@ -0,0 +1,12 @@ +--- +name: feedback-syncro-content-type +description: Syncro API POST calls require explicit Content-Type application/json header or they 400 with an HTML error page +metadata: + type: feedback +--- + +Always include `-H "Content-Type: application/json"` on every Syncro API POST/PUT call (comments, tickets, line items, estimates). + +**Why:** Without it, curl sends the JSON body as `application/x-www-form-urlencoded`, which Syncro rejects with an HTML 400 page instead of a JSON error. The HTML response looks like a hard failure but it's just a missing header. Discovered 2026-05-28 when posting a comment to ticket #32333 — two 400 HTML responses before the fix. + +**How to apply:** Every `curl -X POST` or `curl -X PUT` to the Syncro API needs the header. The subject field is also required on ticket comments (`{"subject":"...","body":"...","hidden":true,"do_not_email":true}`).