feat(rmm): onboarding diagnostic (Phase 1) - probe + triage + baseline

/rmm diagnose: dispatches a Windows security/health probe to a newly onboarded
agent, grades RED/AMBER/GREEN, writes an immutable per-client baseline
(clients/<slug>/onboarding-baselines/), diffs vs prior, and alerts CRITICALs to
#dev-alerts. Probe is PS5.1/ASCII/SYSTEM-safe, never-abort, base64 chunked upload
around the agent command-size cap. Code-reviewed (no blockers); folded in
immutability guard, severity-independent finding ids, Defender-unknown sentinel,
expanded competitor/backup detection.

First baselines captured: Rednour FRONTDESKRECEPT + LEGALASST (both RED - prior
MSP ScreenConnect/Splashtop/Syncro still live; LEGALASST OS EOL).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-29 13:08:43 -07:00
parent 02c402ea78
commit df9be01065
7 changed files with 4033 additions and 0 deletions

View File

@@ -24,6 +24,7 @@ Interact with the GuruRMM agent fleet: list agents, run remote commands (PowerSh
/rmm cancel <command_id> Cancel a pending or running command
/rmm history <hostname|uuid> [N] Recent command history (default 10, max 500)
/rmm onboard <client name> [site] Create a new client + site, vault the one-time enrollment key
/rmm diagnose <hostname|uuid> [client] Run onboarding health/security diagnostic + baseline
```
---
@@ -679,6 +680,84 @@ git -C "$VR" add "clients/$SLUG/gururmm-site-main.sops.yaml" && git -C "$VR" com
---
## Onboarding diagnostic (`/rmm diagnose`)
Run a one-shot security + health + inventory probe against a newly onboarded
Windows agent and produce a prioritized "take this seriously" report plus an
immutable before/after baseline. This is the Phase 1 tooling implementation;
a native GuruRMM feature (DB-backed storage, scheduled re-baselines) is Phase 3.
**Runner:** `.claude/scripts/run-onboarding-diagnostic.sh <hostname|uuid> [client-slug]`
**Probe:** `.claude/scripts/onboarding-diagnostic.ps1` (Windows PowerShell 5.1, ASCII, runs as SYSTEM)
```bash
bash "$REPO_ROOT/.claude/scripts/run-onboarding-diagnostic.sh" FrontDeskReception rednour
```
If no client-slug is given, it is derived by slugifying the agent's `client_name`.
### Workflow
1. Authenticate to RMM (vault creds, same as the rest of this skill).
2. Resolve the agent (exact UUID, exact hostname, then partial). Windows-only.
3. Upload the probe to the endpoint **base64-encoded, in <24 KB chunks** (the
agent caps an inline command body at ~32-40 KB; the probe is ~60 KB), then a
final small command decodes it to a `.ps1`, runs it, and deletes both temp
files. Every check in the probe is wrapped in try/catch, so one failing check
becomes an `unknown`-severity finding instead of aborting the probe.
4. The probe emits a single JSON object fenced by `===DIAG-JSON-START===` /
`===DIAG-JSON-END===`; the runner extracts it from between the markers.
5. Grade, write two baseline files, diff against any prior baseline, alert.
### Grade model
| Grade | Meaning |
|-------|---------|
| **RED** | At least one `critical` finding |
| **AMBER** | At least one `warning`, no `critical` |
| **GREEN** | No `critical` and no `warning` |
`unknown`-severity findings (a check that failed to run) do not change the grade
but are listed in the report for manual follow-up.
### What it checks
- **Security:** Defender state (RTP/service/signature age/tamper), 3rd-party AV
conflicts, leftover competitor RMM / remote-access agents (ScreenConnect,
NinjaRMM, Datto, Atera, Kaseya, TeamViewer, AnyDesk, Splashtop, N-able,
Syncro, Action1, Automate, LogMeIn), firewall profiles, BitLocker (laptop-aware),
local admins / built-in Administrator / non-expiring passwords, patch posture +
OS EOL, RDP/NLA, SMBv1, UAC, LAPS.
- **Health:** disk free %, SMART/physical-disk health, 14-day stability
(unexpected shutdown / BSOD / disk errors), pending reboot, uptime, failed
auto-start services, domain secure channel, time source, battery (laptops),
backup-agent presence.
- **Inventory baseline (info):** model/serial, CPU/RAM, BIOS, TPM, Secure Boot,
OS edition/build/activation, full installed-software list, local users/groups,
network, scheduled tasks + Run-key autoruns.
### Where baselines are stored
`clients/<slug>/onboarding-baselines/` — two files per run, both timestamped
`<HOST>-<UTC-YYYYMMDDTHHMMSS>`:
- `*.json` — the raw immutable snapshot (do not edit; it is the source of truth
for diffs).
- `*.md` — the human report: grade, findings grouped critical -> warning ->
info -> unknown, inventory summary, and a diff section vs the most recent prior
baseline (new / resolved / regressed findings, software added/removed).
Baselines are immutable and append-only. GuruRMM-DB storage of baselines arrives
with the Phase 3 native feature.
### Alerting
Each **CRITICAL** finding and a **RED** overall grade auto-post a one-line
`[RMM]` alert to **#dev-alerts** via `post-bot-alert.sh` (ASCII only, soft-fail).
Example: `[RMM] Onboarding diag <HOST> (<client>) = RED: <n> critical - <titles>`.
---
## Known enrolled agents (verify with GET /api/agents — UUIDs change on re-enroll)
Do not use this table as authoritative — always resolve live. Treat as a starting hint only.

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,574 @@
#!/usr/bin/env bash
# run-onboarding-diagnostic.sh - GuruRMM onboarding diagnostic runner (Phase 1).
#
# Dispatches .claude/scripts/onboarding-diagnostic.ps1 to a Windows agent via the
# GuruRMM RMM API, extracts the fenced JSON result, grades it RED/AMBER/GREEN,
# writes an immutable baseline (JSON + Markdown report) under
# clients/<slug>/onboarding-baselines/, diffs against any prior baseline, and
# alerts #dev-alerts on RED / critical findings.
#
# Usage:
# bash run-onboarding-diagnostic.sh <hostname-or-uuid> [client-slug]
#
# Mirrors the plumbing in .claude/commands/rmm.md (vault auth -> JWT -> dispatch
# -> poll -> command_text/stdout). Read-only against the endpoint; the probe only
# collects, it changes nothing.
set -u
# ---------------------------------------------------------------------------
# Args
# ---------------------------------------------------------------------------
TARGET="${1:-}"
CLIENT_SLUG="${2:-}"
if [ -z "$TARGET" ]; then
echo "[ERROR] Usage: bash run-onboarding-diagnostic.sh <hostname-or-uuid> [client-slug]" >&2
exit 1
fi
# ---------------------------------------------------------------------------
# Bootstrap (resolve repo root, vault, RMM base)
# ---------------------------------------------------------------------------
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
VAULT="$REPO_ROOT/.claude/scripts/vault.sh"
PROBE="$SCRIPT_DIR/onboarding-diagnostic.ps1"
ALERT="$REPO_ROOT/.claude/scripts/post-bot-alert.sh"
RMM="http://172.16.3.30:3001"
if [ ! -f "$PROBE" ]; then
echo "[ERROR] Probe script not found: $PROBE" >&2
exit 1
fi
for tool in jq curl; do
if ! command -v "$tool" >/dev/null 2>&1; then
echo "[ERROR] Required tool not found: $tool" >&2
exit 1
fi
done
# Soft-fail wrapper for the bot alert so an alerting failure never aborts the run.
post_alert() {
local msg="$1"
if [ -f "$ALERT" ]; then
bash "$ALERT" "$msg" >/dev/null 2>&1 || true
fi
}
# ---------------------------------------------------------------------------
# Authenticate
# ---------------------------------------------------------------------------
RMM_EMAIL="$(bash "$VAULT" get-field infrastructure/gururmm-server.sops.yaml credentials.gururmm-api.admin-email 2>/dev/null)"
RMM_PASS="$(bash "$VAULT" get-field infrastructure/gururmm-server.sops.yaml credentials.gururmm-api.admin-password 2>/dev/null)"
if [ -z "$RMM_EMAIL" ] || [ -z "$RMM_PASS" ] || [ "$RMM_EMAIL" = "null" ]; then
echo "[ERROR] Could not read GuruRMM credentials from vault (infrastructure/gururmm-server.sops.yaml)" >&2
exit 1
fi
LOGIN_PAYLOAD="$(jq -nc --arg e "$RMM_EMAIL" --arg p "$RMM_PASS" '{email:$e, password:$p}')"
TOKEN="$(curl -s -m 30 -X POST "$RMM/api/auth/login" \
-H "Content-Type: application/json" \
--data-binary "$LOGIN_PAYLOAD" | jq -r '.token // empty')"
if [ -z "$TOKEN" ]; then
echo "[ERROR] RMM login failed (no token returned)" >&2
exit 1
fi
echo "[OK] Authenticated to GuruRMM"
# ---------------------------------------------------------------------------
# Resolve agent (by exact UUID, exact hostname, then partial hostname)
# ---------------------------------------------------------------------------
AGENTS="$(curl -s -m 30 "$RMM/api/agents" -H "Authorization: Bearer $TOKEN")"
if [ -z "$AGENTS" ] || ! echo "$AGENTS" | jq -e 'type=="array"' >/dev/null 2>&1; then
echo "[ERROR] Could not retrieve agent list" >&2
exit 1
fi
# UUID-shaped target -> match by id; otherwise match by hostname.
AGENT=""
if echo "$TARGET" | grep -qiE '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'; then
AGENT="$(echo "$AGENTS" | jq --arg id "$TARGET" '[.[] | select(.id==$id)] | .[0] // empty')"
else
# exact hostname (case-insensitive) first
AGENT="$(echo "$AGENTS" | jq --arg h "$TARGET" '[.[] | select((.hostname|ascii_downcase)==($h|ascii_downcase))] | .[0] // empty')"
if [ -z "$AGENT" ] || [ "$AGENT" = "null" ]; then
# partial match
MATCHES="$(echo "$AGENTS" | jq --arg h "$TARGET" '[.[] | select(.hostname|ascii_downcase|contains($h|ascii_downcase))]')"
COUNT="$(echo "$MATCHES" | jq 'length')"
if [ "$COUNT" = "0" ]; then
AGENT=""
elif [ "$COUNT" = "1" ]; then
AGENT="$(echo "$MATCHES" | jq '.[0]')"
else
echo "[ERROR] Multiple agents match '$TARGET' - be more specific:" >&2
echo "$MATCHES" | jq -r '.[] | " \(.hostname) (\(.os_type)) id=\(.id) client=\(.client_name)"' >&2
exit 1
fi
fi
fi
if [ -z "$AGENT" ] || [ "$AGENT" = "null" ]; then
echo "[ERROR] No agent found matching '$TARGET'. Run /rmm agents to list enrolled agents." >&2
exit 1
fi
AGENT_ID="$(echo "$AGENT" | jq -r '.id // empty')"
AGENT_HOST="$(echo "$AGENT" | jq -r '.hostname // empty')"
AGENT_OS="$(echo "$AGENT" | jq -r '.os_type // empty')"
AGENT_STATUS="$(echo "$AGENT" | jq -r '.status // "unknown"')"
AGENT_CONNECTED="$(echo "$AGENT" | jq -r '.is_connected // "null"')"
AGENT_CLIENT="$(echo "$AGENT" | jq -r '.client_name // empty')"
AGENT_LAST="$(echo "$AGENT" | jq -r '.last_seen // "never"')"
echo "[OK] Agent: $AGENT_HOST ($AGENT_OS) status=$AGENT_STATUS connected=$AGENT_CONNECTED client=$AGENT_CLIENT last_seen=$AGENT_LAST id=$AGENT_ID"
if [ "$AGENT_OS" != "windows" ]; then
echo "[ERROR] This diagnostic is Windows-only. Agent os_type='$AGENT_OS'." >&2
exit 1
fi
# Treat online if status==online OR is_connected==true (is_connected can be null even when online).
if [ "$AGENT_STATUS" != "online" ] && [ "$AGENT_CONNECTED" != "true" ]; then
echo "[WARNING] Agent appears offline (status=$AGENT_STATUS). The command will queue and run when it reconnects."
fi
# Derive client slug if not supplied: prefer explicit arg; else slugify client_name.
if [ -z "$CLIENT_SLUG" ]; then
if [ -n "$AGENT_CLIENT" ]; then
CLIENT_SLUG="$(echo "$AGENT_CLIENT" | tr '[:upper:]' '[:lower:]' | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//')"
echo "[INFO] No client slug supplied; derived '$CLIENT_SLUG' from client name '$AGENT_CLIENT'."
else
CLIENT_SLUG="_unsorted"
echo "[WARNING] No client slug and no client name; using '_unsorted'."
fi
fi
# ---------------------------------------------------------------------------
# Command dispatch helper
# ---------------------------------------------------------------------------
# The agent caps the inline command body at roughly 32-40 KB (above that it
# returns "Failed to execute command" before PowerShell ever runs). The probe is
# ~60 KB, so we cannot send it inline. Instead we:
# 1. base64-encode the probe locally,
# 2. upload it to a temp file on the endpoint in <24 KB chunks (one command
# each: first writes, the rest append),
# 3. send a final small command that decodes the file to a .ps1, runs it,
# prints the fenced JSON, and deletes both temp files.
# Each dispatched command stays well under the agent limit, so this scales no
# matter how large the probe grows in later phases.
WORK_DIR="$(mktemp -d 2>/dev/null || echo "${TMPDIR:-/tmp}/onboard-diag-$$")"
mkdir -p "$WORK_DIR" 2>/dev/null || true
cleanup() { rm -rf "$WORK_DIR" 2>/dev/null || true; }
trap cleanup EXIT
# dispatch_one <command-file-with-script> <timeout_seconds> -> echoes result JSON, returns 0/1
dispatch_one() {
local script_file="$1"
local to="$2"
local payload_file resp cmd_id status result count
payload_file="$WORK_DIR/payload.json"
jq -nc --rawfile cmd "$script_file" --argjson to "$to" \
'{command_type:"powershell", command:$cmd, timeout_seconds:$to}' > "$payload_file"
resp="$(curl -s -m 30 -X POST "$RMM/api/agents/$AGENT_ID/command" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
--data-binary "@$payload_file")"
cmd_id="$(echo "$resp" | jq -r '.command_id // empty')"
if [ -z "$cmd_id" ]; then
echo "[ERROR] Dispatch failed: $resp" >&2
return 1
fi
count=0
while [ $count -lt 72 ]; do
result="$(curl -s -m 30 "$RMM/api/commands/$cmd_id" -H "Authorization: Bearer $TOKEN")"
status="$(echo "$result" | jq -r '.status // empty')"
case "$status" in
completed|failed|cancelled|interrupted)
# Persist the command id to a file: this function runs in a $( )
# subshell, so a plain variable assignment would not survive.
printf '%s' "$cmd_id" > "$WORK_DIR/last_cmd_id" 2>/dev/null || true
echo "$result"
return 0
;;
running|pending|"") count=$((count + 1)); sleep 5 ;;
*) count=$((count + 1)); sleep 5 ;;
esac
done
echo "[ERROR] Command $cmd_id did not finish (last status=$status)" >&2
return 1
}
# ---------------------------------------------------------------------------
# Upload probe (base64, chunked) then execute
# ---------------------------------------------------------------------------
echo "[INFO] Uploading probe to endpoint (chunked base64)..."
# Stable-ish remote temp names; unique per run via timestamp+pid.
REMOTE_TAG="grmm_onboard_$(date -u +%Y%m%d%H%M%S)_$$"
REMOTE_B64="\$env:TEMP\\${REMOTE_TAG}.b64"
REMOTE_PS1="\$env:TEMP\\${REMOTE_TAG}.ps1"
# Produce base64 (single line) and split into chunks.
B64_FILE="$WORK_DIR/probe.b64"
base64 -w0 "$PROBE" > "$B64_FILE" 2>/dev/null || base64 "$PROBE" | tr -d '\n' > "$B64_FILE"
CHUNK_DIR="$WORK_DIR/chunks"
mkdir -p "$CHUNK_DIR"
split -b 24000 "$B64_FILE" "$CHUNK_DIR/chunk_"
CHUNKS=$(ls -1 "$CHUNK_DIR"/chunk_* | sort)
N_CHUNKS=$(echo "$CHUNKS" | wc -l | tr -d ' ')
echo "[INFO] Probe is $(wc -c < "$PROBE") bytes -> $N_CHUNKS chunk(s)"
IDX=0
for ch in $CHUNKS; do
IDX=$((IDX + 1))
# INVARIANT: DATA is RFC4648 standard base64 (alphabet A-Za-z0-9+/ with '='
# padding). None of those characters are PowerShell metacharacters, so DATA
# is safe to interpolate raw into the here-doc below. If this is ever changed
# to base64url (alphabet adds '-' and '_'), it stays safe too - but revisit
# this assertion before swapping the encoder, do not assume silently.
DATA="$(cat "$ch")"
SCRIPT_FILE="$WORK_DIR/chunkcmd.ps1"
if [ "$IDX" -eq 1 ]; then
# First chunk: create/overwrite the file (no newline appended).
cat > "$SCRIPT_FILE" <<PS
\$ErrorActionPreference = 'Stop'
[System.IO.File]::WriteAllText("$REMOTE_B64", "$DATA")
Write-Output "CHUNK $IDX OK"
PS
else
cat > "$SCRIPT_FILE" <<PS
\$ErrorActionPreference = 'Stop'
[System.IO.File]::AppendAllText("$REMOTE_B64", "$DATA")
Write-Output "CHUNK $IDX OK"
PS
fi
CH_RESULT="$(dispatch_one "$SCRIPT_FILE" 60)" || { echo "[ERROR] Chunk $IDX dispatch failed" >&2; exit 1; }
CH_STATUS="$(echo "$CH_RESULT" | jq -r '.status')"
if [ "$CH_STATUS" != "completed" ]; then
echo "[ERROR] Chunk $IDX upload failed: status=$CH_STATUS stderr=$(echo "$CH_RESULT" | jq -r '.stderr' | head -c 200)" >&2
exit 1
fi
echo "[OK] Uploaded chunk $IDX/$N_CHUNKS"
done
echo "[INFO] Decoding and executing probe on endpoint (timeout 240s)..."
# Final command: decode base64 file -> .ps1, run it, then clean up both temp files.
RUN_SCRIPT="$WORK_DIR/runcmd.ps1"
cat > "$RUN_SCRIPT" <<PS
\$ErrorActionPreference = 'Continue'
try {
\$b64 = [System.IO.File]::ReadAllText("$REMOTE_B64")
\$bytes = [System.Convert]::FromBase64String(\$b64)
[System.IO.File]::WriteAllBytes("$REMOTE_PS1", \$bytes)
& powershell.exe -NonInteractive -ExecutionPolicy Bypass -File "$REMOTE_PS1"
} catch {
Write-Output ("PROBE_RUN_ERROR: " + \$_.Exception.Message)
} finally {
Remove-Item -Path "$REMOTE_B64" -Force -ErrorAction SilentlyContinue
Remove-Item -Path "$REMOTE_PS1" -Force -ErrorAction SilentlyContinue
}
PS
RESULT="$(dispatch_one "$RUN_SCRIPT" 240)" || { echo "[ERROR] Probe execution dispatch failed" >&2; exit 1; }
CMD_ID="$(cat "$WORK_DIR/last_cmd_id" 2>/dev/null || echo unknown)"
FINAL_STATUS="$(echo "$RESULT" | jq -r '.status // empty')"
EXIT_CODE="$(echo "$RESULT" | jq -r '.exit_code // "null"')"
STDOUT="$(echo "$RESULT" | jq -r '.stdout // ""')"
STDERR="$(echo "$RESULT" | jq -r '.stderr // ""')"
echo "[INFO] Probe finished: status=$FINAL_STATUS exit_code=$EXIT_CODE stdout_len=${#STDOUT} stderr_len=${#STDERR} cmd=$CMD_ID"
# ---------------------------------------------------------------------------
# Extract fenced JSON from stdout
# ---------------------------------------------------------------------------
# Pull text strictly between the markers. awk handles arbitrary surrounding noise.
DIAG_JSON="$(printf '%s' "$STDOUT" | awk '
/===DIAG-JSON-START===/ { capture=1; next }
/===DIAG-JSON-END===/ { capture=0 }
capture { print }
')"
if [ -z "$DIAG_JSON" ] || ! echo "$DIAG_JSON" | jq -e '.host' >/dev/null 2>&1; then
echo "[ERROR] Could not extract valid diagnostic JSON from probe output." >&2
echo "[ERROR] status=$FINAL_STATUS exit_code=$EXIT_CODE" >&2
if [ -n "$STDERR" ]; then
echo "--- stderr ---" >&2
printf '%s\n' "$STDERR" | head -40 >&2
fi
echo "--- stdout (first 60 lines) ---" >&2
printf '%s\n' "$STDOUT" | head -60 >&2
exit 1
fi
echo "[OK] Extracted diagnostic JSON ($(echo "$DIAG_JSON" | wc -c | tr -d ' ') bytes)"
# ---------------------------------------------------------------------------
# Grade: RED (any critical) / AMBER (any warning, no critical) / GREEN (none)
# ---------------------------------------------------------------------------
N_CRIT="$(echo "$DIAG_JSON" | jq '[.findings[] | select(.severity=="critical")] | length')"
N_WARN="$(echo "$DIAG_JSON" | jq '[.findings[] | select(.severity=="warning")] | length')"
N_UNK="$(echo "$DIAG_JSON" | jq '[.findings[] | select(.severity=="unknown")] | length')"
N_INFO="$(echo "$DIAG_JSON" | jq '[.findings[] | select(.severity=="info")] | length')"
if [ "$N_CRIT" -gt 0 ]; then
GRADE="RED"
elif [ "$N_WARN" -gt 0 ]; then
GRADE="AMBER"
else
GRADE="GREEN"
fi
PROBE_HOST="$(echo "$DIAG_JSON" | jq -r '.host // empty')"
[ -z "$PROBE_HOST" ] && PROBE_HOST="$AGENT_HOST"
COLLECTED="$(echo "$DIAG_JSON" | jq -r '.collected_at_utc // empty')"
echo "[INFO] Grade=$GRADE critical=$N_CRIT warning=$N_WARN unknown=$N_UNK info=$N_INFO"
# ---------------------------------------------------------------------------
# Output paths
# ---------------------------------------------------------------------------
BASE_DIR="$REPO_ROOT/clients/$CLIENT_SLUG/onboarding-baselines"
mkdir -p "$BASE_DIR"
UTC_STAMP="$(date -u +%Y%m%dT%H%M%S)"
SAFE_HOST="$(echo "$PROBE_HOST" | sed -E 's/[^A-Za-z0-9._-]+/_/g')"
JSON_PATH="$BASE_DIR/${SAFE_HOST}-${UTC_STAMP}.json"
MD_PATH="$BASE_DIR/${SAFE_HOST}-${UTC_STAMP}.md"
# Immutability guard: the per-second UTC_STAMP can collide if two runs land in
# the same second (or a re-run of the same dispatch). A baseline is immutable
# once written, so never truncate an existing one - append a PID uniquifier
# instead so the prior baseline survives intact.
if [ -e "$JSON_PATH" ]; then JSON_PATH="${JSON_PATH%.json}-$$.json"; MD_PATH="${MD_PATH%.md}-$$.md"; fi
# Find the most recent PRIOR baseline json for this host (before we write the new one).
PRIOR_JSON=""
PRIOR_JSON="$(ls -1 "$BASE_DIR/${SAFE_HOST}-"*.json 2>/dev/null | sort | tail -n 1)"
# Write the immutable raw snapshot (pretty-printed for readability/diffing).
echo "$DIAG_JSON" | jq '.' > "$JSON_PATH"
# ---------------------------------------------------------------------------
# Build the Markdown report
# ---------------------------------------------------------------------------
{
echo "# Onboarding Diagnostic Baseline - $PROBE_HOST"
echo ""
echo "- **Grade:** $GRADE"
echo "- **Host:** $PROBE_HOST"
echo "- **Client:** ${AGENT_CLIENT:-$CLIENT_SLUG} (\`$CLIENT_SLUG\`)"
echo "- **Collected (UTC):** $COLLECTED"
echo "- **Agent ID:** $AGENT_ID"
echo "- **Command ID:** $CMD_ID"
echo "- **Findings:** $N_CRIT critical / $N_WARN warning / $N_INFO info / $N_UNK unknown"
echo ""
OS_CAPTION="$(echo "$DIAG_JSON" | jq -r '.os.caption // "?"')"
OS_BUILD="$(echo "$DIAG_JSON" | jq -r '.os.build // "?"')"
echo "- **OS:** $OS_CAPTION (build $OS_BUILD)"
echo ""
echo "---"
echo ""
for sev in critical warning info unknown; do
SEV_COUNT="$(echo "$DIAG_JSON" | jq --arg s "$sev" '[.findings[] | select(.severity==$s)] | length')"
[ "$SEV_COUNT" = "0" ] && continue
SEV_LABEL="$(echo "$sev" | tr '[:lower:]' '[:upper:]')"
echo "## $SEV_LABEL ($SEV_COUNT)"
echo ""
echo "$DIAG_JSON" | jq -r --arg s "$sev" '
.findings[] | select(.severity==$s) |
"### " + .title + "\n" +
"- **Category:** " + (.category // "?") + "\n" +
"- **ID:** `" + (.id // "?") + "`\n" +
"- " + (.detail // "") + "\n" +
(if (.evidence // "") != "" then "\n```\n" + .evidence + "\n```\n" else "" end)
'
echo ""
done
echo "---"
echo ""
echo "## Inventory Baseline Summary"
echo ""
echo "$DIAG_JSON" | jq -r '
.facts as $f |
"- **Manufacturer / Model:** " + (($f.hardware.manufacturer // "?") + " / " + ($f.hardware.model // "?")) + "\n" +
"- **Serial:** " + ($f.hardware.serial // "?") + "\n" +
"- **CPU:** " + ($f.hardware.cpu // "?") + " (" + (($f.hardware.cpu_cores // 0)|tostring) + " cores / " + (($f.hardware.cpu_logical // 0)|tostring) + " logical)\n" +
"- **RAM (GB):** " + (($f.hardware.ram_gb // 0)|tostring) + "\n" +
"- **BIOS:** " + ($f.hardware.bios_version // "?") + " (" + ($f.hardware.bios_date // "?") + ")\n" +
"- **Chassis is laptop:** " + (($f.is_laptop // false)|tostring) + "\n" +
"- **TPM present / Secure Boot:** " + (($f.tpm.present // "?")|tostring) + " / " + (($f.secure_boot // "?")|tostring) + "\n" +
"- **Domain joined:** " + (($f.domain_joined // false)|tostring) + " (" + ($f.domain // "?") + ")\n" +
"- **OS activation licensed:** " + (($f.activation.licensed // "?")|tostring) + "\n" +
"- **Uptime (days):** " + (($f.uptime_days // "?")|tostring) + "\n" +
"- **Pending reboot:** " + (($f.pending_reboot // false)|tostring) + "\n" +
"- **Installed software count:** " + (($f.installed_software_count // 0)|tostring) + "\n" +
"- **Scheduled tasks (non-MS, enabled):** " + (($f.scheduled_tasks_count // 0)|tostring) + "\n" +
"- **Local administrators:** " + (($f.local_administrators // []) | join(", "))
'
echo ""
echo "### Fixed volumes"
echo ""
echo "$DIAG_JSON" | jq -r '
(.facts.volumes // []) | .[] |
"- " + (.drive // "?") + " - " + ((.free_gb // 0)|tostring) + " GB free of " + ((.size_gb // 0)|tostring) + " GB (" + ((.free_pct // 0)|tostring) + "%)"
'
echo ""
echo "### Network adapters"
echo ""
echo "$DIAG_JSON" | jq -r '
(.facts.network_adapters // []) | .[] |
"- " + (.description // "?") + " - IP: " + ((.ip // []) | join(", ")) + " - DNS: " + ((.dns // []) | join(", ")) + " - DHCP: " + ((.dhcp // false)|tostring)
'
echo ""
# -----------------------------------------------------------------------
# DIFF section vs prior baseline
# -----------------------------------------------------------------------
if [ -n "$PRIOR_JSON" ] && [ -f "$PRIOR_JSON" ]; then
PRIOR_STAMP="$(basename "$PRIOR_JSON")"
echo "---"
echo ""
echo "## Diff vs Prior Baseline"
echo ""
echo "- **Compared against:** \`$PRIOR_STAMP\`"
echo ""
# New findings: ids present now but not before.
NEW_FINDINGS="$(jq -n \
--slurpfile cur "$JSON_PATH" \
--slurpfile old "$PRIOR_JSON" '
($old[0].findings // []) as $o |
($cur[0].findings // []) as $c |
($o | map(.id)) as $oids |
[ $c[] | select(.severity!="info") | select(.id as $id | ($oids | index($id)) | not) ]
')"
# Resolved findings: ids present before but not now.
RESOLVED_FINDINGS="$(jq -n \
--slurpfile cur "$JSON_PATH" \
--slurpfile old "$PRIOR_JSON" '
($old[0].findings // []) as $o |
($cur[0].findings // []) as $c |
($c | map(.id)) as $cids |
[ $o[] | select(.severity!="info") | select(.id as $id | ($cids | index($id)) | not) ]
')"
# Regressed: same id, severity got worse (info<warning<critical; unknown treated as warning-level).
REGRESSED="$(jq -n \
--slurpfile cur "$JSON_PATH" \
--slurpfile old "$PRIOR_JSON" '
def rank(s): if s=="critical" then 3 elif s=="warning" then 2 elif s=="unknown" then 2 elif s=="info" then 1 else 0 end;
($old[0].findings // []) as $o |
($cur[0].findings // []) as $c |
($o | map({key:.id, value:.severity}) | from_entries) as $om |
[ $c[] | select(.id as $id | $om[$id] != null) | select(rank(.severity) > rank($om[.id])) |
{id, title, was: $om[.id], now: .severity} ]
')"
echo "**New findings:**"
echo ""
if [ "$(echo "$NEW_FINDINGS" | jq 'length')" = "0" ]; then
echo "- (none)"
else
echo "$NEW_FINDINGS" | jq -r '.[] | "- [" + (.severity|ascii_upcase) + "] " + .title'
fi
echo ""
echo "**Resolved findings:**"
echo ""
if [ "$(echo "$RESOLVED_FINDINGS" | jq 'length')" = "0" ]; then
echo "- (none)"
else
echo "$RESOLVED_FINDINGS" | jq -r '.[] | "- [" + (.severity|ascii_upcase) + "] " + .title'
fi
echo ""
echo "**Regressed findings:**"
echo ""
if [ "$(echo "$REGRESSED" | jq 'length')" = "0" ]; then
echo "- (none)"
else
echo "$REGRESSED" | jq -r '.[] | "- " + .title + " (" + .was + " -> " + .now + ")"'
fi
echo ""
# Installed-software deltas
SW_ADDED="$(jq -n \
--slurpfile cur "$JSON_PATH" \
--slurpfile old "$PRIOR_JSON" '
((($old[0].facts.installed_software // []) | map(.name)) | unique) as $o |
((($cur[0].facts.installed_software // []) | map(.name)) | unique) as $c |
[ $c[] | select(. as $n | ($o | index($n)) | not) ]
')"
SW_REMOVED="$(jq -n \
--slurpfile cur "$JSON_PATH" \
--slurpfile old "$PRIOR_JSON" '
((($old[0].facts.installed_software // []) | map(.name)) | unique) as $o |
((($cur[0].facts.installed_software // []) | map(.name)) | unique) as $c |
[ $o[] | select(. as $n | ($c | index($n)) | not) ]
')"
echo "**Software added:**"
echo ""
if [ "$(echo "$SW_ADDED" | jq 'length')" = "0" ]; then
echo "- (none)"
else
echo "$SW_ADDED" | jq -r '.[] | "- " + .'
fi
echo ""
echo "**Software removed:**"
echo ""
if [ "$(echo "$SW_REMOVED" | jq 'length')" = "0" ]; then
echo "- (none)"
else
echo "$SW_REMOVED" | jq -r '.[] | "- " + .'
fi
echo ""
else
echo "---"
echo ""
echo "## Diff vs Prior Baseline"
echo ""
echo "- No prior baseline found for this host. This is the first baseline."
echo ""
fi
echo "---"
echo ""
echo "_Generated by run-onboarding-diagnostic.sh (GuruRMM onboarding diagnostic, Phase 1). Raw snapshot: \`$(basename "$JSON_PATH")\` (immutable)._"
} > "$MD_PATH"
# ---------------------------------------------------------------------------
# Alerts (soft-fail): one line for RED overall, one per critical finding (capped)
# ---------------------------------------------------------------------------
if [ "$GRADE" = "RED" ]; then
CRIT_TITLES="$(echo "$DIAG_JSON" | jq -r '[.findings[] | select(.severity=="critical") | .title] | .[0:3] | join("; ")')"
MORE=""
if [ "$N_CRIT" -gt 3 ]; then MORE=" (+$((N_CRIT - 3)) more)"; fi
post_alert "[RMM] Onboarding diag $PROBE_HOST ($CLIENT_SLUG) = RED: $N_CRIT critical - ${CRIT_TITLES}${MORE}"
elif [ "$GRADE" = "AMBER" ]; then
post_alert "[RMM] Onboarding diag $PROBE_HOST ($CLIENT_SLUG) = AMBER: $N_WARN warning, 0 critical"
fi
# ---------------------------------------------------------------------------
# Final console summary
# ---------------------------------------------------------------------------
echo ""
echo "=========================================================="
echo " Onboarding diagnostic complete"
echo " Host: $PROBE_HOST"
echo " Client: ${AGENT_CLIENT:-$CLIENT_SLUG} ($CLIENT_SLUG)"
echo " Grade: $GRADE ($N_CRIT critical / $N_WARN warning / $N_INFO info / $N_UNK unknown)"
echo " JSON: $JSON_PATH"
echo " Report: $MD_PATH"
echo "=========================================================="
echo ""
echo "Report path: $MD_PATH"