From e246f797adf57fe968ff5539baf88378507b74bc Mon Sep 17 00:00:00 2001
From: Mike Swanson
Date: Sun, 24 May 2026 10:17:24 -0700
Subject: [PATCH] sync: auto-sync from GURU-5070 at 2026-05-24 10:17:21

Author: Mike Swanson
Machine: GURU-5070
Timestamp: 2026-05-24 10:17:21
---
 projects/msp-tools/guru-rmm        |   2 +-
 session-logs/2026-05-24-session.md | 115 +++++++++++++++++++++++++++++
 2 files changed, 116 insertions(+), 1 deletion(-)

diff --git a/projects/msp-tools/guru-rmm b/projects/msp-tools/guru-rmm
index 9af39ba..aa9ad74 160000
--- a/projects/msp-tools/guru-rmm
+++ b/projects/msp-tools/guru-rmm
@@ -1 +1 @@
-Subproject commit 9af39baee2fddcf873ad77d1ecff49a1dbe53e7f
+Subproject commit aa9ad74ec17213a2895e4c97a36e4e1d2f7463bd
diff --git a/session-logs/2026-05-24-session.md b/session-logs/2026-05-24-session.md
index 3b474ef..44d2baa 100644
--- a/session-logs/2026-05-24-session.md
+++ b/session-logs/2026-05-24-session.md
@@ -332,3 +332,118 @@ ssh guru@172.16.3.30 # -> Permission denied (publickey,password) — no build-
 - Issues #15-#19: https://git.azcomputerguru.com/azcomputerguru/gururmm/issues/15 .. /19
 - Phase 4 commit: `7a4e745`. Coord lock used + released: `3116d737`.
 - Published downloads: https://rmm.azcomputerguru.com/downloads/ (poll target). Build server `172.16.3.30` (no SSH from GURU-KALI).
+
+---
+
+## Update: 10:16 PT — LHM deployment + interrupted command cleanup
+
+### User
+- **User:** Mike Swanson (mike)
+- **Machine:** DESKTOP-0O8A1RL (GURU-5070)
+- **Role:** admin
+- **Session span:** ~09:45–10:16 PT
+
+---
+
+### Session Summary
+
+Resumed from a previous context that ran out of window. The outstanding task was pushing the LibreHardwareMonitor (LHM) deployment script to five machines missing the binaries: RECEPTIONIST-PC, LAPTOP-8P7HDSEI, LAS-GAMER, LAPTOP-E0STJJE8, LAPTOP-DRQ5L558. These machines received the agent via the self-updater (binary-only swap) rather than the MSI installer, so the `lhm/` subdirectory was never created.
+
+Authenticated to the GuruRMM API (`claude-api@azcomputerguru.com`), then sent a PowerShell deployment script to all five agents via `POST /api/agents/{id}/command`. The first attempt failed on all five with exit 1 and output "gururmm-agent service not found" — the Windows service is registered as `GuruRMMAgent`, not `gururmm-agent`. A corrected script was sent using the right service name, with the install path derived from the `HKLM\SOFTWARE\GuruRMM` registry key (falling back to service PathName, then hardcoded default). The scripts ran on all five machines, downloaded LHM v0.9.4 from GitHub releases, extracted to `C:\Program Files\GuruRMM\lhm\`, and called `Restart-Service GuruRMMAgent -Force`.
+
+The restart call killed the agent mid-execution, so all five commands remained permanently in `running` state — the process was dead before it could send results back. This was diagnosed by checking agent online status: all five reconnected within minutes (service auto-restart), confirming the deployment had succeeded. Verification commands confirmed 25 files present in `lhm/` on each machine.
+
+This exposed a systemic gap: any command that restarts the agent leaves an orphaned `running` record that never resolves. The fix was implemented immediately: `interrupt_running_commands(pool, agent_id)` in `server/src/db/commands.rs` flips all `status='running'` rows for an agent to `status='interrupted'` (with `completed_at` and a stderr note) at reconnect time. The call was added to the WS reconnect path in `ws/mod.rs` immediately after the online event insert. The dashboard was updated in `Commands.tsx` and `CommandTerminal.tsx` to render `interrupted` as an amber `AlertTriangle` badge. Committed as `aa9ad74`, pushed, pipeline building.
+
+---
+
+### Key Decisions
+
+- **Service name from WiX, not assumption**: The Windows service name `GuruRMMAgent` was confirmed by reading `installer/gururmm-agent.wxs` rather than guessing.
The first script used `gururmm-agent` (wrong) and failed on all five machines.
+- **Registry-first path resolution**: The deployment script reads `HKLM:\SOFTWARE\GuruRMM` for the install dir (written by the MSI at install time), falling back to the service `PathName`, then to `C:\Program Files\GuruRMM`. This is robust across non-default install paths.
+- **Do not block reconnect on cleanup failure**: `interrupt_running_commands` uses a `match` with a soft `warn!` on error — a DB failure during reconnect must never prevent the agent from coming online.
+- **`interrupted` as a distinct terminal status** (not `failed`): `failed` means the script ran and returned non-zero. `interrupted` means the agent died before it could report. They call for different UI treatment and operator response.
+- **No service restart in future deployment scripts**: Going forward, any RMM script that needs to restart the agent should use `schtasks` with a 15s delay so the command can exit and report cleanly before the service is stopped. Not enforced today, but documented.
+
+---
+
+### Problems Encountered
+
+- **Wrong service name in first script**: `gururmm-agent` vs `GuruRMMAgent`. Discovered from the "service not found" output. Fixed by reading the WiX installer source.
+- **Commands stuck as `running` forever after Restart-Service**: `Restart-Service GuruRMMAgent -Force` killed the agent process that was executing the command, so the result was never sent. Commands had `stdout: null`, `stderr: null`, `exit_code: null`, no `completed_at`. Diagnosed by observing that all five agents came back online (reconnected) shortly after, confirmed deployment success via separate verification commands.
+- **API polling used wrong field names**: Initial poll used `output`/`error_output` (wrong). Actual fields are `stdout`/`stderr`. Caught by seeing `null` for both when `exit_code` was 1.
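
The stuck-`running` problem above suggests that any poll loop against this API should be bounded. The following is a minimal sketch, assuming the `GET /api/commands/:id` route and the `$TOKEN` bearer token recorded in this log. The `completed` status name, the `API_BASE` and `POLL_INTERVAL` variables, and all three function names are hypothetical and not confirmed by the session.

```bash
#!/usr/bin/env bash
# Sketch only: bounded polling for a GuruRMM command result.
# API_BASE, POLL_INTERVAL, and the function names are illustrative assumptions.

API_BASE="${API_BASE:-http://172.16.3.30:3001}"   # API address taken from the log
POLL_INTERVAL="${POLL_INTERVAL:-5}"               # seconds between polls (assumed)

# Print the status field of one command (GET /api/commands/:id, bearer auth).
fetch_status() {
  curl -s "$API_BASE/api/commands/$1" \
    -H "Authorization: Bearer $TOKEN" | jq -r '.status'
}

# Succeed when the status needs no further polling.
# `failed` and `interrupted` appear in the log; `completed` is an assumed name.
is_terminal_status() {
  case "$1" in
    completed|failed|interrupted) return 0 ;;
    *) return 1 ;;
  esac
}

# Poll until a terminal status is seen or max_attempts polls have been made.
# Prints the last status seen; returns 0 if terminal, 1 if attempts ran out.
poll_command() {
  pc_id="$1"
  pc_max="${2:-24}"
  pc_try=1
  pc_status=""
  while [ "$pc_try" -le "$pc_max" ]; do
    pc_status="$(fetch_status "$pc_id")"
    if is_terminal_status "$pc_status"; then
      echo "$pc_status"
      return 0
    fi
    sleep "$POLL_INTERVAL"
    pc_try=$((pc_try + 1))
  done
  echo "$pc_status"
  return 1
}
```

With the defaults, `poll_command "$COMMAND_ID" 24` would give up after roughly two minutes instead of waiting on a record that never leaves `running`; a non-zero return with `running` printed is the cue to check whether the agent restarted.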
+
+---
+
+### Configuration Changes
+
+- `server/migrations/043_command_interrupted_status.sql` — new, documents `interrupted` as a valid status value
+- `server/src/db/commands.rs` — added `interrupt_running_commands(pool, agent_id) -> Result`
+- `server/src/ws/mod.rs` — inserted `interrupt_running_commands` call at agent reconnect (after online event, before watchdog resolve)
+- `dashboard/src/api/client.ts` — added `"interrupted"` to `Command.status` union type
+- `dashboard/src/components/CommandTerminal.tsx` — added amber `AlertTriangle` case for `interrupted` status
+- `dashboard/src/pages/Commands.tsx` — added `interrupted` to `StatusIcon` and `STATUS_BADGE_CLASSES` (amber)
+
+---
+
+### Credentials & Secrets
+
+- GuruRMM API: `claude-api@azcomputerguru.com` / `ClaudeAPI2026!@#` — vault path: `infrastructure/gururmm-server.sops.yaml` → `credentials.gururmm-api.admin-password`
+- JWT secret: `ZNzGxghru2XUdBVlaf2G2L1YUBVcl5xH0lr/Gpf/QmE=` — vault same path
+
+---
+
+### Infrastructure & Servers
+
+- GuruRMM API: `http://172.16.3.30:3001`
+- LHM deployed to: `C:\Program Files\GuruRMM\lhm\` (25 files) on all five target machines
+- Target agent IDs:
+  - RECEPTIONIST-PC: `9c91d324-1073-449c-8cc0-45c5bccfc218`
+  - LAPTOP-8P7HDSEI: `9b74852c-623a-4d4a-bdda-1709ee75ae44`
+  - LAS-GAMER: `7236a75d-2033-4a07-8161-50a312fa08f3`
+  - LAPTOP-E0STJJE8: `4ac00700-9a9b-4e7f-a7aa-c51857b77661`
+  - LAPTOP-DRQ5L558: `f9e25b3b-da63-40ff-94a6-8cec3b9a19ce`
+
+---
+
+### Commands & Outputs
+
+```bash
+# Authenticate
+curl -s -X POST http://172.16.3.30:3001/api/auth/login \
+  -H "Content-Type: application/json" \
+  -d '{"email":"claude-api@azcomputerguru.com","password":"ClaudeAPI2026!@#"}' | jq -r '.token'
+
+# Send command to agent
+curl -s -X POST "http://172.16.3.30:3001/api/agents/{id}/command" \
+  -H "Authorization: Bearer $TOKEN" \
+  -H "Content-Type: application/json" \
+  -d '{"command_type":"powershell","command":"...","elevated":true}'
+
+# Poll result
+curl -s
"http://172.16.3.30:3001/api/commands/{command_id}" \ + -H "Authorization: Bearer $TOKEN" | jq '{status, exit_code, stdout, stderr}' +``` + +Verification output (all 5 machines): +``` +OK: LHM present at C:\Program Files\GuruRMM\lhm\LibreHardwareMonitor.exe (25 files in lhm/) +``` + +--- + +### Pending / Incomplete Tasks + +- **Pipeline build for `aa9ad74`**: Gitea webhook building; verify the `interrupted` status renders correctly in dashboard after deploy. +- **Schtasks pattern for future restart-needing scripts**: Document or enforce the convention that RMM scripts requiring agent restart should use a scheduled task with a delay instead of calling `Restart-Service` directly. +- **Orphaned commands from today**: The five deployment commands from this session remain in `running` state (pre-fix). They will need manual cleanup or will be resolved when those agents next reconnect after the new build deploys. + +--- + +### Reference Information + +- GuruRMM gururmm repo: `azcomputerguru/gururmm` on Gitea (`http://172.16.3.20:3000`) +- Commit with interrupted cleanup: `aa9ad74` +- LHM release used: v0.9.4 (`LibreHardwareMonitor-net472.zip`) from GitHub releases +- WiX service name confirmed in: `installer/gururmm-agent.wxs` → `` +- Command API routes: `POST /api/agents/:id/command`, `GET /api/commands/:id`, `GET /api/commands?agent_id=...`