sync: auto-sync from GURU-5070 at 2026-05-24 10:17:21
Author: Mike Swanson Machine: GURU-5070 Timestamp: 2026-05-24 10:17:21
Submodule projects/msp-tools/guru-rmm updated: 9af39baee2...aa9ad74ec1
@@ -332,3 +332,118 @@ ssh guru@172.16.3.30 # -> Permission denied (publickey,password) — no build-
- Issues #15-#19: https://git.azcomputerguru.com/azcomputerguru/gururmm/issues/15 .. /19
- Phase 4 commit: `7a4e745`. Coord lock used + released: `3116d737`.
- Published downloads: https://rmm.azcomputerguru.com/downloads/ (poll target). Build server `172.16.3.30` (no SSH from GURU-KALI).

---

## Update: 10:16 PT - LHM deployment + interrupted command cleanup

### User
- **User:** Mike Swanson (mike)
- **Machine:** DESKTOP-0O8A1RL (GURU-5070)
- **Role:** admin
- **Session span:** ~09:45–10:16 PT

---

### Session Summary

Resumed from a previous session whose context window ran out. The outstanding task was pushing the LibreHardwareMonitor (LHM) deployment script to five machines missing the binaries: RECEPTIONIST-PC, LAPTOP-8P7HDSEI, LAS-GAMER, LAPTOP-E0STJJE8, LAPTOP-DRQ5L558. These machines received the agent via the self-updater (binary-only swap) rather than the MSI installer, so the `lhm/` subdirectory was never created.

Authenticated to the GuruRMM API (`claude-api@azcomputerguru.com`), then sent a PowerShell deployment script to all five agents via `POST /api/agents/{id}/command`. The first attempt failed on all five with exit 1 and output "gururmm-agent service not found": the Windows service is registered as `GuruRMMAgent`, not `gururmm-agent`. A corrected script was sent using the right service name, with the install path derived from the `HKLM\SOFTWARE\GuruRMM` registry key (falling back to the service `PathName`, then to a hardcoded default). The corrected scripts ran on all five machines, downloaded LHM v0.9.4 from GitHub releases, extracted it to `C:\Program Files\GuruRMM\lhm\`, and called `Restart-Service GuruRMMAgent -Force`.

The restart call killed the agent mid-execution, so all five commands remained permanently in `running` state: the agent process was dead before it could send results back. This was diagnosed by checking agent online status. All five reconnected within minutes (service auto-restart), indicating the scripts had run through to the restart step. Verification commands then confirmed the deployment: 25 files present in `lhm/` on each machine.

This exposed a systemic gap: any command that restarts the agent leaves an orphaned `running` record that never resolves. The fix was implemented immediately: `interrupt_running_commands(pool, agent_id)` in `server/src/db/commands.rs` flips all `status='running'` rows for an agent to `status='interrupted'` (with `completed_at` and a stderr note) at reconnect time. The call was added to the WS reconnect path in `ws/mod.rs` immediately after the online event insert. The dashboard was updated in `Commands.tsx` and `CommandTerminal.tsx` to render `interrupted` as an amber `AlertTriangle` badge. Committed as `aa9ad74` and pushed; the pipeline build is in progress.

---

### Key Decisions

- **Service name from WiX, not assumption**: The Windows service name `GuruRMMAgent` was confirmed by reading `installer/gururmm-agent.wxs` rather than guessing. The first script used `gururmm-agent` (wrong) and failed on all five machines.
- **Registry-first path resolution**: The deployment script reads `HKLM:\SOFTWARE\GuruRMM` for the install dir (written by the MSI at install time), falling back to the service `PathName`, then to `C:\Program Files\GuruRMM`. This is robust across non-default install paths.
- **Do not block reconnect on cleanup failure**: The result of `interrupt_running_commands` is handled with a `match` that logs a soft `warn!` on error, because a DB failure during reconnect must never prevent the agent from coming online.
- **`interrupted` as a distinct terminal status** (not `failed`): `failed` means the script ran and returned non-zero. `interrupted` means the agent died before it could report. They call for different UI treatment and operator response.
- **No direct service restart in future deployment scripts**: Going forward, any RMM script that needs to restart the agent should use `schtasks` with a 15s delay so the command can exit and report cleanly before the service is stopped. Not enforced today, but documented.
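
The registry-first resolution order described above can be sketched as a small pure function. This is an illustration in Python, not the PowerShell that actually ran. The function names are hypothetical, and it assumes the registry value holds the install directory directly and that the service `PathName` is an executable path, optionally quoted and followed by arguments.

```python
import ntpath
from typing import Optional

# Hardcoded last-resort default named in the session log.
DEFAULT_INSTALL_DIR = r"C:\Program Files\GuruRMM"


def exe_dir_from_path_name(path_name: str) -> Optional[str]:
    """Return the directory of the executable in a service PathName string."""
    text = path_name.strip()
    if not text:
        return None
    if text.startswith('"'):
        # Quoted form: "C:\dir\agent.exe" --args
        end = text.find('"', 1)
        if end == -1:
            return None
        exe = text[1:end]
    else:
        # Unquoted form: cut just after the first ".exe" if there is one.
        idx = text.lower().find(".exe")
        exe = text[: idx + 4] if idx != -1 else text.split(" ", 1)[0]
    return ntpath.dirname(exe) or None


def resolve_install_dir(
    registry_value: Optional[str],
    service_path_name: Optional[str],
    default: str = DEFAULT_INSTALL_DIR,
) -> str:
    """Pick the install dir: registry value, then service PathName, then default."""
    if registry_value and registry_value.strip():
        return registry_value.strip()
    if service_path_name:
        from_service = exe_dir_from_path_name(service_path_name)
        if from_service:
            return from_service
    return default
```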

---

### Problems Encountered

- **Wrong service name in first script**: `gururmm-agent` vs `GuruRMMAgent`. Discovered from the "service not found" output. Fixed by reading the WiX installer source.
- **Commands stuck as `running` forever after Restart-Service**: `Restart-Service GuruRMMAgent -Force` killed the agent process that was executing the command, so the result was never sent. Commands had `stdout: null`, `stderr: null`, `exit_code: null`, no `completed_at`. Diagnosed by observing that all five agents came back online (reconnected) shortly after; deployment success was then confirmed via separate verification commands.
- **API polling used wrong field names**: Initial poll used `output`/`error_output` (wrong). Actual fields are `stdout`/`stderr`. Caught by seeing `null` for both when `exit_code` was 1.
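
A bounded polling loop that reads the correct `stdout`/`stderr` fields might look like the sketch below. It assumes the command record is a JSON object with `status`, `exit_code`, `stdout`, and `stderr` keys, as in the `jq` filter used in this session. The `pending` status, the `fetch` callable, and the function name are hypothetical. The attempt limit matters because, before the `interrupted` fix, a command could stay `running` indefinitely.

```python
import time
from typing import Any, Callable, Dict

# Statuses treated as "still in progress". `running` comes from the log;
# `pending` is an assumed pre-dispatch state.
IN_PROGRESS = {"pending", "running"}

RESULT_FIELDS = ("status", "exit_code", "stdout", "stderr")


def poll_command(
    fetch: Callable[[str], Dict[str, Any]],
    command_id: str,
    interval_s: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the command leaves an in-progress status, or give up.

    `fetch` is expected to return the decoded JSON body of
    GET /api/commands/{command_id}.
    """
    for _ in range(max_attempts):
        record = fetch(command_id)
        if record.get("status") not in IN_PROGRESS:
            # Only stdout/stderr are read; output/error_output do not exist.
            return {field: record.get(field) for field in RESULT_FIELDS}
        sleep(interval_s)
    raise TimeoutError(
        f"command {command_id} still in progress after {max_attempts} polls"
    )
```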

---

### Configuration Changes

- `server/migrations/043_command_interrupted_status.sql`: new; documents `interrupted` as a valid status value
- `server/src/db/commands.rs`: added `interrupt_running_commands(pool, agent_id) -> Result<u64, sqlx::Error>`
- `server/src/ws/mod.rs`: inserted `interrupt_running_commands` call at agent reconnect (after online event, before watchdog resolve)
- `dashboard/src/api/client.ts`: added `"interrupted"` to `Command.status` union type
- `dashboard/src/components/CommandTerminal.tsx`: added amber `AlertTriangle` case for `interrupted` status
- `dashboard/src/pages/Commands.tsx`: added `interrupted` to `StatusIcon` and `STATUS_BADGE_CLASSES` (amber)
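
The reconnect-time cleanup can be illustrated with a minimal sketch. It is written in Python against SQLite for self-containment; the real function is Rust using `sqlx`, and its exact SQL is not shown in this log. The `commands` table name, the wording of the stderr note, and the `on_agent_reconnect` wrapper are assumptions. The column names (`status`, `completed_at`, `stderr`, `agent_id`) follow the description above. The wrapper mirrors the decision that a cleanup failure is logged and never blocks the agent from coming online.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Callable

# Hypothetical wording for the note written to stderr.
INTERRUPT_NOTE = "Agent reconnected before reporting a result; command marked interrupted."


def interrupt_running_commands(conn: sqlite3.Connection, agent_id: str) -> int:
    """Flip this agent's `running` commands to `interrupted`; return rows changed."""
    now = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "UPDATE commands "
        "SET status = 'interrupted', completed_at = ?, stderr = ? "
        "WHERE agent_id = ? AND status = 'running'",
        (now, INTERRUPT_NOTE, agent_id),
    )
    conn.commit()
    return cur.rowcount


def on_agent_reconnect(
    conn: sqlite3.Connection,
    agent_id: str,
    log: Callable[[str], None] = print,
) -> bool:
    """Run the cleanup on reconnect, but never let a DB error block the agent."""
    try:
        changed = interrupt_running_commands(conn, agent_id)
        if changed:
            log(f"marked {changed} running command(s) as interrupted for {agent_id}")
    except sqlite3.Error as exc:
        # Soft warning only: the agent must still come online.
        log(f"warn: could not interrupt running commands for {agent_id}: {exc}")
    return True
```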

---

### Credentials & Secrets

- GuruRMM API: `claude-api@azcomputerguru.com` / `ClaudeAPI2026!@#` — vault path: `infrastructure/gururmm-server.sops.yaml` → `credentials.gururmm-api.admin-password`
- JWT secret: `ZNzGxghru2XUdBVlaf2G2L1YUBVcl5xH0lr/Gpf/QmE=` — vault same path

---

### Infrastructure & Servers

- GuruRMM API: `http://172.16.3.30:3001`
- LHM deployed to: `C:\Program Files\GuruRMM\lhm\` (25 files) on all five target machines
- Target agent IDs:
  - RECEPTIONIST-PC: `9c91d324-1073-449c-8cc0-45c5bccfc218`
  - LAPTOP-8P7HDSEI: `9b74852c-623a-4d4a-bdda-1709ee75ae44`
  - LAS-GAMER: `7236a75d-2033-4a07-8161-50a312fa08f3`
  - LAPTOP-E0STJJE8: `4ac00700-9a9b-4e7f-a7aa-c51857b77661`
  - LAPTOP-DRQ5L558: `f9e25b3b-da63-40ff-94a6-8cec3b9a19ce`

---

### Commands & Outputs

```bash
# Authenticate (prints the token; store it in TOKEN for the calls below)
curl -s -X POST http://172.16.3.30:3001/api/auth/login \
-H "Content-Type: application/json" \
-d '{"email":"claude-api@azcomputerguru.com","password":"ClaudeAPI2026!@#"}' | jq -r '.token'

# Send command to agent
curl -s -X POST "http://172.16.3.30:3001/api/agents/{id}/command" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"command_type":"powershell","command":"...","elevated":true}'

# Poll result
curl -s "http://172.16.3.30:3001/api/commands/{command_id}" \
-H "Authorization: Bearer $TOKEN" | jq '{status, exit_code, stdout, stderr}'
```

Verification output (all 5 machines):
```
OK: LHM present at C:\Program Files\GuruRMM\lhm\LibreHardwareMonitor.exe (25 files in lhm/)
```

---

### Pending / Incomplete Tasks

- **Pipeline build for `aa9ad74`**: The build triggered by the Gitea webhook is in progress; after deploy, verify that the `interrupted` status renders correctly in the dashboard.
- **Schtasks pattern for future restart-needing scripts**: Document or enforce the convention that RMM scripts requiring agent restart should use a scheduled task with a delay instead of calling `Restart-Service` directly.
- **Orphaned commands from today**: The five deployment commands from this session remain in `running` state (pre-fix). They need manual cleanup, or they will be resolved when those agents next reconnect after the new build deploys.
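
One possible shape for the delayed-restart convention is sketched below. It is not an adopted implementation. Because `schtasks /ST` only takes `HH:mm`, this variant puts the 15-second wait inside the task action and triggers the task explicitly with `/Run`. The task name, the `00:00` placeholder start time, and the helper name are hypothetical, and the sketch only builds the two argument lists rather than executing them.

```python
from typing import List, Tuple


def build_delayed_restart(
    service_name: str,
    delay_s: int = 15,
    task_name: str = "GuruRMM-DelayedRestart",  # hypothetical task name
) -> Tuple[List[str], List[str]]:
    """Build schtasks argv lists that restart a service after a delay.

    The task itself sleeps, so the RMM command that registers and starts it
    can exit and report its result before the agent service goes down.
    """
    action = (
        "powershell.exe -NoProfile -Command "
        f'"Start-Sleep -Seconds {delay_s}; Restart-Service {service_name} -Force"'
    )
    create = [
        "schtasks", "/Create",
        "/TN", task_name,
        "/TR", action,
        "/SC", "ONCE",
        "/ST", "00:00",   # placeholder; the task is started explicitly below
        "/RU", "SYSTEM",
        "/F",             # overwrite a task left over from a previous run
    ]
    run = ["schtasks", "/Run", "/TN", task_name]
    return create, run
```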

---

### Reference Information

- GuruRMM repo: `azcomputerguru/gururmm` on Gitea (`http://172.16.3.20:3000`)
- Commit with interrupted cleanup: `aa9ad74`
- LHM release used: v0.9.4 (`LibreHardwareMonitor-net472.zip`) from GitHub releases
- WiX service name confirmed in: `installer/gururmm-agent.wxs` → `<ServiceInstall Name="GuruRMMAgent" ...>`
- Command API routes: `POST /api/agents/:id/command`, `GET /api/commands/:id`, `GET /api/commands?agent_id=...`