Session log: GuruRMM 4-bug fix + MSP360 backup integration 2026-05-19

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 16:53:48 -07:00
parent 6608c05674
commit b804a92a05
6 changed files with 63 additions and 2 deletions

@@ -0,0 +1,54 @@
# Session Log: GuruRMM — Bug Fixes & MSP360 Backup Integration
**Date:** 2026-05-19
**Duration:** ~3 hours
## User
- **User:** Mike Swanson (mike)
- **Machine:** DESKTOP-0O8A1RL
- **Role:** admin
## Summary
Two areas of work: fixed 4 agent/server bugs identified from AD2's crash loop, then diagnosed and fixed the MSP360 backup integration, which had never been configured.
## Part 1: 4-Bug Fix (v0.6.25)
Investigated why AD2's RMM agent was crash-looping and why the watchdog never fired. Root cause: agent 0.6.22/0.6.23 sent `user_inventory_report` WS messages the server couldn't deserialize. Also found a 48-minute update gap where the 30s grace period was too short for a Windows Defender scan of the new binary.
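A handler can tolerate message types it does not know, such as `user_inventory_report` here, by mapping them to a catch-all instead of failing. The sketch below is a minimal std-only illustration, not the GuruRMM code: the `Heartbeat` variant, the tag strings `heartbeat` and `watchdog_event`, and the helpers `parse_message_type` and `handle_message` are all hypothetical. If the server deserializes with serde (an assumption), a `#[serde(other)]` catch-all variant is one way to get similar behavior.

```rust
/// Message kinds this (hypothetical) server build understands.
#[derive(Debug, PartialEq)]
enum AgentMessage {
    Heartbeat,
    WatchdogEvent,
    /// Catch-all for message types introduced by newer agents.
    Unknown(String),
}

/// Map a wire-level type tag to a variant. Unknown tags never cause an error.
fn parse_message_type(tag: &str) -> AgentMessage {
    match tag {
        "heartbeat" => AgentMessage::Heartbeat,
        "watchdog_event" => AgentMessage::WatchdogEvent,
        other => AgentMessage::Unknown(other.to_string()),
    }
}

/// Handle one message. Returns true if it was processed, false if it was skipped.
fn handle_message(tag: &str) -> bool {
    match parse_message_type(tag) {
        // Skip silently and keep the connection open.
        AgentMessage::Unknown(_) => false,
        _ => true,
    }
}
```

With this shape, `handle_message("user_inventory_report")` returns `false` and the caller keeps reading from the socket rather than tearing down the handler.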
### Bugs fixed (commits 56723b1, 2a7b74b):
1. **Grace period too short during updates:** grace period extended; `agent_updates` is now polled for up to 2 hours before the agent is marked offline
2. **AgentMessage unknown variant crash:** unknown WS message types are now skipped silently (forward-compat); previously they crashed the WS handler
3. **WatchdogEvent not persisted:** WatchdogEvent messages are now written to the `watchdog_events` DB table
4. **Watchdog never started:** `ensure_watchdog_running()` was implemented but never called from `run_agent()`, and the `agent-id.txt` sidecar (required by `post_watchdog_alert`) was never written after WS auth
5. **Reviewer notes** (commit 2a7b74b): fixed the `has_in_progress_update` NULL gap; added `warn!` on WatchdogEvent DB insert failure
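The grace-period change in item 1 can be reduced to a single decision. This sketch assumes the result of polling `agent_updates` is available as a boolean; the constant names and `should_mark_offline` are hypothetical, and the real check is not shown in this log.

```rust
use std::time::Duration;

/// Grace period when no update is in flight (30s, per the log).
const DEFAULT_GRACE: Duration = Duration::from_secs(30);
/// Extended grace period while an update is in progress (2 hours, per the log).
const UPDATE_GRACE: Duration = Duration::from_secs(2 * 60 * 60);

/// Decide whether a disconnected agent should be marked offline.
/// `has_in_progress_update` stands in for the `agent_updates` lookup.
fn should_mark_offline(since_disconnect: Duration, has_in_progress_update: bool) -> bool {
    let grace = if has_in_progress_update {
        UPDATE_GRACE
    } else {
        DEFAULT_GRACE
    };
    since_disconnect > grace
}
```

Under these assumptions, the 48-minute gap seen on AD2 no longer marks the agent offline while an update is in progress, but still does when no update is pending.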
## Part 2: MSP360 Backup Integration
Backup tab on AD2 showed nothing. Root cause chain:
1. `mspbackups_config` was empty — API credentials never configured. Fixed: loaded credentials from vault (`msp-tools/msp360-api.sops.yaml`), configured via API.
2. `POST /api/mspbackups/config` failed with `partner_id` NOT NULL violation — handler was passing `None`. Fixed in commit `3b29acc`.
3. The build pipeline builds only agents, not the server. Discovered the server build script at `/opt/gururmm/build-server.sh`.
4. SOPS vault file had unquoted YAML timestamp (`created: 2026-05-18T00:00:00Z`) causing `time.Time` walk error. Fixed by quoting it in the raw YAML.
5. MSP360 `/api/Monitoring` returns `null` for `LastStart`/`NextStart` on 14 records — struct had `String` not `Option<String>`. Fixed in commit `91630cb`.
6. Hostname match picked offline phantom AD2 agent (f6a99fe7, crash-loop duplicate) instead of online agent (49c66d8b). Fixed in commit `86e7ade`: `find_agent_by_hostname_ci` now orders by `status='online'` first.
7. `last_backup_at`/`next_backup_at` stored as NULL — MSP360 dates lack timezone (`2026-05-19T07:00:04`, not RFC3339). Fixed in commit `f146bd9`: fallback parser treats naive timestamps as UTC.
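The selection rule from item 6 (case-insensitive hostname match, online agent first) can be sketched in memory. The log describes the real fix as an ordering on `status='online'` inside `find_agent_by_hostname_ci`; the `Agent` struct and `pick_agent_by_hostname` below are hypothetical and only illustrate the rule.

```rust
/// Minimal stand-in for an agent row (fields are assumptions).
#[derive(Debug, Clone, PartialEq)]
struct Agent {
    id: String,
    hostname: String,
    status: String,
}

/// Case-insensitive hostname match that prefers an online agent.
fn pick_agent_by_hostname<'a>(agents: &'a [Agent], hostname: &str) -> Option<&'a Agent> {
    agents
        .iter()
        .filter(|a| a.hostname.eq_ignore_ascii_case(hostname))
        // `false` orders before `true`, so agents whose status is "online"
        // (key = false) are picked ahead of any other status.
        .min_by_key(|a| a.status != "online")
}
```

Given the two AD2 records from item 6, this returns the online agent `49c66d8b` even when the offline duplicate `f6a99fe7` appears first in the input.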
### Result
AD2 backup tab now shows: `status: success`, last backup `2026-05-19T07:00:04Z`, next `2026-05-20T07:00:00Z`, plan `AD2 Image`, 6 files, ~355 GB. Syncs every 15 minutes.
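The `Z`-suffixed times above depend on the naive-timestamp fallback from commit `f146bd9`. A string-level sketch of that idea follows. It assumes the input is already shaped like `YYYY-MM-DDTHH:MM:SS` and does no validation; `normalize_to_utc` is a hypothetical name, and the actual parser is not shown in this log.

```rust
/// Return an RFC 3339 style timestamp, treating input without an offset as UTC.
/// Returns None when there is no 'T' separator to locate the time part.
fn normalize_to_utc(ts: &str) -> Option<String> {
    // Only the time part can carry an offset sign; the date part always contains '-'.
    let (_, time_part) = ts.split_once('T')?;
    let has_offset = time_part.ends_with('Z')
        || time_part.ends_with('z')
        || time_part.contains('+')
        || time_part.contains('-');
    if has_offset {
        // Already carries a timezone designator; leave it untouched.
        Some(ts.to_string())
    } else {
        // Naive timestamp: assume UTC, as the fallback described in the log does.
        Some(format!("{}Z", ts))
    }
}
```

Treating naive values as UTC is a policy choice; if MSP360 reports local time for an account, the stored values would be shifted by that offset.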
## Server Builds (manual — not triggered by agent pipeline)
- `sudo /opt/gururmm/build-server.sh` — used for all server-only deploys
- Server binary at `/opt/gururmm/gururmm-server`, service: `gururmm-server`
## Commits (gururmm repo)
- `56723b1` — fix: 4-bug fix (grace period, AgentMessage forward-compat, WatchdogEvent, watchdog start)
- `2a7b74b` — fix: reviewer notes (NULL gap, warn! on watchdog event)
- `3b29acc` — fix: mspbackups config partner_id lookup
- `91630cb` — fix: handle null LastStart/NextStart in MSP360 BackupPlan
- `86e7ade` — fix: prefer online agent in MSP360 hostname match
- `f146bd9` — fix: parse MSP360 no-timezone dates as UTC
## Anti-Pattern Added
`build-server.sh` is separate from `build-agents.sh`. Server code changes require a manual `sudo /opt/gururmm/build-server.sh` after pushing to Gitea.