Session Log: GuruRMM — Bug Fixes & MSP360 Backup Integration
Date: 2026-05-19 Duration: ~3 hours
User
- User: Mike Swanson (mike)
- Machine: DESKTOP-0O8A1RL
- Role: admin
Summary
Two areas of work: fixed 4 agent/server bugs identified from AD2's crash loop, then diagnosed and fixed the MSP360 backup integration which had never been configured.
Part 1: 4-Bug Fix (v0.6.25)
Investigated why AD2's RMM agent was crash-looping and why the watchdog never fired. Root cause: agent 0.6.22/0.6.23 sent user_inventory_report WS messages the server couldn't deserialize. Also found a 48-minute update gap where the 30s grace period was too short for a Windows Defender scan of the new binary.
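The deserialization failure described above can be avoided by giving the message enum a catch-all case. The following is a minimal sketch only: the real AgentMessage enum and its wire format are not shown in this log, so the variant names, the type tags, and the helper functions `classify` and `handle` are all hypothetical.

```rust
// Hypothetical sketch: variants and type tags are assumptions, not the
// project's actual AgentMessage definition.
#[derive(Debug, PartialEq)]
enum AgentMessage {
    Heartbeat,
    WatchdogEvent,
    /// Catch-all for message types this server build does not know yet.
    Unknown(String),
}

/// Map a wire-level type tag to a message variant without ever failing.
fn classify(type_tag: &str) -> AgentMessage {
    match type_tag {
        "heartbeat" => AgentMessage::Heartbeat,
        "watchdog_event" => AgentMessage::WatchdogEvent,
        // A newer agent may send tags an older server has never seen.
        other => AgentMessage::Unknown(other.to_string()),
    }
}

/// Returns true if the message was handled, false if it was skipped.
fn handle(type_tag: &str) -> bool {
    match classify(type_tag) {
        // Skip quietly and keep the connection alive.
        AgentMessage::Unknown(_) => false,
        _ => true,
    }
}
```

With this shape, `handle("user_inventory_report")` returns false instead of aborting the handler. If the server deserializes with serde, a unit variant marked `#[serde(other)]` on an internally or adjacently tagged enum gives a similar effect, though whether the project does it that way is not stated here.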
Bugs fixed (commits 56723b1, 2a7b74b):
- Grace period too short during updates: extended to poll agent_updates for up to 2 hours before marking agent offline
- AgentMessage unknown variant crash: silently skips unknown WS message types (forward-compat); previously crashed the WS handler
- WatchdogEvent not persisted: WatchdogEvent messages now written to watchdog_events DB table
- Watchdog never started: ensure_watchdog_running() was implemented but never called from run_agent(); agent-id.txt sidecar (required by post_watchdog_alert) was never written after WS auth
- Reviewer notes (commit 2a7b74b): has_in_progress_update NULL gap fixed; warn! on WatchdogEvent DB insert failure
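The extended grace period can be pictured as a bounded polling loop. This is a sketch under stated assumptions, not the committed code: only the 2-hour ceiling comes from this log. The 30-second poll interval, the `GraceOutcome` type, and the `wait_for_update` function are hypothetical, and the `update_in_progress` closure stands in for a query against agent_updates.

```rust
use std::time::Duration;

// The 2-hour ceiling is from the log; the poll interval is an assumption.
const MAX_GRACE: Duration = Duration::from_secs(2 * 60 * 60);
const POLL_INTERVAL: Duration = Duration::from_secs(30);

#[derive(Debug, PartialEq)]
enum GraceOutcome {
    /// The update stopped being "in progress" within the grace window.
    UpdateFinished { waited: Duration },
    /// The grace window ran out while the update was still in progress.
    MarkOffline,
}

/// Poll `update_in_progress` until it reports false or the window ends.
/// `sleep` is injected so the loop can be exercised without real waiting.
fn wait_for_update(
    mut update_in_progress: impl FnMut() -> bool,
    mut sleep: impl FnMut(Duration),
) -> GraceOutcome {
    let mut waited = Duration::ZERO;
    while waited < MAX_GRACE {
        if !update_in_progress() {
            return GraceOutcome::UpdateFinished { waited };
        }
        sleep(POLL_INTERVAL);
        waited += POLL_INTERVAL;
    }
    GraceOutcome::MarkOffline
}
```

In production code the `sleep` argument would be a real delay such as `std::thread::sleep` or an async timer. Passing it in keeps the time budget logic separate from the clock.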
Part 2: MSP360 Backup Integration
Backup tab on AD2 showed nothing. Root cause chain:
- mspbackups_config was empty: API credentials never configured. Fixed: loaded credentials from vault (msp-tools/msp360-api.sops.yaml), configured via API.
- POST /api/mspbackups/config failed with partner_id NOT NULL violation: handler was passing None. Fixed in commit 3b29acc.
- Build pipeline only builds agents, not server. Discovered build-server.sh at /opt/gururmm/build-server.sh.
- SOPS vault file had unquoted YAML timestamp (created: 2026-05-18T00:00:00Z) causing time.Time walk error. Fixed by quoting it in the raw YAML.
- MSP360 /api/Monitoring returns null for LastStart/NextStart on 14 records: struct had String, not Option<String>. Fixed in commit 91630cb.
- Hostname match picked offline phantom AD2 agent (f6a99fe7, crash-loop duplicate) instead of online agent (49c66d8b). Fixed in commit 86e7ade: find_agent_by_hostname_ci now orders by status='online' first.
- last_backup_at/next_backup_at stored as NULL: MSP360 dates lack timezone (2026-05-19T07:00:04, not RFC3339). Fixed in commit f146bd9: fallback parser treats naive timestamps as UTC.
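The hostname-match fix above comes down to preferring an online agent whenever several agents share a hostname. The real find_agent_by_hostname_ci is a database query that this log does not show, so the following is only an in-memory illustration: the `Agent` struct and the `pick_agent_by_hostname` function are hypothetical.

```rust
// Hypothetical in-memory stand-in for the agents table.
#[derive(Debug, Clone, PartialEq)]
struct Agent {
    id: String,
    hostname: String,
    status: String,
}

/// Case-insensitive hostname match that prefers an online agent over any
/// offline duplicate with the same hostname.
fn pick_agent_by_hostname<'a>(agents: &'a [Agent], hostname: &str) -> Option<&'a Agent> {
    agents
        .iter()
        .filter(|a| a.hostname.eq_ignore_ascii_case(hostname))
        // `false` orders before `true`, so an online agent (key = false)
        // wins over an offline one; ties keep the first match.
        .min_by_key(|a| a.status != "online")
}
```

Given the two AD2 records from this session, the offline duplicate f6a99fe7 listed first and the online agent 49c66d8b listed second, this selection returns 49c66d8b.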
Result
AD2 backup tab now shows: status: success, last backup 2026-05-19T07:00:04Z, next 2026-05-20T07:00:00Z, plan AD2 Image, 6 files, ~355 GB. Syncs every 15 minutes.
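The Z suffix on the timestamps above reflects the rule that an MSP360 date with no offset is treated as UTC. A minimal std-only sketch of that fallback follows. The committed fix most likely relies on a date library, and the function name `normalize_to_utc` is hypothetical. The sketch assumes input shaped like YYYY-MM-DDTHH:MM:SS with an optional offset, and it does not validate the date or time fields themselves.

```rust
/// Return an RFC 3339 style string, treating a timestamp that carries no
/// offset as UTC. Returns None when the input does not look like
/// `YYYY-MM-DDTHH:MM:SS` with an optional offset suffix.
fn normalize_to_utc(raw: &str) -> Option<String> {
    let (date, time) = raw.trim().split_once('T')?;
    if date.len() != 10 || time.len() < 8 {
        return None;
    }
    // Anything after HH:MM:SS may hold fractional seconds and/or an offset.
    let tail = time.get(8..)?;
    let has_offset = tail.ends_with('Z')
        || tail.ends_with('z')
        || tail.contains(|c: char| c == '+' || c == '-');
    if has_offset {
        Some(format!("{}T{}", date, time))
    } else {
        // Naive timestamp: assume UTC, as the fallback parser does.
        Some(format!("{}T{}Z", date, time))
    }
}
```

For example, `normalize_to_utc("2026-05-19T07:00:04")` yields `Some("2026-05-19T07:00:04Z")`, while an input that already carries `Z` or a numeric offset is returned unchanged.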
Server Builds (manual — not triggered by agent pipeline)
- sudo /opt/gururmm/build-server.sh: used for all server-only deploys
- Server binary at /opt/gururmm/gururmm-server, service: gururmm-server
Commits (gururmm repo)
- 56723b1 - fix: 4-bug fix (grace period, AgentMessage forward-compat, WatchdogEvent, watchdog start)
- 2a7b74b - fix: reviewer notes (NULL gap, warn! on watchdog event)
- 3b29acc - fix: mspbackups config partner_id lookup
- 91630cb - fix: handle null LastStart/NextStart in MSP360 BackupPlan
- 86e7ade - fix: prefer online agent in MSP360 hostname match
- f146bd9 - fix: parse MSP360 no-timezone dates as UTC
Anti-Pattern Added
build-server.sh is separate from build-agents.sh, and the agent pipeline does not run it. Server code changes require a manual sudo /opt/gururmm/build-server.sh after pushing to Gitea.