
Session Log: GuruRMM — Bug Fixes & MSP360 Backup Integration

Date: 2026-05-19
Duration: ~3 hours

User

  • User: Mike Swanson (mike)
  • Machine: DESKTOP-0O8A1RL
  • Role: admin

Summary

Two areas of work: fixed 4 agent/server bugs identified from AD2's crash loop, then diagnosed and fixed the MSP360 backup integration, which had never been configured.

Part 1: 4-Bug Fix (v0.6.25)

Investigated why AD2's RMM agent was crash-looping and why the watchdog never fired. Root cause: agent versions 0.6.22 and 0.6.23 sent user_inventory_report WS messages that the server couldn't deserialize. Also found a 48-minute gap during an update: the 30s grace period was far too short to cover a Windows Defender scan of the new binary.
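A simplified sketch of the grace-period rule, assuming the server already knows whether an update is in progress for the agent. The names should_mark_offline, HEARTBEAT_GRACE and UPDATE_GRACE are hypothetical; per this log, the real server polls agent_updates rather than taking a bool.

```rust
use std::time::Duration;

/// Normal heartbeat grace period before an agent is marked offline (30s per this log).
const HEARTBEAT_GRACE: Duration = Duration::from_secs(30);
/// Extended window while an update is in progress (up to 2 hours per this log).
const UPDATE_GRACE: Duration = Duration::from_secs(2 * 60 * 60);

/// Hypothetical helper: should an agent that has been silent for `silent_for`
/// be marked offline? `update_in_progress` stands in for the agent_updates lookup.
fn should_mark_offline(silent_for: Duration, update_in_progress: bool) -> bool {
    // Pick the longer window only when an update is known to be running.
    let grace = if update_in_progress { UPDATE_GRACE } else { HEARTBEAT_GRACE };
    silent_for > grace
}
```

Under these values, a 48-minute silence during a recorded update keeps the agent online, while a 31-second silence with no update still marks it offline.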

Bugs fixed (commits 56723b1, 2a7b74b):

  1. Grace period too short during updates — extended to poll agent_updates for up to 2 hours before marking agent offline
  2. AgentMessage unknown variant crash — silently skips unknown WS message types (forward-compat); previously crashed the WS handler
  3. WatchdogEvent not persisted — WatchdogEvent messages now written to watchdog_events DB table
  4. Watchdog never started: ensure_watchdog_running() was implemented but never called from run_agent(), and the agent-id.txt sidecar (required by post_watchdog_alert) was never written after WS auth
  5. Reviewer notes (commit 2a7b74b): fixed the has_in_progress_update NULL gap; added a warn! when the WatchdogEvent DB insert fails
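The forward-compat fix in bug 2 amounts to treating an unrecognized message type as "skip" rather than "error". Below is a std-only sketch of that dispatch; the trimmed-down AgentMessage enum, its type tags and handle_frames are hypothetical, since the real enum and its serde tagging aren't shown in this log. In a serde-based handler, a #[serde(other)] unit variant on an internally or adjacently tagged enum is one common way to get the same behavior.

```rust
/// Hypothetical, trimmed-down set of message types the server understands.
#[derive(Debug, PartialEq)]
enum AgentMessage {
    Heartbeat,
    WatchdogEvent(String),
    /// Catch-all for message types introduced by newer agents.
    Unknown(String),
}

/// Map a type tag to a known variant, falling back to `Unknown`
/// instead of returning an error that would tear down the WS handler.
fn classify(msg_type: &str, payload: &str) -> AgentMessage {
    match msg_type {
        "heartbeat" => AgentMessage::Heartbeat,
        "watchdog_event" => AgentMessage::WatchdogEvent(payload.to_string()),
        other => AgentMessage::Unknown(other.to_string()),
    }
}

/// Process (type, payload) frames; unknown types are counted and skipped
/// rather than aborting the loop. Returns (handled, skipped).
fn handle_frames(frames: &[(&str, &str)]) -> (usize, usize) {
    let mut handled = 0;
    let mut skipped = 0;
    for (msg_type, payload) in frames {
        match classify(msg_type, payload) {
            AgentMessage::Unknown(_) => skipped += 1,
            _ => handled += 1,
        }
    }
    (handled, skipped)
}
```

With this shape, an older server receiving user_inventory_report from a newer agent would skip that one frame and keep the connection alive.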

Part 2: MSP360 Backup Integration

Backup tab on AD2 showed nothing. Root cause chain:

  1. mspbackups_config was empty — API credentials never configured. Fixed: loaded credentials from vault (msp-tools/msp360-api.sops.yaml), configured via API.
  2. POST /api/mspbackups/config failed with partner_id NOT NULL violation — handler was passing None. Fixed in commit 3b29acc.
  3. The build pipeline only builds agents, not the server. Discovered build-server.sh at /opt/gururmm/build-server.sh for server deploys.
  4. The SOPS vault file had an unquoted YAML timestamp (created: 2026-05-18T00:00:00Z), causing a time.Time walk error. Fixed by quoting it in the raw YAML.
  5. MSP360 /api/Monitoring returns null for LastStart/NextStart on 14 records, but the struct fields were String rather than Option<String>. Fixed in commit 91630cb.
  6. The hostname match picked the offline phantom AD2 agent (f6a99fe7, a crash-loop duplicate) instead of the online agent (49c66d8b). Fixed in commit 86e7ade: find_agent_by_hostname_ci now orders by status='online' first.
  7. last_backup_at/next_backup_at were stored as NULL because MSP360 dates lack a timezone (2026-05-19T07:00:04, not RFC3339). Fixed in commit f146bd9: a fallback parser treats naive timestamps as UTC.
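Fixes 5 and 7 both concern how MSP360 dates arrive: sometimes null, sometimes without a timezone. A std-only sketch of a tolerant parser follows, assuming the only formats are YYYY-MM-DDTHH:MM:SS with or without a trailing Z. parse_msp360_date is a hypothetical name; a real implementation would more likely use a date/time crate. Days since the epoch are computed with Howard Hinnant's days_from_civil algorithm.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date (Hinnant's days_from_civil).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = (if y >= 0 { y } else { y - 399 }) / 400;
    let yoe = y - era * 400; // year of era, 0..=399
    let mp = (m + 9) % 12; // March = 0 ... February = 11
    let doy = (153 * mp + 2) / 5 + d - 1; // day of year, March-based
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

/// Hypothetical parser: MSP360 date -> Unix seconds (UTC).
/// None (JSON null) stays None; "...Z" and timezone-less strings are both read as UTC.
/// Offsets such as "+02:00" and fractional seconds are rejected in this sketch.
fn parse_msp360_date(raw: Option<&str>) -> Option<i64> {
    let s = raw?.trim();
    // An explicit UTC marker is accepted; a missing timezone is treated as UTC too.
    let s = s.strip_suffix('Z').unwrap_or(s);
    let (date, time) = s.split_once('T')?;
    let dp = date.split('-').map(|p| p.parse::<i64>().ok()).collect::<Option<Vec<i64>>>()?;
    let tp = time.split(':').map(|p| p.parse::<i64>().ok()).collect::<Option<Vec<i64>>>()?;
    if dp.len() != 3 || tp.len() != 3 {
        return None;
    }
    let (y, mo, d) = (dp[0], dp[1], dp[2]);
    let (h, mi, sec) = (tp[0], tp[1], tp[2]);
    if !(1..=12).contains(&mo) || !(1..=31).contains(&d)
        || !(0..=23).contains(&h) || !(0..=59).contains(&mi) || !(0..=59).contains(&sec)
    {
        return None;
    }
    Some(days_from_civil(y, mo, d) * 86_400 + h * 3_600 + mi * 60 + sec)
}
```

Under this sketch, the last backup (2026-05-19T07:00:04) and next backup (2026-05-20T07:00:00Z) from the Result section come out 86,396 seconds apart, whether or not the Z is present.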
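Fix 6 makes an online agent win over offline duplicates with the same hostname. The real change lives in the find_agent_by_hostname_ci query; this in-memory equivalent, with a hypothetical Agent struct and a hypothetical pick_agent_by_hostname function, shows the same preference using the agent IDs from this log.

```rust
/// Hypothetical agent record; the real table has more columns.
#[derive(Debug, Clone, PartialEq)]
struct Agent {
    id: String,
    hostname: String,
    online: bool,
}

/// Case-insensitive hostname match that prefers online agents.
/// min_by_key on `!online` ranks online (false) ahead of offline (true) and,
/// on ties, returns the first match in input order.
fn pick_agent_by_hostname<'a>(agents: &'a [Agent], hostname: &str) -> Option<&'a Agent> {
    agents
        .iter()
        .filter(|a| a.hostname.eq_ignore_ascii_case(hostname))
        .min_by_key(|a| !a.online)
}
```

If only offline duplicates exist, the first of them is still returned, so the backup data has somewhere to attach.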

Result

The AD2 backup tab now shows status success, last backup 2026-05-19T07:00:04Z, next backup 2026-05-20T07:00:00Z, plan AD2 Image, 6 files, ~355 GB. The integration syncs every 15 minutes.

Server Builds (manual — not triggered by agent pipeline)

  • sudo /opt/gururmm/build-server.sh — used for all server-only deploys
  • Server binary at /opt/gururmm/gururmm-server, service: gururmm-server

Commits (gururmm repo)

  • 56723b1 — fix: 4-bug fix (grace period, AgentMessage forward-compat, WatchdogEvent, watchdog start)
  • 2a7b74b — fix: reviewer notes (NULL gap, warn! on watchdog event)
  • 3b29acc — fix: mspbackups config partner_id lookup
  • 91630cb — fix: handle null LastStart/NextStart in MSP360 BackupPlan
  • 86e7ade — fix: prefer online agent in MSP360 hostname match
  • f146bd9 — fix: parse MSP360 no-timezone dates as UTC

Anti-Pattern Added

build-server.sh is separate from build-agents.sh, so server code changes require a manual sudo /opt/gururmm/build-server.sh after pushing to Gitea.