claudetools/session-logs/2026-05-19-gururmm-backup-fixes.md
Mike Swanson 5ead5d4dee sync: auto-sync from DESKTOP-0O8A1RL at 2026-05-19 17:56:56
Author: Mike Swanson
Machine: DESKTOP-0O8A1RL
Timestamp: 2026-05-19 17:56:56
2026-05-19 17:57:02 -07:00

Session Log: GuruRMM — Bug Fixes & MSP360 Backup Integration

Date: 2026-05-19
Duration: ~3 hours

User

  • User: Mike Swanson (mike)
  • Machine: DESKTOP-0O8A1RL
  • Role: admin

Summary

Two areas of work: fixed 4 agent/server bugs identified from AD2's crash loop, then diagnosed and fixed the MSP360 backup integration which had never been configured.

Part 1: 4-Bug Fix (v0.6.25)

Investigated why AD2's RMM agent was crash-looping and why the watchdog never fired. Root cause: agent 0.6.22/0.6.23 sent user_inventory_report WS messages the server couldn't deserialize. Also found a 48-minute update gap where the 30s grace period was too short for a Windows Defender scan of the new binary.
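The failure mode above (a newer agent sending a message type the server has no variant for) can be tolerated with a catch-all. The following is a minimal sketch, not the actual GuruRMM code: the trimmed-down enum, the `decode` and `handle_frames` helpers, and the string type tags are all hypothetical stand-ins used only to show the pattern.

```rust
// Hypothetical, trimmed-down message model; the real AgentMessage enum
// is not shown in this log.
#[derive(Debug, PartialEq)]
enum AgentMessage {
    Heartbeat,
    WatchdogEvent { detail: String },
    // Catch-all for message types this server build does not recognize.
    Unknown { msg_type: String },
}

// Map a (type tag, payload) pair to a message. Unrecognized tags become
// Unknown instead of an error, so the caller never has to abort.
fn decode(msg_type: &str, payload: &str) -> AgentMessage {
    match msg_type {
        "heartbeat" => AgentMessage::Heartbeat,
        "watchdog_event" => AgentMessage::WatchdogEvent {
            detail: payload.to_string(),
        },
        other => AgentMessage::Unknown {
            msg_type: other.to_string(),
        },
    }
}

// Process a batch of frames and return (handled, skipped) counts.
// Unknown messages are skipped; the loop keeps going.
fn handle_frames(frames: &[(&str, &str)]) -> (usize, usize) {
    let mut handled = 0;
    let mut skipped = 0;
    for &(msg_type, payload) in frames {
        match decode(msg_type, payload) {
            AgentMessage::Unknown { .. } => skipped += 1,
            _ => handled += 1,
        }
    }
    (handled, skipped)
}
```

With this shape, a `user_inventory_report` frame from a newer agent is counted as skipped and the frames after it are still processed.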

Bugs fixed (commits 56723b1, 2a7b74b):

  1. Grace period too short during updates — extended to poll agent_updates for up to 2 hours before marking agent offline
  2. AgentMessage unknown variant crash — silently skips unknown WS message types (forward-compat); previously crashed the WS handler
  3. WatchdogEvent not persisted — WatchdogEvent messages now written to watchdog_events DB table
  4. Watchdog never started: ensure_watchdog_running() was implemented but never called from run_agent(); agent-id.txt sidecar (required by post_watchdog_alert) was never written after WS auth

Reviewer notes (commit 2a7b74b): has_in_progress_update NULL gap fixed; warn! on WatchdogEvent DB insert failure
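The grace-period change in item 1 comes down to choosing a longer window when an update is in flight. A minimal sketch of that decision follows, under stated assumptions: the function name `should_mark_offline`, its signature, and the two constants are hypothetical. The real server polls agent_updates, which is not modeled here. Only the 30-second and 2-hour figures come from this log.

```rust
use std::time::Duration;

// Figures taken from the log; the constant names are hypothetical.
const DEFAULT_GRACE: Duration = Duration::from_secs(30);
const UPDATE_GRACE: Duration = Duration::from_secs(2 * 60 * 60);

// Decide whether a disconnected agent should be marked offline.
// `update_in_progress` stands in for whatever the agent_updates poll
// reports; how that value is derived is an assumption of this sketch.
fn should_mark_offline(disconnected_for: Duration, update_in_progress: bool) -> bool {
    let grace = if update_in_progress {
        UPDATE_GRACE
    } else {
        DEFAULT_GRACE
    };
    disconnected_for > grace
}
```

Under this sketch, the 48-minute gap seen on AD2 would stay inside the window while an update is in progress, whereas the old 30-second window would have expired almost immediately.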

Part 2: MSP360 Backup Integration

Backup tab on AD2 showed nothing. Root cause chain:

  1. mspbackups_config was empty — API credentials never configured. Fixed: loaded credentials from vault (msp-tools/msp360-api.sops.yaml), configured via API.
  2. POST /api/mspbackups/config failed with partner_id NOT NULL violation — handler was passing None. Fixed in commit 3b29acc.
  3. Build pipeline only builds agents, not server. Discovered build-server.sh at /opt/gururmm/build-server.sh.
  4. SOPS vault file had unquoted YAML timestamp (created: 2026-05-18T00:00:00Z) causing time.Time walk error. Fixed by quoting it in the raw YAML.
  5. MSP360 /api/Monitoring returns null for LastStart/NextStart on 14 records — struct had String not Option<String>. Fixed in commit 91630cb.
  6. Hostname match picked offline phantom AD2 agent (f6a99fe7, crash-loop duplicate) instead of online agent (49c66d8b). Fixed in commit 86e7ade: find_agent_by_hostname_ci now orders by status='online' first.
  7. last_backup_at/next_backup_at stored as NULL — MSP360 dates lack timezone (2026-05-19T07:00:04, not RFC3339). Fixed in commit f146bd9: fallback parser treats naive timestamps as UTC.
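The hostname fix in item 6 is an ordering rule: among case-insensitive hostname matches, prefer an online agent. The real find_agent_by_hostname_ci runs against the database; the sketch below is only an in-memory analogue, and the `Agent` struct and `pick_agent_for_hostname` function are hypothetical names introduced for illustration.

```rust
// Hypothetical in-memory stand-in for an agent row.
#[derive(Debug, Clone, PartialEq)]
struct Agent {
    id: String,
    hostname: String,
    status: String,
}

// Return the best agent for a hostname, comparing case-insensitively and
// ranking online agents ahead of all others. Among equally ranked agents
// the first one in the slice wins.
fn pick_agent_for_hostname<'a>(agents: &'a [Agent], hostname: &str) -> Option<&'a Agent> {
    agents
        .iter()
        .filter(|a| a.hostname.eq_ignore_ascii_case(hostname))
        .min_by_key(|a| if a.status == "online" { 0 } else { 1 })
}
```

Given the two AD2 records from this session, the offline duplicate (f6a99fe7) listed first and the online agent (49c66d8b) second, this selection returns the online one.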

Result

AD2 backup tab now shows: status: success, last backup 2026-05-19T07:00:04Z, next 2026-05-20T07:00:00Z, plan AD2 Image, 6 files, ~355 GB. Syncs every 15 minutes.
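The Z-suffixed timestamps in this result follow from two of the fixes above: nullable date fields and treating zone-less dates as UTC. Here is a rough sketch of that normalization using only the standard library. The function name `normalize_msp360_timestamp` is hypothetical, the real fallback parser is not shown in this log, and this version checks only the shape of the string rather than fully validating the date.

```rust
// Normalize an MSP360-style timestamp to an RFC 3339 string.
// - None (a null LastStart/NextStart) stays None.
// - A value that already carries "Z" or a numeric offset is returned as-is.
// - A naive value such as "2026-05-19T07:00:04" is assumed to be UTC.
fn normalize_msp360_timestamp(raw: Option<&str>) -> Option<String> {
    let raw = raw?.trim();
    let (date, time) = raw.split_once('T')?;
    if date.len() != 10 {
        return None; // not in YYYY-MM-DD form
    }
    // Everything after "HH:MM:SS": fractional seconds and/or a zone.
    let tail = time.get(8..)?;
    let has_zone = tail.ends_with('Z')
        || tail.ends_with('z')
        || tail.contains(|c: char| c == '+' || c == '-');
    if has_zone {
        Some(raw.to_string())
    } else {
        Some(format!("{}Z", raw))
    }
}
```

Returning `Option<String>` keeps a missing date distinct from a malformed one at the call site only if the caller checks the input first; a production parser would more likely return a typed date-time and a proper error.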

Server Builds (manual — not triggered by agent pipeline)

  • sudo /opt/gururmm/build-server.sh — used for all server-only deploys
  • Server binary at /opt/gururmm/gururmm-server, service: gururmm-server

Commits (gururmm repo)

  • 56723b1 — fix: 4-bug fix (grace period, AgentMessage forward-compat, WatchdogEvent, watchdog start)
  • 2a7b74b — fix: reviewer notes (NULL gap, warn! on watchdog event)
  • 3b29acc — fix: mspbackups config partner_id lookup
  • 91630cb — fix: handle null LastStart/NextStart in MSP360 BackupPlan
  • 86e7ade — fix: prefer online agent in MSP360 hostname match
  • f146bd9 — fix: parse MSP360 no-timezone dates as UTC

Anti-Pattern Added

build-server.sh is separate from build-agents.sh. Server code changes require a manual sudo /opt/gururmm/build-server.sh after pushing to Gitea.


Update: ~17:30 PT — Self-heal alert view + agent alerts tab

Session Summary

Resumed from a context compaction boundary. The self-heal alert changes (committed to server but not built in the previous context) were deployed first: server rebuilt to v0.3.3 and dashboard deployed. CONTEXT.md was updated to reflect the split versioning (agent 0.6.25 / server 0.3.3) and to document the build-server.sh anti-pattern.

A second feature request came in: the top-level Alerts tab should show only active (unacknowledged) alerts, while the agent detail page should have its own verbose, filterable alert history. Three commits landed in total across the two work blocks.

For the agent detail alerts tab: the alertsApi.list endpoint already supported agent_id filtering in AlertFilter. AlertRow, StatusBadge, SeverityBadge, and formatRelative were exported from Alerts.tsx for reuse in AgentDetail.tsx. A new AgentAlertsPanel component was added inline (following the same pattern as AgentLogsPanel), defaulting to all statuses to show full history.

Key Decisions

  • Default to "active", not "unresolved", on the top-level Alerts tab: "Unresolved" (active + acknowledged) was the previous session's choice, but acknowledged alerts have already been triaged, so the at-a-glance view should show only what needs attention. Acknowledged and resolved are still a dropdown away.
  • Agent detail shows all statuses by default: Contrast with the fleet view — the per-machine tab is the history view, so defaulting to all statuses (including resolved) gives a complete picture of what happened on that machine.
  • Exported shared components from Alerts.tsx rather than creating a new file: AlertRow, StatusBadge, formatRelative, SeverityBadge were already complete and tested. Extracting to a shared component file was not worth the churn; direct exports kept the diff minimal.
  • No server-side changes needed: GET /api/alerts already accepts agent_id in AlertFilter. The feature was purely a dashboard change.

Configuration Changes

  • dashboard/src/pages/Alerts.tsx: default status filter "unresolved" → "active"; dropdown reordered; AlertRow, StatusBadge, SeverityBadge, formatRelative exported
  • dashboard/src/pages/AgentDetail.tsx: "alerts" added to TabId and VALID_TABS; AgentAlertsPanel component added; "Alerts" tab wired into tab bar and TabPanel tree
  • server/src/db/alerts.rs: "unresolved" meta-filter maps to IN ('active', 'acknowledged'); status_contributes_param boolean guards bind-slot indexing (deployed in previous context, built this session)
  • projects/msp-tools/guru-rmm/CONTEXT.md: version split to agent 0.6.25 / server 0.3.3; build-server.sh anti-pattern documented
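The alerts.rs change is easier to follow with the bind-slot arithmetic written out. The sketch below is an assumption-laden illustration rather than the deployed code: the function `build_alert_where`, its return shape, and the Postgres-style `$1`/`$2` placeholders are hypothetical. Only the "unresolved" mapping and the `status_contributes_param` name come from this log.

```rust
// Build a WHERE clause and its bind parameters for an alert listing.
// "unresolved" is a meta-filter that expands inline and binds nothing,
// so later placeholders must not assume the status took slot $1.
fn build_alert_where(status: Option<&str>, agent_id: Option<&str>) -> (String, Vec<String>) {
    let mut clauses: Vec<String> = Vec::new();
    let mut params: Vec<String> = Vec::new();

    // True only when the status filter consumes a bind slot.
    let status_contributes_param = matches!(status, Some(s) if s != "unresolved");

    match status {
        Some("unresolved") => {
            clauses.push("status IN ('active', 'acknowledged')".to_string());
        }
        Some(s) => {
            clauses.push("status = $1".to_string());
            params.push(s.to_string());
        }
        None => {} // no status filter: all statuses
    }

    if let Some(id) = agent_id {
        // The guard keeps agent_id on the correct slot in both cases.
        let slot = if status_contributes_param { 2 } else { 1 };
        clauses.push(format!("agent_id = ${}", slot));
        params.push(id.to_string());
    }

    let sql = if clauses.is_empty() {
        String::new()
    } else {
        format!("WHERE {}", clauses.join(" AND "))
    };
    (sql, params)
}
```

Without the guard, an "unresolved" filter combined with an agent_id would emit `agent_id = $2` while binding only one value, which is the kind of mismatch the boolean is there to prevent.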

Infrastructure

  • Server: 172.16.3.30 | gururmm-server service | /usr/local/bin/gururmm-server
  • Dashboard: nginx @ /var/www/gururmm/dashboard/ | proxied via https://rmm.azcomputerguru.com

Commits (gururmm repo)

  • 2b10d17 — feat: self-heal alert view — unresolved default filter (server/src/db/alerts.rs, dashboard/src/pages/Alerts.tsx)
  • e5ac537 — (previous session boundary commit)
  • f888788 — feat: agent alerts tab + active-only default on top-level view (dashboard/src/pages/Alerts.tsx, dashboard/src/pages/AgentDetail.tsx)

Server Builds

  • sudo /opt/gururmm/build-server.sh ran at 00:14 UTC (17:14 PT) → v0.3.3 deployed
  • Dashboard built (npm run build) and deployed to /var/www/gururmm/dashboard/ twice (once per feature batch)