Author: Mike Swanson Machine: DESKTOP-0O8A1RL Timestamp: 2026-05-19 17:56:56
Session Log: GuruRMM — Bug Fixes & MSP360 Backup Integration
Date: 2026-05-19 Duration: ~3 hours
User
- User: Mike Swanson (mike)
- Machine: DESKTOP-0O8A1RL
- Role: admin
Summary
Two areas of work: fixed 4 agent/server bugs identified from AD2's crash loop, then diagnosed and fixed the MSP360 backup integration, which had never been configured.
Part 1: 4-Bug Fix (v0.6.25)
Investigated why AD2's RMM agent was crash-looping and why the watchdog never fired. Root cause: agent 0.6.22/0.6.23 sent user_inventory_report WS messages the server couldn't deserialize. Also found a 48-minute update gap where the 30s grace period was too short for a Windows Defender scan of the new binary.
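The deserialization failure described above is the kind of problem a tolerant message parser avoids. The following is a minimal, dependency-free sketch of that idea, not the server's actual handler (which presumably relies on a deserialization library). The enum variants, the tag strings "heartbeat" and "watchdog_event", and both function names are hypothetical; only user_inventory_report comes from this log.

```rust
/// Message kinds this (hypothetical) server build understands.
#[derive(Debug, PartialEq)]
enum AgentMessage {
    Heartbeat,
    WatchdogEvent,
}

/// Map a wire-level type tag to a known variant.
/// Returns None for tags this build does not recognize.
fn parse_message_type(tag: &str) -> Option<AgentMessage> {
    match tag {
        "heartbeat" => Some(AgentMessage::Heartbeat),
        "watchdog_event" => Some(AgentMessage::WatchdogEvent),
        _ => None,
    }
}

/// Process a batch of incoming tags, skipping unknown ones instead of
/// aborting. Returns the handled messages and the number skipped.
fn handle_incoming(tags: &[&str]) -> (Vec<AgentMessage>, usize) {
    let mut handled = Vec::new();
    let mut skipped = 0;
    for &tag in tags {
        match parse_message_type(tag) {
            Some(msg) => handled.push(msg),
            // A newer agent talking to an older server: ignore the
            // message and keep the connection alive.
            None => skipped += 1,
        }
    }
    (handled, skipped)
}
```

With this shape, a batch containing user_inventory_report between two known tags yields the two known messages and a skip count of one, rather than ending the handler.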
Bugs fixed (commits 56723b1, 2a7b74b):
- Grace period too short during updates: extended to poll agent_updates for up to 2 hours before marking agent offline
- AgentMessage unknown variant crash: silently skips unknown WS message types (forward-compat); previously crashed the WS handler
- WatchdogEvent not persisted: WatchdogEvent messages now written to watchdog_events DB table
- Watchdog never started: ensure_watchdog_running() was implemented but never called from run_agent(); agent-id.txt sidecar (required by post_watchdog_alert) was never written after WS auth
- Reviewer notes (commit 2a7b74b): has_in_progress_update NULL gap fixed; warn! on WatchdogEvent DB insert failure
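The extended grace period in the first bullet amounts to a bounded polling loop. Below is a rough synchronous sketch of that pattern under stated assumptions: the real server is likely async and queries the agent_updates table, whereas here the check is an arbitrary closure, and GraceOutcome and wait_while_updating are hypothetical names.

```rust
use std::time::{Duration, Instant};

/// Result of waiting on an agent that disconnected mid-update.
#[derive(Debug, PartialEq)]
enum GraceOutcome {
    /// The update is no longer in progress; normal offline handling applies.
    UpdateSettled,
    /// The update was still in progress when the maximum wait elapsed.
    TimedOut,
}

/// Poll `update_in_progress` until it reports false or `max_wait` passes.
/// The closure stands in for a database lookup (an assumption of this sketch).
fn wait_while_updating<F>(
    mut update_in_progress: F,
    max_wait: Duration,
    poll_interval: Duration,
) -> GraceOutcome
where
    F: FnMut() -> bool,
{
    let deadline = Instant::now() + max_wait;
    loop {
        if !update_in_progress() {
            return GraceOutcome::UpdateSettled;
        }
        if Instant::now() >= deadline {
            return GraceOutcome::TimedOut;
        }
        std::thread::sleep(poll_interval);
    }
}
```

A caller would pass something like Duration::from_secs(2 * 60 * 60) as max_wait to match the 2-hour ceiling, and only mark the agent offline once the function returns.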
Part 2: MSP360 Backup Integration
Backup tab on AD2 showed nothing. Root cause chain:
1. mspbackups_config was empty: API credentials never configured. Fixed: loaded credentials from vault (msp-tools/msp360-api.sops.yaml), configured via API.
2. POST /api/mspbackups/config failed with partner_id NOT NULL violation: handler was passing None. Fixed in commit 3b29acc.
3. Build pipeline only builds agents, not server. Discovered build-server.sh at /opt/gururmm/build-server.sh.
4. SOPS vault file had unquoted YAML timestamp (created: 2026-05-18T00:00:00Z) causing time.Time walk error. Fixed by quoting it in the raw YAML.
5. MSP360 /api/Monitoring returns null for LastStart/NextStart on 14 records: struct had String not Option<String>. Fixed in commit 91630cb.
6. Hostname match picked offline phantom AD2 agent (f6a99fe7, crash-loop duplicate) instead of online agent (49c66d8b). Fixed in commit 86e7ade: find_agent_by_hostname_ci now orders by status='online' first.
7. last_backup_at/next_backup_at stored as NULL: MSP360 dates lack timezone (2026-05-19T07:00:04, not RFC3339). Fixed in commit f146bd9: fallback parser treats naive timestamps as UTC.
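The hostname-match fix can be illustrated with an in-memory analogue. The actual change is described as an ordering in a database query; the sketch below only mirrors the selection rule (case-insensitive match, online agents first). The Agent struct, its fields, and pick_agent_for_hostname are hypothetical.

```rust
/// Minimal stand-in for an agent row (fields are assumptions).
#[derive(Debug, Clone, PartialEq)]
struct Agent {
    id: String,
    hostname: String,
    status: String,
}

/// Case-insensitive hostname match that prefers an online agent.
/// Falls back to any matching agent when none is online.
fn pick_agent_for_hostname<'a>(agents: &'a [Agent], hostname: &str) -> Option<&'a Agent> {
    agents
        .iter()
        .filter(|a| a.hostname.eq_ignore_ascii_case(hostname))
        // `false` orders before `true`, so agents whose status is
        // "online" (key = false) are chosen ahead of all others.
        .min_by_key(|a| a.status != "online")
}
```

Given an offline duplicate listed before the online agent for the same hostname, this returns the online one, which is the behavior the fix is meant to guarantee.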
Result
AD2 backup tab now shows: status: success, last backup 2026-05-19T07:00:04Z, next 2026-05-20T07:00:00Z, plan AD2 Image, 6 files, ~355 GB. Syncs every 15 minutes.
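The Z-suffixed timestamps above follow from treating MSP360's zone-less dates as UTC. The sketch below shows that idea as string-level normalization only; it is not the server's fallback parser and does no real date validation. All function names are hypothetical, and a production version would more likely use a date-time library.

```rust
/// True when the time portion carries a zone designator
/// ("Z" or a numeric offset such as "+02:00").
fn has_zone(ts: &str) -> bool {
    match ts.split_once('T') {
        Some((_, time)) => time.ends_with('Z') || time.contains('+') || time.contains('-'),
        None => false,
    }
}

/// Cheap shape check for "YYYY-MM-DDTHH:MM:SS" (field ranges are not validated).
fn looks_naive(ts: &str) -> bool {
    let bytes = ts.as_bytes();
    bytes.len() == 19
        && bytes.iter().enumerate().all(|(i, c)| match i {
            4 | 7 => *c == b'-',
            10 => *c == b'T',
            13 | 16 => *c == b':',
            _ => c.is_ascii_digit(),
        })
}

/// Pass zoned timestamps through unchanged; append "Z" to naive ones.
/// Returns None when the input matches neither shape.
fn normalize_to_utc(ts: &str) -> Option<String> {
    if has_zone(ts) {
        Some(ts.to_string())
    } else if looks_naive(ts) {
        Some(format!("{}Z", ts))
    } else {
        None
    }
}
```

Returning None for unrecognized input keeps the stored column NULL only when the value is genuinely unusable, instead of for every zone-less date.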
Server Builds (manual — not triggered by agent pipeline)
- sudo /opt/gururmm/build-server.sh: used for all server-only deploys
- Server binary at /opt/gururmm/gururmm-server, service: gururmm-server
Commits (gururmm repo)
- 56723b1 fix: 4-bug fix (grace period, AgentMessage forward-compat, WatchdogEvent, watchdog start)
- 2a7b74b fix: reviewer notes (NULL gap, warn! on watchdog event)
- 3b29acc fix: mspbackups config partner_id lookup
- 91630cb fix: handle null LastStart/NextStart in MSP360 BackupPlan
- 86e7ade fix: prefer online agent in MSP360 hostname match
- f146bd9 fix: parse MSP360 no-timezone dates as UTC
Anti-Pattern Added
Build-server.sh is separate from build-agents.sh. Server code changes require manual sudo /opt/gururmm/build-server.sh after pushing to Gitea.
Update: ~17:30 PT — Self-heal alert view + agent alerts tab
Session Summary
Resumed from a context compaction boundary. The self-heal alert changes (committed to server but not built in the previous context) were deployed first: server rebuilt to v0.3.3 and dashboard deployed. CONTEXT.md was updated to reflect the split versioning (agent 0.6.25 / server 0.3.3) and to document the build-server.sh anti-pattern.
A second feature request came in: the top-level Alerts tab should show only active (unacknowledged) alerts, while the agent detail page should have its own verbose, filterable alert history. Three commits landed in total across the two work blocks.
For the agent detail alerts tab: the alertsApi.list endpoint already supported agent_id filtering in AlertFilter. AlertRow, StatusBadge, SeverityBadge, and formatRelative were exported from Alerts.tsx for reuse in AgentDetail.tsx. A new AgentAlertsPanel component was added inline (following the same pattern as AgentLogsPanel), defaulting to all statuses to show full history.
Key Decisions
- Default to active not unresolved on top-level Alerts: "Unresolved" (active + acknowledged) was the previous session's choice, but acknowledged alerts have already been triaged, so the at-a-glance view should only show what needs attention. Acknowledged and resolved are still a dropdown away.
- Agent detail shows all statuses by default: in contrast with the fleet view, the per-machine tab is the history view, so defaulting to all statuses (including resolved) gives a complete picture of what happened on that machine.
- Exported shared components from Alerts.tsx rather than creating a new file: AlertRow, StatusBadge, formatRelative, SeverityBadge were already complete and tested. Extracting to a shared component file was not worth the churn; direct exports kept the diff minimal.
- No server-side changes needed: GET /api/alerts already accepts agent_id in AlertFilter. The feature was purely a dashboard change.
Configuration Changes
- dashboard/src/pages/Alerts.tsx: default status filter "unresolved" → "active"; dropdown reordered; AlertRow, StatusBadge, SeverityBadge, formatRelative exported
- dashboard/src/pages/AgentDetail.tsx: "alerts" added to TabId and VALID_TABS; AgentAlertsPanel component added; "Alerts" tab wired into tab bar and TabPanel tree
- server/src/db/alerts.rs: "unresolved" meta-filter maps to IN ('active', 'acknowledged'); status_contributes_param boolean guards bind-slot indexing (committed in the previous context, built and deployed this session)
- projects/msp-tools/guru-rmm/CONTEXT.md: version split to agent 0.6.25 / server 0.3.3; build-server.sh anti-pattern documented
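The bind-slot concern in the alerts.rs change arises because the "unresolved" meta-filter expands to a fixed IN list and therefore consumes no placeholder, which shifts the index of every later parameter. The sketch below illustrates the problem with a different technique from the one recorded here: instead of a status_contributes_param boolean, it derives each placeholder number from the length of the parameter list. The function name, column names, and the $N placeholder style are assumptions.

```rust
/// Build a WHERE clause for an alert listing.
/// Returns the SQL fragment and the values to bind, in placeholder order.
fn build_alert_where(status: Option<&str>, agent_id: Option<&str>) -> (String, Vec<String>) {
    let mut clauses: Vec<String> = Vec::new();
    let mut params: Vec<String> = Vec::new();

    match status {
        // Meta-filter: expands to a fixed list and binds no parameter.
        Some("unresolved") => {
            clauses.push("status IN ('active', 'acknowledged')".to_string());
        }
        // Concrete status: binds one parameter in the next free slot.
        Some(s) => {
            params.push(s.to_string());
            clauses.push(format!("status = ${}", params.len()));
        }
        None => {}
    }

    if let Some(id) = agent_id {
        params.push(id.to_string());
        // The slot number depends on whether the status filter bound a value.
        clauses.push(format!("agent_id = ${}", params.len()));
    }

    let sql = if clauses.is_empty() {
        String::new()
    } else {
        format!("WHERE {}", clauses.join(" AND "))
    };
    (sql, params)
}
```

With status "unresolved" the agent filter lands in slot $1; with status "active" it lands in slot $2. Deriving the index from params.len() keeps the two cases from drifting apart.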
Infrastructure
- Server: 172.16.3.30 | gururmm-server service | /usr/local/bin/gururmm-server
- Dashboard: nginx @ /var/www/gururmm/dashboard/ | proxied via https://rmm.azcomputerguru.com
Commits (gururmm repo)
- 2b10d17 feat: self-heal alert view - unresolved default filter (server/src/db/alerts.rs, dashboard/src/pages/Alerts.tsx)
- e5ac537 (previous session boundary commit)
- f888788 feat: agent alerts tab + active-only default on top-level view (dashboard/src/pages/Alerts.tsx, dashboard/src/pages/AgentDetail.tsx)
Server Builds
- sudo /opt/gururmm/build-server.sh ran at 00:14 UTC (17:14 PT) → v0.3.3 deployed
- Dashboard built (npm run build) and deployed to /var/www/gururmm/dashboard/ twice (once per feature batch)