# Phase 6: End-to-End Testing Plan

## Safe Agent Rollout System

**Date:** 2026-05-25

**Version:** GuruRMM v0.6.41+

**Tester:** Mike Swanson

---

## Prerequisites

### Environment Setup

- [ ] SSH access to gururmm-build (172.16.3.30)
- [ ] Access to GuruRMM dashboard (https://rmm.azcomputerguru.com)
- [ ] JWT token for API testing
- [ ] At least 2 test agents (GURU-KALI, GURU-5070 recommended)

### Pre-Test Verification

```bash
# On Saturn
ssh azcomputerguru@172.16.3.30

# 1. Verify migration 046 is applied
sudo -u postgres psql gururmm_production -c "\d update_rollouts"
sudo -u postgres psql gururmm_production -c "\d update_health_metrics"
sudo -u postgres psql gururmm_production -c "\d agent_update_events"

# 2. Verify server build is current
cd /opt/gururmm/server
git status  # Should show Phase 4 code
cargo build --release --features production

# 3. Verify dashboard build is current
cd /opt/gururmm/dashboard
git status  # Should show Phase 5 code
npm run build

# 4. Verify health monitor is running
sudo systemctl status gururmm-server
sudo journalctl -u gururmm-server -n 50 | grep "Health monitoring task spawned"
```

---

## Test 1: Beta-First Build Workflow

**Objective:** Verify new builds default to the beta channel and stable agents don't receive them.

### Steps

1. **Trigger a test build**

   ```bash
   # On Saturn
   cd /opt/gururmm
   sudo ./build-linux.sh  # Will auto-increment to next version
   sudo ./build-windows.sh
   ```

2. **Verify .channel files created**

   ```bash
   cd /var/www/gururmm/downloads
   ls -la *.channel | tail -10

   # Expected: New version should have .channel files containing "beta"
   VERSION=$(ls -t gururmm-agent-linux-amd64-*.tar.gz | head -1 | grep -oP '\d+\.\d+\.\d+')
   cat gururmm-agent-linux-amd64-${VERSION}.tar.gz.channel
   # Should output: beta
   ```

3. **Mark test agents as beta**

   ```bash
   # Via API or SQL
   curl -X PATCH https://rmm.azcomputerguru.com/api/agents/GURU-KALI-UUID/channel \
     -H "Authorization: Bearer $TOKEN" \
     -H "Content-Type: application/json" \
     -d '{"update_channel": "beta"}'
   ```

4.
   **Verify beta agent receives update**

   - Open dashboard → Agents → GURU-KALI
   - Wait for agent connection (heartbeat every 60s)
   - Check agent state for pending update
   - Expected: Should see update_available = true

5. **Verify stable agent does NOT receive update**

   - Ensure GURU-5070 is on "stable" channel
   - Check agent state
   - Expected: update_available = false (version not in stable channel)

### Success Criteria

- ✅ .channel files exist for new version
- ✅ .channel files contain "beta"
- ✅ Beta agents offered the update
- ✅ Stable agents NOT offered the update
- ✅ Scanner logs show beta/stable filtering

---

## Test 2: Health Monitoring & Crash Detection

**Objective:** Verify the health monitor detects crashes and updates metrics.

### Steps

1. **Clear existing health data (optional)**

   ```bash
   sudo -u postgres psql gururmm_production -c "DELETE FROM update_health_metrics WHERE version = '$VERSION';"
   sudo -u postgres psql gururmm_production -c "DELETE FROM agent_update_events WHERE version_to = '$VERSION';"
   ```

2. **Simulate successful update**

   ```bash
   # On test agent (GURU-KALI)
   # Let update complete normally
   # Wait 5 minutes
   ```

3. **Check event logging**

   ```sql
   SELECT event_type, version_to, created_at
   FROM agent_update_events
   WHERE agent_id = 'GURU-KALI-UUID'
   ORDER BY created_at DESC
   LIMIT 5;

   -- Expected events:
   -- - update_dispatched
   -- - download_started (if implemented)
   -- - download_complete (if implemented)
   -- - update_applied
   ```

4. **Check health metrics incremented**

   ```sql
   -- Substitute the version under test for $VERSION
   SELECT version, total_attempts, successful_updates, failed_updates, crash_count, health_status
   FROM update_health_metrics
   WHERE version = '$VERSION';

   -- Expected:
   -- total_attempts = 1
   -- successful_updates = 1
   -- health_status = 'unknown' (< 5 attempts)
   ```

5. **Simulate crash**

   ```bash
   # On test agent
   # 1. Trigger update dispatch
   # 2. Immediately after "update_applied" event, stop agent service
   sudo systemctl stop gururmm-agent
   # 3. Wait 60-90 seconds for health monitor scan
   ```

6.
   **Verify crash detection**

   ```sql
   SELECT event_type, created_at
   FROM agent_update_events
   WHERE agent_id = 'GURU-KALI-UUID'
     AND event_type = 'crash_detected'
   ORDER BY created_at DESC;
   -- Expected: Should see crash_detected event

   SELECT crash_count, health_status
   FROM update_health_metrics
   WHERE version = '$VERSION';
   -- Expected: crash_count incremented, health_status may change
   ```

7. **Check server logs**

   ```bash
   sudo journalctl -u gururmm-server -n 100 | grep -E "crash|health"
   # Expected: "Detected crash: agent X went offline after updating to Y"
   ```

### Success Criteria

- ✅ Events logged correctly (update_dispatched, update_applied)
- ✅ Health metrics incremented on success
- ✅ Crash detected within 90 seconds
- ✅ crash_detected event logged
- ✅ Crash counter incremented
- ✅ Health status updated based on thresholds

---

## Test 3: Promotion Workflow

**Objective:** Verify promotion from beta to stable with health gates.

### Steps

1. **Attempt promotion with insufficient data**

   ```bash
   curl -X POST https://rmm.azcomputerguru.com/api/updates/rollouts/$VERSION/promote \
     -H "Authorization: Bearer $TOKEN" \
     -H "Content-Type: application/json" \
     -d '{"os": "linux", "arch": "amd64", "force": false}'

   # Expected: May succeed (unknown status allows promotion) or fail if health check implemented
   ```

2. **Generate healthy metrics**

   ```bash
   # Simulate 5+ successful updates
   # Option A: Manually insert via SQL (for testing)
   # Option B: Trigger real updates on multiple beta agents

   # SQL approach for testing:
   sudo -u postgres psql gururmm_production << EOF
   UPDATE update_health_metrics
   SET total_attempts = 5,
       successful_updates = 5,
       failed_updates = 0,
       crash_count = 0,
       health_status = 'healthy'
   WHERE version = '$VERSION' AND os = 'linux' AND arch = 'amd64';
   EOF
   ```

3. **Verify health status**

   ```bash
   curl https://rmm.azcomputerguru.com/api/updates/rollouts \
     -H "Authorization: Bearer $TOKEN" | jq --arg v "$VERSION" '.[] | select(.version == $v)'

   # Expected: health.status = "healthy"
   ```

4.
   **Promote to stable**

   ```bash
   curl -X POST https://rmm.azcomputerguru.com/api/updates/rollouts/$VERSION/promote \
     -H "Authorization: Bearer $TOKEN" \
     -H "Content-Type: application/json" \
     -d '{"os": "linux", "arch": "amd64", "force": false}'

   # Expected: {"success": true, "message": "Promoted...", "files_updated": 2}
   ```

5. **Verify .channel files updated**

   ```bash
   cat /var/www/gururmm/downloads/gururmm-agent-linux-amd64-${VERSION}.tar.gz.channel
   # Expected: stable
   ```

6. **Verify database updated**

   ```sql
   SELECT channel, promoted_at, promoted_by
   FROM update_rollouts
   WHERE version = '$VERSION' AND os = 'linux' AND arch = 'amd64';

   -- Expected: channel = 'stable', promoted_at = time of promotion, promoted_by = user_id
   ```

7. **Verify stable agents receive update**

   - Ensure test agent is on "stable" channel
   - Wait for scanner rescan (happens immediately after promotion)
   - Check agent state
   - Expected: update_available = true

8. **Test force promotion**

   ```bash
   # Set health to warning
   sudo -u postgres psql gururmm_production << EOF
   UPDATE update_health_metrics
   SET health_status = 'warning'
   WHERE version = '$VERSION' AND os = 'windows' AND arch = 'amd64';
   EOF

   # Try promotion without force
   curl -X POST https://rmm.azcomputerguru.com/api/updates/rollouts/$VERSION/promote \
     -H "Authorization: Bearer $TOKEN" \
     -H "Content-Type: application/json" \
     -d '{"os": "windows", "arch": "amd64", "force": false}'
   # Expected: 403 error with message about health status

   # Try with force flag
   curl -X POST https://rmm.azcomputerguru.com/api/updates/rollouts/$VERSION/promote \
     -H "Authorization: Bearer $TOKEN" \
     -H "Content-Type: application/json" \
     -d '{"os": "windows", "arch": "amd64", "force": true}'
   # Expected: 200 success (overridden health check)
   ```

### Success Criteria

- ✅ Promotion blocked for unhealthy versions (unless forced)
- ✅ Promotion succeeds for healthy versions
- ✅ .channel files updated from "beta" to "stable"
- ✅ Database rollouts table updated
- ✅ Scanner rescans immediately
- ✅ Stable agents receive update after promotion
- ✅ Force flag overrides health
  checks
- ✅ Dashboard shows updated channel

---

## Test 4: Rollback Workflow

**Objective:** Verify rollback blocks the version and force-downgrades agents.

### Steps

1. **Prepare for rollback**

   ```bash
   # Ensure test agent is running the version being rolled back ($VERSION)
   # Verify previous stable version exists
   curl https://rmm.azcomputerguru.com/api/updates/rollouts \
     -H "Authorization: Bearer $TOKEN" | jq '.[] | select(.channel == "stable") | .version'
   ```

2. **Execute rollback**

   ```bash
   curl -X POST https://rmm.azcomputerguru.com/api/updates/rollouts/$VERSION/rollback \
     -H "Authorization: Bearer $TOKEN" \
     -H "Content-Type: application/json" \
     -d '{
       "os": "linux",
       "arch": "amd64",
       "reason": "Test rollback: simulating critical bug in version '"$VERSION"'"
     }'

   # Expected: {"success": true, "agents_affected": 1, "downgrade_version": "0.6.40"}
   ```

3. **Verify .channel files removed**

   ```bash
   ls /var/www/gururmm/downloads/gururmm-agent-linux-amd64-${VERSION}.tar.gz.channel
   # Expected: File not found (removed)
   ```

4. **Verify health status blocked**

   ```sql
   SELECT health_status, last_incident
   FROM update_health_metrics
   WHERE version = '$VERSION' AND os = 'linux' AND arch = 'amd64';

   -- Expected: health_status = 'blocked', last_incident = reason text
   ```

5. **Verify forced downgrade dispatched**

   ```bash
   # Check server logs for WebSocket dispatch
   sudo journalctl -u gururmm-server -n 100 | grep -i "downgrade\|rollback"

   # Check agent receives forced update
   # Monitor agent logs for update trigger
   ```

6. **Verify agent downgrades**

   - Agent should receive UpdateAvailable message with previous version
   - Agent should download and install previous version
   - Check agent version after completion
   - Expected: agent_version = previous stable version

7.
   **Verify blocked version not offered again**

   ```bash
   # Scanner should skip files without .channel files
   # Verify version is not in available updates list
   curl https://rmm.azcomputerguru.com/api/updates/rollouts \
     -H "Authorization: Bearer $TOKEN" | jq --arg v "$VERSION" '.[] | select(.version == $v)'

   # If present, should show channel = null or health.status = "blocked"
   ```

### Success Criteria

- ✅ .channel files removed
- ✅ Health status set to "blocked"
- ✅ Last incident reason recorded
- ✅ Connected agents receive forced downgrade
- ✅ Agents successfully downgrade to previous stable
- ✅ Blocked version not offered to new agents
- ✅ Dashboard shows blocked status

---

## Test 5: Dashboard UI Testing

**Objective:** Verify the Updates page displays correctly and actions work.

### Steps

1. **Access Updates page**

   - Navigate to https://rmm.azcomputerguru.com/updates
   - Login if needed

2. **Verify data display**

   - [ ] Table shows all rollout versions
   - [ ] Columns: Version, OS/Arch, Channel, Health, Success Rate, Agent Counts, Actions
   - [ ] Health badges color-coded (green/yellow/red/gray)
   - [ ] Success rate calculated correctly
   - [ ] Agent counts accurate

3. **Test promote button**

   - [ ] Enabled for beta + healthy versions only
   - [ ] Disabled with tooltip for unhealthy versions
   - [ ] Click opens confirmation dialog
   - [ ] Confirm triggers API call
   - [ ] Success toast appears
   - [ ] Table refreshes with updated data

4. **Test rollback button**

   - [ ] Always enabled
   - [ ] Click opens dialog with reason input
   - [ ] Reason field is required
   - [ ] Confirm triggers API call
   - [ ] Success toast shows agent count
   - [ ] Table refreshes with updated data

5. **Test error handling**

   - [ ] Shows loading state during fetch
   - [ ] Shows error message if API fails
   - [ ] Retry button works
   - [ ] Shows empty state if no rollouts

6.
   **Test auto-refresh**

   - [ ] Data refreshes every 30 seconds
   - [ ] Refresh doesn't disrupt UI interactions
   - [ ] Manual refresh button works

### Success Criteria

- ✅ All table columns display correct data
- ✅ Health badges use correct colors
- ✅ Promote button only enabled for healthy beta versions
- ✅ Rollback button always enabled
- ✅ Confirmation dialogs work
- ✅ API calls succeed
- ✅ Toasts display success/error
- ✅ Auto-refresh works
- ✅ Responsive on mobile

---

## Test 6: Integration Testing

**Objective:** Test complete workflows end-to-end.

### Workflow 1: New Build → Beta Testing → Promotion → Stable Deployment

1. Trigger new build (auto-bumps version)
2. Verify .channel files = "beta"
3. Mark GURU-KALI as beta agent
4. Wait for update dispatch
5. Monitor update installation
6. Verify success event logged
7. Repeat steps 4-6 four more times (5 successful updates total) for healthy status
8. Promote via dashboard
9. Verify GURU-5070 (stable) receives update
10. Monitor stable deployment
11. Verify all agents updated

**Expected:** Beta testing prevents bad updates from reaching production.

### Workflow 2: Critical Bug → Rollback → Fleet Downgrade

1. Simulate critical bug discovered post-promotion
2. Execute rollback via dashboard
3. Verify all agents receive forced downgrade
4. Verify agents revert to previous stable
5. Verify new agents don't receive blocked version
6. Verify health metrics show blocked status

**Expected:** Rollback protects fleet from bad updates.

### Workflow 3: Crash Detection → Auto-Block (Future Enhancement)

1. Deploy update to beta agents
2. Simulate crash (stop service after update)
3. Wait for health monitor (60s)
4. Verify crash detected and logged
5. Check if crash rate >25%
6. Verify health status = "critical"
7. Attempt promotion
8. Verify promotion blocked

**Expected:** High crash rates prevent automatic promotion.
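
The gate that Workflow 3 exercises can be sketched as a small shell helper for reasoning about expected outcomes before running the test. This is only an illustration, not the server's actual logic. The two thresholds come from this plan (fewer than 5 attempts gives `unknown`, a crash rate above 25% gives `critical`). The function names `classify_health` and `promotion_allowed` are hypothetical, the `warning` band is omitted because this plan does not state its threshold, and treating `unknown` as promotable and `force` as overriding every non-healthy status are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names and fallback behaviour are assumptions,
# not the GuruRMM server implementation.

# classify_health TOTAL_ATTEMPTS CRASH_COUNT
# Prints "unknown" below 5 attempts, "critical" above a 25% crash rate,
# otherwise "healthy". Integer math avoids floating point in bash.
classify_health() {
  local total_attempts=$1 crash_count=$2
  if (( total_attempts < 5 )); then
    echo "unknown"
  elif (( crash_count * 100 > total_attempts * 25 )); then
    echo "critical"
  else
    echo "healthy"
  fi
}

# promotion_allowed STATUS [FORCE]
# Returns 0 when promotion would be permitted under the assumptions above.
promotion_allowed() {
  local status=$1 force=${2:-false}
  # Assumption: force overrides any non-healthy status.
  [[ $force == true ]] && return 0
  case $status in
    healthy|unknown) return 0 ;;  # Assumption: unknown does not block
    *) return 1 ;;                # warning, critical, blocked
  esac
}
```

For example, `classify_health 8 3` prints `critical` (3 of 8 is 37.5%), and `promotion_allowed critical` returns non-zero unless `true` is passed as the second argument, which matches steps 5 through 8 of Workflow 3.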

---

## Performance Testing

### Load Testing

- [ ] 100+ agents checking for updates simultaneously
- [ ] Scanner performance with 50+ versions
- [ ] Health monitor with 1000+ update events
- [ ] Dashboard with 20+ rollouts displayed

### Stress Testing

- [ ] Rapid version releases (5 builds in 10 minutes)
- [ ] Mass rollback (100+ agents)
- [ ] Concurrent API calls (multiple users promoting/rolling back)

---

## Security Testing

### Authentication

- [ ] All API endpoints require valid JWT
- [ ] Expired tokens rejected
- [ ] Invalid tokens rejected

### Authorization

- [ ] Admin role can promote/rollback
- [ ] Non-admin role blocked (if RBAC implemented)

### Input Validation

- [ ] SQL injection attempts blocked
- [ ] XSS attempts in reason field sanitized
- [ ] Invalid version strings rejected
- [ ] Invalid OS/arch values rejected

### File System Security

- [ ] .channel files have correct permissions
- [ ] Path traversal attempts blocked
- [ ] Only authorized processes can modify .channel files

---

## Regression Testing

### Existing Functionality

- [ ] Agent registration still works
- [ ] Heartbeat processing unaffected
- [ ] Command execution unaffected
- [ ] Metrics collection unaffected
- [ ] Alert generation unaffected
- [ ] Policy enforcement unaffected

### Database Performance

- [ ] No slow queries introduced
- [ ] Indexes used efficiently
- [ ] No lock contention

---

## Documentation Verification

- [ ] API endpoints documented
- [ ] Database schema documented
- [ ] Dashboard user guide accurate
- [ ] Admin procedures documented
- [ ] Troubleshooting guide created

---

## Sign-Off

### Phase 6 Test Results

**Tester:** ___________________________

**Date:** ___________________________

**Test 1 - Beta-First Workflow:** ⬜ PASS ⬜ FAIL

**Test 2 - Health Monitoring:** ⬜ PASS ⬜ FAIL

**Test 3 - Promotion:** ⬜ PASS ⬜ FAIL

**Test 4 - Rollback:** ⬜ PASS ⬜ FAIL

**Test 5 - Dashboard UI:** ⬜ PASS ⬜ FAIL

**Test 6 - Integration:** ⬜ PASS ⬜ FAIL

**Overall Status:** ⬜ APPROVED FOR
PRODUCTION ⬜ NEEDS FIXES

**Notes:**

```

```

**Blockers/Issues:**

```

```

**Deployment Date:** ___________________________