test: Add Phase 6 testing plan and verification script

Created comprehensive test plan covering:
- Test 1: Beta-first build workflow
- Test 2: Health monitoring and crash detection
- Test 3: Promotion workflow with health gates
- Test 4: Rollback and force-downgrade
- Test 5: Dashboard UI testing
- Test 6: End-to-end integration testing

Added verification script for Saturn:
- Checks the implementation of all 5 phases
- Verifies database tables, source files, routes
- Validates build artifacts and service status
- Provides clear pass/fail status

Usage: Run verify-rollout-system.sh on Saturn before Phase 6 testing.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

571
PHASE_6_TEST_PLAN.md
Normal file
@@ -0,0 +1,571 @@
# Phase 6: End-to-End Testing Plan
## Safe Agent Rollout System

**Date:** 2026-05-25
**Version:** GuruRMM v0.6.41+
**Tester:** Mike Swanson

---

## Prerequisites

### Environment Setup
- [ ] SSH access to Saturn (172.16.3.30)
- [ ] Access to GuruRMM dashboard (https://rmm.azcomputerguru.com)
- [ ] JWT token for API testing
- [ ] At least 2 test agents (GURU-KALI, GURU-5070 recommended)

### Pre-Test Verification
```bash
# On Saturn
ssh azcomputerguru@172.16.3.30

# 1. Verify migration 046 is applied
sudo -u postgres psql gururmm_production -c "\d update_rollouts"
sudo -u postgres psql gururmm_production -c "\d update_health_metrics"
sudo -u postgres psql gururmm_production -c "\d agent_update_events"

# 2. Verify server build is current
cd /opt/gururmm/server
git status             # Working tree should be clean
git log -1 --oneline   # Latest commit should be the Phase 4 code
cargo build --release --features production

# 3. Verify dashboard build is current
cd /opt/gururmm/dashboard
git status             # Working tree should be clean
git log -1 --oneline   # Latest commit should be the Phase 5 code
npm run build

# 4. Verify health monitor is running
sudo systemctl status gururmm-server
sudo journalctl -u gururmm-server -n 50 | grep "Health monitoring task spawned"
```

---

## Test 1: Beta-First Build Workflow

**Objective:** Verify that new builds default to the beta channel and that stable agents do not receive them.

### Steps

1. **Trigger a test build**
   ```bash
   # On Saturn
   cd /opt/gururmm
   sudo ./build-linux.sh # Will auto-increment to next version
   sudo ./build-windows.sh
   ```

2. **Verify .channel files created**
   ```bash
   cd /var/www/gururmm/downloads
   ls -lt *.channel | head -10   # Newest first

   # Expected: New version should have .channel files containing "beta"
   VERSION=$(ls -t gururmm-agent-linux-amd64-*.tar.gz | head -1 | grep -oP '\d+\.\d+\.\d+')
   cat "gururmm-agent-linux-amd64-${VERSION}.tar.gz.channel"
   # Should output: beta
   ```

3. **Mark test agents as beta**
   ```bash
   # Via API or SQL
   curl -X PATCH https://rmm.azcomputerguru.com/api/agents/GURU-KALI-UUID/channel \
     -H "Authorization: Bearer $TOKEN" \
     -H "Content-Type: application/json" \
     -d '{"update_channel": "beta"}'
   ```

4. **Verify beta agent receives update**
   - Open dashboard → Agents → GURU-KALI
   - Wait for agent connection (heartbeat every 60s)
   - Check agent state for pending update
   - Expected: update_available = true

5. **Verify stable agent does NOT receive update**
   - Ensure GURU-5070 is on "stable" channel
   - Check agent state
   - Expected: update_available = false (version not in stable channel)

### Success Criteria
- ✅ .channel files exist for new version
- ✅ .channel files contain "beta"
- ✅ Beta agents offered the update
- ✅ Stable agents NOT offered the update
- ✅ Scanner logs show beta/stable filtering
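
The channel gating exercised in this test can be sketched as two small shell helpers. This is an illustration only, not the scanner's actual code: the names `channel_of` and `is_offered` are hypothetical, and the sketch assumes each artifact has a sidecar file `<artifact>.channel` holding `beta` or `stable`, that a missing sidecar means the build is never offered, and that beta agents are also offered stable builds.

```bash
# Hypothetical helper: print the channel recorded for a build artifact.
# Assumes a sidecar file "<artifact>.channel" holding "beta" or "stable".
channel_of() {
    local artifact="$1"
    local sidecar="${artifact}.channel"
    if [ -f "$sidecar" ]; then
        # Strip whitespace so a trailing newline does not affect comparisons.
        tr -d '[:space:]' < "$sidecar"
    else
        echo "none"
    fi
}

# Hypothetical helper: succeed if an agent on channel $2 may be offered artifact $1.
# Assumption: beta agents see beta and stable builds; stable agents see stable only.
is_offered() {
    local artifact="$1"
    local agent_channel="$2"
    case "$(channel_of "$artifact")" in
        stable) return 0 ;;
        beta)   [ "$agent_channel" = "beta" ] ;;
        *)      return 1 ;;
    esac
}
```

For example, `is_offered "gururmm-agent-linux-amd64-${VERSION}.tar.gz" stable` should fail until the build is promoted.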

---

## Test 2: Health Monitoring & Crash Detection

**Objective:** Verify health monitor detects crashes and updates metrics.

### Steps

1. **Clear existing health data (optional)**
   ```bash
   sudo -u postgres psql gururmm_production -c "DELETE FROM update_health_metrics WHERE version = '$VERSION';"
   sudo -u postgres psql gururmm_production -c "DELETE FROM agent_update_events WHERE version_to = '$VERSION';"
   ```

2. **Simulate successful update**
   ```bash
   # On test agent (GURU-KALI)
   # Let update complete normally
   # Wait 5 minutes
   ```

3. **Check event logging**
   ```sql
   SELECT event_type, version_to, created_at
   FROM agent_update_events
   WHERE agent_id = 'GURU-KALI-UUID'
   ORDER BY created_at DESC
   LIMIT 5;

   -- Expected events:
   -- - update_dispatched
   -- - download_started (if implemented)
   -- - download_complete (if implemented)
   -- - update_applied
   ```

4. **Check health metrics incremented**
   ```sql
   -- Replace $VERSION with the version under test (psql does not expand shell variables)
   SELECT version, total_attempts, successful_updates, failed_updates, crash_count, health_status
   FROM update_health_metrics
   WHERE version = '$VERSION';

   -- Expected:
   -- total_attempts = 1
   -- successful_updates = 1
   -- health_status = 'unknown' (< 5 attempts)
   ```

5. **Simulate crash**
   ```bash
   # On test agent
   # 1. Trigger update dispatch
   # 2. Immediately after "update_applied" event, stop agent service
   sudo systemctl stop gururmm-agent
   # 3. Wait 60-90 seconds for health monitor scan
   ```

6. **Verify crash detection**
   ```sql
   SELECT event_type, created_at
   FROM agent_update_events
   WHERE agent_id = 'GURU-KALI-UUID'
   AND event_type = 'crash_detected'
   ORDER BY created_at DESC;

   -- Expected: Should see crash_detected event

   -- Replace $VERSION with the version under test
   SELECT crash_count, health_status
   FROM update_health_metrics
   WHERE version = '$VERSION';

   -- Expected: crash_count incremented, health_status may change
   ```

7. **Check server logs**
   ```bash
   sudo journalctl -u gururmm-server -n 100 | grep -E "crash|health"
   # Expected: "Detected crash: agent X went offline after updating to Y"
   ```

### Success Criteria
- ✅ Events logged correctly (update_dispatched, update_applied)
- ✅ Health metrics incremented on success
- ✅ Crash detected within 90 seconds
- ✅ crash_detected event logged
- ✅ Crash counter incremented
- ✅ Health status updated based on thresholds
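
The crash condition this test relies on ("agent went offline after updating") might be modelled roughly as below. This is a sketch under assumptions, not the health monitor's real logic: the function name `is_crash`, its arguments, and the 90-second default grace period (taken from the success criterion above) are all hypothetical.

```bash
# Hypothetical crash heuristic; all timestamps are Unix epoch seconds.
# Assumption: an update counts as a crash when the agent has not been seen
# since the update was applied and the grace period has elapsed.
is_crash() {
    local applied_at="$1"
    local last_seen="$2"
    local now="$3"
    local grace="${4:-90}"
    # Agent checked in after the update: it survived.
    if [ "$last_seen" -gt "$applied_at" ]; then
        return 1
    fi
    # Still inside the grace period: too early to decide.
    if [ $(( now - applied_at )) -le "$grace" ]; then
        return 1
    fi
    return 0
}
```

With a 60-second heartbeat, a grace period shorter than one heartbeat interval would flag healthy agents, which is one reason to wait the full 60-90 seconds in step 5 before checking.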

---

## Test 3: Promotion Workflow

**Objective:** Verify promotion from beta to stable with health gates.

### Steps

1. **Attempt promotion with insufficient data**
   ```bash
   curl -X POST https://rmm.azcomputerguru.com/api/updates/rollouts/$VERSION/promote \
     -H "Authorization: Bearer $TOKEN" \
     -H "Content-Type: application/json" \
     -d '{"os": "linux", "arch": "amd64", "force": false}'

   # Expected: either succeeds (if "unknown" status is allowed to promote)
   # or is rejected (if the health gate requires sufficient data). Record which occurs.
   ```

2. **Generate healthy metrics**
   ```bash
   # Simulate 5+ successful updates
   # Option A: Manually insert via SQL (for testing)
   # Option B: Trigger real updates on multiple beta agents

   # SQL approach for testing:
   sudo -u postgres psql gururmm_production << EOF
   UPDATE update_health_metrics
   SET total_attempts = 5,
   successful_updates = 5,
   failed_updates = 0,
   crash_count = 0,
   health_status = 'healthy'
   WHERE version = '$VERSION' AND os = 'linux' AND arch = 'amd64';
   EOF
   ```

3. **Verify health status**
   ```bash
   curl https://rmm.azcomputerguru.com/api/updates/rollouts \
     -H "Authorization: Bearer $TOKEN" | jq --arg v "$VERSION" '.[] | select(.version == $v)'

   # Expected: health.status = "healthy"
   ```

4. **Promote to stable**
   ```bash
   curl -X POST https://rmm.azcomputerguru.com/api/updates/rollouts/$VERSION/promote \
     -H "Authorization: Bearer $TOKEN" \
     -H "Content-Type: application/json" \
     -d '{"os": "linux", "arch": "amd64", "force": false}'

   # Expected: {"success": true, "message": "Promoted...", "files_updated": 2}
   ```

5. **Verify .channel files updated**
   ```bash
   cat "/var/www/gururmm/downloads/gururmm-agent-linux-amd64-${VERSION}.tar.gz.channel"
   # Expected: stable
   ```

6. **Verify database updated**
   ```sql
   -- Replace $VERSION with the version under test
   SELECT channel, promoted_at, promoted_by
   FROM update_rollouts
   WHERE version = '$VERSION' AND os = 'linux' AND arch = 'amd64';

   -- Expected: channel = 'stable', promoted_at = time of promotion, promoted_by = user_id
   ```

7. **Verify stable agents receive update**
   - Ensure test agent is on "stable" channel
   - Wait for scanner rescan (happens immediately after promotion)
   - Check agent state
   - Expected: update_available = true

8. **Test force promotion**
   ```bash
   # Set health to warning
   sudo -u postgres psql gururmm_production << EOF
   UPDATE update_health_metrics
   SET health_status = 'warning'
   WHERE version = '$VERSION' AND os = 'windows' AND arch = 'amd64';
   EOF

   # Try promotion without force
   curl -X POST https://rmm.azcomputerguru.com/api/updates/rollouts/$VERSION/promote \
     -H "Authorization: Bearer $TOKEN" \
     -H "Content-Type: application/json" \
     -d '{"os": "windows", "arch": "amd64", "force": false}'

   # Expected: 403 error with message about health status

   # Try with force flag
   curl -X POST https://rmm.azcomputerguru.com/api/updates/rollouts/$VERSION/promote \
     -H "Authorization: Bearer $TOKEN" \
     -H "Content-Type: application/json" \
     -d '{"os": "windows", "arch": "amd64", "force": true}'

   # Expected: 200 success (overridden health check)
   ```

### Success Criteria
- ✅ Promotion blocked for unhealthy versions (unless forced)
- ✅ Promotion succeeds for healthy versions
- ✅ .channel files updated from "beta" to "stable"
- ✅ Database rollouts table updated
- ✅ Scanner rescans immediately
- ✅ Stable agents receive update after promotion
- ✅ Force flag overrides health checks
- ✅ Dashboard shows updated channel
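
The gate behaviour these criteria describe can be summarised in a few lines of shell. This is a sketch, not the server's implementation: `can_promote` is a hypothetical name, and because this plan leaves open whether an `unknown` status may promote without `force`, the sketch assumes the stricter reading and blocks it.

```bash
# Hypothetical promotion gate. $1 = health status, $2 = force flag ("true"/"false").
# Assumption: "healthy" always promotes; any other status needs force=true.
# Whether "unknown" may promote without force is left open in the test plan;
# this sketch takes the stricter reading and blocks it.
can_promote() {
    local status="$1"
    local force="${2:-false}"
    if [ "$force" = "true" ]; then
        return 0
    fi
    [ "$status" = "healthy" ]
}
```

Step 8 corresponds to `can_promote warning false` failing and `can_promote warning true` succeeding.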

---

## Test 4: Rollback Workflow

**Objective:** Verify rollback blocks version and force-downgrades agents.

### Steps

1. **Prepare for rollback**
   ```bash
   # Ensure test agent is running the version that will be rolled back ($VERSION)
   # Verify previous stable version exists
   curl https://rmm.azcomputerguru.com/api/updates/rollouts \
     -H "Authorization: Bearer $TOKEN" | jq '.[] | select(.channel == "stable") | .version'
   ```

2. **Execute rollback**
   ```bash
   curl -X POST https://rmm.azcomputerguru.com/api/updates/rollouts/$VERSION/rollback \
     -H "Authorization: Bearer $TOKEN" \
     -H "Content-Type: application/json" \
     -d '{
       "os": "linux",
       "arch": "amd64",
       "reason": "Test rollback: simulating critical bug in version '"$VERSION"'"
     }'

   # Expected: {"success": true, "agents_affected": 1, "downgrade_version": "0.6.40"}
   ```

3. **Verify .channel files removed**
   ```bash
   ls "/var/www/gururmm/downloads/gururmm-agent-linux-amd64-${VERSION}.tar.gz.channel"
   # Expected: File not found (removed)
   ```

4. **Verify health status blocked**
   ```sql
   -- Replace $VERSION with the version under test
   SELECT health_status, last_incident
   FROM update_health_metrics
   WHERE version = '$VERSION' AND os = 'linux' AND arch = 'amd64';

   -- Expected: health_status = 'blocked', last_incident = reason text
   ```

5. **Verify forced downgrade dispatched**
   ```bash
   # Check server logs for WebSocket dispatch
   sudo journalctl -u gururmm-server -n 100 | grep -iE "downgrade|rollback"

   # Check agent receives forced update
   # Monitor agent logs for update trigger
   ```

6. **Verify agent downgrades**
   - Agent should receive UpdateAvailable message with previous version
   - Agent should download and install previous version
   - Check agent version after completion
   - Expected: agent_version = previous stable version

7. **Verify blocked version not offered again**
   ```bash
   # Scanner should skip builds that have no .channel file
   # Verify version is not in available updates list
   curl https://rmm.azcomputerguru.com/api/updates/rollouts \
     -H "Authorization: Bearer $TOKEN" | jq --arg v "$VERSION" '.[] | select(.version == $v)'

   # If present, should show channel = null or health.status = "blocked"
   ```

### Success Criteria
- ✅ .channel files removed
- ✅ Health status set to "blocked"
- ✅ Last incident reason recorded
- ✅ Connected agents receive forced downgrade
- ✅ Agents successfully downgrade to previous stable
- ✅ Blocked version not offered to new agents
- ✅ Dashboard shows blocked status
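
To sanity-check the `downgrade_version` returned in step 2, the expected target can be derived from the list of stable versions. How the server actually picks the target is not described in this plan; the sketch below assumes "the highest stable version lower than the one being rolled back", and `previous_version` is a hypothetical helper name. It relies on `sort -V` (version sort, available in GNU coreutils).

```bash
# Hypothetical helper: read candidate versions on stdin (one per line) and
# print the highest one that is strictly lower than the version in $1.
# Prints nothing if no lower version exists.
previous_version() {
    local current="$1"
    # Append the current version, version-sort, de-duplicate,
    # then print the entry that precedes it.
    { cat; printf '\n%s\n' "$current"; } | sort -V -u | awk -v cur="$current" '
        NF == 0 { next }                                  # skip blank lines
        ($0 "") == (cur "") { if (prev != "") print prev; exit }   # force string comparison
        { prev = $0 }
    '
}
```

The stable-version list from step 1 (the `jq` output, with `-r` added to strip quotes) can be piped straight into `previous_version "$VERSION"`.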

---

## Test 5: Dashboard UI Testing

**Objective:** Verify Updates page displays correctly and actions work.

### Steps

1. **Access Updates page**
   - Navigate to https://rmm.azcomputerguru.com/updates
   - Login if needed

2. **Verify data display**
   - [ ] Table shows all rollout versions
   - [ ] Columns: Version, OS/Arch, Channel, Health, Success Rate, Agent Counts, Actions
   - [ ] Health badges color-coded (green/yellow/red/gray)
   - [ ] Success rate calculated correctly
   - [ ] Agent counts accurate

3. **Test promote button**
   - [ ] Enabled for beta + healthy versions only
   - [ ] Disabled with tooltip for unhealthy versions
   - [ ] Click opens confirmation dialog
   - [ ] Confirm triggers API call
   - [ ] Success toast appears
   - [ ] Table refreshes with updated data

4. **Test rollback button**
   - [ ] Always enabled
   - [ ] Click opens dialog with reason input
   - [ ] Reason field is required
   - [ ] Confirm triggers API call
   - [ ] Success toast shows agent count
   - [ ] Table refreshes with updated data

5. **Test error handling**
   - [ ] Shows loading state during fetch
   - [ ] Shows error message if API fails
   - [ ] Retry button works
   - [ ] Shows empty state if no rollouts

6. **Test auto-refresh**
   - [ ] Data refreshes every 30 seconds
   - [ ] Refresh doesn't disrupt UI interactions
   - [ ] Manual refresh button works

### Success Criteria
- ✅ All table columns display correct data
- ✅ Health badges use correct colors
- ✅ Promote button only enabled for healthy beta versions
- ✅ Rollback button always enabled
- ✅ Confirmation dialogs work
- ✅ API calls succeed
- ✅ Toasts display success/error
- ✅ Auto-refresh works
- ✅ Responsive on mobile
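
For the "Success rate calculated correctly" check in step 2, an independent calculation from the database counters gives a value to compare against the dashboard. The formula is an assumption (`successful_updates / total_attempts`, as a whole-number percentage); the dashboard's actual rounding is not specified here, and `success_rate` is a hypothetical helper name.

```bash
# Hypothetical helper: integer success rate (0-100) from two counters.
# $1 = successful_updates, $2 = total_attempts.
# Prints "n/a" when there are no attempts, to avoid dividing by zero.
success_rate() {
    local successful="$1"
    local total="$2"
    if [ "$total" -eq 0 ]; then
        echo "n/a"
    else
        echo $(( successful * 100 / total ))
    fi
}
```

Integer division truncates, so 1 success out of 3 attempts yields 33; a dashboard that rounds to one decimal place would show 33.3 for the same data.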

---

## Test 6: Integration Testing

**Objective:** Test complete workflows end-to-end.

### Workflow 1: New Build → Beta Testing → Promotion → Stable Deployment

1. Trigger new build (auto-bumps version)
2. Verify .channel files = "beta"
3. Mark GURU-KALI as beta agent
4. Wait for update dispatch
5. Monitor update installation
6. Verify success event logged
7. Repeat steps 4-6 four more times (5 successful updates in total) to reach healthy status
8. Promote via dashboard
9. Verify GURU-5070 (stable) receives update
10. Monitor stable deployment
11. Verify all agents updated

**Expected:** Beta testing prevents bad updates from reaching production.

### Workflow 2: Critical Bug → Rollback → Fleet Downgrade

1. Simulate critical bug discovered post-promotion
2. Execute rollback via dashboard
3. Verify all agents receive forced downgrade
4. Verify agents revert to previous stable
5. Verify new agents don't receive blocked version
6. Verify health metrics show blocked status

**Expected:** Rollback protects fleet from bad updates.

### Workflow 3: Crash Detection → Auto-Block (Future Enhancement)

1. Deploy update to beta agents
2. Simulate crash (stop service after update)
3. Wait for health monitor (60s)
4. Verify crash detected and logged
5. Check if crash rate >25%
6. Verify health status = "critical"
7. Attempt promotion
8. Verify promotion blocked

**Expected:** High crash rates block promotion unless it is forced.
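
The two thresholds this plan states explicitly (fewer than 5 attempts gives `unknown`; a crash rate above 25% gives `critical`) can be written down as a small classifier for cross-checking Workflow 3. This is a sketch: `classify_health` is a hypothetical name, and the boundaries between `healthy` and `warning` are not given in this plan, so everything else is reported as a generic `ok`.

```bash
# Hypothetical classifier using only the two thresholds stated in this plan:
#   - fewer than 5 attempts  -> "unknown"
#   - crash rate above 25%   -> "critical"
# Anything else is reported as "ok" here; the real system distinguishes
# "healthy" and "warning", but their thresholds are not given in this plan.
# $1 = total_attempts, $2 = crash_count.
classify_health() {
    local total="$1"
    local crashes="$2"
    if [ "$total" -lt 5 ]; then
        echo "unknown"
    elif [ $(( crashes * 100 )) -gt $(( total * 25 )) ]; then
        # Cross-multiplied form of crashes/total > 25% to stay in integers.
        echo "critical"
    else
        echo "ok"
    fi
}
```

Note that with fewer than 5 attempts the status stays `unknown` regardless of crashes, so Workflow 3 needs at least 5 update attempts before step 6 can be expected to pass.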

---

## Performance Testing

### Load Testing
- [ ] 100+ agents checking for updates simultaneously
- [ ] Scanner performance with 50+ versions
- [ ] Health monitor with 1000+ update events
- [ ] Dashboard with 20+ rollouts displayed

### Stress Testing
- [ ] Rapid version releases (5 builds in 10 minutes)
- [ ] Mass rollback (100+ agents)
- [ ] Concurrent API calls (multiple users promoting/rolling back)

---

## Security Testing

### Authentication
- [ ] All API endpoints require valid JWT
- [ ] Expired tokens rejected
- [ ] Invalid tokens rejected

### Authorization
- [ ] Admin role can promote/rollback
- [ ] Non-admin role blocked (if RBAC implemented)

### Input Validation
- [ ] SQL injection attempts blocked
- [ ] XSS attempts in reason field sanitized
- [ ] Invalid version strings rejected
- [ ] Invalid OS/arch values rejected
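
When preparing negative test inputs for the version and OS/arch checks, it helps to have a reference for what a valid value looks like. The helpers below are a sketch, not the server's validation: the names are hypothetical, the version shape (three dot-separated numbers) matches the `\d+\.\d+\.\d+` pattern used in Test 1, and the allow-list contains only the OS/arch pairs that appear in this plan.

```bash
# Hypothetical validator: three dot-separated numeric components, e.g. 0.6.41.
valid_version() {
    case "$1" in
        ''|*[!0-9.]*) return 1 ;;   # empty, or contains anything but digits and dots
    esac
    printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'
}

# Hypothetical validator: allow-list of the OS/arch pairs used in this plan.
valid_target() {
    local os="$1"
    local arch="$2"
    case "$os/$arch" in
        linux/amd64|windows/amd64) return 0 ;;
        *) return 1 ;;
    esac
}
```

Any input for which these helpers fail (for example `0.6`, `../etc/passwd`, or a version with trailing SQL) is a candidate payload for the rejection checks above.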

### File System Security
- [ ] .channel files have correct permissions
- [ ] Path traversal attempts blocked
- [ ] Only authorized processes can modify .channel files

---

## Regression Testing

### Existing Functionality
- [ ] Agent registration still works
- [ ] Heartbeat processing unaffected
- [ ] Command execution unaffected
- [ ] Metrics collection unaffected
- [ ] Alert generation unaffected
- [ ] Policy enforcement unaffected

### Database Performance
- [ ] No slow queries introduced
- [ ] Indexes used efficiently
- [ ] No lock contention

---

## Documentation Verification

- [ ] API endpoints documented
- [ ] Database schema documented
- [ ] Dashboard user guide accurate
- [ ] Admin procedures documented
- [ ] Troubleshooting guide created

---

## Sign-Off

### Phase 6 Test Results

**Tester:** ___________________________
**Date:** ___________________________

**Test 1 - Beta-First Workflow:** ⬜ PASS ⬜ FAIL
**Test 2 - Health Monitoring:** ⬜ PASS ⬜ FAIL
**Test 3 - Promotion:** ⬜ PASS ⬜ FAIL
**Test 4 - Rollback:** ⬜ PASS ⬜ FAIL
**Test 5 - Dashboard UI:** ⬜ PASS ⬜ FAIL
**Test 6 - Integration:** ⬜ PASS ⬜ FAIL

**Overall Status:** ⬜ APPROVED FOR PRODUCTION ⬜ NEEDS FIXES

**Notes:**
```


```

**Blockers/Issues:**
```


```

**Deployment Date:** ___________________________
282
verify-rollout-system.sh
Executable file
@@ -0,0 +1,282 @@
#!/usr/bin/env bash
# Verification script for Safe Agent Rollout System
# Run on Saturn (172.16.3.30) to verify Phase 1-5 implementation

# 'set -e' is deliberately not enabled: check() returns non-zero for a failed
# check, which would abort the script before the summary is printed.

GREEN='\033[0;32m'
RED='\033[0;31m'
YELLOW='\033[1;33m'
NC='\033[0m'

echo "=========================================="
echo "GuruRMM Safe Rollout System Verification"
echo "=========================================="
echo ""

# Report the exit status of the command that ran immediately before this call
check() {
    local status=$?
    if [ "$status" -eq 0 ]; then
        echo -e "${GREEN}[OK]${NC} $1"
        return 0
    else
        echo -e "${RED}[FAIL]${NC} $1"
        return 1
    fi
}

info() {
    echo -e "${YELLOW}[INFO]${NC} $1"
}

FAIL_COUNT=0

# ===== Phase 1: Build Scripts =====
echo "Phase 1: Build Scripts"
echo "----------------------"

if grep -q "Mark all new builds as beta" /opt/gururmm/build-linux.sh; then
    check "build-linux.sh has beta marking code"
else
    check "build-linux.sh missing beta marking code"
    FAIL_COUNT=$((FAIL_COUNT + 1))
fi

if grep -q "Mark all new builds as beta" /opt/gururmm/build-windows.sh; then
    check "build-windows.sh has beta marking code"
else
    check "build-windows.sh missing beta marking code"
    FAIL_COUNT=$((FAIL_COUNT + 1))
fi

# Check for actual .channel files
CHANNEL_COUNT=$(find /var/www/gururmm/downloads -name "*.channel" 2>/dev/null | wc -l)
if [ "$CHANNEL_COUNT" -gt 0 ]; then
    check ".channel files exist in downloads directory ($CHANNEL_COUNT found)"
    SAMPLE_FILE=$(find /var/www/gururmm/downloads -name "*.channel" | head -1)
    info "Sample: $SAMPLE_FILE"
    if [ -f "$SAMPLE_FILE" ]; then
        SAMPLE_CONTENT=$(cat "$SAMPLE_FILE")
        info "Content: $SAMPLE_CONTENT"
    fi
else
    check ".channel files in downloads directory"
    FAIL_COUNT=$((FAIL_COUNT + 1))
    info "No .channel files found - may need to trigger a build"
fi

echo ""

# ===== Phase 2: Database Migration =====
echo "Phase 2: Database Migration"
echo "---------------------------"

# Check tables exist
if sudo -u postgres psql gururmm_production -t -c "\d update_rollouts" &>/dev/null; then
    check "update_rollouts table exists"
else
    check "update_rollouts table exists"
    FAIL_COUNT=$((FAIL_COUNT + 1))
fi

if sudo -u postgres psql gururmm_production -t -c "\d update_health_metrics" &>/dev/null; then
    check "update_health_metrics table exists"
else
    check "update_health_metrics table exists"
    FAIL_COUNT=$((FAIL_COUNT + 1))
fi

if sudo -u postgres psql gururmm_production -t -c "\d agent_update_events" &>/dev/null; then
    check "agent_update_events table exists"
else
    check "agent_update_events table exists"
    FAIL_COUNT=$((FAIL_COUNT + 1))
fi

# Check for data
ROLLOUT_COUNT=$(sudo -u postgres psql gururmm_production -t -c "SELECT COUNT(*) FROM update_rollouts" 2>/dev/null | xargs)
info "Rollouts tracked: $ROLLOUT_COUNT"

EVENT_COUNT=$(sudo -u postgres psql gururmm_production -t -c "SELECT COUNT(*) FROM agent_update_events" 2>/dev/null | xargs)
info "Update events logged: $EVENT_COUNT"

METRIC_COUNT=$(sudo -u postgres psql gururmm_production -t -c "SELECT COUNT(*) FROM update_health_metrics" 2>/dev/null | xargs)
info "Health metrics tracked: $METRIC_COUNT"

echo ""

# ===== Phase 3: Health Monitoring =====
echo "Phase 3: Health Monitoring"
echo "--------------------------"

# Check source files exist
if [ -f "/opt/gururmm/server/src/updates/health.rs" ]; then
    check "health.rs source file exists"
else
    check "health.rs source file exists"
    FAIL_COUNT=$((FAIL_COUNT + 1))
fi

# Check if server is running
if systemctl is-active --quiet gururmm-server; then
    check "GuruRMM server is running"

    # Check for health monitor in logs
    if sudo journalctl -u gururmm-server --since "1 hour ago" | grep -q "Health monitoring task spawned"; then
        check "Health monitor task spawned (found in logs)"
    else
        echo -e "${YELLOW}[WARN]${NC} Health monitor spawn message not found in recent logs"
        info "May need to restart service if code just deployed"
    fi
else
    check "GuruRMM server is running"
    FAIL_COUNT=$((FAIL_COUNT + 1))
fi

echo ""

# ===== Phase 4: API Endpoints =====
echo "Phase 4: API Endpoints"
echo "----------------------"

if [ -f "/opt/gururmm/server/src/api/updates.rs" ]; then
    check "updates.rs API file exists"

    # Check for key functions
    if grep -q "pub async fn list_rollouts" /opt/gururmm/server/src/api/updates.rs; then
        check "list_rollouts endpoint defined"
    else
        check "list_rollouts endpoint defined"
        FAIL_COUNT=$((FAIL_COUNT + 1))
    fi

    if grep -q "pub async fn promote_version" /opt/gururmm/server/src/api/updates.rs; then
        check "promote_version endpoint defined"
    else
        check "promote_version endpoint defined"
        FAIL_COUNT=$((FAIL_COUNT + 1))
    fi

    if grep -q "pub async fn rollback_version" /opt/gururmm/server/src/api/updates.rs; then
        check "rollback_version endpoint defined"
    else
        check "rollback_version endpoint defined"
        FAIL_COUNT=$((FAIL_COUNT + 1))
    fi
else
    check "updates.rs API file exists"
    FAIL_COUNT=$((FAIL_COUNT + 1))
fi

# Check routes registered
if grep -q "api::updates::list_rollouts" /opt/gururmm/server/src/api/mod.rs; then
    check "API routes registered in mod.rs"
else
    check "API routes registered in mod.rs"
    FAIL_COUNT=$((FAIL_COUNT + 1))
fi

echo ""

# ===== Phase 5: Dashboard UI =====
echo "Phase 5: Dashboard UI"
echo "---------------------"

if [ -f "/opt/gururmm/dashboard/src/pages/Updates.tsx" ]; then
    check "Updates.tsx page exists"

    # Check for key components
    if grep -q "RolloutInfo" /opt/gururmm/dashboard/src/pages/Updates.tsx; then
        check "RolloutInfo interface defined"
    else
        check "RolloutInfo interface defined"
        FAIL_COUNT=$((FAIL_COUNT + 1))
    fi

    if grep -q "handlePromote" /opt/gururmm/dashboard/src/pages/Updates.tsx; then
        check "Promote functionality implemented"
    else
        check "Promote functionality implemented"
        FAIL_COUNT=$((FAIL_COUNT + 1))
    fi

    if grep -q "handleRollback" /opt/gururmm/dashboard/src/pages/Updates.tsx; then
        check "Rollback functionality implemented"
    else
        check "Rollback functionality implemented"
        FAIL_COUNT=$((FAIL_COUNT + 1))
    fi
else
    check "Updates.tsx page exists"
    FAIL_COUNT=$((FAIL_COUNT + 1))
fi

# Check navigation
if grep -q "/updates" /opt/gururmm/dashboard/src/App.tsx; then
    check "Updates route registered in App.tsx"
else
    check "Updates route registered in App.tsx"
    FAIL_COUNT=$((FAIL_COUNT + 1))
fi

if grep -q "updates" /opt/gururmm/dashboard/src/components/Layout.tsx; then
    check "Updates navigation link added"
else
    check "Updates navigation link added"
    FAIL_COUNT=$((FAIL_COUNT + 1))
fi

echo ""

# ===== Build Status =====
echo "Build Status"
echo "------------"

# Check server binary
if [ -f "/opt/gururmm/gururmm-server" ]; then
    # Try BSD stat syntax first, then fall back to GNU stat (used on Linux)
    SERVER_SIZE=$(stat -f%z "/opt/gururmm/gururmm-server" 2>/dev/null || stat -c%s "/opt/gururmm/gururmm-server" 2>/dev/null)
    SERVER_DATE=$(stat -f%Sm "/opt/gururmm/gururmm-server" 2>/dev/null || stat -c%y "/opt/gururmm/gururmm-server" 2>/dev/null | cut -d' ' -f1)
    check "Server binary exists (${SERVER_SIZE} bytes, ${SERVER_DATE})"
else
    check "Server binary exists"
    FAIL_COUNT=$((FAIL_COUNT + 1))
fi

# Check dashboard build
if [ -d "/opt/gururmm/dashboard/dist" ]; then
    check "Dashboard build exists"
else
    echo -e "${YELLOW}[WARN]${NC} Dashboard dist/ directory not found - may need to run 'npm run build'"
fi

echo ""

# ===== Summary =====
echo "=========================================="
echo "Verification Summary"
echo "=========================================="

if [ "$FAIL_COUNT" -eq 0 ]; then
    echo -e "${GREEN}✓ All checks passed!${NC}"
    echo ""
    echo "Safe Agent Rollout System is ready for Phase 6 testing."
    echo ""
    echo "Next steps:"
    echo " 1. Review PHASE_6_TEST_PLAN.md"
    echo " 2. Execute Test 1: Beta-first build workflow"
    echo " 3. Execute Test 2-4: Health monitoring, promotion, rollback"
    echo " 4. Execute Test 5: Dashboard UI testing"
    echo " 5. Execute Test 6: Integration testing"
    exit 0
else
    echo -e "${RED}✗ ${FAIL_COUNT} check(s) failed${NC}"
    echo ""
    echo "Review failures above and fix before proceeding to Phase 6."
    echo ""
    echo "Common issues:"
    echo " - Code not deployed (git pull + rebuild needed)"
    echo " - Migration not applied (run migration 046)"
    echo " - Service not restarted (systemctl restart gururmm-server)"
    echo " - Build not triggered (no .channel files yet)"
    exit 1
fi