claudetools/UPDATES_PAGE_USER_GUIDE.md
Mike Swanson 355c4acbc9 sync: auto-sync from Mikes-MacBook-Air.local at 2026-05-25 13:53:11
2026-05-25 13:53:13 -07:00

# Updates Page - User Guide
## Overview
The Updates page provides a centralized dashboard for managing agent version rollouts across your GuruRMM infrastructure. It shows real-time health metrics and enables safe promotion or emergency rollback of agent versions.
## Accessing the Page
1. Log in to the GuruRMM dashboard
2. Navigate to **Config > Updates** in the sidebar
3. Or visit: `https://rmm.azcomputerguru.com/updates`
## Understanding the Table
### Columns
#### Version
- Displays the agent version number (e.g., 0.6.27)
- Shown in monospace font for clarity
- Sorted newest to oldest
#### OS / Arch
- Operating system and architecture
- Examples: `windows / x86_64`, `linux / aarch64`
- Each OS/arch combination is tracked separately
#### Channel
- **Beta** (blue badge): Testing channel with limited agents
- **Stable** (purple badge): Production channel with all agents
#### Health Status
- **Healthy** (green): All metrics within safe thresholds
- **Warning** (yellow): Some metrics approaching thresholds
- **Critical** (red): Metrics exceed safety thresholds
- **Blocked** (dark red): Version blocked from promotion
- **Unknown** (gray): No health data yet
#### Success Rate
- Percentage of successful update attempts
- Color coded:
  - Green: >= 95% (excellent)
  - Yellow: >= 80% (acceptable)
  - Red: < 80% (concerning)
- Shows fraction: e.g., "96% (48/50)"
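The color bands and the fraction display can be sketched as follows. This is a minimal illustration in Python, not the dashboard's actual code: the function names, the gray fallback and `n/a` text for zero attempts, and rounding the percentage down are all assumptions.

```python
def success_rate_color(success_count: int, total_attempts: int) -> str:
    """Map a success rate to the color bands listed in this guide."""
    if total_attempts <= 0:
        return "gray"  # assumption: nothing to color before the first attempt
    # Compare in integers to avoid floating-point edge cases at 95% and 80%.
    if success_count * 100 >= 95 * total_attempts:
        return "green"
    if success_count * 100 >= 80 * total_attempts:
        return "yellow"
    return "red"


def format_success_rate(success_count: int, total_attempts: int) -> str:
    """Render the 'percent (successes/attempts)' text shown in the column."""
    if total_attempts <= 0:
        return "n/a"  # assumption: placeholder text when there is no data
    percent = success_count * 100 // total_attempts  # assumption: round down
    return f"{percent}% ({success_count}/{total_attempts})"


print(format_success_rate(48, 50), success_rate_color(48, 50))
# prints: 96% (48/50) green
```

Rounding down keeps the displayed number consistent with the color: a rate of 94.6% shows as 94% and stays yellow rather than displaying 95% next to a yellow badge.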
#### Beta Agents
- Number of agents currently on this version in beta channel
- Updates in real time
#### Stable Agents
- Number of agents currently on this version in stable channel
- Updates in real time
#### Actions
- **Promote** (up arrow): Move beta version to stable
- **Rollback** (rotate arrow): Force-downgrade all agents on this version
## Promoting a Version to Stable
### When to Promote
Promote a beta version when:
- Health status is "Healthy" (green)
- Success rate is >= 95%
- Beta agents have been running it long enough to surface problems (the Best Practices section suggests monitoring for 24-48 hours)
- No critical issues reported
### How to Promote
1. Find the beta version you want to promote
2. Click the **Promote** button (up arrow)
3. Review the confirmation dialog
4. Click **Promote** to confirm
### If Health Check Fails
If the automatic health check fails (e.g., crash rate too high):
1. You'll see a warning dialog explaining the issue
2. Option to **Force Promote** appears
3. Review the warning carefully
4. Only force promote if you understand the risks
5. Consider investigating the health issues first
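The promotion flow, including the Force Promote path, can be summarized as a small decision function. This is a hedged sketch only: `decide_promotion` and its return strings are hypothetical, and the guide does not say whether a "Blocked" version can be force-promoted, so refusing it here is an assumption.

```python
def decide_promotion(channel: str, health: str, force: bool = False) -> str:
    """Return the outcome of a promote request (illustrative, not the real API)."""
    if channel != "beta":
        # From this guide: only beta versions can be promoted.
        return "rejected: only beta versions can be promoted"
    if health == "healthy":
        return "promoted"
    if health == "blocked":
        # Assumption: a blocked version cannot be promoted, even with force.
        return "rejected: version is blocked from promotion"
    if force:
        # The operator reviewed the warning and chose Force Promote.
        return "force-promoted"
    # Health check failed: show the warning dialog and offer Force Promote.
    return "needs confirmation: health check failed"


print(decide_promotion("beta", "warning"))
print(decide_promotion("beta", "warning", force=True))
```

Keeping the force flag as an explicit second step mirrors the dialog flow: the first request surfaces the warning, and only a deliberate second request overrides it.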
### After Promotion
- Success toast shows: "Version X.Y.Z promoted to stable"
- Table refreshes automatically
- All agents on "stable" channel will update to this version
- Beta agents remain on beta (they'll get the next beta)
## Rolling Back a Version
### When to Roll Back
Roll back a version when:
- Critical bug discovered after promotion
- Unexpected behavior in production
- Security vulnerability found
- Performance issues causing problems
### How to Roll Back
1. Find the version causing issues
2. Click the **Rollback** button (rotate arrow)
3. Enter a **required reason** in the dialog
   - Example: "Critical memory leak causing crashes"
   - This reason is logged for audit purposes
4. Review the warning: "This will force-downgrade all agents"
5. Click **Rollback** to confirm
### After Rollback
- Success toast shows: "Version X.Y.Z rolled back. N agents downgraded"
- All agents on this version will downgrade immediately
- Previous stable version becomes active again
- Rollback is logged in the database
## Understanding Health Metrics
### What Gets Tracked
- **Total Attempts**: Number of agents that attempted to update
- **Success Count**: Updates that completed successfully
- **Failure Count**: Updates that failed (network, download, etc.)
- **Crash Count**: Agents that crashed after updating
### Health Status Calculation
The system automatically calculates health based on:
- Success rate (success_count / total_attempts)
- Crash rate (crash_count / total_attempts)
- Failure patterns over time
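The two ratios above can be computed directly from the tracked counters. The sketch below assumes the counters are available as plain integers; `RolloutCounts` and `compute_rates` are hypothetical names, and the failure-rate formula (failure_count / total_attempts) is an assumption made by analogy with the two formulas the guide states.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class RolloutCounts:
    """The four counters listed under 'What Gets Tracked'."""
    total_attempts: int
    success_count: int
    failure_count: int
    crash_count: int


def compute_rates(counts: RolloutCounts) -> Optional[Dict[str, float]]:
    """Return each rate as a fraction of total attempts, or None without data."""
    if counts.total_attempts == 0:
        return None  # no attempts yet, so no rates can be computed
    total = counts.total_attempts
    return {
        "success_rate": counts.success_count / total,
        "crash_rate": counts.crash_count / total,
        # Assumption: failure rate uses the same denominator as the others.
        "failure_rate": counts.failure_count / total,
    }


# Example: 50 attempts, 46 successes, 4 failures, no crashes.
print(compute_rates(RolloutCounts(50, 46, 4, 0)))
```

Returning `None` for a version with no attempts leaves it to the caller to show the "Unknown" status instead of dividing by zero.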
### Thresholds (from Phase 4)
- **Healthy**:
  - Success rate >= 95%
  - Crash rate < 1%
  - Failure rate < 5%
- **Warning**:
  - Success rate >= 90% and < 95%
  - Crash rate >= 1% and < 2%
  - Failure rate 5-10%
- **Critical**:
  - Success rate < 90%
  - Crash rate >= 2%
  - Failure rate > 10%
- **Blocked**:
  - Crash rate >= 5% (auto-blocked from promotion)
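One way to read these thresholds is "the worst metric wins". The sketch below encodes that reading. It is an assumption about how the metrics combine, not the server's confirmed logic, and `classify_health` is a hypothetical name. Inputs are percentages.

```python
def classify_health(success_pct: float, crash_pct: float, failure_pct: float) -> str:
    """Classify a rollout using the thresholds in this guide (worst metric wins)."""
    if crash_pct >= 5:
        return "blocked"  # auto-blocked from promotion
    if success_pct < 90 or crash_pct >= 2 or failure_pct > 10:
        return "critical"  # any single metric in the critical band
    if success_pct >= 95 and crash_pct < 1 and failure_pct < 5:
        return "healthy"  # every metric within the healthy band
    return "warning"  # everything in between


print(classify_health(96, 0, 4))   # healthy
print(classify_health(92, 0, 8))   # warning
print(classify_health(96, 6, 0))   # blocked
```

Checking "blocked" first matters because a crash rate of 5% or more also satisfies the critical condition, and the more severe status should take precedence.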
## Auto-Refresh
### Behavior
- Table refreshes every 30 seconds automatically
- Manual refresh available with **Refresh** button
- Auto-refresh doesn't interrupt dialogs or actions
- Loading indicator shows during refresh
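The refresh behavior can be expressed as a single predicate. This sketch reads "doesn't interrupt dialogs or actions" as "the refresh is deferred while a dialog is open or an action is running", which is an assumption; the function and parameter names are hypothetical, and the real page runs in the browser rather than in Python.

```python
REFRESH_INTERVAL_SECONDS = 30  # from this guide: the table refreshes every 30 seconds


def should_auto_refresh(
    now: float,
    last_refresh: float,
    dialog_open: bool,
    action_in_progress: bool,
) -> bool:
    """Decide whether the table should refresh at time `now` (in seconds)."""
    if dialog_open or action_in_progress:
        # Assumption: defer the refresh instead of disturbing the user.
        return False
    return now - last_refresh >= REFRESH_INTERVAL_SECONDS


print(should_auto_refresh(now=100.0, last_refresh=60.0,
                          dialog_open=False, action_in_progress=False))  # True
print(should_auto_refresh(now=100.0, last_refresh=60.0,
                          dialog_open=True, action_in_progress=False))   # False
```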
### Why Auto-Refresh?
- See real-time health status changes
- Monitor promotion rollout progress
- Catch issues as they emerge
- No need to manually reload
## Best Practices
### Testing Flow
1. **Deploy to Beta**: Build and deploy new version
2. **Monitor Health**: Watch Updates page for 24-48 hours
3. **Check Success Rate**: Ensure >= 95% success
4. **Review Logs**: Look for any errors or warnings
5. **Promote**: Once healthy, promote to stable
6. **Monitor Rollout**: Watch stable agents update
7. **Be Ready**: Keep an eye on metrics for the first hour
### Emergency Response
1. **Notice Issue**: See a critical health status or receive user reports
2. **Assess Impact**: Check how many agents are affected
3. **Roll Back**: Use the **Rollback** button immediately
4. **Document**: Provide clear reason in rollback dialog
5. **Investigate**: Review logs and crash reports
6. **Fix**: Prepare hotfix version
7. **Re-test**: Deploy to beta first, never skip testing
### Version Naming
- Use semantic versioning: MAJOR.MINOR.PATCH
- Example: 0.6.27 → 0.6.28 (patch), 0.7.0 (minor), 1.0.0 (major)
- Consistent versioning helps with sorting and tracking
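To illustrate why consistent MAJOR.MINOR.PATCH versions help with sorting: comparing versions as plain strings orders them incorrectly once a component reaches two digits, while comparing them as integer tuples does not. The sketch below is illustrative only. The helper names are hypothetical, and it assumes versions are exactly three dot-separated integers with no pre-release suffix.

```python
from typing import List, Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def sort_newest_first(versions: List[str]) -> List[str]:
    """Sort versions newest to oldest, as the Version column does."""
    return sorted(versions, key=parse_version, reverse=True)


versions = ["0.6.27", "0.6.9", "0.7.0", "1.0.0", "0.6.28"]
print(sort_newest_first(versions))
# ['1.0.0', '0.7.0', '0.6.28', '0.6.27', '0.6.9']

# A plain string sort would wrongly place 0.6.9 above 0.6.28:
print(sorted(versions, reverse=True))
# ['1.0.0', '0.7.0', '0.6.9', '0.6.28', '0.6.27']
```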
## Common Scenarios
### Scenario 1: Normal Promotion
```
1. New version 0.6.28 deployed to beta
2. 15 beta agents update successfully (100% success rate)
3. Health shows "Healthy" (green)
4. After 24 hours, promote to stable
5. 230 stable agents begin updating
6. Monitor for issues during rollout
```
### Scenario 2: Warning Status
```
1. Beta version shows "Warning" (yellow)
2. Success rate is 92% (46/50 agents)
3. Investigate: 4 agents failed due to network timeout
4. Decision: Wait longer or fix issue before promoting
5. Do not promote until "Healthy"
```
### Scenario 3: Emergency Rollback
```
1. Version 0.6.28 promoted to stable
2. After 30 minutes, users report crashes
3. Updates page shows "Critical" status
4. Crash count increasing
5. Immediately roll back with reason: "Crashes on startup"
6. Agents downgrade within minutes
7. Investigate crash dumps and fix issue
```
### Scenario 4: Force Promotion
```
1. Beta version shows "Warning" but issue is minor
2. Attempt to promote
3. System blocks due to health check
4. Review the specific warning
5. If acceptable (e.g., known cosmetic issue), force promote
6. Monitor closely after force promotion
```
## Troubleshooting
### Table Shows "No rollouts yet"
- No versions have been deployed to beta or stable
- Build and deploy a version to see it appear
- Check server logs for build issues
### Promote Button Disabled
- The version is not on the beta channel (only beta versions can be promoted)
- The health status is not "Healthy"
- Hover over the button to see a tooltip explaining why
### Auto-Refresh Stopped
- Check network connection
- Check browser console for errors
- Try manual refresh button
- Reload page if issues persist
### Action Failed with Error
- Check network connectivity
- Verify you're still logged in
- Check server status
- Review error message in toast notification
## Security & Permissions
### Who Can Promote/Rollback?
- Currently: All authenticated users
- Future: May be restricted to admin role
- All actions are logged with user attribution
### Audit Trail
- All promotions logged in database
- All rollbacks logged with reason
- Timestamps and user IDs recorded
- Review audit logs: `rollout_events` table
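As a rough idea of what reviewing the audit trail could look like, the sketch below queries a `rollout_events` table for recent rollbacks. Only the table name comes from this guide. The column names (`event_type`, `version`, `reason`, `user_id`, `created_at`), the sample rows, and the use of an in-memory SQLite database are all assumptions made so the example runs on its own; check the real schema before adapting it.

```python
import sqlite3

# Assumption: a stand-in schema; the real rollout_events columns may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE rollout_events (
        id INTEGER PRIMARY KEY,
        event_type TEXT NOT NULL,   -- hypothetical: 'promote' or 'rollback'
        version TEXT NOT NULL,
        reason TEXT,                -- required for rollbacks per this guide
        user_id INTEGER NOT NULL,
        created_at TEXT NOT NULL    -- ISO 8601 timestamp
    )
    """
)

# Made-up sample rows, loosely following Scenario 3 in this guide.
conn.executemany(
    "INSERT INTO rollout_events (event_type, version, reason, user_id, created_at) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("promote", "0.6.28", None, 1, "2026-01-01T10:00:00"),
        ("rollback", "0.6.28", "Crashes on startup", 1, "2026-01-01T10:30:00"),
    ],
)


def recent_rollbacks(connection: sqlite3.Connection, limit: int = 10) -> list:
    """Return the most recent rollback events, newest first."""
    cursor = connection.execute(
        "SELECT version, reason, user_id, created_at FROM rollout_events "
        "WHERE event_type = ? ORDER BY created_at DESC LIMIT ?",
        ("rollback", limit),
    )
    return cursor.fetchall()


for row in recent_rollbacks(conn):
    print(row)
```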
## Performance
### Page Load Time
- Initial load: < 2 seconds
- Auto-refresh: < 500ms
- Action response: < 1 second
### Mobile Support
- Fully responsive design
- Table scrolls horizontally on small screens
- Dialogs adapt to mobile viewport
- All actions work on touch devices
## Related Pages
### Agent Detail Page
- Shows current version for individual agent
- Links to Updates page for version info
- Displays update channel (beta/stable)
### Policies Page
- Configure auto-update policies
- Set update channel per client/site/agent
- Control update timing and windows
### Logs Page
- View update-related log entries
- Filter by "update" or "rollout" keywords
- See detailed failure messages
## Support
### Need Help?
- Check server logs: `/api/logs` endpoint
- Review agent logs on affected machines
- Contact support with version number and error details
- Include rollout health metrics in report
### Report Issues
- Use rollback feature to mitigate first
- Document reproduction steps
- Include success rate and crash count
- Provide sample agent IDs affected
---
**Version**: 1.0
**Last Updated**: 2026-05-25
**Part of**: Safe Agent Rollout System (Phase 5)