sync: auto-sync from Mikes-MacBook-Air.local at 2026-05-25 13:53:11
Author: Mike Swanson Machine: Mikes-MacBook-Air.local Timestamp: 2026-05-25 13:53:11
# Updates Page - User Guide

## Overview

The Updates page provides a centralized dashboard for managing agent version rollouts across your GuruRMM infrastructure. It shows real-time health metrics for each version and lets you promote a version safely or roll it back in an emergency.

## Accessing the Page

1. Log in to the GuruRMM dashboard
2. Navigate to **Config > Updates** in the sidebar
3. Alternatively, open `https://rmm.azcomputerguru.com/updates` directly
## Understanding the Table

### Columns

#### Version

- Displays the agent version number (e.g., 0.6.27)
- Shown in a monospace font for readability
- Sorted from newest to oldest

#### OS / Arch

- Operating system and CPU architecture
- Examples: `windows / x86_64`, `linux / aarch64`
- Each OS/arch combination is tracked separately

#### Channel

- **Beta** (blue badge): Testing channel with a limited set of agents
- **Stable** (purple badge): Production channel for the broader agent fleet

#### Health Status

- **Healthy** (green): All metrics within safe thresholds
- **Warning** (yellow): One or more metrics approaching safety thresholds
- **Critical** (red): One or more metrics exceed safety thresholds
- **Blocked** (dark red): Version blocked from promotion
- **Unknown** (gray): No health data yet

#### Success Rate

- Percentage of successful update attempts
- Color-coded:
  - Green: >= 95% (excellent)
  - Yellow: >= 80% and < 95% (acceptable)
  - Red: < 80% (concerning)
- Shown with the underlying counts, e.g., "96% (48/50)"
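The color bands above map directly onto a small helper. This is a minimal sketch, not the dashboard's actual code; the function names and the gray / "n/a" handling for versions with zero attempts are assumptions.

```python
def success_rate_color(successes: int, attempts: int) -> str:
    """Return the badge color: green >= 95%, yellow >= 80%, red < 80%."""
    if attempts == 0:
        return "gray"  # assumption: no attempts yet, shown like the Unknown badge
    # Integer comparison: successes / attempts >= 0.95  <=>  100 * successes >= 95 * attempts
    if 100 * successes >= 95 * attempts:
        return "green"
    if 100 * successes >= 80 * attempts:
        return "yellow"
    return "red"


def format_success_rate(successes: int, attempts: int) -> str:
    """Render the cell text, e.g. "96% (48/50)"."""
    if attempts == 0:
        return "n/a (0/0)"  # assumption: hypothetical placeholder text
    percent = round(100 * successes / attempts)
    return f"{percent}% ({successes}/{attempts})"


print(format_success_rate(48, 50), success_rate_color(48, 50))  # 96% (48/50) green
```

Rounding is applied only to the displayed percentage; the color uses exact integer math, so a rate such as 94.6% displays as "95%" but keeps a yellow badge.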
#### Beta Agents

- Number of agents in the beta channel currently running this version
- Updates with each table refresh (every 30 seconds)

#### Stable Agents

- Number of agents in the stable channel currently running this version
- Updates with each table refresh (every 30 seconds)

#### Actions

- **Promote** (up arrow): Move a beta version to the stable channel
- **Rollback** (rotate arrow): Force-downgrade all agents running this version
## Promoting a Version to Stable

### When to Promote

Promote a beta version when:

- Health status is "Healthy" (green)
- Success rate is >= 95%
- Beta agents have run it long enough to surface problems (the testing flow under Best Practices suggests 24-48 hours)
- No critical issues have been reported
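The criteria above can be checked mechanically. The sketch below assumes a hypothetical `BetaVersion` record and treats "long enough" as 24 hours, the lower end of the 24-48 hour window in the Best Practices testing flow. Reported critical issues are modeled as a simple counter, since the guide does not say how issues are tracked.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Assumption: 24 hours, the low end of the 24-48 hour testing window.
MIN_BETA_SOAK = timedelta(hours=24)


@dataclass
class BetaVersion:
    # Hypothetical shape of one row on the Updates page.
    version: str
    health: str            # "Healthy", "Warning", "Critical", "Blocked", "Unknown"
    successes: int
    attempts: int
    beta_since: datetime   # when the version was first deployed to beta
    open_critical_issues: int = 0


def promotion_blockers(v: BetaVersion, now: datetime) -> list[str]:
    """Return the unmet "When to Promote" criteria; an empty list means ready."""
    blockers = []
    if v.health != "Healthy":
        blockers.append(f"health is {v.health}, not Healthy")
    # Integer comparison avoids float rounding at exactly 95%.
    if v.attempts == 0 or v.successes * 100 < 95 * v.attempts:
        blockers.append("success rate below 95%")
    if now - v.beta_since < MIN_BETA_SOAK:
        blockers.append("beta soak time under 24 hours")
    if v.open_critical_issues:
        blockers.append(f"{v.open_critical_issues} critical issue(s) open")
    return blockers


now = datetime(2026, 5, 25, 12, 0, tzinfo=timezone.utc)
candidate = BetaVersion("0.6.28", "Healthy", successes=15, attempts=15,
                        beta_since=now - timedelta(hours=30))
print(promotion_blockers(candidate, now))  # []
```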
### How to Promote

1. Find the beta version you want to promote
2. Click the **Promote** button (up arrow)
3. Review the confirmation dialog
4. Click **Promote** to confirm

### If Health Check Fails

If the automatic health check fails (e.g., the crash rate is too high):

1. A warning dialog explains the issue
2. A **Force Promote** option appears
3. Review the warning carefully
4. Force promote only if you understand the risks
5. Consider investigating the health issues first
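The promote flow above, including the Force Promote override, can be sketched as follows. The `Rollout` model, `HealthCheckFailed` exception, and `promote()` function are hypothetical, not GuruRMM's API. One assumption is worth calling out: the guide says Blocked versions are blocked from promotion but does not say whether Force Promote overrides that, so this sketch refuses them either way.

```python
from dataclasses import dataclass


class HealthCheckFailed(Exception):
    """Raised when promotion is refused and the operator must decide on Force Promote."""


@dataclass
class Rollout:
    # Hypothetical model; field names are illustrative, not the GuruRMM schema.
    version: str
    channel: str   # "beta" or "stable"
    health: str    # "Healthy", "Warning", "Critical", "Blocked", "Unknown"


def promote(rollout: Rollout, force: bool = False) -> str:
    """Promote a beta version, requiring force=True when health is not Healthy."""
    if rollout.channel != "beta":
        raise ValueError(f"{rollout.version} is not on beta; only beta versions can be promoted")
    if rollout.health == "Blocked":
        # Assumption: Blocked cannot be overridden, even with Force Promote.
        raise HealthCheckFailed(f"{rollout.version} is blocked from promotion")
    if rollout.health != "Healthy" and not force:
        raise HealthCheckFailed(
            f"{rollout.version} health is {rollout.health}; confirm with Force Promote"
        )
    rollout.channel = "stable"
    return f"Version {rollout.version} promoted to stable"


candidate = Rollout("0.6.28", "beta", "Warning")
try:
    print(promote(candidate))
except HealthCheckFailed as err:
    print("Warning dialog:", err)
    # The operator reviewed the warning and chose Force Promote.
    print(promote(candidate, force=True))
```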
### After Promotion

- A success toast shows: "Version X.Y.Z promoted to stable"
- The table refreshes automatically
- All agents on the stable channel will update to this version
- Beta agents stay on the beta channel and will receive the next beta release

## Rolling Back a Version

### When to Roll Back

Roll back a version when:

- A critical bug is discovered after promotion
- Unexpected behavior appears in production
- A security vulnerability is found
- Performance regressions cause problems

### How to Roll Back

1. Find the version causing issues
2. Click the **Rollback** button (rotate arrow)
3. Enter a **required reason** in the dialog
   - Example: "Critical memory leak causing crashes"
   - This reason is logged for audit purposes
4. Review the warning: "This will force-downgrade all agents"
5. Click **Rollback** to confirm
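A rollback needs a non-empty reason, which is then logged for auditing. A minimal sketch of that validation, assuming a hypothetical `RollbackEvent` record whose fields may not match the real `rollout_events` columns, and assuming a whitespace-only reason counts as empty:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class RollbackEvent:
    # Hypothetical audit record; the real rollout_events columns may differ.
    version: str
    user_id: str
    reason: str
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def request_rollback(version: str, user_id: str, reason: str) -> RollbackEvent:
    """Validate the required reason and build the audit record to be logged."""
    cleaned = reason.strip()
    if not cleaned:
        # The dialog requires a reason; whitespace-only input is treated as missing.
        raise ValueError("A rollback reason is required")
    return RollbackEvent(version=version, user_id=user_id, reason=cleaned)


event = request_rollback("0.6.28", "user-42", "Critical memory leak causing crashes")
print(event.version, event.reason)
```

The user ID `user-42` is a placeholder for illustration.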
### After Rollback

- A success toast shows: "Version X.Y.Z rolled back. N agents downgraded"
- All agents on this version begin downgrading immediately
- The previous stable version becomes active again
- The rollback is logged in the database

## Understanding Health Metrics

### What Gets Tracked

- **Total Attempts**: Number of agents that attempted to update
- **Success Count**: Updates that completed successfully
- **Failure Count**: Updates that failed (e.g., network or download errors)
- **Crash Count**: Agents that crashed after updating

### Health Status Calculation

The system automatically calculates health based on:

- Success rate (`success_count / total_attempts`)
- Crash rate (`crash_count / total_attempts`)
- Failure patterns over time
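The two ratios above translate directly into code. This sketch assumes failure rate is `failure_count / total_attempts` (the guide lists failure-rate thresholds but gives no formula) and returns `None` when nothing has been attempted, matching the Unknown status. The time-based "failure patterns" input is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UpdateMetrics:
    total_attempts: int
    success_count: int
    failure_count: int
    crash_count: int


@dataclass
class Rates:
    success_rate: float
    crash_rate: float
    failure_rate: float


def compute_rates(m: UpdateMetrics) -> Optional[Rates]:
    """Derive the ratios used for health status; None means no data (Unknown)."""
    if m.total_attempts == 0:
        return None  # avoid dividing by zero before any agent has tried to update
    n = m.total_attempts
    return Rates(
        success_rate=m.success_count / n,
        crash_rate=m.crash_count / n,
        failure_rate=m.failure_count / n,  # assumption: same denominator as the others
    )


print(compute_rates(UpdateMetrics(total_attempts=50, success_count=46,
                                  failure_count=4, crash_count=0)))
```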
### Thresholds (from Phase 4)

- **Healthy**:
  - Success rate >= 95%
  - Crash rate < 1%
  - Failure rate < 5%
- **Warning**:
  - Success rate >= 90% and < 95%
  - Crash rate >= 1% and < 2%
  - Failure rate >= 5% and <= 10%
- **Critical**:
  - Success rate < 90%
  - Crash rate >= 2%
  - Failure rate > 10%
- **Blocked**:
  - Crash rate >= 5% (auto-blocked from promotion)
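Applying the thresholds above requires a rule for combining metrics. The guide does not state one, so this sketch assumes the worst metric wins, checked in the order Blocked, Critical, Warning, Healthy. Comparisons use integer counts so exact boundaries (such as 95% or 2%) are not affected by floating-point rounding. Note that these health cutoffs differ from the Success Rate column's color bands: an 85% success rate gets a yellow badge in that column but, by itself, makes the version Critical.

```python
def _at_least(count: int, total: int, percent: int) -> bool:
    """True when count / total >= percent%, using exact integer arithmetic."""
    return 100 * count >= percent * total


def _above(count: int, total: int, percent: int) -> bool:
    """True when count / total > percent%, using exact integer arithmetic."""
    return 100 * count > percent * total


def health_status(total: int, successes: int, failures: int, crashes: int) -> str:
    """Classify a version with the Phase 4 thresholds; the worst metric wins (assumption)."""
    if total == 0:
        return "Unknown"
    if _at_least(crashes, total, 5):
        return "Blocked"
    if (not _at_least(successes, total, 90)
            or _at_least(crashes, total, 2)
            or _above(failures, total, 10)):
        return "Critical"
    if (not _at_least(successes, total, 95)
            or _at_least(crashes, total, 1)
            or _at_least(failures, total, 5)):
        return "Warning"
    return "Healthy"


print(health_status(total=50, successes=46, failures=4, crashes=0))  # Warning
```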
## Auto-Refresh

### Behavior

- The table refreshes automatically every 30 seconds
- The **Refresh** button triggers a manual refresh
- Auto-refresh doesn't interrupt open dialogs or in-progress actions
- A loading indicator appears during a refresh
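One way to implement the behavior above is to let the 30-second timer skip a tick while a dialog is open, an action is pending, or a fetch is already running. This is a sketch under that assumption; `RefreshState` and `should_auto_refresh()` are hypothetical names, and times are plain seconds from a monotonic clock.

```python
from dataclasses import dataclass
from typing import Optional

REFRESH_INTERVAL_S = 30.0  # from the guide: the table refreshes every 30 seconds


@dataclass
class RefreshState:
    last_refresh: Optional[float] = None  # monotonic seconds of the last fetch
    dialog_open: bool = False             # a promote/rollback dialog is showing
    action_pending: bool = False          # a promote/rollback request is in flight
    loading: bool = False                 # a fetch is already running


def should_auto_refresh(state: RefreshState, now: float) -> bool:
    """Decide whether the timer should fetch fresh data at time `now`."""
    if state.loading or state.dialog_open or state.action_pending:
        return False  # defer rather than interrupt the user
    if state.last_refresh is None:
        return True   # first load
    return now - state.last_refresh >= REFRESH_INTERVAL_S


state = RefreshState(last_refresh=0.0)
print(should_auto_refresh(state, 31.0))  # True
state.dialog_open = True
print(should_auto_refresh(state, 31.0))  # False: wait until the dialog closes
```

In this sketch the manual **Refresh** button is assumed to fetch directly, bypassing the check.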
### Why Auto-Refresh?

- See real-time health status changes
- Monitor promotion rollout progress
- Catch issues as they emerge
- No need to reload the page manually

## Best Practices

### Testing Flow

1. **Deploy to Beta**: Build and deploy the new version to the beta channel
2. **Monitor Health**: Watch the Updates page for 24-48 hours
3. **Check Success Rate**: Ensure it is >= 95%
4. **Review Logs**: Look for any errors or warnings
5. **Promote**: Once healthy, promote to stable
6. **Monitor Rollout**: Watch stable agents update
7. **Be Ready**: Watch the metrics closely for the first hour

### Emergency Response

1. **Notice Issue**: A Critical health status appears, or users report problems
2. **Assess Impact**: Check how many agents are affected
3. **Rollback**: Use the **Rollback** button immediately
4. **Document**: Provide a clear reason in the rollback dialog
5. **Investigate**: Review logs and crash reports
6. **Fix**: Prepare a hotfix version
7. **Re-test**: Deploy to beta first; never skip testing

### Version Naming

- Use semantic versioning: MAJOR.MINOR.PATCH
- Example: from 0.6.27, the next patch is 0.6.28, the next minor is 0.7.0, and the next major is 1.0.0
- Consistent versioning helps with sorting and tracking
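Semantic versions must be compared numerically, part by part; plain string sorting misorders them. A minimal sketch that handles plain MAJOR.MINOR.PATCH strings only (pre-release suffixes such as `-beta.1` are not handled, and `0.6.9` is an illustrative version):

```python
def version_key(version: str) -> tuple[int, int, int]:
    """Parse MAJOR.MINOR.PATCH into integers so versions compare numerically."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


versions = ["0.6.9", "0.6.27", "1.0.0", "0.7.0", "0.6.28"]

# Plain string sorting ranks "0.6.9" above "0.6.28" because "9" > "2".
print(sorted(versions, reverse=True))
# ['1.0.0', '0.7.0', '0.6.9', '0.6.28', '0.6.27']

# Numeric sorting gives the intended newest-to-oldest order.
print(sorted(versions, key=version_key, reverse=True))
# ['1.0.0', '0.7.0', '0.6.28', '0.6.27', '0.6.9']
```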
## Common Scenarios

### Scenario 1: Normal Promotion

```
1. New version 0.6.28 deployed to beta
2. 15 beta agents update successfully (100% success rate)
3. Health shows "Healthy" (green)
4. After 24 hours, promote to stable
5. 230 stable agents begin updating
6. Monitor for issues during rollout
```

### Scenario 2: Warning Status

```
1. Beta version shows "Warning" (yellow)
2. Success rate is 92% (46/50 agents)
3. Investigate: 4 agents failed due to network timeouts
4. Decision: Wait longer or fix the issue before promoting
5. Do not promote until "Healthy"
```
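Checking Scenario 2's numbers against the thresholds under Understanding Health Metrics, assuming failure rate is failures divided by total attempts, that none of the 50 agents crashed, and that the worst metric determines the overall status:

$$
\begin{aligned}
\text{success rate} &= \tfrac{46}{50} = 92\% && 90\% \le 92\% < 95\% \;\Rightarrow\; \text{Warning} \\
\text{failure rate} &= \tfrac{4}{50} = 8\% && 5\% \le 8\% \le 10\% \;\Rightarrow\; \text{Warning} \\
\text{crash rate} &= \tfrac{0}{50} = 0\% && 0\% < 1\% \;\Rightarrow\; \text{Healthy}
\end{aligned}
$$

Both the success rate and the failure rate fall in the Warning band, which matches the yellow status in the scenario.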
### Scenario 3: Emergency Rollback

```
1. Version 0.6.28 promoted to stable
2. After 30 minutes, users report crashes
3. Updates page shows "Critical" status
4. Crash count increasing
5. Immediately roll back with reason: "Crashes on startup"
6. Agents downgrade within minutes
7. Investigate crash dumps and fix the issue
```

### Scenario 4: Force Promotion

```
1. Beta version shows "Warning" but the issue is minor
2. Attempt to promote
3. System blocks due to health check
4. Review the specific warning
5. If acceptable (e.g., known cosmetic issue), force promote
6. Monitor closely after force promotion
```
## Troubleshooting

### Table Shows "No rollouts yet"

- No versions have been deployed to beta or stable
- Build and deploy a version to see it appear
- Check server logs for build issues

### Promote Button Disabled

- The version is not on the beta channel (only beta versions can be promoted)
- Health status is not "Healthy"
- Hover over the button to see a tooltip explaining why

### Auto-Refresh Stopped

- Check your network connection
- Check the browser console for errors
- Try the manual **Refresh** button
- Reload the page if issues persist

### Action Failed with Error

- Check network connectivity
- Verify you're still logged in
- Check server status
- Review the error message in the toast notification
## Security & Permissions

### Who Can Promote or Roll Back?

- Currently: All authenticated users
- Future: May be restricted to the admin role
- All actions are logged with user attribution

### Audit Trail

- All promotions are logged in the database
- All rollbacks are logged with their reason
- Timestamps and user IDs are recorded
- Audit records are stored in the `rollout_events` table
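To review the audit trail directly, a query along these lines could work. The guide only names the `rollout_events` table; every column below is an assumption, the sample rows are made up, and the sketch uses an in-memory SQLite database purely so it runs standalone (the production database may differ).

```python
import sqlite3

# Hypothetical schema: only the table name comes from the guide.
SCHEMA = """
CREATE TABLE rollout_events (
    id INTEGER PRIMARY KEY,
    event_type TEXT NOT NULL,      -- e.g. 'promote' or 'rollback'
    version TEXT NOT NULL,
    user_id TEXT NOT NULL,
    reason TEXT,                   -- required for rollbacks
    created_at TEXT NOT NULL       -- ISO-8601 UTC timestamp
)
"""


def rollbacks_for_version(conn: sqlite3.Connection, version: str) -> list[tuple]:
    """Return (created_at, user_id, reason) for every rollback of one version."""
    cur = conn.execute(
        "SELECT created_at, user_id, reason FROM rollout_events "
        "WHERE event_type = 'rollback' AND version = ? ORDER BY created_at",
        (version,),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO rollout_events (event_type, version, user_id, reason, created_at) "
    "VALUES (?, ?, ?, ?, ?)",
    [  # sample rows for illustration only
        ("promote", "0.6.28", "user-42", None, "2026-05-25T10:00:00Z"),
        ("rollback", "0.6.28", "user-42", "Crashes on startup", "2026-05-25T10:30:00Z"),
    ],
)
print(rollbacks_for_version(conn, "0.6.28"))
```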
## Performance

### Page Load Time

- Initial load: < 2 seconds
- Auto-refresh: < 500ms
- Action response: < 1 second

### Mobile Support

- Fully responsive design
- Table scrolls horizontally on small screens
- Dialogs adapt to mobile viewports
- All actions work on touch devices

## Related Pages

### Agent Detail Page

- Shows the current version of an individual agent
- Links to the Updates page for version info
- Displays the agent's update channel (beta/stable)

### Policies Page

- Configure auto-update policies
- Set the update channel per client, site, or agent
- Control update timing and update windows

### Logs Page

- View update-related log entries
- Filter by "update" or "rollout" keywords
- See detailed failure messages

## Support

### Need Help?

- Check server logs via the `/api/logs` endpoint
- Review agent logs on affected machines
- Contact support with the version number and error details
- Include rollout health metrics in your report

### Report Issues

- Use the rollback feature to mitigate first
- Document reproduction steps
- Include the success rate and crash count
- Provide the IDs of a sample of affected agents

---

**Version**: 1.0

**Last Updated**: 2026-05-25

**Part of**: Safe Agent Rollout System (Phase 5)