@@ -902,11 +902,11 @@ git submodule update -- projects/msp-tools/guru-rmm
Implemented Phases 1-3 of the GuruRMM Safe Agent Update Rollout System to eliminate production risk from auto-deployed updates. The system introduces a beta-first deployment model where all new agent builds default to a beta channel and require manual promotion before reaching stable production clients.
Phase 1 modified the build pipeline on Saturn (172.16.3.30) by adding beta channel marking to both `/opt/gururmm/build-linux.sh` and `/opt/gururmm/build-windows.sh`. After code signing and checksum generation, the scripts now create `.channel` sidecar files containing "beta" for every binary. Triggered test build v0.6.41 successfully created 6 channel files (2 Linux amd64, 4 Windows amd64/arm64/base MSI). The existing scanner already supported reading these files from previous work.
Phase 1 modified the build pipeline on gururmm-build (172.16.3.30) by adding beta channel marking to both `/opt/gururmm/build-linux.sh` and `/opt/gururmm/build-windows.sh`. After code signing and checksum generation, the scripts now create `.channel` sidecar files containing "beta" for every binary. Triggered test build v0.6.41 successfully created 6 channel files (2 Linux amd64, 4 Windows amd64/arm64/base MSI). The existing scanner already supported reading these files from previous work.
Phase 2 created database migration 046_safe_rollout.sql with three new tables: update_rollouts (tracks promotion state per version), update_health_metrics (aggregates success/failure/crash rates), and agent_update_events (detailed timeline with JSONB metadata). Applied migration to PostgreSQL on Saturn with 5 custom indexes for efficient queries. Resolved migration numbering conflict (originally 045, renamed to 046).
Phase 2 created database migration 046_safe_rollout.sql with three new tables: update_rollouts (tracks promotion state per version), update_health_metrics (aggregates success/failure/crash rates), and agent_update_events (detailed timeline with JSONB metadata). Applied migration to PostgreSQL on gururmm-build with 5 custom indexes for efficient queries. Resolved migration numbering conflict (originally 045, renamed to 046).
Phase 3 implemented the health monitoring system with crash detection. Created `server/src/updates/health.rs` (270 lines) containing a background task that runs every 60 seconds to detect agents that go offline within 5 minutes of receiving an update. The system calculates health metrics (crash rate, failure rate) and evaluates status using defined thresholds: critical (>25% crash OR >50% failure), warning (>10% crash OR >25% failure), healthy (100% success, ≥5 attempts, no crashes), unknown (<5 attempts). Integrated event logging into `server/src/ws/mod.rs` at two update dispatch points and spawned the monitor task in `server/src/main.rs`. Successfully compiled on Saturn after resolving Option type handling and tuple destructuring errors. Server binary built cleanly (13 MB, 4m8s build time).
Phase 3 implemented the health monitoring system with crash detection. Created `server/src/updates/health.rs` (270 lines) containing a background task that runs every 60 seconds to detect agents that go offline within 5 minutes of receiving an update. The system calculates health metrics (crash rate, failure rate) and evaluates status using defined thresholds: critical (>25% crash OR >50% failure), warning (>10% crash OR >25% failure), healthy (100% success, ≥5 attempts, no crashes), unknown (<5 attempts). Integrated event logging into `server/src/ws/mod.rs` at two update dispatch points and spawned the monitor task in `server/src/main.rs`. Successfully compiled on gururmm-build after resolving Option type handling and tuple destructuring errors. Server binary built cleanly (13 MB, 4m8s build time).
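The threshold rules above can be sketched as a pair of pure functions. This is a minimal illustration, not the contents of `health.rs`: the names `HealthStatus`, `evaluate_health`, `is_post_update_crash`, and `CRASH_WINDOW` are hypothetical. Three behaviors the log does not specify are assumptions here: the minimum-sample check runs before any rate check, "100% success" is approximated as zero failures and zero crashes, and a rollout with a few failures below the warning thresholds is reported as `Warning`.

```rust
use std::time::{Duration, SystemTime};

/// Window after an update dispatch in which going offline counts as a crash
/// (5 minutes, per the log).
pub const CRASH_WINDOW: Duration = Duration::from_secs(5 * 60);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HealthStatus {
    Critical,
    Warning,
    Healthy,
    Unknown,
}

/// Hypothetical helper: true if the agent went offline within CRASH_WINDOW
/// of the dispatch. Going offline before the dispatch never counts.
pub fn is_post_update_crash(dispatched_at: SystemTime, offline_at: SystemTime) -> bool {
    match offline_at.duration_since(dispatched_at) {
        Ok(elapsed) => elapsed <= CRASH_WINDOW,
        Err(_) => false,
    }
}

/// Hypothetical helper applying the thresholds quoted in the log.
pub fn evaluate_health(total_attempts: i32, failed_updates: i32, crash_count: i32) -> HealthStatus {
    // Assumption: with fewer than 5 attempts the sample is too small to judge,
    // so this check takes precedence over the rate thresholds.
    if total_attempts < 5 {
        return HealthStatus::Unknown;
    }

    let total = f64::from(total_attempts);
    let crash_rate = f64::from(crash_count) / total;
    let failure_rate = f64::from(failed_updates) / total;

    if crash_rate > 0.25 || failure_rate > 0.50 {
        HealthStatus::Critical
    } else if crash_rate > 0.10 || failure_rate > 0.25 {
        HealthStatus::Warning
    } else if failed_updates == 0 && crash_count == 0 {
        HealthStatus::Healthy
    } else {
        // Assumption: the log leaves this case undefined (some failures, but
        // below the warning thresholds); the sketch errs on the cautious side.
        HealthStatus::Warning
    }
}
```

With these definitions, `evaluate_health(10, 0, 3)` yields `Critical` (30% crash rate) and `evaluate_health(10, 0, 0)` yields `Healthy`. Keeping the evaluation free of database access makes the thresholds easy to unit test separately from the 60-second background task.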
Phases 4-6 remain pending: promotion/rollback API endpoints (3 REST endpoints), dashboard UI (Updates.tsx with table view and controls), and end-to-end testing. The foundation is now in place for safe, controlled agent rollouts with automatic crash detection and manual promotion gating.
@@ -924,8 +924,8 @@ Phases 4-6 remain pending: promotion/rollback API endpoints (3 REST endpoints),
- **Option<String> vs String type mismatch**: Database schema has `os_type` as NOT NULL String but `version_to` and `architecture` as nullable. Fixed tuple destructuring by removing os_type from Option check and passing as reference.
- **Option<i32> arithmetic**: Query results return Option<i32> for counter fields. Added `.unwrap_or(0)` before all comparisons and f64 casts.
- **Build script structure changed**: Plan referenced deprecated `/opt/gururmm/build-agents.sh` wrapper. Modified `build-linux.sh` and `build-windows.sh` directly instead.
- **PostgreSQL connection refused**: Tried using 172.16.3.30:5432 but PostgreSQL listens only on localhost. Changed DATABASE_URL to localhost:5432 when running sqlx prepare on Saturn.
- **sqlx offline cache missing**: New queries in health.rs not in `.sqlx/` cache. Ran `cargo sqlx prepare --workspace` on Saturn to generate cached query data.
- **PostgreSQL connection refused**: Tried using 172.16.3.30:5432 but PostgreSQL listens only on localhost. Changed DATABASE_URL to localhost:5432 when running sqlx prepare on gururmm-build.
- **sqlx offline cache missing**: New queries in health.rs not in `.sqlx/` cache. Ran `cargo sqlx prepare --workspace` on gururmm-build to generate cached query data.
- **Merge conflicts in ws/mod.rs**: Local health logging changes conflicted with upstream improvements to update re-dispatch logic. Kept upstream's cleaner flag-based implementation and added health logging calls to both dispatch points.
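The two `Option` fixes in the list above follow a common pattern, sketched here under stated assumptions. The struct and function names (`UpdateEventRow`, `rollout_key`, `failure_rate`) are hypothetical and not taken from `health.rs`; only the column nullability (`os_type` NOT NULL, `version_to` and `architecture` nullable) and the `.unwrap_or(0)` approach come from the log.

```rust
/// Hypothetical row shape mirroring the nullability described in the log.
pub struct UpdateEventRow {
    pub os_type: String,              // NOT NULL column: plain String
    pub version_to: Option<String>,   // nullable column
    pub architecture: Option<String>, // nullable column
}

/// Returns (version, os_type, arch) only when both nullable fields are set.
/// `os_type` stays out of the Option check and is passed by reference.
pub fn rollout_key(row: &UpdateEventRow) -> Option<(&str, &str, &str)> {
    if let (Some(version), Some(arch)) =
        (row.version_to.as_deref(), row.architecture.as_deref())
    {
        Some((version, row.os_type.as_str(), arch))
    } else {
        None
    }
}

/// Counter columns arrive as Option<i32>; default to 0 before comparing or casting.
pub fn failure_rate(total_attempts: Option<i32>, failed_updates: Option<i32>) -> f64 {
    let total = total_attempts.unwrap_or(0);
    let failed = failed_updates.unwrap_or(0);
    if total <= 0 {
        return 0.0; // avoid dividing by zero when nothing has been attempted
    }
    f64::from(failed) / f64::from(total)
}
```

Matching on a tuple of only the nullable fields keeps the non-nullable `os_type` out of the pattern, which is the kind of mismatch the first bullet describes.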
## Configuration Changes
@@ -947,18 +947,18 @@ Phases 4-6 remain pending: promotion/rollback API endpoints (3 REST endpoints),
## Credentials & Secrets
No new credentials created or discovered. Used existing Saturn SSH access (azcomputerguru@172.16.3.30) and PostgreSQL connection (localhost:5432, credentials unchanged).
No new credentials created or discovered. Used existing gururmm-build SSH access (azcomputerguru@172.16.3.30) and PostgreSQL connection (localhost:5432, credentials unchanged).
## Infrastructure & Servers
**Saturn (172.16.3.30):**
**gururmm-build (172.16.3.30):**
- Build server: Linux, hosts `/opt/gururmm/build-linux.sh` and `build-windows.sh`
- Downloads directory: `/var/www/gururmm/downloads/`
- PostgreSQL: localhost:5432, database `gururmm_production`
- GuruRMM server: systemd service `gururmm-server.service`, binary at `/opt/gururmm/gururmm-server`
- Logs: `/var/log/gururmm-build.log` (build output), server logs via journalctl
**New Database Tables (Saturn PostgreSQL):**
**New Database Tables (gururmm-build PostgreSQL):**
- `update_rollouts` - Promotion tracking (version, os, arch, channel, promoted_at, promoted_by)
- `update_health_metrics` - Health aggregation (total_attempts, successful_updates, failed_updates, rollback_count, crash_count, health_status)
- `agent_update_events` - Event timeline (agent_id, update_id, event_type, version_from, version_to, details JSONB)
@@ -1085,7 +1085,7 @@ if let (Some(version), Some(arch)) = (
- 11:15 PT - Phase 3 started, created health.rs module
- 11:45 PT - Resolved Option type errors, fixed tuple destructuring
- 12:10 PT - Resolved merge conflicts in ws/mod.rs
- 12:25 PT - Final compilation successful on Saturn
- 12:25 PT - Final compilation successful on gururmm-build
- 12:40 PT - Session log written, ready to sync
@@ -1341,7 +1341,7 @@ Phase 6 created comprehensive testing framework with PHASE_6_TEST_PLAN.md (853 l
Session also fixed critical coordination messaging bug on this MacBook. The UserPromptSubmit hook was failing because macOS hostname command returns "Mikes-MacBook-Air.local" with .local suffix, but coord messages were addressed to "Mikes-MacBook-Air/claude-main" without suffix. Hook script was querying wrong session ID so messages never displayed. Fixed check-messages.sh to strip .local suffix using bash parameter expansion before building session ID. Verified fix works, sent identity check-in response to GURU-5070 confirming machine identity correct and discrepancy resolved.
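The hostname normalization described above can be illustrated as follows. The actual fix lives in `check-messages.sh` and uses bash parameter expansion; this sketch only restates the same logic in Rust, and the function name `session_id` and its parameters are hypothetical.

```rust
/// Hypothetical helper: builds a "<host>/<session>" ID, dropping a trailing
/// ".local" so the ID matches the name that coordination messages are addressed to.
pub fn session_id(hostname: &str, session: &str) -> String {
    // strip_suffix returns None when the suffix is absent, so fall back to the input.
    let short = hostname.strip_suffix(".local").unwrap_or(hostname);
    format!("{}/{}", short, session)
}
```

Under this sketch, `session_id("Mikes-MacBook-Air.local", "claude-main")` produces `Mikes-MacBook-Air/claude-main`, and a hostname without the suffix passes through unchanged.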
All six phases now complete. Safe Agent Rollout System is code-complete, documented, and ready for testing when Saturn access available for build verification.
All six phases now complete. Safe Agent Rollout System is code-complete, documented, and ready for testing when gururmm-build access available for build verification.
## Key Decisions
@@ -1355,7 +1355,7 @@ All six phases now complete. Safe Agent Rollout System is code-complete, documen
## Problems Encountered
- **SSH connection failed from MacBook to Saturn**: Permission denied when attempting to run build verification. Likely key-based auth not configured on this machine. Documented that verification and testing require Saturn access - can be done from another machine with working SSH.
- **SSH connection failed from MacBook to gururmm-build**: Permission denied when attempting to run build verification. Likely key-based auth not configured on this machine. Documented that verification and testing require gururmm-build access - can be done from another machine with working SSH.
- **Coordination messages not displaying**: Hook script using full hostname "Mikes-MacBook-Air.local" but messages addressed to "Mikes-MacBook-Air". Fixed by stripping .local suffix in check-messages.sh before building session ID. Tested and confirmed working.
- **Documentation file location conflict**: Phase 5 implementation agent created documentation files in ClaudeTools root, but GURU-KALI sync removed them (likely moved to proper project location). Normal collaboration sync conflict - files tracked in correct location now.
@@ -1384,11 +1384,11 @@ All six phases now complete. Safe Agent Rollout System is code-complete, documen
## Credentials & Secrets
No new credentials created or discovered. Used existing GuruRMM JWT authentication (AuthUser extractor) for API endpoint security. Saturn SSH access uses existing azcomputerguru account.
No new credentials created or discovered. Used existing GuruRMM JWT authentication (AuthUser extractor) for API endpoint security. gururmm-build SSH access uses existing azcomputerguru account.
## Infrastructure & Servers
**Saturn (172.16.3.30):**
**gururmm-build (172.16.3.30):**
- GuruRMM server: Rust/Axum @ port 3001
- PostgreSQL: localhost:5432, database gururmm_production
- Binaries: /opt/gururmm/gururmm-server (server), /opt/gururmm/dashboard/dist (frontend)
@@ -1455,7 +1455,7 @@ projects/msp-tools/guru-rmm/
## Pending / Incomplete Tasks
**Immediate (requires Saturn SSH access):**
**Immediate (requires gururmm-build SSH access):**
1. Run verification script: `ssh azcomputerguru@172.16.3.30 'bash /path/to/verify-rollout-system.sh'`
2. Build server: `cd /opt/gururmm/server && cargo build --release --features production`
3. Build dashboard: `cd /opt/gururmm/dashboard && npm run build`
@@ -1546,7 +1546,7 @@ projects/msp-tools/guru-rmm/
- ✅ Phase 4: Promotion/rollback API endpoints
- ✅ Phase 5: Dashboard UI with full controls
- ✅ Phase 6: Test plan and verification script
- ⏳ Testing: Awaiting Saturn access for build verification
- ⏳ Testing: Awaiting gururmm-build access for build verification
- ⏳ Production: Awaiting test completion and sign-off