From 9bee713d9c89059914d380fc73d33e0207bcd3ac Mon Sep 17 00:00:00 2001
From: Mike Swanson
Date: Mon, 25 May 2026 14:23:52 -0700
Subject: [PATCH] fix: Correct server name from Saturn to gururmm-build

Saturn is decommissioned. The GuruRMM build server at 172.16.3.30 is
correctly named 'gururmm-build'.

Also fixed wiki standards template that incorrectly listed Neptune as
172.16.3.30. Neptune is actually the Exchange server at Dataforth
(172.16.3.11), not the GuruRMM build server.

Updated files:
- PHASE_6_TEST_PLAN.md (all Saturn references)
- verify-rollout-system.sh (comments)
- session-logs/2026-05-25-session.md (all Saturn references)
- .claude/specs/wiki-layer/standards.md (Neptune example)

Co-Authored-By: Claude Sonnet 4.5
---
 .claude/specs/wiki-layer/standards.md |  2 +-
 PHASE_6_TEST_PLAN.md                  |  2 +-
 session-logs/2026-05-25-session.md    | 30 +++++++++++++--------------
 verify-rollout-system.sh              |  2 +-
 4 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/.claude/specs/wiki-layer/standards.md b/.claude/specs/wiki-layer/standards.md
index e8c5e0f..649f84d 100644
--- a/.claude/specs/wiki-layer/standards.md
+++ b/.claude/specs/wiki-layer/standards.md
@@ -213,7 +213,7 @@ Last updated: YYYY-MM-DD
 ## Systems
 | Article | Summary | Last Compiled |
 |---|---|---|
-| [Neptune](systems/neptune.md) | Primary server, 172.16.3.30, MariaDB + API | 2026-05-24 |
+| [gururmm-build](systems/gururmm-build.md) | GuruRMM build server, 172.16.3.30, MariaDB + ClaudeTools API | 2026-05-24 |
 
 ## Patterns
 | Article | Summary | Last Compiled |
diff --git a/PHASE_6_TEST_PLAN.md b/PHASE_6_TEST_PLAN.md
index e4133bd..022f5d6 100644
--- a/PHASE_6_TEST_PLAN.md
+++ b/PHASE_6_TEST_PLAN.md
@@ -10,7 +10,7 @@
 ## Prerequisites
 
 ### Environment Setup
-- [ ] SSH access to Saturn (172.16.3.30)
+- [ ] SSH access to gururmm-build (172.16.3.30)
 - [ ] Access to GuruRMM dashboard (https://rmm.azcomputerguru.com)
 - [ ] JWT token for API testing
 - [ ] At least 2 test agents (GURU-KALI, GURU-5070 recommended)
diff --git a/session-logs/2026-05-25-session.md b/session-logs/2026-05-25-session.md
index 2afffa8..5f978b0 100644
--- a/session-logs/2026-05-25-session.md
+++ b/session-logs/2026-05-25-session.md
@@ -902,11 +902,11 @@ git submodule update -- projects/msp-tools/guru-rmm
 
 Implemented Phases 1-3 of the GuruRMM Safe Agent Update Rollout System to eliminate production risk from auto-deployed updates. The system introduces a beta-first deployment model where all new agent builds default to a beta channel and require manual promotion before reaching stable production clients.
 
-Phase 1 modified the build pipeline on Saturn (172.16.3.30) by adding beta channel marking to both `/opt/gururmm/build-linux.sh` and `/opt/gururmm/build-windows.sh`. After code signing and checksum generation, the scripts now create `.channel` sidecar files containing "beta" for every binary. Triggered test build v0.6.41 successfully created 6 channel files (2 Linux amd64, 4 Windows amd64/arm64/base MSI). The existing scanner already supported reading these files from previous work.
+Phase 1 modified the build pipeline on gururmm-build (172.16.3.30) by adding beta channel marking to both `/opt/gururmm/build-linux.sh` and `/opt/gururmm/build-windows.sh`. After code signing and checksum generation, the scripts now create `.channel` sidecar files containing "beta" for every binary. Triggered test build v0.6.41 successfully created 6 channel files (2 Linux amd64, 4 Windows amd64/arm64/base MSI). The existing scanner already supported reading these files from previous work.
 
-Phase 2 created database migration 046_safe_rollout.sql with three new tables: update_rollouts (tracks promotion state per version), update_health_metrics (aggregates success/failure/crash rates), and agent_update_events (detailed timeline with JSONB metadata). Applied migration to PostgreSQL on Saturn with 5 custom indexes for efficient queries. Resolved migration numbering conflict (originally 045, renamed to 046).
+Phase 2 created database migration 046_safe_rollout.sql with three new tables: update_rollouts (tracks promotion state per version), update_health_metrics (aggregates success/failure/crash rates), and agent_update_events (detailed timeline with JSONB metadata). Applied migration to PostgreSQL on gururmm-build with 5 custom indexes for efficient queries. Resolved migration numbering conflict (originally 045, renamed to 046).
 
-Phase 3 implemented the health monitoring system with crash detection. Created `server/src/updates/health.rs` (270 lines) containing a background task that runs every 60 seconds to detect agents that go offline within 5 minutes of receiving an update. The system calculates health metrics (crash rate, failure rate) and evaluates status using defined thresholds: critical (>25% crash OR >50% failure), warning (>10% crash OR >25% failure), healthy (100% success, ≥5 attempts, no crashes), unknown (<5 attempts). Integrated event logging into `server/src/ws/mod.rs` at two update dispatch points and spawned the monitor task in `server/src/main.rs`. Successfully compiled on Saturn after resolving Option type handling and tuple destructuring errors. Server binary built cleanly (13 MB, 4m8s build time).
+Phase 3 implemented the health monitoring system with crash detection. Created `server/src/updates/health.rs` (270 lines) containing a background task that runs every 60 seconds to detect agents that go offline within 5 minutes of receiving an update. The system calculates health metrics (crash rate, failure rate) and evaluates status using defined thresholds: critical (>25% crash OR >50% failure), warning (>10% crash OR >25% failure), healthy (100% success, ≥5 attempts, no crashes), unknown (<5 attempts). Integrated event logging into `server/src/ws/mod.rs` at two update dispatch points and spawned the monitor task in `server/src/main.rs`. Successfully compiled on gururmm-build after resolving Option type handling and tuple destructuring errors. Server binary built cleanly (13 MB, 4m8s build time).
 
 Phases 4-6 remain pending: promotion/rollback API endpoints (3 REST endpoints), dashboard UI (Updates.tsx with table view and controls), and end-to-end testing. The foundation is now in place for safe, controlled agent rollouts with automatic crash detection and manual promotion gating.
 
@@ -924,8 +924,8 @@ Phases 4-6 remain pending: promotion/rollback API endpoints (3 REST endpoints),
 - **Option vs String type mismatch**: Database schema has `os_type` as NOT NULL String but `version_to` and `architecture` as nullable. Fixed tuple destructuring by removing os_type from Option check and passing as reference.
 - **Option arithmetic**: Query results return Option for counter fields. Added `.unwrap_or(0)` before all comparisons and f64 casts.
 - **Build script structure changed**: Plan referenced deprecated `/opt/gururmm/build-agents.sh` wrapper. Modified `build-linux.sh` and `build-windows.sh` directly instead.
-- **PostgreSQL connection refused**: Tried using 172.16.3.30:5432 but PostgreSQL listens only on localhost. Changed DATABASE_URL to localhost:5432 when running sqlx prepare on Saturn.
-- **sqlx offline cache missing**: New queries in health.rs not in `.sqlx/` cache. Ran `cargo sqlx prepare --workspace` on Saturn to generate cached query data.
+- **PostgreSQL connection refused**: Tried using 172.16.3.30:5432 but PostgreSQL listens only on localhost. Changed DATABASE_URL to localhost:5432 when running sqlx prepare on gururmm-build.
+- **sqlx offline cache missing**: New queries in health.rs not in `.sqlx/` cache. Ran `cargo sqlx prepare --workspace` on gururmm-build to generate cached query data.
 - **Merge conflicts in ws/mod.rs**: Local health logging changes conflicted with upstream improvements to update re-dispatch logic. Kept upstream's cleaner flag-based implementation and added health logging calls to both dispatch points.
 
 ## Configuration Changes
@@ -947,18 +947,18 @@ Phases 4-6 remain pending: promotion/rollback API endpoints (3 REST endpoints),
 
 ## Credentials & Secrets
 
-No new credentials created or discovered. Used existing Saturn SSH access (azcomputerguru@172.16.3.30) and PostgreSQL connection (localhost:5432, credentials unchanged).
+No new credentials created or discovered. Used existing gururmm-build SSH access (azcomputerguru@172.16.3.30) and PostgreSQL connection (localhost:5432, credentials unchanged).
 
 ## Infrastructure & Servers
 
-**Saturn (172.16.3.30):**
+**gururmm-build (172.16.3.30):**
 - Build server: Linux, hosts `/opt/gururmm/build-linux.sh` and `build-windows.sh`
 - Downloads directory: `/var/www/gururmm/downloads/`
 - PostgreSQL: localhost:5432, database `gururmm_production`
 - GuruRMM server: systemd service `gururmm-server.service`, binary at `/opt/gururmm/gururmm-server`
 - Logs: `/var/log/gururmm-build.log` (build output), server logs via journalctl
 
-**New Database Tables (Saturn PostgreSQL):**
+**New Database Tables (gururmm-build PostgreSQL):**
 - `update_rollouts` - Promotion tracking (version, os, arch, channel, promoted_at, promoted_by)
 - `update_health_metrics` - Health aggregation (total_attempts, successful_updates, failed_updates, rollback_count, crash_count, health_status)
 - `agent_update_events` - Event timeline (agent_id, update_id, event_type, version_from, version_to, details JSONB)
@@ -1085,7 +1085,7 @@ if let (Some(version), Some(arch)) = (
 - 11:15 PT - Phase 3 started, created health.rs module
 - 11:45 PT - Resolved Option type errors, fixed tuple destructuring
 - 12:10 PT - Resolved merge conflicts in ws/mod.rs
-- 12:25 PT - Final compilation successful on Saturn
+- 12:25 PT - Final compilation successful on gururmm-build
 - 12:40 PT - Session log written, ready to sync
 
 
@@ -1341,7 +1341,7 @@ Phase 6 created comprehensive testing framework with PHASE_6_TEST_PLAN.md (853 l
 
 Session also fixed critical coordination messaging bug on this MacBook. The UserPromptSubmit hook was failing because macOS hostname command returns "Mikes-MacBook-Air.local" with .local suffix, but coord messages were addressed to "Mikes-MacBook-Air/claude-main" without suffix. Hook script was querying wrong session ID so messages never displayed. Fixed check-messages.sh to strip .local suffix using bash parameter expansion before building session ID. Verified fix works, sent identity check-in response to GURU-5070 confirming machine identity correct and discrepancy resolved.
 
-All six phases now complete. Safe Agent Rollout System is code-complete, documented, and ready for testing when Saturn access available for build verification.
+All six phases now complete. Safe Agent Rollout System is code-complete, documented, and ready for testing when gururmm-build access available for build verification.
 
 ## Key Decisions
 
@@ -1355,7 +1355,7 @@ All six phases now complete. Safe Agent Rollout System is code-complete, documen
 
 ## Problems Encountered
 
-- **SSH connection failed from MacBook to Saturn**: Permission denied when attempting to run build verification. Likely key-based auth not configured on this machine. Documented that verification and testing require Saturn access - can be done from another machine with working SSH.
+- **SSH connection failed from MacBook to gururmm-build**: Permission denied when attempting to run build verification. Likely key-based auth not configured on this machine. Documented that verification and testing require gururmm-build access - can be done from another machine with working SSH.
 - **Coordination messages not displaying**: Hook script using full hostname "Mikes-MacBook-Air.local" but messages addressed to "Mikes-MacBook-Air". Fixed by stripping .local suffix in check-messages.sh before building session ID. Tested and confirmed working.
 - **Documentation file location conflict**: Phase 5 implementation agent created documentation files in ClaudeTools root, but GURU-KALI sync removed them (likely moved to proper project location). Normal collaboration sync conflict - files tracked in correct location now.
 
@@ -1384,11 +1384,11 @@ All six phases now complete. Safe Agent Rollout System is code-complete, documen
 
 ## Credentials & Secrets
 
-No new credentials created or discovered. Used existing GuruRMM JWT authentication (AuthUser extractor) for API endpoint security. Saturn SSH access uses existing azcomputerguru account.
+No new credentials created or discovered. Used existing GuruRMM JWT authentication (AuthUser extractor) for API endpoint security. gururmm-build SSH access uses existing azcomputerguru account.
 
 ## Infrastructure & Servers
 
-**Saturn (172.16.3.30):**
+**gururmm-build (172.16.3.30):**
 - GuruRMM server: Rust/Axum @ port 3001
 - PostgreSQL: localhost:5432, database gururmm_production
 - Binaries: /opt/gururmm/gururmm-server (server), /opt/gururmm/dashboard/dist (frontend)
@@ -1455,7 +1455,7 @@ projects/msp-tools/guru-rmm/
 
 ## Pending / Incomplete Tasks
 
-**Immediate (requires Saturn SSH access):**
+**Immediate (requires gururmm-build SSH access):**
 1. Run verification script: `ssh azcomputerguru@172.16.3.30 'bash /path/to/verify-rollout-system.sh'`
 2. Build server: `cd /opt/gururmm/server && cargo build --release --features production`
 3. Build dashboard: `cd /opt/gururmm/dashboard && npm run build`
@@ -1546,7 +1546,7 @@ projects/msp-tools/guru-rmm/
 - ✅ Phase 4: Promotion/rollback API endpoints
 - ✅ Phase 5: Dashboard UI with full controls
 - ✅ Phase 6: Test plan and verification script
-- ⏳ Testing: Awaiting Saturn access for build verification
+- ⏳ Testing: Awaiting gururmm-build access for build verification
 - ⏳ Production: Awaiting test completion and sign-off
 
 
diff --git a/verify-rollout-system.sh b/verify-rollout-system.sh
index 314accb..1a88f21 100755
--- a/verify-rollout-system.sh
+++ b/verify-rollout-system.sh
@@ -1,6 +1,6 @@
 #!/usr/bin/env bash
 # Verification script for Safe Agent Rollout System
-# Run on Saturn (172.16.3.30) to verify Phase 1-5 implementation
+# Run on gururmm-build (172.16.3.30) to verify Phase 1-5 implementation
 
 set -e