fix: Correct server name from Saturn to gururmm-build

Saturn is decommissioned. The GuruRMM build server at 172.16.3.30
is correctly named 'gururmm-build'.

Also fixed wiki standards template that incorrectly listed Neptune
as 172.16.3.30. Neptune is actually the Exchange server at Dataforth
(172.16.3.11), not the GuruRMM build server.

Updated files:
- PHASE_6_TEST_PLAN.md (all Saturn references)
- verify-rollout-system.sh (comments)
- session-logs/2026-05-25-session.md (all Saturn references)
- .claude/specs/wiki-layer/standards.md (Neptune example)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-05-25 14:23:52 -07:00
parent 30cb6a8f2b
commit 9bee713d9c
4 changed files with 18 additions and 18 deletions
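
A rename like this can be checked by searching the touched files for leftover references to the old name. The sketch below is illustrative only and is not part of this commit: the function name `check_stale_refs` is hypothetical, and it assumes a `grep` that supports `-w` and `-H` (both GNU and BSD grep do).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this commit): report any lines in the
# given files that still mention a decommissioned server name.
# Usage: check_stale_refs NAME FILE...
check_stale_refs() {
  local name="$1"
  shift
  local status=0
  # -n: line numbers, -i: ignore case, -w: whole-word match, -H: print file name.
  # grep exits 0 on a match, 1 on no match, and >1 on errors such as a missing file.
  grep -n -i -w -H -- "$name" "$@" || status=$?
  case "$status" in
    0) echo "stale references to '$name' found" >&2; return 1 ;;
    1) echo "no references to '$name' found"; return 0 ;;
    *) echo "grep failed (exit $status)" >&2; return 2 ;;
  esac
}
```

For example, `check_stale_refs Saturn PHASE_6_TEST_PLAN.md verify-rollout-system.sh` would list any remaining lines; the paths are taken from the commit message and may need adjusting to their real locations in the repository. Treating grep errors separately avoids reporting a clean result when a file path is simply wrong.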

@@ -213,7 +213,7 @@ Last updated: YYYY-MM-DD
## Systems
| Article | Summary | Last Compiled |
|---|---|---|
| [Neptune](systems/neptune.md) | Primary server, 172.16.3.30, MariaDB + API | 2026-05-24 |
| [gururmm-build](systems/gururmm-build.md) | GuruRMM build server, 172.16.3.30, MariaDB + ClaudeTools API | 2026-05-24 |
## Patterns
| Article | Summary | Last Compiled |

@@ -10,7 +10,7 @@
## Prerequisites
### Environment Setup
- [ ] SSH access to Saturn (172.16.3.30)
- [ ] SSH access to gururmm-build (172.16.3.30)
- [ ] Access to GuruRMM dashboard (https://rmm.azcomputerguru.com)
- [ ] JWT token for API testing
- [ ] At least 2 test agents (GURU-KALI, GURU-5070 recommended)

@@ -902,11 +902,11 @@ git submodule update -- projects/msp-tools/guru-rmm
Implemented Phases 1-3 of the GuruRMM Safe Agent Update Rollout System to eliminate production risk from auto-deployed updates. The system introduces a beta-first deployment model where all new agent builds default to a beta channel and require manual promotion before reaching stable production clients.
Phase 1 modified the build pipeline on Saturn (172.16.3.30) by adding beta channel marking to both `/opt/gururmm/build-linux.sh` and `/opt/gururmm/build-windows.sh`. After code signing and checksum generation, the scripts now create `.channel` sidecar files containing "beta" for every binary. Triggered test build v0.6.41 successfully created 6 channel files (2 Linux amd64, 4 Windows amd64/arm64/base MSI). The existing scanner already supported reading these files from previous work.
Phase 1 modified the build pipeline on gururmm-build (172.16.3.30) by adding beta channel marking to both `/opt/gururmm/build-linux.sh` and `/opt/gururmm/build-windows.sh`. After code signing and checksum generation, the scripts now create `.channel` sidecar files containing "beta" for every binary. Triggered test build v0.6.41 successfully created 6 channel files (2 Linux amd64, 4 Windows amd64/arm64/base MSI). The existing scanner already supported reading these files from previous work.
Phase 2 created database migration 046_safe_rollout.sql with three new tables: update_rollouts (tracks promotion state per version), update_health_metrics (aggregates success/failure/crash rates), and agent_update_events (detailed timeline with JSONB metadata). Applied migration to PostgreSQL on Saturn with 5 custom indexes for efficient queries. Resolved migration numbering conflict (originally 045, renamed to 046).
Phase 2 created database migration 046_safe_rollout.sql with three new tables: update_rollouts (tracks promotion state per version), update_health_metrics (aggregates success/failure/crash rates), and agent_update_events (detailed timeline with JSONB metadata). Applied migration to PostgreSQL on gururmm-build with 5 custom indexes for efficient queries. Resolved migration numbering conflict (originally 045, renamed to 046).
Phase 3 implemented the health monitoring system with crash detection. Created `server/src/updates/health.rs` (270 lines) containing a background task that runs every 60 seconds to detect agents that go offline within 5 minutes of receiving an update. The system calculates health metrics (crash rate, failure rate) and evaluates status using defined thresholds: critical (>25% crash OR >50% failure), warning (>10% crash OR >25% failure), healthy (100% success, ≥5 attempts, no crashes), unknown (<5 attempts). Integrated event logging into `server/src/ws/mod.rs` at two update dispatch points and spawned the monitor task in `server/src/main.rs`. Successfully compiled on Saturn after resolving Option type handling and tuple destructuring errors. Server binary built cleanly (13 MB, 4m8s build time).
Phase 3 implemented the health monitoring system with crash detection. Created `server/src/updates/health.rs` (270 lines) containing a background task that runs every 60 seconds to detect agents that go offline within 5 minutes of receiving an update. The system calculates health metrics (crash rate, failure rate) and evaluates status using defined thresholds: critical (>25% crash OR >50% failure), warning (>10% crash OR >25% failure), healthy (100% success, ≥5 attempts, no crashes), unknown (<5 attempts). Integrated event logging into `server/src/ws/mod.rs` at two update dispatch points and spawned the monitor task in `server/src/main.rs`. Successfully compiled on gururmm-build after resolving Option type handling and tuple destructuring errors. Server binary built cleanly (13 MB, 4m8s build time).
Phases 4-6 remain pending: promotion/rollback API endpoints (3 REST endpoints), dashboard UI (Updates.tsx with table view and controls), and end-to-end testing. The foundation is now in place for safe, controlled agent rollouts with automatic crash detection and manual promotion gating.
@@ -924,8 +924,8 @@ Phases 4-6 remain pending: promotion/rollback API endpoints (3 REST endpoints),
- **Option<String> vs String type mismatch**: Database schema has `os_type` as NOT NULL String but `version_to` and `architecture` as nullable. Fixed tuple destructuring by removing os_type from Option check and passing as reference.
- **Option<i32> arithmetic**: Query results return Option<i32> for counter fields. Added `.unwrap_or(0)` before all comparisons and f64 casts.
- **Build script structure changed**: Plan referenced deprecated `/opt/gururmm/build-agents.sh` wrapper. Modified `build-linux.sh` and `build-windows.sh` directly instead.
- **PostgreSQL connection refused**: Tried using 172.16.3.30:5432 but PostgreSQL listens only on localhost. Changed DATABASE_URL to localhost:5432 when running sqlx prepare on Saturn.
- **sqlx offline cache missing**: New queries in health.rs not in `.sqlx/` cache. Ran `cargo sqlx prepare --workspace` on Saturn to generate cached query data.
- **PostgreSQL connection refused**: Tried using 172.16.3.30:5432 but PostgreSQL listens only on localhost. Changed DATABASE_URL to localhost:5432 when running sqlx prepare on gururmm-build.
- **sqlx offline cache missing**: New queries in health.rs not in `.sqlx/` cache. Ran `cargo sqlx prepare --workspace` on gururmm-build to generate cached query data.
- **Merge conflicts in ws/mod.rs**: Local health logging changes conflicted with upstream improvements to update re-dispatch logic. Kept upstream's cleaner flag-based implementation and added health logging calls to both dispatch points.
## Configuration Changes
@@ -947,18 +947,18 @@ Phases 4-6 remain pending: promotion/rollback API endpoints (3 REST endpoints),
## Credentials & Secrets
No new credentials created or discovered. Used existing Saturn SSH access (azcomputerguru@172.16.3.30) and PostgreSQL connection (localhost:5432, credentials unchanged).
No new credentials created or discovered. Used existing gururmm-build SSH access (azcomputerguru@172.16.3.30) and PostgreSQL connection (localhost:5432, credentials unchanged).
## Infrastructure & Servers
**Saturn (172.16.3.30):**
**gururmm-build (172.16.3.30):**
- Build server: Linux, hosts `/opt/gururmm/build-linux.sh` and `build-windows.sh`
- Downloads directory: `/var/www/gururmm/downloads/`
- PostgreSQL: localhost:5432, database `gururmm_production`
- GuruRMM server: systemd service `gururmm-server.service`, binary at `/opt/gururmm/gururmm-server`
- Logs: `/var/log/gururmm-build.log` (build output), server logs via journalctl
**New Database Tables (Saturn PostgreSQL):**
**New Database Tables (gururmm-build PostgreSQL):**
- `update_rollouts` - Promotion tracking (version, os, arch, channel, promoted_at, promoted_by)
- `update_health_metrics` - Health aggregation (total_attempts, successful_updates, failed_updates, rollback_count, crash_count, health_status)
- `agent_update_events` - Event timeline (agent_id, update_id, event_type, version_from, version_to, details JSONB)
@@ -1085,7 +1085,7 @@ if let (Some(version), Some(arch)) = (
- 11:15 PT - Phase 3 started, created health.rs module
- 11:45 PT - Resolved Option type errors, fixed tuple destructuring
- 12:10 PT - Resolved merge conflicts in ws/mod.rs
- 12:25 PT - Final compilation successful on Saturn
- 12:25 PT - Final compilation successful on gururmm-build
- 12:40 PT - Session log written, ready to sync
@@ -1341,7 +1341,7 @@ Phase 6 created comprehensive testing framework with PHASE_6_TEST_PLAN.md (853 l
Session also fixed critical coordination messaging bug on this MacBook. The UserPromptSubmit hook was failing because macOS hostname command returns "Mikes-MacBook-Air.local" with .local suffix, but coord messages were addressed to "Mikes-MacBook-Air/claude-main" without suffix. Hook script was querying wrong session ID so messages never displayed. Fixed check-messages.sh to strip .local suffix using bash parameter expansion before building session ID. Verified fix works, sent identity check-in response to GURU-5070 confirming machine identity correct and discrepancy resolved.
All six phases now complete. Safe Agent Rollout System is code-complete, documented, and ready for testing when Saturn access available for build verification.
All six phases now complete. Safe Agent Rollout System is code-complete, documented, and ready for testing when gururmm-build access available for build verification.
## Key Decisions
@@ -1355,7 +1355,7 @@ All six phases now complete. Safe Agent Rollout System is code-complete, documen
## Problems Encountered
- **SSH connection failed from MacBook to Saturn**: Permission denied when attempting to run build verification. Likely key-based auth not configured on this machine. Documented that verification and testing require Saturn access - can be done from another machine with working SSH.
- **SSH connection failed from MacBook to gururmm-build**: Permission denied when attempting to run build verification. Likely key-based auth not configured on this machine. Documented that verification and testing require gururmm-build access - can be done from another machine with working SSH.
- **Coordination messages not displaying**: Hook script using full hostname "Mikes-MacBook-Air.local" but messages addressed to "Mikes-MacBook-Air". Fixed by stripping .local suffix in check-messages.sh before building session ID. Tested and confirmed working.
- **Documentation file location conflict**: Phase 5 implementation agent created documentation files in ClaudeTools root, but GURU-KALI sync removed them (likely moved to proper project location). Normal collaboration sync conflict - files tracked in correct location now.
@@ -1384,11 +1384,11 @@ All six phases now complete. Safe Agent Rollout System is code-complete, documen
## Credentials & Secrets
No new credentials created or discovered. Used existing GuruRMM JWT authentication (AuthUser extractor) for API endpoint security. Saturn SSH access uses existing azcomputerguru account.
No new credentials created or discovered. Used existing GuruRMM JWT authentication (AuthUser extractor) for API endpoint security. gururmm-build SSH access uses existing azcomputerguru account.
## Infrastructure & Servers
**Saturn (172.16.3.30):**
**gururmm-build (172.16.3.30):**
- GuruRMM server: Rust/Axum @ port 3001
- PostgreSQL: localhost:5432, database gururmm_production
- Binaries: /opt/gururmm/gururmm-server (server), /opt/gururmm/dashboard/dist (frontend)
@@ -1455,7 +1455,7 @@ projects/msp-tools/guru-rmm/
## Pending / Incomplete Tasks
**Immediate (requires Saturn SSH access):**
**Immediate (requires gururmm-build SSH access):**
1. Run verification script: `ssh azcomputerguru@172.16.3.30 'bash /path/to/verify-rollout-system.sh'`
2. Build server: `cd /opt/gururmm/server && cargo build --release --features production`
3. Build dashboard: `cd /opt/gururmm/dashboard && npm run build`
@@ -1546,7 +1546,7 @@ projects/msp-tools/guru-rmm/
- ✅ Phase 4: Promotion/rollback API endpoints
- ✅ Phase 5: Dashboard UI with full controls
- ✅ Phase 6: Test plan and verification script
- ⏳ Testing: Awaiting Saturn access for build verification
- ⏳ Testing: Awaiting gururmm-build access for build verification
- ⏳ Production: Awaiting test completion and sign-off

@@ -1,6 +1,6 @@
#!/usr/bin/env bash
# Verification script for Safe Agent Rollout System
# Run on Saturn (172.16.3.30) to verify Phase 1-5 implementation
# Run on gururmm-build (172.16.3.30) to verify Phase 1-5 implementation
set -e