sync: auto-sync from GURU-BEAST-ROG at 2026-06-07 21:26:22
Author: Mike Swanson Machine: GURU-BEAST-ROG Timestamp: 2026-06-07 21:26:22
session-logs/2026-06-07-mike-gururmm-ui-gaps-enrollment.md
@@ -0,0 +1,146 @@
# Session Log — GuruRMM UI Gaps + Enrollment Audit

**Date:** 2026-06-07
**Topic:** GuruRMM UI/Dashboard — gap verification, batch implementation, hotfix, enrollment audit

## User
- **User:** Mike Swanson (mike)
- **Machine:** GURU-BEAST-ROG
- **Role:** admin

---

## Session Summary

The session started as a GuruRMM UI/Dashboard feedback thread via the Discord bot. The first pass verified all known UI gaps against the current dashboard source code. The submodule was 50 commits behind origin/main at the time of the first coding pass, which caused merge conflicts later. Eight gaps were confirmed open: enrollment management, AgentDetail client row not rendering, dashboard fleet counts computed client-side, tunnel session UI, fleet discovery page, install reporting page, BSOD crashes tab, and agent version history.

The main implementation batch addressed everything except tunnel (server skeleton is "not yet implemented" and needs xterm.js — deferred). A server-side Rust agent added seven new or updated API endpoints: `get_agent` now returns `AgentWithDetails` (client_id + client_name via JOIN), `GET /api/agents/:id/bsod-events`, `GET /api/agents/:id/version-history`, `GET /api/install-reports` + `/:id`, `GET /api/discovery/all-devices`, and `DELETE /api/agents/:id/key`. A dashboard agent added two new pages (InstallReports, Discovery), a Crashes tab on AgentDetail, a version history table in the Updates tab, fleet stat wiring on Dashboard.tsx, and a Revoke Key button in SiteDetail. Code review approved all changes. The Gitea agent committed and pushed, but the submodule was in detached HEAD 50 commits behind main, requiring a stash/pull/rebase merge that resolved five conflicts, including Layout.tsx (upstream had switched to the FunctionRail architecture) and Dashboard.tsx (upstream had switched to a Triage/ExceptionStream layout).

Immediately after the build deployed, Mike reported RMM was down. Diagnosis: the new `get_agent_with_details_by_id` SQL query was missing `a.role_override`. Migration 054 had added `role_override: String` (required, non-optional) to `AgentWithDetails`, and the existing `get_all_agents_with_details` already selected it, but the new single-agent function was written from the pre-054 SQL. As a result, the `list_agents` endpoint kept working while `GET /api/agents/:id` returned 500. The hotfix was a one-line SQL change (adding `a.role_override`), committed and pushed immediately (`6faa382`).
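The failure mode above can be reproduced without a database. The following is a minimal sketch, assuming a result row is modelled as a plain `HashMap` of column names to values; `Row`, `AgentDetails`, `required`, `from_row`, and `row_of` are hypothetical names for illustration, not GuruRMM or sqlx code. It shows why a struct with a required field maps cleanly from a query that selects the column and fails at runtime from one that does not.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a database row: column name -> text value.
type Row = HashMap<&'static str, String>;

/// Simplified analogue of a details struct with a required `role_override`.
#[derive(Debug, PartialEq)]
struct AgentDetails {
    hostname: String,
    client_name: String,
    role_override: String, // not an Option, so the query must select this column
}

/// Look up a required column, naming it in the error when the query omitted it.
fn required(row: &Row, column: &'static str) -> Result<String, String> {
    row.get(column)
        .cloned()
        .ok_or_else(|| format!("missing column: {}", column))
}

/// Map a row onto the struct; any missing required column aborts the mapping.
fn from_row(row: &Row) -> Result<AgentDetails, String> {
    Ok(AgentDetails {
        hostname: required(row, "hostname")?,
        client_name: required(row, "client_name")?,
        role_override: required(row, "role_override")?,
    })
}

/// Build a row from the columns a query happened to select.
fn row_of(columns: &[(&'static str, &str)]) -> Row {
    columns.iter().map(|(k, v)| (*k, v.to_string())).collect()
}
```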

Enrollment work followed. Investigation revealed that the `enrolled_agents` table (migration 012) already stored everything needed (enrolled_at, last_seen, ip_address, os_version, revoked), so no new migration was required. The original `DELETE /api/agents/:id/key` implementation only nulled `agents.api_key_hash` (legacy Mode 1/2 key), but WS auth checks `enrolled_agents.agent_key_hash` first (Mode 3 for `agk_` keys), so revocation was incomplete for modern agents. The implementation added four new DB functions, `GET /api/sites/:id/enrolled-agents` with duplicate-hostname annotation, and `POST /api/enrolled-agents/:id/revoke`, and fixed `DELETE /api/agents/:id/key` to cascade-revoke both key layers. The dashboard gained an Enrollment tab on SiteDetail with a full audit table, status badges, duplicate warnings, and per-row revoke with confirmation. A second detached-HEAD merge was required; the Gitea agent resolved the conflicts but incorrectly kept the old route pointer, which was caught and fixed manually (`00129af`).
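To make the revocation gap concrete, here is a small in-memory sketch of the two key layers. It assumes authentication succeeds if either a non-revoked enrollment or the legacy hash matches, and it compares hashes by plain equality for brevity; `AgentKeys`, `Enrollment`, and the method names are hypothetical and do not mirror the server's actual types or verification logic.

```rust
/// Hypothetical model of one `enrolled_agents` row (modern `agk_` layer).
#[derive(Debug, Clone)]
struct Enrollment {
    agent_key_hash: String,
    revoked: bool,
}

/// Hypothetical model of both key layers for a single agent.
#[derive(Debug, Default)]
struct AgentKeys {
    /// Legacy layer (`agents.api_key_hash`); `None` once nulled.
    api_key_hash: Option<String>,
    /// Modern layer: enrollment rows belonging to this agent.
    enrollments: Vec<Enrollment>,
}

impl AgentKeys {
    /// Check the modern layer first, then fall back to the legacy hash.
    fn can_authenticate(&self, presented_hash: &str) -> bool {
        let modern = self
            .enrollments
            .iter()
            .any(|e| !e.revoked && e.agent_key_hash == presented_hash);
        let legacy = self.api_key_hash.as_deref() == Some(presented_hash);
        modern || legacy
    }

    /// Single-layer revoke: only the legacy hash is cleared.
    fn revoke_legacy_only(&mut self) {
        self.api_key_hash = None;
    }

    /// Dual-layer revoke: clear the legacy hash and mark every enrollment revoked.
    fn revoke_all(&mut self) {
        self.api_key_hash = None;
        for e in &mut self.enrollments {
            e.revoked = true;
        }
    }
}
```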

Session closed with a sync that also deployed Phase 5c of sync.sh (skills mirroring to `~/.claude/skills/`) per a coord message from GURU-5070.

---

## Key Decisions

- **Tunnel deferred:** Server `tunnel.rs` is a dead-code skeleton that logs "not yet implemented" — wiring the route would expose a non-functional endpoint. Excluded from this batch; needs proper design + xterm.js before implementation.
- **No migration for enrollment:** `enrolled_agents` table (migration 012) already had all required fields. The gap was API exposure and UI, not missing data.
- **Dual-layer revoke:** `DELETE /api/agents/:id/key` was redesigned to revoke both layers: it nulls `agents.api_key_hash` (legacy Mode 1/2) and sets `enrolled_agents.revoked = TRUE` (modern agk_ Mode 3). Originally only the legacy field was touched, leaving modern agents able to reconnect.
- **Duplicate hostname detection via two queries:** Rather than a SQL window function, `list_site_enrollments` runs a second query for hostnames that group with `COUNT(*) > 1`, then annotates the rows in Rust. Simpler and safer with runtime sqlx (no compile-time macro).
- **`is_duplicate_hostname` scoped to active rows only:** Only non-revoked enrollments count toward the duplicate flag. A revoked + active pair is not flagged — the revocation resolved the ambiguity.
- **Per-row revoke vs. bulk revoke:** Added `POST /api/enrolled-agents/:id/revoke` for targeted per-enrollment revocation (with cascade to `agents.api_key_hash`). The existing `DELETE /api/agents/:id/key` is the bulk "revoke everything for this agent" path.
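The two-step duplicate detection described above (find the duplicated hostnames first, then annotate rows in Rust) can be sketched in memory as follows. This is an illustration under stated assumptions: `EnrollmentRow` is a hypothetical trimmed-down record, the first function stands in for the second SQL query rather than reproducing it, and leaving revoked rows unflagged even when active duplicates exist is an assumption the log does not spell out.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, trimmed-down enrollment row; the real record has more fields.
#[derive(Debug, Clone)]
struct EnrollmentRow {
    hostname: String,
    revoked: bool,
    is_duplicate_hostname: bool,
}

/// Stand-in for the second query: hostnames with more than one active row.
fn duplicate_active_hostnames(rows: &[EnrollmentRow]) -> HashSet<String> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for row in rows.iter().filter(|r| !r.revoked) {
        *counts.entry(row.hostname.as_str()).or_insert(0) += 1;
    }
    counts
        .into_iter()
        .filter(|(_, n)| *n > 1)
        .map(|(host, _)| host.to_string())
        .collect()
}

/// Annotate rows in place; revoked rows are left unflagged (assumption).
fn annotate_duplicates(rows: &mut [EnrollmentRow]) {
    let duplicates = duplicate_active_hostnames(rows);
    for row in rows.iter_mut() {
        row.is_duplicate_hostname = !row.revoked && duplicates.contains(&row.hostname);
    }
}
```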

---

## Problems Encountered

- **Submodule detached HEAD (twice):** The submodule was 50 commits behind `origin/main` each time the coding agents worked on it. Required stash → checkout main → pull → stash pop → conflict resolution. Upstream had refactored Dashboard.tsx to a Triage/ExceptionStream layout and Layout.tsx to FunctionRail/InfrastructureSpine — the coding agents worked on the old layout. Gitea agent resolved both merges.
- **`role_override` field missing from new SQL query:** `get_agent_with_details_by_id` was copied from pre-migration-054 SQL. Migration 054 added `role_override: String` (non-optional) to `AgentWithDetails` and updated `get_all_agents_with_details` but not our new function. `sqlx::FromRow` returns a `ColumnNotFound` error at runtime when a struct field has no matching column in the result set, and the handler surfaced that as a 500 on all `GET /api/agents/:id` calls in production. Fixed in hotfix commit `6faa382`.
- **Route pointer left on old handler after merge:** The Gitea agent's conflict resolution for mod.rs kept the original `revoke_agent_key` route (single-layer revoke) rather than switching to `revoke_agent_key_handler` (dual-layer). Caught by manual review, fixed in `00129af`.
- **SSH to production server unavailable from BEAST:** A `known_hosts` file issue prevented SSH key auth during incident diagnosis. Worked around by getting a JWT token via the API and testing endpoints directly with curl.

---

## Configuration Changes

**GuruRMM submodule (`projects/msp-tools/guru-rmm/`):**

Server files modified:
- `server/src/db/agents.rs` — `get_agent_with_details_by_id`, `revoke_agent_key` (DB functions)
- `server/src/db/enroll.rs` — `get_enrollment_by_id`, `revoke_single_enrollment`, `list_enrollments_for_site`, `duplicate_active_hostnames` (4 new DB functions)
- `server/src/db/discovery.rs` — `DiscoveredDeviceWithContext`, `list_all_discovered_devices`
- `server/src/db/updates.rs` — `Serialize` derive added to `AgentUpdateRecord`
- `server/src/db/bsod_events.rs` — removed stale `#[allow(dead_code)]`
- `server/src/api/agents.rs` — updated `get_agent` handler; added `list_bsod_events`, `get_agent_version_history`, `revoke_agent_key` (old, kept), `revoke_agent_key_handler` (new dual-revoke)
- `server/src/api/enroll.rs` — `EnrollmentRecord` struct, `list_site_enrollments`, `revoke_enrollment` handlers
- `server/src/api/install_report.rs` — `InstallReport` struct, `list_install_reports`, `get_install_report` handlers
- `server/src/api/discovery.rs` — `list_all_devices` handler
- `server/src/api/mod.rs` — 10 new routes wired

Dashboard files modified:
- `dashboard/src/api/client.ts` — `AgentStats`, `BsodEvent`, `AgentUpdateRecord`, `InstallReport`, `DiscoveredDeviceWithContext`, `EnrollmentRecord` interfaces; `agentsApi` extensions; `installReportsApi`, `discoveryFleetApi`, `enrollApi` namespaces
- `dashboard/src/pages/AgentDetail.tsx` — Crashes tab, version history in Updates tab
- `dashboard/src/pages/Dashboard.tsx` — fleet stats wired to `/agents/stats`
- `dashboard/src/pages/SiteDetail.tsx` — Revoke Key button + Enrollment tab
- `dashboard/src/components/Tabs.tsx` — optional `badge` prop
- `dashboard/src/components/FunctionRail.tsx` — Install Reports + Fleet Discovery nav links
- `dashboard/src/App.tsx` — `/install-reports`, `/discovery` routes

Dashboard files created:
- `dashboard/src/pages/InstallReports.tsx`
- `dashboard/src/pages/Discovery.tsx`

**ClaudeTools repo:**
- `projects/msp-tools/guru-rmm/docs/UI_GAPS.md` — updated to mark 8 items complete, detail enrollment partial status

---

## Credentials & Secrets

- GuruRMM API admin: `claude-api@azcomputerguru.com` / vault: `infrastructure/gururmm-server.sops.yaml` field `admin-password`
- SSH: `guru@172.16.3.30` — SSH key auth failing from BEAST (known_hosts issue); password in vault `infrastructure/gururmm-server.sops.yaml` field `password`

---

## Infrastructure & Servers

- **GuruRMM server:** `172.16.3.30:3001` (Rust/Axum, systemd `gururmm-server`)
- **Dashboard:** `https://rmm.azcomputerguru.com` (nginx, `/var/www/gururmm/dashboard/`)
- **Gitea:** `https://git.azcomputerguru.com/azcomputerguru/gururmm` (repo: `azcomputerguru/gururmm`)
- **Build pipeline webhook:** `172.16.3.30:9000` (not reachable from BEAST — Gitea triggers it)
- **Coord API:** `http://172.16.3.30:8001/api/coord`

---

## Commands & Outputs

```bash
# Verified API server up during incident
curl -s "http://172.16.3.30:3001/api/auth/me"
# → "Missing authorization header" (server alive)

# Confirmed new build deployed (new endpoint returns 200)
curl -sv "http://172.16.3.30:3001/api/agents/<id>/bsod-events" -H "Authorization: Bearer $TOKEN"
# → HTTP/1.1 200 OK

# Confirmed bug: get_agent returning 500
curl -s "http://172.16.3.30:3001/api/agents/<id>" -H "Authorization: Bearer $TOKEN"
# → "Internal server error"

# Commits (gururmm submodule)
# 5854c63 — main UI gap batch
# 6faa382 — hotfix: role_override missing from get_agent_with_details_by_id
# 49a7109 — enrollment audit batch
# 00129af — fix: wire revoke_agent_key_handler to route

# Coord message handled: GURU-5070 skills empty, Phase 5c fix
# BEAST sync confirmed 19 skills deployed
```

---

## Pending / Incomplete Tasks

- **Tunnel Session Management (P2):** Server skeleton "not yet implemented". Needs: xterm.js design, WS protocol spec for dashboard client, proper Phase 2 implementation. Estimated 3-5 days separate effort.
- **Enrollment audit table, key identifier column:** The `enrolled_agents` table doesn't expose a `key_prefix` field (the key hash is intentionally hidden), so the UI shows hostname, dates, and IPs but no redacted key identifier. A future pass could add the first 8 characters of the key hash as a non-sensitive display field.
- **SSH from BEAST to production:** `known_hosts` file issue prevents SSH key auth. Should be fixed for incident response capability. Likely needs the production server's host key added to BEAST's known_hosts.
- **Tunnel (deferred):** See above.
- **Documentation (P3):** User guide, inline help tooltips — still not started.
- **`revoke_agent_key` (old handler):** Dead code now that the route points to `revoke_agent_key_handler`. Should be removed in a cleanup pass to avoid confusion.
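The key-identifier idea in the list above could look roughly like the helper below. It is only a sketch: `key_display_prefix` is a hypothetical name, the 8-character length comes from the pending item itself, and the trailing `...` marker is an assumption about how a truncated value might be shown in the UI.

```rust
/// Return the first `len` characters of a key hash for display.
/// A `...` suffix marks that the value was truncated (assumed UI convention).
fn key_display_prefix(key_hash: &str, len: usize) -> String {
    let prefix: String = key_hash.chars().take(len).collect();
    if key_hash.chars().count() > len {
        format!("{}...", prefix)
    } else {
        prefix
    }
}
```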

---

## Reference Information

- GuruRMM wiki: `wiki/projects/gururmm.md`
- UI gaps doc: `projects/msp-tools/guru-rmm/docs/UI_GAPS.md`
- enrolled_agents migration: `server/migrations/012_enrolled_agents.sql`
- Key commits:
  - `5854c63` — UI gap batch (7 server endpoints, 2 new pages, 5 dashboard enhancements)
  - `6faa382` — hotfix role_override in get_agent SQL
  - `49a7109` — enrollment audit batch
  - `00129af` — fix route pointer to dual-revoke handler
- Coord message addressed: GURU-5070 skills Phase 5c fix (commit `62fed033`)