sync: auto-sync from GURU-BEAST-ROG at 2026-06-07 21:26:22

Author: Mike Swanson
Machine: GURU-BEAST-ROG
Timestamp: 2026-06-07 21:26:22
This commit is contained in:
2026-06-07 21:26:26 -07:00
parent 62fed03362
commit 14362628a2
2 changed files with 213 additions and 6 deletions

View File

@@ -0,0 +1,146 @@
# Session Log — GuruRMM UI Gaps + Enrollment Audit
**Date:** 2026-06-07
**Topic:** GuruRMM UI/Dashboard — gap verification, batch implementation, hotfix, enrollment audit
## User
- **User:** Mike Swanson (mike)
- **Machine:** GURU-BEAST-ROG
- **Role:** admin
---
## Session Summary
Session started as a GuruRMM UI/Dashboard feedback thread via Discord bot. First pass verified all known UI gaps against the current dashboard source code (submodule was 50 commits behind origin/main at the time of the first coding pass, which caused a merge conflict scenario later). Eight gaps were confirmed open: enrollment management, AgentDetail client row not rendering, dashboard fleet counts computed client-side, tunnel session UI, fleet discovery page, install reporting page, BSOD crashes tab, and agent version history.
The main implementation batch addressed everything except tunnel (server skeleton is "not yet implemented" and needs xterm.js — deferred). A server-side Rust agent added six new/updated API endpoints: `get_agent` now returns `AgentWithDetails` (client_id + client_name via JOIN), `GET /api/agents/:id/bsod-events`, `GET /api/agents/:id/version-history`, `GET /api/install-reports` + `/:id`, `GET /api/discovery/all-devices`, and `DELETE /api/agents/:id/key`. A dashboard agent added two new pages (InstallReports, Discovery), a Crashes tab on AgentDetail, a version history table in the Updates tab, fleet stat wiring on Dashboard.tsx, and a Revoke Key button in SiteDetail. Code review approved all changes. The Gitea agent committed and pushed but the submodule was in detached HEAD 50 commits behind main, requiring a stash/pull/rebase merge that resolved five conflicts including Layout.tsx (upstream had switched to FunctionRail architecture) and Dashboard.tsx (upstream had switched to a Triage/ExceptionStream layout).
Immediately after the build deployed, Mike reported RMM was down. Diagnosis: the new `get_agent_with_details_by_id` SQL query was missing `a.role_override` — migration 054 had added `role_override: String` (required, non-optional) to `AgentWithDetails` and the existing `get_all_agents_with_details` already included it, but the new single-agent function was written from the pre-054 SQL. The `list_agents` endpoint continued working while `GET /api/agents/:id` returned 500. Hotfix was a one-line SQL change (`a.role_override` added), committed and pushed immediately (`6faa382`).
Enrollment work followed. Investigation revealed the `enrolled_agents` table (migration 012) already stored everything needed — enrolled_at, last_seen, ip_address, os_version, revoked — so no new migration was required. The original `DELETE /api/agents/:id/key` implementation only nulled `agents.api_key_hash` (legacy Mode 1/2 key), but WS auth checks `enrolled_agents.agent_key_hash` first (Mode 3 for `agk_` keys), so revocation was incomplete for modern agents. Implementation added four new DB functions, `GET /api/sites/:id/enrolled-agents` with duplicate-hostname annotation, `POST /api/enrolled-agents/:id/revoke`, and fixed `DELETE /api/agents/:id/key` to cascade-revoke both key layers. Dashboard gained an Enrollment tab on SiteDetail with full audit table, status badges, duplicate warnings, per-row revoke with confirm. A second detached-HEAD merge was required; the Gitea agent resolved conflicts but incorrectly kept the old route pointer — caught and fixed manually (`00129af`).
Session closed with a sync that also deployed Phase 5c of sync.sh (skills mirroring to `~/.claude/skills/`) per a coord message from GURU-5070.
---
## Key Decisions
- **Tunnel deferred:** Server `tunnel.rs` is a dead-code skeleton that logs "not yet implemented" — wiring the route would expose a non-functional endpoint. Excluded from this batch; needs proper design + xterm.js before implementation.
- **No migration for enrollment:** `enrolled_agents` table (migration 012) already had all required fields. The gap was API exposure and UI, not missing data.
- **Dual-layer revoke:** `DELETE /api/agents/:id/key` was redesigned to revoke both `agents.api_key_hash` (legacy Mode 1/2) and `enrolled_agents.revoked = TRUE` (modern agk_ Mode 3). Originally only the legacy field was touched, leaving modern agents able to reconnect.
- **Duplicate hostname detection via two queries:** Rather than a SQL window function, `list_site_enrollments` does a second query for `COUNT(*) > 1` grouped hostnames, then annotates in Rust. Simpler and safer with runtime sqlx (no compile-time macro).
- **`is_duplicate_hostname` scoped to active rows only:** Only non-revoked enrollments count toward the duplicate flag. A revoked + active pair is not flagged — the revocation resolved the ambiguity.
- **Per-row revoke vs. bulk revoke:** Added `POST /api/enrolled-agents/:id/revoke` for targeted per-enrollment revocation (with cascade to `agents.api_key_hash`). The existing `DELETE /api/agents/:id/key` is the bulk "revoke everything for this agent" path.
---
## Problems Encountered
- **Submodule detached HEAD (twice):** The submodule was 50 commits behind `origin/main` each time the coding agents worked on it. Required stash → checkout main → pull → stash pop → conflict resolution. Upstream had refactored Dashboard.tsx to a Triage/ExceptionStream layout and Layout.tsx to FunctionRail/InfrastructureSpine — the coding agents worked on the old layout. Gitea agent resolved both merges.
- **`role_override` field missing from new SQL query:** `get_agent_with_details_by_id` was copied from pre-migration-054 SQL. Migration 054 added `role_override: String` (non-optional) to `AgentWithDetails` and updated `get_all_agents_with_details` but not our new function. `sqlx::FromRow` panics at runtime when a required field has no column. Caused production 500s on all `GET /api/agents/:id` calls. Fixed in hotfix commit `6faa382`.
- **Route pointer left on old handler after merge:** The Gitea agent's conflict resolution for mod.rs kept the original `revoke_agent_key` route (single-layer revoke) rather than switching to `revoke_agent_key_handler` (dual-layer). Caught by manual review, fixed in `00129af`.
- **SSH to production server unavailable from BEAST:** `known_hosts` file issue prevented SSH key auth during incident diagnosis. Worked around by getting a JWT token via API and testing endpoints directly with curl.
---
## Configuration Changes
**GuruRMM submodule (`projects/msp-tools/guru-rmm/`):**
Server files modified:
- `server/src/db/agents.rs``get_agent_with_details_by_id`, `revoke_agent_key` (DB functions)
- `server/src/db/enroll.rs``get_enrollment_by_id`, `revoke_single_enrollment`, `list_enrollments_for_site`, `duplicate_active_hostnames` (4 new DB functions)
- `server/src/db/discovery.rs``DiscoveredDeviceWithContext`, `list_all_discovered_devices`
- `server/src/db/updates.rs``Serialize` derive added to `AgentUpdateRecord`
- `server/src/db/bsod_events.rs` — removed stale `#[allow(dead_code)]`
- `server/src/api/agents.rs` — updated `get_agent` handler; added `list_bsod_events`, `get_agent_version_history`, `revoke_agent_key` (old, kept), `revoke_agent_key_handler` (new dual-revoke)
- `server/src/api/enroll.rs``EnrollmentRecord` struct, `list_site_enrollments`, `revoke_enrollment` handlers
- `server/src/api/install_report.rs``InstallReport` struct, `list_install_reports`, `get_install_report` handlers
- `server/src/api/discovery.rs``list_all_devices` handler
- `server/src/api/mod.rs` — 10 new routes wired
Dashboard files modified:
- `dashboard/src/api/client.ts``AgentStats`, `BsodEvent`, `AgentUpdateRecord`, `InstallReport`, `DiscoveredDeviceWithContext`, `EnrollmentRecord` interfaces; `agentsApi` extensions; `installReportsApi`, `discoveryFleetApi`, `enrollApi` namespaces
- `dashboard/src/pages/AgentDetail.tsx` — Crashes tab, version history in Updates tab
- `dashboard/src/pages/Dashboard.tsx` — fleet stats wired to `/agents/stats`
- `dashboard/src/pages/SiteDetail.tsx` — Revoke Key button + Enrollment tab
- `dashboard/src/components/Tabs.tsx` — optional `badge` prop
- `dashboard/src/components/FunctionRail.tsx` — Install Reports + Fleet Discovery nav links
- `dashboard/src/App.tsx``/install-reports`, `/discovery` routes
Dashboard files created:
- `dashboard/src/pages/InstallReports.tsx`
- `dashboard/src/pages/Discovery.tsx`
**ClaudeTools repo:**
- `projects/msp-tools/guru-rmm/docs/UI_GAPS.md` — updated to mark 8 items complete, detail enrollment partial status
---
## Credentials & Secrets
- GuruRMM API admin: `claude-api@azcomputerguru.com` / vault: `infrastructure/gururmm-server.sops.yaml` field `admin-password`
- SSH: `guru@172.16.3.30` — SSH key auth failing from BEAST (known_hosts issue); password in vault `infrastructure/gururmm-server.sops.yaml` field `password`
---
## Infrastructure & Servers
- **GuruRMM server:** `172.16.3.30:3001` (Rust/Axum, systemd `gururmm-server`)
- **Dashboard:** `https://rmm.azcomputerguru.com` (nginx, `/var/www/gururmm/dashboard/`)
- **Gitea:** `https://git.azcomputerguru.com/azcomputerguru/gururmm` (repo: `azcomputerguru/gururmm`)
- **Build pipeline webhook:** `172.16.3.30:9000` (not reachable from BEAST — Gitea triggers it)
- **Coord API:** `http://172.16.3.30:8001/api/coord`
---
## Commands & Outputs
```bash
# Verified API server up during incident
curl -s "http://172.16.3.30:3001/api/auth/me"
# → "Missing authorization header" (server alive)
# Confirmed new build deployed (new endpoint returns 200)
curl -sv "http://172.16.3.30:3001/api/agents/<id>/bsod-events" -H "Authorization: Bearer $TOKEN"
# → HTTP/1.1 200 OK
# Confirmed bug: get_agent returning 500
curl -s "http://172.16.3.30:3001/api/agents/<id>" -H "Authorization: Bearer $TOKEN"
# → "Internal server error"
# Hotfix commits (gururmm submodule)
# 5854c63 — main UI gap batch
# 6faa382 — hotfix: role_override missing from get_agent_with_details_by_id
# 49a7109 — enrollment audit batch
# 00129af — fix: wire revoke_agent_key_handler to route
# Coord message handled: GURU-5070 skills empty, Phase 5c fix
# BEAST sync confirmed 19 skills deployed
```
---
## Pending / Incomplete Tasks
- **Tunnel Session Management (P2):** Server skeleton "not yet implemented". Needs: xterm.js design, WS protocol spec for dashboard client, proper Phase 2 implementation. Estimated 3-5 days separate effort.
- **Enrollment audit log (enrollment table) — key status column:** The `enrolled_agents` table doesn't expose a "key_prefix" field (key hash is intentionally hidden). The UI shows hostname, dates, IPs but not a redacted key identifier. Could add first-8-chars of key hash as a non-sensitive display field in a future pass.
- **SSH from BEAST to production:** `known_hosts` file issue prevents SSH key auth. Should be fixed for incident response capability. Likely needs the production server's host key added to BEAST's known_hosts.
- **Tunnel (deferred):** See above.
- **Documentation (P3):** User guide, inline help tooltips — still not started.
- **`revoke_agent_key` (old handler):** Dead code now that route points to `revoke_agent_key_handler`. Should be removed in a cleanup pass to avoid confusion.
---
## Reference Information
- GuruRMM wiki: `wiki/projects/gururmm.md`
- UI gaps doc: `projects/msp-tools/guru-rmm/docs/UI_GAPS.md`
- enrolled_agents migration: `server/migrations/012_enrolled_agents.sql`
- Key commits:
- `5854c63` — UI gap batch (7 server endpoints, 2 new pages, 5 dashboard enhancements)
- `6faa382` — hotfix role_override in get_agent SQL
- `49a7109` — enrollment audit batch
- `00129af` — fix route pointer to dual-revoke handler
- Coord message addressed: GURU-5070 skills Phase 5c fix (commit `62fed033`)