sync: auto-sync from GURU-BEAST-ROG at 2026-06-07 21:26:22
Author: Mike Swanson Machine: GURU-BEAST-ROG Timestamp: 2026-06-07 21:26:22
This commit is contained in:
146
session-logs/2026-06-07-mike-gururmm-ui-gaps-enrollment.md
Normal file
146
session-logs/2026-06-07-mike-gururmm-ui-gaps-enrollment.md
Normal file
@@ -0,0 +1,146 @@
|
||||
# Session Log — GuruRMM UI Gaps + Enrollment Audit
|
||||
|
||||
**Date:** 2026-06-07
|
||||
**Topic:** GuruRMM UI/Dashboard — gap verification, batch implementation, hotfix, enrollment audit
|
||||
|
||||
## User
|
||||
- **User:** Mike Swanson (mike)
|
||||
- **Machine:** GURU-BEAST-ROG
|
||||
- **Role:** admin
|
||||
|
||||
---
|
||||
|
||||
## Session Summary
|
||||
|
||||
Session started as a GuruRMM UI/Dashboard feedback thread via Discord bot. First pass verified all known UI gaps against the current dashboard source code (submodule was 50 commits behind origin/main at the time of the first coding pass, which caused a merge conflict scenario later). Eight gaps were confirmed open: enrollment management, AgentDetail client row not rendering, dashboard fleet counts computed client-side, tunnel session UI, fleet discovery page, install reporting page, BSOD crashes tab, and agent version history.
|
||||
|
||||
The main implementation batch addressed everything except tunnel (server skeleton is "not yet implemented" and needs xterm.js — deferred). A server-side Rust agent added six new/updated API endpoints: `get_agent` now returns `AgentWithDetails` (client_id + client_name via JOIN), `GET /api/agents/:id/bsod-events`, `GET /api/agents/:id/version-history`, `GET /api/install-reports` + `/:id`, `GET /api/discovery/all-devices`, and `DELETE /api/agents/:id/key`. A dashboard agent added two new pages (InstallReports, Discovery), a Crashes tab on AgentDetail, a version history table in the Updates tab, fleet stat wiring on Dashboard.tsx, and a Revoke Key button in SiteDetail. Code review approved all changes. The Gitea agent committed and pushed but the submodule was in detached HEAD 50 commits behind main, requiring a stash/pull/rebase merge that resolved five conflicts including Layout.tsx (upstream had switched to FunctionRail architecture) and Dashboard.tsx (upstream had switched to a Triage/ExceptionStream layout).
|
||||
|
||||
Immediately after the build deployed, Mike reported RMM was down. Diagnosis: the new `get_agent_with_details_by_id` SQL query was missing `a.role_override` — migration 054 had added `role_override: String` (required, non-optional) to `AgentWithDetails` and the existing `get_all_agents_with_details` already included it, but the new single-agent function was written from the pre-054 SQL. The `list_agents` endpoint continued working while `GET /api/agents/:id` returned 500. Hotfix was a one-line SQL change (`a.role_override` added), committed and pushed immediately (`6faa382`).
|
||||
|
||||
Enrollment work followed. Investigation revealed the `enrolled_agents` table (migration 012) already stored everything needed — enrolled_at, last_seen, ip_address, os_version, revoked — so no new migration was required. The original `DELETE /api/agents/:id/key` implementation only nulled `agents.api_key_hash` (legacy Mode 1/2 key), but WS auth checks `enrolled_agents.agent_key_hash` first (Mode 3 for `agk_` keys), so revocation was incomplete for modern agents. Implementation added four new DB functions, `GET /api/sites/:id/enrolled-agents` with duplicate-hostname annotation, `POST /api/enrolled-agents/:id/revoke`, and fixed `DELETE /api/agents/:id/key` to cascade-revoke both key layers. Dashboard gained an Enrollment tab on SiteDetail with full audit table, status badges, duplicate warnings, per-row revoke with confirm. A second detached-HEAD merge was required; the Gitea agent resolved conflicts but incorrectly kept the old route pointer — caught and fixed manually (`00129af`).
|
||||
|
||||
Session closed with a sync that also deployed Phase 5c of sync.sh (skills mirroring to `~/.claude/skills/`) per a coord message from GURU-5070.
|
||||
|
||||
---
|
||||
|
||||
## Key Decisions
|
||||
|
||||
- **Tunnel deferred:** Server `tunnel.rs` is a dead-code skeleton that logs "not yet implemented" — wiring the route would expose a non-functional endpoint. Excluded from this batch; needs proper design + xterm.js before implementation.
|
||||
- **No migration for enrollment:** `enrolled_agents` table (migration 012) already had all required fields. The gap was API exposure and UI, not missing data.
|
||||
- **Dual-layer revoke:** `DELETE /api/agents/:id/key` was redesigned to revoke both `agents.api_key_hash` (legacy Mode 1/2) and `enrolled_agents.revoked = TRUE` (modern agk_ Mode 3). Originally only the legacy field was touched, leaving modern agents able to reconnect.
|
||||
- **Duplicate hostname detection via two queries:** Rather than a SQL window function, `list_site_enrollments` does a second query for `COUNT(*) > 1` grouped hostnames, then annotates in Rust. Simpler and safer with runtime sqlx (no compile-time macro).
|
||||
- **`is_duplicate_hostname` scoped to active rows only:** Only non-revoked enrollments count toward the duplicate flag. A revoked + active pair is not flagged — the revocation resolved the ambiguity.
|
||||
- **Per-row revoke vs. bulk revoke:** Added `POST /api/enrolled-agents/:id/revoke` for targeted per-enrollment revocation (with cascade to `agents.api_key_hash`). The existing `DELETE /api/agents/:id/key` is the bulk "revoke everything for this agent" path.
|
||||
|
||||
---
|
||||
|
||||
## Problems Encountered
|
||||
|
||||
- **Submodule detached HEAD (twice):** The submodule was 50 commits behind `origin/main` each time the coding agents worked on it. Required stash → checkout main → pull → stash pop → conflict resolution. Upstream had refactored Dashboard.tsx to a Triage/ExceptionStream layout and Layout.tsx to FunctionRail/InfrastructureSpine — the coding agents worked on the old layout. Gitea agent resolved both merges.
|
||||
- **`role_override` field missing from new SQL query:** `get_agent_with_details_by_id` was copied from pre-migration-054 SQL. Migration 054 added `role_override: String` (non-optional) to `AgentWithDetails` and updated `get_all_agents_with_details` but not our new function. `sqlx::FromRow` panics at runtime when a required field has no column. Caused production 500s on all `GET /api/agents/:id` calls. Fixed in hotfix commit `6faa382`.
|
||||
- **Route pointer left on old handler after merge:** The Gitea agent's conflict resolution for mod.rs kept the original `revoke_agent_key` route (single-layer revoke) rather than switching to `revoke_agent_key_handler` (dual-layer). Caught by manual review, fixed in `00129af`.
|
||||
- **SSH to production server unavailable from BEAST:** `known_hosts` file issue prevented SSH key auth during incident diagnosis. Worked around by getting a JWT token via API and testing endpoints directly with curl.
|
||||
|
||||
---
|
||||
|
||||
## Configuration Changes
|
||||
|
||||
**GuruRMM submodule (`projects/msp-tools/guru-rmm/`):**
|
||||
|
||||
Server files modified:
|
||||
- `server/src/db/agents.rs` — `get_agent_with_details_by_id`, `revoke_agent_key` (DB functions)
|
||||
- `server/src/db/enroll.rs` — `get_enrollment_by_id`, `revoke_single_enrollment`, `list_enrollments_for_site`, `duplicate_active_hostnames` (4 new DB functions)
|
||||
- `server/src/db/discovery.rs` — `DiscoveredDeviceWithContext`, `list_all_discovered_devices`
|
||||
- `server/src/db/updates.rs` — `Serialize` derive added to `AgentUpdateRecord`
|
||||
- `server/src/db/bsod_events.rs` — removed stale `#[allow(dead_code)]`
|
||||
- `server/src/api/agents.rs` — updated `get_agent` handler; added `list_bsod_events`, `get_agent_version_history`, `revoke_agent_key` (old, kept), `revoke_agent_key_handler` (new dual-revoke)
|
||||
- `server/src/api/enroll.rs` — `EnrollmentRecord` struct, `list_site_enrollments`, `revoke_enrollment` handlers
|
||||
- `server/src/api/install_report.rs` — `InstallReport` struct, `list_install_reports`, `get_install_report` handlers
|
||||
- `server/src/api/discovery.rs` — `list_all_devices` handler
|
||||
- `server/src/api/mod.rs` — 10 new routes wired
|
||||
|
||||
Dashboard files modified:
|
||||
- `dashboard/src/api/client.ts` — `AgentStats`, `BsodEvent`, `AgentUpdateRecord`, `InstallReport`, `DiscoveredDeviceWithContext`, `EnrollmentRecord` interfaces; `agentsApi` extensions; `installReportsApi`, `discoveryFleetApi`, `enrollApi` namespaces
|
||||
- `dashboard/src/pages/AgentDetail.tsx` — Crashes tab, version history in Updates tab
|
||||
- `dashboard/src/pages/Dashboard.tsx` — fleet stats wired to `/agents/stats`
|
||||
- `dashboard/src/pages/SiteDetail.tsx` — Revoke Key button + Enrollment tab
|
||||
- `dashboard/src/components/Tabs.tsx` — optional `badge` prop
|
||||
- `dashboard/src/components/FunctionRail.tsx` — Install Reports + Fleet Discovery nav links
|
||||
- `dashboard/src/App.tsx` — `/install-reports`, `/discovery` routes
|
||||
|
||||
Dashboard files created:
|
||||
- `dashboard/src/pages/InstallReports.tsx`
|
||||
- `dashboard/src/pages/Discovery.tsx`
|
||||
|
||||
**ClaudeTools repo:**
|
||||
- `projects/msp-tools/guru-rmm/docs/UI_GAPS.md` — updated to mark 8 items complete, detail enrollment partial status
|
||||
|
||||
---
|
||||
|
||||
## Credentials & Secrets
|
||||
|
||||
- GuruRMM API admin: `claude-api@azcomputerguru.com` / vault: `infrastructure/gururmm-server.sops.yaml` field `admin-password`
|
||||
- SSH: `guru@172.16.3.30` — SSH key auth failing from BEAST (known_hosts issue); password in vault `infrastructure/gururmm-server.sops.yaml` field `password`
|
||||
|
||||
---
|
||||
|
||||
## Infrastructure & Servers
|
||||
|
||||
- **GuruRMM server:** `172.16.3.30:3001` (Rust/Axum, systemd `gururmm-server`)
|
||||
- **Dashboard:** `https://rmm.azcomputerguru.com` (nginx, `/var/www/gururmm/dashboard/`)
|
||||
- **Gitea:** `https://git.azcomputerguru.com/azcomputerguru/gururmm` (repo: `azcomputerguru/gururmm`)
|
||||
- **Build pipeline webhook:** `172.16.3.30:9000` (not reachable from BEAST — Gitea triggers it)
|
||||
- **Coord API:** `http://172.16.3.30:8001/api/coord`
|
||||
|
||||
---
|
||||
|
||||
## Commands & Outputs
|
||||
|
||||
```bash
|
||||
# Verified API server up during incident
|
||||
curl -s "http://172.16.3.30:3001/api/auth/me"
|
||||
# → "Missing authorization header" (server alive)
|
||||
|
||||
# Confirmed new build deployed (new endpoint returns 200)
|
||||
curl -sv "http://172.16.3.30:3001/api/agents/<id>/bsod-events" -H "Authorization: Bearer $TOKEN"
|
||||
# → HTTP/1.1 200 OK
|
||||
|
||||
# Confirmed bug: get_agent returning 500
|
||||
curl -s "http://172.16.3.30:3001/api/agents/<id>" -H "Authorization: Bearer $TOKEN"
|
||||
# → "Internal server error"
|
||||
|
||||
# Hotfix commits (gururmm submodule)
|
||||
# 5854c63 — main UI gap batch
|
||||
# 6faa382 — hotfix: role_override missing from get_agent_with_details_by_id
|
||||
# 49a7109 — enrollment audit batch
|
||||
# 00129af — fix: wire revoke_agent_key_handler to route
|
||||
|
||||
# Coord message handled: GURU-5070 skills empty, Phase 5c fix
|
||||
# BEAST sync confirmed 19 skills deployed
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Pending / Incomplete Tasks
|
||||
|
||||
- **Tunnel Session Management (P2):** Server skeleton "not yet implemented". Needs: xterm.js design, WS protocol spec for dashboard client, proper Phase 2 implementation. Estimated 3-5 days separate effort.
|
||||
- **Enrollment audit log (enrollment table) — key status column:** The `enrolled_agents` table doesn't expose a "key_prefix" field (key hash is intentionally hidden). The UI shows hostname, dates, IPs but not a redacted key identifier. Could add first-8-chars of key hash as a non-sensitive display field in a future pass.
|
||||
- **SSH from BEAST to production:** `known_hosts` file issue prevents SSH key auth. Should be fixed for incident response capability. Likely needs the production server's host key added to BEAST's known_hosts.
|
||||
- **Tunnel (deferred):** See above.
|
||||
- **Documentation (P3):** User guide, inline help tooltips — still not started.
|
||||
- **`revoke_agent_key` (old handler):** Dead code now that route points to `revoke_agent_key_handler`. Should be removed in a cleanup pass to avoid confusion.
|
||||
|
||||
---
|
||||
|
||||
## Reference Information
|
||||
|
||||
- GuruRMM wiki: `wiki/projects/gururmm.md`
|
||||
- UI gaps doc: `projects/msp-tools/guru-rmm/docs/UI_GAPS.md`
|
||||
- enrolled_agents migration: `server/migrations/012_enrolled_agents.sql`
|
||||
- Key commits:
|
||||
- `5854c63` — UI gap batch (7 server endpoints, 2 new pages, 5 dashboard enhancements)
|
||||
- `6faa382` — hotfix role_override in get_agent SQL
|
||||
- `49a7109` — enrollment audit batch
|
||||
- `00129af` — fix route pointer to dual-revoke handler
|
||||
- Coord message addressed: GURU-5070 skills Phase 5c fix (commit `62fed033`)
|
||||
@@ -3,7 +3,7 @@ type: project
|
||||
name: gururmm
|
||||
display_name: GuruRMM
|
||||
last_compiled: 2026-06-07
|
||||
compiled_by: Mikes-MacBook-Air/claude-main
|
||||
compiled_by: GURU-BEAST-ROG/discord-bot
|
||||
aliases:
|
||||
- guru-rmm
|
||||
sources:
|
||||
@@ -53,6 +53,7 @@ sources:
|
||||
- "live GuruRMM Postgres query 2026-06-04: agents/sites/update_rollouts/agent_updates tables (channel verification)"
|
||||
- session-logs/2026-06-07-mike-gururmm-backup-alert-cleanup.md
|
||||
- session-logs/2026-06-07-mike-gururmm-offline-alerting-mute.md
|
||||
- session-logs/2026-06-07-mike-gururmm-ui-gaps-enrollment.md
|
||||
backlinks:
|
||||
- clients/cascades-tucson
|
||||
- systems/gururmm-build
|
||||
@@ -82,6 +83,33 @@ GuruRMM is a Remote Monitoring & Management platform built by Arizona Computer G
|
||||
|
||||
## Recent Work
|
||||
|
||||
### 2026-06-07 — UI Gap Batch + Enrollment Audit (GURU-BEAST-ROG session)
|
||||
|
||||
**API changes:**
|
||||
- `get_agent` handler now returns `AgentWithDetails` — includes `client_id` and `client_name` via JOIN query (was previously missing client context)
|
||||
- New endpoints: `GET /api/agents/:id/bsod-events`, `GET /api/agents/:id/version-history`
|
||||
- New endpoints: `GET /api/install-reports`, `GET /api/install-reports/:id`
|
||||
- New endpoint: `GET /api/discovery/all-devices`
|
||||
- `DELETE /api/agents/:id/key` redesigned as dual-revoke: nulls `agents.api_key_hash` (legacy Mode 1/2) AND sets `enrolled_agents.revoked = TRUE` (modern agk_ Mode 3). Previous implementation only touched the legacy field, leaving modern agents able to reconnect.
|
||||
- New endpoint: `GET /api/sites/:id/enrolled-agents` — full enrollment audit with duplicate-hostname annotation (scoped to active/non-revoked rows)
|
||||
- New endpoint: `POST /api/enrolled-agents/:id/revoke` — per-enrollment targeted revocation with cascade to `agents.api_key_hash`
|
||||
|
||||
**Dashboard additions:**
|
||||
- AgentDetail: Crashes tab (BSOD events), version history table in Updates tab
|
||||
- Dashboard.tsx: fleet stats wired to `/agents/stats` endpoint (was computed client-side)
|
||||
- SiteDetail: Revoke Key button + Enrollment tab with full audit table, status badges, duplicate-hostname warnings, per-row revoke with confirm dialog
|
||||
- New pages: InstallReports (`/install-reports`), Discovery (`/discovery`)
|
||||
- FunctionRail: nav links for Install Reports + Fleet Discovery
|
||||
|
||||
**Hotfix (production 500s):**
|
||||
- `get_agent_with_details_by_id` SQL was missing `a.role_override` — migration 054 added `role_override: String` (non-optional) to `AgentWithDetails` and the pre-054 SQL template was used for the new function. `sqlx::FromRow` panics at runtime when a required field has no column. All `GET /api/agents/:id` calls returned 500 until hotfix commit `6faa382`.
|
||||
|
||||
**Key commits:** `5854c63` (UI gap batch), `6faa382` (hotfix: role_override in SQL), `49a7109` (enrollment audit batch), `00129af` (fix: wire revoke_agent_key_handler to route)
|
||||
|
||||
**Tunnel status:** Server `tunnel.rs` is a dead-code skeleton logging "not yet implemented" — deferred; needs xterm.js design + WS protocol spec before implementation.
|
||||
|
||||
---
|
||||
|
||||
### 2026-06-07 — Credential Inheritance Deployment & Offboarding Spec
|
||||
|
||||
**Deployed to production:**
|
||||
@@ -175,6 +203,34 @@ Agent<->server communication is a persistent authenticated WebSocket with auto-r
|
||||
- Enrollment & keys (`012`): per-agent key issuance on first run, site API keys (regenerable), site-specific MSI with SITEKEY injected at download, public install-report ingestion. Legacy PowerShell agent path for Server 2008 R2.
|
||||
- Logs: agent log upload (periodic + on-demand), per-agent events (`042`), fleet log view, AI-assisted log analysis (`/logs/analyze`) — AI-optional per locked decision.
|
||||
|
||||
### Enrollment Detail (migration 012 + 2026-06-07 audit)
|
||||
|
||||
The `enrolled_agents` table (migration 012) is the authoritative enrollment history log:
|
||||
|
||||
| Field | Notes |
|
||||
|---|---|
|
||||
| `site_id` | Site the agent enrolled under |
|
||||
| `agent_id` | Set on WebSocket connect (links enrollment record to the agent row) |
|
||||
| `hostname` | Hostname at enrollment time |
|
||||
| `agent_key_hash` | Hash of the `agk_` enrollment key (intentionally hidden; not surfaced in API) |
|
||||
| `enrolled_at` | Timestamp of initial enrollment |
|
||||
| `last_seen` | Updated on each WS connection |
|
||||
| `ip_address` | IP at last connection |
|
||||
| `os_version` | OS at enrollment time |
|
||||
| `revoked` | Boolean; set TRUE by revocation endpoints |
|
||||
|
||||
**WS auth modes:**
|
||||
- **Mode 3** (modern, `agk_` keys): checks `enrolled_agents.agent_key_hash` first
|
||||
- **Mode 1/2** (legacy): falls back to `agents.api_key_hash`
|
||||
|
||||
**Revocation — dual-layer (fixed 2026-06-07):**
|
||||
- `DELETE /api/agents/:id/key` (bulk): revokes BOTH `agents.api_key_hash` (Mode 1/2) AND sets `enrolled_agents.revoked = TRUE` for all enrollment records (Mode 3). Previously only touched the legacy field.
|
||||
- `POST /api/enrolled-agents/:id/revoke` (targeted): per-enrollment revocation with cascade to `agents.api_key_hash`.
|
||||
|
||||
**Enrollment audit endpoint:** `GET /api/sites/:id/enrolled-agents` — returns full enrollment history with `is_duplicate_hostname` annotation. Duplicate flag is scoped to active (non-revoked) rows only; a revoked + active pair is not flagged.
|
||||
|
||||
**Dashboard:** Enrollment tab on SiteDetail — audit table with status badges, duplicate-hostname warnings, and per-row Revoke button with confirmation dialog (shipped 2026-06-07).
|
||||
|
||||
### Platform coverage
|
||||
- **Cross-platform (Win/Linux/macOS):** metrics, network state, hardware/software/service inventory, user/group + DC/domain detection, checks (service checks Win+Linux), discovery, self-update, scripts/commands.
|
||||
- **Windows-only:** `user_session` command context (WTS), LibreHardwareMonitor temps, remote registry, tray-into-session, watchdog SCM supervision, BSOD detection.
|
||||
@@ -266,7 +322,7 @@ As of 2026-06-07 (agent 0.6.54 beta / 0.6.47 stable / server 0.3.45):
|
||||
- **Credential inheritance (deployed 2026-06-07):** Production server running v0.3.45 with full credential inheritance and de-duplication. `/effective` endpoints validated. Dashboard clickable alert badges and client-scoped filtering implemented.
|
||||
- **SPEC-028 offboarding wizard (specification complete):** 835-line spec created for site and client offboarding workflows. Includes data export, dependency analysis, typed confirmation, and audit logging. Roadmap updated with "Client & Site Lifecycle Management" section. Implementation pending.
|
||||
- **BUG-020 — tray duplicate/ghost icons (fixed to beta, 2026-06-04):** Commit `137dd85` shipped to main -> beta. Fix #1: per-session `Local\GuruRMM_Tray` single-instance mutex in the tray binary. Fix #2: `TrayLauncher` reconciliation via `WTSEnumerateProcessesW` (idempotent across watchdog restarts). Fix #3: graceful `Global\GuruRMM_TrayShutdown_{sid}` event -> 3s wait -> `TerminateProcess` fallback (so `NIM_DELETE` fires and ghost icon is cleaned). [NOTE: Fix #3 is implemented but dormant — `terminate_all` has no caller in the agent yet. Tracked in coord todo `25fdf31a` to wire into the watchdog policy-disable/uninstall path.]
|
||||
- **BSOD detection Phase 2/3 (deferred):** Dashboard "Crashes" tab + BSOD in Alerts stream (issue #10, dashboard bullets unchecked); `fetch_bsod_dump` on-demand upload; full ~350-entry bugcheck name table (Phase 1 ships a 10-code map).
|
||||
- **BSOD detection Phase 2/3 (deferred):** Dashboard "Crashes" tab shipped 2026-06-07 (AgentDetail Crashes tab + version history in Updates tab). Remaining: BSOD in Alerts stream (issue #10); `fetch_bsod_dump` on-demand upload; full ~350-entry bugcheck name table (Phase 1 ships a 10-code map).
|
||||
- **Linux fleet unit drift:** Auto-updater replaces the binary but does NOT refresh the systemd unit file. Pre-BUG-016-fix Linux agents have new binary + old unit (missing `StateDirectory=gururmm`). Needs an ops-script pass via `/rmm` or organic at next reinstall.
|
||||
- **Tray IPC + peer authorization** — Linux tray merged (PR #13+#14). Open: Windows peer authz (#16), logind console-user resolution (#17), macOS tray (#18), subscriber broadcast (#19).
|
||||
- **Auto-update reliability** — BB-SERVER and RECEPTIONIST-PC (Cascades) miss dispatch windows due to flaky WebSockets. Re-querying pending updates on reconnect: incomplete as of 2026-05-24.
|
||||
@@ -274,6 +330,10 @@ As of 2026-06-07 (agent 0.6.54 beta / 0.6.47 stable / server 0.3.45):
|
||||
- **MSP360 backup integration** — Alert quality pass shipped 2026-06-07 (commits `779f7f6` + `b82c010`): false `backup_failed` alerts 15 -> 2; `backup_storage_low` removed (structurally false signal; 5 -> 0 false alerts). Dashboard UI shipped 2026-05-31. Phase 2 (management) not started; genuine destination-capacity alerting deferred (needs MSP360 storage-accounts endpoint).
|
||||
- **MSP360 API scope (confirmed 2026-06-07):** Provider API token (vault `msp-tools/msp360-api.sops.yaml`) is monitoring-only at our tier: Companies/Users/Monitoring GET endpoints; every management path (plan delete, manual run, storage config) returns 404. White-labeled MSP360 agent has no `cbb.exe` at any standard path. Plan delete/run/storage remain MSP360-console tasks.
|
||||
- **Alert mute dashboard (Task 5) -- NOT started:** Ignore button + required-reason prompt on alert row; Muted filter on alerts page + agent Alerts tab; Un-ignore control; muted rows show Un-ignore instead of Ack/Resolve. Server endpoints ready: `POST /api/alerts/:id/mute` + `/unmute`.
|
||||
- **Tunnel session management (P2, deferred):** Server `tunnel.rs` is a dead-code skeleton that logs "not yet implemented"; wiring the route would expose a non-functional endpoint. Needs: xterm.js design, WS protocol spec for dashboard client, proper Phase 2 implementation. Estimated 3-5 days separate effort.
|
||||
- **SSH from BEAST to production broken:** `known_hosts` issue prevents SSH key auth from GURU-BEAST-ROG to `172.16.3.30`. Workaround during incident: JWT token + curl for endpoint diagnosis. Needs production server host key added to BEAST's `~/.ssh/known_hosts`.
|
||||
- **`revoke_agent_key` dead code:** Old single-layer handler is dead code now that the route points to `revoke_agent_key_handler`. Should be removed in a cleanup pass.
|
||||
- **Enrollment key display gap:** `enrolled_agents` table doesn't expose a key prefix — key hash is intentionally hidden. UI shows hostname, dates, IPs but not a redacted key identifier. Could add first-8-chars of hash as a non-sensitive display field in a future pass.
|
||||
- **Offline alerting -- classification gap:** SIF-SERVER (SifOidak.local DC), Server2013 (Sombra), and other inventory-less offline servers currently auto-classify as workstations (no `os_product_type`; `os_name` doesn't match ~/server/i for those names). Needs manual `role_override` via `PUT /api/agents/:id/role-override` -- Mike held on tagging them.
|
||||
- **`/backup-status` endpoint shape gap:** Returns only one plan per agent (not all plans); makes agents with a dead old plan + healthy current plan look stale-but-green in the backup tab. Compliance domain evaluates all plans correctly. Not fixed this session — noted for future.
|
||||
- **Security audit backlog:** `credentials/:id/reveal` horizontal privilege escalation (HIGH), `internal_err()` raw DB errors at ~130 call sites (HIGH).
|
||||
@@ -406,16 +466,15 @@ Gitea push to main
|
||||
- Response: `stdout`, `stderr`, `exit_code`, `status` (running/completed/failed/timeout/interrupted)
|
||||
|
||||
**Dashboard — complete and working:**
|
||||
Agents management, Clients/Sites CRUD, Commands execution + terminal, Logs + AI analysis, Alerts (with clickable severity badges + client filtering), Metrics (CPU/RAM/disk/network, process drill-down modal), Auto-update triggering, Network state, Entra ID SSO (Entra only — Google planned per SPEC-008, not implemented), Policies Dashboard (all tabs), Registry editor, MSP360 backup status card + agent<->backup mappings/verify UI, Organizations management + dev-admin impersonation UI, Credentials management with inheritance support.
|
||||
Agents management, Clients/Sites CRUD, Commands execution + terminal, Logs + AI analysis, Alerts (with clickable severity badges + client filtering), Metrics (CPU/RAM/disk/network, process drill-down modal), Auto-update triggering, Network state, Entra ID SSO (Entra only — Google planned per SPEC-008, not implemented), Policies Dashboard (all tabs), Registry editor, MSP360 backup status card + agent<->backup mappings/verify UI, Organizations management + dev-admin impersonation UI, Credentials management with inheritance support, AgentDetail Crashes tab (BSOD events), AgentDetail version history in Updates tab, fleet stats from `/agents/stats`, SiteDetail Revoke Key button + Enrollment audit tab (with duplicate-hostname badge), Install Reports page, Fleet Discovery page.
|
||||
|
||||
**Dashboard — incomplete (see UI_GAPS.md):**
|
||||
- Enrollment management UI (revoke keys, audit log, duplicate hostname warnings)
|
||||
- Watchdog alerts UI — blocked on 2 missing server routes
|
||||
- BSOD/Crashes tab on Agent Detail (Phase 2 deferred)
|
||||
- BSOD in Alerts stream (Phase 2 deferred)
|
||||
- Tunnel session management (interactive terminal — backend skeleton, not production-ready)
|
||||
- Tunnel session management (interactive terminal — server skeleton "not yet implemented"; needs xterm.js design + WS protocol spec)
|
||||
- Offboarding wizard UI (SPEC-028 complete, implementation pending)
|
||||
- Alert mute UI -- Ignore button + required-reason prompt, Muted filter, Un-ignore control (server endpoints ready; dashboard Task 5 NOT started)
|
||||
- Documentation / user guide / inline help tooltips (P3, not started)
|
||||
|
||||
**Open Gitea issues:**
|
||||
- #10 — BSOD detection Phase 2/3 (dashboard + fetch_bsod_dump + full bugcheck table)
|
||||
@@ -478,6 +537,7 @@ These decisions are locked. Do not reverse without explicit user approval.
|
||||
| 2026-06-07 | Backup-alert quality pass shipped. FU1 (`summarize_backup_error` decodes MSP360 message JSON; `create_or_update_alert` now refreshes title/message/severity on re-trigger, also fixes latent severity-escalation freeze) + FU2 (exclude non-backup PlanTypes 8=Restore/13=Consistency-check from alerting/compliance): false `backup_failed` alerts 15 -> 2 fleet-wide (survivors AD1, LAB-Becky are genuine and self-describing), commit `779f7f6`. `backup_storage_low` alert type removed entirely (commit `b82c010`): `DataCopied/TotalData` measures backup-dataset completeness, not destination capacity — produced 5 fleet-wide false alerts including DF-HYPERV-B "100% Full" on a 4 GB plan; `resolve_all_backup_storage_alerts` (type-scoped, idempotent, once-per-tick) clears stragglers; 5 -> 0 verified after 17:21:41 UTC restart. Genuine destination-capacity alerting deferred (needs MSP360 storage-accounts endpoint). `BACKUP_STALE` evaluator confirmed already correct — no new code. Both commits on main. Submodule pinned at `226ba9f` in parent. |
|
||||
| 2026-06-07 | Credential inheritance deployed to production (server v0.3.45). Hierarchical credential propagation (Global → Client → Site) with `is_inheritable` flag and de-duplication by (credential_type, label). `/effective` endpoints validated. Dashboard UI: clickable alert severity badges with client filtering, offline badge now scopes to client-specific agents. SPEC-028 offboarding wizard specification created (835 lines) covering site and client offboarding workflows with data export, dependency analysis, typed confirmation, and audit logging. FEATURE_ROADMAP.md updated with "Client & Site Lifecycle Management" section. |
|
||||
| 2026-06-07 | Role-aware offline alerting + alert ignore/mute shipped (second session). Offline sweep evaluator (60s tokio interval, `server/src/alerts/offline.rs`): server-only `agent_offline` alerts (classifier: `os_product_type` 2/3, else `os_name` ~/server/i, else `role_override`); site rule (>=50% + >=3) -> `mass_offline_site`; fleet rule (>=10) -> `mass_offline_fleet`; aggregates pinned to representative agent with site/fleet `dedup_key`. Warm-up restart guard (code review caught + fixed `last_seen < started_at` spec defect — permanently false in steady state). Migration 054 (`agents.role_override`). Dashboard triage: servers elevated individually; workstations collapsed into roll-up. `PUT /api/agents/:id/role-override`. Verified live vs WIN-TG2STMODJG8 ("Windows Server 2019 Standard Evaluation", auto-classified via `os_name`); zero false mass-offline against ~55 chronically-offline agents; restart guard held across deploys. Known gap: `os_product_type` on only 16/168 agents; `os_name` is workhorse. Commits f1cdf5d/30e4f23/21d63bd/3eedf91. Alert ignore/mute (perma-silence): migration 055 (`alert_mutes` table + `muted` status); `dedup_key`-keyed; reason required (400 if missing); gates `create_or_update_alert` + `create_check_alert` bypass; `POST /api/alerts/:id/mute` + `/unmute`; dashboard UI NOT started. Verified live (mute -> `muted`; 60s sweep did not re-fire; unmute -> `active`). Commits 29c405e/a120e71. MSP360 Provider API probed live: monitoring-only tier (Companies/Users/Monitoring GET; management paths 404); plan delete/run/storage remain console tasks; white-label agent has no `cbb.exe`. |
|
||||
| 2026-06-07 | UI gap batch + enrollment audit (GURU-BEAST-ROG session). `get_agent` upgraded to `AgentWithDetails` with `client_id`/`client_name` JOIN. Six new/updated server endpoints: `GET /api/agents/:id/bsod-events`, `GET /api/agents/:id/version-history`, `GET /api/install-reports` + `/:id`, `GET /api/discovery/all-devices`, `DELETE /api/agents/:id/key` (redesigned as dual-layer revoke). Two new enrollment endpoints: `GET /api/sites/:id/enrolled-agents` (audit with duplicate-hostname annotation) + `POST /api/enrolled-agents/:id/revoke`. Dashboard: AgentDetail Crashes tab + version history, Dashboard.tsx fleet stats from `/agents/stats`, SiteDetail Revoke Key + Enrollment audit tab, new pages InstallReports + Discovery. Hotfix `6faa382`: `role_override` field missing from new `get_agent_with_details_by_id` SQL caused all `GET /api/agents/:id` to return 500 in production; fixed within same session. Enrollment `enrolled_agents` table (migration 012) required no new migration — gap was API exposure and UI only. WS auth: Mode 3 checks `enrolled_agents.agent_key_hash` first; Mode 1/2 falls back to `agents.api_key_hash`; dual-revoke now covers both layers. Key commits: `5854c63`, `6faa382`, `49a7109`, `00129af`. All UI gaps except Tunnel and documentation are now complete. SSH from BEAST to production broken (known_hosts issue) — workaround: JWT + curl. |
|
||||
|
||||
---
|
||||
|
||||
@@ -493,6 +553,7 @@ These decisions are locked. Do not reverse without explicit user approval.
|
||||
- **2026-06-07 recompile:** Folded in backup-alert quality pass (commits `779f7f6` + `b82c010`, both on main). Updated Backup Integration capability section: added FU1/FU2 alert quality pass detail (false backup_failed 15->2; summarize_backup_error; create_or_update_alert refresh); documented backup_storage_low removal (structurally false DataCopied/TotalData signal; 5->0 false alerts; resolve_all_backup_storage_alerts); confirmed BACKUP_STALE evaluator correct (no new code); added key functions list and MSP360 PlanType exclusion map. Updated Repo Structure to include db/mspbackups.rs and mspbackups/ key functions. Updated Current Focus MSP360 line and added /backup-status endpoint shape gap. Updated Summary date and added backup-alert quality pass note. Active State date note updated. Added 2026-06-07 History row. Patterns and History existing rows preserved verbatim.
|
||||
- **2026-06-07 recompile:** Updated for credential inheritance production deployment (server v0.3.45), clickable alert badges with client filtering, and SPEC-028 offboarding wizard specification. Added Recent Work section documenting 2026-06-07 session accomplishments. Updated Current Focus to reflect credential inheritance as deployed and offboarding wizard as spec-complete/implementation-pending. Updated Dashboard status to include credentials management with inheritance. Updated version numbers throughout (server 0.3.37 → 0.3.45). Added session-logs/2026-06-07-mike-gururmm-offboarding-spec.md to sources. Updated History Highlights with 2026-06-07 entry.
|
||||
- **2026-06-07 recompile (second session):** Folded in role-aware offline alerting (server/src/alerts/offline.rs: offline_sweep + agent_role classifier + mass_offline site/fleet detection; migration 054 agents.role_override; dashboard triage + role-override control on agent detail; warm-up restart guard; four commits f1cdf5d/30e4f23/21d63bd/3eedf91) and alert ignore/mute perma-silence (alert_mutes table, muted status, is_dedup_muted/mute_condition/unmute_condition, two-choke-point gate at create_or_update_alert + create_check_alert bypass, mute/unmute API; migration 055; dashboard pending; two commits 29c405e/a120e71). Added MSP360 API scope finding (monitoring-only tier confirmed by live probe; management paths 404). Updated Alerting & Watchdog section with offline alerting and mute detail including new alert types (agent_offline, mass_offline_site, mass_offline_fleet) and new status (muted). Updated Repo Structure (alerts/ directory; db/alerts.rs key functions; dashboard/ entry with agentRole.ts). Updated Development / Current Focus with alert mute dashboard (Task 5 not started), offline classification gap, and MSP360 API scope item. Added alert mute UI to Dashboard incomplete list. Added second 2026-06-07 History row. Updated migration count to 55+ (054/055 confirmed). Added session log source. Patterns section and all existing History rows preserved verbatim.
|
||||
- **2026-06-07 recompile (GURU-BEAST-ROG/discord-bot):** Folded in UI gap batch + enrollment audit session. Added new Recent Work entry (2026-06-07, UI gaps + enrollment). Added Enrollment Detail subsection under Identity / Multi-tenancy documenting `enrolled_agents` table schema, WS auth modes, dual-layer revoke design, and enrollment audit endpoint. Updated `get_agent` capability (now `AgentWithDetails` with client JOIN). Documented six new/updated endpoints (bsod-events, version-history, install-reports, discovery/all-devices, dual-revoke key, enrollment audit + revoke). Updated Dashboard complete list: added Crashes tab, version history, fleet stats, SiteDetail Revoke Key + Enrollment tab, InstallReports + Discovery pages. Updated Dashboard incomplete list: removed enrollment management (now complete); updated Tunnel status (server skeleton confirmed "not yet implemented"); added documentation gap. Updated Current Focus: BSOD Phase 2 partially complete (Crashes tab shipped; Alerts stream + dump upload still deferred); added Tunnel deferred detail, SSH from BEAST broken, dead code cleanup, enrollment key display gap. Added new 2026-06-07 History row (UI gap batch + enrollment audit). Added session log to sources. Updated compiled_by to GURU-BEAST-ROG/discord-bot. History and Patterns sections preserved verbatim.
|
||||
|
||||
## Backlinks
|
||||
|
||||
|
||||
Reference in New Issue
Block a user