sync: auto-sync from GURU-BEAST-ROG at 2026-06-07 21:26:22

Author: Mike Swanson
Machine: GURU-BEAST-ROG
Timestamp: 2026-06-07 21:26:22
This commit is contained in:
2026-06-07 21:26:26 -07:00
parent 62fed03362
commit 14362628a2
2 changed files with 213 additions and 6 deletions

View File

@@ -0,0 +1,146 @@
# Session Log — GuruRMM UI Gaps + Enrollment Audit
**Date:** 2026-06-07
**Topic:** GuruRMM UI/Dashboard — gap verification, batch implementation, hotfix, enrollment audit
## User
- **User:** Mike Swanson (mike)
- **Machine:** GURU-BEAST-ROG
- **Role:** admin
---
## Session Summary
Session started as a GuruRMM UI/Dashboard feedback thread via Discord bot. First pass verified all known UI gaps against the current dashboard source code (submodule was 50 commits behind origin/main at the time of the first coding pass, which caused a merge conflict scenario later). Eight gaps were confirmed open: enrollment management, AgentDetail client row not rendering, dashboard fleet counts computed client-side, tunnel session UI, fleet discovery page, install reporting page, BSOD crashes tab, and agent version history.
The main implementation batch addressed everything except tunnel (server skeleton is "not yet implemented" and needs xterm.js — deferred). A server-side Rust agent added six new/updated API endpoints: `get_agent` now returns `AgentWithDetails` (client_id + client_name via JOIN), `GET /api/agents/:id/bsod-events`, `GET /api/agents/:id/version-history`, `GET /api/install-reports` + `/:id`, `GET /api/discovery/all-devices`, and `DELETE /api/agents/:id/key`. A dashboard agent added two new pages (InstallReports, Discovery), a Crashes tab on AgentDetail, a version history table in the Updates tab, fleet stat wiring on Dashboard.tsx, and a Revoke Key button in SiteDetail. Code review approved all changes. The Gitea agent committed and pushed but the submodule was in detached HEAD 50 commits behind main, requiring a stash/pull/rebase merge that resolved five conflicts including Layout.tsx (upstream had switched to FunctionRail architecture) and Dashboard.tsx (upstream had switched to a Triage/ExceptionStream layout).
Immediately after the build deployed, Mike reported RMM was down. Diagnosis: the new `get_agent_with_details_by_id` SQL query was missing `a.role_override` — migration 054 had added `role_override: String` (required, non-optional) to `AgentWithDetails` and the existing `get_all_agents_with_details` already included it, but the new single-agent function was written from the pre-054 SQL. The `list_agents` endpoint continued working while `GET /api/agents/:id` returned 500. Hotfix was a one-line SQL change (`a.role_override` added), committed and pushed immediately (`6faa382`).
Enrollment work followed. Investigation revealed the `enrolled_agents` table (migration 012) already stored everything needed — enrolled_at, last_seen, ip_address, os_version, revoked — so no new migration was required. The original `DELETE /api/agents/:id/key` implementation only nulled `agents.api_key_hash` (legacy Mode 1/2 key), but WS auth checks `enrolled_agents.agent_key_hash` first (Mode 3 for `agk_` keys), so revocation was incomplete for modern agents. Implementation added four new DB functions, `GET /api/sites/:id/enrolled-agents` with duplicate-hostname annotation, `POST /api/enrolled-agents/:id/revoke`, and fixed `DELETE /api/agents/:id/key` to cascade-revoke both key layers. Dashboard gained an Enrollment tab on SiteDetail with full audit table, status badges, duplicate warnings, per-row revoke with confirm. A second detached-HEAD merge was required; the Gitea agent resolved conflicts but incorrectly kept the old route pointer — caught and fixed manually (`00129af`).
Session closed with a sync that also deployed Phase 5c of sync.sh (skills mirroring to `~/.claude/skills/`) per a coord message from GURU-5070.
---
## Key Decisions
- **Tunnel deferred:** Server `tunnel.rs` is a dead-code skeleton that logs "not yet implemented" — wiring the route would expose a non-functional endpoint. Excluded from this batch; needs proper design + xterm.js before implementation.
- **No migration for enrollment:** `enrolled_agents` table (migration 012) already had all required fields. The gap was API exposure and UI, not missing data.
- **Dual-layer revoke:** `DELETE /api/agents/:id/key` was redesigned to revoke both `agents.api_key_hash` (legacy Mode 1/2) and `enrolled_agents.revoked = TRUE` (modern agk_ Mode 3). Originally only the legacy field was touched, leaving modern agents able to reconnect.
- **Duplicate hostname detection via two queries:** Rather than a SQL window function, `list_site_enrollments` does a second query for `COUNT(*) > 1` grouped hostnames, then annotates in Rust. Simpler and safer with runtime sqlx (no compile-time macro).
- **`is_duplicate_hostname` scoped to active rows only:** Only non-revoked enrollments count toward the duplicate flag. A revoked + active pair is not flagged — the revocation resolved the ambiguity.
- **Per-row revoke vs. bulk revoke:** Added `POST /api/enrolled-agents/:id/revoke` for targeted per-enrollment revocation (with cascade to `agents.api_key_hash`). The existing `DELETE /api/agents/:id/key` is the bulk "revoke everything for this agent" path.
---
## Problems Encountered
- **Submodule detached HEAD (twice):** The submodule was 50 commits behind `origin/main` each time the coding agents worked on it. Required stash → checkout main → pull → stash pop → conflict resolution. Upstream had refactored Dashboard.tsx to a Triage/ExceptionStream layout and Layout.tsx to FunctionRail/InfrastructureSpine — the coding agents worked on the old layout. Gitea agent resolved both merges.
- **`role_override` field missing from new SQL query:** `get_agent_with_details_by_id` was copied from pre-migration-054 SQL. Migration 054 added `role_override: String` (non-optional) to `AgentWithDetails` and updated `get_all_agents_with_details` but not our new function. `sqlx::FromRow` panics at runtime when a required field has no column. Caused production 500s on all `GET /api/agents/:id` calls. Fixed in hotfix commit `6faa382`.
- **Route pointer left on old handler after merge:** The Gitea agent's conflict resolution for mod.rs kept the original `revoke_agent_key` route (single-layer revoke) rather than switching to `revoke_agent_key_handler` (dual-layer). Caught by manual review, fixed in `00129af`.
- **SSH to production server unavailable from BEAST:** `known_hosts` file issue prevented SSH key auth during incident diagnosis. Worked around by getting a JWT token via API and testing endpoints directly with curl.
---
## Configuration Changes
**GuruRMM submodule (`projects/msp-tools/guru-rmm/`):**
Server files modified:
- `server/src/db/agents.rs``get_agent_with_details_by_id`, `revoke_agent_key` (DB functions)
- `server/src/db/enroll.rs``get_enrollment_by_id`, `revoke_single_enrollment`, `list_enrollments_for_site`, `duplicate_active_hostnames` (4 new DB functions)
- `server/src/db/discovery.rs``DiscoveredDeviceWithContext`, `list_all_discovered_devices`
- `server/src/db/updates.rs``Serialize` derive added to `AgentUpdateRecord`
- `server/src/db/bsod_events.rs` — removed stale `#[allow(dead_code)]`
- `server/src/api/agents.rs` — updated `get_agent` handler; added `list_bsod_events`, `get_agent_version_history`, `revoke_agent_key` (old, kept), `revoke_agent_key_handler` (new dual-revoke)
- `server/src/api/enroll.rs``EnrollmentRecord` struct, `list_site_enrollments`, `revoke_enrollment` handlers
- `server/src/api/install_report.rs``InstallReport` struct, `list_install_reports`, `get_install_report` handlers
- `server/src/api/discovery.rs``list_all_devices` handler
- `server/src/api/mod.rs` — 10 new routes wired
Dashboard files modified:
- `dashboard/src/api/client.ts``AgentStats`, `BsodEvent`, `AgentUpdateRecord`, `InstallReport`, `DiscoveredDeviceWithContext`, `EnrollmentRecord` interfaces; `agentsApi` extensions; `installReportsApi`, `discoveryFleetApi`, `enrollApi` namespaces
- `dashboard/src/pages/AgentDetail.tsx` — Crashes tab, version history in Updates tab
- `dashboard/src/pages/Dashboard.tsx` — fleet stats wired to `/agents/stats`
- `dashboard/src/pages/SiteDetail.tsx` — Revoke Key button + Enrollment tab
- `dashboard/src/components/Tabs.tsx` — optional `badge` prop
- `dashboard/src/components/FunctionRail.tsx` — Install Reports + Fleet Discovery nav links
- `dashboard/src/App.tsx``/install-reports`, `/discovery` routes
Dashboard files created:
- `dashboard/src/pages/InstallReports.tsx`
- `dashboard/src/pages/Discovery.tsx`
**ClaudeTools repo:**
- `projects/msp-tools/guru-rmm/docs/UI_GAPS.md` — updated to mark 8 items complete, detail enrollment partial status
---
## Credentials & Secrets
- GuruRMM API admin: `claude-api@azcomputerguru.com` / vault: `infrastructure/gururmm-server.sops.yaml` field `admin-password`
- SSH: `guru@172.16.3.30` — SSH key auth failing from BEAST (known_hosts issue); password in vault `infrastructure/gururmm-server.sops.yaml` field `password`
---
## Infrastructure & Servers
- **GuruRMM server:** `172.16.3.30:3001` (Rust/Axum, systemd `gururmm-server`)
- **Dashboard:** `https://rmm.azcomputerguru.com` (nginx, `/var/www/gururmm/dashboard/`)
- **Gitea:** `https://git.azcomputerguru.com/azcomputerguru/gururmm` (repo: `azcomputerguru/gururmm`)
- **Build pipeline webhook:** `172.16.3.30:9000` (not reachable from BEAST — Gitea triggers it)
- **Coord API:** `http://172.16.3.30:8001/api/coord`
---
## Commands & Outputs
```bash
# Verified API server up during incident
curl -s "http://172.16.3.30:3001/api/auth/me"
# → "Missing authorization header" (server alive)
# Confirmed new build deployed (new endpoint returns 200)
curl -sv "http://172.16.3.30:3001/api/agents/<id>/bsod-events" -H "Authorization: Bearer $TOKEN"
# → HTTP/1.1 200 OK
# Confirmed bug: get_agent returning 500
curl -s "http://172.16.3.30:3001/api/agents/<id>" -H "Authorization: Bearer $TOKEN"
# → "Internal server error"
# Hotfix commits (gururmm submodule)
# 5854c63 — main UI gap batch
# 6faa382 — hotfix: role_override missing from get_agent_with_details_by_id
# 49a7109 — enrollment audit batch
# 00129af — fix: wire revoke_agent_key_handler to route
# Coord message handled: GURU-5070 skills empty, Phase 5c fix
# BEAST sync confirmed 19 skills deployed
```
---
## Pending / Incomplete Tasks
- **Tunnel Session Management (P2):** Server skeleton "not yet implemented". Needs: xterm.js design, WS protocol spec for dashboard client, proper Phase 2 implementation. Estimated 3-5 days separate effort.
- **Enrollment audit log (enrollment table) — key status column:** The `enrolled_agents` table doesn't expose a "key_prefix" field (key hash is intentionally hidden). The UI shows hostname, dates, IPs but not a redacted key identifier. Could add first-8-chars of key hash as a non-sensitive display field in a future pass.
- **SSH from BEAST to production:** `known_hosts` file issue prevents SSH key auth. Should be fixed for incident response capability. Likely needs the production server's host key added to BEAST's known_hosts.
- **Tunnel (deferred):** See above.
- **Documentation (P3):** User guide, inline help tooltips — still not started.
- **`revoke_agent_key` (old handler):** Dead code now that route points to `revoke_agent_key_handler`. Should be removed in a cleanup pass to avoid confusion.
---
## Reference Information
- GuruRMM wiki: `wiki/projects/gururmm.md`
- UI gaps doc: `projects/msp-tools/guru-rmm/docs/UI_GAPS.md`
- enrolled_agents migration: `server/migrations/012_enrolled_agents.sql`
- Key commits:
- `5854c63` — UI gap batch (7 server endpoints, 2 new pages, 5 dashboard enhancements)
- `6faa382` — hotfix role_override in get_agent SQL
- `49a7109` — enrollment audit batch
- `00129af` — fix route pointer to dual-revoke handler
- Coord message addressed: GURU-5070 skills Phase 5c fix (commit `62fed033`)

View File

@@ -3,7 +3,7 @@ type: project
name: gururmm
display_name: GuruRMM
last_compiled: 2026-06-07
compiled_by: Mikes-MacBook-Air/claude-main
compiled_by: GURU-BEAST-ROG/discord-bot
aliases:
- guru-rmm
sources:
@@ -53,6 +53,7 @@ sources:
- "live GuruRMM Postgres query 2026-06-04: agents/sites/update_rollouts/agent_updates tables (channel verification)"
- session-logs/2026-06-07-mike-gururmm-backup-alert-cleanup.md
- session-logs/2026-06-07-mike-gururmm-offline-alerting-mute.md
- session-logs/2026-06-07-mike-gururmm-ui-gaps-enrollment.md
backlinks:
- clients/cascades-tucson
- systems/gururmm-build
@@ -82,6 +83,33 @@ GuruRMM is a Remote Monitoring & Management platform built by Arizona Computer G
## Recent Work
### 2026-06-07 — UI Gap Batch + Enrollment Audit (GURU-BEAST-ROG session)
**API changes:**
- `get_agent` handler now returns `AgentWithDetails` — includes `client_id` and `client_name` via JOIN query (was previously missing client context)
- New endpoints: `GET /api/agents/:id/bsod-events`, `GET /api/agents/:id/version-history`
- New endpoints: `GET /api/install-reports`, `GET /api/install-reports/:id`
- New endpoint: `GET /api/discovery/all-devices`
- `DELETE /api/agents/:id/key` redesigned as dual-revoke: nulls `agents.api_key_hash` (legacy Mode 1/2) AND sets `enrolled_agents.revoked = TRUE` (modern agk_ Mode 3). Previous implementation only touched the legacy field, leaving modern agents able to reconnect.
- New endpoint: `GET /api/sites/:id/enrolled-agents` — full enrollment audit with duplicate-hostname annotation (scoped to active/non-revoked rows)
- New endpoint: `POST /api/enrolled-agents/:id/revoke` — per-enrollment targeted revocation with cascade to `agents.api_key_hash`
**Dashboard additions:**
- AgentDetail: Crashes tab (BSOD events), version history table in Updates tab
- Dashboard.tsx: fleet stats wired to `/agents/stats` endpoint (was computed client-side)
- SiteDetail: Revoke Key button + Enrollment tab with full audit table, status badges, duplicate-hostname warnings, per-row revoke with confirm dialog
- New pages: InstallReports (`/install-reports`), Discovery (`/discovery`)
- FunctionRail: nav links for Install Reports + Fleet Discovery
**Hotfix (production 500s):**
- `get_agent_with_details_by_id` SQL was missing `a.role_override` — migration 054 added `role_override: String` (non-optional) to `AgentWithDetails` and the pre-054 SQL template was used for the new function. `sqlx::FromRow` panics at runtime when a required field has no column. All `GET /api/agents/:id` calls returned 500 until hotfix commit `6faa382`.
**Key commits:** `5854c63` (UI gap batch), `6faa382` (hotfix: role_override in SQL), `49a7109` (enrollment audit batch), `00129af` (fix: wire revoke_agent_key_handler to route)
**Tunnel status:** Server `tunnel.rs` is a dead-code skeleton logging "not yet implemented" — deferred; needs xterm.js design + WS protocol spec before implementation.
---
### 2026-06-07 — Credential Inheritance Deployment & Offboarding Spec
**Deployed to production:**
@@ -175,6 +203,34 @@ Agent<->server communication is a persistent authenticated WebSocket with auto-r
- Enrollment & keys (`012`): per-agent key issuance on first run, site API keys (regenerable), site-specific MSI with SITEKEY injected at download, public install-report ingestion. Legacy PowerShell agent path for Server 2008 R2.
- Logs: agent log upload (periodic + on-demand), per-agent events (`042`), fleet log view, AI-assisted log analysis (`/logs/analyze`) — AI-optional per locked decision.
### Enrollment Detail (migration 012 + 2026-06-07 audit)
The `enrolled_agents` table (migration 012) is the authoritative enrollment history log:
| Field | Notes |
|---|---|
| `site_id` | Site the agent enrolled under |
| `agent_id` | Set on WebSocket connect (links enrollment record to the agent row) |
| `hostname` | Hostname at enrollment time |
| `agent_key_hash` | Hash of the `agk_` enrollment key (intentionally hidden; not surfaced in API) |
| `enrolled_at` | Timestamp of initial enrollment |
| `last_seen` | Updated on each WS connection |
| `ip_address` | IP at last connection |
| `os_version` | OS at enrollment time |
| `revoked` | Boolean; set TRUE by revocation endpoints |
**WS auth modes:**
- **Mode 3** (modern, `agk_` keys): checks `enrolled_agents.agent_key_hash` first
- **Mode 1/2** (legacy): falls back to `agents.api_key_hash`
**Revocation — dual-layer (fixed 2026-06-07):**
- `DELETE /api/agents/:id/key` (bulk): revokes BOTH `agents.api_key_hash` (Mode 1/2) AND sets `enrolled_agents.revoked = TRUE` for all enrollment records (Mode 3). Previously only touched the legacy field.
- `POST /api/enrolled-agents/:id/revoke` (targeted): per-enrollment revocation with cascade to `agents.api_key_hash`.
**Enrollment audit endpoint:** `GET /api/sites/:id/enrolled-agents` — returns full enrollment history with `is_duplicate_hostname` annotation. Duplicate flag is scoped to active (non-revoked) rows only; a revoked + active pair is not flagged.
**Dashboard:** Enrollment tab on SiteDetail — audit table with status badges, duplicate-hostname warnings, and per-row Revoke button with confirmation dialog (shipped 2026-06-07).
### Platform coverage
- **Cross-platform (Win/Linux/macOS):** metrics, network state, hardware/software/service inventory, user/group + DC/domain detection, checks (service checks Win+Linux), discovery, self-update, scripts/commands.
- **Windows-only:** `user_session` command context (WTS), LibreHardwareMonitor temps, remote registry, tray-into-session, watchdog SCM supervision, BSOD detection.
@@ -266,7 +322,7 @@ As of 2026-06-07 (agent 0.6.54 beta / 0.6.47 stable / server 0.3.45):
- **Credential inheritance (deployed 2026-06-07):** Production server running v0.3.45 with full credential inheritance and de-duplication. `/effective` endpoints validated. Dashboard clickable alert badges and client-scoped filtering implemented.
- **SPEC-028 offboarding wizard (specification complete):** 835-line spec created for site and client offboarding workflows. Includes data export, dependency analysis, typed confirmation, and audit logging. Roadmap updated with "Client & Site Lifecycle Management" section. Implementation pending.
- **BUG-020 — tray duplicate/ghost icons (fixed to beta, 2026-06-04):** Commit `137dd85` shipped to main -> beta. Fix #1: per-session `Local\GuruRMM_Tray` single-instance mutex in the tray binary. Fix #2: `TrayLauncher` reconciliation via `WTSEnumerateProcessesW` (idempotent across watchdog restarts). Fix #3: graceful `Global\GuruRMM_TrayShutdown_{sid}` event -> 3s wait -> `TerminateProcess` fallback (so `NIM_DELETE` fires and ghost icon is cleaned). [NOTE: Fix #3 is implemented but dormant — `terminate_all` has no caller in the agent yet. Tracked in coord todo `25fdf31a` to wire into the watchdog policy-disable/uninstall path.]
- **BSOD detection Phase 2/3 (deferred):** Dashboard "Crashes" tab + BSOD in Alerts stream (issue #10, dashboard bullets unchecked); `fetch_bsod_dump` on-demand upload; full ~350-entry bugcheck name table (Phase 1 ships a 10-code map).
- **BSOD detection Phase 2/3 (deferred):** Dashboard "Crashes" tab shipped 2026-06-07 (AgentDetail Crashes tab + version history in Updates tab). Remaining: BSOD in Alerts stream (issue #10); `fetch_bsod_dump` on-demand upload; full ~350-entry bugcheck name table (Phase 1 ships a 10-code map).
- **Linux fleet unit drift:** Auto-updater replaces the binary but does NOT refresh the systemd unit file. Pre-BUG-016-fix Linux agents have new binary + old unit (missing `StateDirectory=gururmm`). Needs an ops-script pass via `/rmm` or organic at next reinstall.
- **Tray IPC + peer authorization** — Linux tray merged (PR #13+#14). Open: Windows peer authz (#16), logind console-user resolution (#17), macOS tray (#18), subscriber broadcast (#19).
- **Auto-update reliability** — BB-SERVER and RECEPTIONIST-PC (Cascades) miss dispatch windows due to flaky WebSockets. Re-querying pending updates on reconnect: incomplete as of 2026-05-24.
@@ -274,6 +330,10 @@ As of 2026-06-07 (agent 0.6.54 beta / 0.6.47 stable / server 0.3.45):
- **MSP360 backup integration** — Alert quality pass shipped 2026-06-07 (commits `779f7f6` + `b82c010`): false `backup_failed` alerts 15 -> 2; `backup_storage_low` removed (structurally false signal; 5 -> 0 false alerts). Dashboard UI shipped 2026-05-31. Phase 2 (management) not started; genuine destination-capacity alerting deferred (needs MSP360 storage-accounts endpoint).
- **MSP360 API scope (confirmed 2026-06-07):** Provider API token (vault `msp-tools/msp360-api.sops.yaml`) is monitoring-only at our tier: Companies/Users/Monitoring GET endpoints; every management path (plan delete, manual run, storage config) returns 404. White-labeled MSP360 agent has no `cbb.exe` at any standard path. Plan delete/run/storage remain MSP360-console tasks.
- **Alert mute dashboard (Task 5) -- NOT started:** Ignore button + required-reason prompt on alert row; Muted filter on alerts page + agent Alerts tab; Un-ignore control; muted rows show Un-ignore instead of Ack/Resolve. Server endpoints ready: `POST /api/alerts/:id/mute` + `/unmute`.
- **Tunnel session management (P2, deferred):** Server `tunnel.rs` is a dead-code skeleton that logs "not yet implemented"; wiring the route would expose a non-functional endpoint. Needs: xterm.js design, WS protocol spec for dashboard client, proper Phase 2 implementation. Estimated 3-5 days separate effort.
- **SSH from BEAST to production broken:** `known_hosts` issue prevents SSH key auth from GURU-BEAST-ROG to `172.16.3.30`. Workaround during incident: JWT token + curl for endpoint diagnosis. Needs production server host key added to BEAST's `~/.ssh/known_hosts`.
- **`revoke_agent_key` dead code:** Old single-layer handler is dead code now that the route points to `revoke_agent_key_handler`. Should be removed in a cleanup pass.
- **Enrollment key display gap:** `enrolled_agents` table doesn't expose a key prefix — key hash is intentionally hidden. UI shows hostname, dates, IPs but not a redacted key identifier. Could add first-8-chars of hash as a non-sensitive display field in a future pass.
- **Offline alerting -- classification gap:** SIF-SERVER (SifOidak.local DC), Server2013 (Sombra), and other inventory-less offline servers currently auto-classify as workstations (no `os_product_type`; `os_name` doesn't match ~/server/i for those names). Needs manual `role_override` via `PUT /api/agents/:id/role-override` -- Mike held on tagging them.
- **`/backup-status` endpoint shape gap:** Returns only one plan per agent (not all plans); makes agents with a dead old plan + healthy current plan look stale-but-green in the backup tab. Compliance domain evaluates all plans correctly. Not fixed this session — noted for future.
- **Security audit backlog:** `credentials/:id/reveal` horizontal privilege escalation (HIGH), `internal_err()` raw DB errors at ~130 call sites (HIGH).
@@ -406,16 +466,15 @@ Gitea push to main
- Response: `stdout`, `stderr`, `exit_code`, `status` (running/completed/failed/timeout/interrupted)
**Dashboard — complete and working:**
Agents management, Clients/Sites CRUD, Commands execution + terminal, Logs + AI analysis, Alerts (with clickable severity badges + client filtering), Metrics (CPU/RAM/disk/network, process drill-down modal), Auto-update triggering, Network state, Entra ID SSO (Entra only — Google planned per SPEC-008, not implemented), Policies Dashboard (all tabs), Registry editor, MSP360 backup status card + agent<->backup mappings/verify UI, Organizations management + dev-admin impersonation UI, Credentials management with inheritance support.
Agents management, Clients/Sites CRUD, Commands execution + terminal, Logs + AI analysis, Alerts (with clickable severity badges + client filtering), Metrics (CPU/RAM/disk/network, process drill-down modal), Auto-update triggering, Network state, Entra ID SSO (Entra only — Google planned per SPEC-008, not implemented), Policies Dashboard (all tabs), Registry editor, MSP360 backup status card + agent<->backup mappings/verify UI, Organizations management + dev-admin impersonation UI, Credentials management with inheritance support, AgentDetail Crashes tab (BSOD events), AgentDetail version history in Updates tab, fleet stats from `/agents/stats`, SiteDetail Revoke Key button + Enrollment audit tab (with duplicate-hostname badge), Install Reports page, Fleet Discovery page.
**Dashboard — incomplete (see UI_GAPS.md):**
- Enrollment management UI (revoke keys, audit log, duplicate hostname warnings)
- Watchdog alerts UI — blocked on 2 missing server routes
- BSOD/Crashes tab on Agent Detail (Phase 2 deferred)
- BSOD in Alerts stream (Phase 2 deferred)
- Tunnel session management (interactive terminal — backend skeleton, not production-ready)
- Tunnel session management (interactive terminal — server skeleton "not yet implemented"; needs xterm.js design + WS protocol spec)
- Offboarding wizard UI (SPEC-028 complete, implementation pending)
- Alert mute UI -- Ignore button + required-reason prompt, Muted filter, Un-ignore control (server endpoints ready; dashboard Task 5 NOT started)
- Documentation / user guide / inline help tooltips (P3, not started)
**Open Gitea issues:**
- #10 — BSOD detection Phase 2/3 (dashboard + fetch_bsod_dump + full bugcheck table)
@@ -478,6 +537,7 @@ These decisions are locked. Do not reverse without explicit user approval.
| 2026-06-07 | Backup-alert quality pass shipped. FU1 (`summarize_backup_error` decodes MSP360 message JSON; `create_or_update_alert` now refreshes title/message/severity on re-trigger, also fixes latent severity-escalation freeze) + FU2 (exclude non-backup PlanTypes 8=Restore/13=Consistency-check from alerting/compliance): false `backup_failed` alerts 15 -> 2 fleet-wide (survivors AD1, LAB-Becky are genuine and self-describing), commit `779f7f6`. `backup_storage_low` alert type removed entirely (commit `b82c010`): `DataCopied/TotalData` measures backup-dataset completeness, not destination capacity — produced 5 fleet-wide false alerts including DF-HYPERV-B "100% Full" on a 4 GB plan; `resolve_all_backup_storage_alerts` (type-scoped, idempotent, once-per-tick) clears stragglers; 5 -> 0 verified after 17:21:41 UTC restart. Genuine destination-capacity alerting deferred (needs MSP360 storage-accounts endpoint). `BACKUP_STALE` evaluator confirmed already correct — no new code. Both commits on main. Submodule pinned at `226ba9f` in parent. |
| 2026-06-07 | Credential inheritance deployed to production (server v0.3.45). Hierarchical credential propagation (Global → Client → Site) with `is_inheritable` flag and de-duplication by (credential_type, label). `/effective` endpoints validated. Dashboard UI: clickable alert severity badges with client filtering, offline badge now scopes to client-specific agents. SPEC-028 offboarding wizard specification created (835 lines) covering site and client offboarding workflows with data export, dependency analysis, typed confirmation, and audit logging. FEATURE_ROADMAP.md updated with "Client & Site Lifecycle Management" section. |
| 2026-06-07 | Role-aware offline alerting + alert ignore/mute shipped (second session). Offline sweep evaluator (60s tokio interval, `server/src/alerts/offline.rs`): server-only `agent_offline` alerts (classifier: `os_product_type` 2/3, else `os_name` ~/server/i, else `role_override`); site rule (>=50% + >=3) -> `mass_offline_site`; fleet rule (>=10) -> `mass_offline_fleet`; aggregates pinned to representative agent with site/fleet `dedup_key`. Warm-up restart guard (code review caught + fixed `last_seen < started_at` spec defect — permanently false in steady state). Migration 054 (`agents.role_override`). Dashboard triage: servers elevated individually; workstations collapsed into roll-up. `PUT /api/agents/:id/role-override`. Verified live vs WIN-TG2STMODJG8 ("Windows Server 2019 Standard Evaluation", auto-classified via `os_name`); zero false mass-offline against ~55 chronically-offline agents; restart guard held across deploys. Known gap: `os_product_type` on only 16/168 agents; `os_name` is workhorse. Commits f1cdf5d/30e4f23/21d63bd/3eedf91. Alert ignore/mute (perma-silence): migration 055 (`alert_mutes` table + `muted` status); `dedup_key`-keyed; reason required (400 if missing); gates `create_or_update_alert` + `create_check_alert` bypass; `POST /api/alerts/:id/mute` + `/unmute`; dashboard UI NOT started. Verified live (mute -> `muted`; 60s sweep did not re-fire; unmute -> `active`). Commits 29c405e/a120e71. MSP360 Provider API probed live: monitoring-only tier (Companies/Users/Monitoring GET; management paths 404); plan delete/run/storage remain console tasks; white-label agent has no `cbb.exe`. |
| 2026-06-07 | UI gap batch + enrollment audit (GURU-BEAST-ROG session). `get_agent` upgraded to `AgentWithDetails` with `client_id`/`client_name` JOIN. Six new/updated server endpoints: `GET /api/agents/:id/bsod-events`, `GET /api/agents/:id/version-history`, `GET /api/install-reports` + `/:id`, `GET /api/discovery/all-devices`, `DELETE /api/agents/:id/key` (redesigned as dual-layer revoke). Two new enrollment endpoints: `GET /api/sites/:id/enrolled-agents` (audit with duplicate-hostname annotation) + `POST /api/enrolled-agents/:id/revoke`. Dashboard: AgentDetail Crashes tab + version history, Dashboard.tsx fleet stats from `/agents/stats`, SiteDetail Revoke Key + Enrollment audit tab, new pages InstallReports + Discovery. Hotfix `6faa382`: `role_override` field missing from new `get_agent_with_details_by_id` SQL caused all `GET /api/agents/:id` to return 500 in production; fixed within same session. Enrollment `enrolled_agents` table (migration 012) required no new migration — gap was API exposure and UI only. WS auth: Mode 3 checks `enrolled_agents.agent_key_hash` first; Mode 1/2 falls back to `agents.api_key_hash`; dual-revoke now covers both layers. Key commits: `5854c63`, `6faa382`, `49a7109`, `00129af`. All UI gaps except Tunnel and documentation are now complete. SSH from BEAST to production broken (known_hosts issue) — workaround: JWT + curl. |
---
@@ -493,6 +553,7 @@ These decisions are locked. Do not reverse without explicit user approval.
- **2026-06-07 recompile:** Folded in backup-alert quality pass (commits `779f7f6` + `b82c010`, both on main). Updated Backup Integration capability section: added FU1/FU2 alert quality pass detail (false backup_failed 15->2; summarize_backup_error; create_or_update_alert refresh); documented backup_storage_low removal (structurally false DataCopied/TotalData signal; 5->0 false alerts; resolve_all_backup_storage_alerts); confirmed BACKUP_STALE evaluator correct (no new code); added key functions list and MSP360 PlanType exclusion map. Updated Repo Structure to include db/mspbackups.rs and mspbackups/ key functions. Updated Current Focus MSP360 line and added /backup-status endpoint shape gap. Updated Summary date and added backup-alert quality pass note. Active State date note updated. Added 2026-06-07 History row. Patterns and History existing rows preserved verbatim.
- **2026-06-07 recompile:** Updated for credential inheritance production deployment (server v0.3.45), clickable alert badges with client filtering, and SPEC-028 offboarding wizard specification. Added Recent Work section documenting 2026-06-07 session accomplishments. Updated Current Focus to reflect credential inheritance as deployed and offboarding wizard as spec-complete/implementation-pending. Updated Dashboard status to include credentials management with inheritance. Updated version numbers throughout (server 0.3.37 → 0.3.45). Added session-logs/2026-06-07-mike-gururmm-offboarding-spec.md to sources. Updated History Highlights with 2026-06-07 entry.
- **2026-06-07 recompile (second session):** Folded in role-aware offline alerting (server/src/alerts/offline.rs: offline_sweep + agent_role classifier + mass_offline site/fleet detection; migration 054 agents.role_override; dashboard triage + role-override control on agent detail; warm-up restart guard; four commits f1cdf5d/30e4f23/21d63bd/3eedf91) and alert ignore/mute perma-silence (alert_mutes table, muted status, is_dedup_muted/mute_condition/unmute_condition, two-choke-point gate at create_or_update_alert + create_check_alert bypass, mute/unmute API; migration 055; dashboard pending; two commits 29c405e/a120e71). Added MSP360 API scope finding (monitoring-only tier confirmed by live probe; management paths 404). Updated Alerting & Watchdog section with offline alerting and mute detail including new alert types (agent_offline, mass_offline_site, mass_offline_fleet) and new status (muted). Updated Repo Structure (alerts/ directory; db/alerts.rs key functions; dashboard/ entry with agentRole.ts). Updated Development / Current Focus with alert mute dashboard (Task 5 not started), offline classification gap, and MSP360 API scope item. Added alert mute UI to Dashboard incomplete list. Added second 2026-06-07 History row. Updated migration count to 55+ (054/055 confirmed). Added session log source. Patterns section and all existing History rows preserved verbatim.
- **2026-06-07 recompile (GURU-BEAST-ROG/discord-bot):** Folded in UI gap batch + enrollment audit session. Added new Recent Work entry (2026-06-07, UI gaps + enrollment). Added Enrollment Detail subsection under Identity / Multi-tenancy documenting `enrolled_agents` table schema, WS auth modes, dual-layer revoke design, and enrollment audit endpoint. Updated `get_agent` capability (now `AgentWithDetails` with client JOIN). Documented six new/updated endpoints (bsod-events, version-history, install-reports, discovery/all-devices, dual-revoke key, enrollment audit + revoke). Updated Dashboard complete list: added Crashes tab, version history, fleet stats, SiteDetail Revoke Key + Enrollment tab, InstallReports + Discovery pages. Updated Dashboard incomplete list: removed enrollment management (now complete); updated Tunnel status (server skeleton confirmed "not yet implemented"); added documentation gap. Updated Current Focus: BSOD Phase 2 partially complete (Crashes tab shipped; Alerts stream + dump upload still deferred); added Tunnel deferred detail, SSH from BEAST broken, dead code cleanup, enrollment key display gap. Added new 2026-06-07 History row (UI gap batch + enrollment audit). Added session log to sources. Updated compiled_by to GURU-BEAST-ROG/discord-bot. History and Patterns sections preserved verbatim.
## Backlinks