From 14362628a2914c4c4b32395a36a2e6e474208025 Mon Sep 17 00:00:00 2001 From: Mike Swanson Date: Sun, 7 Jun 2026 21:26:26 -0700 Subject: [PATCH] sync: auto-sync from GURU-BEAST-ROG at 2026-06-07 21:26:22 Author: Mike Swanson Machine: GURU-BEAST-ROG Timestamp: 2026-06-07 21:26:22 --- ...6-06-07-mike-gururmm-ui-gaps-enrollment.md | 146 ++++++++++++++++++ wiki/projects/gururmm.md | 73 ++++++++- 2 files changed, 213 insertions(+), 6 deletions(-) create mode 100644 session-logs/2026-06-07-mike-gururmm-ui-gaps-enrollment.md diff --git a/session-logs/2026-06-07-mike-gururmm-ui-gaps-enrollment.md b/session-logs/2026-06-07-mike-gururmm-ui-gaps-enrollment.md new file mode 100644 index 0000000..894fdea --- /dev/null +++ b/session-logs/2026-06-07-mike-gururmm-ui-gaps-enrollment.md @@ -0,0 +1,146 @@ +# Session Log — GuruRMM UI Gaps + Enrollment Audit + +**Date:** 2026-06-07 +**Topic:** GuruRMM UI/Dashboard — gap verification, batch implementation, hotfix, enrollment audit + +## User +- **User:** Mike Swanson (mike) +- **Machine:** GURU-BEAST-ROG +- **Role:** admin + +--- + +## Session Summary + +Session started as a GuruRMM UI/Dashboard feedback thread via Discord bot. First pass verified all known UI gaps against the current dashboard source code (submodule was 50 commits behind origin/main at the time of the first coding pass, which caused a merge conflict scenario later). Eight gaps were confirmed open: enrollment management, AgentDetail client row not rendering, dashboard fleet counts computed client-side, tunnel session UI, fleet discovery page, install reporting page, BSOD crashes tab, and agent version history. + +The main implementation batch addressed everything except tunnel (server skeleton is "not yet implemented" and needs xterm.js — deferred). A server-side Rust agent added six new/updated API endpoints: `get_agent` now returns `AgentWithDetails` (client_id + client_name via JOIN), `GET /api/agents/:id/bsod-events`, `GET /api/agents/:id/version-history`, `GET /api/install-reports` + `/:id`, `GET /api/discovery/all-devices`, and `DELETE /api/agents/:id/key`. A dashboard agent added two new pages (InstallReports, Discovery), a Crashes tab on AgentDetail, a version history table in the Updates tab, fleet stat wiring on Dashboard.tsx, and a Revoke Key button in SiteDetail. Code review approved all changes. The Gitea agent committed and pushed but the submodule was in detached HEAD 50 commits behind main, requiring a stash/pull/rebase merge that resolved five conflicts including Layout.tsx (upstream had switched to FunctionRail architecture) and Dashboard.tsx (upstream had switched to a Triage/ExceptionStream layout). + +Immediately after the build deployed, Mike reported RMM was down. Diagnosis: the new `get_agent_with_details_by_id` SQL query was missing `a.role_override` — migration 054 had added `role_override: String` (required, non-optional) to `AgentWithDetails` and the existing `get_all_agents_with_details` already included it, but the new single-agent function was written from the pre-054 SQL. The `list_agents` endpoint continued working while `GET /api/agents/:id` returned 500. Hotfix was a one-line SQL change (`a.role_override` added), committed and pushed immediately (`6faa382`). + +Enrollment work followed. Investigation revealed the `enrolled_agents` table (migration 012) already stored everything needed — enrolled_at, last_seen, ip_address, os_version, revoked — so no new migration was required. The original `DELETE /api/agents/:id/key` implementation only nulled `agents.api_key_hash` (legacy Mode 1/2 key), but WS auth checks `enrolled_agents.agent_key_hash` first (Mode 3 for `agk_` keys), so revocation was incomplete for modern agents. Implementation added four new DB functions, `GET /api/sites/:id/enrolled-agents` with duplicate-hostname annotation, `POST /api/enrolled-agents/:id/revoke`, and fixed `DELETE /api/agents/:id/key` to cascade-revoke both key layers. Dashboard gained an Enrollment tab on SiteDetail with full audit table, status badges, duplicate warnings, per-row revoke with confirm. A second detached-HEAD merge was required; the Gitea agent resolved conflicts but incorrectly kept the old route pointer — caught and fixed manually (`00129af`). + +Session closed with a sync that also deployed Phase 5c of sync.sh (skills mirroring to `~/.claude/skills/`) per a coord message from GURU-5070. + +--- + +## Key Decisions + +- **Tunnel deferred:** Server `tunnel.rs` is a dead-code skeleton that logs "not yet implemented" — wiring the route would expose a non-functional endpoint. Excluded from this batch; needs proper design + xterm.js before implementation. +- **No migration for enrollment:** `enrolled_agents` table (migration 012) already had all required fields. The gap was API exposure and UI, not missing data. +- **Dual-layer revoke:** `DELETE /api/agents/:id/key` was redesigned to revoke both `agents.api_key_hash` (legacy Mode 1/2) and `enrolled_agents.revoked = TRUE` (modern agk_ Mode 3). Originally only the legacy field was touched, leaving modern agents able to reconnect. +- **Duplicate hostname detection via two queries:** Rather than a SQL window function, `list_site_enrollments` does a second query for `COUNT(*) > 1` grouped hostnames, then annotates in Rust. Simpler and safer with runtime sqlx (no compile-time macro). +- **`is_duplicate_hostname` scoped to active rows only:** Only non-revoked enrollments count toward the duplicate flag. A revoked + active pair is not flagged — the revocation resolved the ambiguity. +- **Per-row revoke vs. bulk revoke:** Added `POST /api/enrolled-agents/:id/revoke` for targeted per-enrollment revocation (with cascade to `agents.api_key_hash`). The existing `DELETE /api/agents/:id/key` is the bulk "revoke everything for this agent" path. + +--- + +## Problems Encountered + +- **Submodule detached HEAD (twice):** The submodule was 50 commits behind `origin/main` each time the coding agents worked on it. Required stash → checkout main → pull → stash pop → conflict resolution. Upstream had refactored Dashboard.tsx to a Triage/ExceptionStream layout and Layout.tsx to FunctionRail/InfrastructureSpine — the coding agents worked on the old layout. Gitea agent resolved both merges. +- **`role_override` field missing from new SQL query:** `get_agent_with_details_by_id` was copied from pre-migration-054 SQL. Migration 054 added `role_override: String` (non-optional) to `AgentWithDetails` and updated `get_all_agents_with_details` but not our new function. `sqlx::FromRow` panics at runtime when a required field has no column. Caused production 500s on all `GET /api/agents/:id` calls. Fixed in hotfix commit `6faa382`. +- **Route pointer left on old handler after merge:** The Gitea agent's conflict resolution for mod.rs kept the original `revoke_agent_key` route (single-layer revoke) rather than switching to `revoke_agent_key_handler` (dual-layer). Caught by manual review, fixed in `00129af`. +- **SSH to production server unavailable from BEAST:** `known_hosts` file issue prevented SSH key auth during incident diagnosis. Worked around by getting a JWT token via API and testing endpoints directly with curl. + +--- + +## Configuration Changes + +**GuruRMM submodule (`projects/msp-tools/guru-rmm/`):** + +Server files modified: +- `server/src/db/agents.rs` — `get_agent_with_details_by_id`, `revoke_agent_key` (DB functions) +- `server/src/db/enroll.rs` — `get_enrollment_by_id`, `revoke_single_enrollment`, `list_enrollments_for_site`, `duplicate_active_hostnames` (4 new DB functions) +- `server/src/db/discovery.rs` — `DiscoveredDeviceWithContext`, `list_all_discovered_devices` +- `server/src/db/updates.rs` — `Serialize` derive added to `AgentUpdateRecord` +- `server/src/db/bsod_events.rs` — removed stale `#[allow(dead_code)]` +- `server/src/api/agents.rs` — updated `get_agent` handler; added `list_bsod_events`, `get_agent_version_history`, `revoke_agent_key` (old, kept), `revoke_agent_key_handler` (new dual-revoke) +- `server/src/api/enroll.rs` — `EnrollmentRecord` struct, `list_site_enrollments`, `revoke_enrollment` handlers +- `server/src/api/install_report.rs` — `InstallReport` struct, `list_install_reports`, `get_install_report` handlers +- `server/src/api/discovery.rs` — `list_all_devices` handler +- `server/src/api/mod.rs` — 10 new routes wired + +Dashboard files modified: +- `dashboard/src/api/client.ts` — `AgentStats`, `BsodEvent`, `AgentUpdateRecord`, `InstallReport`, `DiscoveredDeviceWithContext`, `EnrollmentRecord` interfaces; `agentsApi` extensions; `installReportsApi`, `discoveryFleetApi`, `enrollApi` namespaces +- `dashboard/src/pages/AgentDetail.tsx` — Crashes tab, version history in Updates tab +- `dashboard/src/pages/Dashboard.tsx` — fleet stats wired to `/agents/stats` +- `dashboard/src/pages/SiteDetail.tsx` — Revoke Key button + Enrollment tab +- `dashboard/src/components/Tabs.tsx` — optional `badge` prop +- `dashboard/src/components/FunctionRail.tsx` — Install Reports + Fleet Discovery nav links +- `dashboard/src/App.tsx` — `/install-reports`, `/discovery` routes + +Dashboard files created: +- `dashboard/src/pages/InstallReports.tsx` +- `dashboard/src/pages/Discovery.tsx` + +**ClaudeTools repo:** +- `projects/msp-tools/guru-rmm/docs/UI_GAPS.md` — updated to mark 8 items complete, detail enrollment partial status + +--- + +## Credentials & Secrets + +- GuruRMM API admin: `claude-api@azcomputerguru.com` / vault: `infrastructure/gururmm-server.sops.yaml` field `admin-password` +- SSH: `guru@172.16.3.30` — SSH key auth failing from BEAST (known_hosts issue); password in vault `infrastructure/gururmm-server.sops.yaml` field `password` + +--- + +## Infrastructure & Servers + +- **GuruRMM server:** `172.16.3.30:3001` (Rust/Axum, systemd `gururmm-server`) +- **Dashboard:** `https://rmm.azcomputerguru.com` (nginx, `/var/www/gururmm/dashboard/`) +- **Gitea:** `https://git.azcomputerguru.com/azcomputerguru/gururmm` (repo: `azcomputerguru/gururmm`) +- **Build pipeline webhook:** `172.16.3.30:9000` (not reachable from BEAST — Gitea triggers it) +- **Coord API:** `http://172.16.3.30:8001/api/coord` + +--- + +## Commands & Outputs + +```bash +# Verified API server up during incident +curl -s "http://172.16.3.30:3001/api/auth/me" +# → "Missing authorization header" (server alive) + +# Confirmed new build deployed (new endpoint returns 200) +curl -sv "http://172.16.3.30:3001/api/agents//bsod-events" -H "Authorization: Bearer $TOKEN" +# → HTTP/1.1 200 OK + +# Confirmed bug: get_agent returning 500 +curl -s "http://172.16.3.30:3001/api/agents/" -H "Authorization: Bearer $TOKEN" +# → "Internal server error" + +# Hotfix commits (gururmm submodule) +# 5854c63 — main UI gap batch +# 6faa382 — hotfix: role_override missing from get_agent_with_details_by_id +# 49a7109 — enrollment audit batch +# 00129af — fix: wire revoke_agent_key_handler to route + +# Coord message handled: GURU-5070 skills empty, Phase 5c fix +# BEAST sync confirmed 19 skills deployed +``` + +--- + +## Pending / Incomplete Tasks + +- **Tunnel Session Management (P2):** Server skeleton "not yet implemented". Needs: xterm.js design, WS protocol spec for dashboard client, proper Phase 2 implementation. Estimated 3-5 days separate effort. +- **Enrollment audit log (enrollment table) — key status column:** The `enrolled_agents` table doesn't expose a "key_prefix" field (key hash is intentionally hidden). The UI shows hostname, dates, IPs but not a redacted key identifier. Could add first-8-chars of key hash as a non-sensitive display field in a future pass. +- **SSH from BEAST to production:** `known_hosts` file issue prevents SSH key auth. Should be fixed for incident response capability. Likely needs the production server's host key added to BEAST's known_hosts. +- **Tunnel (deferred):** See above. +- **Documentation (P3):** User guide, inline help tooltips — still not started. +- **`revoke_agent_key` (old handler):** Dead code now that route points to `revoke_agent_key_handler`. Should be removed in a cleanup pass to avoid confusion. + +--- + +## Reference Information + +- GuruRMM wiki: `wiki/projects/gururmm.md` +- UI gaps doc: `projects/msp-tools/guru-rmm/docs/UI_GAPS.md` +- enrolled_agents migration: `server/migrations/012_enrolled_agents.sql` +- Key commits: + - `5854c63` — UI gap batch (7 server endpoints, 2 new pages, 5 dashboard enhancements) + - `6faa382` — hotfix role_override in get_agent SQL + - `49a7109` — enrollment audit batch + - `00129af` — fix route pointer to dual-revoke handler +- Coord message addressed: GURU-5070 skills Phase 5c fix (commit `62fed033`) diff --git a/wiki/projects/gururmm.md b/wiki/projects/gururmm.md index 33cc73d..cfc8cac 100644 --- a/wiki/projects/gururmm.md +++ b/wiki/projects/gururmm.md @@ -3,7 +3,7 @@ type: project name: gururmm display_name: GuruRMM last_compiled: 2026-06-07 -compiled_by: Mikes-MacBook-Air/claude-main +compiled_by: GURU-BEAST-ROG/discord-bot aliases: - guru-rmm sources: @@ -53,6 +53,7 @@ sources: - "live GuruRMM Postgres query 2026-06-04: agents/sites/update_rollouts/agent_updates tables (channel verification)" - session-logs/2026-06-07-mike-gururmm-backup-alert-cleanup.md - session-logs/2026-06-07-mike-gururmm-offline-alerting-mute.md + - session-logs/2026-06-07-mike-gururmm-ui-gaps-enrollment.md backlinks: - clients/cascades-tucson - systems/gururmm-build @@ -82,6 +83,33 @@ GuruRMM is a Remote Monitoring & Management platform built by Arizona Computer G ## Recent Work +### 2026-06-07 — UI Gap Batch + Enrollment Audit (GURU-BEAST-ROG session) + +**API changes:** +- `get_agent` handler now returns `AgentWithDetails` — includes `client_id` and `client_name` via JOIN query (was previously missing client context) +- New endpoints: `GET /api/agents/:id/bsod-events`, `GET /api/agents/:id/version-history` +- New endpoints: `GET /api/install-reports`, `GET /api/install-reports/:id` +- New endpoint: `GET /api/discovery/all-devices` +- `DELETE /api/agents/:id/key` redesigned as dual-revoke: nulls `agents.api_key_hash` (legacy Mode 1/2) AND sets `enrolled_agents.revoked = TRUE` (modern agk_ Mode 3). Previous implementation only touched the legacy field, leaving modern agents able to reconnect. +- New endpoint: `GET /api/sites/:id/enrolled-agents` — full enrollment audit with duplicate-hostname annotation (scoped to active/non-revoked rows) +- New endpoint: `POST /api/enrolled-agents/:id/revoke` — per-enrollment targeted revocation with cascade to `agents.api_key_hash` + +**Dashboard additions:** +- AgentDetail: Crashes tab (BSOD events), version history table in Updates tab +- Dashboard.tsx: fleet stats wired to `/agents/stats` endpoint (was computed client-side) +- SiteDetail: Revoke Key button + Enrollment tab with full audit table, status badges, duplicate-hostname warnings, per-row revoke with confirm dialog +- New pages: InstallReports (`/install-reports`), Discovery (`/discovery`) +- FunctionRail: nav links for Install Reports + Fleet Discovery + +**Hotfix (production 500s):** +- `get_agent_with_details_by_id` SQL was missing `a.role_override` — migration 054 added `role_override: String` (non-optional) to `AgentWithDetails` and the pre-054 SQL template was used for the new function. `sqlx::FromRow` panics at runtime when a required field has no column. All `GET /api/agents/:id` calls returned 500 until hotfix commit `6faa382`. + +**Key commits:** `5854c63` (UI gap batch), `6faa382` (hotfix: role_override in SQL), `49a7109` (enrollment audit batch), `00129af` (fix: wire revoke_agent_key_handler to route) + +**Tunnel status:** Server `tunnel.rs` is a dead-code skeleton logging "not yet implemented" — deferred; needs xterm.js design + WS protocol spec before implementation. + +--- + ### 2026-06-07 — Credential Inheritance Deployment & Offboarding Spec **Deployed to production:** @@ -175,6 +203,34 @@ Agent<->server communication is a persistent authenticated WebSocket with auto-r - Enrollment & keys (`012`): per-agent key issuance on first run, site API keys (regenerable), site-specific MSI with SITEKEY injected at download, public install-report ingestion. Legacy PowerShell agent path for Server 2008 R2. - Logs: agent log upload (periodic + on-demand), per-agent events (`042`), fleet log view, AI-assisted log analysis (`/logs/analyze`) — AI-optional per locked decision. +### Enrollment Detail (migration 012 + 2026-06-07 audit) + +The `enrolled_agents` table (migration 012) is the authoritative enrollment history log: + +| Field | Notes | +|---|---| +| `site_id` | Site the agent enrolled under | +| `agent_id` | Set on WebSocket connect (links enrollment record to the agent row) | +| `hostname` | Hostname at enrollment time | +| `agent_key_hash` | Hash of the `agk_` enrollment key (intentionally hidden; not surfaced in API) | +| `enrolled_at` | Timestamp of initial enrollment | +| `last_seen` | Updated on each WS connection | +| `ip_address` | IP at last connection | +| `os_version` | OS at enrollment time | +| `revoked` | Boolean; set TRUE by revocation endpoints | + +**WS auth modes:** +- **Mode 3** (modern, `agk_` keys): checks `enrolled_agents.agent_key_hash` first +- **Mode 1/2** (legacy): falls back to `agents.api_key_hash` + +**Revocation — dual-layer (fixed 2026-06-07):** +- `DELETE /api/agents/:id/key` (bulk): revokes BOTH `agents.api_key_hash` (Mode 1/2) AND sets `enrolled_agents.revoked = TRUE` for all enrollment records (Mode 3). Previously only touched the legacy field. +- `POST /api/enrolled-agents/:id/revoke` (targeted): per-enrollment revocation with cascade to `agents.api_key_hash`. + +**Enrollment audit endpoint:** `GET /api/sites/:id/enrolled-agents` — returns full enrollment history with `is_duplicate_hostname` annotation. Duplicate flag is scoped to active (non-revoked) rows only; a revoked + active pair is not flagged. + +**Dashboard:** Enrollment tab on SiteDetail — audit table with status badges, duplicate-hostname warnings, and per-row Revoke button with confirmation dialog (shipped 2026-06-07). + ### Platform coverage - **Cross-platform (Win/Linux/macOS):** metrics, network state, hardware/software/service inventory, user/group + DC/domain detection, checks (service checks Win+Linux), discovery, self-update, scripts/commands. - **Windows-only:** `user_session` command context (WTS), LibreHardwareMonitor temps, remote registry, tray-into-session, watchdog SCM supervision, BSOD detection. @@ -266,7 +322,7 @@ As of 2026-06-07 (agent 0.6.54 beta / 0.6.47 stable / server 0.3.45): - **Credential inheritance (deployed 2026-06-07):** Production server running v0.3.45 with full credential inheritance and de-duplication. `/effective` endpoints validated. Dashboard clickable alert badges and client-scoped filtering implemented. - **SPEC-028 offboarding wizard (specification complete):** 835-line spec created for site and client offboarding workflows. Includes data export, dependency analysis, typed confirmation, and audit logging. Roadmap updated with "Client & Site Lifecycle Management" section. Implementation pending. - **BUG-020 — tray duplicate/ghost icons (fixed to beta, 2026-06-04):** Commit `137dd85` shipped to main -> beta. Fix #1: per-session `Local\GuruRMM_Tray` single-instance mutex in the tray binary. Fix #2: `TrayLauncher` reconciliation via `WTSEnumerateProcessesW` (idempotent across watchdog restarts). Fix #3: graceful `Global\GuruRMM_TrayShutdown_{sid}` event -> 3s wait -> `TerminateProcess` fallback (so `NIM_DELETE` fires and ghost icon is cleaned). [NOTE: Fix #3 is implemented but dormant — `terminate_all` has no caller in the agent yet. Tracked in coord todo `25fdf31a` to wire into the watchdog policy-disable/uninstall path.] -- **BSOD detection Phase 2/3 (deferred):** Dashboard "Crashes" tab + BSOD in Alerts stream (issue #10, dashboard bullets unchecked); `fetch_bsod_dump` on-demand upload; full ~350-entry bugcheck name table (Phase 1 ships a 10-code map). +- **BSOD detection Phase 2/3 (deferred):** Dashboard "Crashes" tab shipped 2026-06-07 (AgentDetail Crashes tab + version history in Updates tab). Remaining: BSOD in Alerts stream (issue #10); `fetch_bsod_dump` on-demand upload; full ~350-entry bugcheck name table (Phase 1 ships a 10-code map). - **Linux fleet unit drift:** Auto-updater replaces the binary but does NOT refresh the systemd unit file. Pre-BUG-016-fix Linux agents have new binary + old unit (missing `StateDirectory=gururmm`). Needs an ops-script pass via `/rmm` or organic at next reinstall. - **Tray IPC + peer authorization** — Linux tray merged (PR #13+#14). Open: Windows peer authz (#16), logind console-user resolution (#17), macOS tray (#18), subscriber broadcast (#19). - **Auto-update reliability** — BB-SERVER and RECEPTIONIST-PC (Cascades) miss dispatch windows due to flaky WebSockets. Re-querying pending updates on reconnect: incomplete as of 2026-05-24. @@ -274,6 +330,10 @@ As of 2026-06-07 (agent 0.6.54 beta / 0.6.47 stable / server 0.3.45): - **MSP360 backup integration** — Alert quality pass shipped 2026-06-07 (commits `779f7f6` + `b82c010`): false `backup_failed` alerts 15 -> 2; `backup_storage_low` removed (structurally false signal; 5 -> 0 false alerts). Dashboard UI shipped 2026-05-31. Phase 2 (management) not started; genuine destination-capacity alerting deferred (needs MSP360 storage-accounts endpoint). - **MSP360 API scope (confirmed 2026-06-07):** Provider API token (vault `msp-tools/msp360-api.sops.yaml`) is monitoring-only at our tier: Companies/Users/Monitoring GET endpoints; every management path (plan delete, manual run, storage config) returns 404. White-labeled MSP360 agent has no `cbb.exe` at any standard path. Plan delete/run/storage remain MSP360-console tasks. - **Alert mute dashboard (Task 5) -- NOT started:** Ignore button + required-reason prompt on alert row; Muted filter on alerts page + agent Alerts tab; Un-ignore control; muted rows show Un-ignore instead of Ack/Resolve. Server endpoints ready: `POST /api/alerts/:id/mute` + `/unmute`. +- **Tunnel session management (P2, deferred):** Server `tunnel.rs` is a dead-code skeleton that logs "not yet implemented"; wiring the route would expose a non-functional endpoint. Needs: xterm.js design, WS protocol spec for dashboard client, proper Phase 2 implementation. Estimated 3-5 days separate effort. +- **SSH from BEAST to production broken:** `known_hosts` issue prevents SSH key auth from GURU-BEAST-ROG to `172.16.3.30`. Workaround during incident: JWT token + curl for endpoint diagnosis. Needs production server host key added to BEAST's `~/.ssh/known_hosts`. +- **`revoke_agent_key` dead code:** Old single-layer handler is dead code now that the route points to `revoke_agent_key_handler`. Should be removed in a cleanup pass. +- **Enrollment key display gap:** `enrolled_agents` table doesn't expose a key prefix — key hash is intentionally hidden. UI shows hostname, dates, IPs but not a redacted key identifier. Could add first-8-chars of hash as a non-sensitive display field in a future pass. - **Offline alerting -- classification gap:** SIF-SERVER (SifOidak.local DC), Server2013 (Sombra), and other inventory-less offline servers currently auto-classify as workstations (no `os_product_type`; `os_name` doesn't match ~/server/i for those names). Needs manual `role_override` via `PUT /api/agents/:id/role-override` -- Mike held on tagging them. - **`/backup-status` endpoint shape gap:** Returns only one plan per agent (not all plans); makes agents with a dead old plan + healthy current plan look stale-but-green in the backup tab. Compliance domain evaluates all plans correctly. Not fixed this session — noted for future. - **Security audit backlog:** `credentials/:id/reveal` horizontal privilege escalation (HIGH), `internal_err()` raw DB errors at ~130 call sites (HIGH). @@ -406,16 +466,15 @@ Gitea push to main - Response: `stdout`, `stderr`, `exit_code`, `status` (running/completed/failed/timeout/interrupted) **Dashboard — complete and working:** -Agents management, Clients/Sites CRUD, Commands execution + terminal, Logs + AI analysis, Alerts (with clickable severity badges + client filtering), Metrics (CPU/RAM/disk/network, process drill-down modal), Auto-update triggering, Network state, Entra ID SSO (Entra only — Google planned per SPEC-008, not implemented), Policies Dashboard (all tabs), Registry editor, MSP360 backup status card + agent<->backup mappings/verify UI, Organizations management + dev-admin impersonation UI, Credentials management with inheritance support. +Agents management, Clients/Sites CRUD, Commands execution + terminal, Logs + AI analysis, Alerts (with clickable severity badges + client filtering), Metrics (CPU/RAM/disk/network, process drill-down modal), Auto-update triggering, Network state, Entra ID SSO (Entra only — Google planned per SPEC-008, not implemented), Policies Dashboard (all tabs), Registry editor, MSP360 backup status card + agent<->backup mappings/verify UI, Organizations management + dev-admin impersonation UI, Credentials management with inheritance support, AgentDetail Crashes tab (BSOD events), AgentDetail version history in Updates tab, fleet stats from `/agents/stats`, SiteDetail Revoke Key button + Enrollment audit tab (with duplicate-hostname badge), Install Reports page, Fleet Discovery page. **Dashboard — incomplete (see UI_GAPS.md):** -- Enrollment management UI (revoke keys, audit log, duplicate hostname warnings) - Watchdog alerts UI — blocked on 2 missing server routes -- BSOD/Crashes tab on Agent Detail (Phase 2 deferred) - BSOD in Alerts stream (Phase 2 deferred) -- Tunnel session management (interactive terminal — backend skeleton, not production-ready) +- Tunnel session management (interactive terminal — server skeleton "not yet implemented"; needs xterm.js design + WS protocol spec) - Offboarding wizard UI (SPEC-028 complete, implementation pending) - Alert mute UI -- Ignore button + required-reason prompt, Muted filter, Un-ignore control (server endpoints ready; dashboard Task 5 NOT started) +- Documentation / user guide / inline help tooltips (P3, not started) **Open Gitea issues:** - #10 — BSOD detection Phase 2/3 (dashboard + fetch_bsod_dump + full bugcheck table) @@ -478,6 +537,7 @@ These decisions are locked. Do not reverse without explicit user approval. | 2026-06-07 | Backup-alert quality pass shipped. FU1 (`summarize_backup_error` decodes MSP360 message JSON; `create_or_update_alert` now refreshes title/message/severity on re-trigger, also fixes latent severity-escalation freeze) + FU2 (exclude non-backup PlanTypes 8=Restore/13=Consistency-check from alerting/compliance): false `backup_failed` alerts 15 -> 2 fleet-wide (survivors AD1, LAB-Becky are genuine and self-describing), commit `779f7f6`. `backup_storage_low` alert type removed entirely (commit `b82c010`): `DataCopied/TotalData` measures backup-dataset completeness, not destination capacity — produced 5 fleet-wide false alerts including DF-HYPERV-B "100% Full" on a 4 GB plan; `resolve_all_backup_storage_alerts` (type-scoped, idempotent, once-per-tick) clears stragglers; 5 -> 0 verified after 17:21:41 UTC restart. Genuine destination-capacity alerting deferred (needs MSP360 storage-accounts endpoint). `BACKUP_STALE` evaluator confirmed already correct — no new code. Both commits on main. Submodule pinned at `226ba9f` in parent. | | 2026-06-07 | Credential inheritance deployed to production (server v0.3.45). Hierarchical credential propagation (Global → Client → Site) with `is_inheritable` flag and de-duplication by (credential_type, label). `/effective` endpoints validated. Dashboard UI: clickable alert severity badges with client filtering, offline badge now scopes to client-specific agents. SPEC-028 offboarding wizard specification created (835 lines) covering site and client offboarding workflows with data export, dependency analysis, typed confirmation, and audit logging. FEATURE_ROADMAP.md updated with "Client & Site Lifecycle Management" section. | | 2026-06-07 | Role-aware offline alerting + alert ignore/mute shipped (second session). Offline sweep evaluator (60s tokio interval, `server/src/alerts/offline.rs`): server-only `agent_offline` alerts (classifier: `os_product_type` 2/3, else `os_name` ~/server/i, else `role_override`); site rule (>=50% + >=3) -> `mass_offline_site`; fleet rule (>=10) -> `mass_offline_fleet`; aggregates pinned to representative agent with site/fleet `dedup_key`. Warm-up restart guard (code review caught + fixed `last_seen < started_at` spec defect — permanently false in steady state). Migration 054 (`agents.role_override`). Dashboard triage: servers elevated individually; workstations collapsed into roll-up. `PUT /api/agents/:id/role-override`. Verified live vs WIN-TG2STMODJG8 ("Windows Server 2019 Standard Evaluation", auto-classified via `os_name`); zero false mass-offline against ~55 chronically-offline agents; restart guard held across deploys. Known gap: `os_product_type` on only 16/168 agents; `os_name` is workhorse. Commits f1cdf5d/30e4f23/21d63bd/3eedf91. Alert ignore/mute (perma-silence): migration 055 (`alert_mutes` table + `muted` status); `dedup_key`-keyed; reason required (400 if missing); gates `create_or_update_alert` + `create_check_alert` bypass; `POST /api/alerts/:id/mute` + `/unmute`; dashboard UI NOT started. Verified live (mute -> `muted`; 60s sweep did not re-fire; unmute -> `active`). Commits 29c405e/a120e71. MSP360 Provider API probed live: monitoring-only tier (Companies/Users/Monitoring GET; management paths 404); plan delete/run/storage remain console tasks; white-label agent has no `cbb.exe`. | +| 2026-06-07 | UI gap batch + enrollment audit (GURU-BEAST-ROG session). `get_agent` upgraded to `AgentWithDetails` with `client_id`/`client_name` JOIN. Six new/updated server endpoints: `GET /api/agents/:id/bsod-events`, `GET /api/agents/:id/version-history`, `GET /api/install-reports` + `/:id`, `GET /api/discovery/all-devices`, `DELETE /api/agents/:id/key` (redesigned as dual-layer revoke). Two new enrollment endpoints: `GET /api/sites/:id/enrolled-agents` (audit with duplicate-hostname annotation) + `POST /api/enrolled-agents/:id/revoke`. Dashboard: AgentDetail Crashes tab + version history, Dashboard.tsx fleet stats from `/agents/stats`, SiteDetail Revoke Key + Enrollment audit tab, new pages InstallReports + Discovery. Hotfix `6faa382`: `role_override` field missing from new `get_agent_with_details_by_id` SQL caused all `GET /api/agents/:id` to return 500 in production; fixed within same session. Enrollment `enrolled_agents` table (migration 012) required no new migration — gap was API exposure and UI only. WS auth: Mode 3 checks `enrolled_agents.agent_key_hash` first; Mode 1/2 falls back to `agents.api_key_hash`; dual-revoke now covers both layers. Key commits: `5854c63`, `6faa382`, `49a7109`, `00129af`. All UI gaps except Tunnel and documentation are now complete. SSH from BEAST to production broken (known_hosts issue) — workaround: JWT + curl. | --- @@ -493,6 +553,7 @@ These decisions are locked. Do not reverse without explicit user approval. - **2026-06-07 recompile:** Folded in backup-alert quality pass (commits `779f7f6` + `b82c010`, both on main). Updated Backup Integration capability section: added FU1/FU2 alert quality pass detail (false backup_failed 15->2; summarize_backup_error; create_or_update_alert refresh); documented backup_storage_low removal (structurally false DataCopied/TotalData signal; 5->0 false alerts; resolve_all_backup_storage_alerts); confirmed BACKUP_STALE evaluator correct (no new code); added key functions list and MSP360 PlanType exclusion map. Updated Repo Structure to include db/mspbackups.rs and mspbackups/ key functions. Updated Current Focus MSP360 line and added /backup-status endpoint shape gap. Updated Summary date and added backup-alert quality pass note. Active State date note updated. Added 2026-06-07 History row. Patterns and History existing rows preserved verbatim. - **2026-06-07 recompile:** Updated for credential inheritance production deployment (server v0.3.45), clickable alert badges with client filtering, and SPEC-028 offboarding wizard specification. Added Recent Work section documenting 2026-06-07 session accomplishments. Updated Current Focus to reflect credential inheritance as deployed and offboarding wizard as spec-complete/implementation-pending. Updated Dashboard status to include credentials management with inheritance. Updated version numbers throughout (server 0.3.37 → 0.3.45). Added session-logs/2026-06-07-mike-gururmm-offboarding-spec.md to sources. Updated History Highlights with 2026-06-07 entry. - **2026-06-07 recompile (second session):** Folded in role-aware offline alerting (server/src/alerts/offline.rs: offline_sweep + agent_role classifier + mass_offline site/fleet detection; migration 054 agents.role_override; dashboard triage + role-override control on agent detail; warm-up restart guard; four commits f1cdf5d/30e4f23/21d63bd/3eedf91) and alert ignore/mute perma-silence (alert_mutes table, muted status, is_dedup_muted/mute_condition/unmute_condition, two-choke-point gate at create_or_update_alert + create_check_alert bypass, mute/unmute API; migration 055; dashboard pending; two commits 29c405e/a120e71). Added MSP360 API scope finding (monitoring-only tier confirmed by live probe; management paths 404). Updated Alerting & Watchdog section with offline alerting and mute detail including new alert types (agent_offline, mass_offline_site, mass_offline_fleet) and new status (muted). Updated Repo Structure (alerts/ directory; db/alerts.rs key functions; dashboard/ entry with agentRole.ts). Updated Development / Current Focus with alert mute dashboard (Task 5 not started), offline classification gap, and MSP360 API scope item. Added alert mute UI to Dashboard incomplete list. Added second 2026-06-07 History row. Updated migration count to 55+ (054/055 confirmed). Added session log source. Patterns section and all existing History rows preserved verbatim. +- **2026-06-07 recompile (GURU-BEAST-ROG/discord-bot):** Folded in UI gap batch + enrollment audit session. Added new Recent Work entry (2026-06-07, UI gaps + enrollment). Added Enrollment Detail subsection under Identity / Multi-tenancy documenting `enrolled_agents` table schema, WS auth modes, dual-layer revoke design, and enrollment audit endpoint. Updated `get_agent` capability (now `AgentWithDetails` with client JOIN). Documented six new/updated endpoints (bsod-events, version-history, install-reports, discovery/all-devices, dual-revoke key, enrollment audit + revoke). Updated Dashboard complete list: added Crashes tab, version history, fleet stats, SiteDetail Revoke Key + Enrollment tab, InstallReports + Discovery pages. Updated Dashboard incomplete list: removed enrollment management (now complete); updated Tunnel status (server skeleton confirmed "not yet implemented"); added documentation gap. Updated Current Focus: BSOD Phase 2 partially complete (Crashes tab shipped; Alerts stream + dump upload still deferred); added Tunnel deferred detail, SSH from BEAST broken, dead code cleanup, enrollment key display gap. Added new 2026-06-07 History row (UI gap batch + enrollment audit). Added session log to sources. Updated compiled_by to GURU-BEAST-ROG/discord-bot. History and Patterns sections preserved verbatim. ## Backlinks