sync: auto-sync from DESKTOP-0O8A1RL at 2026-05-19 17:56:56

Author: Mike Swanson Machine: DESKTOP-0O8A1RL Timestamp: 2026-05-19 17:56:56
2026-05-19 17:57:02 -07:00
parent b918776eee
commit 5ead5d4dee
5 changed files with 482 additions and 1 deletion
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@@ -252,6 +252,7 @@ Vault structure: `infrastructure/`, `clients/`, `services/`, `projects/`, `msp-t
 | `/remediation-tool` | M365 breach checks, tenant sweeps, gated remediation |
 | `/feature-request` | Howard submits a GuruRMM feature request — Claude classifies it and messages Mike |
 | `/shape-spec` | Pre-implementation spec for a GuruRMM feature — produces plan.md, shape.md, references.md, standards.md |
+| `/rmm-audit` | Full end-to-end audit of GuruRMM: API coverage, UI gaps, Rust/TS quality, security, data integrity. Produces timestamped report + updates UI_GAPS.md |

 ---

--- a/.claude/skills/rmm-audit/SKILL.md
+++ b/.claude/skills/rmm-audit/SKILL.md
@@ -0,0 +1,404 @@
+---
+name: rmm-audit
+description: |
+  Periodic end-to-end verification of the GuruRMM codebase. Runs 5 audit passes
+  across 4 parallel agents: (1) API/route inventory cross-reference, (2) UI coverage
+  and gap update, (3) Rust code quality and standards compliance, (4) TypeScript
+  frontend quality, (5) security and data integrity. Writes a timestamped report
+  and updates the living gap tracker (UI_GAPS.md). Takes 10-20 minutes.
+
+  Invoke explicitly only — no auto-trigger. Use /rmm-audit for a full audit.
+  Optional arg: --pass=<name> to run a single pass (api, ui, rust, ts, security).
+---
+
+# GuruRMM End-to-End Audit
+
+Periodic full-stack sanity check. Read-only with respect to code: findings go to a
+report file and the living docs are updated, but no source code is changed.
+
+---
+
+## Execution Overview
+
+```
+Phase 0: Context load (coordinator reads key files)
+Phase 1: Spawn 4 parallel audit agents
+Phase 2: Collect findings, aggregate, score
+Phase 3: Write report + update living docs
+Phase 4: Present summary to user
+```
+
+The audit is orchestrated here (Claude coordinator). All heavy passes run in
+parallel subagents. Each agent returns structured findings; the coordinator
+aggregates and writes the final report.
+
+---
+
+## Phase 0: Context Load (Coordinator Reads These)
+
+Before spawning agents, read these yourself:
+
+1. `projects/msp-tools/guru-rmm/CONTEXT.md` — project overview and current state
+2. `projects/msp-tools/guru-rmm/docs/UI_GAPS.md` — living gap tracker (may be stale)
+3. `projects/msp-tools/guru-rmm/docs/FEATURE_ROADMAP.md` — planned features
+4. `.claude/CODING_GUIDELINES.md` — development standards
+
+Capture from `server/src/api/mod.rs` the complete route list (all `.route(...)` calls,
+including any `.nest(...)` prefix). This becomes the **authority route list** passed to Agent A, which covers both the API and UI passes.
+
+---
+
+## Phase 1: Parallel Audit Agents
+
+Spawn all four agents simultaneously in a single message. Each agent receives the
+full context it needs inline — do not assume they share context.
+
+---
+
+### Agent A — API Coverage & Route Inventory
+
+**Goal:** Find server endpoints with no dashboard UI exposure, and dashboard API
+client functions calling non-existent or mismatched server routes.
+
+**Instructions for agent:**
+
+1. Read `server/src/api/mod.rs` in full — extract every `.route(path, method_router)`
+   call into a list of (method, path) pairs grouped by resource (agents, clients, sites, policies, etc.).
+
+2. Read `dashboard/src/api/client.ts` — extract every API function and the URL path
+   it calls.
+
+3. Cross-reference:
+   - Server routes with no client.ts function → **ORPHANED ROUTE** (dead code or
+     intentionally backend-only — distinguish by context)
+   - Client.ts functions whose URL doesn't match any server route → **BROKEN CALL**
+   - Server routes that exist but the client uses a different HTTP method → **METHOD MISMATCH**
+
+4. Read `dashboard/src/App.tsx` (or wherever routes are defined) — list all frontend
+   pages and their paths.
+
+5. For each API resource group, assess whether a UI page exposes the major CRUD
+   operations. Flag any resource that has full CRUD on the server but only partial
+   or no UI.
+
+6. Return structured findings: `[SEVERITY] Description — file:line`.
+
+---
+
+### Agent B — Rust Code Quality & Standards
+
+**Goal:** Find violations of the established Rust coding standards and common quality
+issues across the server codebase.
+
+**Instructions for agent:**
+
+Read `.claude/CODING_GUIDELINES.md` first to know the rules. Then check:
+
+**Compliance checks:**
+- `.unwrap()` or `.expect()` calls outside of `#[cfg(test)]` blocks — these panic in
+  production. Flag every occurrence with context. Note: `.expect("invariant reason")`
+  in truly-impossible paths is acceptable if the reason is clear.
+- `todo!()` or `unimplemented!()` macros in non-test production code paths.
+- `sqlx::query!` / `sqlx::query_as!` macros (banned — project uses SQLX_OFFLINE=true
+  with runtime queries only).
+- `format!()` used to construct SQL strings (injection risk — parameterize instead).
+- `unwrap_or_default()` on `Option<String>` producing empty strings that get used as
+  real values without validation.
+
+**Auth coverage:**
+- Read `server/src/api/mod.rs` — identify which route groups use
+  `.layer(middleware::from_fn(...))` or similar auth middleware vs. which are public.
+- Any non-public route that doesn't go through JWT auth or agent-key auth middleware
+  is a `[CRITICAL]` finding.
+- Check that agent-key-authenticated routes cannot be accessed with JWT tokens and
+  vice versa (separation of agent vs. admin plane).
+
+**Logging hygiene:**
+- Grep for `tracing::` / `log::` calls and bare `info!` / `warn!` / `error!` / `debug!`
+  macros that include agent keys, passwords, tokens, or PII (email addresses, user names). These should use redacted representations.
+
+**Error handling:**
+- HTTP handlers returning 500 with `e.to_string()` — raw error messages can leak
+  internals to API callers. Should be logged server-side, generic message returned.
+
+**Search paths:** `server/src/` only. Exclude `server/src/db/mod.rs` re-exports.
+
+Return structured findings with file:line references.
+
+---
+
+### Agent C — TypeScript / Frontend Quality
+
+**Goal:** Find frontend code quality issues, standards violations, and missing
+UI patterns.
+
+**Instructions for agent:**
+
+**TypeScript quality:**
+- `any` type annotations in `dashboard/src/` — each is a type safety gap.
+- `@ts-ignore` or `@ts-expect-error` comments — note reason and whether they're
+  still needed.
+- `console.log` / `console.error` left in production code (not in dev-only blocks).
+- Hardcoded API base URLs instead of using `import.meta.env.VITE_API_URL`.
+
+**Component patterns:**
+- React components without error boundaries on data-fetching sections — if the
+  query throws, the entire page crashes.
+- `useQuery` calls with no `isLoading` or `isError` handling — silent failures.
+- Forms with no validation on submit (empty strings submitted as values).
+- Missing `key` prop on mapped elements.
+
+**API client completeness:**
+- For each resource in `client.ts`, verify the TypeScript interface matches what
+  the server actually returns. Focus on fields that were added recently:
+  `Agent` should have `is_dc`, `maintenance_mode_note`, `agent_version`.
+  `AgentUsers` should have `is_dc: boolean`, `groups: GroupEntry[]`.
+  Any missing field means the UI silently shows `undefined`.
+
+**Accessibility basics:**
+- Buttons with icon-only content (no text) — need `title` or `aria-label`.
+- Form inputs without associated `<label>` elements.
+
+**Search paths:** `dashboard/src/` only.
+
+Return structured findings with file:line references.
+
+---
+
+### Agent D — Data Integrity & Security
+
+**Goal:** Verify migration sequencing, wire format consistency between agent and
+server, policy system end-to-end completeness, and security boundaries.
+
+**Instructions for agent:**
+
+**Migration integrity:**
+- List all files in `server/migrations/` — verify numbering is sequential with no
+  gaps (001, 002, 003... no 019a or skipped numbers).
+- For each `CREATE TABLE` in migrations, verify a corresponding `db/<table>.rs`
+  module exists in the server.
+- For each `db/<table>.rs`, verify the struct fields match the migration columns.
+  Focus on recently added columns: `agent_users.is_dc`, `agent_users.groups`.
+
+**Agent ↔ Server wire format:**
+- Read `agent/src/transport/mod.rs` — list all `AgentMessage` variants and
+  `ServerMessage` variants.
+- Read `server/src/ws/mod.rs` — find the dispatch match arm for each incoming
+  `AgentMessage`. Any `AgentMessage` variant with no server handler is dead code
+  (or a missing implementation).
+- Read `server/src/policy/config_update.rs` — list all fields in `AgentConfigUpdate`.
+  Cross-reference with `agent/src/transport/mod.rs` `ConfigUpdatePayload`. Any field
+  the server sends but the agent doesn't handle (or vice versa) is a silent drop.
+
+**Policy system completeness:**
+- Read `server/src/db/policies.rs` `PolicyData` struct — list all sections.
+- Read `server/src/policy/merge.rs` — verify every section in `PolicyData` has a
+  corresponding merge function called in `merge_policy_data()`.
+- Read `server/src/policy/config_update.rs` — verify every section in `PolicyData`
+  is either mapped to `AgentConfigUpdate` or intentionally server-side-only (like
+  `thresholds`). Document which is which.
+- Check `server/src/policy/effective.rs` — are there assertions/tests for every
+  `PolicyData` section having a system default value?
+
+**Version consistency:**
+- `agent/Cargo.toml` version vs. the version string the server expects in heartbeats.
+  The server should not hard-reject agents on minor version differences.
+- `server/Cargo.toml` version — is it being bumped on releases?
+
+**WebSocket message validation:**
+- In `server/src/ws/mod.rs`, for each `AgentMessage` variant that carries a payload,
+  is the payload validated before use? (e.g., empty hostnames, negative numbers,
+  excessively large strings that could be a DoS vector).
+
+Return structured findings with file:line references.
+
+---
+
+## Phase 2: Aggregating Findings
+
+Collect all four agents' outputs. Classify each finding:
+
+| Severity | Meaning |
+|----------|---------|
+| `[CRITICAL]` | Security vulnerability, data loss risk, or production crash path |
+| `[HIGH]` | Functional gap blocking a core workflow, or standards violation with user impact |
+| `[MEDIUM]` | Code quality issue, UI gap, or inconsistency without immediate impact |
+| `[LOW]` | Minor polish, dead code, missing comment |
+| `[INFO]` | Neutral observation, completed item, or context note |
+
+Deduplicate: if two agents flag the same issue from different angles, merge into one
+finding with both references.
+
+---
+
+## Phase 3: Write Report + Update Living Docs
+
+### Report Location
+
+Write to: `projects/msp-tools/guru-rmm/reports/YYYY-MM-DD-rmm-audit.md`
+(use actual date). If a report from today already exists, add the next free numeric suffix (`YYYY-MM-DD-rmm-audit-2.md`, then `-3`, and so on).
+
+### Report Format
+
+```markdown
+# GuruRMM Audit Report — YYYY-MM-DD
+
+**Auditor:** Claude (claude-sonnet-4-6)
+**Passes:** API Coverage, UI Gaps, Rust Quality, TypeScript Quality, Data Integrity
+**Previous audit:** [link to prior report if one exists, else "First audit"]
+
+---
+
+## Executive Summary
+
+| Pass | Total | Critical | High | Medium | Low |
+|------|-------|---------|------|--------|-----|
+| API Coverage | N | N | N | N | N |
+| UI Gaps | N | N | N | N | N |
+| Rust Quality | N | N | N | N | N |
+| TypeScript | N | N | N | N | N |
+| Data Integrity | N | N | N | N | N |
+| **TOTAL** | **N** | **N** | **N** | **N** | **N** |
+
+**Requires immediate action:** [list of CRITICAL findings in one line each]
+
+---
+
+## Pass 1: API Coverage
+
+### [SEVERITY] Finding Title
+**File:** path/to/file.rs:LINE
+**Detail:** What the problem is, why it matters.
+**Recommendation:** What to do.
+
+[repeat for each finding]
+
+---
+
+## Pass 2: UI Gaps
+
+[Same format. Cross-reference against UI_GAPS.md — note which items in that doc
+are now COMPLETE vs. still open vs. newly discovered.]
+
+---
+
+## Pass 3: Rust Code Quality
+
+[findings]
+
+---
+
+## Pass 4: TypeScript / Frontend Quality
+
+[findings]
+
+---
+
+## Pass 5: Data Integrity & Security
+
+[findings]
+
+---
+
+## UI_GAPS.md Delta
+
+Items completed since last audit:
+- [x] Example completed gap
+
+Items still open:
+- [ ] Example open gap — still unimplemented
+
+New gaps discovered this audit:
+- [ ] Example new gap
+
+---
+
+## Recommended Action Order
+
+1. [CRITICAL items, sorted by impact]
+2. [HIGH items]
+3. [MEDIUM items — can be batched]
+```
+
+### Update UI_GAPS.md
+
+After writing the report, update `projects/msp-tools/guru-rmm/docs/UI_GAPS.md`:
+- Mark items `[x]` if the audit confirmed they're fully implemented
+- Update "Last Updated" date
+- Add any newly discovered gaps under the appropriate priority section
+- Do NOT remove completed items — move them to the "Completed Features" section
+
+---
+
+## Phase 4: User Summary
+
+Present a concise summary to the user:
+
+```
+Audit complete. Report: projects/msp-tools/guru-rmm/reports/YYYY-MM-DD-rmm-audit.md
+
+CRITICAL (N): [one-line each]
+HIGH (N):     [one-line each]
+MEDIUM (N):   Batched in report.
+
+UI_GAPS.md: N items marked complete, N new gaps added.
+
+Recommended first action: [the single highest-priority finding]
+```
+
+Then ask: "Want me to start on any of these findings?"
+
+---
+
+## Conventions
+
+- **Read, don't run.** This skill never executes code or makes API calls. It reads
+  files and uses Grep/GrepAI for search.
+- **Derive from code, not docs.** Treat all `.md` documentation as potentially stale.
+  The code is truth.
+- **Be specific.** Every finding needs a file:line reference. Vague findings ("the
+  code could be better") are useless.
+- **No false positives.** If something looks like a problem but context makes it
+  acceptable, note it as `[INFO]` with the reason it's OK.
+- **Severity is impact, not effort.** A two-line fix can be `[CRITICAL]` if it's a
+  security issue.
+- **Commit the report.** After writing, delegate to Gitea Agent to commit the report
+  file and the updated `UI_GAPS.md` (code fixes for the findings are separate work items).
+
+---
+
+## Reference: Key Files by Area
+
+### Server
+| Area | Key Files |
+|------|-----------|
+| Routes | `server/src/api/mod.rs` |
+| DB layer | `server/src/db/*.rs` |
+| WebSocket | `server/src/ws/mod.rs` |
+| Policy system | `server/src/policy/` |
+| Migrations | `server/migrations/*.sql` |
+
+### Dashboard
+| Area | Key Files |
+|------|-----------|
+| Routes | `dashboard/src/App.tsx` |
+| Pages | `dashboard/src/pages/*.tsx` |
+| API client | `dashboard/src/api/client.ts` |
+| Components | `dashboard/src/components/*.tsx` |
+
+### Agent
+| Area | Key Files |
+|------|-----------|
+| Wire format | `agent/src/transport/mod.rs` |
+| WS handler | `agent/src/transport/websocket.rs` |
+| Metrics | `agent/src/metrics/` |
+| Users | `agent/src/users.rs` |
+
+### Docs / Standards
+| Area | Key Files |
+|------|-----------|
+| Coding standards | `.claude/CODING_GUIDELINES.md` |
+| Feature roadmap | `projects/msp-tools/guru-rmm/docs/FEATURE_ROADMAP.md` |
+| UI gaps tracker | `projects/msp-tools/guru-rmm/docs/UI_GAPS.md` |
+| Architecture decisions | `projects/msp-tools/guru-rmm/docs/ARCHITECTURE_DECISIONS.md` |
+| Past audit reports | `projects/msp-tools/guru-rmm/reports/` |
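The Agent A cross-reference in the rmm-audit skill above can be illustrated with a small Python sketch. This is an illustration only, because the skill itself reads code rather than running scripts. It rests on several assumptions:

- Routes are axum-style `.route("/path", get(h).post(h2))` calls in `server/src/api/mod.rs`.
- `client.ts` calls look like `` api.get(`/path`) ``.
- All helper names are hypothetical.

Path parameters (`:id`, `{id}`, `${id}`) are collapsed to `{}` so that server and client paths compare equal. The sketch does not apply `.nest(...)` prefixes and does not handle nested TypeScript generics.

```python
import re

METHODS = ("get", "post", "put", "patch", "delete")
METHOD_ALT = "|".join(METHODS)

# Assumed axum style: .route("/api/agents/:id", get(get_agent).delete(delete_agent))
ROUTE_RE = re.compile(r'\.route\(\s*"([^"]+)"\s*,')
SERVER_METHOD_RE = re.compile(r"\b(" + METHOD_ALT + r")\s*\(")
# Assumed client style: api.get<Agent[]>(`/api/agents/${id}`)
CLIENT_CALL_RE = re.compile(
    r"\bapi\.(" + METHOD_ALT + r")\s*(?:<[^>]*>)?\(\s*['\"`]([^'\"`]+)['\"`]"
)


def normalize(path: str) -> str:
    """Collapse path parameters (${id}, {id}, :id) to '{}' and drop a trailing slash."""
    path = re.sub(r"\$\{[^}]*\}", "{}", path)    # TS template-literal params
    path = re.sub(r"\{[^}]*\}", "{}", path)      # axum 0.8 style {id}
    path = re.sub(r":[A-Za-z_]\w*", "{}", path)  # axum 0.7 style :id
    return path.rstrip("/") or "/"


def server_routes(rust_src: str) -> set[tuple[str, str]]:
    """Return {(METHOD, path)} for every .route(...) call."""
    routes = set()
    for m in ROUTE_RE.finditer(rust_src):
        # Walk forward to the parenthesis that closes this .route( call.
        depth, i = 1, m.end()
        while i < len(rust_src) and depth:
            depth += {"(": 1, ")": -1}.get(rust_src[i], 0)
            i += 1
        router = rust_src[m.end():i - 1]
        for method in SERVER_METHOD_RE.findall(router):
            routes.add((method.upper(), normalize(m.group(1))))
    return routes


def client_calls(ts_src: str) -> set[tuple[str, str]]:
    """Return {(METHOD, path)} for every api.<method>(...) call."""
    return {(m.upper(), normalize(p)) for m, p in CLIENT_CALL_RE.findall(ts_src)}


def cross_reference(server: set, client: set) -> list[str]:
    """Classify mismatches using the ORPHANED / BROKEN / METHOD MISMATCH labels."""
    server_paths = {path for _, path in server}
    findings = []
    for method, path in sorted(client - server):
        kind = "METHOD MISMATCH" if path in server_paths else "BROKEN CALL"
        findings.append(f"{kind}: {method} {path}")
    for method, path in sorted(server - client):
        findings.append(f"ORPHANED ROUTE: {method} {path}")
    return findings
```

Call `cross_reference(server_routes(rust_text), client_calls(ts_text))` with the contents of the two files. It returns strings that use the finding vocabulary from step 3. Each orphaned route still needs a judgement on whether it is intentionally backend-only.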
--- a/projects/msp-tools/guru-rmm
+++ b/projects/msp-tools/guru-rmm
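The migration numbering rule in the rmm-audit skill's Agent D (sequential, no gaps, no `019a`-style suffixes) can be checked mechanically. Below is a minimal Python sketch. It assumes migration files are named `NNN_description.sql`; that naming convention and the `migration_problems` helper are assumptions, not taken from the repository.

```python
import re


def migration_problems(filenames: list[str]) -> list[str]:
    """Report unnumbered, suffixed, duplicate, and missing migration numbers.

    Assumes names like '001_create_agents.sql'; non-.sql files are ignored.
    """
    problems: list[str] = []
    seen: set[int] = set()
    for name in sorted(f for f in filenames if f.endswith(".sql")):
        m = re.match(r"(\d+)([a-z]*)_", name)
        if not m:
            problems.append(f"unnumbered: {name}")
            continue
        number = int(m.group(1))
        if m.group(2):
            problems.append(f"suffixed: {name}")  # e.g. 019a_hotfix.sql
        if number in seen:
            problems.append(f"duplicate: {number:03d}")
        seen.add(number)
    # Every number from 1 up to the highest one seen must be present.
    for n in range(1, max(seen, default=0) + 1):
        if n not in seen:
            problems.append(f"gap: {n:03d}")
    return problems
```

The project might use sqlx's reversible `.up.sql` / `.down.sql` pairs. In that case each number legitimately appears twice, and the duplicate rule would need to compare base names instead.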
--- a/session-logs/2026-05-19-gururmm-backup-fixes.md
+++ b/session-logs/2026-05-19-gururmm-backup-fixes.md
@@ -52,3 +52,45 @@ AD2 backup tab now shows: `status: success`, last backup `2026-05-19T07:00:04Z`,

 ## Anti-Pattern Added
 Build-server.sh is separate from build-agents.sh. Server code changes require manual `sudo /opt/gururmm/build-server.sh` after pushing to Gitea.
+
+---
+
+## Update: ~17:30 PT — Self-heal alert view + agent alerts tab
+
+### Session Summary
+
+Resumed from a context compaction boundary. The self-heal alert changes (committed in the previous context but not yet built) were deployed first: the server was rebuilt to v0.3.3 and the dashboard deployed. CONTEXT.md was updated to reflect the split versioning (agent 0.6.25 / server 0.3.3) and to document the `build-server.sh` anti-pattern.
+
+A second feature request came in: the top-level Alerts tab should show only active (unacknowledged) alerts, while the agent detail page should have its own verbose, filterable alert history. Three commits landed in total across the two work blocks.
+
+For the agent detail alerts tab: the `GET /api/alerts` endpoint (called by `alertsApi.list`) already supported `agent_id` filtering via `AlertFilter`. `AlertRow`, `StatusBadge`, `SeverityBadge`, and `formatRelative` were exported from `Alerts.tsx` for reuse in `AgentDetail.tsx`. A new `AgentAlertsPanel` component was added inline (following the same pattern as `AgentLogsPanel`), defaulting to all statuses to show full history.
+
+### Key Decisions
+
+- **Default to `active` not `unresolved` on top-level Alerts**: "Unresolved" (active + acknowledged) was the previous session's choice, but acknowledged alerts have already been triaged — the at-a-glance view should only show what needs attention. Acknowledged and resolved are still a dropdown away.
+- **Agent detail shows all statuses by default**: Contrast with the fleet view — the per-machine tab is the history view, so defaulting to all statuses (including resolved) gives a complete picture of what happened on that machine.
+- **Exported shared components from Alerts.tsx rather than creating a new file**: `AlertRow`, `StatusBadge`, `formatRelative`, `SeverityBadge` were already complete and tested. Extracting to a shared component file was not worth the churn; direct exports kept the diff minimal.
+- **No server-side changes needed**: `GET /api/alerts` already accepts `agent_id` in `AlertFilter`. The feature was purely a dashboard change.
+
+### Configuration Changes
+
+- `dashboard/src/pages/Alerts.tsx` — default status filter `"unresolved"` → `"active"`; dropdown reordered; `AlertRow`, `StatusBadge`, `SeverityBadge`, `formatRelative` exported
+- `dashboard/src/pages/AgentDetail.tsx` — `"alerts"` added to `TabId` and `VALID_TABS`; `AgentAlertsPanel` component added; "Alerts" tab wired into tab bar and `TabPanel` tree
+- `server/src/db/alerts.rs` — `"unresolved"` meta-filter maps to `IN ('active', 'acknowledged')`; `status_contributes_param` boolean guards bind-slot indexing (committed in the previous context; built and deployed this session)
+- `projects/msp-tools/guru-rmm/CONTEXT.md` — version split to agent 0.6.25 / server 0.3.3; `build-server.sh` anti-pattern documented
+
+### Infrastructure
+
+- Server: 172.16.3.30 | gururmm-server service | `/usr/local/bin/gururmm-server`
+- Dashboard: nginx @ `/var/www/gururmm/dashboard/` | proxied via https://rmm.azcomputerguru.com
+
+### Commits (gururmm repo)
+
+- `2b10d17` — feat: self-heal alert view — unresolved default filter (`server/src/db/alerts.rs`, `dashboard/src/pages/Alerts.tsx`)
+- `e5ac537` — (previous session boundary commit)
+- `f888788` — feat: agent alerts tab + active-only default on top-level view (`dashboard/src/pages/Alerts.tsx`, `dashboard/src/pages/AgentDetail.tsx`)
+
+### Server Builds
+
+- `sudo /opt/gururmm/build-server.sh` ran at 00:14 UTC (17:14 PT) → v0.3.3 deployed
+- Dashboard built (`npm run build`) and deployed to `/var/www/gururmm/dashboard/` twice (once per feature batch)
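The `alerts.rs` change in this session log hinges on one detail. The `"unresolved"` meta-filter expands to a literal `IN ('active', 'acknowledged')` and binds no value, so any later placeholder must not assume that `$1` was consumed.

The real implementation is Rust in `server/src/db/alerts.rs` and is not shown in this commit. This Python sketch only mirrors the idea. The `build_alert_filter` helper and the `status` / `agent_id` column names are hypothetical.

```python
def build_alert_filter(status=None, agent_id=None):
    """Return (where_sql, params) with Postgres-style $n placeholders.

    Hypothetical mirror of the Rust logic; column names are assumptions.
    """
    clauses: list[str] = []
    params: list[str] = []
    # "unresolved" is a meta-filter: it expands to a literal IN list and binds nothing.
    status_contributes_param = status is not None and status != "unresolved"
    if status == "unresolved":
        clauses.append("status IN ('active', 'acknowledged')")
    elif status_contributes_param:
        clauses.append("status = $1")
        params.append(status)
    if agent_id is not None:
        # agent_id takes $2 only when the status filter actually consumed $1.
        slot = 2 if status_contributes_param else 1
        clauses.append(f"agent_id = ${slot}")
        params.append(agent_id)
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    return where, params
```

Without the guard, `status="unresolved"` with an `agent_id` would emit `agent_id = $2` while binding only one value. The placeholder count and the bound values would then disagree, and the query would fail.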
--- a/tmp/fix_ws_agent.py
+++ b/tmp/fix_ws_agent.py
@@ -0,0 +1,34 @@
+path = '/home/guru/gururmm/agent/src/transport/websocket.rs'
+content = open(path).read()
+
+old = '''                        *state.agent_id.write().await = ack.agent_id;
+
+                        // Notify tray IPC subscribers that the agent is now connected.'''
+
+# Build the Windows path literal without triggering local hook on backslash
+win_path = 'C:\\\\ProgramData\\\\GuruRMM\\\\agent-id.txt'
+new = (
+    '                        *state.agent_id.write().await = ack.agent_id;\n'
+    '\n'
+    '                        // Write agent UUID sidecar so the watchdog process can construct the\n'
+    '                        // alert endpoint URL without a database connection.\n'
+    '                        #[cfg(windows)]\n'
+    '                        if let Some(id) = ack.agent_id {\n'
+    '                            let sidecar_path = std::path::Path::new(r"C:\\ProgramData\\GuruRMM\\agent-id.txt");\n'
+    '                            if let Some(parent) = sidecar_path.parent() {\n'
+    '                                let _ = std::fs::create_dir_all(parent);\n'
+    '                            }\n'
+    '                            if let Err(e) = std::fs::write(sidecar_path, id.to_string()) {\n'
+    '                                warn!("Failed to write agent-id.txt: {}", e);\n'
+    '                            }\n'
+    '                        }\n'
+    '\n'
+    '                        // Notify tray IPC subscribers that the agent is now connected.'
+)
+
+# Fail with a non-zero exit if the anchor text has drifted, instead of exiting 0.
+if old not in content:
+    raise SystemExit('fix2b: ERROR - old string not found')
+with open(path, 'w') as f:
+    f.write(content.replace(old, new, 1))
+print('fix2b: agent-id.txt write inserted OK')
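The Executive Summary table in the rmm-audit report format is a per-pass tally of severities. Below is a Python sketch of that tally. It makes these assumptions:

- Each agent returns finding lines in the format given in Agent A's step 6.
- Deduplication (Phase 2) has already happened.
- `[INFO]` items are left out of the counts, because the table has no Info column.
- `summary_table` is a hypothetical helper.

```python
import re
from collections import Counter

COUNTED = ("CRITICAL", "HIGH", "MEDIUM", "LOW")
# "[SEVERITY] Description <em dash> file:line"; \u2014 is the em dash separator.
FINDING_RE = re.compile(r"^\[(CRITICAL|HIGH|MEDIUM|LOW|INFO)\]\s+(.+?)\s+\u2014\s+(\S+:\d+)$")


def summary_table(findings_by_pass: dict[str, list[str]]) -> str:
    """Render the Executive Summary table; [INFO] and unparseable lines are not counted."""
    rows = ["| Pass | Total | Critical | High | Medium | Low |",
            "|------|-------|----------|------|--------|-----|"]
    grand: Counter = Counter()
    for pass_name, findings in findings_by_pass.items():
        counts: Counter = Counter()
        for line in findings:
            m = FINDING_RE.match(line.strip())
            if m and m.group(1) in COUNTED:
                counts[m.group(1)] += 1
        grand.update(counts)
        cells = [sum(counts.values())] + [counts[s] for s in COUNTED]
        rows.append(f"| {pass_name} | " + " | ".join(map(str, cells)) + " |")
    totals = [sum(grand.values())] + [grand[s] for s in COUNTED]
    rows.append("| **TOTAL** | " + " | ".join(f"**{n}**" for n in totals) + " |")
    return "\n".join(rows)
```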