claudetools/.claude/skills/rmm-audit/SKILL.md
Mike Swanson a885b54deb feat: make FEATURE_ROADMAP a living doc — dev definition-of-done + audit default
Mike's decision (2026-05-27): the roadmap is a maintained status-and-plan
tracker ([ ]=planned, [x]=shipped, dated), consulted going in and updated
coming out.

- gururmm-development-principles memory: new "Living Roadmap (MANDATORY)"
  principle — consult before building, update the entry in the SAME change
  when shipping/modifying; roadmap update is part of definition-of-done.
  Dev is the primary maintainer; the audit is the backstop.
- rmm-audit skill: state the convention explicitly — the roadmap pass
  default is reconcile-and-flip (not annotate-only).

(Companion gururmm-repo changes — DESIGN.md principle + baseline checkbox
reconcile — pushed separately to the gururmm repo.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 06:34:41 -07:00


name: rmm-audit
description: Periodic end-to-end verification of the GuruRMM codebase and build infrastructure. Runs 5 parallel audit passes: (1) API/route inventory cross-reference, (2) UI coverage and gap update, (3) Rust code quality and standards compliance, (4) TypeScript/frontend quality, (5) security and data integrity. A 6th sequential pass audits build pipeline health (logs, artifacts, change gates, script integrity). Produces a timestamped audit report and updates the living docs (UI_GAPS.md, FEATURE_ROADMAP.md). Takes 10-20 minutes. Invoke explicitly only; no auto-trigger. Use /rmm-audit for a full audit. Optional arg: --pass=<name> to run a single pass (api, ui, rust, ts, security, pipeline, roadmap). The roadmap pass reconciles FEATURE_ROADMAP.md checkboxes against the code and cleans up stale ones.

GuruRMM End-to-End Audit

Periodic full-stack sanity check. Read-only by default — findings are written to a report file and living docs are updated. No code is changed.


Execution Overview

Phase 0: Context load (coordinator reads key files)
Phase 1: Spawn 5 parallel audit agents (codebase passes)
Phase 2: Run build pipeline audit (sequential — requires SSH to build server)
Phase 3: Collect findings, aggregate, score
Phase 4: Write report + update living docs
Phase 5: Present summary to user

The audit is orchestrated here (Claude coordinator). All codebase passes run in parallel subagents. The build pipeline pass runs sequentially after (it touches live server state via SSH). Each agent returns structured findings; the coordinator aggregates and writes the final report.

Model (MANDATORY)

Always run every audit pass on Opus 4.7. Spawn each agent with model: "opus" (the opus alias resolves to the latest Opus, currently claude-opus-4-7). This overrides the default complexity-based model routing — do NOT downgrade any pass to Sonnet or Haiku, even the lower-stakes ones (API coverage, TypeScript). The coordinator orchestration also runs on 4.7.


Phase 0: Context Load (Coordinator Reads These)

Before spawning agents, read these yourself:

  1. projects/msp-tools/guru-rmm/CONTEXT.md — project overview and current state
  2. projects/msp-tools/guru-rmm/docs/UI_GAPS.md — living gap tracker (may be stale)
  3. projects/msp-tools/guru-rmm/docs/FEATURE_ROADMAP.md — planned features
  4. .claude/CODING_GUIDELINES.md — development standards

Capture from server/src/api/mod.rs the complete route list (all .route(...) calls). This becomes the authoritative route list passed to both the API and UI audit agents.
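A minimal sketch of the route capture, assuming the router uses single-line axum-style calls such as .route("/path", get(handler).post(other)). The regexes, the function name extract_routes, and the sample routes are illustrative assumptions, not taken from the GuruRMM source; multi-line .route(...) calls would need a real parser.

```python
import re

# Assumption: each .route(...) call fits on one line and uses axum-style
# method helpers (get, post, put, patch, delete).
ROUTE_RE = re.compile(r'\.route\(\s*"([^"]+)"\s*,\s*([^\n]*)')
METHOD_RE = re.compile(r'\b(get|post|put|patch|delete)\(')


def extract_routes(source: str) -> list[tuple[str, str]]:
    """Return sorted (METHOD, path) pairs for every .route(...) call found."""
    routes = set()
    for path, rest in ROUTE_RE.findall(source):
        for method in METHOD_RE.findall(rest):
            routes.add((method.upper(), path))
    return sorted(routes)


# Hypothetical router snippet, for illustration only.
sample = '''
Router::new()
    .route("/api/agents", get(list_agents).post(create_agent))
    .route("/api/agents/:id", delete(delete_agent))
'''

for method, path in extract_routes(sample):
    print(method, path)
```

Handler names such as delete_agent are not mistaken for methods because the pattern requires the method name to be followed directly by an opening parenthesis.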

Also extract every checkbox line from FEATURE_ROADMAP.md (both [ ] and [x], with section + priority) into a roadmap claims list — passed to Agent F for reconciliation against the code.
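The claims list could be built along these lines. This is a sketch that assumes the roadmap uses markdown headings for sections and "- [ ]" / "- [x]" list items; the RoadmapClaim type and the sample items are hypothetical. Priority is left out because its encoding in FEATURE_ROADMAP.md is not shown here.

```python
import re
from dataclasses import dataclass

CHECKBOX_RE = re.compile(r'^\s*[-*]\s+\[( |x|X)\]\s+(.*)$')
HEADING_RE = re.compile(r'^#+\s+(.*)$')


@dataclass
class RoadmapClaim:
    section: str   # nearest preceding heading
    done: bool     # True for [x], False for [ ]
    text: str      # item text after the checkbox


def extract_claims(markdown: str) -> list[RoadmapClaim]:
    """Collect every checkbox line together with the section it sits under."""
    claims = []
    section = ""
    for line in markdown.splitlines():
        heading = HEADING_RE.match(line)
        if heading:
            section = heading.group(1).strip()
            continue
        box = CHECKBOX_RE.match(line)
        if box:
            claims.append(
                RoadmapClaim(section, box.group(1) != " ", box.group(2).strip())
            )
    return claims


# Hypothetical roadmap excerpt, for illustration only.
sample_roadmap = '''\
## Monitoring
- [x] Example shipped item
- [ ] Example planned item
'''

claims = extract_claims(sample_roadmap)
```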


Phase 1: Parallel Audit Agents

Spawn all five parallel agents (A, B, C, D, F) simultaneously in a single message. Each agent receives the full context it needs inline — do not assume they share context. (Agent E — Build Pipeline — runs sequentially afterward; it SSHes the live build server.)


Agent A — API Coverage & Route Inventory

Goal: Find server endpoints with no dashboard UI exposure, and dashboard API client functions calling non-existent or mismatched server routes.

Instructions for agent:

  1. Read server/src/api/mod.rs in full — extract every .route(path, method) into a list grouped by resource (agents, clients, sites, policies, etc.).

  2. Read dashboard/src/api/client.ts — extract every API function and the URL path it calls.

  3. Cross-reference:

    • Server routes with no client.ts function → ORPHANED ROUTE (dead code or intentionally backend-only — distinguish by context)
    • Client.ts functions whose URL doesn't match any server route → BROKEN CALL
    • Server routes that exist but the client uses a different HTTP method → METHOD MISMATCH
  4. Read dashboard/src/App.tsx (or wherever routes are defined) — list all frontend pages and their paths.

  5. For each API resource group, assess whether a UI page exposes the major CRUD operations. Flag any resource that has full CRUD on the server but only partial or no UI.

  6. Return structured findings: [SEVERITY] Description — file:line.
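The cross-reference in step 3 can be sketched as set comparisons over (method, path) pairs. This assumes both sides have already been normalized to the same path syntax (for example ":id" placeholders), which real client code may not satisfy out of the box; the function name and sample data are hypothetical. As a simplification, a server route counts as orphaned only when no client function calls its path at all.

```python
Route = tuple[str, str]  # (HTTP method, normalized path)


def cross_reference(server: set[Route], client: set[Route]) -> dict[str, list[Route]]:
    """Classify differences between server routes and client calls."""
    server_paths = {path for _, path in server}
    client_paths = {path for _, path in client}
    findings: dict[str, list[Route]] = {
        "ORPHANED ROUTE": [],
        "BROKEN CALL": [],
        "METHOD MISMATCH": [],
    }
    for method, path in sorted(server):
        if path not in client_paths:
            findings["ORPHANED ROUTE"].append((method, path))
    for method, path in sorted(client):
        if path not in server_paths:
            findings["BROKEN CALL"].append((method, path))
        elif (method, path) not in server:
            findings["METHOD MISMATCH"].append((method, path))
    return findings


# Hypothetical inventories, for illustration only.
server_routes = {
    ("GET", "/api/agents"),
    ("POST", "/api/agents"),
    ("DELETE", "/api/sites/:id"),
    ("PUT", "/api/policies/:id"),
}
client_calls = {
    ("GET", "/api/agents"),
    ("POST", "/api/agents"),
    ("PATCH", "/api/policies/:id"),
    ("GET", "/api/reports"),
}

result = cross_reference(server_routes, client_calls)
```

Whether an orphaned route is dead code or intentionally backend-only still needs the contextual judgment described above; the sketch only produces the candidate list.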


Agent B — Rust Code Quality & Standards

Goal: Find violations of the established Rust coding standards and common quality issues across the server codebase.

Instructions for agent:

Read .claude/CODING_GUIDELINES.md first to know the rules. Then check:

Compliance checks:

  • .unwrap() or .expect() calls outside of #[cfg(test)] blocks — these panic in production. Flag every occurrence with context. Note: .expect("invariant reason") in truly-impossible paths is acceptable if the reason is clear.
  • todo!() or unimplemented!() macros in non-test production code paths.
  • sqlx::query! / sqlx::query_as! macros (banned — project uses SQLX_OFFLINE=true with runtime queries only).
  • format!() used to construct SQL strings (injection risk — parameterize instead).
  • unwrap_or_default() on Option<String> producing empty strings that get used as real values without validation.
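As a rough illustration of the first three compliance checks, the line scanner below flags the patterns and stops at the first #[cfg(test)] attribute. That stop rule is a heuristic assumption (test module at the tail of the file), and the file name and Rust snippet are hypothetical. Hits are candidates for review, not verdicts: an .expect("invariant reason") may still be acceptable.

```python
import re

PATTERNS = {
    "unwrap/expect": re.compile(r'\.(unwrap|expect)\('),
    "todo/unimplemented": re.compile(r'\b(todo|unimplemented)!\('),
    "sqlx compile-time macro": re.compile(r'\bsqlx::query(_as)?!\('),
}


def scan_rust(source: str, filename: str) -> list[str]:
    """Return 'file:line label' hits for banned patterns in non-test code."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#[cfg(test)]"):
            break  # heuristic: everything after this is test code
        if stripped.startswith("//"):
            continue  # skip line comments
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append(f"{filename}:{lineno} {label}")
    return hits


# Hypothetical Rust source, for illustration only.
sample_rs = '''\
fn load() -> Config {
    let raw = std::fs::read_to_string("c.toml").unwrap();
    let name = raw.lines().next().unwrap_or_default();
    todo!()
}
#[cfg(test)]
mod tests {
    fn t() { Some(1).unwrap(); }
}
'''

hits = scan_rust(sample_rs, "server/src/example.rs")
```

The unwrap pattern requires an opening parenthesis right after the name, so unwrap_or_default() is not reported by this check; it needs the separate data-flow review described in the last bullet.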

Auth coverage:

  • Read server/src/api/mod.rs — identify which route groups use .layer(middleware::from_fn(...)) or similar auth middleware vs. which are public.
  • Any non-public route that doesn't go through JWT auth or agent-key auth middleware is a [CRITICAL] finding.
  • Check that agent-key-authenticated routes cannot be accessed with JWT tokens and vice versa (separation of agent vs. admin plane).

Logging hygiene:

  • Grep for tracing:: / log:: calls that include agent keys, passwords, tokens, or PII (email addresses, user names). These should use redacted representations.

Error handling:

  • HTTP handlers returning 500 with e.to_string() — raw error messages can leak internals to API callers. Should be logged server-side, generic message returned.

Search paths: server/src/ only. Exclude server/src/db/mod.rs re-exports.

Return structured findings with file:line references.


Agent C — TypeScript / Frontend Quality

Goal: Find frontend code quality issues, standards violations, and missing UI patterns.

Instructions for agent:

TypeScript quality:

  • any type annotations in dashboard/src/ — each is a type safety gap.
  • @ts-ignore or @ts-expect-error comments — note reason and whether they're still needed.
  • console.log / console.error left in production code (not in dev-only blocks).
  • Hardcoded API base URLs instead of using import.meta.env.VITE_API_URL.

Component patterns:

  • React components without error boundaries on data-fetching sections — if the query throws, the entire page crashes.
  • useQuery calls with no isLoading or isError handling — silent failures.
  • Forms with no validation on submit (empty strings submitted as values).
  • Missing key prop on mapped elements.

API client completeness:

  • For each resource in client.ts, verify the TypeScript interface matches what the server actually returns. Focus on fields that were added recently: Agent should have is_dc, maintenance_mode_note, agent_version. AgentUsers should have is_dc: boolean, groups: GroupEntry[]. Any missing field means the UI silently shows undefined.

Accessibility basics:

  • Buttons with icon-only content (no text) — need title or aria-label.
  • Form inputs without associated <label> elements.

Search paths: dashboard/src/ only.

Return structured findings with file:line references.


Agent D — Data Integrity & Security

Goal: Verify migration sequencing, wire format consistency between agent and server, policy system end-to-end completeness, and security boundaries.

Instructions for agent:

Migration integrity:

  • List all files in server/migrations/ — verify numbering is sequential with no gaps (001, 002, 003... no 019a or skipped numbers).
  • For each CREATE TABLE in migrations, verify a corresponding db/<table>.rs module exists in the server.
  • For each db/<table>.rs, verify the struct fields match the migration columns. Focus on recently added columns: agent_users.is_dc, agent_users.groups.
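The numbering check can be sketched as follows, assuming migration files are named NNN_description.sql and numbering starts at 001. Both are assumptions about the naming scheme, and the file names in the example are hypothetical.

```python
from collections import Counter


def check_migration_sequence(filenames: list[str]) -> list[str]:
    """Report non-numeric prefixes, duplicate numbers, and gaps."""
    problems = []
    numbers = []
    for name in sorted(filenames):
        prefix = name.split("_", 1)[0]
        if prefix.isdecimal():
            numbers.append(int(prefix))
        else:
            problems.append(f"non-numeric prefix: {name}")
    for number, count in sorted(Counter(numbers).items()):
        if count > 1:
            problems.append(f"duplicate number: {number:03d}")
    if numbers:
        present = set(numbers)
        for number in range(1, max(numbers) + 1):
            if number not in present:
                problems.append(f"missing number: {number:03d}")
    return problems


# Hypothetical directory listing, for illustration only.
listing = ["001_init.sql", "002_agents.sql", "004_policies.sql", "004a_fix.sql"]
problems = check_migration_sequence(listing)
```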

Agent ↔ Server wire format:

  • Read agent/src/transport/mod.rs — list all AgentMessage variants and ServerMessage variants.
  • Read server/src/ws/mod.rs — find the dispatch match arm for each incoming AgentMessage. Any AgentMessage variant with no server handler is dead code (or a missing implementation).
  • Read server/src/policy/config_update.rs — list all fields in AgentConfigUpdate. Cross-reference with agent/src/transport/mod.rs ConfigUpdatePayload. Any field the server sends but the agent doesn't handle (or vice versa) is a silent drop.

Policy system completeness:

  • Read server/src/db/policies.rs PolicyData struct — list all sections.
  • Read server/src/policy/merge.rs — verify every section in PolicyData has a corresponding merge function called in merge_policy_data().
  • Read server/src/policy/config_update.rs — verify every section in PolicyData is either mapped to AgentConfigUpdate or intentionally server-side-only (like thresholds). Document which is which.
  • Check server/src/policy/effective.rs — are there assertions/tests for every PolicyData section having a system default value?

Version consistency:

  • agent/Cargo.toml version vs. the version string the server expects in heartbeats. The server should not hard-reject agents on minor version differences.
  • server/Cargo.toml version — is it being bumped on releases?

WebSocket message validation:

  • In server/src/ws/mod.rs, for each AgentMessage variant that carries a payload, is the payload validated before use? (e.g., empty hostnames, negative numbers, excessively large strings that could be a DoS vector).

Return structured findings with file:line references.


Agent F — Roadmap Reconciliation

Goal: Make FEATURE_ROADMAP.md tell the truth. The roadmap drifts from reality: features ship in code while their checkbox stays [ ] (verified cases: the Network Discovery Node backend, and temperature monitoring / BUG-001 — both shipped while marked incomplete), and occasionally an item is [x] but the code was never written or was reverted. This pass reconciles every checkbox against the code.

Runs in parallel with Agents A-D (read-only code search, no SSH).

Instructions for agent:

  1. Read docs/FEATURE_ROADMAP.md and extract every checkbox line with its state ([ ] / [x]), section, and priority.

  2. For EACH item, find the implementing artifact in the code — do NOT trust the checkbox. Map feature → evidence:

    • API/endpoint features → server/src/api/mod.rs routes + the handler module
    • DB/schema features → server/migrations/*.sql + server/src/db/*.rs
    • Agent capabilities → agent/src/ modules
    • Dashboard/UI features → dashboard/src/pages|components/*.tsx + api/client.ts

     Use Grep/GrepAI. Cite the exact artifact (file:line, migration name, route path).
  3. Classify each item:

    • STALE-INCOMPLETE: [ ] but the code fully implements it end-to-end → recommend flip to [x]. Cite the proving artifact.
    • PARTIAL: [ ], and backend/some layers exist but not end-to-end for the stated scope (e.g. backend + API done but no UI; scaffolding present but unwired, as with safe-rollout; or discovery with a per-agent tab but no fleet view). Keep [ ] but recommend an inline annotation, e.g. [ ] ... (backend done; UI/fleet-view pending). Do NOT flip partials to [x].
    • STALE-COMPLETE: [x] but the code does not implement it (never built or reverted) → recommend flip to [ ], flag [HIGH] (the roadmap is lying).
    • ACCURATE: checkbox matches code. No change.
  4. Be conservative: only flip [ ] → [x] when the evidence is unambiguous AND the feature is complete end-to-end for its stated scope. When in doubt → PARTIAL with a note, never a flip. A backend-only or scaffolding-only implementation is PARTIAL.

Return a table: item text | section | current state | verdict (STALE-INCOMPLETE / PARTIAL / STALE-COMPLETE / ACCURATE) | proving-or-missing artifact
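The classification rules above reduce to a small decision function once the agent has judged the evidence. In this sketch the evidence levels "full", "partial" and "none" are hypothetical labels for that judgment. The rules do not cover an item marked [x] with only partial evidence; the sketch assumes it is reported as PARTIAL for human review, in line with the bias against flipping.

```python
def classify(checked: bool, evidence: str) -> str:
    """Map checkbox state plus judged evidence to an Agent F verdict.

    evidence: "full" (end-to-end for the stated scope), "partial", or "none".
    """
    if evidence not in {"full", "partial", "none"}:
        raise ValueError(f"unknown evidence level: {evidence}")
    if evidence == "partial":
        # Assumption: [x] with partial evidence is also left for review.
        return "PARTIAL"
    if checked:
        return "ACCURATE" if evidence == "full" else "STALE-COMPLETE"
    return "STALE-INCOMPLETE" if evidence == "full" else "ACCURATE"


# One row per combination of checkbox state and evidence level.
for checked in (False, True):
    for evidence in ("full", "partial", "none"):
        box = "[x]" if checked else "[ ]"
        print(box, evidence, "->", classify(checked, evidence))
```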


Agent E — Build Pipeline Health

Goal: Verify the build/deploy infrastructure is functioning correctly and producing fresh, trustworthy artifacts. This pass catches issues invisible to codebase-only audits: log rot, stale artifacts, dead pipeline paths, and change gate failures.

NOTE: This agent runs sequentially (after Agents A-D complete) because it SSHes into the live build server. It is read-only: it checks state but does not trigger builds.

Instructions for agent:

Connect to the build server: ssh guru@172.16.3.30

1. Log integrity — check for doubling and freshness:

# Check Windows build log — each line should appear exactly once
tail -50 /var/log/gururmm-build-windows.log
# Check Linux build log
tail -50 /var/log/gururmm-build-linux.log
  • Lines duplicated (same content appearing twice in a row) → [HIGH] log doubling — double-writer bug
  • Last entry timestamp > 7 days old AND recent pushes known → [HIGH] stale log — builds may be silently failing
  • Log file missing entirely → [CRITICAL] — build infrastructure not initialised
  • Presence of === PHASE: markers → [INFO] phase tracking is active (expected)
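The doubling check can be made mechanical by comparing each log line with the one before it. This is a sketch only: the log excerpt is invented for illustration (the real log format is not shown here), and the function name is hypothetical.

```python
def find_doubled_lines(log_text: str) -> list[tuple[int, str]]:
    """Return (line number, text) for each line identical to its predecessor."""
    lines = log_text.splitlines()
    doubled = []
    for index in range(1, len(lines)):
        current = lines[index]
        # Ignore blank lines so spacing in the log is not reported as doubling.
        if current.strip() and current == lines[index - 1]:
            doubled.append((index + 1, current))
    return doubled


# Invented log excerpt, for illustration only.
sample_log = '''\
2026-05-27 06:00:01 === PHASE: checkout
2026-05-27 06:00:05 === PHASE: compile
2026-05-27 06:00:05 === PHASE: compile
2026-05-27 06:03:10 build finished
'''

doubled = find_doubled_lines(sample_log)
```

Because build log lines normally carry timestamps, an exact repeat of the previous line is a strong signal of a second writer rather than of a genuinely repeated event.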

2. Artifact freshness — check distribution directory:

ls -lht /var/www/gururmm/downloads/windows/amd64/ | head -10
ls -lht /var/www/gururmm/downloads/linux/amd64/ | head -10
  • Newest MSI/EXE older than 14 days AND active development confirmed → [HIGH] artifacts stale
  • Legacy path /opt/gururmm/updates/windows/amd64/ should NOT be served (it is the old path); if a symlink or nginx config still points there → [HIGH] dead artifact path still active

3. Per-platform last-built-commit recency:

cat /opt/gururmm/last-built-commit-linux
cat /opt/gururmm/last-built-commit-windows
cat /opt/gururmm/last-built-commit-mac
  • SHA should be recent relative to git log --oneline -5 in /home/guru/gururmm
  • Linux and Windows SHAs diverging by many commits → [MEDIUM] platform builds out of sync
  • A SHA that resolves to a commit months old while git log shows recent work → [HIGH] change gate stuck

4. Stale lock files:

ls -la /var/run/gururmm-build-*.lock 2>/dev/null
  • Lock file present with no corresponding running process → [HIGH] orphaned lock, all future builds for that platform will be blocked until manually removed
  • Check: ps aux | grep build- — if no build-linux.sh / build-windows.sh running but lock exists, it's orphaned

5. Script syntax validity:

bash -n /opt/gururmm/build-shared.sh
bash -n /opt/gururmm/build-linux.sh
bash -n /opt/gururmm/build-windows.sh
bash -n /opt/gururmm/build-mac.sh
  • Any syntax error → [CRITICAL] — that platform's builds will silently fail at next trigger

6. Webhook handler health:

curl -s http://localhost:9000/health
ps aux | grep webhook-handler
  • /health returns non-200 or connection refused → [CRITICAL] webhook handler down
  • Handler not in process list → [CRITICAL] handler not running
  • Check handler is using the new multi-threaded version (should mention PLATFORMS in its source):
    grep -c PLATFORMS /opt/gururmm/webhook-handler.py
    Count of 0 → [HIGH] old monolithic handler still deployed

7. Pluto known-hosts file:

ls -la /opt/gururmm/pluto_known_hosts
wc -l /opt/gururmm/pluto_known_hosts
  • File missing → [CRITICAL] Windows builds will fail (SSH strict host checking with no key file)
  • File empty (0 lines) → [CRITICAL] same
  • Confirm build-windows.sh references it:
    grep pluto_known_hosts /opt/gururmm/build-windows.sh
    If missing → [HIGH] StrictHostKeyChecking=no likely, MITM risk on build artifacts

8. Tray EXE accumulation:

ls -lht /var/www/gururmm/downloads/windows/amd64/gururmm-tray-* 2>/dev/null | wc -l
  • More than 3 tray EXE versions present → [LOW] cleanup not running (design: keep latest 2)

9. Build compat wrapper check:

head -5 /opt/gururmm/build-agents.sh
  • Should begin with a deprecation warning and call to build-shared.sh
  • If it still contains the old monolithic build logic → [HIGH] pipeline split not deployed

Return structured findings with source (file path + line or command output) for every finding.


Phase 3: Aggregating Findings

Collect all six agents' outputs (the five parallel agents plus the build pipeline agent). Classify each finding:

Severity Meaning
[CRITICAL] Security vulnerability, data loss risk, or production crash path
[HIGH] Functional gap blocking a core workflow, or standards violation with user impact
[MEDIUM] Code quality issue, UI gap, or inconsistency without immediate impact
[LOW] Minor polish, dead code, missing comment
[INFO] Neutral observation, completed item, or context note

Deduplicate: if two agents flag the same issue from different angles, merge into one finding with both references.
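A minimal sketch of the merge step follows. It assumes each finding carries a normalized issue key that agents agree on (a hypothetical field) and that a merged finding keeps the higher of the two severities; the skill itself does not specify either detail. The sample findings are invented.

```python
from dataclasses import dataclass

SEVERITY_ORDER = ["CRITICAL", "HIGH", "MEDIUM", "LOW", "INFO"]


@dataclass
class Finding:
    severity: str
    issue: str              # hypothetical normalized key for the issue
    references: list[str]   # file:line references


def merge_findings(findings: list[Finding]) -> list[Finding]:
    """Merge findings that share an issue key, keeping every reference."""
    merged: dict[str, Finding] = {}
    for finding in findings:
        existing = merged.get(finding.issue)
        if existing is None:
            merged[finding.issue] = Finding(
                finding.severity, finding.issue, list(finding.references)
            )
            continue
        for ref in finding.references:
            if ref not in existing.references:
                existing.references.append(ref)
        # Assumption: the merged finding takes the more severe rating.
        if SEVERITY_ORDER.index(finding.severity) < SEVERITY_ORDER.index(existing.severity):
            existing.severity = finding.severity
    return sorted(merged.values(), key=lambda f: SEVERITY_ORDER.index(f.severity))


# Invented findings, for illustration only.
raw = [
    Finding("MEDIUM", "raw-error-leak", ["server/src/api/example.rs:10"]),
    Finding("HIGH", "raw-error-leak", ["dashboard/src/api/client.ts:20"]),
    Finding("LOW", "unused-import", ["server/src/db/example.rs:3"]),
]
merged = merge_findings(raw)
```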


Phase 4: Write Report + Update Living Docs

Report Location

Write to: projects/msp-tools/guru-rmm/reports/YYYY-MM-DD-rmm-audit.md (use actual date). If a report from today already exists, append a -2 suffix.
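One way to pick the report path is sketched below. It assumes the suffix goes before the extension (YYYY-MM-DD-rmm-audit-2.md) and that a third same-day report would continue with -3; neither detail is stated above. The demo uses a temporary directory rather than the real reports folder.

```python
import tempfile
from datetime import date
from pathlib import Path


def report_path(reports_dir: Path, today: date) -> Path:
    """Return the first unused report path for the given date."""
    stem = f"{today.isoformat()}-rmm-audit"
    base = reports_dir / f"{stem}.md"
    if not base.exists():
        return base
    suffix = 2
    while True:
        # Assumption: suffix sits before the extension and keeps counting up.
        candidate = reports_dir / f"{stem}-{suffix}.md"
        if not candidate.exists():
            return candidate
        suffix += 1


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    reports = Path(tmp)
    first = report_path(reports, date(2026, 5, 27))
    first.touch()
    second = report_path(reports, date(2026, 5, 27))
    print(first.name)
    print(second.name)
```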

Report Format

# GuruRMM Audit Report — YYYY-MM-DD

**Auditor:** Claude (claude-opus-4-7)
**Passes:** API Coverage, UI Gaps, Rust Quality, TypeScript Quality, Data Integrity, Build Pipeline, Roadmap Reconciliation
**Previous audit:** [link to prior report if one exists, else "First audit"]

---

## Executive Summary

| Pass | Total | Critical | High | Medium | Low |
|------|-------|---------|------|--------|-----|
| API Coverage | N | N | N | N | N |
| UI Gaps | N | N | N | N | N |
| Rust Quality | N | N | N | N | N |
| TypeScript | N | N | N | N | N |
| Data Integrity | N | N | N | N | N |
| Build Pipeline | N | N | N | N | N |
| **TOTAL** | **N** | **N** | **N** | **N** | **N** |

**Requires immediate action:** [list of CRITICAL findings in one line each]

---

## Pass 1: API Coverage

### [SEVERITY] Finding Title
**File:** path/to/file.rs:LINE
**Detail:** What the problem is, why it matters.
**Recommendation:** What to do.

[repeat for each finding]

---

## Pass 2: UI Gaps

[Same format. Cross-reference against UI_GAPS.md — note which items in that doc
are now COMPLETE vs. still open vs. newly discovered.]

---

## Pass 3: Rust Code Quality

[findings]

---

## Pass 4: TypeScript / Frontend Quality

[findings]

---

## Pass 5: Data Integrity & Security

[findings]

---

## Pass 6: Build Pipeline Health

[findings — log integrity, artifact freshness, change gate state, lock files, script
syntax, webhook handler health, Pluto known-hosts, tray EXE accumulation]

---

## UI_GAPS.md Delta

Items completed since last audit:
- [x] Example completed gap

Items still open:
- [ ] Example open gap — still unimplemented

New gaps discovered this audit:
- [ ] Example new gap

---

## FEATURE_ROADMAP.md Delta (Agent F)

Checkboxes corrected to match code reality this audit:
- `[ ]` → `[x]` **<item>**: was marked incomplete but is shipped. Proof: `<artifact>`.
- `[x]` → `[ ]` **<item>**: marked complete but NOT in code (regression / never built). `[HIGH]`

Annotated as partial (left `[ ]`, scope clarified):
- `[ ]` **<item>** (backend done; UI/fleet-view pending) — `<artifact>`

Verified accurate (no change): N items.

---

## Recommended Action Order

1. [CRITICAL items, sorted by impact]
2. [HIGH items]
3. [MEDIUM items — can be batched]

Update UI_GAPS.md

After writing the report, update docs/UI_GAPS.md:

  • Mark items [x] if the audit confirmed they're fully implemented
  • Update "Last Updated" date
  • Add any newly discovered gaps under the appropriate priority section
  • Do NOT remove completed items — move them to the "Completed Features" section

Update FEATURE_ROADMAP.md (roadmap cleanup)

Apply Agent F's reconciliation to docs/FEATURE_ROADMAP.md — this is the roadmap cleanup the audit is responsible for. The roadmap is a living doc, so editing it fits the "living docs are updated" exception to read-only.

Convention (decided 2026-05-27): FEATURE_ROADMAP.md is a maintained status-and-plan tracker — [ ] = planned, [x] = shipped (dated). Dev work is the primary maintainer (updating roadmap entries is part of GuruRMM definition-of-done — see gururmm-development-principles); this audit pass is the backstop. So the default IS to flip stale checkboxes (reconcile-and-flip), NOT annotate-only. Only fall back to annotate-only if the user explicitly requests it for a given run.

  • STALE-INCOMPLETE → flip [ ] to [x] for every item Agent F proved is shipped end-to-end. Keep the line text; optionally append (verified <date>).
  • PARTIAL → leave [ ], append the scope annotation Agent F recommends (e.g. (backend done; UI/fleet-view pending)). Never flip a partial to [x].
  • STALE-COMPLETE → flip [x] to [ ] and add [REGRESSION — flagged YYYY-MM-DD] so it's visible; mirror it as a [HIGH] finding in the report.
  • Update any "Last Updated" / status line in the roadmap.
  • Every checkbox change MUST also appear in the report's "FEATURE_ROADMAP.md Delta" section with its proving (or missing) artifact — no silent edits.
  • When evidence is ambiguous, do NOT change the checkbox; record it as PARTIAL/INFO and leave the cleanup for human review. Bias toward under-flipping.
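The per-line edits can be sketched as a pure function over one roadmap line. The function name and sample lines are hypothetical, and the sketch always appends the optional (verified <date>) note. Any verdict it does not recognize leaves the line unchanged, which matches the bias toward under-flipping.

```python
def apply_verdict(line: str, verdict: str, today: str, note: str = "") -> str:
    """Return the roadmap line rewritten for one Agent F verdict."""
    if verdict == "STALE-INCOMPLETE" and "[ ]" in line:
        return line.replace("[ ]", "[x]", 1) + f" (verified {today})"
    if verdict == "PARTIAL" and note:
        return f"{line} ({note})"
    if verdict == "STALE-COMPLETE" and "[x]" in line:
        # \u2014 is the dash character used in the skill's REGRESSION tag.
        return line.replace("[x]", "[ ]", 1) + f" [REGRESSION \u2014 flagged {today}]"
    return line  # ACCURATE, ambiguous, or unknown verdicts: no change


# Hypothetical roadmap lines, for illustration only.
flipped = apply_verdict("- [ ] Example shipped item", "STALE-INCOMPLETE", "2026-05-27")
annotated = apply_verdict(
    "- [ ] Example partial item", "PARTIAL", "2026-05-27",
    note="backend done; UI/fleet-view pending",
)
regressed = apply_verdict("- [x] Example missing item", "STALE-COMPLETE", "2026-05-27")
unchanged = apply_verdict("- [x] Example accurate item", "ACCURATE", "2026-05-27")
```

Each changed line still has to be mirrored in the report's "FEATURE_ROADMAP.md Delta" section with its artifact; the sketch covers only the text edit.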

Phase 5: User Summary

Present a concise summary to the user:

Audit complete. Report: projects/msp-tools/guru-rmm/reports/YYYY-MM-DD-rmm-audit.md

CRITICAL (N): [one-line each]
HIGH (N):     [one-line each]
MEDIUM (N):   Batched in report.

Pipeline: [one-line status — e.g. "all green" or highest-severity finding]
UI_GAPS.md: N items marked complete, N new gaps added.
Roadmap: N checkboxes corrected (N stale-incomplete flipped to done, N partial annotated, N regressions flagged).

Recommended first action: [the single highest-priority finding]

Then ask: "Want me to start on any of these findings?"


Conventions

  • Read, don't run. Apart from the read-only SSH checks in the build pipeline pass (Agent E), this skill never executes code or makes API calls. It reads files and uses Grep/GrepAI for search.
  • Derive from code, not docs. Treat all .md documentation as potentially stale. The code is truth.
  • Be specific. Every finding needs a file:line reference. Vague findings ("the code could be better") are useless.
  • No false positives. If something looks like a problem but context makes it acceptable, note it as [INFO] with the reason it's OK.
  • Severity is impact, not effort. A two-line fix can be [CRITICAL] if it's a security issue.
  • Commit the report. After writing, delegate to Gitea Agent to commit the report file (not the code changes — those are separate work items).

Reference: Key Files by Area

Server

Area Key Files
Routes server/src/api/mod.rs
DB layer server/src/db/*.rs
WebSocket server/src/ws/mod.rs
Policy system server/src/policy/
Migrations server/migrations/*.sql

Dashboard

Area Key Files
Routes dashboard/src/App.tsx
Pages dashboard/src/pages/*.tsx
API client dashboard/src/api/client.ts
Components dashboard/src/components/*.tsx

Agent

Area Key Files
Wire format agent/src/transport/mod.rs
WS handler agent/src/transport/websocket.rs
Metrics agent/src/metrics/
Users agent/src/users.rs

Docs / Standards

Area Key Files
Coding standards .claude/CODING_GUIDELINES.md
Feature roadmap projects/msp-tools/guru-rmm/docs/FEATURE_ROADMAP.md
UI gaps tracker projects/msp-tools/guru-rmm/docs/UI_GAPS.md
Architecture decisions projects/msp-tools/guru-rmm/docs/ARCHITECTURE_DECISIONS.md
Past audit reports projects/msp-tools/guru-rmm/reports/

Build Pipeline (on 172.16.3.30)

Area Path
Webhook handler /opt/gururmm/webhook-handler.py
Shared build script /opt/gururmm/build-shared.sh
Linux build script /opt/gururmm/build-linux.sh
Windows build script /opt/gururmm/build-windows.sh
Mac build script /opt/gururmm/build-mac.sh
Pluto known-hosts /opt/gururmm/pluto_known_hosts
Linux build log /var/log/gururmm-build-linux.log
Windows build log /var/log/gururmm-build-windows.log
Distribution dir /var/www/gururmm/downloads/
Per-platform last SHA /opt/gururmm/last-built-commit-{linux,windows,mac}
Lock files /var/run/gururmm-build-{linux,windows,mac}.lock
Pluto machine doc .claude/machines/pluto.md