From d4eb8358ce88f66d65abf5f417d7a1ea06aec1e3 Mon Sep 17 00:00:00 2001 From: Mike Swanson Date: Tue, 26 May 2026 18:16:03 -0700 Subject: [PATCH] wiki: add capability synthesis to wiki-compile; recompile GuruRMM Skill + template: - wiki-compile Phase 2P: type-aware authoritative-artifact discovery for projects (migrations, API routes, agent modules, roadmap-done, commit log), with a stale-submodule guard that reads origin/main when the pinned submodule lags. Changelogs treated as incomplete, not authoritative. - project template: add a Capabilities / Feature Set section. GuruRMM recompile (from live main artifacts, not session logs): - Added Capabilities / Feature Set section covering monitoring, remote execution (incl. system vs user_session contexts), inventory/discovery, update mgmt, policy, alerting/watchdog, backup, tunnel, identity/security. - Fixed the misleading "runs as LocalSystem" command-fields line (the gap that started this) and the stale BUG-001 temperature claim (now shipped). - Qualified Entra-only SSO; noted safe-rollout is unwired scaffolding. Co-Authored-By: Claude Opus 4.7 (1M context) --- .claude/commands/wiki-compile.md | 42 +++++++++++++++++ wiki/_templates/project.md | 4 ++ wiki/projects/gururmm.md | 80 +++++++++++++++++++++++++++++--- 3 files changed, 119 insertions(+), 7 deletions(-) diff --git a/.claude/commands/wiki-compile.md b/.claude/commands/wiki-compile.md index d29e1ed..4ed17a7 100644 --- a/.claude/commands/wiki-compile.md +++ b/.claude/commands/wiki-compile.md @@ -167,6 +167,48 @@ Only the count is used — individual asset details go in session logs and clien --- +## Phase 2P — Authoritative Artifact Discovery (projects only) + +**Applies when `TARGET_TYPE == project`.** Skip for clients/systems. + +Session logs narrate *work done and why* — they are a structurally incomplete record of *what a product can do*. For a code-bearing project, the authoritative capability record is the **code, migrations, API routes, and roadmap**, not the logs. Compiling a project from logs alone WILL miss shipped features (this is exactly how the GuruRMM `user_session` command context was missed). So for projects, dig into the artifacts directly. + +### 2P-a — Locate the repo and guard against a stale submodule + +Many projects are tracked as a **pinned git submodule** whose commit deliberately lags the live repo. Reading the working tree alone gives stale artifacts. Always check: + +```bash +REPO="$CLAUDETOOLS_ROOT/projects/" +cd "$REPO" +git fetch origin main 2>/dev/null +PINNED=$(git rev-parse --short HEAD) +LIVE=$(git rev-parse --short origin/main) +echo "pinned=$PINNED live=$LIVE" +# If they differ, read artifacts from origin/main, NOT the working tree. +REF="origin/main" # use this ref for all artifact reads below; falls back to HEAD if no remote +``` + +Read live artifacts without disturbing the pinned pointer, either via `git show $REF:` / `git ls-tree -r --name-only $REF -- `, or by creating a throwaway worktree: `git worktree add /tmp/-live $REF` (remove with `git worktree remove` when done). + +### 2P-b — Gather authoritative artifacts (priority order) + +1. **DB migrations** — `git ls-tree -r --name-only $REF -- `. Each migration is a feature/schema checkpoint; the filenames alone are a capability timeline (e.g. `041_add_command_context`). This is the most reliable signal and is usually current even when changelogs are not. +2. **API routes** — the real server surface. Read the route-registration file(s) (e.g. `server/src/api/mod.rs`) and the per-resource handler modules. Enumerate endpoints + notable request options (auth modes, contexts, scopes). +3. **Agent / client capabilities** — module tree of the agent/client (e.g. `agent/src/`): metrics, checks, command execution, updater, tunnel, watchdog, inventory, registry ops. Note **per-platform coverage**. +4. **Completed roadmap items** — `docs/FEATURE_ROADMAP.md` checked/done items. +5. **Specs** — `docs/specs/` (shipped vs proposed). +6. **Commit log** — `git log --oneline $REF` filtered for `feat`/`perf` since the article's `last_compiled`. Fuller and more current than changelogs. +7. **Changelogs** — read if present, but treat as **incomplete** (they are frequently stale — verified on GuruRMM: committed changelogs stopped at v0.6.22 while the fleet ran 0.6.39+). Never rely on them as the sole capability source. +8. **README / DESIGN / ARCHITECTURE docs** — for framing and locked decisions. + +### 2P-c — Synthesize the Capabilities / Feature Set section + +From the artifacts above, produce the **Capabilities / Feature Set** section (see project template). Organize by surface (monitoring, remote execution, management, integrations, security). Explicitly capture **execution modes and important options** — e.g. command contexts (`system` vs `user_session`), auth modes, policy scopes, platform coverage. Cross-check the existing article (full recompile) and **correct any capability statement that is now incomplete or wrong** (e.g. "runs as LocalSystem" without the user-session context). + +For large repos, delegate the artifact read + synthesis to an agent (general-purpose) pointed at the live ref/worktree, and integrate the returned section after review — don't flood the main context with the full code/migration dump. + +--- + ## Phase 3 — Session Log Discovery Find all session logs that mention this client: diff --git a/wiki/_templates/project.md b/wiki/_templates/project.md index ce406e1..13bda7c 100644 --- a/wiki/_templates/project.md +++ b/wiki/_templates/project.md @@ -14,6 +14,10 @@ backlinks: [] *(What it is, current maturity, who uses it, what problem it solves)* +## Capabilities / Feature Set + +*(What the product can actually DO — synthesized from AUTHORITATIVE ARTIFACTS, not just session-log narrative: API routes (the real surface), agent/module structure, DB migrations (each is a feature checkpoint), completed roadmap items, specs, and the feat/perf commit log. Changelogs help but are often incomplete — do not rely on them alone. Organize by surface (e.g. monitoring, remote execution, management, integrations, security). Call out platform coverage and important execution modes/options — e.g. command execution contexts, auth modes, policy scopes. This section answers "what does it support?" so future readers don't have to re-derive it from code.)* + ## Architecture ### Components diff --git a/wiki/projects/gururmm.md b/wiki/projects/gururmm.md index 3688ada..8834dfb 100644 --- a/wiki/projects/gururmm.md +++ b/wiki/projects/gururmm.md @@ -2,9 +2,14 @@ type: project name: gururmm display_name: GuruRMM -last_compiled: 2026-05-24 -compiled_by: DESKTOP-0O8A1RL/claude-main +last_compiled: 2026-05-26 +compiled_by: GURU-5070/claude-main sources: + - "gururmm@main: server/src/api/*.rs (REST API surface, ~30 route modules)" + - "gururmm@main: agent/src/ (agent capabilities; transport/CommandContext, ohw.rs, watchdog/wts.rs)" + - "gururmm@main: server/migrations/*.sql (46 migrations — feature checkpoints, incl. 041_add_command_context)" + - "gururmm@main: docs/FEATURE_ROADMAP.md, docs/specs/" + - "gururmm@main: git log feat/perf history (changelogs incomplete past v0.6.22)" - projects/msp-tools/guru-rmm/CONTEXT.md - projects/msp-tools/guru-rmm/docs/FEATURE_ROADMAP.md - projects/msp-tools/guru-rmm/docs/UI_GAPS.md @@ -46,7 +51,7 @@ backlinks: GuruRMM is a Remote Monitoring & Management platform built by Arizona Computer Guru LLC for internal MSP operations and eventual productization. The server (Rust/Axum) and dashboard (React/TypeScript) are production-deployed at https://rmm.azcomputerguru.com with approximately 55 enrolled agents across multiple client sites. The agent runs on managed Windows, Linux, and macOS endpoints. -**Current version:** 0.6.38 (as of 2026-05-24; fleet converged within ~10 minutes of publish) +**Current version:** agent 0.6.39 / 0.6.41 in fleet as of 2026-05-26 (server v0.3.x). Note: committed changelogs are stale (stop at agent v0.6.22 / server v0.3.1) — migrations + commit log are the authoritative feature record, not changelogs. **Repo:** `azcomputerguru/gururmm` on Gitea (internal: http://172.16.3.20:3000). The copy at `D:\claudetools\projects\msp-tools\guru-rmm` is a stale reference submodule — do NOT develop there; all real work happens in the Gitea repo. @@ -54,6 +59,65 @@ GuruRMM is a Remote Monitoring & Management platform built by Arizona Computer G --- +## Capabilities / Feature Set + +*Synthesized from authoritative artifacts (API routes, agent modules, 46 migrations, roadmap, commit log) at live `main` — not from session logs. See Compilation Notes.* + +Agent↔server communication is a persistent authenticated WebSocket with auto-reconnect + heartbeat; on reconnect, in-flight commands flip to `interrupted`. Platform-parity rule: agent features ship on Windows/Linux/macOS in the same change (stub + TODO where a real impl isn't yet feasible). + +### Monitoring & Telemetry +- Core metrics per interval (policy-tunable per section): CPU %, memory %/bytes, disk %/bytes, network rx/tx deltas, uptime, logged-in user, user idle time (Win `GetLastInputInfo`, Linux `xprintidle`), public/WAN IP (cached, multi-service fallback). Cross-platform via `sysinfo`. +- Hardware sensor telemetry: CPU/GPU temps + full sensor array (temperature, voltage, fan RPM, power). Windows via bundled **LibreHardwareMonitor** + WMI (`ohw.rs`); Linux via `/sys/class/thermal`; `sysinfo::Components` fallback. +- Process drill-down: top 10 by CPU + top 10 by memory in every metrics payload (cross-platform). +- Network state: per-interface IPv4/IPv6, MAC, derived CIDR subnets — sent on change only. + +### Remote Execution +- Command types: `shell`, `powershell`, `python`, raw `script{interpreter}`, and `claude_task`. Options: `timeout_seconds`, `elevated`. +- **Execution context** (`041_add_command_context`): `system` (default — Session 0 / service SYSTEM) **or `user_session`** (runs in the active logged-on user's desktop session via WTS token impersonation: `WTSQueryUserToken` + `CreateProcessAsUserW` + per-user env block). Windows-only; requires an active session. +- Commands individually cancellable (`POST /commands/:id/cancel`) and aborted on disconnect. Auditable history; status running/completed/failed/timeout/interrupted. +- Script library (`017_scripts`): stored scripts dispatched to agents with args/env/timeout/`run_as_user`; per-run history. + +### Inventory & Discovery +- Hardware inventory (mfr/model/serial/BIOS, CPU, memory, disks, NICs, OS), software inventory (installed apps), service inventory. On-demand refresh. +- VM / hypervisor / container detection (`032/033`): `is_virtual_machine`, `hypervisor_type`, `vm_uuid`, `is_hypervisor` + hosted VM UUIDs, `is_container`, `is_unraid`. +- User/group inventory (`037`–`040`): local + domain + Azure AD accounts (enabled, pw-never-expires, last-logon, is_admin, AD email/UPN/dept), domain-join classification (none/ad/aad/hybrid), domain name, M365 tenant ID, **domain-controller detection (`is_dc`)**, group membership. Policy-scheduled (default 24h). +- Network discovery: server-dispatched scans — TCP probes over configurable ranges/ports, ARP MAC, reverse DNS, basic OS fingerprint; devices stream back and are persisted/editable. + +### Patch / Agent Update Management +- Self-updater: server sends version + URL + SHA256; agent downloads, verifies checksum, atomically swaps binary, restarts, and **auto-rolls back** to a kept backup if it fails to reconnect (~180s window). +- Auto-update gated on effective policy `auto_update` (channel + defer_hours; maintenance-window field received but not yet enforced [verify]). Force via `POST /agents/:id/update`. Update channels at agent/site/client (`026`). +- Safe-rollout (`046`): `update_rollouts`/health-metrics/events tables + `/updates/rollouts` promote/rollback. **Scaffolding only — promotion is manual; health-gated automation is written-but-unwired (Phase 2).** + +### Policy & Configuration Management +- Inheritance chain global → client → site → agent; server computes merged effective policy, pushes via `ConfigUpdate`. Effective policy queryable per scope. +- Checks engine (`018`/`019`): cpu, memory, disk, ping, port, script, service (restart-if-stopped, pass-if-not-exist; Win `sc.exe`, Linux `systemctl`). Policy-attached check templates (`024`) with push-to-agent sync. On-demand `run-checks`. +- Remote registry (Windows, `winreg`): agent supports enumerate/read/write (typed)/create/delete. **HTTP API currently exposes read-only (enumerate, read_value); write paths exist in the agent but aren't routed yet [verify].** + +### Alerting & Watchdog +- Threshold alerts (ack/resolve, per-agent + fleet summary, dashboard filter). Alert templates (`022`) with effective resolution; per-client email settings (`020`). Maintenance mode (`021`) to suppress alerting per scope. +- Watchdog: **separate** supervising process (polls `GuruRMMAgent` every 30s, restart backoff, alert after 3 fails) + launches/reaps the tray into active user sessions via WTS. Full alert CRUD + ack/resolve. + +### Backup Integration (MSP360 / MSPBackups) +- Multi-provider config (`034`/`035`) with connection test, scheduled sync, per-agent + all-providers status, fleet coverage report, and agent↔MSP360 mapping (`044`) with confidence scoring + manual verification. + +### Remote Access (Tunnel) +- Agent side substantially built (`TunnelManager` state machine; Open/Close/Data). **Server side is a dead-code skeleton — not declared in `api/mod.rs`, no `/tunnel` routes, WS handler logs "not yet implemented." Not production-ready.** + +### Identity / Multi-tenancy / Security +- Auth: JWT (login/register/me); agents auth over WS via per-agent API key + hardware device_id. +- **Microsoft Entra ID SSO** (OAuth2/OIDC + PKCE), gated on server config. Multi-provider incl. Google is spec'd (SPEC-008) but **Google not implemented [verify]**. +- Organizations / multi-tenancy: org CRUD, per-org membership + roles, limits, dev-admin **user impersonation** (`/auth/impersonate/:id`). Backend present; dashboard UI partial. +- Encrypted credentials vault (`016`): scoped global/client/site, typed (password, SSH key, SNMP), metadata-only by default with separate `/reveal` decrypt endpoint (known HIGH item: `/reveal` ownership-scope check — [verify current state]). +- Enrollment & keys (`012`): per-agent key issuance on first run, site API keys (regenerable), site-specific MSI with SITEKEY injected at download, public install-report ingestion. Legacy PowerShell agent path for Server 2008 R2. +- Logs: agent log upload (periodic + on-demand), per-agent events (`042`), fleet log view, AI-assisted log analysis (`/logs/analyze`) — AI-optional per locked decision. + +### Platform coverage +- **Cross-platform (Win/Linux/macOS):** metrics, network state, hardware/software/service inventory, user/group + DC/domain detection, checks (service checks Win+Linux), discovery, self-update, scripts/commands. +- **Windows-only:** `user_session` command context (WTS), LibreHardwareMonitor temps, remote registry, tray-into-session, watchdog SCM supervision. +- **macOS:** agent deployed (Phase 1); tray is a stub; automated Mac build pipeline is an intentional stub (no build host) [verify before claiming CI Mac releases]. + +--- + ## Architecture ### Components @@ -104,7 +168,7 @@ gururmm/ │ └── src/ │ ├── ipc.rs Unix socket IPC (Linux); Windows named pipe │ ├── tunnel/ TunnelManager state machine -│ ├── metrics/ sysinfo-based collection (temp NOT yet wired — BUG-001) +│ ├── metrics/ sysinfo collection + temps (LibreHardwareMonitor/WMI on Win, /sys/class/thermal on Linux) — BUG-001 resolved │ ├── registry_ops/ Windows registry read/write │ ├── updater/ Self-update handler │ └── main.rs systemd unit template generation @@ -249,14 +313,14 @@ sudo rsync -av --delete /home/guru/gururmm/dashboard/dist/ /var/www/gururmm/dash - `POST /api/auth/login` → JWT (~24h) - Creds: vault `infrastructure/gururmm-server.sops.yaml` → `credentials.gururmm-api.admin-email` / `admin-password` - Key endpoints: `GET /api/agents`, `POST /api/agents/:id/command`, `GET /api/commands/:id`, `POST /api/agents/:id/update` -- Command fields: `command_type` (powershell/shell/exec), `command` (script text, JSON-encoded). Windows agent runs as LocalSystem. +- Command fields: `command_type` (`shell`/`powershell`/`python`/`script`/`claude_task`), `command` (script text, JSON-encoded), optional `context` — **`system`** (default; Session 0 / SYSTEM) or **`user_session`** (runs in the logged-on user's desktop session via WTS token impersonation; Windows-only, needs an active session), plus `timeout_seconds`/`elevated`. The agent does NOT run everything as LocalSystem — `user_session` is the per-user path (migration `041_add_command_context`, `agent/src/watchdog/wts.rs`). - Response: `stdout`, `stderr`, `exit_code`, `status` (running/completed/failed/timeout/interrupted) **Dashboard — complete and working:** -Agents management, Clients/Sites CRUD, Commands execution + terminal, Logs + AI analysis, Alerts, Metrics (CPU/RAM/disk/network, process drill-down modal), Auto-update triggering, Network state, Entra ID SSO, Policies Dashboard (all tabs), Registry editor, MSP360 backup status card. +Agents management, Clients/Sites CRUD, Commands execution + terminal, Logs + AI analysis, Alerts, Metrics (CPU/RAM/disk/network, process drill-down modal), Auto-update triggering, Network state, Entra ID SSO (Entra only — Google planned per SPEC-008, not implemented), Policies Dashboard (all tabs), Registry editor, MSP360 backup status card. **Dashboard — incomplete (see UI_GAPS.md):** -- Temperature monitoring (BUG-001) — UI ready, agent-side collection never wired +- ~~Temperature monitoring (BUG-001)~~ — RESOLVED: agent-side collection is wired (LibreHardwareMonitor/WMI on Windows, /sys/class/thermal on Linux, sysinfo fallback). Roadmap BUG-001 text is stale. - Enrollment management UI (revoke keys, audit log, duplicate hostname warnings) - Watchdog alerts UI — blocked on 2 missing server routes - MSPBackups management UI — backend complete, no frontend @@ -314,6 +378,8 @@ These decisions are locked. Do not reverse without explicit user approval. ## Compilation Notes +- **2026-05-26 recompile:** Added the Capabilities / Feature Set section, synthesized from authoritative artifacts at live `main` (cd27a59) — API route modules, agent source tree, 46 migrations, roadmap, and the feat/perf commit log — NOT from session logs. This was prompted by the prior seeding missing the `user_session` command context entirely (it had only ever stated "runs as LocalSystem"). Corrected: command execution contexts, temperature monitoring (BUG-001 is resolved, not pending), Entra-only SSO, and added user-inventory/discovery/VM-detection/safe-rollout surfaces. **Changelogs are an unreliable capability source here** — committed changelogs stop at agent v0.6.22 while the fleet runs 0.6.39+; migrations + commit log are authoritative. +- Tunnel subsystem (verified against live main): agent side substantially built; server side is a dead-code skeleton (not declared in `api/mod.rs`, no routes, WS handler logs "not yet implemented"). Confirmed, not unverified. - macOS build status: Phase 1 was deployed manually from Mikes-MacBook-Air (2026-05-12). `build-mac.sh` is a stub as of 2026-05-24 — unclear if automated pipeline includes macOS yet. [unverified] - Tunnel subsystem: agent-side substantially complete; server-side is dead-code skeleton. Current live status unconfirmed. [unverified] - Pre-commit hook on 172.16.3.30 lacks execute bit (noted 2026-05-23) — likely still unfixed. [unverified]