wiki: compile gururmm (full) — SPEC-030 software uninstall + universal installer
Fold delta since 2026-06-22 into the GuruRMM project article: SPEC-030 remote software inventory + bulk uninstall (engine, three-state knowledge base, async removal jobs, migrations 061-064), universal self-detecting installer (Feature 9, v0.6.71); versions to 0.6.75/0.3.87; flag stale PR #40-#46 migration numbering as [VERIFY].

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -2,14 +2,24 @@
 type: project
 name: gururmm
 display_name: GuruRMM
-last_compiled: 2026-06-22
+last_compiled: 2026-06-25
 compiled_by: Howard-Home/claude-main
 aliases:
 - guru-rmm
 sources:
-- "gururmm@main: server/src/api/*.rs (REST API surface, ~37 route modules)"
+- "gururmm@main: server/src/api/*.rs (REST API surface, ~38 route modules; software.rs SPEC-030)"
 - "gururmm@main: agent/src/ (agent capabilities; transport/CommandContext, ohw.rs, watchdog/wts.rs, bsod.rs, device_id.rs)"
-- "gururmm@main: server/migrations/*.sql (60 migrations — feature checkpoints through 060_alert_mutes_agent_id_index)"
+- "gururmm@main: agent/scripts/uninstall-engine.ps1 (SPEC-030 Tier-1 silent uninstall engine)"
+- "gururmm@main: server/src/db/software_jobs.rs (async removal jobs)"
+- "gururmm@main: dashboard/src/components/SoftwareManager.tsx (installed-programs list + bulk uninstall + live progress)"
+- "gururmm@main: server/migrations/*.sql (64 migrations — feature checkpoints through 064_software_removal_jobs)"
+- "gururmm@main: server/migrations/061_software_removal_attempts.sql"
+- "gururmm@main: server/migrations/062_software_knowledge.sql"
+- "gururmm@main: server/migrations/063_software_knowledge_timing.sql"
+- "gururmm@main: server/migrations/064_software_removal_jobs.sql"
+- "gururmm@main: git log origin/main (2026-06-22..2026-06-25 — SPEC-030 + universal installer)"
+- "gururmm@main: session-logs/2026-06/2026-06-24-howard-gururmm-async-removal-jobs.md"
+- "gururmm@main: session-logs/2026-06/2026-06-25-howard-win11-pit-restore-rmm-thought.md"
 - "gururmm@main: server/migrations/048_bsod_events.sql"
 - "gururmm@main: server/migrations/056_audit_log.sql"
 - "gururmm@main: server/migrations/057_log_signatures.sql"
@@ -102,7 +112,11 @@ backlinks:
 
 GuruRMM is a Remote Monitoring & Management platform built by Arizona Computer Guru LLC for internal MSP operations and eventual productization. The server (Rust/Axum) and dashboard (React/TypeScript) are production-deployed at https://rmm.azcomputerguru.com with approximately 270 enrolled agents across multiple client sites. The agent runs on managed Windows, Linux, and macOS endpoints.
 
-**Current version:** agent 0.6.67 (stable) / server v0.3.74+ as of 2026-06-22. Fleet at ~270 enrolled / ~178 typically online; all metrics flowing, alerts firing with 0 legacy null dedup_keys, Windows build green at v0.6.67 (marker `1dce66d` on origin/main).
+**Current version:** agent 0.6.75 / server v0.3.87 as of 2026-06-25 (was 0.6.67 / 0.3.74 on 2026-06-22). Fleet at ~270 enrolled / ~178 typically online; all metrics flowing, alerts firing with 0 legacy null dedup_keys.
+
+**SPEC-030 Remote Software Inventory + Bulk Uninstall (SHIPPED 2026-06-23..24, Route B = server-orchestrated):** GuruRMM can now list installed programs per agent and remotely uninstall them in bulk. A PowerShell **silent-uninstall engine** (`agent/scripts/uninstall-engine.ps1`, BCU-informed, vendor silent-switch table) runs per target; the engine **refuses to guess** (returns `needs_interactive` rather than risk a wrong removal) and **verifies** each success via a generic ARP re-check (`Test-StillInstalled`) that catches the false-success class. Behind the engine sit a fleet **three-state knowledge base** (migration 062: `silent` / `requires_ui` / `unknown`, keyed by DisplayName), **self-tuning learned timeouts** (migration 063), and a **removal-tracking loop** (migration 061: records what still needs removing and flags programs that need a remote GuruConnect session). Bulk uninstall is now an **async job** (migration 064): POST returns a `job_id` instantly, a background worker processes targets one at a time, and the dashboard polls a live progress bar. This replaces the synchronous request that died at the proxy's ~100s timeout. See the Capabilities section for full detail. Known gaps: drivers/system components appear in the list because they register in ARP, and the UI exclusion is not yet built; Launchy/AIMP-class uninstallers that hang the agent on an orphaned child process tree need recipes (Route A) rather than Route B.
+
+**Universal self-detecting installer (Feature 9, SHIPPED P1, v0.6.71+):** One install URL/script self-detects arch/OS/legacy and pulls the correct agent; the dashboard install UI now points at it (replaces the hardcoded-amd64 one-liner). P2 (native i386 bootstrapper EXE for locked-down boxes) pending.
 
 **BUG-021 (FIXED, commit `1dce66d`, 2026-06-22):** The legacy wave (Rust 1.77 for Win7/Server 2008 R2) was resolving deps fresh and pulling `edition2024` crates. Fixed by pinning `getrandom 0.3.1` + `zeroize 1.8.1` below edition2024 in agent Cargo.toml. NOT a toolchain bump — bumping would drop legacy support. Windows build went green at 02:19, v0.6.67.
 
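The async-job flow summarized above (POST returns a `job_id` at once, a background worker handles one target at a time, the dashboard polls) can be sketched as a small in-memory model. This is a Python illustration under stated assumptions, not the Rust implementation in `server/src/db/software_jobs.rs`. `RemovalJob`, `JobStore`, and the `run_engine` callback are hypothetical names, and a dict stands in for the `software_removal_jobs` table.

```python
import threading
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class RemovalJob:
    # Hypothetical in-memory stand-in for one software_removal_jobs row.
    job_id: str
    targets: List[str]
    results: Dict[str, str] = field(default_factory=dict)
    status: str = "running"

    def progress(self) -> float:
        # Fraction of targets with a recorded outcome (what the progress bar shows).
        return len(self.results) / len(self.targets) if self.targets else 1.0


class JobStore:
    def __init__(self, run_engine: Callable[[str], str]) -> None:
        # run_engine is hypothetical: dispatch the engine for ONE program, return its outcome.
        self._run_engine = run_engine
        self._jobs: Dict[str, RemovalJob] = {}
        self._lock = threading.Lock()

    def create(self, targets: List[str]) -> str:
        """POST handler: record the job, start the worker, return the id immediately."""
        job = RemovalJob(job_id=str(uuid.uuid4()), targets=list(targets))
        with self._lock:
            self._jobs[job.job_id] = job
        threading.Thread(target=self._worker, args=(job,), daemon=True).start()
        return job.job_id

    def _worker(self, job: RemovalJob) -> None:
        # One target at a time; each outcome is stored as soon as it lands.
        for target in job.targets:
            outcome = self._run_engine(target)
            with self._lock:
                job.results[target] = outcome
        with self._lock:
            job.status = "completed"

    def get(self, job_id: str) -> dict:
        """GET .../uninstall/jobs/:job_id handler: a snapshot the dashboard can poll."""
        with self._lock:
            job = self._jobs[job_id]
            return {
                "job_id": job.job_id,
                "status": job.status,
                "progress": job.progress(),
                "results": dict(job.results),
            }
```

Each outcome is written as soon as it lands, so a poll made mid-job returns partial results and a meaningful progress fraction. No single HTTP request has to outlive the proxy timeout.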
@@ -128,6 +142,27 @@ GuruRMM is a Remote Monitoring & Management platform built by Arizona Computer G
 
 ## Recent Work
 
+### 2026-06-23..24 — SPEC-030 Remote Software Uninstall + Universal Installer (Howard-Home)
+
+**SPEC-030 software inventory + remote bulk uninstall (Route B, shipped to main + deployed; agent 0.6.75 / server v0.3.87):**
+
+- **Capability shipped:** `/agents/:id/software` lists installed programs; `/agents/:id/software/uninstall` runs a bulk uninstall; the dashboard `SoftwareManager.tsx` shows the live programs list + bulk-select + a live progress bar. Server endpoints in `server/src/api/software.rs`.
+- **Silent-uninstall engine** (`agent/scripts/uninstall-engine.ps1`): Tier-1 silent removal, BCU-informed detection + fail-fast, vendor silent-switch table, captures **real** process exit codes (fixed a `Start-Process` null-ExitCode bug, `f6c1945`). The engine **refuses to guess** — Avast/AVG (`icarus.exe`), Malwarebytes (`mb5uns.exe`) correctly returned `needs_interactive` rather than risk a bad removal.
+- **False-success class fixed (CRITICAL, PR #53):** Avira's WiX/Burn **Package Cache bundle GUID** was misread as an MSI ProductCode; `msiexec /x {bundleGuid}` returned 1605 ("not installed") and was mapped to success="already gone" while Avira stayed fully installed. Fix: skip the MSI tier for `\Package Cache\` bundles + a **generic post-success ARP re-check** (`Test-StillInstalled`) that downgrades a reported success to failed if the program is still registered. Generic (not AV-specific) so it catches the whole class.
+- **Code-review correctness fixes (PR #54):** `Test-StillInstalled` matches by `product_code` alone when present (no false-failure on same-named programs); skip the rescan on reboot-pending verdicts (3010/1641); exclude MSI-1605 no-op successes from the learned-timing sample.
+- **Async removal jobs — the permanent fix (PR #56, migration 064):** A synchronous bulk uninstall died at Cloudflare's ~100s timeout (popup vanished, no progress), AND, because the agent's command kill-timeout equaled the engine's own time budget, the agent killed the engine mid-flush (37 apps removed but reported `failed`/no-output). Rebuilt as a **JOB**: POST creates a `software_removal_jobs` row and returns a `job_id` instantly (0.06s); a background worker dispatches the engine one program at a time, recording each outcome as it lands; `GET /agents/:id/software/uninstall/jobs/:job_id` is polled by the dashboard. `server/src/db/software_jobs.rs` (new) + async refactor of `software.rs`. Stop-gap flush-slack (PR #55, `ENGINE_FLUSH_SLACK_SECS=120`) shipped first.
+- **Three-state removal knowledge base (migrations 061-063):** `software_removal_attempts` (per-agent removal-tracking loop — failures + no-silent-path programs flagged for a remote session; resolved when a later removal succeeds), `software_knowledge` (fleet KB keyed by DisplayName: `silent` / `requires_ui` / `unknown`), `software_knowledge_timing` (fleet-learned self-tuning per-target timeout — generous while barely measured, narrowing toward the slowest observed success). Dashboard three-state KB UI shipped.
+- **Orphaned-entry + retry-verify (PR #57):** GOM Player = orphaned ARP entry (uninstaller file missing) → `tier=orphaned` "needs registry cleanup"; TeamViewer async-fork false-success → retry-verify (~9s, 4 checks); bounded async stdout/stderr read.
+- **Known gaps surfaced (not yet built):** (1) drivers/system components (Intel/Realtek, Asmedia USB) appear in `/software` because they register in ARP — a top-down 100-program select removed a driver; the UI needs a driver/system-component exclusion/flag. (2) **Launchy/AIMP-class** uninstallers don't honor silent-as-SYSTEM and hang the agent on a lingering child process tree until the agent kill-timeout (~420s) → `failed`/no-output; Route B can't fully control this — they need **recipes / agent-side process handling (Route A)**. Open decision: kill-process-tree-on-timeout engine mitigation (clean `needs_interactive`) vs accept as recipe work. (3) No list-jobs endpoint yet (only get-by-id).
+- **av-removal-recipes spec (P3, scoped to removal-only, no RMM install path)** and the **Third-Party AV Removal Recipes** RMM_THOUGHTS entry capture the recipe-tier follow-on.
+
+**Universal self-detecting installer (Feature 9 — P1 built+deployed+verified, v0.6.71):**
+- One install path self-detects arch/OS (x64/x86/legacy/ARM) and pulls the correct agent; `feat(installer)` `4194b0a`, dashboard install UI repointed at it (`53bb682`). Script path proven on-box (GND-JWILL, `de30ebc`). P2 (native i386 bootstrapper EXE for locked-down boxes) + offline-bundle still pending.
+
+**RMM_THOUGHTS movement:** Feature 10 (AMPIPIT recovery environment add-on) → **Discussed** (Mike likes it, deferred to revisit); Third-Party AV Removal Recipes + Win11 KB5095093 Point-in-Time Restore (2026-06-25) added as Raw.
+
+---
+
 ### 2026-06-22 — BUG-021 Fixed + v0.6.67 Stable; Fleet Verified (Howard-Home)
 
 - **BUG-021:** Legacy build (Rust 1.77 via `$CARGO +1.77 --features legacy`) was pulling `wit-bindgen` (edition2024) through fresh dep resolution. Fixed by pinning `getrandom 0.3.1` + `zeroize 1.8.1` in agent Cargo.toml below edition2024. Commit `1dce66d` (current origin/main HEAD). Windows build went green at 2026-06-22 02:19, v0.6.67.
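The false-success, code-review, and retry-verify entries above combine into one verdict rule. A reported success only counts once the program has left ARP, reboot-pending codes skip the rescan, and forked uninstallers get a few re-checks. Below is a Python sketch of that rule, not the engine's PowerShell. `ArpEntry`, `still_installed`, and `verdict` are hypothetical names, and the verdict strings are illustrative. `still_installed` mirrors only the matching rule described for `Test-StillInstalled`. The Package Cache tier skip is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

REBOOT_PENDING = {3010, 1641}  # MSI: reboot required / reboot initiated
MSI_NOT_INSTALLED = 1605       # msiexec: product not installed (ERROR_UNKNOWN_PRODUCT)


@dataclass
class ArpEntry:
    display_name: str
    product_code: Optional[str] = None


def still_installed(target: ArpEntry, arp: List[ArpEntry]) -> bool:
    """Matching rule described for Test-StillInstalled: when a product_code is
    present, match on it alone so a same-named program cannot cause a false failure."""
    if target.product_code:
        return any(e.product_code == target.product_code for e in arp)
    return any(e.display_name == target.display_name for e in arp)


def verdict(exit_code: int, target: ArpEntry,
            rescan: Callable[[], List[ArpEntry]],
            checks: int = 4,
            wait: Callable[[], None] = lambda: None) -> str:
    if exit_code in REBOOT_PENDING:
        # Removal finishes after reboot; a rescan now would still see the entry.
        return "success_reboot_pending"
    if exit_code not in (0, MSI_NOT_INSTALLED):
        return "failed"
    # A reported success, including 1605 "already gone", only counts once the
    # entry has really left ARP. Re-checking covers uninstallers that fork and exit early.
    for attempt in range(checks):
        if not still_installed(target, rescan()):
            return "success"
        if attempt < checks - 1:
            wait()
    return "failed"
```

In a real run, `rescan` would re-read the uninstall registry keys and `wait` would sleep between checks. Spacing the four checks about 3s apart, for example `lambda: time.sleep(3)`, is a guess at how the stated ~9s arises. Both are injected here so the logic can be exercised without a Windows box.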
@@ -276,7 +311,7 @@ GuruRMM is a Remote Monitoring & Management platform built by Arizona Computer G
 
 ## Capabilities / Feature Set
 
-*Synthesized from authoritative artifacts (API routes, agent modules, 60 migrations through migration 060, roadmap, commit log) — not from session logs alone. See Compilation Notes.*
+*Synthesized from authoritative artifacts (API routes, agent modules, 64 migrations through migration 064, roadmap, commit log) — not from session logs alone. See Compilation Notes.*
 
 Agent<->server communication is a persistent authenticated WebSocket with auto-reconnect + heartbeat; on reconnect, in-flight commands flip to `interrupted`. Platform-parity rule: agent features ship on Windows/Linux/macOS in the same change (stub + TODO where a real impl isn't yet feasible).
 
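The reconnect rule above (in-flight commands flip to `interrupted`) amounts to a status sweep when an agent's socket comes back. Below is a minimal Python sketch. The status names `pending`, `sent`, `running`, `completed`, and `interrupted` are assumptions for illustration. The sketch also assumes that never-delivered commands are left alone for a redispatch step. The `redispatch_pending_commands` anchor listed in this file suggests such a step exists, but that is an inference.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical status names used only for this sketch.
IN_FLIGHT = {"sent", "running"}


@dataclass
class Command:
    id: str
    status: str


def on_agent_reconnect(commands: Dict[str, Command]) -> List[str]:
    """Flip commands that were in flight when the socket dropped to 'interrupted'.

    Their results may never arrive, so leaving them 'running' would make them
    look stuck. Commands still 'pending' (never delivered) are left untouched
    for redispatch, and finished commands keep their final status.
    """
    flipped = []
    for cmd in commands.values():
        if cmd.status in IN_FLIGHT:
            cmd.status = "interrupted"
            flipped.append(cmd.id)
    return flipped
```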
@@ -299,6 +334,14 @@ Agent<->server communication is a persistent authenticated WebSocket with auto-r
 - Code anchors: `server/src/db/commands.rs` (`requeue_undelivered_commands`, `fail_timed_out_commands` rewrite, `DELIVERY_ACK_DEADLINE_SECS=60`, `MAX_DELIVERY_ATTEMPTS=10`), `server/src/ws/mod.rs` (`redispatch_pending_commands`, `CommandAck` handler), `agent/src/transport/websocket.rs` + `agent/src/commands/mod.rs` (ACK send + recent-results cache, `RECENT_CAP=64`, `MAX_CACHED_RESULT_BYTES=256 KB`).
 - Phase 2 (live TTY — rides warm WS, seq/resume, cadence switch) and Phase 3 (Adaptive keepalive, bulk file transfer -> short-lived HTTPS, server half-open eviction sweeper) are PLANNED, not shipped.
 
+### Software Inventory & Remote Uninstall (SPEC-030, shipped 2026-06-23..24)
+- **Route B (server-orchestrated):** the server drives removal via agent commands; chosen over embedding the engine per-command (32K dispatch limit). Endpoints: `GET /agents/:id/software` (installed-programs list), `POST /agents/:id/software/uninstall` (async bulk uninstall → `job_id`), `GET /agents/:id/software/uninstall/jobs/:job_id` (live job poll), `GET /agents/:id/software/removal-status` + `POST .../removal-status/:attempt_id/resolve` (removal-tracking loop), `GET /software/knowledge` + `POST /software/knowledge/classify` (fleet KB).
+- **Silent-uninstall engine** (`agent/scripts/uninstall-engine.ps1`): Tier-1 silent removal, BCU-informed detection + fail-fast, vendor silent-switch table, captures real process exit codes. **Refuses to guess** — returns `needs_interactive` rather than risk a wrong removal (Avast/AVG/Malwarebytes correctly bounce). **Generic post-success verification** (`Test-StillInstalled`, ARP re-check by product_code when present) downgrades a false "success" to failed — catches the WiX/Burn Package-Cache-bundle 1605 false-success class (skip MSI tier for `\Package Cache\` bundles). Orphaned ARP entries (missing uninstaller) classified `tier=orphaned`; reboot-pending (3010/1641) skips the rescan; async-fork uninstallers get retry-verify (~9s/4 checks).
+- **Async removal jobs (migration 064):** bulk uninstall is non-blocking — `software_removal_jobs` row created, `job_id` returned in ~0.06s, a background worker processes targets one at a time recording each outcome as it lands, dashboard polls a live progress bar. Replaces the synchronous request that died at the proxy's ~100s timeout and that the agent kill-timeout truncated mid-flush. `server/src/db/software_jobs.rs`.
+- **Fleet removal knowledge base (migrations 061-063):** `software_removal_attempts` (per-agent: programs that did NOT remove silently — failures + no-silent-path — so the dashboard shows what still needs removing and flags ones needing a remote GuruConnect session; flips to resolved on a later success). `software_knowledge` (fleet "our list" keyed by exact DisplayName, three-state: `silent` = verified silent method exists / `requires_ui` = VM-verified cannot be removed silently / `unknown` = default-uninstaller log kept). `software_knowledge_timing` (self-tuning per-target uninstall timeout — running sample count + slowest success observed; wide while barely measured, narrowing toward measured reality; 1605 no-ops excluded).
+- **Dashboard:** `SoftwareManager.tsx` — live installed-programs list, bulk-select, fire-then-poll uninstall with live progress bar + per-program results; three-state removal-knowledge UI.
+- **Known gaps:** drivers/system components register in ARP and appear in the list (a top-down bulk-select can nuke a driver — UI exclusion/flag not yet built); Launchy/AIMP-class uninstallers hang the agent on an orphaned child process tree (~420s kill-timeout) → `failed`/no-output, fundamentally need recipes / agent-side process handling (Route A); no list-jobs endpoint yet (get-by-id only). The **av-removal-recipes spec (P3, removal-only)** is the recipe-tier follow-on.
+
 ### Inventory & Discovery
 - Hardware inventory (mfr/model/serial/BIOS, CPU, memory, disks, NICs, OS), software inventory (installed apps), service inventory. On-demand refresh.
 - VM / hypervisor / container detection (`032/033`): `is_virtual_machine`, `hypervisor_type`, `vm_uuid`, `is_hypervisor` + hosted VM UUIDs, `is_container`, `is_unraid`.
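The knowledge-base entry above describes a self-tuning timeout: wide while barely measured, narrowing toward the slowest observed success, with 1605 no-ops excluded. A small Python model can illustrate that behavior. The constants and the `1.5 + 4 / samples` headroom curve are hypothetical choices, not values from `software_knowledge_timing`. Only the shape of the behavior comes from the text.

```python
from dataclasses import dataclass

# Hypothetical constants; the real values in software_knowledge_timing are not documented here.
DEFAULT_TIMEOUT_S = 300.0  # budget for a target that has never been measured
MIN_TIMEOUT_S = 30.0
MAX_TIMEOUT_S = 900.0
MSI_NOT_INSTALLED = 1605


@dataclass
class TimingStats:
    samples: int = 0
    slowest_success_s: float = 0.0

    def record_success(self, duration_s: float, exit_code: int) -> None:
        # 1605 no-ops finish instantly without doing any work; counting them as
        # samples would narrow the budget on no real evidence, so skip them.
        if exit_code == MSI_NOT_INSTALLED:
            return
        self.samples += 1
        self.slowest_success_s = max(self.slowest_success_s, duration_s)

    def timeout_s(self) -> float:
        if self.samples == 0:
            return DEFAULT_TIMEOUT_S
        # Wide headroom while barely measured, shrinking toward 1.5x the slowest success.
        headroom = 1.5 + 4.0 / self.samples
        return min(MAX_TIMEOUT_S, max(MIN_TIMEOUT_S, self.slowest_success_s * headroom))
```

With one measured 20s success, the sketch allows 110s. After eight samples with the same slowest run, it allows 40s. The budget therefore tightens as evidence accumulates, while keeping at least 1.5x the slowest success unless the ceiling applies.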
@@ -313,6 +356,7 @@ Agent<->server communication is a persistent authenticated WebSocket with auto-r
 - Safe-rollout (`046`): `update_rollouts`/health-metrics/events tables + `/updates/rollouts` promote/rollback. **Health-gated automation is written-but-unwired (Phase 2); promotion is manual via API.**
 - **Container guard (BUG-019, shipped `66a7f4e`, 2026-06-21):** Agent inside a container early-returns from `perform_update()` with `UpdateStatus::Failed` + clear message, skipping the in-place binary swap that caused silent downgrades on container recreate. Full image-update path (SPEC-023) is not yet implemented.
 - **Legacy fleet build support (two-wave parallel build):** Agent ships both a stable-toolchain (modern x86_64 + x86, debug, tray, cleanup) wave and a legacy wave (Rust 1.77, `--features legacy`, for Win7/Server 2008 R2 endpoints). The `gururmm-build` skill wraps pre-merge verification. IMPORTANT: the legacy wave resolves deps fresh on Rust 1.77 — dep pins to avoid edition2024 pulls are required (BUG-021 lesson: pin `getrandom 0.3.1` + `zeroize 1.8.1`).
+- **Universal self-detecting installer (Feature 9, P1 shipped v0.6.71):** A single install path self-detects arch/OS (x64 / x86 / legacy / ARM) and pulls the correct agent binary, so one URL/script covers the full Windows variant matrix. The dashboard install UI points at it (replaces the hardcoded-amd64 `irm … | iex` one-liner). MSI stays for GPO/Intune. P2 (native i386 bootstrapper EXE for locked-down boxes) + an offline-bundle variant are pending.
 
 ### Policy & Configuration Management
 - Inheritance chain global -> client -> site -> agent; server computes merged effective policy, pushes via `ConfigUpdate`. Effective policy queryable per scope.
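The installer's self-detection step can be sketched as a pure function from the environment and OS version to an agent artifact. This minimal Python illustration rests on two assumptions. First, the artifact file names are hypothetical. Second, every Windows release older than NT 10.0 is treated as legacy: the text names Win7/Server 2008 R2 as the legacy targets, and Rust toolchains after 1.77 require Windows 10 or later. The `PROCESSOR_ARCHITEW6432` check is the standard way for a 32-bit process on 64-bit Windows to see the native architecture.

```python
from typing import Mapping, Tuple

# Hypothetical artifact names; the real download layout may differ.
ARTIFACTS = {
    "amd64": "gururmm-agent-windows-amd64.exe",
    "x86": "gururmm-agent-windows-x86.exe",
    "arm64": "gururmm-agent-windows-arm64.exe",
    "legacy": "gururmm-agent-windows-legacy.exe",
}


def detect_arch(env: Mapping[str, str]) -> str:
    """Native CPU architecture from Windows environment variables.

    A 32-bit process on 64-bit Windows sees PROCESSOR_ARCHITECTURE=x86 and
    finds the native value in PROCESSOR_ARCHITEW6432, so that is checked first.
    """
    raw = (env.get("PROCESSOR_ARCHITEW6432") or env.get("PROCESSOR_ARCHITECTURE", "")).upper()
    mapping = {"AMD64": "amd64", "X86": "x86", "ARM64": "arm64"}
    if raw not in mapping:
        raise ValueError(f"unsupported architecture: {raw!r}")
    return mapping[raw]


def pick_artifact(env: Mapping[str, str], nt_version: Tuple[int, int]) -> str:
    # Assumption: any Windows older than NT 10.0 gets the legacy (Rust 1.77) build.
    if nt_version < (10, 0):
        return ARTIFACTS["legacy"]
    return ARTIFACTS[detect_arch(env)]
```

A 32-bit PowerShell host on 64-bit Windows reports `PROCESSOR_ARCHITECTURE=x86`. Without the `PROCESSOR_ARCHITEW6432` check first, the sketch would pull the x86 agent onto an x64 box.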
@@ -459,18 +503,22 @@ gururmm/
 │ └── main.rs systemd unit template generation
 ├── server/ Rust/Axum API server
 │ └── src/
-│ ├── api/ REST handlers (~37 route modules; updates.rs: promote/rollback)
+│ ├── api/ REST handlers (~38 route modules; updates.rs: promote/rollback; software.rs SPEC-030)
 │ ├── alerts/ Alerting modules (offline.rs: offline_sweep + mass_offline)
 │ ├── db/ Database layer (sqlx); commands.rs (requeue_undelivered, fail_timed_out rewrite)
+│ │ software_jobs.rs (SPEC-030 async removal jobs)
 │ │ watchdog_events.rs (doc-only stub — table exists, path removed per BUG-022 PR #45)
 │ ├── fingerprint.rs Log signature normalization + hash
 │ ├── ws/ WebSocket handler (CommandAck, redispatch_pending_commands)
 │ └── mspbackups/ MSP360 backup integration
 ├── dashboard/ React/TypeScript UI
-│ └── src/pages/
-│ └── EventLogWatches.tsx CRUD management UI (shipped 0fa65f5)
+│ ├── pages/
+│ │ └── EventLogWatches.tsx CRUD management UI (shipped 0fa65f5)
+│ └── components/
+│ └── SoftwareManager.tsx SPEC-030 installed-programs list + bulk uninstall + live progress
+├── agent/scripts/ uninstall-engine.ps1 (SPEC-030 Tier-1 silent uninstall engine)
 ├── tray/ System tray binary
-├── installer/ WiX v4 MSI (gururmm-agent.wxs); cleanup.ps1 (preserves device_id)
+├── installer/ WiX v4 MSI (gururmm-agent.wxs); cleanup.ps1 (preserves device_id); universal self-detecting installer (Feature 9)
 ├── deploy/
 │ └── build-pipeline/ webhook-handler.py, build-*.sh, build-server.sh
 ├── scripts/ Build/ops scripts
@@ -484,17 +532,10 @@ gururmm/
 
 ### Current Focus
 
-As of 2026-06-22 (agent 0.6.67 stable / server v0.3.74+):
+As of 2026-06-25 (agent 0.6.75 / server v0.3.87):
 
-- **BUG-022 (PR #45, pending merge):** Remove dead WatchdogEvent WS path. Merging PR #45 (`fix/bug-022-watchdog-event-deadcode`) = fleet build+deploy. Empty `watchdog_events` table stays until a future consolidated cleanup migration.
-- **Open PRs awaiting merge (migration order matters — merge in order: 060 already on main; 061 -> 062 -> 063):**
-- PR #40 SPEC-021 + BUG-020 (migration 063, renumbered from 060): logged-in-user domain/account-type detection + watchdog tray-teardown wiring. Also needs Pluto-signed-MSI agent build + fleet rollout.
-- PR #41 BUG-018 FK indexes (migration 061): FK indexes on five previously-unindexed cascade children (speeds the BUG-018 background purge). BUG-018 handler itself is already merged (`cea87d4`).
-- PR #42 MSP360 deep-link (migration 062): "Open in MSP360" button on backup tab plan card.
-- PR #43 Event Log Watch policy-clobber HIGH fix: no migration; merge anytime.
-- PR #44 Audit cleanup (low findings from 2026-06-21 audit).
-- PR #45 BUG-022 watchdog dead code.
-- PR #46 docs (BUG-021/022 roadmap status updates + RMM_THOUGHTS).
+- **SPEC-030 software-uninstall follow-on (Route A / recipes):** Launchy/AIMP-class uninstallers hang the agent on an orphaned child process tree → `failed`/no-output; Route B can't fully control this. Open decision: a kill-process-tree-on-timeout engine mitigation (return a clean `needs_interactive`) vs the **av-removal-recipes** tier (P3 spec, removal-only). Also: a **driver/system-component exclusion/flag** in the Software Manager UI (drivers register in ARP and appear in the bulk-select list — a top-down select removed a driver), and a **list-jobs endpoint** (currently get-by-id only).
+- **[VERIFY] Earlier pending PRs #40-#46 (SPEC-021 logged-in-user domain detection, BUG-018 FK indexes, MSP360 deep-link, Event Log Watch policy-clobber HIGH fix, audit cleanup, BUG-022 watchdog dead-code):** these were open as of 2026-06-22 and predicted to take migrations 061-063. The **software-removal work merged ahead and consumed migrations 061-064**, so those predicted numbers no longer hold. Confirm the live merge/migration status of each before relying on it — do not assume the 2026-06-22 numbering. (BUG-022 WatchdogEvent dead-code removal — verify whether it landed; the empty `watchdog_events` table remains either way.)
 - **Agent-comms-durability Phase 2 (live TTY, planned):** Extend WS message enum with TTY stream frames (stdin/stdout/stderr, resize, seq). "Activate" cadence switch. Resume from seq on mid-session drop. Single-use time-bounded session token; max one interactive session per agent; full audit. Estimated 3-5 days separate effort.
 - **Agent-comms-durability Phase 3 (planned):** Adaptive keepalive (AIMD, persist interval), bulk file transfer -> short-lived HTTPS, server half-open eviction sweeper (`last_inbound` track + evict >~90s; treat missed Pong as close).
 - **Durable agent identity Phase 1 Tasks 2-3 (pending):** Task 2 = hardware-fingerprint capture. Task 3 = dashboard "probable duplicate" surfacing (read-only). Phase 2 (guarded auto-reclaim) + Phase 3 (operator merge tool for the 9 existing ghosts) pending soak.
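The planned Phase 3 half-open eviction sweeper (track `last_inbound`, evict after ~90s, treat a missed Pong as a close) could look roughly like this. It is a Python sketch of planned, unshipped behavior. `ConnectionRegistry` and its methods are hypothetical, and a real server would also close the socket and mark the agent offline.

```python
import time
from typing import Dict, List, Optional

EVICT_AFTER_S = 90.0  # "~90s" from the Phase 3 plan


class ConnectionRegistry:
    """Hypothetical server-side record of when each agent socket last sent anything."""

    def __init__(self) -> None:
        self.last_inbound: Dict[str, float] = {}

    def touch(self, agent_id: str, now: Optional[float] = None) -> None:
        # Any inbound frame (data, heartbeat, Pong) proves the peer is still there.
        self.last_inbound[agent_id] = time.monotonic() if now is None else now

    def on_pong_missed(self, agent_id: str) -> None:
        # Per the plan, a missed Pong is treated as a close instead of waiting out the timer.
        self.last_inbound.pop(agent_id, None)

    def sweep(self, now: Optional[float] = None) -> List[str]:
        """Evict connections silent for longer than EVICT_AFTER_S and return their ids."""
        now = time.monotonic() if now is None else now
        stale = [a for a, t in self.last_inbound.items() if now - t > EVICT_AFTER_S]
        for agent_id in stale:
            del self.last_inbound[agent_id]  # a real server would also close the socket here
        return stale
```

Keying eviction on the last inbound frame, rather than on TCP state, is what catches half-open sockets. The kernel may still report such a connection as established long after the agent has gone.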
@@ -644,9 +685,9 @@ Dashboard changes go to beta BEFORE main. To preview a feature branch without me
 
 ## Active State
 
-**Fleet (as of 2026-06-22):**
+**Fleet (as of 2026-06-25):**
 - ~270 enrolled agents total; ~178 typically online
-- Stable channel: 0.6.67 windows/amd64 (Windows build green 2026-06-22 02:19, marker `1dce66d`)
+- Agent 0.6.75 / server v0.3.87 (up from 0.6.67 / 0.3.74 on 2026-06-22 — SPEC-030 software uninstall + universal installer landed)
 - Metrics flowing (~2531 rows/15 min); alerts firing with 0 legacy null dedup_keys; per-agent API endpoints all HTTP 200
 - Beta channel: site "Mike's Car" (`103c10b9-c1de-4dd8-b382-b8362ed3143e`) has `update_channel='beta'` (persists across re-enrollment). All GURU-5070 machines are on this site.
 
@@ -672,10 +713,11 @@ Dashboard changes go to beta BEFORE main. To preview a feature branch without me
 - `POST /api/auth/login` -> JWT (~24h)
 - Creds: vault `infrastructure/gururmm-server.sops.yaml` -> `credentials.gururmm-api.admin-email` / `admin-password`; or via `bash .claude/scripts/rmm-auth.sh`
 - Key endpoints: `GET /api/agents`, `POST /api/agents/:id/command`, `GET /api/commands/:id`, `POST /api/agents/:id/update`, `POST /api/updates/rollouts/:version/promote`
+- Software (SPEC-030): `GET /api/agents/:id/software`, `POST /api/agents/:id/software/uninstall` (→ job_id), `GET /api/agents/:id/software/uninstall/jobs/:job_id`, `GET /api/agents/:id/software/removal-status`, `POST /api/agents/:id/software/removal-status/:attempt_id/resolve`, `GET /api/software/knowledge`, `POST /api/software/knowledge/classify`
 - Command fields: `command_type` (`shell`/`powershell`/`python`/`script`/`claude_task`/`cmd` alias), `command` (script text, JSON-encoded), optional `context` — `system` (default) or `user_session` (Windows WTS), plus `timeout_seconds`/`elevated`.
 
 **Dashboard — complete and working:**
-Agents management (delete now returns 202 + background purge), Clients/Sites CRUD, Commands execution + terminal, Logs + AI analysis (Claude API), Alerts (clickable severity badges + client filtering), Metrics (CPU/RAM/disk/network, process drill-down modal), Auto-update triggering, Network state, Entra ID SSO, Policies Dashboard (all tabs), Registry editor (read-only via HTTP), MSP360 backup status + mappings/verify UI, Organizations management + dev-admin impersonation UI, Credentials management with inheritance, AgentDetail Crashes tab + version history, fleet stats from `/agents/stats`, SiteDetail Revoke Key + Enrollment audit tab, Install Reports page, Fleet Discovery page, **Event Log Watches management page** (`/event-log-watches`, shipped `0fa65f5`).
+Agents management (delete now returns 202 + background purge), Clients/Sites CRUD, Commands execution + terminal, Logs + AI analysis (Claude API), Alerts (clickable severity badges + client filtering), Metrics (CPU/RAM/disk/network, process drill-down modal), Auto-update triggering, Network state, Entra ID SSO, Policies Dashboard (all tabs), Registry editor (read-only via HTTP), MSP360 backup status + mappings/verify UI, Organizations management + dev-admin impersonation UI, Credentials management with inheritance, AgentDetail Crashes tab + version history, fleet stats from `/agents/stats`, SiteDetail Revoke Key + Enrollment audit tab, Install Reports page, Fleet Discovery page, **Event Log Watches management page** (`/event-log-watches`, shipped `0fa65f5`), **Software Manager** (installed-programs list + bulk remote uninstall with live progress + three-state removal knowledge base, SPEC-030, shipped 2026-06-23..24).
 
 **Dashboard — incomplete (see UI_GAPS.md):**
 - Watchdog alerts UI — blocked on 2 missing server routes
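The command fields listed above can be assembled into a request to `POST /api/agents/:id/command` using only the Python standard library. The sketch assumes three things. The JWT from `POST /api/auth/login` is sent as an `Authorization: Bearer` header. `context` is sent explicitly even when it is the default. `build_command_body` and `make_request` are hypothetical helpers. The request is built but not sent, and the sketch does not resolve what the `cmd` alias maps to.

```python
import json
import urllib.request
from typing import Optional

COMMAND_TYPES = {"shell", "powershell", "python", "script", "claude_task", "cmd"}
CONTEXTS = {"system", "user_session"}


def build_command_body(command_type: str, script: str, context: str = "system",
                       timeout_seconds: Optional[int] = None,
                       elevated: Optional[bool] = None) -> bytes:
    """Hypothetical helper: validate the documented fields and JSON-encode them."""
    if command_type not in COMMAND_TYPES:
        raise ValueError(f"unknown command_type: {command_type!r}")
    if context not in CONTEXTS:
        raise ValueError(f"unknown context: {context!r}")
    body: dict = {"command_type": command_type, "command": script, "context": context}
    if timeout_seconds is not None:
        body["timeout_seconds"] = timeout_seconds
    if elevated is not None:
        body["elevated"] = elevated
    # json.dumps escapes quotes, backslashes, and newlines in the script text.
    return json.dumps(body).encode("utf-8")


def make_request(base_url: str, agent_id: str, token: str, body: bytes) -> urllib.request.Request:
    """Build (but do not send) POST /api/agents/:id/command.

    Assumption: the login JWT is passed as a Bearer token.
    """
    return urllib.request.Request(
        f"{base_url}/api/agents/{agent_id}/command",
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the script text through `json.dumps` handles the JSON encoding of `command` (quotes, backslashes, newlines). Assembling command bodies by string concatenation commonly breaks on exactly those characters.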
@@ -751,6 +793,9 @@ These decisions are locked. Do not reverse without explicit user approval.
 | 2026-06-18 | sync.sh submodule auto-heal verified fleet-wide after earlier 2026-06-15 fix. RMM_THOUGHTS 2026-06-08 re-grounding pass + 8 Raw sections rescued and staged. |
 | 2026-06-21 | BUG-019 (container self-update guard) fixed and merged to main (66a7f4e, v0.6.67 beta). BUG-018 (DELETE 202+bg, cea87d4) merged. Event Log Watch UI shipped (0fa65f5). Enrollment modal UX fix merged to prod (4027c86). sync.sh populate-only guard (Phase 2) fixed submodule-clobber root cause. Howard cleared for GuruRMM merges/deploys (Mike decision). gururmm-build skill + docs/BUILD.md created. Five PRs opened (#40-#44) covering SPEC-021, BUG-018 FK indexes, MSP360 deeplink, Event Log Watch policy-clobber HIGH fix, audit cleanup. |
 | 2026-06-22 | BUG-021 (legacy build dep-pin getrandom+zeroize) fixed on main (1dce66d). Windows build green at v0.6.67, 2026-06-22 02:19. Fleet verified: ~270 agents / ~178 online, 0 legacy null dedup_keys. BUG-022 filed + fixed (PR #45): removed dead WatchdogEvent WS path (no producer — watchdog has no WS connection; REST watchdog-alert is the only real path). |
+| 2026-06-23 | Universal self-detecting installer (Feature 9) P1 built+deployed+verified (v0.6.71). One install path self-detects arch/OS/legacy/ARM and pulls the correct agent; dashboard install UI repointed at it (`4194b0a`, `53bb682`); script path proven on GND-JWILL (`de30ebc`). av-removal-recipes spec (P3, removal-only) added (#52). RMM_THOUGHTS Feature 10 (AMPIPIT recovery env) → Discussed. |
+| 2026-06-24 | SPEC-030 remote software inventory + bulk uninstall shipped (Route B). Silent-uninstall engine (`uninstall-engine.ps1`, BCU-informed, refuses to guess). CRITICAL Avira false-success fixed — WiX Package-Cache bundle GUID misread as MSI ProductCode (1605 "already gone" while installed); skip MSI tier for Package Cache + generic `Test-StillInstalled` ARP re-check (PR #53). Code-review correctness fixes (PR #54). Async removal jobs = permanent fix (PR #56, migration 064): POST→job_id in 0.06s, background per-target worker, dashboard live progress — replaces the synchronous request that died at the proxy ~100s timeout + was truncated by the agent kill-timeout mid-flush. Flush-slack stop-gap (PR #55). Three-state knowledge base + learned timing (migrations 061-063). Orphaned-entry + retry-verify (PR #57). Gaps surfaced: drivers in `/software` (ARP), Launchy/AIMP need recipes (Route A), no list-jobs endpoint. agent 0.6.75 / server v0.3.87. |
+| 2026-06-25 | Win11 KB5095093 reviewed for MSP fleet use; Point-in-Time Restore (Enterprise snapshot/rollback knobs) + GP update-pause captured as a Raw RMM_THOUGHTS entry (manage the knobs from the RMM; sits below AMPIPIT Feature 10 on the remediation ladder). |
 
 ---
 
@@ -765,6 +810,7 @@ These decisions are locked. Do not reverse without explicit user approval.
 - **2026-06-04 recompile:** Corrected GURU-5070 channel state. Stable fleet pinned at 0.6.47. BUG-020 documented.
 - **2026-06-07 recompile:** Folded in backup-alert quality pass, credential inheritance, offline alerting + mute, UI gap batch + enrollment audit. Updated migration count to 55+ (054/055 confirmed).
 - **2026-06-11 recompile (GURU-5070/claude-main):** Full recompile. Added: (1) Physical server migration (Ubuntu 26.04, PG 18, binary at /opt/gururmm, old VM .46 rollback anchor). (2) Durable agent identity (ghost root cause, spec, Phase 1 Task 1 durable device_id, channel pin fix). (3) Agent comms durability Phase 1 full detail (spec, slices A/B/C, deployment, fleet rollout, canary verification). (4) New capabilities: Audit Log (migration 056), Systemic Log Feedback Intelligence (migration 057), comms durability architecture. (5) Updated fleet size (~215 enrolled, 168-182 online). (6) Updated agent/server versions to 0.6.63/0.3.68. (7) Build pipeline: server IS auto-deployed by webhook (correction to earlier assumption); parallel build prototype on Pluto (3.51x, deferred integration). (8) Channel/promotion model documented in detail. (9) Downloads layout documented. (10) Updated server binary path + SSH details for physical box. (11) Added new anti-patterns (installer `&` operator, reaper false-fail, agent-level channel pin). Added ADR-11 (durability-first command delivery). Migration count updated to 59. Patterns/History preserved verbatim except new entries added.
+- **2026-06-25 recompile (Howard-Home/claude-main):** Delta from 2026-06-22 (full recompile against live origin/main @ `0074c3a`; migrations 60 → 64; agent 0.6.67 → 0.6.75, server 0.3.74 → 0.3.87). (1) **SPEC-030 remote software inventory + bulk uninstall** added as a major new Capabilities subsection — Route B (server-orchestrated) endpoints, the `uninstall-engine.ps1` silent engine (refuses-to-guess + generic `Test-StillInstalled` ARP verification + Package-Cache-bundle 1605 false-success fix + orphaned/retry handling), async removal jobs (migration 064), three-state fleet knowledge base + learned timing (migrations 061-063), `SoftwareManager.tsx` dashboard. New `software.rs` API routes + `software_jobs.rs` DB module. Known gaps captured (drivers-in-ARP UI exclusion, Launchy/AIMP Route-A recipes, no list-jobs endpoint). (2) **Universal self-detecting installer (Feature 9)** P1 shipped (v0.6.71) — Patch/Update + repo-structure + history updated. (3) New Recent Work entry (2026-06-23..24) + 4 History rows (06-23/06-24/06-25). (4) **Current Focus rewritten** — SPEC-030 follow-on (recipes, driver exclusion, list-jobs); **flagged the old "pending PRs #40-#46, migrations 061-063" block as [VERIFY]** because software-removal merged ahead and took migrations 061-064, invalidating the predicted numbering. (5) Fleet/version/dashboard/API-endpoint blocks updated. (6) Sources + migration-count updated. History/Patterns preserved verbatim except new entries added. NOTE: the pre-2026-06-22 SPEC-030 migration-number predictions in the older Compilation Note below are left as the historical record — see the 2026-06-25 [VERIFY] note for current reality.
 - **2026-06-22 recompile (Howard-Home/claude-main):** Delta from 2026-06-11: (1) Fleet updated to ~270 enrolled / ~178 online, agent v0.6.67. (2) BUG-021 (legacy wave dep-pin, commit 1dce66d), BUG-018 (DELETE 202+bg, cea87d4), BUG-019 (container guard, 66a7f4e) all fixed on main. (3) Event Log Watch UI shipped (0fa65f5). (4) Enrollment modal UX fix to prod (4027c86). (5) Watchdog section corrected per BUG-022: WatchdogEvent WS path was dead code (no producer; watchdog has no WS connection); removed in PR #45. REST `watchdog-alert` is the only supported watchdog alert path. (6) Beast parallel two-wave Windows build documented (lever A, 336s, ~3.8x vs Pluto). (7) BUG-021 dep-pin gotcha documented (getrandom 0.3.1 + zeroize 1.8.1). (8) command_type "cmd" alias + NAK documented. (9) Event Log Watch policy-clobber HIGH fix (PR #43). (10) MSP360 deep-link (PR #42, mig 062). (11) SPEC-021 (PR #40, mig 063). (12) sync.sh submodule-clobber root-cause fix (populate-only guard). (13) BSOD dedup key changed to bugcheck code. (14) MSI EXDEV fix. (15) 500-error-leak fix. (16) logs/analyze switched to Claude API. (17) gururmm-build skill + docs/BUILD.md. (18) Howard cleared for merges/deploys. (19) Dashboard beta-before-main rule. (20) Migration count updated to 60 (origin/main), with 061-063 on pending branches. (21) Old VM confirmed deleted (2026-06-12). (22) Open PRs #40-#46 documented with merge-order. (23) New anti-patterns added (command_type cmd, cargo +1.77 fetch, target-dir concurrency, BUG-021 dep-pin, stale snapshot build status, config-push clobber, stale submodule working tree). (24) New good patterns added (worktrees for concurrency, gururmm-build skill, beta-before-main).
 
 ## Backlinks