sync: auto-sync from GURU-5070 at 2026-06-07 10:33:04

Author: Mike Swanson
Machine: GURU-5070
Timestamp: 2026-06-07 10:33:04
This commit is contained in:
2026-06-07 10:33:08 -07:00
parent 7ba2f26fde
commit b848e34a8e
3 changed files with 106 additions and 23 deletions

View File

@@ -2,7 +2,7 @@
type: project
name: gururmm
display_name: GuruRMM
last_compiled: 2026-06-04
last_compiled: 2026-06-07
compiled_by: GURU-5070/claude-main
aliases:
- guru-rmm
@@ -50,6 +50,7 @@ sources:
- session-logs/2026-05-31-howard-gururmm-roadmap-and-features.md
- session-logs/2026-06-02-mike-bsod-detection-and-pipeline.md
- "live GuruRMM Postgres query 2026-06-04: agents/sites/update_rollouts/agent_updates tables (channel verification)"
- session-logs/2026-06-07-mike-gururmm-backup-alert-cleanup.md
backlinks:
- clients/cascades-tucson
- systems/gururmm-build
@@ -65,6 +66,8 @@ GuruRMM is a Remote Monitoring & Management platform built by Arizona Computer G
**Current version:** agent 0.6.54 (beta) / 0.6.47 (stable) / server 0.3.37 as of 2026-06-04. Fleet on stable target 0.6.47 (pinned 2026-05-28); GURU-5070 is the lone beta agent (explicit per-agent override), running 0.6.54 and auto-riding each new beta build. Note: committed changelogs are stale (stop at agent v0.6.22 / server v0.3.1) — migrations + commit log are the authoritative feature record, not changelogs.
**Backup-alert quality pass shipped 2026-06-07:** False `backup_failed` alerts reduced 15 -> 2 fleet-wide (commits `779f7f6` + `b82c010` on main). `backup_storage_low` alert type removed entirely — the `DataCopied/TotalData` ratio measures backup-dataset completeness, not destination capacity, and produced 5 fleet-wide false alerts. See Backup Integration section for full detail.
**See also:** `wiki/projects/guru-rmm.md` is a redirect tombstone pointing here (slug disambiguation: on-disk directory is `guru-rmm` hyphenated; wiki and Gitea repo use `gururmm` no-hyphen).
**Repo:** `azcomputerguru/gururmm` on Gitea (internal: http://172.16.3.20:3000). The copy at `D:\claudetools\projects\msp-tools\guru-rmm` is a git submodule tracking the active `azcomputerguru/gururmm` repo; the pinned pointer normally lags `main` (expected). Development happens in the submodule working tree and changes are committed and pushed to Gitea from there.
@@ -77,7 +80,7 @@ GuruRMM is a Remote Monitoring & Management platform built by Arizona Computer G
*Synthesized from authoritative artifacts (API routes, agent modules, 48 migrations, roadmap, commit log) at live `main` — not from session logs. See Compilation Notes.*
Agentserver communication is a persistent authenticated WebSocket with auto-reconnect + heartbeat; on reconnect, in-flight commands flip to `interrupted`. Platform-parity rule: agent features ship on Windows/Linux/macOS in the same change (stub + TODO where a real impl isn't yet feasible).
Agent<->server communication is a persistent authenticated WebSocket with auto-reconnect + heartbeat; on reconnect, in-flight commands flip to `interrupted`. Platform-parity rule: agent features ship on Windows/Linux/macOS in the same change (stub + TODO where a real impl isn't yet feasible).
### Monitoring & Telemetry
- Core metrics per interval (policy-tunable per section): CPU %, memory %/bytes, disk %/bytes, network rx/tx deltas, uptime, logged-in user, user idle time (Win `GetLastInputInfo`, Linux `xprintidle`), public/WAN IP (cached, multi-service fallback). Cross-platform via `sysinfo`.
@@ -95,7 +98,7 @@ Agent↔server communication is a persistent authenticated WebSocket with auto-r
### Inventory & Discovery
- Hardware inventory (mfr/model/serial/BIOS, CPU, memory, disks, NICs, OS), software inventory (installed apps), service inventory. On-demand refresh.
- VM / hypervisor / container detection (`032/033`): `is_virtual_machine`, `hypervisor_type`, `vm_uuid`, `is_hypervisor` + hosted VM UUIDs, `is_container`, `is_unraid`.
- User/group inventory (`037``040`): local + domain + Azure AD accounts (enabled, pw-never-expires, last-logon, is_admin, AD email/UPN/dept), domain-join classification (none/ad/aad/hybrid), domain name, M365 tenant ID, **domain-controller detection (`is_dc`)**, group membership. Policy-scheduled (default 24h).
- User/group inventory (`037`-`040`): local + domain + Azure AD accounts (enabled, pw-never-expires, last-logon, is_admin, AD email/UPN/dept), domain-join classification (none/ad/aad/hybrid), domain name, M365 tenant ID, **domain-controller detection (`is_dc`)**, group membership. Policy-scheduled (default 24h).
- Network discovery: server-dispatched scans — TCP probes over configurable ranges/ports, ARP MAC, reverse DNS, basic OS fingerprint; devices stream back and are persisted/editable.
### Patch / Agent Update Management
@@ -104,7 +107,7 @@ Agent↔server communication is a persistent authenticated WebSocket with auto-r
- Safe-rollout (`046`): `update_rollouts`/health-metrics/events tables + `/updates/rollouts` promote/rollback. **Scaffolding only — promotion is manual; health-gated automation is written-but-unwired (Phase 2).**
### Policy & Configuration Management
- Inheritance chain global client site agent; server computes merged effective policy, pushes via `ConfigUpdate`. Effective policy queryable per scope.
- Inheritance chain global -> client -> site -> agent; server computes merged effective policy, pushes via `ConfigUpdate`. Effective policy queryable per scope.
- Checks engine (`018`/`019`): cpu, memory, disk, ping, port, script, service (restart-if-stopped, pass-if-not-exist; Win `sc.exe`, Linux `systemctl`). Policy-attached check templates (`024`) with push-to-agent sync. On-demand `run-checks`.
- Remote registry (Windows, `winreg`): agent supports enumerate/read/write (typed)/create/delete. **HTTP API currently exposes read-only (enumerate, read_value); write paths exist in the agent but aren't routed yet [verify].**
@@ -113,7 +116,12 @@ Agent↔server communication is a persistent authenticated WebSocket with auto-r
- Watchdog: **separate** supervising process (polls `GuruRMMAgent` every 30s, restart backoff, alert after 3 fails) + launches/reaps the tray into active user sessions via WTS. Full alert CRUD + ack/resolve.
### Backup Integration (MSP360 / MSPBackups)
- Multi-provider config (`034`/`035`) with connection test, scheduled sync, per-agent + all-providers status, fleet coverage report, and agentMSP360 mapping (`044`) with confidence scoring + manual verification. Dashboard UI for mappings/verify shipped 2026-05-31.
- Multi-provider config (`034`/`035`) with connection test, scheduled sync, per-agent + all-providers status, fleet coverage report, and agent<->MSP360 mapping (`044`) with confidence scoring + manual verification. Dashboard UI for mappings/verify shipped 2026-05-31.
- **Alert quality pass (2026-06-07, commit `779f7f6`):** Non-backup MSP360 PlanTypes (8=Restore, 13=Consistency-check) excluded from backup alerting and compliance evaluation (FU2 guard). MSP360 message JSON decoded into readable alert text via `summarize_backup_error` (FU1). `create_or_update_alert` now refreshes `title`/`message`/`severity` on re-trigger, also fixing a latent severity-escalation freeze where re-triggered alerts kept stale severity. Fleet result: false `backup_failed` alerts 15 -> 2; survivors (AD1: retention warning + file skips, LAB-Becky: no storage account configured) are genuine and self-describing.
- **`backup_storage_low` alert type REMOVED (2026-06-07, commit `b82c010`):** `check_storage_threshold` computed `usage_percent = DataCopied / TotalData * 100`, but those MSP360 fields describe how much of the source dataset was uploaded, not the cloud destination's remaining capacity. Image/full-backup plans are naturally near 100% by design; the check produced 5 fleet-wide false alerts (3 Critical), the clearest being DF-HYPERV-B "100% Full" on a 4 GB Hyper-V plan. MSP360 Managed Backup does not expose true destination capacity in those fields. Removed `check_storage_threshold` call + function; added `resolve_all_backup_storage_alerts` (type-scoped UPDATE, idempotent, once-per-sync-tick after prune) to clear the 5 stale alerts. Fleet: 5 -> 0 `backup_storage_low` alerts verified. Genuine destination-capacity alerting deferred (would need MSP360 storage-accounts endpoint — separate feature, not scoped).
- **Backup staleness:** Handled by the existing `BACKUP_STALE` evaluator (cadence-derived window + 7-day backstop + unit tests). A plan reporting status=success can still be `non_compliant/BACKUP_STALE` if its last run is past the window (e.g. an abandoned plan on the same agent as a healthy current plan). No new code required.
- **Key functions** (`server/src/mspbackups/` + `server/src/db/`): `derive_backup_status`, `error_is_benign`, `summarize_backup_error`, `resolve_orphaned_backup_alerts`, `resolve_all_backup_storage_alerts`, `evaluate_plan` (BACKUP_STALE), `create_or_update_alert`.
- MSP360 PlanType map: 3=Files, 7=SQL, 8=Restore, 11=Image, 13=Consistency-check, 16=HyperV. Non-backup (excluded from alerting/compliance): 8, 13.
### Remote Access (Tunnel)
- Agent side substantially built (`TunnelManager` state machine; Open/Close/Data). **Server side is a dead-code skeleton — not declared in `api/mod.rs`, no `/tunnel` routes, WS handler logs "not yet implemented." Not production-ready.**
@@ -193,9 +201,9 @@ gururmm/
├── server/ Rust/Axum API server
│ └── src/
│ ├── api/ REST handlers
│ ├── db/ Database layer (sqlx); db/bsod_events.rs
│ ├── db/ Database layer (sqlx); db/bsod_events.rs, db/mspbackups.rs
│ ├── ws/ WebSocket handler (BsodEvent dispatch)
│ └── mspbackups/ MSP360 backup integration
│ └── mspbackups/ MSP360 backup integration (sync.rs: derive_backup_status, summarize_backup_error, resolve_all_backup_storage_alerts; client.rs: BackupPlan.plan_type)
├── tray/ System tray binary
├── installer/ WiX v4 MSI (gururmm-agent.wxs)
├── deploy/
@@ -210,15 +218,16 @@ gururmm/
### Current Focus
As of 2026-06-04 (agent 0.6.54 beta / 0.6.47 stable / server 0.3.37):
As of 2026-06-07 (agent 0.6.54 beta / 0.6.47 stable / server 0.3.37+):
- **BUG-020 — tray duplicate/ghost icons (fixed to beta, 2026-06-04):** Commit `137dd85` shipped to main beta. Fix #1: per-session `Local\GuruRMM_Tray` single-instance mutex in the tray binary. Fix #2: `TrayLauncher` reconciliation via `WTSEnumerateProcessesW` (idempotent across watchdog restarts). Fix #3: graceful `Global\GuruRMM_TrayShutdown_{sid}` event 3s wait `TerminateProcess` fallback (so `NIM_DELETE` fires and ghost icon is cleaned). [NOTE: Fix #3 is implemented but dormant — `terminate_all` has no caller in the agent yet. Tracked in coord todo `25fdf31a` to wire into the watchdog policy-disable/uninstall path.]
- **BUG-020 — tray duplicate/ghost icons (fixed to beta, 2026-06-04):** Commit `137dd85` shipped to main -> beta. Fix #1: per-session `Local\GuruRMM_Tray` single-instance mutex in the tray binary. Fix #2: `TrayLauncher` reconciliation via `WTSEnumerateProcessesW` (idempotent across watchdog restarts). Fix #3: graceful `Global\GuruRMM_TrayShutdown_{sid}` event -> 3s wait -> `TerminateProcess` fallback (so `NIM_DELETE` fires and ghost icon is cleaned). [NOTE: Fix #3 is implemented but dormant — `terminate_all` has no caller in the agent yet. Tracked in coord todo `25fdf31a` to wire into the watchdog policy-disable/uninstall path.]
- **BSOD detection Phase 2/3 (deferred):** Dashboard "Crashes" tab + BSOD in Alerts stream (issue #10, dashboard bullets unchecked); `fetch_bsod_dump` on-demand upload; full ~350-entry bugcheck name table (Phase 1 ships a 10-code map).
- **Linux fleet unit drift:** Auto-updater replaces the binary but does NOT refresh the systemd unit file. Pre-BUG-016-fix Linux agents have new binary + old unit (missing `StateDirectory=gururmm`). Needs an ops-script pass via `/rmm` or organic at next reinstall.
- **Tray IPC + peer authorization** — Linux tray merged (PR #13+#14). Open: Windows peer authz (#16), logind console-user resolution (#17), macOS tray (#18), subscriber broadcast (#19).
- **Auto-update reliability** — BB-SERVER and RECEPTIONIST-PC (Cascades) miss dispatch windows due to flaky WebSockets. Re-querying pending updates on reconnect: incomplete as of 2026-05-24.
- **Watchdog alerts UI** — backend complete but `PUT /watchdog-alerts/:id/resolve` and `DELETE /watchdog-alerts/:id` routes missing on server (found in 2026-05-23 audit).
- **MSP360 backup integration** — Phase 1 complete (monitoring, alerts, mapping, storage thresholds; dashboard UI shipped 2026-05-31). Phase 2 (management) not started.
- **MSP360 backup integration** — Alert quality pass shipped 2026-06-07 (commits `779f7f6` + `b82c010`): false `backup_failed` alerts 15 -> 2; `backup_storage_low` removed (structurally false signal; 5 -> 0 false alerts). Dashboard UI shipped 2026-05-31. Phase 2 (management) not started; genuine destination-capacity alerting deferred (needs MSP360 storage-accounts endpoint).
- **`/backup-status` endpoint shape gap:** Returns only one plan per agent (not all plans); makes agents with a dead old plan + healthy current plan look stale-but-green in the backup tab. Compliance domain evaluates all plans correctly. Not fixed this session — noted for future.
- **Security audit backlog:** `credentials/:id/reveal` horizontal privilege escalation (HIGH), `internal_err()` raw DB errors at ~130 call sites (HIGH).
### Patterns & Anti-Patterns
@@ -298,7 +307,7 @@ Gitea push to main
| beta | https://rmm-beta.azcomputerguru.com | `/var/www/gururmm/dashboard-beta` | auto on push — `build-dashboard.sh` (now dispatched by the webhook alongside agent/server builds, change-gated on `last-built-commit-dashboard`) |
| prod | https://rmm.azcomputerguru.com | `/var/www/gururmm/dashboard` | explicit only — `sudo /opt/gururmm/promote-dashboard.sh --confirm` (backs up prod; `--rollback` restores) |
**Do NOT hand-rsync into the prod web root** (the old `npm run build && rsync ... dashboard/` is superseded). One artifact serves both channels — the Vite build bakes in the absolute prod API URL (`rmm-api.azcomputerguru.com`), so beta uses shared prod data and is byte-identical to prod; beta is branded by an nginx-layer `sub_filter` BETA banner (`deploy/nginx/rmm-beta.conf`), so promotion is a plain rsync. **Serving/TLS:** second nginx vhost on `.30` (`server_name rmm-beta`, specific name beats prod `_`), Cloudflare `rmm-beta` A`72.194.62.10` proxied (mirrors `rmm`), Jupiter NPM proxy host **id=11** `.30:80` presenting cert **id=10** (zone SSL mode is Full; if ever Full-Strict, beta needs its own SAN/cert).
**Do NOT hand-rsync into the prod web root** (the old `npm run build && rsync ... dashboard/` is superseded). One artifact serves both channels — the Vite build bakes in the absolute prod API URL (`rmm-api.azcomputerguru.com`), so beta uses shared prod data and is byte-identical to prod; beta is branded by an nginx-layer `sub_filter` BETA banner (`deploy/nginx/rmm-beta.conf`), so promotion is a plain rsync. **Serving/TLS:** second nginx vhost on `.30` (`server_name rmm-beta`, specific name beats prod `_`), Cloudflare `rmm-beta` A->``72.194.62.10` proxied (mirrors `rmm`), Jupiter NPM proxy host **id=11** -> `.30:80` presenting cert **id=10** (zone SSL mode is Full; if ever Full-Strict, beta needs its own SAN/cert).
**DB migrations** — manual; must insert SHA-384 checksum into `_sqlx_migrations` or server crashes on start.
@@ -318,7 +327,7 @@ Gitea push to main
## Active State
**Fleet (as of 2026-06-04, live Postgres verified):**
**Fleet (as of 2026-06-04, live Postgres verified; no enrollment changes in 2026-06-07 session):**
- 55 enrolled agents total
- Stable channel: pinned at 0.6.47 windows/amd64 (promoted 2026-05-28); 0.6.46 linux. All 39 sites and 118 agents are on stable (channel NULL = stable default).
- Beta channel: **GURU-5070 only** — per-agent `update_channel = 'beta'` override (site "Mike's Car" / `103c10b9-c1de-4dd8-b382-b8362ed3143e` has `update_channel = NULL`, so stable is the site default; GURU-5070 is the explicit per-agent exception). Beta has no `update_rollouts` pin — server dispatches the newest signed beta artifact straight from the build pipeline.
@@ -342,14 +351,14 @@ Gitea push to main
| Swanson, Len | Residential | Home | LAS-GAMER |
**API auth:**
- `POST /api/auth/login` JWT (~24h)
- Creds: vault `infrastructure/gururmm-server.sops.yaml` `credentials.gururmm-api.admin-email` / `admin-password`
- `POST /api/auth/login` -> JWT (~24h)
- Creds: vault `infrastructure/gururmm-server.sops.yaml` -> `credentials.gururmm-api.admin-email` / `admin-password`
- Key endpoints: `GET /api/agents`, `POST /api/agents/:id/command`, `GET /api/commands/:id`, `POST /api/agents/:id/update`
- Command fields: `command_type` (`shell`/`powershell`/`python`/`script`/`claude_task`), `command` (script text, JSON-encoded), optional `context`**`system`** (default; Session 0 / SYSTEM) or **`user_session`** (runs in the logged-on user's desktop session via WTS token impersonation; Windows-only, needs an active session), plus `timeout_seconds`/`elevated`. The agent does NOT run everything as LocalSystem — `user_session` is the per-user path (migration `041_add_command_context`, `agent/src/watchdog/wts.rs`).
- Response: `stdout`, `stderr`, `exit_code`, `status` (running/completed/failed/timeout/interrupted)
**Dashboard — complete and working:**
Agents management, Clients/Sites CRUD, Commands execution + terminal, Logs + AI analysis, Alerts, Metrics (CPU/RAM/disk/network, process drill-down modal), Auto-update triggering, Network state, Entra ID SSO (Entra only — Google planned per SPEC-008, not implemented), Policies Dashboard (all tabs), Registry editor, MSP360 backup status card + agentbackup mappings/verify UI, Organizations management + dev-admin impersonation UI.
Agents management, Clients/Sites CRUD, Commands execution + terminal, Logs + AI analysis, Alerts, Metrics (CPU/RAM/disk/network, process drill-down modal), Auto-update triggering, Network state, Entra ID SSO (Entra only — Google planned per SPEC-008, not implemented), Policies Dashboard (all tabs), Registry editor, MSP360 backup status card + agent<->backup mappings/verify UI, Organizations management + dev-admin impersonation UI.
**Dashboard — incomplete (see UI_GAPS.md):**
- Enrollment management UI (revoke keys, audit log, duplicate hostname warnings)
@@ -367,9 +376,9 @@ Agents management, Clients/Sites CRUD, Commands execution + terminal, Logs + AI
- #19 — subscriber broadcast
**BUG-020 — tray duplicate/ghost icons (fixed to beta 2026-06-04; dormant follow-up open):**
- Symptom: duplicate AND ghost `gururmm-tray.exe` tray icons. Live evidence: 5 stacked tray processes in Session 1 on GURU-5070 (one per watchdog restart over 6/16/2).
- Root cause: `TrayLauncher` (`agent/src/watchdog/wts.rs`) tracked launches only in an in-memory `HashMap<sid,HANDLE>` that resets on watchdog restart (esp. agent auto-update), so it relaunched trays into sessions that already had one; no single-instance guard in the tray; `terminate_all` hard-killed via `TerminateProcess` skipping the tray's `Drop` `NIM_DELETE` (ghost).
- Fix (commit `137dd85`, gururmm@main beta): (1) per-session `Local\GuruRMM_Tray` single-instance mutex; (2) launcher reconciliation via `WTSEnumerateProcessesW` (idempotent); (3) graceful `Global\GuruRMM_TrayShutdown_{sid}` event 3s wait `TerminateProcess` fallback.
- Symptom: duplicate AND ghost `gururmm-tray.exe` tray icons. Live evidence: 5 stacked tray processes in Session 1 on GURU-5070 (one per watchdog restart over 6/1-6/2).
- Root cause: `TrayLauncher` (`agent/src/watchdog/wts.rs`) tracked launches only in an in-memory `HashMap<sid,HANDLE>` that resets on watchdog restart (esp. agent auto-update), so it relaunched trays into sessions that already had one; no single-instance guard in the tray; `terminate_all` hard-killed via `TerminateProcess` skipping the tray's `Drop` -> `NIM_DELETE` (ghost).
- Fix (commit `137dd85`, gururmm@main -> beta): (1) per-session `Local\GuruRMM_Tray` single-instance mutex; (2) launcher reconciliation via `WTSEnumerateProcessesW` (idempotent); (3) graceful `Global\GuruRMM_TrayShutdown_{sid}` event -> 3s wait -> `TerminateProcess` fallback.
- Verified: independent Grok review + Code Review Agent APPROVE.
- Follow-up (coord todo `25fdf31a`): wire `terminate_all` graceful-shutdown into the watchdog policy-disable/uninstall path so fix #3 becomes active.
@@ -384,13 +393,13 @@ Agents management, Clients/Sites CRUD, Commands execution + terminal, Logs + AI
These decisions are locked. Do not reverse without explicit user approval.
1. **Per-agent enrollment keys** — MSI contains server URL + site_id only. Agent calls `POST /api/enroll` on first run; server issues unique per-agent key stored hashed. Enables revocation, clone detection, audit trail.
2. **Site-specific MSI generation** — Universal base MSI from CI; dashboard endpoint generates site-specific MSI with site_id baked in via WiX property `HKLM\SOFTWARE\GuruRMM\SiteId`.
2. **Site-specific MSI generation** — Universal base MSI from CI; dashboard endpoint generates site-specific MSI with site_id baked in via WiX property -> `HKLM\SOFTWARE\GuruRMM\SiteId`.
3. **No TOML/config for endpoints** — Server URL compiled into binary. No runtime config files for server URL or site_id.
4. **Policy inheritance chain** — global site client agent. Server computes merged effective policy and pushes via `ConfigUpdate` WebSocket message.
4. **Policy inheritance chain** — global -> site -> client -> agent. Server computes merged effective policy and pushes via `ConfigUpdate` WebSocket message.
5. **Platform parity rule** — Any agent feature ships on Windows, Linux, and macOS in the same change. Stub + TODO required if a real implementation is not yet feasible.
6. **Watchdog as separate process** — Main agent cannot reliably restart itself after a crash.
7. **Build pipeline is the only path to production** — Enforces signing, checksum generation, consistent artifact layout.
8. **Multi-tenancy identity model (ADR-001)** — Dev team with partner impersonation. Three levels: Dev Partner Client. Computer Guru is partner #1.
8. **Multi-tenancy identity model (ADR-001)** — Dev team with partner impersonation. Three levels: Dev -> Partner -> Client. Computer Guru is partner #1.
9. **Holistic feature development (DESIGN.md)** — Every feature requires backend + API + dashboard UI + documentation. Backend-only features are rejected.
10. **AI-optional operation** — GuruRMM must be fully functional without AI. AI features are enhancements, not requirements.
@@ -416,6 +425,7 @@ These decisions are locked. Do not reverse without explicit user approval.
| 2026-06-01 | BUG-016 (Linux systemd missing StateDirectory=gururmm) + BUG-017 (device_id OnceLock cache) fixed (commit 30da053). GURU-KALI had 11 ghost agent rows from repeated UUID churn — fixed and verified. BSOD forensics: GURU-5070 bluescreened with `0x116 VIDEO_TDR_FAILURE` (nvlddmkm.sys, NVIDIA driver 32.0.15.9201 on RTX 5070 Ti Laptop GPU); GuruConnect cleared on three grounds; root cause one-off driver TDR. BSOD detection feature (issue #10 Phase 1) implemented: bsod.rs + migration 048 + ws/mod.rs handler; code review caught and fixed SF-1 (watermark before send) + SF-2 (non-atomic watermark write); merged to main (0ec55cf), agent versioned 0.6.51. |
| 2026-06-02 | Server 0.3.37 + migration 048 deployed. Build channel default-beta fix applied to build-windows.sh + build-linux.sh (macOS already correct). Webhook wired to dispatch build-server.sh with change-gate (last-built-commit-server) + backup/rollback. Fleet converged to 0.6.51. GURU-KALI BUG-016 unit file refreshed, override removed, verified clean. [NOTE: the session log recorded "GURU-5070 promoted to stable" — contradicted by live DB; see 2026-06-04 entry.] |
| 2026-06-04 | Channel correction confirmed via live Postgres query: GURU-5070 `agents.update_channel = 'beta'` (explicit per-agent override). Site "Mike's Car" and all 39 sites are `update_channel = NULL` (stable default); GURU-5070 is the only beta agent in the 119-agent fleet. Stable channel pinned at 0.6.47 windows/amd64 + 0.6.46 linux via `update_rollouts` (promoted 2026-05-28); beta channel has 0 `update_rollouts` rows (server dispatches newest signed beta artifact directly). GURU-5070 running 0.6.54. BUG-020 (duplicate/ghost tray icons) fixed in commit `137dd85` to beta: per-session single-instance mutex + `WTSEnumerateProcessesW` reconciliation + graceful shutdown event (fix #3 dormant pending `terminate_all` wiring — coord todo `25fdf31a`). Verified by Grok + Code Review Agent. |
| 2026-06-07 | Backup-alert quality pass shipped. FU1 (`summarize_backup_error` decodes MSP360 message JSON; `create_or_update_alert` now refreshes title/message/severity on re-trigger, also fixes latent severity-escalation freeze) + FU2 (exclude non-backup PlanTypes 8=Restore/13=Consistency-check from alerting/compliance): false `backup_failed` alerts 15 -> 2 fleet-wide (survivors AD1, LAB-Becky are genuine and self-describing), commit `779f7f6`. `backup_storage_low` alert type removed entirely (commit `b82c010`): `DataCopied/TotalData` measures backup-dataset completeness, not destination capacity — produced 5 fleet-wide false alerts including DF-HYPERV-B "100% Full" on a 4 GB plan; `resolve_all_backup_storage_alerts` (type-scoped, idempotent, once-per-tick) clears stragglers; 5 -> 0 verified after 17:21:41 UTC restart. Genuine destination-capacity alerting deferred (needs MSP360 storage-accounts endpoint). `BACKUP_STALE` evaluator confirmed already correct — no new code. Both commits on main. Submodule pinned at `226ba9f` in parent. |
---
@@ -426,8 +436,9 @@ These decisions are locked. Do not reverse without explicit user approval.
- macOS build status: Phase 1 was deployed manually from Mikes-MacBook-Air (2026-05-12). `build-mac.sh` is a stub as of 2026-05-24 — unclear if automated pipeline includes macOS yet. [unverified]
- Pre-commit hook on 172.16.3.30 lacks execute bit (noted 2026-05-23) — likely still unfixed. [unverified]
- Auto-update reliability fix for BB-SERVER and RECEPTIONIST-PC was incomplete at 2026-05-24 save. [unverified]
- **2026-06-02 recompile:** Folded in BSOD detection feature (Phase 1 shipped — agent/src/bsod.rs, migration 048, ws handler, always-Critical alerts, verified against real 0x116 dump); server build now wired into webhook (change-gated + rollback); build channel default changed to beta (stable is explicit promote); versions updated to agent 0.6.51 / server 0.3.37; fleet converged. Corrected submodule framing (tracks active repo, develop here + push to Gitea — not "stale, do not develop"). Added build-server.sh change-gate marker and server build log to Key Files. Added server's root RMM agent as a good pattern. Updated Current Focus with BSOD Phase 2/3 and Linux fleet unit drift. Added four new anti-patterns (minidump crate, default-stable builds, webhook agent-only gap, auto-update race). Migration count updated 46 48.
- **2026-06-02 recompile:** Folded in BSOD detection feature (Phase 1 shipped — agent/src/bsod.rs, migration 048, ws handler, always-Critical alerts, verified against real 0x116 dump); server build now wired into webhook (change-gated + rollback); build channel default changed to beta (stable is explicit promote); versions updated to agent 0.6.51 / server 0.3.37; fleet converged. Corrected submodule framing (tracks active repo, develop here + push to Gitea — not "stale, do not develop"). Added build-server.sh change-gate marker and server build log to Key Files. Added server's root RMM agent as a good pattern. Updated Current Focus with BSOD Phase 2/3 and Linux fleet unit drift. Added four new anti-patterns (minidump crate, default-stable builds, webhook agent-only gap, auto-update race). Migration count updated 46 -> 48.
- **2026-06-04 recompile:** Corrected GURU-5070 channel state — live Postgres confirms `update_channel = 'beta'` per-agent (not stable as the 2026-06-02 session log implied). Stable fleet pinned at 0.6.47 (not 0.6.51). GURU-5070 on 0.6.54 beta. Beta channel has no `update_rollouts` pin. Added BUG-020 (tray duplicate/ghost icons) — symptom, root cause, fix commit `137dd85`, dormant follow-up for fix #3 wiring. Updated Summary, Components table, Active State, Current Focus, History, Good Patterns, and Compilation Notes. Added sources entry for live Postgres query + commit 137dd85. Added `aliases: [guru-rmm]` frontmatter to cross-reference the tombstone at `wiki/projects/guru-rmm.md`.
- **2026-06-07 recompile:** Folded in backup-alert quality pass (commits `779f7f6` + `b82c010`, both on main). Updated Backup Integration capability section: added FU1/FU2 alert quality pass detail (false backup_failed 15->2; summarize_backup_error; create_or_update_alert refresh); documented backup_storage_low removal (structurally false DataCopied/TotalData signal; 5->0 false alerts; resolve_all_backup_storage_alerts); confirmed BACKUP_STALE evaluator correct (no new code); added key functions list and MSP360 PlanType exclusion map. Updated Repo Structure to include db/mspbackups.rs and mspbackups/ key functions. Updated Current Focus MSP360 line and added /backup-status endpoint shape gap. Updated Summary date and added backup-alert quality pass note. Active State date note updated. Added 2026-06-07 History row. Patterns and History existing rows preserved verbatim.
## Backlinks