sync: auto-sync from GURU-5070 at 2026-06-07 16:47:01
Author: Mike Swanson Machine: GURU-5070 Timestamp: 2026-06-07 16:47:01
This commit is contained in:
@@ -55,7 +55,7 @@ Run `/wiki-lint` to check for stale entries and broken backlinks.
|
||||
|
||||
| Article | Summary | Last Compiled |
|
||||
|---|---|---|
|
||||
| [GuruRMM](projects/gururmm.md) | RMM platform, Rust/Axum server + React dashboard + cross-platform agent; stable fleet pinned v0.6.47; lone beta agent GURU-5070 on v0.6.54 (per-agent channel override); server v0.3.45; 55 enrolled agents; backup-alert quality pass shipped + credential inheritance deployed + offboarding wizard spec complete; clickable alert badges with client filtering; tray BUG-020 (duplicate/ghost icons) fixed to beta (commit 137dd85); active development | 2026-06-07 |
|
||||
| [GuruRMM](projects/gururmm.md) | RMM platform, Rust/Axum server + React dashboard + cross-platform agent; stable fleet pinned v0.6.47; lone beta agent GURU-5070 on v0.6.54 (per-agent channel override); server v0.3.45; 55 enrolled agents; backup-alert quality pass shipped (false backup_failed 15->2; backup_storage_low removed); credential inheritance deployed (hierarchical Global->Client->Site with is_inheritable + de-dup, /effective endpoints) + clickable alert badges with client filtering; SPEC-028 offboarding wizard spec complete (835 lines); role-aware offline alerting + correlated mass-offline detection shipped (agent_offline/mass_offline_site/mass_offline_fleet; migration 054; dashboard triage); alert ignore/mute (perma-silence, migration 055, muted status) shipped server-only (dashboard pending); tray BUG-020 (duplicate/ghost icons) fixed to beta (commit 137dd85); active development | 2026-06-07 |
|
||||
| [Dataforth DOS — Test Datasheet Pipeline](projects/dataforth-dos.md) | DOS update system + TestDataDB pipeline (Node.js, PostgreSQL, Hoffman API); 469K records, 458.5K live on website; 2025 crypto attack recovery; security incident 2026-03-27; SCMVAS/SCMHVAS extension; email notifications via Graph API | 2026-05-24 |
|
||||
| [ClaudeTools Discord Bot](projects/discord-bot.md) | Claude Agent SDK bot in Discord; one persistent session per thread; Phase 1.5 complete (native tools, no hand-written tools); Phases 2-4 (API integration, remediation, UX) pending; runs as NSSM service on BEAST | 2026-05-24 |
|
||||
| [The Computer Guru Show](projects/radio-show.md) | Radio show archive processing pipeline (Whisper + pyannote + SQLite FTS5) + post-show content workflow; 572 episodes indexed; FastAPI UI redesigned; Jupiter audio-file gap open | 2026-05-24 |
|
||||
|
||||
@@ -9,7 +9,7 @@ aliases:
|
||||
sources:
|
||||
- "gururmm@main: server/src/api/*.rs (REST API surface, ~30 route modules)"
|
||||
- "gururmm@main: agent/src/ (agent capabilities; transport/CommandContext, ohw.rs, watchdog/wts.rs, bsod.rs)"
|
||||
- "gururmm@main: server/migrations/*.sql (48 migrations — feature checkpoints, incl. 048_bsod_events)"
|
||||
- "gururmm@main: server/migrations/*.sql (55+ migrations — feature checkpoints, incl. 048_bsod_events, 054_agent_role_override, 055_alert_mutes)"
|
||||
- "gururmm@main: docs/FEATURE_ROADMAP.md, docs/specs/"
|
||||
- "gururmm@main: git log feat/perf history (changelogs incomplete past v0.6.22)"
|
||||
- "gururmm@main: server/migrations/048_bsod_events.sql"
|
||||
@@ -52,6 +52,7 @@ sources:
|
||||
- session-logs/2026-06-07-mike-gururmm-offboarding-spec.md
|
||||
- "live GuruRMM Postgres query 2026-06-04: agents/sites/update_rollouts/agent_updates tables (channel verification)"
|
||||
- session-logs/2026-06-07-mike-gururmm-backup-alert-cleanup.md
|
||||
- session-logs/2026-06-07-mike-gururmm-offline-alerting-mute.md
|
||||
backlinks:
|
||||
- clients/cascades-tucson
|
||||
- systems/gururmm-build
|
||||
@@ -69,6 +70,8 @@ GuruRMM is a Remote Monitoring & Management platform built by Arizona Computer G
|
||||
|
||||
**Backup-alert quality pass shipped 2026-06-07:** False `backup_failed` alerts reduced 15 -> 2 fleet-wide (commits `779f7f6` + `b82c010` on main). `backup_storage_low` alert type removed entirely — the `DataCopied/TotalData` ratio measures backup-dataset completeness, not destination capacity, and produced 5 fleet-wide false alerts. See Backup Integration section for full detail.
|
||||
|
||||
**Role-aware offline alerting + alert ignore/mute shipped 2026-06-07 (second session):** Scheduled offline-sweep evaluator (60s tokio interval, server-only). `agent_offline` alerts for servers only; classifier: `os_product_type` 2/3, else `os_name`/`os_version` ~/server/i, else manual `role_override` (migration 054). Site rule (>=50% + >=3) -> `mass_offline_site`; fleet rule (>=10) -> `mass_offline_fleet`; aggregates pinned to a representative offline agent. Warm-up restart guard. Dashboard: servers elevated in triage individually; workstations collapsed. Alert mute (perma-silence, migration 055 `alert_mutes` + `muted` status): keyed on `dedup_key`, permanent until un-ignored, reason required; gates both `create_or_update_alert` and `create_check_alert` bypass. Dashboard Ignore/Muted/Un-ignore UI NOT yet built. Commits f1cdf5d/30e4f23/21d63bd/3eedf91 (offline), 29c405e/a120e71 (mute). Known gap: `os_product_type` populated on only ~16/168 agents; `os_name` is the workhorse classifier.
|
||||
|
||||
**See also:** `wiki/projects/guru-rmm.md` is a redirect tombstone pointing here (slug disambiguation: on-disk directory is `guru-rmm` hyphenated; wiki and Gitea repo use `gururmm` no-hyphen).
|
||||
|
||||
**Repo:** `azcomputerguru/gururmm` on Gitea (internal: http://172.16.3.20:3000). The copy at `D:\claudetools\projects\msp-tools\guru-rmm` is a git submodule tracking the active `azcomputerguru/gururmm` repo; the pinned pointer normally lags `main` (expected). Development happens in the submodule working tree and changes are committed and pushed to Gitea from there.
|
||||
@@ -147,6 +150,8 @@ Agent<->server communication is a persistent authenticated WebSocket with auto-r
|
||||
### Alerting & Watchdog
|
||||
- Threshold alerts (ack/resolve, per-agent + fleet summary, dashboard filter). Alert templates (`022`) with effective resolution; per-client email settings (`020`). Maintenance mode (`021`) to suppress alerting per scope.
|
||||
- Watchdog: **separate** supervising process (polls `GuruRMMAgent` every 30s, restart backoff, alert after 3 fails) + launches/reaps the tray into active user sessions via WTS. Full alert CRUD + ack/resolve.
|
||||
- **Role-aware offline alerting (shipped 2026-06-07):** Scheduled offline-sweep evaluator (60s tokio interval; `server/src/alerts/offline.rs`). Generates server-only `agent_offline` alerts based on `agent_role` classifier: `os_product_type` IN {2,3} -> server; else `os_name`/`os_version` ~/server/i -> server; else manual `role_override` (migration 054, `agents.role_override`). Site rule (>=50% of a site's servers offline AND >=3 absolute) -> `mass_offline_site` aggregate; fleet rule (>=10 servers offline) -> `mass_offline_fleet` aggregate; aggregates pinned to a representative offline agent with site/fleet `dedup_key` (avoids making `alerts.agent_id` nullable). Restart guard = warm-up window after boot only (NOT `last_seen < started_at`, which is permanently false in steady state -- code review caught and fixed this spec defect). Dashboard: offline servers individual + elevated in triage; offline workstations collapsed into a "N workstations offline" roll-up; role-override control on agent detail page. `PUT /api/agents/:id/role-override`. Known gap: `os_product_type` populated on only ~16/168 agents; `os_name` is the workhorse classifier; inventory-less offline servers (e.g., SIF-SERVER, Server2013) auto-classify as workstations until manually overridden.
|
||||
- **Alert ignore/mute -- perma-silence (server only, shipped 2026-06-07; dashboard pending):** `alert_mutes` table (migration 055) + `muted` alert status. Mute keyed on `dedup_key` (universal recurring-condition id, always set). Permanent until un-ignored; reason required (400 if missing). Gate inserted at the top of `create_or_update_alert` AND the `create_check_alert` bypass so muted conditions write `status='muted'` and never email -- the active alert path is byte-for-byte unchanged. Transactional `mute_condition`/`unmute_condition`. `POST /api/alerts/:id/mute` + `/unmute`. Distinct from ack/resolve (which quiet only the current cycle). Dashboard Ignore button, Muted filter, and Un-ignore NOT yet built. New alert types: `agent_offline`, `mass_offline_site`, `mass_offline_fleet`. New status: `muted`. Key functions: `is_dedup_muted`, `mute_condition`, `unmute_condition` (db/alerts.rs); `offline_sweep`, `agent_role` (alerts/offline.rs).
|
||||
|
||||
### Credentials Management
|
||||
- Encrypted credentials vault (`016`): scoped global/client/site, typed (password, SSH key, SNMP), metadata-only by default with separate `/reveal` decrypt endpoint (known HIGH item: `/reveal` ownership-scope check — [verify current state]).
|
||||
@@ -236,10 +241,12 @@ gururmm/
|
||||
│ └── main.rs systemd unit template generation
|
||||
├── server/ Rust/Axum API server
|
||||
│ └── src/
|
||||
│ ├── api/ REST handlers
|
||||
│ ├── db/ Database layer (sqlx); db/bsod_events.rs, db/mspbackups.rs
|
||||
│ ├── api/ REST handlers (alerts.rs: mute/unmute; agents.rs: role-override)
|
||||
│ ├── alerts/ Alerting modules (offline.rs: offline_sweep, mass_offline detection, agent_role classifier, warm-up restart guard)
|
||||
│ ├── db/ Database layer (sqlx); db/bsod_events.rs, db/mspbackups.rs, db/alerts.rs (is_dedup_muted, mute_condition, unmute_condition, AlertStatus::Muted)
|
||||
│ ├── ws/ WebSocket handler (BsodEvent dispatch)
|
||||
│ └── mspbackups/ MSP360 backup integration (sync.rs: derive_backup_status, summarize_backup_error, resolve_all_backup_storage_alerts; client.rs: BackupPlan.plan_type)
|
||||
├── dashboard/ React/TypeScript UI (lib/agentRole.ts: server classifier; components/ExceptionStream.tsx: offline triage with workstation roll-up; pages/AgentDetail.tsx: role-override control)
|
||||
├── tray/ System tray binary
|
||||
├── installer/ WiX v4 MSI (gururmm-agent.wxs)
|
||||
├── deploy/
|
||||
@@ -254,23 +261,20 @@ gururmm/
|
||||
|
||||
### Current Focus
|
||||
|
||||
<<<<<<< HEAD
|
||||
As of 2026-06-07 (agent 0.6.54 beta / 0.6.47 stable / server 0.3.37+):
|
||||
|
||||
- **BUG-020 — tray duplicate/ghost icons (fixed to beta, 2026-06-04):** Commit `137dd85` shipped to main -> beta. Fix #1: per-session `Local\GuruRMM_Tray` single-instance mutex in the tray binary. Fix #2: `TrayLauncher` reconciliation via `WTSEnumerateProcessesW` (idempotent across watchdog restarts). Fix #3: graceful `Global\GuruRMM_TrayShutdown_{sid}` event -> 3s wait -> `TerminateProcess` fallback (so `NIM_DELETE` fires and ghost icon is cleaned). [NOTE: Fix #3 is implemented but dormant — `terminate_all` has no caller in the agent yet. Tracked in coord todo `25fdf31a` to wire into the watchdog policy-disable/uninstall path.]
|
||||
=======
|
||||
As of 2026-06-07 (agent 0.6.54 beta / 0.6.47 stable / server 0.3.45):
|
||||
|
||||
- **Credential inheritance (deployed 2026-06-07):** Production server running v0.3.45 with full credential inheritance and de-duplication. `/effective` endpoints validated. Dashboard clickable alert badges and client-scoped filtering implemented.
|
||||
- **SPEC-028 offboarding wizard (specification complete):** 835-line spec created for site and client offboarding workflows. Includes data export, dependency analysis, typed confirmation, and audit logging. Roadmap updated with "Client & Site Lifecycle Management" section. Implementation pending.
|
||||
- **BUG-020 — tray duplicate/ghost icons (fixed to beta, 2026-06-04):** Commit `137dd85` shipped to main → beta. Fix #1: per-session `Local\GuruRMM_Tray` single-instance mutex in the tray binary. Fix #2: `TrayLauncher` reconciliation via `WTSEnumerateProcessesW` (idempotent across watchdog restarts). Fix #3: graceful `Global\GuruRMM_TrayShutdown_{sid}` event → 3s wait → `TerminateProcess` fallback (so `NIM_DELETE` fires and ghost icon is cleaned). [NOTE: Fix #3 is implemented but dormant — `terminate_all` has no caller in the agent yet. Tracked in coord todo `25fdf31a` to wire into the watchdog policy-disable/uninstall path.]
|
||||
>>>>>>> 5869da2 (sync: auto-sync from Mikes-MacBook-Air.local at 2026-06-07 12:59:13)
|
||||
- **BUG-020 — tray duplicate/ghost icons (fixed to beta, 2026-06-04):** Commit `137dd85` shipped to main -> beta. Fix #1: per-session `Local\GuruRMM_Tray` single-instance mutex in the tray binary. Fix #2: `TrayLauncher` reconciliation via `WTSEnumerateProcessesW` (idempotent across watchdog restarts). Fix #3: graceful `Global\GuruRMM_TrayShutdown_{sid}` event -> 3s wait -> `TerminateProcess` fallback (so `NIM_DELETE` fires and ghost icon is cleaned). [NOTE: Fix #3 is implemented but dormant — `terminate_all` has no caller in the agent yet. Tracked in coord todo `25fdf31a` to wire into the watchdog policy-disable/uninstall path.]
|
||||
- **BSOD detection Phase 2/3 (deferred):** Dashboard "Crashes" tab + BSOD in Alerts stream (issue #10, dashboard bullets unchecked); `fetch_bsod_dump` on-demand upload; full ~350-entry bugcheck name table (Phase 1 ships a 10-code map).
|
||||
- **Linux fleet unit drift:** Auto-updater replaces the binary but does NOT refresh the systemd unit file. Pre-BUG-016-fix Linux agents have new binary + old unit (missing `StateDirectory=gururmm`). Needs an ops-script pass via `/rmm` or organic at next reinstall.
|
||||
- **Tray IPC + peer authorization** — Linux tray merged (PR #13+#14). Open: Windows peer authz (#16), logind console-user resolution (#17), macOS tray (#18), subscriber broadcast (#19).
|
||||
- **Auto-update reliability** — BB-SERVER and RECEPTIONIST-PC (Cascades) miss dispatch windows due to flaky WebSockets. Re-querying pending updates on reconnect: incomplete as of 2026-05-24.
|
||||
- **Watchdog alerts UI** — backend complete but `PUT /watchdog-alerts/:id/resolve` and `DELETE /watchdog-alerts/:id` routes missing on server (found in 2026-05-23 audit).
|
||||
- **MSP360 backup integration** — Alert quality pass shipped 2026-06-07 (commits `779f7f6` + `b82c010`): false `backup_failed` alerts 15 -> 2; `backup_storage_low` removed (structurally false signal; 5 -> 0 false alerts). Dashboard UI shipped 2026-05-31. Phase 2 (management) not started; genuine destination-capacity alerting deferred (needs MSP360 storage-accounts endpoint).
|
||||
- **MSP360 API scope (confirmed 2026-06-07):** Provider API token (vault `msp-tools/msp360-api.sops.yaml`) is monitoring-only at our tier: Companies/Users/Monitoring GET endpoints; every management path (plan delete, manual run, storage config) returns 404. White-labeled MSP360 agent has no `cbb.exe` at any standard path. Plan delete/run/storage remain MSP360-console tasks.
|
||||
- **Alert mute dashboard (Task 5) -- NOT started:** Ignore button + required-reason prompt on alert row; Muted filter on alerts page + agent Alerts tab; Un-ignore control; muted rows show Un-ignore instead of Ack/Resolve. Server endpoints ready: `POST /api/alerts/:id/mute` + `/unmute`.
|
||||
- **Offline alerting -- classification gap:** SIF-SERVER (SifOidak.local DC), Server2013 (Sombra), and other inventory-less offline servers currently auto-classify as workstations (no `os_product_type`; `os_name` doesn't match ~/server/i for those names). Needs manual `role_override` via `PUT /api/agents/:id/role-override` -- Mike held on tagging them.
|
||||
- **`/backup-status` endpoint shape gap:** Returns only one plan per agent (not all plans); makes agents with a dead old plan + healthy current plan look stale-but-green in the backup tab. Compliance domain evaluates all plans correctly. Not fixed this session — noted for future.
|
||||
- **Security audit backlog:** `credentials/:id/reveal` horizontal privilege escalation (HIGH), `internal_err()` raw DB errors at ~130 call sites (HIGH).
|
||||
|
||||
@@ -371,11 +375,7 @@ Gitea push to main
|
||||
|
||||
## Active State
|
||||
|
||||
<<<<<<< HEAD
|
||||
**Fleet (as of 2026-06-04, live Postgres verified; no enrollment changes in 2026-06-07 session):**
|
||||
=======
|
||||
**Fleet (as of 2026-06-07):**
|
||||
>>>>>>> 5869da2 (sync: auto-sync from Mikes-MacBook-Air.local at 2026-06-07 12:59:13)
|
||||
- 55 enrolled agents total
|
||||
- Stable channel: pinned at 0.6.47 windows/amd64 (promoted 2026-05-28); 0.6.46 linux. All 39 sites and 118 agents are on stable (channel NULL = stable default).
|
||||
- Beta channel: **GURU-5070 only** — per-agent `update_channel = 'beta'` override (site "Mike's Car" / `103c10b9-c1de-4dd8-b382-b8362ed3143e` has `update_channel = NULL`, so stable is the site default; GURU-5070 is the explicit per-agent exception). Beta has no `update_rollouts` pin — server dispatches the newest signed beta artifact straight from the build pipeline.
|
||||
@@ -406,11 +406,7 @@ Gitea push to main
|
||||
- Response: `stdout`, `stderr`, `exit_code`, `status` (running/completed/failed/timeout/interrupted)
|
||||
|
||||
**Dashboard — complete and working:**
|
||||
<<<<<<< HEAD
|
||||
Agents management, Clients/Sites CRUD, Commands execution + terminal, Logs + AI analysis, Alerts, Metrics (CPU/RAM/disk/network, process drill-down modal), Auto-update triggering, Network state, Entra ID SSO (Entra only — Google planned per SPEC-008, not implemented), Policies Dashboard (all tabs), Registry editor, MSP360 backup status card + agent<->backup mappings/verify UI, Organizations management + dev-admin impersonation UI.
|
||||
=======
|
||||
Agents management, Clients/Sites CRUD, Commands execution + terminal, Logs + AI analysis, Alerts (with clickable severity badges + client filtering), Metrics (CPU/RAM/disk/network, process drill-down modal), Auto-update triggering, Network state, Entra ID SSO (Entra only — Google planned per SPEC-008, not implemented), Policies Dashboard (all tabs), Registry editor, MSP360 backup status card + agent↔backup mappings/verify UI, Organizations management + dev-admin impersonation UI, Credentials management with inheritance support.
|
||||
>>>>>>> 5869da2 (sync: auto-sync from Mikes-MacBook-Air.local at 2026-06-07 12:59:13)
|
||||
Agents management, Clients/Sites CRUD, Commands execution + terminal, Logs + AI analysis, Alerts (with clickable severity badges + client filtering), Metrics (CPU/RAM/disk/network, process drill-down modal), Auto-update triggering, Network state, Entra ID SSO (Entra only — Google planned per SPEC-008, not implemented), Policies Dashboard (all tabs), Registry editor, MSP360 backup status card + agent<->backup mappings/verify UI, Organizations management + dev-admin impersonation UI, Credentials management with inheritance support.
|
||||
|
||||
**Dashboard — incomplete (see UI_GAPS.md):**
|
||||
- Enrollment management UI (revoke keys, audit log, duplicate hostname warnings)
|
||||
@@ -419,6 +415,7 @@ Agents management, Clients/Sites CRUD, Commands execution + terminal, Logs + AI
|
||||
- BSOD in Alerts stream (Phase 2 deferred)
|
||||
- Tunnel session management (interactive terminal — backend skeleton, not production-ready)
|
||||
- Offboarding wizard UI (SPEC-028 complete, implementation pending)
|
||||
- Alert mute UI -- Ignore button + required-reason prompt, Muted filter, Un-ignore control (server endpoints ready; dashboard Task 5 NOT started)
|
||||
|
||||
**Open Gitea issues:**
|
||||
- #10 — BSOD detection Phase 2/3 (dashboard + fetch_bsod_dump + full bugcheck table)
|
||||
@@ -478,11 +475,9 @@ These decisions are locked. Do not reverse without explicit user approval.
|
||||
| 2026-06-01 | BUG-016 (Linux systemd missing StateDirectory=gururmm) + BUG-017 (device_id OnceLock cache) fixed (commit 30da053). GURU-KALI had 11 ghost agent rows from repeated UUID churn — fixed and verified. BSOD forensics: GURU-5070 bluescreened with `0x116 VIDEO_TDR_FAILURE` (nvlddmkm.sys, NVIDIA driver 32.0.15.9201 on RTX 5070 Ti Laptop GPU); GuruConnect cleared on three grounds; root cause one-off driver TDR. BSOD detection feature (issue #10 Phase 1) implemented: bsod.rs + migration 048 + ws/mod.rs handler; code review caught and fixed SF-1 (watermark before send) + SF-2 (non-atomic watermark write); merged to main (0ec55cf), agent versioned 0.6.51. |
|
||||
| 2026-06-02 | Server 0.3.37 + migration 048 deployed. Build channel default-beta fix applied to build-windows.sh + build-linux.sh (macOS already correct). Webhook wired to dispatch build-server.sh with change-gate (last-built-commit-server) + backup/rollback. Fleet converged to 0.6.51. GURU-KALI BUG-016 unit file refreshed, override removed, verified clean. [NOTE: the session log recorded "GURU-5070 promoted to stable" — contradicted by live DB; see 2026-06-04 entry.] |
|
||||
| 2026-06-04 | Channel correction confirmed via live Postgres query: GURU-5070 `agents.update_channel = 'beta'` (explicit per-agent override). Site "Mike's Car" and all 39 sites are `update_channel = NULL` (stable default); GURU-5070 is the only beta agent in the 119-agent fleet. Stable channel pinned at 0.6.47 windows/amd64 + 0.6.46 linux via `update_rollouts` (promoted 2026-05-28); beta channel has 0 `update_rollouts` rows (server dispatches newest signed beta artifact directly). GURU-5070 running 0.6.54. BUG-020 (duplicate/ghost tray icons) fixed in commit `137dd85` to beta: per-session single-instance mutex + `WTSEnumerateProcessesW` reconciliation + graceful shutdown event (fix #3 dormant pending `terminate_all` wiring — coord todo `25fdf31a`). Verified by Grok + Code Review Agent. |
|
||||
<<<<<<< HEAD
|
||||
| 2026-06-07 | Backup-alert quality pass shipped. FU1 (`summarize_backup_error` decodes MSP360 message JSON; `create_or_update_alert` now refreshes title/message/severity on re-trigger, also fixes latent severity-escalation freeze) + FU2 (exclude non-backup PlanTypes 8=Restore/13=Consistency-check from alerting/compliance): false `backup_failed` alerts 15 -> 2 fleet-wide (survivors AD1, LAB-Becky are genuine and self-describing), commit `779f7f6`. `backup_storage_low` alert type removed entirely (commit `b82c010`): `DataCopied/TotalData` measures backup-dataset completeness, not destination capacity — produced 5 fleet-wide false alerts including DF-HYPERV-B "100% Full" on a 4 GB plan; `resolve_all_backup_storage_alerts` (type-scoped, idempotent, once-per-tick) clears stragglers; 5 -> 0 verified after 17:21:41 UTC restart. Genuine destination-capacity alerting deferred (needs MSP360 storage-accounts endpoint). `BACKUP_STALE` evaluator confirmed already correct — no new code. Both commits on main. Submodule pinned at `226ba9f` in parent. |
|
||||
=======
|
||||
| 2026-06-07 | Credential inheritance deployed to production (server v0.3.45). Hierarchical credential propagation (Global → Client → Site) with `is_inheritable` flag and de-duplication by (credential_type, label). `/effective` endpoints validated. Dashboard UI: clickable alert severity badges with client filtering, offline badge now scopes to client-specific agents. SPEC-028 offboarding wizard specification created (835 lines) covering site and client offboarding workflows with data export, dependency analysis, typed confirmation, and audit logging. FEATURE_ROADMAP.md updated with "Client & Site Lifecycle Management" section. |
|
||||
>>>>>>> 5869da2 (sync: auto-sync from Mikes-MacBook-Air.local at 2026-06-07 12:59:13)
|
||||
| 2026-06-07 | Role-aware offline alerting + alert ignore/mute shipped (second session). Offline sweep evaluator (60s tokio interval, `server/src/alerts/offline.rs`): server-only `agent_offline` alerts (classifier: `os_product_type` 2/3, else `os_name` ~/server/i, else `role_override`); site rule (>=50% + >=3) -> `mass_offline_site`; fleet rule (>=10) -> `mass_offline_fleet`; aggregates pinned to representative agent with site/fleet `dedup_key`. Warm-up restart guard (code review caught + fixed `last_seen < started_at` spec defect — permanently false in steady state). Migration 054 (`agents.role_override`). Dashboard triage: servers elevated individually; workstations collapsed into roll-up. `PUT /api/agents/:id/role-override`. Verified live vs WIN-TG2STMODJG8 ("Windows Server 2019 Standard Evaluation", auto-classified via `os_name`); zero false mass-offline against ~55 chronically-offline agents; restart guard held across deploys. Known gap: `os_product_type` on only 16/168 agents; `os_name` is workhorse. Commits f1cdf5d/30e4f23/21d63bd/3eedf91. Alert ignore/mute (perma-silence): migration 055 (`alert_mutes` table + `muted` status); `dedup_key`-keyed; reason required (400 if missing); gates `create_or_update_alert` + `create_check_alert` bypass; `POST /api/alerts/:id/mute` + `/unmute`; dashboard UI NOT started. Verified live (mute -> `muted`; 60s sweep did not re-fire; unmute -> `active`). Commits 29c405e/a120e71. MSP360 Provider API probed live: monitoring-only tier (Companies/Users/Monitoring GET; management paths 404); plan delete/run/storage remain console tasks; white-label agent has no `cbb.exe`. |
|
||||
|
||||
---
|
||||
|
||||
@@ -495,11 +490,9 @@ These decisions are locked. Do not reverse without explicit user approval.
|
||||
- Auto-update reliability fix for BB-SERVER and RECEPTIONIST-PC was incomplete at 2026-05-24 save. [unverified]
|
||||
- **2026-06-02 recompile:** Folded in BSOD detection feature (Phase 1 shipped — agent/src/bsod.rs, migration 048, ws handler, always-Critical alerts, verified against real 0x116 dump); server build now wired into webhook (change-gated + rollback); build channel default changed to beta (stable is explicit promote); versions updated to agent 0.6.51 / server 0.3.37; fleet converged. Corrected submodule framing (tracks active repo, develop here + push to Gitea — not "stale, do not develop"). Added build-server.sh change-gate marker and server build log to Key Files. Added server's root RMM agent as a good pattern. Updated Current Focus with BSOD Phase 2/3 and Linux fleet unit drift. Added four new anti-patterns (minidump crate, default-stable builds, webhook agent-only gap, auto-update race). Migration count updated 46 -> 48.
|
||||
- **2026-06-04 recompile:** Corrected GURU-5070 channel state — live Postgres confirms `update_channel = 'beta'` per-agent (not stable as the 2026-06-02 session log implied). Stable fleet pinned at 0.6.47 (not 0.6.51). GURU-5070 on 0.6.54 beta. Beta channel has no `update_rollouts` pin. Added BUG-020 (tray duplicate/ghost icons) — symptom, root cause, fix commit `137dd85`, dormant follow-up for fix #3 wiring. Updated Summary, Components table, Active State, Current Focus, History, Good Patterns, and Compilation Notes. Added sources entry for live Postgres query + commit 137dd85. Added `aliases: [guru-rmm]` frontmatter to cross-reference the tombstone at `wiki/projects/guru-rmm.md`.
|
||||
<<<<<<< HEAD
|
||||
- **2026-06-07 recompile:** Folded in backup-alert quality pass (commits `779f7f6` + `b82c010`, both on main). Updated Backup Integration capability section: added FU1/FU2 alert quality pass detail (false backup_failed 15->2; summarize_backup_error; create_or_update_alert refresh); documented backup_storage_low removal (structurally false DataCopied/TotalData signal; 5->0 false alerts; resolve_all_backup_storage_alerts); confirmed BACKUP_STALE evaluator correct (no new code); added key functions list and MSP360 PlanType exclusion map. Updated Repo Structure to include db/mspbackups.rs and mspbackups/ key functions. Updated Current Focus MSP360 line and added /backup-status endpoint shape gap. Updated Summary date and added backup-alert quality pass note. Active State date note updated. Added 2026-06-07 History row. Patterns and History existing rows preserved verbatim.
|
||||
=======
|
||||
- **2026-06-07 recompile:** Updated for credential inheritance production deployment (server v0.3.45), clickable alert badges with client filtering, and SPEC-028 offboarding wizard specification. Added Recent Work section documenting 2026-06-07 session accomplishments. Updated Current Focus to reflect credential inheritance as deployed and offboarding wizard as spec-complete/implementation-pending. Updated Dashboard status to include credentials management with inheritance. Updated version numbers throughout (server 0.3.37 → 0.3.45). Added session-logs/2026-06-07-mike-gururmm-offboarding-spec.md to sources. Updated History Highlights with 2026-06-07 entry.
|
||||
>>>>>>> 5869da2 (sync: auto-sync from Mikes-MacBook-Air.local at 2026-06-07 12:59:13)
|
||||
- **2026-06-07 recompile (second session):** Folded in role-aware offline alerting (server/src/alerts/offline.rs: offline_sweep + agent_role classifier + mass_offline site/fleet detection; migration 054 agents.role_override; dashboard triage + role-override control on agent detail; warm-up restart guard; four commits f1cdf5d/30e4f23/21d63bd/3eedf91) and alert ignore/mute perma-silence (alert_mutes table, muted status, is_dedup_muted/mute_condition/unmute_condition, two-choke-point gate at create_or_update_alert + create_check_alert bypass, mute/unmute API; migration 055; dashboard pending; two commits 29c405e/a120e71). Added MSP360 API scope finding (monitoring-only tier confirmed by live probe; management paths 404). Updated Alerting & Watchdog section with offline alerting and mute detail including new alert types (agent_offline, mass_offline_site, mass_offline_fleet) and new status (muted). Updated Repo Structure (alerts/ directory; db/alerts.rs key functions; dashboard/ entry with agentRole.ts). Updated Development / Current Focus with alert mute dashboard (Task 5 not started), offline classification gap, and MSP360 API scope item. Added alert mute UI to Dashboard incomplete list. Added second 2026-06-07 History row. Updated migration count to 55+ (054/055 confirmed). Added session log source. Patterns section and all existing History rows preserved verbatim.
|
||||
|
||||
## Backlinks
|
||||
|
||||
|
||||
@@ -48,7 +48,7 @@ Not documented. iDRAC available at 172.16.1.73 (DHCP) for OOB management.
|
||||
| OwnCloud | 172.16.3.22 | running | OwnCloud file sync VM (cloud.acghosting.com) |
|
||||
| Unifi | (IP not documented) | running | UniFi Network controller |
|
||||
| Windows 7 | — | shut off | — |
|
||||
| Windows Server 2016 | — | shut off | — |
|
||||
| Windows Server 2016 | (none — APIPA) | running | Windows guest `ACG-DWP-X-BB`; e1000 NIC `vnet8` on br0, DHCP not leasing — see Known Issues |
|
||||
| Windows Server 2016_Template | — | shut off | — |
|
||||
|
||||
## Access
|
||||
@@ -89,6 +89,7 @@ Not documented. iDRAC available at 172.16.1.73 (DHCP) for OOB management.
|
||||
- **iDRAC IP is DHCP** (172.16.1.73) — may drift. Verify before relying on it for OOB access.
|
||||
- **guruRMM API proxy stale** — see NPM table above. Fix before it causes a routing incident.
|
||||
- **Post-power-failure recovery order matters** — see `.claude/POWER_FAILURE_RUNBOOK.md` for the full recovery sequence (Tailscale routes, libvirt/VMs, Seafile, NPM/DNS in order).
|
||||
- **VM "Windows Server 2016" (`ACG-DWP-X-BB`) — no LAN (2026-06-07):** guest stuck on APIPA `169.254.157.152`, no DHCP lease. Host side is healthy (vnet8 bridged to br0, forwarding, receiving LAN broadcast); fault is guest-side — single e1000 NIC set to DHCP, pfSense (172.16.0.1) not leasing it. Diagnose via `virsh domifaddr 9 --source agent` and qemu guest-exec `ipconfig /all`. Fix path: `ipconfig /renew` in-guest (stuck-client case) or assign a static IP if that is the intended config. PAUSED pending Mike's DHCP-vs-static decision.
|
||||
|
||||
## Backlinks
|
||||
|
||||
|
||||
Reference in New Issue
Block a user