sync: auto-sync from GURU-5070 at 2026-06-07 17:45:03

Author: Mike Swanson
Machine: GURU-5070
Timestamp: 2026-06-07 17:45:03
This commit is contained in:
2026-06-07 17:45:07 -07:00
parent 05c17b476f
commit b928fdb8f3
3 changed files with 51 additions and 0 deletions

View File

@@ -168,3 +168,29 @@ works from GURU-5070).
status `muted`; new tables `alert_mutes`; new columns `agents.role_override`.
- Test box: WIN-TG2STMODJG8 agent id b6c715df-09fe-4e97-b09a-82a1b535f041 (offline
eval server, used for live verification).
---
## Update: 00:44 PT — alert-mute dashboard shipped, beta->prod promotion, MSI dedup cleanup, parked design
### Completed since the initial save
- **Alert ignore/mute dashboard (Task 5) — shipped + deployed** (commit `63d7c5d`). Reused the existing Radix Dialog for the required-reason prompt; shared `AlertRow` so both the alerts page and the agent Alerts tab get Ignore/Un-ignore + the Muted filter; muted rows show Un-ignore (not Ack/Resolve) + reason/who/when; status badge labels `muted` as "Ignored". Code Review APPROVED; `npm run build` clean. The alert-mute feature is now complete end-to-end (server `a120e71` + dashboard `63d7c5d`).
- **Roadmap entries** (commit `5f1d9f0`): both offline alerting and alert ignore/mute added to FEATURE_ROADMAP.md Alerting section; specs' roadmap tasks marked DONE.
- **Dashboard beta -> production promotion.** Prod (`/var/www/gururmm/dashboard`, rmm.azcomputerguru.com) was stuck at the 2026-06-04 build. Ran `sudo /opt/gururmm/promote-dashboard.sh --confirm` on 172.16.3.30 (the sanctioned promote: backs up prod first, keeps 10 backups, rsync beta->prod, `--rollback` to undo). Backup `dashboard-20260608-000926`; prod now byte-identical to beta. This shipped ALL accumulated beta work to prod: this session's offline triage + role-override + alert mute UI, plus other sessions' credential-inheritance mgmt + clickable alert badges. Coord component gururmm/dashboard -> deployed.
- **MSI BSOD duplicate cleanup.** MSI (Safesite/Glendale, agent a685af29...) had 4 BSOD VIDEO_TDR_FAILURE (0x116) alerts, all with `dedup_key = NULL` (legacy, from the early-June BSOD implementation before the dedup key was added). Mike had muted one; the other 3 active duplicates could not be spanned by the mute (NULL key falls back to single-row). Resolved the 3 duplicates (final: 0 active, 1 muted "Client to replace unit", 3 resolved). Two resolve calls hit transient HTTP 000; retried clean.
### Parked design conversation (NOT approved — captured for later)
A stream-of-thought design discussion produced two parked features, captured in
`projects/msp-tools/guru-rmm/docs/PARKED_alert-lifecycle-and-telemetry-cadence.md`
(commit `82a7b14`) + two coord todos (`eb14a4d2`, `bacc4f12`):
1. **Stateful alert lifecycle + occurrence history** (server): one alert per dedup_key spanning all statuses; re-fire reopens only when "substantively different" (severity escalation or re-fire after clear; not per-1%); occurrence count+first/last on the row, dated history via existing agent_events (no new table); ack reopens on substantive change, mute is absolute; server-side per-condition cooldown (~15min default, escalations bypass, recovery always prompt); subsumes BSOD per-bugcheck dedup + mute-across-refire + legacy null-key cleanup. Root cause it fixes: dedup currently matches only status='active', so ack/resolve breaks dedup and spawns duplicates.
2. **Tiered agent telemetry cadence** (agent): per-collector collection intervals (live/on-demand ~5-10s only when watched; standard ~30-60s; slow ~5-15min; daily). Disk consumption (slow) vs disk I/O (fast) are different collectors; fast diagnostics as on-demand/live mode; cadence config in policy, server-pushed; cross-platform scheduler. Larger, separate agent-side feature.
Saved a feedback memory (`.claude/memory/feedback_stream_of_thought_design.md`): Mike
prefers free-form design conversations; Claude sharpens as a design partner but does
not build until an explicit go, then decomposes into specs.
### Still open / parked
- Jupiter VM `ACG-DWP-X-BB` no-DHCP (diagnosed, paused for Mike's DHCP-vs-static call; see wiki/systems/jupiter.md).
- Offline classification overrides (SIF-SERVER/Server2013/WIN-TG2STMODJG8) — taggable from the agent detail role-override control; held.
- The two parked design features above; MSP360 console todos (3) from the morning.