sync: auto-sync from GURU-5070 at 2026-06-04 19:27:51

Author: Mike Swanson
Machine: GURU-5070
Timestamp: 2026-06-04 19:27:51
2026-06-04 19:27:56 -07:00
parent e08488ae5e
commit 8389e64a02
3 changed files with 201 additions and 0 deletions

@@ -190,3 +190,55 @@ Get-ScheduledTask -TaskPath "\Microsoft\Windows\Windows Defender\" | Enable-Sche
- Disable is NOT update-proof: a Windows feature update may re-enable real-time protection. Options offered for a genuinely permanent disable (both deferred): Safe Mode service-disable of WinDefend/Sense/WdNisSvc/WdFilter, OR install Bitdefender to auto-supersede Defender.
- Machine currently has NO active AV. Consider Bitdefender if this is to remain a working posture.
---
## Update: 19:26 PT — GuruRMM tray-icon bug fix (shipped) + SPEC-016 VSS backend (checkpointed)
### Session Summary
Fixed the recurring GuruRMM duplicate/ghost tray-icon bug end to end. Diagnosed three compounding defects via code review, live evidence on GURU-5070 (5 stacked `gururmm-tray.exe` in Session 1, one per watchdog restart), and an independent Grok review: (1) `TrayLauncher` tracked launches in an in-memory map that resets on watchdog restart/auto-update, so it relaunched trays into sessions that already had one; (2) `terminate_all` used `TerminateProcess` (hard kill) which skips the tray's Drop -> `Shell_NotifyIcon(NIM_DELETE)`, orphaning icons; (3) no single-instance guard in the tray. Coding Agent implemented a per-session `Local\GuruRMM_Tray` mutex, `WTSEnumerateProcessesW` launcher reconciliation, and a graceful `Global\GuruRMM_TrayShutdown_{sid}` event. Code Review Agent APPROVE. Committed to gururmm main (`137dd85`) -> beta build. Killed the 5 stray processes on GURU-5070. Verified via live Postgres that GURU-5070 is the lone beta agent (explicit per-agent `update_channel=beta`; stable fleet pinned 0.6.47), so the fix auto-lands here.
Recompiled the `gururmm` wiki article (corrected the stale "GURU-5070 promoted to stable" line — DB shows beta; added BUG-020) and created a `guru-rmm` -> `gururmm` redirect tombstone (the on-disk dir is hyphenated, the wiki/repo slug is not, which had caused a dead-end lookup earlier in the session).
Began SPEC-016 (VSS Shadow Copy Management) as a full build per Mike, using Grok as design sounding-board + code reviewer. Grok's review reshaped the design: WMI `Win32_ShadowCopy.Create` (not vssadmin on client SKUs), robocopy `/COPYALL` restore (Copy-Item drops ACLs), bounded shadowstorage provisioning, and mandatory guardrails before default-on. Rewrote SPEC-016 to v2 (JSONB policy model, migration 049->050 correction, 8 guardrails). Mid-build, Mike refined the core decision: VSS ON by default for SERVERS only, OFF for workstations (tiny-disk servers covered by existing low-space alerts). This made the default OS-type-aware. Rebuilt the brittle OS identifier (Caption + ProductType + DisplayVersion + edition; migration 049_os_identity; `agent_is_server()` classifier) to fix Mike's "ugly names / no edition" gripes and provide a reliable server/workstation flag. Built the agent VSS core (Stage 1, reviewed + fixed: C1/H1/H2/M2/M4/L1, 5 tests) and the server stage (Stage 2: VssConfig policy, OS-aware default, migration 050_vss, db/vss.rs, api/vss.rs, WS ingest, capability gate; 10 tests). Checkpointed the backend to gururmm branch `feat/vss-shadow-copy` (`8f61624`) per Mike; dashboard + OS/server review + migration apply remain (coord todo 8c86d987).
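The launcher-reconciliation idea from the tray fix can be sketched in isolation. This is a minimal illustration, not the shipped code in `agent/src/watchdog/wts.rs`: it assumes a process snapshot has already been taken (in the agent that would come from `WTSEnumerateProcessesW`), and `ProcessInfo` and `sessions_needing_tray` are hypothetical names. The point is that the decision is derived from live process state rather than from an in-memory launch map that resets on watchdog restart.

```rust
use std::collections::BTreeSet;

/// Hypothetical snapshot row: one running process and the session it lives in.
/// Assumption: the caller fills this from a real process enumeration.
#[derive(Debug, Clone)]
pub struct ProcessInfo {
    pub session_id: u32,
    pub image_name: String,
}

/// Return the active sessions that do not yet have a tray process.
/// No launch history is consulted, so the result is the same before and
/// after a watchdog restart. Output is sorted and de-duplicated.
pub fn sessions_needing_tray(
    active_sessions: &[u32],
    processes: &[ProcessInfo],
    tray_image: &str,
) -> Vec<u32> {
    // Sessions that already contain at least one tray process.
    // Windows image names are compared case-insensitively.
    let covered: BTreeSet<u32> = processes
        .iter()
        .filter(|p| p.image_name.eq_ignore_ascii_case(tray_image))
        .map(|p| p.session_id)
        .collect();

    // Active sessions minus covered sessions.
    let needed: BTreeSet<u32> = active_sessions
        .iter()
        .copied()
        .filter(|sid| !covered.contains(sid))
        .collect();

    needed.into_iter().collect()
}
```

With a snapshot showing a tray already running in Session 1, a call for active sessions 1 and 2 would return only session 2, however many times the watchdog has restarted.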
### Key Decisions
- Tray fix: fixes #1 and #2 (single-instance mutex + launcher reconciliation) shipped live; fix #3 (graceful shutdown event) is implemented but dormant because `terminate_all` has no caller in the agent. Tracked as a follow-up (todo 25fdf31a) rather than wiring it speculatively into the watchdog teardown.
- VSS default-on made OS-type-aware (Mike): server ON / workstation OFF, decided server-side in `get_effective_policy()` via `agent_is_server()`, not in the OS-agnostic static `system_defaults()`. Workstations were the disk blast-radius Grok warned about.
- `min_volume_gb` default = None (no size gate): rely on existing low-space alerts for tiny-disk servers rather than silently skipping them (Mike). The Stage-1 agent wrongly coerced None->100GB (C1); fixed.
- OS identifier sources `Win32_OperatingSystem.Caption`/`ProductType`/registry `DisplayVersion` as primary, keeping the old build-number map only as fallback — robust against unlisted builds.
- Checkpointed the VSS backend to a feature branch (not main) so that no beta build ships dormant code (agent VSS is inert until the server emits a `vss` policy section); this also secures ~4500 lines of previously uncommitted work.
- Kept the submodule gitlink lagging main (fe551e4) — normal/expected per project rules; did not force-bump to 137dd85.
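The OS-aware default and the `min_volume_gb = None` decision above can be illustrated with a small sketch. It is not the code on `feat/vss-shadow-copy`: `VssPolicySketch`, `is_server_product_type`, `effective_enabled`, and `volume_passes_size_gate` are hypothetical stand-ins, and the classifier here looks only at `Win32_OperatingSystem.ProductType` (1 = workstation, 2 = domain controller, 3 = server), whereas the real `agent_is_server()` also draws on Caption, DisplayVersion, and edition.

```rust
/// Assumption: classify purely on Win32_OperatingSystem.ProductType.
/// 1 = workstation, 2 = domain controller, 3 = server.
pub fn is_server_product_type(product_type: u32) -> bool {
    matches!(product_type, 2 | 3)
}

/// Hypothetical policy shape for illustration only.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct VssPolicySketch {
    /// None = no explicit setting; fall back to the OS-aware default.
    pub enabled: Option<bool>,
    /// None = no size gate at all (must NOT be coerced to a number).
    pub min_volume_gb: Option<u64>,
}

/// An explicit policy value wins; otherwise servers are ON and
/// workstations are OFF.
pub fn effective_enabled(policy: &VssPolicySketch, product_type: u32) -> bool {
    policy
        .enabled
        .unwrap_or_else(|| is_server_product_type(product_type))
}

/// Size gate that treats None as "no gate", so small-disk servers are
/// not silently skipped (the C1 class of bug coerced None to 100).
pub fn volume_passes_size_gate(policy: &VssPolicySketch, volume_gb: u64) -> bool {
    match policy.min_volume_gb {
        None => true,
        Some(min) => volume_gb >= min,
    }
}
```

Keeping the fallback in the effective-policy step, rather than in a static default, is what lets the same unset policy resolve differently per agent.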
### Problems Encountered
- PowerShell tool `EPERM uv_spawn` on Defender/VSS-touching commands earlier was the harness sandbox, not AV — bypass with `dangerouslyDisableSandbox` (carried over from the Defender work).
- Grok code-review runs hit the headless "Cancelled" finalization quirk 3x this session; deferred the Grok code review to next session, where it can run against the fixed code (more useful). The Grok design sounding-board (text mode) worked.
- Code Review found 3 blockers in the VSS agent (C1 min_volume_gb None->100GB gate breaking small-disk servers; H1 dest blocklist bypassable via 8.3 short names / C:-only; H2 unbounded recursive C:\Users backup-artifact scan). All fixed + tests added.
- Gitea Agent discovered the submodule was on detached HEAD `fe551e4` (the human-flow commit the gitlink tracks), not main `137dd85`; the VSS WIP was built on fe551e4. Branched from there and returned to fe551e4 to keep the parent gitlink clean, instead of blindly checking out main (which would have dirtied the parent).
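For the H2 finding (unbounded recursive scan), one way a bound could look is sketched below. The log does not record how the fix was actually implemented, so this is an assumption-laden illustration: `bounded_scan`, `max_depth`, and `max_entries` are hypothetical names, and the real scan presumably filters for backup artifacts rather than returning every file.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Walk `root` iteratively, descending at most `max_depth` directory levels
/// below it and examining at most `max_entries` directory entries in total.
/// Returns the regular files seen before either bound was hit.
pub fn bounded_scan(root: &Path, max_depth: usize, max_entries: usize) -> Vec<PathBuf> {
    let mut found = Vec::new();
    // Explicit stack instead of recursion: (directory, depth below root).
    let mut stack = vec![(root.to_path_buf(), 0usize)];
    let mut visited = 0usize;

    while let Some((dir, depth)) = stack.pop() {
        // Unreadable directories (permissions, races) are skipped, not fatal.
        let entries = match fs::read_dir(&dir) {
            Ok(e) => e,
            Err(_) => continue,
        };
        for entry in entries.flatten() {
            visited += 1;
            if visited > max_entries {
                return found; // hard stop on total work
            }
            // DirEntry::file_type does not follow symlinks, so links are
            // neither descended into nor reported.
            let file_type = match entry.file_type() {
                Ok(t) => t,
                Err(_) => continue,
            };
            if file_type.is_dir() {
                if depth < max_depth {
                    stack.push((entry.path(), depth + 1));
                }
            } else if file_type.is_file() {
                found.push(entry.path());
            }
        }
    }
    found
}
```

Both bounds matter on a profile tree like `C:\Users`: the depth limit caps how far the walk descends, and the entry limit caps total work even when a single directory is very wide.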
### Configuration Changes
- gururmm main `137dd85`: tray fix — `agent/src/watchdog/wts.rs`, `tray/src/winsingleton.rs` (new), `tray/src/main.rs`, `tray/src/tray.rs`, `tray/Cargo.toml`.
- gururmm branch `feat/vss-shadow-copy` `8f61624` (22 files): `agent/src/vss.rs` (new), `agent/src/inventory.rs` (OS rebuild), `agent/src/transport/{mod,websocket}.rs`, `agent/src/main.rs`, `server/src/db/{policies,vss(new),agents,inventory,enroll,mod}.rs`, `server/src/policy/{merge,effective,config_update}.rs`, `server/src/api/{vss(new),mod,policies}.rs`, `server/src/ws/mod.rs`, `server/src/main.rs`, `server/migrations/049_os_identity.sql` (new), `server/migrations/050_vss.sql` (new), `docs/specs/SPEC-016-vss-shadow-copy-management.md` (v2), `docs/FEATURE_ROADMAP.md` (BUG-020).
- ClaudeTools root (this save): `wiki/projects/gururmm.md` (recompiled), `wiki/projects/guru-rmm.md` (tombstone, new), `wiki/index.md`.
### Infrastructure & Servers
- GURU-5070: GuruRMM agent 0.6.54, AgentKey `agk_ybg4Ty6zXU_2Ee0ddlUUtuZdz0B9Qw4_`, SiteId `103c10b9-c1de-4dd8-b382-b8362ed3143e` ("Mike's Car"), device_id `a5c3fa53-193a-46e9-a83e-675eb1baaff0`, agent_id `c043d9ac-4020-4cab-a5f4-b90213d11e73`. Lone beta agent (explicit `update_channel=beta`).
- GuruRMM Postgres `gururmm` @ 172.16.3.30:5432 (binds 127.0.0.1; query over SSH guru@172.16.3.30, creds SOPS `projects/gururmm/database.sops.yaml`). Stable channel pinned 0.6.47 win / 0.6.46 linux (update_rollouts, 2026-05-28); beta has 0 rollout rows (serves newest signed artifact).
- Migrations 049_os_identity + 050_vss NOT yet applied to live Postgres (pending).
### Pending / Incomplete Tasks
- VSS (todo 8c86d987): Stage 3 dashboard UI; Code Review + Grok of OS-identifier + server stages; apply migrations 049/050 to live Postgres; wire kill-switch to a server-settings table (none exists); merge `feat/vss-shadow-copy` -> main after review.
- Tray (todo 25fdf31a): wire `terminate_all` into watchdog policy-disable/uninstall teardown so the graceful tray shutdown (#3) actually fires.
### Reference Information
- gururmm commits: tray fix `137dd85` (main); VSS backend `8f61624` (branch feat/vss-shadow-copy). Gitlink tracks `fe551e4`.
- Coord todos: 8c86d987 (VSS finish), 25fdf31a (terminate_all wiring).
- Spec: `projects/msp-tools/guru-rmm/docs/specs/SPEC-016-vss-shadow-copy-management.md` (v2, Approved).
- Bug register: `docs/FEATURE_ROADMAP.md` (BUG-020 tray).