sync: auto-sync from HOWARD-HOME at 2026-06-22 14:04:53

Author: Howard Enos
Machine: HOWARD-HOME
Timestamp: 2026-06-22 14:04:53
2026-06-22 14:05:24 -07:00
parent 48286e80e0
commit 26aa5034f1
5 changed files with 194 additions and 1 deletions

@@ -170,3 +170,5 @@
- [AI-auth product boundary](project_ai_auth_product_boundary.md) — ClaudeTools/ClaudeTools 3.0 = internal-only, per-person subscription OAuth ok; GuruRMM = sellable, customer brings own API key (never ACG's subscription); backend dev = internal. Anthropic ToS bans subscription auth in third-party products.
- [RMM SYSTEM context can't see user mapped drives](feedback_rmm_system_context_mapped_drives.md) — RMM runs as SYSTEM; `Test-Path F:\` etc. is False even when the user's mapped/redirected drive exists. Diagnose mapped-drive/redirect issues in `context:user_session`. Elevated apps (e.g. QB DB Server Manager "unable to retrieve root folder") need `EnableLinkedConnections=1` + reboot.
- [AD2 = Dataforth-ops fork](project_ad2_dataforth_fork.md) — branch ad2 = main + thin Dataforth layer; keep fork edits ADDITIVE (Dataforth context in clients/dataforth/CLAUDE.dataforth.md, NOT .claude/CLAUDE.md); rebase onto main directly when sync.sh self-lock hits; no vault/jq/sops/age on this box.
- [GuruScan verification IN TEST / paused](project_guruscan_in_test_paused.md) — multi-engine scanner verify on DESKTOP-MS42HNC paused 2026-06-22 (VM rebooted mid-Emsisoft run); HitmanPro done (36 removed), Emsisoft full-scan unverified; resume `guruscan-agent-test.sh DESKTOP-MS42HNC scan-one Emsisoft`; Defender RTP/Tamper still off on VM
- [GuruRMM fleet dispatch-hang fix](project_gururmm_dispatch_hang_fix.md) — blocking send_to on a full bounded channel to one black-holed agent wedged ALL command dispatch; fixed with try_send (9dae20c, deployed); proper black-hole eviction still missing (was reverted in 80df458) — finish it if it recurs

@@ -0,0 +1,16 @@
---
name: project_gururmm_dispatch_hang_fix
description: GuruRMM fleet-wide command-dispatch hang root cause + fix (send_to try_send, 9dae20c) and the still-missing eviction
metadata:
type: project
---
On 2026-06-22 the live GuruRMM server (`172.16.3.30:3001`) hung on **every** `POST /api/agents/:id/command` (30s+ timeouts, all agents; GET worked) — command dispatch was down fleet-wide.
**Root cause:** `AgentConnections::send_to` (server `src/ws/mod.rs`) did a blocking `tx.send(msg).await` on a bounded (cap 100) per-agent mpsc channel. When an agent socket is black-holed or half-open, its WS writer stops draining the channel; the channel fills and `send().await` blocks forever. `send_command` holds `state.agents.read().await` across that await, so the next agent (re)connect's `.write().await` waits behind the stuck read guard, and because tokio's `RwLock` is write-preferring, every later `read()` (that is, every later dispatch) queues behind that pending writer. **One dead socket wedged the whole fleet.** The recovery path "evict non-delivering connections" (`7c578fd`) had been **reverted** (`80df458`), leaving no escape hatch.
**Fix (`9dae20c`, on `main`, deployed):** `send_to` now uses non-blocking `try_send`: a full or closed channel returns "not delivered", the command stays persisted, and it is re-offered by `redispatch_pending_commands` (on reconnect) and by the reaper `requeue_undelivered_commands`. The failure stays local to the one affected agent. Verified live (another agent ran a command end-to-end in ~5s).
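A minimal sketch of the non-blocking send behaviour described above. It is not the code from `9dae20c`: it swaps in the standard library's `sync_channel` for tokio's bounded mpsc so it runs without dependencies, and the `Delivery` enum and the capacity of 2 are assumptions made for illustration.

```rust
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};

/// Outcome of one dispatch attempt (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
enum Delivery {
    Delivered,
    NotDelivered,
}

/// Non-blocking send: a full or closed channel reports `NotDelivered`
/// immediately instead of parking the caller.
fn send_to(tx: &SyncSender<String>, msg: String) -> Delivery {
    match tx.try_send(msg) {
        Ok(()) => Delivery::Delivered,
        Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => {
            Delivery::NotDelivered
        }
    }
}

fn main() {
    // Capacity 2 stands in for the real per-agent cap of 100.
    let (tx, rx) = sync_channel::<String>(2);

    // Nothing drains `rx`, which models a black-holed agent socket.
    assert_eq!(send_to(&tx, "cmd-1".into()), Delivery::Delivered);
    assert_eq!(send_to(&tx, "cmd-2".into()), Delivery::Delivered);

    // The channel is now full: the call returns at once rather than blocking.
    assert_eq!(send_to(&tx, "cmd-3".into()), Delivery::NotDelivered);

    // Dropping the receiver models a closed connection.
    drop(rx);
    assert_eq!(send_to(&tx, "cmd-4".into()), Delivery::NotDelivered);
}
```

Because the caller never waits on the channel, any lock held around the send is released promptly, which is what keeps one stuck agent from stalling the others.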
**Still open / watch:** proper per-connection eviction of a black-holed socket is still absent (the only implementation was the reverted code). A truly half-open agent will keep heartbeating `online` while its server→agent channel silently drops messages: commands dispatch as `running` but never return, and the reaper fails them on timeout. If this recurs, finish the eviction/keepalive-drop work rather than relying on `try_send` alone.
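One possible shape for the missing eviction, sketched under stated assumptions: count consecutive failed sends per connection and drop the connection once a threshold is reached. This is not the reverted `7c578fd` code. `Registry`, `ConnHealth`, `record`, and the threshold are all hypothetical names and values.

```rust
use std::collections::HashMap;

/// Hypothetical per-connection bookkeeping.
struct ConnHealth {
    consecutive_failures: u32,
}

/// Hypothetical connection registry keyed by agent id.
struct Registry {
    conns: HashMap<String, ConnHealth>,
    max_failures: u32,
}

impl Registry {
    fn new(max_failures: u32) -> Self {
        Registry { conns: HashMap::new(), max_failures }
    }

    fn connect(&mut self, agent_id: &str) {
        self.conns
            .insert(agent_id.to_string(), ConnHealth { consecutive_failures: 0 });
    }

    /// Record the result of one non-blocking send.
    /// Returns true if this call evicted the connection.
    fn record(&mut self, agent_id: &str, delivered: bool) -> bool {
        let evict = match self.conns.get_mut(agent_id) {
            None => return false, // unknown or already evicted
            Some(health) => {
                if delivered {
                    // Any successful delivery resets the counter.
                    health.consecutive_failures = 0;
                    false
                } else {
                    health.consecutive_failures += 1;
                    health.consecutive_failures >= self.max_failures
                }
            }
        };
        if evict {
            // Dropping the entry forces the agent to reconnect cleanly.
            self.conns.remove(agent_id);
        }
        evict
    }
}

fn main() {
    let mut reg = Registry::new(3); // threshold of 3 is an arbitrary example
    reg.connect("agent-a");
    assert!(!reg.record("agent-a", false));
    assert!(!reg.record("agent-a", false));
    assert!(reg.record("agent-a", false)); // third straight failure evicts
    assert!(!reg.conns.contains_key("agent-a"));
}
```

A counter like this only reacts when commands are actually being sent, so a keepalive ping with a deadline would still be needed to catch a half-open socket on an otherwise idle agent.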
Deploy model: merging to `gururmm` `main` triggers the webhook build on `.30` (rebuild + `systemctl` restart, auto-rollback if the binary won't start). See the `gururmm-build` skill. Pairs with [[project_guruscan_in_test_paused]].

@@ -0,0 +1,17 @@
---
name: project_guruscan_in_test_paused
description: GuruScan multi-engine scan verification is IN TEST / paused on DESKTOP-MS42HNC (resume steps + state)
metadata:
type: project
---
GuruScan (multi-engine malware scanner, `projects/msp-tools/guru-scan`, hardened at `fb09102`) is **IN TEST — PAUSED** as of 2026-06-22. Verifying full-scan + full-removal + automated lifecycle on test VM **DESKTOP-MS42HNC** (agent id `0de89b88-b21d-4647-ab64-96157ba87cc5`, client AZ Computer Guru / site Howard-VM — a flaky laptop VM that sleeps/reboots).
State:
- **HitmanPro** lifecycle already verified (36 threats detected+removed, reboot-cleanup task fires).
- **Emsisoft** full `/f=C:\` run (the heavy ~80-min engine) was launched with 516 samples staged but **interrupted by a VM reboot — NOT yet verified**. This is the open item.
- Resume hands-off: `bash .claude/scripts/guruscan-agent-test.sh DESKTOP-MS42HNC scan-one Emsisoft` (self-restores samples, detached no-cap, reports removal + results.json + cleanup task).
Cleanup still owed on the VM once testing is done: remove samples/zip/EICAR/test tasks, clear scanner quarantine, and **re-enable Windows Defender RTP + Tamper Protection** (disabled at the console for malware testing).
Blocker that surfaced + was fixed this session: a fleet-wide RMM command-dispatch hang — see [[project_gururmm_dispatch_hang_fix]].