# GuruRMM duplicate-agent dedup plan (2026-07-04)

## Root cause (FIXED in code)
`server/src/ws/mod.rs`: both agent-connect enrollment paths deduped by **`(site_id, device_id)`**
(`get_agent_by_site_and_device`). A machine already enrolled in one site that ran a *different*
site's installer (a new site code, or the Staging installer) was not found in the target site, so
the server **created a new `agents` row** → a cross-site duplicate sharing the same `device_id`.
The `/api/agents/:id/move` endpoint itself is a clean `UPDATE`; it was NOT the culprit. The
duplicates were minted at re-enrollment. (The 500s under load are a separate server-capacity issue.)

**Fix:** added `db::get_agent_by_device_id()` (global; `device_id` is hardware-unique) and changed
both enroll paths to look up by **real device_id across all sites** and **re-home** the existing
record (`move_agent_to_site`) instead of creating a duplicate. Path 2 (enrollment key) uses the
global match only when a genuine `device_id` is present. When there is none, it falls back to the
site-scoped hostname match, so distinct same-named machines (e.g. the 4x `SERVER`) are never wrongly merged.
A blanket `UNIQUE(device_id)` DB constraint is intentionally NOT added: it would collide on the
legacy hostname-fallback rows. `cargo check` passes.
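The lookup order described in the fix can be sketched as a small in-memory model. This is an illustration only, not the code in `server/src/ws/mod.rs`: the `Agent` struct, the `Outcome` enum, and `resolve_enrollment` are hypothetical names, a `Vec` stands in for the `agents` table, and creating a new row when a `device_id` is present but unmatched is an assumption.

```rust
// Hypothetical in-memory stand-in for the `agents` table.
#[derive(Debug, Clone, PartialEq)]
struct Agent {
    id: u32,
    site_id: u32,
    hostname: String,
    device_id: Option<String>, // None models a legacy hostname-fallback row
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Existing(u32), // record already lives in the target site
    Rehomed(u32),  // record found in another site and moved
    Created(u32),  // no match anywhere, new row inserted
}

fn resolve_enrollment(
    agents: &mut Vec<Agent>,
    target_site: u32,
    hostname: &str,
    device_id: Option<&str>,
) -> Outcome {
    if let Some(dev) = device_id {
        // Global match across all sites (role of db::get_agent_by_device_id).
        if let Some(a) = agents
            .iter_mut()
            .find(|a| a.device_id.as_deref() == Some(dev))
        {
            if a.site_id == target_site {
                return Outcome::Existing(a.id);
            }
            // Re-home instead of duplicating (role of move_agent_to_site).
            a.site_id = target_site;
            return Outcome::Rehomed(a.id);
        }
    } else if let Some(a) = agents
        .iter()
        .find(|a| a.site_id == target_site && a.hostname == hostname)
    {
        // No genuine device_id: site-scoped hostname match only, so
        // same-named machines in other sites are never merged.
        return Outcome::Existing(a.id);
    }

    // Assumption: anything unmatched becomes a new row.
    let id = agents.iter().map(|a| a.id).max().unwrap_or(0) + 1;
    agents.push(Agent {
        id,
        site_id: target_site,
        hostname: hostname.to_string(),
        device_id: device_id.map(str::to_string),
    });
    Outcome::Created(id)
}
```

In this model, a machine with a known `device_id` that enrolls into a second site yields `Outcome::Rehomed` and the row count stays the same, while a `SERVER` without a `device_id` enrolling into a site that has no `SERVER` yet yields `Outcome::Created`.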

## Deploy sequence (order matters)
1. **Deploy the fixed server** (build + restart). Until then, re-homing or re-enrolling can still create duplicates.
2. **Delete the 39 stale orphans** (below): `dedup-plan.json.delete_ids`.
3. **Re-home the 37 wrong-site survivors** (below). This is now safe: the move is a plain `UPDATE` and creates no duplicate. Many will
   also self-correct on next enrollment once the fixed server is live.

## 39 SAFE deletes (same device_id, stale orphan: keep the live record, delete the stale one)
Machine-readable ids: `projects/gps-rmm-audit/dedup-plan.json` → `delete_ids`.
- 24 Dataforth `D2-*` (stale D2 originals; the live duplicate currently sits in D1, see SITE-FIX)
- Staging/Bucket-C re-enroll orphans (CNX-LAB-00, CURTIS-002-W7, Mel-PC, TPS-SVR, etc.)
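The "keep the live record, delete the stale one" rule can be sketched as follows. This is not how `dedup-plan.json` was produced: `AgentRow`, `stale_orphan_ids`, and the `last_seen` field are hypothetical, and treating the most recently seen row as the live one is an assumption.

```rust
use std::collections::HashMap;

// Hypothetical row shape; `last_seen` (Unix seconds) is an assumed liveness signal.
#[derive(Debug, Clone)]
struct AgentRow {
    id: u32,
    device_id: String,
    last_seen: u64,
}

/// For every device_id that has more than one row, keep the most recently
/// seen row and return the ids of all the others, sorted ascending.
fn stale_orphan_ids(rows: &[AgentRow]) -> Vec<u32> {
    let mut by_device: HashMap<&str, Vec<&AgentRow>> = HashMap::new();
    for row in rows {
        by_device.entry(row.device_id.as_str()).or_default().push(row);
    }

    let mut delete_ids = Vec::new();
    for group in by_device.values() {
        if group.len() < 2 {
            continue; // unique device, nothing to delete
        }
        // Ties on last_seen are broken by the higher id (assumed newer row).
        let keep = group.iter().max_by_key(|r| (r.last_seen, r.id)).map(|r| r.id);
        delete_ids.extend(group.iter().map(|r| r.id).filter(|id| Some(*id) != keep));
    }
    delete_ids.sort_unstable();
    delete_ids
}
```

Rows with distinct `device_id` values never appear in the result, which matches the rule that only same-`device_id` duplicates are safe to delete automatically.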

## 37 SITE-FIX (survivor is in the WRONG site after dedup; re-home once server is fixed)
- 24 Dataforth machines: live record sits in **D1**, belongs in **D2** → move to D2.
- ~13 Bucket-C machines: live record still in **Staging** → re-home to the real client
  (Safesite/Bell, Mineralogical, IMC, Curtis, etc.) via `reassign-staging.py` (works once move is reliable).

## 13 MANUAL review (different device_id; NOT auto-deleted; likely reimage or genuinely distinct)
BridgettePSHomeComputer, DESKTOP-BTR2AM3, GURU-5070, MSI, Maras-HP-Laptop, PST-SURFACE,
RECEPTIONIST-PC, RMM-TEST-MACHINE, SERVER (4x), Sif-Laptop554, Sif-Laptop555, bfcfbc739d23, gururmm.
These share a hostname but have different device_ids. Inspect each one before deleting anything.
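A manual-review list of this kind could be derived by grouping rows by hostname and flagging any hostname that maps to more than one `device_id`. The sketch below is illustrative: `manual_review_hostnames` and the `(hostname, device_id)` tuple input are hypothetical, and hostnames are compared exactly (case-sensitive), which is an assumption.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Takes (hostname, device_id) pairs and returns, in sorted order, every
/// hostname that is shared by rows with different device_ids.
fn manual_review_hostnames<'a>(rows: &[(&'a str, &'a str)]) -> Vec<&'a str> {
    let mut devices_by_host: BTreeMap<&'a str, BTreeSet<&'a str>> = BTreeMap::new();
    for &(hostname, device_id) in rows {
        devices_by_host.entry(hostname).or_default().insert(device_id);
    }
    devices_by_host
        .into_iter()
        .filter(|(_, devices)| devices.len() > 1) // same name, different hardware
        .map(|(hostname, _)| hostname)
        .collect()
}
```

A hostname whose rows all carry the same `device_id` is not flagged here; that case belongs to the safe-delete bucket instead.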

## Note
The current RMM enrollment count (~105) is INFLATED by these duplicates; the true unique-device count is lower.
Removing the 39 stale orphans alone would leave roughly 66 rows, before any manual-review outcomes.