claudetools/session-logs/2026-05-28-howard-discovery-testing.md
Howard Enos c62b3c0626 sync: auto-sync from HOWARD-HOME at 2026-05-28 17:43:22
Author: Howard Enos
Machine: HOWARD-HOME
Timestamp: 2026-05-28 17:43:22
2026-05-28 17:43:29 -07:00

Session Log — 2026-05-28 — GuruRMM Discovery Testing + Bug Fixes

User

  • User: Howard Enos (howard)
  • Machine: Howard-Home
  • Role: tech

Session Summary

Howard installed the GuruRMM agent on WIN-TG2STMODJG8 at site eeb5f001-447b-4c1e-adc8-e18db2be9b5b and wanted to test the network discovery feature — specifically whether it could find devices on the network and auto-install the agent on them.

Research confirmed that discovery is only partially implemented. TCP connect probing, reverse DNS, and ARP cache lookup have shipped. Active ICMP/ARP/SNMP scanning and scheduled scans are not yet implemented (roadmap P2). Auto-installing the agent on discovered devices is not built: the "deploying" status is only a label, with no push-install mechanism behind it.
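For reference, the TCP connect probing that did ship can be sketched with the standard library alone. This is a minimal, sequential illustration, not the agent's code. `probe_tcp` is a hypothetical name. The real scanner runs probes concurrently (concurrency 50, timeout_ms 500, per Reference Information). A port counts as open only when the TCP handshake completes within the timeout.

```rust
use std::net::{IpAddr, SocketAddr, TcpStream};
use std::time::Duration;

/// Hypothetical helper: returns the ports on `ip` that accept a TCP connection
/// within `timeout`. Refused or timed-out connects count as closed.
fn probe_tcp(ip: IpAddr, ports: &[u16], timeout: Duration) -> Vec<u16> {
    ports
        .iter()
        .copied()
        .filter(|&port| {
            // The stream is dropped immediately; only the handshake result matters.
            TcpStream::connect_timeout(&SocketAddr::new(ip, port), timeout).is_ok()
        })
        .collect()
}

fn main() {
    // Default scan ports from the Reference Information section.
    let ports: [u16; 9] = [22, 80, 135, 443, 445, 3389, 5985, 9100, 161];
    let ip: IpAddr = "127.0.0.1".parse().expect("valid IP");
    let open = probe_tcp(ip, &ports, Duration::from_millis(500));
    println!("open ports on {}: {:?}", ip, open);
}
```

A refused port answers almost immediately, but a filtered port costs the full timeout. This is also why a host that filters every probed port and blocks ICMP cannot be seen by a TCP-only scan.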

WIN-TG2STMODJG8 (agent ID eee9f26d-0dbc-4b8e-8e42-3a901b4ff73a) was configured as the discovery node for the site via API, with suggested subnet 172.16.0.0/23 auto-populated from the agent's network interface data. A scan was triggered and completed in ~16 seconds, finding 4 devices: 172.16.1.6 (TGC-SERVER, Windows, port 3389), 172.16.1.46 (WIN-TG2STMODJG8 itself, ports 135/445/5985), 172.16.1.136 (Windows, port 3389), 172.16.1.15 (Linux, port 22). All marked unmanaged.
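The suggested subnet defines the scan range. The sketch below expands a CIDR string into candidate host addresses. It assumes the scanner skips the network and broadcast addresses, which the log does not confirm, and `hosts_in` is a hypothetical helper. For 172.16.0.0/23 it yields 510 addresses, from 172.16.0.1 to 172.16.1.254. All four IPs found in the scan fall in that range.

```rust
use std::net::Ipv4Addr;

/// Hypothetical helper: expands an IPv4 CIDR such as "172.16.0.0/23" into its
/// usable host addresses, skipping the network and broadcast addresses.
/// Returns None for malformed input or for /31 and /32, which have no such pair.
fn hosts_in(cidr: &str) -> Option<impl Iterator<Item = Ipv4Addr>> {
    let (addr, prefix) = cidr.split_once('/')?;
    let addr: Ipv4Addr = addr.parse().ok()?;
    let prefix: u32 = prefix.parse().ok()?;
    if prefix > 30 {
        return None;
    }
    // Shifting a u32 by 32 would overflow, so /0 is special-cased.
    let mask: u32 = if prefix == 0 { 0 } else { u32::MAX << (32 - prefix) };
    let network = u32::from(addr) & mask;
    let broadcast = network | !mask;
    // Half-open range: from network + 1 up to, but not including, broadcast.
    Some(((network + 1)..broadcast).map(Ipv4Addr::from))
}

fn main() {
    let hosts: Vec<Ipv4Addr> = hosts_in("172.16.0.0/23").expect("valid CIDR").collect();
    // Prints: 510 hosts, 172.16.0.1 .. 172.16.1.254
    println!("{} hosts, {} .. {}", hosts.len(), hosts[0], hosts[hosts.len() - 1]);
}
```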

Howard noticed the device count growing with each scan and asked about a timeout. Investigation confirmed two bugs: (1) no scan timeout — if the agent disconnects mid-scan, the scan record stays status=running forever; (2) no guard against triggering a second scan while one is running. Fixed in server/src/api/discovery.rs and server/src/db/discovery.rs: expire_stale_scans() marks any running scan older than 10 minutes as failed, and has_running_scan() blocks new triggers with HTTP 409 while a scan is active. Committed as c6f1f73.
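The ordering of the two fixes can be modeled in memory. This is a sketch of the behavior described above, not the server code, which works against the discovery_scans table. Several details are assumptions: the guard is checked per site (the trigger endpoint is per-site), a u32 stands in for the site UUID, and the function signatures are hypothetical. Stale scans are expired first, so an orphaned scan cannot block new triggers forever. The 409 check runs afterwards.

```rust
use std::time::{Duration, SystemTime};

#[allow(dead_code)] // Completed is not exercised in this sketch
#[derive(Debug, Clone, Copy, PartialEq)]
enum ScanStatus {
    Running,
    Completed,
    Failed,
}

/// Stand-in for a discovery_scans row; site_id is a u32 here instead of a UUID.
struct Scan {
    site_id: u32,
    status: ScanStatus,
    started_at: SystemTime,
}

const STALE_AFTER: Duration = Duration::from_secs(10 * 60);

/// Marks running scans older than STALE_AFTER as failed; returns how many were expired.
fn expire_stale_scans(scans: &mut [Scan], now: SystemTime) -> usize {
    let mut expired = 0;
    for scan in scans.iter_mut().filter(|s| s.status == ScanStatus::Running) {
        // A start time in the future (clock skew) counts as age zero.
        let age = now.duration_since(scan.started_at).unwrap_or_default();
        if age > STALE_AFTER {
            scan.status = ScanStatus::Failed;
            expired += 1;
        }
    }
    expired
}

fn has_running_scan(scans: &[Scan], site_id: u32) -> bool {
    scans.iter().any(|s| s.site_id == site_id && s.status == ScanStatus::Running)
}

/// Expire first, then reject with 409 if a scan for this site is still running.
fn trigger_scan(scans: &mut Vec<Scan>, site_id: u32, now: SystemTime) -> Result<(), u16> {
    expire_stale_scans(scans, now);
    if has_running_scan(scans, site_id) {
        return Err(409);
    }
    scans.push(Scan { site_id, status: ScanStatus::Running, started_at: now });
    Ok(())
}

fn main() {
    let t0 = SystemTime::UNIX_EPOCH + Duration::from_secs(1_000_000);
    let mut scans = Vec::new();
    println!("{:?}", trigger_scan(&mut scans, 1, t0)); // Ok(())
    println!("{:?}", trigger_scan(&mut scans, 1, t0 + Duration::from_secs(60))); // Err(409)
    // Eleven minutes later the orphaned scan is expired and a new trigger succeeds.
    println!("{:?}", trigger_scan(&mut scans, 1, t0 + Duration::from_secs(11 * 60))); // Ok(())
    println!("{:?}", scans.iter().map(|s| s.status).collect::<Vec<_>>()); // [Failed, Running]
}
```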

The growing device count was also explained. The discovered_devices table is cumulative, with a unique constraint on site_id + ip_address and ON CONFLICT DO UPDATE. The apparent growth came from early scans finding different IPs as the ARP cache populated. Once results stabilized, a scan reporting new_devices: 0 confirmed that no duplicates were being created.
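The cumulative behavior corresponds to a PostgreSQL upsert (`INSERT ... ON CONFLICT (site_id, ip_address) DO UPDATE`). The in-memory model below shows why re-scans update rows instead of adding them. It is a sketch under stated assumptions: a `HashMap` stands in for the table, `record_scan` and `Device` are hypothetical, and devices_found is taken to mean the rows returned by one scan, which the log does not define.

```rust
use std::collections::HashMap;

/// Stand-in for the mutable columns of a discovered_devices row.
#[derive(Debug, Clone, PartialEq)]
struct Device {
    open_ports: Vec<u16>,
}

/// Key mirrors UNIQUE (site_id, ip_address); site_id is a u32 instead of a UUID.
type DeviceTable = HashMap<(u32, String), Device>;

/// Applies one scan's results: unseen keys are inserted, existing keys are
/// overwritten in place (the DO UPDATE case). Returns (devices_found, new_devices).
fn record_scan(table: &mut DeviceTable, site_id: u32, results: Vec<(String, Device)>) -> (usize, usize) {
    let found = results.len();
    let mut new_devices = 0;
    for (ip, device) in results {
        // HashMap::insert returns None only when the key was not already present.
        if table.insert((site_id, ip), device).is_none() {
            new_devices += 1;
        }
    }
    (found, new_devices)
}

fn main() {
    let mut table = DeviceTable::new();
    let first = vec![
        ("172.16.1.6".to_string(), Device { open_ports: vec![3389] }),
        ("172.16.1.15".to_string(), Device { open_ports: vec![22] }),
    ];
    println!("{:?}", record_scan(&mut table, 1, first)); // (2, 2)

    // Re-scan: the same IP with a changed port list updates the row instead of adding one.
    let second = vec![("172.16.1.15".to_string(), Device { open_ports: vec![22, 80] })];
    println!("{:?}", record_scan(&mut table, 1, second)); // (1, 0)

    let row = &table[&(1, "172.16.1.15".to_string())];
    println!("{} rows, 172.16.1.15 ports = {:?}", table.len(), row.open_ports); // 2 rows, 172.16.1.15 ports = [22, 80]
}
```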

Howard then reported that a machine on the network could not be pinged from GuruRMM and also was not found by the scanner. Root cause: the ping check uses the system ping command (ICMP), which Windows Firewall blocks by default. The discovery scanner was TCP-only — a host with all ports firewalled and ICMP blocked would be invisible to both. Fixed in agent/src/discovery/mod.rs: added ping_host() as an ICMP fallback after TCP probing. If no TCP ports respond, the scanner runs ping -n 1 -w 500 <ip> (Windows) or ping -c 1 -W 1 <ip> (Linux). Hosts that respond to ICMP but have no open TCP ports now appear in discovery with open_ports: [] and os_hint: unknown. Committed as fcf5833.
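A minimal sketch of the fallback follows. It shells out to the system ping with the flags quoted above and pings only when no TCP port answered. The agent's actual ping_host() signature is not shown in this log, so `ping_args`, `ping_host` and `is_alive` here are hypothetical stand-ins. The sketch treats a zero exit code as "host replied".

```rust
use std::net::IpAddr;
use std::process::{Command, Stdio};

/// Arguments for a single echo request, matching the flags in the log:
/// Windows `-n 1 -w 500` (timeout in ms), Linux `-c 1 -W 1` (timeout in s).
fn ping_args(ip: IpAddr, windows: bool) -> Vec<String> {
    let target = ip.to_string();
    let flags: [&str; 4] = if windows { ["-n", "1", "-w", "500"] } else { ["-c", "1", "-W", "1"] };
    flags.iter().map(|f| f.to_string()).chain(std::iter::once(target)).collect()
}

/// Hypothetical stand-in for ping_host(): true if the system ping binary exits 0.
/// A spawn failure (e.g. ping missing from PATH) is treated as "no reply".
fn ping_host(ip: IpAddr) -> bool {
    Command::new("ping")
        .args(ping_args(ip, cfg!(windows)))
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}

/// Fallback order described for run_scan: ping only when no TCP port answered.
fn is_alive(open_ports: &[u16], ip: IpAddr) -> bool {
    !open_ports.is_empty() || ping_host(ip)
}

fn main() {
    let ip: IpAddr = "127.0.0.1".parse().expect("valid IP");
    println!("would run: ping {}", ping_args(ip, cfg!(windows)).join(" "));
    println!("alive via ICMP fallback: {}", is_alive(&[], ip));
}
```

One caveat is worth checking against the real implementation. Windows ping.exe is widely reported to exit with code 0 on some failure replies, such as "Destination host unreachable". An exit-code-only check could therefore report false positives. Looking for a `TTL=` line in the output is a common alternative.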


Key Decisions

  • Discovery does NOT auto-install agents — the "deploying" status flag exists in the UI and DB but there is no actual push-install mechanism. This is a future P2 feature. Clearly communicated to Howard.
  • Scheduled scans not implemented — the UI shows Daily/Weekly options but the backend scheduler is not wired up. On-demand only. Roadmap P2.
  • Discovery is not AI-driven — once the node is configured (one-time setup through the dashboard UI), scans are triggered by button click or (future) schedule. No AI involvement at runtime. Howard confirmed this was his expectation.
  • ICMP fallback uses shell ping, not raw sockets: raw ICMP sockets require elevated privileges and are blocked on Windows without manifest changes. Shelling out to the ping binary matches the existing pattern in checks.rs and works within the agent's current privilege level.
  • Stale scan timeout set to 10 minutes: long enough not to expire legitimate scans on large subnets, yet short enough to clear scans orphaned by disconnected agents before the next scan is triggered.
  • HTTP 409 for concurrent scan guard — standard REST conflict code; the dashboard's toast error handling will display the message to the user.
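The 10-minute cutoff can be checked against the scan parameters listed under Reference Information: 9 default ports, concurrency 50, and timeout_ms 500. The sketch computes a worst-case duration for the TCP phase only. It rests on two assumptions: every connection attempt runs to its full timeout, and the 50 concurrency slots are shared across all host/port pairs. `worst_case_tcp_phase` is a hypothetical helper, and the per-host time of the ICMP fallback is not modeled.

```rust
use std::time::Duration;

/// Hypothetical upper bound on the TCP phase of a scan: usable hosts in the prefix
/// times ports, run `concurrency` at a time, each attempt lasting the full timeout.
fn worst_case_tcp_phase(prefix: u32, ports: u64, concurrency: u64, timeout_ms: u64) -> Duration {
    assert!(prefix <= 30 && concurrency > 0, "expects a /0../30 prefix and nonzero concurrency");
    let hosts = (1u64 << (32 - prefix)) - 2; // excludes network and broadcast addresses
    let attempts = hosts * ports; // one TCP connect per host/port pair
    let rounds = (attempts + concurrency - 1) / concurrency; // ceil(attempts / concurrency)
    Duration::from_millis(rounds * timeout_ms)
}

fn main() {
    // Reference values from the session log: 9 default ports, concurrency 50, 500 ms per connect.
    // Prints 46.0, 92.0, 368.5 and 737.5 s for /23, /22, /20 and /19.
    for prefix in [23, 22, 20, 19] {
        let d = worst_case_tcp_phase(prefix, 9, 50, 500);
        println!("/{}: {:.1} s (stale cutoff: 600 s)", prefix, d.as_secs_f64());
    }
}
```

Under these assumptions a /23 is bounded at 46 s. The observed ~16 s scan was faster, likely because many probes were refused or answered quickly. A /20 is bounded at 368.5 s, but a /19 would need up to 737.5 s, beyond the 600-second cutoff. If the real scanner's concurrency model resembles this one, subnets larger than about /20 may need a longer timeout or one scaled to the range size.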

Problems Encountered

  • Push rejected (twice): for each fix, Mike had pushed to main between the local commit and the push. Each time, resolved with git pull --rebase origin main && git push origin main.

Configuration Changes

  • projects/msp-tools/guru-rmm/server/src/db/discovery.rs — added expire_stale_scans() and has_running_scan() functions
  • projects/msp-tools/guru-rmm/server/src/api/discovery.rs — wired both into trigger_scan(): expire stale scans before each trigger, block if already running (HTTP 409)
  • projects/msp-tools/guru-rmm/agent/src/discovery/mod.rs — added ping_host() ICMP fallback; updated comment in run_scan to reflect fallback logic
  • projects/msp-tools/guru-rmm (submodule) — advanced to fcf5833

Credentials & Secrets

None created or modified this session.


Infrastructure & Servers

  • GuruRMM dashboard: https://rmm.azcomputerguru.com
  • GuruRMM server: 172.16.3.30:3001
  • Discovery test site: eeb5f001-447b-4c1e-adc8-e18db2be9b5b
  • Discovery node agent: WIN-TG2STMODJG8 (eee9f26d-0dbc-4b8e-8e42-3a901b4ff73a), online
  • Subnet scanned: 172.16.0.0/23

Commands & Outputs

# Authenticate
POST /api/auth/login { email, password } → JWT token

# Get agents at site
GET /api/agents?site_id=eeb5f001-447b-4c1e-adc8-e18db2be9b5b
# → 10 agents; WIN-TG2STMODJG8 online

# Get suggested subnets
GET /api/agents/eee9f26d.../discovery/subnets
# → ["172.16.0.0/23"]

# Configure discovery node
POST /api/agents/eee9f26d.../discovery { site_id, ip_ranges: ["172.16.0.0/23"], ... }
# → node created

# Trigger scan
POST /api/sites/eeb5f001.../discovery/scan
# → scan_id: 6c25d374-..., status: initiated
# Completed in ~16s, devices_found: 4, new_devices: 4

# Second scan (dedup confirmation)
# → devices_found: 9, new_devices: 0 (stable, no duplicates)

# gururmm commits
c6f1f73  fix(discovery): add scan timeout and in-progress guard
fcf5833  fix(discovery): add ICMP ping fallback for TCP-silent hosts

Pending / Incomplete Tasks

  • Discovery auto-deploy (P2): Not built. Needs a mechanism to push the agent installer to a discovered Windows machine, likely SMB + PsExec or WMI with tech-provided credentials. Would be a significant new feature.
  • Discovery scheduling (P2): UI has Daily/Weekly options but backend scheduler not implemented. Needs a background task on the server.
  • New agent build + deploy needed: Both discovery fixes are in the codebase but won't take effect on the agent (ICMP fallback) or server (timeout + concurrency guard) until the next build is deployed to 172.16.3.30.
  • SPEC-012 implementation: Sortable table headers, 4h estimate, no blockers.
  • SPEC-013 (P3): Deferred — revisit after file transfer (P2) ships.
  • SPEC-014 follow-up (Mike's): Policy tab UI for watch rules; push rules to agent on connect.
  • Cascades pending migration: Ashley Jensen folder redirect, RECEPTIONIST-PC drives, NURSESTATION-PC HIPAA GPO, Nurses credential vault, Phase 3 domain joins, Entra Connect OU expansion, M365 relicensing (time-sensitive).

Reference Information

  • Discovery node agent: WIN-TG2STMODJG8 — eee9f26d-0dbc-4b8e-8e42-3a901b4ff73a
  • Discovery test site ID: eeb5f001-447b-4c1e-adc8-e18db2be9b5b
  • Scan timeout fix commit: c6f1f73 (server: expire_stale_scans 10min + 409 on concurrent)
  • ICMP fallback commit: fcf5833 (agent: ping_host() fallback in discovery/mod.rs)
  • Default scan ports: 22, 80, 135, 443, 445, 3389, 5985, 9100, 161
  • Scan concurrency: 50 (configurable), timeout_ms: 500 per connection
  • discovery_nodes table: agent_id PK, scan_config JSONB
  • discovered_devices table: UNIQUE (site_id, ip_address) — cumulative across scans
  • discovery_scans table: status IN ('running','completed','failed')
  • Syncro note: POST /tickets/{id}/comments and POST /tickets/{id}/invoice return 404 for large-format ticket IDs. Workaround for invoices: the top-level POST /invoices works. Comments must be added through the GUI.