claudetools/session-logs/2026-05-28-howard-discovery-testing.md
Howard Enos c62b3c0626 sync: auto-sync from HOWARD-HOME at 2026-05-28 17:43:22
Author: Howard Enos
Machine: HOWARD-HOME
Timestamp: 2026-05-28 17:43:22
2026-05-28 17:43:29 -07:00

# Session Log — 2026-05-28 — GuruRMM Discovery Testing + Bug Fixes
## User
- **User:** Howard Enos (howard)
- **Machine:** Howard-Home
- **Role:** tech
---
## Session Summary
Howard installed the GuruRMM agent on WIN-TG2STMODJG8 at site eeb5f001-447b-4c1e-adc8-e18db2be9b5b and wanted to test the network discovery feature — specifically whether it could find devices on the network and auto-install the agent on them.
Research confirmed that discovery is only partially implemented. TCP connect probing, reverse DNS, and ARP lookup have shipped. ICMP, ARP, and SNMP scanning and scheduled scans are not yet implemented (roadmap P2). Auto-installing the agent on discovered devices is not built: the "deploying" status is only a label, with no push-install mechanism behind it.
WIN-TG2STMODJG8 (agent ID eee9f26d-0dbc-4b8e-8e42-3a901b4ff73a) was configured as the discovery node for the site via API, with suggested subnet 172.16.0.0/23 auto-populated from the agent's network interface data. A scan was triggered and completed in ~16 seconds, finding 4 devices: 172.16.1.6 (TGC-SERVER, Windows, port 3389), 172.16.1.46 (WIN-TG2STMODJG8 itself, ports 135/445/5985), 172.16.1.136 (Windows, port 3389), 172.16.1.15 (Linux, port 22). All marked unmanaged.
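The suggested subnet is the node's interface address masked to its prefix length. The sketch below assumes the agent reports each interface as an IPv4 address plus a prefix length. The helpers `suggest_subnet` and `usable_hosts` are hypothetical, not GuruRMM functions. With WIN-TG2STMODJG8 at 172.16.1.46 on an assumed /23 mask, the network address is 172.16.0.0, and the range holds 510 usable hosts for the scan to walk.

```rust
use std::net::Ipv4Addr;

/// Hypothetical helper: the network (CIDR) that contains `addr` under
/// `prefix_len`, e.g. 172.16.1.46 with /23 -> "172.16.0.0/23".
fn suggest_subnet(addr: Ipv4Addr, prefix_len: u8) -> String {
    assert!(prefix_len <= 32, "IPv4 prefix length must be 0..=32");
    // Netmask with the top `prefix_len` bits set; /23 -> 255.255.254.0.
    // checked_shl returns None for a 32-bit shift (/0), which maps to mask 0.
    let mask = u32::MAX.checked_shl(32 - u32::from(prefix_len)).unwrap_or(0);
    let network = Ipv4Addr::from(u32::from(addr) & mask);
    format!("{}/{}", network, prefix_len)
}

/// Usable host addresses in a prefix: total minus network and broadcast
/// (/31 and /32 are special-cased).
fn usable_hosts(prefix_len: u8) -> u64 {
    assert!(prefix_len <= 32, "IPv4 prefix length must be 0..=32");
    match prefix_len {
        32 => 1,
        31 => 2,
        p => (1u64 << (32 - u32::from(p))) - 2,
    }
}
```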
Howard noticed the device count growing with each scan and asked about a timeout. Investigation confirmed two bugs: (1) no scan timeout — if the agent disconnects mid-scan, the scan record stays status=running forever; (2) no guard against triggering a second scan while one is running. Fixed in server/src/api/discovery.rs and server/src/db/discovery.rs: `expire_stale_scans()` marks any running scan older than 10 minutes as failed, and `has_running_scan()` blocks new triggers with HTTP 409 while a scan is active. Committed as c6f1f73.
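The timeout and the guard together decide whether a trigger is accepted. The sketch below models one site's scans in memory rather than in the discovery_scans table. `ScanStore`, its fields, and the `Err(409)` stand-in for an HTTP response are illustrative assumptions, not the code in c6f1f73. It follows the order listed for `trigger_scan()` under Configuration Changes: expire stale scans first, then refuse if a scan is still running.

```rust
use std::time::{Duration, Instant};

/// Mirrors the discovery_scans status values ('running', 'completed', 'failed').
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ScanStatus {
    Running,
    Completed,
    Failed,
}

#[derive(Debug)]
struct Scan {
    id: u32,
    status: ScanStatus,
    started_at: Instant,
}

/// Hypothetical in-memory stand-in for one site's rows in discovery_scans.
struct ScanStore {
    scans: Vec<Scan>,
    next_id: u32,
}

/// Cutoff from the fix: running scans older than 10 minutes are stale.
const STALE_AFTER: Duration = Duration::from_secs(10 * 60);

impl ScanStore {
    fn new() -> Self {
        ScanStore { scans: Vec::new(), next_id: 1 }
    }

    /// Analogue of expire_stale_scans(): mark old running scans as failed.
    fn expire_stale_scans(&mut self, now: Instant) -> usize {
        let mut expired = 0;
        for scan in self.scans.iter_mut() {
            if scan.status == ScanStatus::Running
                && now.duration_since(scan.started_at) > STALE_AFTER
            {
                scan.status = ScanStatus::Failed;
                expired += 1;
            }
        }
        expired
    }

    /// Analogue of has_running_scan().
    fn has_running_scan(&self) -> bool {
        self.scans.iter().any(|s| s.status == ScanStatus::Running)
    }

    /// Expire first, then guard; Err(409) stands in for HTTP 409 Conflict.
    fn trigger_scan(&mut self, now: Instant) -> Result<u32, u16> {
        self.expire_stale_scans(now);
        if self.has_running_scan() {
            return Err(409);
        }
        let id = self.next_id;
        self.next_id += 1;
        self.scans.push(Scan { id, status: ScanStatus::Running, started_at: now });
        Ok(id)
    }

    /// A scan that finishes normally (the agent reported results).
    fn complete(&mut self, id: u32) {
        if let Some(scan) = self.scans.iter_mut().find(|s| s.id == id) {
            scan.status = ScanStatus::Completed;
        }
    }
}
```

Expiring before the guard matters. Without it, a scan orphaned by a disconnected agent would make every later trigger return 409 indefinitely. A separate check followed by an insert can still race if two triggers arrive at the same moment. In PostgreSQL, a partial unique index over running scans is one way to enforce the rule in the database itself.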
The growing device count question was also resolved. The discovered_devices table is cumulative: it has a unique constraint on site_id + ip_address, and inserts upsert via ON CONFLICT DO UPDATE. The apparent growth came from early scans finding different IPs as the ARP cache populated. Once results stabilized, `new_devices: 0` confirmed that no duplicates were being created.
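An in-memory analogue of the UNIQUE (site_id, ip_address) upsert shows why repeated scans cannot inflate the table. This is a sketch with a hypothetical `Device` type and `upsert_scan_results` helper, not the SQL in server/src/db/discovery.rs. Re-reporting a known IP overwrites its row, and only keys that did not exist before count toward `new_devices`.

```rust
use std::collections::HashMap;

/// Hypothetical device record; only the fields needed for the illustration.
#[derive(Debug, Clone, PartialEq)]
struct Device {
    open_ports: Vec<u16>,
    last_seen_scan: u32,
}

/// In-memory analogue of discovered_devices keyed by (site_id, ip_address).
/// An existing key is overwritten (like ON CONFLICT DO UPDATE) rather than
/// duplicated. Returns how many keys were new, i.e. `new_devices`.
fn upsert_scan_results(
    table: &mut HashMap<(String, String), Device>,
    site_id: &str,
    scan_id: u32,
    results: &[(&str, Vec<u16>)],
) -> usize {
    let mut new_devices = 0;
    for (ip, ports) in results {
        let key = (site_id.to_string(), ip.to_string());
        let row = Device { open_ports: ports.clone(), last_seen_scan: scan_id };
        // insert() returns None only when the key was not present before.
        if table.insert(key, row).is_none() {
            new_devices += 1;
        }
    }
    new_devices
}
```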
Howard then reported a machine on the network that GuruRMM could not ping and the scanner did not find. Root cause: the ping check uses the system `ping` command (ICMP), which Windows Firewall blocks by default. The discovery scanner was TCP-only, so a host with every probed port firewalled was invisible to discovery, and one that also blocked ICMP was invisible to both. Fixed in agent/src/discovery/mod.rs by adding `ping_host()` as an ICMP fallback after TCP probing. If no TCP ports respond, the scanner runs `ping -n 1 -w 500 <ip>` (Windows) or `ping -c 1 -W 1 <ip>` (Linux). Hosts that respond to ICMP but have no open TCP ports now appear in discovery with `open_ports: []` and `os_hint: unknown`. A host that blocks both ICMP and every probed port remains invisible. Committed as fcf5833.
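The fallback order can be sketched as follows. The argument lists match the two `ping` commands above. `ping_args`, `detect`, and the `Seen` enum are hypothetical names. Treating a zero exit status as a reply is an assumption about how `ping_host()` reads the result.

```rust
use std::process::{Command, Stdio};

/// Arguments for one ICMP echo via the system ping binary:
/// Windows `-n 1 -w 500` (timeout in ms), Linux `-c 1 -W 1` (timeout in s).
fn ping_args(ip: &str, windows: bool) -> Vec<String> {
    let flags: &[&str] = if windows {
        &["-n", "1", "-w", "500"]
    } else {
        &["-c", "1", "-W", "1"]
    };
    flags
        .iter()
        .map(|f| f.to_string())
        .chain(std::iter::once(ip.to_string()))
        .collect()
}

/// ping_host()-style check: a zero exit status counts as a reply (assumption).
/// If ping cannot be launched at all, the host is treated as silent.
fn ping_host(ip: &str) -> bool {
    Command::new("ping")
        .args(ping_args(ip, cfg!(windows)))
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}

#[derive(Debug, PartialEq)]
enum Seen {
    /// At least one TCP port answered; ping is never run.
    Tcp(Vec<u16>),
    /// No TCP port answered but ping did: reported with open_ports: [] and
    /// os_hint: unknown.
    IcmpOnly,
}

/// TCP results first; ICMP only as a fallback for TCP-silent hosts.
fn detect(open_ports: Vec<u16>, ping: impl FnOnce() -> bool) -> Option<Seen> {
    if !open_ports.is_empty() {
        Some(Seen::Tcp(open_ports))
    } else if ping() {
        Some(Seen::IcmpOnly)
    } else {
        None // silent on both TCP and ICMP: still not discovered
    }
}
```

The ping closure in `detect` runs only when no TCP port answered, which keeps the extra process spawn off the common path. One caveat is worth verifying: Windows `ping.exe` has been reported to exit with status 0 when the only reply is "Destination host unreachable". Checking the output for a `TTL=` line may therefore be more reliable than the exit code alone.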
---
## Key Decisions
- **Discovery does NOT auto-install agents** — the "deploying" status flag exists in the UI and DB but there is no actual push-install mechanism. This is a future P2 feature. Clearly communicated to Howard.
- **Scheduled scans not implemented** — the UI shows Daily/Weekly options but the backend scheduler is not wired up. On-demand only. Roadmap P2.
- **Discovery is not AI-driven** — once the node is configured (one-time setup through the dashboard UI), scans are triggered by button click or (future) schedule. No AI involvement at runtime. Howard confirmed this was his expectation.
- **ICMP fallback uses shell ping, not raw sockets** — raw ICMP sockets require elevated privileges and are blocked on Windows without manifest changes. Shell `ping` binary approach matches the existing checks.rs pattern and works within the agent's current privilege level.
- **Stale scan timeout set to 10 minutes**: long enough not to expire legitimate scans on large subnets. It is also short enough that a scan orphaned by a disconnected agent blocks new triggers for at most 10 minutes, since stale scans are expired at the start of each trigger.
- **HTTP 409 for concurrent scan guard** — standard REST conflict code; the dashboard's toast error handling will display the message to the user.
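The 10-minute cutoff can be sanity-checked against the values recorded under Reference Information: 9 default ports, concurrency 50, and 500 ms per connection. The sketch below computes a rough upper bound under these assumptions:

- all 510 usable hosts in 172.16.0.0/23 are probed on every port;
- every attempt times out;
- probes run in fixed batches of 50.

The batching model and the `worst_case_scan_ms` helper are hypothetical, and time spent on the ICMP fallback is not counted. Under these assumptions the scan needs 4,590 probes in 92 batches, about 46 seconds. That is well inside the 600-second cutoff and above the ~16 seconds observed.

```rust
/// Worst-case TCP probing time in ms if every connection attempt times out,
/// assuming probes run in fixed batches of `concurrency` and every batch waits
/// the full per-connection timeout. ICMP fallback time is not included.
fn worst_case_scan_ms(hosts: u64, ports: u64, concurrency: u64, timeout_ms: u64) -> u64 {
    assert!(concurrency > 0, "concurrency must be positive");
    let probes = hosts * ports;
    let batches = (probes + concurrency - 1) / concurrency; // ceiling division
    batches * timeout_ms
}
```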
---
## Problems Encountered
- **Push rejected (twice)** — Mike had pushed between commit and push during both fixes. Resolved both times with `git pull --rebase origin main && git push origin main`.
---
## Configuration Changes
- `projects/msp-tools/guru-rmm/server/src/db/discovery.rs` — added `expire_stale_scans()` and `has_running_scan()` functions
- `projects/msp-tools/guru-rmm/server/src/api/discovery.rs` — wired both into `trigger_scan()`: expire stale scans before each trigger, block if already running (HTTP 409)
- `projects/msp-tools/guru-rmm/agent/src/discovery/mod.rs` — added `ping_host()` ICMP fallback; updated comment in run_scan to reflect fallback logic
- `projects/msp-tools/guru-rmm` (submodule) — advanced to fcf5833
---
## Credentials & Secrets
None created or modified this session.
---
## Infrastructure & Servers
- **GuruRMM dashboard:** https://rmm.azcomputerguru.com
- **GuruRMM server:** 172.16.3.30:3001
- **Discovery test site:** eeb5f001-447b-4c1e-adc8-e18db2be9b5b
- **Discovery node agent:** WIN-TG2STMODJG8 (eee9f26d-0dbc-4b8e-8e42-3a901b4ff73a), online
- **Subnet scanned:** 172.16.0.0/23
---
## Commands & Outputs
```bash
# Authenticate
POST /api/auth/login { email, password }
# → JWT token
# Get agents at site
GET /api/agents?site_id=eeb5f001-447b-4c1e-adc8-e18db2be9b5b
# → 10 agents; WIN-TG2STMODJG8 online
# Get suggested subnets
GET /api/agents/eee9f26d.../discovery/subnets
# → ["172.16.0.0/23"]
# Configure discovery node
POST /api/agents/eee9f26d.../discovery { site_id, ip_ranges: ["172.16.0.0/23"], ... }
# → node created
# Trigger scan
POST /api/sites/eeb5f001.../discovery/scan
# → scan_id: 6c25d374-..., status: initiated
# Completed in ~16s, devices_found: 4, new_devices: 4
# Second scan (dedup confirmation)
# → devices_found: 9, new_devices: 0 (stable, no duplicates)
# gururmm commits
c6f1f73 fix(discovery): add scan timeout and in-progress guard
fcf5833 fix(discovery): add ICMP ping fallback for TCP-silent hosts
```
---
## Pending / Incomplete Tasks
- **Discovery auto-deploy (P2):** Not built. Needs a mechanism to push the agent installer to a discovered Windows machine, likely SMB + PsExec or WMI with tech-provided credentials. Would be a significant new feature.
- **Discovery scheduling (P2):** UI has Daily/Weekly options but backend scheduler not implemented. Needs a background task on the server.
- **New agent build + deploy needed:** Both discovery fixes are in the codebase but have no effect until deployed. The server fixes (timeout + concurrency guard) need the next build deployed to 172.16.3.30. The agent fix (ICMP fallback) also needs the updated agent running on the discovery node, WIN-TG2STMODJG8.
- **SPEC-012 implementation:** Sortable table headers, 4h estimate, no blockers.
- **SPEC-013 (P3):** Deferred — revisit after file transfer (P2) ships.
- **SPEC-014 follow-up (Mike's):** Policy tab UI for watch rules; push rules to agent on connect.
- **Cascades pending migration:** Ashley Jensen folder redirect, RECEPTIONIST-PC drives, NURSESTATION-PC HIPAA GPO, Nurses credential vault, Phase 3 domain joins, Entra Connect OU expansion, M365 relicensing (time-sensitive).
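One possible shape for the missing discovery scheduler, offered only as a sketch, is a periodic server task that checks each discovery node's schedule against its last run time. The `ScanSchedule` enum and `is_due` helper are hypothetical. Only the Daily and Weekly options come from the UI.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical schedule options mirroring the UI's Daily/Weekly choices.
#[derive(Debug, Clone, Copy)]
enum ScanSchedule {
    Daily,
    Weekly,
}

impl ScanSchedule {
    fn interval(self) -> Duration {
        match self {
            ScanSchedule::Daily => Duration::from_secs(24 * 60 * 60),
            ScanSchedule::Weekly => Duration::from_secs(7 * 24 * 60 * 60),
        }
    }
}

/// Called by a background task on each tick for every discovery node.
/// A node that has never scanned is due immediately.
fn is_due(schedule: ScanSchedule, last_run: Option<SystemTime>, now: SystemTime) -> bool {
    match last_run {
        None => true,
        Some(last) => match now.duration_since(last) {
            Ok(elapsed) => elapsed >= schedule.interval(),
            Err(_) => false, // clock moved backwards; wait for a later tick
        },
    }
}
```

Routing scheduled runs through the same trigger path as the dashboard button would keep the stale-scan expiry and HTTP 409 guard in effect for them too.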
---
## Reference Information
- **Discovery node agent:** WIN-TG2STMODJG8 — eee9f26d-0dbc-4b8e-8e42-3a901b4ff73a
- **Discovery test site ID:** eeb5f001-447b-4c1e-adc8-e18db2be9b5b
- **Scan timeout fix commit:** c6f1f73 (server: expire_stale_scans 10min + 409 on concurrent)
- **ICMP fallback commit:** fcf5833 (agent: ping_host() fallback in discovery/mod.rs)
- **Default scan ports:** 22, 80, 135, 443, 445, 3389, 5985, 9100, 161
- **Scan concurrency:** 50 (configurable), timeout_ms: 500 per connection
- **discovery_nodes table:** agent_id PK, scan_config JSONB
- **discovered_devices table:** UNIQUE (site_id, ip_address) — cumulative across scans
- **discovery_scans table:** status IN ('running','completed','failed')
- **Syncro note:** POST /tickets/{id}/comments and POST /tickets/{id}/invoice return 404 for large-format ticket IDs. Workaround: POST /invoices (top-level) works. Comments require GUI.
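The TCP connect probing that discovery relies on can be sketched with the standard library's `TcpStream::connect_timeout`, using the default ports and 500 ms per-connection timeout listed above. This is an illustration, not the agent's code. `probe_ports` is hypothetical, and it spends its concurrency budget on the ports of a single host. How the real scanner distributes its 50 concurrent probes is not recorded here.

```rust
use std::net::{IpAddr, SocketAddr, TcpStream};
use std::thread;
use std::time::Duration;

/// Default scan ports from the reference list.
const DEFAULT_PORTS: [u16; 9] = [22, 80, 135, 443, 445, 3389, 5985, 9100, 161];

/// Returns the ports on `ip` that accept a TCP connection within `timeout`,
/// probing at most `concurrency` ports at a time on scoped threads.
fn probe_ports(ip: IpAddr, ports: &[u16], concurrency: usize, timeout: Duration) -> Vec<u16> {
    assert!(concurrency > 0, "concurrency must be positive");
    let mut open = Vec::new();
    for chunk in ports.chunks(concurrency) {
        // Every thread in the chunk is joined before the scope returns.
        let results: Vec<Option<u16>> = thread::scope(|s| {
            let handles: Vec<_> = chunk
                .iter()
                .map(|&port| {
                    s.spawn(move || {
                        let addr = SocketAddr::new(ip, port);
                        TcpStream::connect_timeout(&addr, timeout).ok().map(|_| port)
                    })
                })
                .collect();
            handles.into_iter().map(|h| h.join().unwrap_or(None)).collect()
        });
        open.extend(results.into_iter().flatten());
    }
    open
}
```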