guru-connect/docs/specs/SPEC-006-universal-machine-search.md
Mike Swanson 0eb38520ed spec: add SPEC-006 universal machine search
Single search box matching case-insensitive substring across ALL machine
attributes (OS, logged-on user, external/private IP, company, site, tag,
serial, MAC, client version, ...) server-side, ScreenConnect-style. Replaces
the dashboard's hostname/agent_id-only client filter (inadequate at ~900+
machines). pg_trgm GIN index over a concatenated searchable-text expression
(INET cast to text, tags via array_to_string); multi-term AND; optional
field-scoped syntax (os:/user:/ip:). Parameterized + fixed column allowlist
(no injection), admin-guarded, DoS-capped. Depends on SPEC-003 (attrs must be
persisted to be searchable); reuses SPEC-005 enriched payload. Requested by
Mike 2026-05-30.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 16:21:10 -07:00


SPEC-006: Universal Machine Search ("Everything Is Searchable")

Status: Proposed | Priority: P2 | Requested By: Mike (2026-05-30) | Estimated Effort: Medium

Overview

Give the Operator Console a single search box that matches a query against any attribute of any guest machine and returns all matches (OS, logged-on user, IP, company, site, tag, serial, MAC, version, anything), the way ScreenConnect does. Today the dashboard filters only on hostname/agent_id, client-side (MachinesPage.tsx:46), so across a ~900-machine fleet there is no way to find machines by any other attribute. Success means: typing "windows 11" finds every Win11 box, "fred" finds every machine where Fred is (or was last reported) logged on, and "98.97" finds the machine at that IP, all server-side, fast, case-insensitive substring matching across the whole fleet.

Examples (user-stated): search an OS string → all machines with that OS; a username → all machines with that logged-on user; an IP (external or private) → the matching machine(s).

Scope

Included in v1

  • Server-side multi-attribute search on GET /api/machines?q=<text> (extending list_machines). By default each term is a case-insensitive substring that may match any of these fields: hostname, agent_id/machine_uid, organization (company), site, department, device_type, tags, logged_on_user, os_name/os_version (+locale), agent_version (client version), manufacturer, model, serial_number, machine_description, external_ip, private_ip, mac_address, status.
  • Multi-term AND: space-separated terms all must match (each term may match a different field) — "windows fred" = Win machine where Fred is logged on.
  • Performance: a Postgres index that supports substring search across all searchable columns (see Architecture), so search stays fast at fleet scale.
  • IP/array handling: external_ip/private_ip (INET, from SPEC-003) matched as text for partial-IP queries ("98.97"); tags (TEXT[]) matched per-element.
  • Wire the dashboard search box to the server query (debounced), replacing the hostname-only client filter; render results through the SPEC-005 list view.
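The match-any-field, AND-of-terms semantics above can be pinned down as a small in-memory reference matcher, which is also handy as a test oracle for the SQL path. This is a minimal Rust sketch, not the server's code: the Machine struct and its trimmed field set are hypothetical stand-ins for a connect_machines row, and IPs are rendered with IpAddr's Display, which approximates what host(inet) returns.

```rust
use std::net::IpAddr;

/// Hypothetical, trimmed-down view of a connect_machines row; the real field
/// set comes from SPEC-003/SPEC-005.
pub struct Machine {
    pub hostname: String,
    pub os_name: Option<String>,
    pub logged_on_user: Option<String>,
    pub external_ip: Option<IpAddr>,
    pub private_ip: Option<IpAddr>,
    pub tags: Vec<String>,
}

impl Machine {
    /// Every searchable attribute as lowercase text. Missing (NULL) values are
    /// skipped, IPs are rendered as text so partial-IP queries work, and each
    /// tag is its own value so terms are matched per element.
    fn searchable_values(&self) -> Vec<String> {
        let mut values = vec![self.hostname.to_lowercase()];
        for text in [&self.os_name, &self.logged_on_user].iter() {
            if let Some(s) = text {
                values.push(s.to_lowercase());
            }
        }
        for ip in [&self.external_ip, &self.private_ip].iter() {
            if let Some(addr) = ip {
                values.push(addr.to_string());
            }
        }
        values.extend(self.tags.iter().map(|t| t.to_lowercase()));
        values
    }
}

/// Reference semantics: the query is split on whitespace and EVERY term must
/// appear, case-insensitively, as a substring of at least one attribute.
/// Different terms may hit different attributes; an empty query matches all.
pub fn machine_matches(m: &Machine, q: &str) -> bool {
    let values = m.searchable_values();
    q.split_whitespace()
        .map(|term| term.to_lowercase())
        .all(|term| values.iter().any(|v| v.contains(term.as_str())))
}
```

Note that an unquoted "windows 11" is two AND-ed terms, so "11" can also be satisfied by another field (a hostname or IP containing 11); only the quoted-phrase syntax would force "windows 11" to match as one substring.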

Nice-to-have (v1 if cheap, else fast-follow)

  • Field-scoped syntax like ScreenConnect: os:windows, user:fred, ip:98.97, company:"arizona computer", tag:winter, version:23.9. A known prefix restricts the match to that column; unprefixed terms stay match-any-field; quotes group a phrase.
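A sketch of how a fixed allowlist keeps scoped prefixes from ever becoming arbitrary SQL. The prefix-to-column table and the Term type are hypothetical (column names follow the Included in v1 list); an unknown prefix, or a known prefix with an empty value, falls back to a literal unscoped term, which matches the unit-test expectation in Testing strategy.

```rust
/// Hypothetical compile-time allowlist: prefix -> column expression.
/// Only these strings can ever reach the SQL text; values are bound separately.
const SCOPED_FIELDS: &[(&str, &str)] = &[
    ("os", "os_name"),
    ("user", "logged_on_user"),
    ("ip", "concat_ws(' ', host(external_ip), host(private_ip))"),
    ("company", "organization"),
    ("tag", "tags"),
    ("version", "agent_version"),
];

#[derive(Debug, PartialEq)]
pub enum Term {
    /// Prefix matched the allowlist: restrict the match to this column.
    Scoped { column: &'static str, value: String },
    /// No prefix, or an unknown one: match against every field as literal text.
    Any(String),
}

/// Classify one already-tokenized term (quotes have been stripped).
pub fn classify(token: &str) -> Term {
    if let Some((prefix, value)) = token.split_once(':') {
        let prefix = prefix.to_ascii_lowercase();
        if !value.is_empty() {
            if let Some(&(_, column)) = SCOPED_FIELDS.iter().find(|&&(p, _)| p == prefix) {
                return Term::Scoped { column, value: value.to_string() };
            }
        }
    }
    Term::Any(token.to_string())
}
```

The "tag" entry maps to the tags array, which the query builder compiles to = ANY(tags) rather than a substring match.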

Explicitly out of scope

  • Saved searches / smart groups / search-driven "session groups" (ScreenConnect's + Create Session Group) — separate feature.
  • Boolean OR / NOT / regex / fuzzy ranking — v1 is AND-of-substring; relevance ranking deferred.
  • Searching live-only state not persisted to the DB (current viewer/host connection) — exposed as a secondary structured filter (online/offline), not free-text, since it lives in the in-memory SessionManager, not connect_machines.
  • Full event/session-history search — this spec is machine search.

Architecture

  • Relay-server / DB (the core): add search to db::machines and list_machines (main.rs:636). Query shape:
    • Maintain a searchable-text representation of each connect_machines row and index it for substring search. Two viable approaches (decide in planning):
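      1. pg_trgm GIN index (gin_trgm_ops) over a concatenated expression (lower(hostname||' '||coalesce(organization,'')||' '||…||' '||coalesce(host(external_ip),'')||…)). Every nullable column needs coalesce(), because text || NULL is NULL and would blank the row's whole searchable text. Supports ILIKE '%term%' on arbitrary substrings with index acceleration, provided the query repeats the indexed expression exactly. Simple, and the closest match to ScreenConnect's substring semantics.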
      2. A search_text column maintained on write (via the same path as update_machine_metadata/update_machine_inventory) + pg_trgm GIN on it. Slightly more plumbing, but a cleaner query and easier to weight later.
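    • tags TEXT[] folded into the text via array_to_string(tags,' '); INET via host(external_ip)/host(private_ip). Caveat for option 1: array_to_string is marked STABLE, not IMMUTABLE, so Postgres rejects it inside an index expression; wrap it in an IMMUTABLE SQL function or fold tags in via option 2.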
    • Field-scoped terms compile to a targeted col ILIKE '%v%' (or = ANY(tags) for tag:); unscoped terms to the concatenated-text match; all terms AND-ed.
  • API: list_machines gains q (and reuses the SPEC-005 enriched MachineInfo payload so results render fully). Same AuthenticatedUser admin guard.
  • Dashboard: the existing search input drives listMachines({ q }) (debounced ~250ms) in dashboard/src/api/machines.ts; remove the hostname-only client filter in MachinesPage.tsx. Show result count + "no matches" empty state.
  • Protobuf: none — server/dashboard/DB only. Searchability of inventory fields is entirely dependent on SPEC-003 persisting them.
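With option 1 the planner only uses the expression index when the WHERE clause repeats the indexed expression exactly, so it helps to derive both from one definition. A Rust sketch under assumptions: the column lists are abbreviated, the index name connect_machines_search_trgm is hypothetical, and immutable_array_to_string stands for a hypothetical IMMUTABLE SQL wrapper around array_to_string.

```rust
/// Hypothetical, abbreviated column lists (the full list is in Scope).
const TEXT_COLUMNS: &[&str] = &["hostname", "organization", "site", "os_name", "logged_on_user"];
const INET_COLUMNS: &[&str] = &["external_ip", "private_ip"];

/// Single definition of the searchable-text expression. Every operand is
/// wrapped in coalesce(..., '') because `text || NULL` yields NULL in Postgres.
pub fn search_expr() -> String {
    let mut parts: Vec<String> = TEXT_COLUMNS
        .iter()
        .map(|c| format!("coalesce({}, '')", c))
        .collect();
    // host() renders an INET address as text without the netmask.
    parts.extend(INET_COLUMNS.iter().map(|c| format!("coalesce(host({}), '')", c)));
    // Hypothetical IMMUTABLE wrapper; plain array_to_string is STABLE.
    parts.push("coalesce(immutable_array_to_string(tags, ' '), '')".to_string());
    format!("lower({})", parts.join(" || ' ' || "))
}

/// Migration DDL: a trigram GIN index over exactly that expression.
pub fn index_ddl() -> String {
    format!(
        "CREATE INDEX IF NOT EXISTS connect_machines_search_trgm \
         ON connect_machines USING gin (({}) gin_trgm_ops)",
        search_expr()
    )
}

/// Query predicate for one unscoped term bound at placeholder $n.
pub fn unscoped_predicate(n: usize) -> String {
    format!("{} ILIKE ${}", search_expr(), n)
}
```

Since sqlx migrations are static .sql files, the generated DDL would be committed once; a unit test comparing the committed migration text against index_ddl() would then catch drift between the index and the query.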

Implementation details

  • Files to touch:
    • server/migrations/: new migration with CREATE EXTENSION IF NOT EXISTS pg_trgm; plus the GIN index and the optional search_text column. Idempotent, applied at startup by sqlx::migrate!(), never pre-applied via psql.
    • server/src/db/machines.rs: search_machines(q), or get_all_machines gaining a filter; parameterized, never string-interpolated.
    • server/src/main.rs:636: list_machines reads q.
    • dashboard/src/features/machines/MachinesPage.tsx:46: drop the local filter and call the server.
    • dashboard/src/api/machines.ts: add the q param.
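  • Key logic: tokenize on whitespace (respecting quotes), classify each token as scoped or unscoped, and build a parameterized WHERE term1 AND term2 … clause. Escape the LIKE metacharacters (%, _, \) in each bound value so user text matches literally. Cap query length and term count.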
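The tokenizer step, with the length and term-count caps, might look like the following sketch. Both limits are placeholder values (this spec does not fix them), and letting an unterminated quote run to the end of the input is one reasonable choice rather than a decided behavior.

```rust
/// Placeholder limits; the real caps are still to be chosen.
pub const MAX_QUERY_CHARS: usize = 200;
pub const MAX_TERMS: usize = 8;

#[derive(Debug, PartialEq)]
pub enum QueryError {
    TooLong,
    TooManyTerms,
}

/// Split on whitespace, keeping double-quoted runs together and dropping the
/// quote characters, so `company:"arizona computer"` becomes the single token
/// `company:arizona computer`. An unterminated quote runs to end of input.
pub fn tokenize(q: &str) -> Result<Vec<String>, QueryError> {
    if q.chars().count() > MAX_QUERY_CHARS {
        return Err(QueryError::TooLong);
    }
    let mut tokens = Vec::new();
    let mut current = String::new();
    let mut in_quotes = false;
    for c in q.chars() {
        if c == '"' {
            in_quotes = !in_quotes;
        } else if c.is_whitespace() && !in_quotes {
            // Token boundary; collapse runs of whitespace.
            if !current.is_empty() {
                tokens.push(std::mem::take(&mut current));
            }
        } else {
            current.push(c);
        }
    }
    if !current.is_empty() {
        tokens.push(current);
    }
    if tokens.len() > MAX_TERMS {
        return Err(QueryError::TooManyTerms);
    }
    Ok(tokens)
}
```

A handler would map either error to a 400 response before any SQL is built.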

Security considerations

  • SQL injection: all terms are bound parameters ($n), never concatenated into SQL. Column list for scoped search is a fixed allowlist — a field: prefix can only map to a known column, never arbitrary SQL.
  • Admin-authenticated only (AuthenticatedUser), same as the existing machines list — no new unauthenticated surface; search exposes nothing the list doesn't already.
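  • DoS guard: cap query length, number of terms, and result page size. pg_trgm keeps substring matching index-backed for terms of three or more characters; shorter terms yield no trigrams and degrade to a full index scan, which is still cheap at ~900 rows, so the caps are the real guard.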
  • Treat the query as untrusted text; the response is the same admin-only machine data.
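To make the injection argument concrete, here is a sketch of the WHERE-clause builder: the SQL text is assembled only from $n placeholders and trusted column expressions, while every piece of user text lands in the bind list. SearchTerm and build_where are hypothetical names, and the caller is assumed to pass the exact indexed expression as search_expr. Bound parameters alone do not stop % or _ from acting as LIKE wildcards, so values are escaped with backslash, Postgres's default LIKE escape character.

```rust
/// Hypothetical parsed term: `column` is Some(..) only when a prefix matched
/// the compile-time allowlist; `value` is raw user text.
pub struct SearchTerm {
    pub column: Option<&'static str>,
    pub value: String,
}

/// Escape LIKE metacharacters so user text matches literally.
fn escape_like(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        if c == '\\' || c == '%' || c == '_' {
            out.push('\\');
        }
        out.push(c);
    }
    out
}

/// Returns (SQL for after WHERE, bind values in placeholder order).
/// User text never enters the SQL string.
pub fn build_where(terms: &[SearchTerm], search_expr: &str) -> (String, Vec<String>) {
    let mut clauses = Vec::new();
    let mut params = Vec::new();
    for t in terms {
        let n = params.len() + 1;
        match t.column {
            // Exact, case-sensitive element match, as the spec's tag: rule states.
            Some("tags") => {
                clauses.push(format!("${} = ANY(tags)", n));
                params.push(t.value.clone());
            }
            Some(col) => {
                clauses.push(format!("{} ILIKE ${}", col, n));
                params.push(format!("%{}%", escape_like(&t.value)));
            }
            None => {
                clauses.push(format!("{} ILIKE ${}", search_expr, n));
                params.push(format!("%{}%", escape_like(&t.value)));
            }
        }
    }
    if clauses.is_empty() {
        ("TRUE".to_string(), params) // empty q: the full list
    } else {
        (clauses.join(" AND "), params)
    }
}
```

Because the SQL string is dynamic, it would go through sqlx's runtime query_as function with one .bind() per parameter in order, rather than the compile-time-checked query! macros, which need a static string.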

Testing strategy

  • Unit: tokenizer (quotes, multi-term, scoped vs. unscoped); query builder emits parameterized SQL with the right AND structure; scoped field: maps only to allowlisted columns (unknown prefix → treated as literal text, not an error/injection).
  • Integration (seeded DB): "windows 11" returns all Win11 rows; a username returns rows with that logged_on_user; a partial IP returns rows whose external_ip/private_ip contains it; a tag value returns tagged rows; multi-term ANDs correctly; empty q returns the full list unchanged.
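  • Performance: with ~1k seeded rows, a broad substring query can use the GIN index (EXPLAIN shows a Bitmap Index Scan on the trigram index rather than a Seq Scan) and returns within target latency. At this table size the planner may legitimately prefer a Seq Scan, so run the plan assertion with SET enable_seqscan = off or against a larger seeded table.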
  • Manual: on the live console, reproduce the user's three examples (OS, username, IP).

Effort estimate & dependencies

  • Size: Medium. The query builder + index is the bulk; the API/dashboard wiring is small. Field-scoped syntax adds a little parsing.
  • Depends on: SPEC-003 — the inventory attributes (OS detail, user, IP, MAC, serial, version, device type) must be persisted on connect_machines to be searchable; without it, search covers only the handful of existing columns. Reuses SPEC-005's enriched payload to render results; benefits from SPEC-004 dedup so results aren't padded with ghost duplicates. Migration orders after SPEC-003/004's.
  • Unblocks: fast fleet triage ("who's on this IP / running this OS / logged in as X"), and the saved-search / smart-group follow-up.

Open questions

  1. Index strategy — concatenated-expression pg_trgm GIN vs. a maintained search_text column. Proposed: start with the expression index (less write plumbing); move to search_text if weighting/ranking is wanted later.
  2. Field-scoped syntax in v1 or fast-follow? Default match-any-field covers the stated use cases; scoped syntax is polish.
  3. Result cap / pagination — return all matches, or page (limit/offset)? At ~1k rows a cap with "N more" may suffice; confirm.
  4. Include live online/host state as a filter — free-text can't reach in-memory state; offer status:online as a structured filter that the server resolves against the SessionManager, or keep search DB-only in v1?