spec: add SPEC-006 universal machine search

Single search box matching case-insensitive substring across ALL machine
attributes (OS, logged-on user, external/private IP, company, site, tag,
serial, MAC, client version, ...) server-side, ScreenConnect-style. Replaces
the dashboard's hostname/agent_id-only client filter (inadequate at ~900+
machines). pg_trgm GIN index over a concatenated searchable-text expression
(INET cast to text, tags via array_to_string); multi-term AND; optional
field-scoped syntax (os:/user:/ip:). Parameterized + fixed column allowlist
(no injection), admin-guarded, DoS-capped. Depends on SPEC-003 (attrs must be
persisted to be searchable); reuses SPEC-005 enriched payload. Requested by
Mike 2026-05-30.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 16:21:10 -07:00
parent cdc182f0fb
commit 0eb38520ed
2 changed files with 146 additions and 0 deletions


@@ -51,6 +51,7 @@ Bringing GC to parity with GuruRMM's release engineering. Full plan: [SPEC-001](
- [ ] **Stable machine identity + session lifecycle reaping + operator removal** — P1 — give the agent a deterministic machine-derived `machine_uid` (Windows `MachineGuid`-based) so the same box can't register duplicates (root cause: `agent_id` is a config-file random UUID that a portable/misconfigured run regenerates each launch); key registration on it; add TTL reaping + same-machine supersede as defense-in-depth; and admin-gated per-row + multi-select bulk removal of stale sessions/units. Identity must be bound to the per-machine agent key (spoof guard). Fixes ghost-session accumulation seen on the live console (15 sessions / 0 live, ~10 orphans for one machine). ([SPEC-004](specs/SPEC-004-session-lifecycle-and-removal.md))
- [ ] **Machines list view — dual connection indicators + rich rows** — P2 — ScreenConnect "Access"-list parity: per-row Host/Guest two-segment connection bar (Guest=agent online, Host=viewer connected, with names + durations) and rich inline metadata (company, site, device type, tags, logged-on user + idle, client version in red when outdated). Server-enriches `/api/machines` with live session state + SPEC-003 inventory. ([SPEC-005](specs/SPEC-005-machines-list-view-parity.md))
- [ ] Machines "by Company" tree nav with per-company counts — P3 — left-nav grouping sidebar (screenshot parity). Follow-up sub-item of SPEC-005.
- [ ] **Universal machine search ("everything is searchable")** — P2 — server-side `?q=` on `/api/machines` matching case-insensitive substring across ALL attributes (OS, logged-on user, external/private IP, company, site, tag, serial, MAC, version, …), pg_trgm GIN-indexed; multi-term AND + optional field-scoped syntax (`os:`, `user:`, `ip:`). Replaces the hostname-only client filter. Depends on SPEC-003 (attrs must be persisted). ([SPEC-006](specs/SPEC-006-universal-machine-search.md))
- [ ] Programmatic session pre-create + viewer-token (integration contract) — P2
## Security & Infrastructure


@@ -0,0 +1,145 @@
# SPEC-006: Universal Machine Search ("Everything Is Searchable")
**Status:** Proposed
**Priority:** P2
**Requested By:** Mike (2026-05-30)
**Estimated Effort:** Medium
## Overview
Give the Operator Console a single search box that matches a query against **any**
attribute of any guest machine and returns all matches — OS, logged-on user, IP, company,
site, tag, serial, MAC, version, anything — exactly like ScreenConnect. Today the
dashboard filters only `hostname`/`agent_id` client-side (`MachinesPage.tsx:46`), which
is useless across a ~900-machine fleet. Success = typing "windows 11" finds every Win11
box, "fred" finds every machine whose persisted `logged_on_user` is Fred, and "98.97"
finds the machine(s) whose IP contains that string: server-side, fast, case-insensitive
substring, across the whole fleet.
**Examples (user-stated):** search an OS string → all machines with that OS; a username →
all machines with that logged-on user; an IP (external or private) → the matching
machine(s).
## Scope
### Included in v1
- **Server-side multi-attribute search** on `GET /api/machines?q=<text>` (extend
`list_machines`), default = **case-insensitive substring, match-any-field** across:
`hostname`, `agent_id`/`machine_uid`, `organization` (company), `site`, `department`,
`device_type`, `tags`, `logged_on_user`, `os_name`/`os_version` (+locale),
`agent_version` (client version), `manufacturer`, `model`, `serial_number`,
`machine_description`, `external_ip`, `private_ip`, `mac_address`, `status`.
- **Multi-term AND:** space-separated terms all must match (each term may match a
different field) — "windows fred" = Win machine where Fred is logged on.
- **Performance:** a Postgres index that supports substring search across all columns
(see Architecture) so search stays fast at fleet scale.
- **IP/array handling:** `external_ip`/`private_ip` (INET, from SPEC-003) matched as text
for partial-IP queries ("98.97"); `tags` (TEXT[]) matched per-element.
- Wire the dashboard search box to the server query (debounced), replacing the
hostname-only client filter; render results through the SPEC-005 list view.
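
The matching rule above (case-insensitive substring, match-any-field, multi-term AND) can
be pinned down with a small in-memory reference model. This is only a sketch of the
intended semantics, not the server implementation: the `Machine` struct is a hypothetical,
trimmed-down stand-in for a `connect_machines` row, and IPs are assumed to be already
rendered as text.

```rust
/// Hypothetical stand-in for a `connect_machines` row (assumption, not the real type).
pub struct Machine {
    pub hostname: String,
    pub organization: Option<String>,
    pub logged_on_user: Option<String>,
    pub os_name: Option<String>,
    pub external_ip: Option<String>, // INET rendered as text
    pub tags: Vec<String>,
}

/// Lower-cased text of every searchable field; NULL (None) fields are skipped.
fn searchable_fields(m: &Machine) -> Vec<String> {
    let mut fields = vec![m.hostname.to_lowercase()];
    for opt in [&m.organization, &m.logged_on_user, &m.os_name, &m.external_ip] {
        if let Some(v) = opt {
            fields.push(v.to_lowercase());
        }
    }
    // Tags are matched per element.
    fields.extend(m.tags.iter().map(|t| t.to_lowercase()));
    fields
}

/// True when every whitespace-separated term is a substring of at least one field.
/// An empty query has no terms, so it matches every machine.
pub fn matches(m: &Machine, q: &str) -> bool {
    let fields = searchable_fields(m);
    q.split_whitespace()
        .map(str::to_lowercase)
        .all(|term| fields.iter().any(|f| f.contains(&term)))
}
```

A model like this is handy as a test oracle: the SQL path should return exactly the rows
for which `matches` is true on the same seeded data.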
### Nice-to-have (v1 if cheap, else fast-follow)
- **Field-scoped syntax** like ScreenConnect: `os:windows`, `user:fred`, `ip:98.97`,
`company:"arizona computer"`, `tag:winter`, `version:23.9` — a known-prefix restricts
the match to that column; unprefixed terms stay match-any-field; quotes group a phrase.
### Explicitly out of scope
- Saved searches / smart groups / search-driven "session groups" (ScreenConnect's
`+ Create Session Group`) — separate feature.
- Boolean OR / NOT / regex / fuzzy ranking — v1 is AND-of-substring; relevance ranking
deferred.
- Searching **live-only** state not persisted to the DB (current viewer/host connection)
— exposed as a secondary structured filter (online/offline), not free-text, since it
lives in the in-memory `SessionManager`, not `connect_machines`.
- Full event/session-history search — this spec is machine search.
## Architecture
- **Relay-server / DB (the core):** add search to `db::machines` and `list_machines`
(`main.rs:636`). Query shape:
- Maintain a searchable-text representation of each `connect_machines` row and index it
for substring search. Two viable approaches (decide in planning):
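1. **`pg_trgm` GIN index** over a concatenated expression
(`lower(hostname||' '||coalesce(organization,'')||' '||…||' '||coalesce(host(external_ip),'')||…)`).
Accelerates `LIKE`/`ILIKE '%term%'` on arbitrary substrings, provided the query's `WHERE`
clause repeats the indexed expression exactly (including `lower(...)`) and every function
in it is `IMMUTABLE`. Every nullable operand needs `coalesce(…,'')`, because `||` with a
NULL yields NULL for the whole string. Simple, and the closest match to ScreenConnect's
substring semantics.
2. **Maintained `search_text` column** (written via the same path as
`update_machine_metadata`/`update_machine_inventory`, or declared as a stored generated
column if the expression qualifies) + `pg_trgm` GIN on it.
Slightly more plumbing, cleaner query, easier to weight later.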
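- `tags TEXT[]` folded into the text via `array_to_string(tags,' ')`; INET via
`host(external_ip)`/`host(private_ip)`. Note that PostgreSQL marks `array_to_string` as
`STABLE`, not `IMMUTABLE`, so using it inside an expression index needs an `IMMUTABLE`
wrapper function; the maintained-column approach avoids this.
- Field-scoped terms compile to a targeted `col ILIKE '%v%'` (or `= ANY(tags)` for
`tag:`, which is exact whole-tag equality rather than substring); unscoped terms to the
concatenated-text match; all terms AND-ed.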
- **API:** `list_machines` gains `q` (and reuses the SPEC-005 enriched `MachineInfo`
payload so results render fully). Same `AuthenticatedUser` admin guard.
- **Dashboard:** the existing search input drives `listMachines({ q })` (debounced ~250ms)
in `dashboard/src/api/machines.ts`; remove the hostname-only client filter in
`MachinesPage.tsx`. Show result count + "no matches" empty state.
- **Protobuf:** none — server/dashboard/DB only. Searchability of inventory fields is
entirely dependent on SPEC-003 persisting them.
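
As a rough illustration of the query shape described above, the sketch below turns
already-parsed terms into a `WHERE` fragment plus a list of bind values. Everything here
is an assumption for illustration: the `Term` enum, `build_where`, the `search_text`
placeholder for the searchable-text expression, and the exact prefix-to-column mapping.
It uses only `std`, so the fragment and values would still have to be handed to `sqlx`
as a parameterized query.

```rust
/// Hypothetical parsed term (assumption; see the tokenizer step for how one might be built).
pub enum Term {
    Any(String),
    Scoped { field: String, value: String },
}

/// Placeholder for the concatenated expression or a `search_text` column (assumption).
const SEARCH_EXPR: &str = "search_text";

/// Fixed allowlist: a prefix can only ever map to these SQL expressions.
fn columns_for(field: &str) -> Option<Vec<&'static str>> {
    let cols = match field {
        "os" => vec!["os_name", "os_version"],
        "user" => vec!["logged_on_user"],
        "ip" => vec!["host(external_ip)", "host(private_ip)"],
        "company" => vec!["organization"],
        _ => return None,
    };
    Some(cols)
}

/// Wrap a term in `%…%`, escaping LIKE metacharacters so they match literally.
fn like_pattern(term: &str) -> String {
    let mut p = String::from("%");
    for ch in term.chars() {
        if matches!(ch, '\\' | '%' | '_') {
            p.push('\\');
        }
        p.push(ch);
    }
    p.push('%');
    p
}

/// Build an AND-ed WHERE fragment with `$n` placeholders and the values to bind.
pub fn build_where(terms: &[Term]) -> (String, Vec<String>) {
    let mut clauses: Vec<String> = Vec::new();
    let mut params: Vec<String> = Vec::new();
    for term in terms {
        let (exprs, value) = match term {
            Term::Scoped { field, value } => match columns_for(field) {
                Some(cols) => (cols, value.clone()),
                // Unknown prefix: treat the whole token as literal text.
                None => (vec![SEARCH_EXPR], format!("{}:{}", field, value)),
            },
            Term::Any(value) => (vec![SEARCH_EXPR], value.clone()),
        };
        params.push(like_pattern(&value));
        let n = params.len();
        let ors: Vec<String> = exprs.iter().map(|e| format!("{} ILIKE ${}", e, n)).collect();
        clauses.push(format!("({})", ors.join(" OR ")));
    }
    if clauses.is_empty() {
        (String::from("TRUE"), params)
    } else {
        (clauses.join(" AND "), params)
    }
}
```

User text only ever ends up in the returned values, never in the SQL string; the SQL is
assembled solely from the constant expressions in the allowlist.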
## Implementation details
- Files to touch: `server/migrations/` (new migration: `CREATE EXTENSION IF NOT EXISTS
pg_trgm;` + the GIN index / optional `search_text` column — idempotent, startup-applied
by `sqlx::migrate!()`, never pre-applied via psql); `server/src/db/machines.rs`
(`search_machines(q)` or `get_all_machines` gaining a filter; parameterized, never
string-interpolated); `server/src/main.rs:636` (`list_machines` reads `q`);
`dashboard/src/features/machines/MachinesPage.tsx:46` (drop local filter, call server),
`dashboard/src/api/machines.ts` (q param).
- Key logic: tokenize on whitespace (respecting quotes), classify each token as
scoped/unscoped, build a parameterized `WHERE term1 AND term2 …`. Cap query length and
term count.
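
The tokenize-and-classify step could look roughly like the following. This is a minimal
sketch under stated assumptions: `Term`, `split_tokens`, `parse_query`, and the
`KNOWN_PREFIXES` list (taken from the examples in this spec) are illustrative names, and
only double quotes are treated as grouping characters.

```rust
#[derive(Debug, PartialEq)]
pub enum Term {
    /// Unprefixed: match against any field.
    Any(String),
    /// `field:value` where `field` is a known prefix.
    Scoped { field: String, value: String },
}

/// Prefixes from the spec's examples; the final list is an assumption.
const KNOWN_PREFIXES: [&str; 6] = ["os", "user", "ip", "company", "tag", "version"];

/// Split on whitespace outside double quotes; the quote characters are dropped.
fn split_tokens(q: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut current = String::new();
    let mut in_quotes = false;
    for ch in q.chars() {
        match ch {
            '"' => in_quotes = !in_quotes,
            c if c.is_whitespace() && !in_quotes => {
                if !current.is_empty() {
                    tokens.push(std::mem::take(&mut current));
                }
            }
            c => current.push(c),
        }
    }
    if !current.is_empty() {
        tokens.push(current);
    }
    tokens
}

/// Classify each token; an unknown or empty-valued prefix falls back to literal text.
pub fn parse_query(q: &str) -> Vec<Term> {
    split_tokens(q)
        .into_iter()
        .map(|tok| {
            if let Some((field, value)) = tok.split_once(':') {
                let field = field.to_lowercase();
                if !value.is_empty() && KNOWN_PREFIXES.iter().any(|p| *p == field) {
                    return Term::Scoped { field, value: value.to_lowercase() };
                }
            }
            Term::Any(tok.to_lowercase())
        })
        .collect()
}
```

Falling back to literal text for unknown prefixes means a MAC address such as
`aa:bb:cc:dd:ee:ff` stays searchable as a plain term instead of being rejected.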
## Security considerations
- **SQL injection:** all terms are bound parameters (`$n`), never concatenated into SQL.
Column list for scoped search is a fixed allowlist — a `field:` prefix can only map to a
known column, never arbitrary SQL.
- Admin-authenticated only (`AuthenticatedUser`), same as the existing machines list — no
new unauthenticated surface; search exposes nothing the list doesn't already.
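- **DoS guard:** cap query length, number of terms, and result page size. The `pg_trgm`
index keeps substring matches index-backed for terms that yield at least one trigram; a
term with no extractable trigrams (typically fewer than three characters) degenerates to
a full-index scan, which is tolerable at 900+ rows but should be kept in mind if the
fleet grows.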
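- Treat the query as untrusted text: besides binding it as a parameter, escape the `LIKE`
wildcards (`%`, `_`, `\`) in each term so they match literally. The response is the same
admin-only machine data.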
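
The caps in the DoS guard might be enforced before any parsing or SQL work, along these
lines. The limit values and the names `check_query`, `clamp_page_size`, and `QueryError`
are placeholders chosen for illustration; the spec does not fix any of them.

```rust
/// Illustrative limits only (assumptions; the spec leaves the numbers open).
pub const MAX_QUERY_CHARS: usize = 256;
pub const MAX_TERMS: usize = 8;
pub const MAX_PAGE_SIZE: u32 = 200;

#[derive(Debug, PartialEq)]
pub enum QueryError {
    TooLong,
    TooManyTerms,
}

/// Reject oversized queries up front. Terms are counted by whitespace here, which
/// over-counts quoted phrases slightly but never under-counts.
pub fn check_query(q: &str) -> Result<(), QueryError> {
    if q.chars().count() > MAX_QUERY_CHARS {
        return Err(QueryError::TooLong);
    }
    if q.split_whitespace().count() > MAX_TERMS {
        return Err(QueryError::TooManyTerms);
    }
    Ok(())
}

/// Clamp a caller-supplied page size into 1..=MAX_PAGE_SIZE; default to the maximum.
pub fn clamp_page_size(requested: Option<u32>) -> u32 {
    requested.unwrap_or(MAX_PAGE_SIZE).clamp(1, MAX_PAGE_SIZE)
}
```

Checking length in characters rather than bytes keeps the limit predictable for
non-ASCII usernames and company names.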
## Testing strategy
- **Unit:** tokenizer (quotes, multi-term, scoped vs. unscoped); query builder emits
parameterized SQL with the right AND structure; scoped `field:` maps only to allowlisted
columns (unknown prefix → treated as literal text, not an error/injection).
- **Integration (seeded DB):** "windows 11" returns all Win11 rows; a username returns
rows with that `logged_on_user`; a partial IP returns rows whose `external_ip`/
`private_ip` contains it; a tag value returns tagged rows; multi-term ANDs correctly;
empty `q` returns the full list unchanged.
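- **Performance:** with ~1k seeded rows, a broad substring query can use the GIN index and
returns within the target latency (to be set in planning). At this table size the planner
may legitimately prefer a seq scan, so confirm index usability with EXPLAIN under
`SET enable_seqscan = off` rather than relying on the default plan.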
- **Manual:** on the live console, reproduce the user's three examples (OS, username, IP).
## Effort estimate & dependencies
- **Size: Medium.** The query builder + index is the bulk; the API/dashboard wiring is
small. Field-scoped syntax adds a little parsing.
- **Depends on:** **SPEC-003** — the inventory attributes (OS detail, user, IP, MAC,
serial, version, device type) must be **persisted** on `connect_machines` to be
searchable; without it, search covers only the handful of existing columns. Reuses
**SPEC-005**'s enriched payload to render results; benefits from **SPEC-004** dedup so
results aren't padded with ghost duplicates. This spec's migration must be ordered after
those of SPEC-003 and SPEC-004.
- **Unblocks:** fast fleet triage ("who's on this IP / running this OS / logged in as X"),
and the saved-search / smart-group follow-up.
## Open questions
1. **Index strategy** — concatenated-expression `pg_trgm` GIN vs. a maintained
`search_text` column. Proposed: start with the expression index (less write plumbing);
move to `search_text` if weighting/ranking is wanted later.
2. **Field-scoped syntax in v1 or fast-follow?** Default match-any-field covers the stated
use cases; scoped syntax is polish.
3. **Result cap / pagination** — return all matches, or page (limit/offset)? At ~1k rows a
cap with "N more" may suffice; confirm.
4. **Include live online/host state as a filter** — free-text can't reach in-memory state;
offer `status:online` as a structured filter that the server resolves against the
SessionManager, or keep search DB-only in v1?