Single search box matching case-insensitive substring across ALL machine attributes (OS, logged-on user, external/private IP, company, site, tag, serial, MAC, client version, ...) server-side, ScreenConnect-style. Replaces the dashboard's hostname/agent_id-only client filter (inadequate at ~900+ machines). pg_trgm GIN index over a concatenated searchable-text expression (INET cast to text, tags via array_to_string); multi-term AND; optional field-scoped syntax (os:/user:/ip:). Parameterized + fixed column allowlist (no injection), admin-guarded, DoS-capped. Depends on SPEC-003 (attrs must be persisted to be searchable); reuses SPEC-005 enriched payload. Requested by Mike 2026-05-30. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# SPEC-006: Universal Machine Search ("Everything Is Searchable")

**Status:** Proposed
**Priority:** P2
**Requested By:** Mike (2026-05-30)
**Estimated Effort:** Medium

## Overview

Give the Operator Console a single search box that matches a query against **any**
attribute of any guest machine and returns all matches (OS, logged-on user, IP, company,
site, tag, serial, MAC, version, anything), exactly like ScreenConnect. Today the
dashboard filters only `hostname`/`agent_id` client-side (`MachinesPage.tsx:46`), which
is useless across a ~900-machine fleet. Success = typing "windows 11" finds every Win11
box, "fred" finds every machine where Fred is/was logged on, and "98.97" finds the
machine at that IP: server-side, fast, case-insensitive substring, across the whole
fleet.

**Examples (user-stated):** search an OS string → all machines with that OS; a username →
all machines with that logged-on user; an IP (external or private) → the matching
machine(s).

## Scope

### Included in v1

- **Server-side multi-attribute search** on `GET /api/machines?q=<text>` (extend
  `list_machines`), default = **case-insensitive substring, match-any-field** across:
  `hostname`, `agent_id`/`machine_uid`, `organization` (company), `site`, `department`,
  `device_type`, `tags`, `logged_on_user`, `os_name`/`os_version` (+locale),
  `agent_version` (client version), `manufacturer`, `model`, `serial_number`,
  `machine_description`, `external_ip`, `private_ip`, `mac_address`, `status`.
- **Multi-term AND:** space-separated terms all must match (each term may match a
  different field): "windows fred" = Win machine where Fred is logged on.
- **Performance:** a Postgres index that supports substring search across all columns
  (see Architecture) so search stays fast at fleet scale.
- **IP/array handling:** `external_ip`/`private_ip` (INET, from SPEC-003) matched as text
  for partial-IP queries ("98.97"); `tags` (TEXT[]) matched per-element.
- Wire the dashboard search box to the server query (debounced), replacing the
  hostname-only client filter; render results through the SPEC-005 list view.

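The default semantics above (case-insensitive substring, match-any-field, multi-term AND) can be pinned down with a small in-memory reference model. This is only a sketch for reasoning about expected results and for seeding test expectations; the function name `matches_all_terms` is hypothetical, and the real matching happens in SQL.

```rust
/// Reference model of the v1 default: every whitespace-separated term must
/// occur, case-insensitively, as a substring of at least one attribute value.
/// An empty query matches every machine.
fn matches_all_terms(attributes: &[&str], query: &str) -> bool {
    // Lowercase each attribute once so every term is compared the same way.
    let haystack: Vec<String> = attributes.iter().map(|a| a.to_lowercase()).collect();
    query
        .split_whitespace()
        .map(str::to_lowercase)
        // AND across terms, OR across fields for each term.
        .all(|term| haystack.iter().any(|value| value.contains(&term)))
}
```

With attributes such as a hostname, an OS string, a username, and two IPs, the query "windows fred" matches only when both terms are found somewhere in the row, while "98.97" matches on the partial IP alone.
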
### Nice-to-have (v1 if cheap, else fast-follow)

- **Field-scoped syntax** like ScreenConnect: `os:windows`, `user:fred`, `ip:98.97`,
  `company:"arizona computer"`, `tag:winter`, `version:23.9`. A known prefix restricts
  the match to that column; unprefixed terms stay match-any-field; quotes group a phrase.

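A minimal tokenizer sketch for the scoped syntax, assuming double quotes are the only grouping character and that the prefix list is exactly the six prefixes shown in the examples. `Term`, `KNOWN_PREFIXES`, `split_respecting_quotes`, and `parse_query` are hypothetical names; mapping a scope to concrete columns is left to the query builder.

```rust
/// Prefixes named in the spec's examples; the final allowlist is a planning decision.
const KNOWN_PREFIXES: &[&str] = &["os", "user", "ip", "company", "tag", "version"];

/// One parsed search term. `scope` is set only for an allowlisted prefix.
#[derive(Debug, PartialEq)]
struct Term {
    scope: Option<&'static str>,
    value: String,
}

/// Splits on whitespace, keeping double-quoted phrases together.
fn split_respecting_quotes(input: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut current = String::new();
    let mut in_quotes = false;
    for ch in input.chars() {
        match ch {
            // Quotes toggle grouping and are not part of the term itself.
            '"' => in_quotes = !in_quotes,
            c if c.is_whitespace() && !in_quotes => {
                if !current.is_empty() {
                    out.push(std::mem::take(&mut current));
                }
            }
            c => current.push(c),
        }
    }
    if !current.is_empty() {
        out.push(current);
    }
    out
}

/// Classifies each token; an unknown prefix stays part of the literal text.
fn parse_query(input: &str) -> Vec<Term> {
    split_respecting_quotes(input)
        .into_iter()
        .map(|token| {
            if let Some((prefix, rest)) = token.split_once(':') {
                let known = KNOWN_PREFIXES
                    .iter()
                    .find(|p| p.eq_ignore_ascii_case(prefix));
                if let (Some(scope), false) = (known, rest.is_empty()) {
                    return Term { scope: Some(*scope), value: rest.to_string() };
                }
            }
            Term { scope: None, value: token }
        })
        .collect()
}
```

Because only allowlisted prefixes are recognized, a MAC-like token such as `aa:bb:cc` falls through as plain unscoped text rather than being rejected.
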
### Explicitly out of scope

- Saved searches / smart groups / search-driven "session groups" (ScreenConnect's
  `+ Create Session Group`): separate feature.
- Boolean OR / NOT / regex / fuzzy ranking: v1 is AND-of-substring; relevance ranking
  deferred.
- Searching **live-only** state not persisted to the DB (current viewer/host connection).
  It is exposed as a secondary structured filter (online/offline), not free-text, since it
  lives in the in-memory `SessionManager`, not `connect_machines`.
- Full event/session-history search: this spec is machine search.

## Architecture

- **Relay-server / DB (the core):** add search to `db::machines` and `list_machines`
  (`main.rs:636`). Query shape:
  - Maintain a searchable-text representation of each `connect_machines` row and index it
    for substring search. Two viable approaches (decide in planning):
    1. **`pg_trgm` GIN index** over a concatenated expression
       (`lower(hostname||' '||coalesce(organization,'')||' '||…||' '||host(external_ip)||…)`).
       Supports `ILIKE '%term%'` on arbitrary substrings with index acceleration. Simple,
       matches ScreenConnect's substring semantics best. Every nullable operand needs
       `coalesce(…, '')`, because `||` yields NULL when any operand is NULL, and the query
       must repeat the indexed expression verbatim for the planner to use the index.
    2. **Generated `search_text` column** (maintained on write / via the same path as
       `update_machine_metadata`/`update_machine_inventory`) + `pg_trgm` GIN on it.
       Slightly more plumbing, cleaner query, easier to weight later.
  - `tags TEXT[]` folded into the text via `array_to_string(tags,' ')`; INET via
    `host(external_ip)`/`host(private_ip)`. To confirm in planning: `array_to_string` is
    declared `STABLE`, not `IMMUTABLE`, so using it inside an index expression or a
    generated column would need an immutable wrapper function.
  - Field-scoped terms compile to a targeted `col ILIKE '%v%'` (or `= ANY(tags)` for
    `tag:`, which is an exact, case-sensitive element match rather than a substring
    match); unscoped terms to the concatenated-text match; all terms AND-ed.
- **API:** `list_machines` gains `q` (and reuses the SPEC-005 enriched `MachineInfo`
  payload so results render fully). Same `AuthenticatedUser` admin guard.
- **Dashboard:** the existing search input drives `listMachines({ q })` (debounced ~250ms)
  in `dashboard/src/api/machines.ts`; remove the hostname-only client filter in
  `MachinesPage.tsx`. Show result count + "no matches" empty state.
- **Protobuf:** none (server/dashboard/DB only). Searchability of inventory fields is
  entirely dependent on SPEC-003 persisting them.

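For the maintained `search_text` option, the folding of attributes into one string could happen in the server on write. The sketch below assumes a simplified row type; `MachineRow` and `build_search_text` are hypothetical names, and only a handful of the spec's columns are shown.

```rust
use std::net::IpAddr;

/// Hypothetical subset of a `connect_machines` row; field names follow the spec's list.
struct MachineRow {
    hostname: String,
    organization: Option<String>,
    logged_on_user: Option<String>,
    os_name: Option<String>,
    external_ip: Option<IpAddr>,
    private_ip: Option<IpAddr>,
    tags: Vec<String>,
}

/// Folds every searchable attribute into one lowercased, space-separated string.
/// Missing (None) attributes are skipped instead of producing empty segments.
fn build_search_text(row: &MachineRow) -> String {
    let mut parts: Vec<String> = vec![row.hostname.clone()];
    parts.extend(row.organization.clone());
    parts.extend(row.logged_on_user.clone());
    parts.extend(row.os_name.clone());
    // IPs are rendered as plain text so partial-IP queries can match.
    parts.extend(row.external_ip.map(|ip| ip.to_string()));
    parts.extend(row.private_ip.map(|ip| ip.to_string()));
    parts.extend(row.tags.iter().cloned());
    parts.join(" ").to_lowercase()
}
```

Building the text in one place keeps the indexed value and the write path from drifting apart; the expression-index option achieves the same inside Postgres at the cost of repeating the expression in every query.
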
## Implementation details

- Files to touch:
  - `server/migrations/`: new migration with `CREATE EXTENSION IF NOT EXISTS pg_trgm;` +
    the GIN index / optional `search_text` column. Idempotent, startup-applied by
    `sqlx::migrate!()`, never pre-applied via psql.
  - `server/src/db/machines.rs`: `search_machines(q)` or `get_all_machines` gaining a
    filter; parameterized, never string-interpolated.
  - `server/src/main.rs:636`: `list_machines` reads `q`.
  - `dashboard/src/features/machines/MachinesPage.tsx:46`: drop local filter, call server.
  - `dashboard/src/api/machines.ts`: `q` param.
- Key logic: tokenize on whitespace (respecting quotes), classify each token as
  scoped/unscoped, build a parameterized `WHERE term1 AND term2 …`. Cap query length and
  term count.

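The key logic can be sketched as a builder that emits only placeholders and allowlisted column names, returning the bind values separately. This is an illustration under assumptions: `Term`, `SEARCH_TEXT`, and `build_where` are hypothetical names, scoped columns are assumed to be text-typed, and LIKE-wildcard escaping is left out for brevity.

```rust
/// A parsed term: `column` is Some(..) only when it came from the fixed allowlist.
struct Term<'a> {
    column: Option<&'static str>,
    value: &'a str,
}

/// Placeholder name for the concatenated searchable-text expression or column.
const SEARCH_TEXT: &str = "search_text";

/// Returns the WHERE clause and the values to bind, in `$1..$n` order.
/// User text only ever appears in the bind list, never in the SQL string.
fn build_where(terms: &[Term<'_>]) -> (String, Vec<String>) {
    let mut clauses: Vec<String> = Vec::new();
    let mut binds: Vec<String> = Vec::new();
    for term in terms {
        binds.push(format!("%{}%", term.value));
        let target = term.column.unwrap_or(SEARCH_TEXT);
        // The placeholder index is the position of the value just pushed.
        clauses.push(format!("{} ILIKE ${}", target, binds.len()));
    }
    if clauses.is_empty() {
        // An empty query filters nothing, matching "empty q returns the full list".
        ("TRUE".to_string(), binds)
    } else {
        (clauses.join(" AND "), binds)
    }
}
```

The returned values would then be bound one by one on the query, so the SQL string itself stays independent of what the user typed.
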
## Security considerations

- **SQL injection:** all terms are bound parameters (`$n`), never concatenated into SQL.
  Column list for scoped search is a fixed allowlist: a `field:` prefix can only map to a
  known column, never arbitrary SQL.
- Admin-authenticated only (`AuthenticatedUser`), same as the existing machines list: no
  new unauthenticated surface; search exposes nothing the list doesn't already.
- **DoS guard:** cap query length, number of terms, and result page size; `pg_trgm` keeps
  substring scans index-backed whenever a term yields at least one trigram. A pattern with
  no extractable trigrams (for example a one- or two-character term) degenerates to a
  full-index scan, so the caps above, or a minimum term length, are what bound the worst
  case across 900+ rows.
- Treat the query as untrusted text; the response is the same admin-only machine data.

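Two guards follow from treating the query as untrusted text: enforcing the caps before any SQL is built, and escaping LIKE metacharacters so that a typed `%` or `_` matches literally instead of acting as a wildcard. The sketch below is an assumption-laden illustration: the limit values and the names `check_limits` and `like_pattern` are hypothetical, and it relies on backslash being PostgreSQL's default LIKE escape character.

```rust
/// Hypothetical limits; the spec only says to cap query length and term count.
const MAX_QUERY_CHARS: usize = 256;
const MAX_TERMS: usize = 8;

/// Rejects oversized input before any SQL is built.
fn check_limits(query: &str) -> Result<(), String> {
    if query.chars().count() > MAX_QUERY_CHARS {
        return Err(format!("query longer than {} characters", MAX_QUERY_CHARS));
    }
    if query.split_whitespace().count() > MAX_TERMS {
        return Err(format!("more than {} terms", MAX_TERMS));
    }
    Ok(())
}

/// Escapes `\`, `%`, and `_` so user text matches literally, then wraps the
/// result in `%...%` for a substring match. The output is a bind value.
fn like_pattern(term: &str) -> String {
    let mut escaped = String::with_capacity(term.len() + 2);
    escaped.push('%');
    for ch in term.chars() {
        if matches!(ch, '\\' | '%' | '_') {
            escaped.push('\\');
        }
        escaped.push(ch);
    }
    escaped.push('%');
    escaped
}
```

Escaping is not an injection defense (bound parameters already cover that); it only keeps a term like `100%` from silently broadening the match.
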
## Testing strategy

- **Unit:** tokenizer (quotes, multi-term, scoped vs. unscoped); query builder emits
  parameterized SQL with the right AND structure; scoped `field:` maps only to allowlisted
  columns (unknown prefix → treated as literal text, not an error/injection).
- **Integration (seeded DB):** "windows 11" returns all Win11 rows; a username returns
  rows with that `logged_on_user`; a partial IP returns rows whose `external_ip`/
  `private_ip` contains it; a tag value returns tagged rows; multi-term ANDs correctly;
  empty `q` returns the full list unchanged.
- **Performance:** with ~1k seeded rows, a broad substring query uses the GIN index
  (EXPLAIN shows a Bitmap Index Scan on the trigram index, not a Seq Scan) and returns
  within target latency. The planner may still prefer a sequential scan on a table this
  small, so the test may need a larger seed or `SET enable_seqscan = off` to prove the
  index is usable.
- **Manual:** on the live console, reproduce the user's three examples (OS, username, IP).

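The performance check could be automated by capturing the EXPLAIN text in the integration test and asserting on its node types. A minimal sketch, assuming the plan arrives as plain text and that the index name is passed in by the test; `plan_uses_index` is a hypothetical helper, not part of the codebase.

```rust
/// Returns true when some plan line shows a bitmap index scan on `index_name`
/// and the plan contains no sequential scan node anywhere.
fn plan_uses_index(plan: &str, index_name: &str) -> bool {
    let uses_index = plan
        .lines()
        .any(|line| line.contains("Bitmap Index Scan") && line.contains(index_name));
    uses_index && !plan.contains("Seq Scan")
}
```

GIN indexes are only ever read through bitmap scans, which is why the helper looks for that node type rather than a plain index scan.
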
## Effort estimate & dependencies

- **Size: Medium.** The query builder + index are the bulk; the API/dashboard wiring is
  small. Field-scoped syntax adds a little parsing.
- **Depends on:** **SPEC-003**. The inventory attributes (OS detail, user, IP, MAC,
  serial, version, device type) must be **persisted** on `connect_machines` to be
  searchable; without it, search covers only the handful of existing columns. Reuses
  **SPEC-005**'s enriched payload to render results; benefits from **SPEC-004** dedup so
  results aren't padded with ghost duplicates. Migration orders after SPEC-003/004's.
- **Unblocks:** fast fleet triage ("who's on this IP / running this OS / logged in as X"),
  and the saved-search / smart-group follow-up.

## Open questions

1. **Index strategy:** concatenated-expression `pg_trgm` GIN vs. a maintained
   `search_text` column. Proposed: start with the expression index (less write plumbing);
   move to `search_text` if weighting/ranking is wanted later.
2. **Field-scoped syntax in v1 or fast-follow?** Default match-any-field covers the stated
   use cases; scoped syntax is polish.
3. **Result cap / pagination:** return all matches, or page (limit/offset)? At ~1k rows a
   cap with "N more" may suffice; confirm.
4. **Include live online/host state as a filter:** free-text can't reach in-memory state;
   offer `status:online` as a structured filter that the server resolves against the
   SessionManager, or keep search DB-only in v1?
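
If question 3 resolves to a cap with "N more", the shape of the result could look like the sketch below. It is only an illustration of the idea: `cap_results` is a hypothetical name, and a real implementation would more likely apply `LIMIT` in SQL and fetch the total count separately rather than truncate in memory.

```rust
/// Applies a result cap and reports how many matches were left out,
/// which is the number the UI would show as "N more".
fn cap_results<T>(mut matches: Vec<T>, cap: usize) -> (Vec<T>, usize) {
    let hidden = matches.len().saturating_sub(cap);
    matches.truncate(cap);
    (matches, hidden)
}
```
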