diff --git a/docs/FEATURE_ROADMAP.md b/docs/FEATURE_ROADMAP.md
index 977308f..b3497f2 100644
--- a/docs/FEATURE_ROADMAP.md
+++ b/docs/FEATURE_ROADMAP.md
@@ -51,6 +51,7 @@ Bringing GC to parity with GuruRMM's release engineering. Full plan: [SPEC-001](
 - [ ] **Stable machine identity + session lifecycle reaping + operator removal** — P1 — give the agent a deterministic machine-derived `machine_uid` (Windows `MachineGuid`-based) so the same box can't register duplicates (root cause: `agent_id` is a config-file random UUID that a portable/misconfigured run regenerates each launch); key registration on it; add TTL reaping + same-machine supersede as defense-in-depth; and admin-gated per-row + multi-select bulk removal of stale sessions/units. Identity must be bound to the per-machine agent key (spoof guard). Fixes ghost-session accumulation seen on the live console (15 sessions / 0 live, ~10 orphans for one machine). ([SPEC-004](specs/SPEC-004-session-lifecycle-and-removal.md))
 - [ ] **Machines list view — dual connection indicators + rich rows** — P2 — ScreenConnect "Access"-list parity: per-row Host/Guest two-segment connection bar (Guest=agent online, Host=viewer connected, with names + durations) and rich inline metadata (company, site, device type, tags, logged-on user + idle, client version in red when outdated). Server-enriches `/api/machines` with live session state + SPEC-003 inventory. ([SPEC-005](specs/SPEC-005-machines-list-view-parity.md))
 - [ ] Machines "by Company" tree nav with per-company counts — P3 — left-nav grouping sidebar (screenshot parity). Follow-up sub-item of SPEC-005.
+- [ ] **Universal machine search ("everything is searchable")** — P2 — server-side `?q=` on `/api/machines` matching case-insensitive substring across ALL attributes (OS, logged-on user, external/private IP, company, site, tag, serial, MAC, version, …), pg_trgm GIN-indexed; multi-term AND + optional field-scoped syntax (`os:`, `user:`, `ip:`). Replaces the hostname-only client filter. Depends on SPEC-003 (attrs must be persisted). ([SPEC-006](specs/SPEC-006-universal-machine-search.md))
 - [ ] Programmatic session pre-create + viewer-token (integration contract) — P2
 
 ## Security & Infrastructure
diff --git a/docs/specs/SPEC-006-universal-machine-search.md b/docs/specs/SPEC-006-universal-machine-search.md
new file mode 100644
index 0000000..7aaf27a
--- /dev/null
+++ b/docs/specs/SPEC-006-universal-machine-search.md
@@ -0,0 +1,145 @@
+# SPEC-006: Universal Machine Search ("Everything Is Searchable")
+
+**Status:** Proposed
+**Priority:** P2
+**Requested By:** Mike (2026-05-30)
+**Estimated Effort:** Medium
+
+## Overview
+
+Give the Operator Console a single search box that matches a query against **any**
+attribute of any guest machine and returns all matches — OS, logged-on user, IP, company,
+site, tag, serial, MAC, version, anything — exactly like ScreenConnect. Today the
+dashboard filters only `hostname`/`agent_id` client-side (`MachinesPage.tsx:46`), which
+is useless across a ~900-machine fleet. Success = typing "windows 11" finds every Win11
+box, "fred" finds every machine where Fred is/was logged on, and "98.97" finds the
+machine at that IP — server-side, fast, case-insensitive substring, across the whole
+fleet.
+
+**Examples (user-stated):** search an OS string → all machines with that OS; a username →
+all machines with that logged-on user; an IP (external or private) → the matching
+machine(s).
+
+## Scope
+
+### Included in v1
+
+- **Server-side multi-attribute search** on `GET /api/machines?q=` (extend
+  `list_machines`), default = **case-insensitive substring, match-any-field** across:
+  `hostname`, `agent_id`/`machine_uid`, `organization` (company), `site`, `department`,
+  `device_type`, `tags`, `logged_on_user`, `os_name`/`os_version` (+locale),
+  `agent_version` (client version), `manufacturer`, `model`, `serial_number`,
+  `machine_description`, `external_ip`, `private_ip`, `mac_address`, `status`.
+- **Multi-term AND:** space-separated terms all must match (each term may match a
+  different field) — "windows fred" = Win machine where Fred is logged on.
+- **Performance:** a Postgres index that supports substring search across all columns
+  (see Architecture) so search stays fast at fleet scale.
+- **IP/array handling:** `external_ip`/`private_ip` (INET, from SPEC-003) matched as text
+  for partial-IP queries ("98.97"); `tags` (TEXT[]) matched per-element.
+- Wire the dashboard search box to the server query (debounced), replacing the
+  hostname-only client filter; render results through the SPEC-005 list view.
+
+### Nice-to-have (v1 if cheap, else fast-follow)
+
+- **Field-scoped syntax** like ScreenConnect: `os:windows`, `user:fred`, `ip:98.97`,
+  `company:"arizona computer"`, `tag:winter`, `version:23.9` — a known-prefix restricts
+  the match to that column; unprefixed terms stay match-any-field; quotes group a phrase.
+
+### Explicitly out of scope
+
+- Saved searches / smart groups / search-driven "session groups" (ScreenConnect's
+  `+ Create Session Group`) — separate feature.
+- Boolean OR / NOT / regex / fuzzy ranking — v1 is AND-of-substring; relevance ranking
+  deferred.
+- Searching **live-only** state not persisted to the DB (current viewer/host connection)
+  — exposed as a secondary structured filter (online/offline), not free-text, since it
+  lives in the in-memory `SessionManager`, not `connect_machines`.
+- Full event/session-history search — this spec is machine search.
+
+## Architecture
+
+- **Relay-server / DB (the core):** add search to `db::machines` and `list_machines`
+  (`main.rs:636`). Query shape:
+  - Maintain a searchable-text representation of each `connect_machines` row and index it
+    for substring search. Two viable approaches (decide in planning):
+    1. **`pg_trgm` GIN index** over a concatenated expression
+       (`lower(hostname||' '||coalesce(organization,'')||' '||…||' '||coalesce(host(external_ip),'')||…)`).
+       Accelerates `ILIKE '%term%'` for terms of three or more characters (shorter terms yield
+       no trigrams, so the index cannot narrow them). Simple, matches ScreenConnect's substring semantics best.
+    2. **Generated `search_text` column** (maintained on write / via the same path as
+       `update_machine_metadata`/`update_machine_inventory`) + `pg_trgm` GIN on it.
+       Slightly more plumbing, cleaner query, easier to weight later.
+  - `tags TEXT[]` folded into the text via `array_to_string(tags,' ')` (`STABLE`, so an expression
+    index needs an `IMMUTABLE` wrapper function); INET via `host(external_ip)`/`host(private_ip)`.
+  - Field-scoped terms compile to a targeted `col ILIKE '%v%'` (or `= ANY(tags)` for `tag:`, which is
+    an exact, case-sensitive element match); unscoped terms to the concatenated-text match; all terms AND-ed.
+- **API:** `list_machines` gains `q` (and reuses the SPEC-005 enriched `MachineInfo`
+  payload so results render fully). Same `AuthenticatedUser` admin guard.
+- **Dashboard:** the existing search input drives `listMachines({ q })` (debounced ~250ms)
+  in `dashboard/src/api/machines.ts`; remove the hostname-only client filter in
+  `MachinesPage.tsx`. Show result count + "no matches" empty state.
+- **Protobuf:** none — server/dashboard/DB only. Searchability of inventory fields is
+  entirely dependent on SPEC-003 persisting them.
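The tokenize-and-classify step behind the field-scoped syntax could look roughly like the Rust sketch below. It is only an illustration under stated assumptions: `Term`, `SCOPES`, `split_tokens`, `classify`, and `parse_query` are hypothetical names, the prefix allowlist is copied from the examples in Scope, and the term cap is an assumed parameter rather than a decided value.

```rust
/// One parsed search term (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
pub enum Term {
    /// Unscoped: matched against every searchable field.
    Any(String),
    /// Scoped: `field` is one of the allowlisted prefixes below.
    Scoped { field: &'static str, value: String },
}

// Assumed prefix allowlist, taken from the examples in the Scope section.
const SCOPES: &[&str] = &["os", "user", "ip", "company", "tag", "version"];

/// Split on whitespace, keeping double-quoted phrases together.
/// Quote characters are dropped; an unterminated quote runs to end of input.
fn split_tokens(q: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut current = String::new();
    let mut in_quotes = false;
    for ch in q.chars() {
        match ch {
            '"' => in_quotes = !in_quotes,
            c if c.is_whitespace() && !in_quotes => {
                if !current.is_empty() {
                    tokens.push(std::mem::take(&mut current));
                }
            }
            c => current.push(c),
        }
    }
    if !current.is_empty() {
        tokens.push(current);
    }
    tokens
}

/// Classify a token. An unknown prefix or an empty value leaves the whole
/// token as literal text, so MAC addresses such as `aa:bb:cc` still work.
fn classify(token: String) -> Term {
    if let Some((prefix, value)) = token.split_once(':') {
        let prefix = prefix.to_lowercase();
        if !value.is_empty() {
            if let Some(field) = SCOPES.iter().copied().find(|p| *p == prefix.as_str()) {
                return Term::Scoped { field, value: value.to_string() };
            }
        }
    }
    Term::Any(token)
}

/// Parse a raw query into at most `max_terms` terms; extra terms are dropped.
pub fn parse_query(q: &str, max_terms: usize) -> Vec<Term> {
    split_tokens(q).into_iter().take(max_terms).map(classify).collect()
}
```

Under these assumptions, `company:"arizona computer" os:windows` parses to two scoped terms, while `foo:bar` stays a single literal term because `foo` is not in the allowlist.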
+
+## Implementation details
+
+- Files to touch: `server/migrations/` (new migration: `CREATE EXTENSION IF NOT EXISTS
+  pg_trgm;` + the GIN index / optional `search_text` column — idempotent, startup-applied
+  by `sqlx::migrate!()`, never pre-applied via psql); `server/src/db/machines.rs`
+  (`search_machines(q)` or `get_all_machines` gaining a filter; parameterized, never
+  string-interpolated); `server/src/main.rs:636` (`list_machines` reads `q`);
+  `dashboard/src/features/machines/MachinesPage.tsx:46` (drop local filter, call server);
+  `dashboard/src/api/machines.ts` (q param).
+- Key logic: tokenize on whitespace (respecting quotes), classify each token as
+  scoped/unscoped, build a parameterized `WHERE term1 AND term2 …`. Cap query length and
+  term count.
+
+## Security considerations
+
+- **SQL injection:** all terms are bound parameters (`$n`), never concatenated into SQL.
+  Column list for scoped search is a fixed allowlist — a `field:` prefix can only map to a
+  known column, never arbitrary SQL.
+- Admin-authenticated only (`AuthenticatedUser`), same as the existing machines list — no
+  new unauthenticated surface; search exposes nothing the list doesn't already.
+- **DoS guard:** cap query length, number of terms, and result page size. `pg_trgm` keeps
+  substring scans index-backed only for terms of three or more characters; shorter terms
+  yield no trigrams and scan every row, so enforce a minimum term length or accept that cost.
+- Treat the query as untrusted text (escape `%`, `_`, and `\` so they match literally in `ILIKE`); the response is the same admin-only machine data.
+
+## Testing strategy
+
+- **Unit:** tokenizer (quotes, multi-term, scoped vs. unscoped); query builder emits
+  parameterized SQL with the right AND structure; scoped `field:` maps only to allowlisted
+  columns (unknown prefix → treated as literal text, not an error/injection).
+- **Integration (seeded DB):** "windows 11" returns all Win11 rows; a username returns
+  rows with that `logged_on_user`; a partial IP returns rows whose `external_ip`/
+  `private_ip` contains it; a tag value returns tagged rows; multi-term ANDs correctly;
+  empty `q` returns the full list unchanged.
+- **Performance:** with ~1k seeded rows, a broad substring query uses the GIN index (EXPLAIN shows a bitmap
+  index scan, not a seq scan; the planner may prefer a seq scan on so small a table, so the test may need `enable_seqscan = off`) and returns within target latency.
+- **Manual:** on the live console, reproduce the user's three examples (OS, username, IP).
+
+## Effort estimate & dependencies
+
+- **Size: Medium.** The query builder + index is the bulk; the API/dashboard wiring is
+  small. Field-scoped syntax adds a little parsing.
+- **Depends on:** **SPEC-003** — the inventory attributes (OS detail, user, IP, MAC,
+  serial, version, device type) must be **persisted** on `connect_machines` to be
+  searchable; without it, search covers only the handful of existing columns. Reuses
+  **SPEC-005**'s enriched payload to render results; benefits from **SPEC-004** dedup so
+  results aren't padded with ghost duplicates. The migration is ordered after those of SPEC-003 and SPEC-004.
+- **Unblocks:** fast fleet triage ("who's on this IP / running this OS / logged in as X"),
+  and the saved-search / smart-group follow-up.
+
+## Open questions
+
+1. **Index strategy** — concatenated-expression `pg_trgm` GIN vs. a maintained
+   `search_text` column. Proposed: start with the expression index (less write plumbing);
+   move to `search_text` if weighting/ranking is wanted later.
+2. **Field-scoped syntax in v1 or fast-follow?** Default match-any-field covers the stated
+   use cases; scoped syntax is polish.
+3. **Result cap / pagination** — return all matches, or page (limit/offset)? At ~1k rows a
+   cap with "N more" may suffice; confirm.
+4. **Include live online/host state as a filter** — free-text can't reach in-memory state;
+   offer `status:online` as a structured filter that the server resolves against the
+   `SessionManager`, or keep search DB-only in v1?
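As an illustration of the parameterized query builder described under Implementation details and Security considerations, the Rust sketch below turns already-classified terms into an AND-ed `WHERE` fragment plus its bind values. Everything named here is an assumption for illustration: `Scope`, `scope_expr`, `like_pattern`, and `build_where` are hypothetical, the column mapping is a guess at the SPEC-003 schema, the match-any-field target is assumed to be the `search_text` column from approach 2, and escaping relies on PostgreSQL's default backslash escape for `LIKE`/`ILIKE`. The `tag:` scope is omitted because it compiles to an array match rather than `ILIKE`.

```rust
/// Allowlisted scopes. Using an enum means no user text can name a column.
#[derive(Clone, Copy, Debug)]
pub enum Scope {
    Os,
    User,
    Company,
    Version,
    Ip,
}

// Assumed match-any-field target: the `search_text` column (approach 2).
const ANY_FIELD: &str = "search_text";

/// Map a scope to a fixed SQL text expression (hypothetical column names).
fn scope_expr(scope: Scope) -> &'static str {
    match scope {
        Scope::Os => "os_name",
        Scope::User => "logged_on_user",
        Scope::Company => "organization",
        Scope::Version => "agent_version",
        // coalesce keeps a NULL address from nulling out the whole expression.
        Scope::Ip => "(coalesce(host(external_ip), '') || ' ' || coalesce(host(private_ip), ''))",
    }
}

/// Wrap a term in `%…%`, escaping LIKE metacharacters so that user input
/// is matched literally instead of acting as a wildcard.
fn like_pattern(term: &str) -> String {
    let mut out = String::with_capacity(term.len() + 2);
    out.push('%');
    for ch in term.chars() {
        if matches!(ch, '%' | '_' | '\\') {
            out.push('\\');
        }
        out.push(ch);
    }
    out.push('%');
    out
}

/// Build an AND-ed WHERE fragment with `$n` placeholders and the values to
/// bind to them, in order. An empty term list yields `TRUE` (no filtering).
pub fn build_where(terms: &[(Option<Scope>, &str)]) -> (String, Vec<String>) {
    let mut clauses = Vec::with_capacity(terms.len());
    let mut binds = Vec::with_capacity(terms.len());
    for &(scope, value) in terms {
        let expr = scope.map(scope_expr).unwrap_or(ANY_FIELD);
        binds.push(like_pattern(value));
        // Only fixed expressions and a placeholder index enter the SQL text.
        clauses.push(format!("{} ILIKE ${}", expr, binds.len()));
    }
    if clauses.is_empty() {
        return ("TRUE".to_string(), binds);
    }
    (clauses.join(" AND "), binds)
}
```

The fragment would be appended to the existing machines query and each pattern bound in order, so user text only ever travels as a parameter value.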