diff --git a/docs/FEATURE_ROADMAP.md b/docs/FEATURE_ROADMAP.md
index 0dabfe9..df62802 100644
--- a/docs/FEATURE_ROADMAP.md
+++ b/docs/FEATURE_ROADMAP.md
@@ -68,6 +68,7 @@ Bringing GC to parity with GuruRMM's release engineering. Full plan: [SPEC-001](
 - [x] Protobuf-over-WSS transport, Zstd frame compression
 - [~] React/TS web viewer (`dashboard/src/components/RemoteViewer.tsx`) — embeddable session viewer
 - [ ] **Headless Linux mode (direct TTY access)** — P2 — Terminal-based remote access for Linux servers without GUI. PTY spawn (`openpty`), xterm.js web viewer, full ANSI/VT100 support. Enables server management, container debugging, emergency recovery via GuruConnect dashboard with audit logging. SSH replacement with centralized auth. ([SPEC-012](specs/SPEC-012-headless-linux-tty.md))
+- [ ] **Managed-agent SYSTEM service host + session broker** — P1 — convert the persistent agent from `HKCU Run` (user context) to a LocalSystem **service** that runs unattended (login screen, no user, across reboots) and spawns a per-session capture/input worker into the active desktop (Session 0 can't capture directly). Unblocks SPEC-016 Phase B end-to-end (the SYSTEM-ACL'd `cak_` store becomes readable; removes the Phase B fail-fast guard), enables true unattended access, and is the **broker primitive SPEC-013 builds on**. ([SPEC-018](specs/SPEC-018-managed-agent-service-host.md))
 - [ ] **Windows session selection and backstage mode** — P2 — Enumerate and switch between Windows user sessions (Terminal Services/RDP/Fast User Switching) and access Session 0 (backstage) for system-level admin tasks. ScreenConnect parity: session selector shows all logged-on users, instant switching without reconnect. Backstage mode provides terminal/command interface for services management without disrupting any user desktop. Critical for multi-user server environments. ([SPEC-013](specs/SPEC-013-session-selection-and-backstage.md))
 - [ ] **Configurable notification overlay on viewer connection** — P2 — Display a semi-transparent on-screen notification when a technician connects, showing technician name and company. Dashboard-configurable message template (supports `{{technician_name}}`, `{{company}}`, `{{time}}`), duration (5-60s), position (top-left/right, bottom-left/right, center), and dismissible behavior. Increases transparency and user awareness during remote support sessions. Compliance-friendly for privacy policies requiring user notification. ([SPEC-015](specs/SPEC-015-notification-overlay.md))
 - [ ] Multi-monitor switching — P2
diff --git a/docs/specs/SPEC-018-managed-agent-service-host.md b/docs/specs/SPEC-018-managed-agent-service-host.md
new file mode 100644
index 0000000..7aa3b14
--- /dev/null
+++ b/docs/specs/SPEC-018-managed-agent-service-host.md
@@ -0,0 +1,146 @@
+# SPEC-018: Managed-Agent SYSTEM Service Host + Session Broker
+
+**Status:** Proposed
+**Priority:** P1 (blocks SPEC-016 Phase B end-to-end runtime and SPEC-013)
+**Requested By:** Mike (2026-06-02)
+**Estimated Effort:** X-Large
+
+## Overview
+
+Convert the managed/persistent GuruConnect agent from a user-context `HKCU\…\Run` autostart into a
+**Windows SYSTEM service** that runs unattended — at the login screen, with no user logged in, across
+reboots — and **brokers per-session capture/input worker processes** into the active interactive
+desktop. A SYSTEM service lives in the isolated **Session 0** and cannot capture or inject the
+interactive desktop directly, so the service spawns a worker into the target user session (the
+ScreenConnect architecture).
+
+This is foundational, not cosmetic. It unblocks three things at once:
+1. **SPEC-016 Phase B end-to-end runtime** — the per-machine `cak_` store is ACL'd to SYSTEM +
+   Administrators; today the agent runs as the interactive *user* and can't read its own store (the
+   Phase B C1 *fail-fast guard* exists precisely because of this). Running as SYSTEM makes the store
+   readable and removes the guard.
+2. **True unattended access** — a user-context agent only runs while that user is logged in. Reaching
+   a rebooted server or a machine sitting at the login screen (table-stakes for remote support)
+   requires SYSTEM.
+3. **SPEC-013 session selection / backstage** — the session-broker primitive built here is the
+   substrate SPEC-013's session-switching UX drives.
+
+**Success criteria:** the managed agent installs as an auto-start SYSTEM service; it holds the relay
+connection and performs SPEC-016 enrollment as SYSTEM (reading/writing the SYSTEM-ACL'd `cak_`); it
+spawns a capture/input worker into the active interactive session and relays frames; the worker is
+respawned/retargeted on logon/logoff/console-connect; and the Phase B fail-fast guard is removed
+because the store is now readable in-context.
+
+## Background — why this is needed (confirmed in code)
+
+- The persistent agent autostarts via `HKCU\…\Run` (`agent/src/startup.rs:21`, `STARTUP_KEY` = HKCU)
+  → interactive-user token, not SYSTEM. The only SYSTEM service today is the separate `sas_service`
+  (Secure Attention Sequence helper).
+- SPEC-016 Phase B (`agent/src/credential_store.rs`) ACLs the `cak_` store to `*S-1-5-18` (SYSTEM) +
+  `*S-1-5-32-544` (Administrators). In the current user context the agent writes but cannot read it
+  back → the Phase B fail-fast guard (`agent/src/main.rs` `resolve_agent_credential`) emits
+  "must run as the GuruConnect SYSTEM service (see SPEC-018)" instead of bricking.
+- Capture/input live in the agent process (`agent/src/capture/`, `agent/src/input/`); a Session-0
+  SYSTEM service cannot drive these against the interactive desktop without a per-session worker.
+
+## Scope
+
+### Included in v1
+
+1. **Windows service install/lifecycle** (`agent/src/install.rs` + a new service module): register the
+   managed agent as a **LocalSystem auto-start service** (`CreateServiceW` / a service crate),
+   configure failure/recovery (restart on crash), and **replace the HKCU `Run` autostart for managed
+   mode** (remove the Run entry on service install). Clean uninstall (stop + delete service).
+2. **Service control loop** (Session 0, SYSTEM): owns the persistent WSS connection to the relay,
+   performs SPEC-016 enrollment as SYSTEM (now able to read/write the `cak_` store), and dispatches
+   session/connect requests to workers. Handles `SERVICE_CONTROL_STOP`/`SHUTDOWN` and
+   `SERVICE_CONTROL_SESSIONCHANGE`.
+3. **Session broker:** enumerate sessions (`WTSEnumerateSessionsW`), resolve the active interactive
+   session (`WTSGetActiveConsoleSessionId`), obtain its user token (`WTSQueryUserToken` →
+   `DuplicateTokenEx`), and spawn a **per-session capture/input worker** into that session's desktop
+   (`CreateProcessAsUserW`, `winsta0\default`). The worker does DXGI capture + input injection in the
+   user's session; the service relays frames over the existing transport.
+4. **Service ↔ worker IPC:** a local, ACL'd channel (named pipe `\\.\pipe\guruconnect-`)
+   carrying frames/input/control; pipe ACL restricted to SYSTEM + the target session user.
+5. **Session-change handling:** on logon/logoff/console-connect/disconnect/lock/unlock, (re)spawn or
+   retarget the worker so the active desktop is always the one being served.
+6. **Remove the SPEC-016 Phase B fail-fast guard** once the service runs as SYSTEM (the store is
+   readable in-context); keep the SYSTEM+Administrators ACL.
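
The respawn/retarget behaviour in items 3 and 5 can be illustrated as a small, platform-independent decision function. This is only a sketch under stated assumptions: `SessionEvent`, `BrokerAction`, and `Broker` are hypothetical names that do not appear in the spec, every logon or console-connect is treated as the session to serve (a real broker would confirm it with `WTSGetActiveConsoleSessionId`), and lock/unlock are treated as no-ops because the worker stays in the same session.

```rust
/// Session notifications the service would receive through
/// SERVICE_CONTROL_SESSIONCHANGE. Hypothetical type; the u32 is the session id.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SessionEvent {
    ConsoleConnect(u32),
    ConsoleDisconnect(u32),
    Logon(u32),
    Logoff(u32),
    Lock(u32),
    Unlock(u32),
}

/// What the broker asks the process layer to do. Hypothetical type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BrokerAction {
    SpawnWorker(u32),
    StopWorker(u32),
}

/// Tracks which session (if any) currently has a capture/input worker.
#[derive(Debug, Default)]
pub struct Broker {
    worker_session: Option<u32>,
}

impl Broker {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn worker_session(&self) -> Option<u32> {
        self.worker_session
    }

    /// Maps one session event to the actions needed to keep a single worker
    /// attached to the session being served.
    pub fn handle(&mut self, event: SessionEvent) -> Vec<BrokerAction> {
        match event {
            // Assumption: a logon or console-connect names the session to serve.
            SessionEvent::ConsoleConnect(sid) | SessionEvent::Logon(sid) => self.retarget(sid),
            // The served session went away: stop its worker, ignore other sessions.
            SessionEvent::ConsoleDisconnect(sid) | SessionEvent::Logoff(sid) => {
                if self.worker_session == Some(sid) {
                    self.worker_session = None;
                    vec![BrokerAction::StopWorker(sid)]
                } else {
                    Vec::new()
                }
            }
            // Assumption: lock/unlock need no process changes in this sketch.
            SessionEvent::Lock(_) | SessionEvent::Unlock(_) => Vec::new(),
        }
    }

    fn retarget(&mut self, sid: u32) -> Vec<BrokerAction> {
        match self.worker_session {
            // Already serving this session: nothing to do.
            Some(current) if current == sid => Vec::new(),
            // Serving a different session: stop the old worker, start a new one.
            Some(current) => {
                self.worker_session = Some(sid);
                vec![BrokerAction::StopWorker(current), BrokerAction::SpawnWorker(sid)]
            }
            // No worker yet: just spawn.
            None => {
                self.worker_session = Some(sid);
                vec![BrokerAction::SpawnWorker(sid)]
            }
        }
    }
}
```

Keeping the decision logic free of Win32 calls lets the logoff→logon and fast-user-switch cases be unit-tested without a VM; the `CreateProcessAsUserW` path would then only execute the returned actions.
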
+
+### Explicitly out of scope (anticipated, separate specs)
+
+- **Session-selection / backstage UX** — the operator-facing picker and Session-0/secure-desktop
+  command surface are **SPEC-013**; this spec only provides the broker primitive it drives.
+- **Login-screen / secure-desktop (winlogon) capture** beyond the broker hook — the hard
+  Secure-Desktop case is coordinated with SPEC-013; v1 here targets the active interactive session.
+- **macOS/Linux service equivalents** — future SPEC-010 (cross-platform agents).
+
+## Architecture
+
+- **Agent splits into two roles:**
+  - **service-host** (LocalSystem, Session 0): service lifecycle, relay transport, SPEC-016
+    enrollment + `cak_` store, session broker, IPC server.
+  - **session-worker** (per interactive session, user token): DXGI/GDI capture, input injection,
+    IPC client. Spawned by the service via `CreateProcessAsUserW`.
+- **Service install** (`install.rs`): `CreateServiceW` with `SERVICE_AUTO_START`, `SERVICE_WIN32_OWN_PROCESS`,
+  recovery actions; uninstall stops + deletes. Replaces managed-mode `HKCU Run`.
+- **Token handoff:** `WTSGetActiveConsoleSessionId` → `WTSQueryUserToken` → `DuplicateTokenEx`
+  (primary token) → `CreateProcessAsUserW` with `lpDesktop = "winsta0\\default"`.
+- **IPC:** named pipe per session, length-prefixed protobuf (reuse `proto/` message types where
+  sensible), pipe security descriptor granting only SYSTEM + the session user.
+- **Session events:** the service registers for `SERVICE_CONTROL_SESSIONCHANGE` and reacts to
+  `WTS_CONSOLE_CONNECT`, `WTS_SESSION_LOGON/LOGOFF`, `WTS_SESSION_LOCK/UNLOCK`.
+
+## Security considerations
+
+- **LocalSystem is maximal privilege** — minimize the service's attack surface; validate every
+  relay-delivered command; never spawn a worker except into a legitimately-enumerated active session.
+- **IPC pipe must be ACL'd** (SYSTEM + the specific session user only) so a non-admin user can't
+  inject capture/input commands by connecting to the pipe.
+- **Token hygiene:** close duplicated tokens promptly; don't leak SYSTEM or user primary tokens.
+- The SPEC-016 `cak_` store (SYSTEM-ACL'd) is now correctly readable; the fail-fast guard is removed
+  but the ACL stays.
+- **Audit:** service start/stop, enrollment-as-SYSTEM, worker spawn, session attach/retarget — written
+  to the existing event pipeline.
+
+## Implementation details
+
+- New service module (e.g. `agent/src/service/{mod.rs, broker.rs, ipc.rs}`); worker entry split out of
+  the current capture path. New `Commands` variants or an internal `--service`/`--session-worker`
+  dispatch in `agent/src/main.rs`.
+- `install.rs`: service create/recovery/delete; drop the managed-mode HKCU `Run` write.
+- `windows` crate features: `Win32_System_Services`, `Win32_System_RemoteDesktop`
+  (`WTS*`), `Win32_Security`, `Win32_System_Threading` (`CreateProcessAsUserW`),
+  `Win32_System_Pipes`.
+- Remove the `resolve_agent_credential` fail-fast guard branch added in SPEC-016 Phase B.
+
+## Testing strategy
+
+- **Service:** install → auto-start on boot → stop → uninstall on a clean VM.
+- **`cak_` end-to-end:** SYSTEM service enrolls (SPEC-016), stores + reads the `cak_`, connects — the
+  integration test SPEC-016 Phase B currently cannot run.
+- **Session broker:** worker spawns into the active session; capture/input work; survives logoff→logon
+  (respawn) and console-connect (retarget); fast-user-switch retarget.
+- **Security:** non-admin cannot connect to the IPC pipe; worker runs with the user's token (not
+  SYSTEM) in the user's desktop.
+
+## Effort estimate & dependencies
+
+- **Size:** X-Large (service host + worker split + token-handoff + IPC + session-change handling +
+  install/uninstall).
+- **Depends on:** SPEC-016 (enrollment + `cak_` store); the existing capture/input cores.
+- **Unblocks:** SPEC-016 Phase B end-to-end runtime (and the parked managed-agent enrollment test on
+  the internal beta machines); **SPEC-013** (session selection builds on this broker).
+
+## Open questions
+
+1. **Service vs. SYSTEM scheduled task** — a true Windows service (recovery, SCM integration) is the
+   standard, robust choice; recommend service. Lock in planning.
+2. **One multi-session worker vs. one worker per session** — per-session worker is simpler to reason
+   about and isolates a crash to one session; confirm.
+3. **IPC transport** — named pipe (recommended) vs. local TCP/loopback; pipe ACLing is the cleaner
+   security story.
+4. **Login-screen / Secure-Desktop capture** — how much (if any) in this spec vs. deferred to SPEC-013
+   (it needs a worker in the winlogon/secure desktop, a distinct hard problem).
+5. **Migration** — on upgrade, cleanly transition existing HKCU-`Run` managed installs to the service
+   (remove the Run entry, install the service) without a gap.
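
Whichever transport open question 3 settles on, the length-prefixed framing proposed for the service ↔ worker channel is independent of it. The sketch below is a hedged illustration rather than the spec's wire format: `write_frame`, `read_frame`, and `MAX_FRAME_LEN` are hypothetical names, the 4-byte little-endian length header and the 16 MiB cap are assumptions, and the payload is treated as opaque bytes where the real channel would carry an encoded protobuf message.

```rust
use std::io::{self, Read, Write};

/// Upper bound on one frame. Hypothetical value; the spec does not set one.
const MAX_FRAME_LEN: u32 = 16 * 1024 * 1024;

/// Writes one frame: a 4-byte little-endian length, then the payload bytes.
/// The header width and byte order are assumptions of this sketch.
pub fn write_frame<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > MAX_FRAME_LEN as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "frame too large"));
    }
    let len = payload.len() as u32;
    w.write_all(&len.to_le_bytes())?;
    w.write_all(payload)?;
    Ok(())
}

/// Reads one frame written by `write_frame`. The declared length is checked
/// before allocating, so a misbehaving peer cannot force a huge allocation.
pub fn read_frame<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut header = [0u8; 4];
    r.read_exact(&mut header)?;
    let len = u32::from_le_bytes(header);
    if len > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame exceeds limit"));
    }
    let mut payload = vec![0u8; len as usize];
    r.read_exact(&mut payload)?;
    Ok(payload)
}
```

Because both functions are generic over `Read`/`Write`, the same code would work over a named-pipe handle wrapped in a `std::fs::File`, a loopback `TcpStream`, or an in-memory `Cursor` in tests.
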