guru-connect/docs/specs/SPEC-018-managed-agent-service-host.md
Mike Swanson 94c07c2431 spec: add SPEC-018 managed-agent SYSTEM service host + session broker
LocalSystem service that runs the persistent agent unattended and brokers
per-session capture/input workers (Session 0 can't capture directly).
Unblocks SPEC-016 Phase B end-to-end (SYSTEM-ACL'd cak_ store readable;
removes the Phase B fail-fast guard) and is the broker primitive SPEC-013
builds on. 017 was taken by Mike's end-user-access spec, so this is 018.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 13:13:04 -07:00


SPEC-018: Managed-Agent SYSTEM Service Host + Session Broker

Status: Proposed
Priority: P1 (blocks SPEC-016 Phase B end-to-end runtime and SPEC-013)
Requested By: Mike (2026-06-02)
Estimated Effort: X-Large

Overview

Convert the managed/persistent GuruConnect agent from a user-context HKCU\…\Run autostart into a Windows SYSTEM service that runs unattended — at the login screen, with no user logged in, across reboots — and brokers per-session capture/input worker processes into the active interactive desktop. A SYSTEM service lives in the isolated Session 0 and cannot capture or inject the interactive desktop directly, so the service spawns a worker into the target user session (the ScreenConnect architecture).

This is foundational, not cosmetic. It unblocks three things at once:

  1. SPEC-016 Phase B end-to-end runtime — the per-machine cak_ store is ACL'd to SYSTEM + Administrators; today the agent runs as the interactive user and can't read its own store (the Phase B C1 fail-fast guard exists precisely because of this). Running as SYSTEM makes the store readable and removes the guard.
  2. True unattended access — a user-context agent only runs while that user is logged in. Reaching a rebooted server or a machine sitting at the login screen (table-stakes for remote support) requires SYSTEM.
  3. SPEC-013 session selection / backstage — the session-broker primitive built here is the substrate SPEC-013's session-switching UX drives.

Success criteria: the managed agent installs as an auto-start SYSTEM service; it holds the relay connection and performs SPEC-016 enrollment as SYSTEM (reading/writing the SYSTEM-ACL'd cak_); it spawns a capture/input worker into the active interactive session and relays frames; the worker is respawned/retargeted on logon/logoff/console-connect; and the Phase B fail-fast guard is removed because the store is now readable in-context.

Background — why this is needed (confirmed in code)

  • The persistent agent autostarts via HKCU\…\Run (agent/src/startup.rs:21, STARTUP_KEY = HKCU) → interactive-user token, not SYSTEM. The only SYSTEM service today is the separate sas_service (Secure Attention Sequence helper).
  • SPEC-016 Phase B (agent/src/credential_store.rs) ACLs the cak_ store to *S-1-5-18 (SYSTEM) + *S-1-5-32-544 (Administrators). In the current user context the agent writes but cannot read it back → the Phase B fail-fast guard (agent/src/main.rs resolve_agent_credential) emits "must run as the GuruConnect SYSTEM service (see SPEC-018)" instead of bricking.
  • Capture/input live in the agent process (agent/src/capture/, agent/src/input/); a Session-0 SYSTEM service cannot drive these against the interactive desktop without a per-session worker.

Scope

Included in v1

  1. Windows service install/lifecycle (agent/src/install.rs + a new service module): register the managed agent as a LocalSystem auto-start service (CreateServiceW / a service crate), configure failure/recovery (restart on crash), and replace the HKCU Run autostart for managed mode (remove the Run entry on service install). Clean uninstall (stop + delete service).
  2. Service control loop (Session 0, SYSTEM): owns the persistent WSS connection to the relay, performs SPEC-016 enrollment as SYSTEM (now able to read/write the cak_ store), and dispatches session/connect requests to workers. Handles SERVICE_CONTROL_STOP/SHUTDOWN and SERVICE_CONTROL_SESSIONCHANGE.
  3. Session broker: enumerate sessions (WTSEnumerateSessionsW), resolve the active interactive session (WTSGetActiveConsoleSessionId), obtain its user token (WTSQueryUserToken → DuplicateTokenEx), and spawn a per-session capture/input worker into that session's desktop (CreateProcessAsUserW, winsta0\default). The worker does DXGI capture + input injection in the user's session; the service relays frames over the existing transport.
  4. Service ↔ worker IPC: a local, ACL'd channel (named pipe \\.\pipe\guruconnect-<sessionId>) carrying frames/input/control; pipe ACL restricted to SYSTEM + the target session user.
  5. Session-change handling: on logon/logoff/console-connect/disconnect/lock/unlock, (re)spawn or retarget the worker so the active desktop is always the one being served.
  6. Remove the SPEC-016 Phase B fail-fast guard once the service runs as SYSTEM (the store is readable in-context); keep the SYSTEM+Administrators ACL.
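The active-session decision in item 3 is the security-relevant core of the broker: a worker must only ever land in a legitimately enumerated, active session. Below is a minimal sketch of that decision as a pure function. It assumes the service has already called WTSGetActiveConsoleSessionId and WTSEnumerateSessionsW and reduced the results to a list. `SessionInfo`, `ConnectState` (an illustrative mirror of WTS_CONNECTSTATE_CLASS, not the windows crate type) and `target_session` are hypothetical names.

```rust
/// Value WTSGetActiveConsoleSessionId returns when no session is attached
/// to the physical console.
const NO_CONSOLE_SESSION: u32 = 0xFFFF_FFFF;

/// Illustrative mirror of the WTS_CONNECTSTATE_CLASS values used here
/// (hypothetical; the real type comes from the windows crate).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ConnectState {
    Active,
    Connected,
    Disconnected,
}

/// One WTSEnumerateSessionsW entry, reduced to what the broker needs.
#[derive(Clone, Debug)]
struct SessionInfo {
    id: u32,
    state: ConnectState,
}

/// Pick the session that should host the capture/input worker, if any.
fn target_session(console_id: u32, sessions: &[SessionInfo]) -> Option<u32> {
    // No console session, or Session 0 (the isolated services session):
    // there is no interactive desktop to serve.
    if console_id == NO_CONSOLE_SESSION || console_id == 0 {
        return None;
    }
    // Only trust the console id if enumeration confirms an active session with it.
    sessions
        .iter()
        .find(|s| s.id == console_id && s.state == ConnectState::Active)
        .map(|s| s.id)
}
```

Requiring the Active state means a console sitting at the login screen (typically reported as connected but with no logged-on user, so there is no user token to query) yields no worker. That matches the v1 scope, which leaves login-screen capture to SPEC-013.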

Explicitly out of scope (anticipated, separate specs)

  • Session-selection / backstage UX — the operator-facing picker and Session-0/secure-desktop command surface are SPEC-013; this spec only provides the broker primitive it drives.
  • Login-screen / secure-desktop (winlogon) capture beyond the broker hook — the hard Secure-Desktop case is coordinated with SPEC-013; v1 here targets the active interactive session.
  • macOS/Linux service equivalents — future SPEC-010 (cross-platform agents).

Architecture

  • Agent splits into two roles:
    • service-host (LocalSystem, Session 0): service lifecycle, relay transport, SPEC-016 enrollment + cak_ store, session broker, IPC server.
    • session-worker (per interactive session, user token): DXGI/GDI capture, input injection, IPC client. Spawned by the service via CreateProcessAsUserW.
  • Service install (install.rs): CreateServiceW with SERVICE_AUTO_START, SERVICE_WIN32_OWN_PROCESS, recovery actions; uninstall stops + deletes. Replaces managed-mode HKCU Run.
  • Token handoff: WTSGetActiveConsoleSessionId → WTSQueryUserToken → DuplicateTokenEx (primary token) → CreateProcessAsUserW with STARTUPINFOW.lpDesktop = "winsta0\\default".
  • IPC: named pipe per session, length-prefixed protobuf (reuse proto/ message types where sensible), pipe security descriptor granting only SYSTEM + the session user.
  • Session events: the service registers for SERVICE_CONTROL_SESSIONCHANGE and reacts to WTS_CONSOLE_CONNECT, WTS_SESSION_LOGON/LOGOFF, WTS_SESSION_LOCK/UNLOCK.
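The IPC bullet calls for length-prefixed protobuf over the per-session pipe. Here is a minimal sketch of that framing, written against `std::io::Read`/`Write` so the same code can run over a named-pipe handle or an in-memory buffer. The 4-byte little-endian prefix and the `MAX_FRAME_LEN` cap are assumptions, not settled wire-format decisions. The payload is whatever encoded protobuf message the two sides agree on. `write_frame` and `read_frame` are hypothetical helpers.

```rust
use std::io::{self, Read, Write};

/// Hypothetical upper bound on one IPC frame (e.g. one encoded video frame).
/// Rejecting larger lengths stops a misbehaving peer from forcing a huge allocation.
const MAX_FRAME_LEN: u32 = 16 * 1024 * 1024;

/// Write one frame: a 4-byte little-endian length followed by the payload.
fn write_frame<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    let len = u32::try_from(payload.len())
        .ok()
        .filter(|&n| n <= MAX_FRAME_LEN)
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "frame too large"))?;
    w.write_all(&len.to_le_bytes())?;
    w.write_all(payload)?;
    w.flush()
}

/// Read one frame. Returns Ok(None) if the stream ends while reading the
/// length prefix (the peer closed the pipe); a truncated payload is an error.
fn read_frame<R: Read>(r: &mut R) -> io::Result<Option<Vec<u8>>> {
    let mut len_buf = [0u8; 4];
    match r.read_exact(&mut len_buf) {
        Ok(()) => {}
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => return Ok(None),
        Err(e) => return Err(e),
    }
    let len = u32::from_le_bytes(len_buf);
    // Validate before allocating: the length comes from the other process.
    if len > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame length exceeds limit"));
    }
    let mut payload = vec![0u8; len as usize];
    r.read_exact(&mut payload)?;
    Ok(Some(payload))
}
```

Checking the length before allocating matters here because the service runs as LocalSystem and should treat the worker side of the pipe as untrusted input.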

Security considerations

  • LocalSystem is maximal privilege — minimize the service's attack surface; validate every relay-delivered command; never spawn a worker except into a legitimately-enumerated active session.
  • IPC pipe must be ACL'd (SYSTEM + the specific session user only) so a non-admin user can't inject capture/input commands by connecting to the pipe.
  • Token hygiene: close duplicated tokens promptly; don't leak SYSTEM or user primary tokens.
  • The SPEC-016 cak_ store (SYSTEM-ACL'd) is now correctly readable; the fail-fast guard is removed but the ACL stays.
  • Audit: service start/stop, enrollment-as-SYSTEM, worker spawn, session attach/retarget — written to the existing event pipeline.
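The pipe ACL requirement (SYSTEM plus the specific session user only) can be expressed as an SDDL string. This sketch makes the following assumptions:

* The session user's SID is obtained as a string from the worker's token, for example via GetTokenInformation with TokenUser followed by ConvertSidToStringSidW.
* The resulting SDDL is turned into a security descriptor with ConvertStringSecurityDescriptorToSecurityDescriptorW before calling CreateNamedPipeW.

`pipe_name`, `is_plausible_sid` and `pipe_sddl` are hypothetical helpers.

```rust
/// Pipe name from the spec: \\.\pipe\guruconnect-<sessionId>.
fn pipe_name(session_id: u32) -> String {
    format!(r"\\.\pipe\guruconnect-{session_id}")
}

/// Loose shape check for a string SID such as "S-1-5-21-...-1001", so that
/// only digits and dashes can ever reach the SDDL string.
fn is_plausible_sid(sid: &str) -> bool {
    let Some(rest) = sid.strip_prefix("S-1-") else {
        return false;
    };
    !rest.is_empty()
        && rest
            .split('-')
            .all(|part| !part.is_empty() && part.bytes().all(|b| b.is_ascii_digit()))
}

/// SDDL for the per-session pipe:
///   D:P                  protected DACL (no inherited ACEs)
///   (A;;GA;;;SY)         allow GENERIC_ALL to LocalSystem
///   (A;;GRGW;;;<user>)   allow GENERIC_READ | GENERIC_WRITE to the session user
/// Principals with no matching ACE are denied access.
fn pipe_sddl(user_sid: &str) -> Option<String> {
    if !is_plausible_sid(user_sid) {
        return None;
    }
    Some(format!("D:P(A;;GA;;;SY)(A;;GRGW;;;{user_sid})"))
}
```

The shape check stops a malformed SID value from smuggling extra ACEs into the descriptor. Creating the pipe with FILE_FLAG_FIRST_PIPE_INSTANCE is also worth considering, so that another process cannot pre-create the pipe name before the service does.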

Implementation details

  • New service module (e.g. agent/src/service/{mod.rs, broker.rs, ipc.rs}); worker entry split out of the current capture path. New Commands variants or an internal --service/--session-worker dispatch in agent/src/main.rs.
  • install.rs: service create/recovery/delete; drop the managed-mode HKCU Run write.
  • windows crate features: Win32_System_Services, Win32_System_RemoteDesktop (WTS*), Win32_Security, Win32_System_Threading (CreateProcessAsUserW), Win32_System_Pipes.
  • Remove the resolve_agent_credential fail-fast guard branch added in SPEC-016 Phase B.
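For the internal --service/--session-worker option, here is a sketch of the role dispatch at the top of `main`. `Role`, `parse_role` and the positional session-id argument after `--session-worker` are hypothetical; new `Commands` variants would serve equally well.

```rust
/// Role this process was launched in (hypothetical names).
#[derive(Debug, PartialEq, Eq)]
enum Role {
    /// Started by the SCM as the LocalSystem service host.
    ServiceHost,
    /// Started by the service via CreateProcessAsUserW into `session_id`.
    SessionWorker { session_id: u32 },
    /// Anything else: fall through to the existing CLI handling.
    Cli,
}

/// Decide the role from argv; args[0] is the executable path.
fn parse_role(args: &[String]) -> Result<Role, String> {
    match args.get(1).map(String::as_str) {
        Some("--service") => Ok(Role::ServiceHost),
        Some("--session-worker") => {
            let raw = args
                .get(2)
                .ok_or("--session-worker requires a session id")?;
            let session_id: u32 = raw
                .parse()
                .map_err(|_| format!("invalid session id: {raw}"))?;
            if session_id == 0 {
                // Session 0 has no interactive desktop; a worker there is a bug.
                return Err("refusing to start a session worker in Session 0".to_string());
            }
            Ok(Role::SessionWorker { session_id })
        }
        _ => Ok(Role::Cli),
    }
}
```

Under this sketch, `main` would call `parse_role(&std::env::args().collect::<Vec<_>>())` and branch into the service dispatcher, the worker loop, or the existing CLI. A command-line flag is not an authorization boundary, so the service should still rely on the ACL'd pipe rather than on how a worker claims to have been launched.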

Testing strategy

  • Service: install → auto-start on boot → stop → uninstall on a clean VM.
  • cak_ end-to-end: SYSTEM service enrolls (SPEC-016), stores + reads the cak_, connects — the integration test SPEC-016 Phase B currently cannot run.
  • Session broker: worker spawns into the active session; capture/input work; survives logoff→logon (respawn) and console-connect (retarget); fast-user-switch retarget.
  • Security: non-admin cannot connect to the IPC pipe; worker runs with the user's token (not SYSTEM) in the user's desktop.
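The session-broker cases above (logoff→logon respawn, console-connect retarget, fast-user-switch) are easiest to cover if the reaction to SERVICE_CONTROL_SESSIONCHANGE is a pure function. This sketch assumes the service re-resolves the active interactive session after each notification and passes it in. The event codes are the documented WTS_* values. `BrokerAction` and `reconcile` are hypothetical.

```rust
// Documented WTS_* session-change codes delivered with SERVICE_CONTROL_SESSIONCHANGE.
const WTS_CONSOLE_CONNECT: u32 = 0x1;
const WTS_CONSOLE_DISCONNECT: u32 = 0x2;
const WTS_SESSION_LOGON: u32 = 0x5;
const WTS_SESSION_LOGOFF: u32 = 0x6;
const WTS_SESSION_LOCK: u32 = 0x7;
const WTS_SESSION_UNLOCK: u32 = 0x8;

/// What the broker should do with its worker (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum BrokerAction {
    Keep,
    Spawn(u32),
    Retarget { from: u32, to: u32 },
    Stop(u32),
}

/// `event` / `event_session`: the notification and the session it names.
/// `worker`: the session the current worker serves, if any.
/// `active`: the active interactive session re-resolved after the event.
fn reconcile(event: u32, event_session: u32, worker: Option<u32>, active: Option<u32>) -> BrokerAction {
    match event {
        // Lock/unlock do not change which session is active, so the worker stays.
        // (Serving the secure desktop while locked is SPEC-013 territory.)
        WTS_SESSION_LOCK | WTS_SESSION_UNLOCK => BrokerAction::Keep,
        // The served user logged off: stop that worker regardless of what
        // re-resolution reports, so a later logon starts a fresh one.
        WTS_SESSION_LOGOFF if worker == Some(event_session) => BrokerAction::Stop(event_session),
        WTS_CONSOLE_CONNECT | WTS_CONSOLE_DISCONNECT | WTS_SESSION_LOGON | WTS_SESSION_LOGOFF => {
            match (worker, active) {
                (None, None) => BrokerAction::Keep,
                (None, Some(to)) => BrokerAction::Spawn(to),
                (Some(from), None) => BrokerAction::Stop(from),
                (Some(from), Some(to)) if from != to => BrokerAction::Retarget { from, to },
                (Some(_), Some(_)) => BrokerAction::Keep,
            }
        }
        // Remote connect/disconnect and other codes are ignored in this sketch.
        _ => BrokerAction::Keep,
    }
}
```

Keeping Win32 calls out of this function lets these cases run as ordinary unit tests. The clean-VM runs can then focus on real token and desktop behaviour.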

Effort estimate & dependencies

  • Size: X-Large (service host + worker split + token-handoff + IPC + session-change handling + install/uninstall).
  • Depends on: SPEC-016 (enrollment + cak_ store); the existing capture/input cores.
  • Unblocks: SPEC-016 Phase B end-to-end runtime (and the parked managed-agent enrollment test on the internal beta machines); SPEC-013 (session selection builds on this broker).

Open questions

  1. Service vs. SYSTEM scheduled task — a true Windows service (recovery, SCM integration) is the standard, robust choice; recommend service. Lock in planning.
  2. One multi-session worker vs. one worker per session — per-session worker is simpler to reason about and isolates a crash to one session; confirm.
  3. IPC transport — named pipe (recommended) vs. local TCP/loopback; pipe ACLing is the cleaner security story.
  4. Login-screen / Secure-Desktop capture — how much (if any) in this spec vs. deferred to SPEC-013 (it needs a worker in the winlogon/secure desktop, a distinct hard problem).
  5. Migration — on upgrade, cleanly transition existing HKCU-Run managed installs to the service (remove the Run entry, install the service) without a gap.