spec: add SPEC-018 managed-agent SYSTEM service host + session broker

LocalSystem service that runs the persistent agent unattended and brokers
per-session capture/input workers (Session 0 can't capture directly).
Unblocks SPEC-016 Phase B end-to-end (SYSTEM-ACL'd cak_ store readable;
removes the Phase B fail-fast guard) and is the broker primitive SPEC-013
builds on. 017 was taken by Mike's end-user-access spec, so this is 018.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 13:13:04 -07:00
parent 4c49b73a71
commit 94c07c2431
2 changed files with 147 additions and 0 deletions


@@ -68,6 +68,7 @@ Bringing GC to parity with GuruRMM's release engineering. Full plan: [SPEC-001](
- [x] Protobuf-over-WSS transport, Zstd frame compression
- [~] React/TS web viewer (`dashboard/src/components/RemoteViewer.tsx`) — embeddable session viewer
- [ ] **Headless Linux mode (direct TTY access)** — P2 — Terminal-based remote access for Linux servers without GUI. PTY spawn (`openpty`), xterm.js web viewer, full ANSI/VT100 support. Enables server management, container debugging, emergency recovery via GuruConnect dashboard with audit logging. SSH replacement with centralized auth. ([SPEC-012](specs/SPEC-012-headless-linux-tty.md))
- [ ] **Managed-agent SYSTEM service host + session broker** — P1 — convert the persistent agent from `HKCU Run` (user context) to a LocalSystem **service** that runs unattended (login screen, no user, across reboots) and spawns a per-session capture/input worker into the active desktop (Session 0 can't capture directly). Unblocks SPEC-016 Phase B end-to-end (the SYSTEM-ACL'd `cak_` store becomes readable; removes the Phase B fail-fast guard), enables true unattended access, and is the **broker primitive SPEC-013 builds on**. ([SPEC-018](specs/SPEC-018-managed-agent-service-host.md))
- [ ] **Windows session selection and backstage mode** — P2 — Enumerate and switch between Windows user sessions (Terminal Services/RDP/Fast User Switching) and access Session 0 (backstage) for system-level admin tasks. ScreenConnect parity: session selector shows all logged-on users, instant switching without reconnect. Backstage mode provides terminal/command interface for services management without disrupting any user desktop. Critical for multi-user server environments. ([SPEC-013](specs/SPEC-013-session-selection-and-backstage.md))
- [ ] **Configurable notification overlay on viewer connection** — P2 — Display a semi-transparent on-screen notification when a technician connects, showing technician name and company. Dashboard-configurable message template (supports `{{technician_name}}`, `{{company}}`, `{{time}}`), duration (5-60s), position (top-left/right, bottom-left/right, center), and dismissible behavior. Increases transparency and user awareness during remote support sessions. Compliance-friendly for privacy policies requiring user notification. ([SPEC-015](specs/SPEC-015-notification-overlay.md))
- [ ] Multi-monitor switching — P2


@@ -0,0 +1,146 @@
# SPEC-018: Managed-Agent SYSTEM Service Host + Session Broker
**Status:** Proposed
**Priority:** P1 (blocks SPEC-016 Phase B end-to-end runtime and SPEC-013)
**Requested By:** Mike (2026-06-02)
**Estimated Effort:** X-Large
## Overview
Convert the managed/persistent GuruConnect agent from a user-context `HKCU\…\Run` autostart into a
**Windows SYSTEM service** that runs unattended — at the login screen, with no user logged in, across
reboots — and **brokers per-session capture/input worker processes** into the active interactive
desktop. A SYSTEM service lives in the isolated **Session 0** and cannot capture the interactive
desktop or inject input into it directly, so the service spawns a worker into the target user
session (the ScreenConnect architecture).
This is foundational, not cosmetic. It unblocks three things at once:
1. **SPEC-016 Phase B end-to-end runtime** — the per-machine `cak_` store is ACL'd to SYSTEM +
Administrators; today the agent runs as the interactive *user* and can't read its own store (the
Phase B C1 *fail-fast guard* exists precisely because of this). Running as SYSTEM makes the store
readable and removes the guard.
2. **True unattended access** — a user-context agent only runs while that user is logged in. Reaching
a rebooted server or a machine sitting at the login screen (table-stakes for remote support)
requires SYSTEM.
3. **SPEC-013 session selection / backstage** — the session-broker primitive built here is the
substrate SPEC-013's session-switching UX drives.
**Success criteria:** the managed agent installs as an auto-start SYSTEM service; it holds the relay
connection and performs SPEC-016 enrollment as SYSTEM (reading/writing the SYSTEM-ACL'd `cak_`); it
spawns a capture/input worker into the active interactive session and relays frames; the worker is
respawned/retargeted on logon/logoff/console-connect; and the Phase B fail-fast guard is removed
because the store is now readable in-context.
## Background — why this is needed (confirmed in code)
- The persistent agent autostarts via `HKCU\…\Run` (`agent/src/startup.rs:21`, `STARTUP_KEY` = HKCU)
→ interactive-user token, not SYSTEM. The only SYSTEM service today is the separate `sas_service`
(Secure Attention Sequence helper).
- SPEC-016 Phase B (`agent/src/credential_store.rs`) ACLs the `cak_` store to `*S-1-5-18` (SYSTEM) +
`*S-1-5-32-544` (Administrators). In the current user context the agent writes but cannot read it
back → the Phase B fail-fast guard (`agent/src/main.rs` `resolve_agent_credential`) emits
"must run as the GuruConnect SYSTEM service (see SPEC-018)" instead of bricking.
- Capture/input live in the agent process (`agent/src/capture/`, `agent/src/input/`); a Session-0
SYSTEM service cannot drive these against the interactive desktop without a per-session worker.
## Scope
### Included in v1
1. **Windows service install/lifecycle** (`agent/src/install.rs` + a new service module): register the
managed agent as a **LocalSystem auto-start service** (`CreateServiceW` / a service crate),
configure failure/recovery (restart on crash), and **replace the HKCU `Run` autostart for managed
mode** (remove the Run entry on service install). Clean uninstall (stop + delete service).
2. **Service control loop** (Session 0, SYSTEM): owns the persistent WSS connection to the relay,
performs SPEC-016 enrollment as SYSTEM (now able to read/write the `cak_` store), and dispatches
session/connect requests to workers. Handles `SERVICE_CONTROL_STOP`/`SHUTDOWN` and
`SERVICE_CONTROL_SESSIONCHANGE`.
3. **Session broker:** enumerate sessions (`WTSEnumerateSessionsW`), resolve the active interactive
session (`WTSGetActiveConsoleSessionId`), obtain its user token (`WTSQueryUserToken` →
`DuplicateTokenEx`), and spawn a **per-session capture/input worker** into that session's desktop
(`CreateProcessAsUserW`, `winsta0\default`). The worker does DXGI capture + input injection in the
user's session; the service relays frames over the existing transport.
4. **Service ↔ worker IPC:** a local, ACL'd channel (named pipe `\\.\pipe\guruconnect-<sessionId>`)
carrying frames/input/control; pipe ACL restricted to SYSTEM + the target session user.
5. **Session-change handling:** on logon/logoff/console-connect/disconnect/lock/unlock, (re)spawn or
retarget the worker so the active desktop is always the one being served.
6. **Remove the SPEC-016 Phase B fail-fast guard** once the service runs as SYSTEM (the store is
readable in-context); keep the SYSTEM+Administrators ACL.
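
The session-change handling in item 5 can be sketched as a pure decision function, kept apart from the Win32 calls so it can be unit-tested without a running service. This is a minimal sketch under stated assumptions, not the agent's implementation: `SessionEvent`, `BrokerAction`, and `plan` are hypothetical names, and the policy shown (unlock is treated like logon, lock leaves the worker in place, and the caller only forwards events for sessions it has already validated as the intended target) is an assumption the spec does not fix.

```rust
/// Session-change notifications the broker reacts to (hypothetical type;
/// it mirrors the logon/logoff/console/lock events listed in the spec).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SessionEvent {
    ConsoleConnect,
    ConsoleDisconnect,
    Logon,
    Logoff,
    Lock,
    Unlock,
}

/// What the service should do with its worker (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BrokerAction {
    /// No worker is running: start one in `session`.
    Spawn { session: u32 },
    /// A worker is running in `from`: stop it and start one in `to`.
    Retarget { from: u32, to: u32 },
    /// The served session went away: stop its worker.
    Stop { session: u32 },
    /// Nothing to do.
    Keep,
}

/// Decide the next action from the session currently served (`current`),
/// the event kind, and the session id the event refers to.
pub fn plan(current: Option<u32>, event: SessionEvent, session: u32) -> BrokerAction {
    // Assumed policy: these events make `session` the desktop to serve.
    let activates = matches!(
        event,
        SessionEvent::ConsoleConnect | SessionEvent::Logon | SessionEvent::Unlock
    );
    // Assumed policy: these events end service of `session`.
    let deactivates = matches!(
        event,
        SessionEvent::ConsoleDisconnect | SessionEvent::Logoff
    );

    match current {
        None if activates => BrokerAction::Spawn { session },
        Some(cur) if activates && cur != session => {
            BrokerAction::Retarget { from: cur, to: session }
        }
        Some(cur) if deactivates && cur == session => BrokerAction::Stop { session },
        // Lock, or an event about a session that is not being served.
        _ => BrokerAction::Keep,
    }
}
```

Keeping the decision pure means the `SERVICE_CONTROL_SESSIONCHANGE` handler only has to translate the notification into a `SessionEvent` and execute the returned action, and the logoff→logon and console-connect cases from the testing strategy can be covered by ordinary unit tests.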
### Explicitly out of scope (anticipated, separate specs)
- **Session-selection / backstage UX** — the operator-facing picker and Session-0/secure-desktop
command surface are **SPEC-013**; this spec only provides the broker primitive it drives.
- **Login-screen / secure-desktop (winlogon) capture** beyond the broker hook — the hard
Secure-Desktop case is coordinated with SPEC-013; v1 here targets the active interactive session.
- **macOS/Linux service equivalents** — future SPEC-010 (cross-platform agents).
## Architecture
- **Agent splits into two roles:**
- **service-host** (LocalSystem, Session 0): service lifecycle, relay transport, SPEC-016
enrollment + `cak_` store, session broker, IPC server.
- **session-worker** (per interactive session, user token): DXGI/GDI capture, input injection,
IPC client. Spawned by the service via `CreateProcessAsUserW`.
- **Service install** (`install.rs`): `CreateServiceW` with `SERVICE_AUTO_START`, `SERVICE_WIN32_OWN_PROCESS`,
recovery actions; uninstall stops + deletes. Replaces managed-mode `HKCU Run`.
- **Token handoff:** `WTSGetActiveConsoleSessionId` → `WTSQueryUserToken` → `DuplicateTokenEx`
(primary token) → `CreateProcessAsUserW` with `lpDesktop = "winsta0\\default"`.
- **IPC:** named pipe per session, length-prefixed protobuf (reuse `proto/` message types where
sensible), pipe security descriptor granting only SYSTEM + the session user.
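- **Session events:** the service accepts session-change controls (`SERVICE_ACCEPT_SESSIONCHANGE`) so
the SCM delivers `SERVICE_CONTROL_SESSIONCHANGE`, and reacts to `WTS_CONSOLE_CONNECT`/`WTS_CONSOLE_DISCONNECT`,
`WTS_SESSION_LOGON`/`WTS_SESSION_LOGOFF`, and `WTS_SESSION_LOCK`/`WTS_SESSION_UNLOCK`.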
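
The length-prefixed framing named in the IPC bullet could look roughly like the following. It is a sketch only: the spec does not fix the prefix width, byte order, or a size limit, so the 4-byte little-endian prefix, the `MAX_FRAME_LEN` cap, and the function names `write_frame`/`read_frame` are all assumptions. The payload is treated as opaque bytes (an encoded protobuf message in practice), and the functions are generic over `Read`/`Write` so they work on any byte stream, including a pipe handle wrapped in a suitable type.

```rust
use std::io::{self, Read, Write};

/// Upper bound on one frame. Assumed value for this sketch; it exists so a
/// misbehaving peer cannot make the reader allocate an arbitrary amount.
pub const MAX_FRAME_LEN: u32 = 16 * 1024 * 1024;

/// Write one frame: a 4-byte little-endian length followed by the payload.
pub fn write_frame<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > MAX_FRAME_LEN as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "frame too large"));
    }
    let len = payload.len() as u32; // safe: checked against MAX_FRAME_LEN above
    w.write_all(&len.to_le_bytes())?;
    w.write_all(payload)
}

/// Read one frame written by `write_frame`.
/// Returns an error on a truncated stream or an oversized length prefix.
pub fn read_frame<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut header = [0u8; 4];
    r.read_exact(&mut header)?;
    let len = u32::from_le_bytes(header);
    if len > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut payload = vec![0u8; len as usize];
    r.read_exact(&mut payload)?;
    Ok(payload)
}
```

Checking the length before allocating matters here because the service end of the pipe runs as LocalSystem and the worker end runs with the user's token, so the service should treat worker-supplied lengths as untrusted input.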
## Security considerations
- **LocalSystem is maximal privilege** — minimize the service's attack surface; validate every
relay-delivered command; never spawn a worker except into a legitimately-enumerated active session.
- **IPC pipe must be ACL'd** (SYSTEM + the specific session user only) so a non-admin user can't
inject capture/input commands by connecting to the pipe.
- **Token hygiene:** close duplicated tokens promptly; don't leak SYSTEM or user primary tokens.
- The SPEC-016 `cak_` store (SYSTEM-ACL'd) is now correctly readable; the fail-fast guard is removed
but the ACL stays.
- **Audit:** service start/stop, enrollment-as-SYSTEM, worker spawn, session attach/retarget — written
to the existing event pipeline.
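
One way to express the pipe ACL requirement is an SDDL string that the service would later convert to a security descriptor (on Windows, typically via `ConvertStringSecurityDescriptorToSecurityDescriptorW`) before creating the pipe. The sketch below only builds the strings and is platform-independent. The helper names `pipe_name` and `pipe_sddl` are hypothetical, and the specific rights are an assumption: full access for SYSTEM (`SY`) and generic read/write for the session user, with a protected DACL so no inherited entries apply.

```rust
/// Per-session pipe name, following the `guruconnect-<sessionId>` pattern
/// from the spec.
pub fn pipe_name(session_id: u32) -> String {
    format!(r"\\.\pipe\guruconnect-{}", session_id)
}

/// Build an SDDL string granting access only to SYSTEM and one user SID.
///
/// Returns `None` if `user_sid` is not of the form `S-1-<digits>-<digits>...`.
/// The check keeps characters such as `(`, `)` and `;` out of the string,
/// so a malformed SID cannot smuggle in an extra ACE.
pub fn pipe_sddl(user_sid: &str) -> Option<String> {
    let well_formed = user_sid.starts_with("S-1-")
        && user_sid[4..]
            .split('-')
            .all(|part| !part.is_empty() && part.bytes().all(|b| b.is_ascii_digit()));
    if !well_formed {
        return None;
    }
    // D:P       protected DACL (no inherited ACEs)
    // A;;GA;;;SY  allow GENERIC_ALL to Local System
    // A;;GRGW;;;<sid>  allow GENERIC_READ | GENERIC_WRITE to the session user
    Some(format!("D:P(A;;GA;;;SY)(A;;GRGW;;;{})", user_sid))
}
```

Because the DACL lists only those two principals, any other account, including a different non-admin user on the same machine, is denied when it tries to open the pipe, which is the behaviour the security test in this spec checks for.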
## Implementation details
- New service module (e.g. `agent/src/service/{mod.rs, broker.rs, ipc.rs}`); worker entry split out of
the current capture path. New `Commands` variants or an internal `--service`/`--session-worker`
dispatch in `agent/src/main.rs`.
- `install.rs`: service create/recovery/delete; drop the managed-mode HKCU `Run` write.
- `windows` crate features: `Win32_System_Services`, `Win32_System_RemoteDesktop`
(`WTS*`), `Win32_Security`, `Win32_System_Threading` (`CreateProcessAsUserW`),
`Win32_System_Pipes`.
- Remove the `resolve_agent_credential` fail-fast guard branch added in SPEC-016 Phase B.
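
The internal `--service`/`--session-worker` dispatch mentioned above might be sketched as a small argument parser that runs before the existing CLI handling. Everything here beyond the two flag names is an assumption: the `Mode` enum, the `parse_mode` function, the choice to pass the session id as the argument following `--session-worker`, and the fall-through to the existing `Commands` handling for anything else.

```rust
/// Which role this process should take (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Mode {
    /// Run as the LocalSystem service host in Session 0.
    Service,
    /// Run as the capture/input worker for one interactive session.
    SessionWorker { session_id: u32 },
    /// Neither internal flag was given: defer to the existing CLI parsing.
    ExistingCli,
}

/// Inspect the arguments after the program name and pick a mode.
pub fn parse_mode<I, S>(args: I) -> Result<Mode, String>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let mut args = args.into_iter();
    let first = match args.next() {
        Some(arg) => arg,
        None => return Ok(Mode::ExistingCli),
    };
    match first.as_ref() {
        "--service" => Ok(Mode::Service),
        "--session-worker" => {
            // Assumed convention: the session id follows the flag.
            let raw = args
                .next()
                .ok_or_else(|| "--session-worker requires a session id".to_string())?;
            raw.as_ref()
                .parse::<u32>()
                .map(|session_id| Mode::SessionWorker { session_id })
                .map_err(|e| format!("invalid session id {:?}: {}", raw.as_ref(), e))
        }
        _ => Ok(Mode::ExistingCli),
    }
}
```

In `main`, this would be called with `std::env::args().skip(1)`. Rejecting a missing or non-numeric session id up front keeps the worker from starting with no clear target; the service is the only intended caller of `--session-worker`.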
## Testing strategy
- **Service:** install → auto-start on boot → stop → uninstall on a clean VM.
- **`cak_` end-to-end:** SYSTEM service enrolls (SPEC-016), stores + reads the `cak_`, connects — the
integration test SPEC-016 Phase B currently cannot run.
- **Session broker:** worker spawns into the active session; capture/input work; survives logoff→logon
(respawn) and console-connect (retarget); fast-user-switch retarget.
- **Security:** non-admin cannot connect to the IPC pipe; worker runs with the user's token (not
SYSTEM) in the user's desktop.
## Effort estimate & dependencies
- **Size:** X-Large (service host + worker split + token-handoff + IPC + session-change handling +
install/uninstall).
- **Depends on:** SPEC-016 (enrollment + `cak_` store); the existing capture/input cores.
- **Unblocks:** SPEC-016 Phase B end-to-end runtime (and the parked managed-agent enrollment test on
the internal beta machines); **SPEC-013** (session selection builds on this broker).
## Open questions
1. **Service vs. SYSTEM scheduled task** — a true Windows service (recovery, SCM integration) is the
standard, robust choice, so a service is recommended. Confirm during planning.
2. **One multi-session worker vs. one worker per session** — per-session worker is simpler to reason
about and isolates a crash to one session; confirm.
3. **IPC transport** — named pipe (recommended) vs. local TCP/loopback; pipe ACLing is the cleaner
security story.
4. **Login-screen / Secure-Desktop capture** — how much (if any) in this spec vs. deferred to SPEC-013
(it needs a worker in the winlogon/secure desktop, a distinct hard problem).
5. **Migration** — on upgrade, cleanly transition existing HKCU-`Run` managed installs to the service
(remove the Run entry, install the service) without a gap.