spec: add SPEC-018 managed-agent SYSTEM service host + session broker
LocalSystem service that runs the persistent agent unattended and brokers per-session capture/input workers (Session 0 can't capture directly). Unblocks SPEC-016 Phase B end-to-end (SYSTEM-ACL'd cak_ store readable; removes the Phase B fail-fast guard) and is the broker primitive SPEC-013 builds on. 017 was taken by Mike's end-user-access spec, so this is 018.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -68,6 +68,7 @@ Bringing GC to parity with GuruRMM's release engineering. Full plan: [SPEC-001](
- [x] Protobuf-over-WSS transport, Zstd frame compression
- [~] React/TS web viewer (`dashboard/src/components/RemoteViewer.tsx`) — embeddable session viewer
- [ ] **Headless Linux mode (direct TTY access)** — P2 — Terminal-based remote access for Linux servers without GUI. PTY spawn (`openpty`), xterm.js web viewer, full ANSI/VT100 support. Enables server management, container debugging, emergency recovery via GuruConnect dashboard with audit logging. SSH replacement with centralized auth. ([SPEC-012](specs/SPEC-012-headless-linux-tty.md))
- [ ] **Managed-agent SYSTEM service host + session broker** — P1 — convert the persistent agent from `HKCU Run` (user context) to a LocalSystem **service** that runs unattended (login screen, no user, across reboots) and spawns a per-session capture/input worker into the active desktop (Session 0 can't capture directly). Unblocks SPEC-016 Phase B end-to-end (the SYSTEM-ACL'd `cak_` store becomes readable; removes the Phase B fail-fast guard), enables true unattended access, and is the **broker primitive SPEC-013 builds on**. ([SPEC-018](specs/SPEC-018-managed-agent-service-host.md))
- [ ] **Windows session selection and backstage mode** — P2 — Enumerate and switch between Windows user sessions (Terminal Services/RDP/Fast User Switching) and access Session 0 (backstage) for system-level admin tasks. ScreenConnect parity: session selector shows all logged-on users, instant switching without reconnect. Backstage mode provides terminal/command interface for services management without disrupting any user desktop. Critical for multi-user server environments. ([SPEC-013](specs/SPEC-013-session-selection-and-backstage.md))
- [ ] **Configurable notification overlay on viewer connection** — P2 — Display a semi-transparent on-screen notification when a technician connects, showing technician name and company. Dashboard-configurable message template (supports `{{technician_name}}`, `{{company}}`, `{{time}}`), duration (5-60s), position (top-left/right, bottom-left/right, center), and dismissible behavior. Increases transparency and user awareness during remote support sessions. Compliance-friendly for privacy policies requiring user notification. ([SPEC-015](specs/SPEC-015-notification-overlay.md))
- [ ] Multi-monitor switching — P2
146
docs/specs/SPEC-018-managed-agent-service-host.md
Normal file
@@ -0,0 +1,146 @@
# SPEC-018: Managed-Agent SYSTEM Service Host + Session Broker

**Status:** Proposed
**Priority:** P1 (blocks SPEC-016 Phase B end-to-end runtime and SPEC-013)
**Requested By:** Mike (2026-06-02)
**Estimated Effort:** X-Large

## Overview

Convert the managed/persistent GuruConnect agent from a user-context `HKCU\…\Run` autostart into a
**Windows SYSTEM service** that runs unattended — at the login screen, with no user logged in, across
reboots — and **brokers per-session capture/input worker processes** into the active interactive
desktop. A SYSTEM service lives in the isolated **Session 0** and cannot capture or inject input into
the interactive desktop directly, so the service spawns a worker into the target user session (the
ScreenConnect architecture).

This is foundational, not cosmetic. It unblocks three things at once:
1. **SPEC-016 Phase B end-to-end runtime** — the per-machine `cak_` store is ACL'd to SYSTEM +
   Administrators; today the agent runs as the interactive *user* and can't read its own store (the
   Phase B C1 *fail-fast guard* exists precisely because of this). Running as SYSTEM makes the store
   readable and removes the guard.
2. **True unattended access** — a user-context agent only runs while that user is logged in. Reaching
   a rebooted server or a machine sitting at the login screen (table-stakes for remote support)
   requires SYSTEM.
3. **SPEC-013 session selection / backstage** — the session-broker primitive built here is the
   substrate SPEC-013's session-switching UX drives.

**Success criteria:** the managed agent installs as an auto-start SYSTEM service; it holds the relay
connection and performs SPEC-016 enrollment as SYSTEM (reading/writing the SYSTEM-ACL'd `cak_`); it
spawns a capture/input worker into the active interactive session and relays frames; the worker is
respawned/retargeted on logon/logoff/console-connect; and the Phase B fail-fast guard is removed
because the store is now readable in-context.

## Background — why this is needed (confirmed in code)

- The persistent agent autostarts via `HKCU\…\Run` (`agent/src/startup.rs:21`, `STARTUP_KEY` = HKCU)
  → interactive-user token, not SYSTEM. The only SYSTEM service today is the separate `sas_service`
  (Secure Attention Sequence helper).
- SPEC-016 Phase B (`agent/src/credential_store.rs`) ACLs the `cak_` store to `*S-1-5-18` (SYSTEM) +
  `*S-1-5-32-544` (Administrators). In the current user context the agent writes but cannot read it
  back → the Phase B fail-fast guard (`agent/src/main.rs` `resolve_agent_credential`) emits
  "must run as the GuruConnect SYSTEM service (see SPEC-018)" instead of bricking.
- Capture/input live in the agent process (`agent/src/capture/`, `agent/src/input/`); a Session-0
  SYSTEM service cannot drive these against the interactive desktop without a per-session worker.

## Scope

### Included in v1

1. **Windows service install/lifecycle** (`agent/src/install.rs` + a new service module): register the
   managed agent as a **LocalSystem auto-start service** (`CreateServiceW` / a service crate),
   configure failure/recovery (restart on crash), and **replace the HKCU `Run` autostart for managed
   mode** (remove the Run entry on service install). Clean uninstall (stop + delete service).
2. **Service control loop** (Session 0, SYSTEM): owns the persistent WSS connection to the relay,
   performs SPEC-016 enrollment as SYSTEM (now able to read/write the `cak_` store), and dispatches
   session/connect requests to workers. Handles `SERVICE_CONTROL_STOP`/`SHUTDOWN` and
   `SERVICE_CONTROL_SESSIONCHANGE`.
3. **Session broker:** enumerate sessions (`WTSEnumerateSessionsW`), resolve the active interactive
   session (`WTSGetActiveConsoleSessionId`), obtain its user token (`WTSQueryUserToken` →
   `DuplicateTokenEx`), and spawn a **per-session capture/input worker** into that session's desktop
   (`CreateProcessAsUserW`, `winsta0\default`). The worker does DXGI capture + input injection in the
   user's session; the service relays frames over the existing transport.
4. **Service ↔ worker IPC:** a local, ACL'd channel (named pipe `\\.\pipe\guruconnect-<sessionId>`)
   carrying frames/input/control; pipe ACL restricted to SYSTEM + the target session user.
5. **Session-change handling:** on logon/logoff/console-connect/disconnect/lock/unlock, (re)spawn or
   retarget the worker so the active desktop is always the one being served.
6. **Remove the SPEC-016 Phase B fail-fast guard** once the service runs as SYSTEM (the store is
   readable in-context); keep the SYSTEM+Administrators ACL.
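
The session-change handling in item 5 can be pictured as a small decision table. The sketch below is illustrative only: `WorkerAction` and `on_session_change` are hypothetical names, and the policy (spawn on connect/logon/unlock, stop on logoff/disconnect, leave the worker alone on lock) is an assumption, not something this spec has decided. The numeric event codes are the Windows SDK values for the `WTS_*` session-change constants.

```rust
/// Session-change codes delivered with SERVICE_CONTROL_SESSIONCHANGE
/// (values as defined in the Windows SDK).
pub const WTS_CONSOLE_CONNECT: u32 = 0x1;
pub const WTS_CONSOLE_DISCONNECT: u32 = 0x2;
pub const WTS_SESSION_LOGON: u32 = 0x5;
pub const WTS_SESSION_LOGOFF: u32 = 0x6;
pub const WTS_SESSION_LOCK: u32 = 0x7;
pub const WTS_SESSION_UNLOCK: u32 = 0x8;

/// What the broker should do with its worker (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum WorkerAction {
    /// Start a worker in this session, replacing any worker serving another one.
    Spawn(u32),
    /// Stop the worker that was serving this session.
    Stop(u32),
    /// Leave things as they are.
    Keep,
}

/// Decide how to react to one session-change event.
/// `serving` is the session the current worker runs in, if any.
pub fn on_session_change(event: u32, session_id: u32, serving: Option<u32>) -> WorkerAction {
    match event {
        // A desktop became (or became again) the one to serve.
        WTS_CONSOLE_CONNECT | WTS_SESSION_LOGON | WTS_SESSION_UNLOCK => {
            if serving == Some(session_id) {
                WorkerAction::Keep
            } else {
                WorkerAction::Spawn(session_id)
            }
        }
        // The served desktop went away.
        WTS_CONSOLE_DISCONNECT | WTS_SESSION_LOGOFF => {
            if serving == Some(session_id) {
                WorkerAction::Stop(session_id)
            } else {
                WorkerAction::Keep
            }
        }
        // Assumed policy: a lock leaves the worker in place, since
        // secure-desktop capture is out of scope for v1.
        WTS_SESSION_LOCK => WorkerAction::Keep,
        // Unknown or unhandled codes are ignored.
        _ => WorkerAction::Keep,
    }
}
```

Keeping the decision a pure function of the event and the currently served session means the respawn/retarget rules can be unit-tested without a running service; the real handler would translate each `WorkerAction` into the token-handoff and process-spawn calls listed in item 3.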

### Explicitly out of scope (anticipated, separate specs)

- **Session-selection / backstage UX** — the operator-facing picker and Session-0/secure-desktop
  command surface are **SPEC-013**; this spec only provides the broker primitive it drives.
- **Login-screen / secure-desktop (winlogon) capture** beyond the broker hook — the hard
  Secure-Desktop case is coordinated with SPEC-013; v1 here targets the active interactive session.
- **macOS/Linux service equivalents** — future SPEC-010 (cross-platform agents).

## Architecture

- **Agent splits into two roles:**
  - **service-host** (LocalSystem, Session 0): service lifecycle, relay transport, SPEC-016
    enrollment + `cak_` store, session broker, IPC server.
  - **session-worker** (per interactive session, user token): DXGI/GDI capture, input injection,
    IPC client. Spawned by the service via `CreateProcessAsUserW`.
- **Service install** (`install.rs`): `CreateServiceW` with `SERVICE_AUTO_START`, `SERVICE_WIN32_OWN_PROCESS`,
  recovery actions; uninstall stops + deletes. Replaces managed-mode `HKCU Run`.
- **Token handoff:** `WTSGetActiveConsoleSessionId` → `WTSQueryUserToken` → `DuplicateTokenEx`
  (primary token) → `CreateProcessAsUserW` with `lpDesktop = "winsta0\\default"`.
- **IPC:** named pipe per session, length-prefixed protobuf (reuse `proto/` message types where
  sensible), pipe security descriptor granting only SYSTEM + the session user.
- **Session events:** the service accepts `SERVICE_CONTROL_SESSIONCHANGE` (it sets
  `SERVICE_ACCEPT_SESSIONCHANGE` and registers a `HandlerEx`-style control handler) and reacts to
  `WTS_CONSOLE_CONNECT`/`WTS_CONSOLE_DISCONNECT`, `WTS_SESSION_LOGON`/`WTS_SESSION_LOGOFF`, and
  `WTS_SESSION_LOCK`/`WTS_SESSION_UNLOCK`.
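
To make the IPC bullet concrete, here is a minimal sketch of per-session pipe naming and length-prefixed framing over any `Read`/`Write` stream. The 4-byte little-endian length prefix and the `MAX_FRAME_LEN` cap are assumptions (the spec says only "length-prefixed protobuf"), the function names are hypothetical, and the payload is treated as opaque bytes rather than a real `proto/` message.

```rust
use std::io::{self, Read, Write};

/// Assumed upper bound on one frame, so a misbehaving peer cannot force a huge allocation.
const MAX_FRAME_LEN: u32 = 16 * 1024 * 1024;

/// Pipe name for one session, following the `guruconnect-<sessionId>` pattern.
pub fn pipe_name(session_id: u32) -> String {
    format!(r"\\.\pipe\guruconnect-{}", session_id)
}

/// Write one frame: a 4-byte little-endian length, then the payload bytes.
pub fn write_frame<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    let len = u32::try_from(payload.len())
        .ok()
        .filter(|&n| n <= MAX_FRAME_LEN)
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "frame too large"))?;
    w.write_all(&len.to_le_bytes())?;
    w.write_all(payload)
}

/// Read one frame produced by `write_frame`, rejecting oversized lengths
/// before allocating the payload buffer.
pub fn read_frame<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut header = [0u8; 4];
    r.read_exact(&mut header)?;
    let len = u32::from_le_bytes(header);
    if len > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut payload = vec![0u8; len as usize];
    r.read_exact(&mut payload)?;
    Ok(payload)
}
```

Because the framing is generic over `Read`/`Write`, the same functions can be exercised against an in-memory buffer in tests and against the named-pipe handle in the service and worker.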

## Security considerations

- **LocalSystem is maximal privilege** — minimize the service's attack surface; validate every
  relay-delivered command; never spawn a worker except into a legitimately-enumerated active session.
- **IPC pipe must be ACL'd** (SYSTEM + the specific session user only) so a non-admin user can't
  inject capture/input commands by connecting to the pipe.
- **Token hygiene:** close duplicated tokens promptly; don't leak SYSTEM or user primary tokens.
- The SPEC-016 `cak_` store (SYSTEM-ACL'd) is now correctly readable; the fail-fast guard is removed
  but the ACL stays.
- **Audit:** service start/stop, enrollment-as-SYSTEM, worker spawn, session attach/retarget — written
  to the existing event pipeline.
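
One way to express the "SYSTEM + the specific session user only" pipe ACL is as an SDDL string, which Windows can turn into a security descriptor via `ConvertStringSecurityDescriptorToSecurityDescriptorW`. The sketch below only builds and sanity-checks the string. `pipe_sddl` is a hypothetical helper, and the access masks (`GA` for SYSTEM, `GR`+`GW` for the user) are an assumption that a real implementation may want to narrow.

```rust
/// Build an SDDL string for the per-session pipe (hypothetical helper).
///
/// - `D:P`               protected DACL, so no ACEs are inherited
/// - `(A;;GA;;;SY)`      allow generic-all to LocalSystem
/// - `(A;;GRGW;;;<sid>)` allow generic read/write to the session user
///
/// Returns `None` unless `user_sid` looks like a plain `S-1-...` SID, so
/// stray characters cannot smuggle extra ACEs into the descriptor.
pub fn pipe_sddl(user_sid: &str) -> Option<String> {
    let well_formed = user_sid.starts_with("S-1-")
        && user_sid[2..]
            .split('-')
            .all(|part| !part.is_empty() && part.bytes().all(|b| b.is_ascii_digit()));
    if !well_formed {
        return None;
    }
    Some(format!("D:P(A;;GA;;;SY)(A;;GRGW;;;{})", user_sid))
}
```

Validating the SID before formatting matters here because the string is assembled by a LocalSystem process; anything other than digits and dashes is rejected rather than escaped.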

## Implementation details

- New service module (e.g. `agent/src/service/{mod.rs, broker.rs, ipc.rs}`); worker entry split out of
  the current capture path. New `Commands` variants or an internal `--service`/`--session-worker`
  dispatch in `agent/src/main.rs`.
- `install.rs`: service create/recovery/delete; drop the managed-mode HKCU `Run` write.
- `windows` crate features: `Win32_System_Services`, `Win32_System_RemoteDesktop`
  (`WTS*`), `Win32_Security`, `Win32_System_Threading` (`CreateProcessAsUserW`),
  `Win32_System_Pipes`.
- Remove the `resolve_agent_credential` fail-fast guard branch added in SPEC-016 Phase B.
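
The `--service`/`--session-worker` dispatch mentioned in the first bullet could look roughly like the following. This is a sketch under assumptions: the `Role` enum and `parse_role` are hypothetical, the worker is assumed to receive its session id as a positional argument, and running with no arguments is assumed to mean the existing interactive path. The real `main.rs` may instead extend its `Commands` type.

```rust
/// Which role this process should run as (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Role {
    /// LocalSystem service host in Session 0.
    ServiceHost,
    /// Capture/input worker inside one interactive session.
    SessionWorker { session_id: u32 },
    /// Existing non-service behaviour (assumed default).
    Interactive,
}

/// Map the arguments after the program name to a role.
pub fn parse_role(args: &[&str]) -> Result<Role, String> {
    match args {
        [] => Ok(Role::Interactive),
        ["--service"] => Ok(Role::ServiceHost),
        ["--session-worker", id] => id
            .parse::<u32>()
            .map(|session_id| Role::SessionWorker { session_id })
            .map_err(|e| format!("invalid session id {:?}: {}", id, e)),
        other => Err(format!("unrecognized arguments: {:?}", other)),
    }
}
```

A caller would collect `std::env::args().skip(1)` into a `Vec<String>`, borrow each element as `&str`, and branch on the returned `Role`; rejecting unknown combinations outright keeps a LocalSystem binary from guessing at its mode.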

## Testing strategy

- **Service:** install → auto-start on boot → stop → uninstall on a clean VM.
- **`cak_` end-to-end:** SYSTEM service enrolls (SPEC-016), stores + reads the `cak_`, connects — the
  integration test SPEC-016 Phase B currently cannot run.
- **Session broker:** worker spawns into the active session; capture/input work; survives logoff→logon
  (respawn) and console-connect (retarget); fast-user-switch retarget.
- **Security:** non-admin cannot connect to the IPC pipe; worker runs with the user's token (not
  SYSTEM) in the user's desktop.

## Effort estimate & dependencies

- **Size:** X-Large (service host + worker split + token-handoff + IPC + session-change handling +
  install/uninstall).
- **Depends on:** SPEC-016 (enrollment + `cak_` store); the existing capture/input cores.
- **Unblocks:** SPEC-016 Phase B end-to-end runtime (and the parked managed-agent enrollment test on
  the internal beta machines); **SPEC-013** (session selection builds on this broker).

## Open questions

1. **Service vs. SYSTEM scheduled task** — a true Windows service (recovery, SCM integration) is the
   standard, robust choice; recommend service. Lock in planning.
2. **One multi-session worker vs. one worker per session** — per-session worker is simpler to reason
   about and isolates a crash to one session; confirm.
3. **IPC transport** — named pipe (recommended) vs. local TCP/loopback; pipe ACLing is the cleaner
   security story.
4. **Login-screen / Secure-Desktop capture** — how much (if any) in this spec vs. deferred to SPEC-013
   (it needs a worker in the winlogon/secure desktop, a distinct hard problem).
5. **Migration** — on upgrade, cleanly transition existing HKCU-`Run` managed installs to the service
   (remove the Run entry, install the service) without a gap.