SPEC-018 Phase 1: managed agent as LocalSystem service host #7

Merged
azcomputerguru merged 2 commits from feat/spec-018-service-host into main 2026-06-02 14:25:07 -07:00

Phase 1 of SPEC-018 (managed-agent SYSTEM service host). Service HOST + lifecycle only; the session broker / per-session capture worker / CreateProcessAsUser / IPC are Phase 2 (intentionally absent, seams documented).

What's here

  • LocalSystem Windows service (GuruConnectAgent) via the windows-service crate: SCM dispatcher, control loop, StartPending->Running->Stopped lifecycle, crash recovery (sc failure restart/5000).
  • Hidden service-run subcommand routes SCM launch into the service; runs the existing persistent-agent enroll/connect logic AS SYSTEM (so the SPEC-016 SYSTEM-ACL'd cak_ store is readable).
  • Managed install creates+starts the service and removes the HKCU Run autostart (single autostart, no double-run); non-elevated falls back to in-process. Idempotent uninstall.
  • Graceful stop interrupts a connected session (cancellable session loop) with a clean WS close.
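
The lifecycle named in the first bullet can be pictured as a small state machine. The sketch below is illustrative only: `ServiceState` and `can_transition_to` are hypothetical names, not types from the windows-service crate, and the transition table is an assumption based on the StartPending -> Running -> StopPending -> Stopped sequence in the commit message, plus a failed start going straight to Stopped.

```rust
/// Hypothetical model of the states the service reports to the SCM.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ServiceState {
    StartPending,
    Running,
    StopPending,
    Stopped,
}

impl ServiceState {
    /// Returns true when `next` is an allowed successor of `self`
    /// under the assumed (simplified) transition table.
    pub fn can_transition_to(self, next: ServiceState) -> bool {
        use ServiceState::*;
        matches!(
            (self, next),
            (StartPending, Running)      // startup finished
                | (StartPending, Stopped) // startup failed
                | (Running, StopPending)  // Stop/Shutdown control received
                | (StopPending, Stopped)  // graceful stop finished
        )
    }
}
```

A guard like this could sit in front of every status report so that an out-of-order report fails loudly in tests rather than confusing the SCM.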

Review

APPROVE WITH NITS -> all findings fixed + focused re-review CONFIRMED CLOSED:

  • H: connected SCM-stop now breaks the session loop + closes WS cleanly (optional shutdown param; non-service modes byte-for-byte unchanged)
  • M: catch_unwind so an agent-runtime panic becomes ServiceSpecific(1), not UB across the FFI service entry
  • L1: bounded retry on ERROR_SERVICE_MARKED_FOR_DELETE; L2/N1/N2 doc+test pins
  • Security verified: ImagePath quoted (no unquoted-path EoP), sc shell-out injection-free, no secrets logged
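
For context on the ImagePath point: an unquoted service path containing spaces can be resolved by Windows to an earlier, attacker-writable prefix (for example `C:\Program.exe`). A minimal sketch of the quoting rule follows; `quoted_image_path` is a hypothetical helper and the install path in the test is made up, neither is taken from this PR.

```rust
/// Builds a service command line with the executable path wrapped in
/// double quotes, so a path containing spaces is never split.
/// Assumes `exe` itself contains no quote characters (Windows file
/// names cannot contain them).
pub fn quoted_image_path(exe: &str, arg: &str) -> String {
    format!("\"{}\" {}", exe, arg)
}
```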

Unblocks / still pending

Makes the agent run as SYSTEM so the SPEC-016 Phase B cak_ store is readable end-to-end. Desktop capture still needs Phase 2 (session broker + worker). Service install/start/stop and the cak_-as-SYSTEM round-trip need a Windows VM with admin rights to integration-test (no service is installed on the dev host).

Local verify (Windows host): fmt --check, clippy -D warnings, release build (x86_64-pc-windows-msvc), 58 tests, all green.

Spec: docs/specs/SPEC-018-managed-agent-service-host.md.

azcomputerguru added 2 commits 2026-06-02 14:03:18 -07:00
Run the managed/persistent GuruConnect agent as a LocalSystem Windows
service so it is reachable at the login screen and across reboots, and
so the SPEC-016 per-machine cak_ store (ACL-restricted to SYSTEM +
Administrators) is finally readable in-context.

Phase 1 scope (host + lifecycle only):
- New agent/src/service/mod.rs: registers "GuruConnectAgent" with the
  SCM via the windows-service dispatcher, reports a correct lifecycle
  (StartPending -> Running -> StopPending -> Stopped), handles
  Stop/Shutdown via an AtomicBool the agent loop polls (graceful WS
  close), and provides install/uninstall/start (LocalSystem, AutoStart,
  sc-failure crash recovery). Idempotent install/uninstall.
- main.rs: hidden `service-run` subcommand routes the SCM-launched
  process into the dispatcher; new run_managed_agent_service() runs the
  existing RunMode::PermanentAgent logic (resolve/enroll cak_, hold the
  relay) as SYSTEM. run_agent() now takes an optional SCM shutdown flag,
  skips the HKCU Run autostart and the tray when run as the service, and
  interrupts the reconnect backoff promptly on stop. An interactive
  launch of a managed binary now installs+starts the service and exits
  instead of double-running.
- install.rs: a managed install (embedded config present) installs the
  LocalSystem service as the single autostart and removes the legacy
  HKCU Run entry; uninstall stops+deletes the service (idempotent).
  Attended/viewer installs are untouched.
- Kept the SPEC-016 Phase B fail-fast guard as a harmless safety net for
  any non-SYSTEM invocation; updated its comment to name this service as
  the managed run context.
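
One way to interrupt the reconnect backoff promptly on stop, as described for run_agent() above, is to sleep in short slices and poll the flag between them. This is a minimal sketch under that assumption; `backoff_interruptible` and its parameters are hypothetical names, not the function in main.rs.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Waits for up to `total`, waking every `slice` to poll `stop`.
/// Returns true if a stop was requested, false if the full backoff
/// elapsed without one.
pub fn backoff_interruptible(stop: &AtomicBool, total: Duration, slice: Duration) -> bool {
    let deadline = Instant::now() + total;
    loop {
        if stop.load(Ordering::SeqCst) {
            return true; // stop requested: abandon the backoff
        }
        let now = Instant::now();
        if now >= deadline {
            return false; // backoff finished normally
        }
        // Never sleep past the deadline.
        sleep(slice.min(deadline - now));
    }
}
```

With a slice of a few hundred milliseconds, a Stop control is noticed well inside the SCM's wait hint even when the backoff itself is tens of seconds long.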

Phase 2 NOT built (seams documented): session broker, per-session
capture/input worker, CreateProcessAsUserW token handoff, service/worker
IPC, and SERVICE_CONTROL_SESSIONCHANGE. Phase 1 enrolls/connects as
SYSTEM but does not capture a desktop (a Session-0 process cannot).

No service is installed/started on the dev host; that is a VM/admin
integration step. fmt + clippy -D warnings + release build + 55 tests
all pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
fix(agent): SPEC-018 Phase 1 review fixes (cancellable session loop, panic guard, service-create retry)
All checks were successful
Build and Test / Build Agent (Windows) (pull_request) Successful in 10m23s
Build and Test / Build Server (Linux) (pull_request) Successful in 14m47s
Build and Test / Security Audit (pull_request) Successful in 5m29s
Build and Test / Build Summary (pull_request) Successful in 20s
a0e0d5f1e7
H: thread the SCM cooperative-stop flag into the connected session loop
(run_with_tray) via a new Option<&Arc<AtomicBool>> param. The flag was only
observed by the outer run_agent reconnect loop, which never runs while a
session is connected, so an SCM Stop/Shutdown left the service Running until
force-kill. The inner loop now checks it each tick, closes the WS cleanly, and
returns the SERVICE_STOP sentinel that the outer loop maps to a graceful stop.
The new param is optional: attended/viewer/interactive callers pass None and
behave exactly as before.
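
The shape of the H fix can be sketched as follows. This is not the code in run_with_tray: `run_session`, `SessionExit`, and `max_ticks` are hypothetical stand-ins, and the WebSocket close is reduced to a comment.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Why the (hypothetical) connected session loop returned.
#[derive(Debug, PartialEq, Eq)]
pub enum SessionExit {
    /// The session ended on its own; the outer loop may reconnect.
    Disconnected,
    /// The SCM requested a stop; the outer loop must not reconnect.
    ServiceStop,
}

/// Stand-in for the inner session loop. `shutdown` is `None` for
/// attended/viewer/interactive callers, so the flag never affects them.
pub fn run_session(shutdown: Option<&Arc<AtomicBool>>, max_ticks: u32) -> SessionExit {
    for _tick in 0..max_ticks {
        // Poll the cooperative-stop flag once per tick.
        if shutdown.map_or(false, |flag| flag.load(Ordering::SeqCst)) {
            // A real agent would send a WebSocket close frame here.
            return SessionExit::ServiceStop;
        }
        // ... one tick of session work would run here ...
    }
    SessionExit::Disconnected
}
```

Passing the flag as an `Option` keeps the non-service call sites to a single extra `None` argument.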

M: wrap the managed-agent runtime block_on in catch_unwind(AssertUnwindSafe) so
a panic in the agent future cannot unwind across the extern "system" service
entry (UB/abort). A caught panic becomes an Err -> ServiceExitCode::ServiceSpecific(1)
so SCM recovery engages cleanly.
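
A minimal sketch of the panic guard, assuming the agent runtime is reduced to a closure: `run_guarded` and `SERVICE_SPECIFIC_PANIC` are hypothetical names, and a plain `u32` stands in for the service exit code. Note that `catch_unwind` only catches unwinding panics; it has no effect in a build with `panic = "abort"`.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

/// Hypothetical service-specific exit code reported after a panic.
pub const SERVICE_SPECIFIC_PANIC: u32 = 1;

/// Runs `agent` and converts a panic into an error code, so nothing
/// unwinds past this function into the FFI service entry point.
pub fn run_guarded<F>(agent: F) -> Result<(), u32>
where
    F: FnOnce() -> Result<(), u32>,
{
    match catch_unwind(AssertUnwindSafe(agent)) {
        // The agent returned normally: pass its result through unchanged.
        Ok(result) => result,
        // The agent panicked: report a fixed service-specific code.
        Err(_panic_payload) => Err(SERVICE_SPECIFIC_PANIC),
    }
}
```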

L1: replace the fixed 2s sleep after delete() on reinstall with a bounded retry
on CreateService returning ERROR_SERVICE_MARKED_FOR_DELETE (1072), gated on
having actually deleted a prior instance.
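
The retry can be sketched generically. `create_with_retry` is a hypothetical helper: the `create` closure stands in for the CreateService call and returns the raw Win32 error code on failure, and the attempt count and delay are parameters rather than the values used in the PR.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Win32 ERROR_SERVICE_MARKED_FOR_DELETE.
pub const ERROR_SERVICE_MARKED_FOR_DELETE: u32 = 1072;

/// Calls `create` until it succeeds, fails with a different error, or
/// the attempt budget runs out. Retries happen only when a prior
/// instance was actually deleted (`deleted_prior`).
pub fn create_with_retry<T, F>(
    mut create: F,
    deleted_prior: bool,
    max_attempts: u32,
    delay: Duration,
) -> Result<T, u32>
where
    F: FnMut() -> Result<T, u32>,
{
    // Without a prior delete, error 1072 is unexpected: try exactly once.
    let attempts = if deleted_prior { max_attempts.max(1) } else { 1 };
    let mut attempt = 1;
    loop {
        match create() {
            Err(ERROR_SERVICE_MARKED_FOR_DELETE) if attempt < attempts => {
                attempt += 1;
                sleep(delay); // give the SCM time to finish the delete
            }
            other => return other, // success, other error, or budget spent
        }
    }
}
```

Compared with a fixed sleep, this returns as soon as the SCM has released the old entry and still fails with the original error code if it never does.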

L2: clarify the --elevated -> force_user_install mapping (comment only).

N1: add a clap-metadata test pinning the service-run subcommand name to
SERVICE_RUN_ARG, cross-linked from the existing literal test.

N2: correct the service doc comments now that graceful stop interrupts the
connected case too.

Verified on Windows host: cargo fmt --check, clippy -D warnings, release build
(x86_64-pc-windows-msvc), and cargo test (58 passed) all green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
azcomputerguru merged commit 11af9dff8e into main 2026-06-02 14:25:07 -07:00
azcomputerguru deleted branch feat/spec-018-service-host 2026-06-02 14:25:11 -07:00

Reference: azcomputerguru/guru-connect#7