Commit Graph

2 Commits

a0e0d5f1e7 fix(agent): SPEC-018 Phase 1 review fixes (cancellable session loop, panic guard, service-create retry)
All checks were successful
Build and Test / Build Agent (Windows) (pull_request) Successful in 10m23s
Build and Test / Build Server (Linux) (pull_request) Successful in 14m47s
Build and Test / Security Audit (pull_request) Successful in 5m29s
Build and Test / Build Summary (pull_request) Successful in 20s
H: thread the SCM cooperative-stop flag into the connected session loop
(run_with_tray) via a new Option<&Arc<AtomicBool>> param. Previously the flag
was observed only by the outer run_agent reconnect loop, which never runs while
a session is connected, so an SCM Stop/Shutdown left the service in the Running
state until it was force-killed. The inner loop now checks the flag on each
tick, closes the WS cleanly, and returns the SERVICE_STOP sentinel that the
outer loop maps to a graceful stop.
The new param is optional: attended/viewer/interactive callers pass None and
behave exactly as before.
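
A minimal sketch of the cancellable tick loop described above, using only the standard library. The names run_session and SessionExit, and the max_ticks bound, are hypothetical stand-ins for illustration; the real run_with_tray signature and the SERVICE_STOP sentinel value are not shown in this commit message.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::time::Duration;

/// Hypothetical outcome type standing in for the agent's SERVICE_STOP sentinel.
#[derive(Debug, PartialEq)]
pub enum SessionExit {
    /// The stop flag was observed; the outer loop should stop gracefully.
    ServiceStop,
    /// The session ended on its own (here: the tick budget ran out).
    Disconnected,
}

/// Runs a simulated connected session for at most `max_ticks` ticks.
/// `stop` is optional: callers that pass `None` never see a cooperative stop.
pub fn run_session(
    stop: Option<&Arc<AtomicBool>>,
    max_ticks: u32,
    tick: Duration,
) -> SessionExit {
    for _ in 0..max_ticks {
        // Check the cooperative-stop flag at the top of every tick.
        if let Some(flag) = stop {
            if flag.load(Ordering::SeqCst) {
                // A real agent would close the WebSocket cleanly here.
                return SessionExit::ServiceStop;
            }
        }
        // Placeholder for one tick of session work.
        std::thread::sleep(tick);
    }
    SessionExit::Disconnected
}
```

In this sketch a service caller would pass Some(&flag) and map SessionExit::ServiceStop to a graceful stop, while an attended caller passes None and only ever sees Disconnected.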

M: wrap the managed-agent runtime block_on in catch_unwind(AssertUnwindSafe) so
a panic in the agent future cannot unwind across the extern "system" service
entry point, which would be undefined behavior or a process abort. A caught
panic is converted to an Err and reported as
ServiceExitCode::ServiceSpecific(1), so SCM recovery engages cleanly.
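
The panic guard can be sketched with the standard library alone. The function run_guarded and the constant SERVICE_SPECIFIC_PANIC are hypothetical names; the sketch uses a plain u32 in place of ServiceExitCode::ServiceSpecific(1) and a closure in place of the runtime's block_on call. It assumes the default panic = "unwind" strategy, since catch_unwind cannot intercept anything under panic = "abort".

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

/// Hypothetical stand-in for the service-specific exit code reported on panic.
pub const SERVICE_SPECIFIC_PANIC: u32 = 1;

/// Runs `body` and converts a panic into an error code instead of letting it
/// unwind into the caller (for a service, the FFI entry point).
pub fn run_guarded<F>(body: F) -> Result<(), u32>
where
    F: FnOnce() -> Result<(), u32>,
{
    // AssertUnwindSafe asserts that we will not observe broken invariants
    // after a panic; here the closure's state is simply discarded.
    match catch_unwind(AssertUnwindSafe(body)) {
        // No panic: pass the body's own result through unchanged.
        Ok(result) => result,
        // Panic caught: report a fixed error code.
        Err(_payload) => Err(SERVICE_SPECIFIC_PANIC),
    }
}
```

The default panic hook still prints the panic message to stderr before catch_unwind returns, so the failure stays visible in logs.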

L1: replace the fixed 2s sleep after delete() on reinstall with a bounded retry
on CreateService returning ERROR_SERVICE_MARKED_FOR_DELETE (1072), gated on
having actually deleted a prior instance.
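
A sketch of such a bounded retry, under stated assumptions: create_with_retry is a hypothetical helper, the create closure stands in for the real CreateService call, and the attempt count and delay are illustrative rather than the values used in the agent.

```rust
use std::time::Duration;

/// Win32 error code ERROR_SERVICE_MARKED_FOR_DELETE.
pub const ERROR_SERVICE_MARKED_FOR_DELETE: u32 = 1072;

/// Calls `create` until it succeeds, retrying only on error 1072 and only
/// when a prior instance was actually deleted. Any other error, or 1072 on
/// the final attempt, is returned to the caller.
pub fn create_with_retry<F>(
    mut create: F,
    deleted_prior: bool,
    max_attempts: u32,
    delay: Duration,
) -> Result<(), u32>
where
    F: FnMut() -> Result<(), u32>,
{
    // Without a prior delete there is nothing to wait for: try exactly once.
    let attempts = if deleted_prior { max_attempts.max(1) } else { 1 };
    let mut attempt = 1;
    loop {
        match create() {
            Ok(()) => return Ok(()),
            // The SCM has not finished removing the old service yet.
            Err(ERROR_SERVICE_MARKED_FOR_DELETE) if attempt < attempts => {
                std::thread::sleep(delay);
                attempt += 1;
            }
            Err(code) => return Err(code),
        }
    }
}
```

Compared with a fixed sleep, this returns as soon as creation succeeds and never waits at all on a first-time install.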

L2: clarify the --elevated -> force_user_install mapping (comment only).

N1: add a clap-metadata test pinning the service-run subcommand name to
SERVICE_RUN_ARG, cross-linked from the existing literal test.

N2: correct the service doc comments now that graceful stop interrupts the
connected case too.

Verified on Windows host: cargo fmt --check, clippy -D warnings, release build
(x86_64-pc-windows-msvc), and cargo test (58 passed) all green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 13:57:41 -07:00
7602b4346a feat(agent): SPEC-018 Phase 1 managed-agent SYSTEM service host
Run the managed/persistent GuruConnect agent as a LocalSystem Windows
service so it is reachable at the login screen and across reboots, and
so the SPEC-016 per-machine cak_ store (ACL-restricted to SYSTEM +
Administrators) is finally readable in-context.

Phase 1 scope (host + lifecycle only):
- New agent/src/service/mod.rs: registers "GuruConnectAgent" with the
  SCM via the windows-service dispatcher, reports a correct lifecycle
  (StartPending -> Running -> StopPending -> Stopped), handles
  Stop/Shutdown via an AtomicBool the agent loop polls (graceful WS
  close), and provides install/uninstall/start (LocalSystem, AutoStart,
  sc-failure crash recovery). Idempotent install/uninstall.
- main.rs: hidden `service-run` subcommand routes the SCM-launched
  process into the dispatcher; new run_managed_agent_service() runs the
  existing RunMode::PermanentAgent logic (resolve/enroll cak_, hold the
  relay) as SYSTEM. run_agent() now takes an optional SCM shutdown flag,
  skips the HKCU Run autostart and the tray when run as the service, and
  interrupts the reconnect backoff promptly on stop. An interactive
  launch of a managed binary now installs+starts the service and exits
  instead of double-running.
- install.rs: a managed install (embedded config present) installs the
  LocalSystem service as the single autostart and removes the legacy
  HKCU Run entry; uninstall stops+deletes the service (idempotent).
  Attended/viewer installs are untouched.
- Kept the SPEC-016 Phase B fail-fast guard as a harmless safety net for
  any non-SYSTEM invocation; updated its comment to name this service as
  the managed run context.
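
One way to interrupt a reconnect backoff promptly on stop, as described for run_agent() above, is to sleep in short slices and poll the flag between them. The sketch below is an assumption about the approach, not the agent's actual code: backoff_sleep and its slice parameter are hypothetical, and the real loop may be async.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::time::{Duration, Instant};

/// Waits up to `total`, waking every `slice` to poll the optional stop flag.
/// Returns true if the full delay elapsed, false if a stop was observed.
pub fn backoff_sleep(stop: Option<&AtomicBool>, total: Duration, slice: Duration) -> bool {
    let deadline = Instant::now() + total;
    loop {
        // Poll the stop flag before each slice so a stop request is seen
        // within roughly one slice instead of after the whole backoff.
        if let Some(flag) = stop {
            if flag.load(Ordering::SeqCst) {
                return false;
            }
        }
        let now = Instant::now();
        if now >= deadline {
            return true;
        }
        // Never sleep past the deadline.
        std::thread::sleep(slice.min(deadline - now));
    }
}
```

A caller that passes None simply gets an ordinary sleep, which matches the non-service launch paths.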

Phase 2 NOT built (seams documented): session broker, per-session
capture/input worker, CreateProcessAsUserW token handoff, service/worker
IPC, and SERVICE_CONTROL_SESSIONCHANGE. Phase 1 enrolls/connects as
SYSTEM but does not capture a desktop (a Session-0 process cannot).

No service is installed/started on the dev host; that is a VM/admin
integration step. fmt + clippy -D warnings + release build + 55 tests
all pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 13:43:01 -07:00