# SPEC-016: Zero-Touch Per-Site Agent Enrollment

**Status:** Proposed
**Priority:** P1
**Requested By:** Mike (2026-06-02)
**Estimated Effort:** X-Large

## Overview

Give GuruConnect a ScreenConnect-class managed-agent enrollment flow: a technician runs **one signed installer per site** on every machine at that site — no per-machine key minting, no flags, no typing — and each machine **self-registers** on first run, the server minting it a per-machine `cak_` key bound to a stable, machine-derived `machine_uid`.

Each site installer carries a **rotatable per-site enrollment key** (a long server-generated secret) plus a short human-readable **fingerprint** (`vN (XXXX)`) so an operator can tell at a glance whether an installer is current. Rotating a site's key blocks *new* enrollments from old installers while leaving already-enrolled machines untouched (they hold their own `cak_`).

This is the missing piece that turns the v2 secure-session-core (SPEC-004 per-agent keys + `machine_uid`) into a real product workflow, and it **resolves SPEC-007's open signature-vs-appended-config question**: the agent binary is signed **once** in CI (already shipped via `release.yml`), and per-site customization rides in a thin **signed wrapper** that writes site config to the endpoint at install time — never appended into the signed PE.

**Success criteria:**

1. A tech installs one site installer on N machines; all N appear in the console under the correct company/site, each as a distinct, deduplicated machine — zero per-machine setup.
2. Re-installing / re-imaging the same hardware **reuses** the existing machine row (no ghost duplicates — the failure mode SPEC-004 documents).
3. Rotating a site's enrollment key makes old installers unable to enroll new machines, while every already-enrolled agent keeps working.
4. Every distributed installer is **validly Authenticode-signed** (SmartScreen/WDAC clean).
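A minimal Rust sketch of the rotation semantics behind success criterion 3, under stated assumptions: `SiteEnrollmentKey`, `EnrolledAgent`, `placeholder_hash`, and the way the 4-hex-digit code is derived are all hypothetical, and FNV-1a stands in for Argon2id only so the example runs on `std` alone. It shows the property the overview relies on: rotation replaces the site key, so the old secret no longer admits new enrollments, while the connect check for an enrolled agent consults only that agent's own `cak_` hash and never the site key.

```rust
// Hypothetical model of a per-site enrollment key and its rotation.
// `placeholder_hash` is FNV-1a, a NON-cryptographic stand-in for Argon2id,
// used only so this sketch runs with the standard library alone.
fn placeholder_hash(data: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a 64-bit offset basis
    for &b in data {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3); // FNV-1a 64-bit prime
    }
    h
}

struct SiteEnrollmentKey {
    version: u32,
    key_hash: u64,         // real design: Argon2id hash of the secret
    fingerprint_code: u16, // non-secret 4-hex-digit code
}

impl SiteEnrollmentKey {
    fn issue(version: u32, secret: &str) -> Self {
        // Assumption: the short code is derived from version + secret;
        // the spec only says it is "derived".
        let fp_input = format!("fp:{version}:{secret}");
        SiteEnrollmentKey {
            version,
            key_hash: placeholder_hash(secret.as_bytes()),
            fingerprint_code: (placeholder_hash(fp_input.as_bytes()) & 0xffff) as u16,
        }
    }

    // Rotation: new secret, version + 1. The old hash is replaced, not kept.
    fn rotate(&self, new_secret: &str) -> Self {
        SiteEnrollmentKey::issue(self.version + 1, new_secret)
    }

    // Only the current key admits *new* enrollments.
    fn admits_new_enrollment(&self, presented_secret: &str) -> bool {
        placeholder_hash(presented_secret.as_bytes()) == self.key_hash
    }

    // Rendered as `vN (XXXX)` with uppercase hex digits.
    fn fingerprint(&self) -> String {
        format!("v{} ({:04X})", self.version, self.fingerprint_code)
    }
}

// An already-enrolled agent authenticates with its own `cak_` only.
struct EnrolledAgent {
    cak_hash: u64,
}

fn agent_can_connect(agent: &EnrolledAgent, presented_cak: &str) -> bool {
    // No site key is consulted here, so site-key rotation cannot affect it.
    placeholder_hash(presented_cak.as_bytes()) == agent.cak_hash
}

fn main() {
    let v1 = SiteEnrollmentKey::issue(1, "old-site-secret");
    let agent = EnrolledAgent { cak_hash: placeholder_hash(b"cak_example") };

    let v2 = v1.rotate("new-site-secret");
    assert!(!v2.admits_new_enrollment("old-site-secret")); // old installer blocked
    assert!(v2.admits_new_enrollment("new-site-secret"));
    assert!(agent_can_connect(&agent, "cak_example")); // enrolled agent unaffected
    println!("current installer: {}", v2.fingerprint());
}
```

Because Argon2id hashes are salted, a real `admits_new_enrollment` would call the Argon2 verify routine rather than compare stored hashes with `==`.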
## Background — what exists today (confirmed in code)

- **Embedded config is append-based and breaks signing.** `server/src/api/downloads.rs` (`download_agent`, ~`:152`) reads `static/downloads/guruconnect.exe` and **appends** `MAGIC_MARKER` + `len:u32` + JSON (`:196`) to the end of the PE. The agent reads it back in `agent/src/config.rs` (`read_embedded_config`, `:223`). Appending bytes after a signed PE invalidates the Authenticode signature — so the current customization path and the newly-shipped CI signing are mutually exclusive.
- **No self-registration exists.** Per-agent `cak_` keys are minted **admin-only** in `server/src/api/machine_keys.rs` (`create_key`, `:119`; "Admin issued a per-agent key", `:146`). There is no endpoint where an agent, on first run, exchanges an enrollment credential for its own key.
- **Relay already accepts per-agent keys.** `server/src/relay/mod.rs` (`validate_agent_api_key`, `:417`) calls `crate::auth::agent_keys::verify_agent_key` (`:422`) — the `cak_` path — then falls back to the **deprecated** shared `AGENT_API_KEY` (`:444`, logs a "migrate to per-agent `cak_`" warning).
- **Key primitives exist.** `server/src/auth/agent_keys.rs`: `generate_agent_key` mints a `cak_`-prefixed high-entropy key (`:36`/`:46`); `verify_agent_key` (`:71`). `server/src/db/agent_keys.rs` already inserts into `connect_agent_keys (machine_id, key_hash, tenant_id)` (`:47`) — the v2 tenancy column is present (migration `004_v2_secure_session_core.sql`).
- **Identity is a random config UUID, not machine-derived** — the root cause of duplicates per SPEC-004 (`agent/src/config.rs` `generate_agent_id`, `:90`).
- **Agent mode dispatch:** `agent/src/main.rs` `Commands::Install` (`:160`) → `run_install`; `agent/src/config.rs` `detect_run_mode` (`:162`) returns `RunMode::PermanentAgent` when embedded config is present.

## Scope

### Included in v1 (CORE)

1. **`machine_uid` — deterministic machine identity (hardware-salted, per-tenant).** Derive a stable id from the Windows `MachineGuid` (`HKLM\SOFTWARE\Microsoft\Cryptography\MachineGuid`) **salted with stable hardware signals** (SMBIOS UUID / motherboard + disk serial), independent of the config-file `agent_id`. Hardware-derived salt is deliberate: it **survives an OS reinstall/re-image on the same hardware** (so the row is reused — the re-image dedup goal) while keeping distinct physical boxes distinct (a per-install *random* salt would break re-image dedup and is rejected). Uniqueness is scoped **per-tenant** — dedup key `(tenant_id, machine_uid)` — so the same hardware legitimately present in two tenants stays two independent rows. (Shared root with SPEC-004; whichever lands first owns the impl, the other consumes it.) Used as the dedup key for register/move.

   **Collision-gated activation.** The residual collision case is VMs/templates that share a hardware UUID (some hypervisors clone the SMBIOS UUID). When the server detects a `machine_uid` collision (a seemingly-different endpoint resolving to an existing uid), the endpoint does **not** auto-activate: it drops to a **pending** state, fires an alert, and an operator must confirm in the dashboard that the collided endpoint may activate. This is the one deliberate exception to auto-approve (see item 6).
2. **Per-site enrollment key + fingerprint.**
   - Long (≥256-bit) server-generated secret per site, stored **hashed** (Argon2id, same as `cak_`/passwords), never recoverable in plaintext after issue.
   - A non-secret **fingerprint** = monotonic version + short derived code in **hex**, rendered `vN (XXXX)` (e.g. `v3 (7F2A)`), shown in the dashboard, baked into the installer filename, and reported by the agent at enrollment. Hex is deliberate — **not** the RMM word-style code (`GREEN-FALCON`) — so GuruConnect and GuruRMM artifacts are never visually conflated.
   - **Rotate** regenerates the secret and bumps the version; old installers are rejected for *new* enrollments, while existing agents (holding their own `cak_`) are unaffected.
3. **Self-registration endpoint.** New `POST /api/enroll` (public and not JWT-authenticated — gated by the enrollment key) accepting `{ site_code, enrollment_key, machine_uid, hostname, labels{company,site,department,device_type,tags} }`:
   - Verify `(site_code, enrollment_key)` against the site's current (active) key.
   - **Dedup by `(tenant_id, machine_uid)`**, with `tenant_id` taken from the site (the per-tenant scope from item 1): if the machine already exists in the tenant, reuse the row and rotate its `cak_`; otherwise create the machine row. Deduping across the tenant rather than within a single site is what lets an existing `machine_uid` enrolling under a different site be recognized as a site-move.
   - Mint a `cak_` (reuse `generate_agent_key`), store it hashed via `db::agent_keys` bound to `machine_id` (+ `tenant_id` from the site), and return the plaintext `cak_` **once**.
   - Emit an audit event + **new-enrollment alert** (and a **site-move** alert when an existing `machine_uid` enrolls under a different site).
   - **Rate-limit + lockout** per `(site_code, source-IP)` as defense-in-depth (the key is long, so this is belt-and-suspenders, not load-bearing).
4. **Agent first-run enrollment.** On `RunMode::PermanentAgent` with no stored `cak_`: read site config → call `/api/enroll` with `machine_uid` → persist the returned `cak_` to the protected store (a DPAPI-machine-encrypted blob in a SYSTEM-only-ACL'd HKLM value or `ProgramData` file, per Resolved decision 2) → connect to `wss://connect.azcomputerguru.com/ws/agent` using the `cak_`. On subsequent runs, use the stored `cak_` directly (no re-enroll).
5. **Sign-once base + per-site signed wrapper (resolves SPEC-007 open question).**
   - The base agent is signed once in CI (`release.yml`, already shipped) and stays byte-identical for everyone.
   - Per-site customization (labels + enrollment key + fingerprint) is delivered to the endpoint **at install time** via a signing-safe channel — NOT appended to the signed PE.

     **v1 produces BOTH a signed bootstrapper `.exe` and a signed MSI per site** (ScreenConnect parity — manual installs grab the `.exe`, GPO/Intune fleet pushes take the MSI), both wrapping the same sign-once agent and writing the site config to the protected config location. The two differ only in packaging (bootstrapper stub vs. WiX-authored MSI package); both are signed.
   - **Retire the append path** in `downloads.rs` (`download_agent`) for managed installs, eliminating the signature-invalidation defect. The attended support-code path (`download_support`) is filename-based rather than append-based, so it is already signing-safe and stays as-is (Resolved decision 5).
6. **Auto-approve posture (with collision-gate exception).** A self-registered machine is live and controllable immediately (ScreenConnect parity); the new-enrollment alert is the tripwire. The **one** exception is a detected `machine_uid` collision (item 1), which gates the endpoint to **pending** until an operator confirms it in the dashboard.

### Explicitly out of scope (ANTICIPATED — reserve room, do NOT build in v1)

The v1 data model and agent mode-dispatch must leave room for these without building them:

- **Per-site enrollment POLICY** — a `sites.enrollment_policy` field (default `auto-approve`; future `pending-approval`) plus per-seat/per-endpoint licensing controls. Commercial, multi-tenant (the `tenant_id` column already exists). Its own future SPEC.
- **Flag overrides** — `--enroll-key` / `--site-code` (generic installer, key supplied on the command line) and `--reassign` (move an existing machine to a new site, gated by possession of the destination site's key, with an **explicit accidental-move guard**: a different-site re-run refuses unless `--reassign` is passed) + cross-client move policy. Backend (`machine_uid` + authorized site + `cak_`) is designed to support it; CLI surface is deferred.
- **Technician-assisted interactive install** — `--technician` on a generic installer: prompts for the tech's own server credentials and, after successful auth, presents a **validated** Company/Site/tags picker from the live authorized list (authz-by-identity, full audit trail). Heaviest path (interactive UI + auth/list callback); deferred.

All three converge on the **same backend operation** delivered in v1: `machine_uid` + authorized site + issued `cak_`. v1 ships only the per-site-embedded-key door.

## Architecture

- **Agent** (`agent/`): compute `machine_uid`; first-run enroll → store `cak_`; use stored `cak_` thereafter; read site config from the wrapper-written location instead of an appended PE blob. Touches `config.rs` (`EmbeddedConfig`/`detect_run_mode`/storage), `main.rs` (`Install`/run-mode), a new `enroll` client module, transport auth.
- **Relay-server** (`server/`): new `POST /api/enroll`; per-site key issue/rotate/verify; `(tenant_id, machine_uid)` dedup + site-move on register; audit + alert emission; rate-limit/lockout. Touches `api/` (new `enroll.rs`, `sites` key endpoints), `auth/agent_keys.rs`, `db/agent_keys.rs`, `relay/mod.rs` (enrollment vs. connect), `main.rs` routes.
- **Dashboard**: per-site enrollment-key display (fingerprint `vN (XXXX)`), **Rotate** action, "current installer" download wired to the signed wrapper build. (Builder UI is SPEC-007; this spec supplies the key/fingerprint/rotation it consumes.)
- **DB migration:** `site_enrollment_keys` (or columns on the site): `site_id`, `key_hash`, `version`, `fingerprint`, `created_at`, `rotated_at`, `active`. Reserve `sites.enrollment_policy` (nullable, default `auto-approve`) for the anticipated policy work. `connect_machines` gains `machine_uid`, unique per tenant (`(tenant_id, machine_uid)`, matching the dedup key in Scope item 1).
- **Protobuf** (`proto/guruconnect.proto`): no wire change required for enrollment if `/api/enroll` is REST; `AgentStatus` label fields per SPEC-007 (`department`, `device_type`) ride along if landed together.
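The Agent bullet in the architecture list (first-run enroll, then reuse the stored `cak_`) reduces to a small decision. The sketch below is illustrative only: `CakStore`, `EnrollClient`, `obtain_cak`, and the in-memory fakes are hypothetical names standing in for the real DPAPI/ACL store and the HTTPS call to `POST /api/enroll`, so the control flow can be exercised without Windows APIs or a network.

```rust
use std::cell::Cell;

// Site config written by the signed wrapper at install time (hypothetical shape).
struct SiteConfig {
    site_code: String,
    enrollment_key: String,
}

// Hypothetical body of `POST /api/enroll`; labels omitted for brevity.
struct EnrollRequest<'a> {
    site_code: &'a str,
    enrollment_key: &'a str,
    machine_uid: &'a str,
    hostname: &'a str,
}

// Stand-in for the DPAPI-machine blob in a SYSTEM-ACL'd location.
trait CakStore {
    fn load(&self) -> Option<String>;
    fn save(&mut self, cak: &str);
}

// Stand-in for the HTTPS call to `/api/enroll`.
trait EnrollClient {
    fn enroll(&self, req: &EnrollRequest<'_>) -> Result<String, String>;
}

// Returns the `cak_` to connect with: the stored key if present, else enroll once.
fn obtain_cak(
    store: &mut dyn CakStore,
    client: &dyn EnrollClient,
    site: &SiteConfig,
    machine_uid: &str,
    hostname: &str,
) -> Result<String, String> {
    if let Some(cak) = store.load() {
        return Ok(cak); // subsequent runs: no re-enroll
    }
    let req = EnrollRequest {
        site_code: &site.site_code,
        enrollment_key: &site.enrollment_key,
        machine_uid,
        hostname,
    };
    let cak = client.enroll(&req)?;
    store.save(&cak); // persist before the first WebSocket connect
    Ok(cak)
}

// In-memory fakes for exercising the flow.
struct MemStore(Option<String>);

impl CakStore for MemStore {
    fn load(&self) -> Option<String> {
        self.0.clone()
    }
    fn save(&mut self, cak: &str) {
        self.0 = Some(cak.to_string());
    }
}

struct FakeServer {
    current_key: String,
    calls: Cell<u32>,
}

impl FakeServer {
    fn new(current_key: &str) -> Self {
        FakeServer { current_key: current_key.to_string(), calls: Cell::new(0) }
    }
}

impl EnrollClient for FakeServer {
    fn enroll(&self, req: &EnrollRequest<'_>) -> Result<String, String> {
        self.calls.set(self.calls.get() + 1);
        if req.enrollment_key != self.current_key {
            return Err(format!("site {}: enrollment key rejected", req.site_code));
        }
        Ok(format!("cak_for_{}_{}", req.hostname, req.machine_uid))
    }
}

fn main() {
    let site = SiteConfig { site_code: "SITE-A".into(), enrollment_key: "k-v1".into() };
    let server = FakeServer::new("k-v1");
    let mut store = MemStore(None);

    let first = obtain_cak(&mut store, &server, &site, "uid-1", "PC-01").unwrap();
    let again = obtain_cak(&mut store, &server, &site, "uid-1", "PC-01").unwrap();
    assert_eq!(first, again);
    assert_eq!(server.calls.get(), 1); // second run used the stored key
    println!("connected with {first}");
}
```

If the agent dies after the server mints a key but before `save` runs, the next start simply enrolls again; the server's `machine_uid` dedup turns that into a reuse of the same machine row rather than a duplicate.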
## Security considerations

- **Two-tier credential model:** low-sensitivity **enrollment key** (gates "may register", shared per site, rotatable) vs. high-sensitivity **per-machine `cak_`** (operating credential, per-machine revocation). Recovering from a compromised enrollment key means rotating one site's key — no fleet-wide re-key.
- **Enrollment keys stored hashed** (Argon2id); plaintext shown once at issue/rotate.
- **`cak_` at rest on the endpoint** is stored as a **DPAPI-machine-encrypted blob inside a SYSTEM-ACL'd location** (HKLM value or `ProgramData` file) — both layers: the SYSTEM ACL stops non-admin users from reading it, and DPAPI-machine encryption makes a copied file/export inert off the box. (Local admin/SYSTEM can always recover it; that is accepted — the blast radius of one leaked `cak_` is a single, independently-revocable machine.)
- **`machine_uid` binding** is the spoof-guard SPEC-004 wants: a `cak_` is bound to a `machine_uid`; a different box presenting another box's `cak_` is detectable.
- **Authorization model** for moves/enrolls is possession-of-destination-key in v1 (identity-based authz deferred to the technician-assisted path).
- **Open registration risk** is mitigated by requiring `(site_code + long key)` and rate-limit/lockout; auto-approve is acceptable because the enrollment key is the gate and every enrollment/site-move fires an alert.
- **Audit events:** enroll, re-enroll/reuse, site-move, key-rotate — all logged with `machine_uid`, site, and source IP.

## Testing strategy

- **Unit:** `machine_uid` derivation stability; enrollment-key verify/rotate; fingerprint derivation; `cak_` mint/hash/verify; dedup decision (new vs. reuse vs. move vs. collision-pending).
- **Integration:** enroll new → row + `cak_` issued; re-enroll same `machine_uid` → reuse, no duplicate; enroll with rotated (old) key → rejected; old `cak_` still connects after rotation; rate-limit/lockout trips; site-move emits alert.
- **Manual:** build a site wrapper installer → run on a clean VM → appears in console under correct site, immediately controllable; re-image VM → same row reused; `signtool verify /pa` passes on the distributed wrapper and the laid-down agent.

## Effort estimate & dependencies

- **Size:** X-Large (agent + relay + DB migration + CI build/sign wrapper + dashboard key/rotation surface).
- **Depends on:** SPEC-004 `machine_uid` (shared root); the CI signing already shipped (SPEC-001 §2 / `release.yml`).
- **Unblocks:** SPEC-007 (installer builder gets a real per-site key + the signing resolution), and the parked managed-agent test deployment on the internal beta machines.
- **Relationship to v2 phases:** sits with the Phase-1 secure-session-core (per-agent keys + identity) and feeds Phase-2 dashboard work.

## Resolved decisions (2026-06-02, Mike)

1. **Wrapper shape — BOTH.** v1 ships a signed bootstrapper `.exe` *and* a signed MSI per site (ScreenConnect offers both; manual installs use the `.exe`, GPO/Intune fleet pushes use the MSI). Same sign-once agent inside each.
2. **`cak_` storage — BOTH layers.** DPAPI-machine-encrypted blob stored in a SYSTEM-ACL'd location. Non-admins can't read it; a stolen copy is inert off the box.
3. **Fingerprint — hex (`7F2A`).** Deliberately *not* the RMM word-code style, so the two products' artifacts are never visually conflated.
4. **`machine_uid` — per-tenant scope, hardware-derived salt, collision-gated.** Dedup key `(tenant_id, machine_uid)`; salt from stable hardware signals (survives same-hardware re-image, separates distinct boxes); detected collisions (e.g. template-cloned VMs sharing a hardware UUID) drop to pending + alert and require dashboard confirmation to activate.
5. **Attended (support-code) path — unchanged.** `download_support` is filename-based (`GuruConnect-.exe`), not append-based, so renaming never breaks the signature — it is already signing-safe. Only the managed `download_agent` append path is retired.
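Resolved decision 4, together with Scope items 3 and 6, implies a four-way server-side decision keyed by `(tenant_id, machine_uid)`. A hedged sketch follows. The spec does not say how a collision is detected; purely as an assumption, this sketch treats an enrollment whose `machine_uid` matches a machine that is *currently connected* as a collision, on the reasoning that a genuine re-image takes the old install offline. `Registry`, `ExistingMachine`, and `EnrollDecision` are hypothetical names, and the in-memory map stands in for `connect_machines`.

```rust
use std::collections::HashMap;

// What the server already knows about a machine (hypothetical fields).
struct ExistingMachine {
    machine_id: u64,
    site_id: u64,
    currently_connected: bool, // assumption: used as the collision signal
}

#[derive(Debug, PartialEq)]
enum EnrollDecision {
    CreateNew,                                    // new-enrollment alert
    ReuseExisting { machine_id: u64 },            // same site: reuse row, rotate cak_
    MoveSite { machine_id: u64, from_site: u64 }, // site-move alert
    PendingCollision { machine_id: u64 },         // alert; operator must confirm
}

#[derive(Default)]
struct Registry {
    // Dedup key is (tenant_id, machine_uid), per Resolved decision 4.
    machines: HashMap<(u64, String), ExistingMachine>,
}

impl Registry {
    fn insert(&mut self, tenant_id: u64, machine_uid: &str, m: ExistingMachine) {
        self.machines.insert((tenant_id, machine_uid.to_string()), m);
    }

    fn decide(&self, tenant_id: u64, machine_uid: &str, target_site: u64) -> EnrollDecision {
        match self.machines.get(&(tenant_id, machine_uid.to_string())) {
            None => EnrollDecision::CreateNew,
            // Another live endpoint already holds this uid: likely a clone.
            Some(m) if m.currently_connected => {
                EnrollDecision::PendingCollision { machine_id: m.machine_id }
            }
            Some(m) if m.site_id == target_site => {
                EnrollDecision::ReuseExisting { machine_id: m.machine_id }
            }
            Some(m) => EnrollDecision::MoveSite { machine_id: m.machine_id, from_site: m.site_id },
        }
    }
}

fn main() {
    let mut reg = Registry::default();
    reg.insert(1, "uid-a", ExistingMachine { machine_id: 10, site_id: 100, currently_connected: false });

    assert_eq!(reg.decide(1, "uid-new", 100), EnrollDecision::CreateNew);
    assert_eq!(reg.decide(1, "uid-a", 100), EnrollDecision::ReuseExisting { machine_id: 10 });
    assert_eq!(reg.decide(1, "uid-a", 200), EnrollDecision::MoveSite { machine_id: 10, from_site: 100 });
    // Same hardware in another tenant is an independent row.
    assert_eq!(reg.decide(2, "uid-a", 100), EnrollDecision::CreateNew);
    println!("dedup decisions ok");
}
```

The arm order is the design choice here: the collision check runs before the same-site and other-site checks, so a suspected clone is held pending regardless of which site its installer targets.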
## Remaining for planning

- Exact stable-hardware signal set for the salt (SMBIOS UUID alone vs. + motherboard/disk serial) and hypervisor behavior matrix (which hypervisors duplicate the SMBIOS UUID on clone → exercise the collision-gate).
- MSI authoring approach (WiX) and whether per-site config rides as a per-site MSI vs. a base MSI + property/transform.
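To help compare the candidate signal sets in the first planning item, here is a sketch of `machine_uid` derivation that takes the signal set as a parameter. Assumptions: `HardwareSignals`, `SignalSet`, and `derive_machine_uid` are hypothetical; collecting the values (SMBIOS UUID, board and disk serials, `MachineGuid`) is not shown; and FNV-1a is a std-only stand-in for a cryptographic hash such as SHA-256. `MachineGuid` is an optional input because Windows generates it during setup, so including it means a clean OS reinstall yields a new uid, which is worth weighing against the re-image dedup goal.

```rust
// Candidate stable-hardware signals (hypothetical struct). Collecting them
// (SMBIOS/WMI queries, registry read of MachineGuid) is not shown.
struct HardwareSignals<'a> {
    smbios_uuid: &'a str,
    board_serial: Option<&'a str>,
    disk_serial: Option<&'a str>,
}

// The two candidate salt sets listed under "Remaining for planning".
#[derive(Clone, Copy)]
enum SignalSet {
    SmbiosOnly,
    SmbiosBoardDisk,
}

// FNV-1a 64-bit: std-only stand-in; a real implementation would likely use SHA-256.
fn fnv1a64(data: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in data {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

// Case and surrounding whitespace vary between tools, so normalize first.
fn normalize(s: &str) -> String {
    s.trim().to_ascii_uppercase()
}

fn derive_machine_uid(machine_guid: Option<&str>, hw: &HardwareSignals<'_>, set: SignalSet) -> String {
    let mut fields: Vec<String> = Vec::new();
    if let Some(guid) = machine_guid {
        fields.push(format!("guid={}", normalize(guid)));
    }
    fields.push(format!("smbios={}", normalize(hw.smbios_uuid)));
    if let SignalSet::SmbiosBoardDisk = set {
        fields.push(format!("board={}", hw.board_serial.map(normalize).unwrap_or_default()));
        fields.push(format!("disk={}", hw.disk_serial.map(normalize).unwrap_or_default()));
    }
    // Length-prefix each field so field boundaries are unambiguous.
    let mut canonical: Vec<u8> = Vec::new();
    for f in &fields {
        canonical.extend_from_slice(&(f.len() as u32).to_be_bytes());
        canonical.extend_from_slice(f.as_bytes());
    }
    format!("{:016x}", fnv1a64(&canonical))
}

fn main() {
    let vm1 = HardwareSignals { smbios_uuid: "4C4C4544-0001", board_serial: Some("BRD-1"), disk_serial: Some("DSK-1") };
    // Contrived template clone: same SMBIOS UUID and board, different disk serial.
    let vm2 = HardwareSignals { smbios_uuid: "4c4c4544-0001 ", board_serial: Some("brd-1"), disk_serial: Some("DSK-2") };

    // Prints "collision = true" (SmbiosOnly), then "collision = false" (SmbiosBoardDisk).
    for set in [SignalSet::SmbiosOnly, SignalSet::SmbiosBoardDisk] {
        let (a, b) = (derive_machine_uid(None, &vm1, set), derive_machine_uid(None, &vm2, set));
        println!("collision = {}", a == b);
    }
}
```

In this contrived example the clone received a fresh disk serial; whether real hypervisors actually do that on clone is exactly what the behavior matrix needs to establish.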