Fold the 2026-06-02 interview decisions into SPEC-016:
- Installer wrapper: ship BOTH signed .exe and signed MSI per site
- cak_ at-rest storage: DPAPI-machine-encrypted blob in a SYSTEM-ACL'd location
- Fingerprint: hex (7F2A), deliberately unlike RMM word-codes
- machine_uid: per-tenant scope + hardware-derived salt (survives re-image, separates distinct boxes) + collision-gated activation (template-cloned VMs sharing a hardware UUID drop to pending + alert, need dashboard confirm)
- Attended support-code path: unchanged (filename-based, already signing-safe)

Open Questions section -> Resolved decisions + a short Remaining-for-planning list (exact hardware salt signal set, WiX/MSI authoring approach).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
SPEC-016: Zero-Touch Per-Site Agent Enrollment
Status: Proposed
Priority: P1
Requested By: Mike (2026-06-02)
Estimated Effort: X-Large
Overview
Give GuruConnect a ScreenConnect-class managed-agent enrollment flow: a technician runs
one signed installer per site on every machine at that site — no per-machine key
minting, no flags, no typing — and each machine self-registers on first run, the
server minting it a per-machine cak_ key bound to a stable, machine-derived
machine_uid. Each site installer carries a rotatable per-site enrollment key (a long
server-generated secret) plus a short human-readable fingerprint (vN (XXXX)) so an
operator can tell at a glance whether an installer is current. Rotating a site's key blocks
new enrollments from old installers while leaving already-enrolled machines untouched
(they hold their own cak_).
This is the missing piece that turns the v2 secure-session-core (SPEC-004 per-agent keys +
machine_uid) into a real product workflow, and it resolves SPEC-007's open
signature-vs-appended-config question: the agent binary is signed once in CI
(already shipped via release.yml), and per-site customization rides in a thin signed
wrapper that writes site config to the endpoint at install time — never appended into the
signed PE.
Success criteria:
- A tech installs one site installer on N machines; all N appear in the console under the correct company/site, each as a distinct, deduplicated machine — zero per-machine setup.
- Re-installing / re-imaging the same hardware reuses the existing machine row (no ghost duplicates — the failure mode SPEC-004 documents).
- Rotating a site's enrollment key makes old installers unable to enroll new machines, while every already-enrolled agent keeps working.
- Every distributed installer is validly Authenticode-signed (SmartScreen/WDAC clean).
Background — what exists today (confirmed in code)
- Embedded config is append-based and breaks signing. server/src/api/downloads.rs (download_agent, ~:152) reads static/downloads/guruconnect.exe and appends MAGIC_MARKER + len:u32 + JSON (:196) to the end of the PE. The agent reads it back in agent/src/config.rs (read_embedded_config, :223). Appending bytes after a signed PE invalidates the Authenticode signature, so the current customization path and the newly-shipped CI signing are mutually exclusive.
- No self-registration exists. Per-agent cak_ keys are minted admin-only in server/src/api/machine_keys.rs (create_key, :119; "Admin issued a per-agent key", :146). There is no endpoint where an agent first-run exchanges an enrollment credential for its own key.
- Relay already accepts per-agent keys. server/src/relay/mod.rs (validate_agent_api_key, :417) calls crate::auth::agent_keys::verify_agent_key (:422), which is the cak_ path, then falls back to the deprecated shared AGENT_API_KEY (:444, logs a "migrate to per-agent cak_" warning).
- Key primitives exist. server/src/auth/agent_keys.rs: generate_agent_key mints a cak_-prefixed high-entropy key (:36/:46); verify_agent_key (:71). server/src/db/agent_keys.rs already inserts into connect_agent_keys (machine_id, key_hash, tenant_id) (:47); the v2 tenancy column is present (migration 004_v2_secure_session_core.sql).
- Identity is a random config UUID, not machine-derived. This is the root cause of duplicates per SPEC-004 (agent/src/config.rs generate_agent_id, :90).
- Agent mode dispatch: agent/src/main.rs Commands::Install (:160) → run_install; agent/src/config.rs detect_run_mode (:162) returns RunMode::PermanentAgent when embedded config is present.
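To make the signing conflict concrete, here is a minimal sketch of an append-style trailer of the shape described above (marker + len:u32 + JSON). It is not the code in downloads.rs or config.rs: the marker bytes, the little-endian length, and the function names append_trailer and read_trailer are assumptions made for illustration.

```rust
// Hypothetical marker bytes; the real MAGIC_MARKER value lives in the codebase.
const MAGIC_MARKER: &[u8] = b"GC-CONFIG-SKETCH";

/// Append `MAGIC_MARKER + len:u32 + JSON` after the PE bytes.
/// The little-endian length encoding is an assumption of this sketch.
fn append_trailer(pe: &[u8], json: &str) -> Vec<u8> {
    let mut out = pe.to_vec();
    out.extend_from_slice(MAGIC_MARKER);
    out.extend_from_slice(&(json.len() as u32).to_le_bytes());
    out.extend_from_slice(json.as_bytes());
    out
}

/// Find the last marker in the file and return the JSON that follows it.
/// Returns None when no marker is present or the trailer is truncated.
fn read_trailer(file: &[u8]) -> Option<&str> {
    let start = file
        .windows(MAGIC_MARKER.len())
        .rposition(|w| w == MAGIC_MARKER)?;
    let len_at = start + MAGIC_MARKER.len();
    let b = file.get(len_at..len_at + 4)?;
    let len = u32::from_le_bytes([b[0], b[1], b[2], b[3]]) as usize;
    let body = file.get(len_at + 4..len_at + 4 + len)?;
    std::str::from_utf8(body).ok()
}
```

The output of append_trailer is always longer than the input image, so the file that reaches the endpoint is never the byte sequence that CI signed. That is the defect the wrapper approach in this spec is meant to remove.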
Scope
Included in v1 (CORE)
1. machine_uid: deterministic machine identity (hardware-salted, per-tenant). Derive a stable id from the Windows MachineGuid (HKLM\SOFTWARE\Microsoft\Cryptography\MachineGuid) salted with stable hardware signals (SMBIOS UUID / motherboard + disk serial), independent of the config-file agent_id. Hardware-derived salt is deliberate: it survives an OS reinstall/re-image on the same hardware (so the row is reused, which is the re-image dedup goal) while keeping distinct physical boxes distinct (a per-install random salt would break re-image dedup and is rejected). Uniqueness is scoped per-tenant, with dedup key (tenant_id, machine_uid), so the same hardware legitimately present in two tenants stays two independent rows. (Shared root with SPEC-004; whichever lands first owns the impl, the other consumes it.) Used as the dedup key for register/move.

   Collision-gated activation. The residual collision case is VMs/templates that share a hardware UUID (some hypervisors clone the SMBIOS UUID). When the server detects a machine_uid collision (a seemingly-different endpoint resolving to an existing uid), the endpoint does not auto-activate: it drops to a pending state, fires an alert, and an operator must confirm in the dashboard that the collided endpoint may activate. This is the one deliberate exception to auto-approve (see item 6).
2. Per-site enrollment key + fingerprint.
   - Long (≥256-bit) server-generated secret per site, stored hashed (Argon2id, same as cak_/passwords), never recoverable in plaintext after issue.
   - A non-secret fingerprint = monotonic version + short derived code in hex, rendered vN (XXXX) (e.g. v3 (7F2A)), shown in the dashboard, baked into the installer filename, and reported by the agent at enrollment. Hex is deliberate (not the RMM word-style code, GREEN-FALCON) so GuruConnect and GuruRMM artifacts are never visually conflated.
   - Rotate regenerates the secret and bumps the version; old installers are rejected for new enrollments; existing agents (holding cak_) are unaffected.
3. Self-registration endpoint. New POST /api/enroll (public, unauthenticated by JWT, gated by the enrollment key) accepting { site_code, enrollment_key, machine_uid, hostname, labels{company,site,department,device_type,tags} }:
   - Verify (site_code, enrollment_key) against the current per-site key.
   - Dedup by machine_uid within the tenant (dedup key (tenant_id, machine_uid), per item 1): if the machine exists, reuse the row and rotate its cak_; else create the machine row.
   - Mint a cak_ (reuse generate_agent_key), store hashed via db::agent_keys bound to machine_id (+ tenant_id from the site), return the plaintext cak_ once.
   - Emit an audit event + new-enrollment alert (and a site-move alert when an existing machine_uid enrolls under a different site).
   - Rate-limit + lockout per (site_code, source-IP) as defense-in-depth (the key is long, so this is belt-and-suspenders, not load-bearing).
4. Agent first-run enrollment. On RunMode::PermanentAgent with no stored cak_: read site config → call /api/enroll with machine_uid → persist the returned cak_ to a SYSTEM-only protected store (a DPAPI-machine-encrypted blob in a SYSTEM-ACL'd location, per Resolved decisions) → connect to wss://connect.azcomputerguru.com/ws/agent using the cak_. On subsequent runs, use the stored cak_ directly (no re-enroll).
5. Sign-once base + per-site signed wrapper (resolves SPEC-007 open question).
   - The base agent is signed once in CI (release.yml, already shipped) and stays byte-identical for everyone.
   - Per-site customization (labels + enrollment key + fingerprint) is delivered to the endpoint at install time via a signing-safe channel, NOT appended to the signed PE. v1 produces BOTH a signed bootstrapper .exe and a signed MSI per site (ScreenConnect parity: manual installs grab the .exe, GPO/Intune fleet pushes take the MSI), both wrapping the same sign-once agent and writing the site config to the protected config location. The two differ only in packaging (bootstrapper stub vs. WiX bundle); both are signed.
   - Deprecate the append path in downloads.rs for managed installs (keep only for attended/support-code if still needed), eliminating the signature-invalidation defect.
6. Auto-approve posture (with collision-gate exception). A self-registered machine is live and controllable immediately (ScreenConnect parity); the new-enrollment alert is the tripwire. The one exception is a detected machine_uid collision (item 1), which gates the endpoint to pending until an operator confirms it in the dashboard.
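The register-time outcomes described in the items above (new row, reuse, site-move, collision-gated pending) can be sketched as one pure decision function. This is an illustration under assumptions, not the planned server code: EnrollDecision, MachineRow, Registry, and decide are hypothetical names, and the looks_like_other_endpoint flag is a placeholder because the spec does not yet define how a collision is detected.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum EnrollDecision {
    /// No row for this (tenant, machine_uid): create one.
    CreateNew,
    /// Same machine, same site: reuse the row and rotate its cak_.
    Reuse { machine_id: u64 },
    /// Same machine under a different site: reuse the row and fire a site-move alert.
    SiteMove { machine_id: u64, from_site: String },
    /// Suspected uid collision: hold in pending until an operator confirms.
    PendingCollision { machine_id: u64 },
}

struct MachineRow {
    machine_id: u64,
    site_code: String,
}

/// Existing machines keyed by (tenant_id, machine_uid), the dedup key from item 1.
type Registry = HashMap<(String, String), MachineRow>;

fn decide(
    registry: &Registry,
    tenant_id: &str,
    site_code: &str,
    machine_uid: &str,
    // Placeholder input: the collision signal itself is not specified yet.
    looks_like_other_endpoint: bool,
) -> EnrollDecision {
    let key = (tenant_id.to_string(), machine_uid.to_string());
    match registry.get(&key) {
        None => EnrollDecision::CreateNew,
        Some(row) if looks_like_other_endpoint => EnrollDecision::PendingCollision {
            machine_id: row.machine_id,
        },
        Some(row) if row.site_code == site_code => EnrollDecision::Reuse {
            machine_id: row.machine_id,
        },
        Some(row) => EnrollDecision::SiteMove {
            machine_id: row.machine_id,
            from_site: row.site_code.clone(),
        },
    }
}
```

Keeping the decision free of I/O would let the "new vs. reuse vs. move" unit tests run without a database. Because the key includes tenant_id, the same machine_uid presented under a second tenant resolves to CreateNew rather than a move.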
Explicitly out of scope (ANTICIPATED — reserve room, do NOT build in v1)
The v1 data model and agent mode-dispatch must leave room for these without building them:
- Per-site enrollment POLICY: a sites.enrollment_policy field (default auto-approve; future pending-approval) plus per-seat/per-endpoint licensing controls. Commercial, multi-tenant (the tenant_id column already exists). Its own future SPEC.
- Flag overrides: --enroll-key / --site-code (generic installer, key supplied on the command line) and --reassign (move an existing machine to a new site, gated by possession of the destination site's key, with an explicit accidental-move guard: a different-site re-run refuses unless --reassign is passed) + cross-client move policy. Backend (machine_uid + authorized site + cak_) is designed to support it; CLI surface is deferred.
- Technician-assisted interactive install: --technician on a generic installer prompts for the tech's own server credentials and, on auth, presents a validated Company/Site/tags picker from the live authorized list (authz-by-identity, full audit trail). Heaviest path (interactive UI + auth/list callback); deferred.
All three converge on the same backend operation delivered in v1: machine_uid +
authorized site + issued cak_. v1 only ships the per-site-embedded-key door.
Architecture
- Agent (agent/): compute machine_uid; first-run enroll → store cak_; use stored cak_ thereafter; read site config from the wrapper-written location instead of an appended PE blob. Touches config.rs (EmbeddedConfig / detect_run_mode / storage), main.rs (Install / run-mode), a new enroll client module, transport auth.
- Relay-server (server/): new POST /api/enroll; per-site key issue/rotate/verify; machine_uid dedup + site-move on register; audit + alert emission; rate-limit/lockout. Touches api/ (new enroll.rs, sites key endpoints), auth/agent_keys.rs, db/agent_keys.rs, relay/mod.rs (enrollment vs. connect), main.rs routes.
- Dashboard: per-site enrollment-key display (fingerprint vN (XXXX)), Rotate action, "current installer" download wired to the signed wrapper build. (Builder UI is SPEC-007; this spec supplies the key/fingerprint/rotation it consumes.)
- DB migration: site_enrollment_keys (or columns on the site): site_id, key_hash, version, fingerprint, created_at, rotated_at, active. Reserve sites.enrollment_policy (nullable, default auto-approve) for the anticipated policy work. connect_machines gains machine_uid (unique per tenant/site).
- Protobuf (proto/guruconnect.proto): no wire change required for enrollment if /api/enroll is REST; AgentStatus label fields per SPEC-007 (department, device_type) ride along if landed together.
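The version, key_hash, and fingerprint columns interact on rotation roughly as sketched below. Everything here is illustrative: SiteEnrollmentKey and its methods are hypothetical, the FNV-1a function is only a crate-free stand-in for the Argon2id hashing the spec requires, and deriving the four hex digits from the low 16 bits of the stored hash is an assumption, since the spec only says "short derived code".

```rust
/// Stand-in hash (64-bit FNV-1a) so the sketch needs no external crates.
/// NOT a substitute for Argon2id, which the spec requires for stored keys.
fn placeholder_hash(input: &str) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for byte in input.bytes() {
        h ^= byte as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

struct SiteEnrollmentKey {
    version: u32,
    key_hash: u64,
}

impl SiteEnrollmentKey {
    /// First issue for a site: version starts at 1.
    fn issue(secret: &str) -> Self {
        Self { version: 1, key_hash: placeholder_hash(secret) }
    }

    /// Rotate: replace the stored hash and bump the version.
    fn rotate(&mut self, new_secret: &str) {
        self.version += 1;
        self.key_hash = placeholder_hash(new_secret);
    }

    /// Only the current secret verifies; a real Argon2id check would use
    /// the library's verify call rather than comparing hashes directly.
    fn verify(&self, presented: &str) -> bool {
        placeholder_hash(presented) == self.key_hash
    }

    /// Render the non-secret fingerprint as vN (XXXX).
    /// Assumption: XXXX is the low 16 bits of the stored hash, in hex.
    fn fingerprint(&self) -> String {
        format!("v{} ({:04X})", self.version, self.key_hash & 0xFFFF)
    }
}
```

After rotate, an installer built with the old secret fails verify, while agents that already hold a cak_ never touch this path, which matches the rotation behavior the spec asks for.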
Security considerations
- Two-tier credential model: low-sensitivity enrollment key (gates "may register", shared per site, rotatable) vs. high-sensitivity per-machine cak_ (operating credential, per-machine revocation). Compromise of an enrollment key is recovered by rotating one site, with no fleet-wide re-key.
- Enrollment keys stored hashed (Argon2id); plaintext shown once at issue/rotate.
- cak_ at rest on the endpoint is stored as a DPAPI-machine-encrypted blob inside a SYSTEM-ACL'd location (HKLM value or ProgramData file). Both layers apply: the SYSTEM ACL stops non-admin users reading it, and DPAPI-machine encryption makes a copied file/export inert off the box. (Local admin/SYSTEM can always recover it; that is accepted because the blast radius of one leaked cak_ is a single, independently-revocable machine.)
- machine_uid binding is the spoof-guard SPEC-004 wants: a cak_ is bound to a machine_uid; a different box presenting another box's cak_ is detectable.
- Authorization model for moves/enrolls is possession-of-destination-key in v1 (identity-based authz deferred to the technician-assisted path).
- Open registration risk is mitigated by requiring (site_code + long key) and rate-limit/lockout; auto-approve is acceptable because the enrollment key is the gate and every enrollment/site-move fires an alert.
- Audit events: enroll, re-enroll/reuse, site-move, key-rotate, all logged with machine_uid, site, and source IP.
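The rate-limit and lockout mitigation keyed on (site_code, source IP) could take a shape like the following. It is a minimal in-memory sketch, assuming a fixed failure window and a fixed lockout period; EnrollLimiter, its thresholds, and the caller-supplied now timestamp (seconds) are all hypothetical choices, not values from this spec.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

struct Entry {
    window_start: u64,
    failures: u32,
    locked_until: u64,
}

/// Failure counter per (site_code, source IP) with a lockout period.
/// Time is passed in as seconds so the logic stays testable.
struct EnrollLimiter {
    max_failures: u32,
    window_secs: u64,
    lockout_secs: u64,
    entries: HashMap<(String, IpAddr), Entry>,
}

impl EnrollLimiter {
    fn new(max_failures: u32, window_secs: u64, lockout_secs: u64) -> Self {
        Self { max_failures, window_secs, lockout_secs, entries: HashMap::new() }
    }

    /// True unless this (site, IP) pair is currently locked out.
    fn is_allowed(&self, site_code: &str, ip: IpAddr, now: u64) -> bool {
        match self.entries.get(&(site_code.to_string(), ip)) {
            Some(e) => now >= e.locked_until,
            None => true,
        }
    }

    /// Record a failed key check; lock the pair once the threshold is hit.
    fn record_failure(&mut self, site_code: &str, ip: IpAddr, now: u64) {
        let (max, window, lockout) = (self.max_failures, self.window_secs, self.lockout_secs);
        let e = self
            .entries
            .entry((site_code.to_string(), ip))
            .or_insert(Entry { window_start: now, failures: 0, locked_until: 0 });
        // Start a fresh window when the previous one has expired.
        if now.saturating_sub(e.window_start) >= window {
            e.window_start = now;
            e.failures = 0;
        }
        e.failures += 1;
        if e.failures >= max {
            e.locked_until = now + lockout;
            e.failures = 0;
            e.window_start = now;
        }
    }
}
```

Keying on the pair rather than on the site alone means one noisy source address cannot lock out legitimate enrollments for the whole site, which fits the spec's framing of this control as defense-in-depth rather than the primary gate.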
Testing strategy
- Unit: machine_uid derivation stability; enrollment-key verify/rotate; fingerprint derivation; cak_ mint/hash/verify; dedup decision (new vs. reuse vs. move).
- Integration: enroll new → row + cak_ issued; re-enroll same machine_uid → reuse, no duplicate; enroll with rotated (old) key → rejected; old cak_ still connects after rotation; rate-limit/lockout trips; site-move emits alert.
- Manual: build a site wrapper installer → run on a clean VM → appears in console under correct site, immediately controllable; re-image VM → same row reused; signtool verify /pa passes on the distributed wrapper and the laid-down agent.
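The agent's first-run rule (enroll only when no cak_ is stored, then reuse the stored one) becomes unit-testable if the protected store sits behind a small seam. The sketch below assumes such a seam; CakStore, MemoryStore, and ensure_cak are hypothetical names, and the in-memory store merely stands in for the DPAPI-protected location.

```rust
/// Hypothetical seam over the SYSTEM-only protected store.
trait CakStore {
    fn load(&self) -> Option<String>;
    fn save(&mut self, cak: &str);
}

/// In-memory stand-in used only for tests.
struct MemoryStore(Option<String>);

impl CakStore for MemoryStore {
    fn load(&self) -> Option<String> {
        self.0.clone()
    }
    fn save(&mut self, cak: &str) {
        self.0 = Some(cak.to_string());
    }
}

/// Return the cak_ to connect with, calling `enroll` only when nothing is stored.
/// A failed enrollment leaves the store untouched so the next run retries.
fn ensure_cak<S, F>(store: &mut S, mut enroll: F) -> Result<String, String>
where
    S: CakStore,
    F: FnMut() -> Result<String, String>,
{
    if let Some(cak) = store.load() {
        return Ok(cak);
    }
    let cak = enroll()?;
    store.save(&cak);
    Ok(cak)
}
```

A test can pass a closure that counts its invocations and assert that a second call to ensure_cak returns the stored key without enrolling again.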
Effort estimate & dependencies
- Size: X-Large (agent + relay + DB migration + CI build/sign wrapper + dashboard key/rotation surface).
- Depends on: SPEC-004 machine_uid (shared root); the CI signing already shipped (SPEC-001 §2 / release.yml).
- Unblocks: SPEC-007 (installer builder gets a real per-site key + the signing resolution), and the parked managed-agent test deployment on the internal beta machines.
- Relationship to v2 phases: sits with the Phase-1 secure-session-core (per-agent keys + identity) and feeds Phase-2 dashboard work.
Resolved decisions (2026-06-02, Mike)
- Wrapper shape: BOTH. v1 ships a signed bootstrapper .exe and a signed MSI per site (ScreenConnect offers both; manual installs use the .exe, GPO/Intune fleet pushes use the MSI). Same sign-once agent inside each.
- cak_ storage: BOTH layers. DPAPI-machine-encrypted blob stored in a SYSTEM-ACL'd location. Non-admins can't read it; a stolen copy is inert off the box.
- Fingerprint: hex (7F2A). Deliberately not the RMM word-code style, so the two products' artifacts are never visually conflated.
- machine_uid: per-tenant scope, hardware-derived salt, collision-gated. Dedup key (tenant_id, machine_uid); salt from stable hardware signals (survives same-hardware re-image, separates distinct boxes); detected collisions (e.g. template-cloned VMs sharing a hardware UUID) drop to pending + alert and require dashboard confirmation to activate.
- Attended (support-code) path: unchanged. download_support is filename-based (GuruConnect-<code>.exe), not append-based, so renaming never breaks the signature; it is already signing-safe. Only the managed download_agent append path is retired.
Remaining for planning
- Exact stable-hardware signal set for the salt (SMBIOS UUID alone vs. + motherboard/disk serial) and hypervisor behavior matrix (which hypervisors duplicate the SMBIOS UUID on clone → exercise the collision-gate).
- MSI authoring approach (WiX) and whether per-site config rides as a per-site MSI vs. a base MSI + property/transform.