From 18429f6fe3a88883246309cd081d8c894d08c4a6 Mon Sep 17 00:00:00 2001 From: Mike Swanson Date: Tue, 2 Jun 2026 09:13:59 -0700 Subject: [PATCH] spec: add SPEC-016 zero-touch per-site agent enrollment ScreenConnect-class managed enrollment: one signed installer per site, machines self-register on first run and the server mints a per-machine cak_ key bound to a deterministic machine_uid (dedups re-installs). Per-site rotatable enrollment key (long secret + vN (XXXX) fingerprint); rotating blocks new enrollments from old installers, leaves enrolled agents untouched. Auto-approve + new-enrollment/site-move alert. Resolves SPEC-007's signature-vs-appended-config open question: sign the base agent once in CI + per-site signed wrapper that writes site config around the signed bytes (never appended into the PE). Deferred (room reserved): enrollment policy + per-seat licensing, --enroll-key/--site-code/--reassign flag overrides, technician-assisted interactive install. Tracking todo dbfe6a56. Co-Authored-By: Claude Opus 4.8 (1M context) --- docs/FEATURE_ROADMAP.md | 3 +- docs/specs/SPEC-016-zero-touch-enrollment.md | 210 +++++++++++++++++++ 2 files changed, 212 insertions(+), 1 deletion(-) create mode 100644 docs/specs/SPEC-016-zero-touch-enrollment.md diff --git a/docs/FEATURE_ROADMAP.md b/docs/FEATURE_ROADMAP.md index bb75a8b..122140d 100644 --- a/docs/FEATURE_ROADMAP.md +++ b/docs/FEATURE_ROADMAP.md @@ -87,10 +87,11 @@ Bringing GC to parity with GuruRMM's release engineering. Full plan: [SPEC-001]( - [x] Sessions / machines / support-codes / events - [ ] **Full machine inventory in the connection DB** — P2 — persist per-machine device inventory (OS+locale+install, CPU/RAM, mfr/model/serial, external WAN IP captured server-side + private LAN IP + MAC, logged-on user, idle, time zone, uptime, local-admin) on `connect_machines`, refreshed each `AgentStatus`, shown in the dashboard machine detail (ScreenConnect "Guest Info" parity). 
Data layer for SPEC-002 Phase 2; closes GC side of agent-IP gap (todo 7459428e). **[→ v2 Phase 2]** ([SPEC-003](specs/SPEC-003-machine-inventory.md)) - [ ] **Stable machine identity + session lifecycle reaping + operator removal** — P1 — give the agent a deterministic machine-derived `machine_uid` (Windows `MachineGuid`-based) so the same box can't register duplicates (root cause: `agent_id` is a config-file random UUID that a portable/misconfigured run regenerates each launch); key registration on it; add TTL reaping + same-machine supersede as defense-in-depth; and admin-gated per-row + multi-select bulk removal of stale sessions/units. Identity must be bound to the per-machine agent key (spoof guard). Fixes ghost-session accumulation seen on the live console (15 sessions / 0 live, ~10 orphans for one machine). **[→ v2 Phase 1]** ([SPEC-004](specs/SPEC-004-session-lifecycle-and-removal.md)) +- [ ] **Zero-touch per-site agent enrollment** — P1 — ScreenConnect-class managed enrollment: one signed installer per site, machines self-register on first run and the server mints a per-machine `cak_` key bound to a deterministic `machine_uid` (dedups re-installs). Per-site **rotatable** enrollment key (long secret + `vN (XXXX)` fingerprint) — rotating blocks new enrollments from old installers, leaves enrolled agents untouched. Auto-approve + new-enrollment/site-move alert. **Sign base agent once (CI, shipped) + per-site signed wrapper that writes site config around the signed bytes — resolves SPEC-007's signature-vs-appended-config question.** Anticipated/deferred: enrollment policy + per-seat licensing, `--enroll-key`/`--site-code`/`--reassign` flag overrides, technician-assisted interactive install. 
**[→ v2 Phase 1]** ([SPEC-016](specs/SPEC-016-zero-touch-enrollment.md)) - [ ] **Machines list view — dual connection indicators + rich rows** — P2 — ScreenConnect "Access"-list parity: per-row Host/Guest two-segment connection bar (Guest=agent online, Host=viewer connected, with names + durations) and rich inline metadata (company, site, device type, tags, logged-on user + idle, client version in red when outdated). Server-enriches `/api/machines` with live session state + SPEC-003 inventory. **[→ v2 Phase 2]** ([SPEC-005](specs/SPEC-005-machines-list-view-parity.md)) - [ ] Machines "by Company" tree nav with per-company counts — P3 — left-nav grouping sidebar (screenshot parity). Follow-up sub-item of SPEC-005. - [ ] **Universal machine search ("everything is searchable")** — P2 — server-side `?q=` on `/api/machines` matching case-insensitive substring across ALL attributes (OS, logged-on user, external/private IP, company, site, tag, serial, MAC, version, …), pg_trgm GIN-indexed; multi-term AND + optional field-scoped syntax (`os:`, `user:`, `ip:`). Replaces the hostname-only client filter. Depends on SPEC-003 (attrs must be persisted). **[→ v2 Phase 2]** ([SPEC-006](specs/SPEC-006-universal-machine-search.md)) -- [ ] **Managed-agent installer builder ("Build Installer")** — P2 — dashboard wizard to build a pre-labeled persistent-agent installer (Name/Company/Site/Department/Device Type/Tag/Type) with Download / Copy URL / Send Link, reusing the existing embed-config download path; adds department + device_type to EmbeddedConfig/AgentStatus so labels persist at install time. Pairs with revocable per-machine keys; signature-vs-appended-config is the key open question. 
**[→ v2 Phase 2]** ([SPEC-007](specs/SPEC-007-managed-agent-installer-builder.md)) +- [ ] **Managed-agent installer builder ("Build Installer")** — P2 — dashboard wizard to build a pre-labeled persistent-agent installer (Name/Company/Site/Department/Device Type/Tag/Type) with Download / Copy URL / Send Link, reusing the existing embed-config download path; adds department + device_type to EmbeddedConfig/AgentStatus so labels persist at install time. Pairs with revocable per-machine keys; the signature-vs-appended-config question is resolved by SPEC-016 (sign-once base + per-site signed wrapper, no PE append). **[→ v2 Phase 2]** ([SPEC-007](specs/SPEC-007-managed-agent-installer-builder.md)) - [ ] **Valuable error messages (structured errors + no silent swallows)** — P2 — one structured API error envelope with stable codes + a correlation id that also lands in the logs; contextual tracing on server/agent; sweep the 37 `let _ =` swallows (the pattern that hid the migration-005 bug); dashboard surfaces the real cause + id instead of a generic line. **[→ v2 Phase 0/1 conventions]** ([SPEC-008](specs/SPEC-008-valuable-error-messages.md)) - [ ] **Feature-rich, fully-documented management API** — P2 — everything the console can do, callable by API: OpenAPI 3.x generated from code (utoipa) + browsable docs at `/api/docs`, long-lived revocable scoped API tokens (PAT-style, distinct from the 24h JWT + agent keys), an API-completeness gap audit, and consistent pagination/error conventions. Distinct from the ADR-001 RMM integration contract. **[→ v2 Phase 3]** ([SPEC-009](specs/SPEC-009-feature-rich-documented-api.md)) - [ ] **Branding and white-label configuration** — P2 — Allow MSPs to customize logo, colors, and product name for white-labeled remote support. Dashboard admin settings page with logo upload (PNG/SVG, max 2MB), brand hue slider (OKLCH 0-360°, default 184=cyan), product name override, company name, and favicon. 
Agent tray tooltip uses custom product name from registry. Singleton database table with public GET endpoint for unauthenticated rendering. CSS variables (`--brand-hue`, `--accent`, `--panel`) for dynamic theming. **[→ v2 Phase 2]** ([SPEC-014](specs/SPEC-014-branding-whitelabel.md)) diff --git a/docs/specs/SPEC-016-zero-touch-enrollment.md b/docs/specs/SPEC-016-zero-touch-enrollment.md new file mode 100644 index 0000000..3345475 --- /dev/null +++ b/docs/specs/SPEC-016-zero-touch-enrollment.md @@ -0,0 +1,210 @@ +# SPEC-016: Zero-Touch Per-Site Agent Enrollment + +**Status:** Proposed +**Priority:** P1 +**Requested By:** Mike (2026-06-02) +**Estimated Effort:** X-Large + +## Overview + +Give GuruConnect a ScreenConnect-class managed-agent enrollment flow: a technician runs +**one signed installer per site** on every machine at that site — no per-machine key +minting, no flags, no typing — and each machine **self-registers** on first run, the +server minting it a per-machine `cak_` key bound to a stable, machine-derived +`machine_uid`. Each site installer carries a **rotatable per-site enrollment key** (a long +server-generated secret) plus a short human-readable **fingerprint** (`vN (XXXX)`) so an +operator can tell at a glance whether an installer is current. Rotating a site's key blocks +*new* enrollments from old installers while leaving already-enrolled machines untouched +(they hold their own `cak_`). + +This is the missing piece that turns the v2 secure-session-core (SPEC-004 per-agent keys + +`machine_uid`) into a real product workflow, and it **resolves SPEC-007's open +signature-vs-appended-config question**: the agent binary is signed **once** in CI +(already shipped via `release.yml`), and per-site customization rides in a thin **signed +wrapper** that writes site config to the endpoint at install time — never appended into the +signed PE. + +**Success criteria:** +1. 
A tech installs one site installer on N machines; all N appear in the console under the + correct company/site, each as a distinct, deduplicated machine — zero per-machine setup. +2. Re-installing / re-imaging the same hardware **reuses** the existing machine row (no + ghost duplicates — the failure mode SPEC-004 documents). +3. Rotating a site's enrollment key makes old installers unable to enroll new machines, + while every already-enrolled agent keeps working. +4. Every distributed installer is **validly Authenticode-signed** (SmartScreen/WDAC clean). + +## Background — what exists today (confirmed in code) + +- **Embedded config is append-based and breaks signing.** `server/src/api/downloads.rs` + (`download_agent`, ~`:152`) reads `static/downloads/guruconnect.exe` and **appends** + `MAGIC_MARKER` + `len:u32` + JSON (`:196`) to the end of the PE. The agent reads it back + in `agent/src/config.rs` (`read_embedded_config`, `:223`). Appending bytes after a signed + PE invalidates the Authenticode signature — so the current customization path and the + newly-shipped CI signing are mutually exclusive. +- **No self-registration exists.** Per-agent `cak_` keys are minted **admin-only** in + `server/src/api/machine_keys.rs` (`create_key`, `:119`; "Admin issued a per-agent key", + `:146`). There is no endpoint where an agent first-run exchanges an enrollment credential + for its own key. +- **Relay already accepts per-agent keys.** `server/src/relay/mod.rs` + (`validate_agent_api_key`, `:417`) calls `crate::auth::agent_keys::verify_agent_key` + (`:422`) — the `cak_` path — then falls back to the **deprecated** shared `AGENT_API_KEY` + (`:444`, logs a "migrate to per-agent `cak_`" warning). +- **Key primitives exist.** `server/src/auth/agent_keys.rs`: `generate_agent_key` mints a + `cak_`-prefixed high-entropy key (`:36`/`:46`); `verify_agent_key` (`:71`). 
+ `server/src/db/agent_keys.rs` already inserts into `connect_agent_keys (machine_id, + key_hash, tenant_id)` (`:47`) — the v2 tenancy column is present (migration + `004_v2_secure_session_core.sql`). +- **Identity is a random config UUID, not machine-derived** — the root cause of duplicates + per SPEC-004 (`agent/src/config.rs` `generate_agent_id`, `:90`). +- **Agent mode dispatch:** `agent/src/main.rs` `Commands::Install` (`:160`) → `run_install`; + `agent/src/config.rs` `detect_run_mode` (`:162`) returns `RunMode::PermanentAgent` when + embedded config is present. + +## Scope + +### Included in v1 (CORE) + +1. **`machine_uid` — deterministic machine identity.** Derive a stable id from the Windows + `MachineGuid` (`HKLM\SOFTWARE\Microsoft\Cryptography\MachineGuid`), independent of the + config-file `agent_id`. (Shared root with SPEC-004; whichever lands first owns the + implementation, the other consumes it.) Used as the dedup key for register/move. + +2. **Per-site enrollment key + fingerprint.** + - Long (≥256-bit) server-generated secret per site, stored **hashed** (Argon2id, same + as `cak_`/passwords), never recoverable in plaintext after issue. + - A non-secret **fingerprint** = monotonic version + short derived code, rendered + `vN (XXXX)` (e.g. `v3 (7F2A)`), shown in the dashboard, baked into the installer + filename, and reported by the agent at enrollment. + - **Rotate** regenerates the secret and bumps the version; old installers are rejected + for *new* enrollments; existing agents (holding `cak_`) are unaffected. + +3. **Self-registration endpoint.** New `POST /api/enroll` (public: no JWT is required, the + request is gated by the enrollment key instead) accepting `{ site_code, enrollment_key, machine_uid, + hostname, labels{company,site,department,device_type,tags} }`: + - Verify `(site_code, enrollment_key)` against the current per-site key.
+ - **Dedup by `machine_uid`** within the site: if the machine exists, reuse the row and + rotate its `cak_`; else create the machine row. + - Mint a `cak_` (reuse `generate_agent_key`), store hashed via `db::agent_keys` bound to + `machine_id` (+ `tenant_id` from the site), return the plaintext `cak_` **once**. + - Emit an audit event + **new-enrollment alert** (and a **site-move** alert when an + existing `machine_uid` enrolls under a different site). + - **Rate-limit + lockout** per `(site_code, source-IP)` as defense-in-depth (the key is + long, so this is belt-and-suspenders, not load-bearing). + +4. **Agent first-run enrollment.** On `RunMode::PermanentAgent` with no stored `cak_`: + read site config → call `/api/enroll` with `machine_uid` → persist the returned `cak_` + to a SYSTEM-only protected store (HKLM under a SYSTEM-only ACL, or DPAPI-machine) → + connect to `wss://connect.azcomputerguru.com/ws/agent` using the `cak_`. On subsequent + runs, use the stored `cak_` directly (no re-enroll). + +5. **Sign-once base + per-site signed wrapper (resolves SPEC-007 open question).** + - The base agent is signed once in CI (`release.yml`, already shipped) and stays + byte-identical for everyone. + - Per-site customization (labels + enrollment key + fingerprint) is delivered to the + endpoint **at install time** via a signing-safe channel — NOT appended to the signed + PE. v1 mechanism: a small **signed wrapper/bootstrapper** (or signed MSI) that carries + the site config, lays down the signed agent, and writes the site config to the + protected config location. Decision to lock in planning: wrapper-exe vs MSI + (see Open Questions). + - **Deprecate the append path** in `downloads.rs` for managed installs (keep only for + attended/support-code if still needed), eliminating the signature-invalidation defect. + +6. **Auto-approve posture.** A self-registered machine is live and controllable + immediately (ScreenConnect parity). 
The new-enrollment alert is the tripwire. + +### Explicitly out of scope (ANTICIPATED — reserve room, do NOT build in v1) + +The v1 data model and agent mode-dispatch must leave room for these without building them: + +- **Per-site enrollment POLICY** — a `sites.enrollment_policy` field (default + `auto-approve`; future `pending-approval`) plus per-seat/per-endpoint licensing controls. + Commercial, multi-tenant (the `tenant_id` column already exists). Its own future SPEC. +- **Flag overrides** — `--enroll-key` / `--site-code` (generic installer, key supplied on + the command line) and `--reassign` (move an existing machine to a new site, gated by + possession of the destination site's key, with an **explicit accidental-move guard**: + a different-site re-run refuses unless `--reassign` is passed) + cross-client move policy. + Backend (`machine_uid` + authorized site + `cak_`) is designed to support it; CLI surface + is deferred. +- **Technician-assisted interactive install** — `--technician` on a generic installer: + prompts for the tech's own server credentials, and on auth presents a **validated** + Company/Site/tags picker from the live authorized list (authz-by-identity, full audit + trail). Heaviest path (interactive UI + auth/list callback); deferred. + +All three converge on the **same backend operation** delivered in v1: `machine_uid` + +authorized site + issued `cak_`. v1 only ships the per-site-embedded-key door. + +## Architecture + +- **Agent** (`agent/`): compute `machine_uid`; first-run enroll → store `cak_`; use stored + `cak_` thereafter; read site config from the wrapper-written location instead of an + appended PE blob. Touches `config.rs` (`EmbeddedConfig`/`detect_run_mode`/storage), + `main.rs` (`Install`/run-mode), a new `enroll` client module, transport auth. +- **Relay-server** (`server/`): new `POST /api/enroll`; per-site key issue/rotate/verify; + `machine_uid` dedup + site-move on register; audit + alert emission; rate-limit/lockout. 
+ Touches `api/` (new `enroll.rs`, `sites` key endpoints), `auth/agent_keys.rs`, + `db/agent_keys.rs`, `relay/mod.rs` (enrollment vs. connect), `main.rs` routes. +- **Dashboard**: per-site enrollment-key display (fingerprint `vN (XXXX)`), **Rotate** + action, "current installer" download wired to the signed wrapper build. (Builder UI is + SPEC-007; this spec supplies the key/fingerprint/rotation it consumes.) +- **DB migration:** `site_enrollment_keys` (or columns on the site): `site_id`, + `key_hash`, `version`, `fingerprint`, `created_at`, `rotated_at`, `active`. Reserve + `sites.enrollment_policy` (nullable, default `auto-approve`) for the anticipated policy + work. `connect_machines` gains `machine_uid` (unique per tenant/site). +- **Protobuf** (`proto/guruconnect.proto`): no wire change required for enrollment if + `/api/enroll` is REST; `AgentStatus` label fields per SPEC-007 (`department`, + `device_type`) ride along if landed together. + +## Security considerations + +- **Two-tier credential model:** low-sensitivity **enrollment key** (gates "may register", + shared per site, rotatable) vs. high-sensitivity **per-machine `cak_`** (operating + credential, per-machine revocation). Compromise of an enrollment key is recovered by + rotating one site — no fleet-wide re-key. +- **Enrollment keys stored hashed** (Argon2id); plaintext shown once at issue/rotate. +- **`cak_` at rest on the endpoint** must be SYSTEM-only (HKLM SYSTEM ACL or DPAPI-machine) + so a non-admin user can't read it. +- **`machine_uid` binding** is the spoof-guard SPEC-004 wants: a `cak_` is bound to a + `machine_uid`; a different box presenting another box's `cak_` is detectable. +- **Authorization model** for moves/enrolls is possession-of-destination-key in v1 + (identity-based authz deferred to the technician-assisted path). 
+- **Open registration risk** is mitigated by requiring `(site_code + long key)` and + rate-limit/lockout; auto-approve is acceptable because the enrollment key is the gate and + every enrollment/site-move fires an alert. +- **Audit events:** enroll, re-enroll/reuse, site-move, key-rotate — all logged with + `machine_uid`, site, and source IP. + +## Testing strategy + +- **Unit:** `machine_uid` derivation stability; enrollment-key verify/rotate; fingerprint + derivation; `cak_` mint/hash/verify; dedup decision (new vs. reuse vs. move). +- **Integration:** enroll new → row + `cak_` issued; re-enroll same `machine_uid` → reuse, + no duplicate; enroll with rotated (old) key → rejected; old `cak_` still connects after + rotation; rate-limit/lockout trips; site-move emits alert. +- **Manual:** build a site wrapper installer → run on a clean VM → appears in console under + correct site, immediately controllable; re-image VM → same row reused; `signtool verify + /pa` passes on the distributed wrapper and the laid-down agent. + +## Effort estimate & dependencies + +- **Size:** X-Large (agent + relay + DB migration + CI build/sign wrapper + dashboard + key/rotation surface). +- **Depends on:** SPEC-004 `machine_uid` (shared root); the CI signing already shipped + (SPEC-001 §2 / `release.yml`). +- **Unblocks:** SPEC-007 (installer builder gets a real per-site key + the signing + resolution), and the parked managed-agent test deployment on the internal beta machines. +- **Relationship to v2 phases:** sits with the Phase-1 secure-session-core (per-agent keys + + identity) and feeds Phase-2 dashboard work. + +## Open questions + +1. **Wrapper shape:** signed standalone bootstrapper `.exe` vs. signed **MSI** for the + per-site installer. MSI gives clean install/uninstall + GPO/Intune deploy; bootstrapper + is lighter. Lock in planning. +2. **`cak_` storage:** HKLM SYSTEM-ACL registry value vs. DPAPI-machine-protected file — + pick one for the protected store. +3. 
**Fingerprint code style:** raw hex (`7F2A`) vs. the RMM-house word style + (`GREEN-FALCON`). Cosmetic; pick for operator readability. +4. **Cross-tenant `machine_uid` collisions** (same hardware imaged across tenants) — scope + `machine_uid` uniqueness per tenant, not globally. +5. **Attended (support-code) path:** confirm whether the append-based `download_support` + path is retained as-is or also migrated off appending.
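
As an illustration of the dedup decision named in the testing strategy (new vs. reuse vs. move) and of the `vN (XXXX)` fingerprint rendering, here is a minimal Rust sketch. It is not taken from the codebase: `Registry`, `EnrollOutcome`, `enroll`, `machine_count`, and `render_fingerprint` are hypothetical names, the in-memory map stands in for `connect_machines` within a single tenant, enrollment-key verification is assumed to have already succeeded, and the 16-bit fingerprint code is passed in rather than derived, because this spec does not fix the derivation.

```rust
use std::collections::HashMap;

/// Result of one enrollment attempt (hypothetical type, for illustration only).
#[derive(Debug, PartialEq, Eq)]
pub enum EnrollOutcome {
    /// No row existed for this machine_uid; a new machine row is created.
    Created,
    /// The same machine_uid re-enrolled under the same site; the row is reused.
    Reused,
    /// The machine_uid was previously enrolled under another site.
    SiteMoved { from: String },
}

/// In-memory stand-in for the machine table of a single tenant.
/// Assumption: machine_uid is unique per tenant, so it can serve as the map key.
#[derive(Default)]
pub struct Registry {
    // machine_uid -> site_code the machine is currently enrolled under
    machines: HashMap<String, String>,
}

impl Registry {
    /// Records the enrollment and reports which of the three cases applied.
    /// The caller is assumed to have verified (site_code, enrollment_key) first.
    pub fn enroll(&mut self, site_code: &str, machine_uid: &str) -> EnrollOutcome {
        // HashMap::insert returns the previous value for the key, if there was one.
        match self
            .machines
            .insert(machine_uid.to_string(), site_code.to_string())
        {
            None => EnrollOutcome::Created,
            Some(prev) if prev == site_code => EnrollOutcome::Reused,
            Some(prev) => EnrollOutcome::SiteMoved { from: prev },
        }
    }

    /// Number of distinct machines known to the registry.
    pub fn machine_count(&self) -> usize {
        self.machines.len()
    }
}

/// Renders the operator-facing fingerprint as `vN (XXXX)`.
/// Assumption: the short code is a 16-bit value shown as four uppercase hex digits.
pub fn render_fingerprint(version: u32, code: u16) -> String {
    format!("v{} ({:04X})", version, code)
}
```

Under these assumptions, a first enrollment yields `Created`, a re-install at the same site yields `Reused` without adding a row, and the same `machine_uid` arriving under another site yields `SiteMoved`, which is the point where the v1 site-move alert would fire. A real implementation would also rotate the machine's `cak_` on reuse and write the audit event, both omitted here.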