spec: add SPEC-016 zero-touch per-site agent enrollment
All checks were successful
ScreenConnect-class managed enrollment: one signed installer per site, machines self-register on first run and the server mints a per-machine cak_ key bound to a deterministic machine_uid (dedups re-installs). Per-site rotatable enrollment key (long secret + vN (XXXX) fingerprint); rotating blocks new enrollments from old installers, leaves enrolled agents untouched. Auto-approve + new-enrollment/site-move alert.

Resolves SPEC-007's signature-vs-appended-config open question: sign the base agent once in CI + per-site signed wrapper that writes site config around the signed bytes (never appended into the PE).

Deferred (room reserved): enrollment policy + per-seat licensing, --enroll-key/--site-code/--reassign flag overrides, technician-assisted interactive install.

Tracking todo dbfe6a56.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -87,10 +87,11 @@ Bringing GC to parity with GuruRMM's release engineering. Full plan: [SPEC-001](
- [x] Sessions / machines / support-codes / events
- [ ] **Full machine inventory in the connection DB** — P2 — persist per-machine device inventory (OS+locale+install, CPU/RAM, mfr/model/serial, external WAN IP captured server-side + private LAN IP + MAC, logged-on user, idle, time zone, uptime, local-admin) on `connect_machines`, refreshed each `AgentStatus`, shown in the dashboard machine detail (ScreenConnect "Guest Info" parity). Data layer for SPEC-002 Phase 2; closes GC side of agent-IP gap (todo 7459428e). **[→ v2 Phase 2]** ([SPEC-003](specs/SPEC-003-machine-inventory.md))
- [ ] **Stable machine identity + session lifecycle reaping + operator removal** — P1 — give the agent a deterministic machine-derived `machine_uid` (Windows `MachineGuid`-based) so the same box can't register duplicates (root cause: `agent_id` is a config-file random UUID that a portable/misconfigured run regenerates each launch); key registration on it; add TTL reaping + same-machine supersede as defense-in-depth; and admin-gated per-row + multi-select bulk removal of stale sessions/units. Identity must be bound to the per-machine agent key (spoof guard). Fixes ghost-session accumulation seen on the live console (15 sessions / 0 live, ~10 orphans for one machine). **[→ v2 Phase 1]** ([SPEC-004](specs/SPEC-004-session-lifecycle-and-removal.md))
- [ ] **Zero-touch per-site agent enrollment** — P1 — ScreenConnect-class managed enrollment: one signed installer per site, machines self-register on first run and the server mints a per-machine `cak_` bound to a deterministic `machine_uid` (dedups re-installs). Per-site **rotatable** enrollment key (long secret + `vN (XXXX)` fingerprint) — rotating blocks new enrollments from old installers, leaves enrolled agents untouched. Auto-approve + new-enrollment/site-move alert. **Sign base agent once (CI, shipped) + per-site signed wrapper that writes site config around the signed bytes — resolves SPEC-007's signature-vs-appended-config question.** Anticipated/deferred: enrollment policy + licensing, `--enroll-key`/`--reassign` flag overrides, technician-assisted interactive install. **[→ v2 Phase 1]** ([SPEC-016](specs/SPEC-016-zero-touch-enrollment.md))
- [ ] **Machines list view — dual connection indicators + rich rows** — P2 — ScreenConnect "Access"-list parity: per-row Host/Guest two-segment connection bar (Guest=agent online, Host=viewer connected, with names + durations) and rich inline metadata (company, site, device type, tags, logged-on user + idle, client version in red when outdated). Server-enriches `/api/machines` with live session state + SPEC-003 inventory. **[→ v2 Phase 2]** ([SPEC-005](specs/SPEC-005-machines-list-view-parity.md))
- [ ] Machines "by Company" tree nav with per-company counts — P3 — left-nav grouping sidebar (screenshot parity). Follow-up sub-item of SPEC-005.
- [ ] **Universal machine search ("everything is searchable")** — P2 — server-side `?q=` on `/api/machines` matching case-insensitive substring across ALL attributes (OS, logged-on user, external/private IP, company, site, tag, serial, MAC, version, …), pg_trgm GIN-indexed; multi-term AND + optional field-scoped syntax (`os:`, `user:`, `ip:`). Replaces the hostname-only client filter. Depends on SPEC-003 (attrs must be persisted). **[→ v2 Phase 2]** ([SPEC-006](specs/SPEC-006-universal-machine-search.md))
- [ ] **Managed-agent installer builder ("Build Installer")** — P2 — dashboard wizard to build a pre-labeled persistent-agent installer (Name/Company/Site/Department/Device Type/Tag/Type) with Download / Copy URL / Send Link, reusing the existing embed-config download path; adds department + device_type to EmbeddedConfig/AgentStatus so labels persist at install time. Pairs with revocable per-machine keys; signature-vs-appended-config is the key open question. **[→ v2 Phase 2]** ([SPEC-007](specs/SPEC-007-managed-agent-installer-builder.md))
- [ ] **Managed-agent installer builder ("Build Installer")** — P2 — dashboard wizard to build a pre-labeled persistent-agent installer (Name/Company/Site/Department/Device Type/Tag/Type) with Download / Copy URL / Send Link, reusing the existing embed-config download path; adds department + device_type to EmbeddedConfig/AgentStatus so labels persist at install time. Pairs with revocable per-machine keys; the signature-vs-appended-config question is resolved by SPEC-016 (sign-once base + per-site signed wrapper, no PE append). **[→ v2 Phase 2]** ([SPEC-007](specs/SPEC-007-managed-agent-installer-builder.md))
- [ ] **Valuable error messages (structured errors + no silent swallows)** — P2 — one structured API error envelope with stable codes + a correlation id that also lands in the logs; contextual tracing on server/agent; sweep the 37 `let _ =` swallows (the pattern that hid the migration-005 bug); dashboard surfaces the real cause + id instead of a generic line. **[→ v2 Phase 0/1 conventions]** ([SPEC-008](specs/SPEC-008-valuable-error-messages.md))
- [ ] **Feature-rich, fully-documented management API** — P2 — everything the console can do, callable by API: OpenAPI 3.x generated from code (utoipa) + browsable docs at `/api/docs`, long-lived revocable scoped API tokens (PAT-style, distinct from the 24h JWT + agent keys), an API-completeness gap audit, and consistent pagination/error conventions. Distinct from the ADR-001 RMM integration contract. **[→ v2 Phase 3]** ([SPEC-009](specs/SPEC-009-feature-rich-documented-api.md))
- [ ] **Branding and white-label configuration** — P2 — Allow MSPs to customize logo, colors, and product name for white-labeled remote support. Dashboard admin settings page with logo upload (PNG/SVG, max 2MB), brand hue slider (OKLCH 0-360°, default 184=cyan), product name override, company name, and favicon. Agent tray tooltip uses custom product name from registry. Singleton database table with public GET endpoint for unauthenticated rendering. CSS variables (`--brand-hue`, `--accent`, `--panel`) for dynamic theming. **[→ v2 Phase 2]** ([SPEC-014](specs/SPEC-014-branding-whitelabel.md))
210
docs/specs/SPEC-016-zero-touch-enrollment.md
Normal file
@@ -0,0 +1,210 @@
# SPEC-016: Zero-Touch Per-Site Agent Enrollment

**Status:** Proposed
**Priority:** P1
**Requested By:** Mike (2026-06-02)
**Estimated Effort:** X-Large

## Overview

Give GuruConnect a ScreenConnect-class managed-agent enrollment flow: a technician runs
**one signed installer per site** on every machine at that site — no per-machine key
minting, no flags, no typing — and each machine **self-registers** on first run, the
server minting it a per-machine `cak_` key bound to a stable, machine-derived
`machine_uid`. Each site installer carries a **rotatable per-site enrollment key** (a long
server-generated secret) plus a short human-readable **fingerprint** (`vN (XXXX)`) so an
operator can tell at a glance whether an installer is current. Rotating a site's key blocks
*new* enrollments from old installers while leaving already-enrolled machines untouched
(they hold their own `cak_`).

This is the missing piece that turns the v2 secure-session-core (SPEC-004 per-agent keys +
`machine_uid`) into a real product workflow, and it **resolves SPEC-007's open
signature-vs-appended-config question**: the agent binary is signed **once** in CI
(already shipped via `release.yml`), and per-site customization rides in a thin **signed
wrapper** that writes site config to the endpoint at install time — never appended into the
signed PE.

**Success criteria:**
1. A tech installs one site installer on N machines; all N appear in the console under the
   correct company/site, each as a distinct, deduplicated machine — zero per-machine setup.
2. Re-installing / re-imaging the same hardware **reuses** the existing machine row (no
   ghost duplicates — the failure mode SPEC-004 documents).
3. Rotating a site's enrollment key makes old installers unable to enroll new machines,
   while every already-enrolled agent keeps working.
4. Every distributed installer is **validly Authenticode-signed** (SmartScreen/WDAC clean).

## Background — what exists today (confirmed in code)

- **Embedded config is append-based and breaks signing.** `server/src/api/downloads.rs`
  (`download_agent`, ~`:152`) reads `static/downloads/guruconnect.exe` and **appends**
  `MAGIC_MARKER` + `len:u32` + JSON (`:196`) to the end of the PE. The agent reads it back
  in `agent/src/config.rs` (`read_embedded_config`, `:223`). Appending bytes after a signed
  PE invalidates the Authenticode signature — so the current customization path and the
  newly-shipped CI signing are mutually exclusive.
- **No self-registration exists.** Per-agent `cak_` keys are minted **admin-only** in
  `server/src/api/machine_keys.rs` (`create_key`, `:119`; "Admin issued a per-agent key",
  `:146`). There is no endpoint where an agent first-run exchanges an enrollment credential
  for its own key.
- **Relay already accepts per-agent keys.** `server/src/relay/mod.rs`
  (`validate_agent_api_key`, `:417`) calls `crate::auth::agent_keys::verify_agent_key`
  (`:422`) — the `cak_` path — then falls back to the **deprecated** shared `AGENT_API_KEY`
  (`:444`, logs a "migrate to per-agent `cak_`" warning).
- **Key primitives exist.** `server/src/auth/agent_keys.rs`: `generate_agent_key` mints a
  `cak_`-prefixed high-entropy key (`:36`/`:46`); `verify_agent_key` (`:71`).
  `server/src/db/agent_keys.rs` already inserts into `connect_agent_keys (machine_id,
  key_hash, tenant_id)` (`:47`) — the v2 tenancy column is present (migration
  `004_v2_secure_session_core.sql`).
- **Identity is a random config UUID, not machine-derived** — the root cause of duplicates
  per SPEC-004 (`agent/src/config.rs` `generate_agent_id`, `:90`).
- **Agent mode dispatch:** `agent/src/main.rs` `Commands::Install` (`:160`) → `run_install`;
  `agent/src/config.rs` `detect_run_mode` (`:162`) returns `RunMode::PermanentAgent` when
  embedded config is present.

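
The append layout described in the first bullet can be sketched as follows. This is a minimal illustration, not the code in `downloads.rs` or `config.rs`: the marker bytes, the little-endian length encoding, and the function names `append_config` and `read_trailer` are assumptions made for the example.

```rust
/// Hypothetical marker bytes; the real `MAGIC_MARKER` value lives in the server code.
const MARKER: &[u8] = b"EXAMPLE_MARKER";

/// Returns a copy of `pe` with `MARKER + len:u32 + json` appended.
/// The little-endian length is an assumption of this sketch.
fn append_config(pe: &[u8], json: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(pe.len() + MARKER.len() + 4 + json.len());
    out.extend_from_slice(pe);
    out.extend_from_slice(MARKER);
    out.extend_from_slice(&(json.len() as u32).to_le_bytes());
    out.extend_from_slice(json);
    out
}

/// Finds the last marker in `file` and returns the JSON payload after it,
/// or `None` when no well-formed trailer is present.
fn read_trailer(file: &[u8]) -> Option<&[u8]> {
    let start = file.windows(MARKER.len()).rposition(|w| w == MARKER)?;
    let len_at = start + MARKER.len();
    let b = file.get(len_at..len_at + 4)?;
    let len = u32::from_le_bytes([b[0], b[1], b[2], b[3]]) as usize;
    file.get(len_at + 4..len_at + 4 + len)
}
```

The buffer returned by `append_config` is the signed file plus extra trailing bytes, which is exactly the modification this spec identifies as invalidating the Authenticode signature.
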
## Scope

### Included in v1 (CORE)

1. **`machine_uid` — deterministic machine identity.** Derive a stable id from the Windows
   `MachineGuid` (`HKLM\SOFTWARE\Microsoft\Cryptography\MachineGuid`), independent of the
   config-file `agent_id`. (Shared root with SPEC-004; whichever lands first owns the impl,
   the other consumes it.) Used as the dedup key for register/move.

2. **Per-site enrollment key + fingerprint.**
   - Long (≥256-bit) server-generated secret per site, stored **hashed** (Argon2id, same
     as `cak_`/passwords), never recoverable in plaintext after issue.
   - A non-secret **fingerprint** = monotonic version + short derived code, rendered
     `vN (XXXX)` (e.g. `v3 (7F2A)`), shown in the dashboard, baked into the installer
     filename, and reported by the agent at enrollment.
   - **Rotate** regenerates the secret and bumps the version; old installers are rejected
     for *new* enrollments; existing agents (holding `cak_`) are unaffected.

3. **Self-registration endpoint.** New `POST /api/enroll` (public: no JWT, gated by the
   enrollment key instead) accepting `{ site_code, enrollment_key, machine_uid,
   hostname, labels{company,site,department,device_type,tags} }`:
   - Verify `(site_code, enrollment_key)` against the current per-site key.
   - **Dedup by `machine_uid`** within the site: if the machine exists, reuse the row and
     rotate its `cak_`; else create the machine row.
   - Mint a `cak_` (reuse `generate_agent_key`), store hashed via `db::agent_keys` bound to
     `machine_id` (+ `tenant_id` from the site), return the plaintext `cak_` **once**.
   - Emit an audit event + **new-enrollment alert** (and a **site-move** alert when an
     existing `machine_uid` enrolls under a different site).
   - **Rate-limit + lockout** per `(site_code, source-IP)` as defense-in-depth (the key is
     long, so this is belt-and-suspenders, not load-bearing).

4. **Agent first-run enrollment.** On `RunMode::PermanentAgent` with no stored `cak_`:
   read site config → call `/api/enroll` with `machine_uid` → persist the returned `cak_`
   to a SYSTEM-only protected store (HKLM under a SYSTEM-only ACL, or DPAPI-machine) →
   connect to `wss://connect.azcomputerguru.com/ws/agent` using the `cak_`. On subsequent
   runs, use the stored `cak_` directly (no re-enroll).

5. **Sign-once base + per-site signed wrapper (resolves SPEC-007 open question).**
   - The base agent is signed once in CI (`release.yml`, already shipped) and stays
     byte-identical for everyone.
   - Per-site customization (labels + enrollment key + fingerprint) is delivered to the
     endpoint **at install time** via a signing-safe channel — NOT appended to the signed
     PE. v1 mechanism: a small **signed wrapper/bootstrapper** (or signed MSI) that carries
     the site config, lays down the signed agent, and writes the site config to the
     protected config location. Decision to lock in during planning: wrapper-exe vs MSI
     (see Open Questions).
   - **Deprecate the append path** in `downloads.rs` for managed installs (keep only for
     attended/support-code if still needed), eliminating the signature-invalidation defect.

6. **Auto-approve posture.** A self-registered machine is live and controllable
   immediately (ScreenConnect parity). The new-enrollment alert is the tripwire.

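
The dedup rule in item 3 (new vs. reuse vs. site-move) can be sketched with an in-memory map. This is a hedged illustration only: `Registry`, `EnrollOutcome`, the integer ids, and keying on `(tenant, machine_uid)` are hypothetical, and enrollment-key verification, `cak_` minting, and alerting are deliberately left out.

```rust
use std::collections::HashMap;

/// What the enroll path decided for a given `machine_uid` (hypothetical type).
#[derive(Debug, PartialEq)]
enum EnrollOutcome {
    Created { machine_id: u64 },
    Reused { machine_id: u64 },
    Moved { machine_id: u64, from_site: String },
}

/// In-memory stand-in for the machine table.
#[derive(Default)]
struct Registry {
    // (tenant_id, machine_uid) -> (machine_id, current site_code)
    machines: HashMap<(String, String), (u64, String)>,
    next_id: u64,
}

impl Registry {
    /// Assumes the caller has already verified the enrollment key for `site`.
    fn enroll(&mut self, tenant: &str, site: &str, machine_uid: &str) -> EnrollOutcome {
        let key = (tenant.to_string(), machine_uid.to_string());
        if let Some((id, current_site)) = self.machines.get_mut(&key) {
            let machine_id = *id;
            if current_site.as_str() == site {
                // Same box, same site: reuse the row (a real server would rotate its key).
                return EnrollOutcome::Reused { machine_id };
            }
            // Same box, different site: move the row and report where it came from.
            let from_site = std::mem::replace(current_site, site.to_string());
            return EnrollOutcome::Moved { machine_id, from_site };
        }
        // Unknown machine_uid for this tenant: create a new row.
        self.next_id += 1;
        self.machines.insert(key, (self.next_id, site.to_string()));
        EnrollOutcome::Created { machine_id: self.next_id }
    }
}
```

In this sketch a `Moved` outcome is where the site-move alert would fire, and `Created` is where the new-enrollment alert would fire.
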
### Explicitly out of scope (ANTICIPATED — reserve room, do NOT build in v1)

The v1 data model and agent mode-dispatch must leave room for these without building them:

- **Per-site enrollment POLICY** — a `sites.enrollment_policy` field (default
  `auto-approve`; future `pending-approval`) plus per-seat/per-endpoint licensing controls.
  Commercial, multi-tenant (the `tenant_id` column already exists). Its own future SPEC.
- **Flag overrides** — `--enroll-key` / `--site-code` (generic installer, key supplied on
  the command line) and `--reassign` (move an existing machine to a new site, gated by
  possession of the destination site's key, with an **explicit accidental-move guard**:
  a different-site re-run refuses unless `--reassign` is passed) + cross-client move policy.
  Backend (`machine_uid` + authorized site + `cak_`) is designed to support it; CLI surface
  is deferred.
- **Technician-assisted interactive install** — `--technician` on a generic installer:
  prompts for the tech's own server credentials, and on auth presents a **validated**
  Company/Site/tags picker from the live authorized list (authz-by-identity, full audit
  trail). Heaviest path (interactive UI + auth/list callback); deferred.

All three converge on the **same backend operation** delivered in v1: `machine_uid` +
authorized site + issued `cak_`. v1 only ships the per-site-embedded-key door.

## Architecture

- **Agent** (`agent/`): compute `machine_uid`; first-run enroll → store `cak_`; use stored
  `cak_` thereafter; read site config from the wrapper-written location instead of an
  appended PE blob. Touches `config.rs` (`EmbeddedConfig`/`detect_run_mode`/storage),
  `main.rs` (`Install`/run-mode), a new `enroll` client module, transport auth.
- **Relay-server** (`server/`): new `POST /api/enroll`; per-site key issue/rotate/verify;
  `machine_uid` dedup + site-move on register; audit + alert emission; rate-limit/lockout.
  Touches `api/` (new `enroll.rs`, `sites` key endpoints), `auth/agent_keys.rs`,
  `db/agent_keys.rs`, `relay/mod.rs` (enrollment vs. connect), `main.rs` routes.
- **Dashboard**: per-site enrollment-key display (fingerprint `vN (XXXX)`), **Rotate**
  action, "current installer" download wired to the signed wrapper build. (Builder UI is
  SPEC-007; this spec supplies the key/fingerprint/rotation it consumes.)
- **DB migration:** `site_enrollment_keys` (or columns on the site): `site_id`,
  `key_hash`, `version`, `fingerprint`, `created_at`, `rotated_at`, `active`. Reserve
  `sites.enrollment_policy` (nullable, default `auto-approve`) for the anticipated policy
  work. `connect_machines` gains `machine_uid` (unique per tenant/site).
- **Protobuf** (`proto/guruconnect.proto`): no wire change required for enrollment if
  `/api/enroll` is REST; `AgentStatus` label fields per SPEC-007 (`department`,
  `device_type`) ride along if landed together.

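
The `site_enrollment_keys` row and the rotate operation might look roughly like this. The field names follow the DB migration bullet; everything else is assumed. In particular, the fingerprint is derived here from the stored hash with FNV-1a purely to keep the sketch dependency-free (a real implementation would presumably use a cryptographic hash), `fingerprint` and `rotate` are hypothetical helpers, and `created_at`/`rotated_at` are omitted.

```rust
/// One row of the proposed `site_enrollment_keys` table (timestamps omitted).
#[derive(Debug, Clone)]
struct SiteEnrollmentKey {
    site_id: u64,
    key_hash: String, // Argon2id hash in the spec; an opaque string in this sketch
    version: u32,
    fingerprint: String,
    active: bool,
}

/// FNV-1a (32-bit), used only as a stand-in for a cryptographic hash.
fn fnv1a32(data: &[u8]) -> u32 {
    let mut h: u32 = 0x811c_9dc5;
    for b in data {
        h ^= u32::from(*b);
        h = h.wrapping_mul(0x0100_0193);
    }
    h
}

/// Renders `vN (XXXX)`: the version plus four uppercase hex digits.
fn fingerprint(version: u32, key_hash: &str) -> String {
    format!("v{} ({:04X})", version, fnv1a32(key_hash.as_bytes()) & 0xFFFF)
}

/// Deactivates the current row and returns its successor with the version bumped.
fn rotate(current: &mut SiteEnrollmentKey, new_key_hash: String) -> SiteEnrollmentKey {
    current.active = false;
    let version = current.version + 1;
    SiteEnrollmentKey {
        site_id: current.site_id,
        fingerprint: fingerprint(version, &new_key_hash),
        key_hash: new_key_hash,
        version,
        active: true,
    }
}
```

Keeping the superseded row with `active = false` is one way to retain history for audit; the spec also allows storing these as columns on the site instead.
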
## Security considerations

- **Two-tier credential model:** low-sensitivity **enrollment key** (gates "may register",
  shared per site, rotatable) vs. high-sensitivity **per-machine `cak_`** (operating
  credential, per-machine revocation). Compromise of an enrollment key is recovered by
  rotating one site — no fleet-wide re-key.
- **Enrollment keys stored hashed** (Argon2id); plaintext shown once at issue/rotate.
- **`cak_` at rest on the endpoint** must be SYSTEM-only (HKLM SYSTEM ACL or DPAPI-machine)
  so a non-admin user can't read it.
- **`machine_uid` binding** is the spoof-guard SPEC-004 wants: a `cak_` is bound to a
  `machine_uid`; a different box presenting another box's `cak_` is detectable.
- **Authorization model** for moves/enrolls is possession-of-destination-key in v1
  (identity-based authz deferred to the technician-assisted path).
- **Open registration risk** is mitigated by requiring `(site_code + long key)` and
  rate-limit/lockout; auto-approve is acceptable because the enrollment key is the gate and
  every enrollment/site-move fires an alert.
- **Audit events:** enroll, re-enroll/reuse, site-move, key-rotate — all logged with
  `machine_uid`, site, and source IP.

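
The rate-limit and lockout per `(site_code, source IP)` could be sketched as below. This is an assumption-laden illustration rather than the planned implementation: `EnrollLimiter`, its thresholds, and the choice to count consecutive failures are all hypothetical, and the caller passes `now` explicitly so the behavior can be tested without sleeping.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

/// Tracks failed enrollment attempts per (site_code, source IP).
struct EnrollLimiter {
    max_failures: u32,
    lockout: Duration,
    // (site_code, source IP) -> (consecutive failures, time of last failure)
    failures: HashMap<(String, IpAddr), (u32, Instant)>,
}

impl EnrollLimiter {
    fn new(max_failures: u32, lockout: Duration) -> Self {
        EnrollLimiter { max_failures, lockout, failures: HashMap::new() }
    }

    /// True if this (site, IP) pair may attempt enrollment at `now`.
    fn allowed(&self, site: &str, ip: IpAddr, now: Instant) -> bool {
        match self.failures.get(&(site.to_string(), ip)) {
            Some((count, last)) => {
                *count < self.max_failures || now.duration_since(*last) >= self.lockout
            }
            None => true,
        }
    }

    /// Records a failed key check; an entry older than the lockout starts a fresh count.
    fn record_failure(&mut self, site: &str, ip: IpAddr, now: Instant) {
        let entry = self.failures.entry((site.to_string(), ip)).or_insert((0, now));
        if now.duration_since(entry.1) >= self.lockout {
            entry.0 = 0;
        }
        entry.0 += 1;
        entry.1 = now;
    }

    /// Clears the counter after a successful enrollment.
    fn record_success(&mut self, site: &str, ip: IpAddr) {
        self.failures.remove(&(site.to_string(), ip));
    }
}
```

A handler would call `allowed` before verifying the key, then `record_failure` or `record_success` depending on the result. Since the spec treats this as defense-in-depth, the thresholds can be generous.
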
## Testing strategy

- **Unit:** `machine_uid` derivation stability; enrollment-key verify/rotate; fingerprint
  derivation; `cak_` mint/hash/verify; dedup decision (new vs. reuse vs. move).
- **Integration:** enroll new → row + `cak_` issued; re-enroll same `machine_uid` → reuse,
  no duplicate; enroll with rotated (old) key → rejected; old `cak_` still connects after
  rotation; rate-limit/lockout trips; site-move emits alert.
- **Manual:** build a site wrapper installer → run on a clean VM → appears in console under
  correct site, immediately controllable; re-image VM → same row reused; `signtool verify
  /pa` passes on the distributed wrapper and the laid-down agent.

## Effort estimate & dependencies

- **Size:** X-Large (agent + relay + DB migration + CI build/sign wrapper + dashboard
  key/rotation surface).
- **Depends on:** SPEC-004 `machine_uid` (shared root); the CI signing already shipped
  (SPEC-001 §2 / `release.yml`).
- **Unblocks:** SPEC-007 (installer builder gets a real per-site key + the signing
  resolution), and the parked managed-agent test deployment on the internal beta machines.
- **Relationship to v2 phases:** sits with the Phase-1 secure-session-core
  (per-agent keys + identity) and feeds Phase-2 dashboard work.

## Open questions

1. **Wrapper shape:** signed standalone bootstrapper `.exe` vs. signed **MSI** for the
   per-site installer. MSI gives clean install/uninstall + GPO/Intune deploy; bootstrapper
   is lighter. Lock in during planning.
2. **`cak_` storage:** HKLM SYSTEM-ACL registry value vs. DPAPI-machine-protected file —
   pick one for the protected store.
3. **Fingerprint code style:** raw hex (`7F2A`) vs. the RMM-house word style
   (`GREEN-FALCON`). Cosmetic; pick for operator readability.
4. **Cross-tenant `machine_uid` collisions** (same hardware imaged across tenants) — scope
   `machine_uid` uniqueness per tenant, not globally.
5. **Attended (support-code) path:** confirm whether the append-based `download_support`
   path is retained as-is or also migrated off appending.