spec: SPEC-016 resolve all 5 open questions (enrollment design decisions)

Fold the 2026-06-02 interview decisions into SPEC-016:

- Installer wrapper: ship BOTH a signed .exe and a signed MSI per site
- cak_ at-rest storage: DPAPI-machine-encrypted blob in a SYSTEM-ACL'd location
- Fingerprint: hex (7F2A), deliberately unlike RMM word-codes
- machine_uid: per-tenant scope + hardware-derived salt (survives re-image, separates distinct boxes) + collision-gated activation (template-cloned VMs sharing a hardware UUID drop to pending + alert, need dashboard confirm)
- Attended support-code path: unchanged (filename-based, already signing-safe)

Open Questions section -> Resolved decisions + a short Remaining-for-planning list (exact hardware salt signal set, WiX/MSI authoring approach).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
spec: add SPEC-016 zero-touch per-site agent enrollment
2026-06-02 09:54:19 -07:00 · 2026-06-02 09:13:59 -07:00 · 2026-06-02 07:57:04 -07:00 · 2026-06-02 07:56:17 -07:00 · 2026-06-01 14:40:14 -07:00 · 2026-06-01 10:05:38 -07:00
6 changed files with 523 additions and 21 deletions
--- a/.gitea/workflows/release.yml
+++ b/.gitea/workflows/release.yml
@@ -27,6 +27,15 @@ on:
  # computes the next semver from conventional commits at dispatch time.
  # build-and-test.yml remains the automatic PR/push CI gate.
  workflow_dispatch:
+    inputs:
+      channel:
+        description: 'Release channel (stable = full versioned release; beta = signed prerelease test build, no version bump/changelog)'
+        required: true
+        default: 'stable'
+        type: choice
+        options:
+          - stable
+          - beta

 jobs:
  # ---------------------------------------------------------------------------
@@ -36,8 +45,11 @@ jobs:
    name: Version + Changelog
    runs-on: ubuntu-latest
    outputs:
-      version: ${{ steps.bump.outputs.version }}
-      released: ${{ steps.bump.outputs.released }}
+      # Coalesce across the stable (bump) and beta (beta) paths: exactly one of them runs per
+      # dispatch, so the first non-empty value wins. prerelease is 'true' only on the beta path.
+      version: ${{ steps.bump.outputs.version || steps.beta.outputs.version }}
+      released: ${{ steps.bump.outputs.released || steps.beta.outputs.released }}
+      prerelease: ${{ steps.beta.outputs.prerelease || 'false' }}
    steps:
      - name: Checkout (full history + tags)
        uses: actions/checkout@v4
@@ -59,7 +71,8 @@ jobs:
          fi

      - name: Install git-cliff
-        if: steps.guard.outputs.skip != 'true'
+        # Stable-only: beta produces no changelog, so git-cliff is unnecessary on the beta path.
+        if: steps.guard.outputs.skip != 'true' && github.event.inputs.channel == 'stable'
        run: |
          set -euo pipefail
          CLIFF_VERSION="2.6.1"
@@ -72,12 +85,16 @@ jobs:

      - name: Determine next version and bump components
        id: bump
-        if: steps.guard.outputs.skip != 'true'
+        # Stable-only: the beta path (id: beta) handles versioning without a manifest bump/commit.
+        if: steps.guard.outputs.skip != 'true' && github.event.inputs.channel == 'stable'
        run: |
          set -euo pipefail

          # ----- locate the last release tag (vX.Y.Z) -----
-          LAST_TAG="$(git tag --list 'v*' --sort=-v:refname | head -n1 || true)"
+          # Match ONLY strict final-release tags (vMAJOR.MINOR.PATCH). Beta tags look like
+          # v0.3.0-beta.7; if one of those were picked up here it would corrupt the next stable
+          # base version, so prerelease tags are explicitly excluded from this lookup.
+          LAST_TAG="$(git tag --list 'v*' --sort=-v:refname | grep -E '^v[0-9]+\.[0-9]+\.[0-9]+$' | head -n1 || true)"
          if [ -z "${LAST_TAG}" ]; then
            echo "[INFO] No prior release tag found; baseline is current manifest version."
            BASE_VERSION="$(grep -m1 '^version' agent/Cargo.toml | sed -E 's/.*"([0-9]+\.[0-9]+\.[0-9]+)".*/\1/')"
@@ -186,8 +203,39 @@ jobs:
            sed -i -E "0,/^version = \"[0-9]+\.[0-9]+\.[0-9]+\"/s//version = \"${NEXT}\"/" Cargo.toml || true
          fi

+      - name: Beta channel - tag prerelease build (no bump, no commit, no changelog)
+        id: beta
+        # Beta-only path. Reuses the IDENTICAL downstream build + sign + publish jobs, but does
+        # NOT compute a semver bump, mutate any manifest, generate a changelog, or make a release
+        # commit. It just tags the CURRENT HEAD with a unique prerelease version so the Windows
+        # build job can check out `ref: v${VER}` exactly as it does for stable.
+        if: github.event.inputs.channel == 'beta' && steps.guard.outputs.skip != 'true'
+        run: |
+          set -euo pipefail
+
+          # Base version is read straight from the agent manifest — NOT bumped, NOT written back.
+          BASE="$(grep -m1 '^version' agent/Cargo.toml | sed -E 's/.*"([0-9]+\.[0-9]+\.[0-9]+)".*/\1/')"
+          # GITHUB_RUN_NUMBER guarantees a unique prerelease suffix without counting existing tags.
+          VER="${BASE}-beta.${GITHUB_RUN_NUMBER}"
+          echo "[INFO] Beta build version: ${VER} (base ${BASE}, run ${GITHUB_RUN_NUMBER})"
+
+          # Tag the current HEAD (no release commit). Push the tag so build-agent-windows can
+          # check out ref: v${VER}.
+          git config user.name "guruconnect-ci"
+          git config user.email "ci@azcomputerguru.com"
+          # Beta tags are disposable test markers; force makes re-running a failed beta dispatch idempotent (re-run reuses GITHUB_RUN_NUMBER, so the tag already exists).
+          git tag -f "v${VER}"
+          REMOTE="https://${{ secrets.CI_PUSH_TOKEN }}@git.azcomputerguru.com/${GITHUB_REPOSITORY}.git"
+          git push --force "${REMOTE}" "v${VER}"
+          echo "[OK] Pushed beta prerelease tag v${VER}"
+
+          echo "version=${VER}" >> "$GITHUB_OUTPUT"
+          echo "released=true" >> "$GITHUB_OUTPUT"
+          echo "prerelease=true" >> "$GITHUB_OUTPUT"
+
      - name: Generate changelog (git-cliff)
-        if: steps.guard.outputs.skip != 'true' && steps.bump.outputs.released == 'true'
+        # Stable-only: beta produces no changelog artifact.
+        if: steps.guard.outputs.skip != 'true' && steps.bump.outputs.released == 'true' && github.event.inputs.channel == 'stable'
        env:
          VERSION: ${{ steps.bump.outputs.version }}
        run: |
@@ -232,7 +280,10 @@ jobs:

          # Re-derive the set of changed components (same logic as the bump step). On the first
          # release (no prior tag) all components are considered changed.
-          LAST_TAG="$(git tag --list 'v*' --sort=-v:refname | head -n1 || true)"
+          # Match ONLY strict final-release tags (vMAJOR.MINOR.PATCH); exclude beta prerelease
+          # tags (v0.3.0-beta.7) so the changelog diff range is taken against the last real
+          # release, not an intervening beta build.
+          LAST_TAG="$(git tag --list 'v*' --sort=-v:refname | grep -E '^v[0-9]+\.[0-9]+\.[0-9]+$' | head -n1 || true)"
          if [ -z "${LAST_TAG}" ]; then
            CHANGED_FILES="$(git ls-files)"
            FIRST_RELEASE=true
@@ -252,7 +303,8 @@ jobs:
          fi

      - name: Commit release + create tag
-        if: steps.guard.outputs.skip != 'true' && steps.bump.outputs.released == 'true'
+        # Stable-only: beta tags HEAD directly in the beta step and never makes a release commit.
+        if: steps.guard.outputs.skip != 'true' && steps.bump.outputs.released == 'true' && github.event.inputs.channel == 'stable'
        env:
          VERSION: ${{ steps.bump.outputs.version }}
        run: |
@@ -276,7 +328,8 @@ jobs:
          echo "[OK] Pushed release commit and tag v${VERSION}"

      - name: Upload changelog artifact
-        if: steps.guard.outputs.skip != 'true' && steps.bump.outputs.released == 'true'
+        # Stable-only: there is no changelog on the beta path, so nothing to upload.
+        if: steps.guard.outputs.skip != 'true' && steps.bump.outputs.released == 'true' && github.event.inputs.channel == 'stable'
        uses: actions/upload-artifact@v3
        with:
          name: changelog
@@ -445,6 +498,9 @@ jobs:
          echo "sha256=${SUM}" >> "$GITHUB_OUTPUT"

      - name: Download changelog artifact
+        # Stable-only: the beta path uploads no `changelog` artifact. The release-creation step
+        # already guards on `[ -f changelog-artifact/CHANGELOG.md ]`, so skipping this is safe.
+        if: github.event.inputs.channel == 'stable'
        uses: actions/download-artifact@v3
        with:
          name: changelog
@@ -472,17 +528,26 @@ jobs:
        env:
          VERSION: ${{ needs.version.outputs.version }}
          SHA256: ${{ steps.sha.outputs.sha256 }}
+          # PRERELEASE is 'true' on the beta path, 'false' on stable; drives the Gitea release flag.
+          PRERELEASE: ${{ needs.version.outputs.prerelease }}
          GITEA_TOKEN: ${{ secrets.CI_PUSH_TOKEN }}
        run: |
          set -euo pipefail
          API_BASE="https://git.azcomputerguru.com/api/v1/repos/${GITHUB_REPOSITORY}"
          TAG="v${VERSION}"
-          echo "[INFO] Creating Gitea release ${TAG} on ${GITHUB_REPOSITORY}"
+          echo "[INFO] Creating Gitea release ${TAG} on ${GITHUB_REPOSITORY} (prerelease=${PRERELEASE})"

-          BODY="$(printf 'GuruConnect %s\n\nSHA-256 (guruconnect.exe): %s\n\nSee CHANGELOG.md and /api/changelog for details.' "${TAG}" "${SHA256}")"
+          # Beta builds get a clear "prerelease test build" note in the body; the -beta.N suffix
+          # is already carried in TAG, so the release name "Release v..." needs no extra handling.
+          if [ "${PRERELEASE}" = "true" ]; then
+            BODY="$(printf 'GuruConnect %s (PRERELEASE / beta test build)\n\nSHA-256 (guruconnect.exe): %s\n\nSigned via Azure Trusted Signing. Not a stable release — no changelog/version bump.' "${TAG}" "${SHA256}")"
+          else
+            BODY="$(printf 'GuruConnect %s\n\nSHA-256 (guruconnect.exe): %s\n\nSee CHANGELOG.md and /api/changelog for details.' "${TAG}" "${SHA256}")"
+          fi

          # Build the JSON payload with python (handles escaping of the multi-line body safely).
-          CREATE_PAYLOAD="$(TAG="$TAG" BODY="$BODY" python3 -c 'import json,os; print(json.dumps({"tag_name": os.environ["TAG"], "name": "Release " + os.environ["TAG"], "body": os.environ["BODY"], "draft": False, "prerelease": False}))')"
+          # prerelease is derived from the PRERELEASE env var (beta -> true, stable -> false).
+          CREATE_PAYLOAD="$(TAG="$TAG" BODY="$BODY" PRERELEASE="$PRERELEASE" python3 -c 'import json,os; print(json.dumps({"tag_name": os.environ["TAG"], "name": "Release " + os.environ["TAG"], "body": os.environ["BODY"], "draft": False, "prerelease": os.environ.get("PRERELEASE","false") == "true"}))')"

          RELEASE_JSON="$(curl -fsS -X POST \
            "${API_BASE}/releases" \
--- a/docs/FEATURE_ROADMAP.md
+++ b/docs/FEATURE_ROADMAP.md
@@ -16,11 +16,16 @@ stack. It ships independently of GuruRMM and integrates with it via a versioned
 > match, blacklist-on-WS, agent-plane rejects user JWTs via per-agent `cak_` keys). The feature specs below
 > (SPEC-003–009) are **work-items inside the later v2 phases** — see the mapping.
 >
-> **Remaining to formally exit Phase 1:** secure-session-core **Task 8** (end-to-end verification +
-> `/gc-audit --pass=security` re-audit + the manual CRITICAL checks) and Code-Review sign-off on Tasks 3–5
-> (implemented without a local toolchain at the time; since built + deployed). Live HW-H.264 validation is
-> also pending — raw+Zstd remains the shipping default. ~~Sprint 0 (relay-auth CRITICAL hotfix)~~ **not
-> needed — those fixes shipped in Tasks 2–3.**
+> **Phase 1 formally EXITED (2026-05-31).** secure-session-core **Task 8** is complete — end-to-end
+> functional verification (live CRITICAL boundary checks against the deployed binary: login-JWT→401,
+> wrong-session viewer token→403, JWT-as-agent-key→401) **plus the `/gc-audit --pass=security` re-audit:
+> PASS, 0 CRITICAL/HIGH/MEDIUM/LOW** ([report](../reports/2026-05-31-gc-audit.md)). Code-Review sign-off on
+> Tasks 3–5 landed earlier. On top of Phase 1, **SPEC-004 (Tasks 2/4/5 — machine_uid dedup, session
+> reaping, operator removal API+UI) is implemented, reviewed, deployed, and the 11 live ghost rows were
+> purged**; the agent is now **auto-versioned + Azure-Trusted-Signing-signed via `release.yml`** with
+> **v0.3.0 published** as the stable release. ~~Sprint 0 (relay-auth CRITICAL hotfix)~~ **not needed.**
+> Still pending (NOT a Phase-1 blocker): live HW-H.264 cross-GPU validation — **raw+Zstd remains the
+> shipping default** (`DEFAULT_PREFER_H264=false`) until H.264 is validated across GPUs.

 ### v2 phase mapping of current specs

@@ -43,8 +48,9 @@ stack. It ships independently of GuruRMM and integrates with it via a versioned

 Bringing GC to parity with GuruRMM's release engineering. Full plan: [SPEC-001](specs/SPEC-001-operational-tooling-parity.md).

- [ ] **Code signing — Azure Trusted Signing in CI** — P1 — sign the Windows agent `.exe` via `jsign` (TRUSTEDSIGNING) in Gitea Actions, reusing the shared ACG cert profile. (SPEC-001 §2)
- [ ] **Automatic versioning** — P1 — conventional-commit-driven version bump across agent/server/dashboard, embedded via `build.rs`. (SPEC-001 §3)
+- [x] **Code signing — Azure Trusted Signing in CI** — P1 — Windows agent `.exe` signed via `jsign` (TRUSTEDSIGNING) in `release.yml`, fail-closed (never publishes unsigned). Shipped with v0.3.0. (SPEC-001 §2)
+- [ ] **Signed beta/test release channel** — **P1 — NOW** — every binary we hand to a tester must be signed, but signing today only runs on a deliberate full `release.yml` dispatch; the automatic `build-and-test.yml` agent artifact is explicitly **unsigned**. Add a `channel: stable | beta` `workflow_dispatch` input to `release.yml`: `beta` signs the agent and publishes a prerelease-tagged Gitea release (tag `v<manifest version>-beta.<run number>`, e.g. `v0.3.0-beta.7`) **skipping the semver bump + changelog**; `stable` keeps the existing full path. Keeps signing secrets out of PR-triggered runs. (SPEC-001 §2)
+- [x] **Automatic versioning** — P1 — conventional-commit-driven version bump computed at dispatch in `release.yml`, embedded via `build.rs`. Shipped with v0.3.0. (SPEC-001 §3)
 - [ ] **Changelog generation & API** — P2 — `CHANGELOG.md` + per-version changelogs from conventional commits, served at `/api/changelog/...`. (SPEC-001 §4)
 - [ ] **Feature-request workflow** — P2 — `/gc-feature-request` skill producing `docs/specs/SPEC-NNN-*.md` and updating this roadmap. (SPEC-001 §1)
 - [ ] **Roadmap / ADR / spec tracking** — P1 — this file + `ARCHITECTURE_DECISIONS.md` + `docs/specs/`. (SPEC-001 §5) — *bootstrapped*
@@ -81,10 +87,11 @@ Bringing GC to parity with GuruRMM's release engineering. Full plan: [SPEC-001](
 - [x] Sessions / machines / support-codes / events
 - [ ] **Full machine inventory in the connection DB** — P2 — persist per-machine device inventory (OS+locale+install, CPU/RAM, mfr/model/serial, external WAN IP captured server-side + private LAN IP + MAC, logged-on user, idle, time zone, uptime, local-admin) on `connect_machines`, refreshed each `AgentStatus`, shown in the dashboard machine detail (ScreenConnect "Guest Info" parity). Data layer for SPEC-002 Phase 2; closes GC side of agent-IP gap (todo 7459428e). **[→ v2 Phase 2]** ([SPEC-003](specs/SPEC-003-machine-inventory.md))
 - [ ] **Stable machine identity + session lifecycle reaping + operator removal** — P1 — give the agent a deterministic machine-derived `machine_uid` (Windows `MachineGuid`-based) so the same box can't register duplicates (root cause: `agent_id` is a config-file random UUID that a portable/misconfigured run regenerates each launch); key registration on it; add TTL reaping + same-machine supersede as defense-in-depth; and admin-gated per-row + multi-select bulk removal of stale sessions/units. Identity must be bound to the per-machine agent key (spoof guard). Fixes ghost-session accumulation seen on the live console (15 sessions / 0 live, ~10 orphans for one machine). **[→ v2 Phase 1]** ([SPEC-004](specs/SPEC-004-session-lifecycle-and-removal.md))
+- [ ] **Zero-touch per-site agent enrollment** — P1 — ScreenConnect-class managed enrollment: one signed installer per site, machines self-register on first run and the server mints a per-machine `cak_` bound to a deterministic `machine_uid` (dedups re-installs). Per-site **rotatable** enrollment key (long secret + `vN (XXXX)` fingerprint) — rotating blocks new enrollments from old installers, leaves enrolled agents untouched. Auto-approve + new-enrollment/site-move alert. **Sign base agent once (CI, shipped) + per-site signed wrapper that writes site config to the endpoint at install time, never into the signed bytes — resolves SPEC-007's signature-vs-appended-config question.** Anticipated/deferred: enrollment policy + licensing, `--enroll-key`/`--reassign` flag overrides, technician-assisted interactive install. **[→ v2 Phase 1]** ([SPEC-016](specs/SPEC-016-zero-touch-enrollment.md))
 - [ ] **Machines list view — dual connection indicators + rich rows** — P2 — ScreenConnect "Access"-list parity: per-row Host/Guest two-segment connection bar (Guest=agent online, Host=viewer connected, with names + durations) and rich inline metadata (company, site, device type, tags, logged-on user + idle, client version in red when outdated). Server-enriches `/api/machines` with live session state + SPEC-003 inventory. **[→ v2 Phase 2]** ([SPEC-005](specs/SPEC-005-machines-list-view-parity.md))
 - [ ] Machines "by Company" tree nav with per-company counts — P3 — left-nav grouping sidebar (screenshot parity). Follow-up sub-item of SPEC-005.
 - [ ] **Universal machine search ("everything is searchable")** — P2 — server-side `?q=` on `/api/machines` matching case-insensitive substring across ALL attributes (OS, logged-on user, external/private IP, company, site, tag, serial, MAC, version, …), pg_trgm GIN-indexed; multi-term AND + optional field-scoped syntax (`os:`, `user:`, `ip:`). Replaces the hostname-only client filter. Depends on SPEC-003 (attrs must be persisted). **[→ v2 Phase 2]** ([SPEC-006](specs/SPEC-006-universal-machine-search.md))
- [ ] **Managed-agent installer builder ("Build Installer")** — P2 — dashboard wizard to build a pre-labeled persistent-agent installer (Name/Company/Site/Department/Device Type/Tag/Type) with Download / Copy URL / Send Link, reusing the existing embed-config download path; adds department + device_type to EmbeddedConfig/AgentStatus so labels persist at install time. Pairs with revocable per-machine keys; signature-vs-appended-config is the key open question. **[→ v2 Phase 2]** ([SPEC-007](specs/SPEC-007-managed-agent-installer-builder.md))
+- [ ] **Managed-agent installer builder ("Build Installer")** — P2 — dashboard wizard to build a pre-labeled persistent-agent installer (Name/Company/Site/Department/Device Type/Tag/Type) with Download / Copy URL / Send Link, reusing the existing embed-config download path; adds department + device_type to EmbeddedConfig/AgentStatus so labels persist at install time. Pairs with revocable per-machine keys; the signature-vs-appended-config question is resolved by SPEC-016 (sign-once base + per-site signed wrapper, no PE append). **[→ v2 Phase 2]** ([SPEC-007](specs/SPEC-007-managed-agent-installer-builder.md))
 - [ ] **Valuable error messages (structured errors + no silent swallows)** — P2 — one structured API error envelope with stable codes + a correlation id that also lands in the logs; contextual tracing on server/agent; sweep the 37 `let _ =` swallows (the pattern that hid the migration-005 bug); dashboard surfaces the real cause + id instead of a generic line. **[→ v2 Phase 0/1 conventions]** ([SPEC-008](specs/SPEC-008-valuable-error-messages.md))
 - [ ] **Feature-rich, fully-documented management API** — P2 — everything the console can do, callable by API: OpenAPI 3.x generated from code (utoipa) + browsable docs at `/api/docs`, long-lived revocable scoped API tokens (PAT-style, distinct from the 24h JWT + agent keys), an API-completeness gap audit, and consistent pagination/error conventions. Distinct from the ADR-001 RMM integration contract. **[→ v2 Phase 3]** ([SPEC-009](specs/SPEC-009-feature-rich-documented-api.md))
 - [ ] **Branding and white-label configuration** — P2 — Allow MSPs to customize logo, colors, and product name for white-labeled remote support. Dashboard admin settings page with logo upload (PNG/SVG, max 2MB), brand hue slider (OKLCH 0-360°, default 184=cyan), product name override, company name, and favicon. Agent tray tooltip uses custom product name from registry. Singleton database table with public GET endpoint for unauthenticated rendering. CSS variables (`--brand-hue`, `--accent`, `--panel`) for dynamic theming. **[→ v2 Phase 2]** ([SPEC-014](specs/SPEC-014-branding-whitelabel.md))
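The beta-channel rules in the `release.yml` diff above (strict-tag lookup that ignores prereleases, an unbumped `BASE-beta.RUN` version, and the `prerelease` flag in the Gitea payload) can be re-expressed as a small Python sketch for reasoning about edge cases. It is not part of the workflow: the function names are hypothetical, and the numeric tuple comparison stands in for `git tag --sort=-v:refname`.

```python
import json
import re

# Same pattern as the workflow's grep -E '^v[0-9]+\.[0-9]+\.[0-9]+$' (fullmatch = anchored).
STRICT_RELEASE = re.compile(r"v[0-9]+\.[0-9]+\.[0-9]+")


def last_release_tag(tags):
    """Newest strict vX.Y.Z tag; prerelease tags such as v0.3.0-beta.7 never qualify."""
    finals = [t for t in tags if STRICT_RELEASE.fullmatch(t)]
    if not finals:
        return None
    # Numeric comparison, standing in for `git tag --sort=-v:refname | head -n1`.
    return max(finals, key=lambda t: tuple(int(p) for p in t[1:].split(".")))


def beta_version(cargo_toml, run_number):
    """BASE-beta.RUN, where BASE is read (never bumped) from the manifest."""
    # Like `grep -m1 '^version'`: only the FIRST line starting with "version" counts.
    line = next((ln for ln in cargo_toml.splitlines() if ln.startswith("version")), "")
    m = re.search(r'"([0-9]+\.[0-9]+\.[0-9]+)"', line)
    if m is None:
        raise ValueError("first version line has no quoted X.Y.Z")
    return f"{m.group(1)}-beta.{run_number}"


def release_payload(tag, body, prerelease_env="false"):
    """Same fields as the workflow's python3 -c one-liner; only 'true' marks a prerelease."""
    return json.dumps({
        "tag_name": tag,
        "name": "Release " + tag,
        "body": body,
        "draft": False,
        "prerelease": prerelease_env == "true",
    })
```

One consequence worth keeping in mind: under SemVer precedence a prerelease sorts below its own release, so `0.3.0-beta.42` ranks lower than `0.3.0` even when it was built afterwards.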
--- a/docs/specs/SPEC-016-zero-touch-enrollment.md
+++ b/docs/specs/SPEC-016-zero-touch-enrollment.md
@@ -0,0 +1,244 @@
+# SPEC-016: Zero-Touch Per-Site Agent Enrollment
+
+**Status:** Proposed
+**Priority:** P1
+**Requested By:** Mike (2026-06-02)
+**Estimated Effort:** X-Large
+
+## Overview
+
+Give GuruConnect a ScreenConnect-class managed-agent enrollment flow: a technician runs
+**one signed installer per site** on every machine at that site — no per-machine key
+minting, no flags, no typing — and each machine **self-registers** on first run, the
+server minting it a per-machine `cak_` key bound to a stable, machine-derived
+`machine_uid`. Each site installer carries a **rotatable per-site enrollment key** (a long
+server-generated secret) plus a short human-readable **fingerprint** (`vN (XXXX)`) so an
+operator can tell at a glance whether an installer is current. Rotating a site's key blocks
+*new* enrollments from old installers while leaving already-enrolled machines untouched
+(they hold their own `cak_`).
+
+This is the missing piece that turns the v2 secure-session-core (SPEC-004 per-agent keys +
+`machine_uid`) into a real product workflow, and it **resolves SPEC-007's open
+signature-vs-appended-config question**: the agent binary is signed **once** in CI
+(already shipped via `release.yml`), and per-site customization rides in a thin **signed
+wrapper** that writes site config to the endpoint at install time — never appended into the
+signed PE.
+
+**Success criteria:**
+1. A tech installs one site installer on N machines; all N appear in the console under the
+   correct company/site, each as a distinct, deduplicated machine — zero per-machine setup.
+2. Re-installing / re-imaging the same hardware **reuses** the existing machine row (no
+   ghost duplicates — the failure mode SPEC-004 documents).
+3. Rotating a site's enrollment key makes old installers unable to enroll new machines,
+   while every already-enrolled agent keeps working.
+4. Every distributed installer is **validly Authenticode-signed** (SmartScreen/WDAC clean).
+
+## Background — what exists today (confirmed in code)
+
+- **Embedded config is append-based and breaks signing.** `server/src/api/downloads.rs`
+  (`download_agent`, ~`:152`) reads `static/downloads/guruconnect.exe` and **appends**
+  `MAGIC_MARKER` + `len:u32` + JSON (`:196`) to the end of the PE. The agent reads it back
+  in `agent/src/config.rs` (`read_embedded_config`, `:223`). Appending bytes after a signed
+  PE invalidates the Authenticode signature — so the current customization path and the
+  newly-shipped CI signing are mutually exclusive.
+- **No self-registration exists.** Per-agent `cak_` keys are minted **admin-only** in
+  `server/src/api/machine_keys.rs` (`create_key`, `:119`; "Admin issued a per-agent key",
+  `:146`). There is no endpoint where an agent first-run exchanges an enrollment credential
+  for its own key.
+- **Relay already accepts per-agent keys.** `server/src/relay/mod.rs`
+  (`validate_agent_api_key`, `:417`) calls `crate::auth::agent_keys::verify_agent_key`
+  (`:422`) — the `cak_` path — then falls back to the **deprecated** shared `AGENT_API_KEY`
+  (`:444`, logs a "migrate to per-agent `cak_`" warning).
+- **Key primitives exist.** `server/src/auth/agent_keys.rs`: `generate_agent_key` mints a
+  `cak_`-prefixed high-entropy key (`:36`/`:46`); `verify_agent_key` (`:71`).
+  `server/src/db/agent_keys.rs` already inserts into `connect_agent_keys (machine_id,
+  key_hash, tenant_id)` (`:47`) — the v2 tenancy column is present (migration
+  `004_v2_secure_session_core.sql`).
+- **Identity is a random config UUID, not machine-derived** — the root cause of duplicates
+  per SPEC-004 (`agent/src/config.rs` `generate_agent_id`, `:90`).
+- **Agent mode dispatch:** `agent/src/main.rs` `Commands::Install` (`:160`) → `run_install`;
+  `agent/src/config.rs` `detect_run_mode` (`:162`) returns `RunMode::PermanentAgent` when
+  embedded config is present.
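For reference, the current append-based layout described above (`MAGIC_MARKER` + `len:u32` + JSON after the PE) looks roughly like this as a Python sketch. The marker value and the little-endian length are assumptions (the real constant lives in `downloads.rs` / `config.rs`), and the function names only echo `read_embedded_config`; this is not the Rust implementation.

```python
import json
import struct

# Hypothetical marker value; the real MAGIC_MARKER is defined in the Rust sources.
MAGIC_MARKER = b"GC-EMBED-CONFIG\x00"


def append_config(exe_bytes: bytes, config: dict) -> bytes:
    """Server side: PE bytes + MAGIC_MARKER + len:u32 + JSON (layout per Background)."""
    payload = json.dumps(config).encode("utf-8")
    # Little-endian u32 is an assumption; the spec only says `len:u32`.
    return exe_bytes + MAGIC_MARKER + struct.pack("<I", len(payload)) + payload


def read_embedded_config(exe_bytes: bytes):
    """Agent side: find the last marker, then decode the length-prefixed JSON after it."""
    idx = exe_bytes.rfind(MAGIC_MARKER)
    if idx < 0:
        return None  # no embedded config: not a PermanentAgent build
    start = idx + len(MAGIC_MARKER)
    if start + 4 > len(exe_bytes):
        return None
    (length,) = struct.unpack_from("<I", exe_bytes, start)
    body = exe_bytes[start + 4 : start + 4 + length]
    if len(body) != length:
        return None  # truncated download
    return json.loads(body)
```

The served file is the signed PE with extra trailing bytes rather than the file that was signed, which is the signature-invalidation defect the per-site wrapper design removes.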
+
+## Scope
+
+### Included in v1 (CORE)
+
+1. **`machine_uid` — deterministic machine identity (hardware-salted, per-tenant).** Derive
+   a stable id from the Windows `MachineGuid`
+   (`HKLM\SOFTWARE\Microsoft\Cryptography\MachineGuid`) **salted with stable hardware
+   signals** (SMBIOS UUID / motherboard + disk serial), independent of the config-file
+   `agent_id`. Hardware-derived salt is deliberate: it **survives an OS reinstall/re-image
+   on the same hardware** (so the row is reused — the re-image dedup goal) while keeping
+   distinct physical boxes distinct (a per-install *random* salt would break re-image dedup
+   and is rejected). Uniqueness is scoped **per-tenant** — dedup key `(tenant_id,
+   machine_uid)` — so the same hardware legitimately present in two tenants stays two
+   independent rows. (Shared root with SPEC-004; whichever lands first owns the impl, the
+   other consumes it.) Used as the dedup key for register/move.
+
+   **Collision-gated activation.** The residual collision case is VMs/templates that share a
+   hardware UUID (some hypervisors clone the SMBIOS UUID). When the server detects a
+   `machine_uid` collision (a seemingly-different endpoint resolving to an existing uid), the
+   endpoint does **not** auto-activate: it drops to a **pending** state, fires an alert, and
+   an operator must confirm in the dashboard that the collided endpoint may activate. This is
+   the one deliberate exception to auto-approve (see item 6).
+
+2. **Per-site enrollment key + fingerprint.**
+   - Long (≥256-bit) server-generated secret per site, stored **hashed** (Argon2id, same
+     as `cak_`/passwords), never recoverable in plaintext after issue.
+   - A non-secret **fingerprint** = monotonic version + short derived code in **hex**,
+     rendered `vN (XXXX)` (e.g. `v3 (7F2A)`), shown in the dashboard, baked into the
+     installer filename, and reported by the agent at enrollment. Hex is deliberate —
+     **not** the RMM word-style code (`GREEN-FALCON`) — so GuruConnect and GuruRMM
+     artifacts are never visually conflated.
+   - **Rotate** regenerates the secret and bumps the version; old installers are rejected
+     for *new* enrollments; existing agents (holding `cak_`) are unaffected.
+
+3. **Self-registration endpoint.** New `POST /api/enroll` (public, unauthenticated by JWT —
+   gated by the enrollment key) accepting `{ site_code, enrollment_key, machine_uid,
+   hostname, labels{company,site,department,device_type,tags} }`:
+   - Verify `(site_code, enrollment_key)` against the current per-site key.
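+   - **Dedup by `(tenant_id, machine_uid)`** (per-tenant, as in item 1): if the machine
+     exists anywhere in the tenant, reuse the row and rotate its `cak_`; else create the
+     machine row. Tenant scope, not site scope, is what lets the site-move case below fire.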
+   - Mint a `cak_` (reuse `generate_agent_key`), store hashed via `db::agent_keys` bound to
+     `machine_id` (+ `tenant_id` from the site), return the plaintext `cak_` **once**.
+   - Emit an audit event + **new-enrollment alert** (and a **site-move** alert when an
+     existing `machine_uid` enrolls under a different site).
+   - **Rate-limit + lockout** per `(site_code, source-IP)` as defense-in-depth (the key is
+     long, so this is belt-and-suspenders, not load-bearing).
+
+4. **Agent first-run enrollment.** On `RunMode::PermanentAgent` with no stored `cak_`:
+   read site config → call `/api/enroll` with `machine_uid` → persist the returned `cak_`
+   as a DPAPI-machine-encrypted blob in a SYSTEM-only-ACL'd location (HKLM value or a
+   `ProgramData` file; both layers, as in Security considerations) → connect to
+   `wss://connect.azcomputerguru.com/ws/agent` using the `cak_`. On subsequent
+   runs, use the stored `cak_` directly (no re-enroll).
+
+5. **Sign-once base + per-site signed wrapper (resolves SPEC-007 open question).**
+   - The base agent is signed once in CI (`release.yml`, already shipped) and stays
+     byte-identical for everyone.
+   - Per-site customization (labels + enrollment key + fingerprint) is delivered to the
+     endpoint **at install time** via a signing-safe channel — NOT appended to the signed
+     PE. **v1 produces BOTH a signed bootstrapper `.exe` and a signed MSI per site**
+     (ScreenConnect parity — manual installs grab the `.exe`, GPO/Intune fleet pushes take
+     the MSI), both wrapping the same sign-once agent and writing the site config to the
+     protected config location. The two differ only in packaging (bootstrapper `.exe` stub
+     vs. WiX-authored MSI package); both are signed.
+   - **Deprecate the append path** in `downloads.rs` for managed installs (keep only for
+     attended/support-code if still needed), eliminating the signature-invalidation defect.
+
+6. **Auto-approve posture (with collision-gate exception).** A self-registered machine is
+   live and controllable immediately (ScreenConnect parity); the new-enrollment alert is the
+   tripwire. The **one** exception is a detected `machine_uid` collision (item 1), which
+   gates the endpoint to **pending** until an operator confirms it in the dashboard.
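Items 1 and 2 leave the exact derivations open. A minimal sketch, assuming SHA-256 over canonicalized inputs for `machine_uid` and the first four hex digits of SHA-256 of the secret for the fingerprint code, might look like the following. The signal names, the domain-separation tag, and both derivations are hypothetical placeholders pending the hardware-salt planning item. Note also that a clean Windows install normally writes a fresh `MachineGuid`, so whether a MachineGuid-inclusive hash meets the re-image goal depends on that signal-set decision.

```python
import hashlib
import secrets


def machine_uid(machine_guid: str, hw_signals: dict) -> str:
    """Deterministic id from MachineGuid plus named hardware signals (hypothetical set)."""
    h = hashlib.sha256()
    h.update(b"guruconnect/machine_uid/v1\x00")  # hypothetical domain-separation tag
    h.update(machine_guid.strip().lower().encode())
    # Sorted names make the result independent of the order signals were collected in.
    for name in sorted(hw_signals):
        value = hw_signals[name].strip().upper()
        h.update(b"\x00" + name.encode() + b"=" + value.encode())
    return h.hexdigest()


def new_enrollment_secret() -> str:
    """256-bit random secret (32 bytes), URL-safe base64 encoded."""
    return secrets.token_urlsafe(32)


def fingerprint(version: int, secret: str) -> str:
    """Render `vN (XXXX)`: version plus the first 4 uppercase hex digits of SHA-256(secret)."""
    code = hashlib.sha256(secret.encode()).hexdigest()[:4].upper()
    return f"v{version} ({code})"
```

Exposing four hex digits (16 bits) of a hash of a 256-bit secret does not meaningfully weaken it, which is what makes a secret-derived code safe to print in filenames and the dashboard.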
+
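The `/api/enroll` flow in items 3 and 6 (key check, per-tenant dedup, `cak_` rotation on re-enroll, site-move alert, collision gate) can be sketched as an in-memory Python model. This is an illustration under assumptions, not the server design: SHA-256 stands in for Argon2id, `"cak_" + token_urlsafe` stands in for `generate_agent_key`, `collided` is the output of a hypothetical detector the spec has not yet defined, and holding collided endpoints in a `pending` list with no `cak_` issued is one possible reading of "does not auto-activate".

```python
import hashlib
import secrets
from dataclasses import dataclass, field


def _hash(secret: str) -> bytes:
    # Stand-in only: the spec stores keys with Argon2id, which the stdlib lacks.
    return hashlib.sha256(secret.encode()).digest()


@dataclass
class Site:
    site_id: str
    tenant_id: str
    key_hash: bytes = b""  # hash of the CURRENT enrollment key only
    key_version: int = 0


@dataclass
class Machine:
    machine_id: str
    site_id: str
    hostname: str
    cak_hash: bytes


class EnrollError(Exception):
    pass


@dataclass
class Registry:
    sites: dict = field(default_factory=dict)     # site_code -> Site
    machines: dict = field(default_factory=dict)  # (tenant_id, machine_uid) -> Machine
    pending: list = field(default_factory=list)   # collided endpoints awaiting confirm
    alerts: list = field(default_factory=list)

    def rotate(self, site_code: str) -> str:
        """Issue a new key; the old hash is replaced, so old installers stop enrolling."""
        site = self.sites[site_code]
        key = secrets.token_urlsafe(32)
        site.key_hash, site.key_version = _hash(key), site.key_version + 1
        return key  # plaintext shown once

    def enroll(self, site_code, enrollment_key, machine_uid, hostname, collided=False):
        """Returns (status, machine_id, cak); cak is plaintext and returned once."""
        site = self.sites.get(site_code)
        if site is None or not secrets.compare_digest(_hash(enrollment_key), site.key_hash):
            raise EnrollError("unknown site or non-current enrollment key")
        dedup = (site.tenant_id, machine_uid)  # per-tenant dedup key
        existing = self.machines.get(dedup)
        if existing is not None and collided:
            # Collision gate: no cak_ until an operator confirms in the dashboard.
            self.pending.append((dedup, site.site_id, hostname))
            self.alerts.append(("uid-collision", existing.machine_id))
            return "pending", None, None
        cak = "cak_" + secrets.token_urlsafe(32)
        if existing is None:
            existing = Machine(secrets.token_hex(8), site.site_id, hostname, _hash(cak))
            self.machines[dedup] = existing
            self.alerts.append(("new-enrollment", existing.machine_id))
        else:
            if existing.site_id != site.site_id:
                self.alerts.append(("site-move", existing.machine_id))
                existing.site_id = site.site_id
            existing.hostname = hostname
            existing.cak_hash = _hash(cak)  # re-enroll reuses the row, rotates its cak_
        return "active", existing.machine_id, cak
```

Rotation only replaces the site's key hash, so agents that already hold a `cak_` are untouched, matching success criterion 3.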
+### Explicitly out of scope (ANTICIPATED — reserve room, do NOT build in v1)
+
+The v1 data model and agent mode-dispatch must leave room for these without building them:
+
+- **Per-site enrollment POLICY** — a `sites.enrollment_policy` field (default
+  `auto-approve`; future `pending-approval`) plus per-seat/per-endpoint licensing controls.
+  Commercial, multi-tenant (the `tenant_id` column already exists). Its own future SPEC.
+- **Flag overrides** — `--enroll-key` / `--site-code` (generic installer, key supplied on
+  the command line) and `--reassign` (move an existing machine to a new site, gated by
+  possession of the destination site's key, with an **explicit accidental-move guard**:
+  a different-site re-run refuses unless `--reassign` is passed) + cross-client move policy.
+  Backend (`machine_uid` + authorized site + `cak_`) is designed to support it; CLI surface
+  is deferred.
+- **Technician-assisted interactive install** — `--technician` on a generic installer:
+  prompts for the tech's own server credentials, and on auth presents a **validated**
+  Company/Site/tags picker from the live authorized list (authz-by-identity, full audit
+  trail). Heaviest path (interactive UI + auth/list callback); deferred.
+
+All three converge on the **same backend operation** delivered in v1: `machine_uid` +
+authorized site + issued `cak_`. v1 only ships the per-site-embedded-key door.
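The rate-limit + lockout in Scope item 3 could take the form of a per-`(site_code, source-IP)` failure counter with a timed lockout, sketched below in Python. The thresholds are hypothetical, the class name is illustrative, and this covers only the lockout half; a plain request-rate limit would sit alongside it.

```python
import time
from collections import defaultdict


class EnrollThrottle:
    """Sliding-window failure count with timed lockout per (site_code, source_ip).

    Default thresholds are hypothetical placeholders, not values from the spec.
    """

    def __init__(self, max_failures=5, window_s=900.0, lockout_s=900.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.window_s = window_s
        self.lockout_s = lockout_s
        self.clock = clock                   # injectable for tests
        self._failures = defaultdict(list)   # key -> failure timestamps inside the window
        self._locked_until = {}              # key -> lockout expiry time

    def allowed(self, site_code: str, ip: str) -> bool:
        until = self._locked_until.get((site_code, ip))
        return until is None or self.clock() >= until

    def record_failure(self, site_code: str, ip: str) -> None:
        key = (site_code, ip)
        now = self.clock()
        # Keep only failures still inside the sliding window, then add this one.
        recent = [t for t in self._failures[key] if now - t < self.window_s]
        recent.append(now)
        self._failures[key] = recent
        if len(recent) >= self.max_failures:
            self._locked_until[key] = now + self.lockout_s
            self._failures[key] = []

    def record_success(self, site_code: str, ip: str) -> None:
        self._failures.pop((site_code, ip), None)
```

Keying on the pair rather than the IP alone means one noisy NAT address cannot lock out enrollment for unrelated sites.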
+
+## Architecture
+
+- **Agent** (`agent/`): compute `machine_uid`; first-run enroll → store `cak_`; use stored
+  `cak_` thereafter; read site config from the wrapper-written location instead of an
+  appended PE blob. Touches `config.rs` (`EmbeddedConfig`/`detect_run_mode`/storage),
+  `main.rs` (`Install`/run-mode), a new `enroll` client module, transport auth.
+- **Relay-server** (`server/`): new `POST /api/enroll`; per-site key issue/rotate/verify;
+  `machine_uid` dedup + site-move on register; audit + alert emission; rate-limit/lockout.
+  Touches `api/` (new `enroll.rs`, `sites` key endpoints), `auth/agent_keys.rs`,
+  `db/agent_keys.rs`, `relay/mod.rs` (enrollment vs. connect), `main.rs` routes.
+- **Dashboard**: per-site enrollment-key display (fingerprint `vN (XXXX)`), **Rotate**
+  action, "current installer" download wired to the signed wrapper build. (Builder UI is
+  SPEC-007; this spec supplies the key/fingerprint/rotation it consumes.)
+- **DB migration:** `site_enrollment_keys` (or columns on the site): `site_id`,
+  `key_hash`, `version`, `fingerprint`, `created_at`, `rotated_at`, `active`. Reserve
+  `sites.enrollment_policy` (nullable, default `auto-approve`) for the anticipated policy
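+  work. `connect_machines.machine_uid` (already present, with migration 008's partial unique
+  index) is re-scoped by a new migration to unique per `(tenant_id, machine_uid)` (Resolved decision 4).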
+- **Protobuf** (`proto/guruconnect.proto`): no wire change required for enrollment if
+  `/api/enroll` is REST; `AgentStatus` label fields per SPEC-007 (`department`,
+  `device_type`) ride along if landed together.
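The sketch below shows the rotation semantics and the `vN (XXXX)` fingerprint. Rotation creates a new active version, and the old key is no longer accepted for new enrollments. The code is illustrative only, under these assumptions:

- `SiteKeys`, `rotate` and `may_enroll` are hypothetical names.
- The fingerprint comes from the first two bytes of a key digest. The spec does not say which bytes feed it.
- `verify` stands in for an Argon2id verify call, because Rust's standard library has no Argon2 implementation.

```rust
/// Simplified stand-in for one `site_enrollment_keys` row (timestamps omitted).
#[derive(Debug, Clone)]
pub struct EnrollmentKey {
    pub version: u32,
    pub key_hash: String,
    pub fingerprint: String,
    pub active: bool,
}

/// `vN (XXXX)`: version plus the first two digest bytes as uppercase hex (assumed derivation).
pub fn fingerprint(version: u32, digest_prefix: [u8; 2]) -> String {
    format!("v{} ({:02X}{:02X})", version, digest_prefix[0], digest_prefix[1])
}

#[derive(Debug, Default)]
pub struct SiteKeys {
    keys: Vec<EnrollmentKey>,
}

impl SiteKeys {
    /// Deactivates every existing key and installs a new active one at version + 1.
    pub fn rotate(&mut self, key_hash: String, digest_prefix: [u8; 2]) -> u32 {
        let next = self.keys.iter().map(|k| k.version).max().unwrap_or(0) + 1;
        for k in &mut self.keys {
            k.active = false;
        }
        self.keys.push(EnrollmentKey {
            version: next,
            key_hash,
            fingerprint: fingerprint(next, digest_prefix),
            active: true,
        });
        next
    }

    pub fn active(&self) -> Option<&EnrollmentKey> {
        self.keys.iter().find(|k| k.active)
    }

    /// Only the active key may enroll; `verify(presented, stored_hash)` stands in for Argon2id.
    pub fn may_enroll(&self, presented: &str, verify: impl Fn(&str, &str) -> bool) -> bool {
        self.active().map_or(false, |k| verify(presented, &k.key_hash))
    }
}
```

Agents that are already enrolled are unaffected by design: they authenticate with their per-machine `cak_`, never with the enrollment key.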
+
+## Security considerations
+
+- **Two-tier credential model:** low-sensitivity **enrollment key** (gates "may register",
+  shared per site, rotatable) vs. high-sensitivity **per-machine `cak_`** (operating
+  credential, per-machine revocation). Compromise of an enrollment key is recovered by
+  rotating one site — no fleet-wide re-key.
+- **Enrollment keys stored hashed** (Argon2id); plaintext shown once at issue/rotate.
+- **`cak_` at rest on the endpoint** is stored as a **DPAPI-machine-encrypted blob inside a
+  SYSTEM-ACL'd location** (HKLM value or `ProgramData` file) — both layers: the SYSTEM ACL
+  stops non-admin users reading it, and DPAPI-machine encryption makes a copied file/export
+  inert off the box. (Local admin/SYSTEM can always recover it; that is accepted — blast
+  radius of one leaked `cak_` is a single, independently-revocable machine.)
+- **`machine_uid` binding** is the spoof-guard SPEC-004 wants: a `cak_` is bound to a
+  `machine_uid`; a different box presenting another box's `cak_` is detectable.
+- **Authorization model** for moves/enrolls is possession-of-destination-key in v1
+  (identity-based authz deferred to the technician-assisted path).
+- **Open registration risk** is mitigated by requiring `(site_code + long key)` and
+  rate-limit/lockout; auto-approve is acceptable because the enrollment key is the gate and
+  every enrollment/site-move fires an alert.
+- **Audit events:** enroll, re-enroll/reuse, site-move, key-rotate — all logged with
+  `machine_uid`, site, and source IP.
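The rate-limit/lockout on enrollment could look like the fixed-window counter below. The thresholds, the per-source-IP keying and the `EnrollLimiter` type are all assumptions for illustration. The spec only requires that rate-limiting and lockout exist.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

/// Hypothetical per-source-IP lockout for failed `POST /api/enroll` attempts.
pub struct EnrollLimiter {
    max_failures: u32,
    window: Duration,
    /// (failure count, start of the current window) per source IP.
    failures: HashMap<IpAddr, (u32, Instant)>,
}

impl EnrollLimiter {
    pub fn new(max_failures: u32, window: Duration) -> Self {
        Self { max_failures, window, failures: HashMap::new() }
    }

    /// True while the source has hit the failure cap inside its current window.
    pub fn is_locked(&self, ip: IpAddr, now: Instant) -> bool {
        match self.failures.get(&ip) {
            Some(&(count, start)) => {
                now.duration_since(start) < self.window && count >= self.max_failures
            }
            None => false,
        }
    }

    /// Record a bad site_code/key; an expired window restarts the count.
    pub fn record_failure(&mut self, ip: IpAddr, now: Instant) {
        let window = self.window;
        let entry = self.failures.entry(ip).or_insert((0, now));
        if now.duration_since(entry.1) >= window {
            *entry = (0, now);
        }
        entry.0 += 1;
    }

    /// A successful enrollment clears the source's failure history.
    pub fn record_success(&mut self, ip: IpAddr) {
        self.failures.remove(&ip);
    }
}
```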
+
+## Testing strategy
+
+- **Unit:** `machine_uid` derivation stability; enrollment-key verify/rotate; fingerprint
+  derivation; `cak_` mint/hash/verify; dedup decision (new vs. reuse vs. move).
+- **Integration:** enroll new → row + `cak_` issued; re-enroll same `machine_uid` → reuse,
+  no duplicate; enroll with rotated (old) key → rejected; old `cak_` still connects after
+  rotation; rate-limit/lockout trips; site-move emits alert.
+- **Manual:** build a site wrapper installer → run on a clean VM → appears in console under
+  correct site, immediately controllable; re-image VM → same row reused; `signtool verify
+  /pa` passes on the distributed wrapper and the laid-down agent.
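The unit-test list names a dedup decision (new vs. reuse vs. move). Together with the collision gate from the resolved decisions, it can be written as one pure function that is easy to unit-test. In this sketch the caller has already looked up the row by `(tenant_id, machine_uid)`. Using "the existing row is still online" as the collision signal is a hypothetical heuristic; the spec does not specify it.

```rust
#[derive(Debug, PartialEq)]
pub enum EnrollDecision {
    /// No row for this (tenant_id, machine_uid): create one and mint a `cak_`.
    New,
    /// Same box re-enrolling at the same site: reuse the row.
    Reuse { machine_id: u64 },
    /// Same box, different site: move it (audit + alert).
    Move { machine_id: u64, from_site: u32 },
    /// Suspected clone: hold as pending + alert until confirmed in the dashboard.
    PendingCollision { machine_id: u64 },
}

/// Row already found by the caller for this (tenant_id, machine_uid).
pub struct ExistingMachine {
    pub machine_id: u64,
    pub site_id: u32,
    pub is_online: bool,
}

pub fn decide(existing: Option<&ExistingMachine>, dest_site: u32) -> EnrollDecision {
    match existing {
        None => EnrollDecision::New,
        // Hypothetical collision signal: another live box already holds this uid.
        Some(m) if m.is_online => EnrollDecision::PendingCollision { machine_id: m.machine_id },
        Some(m) if m.site_id == dest_site => EnrollDecision::Reuse { machine_id: m.machine_id },
        Some(m) => EnrollDecision::Move { machine_id: m.machine_id, from_site: m.site_id },
    }
}
```

This heuristic has one consequence: a clone that enrolls while the original is offline is classified as a reuse. The real detection signal should therefore be settled together with the hardware-salt signal set.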
+
+## Effort estimate & dependencies
+
+- **Size:** X-Large (agent + relay + DB migration + CI build/sign wrapper + dashboard
+  key/rotation surface).
+- **Depends on:** SPEC-004 `machine_uid` (shared root); the CI signing already shipped
+  (SPEC-001 §2 / `release.yml`).
+- **Unblocks:** SPEC-007 (installer builder gets a real per-site key + the signing
+  resolution), and the parked managed-agent test deployment on the internal beta machines.
+- **Relationship to v2 phases:** sits with the Phase-1 secure-session-core (per-agent keys
+  + identity) and feeds Phase-2 dashboard work.
+
+## Resolved decisions (2026-06-02, Mike)
+
+1. **Wrapper shape — BOTH.** v1 ships a signed bootstrapper `.exe` *and* a signed MSI per
+   site (ScreenConnect offers both; manual installs use the `.exe`, GPO/Intune fleet pushes
+   use the MSI). Same sign-once agent inside each.
+2. **`cak_` storage — BOTH layers.** DPAPI-machine-encrypted blob stored in a SYSTEM-ACL'd
+   location. Non-admins can't read it; a stolen copy is inert off the box.
+3. **Fingerprint — hex (`7F2A`).** Deliberately *not* the RMM word-code style, so the two
+   products' artifacts are never visually conflated.
+4. **`machine_uid` — per-tenant scope, hardware-derived salt, collision-gated.** Dedup key
+   `(tenant_id, machine_uid)`; salt from stable hardware signals (survives same-hardware
+   re-image, separates distinct boxes); detected collisions (e.g. template-cloned VMs
+   sharing a hardware UUID) drop to pending + alert and require dashboard confirmation to
+   activate.
+5. **Attended (support-code) path — unchanged.** `download_support` is filename-based
+   (`GuruConnect-<code>.exe`), not append-based, so renaming never breaks the signature —
+   it is already signing-safe. Only the managed `download_agent` append path is retired.
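Decision 4 can be sketched as canonicalize-then-hash:

1. Normalize each hardware signal.
2. Order the signals by name, so collection order cannot change the result.
3. Hash the canonical string.
4. Dedup on the tenant-scoped pair.

The signal names are placeholders, since the exact set is still open for planning. FNV-1a is used only because it fits in a few lines of std-only Rust. A real implementation would presumably use a cryptographic hash such as SHA-256.

```rust
use std::collections::BTreeMap;

// FNV-1a 64-bit parameters: a non-cryptographic stand-in for SHA-256 in this sketch.
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut h = FNV_OFFSET;
    for b in bytes {
        h ^= u64::from(*b);
        h = h.wrapping_mul(FNV_PRIME);
    }
    h
}

/// Canonical salt string: BTreeMap iterates in key order; values are trimmed + uppercased.
pub fn hardware_salt(signals: &BTreeMap<&str, &str>) -> String {
    signals
        .iter()
        .map(|(name, value)| format!("{}={}", name, value.trim().to_ascii_uppercase()))
        .collect::<Vec<_>>()
        .join(";")
}

/// Stable per-box identifier derived only from the canonical salt.
pub fn machine_uid(signals: &BTreeMap<&str, &str>) -> String {
    format!("{:016x}", fnv1a64(hardware_salt(signals).as_bytes()))
}

/// The server dedups on the tenant-scoped pair, never on the uid alone.
pub fn dedup_key(tenant_id: u32, uid: &str) -> (u32, String) {
    (tenant_id, uid.to_string())
}
```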
+
+## Remaining for planning
+
+- Exact stable-hardware signal set for the salt (SMBIOS UUID alone vs. + motherboard/disk
+  serial) and hypervisor behavior matrix (which hypervisors duplicate the SMBIOS UUID on
+  clone → exercise the collision-gate).
+- MSI authoring approach (WiX) and whether per-site config rides as a per-site MSI vs. a
+  base MSI + property/transform.
--- a/reports/2026-05-31-gc-audit.md
+++ b/reports/2026-05-31-gc-audit.md
@@ -0,0 +1,129 @@
+# GuruConnect Audit Report — 2026-05-31
+
+**Auditor:** Claude (claude-opus-4-8[1m])
+**Passes:** Security & Remote-Session Integrity (`--pass=security` only)
+**Previous audit:** 2026-05-30 (`reports/2026-05-30-gc-audit.md`)
+**Scope note:** v2 **Phase-1 EXIT gate** re-audit. Confirms the three relay CRITICALs stay closed and
+the prior net-new HIGH is fixed, and assesses the net-new SPEC-004 surface (Tasks 2/4/5 — machine_uid
+dedup, session reaping, operator removal) now committed + deployed. Includes **live** boundary tests
+against the running production binary, not just a code re-derivation.
+
+**Code under audit:** working tree at tag **v0.3.0 / e967cce** = the binary deployed to prod
+172.16.3.30:3002 (deployed this session from 96f9c0a; e967cce adds only the version bump + changelog).
+
+---
+
+## Executive Summary
+
+| Pass | Total | Critical | High | Medium | Low | Info |
+|------|-------|----------|------|--------|-----|------|
+| Security & Session | 4 | 0 | 0 | 0 | 0 | 4 |
+
+**Phase-1 security EXIT gate: PASS.** The relay/server plane is clean. All three 2026-05-29 CRITICALs
+remain CLOSED (verified in code AND live against the deployed server). The prior net-new HIGH (agent
+auto-update TLS bypass) and the prior LOW (chat content logged at INFO) are both remediated. The
+net-new SPEC-004 surface (operator removal, machine_uid dedup gate, session reaper/supersede) audits
+clean with the keyed-identity security invariant intact end-to-end. No net-new findings.
+
+**Requires action:** none.
+
+---
+
+## Live functional verification (deployed binary, 172.16.3.30:3002)
+
+Forged tokens (HS256, real `JWT_SECRET`) exercised the WS auth boundaries directly. Each illegitimate
+access was REJECTED (4xx, never a 101 upgrade):
+
+| Check | Result | Proves |
+|-------|--------|--------|
+| Login-shape JWT on `/ws/viewer` | **401** | Login token not accepted as a viewer token (`purpose=="viewer"` enforced) — CRITICAL #1 |
+| Validly-signed viewer token for session AAAA used on session BBBB | **403** | Session binding enforced — a correctly-signed token is refused for the wrong session — CRITICAL #1 |
+| Login JWT used as agent `api_key` on `/ws/agent` | **401** | Agent plane rejects JWTs (no JWT branch) — CRITICAL #3 |
+| Wrong-signature token on `/ws/viewer` | **401** | Signature validation holds (control) |
+
+The session-bind case is the decisive one: a token that WOULD be accepted for its own session is
+rejected 403 for a different session, proving the binding rather than mere signature validation.
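The three refusal outcomes in the table follow a fixed order of checks. The hedged Rust sketch below shows that order. It assumes signature and expiry were already verified upstream (`None` means they failed), and it omits the blacklist check. `ViewerClaims` and `viewer_upgrade_status` are hypothetical names, not the relay's actual `validate_viewer_token` API.

```rust
/// Claims as a hypothetical verifier returns them after signature + expiry checks pass.
pub struct ViewerClaims {
    pub purpose: String,
    pub session_id: String,
}

/// HTTP status for a `/ws/viewer` upgrade attempt; 101 means the upgrade proceeds.
/// `verified` is `None` when signature or expiry validation failed.
pub fn viewer_upgrade_status(verified: Option<&ViewerClaims>, requested_session: &str) -> u16 {
    let Some(claims) = verified else {
        return 401; // bad signature or expired
    };
    if claims.purpose != "viewer" {
        return 401; // e.g. a login token presented as a viewer token
    }
    if claims.session_id != requested_session {
        return 403; // validly signed, but bound to a different session
    }
    101
}
```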
+
+---
+
+## The three relay CRITICALs — verdict
+
+| CRITICAL | Verdict | Enforced at |
+|----------|---------|-------------|
+| #1 any-JWT-joins-any-session | **CLOSED** | mint authz `api/sessions.rs` (is_admin \|\| permission); viewer WS `relay/mod.rs:496` `validate_viewer_token` (sig+expiry+`purpose=="viewer"`); session-bind `relay/mod.rs:527-534` (`claim != requested → 403`) |
+| #2 viewer-WS blacklist | **CLOSED** (TTL-bounded residual unchanged) | `relay/mod.rs:509` `token_blacklist.is_revoked` before upgrade. Residual: logout revokes login JWT not minted viewer tokens (5-min TTL) — same tracked MEDIUM, no regression |
+| #3 JWT-accepted-as-agent-key | **CLOSED**, fails closed | `relay/mod.rs:417` `validate_agent_api_key` — no JWT branch; only `cak_` (`auth/agent_keys.rs`, SHA-256 vs `connect_agent_keys`, `revoked_at IS NULL`) or deprecated shared key (WARN). Unresolved machine → 503 (`:303`); client `agent_id` overridden by key identity (`:283`) |
+
+Live results match these code paths exactly.
+
+---
+
+## Prior HIGH — FIXED
+
+**Agent auto-update TLS bypass → MITM-RCE: CLOSED.** `agent/src/update.rs:21` `dev_insecure_tls()` is
+`cfg!(debug_assertions)` AND env-var gated; `cfg!(debug_assertions)` is constant `false` in a release build, so the agent
+ALWAYS verifies certs. Both `check_for_update` (`:64`) and `download_update` (`:130`) consume it; unit
+test `test_dev_insecure_tls_release_is_always_false` (`:362`) asserts the release invariant. No
+`danger_accept_invalid_certs(true)` reachable in production. A signed-manifest defense-in-depth TODO is
+filed at `install_update` (`:189`) (= tracked task #10, not an exit blocker).
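The release invariant rests on `&&` short-circuiting against a compile-time constant. The minimal sketch below shows that shape. The env-var name (`GC_DEV_INSECURE_TLS`) and its enabling value are hypothetical, since the report names neither.

```rust
/// Whether the dev opt-in env var holds the (hypothetical) enabling value.
fn env_opt_in(value: Option<&str>) -> bool {
    value == Some("1")
}

/// Debug builds may skip TLS verification only with an explicit env opt-in.
/// In a release build `cfg!(debug_assertions)` is the constant `false`, and `&&`
/// short-circuits, so the env var is never read and this always returns `false`.
pub fn dev_insecure_tls() -> bool {
    cfg!(debug_assertions) && env_opt_in(std::env::var("GC_DEV_INSECURE_TLS").ok().as_deref())
}
```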
+
+---
+
+## Pass 5: Security & Remote-Session Integrity — net-new SPEC-004 surface
+
+### [INFO] Operator removal API (`server/src/api/removal.rs`) — clean, admin-gated
+Every removal handler takes the `AdminUser` extractor as its first argument (runs before any DB
+mutation): `remove_machine` (`:88`), `remove_session` (`:321`), `bulk_remove_machines` (`:471`).
+`AdminUser` (`auth/mod.rs:141`) validates JWT (signature + expiry + blacklist `:97`) then requires
+`is_admin()` else 403 (`:146`). Soft-deletes are parameterized + idempotent (`WHERE … AND deleted_at IS
+NULL`); bulk bounded (MAX_BATCH 500) with per-id UUID validation + isolated failures; audit
+(`db/events.rs:126`) records actor + target + trusted-proxy IP, best-effort (cannot be suppressed by
+attacker-controlled input). Removal is admin-role-gated globally (not per-tenant ACL) — same Phase-1
+posture as viewer-mint; per-tenant narrowing is deferred to SPEC-002 Phase 4. Acceptable in context.
+
+### [INFO] machine_uid dedup security gate — invariant holds
+Gate at `relay/mod.rs:352`: `effective_machine_uid = if is_keyed_agent { None } else { claimed }`. The
+suppressed value (not the raw claim) flows to `register_agent` and `upsert_machine`. Keyed (`cak_`)
+agents take the agent_id-keyed upsert branch and never write/touch an `ON CONFLICT (machine_uid)` row, so
+a valid key for machine X cannot repoint machine Y via a claimed uid. An un-keyed uid-spoof can only
+match a uid-bearing row — which the keyed connect path never creates; the only residual is a legacy
+pre-keying row, and the startup L1 fix (`main.rs:267-288` via `keyed_machine_ids`, fail-closed on query
+error) ensures keyed machines are never uid-indexed on restore.
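The gate's core rule is a single expression. Its effect shows up in which conflict key the upsert uses. The sketch below uses assumed types: `AgentAuth` and `UpsertConflictKey` are hypothetical, and only the `effective_machine_uid` expression mirrors the report.

```rust
/// How the connecting agent authenticated (simplified, hypothetical).
pub enum AgentAuth {
    /// Per-machine `cak_`; `agent_id` comes from the key row, not the client.
    Keyed { agent_id: String },
    /// Deprecated shared key; identity is whatever the client claims.
    Shared { claimed_agent_id: String },
}

/// Which unique constraint the upsert would conflict on.
#[derive(Debug, PartialEq)]
pub enum UpsertConflictKey {
    AgentId(String),
    MachineUid(String),
}

/// Mirrors the gate: keyed agents never contribute a uid.
pub fn effective_machine_uid(is_keyed_agent: bool, claimed: Option<String>) -> Option<String> {
    if is_keyed_agent { None } else { claimed }
}

pub fn conflict_key(auth: &AgentAuth, claimed_uid: Option<String>) -> UpsertConflictKey {
    let (agent_id, keyed) = match auth {
        AgentAuth::Keyed { agent_id } => (agent_id, true),
        AgentAuth::Shared { claimed_agent_id } => (claimed_agent_id, false),
    };
    match effective_machine_uid(keyed, claimed_uid) {
        Some(uid) => UpsertConflictKey::MachineUid(uid),
        None => UpsertConflictKey::AgentId(agent_id.clone()),
    }
}
```

A keyed agent that claims another machine's uid therefore still lands on its own `agent_id` row.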
+
+### [INFO] Session reaper + same-machine supersede — clean, TOCTOU closed
+`reap_stale_persistent` (`:875`) and supersede (`:322`) select under a read lock then re-assert the full
+predicate under the write lock via `remove_session_if` (`:755`). Predicate requires
+`!is_online && is_persistent && viewers.is_empty()` (+ TTL / same-uid) — an online, viewer-attached, or
+support session is never reaped/superseded. Un-keyed uid-spoof blast radius = denial-of-persistence on
+an offline same-uid session at worst, never a hijack. Lock order matches `register_agent`; predicate is
+synchronous (no await under lock).
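The select-then-re-assert pattern that closes the TOCTOU window can be sketched with a `std::sync::RwLock`. The sketch omits the TTL and the same-uid supersede predicate. Its types are simplified stand-ins, not the relay's actual session structures.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Simplified stand-in for a relay session.
pub struct Session {
    pub is_online: bool,
    pub is_persistent: bool,
    pub viewers: Vec<String>,
}

/// The reap predicate (TTL omitted in this sketch).
fn reapable(s: &Session) -> bool {
    !s.is_online && s.is_persistent && s.viewers.is_empty()
}

/// Removes `id` only if `pred` still holds while the write lock is held.
pub fn remove_session_if(
    sessions: &RwLock<HashMap<String, Session>>,
    id: &str,
    pred: impl Fn(&Session) -> bool,
) -> bool {
    let mut guard = sessions.write().unwrap();
    if guard.get(id).map_or(false, |s| pred(s)) {
        guard.remove(id);
        true
    } else {
        false
    }
}

pub fn reap(sessions: &RwLock<HashMap<String, Session>>) -> Vec<String> {
    // Phase 1: pick candidates under the read lock, then drop it.
    let candidates: Vec<String> = {
        let guard = sessions.read().unwrap();
        guard.iter().filter(|(_, s)| reapable(s)).map(|(id, _)| id.clone()).collect()
    };
    // Phase 2: re-assert under the write lock; a session that came back online is kept.
    let mut reaped = Vec::new();
    for id in candidates {
        if remove_session_if(sessions, &id, reapable) {
            reaped.push(id);
        }
    }
    reaped
}
```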
+
+### [INFO] General posture — confirmed, no regressions
+Runtime sqlx parameterized everywhere (no `format!`-built SQL); migrations 008/009 idempotent. Frame
+caps: agent 4 MiB / viewer 64 KiB applied before upgrade. Input throttle retained. `/api/auth/login`
+rate-limited (`main.rs:397`). `JWT_SECRET` panics if <32 (`main.rs:143`); agent keys SHA-256; Argon2id
+passwords; no secret/token/code/PII logged. **Chat content no longer logged** (prior LOW fixed —
+`relay/mod.rs:829,1428` now log length only).
+
+---
+
+## Definitive answers
+
+- **(a) Any non-admin removal path?** NO — all three removal handlers gate on `AdminUser` (JWT+blacklist+`is_admin`→403) before any DB mutation.
+- **(b) Any uid-spoof that repoints/hijacks another machine's row or session (not just denial)?** NO — keyed identity is authoritative and uid-suppressed across connect → upsert → reattach → startup restore. Worst case for an un-keyed spoof is denial-of-persistence on an offline same-uid session.
+- **(c) Any auth-plane bypass (agent↔viewer credential crossover)?** NO — viewer plane requires a `purpose=="viewer"` session-bound minted token; agent plane requires a `cak_`/shared key with no JWT branch. Confirmed in code and live.
+
+---
+
+## Verdict
+
+**Phase-1 security EXIT gate: PASS.** Relay/server plane clean; prior HIGH + LOW remediated; SPEC-004
+surface sound with the keyed-identity invariant intact across the connect path, DB upsert, in-memory
+reattach, and startup restore. No new CRITICAL/HIGH/MEDIUM/LOW.
+
+**Tracked, deferred-by-design (not exit blockers):**
+- Viewer-token logout revocation residual (MEDIUM, TTL-bounded) — `v2-secure-session-core/plan.md`.
+- Update-binary signature verification (defense-in-depth, task #10) — TODO at `update.rs:189`.
+
+*Note: only `--pass=security` was run. API-surface, Rust-quality, TypeScript, protocol-integrity,
+docs-reconciliation, and CI/CD passes were not executed this run.*
--- a/server/src/db/machines.rs
+++ b/server/src/db/machines.rs
@@ -166,7 +166,7 @@ pub async fn upsert_machine(
                r#"
                INSERT INTO connect_machines (agent_id, hostname, is_persistent, status, last_seen, machine_uid)
                VALUES ($1, $2, $3, 'online', NOW(), $4)
-                ON CONFLICT (machine_uid) DO UPDATE SET
+                ON CONFLICT (machine_uid) WHERE machine_uid IS NOT NULL DO UPDATE SET
                    agent_id = EXCLUDED.agent_id,
                    hostname = EXCLUDED.hostname,
                    status = 'online',
--- a/specs/v2-secure-session-core/plan.md
+++ b/specs/v2-secure-session-core/plan.md
@@ -527,3 +527,60 @@ Reference: SPEC-002 §5; `agent/src/encoder/raw.rs` (salvaged), `proto/guruconne
 - **Rate limiting:** hammer `/api/auth/login` and the code-validate route → confirm throttling/lockout.
 - **Migrations:** fresh DB applies the v2 migrations cleanly; `_sqlx_migrations` consistent; `tenant_id`
  populated with the default tenant.
+
+---
+
+## Task 9 [PROPOSED 2026-06-01 — provisioning model = TOFU auto-enroll, chosen by Mike]: `cak_` auto-enroll provisioning + shared-key retirement
+
+> Context: Task 2 built the SERVER `cak_` machinery (mint/SHA-256 hash/verify in `auth/agent_keys.rs`,
+> relay validation in `validate_agent_api_key`, admin issuance `POST /api/machines/:id/keys`). What's
+> missing is how an AGENT obtains and uses a `cak_` — today agents still carry the deprecated shared
+> `AGENT_API_KEY`, so `connect_agent_keys` is empty and the relay logs the DEPRECATED-shared-key warning
+> for every agent. This task closes that with **trust-on-first-use auto-enroll** so the shared key can be
+> retired (unblocks task list #5). NOTE: the agent already presents whatever is in its `api_key` slot and
+> the relay auto-detects `cak_` vs shared — so a `cak_`-keyed agent needs **no change to its auth call**,
+> only a way to *receive*, *persist*, and *prefer* a `cak_`.
+
+**Flow (TOFU):**
+1. **Bootstrap (first connect):** a fresh agent authenticates on `/ws/agent` with a bootstrap secret —
+   interim: the shared `AGENT_API_KEY` (embedded by the download endpoint); target: a single-use,
+   short-lived **enroll token** (more secure TOFU — see Security). 
+2. **Server issues on first connect:** when an agent authenticated via the bootstrap path (i.e. NOT already
+   `cak_`-keyed) connects and its machine has **no active (non-revoked) `cak_`**, the relay: resolves/creates
+   the machine row (existing `upsert_machine` on `machine_uid` — now functional after the 2026-06-01
+   ON CONFLICT fix), mints a `cak_` (`generate_agent_key` + `db::agent_keys::insert_agent_key` for that
+   `machine_id`), and sends the plaintext key to the agent **once** over a new server→agent message. Only
+   the hash is stored. **Idempotent:** never re-issue if an active key already exists for the machine.
+3. **Agent receives + persists + prefers:** on `AgentKeyProvision`, the agent persists the `cak_` durably at
+   `%ProgramData%\GuruConnect\agent_key` (restricted ACL, same pattern as `machine_uid`). On startup it loads
+   the persisted `cak_` if present and uses it as its auth key, falling back to the embedded/bootstrap secret
+   only when no `cak_` is stored yet. After provisioning, every reconnect authenticates via `cak_` (no more
+   DEPRECATED-shared-key warning for that agent).
+4. **Shared-key retirement (phased):** Phase A — shared key stays as the bootstrap so existing+new agents
+   self-enroll; monitor the relay WARN count → ~0. Phase B — once the fleet is `cak_`-keyed, restrict the
+   shared `AGENT_API_KEY` to enrollment-only or remove the env entirely (only `cak_` / enroll-token accepted).
+   This is the concrete completion of task-list #5.
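The issue-once rule in step 2 reduces to a small predicate. This sketch assumes the relay routes on the `cak_` prefix before running the matching validator; the plan says the relay auto-detects the key type but not how. `should_issue_key` is a hypothetical name. The boolean would come from the `has_active_key(machine_id)` check in the Files list.

```rust
#[derive(Debug, PartialEq)]
pub enum AuthPath {
    /// Per-machine key; still fully validated against `connect_agent_keys` afterwards.
    Cak,
    /// Bootstrap secret: today the shared `AGENT_API_KEY`, later an enroll token.
    Bootstrap,
}

/// Routes on the key prefix only; this decides which validator runs, it is not auth itself.
pub fn classify(api_key: &str) -> AuthPath {
    if api_key.starts_with("cak_") {
        AuthPath::Cak
    } else {
        AuthPath::Bootstrap
    }
}

/// Issue a `cak_` only on a bootstrap-authenticated connect whose machine has no
/// active (non-revoked) key. After a revoke the check passes again, so the agent re-enrolls.
pub fn should_issue_key(path: &AuthPath, machine_has_active_key: bool) -> bool {
    *path == AuthPath::Bootstrap && !machine_has_active_key
}
```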
+
+**Protocol (4-artifact drift discipline):** add `AgentKeyProvision { string key = 1; }` (server→agent) to
+`proto/guruconnect.proto` with a new reserved message ID; regenerate prost on both agent + server; the
+hand-written `dashboard/src/lib/protobuf.ts` decoder does NOT need it (agent-plane only) but reserve the ID.
+
+**Files:** `proto/guruconnect.proto` (new message); `server/src/relay/mod.rs` (issue+send on bootstrap connect
+with no active key); `server/src/db/agent_keys.rs` (add `has_active_key(machine_id)` check; reuse insert);
+`agent/src/transport/*` (handle inbound `AgentKeyProvision`); `agent/src/config.rs` + a small key-store module
+(load/persist `cak_`, prefer over bootstrap).
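The agent-side load/persist/prefer behavior could be as small as the sketch below. It assumes a plain file at a caller-supplied path, such as the `%ProgramData%\GuruConnect\agent_key` location from step 3. The sketch deliberately omits two Windows-specific layers: ACL hardening, and the DPAPI wrapping that SPEC-016 later chose for `cak_` at rest.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Credential to present: a persisted `cak_` wins over the bootstrap secret.
pub fn select_auth_key(key_file: &Path, bootstrap: &str) -> String {
    match fs::read_to_string(key_file) {
        Ok(s) if s.trim().starts_with("cak_") => s.trim().to_string(),
        _ => bootstrap.to_string(), // missing or unusable file: fall back
    }
}

/// Persist a provisioned key: write a temp file, then rename over the target,
/// so a crash mid-write never leaves a truncated key in place.
pub fn persist_key(key_file: &Path, key: &str) -> io::Result<()> {
    if let Some(dir) = key_file.parent() {
        fs::create_dir_all(dir)?;
    }
    let tmp = key_file.with_extension("tmp");
    fs::write(&tmp, key)?;
    fs::rename(&tmp, key_file)
}
```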
+
+**Security (TOFU):** the first connect trusts the bootstrap secret — a leaked shared key during the enroll
+window could enroll a rogue agent; the secure target is a **single-use, short-lived enroll token** per
+deployment instead of the shared key (shared-key bootstrap is interim convenience). The `cak_` is sent
+plaintext once over the existing wss/TLS channel; only the hash is stored server-side; the agent stores it
+locally with restricted ACLs. Revocation via the existing `DELETE /api/machines/:id/keys/:key_id` fails the
+agent closed; on its next bootstrap connect it re-enrolls. The keyed-agent dedup (Task 3) keeps the
+authenticated identity authoritative.
+
+**Verification:** drop a current-build (signed 0.3.0+) agent configured with the shared-key bootstrap →
+it connects, receives a `cak_`, persists it; restart → it authenticates via the `cak_` (relay shows NO
+DEPRECATED-shared-key warning) and `connect_agent_keys` holds exactly one active key for the machine; issue
+is idempotent across reconnects; revoke the key via the admin API → agent rejected, then re-enrolls on next
+bootstrap connect. Reference: `auth/agent_keys.rs`, `api/machine_keys.rs`, `relay/mod.rs:266-309`
+(`validate_agent_api_key`), `.claude/standards/security/credential-handling.md`.
Author	SHA1	Message	Date
Mike Swanson	c286a29b9d	spec: SPEC-016 resolve all 5 open questions (enrollment design decisions) Fold the 2026-06-02 interview decisions into SPEC-016: - Installer wrapper: ship BOTH signed .exe and signed MSI per site - cak_ at-rest storage: DPAPI-machine-encrypted blob in a SYSTEM-ACL'd location - Fingerprint: hex (7F2A), deliberately unlike RMM word-codes - machine_uid: per-tenant scope + hardware-derived salt (survives re-image, separates distinct boxes) + collision-gated activation (template-cloned VMs sharing a hardware UUID drop to pending + alert, need dashboard confirm) - Attended support-code path: unchanged (filename-based, already signing-safe) Open Questions section -> Resolved decisions + a short Remaining-for-planning list (exact hardware salt signal set, WiX/MSI authoring approach). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 09:54:19 -07:00
Mike Swanson	18429f6fe3	spec: add SPEC-016 zero-touch per-site agent enrollment ScreenConnect-class managed enrollment: one signed installer per site, machines self-register on first run and the server mints a per-machine cak_ key bound to a deterministic machine_uid (dedups re-installs). Per-site rotatable enrollment key (long secret + vN (XXXX) fingerprint); rotating blocks new enrollments from old installers, leaves enrolled agents untouched. Auto-approve + new-enrollment/site-move alert. Resolves SPEC-007's signature-vs-appended-config open question: sign the base agent once in CI + per-site signed wrapper that writes site config around the signed bytes (never appended into the PE). Deferred (room reserved): enrollment policy + per-seat licensing, --enroll-key/--site-code/--reassign flag overrides, technician-assisted interactive install. Tracking todo dbfe6a56. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 09:13:59 -07:00
Mike Swanson	3b9e4068c9	docs(roadmap): mark release signing shipped; add signed beta channel as P1-NOW Release-path Azure Trusted Signing and auto-versioning were already shipped with v0.3.0 (stale [ ] -> [x]). Add a new P1/NOW item for a signed beta/test release channel: the auto build-and-test.yml agent artifact is unsigned, so testers can receive unsigned binaries. The beta channel (now implemented in release.yml) closes that gap. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 07:57:04 -07:00
Mike Swanson	87f229509b	ci(release): add signed beta/test release channel Add a `channel: stable \| beta` workflow_dispatch input to release.yml. `stable` is unchanged (byte-for-byte). `beta` produces a Windows agent binary signed by the identical fail-closed Azure Trusted Signing path, but skips the semver bump, changelog, and release commit, and publishes a prerelease-tagged Gitea release (vX.Y.Z-beta.<run_number>) at HEAD. So every binary handed to a tester is signed, not just formal releases. - prerelease tags excluded from stable LAST_TAG detection (both lookups) so a beta tag can't corrupt the next stable version computation - beta tag force-created/pushed -> idempotent on failed-run re-runs - changelog download gated to stable; release prerelease flag plumbed through to the Gitea REST payload Reviewed-by: Code Review Agent (APPROVE WITH NITS; N1 hardened) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 07:56:17 -07:00
Mike Swanson	40c7d860cc	spec(v2-session-core): add Task 9 — cak_ auto-enroll provisioning (TOFU) + shared-key retirement	2026-06-01 14:40:14 -07:00
Mike Swanson	0059b21db6	fix(server): revert migration 008 comment edit — modifying an applied sqlx migration breaks its checksum and crash-loops the server on startup; machines.rs ON CONFLICT fix retained	2026-06-01 10:05:38 -07:00
Mike Swanson	f950511e3e	fix(server): bind machine_uid upsert ON CONFLICT to the partial index (WHERE machine_uid IS NOT NULL) Bare ON CONFLICT (machine_uid) could not bind to migration 008's partial unique index, so no connect_machines row was persisted for any agent reporting a machine_uid. Confirmed live on 172.16.3.30 with a signed 0.3.0 test agent.	2026-06-01 09:50:34 -07:00
Mike Swanson	16017456aa	docs: 2026-05-31 security re-audit (Phase-1 EXIT) + roadmap reconcile /gc-audit --pass=security re-pass over the deployed v0.3.0 code: PASS, 0 CRITICAL/HIGH/MEDIUM/LOW. The 3 relay CRITICALs stay closed (verified in code AND live against the deployed binary), the prior agent-update-TLS HIGH and chat-logging LOW are fixed, and the net-new SPEC-004 surface (machine_uid dedup gate, session reaper/supersede, operator removal API) audits clean — no non-admin removal path, no uid-spoof hijack, no auth-plane crossover. Marks v2 Phase 1 formally exited (secure-session-core Task 8 complete). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 18:19:09 -07:00