SPEC-016 Phase B: agent enrollment (machine_uid, first-run enroll, cak_ storage) #6

Closed
azcomputerguru wants to merge 0 commits from feat/spec-016-phase-b-agent into main

Phase B of SPEC-016 (zero-touch per-site enrollment) — agent side. Builds on the merged Phase A server backend.

What's here

  • machine_uid: hardware-salted (SMBIOS UUID, falling back to board+disk serial), re-image-stable (MachineGuid demoted to a last-resort floor only); deterministic.
  • First-run enrollment client: POST /api/enroll, handles 201/200/202/401/409/429; persists the minted cak_; backoff, no hot-loop, no secrets logged.
  • cak_ at-rest store: DPAPI-machine-encrypted blob in a SYSTEM+Administrators ACL'd ProgramData path; the directory is locked before any secret is written, and writes are atomic.
  • Run-mode wiring: stored cak_ -> connect; none -> enroll -> store -> connect; deprecated api_key fallback only when no cak_.

Review

REQUEST CHANGES -> all 6 findings fixed + focused re-review CONFIRMED CLOSED:

  • H1 re-image-stable machine_uid (no MachineGuid in salted digest)
  • C1 fail-fast (clear "must run as SYSTEM service / SPEC-017" error) instead of silent brick; SYSTEM ACL kept
  • H2 ACL-before-secret ordering; M1 load_cak access-denied vs decrypt-failure; M2 PowerShell timeout; L1 degradation warn

Dependency / not-yet-runnable

The managed agent must run as a SYSTEM service for the SYSTEM-ACL'd cak_ store to be readable end-to-end; that service host is a SEPARATE upcoming spec (SPEC-017, per Mike's decision to split it from SPEC-013). Until it lands, the agent fails fast with a clear message rather than bricking. Enrollment logic is covered by unit tests (part of the 52-test suite), but the live enroll->store->connect cycle can only be integration-tested once the service host exists.

Local verify (Windows host): fmt --check, clippy -D warnings, release build (x86_64-pc-windows-msvc), 52 tests — all green.

Spec: docs/specs/SPEC-016-zero-touch-enrollment.md.

azcomputerguru added 6 commits 2026-06-02 12:58:58 -07:00
Extend the SPEC-004 machine_uid derivation with the locked SPEC-016
hardware salt: combine the Windows MachineGuid with the SMBIOS system
UUID (Win32_ComputerSystemProduct.UUID), falling back to motherboard
serial (Win32_BaseBoard.SerialNumber) + primary disk serial when the
SMBIOS UUID is absent or a degenerate placeholder (all-zeros / all-FFs,
emitted by some OEMs and hypervisor templates).

Signals are read via narrow PowerShell CIM queries (hidden window, no
profile) rather than adding a WMI crate or hand-rolling COM IWbemServices
for two scalar reads. Values are normalized (trim + upper-case) so vendor
case/space drift never perturbs the digest. The combined string is
SHA-256'd into the existing opaque muid_<hex> shape, preserving the wire
identity the relay connect path already reports while making it survive an
OS re-image on the same hardware. Which signal set fed the result is
logged (source label only, never the secret values).

Adds unit tests for derivation determinism + signal-sensitivity,
degenerate-SMBIOS rejection, and signal normalization.
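A minimal std-only sketch of the normalization and degenerate-SMBIOS checks described above. The function names (normalize_signal, is_degenerate_smbios_uuid, usable_smbios_uuid) are hypothetical, not the agent's actual API, and treating an empty value, or one made only of hyphens, as missing is an assumption.

```rust
/// Trim and upper-case a raw hardware signal so vendor case or whitespace
/// drift cannot change the digest; an empty result counts as missing.
fn normalize_signal(raw: &str) -> Option<String> {
    let s = raw.trim().to_uppercase();
    if s.is_empty() {
        None
    } else {
        Some(s)
    }
}

/// True for placeholder SMBIOS UUIDs: all zeros or all Fs once hyphens are
/// ignored (emitted by some OEMs and hypervisor templates).
/// Expects an already-normalized (upper-case) value.
fn is_degenerate_smbios_uuid(normalized: &str) -> bool {
    let hex: Vec<char> = normalized.chars().filter(|c| *c != '-').collect();
    hex.is_empty() || hex.iter().all(|c| *c == '0') || hex.iter().all(|c| *c == 'F')
}

/// Accept an SMBIOS UUID only if it is present and not a placeholder.
fn usable_smbios_uuid(raw: &str) -> Option<String> {
    normalize_signal(raw).filter(|u| !is_degenerate_smbios_uuid(u))
}
```

Rejecting the placeholder here is what lets the derivation fall through to the board+disk serials instead of hashing a value that many machines share.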

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add enrollment_key + site_code to EmbeddedConfig and the resolved Config
alongside the existing labels, and add department/device_type label fields
(SPEC-007 AgentStatus parity). The legacy api_key is retained but made
optional/defaulted so a SPEC-016 site installer can carry only the
enrollment credentials; existing pre-enrollment installers still parse.

The enrollment fields are #[serde(skip)] on Config so they are never
written to the on-disk TOML (install-time material only); apply_enrollment_env
layers them from GURUCONNECT_ENROLLMENT_KEY / GURUCONNECT_SITE_CODE on the
file and env load paths. The embedded path carries them from the install
blob. Config delivery itself (signed wrapper) is Phase C and unchanged here.
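A sketch of the env layering, assuming the lookup is injected as a closure so tests need not touch the process environment. EnrollmentFields is a hypothetical stand-in for the two #[serde(skip)] fields on Config, and ignoring blank values is an assumption rather than documented behavior.

```rust
/// Hypothetical stand-in for the two install-time fields on Config.
#[derive(Debug, Default, PartialEq)]
struct EnrollmentFields {
    enrollment_key: Option<String>,
    site_code: Option<String>,
}

/// Layer enrollment credentials from the environment onto `fields`.
/// `lookup` is injected for testability; production code would pass
/// `|k| std::env::var(k).ok()`. Blank values are ignored (assumption).
fn apply_enrollment_env<F>(fields: &mut EnrollmentFields, lookup: F)
where
    F: Fn(&str) -> Option<String>,
{
    let non_blank = |name: &str| {
        lookup(name)
            .map(|v| v.trim().to_string())
            .filter(|v| !v.is_empty())
    };
    if let Some(key) = non_blank("GURUCONNECT_ENROLLMENT_KEY") {
        fields.enrollment_key = Some(key);
    }
    if let Some(code) = non_blank("GURUCONNECT_SITE_CODE") {
        fields.site_code = Some(code);
    }
}
```

Because the fields are never serialized, nothing written back to the TOML can leak the enrollment key; the environment (or the embedded install blob) is the only source.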

Add Config::https_base() deriving the REST API base (https://host[:port])
from the wss:// server_url so the enroll client and the persistent
transport share one authority.
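One way https_base() could work, as a std-only sketch: swap wss:// for https:// and keep only the authority (host plus optional port), dropping any path, query, or fragment. Rejecting non-wss schemes and the error strings are assumptions; userinfo and upper-case schemes are not handled.

```rust
/// Derive the REST base ("https://host[:port]") from a wss:// relay URL so the
/// enroll client and the persistent transport share one authority.
fn https_base(server_url: &str) -> Result<String, String> {
    let rest = server_url
        .strip_prefix("wss://")
        .ok_or_else(|| format!("server_url must start with wss://, got {server_url:?}"))?;
    // The authority ends at the first path, query, or fragment delimiter.
    let authority = rest
        .split(|c: char| c == '/' || c == '?' || c == '#')
        .next()
        .unwrap_or("");
    if authority.is_empty() {
        return Err(format!("server_url has no host: {server_url:?}"));
    }
    Ok(format!("https://{authority}"))
}
```

For example, wss://relay.example.com:8443/ws/agent maps to https://relay.example.com:8443, to which the enroll client would append /api/enroll.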

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Store the per-machine cak_ with BOTH layers Mike locked: DPAPI-machine
encryption (CryptProtectData with CRYPTPROTECT_LOCAL_MACHINE — a copied
blob is inert off the box) inside a SYSTEM/Administrators-only ACL'd file
at %ProgramData%\GuruConnect\credentials\agent.cak. The directory + file
ACL is hardened via icacls (/inheritance:r + grant to the well-known SIDs
*S-1-5-18 and *S-1-5-32-544, locale-independent) — auditable, with far
less unsafe FFI than building a registry-key security descriptor by hand.
Co-locates with the existing %ProgramData%\GuruConnect config/seed dir.
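The icacls hardening can be expressed as a std::process::Command built with the well-known SIDs. This sketch only constructs the command (it does not run it); the helper name and the exact grant (full control, with (OI)(CI) inheritance on the directory) are assumptions about what the commit actually passes.

```rust
use std::path::Path;
use std::process::Command;

/// Well-known SIDs in icacls "*SID" form, which is locale-independent.
const SYSTEM_SID: &str = "*S-1-5-18";
const ADMINISTRATORS_SID: &str = "*S-1-5-32-544";

/// Build (but do not run) an icacls command that strips inherited ACEs and
/// grants full control to SYSTEM and Administrators only. Directories also get
/// object/container inheritance so files created inside pick up the same ACL.
fn icacls_lock_command(path: &Path, is_dir: bool) -> Command {
    let inherit = if is_dir { "(OI)(CI)" } else { "" };
    let mut cmd = Command::new("icacls");
    cmd.arg(path)
        .arg("/inheritance:r")
        .arg("/grant:r")
        .arg(format!("{SYSTEM_SID}:{inherit}F"))
        .arg(format!("{ADMINISTRATORS_SID}:{inherit}F"));
    cmd
}
```

On Windows the caller would run it with icacls_lock_command(&dir, true).status() and treat a non-zero exit as a hard failure before writing any secret.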

Provides store_cak / load_cak / clear_cak. store_cak writes atomically
(temp file + rename in the locked dir). load_cak treats a present-but-
undecryptable blob as a hard error (tamper / cross-machine copy) rather
than silently re-enrolling over it. The plaintext is never logged; the
transient plaintext copy is scrubbed after encryption. DPAPI output blobs
are LocalFree'd. Enables the Win32_Security_Cryptography feature of the windows crate.

Round-trip unit tests cover encrypt/decrypt recovery across lengths and
that a tampered blob fails to decrypt (DPAPI authenticates its blobs).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
New enroll module: on a managed agent with no stored cak_ but with
enrollment_key + site_code, POST machine_uid + hostname + labels to
<https-base>/api/enroll and persist the minted cak_. Handles every Phase A
status code distinctly:
  - 201 new / 200 reuse -> persist cak_ (DPAPI store) and connect
  - 202 collision_pending -> log "pending operator confirmation", slow
    re-check loop (no key issued; cannot connect until confirmed)
  - 401 ENROLL_REJECTED / 409 ENROLL_SITE_CONFLICT -> distinct actionable
    errors, long backoff (won't fix without operator action, but recovers
    automatically once it does) — no tight loop
  - 429 -> honor Retry-After, short backoff
  - network / 5xx / decode -> short backoff
The enrollment_key and cak_ are never logged. Uses the existing reqwest
client and the update path's TLS posture (rustls; dev-insecure only in
debug + opt-in). Wire-contract unit tests pin the request shape against
the server's EnrollRequest/EnrollLabels and decode active + pending bodies.
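The status handling above reduces to a pure classification step that a retry loop can act on. In this sketch the enum, the backoff durations, the Retry-After cap, and treating unlisted statuses like 5xx are all hypothetical; only delta-seconds Retry-After values are parsed (an HTTP-date falls back to the short backoff), and a status of None stands for a network or decode error.

```rust
use std::time::Duration;

/// What the enrollment loop should do after one POST /api/enroll attempt.
#[derive(Debug, PartialEq)]
enum EnrollAction {
    /// 201 new / 200 reuse: persist the minted cak_ and connect.
    Persist,
    /// 202 collision_pending: no key issued; re-check slowly.
    Pending(Duration),
    /// 401 ENROLL_REJECTED: needs operator action; long backoff.
    Rejected(Duration),
    /// 409 ENROLL_SITE_CONFLICT: needs operator action; long backoff.
    SiteConflict(Duration),
    /// 429, 5xx, other statuses, network or decode errors: retry soon.
    Retry(Duration),
}

// Hypothetical intervals; the PR does not state the real values.
const PENDING_RECHECK: Duration = Duration::from_secs(300);
const OPERATOR_BACKOFF: Duration = Duration::from_secs(3600);
const SHORT_BACKOFF: Duration = Duration::from_secs(30);
const MAX_RETRY_AFTER: Duration = Duration::from_secs(3600);

/// `status` is None when there was no usable response (network or decode error).
fn classify(status: Option<u16>, retry_after: Option<&str>) -> EnrollAction {
    match status {
        Some(200) | Some(201) => EnrollAction::Persist,
        Some(202) => EnrollAction::Pending(PENDING_RECHECK),
        Some(401) => EnrollAction::Rejected(OPERATOR_BACKOFF),
        Some(409) => EnrollAction::SiteConflict(OPERATOR_BACKOFF),
        Some(429) => {
            // Honor delta-seconds Retry-After, capped; anything else falls back.
            let wait = retry_after
                .and_then(|v| v.trim().parse::<u64>().ok())
                .map(|secs| Duration::from_secs(secs).min(MAX_RETRY_AFTER))
                .unwrap_or(SHORT_BACKOFF);
            EnrollAction::Retry(wait)
        }
        _ => EnrollAction::Retry(SHORT_BACKOFF),
    }
}
```

Keeping 401 and 409 as separate variants lets the caller log distinct, actionable messages while still sleeping, so the agent recovers on its own once an operator fixes the site.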

main.rs run-mode wiring: before a managed agent connects, resolve the
operating credential by precedence — stored cak_ (steady state, no
network) -> first-run enrollment -> DEPRECATED legacy api_key (transition
only, logged at WARNING) -> error. The relay already accepts the cak_ as
the api_key query param, so the persistent transport authenticates with it
unchanged. Attended/support-code and viewer paths are untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
H1: derive machine_uid from the durable hardware salt ALONE (SMBIOS UUID, or
board+disk serial) plus a fixed namespace, so it survives an OS re-image (which
regenerates MachineGuid). MachineGuid is demoted to a last-resort signal used
only when no hardware salt is readable (a volatile floor: stable across reboots
but not across a re-image). Re-image
stability proven by salted_uid_is_reimage_stable_independent_of_machine_guid.
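A sketch of the H1 salt selection chain, assuming the inputs have already been normalized and degenerate values filtered out. SaltSource, the namespace string, the field separators, requiring both board and disk serials, and returning None when nothing is readable are hypothetical; the SHA-256 step into muid_<hex> is omitted because it needs a hashing crate.

```rust
/// Which signal set fed the salt (logged as a label only, never the values).
#[derive(Debug, Clone, Copy, PartialEq)]
enum SaltSource {
    SmbiosUuid,
    BoardDisk,
    MachineGuidOnly,
}

/// Hypothetical fixed namespace mixed into every salt.
const MUID_NAMESPACE: &str = "guruconnect-muid";

/// Pick the most durable available salt. MachineGuid is consulted only when no
/// hardware salt is readable, so it never enters the hardware-salted input and
/// an OS re-image (new MachineGuid) leaves the result unchanged.
fn select_salt(
    smbios_uuid: Option<&str>,
    board_serial: Option<&str>,
    disk_serial: Option<&str>,
    machine_guid: Option<&str>,
) -> Option<(SaltSource, String)> {
    if let Some(uuid) = smbios_uuid {
        return Some((SaltSource::SmbiosUuid, format!("{MUID_NAMESPACE}|smbios|{uuid}")));
    }
    if let (Some(board), Some(disk)) = (board_serial, disk_serial) {
        return Some((
            SaltSource::BoardDisk,
            format!("{MUID_NAMESPACE}|board|{board}|disk|{disk}"),
        ));
    }
    if let Some(guid) = machine_guid {
        // Stand-in for the L1 warn! breadcrumb; no signal values are logged.
        eprintln!("warn: machine_uid degraded to MachineGuid-only (not re-image-stable)");
        return Some((
            SaltSource::MachineGuidOnly,
            format!("{MUID_NAMESPACE}|machineguid|{guid}"),
        ));
    }
    None
}
```

The labelled separators keep the three shapes from ever producing the same input string, and the SaltSource label is what gets logged.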

H2: in store_cak, lock the directory ACL BEFORE any secret bytes are written;
the temp file is created inside the already-locked dir, then renamed. No
ciphertext ever exists at an inherited/world-readable path. Ordering made an
explicit precondition, not an unstated inheritance assumption.
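A std-only sketch of the H2 ordering: create and lock the directory, then write ciphertext to a temp file inside it, flush, rename it over the final name, and read it back (the C1 verification). lock_dir is a hypothetical hook standing in for the icacls step, the blob is assumed to already be DPAPI output, and the .tmp naming is an assumption.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Persist an already-encrypted blob as `dir/name`.
/// Ordering: lock the directory ACL first, then write the temp file inside the
/// locked directory, flush it, rename it over the final name, and read it back.
/// `lock_dir` is a hypothetical hook standing in for the icacls hardening.
fn store_blob<L>(dir: &Path, name: &str, blob: &[u8], lock_dir: L) -> io::Result<PathBuf>
where
    L: FnOnce(&Path) -> io::Result<()>,
{
    fs::create_dir_all(dir)?;
    lock_dir(dir)?; // No secret bytes exist on disk before this succeeds.

    let final_path = dir.join(name);
    let tmp_path = dir.join(format!("{name}.tmp"));
    {
        let mut tmp = OpenOptions::new()
            .write(true)
            .create(true)
            .truncate(true)
            .open(&tmp_path)?;
        tmp.write_all(blob)?;
        tmp.sync_all()?; // Flushed before it becomes visible under the final name.
    }
    fs::rename(&tmp_path, &final_path)?;

    // Read-back check: fail now if this context cannot read its own store.
    if fs::read(&final_path)? != blob {
        return Err(io::Error::new(io::ErrorKind::Other, "read-back mismatch"));
    }
    Ok(final_path)
}
```

std::fs::rename replaces an existing target on both Unix and Windows, so the final name should only ever point at a fully written blob; if the lock hook fails, nothing secret has been written.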

M1: load_cak now returns a LoadCakError enum distinguishing Io (incl.
PermissionDenied — operational) from Decrypt (the real tamper/wrong-machine
signal). Only a successful READ whose DPAPI decrypt fails hard-stops.
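The M1 split can be sketched with injected read and decrypt steps so it runs without DPAPI. Mapping NotFound to Ok(None) (no stored cak_, so enrollment proceeds) is an assumption; PermissionDenied and other I/O errors stay in Io, and only a decrypt failure on bytes that were actually read is flagged as a tamper signal. The load_cak_with name is hypothetical.

```rust
use std::fmt;
use std::io;

/// Why a present cak_ could not be loaded.
#[derive(Debug)]
enum LoadCakError {
    /// Operational I/O failure, including PermissionDenied.
    Io(io::Error),
    /// The blob was read but DPAPI refused to decrypt it.
    Decrypt(String),
}

impl LoadCakError {
    /// Only a decrypt failure on bytes that were actually read is a
    /// tamper / wrong-machine signal that should hard-stop.
    fn is_tamper_signal(&self) -> bool {
        matches!(self, LoadCakError::Decrypt(_))
    }
}

impl fmt::Display for LoadCakError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LoadCakError::Io(e) => write!(f, "cak_ store I/O error: {e}"),
            LoadCakError::Decrypt(why) => write!(f, "cak_ blob failed to decrypt: {why}"),
        }
    }
}

/// Read then decrypt; a missing file means "no stored cak_" (assumption).
fn load_cak_with<R, D>(read: R, decrypt: D) -> Result<Option<String>, LoadCakError>
where
    R: FnOnce() -> io::Result<Vec<u8>>,
    D: FnOnce(&[u8]) -> Result<String, String>,
{
    let blob = match read() {
        Ok(bytes) => bytes,
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(None),
        Err(e) => return Err(LoadCakError::Io(e)),
    };
    decrypt(&blob).map(Some).map_err(LoadCakError::Decrypt)
}
```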

M2: the PowerShell SMBIOS/board/disk shell-out is spawned and waited on with a
10s wall-clock bound; on timeout the child is killed and the signal is treated
as missing (falls back through the chain), never panics. Keeps
CREATE_NO_WINDOW -NonInteractive -NoProfile.
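The M2 bound can be implemented with only std by polling try_wait against a deadline and killing the child on expiry. This is a sketch: run_with_timeout and the 50 ms poll interval are hypothetical, a non-zero exit is treated as a missing signal, and on Windows the real code would also set CREATE_NO_WINDOW (via std::os::windows::process::CommandExt::creation_flags) and pass -NonInteractive -NoProfile to powershell.

```rust
use std::io::Read;
use std::process::{Command, Stdio};
use std::thread;
use std::time::{Duration, Instant};

/// Run `cmd` with a wall-clock bound. Returns its stdout on a successful exit;
/// on spawn failure, non-zero exit, or timeout returns None so the caller
/// treats the signal as missing and falls through the chain. Never panics.
fn run_with_timeout(mut cmd: Command, timeout: Duration) -> Option<String> {
    let mut child = cmd
        .stdin(Stdio::null())
        .stdout(Stdio::piped())
        .stderr(Stdio::null())
        .spawn()
        .ok()?;
    let deadline = Instant::now() + timeout;
    loop {
        match child.try_wait() {
            Ok(Some(status)) => {
                if !status.success() {
                    return None;
                }
                let mut out = String::new();
                child.stdout.take()?.read_to_string(&mut out).ok()?;
                return Some(out);
            }
            // Still running and within budget: poll again shortly.
            Ok(None) if Instant::now() < deadline => thread::sleep(Duration::from_millis(50)),
            // Timed out, or try_wait itself failed: kill and reap the child.
            _ => {
                let _ = child.kill();
                let _ = child.wait();
                return None;
            }
        }
    }
}
```

The agent would call it with Duration::from_secs(10). Stdout is read only after exit, which suits the few bytes a scalar CIM query prints; a command producing more than a pipe buffer's worth would block and hit the timeout.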

L1: warn! breadcrumb when the salted derivation degrades to MachineGuid-only,
so the server-side collision-gate operator has a clue. No secret values logged.

C1: keep the SYSTEM+Administrators ACL (Option A target). store_cak now does a
read-back verification immediately after writing and fails at ENROLL time if
this context cannot read its own store; resolve_agent_credential fails fast with
an actionable SPEC-017 message on an access-denied store instead of silently
re-enrolling/bricking. Guarded comment notes this is satisfied once the SYSTEM
service host lands.
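The credential precedence from the main.rs wiring, with the C1 fail-fast folded in, can be sketched as a small pure function. StoreState, Credential, the can_enroll flag, and the error text are hypothetical; eprintln! stands in for the agent's warn! logging, and letting enrollment errors propagate instead of falling back to the legacy api_key is an assumption.

```rust
/// Hypothetical result of trying to read the stored cak_.
#[derive(Debug)]
enum StoreState {
    Present(String),
    Absent,
    AccessDenied,
}

#[derive(Debug, PartialEq)]
enum Credential {
    Cak(String),
    LegacyApiKey(String),
}

/// Resolve the operating credential: stored cak_ -> first-run enrollment ->
/// deprecated legacy api_key -> error. An unreadable store fails fast instead
/// of re-enrolling over it.
fn resolve_agent_credential<E>(
    store: StoreState,
    can_enroll: bool, // enrollment_key and site_code both present
    enroll: E,
    legacy_api_key: Option<&str>,
) -> Result<Credential, String>
where
    E: FnOnce() -> Result<String, String>,
{
    match store {
        StoreState::Present(cak) => return Ok(Credential::Cak(cak)),
        StoreState::AccessDenied => {
            return Err("cak_ store is not readable in this context; \
                        the agent must run as the SYSTEM service"
                .to_string())
        }
        StoreState::Absent => {}
    }
    if can_enroll {
        return enroll().map(Credential::Cak);
    }
    if let Some(key) = legacy_api_key {
        // Stand-in for warn!: the key itself is never logged.
        eprintln!("WARNING: using DEPRECATED legacy api_key; reinstall with enrollment credentials");
        return Ok(Credential::LegacyApiKey(key.to_string()));
    }
    Err("no cak_, no enrollment credentials, and no legacy api_key".to_string())
}
```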

Deferred items (clear_cak placeholder, legacy api_key path) left as-is.

Verification on x86_64-pc-windows-msvc: cargo fmt --check clean, clippy
-D warnings clean, release build OK, 52 tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
spec: add SPEC-017 end-user (sub-user) remote access
Some checks failed
Build and Test / Build Agent (Windows) (pull_request) Successful in 10m54s
Build and Test / Build Server (Linux) (push) Has been cancelled
Build and Test / Build Agent (Windows) (push) Has started running
Build and Test / Security Audit (push) Has been cancelled
Build and Test / Build Summary (push) Has been cancelled
Build and Test / Build Server (Linux) (pull_request) Successful in 15m39s
Build and Test / Security Audit (pull_request) Successful in 5m54s
Build and Test / Build Summary (pull_request) Successful in 36s
4c49b73a71

Phase B agent commits (d0b8db0..367906b) are already on main (fast-forwarded); SPEC-018 + the SPEC-017->018 ref fix landed separately on main. Closing as already-merged.

azcomputerguru closed this pull request 2026-06-02 13:14:54 -07:00

Pull request closed

No Reviewers
1 Participant
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: azcomputerguru/guru-connect#6