guru-connect/agent/src/credential_store.rs
Mike Swanson 55b9c97b28
fix(agent): point Phase B fail-fast guard at SPEC-018
The SPEC-016 Phase B credential-store guard referenced "SPEC-017" for the
forthcoming SYSTEM service host, but 017 is now Mike's end-user-access
spec; the service host is SPEC-018. Comment + error-string text only, no
logic change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 13:13:13 -07:00


//! At-rest storage for the per-machine operating credential (`cak_`).
//!
//! SPEC-016 Phase B, item 4 + §Security. The `cak_` minted by `/api/enroll` is
//! the high-sensitivity, per-machine, independently-revocable operating
//! credential. It is stored with **two independent layers** (Mike's locked
//! decision — "BOTH layers"):
//!
//! 1. **DPAPI-machine encryption** (`CryptProtectData` with
//! `CRYPTPROTECT_LOCAL_MACHINE`): the on-disk bytes are a DPAPI blob keyed to
//! THIS machine. A copied/exfiltrated file is inert on any other box — DPAPI
//! machine keys do not leave the machine.
//! 2. **SYSTEM/Administrators-only ACL** on the containing directory + file: a
//! non-admin user cannot even read the ciphertext. Inheritance is removed and
//! only `SYSTEM` and `BUILTIN\Administrators` are granted full control.
//!
//! Local admin / SYSTEM can always recover the value — that is accepted (SPEC-016
//! §Security): the blast radius of one leaked `cak_` is a single, independently
//! revocable machine.
//!
//! Storage location (chosen over an HKLM value): a file under
//! `%ProgramData%\GuruConnect\credentials\agent.cak`. Rationale — the agent
//! already keeps its config and the `machine_uid` fallback seed under
//! `%ProgramData%\GuruConnect`, so co-locating keeps a single protected
//! directory; and a directory/file ACL applied via `icacls` is auditable with far
//! less unsafe FFI than building a registry-key security descriptor by hand. Both
//! storage shapes are explicitly permitted by the spec.
//!
//! SECURITY: the plaintext `cak_` is NEVER logged. Errors describe the operation,
//! not the value.
#![cfg(windows)]

use anyhow::{anyhow, Context, Result};
use std::path::PathBuf;
use thiserror::Error;

/// Failure classes for [`load_cak`], so callers can distinguish an *operational*
/// problem (the file exists but this process cannot open/read it — e.g. running in
/// the wrong security context against a SYSTEM-only-ACL'd store) from the real
/// *tamper / wrong-machine* signal (the file was read successfully but DPAPI
/// decryption failed).
///
/// The distinction matters for the run-mode resolver (`main.rs`):
/// - [`LoadCakError::Io`] is recoverable/actionable — log it and STOP (do not
/// silently re-enroll over a store we simply can't read in this context).
/// - [`LoadCakError::Decrypt`] is a hard tamper signal — STOP, do not re-enroll.
#[derive(Debug, Error)]
pub enum LoadCakError {
    /// The store path could not be resolved (e.g. `%ProgramData%` unset).
    #[error("could not resolve credential store path: {0}")]
    Path(String),

    /// An IO/open/read error reaching the stored blob, INCLUDING
    /// `PermissionDenied` (the running context lacks rights to the SYSTEM-only
    /// store). Operational, not a tamper signal.
    #[error("credential store is present but could not be read in this context: {source}")]
    Io {
        /// Whether this was specifically an access-denied error (drives the
        /// run-mode fail-fast guard in `main.rs`).
        permission_denied: bool,
        source: std::io::Error,
    },

    /// The blob was read successfully but DPAPI decryption FAILED: the real
    /// tamper / wrong-machine / corruption signal. A hard stop; never re-enroll.
    #[error("stored credential failed to decrypt (wrong machine, tampered, or corrupted): {0}")]
    Decrypt(String),
}

/// Directory holding the protected credential file.
fn credentials_dir() -> Result<PathBuf> {
    let program_data =
        std::env::var("ProgramData").context("ProgramData environment variable is not set")?;
    Ok(PathBuf::from(program_data)
        .join("GuruConnect")
        .join("credentials"))
}

/// Full path to the DPAPI-encrypted `cak_` blob.
fn cak_path() -> Result<PathBuf> {
    Ok(credentials_dir()?.join("agent.cak"))
}

/// Persist `cak` encrypted at rest.
///
/// Ordering is security-critical (H2 — TOCTOU): the directory ACL is locked
/// BEFORE any secret bytes touch the filesystem, and the temp file is written
/// INSIDE the already-locked directory, so no ciphertext ever exists at a path
/// carrying an inherited (potentially world-readable) ACL:
///
/// 1. `create_dir_all(dir)` — ensure the directory exists.
/// 2. `lock_down_acl(dir)` — remove inherited ACEs and grant SYSTEM +
/// Administrators full control, made inheritable `(OI)(CI)` so children
/// created afterward are covered. This is an explicit precondition for the
/// write that follows — NOT an unstated inheritance assumption.
/// 3. DPAPI-machine-encrypt the plaintext.
/// 4. Write the ciphertext to a temp file inside the now-locked directory, then
/// rename over the target (atomic-ish replace).
/// 5. `lock_down_acl(file)` — assert the file's own ACL (belt-and-suspenders; the
/// file already inherits the directory's restrictive ACEs).
/// 6. C1 read-back: immediately attempt [`load_cak`] to PROVE the running
/// security context can read its own store. If it cannot (e.g. a non-SYSTEM
/// run wrote a SYSTEM-only store it can no longer read), fail HERE at enroll
/// time with an actionable error — rather than silently bricking on the next
/// boot when the steady-state path tries to load it.
///
/// Returns an error (never logs the plaintext) on any failure so the caller can
/// surface it / retry.
pub fn store_cak(cak: &str) -> Result<()> {
    // 1 + 2: lock the directory ACL BEFORE writing any secret (H2 / TOCTOU).
    let dir = credentials_dir()?;
    std::fs::create_dir_all(&dir)
        .with_context(|| format!("failed to create credentials dir {dir:?}"))?;
    lock_down_acl(&dir).context("failed to restrict credentials directory ACL")?;

    // 3: encrypt only after the destination directory is locked down.
    let ciphertext = dpapi_protect(cak.as_bytes()).context("DPAPI encryption of cak_ failed")?;

    // 4: write the temp file INSIDE the already-locked directory, then rename.
    let path = cak_path()?;
    let tmp = path.with_extension("cak.tmp");
    std::fs::write(&tmp, &ciphertext)
        .with_context(|| format!("failed to write temp credential file {tmp:?}"))?;
    std::fs::rename(&tmp, &path)
        .with_context(|| format!("failed to place credential file {path:?}"))?;

    // 5: assert the file ACL too (the file already inherits the dir's ACEs).
    lock_down_acl(&path).context("failed to restrict credential file ACL")?;

    // 6: C1 read-back: confirm THIS context can read back what it just wrote.
    // Catches the "wrote a SYSTEM-only store from a non-SYSTEM context" footgun at
    // enroll time instead of as a silent brick on the next launch.
    match load_cak() {
        Ok(Some(_)) => {
            tracing::info!("[ENROLL] stored per-machine credential (encrypted at rest)");
            Ok(())
        }
        Ok(None) => Err(anyhow!(
            "stored the credential but read-back returned nothing — refusing to proceed \
             with an unverifiable credential store"
        )),
        Err(LoadCakError::Io {
            permission_denied: true,
            ..
        }) => Err(anyhow!(
            "[ENROLL] wrote the credential store but cannot read it back in THIS security \
             context (access denied). The store is ACL'd to SYSTEM + Administrators by \
             design; the managed agent must run as the GuruConnect SYSTEM service (see \
             SPEC-018) to read it. Refusing to leave an unreadable store behind."
        )),
        Err(e) => Err(anyhow::Error::new(e)
            .context("stored the credential but the immediate read-back verification failed")),
    }
}

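// ---------------------------------------------------------------------------
// Illustrative sketch (not part of the original module)
// ---------------------------------------------------------------------------
// Steps 4 and 6 of `store_cak` (temp file in the target's own directory, rename
// over the target, immediate read-back) can be tried in isolation. This is a
// minimal sketch under stated assumptions: `sketch_replace_and_verify` is a
// hypothetical helper, the directory is supplied by the caller, the bytes are
// written as-is, and no DPAPI or ACL work is performed.
#[allow(dead_code)]
fn sketch_replace_and_verify(dir: &std::path::Path, blob: &[u8]) -> std::io::Result<()> {
    let target = dir.join("agent.cak");
    // `with_extension` swaps the "cak" extension for "cak.tmp", giving
    // "agent.cak.tmp" next to the target, so the temp file never sits in a
    // directory other than the one the caller prepared.
    let tmp = target.with_extension("cak.tmp");
    std::fs::write(&tmp, blob)?;
    std::fs::rename(&tmp, &target)?;

    // Read-back: show that this process can read what it just wrote.
    let back = std::fs::read(&target)?;
    if back != blob {
        return Err(std::io::Error::new(
            std::io::ErrorKind::InvalidData,
            "read-back did not match the bytes written",
        ));
    }
    Ok(())
}
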
/// Load and decrypt the stored `cak_`, or `Ok(None)` if no credential is stored.
///
/// Error classification (M1) — the caller MUST treat these differently:
/// - `Ok(None)` -> no store yet (NotFound or empty); enroll is fine.
/// - [`LoadCakError::Io`] -> the store exists but is unreadable in this
/// context (open/read error, INCLUDING access-denied). Operational; the caller
/// logs it and STOPS — it must NOT silently re-enroll over a store it merely
/// cannot read here.
/// - [`LoadCakError::Decrypt`] -> the bytes were read but DPAPI decryption
/// FAILED (wrong machine / tampered / corrupted). A hard tamper signal; STOP.
///
/// Only a successful READ whose decrypt fails is the tamper signal — an IO or
/// permission error is never conflated with tamper.
pub fn load_cak() -> std::result::Result<Option<String>, LoadCakError> {
    let path = cak_path().map_err(|e| LoadCakError::Path(e.to_string()))?;

    let ciphertext = match std::fs::read(&path) {
        Ok(bytes) => bytes,
        Err(e) if e.kind() == std::io::ErrorKind::NotFound => return Ok(None),
        Err(e) => {
            let permission_denied = e.kind() == std::io::ErrorKind::PermissionDenied;
            return Err(LoadCakError::Io {
                permission_denied,
                source: e,
            });
        }
    };
    if ciphertext.is_empty() {
        return Ok(None);
    }

    // Reaching here means the READ succeeded, so a decrypt failure now IS the real
    // tamper / wrong-machine signal (never conflated with an IO/permission error).
    let plaintext =
        dpapi_unprotect(&ciphertext).map_err(|e| LoadCakError::Decrypt(e.to_string()))?;
    let cak = String::from_utf8(plaintext)
        .map_err(|e| LoadCakError::Decrypt(format!("decrypted bytes were not valid UTF-8: {e}")))?;
    if cak.is_empty() {
        return Ok(None);
    }
    Ok(Some(cak))
}

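// ---------------------------------------------------------------------------
// Illustrative sketch (not part of the original module)
// ---------------------------------------------------------------------------
// The M1 classification documented on `load_cak` implies a fixed set of caller
// reactions. The real run-mode resolver lives in `main.rs` and is not shown in
// this file, so everything below is a hypothetical stand-in that only mirrors
// the documented rules: `SketchOutcome`, `SketchAction`, `sketch_classify_read`
// and `sketch_next_action` are assumed names, and `decrypt_ok` stands in for the
// result of the DPAPI call.
#[allow(dead_code)]
#[derive(Debug, PartialEq, Eq)]
enum SketchOutcome {
    /// No store yet (file absent or empty): enrolling is fine.
    Missing,
    /// Read and decrypt both succeeded.
    Loaded,
    /// The store exists but could not be read in this context.
    Unreadable { permission_denied: bool },
    /// The read succeeded but decryption failed: the tamper signal.
    DecryptFailed,
}

#[allow(dead_code)]
#[derive(Debug, PartialEq, Eq)]
enum SketchAction {
    Enroll,
    RunWithStoredCredential,
    StopOperational,
    StopTamper,
}

/// Mirrors the read step of `load_cak`: only a successful, non-empty read can
/// ever be classified as a decrypt failure.
#[allow(dead_code)]
fn sketch_classify_read(read: std::io::Result<Vec<u8>>, decrypt_ok: bool) -> SketchOutcome {
    use std::io::ErrorKind;
    match read {
        Err(e) if e.kind() == ErrorKind::NotFound => SketchOutcome::Missing,
        Err(e) => SketchOutcome::Unreadable {
            permission_denied: e.kind() == ErrorKind::PermissionDenied,
        },
        Ok(bytes) if bytes.is_empty() => SketchOutcome::Missing,
        Ok(_) if decrypt_ok => SketchOutcome::Loaded,
        Ok(_) => SketchOutcome::DecryptFailed,
    }
}

/// Both error classes stop; neither falls through to a silent re-enroll.
#[allow(dead_code)]
fn sketch_next_action(outcome: &SketchOutcome) -> SketchAction {
    match outcome {
        SketchOutcome::Missing => SketchAction::Enroll,
        SketchOutcome::Loaded => SketchAction::RunWithStoredCredential,
        SketchOutcome::Unreadable { .. } => SketchAction::StopOperational,
        SketchOutcome::DecryptFailed => SketchAction::StopTamper,
    }
}
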
/// Remove the stored credential (e.g. on revocation / forced re-enroll).
/// Succeeds if the file is already absent.
///
/// Part of the store/load/clear API the spec requires (SPEC-016 item 4). Not yet
/// called from a code path — the relay-side `cak_` revocation / forced re-enroll
/// flow that drives it is the deferred SPEC-016 Phase B/D server work (the
/// `TODO(SPEC-016 Phase B/D): consider revoking existing cak_ on collision` note
/// in `server/src/api/enroll.rs`) — so it is retained as part of the complete
/// store API and explicitly allowed dead until that server work lands.
#[allow(dead_code)]
pub fn clear_cak() -> Result<()> {
    let path = cak_path()?;
    match std::fs::remove_file(&path) {
        Ok(()) => {
            tracing::info!("[ENROLL] cleared stored per-machine credential");
            Ok(())
        }
        Err(e) if e.kind() == std::io::ErrorKind::NotFound => Ok(()),
        Err(e) => Err(e).with_context(|| format!("failed to remove {path:?}")),
    }
}

// ---------------------------------------------------------------------------
// DPAPI (machine scope)
// ---------------------------------------------------------------------------
/// DPAPI-machine-encrypt `plaintext` into a self-contained blob.
fn dpapi_protect(plaintext: &[u8]) -> Result<Vec<u8>> {
    use windows::Win32::Security::Cryptography::{
        CryptProtectData, CRYPTPROTECT_LOCAL_MACHINE, CRYPT_INTEGER_BLOB,
    };

    // CryptProtectData requires a mutable input pointer in the struct, though it
    // does not modify the bytes; copy into a local Vec to get a *mut without
    // aliasing the caller's slice.
    let mut input = plaintext.to_vec();
    let in_blob = CRYPT_INTEGER_BLOB {
        cbData: u32::try_from(input.len()).context("plaintext too large for DPAPI")?,
        pbData: input.as_mut_ptr(),
    };
    let mut out_blob = CRYPT_INTEGER_BLOB::default();

    // SAFETY: in_blob points at a valid, sized buffer; out_blob is owned here and
    // its pbData is allocated by DPAPI (freed via LocalFree below). No prompt
    // struct / entropy / reserved args.
    let protect_result = unsafe {
        CryptProtectData(
            &in_blob,
            windows::core::PCWSTR::null(),
            None,
            None,
            None,
            CRYPTPROTECT_LOCAL_MACHINE,
            &mut out_blob,
        )
    };

    // Best-effort scrub of the transient plaintext copy. Done before the error
    // check so the copy is also cleared when CryptProtectData fails.
    input.iter_mut().for_each(|b| *b = 0);

    protect_result.context("CryptProtectData failed")?;
    copy_and_free_blob(&out_blob)
        .ok_or_else(|| anyhow!("CryptProtectData returned an empty/invalid blob"))
}

/// DPAPI-decrypt a blob previously produced by [`dpapi_protect`] on this machine.
fn dpapi_unprotect(ciphertext: &[u8]) -> Result<Vec<u8>> {
    use windows::Win32::Security::Cryptography::{
        CryptUnprotectData, CRYPTPROTECT_LOCAL_MACHINE, CRYPT_INTEGER_BLOB,
    };

    let mut input = ciphertext.to_vec();
    let in_blob = CRYPT_INTEGER_BLOB {
        cbData: u32::try_from(input.len()).context("ciphertext too large for DPAPI")?,
        pbData: input.as_mut_ptr(),
    };
    let mut out_blob = CRYPT_INTEGER_BLOB::default();

    // SAFETY: as in dpapi_protect: valid sized input, owned output freed below.
    unsafe {
        CryptUnprotectData(
            &in_blob,
            None,
            None,
            None,
            None,
            CRYPTPROTECT_LOCAL_MACHINE,
            &mut out_blob,
        )
        .context("CryptUnprotectData failed")?;
    }

    copy_and_free_blob(&out_blob)
        .ok_or_else(|| anyhow!("CryptUnprotectData returned an empty/invalid blob"))
}

/// Copy a DPAPI output blob into an owned `Vec` and `LocalFree` the DPAPI buffer.
///
/// Returns `Some(bytes)` on success, `None` if the blob is null/empty. Always
/// frees `pbData` when non-null (DPAPI allocates it with `LocalAlloc`).
fn copy_and_free_blob(
    blob: &windows::Win32::Security::Cryptography::CRYPT_INTEGER_BLOB,
) -> Option<Vec<u8>> {
    use windows::Win32::Foundation::{LocalFree, HLOCAL};

    if blob.pbData.is_null() {
        return None;
    }
    // SAFETY: DPAPI guarantees pbData points at cbData valid bytes on success.
    let bytes = unsafe { std::slice::from_raw_parts(blob.pbData, blob.cbData as usize).to_vec() };
    // SAFETY: pbData was allocated by DPAPI via LocalAlloc; free it once.
    unsafe {
        let _ = LocalFree(HLOCAL(blob.pbData as *mut core::ffi::c_void));
    }
    if bytes.is_empty() {
        None
    } else {
        Some(bytes)
    }
}

// ---------------------------------------------------------------------------
// ACL hardening
// ---------------------------------------------------------------------------
/// Restrict `path` (file or directory) to SYSTEM + Administrators full control,
/// removing inherited ACEs so a permissive parent grant cannot leak read access.
///
/// Implemented via `icacls` — the documented, auditable mechanism — rather than
/// hand-rolling a security descriptor through `SetNamedSecurityInfoW` (hundreds
/// of lines of SID/ACL FFI). `icacls` ships on every supported Windows target.
///
/// A failure here is surfaced: the caller treats inability to lock down the
/// credential store as a hard error. The principals are passed as the well-known
/// SIDs `*S-1-5-18` (LocalSystem) and `*S-1-5-32-544` (BUILTIN\Administrators),
/// which are language- and locale-independent, so this does not break on
/// localized Windows.
fn lock_down_acl(path: &std::path::Path) -> Result<()> {
    use std::os::windows::process::CommandExt;
    use std::process::Command;

    const CREATE_NO_WINDOW: u32 = 0x0800_0000;

    let path_str = path
        .to_str()
        .ok_or_else(|| anyhow!("credential path is not valid UTF-8: {path:?}"))?;

    // /inheritance:r -> remove inherited ACEs (drop the permissive parent grant)
    // /grant:r       -> replace any existing explicit grants for the principal
    // *S-1-5-18      -> LocalSystem; *S-1-5-32-544 -> BUILTIN\Administrators
    let output = Command::new("icacls")
        .arg(path_str)
        .args([
            "/inheritance:r",
            "/grant:r",
            "*S-1-5-18:(OI)(CI)F",
            "/grant:r",
            "*S-1-5-32-544:(OI)(CI)F",
        ])
        .creation_flags(CREATE_NO_WINDOW)
        .output()
        .context("failed to invoke icacls to harden credential ACL")?;

    if !output.status.success() {
        // icacls writes its diagnostics to stdout; surface the code only (no
        // credential material is ever passed to icacls, only the path).
        return Err(anyhow!(
            "icacls failed to harden {path_str} (exit {:?})",
            output.status.code()
        ));
    }
    Ok(())
}

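// ---------------------------------------------------------------------------
// Illustrative sketch (not part of the original module)
// ---------------------------------------------------------------------------
// For audit notes or a dry run it can help to see the command line that
// `lock_down_acl` effectively hands to `icacls`. A minimal sketch, assuming a
// hypothetical helper name (`sketch_icacls_command_line`): it only builds a
// display string from the same arguments, runs nothing, and does no shell
// escaping beyond wrapping the path in double quotes.
#[allow(dead_code)]
fn sketch_icacls_command_line(path: &str) -> String {
    let args = [
        "/inheritance:r",          // drop ACEs inherited from the parent
        "/grant:r",                // replace explicit grants for the next principal
        "*S-1-5-18:(OI)(CI)F",     // LocalSystem, full control, inheritable
        "/grant:r",                // same, for the second principal
        "*S-1-5-32-544:(OI)(CI)F", // BUILTIN Administrators, full control, inheritable
    ];
    format!("icacls \"{path}\" {}", args.join(" "))
}
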
#[cfg(test)]
mod tests {
    use super::*;

    /// DPAPI round-trips on the same machine: protect then unprotect must recover
    /// the exact plaintext. (Runs on the build/test host, which IS the same
    /// machine; the machine-scope key is available to any process here.)
    #[test]
    fn dpapi_roundtrip_recovers_plaintext() {
        let secret = b"cak_test_value_0123456789abcdef";
        let blob = dpapi_protect(secret).expect("DPAPI protect should succeed on this machine");
        assert_ne!(
            blob.as_slice(),
            secret.as_slice(),
            "ciphertext must differ from plaintext"
        );
        let recovered = dpapi_unprotect(&blob).expect("DPAPI unprotect should succeed");
        assert_eq!(recovered, secret, "round-trip must recover the exact bytes");
    }

    /// Non-empty plaintexts of several lengths (1 byte, 4 bytes, 256 zero bytes)
    /// round-trip to the exact input. Empty input is deliberately not covered:
    /// `copy_and_free_blob` maps an empty output blob to `None`, so these helpers
    /// report an empty plaintext as an error rather than returning it.
    #[test]
    fn dpapi_roundtrip_handles_varied_lengths() {
        for plaintext in [b"x".as_slice(), b"cak_".as_slice(), &[0u8; 256]] {
            let blob = dpapi_protect(plaintext).expect("protect");
            let back = dpapi_unprotect(&blob).expect("unprotect");
            assert_eq!(back.as_slice(), plaintext);
        }
    }

    /// Tampering with the ciphertext must make decryption FAIL rather than return
    /// garbage: DPAPI authenticates its blobs.
    #[test]
    fn dpapi_rejects_tampered_blob() {
        let mut blob = dpapi_protect(b"cak_tamper_target").expect("protect");
        // Flip a byte in the middle of the blob.
        let mid = blob.len() / 2;
        blob[mid] ^= 0xFF;
        assert!(
            dpapi_unprotect(&blob).is_err(),
            "a tampered DPAPI blob must fail to decrypt"
        );
    }
}