feat(agent,server): v2 secure-session-core Task 7 - HW H.264 + negotiated raw fallback
All checks were successful
Build and Test / Build Agent (Windows) (push) Successful in 6m57s
Build and Test / Build Server (Linux) (push) Successful in 10m23s
Build and Test / Security Audit (push) Successful in 4m15s
Build and Test / Build Summary (push) Successful in 9s

SPEC-002 Phase 1 Task 7 (the last), code-reviewed APPROVED, locally verified
(cargo fmt + clippy -D warnings exit 0 + cargo test --workspace 89 pass + build).

- Encoder trait + factory: RawEncoder (salvaged, UNCHANGED) and H264Encoder,
  selected by negotiation; factory falls back to raw on H.264 init failure.
- Negotiation: agent advertises supports_h264 (MFTEnumEx HW probe, cached) in
  AgentStatus; server picks the codec via select_video_codec(supports, prefer)
  and stamps StartStream.video_codec; agent re-guards on local HW. Policy
  constant DEFAULT_PREFER_H264 = false, so RAW is negotiated for every session
  today - H.264 stays dormant until live hardware validation (Task 8).
- MF H.264 encoder (h264.rs, FIRST-CUT / compile-verified-only): HW encoder MFT,
  BGRA->NV12 (color.rs, unit-tested), sync drain, fall-back-to-raw on any failure.
- Viewer H.264 decoder (decoder.rs, FIRST-CUT): MF decoder on a dedicated COM
  thread; drops+logs on failure, raw render path untouched.
- proto additive: VideoCodec enum, StartStream.video_codec=3,
  SessionResponse.video_codec=5, AgentStatus.supports_h264=11.
- Raw+Zstd path byte-for-byte unchanged; remains the guaranteed default/fallback.

Review confirmed unsafe impl Send for H264Encoder is sound (single-owned &mut on
the block_on thread; session future never spawned) and every MF failure degrades
to raw. H.264 is NOT claimed functional - compile/clippy/build-verified only;
live validation + force-IDR + the no-spawn-invariant doc are Task 8 go-live gates.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 10:35:04 -07:00
parent bb73ba667f
commit f9bdecbfdb
12 changed files with 1885 additions and 23 deletions
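
The negotiation rule described in the commit message can be sketched as follows. This is a minimal illustration, not the repository's code. The names `select_video_codec` and `DEFAULT_PREFER_H264` come from the message; the exact signature, the local `VideoCodec` stand-in enum, and the `agent_effective_codec` helper are assumptions made for the example.

```rust
/// Stand-in for the proto `VideoCodec` enum (assumed shape, illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VideoCodec {
    Raw,
    H264,
}

/// Policy constant named in the commit message: H.264 stays dormant for now.
pub const DEFAULT_PREFER_H264: bool = false;

/// Server side: choose H.264 only when the agent advertised hardware support
/// AND policy prefers it; everything else negotiates raw.
pub fn select_video_codec(agent_supports_h264: bool, prefer_h264: bool) -> VideoCodec {
    if agent_supports_h264 && prefer_h264 {
        VideoCodec::H264
    } else {
        VideoCodec::Raw
    }
}

/// Agent side (hypothetical helper): re-check local hardware before honouring
/// a negotiated H.264 request, otherwise stay on raw.
pub fn agent_effective_codec(negotiated: VideoCodec, local_hw_h264: bool) -> VideoCodec {
    match negotiated {
        VideoCodec::H264 if local_hw_h264 => VideoCodec::H264,
        _ => VideoCodec::Raw,
    }
}
```

With `DEFAULT_PREFER_H264 = false`, `select_video_codec(true, DEFAULT_PREFER_H264)` yields `VideoCodec::Raw` even for an agent that advertises hardware support, which matches the "RAW is negotiated for every session today" statement.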


@@ -95,6 +95,13 @@ windows = { version = "0.58", features = [
"Win32_System_Pipes",
"Win32_System_SystemServices",
"Win32_System_IO",
"Win32_System_Com",
"Win32_System_Com_StructuredStorage",
"Win32_System_Ole",
"Win32_System_Variant",
"Win32_Media_MediaFoundation",
"Win32_Media_KernelStreaming",
"Win32_Media_DirectShow",
]}
# Windows service support


@@ -0,0 +1,97 @@
//! Hardware video-encode capability detection (Task 7).
//!
//! Probes Windows Media Foundation for a HARDWARE H.264 encoder MFT at startup.
//! The result is cached and advertised to the server in `AgentStatus.supports_h264`
//! so the server can negotiate the codec (see `StartStream.video_codec`).
//!
//! Detection is intentionally cheap and side-effect-free: it only ENUMERATES the
//! available encoder MFTs (it does not create or initialize one). A `true` result
//! means a hardware H.264 encoder was advertised by the OS; it does NOT guarantee
//! the encoder will successfully initialize at stream time — the H.264 encoder
//! still falls back to raw on any init/feed failure.
//!
//! On non-Windows targets, or if MF is unavailable, this reports `false`.
use std::sync::OnceLock;
/// Cached capability result. Detection runs at most once per process.
static SUPPORTS_H264: OnceLock<bool> = OnceLock::new();
/// Return whether this machine has a hardware H.264 encoder, detecting once and
/// caching the result. Safe to call repeatedly and from any thread.
pub fn supports_hardware_h264() -> bool {
*SUPPORTS_H264.get_or_init(detect_hardware_h264)
}
/// Run the actual detection. Separated so the cached accessor stays trivial.
fn detect_hardware_h264() -> bool {
let supported = detect_inner();
if supported {
tracing::info!("Hardware H.264 encoder detected (Media Foundation)");
} else {
tracing::info!("No hardware H.264 encoder detected; raw+Zstd only");
}
supported
}
#[cfg(windows)]
fn detect_inner() -> bool {
// Enumerate hardware H.264 encoder MFTs. This is a read-only probe: it does
// not initialize D3D, leave a COM apartment initialized, or create the encoder.
match unsafe { enumerate_hardware_h264() } {
Ok(found) => found,
Err(e) => {
tracing::warn!("H.264 capability probe failed: {e:#}; assuming no HW encoder");
false
}
}
}
#[cfg(not(windows))]
fn detect_inner() -> bool {
false
}
#[cfg(windows)]
unsafe fn enumerate_hardware_h264() -> anyhow::Result<bool> {
use windows::Win32::Media::MediaFoundation::{
MFMediaType_Video, MFTEnumEx, MFVideoFormat_H264, MFT_CATEGORY_VIDEO_ENCODER,
MFT_ENUM_FLAG_HARDWARE, MFT_ENUM_FLAG_SORTANDFILTER, MFT_ENUM_FLAG_TRANSCODE_ONLY,
MFT_REGISTER_TYPE_INFO,
};
// We only specify the OUTPUT type (H.264); input is left unconstrained so the
// probe matches encoders regardless of their preferred input subtype.
let output_type = MFT_REGISTER_TYPE_INFO {
guidMajorType: MFMediaType_Video,
guidSubtype: MFVideoFormat_H264,
};
let mut activate_ptr: *mut Option<windows::Win32::Media::MediaFoundation::IMFActivate> =
std::ptr::null_mut();
let mut count: u32 = 0;
// MFTEnumEx does not itself require MFStartup for a pure enumeration, but we
// guard with a Result so any HRESULT failure degrades to "no HW encoder".
MFTEnumEx(
MFT_CATEGORY_VIDEO_ENCODER,
MFT_ENUM_FLAG_HARDWARE | MFT_ENUM_FLAG_SORTANDFILTER | MFT_ENUM_FLAG_TRANSCODE_ONLY,
None, // input type: any
Some(&output_type as *const _),
&mut activate_ptr,
&mut count,
)?;
// Release every returned IMFActivate, then free the array CoTaskMemAlloc'd by MF.
let found = count > 0;
if !activate_ptr.is_null() {
let slice = std::slice::from_raw_parts_mut(activate_ptr, count as usize);
for entry in slice.iter_mut() {
// Dropping the Option<IMFActivate> releases the COM reference.
entry.take();
}
windows::Win32::System::Com::CoTaskMemFree(Some(activate_ptr as *const _));
}
Ok(found)
}
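
The detect-once-and-cache pattern used by `supports_hardware_h264` can be exercised in isolation. The sketch below assumes a stand-in probe (`expensive_probe`) and a call counter (`PROBE_RUNS`), both hypothetical and introduced only to make the "at most once per process" behaviour observable; the real probe is the `MFTEnumEx` enumeration above.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how many times the stand-in probe actually ran (illustration only).
pub static PROBE_RUNS: AtomicUsize = AtomicUsize::new(0);

/// Cached result, initialised lazily by the first caller.
static CACHED: OnceLock<bool> = OnceLock::new();

/// Hypothetical stand-in for the real hardware probe.
fn expensive_probe() -> bool {
    PROBE_RUNS.fetch_add(1, Ordering::SeqCst);
    false
}

/// Cheap accessor: runs the probe on first use, then returns the cached value.
/// `OnceLock::get_or_init` guarantees only one initialiser executes even when
/// several threads race on the first call.
pub fn supports_feature() -> bool {
    *CACHED.get_or_init(expensive_probe)
}
```

Calling `supports_feature()` from several threads and then reading `PROBE_RUNS` should show a single probe run, which is why the accessor is safe to call repeatedly on a hot path.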

agent/src/encoder/color.rs Normal file

@@ -0,0 +1,269 @@
//! Color-space conversion for the H.264 encode path (Task 7).
//!
//! Screen capture produces BGRA (4 bytes/pixel, B,G,R,A order — the DXGI/GDI
//! native layout). Media Foundation hardware H.264 encoders want NV12: a full-
//! resolution 8-bit Y (luma) plane followed by an interleaved half-resolution
//! U/V (chroma) plane. This module does that conversion in software.
//!
//! NV12 memory layout for a `width x height` frame (width/height assumed even):
//! - Y plane: `width * height` bytes, row-major.
//! - UV plane: `width * (height / 2)` bytes — for each 2x2 luma block one
//! (U, V) pair, so the plane is `(width/2)` (U,V) pairs per row over
//! `height/2` rows, i.e. `width` bytes per chroma row.
//!
//! Total size = `width * height * 3 / 2`.
//!
//! The coefficients are BT.601 "studio swing" (limited range, 16..235 luma),
//! which is what MF H.264 encoders expect by default. Chroma is computed by
//! averaging the 2x2 BGRA block before conversion (box downsample) to reduce
//! aliasing.
/// Size in bytes of an NV12 buffer for `width` x `height` (both even).
#[inline]
pub fn nv12_size(width: u32, height: u32) -> usize {
(width as usize * height as usize) * 3 / 2
}
/// BT.601 limited-range luma from 8-bit R,G,B.
#[inline]
fn rgb_to_y(r: i32, g: i32, b: i32) -> u8 {
// Y = 16 + (65.481*R + 128.553*G + 24.966*B) / 255, fixed-point.
// Using the common integer approximation:
// Y = ((66*R + 129*G + 25*B + 128) >> 8) + 16
let y = ((66 * r + 129 * g + 25 * b + 128) >> 8) + 16;
y.clamp(0, 255) as u8
}
/// BT.601 limited-range Cb (U) from 8-bit R,G,B.
#[inline]
fn rgb_to_u(r: i32, g: i32, b: i32) -> u8 {
let u = ((-38 * r - 74 * g + 112 * b + 128) >> 8) + 128;
u.clamp(0, 255) as u8
}
/// BT.601 limited-range Cr (V) from 8-bit R,G,B.
#[inline]
fn rgb_to_v(r: i32, g: i32, b: i32) -> u8 {
let v = ((112 * r - 94 * g - 18 * b + 128) >> 8) + 128;
v.clamp(0, 255) as u8
}
/// Convert a tightly-packed BGRA frame into NV12, writing into `out`.
///
/// `bgra` must be at least `width * height * 4` bytes; `out` must be at least
/// `nv12_size(width, height)` bytes. `width` and `height` MUST be even (H.264
/// 4:2:0 requires even dimensions — the caller pads odd capture sizes). Returns
/// an error rather than panicking on a short buffer or odd dimension so the
/// encoder can fall back to raw.
pub fn bgra_to_nv12(
bgra: &[u8],
width: u32,
height: u32,
out: &mut [u8],
) -> Result<(), ColorConvertError> {
if width == 0 || height == 0 {
return Err(ColorConvertError::ZeroDimension);
}
if !width.is_multiple_of(2) || !height.is_multiple_of(2) {
return Err(ColorConvertError::OddDimension { width, height });
}
let w = width as usize;
let h = height as usize;
let expected_src = w * h * 4;
if bgra.len() < expected_src {
return Err(ColorConvertError::SrcTooSmall {
got: bgra.len(),
need: expected_src,
});
}
let need_out = nv12_size(width, height);
if out.len() < need_out {
return Err(ColorConvertError::DstTooSmall {
got: out.len(),
need: need_out,
});
}
let (y_plane, uv_plane) = out.split_at_mut(w * h);
// Luma: one sample per pixel.
for row in 0..h {
let src_row = row * w * 4;
let dst_row = row * w;
for col in 0..w {
let px = src_row + col * 4;
// BGRA order.
let b = bgra[px] as i32;
let g = bgra[px + 1] as i32;
let r = bgra[px + 2] as i32;
y_plane[dst_row + col] = rgb_to_y(r, g, b);
}
}
// Chroma: one (U,V) pair per 2x2 block, box-averaged.
let chroma_rows = h / 2;
let chroma_cols = w / 2;
for cy in 0..chroma_rows {
for cx in 0..chroma_cols {
let x0 = cx * 2;
let y0 = cy * 2;
let mut r_sum = 0i32;
let mut g_sum = 0i32;
let mut b_sum = 0i32;
for dy in 0..2 {
for dx in 0..2 {
let px = ((y0 + dy) * w + (x0 + dx)) * 4;
b_sum += bgra[px] as i32;
g_sum += bgra[px + 1] as i32;
r_sum += bgra[px + 2] as i32;
}
}
let r = r_sum / 4;
let g = g_sum / 4;
let b = b_sum / 4;
let uv_idx = (cy * chroma_cols + cx) * 2;
uv_plane[uv_idx] = rgb_to_u(r, g, b);
uv_plane[uv_idx + 1] = rgb_to_v(r, g, b);
}
}
Ok(())
}
/// Errors from BGRA->NV12 conversion. Surfaced (not panicked) so the H.264
/// encoder can downgrade to raw.
#[derive(Debug, thiserror::Error)]
pub enum ColorConvertError {
#[error("frame dimension is zero")]
ZeroDimension,
#[error("NV12 requires even dimensions, got {width}x{height}")]
OddDimension { width: u32, height: u32 },
#[error("source BGRA buffer too small: {got} < {need}")]
SrcTooSmall { got: usize, need: usize },
#[error("destination NV12 buffer too small: {got} < {need}")]
DstTooSmall { got: usize, need: usize },
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn nv12_size_is_3half() {
assert_eq!(nv12_size(2, 2), 6);
assert_eq!(nv12_size(4, 4), 24);
assert_eq!(nv12_size(1920, 1080), 1920 * 1080 * 3 / 2);
}
#[test]
fn rejects_odd_dimensions() {
let bgra = vec![0u8; 3 * 3 * 4];
let mut out = vec![0u8; nv12_size(4, 4)];
assert!(matches!(
bgra_to_nv12(&bgra, 3, 2, &mut out),
Err(ColorConvertError::OddDimension { .. })
));
assert!(matches!(
bgra_to_nv12(&bgra, 2, 3, &mut out),
Err(ColorConvertError::OddDimension { .. })
));
}
#[test]
fn rejects_short_source() {
let bgra = vec![0u8; 4]; // way too small for 2x2
let mut out = vec![0u8; nv12_size(2, 2)];
assert!(matches!(
bgra_to_nv12(&bgra, 2, 2, &mut out),
Err(ColorConvertError::SrcTooSmall { .. })
));
}
#[test]
fn rejects_short_dest() {
let bgra = vec![0u8; 2 * 2 * 4];
let mut out = vec![0u8; 1];
assert!(matches!(
bgra_to_nv12(&bgra, 2, 2, &mut out),
Err(ColorConvertError::DstTooSmall { .. })
));
}
/// A pure-black BGRA frame -> Y = 16 (limited-range black), U = V = 128.
#[test]
fn black_frame_maps_to_limited_range_black() {
let bgra = vec![0u8; 4 * 4 * 4]; // all zero => black, alpha 0
let mut out = vec![0u8; nv12_size(4, 4)];
bgra_to_nv12(&bgra, 4, 4, &mut out).unwrap();
// Y plane (first 16 bytes) all 16.
for &y in &out[..16] {
assert_eq!(y, 16, "black luma must be 16 (limited range)");
}
// UV plane all 128 (neutral chroma).
for &c in &out[16..] {
assert_eq!(c, 128, "black chroma must be neutral 128");
}
}
/// A pure-white BGRA frame -> Y = 235 (limited-range white), U = V = 128.
#[test]
fn white_frame_maps_to_limited_range_white() {
// B=255, G=255, R=255, A=255 for every pixel.
let bgra = vec![255u8; 2 * 2 * 4];
let mut out = vec![0u8; nv12_size(2, 2)];
bgra_to_nv12(&bgra, 2, 2, &mut out).unwrap();
// Y = (((66 + 129 + 25) * 255 + 128) >> 8) + 16 = 219 + 16 = 235.
for &y in &out[..4] {
assert_eq!(y, 235, "white luma must be 235 (limited range)");
}
// Neutral chroma for a gray/white pixel.
assert_eq!(out[4], 128);
assert_eq!(out[5], 128);
}
/// A pure-red frame: luma below mid, V (Cr) well above 128, U (Cb) below 128.
#[test]
fn red_frame_has_high_cr_low_cb() {
// BGRA red: B=0, G=0, R=255, A=255.
let mut bgra = vec![0u8; 2 * 2 * 4];
for px in bgra.chunks_mut(4) {
px[0] = 0; // B
px[1] = 0; // G
px[2] = 255; // R
px[3] = 255; // A
}
let mut out = vec![0u8; nv12_size(2, 2)];
bgra_to_nv12(&bgra, 2, 2, &mut out).unwrap();
let u = out[4];
let v = out[5];
assert!(v > 200, "red must have high Cr (V), got {v}");
assert!(u < 128, "red must have Cb (U) below neutral, got {u}");
}
/// Sanity check on plane indexing: for a non-trivial gray ramp the Y plane
/// must be overwritten with varying values (the buffer is pre-filled with
/// 0xAA). Only the Y plane is checked here.
#[test]
fn plane_indexing_covers_full_buffer() {
let w = 8u32;
let h = 8u32;
let mut bgra = vec![0u8; (w * h * 4) as usize];
for (i, px) in bgra.chunks_mut(4).enumerate() {
let v = (i % 256) as u8;
px[0] = v;
px[1] = v;
px[2] = v;
px[3] = 255;
}
let mut out = vec![0xAAu8; nv12_size(w, h)];
bgra_to_nv12(&bgra, w, h, &mut out).unwrap();
// Y plane should be fully written (gray ramp -> non-constant).
let y_plane = &out[..(w * h) as usize];
assert!(y_plane.windows(2).any(|p| p[0] != p[1]), "Y plane varies");
}
}
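
The NV12 layout and the BT.601 integer approximation used in color.rs can be checked with a few standalone helpers. This is a sketch for illustration: the helper names (`nv12_y_index`, `nv12_uv_index`, `bt601_luma`) are hypothetical and not part of the module, and tightly packed planes (stride equal to `width`) are assumed, as in `bgra_to_nv12`.

```rust
/// Byte offset of the luma sample for pixel (x, y) in a tightly packed NV12 buffer.
pub fn nv12_y_index(width: usize, x: usize, y: usize) -> usize {
    y * width + x
}

/// Byte offset of the U sample shared by the 2x2 block containing (x, y).
/// The matching V sample sits at the returned offset + 1.
pub fn nv12_uv_index(width: usize, height: usize, x: usize, y: usize) -> usize {
    // The interleaved UV plane starts right after the width*height luma bytes.
    let uv_plane_start = width * height;
    // Each chroma row is `width` bytes: (width / 2) pairs of (U, V).
    uv_plane_start + (y / 2) * width + (x / 2) * 2
}

/// BT.601 limited-range luma, same integer approximation as `rgb_to_y`.
pub fn bt601_luma(r: i32, g: i32, b: i32) -> u8 {
    (((66 * r + 129 * g + 25 * b + 128) >> 8) + 16).clamp(0, 255) as u8
}
```

For a 4x4 frame the buffer is 24 bytes: pixel (3, 3) maps to luma offset 15 and to the U/V pair at offsets 22 and 23, the last two bytes. Black maps to luma 16 and white to 235, the limited-range endpoints the unit tests assert.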

agent/src/encoder/h264.rs Normal file

@@ -0,0 +1,515 @@
//! Hardware H.264 encoder via Windows Media Foundation (Task 7).
//!
//! FIRST-CUT / COMPILE-VERIFIED ONLY. This encoder is wired end-to-end (init ->
//! feed -> drain -> emit `EncodedFrame{h264}`) and is selected only when the
//! agent advertised hardware support AND the server negotiated H.264. It has NOT
//! been validated on real hardware with live frames — that is plan Task 8. On
//! ANY initialization or per-frame failure it surfaces an error; the encoder
//! factory (`create_encoder_for`) downgrades to the raw+Zstd encoder so a
//! session never breaks because of H.264.
//!
//! Pipeline:
//! BGRA capture --(color::bgra_to_nv12)--> NV12 sample --> MFT(H.264) --> H.264
//! Annex-B/length-prefixed elementary stream --> proto EncodedFrame.
//!
//! Design notes:
//! - The MFT is enumerated with `MFTEnumEx(MFT_CATEGORY_VIDEO_ENCODER,
//! MFT_ENUM_FLAG_HARDWARE, …, MFVideoFormat_H264)` (same probe as
//! `capability`). We `ActivateObject` the first match.
//! - Input is configured as NV12, output as H.264, with frame size, frame rate
//! and an average bitrate derived from `quality`.
//! - Both the SYNCHRONOUS MFT model (ProcessInput/ProcessOutput) and the
//! ASYNCHRONOUS hardware-MFT model (METransformNeedInput / METransformHaveOutput
//! events) exist. To keep this first cut bounded and predictable we DRAIN the
//! MFT synchronously after each input and treat `MF_E_TRANSFORM_NEED_MORE_INPUT`
//! as "no output this tick". A fully async event-driven loop is a Task-8
//! refinement (documented below).
//! - `MFT_MESSAGE_SET_D3D_MANAGER` is intentionally NOT set — we feed CPU NV12
//! buffers (software input samples), which every HW H.264 MFT accepts. D3D11
//! zero-copy is a later optimization.
#![cfg(windows)]
use super::{EncodedFrame, Encoder};
use crate::capture::CapturedFrame;
use crate::encoder::color;
use crate::proto::{video_frame, EncodedFrame as ProtoEncodedFrame, VideoFrame};
use anyhow::{anyhow, Context, Result};
use windows::Win32::Media::MediaFoundation::{
IMFActivate, IMFMediaType, IMFSample, IMFTransform, MFCreateMediaType, MFCreateMemoryBuffer,
MFCreateSample, MFMediaType_Video, MFShutdown, MFStartup, MFTEnumEx, MFVideoFormat_H264,
MFVideoFormat_NV12, MFVideoInterlace_Progressive, MFSTARTUP_LITE, MFT_CATEGORY_VIDEO_ENCODER,
MFT_ENUM_FLAG_HARDWARE, MFT_ENUM_FLAG_SORTANDFILTER, MFT_ENUM_FLAG_TRANSCODE_ONLY,
MFT_MESSAGE_COMMAND_FLUSH, MFT_MESSAGE_NOTIFY_BEGIN_STREAMING,
MFT_MESSAGE_NOTIFY_END_OF_STREAM, MFT_MESSAGE_NOTIFY_END_STREAMING,
MFT_MESSAGE_NOTIFY_START_OF_STREAM, MFT_OUTPUT_DATA_BUFFER, MFT_OUTPUT_STREAM_INFO,
MFT_REGISTER_TYPE_INFO, MF_E_TRANSFORM_NEED_MORE_INPUT, MF_MT_AVG_BITRATE, MF_MT_FRAME_RATE,
MF_MT_FRAME_SIZE, MF_MT_INTERLACE_MODE, MF_MT_MAJOR_TYPE, MF_MT_PIXEL_ASPECT_RATIO,
MF_MT_SUBTYPE,
};
/// Encoder-internal state, created once and reused per frame.
pub struct H264Encoder {
/// The activated encoder transform.
transform: IMFTransform,
/// Configured frame dimensions; a capture-size change forces re-init.
width: u32,
height: u32,
/// Quality (1-100) used to derive the bitrate; kept for re-init on resize.
quality: u32,
/// Frame sequence counter (mirrors RawEncoder).
sequence: u32,
/// Force the next frame to request a keyframe.
force_keyframe: bool,
/// Whether `MFT_MESSAGE_NOTIFY_BEGIN_STREAMING` was sent.
streaming: bool,
/// Reusable NV12 staging buffer (resized on dimension change).
nv12: Vec<u8>,
/// Input/output stream identifiers (most encoders use 0/0).
input_stream_id: u32,
output_stream_id: u32,
/// True if MF was started by THIS encoder and must be shut down on drop.
mf_started: bool,
}
// IMFTransform is a COM interface; it is not auto-Send. We only ever touch the
// encoder from the single capture/encode thread (the session owns it behind a
// &mut), so it is safe to move between threads as long as it is not shared.
unsafe impl Send for H264Encoder {}
impl H264Encoder {
/// Construct and fully initialize a hardware H.264 encoder. Returns an error
/// (so the factory can fall back to raw) if MF is unavailable, no hardware
/// encoder exists, or media-type negotiation fails. A default frame size is
/// used and re-negotiated on the first frame if the real capture differs.
pub fn new(quality: u32) -> Result<Self> {
// 1920x1080 default; re-init on the first frame if the capture differs.
Self::with_dimensions(quality, 1920, 1080)
}
fn with_dimensions(quality: u32, width: u32, height: u32) -> Result<Self> {
unsafe {
// MF must be initialized on this thread. MFSTARTUP_LITE avoids the
// sockets/network stack we don't need.
MFStartup(mf_version(), MFSTARTUP_LITE).context("MFStartup failed")?;
let mf_started = true;
let transform = match Self::activate_hw_encoder() {
Ok(t) => t,
Err(e) => {
// Balance the MFStartup we just did before bailing.
let _ = MFShutdown();
return Err(e);
}
};
let mut enc = Self {
transform,
width,
height,
quality,
sequence: 0,
force_keyframe: true,
streaming: false,
nv12: Vec::new(),
input_stream_id: 0,
output_stream_id: 0,
mf_started,
};
// `enc`'s Drop will shut MF down and release the transform on error.
enc.configure_media_types()?;
Ok(enc)
}
}
/// Enumerate hardware H.264 encoder MFTs and activate the first one.
unsafe fn activate_hw_encoder() -> Result<IMFTransform> {
let output_type = MFT_REGISTER_TYPE_INFO {
guidMajorType: MFMediaType_Video,
guidSubtype: MFVideoFormat_H264,
};
let mut activate_ptr: *mut Option<IMFActivate> = std::ptr::null_mut();
let mut count: u32 = 0;
MFTEnumEx(
MFT_CATEGORY_VIDEO_ENCODER,
MFT_ENUM_FLAG_HARDWARE | MFT_ENUM_FLAG_SORTANDFILTER | MFT_ENUM_FLAG_TRANSCODE_ONLY,
None,
Some(&output_type as *const _),
&mut activate_ptr,
&mut count,
)
.context("MFTEnumEx (hardware H.264) failed")?;
if count == 0 || activate_ptr.is_null() {
if !activate_ptr.is_null() {
windows::Win32::System::Com::CoTaskMemFree(Some(activate_ptr as *const _));
}
return Err(anyhow!("no hardware H.264 encoder MFT available"));
}
let slice = std::slice::from_raw_parts_mut(activate_ptr, count as usize);
// Activate the first usable encoder; release every IMFActivate.
let mut chosen: Option<IMFTransform> = None;
for entry in slice.iter_mut() {
if chosen.is_none() {
if let Some(activate) = entry.as_ref() {
if let Ok(transform) = activate.ActivateObject::<IMFTransform>() {
chosen = Some(transform);
}
}
}
// Release this IMFActivate reference.
entry.take();
}
windows::Win32::System::Com::CoTaskMemFree(Some(activate_ptr as *const _));
chosen.ok_or_else(|| anyhow!("failed to activate any hardware H.264 encoder MFT"))
}
/// Set the H.264 output type and NV12 input type, in the order MF requires
/// (output type FIRST for encoders, then the matching input type).
unsafe fn configure_media_types(&mut self) -> Result<()> {
// Discover the real stream identifiers (most encoders report 0/0).
let mut input_ids = [0u32; 1];
let mut output_ids = [0u32; 1];
// GetStreamIDs may return E_NOTIMPL meaning "ids are 0..n-1"; ignore err.
let _ = self.transform.GetStreamIDs(&mut input_ids, &mut output_ids);
// If GetStreamIDs populated nonzero ids use them, else default 0/0.
if input_ids[0] != 0 {
self.input_stream_id = input_ids[0];
}
if output_ids[0] != 0 {
self.output_stream_id = output_ids[0];
}
let fps_num = 30u32;
let fps_den = 1u32;
let bitrate = quality_to_bitrate(self.quality, self.width, self.height);
// ---- OUTPUT (H.264) ----
let out_type: IMFMediaType = MFCreateMediaType().context("MFCreateMediaType(out)")?;
out_type.SetGUID(&MF_MT_MAJOR_TYPE, &MFMediaType_Video)?;
out_type.SetGUID(&MF_MT_SUBTYPE, &MFVideoFormat_H264)?;
out_type.SetUINT32(&MF_MT_AVG_BITRATE, bitrate)?;
set_attr_size(&out_type, &MF_MT_FRAME_SIZE, self.width, self.height)?;
set_attr_ratio(&out_type, &MF_MT_FRAME_RATE, fps_num, fps_den)?;
set_attr_ratio(&out_type, &MF_MT_PIXEL_ASPECT_RATIO, 1, 1)?;
out_type.SetUINT32(&MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive.0 as u32)?;
self.transform
.SetOutputType(self.output_stream_id, &out_type, 0)
.context("SetOutputType(H264)")?;
// ---- INPUT (NV12) ----
let in_type: IMFMediaType = MFCreateMediaType().context("MFCreateMediaType(in)")?;
in_type.SetGUID(&MF_MT_MAJOR_TYPE, &MFMediaType_Video)?;
in_type.SetGUID(&MF_MT_SUBTYPE, &MFVideoFormat_NV12)?;
set_attr_size(&in_type, &MF_MT_FRAME_SIZE, self.width, self.height)?;
set_attr_ratio(&in_type, &MF_MT_FRAME_RATE, fps_num, fps_den)?;
set_attr_ratio(&in_type, &MF_MT_PIXEL_ASPECT_RATIO, 1, 1)?;
in_type.SetUINT32(&MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive.0 as u32)?;
self.transform
.SetInputType(self.input_stream_id, &in_type, 0)
.context("SetInputType(NV12)")?;
Ok(())
}
/// Begin streaming if not already started (idempotent).
unsafe fn ensure_streaming(&mut self) -> Result<()> {
if !self.streaming {
self.transform
.ProcessMessage(MFT_MESSAGE_NOTIFY_BEGIN_STREAMING, 0)
.context("NOTIFY_BEGIN_STREAMING")?;
self.transform
.ProcessMessage(MFT_MESSAGE_NOTIFY_START_OF_STREAM, 0)
.context("NOTIFY_START_OF_STREAM")?;
self.streaming = true;
}
Ok(())
}
/// Re-initialize the encoder for a new frame size (capture resolution change).
unsafe fn reinit_for_size(&mut self, width: u32, height: u32) -> Result<()> {
if self.streaming {
let _ = self.transform.ProcessMessage(MFT_MESSAGE_COMMAND_FLUSH, 0);
let _ = self
.transform
.ProcessMessage(MFT_MESSAGE_NOTIFY_END_OF_STREAM, 0);
let _ = self
.transform
.ProcessMessage(MFT_MESSAGE_NOTIFY_END_STREAMING, 0);
self.streaming = false;
}
self.width = width;
self.height = height;
self.force_keyframe = true;
self.configure_media_types()
}
/// Wrap an NV12 byte buffer into an `IMFSample` with the given timestamp.
/// A free associated fn (does not borrow `self`) so the caller can pass
/// `&self.nv12` without a clone while `self` is mutably borrowed elsewhere.
unsafe fn make_input_sample(nv12: &[u8], pts_100ns: i64) -> Result<IMFSample> {
let sample: IMFSample = MFCreateSample().context("MFCreateSample")?;
let buffer = MFCreateMemoryBuffer(nv12.len() as u32).context("MFCreateMemoryBuffer")?;
// Lock, copy NV12 in, set current length, unlock.
let mut data_ptr: *mut u8 = std::ptr::null_mut();
let mut max_len: u32 = 0;
buffer
.Lock(&mut data_ptr, Some(&mut max_len), None)
.context("IMFMediaBuffer::Lock")?;
if (max_len as usize) < nv12.len() || data_ptr.is_null() {
let _ = buffer.Unlock();
return Err(anyhow!("MF buffer too small for NV12 frame"));
}
std::ptr::copy_nonoverlapping(nv12.as_ptr(), data_ptr, nv12.len());
buffer.SetCurrentLength(nv12.len() as u32)?;
buffer.Unlock()?;
sample.AddBuffer(&buffer)?;
sample.SetSampleTime(pts_100ns)?;
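// 33.3667 ms per frame (30000/1001, about 29.97 fps) in 100ns units. Note that
// the media types declare 30/1, for which the exact duration would be 333_333.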
sample.SetSampleDuration(333_667)?;
Ok(sample)
}
/// Drain one available output sample, if any. Returns the encoded bytes and
/// whether the MFT flagged it a keyframe (clean point). `Ok(None)` means the
/// MFT needs more input before it can produce output this tick.
unsafe fn drain_one_output(&mut self) -> Result<Option<(Vec<u8>, bool)>> {
let stream_info: MFT_OUTPUT_STREAM_INFO = self
.transform
.GetOutputStreamInfo(self.output_stream_id)
.context("GetOutputStreamInfo")?;
// If the MFT does not allocate its own output samples we must provide one.
const MFT_OUTPUT_STREAM_PROVIDES_SAMPLES: u32 = 0x100;
let mft_provides = stream_info.dwFlags & MFT_OUTPUT_STREAM_PROVIDES_SAMPLES != 0;
let mut out_buffer = MFT_OUTPUT_DATA_BUFFER {
dwStreamID: self.output_stream_id,
..Default::default()
};
if !mft_provides {
let alloc_size = stream_info.cbSize.max(1);
let sample: IMFSample = MFCreateSample().context("MFCreateSample(out)")?;
let buffer = MFCreateMemoryBuffer(alloc_size).context("MFCreateMemoryBuffer(out)")?;
sample.AddBuffer(&buffer)?;
out_buffer.pSample = std::mem::ManuallyDrop::new(Some(sample));
}
let mut status: u32 = 0;
let mut bufs = [out_buffer];
let hr = self.transform.ProcessOutput(0, &mut bufs, &mut status);
// Take ownership of whatever sample is now in the buffer (ours or MFT's).
let produced = std::mem::ManuallyDrop::take(&mut bufs[0].pSample);
match hr {
Ok(()) => {
let Some(sample) = produced else {
return Ok(None);
};
let bytes = sample_to_vec(&sample)?;
let keyframe = sample_is_keyframe(&sample);
Ok(Some((bytes, keyframe)))
}
Err(e) if e.code() == MF_E_TRANSFORM_NEED_MORE_INPUT => Ok(None),
Err(e) => Err(anyhow!("ProcessOutput failed: {e:#}")),
}
}
}
impl Encoder for H264Encoder {
fn encode(&mut self, frame: &CapturedFrame) -> Result<EncodedFrame> {
self.sequence = self.sequence.wrapping_add(1);
// H.264 4:2:0 needs even dimensions. Reject odd captures up front so we
// surface a clean error (the factory already fell back to raw if HW was
// missing; a per-frame error here lets the session log + continue).
if !frame.width.is_multiple_of(2) || !frame.height.is_multiple_of(2) {
return Err(anyhow!(
"H.264 requires even dimensions, got {}x{}",
frame.width,
frame.height
));
}
unsafe {
// Re-init on a resolution change.
if frame.width != self.width || frame.height != self.height {
self.reinit_for_size(frame.width, frame.height)
.context("H.264 re-init for new frame size")?;
}
self.ensure_streaming()?;
// BGRA -> NV12 into the reusable staging buffer.
let need = color::nv12_size(frame.width, frame.height);
if self.nv12.len() != need {
self.nv12.resize(need, 0);
}
color::bgra_to_nv12(&frame.data, frame.width, frame.height, &mut self.nv12)
.map_err(|e| anyhow!("BGRA->NV12 failed: {e}"))?;
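// PTS in 100ns units. `frame.timestamp.elapsed()` only measures the
// capture-to-encode latency, so it is not a stream timeline; derive a
// monotonically increasing PTS from the frame sequence instead (333_667
// matches the sample duration set in `make_input_sample`).
let pts_100ns = i64::from(self.sequence) * 333_667;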
let sample = Self::make_input_sample(&self.nv12, pts_100ns)?;
// Feed the encoder. ProcessInput reports back-pressure as MF_E_NOTACCEPTING,
// which is surfaced as an error below; the NEED_MORE_INPUT arm is purely
// defensive. The synchronous first cut pushes one frame per tick.
match self
.transform
.ProcessInput(self.input_stream_id, &sample, 0)
{
Ok(()) => {}
Err(e) if e.code() == MF_E_TRANSFORM_NEED_MORE_INPUT => {}
Err(e) => return Err(anyhow!("ProcessInput failed: {e:#}")),
}
// Drain whatever output is ready.
let Some((data, mft_keyframe)) = self.drain_one_output()? else {
// No compressed output yet (encoder latency / GOP buffering).
// Emit an empty frame so the session skips sending this tick.
return Ok(EncodedFrame {
frame: VideoFrame::default(),
size: 0,
is_keyframe: false,
});
};
let is_keyframe = mft_keyframe || self.force_keyframe;
self.force_keyframe = false;
let size = data.len();
let encoded = ProtoEncodedFrame {
data,
keyframe: is_keyframe,
pts: pts_100ns,
dts: pts_100ns,
};
Ok(EncodedFrame {
frame: VideoFrame {
timestamp: frame.timestamp.elapsed().as_millis() as i64,
display_id: frame.display_id as i32,
sequence: self.sequence as i32,
encoding: Some(video_frame::Encoding::H264(encoded)),
},
size,
is_keyframe,
})
}
}
fn request_keyframe(&mut self) {
// A precise force-IDR uses the MFT codec API
// (CODECAPI_AVEncVideoForceKeyFrame); for the first cut we flag the next
// emitted frame as a keyframe so the viewer treats it as a clean point.
self.force_keyframe = true;
}
fn name(&self) -> &str {
"h264-mediafoundation"
}
}
impl Drop for H264Encoder {
fn drop(&mut self) {
unsafe {
if self.streaming {
let _ = self
.transform
.ProcessMessage(MFT_MESSAGE_NOTIFY_END_OF_STREAM, 0);
let _ = self
.transform
.ProcessMessage(MFT_MESSAGE_NOTIFY_END_STREAMING, 0);
}
// The IMFTransform releases when `self.transform` drops.
if self.mf_started {
let _ = MFShutdown();
}
}
}
}
/// MF version word expected by `MFStartup` (MF_VERSION = (MF_SDK_VERSION << 16) | MF_API_VERSION).
fn mf_version() -> u32 {
// MF_SDK_VERSION = 0x0002, MF_API_VERSION = 0x0070 -> 0x00020070.
0x0002_0070
}
/// Derive a target average bitrate (bps) from the 1-100 quality knob and the
/// frame area. Tuned conservatively for desktop content (mostly static).
fn quality_to_bitrate(quality: u32, width: u32, height: u32) -> u32 {
let q = quality.clamp(1, 100) as u64;
let pixels = (width as u64) * (height as u64);
// bps = pixels * 30 fps * bpp, where bpp (bits per pixel per frame) scales
// linearly with quality from 0.010 (q=1) to 0.100 (q=100).
let bpp_milli = 10 + (q * 90 / 100); // bpp in thousandths of a bit: 10..=100
let bps = pixels.saturating_mul(30).saturating_mul(bpp_milli) / 1000;
bps.clamp(500_000, 50_000_000) as u32
}
/// Pack (width, height) into the 64-bit MF_MT_FRAME_SIZE attribute.
#[cfg(windows)]
unsafe fn set_attr_size(
media_type: &IMFMediaType,
key: &windows::core::GUID,
width: u32,
height: u32,
) -> Result<()> {
let packed = ((width as u64) << 32) | (height as u64);
media_type.SetUINT64(key, packed)?;
Ok(())
}
/// Pack (numerator, denominator) into a 64-bit ratio MF attribute.
#[cfg(windows)]
unsafe fn set_attr_ratio(
media_type: &IMFMediaType,
key: &windows::core::GUID,
num: u32,
den: u32,
) -> Result<()> {
let packed = ((num as u64) << 32) | (den as u64);
media_type.SetUINT64(key, packed)?;
Ok(())
}
/// Copy all bytes out of an `IMFSample` (single contiguous buffer) into a Vec.
#[cfg(windows)]
unsafe fn sample_to_vec(sample: &IMFSample) -> Result<Vec<u8>> {
let buffer = sample
.ConvertToContiguousBuffer()
.context("ConvertToContiguousBuffer")?;
let mut ptr: *mut u8 = std::ptr::null_mut();
let mut len: u32 = 0;
buffer
.Lock(&mut ptr, None, Some(&mut len))
.context("output buffer Lock")?;
let out = if ptr.is_null() || len == 0 {
Vec::new()
} else {
std::slice::from_raw_parts(ptr, len as usize).to_vec()
};
let _ = buffer.Unlock();
Ok(out)
}
/// Read the "clean point" (keyframe) flag off a sample, if present.
#[cfg(windows)]
unsafe fn sample_is_keyframe(sample: &IMFSample) -> bool {
use windows::Win32::Media::MediaFoundation::MFSampleExtension_CleanPoint;
sample
.GetUINT32(&MFSampleExtension_CleanPoint)
.map(|v| v != 0)
.unwrap_or(false)
}
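
The arithmetic behind the media-type attributes and the bitrate knob in h264.rs can be worked through without Media Foundation. The sketch below is illustrative: `pack_2x_u32`, `unpack_2x_u32`, and `frame_duration_100ns` are hypothetical helper names, while `quality_to_bitrate` reproduces the formula from the file so the numbers can be checked.

```rust
/// Pack two u32 values the way `set_attr_size` / `set_attr_ratio` do:
/// first value in the high 32 bits, second in the low 32 bits.
pub fn pack_2x_u32(high: u32, low: u32) -> u64 {
    (u64::from(high) << 32) | u64::from(low)
}

/// Inverse of `pack_2x_u32`.
pub fn unpack_2x_u32(packed: u64) -> (u32, u32) {
    ((packed >> 32) as u32, packed as u32)
}

/// Frame duration in 100ns units for a rational frame rate.
/// `fps_num` must be non-zero; the result is truncated toward zero.
pub fn frame_duration_100ns(fps_num: u32, fps_den: u32) -> i64 {
    (10_000_000i64 * i64::from(fps_den)) / i64::from(fps_num)
}

/// Same arithmetic as `quality_to_bitrate` in h264.rs, reproduced here.
pub fn quality_to_bitrate(quality: u32, width: u32, height: u32) -> u32 {
    let q = u64::from(quality.clamp(1, 100));
    let pixels = u64::from(width) * u64::from(height);
    let bpp_milli = 10 + (q * 90 / 100); // thousandths of a bit per pixel per frame
    let bps = pixels.saturating_mul(30).saturating_mul(bpp_milli) / 1000;
    bps.clamp(500_000, 50_000_000) as u32
}
```

For 1920x1080 the packed frame size is `0x0000_0780_0000_0438`, and quality 50 gives 2,073,600 pixels * 30 * 0.055 = 3,421,440 bps. A 30/1 frame rate corresponds to 333,333 units per frame, while 30000/1001 truncates to 333,666, close to the 333_667 duration set on input samples.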


@@ -1,16 +1,27 @@
//! Frame encoding module
//!
//! Encodes captured frames for transmission. Supports:
//! - Raw BGRA + Zstd compression (lowest latency, LAN mode)
//! - VP9 software encoding (universal fallback)
//! - H264 hardware encoding (when GPU available)
//! - Raw BGRA + Zstd compression (lowest latency, LAN mode; the guaranteed
//! fallback and the current default).
//! - H.264 hardware encoding via Windows Media Foundation (Task 7) — the
//! negotiated upgrade. Compile-verified; validated on real hardware in plan
//! Task 8. On any init/feed failure the factory or encoder falls back to raw.
//!
//! Codec selection is driven by the negotiated `VideoCodec` the server sends on
//! `StartStream` (see `select_codec` / `create_encoder_for`). The capability the
//! agent advertises to the server is detected by `capability::supports_hardware_h264`.
mod capability;
pub(crate) mod color;
#[cfg(windows)]
mod h264;
mod raw;
pub use capability::supports_hardware_h264;
pub use raw::RawEncoder;
use crate::capture::CapturedFrame;
use crate::proto::VideoFrame;
use crate::proto::{video_frame, VideoCodec, VideoFrame};
use anyhow::Result;
/// Encoded frame ready for transmission
@@ -28,7 +39,12 @@ pub struct EncodedFrame {
pub is_keyframe: bool,
}
/// Frame encoder trait
/// Frame encoder trait.
///
/// Every implementor turns a `CapturedFrame` (BGRA) into a wire `VideoFrame`
/// using one `video_frame::Encoding` variant. `RawEncoder` emits the `Raw`
/// variant; the H.264 encoder emits the `H264` variant. The factory
/// (`create_encoder_for`) selects the implementor from the negotiated codec.
pub trait Encoder: Send {
/// Encode a captured frame
fn encode(&mut self, frame: &CapturedFrame) -> Result<EncodedFrame>;
@@ -42,13 +58,167 @@ pub trait Encoder: Send {
fn name(&self) -> &str;
}
/// Create an encoder based on configuration
pub fn create_encoder(codec: &str, quality: u32) -> Result<Box<dyn Encoder>> {
/// Map a configured/negotiated codec string to a `VideoCodec`.
///
/// Used when constructing an encoder from the agent's own `EncodingConfig`
/// (before any server negotiation). Unknown / "auto" / "raw" all resolve to raw
/// — the safe default. "h264" resolves to H.264 (which itself falls back to raw
/// if MF init fails).
///
/// Retained as the config-string entry point (used by `create_encoder` and the
/// unit tests); the live session negotiates via `select_codec` on a `VideoCodec`.
#[allow(dead_code)]
pub fn codec_from_str(codec: &str) -> VideoCodec {
match codec.to_lowercase().as_str() {
"raw" | "zstd" => Ok(Box::new(RawEncoder::new(quality)?)),
// "vp9" => Ok(Box::new(Vp9Encoder::new(quality)?)),
// "h264" => Ok(Box::new(H264Encoder::new(quality)?)),
// "auto" and any unknown codec default to raw for now (best for LAN)
_ => Ok(Box::new(RawEncoder::new(quality)?)),
"h264" => VideoCodec::H264,
// "h265"/"hevc" are future opt-in (TODO) — treat as raw for now so we
// never select an unimplemented codec.
_ => VideoCodec::Raw,
}
}
/// Choose the codec the agent will actually use for a stream, given the codec
/// the server negotiated and the agent's own hardware capability.
///
/// This is the agent-side guard that keeps the raw fallback authoritative:
/// - The server only negotiates H.264 when the agent advertised support, but we
/// re-check `supports_hardware_h264()` here so a stale/misconfigured server
/// selection can never force an unsupported codec.
/// - H.265 is not implemented; it degrades to raw.
/// - Anything else is raw.
pub fn select_codec(negotiated: VideoCodec, hardware_h264_available: bool) -> VideoCodec {
match negotiated {
VideoCodec::H264 if hardware_h264_available => VideoCodec::H264,
// Server asked for H.264 but we have no HW encoder -> raw.
VideoCodec::H264 => VideoCodec::Raw,
// HEVC not implemented yet (TODO: Task 7 opt-in / future).
VideoCodec::H265 => VideoCodec::Raw,
VideoCodec::Raw => VideoCodec::Raw,
}
}
/// Create an encoder for an explicit `VideoCodec`, with a transparent fallback
/// to raw if a hardware encoder cannot be constructed.
///
/// `quality` is the 1-100 quality knob (mapped per-codec). On H.264 init failure
/// this logs and returns a raw encoder so the session keeps working.
pub fn create_encoder_for(codec: VideoCodec, quality: u32) -> Result<Box<dyn Encoder>> {
match codec {
VideoCodec::H264 => {
#[cfg(windows)]
{
match h264::H264Encoder::new(quality) {
Ok(enc) => {
tracing::info!("Using hardware H.264 encoder (Media Foundation)");
Ok(Box::new(enc))
}
Err(e) => {
tracing::warn!(
"H.264 encoder init failed ({e:#}); falling back to raw+Zstd"
);
Ok(Box::new(RawEncoder::new(quality)?))
}
}
}
#[cfg(not(windows))]
{
tracing::warn!("H.264 unsupported on this platform; using raw+Zstd");
Ok(Box::new(RawEncoder::new(quality)?))
}
}
// Raw (and anything that resolved to raw) uses the salvaged encoder.
VideoCodec::Raw | VideoCodec::H265 => Ok(Box::new(RawEncoder::new(quality)?)),
}
}
/// Create an encoder based on a codec string (agent config path).
///
/// Backwards-compatible entry point that builds an encoder from a codec STRING
/// (e.g. `EncodingConfig.codec`). Resolves the string to a `VideoCodec`, applies
/// the hardware-availability guard, then builds the encoder. The live session
/// uses `select_codec` + `create_encoder_for` (negotiated `VideoCodec`) instead;
/// this remains for the config path and is covered by unit tests.
#[allow(dead_code)]
pub fn create_encoder(codec: &str, quality: u32) -> Result<Box<dyn Encoder>> {
let requested = codec_from_str(codec);
let chosen = select_codec(requested, supports_hardware_h264());
create_encoder_for(chosen, quality)
}
/// Build an `EncodedFrame` carrying a single `video_frame::Encoding` payload.
/// Shared helper so encoders don't each repeat the `VideoFrame` wrapper.
#[allow(dead_code)]
pub(crate) fn wrap_video_frame(
timestamp_ms: i64,
display_id: i32,
sequence: i32,
encoding: video_frame::Encoding,
size: usize,
is_keyframe: bool,
) -> EncodedFrame {
EncodedFrame {
frame: VideoFrame {
timestamp: timestamp_ms,
display_id,
sequence,
encoding: Some(encoding),
},
size,
is_keyframe,
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn codec_from_str_maps_known_and_unknown() {
assert_eq!(codec_from_str("h264"), VideoCodec::H264);
assert_eq!(codec_from_str("H264"), VideoCodec::H264);
assert_eq!(codec_from_str("raw"), VideoCodec::Raw);
assert_eq!(codec_from_str("zstd"), VideoCodec::Raw);
assert_eq!(codec_from_str("auto"), VideoCodec::Raw);
assert_eq!(codec_from_str("vp9"), VideoCodec::Raw);
// HEVC not implemented -> raw, never H265.
assert_eq!(codec_from_str("h265"), VideoCodec::Raw);
assert_eq!(codec_from_str("hevc"), VideoCodec::Raw);
assert_eq!(codec_from_str(""), VideoCodec::Raw);
}
#[test]
fn select_codec_honors_hardware_guard() {
// Server negotiated H.264 and HW is present -> H.264.
assert_eq!(select_codec(VideoCodec::H264, true), VideoCodec::H264);
// Server negotiated H.264 but no HW -> raw (never forced).
assert_eq!(select_codec(VideoCodec::H264, false), VideoCodec::Raw);
// Raw stays raw regardless of HW.
assert_eq!(select_codec(VideoCodec::Raw, true), VideoCodec::Raw);
assert_eq!(select_codec(VideoCodec::Raw, false), VideoCodec::Raw);
// HEVC always degrades to raw (unimplemented).
assert_eq!(select_codec(VideoCodec::H265, true), VideoCodec::Raw);
}
#[test]
fn raw_factory_always_succeeds() {
// Raw must always construct (the guaranteed fallback).
let enc = create_encoder_for(VideoCodec::Raw, 75).unwrap();
assert_eq!(enc.name(), "raw+zstd");
}
#[test]
fn create_encoder_string_path_resolves_to_raw_without_hw() {
// On a machine without a HW encoder (CI / non-Windows), "h264" must
// resolve to a working raw encoder, not an error.
let enc = create_encoder("h264", 75).unwrap();
// Without HW it is raw; with HW it would be the H.264 encoder. We only
// assert it constructed.
let _ = enc.name();
}
#[test]
fn create_encoder_auto_is_raw() {
let enc = create_encoder("auto", 75).unwrap();
assert_eq!(enc.name(), "raw+zstd");
}
}
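
Note on the fallback factory: `create_encoder_for` never surfaces an H.264 init error to the session; it logs and returns a raw encoder instead. The sketch below shows that shape in isolation, assuming stand-in types. `Codec`, `FrameEncoder`, `RawStub`, `H264Stub`, `InitError`, and `build_encoder` are all hypothetical names, and the `"h264"` encoder name is an assumption (only `"raw+zstd"` appears in the diff).

```rust
use std::fmt;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Codec {
    Raw,
    H264,
}

trait FrameEncoder {
    fn name(&self) -> &str;
}

struct RawStub;
impl FrameEncoder for RawStub {
    fn name(&self) -> &str {
        "raw+zstd"
    }
}

struct H264Stub;
impl FrameEncoder for H264Stub {
    fn name(&self) -> &str {
        "h264" // assumed name, not taken from the commit
    }
}

#[derive(Debug)]
struct InitError(&'static str);
impl fmt::Display for InitError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

/// Build an encoder for `codec`. `try_h264` stands in for the hardware
/// encoder constructor; if it fails, the raw encoder is returned instead.
fn build_encoder<F>(codec: Codec, try_h264: F) -> Box<dyn FrameEncoder>
where
    F: FnOnce() -> Result<H264Stub, InitError>,
{
    match codec {
        Codec::H264 => match try_h264() {
            Ok(enc) => Box::new(enc),
            Err(e) => {
                eprintln!("H.264 init failed ({e}); falling back to raw");
                Box::new(RawStub)
            }
        },
        Codec::Raw => Box::new(RawStub),
    }
}
```

Passing the constructor as a closure is only a convenience for the sketch: it lets the failure path be exercised without real hardware, which the commit's own tests cannot do on CI.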


@@ -61,6 +61,10 @@ pub struct SessionManager {
input: Option<InputController>,
// Streaming state
current_viewer_id: Option<String>,
// Codec negotiated by the server for the current stream (Task 7). Set from
// StartStream.video_codec; the encoder is built from it (guarded by the
// agent's own hardware capability, with raw as the safe fallback).
negotiated_codec: crate::proto::VideoCodec,
// System info for status reports
hostname: String,
is_elevated: bool,
@@ -87,6 +91,8 @@ impl SessionManager {
encoder: None,
input: None,
current_viewer_id: None,
// Default to RAW until the server negotiates otherwise (StartStream).
negotiated_codec: crate::proto::VideoCodec::Raw,
hostname,
is_elevated,
start_time: Instant::now(),
@@ -168,14 +174,20 @@ impl SessionManager {
self.capturer = Some(capturer);
tracing::info!("Capturer created successfully");
// Create encoder with panic protection
// Create encoder from the NEGOTIATED codec (Task 7), guarded by the
// agent's own hardware capability. `create_encoder_for` selects the H.264
// encoder only if it can actually be constructed, otherwise it returns a
// working raw encoder — so this never breaks the session.
let chosen =
encoder::select_codec(self.negotiated_codec, encoder::supports_hardware_h264());
tracing::debug!(
"Creating encoder (codec={}, quality={})...",
self.config.encoding.codec,
"Creating encoder (negotiated={:?}, chosen={:?}, quality={})...",
self.negotiated_codec,
chosen,
self.config.encoding.quality
);
let encoder = match std::panic::catch_unwind(std::panic::AssertUnwindSafe(|| {
encoder::create_encoder(&self.config.encoding.codec, self.config.encoding.quality)
encoder::create_encoder_for(chosen, self.config.encoding.quality)
})) {
Ok(result) => result?,
Err(e) => {
@@ -232,6 +244,9 @@ impl SessionManager {
organization: self.config.company.clone().unwrap_or_default(),
site: self.config.site.clone().unwrap_or_default(),
tags: self.config.tags.clone(),
// Advertise hardware H.264 capability so the server can negotiate the
// codec (Task 7). Detected once and cached by the encoder module.
supports_h264: encoder::supports_hardware_h264(),
};
let msg = Message {
@@ -336,6 +351,15 @@ impl SessionManager {
match payload {
message::Payload::StartStream(start) => {
tracing::info!("StartStream received from viewer: {}", start.viewer_id);
// Apply the server-negotiated codec (Task 7) BEFORE
// building the encoder. When an older server omits the
// field, it is read as 0 = VIDEO_CODEC_RAW (the proto3
// default), preserving the raw default. `select_codec`
// (in init_streaming) re-guards against missing hardware.
self.negotiated_codec =
crate::proto::VideoCodec::try_from(start.video_codec)
.unwrap_or(crate::proto::VideoCodec::Raw);
tracing::info!("Server negotiated codec: {:?}", self.negotiated_codec);
if let Err(e) = self.init_streaming() {
tracing::error!("Failed to init streaming: {}", e);
} else {

agent/src/viewer/decoder.rs Normal file

@@ -0,0 +1,452 @@
//! H.264 video decoder for the native viewer (Task 7).
//!
//! FIRST-CUT / COMPILE-VERIFIED ONLY. Decodes an H.264 elementary stream
//! (`EncodedFrame{h264}`) via a Media Foundation H.264 decoder MFT into NV12,
//! then converts NV12 -> BGRA so it can flow through the EXISTING raw render
//! path (`render::FrameData { compressed: false, BGRA }`). Not yet validated on
//! real hardware with a live stream — that is plan Task 8. On decode-init
//! failure the decoder reports an error and the viewer logs it; the raw-frame
//! render path is untouched for raw sessions.
//!
//! The decoder is created lazily on the first H.264 frame (so a raw session
//! never spins up MF). It is `!Send` (COM), so it is created and used entirely
//! on the dedicated `gc-h264-decode` OS thread (see
//! `viewer::spawn_h264_decode_worker`), never on a tokio task.
#![cfg(windows)]
use anyhow::{anyhow, Context, Result};
use windows::Win32::Media::MediaFoundation::{
IMFMediaType, IMFSample, IMFTransform, MFCreateMediaType, MFCreateMemoryBuffer, MFCreateSample,
MFMediaType_Video, MFShutdown, MFStartup, MFTEnumEx, MFVideoFormat_H264, MFVideoFormat_NV12,
MFSTARTUP_LITE, MFT_CATEGORY_VIDEO_DECODER, MFT_ENUM_FLAG_SORTANDFILTER, MFT_ENUM_FLAG_SYNCMFT,
MFT_MESSAGE_NOTIFY_BEGIN_STREAMING, MFT_MESSAGE_NOTIFY_END_OF_STREAM,
MFT_MESSAGE_NOTIFY_END_STREAMING, MFT_MESSAGE_NOTIFY_START_OF_STREAM, MFT_OUTPUT_DATA_BUFFER,
MFT_OUTPUT_STREAM_INFO, MFT_REGISTER_TYPE_INFO, MF_E_TRANSFORM_NEED_MORE_INPUT,
MF_E_TRANSFORM_STREAM_CHANGE, MF_MT_FRAME_SIZE, MF_MT_MAJOR_TYPE, MF_MT_SUBTYPE,
};
/// A decoded frame, already converted from NV12 to BGRA, with its dimensions.
pub struct DecodedFrame {
pub width: u32,
pub height: u32,
/// BGRA pixels (4 bytes/px), ready for `render::FrameData`.
pub bgra: Vec<u8>,
}
/// Media Foundation H.264 decoder wrapper.
pub struct H264Decoder {
transform: IMFTransform,
width: u32,
height: u32,
streaming: bool,
input_stream_id: u32,
output_stream_id: u32,
mf_started: bool,
}
// NOTE: H264Decoder is intentionally NOT `Send`. It wraps COM interfaces with
// thread affinity and is created + used entirely on the dedicated `gc-h264-decode`
// OS thread (see viewer::spawn_h264_decode_worker), so it never crosses a thread
// boundary and does not need a Send assertion.
impl H264Decoder {
/// Construct an H.264 decoder MFT and set its input type to H.264. The
/// output type (NV12) is negotiated after the first frames decode the
/// sequence header (we (re)read the real frame size on a stream change).
pub fn new() -> Result<Self> {
unsafe {
MFStartup(mf_version(), MFSTARTUP_LITE).context("MFStartup (decoder)")?;
let transform = match activate_decoder() {
Ok(t) => t,
Err(e) => {
let _ = MFShutdown();
return Err(e);
}
};
let mut dec = Self {
transform,
width: 0,
height: 0,
streaming: false,
input_stream_id: 0,
output_stream_id: 0,
mf_started: true,
};
dec.configure_input()?;
Ok(dec)
}
}
/// Set the decoder input type to H.264 (no fixed frame size — the decoder
/// learns it from the bitstream).
unsafe fn configure_input(&mut self) -> Result<()> {
let in_type: IMFMediaType = MFCreateMediaType().context("MFCreateMediaType(dec in)")?;
in_type.SetGUID(&MF_MT_MAJOR_TYPE, &MFMediaType_Video)?;
in_type.SetGUID(&MF_MT_SUBTYPE, &MFVideoFormat_H264)?;
self.transform
.SetInputType(self.input_stream_id, &in_type, 0)
.context("SetInputType(H264 decode)")?;
Ok(())
}
/// Set the decoder output type to NV12 once the stream size is known.
unsafe fn configure_output_nv12(&mut self) -> Result<()> {
let out_type: IMFMediaType = MFCreateMediaType().context("MFCreateMediaType(dec out)")?;
out_type.SetGUID(&MF_MT_MAJOR_TYPE, &MFMediaType_Video)?;
out_type.SetGUID(&MF_MT_SUBTYPE, &MFVideoFormat_NV12)?;
self.transform
.SetOutputType(self.output_stream_id, &out_type, 0)
.context("SetOutputType(NV12 decode)")?;
Ok(())
}
/// Read the negotiated output frame size from the decoder's current output type.
unsafe fn read_output_size(&mut self) -> Result<(u32, u32)> {
let out_type = self
.transform
.GetOutputCurrentType(self.output_stream_id)
.context("GetOutputCurrentType")?;
let packed = out_type
.GetUINT64(&MF_MT_FRAME_SIZE)
.context("read MF_MT_FRAME_SIZE")?;
let width = (packed >> 32) as u32;
let height = (packed & 0xFFFF_FFFF) as u32;
Ok((width, height))
}
unsafe fn ensure_streaming(&mut self) -> Result<()> {
if !self.streaming {
self.transform
.ProcessMessage(MFT_MESSAGE_NOTIFY_BEGIN_STREAMING, 0)
.context("decoder BEGIN_STREAMING")?;
self.transform
.ProcessMessage(MFT_MESSAGE_NOTIFY_START_OF_STREAM, 0)
.context("decoder START_OF_STREAM")?;
self.streaming = true;
}
Ok(())
}
/// Feed one H.264 access unit and return a decoded BGRA frame if one is
/// produced this tick. `Ok(None)` means the decoder needs more input (normal
/// while it buffers the first GOP).
pub fn decode(&mut self, h264: &[u8], pts_100ns: i64) -> Result<Option<DecodedFrame>> {
if h264.is_empty() {
return Ok(None);
}
unsafe {
self.ensure_streaming()?;
let sample = make_input_sample(h264, pts_100ns)?;
match self
.transform
.ProcessInput(self.input_stream_id, &sample, 0)
{
Ok(()) => {}
Err(e) if e.code() == MF_E_TRANSFORM_NEED_MORE_INPUT => {}
Err(e) => return Err(anyhow!("decoder ProcessInput failed: {e:#}")),
}
self.drain_one()
}
}
/// Drain one decoded output sample, handling the initial NV12 output-type
/// negotiation (`MF_E_TRANSFORM_STREAM_CHANGE`).
unsafe fn drain_one(&mut self) -> Result<Option<DecodedFrame>> {
loop {
// If we have not yet set an output type, do so now (NV12). The first
// ProcessOutput typically returns STREAM_CHANGE until this is set.
if self.width == 0 {
// Try to set NV12 output; ignore failures here (the decoder may
// require a STREAM_CHANGE round-trip first).
let _ = self.configure_output_nv12();
}
let stream_info: MFT_OUTPUT_STREAM_INFO = self
.transform
.GetOutputStreamInfo(self.output_stream_id)
.context("decoder GetOutputStreamInfo")?;
const MFT_OUTPUT_STREAM_PROVIDES_SAMPLES: u32 = 0x100;
let mft_provides = stream_info.dwFlags & MFT_OUTPUT_STREAM_PROVIDES_SAMPLES != 0;
let mut out_buffer = MFT_OUTPUT_DATA_BUFFER {
dwStreamID: self.output_stream_id,
..Default::default()
};
if !mft_provides {
let alloc = stream_info.cbSize.max(self.guess_nv12_size());
let sample: IMFSample = MFCreateSample().context("MFCreateSample(dec out)")?;
let buffer =
MFCreateMemoryBuffer(alloc).context("MFCreateMemoryBuffer(dec out)")?;
sample.AddBuffer(&buffer)?;
out_buffer.pSample = std::mem::ManuallyDrop::new(Some(sample));
}
let mut status: u32 = 0;
let mut bufs = [out_buffer];
let hr = self.transform.ProcessOutput(0, &mut bufs, &mut status);
let produced = std::mem::ManuallyDrop::take(&mut bufs[0].pSample);
match hr {
Ok(()) => {
// (Re)read the negotiated size in case it just became known.
if let Ok((w, h)) = self.read_output_size() {
self.width = w;
self.height = h;
}
let Some(sample) = produced else {
return Ok(None);
};
if self.width == 0 || self.height == 0 {
return Ok(None);
}
let nv12 = sample_to_vec(&sample)?;
let bgra = nv12_to_bgra(&nv12, self.width, self.height)?;
return Ok(Some(DecodedFrame {
width: self.width,
height: self.height,
bgra,
}));
}
Err(e) if e.code() == MF_E_TRANSFORM_NEED_MORE_INPUT => return Ok(None),
Err(e) if e.code() == MF_E_TRANSFORM_STREAM_CHANGE => {
// The decoder learned the frame size: (re)negotiate NV12 out,
// record the size, and retry the drain.
self.configure_output_nv12()
.context("decoder output renegotiation after stream change")?;
if let Ok((w, h)) = self.read_output_size() {
self.width = w;
self.height = h;
}
continue;
}
Err(e) => return Err(anyhow!("decoder ProcessOutput failed: {e:#}")),
}
}
}
/// Conservative NV12 buffer estimate when the decoder doesn't report cbSize.
fn guess_nv12_size(&self) -> u32 {
if self.width != 0 && self.height != 0 {
self.width * self.height * 3 / 2
} else {
// Assume a 1080p NV12 frame until the real size is known (this is
// not a true upper bound for larger streams).
1920 * 1080 * 3 / 2
}
}
}
impl Drop for H264Decoder {
fn drop(&mut self) {
unsafe {
if self.streaming {
let _ = self
.transform
.ProcessMessage(MFT_MESSAGE_NOTIFY_END_OF_STREAM, 0);
let _ = self
.transform
.ProcessMessage(MFT_MESSAGE_NOTIFY_END_STREAMING, 0);
}
if self.mf_started {
let _ = MFShutdown();
}
}
}
}
/// Enumerate and activate an H.264 decoder MFT. Only synchronous MFTs are
/// requested, so this normally yields a software decoder, which is acceptable:
/// decoding does not depend on the hardware H.264 encoder.
unsafe fn activate_decoder() -> Result<IMFTransform> {
let input_type = MFT_REGISTER_TYPE_INFO {
guidMajorType: MFMediaType_Video,
guidSubtype: MFVideoFormat_H264,
};
let mut activate_ptr: *mut Option<windows::Win32::Media::MediaFoundation::IMFActivate> =
std::ptr::null_mut();
let mut count: u32 = 0;
// Synchronous MFTs only: MFT_ENUM_FLAG_HARDWARE is not set, so hardware
// (asynchronous) decoders are not enumerated. SYNCMFT keeps the simple
// ProcessInput/Output contract this first cut uses.
MFTEnumEx(
MFT_CATEGORY_VIDEO_DECODER,
MFT_ENUM_FLAG_SYNCMFT | MFT_ENUM_FLAG_SORTANDFILTER,
Some(&input_type as *const _),
None,
&mut activate_ptr,
&mut count,
)
.context("MFTEnumEx (H264 decoder)")?;
if count == 0 || activate_ptr.is_null() {
if !activate_ptr.is_null() {
windows::Win32::System::Com::CoTaskMemFree(Some(activate_ptr as *const _));
}
return Err(anyhow!("no H.264 decoder MFT available"));
}
let slice = std::slice::from_raw_parts_mut(activate_ptr, count as usize);
let mut chosen: Option<IMFTransform> = None;
for entry in slice.iter_mut() {
if chosen.is_none() {
if let Some(activate) = entry.as_ref() {
if let Ok(t) = activate.ActivateObject::<IMFTransform>() {
chosen = Some(t);
}
}
}
entry.take();
}
windows::Win32::System::Com::CoTaskMemFree(Some(activate_ptr as *const _));
chosen.ok_or_else(|| anyhow!("failed to activate H.264 decoder MFT"))
}
/// Wrap an H.264 access unit into an IMFSample.
unsafe fn make_input_sample(data: &[u8], pts_100ns: i64) -> Result<IMFSample> {
let sample: IMFSample = MFCreateSample().context("MFCreateSample(dec in)")?;
let buffer = MFCreateMemoryBuffer(data.len() as u32).context("MFCreateMemoryBuffer(dec in)")?;
let mut ptr: *mut u8 = std::ptr::null_mut();
let mut max_len: u32 = 0;
buffer
.Lock(&mut ptr, Some(&mut max_len), None)
.context("decoder input Lock")?;
if (max_len as usize) < data.len() || ptr.is_null() {
let _ = buffer.Unlock();
return Err(anyhow!("MF buffer too small for H.264 access unit"));
}
std::ptr::copy_nonoverlapping(data.as_ptr(), ptr, data.len());
buffer.SetCurrentLength(data.len() as u32)?;
buffer.Unlock()?;
sample.AddBuffer(&buffer)?;
sample.SetSampleTime(pts_100ns)?;
Ok(sample)
}
/// Copy a sample's contiguous bytes into a Vec.
unsafe fn sample_to_vec(sample: &IMFSample) -> Result<Vec<u8>> {
let buffer = sample
.ConvertToContiguousBuffer()
.context("decoder ConvertToContiguousBuffer")?;
let mut ptr: *mut u8 = std::ptr::null_mut();
let mut len: u32 = 0;
buffer
.Lock(&mut ptr, None, Some(&mut len))
.context("decoder output Lock")?;
let out = if ptr.is_null() || len == 0 {
Vec::new()
} else {
std::slice::from_raw_parts(ptr, len as usize).to_vec()
};
let _ = buffer.Unlock();
Ok(out)
}
/// MF version word for `MFStartup` (see encoder::h264).
fn mf_version() -> u32 {
0x0002_0070
}
/// Convert an NV12 buffer to BGRA (BT.601 limited range). Inverse of the
/// encoder's BGRA->NV12. Shared with the unit tests below.
pub fn nv12_to_bgra(nv12: &[u8], width: u32, height: u32) -> Result<Vec<u8>> {
let w = width as usize;
let h = height as usize;
let y_size = w * h;
let need = y_size * 3 / 2;
if nv12.len() < need {
return Err(anyhow!("NV12 buffer too small: {} < {}", nv12.len(), need));
}
let (y_plane, uv_plane) = nv12.split_at(y_size);
let mut bgra = vec![0u8; w * h * 4];
let chroma_cols = w / 2;
for row in 0..h {
for col in 0..w {
let y = y_plane[row * w + col] as i32;
let cx = col / 2;
let cy = row / 2;
let uv_idx = (cy * chroma_cols + cx) * 2;
let u = uv_plane[uv_idx] as i32;
let v = uv_plane[uv_idx + 1] as i32;
// BT.601 limited-range YUV -> RGB.
let c = y - 16;
let d = u - 128;
let e = v - 128;
let r = ((298 * c + 409 * e + 128) >> 8).clamp(0, 255);
let g = ((298 * c - 100 * d - 208 * e + 128) >> 8).clamp(0, 255);
let b = ((298 * c + 516 * d + 128) >> 8).clamp(0, 255);
let px = (row * w + col) * 4;
bgra[px] = b as u8;
bgra[px + 1] = g as u8;
bgra[px + 2] = r as u8;
bgra[px + 3] = 255;
}
}
Ok(bgra)
}
#[cfg(test)]
mod tests {
use super::*;
use crate::encoder::color::{bgra_to_nv12, nv12_size};
/// Round-trip a solid color through BGRA->NV12->BGRA. Chroma subsampling and
/// limited-range rounding introduce small error, so allow a tolerance.
#[test]
fn nv12_bgra_roundtrip_is_approximately_lossless_for_solid_color() {
let w = 4u32;
let h = 4u32;
// Mid gray.
let mut bgra = vec![0u8; (w * h * 4) as usize];
for px in bgra.chunks_mut(4) {
px[0] = 120; // B
px[1] = 120; // G
px[2] = 120; // R
px[3] = 255;
}
let mut nv12 = vec![0u8; nv12_size(w, h)];
bgra_to_nv12(&bgra, w, h, &mut nv12).unwrap();
let back = nv12_to_bgra(&nv12, w, h).unwrap();
for (orig, got) in bgra.chunks(4).zip(back.chunks(4)) {
for ch in 0..3 {
let diff = (orig[ch] as i32 - got[ch] as i32).abs();
assert!(diff <= 6, "channel {ch} drift {diff} too large");
}
assert_eq!(got[3], 255, "alpha must be opaque");
}
}
#[test]
fn nv12_to_bgra_rejects_short_buffer() {
let nv12 = vec![0u8; 4];
assert!(nv12_to_bgra(&nv12, 16, 16).is_err());
}
#[test]
fn black_nv12_decodes_to_black_bgra() {
// Limited-range black: Y=16, UV=128.
let w = 2u32;
let h = 2u32;
let mut nv12 = vec![128u8; nv12_size(w, h)];
for y in nv12.iter_mut().take((w * h) as usize) {
*y = 16;
}
let bgra = nv12_to_bgra(&nv12, w, h).unwrap();
for px in bgra.chunks(4) {
assert!(px[0] <= 2 && px[1] <= 2 && px[2] <= 2, "near-black");
}
}
}
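
Note on the colour math: `nv12_to_bgra` combines two separate steps, locating the interleaved (U, V) pair for a pixel and applying the BT.601 limited-range integer transform. The sketch below separates them so each can be checked on its own. The function names are hypothetical, and like the function in the diff it assumes an even width and tightly packed rows (stride equal to width), which a real decoder output buffer may not guarantee.

```rust
/// Convert one BT.601 limited-range YUV sample to a BGRA pixel, using the
/// same fixed-point coefficients as `nv12_to_bgra` in the diff.
fn yuv_to_bgra_pixel(y: u8, u: u8, v: u8) -> [u8; 4] {
    let c = y as i32 - 16;
    let d = u as i32 - 128;
    let e = v as i32 - 128;
    let r = ((298 * c + 409 * e + 128) >> 8).clamp(0, 255);
    let g = ((298 * c - 100 * d - 208 * e + 128) >> 8).clamp(0, 255);
    let b = ((298 * c + 516 * d + 128) >> 8).clamp(0, 255);
    [b as u8, g as u8, r as u8, 255]
}

/// Byte offset of the (U, V) pair for pixel (col, row) inside the NV12 chroma
/// plane. Each 2x2 block of pixels shares one pair. Assumes stride == width.
fn nv12_uv_index(col: usize, row: usize, width: usize) -> usize {
    ((row / 2) * (width / 2) + col / 2) * 2
}
```

Limited-range black is Y=16 and white is Y=235 with neutral chroma (U=V=128); values outside that range are clamped rather than wrapped.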


@@ -3,6 +3,8 @@
//! This module provides the viewer functionality for connecting to remote
//! GuruConnect sessions with low-level keyboard hooks for Win key capture.
#[cfg(windows)]
mod decoder;
mod input;
mod render;
mod transport;
@@ -31,6 +33,72 @@ pub enum InputEvent {
SpecialKey(proto::SpecialKeyEvent),
}
/// Spawn the dedicated H.264 decode worker thread (Task 7, Windows only).
///
/// Returns a sender for `(h264_access_unit, pts_100ns)`. The worker lazily
/// creates the Media Foundation decoder on the first frame; if creation fails it
/// logs once and then silently drops subsequent frames (the raw render path is
/// never affected). Each decoded frame is converted to BGRA and delivered to the
/// viewer as an uncompressed `FrameData`, reusing the existing render path.
#[cfg(windows)]
fn spawn_h264_decode_worker(
viewer_tx: mpsc::Sender<ViewerEvent>,
) -> std::sync::mpsc::Sender<(Vec<u8>, i64)> {
let (tx, rx) = std::sync::mpsc::channel::<(Vec<u8>, i64)>();
std::thread::Builder::new()
.name("gc-h264-decode".to_string())
.spawn(move || {
let mut decoder: Option<decoder::H264Decoder> = None;
let mut init_failed = false;
while let Ok((data, pts)) = rx.recv() {
if init_failed {
continue;
}
if decoder.is_none() {
match decoder::H264Decoder::new() {
Ok(d) => {
info!("H.264 decoder initialized (Media Foundation)");
decoder = Some(d);
}
Err(e) => {
error!(
"H.264 decoder init failed: {e:#}; H.264 frames will be dropped"
);
init_failed = true;
continue;
}
}
}
let dec = decoder.as_mut().expect("decoder present after init");
match dec.decode(&data, pts) {
Ok(Some(decoded)) => {
let frame = render::FrameData {
width: decoded.width,
height: decoded.height,
data: decoded.bgra,
compressed: false, // already BGRA
is_keyframe: false,
};
if viewer_tx.blocking_send(ViewerEvent::Frame(frame)).is_err() {
// Viewer closed; stop the worker.
break;
}
}
Ok(None) => { /* decoder buffering; no output this tick */ }
Err(e) => {
warn!("H.264 decode error: {e:#}");
}
}
}
})
.expect("failed to spawn H.264 decode worker thread");
tx
}
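
Note on the worker pattern: the function above keeps thread-affine state on one OS thread, builds it lazily on the first item, and latches an init failure so later items are dropped rather than retried. The sketch below reproduces that control flow with plain `u32` items and std channels only. `run_lazy_worker` and its `init`/`process` parameters are hypothetical stand-ins for the decoder constructor and `decode`; nothing here touches Media Foundation.

```rust
use std::sync::mpsc;
use std::thread;

/// Feed `items` to a worker on its own OS thread and return what it produced.
/// The state `S` is created inside the thread, so it does not need to be `Send`.
fn run_lazy_worker<S, I, P>(items: Vec<u32>, init: I, process: P) -> Vec<u32>
where
    S: 'static,
    I: Fn() -> Result<S, String> + Send + 'static,
    P: Fn(&mut S, u32) -> Option<u32> + Send + 'static,
{
    let (in_tx, in_rx) = mpsc::channel::<u32>();
    let (out_tx, out_rx) = mpsc::channel::<u32>();

    let handle = thread::Builder::new()
        .name("lazy-worker".to_string())
        .spawn(move || {
            let mut state: Option<S> = None;
            let mut init_failed = false;
            while let Ok(item) = in_rx.recv() {
                if init_failed {
                    continue; // latched failure: drop every later item
                }
                if state.is_none() {
                    match init() {
                        Ok(s) => state = Some(s),
                        Err(e) => {
                            eprintln!("init failed: {e}; items will be dropped");
                            init_failed = true;
                            continue;
                        }
                    }
                }
                if let Some(s) = state.as_mut() {
                    // `None` models "needs more input": nothing to forward.
                    if let Some(out) = process(s, item) {
                        if out_tx.send(out).is_err() {
                            break; // consumer went away; stop the worker
                        }
                    }
                }
            }
        })
        .expect("failed to spawn worker thread");

    for item in items {
        let _ = in_tx.send(item);
    }
    drop(in_tx); // closing the channel ends the worker loop
    handle.join().expect("worker panicked");
    out_rx.iter().collect()
}
```

Because the state is constructed after the thread starts, a `!Send` value such as an `Rc` (or a COM interface in the real code) can be used without any `unsafe impl Send`.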
/// Run the viewer to connect to a remote session
pub async fn run(server_url: &str, session_id: &str, api_key: &str) -> Result<()> {
info!("GuruConnect Viewer starting");
@@ -77,13 +145,23 @@ pub async fn run(server_url: &str, session_id: &str, api_key: &str) -> Result<()
}
});
// H.264 decode worker (Task 7, Windows only). The Media Foundation decoder
// wraps COM interfaces with thread affinity, so it runs on a DEDICATED OS
// thread (not a tokio task, which can migrate across workers at await
// points). The receive task forwards H.264 access units to it over a std
// channel; the worker decodes to BGRA and pushes a FrameData back through
// the viewer channel via `blocking_send`. On decoder-init failure the worker
// logs and drops H.264 frames (the raw path is unaffected).
#[cfg(windows)]
let h264_tx = spawn_h264_decode_worker(viewer_tx.clone());
// Spawn task to receive messages from server
let viewer_tx_recv = viewer_tx.clone();
let receive_task = tokio::spawn(async move {
while let Some(msg) = ws_receiver.recv().await {
match msg.payload {
Some(proto::message::Payload::VideoFrame(frame)) => {
if let Some(proto::video_frame::Encoding::Raw(raw)) = frame.encoding {
Some(proto::message::Payload::VideoFrame(frame)) => match frame.encoding {
Some(proto::video_frame::Encoding::Raw(raw)) => {
let frame_data = render::FrameData {
width: raw.width as u32,
height: raw.height as u32,
@@ -93,7 +171,23 @@ pub async fn run(server_url: &str, session_id: &str, api_key: &str) -> Result<()
};
let _ = viewer_tx_recv.send(ViewerEvent::Frame(frame_data)).await;
}
}
Some(proto::video_frame::Encoding::H264(enc)) => {
// Forward to the decode worker (Windows). On other
// platforms H.264 is never negotiated, so this is dead.
#[cfg(windows)]
{
if h264_tx.send((enc.data, enc.pts)).is_err() {
warn!("H.264 decode worker unavailable; dropping frame");
}
}
#[cfg(not(windows))]
{
let _ = enc;
}
}
// VP9/H265 not implemented on the viewer (raw + H.264 only).
_ => {}
},
Some(proto::message::Payload::CursorPosition(pos)) => {
let _ = viewer_tx_recv
.send(ViewerEvent::CursorPosition(pos.x, pos.y, pos.visible))


@@ -17,6 +17,23 @@ message SessionResponse {
string session_id = 2;
string error = 3;
DisplayInfo display_info = 4;
// Negotiated video codec for this session (Task 7). The plan models the
// codec selection here; however the LIVE server->agent handshake in v2 is
// done over WebSocket query params + StartStream (SessionRequest/
// SessionResponse are not exchanged on the wire today), so the codec the
// agent actually applies is carried on StartStream.video_codec below. This
// field is kept for spec parity / future use; raw is the safe default.
VideoCodec video_codec = 5;
}
// Negotiated video codec (Task 7). RAW = salvaged BGRA+Zstd+dirty-rects, the
// guaranteed fallback and current default. H264 = hardware Media Foundation
// encode, the negotiated upgrade (compile-verified; validated on hardware in
// plan Task 8). H265/HEVC is future opt-in (TODO).
enum VideoCodec {
VIDEO_CODEC_RAW = 0; // Raw BGRA + Zstd (safe default / fallback)
VIDEO_CODEC_H264 = 1; // Hardware H.264 via Media Foundation
VIDEO_CODEC_H265 = 2; // Future / opt-in (not implemented)
}
enum SessionType {
@@ -268,6 +285,13 @@ message Disconnect {
message StartStream {
string viewer_id = 1; // ID of viewer requesting stream
int32 display_id = 2; // Which display to stream (0 = primary)
// Negotiated codec for this stream (Task 7). The server selects this from
// the agent's advertised capability (AgentStatus.supports_h264) and its
// policy (DEFAULT_PREFER_H264, currently false so we never ship unvalidated
// H.264 as the default). The agent builds its encoder from this value;
// VIDEO_CODEC_RAW (0, the default for older servers) keeps the salvaged
// raw+Zstd path. On any H.264 init failure the agent falls back to raw.
VideoCodec video_codec = 3;
}
// Server commands agent to stop streaming
@@ -287,6 +311,12 @@ message AgentStatus {
string organization = 8; // Company/organization name
string site = 9; // Site/location name
repeated string tags = 10; // Tags for categorization
// HW-encode capability (Task 7). True when the agent enumerated a HARDWARE
// H.264 encoder MFT at startup (MFTEnumEx, MFT_CATEGORY_VIDEO_ENCODER,
// MFVideoFormat_H264, hardware flag). The server uses this for codec
// negotiation (see StartStream.video_codec). Detected once and cached;
// false on non-Windows / no HW encoder / MF unavailable.
bool supports_h264 = 11;
}
// Server commands agent to uninstall itself
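
Note on the negotiation path: the proto comments above describe two independent decisions, a server-side policy choice stamped onto `StartStream.video_codec` and an agent-side re-check against local hardware. The sketch below chains them the way the commit describes, with the codec crossing the wire as an `i32`. The enum is redeclared locally so the example is self-contained, and `from_wire`, `server_select`, `agent_guard`, and `negotiate` are hypothetical names rather than the functions in this commit.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum VideoCodec {
    Raw = 0,
    H264 = 1,
    H265 = 2,
}

impl VideoCodec {
    /// Decode a wire value. Unknown values and the default 0 both map to Raw.
    fn from_wire(value: i32) -> VideoCodec {
        match value {
            1 => VideoCodec::H264,
            2 => VideoCodec::H265,
            _ => VideoCodec::Raw,
        }
    }
}

/// Server side: H.264 only if the agent advertised it AND policy prefers it.
fn server_select(agent_supports_h264: bool, prefer_h264: bool) -> VideoCodec {
    if agent_supports_h264 && prefer_h264 {
        VideoCodec::H264
    } else {
        VideoCodec::Raw
    }
}

/// Agent side: accept H.264 only if hardware is present right now.
fn agent_guard(negotiated: VideoCodec, hardware_h264_available: bool) -> VideoCodec {
    match negotiated {
        VideoCodec::H264 if hardware_h264_available => VideoCodec::H264,
        _ => VideoCodec::Raw,
    }
}

/// End-to-end: advertise, select, cross the wire as i32, re-guard locally.
fn negotiate(advertised: bool, prefer_h264: bool, hardware_now: bool) -> VideoCodec {
    let wire = server_select(advertised, prefer_h264) as i32;
    agent_guard(VideoCodec::from_wire(wire), hardware_now)
}
```

With the policy flag off, as it is in this commit, every combination resolves to raw; the agent guard only matters once the policy is flipped.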


@@ -805,6 +805,7 @@ async fn handle_agent_connection(
organization.clone(),
site.clone(),
status.tags.clone(),
status.supports_h264,
)
.await;


@@ -93,6 +93,36 @@ pub struct Session {
pub organization: Option<String>, // Company/organization name
pub site: Option<String>, // Site/location name
pub tags: Vec<String>, // Tags for categorization
/// Whether the agent advertised a hardware H.264 encoder (Task 7). Set from
/// `AgentStatus.supports_h264`; drives codec negotiation in `select_video_codec`.
pub supports_h264: bool,
}
/// Default codec-negotiation policy (Task 7).
///
/// `false` means: even when an agent advertises hardware H.264 support, the
/// server still negotiates RAW. H.264 is compile-verified only and not yet
/// validated on real hardware (plan Task 8), so we deliberately do NOT ship it
/// as the default — raw+Zstd stays the guaranteed working path. Flip this to
/// `true` once H.264 is live-validated, or make it per-tenant policy later.
pub const DEFAULT_PREFER_H264: bool = false;
/// Negotiate the video codec for a stream (Task 7).
///
/// Pure decision function (unit-tested): given whether the agent advertised
/// hardware H.264 and whether policy prefers H.264, pick the codec. H.264 is
/// chosen ONLY when both the agent supports it AND policy allows it; otherwise
/// raw — the safe default/fallback. HEVC is intentionally never selected here
/// (future opt-in; TODO).
pub fn select_video_codec(
agent_supports_h264: bool,
prefer_h264: bool,
) -> crate::proto::VideoCodec {
if agent_supports_h264 && prefer_h264 {
crate::proto::VideoCodec::H264
} else {
crate::proto::VideoCodec::Raw
}
}
/// Channel for sending frames from agent to viewers
@@ -206,6 +236,7 @@ impl SessionManager {
organization: None,
site: None,
tags: Vec::new(),
supports_h264: false,
};
let session_data = SessionData {
@@ -240,12 +271,14 @@ impl SessionManager {
organization: Option<String>,
site: Option<String>,
tags: Vec<String>,
supports_h264: bool,
) {
let mut sessions = self.sessions.write().await;
if let Some(session_data) = sessions.get_mut(&session_id) {
session_data.info.last_heartbeat = chrono::Utc::now();
session_data.last_heartbeat_instant = Instant::now();
session_data.info.is_streaming = is_streaming;
session_data.info.supports_h264 = supports_h264;
if let Some(os) = os_version {
session_data.info.os_version = Some(os);
}
@@ -409,10 +442,23 @@ impl SessionManager {
use crate::proto;
use prost::Message;
// Negotiate the video codec for this stream (Task 7): H.264 only when the
// agent advertised hardware support AND policy prefers it. With
// DEFAULT_PREFER_H264 = false this always resolves to RAW today (H.264 is
// compile-verified only; hardware validation is deferred to Task 8).
let codec = select_video_codec(session_data.info.supports_h264, DEFAULT_PREFER_H264);
tracing::info!(
    "StartStream codec negotiation: agent_supports_h264={}, prefer_h264={} -> {:?}",
    session_data.info.supports_h264,
    DEFAULT_PREFER_H264,
    codec
);
let start_stream = proto::Message {
    payload: Some(proto::message::Payload::StartStream(proto::StartStream {
        viewer_id: viewer_id.to_string(),
        display_id: 0, // Primary display
        video_codec: codec as i32,
    })),
};
@@ -618,6 +664,7 @@ impl SessionManager {
organization: None,
site: None,
tags: Vec::new(),
supports_h264: false,
};
// Create placeholder channels (will be replaced on reconnect)
@@ -717,6 +764,69 @@ mod tests {
.is_some());
}
#[test]
fn codec_negotiation_picks_h264_only_when_supported_and_preferred() {
    use crate::proto::VideoCodec;
    // Agent supports H.264 AND policy prefers it -> H.264.
    assert_eq!(select_video_codec(true, true), VideoCodec::H264);
    // Agent supports it but policy does not prefer it -> raw (the safe default).
    assert_eq!(select_video_codec(true, false), VideoCodec::Raw);
    // Policy prefers H.264 but the agent has no HW encoder -> raw.
    assert_eq!(select_video_codec(false, true), VideoCodec::Raw);
    // Neither -> raw.
    assert_eq!(select_video_codec(false, false), VideoCodec::Raw);
}
#[test]
fn default_policy_does_not_prefer_h264() {
    // Guardrail: until H.264 is hardware-validated (Task 8) the default policy
    // MUST keep raw as the negotiated codec even for capable agents. We assert
    // the OBSERVABLE behavior (codec selection under the default policy) rather
    // than the constant directly, which keeps the test meaningful if the policy
    // later becomes dynamic.
    let chosen = select_video_codec(true, DEFAULT_PREFER_H264);
    assert_eq!(
        chosen,
        crate::proto::VideoCodec::Raw,
        "default policy must negotiate raw until H.264 is hardware-validated"
    );
}
#[tokio::test]
async fn agent_status_updates_h264_capability() {
    let mgr = SessionManager::new();
    let (session_id, _frame_tx, _input_rx) = mgr
        .register_agent("agent-cap".to_string(), "Cap PC".to_string(), true)
        .await;
    // Default is false until a status reports capability.
    assert_eq!(
        mgr.get_session(session_id).await.map(|s| s.supports_h264),
        Some(false)
    );
    mgr.update_agent_status(
        session_id,
        Some("Windows".to_string()),
        true,
        10,
        1,
        false,
        Some("0.2.0".to_string()),
        None,
        None,
        Vec::new(),
        true, // supports_h264
    )
    .await;
    assert_eq!(
        mgr.get_session(session_id).await.map(|s| s.supports_h264),
        Some(true)
    );
}
#[tokio::test]
async fn denied_attended_session_keeps_viewer_blocked() {
let mgr = SessionManager::new();

@@ -393,11 +393,104 @@ Reference: SPEC-002 §4.1/§4.2; salvage ledger §2; `agent/src/input/keyboard.r
---
## Task 7: Hardware H.264 encode + negotiated raw/Zstd fallback
## Task 7 [IMPLEMENTED 2026-05-30 — self-verified on local Windows toolchain: `cargo fmt --all --check` clean, `cargo clippy --workspace --all-targets --all-features -- -D warnings` exit 0, `cargo test --workspace` 89 pass (36 agent + 53 server; was 70, no regressions), `cargo build --workspace` ok; pending Code Review]: Hardware H.264 encode + negotiated raw/Zstd fallback
Files touched: `agent/src/encoder/` (`mod.rs`, `h264.rs` [new], `raw.rs` [salvaged]),
`agent/src/capture/` (feed), `agent/src/viewer/` (decode), `proto/guruconnect.proto`
(`AgentStatus` capability, `SessionResponse` codec), `server/src/session/mod.rs` (negotiation).
> [IMPLEMENTED] Raw+Zstd remains the DEFAULT and guaranteed fallback; H.264 is a
> negotiated upgrade that is COMPILE-VERIFIED ONLY (live MF encode/decode is Task
> 8 — needs real GPU + frames). The testable parts (abstraction, factory,
> negotiation, capability plumbing, color-conversion math) are done solidly with
> unit tests; the MF H.264 encoder and viewer decoder are first-cut, clearly
> marked, and gated behind a default-off policy so unvalidated H.264 never ships
> as the default.
>
> 1. ENCODER ABSTRACTION (`agent/src/encoder/mod.rs`): the existing `Encoder`
> trait (`encode(&mut self, &CapturedFrame) -> Result<EncodedFrame>`) is the
> abstraction; `RawEncoder` (salvaged raw+Zstd+dirty-rects, UNCHANGED behavior)
> and the new `H264Encoder` both implement it. Factory split into pure pieces:
> `codec_from_str` (config-string -> `VideoCodec`), `select_codec(negotiated,
> hardware_available)` (agent-side guard: H.264 only if HW present, HEVC->raw,
> else raw), and `create_encoder_for(VideoCodec, quality)` (builds the encoder;
> on H.264 init failure logs + returns a RAW encoder so the session never
> breaks). UNIT-TESTED: codec_from_str mapping, select_codec guard matrix, raw
> factory always succeeds, string path resolves to raw without HW.
> 2. CAPABILITY + NEGOTIATION (testable, done well):
> - `encoder/capability.rs`: `supports_hardware_h264()` probes MF once
> (`MFTEnumEx(MFT_CATEGORY_VIDEO_ENCODER, MFT_ENUM_FLAG_HARDWARE,
> MFVideoFormat_H264)`), caches the bool via `OnceLock`; false on non-Windows
> / no HW / MF error. Advertised in `AgentStatus.supports_h264` (proto field
> 11, additive).
> - Server (`server/src/session/mod.rs`): `select_video_codec(agent_supports,
> prefer_h264)` is a PURE decision fn — H.264 only when BOTH the agent
> supports it AND policy prefers it, else raw. Policy constant
> `DEFAULT_PREFER_H264 = false` (documented: keeps raw as the negotiated codec
> until H.264 is hardware-validated). `supports_h264` stored on the in-memory
> `Session` from `AgentStatus` (`update_agent_status` gained the param). The
> negotiated codec is stamped on `StartStream.video_codec` in
> `send_start_stream_internal` (the LIVE server->agent codec-selection point —
> SessionRequest/SessionResponse are not exchanged on the wire in v2, so the
> proto's `SessionResponse.video_codec` is kept for spec parity but the live
> path uses `StartStream`). UNIT-TESTED: the negotiation matrix, the
> default-policy guardrail (capable agent still gets raw), and the
> `AgentStatus -> supports_h264` ingest.
> - Agent applies it: `StartStream` handler decodes `video_codec`, stores
> `negotiated_codec`, and `init_streaming` builds the encoder via
> `select_codec` + `create_encoder_for` (re-guards on local HW; older server
> sends 0 = RAW, preserving the default).
> 3. MF H.264 ENCODER (`agent/src/encoder/h264.rs`, FIRST-CUT, compile-verified
> only): enumerates+activates a HW H.264 encoder MFT, sets H.264 output then
> NV12 input media types (frame size/rate, bitrate from quality), feeds frames
> (`ProcessInput`) and drains synchronously (`ProcessOutput`, NEED_MORE_INPUT =
> "no output this tick"), emitting `VideoFrame{H264(EncodedFrame{data, keyframe,
> pts, dts})}`. BGRA->NV12 via `encoder/color.rs` (BT.601 limited-range, 2x2 box
> chroma; isolated + UNIT-TESTED: size, odd-dim/short-buffer rejection, black/
> white/red reference values, plane coverage). On ANY init failure the FACTORY
> falls back to raw (logged); per-frame errors surface to the session (which
> logs + continues). Handles resolution change (re-init), keyframe flag
> (CleanPoint), MF buffer alloc for non-sample-providing MFTs. NOT yet live: the
> async-MFT event model is documented as a Task-8 refinement (this cut drains
> synchronously); precise force-IDR (CODECAPI) is a TODO; D3D11 zero-copy
> deferred (feeds CPU NV12).
> 4. VIEWER H.264 DECODE (`agent/src/viewer/decoder.rs` [new], FIRST-CUT,
> compile-verified only): MF H.264 decoder MFT -> NV12 -> BGRA
> (`nv12_to_bgra`, BT.601 inverse, UNIT-TESTED round-trip within tolerance +
> short-buffer + black). Runs on a DEDICATED OS thread (`gc-h264-decode`), NOT a
> tokio task — the MF decoder has COM thread affinity and a tokio task can
> migrate across workers at await points. The receive task forwards H.264 access
> units over a std channel; the worker decodes and pushes BGRA `FrameData`
> through the existing render path via `blocking_send`. On decoder-init failure
> it logs once and drops H.264 frames; the RAW render path is untouched. Handles
> the `MF_E_TRANSFORM_STREAM_CHANGE` NV12 output renegotiation + size discovery.
> 5. RAW STILL WORKS END-TO-END: `RawEncoder` is unchanged; with
> `DEFAULT_PREFER_H264 = false` the server negotiates RAW for every session
> (including capable agents), the agent builds the raw encoder, and the viewer's
> existing `Raw` branch renders it — the guaranteed default/fallback path is
> fully intact and is what runs today.
>
> PROTO (additive — no field renumbered): `VideoCodec` enum (RAW=0, H264=1,
> H265=2); `SessionResponse.video_codec = 5` (spec parity); `StartStream.video_codec
> = 3` (live negotiation); `AgentStatus.supports_h264 = 11` (capability). HEVC is a
> documented TODO/opt-in everywhere (never selected). Cargo.toml: added the
> `Win32_Media_MediaFoundation` + COM windows features (no new external crates).
>
> COMPILE-VERIFIED-ONLY / NEEDS LIVE HARDWARE (Task 8): the MF H.264 encoder
> init/feed/emit on a real GPU, the viewer MF decoder on a live stream, the
> BGRA<->NV12 fidelity end-to-end, and the synchronous-drain timing. The encoder/
> decoder are structured to fall back to raw (encoder) / drop frames + log
> (decoder) on any failure so they cannot break a session even if MF misbehaves.
>
> TESTS ADDED (19): agent +16 (encoder factory/select matrix x5, color BGRA->NV12
> x8, decoder NV12<->BGRA x3), server +3 (codec negotiation matrix, default-policy
> guardrail, AgentStatus capability ingest).
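
The agent-side guard and factory fallback described in items 1 and 2 can be sketched roughly as follows. This is a minimal illustration, not the code in `agent/src/encoder/mod.rs`: the local `VideoCodec` enum stands in for the generated proto type, `codec_from_wire` and `build_with_fallback` are hypothetical names, mapping unknown wire values to raw is an assumption, and the `init_h264` closure stands in for the real Media Foundation initialization.

```rust
/// Hypothetical stand-in for the proto `VideoCodec` enum (RAW=0, H264=1, H265=2).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VideoCodec {
    Raw = 0,
    H264 = 1,
    H265 = 2,
}

/// Decode the wire value from `StartStream.video_codec`.
/// An older server sends 0, which is RAW. Treating unknown values as raw
/// is an assumption made for this sketch.
pub fn codec_from_wire(value: i32) -> VideoCodec {
    match value {
        1 => VideoCodec::H264,
        2 => VideoCodec::H265,
        _ => VideoCodec::Raw,
    }
}

/// Agent-side guard: honor H.264 only when a local hardware encoder exists.
/// HEVC is never selected, so it resolves to raw like everything else.
pub fn select_codec(negotiated: VideoCodec, hardware_available: bool) -> VideoCodec {
    match negotiated {
        VideoCodec::H264 if hardware_available => VideoCodec::H264,
        _ => VideoCodec::Raw,
    }
}

/// Hypothetical factory shape: try the H.264 init and degrade to raw on any
/// failure, so the session always ends up with a working encoder.
/// Returns the codec of the encoder that was actually built.
pub fn build_with_fallback(
    codec: VideoCodec,
    init_h264: impl FnOnce() -> Result<(), String>,
) -> VideoCodec {
    if codec != VideoCodec::H264 {
        return VideoCodec::Raw;
    }
    match init_h264() {
        Ok(()) => VideoCodec::H264,
        Err(reason) => {
            eprintln!("H.264 init failed ({reason}); falling back to raw");
            VideoCodec::Raw
        }
    }
}
```

Keeping the guard and the fallback as separate pure steps is what makes the guard matrix unit-testable without hardware: only the closure passed to `build_with_fallback` would touch Media Foundation.
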
Files touched: `proto/guruconnect.proto` (`VideoCodec` enum + `SessionResponse.video_codec`
+ `StartStream.video_codec` + `AgentStatus.supports_h264`), `agent/Cargo.toml` (MF/COM windows
features), `agent/src/encoder/mod.rs` (trait/factory/select), `agent/src/encoder/raw.rs`
(salvaged, unchanged), `agent/src/encoder/h264.rs` [new], `agent/src/encoder/capability.rs` [new],
`agent/src/encoder/color.rs` [new], `agent/src/session/mod.rs` (negotiated codec apply +
`supports_h264` advertise), `agent/src/viewer/mod.rs` (H.264 route + decode worker),
`agent/src/viewer/decoder.rs` [new], `server/src/session/mod.rs` (`select_video_codec` +
`DEFAULT_PREFER_H264` + `supports_h264` field/ingest + `StartStream` codec stamp),
`server/src/relay/mod.rs` (pass `supports_h264` from `AgentStatus`).
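
For orientation on `agent/src/encoder/color.rs`, which the notes describe as a BT.601 limited-range BGRA->NV12 conversion with 2x2 box chroma, a minimal sketch of that technique might look like this. It is not the committed code: the function name, the `Option` return, the tightly packed (no row padding) input, and the integer coefficients (the widely used 8-bit fixed-point BT.601 approximation) are all assumptions.

```rust
/// BT.601 limited-range luma from 8-bit RGB (common fixed-point approximation).
fn luma(r: i32, g: i32, b: i32) -> u8 {
    (((66 * r + 129 * g + 25 * b + 128) >> 8) + 16).clamp(0, 255) as u8
}

/// BT.601 limited-range chroma (U, V) from 8-bit RGB.
fn chroma(r: i32, g: i32, b: i32) -> (u8, u8) {
    let u = ((-38 * r - 74 * g + 112 * b + 128) >> 8) + 128;
    let v = ((112 * r - 94 * g - 18 * b + 128) >> 8) + 128;
    (u.clamp(0, 255) as u8, v.clamp(0, 255) as u8)
}

/// Convert tightly packed BGRA to NV12 (full-size Y plane followed by an
/// interleaved half-resolution UV plane). Returns `None` for zero or odd
/// dimensions and for buffers shorter than `width * height * 4` bytes.
pub fn bgra_to_nv12(bgra: &[u8], width: usize, height: usize) -> Option<Vec<u8>> {
    if width == 0 || height == 0 || width % 2 != 0 || height % 2 != 0 {
        return None;
    }
    let pixels = width.checked_mul(height)?;
    if bgra.len() < pixels.checked_mul(4)? {
        return None;
    }

    let mut out = vec![0u8; pixels + pixels / 2];
    let (y_plane, uv_plane) = out.split_at_mut(pixels);

    // Luma: one sample per pixel.
    for (i, px) in bgra.chunks_exact(4).take(pixels).enumerate() {
        y_plane[i] = luma(px[2] as i32, px[1] as i32, px[0] as i32);
    }

    // Chroma: average each 2x2 block (box filter), then convert once.
    for by in 0..height / 2 {
        for bx in 0..width / 2 {
            let (mut r, mut g, mut b) = (0i32, 0i32, 0i32);
            for dy in 0..2 {
                for dx in 0..2 {
                    let p = ((by * 2 + dy) * width + bx * 2 + dx) * 4;
                    b += bgra[p] as i32;
                    g += bgra[p + 1] as i32;
                    r += bgra[p + 2] as i32;
                }
            }
            let (u, v) = chroma((r + 2) / 4, (g + 2) / 4, (b + 2) / 4);
            let o = by * width + bx * 2; // UV rows are `width` bytes wide
            uv_plane[o] = u;
            uv_plane[o + 1] = v;
        }
    }
    Some(out)
}
```

With these coefficients a solid black frame maps to Y=16, U=V=128, white to Y=235, U=V=128, and pure red to Y=82, U=90, V=240. The reference values used by the actual unit tests may differ if `color.rs` uses different rounding.
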
- HW **H.264** via Windows Media Foundation (transparently NVENC/AMF/QuickSync) emitting the proto's
`EncodedFrame` (h264). Native viewer decodes via MF/D3D11.