feat(agent,server): v2 secure-session-core Task 7 - HW H.264 + negotiated raw fallback
All checks were successful
All checks were successful
SPEC-002 Phase 1 Task 7 (the last), code-reviewed APPROVED, locally verified (cargo fmt + clippy -D warnings exit 0 + cargo test --workspace 89 pass + build). - Encoder trait + factory: RawEncoder (salvaged, UNCHANGED) and H264Encoder, selected by negotiation; factory falls back to raw on H.264 init failure. - Negotiation: agent advertises supports_h264 (MFTEnumEx HW probe, cached) in AgentStatus; server picks the codec via select_video_codec(supports, prefer) and stamps StartStream.video_codec; agent re-guards on local HW. Policy constant DEFAULT_PREFER_H264 = false, so RAW is negotiated for every session today - H.264 stays dormant until live hardware validation (Task 8). - MF H.264 encoder (h264.rs, FIRST-CUT / compile-verified-only): HW encoder MFT, BGRA->NV12 (color.rs, unit-tested), sync drain, fall-back-to-raw on any failure. - Viewer H.264 decoder (decoder.rs, FIRST-CUT): MF decoder on a dedicated COM thread; drops+logs on failure, raw render path untouched. - proto additive: VideoCodec enum, StartStream.video_codec=3, SessionResponse.video_codec=5, AgentStatus.supports_h264=11. - Raw+Zstd path byte-for-byte unchanged; remains the guaranteed default/fallback. Review confirmed unsafe impl Send for H264Encoder is sound (single-owned &mut on the block_on thread; session future never spawned) and every MF failure degrades to raw. H.264 is NOT claimed functional - compile/clippy/build-verified only; live validation + force-IDR + the no-spawn-invariant doc are Task 8 go-live gates. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -95,6 +95,13 @@ windows = { version = "0.58", features = [
|
||||
"Win32_System_Pipes",
|
||||
"Win32_System_SystemServices",
|
||||
"Win32_System_IO",
|
||||
"Win32_System_Com",
|
||||
"Win32_System_Com_StructuredStorage",
|
||||
"Win32_System_Ole",
|
||||
"Win32_System_Variant",
|
||||
"Win32_Media_MediaFoundation",
|
||||
"Win32_Media_KernelStreaming",
|
||||
"Win32_Media_DirectShow",
|
||||
]}
|
||||
|
||||
# Windows service support
|
||||
|
||||
97
agent/src/encoder/capability.rs
Normal file
97
agent/src/encoder/capability.rs
Normal file
@@ -0,0 +1,97 @@
|
||||
//! Hardware video-encode capability detection (Task 7).
|
||||
//!
|
||||
//! Probes Windows Media Foundation for a HARDWARE H.264 encoder MFT at startup.
|
||||
//! The result is cached and advertised to the server in `AgentStatus.supports_h264`
|
||||
//! so the server can negotiate the codec (see `StartStream.video_codec`).
|
||||
//!
|
||||
//! Detection is intentionally cheap and side-effect-free: it only ENUMERATES the
|
||||
//! available encoder MFTs (it does not create or initialize one). A `true` result
|
||||
//! means a hardware H.264 encoder was advertised by the OS; it does NOT guarantee
|
||||
//! the encoder will successfully initialize at stream time — the H.264 encoder
|
||||
//! still falls back to raw on any init/feed failure.
|
||||
//!
|
||||
//! On non-Windows targets, or if MF is unavailable, this reports `false`.
|
||||
|
||||
use std::sync::OnceLock;
|
||||
|
||||
/// Cached capability result. Detection runs at most once per process.
|
||||
static SUPPORTS_H264: OnceLock<bool> = OnceLock::new();
|
||||
|
||||
/// Return whether this machine has a hardware H.264 encoder, detecting once and
|
||||
/// caching the result. Safe to call repeatedly and from any thread.
|
||||
pub fn supports_hardware_h264() -> bool {
|
||||
*SUPPORTS_H264.get_or_init(detect_hardware_h264)
|
||||
}
|
||||
|
||||
/// Run the actual detection. Separated so the cached accessor stays trivial.
|
||||
fn detect_hardware_h264() -> bool {
|
||||
let supported = detect_inner();
|
||||
if supported {
|
||||
tracing::info!("Hardware H.264 encoder detected (Media Foundation)");
|
||||
} else {
|
||||
tracing::info!("No hardware H.264 encoder detected; raw+Zstd only");
|
||||
}
|
||||
supported
|
||||
}
|
||||
|
||||
#[cfg(windows)]
|
||||
fn detect_inner() -> bool {
|
||||
// Enumerate hardware H.264 encoder MFTs. This is a read-only probe; it does
|
||||
// not init D3D, COM apartments persistently, or create the encoder.
|
||||
match unsafe { enumerate_hardware_h264() } {
|
||||
Ok(found) => found,
|
||||
Err(e) => {
|
||||
tracing::warn!("H.264 capability probe failed: {e:#}; assuming no HW encoder");
|
||||
false
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(not(windows))]
|
||||
fn detect_inner() -> bool {
|
||||
false
|
||||
}
|
||||
|
||||
#[cfg(windows)]
|
||||
unsafe fn enumerate_hardware_h264() -> anyhow::Result<bool> {
|
||||
use windows::Win32::Media::MediaFoundation::{
|
||||
MFMediaType_Video, MFTEnumEx, MFVideoFormat_H264, MFT_CATEGORY_VIDEO_ENCODER,
|
||||
MFT_ENUM_FLAG_HARDWARE, MFT_ENUM_FLAG_SORTANDFILTER, MFT_ENUM_FLAG_TRANSCODE_ONLY,
|
||||
MFT_REGISTER_TYPE_INFO,
|
||||
};
|
||||
|
||||
// We only specify the OUTPUT type (H.264); input is left unconstrained so the
|
||||
// probe matches encoders regardless of their preferred input subtype.
|
||||
let output_type = MFT_REGISTER_TYPE_INFO {
|
||||
guidMajorType: MFMediaType_Video,
|
||||
guidSubtype: MFVideoFormat_H264,
|
||||
};
|
||||
|
||||
let mut activate_ptr: *mut Option<windows::Win32::Media::MediaFoundation::IMFActivate> =
|
||||
std::ptr::null_mut();
|
||||
let mut count: u32 = 0;
|
||||
|
||||
// MFTEnumEx does not itself require MFStartup for a pure enumeration, but we
|
||||
// guard with a Result so any HRESULT failure degrades to "no HW encoder".
|
||||
MFTEnumEx(
|
||||
MFT_CATEGORY_VIDEO_ENCODER,
|
||||
MFT_ENUM_FLAG_HARDWARE | MFT_ENUM_FLAG_SORTANDFILTER | MFT_ENUM_FLAG_TRANSCODE_ONLY,
|
||||
None, // input type: any
|
||||
Some(&output_type as *const _),
|
||||
&mut activate_ptr,
|
||||
&mut count,
|
||||
)?;
|
||||
|
||||
// Release every returned IMFActivate, then free the array CoTaskMemAlloc'd by MF.
|
||||
let found = count > 0;
|
||||
if !activate_ptr.is_null() {
|
||||
let slice = std::slice::from_raw_parts_mut(activate_ptr, count as usize);
|
||||
for entry in slice.iter_mut() {
|
||||
// Dropping the Option<IMFActivate> releases the COM reference.
|
||||
entry.take();
|
||||
}
|
||||
windows::Win32::System::Com::CoTaskMemFree(Some(activate_ptr as *const _));
|
||||
}
|
||||
|
||||
Ok(found)
|
||||
}
|
||||
269
agent/src/encoder/color.rs
Normal file
269
agent/src/encoder/color.rs
Normal file
@@ -0,0 +1,269 @@
|
||||
//! Color-space conversion for the H.264 encode path (Task 7).
|
||||
//!
|
||||
//! Screen capture produces BGRA (4 bytes/pixel, B,G,R,A order — the DXGI/GDI
|
||||
//! native layout). Media Foundation hardware H.264 encoders want NV12: a full-
|
||||
//! resolution 8-bit Y (luma) plane followed by an interleaved half-resolution
|
||||
//! U/V (chroma) plane. This module does that conversion in software.
|
||||
//!
|
||||
//! NV12 memory layout for a `width x height` frame (width/height assumed even):
|
||||
//! - Y plane: `width * height` bytes, row-major.
|
||||
//! - UV plane: `width * (height / 2)` bytes — for each 2x2 luma block one
|
||||
//! (U, V) pair, so the plane is `(width/2)` (U,V) pairs per row over
|
||||
//! `height/2` rows, i.e. `width` bytes per chroma row.
|
||||
//!
|
||||
//! Total size = `width * height * 3 / 2`.
|
||||
//!
|
||||
//! The coefficients are BT.601 "studio swing" (limited range, 16..235 luma),
|
||||
//! which is what MF H.264 encoders expect by default. Chroma is computed by
|
||||
//! averaging the 2x2 BGRA block before conversion (box downsample) to reduce
|
||||
//! aliasing.
|
||||
|
||||
/// Size in bytes of an NV12 buffer for `width` x `height` (both even).
|
||||
#[inline]
|
||||
pub fn nv12_size(width: u32, height: u32) -> usize {
|
||||
(width as usize * height as usize) * 3 / 2
|
||||
}
|
||||
|
||||
/// BT.601 limited-range luma from 8-bit R,G,B.
|
||||
#[inline]
|
||||
fn rgb_to_y(r: i32, g: i32, b: i32) -> u8 {
|
||||
// Y = 16 + (65.481*R + 128.553*G + 24.966*B) / 255, fixed-point.
|
||||
// Using the common integer approximation:
|
||||
// Y = ((66*R + 129*G + 25*B + 128) >> 8) + 16
|
||||
let y = ((66 * r + 129 * g + 25 * b + 128) >> 8) + 16;
|
||||
y.clamp(0, 255) as u8
|
||||
}
|
||||
|
||||
/// BT.601 limited-range Cb (U) from 8-bit R,G,B.
|
||||
#[inline]
|
||||
fn rgb_to_u(r: i32, g: i32, b: i32) -> u8 {
|
||||
let u = ((-38 * r - 74 * g + 112 * b + 128) >> 8) + 128;
|
||||
u.clamp(0, 255) as u8
|
||||
}
|
||||
|
||||
/// BT.601 limited-range Cr (V) from 8-bit R,G,B.
|
||||
#[inline]
|
||||
fn rgb_to_v(r: i32, g: i32, b: i32) -> u8 {
|
||||
let v = ((112 * r - 94 * g - 18 * b + 128) >> 8) + 128;
|
||||
v.clamp(0, 255) as u8
|
||||
}
|
||||
|
||||
/// Convert a tightly-packed BGRA frame into NV12, writing into `out`.
|
||||
///
|
||||
/// `bgra` must be at least `width * height * 4` bytes; `out` must be at least
|
||||
/// `nv12_size(width, height)` bytes. `width` and `height` MUST be even (H.264
|
||||
/// 4:2:0 requires even dimensions — the caller pads odd capture sizes). Returns
|
||||
/// an error rather than panicking on a short buffer or odd dimension so the
|
||||
/// encoder can fall back to raw.
|
||||
pub fn bgra_to_nv12(
|
||||
bgra: &[u8],
|
||||
width: u32,
|
||||
height: u32,
|
||||
out: &mut [u8],
|
||||
) -> Result<(), ColorConvertError> {
|
||||
if width == 0 || height == 0 {
|
||||
return Err(ColorConvertError::ZeroDimension);
|
||||
}
|
||||
if !width.is_multiple_of(2) || !height.is_multiple_of(2) {
|
||||
return Err(ColorConvertError::OddDimension { width, height });
|
||||
}
|
||||
|
||||
let w = width as usize;
|
||||
let h = height as usize;
|
||||
let expected_src = w * h * 4;
|
||||
if bgra.len() < expected_src {
|
||||
return Err(ColorConvertError::SrcTooSmall {
|
||||
got: bgra.len(),
|
||||
need: expected_src,
|
||||
});
|
||||
}
|
||||
let need_out = nv12_size(width, height);
|
||||
if out.len() < need_out {
|
||||
return Err(ColorConvertError::DstTooSmall {
|
||||
got: out.len(),
|
||||
need: need_out,
|
||||
});
|
||||
}
|
||||
|
||||
let (y_plane, uv_plane) = out.split_at_mut(w * h);
|
||||
|
||||
// Luma: one sample per pixel.
|
||||
for row in 0..h {
|
||||
let src_row = row * w * 4;
|
||||
let dst_row = row * w;
|
||||
for col in 0..w {
|
||||
let px = src_row + col * 4;
|
||||
// BGRA order.
|
||||
let b = bgra[px] as i32;
|
||||
let g = bgra[px + 1] as i32;
|
||||
let r = bgra[px + 2] as i32;
|
||||
y_plane[dst_row + col] = rgb_to_y(r, g, b);
|
||||
}
|
||||
}
|
||||
|
||||
// Chroma: one (U,V) pair per 2x2 block, box-averaged.
|
||||
let chroma_rows = h / 2;
|
||||
let chroma_cols = w / 2;
|
||||
for cy in 0..chroma_rows {
|
||||
for cx in 0..chroma_cols {
|
||||
let x0 = cx * 2;
|
||||
let y0 = cy * 2;
|
||||
|
||||
let mut r_sum = 0i32;
|
||||
let mut g_sum = 0i32;
|
||||
let mut b_sum = 0i32;
|
||||
for dy in 0..2 {
|
||||
for dx in 0..2 {
|
||||
let px = ((y0 + dy) * w + (x0 + dx)) * 4;
|
||||
b_sum += bgra[px] as i32;
|
||||
g_sum += bgra[px + 1] as i32;
|
||||
r_sum += bgra[px + 2] as i32;
|
||||
}
|
||||
}
|
||||
let r = r_sum / 4;
|
||||
let g = g_sum / 4;
|
||||
let b = b_sum / 4;
|
||||
|
||||
let uv_idx = (cy * chroma_cols + cx) * 2;
|
||||
uv_plane[uv_idx] = rgb_to_u(r, g, b);
|
||||
uv_plane[uv_idx + 1] = rgb_to_v(r, g, b);
|
||||
}
|
||||
}
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
/// Errors from BGRA->NV12 conversion. Surfaced (not panicked) so the H.264
|
||||
/// encoder can downgrade to raw.
|
||||
#[derive(Debug, thiserror::Error)]
|
||||
pub enum ColorConvertError {
|
||||
#[error("frame dimension is zero")]
|
||||
ZeroDimension,
|
||||
#[error("NV12 requires even dimensions, got {width}x{height}")]
|
||||
OddDimension { width: u32, height: u32 },
|
||||
#[error("source BGRA buffer too small: {got} < {need}")]
|
||||
SrcTooSmall { got: usize, need: usize },
|
||||
#[error("destination NV12 buffer too small: {got} < {need}")]
|
||||
DstTooSmall { got: usize, need: usize },
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
#[test]
|
||||
fn nv12_size_is_3half() {
|
||||
assert_eq!(nv12_size(2, 2), 6);
|
||||
assert_eq!(nv12_size(4, 4), 24);
|
||||
assert_eq!(nv12_size(1920, 1080), 1920 * 1080 * 3 / 2);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn rejects_odd_dimensions() {
|
||||
let bgra = vec![0u8; 3 * 3 * 4];
|
||||
let mut out = vec![0u8; nv12_size(4, 4)];
|
||||
assert!(matches!(
|
||||
bgra_to_nv12(&bgra, 3, 2, &mut out),
|
||||
Err(ColorConvertError::OddDimension { .. })
|
||||
));
|
||||
assert!(matches!(
|
||||
bgra_to_nv12(&bgra, 2, 3, &mut out),
|
||||
Err(ColorConvertError::OddDimension { .. })
|
||||
));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn rejects_short_source() {
|
||||
let bgra = vec![0u8; 4]; // way too small for 2x2
|
||||
let mut out = vec![0u8; nv12_size(2, 2)];
|
||||
assert!(matches!(
|
||||
bgra_to_nv12(&bgra, 2, 2, &mut out),
|
||||
Err(ColorConvertError::SrcTooSmall { .. })
|
||||
));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn rejects_short_dest() {
|
||||
let bgra = vec![0u8; 2 * 2 * 4];
|
||||
let mut out = vec![0u8; 1];
|
||||
assert!(matches!(
|
||||
bgra_to_nv12(&bgra, 2, 2, &mut out),
|
||||
Err(ColorConvertError::DstTooSmall { .. })
|
||||
));
|
||||
}
|
||||
|
||||
/// A pure-black BGRA frame -> Y = 16 (limited-range black), U = V = 128.
|
||||
#[test]
|
||||
fn black_frame_maps_to_limited_range_black() {
|
||||
let bgra = vec![0u8; 4 * 4 * 4]; // all zero => black, alpha 0
|
||||
let mut out = vec![0u8; nv12_size(4, 4)];
|
||||
bgra_to_nv12(&bgra, 4, 4, &mut out).unwrap();
|
||||
|
||||
// Y plane (first 16 bytes) all 16.
|
||||
for &y in &out[..16] {
|
||||
assert_eq!(y, 16, "black luma must be 16 (limited range)");
|
||||
}
|
||||
// UV plane all 128 (neutral chroma).
|
||||
for &c in &out[16..] {
|
||||
assert_eq!(c, 128, "black chroma must be neutral 128");
|
||||
}
|
||||
}
|
||||
|
||||
/// A pure-white BGRA frame -> Y = 235 (limited-range white), U = V = 128.
|
||||
#[test]
|
||||
fn white_frame_maps_to_limited_range_white() {
|
||||
// B=255, G=255, R=255, A=255 for every pixel.
|
||||
let bgra = vec![255u8; 2 * 2 * 4];
|
||||
let mut out = vec![0u8; nv12_size(2, 2)];
|
||||
bgra_to_nv12(&bgra, 2, 2, &mut out).unwrap();
|
||||
|
||||
// Y = ((66+129+25)*255 + 128) >> 8 + 16 = 235.
|
||||
for &y in &out[..4] {
|
||||
assert_eq!(y, 235, "white luma must be 235 (limited range)");
|
||||
}
|
||||
// Neutral chroma for a gray/white pixel.
|
||||
assert_eq!(out[4], 128);
|
||||
assert_eq!(out[5], 128);
|
||||
}
|
||||
|
||||
/// A pure-red frame: luma below mid, V (Cr) well above 128, U (Cb) below 128.
|
||||
#[test]
|
||||
fn red_frame_has_high_cr_low_cb() {
|
||||
// BGRA red: B=0, G=0, R=255, A=255.
|
||||
let mut bgra = vec![0u8; 2 * 2 * 4];
|
||||
for px in bgra.chunks_mut(4) {
|
||||
px[0] = 0; // B
|
||||
px[1] = 0; // G
|
||||
px[2] = 255; // R
|
||||
px[3] = 255; // A
|
||||
}
|
||||
let mut out = vec![0u8; nv12_size(2, 2)];
|
||||
bgra_to_nv12(&bgra, 2, 2, &mut out).unwrap();
|
||||
|
||||
let u = out[4];
|
||||
let v = out[5];
|
||||
assert!(v > 200, "red must have high Cr (V), got {v}");
|
||||
assert!(u < 128, "red must have Cb (U) below neutral, got {u}");
|
||||
}
|
||||
|
||||
/// Conversion fills the whole NV12 buffer (no leftover zeros where data is
|
||||
/// expected) for a non-trivial gradient — a sanity check on plane indexing.
|
||||
#[test]
|
||||
fn plane_indexing_covers_full_buffer() {
|
||||
let w = 8u32;
|
||||
let h = 8u32;
|
||||
let mut bgra = vec![0u8; (w * h * 4) as usize];
|
||||
for (i, px) in bgra.chunks_mut(4).enumerate() {
|
||||
let v = (i % 256) as u8;
|
||||
px[0] = v;
|
||||
px[1] = v;
|
||||
px[2] = v;
|
||||
px[3] = 255;
|
||||
}
|
||||
let mut out = vec![0xAAu8; nv12_size(w, h)];
|
||||
bgra_to_nv12(&bgra, w, h, &mut out).unwrap();
|
||||
// Y plane should be fully written (gray ramp -> non-constant).
|
||||
let y_plane = &out[..(w * h) as usize];
|
||||
assert!(y_plane.windows(2).any(|p| p[0] != p[1]), "Y plane varies");
|
||||
}
|
||||
}
|
||||
515
agent/src/encoder/h264.rs
Normal file
515
agent/src/encoder/h264.rs
Normal file
@@ -0,0 +1,515 @@
|
||||
//! Hardware H.264 encoder via Windows Media Foundation (Task 7).
|
||||
//!
|
||||
//! FIRST-CUT / COMPILE-VERIFIED ONLY. This encoder is wired end-to-end (init ->
|
||||
//! feed -> drain -> emit `EncodedFrame{h264}`) and is selected only when the
|
||||
//! agent advertised hardware support AND the server negotiated H.264. It has NOT
|
||||
//! been validated on real hardware with live frames — that is plan Task 8. On
|
||||
//! ANY initialization or per-frame failure it surfaces an error; the encoder
|
||||
//! factory (`create_encoder_for`) downgrades to the raw+Zstd encoder so a
|
||||
//! session never breaks because of H.264.
|
||||
//!
|
||||
//! Pipeline:
|
||||
//! BGRA capture --(color::bgra_to_nv12)--> NV12 sample --> MFT(H.264) --> H.264
|
||||
//! Annex-B/length-prefixed elementary stream --> proto EncodedFrame.
|
||||
//!
|
||||
//! Design notes:
|
||||
//! - The MFT is enumerated with `MFTEnumEx(MFT_CATEGORY_VIDEO_ENCODER,
|
||||
//! MFT_ENUM_FLAG_HARDWARE, …, MFVideoFormat_H264)` (same probe as
|
||||
//! `capability`). We `ActivateObject` the first match.
|
||||
//! - Input is configured as NV12, output as H.264, with frame size, frame rate
|
||||
//! and an average bitrate derived from `quality`.
|
||||
//! - Both the SYNCHRONOUS MFT model (ProcessInput/ProcessOutput) and the
|
||||
//! ASYNCHRONOUS hardware-MFT model (METransformNeedInput / METransformHaveOutput
|
||||
//! events) exist. To keep this first cut bounded and predictable we DRAIN the
|
||||
//! MFT synchronously after each input and treat `MF_E_TRANSFORM_NEED_MORE_INPUT`
|
||||
//! as "no output this tick". A fully async event-driven loop is a Task-8
|
||||
//! refinement (documented below).
|
||||
//! - `MFT_MESSAGE_SET_D3D_MANAGER` is intentionally NOT set — we feed CPU NV12
|
||||
//! buffers (software input samples), which every HW H.264 MFT accepts. D3D11
|
||||
//! zero-copy is a later optimization.
|
||||
|
||||
#![cfg(windows)]
|
||||
|
||||
use super::{EncodedFrame, Encoder};
|
||||
use crate::capture::CapturedFrame;
|
||||
use crate::encoder::color;
|
||||
use crate::proto::{video_frame, EncodedFrame as ProtoEncodedFrame, VideoFrame};
|
||||
use anyhow::{anyhow, Context, Result};
|
||||
use windows::Win32::Media::MediaFoundation::{
|
||||
IMFActivate, IMFMediaType, IMFSample, IMFTransform, MFCreateMediaType, MFCreateMemoryBuffer,
|
||||
MFCreateSample, MFMediaType_Video, MFShutdown, MFStartup, MFTEnumEx, MFVideoFormat_H264,
|
||||
MFVideoFormat_NV12, MFVideoInterlace_Progressive, MFSTARTUP_LITE, MFT_CATEGORY_VIDEO_ENCODER,
|
||||
MFT_ENUM_FLAG_HARDWARE, MFT_ENUM_FLAG_SORTANDFILTER, MFT_ENUM_FLAG_TRANSCODE_ONLY,
|
||||
MFT_MESSAGE_COMMAND_FLUSH, MFT_MESSAGE_NOTIFY_BEGIN_STREAMING,
|
||||
MFT_MESSAGE_NOTIFY_END_OF_STREAM, MFT_MESSAGE_NOTIFY_END_STREAMING,
|
||||
MFT_MESSAGE_NOTIFY_START_OF_STREAM, MFT_OUTPUT_DATA_BUFFER, MFT_OUTPUT_STREAM_INFO,
|
||||
MFT_REGISTER_TYPE_INFO, MF_E_TRANSFORM_NEED_MORE_INPUT, MF_MT_AVG_BITRATE, MF_MT_FRAME_RATE,
|
||||
MF_MT_FRAME_SIZE, MF_MT_INTERLACE_MODE, MF_MT_MAJOR_TYPE, MF_MT_PIXEL_ASPECT_RATIO,
|
||||
MF_MT_SUBTYPE,
|
||||
};
|
||||
|
||||
/// Encoder-internal state, created once and reused per frame.
|
||||
pub struct H264Encoder {
|
||||
/// The activated encoder transform.
|
||||
transform: IMFTransform,
|
||||
/// Configured frame dimensions; a capture-size change forces re-init.
|
||||
width: u32,
|
||||
height: u32,
|
||||
/// Quality (1-100) used to derive the bitrate; kept for re-init on resize.
|
||||
quality: u32,
|
||||
/// Frame sequence counter (mirrors RawEncoder).
|
||||
sequence: u32,
|
||||
/// Force the next frame to request a keyframe.
|
||||
force_keyframe: bool,
|
||||
/// Whether `MFT_MESSAGE_NOTIFY_BEGIN_STREAMING` was sent.
|
||||
streaming: bool,
|
||||
/// Reusable NV12 staging buffer (resized on dimension change).
|
||||
nv12: Vec<u8>,
|
||||
/// Input/output stream identifiers (most encoders use 0/0).
|
||||
input_stream_id: u32,
|
||||
output_stream_id: u32,
|
||||
/// True if MF was started by THIS encoder and must be shut down on drop.
|
||||
mf_started: bool,
|
||||
}
|
||||
|
||||
// IMFTransform is a COM interface; it is not auto-Send. We only ever touch the
|
||||
// encoder from the single capture/encode thread (the session owns it behind a
|
||||
// &mut), so it is safe to move between threads as long as it is not shared.
|
||||
unsafe impl Send for H264Encoder {}
|
||||
|
||||
impl H264Encoder {
|
||||
/// Construct and fully initialize a hardware H.264 encoder. Returns an error
|
||||
/// (so the factory can fall back to raw) if MF is unavailable, no hardware
|
||||
/// encoder exists, or media-type negotiation fails. A default frame size is
|
||||
/// used and re-negotiated on the first frame if the real capture differs.
|
||||
pub fn new(quality: u32) -> Result<Self> {
|
||||
// 1920x1080 default; re-init on the first frame if the capture differs.
|
||||
Self::with_dimensions(quality, 1920, 1080)
|
||||
}
|
||||
|
||||
fn with_dimensions(quality: u32, width: u32, height: u32) -> Result<Self> {
|
||||
unsafe {
|
||||
// MF must be initialized on this thread. MFSTARTUP_LITE avoids the
|
||||
// sockets/network stack we don't need.
|
||||
MFStartup(mf_version(), MFSTARTUP_LITE).context("MFStartup failed")?;
|
||||
let mf_started = true;
|
||||
|
||||
let transform = match Self::activate_hw_encoder() {
|
||||
Ok(t) => t,
|
||||
Err(e) => {
|
||||
// Balance the MFStartup we just did before bailing.
|
||||
let _ = MFShutdown();
|
||||
return Err(e);
|
||||
}
|
||||
};
|
||||
|
||||
let mut enc = Self {
|
||||
transform,
|
||||
width,
|
||||
height,
|
||||
quality,
|
||||
sequence: 0,
|
||||
force_keyframe: true,
|
||||
streaming: false,
|
||||
nv12: Vec::new(),
|
||||
input_stream_id: 0,
|
||||
output_stream_id: 0,
|
||||
mf_started,
|
||||
};
|
||||
|
||||
// `enc`'s Drop will shut MF down and release the transform on error.
|
||||
enc.configure_media_types()?;
|
||||
|
||||
Ok(enc)
|
||||
}
|
||||
}
|
||||
|
||||
/// Enumerate hardware H.264 encoder MFTs and activate the first one.
|
||||
unsafe fn activate_hw_encoder() -> Result<IMFTransform> {
|
||||
let output_type = MFT_REGISTER_TYPE_INFO {
|
||||
guidMajorType: MFMediaType_Video,
|
||||
guidSubtype: MFVideoFormat_H264,
|
||||
};
|
||||
|
||||
let mut activate_ptr: *mut Option<IMFActivate> = std::ptr::null_mut();
|
||||
let mut count: u32 = 0;
|
||||
|
||||
MFTEnumEx(
|
||||
MFT_CATEGORY_VIDEO_ENCODER,
|
||||
MFT_ENUM_FLAG_HARDWARE | MFT_ENUM_FLAG_SORTANDFILTER | MFT_ENUM_FLAG_TRANSCODE_ONLY,
|
||||
None,
|
||||
Some(&output_type as *const _),
|
||||
&mut activate_ptr,
|
||||
&mut count,
|
||||
)
|
||||
.context("MFTEnumEx (hardware H.264) failed")?;
|
||||
|
||||
if count == 0 || activate_ptr.is_null() {
|
||||
if !activate_ptr.is_null() {
|
||||
windows::Win32::System::Com::CoTaskMemFree(Some(activate_ptr as *const _));
|
||||
}
|
||||
return Err(anyhow!("no hardware H.264 encoder MFT available"));
|
||||
}
|
||||
|
||||
let slice = std::slice::from_raw_parts_mut(activate_ptr, count as usize);
|
||||
|
||||
// Activate the first usable encoder; release every IMFActivate.
|
||||
let mut chosen: Option<IMFTransform> = None;
|
||||
for entry in slice.iter_mut() {
|
||||
if chosen.is_none() {
|
||||
if let Some(activate) = entry.as_ref() {
|
||||
if let Ok(transform) = activate.ActivateObject::<IMFTransform>() {
|
||||
chosen = Some(transform);
|
||||
}
|
||||
}
|
||||
}
|
||||
// Release this IMFActivate reference.
|
||||
entry.take();
|
||||
}
|
||||
windows::Win32::System::Com::CoTaskMemFree(Some(activate_ptr as *const _));
|
||||
|
||||
chosen.ok_or_else(|| anyhow!("failed to activate any hardware H.264 encoder MFT"))
|
||||
}
|
||||
|
||||
/// Set the H.264 output type and NV12 input type, in the order MF requires
|
||||
/// (output type FIRST for encoders, then the matching input type).
|
||||
unsafe fn configure_media_types(&mut self) -> Result<()> {
|
||||
// Discover the real stream identifiers (most encoders report 0/0).
|
||||
let mut input_ids = [0u32; 1];
|
||||
let mut output_ids = [0u32; 1];
|
||||
// GetStreamIDs may return E_NOTIMPL meaning "ids are 0..n-1"; ignore err.
|
||||
let _ = self.transform.GetStreamIDs(&mut input_ids, &mut output_ids);
|
||||
// If GetStreamIDs populated nonzero ids use them, else default 0/0.
|
||||
if input_ids[0] != 0 {
|
||||
self.input_stream_id = input_ids[0];
|
||||
}
|
||||
if output_ids[0] != 0 {
|
||||
self.output_stream_id = output_ids[0];
|
||||
}
|
||||
|
||||
let fps_num = 30u32;
|
||||
let fps_den = 1u32;
|
||||
let bitrate = quality_to_bitrate(self.quality, self.width, self.height);
|
||||
|
||||
// ---- OUTPUT (H.264) ----
|
||||
let out_type: IMFMediaType = MFCreateMediaType().context("MFCreateMediaType(out)")?;
|
||||
out_type.SetGUID(&MF_MT_MAJOR_TYPE, &MFMediaType_Video)?;
|
||||
out_type.SetGUID(&MF_MT_SUBTYPE, &MFVideoFormat_H264)?;
|
||||
out_type.SetUINT32(&MF_MT_AVG_BITRATE, bitrate)?;
|
||||
set_attr_size(&out_type, &MF_MT_FRAME_SIZE, self.width, self.height)?;
|
||||
set_attr_ratio(&out_type, &MF_MT_FRAME_RATE, fps_num, fps_den)?;
|
||||
set_attr_ratio(&out_type, &MF_MT_PIXEL_ASPECT_RATIO, 1, 1)?;
|
||||
out_type.SetUINT32(&MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive.0 as u32)?;
|
||||
self.transform
|
||||
.SetOutputType(self.output_stream_id, &out_type, 0)
|
||||
.context("SetOutputType(H264)")?;
|
||||
|
||||
// ---- INPUT (NV12) ----
|
||||
let in_type: IMFMediaType = MFCreateMediaType().context("MFCreateMediaType(in)")?;
|
||||
in_type.SetGUID(&MF_MT_MAJOR_TYPE, &MFMediaType_Video)?;
|
||||
in_type.SetGUID(&MF_MT_SUBTYPE, &MFVideoFormat_NV12)?;
|
||||
set_attr_size(&in_type, &MF_MT_FRAME_SIZE, self.width, self.height)?;
|
||||
set_attr_ratio(&in_type, &MF_MT_FRAME_RATE, fps_num, fps_den)?;
|
||||
set_attr_ratio(&in_type, &MF_MT_PIXEL_ASPECT_RATIO, 1, 1)?;
|
||||
in_type.SetUINT32(&MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive.0 as u32)?;
|
||||
self.transform
|
||||
.SetInputType(self.input_stream_id, &in_type, 0)
|
||||
.context("SetInputType(NV12)")?;
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
/// Begin streaming if not already started (idempotent).
|
||||
unsafe fn ensure_streaming(&mut self) -> Result<()> {
|
||||
if !self.streaming {
|
||||
self.transform
|
||||
.ProcessMessage(MFT_MESSAGE_NOTIFY_BEGIN_STREAMING, 0)
|
||||
.context("NOTIFY_BEGIN_STREAMING")?;
|
||||
self.transform
|
||||
.ProcessMessage(MFT_MESSAGE_NOTIFY_START_OF_STREAM, 0)
|
||||
.context("NOTIFY_START_OF_STREAM")?;
|
||||
self.streaming = true;
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
|
||||
/// Re-initialize the encoder for a new frame size (capture resolution change).
|
||||
unsafe fn reinit_for_size(&mut self, width: u32, height: u32) -> Result<()> {
|
||||
if self.streaming {
|
||||
let _ = self.transform.ProcessMessage(MFT_MESSAGE_COMMAND_FLUSH, 0);
|
||||
let _ = self
|
||||
.transform
|
||||
.ProcessMessage(MFT_MESSAGE_NOTIFY_END_OF_STREAM, 0);
|
||||
let _ = self
|
||||
.transform
|
||||
.ProcessMessage(MFT_MESSAGE_NOTIFY_END_STREAMING, 0);
|
||||
self.streaming = false;
|
||||
}
|
||||
self.width = width;
|
||||
self.height = height;
|
||||
self.force_keyframe = true;
|
||||
self.configure_media_types()
|
||||
}
|
||||
|
||||
/// Wrap an NV12 byte buffer into an `IMFSample` with the given timestamp.
|
||||
/// A free associated fn (does not borrow `self`) so the caller can pass
|
||||
/// `&self.nv12` without a clone while `self` is mutably borrowed elsewhere.
|
||||
unsafe fn make_input_sample(nv12: &[u8], pts_100ns: i64) -> Result<IMFSample> {
|
||||
let sample: IMFSample = MFCreateSample().context("MFCreateSample")?;
|
||||
let buffer = MFCreateMemoryBuffer(nv12.len() as u32).context("MFCreateMemoryBuffer")?;
|
||||
|
||||
// Lock, copy NV12 in, set current length, unlock.
|
||||
let mut data_ptr: *mut u8 = std::ptr::null_mut();
|
||||
let mut max_len: u32 = 0;
|
||||
buffer
|
||||
.Lock(&mut data_ptr, Some(&mut max_len), None)
|
||||
.context("IMFMediaBuffer::Lock")?;
|
||||
if (max_len as usize) < nv12.len() || data_ptr.is_null() {
|
||||
let _ = buffer.Unlock();
|
||||
return Err(anyhow!("MF buffer too small for NV12 frame"));
|
||||
}
|
||||
std::ptr::copy_nonoverlapping(nv12.as_ptr(), data_ptr, nv12.len());
|
||||
buffer.SetCurrentLength(nv12.len() as u32)?;
|
||||
buffer.Unlock()?;
|
||||
|
||||
sample.AddBuffer(&buffer)?;
|
||||
sample.SetSampleTime(pts_100ns)?;
|
||||
// 33.367ms per frame at ~30fps, in 100ns units.
|
||||
sample.SetSampleDuration(333_667)?;
|
||||
Ok(sample)
|
||||
}
|
||||
|
||||
/// Drain one available output sample, if any. Returns the encoded bytes and
|
||||
/// whether the MFT flagged it a keyframe (clean point). `Ok(None)` means the
|
||||
/// MFT needs more input before it can produce output this tick.
|
||||
unsafe fn drain_one_output(&mut self) -> Result<Option<(Vec<u8>, bool)>> {
|
||||
let stream_info: MFT_OUTPUT_STREAM_INFO = self
|
||||
.transform
|
||||
.GetOutputStreamInfo(self.output_stream_id)
|
||||
.context("GetOutputStreamInfo")?;
|
||||
|
||||
// If the MFT does not allocate its own output samples we must provide one.
|
||||
const MFT_OUTPUT_STREAM_PROVIDES_SAMPLES: u32 = 0x100;
|
||||
let mft_provides = stream_info.dwFlags & MFT_OUTPUT_STREAM_PROVIDES_SAMPLES != 0;
|
||||
|
||||
let mut out_buffer = MFT_OUTPUT_DATA_BUFFER {
|
||||
dwStreamID: self.output_stream_id,
|
||||
..Default::default()
|
||||
};
|
||||
|
||||
if !mft_provides {
|
||||
let alloc_size = stream_info.cbSize.max(1);
|
||||
let sample: IMFSample = MFCreateSample().context("MFCreateSample(out)")?;
|
||||
let buffer = MFCreateMemoryBuffer(alloc_size).context("MFCreateMemoryBuffer(out)")?;
|
||||
sample.AddBuffer(&buffer)?;
|
||||
out_buffer.pSample = std::mem::ManuallyDrop::new(Some(sample));
|
||||
}
|
||||
|
||||
let mut status: u32 = 0;
|
||||
let mut bufs = [out_buffer];
|
||||
let hr = self.transform.ProcessOutput(0, &mut bufs, &mut status);
|
||||
|
||||
// Take ownership of whatever sample is now in the buffer (ours or MFT's).
|
||||
let produced = std::mem::ManuallyDrop::take(&mut bufs[0].pSample);
|
||||
|
||||
match hr {
|
||||
Ok(()) => {
|
||||
let Some(sample) = produced else {
|
||||
return Ok(None);
|
||||
};
|
||||
let bytes = sample_to_vec(&sample)?;
|
||||
let keyframe = sample_is_keyframe(&sample);
|
||||
Ok(Some((bytes, keyframe)))
|
||||
}
|
||||
Err(e) if e.code() == MF_E_TRANSFORM_NEED_MORE_INPUT => Ok(None),
|
||||
Err(e) => Err(anyhow!("ProcessOutput failed: {e:#}")),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl Encoder for H264Encoder {
|
||||
fn encode(&mut self, frame: &CapturedFrame) -> Result<EncodedFrame> {
|
||||
self.sequence = self.sequence.wrapping_add(1);
|
||||
|
||||
// H.264 4:2:0 needs even dimensions. Reject odd captures up front so we
|
||||
// surface a clean error (the factory already fell back to raw if HW was
|
||||
// missing; a per-frame error here lets the session log + continue).
|
||||
if !frame.width.is_multiple_of(2) || !frame.height.is_multiple_of(2) {
|
||||
return Err(anyhow!(
|
||||
"H.264 requires even dimensions, got {}x{}",
|
||||
frame.width,
|
||||
frame.height
|
||||
));
|
||||
}
|
||||
|
||||
unsafe {
|
||||
// Re-init on a resolution change.
|
||||
if frame.width != self.width || frame.height != self.height {
|
||||
self.reinit_for_size(frame.width, frame.height)
|
||||
.context("H.264 re-init for new frame size")?;
|
||||
}
|
||||
|
||||
self.ensure_streaming()?;
|
||||
|
||||
// BGRA -> NV12 into the reusable staging buffer.
|
||||
let need = color::nv12_size(frame.width, frame.height);
|
||||
if self.nv12.len() != need {
|
||||
self.nv12.resize(need, 0);
|
||||
}
|
||||
color::bgra_to_nv12(&frame.data, frame.width, frame.height, &mut self.nv12)
|
||||
.map_err(|e| anyhow!("BGRA->NV12 failed: {e}"))?;
|
||||
|
||||
// PTS in 100ns units derived from the frame's capture instant.
|
||||
let pts_100ns = (frame.timestamp.elapsed().as_nanos() / 100) as i64;
|
||||
let sample = Self::make_input_sample(&self.nv12, pts_100ns)?;
|
||||
|
||||
// Feed the encoder. NEED_MORE_INPUT is normal back-pressure handling;
|
||||
// for the synchronous first cut we only push one frame per tick.
|
||||
match self
|
||||
.transform
|
||||
.ProcessInput(self.input_stream_id, &sample, 0)
|
||||
{
|
||||
Ok(()) => {}
|
||||
Err(e) if e.code() == MF_E_TRANSFORM_NEED_MORE_INPUT => {}
|
||||
Err(e) => return Err(anyhow!("ProcessInput failed: {e:#}")),
|
||||
}
|
||||
|
||||
// Drain whatever output is ready.
|
||||
let Some((data, mft_keyframe)) = self.drain_one_output()? else {
|
||||
// No compressed output yet (encoder latency / GOP buffering).
|
||||
// Emit an empty frame so the session skips sending this tick.
|
||||
return Ok(EncodedFrame {
|
||||
frame: VideoFrame::default(),
|
||||
size: 0,
|
||||
is_keyframe: false,
|
||||
});
|
||||
};
|
||||
|
||||
let is_keyframe = mft_keyframe || self.force_keyframe;
|
||||
self.force_keyframe = false;
|
||||
|
||||
let size = data.len();
|
||||
let encoded = ProtoEncodedFrame {
|
||||
data,
|
||||
keyframe: is_keyframe,
|
||||
pts: pts_100ns,
|
||||
dts: pts_100ns,
|
||||
};
|
||||
|
||||
Ok(EncodedFrame {
|
||||
frame: VideoFrame {
|
||||
timestamp: frame.timestamp.elapsed().as_millis() as i64,
|
||||
display_id: frame.display_id as i32,
|
||||
sequence: self.sequence as i32,
|
||||
encoding: Some(video_frame::Encoding::H264(encoded)),
|
||||
},
|
||||
size,
|
||||
is_keyframe,
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
fn request_keyframe(&mut self) {
|
||||
// A precise force-IDR uses the MFT codec API
|
||||
// (CODECAPI_AVEncVideoForceKeyFrame); for the first cut we flag the next
|
||||
// emitted frame as a keyframe so the viewer treats it as a clean point.
|
||||
self.force_keyframe = true;
|
||||
}
|
||||
|
||||
fn name(&self) -> &str {
|
||||
"h264-mediafoundation"
|
||||
}
|
||||
}
|
||||
|
||||
impl Drop for H264Encoder {
|
||||
fn drop(&mut self) {
|
||||
unsafe {
|
||||
if self.streaming {
|
||||
let _ = self
|
||||
.transform
|
||||
.ProcessMessage(MFT_MESSAGE_NOTIFY_END_OF_STREAM, 0);
|
||||
let _ = self
|
||||
.transform
|
||||
.ProcessMessage(MFT_MESSAGE_NOTIFY_END_STREAMING, 0);
|
||||
}
|
||||
// The IMFTransform releases when `self.transform` drops.
|
||||
if self.mf_started {
|
||||
let _ = MFShutdown();
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// MF version word expected by `MFStartup` (MF_VERSION = (MF_API_VERSION<<16)|MF_SDK_VERSION).
|
||||
fn mf_version() -> u32 {
|
||||
// MF_SDK_VERSION = 0x0002, MF_API_VERSION = 0x0070 -> 0x00020070.
|
||||
0x0002_0070
|
||||
}
|
||||
|
||||
/// Derive a target average bitrate (bps) from the 1-100 quality knob and the
|
||||
/// frame area. Tuned conservatively for desktop content (mostly static).
|
||||
fn quality_to_bitrate(quality: u32, width: u32, height: u32) -> u32 {
|
||||
let q = quality.clamp(1, 100) as u64;
|
||||
let pixels = (width as u64) * (height as u64);
|
||||
// Base ~0.06 bits/pixel/frame at 30fps for q=100, scaled by quality.
|
||||
// bps = pixels * 30 * bpp; bpp scales 0.01..0.10 with quality.
|
||||
let bpp_milli = 10 + (q * 90 / 100); // 0.010 .. 0.100 in milli-bits
|
||||
let bps = pixels.saturating_mul(30).saturating_mul(bpp_milli) / 1000;
|
||||
bps.clamp(500_000, 50_000_000) as u32
|
||||
}
|
||||
|
||||
/// Pack (width, height) into the 64-bit MF_MT_FRAME_SIZE attribute.
|
||||
#[cfg(windows)]
|
||||
unsafe fn set_attr_size(
|
||||
media_type: &IMFMediaType,
|
||||
key: &windows::core::GUID,
|
||||
width: u32,
|
||||
height: u32,
|
||||
) -> Result<()> {
|
||||
let packed = ((width as u64) << 32) | (height as u64);
|
||||
media_type.SetUINT64(key, packed)?;
|
||||
Ok(())
|
||||
}
|
||||
|
||||
/// Pack (numerator, denominator) into a 64-bit ratio MF attribute.
|
||||
#[cfg(windows)]
|
||||
unsafe fn set_attr_ratio(
|
||||
media_type: &IMFMediaType,
|
||||
key: &windows::core::GUID,
|
||||
num: u32,
|
||||
den: u32,
|
||||
) -> Result<()> {
|
||||
let packed = ((num as u64) << 32) | (den as u64);
|
||||
media_type.SetUINT64(key, packed)?;
|
||||
Ok(())
|
||||
}
|
||||
|
||||
/// Copy all bytes out of an `IMFSample` (single contiguous buffer) into a Vec.
|
||||
#[cfg(windows)]
|
||||
unsafe fn sample_to_vec(sample: &IMFSample) -> Result<Vec<u8>> {
|
||||
let buffer = sample
|
||||
.ConvertToContiguousBuffer()
|
||||
.context("ConvertToContiguousBuffer")?;
|
||||
let mut ptr: *mut u8 = std::ptr::null_mut();
|
||||
let mut len: u32 = 0;
|
||||
buffer
|
||||
.Lock(&mut ptr, None, Some(&mut len))
|
||||
.context("output buffer Lock")?;
|
||||
let out = if ptr.is_null() || len == 0 {
|
||||
Vec::new()
|
||||
} else {
|
||||
std::slice::from_raw_parts(ptr, len as usize).to_vec()
|
||||
};
|
||||
let _ = buffer.Unlock();
|
||||
Ok(out)
|
||||
}
|
||||
|
||||
/// Read the "clean point" (keyframe) flag off a sample, if present.
|
||||
#[cfg(windows)]
|
||||
unsafe fn sample_is_keyframe(sample: &IMFSample) -> bool {
|
||||
use windows::Win32::Media::MediaFoundation::MFSampleExtension_CleanPoint;
|
||||
sample
|
||||
.GetUINT32(&MFSampleExtension_CleanPoint)
|
||||
.map(|v| v != 0)
|
||||
.unwrap_or(false)
|
||||
}
|
||||
@@ -1,16 +1,27 @@
|
||||
//! Frame encoding module
|
||||
//!
|
||||
//! Encodes captured frames for transmission. Supports:
|
||||
//! - Raw BGRA + Zstd compression (lowest latency, LAN mode)
|
||||
//! - VP9 software encoding (universal fallback)
|
||||
//! - H264 hardware encoding (when GPU available)
|
||||
//! - Raw BGRA + Zstd compression (lowest latency, LAN mode; the guaranteed
|
||||
//! fallback and the current default).
|
||||
//! - H.264 hardware encoding via Windows Media Foundation (Task 7) — the
|
||||
//! negotiated upgrade. Compile-verified; validated on real hardware in plan
|
||||
//! Task 8. On any init/feed failure the factory or encoder falls back to raw.
|
||||
//!
|
||||
//! Codec selection is driven by the negotiated `VideoCodec` the server sends on
|
||||
//! `StartStream` (see `select_codec` / `create_encoder_for`). The capability the
|
||||
//! agent advertises to the server is detected by `capability::supports_hardware_h264`.
|
||||
|
||||
mod capability;
|
||||
pub(crate) mod color;
|
||||
#[cfg(windows)]
|
||||
mod h264;
|
||||
mod raw;
|
||||
|
||||
pub use capability::supports_hardware_h264;
|
||||
pub use raw::RawEncoder;
|
||||
|
||||
use crate::capture::CapturedFrame;
|
||||
use crate::proto::VideoFrame;
|
||||
use crate::proto::{video_frame, VideoCodec, VideoFrame};
|
||||
use anyhow::Result;
|
||||
|
||||
/// Encoded frame ready for transmission
|
||||
@@ -28,7 +39,12 @@ pub struct EncodedFrame {
|
||||
pub is_keyframe: bool,
|
||||
}
|
||||
|
||||
/// Frame encoder trait
|
||||
/// Frame encoder trait.
|
||||
///
|
||||
/// Every implementor turns a `CapturedFrame` (BGRA) into a wire `VideoFrame`
|
||||
/// using one `video_frame::Encoding` variant. `RawEncoder` emits the `Raw`
|
||||
/// variant; the H.264 encoder emits the `H264` variant. The factory
|
||||
/// (`create_encoder_for`) selects the implementor from the negotiated codec.
|
||||
pub trait Encoder: Send {
|
||||
/// Encode a captured frame
|
||||
fn encode(&mut self, frame: &CapturedFrame) -> Result<EncodedFrame>;
|
||||
@@ -42,13 +58,167 @@ pub trait Encoder: Send {
|
||||
fn name(&self) -> &str;
|
||||
}
|
||||
|
||||
/// Create an encoder based on configuration
|
||||
pub fn create_encoder(codec: &str, quality: u32) -> Result<Box<dyn Encoder>> {
|
||||
/// Map a configured/negotiated codec string to a `VideoCodec`.
|
||||
///
|
||||
/// Used when constructing an encoder from the agent's own `EncodingConfig`
|
||||
/// (before any server negotiation). Unknown / "auto" / "raw" all resolve to raw
|
||||
/// — the safe default. "h264" resolves to H.264 (which itself falls back to raw
|
||||
/// if MF init fails).
|
||||
///
|
||||
/// Retained as the config-string entry point (used by `create_encoder` and the
|
||||
/// unit tests); the live session negotiates via `select_codec` on a `VideoCodec`.
|
||||
#[allow(dead_code)]
|
||||
pub fn codec_from_str(codec: &str) -> VideoCodec {
|
||||
match codec.to_lowercase().as_str() {
|
||||
"raw" | "zstd" => Ok(Box::new(RawEncoder::new(quality)?)),
|
||||
// "vp9" => Ok(Box::new(Vp9Encoder::new(quality)?)),
|
||||
// "h264" => Ok(Box::new(H264Encoder::new(quality)?)),
|
||||
// "auto" and any unknown codec default to raw for now (best for LAN)
|
||||
_ => Ok(Box::new(RawEncoder::new(quality)?)),
|
||||
"h264" => VideoCodec::H264,
|
||||
// "h265"/"hevc" are future opt-in (TODO) — treat as raw for now so we
|
||||
// never select an unimplemented codec.
|
||||
_ => VideoCodec::Raw,
|
||||
}
|
||||
}
|
||||
|
||||
/// Choose the codec the agent will actually use for a stream, given the codec
|
||||
/// the server negotiated and the agent's own hardware capability.
|
||||
///
|
||||
/// This is the agent-side guard that keeps the raw fallback authoritative:
|
||||
/// - The server only negotiates H.264 when the agent advertised support, but we
|
||||
/// re-check `supports_hardware_h264()` here so a stale/misconfigured server
|
||||
/// selection can never force an unsupported codec.
|
||||
/// - H.265 is not implemented; it degrades to raw.
|
||||
/// - Anything else is raw.
|
||||
pub fn select_codec(negotiated: VideoCodec, hardware_h264_available: bool) -> VideoCodec {
|
||||
match negotiated {
|
||||
VideoCodec::H264 if hardware_h264_available => VideoCodec::H264,
|
||||
// Server asked for H.264 but we have no HW encoder -> raw.
|
||||
VideoCodec::H264 => VideoCodec::Raw,
|
||||
// HEVC not implemented yet (TODO: Task 7 opt-in / future).
|
||||
VideoCodec::H265 => VideoCodec::Raw,
|
||||
VideoCodec::Raw => VideoCodec::Raw,
|
||||
}
|
||||
}
|
||||
|
||||
/// Create an encoder for an explicit `VideoCodec`, with a transparent fallback
|
||||
/// to raw if a hardware encoder cannot be constructed.
|
||||
///
|
||||
/// `quality` is the 1-100 quality knob (mapped per-codec). On H.264 init failure
|
||||
/// this logs and returns a raw encoder so the session keeps working.
|
||||
pub fn create_encoder_for(codec: VideoCodec, quality: u32) -> Result<Box<dyn Encoder>> {
|
||||
match codec {
|
||||
VideoCodec::H264 => {
|
||||
#[cfg(windows)]
|
||||
{
|
||||
match h264::H264Encoder::new(quality) {
|
||||
Ok(enc) => {
|
||||
tracing::info!("Using hardware H.264 encoder (Media Foundation)");
|
||||
Ok(Box::new(enc))
|
||||
}
|
||||
Err(e) => {
|
||||
tracing::warn!(
|
||||
"H.264 encoder init failed ({e:#}); falling back to raw+Zstd"
|
||||
);
|
||||
Ok(Box::new(RawEncoder::new(quality)?))
|
||||
}
|
||||
}
|
||||
}
|
||||
#[cfg(not(windows))]
|
||||
{
|
||||
tracing::warn!("H.264 unsupported on this platform; using raw+Zstd");
|
||||
Ok(Box::new(RawEncoder::new(quality)?))
|
||||
}
|
||||
}
|
||||
// Raw (and anything that resolved to raw) uses the salvaged encoder.
|
||||
VideoCodec::Raw | VideoCodec::H265 => Ok(Box::new(RawEncoder::new(quality)?)),
|
||||
}
|
||||
}
|
||||
|
||||
/// Create an encoder based on a codec string (agent config path).
|
||||
///
|
||||
/// Backwards-compatible entry point that builds an encoder from a codec STRING
|
||||
/// (e.g. `EncodingConfig.codec`). Resolves the string to a `VideoCodec`, applies
|
||||
/// the hardware-availability guard, then builds the encoder. The live session
|
||||
/// uses `select_codec` + `create_encoder_for` (negotiated `VideoCodec`) instead;
|
||||
/// this remains for the config path and is covered by unit tests.
|
||||
#[allow(dead_code)]
|
||||
pub fn create_encoder(codec: &str, quality: u32) -> Result<Box<dyn Encoder>> {
|
||||
let requested = codec_from_str(codec);
|
||||
let chosen = select_codec(requested, supports_hardware_h264());
|
||||
create_encoder_for(chosen, quality)
|
||||
}
|
||||
|
||||
/// Build an `EncodedFrame` carrying a single `video_frame::Encoding` payload.
|
||||
/// Shared helper so encoders don't each repeat the `VideoFrame` wrapper.
|
||||
#[allow(dead_code)]
|
||||
pub(crate) fn wrap_video_frame(
|
||||
timestamp_ms: i64,
|
||||
display_id: i32,
|
||||
sequence: i32,
|
||||
encoding: video_frame::Encoding,
|
||||
size: usize,
|
||||
is_keyframe: bool,
|
||||
) -> EncodedFrame {
|
||||
EncodedFrame {
|
||||
frame: VideoFrame {
|
||||
timestamp: timestamp_ms,
|
||||
display_id,
|
||||
sequence,
|
||||
encoding: Some(encoding),
|
||||
},
|
||||
size,
|
||||
is_keyframe,
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
#[test]
|
||||
fn codec_from_str_maps_known_and_unknown() {
|
||||
assert_eq!(codec_from_str("h264"), VideoCodec::H264);
|
||||
assert_eq!(codec_from_str("H264"), VideoCodec::H264);
|
||||
assert_eq!(codec_from_str("raw"), VideoCodec::Raw);
|
||||
assert_eq!(codec_from_str("zstd"), VideoCodec::Raw);
|
||||
assert_eq!(codec_from_str("auto"), VideoCodec::Raw);
|
||||
assert_eq!(codec_from_str("vp9"), VideoCodec::Raw);
|
||||
// HEVC not implemented -> raw, never H265.
|
||||
assert_eq!(codec_from_str("h265"), VideoCodec::Raw);
|
||||
assert_eq!(codec_from_str("hevc"), VideoCodec::Raw);
|
||||
assert_eq!(codec_from_str(""), VideoCodec::Raw);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn select_codec_honors_hardware_guard() {
|
||||
// Server negotiated H.264 and HW is present -> H.264.
|
||||
assert_eq!(select_codec(VideoCodec::H264, true), VideoCodec::H264);
|
||||
// Server negotiated H.264 but no HW -> raw (never forced).
|
||||
assert_eq!(select_codec(VideoCodec::H264, false), VideoCodec::Raw);
|
||||
// Raw stays raw regardless of HW.
|
||||
assert_eq!(select_codec(VideoCodec::Raw, true), VideoCodec::Raw);
|
||||
assert_eq!(select_codec(VideoCodec::Raw, false), VideoCodec::Raw);
|
||||
// HEVC always degrades to raw (unimplemented).
|
||||
assert_eq!(select_codec(VideoCodec::H265, true), VideoCodec::Raw);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn raw_factory_always_succeeds() {
|
||||
// Raw must always construct (the guaranteed fallback).
|
||||
let enc = create_encoder_for(VideoCodec::Raw, 75).unwrap();
|
||||
assert_eq!(enc.name(), "raw+zstd");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn create_encoder_string_path_resolves_to_raw_without_hw() {
|
||||
// On a machine without a HW encoder (CI / non-Windows), "h264" must
|
||||
// resolve to a working raw encoder, not an error.
|
||||
let enc = create_encoder("h264", 75).unwrap();
|
||||
// Without HW it is raw; with HW it would be the H.264 encoder. We only
|
||||
// assert it constructed.
|
||||
let _ = enc.name();
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn create_encoder_auto_is_raw() {
|
||||
let enc = create_encoder("auto", 75).unwrap();
|
||||
assert_eq!(enc.name(), "raw+zstd");
|
||||
}
|
||||
}
|
||||
|
||||
@@ -61,6 +61,10 @@ pub struct SessionManager {
|
||||
input: Option<InputController>,
|
||||
// Streaming state
|
||||
current_viewer_id: Option<String>,
|
||||
// Codec negotiated by the server for the current stream (Task 7). Set from
|
||||
// StartStream.video_codec; the encoder is built from it (guarded by the
|
||||
// agent's own hardware capability, with raw as the safe fallback).
|
||||
negotiated_codec: crate::proto::VideoCodec,
|
||||
// System info for status reports
|
||||
hostname: String,
|
||||
is_elevated: bool,
|
||||
@@ -87,6 +91,8 @@ impl SessionManager {
|
||||
encoder: None,
|
||||
input: None,
|
||||
current_viewer_id: None,
|
||||
// Default to RAW until the server negotiates otherwise (StartStream).
|
||||
negotiated_codec: crate::proto::VideoCodec::Raw,
|
||||
hostname,
|
||||
is_elevated,
|
||||
start_time: Instant::now(),
|
||||
@@ -168,14 +174,20 @@ impl SessionManager {
|
||||
self.capturer = Some(capturer);
|
||||
tracing::info!("Capturer created successfully");
|
||||
|
||||
// Create encoder with panic protection
|
||||
// Create encoder from the NEGOTIATED codec (Task 7), guarded by the
|
||||
// agent's own hardware capability. `create_encoder_for` selects the H.264
|
||||
// encoder only if it can actually be constructed, otherwise it returns a
|
||||
// working raw encoder — so this never breaks the session.
|
||||
let chosen =
|
||||
encoder::select_codec(self.negotiated_codec, encoder::supports_hardware_h264());
|
||||
tracing::debug!(
|
||||
"Creating encoder (codec={}, quality={})...",
|
||||
self.config.encoding.codec,
|
||||
"Creating encoder (negotiated={:?}, chosen={:?}, quality={})...",
|
||||
self.negotiated_codec,
|
||||
chosen,
|
||||
self.config.encoding.quality
|
||||
);
|
||||
let encoder = match std::panic::catch_unwind(std::panic::AssertUnwindSafe(|| {
|
||||
encoder::create_encoder(&self.config.encoding.codec, self.config.encoding.quality)
|
||||
encoder::create_encoder_for(chosen, self.config.encoding.quality)
|
||||
})) {
|
||||
Ok(result) => result?,
|
||||
Err(e) => {
|
||||
@@ -232,6 +244,9 @@ impl SessionManager {
|
||||
organization: self.config.company.clone().unwrap_or_default(),
|
||||
site: self.config.site.clone().unwrap_or_default(),
|
||||
tags: self.config.tags.clone(),
|
||||
// Advertise hardware H.264 capability so the server can negotiate the
|
||||
// codec (Task 7). Detected once and cached by the encoder module.
|
||||
supports_h264: encoder::supports_hardware_h264(),
|
||||
};
|
||||
|
||||
let msg = Message {
|
||||
@@ -336,6 +351,15 @@ impl SessionManager {
|
||||
match payload {
|
||||
message::Payload::StartStream(start) => {
|
||||
tracing::info!("StartStream received from viewer: {}", start.viewer_id);
|
||||
// Apply the server-negotiated codec (Task 7) BEFORE
|
||||
// building the encoder. An older server that omits the
|
||||
// field sends 0 = VIDEO_CODEC_RAW, preserving the raw
|
||||
// default. `select_codec` (in init_streaming) re-guards
|
||||
// against missing hardware.
|
||||
self.negotiated_codec =
|
||||
crate::proto::VideoCodec::try_from(start.video_codec)
|
||||
.unwrap_or(crate::proto::VideoCodec::Raw);
|
||||
tracing::info!("Server negotiated codec: {:?}", self.negotiated_codec);
|
||||
if let Err(e) = self.init_streaming() {
|
||||
tracing::error!("Failed to init streaming: {}", e);
|
||||
} else {
|
||||
|
||||
452
agent/src/viewer/decoder.rs
Normal file
452
agent/src/viewer/decoder.rs
Normal file
@@ -0,0 +1,452 @@
|
||||
//! H.264 video decoder for the native viewer (Task 7).
|
||||
//!
|
||||
//! FIRST-CUT / COMPILE-VERIFIED ONLY. Decodes an H.264 elementary stream
|
||||
//! (`EncodedFrame{h264}`) via a Media Foundation H.264 decoder MFT into NV12,
|
||||
//! then converts NV12 -> BGRA so it can flow through the EXISTING raw render
|
||||
//! path (`render::FrameData { compressed: false, BGRA }`). Not yet validated on
|
||||
//! real hardware with a live stream — that is plan Task 8. On decode-init
|
||||
//! failure the decoder reports an error and the viewer logs it; the raw-frame
|
||||
//! render path is untouched for raw sessions.
|
||||
//!
|
||||
//! The decoder is created lazily on the first H.264 frame (so a raw session
|
||||
//! never spins up MF). It is `!Send` (COM), so it lives on the viewer's receive
|
||||
//! task and is wrapped accordingly by the caller.
|
||||
|
||||
#![cfg(windows)]
|
||||
|
||||
use anyhow::{anyhow, Context, Result};
|
||||
use windows::Win32::Media::MediaFoundation::{
|
||||
IMFMediaType, IMFSample, IMFTransform, MFCreateMediaType, MFCreateMemoryBuffer, MFCreateSample,
|
||||
MFMediaType_Video, MFShutdown, MFStartup, MFTEnumEx, MFVideoFormat_H264, MFVideoFormat_NV12,
|
||||
MFSTARTUP_LITE, MFT_CATEGORY_VIDEO_DECODER, MFT_ENUM_FLAG_SORTANDFILTER, MFT_ENUM_FLAG_SYNCMFT,
|
||||
MFT_MESSAGE_NOTIFY_BEGIN_STREAMING, MFT_MESSAGE_NOTIFY_END_OF_STREAM,
|
||||
MFT_MESSAGE_NOTIFY_END_STREAMING, MFT_MESSAGE_NOTIFY_START_OF_STREAM, MFT_OUTPUT_DATA_BUFFER,
|
||||
MFT_OUTPUT_STREAM_INFO, MFT_REGISTER_TYPE_INFO, MF_E_TRANSFORM_NEED_MORE_INPUT,
|
||||
MF_E_TRANSFORM_STREAM_CHANGE, MF_MT_FRAME_SIZE, MF_MT_MAJOR_TYPE, MF_MT_SUBTYPE,
|
||||
};
|
||||
|
||||
/// A decoded NV12 frame and its dimensions, ready for NV12 -> BGRA conversion.
|
||||
pub struct DecodedFrame {
|
||||
pub width: u32,
|
||||
pub height: u32,
|
||||
/// BGRA pixels (4 bytes/px), ready for `render::FrameData`.
|
||||
pub bgra: Vec<u8>,
|
||||
}
|
||||
|
||||
/// Media Foundation H.264 decoder wrapper.
|
||||
pub struct H264Decoder {
|
||||
transform: IMFTransform,
|
||||
width: u32,
|
||||
height: u32,
|
||||
streaming: bool,
|
||||
input_stream_id: u32,
|
||||
output_stream_id: u32,
|
||||
mf_started: bool,
|
||||
}
|
||||
|
||||
// NOTE: H264Decoder is intentionally NOT `Send`. It wraps COM interfaces with
|
||||
// thread affinity and is created + used entirely on the dedicated `gc-h264-decode`
|
||||
// OS thread (see viewer::spawn_h264_decode_worker), so it never crosses a thread
|
||||
// boundary and does not need a Send assertion.
|
||||
|
||||
impl H264Decoder {
|
||||
/// Construct an H.264 decoder MFT and set its input type to H.264. The
|
||||
/// output type (NV12) is negotiated after the first frames decode the
|
||||
/// sequence header (we (re)read the real frame size on a stream change).
|
||||
pub fn new() -> Result<Self> {
|
||||
unsafe {
|
||||
MFStartup(mf_version(), MFSTARTUP_LITE).context("MFStartup (decoder)")?;
|
||||
let transform = match activate_decoder() {
|
||||
Ok(t) => t,
|
||||
Err(e) => {
|
||||
let _ = MFShutdown();
|
||||
return Err(e);
|
||||
}
|
||||
};
|
||||
|
||||
let mut dec = Self {
|
||||
transform,
|
||||
width: 0,
|
||||
height: 0,
|
||||
streaming: false,
|
||||
input_stream_id: 0,
|
||||
output_stream_id: 0,
|
||||
mf_started: true,
|
||||
};
|
||||
|
||||
dec.configure_input()?;
|
||||
Ok(dec)
|
||||
}
|
||||
}
|
||||
|
||||
/// Set the decoder input type to H.264 (no fixed frame size — the decoder
|
||||
/// learns it from the bitstream).
|
||||
unsafe fn configure_input(&mut self) -> Result<()> {
|
||||
let in_type: IMFMediaType = MFCreateMediaType().context("MFCreateMediaType(dec in)")?;
|
||||
in_type.SetGUID(&MF_MT_MAJOR_TYPE, &MFMediaType_Video)?;
|
||||
in_type.SetGUID(&MF_MT_SUBTYPE, &MFVideoFormat_H264)?;
|
||||
self.transform
|
||||
.SetInputType(self.input_stream_id, &in_type, 0)
|
||||
.context("SetInputType(H264 decode)")?;
|
||||
Ok(())
|
||||
}
|
||||
|
||||
/// Set the decoder output type to NV12 once the stream size is known.
|
||||
unsafe fn configure_output_nv12(&mut self) -> Result<()> {
|
||||
let out_type: IMFMediaType = MFCreateMediaType().context("MFCreateMediaType(dec out)")?;
|
||||
out_type.SetGUID(&MF_MT_MAJOR_TYPE, &MFMediaType_Video)?;
|
||||
out_type.SetGUID(&MF_MT_SUBTYPE, &MFVideoFormat_NV12)?;
|
||||
self.transform
|
||||
.SetOutputType(self.output_stream_id, &out_type, 0)
|
||||
.context("SetOutputType(NV12 decode)")?;
|
||||
Ok(())
|
||||
}
|
||||
|
||||
/// Read the negotiated output frame size from the decoder's current output type.
|
||||
unsafe fn read_output_size(&mut self) -> Result<(u32, u32)> {
|
||||
let out_type = self
|
||||
.transform
|
||||
.GetOutputCurrentType(self.output_stream_id)
|
||||
.context("GetOutputCurrentType")?;
|
||||
let packed = out_type
|
||||
.GetUINT64(&MF_MT_FRAME_SIZE)
|
||||
.context("read MF_MT_FRAME_SIZE")?;
|
||||
let width = (packed >> 32) as u32;
|
||||
let height = (packed & 0xFFFF_FFFF) as u32;
|
||||
Ok((width, height))
|
||||
}
|
||||
|
||||
unsafe fn ensure_streaming(&mut self) -> Result<()> {
|
||||
if !self.streaming {
|
||||
self.transform
|
||||
.ProcessMessage(MFT_MESSAGE_NOTIFY_BEGIN_STREAMING, 0)
|
||||
.context("decoder BEGIN_STREAMING")?;
|
||||
self.transform
|
||||
.ProcessMessage(MFT_MESSAGE_NOTIFY_START_OF_STREAM, 0)
|
||||
.context("decoder START_OF_STREAM")?;
|
||||
self.streaming = true;
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
|
||||
/// Feed one H.264 access unit and return a decoded BGRA frame if one is
|
||||
/// produced this tick. `Ok(None)` means the decoder needs more input (normal
|
||||
/// while it buffers the first GOP).
|
||||
pub fn decode(&mut self, h264: &[u8], pts_100ns: i64) -> Result<Option<DecodedFrame>> {
|
||||
if h264.is_empty() {
|
||||
return Ok(None);
|
||||
}
|
||||
unsafe {
|
||||
self.ensure_streaming()?;
|
||||
|
||||
let sample = make_input_sample(h264, pts_100ns)?;
|
||||
match self
|
||||
.transform
|
||||
.ProcessInput(self.input_stream_id, &sample, 0)
|
||||
{
|
||||
Ok(()) => {}
|
||||
Err(e) if e.code() == MF_E_TRANSFORM_NEED_MORE_INPUT => {}
|
||||
Err(e) => return Err(anyhow!("decoder ProcessInput failed: {e:#}")),
|
||||
}
|
||||
|
||||
self.drain_one()
|
||||
}
|
||||
}
|
||||
|
||||
/// Drain one decoded output sample, handling the initial NV12 output-type
|
||||
/// negotiation (`MF_E_TRANSFORM_STREAM_CHANGE`).
|
||||
unsafe fn drain_one(&mut self) -> Result<Option<DecodedFrame>> {
|
||||
loop {
|
||||
// If we have not yet set an output type, do so now (NV12). The first
|
||||
// ProcessOutput typically returns STREAM_CHANGE until this is set.
|
||||
if self.width == 0 {
|
||||
// Try to set NV12 output; ignore failures here (the decoder may
|
||||
// require a STREAM_CHANGE round-trip first).
|
||||
let _ = self.configure_output_nv12();
|
||||
}
|
||||
|
||||
let stream_info: MFT_OUTPUT_STREAM_INFO = self
|
||||
.transform
|
||||
.GetOutputStreamInfo(self.output_stream_id)
|
||||
.context("decoder GetOutputStreamInfo")?;
|
||||
|
||||
const MFT_OUTPUT_STREAM_PROVIDES_SAMPLES: u32 = 0x100;
|
||||
let mft_provides = stream_info.dwFlags & MFT_OUTPUT_STREAM_PROVIDES_SAMPLES != 0;
|
||||
|
||||
let mut out_buffer = MFT_OUTPUT_DATA_BUFFER {
|
||||
dwStreamID: self.output_stream_id,
|
||||
..Default::default()
|
||||
};
|
||||
|
||||
if !mft_provides {
|
||||
let alloc = stream_info.cbSize.max(self.guess_nv12_size());
|
||||
let sample: IMFSample = MFCreateSample().context("MFCreateSample(dec out)")?;
|
||||
let buffer =
|
||||
MFCreateMemoryBuffer(alloc).context("MFCreateMemoryBuffer(dec out)")?;
|
||||
sample.AddBuffer(&buffer)?;
|
||||
out_buffer.pSample = std::mem::ManuallyDrop::new(Some(sample));
|
||||
}
|
||||
|
||||
let mut status: u32 = 0;
|
||||
let mut bufs = [out_buffer];
|
||||
let hr = self.transform.ProcessOutput(0, &mut bufs, &mut status);
|
||||
let produced = std::mem::ManuallyDrop::take(&mut bufs[0].pSample);
|
||||
|
||||
match hr {
|
||||
Ok(()) => {
|
||||
// (Re)read the negotiated size in case it just became known.
|
||||
if let Ok((w, h)) = self.read_output_size() {
|
||||
self.width = w;
|
||||
self.height = h;
|
||||
}
|
||||
let Some(sample) = produced else {
|
||||
return Ok(None);
|
||||
};
|
||||
if self.width == 0 || self.height == 0 {
|
||||
return Ok(None);
|
||||
}
|
||||
let nv12 = sample_to_vec(&sample)?;
|
||||
let bgra = nv12_to_bgra(&nv12, self.width, self.height)?;
|
||||
return Ok(Some(DecodedFrame {
|
||||
width: self.width,
|
||||
height: self.height,
|
||||
bgra,
|
||||
}));
|
||||
}
|
||||
Err(e) if e.code() == MF_E_TRANSFORM_NEED_MORE_INPUT => return Ok(None),
|
||||
Err(e) if e.code() == MF_E_TRANSFORM_STREAM_CHANGE => {
|
||||
// The decoder learned the frame size: (re)negotiate NV12 out,
|
||||
// record the size, and retry the drain.
|
||||
self.configure_output_nv12()
|
||||
.context("decoder output renegotiation after stream change")?;
|
||||
if let Ok((w, h)) = self.read_output_size() {
|
||||
self.width = w;
|
||||
self.height = h;
|
||||
}
|
||||
continue;
|
||||
}
|
||||
Err(e) => return Err(anyhow!("decoder ProcessOutput failed: {e:#}")),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Conservative NV12 buffer estimate when the decoder doesn't report cbSize.
|
||||
fn guess_nv12_size(&self) -> u32 {
|
||||
if self.width != 0 && self.height != 0 {
|
||||
self.width * self.height * 3 / 2
|
||||
} else {
|
||||
// 1080p NV12 upper bound until the real size is known.
|
||||
1920 * 1080 * 3 / 2
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl Drop for H264Decoder {
|
||||
fn drop(&mut self) {
|
||||
unsafe {
|
||||
if self.streaming {
|
||||
let _ = self
|
||||
.transform
|
||||
.ProcessMessage(MFT_MESSAGE_NOTIFY_END_OF_STREAM, 0);
|
||||
let _ = self
|
||||
.transform
|
||||
.ProcessMessage(MFT_MESSAGE_NOTIFY_END_STREAMING, 0);
|
||||
}
|
||||
if self.mf_started {
|
||||
let _ = MFShutdown();
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Enumerate and activate an H.264 decoder MFT (hardware preferred, software
|
||||
/// acceptable — decode does not require a HW encoder).
|
||||
unsafe fn activate_decoder() -> Result<IMFTransform> {
|
||||
let input_type = MFT_REGISTER_TYPE_INFO {
|
||||
guidMajorType: MFMediaType_Video,
|
||||
guidSubtype: MFVideoFormat_H264,
|
||||
};
|
||||
|
||||
let mut activate_ptr: *mut Option<windows::Win32::Media::MediaFoundation::IMFActivate> =
|
||||
std::ptr::null_mut();
|
||||
let mut count: u32 = 0;
|
||||
|
||||
// Allow both HW and SW decoders; SYNCMFT keeps the simple ProcessInput/Output
|
||||
// contract this first cut uses.
|
||||
MFTEnumEx(
|
||||
MFT_CATEGORY_VIDEO_DECODER,
|
||||
MFT_ENUM_FLAG_SYNCMFT | MFT_ENUM_FLAG_SORTANDFILTER,
|
||||
Some(&input_type as *const _),
|
||||
None,
|
||||
&mut activate_ptr,
|
||||
&mut count,
|
||||
)
|
||||
.context("MFTEnumEx (H264 decoder)")?;
|
||||
|
||||
if count == 0 || activate_ptr.is_null() {
|
||||
if !activate_ptr.is_null() {
|
||||
windows::Win32::System::Com::CoTaskMemFree(Some(activate_ptr as *const _));
|
||||
}
|
||||
return Err(anyhow!("no H.264 decoder MFT available"));
|
||||
}
|
||||
|
||||
let slice = std::slice::from_raw_parts_mut(activate_ptr, count as usize);
|
||||
let mut chosen: Option<IMFTransform> = None;
|
||||
for entry in slice.iter_mut() {
|
||||
if chosen.is_none() {
|
||||
if let Some(activate) = entry.as_ref() {
|
||||
if let Ok(t) = activate.ActivateObject::<IMFTransform>() {
|
||||
chosen = Some(t);
|
||||
}
|
||||
}
|
||||
}
|
||||
entry.take();
|
||||
}
|
||||
windows::Win32::System::Com::CoTaskMemFree(Some(activate_ptr as *const _));
|
||||
|
||||
chosen.ok_or_else(|| anyhow!("failed to activate H.264 decoder MFT"))
|
||||
}
|
||||
|
||||
/// Wrap an H.264 access unit into an IMFSample.
|
||||
unsafe fn make_input_sample(data: &[u8], pts_100ns: i64) -> Result<IMFSample> {
|
||||
let sample: IMFSample = MFCreateSample().context("MFCreateSample(dec in)")?;
|
||||
let buffer = MFCreateMemoryBuffer(data.len() as u32).context("MFCreateMemoryBuffer(dec in)")?;
|
||||
|
||||
let mut ptr: *mut u8 = std::ptr::null_mut();
|
||||
let mut max_len: u32 = 0;
|
||||
buffer
|
||||
.Lock(&mut ptr, Some(&mut max_len), None)
|
||||
.context("decoder input Lock")?;
|
||||
if (max_len as usize) < data.len() || ptr.is_null() {
|
||||
let _ = buffer.Unlock();
|
||||
return Err(anyhow!("MF buffer too small for H.264 access unit"));
|
||||
}
|
||||
std::ptr::copy_nonoverlapping(data.as_ptr(), ptr, data.len());
|
||||
buffer.SetCurrentLength(data.len() as u32)?;
|
||||
buffer.Unlock()?;
|
||||
|
||||
sample.AddBuffer(&buffer)?;
|
||||
sample.SetSampleTime(pts_100ns)?;
|
||||
Ok(sample)
|
||||
}
|
||||
|
||||
/// Copy a sample's contiguous bytes into a Vec.
|
||||
unsafe fn sample_to_vec(sample: &IMFSample) -> Result<Vec<u8>> {
|
||||
let buffer = sample
|
||||
.ConvertToContiguousBuffer()
|
||||
.context("decoder ConvertToContiguousBuffer")?;
|
||||
let mut ptr: *mut u8 = std::ptr::null_mut();
|
||||
let mut len: u32 = 0;
|
||||
buffer
|
||||
.Lock(&mut ptr, None, Some(&mut len))
|
||||
.context("decoder output Lock")?;
|
||||
let out = if ptr.is_null() || len == 0 {
|
||||
Vec::new()
|
||||
} else {
|
||||
std::slice::from_raw_parts(ptr, len as usize).to_vec()
|
||||
};
|
||||
let _ = buffer.Unlock();
|
||||
Ok(out)
|
||||
}
|
||||
|
||||
/// MF version word for `MFStartup` (see encoder::h264).
|
||||
fn mf_version() -> u32 {
|
||||
0x0002_0070
|
||||
}
|
||||
|
||||
/// Convert an NV12 buffer to BGRA (BT.601 limited range). Inverse of the
|
||||
/// encoder's BGRA->NV12. Shared with the unit tests below.
|
||||
pub fn nv12_to_bgra(nv12: &[u8], width: u32, height: u32) -> Result<Vec<u8>> {
|
||||
let w = width as usize;
|
||||
let h = height as usize;
|
||||
let y_size = w * h;
|
||||
let need = y_size * 3 / 2;
|
||||
if nv12.len() < need {
|
||||
return Err(anyhow!("NV12 buffer too small: {} < {}", nv12.len(), need));
|
||||
}
|
||||
|
||||
let (y_plane, uv_plane) = nv12.split_at(y_size);
|
||||
let mut bgra = vec![0u8; w * h * 4];
|
||||
let chroma_cols = w / 2;
|
||||
|
||||
for row in 0..h {
|
||||
for col in 0..w {
|
||||
let y = y_plane[row * w + col] as i32;
|
||||
let cx = col / 2;
|
||||
let cy = row / 2;
|
||||
let uv_idx = (cy * chroma_cols + cx) * 2;
|
||||
let u = uv_plane[uv_idx] as i32;
|
||||
let v = uv_plane[uv_idx + 1] as i32;
|
||||
|
||||
// BT.601 limited-range YUV -> RGB.
|
||||
let c = y - 16;
|
||||
let d = u - 128;
|
||||
let e = v - 128;
|
||||
let r = ((298 * c + 409 * e + 128) >> 8).clamp(0, 255);
|
||||
let g = ((298 * c - 100 * d - 208 * e + 128) >> 8).clamp(0, 255);
|
||||
let b = ((298 * c + 516 * d + 128) >> 8).clamp(0, 255);
|
||||
|
||||
let px = (row * w + col) * 4;
|
||||
bgra[px] = b as u8;
|
||||
bgra[px + 1] = g as u8;
|
||||
bgra[px + 2] = r as u8;
|
||||
bgra[px + 3] = 255;
|
||||
}
|
||||
}
|
||||
|
||||
Ok(bgra)
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use crate::encoder::color::{bgra_to_nv12, nv12_size};
|
||||
|
||||
/// Round-trip a solid color through BGRA->NV12->BGRA. Chroma subsampling and
|
||||
/// limited-range rounding introduce small error, so allow a tolerance.
|
||||
#[test]
|
||||
fn nv12_bgra_roundtrip_is_approximately_lossless_for_solid_color() {
|
||||
let w = 4u32;
|
||||
let h = 4u32;
|
||||
// Mid gray.
|
||||
let mut bgra = vec![0u8; (w * h * 4) as usize];
|
||||
for px in bgra.chunks_mut(4) {
|
||||
px[0] = 120; // B
|
||||
px[1] = 120; // G
|
||||
px[2] = 120; // R
|
||||
px[3] = 255;
|
||||
}
|
||||
let mut nv12 = vec![0u8; nv12_size(w, h)];
|
||||
bgra_to_nv12(&bgra, w, h, &mut nv12).unwrap();
|
||||
let back = nv12_to_bgra(&nv12, w, h).unwrap();
|
||||
|
||||
for (orig, got) in bgra.chunks(4).zip(back.chunks(4)) {
|
||||
for ch in 0..3 {
|
||||
let diff = (orig[ch] as i32 - got[ch] as i32).abs();
|
||||
assert!(diff <= 6, "channel {ch} drift {diff} too large");
|
||||
}
|
||||
assert_eq!(got[3], 255, "alpha must be opaque");
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn nv12_to_bgra_rejects_short_buffer() {
|
||||
let nv12 = vec![0u8; 4];
|
||||
assert!(nv12_to_bgra(&nv12, 16, 16).is_err());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn black_nv12_decodes_to_black_bgra() {
|
||||
// Limited-range black: Y=16, UV=128.
|
||||
let w = 2u32;
|
||||
let h = 2u32;
|
||||
let mut nv12 = vec![128u8; nv12_size(w, h)];
|
||||
for y in nv12.iter_mut().take((w * h) as usize) {
|
||||
*y = 16;
|
||||
}
|
||||
let bgra = nv12_to_bgra(&nv12, w, h).unwrap();
|
||||
for px in bgra.chunks(4) {
|
||||
assert!(px[0] <= 2 && px[1] <= 2 && px[2] <= 2, "near-black");
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -3,6 +3,8 @@
|
||||
//! This module provides the viewer functionality for connecting to remote
|
||||
//! GuruConnect sessions with low-level keyboard hooks for Win key capture.
|
||||
|
||||
#[cfg(windows)]
|
||||
mod decoder;
|
||||
mod input;
|
||||
mod render;
|
||||
mod transport;
|
||||
@@ -31,6 +33,72 @@ pub enum InputEvent {
|
||||
SpecialKey(proto::SpecialKeyEvent),
|
||||
}
|
||||
|
||||
/// Spawn the dedicated H.264 decode worker thread (Task 7, Windows only).
|
||||
///
|
||||
/// Returns a sender for `(h264_access_unit, pts_100ns)`. The worker lazily
|
||||
/// creates the Media Foundation decoder on the first frame; if creation fails it
|
||||
/// logs once and then silently drops subsequent frames (the raw render path is
|
||||
/// never affected). Each decoded frame is converted to BGRA and delivered to the
|
||||
/// viewer as an uncompressed `FrameData`, reusing the existing render path.
|
||||
#[cfg(windows)]
|
||||
fn spawn_h264_decode_worker(
|
||||
viewer_tx: mpsc::Sender<ViewerEvent>,
|
||||
) -> std::sync::mpsc::Sender<(Vec<u8>, i64)> {
|
||||
let (tx, rx) = std::sync::mpsc::channel::<(Vec<u8>, i64)>();
|
||||
|
||||
std::thread::Builder::new()
|
||||
.name("gc-h264-decode".to_string())
|
||||
.spawn(move || {
|
||||
let mut decoder: Option<decoder::H264Decoder> = None;
|
||||
let mut init_failed = false;
|
||||
|
||||
while let Ok((data, pts)) = rx.recv() {
|
||||
if init_failed {
|
||||
continue;
|
||||
}
|
||||
if decoder.is_none() {
|
||||
match decoder::H264Decoder::new() {
|
||||
Ok(d) => {
|
||||
info!("H.264 decoder initialized (Media Foundation)");
|
||||
decoder = Some(d);
|
||||
}
|
||||
Err(e) => {
|
||||
error!(
|
||||
"H.264 decoder init failed: {e:#}; H.264 frames will be dropped"
|
||||
);
|
||||
init_failed = true;
|
||||
continue;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
let dec = decoder.as_mut().expect("decoder present after init");
|
||||
match dec.decode(&data, pts) {
|
||||
Ok(Some(decoded)) => {
|
||||
let frame = render::FrameData {
|
||||
width: decoded.width,
|
||||
height: decoded.height,
|
||||
data: decoded.bgra,
|
||||
compressed: false, // already BGRA
|
||||
is_keyframe: false,
|
||||
};
|
||||
if viewer_tx.blocking_send(ViewerEvent::Frame(frame)).is_err() {
|
||||
// Viewer closed; stop the worker.
|
||||
break;
|
||||
}
|
||||
}
|
||||
Ok(None) => { /* decoder buffering; no output this tick */ }
|
||||
Err(e) => {
|
||||
warn!("H.264 decode error: {e:#}");
|
||||
}
|
||||
}
|
||||
}
|
||||
})
|
||||
.expect("failed to spawn H.264 decode worker thread");
|
||||
|
||||
tx
|
||||
}
|
||||
|
||||
/// Run the viewer to connect to a remote session
|
||||
pub async fn run(server_url: &str, session_id: &str, api_key: &str) -> Result<()> {
|
||||
info!("GuruConnect Viewer starting");
|
||||
@@ -77,13 +145,23 @@ pub async fn run(server_url: &str, session_id: &str, api_key: &str) -> Result<()
|
||||
}
|
||||
});
|
||||
|
||||
// H.264 decode worker (Task 7, Windows only). The Media Foundation decoder
|
||||
// wraps COM interfaces with thread affinity, so it runs on a DEDICATED OS
|
||||
// thread (not a tokio task, which can migrate across workers at await
|
||||
// points). The receive task forwards H.264 access units to it over a std
|
||||
// channel; the worker decodes to BGRA and pushes a FrameData back through
|
||||
// the viewer channel via `blocking_send`. On decoder-init failure the worker
|
||||
// logs and drops H.264 frames (the raw path is unaffected).
|
||||
#[cfg(windows)]
|
||||
let h264_tx = spawn_h264_decode_worker(viewer_tx.clone());
|
||||
|
||||
// Spawn task to receive messages from server
|
||||
let viewer_tx_recv = viewer_tx.clone();
|
||||
let receive_task = tokio::spawn(async move {
|
||||
while let Some(msg) = ws_receiver.recv().await {
|
||||
match msg.payload {
|
||||
Some(proto::message::Payload::VideoFrame(frame)) => {
|
||||
if let Some(proto::video_frame::Encoding::Raw(raw)) = frame.encoding {
|
||||
Some(proto::message::Payload::VideoFrame(frame)) => match frame.encoding {
|
||||
Some(proto::video_frame::Encoding::Raw(raw)) => {
|
||||
let frame_data = render::FrameData {
|
||||
width: raw.width as u32,
|
||||
height: raw.height as u32,
|
||||
@@ -93,7 +171,23 @@ pub async fn run(server_url: &str, session_id: &str, api_key: &str) -> Result<()
|
||||
};
|
||||
let _ = viewer_tx_recv.send(ViewerEvent::Frame(frame_data)).await;
|
||||
}
|
||||
}
|
||||
Some(proto::video_frame::Encoding::H264(enc)) => {
|
||||
// Forward to the decode worker (Windows). On other
|
||||
// platforms H.264 is never negotiated, so this is dead.
|
||||
#[cfg(windows)]
|
||||
{
|
||||
if h264_tx.send((enc.data, enc.pts)).is_err() {
|
||||
warn!("H.264 decode worker unavailable; dropping frame");
|
||||
}
|
||||
}
|
||||
#[cfg(not(windows))]
|
||||
{
|
||||
let _ = enc;
|
||||
}
|
||||
}
|
||||
// VP9/H265 not implemented on the viewer (raw + H.264 only).
|
||||
_ => {}
|
||||
},
|
||||
Some(proto::message::Payload::CursorPosition(pos)) => {
|
||||
let _ = viewer_tx_recv
|
||||
.send(ViewerEvent::CursorPosition(pos.x, pos.y, pos.visible))
|
||||
|
||||
@@ -17,6 +17,23 @@ message SessionResponse {
|
||||
string session_id = 2;
|
||||
string error = 3;
|
||||
DisplayInfo display_info = 4;
|
||||
// Negotiated video codec for this session (Task 7). The plan models the
|
||||
// codec selection here; however the LIVE server->agent handshake in v2 is
|
||||
// done over WebSocket query params + StartStream (SessionRequest/
|
||||
// SessionResponse are not exchanged on the wire today), so the codec the
|
||||
// agent actually applies is carried on StartStream.video_codec below. This
|
||||
// field is kept for spec parity / future use; raw is the safe default.
|
||||
VideoCodec video_codec = 5;
|
||||
}
|
||||
|
||||
// Negotiated video codec (Task 7). RAW = salvaged BGRA+Zstd+dirty-rects, the
|
||||
// guaranteed fallback and current default. H264 = hardware Media Foundation
|
||||
// encode, the negotiated upgrade (compile-verified; validated on hardware in
|
||||
// plan Task 8). H265/HEVC is future opt-in (TODO).
|
||||
enum VideoCodec {
|
||||
VIDEO_CODEC_RAW = 0; // Raw BGRA + Zstd (safe default / fallback)
|
||||
VIDEO_CODEC_H264 = 1; // Hardware H.264 via Media Foundation
|
||||
VIDEO_CODEC_H265 = 2; // Future / opt-in (not implemented)
|
||||
}
|
||||
|
||||
enum SessionType {
|
||||
@@ -268,6 +285,13 @@ message Disconnect {
|
||||
message StartStream {
|
||||
string viewer_id = 1; // ID of viewer requesting stream
|
||||
int32 display_id = 2; // Which display to stream (0 = primary)
|
||||
// Negotiated codec for this stream (Task 7). The server selects this from
|
||||
// the agent's advertised capability (AgentStatus.supports_h264) and its
|
||||
// policy (DEFAULT_PREFER_H264, currently false so we never ship unvalidated
|
||||
// H.264 as the default). The agent builds its encoder from this value;
|
||||
// VIDEO_CODEC_RAW (0, the default for older servers) keeps the salvaged
|
||||
// raw+Zstd path. On any H.264 init failure the agent falls back to raw.
|
||||
VideoCodec video_codec = 3;
|
||||
}
|
||||
|
||||
// Server commands agent to stop streaming
|
||||
@@ -287,6 +311,12 @@ message AgentStatus {
|
||||
string organization = 8; // Company/organization name
|
||||
string site = 9; // Site/location name
|
||||
repeated string tags = 10; // Tags for categorization
|
||||
// HW-encode capability (Task 7). True when the agent enumerated a HARDWARE
|
||||
// H.264 encoder MFT at startup (MFTEnumEx, MFT_CATEGORY_VIDEO_ENCODER,
|
||||
// MFVideoFormat_H264, hardware flag). The server uses this for codec
|
||||
// negotiation (see StartStream.video_codec). Detected once and cached;
|
||||
// false on non-Windows / no HW encoder / MF unavailable.
|
||||
bool supports_h264 = 11;
|
||||
}
|
||||
|
||||
// Server commands agent to uninstall itself
|
||||
|
||||
@@ -805,6 +805,7 @@ async fn handle_agent_connection(
|
||||
organization.clone(),
|
||||
site.clone(),
|
||||
status.tags.clone(),
|
||||
status.supports_h264,
|
||||
)
|
||||
.await;
|
||||
|
||||
|
||||
@@ -93,6 +93,36 @@ pub struct Session {
|
||||
pub organization: Option<String>, // Company/organization name
|
||||
pub site: Option<String>, // Site/location name
|
||||
pub tags: Vec<String>, // Tags for categorization
|
||||
/// Whether the agent advertised a hardware H.264 encoder (Task 7). Set from
|
||||
/// `AgentStatus.supports_h264`; drives codec negotiation in `select_video_codec`.
|
||||
pub supports_h264: bool,
|
||||
}
|
||||
|
||||
/// Default codec-negotiation policy (Task 7).
|
||||
///
|
||||
/// `false` means: even when an agent advertises hardware H.264 support, the
|
||||
/// server still negotiates RAW. H.264 is compile-verified only and not yet
|
||||
/// validated on real hardware (plan Task 8), so we deliberately do NOT ship it
|
||||
/// as the default — raw+Zstd stays the guaranteed working path. Flip this to
|
||||
/// `true` once H.264 is live-validated, or make it per-tenant policy later.
|
||||
pub const DEFAULT_PREFER_H264: bool = false;
|
||||
|
||||
/// Negotiate the video codec for a stream (Task 7).
|
||||
///
|
||||
/// Pure decision function (unit-tested): given whether the agent advertised
|
||||
/// hardware H.264 and whether policy prefers H.264, pick the codec. H.264 is
|
||||
/// chosen ONLY when both the agent supports it AND policy allows it; otherwise
|
||||
/// raw — the safe default/fallback. HEVC is intentionally never selected here
|
||||
/// (future opt-in; TODO).
|
||||
pub fn select_video_codec(
|
||||
agent_supports_h264: bool,
|
||||
prefer_h264: bool,
|
||||
) -> crate::proto::VideoCodec {
|
||||
if agent_supports_h264 && prefer_h264 {
|
||||
crate::proto::VideoCodec::H264
|
||||
} else {
|
||||
crate::proto::VideoCodec::Raw
|
||||
}
|
||||
}
|
||||
|
||||
/// Channel for sending frames from agent to viewers
|
||||
@@ -206,6 +236,7 @@ impl SessionManager {
|
||||
organization: None,
|
||||
site: None,
|
||||
tags: Vec::new(),
|
||||
supports_h264: false,
|
||||
};
|
||||
|
||||
let session_data = SessionData {
|
||||
@@ -240,12 +271,14 @@ impl SessionManager {
|
||||
organization: Option<String>,
|
||||
site: Option<String>,
|
||||
tags: Vec<String>,
|
||||
supports_h264: bool,
|
||||
) {
|
||||
let mut sessions = self.sessions.write().await;
|
||||
if let Some(session_data) = sessions.get_mut(&session_id) {
|
||||
session_data.info.last_heartbeat = chrono::Utc::now();
|
||||
session_data.last_heartbeat_instant = Instant::now();
|
||||
session_data.info.is_streaming = is_streaming;
|
||||
session_data.info.supports_h264 = supports_h264;
|
||||
if let Some(os) = os_version {
|
||||
session_data.info.os_version = Some(os);
|
||||
}
|
||||
@@ -409,10 +442,23 @@ impl SessionManager {
|
||||
use crate::proto;
|
||||
use prost::Message;
|
||||
|
||||
// Negotiate the video codec for this stream (Task 7): H.264 only when the
|
||||
// agent advertised hardware support AND policy prefers it. With
|
||||
// DEFAULT_PREFER_H264 = false this always resolves to RAW today (H.264 is
|
||||
// compile-verified only, validated on hardware in Task 8).
|
||||
let codec = select_video_codec(session_data.info.supports_h264, DEFAULT_PREFER_H264);
|
||||
tracing::info!(
|
||||
"StartStream codec negotiation: agent_supports_h264={}, prefer_h264={} -> {:?}",
|
||||
session_data.info.supports_h264,
|
||||
DEFAULT_PREFER_H264,
|
||||
codec
|
||||
);
|
||||
|
||||
let start_stream = proto::Message {
|
||||
payload: Some(proto::message::Payload::StartStream(proto::StartStream {
|
||||
viewer_id: viewer_id.to_string(),
|
||||
display_id: 0, // Primary display
|
||||
video_codec: codec as i32,
|
||||
})),
|
||||
};
|
||||
|
||||
@@ -618,6 +664,7 @@ impl SessionManager {
|
||||
organization: None,
|
||||
site: None,
|
||||
tags: Vec::new(),
|
||||
supports_h264: false,
|
||||
};
|
||||
|
||||
// Create placeholder channels (will be replaced on reconnect)
|
||||
@@ -717,6 +764,69 @@ mod tests {
|
||||
.is_some());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn codec_negotiation_picks_h264_only_when_supported_and_preferred() {
|
||||
use crate::proto::VideoCodec;
|
||||
|
||||
// Agent supports H.264 AND policy prefers it -> H.264.
|
||||
assert_eq!(select_video_codec(true, true), VideoCodec::H264);
|
||||
// Agent supports it but policy does not prefer it -> raw (the safe default).
|
||||
assert_eq!(select_video_codec(true, false), VideoCodec::Raw);
|
||||
// Policy prefers H.264 but the agent has no HW encoder -> raw.
|
||||
assert_eq!(select_video_codec(false, true), VideoCodec::Raw);
|
||||
// Neither -> raw.
|
||||
assert_eq!(select_video_codec(false, false), VideoCodec::Raw);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn default_policy_does_not_prefer_h264() {
|
||||
// Guardrail: until H.264 is hardware-validated (Task 8) the default policy
|
||||
// MUST keep raw as the negotiated codec even for capable agents. We assert
|
||||
// the OBSERVABLE behavior (codec selection under the default policy) rather
|
||||
// than the constant directly, which keeps the test meaningful if the policy
|
||||
// later becomes dynamic.
|
||||
let chosen = select_video_codec(true, DEFAULT_PREFER_H264);
|
||||
assert_eq!(
|
||||
chosen,
|
||||
crate::proto::VideoCodec::Raw,
|
||||
"default policy must negotiate raw until H.264 is hardware-validated"
|
||||
);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn agent_status_updates_h264_capability() {
|
||||
let mgr = SessionManager::new();
|
||||
let (session_id, _frame_tx, _input_rx) = mgr
|
||||
.register_agent("agent-cap".to_string(), "Cap PC".to_string(), true)
|
||||
.await;
|
||||
|
||||
// Default is false until a status reports capability.
|
||||
assert_eq!(
|
||||
mgr.get_session(session_id).await.map(|s| s.supports_h264),
|
||||
Some(false)
|
||||
);
|
||||
|
||||
mgr.update_agent_status(
|
||||
session_id,
|
||||
Some("Windows".to_string()),
|
||||
true,
|
||||
10,
|
||||
1,
|
||||
false,
|
||||
Some("0.2.0".to_string()),
|
||||
None,
|
||||
None,
|
||||
Vec::new(),
|
||||
true, // supports_h264
|
||||
)
|
||||
.await;
|
||||
|
||||
assert_eq!(
|
||||
mgr.get_session(session_id).await.map(|s| s.supports_h264),
|
||||
Some(true)
|
||||
);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn denied_attended_session_keeps_viewer_blocked() {
|
||||
let mgr = SessionManager::new();
|
||||
|
||||
@@ -393,11 +393,104 @@ Reference: SPEC-002 §4.1/§4.2; salvage ledger §2; `agent/src/input/keyboard.r
|
||||
|
||||
---
|
||||
|
||||
## Task 7: Hardware H.264 encode + negotiated raw/Zstd fallback
|
||||
## Task 7 [IMPLEMENTED 2026-05-30 — self-verified on local Windows toolchain: `cargo fmt --all --check` clean, `cargo clippy --workspace --all-targets --all-features -- -D warnings` exit 0, `cargo test --workspace` 89 pass (36 agent + 53 server; was 70, no regressions), `cargo build --workspace` ok; pending Code Review]: Hardware H.264 encode + negotiated raw/Zstd fallback
|
||||
|
||||
Files touched: `agent/src/encoder/` (`mod.rs`, `h264.rs` [new], `raw.rs` [salvaged]),
|
||||
`agent/src/capture/` (feed), `agent/src/viewer/` (decode), `proto/guruconnect.proto`
|
||||
(`AgentStatus` capability, `SessionResponse` codec), `server/src/session/mod.rs` (negotiation).
|
||||
> [IMPLEMENTED] Raw+Zstd remains the DEFAULT and guaranteed fallback; H.264 is a
|
||||
> negotiated upgrade that is COMPILE-VERIFIED ONLY (live MF encode/decode is Task
|
||||
> 8 — needs real GPU + frames). The testable parts (abstraction, factory,
|
||||
> negotiation, capability plumbing, color-conversion math) are done solidly with
|
||||
> unit tests; the MF H.264 encoder and viewer decoder are first-cut, clearly
|
||||
> marked, and gated behind a default-off policy so unvalidated H.264 never ships
|
||||
> as the default.
|
||||
>
|
||||
> 1. ENCODER ABSTRACTION (`agent/src/encoder/mod.rs`): the existing `Encoder`
|
||||
> trait (`encode(&mut self, &CapturedFrame) -> Result<EncodedFrame>`) is the
|
||||
> abstraction; `RawEncoder` (salvaged raw+Zstd+dirty-rects, UNCHANGED behavior)
|
||||
> and the new `H264Encoder` both implement it. Factory split into pure pieces:
|
||||
> `codec_from_str` (config-string -> `VideoCodec`), `select_codec(negotiated,
|
||||
> hardware_available)` (agent-side guard: H.264 only if HW present, HEVC->raw,
|
||||
> else raw), and `create_encoder_for(VideoCodec, quality)` (builds the encoder;
|
||||
> on H.264 init failure logs + returns a RAW encoder so the session never
|
||||
> breaks). UNIT-TESTED: codec_from_str mapping, select_codec guard matrix, raw
|
||||
> factory always succeeds, string path resolves to raw without HW.
|
||||
> 2. CAPABILITY + NEGOTIATION (testable, done well):
|
||||
> - `encoder/capability.rs`: `supports_hardware_h264()` probes MF once
|
||||
> (`MFTEnumEx(MFT_CATEGORY_VIDEO_ENCODER, MFT_ENUM_FLAG_HARDWARE,
|
||||
> MFVideoFormat_H264)`), caches the bool via `OnceLock`; false on non-Windows
|
||||
> / no HW / MF error. Advertised in `AgentStatus.supports_h264` (proto field
|
||||
> 11, additive).
|
||||
> - Server (`server/src/session/mod.rs`): `select_video_codec(agent_supports,
|
||||
> prefer_h264)` is a PURE decision fn — H.264 only when BOTH the agent
|
||||
> supports it AND policy prefers it, else raw. Policy constant
|
||||
> `DEFAULT_PREFER_H264 = false` (documented: keeps raw as the negotiated codec
|
||||
> until H.264 is hardware-validated). `supports_h264` stored on the in-memory
|
||||
> `Session` from `AgentStatus` (`update_agent_status` gained the param). The
|
||||
> negotiated codec is stamped on `StartStream.video_codec` in
|
||||
> `send_start_stream_internal` (the LIVE server->agent codec-selection point —
|
||||
> SessionRequest/SessionResponse are not exchanged on the wire in v2, so the
|
||||
> proto's `SessionResponse.video_codec` is kept for spec parity but the live
|
||||
> path uses `StartStream`). UNIT-TESTED: the negotiation matrix, the
|
||||
> default-policy guardrail (capable agent still gets raw), and the
|
||||
> `AgentStatus -> supports_h264` ingest.
|
||||
> - Agent applies it: `StartStream` handler decodes `video_codec`, stores
|
||||
> `negotiated_codec`, and `init_streaming` builds the encoder via
|
||||
> `select_codec` + `create_encoder_for` (re-guards on local HW; older server
|
||||
> sends 0 = RAW, preserving the default).
|
||||
> 3. MF H.264 ENCODER (`agent/src/encoder/h264.rs`, FIRST-CUT, compile-verified
|
||||
> only): enumerates+activates a HW H.264 encoder MFT, sets H.264 output then
|
||||
> NV12 input media types (frame size/rate, bitrate from quality), feeds frames
|
||||
> (`ProcessInput`) and drains synchronously (`ProcessOutput`, NEED_MORE_INPUT =
|
||||
> "no output this tick"), emitting `VideoFrame{H264(EncodedFrame{data, keyframe,
|
||||
> pts, dts})}`. BGRA->NV12 via `encoder/color.rs` (BT.601 limited-range, 2x2 box
|
||||
> chroma; isolated + UNIT-TESTED: size, odd-dim/short-buffer rejection, black/
|
||||
> white/red reference values, plane coverage). On ANY init failure the FACTORY
|
||||
> falls back to raw (logged); per-frame errors surface to the session (which
|
||||
> logs + continues). Handles resolution change (re-init), keyframe flag
|
||||
> (CleanPoint), MF buffer alloc for non-sample-providing MFTs. NOT yet live: the
|
||||
> async-MFT event model is documented as a Task-8 refinement (this cut drains
|
||||
> synchronously); precise force-IDR (CODECAPI) is a TODO; D3D11 zero-copy
|
||||
> deferred (feeds CPU NV12).
|
||||
> 4. VIEWER H.264 DECODE (`agent/src/viewer/decoder.rs` [new], FIRST-CUT,
|
||||
> compile-verified only): MF H.264 decoder MFT -> NV12 -> BGRA
|
||||
> (`nv12_to_bgra`, BT.601 inverse, UNIT-TESTED round-trip within tolerance +
|
||||
> short-buffer + black). Runs on a DEDICATED OS thread (`gc-h264-decode`), NOT a
|
||||
> tokio task — the MF decoder has COM thread affinity and a tokio task can
|
||||
> migrate across workers at await points. The receive task forwards H.264 access
|
||||
> units over a std channel; the worker decodes and pushes BGRA `FrameData`
|
||||
> through the existing render path via `blocking_send`. On decoder-init failure
|
||||
> it logs once and drops H.264 frames; the RAW render path is untouched. Handles
|
||||
> the `MF_E_TRANSFORM_STREAM_CHANGE` NV12 output renegotiation + size discovery.
|
||||
> 5. RAW STILL WORKS END-TO-END: `RawEncoder` is unchanged; with
|
||||
> `DEFAULT_PREFER_H264 = false` the server negotiates RAW for every session
|
||||
> (including capable agents), the agent builds the raw encoder, and the viewer's
|
||||
> existing `Raw` branch renders it — the guaranteed default/fallback path is
|
||||
> fully intact and is what runs today.
|
||||
>
|
||||
> PROTO (additive — no field renumbered): `VideoCodec` enum (RAW=0, H264=1,
|
||||
> H265=2); `SessionResponse.video_codec = 5` (spec parity); `StartStream.video_codec
|
||||
> = 3` (live negotiation); `AgentStatus.supports_h264 = 11` (capability). HEVC is a
|
||||
> documented TODO/opt-in everywhere (never selected). Cargo.toml: added the
|
||||
> `Win32_Media_MediaFoundation` + COM windows features (no new external crates).
|
||||
>
|
||||
> COMPILE-VERIFIED-ONLY / NEEDS LIVE HARDWARE (Task 8): the MF H.264 encoder
|
||||
> init/feed/emit on a real GPU, the viewer MF decoder on a live stream, the
|
||||
> BGRA<->NV12 fidelity end-to-end, and the synchronous-drain timing. The encoder/
|
||||
> decoder are structured to fall back to raw (encoder) / drop frames + log
|
||||
> (decoder) on any failure so they cannot break a session even if MF misbehaves.
|
||||
>
|
||||
> TESTS ADDED (19): agent +16 (encoder factory/select matrix x5, color BGRA->NV12
|
||||
> x8, decoder NV12<->BGRA x3), server +3 (codec negotiation matrix, default-policy
|
||||
> guardrail, AgentStatus capability ingest).
|
||||
|
||||
Files touched: `proto/guruconnect.proto` (`VideoCodec` enum + `SessionResponse.video_codec`
|
||||
+ `StartStream.video_codec` + `AgentStatus.supports_h264`), `agent/Cargo.toml` (MF/COM windows
|
||||
features), `agent/src/encoder/mod.rs` (trait/factory/select), `agent/src/encoder/raw.rs`
|
||||
(salvaged, unchanged), `agent/src/encoder/h264.rs` [new], `agent/src/encoder/capability.rs` [new],
|
||||
`agent/src/encoder/color.rs` [new], `agent/src/session/mod.rs` (negotiated codec apply +
|
||||
`supports_h264` advertise), `agent/src/viewer/mod.rs` (H.264 route + decode worker),
|
||||
`agent/src/viewer/decoder.rs` [new], `server/src/session/mod.rs` (`select_video_codec` +
|
||||
`DEFAULT_PREFER_H264` + `supports_h264` field/ingest + `StartStream` codec stamp),
|
||||
`server/src/relay/mod.rs` (pass `supports_h264` from `AgentStatus`).
|
||||
|
||||
- HW **H.264** via Windows Media Foundation (transparently NVENC/AMF/QuickSync) emitting the proto's
|
||||
`EncodedFrame` (h264). Native viewer decodes via MF/D3D11.
|
||||
|
||||
Reference in New Issue
Block a user