Comprehensive specification for expanding agent support beyond Windows:

macOS Agent (Priority 1):
- ScreenCaptureKit API (macOS 13+) with AVFoundation fallback
- CGEvent input injection
- VideoToolbox H.264 encoding
- NSStatusItem menu bar icon
- Universal binary (x86_64 + arm64)
- Code signing and notarization

Linux Agent (Priority 2):
- X11 XShm screen capture with Wayland detection
- XTest input injection
- VA-API hardware H.264 encoding with software fallback
- StatusNotifier system tray
- .deb and .rpm packaging

Architecture:
- Platform abstraction layer (traits for capture/input/encoder/tray)
- Refactor existing Windows code behind PlatformCapture/Input/Encoder
- No protobuf protocol changes
- Same authentication (support codes and agent keys)

Estimated effort: X-Large (12-16 weeks)
Priority: P2 (market-critical for multi-platform MSP adoption)

Updated roadmap: promoted from P3 to P2 with full spec link.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
SPEC-010: Cross-Platform Agent Support (macOS and Linux)
Status: Proposed
Priority: P2
Requested By: Mike Swanson (2026-05-30)
Estimated Effort: X-Large
Overview
Expand GuruConnect's agent support beyond Windows to include macOS and Linux platforms, enabling remote control and support across the full spectrum of enterprise and home computing environments. This cross-platform expansion addresses a critical market gap—ScreenConnect, Splashtop, and AnyDesk all support macOS/Linux, and GuruConnect's Windows-only limitation blocks adoption by multi-platform organizations. The primary technical challenge is abstracting platform-specific screen capture, input injection, and video encoding while maintaining the core protobuf-over-WSS transport and session model. Success criteria: feature parity with the Windows agent (capture, input, tray presence, persistent/support-code modes, H.264 encoding) on macOS 12+ and Ubuntu 22.04+ LTS.
Scope
Included in v1
macOS Agent (Priority 1):
- Screen capture via ScreenCaptureKit API (macOS 13+) with AVFoundation fallback (macOS 12)
- Input injection via CGEvent
- H.264 encoding via VideoToolbox
- Menu bar icon (NSStatusItem) with status and exit option
- guruconnect:// protocol handler registration
- .app bundle packaging and code signing
- Screen Recording permission prompt and validation
- Universal binary (x86_64 + arm64)
Linux Agent (Priority 2):
- Screen capture via X11 (XShm) with Wayland pipewire fallback detection
- Input injection via XTest (X11) or uinput (Wayland)
- H.264 encoding via VA-API (hardware) or software fallback (openh264, matching the file list and licensing note later in this spec)
- System tray icon via StatusNotifier/AppIndicator
- guruconnect:// protocol handler (.desktop file)
- .deb packaging (Ubuntu/Debian) and .rpm (Fedora/RHEL)
- x86_64 binary
Shared Cross-Platform:
- Unified agent codebase with platform-specific modules behind traits
- Same protobuf protocol (no wire format changes)
- Same persistent/support-code authentication modes
- Same AgentStatus metadata reporting (OS, architecture, uptime, logged-on user)
- Viewer compatibility: existing native Windows viewer and web viewer work unchanged
Explicitly out of scope
- FreeBSD/BSD support — defer; assess demand post-launch
- Wayland-native screen capture — v1 uses X11 compatibility layer; native Wayland portal support is Phase 2
- Multi-monitor switching on macOS/Linux — already deferred on Windows (P2 roadmap item); cross-platform parity follows
- macOS < 12 — 12.x is the minimum (released Oct 2021, 95%+ adoption)
- Linux distros outside Ubuntu 22.04+ / Debian 11+ / Fedora 38+ / RHEL 9+ — official support limited; community builds acceptable
Architecture
Agent Refactor: Platform Abstraction Layer
Current Windows agent (agent/src/) tightly couples platform APIs. Refactor into:
agent/src/
├── main.rs                # CLI entry + platform dispatch
├── session/               # Session logic (platform-agnostic)
├── transport/             # WebSocket (unchanged)
├── platform/              # NEW: trait definitions
│   ├── mod.rs             # PlatformCapture, PlatformInput, PlatformEncoder, PlatformTray traits
│   ├── windows/           # Windows impl (existing code refactored)
│   │   ├── capture.rs     # DXGI + GDI behind PlatformCapture
│   │   ├── input.rs       # Win32 SendInput behind PlatformInput
│   │   ├── encoder.rs     # Media Foundation H.264
│   │   └── tray.rs        # Win32 shell tray
│   ├── macos/             # NEW: macOS impl
│   │   ├── capture.rs     # ScreenCaptureKit/AVFoundation
│   │   ├── input.rs       # CGEvent
│   │   ├── encoder.rs     # VideoToolbox H.264
│   │   └── tray.rs        # NSStatusItem (Objective-C via objc2 crate)
│   └── linux/             # NEW: Linux impl
│       ├── capture.rs     # X11 XShm / pipewire detection
│       ├── input.rs       # XTest / uinput
│       ├── encoder.rs     # VA-API / openh264
│       └── tray.rs        # StatusNotifier
├── encoder/               # Color conversion, raw fallback (platform-agnostic)
├── viewer/                # Native viewer (winit cross-platform, unchanged)
├── config.rs              # Config (unchanged)
└── install.rs             # Installation (platform-specific #[cfg] blocks)
Key traits:
// agent/src/platform/mod.rs
use anyhow::Result; // assumed alias: the capture code in this spec uses anyhow!

/// Grabs frames from the primary display.
pub trait PlatformCapture: Send {
    fn capture_frame(&mut self) -> Result<CapturedFrame>;
    fn get_dimensions(&self) -> (u32, u32);
}

/// Replays viewer input on the local desktop.
pub trait PlatformInput: Send {
    fn inject_mouse(&mut self, event: MouseEvent) -> Result<()>;
    fn inject_keyboard(&mut self, event: KeyboardEvent) -> Result<()>;
}

/// Encodes raw frames to H.264.
pub trait PlatformEncoder: Send {
    fn encode(&mut self, frame: &[u8], width: u32, height: u32) -> Result<Vec<u8>>;
    fn set_bitrate(&mut self, kbps: u32);
}

/// Tray or menu bar presence.
pub trait PlatformTray: Send {
    fn show(&mut self, status: &str) -> Result<()>;
    fn poll_events(&mut self) -> Vec<TrayEvent>;
}
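To illustrate how platform-agnostic session code could drive these traits, here is a minimal sketch using mock implementations. It redeclares simplified copies of two traits with a String error type so it compiles without external crates; MockCapture, PassthroughEncoder, and pump_one_frame are hypothetical names, not part of the existing agent.

```rust
// Simplified stand-ins for the spec's traits (String errors keep this dependency-free).
pub struct CapturedFrame {
    pub data: Vec<u8>,
    pub width: u32,
    pub height: u32,
}

pub trait PlatformCapture: Send {
    fn capture_frame(&mut self) -> Result<CapturedFrame, String>;
    fn get_dimensions(&self) -> (u32, u32);
}

pub trait PlatformEncoder: Send {
    fn encode(&mut self, frame: &[u8], width: u32, height: u32) -> Result<Vec<u8>, String>;
    fn set_bitrate(&mut self, kbps: u32);
}

/// Hypothetical mock: returns a solid-color 4-bytes-per-pixel frame.
pub struct MockCapture {
    pub width: u32,
    pub height: u32,
}

impl PlatformCapture for MockCapture {
    fn capture_frame(&mut self) -> Result<CapturedFrame, String> {
        let len = (self.width * self.height * 4) as usize;
        Ok(CapturedFrame { data: vec![0x7f; len], width: self.width, height: self.height })
    }
    fn get_dimensions(&self) -> (u32, u32) {
        (self.width, self.height)
    }
}

/// Hypothetical mock: "encodes" by copying bytes and records the requested bitrate.
pub struct PassthroughEncoder {
    pub kbps: u32,
}

impl PlatformEncoder for PassthroughEncoder {
    fn encode(&mut self, frame: &[u8], width: u32, height: u32) -> Result<Vec<u8>, String> {
        if frame.len() != (width * height * 4) as usize {
            return Err(format!("unexpected frame size {}", frame.len()));
        }
        Ok(frame.to_vec())
    }
    fn set_bitrate(&mut self, kbps: u32) {
        self.kbps = kbps;
    }
}

/// Session-side code sees only trait objects, never platform types.
pub fn pump_one_frame(
    capture: &mut dyn PlatformCapture,
    encoder: &mut dyn PlatformEncoder,
) -> Result<Vec<u8>, String> {
    let frame = capture.capture_frame()?;
    encoder.encode(&frame.data, frame.width, frame.height)
}
```

The same mocks could back the unit tests for the session module, since nothing in pump_one_frame depends on a real display or codec.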
Protobuf Changes
None. The existing AgentStatus message already carries OS/arch metadata; viewer consumes frames opaquely. Protocol remains unchanged.
Database Schema
No migration required. The connect_machines table's os and architecture fields (part of SPEC-003 inventory) already accommodate macOS/Linux values.
Agent Build & Packaging
Cargo.toml adjustments:
[target.'cfg(target_os = "macos")'.dependencies]
core-graphics = "0.23"
core-foundation = "0.9"
objc2 = "0.5"
objc2-foundation = "0.2"
objc2-app-kit = "0.2"
security-framework = "2.9"
[target.'cfg(target_os = "linux")'.dependencies]
x11 = { version = "2.21", features = ["xlib", "xtest", "xrandr"] }
libva = "0.4"
macOS packaging:
- .app bundle structure: GuruConnect.app/Contents/MacOS/guruconnect, Info.plist with protocol handler
- Code signing: codesign --sign "Developer ID Application: Arizona Computer Guru LLC"
- Notarization via notarytool (Apple requirement for macOS 10.15+)
- Universal binary: cargo build --release --target x86_64-apple-darwin && cargo build --release --target aarch64-apple-darwin && lipo -create ...
Linux packaging:
- .deb via cargo-deb:
  [package.metadata.deb]
  maintainer = "AZ Computer Guru <support@azcomputerguru.com>"
  depends = "libx11-6, libxext6, libxtst6"
  section = "net"
  assets = [
      ["target/release/guruconnect", "usr/bin/", "755"],
      ["deploy/linux/guruconnect.desktop", "usr/share/applications/", "644"],
  ]
- .rpm via cargo-generate-rpm
Distribution
- Server downloads: /downloads/guruconnect-macos.dmg, /downloads/guruconnect-linux.deb, /downloads/guruconnect-linux.rpm
- Dashboard detection: browser User-Agent → suggest correct download
- Auto-update (out of scope for v1): defer cross-platform updater to Phase 2
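The User-Agent based suggestion could look roughly like the sketch below. The download paths come from this spec; suggest_download is a hypothetical helper, and the substring heuristics (including the RPM guess from distro tokens) are assumptions rather than the dashboard's actual logic.

```rust
// Download paths as listed in this spec.
const MACOS_DMG: &str = "/downloads/guruconnect-macos.dmg";
const LINUX_DEB: &str = "/downloads/guruconnect-linux.deb";
const LINUX_RPM: &str = "/downloads/guruconnect-linux.rpm";

/// Hypothetical helper: map a browser User-Agent to a suggested download.
/// Returns None for platforms this sketch does not cover (Windows, mobile, unknown).
pub fn suggest_download(user_agent: &str) -> Option<&'static str> {
    let ua = user_agent.to_ascii_lowercase();
    // Android and iPhone User-Agents contain "linux" / "mac os x", so rule them out first.
    if ua.contains("android") || ua.contains("iphone") || ua.contains("ipad") {
        return None;
    }
    if ua.contains("macintosh") || ua.contains("mac os x") {
        return Some(MACOS_DMG);
    }
    if ua.contains("linux") {
        // Assumption: guess RPM-based distros from tokens that some browsers include.
        let rpm_hint = ua.contains("fedora") || ua.contains("red hat") || ua.contains("rhel");
        return Some(if rpm_hint { LINUX_RPM } else { LINUX_DEB });
    }
    None
}
```

Because many Linux browsers do not name the distribution, a real page would probably still offer both package formats and only preselect one.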
Implementation Details
Files to Create
macOS:
- agent/src/platform/macos/capture.rs: ScreenCaptureKit stream + AVFoundation fallback
- agent/src/platform/macos/input.rs: CGEventCreateMouseEvent, CGEventCreateKeyboardEvent, CGEventPost
- agent/src/platform/macos/encoder.rs: VTCompressionSession (VideoToolbox)
- agent/src/platform/macos/tray.rs: NSStatusBar + NSMenu via objc2
- agent/Info.plist.template: bundle metadata, protocol handler (CFBundleURLSchemes)
Linux:
- agent/src/platform/linux/capture.rs: X11 Display::open, XShmGetImage; pipewire detection fallback (log warning)
- agent/src/platform/linux/input.rs: XTestFakeMotionEvent, XTestFakeButtonEvent, XTestFakeKeyEvent
- agent/src/platform/linux/encoder.rs: libva VAConfigAttrib + VABufferType; software fallback via openh264
- agent/src/platform/linux/tray.rs: dbus org.kde.StatusNotifierItem protocol
- deploy/linux/guruconnect.desktop: .desktop file with MimeType=x-scheme-handler/guruconnect
Shared:
- agent/src/platform/mod.rs: trait definitions + platform factory fn create_platform() -> Box<dyn Platform>
- Refactor agent/src/capture/{dxgi,gdi}.rs → agent/src/platform/windows/capture.rs
- Refactor agent/src/input/{mouse,keyboard}.rs → agent/src/platform/windows/input.rs
- Refactor agent/src/encoder/h264.rs → agent/src/platform/windows/encoder.rs (Media Foundation)
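One way the platform factory could dispatch at compile time is sketched below. The spec names a Platform trait without defining it, so the single name() method here is a placeholder assumption, as are the WindowsPlatform, MacOsPlatform, and LinuxPlatform types.

```rust
/// Hypothetical umbrella trait; the real one would expose capture/input/encoder/tray.
pub trait Platform: Send {
    fn name(&self) -> &'static str;
}

#[cfg(target_os = "windows")]
pub struct WindowsPlatform;
#[cfg(target_os = "windows")]
impl Platform for WindowsPlatform {
    fn name(&self) -> &'static str { "windows" }
}
#[cfg(target_os = "windows")]
pub fn create_platform() -> Box<dyn Platform> {
    Box::new(WindowsPlatform)
}

#[cfg(target_os = "macos")]
pub struct MacOsPlatform;
#[cfg(target_os = "macos")]
impl Platform for MacOsPlatform {
    fn name(&self) -> &'static str { "macos" }
}
#[cfg(target_os = "macos")]
pub fn create_platform() -> Box<dyn Platform> {
    Box::new(MacOsPlatform)
}

#[cfg(target_os = "linux")]
pub struct LinuxPlatform;
#[cfg(target_os = "linux")]
impl Platform for LinuxPlatform {
    fn name(&self) -> &'static str { "linux" }
}
#[cfg(target_os = "linux")]
pub fn create_platform() -> Box<dyn Platform> {
    Box::new(LinuxPlatform)
}

// Fail the build early on targets the spec does not cover.
#[cfg(not(any(target_os = "windows", target_os = "macos", target_os = "linux")))]
compile_error!("unsupported target OS");
```

Only one create_platform survives cfg evaluation, so main.rs can call it unconditionally and unsupported targets fail at build time instead of at runtime.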
Key Logic
macOS screen capture (ScreenCaptureKit, macOS 13+):
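// agent/src/platform/macos/capture.rs
use std::sync::{Arc, Mutex};

use anyhow::{anyhow, Result};
use core_graphics::display::CGDisplay;
use objc2::rc::Id;
use objc2_foundation::NSArray;
// Note: objc2-screen-capture-kit is not yet in the Cargo.toml dependency list.
use objc2_screen_capture_kit::{SCStreamConfiguration, SCStream, SCContentFilter, SCStreamOutputType};

pub struct MacOSCapture {
    stream: Id<SCStream>,                      // kept alive so frames keep arriving
    latest_frame: Arc<Mutex<Option<Vec<u8>>>>, // written by the stream output delegate
    width: u32,
    height: u32,
}

impl PlatformCapture for MacOSCapture {
    fn capture_frame(&mut self) -> Result<CapturedFrame> {
        // ScreenCaptureKit delivers frames to a delegate callback; return the latest copy.
        let frame = self.latest_frame.lock()
            .map_err(|_| anyhow!("frame mutex poisoned"))?
            .clone()
            .ok_or_else(|| anyhow!("No frame available"))?;
        Ok(CapturedFrame { data: frame, width: self.width, height: self.height })
    }

    fn get_dimensions(&self) -> (u32, u32) {
        (self.width, self.height)
    }
}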
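Choosing between ScreenCaptureKit (macOS 13+) and the AVFoundation fallback (macOS 12) could be reduced to a small version check, sketched here. CaptureBackend and both functions are hypothetical names; how the version string is obtained (for example from sw_vers -productVersion) is left out.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum CaptureBackend {
    ScreenCaptureKit,
    AvFoundation,
}

/// Parse version strings such as "13.4.1" or "12" into (major, minor).
pub fn parse_macos_version(s: &str) -> Option<(u32, u32)> {
    let mut parts = s.trim().split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = match parts.next() {
        Some(m) => m.parse().ok()?,
        None => 0,
    };
    Some((major, minor))
}

/// Pick the capture backend per the spec: 13+ uses ScreenCaptureKit,
/// 12.x falls back to AVFoundation, anything older is rejected.
pub fn select_capture_backend(version: &str) -> Result<CaptureBackend, String> {
    let (major, _minor) = parse_macos_version(version)
        .ok_or_else(|| format!("unparseable macOS version: {version:?}"))?;
    match major {
        m if m >= 13 => Ok(CaptureBackend::ScreenCaptureKit),
        12 => Ok(CaptureBackend::AvFoundation),
        _ => Err(format!("macOS {version} is below the supported minimum (12)")),
    }
}
```

Returning an error for pre-12 systems lets the agent log a clear message at startup rather than failing inside the capture path.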
macOS input injection:
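// agent/src/platform/macos/input.rs
use anyhow::{anyhow, Result};
use core_graphics::event::{CGEvent, CGEventTapLocation, CGEventType, CGMouseButton};
use core_graphics::event_source::{CGEventSource, CGEventSourceStateID};
use core_graphics::geometry::CGPoint;

impl PlatformInput for MacOSInput {
    fn inject_mouse(&mut self, event: MouseEvent) -> Result<()> {
        // core-graphics constructors return Result<_, ()>, so map the unit error before `?`.
        let source = CGEventSource::new(CGEventSourceStateID::HIDSystemState)
            .map_err(|_| anyhow!("failed to create CGEventSource"))?;
        let position = CGPoint::new(event.x as f64, event.y as f64);
        let cg_event = match event.button {
            MouseButton::Left => CGEvent::new_mouse_event(
                source, CGEventType::LeftMouseDown, position, CGMouseButton::Left,
            ),
            // ... other buttons, plus the matching mouse-up events
        }
        .map_err(|_| anyhow!("failed to create mouse event"))?;
        cg_event.post(CGEventTapLocation::HID);
        Ok(())
    }
}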
Linux X11 capture:
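// agent/src/platform/linux/capture.rs
use std::os::raw::c_ulong;
use anyhow::{bail, Result};
use x11::xlib::{Display, Window, XImage};
use x11::xshm::{XShmGetImage, XShmSegmentInfo};

pub struct LinuxCapture {
    display: *mut Display,
    root: Window,
    image: *mut XImage, // created once via XShmCreateImage, backed by shm_info
    shm_info: XShmSegmentInfo,
    width: u32,
    height: u32,
}

impl PlatformCapture for LinuxCapture {
    fn capture_frame(&mut self) -> Result<CapturedFrame> {
        // XShmGetImage fills the attached shared-memory image and returns a status flag.
        let ok = unsafe { XShmGetImage(self.display, self.root, self.image, 0, 0, c_ulong::MAX) };
        if ok == 0 {
            bail!("XShmGetImage failed");
        }
        // Convert XImage BGRA → frame buffer
        let converted = self.convert_bgra(); // hypothetical helper, not defined in this spec
        Ok(CapturedFrame { data: converted, width: self.width, height: self.height })
    }
}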
Permission handling (macOS Screen Recording):
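// agent/src/platform/macos/permissions.rs
// Screen Recording is gated by CoreGraphics; AVCaptureDevice authorization
// covers the camera and does not reflect Screen Recording status.
#[link(name = "CoreGraphics", kind = "framework")]
extern "C" {
    fn CGPreflightScreenCaptureAccess() -> bool; // query only, never prompts
    fn CGRequestScreenCaptureAccess() -> bool;   // may show the system prompt
}

pub fn check_screen_recording_permission() -> bool {
    // macOS 10.15+ requires explicit Screen Recording permission.
    // Prompt on first run; later calls reflect the stored TCC status.
    unsafe { CGPreflightScreenCaptureAccess() || CGRequestScreenCaptureAccess() }
}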
Security Considerations
macOS Security
- Screen Recording permission: User must grant permission in System Settings → Privacy & Security → Screen Recording. Agent detects denial and logs instructions.
- Gatekeeper: Code signing + notarization required to avoid "unidentified developer" block.
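- Hardened runtime: Enable hardened runtime entitlements (com.apple.security.cs.allow-unsigned-executable-memory for JIT, if needed).
- Input injection: CGEvent posting runs as the logged-on user but requires the Accessibility permission (System Settings → Privacy & Security → Accessibility); without it, synthetic events are ignored. Agent should detect denial and log instructions, as for Screen Recording.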
Linux Security
- X11 security: XTest extension enabled by default; uinput requires /dev/uinput write access (often root-only by default; usually granted through a udev rule or a group such as input).
- Wayland sandboxing: v1 runs under XWayland (X11 compatibility); native Wayland requires PipeWire portal (user consent per-session).
- AppArmor/SELinux: Agent may need profile exemptions for screen capture on hardened distros.
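Detecting whether the agent is running on plain X11, under XWayland, or on a Wayland-only or headless session could start from the standard environment variables, as in this sketch. DisplayServer, classify_session, and detect_display_server are hypothetical names; the classification rules are an assumption about how the agent might decide when to log the Wayland warning.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum DisplayServer {
    X11,
    XWayland,
    WaylandOnly,
    Headless,
}

fn is_set(value: Option<&str>) -> bool {
    value.map_or(false, |s| !s.is_empty())
}

/// Classify the session from environment values, passed in so this is testable.
pub fn classify_session(
    xdg_session_type: Option<&str>,
    wayland_display: Option<&str>,
    display: Option<&str>,
) -> DisplayServer {
    let wayland = is_set(wayland_display)
        || xdg_session_type.map_or(false, |t| t.eq_ignore_ascii_case("wayland"));
    match (wayland, is_set(display)) {
        (true, true) => DisplayServer::XWayland, // Wayland session with an X server available
        (true, false) => DisplayServer::WaylandOnly,
        (false, true) => DisplayServer::X11,
        (false, false) => DisplayServer::Headless,
    }
}

/// Read XDG_SESSION_TYPE, WAYLAND_DISPLAY, and DISPLAY from the process environment.
pub fn detect_display_server() -> DisplayServer {
    let get = |key: &str| std::env::var(key).ok();
    let (session, wayland, display) =
        (get("XDG_SESSION_TYPE"), get("WAYLAND_DISPLAY"), get("DISPLAY"));
    classify_session(session.as_deref(), wayland.as_deref(), display.as_deref())
}
```

Under this scheme the X11 and XWayland cases would proceed with XShm and XTest, while WaylandOnly would log the warning described in the capture file list.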
Authentication
No protocol changes. macOS/Linux agents authenticate with the same support-code or persistent agent key (AGENT_API_KEY or per-agent cak_* key from SPEC-004). Relay server validates identically.
Audit Events
Existing events table logs agent connections with os and architecture fields. No schema change needed.
Testing Strategy
Unit Tests
- Mock platform traits for capture/input/encoder modules
- Test frame conversion (BGRA → RGB, stride alignment)
- Test protobuf message serialization (unchanged, but validate on new platforms)
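As an example of the frame-conversion case, the sketch below converts a padded BGRA buffer to tightly packed RGB and is small enough to unit test directly. The function name bgra_to_rgb and its String error type are assumptions for illustration; the agent's real converter may differ.

```rust
/// Convert a BGRA buffer with row padding into tightly packed RGB.
/// `stride` is the number of bytes per source row and may exceed `width * 4`.
pub fn bgra_to_rgb(
    src: &[u8],
    width: usize,
    height: usize,
    stride: usize,
) -> Result<Vec<u8>, String> {
    let row_bytes = width.checked_mul(4).ok_or("width overflow")?;
    if stride < row_bytes {
        return Err(format!("stride {stride} is smaller than row size {row_bytes}"));
    }
    // The final row does not need trailing padding.
    let needed = if height == 0 { 0 } else { stride * (height - 1) + row_bytes };
    if src.len() < needed {
        return Err(format!("buffer too small: {} < {needed}", src.len()));
    }
    let mut out = Vec::with_capacity(width * height * 3);
    for row in 0..height {
        let start = row * stride;
        let line = &src[start..start + row_bytes];
        for px in line.chunks_exact(4) {
            // Source order is B, G, R, A; alpha is dropped.
            out.extend_from_slice(&[px[2], px[1], px[0]]);
        }
    }
    Ok(out)
}
```

A test with a stride larger than width * 4 catches the common mistake of treating the source buffer as tightly packed.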
Integration Tests
- macOS test rig: macOS 13 VM (UTM or Parallels), run agent, verify viewer connects and displays screen
- Linux test rig: Ubuntu 22.04 LXC container (X11 forwarded), verify capture + input
- Cross-platform viewer: existing Windows native viewer connects to macOS/Linux agents
Manual Testing Scenarios
- macOS install: Download .app, drag to Applications, launch, grant Screen Recording permission, enter support code, verify viewer connects.
- Linux install: sudo dpkg -i guruconnect.deb, run guruconnect --support-code ABC123, verify tray icon, test remote control.
- Encoding quality: Compare H.264 output from VideoToolbox (macOS), VA-API (Linux), Media Foundation (Windows); validate frame rate and bitrate parity.
- Input fidelity: Test full keyboard (including modifiers, function keys) and mouse (click, drag, scroll) on all three platforms.
- Protocol handler: Click guruconnect://connect?session=... link, verify viewer launches with session ID pre-filled.
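For the protocol handler scenario, extracting the session ID from a guruconnect://connect?session=... link might look like the sketch below. parse_session_link is a hypothetical helper; it does no percent-decoding and treats connect as the only recognized action, both of which are simplifying assumptions.

```rust
/// Extract the `session` query parameter from a guruconnect:// link.
/// Sketch only: no percent-decoding, and "connect" is the only action handled.
pub fn parse_session_link(url: &str) -> Option<String> {
    let rest = url.strip_prefix("guruconnect://")?;
    let (action, query) = rest.split_once('?')?;
    if action.trim_end_matches('/') != "connect" {
        return None;
    }
    query
        .split('&')
        .filter_map(|pair| pair.split_once('='))
        .find(|(key, value)| *key == "session" && !value.is_empty())
        .map(|(_, value)| value.to_string())
}
```

Returning None for anything unexpected keeps a malformed or hostile link from pre-filling the viewer with arbitrary text.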
CI/CD Additions
- macOS build: Gitea Actions runner on macOS (M1 or x86_64), build universal binary, sign, notarize
- Linux build: Existing ubuntu-latest runner, build .deb and .rpm via cargo-deb/cargo-generate-rpm
- Artifact upload: Upload binaries to server/static/downloads/ on release tag
Effort Estimate & Dependencies
Size: X-Large (12-16 weeks, 1 developer)
Breakdown:
- Platform abstraction refactor (Windows code): 2 weeks
- macOS implementation (capture, input, encoder, tray): 4 weeks
- Linux implementation (X11 capture, input, VA-API encoder, tray): 3 weeks
- Packaging and code signing (macOS notarization, .deb/.rpm): 2 weeks
- Testing, documentation, CI/CD integration: 2 weeks
- Buffer for platform-specific edge cases: 1-3 weeks
Dependencies:
- SPEC-002 v2 Phase 1 completion — per-agent keys and secure session core must be stable (already shipped)
- SPEC-004 deterministic machine identity: macOS/Linux need stable machine_uid derivation (macOS: IOPlatformUUID, Linux: /etc/machine-id)
- Code signing infrastructure: Apple Developer Program account (Developer ID certificate) for macOS signing and notarization; Azure Trusted Signing, already in place for Windows, does not sign macOS binaries
- Test infrastructure: macOS VM or bare-metal test host, Linux LXC container with X11
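On Linux, the input to machine_uid derivation could be read and validated as in the sketch below. Per the machine-id(5) format, the file holds a 32-character lowercase hexadecimal ID followed by a newline. parse_machine_id and read_linux_machine_id are hypothetical helpers; how SPEC-004 turns this value into machine_uid is not shown, and the man page advises against exposing the raw ID, so hashing it with an application-specific key is worth considering.

```rust
/// Validate machine-id contents: 32 lowercase hex characters plus an optional newline.
pub fn parse_machine_id(contents: &str) -> Option<String> {
    let id = contents.trim_end_matches('\n');
    let valid = id.len() == 32
        && id.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'));
    if valid {
        Some(id.to_string())
    } else {
        None
    }
}

/// Read /etc/machine-id; returns None if the file is missing or malformed.
pub fn read_linux_machine_id() -> Option<String> {
    let raw = std::fs::read_to_string("/etc/machine-id").ok()?;
    parse_machine_id(&raw)
}
```

Rejecting malformed content matters on minimal or container images, where the file can be empty until first boot.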
Unblocks:
- Multi-platform MSP adoption (organizations with mixed Windows/macOS/Linux fleets)
- Education/developer market (high macOS/Linux penetration)
- SPEC-003 machine inventory (macOS/Linux populate same fields)
Open Questions
- Wayland native support timeline? — v1 uses X11 compatibility (XWayland); when do we commit to PipeWire portal? Defer to Phase 2 pending Wayland adoption metrics.
- ARM Linux (Raspberry Pi, ARM servers)? — Ubuntu 22.04 arm64 is viable; add as a build target? Defer to community request.
- macOS Ventura (13.0) vs. Monterey (12.0) adoption split? — ScreenCaptureKit (13.0+) is preferred; fallback to AVFoundation for 12.x. Monitor 12.x usage after 6 months; deprecate if <5%.
- Linux H.264 licensing? — VA-API uses hardware codecs (no license issue); software fallback via openh264 (Cisco BSD license, royalty-free). Confirm distro packaging allows bundling.
- System tray on headless Linux servers? Agent can run headless (tray is optional); --no-tray flag for server deployments.
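The headless decision from the last question could be as small as the sketch below: the tray is skipped when --no-tray is passed or when no display is available. tray_enabled is a hypothetical helper, and treating a missing display as an implicit --no-tray is an assumption, not something the spec commits to.

```rust
/// Decide whether to create a tray icon.
/// `args` are the command-line arguments; `has_display` says whether a
/// graphical session was detected.
pub fn tray_enabled<I, S>(args: I, has_display: bool) -> bool
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let no_tray = args.into_iter().any(|arg| arg.as_ref() == "--no-tray");
    !no_tray && has_display
}
```

In the agent this would typically be called with std::env::args() and the result of the display detection step.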
Cross-references:
- ADR-001: GuruConnect is a standalone product (no RMM coupling)
- SPEC-002: v2 modernization architecture (agent key model)
- SPEC-003: Machine inventory (OS/arch fields)
- SPEC-004: Stable machine identity (macOS IOPlatformUUID, Linux /etc/machine-id)