spec: add SPEC-010 Cross-Platform Agent Support (macOS and Linux)

Comprehensive specification for expanding agent support beyond Windows:

macOS Agent (Priority 1):
- ScreenCaptureKit API (macOS 13+) with AVFoundation fallback
- CGEvent input injection
- VideoToolbox H.264 encoding
- NSStatusItem menu bar icon
- Universal binary (x86_64 + arm64)
- Code signing and notarization

Linux Agent (Priority 2):
- X11 XShm screen capture with Wayland detection
- XTest input injection
- VA-API hardware H.264 encoding with software fallback
- StatusNotifier system tray
- .deb and .rpm packaging

Architecture:
- Platform abstraction layer (traits for capture/input/encoder/tray)
- Refactor existing Windows code behind PlatformCapture/Input/Encoder
- No protobuf protocol changes
- Same authentication (support codes and agent keys)

Estimated effort: X-Large (12-16 weeks)
Priority: P2 (market-critical for multi-platform MSP adoption)

Updated roadmap: promoted from P3 to P2 with full spec link.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-05-30 18:15:16 -07:00
parent 786d3e47af
commit 5e2325507f
2 changed files with 359 additions and 1 deletions


@@ -93,5 +93,5 @@ Bringing GC to parity with GuruRMM's release engineering. Full plan: [SPEC-001](
## Future Considerations
- [ ] macOS / Linux remote-control agents — P3
- [ ] **Cross-platform agent support (macOS and Linux)** — P2 — Enable remote control beyond Windows with native agents for macOS 12+ and Ubuntu 22.04+ LTS. Platform abstraction layer for capture/input/encoding, VideoToolbox (macOS) and VA-API (Linux) H.264 encoding, .app/.deb/.rpm packaging. Unblocks multi-platform MSP adoption. ([SPEC-010](specs/SPEC-010-cross-platform-agents.md))
- [ ] Auto-update for the agent — P3


@@ -0,0 +1,358 @@
# SPEC-010: Cross-Platform Agent Support (macOS and Linux)
**Status:** Proposed
**Priority:** P2
**Requested By:** Mike Swanson (2026-05-30)
**Estimated Effort:** X-Large
## Overview
Expand GuruConnect's agent support beyond Windows to include macOS and Linux platforms, enabling remote control and support across the full spectrum of enterprise and home computing environments. This cross-platform expansion addresses a critical market gap—ScreenConnect, Splashtop, and AnyDesk all support macOS/Linux, and GuruConnect's Windows-only limitation blocks adoption by multi-platform organizations. The primary technical challenge is abstracting platform-specific screen capture, input injection, and video encoding while maintaining the core protobuf-over-WSS transport and session model. Success criteria: feature parity with the Windows agent (capture, input, tray presence, persistent/support-code modes, H.264 encoding) on macOS 12+ and Ubuntu 22.04+ LTS.
## Scope
### Included in v1
**macOS Agent (Priority 1):**
- Screen capture via ScreenCaptureKit API (macOS 13+) with AVFoundation fallback (macOS 12)
- Input injection via CGEvent
- H.264 encoding via VideoToolbox
- Menu bar icon (NSStatusItem) with status and exit option
- `guruconnect://` protocol handler registration
- .app bundle packaging and code signing
- Screen Recording permission prompt and validation
- Universal binary (x86_64 + arm64)
**Linux Agent (Priority 2):**
- Screen capture via X11 (XShm) with Wayland pipewire fallback detection
- Input injection via XTest (X11) or uinput (Wayland)
- H.264 encoding via VA-API (hardware) or software fallback (openh264)
- System tray icon via StatusNotifier/AppIndicator
- `guruconnect://` protocol handler (.desktop file)
- .deb packaging (Ubuntu/Debian) and .rpm (Fedora/RHEL)
- x86_64 binary
**Shared Cross-Platform:**
- Unified agent codebase with platform-specific modules behind traits
- Same protobuf protocol (no wire format changes)
- Same persistent/support-code authentication modes
- Same AgentStatus metadata reporting (OS, architecture, uptime, logged-on user)
- Viewer compatibility: existing native Windows viewer and web viewer work unchanged
### Explicitly out of scope
- **FreeBSD/BSD support** — defer; assess demand post-launch
- **Wayland-native screen capture** — v1 uses X11 compatibility layer; native Wayland portal support is Phase 2
- **Multi-monitor switching on macOS/Linux** — already deferred on Windows (P2 roadmap item); cross-platform parity follows
- **macOS < 12** — 12.x is the minimum (released Oct 2021, 95%+ adoption)
- **Linux distros outside Ubuntu 22.04+ / Debian 11+ / Fedora 38+ / RHEL 9+** — official support limited; community builds acceptable
## Architecture
### Agent Refactor: Platform Abstraction Layer
Current Windows agent (`agent/src/`) tightly couples platform APIs. Refactor into:
```
agent/src/
├── main.rs # CLI entry + platform dispatch
├── session/ # Session logic (platform-agnostic)
├── transport/ # WebSocket (unchanged)
├── platform/ # NEW: trait definitions
│ ├── mod.rs # PlatformCapture, PlatformInput, PlatformEncoder, PlatformTray traits
│ ├── windows/ # Windows impl (existing code refactored)
│ │ ├── capture.rs # DXGI + GDI behind PlatformCapture
│ │ ├── input.rs # Win32 SendInput behind PlatformInput
│ │ ├── encoder.rs # Media Foundation H.264
│ │ └── tray.rs # Win32 shell tray
│ ├── macos/ # NEW: macOS impl
│ │ ├── capture.rs # ScreenCaptureKit/AVFoundation
│ │ ├── input.rs # CGEvent
│ │ ├── encoder.rs # VideoToolbox H.264
│ │ └── tray.rs # NSStatusItem (Objective-C via objc2 crate)
│ └── linux/ # NEW: Linux impl
│ ├── capture.rs # X11 XShm / pipewire detection
│ ├── input.rs # XTest / uinput
│ ├── encoder.rs # VA-API / openh264
│ └── tray.rs # StatusNotifier
├── encoder/ # Color conversion, raw fallback (platform-agnostic)
├── viewer/ # Native viewer (winit cross-platform, unchanged)
├── config.rs # Config (unchanged)
└── install.rs # Installation (platform-specific #[cfg] blocks)
```
**Key traits:**
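```rust
// agent/src/platform/mod.rs
use anyhow::Result;

// CapturedFrame, MouseEvent, KeyboardEvent and TrayEvent are shared,
// platform-agnostic types (definitions not shown here).

pub trait PlatformCapture: Send {
    /// Returns the next frame of the captured display.
    fn capture_frame(&mut self) -> Result<CapturedFrame>;
    /// Width and height of the captured display, in pixels.
    fn get_dimensions(&self) -> (u32, u32);
}

pub trait PlatformInput: Send {
    fn inject_mouse(&mut self, event: MouseEvent) -> Result<()>;
    fn inject_keyboard(&mut self, event: KeyboardEvent) -> Result<()>;
}

pub trait PlatformEncoder: Send {
    /// Encodes one raw frame and returns the encoded bitstream.
    fn encode(&mut self, frame: &[u8], width: u32, height: u32) -> Result<Vec<u8>>;
    fn set_bitrate(&mut self, kbps: u32);
}

pub trait PlatformTray: Send {
    fn show(&mut self, status: &str) -> Result<()>;
    /// Drains tray events (for example, the exit menu item) since the last poll.
    fn poll_events(&mut self) -> Vec<TrayEvent>;
}
```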
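How `main.rs` might pick a backend at compile time can be sketched as follows. This is a minimal, std-only illustration, not the agent's actual code: `StubCapture`, `backend_name`, and `create_capture` are hypothetical names, the error type is simplified to `String`, and the stub stands in for the real DXGI, ScreenCaptureKit, and XShm implementations.

```rust
// Hypothetical sketch: only PlatformCapture mirrors the spec; the rest is illustrative.
pub struct CapturedFrame {
    pub data: Vec<u8>,
    pub width: u32,
    pub height: u32,
}

pub trait PlatformCapture: Send {
    fn capture_frame(&mut self) -> Result<CapturedFrame, String>;
    fn get_dimensions(&self) -> (u32, u32);
}

/// Placeholder backend standing in for a real platform implementation.
pub struct StubCapture {
    width: u32,
    height: u32,
}

impl PlatformCapture for StubCapture {
    fn capture_frame(&mut self) -> Result<CapturedFrame, String> {
        // Produce an all-zero 4-bytes-per-pixel frame of the configured size.
        let len = (self.width * self.height * 4) as usize;
        Ok(CapturedFrame { data: vec![0; len], width: self.width, height: self.height })
    }

    fn get_dimensions(&self) -> (u32, u32) {
        (self.width, self.height)
    }
}

/// Name of the backend selected for the compilation target.
pub fn backend_name() -> &'static str {
    if cfg!(target_os = "windows") {
        "windows"
    } else if cfg!(target_os = "macos") {
        "macos"
    } else if cfg!(target_os = "linux") {
        "linux"
    } else {
        "unsupported"
    }
}

/// Session code only ever sees the trait object, never a platform type.
pub fn create_capture() -> Result<Box<dyn PlatformCapture>, String> {
    if backend_name() == "unsupported" {
        return Err("no capture backend for this target".to_string());
    }
    Ok(Box::new(StubCapture { width: 1920, height: 1080 }))
}
```

In the real tree the branches would more likely be `#[cfg(target_os = "...")]` module declarations, so that each platform's dependencies are compiled only on that platform.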
### Protobuf Changes
**None.** The existing `AgentStatus` message already carries OS/arch metadata; viewer consumes frames opaquely. Protocol remains unchanged.
### Database Schema
**No migration required.** The `connect_machines` table's `os` and `architecture` fields (part of SPEC-003 inventory) already accommodate macOS/Linux values.
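As a rough illustration of why no schema or protocol change is needed, the OS and architecture strings can come straight from the Rust standard library on every target. `HostInfo` and `host_info` are hypothetical names; the real `AgentStatus` field names are not shown in this spec.

```rust
/// Hypothetical subset of the status metadata; field names are assumptions.
pub struct HostInfo {
    pub os: &'static str,
    pub architecture: &'static str,
}

/// Values are fixed at compile time for the target the agent was built for,
/// e.g. "windows", "macos" or "linux", and "x86_64" or "aarch64".
pub fn host_info() -> HostInfo {
    HostInfo {
        os: std::env::consts::OS,
        architecture: std::env::consts::ARCH,
    }
}
```

Note that for a universal macOS binary each slice reports its own architecture, so an x86_64 slice running under Rosetta would still report `x86_64`.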
### Agent Build & Packaging
**Cargo.toml adjustments:**
```toml
[target.'cfg(target_os = "macos")'.dependencies]
core-graphics = "0.23"
core-foundation = "0.9"
objc2 = "0.5"
objc2-foundation = "0.2"
objc2-app-kit = "0.2"
security-framework = "2.9"
[target.'cfg(target_os = "linux")'.dependencies]
x11 = { version = "2.21", features = ["xlib", "xtest", "xrandr"] }
libva = "0.4"
```
**macOS packaging:**
- `.app` bundle structure: `GuruConnect.app/Contents/MacOS/guruconnect`, `Info.plist` with protocol handler
- Code signing: `codesign --sign "Developer ID Application: Arizona Computer Guru LLC"`
- Notarization via `notarytool` (Apple requirement for macOS 10.15+)
- Universal binary: `cargo build --release --target x86_64-apple-darwin && cargo build --release --target aarch64-apple-darwin && lipo -create ...`
**Linux packaging:**
- `.deb` via `cargo-deb`:
```toml
[package.metadata.deb]
maintainer = "AZ Computer Guru <support@azcomputerguru.com>"
depends = "libx11-6, libxext6, libxtst6"
section = "net"
assets = [
["target/release/guruconnect", "usr/bin/", "755"],
["deploy/linux/guruconnect.desktop", "usr/share/applications/", "644"],
]
```
- `.rpm` via `cargo-generate-rpm`
### Distribution
- **Server downloads:** `/downloads/guruconnect-macos.dmg`, `/downloads/guruconnect-linux.deb`, `/downloads/guruconnect-linux.rpm`
- **Dashboard detection:** browser User-Agent → suggest correct download
- **Auto-update (out of scope for v1):** defer cross-platform updater to Phase 2
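The dashboard detection step could look roughly like the sketch below. It is an assumption-heavy illustration: `suggest_download` is a hypothetical helper, the substring heuristics are not from the spec, and defaulting unknown Linux distributions to the `.deb` is an arbitrary choice. Windows is left out because this spec does not list its download path.

```rust
/// Maps a browser User-Agent string to one of the download paths listed above.
/// Returns None for mobile or unrecognized platforms.
pub fn suggest_download(user_agent: &str) -> Option<&'static str> {
    let ua = user_agent.to_ascii_lowercase();

    // Mobile UAs contain "Linux" (Android) or "like Mac OS X" (iOS), so rule them out first.
    if ua.contains("android") || ua.contains("iphone") || ua.contains("ipad") {
        return None;
    }
    if ua.contains("macintosh") || ua.contains("mac os x") {
        return Some("/downloads/guruconnect-macos.dmg");
    }
    if ua.contains("linux") {
        // Assumption: RPM-based distros that advertise themselves get the .rpm,
        // everything else falls back to the .deb.
        let rpm_based = ["fedora", "red hat", "rhel", "centos"]
            .iter()
            .any(|distro| ua.contains(*distro));
        return Some(if rpm_based {
            "/downloads/guruconnect-linux.rpm"
        } else {
            "/downloads/guruconnect-linux.deb"
        });
    }
    None
}
```

Many Linux browsers do not name the distribution at all, so the page should still offer both packages and treat the suggestion as a default.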
## Implementation Details
### Files to Create
**macOS:**
- `agent/src/platform/macos/capture.rs` — ScreenCaptureKit stream + AVFoundation fallback
- `agent/src/platform/macos/input.rs` — CGEventCreateMouseEvent, CGEventCreateKeyboardEvent, CGEventPost
- `agent/src/platform/macos/encoder.rs` — VTCompressionSession (VideoToolbox)
- `agent/src/platform/macos/tray.rs` — NSStatusBar + NSMenu via objc2
- `agent/Info.plist.template` — bundle metadata, protocol handler (`CFBundleURLSchemes`)
**Linux:**
- `agent/src/platform/linux/capture.rs` — X11Display::open, XShmGetImage; pipewire detection fallback (log warning)
- `agent/src/platform/linux/input.rs` — XTestFakeMotionEvent, XTestFakeButtonEvent, XTestFakeKeyEvent
- `agent/src/platform/linux/encoder.rs` — libva VAConfigAttrib + VABufferType; software fallback via openh264
- `agent/src/platform/linux/tray.rs` — dbus org.kde.StatusNotifierItem protocol
- `deploy/linux/guruconnect.desktop` — .desktop file with `MimeType=x-scheme-handler/guruconnect`
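The hardware-first, software-fallback behaviour described for `encoder.rs` can be sketched as an ordered list of constructors that are tried in turn. Everything except the `PlatformEncoder` trait shape is hypothetical here: `select_encoder`, `EncoderFactory`, and the stub encoders are illustrative, and the error type is simplified to `String`.

```rust
pub trait PlatformEncoder: Send {
    fn encode(&mut self, frame: &[u8], width: u32, height: u32) -> Result<Vec<u8>, String>;
    fn set_bitrate(&mut self, kbps: u32);
}

pub type EncoderFactory = fn() -> Result<Box<dyn PlatformEncoder>, String>;

/// Tries each candidate in order and returns the first that initializes,
/// along with its name so the choice can be logged.
pub fn select_encoder(
    candidates: &[(&'static str, EncoderFactory)],
) -> Result<(&'static str, Box<dyn PlatformEncoder>), String> {
    let mut errors = Vec::new();
    for (name, factory) in candidates {
        match factory() {
            Ok(encoder) => return Ok((*name, encoder)),
            Err(e) => errors.push(format!("{name}: {e}")),
        }
    }
    Err(format!("no encoder available ({})", errors.join("; ")))
}

/// Stub that returns the input unchanged; stands in for a software encoder.
pub struct PassthroughEncoder {
    kbps: u32,
}

impl PlatformEncoder for PassthroughEncoder {
    fn encode(&mut self, frame: &[u8], _width: u32, _height: u32) -> Result<Vec<u8>, String> {
        Ok(frame.to_vec())
    }

    fn set_bitrate(&mut self, kbps: u32) {
        self.kbps = kbps;
    }
}

/// Simulates a machine without a usable VA-API driver.
pub fn hardware_unavailable() -> Result<Box<dyn PlatformEncoder>, String> {
    Err("VA-API driver not found".to_string())
}

pub fn software_stub() -> Result<Box<dyn PlatformEncoder>, String> {
    Ok(Box::new(PassthroughEncoder { kbps: 2000 }))
}
```

Collecting the per-candidate errors keeps the final message useful when neither path initializes, which is the case support staff will actually need to diagnose.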
**Shared:**
- `agent/src/platform/mod.rs` — trait definitions + platform factory `fn create_platform() -> Box<dyn Platform>`
- Refactor `agent/src/capture/{dxgi,gdi}.rs` → `agent/src/platform/windows/capture.rs`
- Refactor `agent/src/input/{mouse,keyboard}.rs` → `agent/src/platform/windows/input.rs`
- Refactor `agent/src/encoder/h264.rs` → `agent/src/platform/windows/encoder.rs` (Media Foundation)
### Key Logic
**macOS screen capture (ScreenCaptureKit, macOS 13+):**
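```rust
// agent/src/platform/macos/capture.rs
use std::sync::{Arc, Mutex};

use anyhow::{anyhow, Result};
use core_graphics::display::CGDisplay;
use objc2::rc::Id;
use objc2_foundation::NSArray;
use objc2_screen_capture_kit::{SCContentFilter, SCStream, SCStreamConfiguration, SCStreamOutputType};

// Assumes the trait and frame type are exported from platform/mod.rs
use crate::platform::{CapturedFrame, PlatformCapture};

pub struct MacOSCapture {
    stream: Id<SCStream>,
    // Most recent frame, written by the stream's delegate callback
    latest_frame: Arc<Mutex<Option<Vec<u8>>>>,
    width: u32,
    height: u32,
}

impl PlatformCapture for MacOSCapture {
    fn capture_frame(&mut self) -> Result<CapturedFrame> {
        // ScreenCaptureKit delivers frames to delegate callback
        let frame = self.latest_frame.lock()
            .map_err(|_| anyhow!("Frame mutex poisoned"))?
            .clone()
            .ok_or_else(|| anyhow!("No frame available"))?;
        Ok(CapturedFrame { data: frame, width: self.width, height: self.height })
    }

    fn get_dimensions(&self) -> (u32, u32) {
        (self.width, self.height)
    }
}
```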
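The shared `latest_frame` slot is the core of that design: a callback thread overwrites it and the session loop reads it. A std-only sketch of the pattern, with hypothetical names (`FrameSlot`, `publish`, `take_latest`), is shown below. Unlike the snippet above it takes the frame out of the slot instead of cloning it, which is one possible way to avoid copying each frame and resending a stale one; whether the agent wants that behaviour is an open design choice.

```rust
use std::sync::{Arc, Mutex};

/// Single-element mailbox: the producer overwrites, the consumer takes.
#[derive(Clone, Default)]
pub struct FrameSlot {
    inner: Arc<Mutex<Option<Vec<u8>>>>,
}

impl FrameSlot {
    /// Called from the capture callback; drops any frame not yet consumed.
    pub fn publish(&self, frame: Vec<u8>) {
        // Recover the guard even if another thread panicked while holding it.
        let mut slot = self.inner.lock().unwrap_or_else(|e| e.into_inner());
        *slot = Some(frame);
    }

    /// Called from the session loop; returns None if nothing new has arrived.
    pub fn take_latest(&self) -> Option<Vec<u8>> {
        self.inner.lock().unwrap_or_else(|e| e.into_inner()).take()
    }
}
```

Because only the newest frame is kept, a slow network consumer naturally skips frames instead of building up a backlog.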
**macOS input injection:**
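```rust
// agent/src/platform/macos/input.rs
use anyhow::{anyhow, Result};
use core_graphics::event::{CGEvent, CGEventTapLocation, CGEventType, CGMouseButton};
use core_graphics::event_source::{CGEventSource, CGEventSourceStateID};
use core_graphics::geometry::CGPoint;

impl PlatformInput for MacOSInput {
    fn inject_mouse(&mut self, event: MouseEvent) -> Result<()> {
        // core-graphics constructors return Result<_, ()>, so map the unit error by hand
        let source = CGEventSource::new(CGEventSourceStateID::HIDSystemState)
            .map_err(|_| anyhow!("Failed to create CGEventSource"))?;
        let cg_event = match event.button {
            MouseButton::Left => CGEvent::new_mouse_event(
                source,
                CGEventType::LeftMouseDown,
                CGPoint::new(event.x as f64, event.y as f64),
                CGMouseButton::Left,
            )
            .map_err(|_| anyhow!("Failed to create mouse event"))?,
            // ...
        };
        // Post at the HID level so the event is seen system-wide
        cg_event.post(CGEventTapLocation::HID);
        Ok(())
    }
    // inject_keyboard omitted
}
```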
**Linux X11 capture:**
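```rust
// agent/src/platform/linux/capture.rs
use anyhow::{bail, Result};
use x11::xlib::{Display, Window, XDefaultRootWindow, XGetImage, XOpenDisplay, ZPixmap};
use x11::xshm::{XShmAttach, XShmGetImage, XShmSegmentInfo};

pub struct LinuxCapture {
    display: *mut Display,
    root: Window,
    shm_info: XShmSegmentInfo,
}

// Raw Xlib pointers are !Send, but PlatformCapture requires Send.
// Assumption: the capture object is only ever used from one thread at a time.
unsafe impl Send for LinuxCapture {}

impl PlatformCapture for LinuxCapture {
    fn capture_frame(&mut self) -> Result<CapturedFrame> {
        // XShmGetImage fills a pre-allocated XImage and returns nonzero on success
        let ok = unsafe { XShmGetImage(self.display, self.root, ...) };
        if ok == 0 {
            bail!("XShmGetImage failed");
        }
        // Convert XImage BGRA → frame buffer
        Ok(CapturedFrame { data: converted, width, height })
    }
}
```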
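The "Convert XImage BGRA → frame buffer" step has to respect the image stride, since each row may be padded beyond `width * 4` bytes (for an `XImage` this is its `bytes_per_line` field). A minimal sketch of that conversion to packed RGB follows; `bgra_to_rgb` is a hypothetical helper, and the choice of RGB as the output layout is an assumption taken from the unit-test plan rather than from the capture code.

```rust
/// Converts a BGRA buffer with `stride` bytes per row into tightly packed RGB.
/// Returns None if the buffer is too small for the stated geometry.
pub fn bgra_to_rgb(src: &[u8], width: usize, height: usize, stride: usize) -> Option<Vec<u8>> {
    let row_bytes = width.checked_mul(4)?;
    if stride < row_bytes {
        return None;
    }
    // The last row does not need trailing padding to be present.
    if height > 0 {
        let needed = stride.checked_mul(height - 1)?.checked_add(row_bytes)?;
        if src.len() < needed {
            return None;
        }
    }

    let mut out = Vec::with_capacity(width * height * 3);
    for row in 0..height {
        let start = row * stride;
        for px in src[start..start + row_bytes].chunks_exact(4) {
            // Source order is B, G, R, A; alpha is dropped.
            out.extend_from_slice(&[px[2], px[1], px[0]]);
        }
    }
    Some(out)
}
```

For example, a 1×2 image with an 8-byte stride skips four padding bytes between the two pixels.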
**Permission handling (macOS Screen Recording):**
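```rust
// agent/src/platform/macos/permissions.rs
// Screen Recording is gated by CoreGraphics. AVCaptureDevice authorization
// covers the camera and microphone, not screen capture.
#[link(name = "CoreGraphics", kind = "framework")]
extern "C" {
    fn CGPreflightScreenCaptureAccess() -> bool;
    fn CGRequestScreenCaptureAccess() -> bool;
}

pub fn check_screen_recording_permission() -> bool {
    // macOS 10.15+ requires explicit Screen Recording permission
    // Preflight reads the current TCC status without prompting;
    // request triggers the system prompt on first run
    unsafe { CGPreflightScreenCaptureAccess() || CGRequestScreenCaptureAccess() }
}
```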
## Security Considerations
### macOS Security
- **Screen Recording permission:** User must grant permission in System Settings → Privacy & Security → Screen Recording. Agent detects denial and logs instructions.
- **Gatekeeper:** Code signing + notarization required to avoid "unidentified developer" block.
- **Hardened runtime:** Enable hardened runtime entitlements (`com.apple.security.cs.allow-unsigned-executable-memory` for JIT, if needed).
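- **Input injection:** Posting CGEvents requires the Accessibility permission (System Settings → Privacy & Security → Accessibility), even when the agent runs as the logged-on user. Agent should check `AXIsProcessTrusted()` at startup and log instructions on denial.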
### Linux Security
- **X11 security:** XTest extension enabled by default; uinput requires `/dev/uinput` write access (typically granted to `input` group).
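- **Wayland sandboxing:** X11 capture through XWayland sees only X11 client windows, not native Wayland surfaces, so v1 capture is reliable only in Xorg sessions. Full-desktop capture in a Wayland session requires the PipeWire portal (user consent per-session).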
- **AppArmor/SELinux:** Agent may need profile exemptions for screen capture on hardened distros.
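Session detection on Linux can be done from standard environment variables before any X11 call is made. The sketch below is an assumption about how the agent might do it: `DisplaySession` and `detect_session` are hypothetical names, while `XDG_SESSION_TYPE`, `WAYLAND_DISPLAY`, and `DISPLAY` are the usual variables set by login managers and compositors. The lookup is passed in as a closure so the logic can be unit tested without touching the process environment.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum DisplaySession {
    X11,
    Wayland,
    Headless,
}

/// Classifies the session from environment variables supplied by `get_env`.
pub fn detect_session<F>(get_env: F) -> DisplaySession
where
    F: Fn(&str) -> Option<String>,
{
    let non_empty = |key: &str| get_env(key).filter(|v| !v.is_empty());

    // Prefer the explicit session type when the login manager sets it.
    match non_empty("XDG_SESSION_TYPE").as_deref() {
        Some("wayland") => return DisplaySession::Wayland,
        Some("x11") => return DisplaySession::X11,
        _ => {}
    }

    // DISPLAY is also set under XWayland, so WAYLAND_DISPLAY must be checked first.
    if non_empty("WAYLAND_DISPLAY").is_some() {
        DisplaySession::Wayland
    } else if non_empty("DISPLAY").is_some() {
        DisplaySession::X11
    } else {
        DisplaySession::Headless
    }
}
```

In the agent this would be called as `detect_session(|key| std::env::var(key).ok())`, logging a warning on `Wayland` and skipping the tray on `Headless`.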
### Authentication
**No protocol changes.** macOS/Linux agents authenticate with the same support-code or persistent agent key (`AGENT_API_KEY` or per-agent `cak_*` key from SPEC-002). Relay server validates identically.
### Audit Events
Existing `events` table logs agent connections with `os` and `architecture` fields. No schema change needed.
## Testing Strategy
### Unit Tests
- Mock platform traits for capture/input/encoder modules
- Test frame conversion (BGRA → RGB, stride alignment)
- Test protobuf message serialization (unchanged, but validate on new platforms)
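A mock for one of the platform traits might look like the sketch below, which records injected events so a test can assert on them. The event structs, `MockInput`, and `tap_key` are hypothetical: the spec does not define the fields of `MouseEvent` or `KeyboardEvent`, and the error type is simplified to `String` to keep the example free of external crates.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct MouseEvent {
    pub x: i32,
    pub y: i32,
}

#[derive(Debug, Clone, PartialEq)]
pub struct KeyboardEvent {
    pub key_code: u32,
    pub pressed: bool,
}

pub trait PlatformInput: Send {
    fn inject_mouse(&mut self, event: MouseEvent) -> Result<(), String>;
    fn inject_keyboard(&mut self, event: KeyboardEvent) -> Result<(), String>;
}

/// Records every event instead of touching the OS.
#[derive(Default)]
pub struct MockInput {
    pub mouse: Vec<MouseEvent>,
    pub keys: Vec<KeyboardEvent>,
}

impl PlatformInput for MockInput {
    fn inject_mouse(&mut self, event: MouseEvent) -> Result<(), String> {
        self.mouse.push(event);
        Ok(())
    }

    fn inject_keyboard(&mut self, event: KeyboardEvent) -> Result<(), String> {
        self.keys.push(event);
        Ok(())
    }
}

/// Example of platform-agnostic logic under test: a key press followed by a release.
pub fn tap_key(input: &mut dyn PlatformInput, key_code: u32) -> Result<(), String> {
    input.inject_keyboard(KeyboardEvent { key_code, pressed: true })?;
    input.inject_keyboard(KeyboardEvent { key_code, pressed: false })
}
```

The same recording approach would apply to `PlatformCapture` and `PlatformEncoder`, letting the session logic be tested on any CI runner regardless of its OS.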
### Integration Tests
- **macOS test rig:** macOS 13 VM (UTM or Parallels), run agent, verify viewer connects and displays screen
- **Linux test rig:** Ubuntu 22.04 LXC container (X11 forwarded), verify capture + input
- **Cross-platform viewer:** existing Windows native viewer connects to macOS/Linux agents
### Manual Testing Scenarios
1. **macOS install:** Download `.app`, drag to Applications, launch, grant Screen Recording permission, enter support code, verify viewer connects.
2. **Linux install:** `sudo dpkg -i guruconnect.deb`, run `guruconnect --support-code ABC123`, verify tray icon, test remote control.
3. **Encoding quality:** Compare H.264 output from VideoToolbox (macOS), VA-API (Linux), Media Foundation (Windows) — validate frame rate and bitrate parity.
4. **Input fidelity:** Test full keyboard (including modifiers, function keys) and mouse (click, drag, scroll) on all three platforms.
5. **Protocol handler:** Click `guruconnect://connect?session=...` link, verify viewer launches with session ID pre-filled.
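For scenario 5, the handler has to pull the session ID out of an untrusted link. A std-only sketch is shown below. `parse_session_link` is a hypothetical helper, and the allowed character set for the ID is an assumption (the spec does not define the ID format); a strict allow-list is used because the link arrives from outside the application.

```rust
/// Extracts the `session` parameter from a `guruconnect://connect?...` link.
/// Returns None for other schemes, a missing parameter, or unexpected characters.
pub fn parse_session_link(link: &str) -> Option<String> {
    let rest = link.strip_prefix("guruconnect://connect")?;
    // Tolerate an optional trailing slash before the query string.
    let rest = rest.strip_prefix('/').unwrap_or(rest);
    let query = rest.strip_prefix('?')?;
    // Ignore any fragment.
    let query = query.split('#').next().unwrap_or("");

    query.split('&').find_map(|pair| {
        let (key, value) = pair.split_once('=')?;
        // Assumption: IDs are ASCII alphanumerics plus '-' and '_'.
        let valid = key == "session"
            && !value.is_empty()
            && value
                .chars()
                .all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_');
        valid.then(|| value.to_string())
    })
}
```

Rejecting anything outside the allow-list means the value can be passed on without further escaping concerns.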
### CI/CD Additions
- **macOS build:** Gitea Actions runner on macOS (M1 or x86_64), build universal binary, sign, notarize
- **Linux build:** Existing `ubuntu-latest` runner, build `.deb` and `.rpm` via cargo-deb/cargo-generate-rpm
- **Artifact upload:** Upload binaries to `server/static/downloads/` on release tag
## Effort Estimate & Dependencies
**Size:** X-Large (12-16 weeks, 1 developer)
**Breakdown:**
- Platform abstraction refactor (Windows code): 2 weeks
- macOS implementation (capture, input, encoder, tray): 4 weeks
- Linux implementation (X11 capture, input, VA-API encoder, tray): 3 weeks
- Packaging and code signing (macOS notarization, .deb/.rpm): 2 weeks
- Testing, documentation, CI/CD integration: 2 weeks
- Buffer for platform-specific edge cases: 1-3 weeks
**Dependencies:**
- **SPEC-002 v2 Phase 1 completion** — per-agent keys and secure session core must be stable (already shipped)
- **SPEC-004 deterministic machine identity** — macOS/Linux need stable `machine_uid` derivation (macOS: `IOPlatformUUID`, Linux: `/etc/machine-id`)
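- **Code signing infrastructure:** Apple Developer Program account with a Developer ID Application certificate for macOS signing and notarization. Azure Trusted Signing, already in place for Windows, produces Authenticode signatures only and cannot be reused for macOS.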
- **Test infrastructure:** macOS VM or bare-metal test host, Linux LXC container with X11
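On Linux, reading the identity source for `machine_uid` could be sketched as below. The helper names are hypothetical and how SPEC-004 turns the value into a `machine_uid` is not shown in this spec; the validation reflects the documented `/etc/machine-id` format of 32 lowercase hexadecimal characters followed by a newline.

```rust
/// Validates the contents of /etc/machine-id and returns the bare ID.
pub fn parse_machine_id(contents: &str) -> Option<String> {
    let id = contents.trim();
    let valid = id.len() == 32
        && id
            .bytes()
            .all(|b| b.is_ascii_digit() || (b'a'..=b'f').contains(&b));
    valid.then(|| id.to_string())
}

/// Reads the ID from disk; None if the file is missing or malformed.
pub fn read_machine_id() -> Option<String> {
    std::fs::read_to_string("/etc/machine-id")
        .ok()
        .and_then(|contents| parse_machine_id(&contents))
}
```

The systemd documentation recommends not exposing the raw machine ID outside the host and hashing it with an application-specific key instead, which is worth considering before the value is sent to the relay server.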
**Unblocks:**
- Multi-platform MSP adoption (organizations with mixed Windows/macOS/Linux fleets)
- Education/developer market (high macOS/Linux penetration)
- SPEC-003 machine inventory (macOS/Linux populate same fields)
## Open Questions
1. **Wayland native support timeline?** — v1 uses X11 compatibility (XWayland); when do we commit to PipeWire portal? Defer to Phase 2 pending Wayland adoption metrics.
2. **ARM Linux (Raspberry Pi, ARM servers)?** — Ubuntu 22.04 arm64 is viable; add as a build target? Defer to community request.
3. **macOS Ventura (13.0) vs. Monterey (12.0) adoption split?** — ScreenCaptureKit (13.0+) is preferred; fallback to AVFoundation for 12.x. Monitor 12.x usage after 6 months; deprecate if <5%.
4. **Linux H.264 licensing?** — VA-API uses hardware codecs (no license issue); software fallback via openh264 (Cisco BSD license, royalty-free). Confirm distro packaging allows bundling.
5. **System tray on headless Linux servers?** — Agent can run headless (tray is optional); `--no-tray` flag for server deployments.
---
**Cross-references:**
- ADR-001: GuruConnect is a standalone product (no RMM coupling)
- SPEC-002: v2 modernization architecture (agent key model)
- SPEC-003: Machine inventory (OS/arch fields)
- SPEC-004: Stable machine identity (macOS IOPlatformUUID, Linux /etc/machine-id)