spec: add SPEC-010 Cross-Platform Agent Support (macOS and Linux)
Comprehensive specification for expanding agent support beyond Windows:

macOS Agent (Priority 1):
- ScreenCaptureKit API (macOS 13+) with AVFoundation fallback
- CGEvent input injection
- VideoToolbox H.264 encoding
- NSStatusItem menu bar icon
- Universal binary (x86_64 + arm64)
- Code signing and notarization

Linux Agent (Priority 2):
- X11 XShm screen capture with Wayland detection
- XTest input injection
- VA-API hardware H.264 encoding with software fallback
- StatusNotifier system tray
- .deb and .rpm packaging

Architecture:
- Platform abstraction layer (traits for capture/input/encoder/tray)
- Refactor existing Windows code behind PlatformCapture/Input/Encoder
- No protobuf protocol changes
- Same authentication (support codes and agent keys)

Estimated effort: X-Large (12-16 weeks)
Priority: P2 (market-critical for multi-platform MSP adoption)

Updated roadmap: promoted from P3 to P2 with full spec link.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@@ -93,5 +93,5 @@ Bringing GC to parity with GuruRMM's release engineering. Full plan: [SPEC-001](
|
||||
|
||||
## Future Considerations
|
||||
|
||||
- [ ] macOS / Linux remote-control agents — P3
|
||||
- [ ] **Cross-platform agent support (macOS and Linux)** — P2 — Enable remote control beyond Windows with native agents for macOS 12+ and Ubuntu 22.04+ LTS. Platform abstraction layer for capture/input/encoding, VideoToolbox (macOS) and VA-API (Linux) H.264 encoding, .app/.deb/.rpm packaging. Unblocks multi-platform MSP adoption. ([SPEC-010](specs/SPEC-010-cross-platform-agents.md))
|
||||
- [ ] Auto-update for the agent — P3
|
||||
|
||||
358
docs/specs/SPEC-010-cross-platform-agents.md
Normal file
@@ -0,0 +1,358 @@
|
||||
# SPEC-010: Cross-Platform Agent Support (macOS and Linux)
|
||||
|
||||
**Status:** Proposed
|
||||
**Priority:** P2
|
||||
**Requested By:** Mike Swanson (2026-05-30)
|
||||
**Estimated Effort:** X-Large
|
||||
|
||||
## Overview
|
||||
|
||||
Expand GuruConnect's agent support beyond Windows to include macOS and Linux platforms, enabling remote control and support across the full spectrum of enterprise and home computing environments. This cross-platform expansion addresses a critical market gap—ScreenConnect, Splashtop, and AnyDesk all support macOS/Linux, and GuruConnect's Windows-only limitation blocks adoption by multi-platform organizations. The primary technical challenge is abstracting platform-specific screen capture, input injection, and video encoding while maintaining the core protobuf-over-WSS transport and session model. Success criteria: feature parity with the Windows agent (capture, input, tray presence, persistent/support-code modes, H.264 encoding) on macOS 12+ and Ubuntu 22.04+ LTS.
|
||||
|
||||
## Scope
|
||||
|
||||
### Included in v1
|
||||
|
||||
**macOS Agent (Priority 1):**
|
||||
- Screen capture via ScreenCaptureKit API (macOS 13+) with AVFoundation fallback (macOS 12)
|
||||
- Input injection via CGEvent
|
||||
- H.264 encoding via VideoToolbox
|
||||
- Menu bar icon (NSStatusItem) with status and exit option
|
||||
- `guruconnect://` protocol handler registration
|
||||
- .app bundle packaging and code signing
|
||||
- Screen Recording permission prompt and validation
|
||||
- Universal binary (x86_64 + arm64)
|
||||
|
||||
**Linux Agent (Priority 2):**
|
||||
- Screen capture via X11 (XShm) with Wayland pipewire fallback detection
|
||||
- Input injection via XTest (X11) or uinput (Wayland)
|
||||
- H.264 encoding via VA-API (hardware) or software fallback (openh264)
|
||||
- System tray icon via StatusNotifier/AppIndicator
|
||||
- `guruconnect://` protocol handler (.desktop file)
|
||||
- .deb packaging (Ubuntu/Debian) and .rpm (Fedora/RHEL)
|
||||
- x86_64 binary
|
||||
|
||||
**Shared Cross-Platform:**
|
||||
- Unified agent codebase with platform-specific modules behind traits
|
||||
- Same protobuf protocol (no wire format changes)
|
||||
- Same persistent/support-code authentication modes
|
||||
- Same AgentStatus metadata reporting (OS, architecture, uptime, logged-on user)
|
||||
- Viewer compatibility: existing native Windows viewer and web viewer work unchanged
|
||||
|
||||
### Explicitly out of scope
|
||||
|
||||
- **FreeBSD/BSD support** — defer; assess demand post-launch
|
||||
- **Wayland-native screen capture** — v1 uses X11 compatibility layer; native Wayland portal support is Phase 2
|
||||
- **Multi-monitor switching on macOS/Linux** — already deferred on Windows (P2 roadmap item); cross-platform parity follows
|
||||
- **macOS < 12** — 12.x is the minimum (released Oct 2021, 95%+ adoption)
|
||||
- **Linux distros outside Ubuntu 22.04+ / Debian 11+ / Fedora 38+ / RHEL 9+** — official support limited; community builds acceptable
|
||||
|
||||
## Architecture
|
||||
|
||||
### Agent Refactor: Platform Abstraction Layer
|
||||
|
||||
Current Windows agent (`agent/src/`) tightly couples platform APIs. Refactor into:
|
||||
|
||||
```
agent/src/
├── main.rs              # CLI entry + platform dispatch
├── session/             # Session logic (platform-agnostic)
├── transport/           # WebSocket (unchanged)
├── platform/            # NEW: trait definitions
│   ├── mod.rs           # PlatformCapture, PlatformInput, PlatformEncoder, PlatformTray traits
│   ├── windows/         # Windows impl (existing code refactored)
│   │   ├── capture.rs   # DXGI + GDI behind PlatformCapture
│   │   ├── input.rs     # Win32 SendInput behind PlatformInput
│   │   ├── encoder.rs   # Media Foundation H.264
│   │   └── tray.rs      # Win32 shell tray
│   ├── macos/           # NEW: macOS impl
│   │   ├── capture.rs   # ScreenCaptureKit/AVFoundation
│   │   ├── input.rs     # CGEvent
│   │   ├── encoder.rs   # VideoToolbox H.264
│   │   └── tray.rs      # NSStatusItem (Objective-C via objc2 crate)
│   └── linux/           # NEW: Linux impl
│       ├── capture.rs   # X11 XShm / pipewire detection
│       ├── input.rs     # XTest / uinput
│       ├── encoder.rs   # VA-API / openh264
│       └── tray.rs      # StatusNotifier
├── encoder/             # Color conversion, raw fallback (platform-agnostic)
├── viewer/              # Native viewer (winit cross-platform, unchanged)
├── config.rs            # Config (unchanged)
└── install.rs           # Installation (platform-specific #[cfg] blocks)
```
|
||||
|
||||
**Key traits:**
|
||||
|
||||
```rust
// agent/src/platform/mod.rs
use anyhow::Result;

// CapturedFrame, MouseEvent, KeyboardEvent, and TrayEvent are shared agent
// types defined elsewhere in the crate.

/// Grabs the current screen contents.
pub trait PlatformCapture: Send {
    fn capture_frame(&mut self) -> Result<CapturedFrame>;
    fn get_dimensions(&self) -> (u32, u32);
}

/// Injects viewer input into the local session.
pub trait PlatformInput: Send {
    fn inject_mouse(&mut self, event: MouseEvent) -> Result<()>;
    fn inject_keyboard(&mut self, event: KeyboardEvent) -> Result<()>;
}

/// Encodes raw frames to H.264.
pub trait PlatformEncoder: Send {
    fn encode(&mut self, frame: &[u8], width: u32, height: u32) -> Result<Vec<u8>>;
    fn set_bitrate(&mut self, kbps: u32);
}

/// Shows tray or menu bar presence and reports user actions.
pub trait PlatformTray: Send {
    fn show(&mut self, status: &str) -> Result<()>;
    fn poll_events(&mut self) -> Vec<TrayEvent>;
}
```
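
Because the session logic only sees these traits, it can be exercised without any real capture backend. The following is a minimal sketch of that idea, not the agent's actual test code: `MockCapture`, `grab_frames`, the `CapturedFrame` field layout, and the boxed error alias (used here instead of `anyhow` so the example has no dependencies) are all assumptions made for illustration.

```rust
use std::error::Error;

// Stand-in for anyhow::Result so the sketch needs no external crates.
type Result<T> = std::result::Result<T, Box<dyn Error + Send + Sync>>;

/// Assumed frame layout: tightly packed 4-byte pixels.
#[derive(Debug, Clone, PartialEq)]
pub struct CapturedFrame {
    pub data: Vec<u8>,
    pub width: u32,
    pub height: u32,
}

pub trait PlatformCapture: Send {
    fn capture_frame(&mut self) -> Result<CapturedFrame>;
    fn get_dimensions(&self) -> (u32, u32);
}

/// Hypothetical test double: serves solid-colour frames and can be told
/// to start failing after a fixed number of captures.
pub struct MockCapture {
    width: u32,
    height: u32,
    frames_served: u32,
    fail_after: Option<u32>,
}

impl MockCapture {
    pub fn new(width: u32, height: u32, fail_after: Option<u32>) -> Self {
        Self { width, height, frames_served: 0, fail_after }
    }
}

impl PlatformCapture for MockCapture {
    fn capture_frame(&mut self) -> Result<CapturedFrame> {
        if let Some(limit) = self.fail_after {
            if self.frames_served >= limit {
                return Err("mock capture exhausted".into());
            }
        }
        self.frames_served += 1;
        let len = self.width as usize * self.height as usize * 4;
        // Every byte carries the frame counter (truncated to u8) so tests
        // can tell frames apart.
        Ok(CapturedFrame {
            data: vec![self.frames_served as u8; len],
            width: self.width,
            height: self.height,
        })
    }

    fn get_dimensions(&self) -> (u32, u32) {
        (self.width, self.height)
    }
}

/// Example of platform-agnostic code written against the trait object.
/// Stops at the first capture error.
pub fn grab_frames(capture: &mut dyn PlatformCapture, count: usize) -> Result<Vec<CapturedFrame>> {
    (0..count).map(|_| capture.capture_frame()).collect()
}
```

A test can then build `MockCapture::new(2, 2, Some(2))`, pull two frames through `grab_frames`, and check that a third capture returns an error, which covers the error path without a display server.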
|
||||
|
||||
### Protobuf Changes
|
||||
|
||||
**None.** The existing `AgentStatus` message already carries OS/arch metadata; viewer consumes frames opaquely. Protocol remains unchanged.
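
As a rough illustration of where the OS/arch values could come from on the new platforms, the sketch below reads them from the Rust standard library. `PlatformInfo`, `normalize_arch`, and `current_platform_info` are hypothetical names, and whether the existing agent reports `arm64` or `aarch64` is an assumption that should be checked against the Windows agent's current output.

```rust
/// Hypothetical shape of the OS/arch portion of an agent status report.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PlatformInfo {
    pub os: String,
    pub architecture: String,
}

/// Map Rust's architecture names onto the names used in this spec
/// (the spec says "arm64"; Rust reports "aarch64"). Other names pass through.
pub fn normalize_arch(arch: &str) -> &str {
    match arch {
        "aarch64" => "arm64",
        other => other,
    }
}

/// Build the report from compile-time constants in the standard library.
/// std::env::consts::OS is "windows", "macos", or "linux" on the targets
/// covered by this spec.
pub fn current_platform_info() -> PlatformInfo {
    PlatformInfo {
        os: std::env::consts::OS.to_string(),
        architecture: normalize_arch(std::env::consts::ARCH).to_string(),
    }
}
```

Since both values are compile-time constants, each slice of a universal macOS binary reports its own architecture.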
|
||||
|
||||
### Database Schema
|
||||
|
||||
**No migration required.** The `connect_machines` table's `os` and `architecture` fields (part of SPEC-003 inventory) already accommodate macOS/Linux values.
|
||||
|
||||
### Agent Build & Packaging
|
||||
|
||||
**Cargo.toml adjustments:**
|
||||
|
||||
```toml
[target.'cfg(target_os = "macos")'.dependencies]
core-graphics = "0.23"
core-foundation = "0.9"
objc2 = "0.5"
objc2-foundation = "0.2"
objc2-app-kit = "0.2"
security-framework = "2.9"

[target.'cfg(target_os = "linux")'.dependencies]
x11 = { version = "2.21", features = ["xlib", "xtest", "xrandr"] }
libva = "0.4"
```
|
||||
|
||||
**macOS packaging:**
|
||||
|
||||
- `.app` bundle structure: `GuruConnect.app/Contents/MacOS/guruconnect`, `Info.plist` with protocol handler
|
||||
- Code signing: `codesign --sign "Developer ID Application: Arizona Computer Guru LLC"`
|
||||
- Notarization via `notarytool` (Apple requirement for macOS 10.15+)
|
||||
- Universal binary: `cargo build --release --target x86_64-apple-darwin && cargo build --release --target aarch64-apple-darwin && lipo -create ...`
|
||||
|
||||
**Linux packaging:**
|
||||
|
||||
- `.deb` via `cargo-deb`:
|
||||
```toml
[package.metadata.deb]
maintainer = "AZ Computer Guru <support@azcomputerguru.com>"
depends = "libx11-6, libxext6, libxtst6"
section = "net"
assets = [
    ["target/release/guruconnect", "usr/bin/", "755"],
    ["deploy/linux/guruconnect.desktop", "usr/share/applications/", "644"],
]
```
|
||||
- `.rpm` via `cargo-generate-rpm`
|
||||
|
||||
### Distribution
|
||||
|
||||
- **Server downloads:** `/downloads/guruconnect-macos.dmg`, `/downloads/guruconnect-linux.deb`, `/downloads/guruconnect-linux.rpm`
|
||||
- **Dashboard detection:** browser User-Agent → suggest correct download
|
||||
- **Auto-update (out of scope for v1):** defer cross-platform updater to Phase 2
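
The dashboard detection step could look roughly like the sketch below. It is an assumption-laden illustration: `suggest_download` is a hypothetical helper, the substring heuristics are deliberately simple, and Windows is left to whatever download logic already exists. Only the three download paths come from this spec.

```rust
/// Suggest a download path from a browser User-Agent string.
/// Returns None for platforms this spec does not add (including Windows,
/// which is assumed to be handled by existing logic).
pub fn suggest_download(user_agent: &str) -> Option<&'static str> {
    let ua = user_agent.to_ascii_lowercase();

    // Mobile browsers mention "Linux" (Android) or "Mac OS X" (iOS), so
    // rule them out before the desktop checks.
    if ua.contains("android") || ua.contains("iphone") || ua.contains("ipad") {
        return None;
    }

    if ua.contains("macintosh") || ua.contains("mac os x") {
        return Some("/downloads/guruconnect-macos.dmg");
    }

    if ua.contains("linux") {
        // Most browsers do not name the distro; when one of the RPM-based
        // families does show up, prefer the .rpm, otherwise default to .deb.
        let rpm_based = ["fedora", "red hat", "rhel", "centos"]
            .iter()
            .any(|name| ua.contains(name));
        return Some(if rpm_based {
            "/downloads/guruconnect-linux.rpm"
        } else {
            "/downloads/guruconnect-linux.deb"
        });
    }

    None
}
```

Because User-Agent strings are unreliable, the result is best treated as a default selection, with all packages still listed on the download page.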
|
||||
|
||||
## Implementation Details
|
||||
|
||||
### Files to Create
|
||||
|
||||
**macOS:**
|
||||
- `agent/src/platform/macos/capture.rs` — ScreenCaptureKit stream + AVFoundation fallback
|
||||
- `agent/src/platform/macos/input.rs` — CGEventCreateMouseEvent, CGEventCreateKeyboardEvent, CGEventPost
|
||||
- `agent/src/platform/macos/encoder.rs` — VTCompressionSession (VideoToolbox)
|
||||
- `agent/src/platform/macos/tray.rs` — NSStatusBar + NSMenu via objc2
|
||||
- `agent/Info.plist.template` — bundle metadata, protocol handler (`CFBundleURLSchemes`)
|
||||
|
||||
**Linux:**
|
||||
- `agent/src/platform/linux/capture.rs` — X11Display::open, XShmGetImage; pipewire detection fallback (log warning)
|
||||
- `agent/src/platform/linux/input.rs` — XTestFakeMotionEvent, XTestFakeButtonEvent, XTestFakeKeyEvent
|
||||
- `agent/src/platform/linux/encoder.rs` — libva VAConfigAttrib + VABufferType; software fallback via openh264
|
||||
- `agent/src/platform/linux/tray.rs` — dbus org.kde.StatusNotifierItem protocol
|
||||
- `deploy/linux/guruconnect.desktop` — .desktop file with `MimeType=x-scheme-handler/guruconnect`
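
The Wayland detection mentioned for `capture.rs` can be sketched with environment variables alone. The enum and function names below are hypothetical; the variables `XDG_SESSION_TYPE`, `WAYLAND_DISPLAY`, and `DISPLAY` are the standard ones set by desktop sessions. The classification is kept separate from the environment lookup so it can be unit tested.

```rust
/// Hypothetical classification of the session the agent was started in.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DisplayServer {
    /// Plain X11 session: XShm capture and XTest input apply.
    X11,
    /// Wayland session with an X server available (XWayland).
    XWayland,
    /// Wayland session without any X display.
    WaylandOnly,
    /// No display at all (for example a server started over SSH).
    Headless,
}

fn non_empty(value: Option<&str>) -> bool {
    value.map_or(false, |s| !s.is_empty())
}

/// Pure decision logic over the three environment values.
pub fn classify_session(
    session_type: Option<&str>,
    wayland_display: Option<&str>,
    x_display: Option<&str>,
) -> DisplayServer {
    let wayland = non_empty(wayland_display)
        || session_type.map_or(false, |s| s.eq_ignore_ascii_case("wayland"));
    match (wayland, non_empty(x_display)) {
        (true, true) => DisplayServer::XWayland,
        (true, false) => DisplayServer::WaylandOnly,
        (false, true) => DisplayServer::X11,
        (false, false) => DisplayServer::Headless,
    }
}

/// Read the real environment and classify it.
pub fn detect_display_server() -> DisplayServer {
    let var = |key: &str| std::env::var(key).ok();
    classify_session(
        var("XDG_SESSION_TYPE").as_deref(),
        var("WAYLAND_DISPLAY").as_deref(),
        var("DISPLAY").as_deref(),
    )
}
```

The `XWayland` case is the one to log a warning for: an X11 capture of the XWayland root window generally sees only X11 clients, so native Wayland windows may be missing from the captured image. That behaviour is worth confirming on the Ubuntu 22.04 test rig before relying on the compatibility path.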
|
||||
|
||||
**Shared:**
|
||||
- `agent/src/platform/mod.rs` — trait definitions + platform factory `fn create_platform() -> Box<dyn Platform>`
|
||||
- Refactor `agent/src/capture/{dxgi,gdi}.rs` → `agent/src/platform/windows/capture.rs`
|
||||
- Refactor `agent/src/input/{mouse,keyboard}.rs` → `agent/src/platform/windows/input.rs`
|
||||
- Refactor `agent/src/encoder/h264.rs` → `agent/src/platform/windows/encoder.rs` (Media Foundation)
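
One possible shape for the platform factory is compile-time dispatch with `#[cfg]`, sketched below for the capture trait only. This is an illustration under assumptions: the trait is reduced to keep the example self-contained, `StubCapture` and `backend_name` are hypothetical, and the spec's `Box<dyn Platform>` return type is not modelled because no `Platform` trait is defined in this document.

```rust
/// Reduced stand-in for the spec's PlatformCapture trait.
pub trait PlatformCapture: Send {
    fn get_dimensions(&self) -> (u32, u32);
    /// Hypothetical helper used here only to show which backend was picked.
    fn backend_name(&self) -> &'static str;
}

/// Placeholder backend; a real build would return the DXGI,
/// ScreenCaptureKit, or XShm implementation instead.
struct StubCapture {
    name: &'static str,
}

impl PlatformCapture for StubCapture {
    fn get_dimensions(&self) -> (u32, u32) {
        (0, 0)
    }

    fn backend_name(&self) -> &'static str {
        self.name
    }
}

// Exactly one of these functions is compiled for any given target, so the
// binary carries no code for the other platforms.
#[cfg(target_os = "windows")]
pub fn create_capture() -> Box<dyn PlatformCapture> {
    Box::new(StubCapture { name: "windows" })
}

#[cfg(target_os = "macos")]
pub fn create_capture() -> Box<dyn PlatformCapture> {
    Box::new(StubCapture { name: "macos" })
}

#[cfg(target_os = "linux")]
pub fn create_capture() -> Box<dyn PlatformCapture> {
    Box::new(StubCapture { name: "linux" })
}

#[cfg(not(any(target_os = "windows", target_os = "macos", target_os = "linux")))]
pub fn create_capture() -> Box<dyn PlatformCapture> {
    Box::new(StubCapture { name: "unsupported" })
}
```

Callers in `main.rs` would only ever see `Box<dyn PlatformCapture>`, which keeps the session code free of `#[cfg]` blocks. The same pattern would repeat for input, encoder, and tray, or be folded into a single struct holding all four boxes.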
|
||||
|
||||
### Key Logic
|
||||
|
||||
**macOS screen capture (ScreenCaptureKit, macOS 13+):**
|
||||
|
||||
```rust
// agent/src/platform/macos/capture.rs
// Requires the objc2-screen-capture-kit crate, which is not yet in the
// Cargo.toml dependency list in this spec.
use std::sync::{Arc, Mutex};

use anyhow::{anyhow, Result};
use objc2::rc::Id;
use objc2_screen_capture_kit::SCStream;

// CapturedFrame is assumed to be exported alongside the traits.
use crate::platform::{CapturedFrame, PlatformCapture};

pub struct MacOSCapture {
    /// Kept alive so ScreenCaptureKit continues delivering frames.
    stream: Id<SCStream>,
    /// Most recent frame, written by the stream output delegate callback.
    latest_frame: Arc<Mutex<Option<Vec<u8>>>>,
    width: u32,
    height: u32,
}

impl PlatformCapture for MacOSCapture {
    fn capture_frame(&mut self) -> Result<CapturedFrame> {
        // ScreenCaptureKit delivers frames to a delegate callback, so this
        // returns a copy of the latest one instead of capturing on demand.
        let frame = self
            .latest_frame
            .lock()
            .map_err(|_| anyhow!("frame buffer lock poisoned"))?
            .clone()
            .ok_or_else(|| anyhow!("No frame available"))?;
        Ok(CapturedFrame { data: frame, width: self.width, height: self.height })
    }

    fn get_dimensions(&self) -> (u32, u32) {
        (self.width, self.height)
    }
}
```
|
||||
|
||||
**macOS input injection:**
|
||||
|
||||
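```rust
// agent/src/platform/macos/input.rs
use anyhow::{anyhow, Result};
use core_graphics::event::{CGEvent, CGEventTapLocation, CGEventType, CGMouseButton};
use core_graphics::event_source::{CGEventSource, CGEventSourceStateID};
use core_graphics::geometry::CGPoint;

use crate::platform::PlatformInput;

impl PlatformInput for MacOSInput {
    fn inject_mouse(&mut self, event: MouseEvent) -> Result<()> {
        // core-graphics constructors return Result<_, ()>, so map the unit
        // error into something `?` can propagate.
        let source = CGEventSource::new(CGEventSourceStateID::HIDSystemState)
            .map_err(|_| anyhow!("failed to create CGEventSource"))?;
        let position = CGPoint::new(event.x as f64, event.y as f64);

        let cg_event = match event.button {
            MouseButton::Left => CGEvent::new_mouse_event(
                source,
                CGEventType::LeftMouseDown,
                position,
                CGMouseButton::Left,
            )
            .map_err(|_| anyhow!("failed to create mouse event"))?,
            // ...
        };
        // Posting synthetic events requires the Accessibility permission.
        cg_event.post(CGEventTapLocation::HID);
        Ok(())
    }
}
```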
|
||||
|
||||
**Linux X11 capture:**
|
||||
|
||||
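```rust
// agent/src/platform/linux/capture.rs
use std::slice;

use anyhow::{bail, Result};
use x11::xlib::{Display, Window, XAllPlanes, XImage};
use x11::xshm::{XShmGetImage, XShmSegmentInfo};

pub struct LinuxCapture {
    display: *mut Display,
    root: Window,
    /// Shared-memory image created with XShmCreateImage and attached with
    /// XShmAttach during setup.
    image: *mut XImage,
    shm_info: XShmSegmentInfo,
    width: u32,
    height: u32,
}

// Raw Xlib pointers are not Send by default, but PlatformCapture requires
// it. This is only sound if the capture object is used from one thread at a time.
unsafe impl Send for LinuxCapture {}

impl PlatformCapture for LinuxCapture {
    fn capture_frame(&mut self) -> Result<CapturedFrame> {
        // XShmGetImage fills the pre-attached image and returns a status
        // flag; it does not return a new image.
        let ok = unsafe {
            XShmGetImage(self.display, self.root, self.image, 0, 0, XAllPlanes())
        };
        if ok == 0 {
            bail!("XShmGetImage failed");
        }
        // Copy XImage BGRA pixels out of shared memory → frame buffer.
        // Rows are bytes_per_line apart, which may include padding.
        let converted = unsafe {
            let img = &*self.image;
            let len = img.bytes_per_line as usize * img.height as usize;
            slice::from_raw_parts(img.data as *const u8, len).to_vec()
        };
        Ok(CapturedFrame { data: converted, width: self.width, height: self.height })
    }

    fn get_dimensions(&self) -> (u32, u32) {
        (self.width, self.height)
    }
}
```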
|
||||
|
||||
**Permission handling (macOS Screen Recording):**
|
||||
|
||||
```rust
// agent/src/platform/macos/permissions.rs
// Screen Recording is checked via CoreGraphics; AVCaptureDevice authorization
// only covers camera and microphone access.
#[link(name = "CoreGraphics", kind = "framework")]
extern "C" {
    fn CGPreflightScreenCaptureAccess() -> bool;
    fn CGRequestScreenCaptureAccess() -> bool;
}

pub fn check_screen_recording_permission() -> bool {
    // macOS 10.15+ requires explicit Screen Recording permission.
    // The request call triggers the prompt on first run; later checks read TCC status.
    unsafe { CGPreflightScreenCaptureAccess() || CGRequestScreenCaptureAccess() }
}
```
|
||||
|
||||
## Security Considerations
|
||||
|
||||
### macOS Security
|
||||
|
||||
- **Screen Recording permission:** User must grant permission in System Settings → Privacy & Security → Screen Recording. Agent detects denial and logs instructions.
|
||||
- **Gatekeeper:** Code signing + notarization required to avoid "unidentified developer" block.
|
||||
- **Hardened runtime:** Enable hardened runtime entitlements (`com.apple.security.cs.allow-unsigned-executable-memory` for JIT, if needed).
|
||||
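- **Input injection:** Posting CGEvents requires the Accessibility permission (System Settings → Privacy & Security → Accessibility), even though the agent runs as the logged-on user. Agent should detect a missing grant (for example via `AXIsProcessTrusted`) and log instructions.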
|
||||
|
||||
### Linux Security
|
||||
|
||||
- **X11 security:** XTest extension enabled by default; uinput requires `/dev/uinput` write access (typically granted to `input` group).
|
||||
- **Wayland sandboxing:** v1 runs under XWayland (X11 compatibility); native Wayland requires PipeWire portal (user consent per-session).
|
||||
- **AppArmor/SELinux:** Agent may need profile exemptions for screen capture on hardened distros.
|
||||
|
||||
### Authentication
|
||||
|
||||
**No protocol changes.** macOS/Linux agents authenticate with the same support-code or persistent agent key (`AGENT_API_KEY` or per-agent `cak_*` key from SPEC-002). Relay server validates identically.
|
||||
|
||||
### Audit Events
|
||||
|
||||
Existing `events` table logs agent connections with `os` and `architecture` fields. No schema change needed.
|
||||
|
||||
## Testing Strategy
|
||||
|
||||
### Unit Tests
|
||||
|
||||
- Mock platform traits for capture/input/encoder modules
|
||||
- Test frame conversion (BGRA → RGB, stride alignment)
|
||||
- Test protobuf message serialization (unchanged, but validate on new platforms)
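
The frame conversion test case can be made concrete with a small pure function. The sketch below converts a BGRA buffer with padded rows into tightly packed RGB; `bgra_to_rgb` is a hypothetical helper, not the agent's existing conversion code, and it assumes 4 bytes per source pixel.

```rust
/// Convert a BGRA frame whose rows are `stride` bytes apart into tightly
/// packed RGB. Returns None if the stride or buffer length is inconsistent
/// with the stated dimensions.
pub fn bgra_to_rgb(src: &[u8], width: usize, height: usize, stride: usize) -> Option<Vec<u8>> {
    let row_bytes = width.checked_mul(4)?;
    if stride < row_bytes {
        return None;
    }
    if width == 0 || height == 0 {
        return Some(Vec::new());
    }
    // The final row only needs its pixel bytes, not a full stride.
    let needed = stride.checked_mul(height - 1)?.checked_add(row_bytes)?;
    if src.len() < needed {
        return None;
    }

    let mut out = Vec::with_capacity(width * height * 3);
    for y in 0..height {
        let start = y * stride;
        let row = &src[start..start + row_bytes];
        for px in row.chunks_exact(4) {
            // Source order is B, G, R, A; emit R, G, B and drop alpha.
            out.extend_from_slice(&[px[2], px[1], px[0]]);
        }
    }
    Some(out)
}
```

Useful test inputs are a row with trailing padding (stride larger than `width * 4`), a buffer whose last row has no padding, and a stride smaller than one row of pixels, which should be rejected.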
|
||||
|
||||
### Integration Tests
|
||||
|
||||
- **macOS test rig:** macOS 13 VM (UTM or Parallels), run agent, verify viewer connects and displays screen
|
||||
- **Linux test rig:** Ubuntu 22.04 LXC container (X11 forwarded), verify capture + input
|
||||
- **Cross-platform viewer:** existing Windows native viewer connects to macOS/Linux agents
|
||||
|
||||
### Manual Testing Scenarios
|
||||
|
||||
1. **macOS install:** Download `.app`, drag to Applications, launch, grant Screen Recording permission, enter support code, verify viewer connects.
|
||||
2. **Linux install:** `sudo dpkg -i guruconnect.deb`, run `guruconnect --support-code ABC123`, verify tray icon, test remote control.
|
||||
3. **Encoding quality:** Compare H.264 output from VideoToolbox (macOS), VA-API (Linux), Media Foundation (Windows) — validate frame rate and bitrate parity.
|
||||
4. **Input fidelity:** Test full keyboard (including modifiers, function keys) and mouse (click, drag, scroll) on all three platforms.
|
||||
5. **Protocol handler:** Click `guruconnect://connect?session=...` link, verify viewer launches with session ID pre-filled.
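
For scenario 5, the link handling could be sketched as follows. This is a hedged illustration: `parse_session_link` is a hypothetical helper, the allowed session ID characters (ASCII letters, digits, `-`, `_`) are an assumption because this spec does not define the ID format, and no percent-decoding is attempted.

```rust
/// Extract the session ID from a `guruconnect://connect?session=...` link.
/// Returns None for other schemes, other actions, or a missing/invalid ID.
pub fn parse_session_link(link: &str) -> Option<String> {
    let rest = link.strip_prefix("guruconnect://")?;
    let (action, query) = rest.split_once('?')?;
    // Some launchers append a trailing slash to the host part.
    if action.trim_end_matches('/') != "connect" {
        return None;
    }

    query.split('&').find_map(|pair| {
        let (key, value) = pair.split_once('=')?;
        // Assumed ID alphabet; rejecting everything else keeps untrusted
        // link content out of later command lines or log messages.
        let valid = !value.is_empty()
            && value
                .bytes()
                .all(|b| b.is_ascii_alphanumeric() || b == b'-' || b == b'_');
        if key == "session" && valid {
            Some(value.to_string())
        } else {
            None
        }
    })
}
```

Since any web page can trigger a registered URL scheme, the handler should treat the link as untrusted input and validate it before pre-filling the viewer.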
|
||||
|
||||
### CI/CD Additions
|
||||
|
||||
- **macOS build:** Gitea Actions runner on macOS (M1 or x86_64), build universal binary, sign, notarize
|
||||
- **Linux build:** Existing `ubuntu-latest` runner, build `.deb` and `.rpm` via cargo-deb/cargo-generate-rpm
|
||||
- **Artifact upload:** Upload binaries to `server/static/downloads/` on release tag
|
||||
|
||||
## Effort Estimate & Dependencies
|
||||
|
||||
**Size:** X-Large (12-16 weeks, 1 developer; the itemized breakdown below totals 14-16 weeks including the buffer)
|
||||
|
||||
**Breakdown:**
|
||||
- Platform abstraction refactor (Windows code): 2 weeks
|
||||
- macOS implementation (capture, input, encoder, tray): 4 weeks
|
||||
- Linux implementation (X11 capture, input, VA-API encoder, tray): 3 weeks
|
||||
- Packaging and code signing (macOS notarization, .deb/.rpm): 2 weeks
|
||||
- Testing, documentation, CI/CD integration: 2 weeks
|
||||
- Buffer for platform-specific edge cases: 1-3 weeks
|
||||
|
||||
**Dependencies:**
|
||||
- **SPEC-002 v2 Phase 1 completion** — per-agent keys and secure session core must be stable (already shipped)
|
||||
- **SPEC-004 deterministic machine identity** — macOS/Linux need stable `machine_uid` derivation (macOS: `IOPlatformUUID`, Linux: `/etc/machine-id`)
|
||||
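- **Code signing infrastructure:** Apple Developer Program account with a Developer ID Application certificate for macOS signing and notarization; the Azure Trusted Signing setup already in place covers Windows binaries only and cannot be reused for macOS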
|
||||
- **Test infrastructure:** macOS VM or bare-metal test host, Linux LXC container with X11
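
For the Linux side of the SPEC-004 dependency, reading `/etc/machine-id` could look like the sketch below. The function names are hypothetical, and how the value feeds into `machine_uid` is defined by SPEC-004, not here. The validation reflects the documented file format of 32 hexadecimal characters followed by a newline.

```rust
use std::fs;
use std::io;

/// Validate the contents of a machine-id file and return the ID in
/// lowercase, or None if it is not exactly 32 hex characters.
pub fn parse_machine_id(contents: &str) -> Option<String> {
    let id = contents.trim();
    if id.len() == 32 && id.bytes().all(|b| b.is_ascii_hexdigit()) {
        Some(id.to_ascii_lowercase())
    } else {
        None
    }
}

/// Read and validate /etc/machine-id.
pub fn read_machine_id() -> io::Result<String> {
    let raw = fs::read_to_string("/etc/machine-id")?;
    parse_machine_id(&raw).ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidData, "malformed /etc/machine-id")
    })
}
```

The systemd documentation asks applications not to expose the raw ID and to hash it with an application-specific key first, so the derivation step in SPEC-004 is the natural place to apply that.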
|
||||
|
||||
**Unblocks:**
|
||||
- Multi-platform MSP adoption (organizations with mixed Windows/macOS/Linux fleets)
|
||||
- Education/developer market (high macOS/Linux penetration)
|
||||
- SPEC-003 machine inventory (macOS/Linux populate same fields)
|
||||
|
||||
## Open Questions
|
||||
|
||||
1. **Wayland native support timeline?** — v1 uses X11 compatibility (XWayland); when do we commit to PipeWire portal? Defer to Phase 2 pending Wayland adoption metrics.
|
||||
2. **ARM Linux (Raspberry Pi, ARM servers)?** — Ubuntu 22.04 arm64 is viable; add as a build target? Defer to community request.
|
||||
3. **macOS Ventura (13.0) vs. Monterey (12.0) adoption split?** — ScreenCaptureKit (13.0+) is preferred; fallback to AVFoundation for 12.x. Monitor 12.x usage after 6 months; deprecate if <5%.
|
||||
4. **Linux H.264 licensing?** — VA-API uses hardware codecs (no license issue); software fallback via openh264 (Cisco BSD license, royalty-free). Confirm distro packaging allows bundling.
|
||||
5. **System tray on headless Linux servers?** — Agent can run headless (tray is optional); `--no-tray` flag for server deployments.
|
||||
|
||||
---
|
||||
|
||||
**Cross-references:**
|
||||
- ADR-001: GuruConnect is a standalone product (no RMM coupling)
|
||||
- SPEC-002: v2 modernization architecture (agent key model)
|
||||
- SPEC-003: Machine inventory (OS/arch fields)
|
||||
- SPEC-004: Stable machine identity (macOS IOPlatformUUID, Linux /etc/machine-id)
|
||||