guru-connect/docs/specs/SPEC-011-mobile-agents.md
azcomputerguru b1862800a1 spec: add SPEC-011 Mobile Agent Support (iOS and Android)
Comprehensive specification for iOS/Android devices as remote control targets:

iOS Agent (View-Only):
- ReplayKit 2 screen capture (user consent required)
- VideoToolbox H.264 encoding
- NO input injection (iOS sandboxing limitation)
- APNs push notifications for session requests
- Foreground-only operation (OS requirement)

Android Agent (View + Control):
- MediaProjection API screen capture (user consent)
- MediaCodec H.264 encoding
- Accessibility Service for input injection (tap/swipe/type)
- FCM push notifications
- Foreground service with persistent notification

Architecture:
- Native Swift/SwiftUI (iOS) and Kotlin/Jetpack Compose (Android) apps
- Same protobuf-over-WSS protocol as desktop agents
- Support-code authentication (persistent mode deferred to Phase 2)
- Minor protobuf additions: MobileCapabilities, TouchEvent
- Server push module: APNs (a2 crate) + FCM HTTP v1

Key constraints:
- Attended-only sessions (user must grant permission)
- Foreground-only (cannot capture in background on either platform)
- iOS view-only (platform sandbox prevents input injection)
- Consent-first model (MediaProjection/ReplayKit user prompts)

Estimated effort: X-Large (16-20 weeks, requires mobile expertise)
Priority: P3

Distinct from GuruRMM SPEC-017 (MDM/inventory) — this is remote
control, not device management.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-05-30 18:24:16 -07:00


SPEC-011: Mobile Agent Support (iOS and Android as Remote Control Targets)

Status: Proposed
Priority: P3
Requested By: Mike Swanson (2026-05-30)
Estimated Effort: X-Large (16-20 weeks, requires mobile development expertise)

Overview

Enable remote viewing and control of iOS/Android mobile devices by building native GuruConnect agent apps for the App Store and Google Play. Unlike desktop agents that run persistently in the background, mobile agents operate within OS sandbox constraints: they require user consent to share the screen, must remain in the foreground during sessions, and (on iOS) cannot inject input at all. The primary use case is supervised support sessions—a user opens the app, shares their screen with a technician, and the technician can see the screen (both platforms) and remotely control it (Android only). This is fundamentally different from desktop remote control; it's an attended, consent-driven support tool constrained by mobile OS security models.

Use Cases:

  • Support technician walks a customer through app setup while viewing their mobile screen in real-time
  • Android device troubleshooting with remote control (tap, swipe, type) via Accessibility Service
  • iOS device screen sharing for demonstration or guided support (view-only, user retains control)

Success Criteria:

  • iOS app (iOS 14+) shares screen via ReplayKit with user consent; viewer sees live screen; no input injection
  • Android app (Android 10+) shares screen via MediaProjection and accepts remote input via Accessibility Service
  • Both apps connect using same protobuf-over-WSS protocol and support-code authentication as desktop agents
  • Push notifications wake the app when a support session is requested

Scope

Included in v1

iOS Agent (View-Only):

  • Native Swift/SwiftUI app targeting iOS 14+ and iPadOS 14+
  • Screen capture via ReplayKit 2 (RPScreenRecorder, RPBroadcastSampleHandler)
  • H.264 encoding via VideoToolbox
  • User consent required: "Start Broadcast" button triggers system permission prompt
  • Foreground-only operation (app must remain visible during session)
  • Support-code authentication (6-digit code entry)
  • Push notification (APNs) to alert user of incoming support request
  • NO input injection (iOS sandboxing prevents third-party input; user controls their own device)
  • Displays "Session Active" banner with duration and "Stop Sharing" button

Android Agent (View + Control):

  • Native Kotlin/Jetpack Compose app targeting Android 10+ (API 29+)
  • Screen capture via MediaProjection API (requires user consent per session)
  • H.264 encoding via MediaCodec
  • Input injection via Accessibility Service (user must enable in Settings → Accessibility)
  • Foreground service with persistent notification during session
  • Support-code authentication (6-digit code entry)
  • Push notification (FCM) to alert user of incoming support request
  • Displays ongoing notification: "GuruConnect session active - Tap to stop"

Shared Cross-Platform:

  • Same protobuf protocol (AgentStatus, FrameData, InputEvent) as desktop agents
  • Support-code-only authentication (persistent agent mode deferred to Phase 2)
  • Relay server unchanged (mobile agents are just another platform)
  • Dashboard shows mobile devices with OS icon (iOS/Android) and "Mobile" badge
  • Existing native/web viewers display mobile screens without modification

Explicitly out of scope

  • Persistent/unattended agent mode — v1 is attended-only (user must open the app and consent)
  • iOS input injection — technically impossible without jailbreak or Apple Private APIs (violates App Store guidelines)
  • Background screen capture — both iOS and Android require the app to be foreground during capture
  • File transfer — defer to Phase 2 (not in desktop agents yet per roadmap)
  • Chat — defer to Phase 2 (desktop agents have it, but deprioritized for mobile v1)
  • Multi-device support in single app — one mobile device = one agent instance
  • Tablet-optimized UI — v1 UI is phone-first; iPad/Android tablet use same layout

Architecture

iOS App Structure

GuruConnectMobile-iOS/
├── App/
│   ├── GuruConnectApp.swift          # SwiftUI app entry
│   ├── ContentView.swift              # Main UI (support code entry, status)
│   ├── SessionView.swift              # Active session UI (duration, stop button)
│   └── Info.plist                     # Capabilities, permissions
├── Broadcast/                         # ReplayKit broadcast extension
│   ├── SampleHandler.swift            # RPBroadcastSampleHandler subclass
│   ├── VideoEncoder.swift             # VideoToolbox H.264 encoding
│   └── Info.plist                     # Extension config
├── Shared/
│   ├── Protocol/                      # Protobuf messages (Swift codegen)
│   ├── Transport/                     # WebSocket client (Starscream)
│   ├── Authentication.swift           # Support code validation
│   └── PushNotifications.swift        # APNs registration + handling
└── GuruConnectMobile.xcodeproj

ReplayKit architecture:

  • Main app presents the system broadcast picker via RPSystemBroadcastPickerView, with preferredExtension set to the GuruConnect broadcast extension
  • User taps "Start Broadcast" → system shows app picker → user selects GuruConnect
  • Extension (SampleHandler) receives CMSampleBuffers in processSampleBuffer(_:with:)
  • Extension encodes H.264 via VideoToolbox, sends frames to shared App Group container
  • Main app reads from shared container, sends frames via WebSocket to relay server
  • App Group required: shared data between app and extension (group.com.azcomputerguru.guruconnect)

Android App Structure

GuruConnectMobile-Android/
├── app/src/main/
│   ├── java/com/azcomputerguru/guruconnect/
│   │   ├── MainActivity.kt            # Jetpack Compose UI
│   │   ├── SessionActivity.kt         # Active session screen
│   │   ├── ScreenCaptureService.kt    # Foreground service, MediaProjection
│   │   ├── InputAccessibilityService.kt  # AccessibilityService for input injection
│   │   ├── VideoEncoder.kt            # MediaCodec H.264 encoding
│   │   ├── WebSocketClient.kt         # OkHttp WebSocket
│   │   ├── ProtobufHandler.kt         # Protobuf serialization
│   │   └── PushMessagingService.kt    # FCM receiver
│   ├── res/
│   │   ├── layout/                    # XML layouts (if not full Compose)
│   │   ├── values/strings.xml
│   │   └── xml/accessibility_service_config.xml
│   └── AndroidManifest.xml            # Permissions, services
├── proto/                             # Protobuf definitions (shared with server)
└── build.gradle

MediaProjection architecture:

  • User grants MediaProjection permission via MediaProjectionManager.createScreenCaptureIntent()
  • ScreenCaptureService (foreground service) creates VirtualDisplay → frames to ImageReader
  • VideoEncoder encodes frames with MediaCodec (H.264)
  • WebSocketClient sends encoded frames to relay server
  • InputAccessibilityService receives InputEvent protobuf messages, dispatches AccessibilityService.dispatchGesture()

Protobuf Changes

Minor additions to support mobile-specific metadata:

// proto/guruconnect.proto

message AgentStatus {
  // Existing fields...
  optional MobileCapabilities mobile_capabilities = 20;
}

message MobileCapabilities {
  bool supports_input_injection = 1;  // false for iOS, true for Android (if Accessibility enabled)
  bool requires_foreground = 2;       // true for both (can't capture in background)
  bool requires_user_consent = 3;     // true for both (MediaProjection/ReplayKit consent)
}

message InputEvent {
  // Existing MouseEvent/KeyboardEvent...
  optional TouchEvent touch_event = 3;  // NEW: mobile touch events
}

message TouchEvent {
  enum Action {
    DOWN = 0;
    MOVE = 1;
    UP = 2;
  }
  Action action = 1;
  float x = 2;  // normalized 0.0-1.0
  float y = 3;
  int32 pointer_id = 4;  // for multi-touch
}
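
The normalized coordinates in TouchEvent can be illustrated with a minimal Rust sketch of the two conversions involved: viewer pixels to the 0.0-1.0 range, and back to device pixels on the agent. The `Size` type and both function names are hypothetical and exist only for this example; clamping out-of-range input is an assumption, not something the spec mandates.

```rust
/// Pixel dimensions of a surface (viewer canvas or device screen).
/// Hypothetical helper type for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Size {
    pub width: u32,
    pub height: u32,
}

/// Convert a viewer-side pixel position into the normalized 0.0-1.0 range
/// carried in TouchEvent.x / TouchEvent.y. Out-of-range input is clamped.
pub fn normalize(px: f32, py: f32, viewer: Size) -> (f32, f32) {
    if viewer.width == 0 || viewer.height == 0 {
        // Avoid dividing by zero for a not-yet-sized canvas.
        return (0.0, 0.0);
    }
    let nx = (px / viewer.width as f32).clamp(0.0, 1.0);
    let ny = (py / viewer.height as f32).clamp(0.0, 1.0);
    (nx, ny)
}

/// Convert normalized coordinates back into device pixels on the agent side.
pub fn denormalize(nx: f32, ny: f32, device: Size) -> (f32, f32) {
    (
        nx.clamp(0.0, 1.0) * device.width as f32,
        ny.clamp(0.0, 1.0) * device.height as f32,
    )
}
```

Because the wire format carries only ratios, the viewer does not need to know the device resolution, and the agent can apply its own current screen size (including after rotation).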

Database Schema

No migration required. Mobile devices populate existing connect_machines table with:

  • os: "iOS" or "Android"
  • os_version: "17.2.1", "14.0", etc.
  • architecture: "arm64", "aarch64"
  • device_type: "iPhone 15 Pro", "Samsung Galaxy S24", etc. (from device model identifier)
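
As a rough illustration of how the dashboard could derive the "Mobile" badge from the existing os column, the following Rust sketch classifies the stored string. The enum and function names are hypothetical, and the exact-match comparison assumes agents report exactly "iOS" or "Android" as listed above.

```rust
/// Mobile platforms recognised by this sketch (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MobileOs {
    Ios,
    Android,
}

/// Classify the `os` column value; returns None for desktop platforms.
/// Assumes agents report exactly "iOS" or "Android".
pub fn mobile_os(os: &str) -> Option<MobileOs> {
    match os {
        "iOS" => Some(MobileOs::Ios),
        "Android" => Some(MobileOs::Android),
        _ => None,
    }
}

/// Badge text for the dashboard row, if any.
pub fn dashboard_badge(os: &str) -> Option<&'static str> {
    mobile_os(os).map(|_| "Mobile")
}
```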

Push Notifications

iOS (APNs):

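  • App requests notification permission on launch via UNUserNotificationCenter.current().requestAuthorization(options:), then calls UIApplication.shared.registerForRemoteNotifications() to obtain the APNs device token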
  • Server stores APNs device token in connect_machines.push_token
  • When viewer requests session, server sends APNs push: {"aps": {"alert": "Support session requested", "sound": "default"}, "session_id": "..."}
  • User taps notification → app opens, auto-fills support code, prompts to start broadcast

Android (FCM):

  • App registers with Firebase on launch, uploads FCM token to server
  • Server sends FCM push when session requested
  • User taps notification → MainActivity opens with support code pre-filled
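
The APNs payload shown above can be assembled server-side roughly as follows. This is a std-only Rust sketch for illustration: `json_escape` and `apns_payload` are hypothetical helpers, and a real server would more likely rely on the a2 crate or a JSON library to build the body.

```rust
/// Escape a string for embedding inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \u00XX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build the push body: an `aps` dictionary plus a custom `session_id` key.
pub fn apns_payload(alert: &str, session_id: &str) -> String {
    format!(
        "{{\"aps\":{{\"alert\":\"{}\",\"sound\":\"default\"}},\"session_id\":\"{}\"}}",
        json_escape(alert),
        json_escape(session_id)
    )
}
```

Escaping matters even for a fixed alert string, since session_id originates elsewhere in the system and should not be able to break the JSON structure.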

Input Injection (Android Only)

AccessibilityService setup:

  1. User enables service in Settings → Accessibility → GuruConnect → toggle ON
  2. App declares service in AndroidManifest.xml:
    <service android:name=".InputAccessibilityService"
             android:permission="android.permission.BIND_ACCESSIBILITY_SERVICE">
        <intent-filter><action android:name="android.accessibilityservice.AccessibilityService" /></intent-filter>
        <meta-data android:name="android.accessibilityservice.config"
                   android:resource="@xml/accessibility_service_config" />
    </service>
    
  3. During session, relay sends InputEvent (touch/swipe) → service dispatches:
    val path = Path().apply { moveTo(x, y) }
    val gesture = GestureDescription.Builder()
        .addStroke(GestureDescription.StrokeDescription(path, 0, duration))
        .build()
    dispatchGesture(gesture, null, null)
    

iOS: No input injection. When an agent reports mobile_capabilities.supports_input_injection = false, the relay server signals the viewer to disable its input controls and show a "View-Only Mode" banner.
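
A minimal Rust sketch of that capability check follows. The struct mirrors the MobileCapabilities message, while `ViewerMode` and `viewer_mode` are hypothetical names. It assumes that an absent mobile_capabilities field means a desktop agent with full control, which the spec does not state explicitly.

```rust
/// Mirror of the MobileCapabilities protobuf message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MobileCapabilities {
    pub supports_input_injection: bool,
    pub requires_foreground: bool,
    pub requires_user_consent: bool,
}

/// What the viewer UI should offer (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq)]
pub enum ViewerMode {
    FullControl,
    ViewOnly,
}

/// Decide the viewer mode from the optional capabilities in AgentStatus.
pub fn viewer_mode(caps: Option<&MobileCapabilities>) -> ViewerMode {
    match caps {
        // Assumption: desktop agents omit the field and allow input.
        None => ViewerMode::FullControl,
        Some(c) if c.supports_input_injection => ViewerMode::FullControl,
        // iOS, or Android without the Accessibility Service enabled.
        Some(_) => ViewerMode::ViewOnly,
    }
}
```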

Implementation Details

Files to Create

iOS (Swift/SwiftUI):

  • GuruConnectMobile-iOS/App/GuruConnectApp.swift — App entry, scene setup
  • GuruConnectMobile-iOS/App/ContentView.swift — Support code entry, connection status
  • GuruConnectMobile-iOS/App/SessionView.swift — Active session UI (timer, stop button)
  • GuruConnectMobile-iOS/Broadcast/SampleHandler.swift — ReplayKit extension, frame capture
  • GuruConnectMobile-iOS/Broadcast/VideoEncoder.swift — VideoToolbox H.264 encoding
  • GuruConnectMobile-iOS/Shared/Transport/WebSocketClient.swift — Starscream WebSocket wrapper
  • GuruConnectMobile-iOS/Shared/Protocol/Protobuf.swift — Swift protobuf codegen
  • GuruConnectMobile-iOS/Shared/PushNotifications.swift — APNs registration + handling

Android (Kotlin/Jetpack Compose):

  • app/src/main/java/.../MainActivity.kt — Compose UI, support code entry
  • app/src/main/java/.../SessionActivity.kt — Active session screen
  • app/src/main/java/.../ScreenCaptureService.kt — MediaProjection foreground service
  • app/src/main/java/.../InputAccessibilityService.kt — Accessibility service for input
  • app/src/main/java/.../VideoEncoder.kt — MediaCodec H.264 encoding
  • app/src/main/java/.../WebSocketClient.kt — OkHttp WebSocket
  • app/src/main/java/.../ProtobufHandler.kt — Protobuf serialization (protobuf-javalite)
  • app/src/main/java/.../PushMessagingService.kt — FCM message receiver

Server (minor additions):

  • server/src/push/ — NEW module for APNs/FCM push sending
    • server/src/push/apns.rs — APNs HTTP/2 client (via a2 crate)
    • server/src/push/fcm.rs — FCM HTTP v1 client
    • server/src/push/mod.rs — Unified send_push_notification(device_token, session_id) API
  • server/src/api/devices.rs — NEW: POST /api/devices/:id/push-token to store APNs/FCM tokens
  • proto/guruconnect.proto — Add MobileCapabilities and TouchEvent messages
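
One possible shape for the unified push API in server/src/push/mod.rs is sketched below in std-only Rust. Everything here is hypothetical: the `PushBackend` trait, the `PushRouter` type, and the extra `platform` argument (the spec lists only device_token and session_id, so the platform would otherwise have to be looked up from the stored device record). `RecordingBackend` is a stand-in for the real APNs and FCM clients.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PushPlatform {
    Apns,
    Fcm,
}

#[derive(Debug, PartialEq, Eq)]
pub enum PushError {
    EmptyToken,
    Transport(String),
}

/// Implemented once per provider (APNs, FCM).
pub trait PushBackend {
    fn send(&mut self, device_token: &str, session_id: &str) -> Result<(), PushError>;
}

/// Routes a request to the backend that matches the device platform.
pub struct PushRouter<A: PushBackend, F: PushBackend> {
    pub apns: A,
    pub fcm: F,
}

impl<A: PushBackend, F: PushBackend> PushRouter<A, F> {
    pub fn send_push_notification(
        &mut self,
        platform: PushPlatform,
        device_token: &str,
        session_id: &str,
    ) -> Result<(), PushError> {
        if device_token.is_empty() {
            // Device never registered a token; caller falls back to manual code entry.
            return Err(PushError::EmptyToken);
        }
        match platform {
            PushPlatform::Apns => self.apns.send(device_token, session_id),
            PushPlatform::Fcm => self.fcm.send(device_token, session_id),
        }
    }
}

/// In-memory backend that records calls instead of using the network.
#[derive(Default)]
pub struct RecordingBackend {
    pub sent: Vec<(String, String)>,
}

impl PushBackend for RecordingBackend {
    fn send(&mut self, device_token: &str, session_id: &str) -> Result<(), PushError> {
        self.sent.push((device_token.to_string(), session_id.to_string()));
        Ok(())
    }
}
```

Keeping the providers behind a trait lets the session-request handler stay platform-agnostic and makes the routing testable without APNs or FCM credentials.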

Shared:

  • proto/guruconnect.proto — Update with mobile messages (protobuf source of truth)

Key Logic

iOS ReplayKit screen capture:

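// GuruConnectMobile-iOS/Broadcast/SampleHandler.swift
import ReplayKit
import VideoToolbox

class SampleHandler: RPBroadcastSampleHandler {
    var encoder: VideoEncoder?
    var wsClient: WebSocketClient?

    override func broadcastStarted(withSetupInfo setupInfo: [String : NSObject]?) {
        // setupInfo is typically nil when the broadcast starts from the system picker,
        // so fall back to the support code the main app saved in the App Group.
        // The "supportCode" defaults key is an assumption for this sketch.
        let shared = UserDefaults(suiteName: "group.com.azcomputerguru.guruconnect")
        let supportCode = (setupInfo?["supportCode"] as? String)
            ?? shared?.string(forKey: "supportCode")

        encoder = VideoEncoder()
        wsClient = WebSocketClient(url: "wss://connect.azcomputerguru.com/ws/agent")
        wsClient?.connect(supportCode: supportCode)
    }

    override func processSampleBuffer(_ sampleBuffer: CMSampleBuffer, with sampleBufferType: RPSampleBufferType) {
        // Ignore audio buffers; only screen video is streamed.
        guard sampleBufferType == .video,
              let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

        if let encoded = encoder?.encode(sampleBuffer) {
            // Report the actual capture size rather than a hard-coded 1920x1080.
            let frameData = FrameData(data: encoded,
                                      width: CVPixelBufferGetWidth(pixelBuffer),
                                      height: CVPixelBufferGetHeight(pixelBuffer))
            wsClient?.send(frameData)
        }
    }
}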

Android MediaProjection screen capture:

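// ScreenCaptureService.kt
class ScreenCaptureService : Service() {
    private var mediaProjection: MediaProjection? = null
    private var virtualDisplay: VirtualDisplay? = null
    private lateinit var imageReader: ImageReader
    private val encoder = VideoEncoder()
    private val wsClient = WebSocketClient()

    // Frames are delivered on a dedicated thread, not the main thread
    private val captureThread = HandlerThread("gc-capture").apply { start() }
    private val backgroundHandler = Handler(captureThread.looper)

    override fun onBind(intent: Intent?): IBinder? = null

    override fun onStartCommand(intent: Intent?, flags: Int, startId: Int): Int {
        // Activity.RESULT_OK is -1, so a missing extra must default to RESULT_CANCELED
        val resultCode = intent?.getIntExtra("resultCode", Activity.RESULT_CANCELED) ?: return START_NOT_STICKY
        val data = intent.getParcelableExtra<Intent>("data") ?: return START_NOT_STICKY

        // The service must be in the foreground before getMediaProjection() on Android 10+.
        // NOTIFICATION_ID and buildSessionNotification() are placeholders defined elsewhere.
        startForeground(NOTIFICATION_ID, buildSessionNotification())

        val manager = getSystemService(MediaProjectionManager::class.java)
        val projection = manager.getMediaProjection(resultCode, data) ?: return START_NOT_STICKY
        mediaProjection = projection
        // Android 14+ requires a callback to be registered before createVirtualDisplay()
        projection.registerCallback(object : MediaProjection.Callback() {
            override fun onStop() = stopSelf()
        }, backgroundHandler)

        // Capture at the device's own resolution instead of a fixed 1920x1080
        val metrics = resources.displayMetrics
        imageReader = ImageReader.newInstance(metrics.widthPixels, metrics.heightPixels, PixelFormat.RGBA_8888, 2)

        virtualDisplay = projection.createVirtualDisplay(
            "GuruConnect",
            metrics.widthPixels, metrics.heightPixels, metrics.densityDpi,
            DisplayManager.VIRTUAL_DISPLAY_FLAG_AUTO_MIRROR,
            imageReader.surface, null, backgroundHandler
        )

        imageReader.setOnImageAvailableListener({ reader ->
            // acquireLatestImage() returns null when no new frame is ready
            reader.acquireLatestImage()?.use { image ->
                wsClient.sendFrame(encoder.encode(image))
            }
        }, backgroundHandler)

        // The consent token cannot be reused, so do not ask the system to restart the service
        return START_NOT_STICKY
    }
}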

Android Accessibility Service input injection:

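// InputAccessibilityService.kt
class InputAccessibilityService : AccessibilityService() {
    // Each dispatchGesture() call is a complete touch (press through lift), so
    // DOWN/MOVE points are collected here and dispatched as one stroke on UP.
    private var strokePath: Path? = null
    private var downTimeMs = 0L

    // Requires android:canPerformGestures="true" in accessibility_service_config.xml
    fun injectTouch(x: Float, y: Float, action: TouchEvent.Action) {
        val displayMetrics = resources.displayMetrics
        val absX = x.coerceIn(0f, 1f) * displayMetrics.widthPixels
        val absY = y.coerceIn(0f, 1f) * displayMetrics.heightPixels

        when (action) {
            TouchEvent.Action.DOWN -> {
                strokePath = Path().apply { moveTo(absX, absY) }
                downTimeMs = SystemClock.uptimeMillis()
            }
            TouchEvent.Action.MOVE -> strokePath?.lineTo(absX, absY)
            TouchEvent.Action.UP -> {
                val path = strokePath ?: return
                strokePath = null
                path.lineTo(absX, absY)
                // Replay the stroke over roughly the time the viewer took to perform it
                val duration = (SystemClock.uptimeMillis() - downTimeMs)
                    .coerceIn(10L, GestureDescription.getMaxGestureDuration())
                val gesture = GestureDescription.Builder()
                    .addStroke(GestureDescription.StrokeDescription(path, 0, duration))
                    .build()

                dispatchGesture(gesture, object : GestureResultCallback() {
                    override fun onCompleted(gestureDescription: GestureDescription) {
                        Log.d("GC", "Touch injected, ending at ($absX, $absY)")
                    }
                    override fun onCancelled(gestureDescription: GestureDescription) {
                        Log.w("GC", "Touch injection cancelled")
                    }
                }, null)
            }
            else -> Unit  // ignore enum values added by newer protocol versions
        }
    }

    override fun onAccessibilityEvent(event: AccessibilityEvent?) {}
    override fun onInterrupt() {}
}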

Push notification handling (iOS):

// PushNotifications.swift
import UserNotifications

extension Notification.Name {
    // Posted when the user opens a session-request push notification
    static let sessionRequested = Notification.Name("sessionRequested")
}

class PushNotificationHandler: NSObject, UNUserNotificationCenterDelegate {
    func userNotificationCenter(_ center: UNUserNotificationCenter,
                                didReceive response: UNNotificationResponse,
                                withCompletionHandler completionHandler: @escaping () -> Void) {
        let userInfo = response.notification.request.content.userInfo
        if let sessionId = userInfo["session_id"] as? String {
            // Hand the session ID to the UI so it can open the matching session screen
            NotificationCenter.default.post(name: .sessionRequested, object: sessionId)
        }
        completionHandler()
    }
}

Security Considerations

iOS Security

  • Screen Recording permission: User must explicitly start ReplayKit broadcast; cannot be triggered remotely
  • App Sandbox: Extension runs in separate sandbox from main app; shared data via App Group only
  • APNs authentication: Server uses APNs auth key (.p8 file) with Team ID + Key ID
  • No input injection: This is an iOS platform restriction rather than a GuruConnect design choice, and the restriction itself protects users from remote input

Android Security

  • MediaProjection consent: User must grant permission via system dialog; consent required per session (cannot be saved)
  • Accessibility Service risk: Granting Accessibility permission is high-privilege; app must clearly explain why (remote support) and warn user
  • Foreground service: Session runs as foreground service with persistent notification (user always aware)
  • FCM authentication: Server uses FCM service account key (JSON) for authenticated sends

Authentication

Support code only in v1:

  • User enters 6-digit support code from dashboard
  • Agent authenticates via POST /api/auth/support-code (same as desktop agents)
  • Viewer token issued, session begins
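
Before calling POST /api/auth/support-code, a client or the server can reject malformed input cheaply. The Rust sketch below shows one way to validate the 6-digit format; `parse_support_code` is a hypothetical helper, and tolerating spaces or hyphens typed by the user is an assumption rather than documented behavior.

```rust
/// Return the canonical 6-digit code, or None if the input is malformed.
/// Whitespace and hyphens are ignored (assumption: users may type "123 456").
pub fn parse_support_code(input: &str) -> Option<String> {
    let cleaned: String = input
        .chars()
        .filter(|c| !c.is_whitespace() && *c != '-')
        .collect();
    if cleaned.len() == 6 && cleaned.chars().all(|c| c.is_ascii_digit()) {
        Some(cleaned)
    } else {
        None
    }
}
```

Format validation only filters typos; whether the code is actually valid and unexpired remains the server's decision.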

Persistent agent mode deferred to Phase 2:

  • Requires secure storage of agent key (iOS Keychain, Android EncryptedSharedPreferences)
  • Requires background keep-alive (iOS: silent push, Android: foreground service)

Privacy

  • Consent-first model: User must actively grant screen sharing permission each session
  • No background capture: OS prevents capturing screen when app is backgrounded (security feature)
  • User can stop anytime: "Stop Sharing" button (iOS) or notification action (Android)

Testing Strategy

Unit Tests

  • iOS: XCTest for protobuf serialization, support code validation
  • Android: JUnit + MockK for input event handling, encoder logic

Integration Tests

  • iOS test rig: iPhone 14 Pro (iOS 17) physical device (the ReplayKit broadcast extension requires a real device; Simulator is limited to UI and unit tests)
  • Android test rig: Pixel 6 (Android 14) physical device (MediaProjection requires real device)
  • Test flow: enter support code → grant permissions → viewer connects → verify frames received

Manual Testing Scenarios

  1. iOS attended session:

    • User opens app, enters support code, taps "Start Broadcast", selects GuruConnect
    • Viewer connects, sees iPhone home screen, user navigates Settings
    • Verify: frames display correctly, input controls disabled (view-only banner shown)
    • User taps "Stop Sharing" → session ends gracefully
  2. Android attended session with input:

    • User opens app, enables Accessibility Service in Settings
    • User enters support code, grants MediaProjection permission
    • Viewer connects, remotely taps app icon, swipes, types text
    • Verify: input events execute on device, foreground notification shows
    • User swipes down notification, taps "Stop" → session ends
  3. Push notification wake:

    • Viewer requests session from dashboard
    • Push notification appears on locked phone
    • User taps notification → app opens with support code pre-filled
    • User grants screen sharing → session starts
  4. Low bandwidth: Throttle connection to 1 Mbps, verify the H.264 encoder bitrate adapts and frames remain usable

App Store / Play Store Review

  • iOS App Store: Requires detailed privacy policy explaining screen recording usage, ReplayKit justification in app review notes
  • Google Play: Requires Accessibility Service usage justification ("remote support for user's own device with their explicit consent")

Effort Estimate & Dependencies

Size: X-Large (16-20 weeks, 1 developer with mobile experience)

Breakdown:

  • iOS app + ReplayKit extension: 5 weeks
  • Android app + MediaProjection service: 4 weeks
  • Android Accessibility Service input injection: 2 weeks
  • Push notification backend (APNs + FCM): 2 weeks
  • Server protobuf additions + mobile capabilities handling: 1 week
  • Viewer UI adjustments (touch event handling, view-only mode): 1 week
  • App Store + Play Store submission, review cycles: 2 weeks
  • Testing, edge cases, OS compatibility: 2 weeks
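  • Buffer: 1-3 weeks (the items above total 19 weeks before buffer, so meeting the 16-20 week estimate depends on overlapping some of them, such as store review cycles and testing)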

Dependencies:

  • Apple Developer Program enrollment ($99/year) — required for APNs + App Store distribution
  • Google Play Developer account ($25 one-time) — required for Play Store distribution
  • Firebase project setup (free tier) — for FCM push notifications
  • SPEC-002 v2 Phase 1 completion — per-agent keys model must be stable (already shipped)
  • Mobile development expertise — Swift/SwiftUI + Kotlin/Jetpack Compose; consider contract hire if not in-house

Unblocks:

  • Mobile support parity with competitors (ScreenConnect, TeamViewer, Splashtop all have mobile agents)
  • "Show me the problem" use case for phone/tablet support
  • BYOD enterprise support (employees request help on personal iOS/Android devices)

Open Questions

  1. iOS view-only limitation: acceptable to market? Competitors (TeamViewer, Splashtop) are also view-only on iOS due to platform constraints. Document this prominently in UI/marketing.

  2. Android Accessibility Service friction — how to onboard? — Most users don't know how to enable Accessibility. Need step-by-step wizard with screenshots. Alternatively: offer view-only Android mode (no Accessibility required) as Phase 1, add input in Phase 2.

  3. Foreground-only sessions — user can't multitask? — Correct. iOS/Android stop screen capture when app backgrounds. This is an OS limitation, not a bug. Frame UI as "supervised support session" not "unattended monitoring."

  4. Push notification reliability? — APNs/FCM are best-effort, not guaranteed. If push fails, user can manually open app and enter support code. Fallback: dashboard shows "waiting for device" with code to give user over phone.

  5. Cross-platform viewer compatibility? Existing native Windows viewer and web viewer already handle arbitrary frame sizes (mobile screens are just smaller). For viewers that are not touch-aware, mouse clicks and drags are sent to the agent as single-touch events.

  6. App Store/Play Store approval risk? — Accessibility Service apps face extra scrutiny on Android. Emphasize "user-initiated remote support" positioning, not "remote monitoring." Provide detailed privacy policy. Low risk if framed correctly.

  7. Multi-touch support? — v1 supports single-touch only (maps to mouse). Multi-touch (pinch-zoom, two-finger gestures) deferred to Phase 2 (requires TouchEvent.pointer_id array).
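
The single-touch mapping from questions 5 and 7 can be sketched in Rust as a small state machine that turns viewer mouse input into TouchEvent values. The `MouseInput` and `MouseToTouch` types are hypothetical, coordinates are assumed to be already normalized to 0.0-1.0, and fixing pointer_id at 0 is an assumption that follows from v1 being single-touch.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TouchAction {
    Down,
    Move,
    Up,
}

/// Mirror of the TouchEvent protobuf message.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct TouchEvent {
    pub action: TouchAction,
    pub x: f32,
    pub y: f32,
    pub pointer_id: i32,
}

/// Viewer-side mouse input with normalized coordinates (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum MouseInput {
    LeftDown { x: f32, y: f32 },
    Moved { x: f32, y: f32 },
    LeftUp { x: f32, y: f32 },
}

/// Tracks whether the left button is held so hover movement is dropped.
#[derive(Default)]
pub struct MouseToTouch {
    pressed: bool,
}

impl MouseToTouch {
    pub fn translate(&mut self, input: MouseInput) -> Option<TouchEvent> {
        let (action, x, y) = match input {
            MouseInput::LeftDown { x, y } => {
                self.pressed = true;
                (TouchAction::Down, x, y)
            }
            MouseInput::Moved { x, y } if self.pressed => (TouchAction::Move, x, y),
            // Hovering has no touch equivalent.
            MouseInput::Moved { .. } => return None,
            MouseInput::LeftUp { x, y } if self.pressed => {
                self.pressed = false;
                (TouchAction::Up, x, y)
            }
            // Release without a matching press: nothing to send.
            MouseInput::LeftUp { .. } => return None,
        };
        // v1 is single-touch, so every event uses pointer 0.
        Some(TouchEvent { action, x, y, pointer_id: 0 })
    }
}
```

A click becomes Down followed by Up at the same point (a tap), and a drag becomes Down, one or more Move events, then Up (a swipe).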


Cross-references:

  • GuruRMM SPEC-017: Mobile Device Support (MDM, inventory, lock/wipe) — complementary, not overlapping
  • SPEC-002: v2 modernization architecture (per-agent keys)
  • SPEC-010: Cross-platform agents (macOS/Linux) — similar platform abstraction approach
  • ADR-001: GuruConnect is standalone (no RMM coupling for this feature)