azcomputerguru/guru-connect

Mike Swanson c736a710a1

docs: record Tasks 3-5 code review (APPROVE-WITH-FIXES) in plan status

Formal review on GURU-5070: cargo fmt/clippy/test green (89 tests, 0 warnings);
the 3 audit CRITICALs verified closed with no bypass; all security paths fail
closed. Non-blocking follow-ups tracked (viewer-token logout revocation, delete
dead validate_agent_key placeholder, X-Real-IP/log hygiene). Remaining for
Phase-1 exit: Task 8 e2e verification + /gc-audit security re-audit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-05-30 18:14:02 -07:00

v2 Secure Session Core — Implementation Plan

STATUS 2026-05-30: Tasks 1–7 IMPLEMENTED + DEPLOYED. Tasks 3–5 now CODE-REVIEWED — verdict APPROVE-WITH-FIXES (no CRITICAL/HIGH). Compile-verified on GURU-5070: cargo fmt --check clean, clippy -D warnings 0 warnings, cargo test --workspace 89 pass. The 3 audit CRITICALs verified closed with no bypass; all security paths fail closed. Non-blocking follow-ups tracked: viewer-token logout revocation (MEDIUM, TTL-bounded), delete the dead validate_agent_key "accept-any" placeholder (MEDIUM), X-Real-IP/consent-comment/support-code-log hygiene (LOW). Remaining for Phase-1 exit: Task 8 (e2e verification + /gc-audit --pass=security re-audit).

Spec created: 2026-05-29

Status: in progress. Tasks 1-4 IMPLEMENTED 2026-05-29 (Task 4 self-reviewed, pending Code Review; Tasks 1-3 code-reviewed APPROVED). Task 4 completes the KEYSTONE (secure auth/session core).

Viewer-token authz STRENGTH split IMPLEMENTED 2026-05-29 (self-reviewed; no Rust toolchain on this machine, so not yet cargo check-verified; pending Code Review). This was the REQUIRED Phase-1-exit follow-up: the gate previously used view (held by EVERY default role incl. viewer) but a viewer token granted input CONTROL. DECIDED (Mike, 2026-05-29) + IMPLEMENTED: SPLIT VIEW_ONLY/CONTROL tokens: view-perm users get a watch-only token (relay refuses their input), admin/control users get a control token. See the "Task 3 authz-strength fix" block under Task 3 below. Resolves coord todo c8916c89 (coordinator marks done after review). Remaining follow-up: nothing revokes a minted viewer token on logout (bounded by 5-min TTL); tracked as a follow-up todo. Task 4 (rate limiting + single-use codes) next.

CARRY-FORWARD: Task 3 MUST add a viewer-token AUTHORIZATION check (admin/permission gate). Task 2 fixed only the token mechanism; the authz gate is what actually closes audit CRITICAL #1. Policy DECIDED (Mike, 2026-05-29): admin-or-view-permission (is_admin() || has_permission(...)).

Parent: docs/specs/SPEC-002-v2-modernization-architecture.md (Phase 1)

Keystone: Tasks 1–4 are the "get-right-first" secure auth/session core; every audit CRITICAL/HIGH is closed there. Tasks 5–7 deliver the product capability on top. Do them in order.

Task 0: Commit this spec

Commit the specs/v2-secure-session-core/ directory before writing any code:

git add specs/v2-secure-session-core/
git commit -m "spec: add v2-secure-session-core shape spec"

Do not start Task 1 until this commit exists.

Task 1 (KEYSTONE) [DONE 2026-05-29]: v2 schema — per-agent keys + tenancy-ready tables

[DONE] migration 004_v2_secure_session_core.sql + db/agent_keys.rs + db/tenancy.rs + struct/query updates across machines/sessions/support_codes/events/users. Code-reviewed APPROVED. Note: GC's db layer already uses runtime sqlx::query() (no macros) — the v2 "switch to runtime" was already true.

Files touched: server/migrations/ (new v2 migration files), server/src/db/ (rebuilt modules: agent_keys.rs [new], sessions.rs, machines.rs, support_codes.rs, events.rs, users.rs, mod.rs).

New table connect_agent_keys: id UUID, machine_id UUID FK, key_hash TEXT, tenant_id UUID NULL, created_at, last_used_at, revoked_at TIMESTAMPTZ NULL. Keys are cak_-prefixed, stored hashed (SHA-256, mirroring v1's hash helper); plaintext returned once at issuance.
Add nullable tenant_id UUID to machines, sessions, support_codes, events, users, connect_agent_keys, defaulting to a single bootstrap tenant row. Add a tenants table with one seed row.
sessions gains is_managed BOOLEAN, source TEXT ('standalone'|'gururmm'), consent_state TEXT ('not_required'|'pending'|'granted'|'denied'), tenant_id.
support_codes: add single-use semantics — consumed_at TIMESTAMPTZ NULL; widen the code column to hold a higher-entropy human-readable code (see Task 4).
Migrations are idempotent (CREATE TABLE IF NOT EXISTS / ADD COLUMN IF NOT EXISTS), applied on server startup, recorded in _sqlx_migrations. New queries use runtime sqlx::query().
DB modules expose a tenancy helper that today resolves every call to the default tenant (Phase-4 switch point). Struct fields match columns (the v1 audit flagged 3 unmapped v003 columns — don't repeat).

Reference: specs/native-remote-control/plan.md Task 2 (connect_agent_keys); .claude/standards/gururmm/sqlx-migrations.md.

Task 2 (KEYSTONE) [DONE 2026-05-29 — code-reviewed APPROVED]: Rebuilt auth model — plane separation + session-scoped viewer tokens

CARRY-FORWARD TO TASK 3: viewer-token minting is gated only by AuthenticatedUser (authentication, not authorization). GC has a real admin|operator|viewer role + permissions model, so this is intra-tenant privilege escalation until a permission check is added. The mechanism is fixed here; the authz check in Task 3 is what closes audit CRITICAL #1. Metadata bug todo faf39fe0 resolved (migration 005).

[IMPLEMENTED] auth/agent_keys.rs [new] (cak_ mint/SHA-256 hash/verify), auth/jwt.rs (ViewerClaims + create_viewer_token/validate_viewer_token, 5-min TTL, purpose:"viewer"), auth/mod.rs (module + re-export). Deleted the JWT-as-agent-key branch in relay/mod.rs validate_agent_api_key — now per-agent cak_ key OR deprecated shared AGENT_API_KEY (WARNING-logged), never a user JWT. New endpoints: POST/GET /api/machines/:agent_id/keys, DELETE /api/machines/:agent_id/keys/:key_id (admin), POST /api/sessions/:id/viewer-token (dashboard JWT). db helpers added: agent_keys::{list_for_machine,key_belongs_to_machine}. Folded in migration 005_machine_metadata.sql + Machine struct org/site/tags mapping (coord todo faf39fe0). No Rust toolchain on this machine — self-reviewed; not yet cargo check-verified.

Files touched: server/src/auth/ (mod.rs, jwt.rs, agent_keys.rs [new], token_blacklist.rs, password.rs), server/src/api/auth.rs, server/src/api/sessions.rs.

Delete the JWT-as-agent-key path. Remove the jwt_config.validate_token branch from validate_agent_api_key (relay/mod.rs:224); agent authentication validates a cak_ per-agent key (hash compare against connect_agent_keys, reject if revoked_at set) OR a support code — never a user JWT.
Session-scoped viewer tokens: new endpoint POST /api/sessions/:id/viewer-token (auth: dashboard JWT + authorization that the user may view that session/machine) mints a short-lived (~5 min) JWT whose claims include session_id and tenant_id. This replaces "any dashboard JWT can view any session."
Keep Argon2id passwords; keep the blacklist but make is_revoked() callable from the WS layer (Task 3).
Per-agent key issuance endpoint (admin): POST /api/machines/:id/keys → returns plaintext cak_ once, stores hash. DELETE /api/machines/:id/keys/:key_id sets revoked_at.
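
The claim checks behind a session-scoped viewer token can be sketched as below. This is a minimal illustration, not the code in auth/jwt.rs: the struct fields, `ClaimError`, `mint_claims`, and `check_claims` are hypothetical names, and signature verification and the blacklist lookup are deliberately left out so that only the purpose, expiry, and session-binding logic is shown.

```rust
/// Hypothetical claim set; field names follow the plan text, not the real ViewerClaims.
#[derive(Debug, Clone)]
pub struct ViewerClaims {
    pub session_id: String,
    pub tenant_id: String,
    pub purpose: String,
    /// Expiry, in seconds since the Unix epoch.
    pub exp: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ClaimError {
    WrongPurpose,
    Expired,
    SessionMismatch,
}

/// The plan's 5-minute TTL.
pub const VIEWER_TOKEN_TTL_SECS: u64 = 5 * 60;

/// Build the claims for a freshly minted viewer token (signing omitted).
pub fn mint_claims(session_id: &str, tenant_id: &str, now: u64) -> ViewerClaims {
    ViewerClaims {
        session_id: session_id.to_string(),
        tenant_id: tenant_id.to_string(),
        purpose: "viewer".to_string(),
        exp: now + VIEWER_TOKEN_TTL_SECS,
    }
}

/// Checks applied after the signature has been verified.
pub fn check_claims(
    claims: &ViewerClaims,
    requested_session: &str,
    now: u64,
) -> Result<(), ClaimError> {
    // A login JWT carries a different purpose and must never pass here.
    if claims.purpose != "viewer" {
        return Err(ClaimError::WrongPurpose);
    }
    if now >= claims.exp {
        return Err(ClaimError::Expired);
    }
    // The token is only good for the session it was minted for.
    if claims.session_id != requested_session {
        return Err(ClaimError::SessionMismatch);
    }
    Ok(())
}
```

Binding the token to one session_id is what replaces "any dashboard JWT can view any session": a token minted for one session fails the last check when presented for another.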

Reference: relay/mod.rs:224 (validate_agent_api_key — the CRITICAL), auth/mod.rs:116 (blacklist already consulted for REST — extend to WS), specs/native-remote-control/plan.md Tasks 2/3/6; .claude/standards/security/credential-handling.md, .claude/standards/api/response-format.md.

Task 3 (KEYSTONE) [IMPLEMENTED 2026-05-29 — self-reviewed; no Rust toolchain on this machine, not yet `cargo check`-verified]: Secure relay WS handlers + bounded relay

[IMPLEMENTED] Viewer WS now verifies the session-scoped VIEWER token (validate_viewer_token: sig+exp+purpose) + token_blacklist.is_revoked + session_id claim == requested session, before upgrade — raw login JWTs are no longer accepted (closes CRITICAL #1 mechanism + #2). Authz gate added to mint_viewer_token: is_admin() || has_permission("view") → 403 envelope on failure (closes CRITICAL #1 — uses the EXISTING view permission from GC's catalog; no new permission defined). Agent WS now binds persistent reattach to the authenticated machine identity: validate_agent_api_key returns an AgentKeyAuth enum carrying the cak_ key's machine agent_id (resolved via new db::machines::get_machine_by_id); a mismatched query-string agent_id is ignored, a per-agent key whose machine can't be resolved fails closed (503). Frame caps set on BOTH upgrades (agent 4 MiB, viewer 64 KiB via max_message_size/max_frame_size) (closes WS-OOM HIGH). Viewer→agent input throttled to 200 events/sec/viewer via a refilling token bucket + non-blocking bounded try_send (drop/coalesce on overflow) (closes input-injection MEDIUM). Startup managed-session reconcile retained + clarified (persistent machines → offline in-memory sessions). Removed #[allow(dead_code)] on validate_viewer_token and AuthenticatedUser::has_permission. No token/secret logged; runtime sqlx::query/query_as. Files: server/src/relay/mod.rs, server/src/api/sessions.rs, server/src/db/machines.rs, server/src/auth/mod.rs, server/src/auth/jwt.rs, server/src/main.rs.

Task 3 authz-strength fix — VIEW_ONLY/CONTROL token split [IMPLEMENTED 2026-05-29 — self-reviewed; no Rust toolchain on this machine, not yet `cargo check`-verified; pending Code Review]

Closes audit CRITICAL #1 at full strength (coord todo c8916c89). The Task-3 gate minted a viewer token for any is_admin() || has_permission("view") user, but view is held by EVERY default role (incl. viewer) and the token granted input CONTROL — intra-tenant privilege escalation. Now the token carries an ACCESS MODE inside its signed claims and the relay enforces it:

auth/jwt.rs: new ViewerAccess enum (ViewOnly | Control, serde-renamed to "view_only"/"control"); ViewerClaims gains an access: ViewerAccess field; create_viewer_token(..., access) stamps it; validate_viewer_token returns it as part of the claims (sig+exp+purpose checks unchanged). New unit tests cover the round-trip, the lowercase wire form, and login-JWT rejection.

auth/mod.rs: re-export ViewerAccess.

api/sessions.rs (mint_viewer_token): TIERED mint — is_admin() || has_permission("control") → CONTROL token; else has_permission("view") → VIEW_ONLY token; else → 403 (standard envelope). Permission constants SESSION_CONTROL_PERMISSION="control" / SESSION_VIEW_PERMISSION="view". Response echoes access (advisory; the signed claim is authoritative).

relay/mod.rs: viewer_ws_handler reads claims.access from the VERIFIED token and threads it into handle_viewer_connection (new access: ViewerAccess param). In the input path, a view-only token's MouseEvent/KeyEvent/SpecialKey are refused (a guarded match arm if !access.can_control() that silently drops + logs once-per-power-of-two), BEFORE the throttle/try_send. A control token forwards as before (with the Task-3 throttle). Video still streams to a view-only viewer; chat (not an injected-input vector) is still relayed. The mode cannot be forged; it lives in the signed token.

Everything else from Task 3 (session_id-claim match, blacklist, frame caps, throttle, agent identity binding) is intact — this is purely additive access-mode enforcement.
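
The tiered mint and the relay-side input gate described above reduce to two small pure decisions. The sketch below assumes a simplified `User` and a stand-in `ViewerMessage` enum (both hypothetical); only the `ViewerAccess` variants, `can_control`, and the two permission constants are names taken from the notes.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ViewerAccess {
    ViewOnly,
    Control,
}

impl ViewerAccess {
    pub fn can_control(self) -> bool {
        matches!(self, ViewerAccess::Control)
    }

    /// Lowercase wire form, as stated in the notes.
    pub fn as_wire(self) -> &'static str {
        match self {
            ViewerAccess::ViewOnly => "view_only",
            ViewerAccess::Control => "control",
        }
    }
}

pub const SESSION_CONTROL_PERMISSION: &str = "control";
pub const SESSION_VIEW_PERMISSION: &str = "view";

/// Hypothetical stand-in for the authenticated dashboard user.
pub struct User {
    pub is_admin: bool,
    pub permissions: Vec<String>,
}

impl User {
    pub fn has_permission(&self, name: &str) -> bool {
        self.permissions.iter().any(|p| p.as_str() == name)
    }
}

/// Tiered mint decision. `None` means the caller answers 403.
pub fn access_for(user: &User) -> Option<ViewerAccess> {
    if user.is_admin || user.has_permission(SESSION_CONTROL_PERMISSION) {
        Some(ViewerAccess::Control)
    } else if user.has_permission(SESSION_VIEW_PERMISSION) {
        Some(ViewerAccess::ViewOnly)
    } else {
        None
    }
}

/// Hypothetical stand-in for the viewer-to-agent message kinds.
pub enum ViewerMessage {
    Mouse,
    Key,
    SpecialKey,
    Chat,
}

/// Relay-side gate: input is dropped for a view-only token, chat still passes.
pub fn should_forward(access: ViewerAccess, msg: &ViewerMessage) -> bool {
    match msg {
        ViewerMessage::Mouse | ViewerMessage::Key | ViewerMessage::SpecialKey
            if !access.can_control() =>
        {
            false
        }
        _ => true,
    }
}
```

Keeping both decisions as pure functions makes the privilege boundary testable without a running relay.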

PHASE-2 REFINEMENT: this refuses to FORWARD input for a view-only token; it does NOT yet tie the viewer mode to the agent-side SessionType.VIEW_ONLY capture mode (the agent still does full capture). Deferred (deeper agent change).

Files touched: server/src/relay/mod.rs, server/src/session/mod.rs.

viewer_ws_handler (relay/mod.rs:242): verify the viewer token's signature + expiry + blacklist + session_id claim == requested session_id before handle_viewer_connection (relay/mod.rs:595). Reject otherwise. (Fixes the any-JWT-joins-any-session + blacklist-bypass CRITICALs.)
Viewer-token AUTHORIZATION (carry-forward from Task 2 review) — this is what actually closes audit CRITICAL #1. The minting endpoint POST /api/sessions/:id/viewer-token (in server/src/api/sessions.rs) must enforce a real permission predicate, not just AuthenticatedUser: user.is_admin() || user.has_permission(<policy>). GC's role model (admin|operator|viewer) + permissions table already exist (server/src/auth/mod.rs), so honoring the intra-tenant role distinction is cheap. Policy DECIDED (Mike, 2026-05-29): admin-or-view-permission — user.is_admin() || user.has_permission(<the session-view/control permission in GC's catalog — use the real name; define one if absent>). Enforce at the minting endpoint mint_viewer_token (the authz decision point); the WS then trusts the session-scoped token. Multi-tenant client-access isolation stays deferred to Phase 4.
agent_ws_handler (relay/mod.rs:55): authenticate via per-agent key OR support code only (Task 2). Persistent reattach must bind to the authenticated machine identity, not a query-string agent_id alone (session/mod.rs:98).
Frame caps: set explicit .max_message_size(...)/.max_frame_size(...) on both WebSocketUpgrades; reject oversized frames before to_vec()/broadcast. (Fixes WS-OOM HIGH.)
Input throttle: bound + rate-limit the viewer→agent input queue (relay/mod.rs:669); cap events/sec.
Reconcile managed sessions from DB on startup so they aren't orphaned.
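
The input throttle (200 events/sec/viewer, refilling token bucket, non-blocking bounded send) might look roughly like this. It is a sketch under assumptions: the real relay presumably uses an async channel, whereas this uses std's `sync_channel` so the example stays dependency-free, and `TokenBucket` / `forward_input` are illustrative names. The clock is passed in so the behavior can be tested without sleeping.

```rust
use std::sync::mpsc::SyncSender;
use std::time::Instant;

/// Refilling token bucket: holds up to `capacity` tokens, refilled continuously.
pub struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last: Instant,
}

impl TokenBucket {
    /// Starts full, so a burst of up to `rate_per_sec` events is admitted at once.
    pub fn new(rate_per_sec: u32, now: Instant) -> Self {
        let cap = f64::from(rate_per_sec);
        Self { capacity: cap, tokens: cap, refill_per_sec: cap, last: now }
    }

    /// Take one token if available; `now` is injected by the caller.
    pub fn try_take(&mut self, now: Instant) -> bool {
        let elapsed = now.saturating_duration_since(self.last).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

/// Forward one input event: throttle first, then a non-blocking bounded send.
/// Returns false when the event was dropped (over the rate, or the queue is full).
pub fn forward_input<T>(
    bucket: &mut TokenBucket,
    tx: &SyncSender<T>,
    event: T,
    now: Instant,
) -> bool {
    bucket.try_take(now) && tx.try_send(event).is_ok()
}
```

Dropping on overflow rather than awaiting keeps a flooding viewer from stalling the relay loop or growing the queue without bound.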

Reference: audit Pass E (reports/2026-05-29-gc-audit.md §"Pass 5"); relay/mod.rs:242,55,595,669.

Task 4 (KEYSTONE) [IMPLEMENTED 2026-05-29 — self-reviewed; no Rust toolchain on this machine, not yet `cargo check`-verified; pending Code Review]: Working rate limiting + single-use support codes

[IMPLEMENTED] Closes the keystone (Tasks 1–4). Three parts:

A. RATE LIMITING — replaced the non-compiling tower_governor layer with a small self-contained in-memory limiter (middleware/rate_limit.rs): a per-IP fixed-window RateLimiter (Mutex<HashMap<IpAddr, Window>>, no new dep) + a per-IP consecutive-failure FailureLockout, bundled as RateLimitState in AppState. Keyed by ConnectInfo<SocketAddr> IP (same source the relay uses); X-Forwarded-For intentionally NOT trusted (proxy-spoofable). 429 with the standard error envelope on limit. Re-enabled pub mod rate_limit. Wired per-route via route_layer(from_fn_with_state(...)) onto POST /api/auth/login (8/min/IP), POST /api/auth/change-password (5/min/IP), and GET /api/codes/:code/validate (15/min/IP). Named consts for every limit. LOCKOUT: after 10 consecutive failed code-validations from an IP, that IP is locked out 15 min; the validate handler reports success/failure into the lockout, the middleware enforces it BEFORE the handler runs. Unit tests cover window allow/block/reset, per-IP isolation, and lockout trip/reset/expire (clock injected, no sleeps).
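
A per-IP fixed-window limiter plus a consecutive-failure lockout of the kind described in part A could be sketched as follows. The `RateLimiter`, `Window`, and `FailureLockout` names come from the notes, but every method name and signature here is an assumption, and the axum middleware wiring is omitted. The clock is injected, matching the "no sleeps" testing approach.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::sync::Mutex;
use std::time::{Duration, Instant};

struct Window {
    started: Instant,
    count: u32,
}

/// Fixed-window limiter: at most `max_per_window` requests per IP per window.
pub struct RateLimiter {
    max_per_window: u32,
    window: Duration,
    hits: Mutex<HashMap<IpAddr, Window>>,
}

impl RateLimiter {
    pub fn new(max_per_window: u32, window: Duration) -> Self {
        Self { max_per_window, window, hits: Mutex::new(HashMap::new()) }
    }

    /// Returns true when the request is allowed (the caller answers 429 otherwise).
    pub fn check(&self, ip: IpAddr, now: Instant) -> bool {
        let mut hits = self.hits.lock().unwrap_or_else(|e| e.into_inner());
        let w = hits.entry(ip).or_insert(Window { started: now, count: 0 });
        if now.saturating_duration_since(w.started) >= self.window {
            // The window has elapsed: start a fresh one.
            w.started = now;
            w.count = 0;
        }
        if w.count < self.max_per_window {
            w.count += 1;
            true
        } else {
            false
        }
    }
}

/// Locks an IP out after `threshold` consecutive failures.
pub struct FailureLockout {
    threshold: u32,
    lock_for: Duration,
    // Per IP: (consecutive failures, locked-until).
    state: Mutex<HashMap<IpAddr, (u32, Option<Instant>)>>,
}

impl FailureLockout {
    pub fn new(threshold: u32, lock_for: Duration) -> Self {
        Self { threshold, lock_for, state: Mutex::new(HashMap::new()) }
    }

    pub fn is_locked(&self, ip: IpAddr, now: Instant) -> bool {
        let state = self.state.lock().unwrap_or_else(|e| e.into_inner());
        matches!(state.get(&ip), Some((_, Some(until))) if now < *until)
    }

    /// The handler reports each outcome; a success clears the failure streak.
    pub fn record(&self, ip: IpAddr, success: bool, now: Instant) {
        let mut state = self.state.lock().unwrap_or_else(|e| e.into_inner());
        if success {
            state.remove(&ip);
            return;
        }
        let entry = state.entry(ip).or_insert((0, None));
        if let Some(until) = entry.1 {
            if now >= until {
                *entry = (0, None); // an expired lock resets the streak
            }
        }
        entry.0 += 1;
        if entry.0 >= self.threshold {
            entry.1 = Some(now + self.lock_for);
        }
    }
}
```

With the limits from the notes, the login route would use `RateLimiter::new(8, Duration::from_secs(60))` and the validate route `FailureLockout::new(10, Duration::from_secs(15 * 60))`. A fixed window admits short bursts across a window boundary; that is the usual trade-off for its simplicity.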

B. SINGLE-USE CODES — the agent bind path now CONSUMES the code atomically on first bind. In-memory: new SupportCodeManager::consume_for_bind accepts ONLY a Pending code and flips it to Connected under the write lock (a 2nd presenter loses the race → rejected). This replaces the v1 pre-upgrade check that accepted pending OR connected (the reusable-code HIGH). DB: new db::support_codes::consume_code_for_bind — a single conditional UPDATE ... SET consumed_at = NOW(), status='connected' WHERE code=$1 AND consumed_at IS NULL AND status='pending' AND (expires_at IS NULL OR expires_at > NOW()) RETURNING id; zero rows ⇒ not consumable. The in-memory consume is AUTHORITATIVE (the live source of truth); the DB UPDATE is a durable/audit mirror applied best-effort after it (a missing DB row does not veto a bind the in-memory layer admitted). To make the durable record meaningful, the portal create_code handler now also inserts the code into connect_support_codes. Validate/preview path is UNCHANGED and explicitly does NOT consume (test preview_validate_does_not_consume).
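
The in-memory half of part B, where the check and the state flip happen under one write lock, can be illustrated like this. Only `SupportCodeManager` and `consume_for_bind` are names from the notes; the `CodeStatus` enum, `create`, and `preview` are hypothetical simplifications, and the best-effort DB mirror is not shown.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CodeStatus {
    Pending,
    Connected,
}

#[derive(Default)]
pub struct SupportCodeManager {
    codes: RwLock<HashMap<String, CodeStatus>>,
}

impl SupportCodeManager {
    /// Register a freshly issued code as Pending.
    pub fn create(&self, code: &str) {
        let mut codes = self.codes.write().unwrap_or_else(|e| e.into_inner());
        codes.insert(code.to_string(), CodeStatus::Pending);
    }

    /// Validate/preview path: reads the status and never consumes.
    pub fn preview(&self, code: &str) -> Option<CodeStatus> {
        let codes = self.codes.read().unwrap_or_else(|e| e.into_inner());
        codes.get(code).copied()
    }

    /// Bind path: accept only a Pending code and flip it to Connected while
    /// still holding the write lock, so a second presenter cannot also win.
    pub fn consume_for_bind(&self, code: &str) -> bool {
        let mut codes = self.codes.write().unwrap_or_else(|e| e.into_inner());
        match codes.get_mut(code) {
            Some(status) if *status == CodeStatus::Pending => {
                *status = CodeStatus::Connected;
                true
            }
            _ => false,
        }
    }
}
```

The point of the design is that test-and-set is a single critical section; a separate "is it pending?" check followed by a later update would reopen the race the v1 pre-upgrade check had.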

C. WIDER CODE — replaced the 6-digit numeric generator with a grouped base32-style code XXX-XXX-XXX (9 symbols over a 31-char UNAMBIGUOUS alphabet excluding 0/O/1/I/L ≈ 44.6 bits), CSPRNG-backed (OsRng, rejection sampling to avoid modulo bias). The new code (11 chars incl. hyphens) does NOT fit the VARCHAR(10) column from migration 001, so migration 006_widen_support_code.sql widens connect_support_codes.code AND connect_sessions.support_code to TEXT (idempotent). Unit tests cover shape, charset (no ambiguous chars), and practical uniqueness.
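
The generator in part C can be sketched with rejection sampling over a 31-symbol alphabet. To keep the example free of external crates, the random source is a caller-supplied closure; in real use it must be backed by a CSPRNG (the notes name OsRng). The alphabet ordering and the `generate_code` name are assumptions. Nine symbols give 9 × log2(31) ≈ 44.6 bits.

```rust
/// 31 unambiguous symbols: digits and uppercase letters minus 0, O, 1, I, L.
/// The array type makes the compiler check the length.
pub const ALPHABET: &[u8; 31] = b"23456789ABCDEFGHJKMNPQRSTUVWXYZ";

/// 248 = 8 * 31 is the largest multiple of 31 that fits in a byte.
/// Bytes at or above it are rejected; otherwise `b % 31` would favor low indices.
const ACCEPT_BELOW: u8 = 248;

/// Build an `XXX-XXX-XXX` code from a byte source.
/// ASSUMPTION: `next_byte` yields cryptographically secure random bytes.
pub fn generate_code(mut next_byte: impl FnMut() -> u8) -> String {
    let mut out = String::with_capacity(11);
    let mut symbols = 0;
    while symbols < 9 {
        let b = next_byte();
        if b >= ACCEPT_BELOW {
            continue; // rejection step: draw again
        }
        if symbols > 0 && symbols % 3 == 0 {
            out.push('-');
        }
        out.push(ALPHABET[(b % 31) as usize] as char);
        symbols += 1;
    }
    out
}
```

Each accepted byte maps onto exactly eight byte values per symbol, so every symbol is equally likely; about 3% of draws (8 of 256) are rejected and retried.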

DEPS: none added; tower_governor REMOVED from Cargo.toml (it never compiled). No code/secret/support-code value logged on any path. Runtime sqlx::query. Files: server/src/middleware/rate_limit.rs (rebuilt), server/src/middleware/mod.rs, server/src/main.rs (AppState field + 3 route wirings + create_code DB insert + validate handler lockout feed), server/src/support_codes.rs (new generator + consume_for_bind + tests), server/src/db/support_codes.rs (consume_code_for_bind), server/src/relay/mod.rs (atomic consume on bind), server/migrations/006_widen_support_code.sql [new], server/Cargo.toml.

Files touched: server/src/middleware/rate_limit.rs (rebuild — v1 is non-compiling), server/src/middleware/mod.rs, server/src/api/auth.rs (login), server/src/api/ (code validate), server/src/db/support_codes.rs, server/src/relay/mod.rs (code bind).

Rebuild a compiling rate-limit layer (fix the tower_governor generics, or a small in-memory fixed-window limiter); re-enable pub mod rate_limit. Wire it to POST /api/auth/login, POST /api/auth/change-password, and the support-code validate route.
Single-use codes: consume atomically on first agent bind (accept only a pending, unconsumed code; set consumed_at); reject a second presenter. (Fixes the reusable-code HIGH.)
Widen the code: higher-entropy human-readable format (e.g. grouped base32, ~40+ bits) replacing 6-digit numeric; add per-IP lockout on repeated validate failures.

Reference: audit Pass B/E (rate limiting disabled/non-compiling; reusable codes); middleware/mod.rs:3-11.

Task 5 [IMPLEMENTED 2026-05-30 — self-reviewed; no Rust toolchain on this machine, not yet `cargo check`-verified; pending Code Review]: Attended-mode consent

[IMPLEMENTED] An ATTENDED (support-code) session now requires the end user to ACCEPT a native consent prompt before the technician's session is surfaced.

PROTO (proto/guruconnect.proto): added ConsentRequest (server/relay → agent: session_id, technician_name, access_mode (ConsentAccessMode = CONSENT_VIEW|CONSENT_CONTROL), timeout_secs) and ConsentResponse (agent → server: session_id, granted, reason), inserted AFTER AdminCommand. New Message oneof field numbers consent_request = 80, consent_response = 81 (no existing field renumbered).

SERVER (session/mod.rs): new ConsentState enum (NotRequired|Pending|Granted|Denied, as_db_str/allows_viewer) added to the in-memory Session. register_agent starts attended (!is_persistent) sessions Pending, managed/persistent NotRequired. join_session REFUSES any viewer unless consent_state.allows_viewer() (only Granted/NotRequired); this is the gate that keeps a support session invisible to the technician until accepted. New set_consent_state/get_consent_state. Unit tests cover the db-string mapping, the viewer-admission predicate, and the attended-pending-blocks / granted-admits / managed-admits / denied-blocks transitions.
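
The consent gate above is small enough to show in full as a sketch. `ConsentState`, `as_db_str`, and `allows_viewer` are named in the notes and the DB strings match the Task 1 column values; the `initial` helper, the reduced `Session` struct, and the error string are illustrative assumptions.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConsentState {
    NotRequired,
    Pending,
    Granted,
    Denied,
}

impl ConsentState {
    /// Value stored in the sessions.consent_state column.
    pub fn as_db_str(self) -> &'static str {
        match self {
            ConsentState::NotRequired => "not_required",
            ConsentState::Pending => "pending",
            ConsentState::Granted => "granted",
            ConsentState::Denied => "denied",
        }
    }

    /// Only Granted and NotRequired admit a viewer.
    pub fn allows_viewer(self) -> bool {
        matches!(self, ConsentState::Granted | ConsentState::NotRequired)
    }

    /// Attended (non-persistent) sessions start Pending; managed ones need no consent.
    pub fn initial(is_persistent: bool) -> Self {
        if is_persistent {
            ConsentState::NotRequired
        } else {
            ConsentState::Pending
        }
    }
}

/// Hypothetical, reduced session record.
pub struct Session {
    pub consent_state: ConsentState,
}

/// Viewer admission: refused until the end user has accepted.
pub fn join_session(session: &Session) -> Result<(), &'static str> {
    if session.consent_state.allows_viewer() {
        Ok(())
    } else {
        Err("consent not granted")
    }
}
```

Because the default arm of the admission check is "refuse", a new state added later fails closed unless it is explicitly listed in `allows_viewer`.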

SERVER (relay/mod.rs): after registering an attended agent, run_consent_handshake sends ConsentRequest, audits consent_requested, then waits up to CONSENT_TIMEOUT_SECS = 60 for a ConsentResponse. granted → consent_state = Granted + audit consent_granted + proceed. denied/timeout/agent-disconnect → consent_state = Denied + audit consent_denied, send a Disconnect to the agent, end the session row (status='denied'), release the code, and TEAR DOWN (early return; the technician never sees the session). In-memory consent is authoritative; the DB consent_state (via db::sessions::update_consent_state) is a durable/audit mirror. A late/dup ConsentResponse in the main loop is logged+ignored (no silent unhandled variant). db::events gained CONSENT_GRANTED/CONSENT_DENIED/CONSENT_REQUESTED. db::sessions::create_session now sets is_managed/source/consent_state from is_support_session (attended → false/standalone/pending; managed → true/gururmm/not_required). api::SessionInfo echoes consent_state so the dashboard can show "awaiting consent".

AGENT (consent/mod.rs [new], session/mod.rs, main.rs): on ConsentRequest, handle_consent_request runs a blocking Windows MessageBox (MB_YESNO | TOPMOST | SETFOREGROUND | SYSTEMMODAL | ICONQUESTION) on spawn_blocking so the async loop/heartbeats are not stalled, phrasing the prompt VIEW vs VIEW-and-CONTROL from access_mode, then sends a ConsentResponse. Anything other than an explicit Yes (closed box, panic) is a DENY. Non-Windows build is a // TODO(platform) stub that fails CLOSED (denies). Unit tests cover the prompt wording + the access-mode decode fallback.

Managed/unattended sessions are not_required, never prompted (Phase-1 default). PER-TENANT consent policy beyond this default is a future refinement — left as a TODO (no per-tenant policy table consulted yet).

No .unwrap() on a non-test path; runtime sqlx::query; no code/secret/token value logged. No Rust toolchain here, so self-reviewed only.

Files touched: proto/guruconnect.proto, agent/src/session/mod.rs, agent/src/consent/mod.rs (new), agent/src/main.rs, server/src/relay/mod.rs, server/src/session/mod.rs, server/src/db/sessions.rs, server/src/db/events.rs, server/src/api/mod.rs.

Add ConsentRequest / ConsentResponse to the proto (after AdminCommand).
On an attended session, the agent shows a consent dialog to the end user; the server keeps the session consent_state = pending and surfaces it to the technician only on granted. denied/timeout → tear down.
Managed/unattended sessions follow per-tenant policy (default: silent for managed; consent_state = not_required). Audit the consent decision to events.

Reference: specs/native-remote-control/plan.md Task 5 (Consent primitive); proto AdminCommand insertion point.

Task 6 [IMPLEMENTED 2026-05-30 — self-verified on a local Windows toolchain: `cargo fmt --all` clean, `cargo clippy --workspace --all-targets --all-features -- -D warnings` exit 0, `cargo test --workspace` 69 pass (19 agent + 50 server), `cargo build --workspace` ok; pending Code Review]: Native viewer — full key fidelity

[IMPLEMENTED] The four parts:

VIEWER CAPTURE (agent/src/viewer/input.rs): the WH_KEYBOARD_LL hook now DIVERTS system combinations (Windows/Apps keys, Alt+Tab, Alt+Esc, Ctrl+Esc; Win+R / Win+E compose on the remote because the diverted Win-down is forwarded) and forwards them as full-fidelity KeyEvents — VK + hardware scan code + is_extended (read from KBDLLHOOKSTRUCT.flags & LLKHF_EXTENDED) + modifier snapshot — returning LRESULT(1) to suppress local handling. The combo decision is a pure is_system_combo(vk, alt, ctrl) so it is unit-tested. Ordinary keys are NOT forwarded by the hook (they take the normal winit path — avoids double inject). A "send system keys to remote" toggle (AtomicBool, default ON) gates the diversion; when OFF the hook is transparent. Toggle API: set_/toggle_/send_system_keys_enabled. Host key: Pause/Break flips it (handled in render.rs, intercepted locally, logged).
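
The pure combo classifier and the toggle can be sketched without any Windows API. The virtual-key values are the standard Win32 ones; `is_system_combo(vk, alt, ctrl)` is the signature given above, while `hook_should_swallow` and the exact toggle function names are assumptions standing in for the hook's decision point.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Win32 virtual-key codes.
pub const VK_TAB: u16 = 0x09;
pub const VK_ESCAPE: u16 = 0x1B;
pub const VK_LWIN: u16 = 0x5B;
pub const VK_RWIN: u16 = 0x5C;
pub const VK_APPS: u16 = 0x5D;

/// True for keys the local shell would otherwise act on.
pub fn is_system_combo(vk: u16, alt: bool, ctrl: bool) -> bool {
    match vk {
        VK_LWIN | VK_RWIN | VK_APPS => true, // Windows / Apps keys
        VK_TAB => alt,                       // Alt+Tab
        VK_ESCAPE => alt || ctrl,            // Alt+Esc, Ctrl+Esc
        _ => false,
    }
}

/// "Send system keys to remote" toggle, default ON.
static SEND_SYSTEM_KEYS: AtomicBool = AtomicBool::new(true);

pub fn send_system_keys_enabled() -> bool {
    SEND_SYSTEM_KEYS.load(Ordering::Relaxed)
}

pub fn set_send_system_keys(on: bool) {
    SEND_SYSTEM_KEYS.store(on, Ordering::Relaxed);
}

/// Flip the toggle and return the new value.
pub fn toggle_send_system_keys() -> bool {
    !SEND_SYSTEM_KEYS.fetch_xor(true, Ordering::Relaxed)
}

/// Decision the low-level hook would make: divert and suppress locally,
/// or stay transparent. (Hypothetical helper name.)
pub fn hook_should_swallow(vk: u16, alt: bool, ctrl: bool) -> bool {
    send_system_keys_enabled() && is_system_combo(vk, alt, ctrl)
}
```

Keeping the classifier free of hook state is what allows it to be unit-tested off a real desktop; only the thin hook callback needs live verification.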

AGENT INJECTION (agent/src/input/keyboard.rs): rewrote send_key to inject via SendInput with KEYEVENTF_SCANCODE (layout-independent) and the correct KEYEVENTF_EXTENDEDKEY (set when the viewer flagged extended, the VK is inherently extended, or the VK→scan map carries the 0xE0 prefix). New key_event_full(vk, scan, is_extended, down) consumes all KeyEvent fields (scan 0 ⇒ derive from VK for older viewers); wired into session/mod.rs KeyEvent handling. is_extended_key delegates to a platform-independent vk_is_extended (unit-tested) that also covers right Ctrl/Alt.
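
The flag computation for scan-code injection can be shown platform-independently. The `KEYEVENTF_*` values are the documented Win32 constants; the set of keys in `vk_is_extended` is a plausible reading of the notes rather than the project's exact list, and `key_flags` plus the convention that an extended scan code carries 0xE0 in its high byte are assumptions for illustration. The actual `SendInput` call is omitted.

```rust
// Win32 KEYBDINPUT flag values.
pub const KEYEVENTF_EXTENDEDKEY: u32 = 0x0001;
pub const KEYEVENTF_KEYUP: u32 = 0x0002;
pub const KEYEVENTF_SCANCODE: u32 = 0x0008;

/// Keys that are inherently extended (assumed set).
pub fn vk_is_extended(vk: u16) -> bool {
    matches!(
        vk,
        0x21..=0x28          // PageUp, PageDown, End, Home, arrows
            | 0x2D | 0x2E    // Insert, Delete
            | 0x5B..=0x5D    // LWin, RWin, Apps
            | 0x6F           // numpad Divide
            | 0x90           // NumLock
            | 0xA3 | 0xA5    // right Ctrl, right Alt
    )
}

/// Flags for one injected key event.
/// `scan` is assumed to carry 0xE0 in its high byte for extended keys.
pub fn key_flags(vk: u16, scan: u16, viewer_extended: bool, down: bool) -> u32 {
    let extended = viewer_extended || vk_is_extended(vk) || (scan & 0xFF00) == 0xE000;
    let mut flags = KEYEVENTF_SCANCODE; // inject by scan code: layout-independent
    if extended {
        flags |= KEYEVENTF_EXTENDEDKEY;
    }
    if !down {
        flags |= KEYEVENTF_KEYUP;
    }
    flags
}
```

Without the extended flag, right Ctrl would inject as left Ctrl and the arrow keys as their numpad counterparts, which is why three independent signals are OR-ed together.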

CTRL+ALT+DEL: the existing SpecialKey::CtrlAltDel → send_ctrl_alt_del → send_sas path is confirmed and hardened. send_sas now: Tier 1 SAS helper service (SYSTEM, named pipe) → Tier 2 direct sas.dll!SendSAS → Tier 3 fails with a clear, actionable error (no false success; plain SendInput can't reach the secure desktop). The SAS installer (bin/sas_service.rs install) now sets the SoftwareSASGeneration Winlogon policy (HKLM...\Policies\System, DWORD 1 = services) via set_software_sas_policy, with a // TODO(installer) noting the top-level managed installer should also ensure it.

MODIFIER HYGIENE: viewer tracks held Ctrl/Alt/Shift/Win (ViewerModifierState in render.rs) and emits explicit key-ups on FOCUS LOSS (WindowEvent::Focused(false)) and on window close, so a modifier pressed-but-not-released across a blur doesn't stay latched on the remote. Agent-side KeyboardController also tracks injected modifiers and exposes release_all_modifiers (defensive complement; wired via ModifierState, the scaffolding the cleanup kept). Both trackers unit-tested.

PROTO: KeyEvent gained bool is_extended = 7 (new field number, nothing renumbered) — carries the viewer's LLKHF_EXTENDED capture so injection picks the right extended flag; older agents that ignore it fall back to deriving from vk/scan. SpecialKeyEvent already carried CTRL_ALT_DEL; unchanged.

dead_code wired/removed: InputEvent::SpecialKey (now emitted/consumable), ModifierState (agent keyboard — now tracks + drains held modifiers), type_char/type_string kept with their allows (separate unicode-typing feature, not in Task 6 scope). sas_client::is_service_available/get_service_status kept narrowly allowed (status API not yet wired into a runtime path). InputController::key_event (vk-only) and release_all_modifiers kept allowed as API surface (the relayed path uses key_event_full, focus-loss re-sync is viewer-driven). New toggle accessors set_/send_system_keys_enabled narrowly allowed (host-key uses toggle_; future viewer menu).

TESTS ADDED (13): extended-key flag determination (extended + non-extended sets), modifier-state transitions (record/down-up/ignore-non-modifier/drain-and-clear), the system-combo classifier (Win/Alt+Tab/Alt+Esc/Ctrl+Esc divert; ordinary keys don't), and the toggle state machine (default ON, flip, explicit set). Live hook/SendInput/SendSAS behavior is plan Task 8 (needs a real desktop).

Files touched: proto/guruconnect.proto (KeyEvent is_extended = 7), agent/src/viewer/input.rs (rewritten hook + toggle), agent/src/viewer/render.rs (focus-loss modifier release, host-key toggle, is_extended on winit path), agent/src/viewer/mod.rs (unchanged SpecialKey forwarding), agent/src/input/keyboard.rs (scan-code injection + ModifierState + vk_is_extended + tests), agent/src/input/mod.rs (key_event_full / release_all_modifiers / vk_is_extended re-export), agent/src/session/mod.rs (KeyEvent → key_event_full), agent/src/bin/sas_service.rs (set_software_sas_policy + install wiring), agent/src/input/keyboard.rs (send_sas hardening).

Viewer capture: install WH_KEYBOARD_LL while the viewer is in focused control; divert system combos (Win key, Win+R, Alt+Tab, Ctrl+Esc) and forward as KeyEvent (VK + scan code + extended flag) instead of letting the local shell act. Add a "send system keys to remote" toggle.
Agent injection: scan-code SendInput (KEYEVENTF_SCANCODE + correct KEYEVENTF_EXTENDEDKEY for right-Ctrl/Alt, arrows, Win, Insert/Home). Extend the salvaged input/keyboard.rs.
Ctrl+Alt+Del: SpecialKeyEvent.CTRL_ALT_DEL → the salvaged SAS service (bin/sas_service.rs, SendSAS, SYSTEM, SoftwareSASGeneration policy set by the managed installer).
Modifier hygiene: track modifier up/down; re-sync (release) on focus loss to kill stuck modifiers.

Reference: SPEC-002 §4.1/§4.2; salvage ledger §2; agent/src/input/keyboard.rs, agent/src/bin/sas_service.rs.

Task 7 [IMPLEMENTED 2026-05-30 — self-verified on local Windows toolchain: `cargo fmt --all --check` clean, `cargo clippy --workspace --all-targets --all-features -- -D warnings` exit 0, `cargo test --workspace` 89 pass (36 agent + 53 server; was 70, no regressions), `cargo build --workspace` ok; pending Code Review]: Hardware H.264 encode + negotiated raw/Zstd fallback

[IMPLEMENTED] Raw+Zstd remains the DEFAULT and guaranteed fallback; H.264 is a negotiated upgrade that is COMPILE-VERIFIED ONLY (live MF encode/decode is Task 8 — needs real GPU + frames). The testable parts (abstraction, factory, negotiation, capability plumbing, color-conversion math) are done solidly with unit tests; the MF H.264 encoder and viewer decoder are first-cut, clearly marked, and gated behind a default-off policy so unvalidated H.264 never ships as the default.

ENCODER ABSTRACTION (agent/src/encoder/mod.rs): the existing Encoder trait (encode(&mut self, &CapturedFrame) -> Result<EncodedFrame>) is the abstraction; RawEncoder (salvaged raw+Zstd+dirty-rects, UNCHANGED behavior) and the new H264Encoder both implement it. Factory split into pure pieces: codec_from_str (config-string -> VideoCodec), select_codec(negotiated, hardware_available) (agent-side guard: H.264 only if HW present, HEVC->raw, else raw), and create_encoder_for(VideoCodec, quality) (builds the encoder; on H.264 init failure logs + returns a RAW encoder so the session never breaks). UNIT-TESTED: codec_from_str mapping, select_codec guard matrix, raw factory always succeeds, string path resolves to raw without HW.

CAPABILITY + NEGOTIATION (testable, done well):

encoder/capability.rs: supports_hardware_h264() probes MF once (MFTEnumEx(MFT_CATEGORY_VIDEO_ENCODER, MFT_ENUM_FLAG_HARDWARE, MFVideoFormat_H264)), caches the bool via OnceLock; false on non-Windows / no HW / MF error. Advertised in AgentStatus.supports_h264 (proto field 11, additive).

Server (server/src/session/mod.rs): select_video_codec(agent_supports, prefer_h264) is a PURE decision fn — H.264 only when BOTH the agent supports it AND policy prefers it, else raw. Policy constant DEFAULT_PREFER_H264 = false (documented: keeps raw as the negotiated codec until H.264 is hardware-validated). supports_h264 stored on the in-memory Session from AgentStatus (update_agent_status gained the param). The negotiated codec is stamped on StartStream.video_codec in send_start_stream_internal (the LIVE server->agent codec-selection point — SessionRequest/SessionResponse are not exchanged on the wire in v2, so the proto's SessionResponse.video_codec is kept for spec parity but the live path uses StartStream). UNIT-TESTED: the negotiation matrix, the default-policy guardrail (capable agent still gets raw), and the AgentStatus -> supports_h264 ingest.

Agent applies it: StartStream handler decodes video_codec, stores negotiated_codec, and init_streaming builds the encoder via select_codec + create_encoder_for (re-guards on local HW; older server sends 0 = RAW, preserving the default).
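
The two-sided negotiation above amounts to a pair of pure functions, sketched here. The function names, `DEFAULT_PREFER_H264`, and the enum values follow the notes; `codec_from_wire` is a hypothetical helper for decoding the proto field, with unknown values falling back to RAW.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VideoCodec {
    Raw = 0,
    H264 = 1,
    H265 = 2,
}

/// Policy: keep raw as the negotiated codec until H.264 is hardware-validated.
pub const DEFAULT_PREFER_H264: bool = false;

/// Server side: H.264 only when the agent supports it AND policy prefers it.
pub fn select_video_codec(agent_supports: bool, prefer_h264: bool) -> VideoCodec {
    if agent_supports && prefer_h264 {
        VideoCodec::H264
    } else {
        VideoCodec::Raw
    }
}

/// Agent side: re-guard on local hardware; HEVC is never selected.
pub fn select_codec(negotiated: VideoCodec, hardware_available: bool) -> VideoCodec {
    match negotiated {
        VideoCodec::H264 if hardware_available => VideoCodec::H264,
        _ => VideoCodec::Raw,
    }
}

/// Decode the wire value; an older server sends 0 and anything unknown means RAW.
pub fn codec_from_wire(value: i32) -> VideoCodec {
    match value {
        1 => VideoCodec::H264,
        2 => VideoCodec::H265,
        _ => VideoCodec::Raw,
    }
}
```

Every path that is not an explicit, doubly confirmed H.264 lands on raw, so the default and the fallback are the same code path.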

MF H.264 ENCODER (agent/src/encoder/h264.rs, FIRST-CUT, compile-verified only): enumerates+activates a HW H.264 encoder MFT, sets H.264 output then NV12 input media types (frame size/rate, bitrate from quality), feeds frames (ProcessInput) and drains synchronously (ProcessOutput, NEED_MORE_INPUT = "no output this tick"), emitting VideoFrame{H264(EncodedFrame{data, keyframe, pts, dts})}. BGRA->NV12 via encoder/color.rs (BT.601 limited-range, 2x2 box chroma; isolated + UNIT-TESTED: size, odd-dim/short-buffer rejection, black/white/red reference values, plane coverage). On ANY init failure the FACTORY falls back to raw (logged); per-frame errors surface to the session (which logs + continues). Handles resolution change (re-init), keyframe flag (CleanPoint), MF buffer alloc for non-sample-providing MFTs. NOT yet live: the async-MFT event model is documented as a Task-8 refinement (this cut drains synchronously); precise force-IDR (CODECAPI) is a TODO; D3D11 zero-copy deferred (feeds CPU NV12).
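
The BGRA->NV12 step can be illustrated with the common integer approximation of BT.601 limited-range. This is a sketch, not the code in encoder/color.rs: the coefficients are the widely used fixed-point ones, the function names are assumptions, and the input is assumed tightly packed (stride = width × 4). Chroma is computed from the average of each 2x2 block.

```rust
/// BT.601 limited-range luma (16..=235), integer approximation.
pub fn rgb_to_y(r: i32, g: i32, b: i32) -> u8 {
    (((66 * r + 129 * g + 25 * b + 128) >> 8) + 16) as u8
}

/// BT.601 limited-range chroma (16..=240).
pub fn rgb_to_u(r: i32, g: i32, b: i32) -> u8 {
    (((-38 * r - 74 * g + 112 * b + 128) >> 8) + 128) as u8
}

pub fn rgb_to_v(r: i32, g: i32, b: i32) -> u8 {
    (((112 * r - 94 * g - 18 * b + 128) >> 8) + 128) as u8
}

/// Convert a tightly packed BGRA frame to NV12 (Y plane, then interleaved UV).
/// Returns None for zero/odd dimensions or a buffer that is too short.
pub fn bgra_to_nv12(bgra: &[u8], width: usize, height: usize) -> Option<Vec<u8>> {
    if width == 0 || height == 0 || width % 2 != 0 || height % 2 != 0 {
        return None;
    }
    if bgra.len() < width * height * 4 {
        return None;
    }
    // Read one pixel as (r, g, b); BGRA stores blue first.
    let px = |x: usize, y: usize| -> (i32, i32, i32) {
        let i = (y * width + x) * 4;
        (bgra[i + 2] as i32, bgra[i + 1] as i32, bgra[i] as i32)
    };
    let mut out = vec![0u8; width * height * 3 / 2];
    for y in 0..height {
        for x in 0..width {
            let (r, g, b) = px(x, y);
            out[y * width + x] = rgb_to_y(r, g, b);
        }
    }
    let uv_base = width * height;
    for by in 0..height / 2 {
        for bx in 0..width / 2 {
            // 2x2 box filter: average the four pixels, then convert once.
            let (mut r, mut g, mut b) = (0, 0, 0);
            for dy in 0..2 {
                for dx in 0..2 {
                    let (pr, pg, pb) = px(bx * 2 + dx, by * 2 + dy);
                    r += pr;
                    g += pg;
                    b += pb;
                }
            }
            let i = uv_base + by * width + bx * 2;
            out[i] = rgb_to_u(r / 4, g / 4, b / 4);
            out[i + 1] = rgb_to_v(r / 4, g / 4, b / 4);
        }
    }
    Some(out)
}
```

With these coefficients black maps to Y=16, white to Y=235 (both with U=V=128), and pure red to Y=82, U=90, V=240, which are convenient reference values for unit tests.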

VIEWER H.264 DECODE (agent/src/viewer/decoder.rs [new], FIRST-CUT, compile-verified only): MF H.264 decoder MFT -> NV12 -> BGRA (nv12_to_bgra, BT.601 inverse, UNIT-TESTED round-trip within tolerance + short-buffer + black). Runs on a DEDICATED OS thread (gc-h264-decode), NOT a tokio task — the MF decoder has COM thread affinity and a tokio task can migrate across workers at await points. The receive task forwards H.264 access units over a std channel; the worker decodes and pushes BGRA FrameData through the existing render path via blocking_send. On decoder-init failure it logs once and drops H.264 frames; the RAW render path is untouched. Handles the MF_E_TRANSFORM_STREAM_CHANGE NV12 output renegotiation + size discovery.
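The threading shape described above can be sketched with the standard library alone. This is an assumption-laden illustration, not agent/src/viewer/decoder.rs: the Decoder trait, EchoDecoder, spawn_decode_worker and the struct layouts are hypothetical, and a std SyncSender stands in for the tokio channel that the real code feeds via blocking_send.

```rust
use std::sync::mpsc::{self, Sender, SyncSender};
use std::thread::{self, JoinHandle};

pub struct AccessUnit {
    pub data: Vec<u8>,
    pub keyframe: bool,
}

pub struct FrameData {
    pub bgra: Vec<u8>,
}

/// Hypothetical decoder interface; the real one wraps an MF decoder MFT.
pub trait Decoder {
    fn decode(&mut self, au: &AccessUnit) -> Option<FrameData>;
}

/// Stand-in decoder for the sketch: echoes the payload as "pixels".
pub struct EchoDecoder;

impl Decoder for EchoDecoder {
    fn decode(&mut self, au: &AccessUnit) -> Option<FrameData> {
        Some(FrameData { bgra: au.data.clone() })
    }
}

/// Spawn a dedicated OS thread that owns the decoder for its whole life.
/// The decoder is constructed INSIDE the thread, so it never has to be
/// `Send` and never migrates between threads (the COM-affinity concern).
pub fn spawn_decode_worker<D, F>(
    make_decoder: F,
    frames_out: SyncSender<FrameData>,
) -> std::io::Result<(Sender<AccessUnit>, JoinHandle<()>)>
where
    D: Decoder + 'static,
    F: FnOnce() -> Result<D, String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel::<AccessUnit>();
    let handle = thread::Builder::new()
        .name("gc-h264-decode".into())
        .spawn(move || {
            // On init failure: log once, then keep draining and dropping frames.
            let mut decoder = match make_decoder() {
                Ok(d) => Some(d),
                Err(e) => {
                    eprintln!("H.264 decoder init failed, dropping H.264 frames: {e}");
                    None
                }
            };
            for au in rx {
                let Some(dec) = decoder.as_mut() else { continue };
                if let Some(frame) = dec.decode(&au) {
                    // Blocking send into the render path; stop if it is gone.
                    if frames_out.send(frame).is_err() {
                        break;
                    }
                }
            }
        })?;
    Ok((tx, handle))
}
```

The receive task keeps only the Sender half, so dropping it ends the worker loop cleanly, and a failed decoder never blocks or tears down the raw render path.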

RAW STILL WORKS END-TO-END: RawEncoder is unchanged; with DEFAULT_PREFER_H264 = false the server negotiates RAW for every session (including capable agents), the agent builds the raw encoder, and the viewer's existing Raw branch renders it — the guaranteed default/fallback path is fully intact and is what runs today.

PROTO (additive — no field renumbered): VideoCodec enum (RAW=0, H264=1, H265=2); SessionResponse.video_codec = 5 (spec parity); StartStream.video_codec = 3 (live negotiation); AgentStatus.supports_h264 = 11 (capability). HEVC is a documented TODO/opt-in everywhere (never selected). Cargo.toml: added the Win32_Media_MediaFoundation + COM windows features (no new external crates).

COMPILE-VERIFIED-ONLY / NEEDS LIVE HARDWARE (Task 8): the MF H.264 encoder init/feed/emit on a real GPU, the viewer MF decoder on a live stream, the BGRA<->NV12 fidelity end-to-end, and the synchronous-drain timing. The encoder/decoder are structured to fall back to raw (encoder) / drop frames + log (decoder) on any failure so they cannot break a session even if MF misbehaves.

TESTS ADDED (19): agent +16 (encoder factory/select matrix x5, color BGRA->NV12 x8, decoder NV12<->BGRA x3), server +3 (codec negotiation matrix, default-policy guardrail, AgentStatus capability ingest).

Files touched: proto/guruconnect.proto (VideoCodec enum + SessionResponse.video_codec + StartStream.video_codec + AgentStatus.supports_h264), agent/Cargo.toml (MF/COM windows features), agent/src/encoder/mod.rs (trait/factory/select), agent/src/encoder/raw.rs (salvaged, unchanged), agent/src/encoder/h264.rs [new], agent/src/encoder/capability.rs [new], agent/src/encoder/color.rs [new], agent/src/session/mod.rs (negotiated codec apply + supports_h264 advertise), agent/src/viewer/mod.rs (H.264 route + decode worker), agent/src/viewer/decoder.rs [new], server/src/session/mod.rs (select_video_codec + DEFAULT_PREFER_H264 + supports_h264 field/ingest + StartStream codec stamp), server/src/relay/mod.rs (pass supports_h264 from AgentStatus).

HW H.264 via Windows Media Foundation (transparently NVENC/AMF/QuickSync) emitting the proto's EncodedFrame (h264). Native viewer decodes via MF/D3D11.
Fallback: salvaged raw BGRA + Zstd + dirty-rects for Win7 / no HW encoder.
Negotiation: agent advertises HW-encode capability in AgentStatus; server selects the codec in SessionResponse; default H.264, HEVC opt-in, raw fallback. One encode feeds the native viewer now (and the Phase-2 WebRTC track later).

Reference: SPEC-002 §5; agent/src/encoder/raw.rs (salvaged), proto/guruconnect.proto (EncodedFrame already modeled).

Task 8: Verification (end-to-end, observable)

Security (re-audit clean): run /gc-audit --pass=security (and --pass=rust) → the three relay CRITICALs and the rate-limiting/frame-cap/reusable-code HIGHs are gone. Manually confirm: a revoked cak_ key is rejected on /ws/agent; a viewer token for session A is rejected on session B; a logged-out (blacklisted) viewer token is rejected on /ws/viewer; a user JWT is rejected as an agent key.
Attended flow: generate a support code → run the one-time agent → end user sees + accepts consent → technician's session appears only after acceptance; a denied consent tears down.
Key fidelity (the headline): in a live session, confirm Win+R opens Run on the remote, text copied with Ctrl+C on the remote pastes with Ctrl+V locally (and vice versa), and Ctrl+Alt+Del reaches the remote secure desktop. Confirm no stuck modifiers after alt-tabbing away and back.
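Codec: confirm a HW-H.264 machine negotiates h264 once policy prefers it (with DEFAULT_PREFER_H264 = false every session negotiates raw; check StartStream.video_codec, since SessionResponse is not exchanged on the wire in v2), a Win7/no-HW machine falls back to raw+Zstd, and both render correctly in the native viewer.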
Rate limiting: hammer /api/auth/login and the code-validate route → confirm throttling/lockout.
Migrations: fresh DB applies the v2 migrations cleanly; _sqlx_migrations consistent; tenant_id populated with the default tenant.


