Files
guru-connect/specs/v2-secure-session-core/plan.md
Mike Swanson a453e7984e
Some checks failed
Build and Test / Build Server (Linux) (push) Failing after 3m20s
Build and Test / Build Agent (Windows) (push) Successful in 6m9s
Build and Test / Security Audit (push) Successful in 4m21s
Build and Test / Build Summary (push) Has been skipped
feat(server): viewer-token view-only/control split - closes CRITICAL #1
Authz-strength fix (coord todo c8916c89), code-reviewed APPROVED. Replaces the
weak "view" gate (held by every role) with a permission-tiered access mode
stamped inside the signed viewer token:
- mint: is_admin() || has_permission("control") -> CONTROL token; else
  has_permission("view") -> VIEW_ONLY token; else 403.
- enforce: the relay drops MouseEvent/KeyEvent/SpecialKey for a VIEW_ONLY token
  before forwarding (video still streams); CONTROL tokens forward under the
  Task-3 throttle. Mode is unforgeable (in the signature) and unbypassable
  (all other viewer->agent payloads hit the catch-all and are never forwarded).
A low-privilege viewer-role user can now at most watch, never control. New
ViewerAccess enum (view_only|control) on ViewerClaims; 3 unit tests.

Audit CRITICAL #1 now fully closed (mechanism in Task 3 + this authz strength).
Not cargo-check-verified locally (no toolchain) - the push triggers CI
(clippy -D warnings + build + test) which is the verification gate.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 19:24:32 -07:00

18 KiB
Raw Blame History

v2 Secure Session Core — Implementation Plan

Spec created: 2026-05-29 Status: in progress — Tasks 1-3 DONE 2026-05-29 (Task 3 code-reviewed APPROVED). Viewer-token authz STRENGTH split IMPLEMENTED 2026-05-29 (self-reviewed; no Rust toolchain on this machine — not yet cargo check-verified; pending Code Review). This was the REQUIRED Phase-1-exit follow-up: the gate previously used view (held by EVERY default role incl. viewer) but a viewer token granted input CONTROL. DECIDED (Mike, 2026-05-29) + IMPLEMENTED: SPLIT VIEW_ONLY/CONTROL tokens — view-perm users get a watch-only token (relay refuses their input), admin/control users get a control token. See the "Task 3 authz-strength fix" block under Task 3 below. Resolves coord todo c8916c89 (coordinator marks done after review). Remaining follow-up: nothing revokes a minted viewer token on logout (bounded by 5-min TTL) — follow-up todo. Task 4 (rate limiting + single-use codes) next. CARRY-FORWARD: Task 3 MUST add a viewer-token AUTHORIZATION check (admin/permission gate) — Task 2 fixed only the token mechanism; the authz gate is what actually closes audit CRITICAL #1. Policy DECIDED (Mike, 2026-05-29): admin-or-view-permission (is_admin() || has_permission(...)). Parent: docs/specs/SPEC-002-v2-modernization-architecture.md (Phase 1) Keystone: Tasks 14 are the "get-right-first" secure auth/session core — every audit CRITICAL/HIGH is closed there. Tasks 57 deliver the product capability on top. Do them in order.

Task 0: Commit this spec

Commit the specs/v2-secure-session-core/ directory before writing any code:

git add specs/v2-secure-session-core/
git commit -m "spec: add v2-secure-session-core shape spec"

Do not start Task 1 until this commit exists.


Task 1 (KEYSTONE) [DONE 2026-05-29]: v2 schema — per-agent keys + tenancy-ready tables

[DONE] migration 004_v2_secure_session_core.sql + db/agent_keys.rs + db/tenancy.rs + struct/query updates across machines/sessions/support_codes/events/users. Code-reviewed APPROVED. Note: GC's db layer already uses runtime sqlx::query() (no macros) — the v2 "switch to runtime" was already true.

Files touched: server/migrations/ (new v2 migration files), server/src/db/ (rebuilt modules: agent_keys.rs [new], sessions.rs, machines.rs, support_codes.rs, events.rs, users.rs, mod.rs).

  • New table connect_agent_keys: id UUID, machine_id UUID FK, key_hash TEXT, tenant_id UUID NULL, created_at, last_used_at, revoked_at TIMESTAMPTZ NULL. Keys are cak_-prefixed, stored hashed (SHA-256, mirroring v1's hash helper); plaintext returned once at issuance.
  • Add nullable tenant_id UUID to machines, sessions, support_codes, events, users, connect_agent_keys, defaulting to a single bootstrap tenant row. Add a tenants table with one seed row.
  • sessions gains is_managed BOOLEAN, source TEXT ('standalone'|'gururmm'), consent_state TEXT ('not_required'|'pending'|'granted'|'denied'), tenant_id.
  • support_codes: add single-use semantics — consumed_at TIMESTAMPTZ NULL; widen the code column to hold a higher-entropy human-readable code (see Task 4).
  • Migrations are idempotent (CREATE TABLE IF NOT EXISTS / ADD COLUMN IF NOT EXISTS), applied on server startup, recorded in _sqlx_migrations. New queries use runtime sqlx::query().
  • DB modules expose a tenancy helper that today resolves every call to the default tenant (Phase-4 switch point). Struct fields match columns (the v1 audit flagged 3 unmapped v003 columns — don't repeat).

Reference: specs/native-remote-control/plan.md Task 2 (connect_agent_keys); .claude/standards/gururmm/sqlx-migrations.md.


Task 2 (KEYSTONE) [DONE 2026-05-29 — code-reviewed APPROVED]: Rebuilt auth model — plane separation + session-scoped viewer tokens

CARRY-FORWARD TO TASK 3: viewer-token minting is gated only by AuthenticatedUser (authentication, not authorization). GC has a real admin|operator|viewer role + permissions model, so this is intra- tenant privilege escalation until a permission check is added. The mechanism is fixed here; the authz check in Task 3 is what closes audit CRITICAL #1. Metadata bug todo faf39fe0 resolved (migration 005).

[IMPLEMENTED] auth/agent_keys.rs [new] (cak_ mint/SHA-256 hash/verify), auth/jwt.rs (ViewerClaims + create_viewer_token/validate_viewer_token, 5-min TTL, purpose:"viewer"), auth/mod.rs (module + re-export). Deleted the JWT-as-agent-key branch in relay/mod.rs validate_agent_api_key — now per-agent cak_ key OR deprecated shared AGENT_API_KEY (WARNING-logged), never a user JWT. New endpoints: POST/GET /api/machines/:agent_id/keys, DELETE /api/machines/:agent_id/keys/:key_id (admin), POST /api/sessions/:id/viewer-token (dashboard JWT). db helpers added: agent_keys::{list_for_machine,key_belongs_to_machine}. Folded in migration 005_machine_metadata.sql + Machine struct org/site/tags mapping (coord todo faf39fe0). No Rust toolchain on this machine — self-reviewed; not yet cargo check-verified.

Files touched: server/src/auth/ (mod.rs, jwt.rs, agent_keys.rs [new], token_blacklist.rs, password.rs), server/src/api/auth.rs, server/src/api/sessions.rs.

  • Delete the JWT-as-agent-key path. Remove the jwt_config.validate_token branch from validate_agent_api_key (relay/mod.rs:224); agent authentication validates a cak_ per-agent key (hash compare against connect_agent_keys, reject if revoked_at set) OR a support code — never a user JWT.
  • Session-scoped viewer tokens: new endpoint POST /api/sessions/:id/viewer-token (auth: dashboard JWT + authorization that the user may view that session/machine) mints a short-lived (~5 min) JWT whose claims include session_id and tenant_id. This replaces "any dashboard JWT can view any session."
  • Keep Argon2id passwords; keep the blacklist but make is_revoked() callable from the WS layer (Task 3).
  • Per-agent key issuance endpoint (admin): POST /api/machines/:id/keys → returns plaintext cak_ once, stores hash. DELETE /api/machines/:id/keys/:key_id sets revoked_at.

Reference: relay/mod.rs:224 (validate_agent_api_key — the CRITICAL), auth/mod.rs:116 (blacklist already consulted for REST — extend to WS), specs/native-remote-control/plan.md Tasks 2/3/6; .claude/standards/security/credential-handling.md, .claude/standards/api/response-format.md.


Task 3 (KEYSTONE) [IMPLEMENTED 2026-05-29 — self-reviewed; no Rust toolchain on this machine, not yet cargo check-verified]: Secure relay WS handlers + bounded relay

[IMPLEMENTED] Viewer WS now verifies the session-scoped VIEWER token (validate_viewer_token: sig+exp+purpose) + token_blacklist.is_revoked + session_id claim == requested session, before upgrade — raw login JWTs are no longer accepted (closes CRITICAL #1 mechanism + #2). Authz gate added to mint_viewer_token: is_admin() || has_permission("view") → 403 envelope on failure (closes CRITICAL #1 — uses the EXISTING view permission from GC's catalog; no new permission defined). Agent WS now binds persistent reattach to the authenticated machine identity: validate_agent_api_key returns an AgentKeyAuth enum carrying the cak_ key's machine agent_id (resolved via new db::machines::get_machine_by_id); a mismatched query-string agent_id is ignored, a per-agent key whose machine can't be resolved fails closed (503). Frame caps set on BOTH upgrades (agent 4 MiB, viewer 64 KiB via max_message_size/max_frame_size) (closes WS-OOM HIGH). Viewer→agent input throttled to 200 events/sec/viewer via a refilling token bucket + non-blocking bounded try_send (drop/coalesce on overflow) (closes input-injection MEDIUM). Startup managed-session reconcile retained + clarified (persistent machines → offline in-memory sessions). Removed #[allow(dead_code)] on validate_viewer_token and AuthenticatedUser::has_permission. No token/secret logged; runtime sqlx::query/query_as. Files: server/src/relay/mod.rs, server/src/api/sessions.rs, server/src/db/machines.rs, server/src/auth/mod.rs, server/src/auth/jwt.rs, server/src/main.rs.

Task 3 authz-strength fix — VIEW_ONLY/CONTROL token split [IMPLEMENTED 2026-05-29 — self-reviewed; no Rust toolchain on this machine, not yet cargo check-verified; pending Code Review]

Closes audit CRITICAL #1 at full strength (coord todo c8916c89). The Task-3 gate minted a viewer token for any is_admin() || has_permission("view") user, but view is held by EVERY default role (incl. viewer) and the token granted input CONTROL — intra-tenant privilege escalation. Now the token carries an ACCESS MODE inside its signed claims and the relay enforces it:

  • auth/jwt.rs: new ViewerAccess enum (ViewOnly | Control, serde-renamed to "view_only"/"control"); ViewerClaims gains an access: ViewerAccess field; create_viewer_token(..., access) stamps it; validate_viewer_token returns it as part of the claims (sig+exp+purpose checks unchanged). New unit tests cover the round-trip, the lowercase wire form, and login-JWT rejection.
  • auth/mod.rs: re-export ViewerAccess.
  • api/sessions.rs (mint_viewer_token): TIERED mint — is_admin() || has_permission("control") → CONTROL token; else has_permission("view") → VIEW_ONLY token; else → 403 (standard envelope). Permission constants SESSION_CONTROL_PERMISSION="control" / SESSION_VIEW_PERMISSION="view". Response echoes access (advisory; the signed claim is authoritative).
  • relay/mod.rs: viewer_ws_handler reads claims.access from the VERIFIED token and threads it into handle_viewer_connection (new access: ViewerAccess param). In the input path, a view-only token's MouseEvent/KeyEvent/SpecialKey are refused (a guarded match arm if !access.can_control() that silently drops + logs once-per- power-of-two), BEFORE the throttle/try_send. A control token forwards as before (with the Task-3 throttle). Video still streams to a view-only viewer; chat (not an injected- input vector) is still relayed. The mode cannot be forged — it lives in the signed token.

Everything else from Task 3 (session_id-claim match, blacklist, frame caps, throttle, agent identity binding) is intact — this is purely additive access-mode enforcement.

PHASE-2 REFINEMENT: this refuses to FORWARD input for a view-only token; it does NOT yet tie the viewer mode to the agent-side SessionType.VIEW_ONLY capture mode (the agent still does full capture). Deferred (deeper agent change).

Files touched: server/src/relay/mod.rs, server/src/session/mod.rs.

  • viewer_ws_handler (relay/mod.rs:242): verify the viewer token's signature + expiry + blacklist + session_id claim == requested session_id before handle_viewer_connection (relay/mod.rs:595). Reject otherwise. (Fixes the any-JWT-joins-any-session + blacklist-bypass CRITICALs.)
  • Viewer-token AUTHORIZATION (carry-forward from Task 2 review) — this is what actually closes audit CRITICAL #1. The minting endpoint POST /api/sessions/:id/viewer-token (in server/src/api/sessions.rs) must enforce a real permission predicate, not just AuthenticatedUser: user.is_admin() || user.has_permission(<policy>). GC's role model (admin|operator|viewer) + permissions table already exist (server/src/auth/mod.rs), so honoring the intra-tenant role distinction is cheap. Policy DECIDED (Mike, 2026-05-29): admin-or-view-permissionuser.is_admin() || user.has_permission(<the session-view/control permission in GC's catalog — use the real name; define one if absent>). Enforce at the minting endpoint mint_viewer_token (the authz decision point); the WS then trusts the session-scoped token. Multi-tenant client-access isolation stays deferred to Phase 4.
  • agent_ws_handler (relay/mod.rs:55): authenticate via per-agent key OR support code only (Task 2). Persistent reattach must bind to the authenticated machine identity, not a query-string agent_id alone (session/mod.rs:98).
  • Frame caps: set explicit .max_message_size(...)/.max_frame_size(...) on both WebSocketUpgrades; reject oversized frames before to_vec()/broadcast. (Fixes WS-OOM HIGH.)
  • Input throttle: bound + rate-limit the viewer→agent input queue (relay/mod.rs:669); cap events/sec.
  • Reconcile managed sessions from DB on startup so they aren't orphaned.

Reference: audit Pass E (reports/2026-05-29-gc-audit.md §"Pass 5"); relay/mod.rs:242,55,595,669.


Task 4 (KEYSTONE): Working rate limiting + single-use support codes

Files touched: server/src/middleware/rate_limit.rs (rebuild — v1 is non-compiling), server/src/middleware/mod.rs, server/src/api/auth.rs (login), server/src/api/ (code validate), server/src/db/support_codes.rs, server/src/relay/mod.rs (code bind).

  • Rebuild a compiling rate-limit layer (fix the tower_governor generics, or a small in-memory fixed-window limiter); re-enable pub mod rate_limit. Wire it to POST /api/auth/login, POST /api/auth/change-password, and the support-code validate route.
  • Single-use codes: consume atomically on first agent bind (accept only a pending, unconsumed code; set consumed_at); reject a second presenter. (Fixes the reusable-code HIGH.)
  • Widen the code: higher-entropy human-readable format (e.g. grouped base32, ~40+ bits) replacing 6-digit numeric; add per-IP lockout on repeated validate failures.

Reference: audit Pass B/E (rate limiting disabled/non-compiling; reusable codes); middleware/mod.rs:3-11.


Files touched: proto/guruconnect.proto, agent/src/session/mod.rs, agent/src/ (consent UI dialog), server/src/relay/mod.rs, server/src/session/mod.rs.

  • Add ConsentRequest / ConsentResponse to the proto (after AdminCommand).
  • On an attended session, the agent shows a consent dialog to the end user; the server keeps the session consent_state = pending and surfaces it to the technician only on granted. denied/timeout → tear down.
  • Managed/unattended sessions follow per-tenant policy (default: silent for managed; consent_state = not_required). Audit the consent decision to events.

Reference: specs/native-remote-control/plan.md Task 5 (Consent primitive); proto AdminCommand insertion point.


Task 6: Native viewer — full key fidelity

Files touched: agent/src/viewer/ (low-level hook + input capture), agent/src/input/keyboard.rs (extend — salvaged), agent/src/input/mod.rs, agent/src/bin/sas_service.rs (wire — salvaged), proto/guruconnect.proto (confirm KeyEvent/SpecialKeyEvent coverage).

  • Viewer capture: install WH_KEYBOARD_LL while the viewer is in focused control; divert system combos (Win key, Win+R, Alt+Tab, Ctrl+Esc) and forward as KeyEvent (VK + scan code + extended flag) instead of letting the local shell act. Add a "send system keys to remote" toggle.
  • Agent injection: scan-code SendInput (KEYEVENTF_SCANCODE + correct KEYEVENTF_EXTENDEDKEY for right-Ctrl/Alt, arrows, Win, Insert/Home). Extend the salvaged input/keyboard.rs.
  • Ctrl+Alt+Del: SpecialKeyEvent.CTRL_ALT_DEL → the salvaged SAS service (bin/sas_service.rs, SendSAS, SYSTEM, SoftwareSASGeneration policy set by the managed installer).
  • Modifier hygiene: track modifier up/down; re-sync (release) on focus loss to kill stuck modifiers.

Reference: SPEC-002 §4.1/§4.2; salvage ledger §2; agent/src/input/keyboard.rs, agent/src/bin/sas_service.rs.


Task 7: Hardware H.264 encode + negotiated raw/Zstd fallback

Files touched: agent/src/encoder/ (mod.rs, h264.rs [new], raw.rs [salvaged]), agent/src/capture/ (feed), agent/src/viewer/ (decode), proto/guruconnect.proto (AgentStatus capability, SessionResponse codec), server/src/session/mod.rs (negotiation).

  • HW H.264 via Windows Media Foundation (transparently NVENC/AMF/QuickSync) emitting the proto's EncodedFrame (h264). Native viewer decodes via MF/D3D11.
  • Fallback: salvaged raw BGRA + Zstd + dirty-rects for Win7 / no HW encoder.
  • Negotiation: agent advertises HW-encode capability in AgentStatus; server selects the codec in SessionResponse; default H.264, HEVC opt-in, raw fallback. One encode feeds the native viewer now (and the Phase-2 WebRTC track later).

Reference: SPEC-002 §5; agent/src/encoder/raw.rs (salvaged), proto/guruconnect.proto (EncodedFrame already modeled).


Task 8: Verification (end-to-end, observable)

  • Security (re-audit clean): run /gc-audit --pass=security (and --pass=rust) → the three relay CRITICALs and the rate-limiting/frame-cap/reusable-code HIGHs are gone. Manually confirm: a revoked cak_ key is rejected on /ws/agent; a viewer token for session A is rejected on session B; a logged-out (blacklisted) viewer token is rejected on /ws/viewer; a user JWT is rejected as an agent key.
  • Attended flow: generate a support code → run the one-time agent → end user sees + accepts consent → technician's session appears only after acceptance; a denied consent tears down.
  • Key fidelity (the headline): in a live session, confirm Win+R opens Run on the remote, Ctrl+C on remote / Ctrl+V locally (and vice-versa, text), and Ctrl+Alt+Del reaches the remote secure desktop. Confirm no stuck modifiers after alt-tabbing away and back.
  • Codec: confirm a HW-H.264 machine negotiates h264 (check SessionResponse), a Win7/no-HW machine falls back to raw+Zstd, both render correctly in the native viewer.
  • Rate limiting: hammer /api/auth/login and the code-validate route → confirm throttling/lockout.
  • Migrations: fresh DB applies the v2 migrations cleanly; _sqlx_migrations consistent; tenant_id populated with the default tenant.