guru-connect/specs/v2-secure-session-core/plan.md
Mike Swanson fef8111ff3
feat(server): v2 secure-session-core Task 1 - schema + per-agent keys
SPEC-002 Phase 1 Task 1 (specs/v2-secure-session-core), code-reviewed APPROVED.

Migration 004 (idempotent, server-applied): tenants + seeded default tenant,
connect_agent_keys (hash-only, revocable, FK->connect_machines), nullable
tenant_id on all scoped tables (tenancy-ready, not tenant-yet), connect_sessions
is_managed/source/consent_state, connect_support_codes consumed_at. New db
modules agent_keys.rs (stores only key_hash) + tenancy.rs (DEFAULT_TENANT_ID,
Phase-4 switch point). Struct/query updates across machines/sessions/
support_codes/events/users. Runtime sqlx throughout (GC db layer already uses
it - no compile-time macros).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 18:33:26 -07:00


v2 Secure Session Core — Implementation Plan

Spec created: 2026-05-29
Status: in progress. Task 1 (schema) DONE 2026-05-29; Task 2 (auth) next.
Parent: docs/specs/SPEC-002-v2-modernization-architecture.md (Phase 1)
Keystone: Tasks 1-4 are the "get-right-first" secure auth/session core; every audit CRITICAL/HIGH is closed there. Tasks 5-7 deliver the product capability on top. Do them in order.

Task 0: Commit this spec

Commit the specs/v2-secure-session-core/ directory before writing any code:

git add specs/v2-secure-session-core/
git commit -m "spec: add v2-secure-session-core shape spec"

Do not start Task 1 until this commit exists.


Task 1 (KEYSTONE) [DONE 2026-05-29]: v2 schema — per-agent keys + tenancy-ready tables

[DONE] migration 004_v2_secure_session_core.sql + db/agent_keys.rs + db/tenancy.rs + struct/query updates across machines/sessions/support_codes/events/users. Code-reviewed APPROVED. Note: GC's db layer already used runtime sqlx::query() (no compile-time macros), so the v2 "switch to runtime" item needed no change.

Files touched: server/migrations/ (new v2 migration files), server/src/db/ (rebuilt modules: agent_keys.rs [new], tenancy.rs [new], sessions.rs, machines.rs, support_codes.rs, events.rs, users.rs, mod.rs).

  • New table connect_agent_keys: id UUID, machine_id UUID FK, key_hash TEXT, tenant_id UUID NULL, created_at, last_used_at, revoked_at TIMESTAMPTZ NULL. Keys are cak_-prefixed, stored hashed (SHA-256, mirroring v1's hash helper); plaintext returned once at issuance.
  • Add nullable tenant_id UUID to machines, sessions, support_codes, events, users, connect_agent_keys, defaulting to a single bootstrap tenant row. Add a tenants table with one seed row.
  • sessions gains is_managed BOOLEAN, source TEXT ('standalone'|'gururmm'), consent_state TEXT ('not_required'|'pending'|'granted'|'denied'), tenant_id.
  • support_codes: add single-use semantics — consumed_at TIMESTAMPTZ NULL; widen the code column to hold a higher-entropy human-readable code (see Task 4).
  • Migrations are idempotent (CREATE TABLE IF NOT EXISTS / ADD COLUMN IF NOT EXISTS), applied on server startup, recorded in _sqlx_migrations. New queries use runtime sqlx::query().
  • DB modules expose a tenancy helper that today resolves every call to the default tenant (Phase-4 switch point). Struct fields match columns (the v1 audit flagged 3 unmapped v003 columns; don't repeat that mistake).
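A minimal Rust sketch of two Task 1 conventions, under stated assumptions: mapping a constrained TEXT column (connect_sessions.source) to an enum so struct fields and column values cannot drift, and a Phase-1 tenancy helper that resolves every caller to the default tenant. SessionSource, resolve_tenant_id, and the UUID value are hypothetical placeholders, not the contents of db/tenancy.rs.

```rust
/// Hypothetical placeholder; the real value is the tenant row seeded by migration 004.
pub const DEFAULT_TENANT_ID: &str = "00000000-0000-0000-0000-000000000001";

/// Values allowed in connect_sessions.source ('standalone' | 'gururmm').
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SessionSource {
    Standalone,
    Gururmm,
}

impl SessionSource {
    /// String written to the TEXT column.
    pub fn as_db(self) -> &'static str {
        match self {
            SessionSource::Standalone => "standalone",
            SessionSource::Gururmm => "gururmm",
        }
    }

    /// Parse a column value; unknown strings are rejected instead of defaulted.
    pub fn from_db(s: &str) -> Option<Self> {
        match s {
            "standalone" => Some(SessionSource::Standalone),
            "gururmm" => Some(SessionSource::Gururmm),
            _ => None,
        }
    }
}

/// Phase-1 tenancy helper: every caller resolves to the default tenant.
/// Phase 4 would replace this body with a real lookup (hypothetical signature).
pub fn resolve_tenant_id(_caller_user_id: &str) -> &'static str {
    DEFAULT_TENANT_ID
}
```

Routing every query through one resolver means Phase 4 would only need to change that function body, not each call site.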

Reference: specs/native-remote-control/plan.md Task 2 (connect_agent_keys); .claude/standards/gururmm/sqlx-migrations.md.


Task 2 (KEYSTONE): Rebuilt auth model — plane separation + session-scoped viewer tokens

Files touched: server/src/auth/ (mod.rs, jwt.rs, agent_keys.rs [new], token_blacklist.rs, password.rs), server/src/api/auth.rs, server/src/api/sessions.rs.

  • Delete the JWT-as-agent-key path. Remove the jwt_config.validate_token branch from validate_agent_api_key (relay/mod.rs:224); agent authentication validates a cak_ per-agent key (hash compare against connect_agent_keys, reject if revoked_at set) OR a support code — never a user JWT.
  • Session-scoped viewer tokens: new endpoint POST /api/sessions/:id/viewer-token (auth: dashboard JWT + authorization that the user may view that session/machine) mints a short-lived (~5 min) JWT whose claims include session_id and tenant_id. This replaces "any dashboard JWT can view any session."
  • Keep Argon2id passwords; keep the blacklist but make is_revoked() callable from the WS layer (Task 3).
  • Per-agent key issuance endpoint (admin): POST /api/machines/:id/keys → returns plaintext cak_ once, stores hash. DELETE /api/machines/:id/keys/:key_id sets revoked_at.
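A sketch of the agent-credential decision described above. It assumes the stored hash is computed over the full cak_ string and that the DB lookup is by hash. The hash function, key lookup, and support-code check are passed in as closures that stand in for v1's SHA-256 helper and the connect_agent_keys / connect_support_codes queries. All names here are hypothetical. What matters is the shape: no branch accepts a user JWT.

```rust
use std::time::SystemTime;

/// Minimal view of a connect_agent_keys row (hypothetical shape).
pub struct AgentKeyRow {
    pub machine_id: String,
    pub revoked_at: Option<SystemTime>,
}

/// What an authenticated agent connection is bound to.
#[derive(Debug, PartialEq, Eq)]
pub enum AgentAuth {
    Key { machine_id: String },
    SupportCode(String),
}

#[derive(Debug, PartialEq, Eq)]
pub enum AgentAuthError {
    Malformed,
    UnknownKey,
    Revoked,
    InvalidCode,
}

/// Accept a cak_ per-agent key or a support code; nothing else.
pub fn validate_agent_credential(
    presented: &str,
    hash: impl Fn(&str) -> String,
    lookup_key_by_hash: impl Fn(&str) -> Option<AgentKeyRow>,
    support_code_is_valid: impl Fn(&str) -> bool,
) -> Result<AgentAuth, AgentAuthError> {
    match presented.strip_prefix("cak_") {
        // "cak_" with nothing after it is never a valid key.
        Some("") => Err(AgentAuthError::Malformed),
        Some(_) => {
            // Only the hash is stored, so look the key up by its hash.
            let row = lookup_key_by_hash(&hash(presented)).ok_or(AgentAuthError::UnknownKey)?;
            if row.revoked_at.is_some() {
                return Err(AgentAuthError::Revoked);
            }
            Ok(AgentAuth::Key { machine_id: row.machine_id })
        }
        // JWT-shaped input (dot-separated segments) is rejected outright.
        None if presented.contains('.') => Err(AgentAuthError::Malformed),
        None if support_code_is_valid(presented) => {
            Ok(AgentAuth::SupportCode(presented.to_string()))
        }
        None => Err(AgentAuthError::InvalidCode),
    }
}
```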

Reference: relay/mod.rs:224 (validate_agent_api_key — the CRITICAL), auth/mod.rs:116 (blacklist already consulted for REST — extend to WS), specs/native-remote-control/plan.md Tasks 2/3/6; .claude/standards/security/credential-handling.md, .claude/standards/api/response-format.md.


Task 3 (KEYSTONE): Secure relay WS handlers + bounded relay

Files touched: server/src/relay/mod.rs, server/src/session/mod.rs.

  • viewer_ws_handler (relay/mod.rs:242): verify the viewer token's signature + expiry + blacklist + session_id claim == requested session_id before handle_viewer_connection (relay/mod.rs:595). Reject otherwise. (Fixes the any-JWT-joins-any-session + blacklist-bypass CRITICALs.)
  • agent_ws_handler (relay/mod.rs:55): authenticate via per-agent key OR support code only (Task 2). Persistent reattach must bind to the authenticated machine identity, not a query-string agent_id alone (session/mod.rs:98).
  • Frame caps: set explicit .max_message_size(...)/.max_frame_size(...) on both WebSocketUpgrades; reject oversized frames before to_vec()/broadcast. (Fixes WS-OOM HIGH.)
  • Input throttle: bound + rate-limit the viewer→agent input queue (relay/mod.rs:669); cap events/sec.
  • Reconcile managed sessions from DB on startup so they aren't orphaned.
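A sketch of the viewer admission checks and an input throttle. It assumes the JWT signature has already been verified and the claims decoded. ViewerClaims, admit_viewer, the use of a jti claim as the blacklist key, and InputBucket are hypothetical. The throttle uses a token bucket, which is one possible way to "cap events/sec"; the plan does not prescribe an algorithm.

```rust
use std::time::Instant;

/// Decoded claims of a session-scoped viewer token (hypothetical field names).
pub struct ViewerClaims {
    pub jti: String,        // token id, assumed to be what the blacklist stores
    pub session_id: String, // session this token may view
    pub tenant_id: String,  // carried for later tenant checks
    pub exp: u64,           // expiry, seconds since the Unix epoch
}

#[derive(Debug, PartialEq, Eq)]
pub enum ViewerReject {
    Expired,
    Revoked,
    WrongSession,
}

/// Runs after signature verification, before handle_viewer_connection.
pub fn admit_viewer(
    claims: &ViewerClaims,
    requested_session_id: &str,
    now_unix_secs: u64,
    is_revoked: impl Fn(&str) -> bool, // stands in for the blacklist's is_revoked()
) -> Result<(), ViewerReject> {
    if now_unix_secs >= claims.exp {
        return Err(ViewerReject::Expired);
    }
    if is_revoked(&claims.jti) {
        return Err(ViewerReject::Revoked);
    }
    if claims.session_id != requested_session_id {
        return Err(ViewerReject::WrongSession);
    }
    Ok(())
}

/// Token bucket for viewer-to-agent input events: `capacity` allows short
/// bursts, `refill_per_sec` caps the sustained rate.
pub struct InputBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last: Instant,
}

impl InputBucket {
    pub fn new(capacity: u32, refill_per_sec: u32, now: Instant) -> Self {
        InputBucket {
            capacity: capacity as f64,
            tokens: capacity as f64,
            refill_per_sec: refill_per_sec as f64,
            last: now,
        }
    }

    /// Returns true if the event may be forwarded; false means drop it.
    pub fn try_take(&mut self, now: Instant) -> bool {
        let elapsed = now.saturating_duration_since(self.last).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```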

Reference: audit Pass E (reports/2026-05-29-gc-audit.md §"Pass 5"); relay/mod.rs:242,55,595,669.


Task 4 (KEYSTONE): Working rate limiting + single-use support codes

Files touched: server/src/middleware/rate_limit.rs (rebuild — v1 is non-compiling), server/src/middleware/mod.rs, server/src/api/auth.rs (login), server/src/api/ (code validate), server/src/db/support_codes.rs, server/src/relay/mod.rs (code bind).

  • Rebuild a compiling rate-limit layer (fix the tower_governor generics, or a small in-memory fixed-window limiter); re-enable pub mod rate_limit. Wire it to POST /api/auth/login, POST /api/auth/change-password, and the support-code validate route.
  • Single-use codes: consume atomically on first agent bind (accept only a pending, unconsumed code; set consumed_at); reject a second presenter. (Fixes the reusable-code HIGH.)
  • Widen the code: higher-entropy human-readable format (e.g. grouped base32, ~40+ bits) replacing 6-digit numeric; add per-IP lockout on repeated validate failures.
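A sketch of the "small in-memory fixed-window limiter" option with per-IP lockout after repeated failures, plus one possible grouped base32 code format. Names, limits, and the 8-character layout (8 chars × 5 bits = 40 bits of entropy) are assumptions for illustration. The random bytes must come from a CSPRNG; this sketch only encodes them, using the RFC 4648 base32 alphabet.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Allow,
    Throttled,
    LockedOut,
}

struct Entry {
    window_start: Instant,
    count: u32,
    failures: u32,
    locked_until: Option<Instant>,
}

/// Fixed-window limiter keyed by client IP (state lives in one process only).
pub struct FixedWindowLimiter {
    limit: u32,
    window: Duration,
    max_failures: u32,
    lockout: Duration,
    entries: HashMap<IpAddr, Entry>,
}

impl FixedWindowLimiter {
    pub fn new(limit: u32, window: Duration, max_failures: u32, lockout: Duration) -> Self {
        FixedWindowLimiter { limit, window, max_failures, lockout, entries: HashMap::new() }
    }

    /// Call before handling a request to a limited route.
    pub fn check(&mut self, ip: IpAddr, now: Instant) -> Decision {
        let e = self.entries.entry(ip).or_insert(Entry {
            window_start: now,
            count: 0,
            failures: 0,
            locked_until: None,
        });
        if let Some(until) = e.locked_until {
            if now < until {
                return Decision::LockedOut;
            }
            // Lockout expired: start clean.
            e.locked_until = None;
            e.failures = 0;
        }
        if now.saturating_duration_since(e.window_start) >= self.window {
            e.window_start = now;
            e.count = 0;
        }
        if e.count >= self.limit {
            return Decision::Throttled;
        }
        e.count += 1;
        Decision::Allow
    }

    /// Call after a failed login or code validation from `ip`.
    pub fn record_failure(&mut self, ip: IpAddr, now: Instant) {
        if let Some(e) = self.entries.get_mut(&ip) {
            e.failures += 1;
            if e.failures >= self.max_failures {
                e.locked_until = Some(now + self.lockout);
            }
        }
    }
}

const BASE32: &[u8; 32] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

/// Encodes 5 random bytes (40 bits) as 8 base32 characters grouped "XXXX-XXXX".
pub fn format_support_code(random: [u8; 5]) -> String {
    let bits = random.iter().fold(0u64, |acc, &b| (acc << 8) | u64::from(b));
    let mut out = String::with_capacity(9);
    for i in 0..8u32 {
        if i == 4 {
            out.push('-');
        }
        // Take 5 bits at a time, most significant first.
        let idx = ((bits >> (35 - 5 * i)) & 0x1f) as usize;
        out.push(BASE32[idx] as char);
    }
    out
}
```

A fixed window is simple but can admit up to twice the limit across a window boundary; the per-IP lockout is what bounds sustained guessing against the code-validate route.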

Reference: audit Pass B/E (rate limiting disabled/non-compiling; reusable codes); middleware/mod.rs:3-11.


Task 5: Consent primitive for attended sessions

Files touched: proto/guruconnect.proto, agent/src/session/mod.rs, agent/src/ (consent UI dialog), server/src/relay/mod.rs, server/src/session/mod.rs.

  • Add ConsentRequest / ConsentResponse to the proto (after AdminCommand).
  • On an attended session, the agent shows a consent dialog to the end user; the server keeps the session at consent_state = pending and surfaces it to the technician only once consent is granted. A denial or timeout tears the session down.
  • Managed/unattended sessions follow per-tenant policy (default: silent for managed; consent_state = not_required). Audit the consent decision to events.
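A sketch of the consent transitions as a pure function the relay could call when a ConsentResponse (or a timeout) arrives. Recording a timeout as denied is an assumption, since consent_state has no separate timeout value. The type and function names are hypothetical, and the caller would still write the decision to events.

```rust
/// Values stored in connect_sessions.consent_state.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConsentState {
    NotRequired,
    Pending,
    Granted,
    Denied,
}

impl ConsentState {
    pub fn as_db(self) -> &'static str {
        match self {
            ConsentState::NotRequired => "not_required",
            ConsentState::Pending => "pending",
            ConsentState::Granted => "granted",
            ConsentState::Denied => "denied",
        }
    }
}

/// Outcomes reported by the agent dialog or a server-side timer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConsentEvent {
    Accepted,
    Denied,
    TimedOut,
}

/// What the relay should do after a transition.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConsentAction {
    SurfaceToTechnician,
    TearDown,
    Ignore,
}

/// Starting state: attended sessions wait for the end user; managed sessions
/// skip the prompt when the tenant policy is silent (the plan's default).
pub fn initial_consent(is_managed: bool, tenant_policy_silent: bool) -> ConsentState {
    if is_managed && tenant_policy_silent {
        ConsentState::NotRequired
    } else {
        ConsentState::Pending
    }
}

/// Only a pending session reacts; late or duplicate events are ignored.
pub fn on_consent_event(state: ConsentState, event: ConsentEvent) -> (ConsentState, ConsentAction) {
    match (state, event) {
        (ConsentState::Pending, ConsentEvent::Accepted) => {
            (ConsentState::Granted, ConsentAction::SurfaceToTechnician)
        }
        (ConsentState::Pending, ConsentEvent::Denied)
        | (ConsentState::Pending, ConsentEvent::TimedOut) => {
            (ConsentState::Denied, ConsentAction::TearDown)
        }
        (s, _) => (s, ConsentAction::Ignore),
    }
}
```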

Reference: specs/native-remote-control/plan.md Task 5 (Consent primitive); proto AdminCommand insertion point.


Task 6: Native viewer — full key fidelity

Files touched: agent/src/viewer/ (low-level hook + input capture), agent/src/input/keyboard.rs (extend — salvaged), agent/src/input/mod.rs, agent/src/bin/sas_service.rs (wire — salvaged), proto/guruconnect.proto (confirm KeyEvent/SpecialKeyEvent coverage).

  • Viewer capture: install WH_KEYBOARD_LL while the viewer is in focused control; divert system combos (Win key, Win+R, Alt+Tab, Ctrl+Esc) and forward as KeyEvent (VK + scan code + extended flag) instead of letting the local shell act. Add a "send system keys to remote" toggle.
  • Agent injection: scan-code SendInput (KEYEVENTF_SCANCODE + correct KEYEVENTF_EXTENDEDKEY for right-Ctrl/Alt, arrows, Win, Insert/Home). Extend the salvaged input/keyboard.rs.
  • Ctrl+Alt+Del: SpecialKeyEvent.CTRL_ALT_DEL → the salvaged SAS service (bin/sas_service.rs, SendSAS, SYSTEM, SoftwareSASGeneration policy set by the managed installer).
  • Modifier hygiene: track modifier up/down; re-sync (release) on focus loss to kill stuck modifiers.
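A sketch of the modifier-hygiene bookkeeping and the extended-flag check, assuming the viewer forwards Win32 virtual-key codes. The VK values are the standard Win32 constants. needs_extended_flag covers only the keys the plan names (right Ctrl/Alt, Win, arrows, Insert/Home) and their navigation-cluster neighbors, not every extended key. ModifierTracker and its methods are hypothetical.

```rust
use std::collections::BTreeSet;

// Standard Win32 virtual-key codes.
pub const VK_LWIN: u16 = 0x5B;
pub const VK_RWIN: u16 = 0x5C;
pub const VK_LSHIFT: u16 = 0xA0;
pub const VK_RSHIFT: u16 = 0xA1;
pub const VK_LCONTROL: u16 = 0xA2;
pub const VK_RCONTROL: u16 = 0xA3;
pub const VK_LMENU: u16 = 0xA4; // left Alt
pub const VK_RMENU: u16 = 0xA5; // right Alt

const MODIFIERS: [u16; 8] = [
    VK_LWIN, VK_RWIN, VK_LSHIFT, VK_RSHIFT, VK_LCONTROL, VK_RCONTROL, VK_LMENU, VK_RMENU,
];

/// True for keys injected with KEYEVENTF_EXTENDEDKEY alongside KEYEVENTF_SCANCODE.
/// Partial list: right Ctrl/Alt, Win keys, PgUp..Down (0x21..=0x28, includes
/// End/Home/arrows), Insert (0x2D), Delete (0x2E).
pub fn needs_extended_flag(vk: u16) -> bool {
    matches!(vk, VK_RCONTROL | VK_RMENU | VK_LWIN | VK_RWIN | 0x21..=0x28 | 0x2D | 0x2E)
}

/// Tracks which modifiers the viewer has forwarded as "down".
#[derive(Default)]
pub struct ModifierTracker {
    down: BTreeSet<u16>,
}

impl ModifierTracker {
    /// Record a forwarded key event; non-modifiers are ignored.
    pub fn on_key(&mut self, vk: u16, is_down: bool) {
        if !MODIFIERS.contains(&vk) {
            return;
        }
        if is_down {
            self.down.insert(vk);
        } else {
            self.down.remove(&vk);
        }
    }

    /// Call on focus loss: returns the VKs to forward as key-up events so the
    /// remote releases every modifier still held, then clears the set.
    pub fn release_all(&mut self) -> Vec<u16> {
        std::mem::take(&mut self.down).into_iter().collect()
    }
}
```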

Reference: SPEC-002 §4.1/§4.2; salvage ledger §2; agent/src/input/keyboard.rs, agent/src/bin/sas_service.rs.


Task 7: Hardware H.264 encode + negotiated raw/Zstd fallback

Files touched: agent/src/encoder/ (mod.rs, h264.rs [new], raw.rs [salvaged]), agent/src/capture/ (feed), agent/src/viewer/ (decode), proto/guruconnect.proto (AgentStatus capability, SessionResponse codec), server/src/session/mod.rs (negotiation).

  • HW H.264 via Windows Media Foundation (transparently NVENC/AMF/QuickSync) emitting the proto's EncodedFrame (h264). Native viewer decodes via MF/D3D11.
  • Fallback: salvaged raw BGRA + Zstd + dirty-rects for Win7 / no HW encoder.
  • Negotiation: agent advertises HW-encode capability in AgentStatus; server selects the codec in SessionResponse; default H.264, HEVC opt-in, raw fallback. One encode feeds the native viewer now (and the Phase-2 WebRTC track later).

Reference: SPEC-002 §5; agent/src/encoder/raw.rs (salvaged), proto/guruconnect.proto (EncodedFrame already modeled).


Task 8: Verification (end-to-end, observable)

  • Security (re-audit clean): run /gc-audit --pass=security (and --pass=rust) → the three relay CRITICALs and the rate-limiting/frame-cap/reusable-code HIGHs are gone. Manually confirm: a revoked cak_ key is rejected on /ws/agent; a viewer token for session A is rejected on session B; a logged-out (blacklisted) viewer token is rejected on /ws/viewer; a user JWT is rejected as an agent key.
  • Attended flow: generate a support code → run the one-time agent → end user sees + accepts consent → technician's session appears only after acceptance; a denied consent tears down.
  • Key fidelity (the headline): in a live session, confirm Win+R opens Run on the remote, that text copied with Ctrl+C on the remote pastes locally with Ctrl+V (and vice versa), and that Ctrl+Alt+Del reaches the remote secure desktop. Confirm no stuck modifiers after alt-tabbing away and back.
  • Codec: confirm a HW-H.264 machine negotiates h264 (check SessionResponse), a Win7/no-HW machine falls back to raw+Zstd, both render correctly in the native viewer.
  • Rate limiting: hammer /api/auth/login and the code-validate route → confirm throttling/lockout.
  • Migrations: fresh DB applies the v2 migrations cleanly; _sqlx_migrations consistent; tenant_id populated with the default tenant.