guru-connect/specs/v2-secure-session-core/plan.md
Mike Swanson 41691bfb2c
feat(server): v2 secure-session-core Task 2 - auth rebuild
SPEC-002 Phase 1 Task 2 (specs/v2-secure-session-core), code-reviewed APPROVED.

- DELETE the JWT-as-agent-key branch in relay validate_agent_api_key (audit
  CRITICAL): agent auth now = per-agent cak_ key (SHA-256 -> connect_agent_keys,
  revoked filtered) OR support code OR deprecated shared AGENT_API_KEY (warned).
  A user JWT can no longer authenticate an agent.
- auth/agent_keys.rs: cak_ gen (OsRng 256-bit) + SHA-256 hash + verify.
- auth/jwt.rs: ViewerClaims + create/validate_viewer_token (5-min TTL,
  purpose=viewer, session_id+tenant_id claims; non-interchangeable with login).
- Admin key issuance: POST/GET/DELETE /api/machines/:agent_id/keys.
- POST /api/sessions/:id/viewer-token mints a session-bound short-lived token.
- Migration 005: organization/site/tags on connect_machines (fixes the silent
  update_machine_metadata write, coord todo faf39fe0).

NOTE: viewer-token minting is gated by AuthenticatedUser only; the AUTHORIZATION
check (admin/permission gate) that closes audit CRITICAL #1 lands in Task 3 (the
viewer WS verification). The viewer WS path (relay/mod.rs:285) is untouched here.
Not cargo-check-verified (no toolchain on the authoring host) - self-reviewed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 18:57:12 -07:00


v2 Secure Session Core — Implementation Plan

Spec created: 2026-05-29
Status: in progress. Tasks 1-2 DONE 2026-05-29; Task 3 (relay WS) next. CARRY-FORWARD: Task 3 MUST add a viewer-token AUTHORIZATION check (admin/permission gate). Task 2 fixed only the token mechanism; the authz gate is what actually closes audit CRITICAL #1. Policy (admin-only vs admin-or-view-permission) pending Mike's decision.
Parent: docs/specs/SPEC-002-v2-modernization-architecture.md (Phase 1)
Keystone: Tasks 1-4 are the "get-right-first" secure auth/session core; every audit CRITICAL/HIGH is closed there. Tasks 5-7 deliver the product capability on top. Do them in order.

Task 0: Commit this spec

Commit the specs/v2-secure-session-core/ directory before writing any code:

git add specs/v2-secure-session-core/
git commit -m "spec: add v2-secure-session-core shape spec"

Do not start Task 1 until this commit exists.


Task 1 (KEYSTONE) [DONE 2026-05-29]: v2 schema — per-agent keys + tenancy-ready tables

[DONE] migration 004_v2_secure_session_core.sql + db/agent_keys.rs + db/tenancy.rs + struct/query updates across machines/sessions/support_codes/events/users. Code-reviewed APPROVED. Note: GC's db layer already uses runtime sqlx::query() (no macros) — the v2 "switch to runtime" was already true.

Files touched: server/migrations/ (new v2 migration files), server/src/db/ (rebuilt modules: agent_keys.rs [new], sessions.rs, machines.rs, support_codes.rs, events.rs, users.rs, mod.rs).

  • New table connect_agent_keys: id UUID, machine_id UUID FK, key_hash TEXT, tenant_id UUID NULL, created_at, last_used_at, revoked_at TIMESTAMPTZ NULL. Keys are cak_-prefixed, stored hashed (SHA-256, mirroring v1's hash helper); plaintext returned once at issuance.
  • Add nullable tenant_id UUID to machines, sessions, support_codes, events, users, connect_agent_keys, defaulting to a single bootstrap tenant row. Add a tenants table with one seed row.
  • sessions gains is_managed BOOLEAN, source TEXT ('standalone'|'gururmm'), consent_state TEXT ('not_required'|'pending'|'granted'|'denied'), tenant_id.
  • support_codes: add single-use semantics — consumed_at TIMESTAMPTZ NULL; widen the code column to hold a higher-entropy human-readable code (see Task 4).
  • Migrations are idempotent (CREATE TABLE IF NOT EXISTS / ADD COLUMN IF NOT EXISTS), applied on server startup, recorded in _sqlx_migrations. New queries use runtime sqlx::query().
  • DB modules expose a tenancy helper that today resolves every call to the default tenant (Phase-4 switch point). Struct fields match columns (the v1 audit flagged 3 unmapped v003 columns — don't repeat).

Reference: specs/native-remote-control/plan.md Task 2 (connect_agent_keys); .claude/standards/gururmm/sqlx-migrations.md.
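
The tenancy helper from the last bullet could look roughly like this. It is a minimal sketch, not the code in db/tenancy.rs: `TenantId`, `DEFAULT_TENANT`, and `resolve_tenant` are hypothetical names, and a `u128` stands in for the UUID column so the example needs no external crates.

```rust
/// Hypothetical tenant identifier; a u128 stands in for the UUID column.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TenantId(pub u128);

/// Assumed ID of the single bootstrap tenant row (illustrative value only).
pub const DEFAULT_TENANT: TenantId = TenantId(1);

/// Resolve the tenant for a DB call. Until Phase 4 every caller lands on
/// the default tenant, whatever was requested. This function is the single
/// place that would change when real multi-tenancy arrives.
pub fn resolve_tenant(_requested: Option<TenantId>) -> TenantId {
    DEFAULT_TENANT
}
```

Routing every query through one function like this keeps the Phase-4 switch to a single call site.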


Task 2 (KEYSTONE) [DONE 2026-05-29 — code-reviewed APPROVED]: Rebuilt auth model — plane separation + session-scoped viewer tokens

CARRY-FORWARD TO TASK 3: viewer-token minting is gated only by AuthenticatedUser (authentication, not authorization). GC has a real admin|operator|viewer role + permissions model, so this is intra-tenant privilege escalation until a permission check is added. The mechanism is fixed here; the authz check in Task 3 is what closes audit CRITICAL #1. Metadata bug todo faf39fe0 resolved (migration 005).

[IMPLEMENTED] auth/agent_keys.rs [new] (cak_ mint/SHA-256 hash/verify), auth/jwt.rs (ViewerClaims + create_viewer_token/validate_viewer_token, 5-min TTL, purpose:"viewer"), auth/mod.rs (module + re-export). Deleted the JWT-as-agent-key branch in relay/mod.rs validate_agent_api_key — now per-agent cak_ key OR deprecated shared AGENT_API_KEY (WARNING-logged), never a user JWT. New endpoints: POST/GET /api/machines/:agent_id/keys, DELETE /api/machines/:agent_id/keys/:key_id (admin), POST /api/sessions/:id/viewer-token (dashboard JWT). db helpers added: agent_keys::{list_for_machine,key_belongs_to_machine}. Folded in migration 005_machine_metadata.sql + Machine struct org/site/tags mapping (coord todo faf39fe0). No Rust toolchain on this machine — self-reviewed; not yet cargo check-verified.

Files touched: server/src/auth/ (mod.rs, jwt.rs, agent_keys.rs [new], token_blacklist.rs, password.rs), server/src/api/auth.rs, server/src/api/sessions.rs.

  • Delete the JWT-as-agent-key path. Remove the jwt_config.validate_token branch from validate_agent_api_key (relay/mod.rs:224); agent authentication validates a cak_ per-agent key (hash compare against connect_agent_keys, reject if revoked_at set) OR a support code — never a user JWT.
  • Session-scoped viewer tokens: new endpoint POST /api/sessions/:id/viewer-token (auth: dashboard JWT + authorization that the user may view that session/machine) mints a short-lived (~5 min) JWT whose claims include session_id and tenant_id. This replaces "any dashboard JWT can view any session."
  • Keep Argon2id passwords; keep the blacklist but make is_revoked() callable from the WS layer (Task 3).
  • Per-agent key issuance endpoint (admin): POST /api/machines/:agent_id/keys → returns plaintext cak_ once, stores hash. DELETE /api/machines/:agent_id/keys/:key_id sets revoked_at.

Reference: relay/mod.rs:224 (validate_agent_api_key — the CRITICAL), auth/mod.rs:116 (blacklist already consulted for REST — extend to WS), specs/native-remote-control/plan.md Tasks 2/3/6; .claude/standards/security/credential-handling.md, .claude/standards/api/response-format.md.
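
The per-agent key branch of agent authentication can be sketched as below. This is an illustration under assumptions, not the code in relay/mod.rs: `KeyRecord`, `AgentAuth`, and `authenticate_agent` are hypothetical names, the `hash` parameter stands in for the SHA-256 helper, and a `HashMap` stands in for the connect_agent_keys table. The support-code and deprecated shared-key branches are left out.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for a connect_agent_keys row.
pub struct KeyRecord {
    pub machine_id: String,
    pub revoked: bool, // true when revoked_at is set
}

#[derive(Debug, PartialEq, Eq)]
pub enum AgentAuth {
    Machine(String),
    Rejected,
}

/// Check a presented credential against stored key hashes.
/// `hash` stands in for the SHA-256 helper; `keys` maps key_hash -> record.
pub fn authenticate_agent<H>(
    presented: &str,
    hash: H,
    keys: &HashMap<String, KeyRecord>,
) -> AgentAuth
where
    H: Fn(&str) -> String,
{
    // Anything without the cak_ prefix (including a user JWT) is refused
    // before any lookup happens.
    if !presented.starts_with("cak_") {
        return AgentAuth::Rejected;
    }
    match keys.get(&hash(presented)) {
        // Only a known, non-revoked key authenticates, and it binds the
        // connection to the machine that owns the key.
        Some(rec) if !rec.revoked => AgentAuth::Machine(rec.machine_id.clone()),
        _ => AgentAuth::Rejected,
    }
}
```

Returning the machine identity from the key lookup, rather than trusting a client-supplied ID, is what lets later steps bind a connection to the authenticated machine.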


Task 3 (KEYSTONE): Secure relay WS handlers + bounded relay

Files touched: server/src/relay/mod.rs, server/src/session/mod.rs.

  • viewer_ws_handler (relay/mod.rs:242): verify the viewer token's signature + expiry + blacklist + session_id claim == requested session_id before handle_viewer_connection (relay/mod.rs:595). Reject otherwise. (Fixes the any-JWT-joins-any-session + blacklist-bypass CRITICALs.)
  • Viewer-token AUTHORIZATION (carry-forward from Task 2 review) — this is what actually closes audit CRITICAL #1. The minting endpoint POST /api/sessions/:id/viewer-token (in server/src/api/sessions.rs) must enforce a real permission predicate, not just AuthenticatedUser: user.is_admin() || user.has_permission(<policy>). GC's role model (admin|operator|viewer) + permissions table already exist (server/src/auth/mod.rs), so honoring the intra-tenant role distinction is cheap. Policy decision (Mike): admin-only, or admin-or-view_sessions-permission. Multi-tenant client-access isolation stays deferred to Phase 4; this is only the intra-tenant role gate.
  • agent_ws_handler (relay/mod.rs:55): authenticate via per-agent key OR support code only (Task 2). Persistent reattach must bind to the authenticated machine identity, not a query-string agent_id alone (session/mod.rs:98).
  • Frame caps: set explicit .max_message_size(...)/.max_frame_size(...) on both WebSocketUpgrades; reject oversized frames before to_vec()/broadcast. (Fixes WS-OOM HIGH.)
  • Input throttle: bound + rate-limit the viewer→agent input queue (relay/mod.rs:669); cap events/sec.
  • Reconcile managed sessions from DB on startup so they aren't orphaned.

Reference: audit Pass E (reports/2026-05-29-gc-audit.md §"Pass 5"); relay/mod.rs:242,55,595,669.
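
The viewer-side checks in the first bullet can be sketched as a pure function over already-decoded claims. This sketch assumes signature verification has happened before the call. `purpose`, `session_id`, and `tenant_id` come from the plan; `exp` is the standard JWT expiry claim; `jti` as the blacklist key, `ViewerReject`, and `check_viewer_claims` are hypothetical.

```rust
use std::collections::HashSet;

/// Decoded viewer-token claims (signature assumed already verified).
pub struct ViewerClaims {
    pub purpose: String,
    pub session_id: String,
    pub tenant_id: String,
    pub exp: u64,    // expiry, seconds since the Unix epoch
    pub jti: String, // hypothetical token ID used as the blacklist key
}

#[derive(Debug, PartialEq, Eq)]
pub enum ViewerReject {
    WrongPurpose,
    Expired,
    Revoked,
    SessionMismatch,
}

/// Run the pre-connection checks in order; any failure rejects the upgrade.
pub fn check_viewer_claims(
    claims: &ViewerClaims,
    requested_session: &str,
    now: u64,
    revoked: &HashSet<String>,
) -> Result<(), ViewerReject> {
    if claims.purpose != "viewer" {
        return Err(ViewerReject::WrongPurpose); // a login token is not a viewer token
    }
    if claims.exp <= now {
        return Err(ViewerReject::Expired);
    }
    if revoked.contains(&claims.jti) {
        return Err(ViewerReject::Revoked); // blacklist consulted on the WS path too
    }
    if claims.session_id != requested_session {
        return Err(ViewerReject::SessionMismatch); // token for A cannot join B
    }
    Ok(())
}
```

Keeping the checks in one function that runs before handle_viewer_connection makes each Task 8 manual check (wrong session, logged-out token) map to exactly one rejection variant.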


Task 4 (KEYSTONE): Working rate limiting + single-use support codes

Files touched: server/src/middleware/rate_limit.rs (rebuild — v1 is non-compiling), server/src/middleware/mod.rs, server/src/api/auth.rs (login), server/src/api/ (code validate), server/src/db/support_codes.rs, server/src/relay/mod.rs (code bind).

  • Rebuild a compiling rate-limit layer (either fix the tower_governor generics or write a small in-memory fixed-window limiter); re-enable pub mod rate_limit. Wire it to POST /api/auth/login, POST /api/auth/change-password, and the support-code validate route.
  • Single-use codes: consume atomically on first agent bind (accept only a pending, unconsumed code; set consumed_at); reject a second presenter. (Fixes the reusable-code HIGH.)
  • Widen the code: higher-entropy human-readable format (e.g. grouped base32, ~40+ bits) replacing 6-digit numeric; add per-IP lockout on repeated validate failures.
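
If the in-memory option is chosen, a fixed-window limiter could be as small as the sketch below. `FixedWindowLimiter` and its methods are hypothetical names; the clock is passed in as `now_secs` to keep the example deterministic, and wiring it into the middleware stack is not shown.

```rust
use std::collections::HashMap;

/// Minimal fixed-window limiter keyed by client (for example, an IP string).
pub struct FixedWindowLimiter {
    max_hits: u32,
    window_secs: u64,
    // key -> (start of the current window, hits seen in that window)
    windows: HashMap<String, (u64, u32)>,
}

impl FixedWindowLimiter {
    pub fn new(max_hits: u32, window_secs: u64) -> Self {
        assert!(window_secs > 0, "window must be at least one second");
        Self { max_hits, window_secs, windows: HashMap::new() }
    }

    /// Returns true if the request is allowed. `now_secs` is supplied by
    /// the caller so the limiter stays deterministic and easy to test.
    pub fn allow(&mut self, key: &str, now_secs: u64) -> bool {
        let window_start = now_secs - now_secs % self.window_secs;
        let entry = self
            .windows
            .entry(key.to_string())
            .or_insert((window_start, 0));
        if entry.0 != window_start {
            *entry = (window_start, 0); // a new window began: reset the count
        }
        if entry.1 >= self.max_hits {
            return false;
        }
        entry.1 += 1;
        true
    }
}
```

The map in this sketch grows without bound, so a real limiter would also need to evict stale keys and sit behind a lock or similar shared-state wrapper.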

Reference: audit Pass B/E (rate limiting disabled/non-compiling; reusable codes); middleware/mod.rs:3-11.
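
The single-use rule for support codes reduces to "the first presenter wins, everyone after is rejected". Here is a minimal in-memory sketch of that rule, with hypothetical names (`CodeStore`, `ConsumeError`) and a `HashMap` standing in for the support_codes table. In the database the same effect would presumably come from one conditional update that only matches rows where consumed_at is still NULL.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
pub enum ConsumeError {
    Unknown,
    AlreadyConsumed,
}

/// Hypothetical stand-in for support_codes: code -> consumed_at (None = unused).
pub struct CodeStore {
    codes: HashMap<String, Option<u64>>,
}

impl CodeStore {
    pub fn new() -> Self {
        Self { codes: HashMap::new() }
    }

    pub fn issue(&mut self, code: &str) {
        self.codes.insert(code.to_string(), None);
    }

    /// Consume a code on first agent bind. The check and the write happen
    /// under one exclusive borrow, so two presenters cannot both succeed.
    pub fn consume(&mut self, code: &str, now: u64) -> Result<(), ConsumeError> {
        match self.codes.get_mut(code) {
            None => Err(ConsumeError::Unknown),
            Some(consumed_at) if consumed_at.is_some() => Err(ConsumeError::AlreadyConsumed),
            Some(consumed_at) => {
                *consumed_at = Some(now); // mirrors setting consumed_at
                Ok(())
            }
        }
    }
}
```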


Task 5: Consent primitive

Files touched: proto/guruconnect.proto, agent/src/session/mod.rs, agent/src/ (consent UI dialog), server/src/relay/mod.rs, server/src/session/mod.rs.

  • Add ConsentRequest / ConsentResponse to the proto (after AdminCommand).
  • On an attended session, the agent shows a consent dialog to the end user; the server keeps the session consent_state = pending and surfaces it to the technician only on granted. denied/timeout → tear down.
  • Managed/unattended sessions follow per-tenant policy (default: silent for managed; consent_state = not_required). Audit the consent decision to events.

Reference: specs/native-remote-control/plan.md Task 5 (Consent primitive); proto AdminCommand insertion point.
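
The consent rules above amount to a small state machine over consent_state. The sketch below is one way to write it; the type and function names are hypothetical, and mapping a timeout to `Denied` is an assumption (the plan only says that denied or timeout leads to teardown).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConsentState {
    NotRequired,
    Pending,
    Granted,
    Denied,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConsentEvent {
    Accepted,
    Declined,
    TimedOut,
}

/// Default policy: managed sessions are silent, attended ones must ask.
pub fn initial_state(is_managed: bool) -> ConsentState {
    if is_managed { ConsentState::NotRequired } else { ConsentState::Pending }
}

/// Only a pending session reacts to a consent event; every other state is final.
pub fn apply(state: ConsentState, event: ConsentEvent) -> ConsentState {
    match (state, event) {
        (ConsentState::Pending, ConsentEvent::Accepted) => ConsentState::Granted,
        // Assumption: a timeout is recorded the same way as an explicit denial.
        (ConsentState::Pending, ConsentEvent::Declined | ConsentEvent::TimedOut) => {
            ConsentState::Denied
        }
        (other, _) => other,
    }
}

/// The technician sees the session only once consent is settled in its favour.
pub fn visible_to_technician(state: ConsentState) -> bool {
    matches!(state, ConsentState::NotRequired | ConsentState::Granted)
}

pub fn should_tear_down(state: ConsentState) -> bool {
    state == ConsentState::Denied
}
```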


Task 6: Native viewer — full key fidelity

Files touched: agent/src/viewer/ (low-level hook + input capture), agent/src/input/keyboard.rs (extend — salvaged), agent/src/input/mod.rs, agent/src/bin/sas_service.rs (wire — salvaged), proto/guruconnect.proto (confirm KeyEvent/SpecialKeyEvent coverage).

  • Viewer capture: install WH_KEYBOARD_LL while the viewer is in focused control; divert system combos (Win key, Win+R, Alt+Tab, Ctrl+Esc) and forward as KeyEvent (VK + scan code + extended flag) instead of letting the local shell act. Add a "send system keys to remote" toggle.
  • Agent injection: scan-code SendInput (KEYEVENTF_SCANCODE + correct KEYEVENTF_EXTENDEDKEY for right-Ctrl/Alt, arrows, Win, Insert/Home). Extend the salvaged input/keyboard.rs.
  • Ctrl+Alt+Del: SpecialKeyEvent.CTRL_ALT_DEL → the salvaged SAS service (bin/sas_service.rs, SendSAS, SYSTEM, SoftwareSASGeneration policy set by the managed installer).
  • Modifier hygiene: track modifier up/down; re-sync (release) on focus loss to kill stuck modifiers.

Reference: SPEC-002 §4.1/§4.2; salvage ledger §2; agent/src/input/keyboard.rs, agent/src/bin/sas_service.rs.
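
Modifier hygiene can be sketched independently of the Win32 hook: track which modifier virtual keys are down, and on focus loss emit a release for each one. `ModifierTracker` and its methods are hypothetical names; the virtual-key values are the standard Windows ones (VK_SHIFT 0x10, VK_CONTROL 0x11, VK_MENU 0x12, VK_LWIN 0x5B, VK_RWIN 0x5C, and the left/right variants 0xA0 to 0xA5).

```rust
use std::collections::BTreeSet;

/// True for Shift/Ctrl/Alt (generic and left/right variants) and the Win keys.
fn is_modifier(vk: u16) -> bool {
    matches!(vk, 0x10..=0x12 | 0x5B | 0x5C | 0xA0..=0xA5)
}

/// Tracks modifiers the viewer has sent as "down" but not yet as "up".
pub struct ModifierTracker {
    down: BTreeSet<u16>,
}

impl ModifierTracker {
    pub fn new() -> Self {
        Self { down: BTreeSet::new() }
    }

    /// Record a key transition; non-modifier keys are ignored.
    pub fn on_key(&mut self, vk: u16, pressed: bool) {
        if !is_modifier(vk) {
            return;
        }
        if pressed {
            self.down.insert(vk);
        } else {
            self.down.remove(&vk);
        }
    }

    /// Call on focus loss: returns the modifiers that still need a key-up
    /// sent to the remote, in ascending order, and clears the tracked state.
    pub fn release_all(&mut self) -> Vec<u16> {
        std::mem::take(&mut self.down).into_iter().collect()
    }
}
```

The caller would forward one key-up KeyEvent per returned code, so the remote never keeps a modifier held after the viewer loses focus.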


Task 7: Hardware H.264 encode + negotiated raw/Zstd fallback

Files touched: agent/src/encoder/ (mod.rs, h264.rs [new], raw.rs [salvaged]), agent/src/capture/ (feed), agent/src/viewer/ (decode), proto/guruconnect.proto (AgentStatus capability, SessionResponse codec), server/src/session/mod.rs (negotiation).

  • HW H.264 via Windows Media Foundation (transparently NVENC/AMF/QuickSync) emitting the proto's EncodedFrame (h264). Native viewer decodes via MF/D3D11.
  • Fallback: salvaged raw BGRA + Zstd + dirty-rects for Win7 / no HW encoder.
  • Negotiation: agent advertises HW-encode capability in AgentStatus; server selects the codec in SessionResponse; default H.264, HEVC opt-in, raw fallback. One encode feeds the native viewer now (and the Phase-2 WebRTC track later).

Reference: SPEC-002 §5; agent/src/encoder/raw.rs (salvaged), proto/guruconnect.proto (EncodedFrame already modeled).
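
The negotiation rule (default H.264, HEVC opt-in, raw fallback) fits in one selection function. The sketch below uses hypothetical names (`Codec`, `AgentCapabilities`, `select_codec`) and assumes the agent reports separate H.264 and HEVC capability flags; the real AgentStatus and SessionResponse fields may differ.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Codec {
    H264,
    Hevc,
    RawZstd,
}

/// Hypothetical capability flags the agent would advertise in AgentStatus.
pub struct AgentCapabilities {
    pub hw_h264: bool,
    pub hw_hevc: bool,
}

/// Server-side choice reported back in SessionResponse.
pub fn select_codec(caps: &AgentCapabilities, hevc_opt_in: bool) -> Codec {
    if hevc_opt_in && caps.hw_hevc {
        Codec::Hevc // only when explicitly requested and supported
    } else if caps.hw_h264 {
        Codec::H264 // the default whenever a hardware encoder exists
    } else {
        Codec::RawZstd // Win7 / no hardware encoder
    }
}
```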


Task 8: Verification (end-to-end, observable)

  • Security (re-audit clean): run /gc-audit --pass=security (and --pass=rust) → the three relay CRITICALs and the rate-limiting/frame-cap/reusable-code HIGHs are gone. Manually confirm: a revoked cak_ key is rejected on /ws/agent; a viewer token for session A is rejected on session B; a logged-out (blacklisted) viewer token is rejected on /ws/viewer; a user JWT is rejected as an agent key.
  • Attended flow: generate a support code → run the one-time agent → end user sees + accepts consent → technician's session appears only after acceptance; a denied consent tears down.
  • Key fidelity (the headline): in a live session, confirm that Win+R opens Run on the remote, that text copied with Ctrl+C on the remote pastes with Ctrl+V locally (and vice versa), and that Ctrl+Alt+Del reaches the remote secure desktop. Confirm no stuck modifiers after alt-tabbing away and back.
  • Codec: confirm that a HW-H.264 machine negotiates h264 (check SessionResponse), that a Win7/no-HW machine falls back to raw+Zstd, and that both render correctly in the native viewer.
  • Rate limiting: hammer /api/auth/login and the code-validate route → confirm throttling/lockout.
  • Migrations: fresh DB applies the v2 migrations cleanly; _sqlx_migrations consistent; tenant_id populated with the default tenant.