SPEC-002 Phase 1 Task 2 (specs/v2-secure-session-core), code-reviewed APPROVED.

- DELETE the JWT-as-agent-key branch in relay validate_agent_api_key (audit CRITICAL): agent auth now = per-agent cak_ key (SHA-256 -> connect_agent_keys, revoked filtered) OR support code OR deprecated shared AGENT_API_KEY (warned). A user JWT can no longer authenticate an agent.
- auth/agent_keys.rs: cak_ gen (OsRng 256-bit) + SHA-256 hash + verify.
- auth/jwt.rs: ViewerClaims + create/validate_viewer_token (5-min TTL, purpose=viewer, session_id+tenant_id claims; non-interchangeable with login).
- Admin key issuance: POST/GET/DELETE /api/machines/:agent_id/keys.
- POST /api/sessions/:id/viewer-token mints a session-bound short-lived token.
- Migration 005: organization/site/tags on connect_machines (fixes the silent update_machine_metadata write, coord todo faf39fe0).

NOTE: viewer-token minting is gated by AuthenticatedUser only; the AUTHORIZATION check (admin/permission gate) that closes audit CRITICAL #1 lands in Task 3 (the viewer WS verification). The viewer WS path (relay/mod.rs:285) is untouched here.

Not cargo-check-verified (no toolchain on the authoring host) - self-reviewed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
v2 Secure Session Core — Implementation Plan
Spec created: 2026-05-29
Status: in progress. Tasks 1-2 DONE 2026-05-29; Task 3 (relay WS) next. CARRY-FORWARD: Task 3 MUST add a viewer-token AUTHORIZATION check (admin/permission gate). Task 2 fixed only the token mechanism; the authz gate is what actually closes audit CRITICAL #1. Policy (admin-only vs admin-or-view-permission) pending Mike's decision.
Parent: docs/specs/SPEC-002-v2-modernization-architecture.md (Phase 1)
Keystone: Tasks 1–4 are the "get-right-first" secure auth/session core; every audit CRITICAL/HIGH is closed there. Tasks 5–7 deliver the product capability on top. Do them in order.
Task 0: Commit this spec
Commit the specs/v2-secure-session-core/ directory before writing any code:
git add specs/v2-secure-session-core/
git commit -m "spec: add v2-secure-session-core shape spec"
Do not start Task 1 until this commit exists.
Task 1 (KEYSTONE) [DONE 2026-05-29]: v2 schema — per-agent keys + tenancy-ready tables
[DONE] migration 004_v2_secure_session_core.sql + db/agent_keys.rs + db/tenancy.rs + struct/query updates across machines/sessions/support_codes/events/users. Code-reviewed APPROVED. Note: GC's db layer already uses runtime sqlx::query() (no macros), so the v2 "switch to runtime" was already true.
Files touched: server/migrations/ (new v2 migration files), server/src/db/ (rebuilt modules:
agent_keys.rs [new], sessions.rs, machines.rs, support_codes.rs, events.rs, users.rs,
mod.rs).
- New table connect_agent_keys: id UUID, machine_id UUID FK, key_hash TEXT, tenant_id UUID NULL, created_at, last_used_at, revoked_at TIMESTAMPTZ NULL. Keys are cak_-prefixed, stored hashed (SHA-256, mirroring v1's hash helper); plaintext returned once at issuance.
- Add nullable tenant_id UUID to machines, sessions, support_codes, events, users, connect_agent_keys, defaulting to a single bootstrap tenant row. Add a tenants table with one seed row.
- sessions gains is_managed BOOLEAN, source TEXT ('standalone'|'gururmm'), consent_state TEXT ('not_required'|'pending'|'granted'|'denied'), tenant_id.
- support_codes: add single-use semantics (consumed_at TIMESTAMPTZ NULL); widen the code column to hold a higher-entropy human-readable code (see Task 4).
- Migrations are idempotent (CREATE TABLE IF NOT EXISTS / ADD COLUMN IF NOT EXISTS), applied on server startup, recorded in _sqlx_migrations. New queries use runtime sqlx::query().
- DB modules expose a tenancy helper that today resolves every call to the default tenant (Phase-4 switch point). Struct fields match columns (the v1 audit flagged 3 unmapped v003 columns; don't repeat).
Reference: specs/native-remote-control/plan.md Task 2 (connect_agent_keys); .claude/standards/gururmm/sqlx-migrations.md.
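The TEXT columns added to sessions take a small closed set of values, so the db structs can carry them as enums rather than raw strings. A minimal sketch of that mapping, assuming hypothetical type names (ConsentState, SessionSource, UnknownColumnValue) that are not taken from the codebase:

```rust
use std::str::FromStr;

/// Values of the sessions.consent_state TEXT column listed in Task 1.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConsentState {
    NotRequired,
    Pending,
    Granted,
    Denied,
}

/// Values of the sessions.source TEXT column listed in Task 1.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SessionSource {
    Standalone,
    Gururmm,
}

/// Hypothetical error: the column held a value outside the documented set.
#[derive(Debug, PartialEq, Eq)]
pub struct UnknownColumnValue(pub String);

impl ConsentState {
    /// The exact string written to the column.
    pub fn as_str(self) -> &'static str {
        match self {
            ConsentState::NotRequired => "not_required",
            ConsentState::Pending => "pending",
            ConsentState::Granted => "granted",
            ConsentState::Denied => "denied",
        }
    }
}

impl FromStr for ConsentState {
    type Err = UnknownColumnValue;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "not_required" => Ok(ConsentState::NotRequired),
            "pending" => Ok(ConsentState::Pending),
            "granted" => Ok(ConsentState::Granted),
            "denied" => Ok(ConsentState::Denied),
            other => Err(UnknownColumnValue(other.to_string())),
        }
    }
}

impl SessionSource {
    /// The exact string written to the column.
    pub fn as_str(self) -> &'static str {
        match self {
            SessionSource::Standalone => "standalone",
            SessionSource::Gururmm => "gururmm",
        }
    }
}

impl FromStr for SessionSource {
    type Err = UnknownColumnValue;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "standalone" => Ok(SessionSource::Standalone),
            "gururmm" => Ok(SessionSource::Gururmm),
            other => Err(UnknownColumnValue(other.to_string())),
        }
    }
}
```

Parsing at the db boundary turns an unexpected column value into an explicit error instead of a silently unmapped field.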
Task 2 (KEYSTONE) [DONE 2026-05-29 — code-reviewed APPROVED]: Rebuilt auth model — plane separation + session-scoped viewer tokens
CARRY-FORWARD TO TASK 3: viewer-token minting is gated only by AuthenticatedUser (authentication, not authorization). GC has a real admin|operator|viewer role + permissions model, so this is intra-tenant privilege escalation until a permission check is added. The mechanism is fixed here; the authz check in Task 3 is what closes audit CRITICAL #1. Metadata bug todo faf39fe0 resolved (migration 005).
[IMPLEMENTED] auth/agent_keys.rs [new] (cak_ mint/SHA-256 hash/verify), auth/jwt.rs (ViewerClaims + create_viewer_token/validate_viewer_token, 5-min TTL, purpose:"viewer"), auth/mod.rs (module + re-export). Deleted the JWT-as-agent-key branch in relay/mod.rs validate_agent_api_key: it now accepts a per-agent cak_ key OR the deprecated shared AGENT_API_KEY (WARNING-logged), never a user JWT. New endpoints: POST/GET /api/machines/:agent_id/keys, DELETE /api/machines/:agent_id/keys/:key_id (admin), POST /api/sessions/:id/viewer-token (dashboard JWT). db helpers added: agent_keys::{list_for_machine, key_belongs_to_machine}. Folded in migration 005_machine_metadata.sql + Machine struct org/site/tags mapping (coord todo faf39fe0). No Rust toolchain on this machine; self-reviewed, not yet cargo check-verified.
Files touched: server/src/auth/ (mod.rs, jwt.rs, agent_keys.rs [new], token_blacklist.rs,
password.rs), server/src/api/auth.rs, server/src/api/sessions.rs.
- Delete the JWT-as-agent-key path. Remove the jwt_config.validate_token branch from validate_agent_api_key (relay/mod.rs:224); agent authentication validates a cak_ per-agent key (hash compare against connect_agent_keys, reject if revoked_at set) OR a support code, never a user JWT.
- Session-scoped viewer tokens: new endpoint POST /api/sessions/:id/viewer-token (auth: dashboard JWT + authorization that the user may view that session/machine) mints a short-lived (~5 min) JWT whose claims include session_id and tenant_id. This replaces "any dashboard JWT can view any session."
- Keep Argon2id passwords; keep the blacklist but make is_revoked() callable from the WS layer (Task 3).
- Per-agent key issuance endpoint (admin): POST /api/machines/:id/keys → returns plaintext cak_ once, stores hash. DELETE /api/machines/:id/keys/:key_id sets revoked_at.
Reference: relay/mod.rs:224 (validate_agent_api_key — the CRITICAL), auth/mod.rs:116
(blacklist already consulted for REST — extend to WS), specs/native-remote-control/plan.md Tasks 2/3/6;
.claude/standards/security/credential-handling.md, .claude/standards/api/response-format.md.
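The agent-side decision order described in this task can be sketched as a pure function. This is an illustration under assumptions, not the code in relay/mod.rs: AgentKeyRow, AgentAuth, and authenticate_agent are hypothetical names, the hash function is injected (the real one is SHA-256, which std does not provide), and the deprecated shared AGENT_API_KEY branch is left out for brevity.

```rust
/// One connect_agent_keys row, reduced to the fields this check needs.
/// machine_id is a String here as a stand-in for the UUID column.
pub struct AgentKeyRow {
    pub machine_id: String,
    pub key_hash: String,
    /// Unix seconds; None means the key is still active.
    pub revoked_at: Option<u64>,
}

/// Result of checking what an agent presented at connect time.
#[derive(Debug, PartialEq, Eq)]
pub enum AgentAuth {
    PerAgentKey { machine_id: String },
    SupportCode,
    Rejected(&'static str),
}

/// Decide how (or whether) a presented credential authenticates an agent.
/// `hash` stands in for the SHA-256 helper; `support_code_is_valid` stands in
/// for the support-code lookup. Both are assumptions of this sketch.
pub fn authenticate_agent(
    presented: &str,
    rows: &[AgentKeyRow],
    hash: impl Fn(&str) -> String,
    support_code_is_valid: impl Fn(&str) -> bool,
) -> AgentAuth {
    if presented.starts_with("cak_") {
        // Compare hashes only; the plaintext key is never stored.
        let wanted = hash(presented);
        return match rows.iter().find(|r| r.key_hash == wanted) {
            Some(row) if row.revoked_at.is_some() => AgentAuth::Rejected("key revoked"),
            Some(row) => AgentAuth::PerAgentKey {
                machine_id: row.machine_id.clone(),
            },
            None => AgentAuth::Rejected("unknown key"),
        };
    }
    if support_code_is_valid(presented) {
        return AgentAuth::SupportCode;
    }
    // There is deliberately no JWT branch: a user token falls through to here.
    AgentAuth::Rejected("not an agent credential")
}
```

The point of the shape is that rejection is the default: anything that is neither a known, unrevoked cak_ key nor a valid support code, including a well-formed user JWT, ends at the final line.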
Task 3 (KEYSTONE): Secure relay WS handlers + bounded relay
Files touched: server/src/relay/mod.rs, server/src/session/mod.rs.
- viewer_ws_handler (relay/mod.rs:242): verify the viewer token's signature + expiry + blacklist + session_id claim == requested session_id before handle_viewer_connection (relay/mod.rs:595). Reject otherwise. (Fixes the any-JWT-joins-any-session + blacklist-bypass CRITICALs.)
- Viewer-token AUTHORIZATION (carry-forward from Task 2 review): this is what actually closes audit CRITICAL #1. The minting endpoint POST /api/sessions/:id/viewer-token (in server/src/api/sessions.rs) must enforce a real permission predicate, not just AuthenticatedUser: user.is_admin() || user.has_permission(<policy>). GC's role model (admin|operator|viewer) + permissions table already exist (server/src/auth/mod.rs), so honoring the intra-tenant role distinction is cheap. Policy decision (Mike): admin-only, or admin-or-view_sessions-permission. Multi-tenant client-access isolation stays deferred to Phase 4; this is only the intra-tenant role gate.
- agent_ws_handler (relay/mod.rs:55): authenticate via per-agent key OR support code only (Task 2). Persistent reattach must bind to the authenticated machine identity, not a query-string agent_id alone (session/mod.rs:98).
- Frame caps: set explicit .max_message_size(...) / .max_frame_size(...) on both WebSocketUpgrades; reject oversized frames before to_vec()/broadcast. (Fixes WS-OOM HIGH.)
- Input throttle: bound + rate-limit the viewer→agent input queue (relay/mod.rs:669); cap events/sec.
- Reconcile managed sessions from DB on startup so they aren't orphaned.
Reference: audit Pass E (reports/2026-05-29-gc-audit.md §"Pass 5"); relay/mod.rs:242,55,595,669.
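The two viewer-side gates in this task (the check at WS upgrade and the authorization predicate at minting) can be sketched as follows. This is a hedged illustration: the struct layouts, ViewerReject, MintPolicy, and both function names are hypothetical, signature verification is assumed to have already happened when the claims were decoded, and the blacklist lookup is reduced to a boolean supplied by the caller.

```rust
/// Claims carried by a viewer token (field set follows Task 2; exp is unix seconds).
pub struct ViewerClaims {
    pub purpose: String,
    pub session_id: String,
    pub tenant_id: String,
    pub exp: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ViewerReject {
    WrongPurpose,
    Expired,
    Revoked,
    SessionMismatch,
}

/// Checks to run before handing the socket to the viewer connection handler.
/// `revoked` is the result of the blacklist lookup for this token.
pub fn admit_viewer(
    claims: &ViewerClaims,
    requested_session_id: &str,
    now: u64,
    revoked: bool,
) -> Result<(), ViewerReject> {
    if claims.purpose != "viewer" {
        return Err(ViewerReject::WrongPurpose); // a login token is not a viewer token
    }
    if now >= claims.exp {
        return Err(ViewerReject::Expired);
    }
    if revoked {
        return Err(ViewerReject::Revoked);
    }
    if claims.session_id != requested_session_id {
        return Err(ViewerReject::SessionMismatch); // token for session A, socket for session B
    }
    Ok(())
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    Admin,
    Operator,
    Viewer,
}

pub struct User {
    pub role: Role,
    pub permissions: Vec<String>,
}

impl User {
    pub fn is_admin(&self) -> bool {
        self.role == Role::Admin
    }

    pub fn has_permission(&self, name: &str) -> bool {
        self.permissions.iter().any(|p| p.as_str() == name)
    }
}

/// The two candidate policies named in the plan; which one applies is undecided.
pub enum MintPolicy {
    AdminOnly,
    AdminOrPermission(&'static str),
}

/// Authorization predicate for the viewer-token minting endpoint.
pub fn may_mint_viewer_token(user: &User, policy: &MintPolicy) -> bool {
    match policy {
        MintPolicy::AdminOnly => user.is_admin(),
        MintPolicy::AdminOrPermission(p) => user.is_admin() || user.has_permission(p),
    }
}
```

Keeping the policy as a value means the pending admin-only versus admin-or-permission decision changes one argument at the call site rather than the predicate itself.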
Task 4 (KEYSTONE): Working rate limiting + single-use support codes
Files touched: server/src/middleware/rate_limit.rs (rebuild — v1 is non-compiling),
server/src/middleware/mod.rs, server/src/api/auth.rs (login), server/src/api/ (code validate),
server/src/db/support_codes.rs, server/src/relay/mod.rs (code bind).
- Rebuild a compiling rate-limit layer (fix the tower_governor generics, or a small in-memory fixed-window limiter); re-enable pub mod rate_limit. Wire it to POST /api/auth/login, POST /api/auth/change-password, and the support-code validate route.
- Single-use codes: consume atomically on first agent bind (accept only a pending, unconsumed code; set consumed_at); reject a second presenter. (Fixes the reusable-code HIGH.)
- Widen the code: higher-entropy human-readable format (e.g. grouped base32, ~40+ bits) replacing 6-digit numeric; add per-IP lockout on repeated validate failures.
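The single-use rule reduces to a check-and-set that must happen as one step. A minimal in-memory sketch of that semantics, assuming hypothetical names (SupportCode, CodeStore); in the real db layer the same effect would presumably come from one conditional UPDATE guarded on consumed_at being NULL, which is an assumption rather than something the plan specifies:

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Reduced view of a support_codes row.
pub struct SupportCode {
    pub pending: bool,
    /// Unix seconds of the first successful bind; None until consumed.
    pub consumed_at: Option<u64>,
}

/// In-memory stand-in for the support_codes table.
pub struct CodeStore {
    codes: Mutex<HashMap<String, SupportCode>>,
}

impl CodeStore {
    pub fn new() -> Self {
        CodeStore {
            codes: Mutex::new(HashMap::new()),
        }
    }

    /// Register a freshly generated, pending code.
    pub fn issue(&self, code: &str) {
        let mut codes = self.codes.lock().unwrap();
        codes.insert(
            code.to_string(),
            SupportCode {
                pending: true,
                consumed_at: None,
            },
        );
    }

    /// Consume a code on first agent bind. The check and the write happen
    /// under one lock, so two concurrent presenters cannot both succeed.
    pub fn consume(&self, code: &str, now: u64) -> bool {
        let mut codes = self.codes.lock().unwrap();
        if let Some(entry) = codes.get_mut(code) {
            if entry.pending && entry.consumed_at.is_none() {
                entry.consumed_at = Some(now);
                return true;
            }
        }
        false
    }
}
```

The second presenter of the same code gets false even if it arrives before the first session finishes, which is the behavior the reusable-code finding asks for.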
Reference: audit Pass B/E (rate limiting disabled/non-compiling; reusable codes); middleware/mod.rs:3-11.
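If the tower_governor route proves awkward, the fixed-window alternative named above is small. A sketch under assumptions: FixedWindowLimiter and its method names are hypothetical, time is passed in as unix seconds so the logic is testable, and the caller is expected to hold the limiter behind whatever lock the middleware uses.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Per-IP fixed-window counter: at most `limit` requests per `window_secs`.
pub struct FixedWindowLimiter {
    limit: u32,
    window_secs: u64,
    /// For each IP: (window index, hits seen in that window).
    hits: HashMap<IpAddr, (u64, u32)>,
}

impl FixedWindowLimiter {
    pub fn new(limit: u32, window_secs: u64) -> Self {
        assert!(window_secs > 0, "window must be at least one second");
        FixedWindowLimiter {
            limit,
            window_secs,
            hits: HashMap::new(),
        }
    }

    /// Returns true if the request is allowed and counts it; false if throttled.
    pub fn allow(&mut self, ip: IpAddr, now_secs: u64) -> bool {
        let window = now_secs / self.window_secs;
        let entry = self.hits.entry(ip).or_insert((window, 0));
        if entry.0 != window {
            // A new window started: reset the counter for this IP.
            *entry = (window, 0);
        }
        if entry.1 >= self.limit {
            return false;
        }
        entry.1 += 1;
        true
    }
}
```

This version never evicts idle IPs, so a production limiter would also need periodic cleanup of stale entries; the same counter, fed only by failed validations, could back the per-IP lockout.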
Task 5: Attended-mode consent
Files touched: proto/guruconnect.proto, agent/src/session/mod.rs, agent/src/ (consent UI dialog),
server/src/relay/mod.rs, server/src/session/mod.rs.
- Add ConsentRequest/ConsentResponse to the proto (after AdminCommand).
- On an attended session, the agent shows a consent dialog to the end user; the server keeps the session consent_state = pending and surfaces it to the technician only on granted. denied/timeout → tear down.
- Managed/unattended sessions follow per-tenant policy (default: silent for managed; consent_state = not_required). Audit the consent decision to events.
Reference: specs/native-remote-control/plan.md Task 5 (Consent primitive); proto AdminCommand insertion point.
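The consent rules above amount to a small state machine on the server side. A sketch, assuming hypothetical names (ConsentEvent, Step, and the three functions) and one choice the plan leaves open: a timeout is recorded here as denied.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConsentState {
    NotRequired,
    Pending,
    Granted,
    Denied,
}

/// What the agent reports back from the consent dialog.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConsentEvent {
    Accepted,
    Denied,
    TimedOut,
}

/// What the server should do after applying an event.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Step {
    Surface,
    TearDown,
    Ignore,
}

/// Starting state: managed sessions are silent unless the tenant policy
/// asks for consent; attended sessions always start pending.
pub fn initial_state(is_managed: bool, tenant_wants_consent_for_managed: bool) -> ConsentState {
    if is_managed && !tenant_wants_consent_for_managed {
        ConsentState::NotRequired
    } else {
        ConsentState::Pending
    }
}

/// Only a pending session reacts to consent events; anything else is ignored.
pub fn on_consent_event(state: ConsentState, event: ConsentEvent) -> (ConsentState, Step) {
    match (state, event) {
        (ConsentState::Pending, ConsentEvent::Accepted) => (ConsentState::Granted, Step::Surface),
        // Assumption of this sketch: a timeout is stored as denied.
        (ConsentState::Pending, ConsentEvent::Denied)
        | (ConsentState::Pending, ConsentEvent::TimedOut) => (ConsentState::Denied, Step::TearDown),
        (other, _) => (other, Step::Ignore),
    }
}

/// The technician sees the session only once consent is granted or not needed.
pub fn visible_to_technician(state: ConsentState) -> bool {
    matches!(state, ConsentState::Granted | ConsentState::NotRequired)
}
```

Each (state, event) pair that changes state is also the natural point to write the audit row to events.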
Task 6: Native viewer — full key fidelity
Files touched: agent/src/viewer/ (low-level hook + input capture), agent/src/input/keyboard.rs
(extend — salvaged), agent/src/input/mod.rs, agent/src/bin/sas_service.rs (wire — salvaged),
proto/guruconnect.proto (confirm KeyEvent/SpecialKeyEvent coverage).
- Viewer capture: install WH_KEYBOARD_LL while the viewer is in focused control; divert system combos (Win key, Win+R, Alt+Tab, Ctrl+Esc) and forward as KeyEvent (VK + scan code + extended flag) instead of letting the local shell act. Add a "send system keys to remote" toggle.
- Agent injection: scan-code SendInput (KEYEVENTF_SCANCODE + correct KEYEVENTF_EXTENDEDKEY for right-Ctrl/Alt, arrows, Win, Insert/Home). Extend the salvaged input/keyboard.rs.
- Ctrl+Alt+Del: SpecialKeyEvent.CTRL_ALT_DEL → the salvaged SAS service (bin/sas_service.rs, SendSAS, SYSTEM, SoftwareSASGeneration policy set by the managed installer).
- Modifier hygiene: track modifier up/down; re-sync (release) on focus loss to kill stuck modifiers.
Reference: SPEC-002 §4.1/§4.2; salvage ledger §2; agent/src/input/keyboard.rs, agent/src/bin/sas_service.rs.
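The modifier-hygiene bullet can be sketched independently of the hook and of SendInput. ModifierTracker and its methods are hypothetical names; the constants are the standard Windows virtual-key codes for the left/right modifier keys. The caller is assumed to turn each returned code into a key-up KeyEvent.

```rust
use std::collections::BTreeSet;

// Standard Windows virtual-key codes for the modifier keys.
pub const VK_LWIN: u16 = 0x5B;
pub const VK_RWIN: u16 = 0x5C;
pub const VK_LSHIFT: u16 = 0xA0;
pub const VK_RSHIFT: u16 = 0xA1;
pub const VK_LCONTROL: u16 = 0xA2;
pub const VK_RCONTROL: u16 = 0xA3;
pub const VK_LMENU: u16 = 0xA4; // left Alt
pub const VK_RMENU: u16 = 0xA5; // right Alt

fn is_modifier(vk: u16) -> bool {
    matches!(vk, VK_LSHIFT..=VK_RMENU | VK_LWIN | VK_RWIN)
}

/// Remembers which modifiers the viewer has forwarded as "down" to the remote.
pub struct ModifierTracker {
    down: BTreeSet<u16>,
}

impl ModifierTracker {
    pub fn new() -> Self {
        ModifierTracker {
            down: BTreeSet::new(),
        }
    }

    /// Record a forwarded key event; non-modifier keys are ignored.
    pub fn on_key(&mut self, vk: u16, pressed: bool) {
        if !is_modifier(vk) {
            return;
        }
        if pressed {
            self.down.insert(vk);
        } else {
            self.down.remove(&vk);
        }
    }

    /// Call on focus loss: returns the modifiers still held (ascending VK
    /// order) so the caller can send a key-up for each, then forgets them.
    pub fn release_all(&mut self) -> Vec<u16> {
        std::mem::take(&mut self.down).into_iter().collect()
    }
}
```

Alt+Tab away from the viewer is the typical trigger: Alt went down while the viewer had focus, the key-up is delivered to another window, and without the release the remote would keep Alt held.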
Task 7: Hardware H.264 encode + negotiated raw/Zstd fallback
Files touched: agent/src/encoder/ (mod.rs, h264.rs [new], raw.rs [salvaged]),
agent/src/capture/ (feed), agent/src/viewer/ (decode), proto/guruconnect.proto
(AgentStatus capability, SessionResponse codec), server/src/session/mod.rs (negotiation).
- HW H.264 via Windows Media Foundation (transparently NVENC/AMF/QuickSync) emitting the proto's EncodedFrame (h264). Native viewer decodes via MF/D3D11.
- Fallback: salvaged raw BGRA + Zstd + dirty-rects for Win7 / no HW encoder.
- Negotiation: agent advertises HW-encode capability in AgentStatus; server selects the codec in SessionResponse; default H.264, HEVC opt-in, raw fallback. One encode feeds the native viewer now (and the Phase-2 WebRTC track later).
Reference: SPEC-002 §5; agent/src/encoder/raw.rs (salvaged), proto/guruconnect.proto (EncodedFrame already modeled).
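The negotiation rule (default H.264, HEVC opt-in, raw fallback) is a three-way choice on the server. A sketch assuming hypothetical names: AgentCaps and its two booleans stand in for whatever capability fields AgentStatus ends up carrying, and Codec stands in for the SessionResponse codec value.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Codec {
    H264,
    Hevc,
    RawZstd,
}

/// Hardware-encode capability as advertised by the agent (hypothetical fields).
pub struct AgentCaps {
    pub hw_h264: bool,
    pub hw_hevc: bool,
}

/// Server-side selection: HEVC only when opted in and supported,
/// otherwise H.264 when supported, otherwise the raw + Zstd fallback.
pub fn select_codec(caps: &AgentCaps, hevc_opt_in: bool) -> Codec {
    if hevc_opt_in && caps.hw_hevc {
        Codec::Hevc
    } else if caps.hw_h264 {
        Codec::H264
    } else {
        Codec::RawZstd
    }
}
```

An HEVC opt-in on an agent without HEVC hardware degrades to H.264 rather than to raw, so the opt-in never makes a session worse than the default.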
Task 8: Verification (end-to-end, observable)
- Security (re-audit clean): run /gc-audit --pass=security (and --pass=rust) → the three relay CRITICALs and the rate-limiting/frame-cap/reusable-code HIGHs are gone. Manually confirm: a revoked cak_ key is rejected on /ws/agent; a viewer token for session A is rejected on session B; a logged-out (blacklisted) viewer token is rejected on /ws/viewer; a user JWT is rejected as an agent key.
- Attended flow: generate a support code → run the one-time agent → end user sees + accepts consent → technician's session appears only after acceptance; a denied consent tears down.
- Key fidelity (the headline): in a live session, confirm Win+R opens Run on the remote, Ctrl+C on remote / Ctrl+V locally (and vice-versa, text), and Ctrl+Alt+Del reaches the remote secure desktop. Confirm no stuck modifiers after alt-tabbing away and back.
- Codec: confirm a HW-H.264 machine negotiates h264 (check SessionResponse), a Win7/no-HW machine falls back to raw+Zstd, both render correctly in the native viewer.
- Rate limiting: hammer /api/auth/login and the code-validate route → confirm throttling/lockout.
- Migrations: fresh DB applies the v2 migrations cleanly; _sqlx_migrations consistent; tenant_id populated with the default tenant.