Files
guru-connect/specs/v2-secure-session-core/plan.md
Mike Swanson fef8111ff3
All checks were successful
Build and Test / Build Agent (Windows) (push) Successful in 6m7s
Build and Test / Build Server (Linux) (push) Successful in 10m15s
Build and Test / Security Audit (push) Successful in 4m24s
Build and Test / Build Summary (push) Successful in 12s
feat(server): v2 secure-session-core Task 1 - schema + per-agent keys
SPEC-002 Phase 1 Task 1 (specs/v2-secure-session-core), code-reviewed APPROVED.

Migration 004 (idempotent, server-applied): tenants + seeded default tenant,
connect_agent_keys (hash-only, revocable, FK->connect_machines), nullable
tenant_id on all scoped tables (tenancy-ready, not tenant-yet), connect_sessions
is_managed/source/consent_state, connect_support_codes consumed_at. New db
modules agent_keys.rs (stores only key_hash) + tenancy.rs (DEFAULT_TENANT_ID,
Phase-4 switch point). Struct/query updates across machines/sessions/
support_codes/events/users. Runtime sqlx throughout (GC db layer already uses
it - no compile-time macros).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 18:33:26 -07:00

176 lines
10 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# v2 Secure Session Core — Implementation Plan
> Spec created: 2026-05-29
> Status: in progress — Task 1 (schema) DONE 2026-05-29; Task 2 (auth) next
> Parent: `docs/specs/SPEC-002-v2-modernization-architecture.md` (Phase 1)
> Keystone: Tasks 14 are the "get-right-first" secure auth/session core — every audit CRITICAL/HIGH
> is closed there. Tasks 57 deliver the product capability on top. Do them in order.
## Task 0: Commit this spec
Commit the `specs/v2-secure-session-core/` directory before writing any code:
```
git add specs/v2-secure-session-core/
git commit -m "spec: add v2-secure-session-core shape spec"
```
Do not start Task 1 until this commit exists.
---
## Task 1 (KEYSTONE) [DONE 2026-05-29]: v2 schema — per-agent keys + tenancy-ready tables
> [DONE] migration `004_v2_secure_session_core.sql` + `db/agent_keys.rs` + `db/tenancy.rs` + struct/query
> updates across machines/sessions/support_codes/events/users. Code-reviewed APPROVED. Note: GC's db
> layer already uses runtime `sqlx::query()` (no macros) — the v2 "switch to runtime" was already true.
Files touched: `server/migrations/` (new v2 migration files), `server/src/db/` (rebuilt modules:
`agent_keys.rs` [new], `sessions.rs`, `machines.rs`, `support_codes.rs`, `events.rs`, `users.rs`,
`mod.rs`).
- New table `connect_agent_keys`: `id UUID`, `machine_id UUID FK`, `key_hash TEXT`, `tenant_id UUID NULL`,
`created_at`, `last_used_at`, `revoked_at TIMESTAMPTZ NULL`. Keys are `cak_`-prefixed, stored hashed
(SHA-256, mirroring v1's hash helper); plaintext returned once at issuance.
- Add nullable `tenant_id UUID` to `machines`, `sessions`, `support_codes`, `events`, `users`,
`connect_agent_keys`, defaulting to a single bootstrap tenant row. Add a `tenants` table with one seed row.
- `sessions` gains `is_managed BOOLEAN`, `source TEXT` (`'standalone'|'gururmm'`), `consent_state TEXT`
(`'not_required'|'pending'|'granted'|'denied'`), `tenant_id`.
- `support_codes`: add single-use semantics — `consumed_at TIMESTAMPTZ NULL`; widen the code column to
hold a higher-entropy human-readable code (see Task 4).
- Migrations are **idempotent** (`CREATE TABLE IF NOT EXISTS` / `ADD COLUMN IF NOT EXISTS`), applied on
server startup, recorded in `_sqlx_migrations`. New queries use runtime `sqlx::query()`.
- DB modules expose a `tenancy` helper that today resolves every call to the default tenant (Phase-4
switch point). Struct fields match columns (the v1 audit flagged 3 unmapped v003 columns — don't repeat).
Reference: `specs/native-remote-control/plan.md` Task 2 (`connect_agent_keys`); `.claude/standards/gururmm/sqlx-migrations.md`.
---
## Task 2 (KEYSTONE): Rebuilt auth model — plane separation + session-scoped viewer tokens
Files touched: `server/src/auth/` (`mod.rs`, `jwt.rs`, `agent_keys.rs` [new], `token_blacklist.rs`,
`password.rs`), `server/src/api/auth.rs`, `server/src/api/sessions.rs`.
- **Delete the JWT-as-agent-key path.** Remove the `jwt_config.validate_token` branch from
`validate_agent_api_key` (`relay/mod.rs:224`); agent authentication validates a `cak_` per-agent key
(hash compare against `connect_agent_keys`, reject if `revoked_at` set) OR a support code — **never a
user JWT**.
- **Session-scoped viewer tokens:** new endpoint `POST /api/sessions/:id/viewer-token` (auth: dashboard
JWT + authorization that the user may view that session/machine) mints a short-lived (~5 min) JWT whose
claims include `session_id` and `tenant_id`. This replaces "any dashboard JWT can view any session."
- Keep Argon2id passwords; keep the blacklist but make `is_revoked()` callable from the WS layer (Task 3).
- Per-agent key issuance endpoint (admin): `POST /api/machines/:id/keys` → returns plaintext `cak_` once,
stores hash. `DELETE /api/machines/:id/keys/:key_id` sets `revoked_at`.
Reference: `relay/mod.rs:224` (`validate_agent_api_key` — the CRITICAL), `auth/mod.rs:116`
(blacklist already consulted for REST — extend to WS), `specs/native-remote-control/plan.md` Tasks 2/3/6;
`.claude/standards/security/credential-handling.md`, `.claude/standards/api/response-format.md`.
---
## Task 3 (KEYSTONE): Secure relay WS handlers + bounded relay
Files touched: `server/src/relay/mod.rs`, `server/src/session/mod.rs`.
- **`viewer_ws_handler`** (`relay/mod.rs:242`): verify the viewer token's **signature + expiry +
blacklist + `session_id` claim == requested `session_id`** before `handle_viewer_connection`
(`relay/mod.rs:595`). Reject otherwise. (Fixes the any-JWT-joins-any-session + blacklist-bypass CRITICALs.)
- **`agent_ws_handler`** (`relay/mod.rs:55`): authenticate via per-agent key OR support code only
(Task 2). Persistent reattach must bind to the authenticated machine identity, not a query-string
`agent_id` alone (`session/mod.rs:98`).
- **Frame caps:** set explicit `.max_message_size(...)`/`.max_frame_size(...)` on both
`WebSocketUpgrade`s; reject oversized frames before `to_vec()`/broadcast. (Fixes WS-OOM HIGH.)
- **Input throttle:** bound + rate-limit the viewer→agent input queue (`relay/mod.rs:669`); cap events/sec.
- Reconcile managed sessions from DB on startup so they aren't orphaned.
Reference: audit Pass E (`reports/2026-05-29-gc-audit.md` §"Pass 5"); `relay/mod.rs:242,55,595,669`.
---
## Task 4 (KEYSTONE): Working rate limiting + single-use support codes
Files touched: `server/src/middleware/rate_limit.rs` (rebuild — v1 is non-compiling),
`server/src/middleware/mod.rs`, `server/src/api/auth.rs` (login), `server/src/api/` (code validate),
`server/src/db/support_codes.rs`, `server/src/relay/mod.rs` (code bind).
- Rebuild a **compiling** rate-limit layer (fix the `tower_governor` generics, or a small in-memory
fixed-window limiter); re-enable `pub mod rate_limit`. Wire it to `POST /api/auth/login`,
`POST /api/auth/change-password`, and the support-code validate route.
- **Single-use codes:** consume atomically on first agent bind (accept only a `pending`, unconsumed code;
set `consumed_at`); reject a second presenter. (Fixes the reusable-code HIGH.)
- **Widen the code:** higher-entropy human-readable format (e.g. grouped base32, ~40+ bits) replacing
6-digit numeric; add per-IP lockout on repeated validate failures.
Reference: audit Pass B/E (rate limiting disabled/non-compiling; reusable codes); `middleware/mod.rs:3-11`.
---
## Task 5: Attended-mode consent
Files touched: `proto/guruconnect.proto`, `agent/src/session/mod.rs`, `agent/src/` (consent UI dialog),
`server/src/relay/mod.rs`, `server/src/session/mod.rs`.
- Add `ConsentRequest` / `ConsentResponse` to the proto (after `AdminCommand`).
- On an attended session, the agent shows a consent dialog to the end user; the server keeps the session
`consent_state = pending` and surfaces it to the technician only on `granted`. `denied`/timeout → tear down.
- Managed/unattended sessions follow per-tenant policy (default: silent for managed; `consent_state =
not_required`). Audit the consent decision to `events`.
Reference: `specs/native-remote-control/plan.md` Task 5 (Consent primitive); proto `AdminCommand` insertion point.
---
## Task 6: Native viewer — full key fidelity
Files touched: `agent/src/viewer/` (low-level hook + input capture), `agent/src/input/keyboard.rs`
(extend — salvaged), `agent/src/input/mod.rs`, `agent/src/bin/sas_service.rs` (wire — salvaged),
`proto/guruconnect.proto` (confirm `KeyEvent`/`SpecialKeyEvent` coverage).
- **Viewer capture:** install `WH_KEYBOARD_LL` while the viewer is in focused control; divert system
combos (Win key, Win+R, Alt+Tab, Ctrl+Esc) and forward as `KeyEvent` (VK + scan code + extended flag)
instead of letting the local shell act. Add a "send system keys to remote" toggle.
- **Agent injection:** scan-code `SendInput` (`KEYEVENTF_SCANCODE` + correct `KEYEVENTF_EXTENDEDKEY` for
right-Ctrl/Alt, arrows, Win, Insert/Home). Extend the salvaged `input/keyboard.rs`.
- **Ctrl+Alt+Del:** `SpecialKeyEvent.CTRL_ALT_DEL` → the salvaged SAS service (`bin/sas_service.rs`,
`SendSAS`, SYSTEM, `SoftwareSASGeneration` policy set by the managed installer).
- **Modifier hygiene:** track modifier up/down; **re-sync (release) on focus loss** to kill stuck modifiers.
Reference: SPEC-002 §4.1/§4.2; salvage ledger §2; `agent/src/input/keyboard.rs`, `agent/src/bin/sas_service.rs`.
---
## Task 7: Hardware H.264 encode + negotiated raw/Zstd fallback
Files touched: `agent/src/encoder/` (`mod.rs`, `h264.rs` [new], `raw.rs` [salvaged]),
`agent/src/capture/` (feed), `agent/src/viewer/` (decode), `proto/guruconnect.proto`
(`AgentStatus` capability, `SessionResponse` codec), `server/src/session/mod.rs` (negotiation).
- HW **H.264** via Windows Media Foundation (transparently NVENC/AMF/QuickSync) emitting the proto's
`EncodedFrame` (h264). Native viewer decodes via MF/D3D11.
- **Fallback:** salvaged raw BGRA + Zstd + dirty-rects for Win7 / no HW encoder.
- **Negotiation:** agent advertises HW-encode capability in `AgentStatus`; server selects the codec in
`SessionResponse`; default H.264, HEVC opt-in, raw fallback. One encode feeds the native viewer now
(and the Phase-2 WebRTC track later).
Reference: SPEC-002 §5; `agent/src/encoder/raw.rs` (salvaged), `proto/guruconnect.proto` (EncodedFrame already modeled).
---
## Task 8: Verification (end-to-end, observable)
- **Security (re-audit clean):** run `/gc-audit --pass=security` (and `--pass=rust`) → the three relay
CRITICALs and the rate-limiting/frame-cap/reusable-code HIGHs are gone. Manually confirm: a revoked
`cak_` key is rejected on `/ws/agent`; a viewer token for session A is rejected on session B; a
logged-out (blacklisted) viewer token is rejected on `/ws/viewer`; a user JWT is rejected as an agent key.
- **Attended flow:** generate a support code → run the one-time agent → end user sees + accepts consent →
technician's session appears only after acceptance; a denied consent tears down.
- **Key fidelity (the headline):** in a live session, confirm **Win+R opens Run on the remote**, **Ctrl+C
on remote / Ctrl+V locally** (and vice-versa, text), and **Ctrl+Alt+Del reaches the remote secure
desktop**. Confirm no stuck modifiers after alt-tabbing away and back.
- **Codec:** confirm a HW-H.264 machine negotiates h264 (check `SessionResponse`), a Win7/no-HW machine
falls back to raw+Zstd, both render correctly in the native viewer.
- **Rate limiting:** hammer `/api/auth/login` and the code-validate route → confirm throttling/lockout.
- **Migrations:** fresh DB applies the v2 migrations cleanly; `_sqlx_migrations` consistent; `tenant_id`
populated with the default tenant.