Files
guru-connect/specs/v2-secure-session-core/plan.md
Mike Swanson 81e4b99a34
All checks were successful
Build and Test / Build Agent (Windows) (push) Successful in 7m2s
Build and Test / Build Server (Linux) (push) Successful in 10m41s
Build and Test / Security Audit (push) Successful in 4m17s
Build and Test / Build Summary (push) Successful in 8s
spec: add v2-secure-session-core shape spec
Phase 1 of SPEC-002 (GuruConnect v2). Keystone-first plan: Tasks 1-4
rebuild the auth/session core that closes the 3 audit CRITICALs by design
(per-agent cak_ keys, plane separation, session-scoped viewer tokens,
blacklist+frame-caps+throttle on the relay WS, single-use rate-limited
support codes, tenancy-ready schema); Tasks 5-7 deliver attended consent,
native full key fidelity (WH_KEYBOARD_LL hook, scan-code injection, SAS
Ctrl+Alt+Del), and HW H.264 with raw+Zstd fallback. plan/shape/references/
standards.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 18:15:37 -07:00

172 lines
10 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# v2 Secure Session Core — Implementation Plan
> Spec created: 2026-05-29
> Status: not started
> Parent: `docs/specs/SPEC-002-v2-modernization-architecture.md` (Phase 1)
> Keystone: Tasks 14 are the "get-right-first" secure auth/session core — every audit CRITICAL/HIGH
> is closed there. Tasks 57 deliver the product capability on top. Do them in order.
## Task 0: Commit this spec
Commit the `specs/v2-secure-session-core/` directory before writing any code:
```
git add specs/v2-secure-session-core/
git commit -m "spec: add v2-secure-session-core shape spec"
```
Do not start Task 1 until this commit exists.
---
## Task 1 (KEYSTONE): v2 schema — per-agent keys + tenancy-ready tables
Files touched: `server/migrations/` (new v2 migration files), `server/src/db/` (rebuilt modules:
`agent_keys.rs` [new], `sessions.rs`, `machines.rs`, `support_codes.rs`, `events.rs`, `users.rs`,
`mod.rs`).
- New table `connect_agent_keys`: `id UUID`, `machine_id UUID FK`, `key_hash TEXT`, `tenant_id UUID NULL`,
`created_at`, `last_used_at`, `revoked_at TIMESTAMPTZ NULL`. Keys are `cak_`-prefixed, stored hashed
(SHA-256, mirroring v1's hash helper); plaintext returned once at issuance.
- Add nullable `tenant_id UUID` to `machines`, `sessions`, `support_codes`, `events`, `users`,
`connect_agent_keys`, defaulting to a single bootstrap tenant row. Add a `tenants` table with one seed row.
- `sessions` gains `is_managed BOOLEAN`, `source TEXT` (`'standalone'|'gururmm'`), `consent_state TEXT`
(`'not_required'|'pending'|'granted'|'denied'`), `tenant_id`.
- `support_codes`: add single-use semantics — `consumed_at TIMESTAMPTZ NULL`; widen the code column to
hold a higher-entropy human-readable code (see Task 4).
- Migrations are **idempotent** (`CREATE TABLE IF NOT EXISTS` / `ADD COLUMN IF NOT EXISTS`), applied on
server startup, recorded in `_sqlx_migrations`. New queries use runtime `sqlx::query()`.
- DB modules expose a `tenancy` helper that today resolves every call to the default tenant (Phase-4
switch point). Struct fields match columns (the v1 audit flagged 3 unmapped v003 columns — don't repeat).
Reference: `specs/native-remote-control/plan.md` Task 2 (`connect_agent_keys`); `.claude/standards/gururmm/sqlx-migrations.md`.
---
## Task 2 (KEYSTONE): Rebuilt auth model — plane separation + session-scoped viewer tokens
Files touched: `server/src/auth/` (`mod.rs`, `jwt.rs`, `agent_keys.rs` [new], `token_blacklist.rs`,
`password.rs`), `server/src/api/auth.rs`, `server/src/api/sessions.rs`.
- **Delete the JWT-as-agent-key path.** Remove the `jwt_config.validate_token` branch from
`validate_agent_api_key` (`relay/mod.rs:224`); agent authentication validates a `cak_` per-agent key
(hash compare against `connect_agent_keys`, reject if `revoked_at` set) OR a support code — **never a
user JWT**.
- **Session-scoped viewer tokens:** new endpoint `POST /api/sessions/:id/viewer-token` (auth: dashboard
JWT + authorization that the user may view that session/machine) mints a short-lived (~5 min) JWT whose
claims include `session_id` and `tenant_id`. This replaces "any dashboard JWT can view any session."
- Keep Argon2id passwords; keep the blacklist but make `is_revoked()` callable from the WS layer (Task 3).
- Per-agent key issuance endpoint (admin): `POST /api/machines/:id/keys` → returns plaintext `cak_` once,
stores hash. `DELETE /api/machines/:id/keys/:key_id` sets `revoked_at`.
Reference: `relay/mod.rs:224` (`validate_agent_api_key` — the CRITICAL), `auth/mod.rs:116`
(blacklist already consulted for REST — extend to WS), `specs/native-remote-control/plan.md` Tasks 2/3/6;
`.claude/standards/security/credential-handling.md`, `.claude/standards/api/response-format.md`.
---
## Task 3 (KEYSTONE): Secure relay WS handlers + bounded relay
Files touched: `server/src/relay/mod.rs`, `server/src/session/mod.rs`.
- **`viewer_ws_handler`** (`relay/mod.rs:242`): verify the viewer token's **signature + expiry +
blacklist + `session_id` claim == requested `session_id`** before `handle_viewer_connection`
(`relay/mod.rs:595`). Reject otherwise. (Fixes the any-JWT-joins-any-session + blacklist-bypass CRITICALs.)
- **`agent_ws_handler`** (`relay/mod.rs:55`): authenticate via per-agent key OR support code only
(Task 2). Persistent reattach must bind to the authenticated machine identity, not a query-string
`agent_id` alone (`session/mod.rs:98`).
- **Frame caps:** set explicit `.max_message_size(...)`/`.max_frame_size(...)` on both
`WebSocketUpgrade`s; reject oversized frames before `to_vec()`/broadcast. (Fixes WS-OOM HIGH.)
- **Input throttle:** bound + rate-limit the viewer→agent input queue (`relay/mod.rs:669`); cap events/sec.
- Reconcile managed sessions from DB on startup so they aren't orphaned.
Reference: audit Pass E (`reports/2026-05-29-gc-audit.md` §"Pass 5"); `relay/mod.rs:242,55,595,669`.
---
## Task 4 (KEYSTONE): Working rate limiting + single-use support codes
Files touched: `server/src/middleware/rate_limit.rs` (rebuild — v1 is non-compiling),
`server/src/middleware/mod.rs`, `server/src/api/auth.rs` (login), `server/src/api/` (code validate),
`server/src/db/support_codes.rs`, `server/src/relay/mod.rs` (code bind).
- Rebuild a **compiling** rate-limit layer (fix the `tower_governor` generics, or a small in-memory
fixed-window limiter); re-enable `pub mod rate_limit`. Wire it to `POST /api/auth/login`,
`POST /api/auth/change-password`, and the support-code validate route.
- **Single-use codes:** consume atomically on first agent bind (accept only a `pending`, unconsumed code;
set `consumed_at`); reject a second presenter. (Fixes the reusable-code HIGH.)
- **Widen the code:** higher-entropy human-readable format (e.g. grouped base32, ~40+ bits) replacing
6-digit numeric; add per-IP lockout on repeated validate failures.
Reference: audit Pass B/E (rate limiting disabled/non-compiling; reusable codes); `middleware/mod.rs:3-11`.
---
## Task 5: Attended-mode consent
Files touched: `proto/guruconnect.proto`, `agent/src/session/mod.rs`, `agent/src/` (consent UI dialog),
`server/src/relay/mod.rs`, `server/src/session/mod.rs`.
- Add `ConsentRequest` / `ConsentResponse` to the proto (after `AdminCommand`).
- On an attended session, the agent shows a consent dialog to the end user; the server keeps the session
`consent_state = pending` and surfaces it to the technician only on `granted`. `denied`/timeout → tear down.
- Managed/unattended sessions follow per-tenant policy (default: silent for managed; `consent_state =
not_required`). Audit the consent decision to `events`.
Reference: `specs/native-remote-control/plan.md` Task 5 (Consent primitive); proto `AdminCommand` insertion point.
---
## Task 6: Native viewer — full key fidelity
Files touched: `agent/src/viewer/` (low-level hook + input capture), `agent/src/input/keyboard.rs`
(extend — salvaged), `agent/src/input/mod.rs`, `agent/src/bin/sas_service.rs` (wire — salvaged),
`proto/guruconnect.proto` (confirm `KeyEvent`/`SpecialKeyEvent` coverage).
- **Viewer capture:** install `WH_KEYBOARD_LL` while the viewer is in focused control; divert system
combos (Win key, Win+R, Alt+Tab, Ctrl+Esc) and forward as `KeyEvent` (VK + scan code + extended flag)
instead of letting the local shell act. Add a "send system keys to remote" toggle.
- **Agent injection:** scan-code `SendInput` (`KEYEVENTF_SCANCODE` + correct `KEYEVENTF_EXTENDEDKEY` for
right-Ctrl/Alt, arrows, Win, Insert/Home). Extend the salvaged `input/keyboard.rs`.
- **Ctrl+Alt+Del:** `SpecialKeyEvent.CTRL_ALT_DEL` → the salvaged SAS service (`bin/sas_service.rs`,
`SendSAS`, SYSTEM, `SoftwareSASGeneration` policy set by the managed installer).
- **Modifier hygiene:** track modifier up/down; **re-sync (release) on focus loss** to kill stuck modifiers.
Reference: SPEC-002 §4.1/§4.2; salvage ledger §2; `agent/src/input/keyboard.rs`, `agent/src/bin/sas_service.rs`.
---
## Task 7: Hardware H.264 encode + negotiated raw/Zstd fallback
Files touched: `agent/src/encoder/` (`mod.rs`, `h264.rs` [new], `raw.rs` [salvaged]),
`agent/src/capture/` (feed), `agent/src/viewer/` (decode), `proto/guruconnect.proto`
(`AgentStatus` capability, `SessionResponse` codec), `server/src/session/mod.rs` (negotiation).
- HW **H.264** via Windows Media Foundation (transparently NVENC/AMF/QuickSync) emitting the proto's
`EncodedFrame` (h264). Native viewer decodes via MF/D3D11.
- **Fallback:** salvaged raw BGRA + Zstd + dirty-rects for Win7 / no HW encoder.
- **Negotiation:** agent advertises HW-encode capability in `AgentStatus`; server selects the codec in
`SessionResponse`; default H.264, HEVC opt-in, raw fallback. One encode feeds the native viewer now
(and the Phase-2 WebRTC track later).
Reference: SPEC-002 §5; `agent/src/encoder/raw.rs` (salvaged), `proto/guruconnect.proto` (EncodedFrame already modeled).
---
## Task 8: Verification (end-to-end, observable)
- **Security (re-audit clean):** run `/gc-audit --pass=security` (and `--pass=rust`) → the three relay
CRITICALs and the rate-limiting/frame-cap/reusable-code HIGHs are gone. Manually confirm: a revoked
`cak_` key is rejected on `/ws/agent`; a viewer token for session A is rejected on session B; a
logged-out (blacklisted) viewer token is rejected on `/ws/viewer`; a user JWT is rejected as an agent key.
- **Attended flow:** generate a support code → run the one-time agent → end user sees + accepts consent →
technician's session appears only after acceptance; a denied consent tears down.
- **Key fidelity (the headline):** in a live session, confirm **Win+R opens Run on the remote**, **Ctrl+C
on remote / Ctrl+V locally** (and vice-versa, text), and **Ctrl+Alt+Del reaches the remote secure
desktop**. Confirm no stuck modifiers after alt-tabbing away and back.
- **Codec:** confirm a HW-H.264 machine negotiates h264 (check `SessionResponse`), a Win7/no-HW machine
falls back to raw+Zstd, both render correctly in the native viewer.
- **Rate limiting:** hammer `/api/auth/login` and the code-validate route → confirm throttling/lockout.
- **Migrations:** fresh DB applies the v2 migrations cleanly; `_sqlx_migrations` consistent; `tenant_id`
populated with the default tenant.