Files
guru-connect/specs/v2-secure-session-core/plan.md
Mike Swanson 41691bfb2c
Some checks failed
Build and Test / Build Server (Linux) (push) Failing after 3m37s
Build and Test / Build Agent (Windows) (push) Successful in 6m37s
Build and Test / Security Audit (push) Successful in 4m10s
Build and Test / Build Summary (push) Has been skipped
feat(server): v2 secure-session-core Task 2 - auth rebuild
SPEC-002 Phase 1 Task 2 (specs/v2-secure-session-core), code-reviewed APPROVED.

- DELETE the JWT-as-agent-key branch in relay validate_agent_api_key (audit
  CRITICAL): agent auth now = per-agent cak_ key (SHA-256 -> connect_agent_keys,
  revoked filtered) OR support code OR deprecated shared AGENT_API_KEY (warned).
  A user JWT can no longer authenticate an agent.
- auth/agent_keys.rs: cak_ gen (OsRng 256-bit) + SHA-256 hash + verify.
- auth/jwt.rs: ViewerClaims + create/validate_viewer_token (5-min TTL,
  purpose=viewer, session_id+tenant_id claims; non-interchangeable with login).
- Admin key issuance: POST/GET/DELETE /api/machines/:agent_id/keys.
- POST /api/sessions/:id/viewer-token mints a session-bound short-lived token.
- Migration 005: organization/site/tags on connect_machines (fixes the silent
  update_machine_metadata write, coord todo faf39fe0).

NOTE: viewer-token minting is gated by AuthenticatedUser only; the AUTHORIZATION
check (admin/permission gate) that closes audit CRITICAL #1 lands in Task 3 (the
viewer WS verification). The viewer WS path (relay/mod.rs:285) is untouched here.
Not cargo-check-verified (no toolchain on the authoring host) - self-reviewed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 18:57:12 -07:00

201 lines
13 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# v2 Secure Session Core — Implementation Plan
> Spec created: 2026-05-29
> Status: in progress — Tasks 1-2 DONE 2026-05-29; Task 3 (relay WS) next.
> CARRY-FORWARD: Task 3 MUST add a viewer-token AUTHORIZATION check (admin/permission gate) — Task 2
> fixed only the token *mechanism*; the authz gate is what actually closes audit CRITICAL #1. Policy
> (admin-only vs admin-or-view-permission) pending Mike's decision.
> Parent: `docs/specs/SPEC-002-v2-modernization-architecture.md` (Phase 1)
> Keystone: Tasks 14 are the "get-right-first" secure auth/session core — every audit CRITICAL/HIGH
> is closed there. Tasks 57 deliver the product capability on top. Do them in order.
## Task 0: Commit this spec
Commit the `specs/v2-secure-session-core/` directory before writing any code:
```
git add specs/v2-secure-session-core/
git commit -m "spec: add v2-secure-session-core shape spec"
```
Do not start Task 1 until this commit exists.
---
## Task 1 (KEYSTONE) [DONE 2026-05-29]: v2 schema — per-agent keys + tenancy-ready tables
> [DONE] migration `004_v2_secure_session_core.sql` + `db/agent_keys.rs` + `db/tenancy.rs` + struct/query
> updates across machines/sessions/support_codes/events/users. Code-reviewed APPROVED. Note: GC's db
> layer already uses runtime `sqlx::query()` (no macros) — the v2 "switch to runtime" was already true.
Files touched: `server/migrations/` (new v2 migration files), `server/src/db/` (rebuilt modules:
`agent_keys.rs` [new], `sessions.rs`, `machines.rs`, `support_codes.rs`, `events.rs`, `users.rs`,
`mod.rs`).
- New table `connect_agent_keys`: `id UUID`, `machine_id UUID FK`, `key_hash TEXT`, `tenant_id UUID NULL`,
`created_at`, `last_used_at`, `revoked_at TIMESTAMPTZ NULL`. Keys are `cak_`-prefixed, stored hashed
(SHA-256, mirroring v1's hash helper); plaintext returned once at issuance.
- Add nullable `tenant_id UUID` to `machines`, `sessions`, `support_codes`, `events`, `users`,
`connect_agent_keys`, defaulting to a single bootstrap tenant row. Add a `tenants` table with one seed row.
- `sessions` gains `is_managed BOOLEAN`, `source TEXT` (`'standalone'|'gururmm'`), `consent_state TEXT`
(`'not_required'|'pending'|'granted'|'denied'`), `tenant_id`.
- `support_codes`: add single-use semantics — `consumed_at TIMESTAMPTZ NULL`; widen the code column to
hold a higher-entropy human-readable code (see Task 4).
- Migrations are **idempotent** (`CREATE TABLE IF NOT EXISTS` / `ADD COLUMN IF NOT EXISTS`), applied on
server startup, recorded in `_sqlx_migrations`. New queries use runtime `sqlx::query()`.
- DB modules expose a `tenancy` helper that today resolves every call to the default tenant (Phase-4
switch point). Struct fields match columns (the v1 audit flagged 3 unmapped v003 columns — don't repeat).
Reference: `specs/native-remote-control/plan.md` Task 2 (`connect_agent_keys`); `.claude/standards/gururmm/sqlx-migrations.md`.
---
## Task 2 (KEYSTONE) [DONE 2026-05-29 — code-reviewed APPROVED]: Rebuilt auth model — plane separation + session-scoped viewer tokens
> CARRY-FORWARD TO TASK 3: viewer-token minting is gated only by `AuthenticatedUser` (authentication,
> not authorization). GC has a real `admin|operator|viewer` role + permissions model, so this is intra-
> tenant privilege escalation until a permission check is added. **The mechanism is fixed here; the authz
> check in Task 3 is what closes audit CRITICAL #1.** Metadata bug todo faf39fe0 resolved (migration 005).
> [IMPLEMENTED] `auth/agent_keys.rs` [new] (cak_ mint/SHA-256 hash/verify), `auth/jwt.rs`
> (`ViewerClaims` + `create_viewer_token`/`validate_viewer_token`, 5-min TTL, `purpose:"viewer"`),
> `auth/mod.rs` (module + re-export). Deleted the JWT-as-agent-key branch in `relay/mod.rs`
> `validate_agent_api_key` — now per-agent `cak_` key OR deprecated shared `AGENT_API_KEY` (WARNING-logged),
> never a user JWT. New endpoints: `POST/GET /api/machines/:agent_id/keys`,
> `DELETE /api/machines/:agent_id/keys/:key_id` (admin), `POST /api/sessions/:id/viewer-token` (dashboard JWT).
> db helpers added: `agent_keys::{list_for_machine,key_belongs_to_machine}`. Folded in migration
> `005_machine_metadata.sql` + Machine struct org/site/tags mapping (coord todo faf39fe0). No Rust
> toolchain on this machine — self-reviewed; not yet `cargo check`-verified.
Files touched: `server/src/auth/` (`mod.rs`, `jwt.rs`, `agent_keys.rs` [new], `token_blacklist.rs`,
`password.rs`), `server/src/api/auth.rs`, `server/src/api/sessions.rs`.
- **Delete the JWT-as-agent-key path.** Remove the `jwt_config.validate_token` branch from
`validate_agent_api_key` (`relay/mod.rs:224`); agent authentication validates a `cak_` per-agent key
(hash compare against `connect_agent_keys`, reject if `revoked_at` set) OR a support code — **never a
user JWT**.
- **Session-scoped viewer tokens:** new endpoint `POST /api/sessions/:id/viewer-token` (auth: dashboard
JWT + authorization that the user may view that session/machine) mints a short-lived (~5 min) JWT whose
claims include `session_id` and `tenant_id`. This replaces "any dashboard JWT can view any session."
- Keep Argon2id passwords; keep the blacklist but make `is_revoked()` callable from the WS layer (Task 3).
- Per-agent key issuance endpoint (admin): `POST /api/machines/:id/keys` → returns plaintext `cak_` once,
stores hash. `DELETE /api/machines/:id/keys/:key_id` sets `revoked_at`.
Reference: `relay/mod.rs:224` (`validate_agent_api_key` — the CRITICAL), `auth/mod.rs:116`
(blacklist already consulted for REST — extend to WS), `specs/native-remote-control/plan.md` Tasks 2/3/6;
`.claude/standards/security/credential-handling.md`, `.claude/standards/api/response-format.md`.
---
## Task 3 (KEYSTONE): Secure relay WS handlers + bounded relay
Files touched: `server/src/relay/mod.rs`, `server/src/session/mod.rs`.
- **`viewer_ws_handler`** (`relay/mod.rs:242`): verify the viewer token's **signature + expiry +
blacklist + `session_id` claim == requested `session_id`** before `handle_viewer_connection`
(`relay/mod.rs:595`). Reject otherwise. (Fixes the any-JWT-joins-any-session + blacklist-bypass CRITICALs.)
- **Viewer-token AUTHORIZATION (carry-forward from Task 2 review) — this is what actually closes audit
CRITICAL #1.** The minting endpoint `POST /api/sessions/:id/viewer-token` (in `server/src/api/sessions.rs`)
must enforce a real permission predicate, not just `AuthenticatedUser`: `user.is_admin() ||
user.has_permission(<policy>)`. GC's role model (`admin|operator|viewer`) + permissions table already
exist (`server/src/auth/mod.rs`), so honoring the intra-tenant role distinction is cheap. **Policy
decision (Mike): admin-only, or admin-or-`view_sessions`-permission.** Multi-tenant client-access
isolation stays deferred to Phase 4; this is only the intra-tenant role gate.
- **`agent_ws_handler`** (`relay/mod.rs:55`): authenticate via per-agent key OR support code only
(Task 2). Persistent reattach must bind to the authenticated machine identity, not a query-string
`agent_id` alone (`session/mod.rs:98`).
- **Frame caps:** set explicit `.max_message_size(...)`/`.max_frame_size(...)` on both
`WebSocketUpgrade`s; reject oversized frames before `to_vec()`/broadcast. (Fixes WS-OOM HIGH.)
- **Input throttle:** bound + rate-limit the viewer→agent input queue (`relay/mod.rs:669`); cap events/sec.
- Reconcile managed sessions from DB on startup so they aren't orphaned.
Reference: audit Pass E (`reports/2026-05-29-gc-audit.md` §"Pass 5"); `relay/mod.rs:242,55,595,669`.
---
## Task 4 (KEYSTONE): Working rate limiting + single-use support codes
Files touched: `server/src/middleware/rate_limit.rs` (rebuild — v1 is non-compiling),
`server/src/middleware/mod.rs`, `server/src/api/auth.rs` (login), `server/src/api/` (code validate),
`server/src/db/support_codes.rs`, `server/src/relay/mod.rs` (code bind).
- Rebuild a **compiling** rate-limit layer (fix the `tower_governor` generics, or a small in-memory
fixed-window limiter); re-enable `pub mod rate_limit`. Wire it to `POST /api/auth/login`,
`POST /api/auth/change-password`, and the support-code validate route.
- **Single-use codes:** consume atomically on first agent bind (accept only a `pending`, unconsumed code;
set `consumed_at`); reject a second presenter. (Fixes the reusable-code HIGH.)
- **Widen the code:** higher-entropy human-readable format (e.g. grouped base32, ~40+ bits) replacing
6-digit numeric; add per-IP lockout on repeated validate failures.
Reference: audit Pass B/E (rate limiting disabled/non-compiling; reusable codes); `middleware/mod.rs:3-11`.
---
## Task 5: Attended-mode consent
Files touched: `proto/guruconnect.proto`, `agent/src/session/mod.rs`, `agent/src/` (consent UI dialog),
`server/src/relay/mod.rs`, `server/src/session/mod.rs`.
- Add `ConsentRequest` / `ConsentResponse` to the proto (after `AdminCommand`).
- On an attended session, the agent shows a consent dialog to the end user; the server keeps the session
`consent_state = pending` and surfaces it to the technician only on `granted`. `denied`/timeout → tear down.
- Managed/unattended sessions follow per-tenant policy (default: silent for managed; `consent_state =
not_required`). Audit the consent decision to `events`.
Reference: `specs/native-remote-control/plan.md` Task 5 (Consent primitive); proto `AdminCommand` insertion point.
---
## Task 6: Native viewer — full key fidelity
Files touched: `agent/src/viewer/` (low-level hook + input capture), `agent/src/input/keyboard.rs`
(extend — salvaged), `agent/src/input/mod.rs`, `agent/src/bin/sas_service.rs` (wire — salvaged),
`proto/guruconnect.proto` (confirm `KeyEvent`/`SpecialKeyEvent` coverage).
- **Viewer capture:** install `WH_KEYBOARD_LL` while the viewer is in focused control; divert system
combos (Win key, Win+R, Alt+Tab, Ctrl+Esc) and forward as `KeyEvent` (VK + scan code + extended flag)
instead of letting the local shell act. Add a "send system keys to remote" toggle.
- **Agent injection:** scan-code `SendInput` (`KEYEVENTF_SCANCODE` + correct `KEYEVENTF_EXTENDEDKEY` for
right-Ctrl/Alt, arrows, Win, Insert/Home). Extend the salvaged `input/keyboard.rs`.
- **Ctrl+Alt+Del:** `SpecialKeyEvent.CTRL_ALT_DEL` → the salvaged SAS service (`bin/sas_service.rs`,
`SendSAS`, SYSTEM, `SoftwareSASGeneration` policy set by the managed installer).
- **Modifier hygiene:** track modifier up/down; **re-sync (release) on focus loss** to kill stuck modifiers.
Reference: SPEC-002 §4.1/§4.2; salvage ledger §2; `agent/src/input/keyboard.rs`, `agent/src/bin/sas_service.rs`.
---
## Task 7: Hardware H.264 encode + negotiated raw/Zstd fallback
Files touched: `agent/src/encoder/` (`mod.rs`, `h264.rs` [new], `raw.rs` [salvaged]),
`agent/src/capture/` (feed), `agent/src/viewer/` (decode), `proto/guruconnect.proto`
(`AgentStatus` capability, `SessionResponse` codec), `server/src/session/mod.rs` (negotiation).
- HW **H.264** via Windows Media Foundation (transparently NVENC/AMF/QuickSync) emitting the proto's
`EncodedFrame` (h264). Native viewer decodes via MF/D3D11.
- **Fallback:** salvaged raw BGRA + Zstd + dirty-rects for Win7 / no HW encoder.
- **Negotiation:** agent advertises HW-encode capability in `AgentStatus`; server selects the codec in
`SessionResponse`; default H.264, HEVC opt-in, raw fallback. One encode feeds the native viewer now
(and the Phase-2 WebRTC track later).
Reference: SPEC-002 §5; `agent/src/encoder/raw.rs` (salvaged), `proto/guruconnect.proto` (EncodedFrame already modeled).
---
## Task 8: Verification (end-to-end, observable)
- **Security (re-audit clean):** run `/gc-audit --pass=security` (and `--pass=rust`) → the three relay
CRITICALs and the rate-limiting/frame-cap/reusable-code HIGHs are gone. Manually confirm: a revoked
`cak_` key is rejected on `/ws/agent`; a viewer token for session A is rejected on session B; a
logged-out (blacklisted) viewer token is rejected on `/ws/viewer`; a user JWT is rejected as an agent key.
- **Attended flow:** generate a support code → run the one-time agent → end user sees + accepts consent →
technician's session appears only after acceptance; a denied consent tears down.
- **Key fidelity (the headline):** in a live session, confirm **Win+R opens Run on the remote**, **Ctrl+C
on remote / Ctrl+V locally** (and vice-versa, text), and **Ctrl+Alt+Del reaches the remote secure
desktop**. Confirm no stuck modifiers after alt-tabbing away and back.
- **Codec:** confirm a HW-H.264 machine negotiates h264 (check `SessionResponse`), a Win7/no-HW machine
falls back to raw+Zstd, both render correctly in the native viewer.
- **Rate limiting:** hammer `/api/auth/login` and the code-validate route → confirm throttling/lockout.
- **Migrations:** fresh DB applies the v2 migrations cleanly; `_sqlx_migrations` consistent; `tenant_id`
populated with the default tenant.