Authz-strength fix (coord todo c8916c89), code-reviewed APPROVED. Replaces the
weak "view" gate (held by every role) with a permission-tiered access mode
stamped inside the signed viewer token:
- mint: is_admin() || has_permission("control") -> CONTROL token; else
has_permission("view") -> VIEW_ONLY token; else 403.
- enforce: the relay drops MouseEvent/KeyEvent/SpecialKey for a VIEW_ONLY token
before forwarding (video still streams); CONTROL tokens forward under the
Task-3 throttle. Mode is unforgeable (in the signature) and unbypassable
(all other viewer->agent payloads hit the catch-all and are never forwarded).
A low-privilege viewer-role user can now at most watch, never control. New
ViewerAccess enum (view_only|control) on ViewerClaims; 3 unit tests.
Audit CRITICAL #1 now fully closed (mechanism in Task 3 + this authz strength).
Not cargo-check-verified locally (no toolchain) - the push triggers CI
(clippy -D warnings + build + test) which is the verification gate.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
267 lines
18 KiB
Markdown
267 lines
18 KiB
Markdown
# v2 Secure Session Core — Implementation Plan
|
||
|
||
> Spec created: 2026-05-29
|
||
> Status: in progress — Tasks 1-3 DONE 2026-05-29 (Task 3 code-reviewed APPROVED). Viewer-token authz
|
||
> STRENGTH split IMPLEMENTED 2026-05-29 (self-reviewed; no Rust toolchain on this machine — not yet
|
||
> `cargo check`-verified; pending Code Review). This was the REQUIRED Phase-1-exit follow-up: the gate
|
||
> previously used `view` (held by EVERY default role incl. `viewer`) but a viewer token granted input
|
||
> CONTROL. DECIDED (Mike, 2026-05-29) + IMPLEMENTED: SPLIT VIEW_ONLY/CONTROL tokens — `view`-perm users
|
||
> get a watch-only token (relay refuses their input), admin/`control` users get a control token. See the
|
||
> "Task 3 authz-strength fix" block under Task 3 below. Resolves coord todo c8916c89 (coordinator marks
|
||
> done after review). Remaining follow-up: nothing revokes a minted viewer token on logout (bounded by
|
||
> 5-min TTL) — follow-up todo. Task 4 (rate limiting + single-use codes) next.
|
||
> CARRY-FORWARD: Task 3 MUST add a viewer-token AUTHORIZATION check (admin/permission gate) — Task 2
|
||
> fixed only the token *mechanism*; the authz gate is what actually closes audit CRITICAL #1.
|
||
> Policy DECIDED (Mike, 2026-05-29): admin-or-view-permission (`is_admin() || has_permission(...)`).
|
||
> Parent: `docs/specs/SPEC-002-v2-modernization-architecture.md` (Phase 1)
|
||
> Keystone: Tasks 1–4 are the "get-right-first" secure auth/session core — every audit CRITICAL/HIGH
|
||
> is closed there. Tasks 5–7 deliver the product capability on top. Do them in order.
|
||
|
||
## Task 0: Commit this spec
|
||
|
||
Commit the `specs/v2-secure-session-core/` directory before writing any code:
|
||
|
||
```
|
||
git add specs/v2-secure-session-core/
|
||
git commit -m "spec: add v2-secure-session-core shape spec"
|
||
```
|
||
|
||
Do not start Task 1 until this commit exists.
|
||
|
||
---
|
||
|
||
## Task 1 (KEYSTONE) [DONE 2026-05-29]: v2 schema — per-agent keys + tenancy-ready tables
|
||
|
||
> [DONE] migration `004_v2_secure_session_core.sql` + `db/agent_keys.rs` + `db/tenancy.rs` + struct/query
|
||
> updates across machines/sessions/support_codes/events/users. Code-reviewed APPROVED. Note: GC's db
|
||
> layer already uses runtime `sqlx::query()` (no macros) — the v2 "switch to runtime" was already true.
|
||
|
||
Files touched: `server/migrations/` (new v2 migration files), `server/src/db/` (rebuilt modules:
|
||
`agent_keys.rs` [new], `sessions.rs`, `machines.rs`, `support_codes.rs`, `events.rs`, `users.rs`,
|
||
`mod.rs`).
|
||
|
||
- New table `connect_agent_keys`: `id UUID`, `machine_id UUID FK`, `key_hash TEXT`, `tenant_id UUID NULL`,
|
||
`created_at`, `last_used_at`, `revoked_at TIMESTAMPTZ NULL`. Keys are `cak_`-prefixed, stored hashed
|
||
(SHA-256, mirroring v1's hash helper); plaintext returned once at issuance.
|
||
- Add nullable `tenant_id UUID` to `machines`, `sessions`, `support_codes`, `events`, `users`,
|
||
`connect_agent_keys`, defaulting to a single bootstrap tenant row. Add a `tenants` table with one seed row.
|
||
- `sessions` gains `is_managed BOOLEAN`, `source TEXT` (`'standalone'|'gururmm'`), `consent_state TEXT`
|
||
(`'not_required'|'pending'|'granted'|'denied'`), `tenant_id`.
|
||
- `support_codes`: add single-use semantics — `consumed_at TIMESTAMPTZ NULL`; widen the code column to
|
||
hold a higher-entropy human-readable code (see Task 4).
|
||
- Migrations are **idempotent** (`CREATE TABLE IF NOT EXISTS` / `ADD COLUMN IF NOT EXISTS`), applied on
|
||
server startup, recorded in `_sqlx_migrations`. New queries use runtime `sqlx::query()`.
|
||
- DB modules expose a `tenancy` helper that today resolves every call to the default tenant (Phase-4
|
||
switch point). Struct fields match columns (the v1 audit flagged 3 unmapped v003 columns — don't repeat).
|
||
|
||
Reference: `specs/native-remote-control/plan.md` Task 2 (`connect_agent_keys`); `.claude/standards/gururmm/sqlx-migrations.md`.
|
||
|
||
---
|
||
|
||
## Task 2 (KEYSTONE) [DONE 2026-05-29 — code-reviewed APPROVED]: Rebuilt auth model — plane separation + session-scoped viewer tokens
|
||
|
||
> CARRY-FORWARD TO TASK 3: viewer-token minting is gated only by `AuthenticatedUser` (authentication,
|
||
> not authorization). GC has a real `admin|operator|viewer` role + permissions model, so this is intra-
|
||
> tenant privilege escalation until a permission check is added. **The mechanism is fixed here; the authz
|
||
> check in Task 3 is what closes audit CRITICAL #1.** Metadata bug todo faf39fe0 resolved (migration 005).
|
||
|
||
> [IMPLEMENTED] `auth/agent_keys.rs` [new] (cak_ mint/SHA-256 hash/verify), `auth/jwt.rs`
|
||
> (`ViewerClaims` + `create_viewer_token`/`validate_viewer_token`, 5-min TTL, `purpose:"viewer"`),
|
||
> `auth/mod.rs` (module + re-export). Deleted the JWT-as-agent-key branch in `relay/mod.rs`
|
||
> `validate_agent_api_key` — now per-agent `cak_` key OR deprecated shared `AGENT_API_KEY` (WARNING-logged),
|
||
> never a user JWT. New endpoints: `POST/GET /api/machines/:agent_id/keys`,
|
||
> `DELETE /api/machines/:agent_id/keys/:key_id` (admin), `POST /api/sessions/:id/viewer-token` (dashboard JWT).
|
||
> db helpers added: `agent_keys::{list_for_machine,key_belongs_to_machine}`. Folded in migration
|
||
> `005_machine_metadata.sql` + Machine struct org/site/tags mapping (coord todo faf39fe0). No Rust
|
||
> toolchain on this machine — self-reviewed; not yet `cargo check`-verified.
|
||
|
||
Files touched: `server/src/auth/` (`mod.rs`, `jwt.rs`, `agent_keys.rs` [new], `token_blacklist.rs`,
|
||
`password.rs`), `server/src/api/auth.rs`, `server/src/api/sessions.rs`.
|
||
|
||
- **Delete the JWT-as-agent-key path.** Remove the `jwt_config.validate_token` branch from
|
||
`validate_agent_api_key` (`relay/mod.rs:224`); agent authentication validates a `cak_` per-agent key
|
||
(hash compare against `connect_agent_keys`, reject if `revoked_at` set) OR a support code — **never a
|
||
user JWT**.
|
||
- **Session-scoped viewer tokens:** new endpoint `POST /api/sessions/:id/viewer-token` (auth: dashboard
|
||
JWT + authorization that the user may view that session/machine) mints a short-lived (~5 min) JWT whose
|
||
claims include `session_id` and `tenant_id`. This replaces "any dashboard JWT can view any session."
|
||
- Keep Argon2id passwords; keep the blacklist but make `is_revoked()` callable from the WS layer (Task 3).
|
||
- Per-agent key issuance endpoint (admin): `POST /api/machines/:id/keys` → returns plaintext `cak_` once,
|
||
stores hash. `DELETE /api/machines/:id/keys/:key_id` sets `revoked_at`.
|
||
|
||
Reference: `relay/mod.rs:224` (`validate_agent_api_key` — the CRITICAL), `auth/mod.rs:116`
|
||
(blacklist already consulted for REST — extend to WS), `specs/native-remote-control/plan.md` Tasks 2/3/6;
|
||
`.claude/standards/security/credential-handling.md`, `.claude/standards/api/response-format.md`.
|
||
|
||
---
|
||
|
||
## Task 3 (KEYSTONE) [IMPLEMENTED 2026-05-29 — self-reviewed; no Rust toolchain on this machine, not yet `cargo check`-verified]: Secure relay WS handlers + bounded relay
|
||
|
||
> [IMPLEMENTED] Viewer WS now verifies the session-scoped VIEWER token
|
||
> (`validate_viewer_token`: sig+exp+`purpose`) + `token_blacklist.is_revoked` +
|
||
> `session_id` claim == requested session, before upgrade — raw login JWTs are no
|
||
> longer accepted (closes CRITICAL #1 mechanism + #2). Authz gate added to
|
||
> `mint_viewer_token`: `is_admin() || has_permission("view")` → 403 envelope on
|
||
> failure (closes CRITICAL #1 — uses the EXISTING `view` permission from GC's
|
||
> catalog; no new permission defined). Agent WS now binds persistent reattach to
|
||
> the authenticated machine identity: `validate_agent_api_key` returns an
|
||
> `AgentKeyAuth` enum carrying the `cak_` key's machine `agent_id` (resolved via
|
||
> new `db::machines::get_machine_by_id`); a mismatched query-string `agent_id` is
|
||
> ignored, a per-agent key whose machine can't be resolved fails closed
|
||
> (503). Frame caps set on BOTH upgrades (agent 4 MiB, viewer 64 KiB via
|
||
> `max_message_size`/`max_frame_size`) (closes WS-OOM HIGH). Viewer→agent input
|
||
> throttled to 200 events/sec/viewer via a refilling token bucket + non-blocking
|
||
> bounded `try_send` (drop/coalesce on overflow) (closes input-injection MEDIUM).
|
||
> Startup managed-session reconcile retained + clarified (persistent machines →
|
||
> offline in-memory sessions). Removed `#[allow(dead_code)]` on
|
||
> `validate_viewer_token` and `AuthenticatedUser::has_permission`. No token/secret
|
||
> logged; runtime `sqlx::query`/`query_as`. Files: `server/src/relay/mod.rs`,
|
||
> `server/src/api/sessions.rs`, `server/src/db/machines.rs`, `server/src/auth/mod.rs`,
|
||
> `server/src/auth/jwt.rs`, `server/src/main.rs`.
|
||
|
||
### Task 3 authz-strength fix — VIEW_ONLY/CONTROL token split [IMPLEMENTED 2026-05-29 — self-reviewed; no Rust toolchain on this machine, not yet `cargo check`-verified; pending Code Review]
|
||
|
||
> Closes audit CRITICAL #1 at full strength (coord todo c8916c89). The Task-3 gate
|
||
> minted a viewer token for any `is_admin() || has_permission("view")` user, but `view`
|
||
> is held by EVERY default role (incl. `viewer`) and the token granted input CONTROL —
|
||
> intra-tenant privilege escalation. Now the token carries an ACCESS MODE inside its
|
||
> signed claims and the relay enforces it:
|
||
>
|
||
> - `auth/jwt.rs`: new `ViewerAccess` enum (`ViewOnly` | `Control`, serde-renamed to
|
||
> `"view_only"`/`"control"`); `ViewerClaims` gains an `access: ViewerAccess` field;
|
||
> `create_viewer_token(..., access)` stamps it; `validate_viewer_token` returns it as
|
||
> part of the claims (sig+exp+`purpose` checks unchanged). New unit tests cover the
|
||
> round-trip, the lowercase wire form, and login-JWT rejection.
|
||
> - `auth/mod.rs`: re-export `ViewerAccess`.
|
||
> - `api/sessions.rs` (`mint_viewer_token`): TIERED mint — `is_admin() || has_permission("control")`
|
||
> → CONTROL token; else `has_permission("view")` → VIEW_ONLY token; else → 403 (standard
|
||
> envelope). Permission constants `SESSION_CONTROL_PERMISSION="control"` /
|
||
> `SESSION_VIEW_PERMISSION="view"`. Response echoes `access` (advisory; the signed claim
|
||
> is authoritative).
|
||
> - `relay/mod.rs`: `viewer_ws_handler` reads `claims.access` from the VERIFIED token and
|
||
> threads it into `handle_viewer_connection` (new `access: ViewerAccess` param). In the
|
||
> input path, a view-only token's `MouseEvent`/`KeyEvent`/`SpecialKey` are refused (a
|
||
> guarded match arm `if !access.can_control()` that silently drops + logs once-per-
|
||
> power-of-two), BEFORE the throttle/`try_send`. A control token forwards as before (with
|
||
> the Task-3 throttle). Video still streams to a view-only viewer; chat (not an injected-
|
||
> input vector) is still relayed. The mode cannot be forged — it lives in the signed token.
|
||
>
|
||
> Everything else from Task 3 (session_id-claim match, blacklist, frame caps, throttle,
|
||
> agent identity binding) is intact — this is purely additive access-mode enforcement.
|
||
>
|
||
> PHASE-2 REFINEMENT: this refuses to FORWARD input for a view-only token; it does NOT yet
|
||
> tie the viewer mode to the agent-side `SessionType.VIEW_ONLY` capture mode (the agent still
|
||
> does full capture). Deferred (deeper agent change).
|
||
|
||
Files touched: `server/src/relay/mod.rs`, `server/src/session/mod.rs`.
|
||
|
||
- **`viewer_ws_handler`** (`relay/mod.rs:242`): verify the viewer token's **signature + expiry +
|
||
blacklist + `session_id` claim == requested `session_id`** before `handle_viewer_connection`
|
||
(`relay/mod.rs:595`). Reject otherwise. (Fixes the any-JWT-joins-any-session + blacklist-bypass CRITICALs.)
|
||
- **Viewer-token AUTHORIZATION (carry-forward from Task 2 review) — this is what actually closes audit
|
||
CRITICAL #1.** The minting endpoint `POST /api/sessions/:id/viewer-token` (in `server/src/api/sessions.rs`)
|
||
must enforce a real permission predicate, not just `AuthenticatedUser`: `user.is_admin() ||
|
||
user.has_permission(<policy>)`. GC's role model (`admin|operator|viewer`) + permissions table already
|
||
exist (`server/src/auth/mod.rs`), so honoring the intra-tenant role distinction is cheap. **Policy
|
||
DECIDED (Mike, 2026-05-29): admin-or-view-permission** — `user.is_admin() || user.has_permission(<the
|
||
session-view/control permission in GC's catalog — use the real name; define one if absent>)`. Enforce
|
||
at the minting endpoint `mint_viewer_token` (the authz decision point); the WS then trusts the
|
||
session-scoped token. Multi-tenant client-access isolation stays deferred to Phase 4.
|
||
- **`agent_ws_handler`** (`relay/mod.rs:55`): authenticate via per-agent key OR support code only
|
||
(Task 2). Persistent reattach must bind to the authenticated machine identity, not a query-string
|
||
`agent_id` alone (`session/mod.rs:98`).
|
||
- **Frame caps:** set explicit `.max_message_size(...)`/`.max_frame_size(...)` on both
|
||
`WebSocketUpgrade`s; reject oversized frames before `to_vec()`/broadcast. (Fixes WS-OOM HIGH.)
|
||
- **Input throttle:** bound + rate-limit the viewer→agent input queue (`relay/mod.rs:669`); cap events/sec.
|
||
- Reconcile managed sessions from DB on startup so they aren't orphaned.
|
||
|
||
Reference: audit Pass E (`reports/2026-05-29-gc-audit.md` §"Pass 5"); `relay/mod.rs:242,55,595,669`.
|
||
|
||
---
|
||
|
||
## Task 4 (KEYSTONE): Working rate limiting + single-use support codes
|
||
|
||
Files touched: `server/src/middleware/rate_limit.rs` (rebuild — v1 is non-compiling),
|
||
`server/src/middleware/mod.rs`, `server/src/api/auth.rs` (login), `server/src/api/` (code validate),
|
||
`server/src/db/support_codes.rs`, `server/src/relay/mod.rs` (code bind).
|
||
|
||
- Rebuild a **compiling** rate-limit layer (fix the `tower_governor` generics, or a small in-memory
|
||
fixed-window limiter); re-enable `pub mod rate_limit`. Wire it to `POST /api/auth/login`,
|
||
`POST /api/auth/change-password`, and the support-code validate route.
|
||
- **Single-use codes:** consume atomically on first agent bind (accept only a `pending`, unconsumed code;
|
||
set `consumed_at`); reject a second presenter. (Fixes the reusable-code HIGH.)
|
||
- **Widen the code:** higher-entropy human-readable format (e.g. grouped base32, ~40+ bits) replacing
|
||
6-digit numeric; add per-IP lockout on repeated validate failures.
|
||
|
||
Reference: audit Pass B/E (rate limiting disabled/non-compiling; reusable codes); `middleware/mod.rs:3-11`.
|
||
|
||
---
|
||
|
||
## Task 5: Attended-mode consent
|
||
|
||
Files touched: `proto/guruconnect.proto`, `agent/src/session/mod.rs`, `agent/src/` (consent UI dialog),
|
||
`server/src/relay/mod.rs`, `server/src/session/mod.rs`.
|
||
|
||
- Add `ConsentRequest` / `ConsentResponse` to the proto (after `AdminCommand`).
|
||
- On an attended session, the agent shows a consent dialog to the end user; the server keeps the session
|
||
`consent_state = pending` and surfaces it to the technician only on `granted`. `denied`/timeout → tear down.
|
||
- Managed/unattended sessions follow per-tenant policy (default: silent for managed; `consent_state =
|
||
not_required`). Audit the consent decision to `events`.
|
||
|
||
Reference: `specs/native-remote-control/plan.md` Task 5 (Consent primitive); proto `AdminCommand` insertion point.
|
||
|
||
---
|
||
|
||
## Task 6: Native viewer — full key fidelity
|
||
|
||
Files touched: `agent/src/viewer/` (low-level hook + input capture), `agent/src/input/keyboard.rs`
|
||
(extend — salvaged), `agent/src/input/mod.rs`, `agent/src/bin/sas_service.rs` (wire — salvaged),
|
||
`proto/guruconnect.proto` (confirm `KeyEvent`/`SpecialKeyEvent` coverage).
|
||
|
||
- **Viewer capture:** install `WH_KEYBOARD_LL` while the viewer is in focused control; divert system
|
||
combos (Win key, Win+R, Alt+Tab, Ctrl+Esc) and forward as `KeyEvent` (VK + scan code + extended flag)
|
||
instead of letting the local shell act. Add a "send system keys to remote" toggle.
|
||
- **Agent injection:** scan-code `SendInput` (`KEYEVENTF_SCANCODE` + correct `KEYEVENTF_EXTENDEDKEY` for
|
||
right-Ctrl/Alt, arrows, Win, Insert/Home). Extend the salvaged `input/keyboard.rs`.
|
||
- **Ctrl+Alt+Del:** `SpecialKeyEvent.CTRL_ALT_DEL` → the salvaged SAS service (`bin/sas_service.rs`,
|
||
`SendSAS`, SYSTEM, `SoftwareSASGeneration` policy set by the managed installer).
|
||
- **Modifier hygiene:** track modifier up/down; **re-sync (release) on focus loss** to kill stuck modifiers.
|
||
|
||
Reference: SPEC-002 §4.1/§4.2; salvage ledger §2; `agent/src/input/keyboard.rs`, `agent/src/bin/sas_service.rs`.
|
||
|
||
---
|
||
|
||
## Task 7: Hardware H.264 encode + negotiated raw/Zstd fallback
|
||
|
||
Files touched: `agent/src/encoder/` (`mod.rs`, `h264.rs` [new], `raw.rs` [salvaged]),
|
||
`agent/src/capture/` (feed), `agent/src/viewer/` (decode), `proto/guruconnect.proto`
|
||
(`AgentStatus` capability, `SessionResponse` codec), `server/src/session/mod.rs` (negotiation).
|
||
|
||
- HW **H.264** via Windows Media Foundation (transparently NVENC/AMF/QuickSync) emitting the proto's
|
||
`EncodedFrame` (h264). Native viewer decodes via MF/D3D11.
|
||
- **Fallback:** salvaged raw BGRA + Zstd + dirty-rects for Win7 / no HW encoder.
|
||
- **Negotiation:** agent advertises HW-encode capability in `AgentStatus`; server selects the codec in
|
||
`SessionResponse`; default H.264, HEVC opt-in, raw fallback. One encode feeds the native viewer now
|
||
(and the Phase-2 WebRTC track later).
|
||
|
||
Reference: SPEC-002 §5; `agent/src/encoder/raw.rs` (salvaged), `proto/guruconnect.proto` (EncodedFrame already modeled).
|
||
|
||
---
|
||
|
||
## Task 8: Verification (end-to-end, observable)
|
||
|
||
- **Security (re-audit clean):** run `/gc-audit --pass=security` (and `--pass=rust`) → the three relay
|
||
CRITICALs and the rate-limiting/frame-cap/reusable-code HIGHs are gone. Manually confirm: a revoked
|
||
`cak_` key is rejected on `/ws/agent`; a viewer token for session A is rejected on session B; a
|
||
logged-out (blacklisted) viewer token is rejected on `/ws/viewer`; a user JWT is rejected as an agent key.
|
||
- **Attended flow:** generate a support code → run the one-time agent → end user sees + accepts consent →
|
||
technician's session appears only after acceptance; a denied consent tears down.
|
||
- **Key fidelity (the headline):** in a live session, confirm **Win+R opens Run on the remote**, **Ctrl+C
|
||
on remote / Ctrl+V locally** (and vice-versa, text), and **Ctrl+Alt+Del reaches the remote secure
|
||
desktop**. Confirm no stuck modifiers after alt-tabbing away and back.
|
||
- **Codec:** confirm a HW-H.264 machine negotiates h264 (check `SessionResponse`), a Win7/no-HW machine
|
||
falls back to raw+Zstd, both render correctly in the native viewer.
|
||
- **Rate limiting:** hammer `/api/auth/login` and the code-validate route → confirm throttling/lockout.
|
||
- **Migrations:** fresh DB applies the v2 migrations cleanly; `_sqlx_migrations` consistent; `tenant_id`
|
||
populated with the default tenant.
|