guru-connect/specs/v2-secure-session-core/plan.md

# v2 Secure Session Core — Implementation Plan

> **STATUS 2026-05-30: Tasks 1–7 IMPLEMENTED + DEPLOYED. Tasks 3–5 now CODE-REVIEWED — verdict
> APPROVE-WITH-FIXES (no CRITICAL/HIGH).** Compile-verified on GURU-5070: `cargo fmt --check` clean,
> `clippy -D warnings` 0 warnings, `cargo test --workspace` 89 pass. The 3 audit CRITICALs verified
> closed with no bypass; all security paths fail closed. Non-blocking follow-ups tracked: viewer-token
> logout revocation (MEDIUM, TTL-bounded), delete the dead `validate_agent_key` "accept-any" placeholder
> (MEDIUM), `X-Real-IP`/consent-comment/support-code-log hygiene (LOW). **Remaining for Phase-1 exit:
> Task 8 (e2e verification + `/gc-audit --pass=security` re-audit).**
>
> Spec created: 2026-05-29
> Status: in progress. Tasks 1–4 IMPLEMENTED 2026-05-29 (Tasks 1–3 code-reviewed APPROVED; Task 4
> self-reviewed, pending Code Review). Task 4 completes the KEYSTONE (secure auth/session core). The viewer-token
> authz STRENGTH split was IMPLEMENTED 2026-05-29 (self-reviewed; no Rust toolchain on this machine, so not yet
> `cargo check`-verified; pending Code Review). This was the REQUIRED Phase-1-exit follow-up: the gate
> previously used `view` (held by EVERY default role, incl. `viewer`), yet a viewer token granted input
> CONTROL. DECIDED (Mike, 2026-05-29) + IMPLEMENTED: SPLIT VIEW_ONLY/CONTROL tokens. `view`-perm users
> get a watch-only token (the relay refuses their input); admin/`control` users get a control token. See the
> "Task 3 authz-strength fix" block under Task 3 below. Resolves coord todo c8916c89 (the coordinator marks it
> done after review). Remaining follow-up: nothing revokes a minted viewer token on logout (bounded by the
> 5-min TTL); tracked as a follow-up todo.
> CARRY-FORWARD: Task 3 MUST add a viewer-token AUTHORIZATION check (admin/permission gate) — Task 2
> fixed only the token *mechanism*; the authz gate is what actually closes audit CRITICAL #1.
> Policy DECIDED (Mike, 2026-05-29): admin-or-view-permission (`is_admin() || has_permission(...)`).
> Parent: `docs/specs/SPEC-002-v2-modernization-architecture.md` (Phase 1)
> Keystone: Tasks 1–4 are the "get-right-first" secure auth/session core — every audit CRITICAL/HIGH
> is closed there. Tasks 5–7 deliver the product capability on top. Do them in order.

## Task 0: Commit this spec

Commit the `specs/v2-secure-session-core/` directory before writing any code:

```sh
git add specs/v2-secure-session-core/
git commit -m "spec: add v2-secure-session-core shape spec"
```

Do not start Task 1 until this commit exists.

---

## Task 1 (KEYSTONE) [DONE 2026-05-29]: v2 schema — per-agent keys + tenancy-ready tables

> [DONE] migration `004_v2_secure_session_core.sql` + `db/agent_keys.rs` + `db/tenancy.rs` + struct/query
> updates across machines/sessions/support_codes/events/users. Code-reviewed APPROVED. Note: GC's db
> layer already uses runtime `sqlx::query()` (no macros) — the v2 "switch to runtime" was already true.

Files touched: `server/migrations/` (new v2 migration files), `server/src/db/` (rebuilt modules:
`agent_keys.rs` [new], `sessions.rs`, `machines.rs`, `support_codes.rs`, `events.rs`, `users.rs`,
`mod.rs`).

- New table `connect_agent_keys`: `id UUID`, `machine_id UUID FK`, `key_hash TEXT`, `tenant_id UUID NULL`,
  `created_at`, `last_used_at`, `revoked_at TIMESTAMPTZ NULL`. Keys are `cak_`-prefixed, stored hashed
  (SHA-256, mirroring v1's hash helper); plaintext returned once at issuance.
- Add nullable `tenant_id UUID` to `machines`, `sessions`, `support_codes`, `events`, `users`,
  `connect_agent_keys`, defaulting to a single bootstrap tenant row. Add a `tenants` table with one seed row.
- `sessions` gains `is_managed BOOLEAN`, `source TEXT` (`'standalone'|'gururmm'`), `consent_state TEXT`
  (`'not_required'|'pending'|'granted'|'denied'`), `tenant_id`.
- `support_codes`: add single-use semantics — `consumed_at TIMESTAMPTZ NULL`; widen the code column to
  hold a higher-entropy human-readable code (see Task 4).
- Migrations are **idempotent** (`CREATE TABLE IF NOT EXISTS` / `ADD COLUMN IF NOT EXISTS`), applied on
  server startup, recorded in `_sqlx_migrations`. New queries use runtime `sqlx::query()`.
- DB modules expose a `tenancy` helper that today resolves every call to the default tenant (Phase-4
  switch point). Struct fields match columns (the v1 audit flagged 3 unmapped v003 columns — don't repeat).
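
A minimal std-only sketch of the Phase-1 `tenancy` helper and a typed mapping for the new `sessions.source` column. `DEFAULT_TENANT_ID`, `TenantContext`, `resolve_tenant`, and `SessionSource` are hypothetical names, and the real module deals in `UUID` values through `sqlx`, which is left out here. The point is that every call site goes through one function, so the Phase-4 switch is a single body change.

```rust
/// Hypothetical id of the single bootstrap tenant row seeded by the v2 migration.
pub const DEFAULT_TENANT_ID: &str = "00000000-0000-0000-0000-000000000001";

/// What a Phase-4 resolver would inspect. Phase 1 deliberately ignores it.
pub struct TenantContext<'a> {
    pub user_id: Option<&'a str>,
    pub machine_id: Option<&'a str>,
}

/// Phase-1 resolver: every call lands in the default tenant. Phase 4 swaps the body
/// for a real lookup; call sites stay unchanged.
pub fn resolve_tenant(_ctx: &TenantContext<'_>) -> &'static str {
    DEFAULT_TENANT_ID
}

/// Typed mirror of `sessions.source`, so only the two allowed strings reach the DB.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SessionSource {
    Standalone,
    GuruRmm,
}

impl SessionSource {
    pub fn as_db_str(self) -> &'static str {
        match self {
            SessionSource::Standalone => "standalone",
            SessionSource::GuruRmm => "gururmm",
        }
    }

    /// Unknown strings map to `None` instead of a silent default.
    pub fn from_db_str(s: &str) -> Option<Self> {
        match s {
            "standalone" => Some(SessionSource::Standalone),
            "gururmm" => Some(SessionSource::GuruRmm),
            _ => None,
        }
    }
}
```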

Reference: `specs/native-remote-control/plan.md` Task 2 (`connect_agent_keys`); `.claude/standards/gururmm/sqlx-migrations.md`.

---

## Task 2 (KEYSTONE) [DONE 2026-05-29 — code-reviewed APPROVED]: Rebuilt auth model — plane separation + session-scoped viewer tokens

> CARRY-FORWARD TO TASK 3: viewer-token minting is gated only by `AuthenticatedUser` (authentication,
> not authorization). GC has a real `admin|operator|viewer` role + permissions model, so this is intra-
> tenant privilege escalation until a permission check is added. **The mechanism is fixed here; the authz
> check in Task 3 is what closes audit CRITICAL #1.** Metadata bug todo faf39fe0 resolved (migration 005).

> [IMPLEMENTED] `auth/agent_keys.rs` [new] (cak_ mint/SHA-256 hash/verify), `auth/jwt.rs`
> (`ViewerClaims` + `create_viewer_token`/`validate_viewer_token`, 5-min TTL, `purpose:"viewer"`),
> `auth/mod.rs` (module + re-export). Deleted the JWT-as-agent-key branch from `validate_agent_api_key`
> (`relay/mod.rs`): it now accepts a per-agent `cak_` key OR the deprecated shared `AGENT_API_KEY` (WARNING-logged),
> never a user JWT. New endpoints: `POST/GET /api/machines/:agent_id/keys`,
> `DELETE /api/machines/:agent_id/keys/:key_id` (admin), `POST /api/sessions/:id/viewer-token` (dashboard JWT).
> db helpers added: `agent_keys::{list_for_machine,key_belongs_to_machine}`. Folded in migration
> `005_machine_metadata.sql` + Machine struct org/site/tags mapping (coord todo faf39fe0). No Rust
> toolchain on this machine — self-reviewed; not yet `cargo check`-verified.

Files touched: `server/src/auth/` (`mod.rs`, `jwt.rs`, `agent_keys.rs` [new], `token_blacklist.rs`,
`password.rs`), `server/src/api/auth.rs`, `server/src/api/sessions.rs`.

- **Delete the JWT-as-agent-key path.** Remove the `jwt_config.validate_token` branch from
  `validate_agent_api_key` (`relay/mod.rs:224`); agent authentication validates a `cak_` per-agent key
  (hash compare against `connect_agent_keys`, reject if `revoked_at` set) OR a support code — **never a
  user JWT**.
- **Session-scoped viewer tokens:** new endpoint `POST /api/sessions/:id/viewer-token` (auth: dashboard
  JWT + authorization that the user may view that session/machine) mints a short-lived (~5 min) JWT whose
  claims include `session_id` and `tenant_id`. This replaces "any dashboard JWT can view any session."
- Keep Argon2id passwords; keep the blacklist but make `is_revoked()` callable from the WS layer (Task 3).
- Per-agent key issuance endpoint (admin): `POST /api/machines/:id/keys` → returns plaintext `cak_` once,
  stores hash. `DELETE /api/machines/:id/keys/:key_id` sets `revoked_at`.
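
A minimal std-only sketch of the agent-auth decision once the JWT branch is gone, assuming a hypothetical `lookup_cak` closure that hashes the presented key and returns the machine id of a matching, non-revoked `connect_agent_keys` row. The signature is illustrative, not the real one. The result enum has no JWT variant, so a user token cannot be accepted by construction. The support-code bind path is handled separately and is not shown.

```rust
/// Outcome of agent authentication. There is deliberately no variant for a user JWT.
#[derive(Debug, PartialEq, Eq)]
pub enum AgentKeyAuth {
    /// Per-agent `cak_` key: carries the machine it was issued to.
    PerAgent { machine_id: String },
    /// Deprecated shared `AGENT_API_KEY`; callers should WARN-log this path.
    SharedDeprecated,
}

/// Compare without exiting early on the first differing byte (length still leaks).
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// Hypothetical shape of the check; `lookup_cak` stands in for the hashed DB lookup.
pub fn validate_agent_api_key<F>(
    presented: &str,
    shared_key: Option<&str>,
    lookup_cak: F,
) -> Option<AgentKeyAuth>
where
    F: Fn(&str) -> Option<String>,
{
    if presented.starts_with("cak_") {
        // Unknown or revoked per-agent keys fail closed.
        return lookup_cak(presented).map(|machine_id| AgentKeyAuth::PerAgent { machine_id });
    }
    match shared_key {
        Some(k) if constant_time_eq(presented.as_bytes(), k.as_bytes()) => {
            Some(AgentKeyAuth::SharedDeprecated)
        }
        // Everything else, including a well-formed user JWT, is rejected.
        _ => None,
    }
}
```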

Reference: `relay/mod.rs:224` (`validate_agent_api_key` — the CRITICAL), `auth/mod.rs:116`
(blacklist already consulted for REST — extend to WS), `specs/native-remote-control/plan.md` Tasks 2/3/6;
`.claude/standards/security/credential-handling.md`, `.claude/standards/api/response-format.md`.

---

## Task 3 (KEYSTONE) [IMPLEMENTED 2026-05-29 — self-reviewed; no Rust toolchain on this machine, not yet `cargo check`-verified]: Secure relay WS handlers + bounded relay

> [IMPLEMENTED]
>
> - Viewer WS now verifies the session-scoped VIEWER token before upgrade: `validate_viewer_token`
>   (sig+exp+`purpose`), `token_blacklist.is_revoked`, and `session_id` claim == requested session.
>   Raw login JWTs are no longer accepted (closes CRITICAL #1 mechanism + #2).
> - Authz gate added to `mint_viewer_token`: `is_admin() || has_permission("view")` → 403 envelope on
>   failure (closes CRITICAL #1; uses the EXISTING `view` permission from GC's catalog, so no new
>   permission is defined).
> - Agent WS now binds persistent reattach to the authenticated machine identity:
>   `validate_agent_api_key` returns an `AgentKeyAuth` enum carrying the `cak_` key's machine
>   `agent_id` (resolved via new `db::machines::get_machine_by_id`). A mismatched query-string
>   `agent_id` is ignored; a per-agent key whose machine can't be resolved fails closed (503).
> - Frame caps set on BOTH upgrades (agent 4 MiB, viewer 64 KiB via `max_message_size`/`max_frame_size`)
>   (closes WS-OOM HIGH).
> - Viewer→agent input throttled to 200 events/sec/viewer via a refilling token bucket plus
>   non-blocking bounded `try_send` (drop/coalesce on overflow) (closes input-injection MEDIUM).
> - Startup managed-session reconcile retained + clarified (persistent machines → offline in-memory
>   sessions).
> - Removed `#[allow(dead_code)]` on `validate_viewer_token` and `AuthenticatedUser::has_permission`.
> - No token/secret logged; runtime `sqlx::query`/`query_as`.
>
> Files: `server/src/relay/mod.rs`, `server/src/api/sessions.rs`, `server/src/db/machines.rs`,
> `server/src/auth/mod.rs`, `server/src/auth/jwt.rs`, `server/src/main.rs`.
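
A std-only sketch of the admission checks the viewer WS runs before upgrading, assuming the JWT library has already verified the signature. `ViewerClaims` here is a reduced, hypothetical shape (the `token_id` field stands in for whatever key the blacklist actually uses), `admit_viewer` and `ViewerReject` are hypothetical names, and `exp` is re-checked only to keep the sketch self-contained.

```rust
/// Reduced, hypothetical viewer-token claims (signature assumed already verified).
pub struct ViewerClaims {
    pub session_id: String,
    pub purpose: String,
    pub exp: u64, // seconds since the Unix epoch
    pub token_id: String,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ViewerReject {
    WrongPurpose,
    Expired,
    Revoked,
    WrongSession,
}

pub fn admit_viewer(
    claims: &ViewerClaims,
    requested_session: &str,
    now_secs: u64,
    is_revoked: impl Fn(&str) -> bool,
) -> Result<(), ViewerReject> {
    // A login JWT does not carry purpose "viewer", so it stops here.
    if claims.purpose != "viewer" {
        return Err(ViewerReject::WrongPurpose);
    }
    if claims.exp <= now_secs {
        return Err(ViewerReject::Expired);
    }
    // Blacklist consulted on the WS path too, not only for REST.
    if is_revoked(&claims.token_id) {
        return Err(ViewerReject::Revoked);
    }
    // A token minted for session A must not open session B.
    if claims.session_id != requested_session {
        return Err(ViewerReject::WrongSession);
    }
    Ok(())
}
```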

### Task 3 authz-strength fix — VIEW_ONLY/CONTROL token split [IMPLEMENTED 2026-05-29 — self-reviewed; no Rust toolchain on this machine, not yet `cargo check`-verified; pending Code Review]

> Closes audit CRITICAL #1 at full strength (coord todo c8916c89). The Task-3 gate
> minted a viewer token for any `is_admin() || has_permission("view")` user, but `view`
> is held by EVERY default role (incl. `viewer`) and the token granted input CONTROL —
> intra-tenant privilege escalation. Now the token carries an ACCESS MODE inside its
> signed claims and the relay enforces it:
>
> - `auth/jwt.rs`: new `ViewerAccess` enum (`ViewOnly` | `Control`, serde-renamed to
>   `"view_only"`/`"control"`); `ViewerClaims` gains an `access: ViewerAccess` field;
>   `create_viewer_token(..., access)` stamps it; `validate_viewer_token` returns it as
>   part of the claims (sig+exp+`purpose` checks unchanged). New unit tests cover the
>   round-trip, the lowercase wire form, and login-JWT rejection.
> - `auth/mod.rs`: re-export `ViewerAccess`.
> - `api/sessions.rs` (`mint_viewer_token`): TIERED mint — `is_admin() || has_permission("control")`
>   → CONTROL token; else `has_permission("view")` → VIEW_ONLY token; else → 403 (standard
>   envelope). Permission constants `SESSION_CONTROL_PERMISSION="control"` /
>   `SESSION_VIEW_PERMISSION="view"`. Response echoes `access` (advisory; the signed claim
>   is authoritative).
> - `relay/mod.rs`: `viewer_ws_handler` reads `claims.access` from the VERIFIED token and
>   threads it into `handle_viewer_connection` (new `access: ViewerAccess` param). In the
>   input path, a view-only token's `MouseEvent`/`KeyEvent`/`SpecialKey` are refused (a
>   guarded match arm `if !access.can_control()` that silently drops + logs once-per-
>   power-of-two), BEFORE the throttle/`try_send`. A control token forwards as before (with
>   the Task-3 throttle). Video still streams to a view-only viewer; chat (not an injected-
>   input vector) is still relayed. The mode cannot be forged — it lives in the signed token.
>
> Everything else from Task 3 (session_id-claim match, blacklist, frame caps, throttle,
> agent identity binding) is intact — this is purely additive access-mode enforcement.
>
> PHASE-2 REFINEMENT: this refuses to FORWARD input for a view-only token; it does NOT yet
> tie the viewer mode to the agent-side `SessionType.VIEW_ONLY` capture mode (the agent still
> does full capture). Deferred (deeper agent change).
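
A std-only sketch of the tiered mint decision and the relay-side input gate described above. `ViewerAccess`, `can_control`, and the two permission constants follow the names in this block; `Forbidden`, `mint_access`, `ViewerMsg`, and `gate_input` are hypothetical. The gate runs before the throttle, and the power-of-two check keeps a misbehaving view-only client from flooding the log.

```rust
/// Access mode carried inside the signed viewer token.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ViewerAccess {
    ViewOnly,
    Control,
}

impl ViewerAccess {
    pub fn can_control(self) -> bool {
        matches!(self, ViewerAccess::Control)
    }
}

/// Marker for the 403 envelope.
#[derive(Debug, PartialEq, Eq)]
pub struct Forbidden;

const SESSION_CONTROL_PERMISSION: &str = "control";
const SESSION_VIEW_PERMISSION: &str = "view";

fn has(permissions: &[&str], p: &str) -> bool {
    permissions.iter().any(|q| *q == p)
}

/// Tiered mint: admin or `control` → Control; else `view` → ViewOnly; else 403.
pub fn mint_access(is_admin: bool, permissions: &[&str]) -> Result<ViewerAccess, Forbidden> {
    if is_admin || has(permissions, SESSION_CONTROL_PERMISSION) {
        Ok(ViewerAccess::Control)
    } else if has(permissions, SESSION_VIEW_PERMISSION) {
        Ok(ViewerAccess::ViewOnly)
    } else {
        Err(Forbidden)
    }
}

/// Viewer→agent messages, reduced to the cases that matter here.
pub enum ViewerMsg {
    Mouse,
    Key,
    SpecialKey,
    Chat,
}

/// Returns (forward?, log_this_drop?). Drops are logged on counts 1, 2, 4, 8, ...
pub fn gate_input(access: ViewerAccess, msg: &ViewerMsg, dropped: &mut u64) -> (bool, bool) {
    match msg {
        ViewerMsg::Mouse | ViewerMsg::Key | ViewerMsg::SpecialKey if !access.can_control() => {
            *dropped += 1;
            (false, dropped.is_power_of_two())
        }
        // Control tokens forward input; chat is relayed for both modes.
        _ => (true, false),
    }
}
```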

Files touched: `server/src/relay/mod.rs`, `server/src/session/mod.rs`.

- **`viewer_ws_handler`** (`relay/mod.rs:242`): verify the viewer token's **signature + expiry +
  blacklist + `session_id` claim == requested `session_id`** before `handle_viewer_connection`
  (`relay/mod.rs:595`). Reject otherwise. (Fixes the any-JWT-joins-any-session + blacklist-bypass CRITICALs.)
- **Viewer-token AUTHORIZATION (carry-forward from Task 2 review) — this is what actually closes audit
  CRITICAL #1.** The minting endpoint `POST /api/sessions/:id/viewer-token` (in `server/src/api/sessions.rs`)
  must enforce a real permission predicate, not just `AuthenticatedUser`: `user.is_admin() ||
  user.has_permission(<policy>)`. GC's role model (`admin|operator|viewer`) + permissions table already
  exist (`server/src/auth/mod.rs`), so honoring the intra-tenant role distinction is cheap. **Policy
  DECIDED (Mike, 2026-05-29): admin-or-view-permission** — `user.is_admin() || user.has_permission(<the
  session-view/control permission in GC's catalog — use the real name; define one if absent>)`. Enforce
  at the minting endpoint `mint_viewer_token` (the authz decision point); the WS then trusts the
  session-scoped token. Multi-tenant client-access isolation stays deferred to Phase 4.
- **`agent_ws_handler`** (`relay/mod.rs:55`): authenticate via per-agent key OR support code only
  (Task 2). Persistent reattach must bind to the authenticated machine identity, not a query-string
  `agent_id` alone (`session/mod.rs:98`).
- **Frame caps:** set explicit `.max_message_size(...)`/`.max_frame_size(...)` on both
  `WebSocketUpgrade`s; reject oversized frames before `to_vec()`/broadcast. (Fixes WS-OOM HIGH.)
- **Input throttle:** bound + rate-limit the viewer→agent input queue (`relay/mod.rs:669`); cap events/sec.
- Reconcile managed sessions from DB on startup so they aren't orphaned.
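
The input throttle can be sketched as a refilling token bucket, assuming a hypothetical `TokenBucket` type with the clock passed in so tests need no sleeps. At 200 tokens/sec a `false` from `try_take` is the signal to drop or coalesce the event rather than wait for queue space; the burst `capacity` is an assumed tuning knob, not a value from this plan.

```rust
use std::time::Instant;

/// Refilling token bucket: `rate` tokens/second, never more than `capacity` banked.
pub struct TokenBucket {
    capacity: f64,
    rate: f64,
    tokens: f64,
    last: Instant,
}

impl TokenBucket {
    /// Starts full, so a short burst up to `capacity` is allowed.
    pub fn new(rate_per_sec: f64, capacity: f64, now: Instant) -> Self {
        TokenBucket { capacity, rate: rate_per_sec, tokens: capacity, last: now }
    }

    /// `true`: forward the event. `false`: drop/coalesce it (never block the viewer task).
    pub fn try_take(&mut self, now: Instant) -> bool {
        // Refill for the time since the last call; a clock that appears to go
        // backwards adds nothing.
        let elapsed = now.saturating_duration_since(self.last).as_secs_f64();
        self.last = self.last.max(now);
        self.tokens = (self.tokens + elapsed * self.rate).min(self.capacity);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```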

Reference: audit Pass E (`reports/2026-05-29-gc-audit.md` §"Pass 5"); `relay/mod.rs:242,55,595,669`.

---

## Task 4 (KEYSTONE) [IMPLEMENTED 2026-05-29 — self-reviewed; no Rust toolchain on this machine, not yet `cargo check`-verified; pending Code Review]: Working rate limiting + single-use support codes

> [IMPLEMENTED] Closes the keystone (Tasks 1–4). Three parts:
>
> A. RATE LIMITING — replaced the non-compiling tower_governor layer with a small
>    self-contained in-memory limiter (`middleware/rate_limit.rs`): a per-IP
>    fixed-window `RateLimiter` (`Mutex<HashMap<IpAddr, Window>>`, no new dep) +
>    a per-IP consecutive-failure `FailureLockout`, bundled as `RateLimitState`
>    in `AppState`. Keyed by `ConnectInfo<SocketAddr>` IP (same source the relay
>    uses); X-Forwarded-For intentionally NOT trusted (proxy-spoofable). 429 with
>    the standard error envelope on limit. Re-enabled `pub mod rate_limit`. Wired
>    per-route via `route_layer(from_fn_with_state(...))` onto `POST
>    /api/auth/login` (8/min/IP), `POST /api/auth/change-password` (5/min/IP), and
>    `GET /api/codes/:code/validate` (15/min/IP). Named consts for every limit.
>    LOCKOUT: after 10 consecutive failed code-validations from an IP, that IP is
>    locked out 15 min; the validate handler reports success/failure into the
>    lockout, the middleware enforces it BEFORE the handler runs. Unit tests cover
>    window allow/block/reset, per-IP isolation, and lockout trip/reset/expire
>    (clock injected, no sleeps).
>
> B. SINGLE-USE CODES — the agent bind path now CONSUMES the code atomically on
>    first bind. In-memory: new `SupportCodeManager::consume_for_bind` accepts
>    ONLY a `Pending` code and flips it to `Connected` under the write lock (a 2nd
>    presenter loses the race → rejected). This replaces the v1 pre-upgrade check
>    that accepted `pending` OR `connected` (the reusable-code HIGH). DB: new
>    `db::support_codes::consume_code_for_bind` — a single conditional UPDATE
>    `... SET consumed_at = NOW(), status='connected' WHERE code=$1 AND consumed_at
>    IS NULL AND status='pending' AND (expires_at IS NULL OR expires_at > NOW())
>    RETURNING id`; zero rows ⇒ not consumable. The in-memory consume is
>    AUTHORITATIVE (the live source of truth); the DB UPDATE is a durable/audit
>    mirror applied best-effort after it (a missing DB row does not veto a bind the
>    in-memory layer admitted). To make the durable record meaningful, the portal
>    `create_code` handler now also inserts the code into `connect_support_codes`.
>    Validate/preview path is UNCHANGED and explicitly does NOT consume (test
>    `preview_validate_does_not_consume`).
>
> C. WIDER CODE — replaced the 6-digit numeric generator with a grouped
>    base32-style code `XXX-XXX-XXX` (9 symbols over a 31-char UNAMBIGUOUS alphabet
>    excluding 0/O/1/I/L; 9 × log2(31) ≈ 44.6 bits), CSPRNG-backed (`OsRng`, rejection sampling
>    to avoid modulo bias). The new code (11 chars incl. hyphens) does NOT fit the
>    `VARCHAR(10)` column from migration 001, so migration `006_widen_support_code.sql`
>    widens `connect_support_codes.code` AND `connect_sessions.support_code` to
>    TEXT (idempotent). Unit tests cover shape, charset (no ambiguous chars), and
>    practical uniqueness.
>
> DEPS: none added; `tower_governor` REMOVED from Cargo.toml (it never compiled).
> No code/secret/support-code value logged on any path. Runtime `sqlx::query`.
> Files: `server/src/middleware/rate_limit.rs` (rebuilt), `server/src/middleware/mod.rs`,
> `server/src/main.rs` (AppState field + 3 route wirings + create_code DB insert +
> validate handler lockout feed), `server/src/support_codes.rs` (new generator +
> `consume_for_bind` + tests), `server/src/db/support_codes.rs`
> (`consume_code_for_bind`), `server/src/relay/mod.rs` (atomic consume on bind),
> `server/migrations/006_widen_support_code.sql` [new], `server/Cargo.toml`.
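
A std-only sketch of part A's two pieces, assuming the caller injects `Instant` and leaving out the axum `from_fn_with_state` wiring. Type and method names are hypothetical. With the limits above, login would use `RateLimiter::new(8, Duration::from_secs(60))`, change-password `5`, code validation `15`, and the lockout `FailureLockout::new(10, Duration::from_secs(15 * 60))`. Poisoned locks are recovered rather than unwrapped.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::sync::Mutex;
use std::time::{Duration, Instant};

struct Window {
    start: Instant,
    count: u32,
}

/// Per-IP fixed-window limiter: at most `limit` requests per `window`.
pub struct RateLimiter {
    limit: u32,
    window: Duration,
    windows: Mutex<HashMap<IpAddr, Window>>,
}

impl RateLimiter {
    pub fn new(limit: u32, window: Duration) -> Self {
        RateLimiter { limit, window, windows: Mutex::new(HashMap::new()) }
    }

    /// `false` means respond 429.
    pub fn check(&self, ip: IpAddr, now: Instant) -> bool {
        let mut map = self.windows.lock().unwrap_or_else(|p| p.into_inner());
        let w = map.entry(ip).or_insert(Window { start: now, count: 0 });
        if now.saturating_duration_since(w.start) >= self.window {
            // Window elapsed: start a fresh one.
            w.start = now;
            w.count = 0;
        }
        if w.count < self.limit {
            w.count += 1;
            true
        } else {
            false
        }
    }
}

struct Failures {
    consecutive: u32,
    locked_until: Option<Instant>,
}

/// Per-IP consecutive-failure lockout for support-code validation.
pub struct FailureLockout {
    threshold: u32,
    lock_for: Duration,
    state: Mutex<HashMap<IpAddr, Failures>>,
}

impl FailureLockout {
    pub fn new(threshold: u32, lock_for: Duration) -> Self {
        FailureLockout { threshold, lock_for, state: Mutex::new(HashMap::new()) }
    }

    /// Checked by the middleware BEFORE the handler runs.
    pub fn is_locked(&self, ip: IpAddr, now: Instant) -> bool {
        let map = self.state.lock().unwrap_or_else(|p| p.into_inner());
        matches!(map.get(&ip).and_then(|f| f.locked_until), Some(until) if now < until)
    }

    /// Fed by the validate handler after each attempt; success clears the streak.
    pub fn record(&self, ip: IpAddr, success: bool, now: Instant) {
        let mut map = self.state.lock().unwrap_or_else(|p| p.into_inner());
        if success {
            map.remove(&ip);
            return;
        }
        let f = map.entry(ip).or_insert(Failures { consecutive: 0, locked_until: None });
        f.consecutive += 1;
        if f.consecutive >= self.threshold {
            f.locked_until = Some(now + self.lock_for);
            f.consecutive = 0;
        }
    }
}
```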

Files touched: `server/src/middleware/rate_limit.rs` (rebuild — v1 is non-compiling),
`server/src/middleware/mod.rs`, `server/src/api/auth.rs` (login), `server/src/api/` (code validate),
`server/src/db/support_codes.rs`, `server/src/relay/mod.rs` (code bind).

- Rebuild a **compiling** rate-limit layer (fix the `tower_governor` generics, or a small in-memory
  fixed-window limiter); re-enable `pub mod rate_limit`. Wire it to `POST /api/auth/login`,
  `POST /api/auth/change-password`, and the support-code validate route.
- **Single-use codes:** consume atomically on first agent bind (accept only a `pending`, unconsumed code;
  set `consumed_at`); reject a second presenter. (Fixes the reusable-code HIGH.)
- **Widen the code:** higher-entropy human-readable format (e.g. grouped base32, ~40+ bits) replacing
  6-digit numeric; add per-IP lockout on repeated validate failures.
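
The single-use rule comes down to doing the check and the flip under one write lock. A std-only sketch of the in-memory side, with a reduced hypothetical `CodeStatus` and hypothetical `insert_pending`/`is_pending` helpers; the durable mirror (the conditional `UPDATE ... WHERE consumed_at IS NULL AND status='pending'`) is not shown.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CodeStatus {
    Pending,
    Connected,
}

#[derive(Debug, PartialEq, Eq)]
pub enum BindError {
    Unknown,
    AlreadyUsed,
}

#[derive(Default)]
pub struct SupportCodeManager {
    codes: RwLock<HashMap<String, CodeStatus>>,
}

impl SupportCodeManager {
    pub fn insert_pending(&self, code: &str) {
        let mut codes = self.codes.write().unwrap_or_else(|p| p.into_inner());
        codes.insert(code.to_string(), CodeStatus::Pending);
    }

    /// Preview/validate path: read-only, never consumes.
    pub fn is_pending(&self, code: &str) -> bool {
        let codes = self.codes.read().unwrap_or_else(|p| p.into_inner());
        codes.get(code) == Some(&CodeStatus::Pending)
    }

    /// Bind path: check AND flip under one write lock, so of two racing
    /// presenters exactly one sees `Pending`.
    pub fn consume_for_bind(&self, code: &str) -> Result<(), BindError> {
        let mut codes = self.codes.write().unwrap_or_else(|p| p.into_inner());
        let status = codes.get_mut(code).ok_or(BindError::Unknown)?;
        if *status != CodeStatus::Pending {
            return Err(BindError::AlreadyUsed);
        }
        *status = CodeStatus::Connected;
        Ok(())
    }
}
```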

Reference: audit Pass B/E (rate limiting disabled/non-compiling; reusable codes); `middleware/mod.rs:3-11`.
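
A sketch of the `XXX-XXX-XXX` generator with rejection sampling. Nine symbols from a 31-character alphabet give 9 × log2(31) ≈ 44.6 bits. Since 256 is not a multiple of 31, bytes at or above 248 (31 × 8) are discarded so every symbol has exactly 8 preimages. The byte source is a parameter here, assuming the caller wires in a CSPRNG such as `OsRng`; `generate_code` is a hypothetical name.

```rust
/// 31 symbols: digits 2-9 plus A-Z without I, L, O (so 0/O/1/I/L never appear).
pub const ALPHABET: &[u8] = b"23456789ABCDEFGHJKMNPQRSTUVWXYZ";

/// Build an `XXX-XXX-XXX` code. `next_byte` must come from a CSPRNG in real use.
pub fn generate_code(mut next_byte: impl FnMut() -> u8) -> String {
    let n = ALPHABET.len(); // 31
    // Largest multiple of 31 that fits in a byte is 248; accepting 248..=255 would
    // give the first 8 symbols one extra preimage each (modulo bias).
    let accept_below = (256 / n * n) as u16;
    let mut out = String::with_capacity(11);
    for i in 0..9 {
        if i > 0 && i % 3 == 0 {
            out.push('-');
        }
        let b = loop {
            let b = u16::from(next_byte());
            if b < accept_below {
                break b;
            }
        };
        out.push(ALPHABET[b as usize % n] as char);
    }
    out
}
```

With a byte stream of `0, 255, 30, 31, 247, 1, 2, 3, 4, 5`, the 255 is rejected and the result is `2Z2-Z34-567`.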

---

## Task 5 [IMPLEMENTED 2026-05-30 — self-reviewed; no Rust toolchain on this machine, not yet `cargo check`-verified; pending Code Review]: Attended-mode consent

> [IMPLEMENTED] An ATTENDED (support-code) session now requires the end user to
> ACCEPT a native consent prompt before the technician's session is surfaced.
>
> - PROTO (`proto/guruconnect.proto`): added `ConsentRequest` (server/relay →
>   agent: `session_id`, `technician_name`, `access_mode` (`ConsentAccessMode`
>   = `CONSENT_VIEW`|`CONSENT_CONTROL`), `timeout_secs`) and `ConsentResponse`
>   (agent → server: `session_id`, `granted`, `reason`), inserted AFTER
>   `AdminCommand`. New `Message` oneof field numbers `consent_request = 80`,
>   `consent_response = 81` (no existing field renumbered).
> - SERVER (`session/mod.rs`): new `ConsentState` enum
>   (`NotRequired|Pending|Granted|Denied`, `as_db_str`/`allows_viewer`) added to
>   the in-memory `Session`. `register_agent` starts attended (`!is_persistent`)
>   sessions `Pending`, managed/persistent `NotRequired`. `join_session` REFUSES
>   any viewer unless `consent_state.allows_viewer()` (only `Granted`/
>   `NotRequired`) — this is the gate that keeps a support session invisible to
>   the technician until accepted. New `set_consent_state`/`get_consent_state`.
>   Unit tests cover the db-string mapping, the viewer-admission predicate, and
>   the attended-pending-blocks / granted-admits / managed-admits / denied-blocks
>   transitions.
> - SERVER (`relay/mod.rs`): after registering an attended agent,
>   `run_consent_handshake` sends `ConsentRequest`, audits `consent_requested`,
>   then waits up to `CONSENT_TIMEOUT_SECS = 60` for a `ConsentResponse`.
>   granted → `consent_state = Granted` + audit `consent_granted` + proceed.
>   denied/timeout/agent-disconnect → `consent_state = Denied` + audit
>   `consent_denied`, send a Disconnect to the agent, end the session row
>   (`status='denied'`), release the code, and TEAR DOWN (early return — the
>   technician never sees the session). In-memory consent is authoritative; the
>   DB `consent_state` (via `db::sessions::update_consent_state`) is a durable/
>   audit mirror. A late/dup `ConsentResponse` in the main loop is logged+ignored
>   (no silent unhandled variant). `db::events` gained `CONSENT_GRANTED`/
>   `CONSENT_DENIED`/`CONSENT_REQUESTED`. `db::sessions::create_session` now sets
>   `is_managed`/`source`/`consent_state` from `is_support_session` (attended →
>   `false`/`standalone`/`pending`; managed → `true`/`gururmm`/`not_required`).
>   `api::SessionInfo` echoes `consent_state` so the dashboard can show "awaiting
>   consent".
> - AGENT (`consent/mod.rs` [new], `session/mod.rs`, `main.rs`): on
>   `ConsentRequest`, `handle_consent_request` runs a blocking Windows
>   `MessageBox` (MB_YESNO | TOPMOST | SETFOREGROUND | SYSTEMMODAL | ICONQUESTION)
>   on `spawn_blocking` so the async loop/heartbeats are not stalled, phrasing
>   the prompt VIEW vs VIEW-and-CONTROL from `access_mode`, then sends a
>   `ConsentResponse`. Anything other than an explicit Yes (closed box, panic) is
>   a DENY. Non-Windows build is a `// TODO(platform)` stub that fails CLOSED
>   (denies). Unit tests cover the prompt wording + the access-mode decode
>   fallback.
> - Managed/unattended sessions are `not_required`, never prompted (Phase-1
>   default). PER-TENANT consent policy beyond this default is a future
>   refinement — left as a TODO (no per-tenant policy table consulted yet).
> - No `.unwrap()` on a non-test path; runtime `sqlx::query`; no code/secret/
>   token value logged. No Rust toolchain here — self-reviewed only.
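
The consent states and the viewer-admission predicate are small enough to sketch in full. This std-only version assumes the `as_db_str`/`allows_viewer` names from the block above; `initial` is a hypothetical helper for the `register_agent` rule.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConsentState {
    NotRequired,
    Pending,
    Granted,
    Denied,
}

impl ConsentState {
    /// Value written to `sessions.consent_state`.
    pub fn as_db_str(self) -> &'static str {
        match self {
            ConsentState::NotRequired => "not_required",
            ConsentState::Pending => "pending",
            ConsentState::Granted => "granted",
            ConsentState::Denied => "denied",
        }
    }

    /// `join_session` admits a viewer only in these two states.
    pub fn allows_viewer(self) -> bool {
        matches!(self, ConsentState::Granted | ConsentState::NotRequired)
    }

    /// Starting state at agent registration: attended (non-persistent) sessions must ask first.
    pub fn initial(is_persistent: bool) -> Self {
        if is_persistent {
            ConsentState::NotRequired
        } else {
            ConsentState::Pending
        }
    }
}
```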

Files touched: `proto/guruconnect.proto`, `agent/src/session/mod.rs`, `agent/src/consent/mod.rs` (new),
`agent/src/main.rs`, `server/src/relay/mod.rs`, `server/src/session/mod.rs`, `server/src/db/sessions.rs`,
`server/src/db/events.rs`, `server/src/api/mod.rs`.

- Add `ConsentRequest` / `ConsentResponse` to the proto (after `AdminCommand`).
- On an attended session, the agent shows a consent dialog to the end user; the server keeps the session
  `consent_state = pending` and surfaces it to the technician only on `granted`. `denied`/timeout → tear down.
- Managed/unattended sessions follow per-tenant policy (default: silent for managed; `consent_state =
  not_required`). Audit the consent decision to `events`.
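
The server side of the handshake reduces to "wait for a matching reply or fail closed." The relay does this asynchronously; the sketch below substitutes a std `mpsc` channel and `recv_timeout`, with a hypothetical reduced `ConsentResponse` struct and a hypothetical `await_consent` function. A decline, a timeout, and an agent disconnect all map to a denial.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

pub const CONSENT_TIMEOUT_SECS: u64 = 60;

/// Reduced stand-in for the proto `ConsentResponse`.
pub struct ConsentResponse {
    pub session_id: String,
    pub granted: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ConsentOutcome {
    Granted,
    Denied(&'static str),
}

/// Wait for this session's reply; anything except an explicit grant is a denial.
pub fn await_consent(
    rx: &Receiver<ConsentResponse>,
    session_id: &str,
    timeout: Duration,
) -> ConsentOutcome {
    let deadline = Instant::now() + timeout;
    loop {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            // A reply for some other session does not settle this one.
            Ok(r) if r.session_id != session_id => continue,
            Ok(r) if r.granted => return ConsentOutcome::Granted,
            Ok(_) => return ConsentOutcome::Denied("declined"),
            Err(RecvTimeoutError::Timeout) => return ConsentOutcome::Denied("timeout"),
            Err(RecvTimeoutError::Disconnected) => {
                return ConsentOutcome::Denied("agent disconnected")
            }
        }
    }
}
```

In the relay the deadline would be `Duration::from_secs(CONSENT_TIMEOUT_SECS)`, and any `Denied` leads to the teardown described above.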

Reference: `specs/native-remote-control/plan.md` Task 5 (Consent primitive); proto `AdminCommand` insertion point.

---

## Task 6 [IMPLEMENTED 2026-05-30 — self-verified on a local Windows toolchain: `cargo fmt --all` clean, `cargo clippy --workspace --all-targets --all-features -- -D warnings` exit 0, `cargo test --workspace` 69 pass (19 agent + 50 server), `cargo build --workspace` ok; pending Code Review]: Native viewer — full key fidelity

> [IMPLEMENTED] The four parts:
>
> 1. VIEWER CAPTURE (`agent/src/viewer/input.rs`): the `WH_KEYBOARD_LL` hook now
>    DIVERTS system combinations (Windows/Apps keys, Alt+Tab, Alt+Esc, Ctrl+Esc;
>    Win+R / Win+E compose on the remote because the diverted Win-down is forwarded)
>    and forwards them as full-fidelity `KeyEvent`s — VK + hardware scan code +
>    `is_extended` (read from `KBDLLHOOKSTRUCT.flags & LLKHF_EXTENDED`) + modifier
>    snapshot — returning `LRESULT(1)` to suppress local handling. The combo decision
>    is a pure `is_system_combo(vk, alt, ctrl)` so it is unit-tested. Ordinary keys are
>    NOT forwarded by the hook (they take the normal winit path — avoids double inject).
>    A "send system keys to remote" toggle (`AtomicBool`, default ON) gates the diversion;
>    when OFF the hook is transparent. Toggle API: `set_/toggle_/send_system_keys_enabled`.
>    Host key: Pause/Break flips it (handled in `render.rs`, intercepted locally, logged).
> 2. AGENT INJECTION (`agent/src/input/keyboard.rs`): rewrote `send_key` to inject via
>    `SendInput` with `KEYEVENTF_SCANCODE` (layout-independent) and the correct
>    `KEYEVENTF_EXTENDEDKEY` (set when the viewer flagged extended, the VK is inherently
>    extended, or the VK→scan map carries the 0xE0 prefix). New `key_event_full(vk, scan,
>    is_extended, down)` consumes all `KeyEvent` fields (scan 0 ⇒ derive from VK for older
>    viewers); wired into `session/mod.rs` `KeyEvent` handling. `is_extended_key` delegates
>    to a platform-independent `vk_is_extended` (unit-tested) that also covers right Ctrl/Alt.
> 3. CTRL+ALT+DEL: the existing `SpecialKey::CtrlAltDel` → `send_ctrl_alt_del` →
>    `send_sas` path is confirmed and hardened. `send_sas` now: Tier 1 SAS helper service
>    (SYSTEM, named pipe) → Tier 2 direct `sas.dll!SendSAS` → Tier 3 **fails with a clear,
>    actionable error** (no false success; plain SendInput can't reach the secure desktop).
>    The SAS installer (`bin/sas_service.rs install`) now sets the `SoftwareSASGeneration`
>    Winlogon policy (HKLM\...\Policies\System, DWORD 1 = services) via `set_software_sas_policy`,
>    with a `// TODO(installer)` noting the top-level managed installer should also ensure it.
> 4. MODIFIER HYGIENE: viewer tracks held Ctrl/Alt/Shift/Win (`ViewerModifierState` in
>    `render.rs`) and emits explicit key-ups on FOCUS LOSS (`WindowEvent::Focused(false)`)
>    and on window close, so a modifier pressed-but-not-released across a blur doesn't stay
>    latched on the remote. Agent-side `KeyboardController` also tracks injected modifiers and
>    exposes `release_all_modifiers` (defensive complement; wired via `ModifierState`, the
>    scaffolding the cleanup kept). Both trackers unit-tested.
>
> PROTO: `KeyEvent` gained `bool is_extended = 7` (new field number, nothing renumbered)
> — carries the viewer's `LLKHF_EXTENDED` capture so injection picks the right extended
> flag; older agents that ignore it fall back to deriving from vk/scan. `SpecialKeyEvent`
> already carried `CTRL_ALT_DEL`; unchanged.
>
> dead_code wired/removed: `InputEvent::SpecialKey` (now emitted/consumable),
> `ModifierState` (agent keyboard — now tracks + drains held modifiers),
> `type_char`/`type_string` kept with their allows (separate unicode-typing feature, not in
> Task 6 scope). `sas_client::is_service_available`/`get_service_status` kept narrowly allowed
> (status API not yet wired into a runtime path). `InputController::key_event` (vk-only) and
> `release_all_modifiers` kept allowed as API surface (the relayed path uses `key_event_full`,
> focus-loss re-sync is viewer-driven). New toggle accessors `set_/send_system_keys_enabled`
> narrowly allowed (host-key uses `toggle_`; future viewer menu).
>
> TESTS ADDED (13): extended-key flag determination (extended + non-extended sets),
> modifier-state transitions (record/down-up/ignore-non-modifier/drain-and-clear), the
> system-combo classifier (Win/Alt+Tab/Alt+Esc/Ctrl+Esc divert; ordinary keys don't), and
> the toggle state machine (default ON, flip, explicit set). Live hook/SendInput/SendSAS
> behavior is plan Task 8 (needs a real desktop).
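
A sketch of the pure combo classifier plus the toggle, assuming a `u32` VK and the standard Win32 `VK_*` values. `should_divert`, `toggle_send_system_keys`, and the static are hypothetical stand-ins for the `is_system_combo` and `set_/toggle_/send_system_keys_enabled` API named above. Win+R and Win+E need no entry of their own: once the Win key-down is diverted, the letter reaches the remote with Win already held.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Win32 virtual-key codes.
const VK_TAB: u32 = 0x09;
const VK_ESCAPE: u32 = 0x1B;
const VK_LWIN: u32 = 0x5B;
const VK_RWIN: u32 = 0x5C;
const VK_APPS: u32 = 0x5D;

/// "Send system keys to remote" toggle, default ON.
static SEND_SYSTEM_KEYS: AtomicBool = AtomicBool::new(true);

pub fn send_system_keys_enabled() -> bool {
    SEND_SYSTEM_KEYS.load(Ordering::Relaxed)
}

/// Flip the toggle (e.g. on the Pause/Break host key); returns the new value.
pub fn toggle_send_system_keys() -> bool {
    !SEND_SYSTEM_KEYS.fetch_xor(true, Ordering::Relaxed)
}

/// Pure classifier: would the local shell act on this key/modifier combination?
pub fn is_system_combo(vk: u32, alt: bool, ctrl: bool) -> bool {
    match vk {
        VK_LWIN | VK_RWIN | VK_APPS => true,
        VK_TAB => alt,            // Alt+Tab
        VK_ESCAPE => alt || ctrl, // Alt+Esc, Ctrl+Esc
        _ => false,
    }
}

/// Hook decision: divert (forward + return LRESULT(1)) only while the toggle is ON.
pub fn should_divert(vk: u32, alt: bool, ctrl: bool) -> bool {
    send_system_keys_enabled() && is_system_combo(vk, alt, ctrl)
}
```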

Files touched: `proto/guruconnect.proto` (KeyEvent `is_extended = 7`), `agent/src/viewer/input.rs`
(rewritten hook + toggle), `agent/src/viewer/render.rs` (focus-loss modifier release, host-key
toggle, is_extended on winit path), `agent/src/viewer/mod.rs` (unchanged SpecialKey forwarding),
`agent/src/input/keyboard.rs` (scan-code injection + ModifierState + vk_is_extended + send_sas
hardening + tests), `agent/src/input/mod.rs` (key_event_full / release_all_modifiers / vk_is_extended
re-export), `agent/src/session/mod.rs` (KeyEvent → key_event_full), `agent/src/bin/sas_service.rs`
(set_software_sas_policy + install wiring).

- **Viewer capture:** install `WH_KEYBOARD_LL` while the viewer is in focused control; divert system
  combos (Win key, Win+R, Alt+Tab, Ctrl+Esc) and forward as `KeyEvent` (VK + scan code + extended flag)
  instead of letting the local shell act. Add a "send system keys to remote" toggle.
- **Agent injection:** scan-code `SendInput` (`KEYEVENTF_SCANCODE` + correct `KEYEVENTF_EXTENDEDKEY` for
  right-Ctrl/Alt, arrows, Win, Insert/Home). Extend the salvaged `input/keyboard.rs`.
- **Ctrl+Alt+Del:** `SpecialKeyEvent.CTRL_ALT_DEL` → the salvaged SAS service (`bin/sas_service.rs`,
  `SendSAS`, SYSTEM, `SoftwareSASGeneration` policy set by the managed installer).
- **Modifier hygiene:** track modifier up/down; **re-sync (release) on focus loss** to kill stuck modifiers.
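
A std-only sketch of the focus-loss re-sync, assuming a hypothetical `ModifierTracker` that records left/right Shift, Ctrl, Alt, and Win by their Win32 VK codes. On `Focused(false)` or window close, the viewer would send a key-up for each VK that `drain_releases` returns.

```rust
use std::collections::BTreeSet;

/// Win32 VKs for LWin, RWin, LShift, RShift, LCtrl, RCtrl, LAlt, RAlt.
const MODIFIER_VKS: [u32; 8] = [0x5B, 0x5C, 0xA0, 0xA1, 0xA2, 0xA3, 0xA4, 0xA5];

#[derive(Default)]
pub struct ModifierTracker {
    held: BTreeSet<u32>,
}

impl ModifierTracker {
    /// Record a forwarded key transition; non-modifiers are ignored.
    pub fn record(&mut self, vk: u32, down: bool) {
        if !MODIFIER_VKS.contains(&vk) {
            return;
        }
        if down {
            self.held.insert(vk);
        } else {
            self.held.remove(&vk);
        }
    }

    /// VKs that still need a key-up, in ascending order; the tracker is left empty.
    pub fn drain_releases(&mut self) -> Vec<u32> {
        std::mem::take(&mut self.held).into_iter().collect()
    }
}
```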

Reference: SPEC-002 §4.1/§4.2; salvage ledger §2; `agent/src/input/keyboard.rs`, `agent/src/bin/sas_service.rs`.
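
The agent-side flag choice can be sketched as a pure function returning `KEYBDINPUT`'s `wScan` and `dwFlags`, assuming the `KEYEVENTF_*` values from `winuser.h` and a scan code whose high byte is `0xE0` when the VK→scan map reports a prefix. `scancode_input` is a hypothetical name and this `vk_is_extended` is a partial list, not the agent's function. The low byte goes in `wScan`; the prefix is expressed through `KEYEVENTF_EXTENDEDKEY`.

```rust
// KEYBDINPUT.dwFlags values from winuser.h.
const KEYEVENTF_EXTENDEDKEY: u32 = 0x0001;
const KEYEVENTF_KEYUP: u32 = 0x0002;
const KEYEVENTF_SCANCODE: u32 = 0x0008;

/// Partial list of VKs on the 0xE0-prefixed (extended) part of the keyboard.
pub fn vk_is_extended(vk: u16) -> bool {
    matches!(
        vk,
        0x21..=0x28       // PageUp, PageDown, End, Home, Left, Up, Right, Down
            | 0x2C        // PrintScreen
            | 0x2D | 0x2E // Insert, Delete
            | 0x5B..=0x5D // LWin, RWin, Apps
            | 0x6F        // Numpad Divide
            | 0x90        // NumLock
            | 0xA3 | 0xA5 // Right Ctrl, Right Alt
    )
}

/// Returns (wScan, dwFlags) for one scan-code keystroke.
pub fn scancode_input(vk: u16, scan: u16, viewer_extended: bool, down: bool) -> (u16, u32) {
    let prefixed = (scan & 0xFF00) == 0xE000;
    let mut flags = KEYEVENTF_SCANCODE;
    // Extended if the viewer captured LLKHF_EXTENDED, the VK is inherently
    // extended, or the scan map carried the 0xE0 prefix.
    if viewer_extended || vk_is_extended(vk) || prefixed {
        flags |= KEYEVENTF_EXTENDEDKEY;
    }
    if !down {
        flags |= KEYEVENTF_KEYUP;
    }
    (scan & 0x00FF, flags)
}
```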

---

## Task 7 [IMPLEMENTED 2026-05-30 — self-verified on local Windows toolchain: `cargo fmt --all --check` clean, `cargo clippy --workspace --all-targets --all-features -- -D warnings` exit 0, `cargo test --workspace` 89 pass (36 agent + 53 server; was 70, no regressions), `cargo build --workspace` ok; pending Code Review]: Hardware H.264 encode + negotiated raw/Zstd fallback

> [IMPLEMENTED] Raw+Zstd remains the DEFAULT and guaranteed fallback; H.264 is a
> negotiated upgrade that is COMPILE-VERIFIED ONLY (live MF encode/decode is Task
> 8 — needs real GPU + frames). The testable parts (abstraction, factory,
> negotiation, capability plumbing, color-conversion math) are done solidly with
> unit tests; the MF H.264 encoder and viewer decoder are first-cut, clearly
> marked, and gated behind a default-off policy so unvalidated H.264 never ships
> as the default.
>
> 1. ENCODER ABSTRACTION (`agent/src/encoder/mod.rs`): the existing `Encoder`
>    trait (`encode(&mut self, &CapturedFrame) -> Result<EncodedFrame>`) is the
>    abstraction; `RawEncoder` (salvaged raw+Zstd+dirty-rects, UNCHANGED behavior)
>    and the new `H264Encoder` both implement it. Factory split into pure pieces:
>    `codec_from_str` (config-string -> `VideoCodec`), `select_codec(negotiated,
>    hardware_available)` (agent-side guard: H.264 only if HW present, HEVC->raw,
>    else raw), and `create_encoder_for(VideoCodec, quality)` (builds the encoder;
>    on H.264 init failure logs + returns a RAW encoder so the session never
>    breaks). UNIT-TESTED: codec_from_str mapping, select_codec guard matrix, raw
>    factory always succeeds, string path resolves to raw without HW.
> 2. CAPABILITY + NEGOTIATION (testable, done well):
>    - `encoder/capability.rs`: `supports_hardware_h264()` probes MF once
>      (`MFTEnumEx(MFT_CATEGORY_VIDEO_ENCODER, MFT_ENUM_FLAG_HARDWARE,
>      MFVideoFormat_H264)`), caches the bool via `OnceLock`; false on non-Windows
>      / no HW / MF error. Advertised in `AgentStatus.supports_h264` (proto field
>      11, additive).
>    - Server (`server/src/session/mod.rs`): `select_video_codec(agent_supports,
>      prefer_h264)` is a PURE decision fn — H.264 only when BOTH the agent
>      supports it AND policy prefers it, else raw. Policy constant
>      `DEFAULT_PREFER_H264 = false` (documented: keeps raw as the negotiated codec
>      until H.264 is hardware-validated). `supports_h264` stored on the in-memory
>      `Session` from `AgentStatus` (`update_agent_status` gained the param). The
>      negotiated codec is stamped on `StartStream.video_codec` in
>      `send_start_stream_internal` (the LIVE server->agent codec-selection point —
>      SessionRequest/SessionResponse are not exchanged on the wire in v2, so the
>      proto's `SessionResponse.video_codec` is kept for spec parity but the live
>      path uses `StartStream`). UNIT-TESTED: the negotiation matrix, the
>      default-policy guardrail (capable agent still gets raw), and the
>      `AgentStatus -> supports_h264` ingest.
>    - Agent applies it: `StartStream` handler decodes `video_codec`, stores
>      `negotiated_codec`, and `init_streaming` builds the encoder via
>      `select_codec` + `create_encoder_for` (re-guards on local HW; older server
>      sends 0 = RAW, preserving the default).
> 3. MF H.264 ENCODER (`agent/src/encoder/h264.rs`, FIRST-CUT, compile-verified
>    only): enumerates+activates a HW H.264 encoder MFT, sets H.264 output then
>    NV12 input media types (frame size/rate, bitrate from quality), feeds frames
>    (`ProcessInput`) and drains synchronously (`ProcessOutput`, NEED_MORE_INPUT =
>    "no output this tick"), emitting `VideoFrame{H264(EncodedFrame{data, keyframe,
>    pts, dts})}`. BGRA->NV12 via `encoder/color.rs` (BT.601 limited-range, 2x2 box
>    chroma; isolated + UNIT-TESTED: size, odd-dim/short-buffer rejection, black/
>    white/red reference values, plane coverage). On ANY init failure the FACTORY
>    falls back to raw (logged); per-frame errors surface to the session (which
>    logs + continues). Handles resolution change (re-init), keyframe flag
>    (CleanPoint), MF buffer alloc for non-sample-providing MFTs. NOT yet live: the
>    async-MFT event model is documented as a Task-8 refinement (this cut drains
>    synchronously); precise force-IDR (CODECAPI) is a TODO; D3D11 zero-copy
>    deferred (feeds CPU NV12).
> 4. VIEWER H.264 DECODE (`agent/src/viewer/decoder.rs` [new], FIRST-CUT,
>    compile-verified only): MF H.264 decoder MFT -> NV12 -> BGRA
>    (`nv12_to_bgra`, BT.601 inverse, UNIT-TESTED round-trip within tolerance +
>    short-buffer + black). Runs on a DEDICATED OS thread (`gc-h264-decode`), NOT a
>    tokio task — the MF decoder has COM thread affinity and a tokio task can
>    migrate across workers at await points. The receive task forwards H.264 access
>    units over a std channel; the worker decodes and pushes BGRA `FrameData`
>    through the existing render path via `blocking_send`. On decoder-init failure
>    it logs once and drops H.264 frames; the RAW render path is untouched. Handles
>    the `MF_E_TRANSFORM_STREAM_CHANGE` NV12 output renegotiation + size discovery.
> 5. RAW STILL WORKS END-TO-END: `RawEncoder` is unchanged; with
>    `DEFAULT_PREFER_H264 = false` the server negotiates RAW for every session
>    (including capable agents), the agent builds the raw encoder, and the viewer's
>    existing `Raw` branch renders it — the guaranteed default/fallback path is
>    fully intact and is what runs today.
>
> PROTO (additive — no field renumbered): `VideoCodec` enum (RAW=0, H264=1,
> H265=2); `SessionResponse.video_codec = 5` (spec parity); `StartStream.video_codec
> = 3` (live negotiation); `AgentStatus.supports_h264 = 11` (capability). HEVC is a
> documented TODO/opt-in everywhere (never selected). Cargo.toml: added the
> `Win32_Media_MediaFoundation` + COM windows features (no new external crates).
>
> COMPILE-VERIFIED-ONLY / NEEDS LIVE HARDWARE (Task 8): the MF H.264 encoder
> init/feed/emit on a real GPU, the viewer MF decoder on a live stream, the
> BGRA<->NV12 fidelity end-to-end, and the synchronous-drain timing. The encoder/
> decoder are structured to fall back to raw (encoder) / drop frames + log
> (decoder) on any failure so they cannot break a session even if MF misbehaves.
>
> TESTS ADDED (19): agent +16 (encoder factory/select matrix x5, color BGRA->NV12
> x8, decoder NV12<->BGRA x3), server +3 (codec negotiation matrix, default-policy
> guardrail, AgentStatus capability ingest).
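
Item 2 splits negotiation into two pure decisions: the server's policy gate and the agent's local hardware re-guard. The Rust sketch below shows both. It assumes a local enum that mirrors the proto `VideoCodec` values (RAW=0, H264=1, H265=2). `from_wire` is a hypothetical helper standing in for the prost-generated conversion. The bodies are illustrative, not the project's code.

```rust
/// Local mirror of the proto `VideoCodec` enum (RAW=0, H264=1, H265=2).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum VideoCodec {
    Raw = 0,
    H264 = 1,
    H265 = 2,
}

impl VideoCodec {
    /// Hypothetical wire decode: unknown values map to Raw. An older server
    /// that never sets the field sends 0, which is also Raw.
    pub fn from_wire(value: i32) -> Self {
        match value {
            1 => VideoCodec::H264,
            2 => VideoCodec::H265,
            _ => VideoCodec::Raw,
        }
    }
}

/// Policy constant from the plan: raw stays the negotiated codec until H.264 is HW-validated.
pub const DEFAULT_PREFER_H264: bool = false;

/// Server side: H.264 only when the agent supports it AND policy prefers it; else raw.
pub fn select_video_codec(agent_supports_h264: bool, prefer_h264: bool) -> VideoCodec {
    if agent_supports_h264 && prefer_h264 {
        VideoCodec::H264
    } else {
        VideoCodec::Raw
    }
}

/// Agent side: re-guard the negotiated codec against local hardware.
pub fn select_codec(negotiated: VideoCodec, hardware_available: bool) -> VideoCodec {
    match negotiated {
        VideoCodec::H264 if hardware_available => VideoCodec::H264,
        // HEVC is never selected, and H.264 without local HW falls back to raw.
        _ => VideoCodec::Raw,
    }
}
```

Because neither function does any I/O, the negotiation matrix and the default-policy guardrail can be unit-tested without Media Foundation.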

Files touched: `proto/guruconnect.proto` (`VideoCodec` enum + `SessionResponse.video_codec`
+ `StartStream.video_codec` + `AgentStatus.supports_h264`), `agent/Cargo.toml` (MF/COM windows
features), `agent/src/encoder/mod.rs` (trait/factory/select), `agent/src/encoder/raw.rs`
(salvaged, unchanged), `agent/src/encoder/h264.rs` [new], `agent/src/encoder/capability.rs` [new],
`agent/src/encoder/color.rs` [new], `agent/src/session/mod.rs` (negotiated codec apply +
`supports_h264` advertise), `agent/src/viewer/mod.rs` (H.264 route + decode worker),
`agent/src/viewer/decoder.rs` [new], `server/src/session/mod.rs` (`select_video_codec` +
`DEFAULT_PREFER_H264` + `supports_h264` field/ingest + `StartStream` codec stamp),
`server/src/relay/mod.rs` (pass `supports_h264` from `AgentStatus`).
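
Item 3 describes the BGRA->NV12 conversion in `encoder/color.rs` as BT.601 limited range with 2x2 box chroma. The sketch below uses the common 8-bit integer approximation of the BT.601 coefficients. It averages each 2x2 block's RGB and then computes one interleaved U/V pair from that average. The coefficients, the tightly packed stride, and the function shape are assumptions, not the project's implementation.

```rust
/// Sketch: BGRA -> NV12 (BT.601 limited range, 2x2 box-filtered chroma).
/// Assumes tightly packed BGRA rows (stride == width * 4); real captures may pad rows.
pub fn bgra_to_nv12(bgra: &[u8], width: usize, height: usize) -> Result<Vec<u8>, String> {
    if width == 0 || height == 0 || width % 2 != 0 || height % 2 != 0 {
        return Err(format!("NV12 needs even, non-zero dimensions, got {width}x{height}"));
    }
    let needed = width * height * 4;
    if bgra.len() < needed {
        return Err(format!("short BGRA buffer: {} < {}", bgra.len(), needed));
    }
    let y_size = width * height;
    // Y plane (width*height) followed by the interleaved UV plane (width * height/2).
    let mut out = vec![0u8; y_size + y_size / 2];

    // Read (r, g, b) for pixel (x, y) from BGRA byte order.
    let rgb = |x: usize, y: usize| -> (i32, i32, i32) {
        let i = (y * width + x) * 4;
        (bgra[i + 2] as i32, bgra[i + 1] as i32, bgra[i] as i32)
    };

    // Luma: one Y sample per pixel, range 16..=235.
    for y in 0..height {
        for x in 0..width {
            let (r, g, b) = rgb(x, y);
            out[y * width + x] = (((66 * r + 129 * g + 25 * b + 128) >> 8) + 16) as u8;
        }
    }

    // Chroma: one (U, V) pair per 2x2 block, computed from the block's mean RGB.
    for by in (0..height).step_by(2) {
        for bx in (0..width).step_by(2) {
            let (mut rs, mut gs, mut bs) = (0, 0, 0);
            for (dx, dy) in [(0, 0), (1, 0), (0, 1), (1, 1)] {
                let (r, g, b) = rgb(bx + dx, by + dy);
                rs += r;
                gs += g;
                bs += b;
            }
            let (r, g, b) = ((rs + 2) / 4, (gs + 2) / 4, (bs + 2) / 4);
            let u = ((-38 * r - 74 * g + 112 * b + 128) >> 8) + 128;
            let v = ((112 * r - 94 * g - 18 * b + 128) >> 8) + 128;
            let i = y_size + (by / 2) * width + bx;
            out[i] = u as u8;
            out[i + 1] = v as u8;
        }
    }
    Ok(out)
}
```

With these coefficients, black maps to Y=16 and white to Y=235, both with neutral chroma (128). These are the limited-range reference values that black/white/red tests like those in item 3 would check.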

- HW **H.264** via Windows Media Foundation (transparently NVENC/AMF/QuickSync) emitting the proto's
  `EncodedFrame` (h264). Native viewer decodes via MF/D3D11.
- **Fallback:** salvaged raw BGRA + Zstd + dirty-rects for Win7 / no HW encoder.
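- **Negotiation:** agent advertises HW-encode capability in `AgentStatus`; server selects the codec (spec:
  `SessionResponse`; as built: `StartStream.video_codec`, since `SessionResponse` is not exchanged on the wire
  in v2). Target default H.264, HEVC opt-in, raw fallback; as built, `DEFAULT_PREFER_H264 = false` keeps raw
  until H.264 is hardware-validated. One encode feeds the native viewer now (and the Phase-2 WebRTC track later).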
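
The sketch below shows the encoder abstraction and the init-failure fallback described above (item 1 of the as-built notes). `CapturedFrame`, `EncodedFrame`, and the `codec()` method are simplified stand-ins. `init_h264` is a hypothetical injection point that replaces the MF MFT activation, so the fallback can be exercised without hardware.

```rust
/// Local mirror of the codec choice (the real type is the prost-generated `VideoCodec`).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum VideoCodec {
    Raw,
    H264,
    H265,
}

/// Simplified stand-in for a captured desktop frame.
pub struct CapturedFrame {
    pub width: u32,
    pub height: u32,
    pub bgra: Vec<u8>,
}

/// Simplified stand-in for the encoder output.
#[derive(Debug)]
pub enum EncodedFrame {
    Raw { width: u32, height: u32, data: Vec<u8> },
    H264 { data: Vec<u8>, keyframe: bool },
}

/// The abstraction both encoders implement.
pub trait Encoder {
    fn encode(&mut self, frame: &CapturedFrame) -> Result<EncodedFrame, String>;
    fn codec(&self) -> VideoCodec;
}

/// Stand-in for the salvaged `RawEncoder` (the real one adds Zstd + dirty rects).
pub struct RawEncoder;

impl Encoder for RawEncoder {
    fn encode(&mut self, frame: &CapturedFrame) -> Result<EncodedFrame, String> {
        Ok(EncodedFrame::Raw {
            width: frame.width,
            height: frame.height,
            data: frame.bgra.clone(),
        })
    }

    fn codec(&self) -> VideoCodec {
        VideoCodec::Raw
    }
}

/// Build the encoder for `codec`. An H.264 init failure is logged and degrades to raw,
/// so building an encoder never fails.
pub fn create_encoder_for<F>(codec: VideoCodec, quality: u32, init_h264: F) -> Box<dyn Encoder>
where
    F: FnOnce(u32) -> Result<Box<dyn Encoder>, String>,
{
    match codec {
        VideoCodec::H264 => match init_h264(quality) {
            Ok(encoder) => encoder,
            Err(e) => {
                eprintln!("H.264 encoder init failed ({e}); falling back to raw");
                Box::new(RawEncoder)
            }
        },
        // HEVC is never built here; raw is always constructible.
        VideoCodec::H265 | VideoCodec::Raw => Box::new(RawEncoder),
    }
}
```

Returning a boxed raw encoder on any H.264 init error keeps the session running even when Media Foundation misbehaves. Per-frame errors still propagate through `encode`.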

Reference: SPEC-002 §5; `agent/src/encoder/raw.rs` (salvaged), `proto/guruconnect.proto` (EncodedFrame already modeled).
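
Item 4 explains why H.264 decode runs on a dedicated OS thread rather than a tokio task. The std-only sketch below shows that shape. `AccessUnit`, `FrameData`, and `spawn_decode_worker` are hypothetical names. A `SyncSender` stands in for the tokio channel's `blocking_send`, and the `make_decoder` factory stands in for MF decoder creation. The key point is that the decoder is constructed on the worker thread itself, so thread-affine state never crosses threads.

```rust
use std::sync::mpsc::{Receiver, SyncSender};
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for one H.264 access unit from the receive task.
pub struct AccessUnit(pub Vec<u8>);

/// Hypothetical stand-in for a decoded BGRA frame handed to the render path.
pub struct FrameData {
    pub width: u32,
    pub height: u32,
    pub bgra: Vec<u8>,
}

/// Spawn a named OS thread that owns the decoder for its whole life.
/// Only the factory crosses threads (hence `F: Send`); the decoder `D` needs no `Send`.
pub fn spawn_decode_worker<F, D>(
    rx: Receiver<AccessUnit>,
    frames: SyncSender<FrameData>,
    make_decoder: F,
) -> std::io::Result<JoinHandle<()>>
where
    F: FnOnce() -> Option<D> + Send + 'static,
    D: FnMut(&[u8]) -> Option<FrameData> + 'static,
{
    thread::Builder::new()
        .name("gc-h264-decode".into())
        .spawn(move || {
            // Create the decoder ON this thread (the real one would set up COM/MF here).
            let Some(mut decode) = make_decoder() else {
                eprintln!("H.264 decoder init failed; dropping H.264 frames");
                for _ in rx {} // drain and drop until the receive task hangs up
                return;
            };
            // Iteration ends when the receive task drops its Sender.
            for au in rx {
                if let Some(frame) = decode(&au.0) {
                    // Blocks when the render queue is full; stops if the render side is gone.
                    if frames.send(frame).is_err() {
                        break;
                    }
                }
            }
        })
}
```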

---

## Task 8: Verification (end-to-end, observable)

- **Security (re-audit clean):** run `/gc-audit --pass=security` (and `--pass=rust`) → the three relay
  CRITICALs and the rate-limiting/frame-cap/reusable-code HIGHs are gone. Manually confirm: a revoked
  `cak_` key is rejected on `/ws/agent`; a viewer token for session A is rejected on session B; a
  logged-out (blacklisted) viewer token is rejected on `/ws/viewer`; a user JWT is rejected as an agent key.
- **Attended flow:** generate a support code → run the one-time agent → end user sees + accepts consent →
  technician's session appears only after acceptance; a denied consent tears down.
- **Key fidelity (the headline):** in a live session, confirm **Win+R opens Run on the remote**, **Ctrl+C
  on remote / Ctrl+V locally** (and vice-versa, text), and **Ctrl+Alt+Del reaches the remote secure
  desktop**. Confirm no stuck modifiers after alt-tabbing away and back.
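- **Codec:** confirm a HW-H.264 machine negotiates h264 (check `StartStream.video_codec`, the live selection
  point; `SessionResponse` is not exchanged on the wire in v2, and `DEFAULT_PREFER_H264` must be set to `true`
  for a capable agent to get H.264), a Win7/no-HW machine falls back to raw+Zstd, and both render correctly
  in the native viewer.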
- **Rate limiting:** hammer `/api/auth/login` and the code-validate route → confirm throttling/lockout.
- **Migrations:** fresh DB applies the v2 migrations cleanly; `_sqlx_migrations` consistent; `tenant_id`
  populated with the default tenant.

---

## Task 9 [PROPOSED 2026-06-01 — provisioning model = TOFU auto-enroll, chosen by Mike]: `cak_` auto-enroll provisioning + shared-key retirement

> Context: Task 2 built the SERVER `cak_` machinery (mint/SHA-256 hash/verify in `auth/agent_keys.rs`,
> relay validation in `validate_agent_api_key`, admin issuance `POST /api/machines/:id/keys`). What's
> missing is how an AGENT obtains and uses a `cak_` — today agents still carry the deprecated shared
> `AGENT_API_KEY`, so `connect_agent_keys` is empty and the relay logs the DEPRECATED-shared-key warning
> for every agent. This task closes that with **trust-on-first-use auto-enroll** so the shared key can be
> retired (unblocks task list #5). NOTE: the agent already presents whatever is in its `api_key` slot and
> the relay auto-detects `cak_` vs shared — so a `cak_`-keyed agent needs **no change to its auth call**,
> only a way to *receive*, *persist*, and *prefer* a `cak_`.
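
The relay's auto-detection of `cak_` vs shared keys, noted above, amounts to a prefix dispatch. The sketch below is hypothetical: `AgentCredential`, `classify_agent_key`, and `ct_eq` are illustrative names, not the contents of `validate_agent_api_key`, and the log line is illustrative too. A `cak_` key would still be hashed and looked up afterwards. Passing `None` for the shared key models Phase B, where only `cak_` keys are accepted.

```rust
/// How a presented agent credential was classified (hypothetical names).
#[derive(Debug, PartialEq, Eq)]
pub enum AgentCredential<'a> {
    /// Per-machine `cak_` key: next step is hash + lookup in `connect_agent_keys`.
    MachineKey(&'a str),
    /// Deprecated shared `AGENT_API_KEY` (bootstrap only).
    SharedKey,
}

/// Byte comparison without an early exit on the first mismatch.
/// Sketch only: the optimizer does not guarantee constant time.
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// Dispatch on the `cak_` prefix; anything else must match the shared key or is refused.
pub fn classify_agent_key<'a>(presented: &'a str, shared: Option<&str>) -> Option<AgentCredential<'a>> {
    if presented.starts_with("cak_") {
        return Some(AgentCredential::MachineKey(presented));
    }
    match shared {
        Some(s) if !s.is_empty() && ct_eq(presented.as_bytes(), s.as_bytes()) => {
            // Illustrative log line for the deprecated path.
            eprintln!("WARN: agent authenticated with DEPRECATED shared key");
            Some(AgentCredential::SharedKey)
        }
        // Fail closed: unknown key, empty shared key, or shared key retired (Phase B).
        _ => None,
    }
}
```

Production code would typically use a vetted constant-time comparison, such as the `subtle` crate, rather than the hand-rolled `ct_eq`.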

**Flow (TOFU):**
1. **Bootstrap (first connect):** a fresh agent authenticates on `/ws/agent` with a bootstrap secret —
   interim: the shared `AGENT_API_KEY` (embedded by the download endpoint); target: a single-use,
   short-lived **enroll token** (more secure TOFU — see Security).
2. **Server issues on first connect:** when an agent authed via the bootstrap path (i.e. NOT already
   `cak_`-keyed) connects and its machine has **no active (non-revoked) `cak_`**, the relay: resolves/creates
   the machine row (existing `upsert_machine` on `machine_uid` — now functional after the 2026-06-01
   ON CONFLICT fix), mints a `cak_` (`generate_agent_key` + `db::agent_keys::insert_agent_key` for that
   `machine_id`), and sends the plaintext key to the agent **once** over a new server→agent message. Only
   the hash is stored. **Idempotent:** never re-issue if an active key already exists for the machine.
3. **Agent receives + persists + prefers:** on `AgentKeyProvision`, the agent persists the `cak_` durably at
   `%ProgramData%\GuruConnect\agent_key` (restricted ACL, same pattern as `machine_uid`). On startup it loads
   the persisted `cak_` if present and uses it as its auth key, falling back to the embedded/bootstrap secret
   only when no `cak_` is stored yet. After provisioning, every reconnect authenticates via `cak_` (no more
   DEPRECATED-shared-key warning for that agent).
4. **Shared-key retirement (phased):** Phase A — shared key stays as the bootstrap so existing+new agents
   self-enroll; monitor the relay WARN count → ~0. Phase B — once the fleet is `cak_`-keyed, restrict the
   shared `AGENT_API_KEY` to enrollment-only or remove the env entirely (only `cak_` / enroll-token accepted).
   This is the concrete completion of task-list #5.
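
Step 3 (receive, persist, prefer) can be sketched with the standard library alone. `KeyStore` and `auth_key` are hypothetical names. The directory is passed in rather than hard-coded to `%ProgramData%\GuruConnect\agent_key`. The restricted ACL is not shown because setting it is Windows-specific.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical key store: a single file holding the provisioned `cak_`.
pub struct KeyStore {
    path: PathBuf,
}

impl KeyStore {
    pub fn new(dir: &Path) -> Self {
        Self { path: dir.join("agent_key") }
    }

    /// Return the persisted key only if it is present and looks like a `cak_` key.
    pub fn load(&self) -> Option<String> {
        let contents = fs::read_to_string(&self.path).ok()?;
        let key = contents.trim();
        key.starts_with("cak_").then(|| key.to_string())
    }

    /// Persist via write-to-temp then rename, so a crash never leaves a partial key.
    /// NOTE: restricting the file ACL is platform-specific and omitted in this sketch.
    pub fn persist(&self, key: &str) -> io::Result<()> {
        if !key.starts_with("cak_") {
            return Err(io::Error::new(io::ErrorKind::InvalidInput, "not a cak_ key"));
        }
        let tmp = self.path.with_extension("tmp");
        fs::write(&tmp, key)?;
        fs::rename(&tmp, &self.path)
    }
}

/// Prefer the provisioned `cak_`; fall back to the embedded bootstrap secret.
pub fn auth_key(store: &KeyStore, bootstrap: &str) -> String {
    store.load().unwrap_or_else(|| bootstrap.to_string())
}
```

Because `load` rejects anything without the `cak_` prefix, a missing or corrupt file degrades to the bootstrap path instead of presenting garbage as the agent's key.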

**Protocol (4-artifact drift discipline):** add `AgentKeyProvision { string key = 1; }` (server→agent) to
`proto/guruconnect.proto` with a new reserved message ID; regenerate prost on both agent + server; the
hand-written `dashboard/src/lib/protobuf.ts` decoder does NOT need it (agent-plane only) but reserve the ID.

**Files:** `proto/guruconnect.proto` (new message); `server/src/relay/mod.rs` (issue+send on bootstrap connect
with no active key); `server/src/db/agent_keys.rs` (add `has_active_key(machine_id)` check; reuse insert);
`agent/src/transport/*` (handle inbound `AgentKeyProvision`); `agent/src/config.rs` + a small key-store module
(load/persist `cak_`, prefer over bootstrap).
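
Step 2's idempotent issue-on-first-connect rule, together with the `has_active_key(machine_id)` check listed above, can be sketched against an in-memory table. Everything here is hypothetical. `AgentKeyTable` stands in for `connect_agent_keys`, and `revoke_all` stands in for the admin revocation route. `mint` and `hash` are injected in place of `generate_agent_key` and SHA-256, so the sketch needs no external crates.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `connect_agent_keys`: machine_id -> [(key_hash, revoked)].
#[derive(Default)]
pub struct AgentKeyTable {
    rows: HashMap<String, Vec<(String, bool)>>,
}

impl AgentKeyTable {
    /// True if the machine holds at least one non-revoked key.
    pub fn has_active_key(&self, machine_id: &str) -> bool {
        self.rows
            .get(machine_id)
            .map_or(false, |keys| keys.iter().any(|(_, revoked)| !*revoked))
    }

    /// Store only the hash of a newly minted key.
    pub fn insert_agent_key(&mut self, machine_id: &str, key_hash: String) {
        self.rows.entry(machine_id.to_string()).or_default().push((key_hash, false));
    }

    /// Hypothetical: revoke every key for a machine.
    pub fn revoke_all(&mut self, machine_id: &str) {
        if let Some(keys) = self.rows.get_mut(machine_id) {
            for key in keys.iter_mut() {
                key.1 = true;
            }
        }
    }
}

/// Issue a `cak_` only for a bootstrap-authed agent whose machine has no active key.
/// Returns the plaintext to send ONCE in `AgentKeyProvision`; only its hash is stored.
pub fn maybe_provision(
    table: &mut AgentKeyTable,
    machine_id: &str,
    authed_with_cak: bool,
    mint: impl FnOnce() -> String,
    hash: impl Fn(&str) -> String,
) -> Option<String> {
    if authed_with_cak || table.has_active_key(machine_id) {
        return None; // idempotent: never re-issue
    }
    let plaintext = mint();
    table.insert_agent_key(machine_id, hash(&plaintext));
    Some(plaintext)
}
```

Against a real database, the check and the insert would presumably need to be atomic (for example, a transaction or a uniqueness constraint on active keys) so that two concurrent bootstrap connects cannot both mint a key.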

**Security (TOFU):** the first connect trusts the bootstrap secret — a leaked shared key during the enroll
window could enroll a rogue agent; the secure target is a **single-use, short-lived enroll token** per
deployment instead of the shared key (shared-key bootstrap is interim convenience). The `cak_` is sent
plaintext once over the existing wss/TLS channel; only the hash is stored server-side; the agent stores it
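locally with restricted ACLs. Revocation via the existing `DELETE /api/machines/:id/keys/:key_id` fails the
agent closed; on its next bootstrap connect it re-enrolls. This presumes the agent discards its rejected
`cak_`, since step 3 falls back to the bootstrap secret only when no `cak_` is stored. The keyed-agent
dedup (Task 3) keeps the authenticated identity authoritative.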

**Verification:** drop a current-build (signed 0.3.0+) agent configured with the shared-key bootstrap →
it connects, receives a `cak_`, persists it; restart → it authenticates via the `cak_` (relay shows NO
DEPRECATED-shared-key warning) and `connect_agent_keys` holds exactly one active key for the machine; issue
is idempotent across reconnects; revoke the key via the admin API → agent rejected, then re-enrolls on next
bootstrap connect. Reference: `auth/agent_keys.rs`, `api/machine_keys.rs`, `relay/mod.rs:266-309`
(`validate_agent_api_key`), `.claude/standards/security/credential-handling.md`.