# v2 Secure Session Core — Implementation Plan

> **STATUS 2026-05-30: Tasks 1–7 IMPLEMENTED + DEPLOYED. Tasks 3–5 now CODE-REVIEWED — verdict
> APPROVE-WITH-FIXES (no CRITICAL/HIGH).** Compile-verified on GURU-5070: `cargo fmt --check` clean,
> `clippy -D warnings` 0 warnings, `cargo test --workspace` 89 pass. The 3 audit CRITICALs verified
> closed with no bypass; all security paths fail closed. Non-blocking follow-ups tracked: viewer-token
> logout revocation (MEDIUM, TTL-bounded), delete the dead `validate_agent_key` "accept-any" placeholder
> (MEDIUM), `X-Real-IP`/consent-comment/support-code-log hygiene (LOW). **Remaining for Phase-1 exit:
> Task 8 (e2e verification + `/gc-audit --pass=security` re-audit).**
>
> Spec created: 2026-05-29
> Status: in progress — Tasks 1-4 IMPLEMENTED 2026-05-29 (Task 4 self-reviewed, pending Code Review;
> Tasks 1-3 code-reviewed APPROVED). Task 4 completes the KEYSTONE (secure auth/session core). Viewer-token authz
> STRENGTH split IMPLEMENTED 2026-05-29 (self-reviewed; no Rust toolchain on this machine — not yet
> `cargo check`-verified; pending Code Review). This was the REQUIRED Phase-1-exit follow-up: the gate
> previously used `view` (held by EVERY default role incl. `viewer`) but a viewer token granted input
> CONTROL. DECIDED (Mike, 2026-05-29) + IMPLEMENTED: SPLIT VIEW_ONLY/CONTROL tokens — `view`-perm users
> get a watch-only token (relay refuses their input), admin/`control` users get a control token. See the
> "Task 3 authz-strength fix" block under Task 3 below. Resolves coord todo c8916c89 (coordinator marks
> done after review). Remaining follow-up: nothing revokes a minted viewer token on logout (bounded by
> 5-min TTL) — follow-up todo. Task 4 (rate limiting + single-use codes) next.
> CARRY-FORWARD: Task 3 MUST add a viewer-token AUTHORIZATION check (admin/permission gate) — Task 2
> fixed only the token *mechanism*; the authz gate is what actually closes audit CRITICAL #1.
> Policy DECIDED (Mike, 2026-05-29): admin-or-view-permission (`is_admin() || has_permission(...)`).
> Parent: `docs/specs/SPEC-002-v2-modernization-architecture.md` (Phase 1)
> Keystone: Tasks 1–4 are the "get-right-first" secure auth/session core — every audit CRITICAL/HIGH
> is closed there. Tasks 5–7 deliver the product capability on top. Do them in order.

## Task 0: Commit this spec

Commit the `specs/v2-secure-session-core/` directory before writing any code:

```
git add specs/v2-secure-session-core/
git commit -m "spec: add v2-secure-session-core shape spec"
```

Do not start Task 1 until this commit exists.

---

## Task 1 (KEYSTONE) [DONE 2026-05-29]: v2 schema — per-agent keys + tenancy-ready tables

> [DONE] migration `004_v2_secure_session_core.sql` + `db/agent_keys.rs` + `db/tenancy.rs` + struct/query
> updates across machines/sessions/support_codes/events/users. Code-reviewed APPROVED. Note: GC's db
> layer already uses runtime `sqlx::query()` (no macros) — the v2 "switch to runtime" was already true.

Files touched: `server/migrations/` (new v2 migration files), `server/src/db/` (rebuilt modules:
`agent_keys.rs` [new], `sessions.rs`, `machines.rs`, `support_codes.rs`, `events.rs`, `users.rs`,
`mod.rs`).

- New table `connect_agent_keys`: `id UUID`, `machine_id UUID FK`, `key_hash TEXT`, `tenant_id UUID NULL`,
  `created_at`, `last_used_at`, `revoked_at TIMESTAMPTZ NULL`. Keys are `cak_`-prefixed, stored hashed
  (SHA-256, mirroring v1's hash helper); plaintext returned once at issuance.
- Add nullable `tenant_id UUID` to `machines`, `sessions`, `support_codes`, `events`, `users`,
  `connect_agent_keys`, defaulting to a single bootstrap tenant row. Add a `tenants` table with one seed row.
- `sessions` gains `is_managed BOOLEAN`, `source TEXT` (`'standalone'|'gururmm'`), `consent_state TEXT`
  (`'not_required'|'pending'|'granted'|'denied'`), `tenant_id`.
- `support_codes`: add single-use semantics — `consumed_at TIMESTAMPTZ NULL`; widen the code column to
  hold a higher-entropy human-readable code (see Task 4).
- Migrations are **idempotent** (`CREATE TABLE IF NOT EXISTS` / `ADD COLUMN IF NOT EXISTS`), applied on
  server startup, recorded in `_sqlx_migrations`. New queries use runtime `sqlx::query()`.
- DB modules expose a `tenancy` helper that today resolves every call to the default tenant (Phase-4
  switch point). Struct fields match columns (the v1 audit flagged 3 unmapped v003 columns — don't repeat).

Reference: `specs/native-remote-control/plan.md` Task 2 (`connect_agent_keys`); `.claude/standards/gururmm/sqlx-migrations.md`.

---

## Task 2 (KEYSTONE) [DONE 2026-05-29 — code-reviewed APPROVED]: Rebuilt auth model — plane separation + session-scoped viewer tokens

> CARRY-FORWARD TO TASK 3: viewer-token minting is gated only by `AuthenticatedUser` (authentication,
> not authorization). GC has a real `admin|operator|viewer` role + permissions model, so this is intra-
> tenant privilege escalation until a permission check is added. **The mechanism is fixed here; the authz
> check in Task 3 is what closes audit CRITICAL #1.** Metadata bug todo faf39fe0 resolved (migration 005).

> [IMPLEMENTED] `auth/agent_keys.rs` [new] (cak_ mint/SHA-256 hash/verify), `auth/jwt.rs`
> (`ViewerClaims` + `create_viewer_token`/`validate_viewer_token`, 5-min TTL, `purpose:"viewer"`),
> `auth/mod.rs` (module + re-export). Deleted the JWT-as-agent-key branch in `relay/mod.rs`
> `validate_agent_api_key` — now per-agent `cak_` key OR deprecated shared `AGENT_API_KEY` (WARNING-logged),
> never a user JWT. New endpoints: `POST/GET /api/machines/:agent_id/keys`,
> `DELETE /api/machines/:agent_id/keys/:key_id` (admin), `POST /api/sessions/:id/viewer-token` (dashboard JWT).
> db helpers added: `agent_keys::{list_for_machine,key_belongs_to_machine}`. Folded in migration
> `005_machine_metadata.sql` + Machine struct org/site/tags mapping (coord todo faf39fe0). No Rust
> toolchain on this machine — self-reviewed; not yet `cargo check`-verified.

Files touched: `server/src/auth/` (`mod.rs`, `jwt.rs`, `agent_keys.rs` [new], `token_blacklist.rs`,
`password.rs`), `server/src/api/auth.rs`, `server/src/api/sessions.rs`.

- **Delete the JWT-as-agent-key path.** Remove the `jwt_config.validate_token` branch from
  `validate_agent_api_key` (`relay/mod.rs:224`); agent authentication validates a `cak_` per-agent key
  (hash compare against `connect_agent_keys`, reject if `revoked_at` set) OR a support code — **never a
  user JWT**.
- **Session-scoped viewer tokens:** new endpoint `POST /api/sessions/:id/viewer-token` (auth: dashboard
  JWT + authorization that the user may view that session/machine) mints a short-lived (~5 min) JWT whose
  claims include `session_id` and `tenant_id`. This replaces "any dashboard JWT can view any session."
- Keep Argon2id passwords; keep the blacklist but make `is_revoked()` callable from the WS layer (Task 3).
- Per-agent key issuance endpoint (admin): `POST /api/machines/:id/keys` → returns plaintext `cak_` once,
  stores hash. `DELETE /api/machines/:id/keys/:key_id` sets `revoked_at`.

Reference: `relay/mod.rs:224` (`validate_agent_api_key` — the CRITICAL), `auth/mod.rs:116`
(blacklist already consulted for REST — extend to WS), `specs/native-remote-control/plan.md` Tasks 2/3/6;
`.claude/standards/security/credential-handling.md`, `.claude/standards/api/response-format.md`.

---

## Task 3 (KEYSTONE) [IMPLEMENTED 2026-05-29 — self-reviewed; no Rust toolchain on this machine, not yet `cargo check`-verified]: Secure relay WS handlers + bounded relay

> [IMPLEMENTED] Viewer WS now verifies the session-scoped VIEWER token
> (`validate_viewer_token`: sig+exp+`purpose`) + `token_blacklist.is_revoked` +
> `session_id` claim == requested session, before upgrade — raw login JWTs are no
> longer accepted (closes CRITICAL #1 mechanism + #2). Authz gate added to
> `mint_viewer_token`: `is_admin() || has_permission("view")` → 403 envelope on
> failure (closes CRITICAL #1 — uses the EXISTING `view` permission from GC's
> catalog; no new permission defined). Agent WS now binds persistent reattach to
> the authenticated machine identity: `validate_agent_api_key` returns an
> `AgentKeyAuth` enum carrying the `cak_` key's machine `agent_id` (resolved via
> new `db::machines::get_machine_by_id`); a mismatched query-string `agent_id` is
> ignored, a per-agent key whose machine can't be resolved fails closed
> (503). Frame caps set on BOTH upgrades (agent 4 MiB, viewer 64 KiB via
> `max_message_size`/`max_frame_size`) (closes WS-OOM HIGH). Viewer→agent input
> throttled to 200 events/sec/viewer via a refilling token bucket + non-blocking
> bounded `try_send` (drop/coalesce on overflow) (closes input-injection MEDIUM).
> Startup managed-session reconcile retained + clarified (persistent machines →
> offline in-memory sessions). Removed `#[allow(dead_code)]` on
> `validate_viewer_token` and `AuthenticatedUser::has_permission`. No token/secret
> logged; runtime `sqlx::query`/`query_as`. Files: `server/src/relay/mod.rs`,
> `server/src/api/sessions.rs`, `server/src/db/machines.rs`, `server/src/auth/mod.rs`,
> `server/src/auth/jwt.rs`, `server/src/main.rs`.

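The input throttle described above (200 events/sec per viewer, refilling token bucket, non-blocking bounded send) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in `relay/mod.rs`: `TokenBucket`, `try_take`, and `forward_input` are hypothetical names, the clock is passed in so the behaviour can be checked without sleeping, and `std::sync::mpsc::sync_channel` stands in for whatever bounded channel the relay actually uses.

```rust
use std::sync::mpsc::SyncSender;
use std::time::Instant;

/// Refilling token bucket: holds at most `capacity` tokens and regains
/// `refill_per_sec` tokens for every second that passes.
pub struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_refill: Instant,
}

impl TokenBucket {
    /// Starts full, so a short burst up to `capacity` is allowed.
    pub fn new(capacity: u32, refill_per_sec: u32, now: Instant) -> Self {
        Self {
            capacity: f64::from(capacity),
            tokens: f64::from(capacity),
            refill_per_sec: f64::from(refill_per_sec),
            last_refill: now,
        }
    }

    /// Returns true if one event may pass at `now`; false means drop it.
    pub fn try_take(&mut self, now: Instant) -> bool {
        let elapsed = now.saturating_duration_since(self.last_refill).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_refill = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

/// Forwards one input event if the bucket allows it AND the bounded queue
/// has room. Never blocks: an over-rate or overflowing event is dropped.
pub fn forward_input<T>(
    bucket: &mut TokenBucket,
    tx: &SyncSender<T>,
    event: T,
    now: Instant,
) -> bool {
    bucket.try_take(now) && tx.try_send(event).is_ok()
}
```

Checking the rate before the queue means a flooding viewer is rejected without touching the channel, and a slow agent can only ever hold one queue's worth of pending input.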
### Task 3 authz-strength fix — VIEW_ONLY/CONTROL token split [IMPLEMENTED 2026-05-29 — self-reviewed; no Rust toolchain on this machine, not yet `cargo check`-verified; pending Code Review]

> Closes audit CRITICAL #1 at full strength (coord todo c8916c89). The Task-3 gate
> minted a viewer token for any `is_admin() || has_permission("view")` user, but `view`
> is held by EVERY default role (incl. `viewer`) and the token granted input CONTROL —
> intra-tenant privilege escalation. Now the token carries an ACCESS MODE inside its
> signed claims and the relay enforces it:
>
> - `auth/jwt.rs`: new `ViewerAccess` enum (`ViewOnly` | `Control`, serde-renamed to
> `"view_only"`/`"control"`); `ViewerClaims` gains an `access: ViewerAccess` field;
> `create_viewer_token(..., access)` stamps it; `validate_viewer_token` returns it as
> part of the claims (sig+exp+`purpose` checks unchanged). New unit tests cover the
> round-trip, the lowercase wire form, and login-JWT rejection.
> - `auth/mod.rs`: re-export `ViewerAccess`.
> - `api/sessions.rs` (`mint_viewer_token`): TIERED mint — `is_admin() || has_permission("control")`
> → CONTROL token; else `has_permission("view")` → VIEW_ONLY token; else → 403 (standard
> envelope). Permission constants `SESSION_CONTROL_PERMISSION="control"` /
> `SESSION_VIEW_PERMISSION="view"`. Response echoes `access` (advisory; the signed claim
> is authoritative).
> - `relay/mod.rs`: `viewer_ws_handler` reads `claims.access` from the VERIFIED token and
> threads it into `handle_viewer_connection` (new `access: ViewerAccess` param). In the
> input path, a view-only token's `MouseEvent`/`KeyEvent`/`SpecialKey` are refused (a
> guarded match arm `if !access.can_control()` that silently drops + logs once-per-
> power-of-two), BEFORE the throttle/`try_send`. A control token forwards as before (with
> the Task-3 throttle). Video still streams to a view-only viewer; chat (not an injected-
> input vector) is still relayed. The mode cannot be forged — it lives in the signed token.
>
> Everything else from Task 3 (session_id-claim match, blacklist, frame caps, throttle,
> agent identity binding) is intact — this is purely additive access-mode enforcement.
>
> PHASE-2 REFINEMENT: this refuses to FORWARD input for a view-only token; it does NOT yet
> tie the viewer mode to the agent-side `SessionType.VIEW_ONLY` capture mode (the agent still
> does full capture). Deferred (deeper agent change).

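The tiered mint and the relay-side refusal can be sketched as two pure functions. `ViewerAccess`, `can_control`, and the two permission constants come from the notes above; everything else (`User`, `access_for`, `ViewerInput`, `should_forward`) is a hypothetical stand-in for `AuthenticatedUser`, `mint_viewer_token`, and the proto message types, whose real shapes are not shown in this plan. JWT signing and serde renaming are left out.

```rust
/// Access mode carried in the signed viewer-token claims.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ViewerAccess {
    ViewOnly,
    Control,
}

impl ViewerAccess {
    pub fn can_control(self) -> bool {
        matches!(self, ViewerAccess::Control)
    }
}

pub const SESSION_CONTROL_PERMISSION: &str = "control";
pub const SESSION_VIEW_PERMISSION: &str = "view";

/// Hypothetical stand-in for the server's `AuthenticatedUser`.
pub struct User {
    pub is_admin: bool,
    pub permissions: Vec<String>,
}

impl User {
    fn has_permission(&self, name: &str) -> bool {
        self.permissions.iter().any(|p| p == name)
    }
}

/// Tiered mint decision. `None` means the endpoint answers 403.
pub fn access_for(user: &User) -> Option<ViewerAccess> {
    if user.is_admin || user.has_permission(SESSION_CONTROL_PERMISSION) {
        Some(ViewerAccess::Control)
    } else if user.has_permission(SESSION_VIEW_PERMISSION) {
        Some(ViewerAccess::ViewOnly)
    } else {
        None
    }
}

/// Hypothetical simplification of the viewer-to-agent message kinds.
#[derive(Debug, PartialEq, Eq)]
pub enum ViewerInput {
    Mouse,
    Key,
    SpecialKey,
    Chat,
}

/// Relay-side gate: injected input needs a control token; chat is not an
/// input-injection vector, so it is relayed in either mode.
pub fn should_forward(access: ViewerAccess, msg: &ViewerInput) -> bool {
    match msg {
        ViewerInput::Mouse | ViewerInput::Key | ViewerInput::SpecialKey => access.can_control(),
        ViewerInput::Chat => true,
    }
}
```

Because the relay derives `access` only from the verified token, a client that edits the advisory `access` field in the mint response gains nothing.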
Files touched: `server/src/relay/mod.rs`, `server/src/session/mod.rs`.

- **`viewer_ws_handler`** (`relay/mod.rs:242`): verify the viewer token's **signature + expiry +
  blacklist + `session_id` claim == requested `session_id`** before `handle_viewer_connection`
  (`relay/mod.rs:595`). Reject otherwise. (Fixes the any-JWT-joins-any-session + blacklist-bypass CRITICALs.)
- **Viewer-token AUTHORIZATION (carry-forward from Task 2 review) — this is what actually closes audit
  CRITICAL #1.** The minting endpoint `POST /api/sessions/:id/viewer-token` (in `server/src/api/sessions.rs`)
  must enforce a real permission predicate, not just `AuthenticatedUser`: `user.is_admin() ||
  user.has_permission(<policy>)`. GC's role model (`admin|operator|viewer`) + permissions table already
  exist (`server/src/auth/mod.rs`), so honoring the intra-tenant role distinction is cheap. **Policy
  DECIDED (Mike, 2026-05-29): admin-or-view-permission** — `user.is_admin() || user.has_permission(<the
  session-view/control permission in GC's catalog — use the real name; define one if absent>)`. Enforce
  at the minting endpoint `mint_viewer_token` (the authz decision point); the WS then trusts the
  session-scoped token. Multi-tenant client-access isolation stays deferred to Phase 4.
- **`agent_ws_handler`** (`relay/mod.rs:55`): authenticate via per-agent key OR support code only
  (Task 2). Persistent reattach must bind to the authenticated machine identity, not a query-string
  `agent_id` alone (`session/mod.rs:98`).
- **Frame caps:** set explicit `.max_message_size(...)`/`.max_frame_size(...)` on both
  `WebSocketUpgrade`s; reject oversized frames before `to_vec()`/broadcast. (Fixes WS-OOM HIGH.)
- **Input throttle:** bound + rate-limit the viewer→agent input queue (`relay/mod.rs:669`); cap events/sec.
- Reconcile managed sessions from DB on startup so they aren't orphaned.

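The pre-upgrade checks in the first bullet can be read as an ordered list of rejections. The sketch below assumes the signature has already been verified by the JWT library and works only on the decoded claims. The field set of `ViewerClaims` shown here, the `jti`-keyed revocation set, and the names `Reject` and `check_viewer_claims` are assumptions made for illustration; the real `validate_viewer_token` and `token_blacklist.is_revoked` signatures are not given in this plan.

```rust
use std::collections::HashSet;

/// Decoded claims of an already signature-verified viewer token
/// (hypothetical field set).
pub struct ViewerClaims {
    pub session_id: String,
    pub purpose: String,
    pub exp: u64, // expiry, seconds since the Unix epoch
    pub jti: String,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Reject {
    WrongPurpose,
    Expired,
    Revoked,
    SessionMismatch,
}

/// Every failing check rejects; the WebSocket upgrade happens only on `Ok`.
pub fn check_viewer_claims(
    claims: &ViewerClaims,
    requested_session: &str,
    now_unix: u64,
    revoked: &HashSet<String>,
) -> Result<(), Reject> {
    // A login JWT has no viewer purpose, so it is refused here.
    if claims.purpose != "viewer" {
        return Err(Reject::WrongPurpose);
    }
    if now_unix >= claims.exp {
        return Err(Reject::Expired);
    }
    if revoked.contains(&claims.jti) {
        return Err(Reject::Revoked);
    }
    // The token is scoped to exactly one session.
    if claims.session_id != requested_session {
        return Err(Reject::SessionMismatch);
    }
    Ok(())
}
```

Returning a distinct reason per check is useful for audit events; the response sent to the client can still be a single generic rejection.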
Reference: audit Pass E (`reports/2026-05-29-gc-audit.md` §"Pass 5"); `relay/mod.rs:242,55,595,669`.

---

## Task 4 (KEYSTONE) [IMPLEMENTED 2026-05-29 — self-reviewed; no Rust toolchain on this machine, not yet `cargo check`-verified; pending Code Review]: Working rate limiting + single-use support codes

> [IMPLEMENTED] Closes the keystone (Tasks 1–4). Three parts:
>
> A. RATE LIMITING — replaced the non-compiling tower_governor layer with a small
> self-contained in-memory limiter (`middleware/rate_limit.rs`): a per-IP
> fixed-window `RateLimiter` (`Mutex<HashMap<IpAddr, Window>>`, no new dep) +
> a per-IP consecutive-failure `FailureLockout`, bundled as `RateLimitState`
> in `AppState`. Keyed by `ConnectInfo<SocketAddr>` IP (same source the relay
> uses); X-Forwarded-For intentionally NOT trusted (proxy-spoofable). 429 with
> the standard error envelope on limit. Re-enabled `pub mod rate_limit`. Wired
> per-route via `route_layer(from_fn_with_state(...))` onto `POST
> /api/auth/login` (8/min/IP), `POST /api/auth/change-password` (5/min/IP), and
> `GET /api/codes/:code/validate` (15/min/IP). Named consts for every limit.
> LOCKOUT: after 10 consecutive failed code-validations from an IP, that IP is
> locked out 15 min; the validate handler reports success/failure into the
> lockout, the middleware enforces it BEFORE the handler runs. Unit tests cover
> window allow/block/reset, per-IP isolation, and lockout trip/reset/expire
> (clock injected, no sleeps).
>
> B. SINGLE-USE CODES — the agent bind path now CONSUMES the code atomically on
> first bind. In-memory: new `SupportCodeManager::consume_for_bind` accepts
> ONLY a `Pending` code and flips it to `Connected` under the write lock (a 2nd
> presenter loses the race → rejected). This replaces the v1 pre-upgrade check
> that accepted `pending` OR `connected` (the reusable-code HIGH). DB: new
> `db::support_codes::consume_code_for_bind` — a single conditional UPDATE
> `... SET consumed_at = NOW(), status='connected' WHERE code=$1 AND consumed_at
> IS NULL AND status='pending' AND (expires_at IS NULL OR expires_at > NOW())
> RETURNING id`; zero rows ⇒ not consumable. The in-memory consume is
> AUTHORITATIVE (the live source of truth); the DB UPDATE is a durable/audit
> mirror applied best-effort after it (a missing DB row does not veto a bind the
> in-memory layer admitted). To make the durable record meaningful, the portal
> `create_code` handler now also inserts the code into `connect_support_codes`.
> Validate/preview path is UNCHANGED and explicitly does NOT consume (test
> `preview_validate_does_not_consume`).
>
> C. WIDER CODE — replaced the 6-digit numeric generator with a grouped
> base32-style code `XXX-XXX-XXX` (9 symbols over a 31-char UNAMBIGUOUS alphabet
> excluding 0/O/1/I/L ≈ 44.6 bits), CSPRNG-backed (`OsRng`, rejection sampling
> to avoid modulo bias). The new code (11 chars incl. hyphens) does NOT fit the
> `VARCHAR(10)` column from migration 001, so migration `006_widen_support_code.sql`
> widens `connect_support_codes.code` AND `connect_sessions.support_code` to
> TEXT (idempotent). Unit tests cover shape, charset (no ambiguous chars), and
> practical uniqueness.
>
> DEPS: none added; `tower_governor` REMOVED from Cargo.toml (it never compiled).
> No code/secret/support-code value logged on any path. Runtime `sqlx::query`.
> Files: `server/src/middleware/rate_limit.rs` (rebuilt), `server/src/middleware/mod.rs`,
> `server/src/main.rs` (AppState field + 3 route wirings + create_code DB insert +
> validate handler lockout feed), `server/src/support_codes.rs` (new generator +
> `consume_for_bind` + tests), `server/src/db/support_codes.rs`
> (`consume_code_for_bind`), `server/src/relay/mod.rs` (atomic consume on bind),
> `server/migrations/006_widen_support_code.sql` [new], `server/Cargo.toml`.

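A per-IP fixed-window limiter of the kind part A describes fits in a few lines of standard-library Rust. This is a sketch, not the contents of `middleware/rate_limit.rs`: the `Mutex<HashMap<IpAddr, Window>>` shape follows the note above, while the method name `check`, the field names, and the way the clock is injected (a `now: Instant` argument) are assumptions. The `FailureLockout` and the axum middleware wiring are omitted.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// One counting window for a single client IP.
struct Window {
    started: Instant,
    count: u32,
}

/// Fixed-window limiter: at most `limit` requests per `window` per IP.
pub struct RateLimiter {
    limit: u32,
    window: Duration,
    hits: Mutex<HashMap<IpAddr, Window>>,
}

impl RateLimiter {
    pub fn new(limit: u32, window: Duration) -> Self {
        Self {
            limit,
            window,
            hits: Mutex::new(HashMap::new()),
        }
    }

    /// Returns true if the request is allowed; false maps to HTTP 429.
    /// `now` is injected so tests can move time without sleeping.
    pub fn check(&self, ip: IpAddr, now: Instant) -> bool {
        // Recover the map even if another thread panicked while holding it.
        let mut hits = self.hits.lock().unwrap_or_else(|e| e.into_inner());
        let w = hits.entry(ip).or_insert(Window { started: now, count: 0 });
        if now.saturating_duration_since(w.started) >= self.window {
            // The previous window has elapsed: start a fresh one.
            w.started = now;
            w.count = 0;
        }
        if w.count < self.limit {
            w.count += 1;
            true
        } else {
            false
        }
    }
}
```

A fixed window admits up to twice the limit across a window boundary. For a login endpoint at 8/min that is usually an acceptable trade for the simplicity; a sliding window would tighten it at the cost of more state per IP.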
Files touched: `server/src/middleware/rate_limit.rs` (rebuild — v1 is non-compiling),
`server/src/middleware/mod.rs`, `server/src/api/auth.rs` (login), `server/src/api/` (code validate),
`server/src/db/support_codes.rs`, `server/src/relay/mod.rs` (code bind).

- Rebuild a **compiling** rate-limit layer (fix the `tower_governor` generics, or a small in-memory
  fixed-window limiter); re-enable `pub mod rate_limit`. Wire it to `POST /api/auth/login`,
  `POST /api/auth/change-password`, and the support-code validate route.
- **Single-use codes:** consume atomically on first agent bind (accept only a `pending`, unconsumed code;
  set `consumed_at`); reject a second presenter. (Fixes the reusable-code HIGH.)
- **Widen the code:** higher-entropy human-readable format (e.g. grouped base32, ~40+ bits) replacing
  6-digit numeric; add per-IP lockout on repeated validate failures.

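The single-use rule comes down to doing the check and the state flip under one write lock, so two agents presenting the same code cannot both pass. The sketch below is a simplified, hypothetical version of `SupportCodeManager`: only `consume_for_bind` and the `Pending`/`Connected` states are named in this plan; `insert_pending`, `is_pending`, and the plain `HashMap<String, CodeStatus>` storage are assumptions for illustration.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CodeStatus {
    Pending,
    Connected,
}

/// Simplified in-memory code table (hypothetical shape).
#[derive(Default)]
pub struct SupportCodeManager {
    codes: RwLock<HashMap<String, CodeStatus>>,
}

impl SupportCodeManager {
    pub fn insert_pending(&self, code: &str) {
        let mut codes = self.codes.write().unwrap_or_else(|e| e.into_inner());
        codes.insert(code.to_string(), CodeStatus::Pending);
    }

    /// Preview/validate path: reads the state and never consumes.
    pub fn is_pending(&self, code: &str) -> bool {
        let codes = self.codes.read().unwrap_or_else(|e| e.into_inner());
        codes.get(code) == Some(&CodeStatus::Pending)
    }

    /// Bind path: accepts ONLY a pending code and flips it to connected
    /// while still holding the write lock, so exactly one caller wins.
    pub fn consume_for_bind(&self, code: &str) -> bool {
        let mut codes = self.codes.write().unwrap_or_else(|e| e.into_inner());
        if let Some(status) = codes.get_mut(code) {
            if *status == CodeStatus::Pending {
                *status = CodeStatus::Connected;
                return true;
            }
        }
        false
    }
}
```

The conditional `UPDATE ... WHERE consumed_at IS NULL AND status='pending' ... RETURNING id` quoted in part B applies the same idea in SQL: the predicate and the write are one statement, so a second presenter matches zero rows.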
Reference: audit Pass B/E (rate limiting disabled/non-compiling; reusable codes); `middleware/mod.rs:3-11`.

---

## Task 5 [IMPLEMENTED 2026-05-30 — self-reviewed; no Rust toolchain on this machine, not yet `cargo check`-verified; pending Code Review]: Attended-mode consent

> [IMPLEMENTED] An ATTENDED (support-code) session now requires the end user to
> ACCEPT a native consent prompt before the technician's session is surfaced.
>
> - PROTO (`proto/guruconnect.proto`): added `ConsentRequest` (server/relay →
> agent: `session_id`, `technician_name`, `access_mode` (`ConsentAccessMode`
> = `CONSENT_VIEW`|`CONSENT_CONTROL`), `timeout_secs`) and `ConsentResponse`
> (agent → server: `session_id`, `granted`, `reason`), inserted AFTER
> `AdminCommand`. New `Message` oneof field numbers `consent_request = 80`,
> `consent_response = 81` (no existing field renumbered).
> - SERVER (`session/mod.rs`): new `ConsentState` enum
> (`NotRequired|Pending|Granted|Denied`, `as_db_str`/`allows_viewer`) added to
> the in-memory `Session`. `register_agent` starts attended (`!is_persistent`)
> sessions `Pending`, managed/persistent `NotRequired`. `join_session` REFUSES
> any viewer unless `consent_state.allows_viewer()` (only `Granted`/
> `NotRequired`) — this is the gate that keeps a support session invisible to
> the technician until accepted. New `set_consent_state`/`get_consent_state`.
> Unit tests cover the db-string mapping, the viewer-admission predicate, and
> the attended-pending-blocks / granted-admits / managed-admits / denied-blocks
> transitions.
> - SERVER (`relay/mod.rs`): after registering an attended agent,
> `run_consent_handshake` sends `ConsentRequest`, audits `consent_requested`,
> then waits up to `CONSENT_TIMEOUT_SECS = 60` for a `ConsentResponse`.
> granted → `consent_state = Granted` + audit `consent_granted` + proceed.
> denied/timeout/agent-disconnect → `consent_state = Denied` + audit
> `consent_denied`, send a Disconnect to the agent, end the session row
> (`status='denied'`), release the code, and TEAR DOWN (early return — the
> technician never sees the session). In-memory consent is authoritative; the
> DB `consent_state` (via `db::sessions::update_consent_state`) is a durable/
> audit mirror. A late/dup `ConsentResponse` in the main loop is logged+ignored
> (no silent unhandled variant). `db::events` gained `CONSENT_GRANTED`/
> `CONSENT_DENIED`/`CONSENT_REQUESTED`. `db::sessions::create_session` now sets
> `is_managed`/`source`/`consent_state` from `is_support_session` (attended →
> `false`/`standalone`/`pending`; managed → `true`/`gururmm`/`not_required`).
> `api::SessionInfo` echoes `consent_state` so the dashboard can show "awaiting
> consent".
> - AGENT (`consent/mod.rs` [new], `session/mod.rs`, `main.rs`): on
> `ConsentRequest`, `handle_consent_request` runs a blocking Windows
> `MessageBox` (MB_YESNO | TOPMOST | SETFOREGROUND | SYSTEMMODAL | ICONQUESTION)
> on `spawn_blocking` so the async loop/heartbeats are not stalled, phrasing
> the prompt VIEW vs VIEW-and-CONTROL from `access_mode`, then sends a
> `ConsentResponse`. Anything other than an explicit Yes (closed box, panic) is
> a DENY. Non-Windows build is a `// TODO(platform)` stub that fails CLOSED
> (denies). Unit tests cover the prompt wording + the access-mode decode
> fallback.
> - Managed/unattended sessions are `not_required`, never prompted (Phase-1
> default). PER-TENANT consent policy beyond this default is a future
> refinement — left as a TODO (no per-tenant policy table consulted yet).
> - No `.unwrap()` on a non-test path; runtime `sqlx::query`; no code/secret/
> token value logged. No Rust toolchain here — self-reviewed only.

Files touched: `proto/guruconnect.proto`, `agent/src/session/mod.rs`, `agent/src/consent/mod.rs` (new),
`agent/src/main.rs`, `server/src/relay/mod.rs`, `server/src/session/mod.rs`, `server/src/db/sessions.rs`,
`server/src/db/events.rs`, `server/src/api/mod.rs`.

- Add `ConsentRequest` / `ConsentResponse` to the proto (after `AdminCommand`).
- On an attended session, the agent shows a consent dialog to the end user; the server keeps the session
  `consent_state = pending` and surfaces it to the technician only on `granted`. `denied`/timeout → tear down.
- Managed/unattended sessions follow per-tenant policy (default: silent for managed; `consent_state =
  not_required`). Audit the consent decision to `events`.

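The consent gate is a small state machine, sketched here without the relay, the proto messages, or the database mirror. `ConsentState`, `as_db_str`, and `allows_viewer` are named in the notes above, and the database strings match the `consent_state` values from Task 1. The helpers `initial_for` and `resolve_handshake` are hypothetical names that compress what `register_agent` and `run_consent_handshake` are described as doing.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConsentState {
    NotRequired,
    Pending,
    Granted,
    Denied,
}

impl ConsentState {
    /// String stored in the `consent_state` column.
    pub fn as_db_str(self) -> &'static str {
        match self {
            ConsentState::NotRequired => "not_required",
            ConsentState::Pending => "pending",
            ConsentState::Granted => "granted",
            ConsentState::Denied => "denied",
        }
    }

    /// A viewer may join only once consent is granted or was never required.
    pub fn allows_viewer(self) -> bool {
        matches!(self, ConsentState::Granted | ConsentState::NotRequired)
    }

    /// Attended (non-persistent) sessions start pending; managed ones skip
    /// the prompt. Hypothetical helper name.
    pub fn initial_for(is_persistent: bool) -> Self {
        if is_persistent {
            ConsentState::NotRequired
        } else {
            ConsentState::Pending
        }
    }
}

/// Outcome of the handshake. `None` models a timeout or an agent
/// disconnect; only an explicit grant admits, everything else fails closed.
pub fn resolve_handshake(response: Option<bool>) -> ConsentState {
    match response {
        Some(true) => ConsentState::Granted,
        _ => ConsentState::Denied,
    }
}
```

Matching on `Some(true)` and sending every other case to `Denied` keeps the default on the safe side: a new failure mode added later denies unless someone deliberately maps it to a grant.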
Reference: `specs/native-remote-control/plan.md` Task 5 (Consent primitive); proto `AdminCommand` insertion point.

---

## Task 6 [IMPLEMENTED 2026-05-30 — self-verified on a local Windows toolchain: `cargo fmt --all` clean, `cargo clippy --workspace --all-targets --all-features -- -D warnings` exit 0, `cargo test --workspace` 69 pass (19 agent + 50 server), `cargo build --workspace` ok; pending Code Review]: Native viewer — full key fidelity

> [IMPLEMENTED] The four parts:
>
> 1. VIEWER CAPTURE (`agent/src/viewer/input.rs`): the `WH_KEYBOARD_LL` hook now
> DIVERTS system combinations (Windows/Apps keys, Alt+Tab, Alt+Esc, Ctrl+Esc;
> Win+R / Win+E compose on the remote because the diverted Win-down is forwarded)
> and forwards them as full-fidelity `KeyEvent`s — VK + hardware scan code +
> `is_extended` (read from `KBDLLHOOKSTRUCT.flags & LLKHF_EXTENDED`) + modifier
> snapshot — returning `LRESULT(1)` to suppress local handling. The combo decision
> is a pure `is_system_combo(vk, alt, ctrl)` so it is unit-tested. Ordinary keys are
> NOT forwarded by the hook (they take the normal winit path — avoids double inject).
> A "send system keys to remote" toggle (`AtomicBool`, default ON) gates the diversion;
> when OFF the hook is transparent. Toggle API: `set_/toggle_/send_system_keys_enabled`.
> Host key: Pause/Break flips it (handled in `render.rs`, intercepted locally, logged).
> 2. AGENT INJECTION (`agent/src/input/keyboard.rs`): rewrote `send_key` to inject via
> `SendInput` with `KEYEVENTF_SCANCODE` (layout-independent) and the correct
> `KEYEVENTF_EXTENDEDKEY` (set when the viewer flagged extended, the VK is inherently
> extended, or the VK→scan map carries the 0xE0 prefix). New `key_event_full(vk, scan,
> is_extended, down)` consumes all `KeyEvent` fields (scan 0 ⇒ derive from VK for older
> viewers); wired into `session/mod.rs` `KeyEvent` handling. `is_extended_key` delegates
> to a platform-independent `vk_is_extended` (unit-tested) that also covers right Ctrl/Alt.
> 3. CTRL+ALT+DEL: the existing `SpecialKey::CtrlAltDel` → `send_ctrl_alt_del` →
> `send_sas` path is confirmed and hardened. `send_sas` now: Tier 1 SAS helper service
> (SYSTEM, named pipe) → Tier 2 direct `sas.dll!SendSAS` → Tier 3 **fails with a clear,
> actionable error** (no false success; plain SendInput can't reach the secure desktop).
> The SAS installer (`bin/sas_service.rs install`) now sets the `SoftwareSASGeneration`
> Winlogon policy (HKLM\...\Policies\System, DWORD 1 = services) via `set_software_sas_policy`,
> with a `// TODO(installer)` noting the top-level managed installer should also ensure it.
> 4. MODIFIER HYGIENE: viewer tracks held Ctrl/Alt/Shift/Win (`ViewerModifierState` in
> `render.rs`) and emits explicit key-ups on FOCUS LOSS (`WindowEvent::Focused(false)`)
> and on window close, so a modifier pressed-but-not-released across a blur doesn't stay
> latched on the remote. Agent-side `KeyboardController` also tracks injected modifiers and
> exposes `release_all_modifiers` (defensive complement; wired via `ModifierState`, the
> scaffolding the cleanup kept). Both trackers unit-tested.
>
> PROTO: `KeyEvent` gained `bool is_extended = 7` (new field number, nothing renumbered)
> — carries the viewer's `LLKHF_EXTENDED` capture so injection picks the right extended
> flag; older agents that ignore it fall back to deriving from vk/scan. `SpecialKeyEvent`
> already carried `CTRL_ALT_DEL`; unchanged.
>
> dead_code wired/removed: `InputEvent::SpecialKey` (now emitted/consumable),
> `ModifierState` (agent keyboard — now tracks + drains held modifiers),
> `type_char`/`type_string` kept with their allows (separate unicode-typing feature, not in
> Task 6 scope). `sas_client::is_service_available`/`get_service_status` kept narrowly allowed
> (status API not yet wired into a runtime path). `InputController::key_event` (vk-only) and
> `release_all_modifiers` kept allowed as API surface (the relayed path uses `key_event_full`,
> focus-loss re-sync is viewer-driven). New toggle accessors `set_/send_system_keys_enabled`
> narrowly allowed (host-key uses `toggle_`; future viewer menu).
>
> TESTS ADDED (13): extended-key flag determination (extended + non-extended sets),
> modifier-state transitions (record/down-up/ignore-non-modifier/drain-and-clear), the
> system-combo classifier (Win/Alt+Tab/Alt+Esc/Ctrl+Esc divert; ordinary keys don't), and
> the toggle state machine (default ON, flip, explicit set). Live hook/SendInput/SendSAS
> behavior is plan Task 8 (needs a real desktop).

Files touched: `proto/guruconnect.proto` (KeyEvent `is_extended = 7`), `agent/src/viewer/input.rs`
(rewritten hook + toggle), `agent/src/viewer/render.rs` (focus-loss modifier release, host-key
toggle, is_extended on winit path), `agent/src/viewer/mod.rs` (unchanged SpecialKey forwarding),
`agent/src/input/keyboard.rs` (scan-code injection + ModifierState + vk_is_extended + tests),
`agent/src/input/mod.rs` (key_event_full / release_all_modifiers / vk_is_extended re-export),
`agent/src/session/mod.rs` (KeyEvent → key_event_full), `agent/src/bin/sas_service.rs`
(set_software_sas_policy + install wiring), `agent/src/input/keyboard.rs` (send_sas hardening).

- **Viewer capture:** install `WH_KEYBOARD_LL` while the viewer is in focused control; divert system
  combos (Win key, Win+R, Alt+Tab, Ctrl+Esc) and forward as `KeyEvent` (VK + scan code + extended flag)
  instead of letting the local shell act. Add a "send system keys to remote" toggle.
- **Agent injection:** scan-code `SendInput` (`KEYEVENTF_SCANCODE` + correct `KEYEVENTF_EXTENDEDKEY` for
  right-Ctrl/Alt, arrows, Win, Insert/Home). Extend the salvaged `input/keyboard.rs`.
- **Ctrl+Alt+Del:** `SpecialKeyEvent.CTRL_ALT_DEL` → the salvaged SAS service (`bin/sas_service.rs`,
  `SendSAS`, SYSTEM, `SoftwareSASGeneration` policy set by the managed installer).
- **Modifier hygiene:** track modifier up/down; **re-sync (release) on focus loss** to kill stuck modifiers.

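Both the combo classifier and the extended-key decision are pure functions over virtual-key codes, which is what makes them unit-testable off Windows. The sketch below uses the documented Win32 numeric values for the `VK_*` and `KEYEVENTF_*` constants, redeclared locally so it builds on any platform. The bodies are assumptions: the exact key sets in the real `is_system_combo` and `vk_is_extended` are not listed in this plan, `key_flags` is a hypothetical helper, and treating a scan code with a `0xE0` high byte as extended is one reading of "the VK→scan map carries the 0xE0 prefix".

```rust
// Win32 virtual-key codes (documented values, redeclared for portability).
const VK_TAB: u16 = 0x09;
const VK_ESCAPE: u16 = 0x1B;
const VK_LWIN: u16 = 0x5B;
const VK_RWIN: u16 = 0x5C;
const VK_APPS: u16 = 0x5D;

// SendInput keyboard flags (documented values).
pub const KEYEVENTF_EXTENDEDKEY: u32 = 0x0001;
pub const KEYEVENTF_KEYUP: u32 = 0x0002;
pub const KEYEVENTF_SCANCODE: u32 = 0x0008;

/// True if the local shell would act on this key, so the hook should divert
/// it to the remote: Win/Apps keys, Alt+Tab, Alt+Esc, Ctrl+Esc.
pub fn is_system_combo(vk: u16, alt: bool, ctrl: bool) -> bool {
    match vk {
        VK_LWIN | VK_RWIN | VK_APPS => true,
        VK_TAB => alt,
        VK_ESCAPE => alt || ctrl,
        _ => false,
    }
}

/// Keys that need the extended flag (assumed set): navigation cluster,
/// arrows, Insert/Delete, Win/Apps, right Ctrl/Alt, numpad Divide, NumLock.
pub fn vk_is_extended(vk: u16) -> bool {
    matches!(
        vk,
        0x21..=0x28 // PageUp, PageDown, End, Home, Left, Up, Right, Down
            | 0x2D | 0x2E // Insert, Delete
            | 0x5B | 0x5C | 0x5D // LWin, RWin, Apps
            | 0x6F // numpad Divide
            | 0x90 // NumLock
            | 0xA3 | 0xA5 // right Ctrl, right Alt
    )
}

/// Flags for one scan-code injection (hypothetical helper).
pub fn key_flags(vk: u16, scan: u16, viewer_flagged_extended: bool, down: bool) -> u32 {
    let mut flags = KEYEVENTF_SCANCODE;
    if viewer_flagged_extended || vk_is_extended(vk) || (scan & 0xFF00) == 0xE000 {
        flags |= KEYEVENTF_EXTENDEDKEY;
    }
    if !down {
        flags |= KEYEVENTF_KEYUP;
    }
    flags
}
```

Getting the extended flag wrong is not cosmetic: with scan-code injection, Right Arrow and numpad 6 share scan code `0x4D` and differ only by that flag.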
Reference: SPEC-002 §4.1/§4.2; salvage ledger §2; `agent/src/input/keyboard.rs`, `agent/src/bin/sas_service.rs`.

---

## Task 7 [IMPLEMENTED 2026-05-30 — self-verified on local Windows toolchain: `cargo fmt --all --check` clean, `cargo clippy --workspace --all-targets --all-features -- -D warnings` exit 0, `cargo test --workspace` 89 pass (36 agent + 53 server; was 70, no regressions), `cargo build --workspace` ok; pending Code Review]: Hardware H.264 encode + negotiated raw/Zstd fallback
|
||
|
||
> [IMPLEMENTED] Raw+Zstd remains the DEFAULT and guaranteed fallback; H.264 is a
|
||
> negotiated upgrade that is COMPILE-VERIFIED ONLY (live MF encode/decode is Task
|
||
> 8 — needs real GPU + frames). The testable parts (abstraction, factory,
|
||
> negotiation, capability plumbing, color-conversion math) are done solidly with
|
||
> unit tests; the MF H.264 encoder and viewer decoder are first-cut, clearly
|
||
> marked, and gated behind a default-off policy so unvalidated H.264 never ships
|
||
> as the default.
|
||
>
> 1. ENCODER ABSTRACTION (`agent/src/encoder/mod.rs`): the existing `Encoder`
>    trait (`encode(&mut self, &CapturedFrame) -> Result<EncodedFrame>`) is the
>    abstraction; `RawEncoder` (salvaged raw+Zstd+dirty-rects, UNCHANGED behavior)
>    and the new `H264Encoder` both implement it. Factory split into pure pieces:
>    `codec_from_str` (config-string -> `VideoCodec`), `select_codec(negotiated,
>    hardware_available)` (agent-side guard: H.264 only if HW present, HEVC->raw,
>    else raw), and `create_encoder_for(VideoCodec, quality)` (builds the encoder;
>    on H.264 init failure logs + returns a RAW encoder so the session never
>    breaks). UNIT-TESTED: codec_from_str mapping, select_codec guard matrix, raw
>    factory always succeeds, string path resolves to raw without HW.
> 2. CAPABILITY + NEGOTIATION (testable, done well):
>    - `encoder/capability.rs`: `supports_hardware_h264()` probes MF once
>      (`MFTEnumEx(MFT_CATEGORY_VIDEO_ENCODER, MFT_ENUM_FLAG_HARDWARE,
>      MFVideoFormat_H264)`), caches the bool via `OnceLock`; false on non-Windows
>      / no HW / MF error. Advertised in `AgentStatus.supports_h264` (proto field
>      11, additive).
>    - Server (`server/src/session/mod.rs`): `select_video_codec(agent_supports,
>      prefer_h264)` is a PURE decision fn — H.264 only when BOTH the agent
>      supports it AND policy prefers it, else raw. Policy constant
>      `DEFAULT_PREFER_H264 = false` (documented: keeps raw as the negotiated codec
>      until H.264 is hardware-validated). `supports_h264` stored on the in-memory
>      `Session` from `AgentStatus` (`update_agent_status` gained the param). The
>      negotiated codec is stamped on `StartStream.video_codec` in
>      `send_start_stream_internal` (the LIVE server->agent codec-selection point —
>      SessionRequest/SessionResponse are not exchanged on the wire in v2, so the
>      proto's `SessionResponse.video_codec` is kept for spec parity but the live
>      path uses `StartStream`). UNIT-TESTED: the negotiation matrix, the
>      default-policy guardrail (capable agent still gets raw), and the
>      `AgentStatus -> supports_h264` ingest.
>    - Agent applies it: `StartStream` handler decodes `video_codec`, stores
>      `negotiated_codec`, and `init_streaming` builds the encoder via
>      `select_codec` + `create_encoder_for` (re-guards on local HW; older server
>      sends 0 = RAW, preserving the default).
> 3. MF H.264 ENCODER (`agent/src/encoder/h264.rs`, FIRST-CUT, compile-verified
>    only): enumerates+activates a HW H.264 encoder MFT, sets H.264 output then
>    NV12 input media types (frame size/rate, bitrate from quality), feeds frames
>    (`ProcessInput`) and drains synchronously (`ProcessOutput`, NEED_MORE_INPUT =
>    "no output this tick"), emitting `VideoFrame{H264(EncodedFrame{data, keyframe,
>    pts, dts})}`. BGRA->NV12 via `encoder/color.rs` (BT.601 limited-range, 2x2 box
>    chroma; isolated + UNIT-TESTED: size, odd-dim/short-buffer rejection, black/
>    white/red reference values, plane coverage). On ANY init failure the FACTORY
>    falls back to raw (logged); per-frame errors surface to the session (which
>    logs + continues). Handles resolution change (re-init), keyframe flag
>    (CleanPoint), MF buffer alloc for non-sample-providing MFTs. NOT yet live: the
>    async-MFT event model is documented as a Task-8 refinement (this cut drains
>    synchronously); precise force-IDR (CODECAPI) is a TODO; D3D11 zero-copy
>    deferred (feeds CPU NV12).
> 4. VIEWER H.264 DECODE (`agent/src/viewer/decoder.rs` [new], FIRST-CUT,
>    compile-verified only): MF H.264 decoder MFT -> NV12 -> BGRA
>    (`nv12_to_bgra`, BT.601 inverse, UNIT-TESTED round-trip within tolerance +
>    short-buffer + black). Runs on a DEDICATED OS thread (`gc-h264-decode`), NOT a
>    tokio task — the MF decoder has COM thread affinity and a tokio task can
>    migrate across workers at await points. The receive task forwards H.264 access
>    units over a std channel; the worker decodes and pushes BGRA `FrameData`
>    through the existing render path via `blocking_send`. On decoder-init failure
>    it logs once and drops H.264 frames; the RAW render path is untouched. Handles
>    the `MF_E_TRANSFORM_STREAM_CHANGE` NV12 output renegotiation + size discovery.
> 5. RAW STILL WORKS END-TO-END: `RawEncoder` is unchanged; with
>    `DEFAULT_PREFER_H264 = false` the server negotiates RAW for every session
>    (including capable agents), the agent builds the raw encoder, and the viewer's
>    existing `Raw` branch renders it — the guaranteed default/fallback path is
>    fully intact and is what runs today.
>
> PROTO (additive — no field renumbered): `VideoCodec` enum (RAW=0, H264=1,
> H265=2); `SessionResponse.video_codec = 5` (spec parity); `StartStream.video_codec
> = 3` (live negotiation); `AgentStatus.supports_h264 = 11` (capability). HEVC is a
> documented TODO/opt-in everywhere (never selected). Cargo.toml: added the
> `Win32_Media_MediaFoundation` + COM windows features (no new external crates).
>
> COMPILE-VERIFIED-ONLY / NEEDS LIVE HARDWARE (Task 8): the MF H.264 encoder
> init/feed/emit on a real GPU, the viewer MF decoder on a live stream, the
> BGRA<->NV12 fidelity end-to-end, and the synchronous-drain timing. The encoder/
> decoder are structured to fall back to raw (encoder) / drop frames + log
> (decoder) on any failure so they cannot break a session even if MF misbehaves.
>
> TESTS ADDED (19): agent +16 (encoder factory/select matrix x5, color BGRA->NV12
> x8, decoder NV12<->BGRA x3), server +3 (codec negotiation matrix, default-policy
> guardrail, AgentStatus capability ingest).
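
The BGRA->NV12 step described in item 3 of the status note (BT.601 limited-range, 2x2 box chroma, rejection of odd dimensions and short buffers) can be sketched as follows. This is an illustrative sketch, not the contents of `encoder/color.rs`: the function name `bgra_to_nv12`, the `Option` return type, and the integer coefficients (a common fixed-point approximation of BT.601) are assumptions.

```rust
/// Convert a tightly packed BGRA frame to NV12 (full-size Y plane followed by
/// an interleaved, half-resolution UV plane), BT.601 limited range.
/// Returns `None` for zero or odd dimensions and for a buffer that is too short.
/// Hypothetical signature: the real module may report errors differently.
fn bgra_to_nv12(bgra: &[u8], width: usize, height: usize) -> Option<Vec<u8>> {
    if width == 0 || height == 0 || width % 2 != 0 || height % 2 != 0 {
        return None;
    }
    let pixels = width * height;
    if bgra.len() < pixels * 4 {
        return None;
    }

    let mut out = vec![0u8; pixels * 3 / 2];
    let (y_plane, uv_plane) = out.split_at_mut(pixels);

    // Luma: one sample per pixel, range 16..=235.
    for (i, px) in bgra.chunks_exact(4).take(pixels).enumerate() {
        let (b, g, r) = (px[0] as i32, px[1] as i32, px[2] as i32);
        let y = ((66 * r + 129 * g + 25 * b + 128) >> 8) + 16;
        y_plane[i] = y.clamp(0, 255) as u8;
    }

    // Chroma: average each 2x2 block (box filter), then one U/V pair per block.
    for cy in 0..height / 2 {
        for cx in 0..width / 2 {
            let (mut rs, mut gs, mut bs) = (0i32, 0i32, 0i32);
            for dy in 0..2 {
                for dx in 0..2 {
                    let p = ((cy * 2 + dy) * width + cx * 2 + dx) * 4;
                    bs += bgra[p] as i32;
                    gs += bgra[p + 1] as i32;
                    rs += bgra[p + 2] as i32;
                }
            }
            // Rounded average of the four samples.
            let (r, g, b) = ((rs + 2) / 4, (gs + 2) / 4, (bs + 2) / 4);
            let u = ((-38 * r - 74 * g + 112 * b + 128) >> 8) + 128;
            let v = ((112 * r - 94 * g - 18 * b + 128) >> 8) + 128;
            // The UV plane has `width` bytes per chroma row (width/2 pairs).
            let o = cy * width + cx * 2;
            uv_plane[o] = u.clamp(0, 255) as u8;
            uv_plane[o + 1] = v.clamp(0, 255) as u8;
        }
    }
    Some(out)
}
```

With these coefficients a black frame maps to Y=16, U=V=128, white to Y=235, and pure red to Y=82, U=90, V=240, which are the usual limited-range reference points.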

Files touched: `proto/guruconnect.proto` (`VideoCodec` enum + `SessionResponse.video_codec`
+ `StartStream.video_codec` + `AgentStatus.supports_h264`), `agent/Cargo.toml` (MF/COM windows
features), `agent/src/encoder/mod.rs` (trait/factory/select), `agent/src/encoder/raw.rs`
(salvaged, unchanged), `agent/src/encoder/h264.rs` [new], `agent/src/encoder/capability.rs` [new],
`agent/src/encoder/color.rs` [new], `agent/src/session/mod.rs` (negotiated codec apply +
`supports_h264` advertise), `agent/src/viewer/mod.rs` (H.264 route + decode worker),
`agent/src/viewer/decoder.rs` [new], `server/src/session/mod.rs` (`select_video_codec` +
`DEFAULT_PREFER_H264` + `supports_h264` field/ingest + `StartStream` codec stamp),
`server/src/relay/mod.rs` (pass `supports_h264` from `AgentStatus`).

- HW **H.264** via Windows Media Foundation (transparently NVENC/AMF/QuickSync) emitting the proto's
  `EncodedFrame` (h264). Native viewer decodes via MF/D3D11.
- **Fallback:** salvaged raw BGRA + Zstd + dirty-rects for Win7 / no HW encoder.
- **Negotiation:** agent advertises HW-encode capability in `AgentStatus`; server selects the codec in
  `SessionResponse`; default H.264, HEVC opt-in, raw fallback. One encode feeds the native viewer now
  (and the Phase-2 WebRTC track later).
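
The negotiation as implemented (per the Task 7 status note: the server picks H.264 only when the agent advertises support and policy prefers it, and the agent re-guards on local hardware) can be sketched as two pure functions. The function names and `DEFAULT_PREFER_H264` come from the note; the bodies, the enum layout, and the `codec_from_wire` helper are assumptions made for illustration.

```rust
/// Mirrors the proto enum values named in the status note (RAW=0, H264=1, H265=2).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum VideoCodec {
    Raw = 0,
    H264 = 1,
    H265 = 2,
}

/// Policy constant: raw stays the negotiated codec until H.264 is hardware-validated.
const DEFAULT_PREFER_H264: bool = false;

/// Server side: H.264 only when the agent supports it AND policy prefers it.
fn select_video_codec(agent_supports: bool, prefer_h264: bool) -> VideoCodec {
    if agent_supports && prefer_h264 {
        VideoCodec::H264
    } else {
        VideoCodec::Raw
    }
}

/// Agent side guard: honor H.264 only if a hardware encoder is present;
/// HEVC and everything else resolve to raw.
fn select_codec(negotiated: VideoCodec, hardware_available: bool) -> VideoCodec {
    match negotiated {
        VideoCodec::H264 if hardware_available => VideoCodec::H264,
        _ => VideoCodec::Raw,
    }
}

/// Hypothetical helper: decode the wire value, treating 0 (what an older
/// server sends) and any unknown value as raw.
fn codec_from_wire(value: i32) -> VideoCodec {
    match value {
        1 => VideoCodec::H264,
        2 => VideoCodec::H265,
        _ => VideoCodec::Raw,
    }
}
```

Keeping both decisions pure makes the full matrix cheap to unit-test, and the agent-side guard means a server that requests H.264 by mistake still yields a working raw session.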

Reference: SPEC-002 §5; `agent/src/encoder/raw.rs` (salvaged), `proto/guruconnect.proto` (EncodedFrame already modeled).

---

## Task 8: Verification (end-to-end, observable)

- **Security (re-audit clean):** run `/gc-audit --pass=security` (and `--pass=rust`) → the three relay
  CRITICALs and the rate-limiting/frame-cap/reusable-code HIGHs are gone. Manually confirm: a revoked
  `cak_` key is rejected on `/ws/agent`; a viewer token for session A is rejected on session B; a
  logged-out (blacklisted) viewer token is rejected on `/ws/viewer`; a user JWT is rejected as an agent key.
- **Attended flow:** generate a support code → run the one-time agent → end user sees + accepts consent →
  technician's session appears only after acceptance; a denied consent tears down.
- **Key fidelity (the headline):** in a live session, confirm **Win+R opens Run on the remote**, **Ctrl+C
  on remote / Ctrl+V locally** (and vice-versa, text), and **Ctrl+Alt+Del reaches the remote secure
  desktop**. Confirm no stuck modifiers after alt-tabbing away and back.
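- **Codec:** confirm a HW-H.264 machine negotiates h264 when the H.264 preference is enabled (with
  `DEFAULT_PREFER_H264 = false` the server negotiates raw; check `StartStream.video_codec`, the live
  selection point, since `SessionResponse` is not exchanged on the wire in v2), a Win7/no-HW machine
  falls back to raw+Zstd, and both render correctly in the native viewer.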
- **Rate limiting:** hammer `/api/auth/login` and the code-validate route → confirm throttling/lockout.
- **Migrations:** fresh DB applies the v2 migrations cleanly; `_sqlx_migrations` consistent; `tenant_id`
  populated with the default tenant.

---

## Task 9 [PROPOSED 2026-06-01 — provisioning model = TOFU auto-enroll, chosen by Mike]: `cak_` auto-enroll provisioning + shared-key retirement

> Context: Task 2 built the SERVER `cak_` machinery (mint/SHA-256 hash/verify in `auth/agent_keys.rs`,
> relay validation in `validate_agent_api_key`, admin issuance `POST /api/machines/:id/keys`). What's
> missing is how an AGENT obtains and uses a `cak_` — today agents still carry the deprecated shared
> `AGENT_API_KEY`, so `connect_agent_keys` is empty and the relay logs the DEPRECATED-shared-key warning
> for every agent. This task closes that with **trust-on-first-use auto-enroll** so the shared key can be
> retired (unblocks task list #5). NOTE: the agent already presents whatever is in its `api_key` slot and
> the relay auto-detects `cak_` vs shared — so a `cak_`-keyed agent needs **no change to its auth call**,
> only a way to *receive*, *persist*, and *prefer* a `cak_`.

**Flow (TOFU):**
1. **Bootstrap (first connect):** a fresh agent authenticates on `/ws/agent` with a bootstrap secret —
   interim: the shared `AGENT_API_KEY` (embedded by the download endpoint); target: a single-use,
   short-lived **enroll token** (more secure TOFU — see Security).
2. **Server issues on first connect:** when an agent authed via the bootstrap path (i.e. NOT already
   `cak_`-keyed) connects and its machine has **no active (non-revoked) `cak_`**, the relay: resolves/creates
   the machine row (existing `upsert_machine` on `machine_uid` — now functional after the 2026-06-01
   ON CONFLICT fix), mints a `cak_` (`generate_agent_key` + `db::agent_keys::insert_agent_key` for that
   `machine_id`), and sends the plaintext key to the agent **once** over a new server→agent message. Only
   the hash is stored. **Idempotent:** never re-issue if an active key already exists for the machine.
3. **Agent receives + persists + prefers:** on `AgentKeyProvision`, the agent persists the `cak_` durably at
   `%ProgramData%\GuruConnect\agent_key` (restricted ACL, same pattern as `machine_uid`). On startup it loads
   the persisted `cak_` if present and uses it as its auth key, falling back to the embedded/bootstrap secret
   only when no `cak_` is stored yet. After provisioning, every reconnect authenticates via `cak_` (no more
   DEPRECATED-shared-key warning for that agent).
4. **Shared-key retirement (phased):** Phase A — shared key stays as the bootstrap so existing+new agents
   self-enroll; monitor the relay WARN count → ~0. Phase B — once the fleet is `cak_`-keyed, restrict the
   shared `AGENT_API_KEY` to enrollment-only or remove the env entirely (only `cak_` / enroll-token accepted).
   This is the concrete completion of task-list #5.
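
The two decisions in the flow above (when the relay issues a key, and which key the agent presents) can be sketched as pure logic. This is a hedged illustration only: `AuthPath`, `KeyTable`, `should_provision`, and `select_auth_key` are hypothetical names, the in-memory set stands in for the `connect_agent_keys` lookup, and key minting, hashing, and the wire message are deliberately omitted.

```rust
use std::collections::HashSet;

/// How the agent authenticated on this connect.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AuthPath {
    /// Shared key or enroll token (the bootstrap secret).
    Bootstrap,
    /// An already-provisioned `cak_` key.
    AgentKey,
}

/// Relay rule: issue only on a bootstrap connect for a machine with no active key.
fn should_provision(auth: AuthPath, has_active_key: bool) -> bool {
    matches!(auth, AuthPath::Bootstrap) && !has_active_key
}

/// Agent rule: prefer a persisted `cak_`, else fall back to the bootstrap secret.
/// Treating a stored value without the `cak_` prefix as absent is an assumption.
fn select_auth_key<'a>(persisted: Option<&'a str>, bootstrap: &'a str) -> &'a str {
    match persisted {
        Some(key) if key.starts_with("cak_") => key,
        _ => bootstrap,
    }
}

/// In-memory stand-in for "machines that currently hold an active key".
#[derive(Debug, Default)]
struct KeyTable {
    active: HashSet<u64>,
}

impl KeyTable {
    /// Returns true if a key was issued on this connect (at most once per
    /// machine until the key is revoked).
    fn on_connect(&mut self, machine_id: u64, auth: AuthPath) -> bool {
        let issue = should_provision(auth, self.active.contains(&machine_id));
        if issue {
            self.active.insert(machine_id);
        }
        issue
    }

    /// Revocation clears the active key, so the next bootstrap connect re-enrolls.
    fn revoke(&mut self, machine_id: u64) {
        self.active.remove(&machine_id);
    }
}
```

Checking for an active key before minting is what keeps issuance idempotent across reconnects, and the same check lets a revoked machine re-enroll on its next bootstrap connect.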

**Protocol (4-artifact drift discipline):** add `AgentKeyProvision { string key = 1; }` (server→agent) to
`proto/guruconnect.proto` with a new reserved message ID; regenerate prost on both agent + server; the
hand-written `dashboard/src/lib/protobuf.ts` decoder does NOT need it (agent-plane only) but reserve the ID.

**Files:** `proto/guruconnect.proto` (new message); `server/src/relay/mod.rs` (issue+send on bootstrap connect
with no active key); `server/src/db/agent_keys.rs` (add `has_active_key(machine_id)` check; reuse insert);
`agent/src/transport/*` (handle inbound `AgentKeyProvision`); `agent/src/config.rs` + a small key-store module
(load/persist `cak_`, prefer over bootstrap).

**Security (TOFU):** the first connect trusts the bootstrap secret — a leaked shared key during the enroll
window could enroll a rogue agent; the secure target is a **single-use, short-lived enroll token** per
deployment instead of the shared key (shared-key bootstrap is interim convenience). The `cak_` is sent
plaintext once over the existing wss/TLS channel; only the hash is stored server-side; the agent stores it
locally with restricted ACLs. Revocation via the existing `DELETE /api/machines/:id/keys/:key_id` fails the
agent closed; on its next bootstrap connect it re-enrolls. The keyed-agent dedup (Task 3) keeps the
authenticated identity authoritative.

**Verification:** drop a current-build (signed 0.3.0+) agent configured with the shared-key bootstrap →
it connects, receives a `cak_`, persists it; restart → it authenticates via the `cak_` (relay shows NO
DEPRECATED-shared-key warning) and `connect_agent_keys` holds exactly one active key for the machine; issue
is idempotent across reconnects; revoke the key via the admin API → agent rejected, then re-enrolls on next
bootstrap connect. Reference: `auth/agent_keys.rs`, `api/machine_keys.rs`, `relay/mod.rs:266-309`
(`validate_agent_api_key`), `.claude/standards/security/credential-handling.md`.
|