feat(server): v2 secure-session-core Task 4 - rate limit + single-use codes

SPEC-002 Phase 1 Task 4 (the final keystone task), code-reviewed APPROVED.
Closes the audit's reusable-code HIGH and rate-limiting-disabled HIGH.

- Rebuilt rate limiting as a self-contained in-memory per-IP limiter
  (replaces the non-compiling tower_governor; removed that dep).
  Fixed-window caps wired to login (8/min), change-password (5/min),
  code-validate (15/min) -> 429; per-IP lockout after 10 consecutive
  failed code validations (15-min cooldown).
- Single-use support codes: atomic consume on first agent bind
  (in-memory Pending->Connected under write lock + DB conditional
  UPDATE), rejecting a second presenter; validate/preview does not
  consume.
- Widened code format: XXX-XXX-XXX, 31-char unambiguous alphabet
  (no 0/O/1/I/L), CSPRNG + rejection sampling, ~44.6 bits (replaces
  6-digit numeric); migration 006 widens the code columns to TEXT.

Completes the keystone (Tasks 1-4): every audit CRITICAL + HIGH in the
secure auth/session core is now addressed.

Known follow-up todos (not blocking):
(1) trusted-proxy client-IP extraction (NPM-on-loopback collapses
    clients to 127.0.0.1);
(2) multi-instance fail-closed DB single-use gate.

Not cargo-check-verified locally - build-host/CI verification follows
this commit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
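
For readers unfamiliar with fixed-window limiting, the per-IP caps described in the message could look roughly like the sketch below. This is not the limiter from this commit (that file is not part of the diff shown here). `FixedWindowLimiter`, its fields, and `check` are hypothetical names, the clock is passed in explicitly so the logic can be exercised without sleeping, and the separate 10-failure lockout is not modelled.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

/// Hypothetical fixed-window limiter: at most `max` hits per `window` per IP.
pub struct FixedWindowLimiter {
    max: u32,
    window: Duration,
    // Per-IP state: (start of the current window, hits counted in it).
    hits: HashMap<IpAddr, (Instant, u32)>,
}

impl FixedWindowLimiter {
    pub fn new(max: u32, window: Duration) -> Self {
        Self { max, window, hits: HashMap::new() }
    }

    /// Returns `true` if the request is allowed; `false` means the caller
    /// should answer with 429 Too Many Requests.
    pub fn check(&mut self, ip: IpAddr, now: Instant) -> bool {
        let entry = self.hits.entry(ip).or_insert((now, 0));
        // Start a fresh window once the previous one has fully elapsed.
        if now.duration_since(entry.0) >= self.window {
            *entry = (now, 0);
        }
        if entry.1 >= self.max {
            return false;
        }
        entry.1 += 1;
        true
    }
}
```

Under these assumptions the login cap would be `FixedWindowLimiter::new(8, Duration::from_secs(60))`. A real limiter would also need to evict idle entries so the map does not grow without bound.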
2026-05-29 21:04:54 -07:00
parent 8a0193577b
commit bfcdbb5379
9 changed files with 1026 additions and 130 deletions
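
The widened code format from the commit message (XXX-XXX-XXX over a 31-character alphabet, drawn with rejection sampling) can be illustrated as follows. The generator itself is not in the diff shown here, so this is only a sketch: the exact alphabet string is an assumption consistent with "no 0/O/1/I/L", `generate_code` is a hypothetical name, and the random source is a caller-supplied closure so the example needs no external crate. A real implementation would feed it bytes from a CSPRNG.

```rust
/// Assumed alphabet: digits 2-9 plus A-Z without I, L, O (31 symbols).
const ALPHABET: &[u8; 31] = b"23456789ABCDEFGHJKMNPQRSTUVWXYZ";

/// Largest multiple of 31 that fits in a byte (31 * 8 = 248). Bytes at or
/// above this are discarded so `b % 31` is uniform over the alphabet.
const LIMIT: u8 = 248;

/// Builds an XXX-XXX-XXX code from `next_byte`, which must yield
/// cryptographically secure random bytes in real use.
pub fn generate_code(mut next_byte: impl FnMut() -> u8) -> String {
    let mut code = String::with_capacity(11);
    let mut produced = 0;
    while produced < 9 {
        let b = next_byte();
        if b >= LIMIT {
            continue; // rejection sampling: skip bytes that would bias the result
        }
        if produced > 0 && produced % 3 == 0 {
            code.push('-');
        }
        code.push(ALPHABET[(b % 31) as usize] as char);
        produced += 1;
    }
    code
}
```

Nine symbols from a 31-symbol alphabet give 9 × log2(31) ≈ 44.6 bits, which matches the figure quoted in the message.
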
--- a/server/src/relay/mod.rs
+++ b/server/src/relay/mod.rs
@@ -128,17 +128,46 @@ pub async fn agent_ws_handler(
        return Err(StatusCode::UNAUTHORIZED);
    }

-    // Validate support code if provided
+    // Validate AND CONSUME the support code if provided (single-use, Task 4).
+    //
+    // SINGLE-USE (closes the reusable-code HIGH): a support code is consumed
+    // ATOMICALLY on the FIRST successful agent bind. `consume_for_bind` accepts
+    // the code only if it is currently `Pending` (never used) and flips it to
+    // `Connected` under the manager's write lock; a SECOND presenter of the same
+    // code sees it already `Connected` and is rejected here, before the socket is
+    // upgraded. This replaces the v1 check that accepted `pending` OR `connected`
+    // (which let any number of agents reuse one code).
+    //
+    // AUTHORITATIVE single-use gate = the in-memory atomic consume. The in-memory
+    // manager is the live source of truth for a code's joinable state (it is what
+    // the portal create + validate paths use), and it is empty on a fresh process,
+    // so a code from a previous run is already unknown here and rejected. A second
+    // presenter within this run loses the `Pending → Connected` race and is
+    // rejected. This single check closes the reusable-code HIGH.
+    //
+    // The database additionally carries a DURABLE single-use marker
+    // (`consume_code_for_bind`: a conditional UPDATE guarded by `consumed_at IS
+    // NULL AND status = 'pending'`). It is applied best-effort AFTER the
+    // authoritative in-memory consume — it stamps `consumed_at` for the audit
+    // trail and cross-restart durability, but a missing/uninsertable DB row must
+    // NOT veto an agent the in-memory layer already admitted (the DB row is only
+    // populated opportunistically by the portal create path).
    if let Some(ref code) = support_code {
-        // Check if it's a valid, pending support code
-        let code_info = state.support_codes.get_status(code).await;
-        if code_info.is_none() {
+        let consumed = state
+            .support_codes
+            .consume_for_bind(code, Some(agent_name.clone()), Some(agent_id.clone()))
+            .await;
+
+        if !consumed {
            warn!(
-                "Agent connection rejected: {} from {} - invalid support code {}",
-                agent_id, client_ip, code
+                "Agent connection rejected: {} from {} - support code already used, \
+                 invalid, expired, or cancelled",
+                agent_id, client_ip
            );

-            // Log failed connection attempt
+            // Log failed connection attempt. The consume result is a single bool, so
+            // reuse is not distinguished from a nonexistent, expired, or cancelled
+            // code; log the generic invalid-code event (never log the code value).
            if let Some(ref db) = state.db {
                let _ = db::events::log_event(
                    db.pool(),
@@ -147,8 +176,7 @@ pub async fn agent_ws_handler(
                    None,
                    Some(&agent_id),
                    Some(serde_json::json!({
-                        "reason": "invalid_code",
-                        "support_code": code,
+                        "reason": "code_unavailable_or_already_used",
                        "agent_id": agent_id
                    })),
                    Some(client_ip),
@@ -158,42 +186,46 @@ pub async fn agent_ws_handler(

            return Err(StatusCode::UNAUTHORIZED);
        }
-        let status = code_info.unwrap();
-        if status != "pending" && status != "connected" {
-            warn!(
-                "Agent connection rejected: {} from {} - support code {} has status {}",
-                agent_id, client_ip, code, status
-            );

-            // Log failed connection attempt (expired/cancelled code)
-            if let Some(ref db) = state.db {
-                let event_type = if status == "cancelled" {
-                    db::events::EventTypes::CONNECTION_REJECTED_CANCELLED_CODE
-                } else {
-                    db::events::EventTypes::CONNECTION_REJECTED_EXPIRED_CODE
-                };
-
-                let _ = db::events::log_event(
-                    db.pool(),
-                    Uuid::new_v4(),
-                    event_type,
-                    None,
-                    Some(&agent_id),
-                    Some(serde_json::json!({
-                        "reason": status,
-                        "support_code": code,
-                        "agent_id": agent_id
-                    })),
-                    Some(client_ip),
-                )
-                .await;
+        // Durable single-use marker in the database (best-effort; see above). The
+        // real session_id is attached later in `handle_agent_connection` once the
+        // socket has upgraded; here we only stamp `consumed_at` on the row if it
+        // exists. `Ok(None)` (no consumable row) and `Err` are logged, not fatal —
+        // the in-memory consume already authorized this bind.
+        if let Some(ref db) = state.db {
+            match db::support_codes::consume_code_for_bind(
+                db.pool(),
+                code,
+                None,
+                Some(&agent_name),
+                Some(&agent_id),
+            )
+            .await
+            {
+                Ok(Some(_id)) => { /* durable consume recorded */ }
+                Ok(None) => {
+                    // No consumable DB row (the code may not have been persisted,
+                    // or was already consumed in the DB). Non-fatal: the in-memory
+                    // layer is authoritative for this process.
+                    tracing::debug!(
+                        "No durable support-code row to consume for agent {} (in-memory \
+                         consume already authoritative)",
+                        agent_id
+                    );
+                }
+                Err(e) => {
+                    tracing::warn!(
+                        "Database error stamping durable support-code consume for {}: {}",
+                        agent_id,
+                        e
+                    );
+                }
            }
-
-            return Err(StatusCode::UNAUTHORIZED);
        }
+
        info!(
-            "Agent {} from {} authenticated via support code {}",
-            agent_id, client_ip, code
+            "Agent {} from {} authenticated via single-use support code",
+            agent_id, client_ip
        );
    }

@@ -568,24 +600,21 @@ async fn handle_agent_connection(
        None
    };

-    // If a support code was provided, mark it as connected
+    // If a support code was provided, link it to the real session.
+    //
+    // The code was already CONSUMED atomically (single-use) in `agent_ws_handler`
+    // before this upgrade: it is already `Connected` in the in-memory manager and,
+    // if a DB is configured and a consumable row existed, stamped `consumed_at`
+    // there. Here we only establish the mapping to the REAL session_id, which is
+    // not known until the socket has upgraded and `register_agent` has run. We do
+    // NOT re-consume or re-flip status (that would defeat the single-use guard).
    if let Some(ref code) = support_code {
-        info!("Linking support code {} to session {}", code, session_id);
-        support_codes
-            .mark_connected(code, Some(agent_name.clone()), Some(agent_id.clone()))
-            .await;
+        info!("Linking support code to session {}", session_id);
        support_codes.link_session(code, session_id).await;

-        // Database: update support code
+        // Database: attach the real session_id to the already-consumed code row.
        if let Some(ref db) = db {
-            let _ = db::support_codes::mark_code_connected(
-                db.pool(),
-                code,
-                Some(session_id),
-                Some(&agent_name),
-                Some(&agent_id),
-            )
-            .await;
+            let _ = db::support_codes::link_session_to_code(db.pool(), code, session_id).await;
        }
    }
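
The in-memory single-use gate that the comments above rely on can be pictured with the simplified sketch below. It is not the manager from this repository: the real `consume_for_bind` is async and also records the agent name and id, while this version uses a synchronous `std::sync::RwLock` and a hypothetical `CodeManager` type to show only the check-and-flip under one write lock.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CodeStatus {
    Pending,
    Connected,
    Cancelled,
}

/// Hypothetical stand-in for the in-memory support-code manager.
#[derive(Default)]
pub struct CodeManager {
    codes: RwLock<HashMap<String, CodeStatus>>,
}

impl CodeManager {
    pub fn insert(&self, code: &str, status: CodeStatus) {
        self.codes.write().unwrap().insert(code.to_owned(), status);
    }

    /// Accepts the code only if it is still `Pending`, and flips it to
    /// `Connected` while holding the same write lock, so two concurrent
    /// presenters can never both observe `Pending`.
    pub fn consume_for_bind(&self, code: &str) -> bool {
        let mut codes = self.codes.write().unwrap();
        match codes.get_mut(code) {
            Some(status) if *status == CodeStatus::Pending => {
                *status = CodeStatus::Connected;
                true
            }
            // Unknown, already connected, or cancelled: all rejected alike.
            _ => false,
        }
    }
}
```

The important property is that the read of the status and the write of the new status happen inside one critical section. Checking with a read lock and then upgrading to a write lock would reopen the race that the old `pending` OR `connected` check allowed.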