feat(server): v2 secure-session-core Task 4 - rate limit + single-use codes
Some checks failed
Build and Test / Build Server (Linux) (push) Failing after 6m12s
Build and Test / Build Agent (Windows) (push) Successful in 6m43s
Build and Test / Security Audit (push) Successful in 4m23s
Build and Test / Build Summary (push) Has been skipped

SPEC-002 Phase 1 Task 4 (the final keystone task), code-reviewed APPROVED.
Closes the audit's reusable-code HIGH and rate-limiting-disabled HIGH.

- Rebuilt rate limiting as a self-contained in-memory per-IP limiter (replaces
  the non-compiling tower_governor; removed that dep). Fixed-window caps wired
  to login (8/min), change-password (5/min), code-validate (15/min) -> 429;
  per-IP lockout after 10 consecutive failed code validations (15-min cooldown).
- Single-use support codes: atomic consume on first agent bind (in-memory
  Pending->Connected under write lock + DB conditional UPDATE), rejecting a
  second presenter; validate/preview does not consume.
- Widened code format: XXX-XXX-XXX, 31-char unambiguous alphabet (no 0/O/1/I/L),
  CSPRNG + rejection sampling, ~44.6 bits (replaces 6-digit numeric); migration
  006 widens the code columns to TEXT.
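
The fixed-window caps and the failure lockout from the first bullet can be sketched as follows. This is a minimal illustration, not the code in this commit: the type names `FixedWindowLimiter` and `FailureLockout`, their methods, and the choice to reset the failure count on a success are all assumptions. The current time is passed in so the behaviour can be exercised without sleeping.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

/// Request count for one client IP within the current fixed window.
struct Window {
    started: Instant,
    count: u32,
}

/// Hypothetical fixed-window limiter: at most `max` requests per `window` per IP.
pub struct FixedWindowLimiter {
    max: u32,
    window: Duration,
    hits: HashMap<IpAddr, Window>,
}

impl FixedWindowLimiter {
    pub fn new(max: u32, window: Duration) -> Self {
        Self { max, window, hits: HashMap::new() }
    }

    /// Returns true if the request is allowed; false means "answer 429".
    pub fn check(&mut self, ip: IpAddr, now: Instant) -> bool {
        let w = self.hits.entry(ip).or_insert(Window { started: now, count: 0 });
        // Start a fresh window once the previous one has fully elapsed.
        if now.duration_since(w.started) >= self.window {
            w.started = now;
            w.count = 0;
        }
        if w.count >= self.max {
            return false;
        }
        w.count += 1;
        true
    }
}

/// Hypothetical lockout: `threshold` consecutive failures lock an IP for `cooldown`.
pub struct FailureLockout {
    threshold: u32,
    cooldown: Duration,
    // Per IP: (consecutive failures, locked-until instant if currently locked).
    state: HashMap<IpAddr, (u32, Option<Instant>)>,
}

impl FailureLockout {
    pub fn new(threshold: u32, cooldown: Duration) -> Self {
        Self { threshold, cooldown, state: HashMap::new() }
    }

    pub fn is_locked(&self, ip: IpAddr, now: Instant) -> bool {
        matches!(self.state.get(&ip), Some((_, Some(until))) if now < *until)
    }

    pub fn record_failure(&mut self, ip: IpAddr, now: Instant) {
        let entry = self.state.entry(ip).or_insert((0, None));
        entry.0 += 1;
        if entry.0 >= self.threshold {
            entry.1 = Some(now + self.cooldown);
            entry.0 = 0;
        }
    }

    /// Assumption: a successful validation clears the consecutive-failure count.
    pub fn record_success(&mut self, ip: IpAddr) {
        self.state.remove(&ip);
    }
}
```

With the limits from the message, the login path would use something like `FixedWindowLimiter::new(8, Duration::from_secs(60))` and the code-validate path `FailureLockout::new(10, Duration::from_secs(15 * 60))`. A fixed window admits a burst of up to twice the cap across a window boundary, which is the usual trade-off for its simplicity.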

Completes the keystone (Tasks 1-4): every audit CRITICAL + HIGH in the secure
auth/session core is now addressed. Known follow-up todos (not blocking): (1)
trusted-proxy client-IP extraction (NPM-on-loopback collapses clients to
127.0.0.1); (2) multi-instance fail-closed DB single-use gate. Not
cargo-check-verified locally - build-host/CI verification follows this commit.
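
The widened code format described above (XXX-XXX-XXX over a 31-symbol alphabet, drawn with rejection sampling) can be illustrated with the sketch below. It is not the generator from this commit: the function names, the ordering of the alphabet, and the byte-source closure are assumptions. In real use the closure must be backed by a CSPRNG; it is left as a parameter here so the example needs no external crates.

```rust
/// Assumed alphabet: digits 2-9 plus A-Z without O, I and L (8 + 23 = 31 symbols).
const ALPHABET: &[u8; 31] = b"23456789ABCDEFGHJKMNPQRSTUVWXYZ";

/// Builds one XXX-XXX-XXX code from a stream of random bytes.
/// `next_byte` is a hypothetical source; it must come from a CSPRNG in practice.
pub fn generate_code(mut next_byte: impl FnMut() -> u8) -> String {
    // Largest multiple of 31 that fits in a byte: 8 * 31 = 248.
    const LIMIT: u8 = 248;
    let mut code = String::with_capacity(11);
    let mut produced = 0;
    while produced < 9 {
        let b = next_byte();
        // Rejection sampling: bytes 248..=255 would over-represent the
        // first 8 symbols after the modulo, so they are discarded.
        if b >= LIMIT {
            continue;
        }
        if produced > 0 && produced % 3 == 0 {
            code.push('-');
        }
        code.push(ALPHABET[(b % 31) as usize] as char);
        produced += 1;
    }
    code
}

/// Entropy of a 9-symbol code over 31 symbols: 9 * log2(31), roughly 44.6 bits.
pub fn entropy_bits() -> f64 {
    9.0 * 31f64.log2()
}
```

Discarding out-of-range bytes keeps every symbol equally likely, at the cost of an occasional extra draw (8 of 256 byte values are rejected).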

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 21:04:54 -07:00
parent 8a0193577b
commit bfcdbb5379
9 changed files with 1026 additions and 130 deletions


@@ -128,17 +128,46 @@ pub async fn agent_ws_handler(
         return Err(StatusCode::UNAUTHORIZED);
     }
-    // Validate support code if provided
+    // Validate AND CONSUME the support code if provided (single-use, Task 4).
+    //
+    // SINGLE-USE (closes the reusable-code HIGH): a support code is consumed
+    // ATOMICALLY on the FIRST successful agent bind. `consume_for_bind` accepts
+    // the code only if it is currently `Pending` (never used) and flips it to
+    // `Connected` under the manager's write lock; a SECOND presenter of the same
+    // code sees it already `Connected` and is rejected here, before the socket is
+    // upgraded. This replaces the v1 check that accepted `pending` OR `connected`
+    // (which let any number of agents reuse one code).
+    //
+    // AUTHORITATIVE single-use gate = the in-memory atomic consume. The in-memory
+    // manager is the live source of truth for a code's joinable state (it is what
+    // the portal create + validate paths use), and it is empty on a fresh process,
+    // so a code from a previous run is already unknown here and rejected. A second
+    // presenter within this run loses the `Pending → Connected` race and is
+    // rejected. This single check closes the reusable-code HIGH.
+    //
+    // The database additionally carries a DURABLE single-use marker
+    // (`consume_code_for_bind`: a conditional UPDATE guarded by `consumed_at IS
+    // NULL AND status = 'pending'`). It is applied best-effort AFTER the
+    // authoritative in-memory consume — it stamps `consumed_at` for the audit
+    // trail and cross-restart durability, but a missing/uninsertable DB row must
+    // NOT veto an agent the in-memory layer already admitted (the DB row is only
+    // populated opportunistically by the portal create path).
     if let Some(ref code) = support_code {
-        // Check if it's a valid, pending support code
-        let code_info = state.support_codes.get_status(code).await;
-        if code_info.is_none() {
+        let consumed = state
+            .support_codes
+            .consume_for_bind(code, Some(agent_name.clone()), Some(agent_id.clone()))
+            .await;
+        if !consumed {
             warn!(
-                "Agent connection rejected: {} from {} - invalid support code {}",
-                agent_id, client_ip, code
+                "Agent connection rejected: {} from {} - support code already used, \
+                 invalid, expired, or cancelled",
+                agent_id, client_ip
             );
-            // Log failed connection attempt
+            // Log failed connection attempt. We cannot distinguish reuse from a
+            // nonexistent code without leaking timing, so log the generic
+            // invalid-code event (never log the code value itself).
             if let Some(ref db) = state.db {
                 let _ = db::events::log_event(
                     db.pool(),
@@ -147,8 +176,7 @@ pub async fn agent_ws_handler(
                     None,
                     Some(&agent_id),
                     Some(serde_json::json!({
-                        "reason": "invalid_code",
-                        "support_code": code,
+                        "reason": "code_unavailable_or_already_used",
                         "agent_id": agent_id
                     })),
                     Some(client_ip),
@@ -158,42 +186,46 @@ pub async fn agent_ws_handler(
             return Err(StatusCode::UNAUTHORIZED);
         }
-        let status = code_info.unwrap();
-        if status != "pending" && status != "connected" {
-            warn!(
-                "Agent connection rejected: {} from {} - support code {} has status {}",
-                agent_id, client_ip, code, status
-            );
-            // Log failed connection attempt (expired/cancelled code)
-            if let Some(ref db) = state.db {
-                let event_type = if status == "cancelled" {
-                    db::events::EventTypes::CONNECTION_REJECTED_CANCELLED_CODE
-                } else {
-                    db::events::EventTypes::CONNECTION_REJECTED_EXPIRED_CODE
-                };
-                let _ = db::events::log_event(
-                    db.pool(),
-                    Uuid::new_v4(),
-                    event_type,
-                    None,
-                    Some(&agent_id),
-                    Some(serde_json::json!({
-                        "reason": status,
-                        "support_code": code,
-                        "agent_id": agent_id
-                    })),
-                    Some(client_ip),
-                )
-                .await;
+        // Durable single-use marker in the database (best-effort; see above). The
+        // real session_id is attached later in `handle_agent_connection` once the
+        // socket has upgraded; here we only stamp `consumed_at` on the row if it
+        // exists. `Ok(None)` (no consumable row) and `Err` are logged, not fatal —
+        // the in-memory consume already authorized this bind.
+        if let Some(ref db) = state.db {
+            match db::support_codes::consume_code_for_bind(
+                db.pool(),
+                code,
+                None,
+                Some(&agent_name),
+                Some(&agent_id),
+            )
+            .await
+            {
+                Ok(Some(_id)) => { /* durable consume recorded */ }
+                Ok(None) => {
+                    // No consumable DB row (the code may not have been persisted,
+                    // or was already consumed in the DB). Non-fatal: the in-memory
+                    // layer is authoritative for this process.
+                    tracing::debug!(
+                        "No durable support-code row to consume for agent {} (in-memory \
+                         consume already authoritative)",
+                        agent_id
+                    );
+                }
+                Err(e) => {
+                    tracing::warn!(
+                        "Database error stamping durable support-code consume for {}: {}",
+                        agent_id,
+                        e
+                    );
+                }
             }
-            return Err(StatusCode::UNAUTHORIZED);
         }
         info!(
-            "Agent {} from {} authenticated via support code {}",
-            agent_id, client_ip, code
+            "Agent {} from {} authenticated via single-use support code",
+            agent_id, client_ip
         );
     }
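
The in-memory gate described in the comments of this hunk (accept only a `Pending` code and flip it to `Connected` under one write lock) might look roughly like the sketch below. It is an assumption-laden illustration, not the manager from this commit: the real `consume_for_bind` is async and takes agent details, whereas this version uses a synchronous `std::sync::RwLock` and a hypothetical `SupportCodes` type so that it compiles with the standard library alone.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CodeStatus {
    Pending,
    Connected,
    Expired,
    Cancelled,
}

/// Hypothetical in-memory manager keyed by the code string.
#[derive(Default)]
pub struct SupportCodes {
    codes: RwLock<HashMap<String, CodeStatus>>,
}

impl SupportCodes {
    pub fn create(&self, code: &str) {
        self.codes
            .write()
            .unwrap()
            .insert(code.to_string(), CodeStatus::Pending);
    }

    /// Read-only preview: takes the read lock and never changes state.
    pub fn status(&self, code: &str) -> Option<CodeStatus> {
        self.codes.read().unwrap().get(code).copied()
    }

    /// Check and flip while holding a single write lock, so two concurrent
    /// presenters cannot both observe `Pending`. Returns true for the winner only.
    pub fn consume_for_bind(&self, code: &str) -> bool {
        let mut codes = self.codes.write().unwrap();
        match codes.get_mut(code) {
            Some(status) if *status == CodeStatus::Pending => {
                *status = CodeStatus::Connected;
                true
            }
            // Unknown, already connected, expired, or cancelled: all rejected alike.
            _ => false,
        }
    }
}
```

The important property is that the check and the state change happen in the same critical section; splitting them into a read followed by a later write would reopen the race that allowed reuse.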
@@ -568,24 +600,21 @@ async fn handle_agent_connection(
         None
     };
-    // If a support code was provided, mark it as connected
+    // If a support code was provided, link it to the real session.
+    //
+    // The code was already CONSUMED atomically (single-use) in `agent_ws_handler`
+    // before this upgrade — it is already `Connected`/`consumed_at` set in both
+    // the in-memory manager and (if a DB is configured) the database. Here we only
+    // establish the mapping to the REAL session_id, which is not known until the
+    // socket has upgraded and `register_agent` has run. We do NOT re-consume or
+    // re-flip status (that would defeat the single-use guard).
     if let Some(ref code) = support_code {
-        info!("Linking support code {} to session {}", code, session_id);
-        support_codes
-            .mark_connected(code, Some(agent_name.clone()), Some(agent_id.clone()))
-            .await;
+        info!("Linking support code to session {}", session_id);
+        support_codes.link_session(code, session_id).await;
-        // Database: update support code
+        // Database: attach the real session_id to the already-consumed code row.
         if let Some(ref db) = db {
-            let _ = db::support_codes::mark_code_connected(
-                db.pool(),
-                code,
-                Some(session_id),
-                Some(&agent_name),
-                Some(&agent_id),
-            )
-            .await;
+            let _ = db::support_codes::link_session_to_code(db.pool(), code, session_id).await;
         }
     }