Files
guru-connect/reports/2026-05-29-gc-audit.md
Mike Swanson 486debfc52
All checks were successful
Build and Test / Build Agent (Windows) (push) Successful in 6m14s
Build and Test / Build Server (Linux) (push) Successful in 10m29s
Build and Test / Security Audit (push) Successful in 4m12s
Build and Test / Build Summary (push) Successful in 10s
docs(audit): add inaugural gc-audit report 2026-05-29
First /gc-audit run (also a dry run validating the skill). 7 passes.
4 CRITICAL (3 relay-plane auth failures: any-JWT session hijack,
viewer-WS blacklist bypass, JWT-accepted-as-agent-key; 1 functional:
dashboard protobuf.ts wire-incompatible). Plus deploy.yml stub leaving
prod 57 commits stale. Proposed roadmap/tech-debt deltas listed (not
yet applied, pending review).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 17:46:26 -07:00

22 KiB
Raw Blame History

GuruConnect Audit Report — 2026-05-29

Auditor: Claude (claude-opus-4-8, via /gc-audit) Passes: API Surface, Rust Quality, TypeScript/Dashboard, Protocol Integrity, Security & Session, CI/CD Pipeline, Docs Reconciliation Previous audit: First audit (this is the inaugural /gc-audit run — also a dry run to validate the skill) Note: Gitea was down during this run (internal :3000 refused, public 502). Pass G's Gitea-runner checks are COVERAGE GAPS to re-verify; deploy-host (172.16.3.30) live checks succeeded (separate segment). Living docs were NOT auto-edited this run — proposed deltas are listed for review (dry-run posture).


Executive Summary

Pass Total Critical High Medium Low
API Surface 18 0 4 4 0
Rust Quality 9 1 2 2 2
TypeScript/Dashboard 12 2 1 3 4
Protocol Integrity 7 1 2 1 2
Security & Session 12 3 3 2 2
CI/CD Pipeline 11 0 3 1 1
TOTAL (delta-deduped) ~45 unique 4 9 ~10 ~10

Requires immediate action (CRITICAL):

  1. validate_agent_api_key accepts any valid dashboard JWT as an agent API key → viewer→agent impersonation / session-slot seizure (server/src/relay/mod.rs:224-239).
  2. Viewer WebSocket: any valid JWT attaches to and controls ANY session by UUID — no per-session/per-machine authorization (server/src/relay/mod.rs:242-288).
  3. Viewer WebSocket skips the token blacklist → revoked/logged-out tokens still grant live remote control until natural expiry (server/src/relay/mod.rs:260).
  4. Dashboard lib/protobuf.ts is a fabricated fixed-offset codec, wire-incompatible with the real protobuf the agent/server speak → React viewer cannot decode frames or send input (dashboard/src/lib/protobuf.ts:18-150). Functional-critical; currently dead code (prod uses static viewer.html) so production impact is deferred until the component library is mounted.

Pass 1: API Surface

The server exposes a substantially larger REST surface than the static dashboard consumes. No broken calls and no method mismatches — every static-page fetch resolves to a real route with the correct method. The gaps are UI-orphaned server capability.

[HIGH] Entire /api/machines resource is UI-orphaned; Machines tab is session-backed

File: server/src/main.rs:316-319; server/static/dashboard.html:818-821 Detail: GET /api/machines (list), GET /api/machines/:agent_id, DELETE /api/machines/:agent_id, GET /api/machines/:agent_id/history have no frontend caller. The dashboard "Machines" tab is populated by loadMachines() which actually fetches /api/sessions. Persistent-machine inventory, history, and deletion have no UI; if the sessions payload lacks is_online/agent_version, the tab renders empty/Unknown. Recommendation: Repoint the Machines tab to GET /api/machines, or document that the dashboard intentionally shows live sessions; add machine history/delete UI.

[HIGH] Full /api/releases CRUD has no UI

File: server/src/main.rs:326-333 Detail: GET/POST /api/releases, GET/PUT/DELETE /api/releases/:version have zero frontend callers; no release-management page exists. Recommendation: Build a releases admin page, or confirm releases are CI/API-managed and mark ops-only.

[HIGH] /api/users/:id/clients (set_client_access) orphaned

File: server/src/main.rs:302 Detail: users.html has a manage_clients permission checkbox but never assigns which clients a user may access. Server supports per-user client scoping with no UI to set it. Recommendation: Add a client-assignment UI, or confirm client-access is managed elsewhere.

[HIGH] useRemoteSession.ts /ws/viewer URL omits the required JWT token

File: dashboard/src/hooks/useRemoteSession.ts:92 Detail: The React hook builds ?session_id=... with no token (and no viewer_name), but the viewer WS requires a JWT in the token query param (static viewer.html includes it). If the component library is used against this server, viewer WS auth fails. Recommendation: Include the JWT token + viewer_name query params.

[MEDIUM] /api/sessions/:id (GET single), /api/changelog/:component/:version, /api/auth/blacklist/{stats,cleanup} orphaned (no UI)

Files: main.rs:313, :337-340, :284-291 Detail: Session-detail GET, changelog display, and blacklist ops endpoints have no UI consumer. Most are reasonable ops/API-only endpoints; flag and document.

[INFO] Genuinely-external routes (not dead)

/api/version (agent polling), /ws/agent (Rust agent), /health, /metrics are correctly out-of-band.

Pass A summary: All frontend calls resolve correctly; the issue is large UI-orphaned server CRUD — the entire /api/machines resource (the Machines tab is actually /api/sessions-backed), all of /api/releases, changelog, and /api/users/:id/clients client-scoping. Highest-impact: missing machine-management UI and the useRemoteSession missing-token bug.


Pass 2: Rust Code Quality

[CRITICAL] Auth-plane leak — dashboard JWT accepted as agent API key

File: server/src/relay/mod.rs:224-239 Detail: validate_agent_api_key returns true for any token passing jwt_config.validate_token. A viewer-plane credential (dashboard JWT) is thus a valid agent-plane credential — a leaked/low-priv JWT can register/impersonate an agent. The code comment admits "allows using dashboard token for testing." Recommendation: Drop the JWT-accept branch; validate api_key only against AGENT_API_KEY (and future per-agent keys with role == "agent"). (Also surfaced by Pass E as CRITICAL with session-level impact.)

[HIGH] Initial-admin password logged to tracing on fallback path

File: server/src/main.rs:175 Detail: Primary path writes creds to a 0600 file; if fs::write fails, info!(" Password: {}", password) emits the cleartext password to the log stream (journald/aggregators). Recommendation: On write failure, print to stdout only or abort startup; never route the secret through the structured logger.

[HIGH] Rate limiting disabled and non-compiling

File: server/src/middleware/mod.rs:3-11, middleware/rate_limit.rs, main.rs:268,303 Detail: rate_limit module is commented out ("not yet functional"); rate_limit.rs has invalid type signatures; login/change-password/code-validate routes carry only // TODO: Add rate limiting. Brute-force protection on auth is absent. (Cross-ref Pass E, Pass F — SEC-2 not fixed; roadmap [x] partially false.)

[MEDIUM] Stub admin endpoint returns 501

File: server/src/api/auth_logout.rs:99-131 Detail: POST /api/auth/admin/revoke-user is registered/reachable but always NOT_IMPLEMENTED — admins cannot force-revoke a compromised user's tokens.

[MEDIUM] Agent run-loop .unwrap() on transport

File: agent/src/session/mod.rs:312,392,465 Detail: self.transport.as_mut().unwrap() — in-loop invariant today, but any reconnect refactor turns it into a hard agent crash. Use ok_or_else(...)?.

[LOW] Support codes logged in cleartext; /metrics mutex .unwrap() panic

Files: relay/mod.rs:107,134,165,377, main.rs:413,415

[INFO] Acceptable panics / correct usage

Startup JWT_SECRET presence + ≥32-char enforcement (main.rs:94-100) is the required fail-closed behavior; static-literal parse().unwrap() in CORS/headers is infallible; agent println!/eprintln! are user-facing CLI output (version-info/help), not logging violations. sqlx compile-time macros used throughout — correct for GC, not flagged. Logout blacklist IS consulted per-request for REST (auth/mod.rs:116); JWT exp enforced; Argon2id confirmed; no raw e.to_string() leaks to clients.

Pass B summary: Server is broadly disciplined (tracing, anyhow, Argon2id, exp enforcement, per-request blacklist on REST, clean error handling, sqlx macros correctly used). Standout problem is the CRITICAL one-way auth-plane leak; secondary: admin-password-on-fallback logging and fully-disabled rate limiting.


Pass 3: TypeScript / Dashboard Quality

[CRITICAL] lib/protobuf.ts is not protobuf — fabricated fixed-offset layout, wire-incompatible

File: dashboard/src/lib/protobuf.ts:93-150 (decode), :18-88 (encode) Detail: The server relays genuine prost-encoded protobuf Message bytes verbatim (relay/mod.rs:449,454). A real Message{video_frame} starts with tag byte 0x52 (field 10 << 3 | 2). The decoder instead checks getUint8(0) === 10 and reads phantom fixed offsets, so every real frame returns null (canvas never renders). The static server/static/viewer.html parses the same wire tree correctly — the two implementations have diverged. Recommendation: Delete the hand-rolled codec; port viewer.html's varint/length-delimited parser, or use protobufjs/generated code.

[CRITICAL] Encoders emit the same fabricated layout — input events undecodable by the agent

File: dashboard/src/lib/protobuf.ts:18-46, 51-88 Detail: encodeMouseEvent/encodeKeyEvent write a flat struct with a leading message-id byte. The server decodes viewer input via proto::Message::decode (relay/mod.rs:666) and rejects these as malformed → clicks/keys silently dropped.

[HIGH] Decoder bounds-checking incomplete — short/hostile frames read OOB

File: dashboard/src/lib/protobuf.ts:93-117 Detail: Only length < 2 guarded, then reads through offset 21. A 320-byte frame throws RangeError into an uncaught promise (useRemoteSession.ts:60). Validate full header length; wrap decode in try/catch.

[MEDIUM] sequence decoded but out-of-order frames never dropped; WS hook reconnects forever with no backoff/error surface; fzstd decompression is a no-op stub

Files: RemoteViewer.tsx:40-87, useRemoteSession.ts:105-125, protobuf.ts:156-162

[LOW] Single union cast hides out-of-range encoding; SessionControls <select> lacks label; redundant cleanup

[INFO] BGRA→RGBA conversion correct; tsconfig strict + unused-symbol checks on; zero any/@ts-ignore/console.* in dashboard/src/. The React component library appears dead/divergent — prod serves the hand-written static viewer.

Pass C summary: Headline defect is the fabricated, wire-incompatible protobuf layer; every real frame returns null and input is silently dropped. The component library must be removed or rebuilt against the real wire format before it is ever mounted. Otherwise TS hygiene is good (strict tsconfig, no any/console).


Pass 4: Protocol & Wire-Format Integrity

Four-way drift matrix (abridged — full matrix from Agent D): proto / agent(prost) / server-relay(prost) / dashboard protobuf.ts / static viewer.html. The proto schema, agent prost codec, server relay codec, and static viewer.html are mutually consistent and wire-correct. The dashboard/protobuf.ts column is uniformly WRONG (fabricated offsets) for video_frame(10)/mouse_event(20)/key_event(21). Field NAMES in types/protocol.ts map correctly (camelCase↔snake_case); the drift is entirely the wire codec.

[CRITICAL] dashboard/protobuf.ts wire-incompatible (independently confirmed) — see Pass C.

[HIGH] No inbound WebSocket frame-size cap before decode/relay

File: server/src/relay/mod.rs:209,278,447-454 Detail: Neither agent_ws_handler nor viewer_ws_handler sets max_message_size/max_frame_size. A hostile agent can stream max-size frames, each to_vec()'d and broadcast to all viewers → memory pressure/OOM. Recommendation: Set explicit small caps on both upgrades; reject oversized frames before broadcast. (Cross-ref Pass E.)

[HIGH] Two divergent dashboard wire implementations exist; only viewer.html is correct.

[MEDIUM] Version strings out of sync

File: workspace Cargo.toml:18 (0.2.2) vs agent/server (0.2.0) vs dashboard/package.json (0.2.0) Detail: agent/server pin their own version instead of version.workspace = true; agent embeds 0.2.0 at build. Expected pre-auto-bump (SPEC-001 §3). Use version.workspace = true or land the auto-bump.

[LOW] v003 machine columns written by raw SQL but not mapped into the Machine struct; switch_display(12)/quality_settings(40)/update_status(76) defined in proto but undispatched.

Migration integrity: clean — 001/002/003 sequential, no gaps; every table has a column-accurate db/<name>.rs module.

Pass D summary: The Rust/server protocol representations are consistent and correct; the single severe defect is the fabricated dashboard/protobuf.ts. Secondary: no WS frame size cap (OOM), version split, and a few undispatched proto variants. Migrations clean.


Pass 5: Security & Remote-Session Integrity

The REST/dashboard plane is reasonably hardened; the remote-control relay plane — the product's actual threat surface — holds three independent CRITICAL auth-boundary failures.

[CRITICAL] Any valid JWT can attach to and control ANY session — no per-session authorization

File: server/src/relay/mod.rs:242-288 Detail: viewer_ws_handler verifies token signature/expiry but performs zero authorization binding between user and session_id. Any authenticated user (any role, no client grants) can connect to /ws/viewer?session_id=<UUID>&token=<any JWT> and receive video + send input. /api/sessions lists all live session UUIDs to any authenticated user, so IDs aren't secret → full remote-control hijack of any in-progress session. Recommendation: Enforce has_permission("control")/view + client-access scoping against the session's machine before join_session.

[CRITICAL] Viewer WebSocket does not check the token blacklist — revocation bypass

File: server/src/relay/mod.rs:260 Detail: Calls validate_token directly, never token_blacklist.is_revoked(). A revoked/logged-out token still grants screen view + input injection until natural expiry (up to 24h) — SEC-5 revocation does not cover the relay plane. Recommendation: Add the is_revoked gate (as in auth/mod.rs:116) to both viewer and JWT-based agent WS handlers.

[CRITICAL] Dashboard JWT accepted as agent API key — viewer→agent impersonation

File: server/src/relay/mod.rs:224-239 — see Pass B. Session-level impact: a JWT-bearer who knows a target agent_id can seize that machine's persistent-session slot (session/mod.rs:98-131, reattach keyed solely on query-string agent_id).

[HIGH] Support codes effectively reusable / second-agent attach

File: server/src/relay/mod.rs:102-132, support_codes.rs:252-260 Detail: Agent gate accepts a code whose status is pending OR connected — never single-use-consumed on first connect, so a second/attacker agent presenting the same 6-digit code is also admitted. Combined with unauthenticated /api/codes/:code/validate, a ~900k numeric space, and no rate limiting → brute-forceable. Recommendation: Consume the code atomically on first bind (accept only pending); rate-limit the validate endpoint.

[HIGH] No WS frame-size cap (OOM/DoS) — see Pass D.

[HIGH] Auth/validate rate limiting disabled & non-compiling (SEC-2 not fixed) — see Pass B.

File: relay/mod.rs:669-679, agent/src/session/mod.rs:502-515 Detail: Input events forwarded with no throttle; agent injects each via SendInput with no per-second cap and no operator control-grant. Sustained full-rate input floods the target.

[MEDIUM] TLS/WSS not enforced in-app (plaintext if proxy bypassed); [LOW] JWT/support-codes in URL query strings get logged by TraceLayer/proxy.

[INFO] Settled SEC items still hold: SEC-3 (parameterized SQL), SEC-4 (partial — IP/event logging), SEC-9 (Argon2id), SEC-13 (JWT exp). SEC-5 only partially holds (blacklist not enforced on viewer WS — see CRITICAL above).

Pass E summary: The relay plane has three CRITICAL auth failures (any-JWT session hijack, viewer-WS blacklist bypass, JWT-as-agent-key), compounded by reusable brute-forceable support codes, absent WS frame caps, absent input rate limiting/consent, and disabled rate limiting. Priority: close the JWT-as-agent-key leak and add per-session authorization + blacklist checks to the relay handlers first.


Pass 6: CI/CD Pipeline Health

[HIGH] deploy.yml production deploy step is a stub → prod is 57+ commits stale

File: .gitea/workflows/deploy.yml:71-78; live: running binary mtime 2026-01-18, git HEAD on host 1bfd476 vs submodule HEAD ccc6ba9 Detail: Deploy step only echos "Deployment command would run here"; the scp+ssh deploy.sh lines are commented out. The workflow builds/packages/uploads an artifact but never lands it. The production host has never received a CI-driven deploy and predates all v0.2.x tags + the SPEC-001 pipeline. Recommendation: Wire the real SSH deploy (deploy key in repo secrets) or document prod deploy as intentionally manual.

[HIGH] Served agent artifact stale + likely unsigned

Detail: https://connect.azcomputerguru.com/downloads/guruconnect.exe → 200, last-modified 2026-01-18, predating SPEC-001 §2 Azure Trusted Signing → distributed exe is almost certainly unsigned. Recommendation: Run release.yml and deploy the signed exe; confirm Authenticode signature on the served binary.

[HIGH→GAP] Gitea runner registration UNVERIFIABLE (outage)

Detail: Could not reach Gitea to confirm a live ubuntu-latest runner and the Pluto windows-msvc runner. Risk: if no Windows/Pluto runner is online, the agent build job in both build-and-test.yml and release.yml cannot run. Re-check post-outage via GET /api/v1/repos/azcomputerguru/guru-connect/actions/runners + .claude/machines/pluto.md.

[MEDIUM] SPEC-001 doc drift — "no Pluto dependency" contradicts shipped workflows

Detail: SPEC-001 states all jobs run on ubuntu-latest with no Windows runner; shipped workflows build the agent natively on the Pluto windows-msvc runner. Update SPEC-001 to reflect the native-Pluto decision.

[OK] Clippy -D warnings + cargo audit are HARD-FAIL gates (the active-lock work) — committed at HEAD ccc6ba9, dev lock released. SPEC-001 gate target already met.

[OK] Live deploy host: guruconnect.service active, /health 200, /metrics populated, backup timer armed (next 2026-05-30 02:00 UTC), 3/3 migrations applied.

[INFO] cargo-audit gate runs on push/PR to main but is NOT chained into release.yml/deploy.yml (tag-triggered). prometheus.yml has rule_files/alerting commented out → alerts.yml not wired (verify on host).

Pass G summary: Workflow code is in good shape and the SPEC-001 clippy/audit hard-fail target is already met. The real gap is the stubbed deploy.yml deploy step — corroborated by live state: the running binary and served agent exe are stuck at 2026-01-18 / 57 commits behind, so production runs pre-CI-era code and distributes a likely-unsigned agent. Re-check post-outage: runner registration (esp. Pluto), release-workflow secrets, Actions run history.


Proposed FEATURE_ROADMAP.md Delta (Agent F — NOT yet applied; dry-run)

Flip [ ][x] (STALE-INCOMPLETE — shipped end-to-end):

  • Code signing — Azure Trusted Signing in CI — proof: release.yml build-sign-publish (jsign TRUSTEDSIGNING, L417-433)
  • Automatic versioning — release.yml version job (conventional-commit bump, L73-187)
  • Changelog generation & API — release.yml git-cliff + server/src/api/changelog.rs + route main.rs:337-340
  • Feature-request workflow — .claude/commands/gc-feature-request.md
  • Roadmap / ADR / spec tracking — docs/FEATURE_ROADMAP.md + docs/ARCHITECTURE_DECISIONS.md + docs/specs/SPEC-001
  • Coord-API registration — guruconnect project_key in root CLAUDE.md project-keys table

Split / flag (STALE-COMPLETE):

  • Server/API [x] "JWT, Argon2id, rate limiting, security headers" → keep [x] for JWT/Argon2id/headers; add new [ ] Rate limiting — [HIGH] disabled & non-compiling (middleware/mod.rs:3-11)

Annotate (PARTIAL, keep box):

  • [~] React/TS web viewer → append "wire-incompatible, non-functional vs real protocol (dashboard/src/lib/protobuf.ts); decompressZstd is a stub"
  • [ ] CI security audit gate wired to release → append "audit gate exists + hard-fails in build-and-test.yml; NOT chained into release.yml/deploy.yml"

Verified accurate (no change): all Core RC [x] items, GuruRMM-Integration [ ] items, Sessions/machines/support-codes/events [x], Phase-1 hardening [x], programmatic pre-create [ ].

Proposed TECHNICAL_DEBT.md Delta (Agent F — NOT yet applied)

  • Move to "Completed Items" (2026-05-29): CI security audit (cargo audit) hard gate; SPEC-001 release engineering (signing/versioning/changelog) pipeline.
  • Confirmed still OPEN: #4/#19 rate-limit failed auth; #5 audit logging (audit_logs); #6 UI session timeout; #9 deploy SSH automation; #3 sd_notify watchdog.
  • Stale header: doc says "Phase 1, 89%, Last Updated 2026-01-18" — predates the 2026-05-29 release-engineering work.

  1. Relay-plane auth (all CRITICAL): close the JWT-as-agent-key leak (relay/mod.rs:224-239); add per-session authorization + token-blacklist checks to viewer_ws_handler.
  2. HIGH security: single-use support codes; WS inbound frame-size caps; implement + wire real rate limiting (login, change-password, code-validate); stop logging the fallback admin password.
  3. HIGH pipeline: wire the real deploy.yml SSH deploy; run release.yml to ship a signed, current agent; re-verify Gitea runner registration post-outage.
  4. Dashboard: remove or rebuild dashboard/src/lib/protobuf.ts against the real wire format before the React viewer is mounted; fix useRemoteSession missing JWT token.
  5. MEDIUM batch: input-injection throttle + consent gate; machines/releases UI; version workspace = true; SPEC-001 Pluto doc fix; chain cargo-audit into release/deploy.
  6. Docs: apply the roadmap + tech-debt deltas above (pending approval).