docs: 2026-05-31 security re-audit (Phase-1 EXIT) + roadmap reconcile

/gc-audit --pass=security re-pass over the deployed v0.3.0 code: PASS, 0 CRITICAL/HIGH/MEDIUM/LOW. The 3 relay CRITICALs stay closed (verified in code AND live against the deployed binary), the prior agent-update-TLS HIGH and chat-logging LOW are fixed, and the net-new SPEC-004 surface (machine_uid dedup gate, session reaper/supersede, operator removal API) audits clean — no non-admin removal path, no uid-spoof hijack, no auth-plane crossover. Marks v2 Phase 1 formally exited (secure-session-core Task 8 complete). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 18:19:09 -07:00
2 changed files with 139 additions and 5 deletions
--- a/docs/FEATURE_ROADMAP.md
+++ b/docs/FEATURE_ROADMAP.md
@@ -16,11 +16,16 @@ stack. It ships independently of GuruRMM and integrates with it via a versioned
 > match, blacklist-on-WS, agent-plane rejects user JWTs via per-agent `cak_` keys). The feature specs below
 > (SPEC-003–009) are **work-items inside the later v2 phases** — see the mapping.
 >
-> **Remaining to formally exit Phase 1:** secure-session-core **Task 8** (end-to-end verification +
-> `/gc-audit --pass=security` re-audit + the manual CRITICAL checks) and Code-Review sign-off on Tasks 3–5
-> (implemented without a local toolchain at the time; since built + deployed). Live HW-H.264 validation is
-> also pending — raw+Zstd remains the shipping default. ~~Sprint 0 (relay-auth CRITICAL hotfix)~~ **not
-> needed — those fixes shipped in Tasks 2–3.**
+> **Phase 1 formally EXITED (2026-05-31).** secure-session-core **Task 8** is complete — end-to-end
+> functional verification (live CRITICAL boundary checks against the deployed binary: login-JWT→401,
+> wrong-session viewer token→403, JWT-as-agent-key→401) **plus the `/gc-audit --pass=security` re-audit:
+> PASS, 0 CRITICAL/HIGH/MEDIUM/LOW** ([report](../reports/2026-05-31-gc-audit.md)). Code-Review sign-off on
+> Tasks 3–5 landed earlier. On top of Phase 1, **SPEC-004 (Tasks 2/4/5 — machine_uid dedup, session
+> reaping, operator removal API+UI) is implemented, reviewed, deployed, and the 11 live ghost rows were
+> purged**; the agent is now **auto-versioned + Azure-Trusted-Signing-signed via `release.yml`** with
+> **v0.3.0 published** as the stable release. ~~Sprint 0 (relay-auth CRITICAL hotfix)~~ **not needed.**
+> Still pending (NOT a Phase-1 blocker): live HW-H.264 cross-GPU validation — **raw+Zstd remains the
+> shipping default** (`DEFAULT_PREFER_H264=false`) until H.264 is validated across GPUs.

 ### v2 phase mapping of current specs

--- a/reports/2026-05-31-gc-audit.md
+++ b/reports/2026-05-31-gc-audit.md
@@ -0,0 +1,129 @@
+# GuruConnect Audit Report — 2026-05-31
+
+**Auditor:** Claude (claude-opus-4-8[1m])
+**Passes:** Security & Remote-Session Integrity (`--pass=security` only)
+**Previous audit:** 2026-05-30 (`reports/2026-05-30-gc-audit.md`)
+**Scope note:** v2 **Phase-1 EXIT gate** re-audit. Confirms the three relay CRITICALs stay closed and
+the prior net-new HIGH is fixed, and assesses the net-new SPEC-004 surface (Tasks 2/4/5 — machine_uid
+dedup, session reaping, operator removal) now committed + deployed. Includes **live** boundary tests
+against the running production binary, not just a code re-derivation.
+
+**Code under audit:** working tree at tag **v0.3.0 / e967cce** = the binary deployed to prod
+172.16.3.30:3002 (deployed this session from 96f9c0a; e967cce adds only the version bump + changelog).
+
+---
+
+## Executive Summary
+
+| Pass | Total | Critical | High | Medium | Low | Info |
+|------|-------|----------|------|--------|-----|------|
+| Security & Session | 4 | 0 | 0 | 0 | 0 | 4 |
+
+**Phase-1 security EXIT gate: PASS.** The relay/server plane is clean. All three 2026-05-29 CRITICALs
+remain CLOSED (verified in code AND live against the deployed server). The prior net-new HIGH (agent
+auto-update TLS bypass) and the prior LOW (chat content logged at INFO) are both remediated. The
+net-new SPEC-004 surface (operator removal, machine_uid dedup gate, session reaper/supersede) audits
+clean with the keyed-identity security invariant intact end-to-end. No net-new findings.
+
+**Requires action:** none.
+
+---
+
+## Live functional verification (deployed binary, 172.16.3.30:3002)
+
+Forged tokens (HS256, real `JWT_SECRET`) exercised the WS auth boundaries directly. Each illegitimate
+access was REJECTED (4xx, never a 101 upgrade):
+
+| Check | Result | Proves |
+|-------|--------|--------|
+| Login-shape JWT on `/ws/viewer` | **401** | Login token not accepted as a viewer token (`purpose=="viewer"` enforced) — CRITICAL #1 |
+| Validly-signed viewer token for session AAAA used on session BBBB | **403** | Session binding enforced — a correctly-signed token is refused for the wrong session — CRITICAL #1 |
+| Login JWT used as agent `api_key` on `/ws/agent` | **401** | Agent plane rejects JWTs (no JWT branch) — CRITICAL #3 |
+| Wrong-signature token on `/ws/viewer` | **401** | Signature validation holds (control) |
+
+The session-bind case is the decisive one: a token that WOULD be accepted for its own session is
+rejected 403 for a different session, proving the binding rather than mere signature validation.
+
+---
+
+## The three relay CRITICALs — verdict
+
+| CRITICAL | Verdict | Enforced at |
+|----------|---------|-------------|
+| #1 any-JWT-joins-any-session | **CLOSED** | mint authz `api/sessions.rs` (is_admin \|\| permission); viewer WS `relay/mod.rs:496` `validate_viewer_token` (sig+expiry+`purpose=="viewer"`); session-bind `relay/mod.rs:527-534` (`claim != requested → 403`) |
+| #2 viewer-WS blacklist | **CLOSED** (TTL-bounded residual unchanged) | `relay/mod.rs:509` `token_blacklist.is_revoked` before upgrade. Residual: logout revokes login JWT not minted viewer tokens (5-min TTL) — same tracked MEDIUM, no regression |
+| #3 JWT-accepted-as-agent-key | **CLOSED**, fails closed | `relay/mod.rs:417` `validate_agent_api_key` — no JWT branch; only `cak_` (`auth/agent_keys.rs`, SHA-256 vs `connect_agent_keys`, `revoked_at IS NULL`) or deprecated shared key (WARN). Unresolved machine → 503 (`:303`); client `agent_id` overridden by key identity (`:283`) |
+
+Live results match these code paths exactly.
+
+---
+
+## Prior HIGH — FIXED
+
+**Agent auto-update TLS bypass → MITM-RCE: CLOSED.** `agent/src/update.rs:21` `dev_insecure_tls()` is
+`cfg!(debug_assertions)` AND env-var gated, so a release build's `cfg!` compiles out and the agent
+ALWAYS verifies certs. Both `check_for_update` (`:64`) and `download_update` (`:130`) consume it; unit
+test `test_dev_insecure_tls_release_is_always_false` (`:362`) asserts the release invariant. No
+`danger_accept_invalid_certs(true)` reachable in production. A signed-manifest defense-in-depth TODO is
+filed at `install_update` (`:189`) (= tracked task #10, not an exit blocker).
+
+---
+
+## Pass 5: Security & Remote-Session Integrity — net-new SPEC-004 surface
+
+### [INFO] Operator removal API (`server/src/api/removal.rs`) — clean, admin-gated
+Every removal handler takes the `AdminUser` extractor as its first argument (runs before any DB
+mutation): `remove_machine` (`:88`), `remove_session` (`:321`), `bulk_remove_machines` (`:471`).
+`AdminUser` (`auth/mod.rs:141`) validates JWT (signature + expiry + blacklist `:97`) then requires
+`is_admin()` else 403 (`:146`). Soft-deletes are parameterized + idempotent (`WHERE … AND deleted_at IS
+NULL`); bulk bounded (MAX_BATCH 500) with per-id UUID validation + isolated failures; audit
+(`db/events.rs:126`) records actor + target + trusted-proxy IP, best-effort (cannot be suppressed by
+attacker-controlled input). Removal is admin-role-gated globally (not per-tenant ACL) — same Phase-1
+posture as viewer-mint, per-tenant narrowing deferred to SPEC-002 Phase 4. Acceptable by context.
+
+### [INFO] machine_uid dedup security gate — invariant holds
+Gate at `relay/mod.rs:352`: `effective_machine_uid = if is_keyed_agent { None } else { claimed }`. The
+suppressed value (not the raw claim) flows to `register_agent` and `upsert_machine`. Keyed (`cak_`)
+agents take the agent_id-keyed upsert branch and never write/touch a `ON CONFLICT (machine_uid)` row, so
+a valid key for machine X cannot repoint machine Y via a claimed uid. An un-keyed uid-spoof can only
+match a uid-bearing row — which the keyed connect path never creates; the only residual is a legacy
+pre-keying row, and the startup L1 fix (`main.rs:267-288` via `keyed_machine_ids`, fail-closed on query
+error) ensures keyed machines are never uid-indexed on restore.
+
+### [INFO] Session reaper + same-machine supersede — clean, TOCTOU closed
+`reap_stale_persistent` (`:875`) and supersede (`:322`) select under a read lock then re-assert the full
+predicate under the write lock via `remove_session_if` (`:755`). Predicate requires
+`!is_online && is_persistent && viewers.is_empty()` (+ TTL / same-uid) — an online, viewer-attached, or
+support session is never reaped/superseded. Un-keyed uid-spoof blast radius = denial-of-persistence on
+an offline same-uid session at worst, never a hijack. Lock order matches `register_agent`; predicate is
+synchronous (no await under lock).
+
+### [INFO] General posture — confirmed, no regressions
+Runtime sqlx parameterized everywhere (no `format!`-built SQL); migrations 008/009 idempotent. Frame
+caps: agent 4 MiB / viewer 64 KiB applied before upgrade. Input throttle retained. `/api/auth/login`
+rate-limited (`main.rs:397`). `JWT_SECRET` panics if <32 (`main.rs:143`); agent keys SHA-256; Argon2id
+passwords; no secret/token/code/PII logged. **Chat content no longer logged** (prior LOW fixed —
+`relay/mod.rs:829,1428` now log length only).
+
+---
+
+## Definitive answers
+
+- **(a) Any non-admin removal path?** NO — all three removal handlers gate on `AdminUser` (JWT+blacklist+`is_admin`→403) before any DB mutation.
+- **(b) Any uid-spoof that repoints/hijacks another machine's row or session (not just denial)?** NO — keyed identity is authoritative and uid-suppressed across connect → upsert → reattach → startup restore. Worst case for an un-keyed spoof is denial-of-persistence on an offline same-uid session.
+- **(c) Any auth-plane bypass (agent↔viewer credential crossover)?** NO — viewer plane requires a `purpose=="viewer"` session-bound minted token; agent plane requires a `cak_`/shared key with no JWT branch. Confirmed in code and live.
+
+---
+
+## Verdict
+
+**Phase-1 security EXIT gate: PASS.** Relay/server plane clean; prior HIGH + LOW remediated; SPEC-004
+surface sound with the keyed-identity invariant intact across the connect path, DB upsert, in-memory
+reattach, and startup restore. No new CRITICAL/HIGH/MEDIUM/LOW.
+
+**Tracked, deferred-by-design (not exit blockers):**
+- Viewer-token logout revocation residual (MEDIUM, TTL-bounded) — `v2-secure-session-core/plan.md`.
+- Update-binary signature verification (defense-in-depth, task #10) — TODO at `update.rs:189`.
+
+*Note: only `--pass=security` was run. API-surface, Rust-quality, TypeScript, protocol-integrity,
+docs-reconciliation, and CI/CD passes were not executed this run.*