guru-connect/docs/specs/SPEC-002-v2-modernization-architecture.md
Mike Swanson 5c60a105c0
docs(spec): add SPEC-002 GuruConnect v2 modernization architecture
Ground-up v2 re-architecture decided 2026-05-29 (Mike), grounded in the
2026-05-29 audit + adopted GuruRMM design principles. Greenfield salvaging
proven Rust cores (DXGI/GDI capture, input injection, SAS helper, prost codec,
CI). Native-first full key fidelity (Win+R/Ctrl+Alt+Del) + bidirectional file
transfer (clipboard cut/paste + drag-and-drop) as headline differentiators;
WebRTC fallback only. Hardened single-tenant, tenancy-ready schema. Standalone-
first + /api/integration/v1 RMM contract. Closes all audit CRITICALs by design.
Open decisions resolved: in-place repo reset, H.264 default, WSS-first web
transport, widened support codes, clean v1 cutover (no client migration).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 18:08:23 -07:00

SPEC-002: GuruConnect v2 — Modernization Architecture

Status: Approved (v2 direction + open decisions locked 2026-05-29; phases pending shape-spec)
Priority: P1 (foundational; supersedes the v1 architecture)
Requested By: Mike Swanson (2026-05-29)
Estimated Effort: X-Large (multi-month, phased)
Supersedes: the v1 status docs (PHASE1_*, DEPLOYMENT_*, WEEK1_*, GAP_ANALYSIS.md); archive on adoption
Reference material: reports/2026-05-29-gc-audit.md (the v1 audit), GuruRMM docs/DESIGN.md + docs/ARCHITECTURE_DECISIONS.md, .claude/CODING_GUIDELINES.md, SPEC-001, specs/native-remote-control/, ADR-001, ADR-002


Why v2

v1 was built with an older model and the seams show. The 2026-05-29 audit found the product's threat surface — the remote-control relay plane — carries three independent CRITICAL auth failures (any-JWT session hijack, viewer-WS revocation bypass, JWT-accepted-as-agent-key), the web viewer's "protobuf" decoder is a fabricated layout that cannot decode a single real frame, and production has been running 57 commits stale on a likely-unsigned agent because the deploy step is a stub. At the same time, the audit confirmed the hardware-facing Rust is sound: DXGI/GDI capture, input injection, and the prost wire codec are correct. So this is not a rewrite born of failure — it's a deliberate reset that keeps the hard-won Windows-internals code and rebuilds everything above it to be secure, full-fidelity, modern, and aligned with GuruRMM's design discipline.

The four decisions (set by the product owner, 2026-05-29)

| # | Decision | Choice | Consequence |
|---|----------|--------|-------------|
| 1 | Rewrite scope | Greenfield, salvage cores | Clean architecture; reuse only proven Rust (capture, input, SAS, prost codec, proto, CI). Rebuild relay/auth, session, viewer, dashboard, deploy. |
| 2 | Transport + viewer | Native-first on custom protobuf-over-WSS; WebRTC secondary | Full key fidelity (Win+R, Ctrl+C/V, Ctrl+Alt+Del) is non-negotiable and browsers can't deliver it (see §4). Web/WebRTC is the no-install fallback with documented limits. |
| 3 | RMM relationship | Standalone-first + versioned contract | GC ships/sells independently; RMM integrates via the semver'd /api/integration/v1/ contract (ADR-001, native-remote-control spec). |
| 4 | Auth + tenancy | Hardened single-tenant now, multi-tenancy-ready schema | Fix the relay CRITICALs by design; carry nullable tenant_id from day one so the RMM partner/client model switches on later with no migration rewrite. |

What makes v2 "considerably better"

  1. Secure by design — the relay plane is rebuilt around per-agent keys, per-session viewer authorization, and blacklist-checked tokens; the v1 CRITICALs are structurally impossible.
  2. Full-fidelity input — native viewer captures and forwards every key combo including the Windows key and Ctrl+Alt+Del (§4). This is the headline product differentiator and the owner's stated priority.
  3. Modern video — hardware H.264/HEVC encode (Media Foundation/NVENC/QuickSync) replaces raw-BGRA+Zstd as the default, with raw+Zstd kept as the universal fallback (§5).
  4. Complete UI — every capability ships its dashboard + viewer surface (RMM's full-stack rule); the audit's pile of orphaned APIs (machines, releases, client-access) gets real UI.
  5. Clean RMM integration — the /api/integration/v1/ contract + embedded viewer is designed in from day one, not retrofitted.
  6. Operationally honest — real deploy, enforced signing, living-roadmap definition-of-done (already established by SPEC-001/ADR-002; v2 keeps it).

1. Scope

In scope (v2 foundation, Phases 0-3)

Clean re-architecture of agent, relay server, session/auth model, native viewer, dashboard, and protocol; hardware video codecs; the secure session lifecycle; the standalone product end-to-end; the RMM integration contract surface.

Explicitly out of scope (deferred, not cancelled)

  • Full multi-tenancy activation (partner→client isolation, dev impersonation) — schema is ready in v2; the model is switched on in a later phase (§6, Phase 4).
  • WebRTC as primary — secondary/fallback only; the no-install web viewer ships in Phase 2 but native is the supported full-fidelity path.
  • Session recording, BACKSTAGE mode — modeled in the proto, deferred to post-foundation (each ships full-stack when prioritized). (File transfer via clipboard cut/paste + drag-and-drop is NOT deferred — it is a named core differentiator, §4.4.)
  • macOS/Linux agents — Windows-first remains (REQUIREMENTS.md); the server/dashboard are cross-platform-clean but the agent is Windows.

Success criteria

  • A technician starts an attended session via a support code, the end user sees and accepts a consent prompt, and the technician has full keyboard/mouse control including Win+R, Ctrl+C/V across the clipboard, and Ctrl+Alt+Del — with zero relay-plane auth holes (re-audited clean).
  • A technician can cut/copy a file on either side and paste it on the other, and drag files into and out of the session window (guest↔host), with chunked transfer, sha256 integrity, progress, and an audit trail (§4.4).
  • An unattended/managed agent enrolls with a per-agent key (no shared secret), and a revoked key or revoked viewer token is rejected on the WebSocket, not just on REST.
  • Video defaults to hardware H.264 with automatic fallback to raw+Zstd on unsupported hardware (Win7), negotiated per session.
  • RMM can pre-create a session and embed the viewer via /api/integration/v1/ without GC granting RMM any standing credential.

2. Salvage / Scrap Ledger

The audit is the authority for what's proven. Greenfield architecture, but these cores move over largely intact.

SALVAGE (audit-confirmed correct — reuse, don't rewrite)

| Component | Path (v1) | Notes |
|-----------|-----------|-------|
| DXGI screen capture | agent/src/capture/dxgi.rs | Primary capture path. Wrap in new module structure. |
| GDI capture fallback | agent/src/capture/gdi.rs | Win7 / no-DXGI fallback. Keep. |
| Multi-display capture | agent/src/capture/display.rs | Feeds the proto SwitchDisplay capability (finish full-stack). |
| Input injection | agent/src/input/{mouse,keyboard}.rs | Keep; extend for scan-code + extended-key fidelity (§4). |
| SAS helper (Ctrl+Alt+Del) | agent/src/bin/sas_service.rs | The privileged SendSAS path. Keep: it's the only way to deliver Ctrl+Alt+Del. |
| Protocol schema | proto/guruconnect.proto | Well-modeled (§3). Extend (Consent, tenant fields); the prost wire layer is correct. |
| Correct protobuf parser | server/static/viewer.html:196-489 | Real varint/length-delimited parser; serves as reference (and interim web viewer) until the v2 web viewer lands. |
| CI/CD + signing | .gitea/workflows/{build-and-test,release}.yml, ADR-002, SPEC-001 | Modern, gated, hard-fail clippy+audit, Azure Trusted Signing. Keep wholesale. |
| Version embedding | agent/build.rs | Keep. |
| DB conventions | migrations style, UUID PKs, TIMESTAMPTZ | Keep; new schema, same conventions. |

SCRAP / REBUILD (audit found broken or structurally unsafe)

| Component | Path (v1) | Why |
|-----------|-----------|-----|
| Relay auth | server/src/relay/mod.rs:224-288 | The 3 CRITICALs. Rebuild the entire auth model (§3.3). |
| Rate limiting | server/src/middleware/rate_limit.rs | Non-compiling, disabled. Rebuild. |
| Web "protobuf" codec | dashboard/src/lib/protobuf.ts | Fabricated fixed-offset layout, wire-incompatible. Delete; rebuild web viewer (§4.3). |
| React component lib | dashboard/src/ | Dead/divergent. Rebuild as a real dashboard (§6, Phase 2) covering the orphaned APIs. |
| Deploy step | .gitea/workflows/deploy.yml:71-78 | Echo stub. Wire real SSH deploy (§7). |
| Flat session/machine model | server/src/{session,db}/ | No per-agent keys, no per-session authz, no tenant column. Rebuild (§3). |
| Shared AGENT_API_KEY | server/src/auth/, CLAUDE.md | The anti-pattern RMM eliminated. Replace with per-agent keys (§3.3). |

ARCHIVE (stale v1 status docs — move to docs/archive/v1/)

PHASE1_*.md, PHASE2_*.md, DEPLOYMENT_*.md, WEEK1_*.md, GAP_ANALYSIS.md, MASTER_ACTION_PLAN.md, CHECKPOINT_*.md, PROJECT_OVERVIEW.md (2026-01-17, stale). v2 tracks state only in docs/FEATURE_ROADMAP.md + docs/ARCHITECTURE_DECISIONS.md + docs/specs/ + TECHNICAL_DEBT.md (RMM living-roadmap discipline). The SEC*_AUDIT.md files stay as point-in-time security history.


3. Architecture

3.1 Repo & module structure (greenfield layout, RMM-aligned)

Decision (2026-05-29): clean architectural reset in the existing guru-connect repo (preserve git history, the good CI, the proto, the Gitea issues) rather than a brand-new repo — "greenfield" applies to the architecture, not the repo identity. Old code is removed/relocated as the v2 modules land; salvaged crates move into the new layout.

guru-connect/
├── agent/                    # Windows agent + native viewer (single static .exe)
│   ├── src/
│   │   ├── capture/          # [salvage] dxgi, gdi, display
│   │   ├── input/            # [salvage+extend] scan-code injection, extended keys, modifier hygiene
│   │   ├── encoder/          # [rebuild] HW H.264/HEVC (Media Foundation) + raw/zstd fallback
│   │   ├── viewer/           # [rebuild] native viewer: low-level kbd hook, decode, render
│   │   ├── transport/        # [salvage codec, rebuild auth handshake] prost over WSS
│   │   ├── session/          # [rebuild] session state, consent, reconnect
│   │   ├── clipboard/        # [new] bidirectional clipboard sync (text/html/image)
│   │   ├── filetransfer/     # [new] CF_HDROP cut/paste + drag in/out, chunked engine (§4.4)
│   │   ├── tray/ install.rs  # [salvage] tray, guruconnect:// handler, auto-install
│   │   └── bin/sas_service.rs # [salvage] Ctrl+Alt+Del SYSTEM helper
│   └── build.rs              # [salvage] version embed
├── server/                   # Linux relay (Axum + PostgreSQL/sqlx)
│   ├── src/
│   │   ├── relay/            # [rebuild] secure agent/viewer WS handlers, frame caps
│   │   ├── session/          # [rebuild] session lifecycle, per-session authz
│   │   ├── auth/             # [rebuild] per-agent keys, session-scoped viewer tokens, blacklist
│   │   ├── api/              # [rebuild] REST: machines, sessions, codes, releases, users, changelog
│   │   ├── integration/      # [new] /api/integration/v1/ contract (RMM broker)
│   │   ├── middleware/       # [rebuild] working rate limiting, security headers, framing allowlist
│   │   └── db/               # [rebuild] schema with tenant_id, agent keys
│   ├── migrations/           # fresh v2 schema (idempotent)
│   └── static/               # interim hardened web viewer (until SPA)
├── dashboard/                # [rebuild] real web dashboard (machines/sessions/codes/releases UI)
├── proto/guruconnect.proto   # [salvage+extend]
└── docs/ (specs, ADRs, roadmap) · reports/ · TECHNICAL_DEBT.md

3.2 Session model (rebuilt — the secure core)

Two session origins, one secure lifecycle:

  • Attended (support code): technician generates a code → end user runs the lightweight one-time agent (code baked in, user-space, no admin, self-deleting) → agent connects, server shows the technician the pending session only after the end user accepts a consent prompt on their machine (new ConsentRequest/ConsentResponse, §3.5).
  • Unattended (managed agent): persistent agent enrolled with a per-agent key; technician opens a session from the dashboard; consent policy is per-tenant (silent for managed endpoints, or notify-only).
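The consent gate described above can be read as a small state machine. The following is a minimal sketch under stated assumptions: the type and variant names (`ConsentState`, `SessionOrigin`, `ConsentReply`) are hypothetical and not taken from the proto or schema, which only name a `consent_state` column and the ConsentRequest/ConsentResponse messages. It treats managed sessions as pre-authorized by tenant policy, whether silent or notify-only.

```rust
/// Hypothetical consent states; the spec only names a `consent_state` column.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConsentState {
    Pending,
    Accepted,
    Denied,
    TimedOut,
}

/// How the session was started (the two origins from this section).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SessionOrigin {
    /// Support-code session: the end user must accept a prompt.
    Attended,
    /// Managed agent: consent is decided by per-tenant policy, not a prompt.
    Unattended,
}

/// Outcome of the prompt shown on the end user's machine (assumed shape).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConsentReply {
    Accept,
    Deny,
    Timeout,
}

impl ConsentState {
    /// Attended sessions start gated; managed sessions start authorized.
    pub fn initial(origin: SessionOrigin) -> Self {
        match origin {
            SessionOrigin::Attended => ConsentState::Pending,
            SessionOrigin::Unattended => ConsentState::Accepted,
        }
    }

    /// Only a pending session reacts to a reply; every other state is
    /// terminal, so a late or replayed reply cannot reopen a denied session.
    pub fn apply(self, reply: ConsentReply) -> Self {
        match (self, reply) {
            (ConsentState::Pending, ConsentReply::Accept) => ConsentState::Accepted,
            (ConsentState::Pending, ConsentReply::Deny) => ConsentState::Denied,
            (ConsentState::Pending, ConsentReply::Timeout) => ConsentState::TimedOut,
            (other, _) => other,
        }
    }

    /// The server lists the session to the technician only once accepted.
    pub fn visible_to_technician(self) -> bool {
        self == ConsentState::Accepted
    }
}
```

Making every non-pending state terminal is a design choice of this sketch; the spec does not say whether a denied session may be re-prompted.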

Session security invariants (each fixes an audit CRITICAL/HIGH):

  1. Per-session viewer authorization. A viewer connects with a session-scoped, short-lived JWT minted only by an authenticated+authorized request for that session/machine. The WS handler verifies signature + expiry + blacklist + the session claim matches the requested session_id. (Fixes "any JWT joins any session" + "blacklist bypass.")
  2. Plane separation. Agent auth = per-agent key or support code, never a user JWT. The validate_agent_api_key JWT branch is deleted. (Fixes "JWT-as-agent-key.")
  3. Single-use codes. A support code is consumed atomically on first agent bind; a second presenter is rejected; the validate endpoint is rate-limited and the code space widened (high-entropy, human-readable). (Fixes "reusable support codes.")
  4. Bounded relay. Explicit max_message_size/max_frame_size on both WS upgrades; oversized frames rejected before broadcast. Input events rate-limited server-side with a bounded queue. (Fixes "no frame cap / no input throttle.")
  5. Reconcile on restart. Managed sessions persist in DB and reconcile on server restart so they aren't orphaned (native-remote-control prior art).
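Invariant 1 can be sketched as a pure check that runs on the viewer WebSocket upgrade. This is illustrative only: `ViewerClaims`, `ViewerAuthError`, and `authorize_viewer` are hypothetical names, signature verification is assumed to have already happened in a JWT library before this function is called, and the blacklist is shown as an in-memory set rather than whatever store the server actually uses.

```rust
use std::collections::HashSet;

/// Reasons a viewer connection is refused (hypothetical names).
#[derive(Debug, PartialEq, Eq)]
pub enum ViewerAuthError {
    Expired,
    Revoked,
    SessionMismatch,
}

/// Claims of a session-scoped viewer token. Assumed to be already
/// signature-verified by the JWT layer before reaching this check.
pub struct ViewerClaims {
    /// Unique token id, used as the blacklist key.
    pub jti: String,
    /// The one session this token was minted for.
    pub session_id: String,
    /// Expiry as Unix seconds.
    pub exp_unix: u64,
}

/// Expiry, blacklist, and session-claim match, in that order.
pub fn authorize_viewer(
    claims: &ViewerClaims,
    requested_session_id: &str,
    now_unix: u64,
    blacklist: &HashSet<String>,
) -> Result<(), ViewerAuthError> {
    if now_unix >= claims.exp_unix {
        return Err(ViewerAuthError::Expired);
    }
    if blacklist.contains(&claims.jti) {
        return Err(ViewerAuthError::Revoked);
    }
    // A valid token for session A must not open session B.
    if claims.session_id != requested_session_id {
        return Err(ViewerAuthError::SessionMismatch);
    }
    Ok(())
}
```

Keeping the check free of I/O makes it easy to call from both the REST and WebSocket paths, which is the property the invariant asks for.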

3.3 Auth model (rebuilt — hardened single-tenant, tenancy-ready)

| Plane | Credential | Verification |
|-------|------------|--------------|
| Dashboard user | JWT (Argon2id login), blacklist on logout | full verify on every request and every WS |
| Viewer (per session) | session-scoped short-lived JWT (~5 min) | signature + expiry + blacklist + session-claim match |
| Agent (managed) | per-agent hashed key (cak_…), connect_agent_keys(revoked_at) | hash compare; revocation immediate |
| Agent (attended) | single-use support code | atomic consume; rate-limited validate |
| Server↔server (RMM) | CONNECT_INTEGRATION_KEY (env/SOPS) | on /api/integration/v1/* only |
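The "atomic consume" semantics for attended-agent support codes can be illustrated with an in-memory stand-in. This is a sketch, not the server's implementation: `SupportCodes` and `ConsumeError` are hypothetical, and a mutex-guarded map stands in for the database, where the same effect would presumably come from a single conditional update.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

#[derive(Debug, PartialEq, Eq)]
pub enum ConsumeError {
    /// The code was never issued.
    Unknown,
    /// The code was already bound by an earlier agent.
    AlreadyUsed,
}

/// In-memory stand-in for the support_codes table (assumption: the real
/// server would do this in PostgreSQL). Maps code -> consumed flag.
pub struct SupportCodes {
    inner: Mutex<HashMap<String, bool>>,
}

impl SupportCodes {
    pub fn new() -> Self {
        SupportCodes { inner: Mutex::new(HashMap::new()) }
    }

    /// Register a freshly generated code as unused.
    pub fn issue(&self, code: &str) {
        self.inner.lock().unwrap().insert(code.to_string(), false);
    }

    /// Check-and-mark under one lock, so two agents presenting the same
    /// code concurrently cannot both succeed.
    pub fn consume(&self, code: &str) -> Result<(), ConsumeError> {
        let mut map = self.inner.lock().unwrap();
        let used = map.get_mut(code).ok_or(ConsumeError::Unknown)?;
        if *used {
            return Err(ConsumeError::AlreadyUsed);
        }
        *used = true;
        Ok(())
    }
}
```

The point of the sketch is that the lookup and the state change happen in one critical section; a read followed by a separate write would reintroduce the reuse window.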

Tenancy-ready, not tenant-yet: every tenant-scoped table (machines, sessions, support_codes, events, connect_agent_keys, users) carries a nullable tenant_id defaulting to a single bootstrap tenant. Queries route through a tenancy helper that today resolves to the default tenant. Switching on multi-tenancy (Phase 4) = populate tenant_id, flip the helper to enforce WHERE tenant_id = :current, add the partner/client impersonation layer (RMM ADR-001 model) — no schema rewrite. This honors "hardened single-tenant first" while refusing to paint into a corner.
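One way to picture the tenancy helper is as a mode switch that every query passes through. The sketch below is an assumption-laden illustration: `TenancyMode`, `resolve_tenant`, and `row_visible` are hypothetical names, and `TenantId` is a small integer standing in for the UUID the schema would use.

```rust
/// Stand-in for a UUID tenant key (assumption for brevity).
pub type TenantId = u32;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TenancyMode {
    /// Today: everything resolves to the bootstrap tenant.
    SingleTenant { default_tenant: TenantId },
    /// Phase 4: the caller's tenant is mandatory and enforced.
    Enforced,
}

#[derive(Debug, PartialEq, Eq)]
pub enum TenancyError {
    MissingTenant,
}

/// Decide which tenant a request operates under.
pub fn resolve_tenant(
    mode: TenancyMode,
    caller_tenant: Option<TenantId>,
) -> Result<TenantId, TenancyError> {
    match mode {
        TenancyMode::SingleTenant { default_tenant } => Ok(default_tenant),
        TenancyMode::Enforced => caller_tenant.ok_or(TenancyError::MissingTenant),
    }
}

/// In-process equivalent of the future `WHERE tenant_id = :current` filter.
pub fn row_visible(
    mode: TenancyMode,
    caller_tenant: Option<TenantId>,
    row_tenant: Option<TenantId>,
) -> bool {
    match mode {
        TenancyMode::SingleTenant { .. } => true,
        // A row with no tenant is never visible once enforcement is on.
        TenancyMode::Enforced => caller_tenant.is_some() && caller_tenant == row_tenant,
    }
}
```

Because callers never branch on the mode themselves, flipping to `Enforced` changes behavior in one place, which is the "no schema rewrite" property the paragraph describes.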

3.4 Database schema (fresh v2 migrations, conventions per RMM/CODING_GUIDELINES)

Idempotent migrations, UUID PKs, TIMESTAMPTZ, soft-delete, FK ON DELETE CASCADE. Key new/changed tables: connect_agent_keys (per-agent hashed keys + revoked_at), tenant_id columns everywhere, sessions gains is_managed/source/consent_state/tenant_id, support_codes gains single-use consume semantics. Runtime sqlx::query() for new queries (CODING_GUIDELINES), server applies migrations on startup, _sqlx_migrations respected.

3.5 Protocol v2 (proto/guruconnect.proto — salvage + extend)

The v1 proto already models video (raw/vp9/h264/h265 + dirty-rects), full input (VK/scan/unicode + modifiers, SpecialKey for Ctrl+Alt+Del/Lock/PrintScreen), cursor, clipboard (text/html/rtf/image/files), quality presets + codec preference, multi-display, chat, heartbeat, admin, and auto-update. Additions for v2:

  • ConsentRequest / ConsentResponse — attended-mode consent (the missing trust primitive).
  • Optional tenant_id on SessionRequest/AgentStatus for the tenancy-ready model.
  • Negotiated codec in SessionResponse (server tells the viewer which encoding to expect; agent advertises HW-encode capability in AgentStatus).
  • File-transfer messages: FileOfferStart / FileChunk / FileOfferComplete / FileTransferAck / FileTransferCancel, plus clipboard-files (CF_HDROP) and drag signaling (DragEnter/DragDrop). This promotes the FILE_TRANSFER enum to a real subsystem (§4.4), carried on its own logical stream.
  • Wire the already-modeled-but-undispatched variants (SwitchDisplay, QualitySettings, UpdateStatus) end-to-end (RMM full-stack rule — no headless capabilities).

4. Full-Fidelity Input, Clipboard & File Transfer — the headline features (§ owner priority)

Browsers cannot deliver full keyboard control; this is why native is primary and WebRTC is secondary. v2 treats input fidelity, clipboard, and file transfer as first-class, interlocking subsystems — the owner specifically named keyboard hooks and cut/paste + drag files from/to either guest or host as favorite ScreenConnect behaviors to match.

4.1 Native viewer capture (the technician side)

  • Low-level keyboard hook (WH_KEYBOARD_LL) installed while the viewer is focused/in control, so the viewer intercepts system combos (Win/Meta key, Win+R, Win+E, Alt+Tab, Ctrl+Esc) before the local shell consumes them, and forwards them as KeyEvent (VK + scan code + extended flag) instead of letting the local OS act.
  • A "send system keys to remote" toggle (default on when the viewer is in fullscreen/focused control), so the hook only diverts combos during active control.
  • Modifier-state hygiene: track modifier up/down and re-sync on focus loss (release stuck modifiers) — eliminates the classic "remote thinks Ctrl is still held" bug.
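The modifier-hygiene bullet can be sketched as a tracker the viewer feeds from its hook callback. This is a minimal illustration, not the viewer's code: `ModifierTracker` and this simplified `KeyEvent` are hypothetical, while the virtual-key constants are the standard Win32 values for the left/right modifier keys.

```rust
use std::collections::BTreeSet;

// Win32 virtual-key codes for the left/right modifier keys.
pub const VK_LWIN: u16 = 0x5B;
pub const VK_RWIN: u16 = 0x5C;
pub const VK_LSHIFT: u16 = 0xA0;
pub const VK_RSHIFT: u16 = 0xA1;
pub const VK_LCONTROL: u16 = 0xA2;
pub const VK_RCONTROL: u16 = 0xA3;
pub const VK_LMENU: u16 = 0xA4;
pub const VK_RMENU: u16 = 0xA5;

/// Simplified key event (the proto's KeyEvent carries more fields).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct KeyEvent {
    pub vk: u16,
    pub down: bool,
}

fn is_modifier(vk: u16) -> bool {
    // 0xA0..=0xA5 covers left/right Shift, Control, and Alt (Menu).
    matches!(vk, VK_LWIN | VK_RWIN | VK_LSHIFT..=VK_RMENU)
}

/// Remembers which modifiers the remote side currently believes are held.
#[derive(Default)]
pub struct ModifierTracker {
    held: BTreeSet<u16>,
}

impl ModifierTracker {
    /// Call for every key event forwarded to the remote.
    pub fn observe(&mut self, ev: KeyEvent) {
        if !is_modifier(ev.vk) {
            return;
        }
        if ev.down {
            self.held.insert(ev.vk);
        } else {
            self.held.remove(&ev.vk);
        }
    }

    /// Call on focus loss: returns the key-up events to send so the remote
    /// does not keep a modifier stuck down, and clears local state.
    pub fn release_all(&mut self) -> Vec<KeyEvent> {
        let releases: Vec<KeyEvent> = self
            .held
            .iter()
            .map(|&vk| KeyEvent { vk, down: false })
            .collect();
        self.held.clear();
        releases
    }
}
```

A viewer would call `release_all` from its focus-lost handler and forward each returned event before suspending the hook.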

4.2 Agent injection (the remote side)

  • Scan-code injection (SendInput with KEYEVENTF_SCANCODE, correct KEYEVENTF_EXTENDEDKEY for right-Ctrl/Alt, arrows, Win, Insert/Home/etc.) — layout-independent, works in games/secure-desktop-adjacent apps where VK injection fails.
  • Ctrl+Alt+Del: SpecialKeyEvent.CTRL_ALT_DEL → the salvaged SAS service (SendSAS, runs as SYSTEM, requires SoftwareSASGeneration policy which the managed installer sets). This is the only reliable path and a key differentiator vs browser tools.
  • Clipboard sync (Ctrl+C/Ctrl+V value): bidirectional ClipboardData sync — text/html/image (Phase 1) and files via CF_HDROP (the file-transfer subsystem, §4.4) — so copy on either side is available on the other. The keystrokes inject, and the clipboard content actually transfers. Full-stack: a clipboard-activity indicator in the viewer.
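For the scan-code injection bullet, the part that is easy to get wrong is the flag word passed in each keyboard input record. The sketch below computes only those flags and a Win+R sequence; it does not call SendInput, so it runs on any platform. The three `KEYEVENTF_*` values are the documented Win32 constants and the scan codes are standard set-1 codes, while `ScanKey`, `keybd_flags`, and `win_r_sequence` are hypothetical helper names.

```rust
// Win32 KEYBDINPUT.dwFlags values.
pub const KEYEVENTF_EXTENDEDKEY: u32 = 0x0001;
pub const KEYEVENTF_KEYUP: u32 = 0x0002;
pub const KEYEVENTF_SCANCODE: u32 = 0x0008;

/// A key identified by its set-1 scan code plus the E0 "extended" marker.
pub struct ScanKey {
    pub scan: u16,
    pub extended: bool,
}

pub const LEFT_CTRL: ScanKey = ScanKey { scan: 0x1D, extended: false };
/// Right Ctrl shares scan code 0x1D and is told apart by the extended flag.
pub const RIGHT_CTRL: ScanKey = ScanKey { scan: 0x1D, extended: true };
pub const LEFT_WIN: ScanKey = ScanKey { scan: 0x5B, extended: true };
pub const KEY_R: ScanKey = ScanKey { scan: 0x13, extended: false };

/// Flags for one scan-code based key event.
pub fn keybd_flags(key: &ScanKey, key_up: bool) -> u32 {
    let mut flags = KEYEVENTF_SCANCODE;
    if key.extended {
        flags |= KEYEVENTF_EXTENDEDKEY;
    }
    if key_up {
        flags |= KEYEVENTF_KEYUP;
    }
    flags
}

/// (scan code, flags) pairs for Win+R: modifier down, key down, key up,
/// modifier up. The caller would map each pair onto one input record.
pub fn win_r_sequence() -> Vec<(u16, u32)> {
    vec![
        (LEFT_WIN.scan, keybd_flags(&LEFT_WIN, false)),
        (KEY_R.scan, keybd_flags(&KEY_R, false)),
        (KEY_R.scan, keybd_flags(&KEY_R, true)),
        (LEFT_WIN.scan, keybd_flags(&LEFT_WIN, true)),
    ]
}
```

Omitting the extended flag for Right Ctrl would inject Left Ctrl instead, which is the kind of fidelity loss this section is trying to rule out.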

4.3 Web/WebRTC secondary viewer (the no-install fallback) — honest limits

  • Uses the Keyboard Lock API (navigator.keyboard.lock()) + Fullscreen + Pointer Lock to capture as many keys as the browser allows (Escape, Tab, most combos).
  • Cannot deliver Ctrl+Alt+Del (browser-impossible) or reliably the Meta/Win key. Compensates with an on-screen special-keys toolbar (Ctrl+Alt+Del button → SpecialKeyEvent; Win, Win+R, Alt+Tab macros) — the same idea the v1 SessionControls gestured at, done properly.
  • Transport: either the same protobuf-over-WSS (with a correct decoder ported from viewer.html, not the fabricated one) or a WebRTC track fed by the agent's H.264 encoder (§5) for lower latency. Decided in Phase 2; WSS is the simpler first cut.
  • Clearly labeled in-product as the quick-access path; the native viewer is the full-control path.

4.4 File Transfer — clipboard cut/paste + drag-and-drop, bidirectional (§ owner favorite)

A named must-have: the ScreenConnect behavior where you cut/copy a file and paste it across, or drag a file into or out of the session window, in either direction (guest↔host). It rides on the clipboard + input subsystems but needs a real chunked transfer channel underneath. v1 modeled it as an enum value only (SessionType.FILE_TRANSFER); v2 builds it for real and treats it as a core differentiator, not a deferred panel.

Three entry paths, one transfer engine:

  • Clipboard file transfer (Ctrl+C → Ctrl+V): detect CF_HDROP on the source clipboard; instead of copying bytes eagerly, place a delayed-render (virtual) clipboard promise on the destination so the transfer fires only when the user actually pastes. On paste, stream the bytes over the transfer channel and reconstruct a real CF_HDROP on the destination so Explorer/apps receive actual files. (Eager copy of a multi-GB selection on every Ctrl+C would be unusable — delayed render is the correct design and what ScreenConnect does.)
  • Drag-in (local → remote): the viewer window registers an OS drop target (IDropTarget / WM_DROPFILES); files dropped onto the session stream to the remote and a synthetic drop is delivered at the drop point on the remote desktop.
  • Drag-out (remote → local): detect a drag originating on the remote and present the local drag operation a delayed-render IDataObject that fetches bytes from the remote on drop. This is the genuinely hard case — ships after drag-in, behind the same engine.

Transfer engine (shared by all three paths):

  • New protocol messages (§3.5): FileOfferStart (name, size, sha256, count, relative path), FileChunk (offset, bytes), FileOfferComplete, FileTransferAck/Cancel, plus clipboard-files and drag signaling. Runs as its own logical stream so a large transfer never blocks video or input.
  • Chunked + backpressured (bounded in-flight window), directory-aware (relative paths preserved), sha256 integrity per file, cancellable, with resume-friendly framing.
  • Security & governance: moving files across a trust boundary is sensitive. Transfer is a per-session capability gated by policy/consent (default on for attended-with-consent; per-tenant policy for managed endpoints), fully audited (who, direction, filenames, sizes, sha256), with configurable per-tenant size/type limits. The audit lands in the events/audit trail and surfaces in the dashboard.
  • Full-stack: a transfer panel + per-file progress in the viewer; a transfer/audit view in the dashboard (RMM full-stack rule — no headless capability).
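The "chunked + backpressured (bounded in-flight window)" behavior can be sketched as a small sender-side planner. This is illustrative only: `ChunkSender` and `ChunkPlan` are hypothetical, acknowledgements are counted rather than matched to offsets, and sha256 integrity, cancellation, and resume are deliberately left out.

```rust
/// One FileChunk to send: where it starts and how many bytes it carries.
#[derive(Debug, PartialEq, Eq)]
pub struct ChunkPlan {
    pub offset: u64,
    pub len: u32,
}

/// Plans chunks for one file while capping unacknowledged chunks.
pub struct ChunkSender {
    file_size: u64,
    chunk_size: u32,
    max_in_flight: usize,
    next_offset: u64,
    in_flight: usize,
}

impl ChunkSender {
    pub fn new(file_size: u64, chunk_size: u32, max_in_flight: usize) -> Self {
        assert!(chunk_size > 0 && max_in_flight > 0);
        ChunkSender { file_size, chunk_size, max_in_flight, next_offset: 0, in_flight: 0 }
    }

    /// Next chunk to send, or None if the window is full or the file has
    /// been fully planned. None with a full window is the backpressure signal.
    pub fn next_chunk(&mut self) -> Option<ChunkPlan> {
        if self.in_flight >= self.max_in_flight || self.next_offset >= self.file_size {
            return None;
        }
        let remaining = self.file_size - self.next_offset;
        let len = remaining.min(self.chunk_size as u64) as u32;
        let plan = ChunkPlan { offset: self.next_offset, len };
        self.next_offset += len as u64;
        self.in_flight += 1;
        Some(plan)
    }

    /// Record one acknowledgement from the receiver, freeing a window slot.
    pub fn on_ack(&mut self) {
        self.in_flight = self.in_flight.saturating_sub(1);
    }

    /// True once every byte is planned and every chunk acknowledged.
    pub fn is_complete(&self) -> bool {
        self.next_offset >= self.file_size && self.in_flight == 0
    }
}
```

With a window of two, a slow receiver holds the sender at two outstanding chunks, so a multi-gigabyte transfer cannot queue unbounded data ahead of video or input.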

5. Video Pipeline Modernization

v1 ships only raw BGRA + Zstd — CPU-heavy and bandwidth-heavy. v2:

  • Default: hardware H.264/HEVC via Windows Media Foundation (transparently uses NVENC/AMF/QuickSync where present). Emits the proto's already-modeled EncodedFrame (h264/h265). Native viewer decodes via MF/D3D11.
  • Fallback: raw BGRA + Zstd + dirty-rects (the salvaged path) for Win7 or machines with no HW encoder — negotiated at session start via AgentStatus capability + SessionResponse codec selection.
  • One encode, two consumers: the same H.264 stream feeds the native viewer (WSS EncodedFrame) and, in Phase 2, the WebRTC track — so the codec investment serves both viewer surfaces.
  • Quality presets (QualitySettings auto/low/balanced/high, custom fps/bitrate) wired end-to-end with a viewer control (full-stack).
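The per-session negotiation between AgentStatus capabilities and the SessionResponse codec choice might look like the following. This is a sketch under assumptions: `AgentCaps`, `ViewerRequest`, and `negotiate` are hypothetical names, and the rule that HEVC is chosen only on explicit request follows the resolved codec decision in §10.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Codec {
    H264,
    Hevc,
    /// Raw BGRA + Zstd + dirty-rects, the universal fallback.
    RawZstd,
}

/// What the agent advertises (assumed to arrive via AgentStatus).
pub struct AgentCaps {
    pub hw_h264: bool,
    pub hw_hevc: bool,
}

/// What the viewer can decode and whether it asked for HEVC.
pub struct ViewerRequest {
    pub can_decode_h264: bool,
    pub can_decode_hevc: bool,
    pub prefer_hevc: bool,
}

/// Pick the codec the server would report back to the viewer.
pub fn negotiate(agent: &AgentCaps, viewer: &ViewerRequest) -> Codec {
    // HEVC is opt-in: both ends must support it and the viewer must ask.
    if viewer.prefer_hevc && agent.hw_hevc && viewer.can_decode_hevc {
        return Codec::Hevc;
    }
    // H.264 is the default whenever both ends can handle it.
    if agent.hw_h264 && viewer.can_decode_h264 {
        return Codec::H264;
    }
    // No usable hardware path (for example Win7): fall back to raw + Zstd.
    Codec::RawZstd
}
```

An HEVC request that cannot be honored degrades to H.264 rather than failing the session, which keeps the fallback chain monotonic.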

6. Phased Roadmap

Each phase ships full-stack (proto + agent + server + viewer + dashboard + docs + roadmap flip). Phases are sequential; each ends re-audit-clean via /gc-audit.

  • Phase 0 — Foundation reset. New module layout in-repo; salvage core crates; proto v2 (Consent, tenant fields, codec negotiation); fresh v2 schema (tenancy-ready); CI/signing already in place (keep). Archive stale docs.
  • Phase 1 — Secure session core (the heart). Rebuilt auth/session (per-agent keys, session-scoped viewer tokens, blacklist-on-WS, plane separation, single-use codes, frame caps, input throttle, real rate limiting); attended consent; native viewer end-to-end with full key fidelity (low-level hook, scan-code injection, SAS, clipboard sync); HW H.264 + raw/zstd fallback. Exit: a clean attended + unattended session with full control and zero relay CRITICALs.
  • Phase 2 — File transfer, dashboard + web viewer. File transfer — clipboard cut/paste (CF_HDROP delayed render) + drag-in, bidirectional, with progress + audit (§4.4); drag-out follows behind the same engine. Real dashboard covering the audit's orphaned surfaces (machines inventory/history/delete, releases mgmt, client-access, sessions detail); hardened web viewer (correct decoder, special-keys toolbar, Keyboard Lock); WebRTC secondary path.
  • Phase 3 — RMM integration contract. /api/integration/v1/ + GET /api/integration/capabilities + programmatic session pre-create + embedded viewer.html?embed=1 with scoped frame-ancestors allowlist (native-remote-control spec, made core). One-click launch from an RMM agent detail page (true-integration, anti-Datto).
  • Phase 4 — Multi-tenancy switch-on (deferred). Activate tenant_id enforcement; partner→client isolation + dev impersonation (RMM ADR-001 model); SSO. No schema rewrite (Phase 0 made it ready).

7. Deployment & CI (mostly keep, two real fixes)

ADR-002's Gitea Actions + Azure Trusted Signing pipeline is modern and the audit rated the workflow code good. Two fixes:

  1. Wire the real deploy — replace deploy.yml's echo stub with the actual ssh/scp deploy (deploy key in repo secrets; stop→copy→start to avoid ETXTBSY, RMM lesson). Production has been 57 commits stale because of this.
  2. Chain cargo audit into release.yml/deploy.yml (currently only on push-to-main) so a release can't ship with a new advisory.

Keep: clippy -D warnings + cargo-audit hard gates, conventional-commit versioning, git-cliff changelog + /api/changelog, fail-the-release-on-unsigned signing. Re-verify the Pluto/runner registration post any Gitea blip (audit coverage gap).

8. Security Posture (v2 closes every audit finding by design)

| v1 audit finding | v2 resolution | Section |
|------------------|---------------|---------|
| CRITICAL: any JWT joins any session | session-scoped viewer tokens + session-claim match on WS | §3.2/§3.3 |
| CRITICAL: viewer WS blacklist bypass | blacklist enforced on every JWT path incl. WS | §3.3 |
| CRITICAL: JWT accepted as agent key | plane separation; per-agent keys; JWT branch deleted | §3.3 |
| HIGH: reusable support codes | single-use atomic consume + rate-limited validate + wider entropy | §3.2 |
| HIGH: no WS frame cap (OOM) | explicit max message/frame size; reject before broadcast | §3.2 |
| HIGH: rate limiting disabled/non-compiling | rebuilt working rate limiting on auth/validate | §3.1 |
| HIGH: admin password logged on fallback | never route secrets through the logger | CODING_GUIDELINES |
| MEDIUM: no input throttle/consent gate | bounded input queue + attended consent prompt | §3.2/§4.2 |
| shared AGENT_API_KEY anti-pattern | per-agent hashed keys (RMM Option 3) | §3.3 |
| flat schema, no tenancy | tenancy-ready tenant_id from day one | §3.3 |
| broken web protobuf decoder | correct decoder; native-first | §4.3 |
| stale/unsigned production deploy | real deploy step + enforced signing | §7 |

9. Adopted GuruRMM Principles (provenance)

  • Per-agent enrollment keys, no installer shared secret (RMM Option 3 / DESIGN.md) → §3.3.
  • No TOML/config for endpoints — relay URL compiled in; identity in registry/baked at generation; server secrets in env/SOPS only (RMM permanent rule) → agent constraint, stated in v2 CLAUDE.md.
  • Living roadmap = definition-of-done (DESIGN.md) → every capability flips a dated roadmap box in the same commit; /gc-audit is the backstop.
  • Holistic full-stack features (DESIGN.md) → no headless capabilities; each proto message ships its UI (§3.5/§6).
  • Standalone works with zero upstream dependency (RMM "works without AI", GC ADR-001) → §1, Decision 3.
  • Versioned integration contract + capability discovery (RMM ADR-008 / GC ADR-001) → §6 Phase 3.
  • True integration, anti-Datto (RMM roadmap design principles) → one-click RMM→GC launch, embedded viewer, dual-side audit (§6 Phase 3).
  • Multi-tenancy partner/client model (RMM ADR-001) → designed-for in §3.3, activated Phase 4.
  • Error/sqlx/auth conventions (CODING_GUIDELINES) → anyhow/thiserror/tracing, runtime sqlx, idempotent migrations, JWT/Argon2, hashed short-lived tokens.

10. Resolved Decisions (2026-05-29, Mike)

  1. Repo: clean architectural reset in-place in guru-connect — keeps git history, CI, proto, and Gitea issues. v1 code is removed/relocated as v2 modules land.
  2. Codec default: H.264 (universal decode, low-latency tooling) is the default; HEVC is opt-in per session for bandwidth-constrained links.
  3. Web viewer transport (Phase 2): protobuf-over-WSS first (reuses the agent's encode path), with WebRTC added later as the latency upgrade.
  4. Support-code format: widen to a higher-entropy, human-readable code + rate-limit/lockout on the validate endpoint — replaces the brute-forceable 6-digit numeric.
  5. v1 cutover: clean wholesale replacement. No real clients on the instance, so v2 replaces v1 at connect.azcomputerguru.com once Phase 1 is live — no data migration; v1 is archived, not migrated.
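To make decision 4 concrete: a 6-digit numeric code carries about 20 bits of entropy (log2 of 10^6), while eight symbols from a 32-character alphabet carry 40. The sketch below is one hypothetical format, not the one the spec has chosen: `format_code` and the XXXX-XXXX grouping are assumptions, the alphabet drops I, L, O, and U to avoid misreads, and the caller is assumed to supply five bytes from a cryptographically secure random source.

```rust
/// 32 symbols: digits plus uppercase letters without I, L, O, U.
const ALPHABET: &[u8; 32] = b"0123456789ABCDEFGHJKMNPQRSTVWXYZ";

/// Encode 40 random bits as an 8-symbol code grouped XXXX-XXXX.
/// The bytes must come from a CSPRNG; this function adds no randomness.
pub fn format_code(random: &[u8; 5]) -> String {
    // Pack the five bytes into the low 40 bits of a u64, big-endian.
    let mut bits: u64 = 0;
    for &b in random {
        bits = (bits << 8) | b as u64;
    }
    let mut out = String::with_capacity(9);
    for i in 0..8 {
        if i == 4 {
            out.push('-');
        }
        // Take 5 bits at a time, most significant group first.
        let shift = 35 - 5 * i;
        let idx = ((bits >> shift) & 0x1F) as usize;
        out.push(ALPHABET[idx] as char);
    }
    out
}
```

Wider codes only help alongside the rate-limit/lockout on the validate endpoint named in the same decision; the sketch covers the format half only.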

Next Steps

  1. Confirm the resolved decisions in §10 (esp. repo reset in-place, H.264 default, cutover).
  2. Flip the v1 FEATURE_ROADMAP.md to a v2 structure mirroring these phases (or supersede it).
  3. /shape-spec Phase 1 (the secure session core) into an implementable plan.
  4. Begin Phase 0 foundation reset.