Ground-up v2 re-architecture decided 2026-05-29 (Mike), grounded in the 2026-05-29 audit + adopted GuruRMM design principles. Greenfield salvaging proven Rust cores (DXGI/GDI capture, input injection, SAS helper, prost codec, CI). Native-first full key fidelity (Win+R/Ctrl+Alt+Del) + bidirectional file transfer (clipboard cut/paste + drag-and-drop) as headline differentiators; WebRTC fallback only. Hardened single-tenant, tenancy-ready schema. Standalone-first + /api/integration/v1 RMM contract. Closes all audit CRITICALs by design. Open decisions resolved: in-place repo reset, H.264 default, WSS-first web transport, widened support codes, clean v1 cutover (no client migration). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
SPEC-002: GuruConnect v2 — Modernization Architecture
Status: Approved — v2 direction + open decisions locked 2026-05-29; phases pending shape-spec
Priority: P1 (foundational — supersedes the v1 architecture)
Requested By: Mike Swanson (2026-05-29)
Estimated Effort: X-Large (multi-month, phased)
Supersedes: the v1 status docs (PHASE1_*, DEPLOYMENT_*, WEEK1_*, GAP_ANALYSIS.md) — archive on adoption
Reference material: reports/2026-05-29-gc-audit.md (the v1 audit), GuruRMM docs/DESIGN.md + docs/ARCHITECTURE_DECISIONS.md, .claude/CODING_GUIDELINES.md, SPEC-001, specs/native-remote-control/, ADR-001, ADR-002
Why v2
v1 was built with an older model and the seams show. The 2026-05-29 audit found the product's threat surface — the remote-control relay plane — carries three independent CRITICAL auth failures (any-JWT session hijack, viewer-WS revocation bypass, JWT-accepted-as-agent-key), the web viewer's "protobuf" decoder is a fabricated layout that cannot decode a single real frame, and production has been running 57 commits stale on a likely-unsigned agent because the deploy step is a stub. At the same time, the audit confirmed the hardware-facing Rust is sound: DXGI/GDI capture, input injection, and the prost wire codec are correct. So this is not a rewrite born of failure — it's a deliberate reset that keeps the hard-won Windows-internals code and rebuilds everything above it to be secure, full-fidelity, modern, and aligned with GuruRMM's design discipline.
The four decisions (set by the product owner, 2026-05-29)
| # | Decision | Choice | Consequence |
|---|---|---|---|
| 1 | Rewrite scope | Greenfield, salvage cores | Clean architecture; reuse only proven Rust (capture, input, SAS, prost codec, proto, CI). Rebuild relay/auth, session, viewer, dashboard, deploy. |
| 2 | Transport + viewer | Native-first on custom protobuf-over-WSS; WebRTC secondary | Full key fidelity (Win+R, Ctrl+C/V, Ctrl+Alt+Del) is non-negotiable and browsers can't deliver it — see §4. Web/WebRTC is the no-install fallback with documented limits. |
| 3 | RMM relationship | Standalone-first + versioned contract | GC ships/sells independently; RMM integrates via the semver'd /api/integration/v1/ contract (ADR-001, native-remote-control spec). |
| 4 | Auth + tenancy | Hardened single-tenant now, multi-tenancy-ready schema | Fix the relay CRITICALs by design; carry nullable tenant_id from day one so the RMM partner/client model switches on later with no migration rewrite. |
What makes v2 "considerably better"
- Secure by design — the relay plane is rebuilt around per-agent keys, per-session viewer authorization, and blacklist-checked tokens; the v1 CRITICALs are structurally impossible.
- Full-fidelity input — native viewer captures and forwards every key combo including the Windows key and Ctrl+Alt+Del (§4). This is the headline product differentiator and the owner's stated priority.
- Modern video — hardware H.264/HEVC encode (Media Foundation/NVENC/QuickSync) replaces raw-BGRA+Zstd as the default, with raw+Zstd kept as the universal fallback (§5).
- Complete UI — every capability ships its dashboard + viewer surface (RMM's full-stack rule); the audit's pile of orphaned APIs (machines, releases, client-access) gets real UI.
- Clean RMM integration: the `/api/integration/v1/` contract + embedded viewer is designed in from day one, not retrofitted.
- Operationally honest: real deploy, enforced signing, living-roadmap definition-of-done (already established by SPEC-001/ADR-002; v2 keeps it).
1. Scope
In scope (v2 foundation, Phases 0-3)
Clean re-architecture of agent, relay server, session/auth model, native viewer, dashboard, and protocol; hardware video codecs; the secure session lifecycle; the standalone product end-to-end; the RMM integration contract surface.
Explicitly out of scope (deferred, not cancelled)
- Full multi-tenancy activation (partner→client isolation, dev impersonation) — schema is ready in v2; the model is switched on in a later phase (§6, Phase 4).
- WebRTC as primary — secondary/fallback only; the no-install web viewer ships in Phase 2 but native is the supported full-fidelity path.
- Session recording, BACKSTAGE mode — modeled in the proto, deferred to post-foundation (each ships full-stack when prioritized). (File transfer via clipboard cut/paste + drag-and-drop is NOT deferred — it is a named core differentiator, §4.4.)
- macOS/Linux agents — Windows-first remains (REQUIREMENTS.md); the server/dashboard are cross-platform-clean but the agent is Windows.
Success criteria
- A technician starts an attended session via a support code, the end user sees and accepts a consent prompt, and the technician has full keyboard/mouse control including Win+R, Ctrl+C/V across the clipboard, and Ctrl+Alt+Del — with zero relay-plane auth holes (re-audited clean).
- A technician can cut/copy a file on either side and paste it on the other, and drag files into and out of the session window (guest↔host), with chunked transfer, sha256 integrity, progress, and an audit trail (§4.4).
- An unattended/managed agent enrolls with a per-agent key (no shared secret), and a revoked key or revoked viewer token is rejected on the WebSocket, not just on REST.
- Video defaults to hardware H.264 with automatic fallback to raw+Zstd on unsupported hardware (Win7), negotiated per session.
- RMM can pre-create a session and embed the viewer via `/api/integration/v1/` without GC granting RMM any standing credential.
2. Salvage / Scrap Ledger
The audit is the authority for what's proven. Greenfield architecture, but these cores move over largely intact.
SALVAGE (audit-confirmed correct — reuse, don't rewrite)
| Component | Path (v1) | Notes |
|---|---|---|
| DXGI screen capture | agent/src/capture/dxgi.rs | Primary capture path. Wrap in new module structure. |
| GDI capture fallback | agent/src/capture/gdi.rs | Win7 / no-DXGI fallback. Keep. |
| Multi-display capture | agent/src/capture/display.rs | Feeds the proto SwitchDisplay capability (finish full-stack). |
| Input injection | agent/src/input/{mouse,keyboard}.rs | Keep; extend for scan-code + extended-key fidelity (§4). |
| SAS helper (Ctrl+Alt+Del) | agent/src/bin/sas_service.rs | The privileged SendSAS path. Keep; it is the only way to deliver Ctrl+Alt+Del. |
| Protocol schema | proto/guruconnect.proto | Well-modeled (§3). Extend (Consent, tenant fields); the prost wire layer is correct. |
| Correct protobuf parser | server/static/viewer.html:196-489 | Real varint/length-delimited parser; reference (and interim web viewer) until the v2 web viewer lands. |
| CI/CD + signing | .gitea/workflows/{build-and-test,release}.yml, ADR-002, SPEC-001 | Modern, gated, hard-fail clippy+audit, Azure Trusted Signing. Keep wholesale. |
| Version embedding | agent/build.rs | Keep. |
| DB conventions | migrations style, UUID PKs, TIMESTAMPTZ | Keep; new schema, same conventions. |
SCRAP / REBUILD (audit found broken or structurally unsafe)
| Component | Path (v1) | Why |
|---|---|---|
| Relay auth | server/src/relay/mod.rs:224-288 | The 3 CRITICALs. Rebuild the entire auth model (§3.3). |
| Rate limiting | server/src/middleware/rate_limit.rs | Non-compiling, disabled. Rebuild. |
| Web "protobuf" codec | dashboard/src/lib/protobuf.ts | Fabricated fixed-offset layout, wire-incompatible. Delete; rebuild web viewer (§4.3). |
| React component lib | dashboard/src/ | Dead/divergent. Rebuild as a real dashboard (§6, Phase 2) covering the orphaned APIs. |
| Deploy step | .gitea/workflows/deploy.yml:71-78 | Echo stub. Wire real SSH deploy (§7). |
| Flat session/machine model | server/src/{session,db}/ | No per-agent keys, no per-session authz, no tenant column. Rebuild (§3). |
| Shared AGENT_API_KEY | server/src/auth/, CLAUDE.md | The anti-pattern RMM eliminated. Replace with per-agent keys (§3.3). |
ARCHIVE (stale v1 status docs — move to docs/archive/v1/)
PHASE1_*.md, PHASE2_*.md, DEPLOYMENT_*.md, WEEK1_*.md, GAP_ANALYSIS.md, MASTER_ACTION_PLAN.md, CHECKPOINT_*.md, PROJECT_OVERVIEW.md (2026-01-17, stale). v2 tracks state only in docs/FEATURE_ROADMAP.md + docs/ARCHITECTURE_DECISIONS.md + docs/specs/ + TECHNICAL_DEBT.md (RMM living-roadmap discipline). The SEC*_AUDIT.md files stay as point-in-time security history.
3. Architecture
3.1 Repo & module structure (greenfield layout, RMM-aligned)
Decision (2026-05-29): clean architectural reset in the existing guru-connect repo (preserve git history, the good CI, the proto, the Gitea issues) rather than a brand-new repo — "greenfield" applies to the architecture, not the repo identity. Old code is removed/relocated as the v2 modules land; salvaged crates move into the new layout.
guru-connect/
├── agent/ # Windows agent + native viewer (single static .exe)
│ ├── src/
│ │ ├── capture/ # [salvage] dxgi, gdi, display
│ │ ├── input/ # [salvage+extend] scan-code injection, extended keys, modifier hygiene
│ │ ├── encoder/ # [rebuild] HW H.264/HEVC (Media Foundation) + raw/zstd fallback
│ │ ├── viewer/ # [rebuild] native viewer: low-level kbd hook, decode, render
│ │ ├── transport/ # [salvage codec, rebuild auth handshake] prost over WSS
│ │ ├── session/ # [rebuild] session state, consent, reconnect
│ │ ├── clipboard/ # [new] bidirectional clipboard sync (text/html/image)
│ │ ├── filetransfer/ # [new] CF_HDROP cut/paste + drag in/out, chunked engine (§4.4)
│ │ ├── tray/ install.rs # [salvage] tray, guruconnect:// handler, auto-install
│ │ └── bin/sas_service.rs # [salvage] Ctrl+Alt+Del SYSTEM helper
│ └── build.rs # [salvage] version embed
├── server/ # Linux relay (Axum + PostgreSQL/sqlx)
│ ├── src/
│ │ ├── relay/ # [rebuild] secure agent/viewer WS handlers, frame caps
│ │ ├── session/ # [rebuild] session lifecycle, per-session authz
│ │ ├── auth/ # [rebuild] per-agent keys, session-scoped viewer tokens, blacklist
│ │ ├── api/ # [rebuild] REST: machines, sessions, codes, releases, users, changelog
│ │ ├── integration/ # [new] /api/integration/v1/ contract (RMM broker)
│ │ ├── middleware/ # [rebuild] working rate limiting, security headers, framing allowlist
│ │ └── db/ # [rebuild] schema with tenant_id, agent keys
│ ├── migrations/ # fresh v2 schema (idempotent)
│ └── static/ # interim hardened web viewer (until SPA)
├── dashboard/ # [rebuild] real web dashboard (machines/sessions/codes/releases UI)
├── proto/guruconnect.proto # [salvage+extend]
└── docs/ (specs, ADRs, roadmap) · reports/ · TECHNICAL_DEBT.md
3.2 Session model (rebuilt — the secure core)
Two session origins, one secure lifecycle:
- Attended (support code): the technician generates a code → the end user runs the lightweight one-time agent (code baked in, user-space, no admin, self-deleting) → the agent connects, and the server shows the technician the pending session only after the end user accepts a consent prompt on their machine (new `ConsentRequest`/`ConsentResponse`, §3.5).
- Unattended (managed agent): persistent agent enrolled with a per-agent key; the technician opens a session from the dashboard; consent policy is per-tenant (silent for managed endpoints, or notify-only).
Session security invariants (each fixes an audit CRITICAL/HIGH):
- Per-session viewer authorization. A viewer connects with a session-scoped, short-lived JWT minted only by an authenticated+authorized request for that session/machine. The WS handler verifies signature + expiry + blacklist + that the session claim matches the requested `session_id`. (Fixes "any JWT joins any session" + "blacklist bypass.")
- Plane separation. Agent auth = per-agent key or support code, never a user JWT. The `validate_agent_api_key` JWT branch is deleted. (Fixes "JWT-as-agent-key.")
- Single-use codes. A support code is consumed atomically on first agent bind; a second presenter is rejected; the validate endpoint is rate-limited and the code space widened (high-entropy, human-readable). (Fixes "reusable support codes.")
- Bounded relay. Explicit `max_message_size`/`max_frame_size` on both WS upgrades; oversized frames rejected before broadcast. Input events rate-limited server-side with a bounded queue. (Fixes "no frame cap / no input throttle.")
- Reconcile on restart. Managed sessions persist in DB and reconcile on server restart so they aren't orphaned (native-remote-control prior art).
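A minimal sketch of the WebSocket-side checks behind the per-session viewer authorization invariant follows. It assumes the JWT signature has already been verified by whatever JWT library the server uses and the claims decoded. `ViewerClaims`, `RejectReason` and `verify_viewer_token` are hypothetical names. The blacklist is modeled as an in-memory set of revoked token ids (`jti`) rather than the real store.

```rust
use std::collections::HashSet;

/// Hypothetical decoded claims of a session-scoped viewer token.
/// Signature verification is assumed to have happened before this point.
#[derive(Debug, Clone)]
pub struct ViewerClaims {
    pub jti: String,        // token id, looked up in the blacklist
    pub session_id: String, // the single session this token may join
    pub exp: u64,           // expiry, seconds since the Unix epoch
}

#[derive(Debug, PartialEq, Eq)]
pub enum RejectReason {
    Expired,
    Revoked,
    WrongSession,
}

/// Runs on every viewer WebSocket upgrade, not only on REST.
pub fn verify_viewer_token(
    claims: &ViewerClaims,
    requested_session_id: &str,
    revoked_jtis: &HashSet<String>,
    now_unix: u64,
) -> Result<(), RejectReason> {
    // Expiry: viewer tokens live for minutes, so anything at or past `exp` is refused.
    if now_unix >= claims.exp {
        return Err(RejectReason::Expired);
    }
    // Blacklist: a logged-out or revoked token is refused on the WS path too.
    if revoked_jtis.contains(&claims.jti) {
        return Err(RejectReason::Revoked);
    }
    // Session-claim match: a valid token for session A cannot join session B.
    if claims.session_id != requested_session_id {
        return Err(RejectReason::WrongSession);
    }
    Ok(())
}
```

The check is a pure function of four inputs: the claims, the requested `session_id`, the blacklist and the clock. That means one function can back both the REST path and the WS upgrade, which closes the divergence the v1 audit found.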
3.3 Auth model (rebuilt — hardened single-tenant, tenancy-ready)
| Plane | Credential | Verification |
|---|---|---|
| Dashboard user | JWT (Argon2id login), blacklist on logout | full verify on every request and every WS |
| Viewer (per session) | session-scoped short-lived JWT (~5 min) | signature + expiry + blacklist + session-claim match |
| Agent (managed) | per-agent hashed key (cak_…), connect_agent_keys(revoked_at) | hash compare; revocation immediate |
| Agent (attended) | single-use support code | atomic consume; rate-limited validate |
| Server↔server (RMM) | CONNECT_INTEGRATION_KEY (env/SOPS) | on /api/integration/v1/* only |
Tenancy-ready, not tenant-yet: every tenant-scoped table (machines, sessions, support_codes, events, connect_agent_keys, users) carries a nullable tenant_id defaulting to a single bootstrap tenant. Queries route through a tenancy helper that today resolves to the default tenant. Switching on multi-tenancy (Phase 4) = populate tenant_id, flip the helper to enforce WHERE tenant_id = :current, add the partner/client impersonation layer (RMM ADR-001 model) — no schema rewrite. This honors "hardened single-tenant first" while refusing to paint into a corner.
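The tenancy helper can be sketched under stated assumptions. `TenancyMode`, `resolve_tenant`, `tenant_predicate` and the bootstrap UUID are illustrative, not names from the codebase. Today every call resolves to the bootstrap tenant. Flipping the mode is the Phase 4 switch, and callers keep binding the returned id as a query parameter rather than interpolating it.

```rust
/// Hypothetical tenancy mode: single-tenant through Phase 3, enforced in Phase 4.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TenancyMode {
    SingleTenant,
    Enforced,
}

/// Placeholder id for the bootstrap tenant (an assumption, not from the spec).
pub const DEFAULT_TENANT: &str = "00000000-0000-0000-0000-000000000001";

/// Resolves which tenant a query is scoped to.
pub fn resolve_tenant(mode: TenancyMode, caller_tenant: Option<&str>) -> Result<String, String> {
    match mode {
        // Today: every request resolves to the bootstrap tenant.
        TenancyMode::SingleTenant => Ok(DEFAULT_TENANT.to_string()),
        // Phase 4: the caller must carry a tenant; there is no silent fallback.
        TenancyMode::Enforced => caller_tenant
            .map(str::to_string)
            .ok_or_else(|| "request has no tenant context".to_string()),
    }
}

/// Tenant predicate appended to tenant-scoped queries, using a bind parameter.
pub fn tenant_predicate(param_index: usize) -> String {
    format!("tenant_id = ${param_index}")
}
```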
3.4 Database schema (fresh v2 migrations, conventions per RMM/CODING_GUIDELINES)
Idempotent migrations, UUID PKs, TIMESTAMPTZ, soft-delete, FK ON DELETE CASCADE. Key new/changed tables: connect_agent_keys (per-agent hashed keys + revoked_at), tenant_id columns everywhere, sessions gains is_managed/source/consent_state/tenant_id, support_codes gains single-use consume semantics. Runtime sqlx::query() for new queries (CODING_GUIDELINES), server applies migrations on startup, _sqlx_migrations respected.
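The single-use consume semantics reduce to one check-and-set that cannot interleave. In PostgreSQL that would be a single conditional update such as `UPDATE support_codes SET consumed_at = now() WHERE code = $1 AND consumed_at IS NULL RETURNING id`. This assumes a nullable `consumed_at` column, which is not specified above. Zero rows returned means the code was unknown or already used. The in-memory model below uses a lock to play the role of that statement. `SupportCodes`, `CodeState` and `consume` are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum CodeState {
    Pending,
    Consumed { agent_id: String },
}

#[derive(Debug, PartialEq, Eq)]
pub enum ConsumeError {
    Unknown,
    AlreadyConsumed,
}

/// In-memory stand-in for the support_codes table.
pub struct SupportCodes {
    codes: Mutex<HashMap<String, CodeState>>,
}

impl SupportCodes {
    pub fn new() -> Self {
        Self { codes: Mutex::new(HashMap::new()) }
    }

    pub fn issue(&self, code: &str) {
        self.codes.lock().unwrap().insert(code.to_string(), CodeState::Pending);
    }

    /// Check and mark under one lock, so exactly one presenter can win.
    pub fn consume(&self, code: &str, agent_id: &str) -> Result<(), ConsumeError> {
        let mut codes = self.codes.lock().unwrap();
        match codes.get_mut(code) {
            None => Err(ConsumeError::Unknown),
            Some(CodeState::Consumed { .. }) => Err(ConsumeError::AlreadyConsumed),
            Some(state) => {
                *state = CodeState::Consumed { agent_id: agent_id.to_string() };
                Ok(())
            }
        }
    }
}
```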
3.5 Protocol v2 (proto/guruconnect.proto — salvage + extend)
The v1 proto already models video (raw/vp9/h264/h265 + dirty-rects), full input (VK/scan/unicode + modifiers, SpecialKey for Ctrl+Alt+Del/Lock/PrintScreen), cursor, clipboard (text/html/rtf/image/files), quality presets + codec preference, multi-display, chat, heartbeat, admin, and auto-update. Additions for v2:
- `ConsentRequest`/`ConsentResponse`: attended-mode consent (the missing trust primitive).
- Optional `tenant_id` on `SessionRequest`/`AgentStatus` for the tenancy-ready model.
- Negotiated codec in `SessionResponse` (the server tells the viewer which encoding to expect; the agent advertises HW-encode capability in `AgentStatus`).
- File-transfer messages: `FileOfferStart`/`FileChunk`/`FileOfferComplete`/`FileTransferAck`/`FileTransferCancel`, plus clipboard-files (`CF_HDROP`) and drag signaling (`DragEnter`/`DragDrop`). These promote the `FILE_TRANSFER` enum to a real subsystem (§4.4), carried on its own logical stream.
- Wire the already-modeled-but-undispatched variants (`SwitchDisplay`, `QualitySettings`, `UpdateStatus`) end-to-end (RMM full-stack rule: no headless capabilities).
4. Full-Fidelity Input, Clipboard & File Transfer — the headline features (§ owner priority)
Browsers cannot deliver full keyboard control; this is why native is primary and WebRTC is secondary. v2 treats input fidelity, clipboard, and file transfer as first-class, interlocking subsystems — the owner specifically named keyboard hooks and cut/paste + drag files from/to either guest or host as favorite ScreenConnect behaviors to match.
4.1 Native viewer capture (the technician side)
- Low-level keyboard hook (`WH_KEYBOARD_LL`) installed while the viewer is focused/in control. The viewer intercepts system combos (Win/Meta key, Win+R, Win+E, Alt+Tab, Ctrl+Esc) before the local shell consumes them, and forwards them as `KeyEvent` (VK + scan code + extended flag) instead of letting the local OS act.
- A "send system keys to remote" toggle (default on when the viewer is in fullscreen/focused control), so the hook only diverts combos during active control.
- Modifier-state hygiene: track modifier up/down and re-sync on focus loss (release stuck modifiers) — eliminates the classic "remote thinks Ctrl is still held" bug.
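Modifier-state hygiene on the viewer side can be sketched minimally. `ModifierTracker`, `Modifier` and `KeyUp` are hypothetical. The real viewer would feed the tracker from the `WH_KEYBOARD_LL` callback and send each returned key-up as a `KeyEvent` when the window loses focus or control is released.

```rust
use std::collections::BTreeSet;

/// Modifier keys the viewer tracks (hypothetical naming).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Modifier { LCtrl, RCtrl, LShift, RShift, LAlt, RAlt, LWin, RWin }

/// A synthetic key-up the viewer forwards to the remote.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct KeyUp(pub Modifier);

#[derive(Default)]
pub struct ModifierTracker {
    held: BTreeSet<Modifier>, // ordered set gives a deterministic release order
}

impl ModifierTracker {
    pub fn key_down(&mut self, m: Modifier) {
        self.held.insert(m);
    }

    pub fn key_up(&mut self, m: Modifier) {
        self.held.remove(&m);
    }

    /// On focus loss, release every modifier still held so the remote
    /// never believes Ctrl (or Win, Alt, Shift) is stuck down.
    pub fn release_all(&mut self) -> Vec<KeyUp> {
        let ups: Vec<KeyUp> = self.held.iter().copied().map(KeyUp).collect();
        self.held.clear();
        ups
    }
}
```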
4.2 Agent injection (the remote side)
- Scan-code injection (`SendInput` with `KEYEVENTF_SCANCODE`, and the correct `KEYEVENTF_EXTENDEDKEY` for right-Ctrl/Alt, arrows, Win, Insert/Home/etc.). It is layout-independent and works in games and secure-desktop-adjacent apps where VK injection fails.
- Ctrl+Alt+Del → `SpecialKeyEvent.CTRL_ALT_DEL` → the salvaged SAS service (`SendSAS`, runs as SYSTEM, requires the `SoftwareSASGeneration` policy, which the managed installer sets). This is the only reliable path and a key differentiator vs browser tools.
- Clipboard sync (the `Ctrl+C`/`Ctrl+V` value): bidirectional `ClipboardData` sync of text/html/image (Phase 1) and of files via `CF_HDROP` (the file-transfer subsystem, §4.4), so a copy on either side is available on the other. The keystrokes inject, and the clipboard content actually transfers. Full-stack: a clipboard-activity indicator in the viewer.
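Here is a sketch of the flag arithmetic behind scan-code injection. The three `KEYEVENTF_*` values are the Win32 SDK constants. `KeybdFields`, `ScanKey`, `key_fields` and `chord` are hypothetical. `KeybdFields` only mirrors the relevant `KEYBDINPUT` fields, and the actual `SendInput` call (for example through the `windows` crate) is deliberately left out. Scan codes are set-1 values without the `E0` prefix, which `KEYEVENTF_EXTENDEDKEY` stands in for.

```rust
// Flag values as defined in the Win32 SDK (winuser.h).
pub const KEYEVENTF_EXTENDEDKEY: u32 = 0x0001;
pub const KEYEVENTF_KEYUP: u32 = 0x0002;
pub const KEYEVENTF_SCANCODE: u32 = 0x0008;

/// Mirrors the KEYBDINPUT fields that matter here (time and extra info omitted).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct KeybdFields {
    pub w_vk: u16,    // 0: with KEYEVENTF_SCANCODE the virtual-key code is ignored
    pub w_scan: u16,  // set-1 scan code without the E0 prefix
    pub dw_flags: u32,
}

#[derive(Debug, Clone, Copy)]
pub struct ScanKey {
    pub scan: u16,
    pub extended: bool, // true for E0-prefixed keys: right Ctrl/Alt, arrows, Win, Insert/Home/...
}

pub const LWIN: ScanKey = ScanKey { scan: 0x5B, extended: true };
pub const KEY_R: ScanKey = ScanKey { scan: 0x13, extended: false };

pub fn key_fields(key: ScanKey, key_up: bool) -> KeybdFields {
    let mut flags = KEYEVENTF_SCANCODE;
    if key.extended {
        flags |= KEYEVENTF_EXTENDEDKEY;
    }
    if key_up {
        flags |= KEYEVENTF_KEYUP;
    }
    KeybdFields { w_vk: 0, w_scan: key.scan, dw_flags: flags }
}

/// Press modifiers in order, tap the final key, then release modifiers in reverse.
pub fn chord(modifiers: &[ScanKey], key: ScanKey) -> Vec<KeybdFields> {
    let mut seq: Vec<KeybdFields> = modifiers.iter().map(|&m| key_fields(m, false)).collect();
    seq.push(key_fields(key, false));
    seq.push(key_fields(key, true));
    seq.extend(modifiers.iter().rev().map(|&m| key_fields(m, true)));
    seq
}
```

For Win+R, `chord(&[LWIN], KEY_R)` yields four entries: Win down (extended), R down, R up, Win up (extended + key-up).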
4.3 Web/WebRTC secondary viewer (the no-install fallback) — honest limits
- Uses the Keyboard Lock API (`navigator.keyboard.lock()`) + Fullscreen + Pointer Lock to capture as many keys as the browser allows (Escape, Tab, most combos).
- Cannot deliver Ctrl+Alt+Del (browser-impossible) or, reliably, the Meta/Win key. Compensates with an on-screen special-keys toolbar (Ctrl+Alt+Del button → `SpecialKeyEvent`; Win, Win+R, Alt+Tab macros). This is the same idea the v1 `SessionControls` gestured at, done properly.
- Transport: either the same protobuf-over-WSS (with a correct decoder ported from `viewer.html`, not the fabricated one) or a WebRTC track fed by the agent's H.264 encoder (§5) for lower latency. Resolved (§10): protobuf-over-WSS is the first cut, with WebRTC added later as the latency upgrade.
- Clearly labeled in-product as the quick-access path; the native viewer is the full-control path.
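The toolbar mapping can be sketched as a pure function from button to outbound message. The enum names are hypothetical stand-ins for the proto types, and the scan codes are set-1 values. The sketch shows one routing distinction: Ctrl+Alt+Del becomes a `SpecialKeyEvent` handled by the agent's SAS helper, while the other buttons are ordinary chords the agent injects as scan codes.

```rust
/// Stand-in for the proto's SpecialKey enum (only the variant used here).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SpecialKey {
    CtrlAltDel,
}

/// (scan code, extended) pair, set-1 values.
pub type Scan = (u16, bool);
const LWIN: Scan = (0x5B, true);
const LALT: Scan = (0x38, false);
const KEY_R: Scan = (0x13, false);
const TAB: Scan = (0x0F, false);

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Outbound {
    /// Routed to the SAS helper on the agent; never synthesized as key events.
    Special(SpecialKey),
    /// Modifiers held while `key` is tapped; the agent expands this to down/up events.
    Chord { modifiers: Vec<Scan>, key: Scan },
}

#[derive(Debug, Clone, Copy)]
pub enum ToolbarButton {
    CtrlAltDel,
    Win,
    WinR,
    AltTab,
}

pub fn toolbar_action(button: ToolbarButton) -> Outbound {
    match button {
        ToolbarButton::CtrlAltDel => Outbound::Special(SpecialKey::CtrlAltDel),
        ToolbarButton::Win => Outbound::Chord { modifiers: vec![], key: LWIN },
        ToolbarButton::WinR => Outbound::Chord { modifiers: vec![LWIN], key: KEY_R },
        ToolbarButton::AltTab => Outbound::Chord { modifiers: vec![LALT], key: TAB },
    }
}
```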
4.4 File Transfer — clipboard cut/paste + drag-and-drop, bidirectional (§ owner favorite)
A named must-have: the ScreenConnect behavior where you cut/copy a file and paste it across, or drag a file into or out of the session window, in either direction (guest↔host). It rides on the clipboard + input subsystems but needs a real chunked transfer channel underneath. v1 modeled it as an enum value only (SessionType.FILE_TRANSFER); v2 builds it for real and treats it as a core differentiator, not a deferred panel.
Two entry paths, one transfer engine:
- Clipboard file transfer (Ctrl+C → Ctrl+V): detect `CF_HDROP` on the source clipboard. Instead of copying bytes eagerly, place a delayed-render (virtual) clipboard promise on the destination so the transfer fires only when the user actually pastes. On paste, stream the bytes over the transfer channel and reconstruct a real `CF_HDROP` on the destination so Explorer/apps receive actual files. (Eager copy of a multi-GB selection on every Ctrl+C would be unusable; delayed render is the correct design and what ScreenConnect does.)
- Drag-in (local → remote): the viewer window registers an OS drop target (`IDropTarget`/`WM_DROPFILES`). Files dropped onto the session stream to the remote, and a synthetic drop is delivered at the drop point on the remote desktop.
- Drag-out (remote → local): detect a drag originating on the remote and present the local drag operation with a delayed-render `IDataObject` that fetches bytes from the remote on drop. This is the genuinely hard case; it ships after drag-in, behind the same engine.
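The delayed-render idea reduces to a small core: copying creates a promise that holds only metadata, and bytes move the first time something pastes. This is a sketch. `FileOffer`, `FilePromise` and the `fetch` closure are hypothetical. On Windows the promise would be backed by an `IDataObject` that renders on demand, which this std-only sketch does not model.

```rust
use std::cell::OnceCell;

/// Metadata advertised when the source side sees CF_HDROP (no bytes yet).
#[derive(Debug, Clone)]
pub struct FileOffer {
    pub name: String,
    pub size: u64,
}

/// A delayed-render promise placed on the destination clipboard.
pub struct FilePromise<F: Fn(&FileOffer) -> Vec<u8>> {
    offer: FileOffer,
    fetch: F,                 // hypothetical: pulls bytes over the transfer channel
    bytes: OnceCell<Vec<u8>>, // filled on the first paste only
}

impl<F: Fn(&FileOffer) -> Vec<u8>> FilePromise<F> {
    /// Called on copy: records the offer, transfers nothing.
    pub fn new(offer: FileOffer, fetch: F) -> Self {
        Self { offer, fetch, bytes: OnceCell::new() }
    }

    /// Called on paste: the only place bytes move; later pastes reuse them.
    pub fn paste(&self) -> &[u8] {
        self.bytes.get_or_init(|| (self.fetch)(&self.offer))
    }

    pub fn is_rendered(&self) -> bool {
        self.bytes.get().is_some()
    }
}
```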
Transfer engine (shared by all three paths):
- New protocol messages (§3.5): `FileOfferStart` (name, size, sha256, count, relative path), `FileChunk` (offset, bytes), `FileOfferComplete`, `FileTransferAck`/`Cancel`, plus clipboard-files and drag signaling. Runs as its own logical stream so a large transfer never blocks video or input.
- Chunked + backpressured (bounded in-flight window), directory-aware (relative paths preserved), sha256 integrity per file, cancellable, with resume-friendly framing.
- Security & governance: moving files across a trust boundary is sensitive. Transfer is a per-session capability gated by policy/consent (default on for attended-with-consent; per-tenant policy for managed endpoints). It is fully audited (who, direction, filenames, sizes, sha256), with configurable per-tenant size/type limits. The audit lands in the `events`/audit trail and surfaces in the dashboard.
- Full-stack: a transfer panel + per-file progress in the viewer; a transfer/audit view in the dashboard (RMM full-stack rule: no headless capability).
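Here is a sketch of the sender side of the bounded in-flight window. `ChunkSender` and `Chunk` are hypothetical, and sha256, cancellation and the wire encoding of `FileChunk` are left out. The sender stops producing chunks once `window` offsets are unacknowledged. That limit is what keeps a large transfer from flooding the relay or starving the video and input streams.

```rust
use std::collections::VecDeque;

/// One chunk on the transfer stream (mirrors the FileChunk(offset, bytes) idea).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Chunk {
    pub offset: u64,
    pub len: usize,
}

/// Sender with a bounded in-flight window (backpressure).
pub struct ChunkSender {
    total: u64,
    chunk_size: usize,
    next_offset: u64,
    in_flight: VecDeque<u64>, // offsets sent but not yet acknowledged
    window: usize,
}

impl ChunkSender {
    pub fn new(total: u64, chunk_size: usize, window: usize) -> Self {
        assert!(chunk_size > 0 && window > 0, "chunk_size and window must be non-zero");
        Self { total, chunk_size, next_offset: 0, in_flight: VecDeque::new(), window }
    }

    /// Next chunk to send, or None if the window is full or every byte has been sent.
    pub fn next_chunk(&mut self) -> Option<Chunk> {
        if self.in_flight.len() >= self.window || self.next_offset >= self.total {
            return None;
        }
        let len = (self.total - self.next_offset).min(self.chunk_size as u64) as usize;
        let chunk = Chunk { offset: self.next_offset, len };
        self.in_flight.push_back(self.next_offset);
        self.next_offset += len as u64;
        Some(chunk)
    }

    /// Receiver acknowledged the chunk at `offset`; frees one window slot.
    pub fn ack(&mut self, offset: u64) {
        if let Some(pos) = self.in_flight.iter().position(|&o| o == offset) {
            self.in_flight.remove(pos);
        }
    }

    pub fn is_complete(&self) -> bool {
        self.next_offset >= self.total && self.in_flight.is_empty()
    }
}
```

Every chunk carries its offset, so a resumed transfer can restart from the lowest unacknowledged offset instead of from zero.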
5. Video Pipeline Modernization
v1 ships only raw BGRA + Zstd — CPU-heavy and bandwidth-heavy. v2:
- Default: hardware H.264 via Windows Media Foundation (transparently uses NVENC/AMF/QuickSync where present), with HEVC opt-in per session (§10). Emits the proto's already-modeled `EncodedFrame` (h264/h265). The native viewer decodes via MF/D3D11.
- Fallback: raw BGRA + Zstd + dirty-rects (the salvaged path) for Win7 or machines with no HW encoder. The choice is negotiated at session start via the `AgentStatus` capability + `SessionResponse` codec selection.
- One encode, two consumers: the same H.264 stream feeds the native viewer (WSS `EncodedFrame`) and, in Phase 2, the WebRTC track, so the codec investment serves both viewer surfaces.
- Quality presets (`QualitySettings` auto/low/balanced/high, custom fps/bitrate) wired end-to-end with a viewer control (full-stack).
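Here is a sketch of the per-session negotiation rule. The capability and preference structs are hypothetical shapes for what `AgentStatus` advertises and what the viewer supports. The branch order encodes §10: H.264 by default, HEVC only on opt-in when both ends support it, and raw+Zstd when no hardware path matches.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Codec {
    H264,
    Hevc,
    RawZstd,
}

/// What the agent advertises (hypothetical field names).
#[derive(Debug, Clone, Copy, Default)]
pub struct AgentCaps {
    pub hw_h264: bool,
    pub hw_hevc: bool,
}

/// What the viewer can decode, and whether HEVC was requested for this session.
#[derive(Debug, Clone, Copy, Default)]
pub struct ViewerPrefs {
    pub can_h264: bool,
    pub can_hevc: bool,
    pub prefer_hevc: bool,
}

/// Server-side choice placed in SessionResponse.
pub fn negotiate(agent: AgentCaps, viewer: ViewerPrefs) -> Codec {
    // HEVC only when explicitly requested and available on both ends.
    if viewer.prefer_hevc && agent.hw_hevc && viewer.can_hevc {
        return Codec::Hevc;
    }
    // Default path: hardware H.264.
    if agent.hw_h264 && viewer.can_h264 {
        return Codec::H264;
    }
    // Universal fallback (e.g. Win7, no HW encoder).
    Codec::RawZstd
}
```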
6. Phased Roadmap
Each phase ships full-stack (proto + agent + server + viewer + dashboard + docs + roadmap flip). Phases are sequential; each ends re-audit-clean via /gc-audit.
- Phase 0: Foundation reset. New module layout in-repo; salvage core crates; proto v2 (Consent, tenant fields, codec negotiation); fresh v2 schema (tenancy-ready); CI/signing already in place (keep). Archive stale docs.
- Phase 1: Secure session core (the heart). Rebuilt auth/session (per-agent keys, session-scoped viewer tokens, blacklist-on-WS, plane separation, single-use codes, frame caps, input throttle, real rate limiting); attended consent; native viewer end-to-end with full key fidelity (low-level hook, scan-code injection, SAS, clipboard sync); HW H.264 + raw/zstd fallback. Exit: a clean attended + unattended session with full control and zero relay CRITICALs.
- Phase 2: File transfer, dashboard + web viewer. File transfer: clipboard cut/paste (`CF_HDROP` delayed render) + drag-in, bidirectional, with progress + audit (§4.4); drag-out follows behind the same engine. Real dashboard covering the audit's orphaned surfaces (machines inventory/history/delete, releases mgmt, client-access, sessions detail); hardened web viewer (correct decoder, special-keys toolbar, Keyboard Lock); WebRTC secondary path.
- Phase 3: RMM integration contract. `/api/integration/v1/` + `GET /api/integration/capabilities` + programmatic session pre-create + embedded `viewer.html?embed=1` with a scoped `frame-ancestors` allowlist (native-remote-control spec, made core). One-click launch from an RMM agent detail page (true integration, anti-Datto).
- Phase 4: Multi-tenancy switch-on (deferred). Activate `tenant_id` enforcement; partner→client isolation + dev impersonation (RMM ADR-001 model); SSO. No schema rewrite (Phase 0 made it ready).
7. Deployment & CI (mostly keep, two real fixes)
ADR-002's Gitea Actions + Azure Trusted Signing pipeline is modern, and the audit rated the workflow code good. Two fixes:
- Wire the real deploy: replace `deploy.yml`'s echo stub with the actual `ssh`/`scp` deploy (deploy key in repo secrets; stop→copy→start to avoid ETXTBSY, an RMM lesson). Production has been 57 commits stale because of this.
- Chain `cargo audit` into `release.yml`/`deploy.yml` (currently only on push-to-main) so a release can't ship with a new advisory.
Keep: clippy `-D warnings` + cargo-audit hard gates, conventional-commit versioning, git-cliff changelog + `/api/changelog`, fail-the-release-on-unsigned signing. Re-verify the Pluto/runner registration after any Gitea blip (audit coverage gap).
8. Security Posture (v2 closes every audit finding by design)
| v1 audit finding | v2 resolution | Section |
|---|---|---|
| CRITICAL: any JWT joins any session | session-scoped viewer tokens + session-claim match on WS | §3.2/§3.3 |
| CRITICAL: viewer WS blacklist bypass | blacklist enforced on every JWT path incl. WS | §3.3 |
| CRITICAL: JWT accepted as agent key | plane separation; per-agent keys; JWT branch deleted | §3.3 |
| HIGH: reusable support codes | single-use atomic consume + rate-limited validate + wider entropy | §3.2 |
| HIGH: no WS frame cap (OOM) | explicit max message/frame size; reject before broadcast | §3.2 |
| HIGH: rate limiting disabled/non-compiling | rebuilt working rate limiting on auth/validate | §3.1 |
| HIGH: admin password logged on fallback | never route secrets through the logger | CODING_GUIDELINES |
| MEDIUM: no input throttle/consent gate | bounded input queue + attended consent prompt | §3.2/§4.2 |
| shared AGENT_API_KEY anti-pattern | per-agent hashed keys (RMM Option 3) | §3.3 |
| flat schema, no tenancy | tenancy-ready tenant_id from day one | §3.3 |
| broken web protobuf decoder | correct decoder; native-first | §4.3 |
| stale/unsigned production deploy | real deploy step + enforced signing | §7 |
9. Adopted GuruRMM Principles (provenance)
- Per-agent enrollment keys, no installer shared secret (RMM Option 3 / `DESIGN.md`) → §3.3.
- No TOML/config for endpoints: relay URL compiled in; identity in registry/baked at generation; server secrets in env/SOPS only (RMM permanent rule) → agent constraint, stated in v2 `CLAUDE.md`.
- Living roadmap = definition-of-done (`DESIGN.md`) → every capability flips a dated roadmap box in the same commit; `/gc-audit` is the backstop.
- Holistic full-stack features (`DESIGN.md`) → no headless capabilities; each proto message ships its UI (§3.5/§6).
- Standalone works with zero upstream dependency (RMM "works without AI", GC ADR-001) → §1, Decision 3.
- Versioned integration contract + capability discovery (RMM ADR-008 / GC ADR-001) → §6 Phase 3.
- True integration, anti-Datto (RMM roadmap design principles) → one-click RMM→GC launch, embedded viewer, dual-side audit (§6 Phase 3).
- Multi-tenancy partner/client model (RMM ADR-001) → designed-for in §3.3, activated Phase 4.
- Error/sqlx/auth conventions (`CODING_GUIDELINES`) → anyhow/thiserror/tracing, runtime sqlx, idempotent migrations, JWT/Argon2, hashed short-lived tokens.
10. Resolved Decisions (2026-05-29, Mike)
- Repo: clean architectural reset in-place in `guru-connect`, which keeps git history, CI, proto, and Gitea issues. v1 code is removed/relocated as v2 modules land.
- Codec default: H.264 (universal decode, low-latency tooling) is the default; HEVC is opt-in per session for bandwidth-constrained links.
- Web viewer transport (Phase 2): protobuf-over-WSS first (reuses the agent's encode path), with WebRTC added later as the latency upgrade.
- Support-code format: widen to a higher-entropy, human-readable code + rate-limit/lockout on the validate endpoint — replaces the brute-forceable 6-digit numeric.
- v1 cutover: clean wholesale replacement. No real clients on the instance, so v2 replaces v1 at connect.azcomputerguru.com once Phase 1 is live — no data migration; v1 is archived, not migrated.
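A worked comparison makes "higher-entropy, human-readable" concrete. A 6-digit numeric code carries about 19.9 bits (log2 of 10^6). Eight symbols from a 32-character alphabet carry 40 bits. The sketch encodes random bytes into Crockford base32, which omits I, L, O and U to avoid misreads, grouped in fours. The alphabet, the code length and `format_support_code` are assumptions, not decisions recorded in this spec. The random bytes must come from a CSPRNG, which is not shown.

```rust
/// Crockford base32 alphabet (no I, L, O, U).
const ALPHABET: &[u8; 32] = b"0123456789ABCDEFGHJKMNPQRSTVWXYZ";

/// Encodes `random` (CSPRNG output, not generated here) into `len` base32
/// symbols, inserting a dash every four symbols for readability.
pub fn format_support_code(random: &[u8], len: usize) -> String {
    assert!(random.len() * 8 >= len * 5, "not enough random bits for the requested length");
    let mut out = String::with_capacity(len + len / 4);
    let mut bytes = random.iter();
    let (mut acc, mut bits) = (0u32, 0u32);
    for i in 0..len {
        // Refill until at least 5 unread bits are buffered.
        while bits < 5 {
            acc = (acc << 8) | u32::from(*bytes.next().expect("length checked above"));
            bits += 8;
        }
        bits -= 5;
        let idx = ((acc >> bits) & 0x1F) as usize;
        acc &= (1 << bits) - 1; // drop the 5 bits just consumed
        if i > 0 && i % 4 == 0 {
            out.push('-');
        }
        out.push(ALPHABET[idx] as char);
    }
    out
}

/// Entropy in bits of a `len`-symbol code over an `alphabet`-sized character set.
pub fn entropy_bits(alphabet: u32, len: u32) -> f64 {
    len as f64 * (alphabet as f64).log2()
}
```

Widening the code only helps alongside the rate-limit/lockout on the validate endpoint. Entropy sets the search space, and the limiter sets how fast an attacker can walk it.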
Next Steps
- Open questions (repo reset in-place, H.264 default, cutover): resolved 2026-05-29, see §10.
- Flip the v1 `FEATURE_ROADMAP.md` to a v2 structure mirroring these phases (or supersede it).
- `/shape-spec` Phase 1 (the secure session core) into an implementable plan.
- Begin Phase 0 foundation reset.