docs(spec): add SPEC-002 GuruConnect v2 modernization architecture
Some checks failed
Build and Test / Build Agent (Windows) (push) Successful in 6m34s
Build and Test / Build Server (Linux) (push) Has started running
Build and Test / Security Audit (push) Has been cancelled
Build and Test / Build Summary (push) Has been cancelled

Ground-up v2 re-architecture decided 2026-05-29 (Mike), grounded in the
2026-05-29 audit + adopted GuruRMM design principles. Greenfield salvaging
proven Rust cores (DXGI/GDI capture, input injection, SAS helper, prost codec,
CI). Native-first full key fidelity (Win+R/Ctrl+Alt+Del) + bidirectional file
transfer (clipboard cut/paste + drag-and-drop) as headline differentiators;
WebRTC fallback only. Hardened single-tenant, tenancy-ready schema. Standalone-
first + /api/integration/v1 RMM contract. Closes all audit CRITICALs by design.
Open decisions resolved: in-place repo reset, H.264 default, WSS-first web
transport, widened support codes, clean v1 cutover (no client migration).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-29 18:08:23 -07:00
parent 486debfc52
commit 5c60a105c0

View File

@@ -0,0 +1,280 @@
# SPEC-002: GuruConnect v2 — Modernization Architecture
**Status:** Approved — v2 direction + open decisions locked 2026-05-29; phases pending shape-spec
**Priority:** P1 (foundational — supersedes the v1 architecture)
**Requested By:** Mike Swanson (2026-05-29)
**Estimated Effort:** X-Large (multi-month, phased)
**Supersedes:** the v1 status docs (`PHASE1_*`, `DEPLOYMENT_*`, `WEEK1_*`, `GAP_ANALYSIS.md`) — archive on adoption
**Reference material:** [`reports/2026-05-29-gc-audit.md`](../../reports/2026-05-29-gc-audit.md) (the v1 audit), GuruRMM `docs/DESIGN.md` + `docs/ARCHITECTURE_DECISIONS.md`, `.claude/CODING_GUIDELINES.md`, [SPEC-001](SPEC-001-operational-tooling-parity.md), [`specs/native-remote-control/`](../../specs/native-remote-control/), [ADR-001](../ARCHITECTURE_DECISIONS.md), [ADR-002](../ARCHITECTURE_DECISIONS.md)
---
## Why v2
v1 was built with an older model and the seams show. The 2026-05-29 audit found the product's **threat surface — the remote-control relay plane — carries three independent CRITICAL auth failures** (any-JWT session hijack, viewer-WS revocation bypass, JWT-accepted-as-agent-key), the web viewer's "protobuf" decoder is a fabricated layout that cannot decode a single real frame, and production has been running 57 commits stale on a likely-unsigned agent because the deploy step is a stub. At the same time, the audit confirmed the **hardware-facing Rust is sound**: DXGI/GDI capture, input injection, and the prost wire codec are correct. So this is not a rewrite born of failure — it's a deliberate reset that **keeps the hard-won Windows-internals code and rebuilds everything above it** to be secure, full-fidelity, modern, and aligned with GuruRMM's design discipline.
### The four decisions (set by the product owner, 2026-05-29)
| # | Decision | Choice | Consequence |
|---|----------|--------|-------------|
| 1 | Rewrite scope | **Greenfield, salvage cores** | Clean architecture; reuse only proven Rust (capture, input, SAS, prost codec, proto, CI). Rebuild relay/auth, session, viewer, dashboard, deploy. |
| 2 | Transport + viewer | **Native-first on custom protobuf-over-WSS; WebRTC secondary** | Full key fidelity (Win+R, Ctrl+C/V, Ctrl+Alt+Del) is non-negotiable and browsers can't deliver it — see §4. Web/WebRTC is the no-install fallback with documented limits. |
| 3 | RMM relationship | **Standalone-first + versioned contract** | GC ships/sells independently; RMM integrates via the semver'd `/api/integration/v1/` contract (ADR-001, native-remote-control spec). |
| 4 | Auth + tenancy | **Hardened single-tenant now, multi-tenancy-ready schema** | Fix the relay CRITICALs by design; carry nullable `tenant_id` from day one so the RMM partner/client model switches on later with no migration rewrite. |
### What makes v2 "considerably better"
1. **Secure by design** — the relay plane is rebuilt around per-agent keys, per-session viewer authorization, and blacklist-checked tokens; the v1 CRITICALs are structurally impossible.
2. **Full-fidelity input** — native viewer captures and forwards every key combo including the Windows key and Ctrl+Alt+Del (§4). This is the headline product differentiator and the owner's stated priority.
3. **Modern video** — hardware H.264/HEVC encode (Media Foundation/NVENC/QuickSync) replaces raw-BGRA+Zstd as the default, with raw+Zstd kept as the universal fallback (§5).
4. **Complete UI** — every capability ships its dashboard + viewer surface (RMM's full-stack rule); the audit's pile of orphaned APIs (machines, releases, client-access) gets real UI.
5. **Clean RMM integration** — the `/api/integration/v1/` contract + embedded viewer is designed in from day one, not retrofitted.
6. **Operationally honest** — real deploy, enforced signing, living-roadmap definition-of-done (already established by SPEC-001/ADR-002; v2 keeps it).
---
## 1. Scope
### In scope (v2 foundation, Phases 0-3)
Clean re-architecture of agent, relay server, session/auth model, native viewer, dashboard, and protocol; hardware video codecs; the secure session lifecycle; the standalone product end-to-end; the RMM integration contract surface.
### Explicitly out of scope (deferred, not cancelled)
- **Full multi-tenancy activation** (partner→client isolation, dev impersonation) — schema is *ready* in v2; the model is switched on in a later phase (§6, Phase 4).
- **WebRTC as primary** — secondary/fallback only; the no-install web viewer ships in Phase 2 but native is the supported full-fidelity path.
- **Session recording, BACKSTAGE mode** — modeled in the proto, deferred to post-foundation (each ships full-stack when prioritized). _(File transfer via clipboard cut/paste + drag-and-drop is **NOT** deferred — it is a named core differentiator, §4.4.)_
- **macOS/Linux agents** — Windows-first remains (REQUIREMENTS.md); the server/dashboard are cross-platform-clean but the agent is Windows.
### Success criteria
- A technician starts an attended session via a support code, the **end user sees and accepts a consent prompt**, and the technician has full keyboard/mouse control **including Win+R, Ctrl+C/V across the clipboard, and Ctrl+Alt+Del** — with zero relay-plane auth holes (re-audited clean).
- A technician can **cut/copy a file on either side and paste it on the other**, and **drag files into and out of the session window** (guest↔host), with chunked transfer, sha256 integrity, progress, and an audit trail (§4.4).
- An unattended/managed agent enrolls with a **per-agent key** (no shared secret), and a revoked key or revoked viewer token is rejected **on the WebSocket**, not just on REST.
- Video defaults to **hardware H.264** with automatic fallback to raw+Zstd on unsupported hardware (Win7), negotiated per session.
- RMM can **pre-create a session and embed the viewer** via `/api/integration/v1/` without GC granting RMM any standing credential.
---
## 2. Salvage / Scrap Ledger
The audit is the authority for what's proven. **Greenfield architecture, but these cores move over largely intact.**
### SALVAGE (audit-confirmed correct — reuse, don't rewrite)
| Component | Path (v1) | Notes |
|-----------|-----------|-------|
| DXGI screen capture | `agent/src/capture/dxgi.rs` | Primary capture path. Wrap in new module structure. |
| GDI capture fallback | `agent/src/capture/gdi.rs` | Win7 / no-DXGI fallback. Keep. |
| Multi-display capture | `agent/src/capture/display.rs` | Feeds the proto `SwitchDisplay` capability (finish full-stack). |
| Input injection | `agent/src/input/{mouse,keyboard}.rs` | Keep; **extend** for scan-code + extended-key fidelity (§4). |
| SAS helper (Ctrl+Alt+Del) | `agent/src/bin/sas_service.rs` | The privileged SendSAS path. Keep — it's the only way to deliver Ctrl+Alt+Del. |
| Protocol schema | `proto/guruconnect.proto` | Well-modeled (§3). Extend (Consent, tenant fields); the prost wire layer is correct. |
| Correct protobuf parser | `server/static/viewer.html:196-489` | Real varint/length-delimited parser — reference (and interim web viewer) until the v2 web viewer lands. |
| CI/CD + signing | `.gitea/workflows/{build-and-test,release}.yml`, ADR-002, SPEC-001 | Modern, gated, hard-fail clippy+audit, Azure Trusted Signing. Keep wholesale. |
| Version embedding | `agent/build.rs` | Keep. |
| DB conventions | migrations style, UUID PKs, TIMESTAMPTZ | Keep; new schema, same conventions. |
### SCRAP / REBUILD (audit found broken or structurally unsafe)
| Component | Path (v1) | Why |
|-----------|-----------|-----|
| Relay auth | `server/src/relay/mod.rs:224-288` | The 3 CRITICALs. Rebuild the entire auth model (§3.3). |
| Rate limiting | `server/src/middleware/rate_limit.rs` | Non-compiling, disabled. Rebuild. |
| Web "protobuf" codec | `dashboard/src/lib/protobuf.ts` | Fabricated fixed-offset layout, wire-incompatible. Delete; rebuild web viewer (§4.3). |
| React component lib | `dashboard/src/` | Dead/divergent. Rebuild as a real dashboard (§4.4) covering the orphaned APIs. |
| Deploy step | `.gitea/workflows/deploy.yml:71-78` | Echo stub. Wire real SSH deploy (§7). |
| Flat session/machine model | `server/src/{session,db}/` | No per-agent keys, no per-session authz, no tenant column. Rebuild (§3). |
| Shared `AGENT_API_KEY` | `server/src/auth/`, `CLAUDE.md` | The anti-pattern RMM eliminated. Replace with per-agent keys (§3.3). |
### ARCHIVE (stale v1 status docs — move to `docs/archive/v1/`)
`PHASE1_*.md`, `PHASE2_*.md`, `DEPLOYMENT_*.md`, `WEEK1_*.md`, `GAP_ANALYSIS.md`, `MASTER_ACTION_PLAN.md`, `CHECKPOINT_*.md`, `PROJECT_OVERVIEW.md` (2026-01-17, stale). v2 tracks state only in `docs/FEATURE_ROADMAP.md` + `docs/ARCHITECTURE_DECISIONS.md` + `docs/specs/` + `TECHNICAL_DEBT.md` (RMM living-roadmap discipline). The `SEC*_AUDIT.md` files stay as point-in-time security history.
---
## 3. Architecture
### 3.1 Repo & module structure (greenfield layout, RMM-aligned)
**Decision (2026-05-29):** clean architectural reset **in the existing `guru-connect` repo** (preserve git history, the good CI, the proto, the Gitea issues) rather than a brand-new repo — "greenfield" applies to the architecture, not the repo identity. Old code is removed/relocated as the v2 modules land; salvaged crates move into the new layout.
```
guru-connect/
├── agent/ # Windows agent + native viewer (single static .exe)
│ ├── src/
│ │ ├── capture/ # [salvage] dxgi, gdi, display
│ │ ├── input/ # [salvage+extend] scan-code injection, extended keys, modifier hygiene
│ │ ├── encoder/ # [rebuild] HW H.264/HEVC (Media Foundation) + raw/zstd fallback
│ │ ├── viewer/ # [rebuild] native viewer: low-level kbd hook, decode, render
│ │ ├── transport/ # [salvage codec, rebuild auth handshake] prost over WSS
│ │ ├── session/ # [rebuild] session state, consent, reconnect
│ │ ├── clipboard/ # [new] bidirectional clipboard sync (text/html/image)
│ │ ├── filetransfer/ # [new] CF_HDROP cut/paste + drag in/out, chunked engine (§4.4)
│ │ ├── tray/ install.rs # [salvage] tray, guruconnect:// handler, auto-install
│ │ └── bin/sas_service.rs# [salvage] Ctrl+Alt+Del SYSTEM helper
│ └── build.rs # [salvage] version embed
├── server/ # Linux relay (Axum + PostgreSQL/sqlx)
│ ├── src/
│ │ ├── relay/ # [rebuild] secure agent/viewer WS handlers, frame caps
│ │ ├── session/ # [rebuild] session lifecycle, per-session authz
│ │ ├── auth/ # [rebuild] per-agent keys, session-scoped viewer tokens, blacklist
│ │ ├── api/ # [rebuild] REST: machines, sessions, codes, releases, users, changelog
│ │ ├── integration/ # [new] /api/integration/v1/ contract (RMM broker)
│ │ ├── middleware/ # [rebuild] working rate limiting, security headers, framing allowlist
│ │ └── db/ # [rebuild] schema with tenant_id, agent keys
│ ├── migrations/ # fresh v2 schema (idempotent)
│ └── static/ # interim hardened web viewer (until SPA)
├── dashboard/ # [rebuild] real web dashboard (machines/sessions/codes/releases UI)
├── proto/guruconnect.proto # [salvage+extend]
└── docs/ (specs, ADRs, roadmap) · reports/ · TECHNICAL_DEBT.md
```
### 3.2 Session model (rebuilt — the secure core)
Two session origins, one secure lifecycle:
- **Attended** (support code): technician generates a code → end user runs the lightweight one-time agent (code baked in, user-space, no admin, self-deleting) → agent connects, **server shows the technician the pending session only after the end user accepts a consent prompt on their machine** (new `ConsentRequest`/`ConsentResponse`, §3.5).
- **Unattended** (managed agent): persistent agent enrolled with a **per-agent key**; technician opens a session from the dashboard; consent policy is per-tenant (silent for managed endpoints, or notify-only).
Session security invariants (each fixes an audit CRITICAL/HIGH):
1. **Per-session viewer authorization.** A viewer connects with a **session-scoped, short-lived JWT** minted only by an authenticated+authorized request for *that* session/machine. The WS handler verifies signature **+ expiry + blacklist + the session claim matches the requested `session_id`**. (Fixes "any JWT joins any session" + "blacklist bypass.")
2. **Plane separation.** Agent auth = per-agent key **or** support code, **never a user JWT**. The `validate_agent_api_key` JWT branch is deleted. (Fixes "JWT-as-agent-key.")
3. **Single-use codes.** A support code is consumed atomically on first agent bind; a second presenter is rejected; the validate endpoint is rate-limited and the code space widened (high-entropy, human-readable). (Fixes "reusable support codes.")
4. **Bounded relay.** Explicit `max_message_size`/`max_frame_size` on both WS upgrades; oversized frames rejected before broadcast. Input events rate-limited server-side with a bounded queue. (Fixes "no frame cap / no input throttle.")
5. **Reconcile on restart.** Managed sessions persist in DB and reconcile on server restart so they aren't orphaned (native-remote-control prior art).
### 3.3 Auth model (rebuilt — hardened single-tenant, tenancy-ready)
| Plane | Credential | Verification |
|-------|-----------|--------------|
| Dashboard user | JWT (Argon2id login), blacklist on logout | full verify on every request **and every WS** |
| Viewer (per session) | session-scoped short-lived JWT (~5 min) | signature + expiry + blacklist + session-claim match |
| Agent (managed) | per-agent hashed key (`cak_…`), `connect_agent_keys(revoked_at)` | hash compare; revocation immediate |
| Agent (attended) | single-use support code | atomic consume; rate-limited validate |
| Server↔server (RMM) | `CONNECT_INTEGRATION_KEY` (env/SOPS) | on `/api/integration/v1/*` only |
**Tenancy-ready, not tenant-yet:** every tenant-scoped table (`machines`, `sessions`, `support_codes`, `events`, `connect_agent_keys`, `users`) carries a nullable `tenant_id` defaulting to a single bootstrap tenant. Queries route through a tenancy helper that today resolves to the default tenant. Switching on multi-tenancy (Phase 4) = populate `tenant_id`, flip the helper to enforce `WHERE tenant_id = :current`, add the partner/client impersonation layer (RMM ADR-001 model) — **no schema rewrite**. This honors "hardened single-tenant first" while refusing to paint into a corner.
### 3.4 Database schema (fresh v2 migrations, conventions per RMM/CODING_GUIDELINES)
Idempotent migrations, UUID PKs, `TIMESTAMPTZ`, soft-delete, FK `ON DELETE CASCADE`. Key new/changed tables: `connect_agent_keys` (per-agent hashed keys + `revoked_at`), `tenant_id` columns everywhere, `sessions` gains `is_managed`/`source`/`consent_state`/`tenant_id`, `support_codes` gains single-use consume semantics. Runtime `sqlx::query()` for new queries (CODING_GUIDELINES), server applies migrations on startup, `_sqlx_migrations` respected.
### 3.5 Protocol v2 (`proto/guruconnect.proto` — salvage + extend)
The v1 proto already models video (raw/vp9/h264/h265 + dirty-rects), full input (VK/scan/unicode + modifiers, SpecialKey for Ctrl+Alt+Del/Lock/PrintScreen), cursor, clipboard (text/html/rtf/image/files), quality presets + codec preference, multi-display, chat, heartbeat, admin, and auto-update. **Additions for v2:**
- `ConsentRequest` / `ConsentResponse` — attended-mode consent (the missing trust primitive).
- Optional `tenant_id` on `SessionRequest`/`AgentStatus` for the tenancy-ready model.
- Negotiated codec in `SessionResponse` (server tells the viewer which encoding to expect; agent advertises HW-encode capability in `AgentStatus`).
- **File-transfer messages** — `FileOfferStart` / `FileChunk` / `FileOfferComplete` / `FileTransferAck` / `FileTransferCancel`, plus clipboard-files (`CF_HDROP`) and drag signaling (`DragEnter`/`DragDrop`) — promoting the `FILE_TRANSFER` enum to a real subsystem (§4.4), carried on its own logical stream.
- Wire the already-modeled-but-undispatched variants (`SwitchDisplay`, `QualitySettings`, `UpdateStatus`) end-to-end (RMM full-stack rule — no headless capabilities).
---
## 4. Full-Fidelity Input, Clipboard & File Transfer — the headline features (§ owner priority)
Browsers cannot deliver full keyboard control; this is *why* native is primary and WebRTC is secondary. v2 treats input fidelity, clipboard, and file transfer as first-class, interlocking subsystems — the owner specifically named **keyboard hooks** and **cut/paste + drag files from/to either guest or host** as favorite ScreenConnect behaviors to match.
### 4.1 Native viewer capture (the technician side)
- **Low-level keyboard hook** (`WH_KEYBOARD_LL`) installed while the viewer is focused/in control, so the viewer intercepts system combos — **Win/Meta key, Win+R, Win+E, Alt+Tab, Ctrl+Esc***before* the local shell consumes them, and forwards them as `KeyEvent` (VK + scan code + extended flag) instead of letting the local OS act.
- A **"send system keys to remote" toggle** (default on when the viewer is in fullscreen/focused control), so the hook only diverts combos during active control.
- **Modifier-state hygiene:** track modifier up/down and **re-sync on focus loss** (release stuck modifiers) — eliminates the classic "remote thinks Ctrl is still held" bug.
### 4.2 Agent injection (the remote side)
- **Scan-code injection** (`SendInput` with `KEYEVENTF_SCANCODE`, correct `KEYEVENTF_EXTENDEDKEY` for right-Ctrl/Alt, arrows, Win, Insert/Home/etc.) — layout-independent, works in games/secure-desktop-adjacent apps where VK injection fails.
- **Ctrl+Alt+Del** → `SpecialKeyEvent.CTRL_ALT_DEL` → the salvaged **SAS service** (`SendSAS`, runs as SYSTEM, requires `SoftwareSASGeneration` policy which the managed installer sets). This is the *only* reliable path and a key differentiator vs browser tools.
- **Clipboard sync** (`Ctrl+C`/`Ctrl+V` value): bidirectional `ClipboardData` sync — text/html/image (Phase 1) **and files via `CF_HDROP`** (the file-transfer subsystem, §4.4) — so copy on either side is available on the other. The keystrokes inject, and the clipboard content actually transfers. Full-stack: a clipboard-activity indicator in the viewer.
### 4.3 Web/WebRTC secondary viewer (the no-install fallback) — honest limits
- Uses the **Keyboard Lock API** (`navigator.keyboard.lock()`) + Fullscreen + Pointer Lock to capture as many keys as the browser allows (Escape, Tab, most combos).
- **Cannot** deliver Ctrl+Alt+Del (browser-impossible) or reliably the Meta/Win key. Compensates with an **on-screen special-keys toolbar** (Ctrl+Alt+Del button → `SpecialKeyEvent`; Win, Win+R, Alt+Tab macros) — the same idea the v1 `SessionControls` gestured at, done properly.
- Transport: either the same protobuf-over-WSS (with a *correct* decoder ported from `viewer.html`, not the fabricated one) **or** a WebRTC track fed by the agent's H.264 encoder (§5) for lower latency. Decided in Phase 2; WSS is the simpler first cut.
- Clearly labeled in-product as the **quick-access** path; the native viewer is the **full-control** path.
---
### 4.4 File Transfer — clipboard cut/paste + drag-and-drop, bidirectional (§ owner favorite)
A named must-have: the ScreenConnect behavior where you **cut/copy a file and paste it across**, or **drag a file into or out of the session window**, in either direction (guest↔host). It rides on the clipboard + input subsystems but needs a real chunked transfer channel underneath. v1 modeled it as an enum value only (`SessionType.FILE_TRANSFER`); v2 builds it for real and treats it as a **core differentiator, not a deferred panel**.
**Two entry paths, one transfer engine:**
- **Clipboard file transfer (Ctrl+C → Ctrl+V):** detect `CF_HDROP` on the source clipboard; instead of copying bytes eagerly, place a **delayed-render (virtual) clipboard promise** on the destination so the transfer fires only when the user actually *pastes*. On paste, stream the bytes over the transfer channel and reconstruct a real `CF_HDROP` on the destination so Explorer/apps receive actual files. (Eager copy of a multi-GB selection on every Ctrl+C would be unusable — delayed render is the correct design and what ScreenConnect does.)
- **Drag-in (local → remote):** the viewer window registers an OS drop target (`IDropTarget` / `WM_DROPFILES`); files dropped onto the session stream to the remote and a synthetic drop is delivered at the drop point on the remote desktop.
- **Drag-out (remote → local):** detect a drag originating on the remote and present the local drag operation a **delayed-render `IDataObject`** that fetches bytes from the remote on drop. This is the genuinely hard case — **ships after drag-in**, behind the same engine.
**Transfer engine (shared by all three paths):**
- New protocol messages (§3.5): `FileOfferStart` (name, size, sha256, count, relative path), `FileChunk` (offset, bytes), `FileOfferComplete`, `FileTransferAck`/`Cancel`, plus clipboard-files and drag signaling. Runs as its **own logical stream** so a large transfer never blocks video or input.
- **Chunked + backpressured** (bounded in-flight window), directory-aware (relative paths preserved), **sha256 integrity per file**, cancellable, with resume-friendly framing.
- **Security & governance:** moving files across a trust boundary is sensitive. Transfer is a **per-session capability gated by policy/consent** (default on for attended-with-consent; per-tenant policy for managed endpoints), **fully audited** (who, direction, filenames, sizes, sha256), with configurable per-tenant size/type limits. The audit lands in the `events`/audit trail and surfaces in the dashboard.
- **Full-stack:** a transfer panel + per-file progress in the viewer; a transfer/audit view in the dashboard (RMM full-stack rule — no headless capability).
## 5. Video Pipeline Modernization
v1 ships only raw BGRA + Zstd — CPU-heavy and bandwidth-heavy. v2:
- **Default: hardware H.264/HEVC** via Windows **Media Foundation** (transparently uses NVENC/AMF/QuickSync where present). Emits the proto's already-modeled `EncodedFrame` (h264/h265). Native viewer decodes via MF/D3D11.
- **Fallback: raw BGRA + Zstd + dirty-rects** (the salvaged path) for Win7 or machines with no HW encoder — negotiated at session start via `AgentStatus` capability + `SessionResponse` codec selection.
- **One encode, two consumers:** the same H.264 stream feeds the native viewer (WSS `EncodedFrame`) and, in Phase 2, the WebRTC track — so the codec investment serves both viewer surfaces.
- Quality presets (`QualitySettings` auto/low/balanced/high, custom fps/bitrate) wired end-to-end with a viewer control (full-stack).
---
## 6. Phased Roadmap
Each phase ships full-stack (proto + agent + server + viewer + dashboard + docs + roadmap flip). Phases are sequential; each ends re-audit-clean via `/gc-audit`.
- **Phase 0 — Foundation reset.** New module layout in-repo; salvage core crates; proto v2 (Consent, tenant fields, codec negotiation); fresh v2 schema (tenancy-ready); CI/signing already in place (keep). Archive stale docs.
- **Phase 1 — Secure session core (the heart).** Rebuilt auth/session (per-agent keys, session-scoped viewer tokens, blacklist-on-WS, plane separation, single-use codes, frame caps, input throttle, real rate limiting); attended consent; **native viewer end-to-end with full key fidelity** (low-level hook, scan-code injection, SAS, clipboard sync); **HW H.264 + raw/zstd fallback**. *Exit: a clean attended + unattended session with full control and zero relay CRITICALs.*
- **Phase 2 — File transfer, dashboard + web viewer.** **File transfer — clipboard cut/paste (`CF_HDROP` delayed render) + drag-in, bidirectional, with progress + audit (§4.4); drag-out follows behind the same engine.** Real dashboard covering the audit's orphaned surfaces (machines inventory/history/delete, releases mgmt, client-access, sessions detail); hardened web viewer (correct decoder, special-keys toolbar, Keyboard Lock); WebRTC secondary path.
- **Phase 3 — RMM integration contract.** `/api/integration/v1/` + `GET /api/integration/capabilities` + programmatic session pre-create + embedded `viewer.html?embed=1` with scoped `frame-ancestors` allowlist (native-remote-control spec, made core). One-click launch from an RMM agent detail page (true-integration, anti-Datto).
- **Phase 4 — Multi-tenancy switch-on (deferred).** Activate `tenant_id` enforcement; partner→client isolation + dev impersonation (RMM ADR-001 model); SSO. No schema rewrite (Phase 0 made it ready).
---
## 7. Deployment & CI (mostly keep — one real fix)
ADR-002's Gitea Actions + Azure Trusted Signing pipeline is modern and the audit rated the workflow code good. **Two fixes:**
1. **Wire the real deploy** — replace `deploy.yml`'s echo stub with the actual `ssh`/`scp` deploy (deploy key in repo secrets; stop→copy→start to avoid ETXTBSY, RMM lesson). Production has been 57 commits stale because of this.
2. **Chain `cargo audit`** into `release.yml`/`deploy.yml` (currently only on push-to-main) so a release can't ship with a new advisory.
Keep: clippy `-D warnings` + cargo-audit hard gates, conventional-commit versioning, git-cliff changelog + `/api/changelog`, fail-the-release-on-unsigned signing. Re-verify the Pluto/runner registration post any Gitea blip (audit coverage gap).
---
## 8. Security Posture (v2 closes every audit finding by design)
| v1 audit finding | v2 resolution | Section |
|------------------|---------------|---------|
| CRITICAL: any JWT joins any session | session-scoped viewer tokens + session-claim match on WS | §3.2/§3.3 |
| CRITICAL: viewer WS blacklist bypass | blacklist enforced on every JWT path incl. WS | §3.3 |
| CRITICAL: JWT accepted as agent key | plane separation; per-agent keys; JWT branch deleted | §3.3 |
| HIGH: reusable support codes | single-use atomic consume + rate-limited validate + wider entropy | §3.2 |
| HIGH: no WS frame cap (OOM) | explicit max message/frame size; reject before broadcast | §3.2 |
| HIGH: rate limiting disabled/non-compiling | rebuilt working rate limiting on auth/validate | §3.1 |
| HIGH: admin password logged on fallback | never route secrets through the logger | CODING_GUIDELINES |
| MEDIUM: no input throttle/consent gate | bounded input queue + attended consent prompt | §3.2/§4.2 |
| shared AGENT_API_KEY anti-pattern | per-agent hashed keys (RMM Option 3) | §3.3 |
| flat schema, no tenancy | tenancy-ready `tenant_id` from day one | §3.3 |
| broken web protobuf decoder | correct decoder; native-first | §4.3 |
| stale/unsigned production deploy | real deploy step + enforced signing | §7 |
---
## 9. Adopted GuruRMM Principles (provenance)
- **Per-agent enrollment keys, no installer shared secret** (RMM Option 3 / `DESIGN.md`) → §3.3.
- **No TOML/config for endpoints** — relay URL compiled in; identity in registry/baked at generation; server secrets in env/SOPS only (RMM permanent rule) → agent constraint, stated in v2 `CLAUDE.md`.
- **Living roadmap = definition-of-done** (`DESIGN.md`) → every capability flips a dated roadmap box in the same commit; `/gc-audit` is the backstop.
- **Holistic full-stack features** (`DESIGN.md`) → no headless capabilities; each proto message ships its UI (§3.5/§6).
- **Standalone works with zero upstream dependency** (RMM "works without AI", GC ADR-001) → §1, Decision 3.
- **Versioned integration contract + capability discovery** (RMM ADR-008 / GC ADR-001) → §6 Phase 3.
- **True integration, anti-Datto** (RMM roadmap design principles) → one-click RMM→GC launch, embedded viewer, dual-side audit (§6 Phase 3).
- **Multi-tenancy partner/client model** (RMM ADR-001) → designed-for in §3.3, activated Phase 4.
- **Error/sqlx/auth conventions** (`CODING_GUIDELINES`) → anyhow/thiserror/tracing, runtime sqlx, idempotent migrations, JWT/Argon2, hashed short-lived tokens.
---
## 10. Resolved Decisions (2026-05-29, Mike)
1. **Repo:** clean architectural reset **in-place** in `guru-connect` — keeps git history, CI, proto, and Gitea issues. v1 code is removed/relocated as v2 modules land.
2. **Codec default:** **H.264** (universal decode, low-latency tooling) is the default; **HEVC is opt-in per session** for bandwidth-constrained links.
3. **Web viewer transport (Phase 2):** **protobuf-over-WSS first** (reuses the agent's encode path), with **WebRTC added later** as the latency upgrade.
4. **Support-code format:** **widen to a higher-entropy, human-readable code + rate-limit/lockout** on the validate endpoint — replaces the brute-forceable 6-digit numeric.
5. **v1 cutover:** **clean wholesale replacement.** No real clients on the instance, so v2 replaces v1 at connect.azcomputerguru.com once Phase 1 is live — **no data migration**; v1 is archived, not migrated.
---
**Next Steps**
1. Confirm the Open Questions (esp. repo reset in-place, H.264 default, cutover).
2. Flip the v1 `FEATURE_ROADMAP.md` to a v2 structure mirroring these phases (or supersede it).
3. `/shape-spec` Phase 1 (the secure session core) into an implementable plan.
4. Begin Phase 0 foundation reset.