spec: add SPEC-017 end-user (sub-user) remote access

2026-06-02 12:56:15 -07:00
parent 367906bd54
commit 4c49b73a71
2 changed files with 181 additions and 0 deletions
--- a/docs/FEATURE_ROADMAP.md
+++ b/docs/FEATURE_ROADMAP.md
@@ -95,6 +95,7 @@ Bringing GC to parity with GuruRMM's release engineering. Full plan: [SPEC-001](
 - [ ] **Valuable error messages (structured errors + no silent swallows)** — P2 — one structured API error envelope with stable codes + a correlation id that also lands in the logs; contextual tracing on server/agent; sweep the 37 `let _ =` swallows (the pattern that hid the migration-005 bug); dashboard surfaces the real cause + id instead of a generic line. **[→ v2 Phase 0/1 conventions]** ([SPEC-008](specs/SPEC-008-valuable-error-messages.md))
 - [ ] **Feature-rich, fully-documented management API** — P2 — everything the console can do, callable by API: OpenAPI 3.x generated from code (utoipa) + browsable docs at `/api/docs`, long-lived revocable scoped API tokens (PAT-style, distinct from the 24h JWT + agent keys), an API-completeness gap audit, and consistent pagination/error conventions. Distinct from the ADR-001 RMM integration contract. **[→ v2 Phase 3]** ([SPEC-009](specs/SPEC-009-feature-rich-documented-api.md))
 - [ ] **Branding and white-label configuration** — P2 — Allow MSPs to customize logo, colors, and product name for white-labeled remote support. Dashboard admin settings page with logo upload (PNG/SVG, max 2MB), brand hue slider (OKLCH 0-360°, default 184=cyan), product name override, company name, and favicon. Agent tray tooltip uses custom product name from registry. Singleton database table with public GET endpoint for unauthenticated rendering. CSS variables (`--brand-hue`, `--accent`, `--panel`) for dynamic theming. **[→ v2 Phase 2]** ([SPEC-014](specs/SPEC-014-branding-whitelabel.md))
+- [ ] **End-user (sub-user) remote access** — P2 (may be P3) — let a client pay for their employees to reach their *own* machines from home: a deny-by-default `end_user` login role, a locked-down end-user portal listing only granted machines, and Connect reusing the existing session-scoped viewer-token + relay path. Grant primitive already exists (`user_client_access`, migration 002); directory sync (AD/Entra/Google) is a separate future spec. **[→ new capability, post v2-console]** ([SPEC-017](specs/SPEC-017-end-user-remote-access.md))
 - [ ] Programmatic session pre-create + viewer-token (integration contract) — P2

 ## Security & Infrastructure
--- a/docs/specs/SPEC-017-end-user-remote-access.md
+++ b/docs/specs/SPEC-017-end-user-remote-access.md
@@ -0,0 +1,180 @@
+# SPEC-017: End-User (Sub-User) Remote Access
+
+**Status:** Proposed
+**Priority:** P2 (may settle to P3 depending on client demand)
+**Requested By:** Mike (2026-06-02)
+**Estimated Effort:** Large
+
+## Overview
+
+Let a client pay for their own employees to remotely reach **their own work machines** from home
+through GuruConnect — the Splashtop-Business / unattended-end-user-access model, layered on top of the
+MSP-technician console GuruConnect ships today. An MSP admin (or, later, a delegated client-company
+admin) provisions a list of **end-users** and grants each one access to specific managed machines. The
+end-user signs into a locked-down **end-user portal**, sees only the machines granted to them, and
+connects — reusing the existing persistent-agent + session-scoped-viewer-token + relay path.
+
+Success criteria: an `end_user`-role account can log in at a separate portal, see exactly the machines
+in its grant set (no others, and none from another tenant), and launch a control session to an online
+granted machine. It is hard-denied from every technician/admin API, the agent plane, and any machine it
+was not granted, and each login and machine access is written to the audit log.
+
+This is a net-new **sellable capability**, not a console-MVP blocker. It is sequenced after the v2
+console foundations it depends on (tenancy, machine identity, persistent enrollment), which is why it is
+P2 rather than P1.
+
+## Scope
+
+### Included in v1
+- A new **`end_user`** value for `users.role`, provisioned by an MSP admin, with **deny-by-default**
+  authority: no console permissions, no agent-plane access, machine reach limited strictly to its
+  `user_client_access` grant set within its own tenant.
+- A **separate end-user login + portal** route (locked-down): lists only granted machines with
+  online/offline state and a Connect action. No admin nav, no other users/machines/companies.
+- **Admin UI + API** to create/disable end-users and assign/revoke per-machine grants, reusing the
+  existing `user_client_access` table.
+- **Connect flow** that reuses the landed session-scoped viewer-token mechanism (`ViewerClaims`,
+  `jwt.rs:114`) and the relay enforcement path — no new transport.
+- A new `connect_sessions.source` value **`end_user`** (migration widening the existing CHECK).
+- **Audit**: end-user login success/failure and each machine-access grant-check written to
+  `connect_session_events`.
+- Rate limiting + lockout on the public end-user login.
+
+### Explicitly out of scope (v1)
+- **Directory sync (AD / Entra-365 / Google) → end-user list** — its own future spec; v1 is manual
+  list management only.
+- **Self-service seat purchasing / billing automation.** v1 at most records/counts seats per tenant; real
+  metering and Syncro/billing wiring are deferred.
+- **Delegated client-company-admin role** (a client managing its own end-users/grants) — noted as a
+  fast-follow; v1 grants are MSP-admin-managed.
+- Per-session view-only-vs-control *policy* per end-user (v1 = Control of one's own machine; the
+  `ViewerAccess` split still exists at the token layer).
+- File transfer, session recording (already out of scope for the broader product v1).
+
+## Architecture
+
+### Principal model — `end_user` is a constrained variant of the login plane
+GuruConnect already has three credential planes that must stay separate (audit-hardened in v2 Phase 1):
+1. **Login `Claims`** (`jwt.rs:11`) — dashboard users; `role ∈ {admin, operator, viewer}` today.
+2. **Session-scoped `ViewerClaims`** (`jwt.rs:114`) — 5-min, one session, `purpose=viewer`.
+3. **Agent `cak_` keys** (`connect_agent_keys`, migration 004) — agents only.
+
+`end_user` is added as a **fourth role on the login plane** — it issues a normal login JWT
+(`create_token`, `jwt.rs:161`) carrying `role: "end_user"` and an **empty permission list**. The
+separation guarantees the v2 audit established are preserved: an `end_user` JWT still cannot be used as
+a viewer token (lacks `purpose`) nor as an agent key (agent plane rejects user JWTs).
+
+**Critical authz inversion:** `user_client_access` today documents "no entries = access to all (for
+admins)" (migration 002, lines 25-26). The grant check **must branch on role**: for `end_user`, an
+empty grant set means **zero** machines, never all. Authz is deny-by-default and grant-scoped; the
+admin-bypass in `Claims::has_permission` (`jwt.rs:28-33`) must never fire for `end_user`.
+
+### Agent / Relay-server / Viewer / Dashboard responsibilities
+- **Agent:** no changes. End-users connect to existing **persistent/unattended** managed agents
+  (consent `not_required` — it is the user's own machine). Optionally honors the SPEC-015 notification
+  overlay if a per-machine policy requires it.
+- **Relay-server:** no transport change. New end-user auth + portal + connect endpoints; the
+  grant-check + viewer-token mint is the only new server logic on the hot path.
+- **Viewer:** reuse the React/TS web viewer (`dashboard/src/components/RemoteViewer.tsx`) — the
+  end-user portal embeds the same component with a Control-mode viewer token.
+- **Dashboard:** new **role-gated end-user portal** route (recommended separate from the technician
+  console — see Open Questions), plus admin screens for end-user + grant management.
+
+### Database (migrations)
+- **`user_client_access`** — reused as the grant table; no schema change (already
+  `user_id UUID × client_id UUID → connect_machines(id)`, unique pair, migration 002).
+- New migration `011_end_user_access.sql`:
+  - Widen `connect_sessions.source` CHECK to `('standalone','gururmm','end_user')` (currently
+    `('standalone','gururmm')`, migration 004 lines 99-102).
+  - Optional `users` columns for the external principal: `mfa_secret TEXT NULL`,
+    `must_change_password BOOLEAN NOT NULL DEFAULT false`, and a partial index for fast
+    `role='end_user'` listing per `tenant_id`.
+  - (Seat tracking, if landed in v1: a lightweight per-tenant `end_user` count view or a
+    `tenant_seats` row — kept minimal.)
+- Grants are tenant-contained: insert path validates `machine.tenant_id == end_user.tenant_id`.
+
+### API endpoints / WS messages
+- `POST /api/enduser/auth/login` — public, rate-limited; returns an `end_user` login JWT.
+- `GET  /api/enduser/machines` — lists only the caller's granted, in-tenant machines + presence.
+- `POST /api/enduser/machines/:id/connect` — grant-checked; creates a `source=end_user` session and
+  mints a Control `ViewerClaims` token (`create_viewer_token`, `jwt.rs:233`) for that session.
+- Admin: `POST /api/users` (role=end_user), `POST /api/users/:id/grants`,
+  `DELETE /api/users/:id/grants/:machine_id`, `GET /api/users?role=end_user`.
+- No new protobuf messages — the WS viewer path and `guruconnect.proto` are unchanged.
+
+## Implementation details
+- `server/src/auth/jwt.rs` — extend the role vocabulary doc (`Claims.role`, line 16-17); add an
+  `is_end_user()` helper and ensure `has_permission` cannot grant `end_user` anything beyond explicit
+  permissions (the admin short-circuit at line 30 must be guarded).
+- `server/src/auth/mod.rs` — `AuthenticatedUser` (line 29+) gains role-aware helpers; add an extractor
+  / middleware that rejects non-`end_user` on the `/api/enduser/*` namespace and rejects `end_user` on
+  every console/admin route (deny-by-default allowlist).
+- `server/src/api/` — new `enduser` handler module (login, machines, connect); admin user+grant
+  handlers extended for `role=end_user` and `user_client_access` writes.
+- Grant check (shared fn): `machine_id ∈ user_client_access[user] AND machine.tenant_id == user.tenant_id`;
+  used by both `GET /machines` and `connect`.
+- Session create stamps `source='end_user'`, `is_managed=true`/unattended, `consent_state='not_required'`,
+  then mints the viewer token via the existing path so relay enforcement is unchanged.
+- `dashboard/src/` — end-user portal route (role-gated), reusing `RemoteViewer.tsx`; admin grant-matrix
+  UI. White-label (SPEC-014) applies to the portal as the most client-facing surface.
+- Migration `server/migrations/011_end_user_access.sql` as above (idempotent; applied by
+  `sqlx::migrate!` per the migration standard).
+
+## Security considerations
+- **Preserve the plane separation** audited in v2 Phase 1 — `end_user` is login-plane only; it can
+  never satisfy `validate_viewer_token` or the agent `cak_` path.
+- **Deny-by-default, grant-scoped:** empty `user_client_access` for an `end_user` = no access; the
+  admin-bypass must not apply. Every `/api/enduser/*` call re-checks the grant + tenant server-side
+  (never trust a machine id from the client).
+- **Tenant containment:** an `end_user` and its grants live in one tenant; cross-tenant grants are
+  rejected at write and re-validated at connect. (Full tenant isolation lands with Phase 4; v1 enforces
+  via explicit `tenant_id` equality checks.)
+- **External-user trust:** these accounts are reachable from the public internet (users sign in from
+  home). Require rate-limiting + lockout on `/api/enduser/auth/login`; support, and preferably require,
+  **TOTP MFA** for `end_user`. The `mfa_secret` column is included in the schema so MFA can land in v1
+  or as an immediate fast-follow without a second migration. Passwords use Argon2id (existing standard).
+- **Audit:** log each end-user login (success/failure, source IP) and each machine access to
+  `connect_session_events`; the unattended access is to the user's *own* machine but must be fully
+  traceable. Optionally enforce the SPEC-015 overlay per machine policy.
+- **Threat model:** stolen end-user creds reach only that user's granted machines (blast radius =
+  grant set), never the console, never the agent plane, never another tenant. Disabling the account
+  (`users.enabled=false`) immediately revokes portal + future tokens; the 5-min viewer-token TTL bounds
+  any in-flight session.
+
+## Testing strategy
+- **Unit:** grant-check fn (granted / not-granted / cross-tenant / empty-set-for-end_user = deny);
+  `has_permission` never elevates `end_user`; role-namespace middleware (end_user→console = 403,
+  technician→/api/enduser = 403).
+- **Integration:** end-user login → list shows only granted machines → connect mints a Control viewer
+  token for a `source=end_user` session → relay admits; connect to a non-granted / other-tenant machine
+  → 403; disabled account → login + token use rejected.
+- **Manual:** full portal walkthrough from an off-network browser; MFA enrol + challenge; audit rows
+  present for login and access; white-label branding renders on the portal.
+
+## Effort estimate & dependencies
+- **Size:** Large (new principal + portal + admin grant UI + auth namespace; transport/agent untouched
+  and the grant table already exists, which holds it below X-Large).
+- **Depends on (must precede / strongly preferred):**
+  - **Tenancy** (`tenants` + `tenant_id`, migration 004) — needed for containment; full isolation is
+    Phase 4 but v1 uses explicit tenant checks.
+  - **Stable machine identity + persistent enrollment** (SPEC-004 / 008 `machine_uid`, SPEC-016
+    zero-touch `cak_`) — end-users reach persistent managed agents.
+  - **Session-scoped viewer tokens** (v2 Phase 1, landed) — reused directly.
+- **Pairs with:** SPEC-014 (white-label — the portal is the client-facing surface), SPEC-003/005
+  (machine inventory/list — portal machine rows), SPEC-015 (optional connect-notification overlay).
+- **Unblocks:** the directory-sync spec (AD/Entra/Google → end-user list), delegated client-admin role,
+  and per-seat billing — all of which build on the `end_user` principal defined here.
+
+## Open questions
+1. **Same console vs separate end-user portal?** Recommendation: **separate, role-gated route** —
+   smaller attack surface, no risk of leaking technician controls, cleaner white-label. Confirm before
+   build.
+2. **End-users in the existing `users` table (role=end_user) vs a dedicated `end_users` table?**
+   Recommendation: reuse `users` (the grant FK `user_client_access.user_id` already points there) with
+   hard role guardrails. Revisit if mixing external + internal principals in one table proves risky.
+3. **MFA in v1 or immediate fast-follow?** Schema is included either way; decide enforcement timing.
+4. **Who administers grants in v1** — MSP admin only (assumed), or ship the delegated client-company
+   admin role together? (Affects scope/effort materially.)
+5. **Seat/licensing enforcement depth for v1** — count-and-display vs hard-cap vs billing-integrated.
+6. **Default access mode** — Control assumed (own machine); should an admin be able to pin a machine to
+   view-only for a given end-user? (Token layer already supports it.)
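
The deny-by-default grant check and the role-namespace gate that SPEC-017 describes can be sketched roughly as below. This is an illustration only and is not part of the commit: `Role`, `Principal`, `Machine`, `end_user_may_connect`, and `namespace_allows` are hypothetical names, the ids are `u32` stand-ins for the UUIDs the spec mentions, and the in-memory `grants` set stands in for a `user_client_access` lookup.

```rust
use std::collections::HashSet;

/// Login-plane roles. `EndUser` is the role proposed by the spec; the
/// enum itself is an assumption made for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    Admin,
    Operator,
    Viewer,
    EndUser,
}

/// Hypothetical, simplified view of an authenticated caller.
pub struct Principal {
    pub role: Role,
    pub tenant_id: u32,
    /// Machine ids this user was granted (stand-in for `user_client_access` rows).
    pub grants: HashSet<u32>,
}

/// Hypothetical, simplified machine record.
pub struct Machine {
    pub id: u32,
    pub tenant_id: u32,
}

/// Grant check for the end-user plane: the caller must be an `end_user`,
/// the machine must be in the caller's tenant, and the machine must be in
/// the grant set. An empty grant set therefore yields no access, and there
/// is no admin bypass on this path.
pub fn end_user_may_connect(user: &Principal, machine: &Machine) -> bool {
    user.role == Role::EndUser
        && user.tenant_id == machine.tenant_id
        && user.grants.contains(&machine.id)
}

/// Namespace gate for authenticated requests: `end_user` is allowed only
/// under `/api/enduser/`, and every other role is rejected there.
/// (The public login endpoint is unauthenticated and is not covered here.)
pub fn namespace_allows(role: Role, path: &str) -> bool {
    let is_enduser_path = path.starts_with("/api/enduser/");
    match role {
        Role::EndUser => is_enduser_path,
        Role::Admin | Role::Operator | Role::Viewer => !is_enduser_path,
    }
}
```

Under these assumptions the same `end_user_may_connect` predicate would back both the machine listing and the connect endpoint, so a machine that is never listed can also never be connected to.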