claudetools/projects/msp-tools/guru-rmm/ROADMAP.md
Mike Swanson 733d87f20e Dataforth UI push + dedup + refactor, GuruRMM roadmap evolution, Azure signing setup
Dataforth (projects/dataforth-dos/):
- UI feature: row coloring + PUSH/RE-PUSH buttons + Website Status filter
- Database dedup to one row per SN (2.89M -> 469K rows, UNIQUE constraint added)
- Import logic handles FAIL -> PASS retest transition
- Refactored upload-to-api.js to render datasheets in-memory (dropped For_Web filesystem dep)
- Bulk pushed 170,984 records to Hoffman API
- Statistical sanity check: 100/100 stamped SNs verified on Hoffman

GuruRMM (projects/msp-tools/guru-rmm/):
- ROADMAP.md: added Terminology (5-tier hierarchy), Tunnel Channels Phase 2,
  Logging/Audit/Observability, Multi-tenancy, Modular Architecture,
  Protocol Versioning, Certificates sections + Decisions Log
- CONTEXT.md: hierarchy table, new anti-patterns (bootstrap sacred,
  no cross-module imports), revised next-steps priorities

Session logs for both projects.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 17:39:32 -07:00


GuruRMM - Feature Roadmap & Change Requests

Tracked list of desired features, improvements, and changes. Used to evaluate whether the current codebase supports these goals or if a rewrite is needed.

Last Updated: 2026-04-15


Terminology (canonical)

Decided 2026-04-15. Use these exact terms in code, UI, API, docs, and conversation. Don't invent synonyms.

| Tier | Term | DB column | Meaning | Example |
|---|---|---|---|---|
| 1 | Platform | | The software author (us) | GuruRMM |
| 2 | Partner | tenant_id | An MSP, a paying customer of the Platform | "Acme IT Services" |
| 3 | Client | client_id | A Partner's customer | "Dataforth Corp" |
| 4 | Site | site_id | A location or logical grouping within a Client | "Dataforth Tucson HQ" |
| 5 | Agent | agent_id | An endpoint at a Site | AD2, SL-SERVER |

Notes:

  • UI/API use "Partner"; DB uses tenant_id (industry-standard term for isolation). Do not rename tenant_id in code.
  • "Client" may collide with HTTP-client terminology in context; when ambiguous, use "client org" or "client account".
  • Site is not always a physical location — can be a DMZ, VLAN, cloud region, whatever grouping makes sense for that Client.
  • Do not use "sub-tenant" or "customer" (ambiguous across tiers).
  • User roles: Platform admin (us), Partner admin, Partner tech, Client contact (limited read access to their own data).
  • Optional Department/OU tier inside a Site is deferred until a real customer asks for it.
  • MSPs can label-override their UI via partner_settings.label_overrides JSONB (e.g., rename "Client"→"Customer" for their branded view) — supported without schema changes.

API path convention: /api/public/v1/partners/{partner_id}/clients/{client_id}/sites/{site_id}/agents/{agent_id}

Event bus topic convention: agent.online, site.created, client.deleted, partner.upgraded, etc.
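
The two conventions above can be sketched as small helpers. This is a minimal illustration, not the project's code: the `AgentPath` struct, its field names, and the `topic` function are hypothetical, and only the path shape and the `entity.event` topic form come from this roadmap.

```rust
/// Identifiers for one agent's position in the five-tier hierarchy.
/// Hypothetical type: only the path shape comes from the roadmap.
pub struct AgentPath<'a> {
    pub partner_id: &'a str,
    pub client_id: &'a str,
    pub site_id: &'a str,
    pub agent_id: &'a str,
}

impl AgentPath<'_> {
    /// Build the public API path for this agent, following the
    /// Partner > Client > Site > Agent nesting.
    pub fn api_path(&self) -> String {
        format!(
            "/api/public/v1/partners/{}/clients/{}/sites/{}/agents/{}",
            self.partner_id, self.client_id, self.site_id, self.agent_id
        )
    }
}

/// Build an event bus topic as `<entity>.<event>`, lowercased so that
/// "Agent"/"online" and "agent"/"online" produce the same topic.
pub fn topic(entity: &str, event: &str) -> String {
    format!("{}.{}", entity.to_lowercase(), event.to_lowercase())
}
```

Lowercasing in one place is a design choice of this sketch; it keeps topic strings from drifting in case between publishers.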


Dashboard / UI

| # | Feature | Priority | Status | Notes |
|---|---|---|---|---|
| D1 | All metrics clickable to relevant content | High | Done | Stat cards link to filtered agent views |
| D2 | Dark theme with branded sidebar | High | Done | JetBrains Mono + Plus Jakarta Sans, GURURMM MISSION CONTROL branding |
| D3 | Command cancel/delete/clear history | Medium | Done | Cancel pending/running, delete any, bulk clear finished |
| D4 | Global search across all agent details | High | Open | Search by hostname, MAC, IP, OS, version -- any agent field. Dashboard main page. |
| D5 | Clickable metric cards on agent detail -> drill-down views | High | Open | CPU card -> process list sorted by CPU%. Memory card -> process list sorted by RAM. Disk card -> drive/folder usage breakdown. Sortable tables. |
| D6 | Real-time terminal (PS/cmd) via WebSocket tunnel | High | Open | Interactive shell session relayed through server. Separate from check-in process. Spawns on demand, full bidirectional I/O. |
| D7 | Remote file system browser | High | Open | Browse, upload, download, rename, delete files on agent. Tree view + detail pane. Via real-time tunnel. |
| D8 | Remote registry editor (Windows) | Medium | Open | Browse/edit/create/delete registry keys and values. Tree view like regedit. Via real-time tunnel. |
| D9 | Remote services manager | High | Open | List all services with status. Start/stop/restart/disable/enable/edit startup type. Sortable, searchable. Via real-time tunnel. |
| D10 | | | | |

Agent / Installer

| # | Feature | Priority | Status | Notes |
|---|---|---|---|---|
| A1 | Site-code-based installers (no API keys) | High | Done | /install/:site_code/* endpoints, binary with embedded config |
| A2 | Public shareable install links per client | High | Done | Landing page at /install/:site_code with OS detection |
| A3 | Capture full OS detail (distro/version) | High | Open | Linux agents just report "linux" -- should capture distro name and version (e.g., Ubuntu 22.04, Debian 12). Agent-side change to collect, server-side to store/display. |
| A4 | Reliable CPU/GPU temperature collection | High | Open | Not working on any machine currently. Windows: WMI/OpenHardwareMonitor/LibreHardwareMonitor. Linux: lm-sensors/sysfs thermal zones. Need fallback chain. |
| A5 | Process list collection (CPU%, RAM, disk I/O) | High | Open | Needed for D5 drill-downs. Agent collects top processes, sends on demand or as part of extended state. |
| A6 | Disk usage detail (per-drive, large folders) | Medium | Open | Needed for D5 disk drill-down. Per-partition usage + optional large folder scan. |
| A7 | | | | |
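
For A3, one common source of distro name and version on Linux is the os-release file (typically /etc/os-release), which holds `KEY=value` lines with optionally quoted values. The sketch below parses that text; it is an illustration under stated assumptions, not the agent's collector. The `OsDetail` type and `parse_os_release` function are hypothetical names, and reading the file from disk is left to the caller.

```rust
use std::collections::HashMap;

/// Distro name and version as the agent might report them (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct OsDetail {
    pub name: String,
    pub version: String,
}

/// Parse the text of an os-release file: KEY=value lines, values optionally
/// wrapped in single or double quotes, `#` lines treated as comments.
/// Returns None when NAME is missing.
pub fn parse_os_release(text: &str) -> Option<OsDetail> {
    let mut fields: HashMap<&str, String> = HashMap::new();
    for line in text.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        if let Some((key, value)) = line.split_once('=') {
            let value = value.trim().trim_matches(|c: char| c == '"' || c == '\'');
            fields.insert(key.trim(), value.to_string());
        }
    }
    let name = fields.get("NAME")?.clone();
    // VERSION_ID is optional in the format; fall back to an empty string.
    let version = fields.get("VERSION_ID").cloned().unwrap_or_default();
    Some(OsDetail { name, version })
}
```

A caller would pass the file contents, for example the result of `std::fs::read_to_string("/etc/os-release")`, and send the resulting name and version with the agent's state.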

Server / API

| # | Feature | Priority | Status | Notes |
|---|---|---|---|---|
| S1 | Claude Code integration (claude_task command type) | Medium | Planned | gururmm-agent project has the Rust module, not yet integrated |
| S2 | Stackable/inheritable policy system | High | Open | Policies at Client > Site > Agent levels (canonical terms). A more specific level overrides a less specific one. Merge behavior for non-conflicting settings. |
| S3 | Dynamic groups based on agent attributes | High | Open | Rule-based groups (e.g., RAM <= 8GB, OS = Windows 10, disk > 90%). Policies can target dynamic groups. |
| S4 | Policy actions: custom script execution | High | Open | Policies can trigger scripts (PowerShell/bash) on matching agents. Scheduled or on-demand. |
| S5 | Customizable alerting system | High | Open | User-defined alert rules: offline detection, disk space thresholds, SMART errors, RAID degradation, bad sectors, CPU/RAM sustained high, temp thresholds. Configurable severity, notification channels, escalation. |
| S6 | Alert notification channels | Medium | Open | Email, webhook, Slack/Teams integration, push notifications. Per-alert-rule routing. |
| S7 | Real-time tunnel mechanism (separate from check-in) | High | Phase 1 Done | Session lifecycle REST+WS+DB+agent state machine complete (2026-04-14 / verified 2026-04-15). Phase 2 (channels) tracked under Tunnel Channels section below. |
| S8 | Closed-session status endpoint returns 403 | Medium | Open | GET /api/v1/tunnel/status/{id} returns 403 for closed sessions (should return {status: closed}). Root cause: verify_session_ownership() applies WHERE status='active' before ownership check. Fix in server/src/db/tunnel.rs:94-103. |
| S9 | | | | |
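
The override-plus-merge behavior described in S2 can be sketched as a layered map merge. This is a minimal sketch under assumptions: the roadmap does not define the policy schema, so policies are modelled here as flat string key/value settings, and the names `Policy`, `effective_policy`, and `policy` are hypothetical. Layers are passed from least specific to most specific (for example Client, then Site, then Agent).

```rust
use std::collections::BTreeMap;

/// A policy modelled as flat key/value settings (an assumption;
/// the roadmap does not define the policy schema).
pub type Policy = BTreeMap<String, String>;

/// Merge policies ordered from least specific to most specific.
/// Later layers override earlier ones on conflicting keys;
/// non-conflicting keys from every layer are kept.
pub fn effective_policy(layers: &[&Policy]) -> Policy {
    let mut merged = Policy::new();
    for layer in layers {
        for (key, value) in layer.iter() {
            merged.insert(key.clone(), value.clone());
        }
    }
    merged
}

/// Small helper for building a policy from string pairs.
pub fn policy(pairs: &[(&str, &str)]) -> Policy {
    pairs
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}
```

With a client-level `patch_window` and a site-level `patch_window`, the site value wins, while a client-only setting such as `av` still reaches the agent.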

Tunnel Channels (Phase 2)

On-demand capabilities layered on top of the tunnel session framework. Each channel is a typed WebSocket payload pair (request/response) routed by channel_id under an open tech_session. All channel operations are audited as described in the Logging, Audit & Observability section (see L5).

| # | Feature | Priority | Status | Notes |
|---|---|---|---|---|
| T1 | Terminal channel (interactive shell) | High | Open | TunnelDataPayload::Terminal { command } → TerminalOutput { stdout, stderr, exit_code } (types exist in server/src/ws/mod.rs:310-319, agent stub at agent/src/transport/websocket.rs:408-434). Implement via tokio::process::Command with configurable timeout (default 30s). 80% of field use cases. Ship before other channels. |
| T2 | File channel (upload/download/rename/delete + tree browse) | High | Open | Covers D7. Stream file bytes in chunks over WS with progress. Path safety (no .. traversal). Needs allowlist vs freeform decision. |
| T3 | Registry channel (Windows) | Medium | Open | Covers D8. Read/write/create/delete keys + values. Use winreg crate. Gate to tenant admins only. |
| T4 | Service channel (Windows services) | High | Open | Covers D9. List/start/stop/restart/change-startup-type. windows-service crate. |
| T5 | Tech-side tunnel subscriber | High | Open | Blocks all channels. Browser currently has no mechanism to receive tunnel data from server. Design: GET /api/v1/tunnel/stream/{session_id} WebSocket + in-memory HashMap<session_id, mpsc::Sender<TunnelData>> pub-sub. |
| T6 | Server-side forward path | High | Open | server/src/ws/mod.rs:808-825 currently logs+drops incoming AgentMessage::TunnelData. Wire to T5 pub-sub + tunnel_audit INSERT. |
| T7 | Working directory / shell choice / elevation decisions | High | Open | Terminal channel design decisions: cwd allowlist vs free-form; PowerShell vs cmd on Windows; admin elevation gating by role. |
| T8 | Channel concurrency + rate limits | Medium | Open | Multiple channels in one session. Per-channel rate/quota. Output size cap (default 1 MB/command). |
| T9 | | | | |
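
The "no .. traversal" requirement in T2 can be sketched as a lexical check with `std::path::Component`. This assumes the file channel confines requests to a single root directory, which the roadmap has not decided (allowlist vs freeform is still open); the function name `resolve_under` is hypothetical. The check is purely lexical and does not resolve symlinks, so a real implementation would likely need a further check on the resolved path.

```rust
use std::path::{Component, Path, PathBuf};

/// Resolve a tech-supplied relative path under `root`, rejecting anything
/// that could escape it. Lexical check only: symlinks are not resolved here.
pub fn resolve_under(root: &Path, requested: &str) -> Option<PathBuf> {
    let mut resolved = root.to_path_buf();
    for component in Path::new(requested).components() {
        match component {
            // Plain names are appended as-is.
            Component::Normal(part) => resolved.push(part),
            // "." is harmless and skipped.
            Component::CurDir => {}
            // "..", a leading root, or a drive prefix could leave `root`.
            Component::ParentDir | Component::RootDir | Component::Prefix(_) => return None,
        }
    }
    Some(resolved)
}
```

Rejecting every `..` outright, rather than normalizing it away, is the stricter of the two options and keeps the rule easy to audit.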

Logging, Audit & Observability

Three-tier design decided 2026-04-15. Each tier has distinct purpose, storage, retention, and consumer.

Design principles:

  • Agent self-logging uses OS-native mechanisms (no custom transport). Troubleshoot with familiar tools.
  • Client machine health via OS event log pulls. Feeds dashboard and alerting.
  • Tunnel audit captured directly to RMM DB. Non-negotiable, never scrubbed, designed for legal/compliance retention.

| # | Feature | Priority | Status | Notes |
|---|---|---|---|---|
| L1 | Agent self-logging via OS-native sinks | High | Open | Windows Event Log (custom GuruRMM-Agent provider registered at install), Linux systemd/journald (tracing → stdout when run as unit), macOS unified log (os_log crate). Verbosity per-tenant configurable. Default INFO. |
| L2 | Client event log pull + summarize | High | Open | Agent polls OS event log on schedule; ships filtered events to server client_events table. Windows: Get-WinEvent -FilterHashtable @{LogName='System'; Level=1,2} -MaxEvents N (LogName shown as an example). Linux: journalctl -p err --output json. macOS: log show --predicate 'messageType == error' --style json. |
| L3 | L2 cadence: default 15-min delta poll + on tunnel open/close | High | Open | Default 900s. On tunnel open: force delta pull so tech has fresh context. On tunnel close: force delta pull to capture anything tech's actions triggered. Configurable per-tenant in dashboard. |
| L4 | L2 levels: default Critical + Error + Warning | High | Open | Configurable per-tenant. Default: Critical(1), Error(2), Warning(3). Separate "noisy" bucket (Info/Debug/Audit/Notification) pulled every 4h default. |
| L5 | Tunnel audit: every tech action persisted | High | Open | Reuse existing tunnel_audit table (migration 010, unused today). Every command, file op, registry op, service op gets INSERT with session_id, channel_id, operation, details JSONB. No scrubbing: must retain sensitive input if a tech types it. |
| L6 | Retention config | High | Open | client_events: 90 days default, tenant configurable. tunnel_audit (live): 90 days default, tenant configurable. tunnel_audit (archive): indefinite, system-level rotation to object storage. Agent self-logs follow OS-native retention policy. |
| L7 | Tunnel audit archive rotation | High | Open | Monthly job: aged partitions of tunnel_audit → compressed JSONL or Parquet in S3/R2/MinIO. Naming: tunnel_audit/tenant_id={uuid}/year={YYYY}/month={MM}.jsonl.gz. Dashboard "deep search" endpoint queries archive on demand (Athena/DuckDB). |
| L8 | Agent config push | High | Open | On agent WS connect, server sends ServerMessage::Config { tenant_settings }. Real-time updates when tenant admin changes settings in dashboard. Agent adjusts poll cadence + event level filters live without restart. |
| L9 | Dashboard surfaces for L2 (client_events) | Medium | Open | Red-number badge on agent tile (count of unresolved errors last 24h). Time-sorted feed on agent detail page with filter/search. Acknowledge/dismiss individual events. |
| L10 | Sensitive-data-at-rest protection | High | Open | tunnel_audit may contain unscrubbed credentials. Postgres TDE (not part of community PostgreSQL; requires a vendor distribution or extension) or full-disk encryption on server. Access to audit tables strictly admin-role-gated. Meta-audit: log every SELECT on tunnel_audit to separate table. Document in tech SOP: "every tunnel keystroke is logged." |
| L11 | | | | |
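
The L3/L4 defaults amount to a two-bucket schedule: Critical, Error, and Warning events are pulled every 900 seconds, and everything else every 4 hours. A minimal sketch of that mapping follows. The type and function names (`Level`, `Bucket`, `PollConfig`, `bucket_for`, `poll_interval_secs`) are hypothetical; only the default values and the level grouping come from this roadmap, and per-tenant overrides are represented simply by passing a different `PollConfig`.

```rust
/// Severity levels named in L4.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Level {
    Critical,
    Error,
    Warning,
    Info,
    Debug,
    Audit,
    Notification,
}

/// Which poll schedule a level belongs to.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Bucket {
    Priority,
    Noisy,
}

/// Poll cadences in seconds; a tenant override would supply other values.
#[derive(Debug, Clone, Copy)]
pub struct PollConfig {
    pub priority_secs: u64,
    pub noisy_secs: u64,
}

impl Default for PollConfig {
    fn default() -> Self {
        // Roadmap defaults: 900s delta poll, 4h for the noisy bucket.
        PollConfig { priority_secs: 900, noisy_secs: 4 * 60 * 60 }
    }
}

/// Default grouping from L4: Critical/Error/Warning vs. everything else.
pub fn bucket_for(level: Level) -> Bucket {
    match level {
        Level::Critical | Level::Error | Level::Warning => Bucket::Priority,
        Level::Info | Level::Debug | Level::Audit | Level::Notification => Bucket::Noisy,
    }
}

/// Seconds between polls for events of the given level.
pub fn poll_interval_secs(level: Level, cfg: &PollConfig) -> u64 {
    match bucket_for(level) {
        Bucket::Priority => cfg.priority_secs,
        Bucket::Noisy => cfg.noisy_secs,
    }
}
```

The forced pulls on tunnel open and close described in L3 are not modelled here; they would bypass the timer rather than change it.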

Multi-tenancy / MSP SaaS

Goal stated 2026-04-15: make this a marketable product for other MSPs. Multi-tenancy must be baked in from here on — adding tenant_id later would be a brutal migration.

| # | Feature | Priority | Status | Notes |
|---|---|---|---|---|
| M1 | Core tenancy schema | High | Open | New tables: tenants (id, name, plan, status, created_at), tenant_settings (tenant_id, key, value JSONB), msp_users (superadmins across tenants), tenant_users (tech ↔ tenant join with role). Add tenant_id UUID FK to: agents, tech_sessions, tunnel_audit, client_events, commands, any other per-customer table. |
| M2 | Tenant-scoped authorization | High | Open | JWT carries tenant_id + role. Every query must filter by tenant_id (middleware). Super-admin role bypasses for GuruRMM staff. Penalty for bugs here: data leakage across tenants. |
| M3 | Tenant admin dashboard | High | Open | UI for MSP admins to configure their tenant settings (L3/L4/L6 cadences, levels, retention). Super-admin can override across tenants. |
| M4 | Billing / licensing meter | Medium | Open | Per-agent-per-month is standard for RMM. Needs usage counter from day one. Consider Stripe Billing or manual invoicing to start. |
| M5 | Data residency options | Low | Open | Some MSPs require on-prem or regional hosting. Architectural impact: deployment model (single-tenant vs multi-tenant DB), encryption key management. Not required for MVP. |
| M6 | Tenant export API | Medium | Open | MSPs with SOC2/PCI customers will need to export their tenant's audit trail. GET /api/v1/tenants/{id}/export producing JSONL or Parquet. Self-service for portability. |
| M7 | Onboarding flow | High | Open | MSP signs up → tenant provisioned → first site created → install link generated → agent installs → first heartbeat → onboarding complete. End-to-end wizard. |
| M8 | | | | |
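
The M2 rule (every access is checked against the caller's tenant_id, with a super-admin bypass) can be sketched as a single guard function. This is an illustration under assumptions: `Claims`, `Role`, `AuthError`, and `authorize` are hypothetical names, role variants follow the Terminology section, and tenant IDs are plain strings here to keep the sketch dependency-free (the roadmap specifies UUIDs). JWT decoding and the Client-contact restriction to a single client are out of scope.

```rust
/// Roles from the Terminology section; PlatformAdmin plays the
/// "super-admin" part described in M2.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Role {
    PlatformAdmin,
    PartnerAdmin,
    PartnerTech,
    ClientContact,
}

/// Claims assumed to be carried by an already-verified JWT.
pub struct Claims {
    pub tenant_id: String,
    pub role: Role,
}

#[derive(Debug, PartialEq)]
pub enum AuthError {
    /// The caller tried to reach data owned by another tenant.
    CrossTenant,
}

/// Allow access only when the resource belongs to the caller's tenant,
/// or when the caller is Platform staff.
pub fn authorize(claims: &Claims, resource_tenant_id: &str) -> Result<(), AuthError> {
    if claims.role == Role::PlatformAdmin || claims.tenant_id == resource_tenant_id {
        Ok(())
    } else {
        Err(AuthError::CrossTenant)
    }
}
```

Putting the comparison in one function, called from middleware, means a cross-tenant bug has one place to hide instead of one per query.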

Infrastructure / Operations

| # | Feature | Priority | Status | Notes |
|---|---|---|---|---|
| I1 | Automate dark class injection in deploy | Low | Open | Vite strips class="dark" -- need Vite plugin or build script |
| I2 | Resolve stashed local changes on server | Medium | Open | git stash on 172.16.3.30 has divergent dev work |
| I3 | CI/CD webhook auto-builds on push | Low | Exists | webhook at /webhook/build, build-agents.sh -- needs dashboard build added |
| I4 | | | | |

Modular Architecture & Public APIs

Goal stated 2026-04-15: the product should be modular from inception. Future modules under consideration: PSA/CRM, remote syslog aggregation, backups, likely more. Both first-party (us) and eventually third-party (other developers, customers) should be able to build modules against stable, versioned interfaces. End users should also have API access to automate against their own data.

Architectural principles:

  • Core is thin + opinionated. Tenants, agents, auth, audit, command dispatch, tunnel framework — that's the "kernel." Everything else is a module.
  • Modules own their data. Each module owns a schema namespace (psa_*, backup_*, syslog_*) and never writes directly to another module's tables. Cross-module data access goes through module-exposed APIs.
  • Event bus for cross-cutting communication. Topics such as agent.online, tunnel.opened, command.completed, and client_event.received: core publishes, any module subscribes.
  • Public API is a first-class product surface, not an afterthought. OpenAPI spec, semver-versioned, rate-limited, key-authenticated, documented.
  • Boundary discipline: if it's tempting to reach across a module boundary, that's a signal to add an API there instead. Breaking this discipline once kills the modularity.

| # | Feature | Priority | Status | Notes |
|---|---|---|---|---|
| X1 | Core vs. module boundary definition | High | Open | Document what's "core" (tenants, agents, auth, audit, command dispatch, tunnel framework, bootstrap) vs. what's a module (everything else). Codify via separate crates / modules in the Rust workspace (core/, modules/psa/, modules/backups/, etc.). Enforce via build system: module code cannot use private core internals, only the exposed core::api::* surface. |
| X2 | Module manifest / registration | High | Open | Each module ships a module.toml declaring: name, version, provides (APIs exposed), consumes (events/APIs used), permissions required (read_agents, write_commands, read_audit, etc.). Loaded at server startup; dashboard reflects installed modules. |
| X3 | Event bus | High | Open | NATS JetStream or Redis Streams. Every significant core action emits a typed event (agent.online, agent.offline, tunnel.opened, tunnel.closed, command.completed, client_event.received, tenant.created). Modules subscribe via the bus, not via direct core calls. Decouples timing + enables async modules. |
| X4 | Module-to-core APIs | High | Open | Core exposes a stable in-process API for modules: core::agents::list(tenant_id), core::commands::enqueue(...), core::audit::record(...). Versioned like core_api_v1, core_api_v2. Modules declare which version they require. |
| X5 | Module-to-module APIs | Medium | Open | Modules can expose their own APIs for other modules to consume. Example: PSA module exposes psa::tickets::create() which a Backups module could call when a backup fails. All via the module registry; no direct imports. |
| X6 | Public REST API (for end users + integrations) | High | Open | Versioned under /api/public/v1/. OpenAPI 3.1 spec auto-generated. Rate-limited per API key. Scoped API keys (read-only / write / admin). Separate from internal /api/v1/ used by dashboard. Publish spec at /api/public/v1/openapi.json. |
| X7 | API key management | High | Open | Dashboard UI: tenants create/revoke/rotate API keys, scope per key, view last-used and usage stats. Keys carry tenant_id. JWT session tokens (for dashboard) are separate from API keys (for machines). |
| X8 | Public webhook subscriptions | High | Open | Tenants subscribe to events via webhook URL. Event bus (X3) feeds a delivery worker that signs payloads (HMAC), retries with backoff, tracks delivery status in DB. Lets customers integrate without polling. |
| X9 | Third-party module sandbox | Medium | Open | Future work. Options: (a) WebAssembly modules loaded at runtime with capability-based access to core APIs; (b) signed OCI container images run as sidecars with mTLS. (a) is better UX but carries maturity risk. (b) is ops-heavy but proven. Decide when third-party demand is real. |
| X10 | Module billing isolation | Medium | Open | Each module can have independent pricing (PSA seat-based, Backups GB-based, RMM per-agent). Core billing meter (M4) becomes per-module, aggregates to tenant invoice. Enable tenants to subscribe to some modules but not others. |
| X11 | Module upgrade independence | Medium | Open | Modules version independently of core. Core API versioning (X4) lets modules pin core_api_v2 and survive core updates. Dashboard shows which modules need upgrades for a new core release. |
| X12 | Module discoverability / marketplace | Low | Open | Eventually: marketplace UI for MSPs to browse/install first- and third-party modules. Signed+reviewed entries only. Revenue share for third-party developers. Far off; the only design constraint for now is not to paint ourselves into a corner. |
| X13 | | | | |
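
The publish/subscribe relationship in X3 (core emits typed events, modules subscribe by topic) can be illustrated with a tiny in-process bus. This is only a sketch of the pattern: the roadmap names NATS JetStream or Redis Streams as the candidate transports, and nothing below uses either. `Event`, `EventBus`, and the string payload are hypothetical stand-ins.

```rust
use std::collections::HashMap;

/// A published event; the payload is a plain string for illustration.
pub struct Event {
    pub topic: String,
    pub payload: String,
}

type Handler = Box<dyn Fn(&Event)>;

/// In-process stand-in for the event bus: topic name -> subscriber callbacks.
#[derive(Default)]
pub struct EventBus {
    subscribers: HashMap<String, Vec<Handler>>,
}

impl EventBus {
    pub fn new() -> Self {
        Self::default()
    }

    /// Register a module's handler for one topic, e.g. "agent.online".
    pub fn subscribe<F>(&mut self, topic: &str, handler: F)
    where
        F: Fn(&Event) + 'static,
    {
        self.subscribers
            .entry(topic.to_string())
            .or_default()
            .push(Box::new(handler));
    }

    /// Deliver an event to every subscriber of its topic.
    /// Returns how many handlers were invoked.
    pub fn publish(&self, event: &Event) -> usize {
        match self.subscribers.get(&event.topic) {
            Some(handlers) => {
                for handler in handlers {
                    handler(event);
                }
                handlers.len()
            }
            None => 0,
        }
    }
}
```

The publisher never names a module, which is the decoupling X3 is after; a networked bus adds durability and async delivery on top of the same shape.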

Module candidates currently in mind

Capture these now so the core API design has concrete use cases to validate against:

  • PSA/CRM module — tickets, time tracking, contracts, invoicing. Likely largest module, heaviest DB load. Consumes: agent.online, client_event.received, command.completed. Exposes: psa::tickets::create|assign|close, psa::time::log.
  • Remote Syslog module — aggregates syslog/Windows Event Log from customer devices to a central searchable store. Consumes: client_event.received. Exposes: syslog::query|subscribe. Heavy ingest.
  • Backups module — schedules, monitors, reports on backup jobs (Veeam, Datto, Acronis, Synology, etc.). Consumes: integrations with third-party backup products (pull). Exposes: backups::status|history|alert. Compliance-sensitive.
  • Patch management — track OS + app patch levels, schedule installs, report compliance.
  • Documentation (IT Glue-style) — customer environment docs, credential vault, runbooks. Deep integration with PSA (customer entity shared).
  • Remote access — already covered by core tunnel framework; could grow into its own "pro" module with session recording, MFA-gated elevation, etc.
  • Network monitoring — SNMP/ping monitoring of non-agent devices (switches, printers, UPSs).

Protocol Versioning & Stale-Agent Recovery

Problem surfaced 2026-04-15: as the codebase evolves (multi-tenancy pivot, tunnel channels, new message types), long-offline agents will return to find the wire format they knew is gone. Without an upgrade lane, those agents become zombies — visible in the dashboard as "offline for 47 days," never self-heal, require manual intervention (RDP in, uninstall, reinstall).

Concrete example: Scileppi VP laptop offline for days. When it wakes up and tries to check in with v0.6.0 against a server that by then expects v0.9.x protocol, we need the server to say "I see you, you're old, here's how to update yourself" — and have the agent auto-comply.

Design principle: the bootstrap/hello path is sacred. It must never break, even across major protocol revisions. All other endpoints and message shapes are allowed to change. An agent that can still reach /hello can always recover.

| # | Feature | Priority | Status | Notes |
|---|---|---|---|---|
| V1 | Protocol version negotiation on connect | High | Open | Agent sends {agent_version, protocol_version, os, arch} as first message. Server responds with {server_version, min_supported_protocol, latest_protocol, action} where action ∈ {proceed, upgrade_required, rejected}. WebSocket subprotocol header is one delivery option; a dedicated HTTP hello endpoint is another. Pick one, then never change its shape. |
| V2 | Stable bootstrap endpoint | High | Open | POST /api/v1/bootstrap/hello that accepts the agent handshake forever. Contract: input schema is additive-only (new optional fields OK, never rename/remove), output shape is additive-only. Agents as old as v0.1 must be able to hit this and get a meaningful response. |
| V3 | Compat shim layer per old protocol version | High | Open | When an old agent checks in, server translates between the old wire format and current internal types. Shim lives in server/src/compat/v{N}.rs. Each shim documents: which protocol versions it supports, what adapters it provides, planned removal date. |
| V4 | Server-initiated forced upgrade instruction | High | Open | When handshake returns action: upgrade_required, response also includes update_url, update_checksum, update_args, and optional restart_policy. Agent treats this as highest-priority command, bypasses normal command queue, upgrades + relaunches itself. |
| V5 | Agent self-update atomic rename (verify) | High | Exists (hardening needed) | Already done per 2026-04-01 ADR. Audit against V4 flow: does current updater handle "tell me exactly which version to install" vs. "upgrade to latest"? May need parameterization. |
| V6 | Per-version support matrix + sunset policy | High | Open | Dashboard surface: table showing N agents per protocol version per tenant. Automated sunset: when a protocol version has 0 live agents for 60 days across all tenants, flag compat shim for removal in next release. Manual override to force-remove earlier. |
| V7 | Agent version pinning per tenant | Medium | Open | MSP can opt tenants into "stable" (N-1), "current" (latest), or "beta" (preview) update channels. Controls auto-update rollout pace across their fleet. |
| V8 | Late check-in handling: accept then command | High | Open | On stale-agent connect: (a) accept the handshake via compat shim, (b) record the connect event in audit, (c) immediately enqueue the upgrade command, (d) agent executes before any other work. Dashboard shows agent as "upgrading" briefly before "online". |
| V9 | Graceful protocol deprecation warnings | Medium | Open | When an agent connects on a deprecated (but still supported) protocol, server sends a warning field in every response. Agent logs it. Gives MSPs lead time to upgrade their fleet before hard-removal. |
| V10 | Rollback path for bad upgrades | High | Open | If v0.N upgrade bricks agents, bootstrap endpoint must let an operator mark v0.N with action: downgrade_required (an addition to V1's action set) and ship an older binary. Requires keeping old binaries in /var/www/gururmm/downloads/ with pinned checksums. |
| V11 | | | | |
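
The V1 handshake decision can be sketched as a pure function from the agent's protocol version to an action. Several things here are assumptions, not decisions recorded in this roadmap: protocol versions are modelled as plain integers, an agent below `min_supported_protocol` gets `UpgradeRequired`, and an agent claiming a protocol newer than the server's latest gets `Rejected`. The names `Action`, `ServerPolicy`, and `negotiate` are hypothetical.

```rust
/// The three actions listed in V1.
#[derive(Debug, PartialEq)]
pub enum Action {
    Proceed,
    UpgradeRequired,
    Rejected,
}

/// Protocol range the server currently understands.
/// Integer versions are an assumption of this sketch.
pub struct ServerPolicy {
    pub min_supported_protocol: u32,
    pub latest_protocol: u32,
}

/// Decide how to answer an agent's hello.
pub fn negotiate(agent_protocol: u32, policy: &ServerPolicy) -> Action {
    if agent_protocol > policy.latest_protocol {
        // Assumption: a protocol newer than the server knows is refused.
        Action::Rejected
    } else if agent_protocol < policy.min_supported_protocol {
        // Too old to talk to directly: tell it to update itself (V4).
        Action::UpgradeRequired
    } else {
        // Within the supported window (possibly via a compat shim, V3).
        Action::Proceed
    }
}
```

Keeping this decision free of I/O makes it easy to test against every protocol version in the support matrix (V6).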

Certificates & Trust

Code signing and TLS/trust certificates required to ship + operate the product without install-time friction. Decisions 2026-04-15.

| # | Item | Priority | Status | Cost | Notes |
|---|---|---|---|---|---|
| C1 | Azure Trusted Signing: Windows agent + installer | High | In progress (2026-04-15) | ~$9.99/mo + per-sig fee | Hosted signing service. Bypasses hardware-token requirement that took effect June 2023. Public Trust level requires 3+ yrs business existence; Private Trust available immediately but limited usefulness. Identity verification via Microsoft takes days. See setup steps in session-logs/2026-04-15. |
| C2 | Apple Developer Program: macOS agent notarization | High | Open | $99/yr | Developer ID Application + Installer certs; notarization via xcrun notarytool; Hardened Runtime entitlements; ticket stapling for offline installs. Enrollment can take days, so start early. |
| C3 | GPG signing: Linux .deb / .rpm packages | High | Open | Free | Generate key pair, publish pubkey at a stable URL, sign packages with debsign/rpmsign, host signed apt/yum repo with proper Release/repomd.xml. |
| C4 | Timestamping: all signed artifacts | High | Open | Free | Use DigiCert or Sectigo public timestamp servers so signatures remain valid after the signing cert expires or is rotated. Verify in CI that every signed binary has a valid timestamp. |
| C5 | TLS automation for own domains | High | Done | Free | Cloudflare + Let's Encrypt already in place for rmm-api.azcomputerguru.com. Wildcard for *.gururmm.com when that domain lights up. |
| C6 | Per-Partner white-label custom domains | Medium | Open | ~$7/mo/domain via CF-for-SaaS, or DIY with ACME DNS-01 | Partners want rmm.theirbrand.com. Decide: host certs ourselves via ACME DNS-01 + Cloudflare API, or use Cloudflare for SaaS. Defer until first Partner asks. |
| C7 | Agent-to-server mTLS (enterprise option) | Low | Open | Internal CA + time | Self-signed CA + per-agent client certs. Bootstrap enrolls agent and issues cert scoped to agent_id. Adds install complexity. Defer until an enterprise customer demands it. |
| C8 | SBOM + Sigstore/cosign provenance | Medium | Open | Free | Auto-generate CycloneDX or SPDX SBOM per release. cosign sign artifacts + container images. Important for SOC2-conscious MSPs evaluating supply chain. |
| C9 | Windows Defender / vendor FP submission runbook | Medium | Open | | Despite valid signing, heuristic engines flag new binaries. Keep a runbook with submission portal links (Microsoft Security Intelligence, Malwarebytes, etc.). |
| C10 | Email sending trust: DKIM / SPF / DMARC | Medium | Open | Free | Required when PSA module sends ticket notifications. Set up on sending domain; per-Partner if white-labeled email is a feature. |
| C11 | WHQL driver signing | Deferred | Open | $$$ + weeks turnaround | Only if we ship a kernel driver. Avoid this path; use user-mode alternatives first. |
| C12 | | | | | |

Decisions Log

Short record of why things are the shape they are. Append, don't edit.

2026-04-15 — Tunnel Phase 1 verified live. End-to-end test from off-LAN workstation via rmm-api.azcomputerguru.com. Open/status/close lifecycle works. Confirmed nginx proxies /api/* (not just /downloads/). See session-logs/2026-04-15-session.md.

2026-04-15 — Logging split into three tiers. Decided against a single custom log transport. Agent self-logging to OS-native sinks (Event Viewer / journald / os_log). Client machine health via OS event log pulls. Tunnel audit direct to RMM DB. Rationale: sysadmins can troubleshoot with familiar tools; only high-value audit data hits our DB.

2026-04-15 — Tunnel audit is never scrubbed. If a tech types a password during a session, it gets stored. Purpose is to audit tech behavior, and scrubbing would undermine that. Offsetting controls: encryption at rest, admin-role-gated access, meta-audit of log views, tech SOP documentation. See L10.

2026-04-15 — Multi-tenancy from day one. Target market is MSPs reselling this product. Adding tenant_id retroactively after feature growth is a brutal migration; baking it in now is cheap. Every new table gets tenant_id FK from here forward.

2026-04-15 — Poll cadences. 15-min delta + on-tunnel-open/close for critical+error+warning. 4h bulk for info/debug/audit/notification. All tenant-configurable.

2026-04-15 — Retention. 90 days default for tenant-visible tables. Indefinite system-level for tunnel_audit with object-storage archive after the tenant-visible window. Legal/compliance contexts (HIPAA 6yr, PCI 1yr) handled by per-tenant extended retention configs.

2026-04-15 — Hierarchy terminology locked. Platform > Partner (MSP, DB: tenant_id) > Client > Site > Agent. API and UI say "Partner"; DB says tenant_id. No "sub-tenant", no ambiguous "customer". Department/OU tier deferred. MSPs can white-label labels via JSONB overrides. See Terminology section at top of this file.

2026-04-15 — Modular architecture from day one. Core = tenants + agents + auth + audit + commands + tunnel framework + bootstrap. Everything else = module. Modules own their schema namespace, never touch each other's tables, communicate via event bus (X3) and versioned module APIs (X4/X5). Public REST API (X6) separate from internal dashboard API. Webhook subscriptions (X8) for customer integrations. Third-party modules via WASM or signed containers — deferred but design-constrained now. Concrete module candidates: PSA/CRM, remote syslog, backups, patch management, IT-Glue-style docs, network monitoring. See X1-X12.

2026-04-15 — Bootstrap endpoint is sacred. Protocol version negotiation via a single /api/v1/bootstrap/hello endpoint whose input/output are additive-only forever. Every other endpoint/message is free to evolve. Enables late-arriving agents (Scileppi VP example: offline for days, wakes up to find a newer server protocol) to reconnect, get accepted, and receive an automatic upgrade instruction. Compat shim layer per old protocol version with automated sunset policy when fleet-wide usage hits zero. See V1-V10.

Rewrite Assessment

Criteria for rewrite:

  • If >50% of planned features require fighting the current architecture
  • If the tech stack is fundamentally wrong for the goals
  • If accumulated tech debt makes changes unreasonably slow

Current assessment (2026-04-15): The multi-tenancy pivot means a schema refactor is unavoidable (add tenant_id everywhere, tenancy-aware auth middleware). This is additive, not a rewrite. Rust + Axum + Postgres + WebSocket stack remains fit for purpose. Current code is a solid foundation. No rewrite planned; structural additions tracked above.