claudetools/projects/msp-tools/guru-rmm/session-logs/2026-04-15-session.md
Mike Swanson 733d87f20e Dataforth UI push + dedup + refactor, GuruRMM roadmap evolution, Azure signing setup
Dataforth (projects/dataforth-dos/):
- UI feature: row coloring + PUSH/RE-PUSH buttons + Website Status filter
- Database dedup to one row per SN (2.89M -> 469K rows, UNIQUE constraint added)
- Import logic handles FAIL -> PASS retest transition
- Refactored upload-to-api.js to render datasheets in-memory (dropped For_Web filesystem dep)
- Bulk pushed 170,984 records to Hoffman API
- Statistical sanity check: 100/100 stamped SNs verified on Hoffman

GuruRMM (projects/msp-tools/guru-rmm/):
- ROADMAP.md: added Terminology (5-tier hierarchy), Tunnel Channels Phase 2,
  Logging/Audit/Observability, Multi-tenancy, Modular Architecture,
  Protocol Versioning, Certificates sections + Decisions Log
- CONTEXT.md: hierarchy table, new anti-patterns (bootstrap sacred,
  no cross-module imports), revised next-steps priorities

Session logs for both projects.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 17:39:32 -07:00

GuruRMM Session Log — 2026-04-15

Context

End-to-end test of the Tunnel Phase 1 lifecycle, triggered opportunistically while troubleshooting SSH flakiness on AD2 (Dataforth project). No code changes — exercised the production API from an off-LAN workstation via the public Cloudflare endpoint (rmm-api.azcomputerguru.com).

What worked

| Step | Endpoint | Result |
| --- | --- | --- |
| Login | POST /api/auth/login | 200, token returned |
| List agents | GET /api/agents | 6 agents, AD2 and DESKTOP-0O8A1RL online on v0.6.0 |
| Open tunnel | POST /api/v1/tunnel/open (agent_id=AD2 d28a1c90-47d7-448f-a287-197bc8892234) | 200, {session_id: 0682a80c-a899-403b-9473-aaaed50e4aba, status: active} |
| Status while active | GET /api/v1/tunnel/status/{id} | 200, full session record (opened_at, last_activity, agent_id) |
| Close tunnel | POST /api/v1/tunnel/close | 200, {status: closed} |

Findings (actionable)

1. Status endpoint returns 403 after close

GET /api/v1/tunnel/status/{id} against a just-closed session returns 403 Forbidden ("Session not found or not owned by user") instead of {status: closed}. The likely root cause is that the WHERE status = 'active' filter (from idx_tech_sessions_active; see CONTEXT.md line 256) is applied to the status lookup in addition to the ownership check, so a closed session is never found, fails ownership verification, and falls through to the 403 branch.

Fix: separate the existence lookup from the ownership check. If the session exists and belongs to the requesting tech, return the closed record rather than masking it as a permission error.

Location to inspect: server/src/api/tunnel.rs (status handler) and/or server/src/db/tunnel.rs (session fetch query).
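The fix in Finding 1 can be sketched as a two-step lookup. This is a minimal illustration, not the actual handler: the in-memory HashMap stands in for the session fetch query, and TunnelSession, StatusError, and tunnel_status are hypothetical names that do not come from server/src/api/tunnel.rs.

```rust
use std::collections::HashMap;

/// Lifecycle states observed in this session log.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SessionStatus {
    Active,
    Closed,
}

/// Hypothetical stand-in for one row of the tech sessions table.
#[derive(Debug, Clone, PartialEq, Eq)]
struct TunnelSession {
    tech_id: u32,
    status: SessionStatus,
}

/// Outcomes the handler would map to HTTP status codes (assumed mapping).
#[derive(Debug, PartialEq, Eq)]
enum StatusError {
    NotFound,  // no session with this id at all (404)
    Forbidden, // session exists but belongs to another tech (403)
}

/// Step 1: fetch by id with no `status = 'active'` filter.
/// Step 2: check ownership. A closed session passes both steps,
/// so its status is reported instead of a permission error.
fn tunnel_status(
    sessions: &HashMap<String, TunnelSession>,
    session_id: &str,
    requesting_tech: u32,
) -> Result<SessionStatus, StatusError> {
    let session = sessions.get(session_id).ok_or(StatusError::NotFound)?;
    if session.tech_id != requesting_tech {
        return Err(StatusError::Forbidden);
    }
    Ok(session.status)
}
```

With this shape, the owning tech querying a closed session gets Ok(SessionStatus::Closed). Whether an unknown id should surface as 404 or be folded into 403 to avoid leaking which session ids exist is a separate design choice that the sketch leaves open.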

2. Agent writes no logs

gururmm-agent.exe 0.6.0 on AD2 produces no files in C:\Program Files\GuruRMM\ or C:\ProgramData\GuruRMM\, and no Windows Application Event Log entries under provider gururmm*. This made it impossible to confirm the agent-side state transition (Heartbeat → Tunnel) or receipt of TunnelReady during the test.

Fix: add a log target in agent/src/main.rs (env_logger or tracing with a rolling file appender) writing to C:\ProgramData\GuruRMM\agent.log. Optionally, also emit critical events (tunnel open/close, update success/failure) to the Windows Event Log via the eventlog crate.
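To make the file-logging idea concrete, here is a std-only sketch of a size-rotated log file. It is an illustration under stated assumptions, not the agent's code: RotatingLog, backup_path, and the single-backup rotation policy are hypothetical, and a real agent would more likely use the tracing or env_logger route named above.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};
use std::time::{SystemTime, UNIX_EPOCH};

/// Where the previous log generation is kept: `agent.log` -> `agent.log.1`.
fn backup_path(path: &Path) -> PathBuf {
    let mut name = path.as_os_str().to_owned();
    name.push(".1");
    PathBuf::from(name)
}

/// Open a file for appending, creating it if it does not exist.
fn open_append(path: &Path) -> io::Result<File> {
    OpenOptions::new().create(true).append(true).open(path)
}

/// Hypothetical size-rotated log with one backup generation.
struct RotatingLog {
    path: PathBuf,
    max_bytes: u64,
    file: File,
    written: u64,
}

impl RotatingLog {
    /// Create the parent directory if needed and open the log for appending.
    fn open(path: &Path, max_bytes: u64) -> io::Result<Self> {
        if let Some(dir) = path.parent() {
            fs::create_dir_all(dir)?;
        }
        let file = open_append(path)?;
        let written = file.metadata()?.len();
        Ok(Self { path: path.to_path_buf(), max_bytes, file, written })
    }

    /// Append one line prefixed with the Unix timestamp in seconds.
    /// If the line would push a non-empty file past `max_bytes`, the current
    /// file is renamed to the backup path and a fresh file is started first.
    fn log(&mut self, event: &str) -> io::Result<()> {
        let secs = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_secs())
            .unwrap_or(0);
        let line = format!("{} {}\n", secs, event);
        let len = line.len() as u64;
        if self.written > 0 && self.written + len > self.max_bytes {
            fs::rename(&self.path, backup_path(&self.path))?;
            self.file = open_append(&self.path)?;
            self.written = 0;
        }
        self.file.write_all(line.as_bytes())?;
        self.written += len;
        Ok(())
    }
}
```

In the agent this would be opened once at startup against C:\ProgramData\GuruRMM\agent.log, with log("tunnel open") and log("tunnel close") called at the state transitions that could not be confirmed during this test.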

3. Phase 2 gap confirmed against a real use case

Live need: run a couple of diagnostic commands on AD2 (sshd flapping sporadically on port 22, no process crash in Event Log; want to investigate firewall/Defender events from the server side). With no channels, the tunnel's only utility today is proving the session layer works. The actual remote-operate capability still depends on Phase 2.

Priority order for Phase 2 channels (based on what would have been useful here):

  1. Terminal channel first — unlocks 80% of field use cases (log tails, Get-Service, Restart-Service, Get-WinEvent).
  2. Service channel second — tight scope, high value for "restart sshd".
  3. File channel third — needed but rarely urgent; SFTP already exists.
  4. Registry channel last — niche, can defer.

What Else We Observed

  • The public tunnel chain rmm-api.azcomputerguru.com → Cloudflare → nginx → API (3001) proxies /api/* correctly. The docs in CONTEXT.md implied nginx only served /downloads/; confirmed today that it also proxies API paths, which is why off-LAN admin usage works.
  • AD2 agent start time 2026-04-11 22:09 corresponds to the last reboot of AD2; the agent has not restarted since, despite the sshd port flaps (sshd PID 4012 has also been running continuously since the same moment). This confirms that the tunnel infrastructure and the RMM agent are stable; the sshd flap is a separate network-layer issue unrelated to GuruRMM.

Credentials Used

Note: op read "op://Infrastructure/GuruRMM Server/Admin Password" returned a stale value (ClaudeAPI2026!@#) that fails login. The 2026-04-14 session log documents the current password as GuruRMM2025. 1Password entry should be updated to match.

Next Steps

  1. Update the 1Password Infrastructure/GuruRMM Server entry: set the Admin Password field to GuruRMM2025 to match what the server accepts.
  2. Fix /api/v1/tunnel/status/{id} for closed sessions (see Finding 1).
  3. Add file/event-log output to agent (see Finding 2).
  4. Begin Phase 2 — Terminal channel first.

Update (evening session): Roadmap evolution + Azure Trusted Signing setup

Substantial architectural planning session. Product direction shifted from "single-tenant RMM tool" to "multi-tenant SaaS for MSPs." The roadmap was updated significantly to reflect this.

Roadmap additions to ROADMAP.md

  1. Terminology (canonical) — locked in the 5-tier hierarchy: Platform → Partner (DB: tenant_id) → Client → Site → Agent. API/UI says "Partner"; DB column is tenant_id. API path convention /api/public/v1/partners/{pid}/clients/{cid}/sites/{sid}/agents/{aid}. Event topics like agent.online, partner.upgraded. Full table + rules at top of ROADMAP.md.

  2. Tunnel Channels (Phase 2): T1-T8 tracking Terminal/File/Registry/Service channels + tech-side subscriber (T5 is the gating dependency: the browser currently has no way to receive tunnel data; server/src/ws/mod.rs:808-825 discards incoming AgentMessage::TunnelData).

  3. Logging, Audit & Observability — L1-L10 three-tier design:

    • Agent self-logging via OS-native sinks (Windows Event Log custom provider, Linux journald, macOS os_log)
    • Client machine health via OS event log pulls — default 15-min delta + force-pull on tunnel open/close; default levels Critical+Error+Warning for delta, 4h bulk for Info/Debug/Audit/Notification; all tenant-configurable
    • Tunnel audit direct to DB table tunnel_audit (already exists, unused) — no scrubbing, sensitive input captured intentionally for tech-behavior audit; 90-day tenant-visible retention default; indefinite system archive to object storage
    • Agent config push via ServerMessage::Config on connect + real-time when tenant admin changes settings
  4. Multi-tenancy / MSP SaaS (M1-M7) — tenant_id on every table from now forward, tenancy-aware auth middleware, tenant admin dashboard, per-agent/month billing meter, data residency options, tenant export API, onboarding wizard.

  5. Modular Architecture & Public APIs (X1-X12) — core vs. module boundary, event bus (NATS JetStream or Redis Streams), module manifest, module-to-core + module-to-module versioned APIs, public REST API /api/public/v1/ with OpenAPI spec + scoped API keys, webhook subscriptions, WASM or OCI sandbox for third-party modules (deferred), per-module billing. Concrete module candidates documented: PSA/CRM, Remote Syslog, Backups, Patch Mgmt, IT-Glue-style Docs, Network Monitoring.

  6. Protocol Versioning & Stale-Agent Recovery (V1-V10): /api/v1/bootstrap/hello declared sacred (additive-only forever). Compat shim layer per old protocol version at server/src/compat/v{N}.rs. Server-initiated forced-upgrade instruction. Per-tenant update channels (stable/current/beta). Auto-sunset policy when old version fleet hits zero. Rollback path via action: downgrade_required. Concrete motivating example: the Scileppi VP laptop, offline for days, must still be able to reconnect, get accepted, and auto-upgrade.

  7. Certificates & Trust (C1-C11) — full cost + priority matrix. C1: Azure Trusted Signing for Windows (Public Trust). C2: Apple Developer Program. C3: GPG for Linux. C4-C11: TLS automation, mTLS, SBOM, FP submissions, DKIM.

  8. Decisions Log — appended rationale entries for every 2026-04-15 decision so future sessions don't re-litigate.
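The path convention from the Terminology item can be illustrated with a small builder. This is only a sketch: AgentRef and its field names are hypothetical, and the one thing taken from the roadmap is the path shape /api/public/v1/partners/{pid}/clients/{cid}/sites/{sid}/agents/{aid}.

```rust
/// Identifiers for the four tiers below Platform (hypothetical struct).
struct AgentRef<'a> {
    partner_id: &'a str, // "Partner" in API/UI; the DB column is tenant_id
    client_id: &'a str,
    site_id: &'a str,
    agent_id: &'a str,
}

impl AgentRef<'_> {
    /// Build the public API path for this agent, walking the hierarchy
    /// Partner -> Client -> Site -> Agent in order.
    fn api_path(&self) -> String {
        format!(
            "/api/public/v1/partners/{}/clients/{}/sites/{}/agents/{}",
            self.partner_id, self.client_id, self.site_id, self.agent_id
        )
    }
}
```

Keeping every tier in one struct makes it harder to build a path that skips a level, which is the main thing the canonical hierarchy is meant to prevent.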

CONTEXT.md anti-patterns added

  • "DO NOT make breaking changes to /api/v1/bootstrap/hello" — additive-only forever
  • "DO NOT cross module boundaries by importing another module's internals" — event bus or exposed APIs only
  • Hierarchy terminology table added to anti-patterns block (canonical reference)
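One way to picture why /api/v1/bootstrap/hello has to stay additive-only is the decision the server makes when an agent of any age says hello. The sketch below is an assumption-heavy illustration: only the downgrade_required action name appears in the roadmap notes; HelloAction, its other variants, hello_action, and the rule that a newer-than-server agent triggers a downgrade are all hypothetical.

```rust
use std::cmp::Ordering;

/// Possible replies to a bootstrap hello. Only `downgrade_required` is named
/// in the roadmap notes; the other variants are hypothetical.
#[derive(Debug, PartialEq, Eq)]
enum HelloAction {
    /// Agent and server speak the same protocol version.
    Proceed,
    /// Agent is older: accept it through the compat shim for its version
    /// (for example server/src/compat/v{N}.rs) and instruct it to upgrade.
    UpgradeViaShim { shim_version: u32 },
    /// Agent is newer than the server (assumed rollback scenario).
    DowngradeRequired,
}

/// Decide how to answer a hello, given the protocol versions on each side.
/// An old agent is never rejected outright, so a long-offline machine can
/// still reconnect and then upgrade.
fn hello_action(agent_proto: u32, server_proto: u32) -> HelloAction {
    match agent_proto.cmp(&server_proto) {
        Ordering::Equal => HelloAction::Proceed,
        Ordering::Less => HelloAction::UpgradeViaShim { shim_version: agent_proto },
        Ordering::Greater => HelloAction::DowngradeRequired,
    }
}
```

The point of the sketch is that every branch still returns a well-formed reply; a breaking change to the hello endpoint would remove the Less branch for agents that cannot parse the new format.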

Azure Trusted Signing — provisioned and IV submitted

Business identity confirmed via D&B profile lookup: Arizona Computer Guru LLC (D-U-N-S 00-566-1506 / 005661506), 7437 E 22ND St, Tucson AZ 85710, (520) 304-8300, mike@azcomputerguru.com. 25+ years operating history → Public Trust eligible (>3yr threshold).

Provisioned in subscription Basic (e507e953-2ce9-4887-ba96-9b654f7d3267):

  • Resource group: gururmm-signing-rg (westus2)
  • Trusted Signing Account: gururmm-signing
  • Account URI: https://wus2.codesigning.azure.net/
  • SKU: Basic (~$9.99/mo billing started 2026-04-16 00:16 UTC)

RBAC granted:

  • mike@azcomputerguru.com → role Artifact Signing Identity Verifier at account scope

Identity Validation submitted:

  • IV ID: 03028768-f611-4904-aa58-c755020f436a
  • Status: In Progress (Microsoft review, 1-5 business days typical)
  • Submitted name: Arizona Computer Guru LLC (state filing); D&B record has older COMPUTER GURU Corporation — may need to update D&B profile for consistency
  • Primary email: mike@; Secondary: admin@azcomputerguru.com
  • Microsoft may call 520-304-8300 — voicemail should identify Computer Guru

Pending (blocks on IV approval):

  • Certificate Profile creation: az trustedsigning certificate-profile create --resource-group gururmm-signing-rg --account-name gururmm-signing --profile-name gururmm-public-trust --profile-type PublicTrust --identity-validation-id 03028768-f611-4904-aa58-c755020f436a
  • Signing role assignment: Trusted Signing Certificate Profile Signer to CI build principal
  • Local tooling install: Windows SDK (for signtool.exe), Microsoft.Trusted.Signing.Client NuGet package

All details persisted to vault: D:\vault\services\azure-trusted-signing.sops.yaml (encrypted).

Action items for next session

  1. Check IV status — portal → Trusted Signing Accounts → gururmm-signing → Identity Validation
  2. If approved → run the cert profile create command (already staged in vault)
  3. If Microsoft flags legal name mismatch: reply with AZ Corp Commission LLC Articles; update D&B record
  4. Start signtool.exe + Trusted Signing dlib (Azure.CodeSigning.Dlib.dll, from the Microsoft.Trusted.Signing.Client package) integration in a local scratch project
  5. Meanwhile, fix the two backlog items (tunnel status 403 bug, agent logging) — they're both independent of the Azure work and small PRs