Dataforth UI push + dedup + refactor, GuruRMM roadmap evolution, Azure signing setup

Dataforth (projects/dataforth-dos/):
- UI feature: row coloring + PUSH/RE-PUSH buttons + Website Status filter
- Database dedup to one row per SN (2.89M -> 469K rows, UNIQUE constraint added)
- Import logic handles FAIL -> PASS retest transition
- Refactored upload-to-api.js to render datasheets in-memory (dropped For_Web filesystem dep)
- Bulk pushed 170,984 records to Hoffman API
- Statistical sanity check: 100/100 stamped SNs verified on Hoffman

GuruRMM (projects/msp-tools/guru-rmm/):
- ROADMAP.md: added Terminology (5-tier hierarchy), Tunnel Channels Phase 2, Logging/Audit/Observability, Multi-tenancy, Modular Architecture, Protocol Versioning, Certificates sections + Decisions Log
- CONTEXT.md: hierarchy table, new anti-patterns (bootstrap sacred, no cross-module imports), revised next-steps priorities

Session logs for both projects.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 17:39:32 -07:00
parent eae9d7f644
commit 733d87f20e
42 changed files with 9153 additions and 7 deletions
--- a/projects/msp-tools/guru-rmm/session-logs/2026-04-15-session.md
+++ b/projects/msp-tools/guru-rmm/session-logs/2026-04-15-session.md
@@ -0,0 +1,162 @@
+# GuruRMM Session Log — 2026-04-15
+
+## Context
+End-to-end test of the Tunnel Phase 1 lifecycle, triggered opportunistically
+while troubleshooting SSH flakiness on AD2 (Dataforth project). No code
+changes — exercised the production API from an off-LAN workstation via the
+public Cloudflare endpoint (`rmm-api.azcomputerguru.com`).
+
+## What worked
+
+| Step | Endpoint | Result |
+|---|---|---|
+| Login | `POST /api/auth/login` | 200, token returned |
+| List agents | `GET /api/agents` | 6 agents, AD2 and DESKTOP-0O8A1RL online on v0.6.0 |
+| Open tunnel | `POST /api/v1/tunnel/open` (agent_id=AD2 `d28a1c90-47d7-448f-a287-197bc8892234`) | 200, `{session_id: 0682a80c-a899-403b-9473-aaaed50e4aba, status: active}` |
+| Status while active | `GET /api/v1/tunnel/status/{id}` | 200, full session record (opened_at, last_activity, agent_id) |
+| Close tunnel | `POST /api/v1/tunnel/close` | 200, `{status: closed}` |
+
+## Findings (actionable)
+
+### 1. Status endpoint returns 403 after close
+`GET /api/v1/tunnel/status/{id}` against a just-closed session returns
+`403 Forbidden — "Session not found or not owned by user"` instead of
+`{status: closed}`. The likely root cause is that the `WHERE status = 'active'`
+filter (from `idx_tech_sessions_active`; see CONTEXT.md line 256) is applied
+to the status lookup in addition to the ownership check, so closed sessions
+fail ownership verification and fall through to the 403 branch.
+
+**Fix:** separate the existence lookup from the ownership check. If the
+session exists and belongs to the requesting tech, return the closed record
+rather than masking it as a permission error.
+
+Location to inspect: `server/src/api/tunnel.rs` (status handler) and/or
+`server/src/db/tunnel.rs` (session fetch query).
+
+### 2. Agent writes no logs
+`gururmm-agent.exe 0.6.0` on AD2 produces no files in
+`C:\Program Files\GuruRMM\` or `C:\ProgramData\GuruRMM\`, and no Windows
+Application Event Log entries under provider `gururmm*`. This made it
+impossible to confirm the agent-side state transition
+(`Heartbeat → Tunnel`) or receipt of `TunnelReady` during the test.
+
+**Fix:** add a log target in `agent/src/main.rs` (env_logger or tracing
+with a rolling file appender) writing to
+`C:\ProgramData\GuruRMM\agent.log`. Optionally also emit critical events
+(tunnel open/close, update success/failure) to the Windows Event Log via
+the `eventlog` crate.
+
+### 3. Phase 2 gap confirmed against a real use case
+Live need: run a couple of diagnostic commands on AD2 (sshd flapping
+sporadically on port 22, no process crash in Event Log; want to investigate
+firewall/Defender events from the server side). With no channels, the
+tunnel's only utility today is proving the session layer works. The actual
+remote-operate capability still depends on Phase 2.
+
+**Priority order for Phase 2 channels** (based on what would have been useful
+here):
+1. **Terminal channel** first — unlocks 80% of field use cases (log tails,
+   `Get-Service`, `Restart-Service`, `Get-WinEvent`).
+2. **Service channel** second — tight scope, high value for "restart sshd".
+3. **File channel** third — needed but rarely urgent; SFTP already exists.
+4. **Registry channel** last — niche, can defer.
+
+## What Else We Observed
+
+- The public tunnel chain `rmm-api.azcomputerguru.com` → Cloudflare → nginx
+  → API (3001) proxies `/api/*` correctly. The docs in CONTEXT.md implied
+  nginx only served `/downloads/`; confirmed today that it also proxies API
+  paths, which is why off-LAN admin usage works.
+- AD2 agent start time `2026-04-11 22:09` corresponds to last reboot of
+  AD2; the agent has not restarted since despite sshd port flaps (sshd PID
+  4012 also continuously running since same moment). Confirms the tunnel
+  infrastructure and the RMM agent are stable; the sshd flap is a separate
+  network-layer issue unrelated to GuruRMM.
+
+## Credentials Used
+
+- **Admin Email:** admin@azcomputerguru.com
+- **Admin Password:** GuruRMM2025
+- **Public API:** https://rmm-api.azcomputerguru.com
+
+**Note:** `op read "op://Infrastructure/GuruRMM Server/Admin Password"`
+returned a stale value (`ClaudeAPI2026!@#`) that fails login. The
+2026-04-14 session log documents the current password as `GuruRMM2025`.
+1Password entry should be updated to match.
+
+## Next Steps
+
+1. Update the 1Password `Infrastructure/GuruRMM Server` entry: set the
+   `Admin Password` field to `GuruRMM2025` to match what the server accepts.
+2. Fix `/api/v1/tunnel/status/{id}` for closed sessions (see Finding 1).
+3. Add file/event-log output to agent (see Finding 2).
+4. Begin Phase 2 — Terminal channel first.
+
+---
+
+## Update (evening session): Roadmap evolution + Azure Trusted Signing setup
+
+Substantial architectural planning session. Product direction shifted from "single-tenant RMM tool" to "multi-tenant SaaS for MSPs." The roadmap was updated significantly to reflect this shift.
+
+### Roadmap additions to ROADMAP.md
+
+1. **Terminology (canonical)** — locked in the 5-tier hierarchy: Platform → Partner (DB: tenant_id) → Client → Site → Agent. API/UI says "Partner"; DB column is `tenant_id`. API path convention `/api/public/v1/partners/{pid}/clients/{cid}/sites/{sid}/agents/{aid}`. Event topics like `agent.online`, `partner.upgraded`. Full table + rules at top of ROADMAP.md.
+
+2. **Tunnel Channels (Phase 2)** — T1-T8 tracking Terminal/File/Registry/Service channels + tech-side subscriber (T5 is gating dep — browser currently has no way to receive tunnel data, `server/src/ws/mod.rs:808-825` discards incoming `AgentMessage::TunnelData`).
+
+3. **Logging, Audit & Observability** — L1-L10 three-tier design:
+   - Agent self-logging via OS-native sinks (Windows Event Log custom provider, Linux journald, macOS os_log)
+   - Client machine health via OS event log pulls — default 15-min delta + force-pull on tunnel open/close; default levels Critical+Error+Warning for delta, 4h bulk for Info/Debug/Audit/Notification; all tenant-configurable
+   - Tunnel audit direct to DB table `tunnel_audit` (already exists, unused) — no scrubbing, sensitive input captured intentionally for tech-behavior audit; 90-day tenant-visible retention default; indefinite system archive to object storage
+   - Agent config push via `ServerMessage::Config` on connect + real-time when tenant admin changes settings
+
+4. **Multi-tenancy / MSP SaaS (M1-M7)** — tenant_id on every table from now forward, tenancy-aware auth middleware, tenant admin dashboard, per-agent/month billing meter, data residency options, tenant export API, onboarding wizard.
+
+5. **Modular Architecture & Public APIs (X1-X12)** — core vs. module boundary, event bus (NATS JetStream or Redis Streams), module manifest, module-to-core + module-to-module versioned APIs, public REST API `/api/public/v1/` with OpenAPI spec + scoped API keys, webhook subscriptions, WASM or OCI sandbox for third-party modules (deferred), per-module billing. Concrete module candidates documented: PSA/CRM, Remote Syslog, Backups, Patch Mgmt, IT-Glue-style Docs, Network Monitoring.
+
+6. **Protocol Versioning & Stale-Agent Recovery (V1-V10)** — `/api/v1/bootstrap/hello` declared **sacred** (additive-only forever). Compat shim layer per old protocol version at `server/src/compat/v{N}.rs`. Server-initiated forced-upgrade instruction. Per-tenant update channels (stable/current/beta). Auto-sunset policy when old version fleet hits zero. Rollback path via `action: downgrade_required`. Concrete motivating example: Scileppi VP laptop offline for days — must be able to reconnect, get accepted, auto-upgrade.
+
+7. **Certificates & Trust (C1-C11)** — full cost + priority matrix. C1: Azure Trusted Signing for Windows (Public Trust). C2: Apple Developer Program. C3: GPG for Linux. C4-C11: TLS automation, mTLS, SBOM, FP submissions, DKIM.
+
+8. **Decisions Log** — appended rationale entries for every 2026-04-15 decision so future sessions don't re-litigate.
+
+### CONTEXT.md anti-patterns added
+
+- "DO NOT make breaking changes to `/api/v1/bootstrap/hello`" — additive-only forever
+- "DO NOT cross module boundaries by importing another module's internals" — event bus or exposed APIs only
+- Hierarchy terminology table added to anti-patterns block (canonical reference)
+
+### Azure Trusted Signing — provisioned and IV submitted
+
+**Business identity confirmed** via D&B profile lookup: `Arizona Computer Guru LLC` (D-U-N-S `00-566-1506` / `005661506`), 7437 E 22ND St, Tucson AZ 85710, (520) 304-8300, mike@azcomputerguru.com. 25+ years operating history → Public Trust eligible (>3yr threshold).
+
+**Provisioned in subscription `Basic` (`e507e953-2ce9-4887-ba96-9b654f7d3267`):**
+- Resource group: `gururmm-signing-rg` (westus2)
+- Trusted Signing Account: `gururmm-signing`
+- Account URI: `https://wus2.codesigning.azure.net/`
+- SKU: Basic (~$9.99/mo billing started 2026-04-16 00:16 UTC)
+
+**RBAC granted:**
+- `mike@azcomputerguru.com` → role `Artifact Signing Identity Verifier` at account scope
+
+**Identity Validation submitted:**
+- IV ID: `03028768-f611-4904-aa58-c755020f436a`
+- Status: `In Progress` (Microsoft review, 1-5 business days typical)
+- Submitted name: `Arizona Computer Guru LLC` (state filing); D&B record has older `COMPUTER GURU` Corporation — may need to update D&B profile for consistency
+- Primary email: mike@; Secondary: admin@azcomputerguru.com
+- Microsoft may call 520-304-8300 — voicemail should identify Computer Guru
+
+**Pending (blocks on IV approval):**
+- Certificate Profile creation: `az trustedsigning certificate-profile create --resource-group gururmm-signing-rg --account-name gururmm-signing --profile-name gururmm-public-trust --profile-type PublicTrust --identity-validation-id 03028768-f611-4904-aa58-c755020f436a`
+- Signing role assignment: `Trusted Signing Certificate Profile Signer` to CI build principal
+- Local tooling install: Windows SDK (for signtool.exe), Microsoft.Trusted.Signing.Client NuGet package
+
+**All details persisted to vault:** `D:\vault\services\azure-trusted-signing.sops.yaml` (encrypted).
+
+### Action items for next session
+
+1. Check IV status — portal → Trusted Signing Accounts → gururmm-signing → Identity Validation
+2. If approved → run the cert profile create command (already staged in vault)
+3. If Microsoft flags legal name mismatch: reply with AZ Corp Commission LLC Articles; update D&B record
+4. Start signtool.exe + dlib integration in a local scratch project
+5. Meanwhile, fix the two backlog items (tunnel status 403 bug, agent logging); both are small PRs independent of the Azure work
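
Note on Finding 1 in the session log above: the proposed fix (look the session up by id first, then check ownership) can be sketched roughly as below. This is a minimal in-memory illustration, not the actual handler in `server/src/api/tunnel.rs`. The types `TunnelSession`, `SessionStatus`, `StatusResponse` and the function `tunnel_status` are hypothetical names, and the `HashMap` stands in for the real session table. Returning a distinct 404 for unknown ids is also an assumption; the handler could keep returning 403 for both cases if hiding session existence is intended.

```rust
use std::collections::HashMap;

// Hypothetical status values; the real schema may differ.
#[derive(Debug, Clone, PartialEq)]
enum SessionStatus {
    Active,
    Closed,
}

// Hypothetical session record (illustrative fields only).
#[derive(Debug, Clone)]
struct TunnelSession {
    owner_id: u32,
    status: SessionStatus,
}

// Simplified stand-ins for the HTTP outcomes.
#[derive(Debug, PartialEq)]
enum StatusResponse {
    Ok(SessionStatus), // 200 with the session's current status
    Forbidden,         // 403: session exists but belongs to another tech
    NotFound,          // 404: no session with this id (assumed behavior)
}

// Step 1: fetch by id only, with no `status = 'active'` filter.
// Step 2: check ownership on the row that was found.
// A closed session owned by the requester therefore yields Ok(Closed).
fn tunnel_status(
    sessions: &HashMap<String, TunnelSession>,
    session_id: &str,
    requester_id: u32,
) -> StatusResponse {
    match sessions.get(session_id) {
        None => StatusResponse::NotFound,
        Some(s) if s.owner_id != requester_id => StatusResponse::Forbidden,
        Some(s) => StatusResponse::Ok(s.status.clone()),
    }
}

fn main() {
    let mut sessions = HashMap::new();
    sessions.insert(
        "open-1".to_string(),
        TunnelSession { owner_id: 1, status: SessionStatus::Active },
    );
    sessions.insert(
        "closed-1".to_string(),
        TunnelSession { owner_id: 1, status: SessionStatus::Closed },
    );

    // The owner can still read the record after close.
    println!("{:?}", tunnel_status(&sessions, "closed-1", 1));
    // A different tech is rejected regardless of session state.
    println!("{:?}", tunnel_status(&sessions, "closed-1", 2));
    // Unknown ids are reported separately from permission failures.
    println!("{:?}", tunnel_status(&sessions, "missing", 1));
}
```

In a SQL-backed version the same idea would mean dropping the active-only predicate from the status query and comparing the owner column afterwards; the partial index can still serve the queries that only need active sessions.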