Dataforth UI push + dedup + refactor, GuruRMM roadmap evolution, Azure signing setup

Dataforth (projects/dataforth-dos/):
- UI feature: row coloring + PUSH/RE-PUSH buttons + Website Status filter
- Database dedup to one row per SN (2.89M -> 469K rows, UNIQUE constraint added)
- Import logic handles FAIL -> PASS retest transition
- Refactored upload-to-api.js to render datasheets in-memory (dropped For_Web filesystem dep)
- Bulk pushed 170,984 records to Hoffman API
- Statistical sanity check: 100/100 stamped SNs verified on Hoffman

GuruRMM (projects/msp-tools/guru-rmm/):
- ROADMAP.md: added Terminology (5-tier hierarchy), Tunnel Channels Phase 2, Logging/Audit/Observability, Multi-tenancy, Modular Architecture, Protocol Versioning, Certificates sections + Decisions Log
- CONTEXT.md: hierarchy table, new anti-patterns (bootstrap sacred, no cross-module imports), revised next-steps priorities

Session logs for both projects.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
162
projects/msp-tools/guru-rmm/session-logs/2026-04-15-session.md
Normal file
@@ -0,0 +1,162 @@
# GuruRMM Session Log — 2026-04-15

## Context
End-to-end test of the Tunnel Phase 1 lifecycle, triggered opportunistically
while troubleshooting SSH flakiness on AD2 (Dataforth project). No code was
changed; the test exercised the production API from an off-LAN workstation
via the public Cloudflare endpoint (`rmm-api.azcomputerguru.com`).

## What worked

| Step | Endpoint | Result |
|---|---|---|
| Login | `POST /api/auth/login` | 200, token returned |
| List agents | `GET /api/agents` | 6 agents, AD2 and DESKTOP-0O8A1RL online on v0.6.0 |
| Open tunnel | `POST /api/v1/tunnel/open` (agent_id=AD2 `d28a1c90-47d7-448f-a287-197bc8892234`) | 200, `{session_id: 0682a80c-a899-403b-9473-aaaed50e4aba, status: active}` |
| Status while active | `GET /api/v1/tunnel/status/{id}` | 200, full session record (opened_at, last_activity, agent_id) |
| Close tunnel | `POST /api/v1/tunnel/close` | 200, `{status: closed}` |

## Findings (actionable)

### 1. Status endpoint returns 403 after close
`GET /api/v1/tunnel/status/{id}` against a just-closed session returns
`403 Forbidden` with the message `"Session not found or not owned by user"`
instead of `{status: closed}`. The likely root cause is that the
`WHERE status = 'active'` filter (from `idx_tech_sessions_active`; see
CONTEXT.md line 256) is applied to the status lookup in addition to the
ownership check, so closed sessions fail ownership verification and fall
through to the 403 branch.

**Fix:** separate the existence lookup from the ownership check. If the
session exists and belongs to the requesting tech, return the closed record
rather than masking it as a permission error.

Location to inspect: `server/src/api/tunnel.rs` (status handler) and/or
`server/src/db/tunnel.rs` (session fetch query).
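
The proposed fix can be sketched as follows. This is a minimal illustration, not the actual handler: the in-memory `HashMap`, the `TunnelSession` struct, and the `StatusLookup` enum are hypothetical stand-ins for whatever `server/src/db/tunnel.rs` really uses. The point is only the ordering: look the session up with no status filter, then check ownership, then report the status.

```rust
use std::collections::HashMap;

/// Hypothetical session state; the real schema may carry more variants.
#[derive(Debug, Clone, PartialEq)]
pub enum SessionStatus {
    Active,
    Closed,
}

/// Hypothetical in-memory stand-in for one row of the session table.
#[derive(Debug, Clone, PartialEq)]
pub struct TunnelSession {
    pub tech_id: u32,
    pub status: SessionStatus,
}

/// Outcome of a status request; the caller maps it to an HTTP response.
#[derive(Debug, PartialEq)]
pub enum StatusLookup {
    Found(SessionStatus), // 200 with the session record, active or closed
    Forbidden,            // session exists but belongs to another tech
    NotFound,             // no session with that id
}

pub fn lookup_status(
    sessions: &HashMap<String, TunnelSession>,
    session_id: &str,
    requesting_tech: u32,
) -> StatusLookup {
    // Step 1: existence lookup, deliberately without a status filter.
    let session = match sessions.get(session_id) {
        Some(s) => s,
        None => return StatusLookup::NotFound,
    };
    // Step 2: ownership check, independent of whether the session is active.
    if session.tech_id != requesting_tech {
        return StatusLookup::Forbidden;
    }
    // Step 3: the owner always sees the real status, including `Closed`.
    StatusLookup::Found(session.status.clone())
}
```

The sketch distinguishes `NotFound` from `Forbidden` for clarity. The current API folds both into a single 403, and the handler could keep doing so (to avoid revealing which session ids exist) while still returning closed records to their owner.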

### 2. Agent writes no logs
`gururmm-agent.exe 0.6.0` on AD2 writes no files to
`C:\Program Files\GuruRMM\` or `C:\ProgramData\GuruRMM\`, and no Windows
Application Event Log entries appear under provider `gururmm*`. This made it
impossible to confirm the agent-side state transition
(`Heartbeat → Tunnel`) or receipt of `TunnelReady` during the test.

**Fix:** add a log target in `agent/src/main.rs` (env_logger, or tracing
with a rolling file appender) writing to
`C:\ProgramData\GuruRMM\agent.log`. Optionally also emit critical events
(tunnel open/close, update success/failure) to the Windows Event Log via
the `eventlog` crate.
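
As a rough picture of the smallest useful version of this fix, the sketch below appends timestamped lines to a file using only the standard library. It is a stand-in, not the proposed env_logger or tracing setup: it has no rotation, no level filtering, and uses Unix seconds rather than a formatted timestamp. The function name `append_log` is hypothetical.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;
use std::time::{SystemTime, UNIX_EPOCH};

/// Appends one line to the log file, creating parent directories if needed.
pub fn append_log(path: &Path, level: &str, message: &str) -> io::Result<()> {
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent)?;
    }
    // Open in append mode so earlier entries are never truncated.
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    // Seconds since the Unix epoch; falls back to 0 if the clock is before 1970.
    let secs = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0);
    writeln!(file, "{} [{}] {}", secs, level, message)
}
```

On the agent this would be called with a path such as `C:\ProgramData\GuruRMM\agent.log` at the state transitions that were invisible during the test, for example `append_log(path, "INFO", "state: Heartbeat -> Tunnel")`.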

### 3. Phase 2 gap confirmed against a real use case
Live need: run a couple of diagnostic commands on AD2 (sshd flapping
sporadically on port 22, no process crash in Event Log; want to investigate
firewall/Defender events from the server side). With no channels, the
tunnel's only utility today is proving the session layer works. The actual
remote-operate capability still depends on Phase 2.

**Priority order for Phase 2 channels** (based on what would have been useful
here):
1. **Terminal channel** first: unlocks 80% of field use cases (log tails,
   `Get-Service`, `Restart-Service`, `Get-WinEvent`).
2. **Service channel** second: tight scope, high value for "restart sshd".
3. **File channel** third: needed but rarely urgent; SFTP already exists.
4. **Registry channel** last: niche, can defer.

## What Else We Observed

- The public tunnel chain `rmm-api.azcomputerguru.com` → Cloudflare → nginx
  → API (3001) proxies `/api/*` correctly. The docs in CONTEXT.md implied
  nginx only served `/downloads/`; today's test confirmed that it also
  proxies API paths, which is why off-LAN admin usage works.
- The AD2 agent start time `2026-04-11 22:09` corresponds to the last reboot
  of AD2; the agent has not restarted since, despite the sshd port flaps
  (sshd PID 4012 has also been running continuously since the same moment).
  This confirms that the tunnel infrastructure and the RMM agent are stable;
  the sshd flap is a separate network-layer issue unrelated to GuruRMM.

## Credentials Used

- **Admin Email:** admin@azcomputerguru.com
- **Admin Password:** GuruRMM2025
- **Public API:** https://rmm-api.azcomputerguru.com

**Note:** `op read "op://Infrastructure/GuruRMM Server/Admin Password"`
returned a stale value (`ClaudeAPI2026!@#`) that fails login. The
2026-04-14 session log documents the current password as `GuruRMM2025`.
The 1Password entry should be updated to match.

## Next Steps

1. Update the 1Password `Infrastructure/GuruRMM Server` entry: set the
   `Admin Password` field to `GuruRMM2025` to match what the server accepts.
2. Fix `/api/v1/tunnel/status/{id}` for closed sessions (see Finding 1).
3. Add file/event-log output to agent (see Finding 2).
4. Begin Phase 2, Terminal channel first.

---

## Update (evening session): Roadmap evolution + Azure Trusted Signing setup

Substantial architectural planning session. Product direction shifted from "single-tenant RMM tool" to "multi-tenant SaaS for MSPs." The roadmap was updated significantly to reflect this.

### Roadmap additions to ROADMAP.md

1. **Terminology (canonical):** locks in the 5-tier hierarchy Platform → Partner (DB: `tenant_id`) → Client → Site → Agent. API/UI says "Partner"; the DB column is `tenant_id`. API path convention: `/api/public/v1/partners/{pid}/clients/{cid}/sites/{sid}/agents/{aid}`. Event topics look like `agent.online`, `partner.upgraded`. Full table + rules are at the top of ROADMAP.md.

2. **Tunnel Channels (Phase 2):** T1-T8 track the Terminal/File/Registry/Service channels plus the tech-side subscriber. T5 is the gating dependency: the browser currently has no way to receive tunnel data, because `server/src/ws/mod.rs:808-825` discards incoming `AgentMessage::TunnelData`.

3. **Logging, Audit & Observability:** L1-L10, a three-tier design:
   - Agent self-logging via OS-native sinks (Windows Event Log custom provider, Linux journald, macOS os_log)
   - Client machine health via OS event log pulls: default 15-min delta + force-pull on tunnel open/close; default levels Critical+Error+Warning for delta, 4h bulk for Info/Debug/Audit/Notification; all tenant-configurable
   - Tunnel audit direct to DB table `tunnel_audit` (already exists, unused): no scrubbing, sensitive input captured intentionally for tech-behavior audit; 90-day tenant-visible retention default; indefinite system archive to object storage
   - Agent config push via `ServerMessage::Config` on connect + real-time when a tenant admin changes settings

4. **Multi-tenancy / MSP SaaS (M1-M7):** `tenant_id` on every table from now forward, tenancy-aware auth middleware, tenant admin dashboard, per-agent/month billing meter, data residency options, tenant export API, onboarding wizard.

5. **Modular Architecture & Public APIs (X1-X12):** core vs. module boundary, event bus (NATS JetStream or Redis Streams), module manifest, module-to-core + module-to-module versioned APIs, public REST API `/api/public/v1/` with OpenAPI spec + scoped API keys, webhook subscriptions, WASM or OCI sandbox for third-party modules (deferred), per-module billing. Concrete module candidates documented: PSA/CRM, Remote Syslog, Backups, Patch Mgmt, IT-Glue-style Docs, Network Monitoring.

6. **Protocol Versioning & Stale-Agent Recovery (V1-V10):** `/api/v1/bootstrap/hello` declared **sacred** (additive-only forever). Compat shim layer per old protocol version at `server/src/compat/v{N}.rs`. Server-initiated forced-upgrade instruction. Per-tenant update channels (stable/current/beta). Auto-sunset policy when an old version's fleet hits zero. Rollback path via `action: downgrade_required`. Concrete motivating example: the Scileppi VP laptop, offline for days, must be able to reconnect, get accepted, and auto-upgrade.

7. **Certificates & Trust (C1-C11):** full cost + priority matrix. C1: Azure Trusted Signing for Windows (Public Trust). C2: Apple Developer Program. C3: GPG for Linux. C4-C11: TLS automation, mTLS, SBOM, FP submissions, DKIM.

8. **Decisions Log:** appended rationale entries for every 2026-04-15 decision so future sessions don't re-litigate.
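
To make the path convention from item 1 concrete, here is a small sketch that renders the documented public API path from the four identifiers below the Platform tier. The `AgentPath` struct and its field names are hypothetical; only the path template itself comes from the roadmap. The sketch does no escaping or validation, so it assumes the identifiers are already URL-safe (for example UUIDs).

```rust
/// Identifiers for one agent in the Partner -> Client -> Site -> Agent chain.
/// Hypothetical type for illustration; not taken from the codebase.
pub struct AgentPath<'a> {
    pub partner_id: &'a str, // "Partner" in API/UI, `tenant_id` in the database
    pub client_id: &'a str,
    pub site_id: &'a str,
    pub agent_id: &'a str,
}

impl AgentPath<'_> {
    /// Renders the public v1 path for this agent (no escaping performed).
    pub fn to_url_path(&self) -> String {
        format!(
            "/api/public/v1/partners/{}/clients/{}/sites/{}/agents/{}",
            self.partner_id, self.client_id, self.site_id, self.agent_id
        )
    }
}
```

Keeping the Partner/`tenant_id` translation in one place like this is one way to stop the two names from leaking across the API/DB boundary.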

### CONTEXT.md anti-patterns added

- "DO NOT make breaking changes to `/api/v1/bootstrap/hello`": additive-only forever
- "DO NOT cross module boundaries by importing another module's internals": event bus or exposed APIs only
- Hierarchy terminology table added to anti-patterns block (canonical reference)
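
One way to picture why the hello endpoint has to stay additive-only is the decision the server makes when an agent of any age reconnects. The sketch below is an assumption-heavy illustration: `HelloAction`, `decide_hello`, and the integer protocol versions are all hypothetical, `UpgradeRequired` is an invented name for the forced-upgrade instruction, and mapping "agent newer than server" to `downgrade_required` is my reading of the rollback path rather than something the roadmap spells out.

```rust
use std::cmp::Ordering;

/// Hypothetical actions a bootstrap hello response could carry.
#[derive(Debug, PartialEq)]
pub enum HelloAction {
    Proceed,           // versions match, continue normally
    UpgradeRequired,   // invented name for the server-initiated forced upgrade
    DowngradeRequired, // assumed to correspond to `action: downgrade_required`
}

/// Decides how to answer an agent announcing `agent_version`, where
/// `current` is the protocol version the server speaks natively.
/// No version is rejected outright: an older agent is accepted and told to
/// upgrade, which is what keeps a long-offline machine recoverable.
pub fn decide_hello(agent_version: u32, current: u32) -> HelloAction {
    match agent_version.cmp(&current) {
        Ordering::Less => HelloAction::UpgradeRequired,
        Ordering::Equal => HelloAction::Proceed,
        Ordering::Greater => HelloAction::DowngradeRequired,
    }
}
```

Under this reading, a breaking change to the hello request or response would prevent the `Ordering::Less` branch from ever being reached by an old agent, because the agent could no longer parse the answer that tells it to upgrade.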

### Azure Trusted Signing: provisioned and IV submitted

**Business identity confirmed** via D&B profile lookup: `Arizona Computer Guru LLC` (D-U-N-S `00-566-1506` / `005661506`), 7437 E 22ND St, Tucson AZ 85710, (520) 304-8300, mike@azcomputerguru.com. 25+ years operating history → Public Trust eligible (>3yr threshold).

**Provisioned in subscription `Basic` (`e507e953-2ce9-4887-ba96-9b654f7d3267`):**
- Resource group: `gururmm-signing-rg` (westus2)
- Trusted Signing Account: `gururmm-signing`
- Account URI: `https://wus2.codesigning.azure.net/`
- SKU: Basic (~$9.99/mo; billing started 2026-04-16 00:16 UTC)

**RBAC granted:**
- `mike@azcomputerguru.com` → role `Artifact Signing Identity Verifier` at account scope

**Identity Validation submitted:**
- IV ID: `03028768-f611-4904-aa58-c755020f436a`
- Status: `In Progress` (Microsoft review, 1-5 business days typical)
- Submitted name: `Arizona Computer Guru LLC` (state filing); the D&B record has the older `COMPUTER GURU` Corporation name, so the D&B profile may need updating for consistency
- Primary email: mike@; Secondary: admin@azcomputerguru.com
- Microsoft may call 520-304-8300; the voicemail greeting should identify Computer Guru

**Pending (blocks on IV approval):**
- Certificate Profile creation: `az trustedsigning certificate-profile create --resource-group gururmm-signing-rg --account-name gururmm-signing --profile-name gururmm-public-trust --profile-type PublicTrust --identity-validation-id 03028768-f611-4904-aa58-c755020f436a`
- Signing role assignment: `Trusted Signing Certificate Profile Signer` to CI build principal
- Local tooling install: Windows SDK (for signtool.exe), Microsoft.Trusted.Signing.Client NuGet package

**All details persisted to vault:** `D:\vault\services\azure-trusted-signing.sops.yaml` (encrypted).

### Action items for next session

1. Check IV status: portal → Trusted Signing Accounts → gururmm-signing → Identity Validation
2. If approved → run the cert profile create command (already staged in vault)
3. If Microsoft flags a legal name mismatch: reply with the AZ Corp Commission LLC Articles; update the D&B record
4. Start signtool.exe + dlib integration in a local scratch project
5. Meanwhile, fix the two backlog items (tunnel status 403 bug, agent logging); both are independent of the Azure work and are small PRs
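
For action item 4, a scratch project mostly needs two things: the metadata file that tells the dlib which account and profile to use, and the `signtool.exe sign` argument list. The sketch below builds both as plain strings. Treat the details as assumptions to verify against current Microsoft documentation: the JSON key names, the `/dlib` and `/dmdf` flags, and the timestamp URL reflect my understanding of the Trusted Signing signtool integration, and the struct and function names are hypothetical. The endpoint, account name, and profile name in the usage note come from the provisioning notes above.

```rust
/// Values for the dlib metadata file (hypothetical type for illustration).
pub struct SigningConfig<'a> {
    pub endpoint: &'a str,
    pub account_name: &'a str,
    pub profile_name: &'a str,
}

/// Renders the JSON metadata document handed to signtool via /dmdf.
/// Values are inserted verbatim, so they must not contain quotes.
pub fn metadata_json(cfg: &SigningConfig) -> String {
    format!(
        "{{\n  \"Endpoint\": \"{}\",\n  \"CodeSigningAccountName\": \"{}\",\n  \"CertificateProfileName\": \"{}\"\n}}\n",
        cfg.endpoint, cfg.account_name, cfg.profile_name
    )
}

/// Builds the argument list for `signtool.exe sign` (flags are assumptions).
pub fn signtool_args(dlib_path: &str, metadata_path: &str, target: &str) -> Vec<String> {
    [
        "sign", "/v", "/fd", "SHA256",
        "/tr", "http://timestamp.acs.microsoft.com", "/td", "SHA256",
        "/dlib", dlib_path, "/dmdf", metadata_path, target,
    ]
    .iter()
    .map(|s| s.to_string())
    .collect()
}
```

With `endpoint` set to `https://wus2.codesigning.azure.net/`, `account_name` to `gururmm-signing`, and `profile_name` to `gururmm-public-trust`, the output of `metadata_json` can be written to a file and the result of `signtool_args` passed to `std::process::Command::new("signtool.exe").args(...)` once the certificate profile exists.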