Dataforth UI push + dedup + refactor, GuruRMM roadmap evolution, Azure signing setup

Dataforth (projects/dataforth-dos/):
- UI feature: row coloring + PUSH/RE-PUSH buttons + Website Status filter
- Database dedup to one row per SN (2.89M -> 469K rows, UNIQUE constraint added)
- Import logic handles FAIL -> PASS retest transition
- Refactored upload-to-api.js to render datasheets in-memory (dropped For_Web filesystem dep)
- Bulk pushed 170,984 records to Hoffman API
- Statistical sanity check: 100/100 stamped SNs verified on Hoffman

GuruRMM (projects/msp-tools/guru-rmm/):
- ROADMAP.md: added Terminology (5-tier hierarchy), Tunnel Channels Phase 2,
  Logging/Audit/Observability, Multi-tenancy, Modular Architecture,
  Protocol Versioning, Certificates sections + Decisions Log
- CONTEXT.md: hierarchy table, new anti-patterns (bootstrap sacred,
  no cross-module imports), revised next-steps priorities

Session logs for both projects.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 17:39:32 -07:00
parent eae9d7f644
commit 733d87f20e
42 changed files with 9153 additions and 7 deletions

@@ -0,0 +1,162 @@
# GuruRMM Session Log — 2026-04-15
## Context
End-to-end test of the Tunnel Phase 1 lifecycle, triggered opportunistically
while troubleshooting SSH flakiness on AD2 (Dataforth project). No code
changes — exercised the production API from an off-LAN workstation via the
public Cloudflare endpoint (`rmm-api.azcomputerguru.com`).
## What worked
| Step | Endpoint | Result |
|---|---|---|
| Login | `POST /api/auth/login` | 200, token returned |
| List agents | `GET /api/agents` | 6 agents, AD2 and DESKTOP-0O8A1RL online on v0.6.0 |
| Open tunnel | `POST /api/v1/tunnel/open` (agent_id=AD2 `d28a1c90-47d7-448f-a287-197bc8892234`) | 200, `{session_id: 0682a80c-a899-403b-9473-aaaed50e4aba, status: active}` |
| Status while active | `GET /api/v1/tunnel/status/{id}` | 200, full session record (opened_at, last_activity, agent_id) |
| Close tunnel | `POST /api/v1/tunnel/close` | 200, `{status: closed}` |
## Findings (actionable)
### 1. Status endpoint returns 403 after close
`GET /api/v1/tunnel/status/{id}` against a just-closed session returns
`403 Forbidden — "Session not found or not owned by user"` instead of
`{status: closed}`. The likely root cause is that the `WHERE status = 'active'`
filter (from `idx_tech_sessions_active`; see CONTEXT.md line 256) is applied
to the status lookup in addition to the ownership check, so closed sessions
fail ownership verification and fall through to the 403 branch.
**Fix:** separate the existence lookup from the ownership check. If the
session exists and belongs to the requesting tech, return the closed record
rather than masking it as a permission error.
Location to inspect: `server/src/api/tunnel.rs` (status handler) and/or
`server/src/db/tunnel.rs` (session fetch query).
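
A minimal sketch of the proposed split, using an in-memory map as a stand-in for the session table. All type and function names here (`Session`, `StatusLookup`, `lookup_status`) are hypothetical and not taken from `server/src/api/tunnel.rs`; returning 404 for an unknown session ID is also an assumption, since the current handler folds both cases into one 403 message.

```rust
use std::collections::HashMap;

/// Lifecycle state of a tunnel session (illustrative names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SessionStatus {
    Active,
    Closed,
}

/// Stand-in for one row of the tech session table.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Session {
    pub id: String,
    pub tech_id: u32,
    pub status: SessionStatus,
}

/// Outcome of a status lookup; the handler would map each variant to an HTTP status.
#[derive(Debug, PartialEq, Eq)]
pub enum StatusLookup {
    /// 200 with the record, whether active or closed.
    Found(Session),
    /// 404 (assumed): no session with this ID exists.
    NotFound,
    /// 403: the session exists but belongs to another tech.
    Forbidden,
}

pub fn lookup_status(
    sessions: &HashMap<String, Session>,
    session_id: &str,
    requesting_tech: u32,
) -> StatusLookup {
    // Step 1: existence lookup with no filter on status.
    let session = match sessions.get(session_id) {
        Some(s) => s,
        None => return StatusLookup::NotFound,
    };
    // Step 2: ownership check, independent of whether the session is closed.
    if session.tech_id != requesting_tech {
        return StatusLookup::Forbidden;
    }
    StatusLookup::Found(session.clone())
}
```

With this shape, a closed session owned by the caller comes back as `Found` with `status: Closed`, and only a genuine ownership mismatch produces `Forbidden`.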
### 2. Agent writes no logs
`gururmm-agent.exe 0.6.0` on AD2 writes no files to
`C:\Program Files\GuruRMM\` or `C:\ProgramData\GuruRMM\`, and no Windows
Application Event Log entries under provider `gururmm*`. This made it
impossible to confirm the agent-side state transition
(`Heartbeat → Tunnel`) or receipt of `TunnelReady` during the test.
**Fix:** add a log target in `agent/src/main.rs` (env_logger or tracing
with a rolling file appender) writing to
`C:\ProgramData\GuruRMM\agent.log`. Optionally also emit critical events
(tunnel open/close, update success/failure) to the Windows Event Log via
the `eventlog` crate.
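
As a rough illustration of the file target, the sketch below appends timestamped lines using only the standard library. It is a stand-in, not the proposed implementation: it has no rotation and no level filtering, which a `tracing` or `env_logger` setup would provide. The function names `format_line` and `append_log` and the line format are assumptions.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;
use std::time::{SystemTime, UNIX_EPOCH};

/// Formats one log line as "<unix_seconds> <LEVEL> <message>\n" (assumed format).
pub fn format_line(unix_secs: u64, level: &str, message: &str) -> String {
    format!("{} {} {}\n", unix_secs, level, message)
}

/// Appends one line to the log file, creating the parent directory
/// and the file itself if they do not exist yet.
pub fn append_log(path: &Path, level: &str, message: &str) -> io::Result<()> {
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent)?;
    }
    // Fall back to 0 if the system clock is set before the Unix epoch.
    let unix_secs = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0);
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    file.write_all(format_line(unix_secs, level, message).as_bytes())
}
```

On AD2 the agent would call something like `append_log(Path::new(r"C:\ProgramData\GuruRMM\agent.log"), "INFO", "tunnel open")` at each state transition, which would have been enough to confirm `Heartbeat → Tunnel` during this test.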
### 3. Phase 2 gap confirmed against a real use case
Live need: run a couple of diagnostic commands on AD2 (sshd flapping
sporadically on port 22, no process crash in Event Log; want to investigate
firewall/Defender events from the server side). With no channels, the
tunnel's only utility today is proving the session layer works. The actual
remote-operate capability still depends on Phase 2.
**Priority order for Phase 2 channels** (based on what would have been useful
here):
1. **Terminal channel** first — unlocks 80% of field use cases (log tails,
`Get-Service`, `Restart-Service`, `Get-WinEvent`).
2. **Service channel** second — tight scope, high value for "restart sshd".
3. **File channel** third — needed but rarely urgent; SFTP already exists.
4. **Registry channel** last — niche, can defer.
## What Else We Observed
- The public tunnel chain `rmm-api.azcomputerguru.com` → Cloudflare → nginx
→ API (3001) proxies `/api/*` correctly. The docs in CONTEXT.md implied
nginx only served `/downloads/`; confirmed today that it also proxies API
paths, which is why off-LAN admin usage works.
- AD2 agent start time `2026-04-11 22:09` corresponds to last reboot of
AD2; the agent has not restarted since despite sshd port flaps (sshd PID
4012 also continuously running since same moment). Confirms the tunnel
infrastructure and the RMM agent are stable; the sshd flap is a separate
network-layer issue unrelated to GuruRMM.
## Credentials Used
- **Admin Email:** admin@azcomputerguru.com
- **Admin Password:** GuruRMM2025
- **Public API:** https://rmm-api.azcomputerguru.com
**Note:** `op read "op://Infrastructure/GuruRMM Server/Admin Password"`
returned a stale value (`ClaudeAPI2026!@#`) that fails login. The
2026-04-14 session log documents the current password as `GuruRMM2025`.
1Password entry should be updated to match.
## Next Steps
1. Update 1Password `Infrastructure/GuruRMM Server` entry — set
`Admin Password` field to `GuruRMM2025` to match what server accepts.
2. Fix `/api/v1/tunnel/status/{id}` for closed sessions (see Finding 1).
3. Add file/event-log output to agent (see Finding 2).
4. Begin Phase 2 — Terminal channel first.
---
## Update (evening session): Roadmap evolution + Azure Trusted Signing setup
Substantial architectural planning session. Product direction shifted from "single-tenant RMM tool" to "multi-tenant SaaS for MSPs." The roadmap was updated significantly to reflect this shift.
### Roadmap additions to ROADMAP.md
1. **Terminology (canonical)** — locked in the 5-tier hierarchy: Platform → Partner (DB: tenant_id) → Client → Site → Agent. API/UI says "Partner"; DB column is `tenant_id`. API path convention `/api/public/v1/partners/{pid}/clients/{cid}/sites/{sid}/agents/{aid}`. Event topics like `agent.online`, `partner.upgraded`. Full table + rules at top of ROADMAP.md.
2. **Tunnel Channels (Phase 2)** — T1-T8 tracking Terminal/File/Registry/Service channels + tech-side subscriber (T5 is gating dep — browser currently has no way to receive tunnel data, `server/src/ws/mod.rs:808-825` discards incoming `AgentMessage::TunnelData`).
3. **Logging, Audit & Observability** — L1-L10 three-tier design:
- Agent self-logging via OS-native sinks (Windows Event Log custom provider, Linux journald, macOS os_log)
- Client machine health via OS event log pulls — default 15-min delta + force-pull on tunnel open/close; default levels Critical+Error+Warning for delta, 4h bulk for Info/Debug/Audit/Notification; all tenant-configurable
- Tunnel audit direct to DB table `tunnel_audit` (already exists, unused) — no scrubbing, sensitive input captured intentionally for tech-behavior audit; 90-day tenant-visible retention default; indefinite system archive to object storage
- Agent config push via `ServerMessage::Config` on connect + real-time when tenant admin changes settings
4. **Multi-tenancy / MSP SaaS (M1-M7)** — tenant_id on every table from now forward, tenancy-aware auth middleware, tenant admin dashboard, per-agent/month billing meter, data residency options, tenant export API, onboarding wizard.
5. **Modular Architecture & Public APIs (X1-X12)** — core vs. module boundary, event bus (NATS JetStream or Redis Streams), module manifest, module-to-core + module-to-module versioned APIs, public REST API `/api/public/v1/` with OpenAPI spec + scoped API keys, webhook subscriptions, WASM or OCI sandbox for third-party modules (deferred), per-module billing. Concrete module candidates documented: PSA/CRM, Remote Syslog, Backups, Patch Mgmt, IT-Glue-style Docs, Network Monitoring.
6. **Protocol Versioning & Stale-Agent Recovery (V1-V10)** — `/api/v1/bootstrap/hello` declared **sacred** (additive-only forever). Compat shim layer per old protocol version at `server/src/compat/v{N}.rs`. Server-initiated forced-upgrade instruction. Per-tenant update channels (stable/current/beta). Auto-sunset policy when old version fleet hits zero. Rollback path via `action: downgrade_required`. Concrete motivating example: Scileppi VP laptop offline for days — must be able to reconnect, get accepted, auto-upgrade.
7. **Certificates & Trust (C1-C11)** — full cost + priority matrix. C1: Azure Trusted Signing for Windows (Public Trust). C2: Apple Developer Program. C3: GPG for Linux. C4-C11: TLS automation, mTLS, SBOM, FP submissions, DKIM.
8. **Decisions Log** — appended rationale entries for every 2026-04-15 decision so future sessions don't re-litigate.
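
To make the terminology rule from item 1 concrete, here is a small sketch of how the public path convention could be built from the four tiers below Platform. The `AgentRef` struct and its field names are hypothetical; only the path template and the Partner/`tenant_id` naming split come from the roadmap.

```rust
/// Identifies one agent by its position in the hierarchy
/// Platform → Partner → Client → Site → Agent.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AgentRef {
    /// Exposed as "Partner" in API/UI; stored as `tenant_id` in the DB.
    pub partner_id: String,
    pub client_id: String,
    pub site_id: String,
    pub agent_id: String,
}

impl AgentRef {
    /// Builds the public API path for this agent using the roadmap's template.
    pub fn api_path(&self) -> String {
        format!(
            "/api/public/v1/partners/{}/clients/{}/sites/{}/agents/{}",
            self.partner_id, self.client_id, self.site_id, self.agent_id
        )
    }
}
```

Keeping the path construction in one place like this is one way to stop "tenant" from leaking into URLs while the column name stays `tenant_id`.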
### CONTEXT.md anti-patterns added
- "DO NOT make breaking changes to `/api/v1/bootstrap/hello`" — additive-only forever
- "DO NOT cross module boundaries by importing another module's internals" — event bus or exposed APIs only
- Hierarchy terminology table added to anti-patterns block (canonical reference)
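
The additive-only rule for `/api/v1/bootstrap/hello` exists so the server can always answer an agent of any age. One possible shape for that decision is sketched below, assuming protocol versions are plain integers. Only the `downgrade_required` action string appears in the roadmap; the `HelloAction` enum, `decide_hello`, and the other two wire strings are assumptions for illustration.

```rust
use std::cmp::Ordering;

/// What the server tells an agent in its hello response (illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HelloAction {
    /// Agent speaks the server's current protocol version.
    Proceed,
    /// Agent is older: accept it through a compat shim, then instruct an upgrade.
    UpgradeRequired,
    /// Agent is newer than the server, e.g. after a server rollback.
    DowngradeRequired,
}

impl HelloAction {
    /// Wire value for the `action` field; only "downgrade_required" is from the roadmap.
    pub fn as_wire(&self) -> &'static str {
        match self {
            HelloAction::Proceed => "proceed",
            HelloAction::UpgradeRequired => "upgrade_required",
            HelloAction::DowngradeRequired => "downgrade_required",
        }
    }
}

/// Compares the agent's protocol version with the server's and picks an action.
pub fn decide_hello(agent_protocol: u32, server_protocol: u32) -> HelloAction {
    match agent_protocol.cmp(&server_protocol) {
        Ordering::Equal => HelloAction::Proceed,
        Ordering::Less => HelloAction::UpgradeRequired,
        Ordering::Greater => HelloAction::DowngradeRequired,
    }
}
```

Under this sketch, a laptop that was offline through several releases would still get a parseable answer (`upgrade_required`) instead of a rejected connection, provided the hello request and response only ever gain fields.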
### Azure Trusted Signing — provisioned and IV submitted
**Business identity confirmed** via D&B profile lookup: `Arizona Computer Guru LLC` (D-U-N-S `00-566-1506` / `005661506`), 7437 E 22ND St, Tucson AZ 85710, (520) 304-8300, mike@azcomputerguru.com. 25+ years operating history → Public Trust eligible (>3yr threshold).
**Provisioned in subscription `Basic` (`e507e953-2ce9-4887-ba96-9b654f7d3267`):**
- Resource group: `gururmm-signing-rg` (westus2)
- Trusted Signing Account: `gururmm-signing`
- Account URI: `https://wus2.codesigning.azure.net/`
- SKU: Basic (~$9.99/mo billing started 2026-04-16 00:16 UTC)
**RBAC granted:**
- `mike@azcomputerguru.com` → role `Artifact Signing Identity Verifier` at account scope
**Identity Validation submitted:**
- IV ID: `03028768-f611-4904-aa58-c755020f436a`
- Status: `In Progress` (Microsoft review, 1-5 business days typical)
- Submitted name: `Arizona Computer Guru LLC` (state filing); D&B record has older `COMPUTER GURU` Corporation — may need to update D&B profile for consistency
- Primary email: mike@; Secondary: admin@azcomputerguru.com
- Microsoft may call 520-304-8300 — voicemail should identify Computer Guru
**Pending (blocks on IV approval):**
- Certificate Profile creation: `az trustedsigning certificate-profile create --resource-group gururmm-signing-rg --account-name gururmm-signing --profile-name gururmm-public-trust --profile-type PublicTrust --identity-validation-id 03028768-f611-4904-aa58-c755020f436a`
- Signing role assignment: `Trusted Signing Certificate Profile Signer` to CI build principal
- Local tooling install: Windows SDK (for signtool.exe), Microsoft.Trusted.Signing.Client NuGet package
**All details persisted to vault:** `D:\vault\services\azure-trusted-signing.sops.yaml` (encrypted).
### Action items for next session
1. Check IV status — portal → Trusted Signing Accounts → gururmm-signing → Identity Validation
2. If approved → run the cert profile create command (already staged in vault)
3. If Microsoft flags legal name mismatch: reply with AZ Corp Commission LLC Articles; update D&B record
4. Start signtool.exe + dlib integration in a local scratch project
5. Meanwhile, fix the two backlog items (tunnel status 403 bug, agent logging) — they're both independent of the Azure work and small PRs