claudetools/clients/dataforth/session-logs/2026-04-13-session.md
Mike Swanson 5169936cfc Session log: IMC SQL move + DISM repair attempt, VWP RDWeb brute-force incident, Dataforth API planning
- IMC: document 716 GB SQL backup cleanup, retention scheduled task, DB move C:->S:, sysadmin grant via single-user recovery, parked RDS removal after KB5075999 apply rolled back on ETW manifest error
- Valleywide: document RDWeb brute-force incident on VWP-QBS, UDM port forward closure, 30-day audit showing no breach, lockout policy restoration
- Dataforth: capture Swagger API review and Hoffman Zoom call prep
2026-04-13 15:40:43 -07:00

Session Log: 2026-04-13 — Dataforth

Summary

Continuation of the test datasheet pipeline work. Prior session (2026-04-12) confirmed PostgreSQL migration complete; Hoffman provided the new Swagger API URL; awaiting OAuth credentials. Today: reviewed the full API spec, prepared a structured question list for a Zoom call with Hoffman, and discussed architecture options (raw file upload vs. structured record push vs. direct DB).

Also helped user triage an unrelated Neptune Exchange mail-flow issue (tsorensen → external bounce). User resolved on their own before I got into it.

Work completed

API spec review

Pulled https://www.dataforth.com/swagger/v1/swagger.json and mapped endpoints.

Base URL: https://www.dataforth.com (presumed; Swagger UI at /swagger/index.html)

Authentication (IdentityServer-style)

  • Flow: OAuth2 Authorization Code + PKCE
  • Authorization URL: https://login.dataforth.com/connect/authorize
  • Token URL: https://login.dataforth.com/connect/token
  • Scopes: openid, profile, dataforth.web
  • Swagger's own test client: client_id = dataforth.swagger (NOT for our use)
  • OIDC discovery expected at: https://login.dataforth.com/.well-known/openid-configuration
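
For reference, the PKCE half of the flow above can be sketched as follows. This is a minimal illustration, not Dataforth's client code: the authorize URL and scopes come from the spec notes above, while `client_id`, `redirect_uri`, and `state` are placeholders for values we do not have yet. The S256 challenge derivation follows RFC 7636.

```python
import base64
import hashlib
import secrets
from urllib.parse import urlencode

# From the Swagger security definition noted above.
AUTHORIZE_URL = "https://login.dataforth.com/connect/authorize"


def _b64url(raw: bytes) -> str:
    """Base64url-encode without padding, as RFC 7636 requires."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def challenge_for(verifier: str) -> str:
    """Derive the S256 code_challenge from a code_verifier."""
    return _b64url(hashlib.sha256(verifier.encode("ascii")).digest())


def make_pkce_pair() -> tuple[str, str]:
    """Return (code_verifier, code_challenge); 32 random bytes give a 43-char verifier."""
    verifier = _b64url(secrets.token_bytes(32))
    return verifier, challenge_for(verifier)


def build_authorize_url(client_id: str, redirect_uri: str, challenge: str, state: str) -> str:
    """Build the browser URL for the authorization step.

    client_id and redirect_uri are hypothetical until Hoffman issues a client.
    """
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": "openid profile dataforth.web",
        "state": state,
        "code_challenge": challenge,
        "code_challenge_method": "S256",
    }
    return f"{AUTHORIZE_URL}?{urlencode(params)}"
```

The verifier stays on the client and is sent only in the later token exchange; the authorize URL carries just the challenge. This flow needs a browser, which is why an unattended uploader on AD2 would be better served by a different grant.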

All endpoints

Path | Method | Notes
/api/v1/Admin/refresh-cache | POST |
/api/v1/Admin/cache-status | GET |
/api/v1/Categories | GET |
/api/v1/Categories/{id} | GET |
/api/v1/Categories/by-catalog-node/{catalogNodeId} | GET |
/api/v1/OrderableProducts/{orderableProductId}/Attributes | POST |
/api/v1/OrderableProducts/{orderableProductId}/Attributes/{attributeId} | PUT, DELETE |
/api/v1/Products | GET |
/api/v1/Products/{id} | GET |
/api/v1/Products/by-part-number/{partNumber} | GET |
/api/v1/product-series | GET |
/api/v1/product-series/{id} | GET |
/api/v1/product-series/by-designation/{designation} | GET |
/api/v1/product-series/by-catalog-node/{catalogNodeId} | GET |
/api/v1/ProductType | GET |
/api/v1/ProductType/{productTypeId}/products | GET |
/api/v1/TestReportDataFiles | POST | single upload
/api/v1/TestReportDataFiles | GET | paginated list
/api/v1/TestReportDataFiles/bulk | POST | batch upload
/api/v1/TestReportDataFiles/{serialNumber} | GET, DELETE |
/api/v1/TestReportDataFiles/stats | GET |

TestReportDataFiles payload shapes

  • POST single: request { SerialNumber: string(max 50), Content: string(min 1) } → response { SerialNumber, ContentHash, Created }
  • POST bulk: request { Items: [CreateTestReportRequest, ...] } → response { TotalReceived, Created, Updated, Unchanged, Errors[] }
  • GET single: { SerialNumber, Content, CreatedAtUtc, UpdatedAtUtc }
  • GET stats: { TotalCount, LatestCreatedAtUtc, LatestUpdatedAtUtc }
  • Server handles dedup via ContentHash → client doesn't need to pre-check.
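
A small sketch of how the request shapes above might be modeled client-side, so that oversize serial numbers or empty content are caught before a batch is sent. The field names and the two length limits are taken from the notes above; the Python class and helper names are illustrative, and the exact JSON key casing on the wire (PascalCase here) is an assumption to verify against swagger.json.

```python
import json
from dataclasses import dataclass

MAX_SERIAL_LEN = 50  # SerialNumber: string(max 50)
MIN_CONTENT_LEN = 1  # Content: string(min 1)


@dataclass(frozen=True)
class CreateTestReportRequest:
    """Client-side mirror of the single-upload request body."""

    serial_number: str
    content: str

    def __post_init__(self) -> None:
        if len(self.serial_number) > MAX_SERIAL_LEN:
            raise ValueError(f"SerialNumber exceeds {MAX_SERIAL_LEN} characters")
        if len(self.content) < MIN_CONTENT_LEN:
            raise ValueError("Content must not be empty")

    def to_dict(self) -> dict:
        # Key casing assumed to match the names shown in the spec notes.
        return {"SerialNumber": self.serial_number, "Content": self.content}


def single_body(item: CreateTestReportRequest) -> str:
    """JSON body for POST /api/v1/TestReportDataFiles."""
    return json.dumps(item.to_dict())


def bulk_body(items: list[CreateTestReportRequest]) -> str:
    """JSON body for POST /api/v1/TestReportDataFiles/bulk."""
    return json.dumps({"Items": [item.to_dict() for item in items]})
```

Validating locally keeps one bad file from surfacing only as an entry in the bulk response's Errors[] after the whole batch has been transmitted.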

Architecture discussion

Three options for delivering datasheets:

  • A: Raw file blob via current API — works today, zero new API work, simple client code
  • B: Structured records via new endpoints — cleaner long-term; we already have parsed data in AD2's PostgreSQL TestDataDB (2.8M records post-2026-04-12 migration). Requires Hoffman to add endpoints
  • C: Direct DB access — rejected (coupling, security, DBA nightmare)

Preferred path: whichever is less work for Hoffman. Frame it as offering flexibility — we can send raw text, structured JSON, or even CSV.

Questions prepared for John Hoffman Zoom call

Produced a prioritized list (MUST / SHOULD / NICE) covering:

  • Batch size + payload size + rate limits (MUST)
  • Idempotency + dedup semantics (MUST)
  • Cutover plan from old DataforthWebShare path (MUST)
  • Request: enable client_credentials grant on a new client for the AD2 uploader (SHOULD)
  • Staging endpoint availability (SHOULD)
  • PDF handling (X:\For_Web_PDF) — same endpoint or different? (SHOULD)
  • Product linkage — does a TestReport need to link to a Product/Series record? (SHOULD)
  • Monitoring + error visibility on his side (NICE)
  • SLA / escalation contact (NICE)
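
To make the client_credentials request concrete for the call: if Hoffman enables that grant, the token request from the AD2 uploader would look roughly like the sketch below. It only builds the request and does not send it. The token URL is from the spec; the choice of HTTP Basic client authentication, the `dataforth.web` scope for a machine client, and the credential values are all assumptions until he confirms.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

# From the Swagger security definition.
TOKEN_URL = "https://login.dataforth.com/connect/token"


def build_token_request(client_id: str, client_secret: str, scope: str = "dataforth.web") -> Request:
    """Build (but do not send) an OAuth2 client_credentials token request.

    Assumes the server accepts client authentication via HTTP Basic;
    some deployments expect client_id/client_secret in the form body instead.
    """
    body = urlencode({"grant_type": "client_credentials", "scope": scope}).encode("ascii")
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode("utf-8")).decode("ascii")
    return Request(
        TOKEN_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
```

Sending it would be a single `urllib.request.urlopen(req)` call, with the JSON response expected to carry an `access_token` field per RFC 6749. No browser or user session is involved, which is the reason for asking for this grant over Authorization Code + PKCE.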

Pending from Hoffman (as of end-of-session 2026-04-13)

  • OAuth credentials (he said "today")
  • Clarification on client_credentials grant support
  • Answers to the MUST questions above after the Zoom

Pipeline context (unchanged from 2026-04-12)

Current state

  • Stage 1: DOS test stations → D2TESTNAS (192.168.0.9, rsync daemon, module "test" → /data/test) ✓
  • Stage 2: NAS → AD2 via Sync-FromNAS-rsync.ps1 scheduled every 15 min ✓
  • Stage 3: DFWDS.exe validates + renames — config wiped in crypto attack; C:\DFWDS\DFWDS_NAMES.TXT missing. Check Haubner D: for backup.
  • Stage 4: Website upload — BROKEN; this is what we're rebuilding via the new API
  • Stage 5: PDF generation — ~4,773 PDFs in X:\For_Web_PDF, origin unclear

Data locations

  • Incoming: X:\Test_Datasheets (staging)
  • Validated: X:\For_Web (~501K files) ← uploader source
  • PDFs: X:\For_Web_PDF (~4.7K files)
  • Rejected: X:\Bad_Datasheets (~18K)
  • DFWDS logs: X:\Datasheets_Log
  • X: = \\ad2\webshare

Datasheet format

Plain text, ~50 lines. Header: Dataforth address/phone. Fields: Date, Model (e.g. SCM5B41-03), SN (e.g. 178439-1), accuracy test table, final test results. Filename: {SN}.txt (e.g. 178439-1.txt).
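
Because the filename convention is {SN}.txt, the uploader can derive the serial number from the path without parsing the file body. A minimal sketch, with helper names of my own choosing; the default text encoding (cp437, on the guess that the files originate on DOS stations) is an assumption that needs checking against real datasheets.

```python
from pathlib import Path, PureWindowsPath


def serial_from_filename(path: str) -> str:
    """Return the serial number encoded in a datasheet filename ({SN}.txt)."""
    p = PureWindowsPath(path)  # parses X:\ style paths on any OS
    if p.suffix.lower() != ".txt":
        raise ValueError(f"not a datasheet file: {path}")
    return p.stem


def report_from_file(path: Path, encoding: str = "cp437") -> dict:
    """Read one datasheet and shape it like the single-upload request body.

    encoding is an assumption (DOS-era test stations); confirm before the backfill.
    """
    content = path.read_text(encoding=encoding)
    return {"SerialNumber": serial_from_filename(path.name), "Content": content}
```

For example, `serial_from_filename(r"X:\For_Web\178439-1.txt")` returns `"178439-1"`.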

Credentials used/referenced

  • Old upload path (being replaced): DataforthWebShare / Data6277
  • New API: OAuth client credentials pending from Hoffman
  • Neptune Exchange (for today's mail triage): ACG\administrator / Gptf*77ttb## — requires VPN

Next session plan

  1. Receive OAuth creds from Hoffman (client_id + client_secret, ideally client_credentials grant enabled)
  2. Store credentials in D:\vault\clients\dataforth\dataforth-api-oauth.sops.yaml
  3. Stand up a one-page POC: get token, POST one test report, verify via GET
  4. If POC works → implement full uploader on AD2:
    • Language: PowerShell (fits existing scripts) or Python (already used in projects/dataforth-dos/datasheet-pipeline/implementation/)
    • State tracking: local manifest (serial → hash + last-upload-time) or use server's ContentHash response
    • Use /bulk endpoint in batches (size TBD with Hoffman)
    • Scheduled task on AD2, 15-min or hourly cadence
    • Initial backfill script for 501K files — run off-hours
  5. Parallel-run with old webshare path until confident, then retire old path
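
The local-manifest option and the batching in step 4 could look roughly like this. It is a sketch under stated assumptions: the digest is a client-side SHA-256 used only to detect local changes (the algorithm behind the server's ContentHash is unknown and not assumed to match), the batch size is a parameter because the real limit is still to be agreed with Hoffman, and all function names are illustrative.

```python
import hashlib
from typing import Iterator


def content_digest(text: str) -> str:
    """Local change-detection digest; NOT assumed to equal the server's ContentHash."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def select_changed(reports: dict[str, str], manifest: dict[str, str]) -> list[tuple[str, str]]:
    """Return (serial, content) pairs that are new or differ from the manifest."""
    return [
        (serial, content)
        for serial, content in sorted(reports.items())
        if manifest.get(serial) != content_digest(content)
    ]


def batches(items: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items for the /bulk endpoint."""
    if size < 1:
        raise ValueError("batch size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def record_uploaded(manifest: dict[str, str], batch: list[tuple[str, str]]) -> None:
    """Update the manifest after a batch is confirmed by the server."""
    for serial, content in batch:
        manifest[serial] = content_digest(content)
```

Since the server deduplicates on its side, the manifest is an optimization rather than a correctness requirement: it keeps each scheduled run from re-sending all ~501K files, and losing it would cost only bandwidth on the next run.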

Reference URLs