claudetools/clients/dataforth/session-logs/2026-04-13-session.md

# Session Log: 2026-04-13 — Dataforth

## Summary

Continuation of the test datasheet pipeline work. The prior session (2026-04-12) confirmed that the PostgreSQL migration is complete, and Hoffman provided the new Swagger API URL; OAuth credentials are still pending. Today: reviewed the full API spec, prepared a structured question list for a Zoom call with Hoffman, and discussed architecture options (raw file upload vs. structured record push vs. direct DB access).

Also started triaging an unrelated Neptune Exchange mail-flow issue with the user (tsorensen → external bounce). The user resolved it on their own before I got into it.

## Work completed

### API spec review
Pulled `https://www.dataforth.com/swagger/v1/swagger.json` and mapped endpoints.

**Base URL:** `https://www.dataforth.com` (presumed; Swagger UI at `/swagger/index.html`)

**Authentication (IdentityServer-style)**
- Flow: **OAuth2 Authorization Code + PKCE**
- Authorization URL: `https://login.dataforth.com/connect/authorize`
- Token URL: `https://login.dataforth.com/connect/token`
- Scopes: `openid`, `profile`, `dataforth.web`
- Swagger's own test client: `client_id = dataforth.swagger` (NOT for our use)
- OIDC discovery expected at: `https://login.dataforth.com/.well-known/openid-configuration`

**All endpoints**
| Path | Method |
|------|--------|
| `/api/v1/Admin/refresh-cache` | POST |
| `/api/v1/Admin/cache-status` | GET |
| `/api/v1/Categories` | GET |
| `/api/v1/Categories/{id}` | GET |
| `/api/v1/Categories/by-catalog-node/{catalogNodeId}` | GET |
| `/api/v1/OrderableProducts/{orderableProductId}/Attributes` | POST |
| `/api/v1/OrderableProducts/{orderableProductId}/Attributes/{attributeId}` | PUT/DELETE |
| `/api/v1/Products`, `/api/v1/Products/{id}`, `/api/v1/Products/by-part-number/{partNumber}` | GET |
| `/api/v1/product-series`, `/api/v1/product-series/{id}`, `/api/v1/product-series/by-designation/{designation}`, `/api/v1/product-series/by-catalog-node/{catalogNodeId}` | GET |
| `/api/v1/ProductType`, `/api/v1/ProductType/{productTypeId}/products` | GET |
| `/api/v1/TestReportDataFiles` | POST (single upload) |
| `/api/v1/TestReportDataFiles` | GET (paginated list) |
| `/api/v1/TestReportDataFiles/bulk` | POST (batch upload) |
| `/api/v1/TestReportDataFiles/{serialNumber}` | GET / DELETE |
| `/api/v1/TestReportDataFiles/stats` | GET |

**TestReportDataFiles payload shapes**
- POST single: `{ SerialNumber: string(max 50), Content: string(min 1) }` → `{ SerialNumber, ContentHash, Created }`
- POST bulk: `{ Items: [CreateTestReportRequest, ...] }` → `{ TotalReceived, Created, Updated, Unchanged, Errors[] }`
- GET single: `{ SerialNumber, Content, CreatedAtUtc, UpdatedAtUtc }`
- GET stats: `{ TotalCount, LatestCreatedAtUtc, LatestUpdatedAtUtc }`
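- Server appears to handle dedup via ContentHash (the bulk response reports `Created` / `Updated` / `Unchanged` counts), so the client likely doesn't need to pre-check. Exact semantics still to be confirmed with Hoffman.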
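
The shapes above could be modeled client-side roughly as follows. This is a sketch, not the uploader: the field names and the 50-character limit come from the Swagger spec as noted, while the class and function names are made up here, and rejecting an empty serial number is my own assumption (the spec only states a max length).

```python
import json
from dataclasses import dataclass
from typing import Iterable

MAX_SERIAL_LEN = 50  # spec: SerialNumber string(max 50)


@dataclass(frozen=True)
class CreateTestReportRequest:
    serial_number: str
    content: str

    def __post_init__(self) -> None:
        # Empty serial is rejected as an assumption; the spec only gives a max.
        if not self.serial_number or len(self.serial_number) > MAX_SERIAL_LEN:
            raise ValueError(f"bad SerialNumber: {self.serial_number!r}")
        if len(self.content) < 1:  # spec: Content string(min 1)
            raise ValueError(f"empty Content for {self.serial_number}")

    def to_dict(self) -> dict:
        return {"SerialNumber": self.serial_number, "Content": self.content}


def bulk_body(items: Iterable[CreateTestReportRequest]) -> str:
    """JSON body for POST /api/v1/TestReportDataFiles/bulk."""
    return json.dumps({"Items": [item.to_dict() for item in items]})


def summarize_bulk_response(resp: dict) -> str:
    """One-line summary of the bulk response counters for logging."""
    return (
        f"received={resp['TotalReceived']} created={resp['Created']} "
        f"updated={resp['Updated']} unchanged={resp['Unchanged']} "
        f"errors={len(resp['Errors'])}"
    )
```

Validating before the request is built keeps obviously bad files out of a batch, which matters if one bad item turns out to fail the whole `/bulk` call (unknown until Hoffman confirms).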

### Architecture discussion
Three options for delivering datasheets:
- **A: Raw file blob via current API** — works today, zero new API work, simple client code
- **B: Structured records via new endpoints** — cleaner long-term; we already have parsed data in AD2's PostgreSQL `TestDataDB` (2.8M records post-2026-04-12 migration). Requires Hoffman to add endpoints
- **C: Direct DB access** — rejected (coupling, security, DBA nightmare)

Preferred path: whichever is less work for Hoffman. Frame it as offering flexibility — we can send raw text, structured JSON, or even CSV.

### Questions prepared for John Hoffman Zoom call
Produced a prioritized list (MUST / SHOULD / NICE) covering:
- Batch size + payload size + rate limits (MUST)
- Idempotency + dedup semantics (MUST)
- Cutover plan from old DataforthWebShare path (MUST)
- Request: enable `client_credentials` grant on a new client for the AD2 uploader (SHOULD)
- Staging endpoint availability (SHOULD)
- PDF handling (`X:\For_Web_PDF`) — same endpoint or different? (SHOULD)
- Product linkage — does a TestReport need to link to a Product/Series record? (SHOULD)
- Monitoring + error visibility on his side (NICE)
- SLA / escalation contact (NICE)

### Pending from Hoffman (as of end-of-session 2026-04-13)
- OAuth credentials (he said "today")
- Clarification on client_credentials grant support
- Answers to the MUST questions above after the Zoom
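
If Hoffman does enable `client_credentials`, the token request for the AD2 uploader would look roughly like this. It is a sketch under several assumptions: that the server accepts HTTP Basic client authentication at the token endpoint, that `dataforth.web` is the right scope for a machine client, and that the response carries a standard `access_token` field. None of that is confirmed yet.

```python
import base64
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

TOKEN_URL = "https://login.dataforth.com/connect/token"


def build_token_request(client_id: str, client_secret: str,
                        scope: str = "dataforth.web") -> Request:
    """Build (but do not send) a client_credentials token request."""
    body = urlencode({"grant_type": "client_credentials", "scope": scope})
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode("utf-8"))
    return Request(
        TOKEN_URL,
        data=body.encode("ascii"),
        method="POST",
        headers={
            # Assumption: Basic client auth is accepted by the server.
            "Authorization": "Basic " + basic.decode("ascii"),
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def fetch_access_token(client_id: str, client_secret: str) -> str:
    """Send the request and return the bearer token (network call)."""
    request = build_token_request(client_id, client_secret)
    with urlopen(request, timeout=30) as resp:
        return json.load(resp)["access_token"]
```

Splitting the builder from the network call lets the request shape be checked offline before any credentials exist.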

## Pipeline context (unchanged from 2026-04-12)

### Current state
- **Stage 1**: DOS test stations → D2TESTNAS (192.168.0.9, rsync daemon, module "test" → /data/test) ✓
- **Stage 2**: NAS → AD2 via `Sync-FromNAS-rsync.ps1` scheduled every 15 min ✓
- **Stage 3**: DFWDS.exe validates + renames — **config wiped in crypto attack**; `C:\DFWDS\DFWDS_NAMES.TXT` missing. Check Haubner D: for backup.
- **Stage 4**: Website upload — **BROKEN**; this is what we're rebuilding via the new API
- **Stage 5**: PDF generation — ~4,773 PDFs in `X:\For_Web_PDF`, origin unclear

### Data locations
- Incoming: `X:\Test_Datasheets` (staging)
- Validated: `X:\For_Web` (~501K files) ← uploader source
- PDFs: `X:\For_Web_PDF` (~4.7K files)
- Rejected: `X:\Bad_Datasheets` (~18K)
- DFWDS logs: `X:\Datasheets_Log`
- `X:` = `\\ad2\webshare`

### Datasheet format
Plain text, ~50 lines. Header: Dataforth address/phone. Fields: Date, Model (e.g. SCM5B41-03), SN (e.g. 178439-1), accuracy test table, final test results. Filename: `{SN}.txt` (e.g. `178439-1.txt`).
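
Given the `{SN}.txt` naming convention, the serial number can be taken from the filename rather than parsed out of the body. A rough sketch of reading the validated folder into `(serial, content)` pairs follows. The `cp437` default encoding is a guess based on the DOS origin of the files and should be checked against real datasheets; the function names are illustrative.

```python
from pathlib import Path
from typing import Iterator, Tuple


def serial_from_filename(path: Path) -> str:
    """`178439-1.txt` -> `178439-1`, per the filename convention."""
    return path.stem


def iter_datasheets(folder: Path,
                    encoding: str = "cp437") -> Iterator[Tuple[str, str]]:
    """Yield (serial, text) for each non-empty .txt file in the folder.

    The suffix check is case-insensitive because DOS-originated files
    may be named `.TXT`. The encoding default is an assumption.
    """
    for path in sorted(folder.iterdir()):
        if not path.is_file() or path.suffix.lower() != ".txt":
            continue
        text = path.read_text(encoding=encoding, errors="replace")
        if text.strip():  # the API requires Content to have min length 1
            yield serial_from_filename(path), text
```

For the real run this would be pointed at `X:\For_Web`; with ~501K files, iterating lazily avoids loading everything into memory at once.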

### Credentials used/referenced
- **Old upload path** (being replaced): `DataforthWebShare / Data6277`
- **New API**: OAuth client credentials pending from Hoffman
- **Neptune Exchange** (for today's mail triage): `ACG\administrator` / `Gptf*77ttb##` — requires VPN

## Next session plan

1. Receive OAuth creds from Hoffman (client_id + client_secret, ideally client_credentials grant enabled)
2. Store credentials in `D:\vault\clients\dataforth\dataforth-api-oauth.sops.yaml`
3. Stand up a one-page POC: get token, POST one test report, verify via GET
4. If POC works → implement full uploader on AD2:
   - Language: PowerShell (fits existing scripts) or Python (already used in `projects/dataforth-dos/datasheet-pipeline/implementation/`)
   - State tracking: local manifest (serial → hash + last-upload-time) or use server's ContentHash response
   - Use `/bulk` endpoint in batches (size TBD with Hoffman)
   - Scheduled task on AD2, 15-min or hourly cadence
   - Initial backfill script for 501K files — run off-hours
5. Parallel-run with old webshare path until confident, then retire old path
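
For the state-tracking and batching items in step 4, one possible shape is a local JSON manifest of serial → hash plus a simple chunker. This is a sketch of the "local manifest" option only. The SHA-256 hash here is purely for local change detection and is not assumed to match the server's `ContentHash` algorithm; the batch size is left as a parameter because the limit is still TBD with Hoffman.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, Iterable, Iterator, List, Tuple


def content_hash(text: str) -> str:
    """Local change-detection hash (not the server's ContentHash)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def load_manifest(path: Path) -> Dict[str, str]:
    """Return serial -> hash, or an empty dict on first run."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_manifest(path: Path, manifest: Dict[str, str]) -> None:
    """Write to a temp file, then swap it in, so a crash cannot truncate it."""
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(manifest, sort_keys=True), encoding="utf-8")
    tmp.replace(path)


def pending(sheets: Iterable[Tuple[str, str]],
            manifest: Dict[str, str]) -> List[Tuple[str, str]]:
    """Keep only sheets that are new or whose content changed."""
    return [(sn, text) for sn, text in sheets
            if manifest.get(sn) != content_hash(text)]


def batches(items: List, size: int) -> Iterator[List]:
    """Split items into lists of at most `size` for the /bulk endpoint."""
    if size < 1:
        raise ValueError("batch size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

The manifest entry for a serial should only be updated after the server acknowledges that item, so a failed batch is retried on the next scheduled run.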

## Reference URLs

- Swagger UI: https://www.dataforth.com/swagger/index.html
- Swagger JSON: https://www.dataforth.com/swagger/v1/swagger.json
- Authorization URL: https://login.dataforth.com/connect/authorize
- Token URL: https://login.dataforth.com/connect/token
- Expected OIDC discovery: https://login.dataforth.com/.well-known/openid-configuration