# Session Log — Dataforth — 2026-04-14

## DFWDS port + Hoffman API uploader pipeline (end-to-end working)

Mid-day session: pulled the legacy DFWDS VB6 source from AD1, ported its file-classification logic to Node, wired up the bulk uploader against Hoffman's new `https://www.dataforth.com` API, drained the 897-file Test_Datasheets backlog, and pushed everything currently in For_Web to the live server. End result: server caught up to the AD2 For_Web state with 803 new records created.

(Earlier today this same conversation also did a bunch of email-account password resets via ACG-DC16 — those are noted at the bottom for completeness but aren't Dataforth.)

---

## Session Summary

### What was accomplished

1. **Pulled DFWDS source** (Dataforth Web Datasheet System, VB6) from AD1 via AD2 jump host. 71 files / 957 KB. Folder structure: `Program/`, `Release/` (current source: DFWDS.bas + frmSplash.frm + DFWDS.exe), `_Notes/` (developer docs), `_Working/` (WIP), plus `History/` snapshots from 2014-09-24 through 2014-10-02.
2. **Determined what DFWDS actually does** (1181 lines of VB6): pure filesystem mover/renamer. Reads `X:\Test_Datasheets\*.txt`, validates filename, moves valid → `X:\For_Web`, bad → `X:\Bad_Datasheets`, logs to `X:\Datasheets_Log\DFWDS_YYYY_MM_DD.log`. **No HTTP/FTP code** — "website" in the name only refers to the destination folder; web upload was always a separate piece (currently the new Hoffman API).
3. **Validation rules captured from VB source:**
   - Filename must be `.txt`
   - Must contain a dash; left-of-dash = Work Order #
   - WO# is all-numeric → valid (unless starts with 0)
   - WO# starts with `A`-`J` followed by digits → DOS-encoded; decode first char to a 2-digit prefix (`A`=10, `B`=11, ... `J`=19) and rename in place, then move to For_Web
   - Anything else → bad
4. 
   **Probed the X: drive state on AD2** — `\\ad2\webshare\`:
   - `Test_Datasheets`: 897 files, 2.4 MB, dates 2026-02-09 → 2026-04-13 14:42 (DFWDS hadn't run since 2026-03-11, ~33 days)
   - `Bad_Datasheets`: 18,801 files (historical, oldest 2003)
   - `For_Web`: 6,258 files at start of session (later 7,061 after DFWDS port ran)
   - `Datasheets_Log`: 3,336 log files
5. **Found existing reference scripts** in `projects/dataforth-dos/datasheet-pipeline/`:
   - `test-upload-two.py` — single POST + roundtrip diff
   - `test-scenarios.py` — idempotency, update, bulk tests (all green)
   - `fetch-server-inventory.py` — paginates GET, writes serial list
   - `compute-delta.py` — diffs local vs server inventories
   - All four were prior work that proved the API contract end-to-end. Confirmed `dataforth.onprem.sync` client + `Trxvwee2234-Awer8723-2` secret + `dataforth.web` scope still works.
6. **Read API Swagger** (`https://www.dataforth.com/swagger/v1/swagger.json`). Endpoints of interest:
   - `GET /api/v1/TestReportDataFiles/stats` → TotalCount, LatestCreatedAtUtc, LatestUpdatedAtUtc
   - `GET /api/v1/TestReportDataFiles?page=&pageSize=&afterSerialNumber=` → list (cursor-paginated)
   - `GET /api/v1/TestReportDataFiles/{serialNumber}` → fetch one
   - `POST /api/v1/TestReportDataFiles` body `{SerialNumber, Content}` → 200/201
   - `POST /api/v1/TestReportDataFiles/bulk` body `{Items: [{SerialNumber, Content}, ...]}` → returns TotalReceived/Created/Updated/Unchanged/Errors
   - `DELETE /api/v1/TestReportDataFiles/{serialNumber}`
7. **Discovered the 6,240-record delta from yesterday was already uploaded** by some earlier process (today ~13:44 server time). Server jumped 483,339 → 489,579 between yesterday's snapshot and the start of this session. Spot-checks on `100078-1`, `99693-9`, `51848-7` confirmed `CreatedAtUtc = 2026-04-14T13:44`. Our test-50 came back all `Unchanged` because the records were there already.
8. **Wrote `dfwds-process.js`** — Node port of DFWDS's classification + move logic. ~120 lines. 
   Mirrors VB validation exactly: ext check, dash position, all-numeric vs A-J prefix, decode + rename if prefixed.
9. **Wrote `upload-delta.js`** — Node bulk uploader. Reads pipe-delimited delta file (`SN|path|size|mtime`), batches into POST `/bulk`, refreshes token before expiry (1h TTL), retries 401 once, logs every batch result.
10. **Wrote orchestrators**:
    - `run_uploader_on_ad2.py` — SFTP scripts to AD2, set CF_* env vars in encoded PowerShell command, run uploader
    - `run_full_drain.py` — full pipeline: DFWDS → re-inventory → delta → upload (built but timed out at server-inventory pull step)
    - `upload_all_for_web.py` — simpler: enumerate For_Web, push everything (idempotent server handles dedup)
11. **End-to-end run of the pipeline:**
    - Dry-run DFWDS on 897: classified all as `valid=897 renamed=0 bad=0` (modern serial-number format, no DOS prefixes in the queue)
    - Live DFWDS run: 897/897 moved to For_Web in 1.1 seconds. For_Web: 6,258 → 7,061 (gain of 803, meaning ~94 of the 897 SNs already existed in For_Web and got overwritten in place)
    - Skipped the slow full-server-inventory pull (was timing out at 10 min mark, ~400K of ~490K records)
    - Just pushed entire For_Web to API: 7,061 items in 49.7s @ ~142/s sustained
    - Result: **803 created, 114 updated, 6,144 unchanged, 0 errors**
    - Server total: 489,579 → **490,382** (+803 new records)
    - LatestCreatedAtUtc: `2026-04-15T03:58:00`

### Key decisions & rationale

- **Node, not Python, for the on-AD2 scripts.** Node is already installed on AD2 (used by testdatadb), no new runtime needed. Aligns with the testdatadb stack that the future DFWDS replacement should integrate into.
- **`.replace('\\','')` for the AD2 vault password** — same stale-shell-escape issue as before in `clients/dataforth/ad2.sops.yaml`. Vault entry still needs cleanup.
- **Skipped the slow inventory diff in the final run** — server is idempotent (returns Unchanged for matching content), so just pushing all of For_Web works. 
  Full server inventory takes ~700s (490K records @ 700/s pagination). Not worth waiting when we can let the server tell us what's new.
- **Did NOT install Python on AD2** — user offered to ("install whatever is necessary") but Node was already there and the rewrite was clean. No improvement from adding a second runtime.
- **Used env vars (`CF_*`) for credentials in the on-AD2 invocation** — passed via SSH-encoded PowerShell, not written to disk. Credentials only live on the workstation in SOPS vault.
- **Two scripts on AD2 are reusable for ongoing automation:** `dfwds-process.js` and `upload-delta.js` are the two pieces that need to run nightly going forward.

### Problems encountered and resolutions

| Problem | Resolution |
|---|---|
| `\\ad2\` UNC path doesn't resolve from my workstation (DNS / SMB blocked over OpenVPN) | Deployed scripts to AD2 (`C:\Users\sysadmin\Documents\dataforth-uploader\`) and ran them locally with `node` |
| AD2 has no Python | Ported uploader from `upload-delta.py` to `upload-delta.js` (Node already installed) |
| `fetch-server-inventory.py` timed out at 600s after 405,918 of ~490K records | Skipped the full inventory; pushed all of For_Web directly and let server's idempotency dedup |
| First test-50 came back all `Unchanged` | Investigation revealed the original delta had already been uploaded earlier today (server count jumped 483,339 → 489,579 between yesterday's snapshot and now). Idempotency working as designed. |
| PowerShell regex to extract just the 897 from DFWDS log produced 0 matches | Pivoted to enumerate-everything-in-For_Web approach; idempotent server handles it just as well |
| `socket.timeout` on `https://www.dataforth.com` from workstation | Probably OpenVPN routing; retried with longer timeout, worked. Server itself is healthy. 
|

---

## Credentials

### Dataforth Product API (Hoffman) — already in vault

- Vault: `clients/dataforth/api-oauth.sops.yaml`
- Token URL: `https://login.dataforth.com/connect/token`
- API base: `https://www.dataforth.com`
- Swagger: `https://www.dataforth.com/swagger/v1/swagger.json`
- Grant type: `client_credentials`
- Client ID: `dataforth.onprem.sync`
- Client secret: `Trxvwee2234-Awer8723-2`
- Scope: `dataforth.web`
- TTL: 1 hour
- Token claim `role` = Admin

### AD2 (Dataforth)

- SSH: `sysadmin / Paper123!@#` on 192.168.0.6:22
- Vault entry stores `Paper123\!@#` (stale shell-escape; strip with `.replace('\\','')`)

### ACG-DC16 (separate work, see "Email password resets" below)

- WinRM 5985: `administrator@acg.local / Gptf*77ttb##` on 172.16.3.50

---

## Infrastructure & Servers

### AD2 webshare structure (verified today)

| Folder | Purpose | Files (start) | Files (end) |
|---|---|---|---|
| `\\ad2\webshare\Test_Datasheets` | DFWDS input staging | 897 | 0 (drained) |
| `\\ad2\webshare\Bad_Datasheets` | DFWDS quarantine | 18,801 | 18,801 (no bad in this batch) |
| `\\ad2\webshare\For_Web` | DFWDS output / API source | 6,258 | 7,061 |
| `\\ad2\webshare\Datasheets_Log` | DFWDS run logs | 3,336 | 3,337 (today's added) |

### Hoffman API server state

- Start of session: `TotalCount=489,579`, `LatestCreatedAtUtc=2026-04-14T13:45:23` (from a backfill earlier today by someone/something else)
- End of session: `TotalCount=490,382` (+803), `LatestCreatedAtUtc=2026-04-15T03:58:00`

### AD2 deployment dir (new)

- `C:\Users\sysadmin\Documents\dataforth-uploader\`
  - `dfwds-process.js`
  - `upload-delta.js`
  - `delta_for_web_all.txt` (transient)
  - `for_web_inventory.txt` (transient)
  - `delta_to_upload.txt` (transient, from yesterday's compute-delta)
  - `upload-logs/` (per-run timestamped logs)

---

## Commands & Outputs

### Final upload-delta.js run (the headline result)

```
[INFO] 7061 items queued (start=0 limit=all batch=100)
[OK] token len=915
batch 1/71: recv=100 cre=0 upd=2 
unch=98 err=0 | rate=37/s
...
batch 71/71: recv=61 cre=0 upd=0 unch=61 err=0 | rate=142/s
[DONE] elapsed 49.7s
received: 7061
created: 803
updated: 114
unchanged: 6144
errors: 0
```

### DFWDS port live run

```
[2026-04-15T03:44:00.797Z] === DFWDS-process start (Node port) ===
in: C:\Shares\webshare\Test_Datasheets
out: C:\Shares\webshare\For_Web
bad: C:\Shares\webshare\Bad_Datasheets
dry: false, limit: all
queued: 897 files (of 897 in dir)
...
=== summary: valid=897 renamed=0 bad=0 errors=0
[INFO] log: C:\Shares\webshare\Datasheets_Log\DFWDS_2026_04_14.log
```

### Quick OAuth token grab (handy CLI)

```bash
curl -sS -X POST https://login.dataforth.com/connect/token \
  -d grant_type=client_credentials \
  -d client_id=dataforth.onprem.sync \
  -d client_secret=Trxvwee2234-Awer8723-2 \
  -d scope=dataforth.web
```

---

## Configuration Changes

### New files created (all in `D:\claudetools\projects\dataforth-dos\`)

| Path | Lines | Purpose |
|---|---|---|
| `dfwds-research/probe_dfwds.py` | ~30 | Initial AD1 folder probe |
| `dfwds-research/fetch_dfwds.py` | ~70 | Pulls full DFWDS tree to local via AD2 stage + SFTP |
| `dfwds-research/check_x_drive.py` | ~50 | Inventories X: folders to see DFWDS workload |
| `dfwds-research/api_probe.py` | ~50 | OAuth + Swagger fetch + endpoint listing |
| `dfwds-research/source/` | 72 files | Pulled DFWDS tree (Program/, Release/, _Notes/, _Working/, History/) |
| `dfwds-research/swagger.json` | (full Swagger dump) | For offline reading |
| `datasheet-pipeline/dfwds-process.js` | ~120 | Node port of DFWDS classification + move |
| `datasheet-pipeline/upload-delta.js` | ~190 | Node bulk uploader to Hoffman API |
| `datasheet-pipeline/probe_ad2_runtime.py` | ~15 | Check Python/Node availability on AD2 |
| `datasheet-pipeline/run_uploader_on_ad2.py` | ~80 | Orchestrator: SFTP + run uploader |
| `datasheet-pipeline/run_full_drain.py` | ~110 | Orchestrator: DFWDS + inventory + delta + upload |
| `datasheet-pipeline/upload_all_for_web.py` | ~70 | 
Simpler orchestrator: enumerate + push everything |
| `datasheet-pipeline/upload_897.py` | ~80 | (Failed regex experiment — can delete) |

### Files changed on AD2

- New: `C:\Users\sysadmin\Documents\dataforth-uploader\dfwds-process.js`
- New: `C:\Users\sysadmin\Documents\dataforth-uploader\upload-delta.js`
- 897 files moved: `C:\Shares\webshare\Test_Datasheets\*.txt` → `C:\Shares\webshare\For_Web\*.txt`
- New log file: `C:\Shares\webshare\Datasheets_Log\DFWDS_2026_04_14.log`

### Local machine

- Python `pyyaml` — already installed (used for SOPS YAML parsing)
- Python `paramiko` 4.0.0 — already installed

---

## Pending / Incomplete / Open Items

1. **Wire up scheduled automation on AD2.** Currently this whole pipeline runs only when triggered manually from the workstation. Need one of:
   - (a) `creds.json` on AD2 (ACL'd to SYSTEM/svc_testdatadb) + Windows Scheduled Task running `dfwds-process.js` then `upload-delta.js` nightly
   - (b) Workstation cron → SSH to AD2 → run with env-var creds
   - (c) Windows Credential Manager on AD2

   User asked which to wire (a or b) at end of session — **awaiting decision.**
2. **Integrate into the testdatadb pipeline.** The current scripts run separately from `C:\Shares\testdatadb`. Cleaner integration would have `database/import.js` trigger DFWDS-process + upload after each incremental import, rather than nightly batch.
3. **Vault hygiene** (carryover from previous sessions):
   - `clients/dataforth/ad2.sops.yaml` has stale `\!` escape in password (workaround in code: `.replace('\\','')`)
   - Cloudflare tokens still in 1Password only, not in `services/cloudflare.sops.yaml`
4. **Re-run server inventory** at some point so we have a fresh baseline. Today's pull timed out at 405K. The data isn't critical for ongoing operation since we use the server's idempotency, but having a fresh `server_inventory.txt` makes future delta computations faster than re-uploading everything.
5. 
   **Delete `upload_897.py`** (failed regex experiment) — superseded by `upload_all_for_web.py`.

---

## Reference Information

### URLs / endpoints

- Swagger UI: `https://www.dataforth.com/swagger/index.html`
- Identity Server: `https://login.dataforth.com`
- OIDC discovery: `https://login.dataforth.com/.well-known/openid-configuration`

### File paths (will need again)

- AD2 deployment dir: `C:\Users\sysadmin\Documents\dataforth-uploader\`
- AD2 webshare: `C:\Shares\webshare\` (Test_Datasheets, For_Web, Bad_Datasheets, Datasheets_Log)
- AD1 DFWDS source: `\\AD1\Engineering\ENGR\ATE\Test Datasheets\DFWDS\`
- Local workspace: `D:\claudetools\projects\dataforth-dos\`
- Local DFWDS source pull: `D:\claudetools\projects\dataforth-dos\dfwds-research\source\`
- Local pipeline scripts: `D:\claudetools\projects\dataforth-dos\datasheet-pipeline\`

### DFWDS classification rules (one-liner reference)

- `.txt` extension required
- Dash required in stem; left-of-dash = WO#
- WO# all-numeric → valid (unless leading 0)
- WO# `[A-J]\d+` → decode first char (A=10, B=11, ... J=19), rename file in place, then move
- Anything else → bad

### API behavior notes

- `POST /bulk`: server returns counts per category in single response. `Updated` = same SN, different content. `Unchanged` = same SN, same content. `Created` = new SN.
- Token TTL = 3600s. Uploader refreshes 60s before expiry.
- Server pagination: `?page=N&pageSize=1000&afterSerialNumber=`. Page sizes up to 1000 work.

### Sustained throughput observed today

- Bulk upload: ~142 files/s sustained for ~50s (batches of 100, single client)
- Server inventory pagination: ~700 records/s

---

## Other work in this same conversation (non-Dataforth)

Earlier today this session also handled email-account password resets on **ACG-DC16** (acg.local domain, 172.16.3.50) for accounts hosted by Neptune Exchange. 
Captured here for completeness — full detail is independent of Dataforth:

| Account | Email | New password |
|---|---|---|
| `cansley_starrpass.c` | cansley@devconllc.com (Chris Ansley) | `Natascha8144$` |
| `bertie2` | bertie@amtransit.com (Ana Lozano) | `2026@Arizona` |
| `caitlin` | caitlin@amtransit.com | `Welcomeamt2026!` |
| `matilde` | matilde@amtransit.com (Slate) | `Welcomeamt2026!` |
| `nydia` | Nydia@amtransit.com | `Welcomeamt2026!` |
| `tom` | tsorensen@RieussetCorp.com (Tom Sorensen) | `RC$sor3740` |
| `tomrc` | tomrc@RieussetCorp.com (Tom Sorensen) | `RC$sor3740` |
| `ojodeagua` | ojodeagua@RieussetCorp.com (Tom Sorensen) | `RC$sor3740` |
| `csorensen` | csorensen@RieussetCorp.com (Christine Sorensen) | `RC$sor3740` |

Untouched intentionally: `driver@amtransit.com`, `Jeffrey.Schaufel` (request cancelled).

Method: Python `winrm` with auth `administrator@acg.local / Gptf*77ttb##`, `Set-ADAccountPassword -Reset` + `Unlock-ADAccount`. UPN format required (`ACG\administrator` failed initial Python NTLM auth due to escape sequence quirk).

Also today: created the **IMC ticket-notes write-up** at `clients/instrumental-music-center/docs/2026-04-13-ticket-notes.md` (formatted 2026-04-11/12/13 IMC1 work for the PSA ticket — RDS removal blocked, SQL backup cleanup, DB relocation, RDS removal parked). Opened in Notepad++ for the user.

Also: **Cloudflare DNS cleanup** — `unifi.azcomputerguru.com` was already grey-cloud, so flipping it was a no-op; flagged `ui.azcomputerguru.com` as the more likely target (still proxied + 525) — user didn't act on that.

---

**Last Updated:** 2026-04-14 21:00

**Next Actions:** decide automation approach (a/b/c above), then commit + push.
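
For quick reference, the DFWDS classification rules listed under Reference Information can be sketched roughly as below. This is a minimal Python illustration only, not the VB6 source and not `dfwds-process.js`. The function name `classify_datasheet` and the status strings are hypothetical, and three behaviors are assumptions the log does not confirm: the extension check is case-insensitive, the WO# is everything left of the first dash, and only uppercase `A`-`J` count as DOS prefixes.

```python
import re
from typing import Optional, Tuple

# DOS-encoded prefix letters per the rules above: A=10, B=11, ..., J=19.
DOS_PREFIX = {chr(ord("A") + i): str(10 + i) for i in range(10)}


def classify_datasheet(filename: str) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: return (status, target_name).

    status is 'valid', 'renamed', or 'bad'; target_name is the name the
    file would carry in For_Web, or None when the file is bad.
    """
    stem, dot, ext = filename.rpartition(".")
    # Assumption: extension comparison is case-insensitive.
    if not dot or ext.lower() != "txt":
        return "bad", None

    # Assumption: the Work Order # is everything left of the FIRST dash.
    wo, dash, rest = stem.partition("-")
    if not dash:
        return "bad", None

    # All-numeric WO# is valid unless it starts with 0.
    if re.fullmatch(r"[0-9]+", wo):
        if wo.startswith("0"):
            return "bad", None
        return "valid", filename

    # A-J followed by digits: decode the letter to its 2-digit prefix.
    if re.fullmatch(r"[A-J][0-9]+", wo):
        decoded = DOS_PREFIX[wo[0]] + wo[1:]
        return "renamed", f"{decoded}-{rest}.{ext}"

    return "bad", None
```

Under these assumptions, `classify_datasheet("100078-1.txt")` is classed as valid and keeps its name, a DOS-style name such as `A1234-5.txt` is renamed to `101234-5.txt`, and `0123-1.txt` or a name without a dash is classed as bad. Edge cases the log does not describe (for example, an empty part after the dash) are not handled here.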