claudetools/clients/dataforth/session-logs/2026-04-14-session.md
Mike Swanson dd5c5afd4b Session log + DFWDS Node port + Hoffman API uploader pipeline
Built the missing piece between the test datasheet pipeline and Dataforth's
new product API. End-to-end:

- Pulled DFWDS (Dataforth Web Datasheet System) VB6 source from
  AD1\Engineering\ENGR\ATE\Test Datasheets\DFWDS to local for analysis
- Decoded its filename validation: A-J prefix decodes (A=10..J=19), all-
  numeric WO# valid (no leading 0), anything else bad
- Ported the validation + move logic to Node (dfwds-process.js)
- Built bulk uploader (upload-delta.js) for Hoffman's Swagger API
  (POST /api/v1/TestReportDataFiles/bulk with OAuth client_credentials)

Sanitized 3 prior reference scripts (fetch-server-inventory, test-scenarios,
test-upload-two) to read CF_* env vars instead of hardcoded creds.

Live drain results:
- 897 files moved Test_Datasheets -> For_Web (all valid, no renames, none
  bad); the DFWDS port finished in 1.1s
- Pushed entire For_Web (7,061 files) to Hoffman API in 49.7s @ 142/s:
  Created=803 Updated=114 Unchanged=6,144 Errors=0
- Server count: 489,579 -> 490,382 (+803 net new)

Also:
- Added clients/dataforth/.gitignore to exclude plaintext Oauth.txt note
- Added clients/instrumental-music-center/docs/2026-04-13-ticket-notes.md
  (ticket write-up of 2026-04-11/12/13 IMC1 RDS removal/SQL migration work)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 21:06:50 -07:00


Session Log — Dataforth — 2026-04-14

DFWDS port + Hoffman API uploader pipeline (end-to-end working)

Mid-day session: pulled the legacy DFWDS VB6 source from AD1, ported its file-classification logic to Node, wired up the bulk uploader against Hoffman's new https://www.dataforth.com API, drained the 897-file Test_Datasheets backlog, and pushed everything currently in For_Web to the live server. End result: server caught up to the AD2 For_Web state with 803 new records created.

(Earlier today this same conversation also did a bunch of email-account password resets via ACG-DC16 — those are noted at the bottom for completeness but aren't Dataforth.)


Session Summary

What was accomplished

  1. Pulled DFWDS source (Dataforth Web Datasheet System, VB6) from AD1 via AD2 jump host. 71 files / 957 KB. Folder structure: Program/, Release/ (current source: DFWDS.bas + frmSplash.frm + DFWDS.exe), _Notes/ (developer docs), _Working/ (WIP), plus History/ snapshots from 2014-09-24 through 2014-10-02.

  2. Determined what DFWDS actually does (1181 lines of VB6): it is a pure filesystem mover/renamer. It reads X:\Test_Datasheets\*.txt, validates each filename, moves valid files to X:\For_Web and bad ones to X:\Bad_Datasheets, and logs to X:\Datasheets_Log\DFWDS_YYYY_MM_DD.log. There is no HTTP/FTP code. "Web" in the name only refers to the destination folder (For_Web), and web upload was always a separate piece (currently the new Hoffman API).

  3. Validation rules captured from VB source:

    • Filename must be .txt
    • Must contain a dash; left-of-dash = Work Order #
    • WO# is all-numeric → valid (unless it starts with 0)
    • WO# starts with A-J followed by digits → DOS-encoded; decode first char to a 2-digit prefix (A=10, B=11, ... J=19) and rename in place, then move to For_Web
    • Anything else → bad
  4. Probed the state of the X: drive (\\ad2\webshare\ on AD2):

    • Test_Datasheets: 897 files, 2.4 MB, dates 2026-02-09 → 2026-04-13 14:42 (DFWDS hadn't run since 2026-03-11, ~33 days)
    • Bad_Datasheets: 18,801 files (historical, oldest 2003)
    • For_Web: 6,258 files at start of session (later 7,061 after DFWDS port ran)
    • Datasheets_Log: 3,336 log files
  5. Found existing reference scripts in projects/dataforth-dos/datasheet-pipeline/:

    • test-upload-two.py — single POST + roundtrip diff
    • test-scenarios.py — idempotency, update, bulk tests (all green)
    • fetch-server-inventory.py — paginates GET, writes serial list
    • compute-delta.py — diffs local vs server inventories
    • All four were prior work that proved the API contract end-to-end. Confirmed that the dataforth.onprem.sync client, its secret, and the dataforth.web scope still work.
  6. Read API Swagger (https://www.dataforth.com/swagger/v1/swagger.json). Endpoints of interest:

    • GET /api/v1/TestReportDataFiles/stats → TotalCount, LatestCreatedAtUtc, LatestUpdatedAtUtc
    • GET /api/v1/TestReportDataFiles?page=&pageSize=&afterSerialNumber= → list (cursor-paginated)
    • GET /api/v1/TestReportDataFiles/{serialNumber} → fetch one
    • POST /api/v1/TestReportDataFiles body {SerialNumber, Content} → 200/201
    • POST /api/v1/TestReportDataFiles/bulk body {Items: [{SerialNumber, Content}, ...]} → returns TotalReceived/Created/Updated/Unchanged/Errors
    • DELETE /api/v1/TestReportDataFiles/{serialNumber}
  7. Discovered that yesterday's 6,240-record delta had already been uploaded by some earlier process (today, ~13:44 server time). The server jumped 483,339 → 489,579 between yesterday's snapshot and the start of this session. Spot-checks on 100078-1, 99693-9 and 51848-7 confirmed CreatedAtUtc = 2026-04-14T13:44. Our 50-record test batch (test-50) came back all Unchanged because the records were already there.

  8. Wrote dfwds-process.js — Node port of DFWDS's classification + move logic. ~120 lines. Mirrors VB validation exactly: ext check, dash position, all-numeric vs A-J prefix, decode + rename if prefixed.

  9. Wrote upload-delta.js — Node bulk uploader. Reads pipe-delimited delta file (SN|path|size|mtime), batches into POST /bulk, refreshes token before expiry (1h TTL), retries 401 once, logs every batch result.

  10. Wrote orchestrators:

    • run_uploader_on_ad2.py — SFTP scripts to AD2, set CF_* env vars in encoded PowerShell command, run uploader
    • run_full_drain.py — full pipeline: DFWDS → re-inventory → delta → upload (built but timed out at server-inventory pull step)
    • upload_all_for_web.py — simpler: enumerate For_Web, push everything (idempotent server handles dedup)
  11. End-to-end run of the pipeline:

    • Dry-run DFWDS on 897: classified all as valid=897 renamed=0 bad=0 (modern serial-number format, no DOS prefixes in the queue)
    • Live DFWDS run: 897/897 moved to For_Web in 1.1 seconds. For_Web: 6,258 → 7,061 (gain of 803, meaning ~94 of the 897 SNs already existed in For_Web and got overwritten in place)
    • Skipped the slow full-server-inventory pull (was timing out at 10 min mark, ~400K of ~490K records)
    • Just pushed entire For_Web to API: 7,061 items in 49.7s @ ~142/s sustained
    • Result: 803 created, 114 updated, 6,144 unchanged, 0 errors
    • Server total: 489,579 → 490,382 (+803 new records)
    • LatestCreatedAtUtc: 2026-04-15T03:58:00
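The uploader's read-and-batch step can be sketched in Python. This re-expresses what upload-delta.js is described as doing and is not its actual code. It assumes each delta line is exactly SN|path|size|mtime and that batches of 100 go to POST /bulk, which matches the run log. The helper names read_delta, batches and bulk_body are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple


def read_delta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (serial_number, path) from pipe-delimited delta lines.

    Assumed line format: SN|path|size|mtime (size and mtime are ignored here).
    """
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        sn, path, _size, _mtime = line.split("|", 3)
        yield sn, path


def batches(items: Iterable, size: int = 100) -> Iterator[List]:
    """Group items into lists of at most `size`; the last batch may be short."""
    batch: List = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def bulk_body(batch: List[Tuple[str, str]]) -> dict:
    """Build the POST /bulk JSON body from (serial_number, content) pairs."""
    return {"Items": [{"SerialNumber": sn, "Content": content} for sn, content in batch]}


if __name__ == "__main__":
    sizes = [len(b) for b in batches(range(7061), 100)]
    print(len(sizes), sizes[-1])  # 71 batches, the last holding 61 items
```

Splitting 7,061 items into batches of 100 gives 70 full batches plus one of 61, which matches "batch 71/71: recv=61" in the final run. Reading each file's Content is left out because the log does not say how upload-delta.js decodes the files.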

Key decisions & rationale

  • Node, not Python, for the on-AD2 scripts. Node is already installed on AD2 (used by testdatadb), no new runtime needed. Aligns with the testdatadb stack that the future DFWDS replacement should integrate into.
  • Stripped the stale backslash from the AD2 vault password with .replace('\\',''). This is the same stale shell-escape issue as before in clients/dataforth/ad2.sops.yaml, and the vault entry still needs cleanup.
  • Skipped the slow inventory diff in the final run — server is idempotent (returns Unchanged for matching content), so just pushing all of For_Web works. Full server inventory takes ~700s (490K records @ 700/s pagination). Not worth waiting when we can let the server tell us what's new.
  • Did NOT install Python on AD2 — user offered to ("install whatever is necessary") but Node was already there and the rewrite was clean. No improvement from adding a second runtime.
  • Used env vars (CF_*) for credentials in the on-AD2 invocation. They were passed inside an encoded PowerShell command over SSH, not written to disk. Credentials live only on the workstation, in the SOPS vault.
  • Both scripts deployed to AD2 are reusable for ongoing automation: dfwds-process.js and upload-delta.js are the two pieces that need to run nightly going forward.
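The CF_* hand-off can be sketched as a helper that builds the PowerShell command line. PowerShell's -EncodedCommand takes the Base64 of the script's UTF-16LE bytes. The variable names CF_CLIENT_ID and CF_CLIENT_SECRET and the function names are hypothetical. Base64 is an encoding, not encryption. The values remain visible in AD2's process list while the command runs, and PowerShell script-block logging would record them if it is enabled.

```python
import base64
from typing import Dict


def ps_single_quote(value: str) -> str:
    """Quote a value as a PowerShell single-quoted string ('' escapes ')."""
    return "'" + value.replace("'", "''") + "'"


def encoded_ps_command(env: Dict[str, str], command: str) -> str:
    """Return a powershell invocation that sets env vars, then runs `command`.

    Env var names must be plain identifiers (e.g. CF_CLIENT_ID).
    """
    statements = [f"$env:{name} = {ps_single_quote(val)}" for name, val in env.items()]
    statements.append(command)
    script = "; ".join(statements)
    # -EncodedCommand expects Base64 over the UTF-16LE bytes of the script.
    b64 = base64.b64encode(script.encode("utf-16-le")).decode("ascii")
    return f"powershell -NoProfile -NonInteractive -EncodedCommand {b64}"


if __name__ == "__main__":
    # Hypothetical variable names and placeholder values only.
    cmd = encoded_ps_command(
        {"CF_CLIENT_ID": "example-id", "CF_CLIENT_SECRET": "example-secret"},
        r"node C:\Users\sysadmin\Documents\dataforth-uploader\upload-delta.js",
    )
    print(cmd[:60], "...")
```

With paramiko, the resulting string would be passed to SSHClient.exec_command. Windows OpenSSH's default cmd.exe shell runs powershell from it unchanged.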

Problems encountered and resolutions

| Problem | Resolution |
| --- | --- |
| \\ad2\ UNC path doesn't resolve from my workstation (DNS / SMB blocked over OpenVPN) | Deployed scripts to AD2 (C:\Users\sysadmin\Documents\dataforth-uploader\) and ran them locally with node |
| AD2 has no Python | Ported the uploader from upload-delta.py to upload-delta.js (Node already installed) |
| fetch-server-inventory.py timed out at 600s after 405,918 of ~490K records | Skipped the full inventory; pushed all of For_Web directly and let the server's idempotency dedup |
| First test-50 came back all Unchanged | Investigation revealed the original delta had already been uploaded earlier today (server count jumped 483,339 → 489,579 between yesterday's snapshot and now). Idempotency working as designed. |
| PowerShell regex to extract just the 897 filenames from the DFWDS log produced 0 matches | Pivoted to the enumerate-everything-in-For_Web approach; the idempotent server handles it just as well |
| socket.timeout on https://www.dataforth.com from the workstation | Probably OpenVPN routing; retried with a longer timeout and it worked. The server itself is healthy. |

Credentials

Dataforth Product API (Hoffman) — already in vault

  • Vault: clients/dataforth/api-oauth.sops.yaml
  • Token URL: https://login.dataforth.com/connect/token
  • API base: https://www.dataforth.com
  • Swagger: https://www.dataforth.com/swagger/v1/swagger.json
  • Grant type: client_credentials
  • Client ID: dataforth.onprem.sync
  • Client secret: Trxvwee2234-Awer8723-2
  • Scope: dataforth.web
  • TTL: 1 hour
  • Token claim role = Admin

AD2 (Dataforth)

  • SSH: sysadmin / Paper123!@# on 192.168.0.6:22
  • Vault entry stores Paper123\!@# (stale shell-escape; strip with .replace('\\',''))
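A sketch of the workaround in Python, where the orchestrators load the SOPS YAML. Python's str.replace with no count removes every backslash. That works for this entry, but it would also delete a backslash that is genuinely part of a password. In JavaScript, '...'.replace('\\', '') with a string pattern removes only the first backslash. The narrower variant below is a hypothetical alternative, not what the scripts use. It only undoes the shell escape in front of '!'.

```python
def strip_stale_escape(value: str) -> str:
    """The workaround as written in this log: drop every backslash."""
    return value.replace("\\", "")


def strip_bang_escape(value: str) -> str:
    """Narrower, hypothetical fix: undo only the shell escape in front of '!'."""
    return value.replace("\\!", "!")
```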

ACG-DC16 (separate work, see "Email password resets" below)

  • WinRM 5985: administrator@acg.local / Gptf*77ttb## on 172.16.3.50

Infrastructure & Servers

AD2 webshare structure (verified today)

| Folder | Purpose | Files (start) | Files (end) |
| --- | --- | --- | --- |
| \\ad2\webshare\Test_Datasheets | DFWDS input staging | 897 | 0 (drained) |
| \\ad2\webshare\Bad_Datasheets | DFWDS quarantine | 18,801 | 18,801 (no bad in this batch) |
| \\ad2\webshare\For_Web | DFWDS output / API source | 6,258 | 7,061 |
| \\ad2\webshare\Datasheets_Log | DFWDS run logs | 3,336 | 3,337 (today's log added) |

Hoffman API server state

  • Start of session: TotalCount=489,579, LatestCreatedAtUtc=2026-04-14T13:45:23 (from an earlier backfill today by an unidentified process)
  • End of session: TotalCount=490,382 (+803), LatestCreatedAtUtc=2026-04-15T03:58:00

AD2 deployment dir (new)

  • C:\Users\sysadmin\Documents\dataforth-uploader\
    • dfwds-process.js
    • upload-delta.js
    • delta_for_web_all.txt (transient)
    • for_web_inventory.txt (transient)
    • delta_to_upload.txt (transient, from yesterday's compute-delta)
    • upload-logs/ (per-run timestamped logs)

Commands & Outputs

Final upload-delta.js run (the headline result)

[INFO] 7061 items queued (start=0 limit=all batch=100)
[OK] token len=915
  batch 1/71: recv=100 cre=0 upd=2 unch=98 err=0 | rate=37/s
  ...
  batch 71/71: recv=61 cre=0 upd=0 unch=61 err=0 | rate=142/s

[DONE] elapsed 49.7s
  received: 7061
  created:  803
  updated:  114
  unchanged: 6144
  errors:   0
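The [DONE] totals are the per-batch counts summed. A minimal sketch, assuming each bulk response is JSON carrying the TotalReceived/Created/Updated/Unchanged/Errors fields from the Swagger notes. The log does not say whether Errors is a count or a list of failed items, so both are handled. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping

FIELDS = ("TotalReceived", "Created", "Updated", "Unchanged", "Errors")


def tally(responses: Iterable[Mapping]) -> Counter:
    """Sum the per-batch counters from POST /bulk responses."""
    totals: Counter = Counter()
    for resp in responses:
        for field in FIELDS:
            value = resp.get(field, 0)
            # Assumption: Errors may be an int or a list of per-item errors.
            totals[field] += len(value) if isinstance(value, list) else int(value)
    return totals


def balanced(totals: Counter) -> bool:
    """Check that every received item landed in exactly one bucket."""
    return totals["TotalReceived"] == (
        totals["Created"] + totals["Updated"] + totals["Unchanged"] + totals["Errors"]
    )
```

Today's run passes this check: 803 + 114 + 6,144 + 0 = 7,061.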

DFWDS port live run

[2026-04-15T03:44:00.797Z] === DFWDS-process start (Node port) ===
  in:  C:\Shares\webshare\Test_Datasheets
  out: C:\Shares\webshare\For_Web
  bad: C:\Shares\webshare\Bad_Datasheets
  dry: false, limit: all
  queued: 897 files (of 897 in dir)
  ...
  === summary: valid=897 renamed=0 bad=0 errors=0
[INFO] log: C:\Shares\webshare\Datasheets_Log\DFWDS_2026_04_14.log
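The move step behind this log can be sketched in Python. It approximates what dfwds-process.js is described as doing and is not its code. Here, classify is any callable returning (status, new_name), for example one implementing the DFWDS rules in this log. The function name drain and the dry_run default are assumptions. os.replace overwrites an existing file of the same name, which matches the 94 SNs overwritten in For_Web. It only works within one volume, which holds here because all the folders live under C:\Shares\webshare.

```python
import os
from collections import Counter
from pathlib import Path
from typing import Callable, Optional, Tuple

Classifier = Callable[[str], Tuple[str, Optional[str]]]


def drain(in_dir: Path, out_dir: Path, bad_dir: Path,
          classify: Classifier, dry_run: bool = True) -> Counter:
    """Classify every file in in_dir; move valid/renamed to out_dir, bad to bad_dir."""
    summary: Counter = Counter()
    for src in sorted(in_dir.iterdir()):  # sorted() snapshots the listing before moves
        if not src.is_file():
            continue
        status, new_name = classify(src.name)
        summary[status] += 1
        if dry_run:
            continue  # classify only, like the dry-run pass
        if status == "bad":
            os.replace(src, bad_dir / src.name)
            continue
        if status == "renamed" and new_name:
            renamed = src.with_name(new_name)
            os.replace(src, renamed)  # rename in place first, as DFWDS does
            src = renamed
        os.replace(src, out_dir / src.name)  # overwrites an existing same-name file
    return summary
```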

Quick OAuth token grab (handy CLI)

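# assumes CF_CLIENT_SECRET holds the secret from clients/dataforth/api-oauth.sops.yaml (variable name hypothetical)
curl -sS -X POST https://login.dataforth.com/connect/token \
  -d grant_type=client_credentials \
  -d client_id=dataforth.onprem.sync \
  --data-urlencode "client_secret=$CF_CLIENT_SECRET" \
  -d scope=dataforth.web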
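The same client_credentials request as a Python standard-library sketch. It reads the client ID and secret from environment variables, so the secret stays out of shell history. The names CF_CLIENT_ID and CF_CLIENT_SECRET are assumptions standing in for the CF_* variables the sanitized scripts use. The fields access_token and expires_in are the standard OAuth 2.0 token-response fields.

```python
import json
import os
import urllib.parse
import urllib.request

TOKEN_URL = "https://login.dataforth.com/connect/token"


def token_request(client_id: str, client_secret: str,
                  scope: str = "dataforth.web") -> urllib.request.Request:
    """Build the form-encoded client_credentials POST (no network I/O)."""
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    }).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL, data=form, method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"})


def fetch_token() -> dict:
    """POST the request and return the parsed JSON (access_token, expires_in, ...)."""
    req = token_request(os.environ["CF_CLIENT_ID"], os.environ["CF_CLIENT_SECRET"])
    # Generous timeout, given the socket.timeout seen over OpenVPN.
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)
```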

Configuration Changes

New files created (all in D:\claudetools\projects\dataforth-dos\)

| Path | Lines | Purpose |
| --- | --- | --- |
| dfwds-research/probe_dfwds.py | ~30 | Initial AD1 folder probe |
| dfwds-research/fetch_dfwds.py | ~70 | Pulls full DFWDS tree to local via AD2 stage + SFTP |
| dfwds-research/check_x_drive.py | ~50 | Inventories X: folders to see DFWDS workload |
| dfwds-research/api_probe.py | ~50 | OAuth + Swagger fetch + endpoint listing |
| dfwds-research/source/ | 72 files | Pulled DFWDS tree (Program/, Release/, _Notes/, _Working/, History/) |
| dfwds-research/swagger.json | (full Swagger dump) | For offline reading |
| datasheet-pipeline/dfwds-process.js | ~120 | Node port of DFWDS classification + move |
| datasheet-pipeline/upload-delta.js | ~190 | Node bulk uploader to Hoffman API |
| datasheet-pipeline/probe_ad2_runtime.py | ~15 | Check Python/Node availability on AD2 |
| datasheet-pipeline/run_uploader_on_ad2.py | ~80 | Orchestrator: SFTP + run uploader |
| datasheet-pipeline/run_full_drain.py | ~110 | Orchestrator: DFWDS + inventory + delta + upload |
| datasheet-pipeline/upload_all_for_web.py | ~70 | Simpler orchestrator: enumerate + push everything |
| datasheet-pipeline/upload_897.py | ~80 | (Failed regex experiment; can delete) |

Files changed on AD2

  • New: C:\Users\sysadmin\Documents\dataforth-uploader\dfwds-process.js
  • New: C:\Users\sysadmin\Documents\dataforth-uploader\upload-delta.js
  • 897 files moved: C:\Shares\webshare\Test_Datasheets\*.txt → C:\Shares\webshare\For_Web\*.txt
  • New log file: C:\Shares\webshare\Datasheets_Log\DFWDS_2026_04_14.log

Local machine

  • Python pyyaml — already installed (used for SOPS YAML parsing)
  • Python paramiko 4.0.0 — already installed

Pending / Incomplete / Open Items

  1. Wire up scheduled automation on AD2. Currently this whole pipeline runs only when triggered manually from the workstation. Need either:

    • (a) creds.json on AD2 (ACL'd to SYSTEM/svc_testdatadb) + Windows Scheduled Task running dfwds-process.js then upload-delta.js nightly
    • (b) Workstation cron → SSH to AD2 → run with env-var creds
    • (c) Windows Credential Manager on AD2

    User asked which to wire (a or b) at end of session — awaiting decision.

  2. Integrate into the testdatadb pipeline. The current scripts run separately from C:\Shares\testdatadb. Cleaner integration would have database/import.js trigger DFWDS-process + upload after each incremental import, rather than nightly batch.

  3. Vault hygiene (carryover from previous sessions):

    • clients/dataforth/ad2.sops.yaml has stale \! escape in password (workaround in code: .replace('\\',''))
    • Cloudflare tokens still in 1Password only, not in services/cloudflare.sops.yaml
  4. Re-run server inventory at some point so we have a fresh baseline. Today's pull timed out at 405K. The data isn't critical for ongoing operation since we use the server's idempotency, but having a fresh server_inventory.txt makes future delta computations faster than re-uploading everything.

  5. Delete upload_897.py (failed regex experiment) — superseded by upload_all_for_web.py.
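With a fresh server_inventory.txt, the delta becomes a set difference instead of a full re-push. A minimal sketch of that computation follows. It assumes the server inventory has one serial number per line (fetch-server-inventory.py "writes serial list"), that a file's serial is its stem, and that mtime is written as Unix seconds. These are all assumptions, as are the function names. The output uses the SN|path|size|mtime lines upload-delta.js reads. It only finds serials missing from the server. Changed content on existing serials (the Updated case, 114 today) still needs a full push or a content comparison.

```python
from pathlib import Path
from typing import Iterable, List


def serial_of(path: Path) -> str:
    """Serial number = file stem, e.g. 100078-1.txt -> 100078-1 (assumption)."""
    return path.stem


def compute_delta(local_files: Iterable[Path], server_serials: Iterable[str]) -> List[str]:
    """Return SN|path|size|mtime lines for local files the server does not have."""
    on_server = {s.strip() for s in server_serials if s.strip()}
    lines = []
    for path in sorted(local_files):
        sn = serial_of(path)
        if sn in on_server:
            continue
        st = path.stat()
        lines.append(f"{sn}|{path}|{st.st_size}|{int(st.st_mtime)}")
    return lines
```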


Reference Information

URLs / endpoints

  • Swagger UI: https://www.dataforth.com/swagger/index.html
  • Identity Server: https://login.dataforth.com
  • OIDC discovery: https://login.dataforth.com/.well-known/openid-configuration

File paths (will need again)

  • AD2 deployment dir: C:\Users\sysadmin\Documents\dataforth-uploader\
  • AD2 webshare: C:\Shares\webshare\ (Test_Datasheets, For_Web, Bad_Datasheets, Datasheets_Log)
  • AD1 DFWDS source: \\AD1\Engineering\ENGR\ATE\Test Datasheets\DFWDS\
  • Local workspace: D:\claudetools\projects\dataforth-dos\
  • Local DFWDS source pull: D:\claudetools\projects\dataforth-dos\dfwds-research\source\
  • Local pipeline scripts: D:\claudetools\projects\dataforth-dos\datasheet-pipeline\

DFWDS classification rules (one-liner reference)

  • .txt extension required
  • Dash required in stem; left-of-dash = WO#
  • WO# all-numeric → valid (unless it has a leading 0)
  • WO# [A-J]\d+ → decode first char (A=10, B=11, ... J=19), rename file in place, then move
  • Anything else → bad
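The rules above as a Python sketch. The production port is dfwds-process.js in Node, and this is not its code. It makes three assumptions the VB rules don't state: the .txt check is case-insensitive, the WO# is everything before the first dash, and only uppercase A-J counts as a DOS prefix. The name classify and its (status, new_name) return shape are hypothetical.

```python
import re
from pathlib import PurePath
from typing import Optional, Tuple

VALID_WO = re.compile(r"[1-9][0-9]*")  # all-numeric, no leading 0
DOS_WO = re.compile(r"[A-J][0-9]+")    # A-J prefix + digits (DOS-encoded)


def classify(filename: str) -> Tuple[str, Optional[str]]:
    """Return ('valid', name), ('renamed', decoded_name) or ('bad', None)."""
    p = PurePath(filename)
    if p.suffix.lower() != ".txt":
        return "bad", None
    stem = p.stem
    if "-" not in stem:
        return "bad", None
    wo, rest = stem.split("-", 1)
    if VALID_WO.fullmatch(wo):
        return "valid", filename
    if DOS_WO.fullmatch(wo):
        # A=10, B=11, ... J=19: the letter becomes a two-digit prefix.
        prefix = str(ord(wo[0]) - ord("A") + 10)
        return "renamed", f"{prefix}{wo[1:]}-{rest}{p.suffix}"
    return "bad", None
```

For example, classify("A1234-5.txt") returns ("renamed", "101234-5.txt"), and classify("0123-1.txt") returns ("bad", None).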

API behavior notes

  • POST /bulk: the server returns per-category counts in a single response. Created = new SN; Updated = same SN, different content; Unchanged = same SN, same content.
  • Token TTL = 3600s. Uploader refreshes 60s before expiry.
  • Server pagination: ?page=N&pageSize=1000&afterSerialNumber=<cursor>. Page sizes up to 1000 work.
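The refresh rule (3600s TTL, refresh 60s before expiry) amounts to a small cache around the token call. In the sketch below, the fetch function and the clock are injected so the logic can be exercised offline. The fetch function is assumed to return the standard OAuth fields access_token and expires_in. The class and method names are hypothetical, not upload-delta.js's.

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Reuse a bearer token until `skew` seconds before it expires."""

    def __init__(self, fetch: Callable[[], dict], skew: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._skew = skew
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._skew:
            data = self._fetch()
            self._token = data["access_token"]
            # Expiry is measured from the request start, which errs on the early side.
            self._expires_at = now + float(data.get("expires_in", 3600))
        return self._token

    def invalidate(self) -> None:
        """Force a refresh on the next get(), e.g. after a 401 before the single retry."""
        self._token = None
```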

Sustained throughput observed today

  • Bulk upload: ~142 files/s sustained for ~50s (batches of 100, single client)
  • Server inventory pagination: ~700 records/s
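Cursor pagination can be sketched as a generator that feeds the last serial of each page back in as afterSerialNumber. Here, get_page is a hypothetical callable performing one GET with pageSize=1000 and the given cursor. How the page parameter combines with the cursor isn't documented in this log, so that is left to get_page. The sketch also assumes each page comes back as a list of objects with a SerialNumber field. At the observed ~700 records/s, a full pull of ~490K records takes about 700s, which is why the 600s timeout cut it off.

```python
from typing import Callable, Iterator, List, Optional


def iter_serials(get_page: Callable[[Optional[str]], List[dict]],
                 page_size: int = 1000) -> Iterator[str]:
    """Yield every SerialNumber, using the last one on each page as the next cursor."""
    cursor: Optional[str] = None
    while True:
        page = get_page(cursor)
        if not page:
            return
        for record in page:
            yield record["SerialNumber"]
        cursor = page[-1]["SerialNumber"]
        if len(page) < page_size:
            return  # short page: assumed to be the last one


def estimated_seconds(total: int, rate_per_s: float) -> float:
    """Rough wall-clock estimate for a full inventory pull."""
    return total / rate_per_s
```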

Other work in this same conversation (non-Dataforth)

Earlier today this session also handled email-account password resets on ACG-DC16 (acg.local domain, 172.16.3.50) for accounts hosted by Neptune Exchange. Captured here for completeness — full detail is independent of Dataforth:

Account Email New password
cansley_starrpass.c cansley@devconllc.com (Chris Ansley) Natascha8144$
bertie2 bertie@amtransit.com (Ana Lozano) 2026@Arizona
caitlin caitlin@amtransit.com Welcomeamt2026!
matilde matilde@amtransit.com (Slate) Welcomeamt2026!
nydia Nydia@amtransit.com Welcomeamt2026!
tom tsorensen@RieussetCorp.com (Tom Sorensen) RC$sor3740
tomrc tomrc@RieussetCorp.com (Tom Sorensen) RC$sor3740
ojodeagua ojodeagua@RieussetCorp.com (Tom Sorensen) RC$sor3740
csorensen csorensen@RieussetCorp.com (Christine Sorensen) RC$sor3740

Untouched intentionally: driver@amtransit.com, Jeffrey.Schaufel (request cancelled).

Method: Python winrm with auth administrator@acg.local / Gptf*77ttb##, Set-ADAccountPassword -Reset + Unlock-ADAccount. UPN format required (ACG\administrator failed initial Python NTLM auth due to escape sequence quirk).

Also today: created the IMC ticket-notes write-up at clients/instrumental-music-center/docs/2026-04-13-ticket-notes.md (formatted 2026-04-11/12/13 IMC1 work for the PSA ticket — RDS removal blocked, SQL backup cleanup, DB relocation, RDS removal parked). Opened in Notepad++ for the user.

Also: Cloudflare DNS cleanup. unifi.azcomputerguru.com was already grey-cloud, so flipping it was a no-op. Flagged ui.azcomputerguru.com as the more likely target (still proxied and returning 525); the user didn't act on that.


Last Updated: 2026-04-14 21:00
Next Actions: decide the automation approach (a/b/c above), then commit + push.