Built the missing piece between the test datasheet pipeline and Dataforth's new product API.

End-to-end:
- Pulled DFWDS (Dataforth Web Datasheet System) VB6 source from AD1\Engineering\ENGR\ATE\Test Datasheets\DFWDS to local for analysis
- Decoded its filename validation: A-J prefix decodes (A=10..J=19), all-numeric WO# valid (no leading 0), anything else bad
- Ported the validation + move logic to Node (dfwds-process.js)
- Built bulk uploader (upload-delta.js) for Hoffman's Swagger API (POST /api/v1/TestReportDataFiles/bulk with OAuth client_credentials)

Sanitized 3 prior reference scripts (fetch-server-inventory, test-scenarios, test-upload-two) to read CF_* env vars instead of hardcoded creds.

Live drain results:
- 897 files moved Test_Datasheets -> For_Web (all valid, no renames, no bad); DFWDS port ran in 1.1s
- Pushed entire For_Web (7,061 files) to Hoffman API in 49.7s @ 142/s: Created=803 Updated=114 Unchanged=6,144 Errors=0
- Server count: 489,579 -> 490,382 (+803 net new)

Also:
- Added clients/dataforth/.gitignore to exclude plaintext Oauth.txt note
- Added clients/instrumental-music-center/docs/2026-04-13-ticket-notes.md (ticket write-up of 2026-04-11/12/13 IMC1 RDS removal/SQL migration work)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Session Log — Dataforth — 2026-04-14
DFWDS port + Hoffman API uploader pipeline (end-to-end working)
Mid-day session: pulled the legacy DFWDS VB6 source from AD1, ported its file-classification logic to Node, wired up the bulk uploader against Hoffman's new https://www.dataforth.com API, drained the 897-file Test_Datasheets backlog, and pushed everything currently in For_Web to the live server. End result: server caught up to the AD2 For_Web state with 803 new records created.
(Earlier today this same conversation also did a bunch of email-account password resets via ACG-DC16 — those are noted at the bottom for completeness but aren't Dataforth.)
Session Summary
What was accomplished
- Pulled the DFWDS source (Dataforth Web Datasheet System, VB6) from AD1 via the AD2 jump host: 71 files / 957 KB. Folder structure: `Program/`, `Release/` (current source: `DFWDS.bas` + `frmSplash.frm` + `DFWDS.exe`), `_Notes/` (developer docs), `_Working/` (WIP), plus `History/` snapshots from 2014-09-24 through 2014-10-02.
- Determined what DFWDS actually does (1,181 lines of VB6): it is a pure filesystem mover/renamer. It reads `X:\Test_Datasheets\*.txt`, validates each filename, moves valid files → `X:\For_Web` and bad ones → `X:\Bad_Datasheets`, and logs to `X:\Datasheets_Log\DFWDS_YYYY_MM_DD.log`. There is no HTTP/FTP code; "website" in the name refers only to the destination folder. Web upload was always a separate piece (currently the new Hoffman API).
- Validation rules captured from the VB source:
  - Filename must be `.txt`
  - Must contain a dash; left-of-dash = Work Order # (WO#)
  - WO# all-numeric → valid (unless it starts with 0)
  - WO# starting with `A`-`J` followed by digits → DOS-encoded; decode the first char to a 2-digit prefix (A=10, B=11, ... J=19) and rename in place, then move to For_Web
  - Anything else → bad
- Probed the X: drive state on AD2 (`\\ad2\webshare\`):
  - `Test_Datasheets`: 897 files, 2.4 MB, dates 2026-02-09 → 2026-04-13 14:42 (DFWDS hadn't run since 2026-03-11, ~33 days)
  - `Bad_Datasheets`: 18,801 files (historical, oldest 2003)
  - `For_Web`: 6,258 files at start of session (7,061 after the DFWDS port ran)
  - `Datasheets_Log`: 3,336 log files
- Found existing reference scripts in `projects/dataforth-dos/datasheet-pipeline/`:
  - `test-upload-two.py`: single POST + roundtrip diff
  - `test-scenarios.py`: idempotency, update, bulk tests (all green)
  - `fetch-server-inventory.py`: paginates GET, writes serial list
  - `compute-delta.py`: diffs local vs server inventories

  All four were prior work that proved the API contract end-to-end. Confirmed the `dataforth.onprem.sync` client + secret + `dataforth.web` scope still work.
- Read the API Swagger (`https://www.dataforth.com/swagger/v1/swagger.json`). Endpoints of interest:
  - `GET /api/v1/TestReportDataFiles/stats` → TotalCount, LatestCreatedAtUtc, LatestUpdatedAtUtc
  - `GET /api/v1/TestReportDataFiles?page=&pageSize=&afterSerialNumber=` → list (cursor-paginated)
  - `GET /api/v1/TestReportDataFiles/{serialNumber}` → fetch one
  - `POST /api/v1/TestReportDataFiles` body `{SerialNumber, Content}` → 200/201
  - `POST /api/v1/TestReportDataFiles/bulk` body `{Items: [{SerialNumber, Content}, ...]}` → returns TotalReceived/Created/Updated/Unchanged/Errors
  - `DELETE /api/v1/TestReportDataFiles/{serialNumber}`
- Discovered that the 6,240-record delta from yesterday had already been uploaded by some earlier process (today ~13:44 server time). The server jumped 483,339 → 489,579 between yesterday's snapshot and the start of this session. Spot-checks on `100078-1`, `99693-9`, `51848-7` confirmed `CreatedAtUtc = 2026-04-14T13:44`. Our test-50 came back all `Unchanged` because the records were already there.
- Wrote `dfwds-process.js`: a Node port of DFWDS's classification + move logic, ~120 lines. Mirrors the VB validation exactly: extension check, dash position, all-numeric vs A-J prefix, decode + rename if prefixed.
- Wrote `upload-delta.js`: a Node bulk uploader. Reads a pipe-delimited delta file (`SN|path|size|mtime`), batches into `POST /bulk`, refreshes the token before expiry (1h TTL), retries a 401 once, and logs every batch result.
- Wrote orchestrators:
  - `run_uploader_on_ad2.py`: SFTPs scripts to AD2, sets CF_* env vars in an encoded PowerShell command, runs the uploader
  - `run_full_drain.py`: full pipeline DFWDS → re-inventory → delta → upload (built, but timed out at the server-inventory pull step)
  - `upload_all_for_web.py`: simpler; enumerates For_Web and pushes everything (the idempotent server handles dedup)
- End-to-end run of the pipeline:
  - Dry-run DFWDS on 897: classified all as `valid=897 renamed=0 bad=0` (modern serial-number format, no DOS prefixes in the queue)
  - Live DFWDS run: 897/897 moved to For_Web in 1.1 seconds. For_Web: 6,258 → 7,061 (gain of 803, meaning ~94 of the 897 SNs already existed in For_Web and were overwritten in place)
  - Skipped the slow full-server-inventory pull (it was timing out at the 10-minute mark, ~400K of ~490K records)
  - Pushed the entire For_Web to the API: 7,061 items in 49.7s @ ~142/s sustained
  - Result: 803 created, 114 updated, 6,144 unchanged, 0 errors
  - Server total: 489,579 → 490,382 (+803 new records)
  - LatestCreatedAtUtc: `2026-04-15T03:58:00`
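The DFWDS move step described above can be sketched in Python. It covers:

- classifying each `*.txt` in Test_Datasheets
- moving valid files to For_Web under their possibly decoded name
- moving bad files to Bad_Datasheets
- appending to a `DFWDS_YYYY_MM_DD.log`

This is a minimal sketch, not `dfwds-process.js`. Several pieces are hypothetical: `process_queue`, `log_name`, the `(status, new_name)` classifier contract and the log line format. The sketch also collapses DFWDS's "rename in place, then move" into a single move to the decoded name.

```python
import os
from datetime import date
from pathlib import Path
from typing import Callable, Dict, Optional, Tuple

# Hypothetical classifier contract: ("valid" | "bad", decoded new name or None).
Classifier = Callable[[str], Tuple[str, Optional[str]]]


def log_name(day: date) -> str:
    """DFWDS-style daily log name, e.g. DFWDS_2026_04_14.log."""
    return f"DFWDS_{day:%Y_%m_%d}.log"


def process_queue(in_dir: Path, out_dir: Path, bad_dir: Path, log_dir: Path,
                  classify: Classifier, dry_run: bool = True,
                  today: Optional[date] = None) -> Dict[str, int]:
    """Classify every *.txt in in_dir and, unless dry_run, move it."""
    counts = {"valid": 0, "renamed": 0, "bad": 0}
    log_lines = []
    for src in sorted(in_dir.glob("*.txt")):
        status, new_name = classify(src.name)
        if status == "valid":
            counts["valid"] += 1
            if new_name:
                counts["renamed"] += 1
            dest = out_dir / (new_name or src.name)
        else:
            counts["bad"] += 1
            dest = bad_dir / src.name
        log_lines.append(f"{status}\t{src.name}\t{dest.name}")
        if not dry_run:
            # os.replace overwrites an existing file of the same name (same volume
            # only), like the SNs that were overwritten in place in For_Web.
            os.replace(src, dest)
    log_lines.append("summary: valid={valid} renamed={renamed} bad={bad}".format(**counts))
    # The log is written in dry-run mode too, so a dry run leaves a record.
    with (log_dir / log_name(today or date.today())).open("a", encoding="utf-8") as fh:
        fh.write("\n".join(log_lines) + "\n")
    return counts
```

Running it with `dry_run=True` first mirrors the dry-run/live sequence used today.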
Key decisions & rationale
- Node, not Python, for the on-AD2 scripts. Node is already installed on AD2 (used by testdatadb), so no new runtime is needed. This also aligns with the testdatadb stack that the future DFWDS replacement should integrate into.
- `.replace('\\','')` for the AD2 vault password: same stale shell-escape issue as before in `clients/dataforth/ad2.sops.yaml`. The vault entry still needs cleanup.
- Skipped the slow inventory diff in the final run. The server is idempotent (returns `Unchanged` for matching content), so just pushing all of For_Web works. A full server inventory takes ~700s (490K records @ ~700/s pagination); not worth waiting when we can let the server tell us what's new.
- Did NOT install Python on AD2 — user offered to ("install whatever is necessary") but Node was already there and the rewrite was clean. No improvement from adding a second runtime.
- Used env vars (`CF_*`) for credentials in the on-AD2 invocation, passed via SSH-encoded PowerShell and never written to disk. Credentials live only on the workstation, in the SOPS vault.
- Two scripts on AD2 are reusable for ongoing automation: `dfwds-process.js` and `upload-delta.js` are the two pieces that need to run nightly going forward.
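A minimal sketch of the env-var credential pattern follows. The variable names `CF_TOKEN_URL`, `CF_CLIENT_ID`, `CF_CLIENT_SECRET` and `CF_SCOPE` are hypothetical; this log only records that the scripts read `CF_*` vars, not their exact names. The sketch fails fast when a variable is missing and keeps the secret out of `repr()`, so it cannot leak into per-batch logs.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical variable names; the real CF_* names used by the scripts may differ.
REQUIRED = ("CF_TOKEN_URL", "CF_CLIENT_ID", "CF_CLIENT_SECRET", "CF_SCOPE")


@dataclass(frozen=True)
class ApiCreds:
    token_url: str
    client_id: str
    client_secret: str
    scope: str

    def __repr__(self) -> str:
        # Never print the secret, even in tracebacks or debug logs.
        return f"ApiCreds(client_id={self.client_id!r}, scope={self.scope!r}, client_secret=***)"


def load_creds(env: Optional[Mapping[str, str]] = None) -> ApiCreds:
    """Read OAuth client credentials from the environment; fail fast if any are missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing env vars: " + ", ".join(missing))
    return ApiCreds(env["CF_TOKEN_URL"], env["CF_CLIENT_ID"],
                    env["CF_CLIENT_SECRET"], env["CF_SCOPE"])
```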
Problems encountered and resolutions
| Problem | Resolution |
|---|---|
| `\\ad2\` UNC path doesn't resolve from my workstation (DNS / SMB blocked over OpenVPN) | Deployed scripts to AD2 (`C:\Users\sysadmin\Documents\dataforth-uploader\`) and ran them locally with `node` |
| AD2 has no Python | Ported the uploader from `upload-delta.py` to `upload-delta.js` (Node already installed) |
| `fetch-server-inventory.py` timed out at 600s after 405,918 of ~490K records | Skipped the full inventory; pushed all of For_Web directly and let the server's idempotency dedup |
| First test-50 came back all `Unchanged` | Investigation revealed the original delta had already been uploaded earlier today (server count jumped 483,339 → 489,579 between yesterday's snapshot and now). Idempotency working as designed. |
| PowerShell regex to extract just the 897 from the DFWDS log produced 0 matches | Pivoted to the enumerate-everything-in-For_Web approach; the idempotent server handles it just as well |
| `socket.timeout` on `https://www.dataforth.com` from workstation | Probably OpenVPN routing; retried with a longer timeout, which worked. The server itself is healthy. |
Credentials
Dataforth Product API (Hoffman) — already in vault
- Vault:
clients/dataforth/api-oauth.sops.yaml - Token URL:
https://login.dataforth.com/connect/token - API base:
https://www.dataforth.com - Swagger:
https://www.dataforth.com/swagger/v1/swagger.json - Grant type:
client_credentials - Client ID:
dataforth.onprem.sync - Client secret:
Trxvwee2234-Awer8723-2 - Scope:
dataforth.web - TTL: 1 hour
- Token claim
role= Admin
AD2 (Dataforth)
- SSH:
sysadmin / Paper123!@#on 192.168.0.6:22 - Vault entry stores
Paper123\!@#(stale shell-escape; strip with.replace('\\',''))
ACG-DC16 (separate work, see "Email password resets" below)
- WinRM 5985:
administrator@acg.local / Gptf*77ttb##on 172.16.3.50
Infrastructure & Servers
AD2 webshare structure (verified today)
| Folder | Purpose | Files (start) | Files (end) |
|---|---|---|---|
| `\\ad2\webshare\Test_Datasheets` | DFWDS input staging | 897 | 0 (drained) |
| `\\ad2\webshare\Bad_Datasheets` | DFWDS quarantine | 18,801 | 18,801 (no bad in this batch) |
| `\\ad2\webshare\For_Web` | DFWDS output / API source | 6,258 | 7,061 |
| `\\ad2\webshare\Datasheets_Log` | DFWDS run logs | 3,336 | 3,337 (today's added) |
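The start/end counts in the table come from a per-folder inventory like the one `check_x_drive.py` does. Below is a minimal sketch. It assumes the webshare is reachable as a local path (e.g. `C:\Shares\webshare` when run on AD2) and counts only top-level files. `inventory` is a hypothetical name.

```python
from pathlib import Path
from typing import Dict, Iterable, Tuple

# Folder names from the table above.
FOLDERS = ("Test_Datasheets", "Bad_Datasheets", "For_Web", "Datasheets_Log")


def inventory(root: Path, folders: Iterable[str] = FOLDERS) -> Dict[str, Tuple[int, int]]:
    """Return {folder: (file_count, total_bytes)}; a missing folder reports (0, 0)."""
    result: Dict[str, Tuple[int, int]] = {}
    for name in folders:
        folder = root / name
        files = [p for p in folder.iterdir() if p.is_file()] if folder.is_dir() else []
        result[name] = (len(files), sum(p.stat().st_size for p in files))
    return result


if __name__ == "__main__":
    for name, (count, size) in inventory(Path(r"C:\Shares\webshare")).items():
        print(f"{name}: {count:,} files, {size / 1e6:.1f} MB")
```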
Hoffman API server state
- Start of session: `TotalCount=489,579`, `LatestCreatedAtUtc=2026-04-14T13:45:23` (from a backfill earlier today by someone/something else)
- End of session: `TotalCount=490,382` (+803), `LatestCreatedAtUtc=2026-04-15T03:58:00`
AD2 deployment dir (new)
`C:\Users\sysadmin\Documents\dataforth-uploader\`
- `dfwds-process.js`
- `upload-delta.js`
- `delta_for_web_all.txt` (transient)
- `for_web_inventory.txt` (transient)
- `delta_to_upload.txt` (transient, from yesterday's compute-delta)
- `upload-logs/` (per-run timestamped logs)
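`delta_to_upload.txt` uses the pipe-delimited `SN|path|size|mtime` layout that `upload-delta.js` reads. The sketch below produces it from a local For_Web folder and a set of server serial numbers. Two things are assumptions: that the serial number is the filename without `.txt`, and that mtime is written as integer epoch seconds. `compute_delta` and `write_delta` are hypothetical, not the contents of `compute-delta.py`.

```python
from pathlib import Path
from typing import Iterable, List


def compute_delta(for_web: Path, server_serials: Iterable[str]) -> List[str]:
    """Return SN|path|size|mtime lines for local files whose SN the server lacks."""
    known = set(server_serials)
    lines: List[str] = []
    for p in sorted(for_web.glob("*.txt")):
        sn = p.stem  # assumption: serial number == file name without .txt
        if sn in known:
            continue
        st = p.stat()
        lines.append(f"{sn}|{p}|{st.st_size}|{int(st.st_mtime)}")
    return lines


def write_delta(lines: Iterable[str], out_path: Path) -> None:
    """Write one delta record per line (e.g. delta_to_upload.txt)."""
    out_path.write_text("".join(line + "\n" for line in lines), encoding="utf-8")
```

Because it compares names only, a delta like this cannot see content changes. Today's full push found 114 `Updated` records, which is one argument for letting the idempotent server do the comparison.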
Commands & Outputs
Final upload-delta.js run (the headline result)
[INFO] 7061 items queued (start=0 limit=all batch=100)
[OK] token len=915
batch 1/71: recv=100 cre=0 upd=2 unch=98 err=0 | rate=37/s
...
batch 71/71: recv=61 cre=0 upd=0 unch=61 err=0 | rate=142/s
[DONE] elapsed 49.7s
received: 7061
created: 803
updated: 114
unchanged: 6144
errors: 0
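The batch arithmetic in the run above and the summed totals can be reproduced with a small Python sketch: 7,061 items at 100 per batch gives 71 batches, the last holding 61. The sketch assumes the bulk body shape from the Swagger notes (`{Items: [{SerialNumber, Content}]}`). It also assumes that `Errors` in the response is either a count or a list. The helper names are hypothetical, and the HTTP call itself is left out.

```python
from typing import Callable, Dict, Iterator, List, Sequence


def parse_delta(lines: Sequence[str]) -> List[dict]:
    """Parse SN|path|size|mtime records; blank lines are skipped."""
    rows = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue
        sn, rest = line.split("|", 1)
        path, size, mtime = rest.rsplit("|", 2)
        rows.append({"sn": sn, "path": path, "size": int(size), "mtime": mtime})
    return rows


def batches(items: Sequence, size: int = 100) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def bulk_body(rows: Sequence[dict], read_text: Callable[[str], str]) -> dict:
    """Body for POST /api/v1/TestReportDataFiles/bulk (shape from the Swagger notes)."""
    return {"Items": [{"SerialNumber": r["sn"], "Content": read_text(r["path"])}
                      for r in rows]}


def add_counts(total: Dict[str, int], resp: dict) -> None:
    """Accumulate one bulk response into running totals."""
    for key in ("TotalReceived", "Created", "Updated", "Unchanged"):
        total[key] = total.get(key, 0) + int(resp.get(key, 0))
    errors = resp.get("Errors", 0)  # assumption: a count or a list of errors
    total["Errors"] = total.get("Errors", 0) + (
        len(errors) if isinstance(errors, list) else int(errors))
```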
DFWDS port live run
[2026-04-15T03:44:00.797Z] === DFWDS-process start (Node port) ===
in: C:\Shares\webshare\Test_Datasheets
out: C:\Shares\webshare\For_Web
bad: C:\Shares\webshare\Bad_Datasheets
dry: false, limit: all
queued: 897 files (of 897 in dir)
...
=== summary: valid=897 renamed=0 bad=0 errors=0
[INFO] log: C:\Shares\webshare\Datasheets_Log\DFWDS_2026_04_14.log
Quick OAuth token grab (handy CLI)
curl -sS -X POST https://login.dataforth.com/connect/token \
-d grant_type=client_credentials \
-d client_id=dataforth.onprem.sync \
-d client_secret=Trxvwee2234-Awer8723-2 \
-d scope=dataforth.web
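The token from the call above expires after 3600s. `upload-delta.js` refreshes it 60s before expiry and retries a 401 once. The sketch below applies the same policy in Python. It assumes a caller-supplied `fetch_token()` that performs the POST and returns the parsed JSON with the standard OAuth 2.0 `access_token` and `expires_in` fields. `TokenCache` and `call_with_retry` are hypothetical names.

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Caches a client_credentials access token and refreshes it early."""

    def __init__(self, fetch_token: Callable[[], dict], skew: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch_token  # performs the token POST, returns parsed JSON
        self._skew = skew          # refresh this many seconds before expiry
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self, force: bool = False) -> str:
        now = self._clock()
        if force or self._token is None or now >= self._expires_at - self._skew:
            resp = self._fetch()
            self._token = resp["access_token"]
            self._expires_at = now + float(resp.get("expires_in", 3600))
        return self._token


def call_with_retry(cache: TokenCache, send: Callable[[str], int]) -> int:
    """Call send(token); on a 401, force one token refresh and retry once."""
    status = send(cache.get())
    if status == 401:
        status = send(cache.get(force=True))
    return status
```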
Configuration Changes
New files created (all in D:\claudetools\projects\dataforth-dos\)
| Path | Lines | Purpose |
|---|---|---|
| `dfwds-research/probe_dfwds.py` | ~30 | Initial AD1 folder probe |
| `dfwds-research/fetch_dfwds.py` | ~70 | Pulls full DFWDS tree to local via AD2 stage + SFTP |
| `dfwds-research/check_x_drive.py` | ~50 | Inventories X: folders to see DFWDS workload |
| `dfwds-research/api_probe.py` | ~50 | OAuth + Swagger fetch + endpoint listing |
| `dfwds-research/source/` | 72 files | Pulled DFWDS tree (`Program/`, `Release/`, `_Notes/`, `_Working/`, `History/`) |
| `dfwds-research/swagger.json` | (full Swagger dump) | For offline reading |
| `datasheet-pipeline/dfwds-process.js` | ~120 | Node port of DFWDS classification + move |
| `datasheet-pipeline/upload-delta.js` | ~190 | Node bulk uploader to Hoffman API |
| `datasheet-pipeline/probe_ad2_runtime.py` | ~15 | Check Python/Node availability on AD2 |
| `datasheet-pipeline/run_uploader_on_ad2.py` | ~80 | Orchestrator: SFTP + run uploader |
| `datasheet-pipeline/run_full_drain.py` | ~110 | Orchestrator: DFWDS + inventory + delta + upload |
| `datasheet-pipeline/upload_all_for_web.py` | ~70 | Simpler orchestrator: enumerate + push everything |
| `datasheet-pipeline/upload_897.py` | ~80 | (Failed regex experiment; can delete) |
Files changed on AD2
- New: `C:\Users\sysadmin\Documents\dataforth-uploader\dfwds-process.js`
- New: `C:\Users\sysadmin\Documents\dataforth-uploader\upload-delta.js`
- 897 files moved: `C:\Shares\webshare\Test_Datasheets\*.txt` → `C:\Shares\webshare\For_Web\*.txt`
- New log file: `C:\Shares\webshare\Datasheets_Log\DFWDS_2026_04_14.log`
Local machine
- Python `pyyaml`: already installed (used for SOPS YAML parsing)
- Python `paramiko` 4.0.0: already installed
Pending / Incomplete / Open Items
- Wire up scheduled automation on AD2. Currently the whole pipeline runs only when triggered manually from the workstation. Need one of:
  - (a) `creds.json` on AD2 (ACL'd to SYSTEM/svc_testdatadb) + a Windows Scheduled Task running `dfwds-process.js` then `upload-delta.js` nightly
  - (b) Workstation cron → SSH to AD2 → run with env-var creds
  - (c) Windows Credential Manager on AD2

  User asked which to wire (a or b) at end of session; awaiting decision.
- Integrate into the testdatadb pipeline. The current scripts run separately from `C:\Shares\testdatadb`. A cleaner integration would have `database/import.js` trigger DFWDS-process + upload after each incremental import, rather than as a nightly batch.
- Vault hygiene (carryover from previous sessions):
  - `clients/dataforth/ad2.sops.yaml` has a stale `\!` escape in the password (workaround in code: `.replace('\\','')`)
  - Cloudflare tokens still in 1Password only, not in `services/cloudflare.sops.yaml`
- Re-run the server inventory at some point for a fresh baseline. Today's pull timed out at ~405K records. The data isn't critical for ongoing operation since we rely on the server's idempotency, but a fresh `server_inventory.txt` makes future delta computations faster than re-uploading everything.
- Delete `upload_897.py` (failed regex experiment); it is superseded by `upload_all_for_web.py`.
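Whichever scheduling option is picked, the job itself is the same: run `dfwds-process.js`, then `upload-delta.js`, and skip the upload if the move step failed. The sketch below shows that sequencing, under two assumptions:

- Both scripts run from the AD2 deployment dir with no arguments. The real command lines are not recorded here; for example, the uploader likely needs the delta file path.
- Credentials arrive as `CF_*` environment variables.

`run_nightly` and `subprocess_runner` are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Sequence

DEPLOY_DIR = Path(r"C:\Users\sysadmin\Documents\dataforth-uploader")
# Order matters: drain Test_Datasheets into For_Web first, then upload.
STEPS = ("dfwds-process.js", "upload-delta.js")


def run_nightly(run: Callable[[Sequence[str]], int], deploy_dir: Path = DEPLOY_DIR,
                node: str = "node") -> List[str]:
    """Run each step in order; stop at the first non-zero exit code."""
    completed: List[str] = []
    for script in STEPS:
        code = run([node, str(deploy_dir / script)])
        if code != 0:
            raise RuntimeError(f"{script} exited with {code}; later steps skipped")
        completed.append(script)
    return completed


def subprocess_runner(cmd: Sequence[str]) -> int:
    # Credentials are inherited from the environment (CF_* vars), not passed as args.
    return subprocess.run(list(cmd), check=False).returncode
```

Passing the runner in keeps the sequencing testable without Node installed. On AD2 it would be called as `run_nightly(subprocess_runner)`.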
Reference Information
URLs / endpoints
- Swagger UI: `https://www.dataforth.com/swagger/index.html`
- Identity Server: `https://login.dataforth.com`
- OIDC discovery: `https://login.dataforth.com/.well-known/openid-configuration`
File paths (will need again)
- AD2 deployment dir: `C:\Users\sysadmin\Documents\dataforth-uploader\`
- AD2 webshare: `C:\Shares\webshare\` (Test_Datasheets, For_Web, Bad_Datasheets, Datasheets_Log)
- AD1 DFWDS source: `\\AD1\Engineering\ENGR\ATE\Test Datasheets\DFWDS\`
- Local workspace: `D:\claudetools\projects\dataforth-dos\`
- Local DFWDS source pull: `D:\claudetools\projects\dataforth-dos\dfwds-research\source\`
- Local pipeline scripts: `D:\claudetools\projects\dataforth-dos\datasheet-pipeline\`
DFWDS classification rules (one-liner reference)
- `.txt` extension required
- Dash required in stem; left-of-dash = WO#
- WO# all-numeric → valid (unless leading 0)
- WO# `[A-J]\d+` → decode first char (A=10, B=11, ... J=19), rename file in place, then move
- Anything else → bad
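The rules above can be written as a small Python sketch. The VB source's exact handling of these cases isn't captured here, so the sketch assumes:

- the first dash splits the WO# from the rest
- the `.txt` check is case-insensitive
- only uppercase `A`-`J` count as DOS prefixes

`classify` is a hypothetical name, not a function from `dfwds-process.js`.

```python
import re
from typing import Optional, Tuple

NUMERIC_WO = re.compile(r"[0-9]+")
DOS_WO = re.compile(r"[A-J][0-9]+")


def classify(filename: str) -> Tuple[str, Optional[str]]:
    """Return ("valid", None), ("valid", decoded_name) or ("bad", None)."""
    stem, dot, ext = filename.rpartition(".")
    if not dot or ext.lower() != "txt":
        return ("bad", None)
    wo, dash, rest = stem.partition("-")
    if not dash:
        return ("bad", None)
    if NUMERIC_WO.fullmatch(wo):
        # All-numeric WO# is valid unless it has a leading zero.
        return ("bad", None) if wo.startswith("0") else ("valid", None)
    if DOS_WO.fullmatch(wo):
        # DOS-encoded WO#: the letter becomes a 2-digit prefix, A=10 ... J=19.
        decoded = str(10 + ord(wo[0]) - ord("A")) + wo[1:]
        return ("valid", f"{decoded}-{rest}.{ext}")
    return ("bad", None)
```

For example, `classify("A1234-5.txt")` returns `("valid", "101234-5.txt")`.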
API behavior notes
- `POST /bulk`: the server returns per-category counts in a single response. `Updated` = same SN, different content; `Unchanged` = same SN, same content; `Created` = new SN.
- Token TTL = 3600s. The uploader refreshes 60s before expiry.
- Server pagination: `?page=N&pageSize=1000&afterSerialNumber=<cursor>`. Page sizes up to 1000 work.
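The sketch below walks the full list with the `afterSerialNumber` cursor, as `fetch-server-inventory.py` does at ~700 records/s. It assumes:

- the server orders results by serial number
- each page's last serial is the next cursor
- a short page marks the end

`fetch_page` stands in for the authenticated GET and is hypothetical. How `page` should interact with the cursor is also an assumption.

```python
from typing import Callable, Iterator, List, Optional
from urllib.parse import urlencode

LIST_PATH = "/api/v1/TestReportDataFiles"


def list_url(base: str, after: Optional[str], page_size: int = 1000, page: int = 1) -> str:
    """Build the list URL from the documented page/pageSize/afterSerialNumber params."""
    params = {"page": page, "pageSize": page_size}
    if after is not None:
        params["afterSerialNumber"] = after
    return f"{base.rstrip('/')}{LIST_PATH}?{urlencode(params)}"


def iter_serials(fetch_page: Callable[[Optional[str], int], List[str]],
                 page_size: int = 1000) -> Iterator[str]:
    """Yield every serial number, using the last serial of each page as the next cursor."""
    after: Optional[str] = None
    while True:
        serials = fetch_page(after, page_size)
        yield from serials
        if len(serials) < page_size:
            return  # a short (or empty) page means we reached the end
        after = serials[-1]
```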
Sustained throughput observed today
- Bulk upload: ~142 files/s sustained for ~50s (batches of 100, single client)
- Server inventory pagination: ~700 records/s
Other work in this same conversation (non-Dataforth)
Earlier today this session also handled email-account password resets on ACG-DC16 (acg.local domain, 172.16.3.50) for accounts hosted by Neptune Exchange. Captured here for completeness — full detail is independent of Dataforth:
| Account | New password | |
|---|---|---|
cansley_starrpass.c |
cansley@devconllc.com (Chris Ansley) | Natascha8144$ |
bertie2 |
bertie@amtransit.com (Ana Lozano) | 2026@Arizona |
caitlin |
caitlin@amtransit.com | Welcomeamt2026! |
matilde |
matilde@amtransit.com (Slate) | Welcomeamt2026! |
nydia |
Nydia@amtransit.com | Welcomeamt2026! |
tom |
tsorensen@RieussetCorp.com (Tom Sorensen) | RC$sor3740 |
tomrc |
tomrc@RieussetCorp.com (Tom Sorensen) | RC$sor3740 |
ojodeagua |
ojodeagua@RieussetCorp.com (Tom Sorensen) | RC$sor3740 |
csorensen |
csorensen@RieussetCorp.com (Christine Sorensen) | RC$sor3740 |
Untouched intentionally: driver@amtransit.com, Jeffrey.Schaufel (request cancelled).
Method: Python winrm with auth administrator@acg.local / Gptf*77ttb##, Set-ADAccountPassword -Reset + Unlock-ADAccount. UPN format required (ACG\administrator failed initial Python NTLM auth due to escape sequence quirk).
Also today: created the IMC ticket-notes write-up at clients/instrumental-music-center/docs/2026-04-13-ticket-notes.md (formatted 2026-04-11/12/13 IMC1 work for the PSA ticket — RDS removal blocked, SQL backup cleanup, DB relocation, RDS removal parked). Opened in Notepad++ for the user.
Also: Cloudflare DNS cleanup. unifi.azcomputerguru.com was already grey-cloud, so flipping it was a no-op; flagged ui.azcomputerguru.com as the more likely target (still proxied and returning 525). The user didn't act on that.
Last Updated: 2026-04-14 21:00
Next Actions: decide automation approach (a/b/c above), then commit + push.