Session log + DFWDS Node port + Hoffman API uploader pipeline
Built the missing piece between the test datasheet pipeline and Dataforth's new product API. End-to-end:
- Pulled DFWDS (Dataforth Web Datasheet System) VB6 source from AD1\Engineering\ENGR\ATE\Test Datasheets\DFWDS to local for analysis
- Decoded its filename validation: A-J prefix decodes (A=10..J=19), all-numeric WO# valid (no leading 0), anything else bad
- Ported the validation + move logic to Node (dfwds-process.js)
- Built bulk uploader (upload-delta.js) for Hoffman's Swagger API (POST /api/v1/TestReportDataFiles/bulk with OAuth client_credentials)

Sanitized 3 prior reference scripts (fetch-server-inventory, test-scenarios, test-upload-two) to read CF_* env vars instead of hardcoded creds.

Live drain results:
- 897 files moved Test_Datasheets -> For_Web (all valid, no renames, no bad), DFWDS port summary in 1.1s
- Pushed entire For_Web (7,061 files) to Hoffman API in 49.7s @ 142/s: Created=803 Updated=114 Unchanged=6,144 Errors=0
- Server count: 489,579 -> 490,382 (+803 net new)

Also:
- Added clients/dataforth/.gitignore to exclude plaintext Oauth.txt note
- Added clients/instrumental-music-center/docs/2026-04-13-ticket-notes.md (ticket write-up of 2026-04-11/12/13 IMC1 RDS removal/SQL migration work)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
292
clients/dataforth/session-logs/2026-04-14-session.md
Normal file
@@ -0,0 +1,292 @@
# Session Log — Dataforth — 2026-04-14

## DFWDS port + Hoffman API uploader pipeline (end-to-end working)

Mid-day session: pulled the legacy DFWDS VB6 source from AD1, ported its file-classification logic to Node, wired up the bulk uploader against Hoffman's new `https://www.dataforth.com` API, drained the 897-file Test_Datasheets backlog, and pushed everything currently in For_Web to the live server. End result: server caught up to the AD2 For_Web state with 803 new records created.

(Earlier today this same conversation also did a bunch of email-account password resets via ACG-DC16 — those are noted at the bottom for completeness but aren't Dataforth.)

---

## Session Summary

### What was accomplished

1. **Pulled DFWDS source** (Dataforth Web Datasheet System, VB6) from AD1 via AD2 jump host. 71 files / 957 KB. Folder structure: `Program/`, `Release/` (current source: DFWDS.bas + frmSplash.frm + DFWDS.exe), `_Notes/` (developer docs), `_Working/` (WIP), plus `History/` snapshots from 2014-09-24 through 2014-10-02.

2. **Determined what DFWDS actually does** (1181 lines of VB6): pure filesystem mover/renamer. Reads `X:\Test_Datasheets\*.txt`, validates filename, moves valid → `X:\For_Web`, bad → `X:\Bad_Datasheets`, logs to `X:\Datasheets_Log\DFWDS_YYYY_MM_DD.log`. **No HTTP/FTP code** — "Web" in the name only refers to the destination folder; web upload was always a separate piece (currently the new Hoffman API).

3. **Validation rules captured from VB source:**
    - Filename must be `.txt`
    - Must contain a dash; left-of-dash = Work Order #
    - WO# is all-numeric → valid (unless starts with 0)
    - WO# starts with `A`-`J` followed by digits → DOS-encoded; decode first char to a 2-digit prefix (`A`=10, `B`=11, ... `J`=19) and rename in place, then move to For_Web
    - Anything else → bad

4. **Probed the X: drive state on AD2** — `\\ad2\webshare\`:
    - `Test_Datasheets`: 897 files, 2.4 MB, dates 2026-02-09 → 2026-04-13 14:42 (DFWDS hadn't run since 2026-03-11, ~33 days)
    - `Bad_Datasheets`: 18,801 files (historical, oldest 2003)
    - `For_Web`: 6,258 files at start of session (later 7,061 after DFWDS port ran)
    - `Datasheets_Log`: 3,336 log files

5. **Found existing reference scripts** in `projects/dataforth-dos/datasheet-pipeline/`:
    - `test-upload-two.py` — single POST + roundtrip diff
    - `test-scenarios.py` — idempotency, update, bulk tests (all green)
    - `fetch-server-inventory.py` — paginates GET, writes serial list
    - `compute-delta.py` — diffs local vs server inventories
- All four were prior work that proved the API contract end-to-end. Confirmed `dataforth.onprem.sync` client + `Trxvwee2234-Awer8723-2` secret + `dataforth.web` scope still works.

6. **Read API Swagger** (`https://www.dataforth.com/swagger/v1/swagger.json`). Endpoints of interest:
    - `GET /api/v1/TestReportDataFiles/stats` → TotalCount, LatestCreatedAtUtc, LatestUpdatedAtUtc
    - `GET /api/v1/TestReportDataFiles?page=&pageSize=&afterSerialNumber=` → list (cursor-paginated)
    - `GET /api/v1/TestReportDataFiles/{serialNumber}` → fetch one
    - `POST /api/v1/TestReportDataFiles` body `{SerialNumber, Content}` → 200/201
    - `POST /api/v1/TestReportDataFiles/bulk` body `{Items: [{SerialNumber, Content}, ...]}` → returns TotalReceived/Created/Updated/Unchanged/Errors
    - `DELETE /api/v1/TestReportDataFiles/{serialNumber}`

7. **Discovered the 6,240-record delta from yesterday was already uploaded** by some earlier process (today ~13:44 server time). Server jumped 483,339 → 489,579 between yesterday's snapshot and the start of this session. Spot-checks on `100078-1`, `99693-9`, `51848-7` confirmed `CreatedAtUtc = 2026-04-14T13:44`. Our test-50 came back all `Unchanged` because the records were there already.

8. **Wrote `dfwds-process.js`** — Node port of DFWDS's classification + move logic. ~120 lines. Mirrors VB validation exactly: ext check, dash position, all-numeric vs A-J prefix, decode + rename if prefixed.

9. **Wrote `upload-delta.js`** — Node bulk uploader. Reads pipe-delimited delta file (`SN|path|size|mtime`), batches into POST `/bulk`, refreshes token before expiry (1h TTL), retries 401 once, logs every batch result.

10. **Wrote orchestrators**:
    - `run_uploader_on_ad2.py` — SFTP scripts to AD2, set CF_* env vars in encoded PowerShell command, run uploader
    - `run_full_drain.py` — full pipeline: DFWDS → re-inventory → delta → upload (built but timed out at server-inventory pull step)
    - `upload_all_for_web.py` — simpler: enumerate For_Web, push everything (idempotent server handles dedup)

11. **End-to-end run of the pipeline:**
    - Dry-run DFWDS on 897: classified all as `valid=897 renamed=0 bad=0` (modern serial-number format, no DOS prefixes in the queue)
    - Live DFWDS run: 897/897 moved to For_Web in 1.1 seconds. For_Web: 6,258 → 7,061 (gain of 803, meaning ~94 of the 897 SNs already existed in For_Web and got overwritten in place)
    - Skipped the slow full-server-inventory pull (was timing out at 10 min mark, ~400K of ~490K records)
    - Just pushed entire For_Web to API: 7,061 items in 49.7s @ ~142/s sustained
    - Result: **803 created, 114 updated, 6,144 unchanged, 0 errors**
    - Server total: 489,579 → **490,382** (+803 new records)
    - LatestCreatedAtUtc: `2026-04-15T03:58:00`
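
The counts in the end-to-end run can be cross-checked with a few lines of arithmetic. This is only a sanity-check sketch: every figure is copied from this log, the variable names are illustrative, and the overwrite count assumes nothing else wrote to For_Web while the DFWDS port ran.

```python
# Figures copied from this session log; variable names are illustrative.
for_web_start, for_web_end = 6258, 7061
moved = 897
created, updated, unchanged, errors = 803, 114, 6144, 0
server_start, server_end = 489_579, 490_382

net_gain = for_web_end - for_web_start  # filenames that are new to For_Web
overwritten = moved - net_gain          # moved files that replaced an existing name

# The bulk result categories should account for every file that was pushed,
# and the server total should grow by exactly the number of created records.
assert created + updated + unchanged + errors == for_web_end
assert server_end - server_start == created

print(f"net_gain={net_gain} overwritten={overwritten}")
```

If any of these assertions failed on a future run, that would be a hint that files changed between the DFWDS step and the upload step, or that another process is also writing to the server.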

### Key decisions & rationale

- **Node, not Python, for the on-AD2 scripts.** Node is already installed on AD2 (used by testdatadb), no new runtime needed. Aligns with the testdatadb stack that the future DFWDS replacement should integrate into.
- **`.replace('\\','')` for the AD2 vault password** — same stale-shell-escape issue as before in `clients/dataforth/ad2.sops.yaml`. Vault entry still needs cleanup.
- **Skipped the slow inventory diff in the final run** — server is idempotent (returns Unchanged for matching content), so just pushing all of For_Web works. Full server inventory takes ~700s (490K records @ 700/s pagination). Not worth waiting when we can let the server tell us what's new.
- **Did NOT install Python on AD2** — user offered to ("install whatever is necessary") but Node was already there and the rewrite was clean. No improvement from adding a second runtime.
- **Used env vars (`CF_*`) for credentials in the on-AD2 invocation** — passed via SSH-encoded PowerShell, not written to disk. Credentials only live on the workstation in SOPS vault.
- **Two scripts on AD2 are reusable for ongoing automation:** `dfwds-process.js` and `upload-delta.js` are the two pieces that need to run nightly going forward.
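
The env-var hand-off described above could look roughly like the sketch below. It is not the contents of `run_uploader_on_ad2.py`: the helper names, the `CF_CLIENT_ID` / `CF_CLIENT_SECRET` variable names, and the `node` command line are assumptions made for illustration. The one fixed point is that `powershell -EncodedCommand` expects Base64 of the UTF-16LE script text.

```python
import base64


def ps_quote(value):
    """Quote a value as a PowerShell single-quoted string (' is doubled)."""
    return "'" + value.replace("'", "''") + "'"


def build_encoded_command(env, command):
    """Return the Base64 payload for `powershell -EncodedCommand`.

    The script sets each environment variable for the session, then runs
    `command`. Base64 here only sidesteps shell quoting across SSH; it is
    an encoding, not encryption.
    """
    lines = [f"$env:{name} = {ps_quote(value)}" for name, value in env.items()]
    lines.append(command)
    script = "\n".join(lines)
    return base64.b64encode(script.encode("utf-16-le")).decode("ascii")


# Hypothetical variable names and placeholder values, for illustration only.
payload = build_encoded_command(
    {"CF_CLIENT_ID": "example-client", "CF_CLIENT_SECRET": "example-secret"},
    "node upload-delta.js",
)
remote_command = f"powershell -NoProfile -EncodedCommand {payload}"
```

Because the variables are set inside the PowerShell session, they disappear when the session ends and nothing is written to disk on AD2, which matches the intent stated above.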

### Problems encountered and resolutions

| Problem | Resolution |
|---|---|
| `\\ad2\` UNC path doesn't resolve from my workstation (DNS / SMB blocked over OpenVPN) | Deployed scripts to AD2 (`C:\Users\sysadmin\Documents\dataforth-uploader\`) and ran them locally with `node` |
| AD2 has no Python | Ported uploader from `upload-delta.py` to `upload-delta.js` (Node already installed) |
| `fetch-server-inventory.py` timed out at 600s after 405,918 of ~490K records | Skipped the full inventory; pushed all of For_Web directly and let server's idempotency dedup |
| First test-50 came back all `Unchanged` | Investigation revealed the original delta had already been uploaded earlier today (server count jumped 483,339 → 489,579 between yesterday's snapshot and now). Idempotency working as designed. |
| PowerShell regex to extract just the 897 from DFWDS log produced 0 matches | Pivoted to enumerate-everything-in-For_Web approach; idempotent server handles it just as well |
| `socket.timeout` on `https://www.dataforth.com` from workstation | Probably OpenVPN routing; retried with longer timeout, worked. Server itself is healthy. |

---

## Credentials

### Dataforth Product API (Hoffman) — already in vault
- Vault: `clients/dataforth/api-oauth.sops.yaml`
- Token URL: `https://login.dataforth.com/connect/token`
- API base: `https://www.dataforth.com`
- Swagger: `https://www.dataforth.com/swagger/v1/swagger.json`
- Grant type: `client_credentials`
- Client ID: `dataforth.onprem.sync`
- Client secret: `Trxvwee2234-Awer8723-2`
- Scope: `dataforth.web`
- TTL: 1 hour
- Token claim `role` = Admin

### AD2 (Dataforth)
- SSH: `sysadmin / Paper123!@#` on 192.168.0.6:22
- Vault entry stores `Paper123\!@#` (stale shell-escape; strip with `.replace('\\','')`)

### ACG-DC16 (separate work, see "Other work in this same conversation" below)
- WinRM 5985: `administrator@acg.local / Gptf*77ttb##` on 172.16.3.50

---

## Infrastructure & Servers

### AD2 webshare structure (verified today)
| Folder | Purpose | Files (start) | Files (end) |
|---|---|---|---|
| `\\ad2\webshare\Test_Datasheets` | DFWDS input staging | 897 | 0 (drained) |
| `\\ad2\webshare\Bad_Datasheets` | DFWDS quarantine | 18,801 | 18,801 (no bad in this batch) |
| `\\ad2\webshare\For_Web` | DFWDS output / API source | 6,258 | 7,061 |
| `\\ad2\webshare\Datasheets_Log` | DFWDS run logs | 3,336 | 3,337 (today's added) |

### Hoffman API server state
- Start of session: `TotalCount=489,579`, `LatestCreatedAtUtc=2026-04-14T13:45:23` (from a backfill earlier today by someone/something else)
- End of session: `TotalCount=490,382` (+803), `LatestCreatedAtUtc=2026-04-15T03:58:00`

### AD2 deployment dir (new)
- `C:\Users\sysadmin\Documents\dataforth-uploader\`
  - `dfwds-process.js`
  - `upload-delta.js`
  - `delta_for_web_all.txt` (transient)
  - `for_web_inventory.txt` (transient)
  - `delta_to_upload.txt` (transient, from yesterday's compute-delta)
  - `upload-logs/` (per-run timestamped logs)

---

## Commands & Outputs

### Final upload-delta.js run (the headline result)
```
[INFO] 7061 items queued (start=0 limit=all batch=100)
[OK] token len=915
batch 1/71: recv=100 cre=0 upd=2 unch=98 err=0 | rate=37/s
...
batch 71/71: recv=61 cre=0 upd=0 unch=61 err=0 | rate=142/s

[DONE] elapsed 49.7s
received: 7061
created: 803
updated: 114
unchanged: 6144
errors: 0
```

### DFWDS port live run
```
[2026-04-15T03:44:00.797Z] === DFWDS-process start (Node port) ===
in: C:\Shares\webshare\Test_Datasheets
out: C:\Shares\webshare\For_Web
bad: C:\Shares\webshare\Bad_Datasheets
dry: false, limit: all
queued: 897 files (of 897 in dir)
...
=== summary: valid=897 renamed=0 bad=0 errors=0
[INFO] log: C:\Shares\webshare\Datasheets_Log\DFWDS_2026_04_14.log
```

### Quick OAuth token grab (handy CLI)
```bash
curl -sS -X POST https://login.dataforth.com/connect/token \
-d grant_type=client_credentials \
-d client_id=dataforth.onprem.sync \
-d client_secret=Trxvwee2234-Awer8723-2 \
-d scope=dataforth.web
```
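
The same token request, plus the "refresh shortly before expiry" check, might be sketched in Python as follows. This is a sketch rather than a copy of `upload-delta.js`: the function names are hypothetical, credentials are expected to come from the caller (for example from environment variables) and are deliberately not embedded, and `access_token` / `expires_in` are the standard OAuth 2.0 token-response fields.

```python
import json
import time
import urllib.parse
import urllib.request

TOKEN_URL = "https://login.dataforth.com/connect/token"
REFRESH_MARGIN_S = 60  # refresh this many seconds before the token expires


def needs_refresh(expires_at, now=None, margin=REFRESH_MARGIN_S):
    """True once the token is within `margin` seconds of its expiry time."""
    now = time.time() if now is None else now
    return now >= expires_at - margin


def fetch_token(client_id, client_secret, scope="dataforth.web"):
    """POST the client_credentials form; return (access_token, expires_at)."""
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    }).encode("ascii")
    request = urllib.request.Request(TOKEN_URL, data=form, method="POST")
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    # The 3600 s fallback mirrors the 1-hour TTL noted in this log.
    expires_at = time.time() + int(payload.get("expires_in", 3600))
    return payload["access_token"], expires_at
```

A caller would keep the `(token, expires_at)` pair and call `fetch_token` again whenever `needs_refresh(expires_at)` returns true before sending the next batch.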

---

## Configuration Changes

### New files created (all in `D:\claudetools\projects\dataforth-dos\`)

| Path | Lines | Purpose |
|---|---|---|
| `dfwds-research/probe_dfwds.py` | ~30 | Initial AD1 folder probe |
| `dfwds-research/fetch_dfwds.py` | ~70 | Pulls full DFWDS tree to local via AD2 stage + SFTP |
| `dfwds-research/check_x_drive.py` | ~50 | Inventories X: folders to see DFWDS workload |
| `dfwds-research/api_probe.py` | ~50 | OAuth + Swagger fetch + endpoint listing |
| `dfwds-research/source/` | 72 files | Pulled DFWDS tree (Program/, Release/, _Notes/, _Working/, History/) |
| `dfwds-research/swagger.json` | (full Swagger dump) | For offline reading |
| `datasheet-pipeline/dfwds-process.js` | ~120 | Node port of DFWDS classification + move |
| `datasheet-pipeline/upload-delta.js` | ~190 | Node bulk uploader to Hoffman API |
| `datasheet-pipeline/probe_ad2_runtime.py` | ~15 | Check Python/Node availability on AD2 |
| `datasheet-pipeline/run_uploader_on_ad2.py` | ~80 | Orchestrator: SFTP + run uploader |
| `datasheet-pipeline/run_full_drain.py` | ~110 | Orchestrator: DFWDS + inventory + delta + upload |
| `datasheet-pipeline/upload_all_for_web.py` | ~70 | Simpler orchestrator: enumerate + push everything |
| `datasheet-pipeline/upload_897.py` | ~80 | (Failed regex experiment — can delete) |

### Files changed on AD2
- New: `C:\Users\sysadmin\Documents\dataforth-uploader\dfwds-process.js`
- New: `C:\Users\sysadmin\Documents\dataforth-uploader\upload-delta.js`
- 897 files moved: `C:\Shares\webshare\Test_Datasheets\*.txt` → `C:\Shares\webshare\For_Web\*.txt`
- New log file: `C:\Shares\webshare\Datasheets_Log\DFWDS_2026_04_14.log`

### Local machine
- Python `pyyaml` — already installed (used for SOPS YAML parsing)
- Python `paramiko` 4.0.0 — already installed

---

## Pending / Incomplete / Open Items

1. **Wire up scheduled automation on AD2.** Currently this whole pipeline runs only when triggered manually from the workstation. Need either:
    - (a) `creds.json` on AD2 (ACL'd to SYSTEM/svc_testdatadb) + Windows Scheduled Task running `dfwds-process.js` then `upload-delta.js` nightly
    - (b) Workstation cron → SSH to AD2 → run with env-var creds
    - (c) Windows Credential Manager on AD2

    The user was asked which to wire (a or b) at end of session; **awaiting decision.**

2. **Integrate into the testdatadb pipeline.** The current scripts run separately from `C:\Shares\testdatadb`. Cleaner integration would have `database/import.js` trigger DFWDS-process + upload after each incremental import, rather than nightly batch.

3. **Vault hygiene** (carryover from previous sessions):
    - `clients/dataforth/ad2.sops.yaml` has stale `\!` escape in password (workaround in code: `.replace('\\','')`)
    - Cloudflare tokens still in 1Password only, not in `services/cloudflare.sops.yaml`

4. **Re-run server inventory** at some point so we have a fresh baseline. Today's pull timed out at 405K. The data isn't critical for ongoing operation since we use the server's idempotency, but having a fresh `server_inventory.txt` makes future delta computations faster than re-uploading everything.

5. **Delete `upload_897.py`** (failed regex experiment) — superseded by `upload_all_for_web.py`.

---

## Reference Information

### URLs / endpoints
- Swagger UI: `https://www.dataforth.com/swagger/index.html`
- Identity Server: `https://login.dataforth.com`
- OIDC discovery: `https://login.dataforth.com/.well-known/openid-configuration`

### File paths (will need again)
- AD2 deployment dir: `C:\Users\sysadmin\Documents\dataforth-uploader\`
- AD2 webshare: `C:\Shares\webshare\` (Test_Datasheets, For_Web, Bad_Datasheets, Datasheets_Log)
- AD1 DFWDS source: `\\AD1\Engineering\ENGR\ATE\Test Datasheets\DFWDS\`
- Local workspace: `D:\claudetools\projects\dataforth-dos\`
- Local DFWDS source pull: `D:\claudetools\projects\dataforth-dos\dfwds-research\source\`
- Local pipeline scripts: `D:\claudetools\projects\dataforth-dos\datasheet-pipeline\`

### DFWDS classification rules (one-liner reference)
- `.txt` extension required
- Dash required in stem; left-of-dash = WO#
- WO# all-numeric → valid (unless leading 0)
- WO# `[A-J]\d+` → decode first char (A=10, B=11, ... J=19), rename file in place, then move
- Anything else → bad
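
The rules above can be sketched as a single classification function. This is an illustration in Python, not the code in `dfwds-process.js` or the VB6 source, and it fills a few gaps with assumptions: the extension check is case-insensitive, the WO# is taken up to the first dash, prefix letters must be uppercase, and a leading-zero WO# is treated as bad.

```python
import re
from pathlib import PurePath

# A=10, B=11, ... J=19 (the DOS-era prefix encoding described above)
PREFIX_MAP = {chr(ord("A") + i): str(10 + i) for i in range(10)}


def classify(filename):
    """Return (status, final_name); status is 'valid', 'renamed', or 'bad'."""
    path = PurePath(filename)
    if path.suffix.lower() != ".txt":           # assumption: case-insensitive
        return ("bad", filename)
    wo, dash, rest = path.stem.partition("-")   # assumption: split at first dash
    if not dash or not wo:
        return ("bad", filename)
    if re.fullmatch(r"[1-9][0-9]*", wo):        # all-numeric, no leading 0
        return ("valid", filename)
    if re.fullmatch(r"[A-J][0-9]+", wo):        # prefixed: decode, then rename
        decoded = PREFIX_MAP[wo[0]] + wo[1:]
        return ("renamed", f"{decoded}-{rest}{path.suffix}")
    return ("bad", filename)


print(classify("100078-1.txt"))   # numeric WO#, kept as-is
print(classify("A1234-5.txt"))    # prefix A decodes to 10
```

Returning the final name alongside the status keeps the rename and the move as separate steps for the caller, which makes a dry-run mode straightforward.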

### API behavior notes
- `POST /bulk`: server returns counts per category in single response. `Updated` = same SN, different content. `Unchanged` = same SN, same content. `Created` = new SN.
- Token TTL = 3600s. Uploader refreshes 60s before expiry.
- Server pagination: `?page=N&pageSize=1000&afterSerialNumber=<cursor>`. Page sizes up to 1000 work.
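
The batching and per-category tally behind the `/bulk` results can be sketched as below. The HTTP call is injected as a `post_bulk` callable so the logic stays testable offline; that callable, the function names, and the assumption that `Errors` comes back as a number are all illustrative rather than taken from `upload-delta.js`.

```python
from collections import Counter

RESULT_KEYS = ("TotalReceived", "Created", "Updated", "Unchanged", "Errors")


def batches(items, size=100):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def upload_all(items, post_bulk, batch_size=100):
    """Send items in batches and sum the per-category counts.

    `post_bulk` is a caller-supplied function that takes the request body
    ({"Items": [...]}) and returns the decoded response as a dict.
    """
    totals = Counter()
    for chunk in batches(items, batch_size):
        result = post_bulk({"Items": chunk})
        for key in RESULT_KEYS:
            totals[key] += result.get(key, 0)  # assumes numeric counts
    return dict(totals)
```

With a batch size of 100, 7,061 items split into 71 requests with a final batch of 61, which lines up with the `batch 71/71: recv=61` line in the upload log.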

### Sustained throughput observed today
- Bulk upload: ~142 files/s sustained for ~50s (batches of 100, single client)
- Server inventory pagination: ~700 records/s
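
As a rough worked example, the throughput figures can be re-derived from the raw numbers in this log. The variable names are illustrative, and the full-inventory estimate simply assumes the ~700 records/s rate would hold for the whole listing.

```python
import math

files, elapsed_s, batch_size = 7061, 49.7, 100
upload_rate = files / elapsed_s                # files per second
batch_count = math.ceil(files / batch_size)    # number of /bulk requests

server_records, page_rate = 490_382, 700       # records, records per second
full_inventory_s = server_records / page_rate  # estimated time for a full listing

partial_records, timeout_s = 405_918, 600      # where today's inventory pull stopped
observed_page_rate = partial_records / timeout_s

print(f"upload: {upload_rate:.1f} files/s over {batch_count} batches")
print(f"full inventory: ~{full_inventory_s:.0f} s at {page_rate}/s")
print(f"observed pagination before timeout: {observed_page_rate:.0f} records/s")
```

The estimate of roughly 700 s for a full inventory is what makes pushing all of For_Web (about 50 s) the cheaper option when the local set is small relative to the server.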

---

## Other work in this same conversation (non-Dataforth)

Earlier today this session also handled email-account password resets on **ACG-DC16** (acg.local domain, 172.16.3.50) for accounts hosted by Neptune Exchange. Captured here for completeness — full detail is independent of Dataforth:

| Account | Email | New password |
|---|---|---|
| `cansley_starrpass.c` | cansley@devconllc.com (Chris Ansley) | `Natascha8144$` |
| `bertie2` | bertie@amtransit.com (Ana Lozano) | `2026@Arizona` |
| `caitlin` | caitlin@amtransit.com | `Welcomeamt2026!` |
| `matilde` | matilde@amtransit.com (Slate) | `Welcomeamt2026!` |
| `nydia` | Nydia@amtransit.com | `Welcomeamt2026!` |
| `tom` | tsorensen@RieussetCorp.com (Tom Sorensen) | `RC$sor3740` |
| `tomrc` | tomrc@RieussetCorp.com (Tom Sorensen) | `RC$sor3740` |
| `ojodeagua` | ojodeagua@RieussetCorp.com (Tom Sorensen) | `RC$sor3740` |
| `csorensen` | csorensen@RieussetCorp.com (Christine Sorensen) | `RC$sor3740` |

Untouched intentionally: `driver@amtransit.com`, `Jeffrey.Schaufel` (request cancelled).

Method: Python `winrm` with auth `administrator@acg.local / Gptf*77ttb##`, `Set-ADAccountPassword -Reset` + `Unlock-ADAccount`. UPN format required (`ACG\administrator` failed initial Python NTLM auth due to escape sequence quirk).

Also today: created the **IMC ticket-notes write-up** at `clients/instrumental-music-center/docs/2026-04-13-ticket-notes.md` (formatted 2026-04-11/12/13 IMC1 work for the PSA ticket — RDS removal blocked, SQL backup cleanup, DB relocation, RDS removal parked). Opened in Notepad++ for the user.

Also: **Cloudflare DNS cleanup**: `unifi.azcomputerguru.com` was already grey-cloud, so flipping it was a no-op; flagged `ui.azcomputerguru.com` as the more likely target (still proxied + 525), but the user didn't act on that.

---

**Last Updated:** 2026-04-14 21:00
**Next Actions:** decide automation approach (a/b/c above), then commit + push.