# Dataforth DOS Project - Context

**Last Updated:** 2026-04-14
**Status:** Active - Datasheet Pipeline Extended for SCMVAS/SCMHVAS

## Quick Start - Infrastructure Overview

| Component | IP/Location | Access | Notes |
|-----------|-------------|--------|-------|
| **AD2** (Primary) | 192.168.0.6 | SSH: sysadmin / vault | Windows Server 2022, hosts testdatadb service |
| **AD1** (Secondary) | 192.168.0.27 | SSH: sysadmin / vault | Hosts Engineering share at \\AD1\Engineering |
| **D2TESTNAS** | 192.168.0.9 | SMB1 only | Bridge for DOS test stations (TS-xx machines) |
| **VPN** | Required | FortiClient | Access to 192.168.0.x network |

**Get credentials:**
```bash
# AD2 password (has stale backslash escape - strip it)
bash D:/vault/scripts/vault.sh get-field clients/dataforth/ad2.sops.yaml credentials.password | sed 's/\\//g'

# AD1 password
bash D:/vault/scripts/vault.sh get-field clients/dataforth/ad1.sops.yaml credentials.password
```

**All passwords:** `Paper123!@#` (stored in vault, note backslash escape issue in ad2.sops.yaml)

## Current State (READ THIS FIRST)

### Recent Work (2026-04-11/12)

**Extended Test Datasheet Pipeline for SCMVAS-Mxxx and SCMHVAS-Mxxxx families**
- Added VASLOG parser support (multiline CSV .DAT format)
- Created accuracy-only datasheet template (simple format, no hvin.dat lookup)
- Implemented pass-through for Engineering-Tested .txt files
- **Backfilled 27,503 historical records** (438 required regex patch for QB STR$() format quirk)
- **434 Engineering .txt files** imported and published
- Deployed to AD2, service restarted, web publishing verified

**Status:** ✅ Complete, production-deployed

**Critical Files Changed:** 5 modified, 1 new parser
- server/parsers/vaslog.js (new)
- server/templates/datasheet-exact.js (SCMVAS/SCMHVAS branch added)
- server/database/import.js (recursive flag fix, VASLOG_ENG support)
- server/parsers/spec-reader.js (stub for SCMVAS/SCMHVAS)
- deploy/deploy-to-ad2.py (vault-based credentials)

**Session Logs:**
- **2026-04-12-session.md** - Implementation, deploy, backfill, patch (DEFINITIVE)
- **2026-04-11-discovery-session.md** - Discovery phase

### testdatadb Service (on AD2)

- **Service Name:** testdatadb
- **Status:** Running
- **Service Account:** INTRANET\svc_testdatadb
- **Working Directory:** C:\Shares\testdatadb
- **API Port:** 3000 (http://192.168.0.6:3000)
- **Database:** SQLite at C:\Shares\testdatadb\database\testdata.db (4.1GB)
- **Web Output:** X:\For_Web (= \\ad2\webshare\For_Web UNC path)

### File Shares on AD2

```
C:\Shares\test\                        # Mirror of D2TESTNAS test data
├── TS-xx\LOGS\                        # Test logs from DOS stations
│   ├── 5BLOG\                         # SCM5B family
│   ├── 8BLOG\                         # 8B family
│   ├── VASLOG\                        # SCMVAS/SCMHVAS .DAT files
│   │   ├── HVAS-M01.DAT               # Production logs
│   │   ├── VAS-M100.DAT
│   │   └── VASLOG - Engineering Tested\   # 434 .txt files
│   └── ...
└── Corrected HVAS Files\              # 200 pre-generated datasheets

C:\Shares\testdatadb\                  # Node.js application
├── server/
│   ├── parsers/                       # Log file parsers
│   ├── templates/                     # Datasheet formatters
│   └── database/                      # Import/export scripts
├── database/
│   └── testdata.db                    # SQLite (4.1GB, not in git)
└── node_modules/
```

### File Shares on AD1

```
\\AD1\Engineering\
└── ENGR\ATE\High Voltage Input Module Test\
    ├── HVDATA\
    │   └── hvin.dat                   # Spec database (33 records, engineering MODNAMEs)
    └── Released\
        ├── TESTHV3.BAS                # Primary test program (2020)
        ├── TESTHV4.BAS                # Alternate test program (2017)
        ├── NLIBATE3.BAS               # ATE library
        └── DBHV.BAS                   # Database editor (TYPE DBASE definition)
```

## Email / SMTP

Dataforth is **M365 hybrid**: Exchange Online is the mail system.
Use SMTP via M365:
- **SMTP host:** smtp.office365.com
- **Port:** 587 (STARTTLS)
- **Auth:** sysadmin@dataforth.com (vault: `clients/dataforth/m365.sops.yaml` → `credentials.password`)
- **Tenant ID:** `7dfa3ce8-c496-4b51-ab8d-bd3dcd78b584`
- **Neptune Exchange (neptune.acghosting.com):** ACG infrastructure, NOT Dataforth's; do not use

---

## Anti-Patterns (DON'T DO THIS)

❌ **DO NOT hardcode Paper123!@#** - Always fetch from vault:
```bash
bash D:/vault/scripts/vault.sh get-field clients/dataforth/ad2.sops.yaml credentials.password | sed 's/\\//g'
```

❌ **DO NOT use X: drive in SSH sessions** - It's only mapped under service account. Use UNC path instead:
```powershell
# Wrong:
node database/export-datasheets.js  # Fails: "X:\For_Web does not exist"

# Right:
$env:OUTPUT_DIR = "\\ad2\webshare\For_Web"
node database/export-datasheets.js
```

❌ **DO NOT assume hvin.dat lookup works** - Marketing names (SCMHVAS-M0100) ≠ engineering MODNAMEs (SCM5B41-1181). SCMVAS/SCMHVAS use simplified accuracy-only template WITHOUT hvin.dat.

❌ **DO NOT pass 50+ file paths on PowerShell command line** - Hits "Command line too long". Use inline node script with fs.readdirSync instead.

❌ **DO NOT commit testdata.db or large samples** - 4.1GB database is in .gitignore. Keep research samples local only.

❌ **DO NOT use SMB1 on AD2** - Disabled for security. Use SSH/SFTP (port 22) or SMB2+ shares.

❌ **DO NOT expect immediate output from exec_command** - paramiko buffers stdout. Use progress markers or drain at completion.

❌ **DO NOT assume VPN is stable** - Dataforth VPN can drop mid-session. Save work frequently, use local samples for offline analysis.

## Where to Find Things

### Codebase Structure

```
projects/dataforth-dos/
├── datasheet-pipeline/
│   ├── implementation/                # Staged code (approved by Code Review)
│   ├── scmvas-hvas-research/          # Discovery scripts and source files
│   │   ├── source/                    # TESTHV3.BAS, hvin.dat, etc.
│   │   ├── samples/                   # .DAT and .txt samples (local)
│   │   ├── parse_hvin.py              # hvin.dat binary parser
│   │   └── pull-*.py                  # SSH download scripts
│   └── IMPLEMENTATION_PLAN.md         # Approved plan (2026-04-11)
├── deploy/
│   └── deploy-to-ad2.py               # Deployment script (vault-based auth)
├── session-logs/
│   ├── 2026-04-12-session.md          # SCMVAS/SCMHVAS implementation (DEFINITIVE)
│   └── 2026-04-11-discovery-session.md
└── CONTEXT.md                         # This file
```

### Production Files on AD2

```
C:\Shares\testdatadb\
├── server.js                          # Main entry point
├── server/
│   ├── parsers/
│   │   ├── multiline.js               # Handles VASLOG .DAT (CSV format)
│   │   ├── vaslog.js                  # VASLOG-specific logic (new)
│   │   └── spec-reader.js             # Spec DB loader (stub for SCMVAS/SCMHVAS)
│   ├── templates/
│   │   └── datasheet-exact.js         # Datasheet formatter (SCMVAS/SCMHVAS branch added)
│   └── database/
│       ├── import.js                  # LOG_TYPES registry, importFiles()
│       └── export-datasheets.js       # Batch export script
└── database/
    └── testdata.db                    # SQLite (27k+ records after backfill)
```

## Common Operations

### Deploy Code to AD2

```bash
# From projects/dataforth-dos/deploy/
python3 deploy-to-ad2.py

# What it does:
# 1. Fetches password from vault (D:/vault/scripts/vault.sh)
# 2. Connects via paramiko SFTP to 192.168.0.6:22
# 3. Creates .bak-YYYYMMDD timestamped backups
# 4. Uploads modified files from implementation/
# 5. Restarts testdatadb service via SSH exec_command
# 6. Verifies API responds 200 OK on port 3000
```

**Manual deployment (if script unavailable):**
```bash
# Get password
AD2_PASS=$(bash D:/vault/scripts/vault.sh get-field clients/dataforth/ad2.sops.yaml credentials.password | sed 's/\\//g')

# Connect
sshpass -p "${AD2_PASS}" ssh sysadmin@192.168.0.6

# Backup + copy
cd C:\Shares\testdatadb\server\parsers
copy multiline.js multiline.js.bak-20260414
# ... upload new files via SFTP ...
# Restart service
Restart-Service -Name testdatadb

# Verify
curl http://localhost:3000
```

### Import New Test Data

```bash
# SSH to AD2
ssh sysadmin@192.168.0.6

# Run import for specific log type
cd C:\Shares\testdatadb
node database/import.js

# Import specific files (avoid "Command line too long")
node -e "
const importFiles = require('./server/database/import').importFiles;
const fs = require('fs');
const files = fs.readdirSync('C:/Shares/test/TS-3R/LOGS/VASLOG/VASLOG - Engineering Tested')
  .filter(f => f.endsWith('.txt'))
  .map(f => 'C:/Shares/test/TS-3R/LOGS/VASLOG/VASLOG - Engineering Tested/' + f);
importFiles(files, 'VASLOG_ENG').then(() => console.log('Done'));
"
```

### Export Datasheets for Web

```bash
# SSH to AD2
ssh sysadmin@192.168.0.6

# Export all pending datasheets
cd C:\Shares\testdatadb
$env:OUTPUT_DIR = "\\ad2\webshare\For_Web"  # NOT X:\For_Web in SSH
node database/export-datasheets.js

# Export specific model family
node database/export-datasheets.js --family SCMHVAS
```

### Backfill Historical Data

```bash
# SSH to AD2, run as inline script to avoid command-line length limits
node -e "
const db = require('./server/database/db');
const exportDatasheet = require('./server/templates/datasheet-exact');

db.all(\`
  SELECT * FROM test_records
  WHERE log_type IN ('VASLOG', 'VASLOG_ENG')
    AND exported_at IS NULL
  ORDER BY id
\`, (err, rows) => {
  if (err) throw err;
  console.log(\`[INFO] Found \${rows.length} records to export\`);
  let count = 0;
  rows.forEach(row => {
    try {
      exportDatasheet(row);
      count++;
      if (count % 100 === 0) console.log(\`[PROGRESS] \${count}/\${rows.length}\`);
    } catch (e) {
      console.error(\`[SKIP] \${row.model_name}: \${e.message}\`);
    }
  });
  console.log(\`[DONE] Exported \${count} datasheets\`);
});
"
```

### Check Service Status

```powershell
# On AD2 (via SSH or RDP)
Get-Service testdatadb

# View service logs (if logging enabled)
Get-EventLog -LogName Application -Source testdatadb -Newest 50

# Test API (the trailing backtick continues the command onto the next line)
Invoke-WebRequest `
http://localhost:3000 | Select-Object StatusCode

# Check process
Get-Process | Where-Object { $_.ProcessName -like "*node*" }
```

### Access Shares from macOS/Linux

```bash
# Note: mount_smbfs is the macOS tool; on Linux, use mount -t cifs with the same share paths

# Mount AD2 share (SMB2+)
mkdir -p ~/mnt/ad2-testdatadb
mount_smbfs //sysadmin:Password@192.168.0.6/testdatadb ~/mnt/ad2-testdatadb

# Mount AD1 Engineering share
mkdir -p ~/mnt/ad1-engineering
mount_smbfs //sysadmin:Password@192.168.0.27/Engineering ~/mnt/ad1-engineering

# Unmount
umount ~/mnt/ad2-testdatadb
```

## Key Technical Decisions (ADRs)

**2026-04-12:** Use Option C (simple accuracy-only template, no hvin.dat lookup)
- Reason: Marketing names (SCMHVAS-M0100) ≠ engineering MODNAMEs (SCM5B41-1181) in hvin.dat
- Sample datasheets show simple 1-parameter format (Accuracy only)
- Spec-reader stub lets SCMVAS/SCMHVAS pass through pipeline without schema changes

**2026-04-12:** Pass-through for VASLOG_ENG .txt files (not re-render)
- Reason: Engineering-Tested files already match target format exactly
- fs.copyFileSync() guarantees byte-level fidelity, avoids encoding round-trip
- Fallback to writeFileSync(raw_data, 'utf8') if source file missing

**2026-04-12:** Fix recursive=false default regression with `config.recursive !== false`
- Reason: Adding `recursive` field to LOG_TYPES must not break 7 pre-existing families
- Treats absent/undefined as true (legacy behavior), explicit false as false

**2026-04-12:** Vault-based credentials in deploy script (no hardcoding, no prompts)
- Reason: Never commit passwords, even to private repo
- deploy-to-ad2.py calls vault.sh with 30s timeout, fails loud if unavailable
- No env-var fallback, no interactive prompt

**2026-04-12:** MM/DD/YYYY date normalization for datasheet Date field
- Reason: Matches newest Engineering-Tested samples
- Older "Corrected HVAS Files" used MM-DD-YYYY (hyphens); backfill rewrites with slashes
- Intentional visible change, documented in implementation plan

**2026-04-12:** Patch regex with plain-decimal fallback for QuickBASIC
STR$() quirk
- Reason: QB STR$() emits scientific notation for most values, plain decimal for ~1.6%
- Not a version difference or bug - purely QB float-to-string formatting threshold
- Two-regex approach: try scientific first, fall back to plain decimal

## QuickBASIC Artifacts & Log Formats

### VASLOG .DAT Structure

```
"SCMHVAS-M0100 "              # Header: model name (marketing, NOT engineering MODNAME)
20,0.0034                     # CSV line 1: measurement data
40,0.0126                     # CSV line 2
60,-0.0046                    # CSV line 3
80,0.0141                     # CSV line 4
100,-0.00325                  # CSV line 5
"PASS-7.005501E-033",...      # Status line: PASS/FAIL + accuracy (scientific OR plain decimal)
"179379-1","04-09-2026"       # Footer: serial number, test date (MM-DD-YYYY)
```

### VASLOG_ENG .txt Structure (Engineering-Tested)

```
SCMHVAS - M0100
SN: 171087-1
Date: 04/08/2024
Test: PASS
Accuracy: -7.0055E-03 %
```

### QuickBASIC STR$() Formatting Quirk

```basic
' QB emits TWO formats for floats:
PRINT STR$(-7.005501E-03)   ' → "-7.005501E-033" (scientific + status digit)
PRINT STR$(0.01599373)      ' → " .01599373" (plain decimal, leading space)

' Threshold: ~0.01 magnitude
' Affects ~1.6% of records (438/27503)
' NOT a bug - documented QB behavior
```

### hvin.dat Binary Format

```
TYPE DBASE (from DBHV.BAS)
  MODNAME AS STRING * 13      ' Engineering ID: "SCM5B41-1181 "
  INTYPE AS STRING * 3
  OUTSIGTYPE AS STRING * 7
  WAVESHPCAL AS STRING * 8
  ' ... 42 SINGLE floats (IEEE 754, 4 bytes each) ...
END TYPE

' Total: 13+3+7+8 + (42*4) = 199 bytes/record
' File size: 6567 bytes = 33 records
```

## Troubleshooting

### "Output directory does not exist: X:\For_Web"

- **Cause:** X: drive only mapped under service account, not in SSH session
- **Fix:** Use UNC path: `\\ad2\webshare\For_Web`

```powershell
$env:OUTPUT_DIR = "\\ad2\webshare\For_Web"
node database/export-datasheets.js
```

### "Command line is too long" (PowerShell)

- **Cause:** Passing 50+ file paths as arguments exceeds PowerShell limit
- **Fix:** Use inline node script with fs.readdirSync (see Common Operations above)

### VPN Drops Mid-Session

- **Symptom:** AD2/AD1 become unreachable, SSH hangs
- **Fix:**
  1. Work offline on local samples for analysis
  2. Restore VPN (FortiClient)
  3. Resume deployment/import when connection stable

### Vault Returns `Paper123\!@#` (Backslash)

- **Cause:** Legacy shell escape stored in ad2.sops.yaml
- **Fix:** Strip backslash at read-time: `sed 's/\\//g'`
- **TODO:** Clean vault entry to remove backslash

### Paramiko "No Output" for Long-Running Commands

- **Cause:** exec_command buffers stdout until completion
- **Fix:** Either:
  1. Accept final output when command completes
  2. Add progress markers that flush every N records
  3. Drain channel periodically: until `channel.exit_status_ready()` is true, poll `channel.recv_ready()` and call `channel.recv(1024)` only when data is waiting, so the read never blocks

### 438 Records Skipped During Backfill

- **Cause:** Plain-decimal format not matching scientific-notation-only regex
- **Fix:** Already patched (2026-04-12). Regex now tries both formats.
- **Verification:** Rerun backfill on stragglers → 438/438 rendered

## Recent Commit History

**2026-04-12 (commit 0dd3d82):** SCMVAS/SCMHVAS pipeline extension
- 114 files changed, 35,486 insertions
- 5 production files modified, 1 new parser
- All research scripts sanitized (vault-based credentials)
- .gitignore updated (exclude testdata.db)

## Useful Links

- **Latest Session:** session-logs/2026-04-12-session.md (DEFINITIVE)
- **Discovery Session:** session-logs/2026-04-11-discovery-session.md
- **Implementation Plan:** datasheet-pipeline/scmvas-hvas-research/IMPLEMENTATION_PLAN.md
- **Credentials (vault):** D:\vault\clients\dataforth\

## Quick Reference - Log Types

| Family | Log Type | Format | Parser | Location |
|--------|----------|--------|--------|----------|
| SCM5B | 5BLOG | Multiline CSV .DAT | multiline.js | TS-xx/LOGS/5BLOG |
| 8B | 8BLOG | Multiline CSV .DAT | multiline.js | TS-xx/LOGS/8BLOG |
| DSCA | DSCLOG | Multiline CSV .DAT | multiline.js | TS-xx/LOGS/DSCLOG |
| SCMVAS | VASLOG | Multiline CSV .DAT | vaslog.js | TS-3R/LOGS/VASLOG |
| SCMHVAS (prod) | VASLOG | Multiline CSV .DAT | vaslog.js | TS-3R/LOGS/VASLOG |
| SCMHVAS (eng) | VASLOG_ENG | .txt (pass-through) | vaslog.js | TS-3R/LOGS/VASLOG/VASLOG - Engineering Tested |

---

**Before starting work:** Read session-logs/2026-04-12-session.md for complete context

**For AD2 access:** Ensure Dataforth VPN connected (FortiClient)

**For credentials:** Always use vault - never hardcode passwords
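
The record arithmetic in the hvin.dat Binary Format section (199 bytes per record, 33 records in 6567 bytes) can be checked with a short Python sketch. This is not the project's `parse_hvin.py`; it is a minimal illustration built only from the `TYPE DBASE` layout quoted above. The little-endian byte order, the cp437 decoding, the `parse_hvin` function name, and the `values` key are assumptions made for the example, and the 42 SINGLE fields are left unnamed because their names are not listed in this file.

```python
import struct

# Field widths come from TYPE DBASE: 13 + 3 + 7 + 8 string bytes, then 42 SINGLEs.
# Assumption: "<" (little-endian, no padding), as written by QuickBASIC on x86 DOS.
RECORD = struct.Struct("<13s3s7s8s42f")


def parse_hvin(data: bytes) -> list[dict]:
    """Split raw hvin.dat bytes into fixed-width records (hypothetical helper)."""
    if len(data) % RECORD.size != 0:
        raise ValueError(
            f"{len(data)} bytes is not a multiple of the {RECORD.size}-byte record"
        )
    records = []
    for modname, intype, outsigtype, waveshpcal, *values in RECORD.iter_unpack(data):
        records.append({
            # Fixed-length QB strings are space-padded; cp437 is an assumed DOS code page.
            "modname": modname.decode("cp437").rstrip(),
            "intype": intype.decode("cp437").rstrip(),
            "outsigtype": outsigtype.decode("cp437").rstrip(),
            "waveshpcal": waveshpcal.decode("cp437").rstrip(),
            # 42 floats in file order; individual field names are not documented here.
            "values": values,
        })
    return records


if __name__ == "__main__":
    print(RECORD.size)                # 199
    print(divmod(6567, RECORD.size))  # (33, 0)
```

If the layout assumption holds, reading the real file with `parse_hvin(open(path, "rb").read())` should yield 33 records. A `ValueError` would suggest a header or padding that this sketch does not model.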