docs: Add Dataforth TestDataDB import fix instructions

Problem: Data syncs to AD2 but import.js not importing to database
- Newest DB record stuck at 2026-01-19 (2 months stale)
- rsync transfers working (9.8MB at 06:30 & 07:15)
- import.js not being called or failing silently

Instructions for PC to diagnose and fix.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-13 07:32:31 -07:00
parent 6e892bb674
commit 1b9973e150

@@ -0,0 +1,153 @@
# Dataforth TestDataDB Import Fix - 2026-03-13
## Problem Identified
Data is flowing correctly through most of the pipeline, but it is NOT being imported into the database:
| Stage | Status | Evidence |
|-------|--------|----------|
| DOS Machines → NAS | **[OK]** | Files from 2026-03-13 06:30 AM on D2TESTNAS |
| NAS → AD2 (rsync) | **[OK]** | 9.8MB transferred at 06:30 & 07:15 per rsync logs |
| AD2 → Database | **[BROKEN]** | Newest DB record: 2026-01-19 (2 months stale!) |
**Root Cause:** The `import.js` script is either not being called after the rsync transfer completes, or it is failing silently.
---
## Diagnostic Steps (Run on AD2 or PC with AD2 access)
### 1. Check the sync log for import activity
```powershell
Get-Content "C:\Shares\test\scripts\sync-from-nas.log" -Tail 100
```
Look for:
- Lines mentioning "import" or "Import-ToDatabase"
- Any error messages
- Recent timestamps
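For a long log, the same three checks can be scripted. The following is a minimal Node.js sketch, not part of the existing tooling: `scanSyncLog` is a hypothetical helper, and the keyword patterns are assumptions because the log format is not documented here.

```javascript
// Hypothetical helper: the log format is not documented here, so the
// keyword patterns below are assumptions to adjust after reading real lines.
function scanSyncLog(logText) {
  const lines = logText.split(/\r?\n/).filter((line) => line.trim() !== '');
  return {
    // Lines that mention the import step, in any casing
    importLines: lines.filter((line) => /import/i.test(line)),
    // Lines that look like failures
    errorLines: lines.filter((line) => /error|fail|exception/i.test(line)),
  };
}

// Made-up sample lines, for illustration only
const sample = [
  'PULL complete: 12 files',
  'Import-ToDatabase: starting',
  'ERROR: node not found',
].join('\r\n');

const result = scanSyncLog(sample);
console.log(result.importLines.length, result.errorLines.length);
```

To run it against the real log, read the file with `fs.readFileSync(path, 'utf8')`. If the log was written by Windows PowerShell's `Out-File` or `Tee-Object`, it may be UTF-16LE, in which case `'utf16le'` is the encoding to try. An empty `importLines` list would suggest the sync script never reaches the import step.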
### 2. Check if import is configured in sync script
```powershell
Select-String -Path "C:\Shares\test\scripts\Sync-FromNAS-rsync.ps1" -Pattern "import|Import-ToDatabase" -Context 2,2
```
The sync script should call `node import.js --file [files]` after syncing.
### 3. Verify TestDataDB service is running
```powershell
# Check if Node.js server is running
Get-Process node -ErrorAction SilentlyContinue
# Or check the API
Invoke-RestMethod -Uri "http://localhost:3000/api/stats"
```
### 4. Check database current state
```powershell
cd C:\Shares\testdatadb
node -e "const db=require('better-sqlite3')('database/testdata.db'); console.log(db.prepare('SELECT MAX(test_date) as latest_test, MAX(import_date) as latest_import, COUNT(*) as total FROM test_records').get())"
```
Expected output shows:
- `latest_test`: Should be recent (today or yesterday)
- `latest_import`: When last import ran
- `total`: Currently ~1.6 million records
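To turn `latest_test` into a number that can be compared against a threshold, a small helper along these lines may be useful. It is a sketch that assumes `test_date` is stored as an ISO-style `YYYY-MM-DD` string, which is not confirmed here; `daysStale` is a hypothetical name.

```javascript
// Assumption: test_date values are ISO-style 'YYYY-MM-DD' strings;
// the actual column format in testdata.db is not confirmed here.
function daysStale(latestTest, now = new Date()) {
  // Date-only ISO strings are parsed as UTC midnight
  const latest = Date.parse(latestTest);
  if (Number.isNaN(latest)) {
    throw new Error(`Unrecognized date: ${latestTest}`);
  }
  const msPerDay = 24 * 60 * 60 * 1000;
  return Math.floor((now.getTime() - latest) / msPerDay);
}

// The stale record from the table above, measured on the day of this investigation
console.log(daysStale('2026-01-19', new Date('2026-03-13T00:00:00Z'))); // 53
```

A result of 0 or 1 matches the "today or yesterday" expectation; anything larger means the import is still behind.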
### 5. Test manual import of a single file
```powershell
cd C:\Shares\testdatadb\database
# Pick a recent file from the sync
$recentFile = Get-ChildItem "C:\Shares\test\TS-3R\LOGS\DSCLOG\*.DAT" | Sort-Object LastWriteTime -Descending | Select-Object -First 1
# Run import
node import.js --file $recentFile.FullName
```
If this works, the import script is fine - the issue is in the sync script not calling it.
---
## Fix Options
### Option A: Fix the Sync Script to Call Import
Edit `C:\Shares\test\scripts\Sync-FromNAS-rsync.ps1` and ensure it calls import after syncing:
```powershell
# After the PULL phase completes, add:
if ($syncedFiles.Count -gt 0) {
    $importScript = "C:\Shares\testdatadb\database\import.js"
    # Pass the full script path so the call does not depend on the current directory
    $importArgs = @($importScript, "--file") + $syncedFiles
    & node $importArgs 2>&1 | Tee-Object -Append -FilePath $LOG_FILE
}
```
### Option B: Run a Catch-Up Import
If the sync script is working but imports were missed, run a full re-import:
```powershell
cd C:\Shares\testdatadb\database
# This will import ALL files (takes ~30 minutes for 1M+ records)
node import.js
```
Or import only recent files:
```powershell
cd C:\Shares\testdatadb\database
# Find files modified in last 7 days and import them
$recentFiles = Get-ChildItem "C:\Shares\test\TS-*\LOGS\*\*.DAT" -Recurse |
    Where-Object { $_.LastWriteTime -gt (Get-Date).AddDays(-7) }
foreach ($file in $recentFiles) {
    node import.js --file $file.FullName
}
```
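The loop above starts one `node` process per file. If `import.js` accepts several paths after `--file`, as step 2 and Option A suggest, importing in batches may reduce that overhead. The sketch below is one way to do it from Node.js. `chunk`, `buildImportArgs`, and `runBatches` are hypothetical helpers, the batch size of 50 is an arbitrary choice, and the code assumes `import.js` exits with a non-zero status on failure.

```javascript
const { spawnSync } = require('node:child_process');

// Split a list into fixed-size batches (the last batch may be shorter)
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Assumption: import.js accepts multiple file paths after --file
function buildImportArgs(files) {
  return ['import.js', '--file', ...files];
}

// Run one import per batch and stop at the first failure.
// Assumption: import.js signals failure with a non-zero exit status.
function runBatches(files, batchSize = 50) {
  for (const batch of chunk(files, batchSize)) {
    const result = spawnSync('node', buildImportArgs(batch), {
      cwd: 'C:\\Shares\\testdatadb\\database',
      stdio: 'inherit',
    });
    if (result.status !== 0) {
      throw new Error(`import.js exited with status ${result.status}`);
    }
  }
}

// Illustration with placeholder file names
console.log(chunk(['a.DAT', 'b.DAT', 'c.DAT', 'd.DAT', 'e.DAT'], 2));
```

Keeping batches small also keeps each command line well under the Windows limit of 32,767 characters. Stopping at the first failure makes a silent import error visible instead of letting later batches hide it.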
### Option C: Create Scheduled Task for Import
If sync and import should run separately:
```powershell
# Create a scheduled task to run import every hour
# Assumes node is on the SYSTEM account's PATH; otherwise give the full path to node.exe
$action = New-ScheduledTaskAction -Execute "node" -Argument "C:\Shares\testdatadb\database\import.js" -WorkingDirectory "C:\Shares\testdatadb\database"
$trigger = New-ScheduledTaskTrigger -Once -At (Get-Date) -RepetitionInterval (New-TimeSpan -Hours 1)
Register-ScheduledTask -TaskName "TestDataDB-Import" -Action $action -Trigger $trigger -User "SYSTEM"
```
---
## Verification
After fixing, verify the import is working:
```powershell
# Check API for updated stats
Invoke-RestMethod -Uri "http://localhost:3000/api/stats" | ConvertTo-Json
# The "newest" date should now be today or yesterday
# The "total_records" should have increased
```
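The two checks in the comments above can also be expressed as a small function, which is convenient if the verification is repeated after each sync. This is a sketch only: `evaluateStats` is a hypothetical helper, the `newest` and `total_records` field names are taken from the comments above, and `newest` is assumed to be an ISO-style date string.

```javascript
// Assumptions: the stats object has `newest` (ISO-style date string) and
// `total_records` (number); previousTotal is a count noted before the fix.
function evaluateStats(stats, previousTotal, now = new Date()) {
  const msPerDay = 24 * 60 * 60 * 1000;
  const ageDays = (now.getTime() - Date.parse(stats.newest)) / msPerDay;
  return {
    // "Today or yesterday": less than two days old; an unparseable date yields false
    fresh: ageDays < 2,
    // The record count should have increased after a successful import
    grew: stats.total_records > previousTotal,
  };
}

// Placeholder numbers, for illustration only
const check = evaluateStats(
  { newest: '2026-03-12', total_records: 1600500 },
  1600000,
  new Date('2026-03-13T15:00:00Z')
);
console.log(check);
```

Both flags should be `true` once the import is working. `fresh: false` with `grew: true` would suggest that a catch-up import ran but new files are still not being picked up.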
---
## Files Referenced
- **Sync Script:** `C:\Shares\test\scripts\Sync-FromNAS-rsync.ps1`
- **Sync Log:** `C:\Shares\test\scripts\sync-from-nas.log`
- **Import Script:** `C:\Shares\testdatadb\database\import.js`
- **Database:** `C:\Shares\testdatadb\database\testdata.db`
- **TestDataDB Server:** `C:\Shares\testdatadb\server.js`
---
## Context from Mac Investigation
- SSH to D2TESTNAS (192.168.0.9) confirmed data arriving from DOS machines
- TestDataDB API at http://192.168.0.6:3000/api/stats responds but shows stale data
- rsync daemon logs show successful transfers (9.8MB) to AD2
- The gap is between rsync completing and import.js being called