Session log: D2TESTNAS VM build, NAS migration, rsync sync fix
Built Debian 13 VM replacement for aging ReadyNAS, deployed rsync-based sync script to AD2, transferred data, completed IP cutover to 192.168.0.9. Includes setup scripts, sync fixes, and comprehensive session logs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@@ -0,0 +1,61 @@
# TestDataDB Investigation - Missing Recent Records
**Date:** 2026-03-11
**Status:** BLOCKED - VPN down, need parser source code from AD2

## Problem
The newest `test_date` in the test database on AD2 is 2026-01-19, even though tests run every weekday. After a full re-import (1,028,275 -> 1,632,793 records), the max date did not change.

## Key Evidence
1. DAT files with TODAY's timestamps (2026-03-11 13:29-13:30) exist on AD2 at `C:\Shares\test\TS-1L\LOGS\5BLOG\`
2. These files were processed by the full import (604,518 new records added)
3. But `MAX(test_date)` is still 2026-01-19

## Sample DAT File Content (33-06D.DAT from 5BLOG)
```
"SCM5B33-06D "
9.528068E-02,.9528068,.9696456,.1683873,"PASS"
.3142081,3.142081,3.155015,.1293349,"PASS"
.5374944,5.374944,5.380653,5.708694E-02,"PASS"
.7651215,7.651215,7.651233,1.811981E-04,"PASS"
.997809,9.97809,9.967015,-.1107502,"PASS"
"PASS 92.941","","PASS .16838733","PASS 2.045663E-023","PASS-2.213427E-023"
"","PASS .29938753","","PASS .79032473","PASS .05982453"
"","PASS 117.35150","","PASS-.1008325","PASS 91.444891"
"PASS 1.024068E-023",""
```

**No date anywhere in the content or filename.**

## Root Cause Hypothesis
The `multiline.js` parser at `C:\Shares\testdatadb\parsers\multiline.js` determines `test_date` from a source that has not been identified yet. Possibilities:
1. **File modification time (mtime)** - most likely, since there's no date in the content
2. **A date field elsewhere in larger files** - maybe we only saw a partial file
3. **A hardcoded date or fallback** - the parser might have a bug

If the parser uses mtime, the question is whether mtime is preserved when:
- DOS machines XCOPY files to NAS
- Sync-FromNAS.ps1 SCPs files from NAS to AD2
- SCP runs with the `-O` flag: `-O` only selects the legacy SCP protocol, and timestamps are preserved only when `-p` is also passed, so the script's SCP arguments need checking
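
The mtime question above can be tested directly on any copy step. The following is a minimal sketch, assuming a shell with GNU coreutils (`stat -c`, `touch -d`); the helper names `mtime_epoch` and `same_mtime` are hypothetical and are not part of the sync scripts. It demonstrates the check on throwaway files, where a plain `cp` resets mtime and `cp -p` keeps it.

```bash
#!/usr/bin/env bash
# Sketch: check whether a copy step preserved a file's modification time.
# Assumes GNU coreutils stat (-c %Y prints mtime as seconds since the epoch).

# mtime_epoch FILE -> prints the file's mtime in epoch seconds
mtime_epoch() {
    stat -c %Y -- "$1"
}

# same_mtime SRC DST -> exit status 0 if both files have the same mtime
same_mtime() {
    [ "$(mtime_epoch "$1")" = "$(mtime_epoch "$2")" ]
}

# Demonstration on throwaway files (hypothetical names, not real station data).
demo_dir=$(mktemp -d)
touch -d '2026-01-19 12:00:00' "$demo_dir/src.dat"
cp    "$demo_dir/src.dat" "$demo_dir/plain.dat"      # mtime becomes "now"
cp -p "$demo_dir/src.dat" "$demo_dir/preserved.dat"  # mtime copied from source

same_mtime "$demo_dir/src.dat" "$demo_dir/preserved.dat" && echo "cp -p: mtime preserved"
same_mtime "$demo_dir/src.dat" "$demo_dir/plain.dat"     || echo "cp: mtime reset"
```

The same comparison could be run against a station file on the NAS and its copy once both are reachable; a mismatch at any hop would support the mtime hypothesis.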

## HISTLOGS vs Station Files
HISTLOGS files at `C:\Shares\test\Ate\HISTLOGS\` are the authoritative consolidated source. These may have a DIFFERENT format than the per-station DAT files. The initial import (1,030,940 records) came mostly from HISTLOGS (576K) and Recovery-TEST (454K), with only 59 from live station data.

The 604K new records from the re-import might all be from HISTLOGS/Recovery with dates up to Jan 19, while the per-station files might be producing 0 records or records with the same old dates.

## Next Steps (when VPN reconnects)
1. **READ THE PARSER:** `ssh sysadmin@AD2 "type C:\Shares\testdatadb\parsers\multiline.js"`
2. **Check a specific record:** Query DB for records from `33-06D.DAT` source file to see what test_date was assigned
3. **Check import logs:** `ssh sysadmin@AD2 "type C:\Shares\test\scripts\sync-from-nas.log"` for any import errors
4. **Verify HISTLOGS content:** Check if HISTLOGS files have different format than station files

## Deployed Fixes This Session
- Sync-FromNAS.ps1: Get-NASFileList fix (stdout deadlock), 8.3 filename filtering, SCP path escaping
- import.js: Changed INSERT OR IGNORE to INSERT OR REPLACE
- Both deployed to AD2 at C:\Shares\test\scripts\ and C:\Shares\testdatadb\database\
- Commit dd4086d (local only, not pushed - Gitea unreachable)

## Session Context
- Worked on from Windows machine (ACG-M-L5090)
- VPN went down during investigation
- Previous session summary in conversation compaction
- User said "continue to work this problem - we need to find those records"
243
projects/dataforth-dos/session-logs/2026-03-12-session.md
Normal file
@@ -0,0 +1,243 @@
# Session Log: 2026-03-12 - D2TESTNAS VM Build, NAS Migration, Rsync Sync Fix

## Session Summary

Major infrastructure session: replaced broken SCP-based sync with rsync, built a new Debian 13 VM to replace the aging ReadyNAS, transferred data, and performed IP cutover. Also investigated BTRFS snapshots on old NAS and began DOS machine testing against new Linux-based NAS.

### Key Accomplishments
1. **Fixed Sync-FromNAS.ps1 on AD2** - Replaced broken SCP with rsync daemon protocol, added guards for stray files (TS-21, TS-3R/HVLOG), added log file write retry for AV locking
2. **Disabled old SCP scheduled tasks on AD2** - Killed Sync-FromNAS and BulkSync-Catchup tasks
3. **Built D2TESTNAS replacement VM** on DF-HYPERV-B (Debian 13, Samba SMB1, rsync daemon, BTRFS 512GB data disk)
4. **Transferred data from old NAS** - test/ data (~24GB+), datasheets, home, 82 snapshots (partial ~43GB logical)
5. **IP cutover completed** - New VM now at 192.168.0.9, old NAS on DHCP at 192.168.0.117
6. **WINS/NetBIOS conflict resolved** - Killed nmbd on old NAS, removed auto-restart cron, blocked ports 137/138 via iptables

### Key Decisions
- Chose Hyper-V VM on DF-HYPERV-B over repurposing physical server DF-SVR-D2-SYNC
- Used BTRFS for data disk with subvolumes for test and datasheets
- Single rsync stream to avoid overloading old NAS (ARM processor)
- BTRFS snapshots from old NAS are being flattened (CoW -> full copies) which makes them much larger than ReadyNAS UI reported

### Problems Encountered and Solutions
- **TS-21 stray file**: 1,129-byte DAT file from 2012 existed instead of directory. Renamed, added script guard.
- **TS-3R/LOGS/HVLOG stray file**: 56-byte file from 2013. Same fix.
- **Log file locking**: AV locking sync-from-nas.log. Added 3-retry with 100ms delay.
- **AD2 high latency**: AV causing 685-1056ms ping. Recommended exclusions.
- **NAS freezing under SSH load**: Power cycled, limited to single rsync stream.
- **nmbd auto-restart on old NAS**: Cron `*/5 * * * * pgrep -x nmbd || /usr/sbin/nmbd -D`. Removed cron, blocked ports via iptables.
- **nmcli config didn't save first attempt**: SSH dropped before apply. Re-ran successfully.
- **DOS Error 53 (network path not found)**: Old NAS still broadcasting D2TESTNAS name. Fixed by killing nmbd and blocking NetBIOS ports.
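
The log-locking fix above (3 attempts, 100ms apart) lives in the PowerShell `Write-Log` function, whose source is not shown in this log. As an illustration of the same retry pattern only, here is a bash analogue; the function name `append_with_retry` and its argument order are hypothetical, and fractional `sleep` assumes GNU coreutils.

```bash
#!/usr/bin/env bash
# Sketch of a retrying log append: try the write, wait briefly, try again.
# This is a bash analogue of the pattern, not the deployed PowerShell code.

# append_with_retry FILE LINE [ATTEMPTS] [DELAY_SECONDS]
# Returns 0 once the line has been appended, 1 if every attempt failed.
append_with_retry() {
    local file=$1 line=$2 attempts=${3:-3} delay=${4:-0.1}
    local i
    for ((i = 1; i <= attempts; i++)); do
        # The redirection fails if the file cannot be opened for writing;
        # the braces let us silence that error and fall through to a retry.
        if { printf '%s\n' "$line" >> "$file"; } 2>/dev/null; then
            return 0
        fi
        # Wait before the next attempt, but not after the final one.
        if ((i < attempts)); then
            sleep "$delay"
        fi
    done
    return 1
}
```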

---

## Credentials

### New D2TESTNAS VM (Debian 13)
- **IP**: 192.168.0.9 (static via NetworkManager)
- **SSH**: root / Paper123!@# (also localadmin / Paper123!@#)
- **SSH Key**: ed25519 generated on VM, public key installed on old NAS
- **Key fingerprint**: SHA256:S2Eom4RwHS/8YMu+ePnOmDOJxGhIkxJQ2ocR3WsH24o root@D2TESTNAS

### Rsync Daemon (new VM)
- **Port**: 873
- **Module**: test = /data/test
- **User**: rsync
- **Password**: IQ203s32119
- **Config**: /etc/rsyncd.conf
- **Secrets**: /etc/rsyncd.secrets

### Samba (new VM)
- **Shares**: test (/data/test), datasheets (/data/datasheets), snapshots (/data/test/.snapshots)
- **Protocol**: SMB1 (CORE) through SMB3
- **Auth**: Guest OK on all shares
- **Workgroup**: D2TESTING
- **NetBIOS name**: D2TESTNAS
- **WINS support**: yes

### Old NAS (ReadyNAS)
- **Current IP**: 192.168.0.117 (DHCP, was 192.168.0.9)
- **MAC**: 28:C6:8E:34:4B:5E
- **SSH**: root (key-based auth from new VM)
- **Status**: nmbd killed, cron cleared, NetBIOS ports blocked via iptables. Samba stopped. SSH still works for rsync transfers.

### AD2 (Windows Server)
- **IP**: 192.168.0.6
- **Sync script**: C:\Scripts\Sync-FromNAS-rsync.ps1 (deployed, dry-run validated)
- **Test data path**: C:\Shares\test\
- **cwRsync**: Installed via Chocolatey

### DF-SVR-D2-SYNC (unused physical server)
- **IP**: 192.168.0.93
- **Creds**: sysadmin / Paper123!@#
- **HP ProLiant ML350 G6, 64GB RAM, Server 2019**
- **SMB share**: NAS-BACKUP (was used temporarily for CIFS backup attempt)
- **SMB1 enabled** on this server

### UDM Network
- **WINS server**: 192.168.0.9 (configured in UDM DHCP option 44)

---

## Infrastructure

### New D2TESTNAS VM Configuration
- **Host**: DF-HYPERV-B (dedicated Hyper-V host)
- **OS**: Debian 13 (Trixie)
- **Network**: eth0, static 192.168.0.9/24, gateway 192.168.0.1
- **Disks**:
  - /dev/sda: OS disk
  - /dev/sdb: 512GB BTRFS data disk mounted at /data
- **BTRFS subvolumes**: test, datasheets (under /data)
- **Services**: smbd, nmbd, rsync (daemon), sshd, cron
- **Snapshot cron**:
  ```
  0 * * * * /usr/local/bin/btrfs-snapshot.sh test 48
  0 * * * * /usr/local/bin/btrfs-snapshot.sh datasheets 48
  0 0 * * * /usr/local/bin/btrfs-snapshot.sh test 30
  0 0 * * 0 /usr/local/bin/btrfs-snapshot.sh test 12
  ```
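
The contents of `btrfs-snapshot.sh` are not recorded in this log. Reading the cron arguments as a subvolume name followed by a retention count is an assumption, and under that assumption the pruning half of such a script might look like the sketch below. The function name `prune_snapshots` is hypothetical, plain directories stand in for BTRFS subvolumes so the example runs anywhere, and snapshot names are assumed to be timestamps that sort chronologically. It needs bash 4+ (`mapfile`) and GNU `find`.

```bash
#!/usr/bin/env bash
# Sketch of count-based snapshot retention: keep the newest KEEP entries.
# Plain directories stand in for BTRFS subvolumes in this illustration.

# prune_snapshots DIR KEEP -> deletes the oldest entries beyond KEEP
prune_snapshots() {
    local dir=$1 keep=$2
    local -a snaps
    # Timestamp-style names are assumed, so lexical order equals age order.
    mapfile -t snaps < <(find "$dir" -mindepth 1 -maxdepth 1 -type d -printf '%f\n' | sort)
    local excess=$(( ${#snaps[@]} - keep ))
    local i
    for ((i = 0; i < excess; i++)); do
        # On a real BTRFS volume this would be:
        #   btrfs subvolume delete "$dir/${snaps[i]}"
        rm -rf -- "${dir:?}/${snaps[i]}"
    done
}
```

With fewer than `KEEP` entries present, `excess` is negative and nothing is deleted.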

### Key Config Files on New VM
- `/etc/samba/smb.conf` - Samba config (SMB1/CORE, DOS charset CP437, WINS)
- `/etc/rsyncd.conf` - rsync daemon (module "test")
- `/etc/rsyncd.secrets` - rsync auth (rsync:IQ203s32119)
- `/usr/local/bin/btrfs-snapshot.sh` - BTRFS snapshot script

### Data Transfer Status (as of ~18:30)
- **test/ data (excl snapshots)**: ~24 GB transferred, rsync still running (single stream from .117)
- **test/ snapshots**: ~43 GB logical transferred (82 snapshots), transfer was stopped to reduce NAS load - needs restart
- **datasheets/ + snapshots**: Complete (2.3 MB + 82 snapshot dirs)
- **home/**: Complete (612 KB)
- **Disk usage**: ~26 GB actual on BTRFS (CoW dedup), 486 GB free
- **Note**: ReadyNAS UI reported 5.26GB data + 16.28GB snapshots, but actual rsync transfer is MUCH larger due to BTRFS CoW flattening

---

## Files Created/Modified

### New Files
- `D:\ClaudeTools\projects\dataforth-dos\sync-fixes\Sync-FromNAS-rsync.ps1` - Complete rsync-based replacement sync script (deployed to AD2)
- `D:\ClaudeTools\projects\dataforth-dos\d2testnas-vm\setup-d2testnas.sh` - 522-line post-install setup script
- `D:\ClaudeTools\projects\dataforth-dos\d2testnas-vm\README.md` - Hyper-V creation commands, Debian install notes, cutover checklist

### Script Fixes Applied (Sync-FromNAS-rsync.ps1)
1. **Directory-only filter** for NAS station enumeration (line ~125)
2. **Station path guard** - detects stray files where directories expected
3. **Log type directory guard** - renames stray files in LOGS subdirs
4. **Write-Log retry** - 3 attempts with 100ms delay for AV file locking
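
The two directory guards above share one idea: if a path that should be a directory turns out to be a regular file, move the file aside and create the directory. The deployed guards are in PowerShell and are not reproduced in this log, so the following bash sketch only illustrates the pattern; the function name `ensure_directory` and the `.stray-<timestamp>` suffix are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of a "stray file where a directory is expected" guard.

# ensure_directory PATH
# If PATH exists but is not a directory, rename it, then create the directory.
ensure_directory() {
    local path=$1
    if [ -e "$path" ] && [ ! -d "$path" ]; then
        local aside
        aside="${path}.stray-$(date +%Y%m%d%H%M%S)"
        mv -- "$path" "$aside"
        echo "renamed stray file: $path -> $aside" >&2
    fi
    mkdir -p -- "$path"
}
```

Renaming rather than deleting keeps the stray file (such as the 1,129-byte TS-21 DAT file) available for later inspection.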

### Deployed to New VM (via SSH)
- /etc/samba/smb.conf (full Samba config)
- /etc/rsyncd.conf + /etc/rsyncd.secrets
- /usr/local/bin/btrfs-snapshot.sh + cron entries
- SSH key pair generated, public key added to old NAS

---

## Pending/Incomplete Tasks

### Immediate (resume next session)
1. **Monitor test/ data rsync** - Single stream running from old NAS (.117) to new VM (.9). Check with:
   ```bash
   ssh root@192.168.0.9 "ps aux | grep 'rsync -av' | grep -v grep; du -sh /data/test/ --exclude=.snapshots"
   ```
2. **Restart snapshot transfer** after data transfer completes:
   ```bash
   # Output is redirected so the ssh session returns instead of waiting on rsync.
   # The log path is an example; pick any writable location on the new VM.
   ssh root@192.168.0.9 "nohup rsync -av root@192.168.0.117:/data/test/.snapshots/ /data/test/.snapshots/ > /root/snapshot-rsync.log 2>&1 < /dev/null &"
   ```
3. **Test DOS machine connectivity** - Error 53 was resolved (old NAS NetBIOS killed). Need to reboot DOS machine and test:
   - `NET USE T: \\D2TESTNAS\TEST`
   - Run CTONW.BAT (copy logs to NAS)
   - Run NWTOC.BAT (download updates from NAS)
   - Verify files appear in /data/test/TS-XX/LOGS/ on new VM

### After Data Transfer Complete
4. **Verify data integrity** - Compare file counts/sizes between old and new NAS
5. **Power off old NAS** once all data confirmed transferred
6. **Set up scheduled task on AD2** - Create 15-minute scheduled task for Sync-FromNAS-rsync.ps1
7. **Run real (non-dry) sync on AD2** - Execute Sync-FromNAS-rsync.ps1 without -DryRun flag
8. **AV exclusions on AD2** - Add exclusions for C:\Shares\test\ and rsync.exe
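
For the integrity check in step 4, one low-cost approach is to compute a file count and total byte size on each side and compare the two results. The sketch below assumes GNU `find` (for `-printf`), which Debian provides; whether the old ReadyNAS ships a compatible `find` has not been verified. The function name `tree_summary` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: summarize a directory tree as "<file count> <total bytes>".

# tree_summary DIR
# Counts regular files under DIR and sums their sizes, skipping any
# directory named .snapshots so snapshot copies do not inflate the totals.
tree_summary() {
    find "$1" -name .snapshots -prune -o -type f -printf '%s\n' |
        awk '{ n++; bytes += $1 } END { printf "%d %.0f\n", n, bytes }'
}

# Possible usage once both hosts are reachable (not run here):
#   new=$(ssh root@192.168.0.9   "$(declare -f tree_summary); tree_summary /data/test")
#   old=$(ssh root@192.168.0.117 "$(declare -f tree_summary); tree_summary /data/test")
#   [ "$new" = "$old" ] && echo "counts and sizes match"
```

Matching counts and sizes do not prove the contents are identical; an `rsync` dry run with `--checksum` would be a stricter follow-up at the cost of more load on the old NAS.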

### Nice to Have
9. **Copy NAS config backup to new VM** (already backed up to DF-SVR-D2-SYNC)
10. **Datto Workplace SmartBadge research** - Researched that SmartBadge add-in for Excel doesn't exist; Workplace integrates via sync client and web, not Excel plugin

---

## DOS Machine Data Flow

```
DOS 6.22 (C:\ATE\) --COPY--> T:\MACHINE\LOGS\ (NAS via SMB1)
                                  |
                                  v (rsync daemon, port 873)
                             AD2 C:\Shares\test\
                                  |
                                  v (future: database ingestion)
                             MariaDB @ 172.16.3.30
```

### Batch Files (DOS -> NAS)
- **CTONW.BAT v3.2** - Uses COPY (not XCOPY) to upload log files from C:\ATE\ to T:\MACHINE\LOGS\
- **NWTOC.BAT v3.5** - Uses COPY to download updates from T:\COMMON\ProdSW\ to C:\BAT\ and C:\ATE\
- **UPDATE.BAT v2.1** - Uses XCOPY for full machine backup (had /D flag fix for DOS 6.22)

### Log Types
5BLOG, 7BLOG, 8BLOG, DSCLOG, SCTLOG, VASLOG, PWRLOG, HVLOG

### Active Stations
TS-3L (most recent activity), TS-4R, TS-3R, TS-11L, TS-GURU, plus many others

---

## Reference

### Key Commands
```bash
# SSH to new D2TESTNAS
ssh root@192.168.0.9

# SSH to old NAS (DHCP)
ssh root@192.168.0.117

# Check rsync transfers on new VM
ssh root@192.168.0.9 "ps aux | grep rsync | grep -v grep"

# Test Samba (net view runs from a Windows prompt, smbclient from Linux)
net view \\192.168.0.9
smbclient -L //192.168.0.9 -N

# Test rsync daemon
rsync rsync://rsync@192.168.0.9/test/

# Restart services on new VM
ssh root@192.168.0.9 "systemctl restart smbd nmbd rsync"

# BTRFS snapshot status
ssh root@192.168.0.9 "ls /data/test/.snapshots/"
```

### Old NAS Lockdown Commands (already applied)
```bash
# Block NetBIOS (prevents name conflict)
ssh root@192.168.0.117 "iptables -A INPUT -p udp --dport 137 -j DROP; iptables -A INPUT -p udp --dport 138 -j DROP; iptables -A OUTPUT -p udp --sport 137 -j DROP; iptables -A OUTPUT -p udp --sport 138 -j DROP"

# Remove auto-restart cron (crontab -r deletes root's entire crontab)
ssh root@192.168.0.117 "crontab -r"
```

---

## Session Timeline
- Started: ~14:00 (context recovery from previous session)
- Rsync script fixes and deployment to AD2
- Disabled old SCP scheduled tasks
- Investigated BTRFS snapshots (81 found)
- Built D2TESTNAS VM on DF-HYPERV-B (Debian 13)
- Configured all services (Samba, rsync, BTRFS, SSH)
- Started data transfer from old NAS
- Killed snapshot transfer to reduce NAS load (single stream)
- IP cutover: new VM .185 -> .9, old NAS .9 -> DHCP .117
- Resolved WINS conflict (killed old NAS nmbd, removed cron, blocked ports)
- DOS machine testing started - Error 53 resolved
- Data transfer ongoing (~24GB+ transferred, snapshots pending restart)
- Session saved: ~18:45