azcomputerguru 99e83a2848 sync: Auto-sync from Mikes-MacBook-Air.local at 2026-03-13 06:39:13
Synced files:
- Session logs updated
- Latest context and credentials
- Command/directive updates

Machine: Mikes-MacBook-Air.local
Timestamp: 2026-03-13 06:39:13

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-13 06:39:13 -07:00

# Session Log: 2026-03-12 - D2TESTNAS VM Build, NAS Migration, Rsync Sync Fix
## Session Summary
Major infrastructure session: replaced broken SCP-based sync with rsync, built a new Debian 13 VM to replace the aging ReadyNAS, transferred data, and performed IP cutover. Also investigated BTRFS snapshots on the old NAS and began testing DOS machines against the new Linux-based NAS.
### Key Accomplishments
1. **Fixed Sync-FromNAS.ps1 on AD2** - Replaced broken SCP with rsync daemon protocol, added guards for stray files (TS-21, TS-3R/HVLOG), added log file write retry for AV locking
2. **Disabled old SCP scheduled tasks on AD2** - Killed Sync-FromNAS and BulkSync-Catchup tasks
3. **Built D2TESTNAS replacement VM** on DF-HYPERV-B (Debian 13, Samba SMB1, rsync daemon, BTRFS 512GB data disk)
4. **Transferred data from old NAS** - test/ data (~24GB+), datasheets, home, 82 snapshots (partial ~43GB logical)
5. **IP cutover completed** - New VM now at 192.168.0.9, old NAS on DHCP at 192.168.0.117
6. **WINS/NetBIOS conflict resolved** - Killed nmbd on old NAS, removed auto-restart cron, blocked ports 137/138 via iptables
### Key Decisions
- Chose Hyper-V VM on DF-HYPERV-B over repurposing physical server DF-SVR-D2-SYNC
- Used BTRFS for data disk with subvolumes for test and datasheets
- Single rsync stream to avoid overloading old NAS (ARM processor)
- BTRFS snapshots copied from the old NAS are being flattened (CoW-shared extents become full copies), which makes them much larger than the ReadyNAS UI reported
### Problems Encountered and Solutions
- **TS-21 stray file**: 1,129-byte DAT file from 2012 existed instead of directory. Renamed, added script guard.
- **TS-3R/LOGS/HVLOG stray file**: 56-byte file from 2013. Same fix.
- **Log file locking**: AV locking sync-from-nas.log. Added 3-retry with 100ms delay.
- **AD2 high latency**: AV causing 685-1056ms ping. Recommended exclusions.
- **NAS freezing under SSH load**: Power cycled, limited to single rsync stream.
- **nmbd auto-restart on old NAS**: Cron `*/5 * * * * pgrep -x nmbd || /usr/sbin/nmbd -D`. Removed cron, blocked ports via iptables.
- **nmcli config didn't save on the first attempt**: SSH dropped before the change was applied. Re-ran successfully.
- **DOS Error 53 (network path not found)**: Old NAS still broadcasting D2TESTNAS name. Fixed by killing nmbd and blocking NetBIOS ports.
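The Write-Log fix itself is PowerShell in Sync-FromNAS-rsync.ps1 on AD2 and is not reproduced here. The same retry pattern, sketched in bash for reference (the helper name `append_log_with_retry` is hypothetical; the 3 attempts and 100 ms delay mirror the values above), looks like this:

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the Write-Log retry pattern: try to append a line up
# to 3 times, sleeping 100 ms between attempts, so a brief AV lock on the log
# file does not lose the entry. Not the deployed PowerShell code.
append_log_with_retry() {
  local logfile=$1 line=$2 attempt
  for attempt in 1 2 3; do
    # The redirection fails (non-zero status) if the file cannot be opened.
    if { printf '%s\n' "$line" >> "$logfile"; } 2>/dev/null; then
      return 0
    fi
    # Wait before the next attempt; no wait after the last one.
    [ "$attempt" -lt 3 ] && sleep 0.1
  done
  echo "append_log_with_retry: giving up on $logfile" >&2
  return 1
}
```

The function returns non-zero only after all three attempts fail, so the caller can decide whether a lost log line is fatal.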
---
## Credentials
### New D2TESTNAS VM (Debian 13)
- **IP**: 192.168.0.9 (static via NetworkManager)
- **SSH**: root / Paper123!@# (also localadmin / Paper123!@#)
- **SSH Key**: ed25519 generated on VM, public key installed on old NAS
- **Key fingerprint**: SHA256:S2Eom4RwHS/8YMu+ePnOmDOJxGhIkxJQ2ocR3WsH24o root@D2TESTNAS
### Mac SSH Key (add to AD2 and D2TESTNAS)
```
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIDrGbr4EwvQ4P3ZtyZW3ZKkuDQOMbqyAQUul2+JE4K4S azcomputerguru@local
```
**Add to:**
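- AD2: `C:\Users\sysadmin\.ssh\authorized_keys` (if sysadmin is in the Administrators group, the default Windows OpenSSH `sshd_config` reads `C:\ProgramData\ssh\administrators_authorized_keys` instead)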
- D2TESTNAS: `/root/.ssh/authorized_keys`
### Rsync Daemon (new VM)
- **Port**: 873
- **Module**: test = /data/test
- **User**: rsync
- **Password**: IQ203s32119
- **Config**: /etc/rsyncd.conf
- **Secrets**: /etc/rsyncd.secrets
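The deployed `/etc/rsyncd.conf` is not reproduced in this log. A minimal config consistent with the values above can be sketched as a bash function that prints it; `read only = yes` is an assumption (AD2 only pulls via Sync-FromNAS) and should be flipped if anything pushes to the NAS over rsync. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: print an rsyncd.conf matching the settings listed above.
# Assumptions: read-only module, default chroot, no uid/gid overrides.
write_rsyncd_conf() {
  cat <<'EOF'
# Default daemon port, stated explicitly
port = 873

[test]
    path = /data/test
    # Assumption: AD2 only pulls from this module
    read only = yes
    auth users = rsync
    secrets file = /etc/rsyncd.secrets
EOF
}
```

The secrets file holds one `user:password` line per user and, with `strict modes` at its default, must not be readable by other users (for example `chmod 600 /etc/rsyncd.secrets`). On the client side, `rsync --password-file=FILE rsync://rsync@192.168.0.9/test/` avoids the interactive password prompt; that file must likewise not be readable by others.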
### Samba (new VM)
- **Shares**: test (/data/test), datasheets (/data/datasheets), snapshots (/data/test/.snapshots)
- **Protocol**: SMB1 (CORE) through SMB3
- **Auth**: Guest OK on all shares
- **Workgroup**: D2TESTING
- **NetBIOS name**: D2TESTNAS
- **WINS support**: yes
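The actual `/etc/samba/smb.conf` is not reproduced here either. A sketch assembled only from settings recorded in this log (workgroup, NetBIOS name, WINS, CORE through SMB3, CP437, guest shares, and the case settings confirmed later) might look like the following. `map to guest = Bad User` and every `read only` value are assumptions, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: print an smb.conf built from settings noted in this log.
# Assumptions: guest mapping via "map to guest", read only values per share.
write_smb_conf() {
  cat <<'EOF'
[global]
    workgroup = D2TESTING
    netbios name = D2TESTNAS
    wins support = yes
    # Accept DOS 6.22 clients: everything from CORE (SMB1) up to SMB3
    server min protocol = CORE
    server max protocol = SMB3
    dos charset = CP437
    map to guest = Bad User
    # Share defaults set globally: case-insensitive, new names upper-case
    case sensitive = no
    default case = upper

# read only values below are assumptions
[test]
    path = /data/test
    guest ok = yes
    read only = no

[datasheets]
    path = /data/datasheets
    guest ok = yes
    read only = no

[snapshots]
    path = /data/test/.snapshots
    guest ok = yes
    read only = yes
EOF
}
```

Before comparing it with the live file, the sketch can be checked with `write_smb_conf > /tmp/smb.conf.sketch && testparm -s /tmp/smb.conf.sketch`, which reports any parameter the installed Samba version does not accept.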
### Old NAS (ReadyNAS)
- **Current IP**: 192.168.0.117 (DHCP, was 192.168.0.9)
- **MAC**: 28:C6:8E:34:4B:5E
- **SSH**: root (key-based auth from new VM)
- **Status**: nmbd killed, cron cleared, NetBIOS ports blocked via iptables. Samba stopped. SSH still works for rsync transfers.
### AD2 (Windows Server)
- **IP**: 192.168.0.6
- **Sync script**: C:\Scripts\Sync-FromNAS-rsync.ps1 (deployed, dry-run validated)
- **Test data path**: C:\Shares\test\
- **cwRsync**: Installed via Chocolatey
### DF-SVR-D2-SYNC (unused physical server)
- **IP**: 192.168.0.93
- **Creds**: sysadmin / Paper123!@#
- **HP ProLiant ML350 G6, 64GB RAM, Server 2019**
- **SMB share**: NAS-BACKUP (was used temporarily for CIFS backup attempt)
- **SMB1 enabled** on this server
### UDM Network
- **WINS server**: 192.168.0.9 (configured in UDM DHCP option 44)
---
## Infrastructure
### New D2TESTNAS VM Configuration
- **Host**: DF-HYPERV-B (dedicated Hyper-V host)
- **OS**: Debian 13 (Trixie)
- **Network**: eth0, static 192.168.0.9/24, gateway 192.168.0.1
- **Disks**:
  - /dev/sda: OS disk
  - /dev/sdb: 512GB BTRFS data disk mounted at /data
- **BTRFS subvolumes**: test, datasheets (under /data)
- **Services**: smbd, nmbd, rsync (daemon), sshd, cron
- **Snapshot cron**:
```
0 * * * * /usr/local/bin/btrfs-snapshot.sh test 48
0 * * * * /usr/local/bin/btrfs-snapshot.sh datasheets 48
0 0 * * * /usr/local/bin/btrfs-snapshot.sh test 30
0 0 * * 0 /usr/local/bin/btrfs-snapshot.sh test 12
```
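The contents of `btrfs-snapshot.sh` are not recorded here. Reading its two arguments as *subvolume* and *number of snapshots to keep* (an assumption), a minimal sketch follows. The function names, the `/data/<subvol>/.snapshots` location for every subvolume, and the `<tier>-YYYYMMDD-HHMM` naming are all hypothetical. The cron lines pass no tier, so if the real script prunes by count alone, the hourly, daily and weekly `test` entries share one pool (the daily `30` would trim hourly snapshots at midnight); the optional third argument here keeps those pools apart.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a snapshot-and-prune script; not the deployed
# /usr/local/bin/btrfs-snapshot.sh. Usage: take_snapshot SUBVOL KEEP [TIER]

# Read snapshot names on stdin and print the ones to delete: every name
# matching TIER-YYYYMMDD-HHMM except the newest KEEP (names sort by time).
snapshots_to_prune() {
  local keep=$1 tier=$2
  grep -E "^${tier}-[0-9]{8}-[0-9]{4}$" | sort -r | tail -n +"$((keep + 1))"
}

# Create a read-only snapshot of /data/SUBVOL, then prune that tier only.
# Names outside the pattern (e.g. directories rsynced from the old NAS)
# are never touched.
take_snapshot() {
  local subvol=$1 keep=$2 tier=${3:-hourly}
  local snapdir="/data/${subvol}/.snapshots"
  local name
  name="${tier}-$(date +%Y%m%d-%H%M)"
  mkdir -p "$snapdir"
  btrfs subvolume snapshot -r "/data/${subvol}" "${snapdir}/${name}" || return 1
  ls -1 "$snapdir" | snapshots_to_prune "$keep" "$tier" | while read -r old; do
    btrfs subvolume delete "${snapdir}/${old}"
  done
}
```

With a tier argument, the daily entry would become something like `0 0 * * * /usr/local/bin/btrfs-snapshot.sh test 30 daily`, which also avoids a name collision when the hourly and daily jobs fire in the same minute.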
### Key Config Files on New VM
- `/etc/samba/smb.conf` - Samba config (SMB1/CORE, DOS charset CP437, WINS)
- `/etc/rsyncd.conf` - rsync daemon (module "test")
- `/etc/rsyncd.secrets` - rsync auth (rsync:IQ203s32119)
- `/usr/local/bin/btrfs-snapshot.sh` - BTRFS snapshot script
### Data Transfer Status (as of ~18:30)
- **test/ data (excl snapshots)**: ~24 GB transferred, rsync still running (single stream from .117)
- **test/ snapshots**: ~43 GB logical transferred (82 snapshots), transfer was stopped to reduce NAS load - needs restart
- **datasheets/ + snapshots**: Complete (2.3 MB + 82 snapshot dirs)
- **home/**: Complete (612 KB)
- **Disk usage**: ~26 GB actual on BTRFS (CoW dedup), 486 GB free
- **Note**: ReadyNAS UI reported 5.26GB data + 16.28GB snapshots, but actual rsync transfer is MUCH larger due to BTRFS CoW flattening
---
## Files Created/Modified
### New Files
- `D:\ClaudeTools\projects\dataforth-dos\sync-fixes\Sync-FromNAS-rsync.ps1` - Complete rsync-based replacement sync script (deployed to AD2)
- `D:\ClaudeTools\projects\dataforth-dos\d2testnas-vm\setup-d2testnas.sh` - 522-line post-install setup script
- `D:\ClaudeTools\projects\dataforth-dos\d2testnas-vm\README.md` - Hyper-V creation commands, Debian install notes, cutover checklist
### Script Fixes Applied (Sync-FromNAS-rsync.ps1)
1. **Directory-only filter** for NAS station enumeration (line ~125)
2. **Station path guard** - detects stray files where directories expected
3. **Log type directory guard** - renames stray files in LOGS subdirs
4. **Write-Log retry** - 3 attempts with 100ms delay for AV file locking
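The PowerShell guards are not reproduced here. A bash sketch of the same two ideas (enumerate only directories, and move a stray file out of the way before recreating the directory the sync expects) might look like this; the function names and the `.stray-<timestamp>` rename suffix are hypothetical and not necessarily what the script uses.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the directory-only filter and the stray-file guard.

# Print station directory names under ROOT. The trailing slash in the glob
# matches directories only, so a stray file such as TS-21 is skipped.
list_stations() {
  local root=${1%/} d
  for d in "$root"/TS-*/; do
    [ -d "$d" ] || continue   # an unmatched glob stays literal; skip it
    basename "$d"
  done
}

# If PATH exists but is not a directory, rename it aside, then make sure
# the directory exists.
guard_dir() {
  local path=$1
  if [ -e "$path" ] && [ ! -d "$path" ]; then
    mv -- "$path" "${path}.stray-$(date +%Y%m%d%H%M%S)"
  fi
  mkdir -p -- "$path"
}
```

Renaming rather than deleting keeps files like the 2012 TS-21 DAT file available for inspection.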
### Deployed to New VM (via SSH)
- /etc/samba/smb.conf (full Samba config)
- /etc/rsyncd.conf + /etc/rsyncd.secrets
- /usr/local/bin/btrfs-snapshot.sh + cron entries
- SSH key pair generated, public key added to old NAS
---
## Pending/Incomplete Tasks
### Immediate (resume next session)
1. **Monitor test/ data rsync** - Single stream running from old NAS (.117) to new VM (.9). Check with:
```bash
ssh root@192.168.0.9 "ps aux | grep 'rsync -av' | grep -v grep; du -sh /data/test/ --exclude=.snapshots"
```
2. **Restart snapshot transfer** after data transfer completes:
```bash
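# Redirect all output so the remote job detaches and ssh returns immediately
# (the log path is arbitrary). Follow it with: ssh root@192.168.0.9 "tail -f /root/rsync-snapshots.log"
ssh root@192.168.0.9 "nohup rsync -av root@192.168.0.117:/data/test/.snapshots/ /data/test/.snapshots/ > /root/rsync-snapshots.log 2>&1 < /dev/null &"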
```
3. **Test DOS machine connectivity** - Error 53 was resolved (old NAS NetBIOS killed). Need to reboot DOS machine and test:
   - `NET USE T: \\D2TESTNAS\TEST`
   - Run CTONW.BAT (copy logs to NAS)
   - Run NWTOC.BAT (download updates from NAS)
   - Verify files appear in /data/test/TS-XX/LOGS/ on new VM
### After Data Transfer Complete
4. **Verify data integrity** - Compare file counts/sizes between old and new NAS
5. **Power off old NAS** once all data confirmed transferred
6. **Set up scheduled task on AD2** - Create 15-minute scheduled task for Sync-FromNAS-rsync.ps1
7. **Run real (non-dry) sync on AD2** - Execute Sync-FromNAS-rsync.ps1 without -DryRun flag
8. **AV exclusions on AD2** - Add exclusions for C:\Shares\test\ and rsync.exe
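For item 4, a quick first pass is to compare file counts and total bytes on both sides, leaving `.snapshots` out because the snapshot copy is still incomplete. A sketch, assuming GNU `find` (for `-printf`) on both machines; the function name is hypothetical:

```bash
#!/usr/bin/env bash
# Print "<files> files, <bytes> bytes" for DIR, skipping DIR/.snapshots.
# Requires GNU find: -printf '%s' prints each file's size in bytes.
tree_stats() {
  local dir=${1%/}
  find "$dir" -path "$dir/.snapshots" -prune -o -type f -printf '%s\n' |
    awk '{ n++; bytes += $1 } END { printf "%d files, %.0f bytes\n", n, bytes }'
}
```

To run it remotely without copying a script, assuming a bash-compatible remote shell: `ssh root@192.168.0.117 "$(declare -f tree_stats); tree_stats /data/test"` and the same against `root@192.168.0.9`. Matching numbers are necessary but not sufficient; a checksum comparison (`rsync -rcn` from the new VM) would be stronger but reads every file on the old NAS's ARM CPU.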
### Nice to Have
9. **Copy NAS config backup to new VM** (already backed up to DF-SVR-D2-SYNC)
10. **Datto Workplace SmartBadge research** - Researched that SmartBadge add-in for Excel doesn't exist; Workplace integrates via sync client and web, not Excel plugin
---
## DOS Machine Data Flow
```
DOS 6.22 (C:\ATE\) --COPY--> T:\MACHINE\LOGS\ (NAS via SMB1)
                                   |
                                   v (rsync daemon, port 873)
                          AD2 C:\Shares\test\
                                   |
                                   v (future: database ingestion)
                         MariaDB @ 172.16.3.30
```
### Batch Files (DOS -> NAS)
- **CTONW.BAT v3.2** - Uses COPY (not XCOPY) to upload log files from C:\ATE\ to T:\MACHINE\LOGS\
- **NWTOC.BAT v3.5** - Uses COPY to download updates from T:\COMMON\ProdSW\ to C:\BAT\ and C:\ATE\
- **UPDATE.BAT v2.1** - Uses XCOPY for full machine backup (had /D flag fix for DOS 6.22)
### Log Types
5BLOG, 7BLOG, 8BLOG, DSCLOG, SCTLOG, VASLOG, PWRLOG, HVLOG
### Active Stations
TS-3L (most recent activity), TS-4R, TS-3R, TS-11L, TS-GURU, plus many others
---
## Reference
### Key Commands
```bash
# SSH to new D2TESTNAS
ssh root@192.168.0.9
# SSH to old NAS (DHCP)
ssh root@192.168.0.117
# Check rsync transfers on new VM
ssh root@192.168.0.9 "ps aux | grep rsync | grep -v grep"
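# Test Samba from Windows (cmd.exe, not bash)
net view \\192.168.0.9
# Test Samba from Linux/macOS (anonymous share list)
smbclient -L //192.168.0.9 -N
# Test rsync daemon (lists the "test" module; prompts for the rsync user's password)
rsync rsync://rsync@192.168.0.9/test/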
# Restart services on new VM
ssh root@192.168.0.9 "systemctl restart smbd nmbd rsync"
# BTRFS snapshot status
ssh root@192.168.0.9 "ls /data/test/.snapshots/"
```
### Old NAS Lockdown Commands (already applied)
```bash
# Block NetBIOS (prevents name conflict)
ssh root@192.168.0.117 "iptables -A INPUT -p udp --dport 137 -j DROP; iptables -A INPUT -p udp --dport 138 -j DROP; iptables -A OUTPUT -p udp --sport 137 -j DROP; iptables -A OUTPUT -p udp --sport 138 -j DROP"
# Remove root's entire crontab (this removed the nmbd auto-restart entry)
ssh root@192.168.0.117 "crontab -r"
```
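Rules added with `iptables -A` live only in the running kernel unless something saves and restores them, so if the old NAS reboots before it is powered off for good, the lockdown should be re-checked. A sketch of that check, reading `iptables -S` output on stdin (the function name is hypothetical):

```bash
#!/usr/bin/env bash
# Read `iptables -S` output on stdin and report any of the four NetBIOS DROP
# rules from the lockdown command that are missing. Returns 1 if any are.
check_netbios_lockdown() {
  local rules port missing=0
  rules=$(cat)
  for port in 137 138; do
    if ! grep -Eq -- "^-A INPUT -p udp .*--dport ${port}( |$).*-j DROP" <<<"$rules"; then
      echo "missing: INPUT DROP udp dport $port"; missing=1
    fi
    if ! grep -Eq -- "^-A OUTPUT -p udp .*--sport ${port}( |$).*-j DROP" <<<"$rules"; then
      echo "missing: OUTPUT DROP udp sport $port"; missing=1
    fi
  done
  return "$missing"
}
```

Usage: `ssh root@192.168.0.117 "iptables -S" | check_netbios_lockdown`, together with `ssh root@192.168.0.117 "pgrep -x nmbd"`, which should print nothing.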
---
## Session Timeline
- Started: ~14:00 (context recovery from previous session)
- Rsync script fixes and deployment to AD2
- Disabled old SCP scheduled tasks
- Investigated BTRFS snapshots (81 found)
- Built D2TESTNAS VM on DF-HYPERV-B (Debian 13)
- Configured all services (Samba, rsync, BTRFS, SSH)
- Started data transfer from old NAS
- Killed snapshot transfer to reduce NAS load (single stream)
- IP cutover: new VM .185 -> .9, old NAS .9 -> DHCP .117
- Resolved WINS conflict (killed old NAS nmbd, removed cron, blocked ports)
- DOS machine testing started - Error 53 resolved
- Data transfer ongoing (~24GB+ transferred, snapshots pending restart)
- Session saved: ~18:45
## Update: ~19:30 - Batch File Fix and DOS Machine Testing
### DOS Machine Testing Results
- All 4 tested machines (TS-3L, TS-3R, TS-4L, TS-4R) connected to new Linux NAS successfully
- T: drive mapped via NetBIOS name (after killing old NAS nmbd)
- Files successfully copied (3 .LOG files)
- BUT: "Bad command or file name" (5x) and "Too many parameters" (5x) errors from IF EXIST/IF NOT EXIST commands
- Confirmed CTONW.BAT v3.2 on machine, correct line endings (CR+LF verified via DEBUG)
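- Suspected cause at the time: DOS 6.22 IF EXIST failing on network paths, possibly an SMB1 compatibility issue with wildcard queries (the ~20:00 update traced the "Too many parameters" errors to a trailing space in MACHINE)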
### Fix Applied: Batch Files v4.0
Eliminated all IF EXIST/IF NOT EXIST checks from startup batch files. Directories pre-created on server.
**CTONW.BAT v4.0** - Direct COPY commands, no IF EXIST guards. Target dirs pre-created on NAS.
**NWTOC.BAT v4.0** - Direct MD and COPY commands, no IF EXIST guards. MD harmless if dir exists locally.
**AUTOEXEC.BAT v4.0** - Removed IF EXIST around CALL commands, direct MD for local dirs.
All deployed to NAS at `/data/test/COMMON/ProdSW/`. Machines will pick up new versions on next boot via NWTOC download.
### Pre-created Directories on NAS
Ran script to create LOGS/5BLOG, LOGS/7BLOG, LOGS/8BLOG, LOGS/DSCLOG, LOGS/HVLOG, LOGS/PWRLOG, LOGS/SCTLOG, LOGS/VASLOG, and Reports for ALL TS-* station directories.
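The script that was run is not recorded here. A sketch of the same operation, assuming it runs on the new VM against `/data/test` (the function name is hypothetical), is below. It does not set ownership or permissions; the new directories still have to be writable by whatever account Samba maps guests to.

```bash
#!/usr/bin/env bash
# Log type subdirectories the DOS batch files copy into.
LOG_TYPES="5BLOG 7BLOG 8BLOG DSCLOG HVLOG PWRLOG SCTLOG VASLOG"

# For every TS-* directory under ROOT, create LOGS/<type> and Reports.
# mkdir -p is a no-op for directories that already exist.
precreate_station_dirs() {
  local root=${1%/} station t
  for station in "$root"/TS-*/; do
    [ -d "$station" ] || continue   # skip stray files and an unmatched glob
    for t in $LOG_TYPES; do
      mkdir -p "${station}LOGS/${t}"
    done
    mkdir -p "${station}Reports"
  done
}
```

Because `mkdir -p` is idempotent, the function can be re-run safely whenever a new station directory appears.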
### Old NAS Status
- DHCP at 192.168.0.117
- nmbd killed, cron removed, NetBIOS ports 137/138 blocked via iptables
- rsync data transfer still running (single stream, ~24GB+ transferred)
- Snapshot transfer stopped (was at ~43GB logical), needs restart after data completes
### Pending
1. Reboot a DOS machine to test v4.0 batch files (second boot needed for NWTOC v4.0)
2. Monitor data transfer completion (rsync single stream still running as of ~20:00)
3. Restart snapshot transfer after data completes
4. Verify test data appears in correct LOGS subdirectories on NAS
5. Set up AD2 scheduled task for rsync sync
6. Run real (non-dry) Sync-FromNAS-rsync.ps1 on AD2
## Update: ~20:00 - DEPLOY Trailing Space Bug and Data Upload Success
### Critical Bug Found: DEPLOY.BAT Trailing Space
- **Root cause of ALL "Too many parameters" errors**: `ECHO SET MACHINE=%MACHINE% >> C:\AUTOEXEC.BAT` includes the space before `>>` in the output
- This sets `MACHINE=TS-3L ` (with trailing space) which causes `T:\TS-3L \LOGS\DSCLOG` to be parsed as two parameters
- **Fix**: DEPLOY v4.1 moves redirect before ECHO: `>>C:\AUTOEXEC.BAT ECHO SET MACHINE=%MACHINE%`
- First line uses `>` (overwrite), rest use `>>` (append)
- DEPLOY v4.1 deployed to NAS at `/data/test/COMMON/ProdSW/DEPLOY.BAT`
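A trailing space is invisible in `TYPE` output, so it is easy to miss by eye. A small check, sketched as a hypothetical helper, flags `SET` lines that end in a space or tab after stripping the CR from DOS CR+LF line endings, and can be pointed at any AUTOEXEC.BAT copy on the NAS:

```bash
#!/usr/bin/env bash
# Print "<line>:<text>" for each SET line in FILE that ends in a space or tab.
# The CR of DOS CR+LF endings is removed first so it is not mistaken for
# trailing whitespace. Exit status is grep's: 0 if any line matched.
find_trailing_space_sets() {
  tr -d '\r' < "$1" | grep -niE '^[[:blank:]]*SET [^=]+=.*[[:blank:]]$'
}
```

On the new VM, candidate files can be listed with `find /data/test -path /data/test/.snapshots -prune -o -iname AUTOEXEC.BAT -print` and checked one by one.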
### Samba Case Sensitivity - Confirmed OK
- `smb.conf` has `case sensitive = no` and `default case = upper`
- No duplicate directories (only `TS-4L` exists, not `ts-4L`)
### TS-3L Deploy Test
- Ran `T:\UPDATE TS-3L` which calls DEPLOY v4.0 (before trailing space fix)
- DEPLOY completed, files confirmed v4.0 on machine via TYPE
- After reboot: NAS still showed old CTONW.LOG/NWTOC.LOG - MACHINE had trailing space
- Running CTONW manually showed 9x "Too many parameters" on all COPY-to-subdirectory lines
- `COPY C:\ATE\*.LOG T:\%MACHINE%` worked (no subdirectory in path) but `COPY ... T:\%MACHINE%\LOGS\DSCLOG` failed
- This confirmed the trailing space theory - space before `\LOGS\` splits the path
### TS-4L Data Upload - SUCCESS
- TS-4L uploaded data at 20:10 with clean MACHINE variable (no trailing space)
- **84 test data files uploaded to NAS:**
  - 5BLOG: 20 files
  - 7BLOG: 29 files (historical .SHT files)
  - 8BLOG: 10 files
  - DSCLOG: 21 files (including today's 38-02.DAT from 03-12-26)
  - SCTLOG: 2 files
  - VASLOG: 2 files
- **90+ work-order Reports** (.TXT files) uploaded to TS-4L/Reports/
- **3 LOG files** (NWTOC.LOG, CTONW.LOG, CTONWTXT.LOG)
- CTONW.LOG confirms: `CTONW.BAT v4.0 / Machine: TS-4L` (no trailing space)
### Original STARTNET.BAT Found (from TS-3L backup)
The actual STARTNET.BAT on DOS machines loads network drivers manually:
```
LH /L:0;1,45472 /S c:\net\smartdrv.exe /q
c:\net\net initialize
c:\net\netbind.com
lh c:\net\umb.com
c:\net\tcptsr.exe
c:\net\tinyrfc.exe
c:\net\nmtsr.exe
c:\net\emsbfr.exe
c:\net\net start
net use T: \\d2testnas\test
net use X: \\d2testnas\datasheets
```
- `net start` prompts for computer name (pre-populated from SYSTEM.INI)
- Could add `/y` flag to suppress prompt, or use MACHINE variable
- Our v2.0 STARTNET.BAT on ProdSW is a simplified rewrite that was never deployed to machines
### T:\UPDATE.BAT
- Tiny 4-line wrapper at root of test share: `CALL T:\COMMON\ProdSW\DEPLOY.BAT %1`
- Allows running `T:\UPDATE TS-3L` from DOS machines
### Rsync Transfer Status
- Single stream still running from old NAS (.117) to new VM (.9)
- Snapshot transfer still pending restart
### Files Modified This Update
- `D:\ClaudeTools\projects\dataforth-dos\batch-files\DEPLOY.BAT` - v4.1 (trailing space fix)
- Deployed to NAS at `/data/test/COMMON/ProdSW/DEPLOY.BAT`
---
## Pending/Incomplete Tasks (Updated)
### Immediate
1. **Re-deploy TS-3L with DEPLOY v4.1** - needs new deploy + reboot to fix trailing space
2. **Set up AD2 rsync scheduled task** - Sync-FromNAS-rsync.ps1 deployed but no task created (15-min interval planned)
3. **Run real (non-dry) sync** of Sync-FromNAS-rsync.ps1 on AD2
4. **Database ingestion pipeline** - No ingestion exists yet. Data flows: NAS -> AD2 -> MariaDB @ 172.16.3.30
### After Data Transfer Complete
5. **Monitor old NAS rsync completion** - single stream still running
6. **Restart snapshot transfer** after data completes
7. **Verify data integrity** - compare file counts between old and new NAS
8. **Power off old NAS** once confirmed
### Batch File Updates Needed
9. **UPDATE.BAT** (in ProdSW) - has IF EXIST checks, needs v4.0 treatment
10. **ATESYNC.BAT / ATESYNCD.BAT** - have IF EXIST checks, need v4.0 treatment (not currently called by AUTOEXEC)
11. **STARTNET.BAT** - consider deploying updated version or adding `/y` to suppress net start prompt
12. **AV exclusions on AD2** - add exclusions for C:\Shares\test\ and rsync.exe