claudetools/clients/valleywide/app-modernization/source-analysis/size_candidates.py
Mike Swanson 5359e7c49e feat(valleywide): recover VWP Orders VB6 source from D: backup drive
Recovered Darv's VB6 source for the Valley Wide Plastering Orders
application from the D: backup drive (label "Backup", 8 TB, 5.3 TB used).
This is the first time we've had the actual source — prior session only
had a single frmPayroll.frm from the AD server.

Three project variants identified across two snapshots:
- Full-Project/   (2,129 files, 124 MB) — D:\Office-Estimates\Darv\Full\Project\
- Kingston-Project/ (2,189 files, 130 MB) — D:\Office-Estimates\Darv\Kingston\Project\
- Source/         (170 files, 559 MB)   — D:\Office-Estimates\Darv\Source\ wholesale
- SOURCE-HOLD/    (3 files, 1 MB)       — D:\Office-Estimates\Darv\SOURCE HOLD\

Latest ORDERS_C.vbp date is 2020-06-09 (Kingston snapshot). Production
Orders_10A.exe was live as of April 2024, so it is an open question
whether newer source exists on the other backup drives Mike will scan next.

Also includes per-category and per-keyword analysis CSVs from a WizTree
file-list export, plus the analyzer script that produced them
(re-runnable for the next drive's CSV).

VMs (VWIN7-DW.vdi 8.3 GB + XP-for-ORDERS_copy.vdi 2.8 GB), the live
VWP.mdb, and the 393 MB raw WizTree CSV stay on disk only — gitignored.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 17:36:27 -07:00

53 lines
2.0 KiB
Python

"""Size up candidate source folders we might want to copy off D:.
Match by exact-leaf-name; print all matches per leaf, sorted by size.
"""
import csv, sys, os
from collections import defaultdict

CSV = sys.argv[1] if len(sys.argv) > 1 else 'clients/valleywide/app-modernization/WizTree_20260516172207.csv'

# Match folders whose leaf name (the last path component) is in this set
LEAVES_OF_INTEREST = {
    'project', 'source', 'source hold', 'virtualbox', 'vm_vdi',
    'vwp_current', 'vwp_update', 'vwp_inv', 'vwp_current_0317',
    'kingston', 'full', 'recovery', 'darv',
    'virtual box', 'virtual box copy', 'xp box',
}

groups = defaultdict(list)  # leaf -> [(size, files, folders, modified, path)]
with open(CSV, encoding='utf-8-sig', errors='replace') as f:
    r = csv.reader(f)
    next(r)  # banner
    next(r)  # header
    for row in r:
        if not row or len(row) < 7:
            continue
        p = row[0]
        if not p.endswith('\\'):
            continue
        leaf = p.rstrip('\\').rsplit('\\', 1)[-1].lower()
        if leaf in LEAVES_OF_INTEREST:
            try:
                sz = int(row[1])
                files = int(row[5]) if row[5].strip() else 0
                folders = int(row[6]) if row[6].strip() else 0
            except (ValueError, IndexError):
                continue
            groups[leaf].append((sz, files, folders, row[3], p))

print(f'{"GB":>8} {"Files":>8} {"Folders":>7} Modified Path')
print('-' * 140)

# Order: source code targets first, then VMs
order = ['vwp_current', 'vwp_update', 'vwp_inv', 'vwp_current_0317',
         'project', 'source', 'source hold', 'kingston', 'full', 'recovery',
         'virtualbox', 'vm_vdi', 'virtual box', 'virtual box copy', 'xp box',
         'darv']

for leaf in order:
    items = sorted(groups.get(leaf, []), reverse=True)
    if not items:
        continue
    print(f'\n--- leaf "{leaf}" ({len(items)} match{"es" if len(items)!=1 else ""}) ---')
    for sz, files, folders, mod, p in items[:10]:
        print(f'{sz/1024/1024/1024:>8.2f} {files:>8} {folders:>7} {mod:<19} {p}')
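
The leaf-matching rule above can be tried without the 393 MB export. The following is a minimal sketch, not part of the committed script: the helper names `leaf_name` and `collect_matches` are hypothetical, the `X:\` rows are invented sample data, and the column layout (path, size, allocated, modified, attributes, files, folders) is assumed from the indices the script reads.

```python
import csv
import io
from collections import defaultdict


def leaf_name(path):
    """Return the lower-cased last component of a folder path (hypothetical helper)."""
    return path.rstrip('\\').rsplit('\\', 1)[-1].lower()


def collect_matches(csv_text, leaves):
    """Group folder rows by leaf name; rows without a trailing backslash are skipped."""
    groups = defaultdict(list)
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # banner line
    next(reader)  # column header
    for row in reader:
        if len(row) < 7 or not row[0].endswith('\\'):
            continue
        leaf = leaf_name(row[0])
        if leaf in leaves:
            groups[leaf].append((int(row[1]), row[0]))
    return groups


# Invented sample rows for illustration only; not taken from the real export.
SAMPLE = (
    'Generated by WizTree (sample banner)\n'
    'File Name,Size,Allocated,Modified,Attributes,Files,Folders\n'
    'X:\\Demo\\Project\\,2048,4096,2020-01-01 00:00:00,16,3,1\n'
    'X:\\Demo\\Project\\main.frm,1024,4096,2020-01-01 00:00:00,32,0,0\n'
    'X:\\Other\\PROJECT\\,512,4096,2019-01-01 00:00:00,16,1,0\n'
    'X:\\Demo\\Projects\\,999,4096,2019-01-01 00:00:00,16,1,0\n'
)

matches = collect_matches(SAMPLE, {'project'})
for size, path in sorted(matches['project'], reverse=True):
    print(size, path)
```

In this sample the match is case-insensitive and exact on the leaf, so `Project\` and `PROJECT\` are both grouped under `project`, while `Projects\` and the file `main.frm` are ignored.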