- Fix voice_profiler threshold bug (HOST label overwrote Unknown unconditionally)
- Audio preload optimization: single ffmpeg per episode, 149.5x realtime on 5070 Ti
- WavLM threshold raised to 0.85 (Mike 0.90-0.99, callers 0.46-0.83)
- Promo/bumper filter: weighted signature scoring, 42->27 clean Q&A pairs
- Text-only Q&A fallback for episodes with no CALLER diarization labels
- TRANSFORMERS_OFFLINE=1 to skip HuggingFace freshness checks
- Add diarize_2018.py for targeted re-run + FTS5 rebuild
- Add benchmark.py + BENCH_SETUP.md for GURU-BEAST-ROG (RTX 4090) comparison
- Commit 9-episode training diarization.json outputs
- Session log: 2026-04-27-diarization-pipeline.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
GURU-BEAST-ROG Benchmark Setup
RTX 4090 performance comparison against DESKTOP-0O8A1RL (RTX 5070 Ti baseline: 149.5x realtime).
Step 1 — Sync repo
The audio-processor lives inside the claudetools repo. Pull latest on main.
cd D:\claudetools # or wherever claudetools is cloned on this machine
git pull
If not yet cloned:
git clone https://azcomputerguru@git.azcomputerguru.com/azcomputerguru/claudetools.git D:\claudetools
cd D:\claudetools\projects\radio-show\audio-processor
Step 2 — Python environment
Requires Python 3.11+. Use the py launcher on Windows.
cd D:\claudetools\projects\radio-show\audio-processor
py -m venv .venv
.venv\Scripts\activate
# PyTorch with CUDA 12.8 (matches RTX 4090 driver)
pip install torch==2.11.0+cu128 --index-url https://download.pytorch.org/whl/cu128
# Core deps
pip install faster-whisper==1.2.1 transformers==5.6.2 soundfile==0.13.1
pip install numpy==2.4.4 rich==15.0.0 ollama==0.6.1 pyyaml scikit-learn
# Install project in editable mode
pip install -e . --no-deps
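Because the project is installed with --no-deps, nothing checks the pinned versions afterwards. The following is a minimal sketch, not part of the repo, that compares what is installed against the pins from the pip commands above. The function names are hypothetical; only the standard library is used.

```python
from importlib import metadata

# Pins copied from the pip commands in Step 2.
PINS = {
    "torch": "2.11.0+cu128",
    "faster-whisper": "1.2.1",
    "transformers": "5.6.2",
    "soundfile": "0.13.1",
    "numpy": "2.4.4",
    "rich": "15.0.0",
    "ollama": "0.6.1",
}


def installed_version(name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {package: (expected, found)} for every pin that is not met."""
    mismatches = {}
    for name, expected in pins.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(PINS)
    for name, (expected, found) in problems.items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
    if not problems:
        print("All pinned packages match.")
```

Run it inside the activated venv. The `lookup` parameter exists only so the comparison can be exercised without touching the real environment.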
Verify GPU is visible:
.venv\Scripts\python -c "import torch; print(torch.cuda.get_device_name(0))"
Step 3 — Copy voice profiles from DESKTOP-0O8A1RL
Voice profiles are not in git (binary numpy files). Copy from the 5070 Ti machine via Tailscale. DESKTOP-0O8A1RL Tailscale IP: 100.92.127.64
# From GURU-BEAST-ROG — pulls the voice-profiles directory over Tailscale
robocopy "\\100.92.127.64\claudetools\projects\radio-show\audio-processor\voice-profiles" `
"D:\claudetools\projects\radio-show\audio-processor\voice-profiles" /E /COPYALL
If the network share isn't available, copy manually or use scp:
scp -r mike@100.92.127.64:"D:/claudetools/projects/radio-show/audio-processor/voice-profiles" .
Expected contents after copy:
voice-profiles/
  profiles.json
  mike-swanson/
    composite.npy
    embedding_0000.npy ... embedding_0179.npy (180 files)
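A partial copy is easy to miss by eye, so the expected contents above can be checked with a short script. This is a sketch under the assumption that the layout is exactly as listed (profiles.json at the top level, composite.npy and the 180 embeddings inside mike-swanson/). The function name is hypothetical.

```python
from pathlib import Path

EXPECTED_EMBEDDINGS = 180  # embedding_0000.npy through embedding_0179.npy


def check_voice_profiles(root):
    """Return a list of problems found under the voice-profiles directory."""
    root = Path(root)
    problems = []
    if not (root / "profiles.json").is_file():
        problems.append("missing profiles.json")

    speaker_dir = root / "mike-swanson"
    if not (speaker_dir / "composite.npy").is_file():
        problems.append("missing mike-swanson/composite.npy")

    # Compare the zero-padded names we expect with what is on disk.
    expected = {f"embedding_{i:04d}.npy" for i in range(EXPECTED_EMBEDDINGS)}
    if speaker_dir.is_dir():
        present = {p.name for p in speaker_dir.glob("embedding_*.npy")}
    else:
        present = set()
    missing = sorted(expected - present)
    if missing:
        problems.append(
            f"{len(missing)} embedding files missing, first: {missing[0]}"
        )
    return problems


if __name__ == "__main__":
    issues = check_voice_profiles("voice-profiles")
    print("voice-profiles OK" if not issues else "\n".join(issues))
```

It only checks that the files exist; it does not load the arrays or validate their shapes.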
Step 4 — Download test episodes from IX server
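Tailscale must be running. IX server: 172.16.3.10. Use Python paramiko for the transfer; raw SSH has key agent interference. paramiko is not in the Step 2 install list, so run pip install paramiko first if it is missing.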
# PowerShell here-string piped to Python (bash-style heredocs do not work in PowerShell).
# Set the password first so it stays out of the repo: $env:IX_PASSWORD = '...'
@'
import os
import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
# Keys and the SSH agent are disabled to avoid the key agent interference noted above.
client.connect('172.16.3.10', username='root', password=os.environ['IX_PASSWORD'],
               look_for_keys=False, allow_agent=False, timeout=30)
sftp = client.open_sftp()
os.makedirs('test-data/episodes', exist_ok=True)

# (remote path on the IX server, local path under test-data/episodes)
downloads = [
    ('/home/gurushow/public_html/archive/2011/3-12-11 HR 1.mp3', 'test-data/episodes/2011-03-12-hr1.mp3'),
    ('/home/gurushow/public_html/archive/2012/3 - March/3-10-12HR1.mp3', 'test-data/episodes/2012-03-10-hr1.mp3'),
    ('/home/gurushow/public_html/archive/2012/6 - June/6-9-12-HR1.mp3', 'test-data/episodes/2012-06-09-hr1.mp3'),
    ('/home/gurushow/public_html/archive/2014/06/s6e19.mp3', 'test-data/episodes/2014-s6e19.mp3'),
    ('/home/gurushow/public_html/archive/2016/06/s8e43.mp3', 'test-data/episodes/2016-s8e43.mp3'),
    ('/home/gurushow/public_html/archive/2017/04/s9e30.mp3', 'test-data/episodes/2017-s9e30.mp3'),
]
for remote, local in downloads:
    size_mb = sftp.stat(remote).st_size / 1024 / 1024
    print(f'Downloading {local} ({size_mb:.1f} MB)...', flush=True)
    sftp.get(remote, local)
    print('  done', flush=True)

sftp.close()
client.close()
print('All downloads complete.')
'@ | .venv\Scripts\python -
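An interrupted transfer can leave a missing or empty file behind, so it may be worth confirming the six episodes before benchmarking. The sketch below uses the local file names from the download list above; the function name is hypothetical, and it checks only presence and non-zero size, not audio integrity.

```python
from pathlib import Path

# Local file names taken from the download list in Step 4.
EXPECTED_EPISODES = [
    "2011-03-12-hr1.mp3",
    "2012-03-10-hr1.mp3",
    "2012-06-09-hr1.mp3",
    "2014-s6e19.mp3",
    "2016-s8e43.mp3",
    "2017-s9e30.mp3",
]


def missing_episodes(directory, names=EXPECTED_EPISODES):
    """Return the names that are absent or zero bytes in `directory`."""
    directory = Path(directory)
    bad = []
    for name in names:
        path = directory / name
        if not path.is_file() or path.stat().st_size == 0:
            bad.append(name)
    return bad


if __name__ == "__main__":
    bad = missing_episodes("test-data/episodes")
    print("All 6 episodes present." if not bad else f"Re-download: {bad}")
```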
Step 5 — Run benchmark
.venv\Scripts\python benchmark.py
This diarizes all 6 test episodes, prints per-episode timing, and compares to the 5070 Ti baseline.
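For reading the timing output, the realtime factor is the audio duration divided by the processing time. The sketch below shows that arithmetic. How benchmark.py aggregates the total is an assumption here (summing durations before dividing), and the function names are hypothetical.

```python
BASELINE_RTF = 149.5  # 5070 Ti total realtime factor quoted in this document


def realtime_factor(audio_seconds, wall_seconds):
    """Audio duration divided by processing time; higher is faster."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


def total_realtime_factor(runs):
    """Total factor over (audio_seconds, wall_seconds) pairs.

    Durations are summed before dividing, so long episodes weigh more
    than short ones. This is not the same as averaging per-episode factors.
    """
    total_audio = sum(audio for audio, _ in runs)
    total_wall = sum(wall for _, wall in runs)
    return realtime_factor(total_audio, total_wall)


def speedup_vs_baseline(rtf, baseline=BASELINE_RTF):
    """Ratio of a measured factor to the baseline; above 1.0 means faster."""
    return rtf / baseline
```

For example, a one-hour episode processed in 24 seconds gives `realtime_factor(3600, 24)`, which is 150.0. That figure is illustrative arithmetic, not a measurement from either machine.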
Step 6 — Report results
Post the benchmark output in the session log or share back to DESKTOP-0O8A1RL.
The key number to compare: total realtime factor (5070 Ti got 149.5x).
Also note any differences in Q&A pair counts: the same episodes should produce the same pairs on both machines, since results are deterministic given the same voice profiles.
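One way to spot pair count differences is to collect a per-episode count on each machine and compare them. The sketch below assumes the counts have already been gathered into plain dictionaries keyed by episode name. This document does not describe how benchmark.py reports them, so that step is left out and the function name is hypothetical.

```python
def pair_count_differences(counts_a, counts_b):
    """Return {episode: (count_a, count_b)} where the two machines disagree.

    Episodes present on only one side are reported with None for the other.
    """
    differences = {}
    for episode in sorted(set(counts_a) | set(counts_b)):
        a = counts_a.get(episode)
        b = counts_b.get(episode)
        if a != b:
            differences[episode] = (a, b)
    return differences
```

An empty result means both machines agree on every episode.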