- Fix voice_profiler threshold bug (HOST label overwrote Unknown unconditionally)
- Audio preload optimization: single ffmpeg per episode, 149.5x realtime on 5070 Ti
- WavLM threshold raised to 0.85 (Mike 0.90-0.99, callers 0.46-0.83)
- Promo/bumper filter: weighted signature scoring, 42->27 clean Q&A pairs
- Text-only Q&A fallback for episodes with no CALLER diarization labels
- TRANSFORMERS_OFFLINE=1 to skip HuggingFace freshness checks
- Add diarize_2018.py for targeted re-run + FTS5 rebuild
- Add benchmark.py + BENCH_SETUP.md for GURU-BEAST-ROG (RTX 4090) comparison
- Commit 9-episode training diarization.json outputs
- Session log: 2026-04-27-diarization-pipeline.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
134 lines
4.0 KiB
Markdown
# GURU-BEAST-ROG Benchmark Setup

RTX 4090 performance comparison against DESKTOP-0O8A1RL (RTX 5070 Ti baseline: **149.5x realtime**).

---

## Step 1 — Sync repo

The audio-processor lives inside the claudetools repo. Pull latest on main.

```powershell
cd D:\claudetools   # or wherever claudetools is cloned on this machine
git pull
```

If not yet cloned:

```powershell
git clone https://azcomputerguru@git.azcomputerguru.com/azcomputerguru/claudetools.git D:\claudetools
cd D:\claudetools\projects\radio-show\audio-processor
```

---

## Step 2 — Python environment

Requires Python 3.11+. Use the `py` launcher on Windows.

```powershell
cd D:\claudetools\projects\radio-show\audio-processor

py -m venv .venv
.venv\Scripts\activate

# PyTorch with CUDA 12.8 (matches RTX 4090 driver)
pip install torch==2.11.0+cu128 --index-url https://download.pytorch.org/whl/cu128

# Core deps
pip install faster-whisper==1.2.1 transformers==5.6.2 soundfile==0.13.1
pip install numpy==2.4.4 rich==15.0.0 ollama==0.6.1 pyyaml scikit-learn

# Install project in editable mode
pip install -e . --no-deps
```

Verify the GPU is visible:

```powershell
.venv\Scripts\python -c "import torch; print(torch.cuda.get_device_name(0))"
```
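
If the one-liner above raises an error, it can be hard to tell whether torch is missing or CUDA is simply not visible. The sketch below reports each case separately. `describe_gpu` is a hypothetical helper name and is not part of the audio-processor project; the torch calls it uses (`torch.cuda.is_available`, `torch.cuda.get_device_properties`, `torch.version.cuda`) are standard PyTorch. Run it inside the venv, for example with `.venv\Scripts\python check_gpu.py`, where the file name is also an assumption.

```python
def describe_gpu():
    """Return a dict describing CUDA device 0, or the reason none is usable."""
    try:
        import torch
    except ImportError:
        return {"available": False, "reason": "torch is not installed in this environment"}

    if not torch.cuda.is_available():
        return {"available": False, "reason": "torch is installed but CUDA is not available"}

    props = torch.cuda.get_device_properties(0)
    return {
        "available": True,
        "name": props.name,                                   # e.g. the RTX 4090 on this machine
        "vram_gib": round(props.total_memory / 1024 ** 3, 1),  # total device memory
        "torch": torch.__version__,
        "cuda": torch.version.cuda,                            # CUDA version torch was built with
    }


if __name__ == "__main__":
    for key, value in describe_gpu().items():
        print(f"{key}: {value}")
```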

---

## Step 3 — Copy voice profiles from DESKTOP-0O8A1RL

Voice profiles are not in git (binary numpy files). Copy them from the 5070 Ti machine via Tailscale.

DESKTOP-0O8A1RL Tailscale IP: **100.92.127.64**

```powershell
# From GURU-BEAST-ROG: pulls the voice-profiles directory over Tailscale
robocopy "\\100.92.127.64\claudetools\projects\radio-show\audio-processor\voice-profiles" `
         "D:\claudetools\projects\radio-show\audio-processor\voice-profiles" /E /COPYALL
```

If the network share isn't available, copy manually or use scp:

```powershell
# Run from the audio-processor directory so the copy lands in .\voice-profiles
scp -r mike@100.92.127.64:"D:/claudetools/projects/radio-show/audio-processor/voice-profiles" .
```

Expected contents after copy:

```
voice-profiles/
  profiles.json
  mike-swanson/
    composite.npy
    embedding_0000.npy ... embedding_0179.npy   (180 files)
```
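
A quick way to confirm the copy is complete is to check the directory against the listing above. This is a minimal sketch, assuming `composite.npy` and the embedding files sit inside `mike-swanson/` and that `profiles.json` sits at the top level; `check_voice_profiles` is a hypothetical helper, not something shipped with the project. Run it from the audio-processor directory.

```python
from pathlib import Path


def check_voice_profiles(root="voice-profiles", speaker="mike-swanson",
                         expected_embeddings=180):
    """Return a list of problems found in the copied directory (empty if none)."""
    root = Path(root)
    problems = []

    if not (root / "profiles.json").is_file():
        problems.append("missing profiles.json")

    speaker_dir = root / speaker
    if not (speaker_dir / "composite.npy").is_file():
        problems.append(f"missing {speaker}/composite.npy")

    # Count embedding_0000.npy ... embedding_0179.npy
    found = len(list(speaker_dir.glob("embedding_*.npy")))
    if found != expected_embeddings:
        problems.append(f"expected {expected_embeddings} embedding files, found {found}")

    return problems


if __name__ == "__main__":
    issues = check_voice_profiles()
    print("voice-profiles looks complete" if not issues else "\n".join(issues))
```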

---

## Step 4 — Download test episodes from IX server

Tailscale must be running. IX server: **172.16.3.10**. Use Python with paramiko; raw SSH runs into key-agent interference.

```powershell
# paramiko is not in the Step 2 install list
.venv\Scripts\python -m pip install paramiko

# Assumption: the IX root password is entered at the prompt and passed through the
# IX_SSH_PASSWORD environment variable instead of being stored in this file
$env:IX_SSH_PASSWORD = Read-Host 'IX root password'

# PowerShell has no bash-style heredoc, so pipe a here-string to python's stdin
@'
import os
import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect('172.16.3.10', username='root', password=os.environ['IX_SSH_PASSWORD'],
               look_for_keys=False, allow_agent=False, timeout=30)
sftp = client.open_sftp()

os.makedirs('test-data/episodes', exist_ok=True)

# (remote path on the IX server, local path under test-data/episodes)
downloads = [
    ('/home/gurushow/public_html/archive/2011/3-12-11 HR 1.mp3',         'test-data/episodes/2011-03-12-hr1.mp3'),
    ('/home/gurushow/public_html/archive/2012/3 - March/3-10-12HR1.mp3', 'test-data/episodes/2012-03-10-hr1.mp3'),
    ('/home/gurushow/public_html/archive/2012/6 - June/6-9-12-HR1.mp3',  'test-data/episodes/2012-06-09-hr1.mp3'),
    ('/home/gurushow/public_html/archive/2014/06/s6e19.mp3',             'test-data/episodes/2014-s6e19.mp3'),
    ('/home/gurushow/public_html/archive/2016/06/s8e43.mp3',             'test-data/episodes/2016-s8e43.mp3'),
    ('/home/gurushow/public_html/archive/2017/04/s9e30.mp3',             'test-data/episodes/2017-s9e30.mp3'),
]

for remote, local in downloads:
    size_mb = sftp.stat(remote).st_size / 1024 / 1024
    print(f'Downloading {local} ({size_mb:.1f} MB)...', flush=True)
    sftp.get(remote, local)
    print('  done', flush=True)

sftp.close()
client.close()
print('All downloads complete.')
'@ | .venv\Scripts\python -
```
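
Before starting the benchmark, it may be worth confirming that all six episodes arrived and that none is a zero-byte file left behind by an interrupted transfer. The sketch below uses the local file names from the download list above; `missing_or_empty` is a hypothetical helper name, and the check only looks at presence and size, not audio validity.

```python
from pathlib import Path

# Local names taken from the download list in this step
EXPECTED_EPISODES = [
    "2011-03-12-hr1.mp3",
    "2012-03-10-hr1.mp3",
    "2012-06-09-hr1.mp3",
    "2014-s6e19.mp3",
    "2016-s8e43.mp3",
    "2017-s9e30.mp3",
]


def missing_or_empty(directory="test-data/episodes", names=EXPECTED_EPISODES):
    """Return the expected file names that are absent or have zero size."""
    directory = Path(directory)
    bad = []
    for name in names:
        path = directory / name
        if not path.is_file() or path.stat().st_size == 0:
            bad.append(name)
    return bad


if __name__ == "__main__":
    bad = missing_or_empty()
    if bad:
        print("Missing or empty:", ", ".join(bad))
    else:
        print(f"All {len(EXPECTED_EPISODES)} episodes present.")
```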

---

## Step 5 — Run benchmark

```powershell
.venv\Scripts\python benchmark.py
```

This diarizes all 6 test episodes, prints per-episode timing, and compares the result to the 5070 Ti baseline.

---

## Step 6 — Report results

Post the benchmark output in the session log or share it back to DESKTOP-0O8A1RL.

The key number to compare: **total realtime factor** (5070 Ti got 149.5x).

Also note any differences in Q&A pair counts: the same episodes should produce the same pairs on both machines (results are deterministic given the same voice profiles).
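
For reporting, the two comparisons above can be sketched in a few lines. This assumes the usual definition of realtime factor (audio duration divided by wall-clock processing time) and that the total is computed over summed durations; how `benchmark.py` actually aggregates is not shown here. The function names and the per-episode dictionaries of pair counts are hypothetical, so adapt them to whatever the benchmark output provides.

```python
BASELINE_RTF = 149.5  # 5070 Ti figure quoted in this document


def realtime_factor(audio_seconds, wall_seconds):
    """Audio duration divided by processing time (assumed definition)."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


def total_realtime_factor(runs):
    """runs: list of (audio_seconds, wall_seconds) tuples, one per episode."""
    total_audio = sum(audio for audio, _ in runs)
    total_wall = sum(wall for _, wall in runs)
    return realtime_factor(total_audio, total_wall)


def speedup_vs_baseline(rtf, baseline=BASELINE_RTF):
    """Ratio above 1.0 means this machine is faster than the baseline."""
    return rtf / baseline


def pair_count_diffs(local, baseline):
    """Return {episode: (local_count, baseline_count)} where the counts differ."""
    episodes = sorted(set(local) | set(baseline))
    return {
        ep: (local.get(ep), baseline.get(ep))
        for ep in episodes
        if local.get(ep) != baseline.get(ep)
    }
```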