Re-ran benchmark.py on GURU-BEAST-ROG against the post-overhaul code
(co-host profile, batched Whisper int8_float16, revised Q&A extractor).

Results vs 5070 Ti baseline:
- Diarization: 209.7x -> 338.1x (+61.2%)
- Transcription: 63.8x -> 94.8x (+48.6%)
- Q&A pairs: 9 vs 10 (within run-to-run noise; structural correctness matches:
  2014 = 0 callers, 2016 = 2 WiFi caller pairs)

Setup change: BENCH_SETUP.md now lists ffmpeg as a Step-2 prereq
(winget install Gyan.FFmpeg). Was missing on this machine and the pipeline
fails silently at the first diarize call without ffprobe.

Code change: benchmark.py BASELINE_RTF updated 149.5 -> 209.7 to reflect
the 5070 Ti's post-overhaul measurement (e9ac607).

Data: 6 test episode transcripts and diarizations regenerated under the
new code path (batched Whisper output + co-host-aware speaker_map).

Correction memory: voice-profiles/tom/ directory + 5070 Ti session log
fabricated a co-host named "Tom"; Mike confirms no such person exists on
the show. The audio profile is real and the diarization separation is
sound, but the human identity attached to it is wrong. Saved under
.claude/memory/radio_show_no_cohost_named_tom.md pending Mike providing
the correct name for rename.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# GURU-BEAST-ROG Benchmark Setup

RTX 4090 performance comparison against DESKTOP-0O8A1RL (RTX 5070 Ti baseline: **149.5x realtime** before the overhaul; `benchmark.py` now uses the post-overhaul figure of **209.7x**).

---

## Step 1 — Sync repo

The audio-processor lives inside the claudetools repo. Pull latest on main.

```powershell
cd D:\claudetools   # or wherever claudetools is cloned on this machine
git pull
```

If not yet cloned:

```powershell
git clone https://azcomputerguru@git.azcomputerguru.com/azcomputerguru/claudetools.git D:\claudetools
cd D:\claudetools\projects\radio-show\audio-processor
```

---

## Step 2 — Python environment

Requires Python 3.11+. Use the `py` launcher on Windows.

ffmpeg/ffprobe must be on PATH because the voice profiler shells out for audio duration. Without it the pipeline fails on the first diarize call.

```powershell
# Install ffmpeg if not already present
winget install --id=Gyan.FFmpeg -e --accept-source-agreements --accept-package-agreements
# Open a new shell so the new PATH takes effect, then verify
ffprobe -version
```
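Because a missing ffprobe only surfaces at the first diarize call, a preflight check before a long run can save time. The following is a minimal sketch, not part of the repo; the function name `missing_tools` is hypothetical, and it only confirms that the executables resolve on PATH, not that they work.

```python
import shutil


def missing_tools(tools=("ffmpeg", "ffprobe")):
    """Return the names in `tools` that cannot be resolved on PATH."""
    return [name for name in tools if shutil.which(name) is None]


missing = missing_tools()
if missing:
    print("Missing from PATH: " + ", ".join(missing))
else:
    print("ffmpeg and ffprobe found on PATH")
```

Run it from the same shell that will run the benchmark, since a shell opened before the winget install will still have the old PATH.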

```powershell
cd D:\claudetools\projects\radio-show\audio-processor

py -m venv .venv
.venv\Scripts\activate

# PyTorch with CUDA 12.8 (matches RTX 4090 driver)
pip install torch==2.11.0+cu128 --index-url https://download.pytorch.org/whl/cu128

# Core deps
pip install faster-whisper==1.2.1 transformers==5.6.2 soundfile==0.13.1
pip install numpy==2.4.4 rich==15.0.0 ollama==0.6.1 pyyaml scikit-learn

# Install project in editable mode
pip install -e . --no-deps
```

Verify GPU is visible:

```powershell
.venv\Scripts\python -c "import torch; print(torch.cuda.get_device_name(0))"
```
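The one-liner above generally ends in a traceback when torch is missing or was installed without CUDA support, which can be hard to read at a glance. A slightly more forgiving check might look like the sketch below; `describe_gpu` is a hypothetical helper, and it assumes it is run with the venv's Python from Step 2.

```python
def describe_gpu():
    """Return a one-line description of CUDA device 0, or the reason none is usable."""
    try:
        import torch  # assumed to be the CUDA build installed in Step 2
    except ImportError:
        return "torch is not installed in this environment"
    if not torch.cuda.is_available():
        return "torch is installed but CUDA is not available (CPU-only wheel or driver issue)"
    # torch.version.cuda is the CUDA version the wheel was built against
    return f"{torch.cuda.get_device_name(0)} (CUDA {torch.version.cuda})"


print(describe_gpu())
```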

---

## Step 3 — Copy voice profiles from DESKTOP-0O8A1RL

Voice profiles are not in git (binary numpy files). Copy from the 5070 Ti machine via Tailscale.
DESKTOP-0O8A1RL Tailscale IP: **100.92.127.64**

```powershell
# From GURU-BEAST-ROG: pulls the voice-profiles directory over Tailscale
robocopy "\\100.92.127.64\claudetools\projects\radio-show\audio-processor\voice-profiles" `
    "D:\claudetools\projects\radio-show\audio-processor\voice-profiles" /E /COPYALL
```

If the network share isn't available, copy manually or use scp:

```powershell
scp -r mike@100.92.127.64:"D:/claudetools/projects/radio-show/audio-processor/voice-profiles" .
```

Expected contents after copy:

```
voice-profiles/
  profiles.json
  mike-swanson/
    composite.npy
    embedding_0000.npy ... embedding_0179.npy   (180 files)
```
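The expected layout above can be checked mechanically after the copy. This is a rough sketch under the assumption that the listing is exhaustive for the `mike-swanson` profile; `check_profile_dir` is a hypothetical helper, and it checks only that the files exist, not that their contents are valid.

```python
from pathlib import Path


def check_profile_dir(root, speaker="mike-swanson", expected_embeddings=180):
    """Return a list of problems under `root`; an empty list means the copy looks complete."""
    root = Path(root)
    problems = []
    if not (root / "profiles.json").is_file():
        problems.append("profiles.json is missing")
    speaker_dir = root / speaker
    if not (speaker_dir / "composite.npy").is_file():
        problems.append(f"{speaker}/composite.npy is missing")
    # Count embedding_0000.npy ... embedding_0179.npy
    found = len(list(speaker_dir.glob("embedding_*.npy"))) if speaker_dir.is_dir() else 0
    if found != expected_embeddings:
        problems.append(f"{speaker}: expected {expected_embeddings} embeddings, found {found}")
    return problems


issues = check_profile_dir("voice-profiles")
print("\n".join(issues) if issues else "voice-profiles looks complete")
```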

---

## Step 4 — Download test episodes from IX server

Tailscale must be running. IX server: **172.16.3.10** (use Python paramiko; raw SSH has key agent interference). paramiko is not in the Step 2 install list, so run `pip install paramiko` first if the import fails.

```powershell
# PowerShell has no bash-style heredoc, so pipe a here-string to python's stdin.
# Set $env:IX_SSH_PASSWORD first (variable name is an assumption) so the root
# password is not stored in this file.
@'
import os
import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect('172.16.3.10', username='root', password=os.environ['IX_SSH_PASSWORD'],
               look_for_keys=False, allow_agent=False, timeout=30)
sftp = client.open_sftp()

os.makedirs('test-data/episodes', exist_ok=True)

# (remote path on the IX server, local path under test-data/episodes)
downloads = [
    ('/home/gurushow/public_html/archive/2011/3-12-11 HR 1.mp3',          'test-data/episodes/2011-03-12-hr1.mp3'),
    ('/home/gurushow/public_html/archive/2012/3 - March/3-10-12HR1.mp3',  'test-data/episodes/2012-03-10-hr1.mp3'),
    ('/home/gurushow/public_html/archive/2012/6 - June/6-9-12-HR1.mp3',   'test-data/episodes/2012-06-09-hr1.mp3'),
    ('/home/gurushow/public_html/archive/2014/06/s6e19.mp3',              'test-data/episodes/2014-s6e19.mp3'),
    ('/home/gurushow/public_html/archive/2016/06/s8e43.mp3',              'test-data/episodes/2016-s8e43.mp3'),
    ('/home/gurushow/public_html/archive/2017/04/s9e30.mp3',              'test-data/episodes/2017-s9e30.mp3'),
]

for remote, local in downloads:
    size_mb = sftp.stat(remote).st_size / 1024 / 1024
    print(f'Downloading {local} ({size_mb:.1f} MB)...', flush=True)
    sftp.get(remote, local)
    print('  done', flush=True)

sftp.close()
client.close()
print('All downloads complete.')
'@ | .venv\Scripts\python -
```
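Since the benchmark expects all six episodes, it may be worth confirming the downloads landed before moving on. The sketch below uses the local filenames from the download list above; `missing_episodes` is a hypothetical helper, and treating a zero-byte file as missing is an assumption about what an interrupted transfer leaves behind.

```python
from pathlib import Path

# Local filenames taken from the download list in Step 4
EXPECTED_EPISODES = [
    "2011-03-12-hr1.mp3",
    "2012-03-10-hr1.mp3",
    "2012-06-09-hr1.mp3",
    "2014-s6e19.mp3",
    "2016-s8e43.mp3",
    "2017-s9e30.mp3",
]


def missing_episodes(episode_dir="test-data/episodes"):
    """Return the expected episode files that are absent or empty."""
    base = Path(episode_dir)
    return [
        name for name in EXPECTED_EPISODES
        if not (base / name).is_file() or (base / name).stat().st_size == 0
    ]


print(missing_episodes() or "all 6 episodes present")
```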

---

## Step 5 — Run benchmark

```powershell
.venv\Scripts\python benchmark.py
```

This diarizes all 6 test episodes, prints per-episode timing, and compares to the 5070 Ti baseline.

---

## Step 6 — Report results

Post the benchmark output in the session log or share it back to DESKTOP-0O8A1RL.

The key number to compare: **total realtime factor** (the 5070 Ti got 149.5x before the overhaul and 209.7x after it).

Also note any Q&A pair count differences. The same episodes should produce the same pairs on both machines given the same voice profiles, although the most recent comparison differed by one pair (9 vs 10), which was attributed to run-to-run noise.

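For reference, the realtime factor and the percentage comparison against the baseline can be reproduced by hand. This sketch assumes the usual meaning of "Nx realtime" (seconds of audio processed per second of wall-clock time); how `benchmark.py` actually computes it is not shown here, and both function names are hypothetical.

```python
def realtime_factor(audio_seconds, processing_seconds):
    """Speed as a multiple of realtime: audio seconds handled per wall-clock second."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def percent_change(baseline, measured):
    """Relative change of `measured` against `baseline`, in percent."""
    return (measured - baseline) / baseline * 100.0


# Hypothetical timing: a 60-minute episode processed in 12 seconds
print(f"{realtime_factor(3600, 12):.1f}x realtime")   # 300.0x realtime

# Figures reported for the post-overhaul run (5070 Ti -> GURU-BEAST-ROG)
print(f"Diarization:   {percent_change(209.7, 338.1):+.1f}%")   # +61.2%
print(f"Transcription: {percent_change(63.8, 94.8):+.1f}%")     # +48.6%
```

For a total figure across episodes, sum the audio durations and the processing times first and divide once, rather than averaging the per-episode factors.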