
Mike Swanson b9a4bb8807 scc: 4090 benchmark with new code state — 338.1x diarize, 94.8x transcribe

Re-ran benchmark.py on GURU-BEAST-ROG against the post-overhaul code
(co-host profile, batched Whisper int8_float16, revised Q&A extractor).

Results vs 5070 Ti baseline:
- Diarization: 209.7x -> 338.1x (+61.2%)
- Transcription: 63.8x -> 94.8x (+48.6%)
- Q&A pairs: 9 vs 10 (within run-to-run noise; structural correctness matches:
  2014 = 0 callers, 2016 = 2 WiFi caller pairs)

Setup change: BENCH_SETUP.md now lists ffmpeg as a Step-2 prereq
(winget install Gyan.FFmpeg). Was missing on this machine and the pipeline
fails silently at the first diarize call without ffprobe.

Code change: benchmark.py BASELINE_RTF updated 149.5 -> 209.7 to reflect
the 5070 Ti's post-overhaul measurement (e9ac607).

Data: 6 test episode transcripts and diarizations regenerated under the
new code path (batched Whisper output + co-host-aware speaker_map).

Correction memory: voice-profiles/tom/ directory + 5070 Ti session log
fabricated a co-host named "Tom" — Mike confirms no such person exists on
the show. The audio profile is real and the diarization separation is
sound, but the human identity attached to it is wrong. Saved under
.claude/memory/radio_show_no_cohost_named_tom.md pending Mike providing
the correct name for rename.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-27 14:54:07 -07:00


GURU-BEAST-ROG Benchmark Setup

RTX 4090 performance comparison against DESKTOP-0O8A1RL (RTX 5070 Ti baseline: 149.5x realtime originally, 209.7x after the pipeline overhaul).

Step 1 — Sync repo

The audio-processor lives inside the claudetools repo. Pull latest on main.

cd D:\claudetools   # or wherever claudetools is cloned on this machine
git pull

If not yet cloned:

git clone https://azcomputerguru@git.azcomputerguru.com/azcomputerguru/claudetools.git D:\claudetools
cd D:\claudetools\projects\radio-show\audio-processor

Step 2 — Python environment

Requires Python 3.11+. Use the py launcher on Windows.

ffmpeg and ffprobe must be on PATH: the voice profiler shells out to ffprobe for audio duration. Without it the pipeline fails at the first diarize call.

# Install ffmpeg if not already present
winget install --id=Gyan.FFmpeg -e --accept-source-agreements --accept-package-agreements
# Open a new shell so the new PATH takes effect, then verify
ffprobe -version
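
The same PATH requirement can be checked from Python before running the pipeline. This is a minimal sketch using only the standard library; `missing_tools` is a hypothetical helper, not part of the audio-processor project.

```python
import shutil


def missing_tools(tools=("ffmpeg", "ffprobe")):
    """Return the names of tools that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("ffmpeg and ffprobe found on PATH")
```

Run it from a freshly opened shell, since a shell opened before the winget install will not see the updated PATH.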

cd D:\claudetools\projects\radio-show\audio-processor

py -m venv .venv
.venv\Scripts\activate

# PyTorch with CUDA 12.8 (matches RTX 4090 driver)
pip install torch==2.11.0+cu128 --index-url https://download.pytorch.org/whl/cu128

# Core deps
pip install faster-whisper==1.2.1 transformers==5.6.2 soundfile==0.13.1
pip install numpy==2.4.4 rich==15.0.0 ollama==0.6.1 pyyaml scikit-learn

# Install project in editable mode
pip install -e . --no-deps

Verify GPU is visible:

.venv\Scripts\python -c "import torch; print(torch.cuda.get_device_name(0))"
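
The one-liner above raises an exception when CUDA is not visible. A slightly more forgiving sketch is shown here; `describe_gpu` is a hypothetical helper that only uses `torch.cuda.is_available()`, `torch.cuda.get_device_name()`, and `torch.version.cuda`, and reports why no GPU was found instead of raising.

```python
def describe_gpu():
    """Return a one-line description of the CUDA device, or the reason none is visible."""
    try:
        import torch
    except ImportError:
        return "torch is not installed in this environment"
    if not torch.cuda.is_available():
        return f"torch {torch.__version__} is installed, but CUDA is not available"
    name = torch.cuda.get_device_name(0)
    return f"{name} (torch {torch.__version__}, CUDA {torch.version.cuda})"


if __name__ == "__main__":
    print(describe_gpu())
```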

Step 3 — Copy voice profiles from DESKTOP-0O8A1RL

Voice profiles are not in git (binary numpy files). Copy from the 5070 Ti machine via Tailscale. DESKTOP-0O8A1RL Tailscale IP: 100.92.127.64

# From GURU-BEAST-ROG — pulls the voice-profiles directory over Tailscale
robocopy "\\100.92.127.64\claudetools\projects\radio-show\audio-processor\voice-profiles" `
         "D:\claudetools\projects\radio-show\audio-processor\voice-profiles" /E /COPYALL

If the network share isn't available, copy manually or use scp:

scp -r mike@100.92.127.64:"D:/claudetools/projects/radio-show/audio-processor/voice-profiles" .

Expected contents after copy:

voice-profiles/
  profiles.json
  mike-swanson/
    composite.npy
    embedding_0000.npy ... embedding_0179.npy   (180 files)
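
The expected layout above can be verified after the copy. This is a sketch under the assumption that the listing is complete for the `mike-swanson` profile; `check_voice_profiles` is a hypothetical helper, and it checks only that the files exist, not that they load.

```python
from pathlib import Path


def check_voice_profiles(root, speaker="mike-swanson", expected_embeddings=180):
    """Return a list of problems found under the voice-profiles directory (empty if none)."""
    root = Path(root)
    problems = []
    if not (root / "profiles.json").is_file():
        problems.append("profiles.json is missing")
    speaker_dir = root / speaker
    if not (speaker_dir / "composite.npy").is_file():
        problems.append(f"{speaker}/composite.npy is missing")
    # Count embedding_NNNN.npy files; a missing speaker directory counts as zero.
    found = len(list(speaker_dir.glob("embedding_*.npy"))) if speaker_dir.is_dir() else 0
    if found != expected_embeddings:
        problems.append(
            f"{speaker}: found {found} embedding files, expected {expected_embeddings}"
        )
    return problems


if __name__ == "__main__":
    issues = check_voice_profiles("voice-profiles")
    for line in issues or ["voice-profiles layout looks complete"]:
        print(line)
```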

Step 4 — Download test episodes from IX server

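Tailscale must be running. IX server: 172.16.3.10. Use Python paramiko for the transfer (raw SSH has key agent interference); paramiko is not in the Step 2 install list, so run pip install paramiko in the venv if it is missing.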

# PowerShell: pipe a literal here-string to Python (bash-style heredocs are not supported).
# Set the password first, e.g. $env:IX_SSH_PASSWORD = '...' (the variable name is an assumption).
@'
import os
import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
# Read the root password from the environment instead of hardcoding it here
client.connect('172.16.3.10', username='root', password=os.environ['IX_SSH_PASSWORD'],
               look_for_keys=False, allow_agent=False, timeout=30)
sftp = client.open_sftp()

os.makedirs('test-data/episodes', exist_ok=True)

# (remote path on the IX server, local destination)
downloads = [
    ('/home/gurushow/public_html/archive/2011/3-12-11 HR 1.mp3',        'test-data/episodes/2011-03-12-hr1.mp3'),
    ('/home/gurushow/public_html/archive/2012/3 - March/3-10-12HR1.mp3','test-data/episodes/2012-03-10-hr1.mp3'),
    ('/home/gurushow/public_html/archive/2012/6 - June/6-9-12-HR1.mp3', 'test-data/episodes/2012-06-09-hr1.mp3'),
    ('/home/gurushow/public_html/archive/2014/06/s6e19.mp3',            'test-data/episodes/2014-s6e19.mp3'),
    ('/home/gurushow/public_html/archive/2016/06/s8e43.mp3',            'test-data/episodes/2016-s8e43.mp3'),
    ('/home/gurushow/public_html/archive/2017/04/s9e30.mp3',            'test-data/episodes/2017-s9e30.mp3'),
]

for remote, local in downloads:
    size_mb = sftp.stat(remote).st_size / 1024 / 1024
    print(f'Downloading {local} ({size_mb:.1f} MB)...', flush=True)
    sftp.get(remote, local)
    print('  done', flush=True)

sftp.close()
client.close()
print('All downloads complete.')
'@ | .venv\Scripts\python -
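
Before moving on, it can help to confirm that all six files landed. The sketch below uses the local file names from the download list above; `missing_episodes` is a hypothetical helper, and it treats a zero-byte file as missing on the assumption that it indicates an interrupted transfer.

```python
from pathlib import Path

# Local file names taken from the download list in Step 4
EXPECTED_EPISODES = [
    "2011-03-12-hr1.mp3",
    "2012-03-10-hr1.mp3",
    "2012-06-09-hr1.mp3",
    "2014-s6e19.mp3",
    "2016-s8e43.mp3",
    "2017-s9e30.mp3",
]


def missing_episodes(episode_dir="test-data/episodes"):
    """Return expected episode files that are absent or zero bytes."""
    base = Path(episode_dir)
    missing = []
    for name in EXPECTED_EPISODES:
        path = base / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    gaps = missing_episodes()
    print("All 6 episodes present" if not gaps else f"Missing or empty: {gaps}")
```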

Step 5 — Run benchmark

.venv\Scripts\python benchmark.py

This diarizes all 6 test episodes, prints per-episode timing, and compares to the 5070 Ti baseline.

Step 6 — Report results

Post the benchmark output in the session log or share back to DESKTOP-0O8A1RL.

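The key number to compare: total realtime factor against the 5070 Ti baseline (149.5x originally; BASELINE_RTF in benchmark.py is now 209.7x, the post-overhaul measurement).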
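
For reference, the comparison arithmetic can be sketched as follows. It assumes "x realtime" means audio duration divided by processing time, which is the usual reading but is not confirmed against benchmark.py; both function names are hypothetical. The inputs in the example are the figures from the commit notes.

```python
def realtime_factor(audio_seconds, wall_seconds):
    """Audio duration divided by processing time (assumed definition of 'x realtime')."""
    return audio_seconds / wall_seconds


def percent_change(new, baseline):
    """Relative change of `new` against `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Figures from the commit notes: 5070 Ti baseline -> 4090 result
    print(f"Diarization:   {percent_change(338.1, 209.7):+.1f}%")  # +61.2%
    print(f"Transcription: {percent_change(94.8, 63.8):+.1f}%")    # +48.6%
```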

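Also note any Q&A pair count differences. With the same voice profiles, the same episodes are expected to produce the same pairs on both machines. The 4090 run logged 9 pairs vs 10, which was treated as run-to-run noise, so check structural correctness (which episodes have caller pairs) rather than the raw total alone.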
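
One way to spot differences is to compare per-episode pair counts from the two machines. The sketch below assumes the counts have already been collected into dictionaries keyed by episode. The benchmark's actual output format is not shown in this document, so `pair_count_differences` and the sample values are purely illustrative.

```python
def pair_count_differences(local_counts, baseline_counts):
    """Map episode -> (local, baseline) for each episode whose counts differ.

    An episode present on only one side is reported with None for the other.
    """
    episodes = sorted(set(local_counts) | set(baseline_counts))
    return {
        episode: (local_counts.get(episode), baseline_counts.get(episode))
        for episode in episodes
        if local_counts.get(episode) != baseline_counts.get(episode)
    }


if __name__ == "__main__":
    # Illustrative values only, not measured results
    local = {"episode-a": 0, "episode-b": 2}
    baseline = {"episode-a": 0, "episode-b": 3}
    print(pair_count_differences(local, baseline))
```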
