sync: auto-sync from GURU-BEAST-ROG at 2026-04-27 14:42:18
Author: Mike Swanson Machine: GURU-BEAST-ROG Timestamp: 2026-04-27 14:42:18
# Session Log — 2026-04-27 (continuation)

**Project:** The Computer Guru Show — Archive Mining System
**Goal:** RTX 4090 perf comparison + run unseen test episodes through full pipeline (transcribe / diarize / Q&A)
**Machine:** GURU-BEAST-ROG (RTX 4090, 24GB)
**User:** Mike Swanson (mike)

Companion to `2026-04-27-diarization-pipeline.md` (DESKTOP-0O8A1RL, RTX 5070 Ti).

---

## Headline

**Diarization on RTX 4090: 308.9x realtime, 2.07x the RTX 5070 Ti baseline (149.5x).**

21,374s of audio across 6 unseen test episodes diarized in 69.2s wall time.

---

## Setup Notes

- ffmpeg/ffprobe were not installed on GURU-BEAST-ROG. Installed `Gyan.FFmpeg 8.1` via winget. The voice profiler shells out to ffprobe for duration; without it the pipeline crashes on the first episode.
- The repo already contained `benchmark.py` (transcribe + diarize + Q&A on `test-data/episodes/`, with the 5070 Ti baseline hardcoded). Used as-is. (`BENCH_SETUP.md` should list ffmpeg as a prerequisite.)
- Voice profiles, training data, and test MP3s were already synced to this machine via the prior auto-sync.
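
The ffprobe dependency could be checked up front so a missing install is reported before any episode is processed. The following is a minimal sketch, not the voice profiler's actual code: `require_tool` and `probe_duration_seconds` are hypothetical names, and the ffprobe arguments are one common way to read a file's duration.

```python
import shutil
import subprocess


def require_tool(name: str) -> str:
    """Return the full path to `name`, or raise if it is not on PATH."""
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(
            f"{name} not found on PATH; install ffmpeg first "
            "(e.g. `winget install Gyan.FFmpeg` on Windows)"
        )
    return path


def probe_duration_seconds(audio_path: str) -> float:
    """Ask ffprobe for the container duration of an audio file, in seconds."""
    ffprobe = require_tool("ffprobe")
    result = subprocess.run(
        [
            ffprobe,
            "-v", "error",
            "-show_entries", "format=duration",
            "-of", "default=noprint_wrappers=1:nokey=1",
            audio_path,
        ],
        capture_output=True,
        text=True,
        check=True,  # surface ffprobe failures instead of parsing empty output
    )
    return float(result.stdout.strip())
```

Calling `require_tool("ffprobe")` once at startup would turn the missing-binary case into a single readable error.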

---

## Phase 1 — Whisper Transcription (large-v3, faster-whisper)

| Episode | Audio | Wall | RTF |
|---|---|---|---|
| 2011-03-12-hr1 | 2509s | 198.2s | 12.7x |
| 2012-03-10-hr1 | 2634s | 208.7s | 12.6x |
| 2012-06-09-hr1 | 2648s | 192.5s | 13.8x |
| 2014-s6e19 | 2914s | 167.0s | 17.5x |
| 2016-s8e43 | 5326s | 339.1s | 15.7x |
| 2017-s9e30 | 5343s | 341.2s | 15.7x |
| **Total** | **21374s** | **1446.6s** | **14.8x** |

Faster-whisper large-v3, `beam_size=5`, fp16 on the 4090. RTF here is audio duration divided by wall time, so higher is faster.
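
For reference, a timed transcription call with these settings might look like the sketch below. It is not the code in `benchmark.py`: `transcribe_timed` is a hypothetical helper, and `device="cuda"` with `compute_type="float16"` is assumed to be what "fp16 on the 4090" corresponds to. Model loading happens before the timer starts, which may differ from how the table's wall times were measured.

```python
import time


def transcribe_timed(audio_path: str, model=None):
    """Transcribe one file; return (segments, audio_seconds, wall_seconds, speed)."""
    if model is None:
        # Imported lazily so this module loads on machines without faster-whisper.
        from faster_whisper import WhisperModel

        model = WhisperModel("large-v3", device="cuda", compute_type="float16")

    start = time.perf_counter()
    segments, info = model.transcribe(audio_path, beam_size=5)
    # transcribe() returns a lazy generator; consuming it is what runs decoding.
    segments = list(segments)
    wall = time.perf_counter() - start

    # Speed as a realtime multiple: audio seconds per wall-clock second.
    speed = info.duration / wall if wall > 0 else float("inf")
    return segments, info.duration, wall, speed
```

Passing an already loaded `model` lets several episodes share one model instance, so only the decoding is timed for each.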

---

## Phase 2 — Diarization

| Episode | Audio | Wall | RTF | Turns | HOST | CALLER |
|---|---|---|---|---|---|---|
| 2011-03-12-hr1 | 2509s | 16.1s | 155.6x | 19 | 2470s | 125s |
| 2012-03-10-hr1 | 2634s | 7.3s | 361.6x | 19 | 2615s | 105s |
| 2012-06-09-hr1 | 2648s | 7.8s | 338.3x | 11 | 2500s | 195s |
| 2014-s6e19 | 2914s | 8.3s | 352.6x | 28 | 2635s | 410s |
| 2016-s8e43 | 5326s | 14.7s | 361.8x | 112 | 4710s | 1170s |
| 2017-s9e30 | 5343s | 15.0s | 356.9x | 55 | 4950s | 660s |
| **Total** | **21374s** | **69.2s** | **308.9x** | 244 | 19880s | 2665s |

**vs RTX 5070 Ti baseline: 149.5x → 308.9x (+159.4x, +106.6%).**

Episode 1 carries the cold-start penalty (CUDA init + WavLM load) and runs at 155.6x. Warm episodes 2-6 cluster at 338-362x. The cold start pulls the aggregate down to 308.9x. The 5070 Ti measurement also included its first-episode cold start, so the comparison is fair.
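
The aggregate figures can be rechecked from the table. This sketch uses only the rounded audio and wall values shown above; the per-episode RTF column was presumably computed from unrounded timings, so recomputed per-episode values can differ in the last digit. The names are illustrative and not taken from `benchmark.py`.

```python
# (episode, audio seconds, diarization wall seconds), copied from the Phase 2 table.
EPISODES = [
    ("2011-03-12-hr1", 2509, 16.1),
    ("2012-03-10-hr1", 2634, 7.3),
    ("2012-06-09-hr1", 2648, 7.8),
    ("2014-s6e19", 2914, 8.3),
    ("2016-s8e43", 5326, 14.7),
    ("2017-s9e30", 5343, 15.0),
]
BASELINE_5070_TI = 149.5  # aggregate realtime multiple reported for the 5070 Ti


def realtime_multiple(rows):
    """Aggregate speed: total audio seconds divided by total wall seconds."""
    audio = sum(a for _, a, _ in rows)
    wall = sum(w for _, _, w in rows)
    return audio / wall


overall = realtime_multiple(EPISODES)    # includes the cold-start episode
warm = realtime_multiple(EPISODES[1:])   # episodes 2-6 only
speedup = overall / BASELINE_5070_TI

print(f"overall {overall:.1f}x, warm-only {warm:.1f}x, vs 5070 Ti {speedup:.2f}x")
```

By this arithmetic the warm-only aggregate comes out near 355x, which is a rough upper bound on what a longer batch on this machine would approach as the cold start is amortized.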

---

## Phase 3 — Q&A Extraction

| Episode | Q&A pairs |
|---|---|
| 2011-03-12-hr1 | 3 |
| 2012-03-10-hr1 | 2 |
| 2012-06-09-hr1 | 3 |
| 2014-s6e19 | 1 |
| 2016-s8e43 | 5 |
| 2017-s9e30 | 5 |
| **Total** | **19** |

Density: **3.2 pairs/episode** on the unseen test set (19 pairs over 6 episodes) vs **3.0 pairs/episode** on the 9-episode training set (27 pairs). Pair density on unseen episodes matches the training set, so there is no sign of overfitting, and the promo/bumper filter from the earlier session continues to suppress false positives on unseen content.

The 2014-s6e19 outlier (1 pair / 410s caller time) likely reflects show content rather than a pipeline issue: caller segments don't always parse cleanly into Q-then-A structure. Worth ear-checking that one before drawing conclusions.
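
One way to put the outlier in context is to normalize pair counts by caller time. The sketch below combines the CALLER column from Phase 2 with the pair counts from Phase 3; the pairs-per-caller-minute metric is an illustration added here, not something the pipeline reports.

```python
# (episode, CALLER seconds from the Phase 2 table, Q&A pairs from the Phase 3 table)
ROWS = [
    ("2011-03-12-hr1", 125, 3),
    ("2012-03-10-hr1", 105, 2),
    ("2012-06-09-hr1", 195, 3),
    ("2014-s6e19", 410, 1),
    ("2016-s8e43", 1170, 5),
    ("2017-s9e30", 660, 5),
]

pairs_per_episode = sum(p for _, _, p in ROWS) / len(ROWS)

# Pairs per minute of caller speech, one value per episode.
per_caller_minute = {name: pairs / (caller_s / 60) for name, caller_s, pairs in ROWS}
lowest = min(per_caller_minute, key=per_caller_minute.get)

print(f"{pairs_per_episode:.1f} pairs/episode")
for name, rate in sorted(per_caller_minute.items(), key=lambda kv: kv[1]):
    print(f"{name}: {rate:.2f} pairs per caller-minute")
```

On these numbers 2014-s6e19 has the lowest rate, with 2016-s8e43 next, so both may be worth a listen when ear-checking.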

---

## Generalization Findings

- **Untrained year:** The two 2012 episodes (a year never seen during training) produced clean HOST/CALLER labels and reasonable Q&A counts. The voice profile composite generalizes across the production-era boundary.
- **No all-HOST failures:** Every test episode produced CALLER segments. The 0.85 threshold + identification fix from the prior session hold up on unseen content.
- **Show duration scaling:** Both 89-minute episodes (s8e43, s9e30) hit ~360x realtime despite very different turn counts (112 vs 55), indicating diarization wall time is dominated by audio duration, not turn count.

---

## Files Written

- `test-data/transcripts/<stem>/transcript.json` (6 files)
- `test-data/transcripts/<stem>/diarization.json` (6 files)
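
A quick completeness check over this layout could look like the following sketch. It assumes each `<stem>` matches the episode label used in the tables above, which the log does not confirm; `missing_outputs` is a hypothetical helper.

```python
from pathlib import Path

# Assumption: directory stems equal the episode labels from the result tables.
STEMS = [
    "2011-03-12-hr1",
    "2012-03-10-hr1",
    "2012-06-09-hr1",
    "2014-s6e19",
    "2016-s8e43",
    "2017-s9e30",
]
EXPECTED_FILES = ("transcript.json", "diarization.json")


def missing_outputs(root, stems=STEMS):
    """Return the expected output paths under `root` that do not exist as files."""
    root = Path(root)
    return [
        root / stem / name
        for stem in stems
        for name in EXPECTED_FILES
        if not (root / stem / name).is_file()
    ]
```

Running `missing_outputs("test-data/transcripts")` from the repo root should return an empty list if all 12 files were written.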

No archive DB on this machine, so the test-set diarization has not been patched into any database. If we want the test episodes searchable in `archive.db`, that would happen on DESKTOP-0O8A1RL where the index lives.

---

## Note for Mike

`BENCH_SETUP.md` Step 2 (Python environment) should add `winget install Gyan.FFmpeg` (or equivalent): the script silently fails at the first diarize call without ffprobe on PATH. Easy doc fix; flagging here so it doesn't get lost.