Added 2010, 2015, 2018 test episodes to round out the test set to one
per available year:

- 2010-05-08-hr1 (May 2010, earliest available; pre-Tara era)
- 2015-s7e19 (Jan 2015, avoids training's s7e30)
- 2018-s10e18 (only 3 non-training 2018 episodes exist)

Archive has no 2019 directory, so Rob's "2018/2019 appearances" are
constrained to the 5 available 2018 episodes only.

Per-year diarization summary (Tara presence, post-rename):

    2010-05-08     30s   1.2%  likely false positive (pre-Tara)
    2011-03-12    140s   5.6%  likely false positive (call-in only)
    2012-03-10     30s   1.1%  likely false positive (call-in only)
    2012-06-09    340s  12.8%  suspicious; Mike to confirm
    2014-s6e19    680s  23.3%  confirmed
    2015-s7e19    280s   9.9%  plausible; Mike to confirm
    2016-s8e43   1890s  35.5%  confirmed
    2017-s9e30    610s  11.4%  plausible
    2018-s10e18   880s  17.1%  COULD BE ROB: Mike flagged Rob for
                               2018/2019 appearances; cosine threshold may
                               be hitting on Rob being acoustically similar
                               to Tara

Total Tara across 9 episodes: 1h 21m / 8h 52m audio (15.3%).

Q&A counts (still suspect: every voice that isn't Mike-or-Tara is
labeled CALLER, so Randall/Rob/producers inflate the bucket):

    2010=4, 2011=1, 2012a=2, 2012b=0, 2014=0, 2015=1, 2016=2, 2017=4, 2018=3
    Total: 17 pairs across 9 episodes

4090 perf on the expanded set:

- Diarization: 31928s in 121.5s = 262.7x realtime (vs 209.7x on 5070 Ti, +25.3%)
- Transcription (3 new episodes only): 10554s in 112.4s = 93.9x

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

# Session Log: 2026-04-27 (continuation)

**Project:** The Computer Guru Show - Archive Mining System
**Goal:** RTX 4090 perf comparison + run unseen test episodes through full pipeline (transcribe / diarize / Q&A)
**Machine:** GURU-BEAST-ROG (RTX 4090, 24GB)
**User:** Mike Swanson (mike)

Companion to:
- `2026-04-27-diarization-pipeline.md` (DESKTOP-0O8A1RL, RTX 5070 Ti; initial diarization fixes)
- `2026-04-27-qa-extraction-cohost-indexing.md` (DESKTOP-0O8A1RL; co-host profile, batched Whisper, Q&A overhaul)

This run uses the post-overhaul code (commit `e9ac607`): batched Whisper transcription, co-host-aware diarizer, revised Q&A extractor.

---
## Headline

| Metric | 5070 Ti baseline | RTX 4090 | Delta |
|---|---|---|---|
| Diarization | 209.7x realtime | **338.1x** | +128.4x (+61.2%) |
| Transcription (batched, large-v3 int8_float16) | 63.8x | **94.8x** | +31.0x (+48.6%) |
| Q&A pairs (6 test episodes) | 10 | 9 | within noise |

21,374s of audio (5h 56m) end-to-end on the 4090: **225.5s transcription + 63.2s diarization + Q&A extraction**.

---
## Co-host identity correction: Tara, not Tom

The 5070 Ti session fabricated a co-host named "Tom"; Mike confirmed there is no such person on the show. After listening to the source windows, Mike identified the voice in both 2014-s6e19 and 2016-s8e43 as **Tara** (a real co-host; the show has had multiple over the years).

Rename swept this session:
- `voice-profiles/tom/` → `voice-profiles/tara/` (git mv, all 44 embeddings + composite preserved)
- `voice-profiles/profiles.json`: `"Tom"` key → `"Tara"`
- `build_cohost_profile.py`: docstring, `TOM_WINDOWS` → `TARA_WINDOWS`, `COHOST_NAME = "Tara"`, console output strings
- `projects/radio-show/session-logs/2026-04-27-qa-extraction-cohost-indexing.md`: correction header added, all body references updated
- `.claude/memory/radio_show_no_cohost_named_tom.md`: resolution recorded
- Diarization re-run post-rename so `speaker_map` in each `diarization.json` emits `Cohost: Tara`

The 5070 Ti session log's claim that "Tom was the regular co-host roughly 2013-2016" carried two errors: the wrong name and an unverified tenure window. The corrected log states only that Tara is confirmed in 2014-s6e19 and 2016-s8e43; generalizing to the full 2013-2016 era hasn't been confirmed.

---
## Setup notes (for next machine)

- ffmpeg/ffprobe is required on PATH: the voice profiler shells out to ffprobe for audio duration, and without it the pipeline crashes on the first diarize call. It was missing on this machine; installed via `winget install Gyan.FFmpeg`. BENCH_SETUP.md updated to call this out as a Step-2 prereq.
- `.gitignore` (added in `e9ac607`) excludes `episodes/`, `transcripts/`, `*.db`, `.venv`. The test MP3s + transcripts I committed earlier in `2c06e72` are still tracked because they predate the gitignore; they can be removed from the index with `git rm --cached` in a follow-up cleanup.
- All voice profiles, training data, and test MP3s were already on this machine via prior auto-sync.
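
The ffprobe dependency in the first bullet can be checked up front, so a missing install shows up as a readable error instead of a crash on the first diarize call. A minimal sketch, assuming Python: `missing_tools`, `require_tools`, and `probe_duration_seconds` are hypothetical names, not functions from this repo, and the real profiler may invoke ffprobe differently. The intended use is to call `require_tools()` once at pipeline start.

```python
import shutil
import subprocess


def missing_tools(tools=("ffmpeg", "ffprobe")):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def require_tools(tools=("ffmpeg", "ffprobe")):
    """Fail fast with a readable message if a required tool is missing."""
    missing = missing_tools(tools)
    if missing:
        raise RuntimeError(
            "Missing on PATH: " + ", ".join(missing)
            + " (on Windows: winget install Gyan.FFmpeg)"
        )


def probe_duration_seconds(path):
    """Ask ffprobe for the container duration of an audio file, in seconds."""
    result = subprocess.run(
        [
            "ffprobe", "-v", "error",
            "-show_entries", "format=duration",
            "-of", "default=noprint_wrappers=1:nokey=1",
            str(path),
        ],
        capture_output=True, text=True, check=True,
    )
    return float(result.stdout.strip())
```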

---

## Phase 1: Whisper Transcription (large-v3, batched, int8_float16, batch_size=16)

| Episode | Audio | Wall | RTF |
|---|---|---|---|
| 2011-03-12-hr1 | 2509s | 29.7s | 84.6x |
| 2012-03-10-hr1 | 2634s | 30.3s | 87.0x |
| 2012-06-09-hr1 | 2648s | 33.6s | 78.8x |
| 2014-s6e19 | 2914s | 30.2s | 96.6x |
| 2016-s8e43 | 5326s | 49.2s | 108.2x |
| 2017-s9e30 | 5343s | 52.5s | 101.8x |
| **Total** | **21374s** | **225.5s** | **94.8x** |

vs 5070 Ti's 63.8x: **+48.6%**.

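
For reference, the aggregate figures above come from total audio divided by total wall time, not from averaging the per-episode RTFs. The sketch below redoes that arithmetic with the table's values; the function names are illustrative and not taken from `benchmark.py`. Per-episode RTFs recomputed this way can differ from the table in the last digit, because the wall times shown are rounded.

```python
# (episode, audio seconds, wall seconds) copied from the Phase 1 table
EPISODES = [
    ("2011-03-12-hr1", 2509, 29.7),
    ("2012-03-10-hr1", 2634, 30.3),
    ("2012-06-09-hr1", 2648, 33.6),
    ("2014-s6e19", 2914, 30.2),
    ("2016-s8e43", 5326, 49.2),
    ("2017-s9e30", 5343, 52.5),
]
BASELINE_RTF = 63.8   # 5070 Ti, batched transcription
PRE_BATCH_RTF = 14.8  # same 4090, pre-batched code path


def realtime_factor(audio_s, wall_s):
    """Seconds of audio processed per second of wall-clock time."""
    return audio_s / wall_s


def pct_delta(new, base):
    """Relative change of `new` over `base`, in percent."""
    return (new - base) / base * 100.0


total_audio = sum(audio for _, audio, _ in EPISODES)
total_wall = sum(wall for _, _, wall in EPISODES)
aggregate_rtf = realtime_factor(total_audio, total_wall)

print(f"aggregate RTF: {aggregate_rtf:.1f}x")                             # 94.8x
print(f"vs 5070 Ti:    {pct_delta(aggregate_rtf, BASELINE_RTF):+.1f}%")   # +48.6%
print(f"vs pre-batch:  {aggregate_rtf / PRE_BATCH_RTF:.1f}x faster")      # 6.4x faster
```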

Batching is doing real work here. The pre-batched code path on this same hardware (first benchmark run earlier today) ran at 14.8x, so batching gave a 6.4× speedup on the 4090.

---
## Phase 2: Diarization (with co-host profile applied)

| Episode | Audio | Wall | RTF | Turns | HOST | CALLER |
|---|---|---|---|---|---|---|
| 2011-03-12-hr1 | 2509s | 9.1s | 275.0x | 25 | 2455s | 70s |
| 2012-03-10-hr1 | 2634s | 7.6s | 348.3x | 22 | 2615s | 90s |
| 2012-06-09-hr1 | 2648s | 7.7s | 343.1x | 13 | 2500s | 10s |
| 2014-s6e19 | 2914s | 8.3s | 352.6x | 31 | 2625s | 30s |
| 2016-s8e43 | 5326s | 15.1s | 353.6x | 134 | 4615s | 140s |
| 2017-s9e30 | 5343s | 15.5s | 345.1x | 69 | 4945s | 350s |
| **Total** | **21374s** | **63.2s** | **338.1x** | 294 | 19755s | 690s |

**vs 5070 Ti baseline: 209.7x → 338.1x (+61.2%).**

Per-episode RTFs cluster tightly at 343-354x for the five warm episodes; the first episode carries the cold-start penalty at 275.0x. This is apples-to-apples with the 5070 Ti measurement, which also includes a cold start.

Aggregate CALLER time dropped from 2665s (pre-co-host pipeline, run earlier today) to 690s. That ~2000s delta is the second-voice signal correctly being routed away from the CALLER bucket. The benchmark table only sums HOST + CALLER, so CO-HOST seconds aren't shown in the totals; they are present in the per-episode `diarization.json` files.

---
## Phase 3: Q&A Extraction (post-overhaul: turn-based lookback, 4s CALLER preference, expanded promo signatures)

| Episode | 4090 Q&A pairs | 5070 Ti reference | Note |
|---|---|---|---|
| 2011-03-12-hr1 | 1 | 3 | -2 |
| 2012-03-10-hr1 | 2 | 1 | +1 |
| 2012-06-09-hr1 | 0 | 1 | -1 |
| 2014-s6e19 | 0 | 0 | match (gaming, no callers) |
| 2016-s8e43 | 2 | 2 | match (WiFi caller) |
| 2017-s9e30 | 4 | 3 | +1 |
| **Total** | **9** | **10** | **-1** |

Differences are within noise. Likely sources:
- Whisper batched inference can produce slightly different segment boundaries on identical audio under different GPU scheduling orders.
- Sliding-window diarization midpoint resolution can put a borderline segment in either bucket on different runs.
- Q&A extraction thresholds are sensitive to small boundary shifts.

**The two structural correctness signals match**: 2014 = 0 (no callers in gaming special) and 2016 = 2 (real WiFi caller, two-turn). That's the meaningful test. Aggregate ±1 across six episodes is acceptable run-to-run drift.

---
## Files written / modified

- `test-data/transcripts/<stem>/transcript.json` (6, regenerated with batched Whisper)
- `test-data/transcripts/<stem>/diarization.json` (6, regenerated with co-host-aware diarizer)
- `benchmark.py` line 27: `BASELINE_RTF` updated 149.5 → 209.7
- `BENCH_SETUP.md`: added ffmpeg prereq to Step 2
- `.claude/memory/radio_show_no_cohost_named_tom.md` (new, project memory)
- `.claude/memory/MEMORY.md` (index updated)

`archive.db` is not on this machine; the index update happens on DESKTOP-0O8A1RL.

---
## Per-year test set (one episode per year, expanded)

Mike asked to expand from the original 6 to one episode per year. Added:
- 2010: `2010-05-08-hr1.mp3` (May 2010, earliest available; avoids training's Oct 2)
- 2015: `2015-s7e19.mp3` (Jan 2015; avoids training's s7e30)
- 2018: `2018-s10e18.mp3` (only 3 non-training episodes exist for 2018)

Archive has no 2019 directory (years 2010-2018, no 2013 either), so any check of Rob's "2018/2019 appearances" is limited to the 5 available 2018 episodes.

### Diarization across all 9 episodes

| Year | Episode | Audio (mm:ss) | Tara (mm:ss) | Tara % | HOST | CALLER (suspect) | Q&A |
|---|---|---|---|---|---|---|---|
| 2010 | 05-08-hr1 | 42:57 | 0:30 | 1.2% | 2325s | **355s** | 4 |
| 2011 | 03-12-hr1 | 41:49 | 2:20 | 5.6% | 2455s | 70s | 1 |
| 2012 | 03-10-hr1 | 43:54 | 0:30 | 1.1% | 2615s | 90s | 2 |
| 2012 | 06-09-hr1 | 44:08 | 5:40 | 12.8% | 2500s | 10s | 0 |
| 2014 | s6e19 | 48:34 | 11:20 | 23.3% | 2625s | 30s | 0 |
| 2015 | s7e19 | 47:13 | 4:40 | 9.9% | 2690s | 45s | 1 |
| 2016 | s8e43 | 88:46 | 31:30 | 35.5% | 4615s | 140s | 2 |
| 2017 | s9e30 | 89:03 | 10:10 | 11.4% | 4945s | 350s | 4 |
| 2018 | s10e18 | 85:45 | 14:40 | 17.1% | 4745s | 230s | 3 |
| **Total** | | **8h 52m** | **1h 21m** | **15.3%** | | **1320s** | **17** |

### Read on each row

| Episode | Tara reading |
|---|---|
| 2010-05-08-hr1 | likely false positive (30s); 2010 was pre-Tara; could be Randall or a producer |
| 2011-03-12-hr1 | likely false positive; 2011 was pure call-in per Mike |
| 2012-03-10-hr1 | likely false positive; 2012 was pure call-in per Mike |
| 2012-06-09-hr1 | suspicious (5:40 is too much for noise); pending Mike spot-check |
| 2014-s6e19 | confirmed Tara |
| 2015-s7e19 | substantial (4:40); plausibly Tara was on early 2015; Mike to confirm |
| 2016-s8e43 | confirmed Tara |
| 2017-s9e30 | plausible Tara (or another co-host); Mike to confirm |
| 2018-s10e18 | **could be Rob, not Tara**: Mike flagged Rob for 2018/2019 appearances. The cosine threshold may be hitting because the two co-hosts have similar acoustic properties. Worth Mike sampling. |

### Q&A counts caveat

The Q&A column is still suspect because **every voice that isn't Mike-or-Tara is labeled CALLER**, including Randall, Rob, and any on-air producer (Andrew/Shannon/Ken/etc). The 2010 episode in particular shows 355s CALLER and 4 Q&A pairs, but per Mike's roster that CALLER bucket likely includes a co-host or producer, not real callers. Spot-check before treating early-years Q&A as ground truth.

**Mike's broader correction (2026-04-27):**
- **Co-hosts** rotated through over the years. Confirmed: Tara, Randall (early years), Rob (early years + occasional 2018/2019).
- **Producers / board ops** would sometimes go on-air. Named so far: Andrew, Shannon, Ken, plus "a couple more" Mike doesn't recall off-hand.

Of all these, only Tara has a voice profile. Every other co-host and every producer-on-air moment in the archive is currently being labeled CALLER, which inflates Q&A false positives in those eras and episodes.

The small Tara percentages in 2011/2012 (1-13%) most likely reflect the 0.85 cosine threshold hitting on a similar-sounding speaker who isn't actually Tara: possibly a producer (Andrew/Shannon/Ken/etc) or another early-years voice we haven't catalogued. Worth Mike sampling these short windows to identify the speaker before deciding between false positive and producer.

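
To illustrate why a fixed cosine threshold can produce these small false-positive Tara windows, and why every unprofiled voice lands in CALLER, here is a simplified sketch. It assumes each segment is reduced to one embedding vector and compared against one composite vector per profiled speaker. `label_segment` and the three-dimensional toy vectors are illustrative only; the real diarizer's embeddings are much higher-dimensional and its matching logic may differ. A voice that merely points in a similar direction to Tara's composite clears 0.85 and is labeled Tara, while an unprofiled voice falls through to the CALLER fallback.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def label_segment(embedding, profiles, threshold=0.85, fallback="CALLER"):
    """Return the best-matching profile label at or above `threshold`,
    or `fallback` when no profile is close enough."""
    best_label, best_score = fallback, threshold
    for label, reference in profiles.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_label, best_score = label, score
    return best_label


# Toy composite vectors (assumed values, not real profile data).
profiles = {
    "HOST": [1.0, 0.0, 0.0],
    "Cohost: Tara": [0.0, 1.0, 0.0],
}

lookalike = [0.1, 0.9, 0.3]   # similar to Tara's composite, but a different speaker
unprofiled = [0.0, 0.0, 1.0]  # e.g. a producer with no voice profile

print(label_segment(lookalike, profiles))   # Cohost: Tara  (false positive)
print(label_segment(unprofiled, profiles))  # CALLER
```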

**Implication for full-archive runs:** before processing the 579-episode archive in earnest, build profiles for at least Randall, Rob, and the named producers. Otherwise the Q&A extraction across early-years and 2018/2019 episodes will inherit the same false-positive pattern that originally produced 12 bogus pairs in 2016-s8e43.

---
## Pending work (from 5070 Ti session, still unblocked)

1. **Resolve "Tom" identity** (resolved this session): Mike confirmed that the second voice in 2014-s6e19 and 2016-s8e43 is Tara. `voice-profiles/tom/` was renamed, `profiles.json` updated, and the labels in code fixed; see the co-host identity correction above.
2. **Full archive download**: 579 MP3s from IX server (~30-40GB). 4090 + Tailscale ready.
3. **Full pipeline run on archive**: at 338x diarization + 95x transcription, total wall time for ~30h of audio extrapolates to roughly 5 minutes diarization + 19 minutes transcription. Disk I/O may dominate.
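
The extrapolation in item 3 is audio duration divided by the realtime factor. A sketch of that arithmetic, taking the ~30h audio estimate and the measured aggregate RTFs from this log as inputs (`wall_minutes` is an illustrative helper, not project code); it ignores download time and disk I/O:

```python
def wall_minutes(audio_hours, rtf):
    """Wall-clock minutes needed to process `audio_hours` of audio at `rtf`x realtime."""
    return audio_hours * 3600.0 / rtf / 60.0


AUDIO_HOURS = 30.0        # archive estimate quoted in this log
DIARIZATION_RTF = 338.1   # Phase 2 aggregate
TRANSCRIPTION_RTF = 94.8  # Phase 1 aggregate

print(f"diarization:   {wall_minutes(AUDIO_HOURS, DIARIZATION_RTF):.1f} min")    # 5.3 min
print(f"transcription: {wall_minutes(AUDIO_HOURS, TRANSCRIPTION_RTF):.1f} min")  # 19.0 min
```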

---

## Note for Mike

- "Tom" was wrong; see the co-host identity correction above. Mike identified the voice as Tara, and the rename was done in one pass (directory, profiles.json, build_cohost_profile.py, the 5070 Ti session log, and a fresh diarization pass to update `speaker_map`).
- BENCH_SETUP.md got a one-paragraph ffmpeg prereq added at the top of Step 2.