Created Mac M4 batch transcription using mlx-whisper with Apple Silicon
GPU acceleration. Transcribed 8 remaining episodes (17,555 total segments).
Scripts:
- batch_transcribe_mac.py: Full batch processor with mlx-whisper
- test_mac_transcribe.py: Quick test script for faster-whisper
Transcripts (JSON, SRT, TXT formats):
- 2011-06-04-hr1: 1,503 segments
- 2011-09-10-hr1: 1,378 segments
- 2014-s6e05: 1,340 segments
- 2015-s7e30: 1,053 segments
- 2016-s8e42: 2,205 segments
- 2017-s9e26: 2,366 segments
- 2018-s10e17: 4,683 segments
- 2018-s10e21: 2,493 segments
All 9 episodes now transcribed (8 on Mac + 1 from Linux).
Ready for Stages 3-6 on Linux PC.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fine-grained speaker analysis (3s windows, 1s hop) across 42min episode
- Host voice: 0.90-0.98 similarity (clear positive match)
- Callers: 0.65-0.68 (correctly below threshold)
- Produced audio/clips: 0.53-0.65 (correctly identified as non-host)
- Co-host/other speakers: 0.56-0.62 (correctly identified)
- Tuned host_match_threshold from 0.75 to 0.83 based on empirical data
- Cross-referenced dips with transcript: correctly identifies callers,
show intros, played audio clips, and station breaks
- Batch transcription of 7 additional training episodes in progress
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>