Commit Graph

5 Commits

Author SHA1 Message Date
79abef9dc9 radio: diarization pipeline fixes, benchmark setup, test episode set
- Fix voice_profiler threshold bug (HOST label overwrote Unknown unconditionally)
- Audio preload optimization: single ffmpeg per episode, 149.5x realtime on 5070 Ti
- WavLM threshold raised to 0.85 (Mike 0.90-0.99, callers 0.46-0.83)
- Promo/bumper filter: weighted signature scoring, 42->27 clean Q&A pairs
- Text-only Q&A fallback for episodes with no CALLER diarization labels
- TRANSFORMERS_OFFLINE=1 to skip HuggingFace freshness checks
- Add diarize_2018.py for targeted re-run + FTS5 rebuild
- Add benchmark.py + BENCH_SETUP.md for GURU-BEAST-ROG (RTX 4090) comparison
- Commit 9-episode training diarization.json outputs
- Session log: 2026-04-27-diarization-pipeline.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-27 13:20:40 -07:00
26dbe2327f sync: Auto-sync from GURU-BEAST-ROG at 2026-04-26 15:09:57
Synced files:
- Session logs updated
- Latest context and credentials
- Command/directive updates

Machine: GURU-BEAST-ROG
Timestamp: 2026-04-26 15:09:57

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-04-26 15:10:06 -07:00
9011670fce sync: Auto-sync from GURU-BEAST-ROG at 2026-03-25 03:45:04
Synced files:
- Session logs updated
- Latest context and credentials
- Command/directive updates

Machine: GURU-BEAST-ROG
Timestamp: 2026-03-25 03:45:04

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-25 03:46:07 -07:00
ad88fc31f0 sync: Auto-sync from acg-guru-5070 at 2026-03-22 22:31:46
Synced files:
- Session logs updated
- Latest context and credentials
- Command/directive updates

Machine: acg-guru-5070
Timestamp: 2026-03-22 22:31:46

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-22 22:31:46 -07:00
a3a47f2d5e Add batch transcription scripts and 8 episode transcripts
Created Mac M4 batch transcription using mlx-whisper with Apple Silicon
GPU acceleration. Transcribed 8 remaining episodes (17,555 total segments).

Scripts:
- batch_transcribe_mac.py: Full batch processor with mlx-whisper
- test_mac_transcribe.py: Quick test script for faster-whisper

Transcripts (JSON, SRT, TXT formats):
- 2011-06-04-hr1: 1,503 segments
- 2011-09-10-hr1: 1,378 segments
- 2014-s6e05: 1,340 segments
- 2015-s7e30: 1,053 segments
- 2016-s8e42: 2,205 segments
- 2017-s9e26: 2,366 segments
- 2018-s10e17: 4,683 segments
- 2018-s10e21: 2,493 segments

All 9 episodes now transcribed (8 on Mac + 1 from Linux).
Ready for Stages 3-6 on Linux PC.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-21 23:12:06 -07:00