Commit Graph

8 Commits

Author SHA1 Message Date
5e6ec54614 sync: Auto-sync from acg-guru-5070 at 2026-03-22 22:31:46
Synced files:
- Session logs updated
- Latest context and credentials
- Command/directive updates

Machine: acg-guru-5070
Timestamp: 2026-03-22 22:31:46

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-22 22:31:46 -07:00
01a97db3fe Add batch transcription scripts and 8 episode transcripts
Created Mac M4 batch transcription using mlx-whisper with Apple Silicon
GPU acceleration. Transcribed 8 remaining episodes (17,555 total segments).

Scripts:
- batch_transcribe_mac.py: Full batch processor with mlx-whisper
- test_mac_transcribe.py: Quick test script for faster-whisper

Transcripts (JSON, SRT, TXT formats):
- 2011-06-04-hr1: 1,503 segments
- 2011-09-10-hr1: 1,378 segments
- 2014-s6e05: 1,340 segments
- 2015-s7e30: 1,053 segments
- 2016-s8e42: 2,205 segments
- 2017-s9e26: 2,366 segments
- 2018-s10e17: 4,683 segments
- 2018-s10e21: 2,493 segments

All 9 episodes now transcribed (8 on Mac + 1 from Linux).
Ready for Stages 3-6 on Linux PC.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-21 23:12:06 -07:00
301e90a399 Audio processor: add Mac build task for voice training
GPU firmware bug (NVRM 0x00000062) on RTX 5070 Ti makes
GPU transcription impossible. Handoff doc for Mac M4 to
build native version and complete the 8 remaining episode
transcriptions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 17:44:52 -07:00
a1f4c43ee7 sync: Auto-sync from acg-guru-5070 at 2026-03-21 16:34:05
Synced files:
- Session logs updated
- Latest context and credentials
- Command/directive updates

Machine: acg-guru-5070
Timestamp: 2026-03-21 16:34:05

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-21 16:34:05 -07:00
49560de424 Audio processor: validated voice profiling accuracy, tuned threshold
- Fine-grained speaker analysis (3s windows, 1s hop) across 42min episode
- Host voice: 0.90-0.98 similarity (clear positive match)
- Callers: 0.65-0.68 (correctly below threshold)
- Produced audio/clips: 0.53-0.65 (correctly identified as non-host)
- Co-host/other speakers: 0.56-0.62 (correctly identified)
- Tuned host_match_threshold from 0.75 to 0.83 based on empirical data
- Cross-referenced dips with transcript: correctly identifies callers,
  show intros, played audio clips, and station breaks
- Batch transcription of 7 additional training episodes in progress

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 12:48:25 -07:00
2559935293 Audio processor: working voice profiler with WavLM speaker embeddings
- Voice profiler using microsoft/wavlm-base-sv (512-dim x-vector embeddings)
- Bootstrap from archive: 180 embeddings from 9 episodes across 2010-2018
- Host identification accuracy: 0.87-0.98 similarity for live speech,
  0.60-0.64 for non-host audio (produced intros, co-host)
- Dropped speechbrain dependency (requires torchaudio, CUDA version conflicts)
- Patched torchaudio CUDA 12.8/13.1 version check (warning instead of error)
- Profile stored in voice-profiles/mike-swanson/ with per-chunk embeddings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 12:19:13 -07:00
1d73ff7137 Audio processor: fix segment detection with transcript-driven breaks
- Add transcript break phrase detection (going_to_break/coming_back cues)
- Create segments from transcript breaks with silence boundary snapping
- Fix segment dedup in merge_adjacent (handle overlapping segments)
- Add CUDA 12 library path fix (gpu.py + venv activate hook)
- Auto-load existing transcript in detect command
- Tested on 2011-03-05 HR1: correctly identifies commercial break at 34:38

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 11:59:54 -07:00
3cbb1b8aab Add radio show audio processor and post-show workflow
- Audio processor CLI tool with 6-stage pipeline: transcribe (faster-whisper GPU),
  diarize (pyannote), detect segments (multi-signal classifier), remove commercials,
  split segments, analyze content (Ollama)
- Post-show workflow doc for episode posts, forum threads, deep-dive blog posts
- Training plan for using 579-episode archive for voice profiles and commercial detection
- Successful test: 45min episode transcribed in 2:37 on RTX 5070 Ti
- Sample transcript output from S7E30 (March 2015)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 11:51:59 -07:00