Session log: audio processor tool, voice profiling, post-show workflow

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 13:42:54 -07:00
parent 6cc9043b8e
commit 37aaa6660b
1 changed files with 142 additions and 0 deletions
--- a/session-logs/2026-03-21-session.md
+++ b/session-logs/2026-03-21-session.md
@@ -433,3 +433,145 @@ Both speaker pairs: Active (tweeters + woofers scale together)
 2. **UCM2 package update risk** — Modified `/usr/share/alsa/ucm2/HDA/HiFi-analog.conf` will be overwritten by `alsa-ucm-conf` package updates
 3. **Kernel update risk** — AW88399 patch lost on kernel updates, rebuild needed
 4. **Cleanup** — `/home/guru/kernel-build/build/` still contains large build artifacts
+
+## Update: 13:45 — Post-Show Workflow, Audio Processor Tool, Voice Profiling
+
+### Session Summary
+Designed and built the post-show content workflow and the radio show audio processor tool. Built a working voice profiler whose speaker identification was validated against the 2011-03-05 HR1 transcript. The GPU entered an error state during batch processing and needs a reboot before work can resume.
+
+### Post-Show Workflow
+Created `projects/radio-show/post-show-workflow.md` — 3-phase content pipeline:
+1. **Phase 1:** Post-show debrief questionnaire (what aired vs. prep)
+2. **Phase 2:** Three tiers of content generation:
+   - Episode post for radio website (Astro markdown)
+   - Forum discussion thread (Flarum)
+   - Deep-dive blog posts (1-3 per episode, SEO-rich)
+3. **Phase 3:** Cross-promotion and engagement
+
+Identified 10 engagement gaps:
+1. No social media presence
+2. Newsletter is only a placeholder
+3. Podcast distribution is unclear
+4. No SEO structured data
+5. No audiograms
+6. No caller follow-up
+7. No weekly email
+8. Forum needs better seeding
+9. No guest pipeline
+10. No analytics-driven topic selection
+
+### Audio Processor Tool
+Built `projects/radio-show/audio-processor/`, a Python CLI tool with a 6-stage pipeline.
+
+**Architecture:**
+- Stage 1: Transcription (faster-whisper large-v3, GPU) — **WORKING**
+- Stage 2: Speaker diarization (WavLM x-vector embeddings) — **WORKING**
+- Stage 3: Segment detection (multi-signal classifier) — **WORKING (basic)**
+- Stage 4: Commercial removal (ffmpeg) — **CODED, untested**
+- Stage 5: Segment splitting (ffmpeg) — **CODED, untested**
+- Stage 6: Content analysis (Ollama) — **CODED, untested**
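A minimal sketch of one way the six stages could be chained. It assumes each stage takes and returns a shared context dict. The `Stage` class, `run_pipeline`, and the stop-before-untested behavior are hypothetical illustrations, not the tool's actual `src/cli.py` code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Context = Dict[str, object]  # hypothetical shared state handed from stage to stage


@dataclass
class Stage:
    number: int
    name: str
    tested: bool  # False for the "CODED, untested" stages
    run: Callable[[Context], Context]


def run_pipeline(stages: List[Stage], ctx: Context,
                 include_untested: bool = False) -> Tuple[List[str], Context]:
    """Run stages in numeric order, stopping before the first untested one unless told otherwise."""
    completed: List[str] = []
    for stage in sorted(stages, key=lambda s: s.number):
        if not stage.tested and not include_untested:
            break
        ctx = stage.run(ctx)  # each stage returns an updated context
        completed.append(stage.name)
    return completed, ctx
```

Gating on a `tested` flag would let batch runs go through transcription, speaker identification and basic segment detection. Stages 4-6 would stay out until they are verified.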
+
+**Key design decisions:**
+- Used WavLM (microsoft/wavlm-base-sv) instead of SpeechBrain ECAPA-TDNN because of a torchaudio/CUDA 13.1 incompatibility
+- Patched torchaudio's CUDA version check (12.8 vs 13.1) so the mismatch produces a warning instead of an error
+- faster-whisper's CTranslate2 backend needs CUDA 12 libraries; these come from Ollama's bundled copy (`/usr/local/lib/ollama/cuda_v12`)
+- Added to venv activate script: `export LD_LIBRARY_PATH="/usr/local/lib/ollama/cuda_v12:${LD_LIBRARY_PATH}"`
+- Element library designed as a learning system (seed known elements + discover unknown from archive)
+- Multi-signal commercial detector: fingerprints + speaker ID + audio characteristics + break patterns + structural heuristics
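The log lists the signals that feed the commercial detector but not how they are combined. One simple option, assumed in this sketch, is a weighted average of per-signal scores normalized to [0, 1]. The weights, the signal keys and the 0.5 threshold are hypothetical placeholders that would need tuning against labeled breaks.

```python
from typing import Dict, Mapping

# Hypothetical weights: the session log names these signals but not how they are combined.
DEFAULT_WEIGHTS: Dict[str, float] = {
    "fingerprint": 0.35,    # match against a known commercial or show element
    "speaker_id": 0.25,     # e.g. 1 - host similarity, so non-host speech raises the score
    "audio": 0.15,          # audio characteristics such as loudness or compression
    "break_pattern": 0.15,  # proximity to typical break times
    "structural": 0.10,     # structural heuristics such as segment length
}


def commercial_score(signals: Mapping[str, float],
                     weights: Mapping[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted mean of the signals that are present; each value is clamped to [0, 1]."""
    total = 0.0
    weight_sum = 0.0
    for name, weight in weights.items():
        if name in signals:
            total += weight * min(max(signals[name], 0.0), 1.0)
            weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


def is_commercial(signals: Mapping[str, float], threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: flag a segment when the fused score reaches the threshold."""
    return commercial_score(signals) >= threshold
```

The average runs only over the signals that are present. A segment with no fingerprint match is therefore still scored on the remaining evidence instead of being pulled toward zero.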
+
+**Python environment:**
+- Venv: `/home/guru/.local/share/radio-processor/`
+- Created with `--system-site-packages` (uses system python-pytorch-cuda 2.10.0)
+- Key packages: faster-whisper, transformers, soundfile, librosa, scikit-learn, ollama, rich, pyyaml
+- CRITICAL: Always activate with `source /home/guru/.local/share/radio-processor/bin/activate`; the activate script sets `LD_LIBRARY_PATH` for the CUDA 12 libraries
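Changing `os.environ["LD_LIBRARY_PATH"]` inside a running Python process does not change where the dynamic loader searches, because glibc reads that variable at process startup. That is why the activate script, not Python code, has to set it.

For launching child processes, a helper like the one below can build a suitable environment. The name `env_with_cuda12` is hypothetical. It prepends the Ollama CUDA 12 directory without duplicating it and drops empty entries.

```python
import os
from typing import Dict, Mapping, Optional

CUDA12_DIR = "/usr/local/lib/ollama/cuda_v12"  # directory named in the session log


def env_with_cuda12(base: Optional[Mapping[str, str]] = None,
                    lib_dir: str = CUDA12_DIR) -> Dict[str, str]:
    """Copy of `base` (default: os.environ) with lib_dir first in LD_LIBRARY_PATH."""
    env = dict(os.environ if base is None else base)
    # Split on ':' (the Linux separator) and drop empty entries.
    parts = [p for p in env.get("LD_LIBRARY_PATH", "").split(":") if p]
    if lib_dir in parts:
        parts.remove(lib_dir)  # move an existing entry to the front instead of duplicating it
    parts.insert(0, lib_dir)
    env["LD_LIBRARY_PATH"] = ":".join(parts)
    return env
```

Usage would look like `subprocess.run(["radio-process", "transcribe", path, "--output", out_dir], env=env_with_cuda12())`.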
+
+### Voice Profiling Results
+
+**Host profile: 180 embeddings from 9 episodes (2010-2018)**
+- Model: microsoft/wavlm-base-sv (512-dim x-vector)
+- Stored at: `voice-profiles/mike-swanson/embedding_0000.npy` through `embedding_0179.npy`
+- Composite embedding: `voice-profiles/mike-swanson/composite.npy`
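The log does not say how `composite.npy` was derived from the 180 per-window embeddings. A common approach, assumed in this sketch, has three steps:

1. L2-normalize each embedding.
2. Average the normalized embeddings.
3. Renormalize the mean so it can be compared by cosine similarity.

Plain Python lists stand in for the 512-dim NumPy arrays.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("zero-length embedding")
    return [x / norm for x in v]


def composite_embedding(embeddings: Sequence[Sequence[float]]) -> Vector:
    """Average unit-normalized embeddings, then renormalize the mean."""
    if not embeddings:
        raise ValueError("need at least one embedding")
    dim = len(embeddings[0])
    total = [0.0] * dim
    for emb in embeddings:
        if len(emb) != dim:
            raise ValueError("embedding dimensions differ")
        for i, x in enumerate(l2_normalize(emb)):
            total[i] += x
    return l2_normalize(total)
```

With the real files, the inputs would come from `numpy.load` on each `embedding_NNNN.npy`.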
+
+**Validated similarity to the host profile (fine-grained 3s windows on 2011-03-05 HR1):**
+- Host (Mike) voice: **0.90-0.98** similarity
+- Callers: **0.65-0.68**
+- Produced audio/voiceovers: **0.53-0.65**
+- Co-host (Ken): **0.56-0.62**
+- Threshold set to **0.83** (empirically determined)
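The scores above are presumably cosine similarities between a 3s window's embedding and the host composite; the log does not name the metric. Under that assumption, the 0.83 threshold becomes a one-line classifier. `cosine_similarity` and `is_host` are illustrative names.

```python
import math
from typing import Sequence

HOST_THRESHOLD = 0.83  # value from the session log


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def is_host(window_embedding: Sequence[float], composite: Sequence[float],
            threshold: float = HOST_THRESHOLD) -> bool:
    """Label a window as host speech when its similarity reaches the threshold."""
    return cosine_similarity(window_embedding, composite) >= threshold
```

A threshold of 0.83 sits 0.15 above the highest non-host score observed (callers at 0.68) and 0.07 below the lowest host score (0.90). On this episode it therefore errs toward rejecting borderline host windows rather than admitting non-host ones.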
+
+**Cross-referenced with transcript — correctly identified:**
+- Show intro voiceover (produced, 0.647)
+- Conan O'Brien clip played during show (0.528-0.655)
+- Station break/re-intro (0.547)
+- Caller asking about internet access (0.674)
+- Co-host discussing iPad (0.565-0.613)
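Cross-referencing with the transcript means aligning per-window scores to transcript segments. This sketch makes two assumptions:

- `transcript.json` holds segments with `start`/`end` in seconds. faster-whisper segments do expose `start`, `end` and `text`.
- Window scores are available as hypothetical `(start, end, similarity)` tuples.

Each segment then gets the overlap-weighted mean of the windows it intersects.

```python
from typing import Dict, List, Sequence, Tuple

Window = Tuple[float, float, float]  # hypothetical (start_s, end_s, host_similarity)


def label_segments(segments: Sequence[Dict], windows: Sequence[Window],
                   threshold: float = 0.83) -> List[Dict]:
    """Attach an overlap-weighted mean window similarity and a speaker label to each segment."""
    labeled = []
    for seg in segments:
        start, end = seg["start"], seg["end"]
        weighted, covered = 0.0, 0.0
        for w_start, w_end, score in windows:
            # Seconds of overlap between this window and the segment.
            overlap = min(end, w_end) - max(start, w_start)
            if overlap > 0:
                weighted += score * overlap
                covered += overlap
        mean = weighted / covered if covered else None
        speaker = None if mean is None else ("host" if mean >= threshold else "other")
        labeled.append({**seg, "host_similarity": mean, "speaker": speaker})
    return labeled
```

Segments with no overlapping windows come back with `speaker` set to `None` rather than being forced into a class.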
+
+### Training Data Downloaded
+**9 episodes from IX server archive (151MB total):**
+- `training-data/episodes/2010-10-02-hr1.mp3` (7.3MB, 44min)
+- `training-data/episodes/2011-06-04-hr1.mp3` (7.4MB)
+- `training-data/episodes/2011-09-10-hr1.mp3` (11MB)
+- `training-data/episodes/2014-s6e05.mp3` (9.5MB)
+- `training-data/episodes/2015-s7e30.mp3` (9.0MB, 45min)
+- `training-data/episodes/2016-s8e42.mp3` (19MB)
+- `training-data/episodes/2017-s9e26.mp3` (48MB)
+- `training-data/episodes/2018-s10e17.mp3` (21MB)
+- `training-data/episodes/2018-s10e21.mp3` (21MB)
+
+**Show production elements (training-data/elements/):**
+- Bumpers: 7 files (MP3+WAV) — cities_in_dust, ET_edit, rancid_riot, stereo_mc, Warnng, white_n_nerdy
+- Computer Guru Elements: 5 WAV (intro_beast, intro_kick_back, intro_or_outro, outro, SHOW INTRO)
+- Corrected Elements: 5 WAV (same with corrected phone number)
+- Permanent Elements: 7 WAV (az_comp_guru_spot, combos, streaming, promo_window)
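These elements are the seeds for the fingerprint signal. The log does not specify a fingerprinting method, so this sketch uses normalized cross-correlation as a simple stand-in. It can locate a known element in episode audio by scoring every alignment with a Pearson correlation, which makes overall gain differences irrelevant.

The limits of the sketch:

- It works on plain sample lists and costs O(n·m).
- Full episodes would need downsampled envelopes or an FFT-based correlation.
- It is sensitive to time-stretching and heavy re-encoding.

The function names and the 0.9 cutoff are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def normalized_xcorr(haystack: Sequence[float], needle: Sequence[float]) -> List[float]:
    """Pearson correlation of `needle` against every alignment in `haystack`."""
    m = len(needle)
    if m == 0 or m > len(haystack):
        return []
    n_mean = sum(needle) / m
    n_dev = [x - n_mean for x in needle]
    n_norm = math.sqrt(sum(d * d for d in n_dev))
    scores = []
    for offset in range(len(haystack) - m + 1):
        window = haystack[offset:offset + m]
        w_mean = sum(window) / m
        w_dev = [x - w_mean for x in window]
        w_norm = math.sqrt(sum(d * d for d in w_dev))
        if n_norm == 0 or w_norm == 0:
            scores.append(0.0)  # constant signal: correlation undefined, treat as no match
        else:
            scores.append(sum(a * b for a, b in zip(n_dev, w_dev)) / (n_norm * w_norm))
    return scores


def find_element(haystack: Sequence[float], needle: Sequence[float],
                 min_score: float = 0.9) -> List[Tuple[int, float]]:
    """Return (sample_offset, score) for every alignment scoring at least min_score."""
    return [(i, s) for i, s in enumerate(normalized_xcorr(haystack, needle)) if s >= min_score]
```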
+
+### Transcription Results
+**Completed (2 training episodes plus the 2011-03-05 validation episode):**
+- `training-data/transcripts/2010-10-02-hr1/transcript.json` (1.2MB, 534 segments)
+- `test-data/output/transcript.json` (2015-s7e30, 1.2MB, 746 segments)
+- `test-data/output-hr1/transcript.json` (2011-03-05, 1070 segments)
+
+**Failed due to GPU error state (6 episodes, rerun after reboot):**
+- 2011-06-04-hr1, 2014-s6e05, 2016-s8e42, 2017-s9e26, 2018-s10e17, 2018-s10e21
+
+**Transcription speed:** ~2.5 min per 45min episode on RTX 5070 Ti (17x realtime)
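Before rerunning, it helps to derive the pending list from disk rather than by hand. The sketch below uses a hypothetical `pending_episodes` helper and assumes the `training-data/episodes/<name>.mp3` and `training-data/transcripts/<name>/transcript.json` layout used in this log.

The 2015-s7e30 transcript lives under `test-data/output/`. This check would therefore report that episode as pending unless its transcript is copied into `training-data/transcripts/`.

```python
from pathlib import Path
from typing import List


def pending_episodes(episodes_dir: Path, transcripts_dir: Path) -> List[str]:
    """Episode stems that have an .mp3 but no <transcripts_dir>/<stem>/transcript.json."""
    pending = []
    for mp3 in sorted(episodes_dir.glob("*.mp3")):
        if not (transcripts_dir / mp3.stem / "transcript.json").is_file():
            pending.append(mp3.stem)
    return pending
```

Usage would be `pending_episodes(Path("training-data/episodes"), Path("training-data/transcripts"))`, run from the audio-processor directory.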
+
+### GPU Error State
+After extended processing (voice profiling + fine-grained analysis + multiple transcriptions), the RTX 5070 Ti entered an error state:
+- `nvidia-smi` shows ERR! across all fields
+- `torch.cuda.is_available()` returns False
+- GPU reset not supported on laptop GPUs
+- Processes holding GPU: nvidia-powerd, Discord, Chrome
+- **Fix: Reboot required**
+
+### Forum Post Updates (continued from earlier)
+- Added Issue 3 (tweeter/woofer volume balance)
+- Added "Beyond the Community Patch" section
+- Fixed bounty references to past tense
+- Linkified all bug tracker references
+- Updated Final Working Configuration for software volume
+
+### Files Created This Update
+- `projects/radio-show/post-show-workflow.md` — Full post-show content workflow
+- `projects/radio-show/audio-processor/` — Complete tool with:
+  - `src/cli.py` — CLI entry point (8 subcommands)
+  - `src/config.py` — Config loader
+  - `src/transcriber.py` — Whisper GPU transcription
+  - `src/diarizer.py` — Pyannote diarization (unused, needs HF token)
+  - `src/voice_profiler.py` — WavLM speaker embeddings
+  - `src/segment_detector.py` — Multi-signal commercial detector
+  - `src/audio_editor.py` — Commercial removal + segment splitting
+  - `src/analyzer.py` — Ollama content analysis
+  - `src/gpu.py` — CUDA library path setup
+  - `config.yaml` — Default configuration
+  - `pyproject.toml` — Package config (entry point: radio-process)
+  - `training-plan.md` — Archive training strategy
+  - `README.md` — Full architecture documentation
+- `voice-profiles/mike-swanson/` — 180 embedding files + composite + profiles.json
+
+### Next Session: Resume Batch Training
+After reboot:
+```bash
+source /home/guru/.local/share/radio-processor/bin/activate
+cd /home/guru/ClaudeTools/projects/radio-show/audio-processor
+
+# Verify the GPU is back (should print True)
+python3 -c "import torch; print(torch.cuda.is_available())"
+
+# Transcribe the remaining 6 episodes
+episodes=(2011-06-04-hr1 2014-s6e05 2016-s8e42 2017-s9e26 2018-s10e17 2018-s10e21)
+for name in "${episodes[@]}"; do
+    radio-process transcribe "training-data/episodes/$name.mp3" --output "training-data/transcripts/$name"
+done
+```
+
+Then: run speaker identification across all transcribed episodes, cluster non-host voices, begin element fingerprinting.
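For the clustering step, one lightweight option is leader clustering over non-host window embeddings. Each embedding joins the most similar existing cluster if its cosine similarity to that cluster's centroid clears a threshold; otherwise it starts a new cluster.

This is a sketch under assumptions:

- The function name and the 0.75 threshold are hypothetical.
- The log's 0.53-0.68 figures measure similarity to the host, not between non-host voices, so they give no basis for this threshold.
- Results depend on input order.

scikit-learn, already in the venv, offers `AgglomerativeClustering` as a more principled alternative.

```python
import math
from typing import List, Sequence


def _unit(v: Sequence[float]) -> List[float]:
    """Scale to unit length; an all-zero vector stays all zeros."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else [0.0] * len(v)


def leader_cluster(embeddings: Sequence[Sequence[float]], threshold: float = 0.75) -> List[int]:
    """Assign each embedding to the most similar cluster if similarity >= threshold, else start a new one."""
    centroids: List[List[float]] = []  # running sums of unit vectors per cluster
    labels: List[int] = []
    for emb in embeddings:
        u = _unit(emb)
        best, best_sim = -1, threshold
        for idx, c in enumerate(centroids):
            sim = sum(a * b for a, b in zip(u, _unit(c)))  # cosine vs. current centroid
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best == -1:
            centroids.append(list(u))
            labels.append(len(centroids) - 1)
        else:
            centroids[best] = [a + b for a, b in zip(centroids[best], u)]
            labels.append(best)
    return labels
```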