Session log: audio processor tool, voice profiling, post-show workflow

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 13:42:54 -07:00
parent 6cc9043b8e
commit 37aaa6660b

@@ -433,3 +433,145 @@ Both speaker pairs: Active (tweeters + woofers scale together)
2. **UCM2 package update risk** — Modified `/usr/share/alsa/ucm2/HDA/HiFi-analog.conf` will be overwritten by `alsa-ucm-conf` package updates
3. **Kernel update risk** — AW88399 patch lost on kernel updates, rebuild needed
4. **Cleanup** — `/home/guru/kernel-build/build/` still contains large build artifacts
## Update: 13:45 — Post-Show Workflow, Audio Processor Tool, Voice Profiling
### Session Summary
Designed and built the post-show content workflow and radio show audio processor tool. Built a working voice profiler with validated speaker identification. GPU hit an error state during batch processing — needs reboot.
### Post-Show Workflow
Created `projects/radio-show/post-show-workflow.md` — 3-phase content pipeline:
1. **Phase 1:** Post-show debrief questionnaire (what aired vs. prep)
2. **Phase 2:** Three tiers of content generation:
   - Episode post for radio website (Astro markdown)
   - Forum discussion thread (Flarum)
   - Deep-dive blog posts (1-3 per episode, SEO-rich)
3. **Phase 3:** Cross-promotion and engagement
Identified 10 engagement gaps: no social media, newsletter placeholder, podcast distribution unclear, no SEO structured data, no audiograms, no caller follow-up, no weekly email, forum needs better seeding, no guest pipeline, no analytics-driven topics.
### Audio Processor Tool
Built `projects/radio-show/audio-processor/` — Python CLI tool with 6-stage pipeline.
**Architecture:**
- Stage 1: Transcription (faster-whisper large-v3, GPU) — **WORKING**
- Stage 2: Speaker diarization (WavLM x-vector embeddings) — **WORKING**
- Stage 3: Segment detection (multi-signal classifier) — **WORKING (basic)**
- Stage 4: Commercial removal (ffmpeg) — **CODED, untested**
- Stage 5: Segment splitting (ffmpeg) — **CODED, untested**
- Stage 6: Content analysis (Ollama) — **CODED, untested**
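
A minimal sketch of how the six stages and their status flags could be chained. The `Stage` record, the `run_pipeline` helper, the stage names, and the placeholder lambdas are all hypothetical; the log does not show the tool's actual internals.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Stage:
    """Hypothetical stage record; not taken from the real tool."""
    name: str
    run: Callable[[Dict], Dict]  # receives the shared context, returns updates
    tested: bool                 # mirrors WORKING vs. CODED, untested


def run_pipeline(stages: List[Stage], context: Dict,
                 allow_untested: bool = False) -> Dict:
    """Run stages in order, skipping untested ones unless explicitly allowed."""
    for stage in stages:
        if not stage.tested and not allow_untested:
            context.setdefault("skipped", []).append(stage.name)
            continue
        context.update(stage.run(context))
    return context


# Placeholder callables stand in for the real stage implementations.
STAGES = [
    Stage("transcription", lambda ctx: {"transcript": []}, tested=True),
    Stage("diarization", lambda ctx: {"speakers": []}, tested=True),
    Stage("segment_detection", lambda ctx: {"segments": []}, tested=True),
    Stage("commercial_removal", lambda ctx: {"clean_audio": None}, tested=False),
    Stage("segment_splitting", lambda ctx: {"clips": []}, tested=False),
    Stage("content_analysis", lambda ctx: {"analysis": {}}, tested=False),
]

if __name__ == "__main__":
    result = run_pipeline(STAGES, {"audio": "episode.mp3"})
    print(result["skipped"])
```

Gating on a `tested` flag keeps stages 4 to 6 from touching audio files until they have been exercised deliberately.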
**Key design decisions:**
- Used WavLM (microsoft/wavlm-base-sv) instead of speechbrain ECAPA-TDNN due to torchaudio/CUDA 13.1 incompatibility
- Patched the torchaudio CUDA version check (12.8 vs 13.1) so the mismatch raises a warning instead of an error
- CUDA 12 libs from Ollama's bundled libraries (`/usr/local/lib/ollama/cuda_v12`) needed for faster-whisper's ctranslate2
- Added to venv activate script: `export LD_LIBRARY_PATH="/usr/local/lib/ollama/cuda_v12:${LD_LIBRARY_PATH}"`
- Element library designed as a learning system (seed known elements + discover unknown from archive)
- Multi-signal commercial detector: fingerprints + speaker ID + audio characteristics + break patterns + structural heuristics
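
One way the five signals of the commercial detector could be combined is a weighted average. This is only a sketch: the log does not say how `src/segment_detector.py` fuses its signals, and the weights and signal keys below are invented for illustration.

```python
# Hypothetical weights; the log does not state how the signals are combined.
WEIGHTS = {
    "fingerprint": 0.35,
    "speaker_id": 0.25,
    "audio_characteristics": 0.15,
    "break_pattern": 0.15,
    "structural": 0.10,
}


def commercial_score(signals: dict) -> float:
    """Weighted average of per-signal scores in [0, 1].

    Signals that are missing (for example, no fingerprint match was
    attempted) are left out and the remaining weights are renormalized.
    """
    present = {k: v for k, v in signals.items() if k in WEIGHTS}
    total_weight = sum(WEIGHTS[k] for k in present)
    if total_weight == 0:
        return 0.0
    return sum(WEIGHTS[k] * v for k, v in present.items()) / total_weight


if __name__ == "__main__":
    # A window with a strong fingerprint match and no host voice.
    print(commercial_score({"fingerprint": 1.0, "speaker_id": 0.9}))
```

Renormalizing over the signals that are present lets the detector still produce a score for audio that has no known fingerprint yet, which fits the "discover unknown from archive" goal of the element library.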
**Python environment:**
- Venv: `/home/guru/.local/share/radio-processor/`
- Created with `--system-site-packages` (uses system python-pytorch-cuda 2.10.0)
- Key packages: faster-whisper, transformers, soundfile, librosa, scikit-learn, ollama, rich, pyyaml
- CRITICAL: Must activate with `source /home/guru/.local/share/radio-processor/bin/activate` which sets LD_LIBRARY_PATH for CUDA 12
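
Because a missing `LD_LIBRARY_PATH` entry only shows up later as a ctranslate2 load failure, a quick pre-flight check can help. The sketch below is an assumption about how such a check might look, not the contents of `src/gpu.py`; the function name is hypothetical and the directory is the one quoted above.

```python
import os

# Path taken from the log; adjust if Ollama is installed elsewhere.
CUDA12_DIR = "/usr/local/lib/ollama/cuda_v12"


def cuda12_on_library_path(env=None) -> bool:
    """Return True if the CUDA 12 directory is listed in LD_LIBRARY_PATH."""
    env = os.environ if env is None else env
    entries = env.get("LD_LIBRARY_PATH", "").split(":")
    # Ignore empty entries and tolerate a trailing slash.
    return any(e.rstrip("/") == CUDA12_DIR for e in entries if e)


if __name__ == "__main__":
    if not cuda12_on_library_path():
        print("CUDA 12 libs not on LD_LIBRARY_PATH; source the venv activate script first")
```

The check only reports the problem. The dynamic linker reads `LD_LIBRARY_PATH` when the process starts, so the variable has to be set before Python launches, which is what the modified activate script does.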
### Voice Profiling Results
**Host profile: 180 embeddings from 9 episodes (2010-2018)**
- Model: microsoft/wavlm-base-sv (512-dim x-vector)
- Stored at: `voice-profiles/mike-swanson/embedding_0000.npy` through `embedding_0179.npy`
- Composite embedding: `voice-profiles/mike-swanson/composite.npy`
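
The log does not say how `composite.npy` is derived from the 180 embeddings. A common choice, assumed here, is the mean of the L2-normalized vectors, normalized again to unit length. The sketch uses plain Python lists so it runs without NumPy; with the real `.npy` files the same arithmetic would apply to arrays.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def composite_embedding(embeddings):
    """Mean of L2-normalized embeddings, re-normalized to unit length.

    Assumption: this is one plausible definition of a composite voice
    profile, not necessarily the one used by the tool.
    """
    if not embeddings:
        raise ValueError("need at least one embedding")
    unit = [l2_normalize(e) for e in embeddings]
    dim = len(unit[0])
    mean = [sum(u[i] for u in unit) / len(unit) for i in range(dim)]
    return l2_normalize(mean)


if __name__ == "__main__":
    # Toy 2-dim vectors stand in for the 512-dim x-vectors.
    print(composite_embedding([[1.0, 0.0], [0.0, 1.0]]))
```

Normalizing each embedding before averaging keeps a single loud or long sample from dominating the profile.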
**Validated accuracy (fine-grained 3s windows on 2011-03-05 HR1):**
- Host (Mike) voice: **0.90-0.98** similarity
- Callers: **0.65-0.68**
- Produced audio/voiceovers: **0.53-0.65**
- Co-host (Ken): **0.56-0.62**
- Threshold set to **0.83** (empirically determined)
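
The 0.83 threshold sits in the gap between the highest non-host score (0.68) and the lowest host score (0.90). A minimal sketch of the decision, assuming the similarity in question is cosine similarity against the composite embedding (the log only says "similarity"); the function names are hypothetical.

```python
import math

HOST_THRESHOLD = 0.83  # value from the log


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_host(window_embedding, composite, threshold=HOST_THRESHOLD):
    """True when a window's embedding is close enough to the host profile."""
    return cosine_similarity(window_embedding, composite) >= threshold


if __name__ == "__main__":
    composite = [1.0, 0.0]
    # Toy vector constructed to score 0.68 against the composite, like a caller.
    caller = [0.68, math.sqrt(1 - 0.68 ** 2)]
    print(is_host(caller, composite))
```

With the observed ranges, any threshold between roughly 0.69 and 0.89 would separate the two groups in this one episode; validating on more episodes would show how much of that margin survives.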
**Cross-referenced with transcript — correctly identified:**
- Show intro voiceover (produced, 0.647)
- Conan O'Brien clip played during show (0.528-0.655)
- Station break/re-intro (0.547)
- Caller asking about internet access (0.674)
- Co-host discussing iPad (0.565-0.613)
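
To line per-window scores up with transcript timestamps, consecutive windows with the same label can be merged into time ranges. The sketch below assumes non-overlapping 3-second windows starting at 0 s, which the log does not confirm; `label_runs` is a hypothetical helper.

```python
def label_runs(similarities, window_s=3.0, threshold=0.83):
    """Collapse per-window similarities into (start_s, end_s, label) runs.

    Assumes window i covers [i * window_s, (i + 1) * window_s).
    """
    runs = []
    for i, sim in enumerate(similarities):
        label = "host" if sim >= threshold else "other"
        start = i * window_s
        if runs and runs[-1][2] == label:
            # Extend the current run instead of starting a new one.
            runs[-1] = (runs[-1][0], start + window_s, label)
        else:
            runs.append((start, start + window_s, label))
    return runs


if __name__ == "__main__":
    # Two host windows, a produced element, then the host again.
    for run in label_runs([0.95, 0.92, 0.647, 0.655, 0.91]):
        print(run)
```

Each "other" run can then be matched against transcript segments that overlap its time range, which is how items such as the intro voiceover or a played clip can be confirmed.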
### Training Data Downloaded
**9 episodes from IX server archive (151MB total):**
- `training-data/episodes/2010-10-02-hr1.mp3` (7.3MB, 44min)
- `training-data/episodes/2011-06-04-hr1.mp3` (7.4MB)
- `training-data/episodes/2011-09-10-hr1.mp3` (11MB)
- `training-data/episodes/2014-s6e05.mp3` (9.5MB)
- `training-data/episodes/2015-s7e30.mp3` (9.0MB, 45min)
- `training-data/episodes/2016-s8e42.mp3` (19MB)
- `training-data/episodes/2017-s9e26.mp3` (48MB)
- `training-data/episodes/2018-s10e17.mp3` (21MB)
- `training-data/episodes/2018-s10e21.mp3` (21MB)
**Show production elements (training-data/elements/):**
- Bumpers: 7 files (MP3+WAV) — cities_in_dust, ET_edit, rancid_riot, stereo_mc, Warnng, white_n_nerdy
- Computer Guru Elements: 5 WAV (intro_beast, intro_kick_back, intro_or_outro, outro, SHOW INTRO)
- Corrected Elements: 5 WAV (same with corrected phone number)
- Permanent Elements: 7 WAV (az_comp_guru_spot, combos, streaming, promo_window)
### Transcription Results
**Completed (3 transcripts: 2 training episodes plus the 2011-03-05 test hour):**
- `training-data/transcripts/2010-10-02-hr1/transcript.json` (1.2MB, 534 segments)
- `test-data/output/transcript.json` (2015-s7e30, 1.2MB, 746 segments)
- `test-data/output-hr1/transcript.json` (2011-03-05, 1070 segments)
**Failed due to GPU error state (6 episodes, retry after reboot):**
- 2011-06-04-hr1, 2014-s6e05, 2016-s8e42, 2017-s9e26, 2018-s10e17, 2018-s10e21
**Transcription speed:** ~2.5 min per 45min episode on RTX 5070 Ti (17x realtime)
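
The realtime factor is audio duration divided by processing time. A small worked example with the rounded figures quoted above; the helper names are hypothetical.

```python
def realtime_factor(audio_minutes: float, processing_minutes: float) -> float:
    """How many minutes of audio are processed per minute of wall-clock time."""
    if processing_minutes <= 0:
        raise ValueError("processing time must be positive")
    return audio_minutes / processing_minutes


def estimated_processing_minutes(audio_minutes: float, factor: float) -> float:
    """Wall-clock estimate for a given amount of audio at a given factor."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return audio_minutes / factor


if __name__ == "__main__":
    print(realtime_factor(45, 2.5))
    print(estimated_processing_minutes(45, 17))
```

With the rounded inputs the ratio comes out at 18; the logged 17x presumably reflects unrounded timings, and at 17x a 45-minute episode takes a little over 2.6 minutes.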
### GPU Error State
After extended processing (voice profiling + fine-grained analysis + multiple transcriptions), the RTX 5070 Ti entered an error state:
- `nvidia-smi` shows ERR! across all fields
- `torch.cuda.is_available()` returns False
- GPU reset not supported on laptop GPUs
- Processes holding GPU: nvidia-powerd, Discord, Chrome
- **Fix: Reboot required**
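
A batch job could detect this state before each episode rather than failing six times in a row. The sketch below is an assumption about how such a guard might look: it only checks `nvidia-smi` output for the `ERR!` marker described above, and the function names are hypothetical.

```python
import subprocess


def smi_reports_error(smi_output: str) -> bool:
    """True if nvidia-smi output contains the ERR! marker seen in this session."""
    return "ERR!" in smi_output


def gpu_health() -> str:
    """Run nvidia-smi and summarize the result as a short status string."""
    try:
        result = subprocess.run(
            ["nvidia-smi"], capture_output=True, text=True, timeout=30
        )
    except FileNotFoundError:
        return "nvidia-smi not found"
    except subprocess.TimeoutExpired:
        return "nvidia-smi timed out"
    if result.returncode != 0 or smi_reports_error(result.stdout):
        return "error state"
    return "ok"


if __name__ == "__main__":
    print(gpu_health())
```

Calling `gpu_health()` between episodes and stopping the loop on anything other than `"ok"` would make the failure visible after the first affected episode.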
### Forum Post Updates (continued from earlier)
- Added Issue 3 (tweeter/woofer volume balance)
- Added "Beyond the Community Patch" section
- Fixed bounty references to past tense
- Linkified all bug tracker references
- Updated Final Working Configuration for software volume
### Files Created This Update
- `projects/radio-show/post-show-workflow.md` — Full post-show content workflow
- `projects/radio-show/audio-processor/` — Complete tool with:
  - `src/cli.py` — CLI entry point (8 subcommands)
  - `src/config.py` — Config loader
  - `src/transcriber.py` — Whisper GPU transcription
  - `src/diarizer.py` — Pyannote diarization (unused, needs HF token)
  - `src/voice_profiler.py` — WavLM speaker embeddings
  - `src/segment_detector.py` — Multi-signal commercial detector
  - `src/audio_editor.py` — Commercial removal + segment splitting
  - `src/analyzer.py` — Ollama content analysis
  - `src/gpu.py` — CUDA library path setup
  - `config.yaml` — Default configuration
  - `pyproject.toml` — Package config (entry point: radio-process)
  - `training-plan.md` — Archive training strategy
  - `README.md` — Full architecture documentation
- `voice-profiles/mike-swanson/` — 180 embedding files + composite + profiles.json
### Next Session: Resume Batch Training
After reboot:
```bash
source /home/guru/.local/share/radio-processor/bin/activate
cd /home/guru/ClaudeTools/projects/radio-show/audio-processor
# Verify GPU is back
python3 -c "import torch; print(torch.cuda.is_available())"
# Transcribe remaining 6 episodes
for ep in training-data/episodes/2011-06-04-hr1.mp3 training-data/episodes/2014-s6e05.mp3 training-data/episodes/2016-s8e42.mp3 training-data/episodes/2017-s9e26.mp3 training-data/episodes/2018-s10e17.mp3 training-data/episodes/2018-s10e21.mp3; do
name=$(basename "$ep" .mp3)
radio-process transcribe "$ep" --output "training-data/transcripts/$name"
done
```
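
As an alternative to hard-coding the episode list, the pending work could be derived from what is already on disk. This sketch assumes each finished episode has a `transcript.json` under `training-data/transcripts/<name>/`, as in the loop above; transcripts written elsewhere (such as `test-data/output/`) would not be detected. The `pending_commands` helper is hypothetical; the `radio-process transcribe ... --output ...` invocation is the one used above.

```python
from pathlib import Path


def pending_commands(episodes_dir, transcripts_dir):
    """Build radio-process commands for episodes that lack a transcript.json."""
    commands = []
    for ep in sorted(Path(episodes_dir).glob("*.mp3")):
        out_dir = Path(transcripts_dir) / ep.stem
        if (out_dir / "transcript.json").exists():
            continue  # already transcribed, nothing to do
        commands.append(
            ["radio-process", "transcribe", str(ep), "--output", str(out_dir)]
        )
    return commands


if __name__ == "__main__":
    for cmd in pending_commands("training-data/episodes",
                                "training-data/transcripts"):
        print(" ".join(cmd))
```

Printing the commands first gives a dry run; each list can be passed to `subprocess.run(cmd, check=True)` once the output looks right.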
Then: run speaker identification across all transcribed episodes, cluster non-host voices, begin element fingerprinting.