Session log: audio processor tool, voice profiling, post-show workflow
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -433,3 +433,145 @@ Both speaker pairs: Active (tweeters + woofers scale together)
2. **UCM2 package update risk** — Modified `/usr/share/alsa/ucm2/HDA/HiFi-analog.conf` will be overwritten by `alsa-ucm-conf` package updates
3. **Kernel update risk** — AW88399 patch lost on kernel updates, rebuild needed
4. **Cleanup** — `/home/guru/kernel-build/build/` still contains large build artifacts

## Update: 13:45 — Post-Show Workflow, Audio Processor Tool, Voice Profiling

### Session Summary
Designed and built the post-show content workflow and radio show audio processor tool. Built a working voice profiler with validated speaker identification. GPU hit an error state during batch processing — needs reboot.

### Post-Show Workflow
Created `projects/radio-show/post-show-workflow.md` — 3-phase content pipeline:
1. **Phase 1:** Post-show debrief questionnaire (what aired vs. prep)
2. **Phase 2:** Three tiers of content generation:
   - Episode post for radio website (Astro markdown)
   - Forum discussion thread (Flarum)
   - Deep-dive blog posts (1-3 per episode, SEO-rich)
3. **Phase 3:** Cross-promotion and engagement

Identified 10 engagement gaps: no social media, newsletter placeholder, podcast distribution unclear, no SEO structured data, no audiograms, no caller follow-up, no weekly email, forum needs better seeding, no guest pipeline, no analytics-driven topics.

### Audio Processor Tool
Built `projects/radio-show/audio-processor/` — Python CLI tool with 6-stage pipeline.

**Architecture:**
- Stage 1: Transcription (faster-whisper large-v3, GPU) — **WORKING**
- Stage 2: Speaker diarization (WavLM x-vector embeddings) — **WORKING**
- Stage 3: Segment detection (multi-signal classifier) — **WORKING (basic)**
- Stage 4: Commercial removal (ffmpeg) — **CODED, untested**
- Stage 5: Segment splitting (ffmpeg) — **CODED, untested**
- Stage 6: Content analysis (Ollama) — **CODED, untested**

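
Stages 4 and 5 both reduce to interval arithmetic followed by ffmpeg cuts. The sketch below is not the code in `src/audio_editor.py`; it only illustrates one way to turn detected commercial spans into the ranges worth keeping. The function names and the `(start, end)` tuple format in seconds are assumptions made for this example.

```python
def keep_intervals(commercials, duration):
    """Return the (start, end) ranges left after cutting the commercial spans.

    `commercials` is a list of (start, end) times in seconds; unsorted or
    overlapping spans are tolerated. `duration` is the episode length in seconds.
    """
    kept = []
    cursor = 0.0
    for start, end in sorted(commercials):
        # Clamp each span to the episode bounds.
        start = min(max(start, 0.0), duration)
        end = min(end, duration)
        if start > cursor:
            kept.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < duration:
        kept.append((cursor, duration))
    return kept


def ffmpeg_cut_command(src, dst, start, end):
    """Build an ffmpeg argv that copies one kept range without re-encoding."""
    return ["ffmpeg", "-i", src, "-ss", f"{start:.3f}", "-to", f"{end:.3f}",
            "-c", "copy", dst]
```

Each kept range can then be written out with `ffmpeg_cut_command` and the pieces joined afterwards. Stream copy avoids re-encoding, at the cost of cuts that may not be sample-accurate on MP3 input.
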
**Key design decisions:**
- Used WavLM (microsoft/wavlm-base-sv) instead of speechbrain ECAPA-TDNN due to torchaudio/CUDA 13.1 incompatibility
- Patched torchaudio CUDA version check (12.8 vs 13.1) — warning instead of error
- CUDA 12 libs from Ollama's bundled libraries (`/usr/local/lib/ollama/cuda_v12`) needed for faster-whisper's ctranslate2
- Added to venv activate script: `export LD_LIBRARY_PATH="/usr/local/lib/ollama/cuda_v12:${LD_LIBRARY_PATH}"`
- Element library designed as a learning system (seed known elements + discover unknown from archive)
- Multi-signal commercial detector: fingerprints + speaker ID + audio characteristics + break patterns + structural heuristics

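
The log names the five signals feeding the commercial detector but not how they are combined. As an illustration only, the sketch below merges per-signal scores with a weighted average. The weights, the signal key names and the function name are all hypothetical and do not come from `src/segment_detector.py`.

```python
# Hypothetical weights: the log lists the signals, not their relative importance.
SIGNAL_WEIGHTS = {
    "fingerprint": 0.35,
    "speaker_id": 0.25,
    "audio_characteristics": 0.15,
    "break_pattern": 0.15,
    "structural": 0.10,
}


def commercial_score(signals, weights=SIGNAL_WEIGHTS):
    """Weighted average of per-signal scores in [0, 1].

    Signals missing from `signals` (or set to None) are skipped and the
    remaining weights are renormalised, so a partial reading still scores.
    """
    total = 0.0
    weight_sum = 0.0
    for name, weight in weights.items():
        value = signals.get(name)
        if value is None:
            continue
        total += weight * min(max(value, 0.0), 1.0)
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0
```

A window would be flagged as commercial when its score crosses some cutoff, which would have to be tuned against labelled episodes.
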
**Python environment:**
- Venv: `/home/guru/.local/share/radio-processor/`
- Created with `--system-site-packages` (uses system python-pytorch-cuda 2.10.0)
- Key packages: faster-whisper, transformers, soundfile, librosa, scikit-learn, ollama, rich, pyyaml
- CRITICAL: Must activate with `source /home/guru/.local/share/radio-processor/bin/activate` which sets `LD_LIBRARY_PATH` for CUDA 12

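
Because a missing `LD_LIBRARY_PATH` entry only shows up later as a ctranslate2 load failure, a startup check can fail faster. This is a minimal sketch, assuming the CUDA 12 directory quoted above; the function name is hypothetical and this is not the content of `src/gpu.py`.

```python
import os

CUDA12_LIB_DIR = "/usr/local/lib/ollama/cuda_v12"  # path quoted in the log


def cuda12_on_library_path(env=None, lib_dir=CUDA12_LIB_DIR):
    """Return True if lib_dir is one of the entries in LD_LIBRARY_PATH."""
    env = os.environ if env is None else env
    entries = env.get("LD_LIBRARY_PATH", "").split(":")
    # Ignore empty entries and tolerate a trailing slash on each entry.
    return lib_dir in (entry.rstrip("/") for entry in entries if entry)
```

Calling `cuda12_on_library_path()` with no arguments inspects the current process environment, so it returns False when the venv was not activated through the patched script.
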
### Voice Profiling Results

**Host profile: 180 embeddings from 9 episodes (2010-2018)**
- Model: microsoft/wavlm-base-sv (512-dim x-vector)
- Stored at: `voice-profiles/mike-swanson/embedding_0000.npy` through `embedding_0179.npy`
- Composite embedding: `voice-profiles/mike-swanson/composite.npy`

**Validated accuracy (fine-grained 3s windows on 2011-03-05 HR1):**
- Host (Mike) voice: **0.90-0.98** similarity
- Callers: **0.65-0.68**
- Produced audio/voiceovers: **0.53-0.65**
- Co-host (Ken): **0.56-0.62**
- Threshold set to **0.83** (empirically determined)

**Cross-referenced with transcript — correctly identified:**
- Show intro voiceover (produced, 0.647)
- Conan O'Brien clip played during show (0.528-0.655)
- Station break/re-intro (0.547)
- Caller asking about internet access (0.674)
- Co-host discussing iPad (0.565-0.613)

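
The similarity figures above can be read as a simple threshold test against the composite embedding. The sketch below assumes the similarity is cosine similarity, which the log does not state explicitly, and uses hypothetical function names. It is written in plain Python so it accepts lists as well as 1-D arrays returned by `numpy.load`.

```python
import math

HOST_THRESHOLD = 0.83  # threshold reported in the log


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_host(window_embedding, composite, threshold=HOST_THRESHOLD):
    """True when a window's embedding is close enough to the host composite."""
    return cosine_similarity(window_embedding, composite) >= threshold
```

With the ranges reported above, host windows (0.90-0.98) clear the 0.83 threshold and every other category (at most 0.68) falls below it, leaving a wide margin on both sides.
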
### Training Data Downloaded
**9 episodes from IX server archive (151MB total):**
- `training-data/episodes/2010-10-02-hr1.mp3` (7.3MB, 44min)
- `training-data/episodes/2011-06-04-hr1.mp3` (7.4MB)
- `training-data/episodes/2011-09-10-hr1.mp3` (11MB)
- `training-data/episodes/2014-s6e05.mp3` (9.5MB)
- `training-data/episodes/2015-s7e30.mp3` (9.0MB, 45min)
- `training-data/episodes/2016-s8e42.mp3` (19MB)
- `training-data/episodes/2017-s9e26.mp3` (48MB)
- `training-data/episodes/2018-s10e17.mp3` (21MB)
- `training-data/episodes/2018-s10e21.mp3` (21MB)

**Show production elements (training-data/elements/):**
- Bumpers: 7 files (MP3+WAV) — cities_in_dust, ET_edit, rancid_riot, stereo_mc, Warnng, white_n_nerdy
- Computer Guru Elements: 5 WAV (intro_beast, intro_kick_back, intro_or_outro, outro, SHOW INTRO)
- Corrected Elements: 5 WAV (same with corrected phone number)
- Permanent Elements: 7 WAV (az_comp_guru_spot, combos, streaming, promo_window)

### Transcription Results
**Completed (3 transcripts):**
- `training-data/transcripts/2010-10-02-hr1/transcript.json` (1.2MB, 534 segments)
- `test-data/output/transcript.json` (2015-s7e30, 1.2MB, 746 segments)
- `test-data/output-hr1/transcript.json` (2011-03-05, 1070 segments)

**Failed — GPU error state (6 episodes, retry after reboot):**
- 2011-06-04-hr1, 2014-s6e05, 2016-s8e42, 2017-s9e26, 2018-s10e17, 2018-s10e21

**Transcription speed:** ~2.5 min per 45min episode on RTX 5070 Ti (17x realtime)

### GPU Error State
After extended processing (voice profiling + fine-grained analysis + multiple transcriptions), the RTX 5070 Ti entered an error state:
- `nvidia-smi` shows ERR! across all fields
- `torch.cuda.is_available()` returns False
- GPU reset not supported on laptop GPUs
- Processes holding GPU: nvidia-powerd, Discord, Chrome
- **Fix: Reboot required**

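
Since this state only surfaced after several jobs had already failed, a pre-flight check before each batch item could stop the run earlier. The sketch below keys off the `ERR!` marker described above. The function names are hypothetical, and treating any `ERR!` in the output as fatal is an assumption.

```python
import subprocess


def smi_reports_error(smi_output):
    """True if nvidia-smi output contains the ERR! marker seen in this session."""
    return "ERR!" in smi_output


def gpu_looks_healthy(timeout=10):
    """Run nvidia-smi; return False if it is missing, hangs, fails, or prints ERR!."""
    try:
        result = subprocess.run(
            ["nvidia-smi"], capture_output=True, text=True, timeout=timeout
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and not smi_reports_error(result.stdout)
```

A batch loop could call `gpu_looks_healthy()` before each episode and abort with a clear message instead of queuing further transcriptions against a dead GPU.
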
### Forum Post Updates (continued from earlier)
- Added Issue 3 (tweeter/woofer volume balance)
- Added "Beyond the Community Patch" section
- Fixed bounty references to past tense
- Linkified all bug tracker references
- Updated Final Working Configuration for software volume

### Files Created This Update
- `projects/radio-show/post-show-workflow.md` — Full post-show content workflow
- `projects/radio-show/audio-processor/` — Complete tool with:
  - `src/cli.py` — CLI entry point (8 subcommands)
  - `src/config.py` — Config loader
  - `src/transcriber.py` — Whisper GPU transcription
  - `src/diarizer.py` — Pyannote diarization (unused, needs HF token)
  - `src/voice_profiler.py` — WavLM speaker embeddings
  - `src/segment_detector.py` — Multi-signal commercial detector
  - `src/audio_editor.py` — Commercial removal + segment splitting
  - `src/analyzer.py` — Ollama content analysis
  - `src/gpu.py` — CUDA library path setup
  - `config.yaml` — Default configuration
  - `pyproject.toml` — Package config (entry point: radio-process)
  - `training-plan.md` — Archive training strategy
  - `README.md` — Full architecture documentation
  - `voice-profiles/mike-swanson/` — 180 embedding files + composite + profiles.json

### Next Session: Resume Batch Training
After reboot:
```bash
source /home/guru/.local/share/radio-processor/bin/activate
cd /home/guru/ClaudeTools/projects/radio-show/audio-processor

# Verify GPU is back (should print True)
python3 -c "import torch; print(torch.cuda.is_available())"

# Transcribe remaining 6 episodes
for name in 2011-06-04-hr1 2014-s6e05 2016-s8e42 2017-s9e26 2018-s10e17 2018-s10e21; do
    radio-process transcribe "training-data/episodes/$name.mp3" --output "training-data/transcripts/$name"
done
```

Then: run speaker identification across all transcribed episodes, cluster non-host voices, begin element fingerprinting.

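
Rather than hard-coding the episode list on each resume, the pending set can be derived from what is already on disk. This sketch assumes the layout shown in this log, where each finished episode has `training-data/transcripts/<name>/transcript.json`. The function name is hypothetical.

```python
from pathlib import Path


def pending_episodes(episodes_dir, transcripts_dir):
    """List episode MP3s that have no <name>/transcript.json yet, sorted by name."""
    pending = []
    for mp3 in sorted(Path(episodes_dir).glob("*.mp3")):
        transcript = Path(transcripts_dir) / mp3.stem / "transcript.json"
        if not transcript.is_file():
            pending.append(mp3)
    return pending
```

For example, `pending_episodes("training-data/episodes", "training-data/transcripts")` would return the MP3s still to be processed. Transcripts written elsewhere, such as `test-data/output/`, are not detected, so those episodes would be listed as pending.
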