diff --git a/session-logs/2026-03-21-session.md b/session-logs/2026-03-21-session.md
index 15dbba2..72b875d 100644
--- a/session-logs/2026-03-21-session.md
+++ b/session-logs/2026-03-21-session.md
@@ -433,3 +433,145 @@ Both speaker pairs: Active (tweeters + woofers scale together)
 2. **UCM2 package update risk** — Modified `/usr/share/alsa/ucm2/HDA/HiFi-analog.conf` will be overwritten by `alsa-ucm-conf` package updates
 3. **Kernel update risk** — AW88399 patch lost on kernel updates, rebuild needed
 4. **Cleanup** — `/home/guru/kernel-build/build/` still contains large build artifacts
+
+## Update: 13:45 — Post-Show Workflow, Audio Processor Tool, Voice Profiling
+
+### Session Summary
+Designed and built the post-show content workflow and radio show audio processor tool. Built a working voice profiler with validated speaker identification. GPU hit an error state during batch processing — needs reboot.
+
+### Post-Show Workflow
+Created `projects/radio-show/post-show-workflow.md` — 3-phase content pipeline:
+1. **Phase 1:** Post-show debrief questionnaire (what aired vs. prep)
+2. **Phase 2:** Three tiers of content generation:
+   - Episode post for radio website (Astro markdown)
+   - Forum discussion thread (Flarum)
+   - Deep-dive blog posts (1-3 per episode, SEO-rich)
+3. **Phase 3:** Cross-promotion and engagement
+
+Identified 10 engagement gaps: no social media, newsletter placeholder, podcast distribution unclear, no SEO structured data, no audiograms, no caller follow-up, no weekly email, forum needs better seeding, no guest pipeline, no analytics-driven topics.
+
+### Audio Processor Tool
+Built `projects/radio-show/audio-processor/` — Python CLI tool with 6-stage pipeline.
+
+**Architecture:**
+- Stage 1: Transcription (faster-whisper large-v3, GPU) — **WORKING**
+- Stage 2: Speaker diarization (WavLM x-vector embeddings) — **WORKING**
+- Stage 3: Segment detection (multi-signal classifier) — **WORKING (basic)**
+- Stage 4: Commercial removal (ffmpeg) — **CODED, untested**
+- Stage 5: Segment splitting (ffmpeg) — **CODED, untested**
+- Stage 6: Content analysis (Ollama) — **CODED, untested**
+
+**Key design decisions:**
+- Used WavLM (microsoft/wavlm-base-sv) instead of speechbrain ECAPA-TDNN due to torchaudio/CUDA 13.1 incompatibility
+- Patched torchaudio CUDA version check (12.8 vs 13.1) — warning instead of error
+- CUDA 12 libs from Ollama's bundled libraries (`/usr/local/lib/ollama/cuda_v12`) needed for faster-whisper's ctranslate2
+- Added to venv activate script: `export LD_LIBRARY_PATH="/usr/local/lib/ollama/cuda_v12:${LD_LIBRARY_PATH}"`
+- Element library designed as a learning system (seed known elements + discover unknown from archive)
+- Multi-signal commercial detector: fingerprints + speaker ID + audio characteristics + break patterns + structural heuristics
+
+**Python environment:**
+- Venv: `/home/guru/.local/share/radio-processor/`
+- Created with `--system-site-packages` (uses system python-pytorch-cuda 2.10.0)
+- Key packages: faster-whisper, transformers, soundfile, librosa, scikit-learn, ollama, rich, pyyaml
+- CRITICAL: Must activate with `source /home/guru/.local/share/radio-processor/bin/activate` which sets LD_LIBRARY_PATH for CUDA 12
+
+### Voice Profiling Results
+
+**Host profile: 180 embeddings from 9 episodes (2010-2018)**
+- Model: microsoft/wavlm-base-sv (512-dim x-vector)
+- Stored at: `voice-profiles/mike-swanson/embedding_0000.npy` through `embedding_0179.npy`
+- Composite embedding: `voice-profiles/mike-swanson/composite.npy`
+
+**Validated accuracy (fine-grained 3s windows on 2011-03-05 HR1):**
+- Host (Mike) voice: **0.90-0.98** similarity
+- Callers: **0.65-0.68**
+- Produced audio/voiceovers: **0.53-0.65**
+- Co-host (Ken): **0.56-0.62**
+- Threshold set to **0.83** (empirically determined)
+
+**Cross-referenced with transcript — correctly identified:**
+- Show intro voiceover (produced, 0.647)
+- Conan O'Brien clip played during show (0.528-0.655)
+- Station break/re-intro (0.547)
+- Caller asking about internet access (0.674)
+- Co-host discussing iPad (0.565-0.613)
+
+### Training Data Downloaded
+**9 episodes from IX server archive (151MB total):**
+- `training-data/episodes/2010-10-02-hr1.mp3` (7.3MB, 44min)
+- `training-data/episodes/2011-06-04-hr1.mp3` (7.4MB)
+- `training-data/episodes/2011-09-10-hr1.mp3` (11MB)
+- `training-data/episodes/2014-s6e05.mp3` (9.5MB)
+- `training-data/episodes/2015-s7e30.mp3` (9.0MB, 45min)
+- `training-data/episodes/2016-s8e42.mp3` (19MB)
+- `training-data/episodes/2017-s9e26.mp3` (48MB)
+- `training-data/episodes/2018-s10e17.mp3` (21MB)
+- `training-data/episodes/2018-s10e21.mp3` (21MB)
+
+**Show production elements (training-data/elements/):**
+- Bumpers: 7 files (MP3+WAV) — cities_in_dust, ET_edit, rancid_riot, stereo_mc, Warnng, white_n_nerdy
+- Computer Guru Elements: 5 WAV (intro_beast, intro_kick_back, intro_or_outro, outro, SHOW INTRO)
+- Corrected Elements: 5 WAV (same with corrected phone number)
+- Permanent Elements: 7 WAV (az_comp_guru_spot, combos, streaming, promo_window)
+
+### Transcription Results
+**Completed (2 episodes):**
+- `training-data/transcripts/2010-10-02-hr1/transcript.json` (1.2MB, 534 segments)
+- `test-data/output/transcript.json` (2015-s7e30, 1.2MB, 746 segments)
+- `test-data/output-hr1/transcript.json` (2011-03-05, 1070 segments)
+
+**Failed — GPU error state (6 episodes need reboot):**
+- 2011-06-04-hr1, 2014-s6e05, 2016-s8e42, 2017-s9e26, 2018-s10e17, 2018-s10e21
+
+**Transcription speed:** ~2.5 min per 45min episode on RTX 5070 Ti (17x realtime)
+
+### GPU Error State
+After extended processing (voice profiling + fine-grained analysis + multiple transcriptions), the RTX 5070 Ti entered an error state:
+- `nvidia-smi` shows ERR! across all fields
+- `torch.cuda.is_available()` returns False
+- GPU reset not supported on laptop GPUs
+- Processes holding GPU: nvidia-powerd, Discord, Chrome
+- **Fix: Reboot required**
+
+### Forum Post Updates (continued from earlier)
+- Added Issue 3 (tweeter/woofer volume balance)
+- Added "Beyond the Community Patch" section
+- Fixed bounty references to past tense
+- Linkified all bug tracker references
+- Updated Final Working Configuration for software volume
+
+### Files Created This Update
+- `projects/radio-show/post-show-workflow.md` — Full post-show content workflow
+- `projects/radio-show/audio-processor/` — Complete tool with:
+  - `src/cli.py` — CLI entry point (8 subcommands)
+  - `src/config.py` — Config loader
+  - `src/transcriber.py` — Whisper GPU transcription
+  - `src/diarizer.py` — Pyannote diarization (unused, needs HF token)
+  - `src/voice_profiler.py` — WavLM speaker embeddings
+  - `src/segment_detector.py` — Multi-signal commercial detector
+  - `src/audio_editor.py` — Commercial removal + segment splitting
+  - `src/analyzer.py` — Ollama content analysis
+  - `src/gpu.py` — CUDA library path setup
+  - `config.yaml` — Default configuration
+  - `pyproject.toml` — Package config (entry point: radio-process)
+  - `training-plan.md` — Archive training strategy
+  - `README.md` — Full architecture documentation
+- `voice-profiles/mike-swanson/` — 180 embedding files + composite + profiles.json
+
+### Next Session: Resume Batch Training
+After reboot:
+```bash
+source /home/guru/.local/share/radio-processor/bin/activate
+cd /home/guru/ClaudeTools/projects/radio-show/audio-processor
+
+# Verify GPU is back
+python3 -c "import torch; print(torch.cuda.is_available())"
+
+# Transcribe remaining 6 episodes
+for ep in training-data/episodes/2011-06-04-hr1.mp3 training-data/episodes/2014-s6e05.mp3 training-data/episodes/2016-s8e42.mp3 training-data/episodes/2017-s9e26.mp3 training-data/episodes/2018-s10e17.mp3 training-data/episodes/2018-s10e21.mp3; do
+  name=$(basename "$ep" .mp3)
+  radio-process transcribe "$ep" --output "training-data/transcripts/$name"
+done
+```
+
+Then: run speaker identification across all transcribed episodes, cluster non-host voices, begin element fingerprinting.
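
Reviewer sketch (not part of the patch): the speaker-identification pass planned above could look roughly like this. It assumes the logged "similarity" is cosine similarity against the host composite embedding. The names `cosine_similarity` and `label_windows` are hypothetical and are not taken from `src/voice_profiler.py`. Only the 0.83 threshold and the 512-dim embedding size come from the log, and the demo uses synthetic vectors in place of `composite.npy`.

```python
import numpy as np

HOST_THRESHOLD = 0.83  # threshold recorded in the session log


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D embedding vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0
    return float(np.dot(a, b) / denom)


def label_windows(window_embeddings, composite, threshold=HOST_THRESHOLD):
    """Return a (similarity, is_host) pair for each window embedding."""
    results = []
    for emb in window_embeddings:
        sim = cosine_similarity(np.asarray(emb, dtype=np.float64), composite)
        results.append((sim, sim >= threshold))
    return results


if __name__ == "__main__":
    # Synthetic stand-ins for the 512-dim composite and two 3s-window embeddings.
    rng = np.random.default_rng(0)
    composite = rng.normal(size=512)
    near_host = composite + 0.1 * rng.normal(size=512)  # small perturbation of the composite
    other_voice = rng.normal(size=512)                  # unrelated random vector
    for sim, is_host in label_windows([near_host, other_voice], composite):
        print(f"similarity={sim:.3f} host={is_host}")
```

In a real run the composite would presumably be loaded with `np.load("voice-profiles/mike-swanson/composite.npy")`, and the window embeddings would come from the WavLM model rather than a random generator.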