claudetools/wiki/projects/radio-show.md

---
type: project
name: radio-show
display_name: The Computer Guru Show
last_compiled: 2026-05-24
compiled_by: DESKTOP-0O8A1RL/claude-main
sources:
  - projects/radio-show/post-show-workflow.md
  - projects/radio-show/audio-processor/README.md
  - projects/radio-show/session-logs/2026-04-27-qa-extraction-cohost-indexing.md
  - projects/radio-show/session-logs/2026-05-01-ui-redesign-recovery.md
---

# The Computer Guru Show

## Overview

"The Computer Guru Show" is Mike Swanson's radio program. The project covers two distinct workstreams:

1. **Audio Processor** — Automated pipeline that processes raw broadcast recordings (with commercials) into podcast-ready audio, transcripts, speaker-diarized segments, and a searchable SQLite archive.
2. **Post-Show Content Workflow** — Process for turning each episode into an episode page (website), forum discussion thread (Flarum), and 1–3 deep-dive blog posts within 48 hours of air.

**Status:** Active development. Audio processor pipeline functional with 572 episodes indexed locally on BEAST. FastAPI browse/search UI redesigned (2026-05-01). Jupiter deployment has a known audio-file gap (open). Post-show workflow documented but not yet fully automated.

Archive spans 2010–2018 (no 2013 season), 579 MP3s, ~30–40 GB.

---

## Tech Stack

| Layer | Technology |
|---|---|
| Transcription | faster-whisper (`large-v3`, CTranslate2 + CUDA), int8_float16, batched |
| Speaker diarization | pyannote.audio 3.1 (WavLM embeddings) |
| Audio processing | ffmpeg, pydub, librosa |
| Audio fingerprinting | chromaprint |
| Voice activity detection | silero-vad |
| ML / classification | scikit-learn (break pattern classifier) |
| Content analysis | Ollama — `qwen3:14b` (narrative/summary), local LLM |
| Archive database | SQLite with FTS5 (segments, Q&A pairs) |
| Web server | FastAPI + uvicorn (embedded HTML templates) |
| Hardware (primary) | DESKTOP-0O8A1RL — RTX 5070 Ti Laptop GPU |
| Hardware (secondary) | GURU-BEAST-ROG — RTX 4090 (benchmark pending) |

---

## Architecture

### Audio Processor Pipeline

```
Raw MP3 (full broadcast with commercials)
  |
  +-- 1. Transcription: faster-whisper large-v3 (63.8x realtime on 5070 Ti)
  |       Output: word-level timestamps, language detection
  |
  +-- 2. Speaker Diarization: pyannote.audio 3.1 (209.7x realtime on 5070 Ti)
  |       10s windows / 5s hop, midpoint boundary resolution at load time
  |       Speaker profiles: host (Mike, era-specific embeddings), co-hosts, callers
  |
  +-- 3. Segment Detection: Multi-signal classifier (5 signals, combined weighted score)
  |       Signals: fingerprint match (0.30), speaker identity (0.25),
  |       audio characteristics (0.20), break pattern (0.15), structural heuristics (0.10)
  |       Element library: SQLite fingerprints.db + learning/discovery system
  |
  +-- 4. Commercial Removal: ffmpeg — stitch segments, EBU R128 normalize
  |
  +-- 5. Segment Splitting: ffmpeg — individual MP3s per segment, ID3 tags, chapter markers
  |
  +-- 6. Content Analysis: Ollama qwen3:14b
          Output: episode summary, per-segment summaries, key quotes, topic tags,
                  suggested blog post topics, auto-filled post-show debrief
```
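
The weighted combination in step 3 can be sketched as follows. This is a minimal illustration, not the project's classifier: the function names, the signal keys, and the assumption that every signal is already normalized to the range 0 to 1 are hypothetical. Only the five listed weights and the 0.70 commercial threshold come from this page.

```python
# Weights as listed in the pipeline diagram; the key names are illustrative.
SIGNAL_WEIGHTS = {
    "fingerprint_match": 0.30,
    "speaker_identity": 0.25,
    "audio_characteristics": 0.20,
    "break_pattern": 0.15,
    "structural_heuristics": 0.10,
}
COMMERCIAL_THRESHOLD = 0.70  # combined confidence threshold (commercial)


def combined_score(signals: dict[str, float]) -> float:
    """Weighted sum of per-signal scores; a missing signal counts as 0.0."""
    total = 0.0
    for name, weight in SIGNAL_WEIGHTS.items():
        value = signals.get(name, 0.0)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"signal {name!r} must be in [0, 1], got {value}")
        total += weight * value
    return total


def is_commercial(signals: dict[str, float]) -> bool:
    """Assumed decision rule: at or above the threshold counts as commercial."""
    return combined_score(signals) >= COMMERCIAL_THRESHOLD
```

Under these assumptions, a segment with a perfect fingerprint match and speaker identity but no other evidence scores 0.55 and stays below the threshold.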

### Key Thresholds

| Parameter | Value |
|---|---|
| Host/co-host match threshold | 0.85 cosine similarity (WavLM) |
| Tara (co-host) vs Mike separation | 0.698 cosine similarity |
| CALLER minimum coverage in transcript segment | 4.0 seconds |
| Promo score threshold | 2 (weighted signatures) |
| Min Q&A question duration | 5.0s |
| Min Q&A answer duration | 15.0s |
| Max gap between Q and A | 30.0s |
| Commercial break: min/max duration | 30s / 300s |
| Combined confidence threshold (commercial) | 0.70 |
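
To illustrate how the three Q&A thresholds might interact, the sketch below accepts a question/answer pair only when both durations and the gap satisfy the table values. The `Span` type and `is_valid_qa_pair` function are hypothetical, and two details are assumptions: the bounds are treated as inclusive, and an answer that starts before the question ends is rejected.

```python
from dataclasses import dataclass

MIN_QUESTION_S = 5.0   # min Q&A question duration
MIN_ANSWER_S = 15.0    # min Q&A answer duration
MAX_GAP_S = 30.0       # max gap between Q and A


@dataclass(frozen=True)
class Span:
    """A speech span in seconds from the start of the episode."""
    start: float
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


def is_valid_qa_pair(question: Span, answer: Span) -> bool:
    """Apply the duration and gap thresholds to one candidate pair."""
    gap = answer.start - question.end
    return (
        question.duration >= MIN_QUESTION_S
        and answer.duration >= MIN_ANSWER_S
        and 0.0 <= gap <= MAX_GAP_S
    )
```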

### Voice Profile System

Bootstrapped from the 579-episode archive. Host (Mike) has era-specific embeddings (2010, 2014, 2018, 2026). Co-host Tara has 44 embeddings from 2 episodes. Unknown repeat voices are clustered and held for host review.

```
voice-profiles/
  host-mike-swanson/      -- composite + era embeddings
  guests/<name>.npy       -- named guest embeddings (built over time)
  callers/regular-NNN.npy -- unnamed repeat callers
  unknown/cluster-NNN.npy -- unidentified voices appearing multiple times
```
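
Matching a new voice embedding against stored profiles could look roughly like this. It is a sketch under stated assumptions: `cosine_similarity` and `best_profile_match` are hypothetical names, embeddings are plain sequences of floats (in practice the stored `.npy` files would first be loaded, for example with `numpy.load`), and only the 0.85 match threshold comes from the table above.

```python
import math
from typing import Optional, Sequence

MATCH_THRESHOLD = 0.85  # host/co-host match threshold (cosine similarity)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def best_profile_match(
    embedding: Sequence[float],
    profiles: dict[str, Sequence[float]],
) -> Optional[str]:
    """Return the best profile at or above the threshold, else None."""
    best_name, best_score = None, MATCH_THRESHOLD
    for name, reference in profiles.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

A `None` result corresponds to the "unknown" bucket: the voice matched no named profile strongly enough and would be clustered for later review.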

### Archive Index (SQLite)

`archive.db` schema: `episodes`, `segments`, `segments_fts` (FTS5), `qa_pairs`, `qa_fts`. As of 2026-05-01 on BEAST: 572 episodes indexed.

FTS5 search supports: segment text search, Q&A pair search, speaker filter.
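
A segment text search with an optional speaker filter might be written as below. The column layout of `segments_fts` is an assumption made for this sketch (the real schema lives in the audio-processor code), and `build_demo_index` and `search_segments` are hypothetical helpers. The demo builds an in-memory table so the query can be tried without `archive.db`.

```python
import sqlite3
from typing import Optional


def build_demo_index() -> sqlite3.Connection:
    """Create an in-memory FTS5 table with an assumed column layout."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE segments_fts USING fts5("
        "episode_id UNINDEXED, speaker UNINDEXED, text)"
    )
    conn.executemany(
        "INSERT INTO segments_fts (episode_id, speaker, text) VALUES (?, ?, ?)",
        [
            (1, "HOST", "Reboot the router before calling your ISP"),
            (1, "CALLER", "My router keeps dropping wifi"),
            (2, "HOST", "Backups matter more than antivirus"),
        ],
    )
    return conn


def search_segments(
    conn: sqlite3.Connection, query: str, speaker: Optional[str] = None
) -> list:
    """Full-text search over segment text, best matches first."""
    sql = (
        "SELECT episode_id, speaker, text FROM segments_fts "
        "WHERE segments_fts MATCH ?"
    )
    params = [query]
    if speaker is not None:
        sql += " AND speaker = ?"  # plain equality filter on an unindexed column
        params.append(speaker)
    sql += " ORDER BY rank"
    return conn.execute(sql, params).fetchall()
```

The query string is passed to FTS5 as-is, so user input containing FTS5 query syntax (quotes, `AND`, `*`) would need quoting or validation before reaching `MATCH`.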

### FastAPI Browse/Search UI

Single-file server at `projects/radio-show/audio-processor/server/main.py`. Two embedded HTML templates:

- `INDEX_HTML` — search/browse page with CSS custom property theme (`#c39733` accent), browse-mode toggle, Q&A pill badges.
- `EPISODE_HTML` — episode detail page with sticky `<audio>` player, active-Q&A highlight that follows playhead via `timeupdate` listener, `preload="metadata"`.

Env vars: `ARCHIVE_DB`, `EPISODES_DIR`, `PORT`.
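
Reading these three variables at startup could be sketched as follows. `ServerConfig` and `load_config` are hypothetical names, and the fallback values are assumptions: only port 8765 appears on this page, while the default paths are placeholders.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServerConfig:
    archive_db: Path
    episodes_dir: Path
    port: int


def load_config(env: Optional[Mapping[str, str]] = None) -> ServerConfig:
    """Build the config from environment variables, with assumed defaults."""
    env = os.environ if env is None else env
    return ServerConfig(
        archive_db=Path(env.get("ARCHIVE_DB", "archive.db")),    # placeholder default
        episodes_dir=Path(env.get("EPISODES_DIR", "episodes")),  # placeholder default
        port=int(env.get("PORT", "8765")),                       # port used on BEAST and Jupiter
    )
```

Accepting an explicit mapping keeps the function easy to exercise without mutating the process environment.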

### Post-Show Content Workflow

Three content tiers produced within 48 hours of each episode:

| Tier | Target | Output |
|---|---|---|
| 1 | Radio show website | Episode page (`website/src/content/episodes/sXXeYY-slug.md`) with summary, chapters, links |
| 2 | Flarum forum | Discussion thread (tag: Show Discussion, ID 8) at community.azcomputerguru.com |
| 3 | Radio show website | 1–3 deep-dive blog posts (`website/src/content/blog/<slug>.md`) |

Claude handles: generating all content from show-prep + debrief, posting to Flarum via DB insert, building and deploying the Astro website.

---

## Deployment / Hosting

| Item | Value |
|---|---|
| Jupiter (primary archive host) | `172.16.3.20:8765` — uvicorn, FastAPI |
| Local dev (BEAST) | `127.0.0.1:8765` — same port as Jupiter for bookmark parity |
| Archive source (IX server) | `172.16.3.10` — `gurushow@`, `/home/gurushow/public_html/archive/Radio/` |
| Archive local copy (BEAST) | `projects/radio-show/audio-processor/archive-data/` |
| Forum | community.azcomputerguru.com (Flarum) |
| Radio show website | Astro site, deployed via rsync |

[WARNING] Jupiter's `/data/episodes` tree is EMPTY. `GET /api/audio/{id}` returns HTTP 404 for all episode IDs on Jupiter. Audio works locally on BEAST only (full archive in `archive-data/episodes/`). Fix decision is pending — see Open Items.

---

## Configuration / Credentials

| Secret | Location |
|---|---|
| IX server SSH (gurushow) | SOPS vault — search `gurushow` or `ix server` |
| HuggingFace token (pyannote license) | `huggingface-cli login` — required for pyannote.audio |
| Forum DB access (Flarum insert) | SOPS vault — search `flarum` or `community forum` |

IX server access: paramiko with `look_for_keys=False, allow_agent=False`. Tailscale required for `172.16.3.10`.
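
A connection helper following that rule might look like the sketch below. The function names and the 15-second timeout are hypothetical, and password authentication is an assumption based on key-based auth being disabled on this host. The password itself should come from the SOPS vault, never from source code.

```python
def ix_connect_kwargs(password: str) -> dict:
    """Keyword arguments for paramiko.SSHClient.connect on the IX server."""
    return {
        "hostname": "172.16.3.10",
        "username": "gurushow",
        "password": password,
        "look_for_keys": False,  # do not try key files under ~/.ssh
        "allow_agent": False,    # do not ask a running SSH agent
        "timeout": 15,           # assumed value, tune as needed
    }


def open_ix_sftp(password: str):
    """Connect and return (client, sftp); the caller closes both."""
    import paramiko  # third-party dependency, imported lazily

    client = paramiko.SSHClient()
    # The host key must already be known locally; otherwise connect() rejects it.
    client.load_system_host_keys()
    client.connect(**ix_connect_kwargs(password))
    return client, client.open_sftp()
```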

---

## Active Work / Open Items

- [ ] **Jupiter audio fix (open, unresolved).** Three options; none chosen yet:
  1. rsync full archive (~30–40 GB) to Jupiter at `/data/episodes/`
  2. Proxy `/api/audio/{id}` from Jupiter to IX on demand (roughly 5 lines of code)
  3. Point `<audio src>` at IX directly via public HTTPS endpoint
- [ ] **Commit intro/QA sort tie-break fix** (`server/main.py` lines 551, 597 — `key=lambda x: x[0]`). Two-line fix, uncommitted as of end of 2026-05-01 session. Mike had not yet OK'd the commit.
- [ ] **RTX 4090 benchmark on BEAST** — establish diarization RTF baseline (expected ~250–300x vs 209.7x on laptop 5070 Ti).
- [ ] **Download full archive from IX to BEAST** for batch training (paramiko script skeleton exists in prior session log `2026-04-27-diarization-pipeline.md`).
- [ ] **Verify Tara profile generalizes across 2015/2016 episodes** — re-run `build_cohost_profile.py` with additional windows if false positives appear.
- [ ] **Post-show workflow automation** — social media, email newsletter, podcast RSS still need platform setup.

---

## Key Events / History

| Date | Event |
|---|---|
| 2010–2018 | Show original run. 579 episodes archived. No 2013 season. |
| 2026-04-27 | Q&A extraction + co-host profile session (DESKTOP-0O8A1RL). Built Tara co-host voice profile (44 embeddings, 0.698 cosine vs Mike). Fixed false-positive Q&A extraction for co-host episodes. Created `archive.db` with FTS5. Indexed 6 test episodes: 762 segments, 10 Q&A pairs. Transcription benchmarked at 63.8x realtime; diarization at 209.7x realtime. |
| 2026-04-30 | UI redesign done on BEAST (mid-session, uncommitted before reboot). |
| 2026-05-01 | Session recovery after BEAST reboot. Found 820-line uncommitted diff to `server/main.py`. Committed as `d7ce9cb` (rebased to `296d157`). Diagnosed Jupiter audio-404 (pre-existing deployment gap, not a regression). Deployed locally on BEAST — confirmed 572 episodes, working audio. Fixed the sort bug that caused an HTTP 500 on the episode page (observed on episode 479). |
| 2026-05-01 | Co-host name corrected: previously labeled "Tom" in session log, Mike confirmed it is "Tara." All references updated. |

---

## Anti-Patterns / Warnings

[WARNING] Do NOT attempt interactive SSH to `gurushow@172.16.3.10` from scripts. Use paramiko with `look_for_keys=False, allow_agent=False`. Key-based auth is disabled on this host.

[WARNING] Tailscale must be active to reach `172.16.3.10` (IX server) or `172.16.3.20` (Jupiter).

[WARNING] The Ollama `/save` protocol has a known stale-prompt-file bug: `save_narrative_prompt.txt` at `C:/Users/guru/AppData/Local/Temp/` is reused across sessions and can cause qwen3 to produce a narrative about the WRONG session. Recovery: write narrative directly. Fix: delete prompt file before re-writing, or use a unique per-session filename.
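
The "unique per-session filename" fix could be sketched like this. `write_session_prompt` is a hypothetical helper, not part of the existing `/save` tooling, and it assumes `session_id` is already safe to use in a filename.

```python
import tempfile
import uuid
from pathlib import Path
from typing import Optional


def write_session_prompt(
    session_id: str, prompt: str, directory: Optional[str] = None
) -> Path:
    """Write the prompt to a fresh file so no earlier session's text is reused."""
    base = Path(directory) if directory is not None else Path(tempfile.gettempdir())
    # A random suffix guarantees a new file even if the session ID repeats.
    path = base / f"save_narrative_prompt_{session_id}_{uuid.uuid4().hex}.txt"
    path.write_text(prompt, encoding="utf-8")
    return path
```

The caller would pass the returned path to the model invocation and delete the file afterwards.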

[WARNING] `sorted()` over `(timestamp, sqlite3.Row)` tuples without `key=` will raise `TypeError` when two rows share the same timestamp. Always use `key=lambda x: x[0]`. This bit `_episode_html` at lines 551 and 597 (2026-05-01 bug).
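
The failure and the fix can be reproduced in isolation. In this sketch the `items` table and both helper names are hypothetical stand-ins for the real queries in `server/main.py`. The point is only that `sqlite3.Row` supports equality but not ordering, so a timestamp tie forces Python to compare rows unless `key=` is supplied.

```python
import sqlite3


def make_items() -> list:
    """Build (timestamp, Row) tuples where two rows share a timestamp."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute("CREATE TABLE items (start REAL, kind TEXT)")
    conn.executemany(
        "INSERT INTO items VALUES (?, ?)",
        [(12.5, "intro"), (12.5, "qa"), (3.0, "qa")],
    )
    rows = conn.execute("SELECT start, kind FROM items ORDER BY rowid").fetchall()
    return [(row["start"], row) for row in rows]


def sort_items(items: list) -> list:
    """Sort by timestamp only; ties keep their original order (stable sort)."""
    return sorted(items, key=lambda x: x[0])


items = make_items()
try:
    sorted(items)  # tie on 12.5 falls through to Row < Row
except TypeError as exc:
    print("without key=:", exc)

print([row["kind"] for _, row in sort_items(items)])
```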

[INFO] Co-host voice profiles must be built from the first 60 minutes of co-host episodes. Real callers do not call in during the first hour, so any CALLER-labeled windows in that span can safely be treated as co-host speech.

[INFO] Tara's exact tenure as co-host is unverified. Do not assume her profile applies across all 2013–2016 episodes without spot-checking.

---

## Backlinks

- `wiki/systems/jupiter.md` [unverified — may not exist yet] — Jupiter server spec
- `wiki/systems/ix-server.md` [unverified — may not exist yet] — IX hosting server spec
- `wiki/projects/gururmm.md` — related ACG project
- `projects/radio-show/audio-processor/README.md` — full pipeline spec and configuration reference
- `projects/radio-show/post-show-workflow.md` — full post-show content workflow spec