claudetools/wiki/projects/radio-show.md
Mike Swanson f4fb131529 wiki: seed remaining clients and projects (batch 3)
Adds 11 client articles and 5 project articles:

Clients: kittle, khalsa, anaise, azcomputerguru.com, bg-builders,
evs, furrier, horseshoe-management, kittle-design, scileppi-law,
western-tire

Projects: discord-bot, radio-show, msp-pricing, wrightstown-smarthome,
wrightstown-solar

Updates wiki/index.md with all new entries, cross-references, and
removes seeded client:birthbiologic from compilation queue.

Critical findings surfaced:
- Kittle: WS2025 EVAL license, no backups, 3 plaintext creds in Syncro
- Western Tire: SSL cert *.westerntire.com expires 2026-05-30
- Kittle Design: active compromise (Ken inbox rule unresolved)
- Horseshoe Mgmt: plaintext creds for 5+ users in Syncro notes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-24 19:59:40 -07:00


type: project
name: radio-show
display_name: The Computer Guru Show
last_compiled: 2026-05-24
compiled_by: DESKTOP-0O8A1RL/claude-main
sources:
  - projects/radio-show/post-show-workflow.md
  - projects/radio-show/audio-processor/README.md
  - projects/radio-show/session-logs/2026-04-27-qa-extraction-cohost-indexing.md
  - projects/radio-show/session-logs/2026-05-01-ui-redesign-recovery.md

The Computer Guru Show

Overview

"The Computer Guru Show" is Mike Swanson's radio program. The project covers two distinct workstreams:

  1. Audio Processor: automated pipeline that processes raw broadcast recordings (with commercials) into podcast-ready audio, transcripts, speaker-diarized segments, and a searchable SQLite archive.
  2. Post-Show Content Workflow: process for turning each episode into an episode page (website), a forum discussion thread (Flarum), and 1-3 deep-dive blog posts within 48 hours of air.

Status: Active development. Audio processor pipeline functional with 572 episodes indexed locally on BEAST. FastAPI browse/search UI redesigned (2026-05-01). Jupiter deployment has a known audio-file gap (open). Post-show workflow documented but not yet fully automated.

Archive spans 2010-2018 (no 2013 season), 579 MP3s, ~30-40 GB.


Tech Stack

| Layer | Technology |
| --- | --- |
| Transcription | faster-whisper (large-v3, CTranslate2 + CUDA), int8_float16, batched |
| Speaker diarization | pyannote.audio 3.1 (WavLM embeddings) |
| Audio processing | ffmpeg, pydub, librosa |
| Audio fingerprinting | chromaprint |
| Voice activity detection | silero-vad |
| ML / classification | scikit-learn (break pattern classifier) |
| Content analysis | Ollama, qwen3:14b (narrative/summary), local LLM |
| Archive database | SQLite with FTS5 (segments, Q&A pairs) |
| Web server | FastAPI + uvicorn (embedded HTML templates) |
| Hardware (primary) | DESKTOP-0O8A1RL, RTX 5070 Ti Laptop GPU |
| Hardware (secondary) | GURU-BEAST-ROG, RTX 4090 (benchmark pending) |

Architecture

Audio Processor Pipeline

Raw MP3 (full broadcast with commercials)
  |
  +-- 1. Transcription: faster-whisper large-v3 (63.8x realtime on 5070 Ti)
  |       Output: word-level timestamps, language detection
  |
  +-- 2. Speaker Diarization: pyannote.audio 3.1 (209.7x realtime on 5070 Ti)
  |       10s windows / 5s hop, midpoint boundary resolution at load time
  |       Speaker profiles: host (Mike, era-specific embeddings), co-hosts, callers
  |
  +-- 3. Segment Detection: Multi-signal classifier (5 signals, combined weighted score)
  |       Signals: fingerprint match (0.30), speaker identity (0.25),
  |       audio characteristics (0.20), break pattern (0.15), structural heuristics (0.10)
  |       Element library: SQLite fingerprints.db + learning/discovery system
  |
  +-- 4. Commercial Removal: ffmpeg — stitch segments, EBU R128 normalize
  |
  +-- 5. Segment Splitting: ffmpeg — individual MP3s per segment, ID3 tags, chapter markers
  |
  +-- 6. Content Analysis: Ollama qwen3:14b
          Output: episode summary, per-segment summaries, key quotes, topic tags,
                  suggested blog post topics, auto-filled post-show debrief
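
The weighted combination in step 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's classifier: the signal key names and the `combined_score` and `is_commercial` helpers are hypothetical, each signal is assumed to be pre-normalized to the range 0 to 1, and gating on break duration in the same function is also an assumption. The weights come from the diagram; the 0.70 cutoff and the 30s / 300s bounds come from Key Thresholds.

```python
# Weights as listed in the pipeline diagram; the key names are hypothetical.
WEIGHTS = {
    "fingerprint_match": 0.30,
    "speaker_identity": 0.25,
    "audio_characteristics": 0.20,
    "break_pattern": 0.15,
    "structural_heuristics": 0.10,
}
COMMERCIAL_THRESHOLD = 0.70            # combined confidence threshold (commercial)
MIN_BREAK_S, MAX_BREAK_S = 30.0, 300.0  # commercial break min/max duration


def combined_score(signals: dict[str, float]) -> float:
    """Weighted sum of per-signal scores; a missing signal counts as 0.0."""
    return sum(weight * signals.get(name, 0.0) for name, weight in WEIGHTS.items())


def is_commercial(signals: dict[str, float], duration_s: float) -> bool:
    """Flag a candidate span only if its length and combined score both qualify."""
    in_range = MIN_BREAK_S <= duration_s <= MAX_BREAK_S
    return in_range and combined_score(signals) >= COMMERCIAL_THRESHOLD
```

Under these assumptions a fingerprint match alone (0.30) never clears the threshold; at least three of the stronger signals have to agree.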

Key Thresholds

| Parameter | Value |
| --- | --- |
| Host/co-host match threshold | 0.85 cosine similarity (WavLM) |
| Tara (co-host) vs Mike separation | 0.698 cosine similarity |
| CALLER minimum coverage in transcript segment | 4.0 seconds |
| Promo score threshold | 2 (weighted signatures) |
| Min Q&A question duration | 5.0s |
| Min Q&A answer duration | 15.0s |
| Max gap between Q and A | 30.0s |
| Commercial break: min/max duration | 30s / 300s |
| Combined confidence threshold (commercial) | 0.70 |
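
To show how the three Q&A thresholds interact, here is a rough sketch of a pairing pass. The `Turn` dataclass, the `extract_qa_pairs` function, and the rule "a CALLER turn is the question, the next qualifying HOST turn is the answer" are all assumptions made for illustration; the real extraction logic lives in the audio processor and may differ. Turns are assumed to be sorted by start time.

```python
from dataclasses import dataclass

MIN_QUESTION_S = 5.0   # min Q&A question duration
MIN_ANSWER_S = 15.0    # min Q&A answer duration
MAX_GAP_S = 30.0       # max gap between Q and A


@dataclass
class Turn:
    speaker: str   # "CALLER" or "HOST" (label names assumed)
    start: float   # seconds from episode start
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


def extract_qa_pairs(turns):
    """Pair each long-enough CALLER turn with the next long-enough HOST turn."""
    pairs = []
    for i, question in enumerate(turns):
        if question.speaker != "CALLER" or question.duration < MIN_QUESTION_S:
            continue
        for answer in turns[i + 1:]:
            if answer.start - question.end > MAX_GAP_S:
                break  # too far away to count as an answer to this question
            if answer.speaker == "HOST" and answer.duration >= MIN_ANSWER_S:
                pairs.append((question, answer))
                break
    return pairs
```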

Voice Profile System

Bootstrapped from the 579-episode archive. Host (Mike) has era-specific embeddings (2010, 2014, 2018, 2026). Co-host Tara has 44 embeddings from 2 episodes. Unknown repeat voices are clustered and held for host review.

voice-profiles/
  host-mike-swanson/      -- composite + era embeddings
  guests/<name>.npy       -- named guest embeddings (built over time)
  callers/regular-NNN.npy -- unnamed repeat callers
  unknown/cluster-NNN.npy -- unidentified voices appearing multiple times
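
Matching a new embedding against stored profiles comes down to a cosine-similarity lookup with the 0.85 threshold from Key Thresholds. The sketch below is illustrative only: `cosine_similarity` and `best_match` are hypothetical helpers, the profile names are placeholders, and plain Python lists stand in for the `.npy` arrays the project stores. It assumes at least one profile is present.

```python
import math

MATCH_THRESHOLD = 0.85  # host/co-host match threshold


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(embedding, profiles):
    """Return (name, score) for the closest profile; name is None below the threshold."""
    name, score = max(
        ((n, cosine_similarity(embedding, ref)) for n, ref in profiles.items()),
        key=lambda item: item[1],
    )
    return (name if score >= MATCH_THRESHOLD else None), score
```

Returning the score even when no profile matches makes it easier to route near-misses into the unknown/ clusters for host review.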

Archive Index (SQLite)

archive.db schema: episodes, segments, segments_fts (FTS5), qa_pairs, qa_fts. As of 2026-05-01 on BEAST: 572 episodes indexed.

FTS5 search supports: segment text search, Q&A pair search, speaker filter.
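
A segment text search with a speaker filter might look like the sketch below. Only the table name `segments_fts` comes from the schema above; the column layout (`text`, `speaker`, `episode_id`), the `search_segments` helper, and the sample rows are assumptions for illustration. It also assumes the local SQLite build includes FTS5.

```python
import sqlite3


def search_segments(conn, query, speaker=None, limit=20):
    """Full-text search over segments_fts, optionally restricted to one speaker."""
    sql = (
        "SELECT episode_id, speaker, text FROM segments_fts "
        "WHERE segments_fts MATCH ?"
    )
    params = [query]
    if speaker is not None:
        sql += " AND speaker = ?"
        params.append(speaker)
    sql += " ORDER BY rank LIMIT ?"  # rank is FTS5's built-in relevance column
    params.append(limit)
    return conn.execute(sql, params).fetchall()


# Demo against an in-memory table with the assumed column layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE segments_fts "
    "USING fts5(text, speaker UNINDEXED, episode_id UNINDEXED)"
)
conn.executemany(
    "INSERT INTO segments_fts (text, speaker, episode_id) VALUES (?, ?, ?)",
    [
        ("my router keeps dropping wifi", "CALLER", 1),
        ("replace the router firmware first", "HOST", 1),
        ("backups should be tested monthly", "HOST", 2),
    ],
)
hits = search_segments(conn, "router", speaker="HOST")
```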

FastAPI Browse/Search UI

Single-file server at projects/radio-show/audio-processor/server/main.py. Two embedded HTML templates:

  • INDEX_HTML — search/browse page with CSS custom property theme (#c39733 accent), browse-mode toggle, Q&A pill badges.
  • EPISODE_HTML — episode detail page with sticky <audio> player, active-Q&A highlight that follows playhead via timeupdate listener, preload="metadata".

Env vars: ARCHIVE_DB, EPISODES_DIR, PORT.
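
One way to read the three variables at startup is shown below. This is a sketch, not the code in main.py: the `Settings` dataclass, the `load_settings` helper, and the fallback paths are hypothetical; the 8765 fallback mirrors the port listed under Deployment / Hosting.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Settings:
    archive_db: Path
    episodes_dir: Path
    port: int


def load_settings(env=None):
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return Settings(
        archive_db=Path(env.get("ARCHIVE_DB", "archive.db")),    # fallback path assumed
        episodes_dir=Path(env.get("EPISODES_DIR", "episodes")),  # fallback path assumed
        port=int(env.get("PORT", "8765")),
    )
```

Accepting the mapping as a parameter keeps the loader easy to exercise without touching the real environment.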

Post-Show Content Workflow

Three content tiers produced within 48 hours of each episode:

| Tier | Target | Output |
| --- | --- | --- |
| 1 | Radio show website | Episode page (website/src/content/episodes/sXXeYY-slug.md) with summary, chapters, links |
| 2 | Flarum forum | Discussion thread (tag: Show Discussion, ID 8) at community.azcomputerguru.com |
| 3 | Radio show website | 1-3 deep-dive blog posts (website/src/content/blog/<slug>.md) |

Claude handles: generating all content from show-prep + debrief, posting to Flarum via DB insert, building and deploying the Astro website.


Deployment / Hosting

| Item | Value |
| --- | --- |
| Jupiter (primary archive host) | 172.16.3.20:8765, uvicorn, FastAPI |
| Local dev (BEAST) | 127.0.0.1:8765, same port as Jupiter for bookmark parity |
| Archive source (IX server) | gurushow@172.16.3.10, /home/gurushow/public_html/archive/Radio/ |
| Archive local copy (BEAST) | projects/radio-show/audio-processor/archive-data/ |
| Forum | community.azcomputerguru.com (Flarum) |
| Radio show website | Astro site, deployed via rsync |

[WARNING] Jupiter's /data/episodes tree is EMPTY. GET /api/audio/{id} returns HTTP 404 for all episode IDs on Jupiter. Audio works locally on BEAST only (full archive in archive-data/episodes/). Fix decision is pending — see Open Items.


Configuration / Credentials

| Secret | Location |
| --- | --- |
| IX server SSH (gurushow) | SOPS vault: search gurushow or ix server |
| HuggingFace token (pyannote license) | huggingface-cli login (required for pyannote.audio) |
| Forum DB access (Flarum insert) | SOPS vault: search flarum or community forum |

IX server access: paramiko with look_for_keys=False, allow_agent=False. Tailscale required for 172.16.3.10.
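
The connection settings above translate to paramiko roughly as follows. Treat this as a sketch: `ix_connect_kwargs` and `open_ix_sftp` are hypothetical helpers, the 15-second timeout and the `AutoAddPolicy` host-key policy are assumptions, and the password is expected to be fetched from the SOPS vault by the caller.

```python
def ix_connect_kwargs(password, host="172.16.3.10", username="gurushow"):
    """Keyword arguments for paramiko.SSHClient.connect() using password-only auth."""
    return {
        "hostname": host,
        "username": username,
        "password": password,
        "look_for_keys": False,  # do not try private keys from ~/.ssh
        "allow_agent": False,    # do not try keys held by an SSH agent
        "timeout": 15,           # seconds; assumed value
    }


def open_ix_sftp(password):
    """Open an SSH client and SFTP session to the IX server (requires Tailscale)."""
    import paramiko  # third-party; imported lazily so the helper above has no dependency

    client = paramiko.SSHClient()
    # Assumption: accept unknown host keys; pin the host key instead where possible.
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(**ix_connect_kwargs(password))
    return client, client.open_sftp()
```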


Active Work / Open Items

  • Jupiter audio fix (open, unresolved). Three options, no pick made:
    1. rsync full archive (~30-40 GB) to Jupiter at /data/episodes/
    2. Proxy /api/audio/{id} from Jupiter to IX on demand (~5 lines)
    3. Point <audio src> at IX directly via public HTTPS endpoint
  • Commit intro/QA sort tie-break fix (server/main.py lines 551, 597 — key=lambda x: x[0]). Two-line fix, uncommitted as of end of 2026-05-01 session. Mike had not yet OK'd the commit.
  • RTX 4090 benchmark on BEAST: establish a diarization RTF (real-time factor) baseline (expected ~250-300x vs 209.7x on the laptop 5070 Ti).
  • Download full archive from IX to BEAST for batch training (paramiko script skeleton exists in prior session log 2026-04-27-diarization-pipeline.md).
  • Verify Tara profile generalizes across 2015/2016 episodes — re-run build_cohost_profile.py with additional windows if false positives appear.
  • Post-show workflow automation — social media, email newsletter, podcast RSS still need platform setup.

Key Events / History

| Date | Event |
| --- | --- |
| 2010-2018 | Show original run. 579 episodes archived. No 2013 season. |
| 2026-04-27 | Q&A extraction + co-host profile session (DESKTOP-0O8A1RL). Built Tara co-host voice profile (44 embeddings, 0.698 cosine vs Mike). Fixed false-positive Q&A extraction for co-host episodes. Created archive.db with FTS5. Indexed 6 test episodes: 762 segments, 10 Q&A pairs. Transcription benchmarked at 63.8x realtime; diarization at 209.7x realtime. |
| 2026-04-30 | UI redesign done on BEAST (mid-session, uncommitted before reboot). |
| 2026-05-01 | Session recovery after BEAST reboot. Found 820-line uncommitted diff to server/main.py. Committed as d7ce9cb (rebased to 296d157). Diagnosed Jupiter audio-404 (pre-existing deployment gap, not a regression). Deployed locally on BEAST and confirmed 572 episodes with working audio. Fixed the episode-page HTTP 500 caused by the sort bug (episode 479). |
| 2026-05-01 | Co-host name corrected: previously labeled "Tom" in session log; Mike confirmed it is "Tara." All references updated. |

Anti-Patterns / Warnings

[WARNING] Do NOT attempt interactive SSH to gurushow@172.16.3.10 from scripts. Use paramiko with look_for_keys=False, allow_agent=False. Key-based auth is disabled on this host.

[WARNING] Tailscale must be active to reach 172.16.3.10 (IX server) or 172.16.3.20 (Jupiter).

[WARNING] The Ollama /save protocol has a known stale-prompt-file bug: save_narrative_prompt.txt at C:/Users/guru/AppData/Local/Temp/ is reused across sessions and can cause qwen3 to produce a narrative about the WRONG session. Recovery: write narrative directly. Fix: delete prompt file before re-writing, or use a unique per-session filename.
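
The "unique per-session filename" fix could be as small as the sketch below. The `write_session_prompt` helper and the `save_narrative_prompt_<session_id>.txt` naming scheme are hypothetical; the session ID is whatever identifier the /save protocol already has on hand.

```python
import tempfile
from pathlib import Path


def write_session_prompt(session_id, prompt_text, tmp_dir=None):
    """Write the narrative prompt to a file unique to this session and return its path."""
    tmp_dir = Path(tmp_dir or tempfile.gettempdir())
    path = tmp_dir / f"save_narrative_prompt_{session_id}.txt"
    path.unlink(missing_ok=True)  # drop any stale copy before re-writing
    path.write_text(prompt_text, encoding="utf-8")
    return path
```

Because the session ID is part of the filename, a prompt left behind by an earlier session can no longer be picked up by mistake.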

[WARNING] sorted() over (timestamp, sqlite3.Row) tuples without key= will raise TypeError when two rows share the same timestamp. Always use key=lambda x: x[0]. This bit _episode_html at lines 551 and 597 (2026-05-01 bug).
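
The failure is easy to reproduce in isolation. In the sketch below the `segments` table, its columns, and the sample rows are made up for the demonstration; only the `(timestamp, sqlite3.Row)` tuple shape and the `key=lambda x: x[0]` fix come from the warning above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE segments (start REAL, text TEXT)")
conn.executemany(
    "INSERT INTO segments VALUES (?, ?)",
    [(12.0, "intro"), (12.0, "first question"), (3.5, "cold open")],
)
rows = conn.execute("SELECT start, text FROM segments ORDER BY rowid")
items = [(row["start"], row) for row in rows]

try:
    sorted(items)  # equal timestamps fall through to comparing sqlite3.Row objects
    tie_raises = False
except TypeError:
    tie_raises = True

# Compare timestamps only; sorted() is stable, so ties keep their input order.
ordered = sorted(items, key=lambda x: x[0])
texts = [row["text"] for _, row in ordered]
```

The bug only appears when two rows share a timestamp, which is why it can stay hidden until a particular episode triggers it.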

[INFO] Co-host voice profiles must be built from the first 60 minutes of co-host episodes. Real callers do not call in during the first hour — those CALLER-labeled windows are safely all co-host speech.

[INFO] Tara's exact tenure as co-host is unverified. Do not assume her profile applies across all 2013-2016 episodes without spot-checking.


Related

  • wiki/systems/jupiter.md [unverified, may not exist yet]: Jupiter server spec
  • wiki/systems/ix-server.md [unverified, may not exist yet]: IX hosting server spec
  • wiki/projects/gururmm.md: related ACG project
  • projects/radio-show/audio-processor/README.md: full pipeline spec and configuration reference
  • projects/radio-show/post-show-workflow.md: full post-show content workflow spec