---
type: project
name: radio-show
display_name: The Computer Guru Show
last_compiled: 2026-05-24
compiled_by: DESKTOP-0O8A1RL/claude-main
sources:
  - projects/radio-show/post-show-workflow.md
  - projects/radio-show/audio-processor/README.md
  - projects/radio-show/session-logs/2026-04-27-qa-extraction-cohost-indexing.md
  - projects/radio-show/session-logs/2026-05-01-ui-redesign-recovery.md
---

# The Computer Guru Show

## Overview

"The Computer Guru Show" is Mike Swanson's radio program. The project covers two distinct workstreams:

1. **Audio Processor**: Automated pipeline that processes raw broadcast recordings (with commercials) into podcast-ready audio, transcripts, speaker-diarized segments, and a searchable SQLite archive.
2. **Post-Show Content Workflow**: Process for turning each episode into an episode page (website), forum discussion thread (Flarum), and 1–3 deep-dive blog posts within 48 hours of air.

**Status:** Active development. Audio processor pipeline functional with 572 episodes indexed locally on BEAST. FastAPI browse/search UI redesigned (2026-05-01). Jupiter deployment has a known audio-file gap (open). Post-show workflow documented but not yet fully automated.

Archive spans 2010–2018 (no 2013 season), 579 MP3s, ~30–40 GB.

---

## Tech Stack

| Layer | Technology |
|---|---|
| Transcription | faster-whisper (`large-v3`, CTranslate2 + CUDA), int8_float16, batched |
| Speaker diarization | pyannote.audio 3.1 (WavLM embeddings) |
| Audio processing | ffmpeg, pydub, librosa |
| Audio fingerprinting | chromaprint |
| Voice activity detection | silero-vad |
| ML / classification | scikit-learn (break pattern classifier) |
| Content analysis | Ollama `qwen3:14b` (narrative/summary), local LLM |
| Archive database | SQLite with FTS5 (segments, Q&A pairs) |
| Web server | FastAPI + uvicorn (embedded HTML templates) |
| Hardware (primary) | DESKTOP-0O8A1RL, RTX 5070 Ti Laptop GPU |
| Hardware (secondary) | GURU-BEAST-ROG, RTX 4090 (benchmark pending) |

---

## Architecture

### Audio Processor Pipeline

```
Raw MP3 (full broadcast with commercials)
    |
    +-- 1. Transcription: faster-whisper large-v3 (63.8x realtime on 5070 Ti)
    |       Output: word-level timestamps, language detection
    |
    +-- 2. Speaker Diarization: pyannote.audio 3.1 (209.7x realtime on 5070 Ti)
    |       10s windows / 5s hop, midpoint boundary resolution at load time
    |       Speaker profiles: host (Mike, era-specific embeddings), co-hosts, callers
    |
    +-- 3. Segment Detection: Multi-signal classifier (6 signals, combined weighted score)
    |       Signals: fingerprint match (0.30), speaker identity (0.25),
    |       audio characteristics (0.20), break pattern (0.15), structural heuristics (0.10)
    |       Element library: SQLite fingerprints.db + learning/discovery system
    |
    +-- 4. Commercial Removal: ffmpeg (stitch segments, EBU R128 normalize)
    |
    +-- 5. Segment Splitting: ffmpeg (individual MP3s per segment, ID3 tags, chapter markers)
    |
    +-- 6. Content Analysis: Ollama qwen3:14b
            Output: episode summary, per-segment summaries, key quotes, topic tags,
            suggested blog post topics, auto-filled post-show debrief
```

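The combined weighted score in step 3 can be sketched roughly as below. This is a minimal illustration, not the classifier in the audio processor: the dictionary keys, the function names, and the choice to check break duration before scoring are assumptions. Only the five weights, the 0.70 combined confidence threshold, and the 30 s / 300 s duration bounds come from this article. The diagram mentions six signals but gives weights for five, so the sketch uses only the five listed.

```python
from typing import Mapping

# Weights as listed in the pipeline diagram; the key names are hypothetical.
SIGNAL_WEIGHTS = {
    "fingerprint_match": 0.30,
    "speaker_identity": 0.25,
    "audio_characteristics": 0.20,
    "break_pattern": 0.15,
    "structural_heuristics": 0.10,
}

COMMERCIAL_THRESHOLD = 0.70              # combined confidence threshold (commercial)
MIN_BREAK_S, MAX_BREAK_S = 30.0, 300.0   # commercial break min/max duration


def combined_score(signals: Mapping[str, float]) -> float:
    """Weighted sum of per-signal scores in [0, 1]; missing signals count as 0."""
    return sum(weight * signals.get(name, 0.0)
               for name, weight in SIGNAL_WEIGHTS.items())


def is_commercial(signals: Mapping[str, float], duration_s: float) -> bool:
    """Assumed rule: duration must be in bounds and the score must reach 0.70."""
    if not (MIN_BREAK_S <= duration_s <= MAX_BREAK_S):
        return False
    return combined_score(signals) >= COMMERCIAL_THRESHOLD
```

With these weights, a fingerprint match alone (0.30) cannot cross the threshold; at least three strong signals are needed.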

### Key Thresholds

| Parameter | Value |
|---|---|
| Host/co-host match threshold | 0.85 cosine similarity (WavLM) |
| Tara (co-host) vs Mike separation | 0.698 cosine similarity |
| CALLER minimum coverage in transcript segment | 4.0 seconds |
| Promo score threshold | 2 (weighted signatures) |
| Min Q&A question duration | 5.0s |
| Min Q&A answer duration | 15.0s |
| Max gap between Q and A | 30.0s |
| Commercial break: min/max duration | 30s / 300s |
| Combined confidence threshold (commercial) | 0.70 |

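As an illustration of how the three Q&A thresholds above might combine, here is a small sketch. The `Span` type, the function name, and the treatment of overlapping speech as a zero-second gap are assumptions; the actual extraction code may apply the limits differently.

```python
from dataclasses import dataclass

MIN_QUESTION_S = 5.0   # min Q&A question duration
MIN_ANSWER_S = 15.0    # min Q&A answer duration
MAX_GAP_S = 30.0       # max gap between Q and A


@dataclass
class Span:
    """Hypothetical container for one speaker turn, in seconds."""
    start: float
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


def is_valid_qa_pair(question: Span, answer: Span) -> bool:
    """Accept a pair only if both turns are long enough and close together."""
    if answer.start < question.start:
        return False
    gap = max(0.0, answer.start - question.end)  # overlap counts as no gap
    return (question.duration >= MIN_QUESTION_S
            and answer.duration >= MIN_ANSWER_S
            and gap <= MAX_GAP_S)
```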

### Voice Profile System

Bootstrapped from the 579-episode archive. Host (Mike) has era-specific embeddings (2010, 2014, 2018, 2026). Co-host Tara has 44 embeddings from 2 episodes. Unknown repeat voices are clustered and held for host review.

```
voice-profiles/
  host-mike-swanson/        -- composite + era embeddings
  guests/<name>.npy         -- named guest embeddings (built over time)
  callers/regular-NNN.npy   -- unnamed repeat callers
  unknown/cluster-NNN.npy   -- unidentified voices appearing multiple times
```

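A rough sketch of matching a new embedding against stored profiles follows, assuming each profile is a plain vector and the best score across era embeddings decides the match. The function names and the profile dictionary layout are hypothetical; only the 0.85 cosine threshold is taken from this article. It uses pure Python so it runs without NumPy, whereas the real profiles are `.npy` arrays.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

MATCH_THRESHOLD = 0.85  # host/co-host match threshold (cosine similarity)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_profile_match(
    embedding: Sequence[float],
    profiles: Dict[str, Sequence[float]],
) -> Tuple[Optional[str], float]:
    """Return (profile name, score) for the closest profile, or (None, score) below threshold."""
    best_name: Optional[str] = None
    best_score = -1.0
    for name, reference in profiles.items():
        score = cosine_similarity(embedding, reference)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= MATCH_THRESHOLD:
        return best_name, best_score
    return None, best_score
```

Under this rule the reported Tara vs Mike similarity of 0.698 sits below 0.85, so one speaker's embedding would not be accepted as the other's profile.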

### Archive Index (SQLite)

`archive.db` schema: `episodes`, `segments`, `segments_fts` (FTS5), `qa_pairs`, `qa_fts`. As of 2026-05-01 on BEAST: 572 episodes indexed.

FTS5 search supports: segment text search, Q&A pair search, speaker filter.

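To make the segment search with a speaker filter concrete, here is a self-contained sketch against an in-memory database. Only the table name `segments_fts` comes from this article; the column names (`text`, `speaker`, `episode_id`), the sample rows, and both function names are assumptions, and the real `archive.db` schema may differ. It requires a Python build whose SQLite includes FTS5.

```python
import sqlite3
from typing import List, Optional, Tuple


def build_demo_archive() -> sqlite3.Connection:
    """Create a tiny in-memory FTS5 table with hypothetical columns and rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE segments_fts USING fts5("
        "text, speaker UNINDEXED, episode_id UNINDEXED)"
    )
    conn.executemany(
        "INSERT INTO segments_fts (text, speaker, episode_id) VALUES (?, ?, ?)",
        [
            ("My router keeps dropping the wifi connection", "CALLER", 479),
            ("Start by updating the router firmware", "HOST", 479),
            ("Backups matter more than antivirus", "HOST", 480),
        ],
    )
    return conn


def search_segments(
    conn: sqlite3.Connection,
    query: str,
    speaker: Optional[str] = None,
) -> List[Tuple]:
    """Full-text search over segment text, optionally restricted to one speaker."""
    sql = ("SELECT episode_id, speaker, text FROM segments_fts "
           "WHERE segments_fts MATCH ?")
    params = [query]
    if speaker is not None:
        sql += " AND speaker = ?"
        params.append(speaker)
    sql += " ORDER BY rank"  # FTS5 relevance ordering, best match first
    return conn.execute(sql, params).fetchall()
```

Marking `speaker` and `episode_id` as `UNINDEXED` keeps them out of the full-text index while still allowing ordinary equality filters on them.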

### FastAPI Browse/Search UI

Single-file server at `projects/radio-show/audio-processor/server/main.py`. Two embedded HTML templates:

- `INDEX_HTML`: search/browse page with CSS custom property theme (`#c39733` accent), browse-mode toggle, Q&A pill badges.
- `EPISODE_HTML`: episode detail page with sticky `<audio>` player, active-Q&A highlight that follows playhead via `timeupdate` listener, `preload="metadata"`.

Env vars: `ARCHIVE_DB`, `EPISODES_DIR`, `PORT`.

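One way the three environment variables could be read at startup is sketched below. The `ServerConfig` class, the `load_config` function, and the default paths are hypothetical; the default port 8765 matches the port listed under Deployment / Hosting, but whether `main.py` falls back to it is an assumption.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping


@dataclass(frozen=True)
class ServerConfig:
    """Hypothetical settings object for the browse/search server."""
    archive_db: Path
    episodes_dir: Path
    port: int


def load_config(env: Mapping[str, str] = os.environ) -> ServerConfig:
    """Read ARCHIVE_DB, EPISODES_DIR and PORT, falling back to assumed defaults."""
    return ServerConfig(
        archive_db=Path(env.get("ARCHIVE_DB", "archive.db")),
        episodes_dir=Path(env.get("EPISODES_DIR", "archive-data/episodes")),
        port=int(env.get("PORT", "8765")),
    )
```

Passing the environment as a parameter keeps the function easy to exercise with a plain dictionary, for example `load_config({"PORT": "9000"})`.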

### Post-Show Content Workflow

Three content tiers produced within 48 hours of each episode:

| Tier | Target | Output |
|---|---|---|
| 1 | Radio show website | Episode page (`website/src/content/episodes/sXXeYY-slug.md`) with summary, chapters, links |
| 2 | Flarum forum | Discussion thread (tag: Show Discussion, ID 8) at community.azcomputerguru.com |
| 3 | Radio show website | 1–3 deep-dive blog posts (`website/src/content/blog/<slug>.md`) |

Claude handles: generating all content from show-prep + debrief, posting to Flarum via DB insert, building and deploying the Astro website.

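The Tier 1 filename pattern `sXXeYY-slug.md` could be generated along these lines. This is only a sketch: two-digit zero padding and the slug rule (lowercase, runs of non-alphanumeric characters collapsed to a hyphen) are assumptions, and both function names are hypothetical.

```python
import re


def slugify(title: str) -> str:
    """Lowercase the title and collapse non-alphanumeric runs into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def episode_page_path(season: int, episode: int, title: str) -> str:
    """Build the episode page path using the sXXeYY-slug.md pattern."""
    name = f"s{season:02d}e{episode:02d}-{slugify(title)}.md"
    return f"website/src/content/episodes/{name}"
```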

---

## Deployment / Hosting

| Item | Value |
|---|---|
| Jupiter (primary archive host) | `172.16.3.20:8765` (uvicorn, FastAPI) |
| Local dev (BEAST) | `127.0.0.1:8765` (same port as Jupiter for bookmark parity) |
| Archive source (IX server) | `172.16.3.10`, `gurushow@`, `/home/gurushow/public_html/archive/Radio/` |
| Archive local copy (BEAST) | `projects/radio-show/audio-processor/archive-data/` |
| Forum | community.azcomputerguru.com (Flarum) |
| Radio show website | Astro site, deployed via rsync |

[WARNING] Jupiter's `/data/episodes` tree is EMPTY. `GET /api/audio/{id}` returns HTTP 404 for all episode IDs on Jupiter. Audio works locally on BEAST only (full archive in `archive-data/episodes/`). Fix decision is pending; see Open Items.

---

## Configuration / Credentials

| Secret | Location |
|---|---|
| IX server SSH (gurushow) | SOPS vault: search `gurushow` or `ix server` |
| HuggingFace token (pyannote license) | `huggingface-cli login` (required for pyannote.audio) |
| Forum DB access (Flarum insert) | SOPS vault: search `flarum` or `community forum` |

IX server access: paramiko with `look_for_keys=False, allow_agent=False`. Tailscale required for `172.16.3.10`.

---

## Active Work / Open Items

- [ ] **Jupiter audio fix (open, unresolved).** Three options, no pick made:
    1. rsync full archive (~30–40 GB) to Jupiter at `/data/episodes/`
    2. Proxy `/api/audio/{id}` from Jupiter to IX on demand (~5 lines)
    3. Point `<audio src>` at IX directly via public HTTPS endpoint
- [ ] **Commit intro/QA sort tie-break fix** (`server/main.py` lines 551, 597: `key=lambda x: x[0]`). Two-line fix, uncommitted as of end of 2026-05-01 session. Mike had not yet OK'd the commit.
- [ ] **RTX 4090 benchmark on BEAST**: establish diarization RTF baseline (expected ~250–300x vs 209.7x on laptop 5070 Ti).
- [ ] **Download full archive from IX to BEAST** for batch training (paramiko script skeleton exists in prior session log `2026-04-27-diarization-pipeline.md`).
- [ ] **Verify Tara profile generalizes across 2015/2016 episodes**: re-run `build_cohost_profile.py` with additional windows if false positives appear.
- [ ] **Post-show workflow automation**: social media, email newsletter, podcast RSS still need platform setup.

---

## Key Events / History

| Date | Event |
|---|---|
| 2010–2018 | Show original run. 579 episodes archived. No 2013 season. |
| 2026-04-27 | Q&A extraction + co-host profile session (DESKTOP-0O8A1RL). Built Tara co-host voice profile (44 embeddings, 0.698 cosine vs Mike). Fixed false-positive Q&A extraction for co-host episodes. Created `archive.db` with FTS5. Indexed 6 test episodes: 762 segments, 10 Q&A pairs. Transcription benchmarked at 63.8x realtime; diarization at 209.7x realtime. |
| 2026-04-30 | UI redesign done on BEAST (mid-session, uncommitted before reboot). |
| 2026-05-01 | Session recovery after BEAST reboot. Found 820-line uncommitted diff to `server/main.py`. Committed as `d7ce9cb` (rebased to `296d157`). Diagnosed Jupiter audio-404 (pre-existing deployment gap, not a regression). Deployed locally on BEAST: confirmed 572 episodes, working audio. Fixed episode-500 sort bug (episode 479). |
| 2026-05-01 | Co-host name corrected: previously labeled "Tom" in session log, Mike confirmed it is "Tara." All references updated. |

---

## Anti-Patterns / Warnings

[WARNING] Do NOT attempt interactive SSH to `gurushow@172.16.3.10` from scripts. Use paramiko with `look_for_keys=False, allow_agent=False`. Key-based auth is disabled on this host.

[WARNING] Tailscale must be active to reach `172.16.3.10` (IX server) or `172.16.3.20` (Jupiter).

[WARNING] The Ollama `/save` protocol has a known stale-prompt-file bug: `save_narrative_prompt.txt` at `C:/Users/guru/AppData/Local/Temp/` is reused across sessions and can cause qwen3 to produce a narrative about the WRONG session. Recovery: write narrative directly. Fix: delete prompt file before re-writing, or use a unique per-session filename.

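The "unique per-session filename" fix could look something like the sketch below. The function name and the `session_id` parameter are hypothetical, and how the `/save` protocol would be pointed at the returned path is not covered here. `tempfile.mkstemp` always creates a new file, so a prompt left over from an earlier session cannot be picked up by accident.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def write_session_prompt(prompt: str, session_id: str,
                         directory: Optional[str] = None) -> Path:
    """Write the prompt to a freshly created file named after the session."""
    fd, name = tempfile.mkstemp(
        prefix=f"save_narrative_prompt_{session_id}_",
        suffix=".txt",
        dir=directory,  # None means the platform temp directory
        text=True,
    )
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(prompt)
    return Path(name)
```

The caller is responsible for deleting the file once the narrative has been written.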

[WARNING] `sorted()` over `(timestamp, sqlite3.Row)` tuples without `key=` will raise `TypeError` when two rows share the same timestamp. Always use `key=lambda x: x[0]`. This bit `_episode_html` at lines 551 and 597 (2026-05-01 bug).

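The failure is easy to reproduce in isolation. In the sketch below the table and column names are made up for the demonstration; the point is that tuple comparison falls through to the second element when timestamps tie, and `sqlite3.Row` does not support `<`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE demo_items (start_s REAL, label TEXT)")
conn.executemany(
    "INSERT INTO demo_items VALUES (?, ?)",
    [(12.0, "intro"), (12.0, "first caller"), (5.0, "cold open")],
)
rows = conn.execute("SELECT start_s, label FROM demo_items").fetchall()
items = [(row["start_s"], row) for row in rows]

# Without key=, the two 12.0 entries force a Row-vs-Row comparison.
try:
    sorted(items)
    naive_failed = False
except TypeError:
    naive_failed = True

# With key=, only the timestamp is compared; ties keep their input order.
ordered = sorted(items, key=lambda x: x[0])
labels = [row["label"] for _, row in ordered]
```

Because Python's sort is stable, entries with equal timestamps stay in the order the query returned them.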

[INFO] Co-host voice profiles must be built from the first 60 minutes of co-host episodes. Real callers do not call in during the first hour; those CALLER-labeled windows are safely all co-host speech.

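Selecting those training windows might be sketched as follows. The `Window` type and the function name are hypothetical, and requiring a window to end within the first hour (rather than merely start there) is an assumption; `build_cohost_profile.py` may select windows differently.

```python
from typing import Iterable, List, NamedTuple

FIRST_HOUR_S = 60 * 60  # first 60 minutes of the episode, in seconds


class Window(NamedTuple):
    """Hypothetical diarization window with a speaker label."""
    start: float
    end: float
    label: str


def cohost_training_windows(windows: Iterable[Window]) -> List[Window]:
    """Keep CALLER-labeled windows that fall entirely inside the first hour."""
    return [w for w in windows
            if w.label == "CALLER" and w.end <= FIRST_HOUR_S]
```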

[INFO] Tara's exact tenure as co-host is unverified. Do not assume her profile applies across all 2013–2016 episodes without spot-checking.

---

## Backlinks

- `wiki/systems/jupiter.md` [unverified; may not exist yet]: Jupiter server spec
- `wiki/systems/ix-server.md` [unverified; may not exist yet]: IX hosting server spec
- `wiki/projects/gururmm.md`: related ACG project
- `projects/radio-show/audio-processor/README.md`: full pipeline spec and configuration reference
- `projects/radio-show/post-show-workflow.md`: full post-show content workflow spec