Mike Swanson f4fb131529 wiki: seed remaining clients and projects (batch 3)

Adds 11 client articles and 5 project articles:

Clients: kittle, khalsa, anaise, azcomputerguru.com, bg-builders,
evs, furrier, horseshoe-management, kittle-design, scileppi-law,
western-tire

Projects: discord-bot, radio-show, msp-pricing, wrightstown-smarthome,
wrightstown-solar

Updates wiki/index.md with all new entries, cross-references, and
removes seeded client:birthbiologic from compilation queue.

Critical findings surfaced:
- Kittle: WS2025 EVAL license, no backups, 3 plaintext creds in Syncro
- Western Tire: SSL cert *.westerntire.com expires 2026-05-30
- Kittle Design: active compromise (Ken inbox rule unresolved)
- Horseshoe Mgmt: plaintext creds for 5+ users in Syncro notes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-05-24 19:59:40 -07:00


Field	Value
type	project
name	radio-show
display_name	The Computer Guru Show
last_compiled	2026-05-24
compiled_by	DESKTOP-0O8A1RL/claude-main
sources	projects/radio-show/post-show-workflow.md; projects/radio-show/audio-processor/README.md; projects/radio-show/session-logs/2026-04-27-qa-extraction-cohost-indexing.md; projects/radio-show/session-logs/2026-05-01-ui-redesign-recovery.md

The Computer Guru Show

Overview

"The Computer Guru Show" is Mike Swanson's radio program. The project covers two distinct workstreams:

Audio Processor — Automated pipeline that processes raw broadcast recordings (with commercials) into podcast-ready audio, transcripts, speaker-diarized segments, and a searchable SQLite archive.
Post-Show Content Workflow — Process for turning each episode into an episode page (website), forum discussion thread (Flarum), and 1–3 deep-dive blog posts within 48 hours of air.

Status: Active development. Audio processor pipeline functional with 572 episodes indexed locally on BEAST. FastAPI browse/search UI redesigned (2026-05-01). Jupiter deployment has a known audio-file gap (open). Post-show workflow documented but not yet fully automated.

Archive spans 2010–2018 (no 2013 season), 579 MP3s, ~30–40 GB.

Tech Stack

Layer	Technology
Transcription	faster-whisper (`large-v3`, CTranslate2 + CUDA), int8_float16, batched
Speaker diarization	pyannote.audio 3.1 (WavLM embeddings)
Audio processing	ffmpeg, pydub, librosa
Audio fingerprinting	chromaprint
Voice activity detection	silero-vad
ML / classification	scikit-learn (break pattern classifier)
Content analysis	Ollama — `qwen3:14b` (narrative/summary), local LLM
Archive database	SQLite with FTS5 (segments, Q&A pairs)
Web server	FastAPI + uvicorn (embedded HTML templates)
Hardware (primary)	DESKTOP-0O8A1RL — RTX 5070 Ti Laptop GPU
Hardware (secondary)	GURU-BEAST-ROG — RTX 4090 (benchmark pending)

Architecture

Audio Processor Pipeline

Raw MP3 (full broadcast with commercials)
  |
  +-- 1. Transcription: faster-whisper large-v3 (63.8x realtime on 5070 Ti)
  |       Output: word-level timestamps, language detection
  |
  +-- 2. Speaker Diarization: pyannote.audio 3.1 (209.7x realtime on 5070 Ti)
  |       10s windows / 5s hop, midpoint boundary resolution at load time
  |       Speaker profiles: host (Mike, era-specific embeddings), co-hosts, callers
  |
  +-- 3. Segment Detection: Multi-signal classifier (5 weighted signals, combined score)
  |       Signals: fingerprint match (0.30), speaker identity (0.25),
  |       audio characteristics (0.20), break pattern (0.15), structural heuristics (0.10)
  |       Element library: SQLite fingerprints.db + learning/discovery system
  |
  +-- 4. Commercial Removal: ffmpeg — stitch segments, EBU R128 normalize
  |
  +-- 5. Segment Splitting: ffmpeg — individual MP3s per segment, ID3 tags, chapter markers
  |
  +-- 6. Content Analysis: Ollama qwen3:14b
          Output: episode summary, per-segment summaries, key quotes, topic tags,
                  suggested blog post topics, auto-filled post-show debrief
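
The weighted combination in step 3 can be sketched roughly as follows. This is a minimal illustration, assuming each signal produces a score between 0 and 1. The function and key names are hypothetical; only the weights and the 0.70 commercial threshold come from this page.

```python
# Weights as listed in step 3. Key names are illustrative, not the project's identifiers.
SIGNAL_WEIGHTS = {
    "fingerprint_match": 0.30,
    "speaker_identity": 0.25,
    "audio_characteristics": 0.20,
    "break_pattern": 0.15,
    "structural_heuristics": 0.10,
}
COMMERCIAL_THRESHOLD = 0.70  # combined confidence threshold (commercial)


def combined_score(signals: dict) -> float:
    """Weighted sum of per-signal scores; a missing signal counts as 0.0."""
    return sum(w * signals.get(name, 0.0) for name, w in SIGNAL_WEIGHTS.items())


def is_commercial(signals: dict) -> bool:
    """True when the combined score reaches the commercial threshold."""
    return combined_score(signals) >= COMMERCIAL_THRESHOLD


# Made-up per-signal scores for one candidate segment.
example = {
    "fingerprint_match": 1.0,
    "speaker_identity": 0.9,
    "audio_characteristics": 0.8,
    "break_pattern": 0.5,
    "structural_heuristics": 0.0,
}
print(round(combined_score(example), 3), is_commercial(example))
```

Because the listed weights sum to 1.00, the combined score stays in the same 0 to 1 range as the individual signals, which is what makes a single fixed threshold meaningful.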

Key Thresholds

Parameter	Value
Host/co-host match threshold	0.85 cosine similarity (WavLM)
Tara (co-host) vs Mike separation	0.698 cosine similarity
CALLER minimum coverage in transcript segment	4.0 seconds
Promo score threshold	2 (weighted signatures)
Min Q&A question duration	5.0s
Min Q&A answer duration	15.0s
Max gap between Q and A	30.0s
Commercial break: min/max duration	30s / 300s
Combined confidence threshold (commercial)	0.70
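
As a rough illustration of how the three Q&A thresholds above might interact, the sketch below checks one candidate question/answer pair. The `Turn` class, the `HOST` label, and the rule that the question comes from a CALLER turn are assumptions made for the example; the real extraction logic lives in the audio processor and may differ.

```python
from dataclasses import dataclass

MIN_QUESTION_S = 5.0   # min Q&A question duration
MIN_ANSWER_S = 15.0    # min Q&A answer duration
MAX_GAP_S = 30.0       # max gap between Q and A


@dataclass
class Turn:
    speaker: str   # "CALLER" or "HOST" (labels assumed for illustration)
    start: float   # seconds from start of episode
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


def is_qa_pair(question: Turn, answer: Turn) -> bool:
    """Apply the duration and gap thresholds to one candidate pair."""
    gap = answer.start - question.end  # assumes the answer starts after the question ends
    return (
        question.speaker == "CALLER"
        and answer.speaker == "HOST"
        and question.duration >= MIN_QUESTION_S
        and answer.duration >= MIN_ANSWER_S
        and 0.0 <= gap <= MAX_GAP_S
    )
```

For example, an 8-second caller turn followed 2 seconds later by a 30-second host turn passes, while a 3-second caller turn is rejected as too short to be a question.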

Voice Profile System

Bootstrapped from the 579-episode archive. Host (Mike) has era-specific embeddings (2010, 2014, 2018, 2026). Co-host Tara has 44 embeddings from 2 episodes. Unknown repeat voices are clustered and held for host review.

voice-profiles/
  host-mike-swanson/      -- composite + era embeddings
  guests/<name>.npy       -- named guest embeddings (built over time)
  callers/regular-NNN.npy -- unnamed repeat callers
  unknown/cluster-NNN.npy -- unidentified voices appearing multiple times
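
Matching a new voice against these profiles might look something like the sketch below. It assumes a speaker is matched when the best cosine similarity against any of a profile's reference embeddings (for the host, one per era) reaches the 0.85 threshold from Key Thresholds. Plain Python lists stand in for the `.npy` arrays, and the function names are hypothetical.

```python
import math

MATCH_THRESHOLD = 0.85  # host/co-host match threshold (cosine similarity)


def cosine_similarity(a, b) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(embedding, profiles):
    """Return (profile_name, score), or (None, score) if nothing reaches the threshold.

    `profiles` maps a profile name to a list of reference embeddings.
    """
    best_name, best_score = None, -1.0
    for name, references in profiles.items():
        score = max((cosine_similarity(embedding, r) for r in references), default=-1.0)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= MATCH_THRESHOLD:
        return best_name, best_score
    return None, best_score
```

Taking the maximum over a profile's references is one way to let era-specific embeddings coexist: a 2010 recording only needs to resemble the 2010 reference, not the composite.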

Archive Index (SQLite)

archive.db schema: episodes, segments, segments_fts (FTS5), qa_pairs, qa_fts. As of 2026-05-01 on BEAST: 572 episodes indexed.

FTS5 search supports: segment text search, Q&A pair search, speaker filter.
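
A segment text search with an optional speaker filter could be expressed along these lines. The column names (`episode_id`, `speaker`, `text`) and the sample rows are assumptions for illustration, since the actual `segments_fts` schema is not shown on this page. The sketch builds a throwaway in-memory table rather than touching archive.db, and it requires an SQLite build with FTS5 enabled.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for archive.db with a hypothetical segments_fts layout."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute(
        "CREATE VIRTUAL TABLE segments_fts "
        "USING fts5(episode_id UNINDEXED, speaker UNINDEXED, text)"
    )
    conn.executemany(
        "INSERT INTO segments_fts (episode_id, speaker, text) VALUES (?, ?, ?)",
        [
            (1, "HOST", "Reboot the router before calling your ISP"),
            (1, "CALLER", "My router keeps dropping wifi"),
            (2, "HOST", "Back up your files every week"),
        ],
    )
    return conn


def search_segments(conn, query, speaker=None, limit=20):
    """Full-text search over segment text, optionally restricted to one speaker."""
    sql = "SELECT episode_id, speaker, text FROM segments_fts WHERE segments_fts MATCH ?"
    params = [query]
    if speaker is not None:
        sql += " AND speaker = ?"
        params.append(speaker)
    sql += " ORDER BY rank LIMIT ?"  # rank is FTS5's built-in relevance ordering
    params.append(limit)
    return conn.execute(sql, params).fetchall()
```

Passing the search string as a bound parameter keeps user input out of the SQL text, although FTS5 still parses it as a query expression, so unbalanced quotes in the input can raise an error.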

FastAPI Browse/Search UI

Single-file server at projects/radio-show/audio-processor/server/main.py. Two embedded HTML templates:

INDEX_HTML — search/browse page with CSS custom property theme (#c39733 accent), browse-mode toggle, Q&A pill badges.
EPISODE_HTML — episode detail page with sticky <audio> player, active-Q&A highlight that follows playhead via timeupdate listener, preload="metadata".

Env vars: ARCHIVE_DB, EPISODES_DIR, PORT.
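
One plausible way to read these three variables at startup is sketched below. The `Settings` class and the default values are assumptions for the example (8765 and `archive-data/episodes` are borrowed from elsewhere on this page); server/main.py may handle them differently.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Settings:
    archive_db: Path
    episodes_dir: Path
    port: int


def load_settings(env=None) -> Settings:
    """Build settings from a mapping (defaults to os.environ), with fallback values."""
    env = os.environ if env is None else env
    return Settings(
        archive_db=Path(env.get("ARCHIVE_DB", "archive.db")),                # default is hypothetical
        episodes_dir=Path(env.get("EPISODES_DIR", "archive-data/episodes")),  # default is hypothetical
        port=int(env.get("PORT", "8765")),
    )
```

Accepting the mapping as an argument makes it easy to check a Jupiter-style configuration in a test without changing the real environment.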

Post-Show Content Workflow

Three content tiers produced within 48 hours of each episode:

Tier	Target	Output
1	Radio show website	Episode page (`website/src/content/episodes/sXXeYY-slug.md`) with summary, chapters, links
2	Flarum forum	Discussion thread (tag: Show Discussion, ID 8) at community.azcomputerguru.com
3	Radio show website	1–3 deep-dive blog posts (`website/src/content/blog/<slug>.md`)

Claude handles: generating all content from show-prep + debrief, posting to Flarum via DB insert, building and deploying the Astro website.
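
The Tier 1 filename pattern (`sXXeYY-slug.md`) can be generated with a small helper like the one below. It assumes XX and YY are zero-padded two-digit season and episode numbers and that the slug is lowercase ASCII with hyphens. Both are guesses from the pattern, and the function names are hypothetical.

```python
import re


def slugify(title: str) -> str:
    """Lowercase the title and collapse every run of non-alphanumerics into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def episode_page_path(season: int, episode: int, title: str) -> str:
    """Build the episode page path under website/src/content/episodes/."""
    return f"website/src/content/episodes/s{season:02d}e{episode:02d}-{slugify(title)}.md"


print(episode_page_path(3, 7, "Wi-Fi Woes & Router Tips"))
```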

Deployment / Hosting

Item	Value
Jupiter (primary archive host)	`172.16.3.20:8765` — uvicorn, FastAPI
Local dev (BEAST)	`127.0.0.1:8765` — same port as Jupiter for bookmark parity
Archive source (IX server)	`172.16.3.10` — `gurushow@`, `/home/gurushow/public_html/archive/Radio/`
Archive local copy (BEAST)	`projects/radio-show/audio-processor/archive-data/`
Forum	community.azcomputerguru.com (Flarum)
Radio show website	Astro site, deployed via rsync

[WARNING] Jupiter's /data/episodes tree is EMPTY. GET /api/audio/{id} returns HTTP 404 for all episode IDs on Jupiter. Audio works locally on BEAST only (full archive in archive-data/episodes/). Fix decision is pending — see Open Items.

Configuration / Credentials

Secret	Location
IX server SSH (gurushow)	SOPS vault — search `gurushow` or `ix server`
HuggingFace token (pyannote license)	`huggingface-cli login` — required for pyannote.audio
Forum DB access (Flarum insert)	SOPS vault — search `flarum` or `community forum`

IX server access: paramiko with look_for_keys=False, allow_agent=False. Tailscale required for 172.16.3.10.
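
A connection along those lines might be set up as in the following sketch. The helper names and the 15-second timeout are assumptions; the host, username, and the two flags come from this page. The password is expected to be supplied by the caller after retrieving it from the SOPS vault. paramiko is a third-party package, so it is imported only inside the function that actually connects.

```python
from typing import Any, Dict


def ix_connect_kwargs(password: str) -> Dict[str, Any]:
    """Keyword arguments for paramiko.SSHClient.connect() against the IX server."""
    return {
        "hostname": "172.16.3.10",
        "username": "gurushow",
        "password": password,
        "look_for_keys": False,  # do not try key files from ~/.ssh
        "allow_agent": False,    # do not try keys held by an SSH agent
        "timeout": 15,           # seconds; arbitrary value for this sketch
    }


def open_ix_session(password: str):
    """Open an SSH session to the IX server (needs paramiko and an active Tailscale link)."""
    import paramiko  # third-party; imported lazily so the helper above works without it

    client = paramiko.SSHClient()
    client.load_system_host_keys()  # unknown hosts are rejected by paramiko's default policy
    client.connect(**ix_connect_kwargs(password))
    return client
```

Disabling both key lookups makes paramiko go straight to password authentication instead of first offering keys the host will refuse.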

Active Work / Open Items

Jupiter audio fix (open, unresolved). Three options, no pick made:
1. rsync full archive (~30–40 GB) to Jupiter at /data/episodes/
2. Proxy /api/audio/{id} from Jupiter to IX on demand (~5 lines)
3. Point <audio src> at IX directly via public HTTPS endpoint
Commit intro/QA sort tie-break fix (server/main.py lines 551, 597 — key=lambda x: x[0]). Two-line fix, uncommitted as of end of 2026-05-01 session. Mike had not yet OK'd the commit.
RTX 4090 benchmark on BEAST — establish diarization RTF baseline (expected ~250–300x vs 209.7x on laptop 5070 Ti).
Download full archive from IX to BEAST for batch training (paramiko script skeleton exists in prior session log 2026-04-27-diarization-pipeline.md).
Verify Tara profile generalizes across 2015/2016 episodes — re-run build_cohost_profile.py with additional windows if false positives appear.
Post-show workflow automation — social media, email newsletter, podcast RSS still need platform setup.

Key Events / History

Date	Event
2010–2018	Show original run. 579 episodes archived. No 2013 season.
2026-04-27	Q&A extraction + co-host profile session (DESKTOP-0O8A1RL). Built Tara co-host voice profile (44 embeddings, 0.698 cosine vs Mike). Fixed false-positive Q&A extraction for co-host episodes. Created `archive.db` with FTS5. Indexed 6 test episodes: 762 segments, 10 Q&A pairs. Transcription benchmarked at 63.8x realtime; diarization at 209.7x realtime.
2026-04-30	UI redesign done on BEAST (mid-session, uncommitted before reboot).
2026-05-01	Session recovery after BEAST reboot. Found 820-line uncommitted diff to `server/main.py`. Committed as `d7ce9cb` (rebased to `296d157`). Diagnosed Jupiter audio-404 (pre-existing deployment gap, not a regression). Deployed locally on BEAST and confirmed 572 episodes with working audio. Fixed a sort bug that caused an HTTP 500 on the episode page (seen on episode 479).
2026-05-01	Co-host name corrected: previously labeled "Tom" in session log, Mike confirmed it is "Tara." All references updated.

Anti-Patterns / Warnings

[WARNING] Do NOT attempt interactive SSH to gurushow@172.16.3.10 from scripts. Use paramiko with look_for_keys=False, allow_agent=False. Key-based auth is disabled on this host.

[WARNING] Tailscale must be active to reach 172.16.3.10 (IX server) or 172.16.3.20 (Jupiter).

[WARNING] The Ollama /save protocol has a known stale-prompt-file bug: save_narrative_prompt.txt at C:/Users/guru/AppData/Local/Temp/ is reused across sessions and can cause qwen3 to produce a narrative about the WRONG session. Recovery: write narrative directly. Fix: delete prompt file before re-writing, or use a unique per-session filename.
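
The second fix mentioned above (a unique per-session filename) could look roughly like this. The function name and the `session_id` parameter are hypothetical, and the sketch writes to the system temp directory by default rather than hard-coding the Windows path.

```python
import tempfile
import uuid
from pathlib import Path


def write_session_prompt(prompt_text, session_id=None, directory=None) -> Path:
    """Write the narrative prompt to a file whose name is unique to this session."""
    session_id = session_id or uuid.uuid4().hex
    directory = Path(directory) if directory else Path(tempfile.gettempdir())
    path = directory / f"save_narrative_prompt_{session_id}.txt"
    path.write_text(prompt_text, encoding="utf-8")
    return path
```

Because each session writes its own file, a leftover prompt from an earlier session can no longer be picked up by mistake. Cleaning up old files is left to the caller.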

[WARNING] sorted() over (timestamp, sqlite3.Row) tuples without key= will raise TypeError when two rows share the same timestamp. Always use key=lambda x: x[0]. This bit _episode_html at lines 551 and 597 (2026-05-01 bug).
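
The failure is easy to reproduce in isolation. The sketch below uses a made-up in-memory table (the table and column names are not the project's) to show that a plain `sorted()` fails on a timestamp tie, while sorting on the first tuple element succeeds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE segments (id INTEGER PRIMARY KEY, start REAL, label TEXT)")
conn.executemany(
    "INSERT INTO segments (start, label) VALUES (?, ?)",
    [(12.0, "intro"), (12.0, "qa"), (3.5, "cold open")],  # two rows share start=12.0
)
rows = conn.execute("SELECT * FROM segments ORDER BY id").fetchall()
items = [(row["start"], row) for row in rows]

# Without key=, tuple comparison falls through to the sqlite3.Row objects on a tie,
# and sqlite3.Row does not support ordering comparisons.
try:
    sorted(items)
    raised = False
except TypeError:
    raised = True

# With key=, only the timestamps are compared; ties keep their original order
# because Python's sort is stable.
ordered = sorted(items, key=lambda x: x[0])
labels = [row["label"] for _, row in ordered]
print(raised, labels)
```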

[INFO] Co-host voice profiles must be built from the first 60 minutes of co-host episodes. Real callers do not call in during the first hour, so any CALLER-labeled windows in that span can safely be treated as co-host speech.

[INFO] Tara's exact tenure as co-host is unverified. Do not assume her profile applies across all 2013–2016 episodes without spot-checking.

Backlinks

wiki/systems/jupiter.md [unverified — may not exist yet] — Jupiter server spec
wiki/systems/ix-server.md [unverified — may not exist yet] — IX hosting server spec
wiki/projects/gururmm.md — related ACG project
projects/radio-show/audio-processor/README.md — full pipeline spec and configuration reference
projects/radio-show/post-show-workflow.md — full post-show content workflow spec
