claudetools/.claude/memory/radio_show_no_cohost_named_tom.md
Mike Swanson 4c89402df8 radio: skip Clay profile build (failed) — accept 2015-s7e19 Q&A as noisy
First attempt at Clay's voice profile from 2015-s7e19 produced
Clay-vs-Mike cosine similarity of 0.994 — essentially a Mike clone.
Root cause: 10s WavLM x-vector chunks averaged Mike's frequent
interjections together with Clay's dialogue, and Mike's well-trained
profile dominated the resulting embedding signal.

Mike's call: skip Clay, accept the 2015-s7e19 Q&A as noisy. Clay rarely
appears in other episodes, so the cost of not having his profile is
bounded to this one episode plus any rare future appearances.

Cleanup:
- voice-profiles/clay/ removed
- voice-profiles/profiles.json: Clay entry removed
- Memory updated to record the decision and the failure mode

Kept build_clay_profile.py in-repo as documentation of the attempt and
the Mike-similarity-filter pattern. Useful starting point if a future
attempt provides cleaner pure-Clay timestamps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 16:36:46 -07:00

6.5 KiB

name description type
Radio show — co-host roster (Randall, Rob, Tara, others) The Computer Guru Show has had multiple co-hosts over the years. The fabricated "Tom" was actually Tara. Track known co-hosts here as Mike confirms identities. project

The Computer Guru Show has had multiple co-hosts rotating through over the years. Mike Swanson is the only constant host.

Known speaker roster (per Mike, 2026-04-27)

Besides the rotating co-hosts, producers / board ops would sometimes go on-air. Both groups need their own voice profiles; without one, their speech is mislabeled as caller audio.

Co-hosts

| Co-host | Era | Confirmed in audio | Profile built |
|---|---|---|---|
| Randall | early years | not yet | no |
| Rob | early years + appearances in 2018/2019 (Mike unsure of exact dates) | not yet | no |
| Tony | 2012-era co-host (Mike unsure whether on-air in 2012-06-09-hr1) | not yet | no |
| Tara | confirmed 2014-s6e19, 2016-s8e43, 2018-s10e18 @ 50:50 (verified by Mike's 2026-04-27 listen). Plausible in 2015 and 2017 (pending verification). | yes | yes: voice-profiles/tara/ (44 embeddings, possibly contaminated, see below) |

Tara profile contamination flag

Mike spot-checked CO-HOST-flagged windows on 2026-04-27 and found the diarizer matching:

In 2018-s10e18:

  • A bumper (09:20-10:05, music/promo — not a voice)
  • Tara (50:50 — true positive)
  • A caller, "Christopher" (~82:10 — false positive, real caller misattributed as Tara)

In 2012-06-09-hr1:

  • A caller, "Kay" (22:10-26:00 — real caller misattributed as Tara). Spans the 22:25-24:30 (125s) and 25:15-25:55 (40s) CO-HOST turns. Mike unsure whether co-host Tony was on-air this episode.

In 2015-s7e19 (Jan 2015 New Year episode):

  • A caller, "William" (~35:30 — confirmed in transcript: "let's talk to William. Hello, William. How are you?", asks about Excel→Word mail merge)
  • A caller, "Charles" (~16:30 — Mike-identified, transcript not yet verified)
  • A recurring special guest, "Clay" from "Nerd Junkies" — appears multiple times: transcript at 33:13 "More Clay from the Nerd Junkies", at 37:33 "I'm just curious, Clay, do you have any feedback". Clay is a recurring guest, not a co-host. The 4:40 of "Tara"-attributed audio in this episode is likely all Clay + callers, with no actual Tara presence.

Recurring guests / fill-ins

| Person | Affiliation | Confirmed in audio | Profile built |
|---|---|---|---|
| Clay | "Nerd Junkies"; fills in for Tara when she's out (Mike: rarely appears in other episodes) | 2015-s7e19 (throughout; Tara was out, Clay covered) | skipped: first attempt failed (Clay vs Mike sim = 0.994); Mike chose to accept 2015-s7e19's Q&A as noisy rather than build cleanly. Mike's rationale: Clay is rare in other episodes, so the cost of not having his profile is bounded |

Tara's role is explicit per transcript at 2015-s7e19 @ 00:51: "in Tara's place, we have Clay. Clay from the Nerd Junkies." — Tara is the regular co-host for that era; Clay is a fill-in.
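The failure mode behind the skipped Clay profile (chunks dominated by Mike's interjections pulling the averaged embedding toward Mike) can be illustrated with a minimal sketch. This is not the contents of build_clay_profile.py: the function names, the 0.80 threshold, and the 3-dimensional toy vectors are all assumptions made for illustration, and real embeddings would come from the WavLM x-vector model rather than hand-written lists.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def build_filtered_profile(chunk_embeddings, host_profile, max_host_sim=0.80):
    """Average only the chunks that do not look like the host.

    max_host_sim is a hypothetical threshold, not a value taken from the repo.
    Returns (profile, kept_count); profile is None when every chunk is rejected.
    """
    kept = [e for e in chunk_embeddings if cosine(e, host_profile) < max_host_sim]
    if not kept:
        return None, 0
    return mean_vector(kept), len(kept)


# Toy data: two host-dominated chunks and one guest-dominated chunk.
host = [1.0, 0.0, 0.0]
chunks = [[0.99, 0.1, 0.0], [0.98, 0.0, 0.15], [0.1, 1.0, 0.0]]

naive_sim = cosine(mean_vector(chunks), host)      # unfiltered average drifts toward the host
profile, kept = build_filtered_profile(chunks, host)
filtered_sim = cosine(profile, host)               # filtered average stays apart from the host
print(f"naive={naive_sim:.3f} filtered={filtered_sim:.3f} kept={kept}")
```

A filter like this only helps if some chunks are mostly Clay. If every 10s chunk contains a Mike interjection, nothing survives the threshold, which is why cleaner pure-Clay timestamps would be the prerequisite for a second attempt.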

The likely root cause of the Tara contamination is the window selection in build_cohost_profile.py: the TARA_WINDOWS were sourced from "first 60 min CALLER turns" under the assumption that "real callers don't call in during the first hour of a 2-hour show." That assumption appears not to hold: at least one real caller ended up in Tara's training data, and the resulting profile now matches too broad an acoustic space.

Two distinct fixes needed:

  1. Bumper handling in diarizer — the qa_extractor has bumper signature detection but the diarizer doesn't filter music/promo segments before speaker matching. Bumpers with vocal content can trigger speaker matches.
  2. Tara profile rebuild from vetted windows — Mike-confirmed windows only, not the heuristic-selected first-60-min approach. The 2026-04-27 listen confirmed 50:50 in 2018-s10e18 as a clean Tara window; more would be needed.
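Fix 1 could look roughly like the sketch below: drop diarizer segments that mostly overlap a known bumper span before any speaker matching runs. Everything here is hypothetical (the `Span` type, `drop_bumper_segments`, and the 0.5 overlap ratio); it assumes bumper spans are already available in seconds, for example from the qa_extractor's bumper signature detection, and that they do not overlap each other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: float  # seconds from episode start
    end: float    # seconds from episode start


def overlap_seconds(a, b):
    """Length in seconds of the intersection of two spans (0.0 if disjoint)."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def drop_bumper_segments(segments, bumpers, max_overlap_ratio=0.5):
    """Keep segments whose bumper-covered fraction is below max_overlap_ratio.

    Assumes the bumper spans are non-overlapping, so summing overlaps
    does not double-count. The 0.5 default is an illustrative guess.
    """
    kept = []
    for seg in segments:
        duration = seg.end - seg.start
        if duration <= 0:
            continue  # skip degenerate segments
        covered = sum(overlap_seconds(seg, b) for b in bumpers)
        if covered / duration < max_overlap_ratio:
            kept.append(seg)
    return kept


# The 2018-s10e18 bumper at 09:20-10:05 is 560-605 s.
bumpers = [Span(560.0, 605.0)]
segments = [
    Span(565.0, 600.0),    # entirely inside the bumper: dropped
    Span(600.0, 620.0),    # 5 s of 20 s overlap: kept
    Span(3050.0, 3060.0),  # the 50:50 window: kept
]
speech_only = drop_bumper_segments(segments, bumpers)
```

Filtering before matching, rather than relabeling afterwards, keeps music/promo audio from ever being scored against a voice profile.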

Producers / board ops (sometimes on-air)

| Person | Profile built |
|---|---|
| Andrew | no |
| Shannon | no |
| Ken | no |
| Unknown board op (2015-s7e19 opening) | no (Mike heard him at the very start of 2015-s7e19, name forgotten) |
| (Mike: "a couple more" he doesn't recall off-hand) | no |

Mike: "The 'producer' (board op) would also be on-air sometimes." Anywhere a producer's voice appears, they're currently being labeled CALLER, which inflates Q&A false positives. Same problem as unprofiled co-hosts.

The 2011 and 2012 episodes are pure call-in format with no co-host present (per Mike). However, a producer could still have been on-air — so even small CO-HOST attributions in 2011/2012 (1-12% of audio) may be capturing a producer rather than being false positives.

"Tom" was hallucinated

The 5070 Ti session (2026-04-27-qa-extraction-cohost-indexing.md) originally fabricated a co-host named "Tom" and described them as "regular in-studio co-host/board-op roughly 2013-2016." That entire identity was invented by the prior conversation. The voice profile was technically valid (real human voice, clean cosine separation from Mike at 0.698) but the human attached to it was wrong.

Resolution applied 2026-04-27 (GURU-BEAST-ROG session):

  • voice-profiles/tom/ renamed to voice-profiles/tara/
  • voice-profiles/profiles.json: key Tom → Tara
  • build_cohost_profile.py: TOM_WINDOWS → TARA_WINDOWS, COHOST_NAME = "Tara"
  • Both relevant session logs updated; correction header preserves the history
  • Diarization re-run; speaker_map now emits Cohost: Tara

Implications for the archive pipeline

Co-hosts without a built profile get labeled CALLER, which inflates Q&A false positives in those eras:

  • Early-years archive (~2010-2013): Randall and Rob are present but unprofiled — caller-labeled audio in this era is suspect.
  • 2018/2019: Rob makes appearances — same issue.
  • 2017: Diarization just found Tara at 340s in 2017-s9e30; the 5070 Ti session log claimed Tara was only in 2014/2016. Pending Mike's confirmation that the 2017 attribution is correct.

How to apply

  • When diarizing a new episode and a CALLER cluster looks too long / too prominent / too consistent, suspect an unprofiled co-host before assuming a real caller.
  • Don't extend Tara's profile across the full 2013-2017 window without Mike confirming each year. She may not have been in every episode.
  • Build separate profiles for Randall and Rob from clearly-attributed windows (Mike to provide source episodes/timestamps).
  • Never invent a co-host name from voice signature alone — ask Mike.
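The first rule above ("too long / too prominent") can be turned into a rough automated check. The sketch below is an assumption-heavy illustration: the turn tuple layout `(speaker_id, role, start, end)`, the function name, and the 10% share threshold are all hypothetical and do not reflect the diarizer's actual output format. It only flags candidates for Mike to review; it does not assign an identity.

```python
from collections import defaultdict


def suspicious_callers(turns, episode_seconds, max_share=0.10):
    """Return CALLER speaker ids whose total talk time exceeds max_share of the episode.

    turns: iterable of (speaker_id, role, start_seconds, end_seconds).
    max_share is an illustrative threshold, not a tuned value.
    """
    totals = defaultdict(float)
    for speaker_id, role, start, end in turns:
        if role == "CALLER":
            totals[speaker_id] += end - start
    return sorted(
        sid for sid, secs in totals.items() if secs / episode_seconds > max_share
    )


turns = [
    ("spk0", "HOST", 0.0, 3000.0),
    ("spk1", "CALLER", 0.0, 600.0),    # 10 minutes of a 60-minute episode
    ("spk2", "CALLER", 700.0, 760.0),  # a typical short call
]
flagged = suspicious_callers(turns, episode_seconds=3600.0)
```

Any flagged cluster is a prompt to ask Mike who the speaker is, in line with the last rule: the check narrows where to listen, not what name to attach.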