radio: rename Tom -> Tara, expand speaker roster

Mike confirmed there is no co-host named "Tom" — the voice in 2014-s6e19
and 2016-s8e43 is Tara. The 5070 Ti session fabricated the Tom identity.
The voice profile itself (44 embeddings, 0.698 cosine vs Mike) is correct;
only the human label was wrong.

Rename swept:
- voice-profiles/tom/ -> voice-profiles/tara/ (git mv preserves all .npy)
- voice-profiles/profiles.json: "Tom" key -> "Tara"
- build_cohost_profile.py: TOM_WINDOWS -> TARA_WINDOWS, COHOST_NAME, comments
- 2026-04-27-qa-extraction-cohost-indexing.md: correction header + body sweep
- 2026-04-27-4090-benchmark-and-test-set.md: closure note
- .claude/memory/radio_show_no_cohost_named_tom.md: resolution + speaker roster
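The profiles.json line of the sweep amounts to renaming one key and leaving its value alone, which keeps the change label-only. Below is a minimal sketch. It assumes profiles.json is a single JSON object keyed by speaker name, and that assumption is hypothetical. The helper names `rename_key` and `rename_profile_key` are also illustrative and are not part of the repo:

```python
import json
from pathlib import Path
from typing import Any, Dict


def rename_key(data: Dict[str, Any], old: str, new: str) -> Dict[str, Any]:
    """Return a copy of `data` with key `old` renamed to `new`, keeping key order."""
    if old not in data:
        raise KeyError(f"{old!r} not found")
    if new in data:
        raise KeyError(f"{new!r} already exists")
    # Rebuild the dict so the renamed entry stays at its original position;
    # the value itself is passed through unchanged (label-only rename).
    return {(new if key == old else key): value for key, value in data.items()}


def rename_profile_key(path: Path, old: str, new: str) -> None:
    """Hypothetical file-level wrapper: rewrite a JSON object file in place."""
    data = json.loads(path.read_text(encoding="utf-8"))
    renamed = rename_key(data, old, new)
    path.write_text(json.dumps(renamed, indent=2) + "\n", encoding="utf-8")
```

Usage would look like `rename_profile_key(Path("voice-profiles/profiles.json"), "Tom", "Tara")`. Refusing to overwrite an existing "Tara" key guards against clobbering a profile if the sweep is run twice.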

Diarization re-run after rename so speaker_map emits "Cohost: Tara".
Q&A counts unchanged (rename is label-only): 9 pairs across 6 test episodes.

Tara distribution from the post-rename diarization (Tara speech time and
its share of each episode's audio):
  Episode          Tara  Share   Status
  2011-03-12-hr1   140s   5.6%   likely false positive (call-in only)
  2012-03-10-hr1    30s   1.1%   likely false positive (call-in only)
  2012-06-09-hr1   340s  12.8%   suspicious; pending Mike's confirmation
  2014-s6e19       680s  23.3%   confirmed
  2016-s8e43      1890s  35.5%   confirmed
  2017-s9e30       610s  11.4%   plausible; pending Mike's confirmation
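The Share column is one speaker's summed speech time divided by the episode's audio length. Below is a minimal sketch of that arithmetic. It assumes diarization output can be reduced to (start_s, end_s, label) turns. That tuple shape, the `speaker_share` name and the example numbers in the usage note are illustrative only; they are not the real speaker_map format or real episode data:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Turn = Tuple[float, float, str]  # (start_s, end_s, label); hypothetical shape


def speaker_share(turns: Iterable[Turn],
                  episode_seconds: float) -> Dict[str, Tuple[float, float]]:
    """Map each label to (total seconds, percent of the episode's audio).

    Turn durations are summed as-is, so overlapping turns for the same label
    would be double-counted; real diarization output may need merging first.
    """
    totals: Dict[str, float] = defaultdict(float)
    for start, end, label in turns:
        totals[label] += max(0.0, end - start)
    return {label: (secs, 100.0 * secs / episode_seconds)
            for label, secs in totals.items()}
```

Take a made-up 1000 s episode with turns (0, 195, "Mike Swanson"), (195, 260, "Cohost: Tara") and (320, 425, "Cohost: Tara"). For that input, `speaker_share` gives (170.0, 17.0) for "Cohost: Tara".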

Broader speaker-roster context Mike provided this session (saved to
memory): the show has had multiple co-hosts (Tara, Randall, Rob) plus
producers/board ops (Andrew, Shannon, Ken, others) who would sometimes
go on-air. Only Tara has a profile so far; every other speaker is
currently labeled CALLER. That means small Tara attributions in
unexpected episodes (e.g. 2011/2012) may actually be a producer rather
than a false positive. Mike to spot-check.
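Here is how an unprofiled producer could surface as Tara. Suppose labeling picks the most similar stored profile and falls back to CALLER below a cutoff. Then any on-air voice that sits closer to Tara than to Mike can clear that cutoff and take her label. Below is a minimal sketch under that assumption. `label_cluster` and the 0.6 threshold are hypothetical, not the pipeline's actual matcher or setting. The cosine formula mirrors the verification step in build_cohost_profile.py:

```python
import math
from typing import Mapping, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity with the same 1e-8 guard used in build_cohost_profile.py."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)


def label_cluster(embedding: Sequence[float],
                  profiles: Mapping[str, Sequence[float]],
                  threshold: float = 0.6,
                  fallback: str = "CALLER") -> str:
    """Hypothetical matcher: the closest profile wins if it clears `threshold`.

    A voice with no profile of its own (e.g. an on-air producer) is still
    compared against every stored profile, so it can be labeled with whichever
    profiled speaker it happens to resemble most.
    """
    best_name, best_sim = fallback, -1.0
    for name, composite in profiles.items():
        sim = cosine(embedding, composite)
        if sim > best_sim:
            best_name, best_sim = name, sim
    # Below the cutoff, fall back to the generic label (CALLER, per the roster note).
    return best_name if best_sim >= threshold else fallback
```

In this sketch, profiles for Randall, Rob and the producers would give their clusters a closer candidate than Tara. That is the failure mode the action item targets.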

Action item before the full-archive run: build profiles for Randall, Rob,
and the named producers to avoid systematic Q&A false positives in
early-year and 2018/2019 episodes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
commit 413e506481 (parent d412495d5c)
Date: 2026-04-27 15:11:03 -07:00
51 changed files with 114 additions and 45 deletions


@@ -1,5 +1,5 @@
 """
-Build voice profile for Tom (co-host) from known co-host speech windows.
+Build voice profile for Tara (co-host) from known co-host speech windows.
 Uses CALLER-labeled windows from the first 60 min of co-host-era episodes,
 before any real callers would have called in.
@@ -32,10 +32,10 @@ console.print(f"Device: {device}")
 profiler = VoiceProfiler(PROFILES_DIR, device=device)
-# Tom's known speech windows per episode
+# Tara's known speech windows per episode
 # CALLER turns from diarization that are in the first 60 min (before real callers)
-# Windows at 0-40s excluded (promo/jingle, not Tom's voice)
-TOM_WINDOWS = {
+# Windows at 0-40s excluded (promo/jingle, not Tara's voice)
+TARA_WINDOWS = {
     "2014-s6e19.mp3": [
         (195, 260),
         (320, 425),
@@ -53,7 +53,7 @@ TOM_WINDOWS = {
     ],
 }
-COHOST_NAME = "Tom"
+COHOST_NAME = "Tara"
 if COHOST_NAME not in profiler.profiles:
     profiler.profiles[COHOST_NAME] = SpeakerProfile(
@@ -66,7 +66,7 @@ if COHOST_NAME not in profiler.profiles:
 profile = profiler.profiles[COHOST_NAME]
 console.print(f"\n[bold]Building co-host profile for: {COHOST_NAME}[/bold]")
-for ep_name, windows in TOM_WINDOWS.items():
+for ep_name, windows in TARA_WINDOWS.items():
     ep_path = EPISODES_DIR / ep_name
     if not ep_path.exists():
         console.print(f"[yellow] Skipping {ep_name} — not found[/yellow]")
@@ -101,7 +101,7 @@ if not profile.embeddings:
     sys.exit(1)
 profile.compute_composite()
-console.print(f"\n[green]Tom profile built: {profile.num_samples} embeddings "
+console.print(f"\n[green]Tara profile built: {profile.num_samples} embeddings "
               f"from {len(profile.source_episodes)} episodes[/green]")
 # Verify: check cosine similarity vs Mike to ensure separation
@@ -109,7 +109,7 @@ mike = profiler.profiles.get("Mike Swanson")
 if mike and mike.composite_embedding is not None and profile.composite_embedding is not None:
     sim = float(np.dot(mike.composite_embedding, profile.composite_embedding) /
                 (np.linalg.norm(mike.composite_embedding) * np.linalg.norm(profile.composite_embedding) + 1e-8))
-    console.print(f"Tom vs Mike similarity: {sim:.3f} (lower is better separation)")
+    console.print(f"Tara vs Mike similarity: {sim:.3f} (lower is better separation)")
 profiler.save_profiles()
 console.print("[bold green]Profile saved.[/bold green]")
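The script's comments state two window-selection rules. The 0-40s promo/jingle is excluded, and only the first 60 minutes are trusted to be free of real callers. Those rules could be checked mechanically before the same approach is reused for Randall, Rob and the producers. Below is a minimal sketch. `usable_windows` and its defaults are hypothetical, and it drops any window that crosses either boundary rather than trimming it:

```python
from typing import List, Tuple

Window = Tuple[float, float]  # (start_s, end_s) within one episode


def usable_windows(windows: List[Window],
                   promo_end_s: float = 40.0,
                   caller_free_end_s: float = 60 * 60.0) -> List[Window]:
    """Keep only windows that avoid the opening promo and end before callers.

    Mirrors the rules in build_cohost_profile.py's comments: 0-40s is excluded,
    and only the first 60 minutes are assumed caller-free. Windows crossing
    either boundary (or with end <= start) are dropped, not trimmed.
    """
    return [(start, end) for start, end in windows
            if promo_end_s <= start < end <= caller_free_end_s]
```

For example, `usable_windows(TARA_WINDOWS["2014-s6e19.mp3"])` would keep the (195, 260) and (320, 425) windows shown in the diff.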