radio: rename Tom -> Tara, expand speaker roster
Mike confirmed there is no co-host named "Tom"; the voice in 2014-s6e19 and
2016-s8e43 is Tara. The 5070 Ti session fabricated the Tom identity. The
voice profile itself (44 embeddings, 0.698 cosine vs Mike) is correct; only
the human label was wrong.

Rename swept:
- voice-profiles/tom/ -> voice-profiles/tara/ (git mv preserves all .npy)
- voice-profiles/profiles.json: "Tom" key -> "Tara"
- build_cohost_profile.py: TOM_WINDOWS -> TARA_WINDOWS, COHOST_NAME, comments
- 2026-04-27-qa-extraction-cohost-indexing.md: correction header + body sweep
- 2026-04-27-4090-benchmark-and-test-set.md: closure note
- .claude/memory/radio_show_no_cohost_named_tom.md: resolution + speaker roster

Diarization re-run after rename so speaker_map emits "Cohost: Tara".
Q&A counts unchanged (rename is label-only): 9 pairs across 6 test episodes.

Tara distribution from the post-rename diarization (per-episode % of audio):
  2011-03-12-hr1   140s   5.6%  likely false positive (call-in only)
  2012-03-10-hr1    30s   1.1%  likely false positive (call-in only)
  2012-06-09-hr1   340s  12.8%  suspicious, pending Mike confirm
  2014-s6e19       680s  23.3%  confirmed
  2016-s8e43      1890s  35.5%  confirmed
  2017-s9e30       610s  11.4%  plausible, pending Mike confirm

Broader speaker-roster context Mike provided this session (saved to memory):
the show has had multiple co-hosts (Tara, Randall, Rob) plus producers/board
ops (Andrew, Shannon, Ken, others) who would sometimes go on-air. Only Tara
has a profile so far. Every other speaker is currently labeled CALLER, which
means small CO-HOST attributions in unexpected episodes (e.g. 2011/2012) may
actually be a producer rather than a false positive; Mike to spot-check.

Action item before full-archive run: build profiles for Randall, Rob, and the
named producers to avoid systematic Q&A false positives in early-years and
2018/2019 episodes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
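The "44 embeddings, 0.698 cosine vs Mike" figure describes how far the Tara profile sits from the host's. A minimal sketch of one common way to get such a number, assuming the composite is the re-normalized mean of unit-length window embeddings and that 0.698 is the cosine similarity between the two composites. This commit does not show `build_cohost_profile.py`, so `composite_profile`, `cosine`, and the synthetic data below are illustrative only.

```python
import numpy as np


def composite_profile(embeddings: np.ndarray) -> np.ndarray:
    """Collapse per-window embeddings (n_windows, dim) into one unit-length vector.

    Assumption: each row is L2-normalized before averaging and the mean is
    re-normalized. build_cohost_profile.py may do this differently.
    """
    emb = np.asarray(embeddings, dtype=float)
    if emb.ndim != 2 or emb.shape[0] == 0:
        raise ValueError("expected a non-empty (n_windows, dim) array")
    unit_rows = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    mean = unit_rows.mean(axis=0)
    return mean / np.linalg.norm(mean)


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: 1.0 for identical direction, lower for better-separated voices."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 192  # arbitrary width for the demo, not the real embedding size
    # Synthetic stand-ins: each speaker's windows scatter around one centre.
    tara_windows = rng.normal(size=dim) + 0.3 * rng.normal(size=(44, dim))
    mike_windows = rng.normal(size=dim) + 0.3 * rng.normal(size=(60, dim))
    tara = composite_profile(tara_windows)
    mike = composite_profile(mike_windows)
    print(f"Tara vs Mike composite cosine: {cosine(tara, mike):.3f}")
```

Normalizing each window before averaging keeps a single high-norm window from dominating the composite. Against real data the rows would come from the per-window `.npy` files under `voice-profiles/tara/`; their naming, and which file holds the stored composite, isn't visible in this commit.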
@@ -25,13 +25,19 @@ This run uses the post-overhaul code (commit `e9ac607`): batched Whisper transcr
 
 ---
 
-## Important — "Tom" co-host name is wrong
+## Co-host identity correction — Tara, not Tom
 
-The 5070 Ti session built a voice profile labeled `voice-profiles/tom/` and described it in the session log as "Tom, regular in-studio co-host/board-op roughly 2013-2016." Mike confirmed on this session: **there is no co-host named Tom**. The voice profile is real (clean cosine separation, 0.698 vs Mike) and the diarization correctly identifies the second speaker, but the human identity attached to it is hallucinated.
+The 5070 Ti session fabricated a co-host named "Tom" — Mike confirmed there is no such person on the show. After listening to the source windows, Mike identified the voice in both 2014-s6e19 and 2016-s8e43 as **Tara** (a real co-host; the show has had multiple over the years).
 
-The directory, `profiles.json` entry, `build_cohost_profile.py` references, and the 5070 Ti session log all carry the bogus name. Identity TBD pending Mike confirming who that voice actually is.
+Rename swept this session:
+- `voice-profiles/tom/` → `voice-profiles/tara/` (git mv, all 44 embeddings + composite preserved)
+- `voice-profiles/profiles.json`: `"Tom"` key → `"Tara"`
+- `build_cohost_profile.py`: docstring, `TOM_WINDOWS` → `TARA_WINDOWS`, `COHOST_NAME = "Tara"`, console output strings
+- `projects/radio-show/session-logs/2026-04-27-qa-extraction-cohost-indexing.md`: correction header added, all body references updated
+- `.claude/memory/radio_show_no_cohost_named_tom.md`: resolution recorded
+- Diarization re-run post-rename so `speaker_map` in each `diarization.json` emits `Cohost: Tara`
 
-Memory entry added: `.claude/memory/radio_show_no_cohost_named_tom.md`. The profile will be renamed once Mike provides the correct identity.
+The 5070 Ti session log's claim of "Tom was the regular co-host roughly 2013-2016" carried two errors: the wrong name AND an unverified tenure window. The corrected log notes Tara appears in 2014-s6e19 and 2016-s8e43 only — generalizing to the full 2013-2016 era hasn't been confirmed.
 
 ---
 
@@ -115,6 +121,31 @@ archive.db is not on this machine — index update happens on DESKTOP-0O8A1RL.
 
 ---
 
+## Tara distribution across the test set (post-rename diarization)
+
+After the rename, the diarizer's per-episode `speaker_map` shows Tara in **all 6** test episodes — well beyond the 2014+2016 the 5070 Ti session log claimed.
+
+| Episode | Tara (seconds) | % of audio | Read |
+|---|---|---|---|
+| 2011-03-12-hr1 | 140s (2:20) | 5.6% | likely false positive — Mike confirms 2011 was pure call-in |
+| 2012-03-10-hr1 | 30s (0:30) | 1.1% | likely false positive — 2012 was pure call-in |
+| 2012-06-09-hr1 | 340s (5:40) | 12.8% | suspicious — too much for noise; awaiting Mike confirm |
+| 2014-s6e19 | 680s (11:20) | 23.3% | confirmed (Mike) |
+| 2016-s8e43 | 1890s (31:30) | 35.5% | confirmed (Mike) |
+| 2017-s9e30 | 610s (10:10) | 11.4% | plausible — pending Mike confirm; 5070 Ti log only listed Tara in 2014+2016 |
+
+**Mike's broader correction (2026-04-27):**
+- **Co-hosts** rotated through over the years. Confirmed: Tara, Randall (early years), Rob (early years + occasional 2018/2019).
+- **Producers / board ops** would sometimes go on-air. Named so far: Andrew, Shannon, Ken, plus "a couple more" Mike doesn't recall off-hand.
+
+Of all these, only Tara has a voice profile. Every other co-host AND every producer-on-air moment in the archive is currently being labeled CALLER, which inflates Q&A false positives in those eras and episodes.
+
+The small Tara percentages in 2011/2012 (1-13%) most likely reflect the 0.85 cosine threshold hitting on a similar-sounding speaker that isn't actually Tara — could be a producer (Andrew/Shannon/Ken/etc) or another early-years voice we haven't catalogued. Worth Mike sampling these short windows to identify before assuming false positive vs producer.
+
+**Implication for full-archive runs:** before processing the 579-episode archive in earnest, build profiles for at least Randall, Rob, and the named producers. Otherwise the Q&A extraction across early-years and 2018/2019 episodes will inherit the same false-positive pattern that originally produced 12 bogus pairs in 2016-s8e43.
+
+---
+
 ## Pending work (from 5070 Ti session, still unblocked)
 
 1. **Resolve "Tom" identity** — Mike to confirm who the second voice is in 2014-s6e19 and 2016-s8e43. Then rename `voice-profiles/tom/`, update `profiles.json`, fix labels in code. Until then, voice-profile data is correct but mislabeled.
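The "Tara (seconds)" and "% of audio" columns in the distribution table are speaker time summed per label and divided by episode length. A minimal sketch of that arithmetic, assuming diarization output can be flattened to `(start_sec, end_sec, label)` tuples and that "% of audio" means share of the full episode duration rather than of detected speech. The real `diarization.json` schema isn't shown in this diff; `speaker_share` and the `"Host: Mike"` label are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Tuple

# Hypothetical flattened segment: (start_sec, end_sec, speaker_label).
Segment = Tuple[float, float, str]


def speaker_share(segments: Iterable[Segment], audio_seconds: float) -> dict:
    """Return {label: (seconds, percent_of_audio)} per speaker label.

    Assumes segments for the same label do not overlap; overlapping spans
    would be double-counted.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    totals = defaultdict(float)
    for start, end, label in segments:
        totals[label] += max(0.0, end - start)  # skip inverted or zero-length spans
    return {
        label: (secs, round(100.0 * secs / audio_seconds, 1))
        for label, secs in totals.items()
    }


if __name__ == "__main__":
    # Toy segments and duration, not real episode data.
    segs = [
        (0.0, 600.0, "Host: Mike"),
        (600.0, 700.0, "Cohost: Tara"),
        (700.0, 860.0, "CALLER"),
        (860.0, 900.0, "Cohost: Tara"),
    ]
    for label, (secs, pct) in speaker_share(segs, audio_seconds=2500.0).items():
        print(f"{label:14} {secs:6.0f}s {pct:5.1f}%")
```

The two toy Tara segments add up to 140 s, and 140 s of a 2500 s episode rounds to 5.6%. The 2500 s duration was picked for the demo; the real episode lengths aren't recorded in this commit. Rounding to one decimal mirrors the table's precision.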
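The false-positive reasoning in the distribution section depends on how a segment is labeled when Tara is the only enrolled non-host profile. A minimal sketch, assuming the diarizer takes the highest-cosine profile and keeps it only if it clears the 0.85 threshold, otherwise falling back to `CALLER`. The matching code isn't part of this diff, and the 3-d vectors and the `"Producer: Andrew"` label are hypothetical.

```python
from typing import Dict, Tuple

import numpy as np

CALLER = "CALLER"


def _unit(v) -> np.ndarray:
    """Return v scaled to length 1 (raises on a zero vector)."""
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v)
    if norm == 0:
        raise ValueError("zero-length embedding")
    return v / norm


def assign_speaker(
    embedding, profiles: Dict[str, np.ndarray], threshold: float = 0.85
) -> Tuple[str, float]:
    """Label one segment embedding with its closest profile, or CALLER.

    Assumption: highest cosine wins, but only if it clears `threshold`.
    """
    seg = _unit(embedding)
    best_label, best_score = CALLER, -1.0
    for label, profile in profiles.items():
        score = float(seg @ _unit(profile))  # cosine of two unit vectors
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return CALLER, best_score
    return best_label, best_score


if __name__ == "__main__":
    # Toy 3-d vectors; real speaker embeddings are far higher-dimensional.
    tara = np.array([1.0, 0.0, 0.0])
    producer = np.array([3.0, 1.2, 0.0])  # hypothetical producer profile
    on_air_producer = np.array([3.0, 1.0, 0.0])  # segment actually spoken by the producer

    # Only Tara enrolled: cosine ~0.95 clears 0.85, so the producer is labeled Tara.
    print(assign_speaker(on_air_producer, {"Cohost: Tara": tara}))
    # Producer enrolled too: its profile scores ~0.998 and takes the segment.
    print(assign_speaker(on_air_producer, {"Cohost: Tara": tara, "Producer: Andrew": producer}))
```

With only Tara enrolled, a similar-sounding producer segment is credited to Tara. Once a closer profile exists, the same segment moves to it. This is the mechanism behind the action item to enroll Randall, Rob, and the named producers before the full-archive run.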