radio: rename Tom -> Tara, expand speaker roster

Mike confirmed there is no co-host named "Tom" — the voice in 2014-s6e19
and 2016-s8e43 is Tara. The 5070 Ti session fabricated the Tom identity.
The voice profile itself (44 embeddings, 0.698 cosine vs Mike) is correct;
only the human label was wrong.

Rename swept:
- voice-profiles/tom/ -> voice-profiles/tara/ (git mv preserves all .npy)
- voice-profiles/profiles.json: "Tom" key -> "Tara"
- build_cohost_profile.py: TOM_WINDOWS -> TARA_WINDOWS, COHOST_NAME, comments
- 2026-04-27-qa-extraction-cohost-indexing.md: correction header + body sweep
- 2026-04-27-4090-benchmark-and-test-set.md: closure note
- .claude/memory/radio_show_no_cohost_named_tom.md: resolution + speaker roster

Diarization re-run after rename so speaker_map emits "Cohost: Tara".
Q&A counts unchanged (rename is label-only): 9 pairs across 6 test episodes.

Tara distribution from the post-rename diarization (seconds attributed, % of episode audio):
  2011-03-12-hr1   140s   5.6%   likely false positive (call-in only)
  2012-03-10-hr1    30s   1.1%   likely false positive (call-in only)
  2012-06-09-hr1   340s  12.8%   suspicious — pending Mike confirm
  2014-s6e19       680s  23.3%   confirmed
  2016-s8e43      1890s  35.5%   confirmed
  2017-s9e30       610s  11.4%   plausible — pending Mike confirm
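
As a rough illustration of how figures like those above could be derived: a
minimal sketch, assuming the diarizer yields (speaker, start, end) segments in
seconds and that the episode duration is known. The segment format, the
function name `speaker_share`, and the "Host: Mike" label are hypothetical and
not taken from the pipeline.

```python
from collections import defaultdict


def speaker_share(segments, episode_seconds):
    """Return {speaker: (total_seconds, percent_of_episode)}.

    segments: iterable of (speaker_label, start_s, end_s) tuples.
    This tuple layout is an assumption, not the pipeline's real output.
    """
    totals = defaultdict(float)
    for speaker, start, end in segments:
        # Guard against inverted segments so they cannot subtract time.
        totals[speaker] += max(0.0, end - start)
    return {
        spk: (secs, round(100.0 * secs / episode_seconds, 1))
        for spk, secs in totals.items()
    }


# Made-up segments for illustration only; not real episode data.
demo = [("Host: Mike", 0, 60), ("Cohost: Tara", 60, 85), ("CALLER", 85, 100)]
shares = speaker_share(demo, episode_seconds=100)
```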

Broader speaker-roster context Mike provided this session (saved to
memory): the show has had multiple co-hosts (Tara, Randall, Rob) plus
producers/board ops (Andrew, Shannon, Ken, others) who would sometimes
go on-air. Only Tara has a profile so far. Every other speaker is
currently labeled CALLER, which means small CO-HOST attributions in
unexpected episodes (e.g. 2011/2012) may actually be a producer rather
than a false positive — Mike to spot-check.

Action item before full-archive run: build profiles for Randall, Rob,
and the named producers to avoid systematic Q&A false positives in
early-years and 2018/2019 episodes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 15:11:03 -07:00
parent b9a4bb8807
commit fb683d6a05
55 changed files with 122 additions and 53 deletions


@@ -41,4 +41,4 @@
- [Neptune SBR Email Routing Setup](project_neptune_sbr_email_routing.md) - Full SBR routing chain, config file locations, MailProtector integration, access methods
- [Dataforth Test Datasheet Pipeline](project_datasheet_pipeline.md) - Full pipeline rebuilt 2026-03-27. Server-side generation replaces DFWDS/Uploader. Website upload still broken.
- [Dataforth Security Incident](project_dataforth_incident_2026-03-27.md) - DF-JOEL2 compromised, MFA deployed, IC3 filed. CA policies enforce April 4.
- [Radio show — no co-host named Tom](radio_show_no_cohost_named_tom.md) — voice profile is real, name is hallucinated. Do not propagate "Tom" as a show member; ask Mike for correct identity.
- [Radio show co-host — Tara, not Tom](radio_show_no_cohost_named_tom.md) — Co-host in 2014-s6e19 and 2016-s8e43 is Tara. "Tom" was hallucinated; rename complete. Multiple co-hosts have rotated through the show.


@@ -1,24 +1,54 @@
---
name: Radio show — "Tom" is not a real co-host
description: Correction to a fabricated co-host identity in the Computer Guru Show diarization pipeline; the voice exists but the name "Tom" is wrong
name: Radio show — co-host roster (Randall, Rob, Tara, others)
description: The Computer Guru Show has had multiple co-hosts over the years. The fabricated "Tom" was actually Tara. Track known co-hosts here as Mike confirms identities.
type: project
---
There is no co-host named **Tom** on The Computer Guru Show. Mike Swanson confirmed this directly on 2026-04-27.
The Computer Guru Show has had **multiple co-hosts** rotating through over the years. Mike Swanson is the only constant host.
The 5070 Ti session (`projects/radio-show/session-logs/2026-04-27-qa-extraction-cohost-indexing.md`) and corresponding code/data on disk fabricated this identity:
## Known speaker roster (per Mike, 2026-04-27)
- `voice-profiles/tom/` — directory with 44 embeddings labeled as "Tom"
- `voice-profiles/profiles.json` — entry naming the profile "Tom"
- `build_cohost_profile.py` — references TOM_WINDOWS dict
- The session log claims "Tom was the regular in-studio co-host/board-op roughly 2013-2016" — this is hallucinated
The show has had multiple **co-hosts** rotating through, plus **producers / board ops** who would sometimes go on-air. Both groups need separate voice profiles to avoid being mislabeled as callers.
The underlying voice profile **is technically valid** — there is a real second voice in 2014-s6e19 and 2016-s8e43 that is not Mike and not a caller, and the cosine separation (0.698 vs Mike's 0.85) is sound. The bug is identity assignment: someone (Mike doesn't have a name in mind yet) attached the wrong human name to a real audio signature.
### Co-hosts
| Co-host | Era | Confirmed in audio | Profile built |
|---|---|---|---|
| **Randall** | early years | not yet | no |
| **Rob** | early years + appearances in 2018/2019 (Mike unsure of exact dates) | not yet | no |
| **Tara** | confirmed 2014-s6e19, 2016-s8e43; diarizer also found her in 2017-s9e30 (610s/11.4%) — pending Mike spot-check | yes | yes — `voice-profiles/tara/` (44 embeddings) |
**Why:** This will re-surface every time a future conversation reads the session log, the directory tree, or `profiles.json`. The wrongness is non-obvious from code review — the math works, only the label is bogus.
### Producers / board ops (sometimes on-air)
| Person | Profile built |
|---|---|
| **Andrew** | no |
| **Shannon** | no |
| **Ken** | no |
| (Mike: "a couple more" he doesn't recall off-hand) | no |
**How to apply:**
- Do not refer to "Tom" as a member of the show.
- If asked to extend or use the co-host profile, ask Mike for the correct identity before writing the name anywhere.
- Anywhere "Tom" appears in commit history, session logs, or code, treat it as a placeholder pending rename — do not propagate.
- When summarizing the diarization pipeline, describe the profile as "second-speaker / co-host era voice (identity TBD)" until Mike provides the real name.
Mike: "The 'producer' (board op) would also be on-air sometimes." Anywhere a producer's voice appears, they're currently being labeled CALLER, which inflates Q&A false positives. Same problem as unprofiled co-hosts.
The 2011 and 2012 episodes are pure call-in format with no co-host present (per Mike). However, a producer could still have been on-air, so even small CO-HOST attributions in 2011/2012 (1.1-12.8% of audio in the test episodes) may be capturing a producer rather than being false positives.
## "Tom" was hallucinated
The 5070 Ti session (`2026-04-27-qa-extraction-cohost-indexing.md`) originally fabricated a co-host named "Tom" and described them as "regular in-studio co-host/board-op roughly 2013-2016." That entire identity was invented by the prior conversation. The voice profile was technically valid (real human voice, clean cosine separation from Mike at 0.698) but the human attached to it was wrong.
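
For reference, a minimal sketch of what a cosine comparison between two voice profiles can look like. It assumes each profile is reduced to the centroid of its embeddings before comparison; the notes do not say how the 0.698 figure was actually computed, so the centroid step, the function names, and the toy 3-dimensional vectors are all illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


# Toy 3-d "embeddings" for illustration; real speaker embeddings are
# much higher-dimensional and would be loaded from the .npy files.
cohost_vectors = [[1.0, 0.0, 0.0], [1.0, 0.2, 0.0]]
host_vectors = [[0.0, 1.0, 0.0], [0.2, 1.0, 0.0]]
sim = cosine_similarity(centroid(cohost_vectors), centroid(host_vectors))
```

A lower similarity means the two profiles are easier to tell apart. Note that the score only measures how separable the voices are; it says nothing about whose voice it is, which is why the label had to come from Mike.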
**Resolution applied 2026-04-27 (GURU-BEAST-ROG session):**
- `voice-profiles/tom/` renamed to `voice-profiles/tara/`
- `voice-profiles/profiles.json`: key `Tom` -> `Tara`
- `build_cohost_profile.py`: `TOM_WINDOWS` -> `TARA_WINDOWS`, `COHOST_NAME = "Tara"`
- Both relevant session logs updated; correction header preserves the history
- Diarization re-run; `speaker_map` now emits `Cohost: Tara`
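
The `profiles.json` key rename in the list above could be scripted roughly as follows. This is a sketch under assumptions: it treats the file as a single top-level JSON object keyed by speaker name, which the notes imply but do not show, and both the helper name `rename_profile_key` and the `"embeddings"` field in the demo are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def rename_profile_key(path, old, new):
    """Rename one top-level key in a JSON object file, preserving key order."""
    path = Path(path)
    profiles = json.loads(path.read_text(encoding="utf-8"))
    if old not in profiles:
        raise KeyError(f"{old!r} not found in {path}")
    if new in profiles:
        raise ValueError(f"{new!r} already exists in {path}")
    # Rebuild the dict so the renamed entry keeps its original position.
    renamed = {(new if key == old else key): value for key, value in profiles.items()}
    path.write_text(json.dumps(renamed, indent=2) + "\n", encoding="utf-8")
    return renamed


# Demo against a throwaway file; the entry contents are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "profiles.json"
    demo_path.write_text(json.dumps({"Tom": {"embeddings": 44}}), encoding="utf-8")
    result = rename_profile_key(demo_path, "Tom", "Tara")
```

Refusing to overwrite an existing target key is a deliberate choice here: once more co-host profiles exist, a mistyped rename should fail loudly rather than merge two people.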
## Implications for the archive pipeline
Co-hosts without a built profile get labeled CALLER, which inflates Q&A false positives in those eras:
- **Early-years archive (~2010-2013):** Randall and Rob are present but unprofiled — caller-labeled audio in this era is suspect.
- **2018/2019:** Rob makes appearances — same issue.
- **2017:** Diarization just found Tara at 610s (11.4%) in `2017-s9e30`; the 5070 Ti session log claimed Tara was only in 2014/2016. Pending Mike's confirmation that the 2017 attribution is correct.
## How to apply
- When diarizing a new episode and a CALLER cluster looks too long / too prominent / too consistent, suspect an unprofiled co-host before assuming a real caller.
- Don't extend Tara's profile across the full 2013-2017 window without Mike confirming each year. She may not have been in every episode.
- Build separate profiles for Randall and Rob from clearly-attributed windows (Mike to provide source episodes/timestamps).
- Never invent a co-host name from voice signature alone — ask Mike.
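
The first rule above ("too long / too prominent") could be turned into an automatic flag along these lines. This is only a sketch: the 10% threshold, the per-cluster input format, and the function name `flag_suspect_callers` are assumptions for illustration, and a flagged cluster still needs a human spot-check before anyone attaches a name to it.

```python
def flag_suspect_callers(cluster_seconds, episode_seconds, max_share=0.10):
    """Return CALLER clusters whose share of the episode exceeds max_share.

    cluster_seconds: {cluster_id: total_seconds} for clusters labeled CALLER
    (hypothetical format). max_share is an arbitrary starting threshold.
    """
    flagged = []
    for cluster_id, secs in cluster_seconds.items():
        share = secs / episode_seconds
        if share > max_share:
            flagged.append((cluster_id, round(100 * share, 1)))
    # Most prominent cluster first, so review starts with the likeliest miss.
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Made-up numbers for illustration only.
demo_clusters = {"CALLER_1": 45.0, "CALLER_2": 400.0, "CALLER_3": 90.0}
suspects = flag_suspect_callers(demo_clusters, episode_seconds=2700.0)
```

In this toy input only `CALLER_2` (about 14.8% of the episode) crosses the threshold. A real caller can also talk for a long time, so the flag is a prompt to ask Mike, not a verdict.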