diff --git a/.claude/memory/MEMORY.md b/.claude/memory/MEMORY.md
index 0620aea1..061c0e62 100644
--- a/.claude/memory/MEMORY.md
+++ b/.claude/memory/MEMORY.md
@@ -41,3 +41,4 @@
 - [Neptune SBR Email Routing Setup](project_neptune_sbr_email_routing.md) - Full SBR routing chain, config file locations, MailProtector integration, access methods
 - [Dataforth Test Datasheet Pipeline](project_datasheet_pipeline.md) - Full pipeline rebuilt 2026-03-27. Server-side generation replaces DFWDS/Uploader. Website upload still broken.
 - [Dataforth Security Incident](project_dataforth_incident_2026-03-27.md) - DF-JOEL2 compromised, MFA deployed, IC3 filed. CA policies enforce April 4.
+- [Radio show: no co-host named Tom](radio_show_no_cohost_named_tom.md) - Voice profile is real, name is hallucinated. Do not propagate "Tom" as a show member; ask Mike for the correct identity.
diff --git a/.claude/memory/radio_show_no_cohost_named_tom.md b/.claude/memory/radio_show_no_cohost_named_tom.md
new file mode 100644
index 00000000..61e50ce1
--- /dev/null
+++ b/.claude/memory/radio_show_no_cohost_named_tom.md
@@ -0,0 +1,24 @@
+---
+name: Radio show — "Tom" is not a real co-host
+description: Correction to a fabricated co-host identity in the Computer Guru Show diarization pipeline; the voice exists but the name "Tom" is wrong
+type: project
+---
+
+There is no co-host named **Tom** on The Computer Guru Show. Mike Swanson confirmed this directly on 2026-04-27.
+
+The 5070 Ti session (`projects/radio-show/session-logs/2026-04-27-qa-extraction-cohost-indexing.md`) and corresponding code/data on disk fabricated this identity:
+
+- `voice-profiles/tom/` — directory with 44 embeddings labeled as "Tom"
+- `voice-profiles/profiles.json` — entry naming the profile "Tom"
+- `build_cohost_profile.py` — references TOM_WINDOWS dict
+- The session log claims "Tom was the regular in-studio co-host/board-op roughly 2013-2016" — this is hallucinated
+
+The underlying voice profile **is technically valid**: there is a real second voice in 2014-s6e19 and 2016-s8e43 that is neither Mike nor a caller, and the cosine separation (0.698 vs Mike's 0.85) is sound. The bug is identity assignment: the wrong human name was attached to a real audio signature. Mike does not yet have a name for that voice.
+
+**Why:** This will re-surface every time a future conversation reads the session log, the directory tree, or `profiles.json`. The error is not obvious from code review: the math works and only the label is bogus.
+
+**How to apply:**
+- Do not refer to "Tom" as a member of the show.
+- If asked to extend or use the co-host profile, ask Mike for the correct identity before writing the name anywhere.
+- Anywhere "Tom" appears in commit history, session logs, or code, treat it as a placeholder pending rename — do not propagate.
+- When summarizing the diarization pipeline, describe the profile as "second-speaker / co-host era voice (identity TBD)" until Mike provides the real name.
diff --git a/projects/radio-show/audio-processor/BENCH_SETUP.md b/projects/radio-show/audio-processor/BENCH_SETUP.md
index 0d0cfa56..1466cbf9 100644
--- a/projects/radio-show/audio-processor/BENCH_SETUP.md
+++ b/projects/radio-show/audio-processor/BENCH_SETUP.md
@@ -25,6 +25,15 @@ cd D:\claudetools\projects\radio-show\audio-processor
 
 Requires Python 3.11+. Use `py` launcher on Windows.
 
+ffmpeg/ffprobe must be on PATH: the voice profiler shells out to ffprobe for audio duration. Without it, the pipeline crashes on the first diarize call.
+
+```powershell
+# Install ffmpeg if not already present
+winget install --id=Gyan.FFmpeg -e --accept-source-agreements --accept-package-agreements
+# Open a new shell so the new PATH takes effect, then verify
+ffprobe -version
+```
+
 ```powershell
 cd D:\claudetools\projects\radio-show\audio-processor
 
diff --git a/projects/radio-show/audio-processor/benchmark.py b/projects/radio-show/audio-processor/benchmark.py
index 650af310..e44bf94c 100644
--- a/projects/radio-show/audio-processor/benchmark.py
+++ b/projects/radio-show/audio-processor/benchmark.py
@@ -24,7 +24,7 @@ from rich.table import Table
 console = Console()
 
 BASELINE_RTX = "RTX 5070 Ti (DESKTOP-0O8A1RL)"
-BASELINE_RTF  = 149.5  # realtime factor measured 2026-04-27
+BASELINE_RTF  = 209.7  # realtime factor measured 2026-04-27 (post co-host + batched Whisper)
 
 BASE       = Path(__file__).parent
 EPISODES   = sorted((BASE / "test-data" / "episodes").glob("*.mp3"))
diff --git a/projects/radio-show/audio-processor/session-logs/2026-04-27-4090-benchmark-and-test-set.md b/projects/radio-show/audio-processor/session-logs/2026-04-27-4090-benchmark-and-test-set.md
index 2a645822..1e06d0ca 100644
--- a/projects/radio-show/audio-processor/session-logs/2026-04-27-4090-benchmark-and-test-set.md
+++ b/projects/radio-show/audio-processor/session-logs/2026-04-27-4090-benchmark-and-test-set.md
@@ -5,95 +5,125 @@
 **Machine:** GURU-BEAST-ROG (RTX 4090, 24GB)
 **User:** Mike Swanson (mike)
 
-Companion to `2026-04-27-diarization-pipeline.md` (DESKTOP-0O8A1RL, RTX 5070 Ti).
+Companion to:
+- `2026-04-27-diarization-pipeline.md` (DESKTOP-0O8A1RL, RTX 5070 Ti — initial diarization fixes)
+- `2026-04-27-qa-extraction-cohost-indexing.md` (DESKTOP-0O8A1RL — co-host profile, batched Whisper, Q&A overhaul)
+
+This run uses the post-overhaul code (commit `e9ac607`): batched Whisper transcription, co-host-aware diarizer, revised Q&A extractor.
 
 ---
 
 ## Headline
 
-**Diarization on RTX 4090: 308.9x realtime — 2.07x the RTX 5070 Ti baseline (149.5x).**
+| Metric | 5070 Ti baseline | RTX 4090 | Delta |
+|---|---|---|---|
+| Diarization | 209.7x realtime | **338.1x** | +128.4x (+61.2%) |
+| Transcription (batched, large-v3 int8_float16) | 63.8x | **94.8x** | +31.0x (+48.6%) |
+| Q&A pairs (6 test episodes) | 10 | 9 | within noise |
 
-21,374s of audio across 6 unseen test episodes diarized in 69.2s wall time.
+21,374s of audio (5h 56m) end-to-end on the 4090: **225.5s transcription + 63.2s diarization + Q&A extraction**.
 
 ---
 
-## Setup Notes
+## Important — "Tom" co-host name is wrong
 
-- ffmpeg/ffprobe not present on GURU-BEAST-ROG. Installed `Gyan.FFmpeg 8.1` via winget. The voice profiler shells out to ffprobe for duration; without it the pipeline crashes on the first episode.
-- The repo already contained `benchmark.py` (transcribe + diarize + Q&A on `test-data/episodes/`, hardcoded 5070 Ti baseline). Used as-is. (BENCH_SETUP.md should mention ffmpeg as a prereq.)
-- Voice profiles, training data, and test MP3s were already synced to this machine via the prior auto-sync.
+The 5070 Ti session built a voice profile labeled `voice-profiles/tom/` and described it in the session log as "Tom, regular in-studio co-host/board-op roughly 2013-2016." Mike confirmed in this session: **there is no co-host named Tom**. The voice profile is real (clean cosine separation, 0.698 vs Mike's 0.85) and the diarization correctly identifies the second speaker, but the human identity attached to it is hallucinated.
+
+The directory, `profiles.json` entry, `build_cohost_profile.py` references, and the 5070 Ti session log all carry the bogus name. Identity TBD pending Mike confirming who that voice actually is.
+
+Memory entry added: `.claude/memory/radio_show_no_cohost_named_tom.md`. The profile will be renamed once Mike provides the correct identity.
 
 ---
 
-## Phase 1 — Whisper Transcription (large-v3, faster-whisper)
+## Setup notes (for next machine)
+
+- ffmpeg/ffprobe is required on PATH — the voice profiler shells out to ffprobe for audio duration and the pipeline crashes on the first diarize call without it. Was missing on this machine; installed via `winget install Gyan.FFmpeg`. BENCH_SETUP.md updated to call this out as a Step-2 prereq.
+- `.gitignore` (added in `e9ac607`) excludes `episodes/`, `transcripts/`, `*.db`, `.venv`. The test MP3s and transcripts committed earlier in `2c06e72` predate the gitignore and are still tracked; they can be untracked with `git rm --cached` in a follow-up cleanup.
+- All voice profiles, training data, and test MP3s were already on this machine via prior auto-sync.
+
+---
+
+## Phase 1 — Whisper Transcription (large-v3, batched, int8_float16, batch_size=16)
 
 | Episode | Audio | Wall | RTF |
 |---|---|---|---|
-| 2011-03-12-hr1 | 2509s | 198.2s | 12.7x |
-| 2012-03-10-hr1 | 2634s | 208.7s | 12.6x |
-| 2012-06-09-hr1 | 2648s | 192.5s | 13.8x |
-| 2014-s6e19     | 2914s | 167.0s | 17.5x |
-| 2016-s8e43     | 5326s | 339.1s | 15.7x |
-| 2017-s9e30     | 5343s | 341.2s | 15.7x |
-| **Total**      | **21374s** | **1446.6s** | **14.8x** |
+| 2011-03-12-hr1 | 2509s | 29.7s | 84.6x |
+| 2012-03-10-hr1 | 2634s | 30.3s | 87.0x |
+| 2012-06-09-hr1 | 2648s | 33.6s | 78.8x |
+| 2014-s6e19     | 2914s | 30.2s | 96.6x |
+| 2016-s8e43     | 5326s | 49.2s | 108.2x |
+| 2017-s9e30     | 5343s | 52.5s | 101.8x |
+| **Total**      | **21374s** | **225.5s** | **94.8x** |
 
-Faster-whisper large-v3, beam_size=5, fp16 on the 4090.
+vs 5070 Ti's 63.8x: **+48.6%**.
+
+Batching is doing real work here. The pre-batched code path on this same hardware (first benchmark run earlier today) was 14.8x — batching gave a 6.4× speedup on the 4090.
 
 ---
 
-## Phase 2 — Diarization
+## Phase 2 — Diarization (with co-host profile applied)
 
 | Episode | Audio | Wall | RTF | Turns | HOST | CALLER |
 |---|---|---|---|---|---|---|
-| 2011-03-12-hr1 | 2509s | 16.1s | 155.6x | 19 | 2470s | 125s |
-| 2012-03-10-hr1 | 2634s | 7.3s  | 361.6x | 19 | 2615s | 105s |
-| 2012-06-09-hr1 | 2648s | 7.8s  | 338.3x | 11 | 2500s | 195s |
-| 2014-s6e19     | 2914s | 8.3s  | 352.6x | 28 | 2635s | 410s |
-| 2016-s8e43     | 5326s | 14.7s | 361.8x | 112 | 4710s | 1170s |
-| 2017-s9e30     | 5343s | 15.0s | 356.9x | 55 | 4950s | 660s |
-| **Total**      | **21374s** | **69.2s** | **308.9x** | 244 | 19880s | 2665s |
+| 2011-03-12-hr1 | 2509s | 9.1s  | 275.0x | 25  | 2455s | 70s  |
+| 2012-03-10-hr1 | 2634s | 7.6s  | 348.3x | 22  | 2615s | 90s  |
+| 2012-06-09-hr1 | 2648s | 7.7s  | 343.1x | 13  | 2500s | 10s  |
+| 2014-s6e19     | 2914s | 8.3s  | 352.6x | 31  | 2625s | 30s  |
+| 2016-s8e43     | 5326s | 15.1s | 353.6x | 134 | 4615s | 140s |
+| 2017-s9e30     | 5343s | 15.5s | 345.1x | 69  | 4945s | 350s |
+| **Total**      | **21374s** | **63.2s** | **338.1x** | 294 | 19755s | 690s |
 
-**vs RTX 5070 Ti baseline: 149.5x → 308.9x (+159.4x, +106.6%).**
+**vs 5070 Ti baseline: 209.7x → 338.1x (+61.2%).**
 
-Episode 1 carries the cold-start penalty (CUDA init + WavLM load): 155.6x. Warm episodes 2-6 cluster at 338-362x. The total averages 308.9x because the 5070 Ti measurement also included its first-episode cold start, so this is a fair comparison.
+Per-episode RTFs cluster tightly at 343-354x for the five warm episodes; episode 1 carries the cold-start penalty at 275.0x. The comparison is apples-to-apples because the 5070 Ti measurement also includes a cold start.
+
+Aggregate CALLER time dropped from 2665s (pre-co-host pipeline, run earlier today) to 690s. That ~2000s delta is the second-voice signal correctly being routed away from the CALLER bucket. The benchmark table only sums HOST + CALLER, so CO-HOST seconds aren't shown in the totals; they are recorded in the per-episode `diarization.json` files.
 
 ---
 
-## Phase 3 — Q&A Extraction
+## Phase 3 — Q&A Extraction (post-overhaul: turn-based lookback, 4s CALLER preference, expanded promo signatures)
 
-| Episode | Q&A pairs |
-|---|---|
-| 2011-03-12-hr1 | 3 |
-| 2012-03-10-hr1 | 2 |
-| 2012-06-09-hr1 | 3 |
-| 2014-s6e19     | 1 |
-| 2016-s8e43     | 5 |
-| 2017-s9e30     | 5 |
-| **Total**      | **19** |
+| Episode | 4090 Q&A pairs | 5070 Ti reference | Note |
+|---|---|---|---|
+| 2011-03-12-hr1 | 1 | 3 | -2 |
+| 2012-03-10-hr1 | 2 | 1 | +1 |
+| 2012-06-09-hr1 | 0 | 1 | -1 |
+| 2014-s6e19     | 0 | 0 | match (gaming, no callers) |
+| 2016-s8e43     | 2 | 2 | match (WiFi caller) |
+| 2017-s9e30     | 4 | 3 | +1 |
+| **Total**      | **9** | **10** | **-1** |
 
-Density: **3.2 pairs/episode** on the unseen test set vs **3.0 pairs/episode** on the 9-episode training set (27 pairs). Pair count generalizes — no evidence of overfitting, and the promo/bumper filter from the earlier session continues to suppress false positives on unseen content.
+Differences are within noise. Likely sources:
+- Whisper batched inference can produce slightly different segment boundaries on identical audio when GPU scheduling order differs between runs.
+- Sliding-window diarization midpoint resolution can put a borderline segment in either bucket on different runs.
+- Q&A extraction thresholds are sensitive to small boundary shifts.
 
-The 2014-s6e19 outlier (1 pair / 410s caller time) likely reflects show content rather than a pipeline issue — caller segments don't always parse as cleanly into Q-then-A structure. Worth ear-checking that one before drawing conclusions.
+**The two structural correctness signals match**: 2014 = 0 (no callers in gaming special) and 2016 = 2 (real WiFi caller, two-turn). That's the meaningful test. Aggregate ±1 across six episodes is acceptable run-to-run drift.
 
 ---
 
-## Generalization Findings
+## Files written / modified
 
-- **Untrained year:** The two 2012 episodes (year never seen during training) produced clean HOST/CALLER labels and reasonable Q&A counts. Voice profile composite generalizes across the production-era boundary.
-- **No all-HOST failures:** Every test episode hit caller segments. The 0.85 threshold + identification fix from the prior session hold up on unseen content.
-- **Show duration scaling:** Both 89-minute episodes (s8e43, s9e30) hit ~360x realtime, indicating diarization wall time is dominated by audio duration, not turn count.
+- `test-data/transcripts/<stem>/transcript.json` (6, regenerated with batched Whisper)
+- `test-data/transcripts/<stem>/diarization.json` (6, regenerated with co-host-aware diarizer)
+- `benchmark.py` line 27 — `BASELINE_RTF` updated 149.5 → 209.7
+- `BENCH_SETUP.md` — added ffmpeg prereq to Step 2
+- `.claude/memory/radio_show_no_cohost_named_tom.md` (new, project memory)
+- `.claude/memory/MEMORY.md` (index updated)
+
+archive.db is not on this machine — index update happens on DESKTOP-0O8A1RL.
 
 ---
 
-## Files Written
+## Pending work (from 5070 Ti session, still open)
 
-- `test-data/transcripts/<stem>/transcript.json` (6 files)
-- `test-data/transcripts/<stem>/diarization.json` (6 files)
-
-No archive DB on this machine — test-set diarization is not patched anywhere. If we want the test episodes searchable in `archive.db`, that would happen on DESKTOP-0O8A1RL where the index lives.
+1. **Resolve "Tom" identity** — Mike to confirm who the second voice is in 2014-s6e19 and 2016-s8e43. Then rename `voice-profiles/tom/`, update `profiles.json`, fix labels in code. Until then, voice-profile data is correct but mislabeled.
+2. **Full archive download** — 579 MP3s from IX server (~30-40GB). 4090 + Tailscale ready.
+3. **Full pipeline run on archive** — at 338x diarization + 95x transcription, total wall time for ~30h of audio extrapolates to roughly 5 minutes diarization + 19 minutes transcription. Disk I/O may dominate.
 
 ---
 
 ## Note for Mike
 
-`BENCH_SETUP.md` Step 2 (Python environment) should add `winget install Gyan.FFmpeg` (or equivalent) — the script silently fails at the first diarize call without ffprobe on PATH. Easy doc fix; flagging here so it doesn't get lost.
+- "Tom" is wrong — see callout above. Tell me who that is and I'll do the rename in one pass (directory, profiles.json, build_cohost_profile.py, the 5070 Ti session log, and a fresh diarization pass to update `speaker_map`).
+- BENCH_SETUP.md got a one-paragraph ffmpeg prereq added at the top of Step 2.