Backend min_score/exclude_banter wired through to HTML index. Adds
score badges (1-5 red->green), topic_class pills, dim styling on
banter rows. Live on http://172.16.3.20:8765/. Synced to portable
repo. pscp ENOSPC quirk worked around by plink-stdin streaming.
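A minimal sketch of how the two filter parameters might behave once they reach the row set. The `QAPair` shape, the `filter_pairs` name, and the assumption that banter rows carry `topic_class == "banter"` are illustrative, not taken from the backend code:

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class QAPair:
    question: str
    score: int        # 1-5, as shown by the score badges
    topic_class: str  # assumed label; "banter" marks banter rows


def filter_pairs(
    pairs: Iterable[QAPair],
    min_score: Optional[int] = None,
    exclude_banter: bool = False,
) -> List[QAPair]:
    """Keep pairs that pass both optional filters."""
    kept = []
    for pair in pairs:
        # Drop anything scored below the requested floor.
        if min_score is not None and pair.score < min_score:
            continue
        # Drop banter rows only when the caller asks for it.
        if exclude_banter and pair.topic_class == "banter":
            continue
        kept.append(pair)
    return kept
```

With both parameters left at their defaults the function returns every row, which matches an index page that dims banter rows rather than hiding them.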
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3.5h run on qwen3:14b processed 1,405/1,407 Q/A pairs (2 failed,
will retry on next invocation). 37% scored 4-5 (useful), 41%
scored 1-2 (banter/promo/off-topic). API filter ready; Jupiter
redeploy pending Mike's manual review.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Append to 2026-04-28-session.md covering the FastAPI/SQLite container
deploy: build + ship + verify, plus credentials, paths, and re-deploy
procedures for both DB updates and source updates.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Execution-only follow-on to 2026-04-27. Both batch passes done (519+53,
0 errors), import_to_sqlite.py run incrementally to bring archive.db
to final state. Next step: Jupiter Docker container deploy.
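One way an incremental import can stay safe to re-run is to key each row and skip keys that already exist. This is a sketch under assumptions: the `qa_pairs` table, its columns, and the `import_rows` helper are hypothetical and are not the actual schema or code of import_to_sqlite.py:

```python
import sqlite3
from typing import Iterable, Tuple

Row = Tuple[str, str, str]  # (pair_id, episode, question); hypothetical shape


def import_rows(conn: sqlite3.Connection, rows: Iterable[Row]) -> int:
    """Insert rows whose pair_id is not present yet; return how many were added."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS qa_pairs ("
        "pair_id TEXT PRIMARY KEY, episode TEXT NOT NULL, question TEXT NOT NULL)"
    )
    before = conn.execute("SELECT COUNT(*) FROM qa_pairs").fetchone()[0]
    # INSERT OR IGNORE skips rows that would violate the primary key,
    # so a second pass over an already-imported batch adds nothing.
    conn.executemany(
        "INSERT OR IGNORE INTO qa_pairs (pair_id, episode, question) VALUES (?, ?, ?)",
        list(rows),
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM qa_pairs").fetchone()[0]
    return after - before
```

Running the helper once per batch pass means a repeated or overlapping pass only contributes the rows that are new.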
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Added 2010, 2015, 2018 test episodes to round out the test set to one
per available year:
- 2010-05-08-hr1 (May 2010, earliest available; pre-Tara era)
- 2015-s7e19 (Jan 2015, avoids training's s7e30)
- 2018-s10e18 (only 3 non-training 2018 episodes exist)
Archive has no 2019 directory, so the "2018/2019 appearances" Mike
attributed to Rob can only be checked against the 5 available 2018 episodes.
Per-year diarization summary (Tara presence, post-rename):
  2010-05-08      30s   1.2%  likely false positive (pre-Tara)
  2011-03-12     140s   5.6%  likely false positive (call-in only)
  2012-03-10      30s   1.1%  likely false positive (call-in only)
  2012-06-09     340s  12.8%  suspicious; Mike to confirm
  2014-s6e19     680s  23.3%  confirmed
  2015-s7e19     280s   9.9%  plausible; Mike to confirm
  2016-s8e43    1890s  35.5%  confirmed
  2017-s9e30     610s  11.4%  plausible
  2018-s10e18    880s  17.1%  COULD BE ROB: Mike flagged Rob for
                              2018/2019 appearances; cosine threshold may
                              be hitting on Rob being acoustically similar
                              to Tara
Total Tara across 9 episodes: 1h 21m / 8h 52m audio (15.3%).
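The 2018 concern above can be illustrated with a toy nearest-profile matcher. This is only a sketch: the 2-D vectors, the 0.7 threshold, and the `label_segment` helper are placeholders, not the pipeline's real embeddings, threshold, or code. It shows why a voice close to Tara's gets her label while hers is the only profile, and why adding a second profile changes the outcome:

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero-length embedding")
    return dot / (norm_a * norm_b)


def label_segment(
    embedding: Sequence[float],
    profiles: Dict[str, Sequence[float]],
    threshold: float,
) -> str:
    """Return the best-matching profile name, or CALLER below the threshold."""
    best_name, best_score = "CALLER", -1.0
    for name, centroid in profiles.items():
        score = cosine_similarity(embedding, centroid)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else "CALLER"


# Placeholder vectors: a segment that sits near Tara but nearer to Rob.
segment = [0.9, 0.3]
only_tara = {"Tara": [1.0, 0.0]}
with_rob = {"Tara": [1.0, 0.0], "Rob": [0.8, 0.4]}
```

With `only_tara` the segment clears the threshold and is labeled Tara; with `with_rob` the same segment goes to Rob, which is the motivation for building the additional profiles before a full-archive run.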
Q&A counts (still suspect — every voice that isn't Mike-or-Tara is
labeled CALLER, so Randall/Rob/producers inflate the bucket):
2010=4, 2011=1, 2012a=2, 2012b=0, 2014=0, 2015=1, 2016=2, 2017=4, 2018=3
Total: 17 pairs across 9 episodes
4090 perf on the expanded set:
- Diarization: 31928s in 121.5s = 262.7x realtime (vs 209.7x on 5070 Ti, +25.3%)
- Transcription (3 new episodes only): 10554s in 112.4s = 93.9x
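For reference, the "x realtime" and percentage figures above follow from two small ratios. The helper names are illustrative and are not claimed to match benchmark.py:

```python
def realtime_factor(audio_seconds: float, wall_seconds: float) -> float:
    """Seconds of audio processed per second of wall-clock time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


def percent_change(baseline: float, measured: float) -> float:
    """Relative change of measured against baseline, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (measured - baseline) / baseline * 100.0


# Transcription of the 3 new episodes: 10554s of audio in 112.4s.
transcription_rtf = round(realtime_factor(10554, 112.4), 1)  # 93.9
# Diarization on the 4090 against the 5070 Ti baseline.
diarization_gain = round(percent_change(209.7, 262.7), 1)    # 25.3
```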
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mike confirmed there is no co-host named "Tom" — the voice in 2014-s6e19
and 2016-s8e43 is Tara. The 5070 Ti session fabricated the Tom identity.
The voice profile itself (44 embeddings, 0.698 cosine vs Mike) is correct;
only the human label was wrong.
Rename swept:
- voice-profiles/tom/ -> voice-profiles/tara/ (git mv preserves all .npy)
- voice-profiles/profiles.json: "Tom" key -> "Tara"
- build_cohost_profile.py: TOM_WINDOWS -> TARA_WINDOWS, COHOST_NAME, comments
- 2026-04-27-qa-extraction-cohost-indexing.md: correction header + body sweep
- 2026-04-27-4090-benchmark-and-test-set.md: closure note
- .claude/memory/radio_show_no_cohost_named_tom.md: resolution + speaker roster
Diarization re-run after rename so speaker_map emits "Cohost: Tara".
Q&A counts unchanged (rename is label-only): 9 pairs across 6 test episodes.
Tara distribution from the post-rename diarization (per-episode % of audio):
  2011-03-12-hr1   140s   5.6%  likely false positive (call-in only)
  2012-03-10-hr1    30s   1.1%  likely false positive (call-in only)
  2012-06-09-hr1   340s  12.8%  suspicious; pending Mike confirm
  2014-s6e19       680s  23.3%  confirmed
  2016-s8e43      1890s  35.5%  confirmed
  2017-s9e30       610s  11.4%  plausible; pending Mike confirm
Broader speaker-roster context Mike provided this session (saved to
memory): the show has had multiple co-hosts (Tara, Randall, Rob) plus
producers/board ops (Andrew, Shannon, Ken, others) who would sometimes
go on-air. Only Tara has a profile so far. Every other speaker is
currently labeled CALLER, which means small CO-HOST attributions in
unexpected episodes (e.g. 2011/2012) may actually be a producer rather
than a false positive — Mike to spot-check.
Action item before full-archive run: build profiles for Randall, Rob,
and the named producers to avoid systematic Q&A false positives in
early-years and 2018/2019 episodes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Re-ran benchmark.py on GURU-BEAST-ROG against the post-overhaul code
(co-host profile, batched Whisper int8_float16, revised Q&A extractor).
Results vs 5070 Ti baseline:
- Diarization: 209.7x -> 338.1x (+61.2%)
- Transcription: 63.8x -> 94.8x (+48.6%)
- Q&A pairs: 9 vs 10 (within run-to-run noise; structural correctness matches:
  2014 = 0 callers, 2016 = 2 WiFi caller pairs)
Setup change: BENCH_SETUP.md now lists ffmpeg as a Step-2 prereq
(winget install Gyan.FFmpeg). ffmpeg was missing on this machine, and
without ffprobe the pipeline fails silently at the first diarize call.
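A preflight check along these lines would turn the silent failure into an explicit one. This is a sketch, not code from the pipeline; the `missing_tools` and `require_tools` names are hypothetical:

```python
import shutil
from typing import Iterable, List


def missing_tools(names: Iterable[str]) -> List[str]:
    """Return the executables from names that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


def require_tools(names: Iterable[str]) -> None:
    """Raise before any processing starts if a required tool is absent."""
    missing = missing_tools(names)
    if missing:
        raise RuntimeError("missing required tools: " + ", ".join(missing))
```

Calling `require_tools(["ffmpeg", "ffprobe"])` at the top of a benchmark run would stop with a clear message on a machine where the Step-2 prereq was skipped.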
Code change: benchmark.py BASELINE_RTF updated 149.5 -> 209.7 to reflect
the 5070 Ti's post-overhaul measurement (e9ac607).
Data: 6 test episode transcripts and diarizations regenerated under the
new code path (batched Whisper output + co-host-aware speaker_map).
Correction memory: the voice-profiles/tom/ directory and the 5070 Ti
session log refer to a co-host named "Tom". That identity was fabricated;
Mike confirms no such person exists on the show. The audio profile is real
and the diarization separation is sound, but the human identity attached
to it is wrong. Saved under .claude/memory/radio_show_no_cohost_named_tom.md
pending Mike providing the correct name for rename.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>