- build Tom (co-host) voice profile (44 embeddings, 0.698 similarity to Mike)
- diarizer.py: add CO-HOST speaker label for cohost-role profiles
- voice_profiler.py: emit "Cohost: <name>" label for cohost role
- qa_extractor.py:
  - overlap resolution at load time (midpoint boundary split)
  - 4s CALLER-preference threshold
  - turn-based caller-intro lookback (2 HOST turns)
  - _preceded_by_caller_intro() helper
  - _PHONE_GREETING pattern
  - 751-1041 and "we'll get your problem solved" promo signatures
- benchmark.py: use src.transcriber.transcribe with batch_size=16
- add index_test_episodes.py and build_cohost_profile.py scripts
- add .gitignore (exclude episodes, transcripts, *.db, .venv)
- session log: 2026-04-27-qa-extraction-cohost-indexing.md

Result: 2016-s8e43 drops from 12 false-positive Q&A pairs to 2 real caller pairs.
archive.db: 6 episodes, 762 segments, 10 Q&A pairs, FTS5 search verified.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
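The commit reports a co-host profile built from 44 embeddings with a 0.698 similarity to Mike's profile, but does not say how either is computed. A minimal sketch, assuming a profile is the re-normalised mean of L2-normalised segment embeddings and that "similarity" means cosine similarity; `build_profile` and `cosine_similarity` are hypothetical names, not functions from voice_profiler.py or build_cohost_profile.py:

```python
import math
from typing import List, Sequence


def _unit(vector: Sequence[float]) -> List[float]:
    """Scale a vector to length 1 (raises on an all-zero vector)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vector]


def build_profile(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Average unit-normalised segment embeddings into one unit-length voice profile.

    Hypothetical helper: assumes the profile is a normalised centroid.
    """
    if not embeddings:
        raise ValueError("need at least one embedding")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same dimension")
    units = [_unit(e) for e in embeddings]
    # Component-wise mean of the unit vectors, then re-normalise the centroid.
    centroid = [sum(u[i] for u in units) / len(units) for i in range(dim)]
    return _unit(centroid)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b: 1.0 = same direction, 0.0 = orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(_unit(a), _unit(b)))
```

Under these assumptions, a figure like the reported 0.698 could come from `cosine_similarity(tom_profile, mike_profile)`. Averaging unit vectors rather than raw embeddings keeps a few large-norm segments from dominating the profile.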
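qa_extractor.py now resolves overlapping segments at load time by splitting them at a midpoint boundary. A minimal sketch of that idea, assuming segments carry start/end times in seconds and that only the time boundaries move (the text is not re-split, since word-level timings are not assumed to be available). `Segment` and `resolve_overlaps` are hypothetical names, and dropping a segment nested entirely inside its predecessor is an assumption of this sketch, not something the commit describes:

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Segment:
    start: float  # seconds
    end: float    # seconds
    speaker: str
    text: str = ""


def resolve_overlaps(segments: List[Segment]) -> List[Segment]:
    """Split each partial overlap between consecutive segments at its midpoint."""
    ordered = sorted(segments, key=lambda s: (s.start, s.end))
    resolved: List[Segment] = []
    for seg in ordered:
        if resolved and seg.start < resolved[-1].end:
            prev = resolved[-1]
            if seg.end <= prev.end:
                # Assumption: a segment nested entirely inside the previous one is
                # treated as cross-talk and dropped; the real extractor may differ.
                continue
            # The overlap is [seg.start, prev.end]; both segments meet at its midpoint.
            boundary = (seg.start + prev.end) / 2
            resolved[-1] = replace(prev, end=boundary)
            seg = replace(seg, start=boundary)
        resolved.append(seg)
    return resolved
```

For example, a HOST segment spanning 0 to 10 s and a CALLER segment spanning 8 to 15 s become 0 to 9 s and 9 to 15 s. Splitting at the midpoint gives each speaker half of the contested span instead of handing all of it to whichever segment happens to come first.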
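The caller-intro lookback checks whether a caller turn follows a host introduction within the last 2 HOST turns, using `_preceded_by_caller_intro()` and a `_PHONE_GREETING` pattern. Neither body appears in this commit, so the following is a hypothetical stand-in. The regexes, the `Turn` type, and `looks_like_caller_opening` are illustrative assumptions, not the extractor's real phrase lists:

```python
import re
from typing import NamedTuple, Sequence


class Turn(NamedTuple):
    speaker: str  # e.g. "HOST", "CO-HOST", "CALLER"
    text: str


# Hypothetical phrase lists; the real _PHONE_GREETING regex is not shown in the commit.
CALLER_INTRO = re.compile(
    r"\b(?:you'?re on the air|go to the phones|welcome to the show)\b", re.IGNORECASE
)
PHONE_GREETING = re.compile(
    r"^\s*(?:hi|hey|hello|good (?:morning|afternoon|evening))\b", re.IGNORECASE
)


def preceded_by_caller_intro(turns: Sequence[Turn], index: int, max_host_turns: int = 2) -> bool:
    """Scan back over at most `max_host_turns` HOST turns before `index` for a caller intro."""
    host_turns_seen = 0
    for turn in reversed(turns[:index]):
        if turn.speaker != "HOST":
            continue  # only HOST turns count toward the lookback window
        if CALLER_INTRO.search(turn.text):
            return True
        host_turns_seen += 1
        if host_turns_seen >= max_host_turns:
            break
    return False


def looks_like_caller_opening(turns: Sequence[Turn], index: int) -> bool:
    """True for a turn that opens with a phone greeting shortly after a host's caller intro."""
    return bool(PHONE_GREETING.match(turns[index].text)) and preceded_by_caller_intro(turns, index)
```

One plausible reason to count HOST turns rather than seconds is that the window then stays the same size whether the host introduced the caller in one long sentence or several short ones.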
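The commit reports that FTS5 search over archive.db was verified, but the database schema is not part of it. The following minimal sketch uses Python's standard sqlite3 module. It assumes a hypothetical `segments` table and an external-content FTS5 index named `segments_fts`. It also assumes the linked SQLite library was compiled with FTS5, which is common but not guaranteed:

```python
import sqlite3
from typing import Iterable, List, Tuple

# Hypothetical schema; the real archive.db layout is not shown in this commit.
SCHEMA = """
CREATE TABLE IF NOT EXISTS segments (
    id INTEGER PRIMARY KEY,
    episode TEXT NOT NULL,
    start_s REAL NOT NULL,
    speaker TEXT NOT NULL,
    body TEXT NOT NULL
);
-- External-content FTS5 index over segments.body; the text is not stored twice.
CREATE VIRTUAL TABLE IF NOT EXISTS segments_fts
    USING fts5(body, content='segments', content_rowid='id');
"""

Row = Tuple[str, float, str, str]  # (episode, start_s, speaker, body)


def index_segments(conn: sqlite3.Connection, rows: Iterable[Row]) -> None:
    conn.executemany(
        "INSERT INTO segments (episode, start_s, speaker, body) VALUES (?, ?, ?, ?)", rows
    )
    # The 'rebuild' command repopulates an external-content index from its content table.
    conn.execute("INSERT INTO segments_fts(segments_fts) VALUES ('rebuild')")
    conn.commit()


def search(conn: sqlite3.Connection, query: str, limit: int = 10) -> List[Row]:
    """Full-text search, best matches first by FTS5's built-in rank (bm25 by default)."""
    return conn.execute(
        """
        SELECT s.episode, s.start_s, s.speaker, s.body
        FROM segments_fts
        JOIN segments AS s ON s.id = segments_fts.rowid
        WHERE segments_fts MATCH ?
        ORDER BY segments_fts.rank
        LIMIT ?
        """,
        (query, limit),
    ).fetchall()
```

An external-content table keeps the transcript text in one place. The trade-off is that the index must be rebuilt, or kept in sync with triggers, whenever segments change.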
.gitignore (26 lines, 238 B)
# Python
__pycache__/
*.pyc
*.pyo
.venv/
*.egg-info/

# Large data files
test-data/episodes/
test-data/transcripts/
episodes/
processed/

# Databases (regenerable)
*.db
*.sqlite

# Model cache
.cache/
*.pt
*.bin

# OS
.DS_Store
Thumbs.db