Mike Swanson 82940d96d7 radio: utf-8 transcript writes + sqlite archive importer + session log
- src/transcriber.py: open transcript.{json,txt,srt} with encoding="utf-8".
  Windows cp1252 default crashed on Whisper output containing U+2044.
- import_to_sqlite.py: new. Walks archive-data/transcripts, builds
  archive.db (5 tables + 2 FTS5 virtual tables, sha256-keyed idempotency).
  20.5 MB / 208 episodes at smoke-test time, 1.9s rebuild.
- batch_process.py: tracked from prior session — full-archive batch with
  resumable transcribe/diarize/intros/qa pipeline.
- .gitignore: archive-data/ and logs/.
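
The encoding fix in src/transcriber.py can be illustrated with a minimal sketch. The helper name `write_transcript` and the segment layout (a list of dicts with a `text` key) are assumptions made for illustration, not the project's actual code; only the `encoding="utf-8"` argument comes from the commit message.

```python
import json
from pathlib import Path


def write_transcript(out_dir: Path, segments: list) -> dict:
    """Hypothetical helper: write transcript.json and transcript.txt as UTF-8."""
    out_dir.mkdir(parents=True, exist_ok=True)
    json_path = out_dir / "transcript.json"
    txt_path = out_dir / "transcript.txt"

    # Without encoding=..., open() falls back to the locale's preferred
    # encoding. On Windows that is typically cp1252, which has no mapping
    # for U+2044 (FRACTION SLASH) and raises UnicodeEncodeError on write.
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(segments, f, ensure_ascii=False, indent=2)
    with open(txt_path, "w", encoding="utf-8") as f:
        for seg in segments:
            f.write(seg["text"] + "\n")
    return {"json": json_path, "txt": txt_path}
```

Readers of these files need the same explicit `encoding="utf-8"`, otherwise the failure simply moves from the write side to the read side.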

Session log: 2026-04-27-archive-batch-and-sqlite-import.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 19:38:02 -07:00
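
The sha256-keyed idempotency mentioned for import_to_sqlite.py in the commit message above could work roughly as sketched below. This is an assumption-laden illustration, not the real importer: the table names `episodes` and `episodes_fts`, the column layout, and the function `import_transcript` are all hypothetical, and the actual schema has 5 tables and 2 FTS5 virtual tables rather than one of each. It also assumes the local SQLite build has FTS5 enabled.

```python
import hashlib
import sqlite3

# Hypothetical, reduced schema: one regular table keyed by content digest
# and one FTS5 virtual table for full-text search over the transcript body.
SCHEMA = """
CREATE TABLE IF NOT EXISTS episodes (
    sha256 TEXT PRIMARY KEY,
    path   TEXT NOT NULL,
    body   TEXT NOT NULL
);
CREATE VIRTUAL TABLE IF NOT EXISTS episodes_fts
    USING fts5(sha256 UNINDEXED, body);
"""


def import_transcript(conn: sqlite3.Connection, path: str, data: bytes) -> bool:
    """Insert one transcript; return False if its digest is already stored."""
    digest = hashlib.sha256(data).hexdigest()
    body = data.decode("utf-8")
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO episodes (sha256, path, body) VALUES (?, ?, ?)",
            (digest, path, body),
        )
        if cur.rowcount == 0:
            # Same bytes were imported before, so skip the FTS row as well.
            return False
        conn.execute(
            "INSERT INTO episodes_fts (sha256, body) VALUES (?, ?)",
            (digest, body),
        )
    return True
```

Keying on the digest of the file contents rather than on the path means a re-run over an unchanged archive inserts nothing, while an edited transcript is treated as new content.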

# Python
__pycache__/
*.pyc
*.pyo
.venv/
*.egg-info/
# Large data files
test-data/episodes/
test-data/transcripts/
episodes/
processed/
archive-data/
logs/
# Databases (regenerable)
*.db
*.sqlite
# Model cache
.cache/
*.pt
*.bin
# OS
.DS_Store
Thumbs.db