- src/transcriber.py: open transcript.{json,txt,srt} with encoding="utf-8".
The Windows cp1252 default crashed on Whisper output containing U+2044 (FRACTION SLASH).
- import_to_sqlite.py: new. Walks archive-data/transcripts, builds
archive.db (5 tables + 2 FTS5 virtual tables, sha256-keyed idempotency).
20.5 MB / 208 episodes at smoke-test time, 1.9s rebuild.
- batch_process.py: tracked from prior session — full-archive batch with
resumable transcribe/diarize/intros/qa pipeline.
- .gitignore: ignore archive-data/ and logs/.
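
The encoding fix in src/transcriber.py can be illustrated roughly as follows. This is a minimal sketch, not the actual code: the helper name `write_transcript` and its arguments are hypothetical, and only the .json and .txt outputs are shown. The point is that every `open()` / `write_text()` call passes `encoding="utf-8"` explicitly instead of relying on the platform default.

```python
import json
from pathlib import Path


def write_transcript(out_dir, text, segments):
    """Write transcript.txt and transcript.json as UTF-8 (hypothetical helper)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Explicit encoding: without it, Windows may fall back to cp1252,
    # which cannot encode characters such as U+2044.
    (out_dir / "transcript.txt").write_text(text, encoding="utf-8")

    with open(out_dir / "transcript.json", "w", encoding="utf-8") as fh:
        # ensure_ascii=False keeps non-ASCII characters readable in the file.
        json.dump({"text": text, "segments": segments}, fh, ensure_ascii=False)

    return out_dir
```

Readers of these files need the same explicit `encoding="utf-8"`, otherwise the failure just moves to the read side.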
Session log: 2026-04-27-archive-batch-and-sqlite-import.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
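
For reference, the sha256-keyed idempotency mentioned for import_to_sqlite.py could look something like the sketch below. The table name, column names, and both function names are assumptions made for illustration; the real schema (5 tables plus 2 FTS5 virtual tables) is not reproduced here, and the FTS5 part is left out entirely.

```python
import hashlib
import sqlite3


def connect(db_path=":memory:"):
    """Open the database and create a single illustrative table if missing."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transcripts ("
        " sha256 TEXT PRIMARY KEY,"  # content hash doubles as the idempotency key
        " episode TEXT NOT NULL,"
        " body TEXT NOT NULL)"
    )
    return conn


def import_transcript(conn, episode, body):
    """Insert one transcript; return True if new, False if already present."""
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    cur = conn.execute(
        "INSERT OR IGNORE INTO transcripts (sha256, episode, body)"
        " VALUES (?, ?, ?)",
        (digest, episode, body),
    )
    conn.commit()
    # rowcount is 0 when the primary-key conflict caused the row to be skipped.
    return cur.rowcount == 1
```

Because the key is derived from the content, re-running the import over an unchanged transcripts directory inserts nothing, which is what makes a full rebuild safe to repeat.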
.gitignore (28 lines, 258 B, plaintext)
# Python
__pycache__/
*.pyc
*.pyo
.venv/
*.egg-info/

# Large data files
test-data/episodes/
test-data/transcripts/
episodes/
processed/
archive-data/
logs/

# Databases (regenerable)
*.db
*.sqlite

# Model cache
.cache/
*.pt
*.bin

# OS
.DS_Store
Thumbs.db