radio: utf-8 transcript writes + sqlite archive importer + session log

- src/transcriber.py: open transcript.{json,txt,srt} with encoding="utf-8".
Windows cp1252 default crashed on Whisper output containing U+2044.
- import_to_sqlite.py: new. Walks archive-data/transcripts, builds
archive.db (5 tables + 2 FTS5 virtual tables, sha256-keyed idempotency).
20.5 MB / 208 episodes at smoke-test time, 1.9s rebuild.
- batch_process.py: now tracked; written in a prior session. Full-archive
batch run with a resumable transcribe/diarize/intros/qa pipeline.
- .gitignore: ignore archive-data/ and logs/.

Session log: 2026-04-27-archive-batch-and-sqlite-import.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
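
The sha256-keyed idempotency mentioned for import_to_sqlite.py could look roughly like the sketch below. It is not the importer from this commit: the single `episodes` table, its columns, and the `import_transcript` helper are hypothetical stand-ins (the real archive.db has 5 tables plus 2 FTS5 virtual tables whose names are not shown here), and the FTS5 part is left out. The idea is that the content hash is the primary key, so re-running the import over the same transcripts inserts nothing new.

```python
import hashlib
import sqlite3

# Hypothetical one-table schema for illustration only; the real archive.db
# layout is not shown in this commit.
SCHEMA = """
CREATE TABLE IF NOT EXISTS episodes (
    sha256 TEXT PRIMARY KEY,
    path   TEXT NOT NULL,
    text   TEXT NOT NULL
);
"""


def import_transcript(conn, path, text):
    """Insert one transcript; return True if it was new, False if already present."""
    # Key the row on the hash of the content so identical input is a no-op.
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    cur = conn.execute(
        "INSERT OR IGNORE INTO episodes (sha256, path, text) VALUES (?, ?, ?)",
        (digest, path, text),
    )
    # rowcount is 1 when the row was inserted and 0 when the key already existed.
    return cur.rowcount == 1
```

Calling `import_transcript` twice with the same text returns True the first time and False the second, which is what makes a full rebuild safe to repeat.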
@@ -90,15 +90,15 @@ class Transcript:
         output_dir.mkdir(parents=True, exist_ok=True)
 
         # JSON with full detail
-        with open(output_dir / "transcript.json", "w") as f:
+        with open(output_dir / "transcript.json", "w", encoding="utf-8") as f:
             json.dump(self.to_dict(), f, indent=2)
 
         # Plain text
-        with open(output_dir / "transcript.txt", "w") as f:
+        with open(output_dir / "transcript.txt", "w", encoding="utf-8") as f:
             f.write(self.full_text)
 
         # SRT subtitles
-        with open(output_dir / "transcript.srt", "w") as f:
+        with open(output_dir / "transcript.srt", "w", encoding="utf-8") as f:
             f.write(self.to_srt())
 
         console.print(f"[green]Transcript saved to {output_dir}[/green]")
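
A small reproduction of the failure this change addresses, as a sketch rather than code from the repository: U+2044 (FRACTION SLASH) has no mapping in cp1252, so a text-mode write that falls back to that encoding raises UnicodeEncodeError, while an explicit encoding="utf-8" round-trips cleanly. The sample string and the helper names `can_encode` and `write_and_read_back` are made up for illustration.

```python
import tempfile
from pathlib import Path

# Sample text containing U+2044 FRACTION SLASH (illustrative, not real Whisper output).
text = "1\u20442 cup"


def can_encode(s, encoding):
    """Return True if every character of s is representable in the given encoding."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def write_and_read_back(s, directory):
    """Write s with an explicit UTF-8 encoding and read it back the same way."""
    path = Path(directory) / "transcript.txt"
    with open(path, "w", encoding="utf-8") as f:
        f.write(s)
    return path.read_text(encoding="utf-8")


if __name__ == "__main__":
    print(can_encode(text, "cp1252"))
    print(can_encode(text, "utf-8"))
    with tempfile.TemporaryDirectory() as d:
        print(write_and_read_back(text, d) == text)
```

Passing the encoding explicitly makes the output independent of the platform's locale default, which is why the same code behaved differently on Windows.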