claudetools/.claude/memory/project_audio_processor_architecture.md at 814310c9e1af3e0300499e4720b4e88ff1d249dd


Mike Swanson ad88fc31f0 sync: Auto-sync from acg-guru-5070 at 2026-03-22 22:31:46




name: Audio Processor - Segment-First Architecture
description: Revised pipeline architecture - detect breaks and split into segments BEFORE transcription for complete content capture
type: project

Revised Pipeline Architecture (decided 2026-03-22)

Shows are almost always 4 segments per hour (8 total for a 2-hour show). Extra breaks are rare.

Old approach: Transcribe full episode -> truncate to fit LLM context -> analyze (loses content)

New approach: Detect breaks first (audio-only) -> split into ~8 segments -> transcribe each -> analyze each with full context -> cross-segment synthesis
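The difference between the two approaches can be illustrated with a little arithmetic. The sketch below is only an illustration: the function names and the 16,000-character budget are hypothetical, and the 8,000-character segment size is picked from the ~5-10K range noted under Key Insights.

```python
from typing import List


def truncated_coverage(segment_chars: List[int], budget: int) -> float:
    """Old approach: one full transcript, cut off at the context budget."""
    total = sum(segment_chars)
    if total == 0:
        return 1.0
    return min(total, budget) / total


def per_segment_coverage(segment_chars: List[int], budget: int) -> float:
    """New approach: each segment is analyzed on its own against the budget."""
    total = sum(segment_chars)
    if total == 0:
        return 1.0
    kept = sum(min(n, budget) for n in segment_chars)
    return kept / total


# Hypothetical 2-hour show: 8 segments of 8,000 chars, 16,000-char budget.
show = [8000] * 8
old = truncated_coverage(show, 16000)    # fraction of the show the LLM sees
new = per_segment_coverage(show, 16000)  # fraction seen across all passes
```

With these assumed numbers the truncated transcript covers a quarter of the show, while the per-segment passes cover all of it.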

Pipeline Order

1. Audio-level break detection (no transcript needed): loudness/compression jumps, silence gaps, known bumper fingerprints, HR1/HR2 boundary
2. Split into segments: ~7-15 min each, complete audio chunks
3. Transcribe each segment: smaller files, complete content, no truncation
4. Analyze each segment: full transcript fits in LLM context window easily
5. Cross-segment synthesis: detect topics spanning segments, callbacks ("going back to what we said before the break"), narrative arc
6. Generate content: blog posts, forum posts, episode summary from complete analysis
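The first two pipeline steps can be sketched using only one of the listed signals, silence gaps. This is a minimal sketch, not the project's detector: it assumes the audio has already been reduced to one loudness value (in dB) per second, and the function names, the -50 dB threshold, and the 2-second minimum gap are all hypothetical. Loudness jumps, bumper fingerprints, and the HR1/HR2 boundary are not modeled.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start_second, end_second), end exclusive


def find_silence_gaps(levels_db: List[float],
                      threshold_db: float = -50.0,
                      min_gap_s: int = 2) -> List[Span]:
    """Return runs of at least min_gap_s seconds quieter than threshold_db."""
    gaps: List[Span] = []
    run_start = None
    for i, level in enumerate(levels_db):
        if level < threshold_db:
            if run_start is None:
                run_start = i  # a quiet run begins here
        else:
            if run_start is not None and i - run_start >= min_gap_s:
                gaps.append((run_start, i))
            run_start = None
    # A quiet run that lasts until the end of the audio also counts.
    if run_start is not None and len(levels_db) - run_start >= min_gap_s:
        gaps.append((run_start, len(levels_db)))
    return gaps


def split_at_gaps(total_s: int, gaps: List[Span]) -> List[Span]:
    """Cut the timeline at each gap, leaving the silent span out of both sides."""
    segments: List[Span] = []
    cursor = 0
    for start, end in gaps:
        if start > cursor:
            segments.append((cursor, start))
        cursor = end
    if cursor < total_s:
        segments.append((cursor, total_s))
    return segments
```

Deciding which of the resulting chunks are show content and which are break material is left out of this sketch.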

Key Insights

4 segments/hour is a strong structural prior for break detection: if audio signatures appear 12-18 min into a segment, it is almost certainly a break. At 5 min, it is probably not.
Each segment transcript is ~5-10K chars — fits in any LLM context with room for detailed prompts
Cross-segment synthesis pass is new and essential for catching callbacks and recurring topics
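One way to apply the structural prior is to weight the audio signal by how far into the segment it occurs. The sketch below is an assumption-heavy illustration: the function names, the 7 and 12 minute ramp points, the 0.1 floor, and the 0.5 cutoff are all hypothetical values chosen to mirror the "12-18 min likely, 5 min unlikely" rule of thumb, and `signal` stands in for whatever 0-1 confidence the audio detector produces.

```python
def elapsed_prior(minutes_into_segment: float) -> float:
    """Weight in [0.1, 1.0] that grows as a break becomes structurally due."""
    early, due = 7.0, 12.0  # hypothetical ramp points, in minutes
    if minutes_into_segment <= early:
        return 0.1
    if minutes_into_segment >= due:
        return 1.0
    # Linear ramp between "too early" and "break is due".
    return 0.1 + 0.9 * (minutes_into_segment - early) / (due - early)


def is_break(minutes_into_segment: float, signal: float,
             cutoff: float = 0.5) -> bool:
    """Accept a candidate when signal strength times the prior clears cutoff."""
    return signal * elapsed_prior(minutes_into_segment) >= cutoff
```

Under these assumed values, the same audio signature of strength 0.8 is accepted 15 minutes into a segment and rejected at 5 minutes.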

Why: Solves the context window truncation problem that loses show content. Each segment gets complete analysis.

How to apply: This is the architecture direction for all future audio processor work. The existing Stage 3 segment detector needs to work without transcript input (audio-only signals). Stage 6 analyzer needs per-segment + synthesis passes.
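The synthesis pass could start from something as simple as finding topics that show up in more than one segment. This is a minimal sketch under stated assumptions: it presumes the per-segment analysis already yields a list of topic labels per segment, and the `recurring_topics` name and the case-insensitive exact-match rule are hypothetical. Detecting spoken callbacks or a narrative arc would need more than label matching.

```python
from collections import defaultdict
from typing import Dict, List


def recurring_topics(segment_topics: List[List[str]]) -> Dict[str, List[int]]:
    """Map each topic seen in 2+ segments to the segment indexes it appears in."""
    seen: Dict[str, List[int]] = defaultdict(list)
    for index, topics in enumerate(segment_topics):
        # Normalize case and de-duplicate within a single segment.
        for topic in {t.lower() for t in topics}:
            seen[topic].append(index)
    return {topic: idxs for topic, idxs in seen.items() if len(idxs) > 1}


# Hypothetical per-segment output from the analyzer.
per_segment = [["backups", "phishing"], ["routers"], ["Backups"]]
spanning = recurring_topics(per_segment)
```

Here `spanning` maps "backups" to segments 0 and 2, which is the kind of cross-segment link the synthesis pass would then hand to the LLM for closer reading.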
