sync: Auto-sync from acg-guru-5070 at 2026-03-22 22:31:46
Synced files:
- Session logs updated
- Latest context and credentials
- Command/directive updates

Machine: acg-guru-5070
Timestamp: 2026-03-22 22:31:46

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
32
.claude/memory/project_audio_processor_architecture.md
Normal file
@@ -0,0 +1,32 @@
---
name: Audio Processor - Segment-First Architecture
description: Revised pipeline architecture - detect breaks and split into segments BEFORE transcription for complete content capture
type: project
---

## Revised Pipeline Architecture (decided 2026-03-22)

Shows are almost always 4 segments per hour (8 total for a 2-hour show). Extra breaks are rare.

**Old approach:** Transcribe full episode -> truncate to fit LLM context -> analyze (loses content)

**New approach:** Detect breaks first (audio-only) -> split into ~8 segments -> transcribe each -> analyze each with full context -> cross-segment synthesis

### Pipeline Order
1. **Audio-level break detection** (no transcript needed) — loudness/compression jumps, silence gaps, known bumper fingerprints, HR1/HR2 boundary
2. **Split into segments** — ~7-15 min each, complete audio chunks
3. **Transcribe each segment** — smaller files, complete content, no truncation
4. **Analyze each segment** — full transcript fits in LLM context window easily
5. **Cross-segment synthesis** — detect topics spanning segments, callbacks ("going back to what we said before the break"), narrative arc
6. **Generate content** — blog posts, forum posts, episode summary from complete analysis
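
The pipeline order above can be sketched as a small driver. This is a minimal illustration, not the project's actual code: `Segment`, `boundaries_to_segments`, `run_pipeline`, and the five stage callables are hypothetical names, and each break is simplified to a single cut point rather than a span with its own duration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Segment:
    """One audio chunk between two breaks (hypothetical structure)."""
    start_s: float
    end_s: float
    transcript: str = ""
    analysis: str = ""


def boundaries_to_segments(break_times_s: Sequence[float], total_s: float) -> List[Segment]:
    """Step 2: turn break timestamps into contiguous segments covering the episode."""
    cuts = sorted(t for t in break_times_s if 0.0 < t < total_s)
    edges = [0.0] + cuts + [total_s]
    return [Segment(a, b) for a, b in zip(edges, edges[1:])]


def run_pipeline(
    total_s: float,
    detect_breaks: Callable[[], Sequence[float]],        # step 1, audio-only
    transcribe: Callable[[float, float], str],           # step 3
    analyze: Callable[[str], str],                       # step 4
    synthesize: Callable[[List[str]], str],              # step 5
    generate: Callable[[List[Segment], str], str],       # step 6
) -> str:
    # Breaks are found before any transcript exists.
    segments = boundaries_to_segments(detect_breaks(), total_s)
    for seg in segments:
        # Each segment is transcribed and analyzed whole, so nothing is truncated.
        seg.transcript = transcribe(seg.start_s, seg.end_s)
        seg.analysis = analyze(seg.transcript)
    # The synthesis pass sees every per-segment analysis at once.
    synthesis = synthesize([seg.analysis for seg in segments])
    return generate(segments, synthesis)
```

Passing the stages in as callables keeps the ordering explicit and lets each stage be swapped or stubbed independently, which is one possible way to migrate the existing stages one at a time.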

### Key Insights
- 4 segments/hour is a strong structural prior for break detection: if audio signatures appear 12-18 min into a segment, it is almost certainly a break; at 5 min, it probably is not.
- Each segment transcript is ~5-10K chars — fits in any LLM context with room for detailed prompts
- Cross-segment synthesis pass is new and essential for catching callbacks and recurring topics
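
One way to express the structural prior from the first insight is to weight audio evidence by how far into the segment it occurs. The sketch below is only an illustration: the function names, the 0.1/0.9 floor and ceiling, and the 5-to-12-minute ramp are assumptions chosen to match the "5 min, probably not" and "12-18 min, almost certainly" rule of thumb, not measured values.

```python
def break_prior(minutes_into_segment: float) -> float:
    """Prior probability that an audio cue at this point is a real break.

    All constants are assumed, illustrative values.
    """
    low, high = 0.1, 0.9             # assumed floor and ceiling
    ramp_start, ramp_end = 5.0, 12.0  # assumed ramp, in minutes
    if minutes_into_segment <= ramp_start:
        return low
    if minutes_into_segment >= ramp_end:
        return high
    frac = (minutes_into_segment - ramp_start) / (ramp_end - ramp_start)
    return low + frac * (high - low)


def break_posterior(audio_likelihood_ratio: float, minutes_into_segment: float) -> float:
    """Combine the timing prior with audio evidence using Bayes' rule in odds form.

    audio_likelihood_ratio is P(cue | break) / P(cue | no break), a hypothetical
    score that an audio-only detector would have to supply.
    """
    prior = break_prior(minutes_into_segment)
    odds = prior / (1.0 - prior) * audio_likelihood_ratio
    return odds / (1.0 + odds)
```

With these assumed numbers, the same audio cue (likelihood ratio 3) gives a posterior of 0.25 at 5 minutes and about 0.96 at 15 minutes, which mirrors the intuition in the note.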

**Why:** Solves the context window truncation problem that loses show content. Each segment gets complete analysis.

**How to apply:** This is the architecture direction for all future audio processor work. The existing Stage 3 segment detector needs to work without transcript input (audio-only signals). Stage 6 analyzer needs per-segment + synthesis passes.
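
As a rough sketch of what an audio-only signal for Stage 3 could look like, the function below finds silence gaps from raw samples with no transcript input. It covers only one of the listed signals (silence gaps), and everything in it is an assumption: the name `find_silence_gaps`, the 50 ms frame, the -40 dBFS threshold, and the 1-second minimum gap are illustrative defaults, and samples are assumed to be mono floats in [-1, 1].

```python
import math
from typing import List, Sequence, Tuple


def find_silence_gaps(
    samples: Sequence[float],
    sample_rate: int,
    frame_s: float = 0.05,        # assumed frame length in seconds
    threshold_db: float = -40.0,  # assumed silence threshold in dBFS
    min_gap_s: float = 1.0,       # assumed minimum gap worth reporting
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) spans where frame RMS stays below the threshold."""
    frame_len = max(1, int(sample_rate * frame_s))
    threshold = 10 ** (threshold_db / 20)  # dBFS to linear amplitude
    n_frames = len(samples) // frame_len
    gaps: List[Tuple[float, float]] = []
    gap_start = None
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        t = i * frame_len / sample_rate
        if rms < threshold:
            if gap_start is None:
                gap_start = t  # silence begins at this frame
        elif gap_start is not None:
            if t - gap_start >= min_gap_s:
                gaps.append((gap_start, t))
            gap_start = None
    # Close a gap that runs to the end of the audio.
    end_t = n_frames * frame_len / sample_rate
    if gap_start is not None and end_t - gap_start >= min_gap_s:
        gaps.append((gap_start, end_t))
    return gaps
```

Candidate gaps from a detector like this would still need to be combined with the other signals (loudness/compression jumps, bumper fingerprints, the HR1/HR2 boundary) and with the 4-segments-per-hour prior before being treated as breaks.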