Adds an Ollama-based content quality classifier and exposes the results via the search API.

1,407 existing Q/A pairs were scored in 3.5h via qwen3:14b (1,405
succeeded, 2 failed). Distribution: 37% scored 4-5 (useful), 41% scored
1-2 (banter/promo/off-topic). 43% flagged as banter overall. Default-on
filtering at search time will hide ~half of the noise without losing any
real listener questions.

Files:
- new classify_qa_quality.py: walks qa_pairs, calls Ollama qwen3:14b per
  row, writes usefulness_score/topic_class/is_banter back to the DB.
  Idempotent (--rebuild to reprocess), --smoke for a sample check,
  --limit for partial runs. A detached run handles 1,407 rows in ~3.5h
  on a 4090.
- server/main.py: /api/search accepts min_score (0-5) and exclude_banter
  query params. NULL scores are treated as "include" so unprocessed rows
  still appear. The episode detail endpoint includes the new fields in
  qa results.

The schema migration in import_to_sqlite.py was made by the same agent
run (visible on the live archive.db: usefulness_score / topic_class /
is_banter columns now exist on qa_pairs).

Local archive.db updated; the Jupiter container has NOT been redeployed
yet. That is a separate manual step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
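
For reviewers, the NULL-as-"include" rule for min_score and exclude_banter can be sketched as follows. This is only an illustration, not the code in server/main.py: the helper name search_qa, the id and question columns, the default argument values, and the sample rows are all assumptions made for the example. Only the table name qa_pairs and the three new column names come from this commit.

```python
import sqlite3


def search_qa(conn, min_score=None, exclude_banter=True):
    """Hypothetical helper: return ids of qa_pairs rows that pass the filters.

    Rows whose score or banter flag is NULL (not yet classified) always pass.
    """
    clauses, params = [], []
    if min_score is not None:
        # NULL means "unprocessed", so it is kept rather than filtered out.
        clauses.append("(usefulness_score IS NULL OR usefulness_score >= ?)")
        params.append(min_score)
    if exclude_banter:
        clauses.append("(is_banter IS NULL OR is_banter = 0)")
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    sql = "SELECT id FROM qa_pairs" + where + " ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]


# Illustrative in-memory table; the real schema may have more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE qa_pairs ("
    "id INTEGER PRIMARY KEY, question TEXT, "
    "usefulness_score INTEGER, topic_class TEXT, is_banter INTEGER)"
)
conn.executemany(
    "INSERT INTO qa_pairs VALUES (?, ?, ?, ?, ?)",
    [
        (1, "scored, useful", 5, "technical", 0),
        (2, "scored, banter", 1, "banter", 1),
        (3, "not yet classified", None, None, None),
        (4, "scored, promo", 2, "promo", 0),
    ],
)

filtered = search_qa(conn, min_score=3, exclude_banter=True)
unfiltered = search_qa(conn, min_score=None, exclude_banter=False)
```

With these sample rows, the filtered call keeps the useful row and the unclassified row, while the unfiltered call returns every row. Writing the NULL check explicitly matters because in SQLite a comparison such as usefulness_score >= 3 evaluates to NULL for unscored rows, which a WHERE clause treats as false.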