Mike Swanson 6239f9fc3a radio: session log update — index UI exposes classifier filters

Backend min_score/exclude_banter wired through to HTML index. Adds
score badges (1-5 red->green), topic_class pills, dim styling on
banter rows. Live on http://172.16.3.20:8765/. Synced to portable
repo. pscp ENOSPC quirk worked around by plink-stdin streaming.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-30 06:07:00 -07:00

Session Log — 2026-04-30 — Portable Laptop Bundle + /api/db.sqlite Deploy

Project: The Computer Guru Show - Archive Mining System
Goal: Make the search service usable from a laptop next week, including offline; ship it as a separate repo and add a DB-fetch endpoint to the upstream container
Machine: GURU-BEAST-ROG (RTX 4090)
User: Mike Swanson (mike)
Continues from: 2026-04-29-qa-quality-classifier.md (which covered the 3.5h qwen3:14b classifier run that produced the 1,405-row scored DB)

User

User: Mike Swanson (mike)
Machine: GURU-BEAST-ROG
Role: admin

Session Summary

The radio-archive search service needed to become portable so Mike could use it from a laptop next week, including offline scenarios on a plane or in a conference room. Three options were proposed: (1) install Tailscale on Jupiter, (2) use existing Tailscale subnet routing on the office router, (3) ship a self-contained laptop copy. Mike clarified Tailscale was already running on the office router covering Jupiter's subnet, then asked to "box up the offline version" for the Dell 5070 — its own repo if needed.

Verified Tailscale state with tailscale status --json: pfsense-2 (100.119.153.74) advertises 172.16.0.0/22 as a primary route, and mikes-macbook-air and acg-guru-5070 are both existing tailnet members. No subnet configuration changes were needed. The existing container's bind to 172.16.3.20:8765 already accepts subnet-routed traffic without modification.

Built a new private Gitea repo azcomputerguru/radio-archive-portable with eight files: server code (identical to upstream), a sync-db.sh that curl-fetches archive.db from a new /api/db.sqlite endpoint, a run.sh that creates a venv on first invocation and starts uvicorn on localhost:8765, plus README, .env.example, .gitignore, archive-data placeholder. The DB itself is gitignored (60 MB; fetched on demand, never committed). Repo created via Gitea API, initial commit pushed.

Added the /api/db.sqlite endpoint to the upstream server/main.py using FastAPI FileResponse. Disclosure equivalence: anyone who can reach /api/search already has full transcript access, so exposing the SQLite blob adds nothing meaningful. This avoided needing SSH keys or stored credentials on the laptop side. Deployed to Jupiter (pscp'd main.py + classified archive.db, then docker compose up -d --build). Verified end-to-end: GET /api/db.sqlite returns 200 with 60,583,936 bytes; the fetched DB contains all 1,405 classifier rows intact; GET /api/search?min_score=4 filters correctly with the new fields in the response.

Key Decisions

Subnet routing already in place — confirmed via tailscale status --json that pfsense-2 advertises 172.16.0.0/22 as primary route. No new daemons or routing changes required. Container bind to 172.16.3.20:8765 is sufficient because Tailscale traffic destined for that IP arrives via the router's LAN egress and hits the existing listener.
/api/db.sqlite over HTTP instead of SSH/SCP for the DB sync — keeps everything on the same Tailscale-routed port, no SSH key management, no stored passwords on the laptop. Disclosure equivalence with /api/search (which already returns every transcript) means no auth was added to either.
Separate repo for the portable bundle — keeps the laptop install-flow simple (clone + run two scripts) and avoids cloning the 100+ GB ClaudeTools monorepo on a travel laptop. Repo lives at git.azcomputerguru.com/azcomputerguru/radio-archive-portable (private, under the user namespace).
DB excluded from the repo via gitignore — the 60 MB blob is fetched via sync-db.sh on first run. Repo stays at ~15 KB. The fetch is idempotent and atomic (download to .partial, validate size, rename into place).
Used docker compose up -d --build (combined) instead of separate build then up — separate commands chained through plink either silently buffered or failed to trigger a rebuild on a previous attempt; container kept running 2-hour-old code. Combined form was reliable.
Stripped API token from .git/config after push — token had been embedded in the origin URL for the initial push; replaced with the bare HTTPS URL afterward so it doesn't sit in plain text. Future pushes will go via Gitea credential helper or interactive prompt.
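The token-stripping step can also be checked programmatically. The following is a minimal sketch, not something that was run in this session: `strip_userinfo` is a hypothetical helper name and `EXAMPLE_TOKEN` is a placeholder. It uses only the standard library to remove the `user:token@` part of a remote URL.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_userinfo(url: str) -> str:
    """Return url without any user:password@ prefix in its host part."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


# EXAMPLE_TOKEN is a placeholder, not a real credential.
tokenised = (
    "https://azcomputerguru:EXAMPLE_TOKEN@git.azcomputerguru.com"
    "/azcomputerguru/radio-archive-portable.git"
)
print(strip_userinfo(tokenised))
```

The result is the bare HTTPS URL that `git remote set-url origin` was given after the initial push. Note that `hostname` lowercases the host and drops IPv6 brackets, which does not matter for this remote.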

Problems Encountered

First deploy attempt landed but rebuild didn't happen — chained docker compose build && docker compose up -d via plink completed exit-code-0 but the container kept running yesterday's code (verified via docker exec radio-archive grep db.sqlite /app/main.py returning nothing). Likely BuildKit output buffering or plink session quirks. Resolved by using docker compose up -d --build as a single foreground command.
Bash background-task output capture flaky on long plink runs — early deploy attempts went into the Bash tool's run_in_background mode but the output file stayed empty for minutes despite the underlying SSH session completing. Worked around by running shorter commands synchronously.
/tmp path clash between git-bash and Windows Python: a smoke-test command fetched the DB via curl (to /tmp/test-db.sqlite) and then tried to read it with python -c (also addressing /tmp/...). The two tools resolved /tmp to different locations on Windows. Switched to a project-local test-fetched.db path to avoid the issue.
Gitea API at /api/v1/orgs/azcomputerguru/repos returned 404 — azcomputerguru is a USER, not an org. Repo creation succeeded via /api/v1/user/repos instead. (The token's owner is azcomputerguru, so user-namespace creation worked.)
HEAD /api/db.sqlite returns 405 Method Not Allowed — FastAPI's default routing only registers GET. A HEAD is fine to fail because the sync script uses GET. Documented behavior, not a bug.
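The user-versus-org distinction behind the 404 above comes down to which Gitea endpoint receives the POST. As a rough sketch (the helper name `build_create_repo_request` and the `EXAMPLE_TOKEN` value are illustrative assumptions; the two paths and the JSON fields are the ones quoted in this log), the request can be built like this without sending anything:

```python
import json
from urllib.request import Request


def build_create_repo_request(base_url, token, name, org=None):
    """Build (but do not send) a Gitea repo-creation request.

    With org=None the repo is created in the token owner's user namespace;
    otherwise the org endpoint is used.
    """
    path = f"/api/v1/orgs/{org}/repos" if org else "/api/v1/user/repos"
    body = json.dumps(
        {"name": name, "private": True, "default_branch": "main"}
    ).encode("utf-8")
    return Request(
        base_url.rstrip("/") + path,
        data=body,
        method="POST",
        headers={
            "Authorization": f"token {token}",
            "Content-Type": "application/json",
        },
    )


# EXAMPLE_TOKEN is a placeholder, not a real credential.
req = build_create_repo_request(
    "https://git.azcomputerguru.com", "EXAMPLE_TOKEN", "radio-archive-portable"
)
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the call; that step is left out here so the sketch has no side effects.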

Credentials Used

Jupiter (Unraid Primary)

Vault path: infrastructure/jupiter-unraid-primary.sops.yaml
Host: 172.16.3.20
User: root
Password: Th1nk3r^99##
iDRAC IP: 172.16.1.73 / root / Window123!@#-idrac

Gitea

Vault path: services/gitea.sops.yaml
URL: https://git.azcomputerguru.com
Username: azcomputerguru
Password: Gptf*77ttb123!@#-git (alt: Window123!@#-git)
API token (used this session): 9b1da4b79a38ef782268341d25a4b6880572063f
SSH: ssh://git@172.16.3.20:2222

New Repo

Clone URL: https://git.azcomputerguru.com/azcomputerguru/radio-archive-portable.git
SSH URL: ssh://git@172.16.3.20:2222/azcomputerguru/radio-archive-portable.git
Visibility: private
Default branch: main

Infrastructure Touched

Host	IP	Role	Action
Jupiter (Unraid Primary)	172.16.3.20	Hypervisor + Docker host	pscp'd updated `main.py` + `archive.db`; `docker compose up -d --build`
Radio-archive container	container on Jupiter, bind `172.16.3.20:8765`	FastAPI + SQLite	Rebuilt with new endpoint; restarted with classifier-populated DB
Gitea (on Jupiter, port 3000)	git.azcomputerguru.com	Source hosting	New repo created via API
pfsense-2 router	(Tailscale `100.119.153.74`)	Subnet router	No changes — verified existing 172.16.0.0/22 advertisement

Tailscale state at session time

100.101.122.4    guru-beast-rog     (this machine, online)
100.65.158.123   mikes-macbook-air  (last seen 4m before check)
100.95.216.79    acg-guru-5070      (offline; last seen 30d ago; boot it up next week)
100.119.153.74   pfsense-2          (active; advertises 172.16.0.0/22 as PRIMARY)

Files Created / Modified

New repo: `radio-archive-portable/`

Path	Purpose
`README.md`	Quick-start, refresh procedure, architecture diagram
`server/main.py`	Identical to deployed upstream (with `/api/db.sqlite`)
`server/requirements.txt`	`fastapi==0.115.6`, `uvicorn[standard]==0.34.0`
`sync-db.sh`	`curl -fSL -o archive-data/archive.db.partial $URL && mv` (atomic)
`run.sh`	Creates `.venv` on first run, then `uvicorn server.main:app --host 127.0.0.1 --port 8765`
`.env.example`	`ARCHIVE_HOST=172.16.3.20:8765`, `ARCHIVE_DB=archive-data/archive.db`, `PORT=8765`
`.gitignore`	Excludes `archive-data/archive.db`, `.venv/`, `.env`, etc.
`archive-data/.gitkeep`	Placeholder so the dir exists in git but the DB file doesn't

ClaudeTools (upstream)

Path	Change
`projects/radio-show/audio-processor/server/main.py`	+18 / -1 — added `from fastapi.responses import FileResponse` and the `/api/db.sqlite` GET endpoint

Jupiter (deployed state)

Path	Change
`/mnt/user/appdata/radio-archive/app/main.py`	Replaced (now matches `5e3b1a2`)
`/mnt/user/appdata/radio-archive/data/archive.db`	Replaced with classifier-populated copy (60,583,936 bytes, 1,405/1,407 scored)
Container `radio-archive`	Rebuilt to image `radio-archive:latest` (`sha256:dbb5ad62bdb1...`), running

Commands Run

Tailscale verification (local)

tailscale status --json | grep -E "advertis|route|172\.|primary"
# Confirmed 172.16.0.0/22 listed under PrimaryRoutes
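The grep above works, but the same check can be made structurally. The sketch below assumes the JSON has a top-level `Peer` map whose entries carry `HostName` and (for subnet routers) `PrimaryRoutes`; the log only confirms the `PrimaryRoutes` key, so the rest of the layout is an assumption, and the sample document is made up for illustration.

```python
import json


def primary_route_advertisers(status: dict) -> dict:
    """Map each peer hostname to the primary routes it serves, if any."""
    advertisers = {}
    for peer in (status.get("Peer") or {}).values():
        routes = peer.get("PrimaryRoutes") or []
        if routes:
            advertisers[peer.get("HostName", "?")] = list(routes)
    return advertisers


# Hand-written sample in the assumed shape, not real command output.
sample = json.loads("""
{
  "Peer": {
    "key-a": {"HostName": "pfsense-2", "PrimaryRoutes": ["172.16.0.0/22"]},
    "key-b": {"HostName": "acg-guru-5070"}
  }
}
""")
print(primary_route_advertisers(sample))  # {'pfsense-2': ['172.16.0.0/22']}
```

To run it against live data, feed it `json.loads(...)` of the stdout from `tailscale status --json`, for example via `subprocess.run(..., capture_output=True, text=True)`.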

New repo creation

curl -X POST "https://git.azcomputerguru.com/api/v1/user/repos" \
  -H "Authorization: token 9b1da4b79a38ef782268341d25a4b6880572063f" \
  -d '{"name":"radio-archive-portable","private":true,"default_branch":"main"}'
# HTTP 201, repo id 12

cd /c/Users/guru/radio-archive-portable
git init -b main
git config user.name "Mike Swanson"
git config user.email "mike@azcomputerguru.com"
git add -A && git commit
git remote add origin "https://azcomputerguru:<token>@git.azcomputerguru.com/azcomputerguru/radio-archive-portable.git"
git push -u origin main
git remote set-url origin https://git.azcomputerguru.com/azcomputerguru/radio-archive-portable.git  # strip token

Jupiter deploy

"/c/Program Files/PuTTY/pscp.exe" -batch -pw "$PW" -scp \
  c:/Users/guru/ClaudeTools/projects/radio-show/audio-processor/server/main.py \
  root@172.16.3.20:/mnt/user/appdata/radio-archive/app/main.py

"/c/Program Files/PuTTY/pscp.exe" -batch -pw "$PW" -scp \
  c:/Users/guru/ClaudeTools/projects/radio-show/audio-processor/archive-data/archive.db \
  root@172.16.3.20:/mnt/user/appdata/radio-archive/data/archive.db
# 60.5 MB at ~580 KB/s = ~100 seconds

"/c/Program Files/PuTTY/plink.exe" -batch -ssh -pw "$PW" root@172.16.3.20 \
  "cd /mnt/user/appdata/radio-archive/app && docker compose up -d --build"
# Built radio-archive:latest sha256:dbb5ad62bdb1..., container Running

Live verification

curl -sS http://172.16.3.20:8765/api/stats
# {"counts":{"episodes":572,"segments":60917,...},"by_year":[{"year":2010,...

curl -sS -o test-fetched.db -w "HTTP %{http_code} | dl=%{size_download}B\n" \
  http://172.16.3.20:8765/api/db.sqlite
# HTTP 200 | dl=60583936B

.venv/Scripts/python.exe -c "
import sqlite3
db = sqlite3.connect('test-fetched.db')
print(db.execute('SELECT COUNT(*) FROM qa_pairs WHERE usefulness_score IS NOT NULL').fetchone())
"
# (1405,)

curl -sS 'http://172.16.3.20:8765/api/search?q=BIOS&kind=qa&min_score=4&limit=2'
# returns 2 hits, each with usefulness_score=5, topic_class='computer-help'

Pending / Next

Test the laptop install end-to-end when the 5070 boots up next week — confirm sync-db.sh + run.sh work cleanly on Linux. Currently untested on the actual target machine.
HTML index UI update — backend supports min_score and exclude_banter query params, but the search UI on / doesn't expose them as toggles or show the score/topic_class on each hit. Backend is ready when the UI is.
Re-run the 2 failed classifier rows — classify_qa_quality.py re-invocation will retry the NULL-scored rows; one-line cleanup.
Track 2 (voice profile clustering) — still deferred. Lower priority since content-quality filter solved most of the search-quality problem.
Track 3 (speaker oracle wiring through to search UI) — still deferred. speaker_oracle.py resolves names from intros but the search results still show "CALLER" rather than the resolved name.

Reference

Endpoints (all live on http://172.16.3.20:8765/ as of this commit)

Method	Path	Notes
GET	`/`	Search UI (no min_score toggle yet — query string works manually)
GET	`/api/stats`	Counts and per-year breakdown
GET	`/api/episodes?year=YYYY&limit=N`	Episode list
GET	`/api/episodes/{id}`	Detail with intros + qa_pairs (now includes usefulness_score, topic_class, is_banter)
GET	`/api/episodes/{id}/transcript`	Chronological merged segments + turns
GET	`/api/search?q=...&kind=both|segments|qa&min_score=N&exclude_banter=true&limit=N`	FTS5
GET	`/api/callers?limit=N`	Top recurring caller_names
GET	`/api/db.sqlite`	NEW — streams the read-only DB blob (60 MB)

Laptop next-week recipe (5070 / Linux)

# Tailscale already enabled on the laptop and on pfsense-2
git clone https://git.azcomputerguru.com/azcomputerguru/radio-archive-portable.git
cd radio-archive-portable
./sync-db.sh         # pulls from 172.16.3.20:8765/api/db.sqlite
./run.sh             # creates .venv, starts uvicorn on localhost:8765
xdg-open http://localhost:8765/

Refreshing: run ./sync-db.sh at any time. The swap is atomic, so a partial download won't corrupt the existing DB.
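The download-to-.partial, validate, rename pattern behind that guarantee can be sketched in Python as follows. This is an illustration of the technique, not a port of sync-db.sh: `atomic_fetch` and its `min_bytes` threshold are hypothetical, and the demo fetches a local `file://` URL so it runs without the network.

```python
import os
import shutil
import tempfile
from pathlib import Path
from urllib.request import urlopen


def atomic_fetch(url: str, dest: Path, min_bytes: int = 1) -> int:
    """Download url to dest via a .partial file; dest is only replaced on success."""
    dest = Path(dest)
    partial = dest.with_name(dest.name + ".partial")
    dest.parent.mkdir(parents=True, exist_ok=True)
    try:
        with urlopen(url) as resp, open(partial, "wb") as out:
            shutil.copyfileobj(resp, out)
        size = partial.stat().st_size
        if size < min_bytes:
            raise ValueError(f"download too small: {size} bytes")
        os.replace(partial, dest)  # atomic when both paths are on one filesystem
        return size
    except BaseException:
        partial.unlink(missing_ok=True)  # never leave a half-written file behind
        raise


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source.db"
    source.write_bytes(b"x" * 1024)
    target = Path(tmp) / "archive-data" / "archive.db"
    print(atomic_fetch(source.as_uri(), target, min_bytes=512))  # 1024
```

Because the rename is the last step, a reader of `archive.db` sees either the old file or the complete new one, never a truncated download.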

macOS variant (mikes-macbook-air, if used)

Same recipe; python3 -m venv works on macOS. Replace xdg-open with open.

Jupiter redeploy procedure (when source or DB changes)

# $PW holds the root password, as in the deploy commands earlier in this log.

# Source change:
"/c/Program Files/PuTTY/pscp.exe" -batch -pw "$PW" -scp server/main.py \
  root@172.16.3.20:/mnt/user/appdata/radio-archive/app/
"/c/Program Files/PuTTY/plink.exe" -batch -ssh -pw "$PW" root@172.16.3.20 \
  "cd /mnt/user/appdata/radio-archive/app && docker compose up -d --build"

# DB-only change (no container restart needed):
"/c/Program Files/PuTTY/pscp.exe" -batch -pw "$PW" -scp archive-data/archive.db \
  root@172.16.3.20:/mnt/user/appdata/radio-archive/data/archive.db

The container opens SQLite through a mode=ro URI, so a replaced archive.db is picked up on the next request without a restart.
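A read-only URI connection of the kind described here looks roughly like the sketch below. `open_readonly` is a hypothetical helper, and the note about picking up a fresh file assumes the server opens a new connection per request, which this log does not state explicitly.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_readonly(path):
    """Open a SQLite file through a file: URI with mode=ro (no writes allowed)."""
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


with tempfile.TemporaryDirectory() as tmp:
    db_path = Path(tmp) / "archive.db"

    # Build a tiny stand-in database with a normal read-write connection.
    rw = sqlite3.connect(db_path)
    rw.execute("CREATE TABLE qa_pairs (id INTEGER PRIMARY KEY)")
    rw.execute("INSERT INTO qa_pairs DEFAULT VALUES")
    rw.commit()
    rw.close()

    ro = open_readonly(db_path)
    try:
        print(ro.execute("SELECT COUNT(*) FROM qa_pairs").fetchone())  # (1,)
        try:
            ro.execute("INSERT INTO qa_pairs DEFAULT VALUES")
        except sqlite3.OperationalError as exc:
            print("write rejected:", exc)
    finally:
        ro.close()
```

A connection opened this way before the file is swapped keeps reading the file it already has open, so the "next request" behaviour depends on connections being short-lived.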

Status at session end

Upstream container rebuilt + running with /api/db.sqlite endpoint live
Classified DB deployed to Jupiter (1,405/1,407 scored)
Portable repo created and pushed to git.azcomputerguru.com/azcomputerguru/radio-archive-portable
Laptop install is a clone + 2 shell scripts; untested on the actual 5070 (will validate next week)
ClaudeTools commits: 5e3b1a2 (this session's main.py change)
Untested edge cases: offline behavior (planes, no Tailscale), curl with HTTP/2 to /api/db.sqlite (was tested with HTTP/1.1)

Update: 06:05 — Index UI exposes classifier filters

User asked to wire the new classifier fields into the search UI. The backend already supported min_score and exclude_banter query params (commit 5e3b1a2); this update brings them into the HTML index and adds visible quality indicators on Q/A hits.

Update Summary

Edited INDEX_HTML in server/main.py to add two filter controls and score badges. Verified locally via uvicorn on 127.0.0.1:8866 against the classifier-populated DB (no-filter, min_score=4, and exclude_banter=true modes all behaved correctly). Hit an unexpected No space left on device from pscp despite Jupiter having 37 TB free on /mnt/user; bypassed by streaming the file through plink stdin (plink ... "cat > /path" < local_file). md5 verified byte-identical. Container rebuilt via docker compose up -d --build. Synced the same main.py to the portable repo so the laptop UI stays in sync.

What changed in the UI

min score select — values: any, 2+, 3+, 4+, 5. Default any to preserve old search behavior. Filters surface 1,096 mid-and-above pairs at 3+ or 523 useful pairs at 4+.
hide banter checkbox — when checked, drops the 606 rows with is_banter=1.
Score badge per Q/A hit — small color-coded number (1=red, 5=green) next to each hit's metadata line. Title attribute shows usefulness N/5 on hover.
Topic class tag — small gray pill showing computer-help, banter, off-topic, promo, or unclear.
Dimmed rendering — hits with score 1-2 or is_banter=true render at 55% opacity. Visible but visually de-emphasized so good hits stand out at a glance.
escapeHtml helper — defensive XSS guard on caller_name and title (transcript-derived strings).
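The per-hit rules in the list above (badge, topic tag, dimming, escaping) are implemented as JavaScript inside INDEX_HTML. The Python below only restates that logic for clarity; the function name, the `hit`/`dim`/`topic` class names, and the exact markup are assumptions, with `badge s5` inferred from the grep pattern used in the live verification.

```python
from html import escape


def render_hit(hit: dict) -> str:
    """Render one Q/A hit: score badge, topic pill, dimmed when low quality."""
    score = hit.get("usefulness_score")
    # Dim banter rows and anything scored 1-2; unscored rows stay undimmed.
    dim = bool(hit.get("is_banter")) or (score is not None and score <= 2)
    classes = "hit dim" if dim else "hit"
    badge = (
        f'<span class="badge s{score}" title="usefulness {score}/5">{score}</span>'
        if score is not None
        else ""
    )
    topic = hit.get("topic_class")
    tag = f'<span class="topic">{escape(topic)}</span>' if topic else ""
    # caller_name is transcript-derived, so it is escaped before insertion.
    name = escape(hit.get("caller_name") or "")
    return f'<div class="{classes}">{badge}{tag} {name}</div>'


print(render_hit({
    "usefulness_score": 5,
    "topic_class": "computer-help",
    "is_banter": False,
    "caller_name": "<b>Bob</b>",
}))
```

The 55% opacity itself would live in the stylesheet rule attached to the dimmed class, so the rendering code only has to decide which hits receive it.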

Key Decisions (this update)

Default filter "any" — preserves prior search habits and saved URLs. Mike opts into filtering when needed rather than being forced into a curated view.
URLSearchParams instead of string concat — only emits min_score= / exclude_banter= when non-default, keeping URL bar clean for the common case.
Color-coded badge with both score AND topic tag — score is numeric/comparable; topic tag is categorical and explains why a score is what it is. Both together make the classifier's reasoning visible at a glance without forcing a click.
Dim instead of hide for low-quality hits — keeps everything visible by default; the filter controls are the explicit "hide" lever.
Used plink "cat > path" instead of pscp for the deploy when pscp failed — faster than diagnosing the underlying scp/shfs issue and gets the job done deterministically.
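The "only emit non-default parameters" decision is made in the browser with URLSearchParams. A Python equivalent of the same idea, given here purely as an illustration (`build_search_query` is a hypothetical name), is:

```python
from urllib.parse import urlencode


def build_search_query(q, min_score=None, exclude_banter=False):
    """Build the /api/search URL, adding filter params only when non-default."""
    params = {"q": q}
    if min_score is not None:
        params["min_score"] = str(min_score)
    if exclude_banter:
        params["exclude_banter"] = "true"
    return "/api/search?" + urlencode(params)


print(build_search_query("BIOS"))
# /api/search?q=BIOS
print(build_search_query("BIOS", min_score=4, exclude_banter=True))
# /api/search?q=BIOS&min_score=4&exclude_banter=true
```

Leaving the defaults out keeps previously saved URLs valid, since a URL without the new parameters still means "no filtering".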

Problems Encountered (this update)

pscp ENOSPC despite 37 TB free — pscp main.py failed with No space left on device on two retries. df showed 37 TB free on /mnt/user, df -i showed inodes fine. Workaround: plink ... "cat > /path/main.py" < local_main.py. md5sum confirmed byte-identical post-transfer. Likely Unraid shfs cache-pool churn or an issue with overwriting an in-use file from inside a container's mount. Worth understanding eventually but didn't block the deploy.
plink output buffering on chained docker commands — long docker compose up -d --build runs hung from Bash's run_in_background view (output file stayed empty for minutes). Foreground sync run with the same command worked instantly. Same pattern observed yesterday. Workaround: don't background long plink runs; just block.

Files Changed (this update)

Path	Change
`projects/radio-show/audio-processor/server/main.py`	+51 / -4 — INDEX_HTML gained controls + badge styles + topic tag + escapeHtml + dim-class JS rendering
`c:/Users/guru/radio-archive-portable/server/main.py`	Same diff, synced from upstream

Commits (this update)

Repo	SHA	Branch
ClaudeTools	`b9af34f`	main
radio-archive-portable	`1d6c795`	main

Live verification

$ curl -s http://172.16.3.20:8765/ | grep -cE "min_score|exclude_banter|badge.s5|topic_class"
10
$ curl -so /dev/null -w "%{size_download}\n" http://172.16.3.20:8765/
5757   # was 4040 before
$ curl -s 'http://172.16.3.20:8765/api/search?q=BIOS&kind=qa&min_score=4&limit=2' \
    | python -c "import sys,json; d=json.load(sys.stdin); print('hits:', len(d['qa']))"
hits: 2   # both score=5, topic_class='computer-help'

Status at update end

UI controls live on http://172.16.3.20:8765/ and on the portable repo
Backend filters working (verified end-to-end)
Untouched: HTML still has no per-hit deep-link to /api/episodes/{id} (clicking a hit doesn't navigate). Future enhancement.
Pending: laptop validation (still next week's task)
