Session Log — 2026-04-30 — Portable Laptop Bundle + /api/db.sqlite Deploy
Project: The Computer Guru Show — Archive Mining System
Goal: Make the search service usable from a laptop next week, including offline; ship it as a separate repo and add a DB-fetch endpoint to the upstream container
Machine: GURU-BEAST-ROG (RTX 4090)
User: Mike Swanson (mike)
Continues from: 2026-04-29-qa-quality-classifier.md (which covered the 3.5h qwen3:14b classifier run that produced the 1,405-row scored DB)
User
- User: Mike Swanson (mike)
- Machine: GURU-BEAST-ROG
- Role: admin
Session Summary
The radio-archive search service needed to become portable so Mike could use it from a laptop next week, including offline scenarios on a plane or in a conference room. Three options were proposed: (1) install Tailscale on Jupiter, (2) use existing Tailscale subnet routing on the office router, (3) ship a self-contained laptop copy. Mike clarified Tailscale was already running on the office router covering Jupiter's subnet, then asked to "box up the offline version" for the Dell 5070 — its own repo if needed.
Verified Tailscale state from tailscale status --json: pfsense-2 (100.119.153.74) advertises 172.16.0.0/22 as its primary route, and mikes-macbook-air and acg-guru-5070 are both existing tailnet members. No subnet configuration changes were needed. The existing container's bind to 172.16.3.20:8765 already accepts subnet-routed traffic without modification.
Built a new private Gitea repo azcomputerguru/radio-archive-portable with eight files: server code (identical to upstream), a sync-db.sh that curl-fetches archive.db from a new /api/db.sqlite endpoint, a run.sh that creates a venv on first invocation and starts uvicorn on localhost:8765, plus README, .env.example, .gitignore, archive-data placeholder. The DB itself is gitignored (60 MB; fetched on demand, never committed). Repo created via Gitea API, initial commit pushed.
Added the /api/db.sqlite endpoint to the upstream server/main.py using FastAPI FileResponse. Disclosure equivalence: anyone who can reach /api/search already has full transcript access, so exposing the SQLite blob adds nothing meaningful. This avoided needing SSH keys or stored credentials on the laptop side. Deployed to Jupiter (pscp'd main.py + classified archive.db, then docker compose up -d --build). Verified end-to-end: GET /api/db.sqlite returns 200 with 60,583,936 bytes; the fetched DB contains all 1,405 classifier rows intact; GET /api/search?min_score=4 filters correctly with the new fields in the response.
Key Decisions
- Subnet routing already in place: confirmed via tailscale status --json that pfsense-2 advertises 172.16.0.0/22 as primary route. No new daemons or routing changes required. Container bind to 172.16.3.20:8765 is sufficient because Tailscale traffic destined for that IP arrives via the router's LAN egress and hits the existing listener.
- /api/db.sqlite over HTTP instead of SSH/SCP for the DB sync: keeps everything on the same Tailscale-routed port, no SSH key management, no stored passwords on the laptop. Disclosure equivalence with /api/search (which already returns every transcript) means no auth was added to either.
- Separate repo for the portable bundle: keeps the laptop install-flow simple (clone + run two scripts) and avoids cloning the 100+ GB ClaudeTools monorepo on a travel laptop. Repo lives at git.azcomputerguru.com/azcomputerguru/radio-archive-portable (private, under the user namespace).
- DB excluded from the repo via gitignore: the 60 MB blob is fetched via sync-db.sh on first run. Repo stays at ~15 KB. The fetch is idempotent and atomic (download to .partial, validate size, rename into place).
- Used docker compose up -d --build (combined) instead of separate build then up: separate commands chained through plink either silently buffered or failed to trigger a rebuild on a previous attempt; container kept running 2-hour-old code. Combined form was reliable.
- Stripped API token from .git/config after push: token had been embedded in the origin URL for the initial push; replaced with the bare HTTPS URL afterward so it doesn't sit in plain text. Future pushes will go via Gitea credential helper or interactive prompt.
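The download-to-.partial, validate, rename pattern described for sync-db.sh can be sketched in Python as follows. This is a minimal illustration, not the script in the repo: the function name atomic_fetch and the min_bytes threshold are assumptions made for the example, and the real script uses curl and mv.

```python
import os
import urllib.request
from pathlib import Path


def atomic_fetch(url: str, dest: Path, min_bytes: int = 1) -> int:
    """Download url to dest through a sibling .partial file.

    The existing dest is only replaced after the download has finished
    and passed the size check, so a failed fetch leaves it untouched.
    Returns the number of bytes written.
    """
    partial = dest.with_name(dest.name + ".partial")
    dest.parent.mkdir(parents=True, exist_ok=True)
    try:
        with urllib.request.urlopen(url) as resp, open(partial, "wb") as out:
            # Stream in 1 MiB chunks so a 60 MB DB never sits fully in memory.
            while chunk := resp.read(1 << 20):
                out.write(chunk)
        size = partial.stat().st_size
        if size < min_bytes:
            raise ValueError(f"download too small: {size} < {min_bytes} bytes")
        # os.replace is atomic when both paths are on the same filesystem.
        os.replace(partial, dest)
        return size
    except BaseException:
        # Clean up the partial file on any failure, then re-raise.
        partial.unlink(missing_ok=True)
        raise
```

Usage would look like atomic_fetch("http://172.16.3.20:8765/api/db.sqlite", Path("archive-data/archive.db"), min_bytes=1_000_000); the one-megabyte floor is an arbitrary sanity threshold chosen for the example.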
Problems Encountered
- First deploy attempt landed but rebuild didn't happen: chained docker compose build && docker compose up -d via plink completed exit-code-0 but the container kept running yesterday's code (verified via docker exec radio-archive grep db.sqlite /app/main.py returning nothing). Likely BuildKit output buffering or plink session quirks. Resolved by using docker compose up -d --build as a single foreground command.
- Bash background-task output capture flaky on long plink runs: early deploy attempts went into the Bash tool's run_in_background mode but the output file stayed empty for minutes despite the underlying SSH session completing. Worked around by running shorter commands synchronously.
- /tmp path clash between git-bash and Windows Python: a smoke-test command tried to fetch the DB via curl (using /tmp/test-db.sqlite) and then read it with python -c (also writing /tmp/...). Different tools resolved /tmp differently on Windows. Switched to a project-local test-fetched.db path to avoid the issue.
- Gitea API at /api/v1/orgs/azcomputerguru/repos returned 404: azcomputerguru is a USER, not an org. Repo creation succeeded via /api/v1/user/repos instead. (The token's owner is azcomputerguru, so user-namespace creation worked.)
- HEAD /api/db.sqlite returns 405 Method Not Allowed: FastAPI's default routing only registers GET. A HEAD is fine to fail because the sync script uses GET. Documented behavior, not a bug.
Credentials Used
Jupiter (Unraid Primary)
- Vault path:
infrastructure/jupiter-unraid-primary.sops.yaml - Host: 172.16.3.20
- User: root
- Password:
Th1nk3r^99## - iDRAC IP: 172.16.1.73 / root /
Window123!@#-idrac
Gitea
- Vault path:
services/gitea.sops.yaml - URL: https://git.azcomputerguru.com
- Username:
azcomputerguru - Password:
Gptf*77ttb123!@#-git(alt:Window123!@#-git) - API token (used this session):
9b1da4b79a38ef782268341d25a4b6880572063f - SSH:
ssh://git@172.16.3.20:2222
New Repo
- Clone URL: https://git.azcomputerguru.com/azcomputerguru/radio-archive-portable.git
- SSH URL: git@172.16.3.21:azcomputerguru/radio-archive-portable.git
- Visibility: private
- Default branch: main
Infrastructure Touched
| Host | IP | Role | Action |
|---|---|---|---|
| Jupiter (Unraid Primary) | 172.16.3.20 | Hypervisor + Docker host | pscp'd updated main.py + archive.db; docker compose up -d --build |
| Radio-archive container | container on Jupiter, bind 172.16.3.20:8765 | FastAPI + SQLite | Rebuilt with new endpoint; restarted with classifier-populated DB |
| Gitea (on Jupiter, port 3000) | git.azcomputerguru.com | Source hosting | New repo created via API |
| pfsense-2 router | (Tailscale 100.119.153.74) | Subnet router | No changes; verified existing 172.16.0.0/22 advertisement |
Tailscale state at session time
100.101.122.4 guru-beast-rog (this machine, online)
100.65.158.123 mikes-macbook-air (last seen 4m before check)
100.95.216.79 acg-guru-5070 (offline 30d ago — boot it up next week)
100.119.153.74 pfsense-2 (active; advertises 172.16.0.0/22 as PRIMARY)
Files Created / Modified
New repo: radio-archive-portable/
| Path | Purpose |
|---|---|
| README.md | Quick-start, refresh procedure, architecture diagram |
| server/main.py | Identical to deployed upstream (with /api/db.sqlite) |
| server/requirements.txt | fastapi==0.115.6, uvicorn[standard]==0.34.0 |
| sync-db.sh | curl -fSL -o archive-data/archive.db.partial $URL && mv (atomic) |
| run.sh | Creates .venv on first run, then uvicorn server.main:app --host 127.0.0.1 --port 8765 |
| .env.example | ARCHIVE_HOST=172.16.3.20:8765, ARCHIVE_DB=archive-data/archive.db, PORT=8765 |
| .gitignore | Excludes archive-data/archive.db, .venv/, .env, etc. |
| archive-data/.gitkeep | Placeholder so the dir exists in git but the DB file doesn't |
ClaudeTools (upstream)
| Path | Change |
|---|---|
| projects/radio-show/audio-processor/server/main.py | +18 / -1: added from fastapi.responses import FileResponse and the /api/db.sqlite GET endpoint |
Jupiter (deployed state)
| Path | Change |
|---|---|
| /mnt/user/appdata/radio-archive/app/main.py | Replaced (now matches 5e3b1a2) |
| /mnt/user/appdata/radio-archive/data/archive.db | Replaced with classifier-populated copy (60,583,936 bytes, 1,405/1,407 scored) |
| Container radio-archive | Rebuilt to image radio-archive:latest (sha256:dbb5ad62bdb1...), running |
Commands Run
Tailscale verification (local)
tailscale status --json | grep -E "advertis|route|172\.|primary"
# Confirmed 172.16.0.0/22 listed under PrimaryRoutes
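The grep above only shows that the route string appears somewhere in the output. A slightly stricter check, sketched below, parses the JSON and asks which peer's advertised route actually contains the container's IP. The field names Peer and HostName are assumptions about the tailscale status --json layout (PrimaryRoutes is the key named in the note above), and the sample document is a trimmed, hypothetical stand-in for real output.

```python
import ipaddress
import json


def routers_for(status: dict, target_ip: str) -> list[str]:
    """Return host names of peers whose PrimaryRoutes contain target_ip."""
    target = ipaddress.ip_address(target_ip)
    hits = []
    # "Peer" is assumed to map node keys to per-peer status objects.
    for peer in (status.get("Peer") or {}).values():
        for cidr in peer.get("PrimaryRoutes") or []:
            if target in ipaddress.ip_network(cidr, strict=False):
                hits.append(peer.get("HostName", "?"))
                break
    return hits


# Hypothetical, trimmed sample; real output carries many more fields.
SAMPLE = json.loads("""
{"Peer": {
  "nodekey:aaa": {"HostName": "pfsense-2", "PrimaryRoutes": ["172.16.0.0/22"]},
  "nodekey:bbb": {"HostName": "acg-guru-5070"}
}}
""")

print(routers_for(SAMPLE, "172.16.3.20"))  # ['pfsense-2']
```

Against live data the same function could be fed json.load(sys.stdin) with tailscale status --json piped in.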
New repo creation
curl -X POST "https://git.azcomputerguru.com/api/v1/user/repos" \
-H "Authorization: token 9b1da4b79a38ef782268341d25a4b6880572063f" \
-d '{"name":"radio-archive-portable","private":true,"default_branch":"main"}'
# HTTP 201, repo id 12
cd /c/Users/guru/radio-archive-portable
git init -b main
git config user.name "Mike Swanson"
git config user.email "mike@azcomputerguru.com"
git add -A && git commit
git remote add origin https://azcomputerguru:<token>@git.azcomputerguru.com/azcomputerguru/radio-archive-portable.git
git push -u origin main
git remote set-url origin https://git.azcomputerguru.com/azcomputerguru/radio-archive-portable.git # strip token
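The token-stripping step was done by hand with git remote set-url. If the clean URL ever needs to be derived programmatically, a sketch along these lines would work; strip_userinfo is a hypothetical helper name, not something in either repo.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_userinfo(url: str) -> str:
    """Return url with any user:password@ prefix removed from the host part."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:
        # IPv6 literals lose their brackets in .hostname; put them back.
        host = f"[{host}]"
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


print(strip_userinfo(
    "https://azcomputerguru:TOKEN@git.azcomputerguru.com/azcomputerguru/radio-archive-portable.git"
))
```

The result can then be passed to git remote set-url origin, as in the command above.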
Jupiter deploy
"/c/Program Files/PuTTY/pscp.exe" -batch -pw "$PW" -scp \
c:/Users/guru/ClaudeTools/projects/radio-show/audio-processor/server/main.py \
root@172.16.3.20:/mnt/user/appdata/radio-archive/app/main.py
"/c/Program Files/PuTTY/pscp.exe" -batch -pw "$PW" -scp \
c:/Users/guru/ClaudeTools/projects/radio-show/audio-processor/archive-data/archive.db \
root@172.16.3.20:/mnt/user/appdata/radio-archive/data/archive.db
# 60.5 MB at ~580 KB/s = ~100 seconds
"/c/Program Files/PuTTY/plink.exe" -batch -ssh -pw "$PW" root@172.16.3.20 \
"cd /mnt/user/appdata/radio-archive/app && docker compose up -d --build"
# Built radio-archive:latest sha256:dbb5ad62bdb1..., container Running
Live verification
curl -sS http://172.16.3.20:8765/api/stats
# {"counts":{"episodes":572,"segments":60917,...},"by_year":[{"year":2010,...
curl -sS -o test-fetched.db -w "HTTP %{http_code} | dl=%{size_download}B\n" \
http://172.16.3.20:8765/api/db.sqlite
# HTTP 200 | dl=60583936B
.venv/Scripts/python.exe -c "
import sqlite3
db = sqlite3.connect('test-fetched.db')
print(db.execute('SELECT COUNT(*) FROM qa_pairs WHERE usefulness_score IS NOT NULL').fetchone())
"
# (1405,)
curl -sS 'http://172.16.3.20:8765/api/search?q=BIOS&kind=qa&min_score=4&limit=2'
# returns 2 hits, each with usefulness_score=5, topic_class='computer-help'
Pending / Next
- Test the laptop install end-to-end when the 5070 boots up next week — confirm sync-db.sh + run.sh work cleanly on Linux. Currently untested on the actual target machine.
- HTML index UI update: backend supports min_score and exclude_banter query params, but the search UI on / doesn't expose them as toggles or show the score/topic_class on each hit. Backend is ready when the UI is.
- Re-run the 2 failed classifier rows: classify_qa_quality.py re-invocation will retry the NULL-scored rows; one-line cleanup.
- Track 2 (voice profile clustering): still deferred. Lower priority since content-quality filter solved most of the search-quality problem.
- Track 3 (speaker oracle wiring through to search UI): still deferred. speaker_oracle.py resolves names from intros but the search results still show "CALLER" rather than the resolved name.
Reference
Endpoints (all live on http://172.16.3.20:8765/ as of this commit)
| Method | Path | Notes |
|---|---|---|
| GET | / | Search UI (no min_score toggle yet; query string works manually) |
| GET | /api/stats | Counts and per-year breakdown |
| GET | /api/episodes?year=YYYY&limit=N | Episode list |
| GET | /api/episodes/{id} | Detail with intros + qa_pairs (now includes usefulness_score, topic_class, is_banter) |
| GET | /api/episodes/{id}/transcript | Chronological merged segments + turns |
| GET | /api/search?q=...&kind=both\|segments\|qa&min_score=N&exclude_banter=true&limit=N | FTS5 |
| GET | /api/callers?limit=N | Top recurring caller_names |
| GET | /api/db.sqlite | NEW: streams the read-only DB blob (60 MB) |
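The log does not show how the server turns min_score and exclude_banter into SQL, so the following is only an illustration of one way such optional filters could be applied to the qa_pairs table. The function name search_filters, the WHERE-fragment approach, and the in-memory sample rows are all assumptions; the real /api/search query also involves FTS5 matching, which is left out here.

```python
import sqlite3


def search_filters(min_score=None, exclude_banter=False):
    """Build a WHERE fragment and parameter list for the optional filters."""
    clauses, params = [], []
    if min_score is not None:
        clauses.append("usefulness_score >= ?")
        params.append(int(min_score))
    if exclude_banter:
        # Treat unscored rows (NULL is_banter) as non-banter.
        clauses.append("COALESCE(is_banter, 0) = 0")
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    return where, params


# Tiny in-memory stand-in for the real table (columns named in this log).
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE qa_pairs (id INTEGER PRIMARY KEY, usefulness_score INTEGER, "
    "topic_class TEXT, is_banter INTEGER)"
)
con.executemany(
    "INSERT INTO qa_pairs (usefulness_score, topic_class, is_banter) VALUES (?, ?, ?)",
    [(5, "computer-help", 0), (4, "computer-help", 0), (2, "banter", 1), (None, None, None)],
)

where, params = search_filters(min_score=4, exclude_banter=True)
rows = con.execute("SELECT id FROM qa_pairs" + where, params).fetchall()
print(len(rows))  # 2
```

Values are always bound as parameters rather than interpolated, so a query-string value never reaches the SQL text directly.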
Laptop next-week recipe (5070 / Linux)
# Tailscale already enabled on the laptop and on pfsense-2
git clone https://git.azcomputerguru.com/azcomputerguru/radio-archive-portable.git
cd radio-archive-portable
./sync-db.sh # pulls from 172.16.3.20:8765/api/db.sqlite
./run.sh # creates .venv, starts uvicorn on localhost:8765
xdg-open http://localhost:8765/
Refreshing: ./sync-db.sh any time. Atomic — partial download won't corrupt existing DB.
macOS variant (mikes-macbook-air, if used)
Same recipe. python3 -m venv works on Mac. xdg-open → open.
Jupiter redeploy procedure (when source or DB changes)
# Source change:
"/c/Program Files/PuTTY/pscp.exe" -pw <pw> -scp server/main.py \
root@172.16.3.20:/mnt/user/appdata/radio-archive/app/
"/c/Program Files/PuTTY/plink.exe" -ssh -pw <pw> root@172.16.3.20 \
"cd /mnt/user/appdata/radio-archive/app && docker compose up -d --build"
# DB-only change (no container restart needed):
"/c/Program Files/PuTTY/pscp.exe" -pw <pw> -scp archive-data/archive.db \
root@172.16.3.20:/mnt/user/appdata/radio-archive/data/archive.db
The SQLite connection on the container side is mode=ro URI — picks up fresh DB on next request without restart.
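A minimal sketch of the read-only URI connection pattern, assuming the server opens a fresh connection per request (the log implies this but does not show the code). The helper names open_readonly and scored_count are hypothetical.

```python
import sqlite3
from pathlib import Path


def open_readonly(db_path: Path) -> sqlite3.Connection:
    """Open the DB through a file: URI with mode=ro so writes are rejected."""
    uri = f"{Path(db_path).resolve().as_uri()}?mode=ro"
    return sqlite3.connect(uri, uri=True)


def scored_count(db_path: Path) -> int:
    """Count classifier-scored rows using a fresh connection per call.

    Because the file is reopened on every call, a replaced archive.db
    is picked up the next time this runs, with no restart.
    """
    con = open_readonly(db_path)
    try:
        row = con.execute(
            "SELECT COUNT(*) FROM qa_pairs WHERE usefulness_score IS NOT NULL"
        ).fetchone()
        return row[0]
    finally:
        con.close()
```

A long-lived connection would behave differently: it keeps the old file handle open, so it would not see a DB that was swapped in by rename.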
Status at session end
- Upstream container rebuilt + running with /api/db.sqlite endpoint live
- Classified DB deployed to Jupiter (1,405/1,407 scored)
- Portable repo created and pushed to git.azcomputerguru.com/azcomputerguru/radio-archive-portable
- Laptop install is a clone + 2 shell scripts; untested on the actual 5070 (will validate next week)
- ClaudeTools commits: 5e3b1a2 (this session's main.py change)
- Untested edge cases: offline behavior (planes, no Tailscale), curl with HTTP/2 to /api/db.sqlite (was tested with HTTP/1.1)
Update: 06:05 — Index UI exposes classifier filters
User asked to wire the new classifier fields into the search UI. The
backend already supported min_score and exclude_banter query params
(commit 5e3b1a2); this update brings them into the HTML index and adds
visible quality indicators on Q/A hits.
Update Summary
Edited INDEX_HTML in server/main.py to add two filter controls and
score badges. Verified locally via uvicorn on 127.0.0.1:8866 against
the classifier-populated DB (no-filter, min_score=4, and
exclude_banter=true modes all behaved correctly). Hit an unexpected
No space left on device from pscp despite Jupiter having 37 TB free
on /mnt/user; bypassed by streaming the file through plink stdin
(plink ... "cat > /path" < local_file). md5 verified byte-identical.
Container rebuilt via docker compose up -d --build. Synced the same
main.py to the portable repo so the laptop UI stays in sync.
What changed in the UI
- min score select: values any, 2+, 3+, 4+, 5. Default any to preserve old search behavior. Filters surface 1,096 mid-and-above pairs at 3+ or 523 useful pairs at 4+.
- hide banter checkbox: when checked, drops the 606 rows with is_banter=1.
- Score badge per Q/A hit: small color-coded number (1=red, 5=green) next to each hit's metadata line. Title attribute shows usefulness N/5 on hover.
- Topic class tag: small gray pill showing computer-help, banter, off-topic, promo, or unclear.
- Dimmed rendering: hits with score 1-2 or is_banter=true render at 55% opacity. Visible but visually de-emphasized so good hits stand out at a glance.
- escapeHtml helper: defensive XSS guard on caller_name and title (transcript-derived strings).
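The actual rendering lives in JavaScript inside INDEX_HTML and is not reproduced in this log. The Python sketch below mirrors the rules listed above (badge per score, topic pill, dim on score 1-2 or banter, escape transcript-derived text) purely to make the logic concrete. The class names hit, dim, and topic are hypothetical; badge s5 is inferred from the grep pattern used in the live verification.

```python
import html


def hit_classes(score, is_banter) -> str:
    """CSS classes for one hit row: dim low scores (1-2) and banter."""
    classes = ["hit"]
    if is_banter or (score is not None and score <= 2):
        classes.append("dim")
    return " ".join(classes)


def badge_html(score: int, topic_class: str) -> str:
    """Score badge plus topic pill, with the topic text escaped."""
    return (
        f'<span class="badge s{score}" title="usefulness {score}/5">{score}</span> '
        f'<span class="topic">{html.escape(topic_class)}</span>'
    )


def safe_caller(caller_name: str) -> str:
    """Escape a transcript-derived string before it is placed in markup."""
    return html.escape(caller_name)


print(hit_classes(5, False))   # hit
print(hit_classes(2, False))   # hit dim
print(safe_caller("<b>Bob</b>"))  # &lt;b&gt;Bob&lt;/b&gt;
```

The point of the escape step is the same in either language: caller names and titles come from transcripts, so they are treated as untrusted text rather than markup.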
Key Decisions (this update)
- Default filter "any" — preserves prior search habits and saved URLs. Mike opts into filtering when needed rather than being forced into a curated view.
- URLSearchParams instead of string concat: only emits min_score= / exclude_banter= when non-default, keeping URL bar clean for the common case.
- Color-coded badge with both score AND topic tag: score is numeric/comparable; topic tag is categorical and explains why a score is what it is. Both together make the classifier's reasoning visible at a glance without forcing a click.
- Dim instead of hide for low-quality hits: keeps everything visible by default; the filter controls are the explicit "hide" lever.
- Used plink "cat > path" instead of pscp for the deploy when pscp failed: faster than diagnosing the underlying scp/shfs issue and gets the job done deterministically.
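The "only emit non-default params" decision is implemented in the browser with URLSearchParams. A Python equivalent using urllib.parse.urlencode is sketched here for scripted queries against the same endpoint; the function name search_url and the choice to always send kind are assumptions for the example.

```python
from urllib.parse import urlencode


def search_url(base, q, kind="both", min_score=None, exclude_banter=False, limit=None):
    """Build an /api/search URL, adding filter params only when they are set."""
    params = {"q": q, "kind": kind}
    if min_score is not None:
        params["min_score"] = min_score
    if exclude_banter:
        params["exclude_banter"] = "true"
    if limit is not None:
        params["limit"] = limit
    # urlencode handles quoting, so queries with spaces or & stay intact.
    return f"{base}/api/search?{urlencode(params)}"


print(search_url("http://172.16.3.20:8765", "BIOS", kind="qa", min_score=4, limit=2))
# http://172.16.3.20:8765/api/search?q=BIOS&kind=qa&min_score=4&limit=2
```

With no filters set, the result carries only q and kind, which matches the goal of keeping unfiltered URLs short and previously saved URLs valid.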
Problems Encountered (this update)
- pscp ENOSPC despite 37 TB free: pscp main.py failed with No space left on device on two retries. df showed 37 TB free on /mnt/user, df -i showed inodes fine. Workaround: plink ... "cat > /path/main.py" < local_main.py. md5sum confirmed byte-identical post-transfer. Likely Unraid shfs cache-pool churn or an issue with overwriting an in-use file from inside a container's mount. Worth understanding eventually but didn't block the deploy.
- plink output buffering on chained docker commands: long docker compose up -d --build runs hung from Bash's run_in_background view (output file stayed empty for minutes). Foreground sync run with the same command worked instantly. Same pattern observed yesterday. Workaround: don't background long plink runs; just block.
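The stdin-streaming workaround plus the md5 check can be sketched as two small helpers. The names md5_of and plink_cat_argv are hypothetical, the argv builder only constructs the command (it does not run it), and authentication is deliberately left out; the session itself passed a password with -pw.

```python
import hashlib
import shlex
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex md5 of a local file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        while chunk := fh.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def plink_cat_argv(host: str, remote_path: str, user: str = "root") -> list[str]:
    """Build a plink command that writes its stdin to remote_path.

    The remote side runs `cat > path`, so the local file is supplied on
    stdin instead of going through pscp.
    """
    remote_cmd = f"cat > {shlex.quote(remote_path)}"
    return ["plink.exe", "-batch", "-ssh", f"{user}@{host}", remote_cmd]


print(plink_cat_argv("172.16.3.20", "/mnt/user/appdata/radio-archive/app/main.py"))
```

To run it, the argv could be handed to subprocess.run(argv, stdin=open(local_file, "rb"), check=True) in the foreground, and md5_of(local_file) then compared with the output of md5sum on the remote path.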
Files Changed (this update)
| Path | Change |
|---|---|
| projects/radio-show/audio-processor/server/main.py | +51 / -4: INDEX_HTML gained controls + badge styles + topic tag + escapeHtml + dim-class JS rendering |
| c:/Users/guru/radio-archive-portable/server/main.py | Same diff, synced from upstream |
Commits (this update)
| Repo | SHA | Branch |
|---|---|---|
| ClaudeTools | b9af34f | main |
| radio-archive-portable | 1d6c795 | main |
Live verification
$ curl -s http://172.16.3.20:8765/ | grep -cE "min_score|exclude_banter|badge.s5|topic_class"
10
$ curl -so /dev/null -w "%{size_download}\n" http://172.16.3.20:8765/
5757 # was 4040 before
$ curl -s 'http://172.16.3.20:8765/api/search?q=BIOS&kind=qa&min_score=4&limit=2' \
| python -c "import sys,json; d=json.load(sys.stdin); print('hits:', len(d['qa']))"
hits: 2 # both score=5, topic_class='computer-help'
Status at update end
- UI controls live on http://172.16.3.20:8765/ and on the portable repo
- Backend filters working (verified end-to-end)
- Untouched: HTML still has no per-hit deep-link to /api/episodes/{id} (clicking a hit doesn't navigate). Future enhancement.
- Pending: laptop validation (still next week's task)