claudetools/projects/radio-show/audio-processor/session-logs/2026-04-30-session.md
Mike Swanson 6239f9fc3a radio: session log update — index UI exposes classifier filters
Backend min_score/exclude_banter wired through to HTML index. Adds
score badges (1-5 red->green), topic_class pills, dim styling on
banter rows. Live on http://172.16.3.20:8765/. Synced to portable
repo. pscp ENOSPC quirk worked around by plink-stdin streaming.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 06:07:00 -07:00


# Session Log — 2026-04-30 — Portable Laptop Bundle + /api/db.sqlite Deploy
**Project:** The Computer Guru Show — Archive Mining System
**Goal:** Make the search service usable from a laptop next week, including offline; ship it as a separate repo and add a DB-fetch endpoint to the upstream container
**Machine:** GURU-BEAST-ROG (RTX 4090)
**User:** Mike Swanson (mike)
**Continues from:** `2026-04-29-qa-quality-classifier.md` (which covered the 3.5h qwen3:14b classifier run that produced the 1,405-row scored DB)
---
## User
- **User:** Mike Swanson (mike)
- **Machine:** GURU-BEAST-ROG
- **Role:** admin
---
## Session Summary
The radio-archive search service needed to become portable so Mike could use it from a laptop next week, including offline scenarios on a plane or in a conference room. Three options were proposed: (1) install Tailscale on Jupiter, (2) use existing Tailscale subnet routing on the office router, (3) ship a self-contained laptop copy. Mike clarified Tailscale was already running on the office router covering Jupiter's subnet, then asked to "box up the offline version" for the Dell 5070 — its own repo if needed.
Verified Tailscale state from `tailscale status --json` — pfsense-2 (`100.119.153.74`) advertises `172.16.0.0/22` as PRIMARY ROUTE, mike's macbook-air and acg-guru-5070 are both existing tailnet members. No subnet configuration changes needed. The existing container's bind to `172.16.3.20:8765` already accepts subnet-routed traffic without modification.
Built a new private Gitea repo `azcomputerguru/radio-archive-portable` with eight files: server code (identical to upstream), a `sync-db.sh` that curl-fetches `archive.db` from a new `/api/db.sqlite` endpoint, a `run.sh` that creates a venv on first invocation and starts uvicorn on `localhost:8765`, plus README, .env.example, .gitignore, archive-data placeholder. The DB itself is gitignored (60 MB; fetched on demand, never committed). Repo created via Gitea API, initial commit pushed.
Added the `/api/db.sqlite` endpoint to the upstream `server/main.py` using FastAPI `FileResponse`. Disclosure equivalence: anyone who can reach `/api/search` already has full transcript access, so exposing the SQLite blob adds nothing meaningful. This avoided needing SSH keys or stored credentials on the laptop side. Deployed to Jupiter (pscp'd `main.py` + classified `archive.db`, then `docker compose up -d --build`). Verified end-to-end: `GET /api/db.sqlite` returns 200 with 60,583,936 bytes; the fetched DB contains all 1,405 classifier rows intact; `GET /api/search?min_score=4` filters correctly with the new fields in the response.
---
## Key Decisions
- **Subnet routing already in place** — confirmed via `tailscale status --json` that pfsense-2 advertises `172.16.0.0/22` as primary route. No new daemons or routing changes required. Container bind to `172.16.3.20:8765` is sufficient because Tailscale traffic destined for that IP arrives via the router's LAN egress and hits the existing listener.
- **`/api/db.sqlite` over HTTP instead of SSH/SCP for the DB sync** — keeps everything on the same Tailscale-routed port, no SSH key management, no stored passwords on the laptop. Disclosure equivalence with `/api/search` (which already returns every transcript) means no auth was added to either.
- **Separate repo for the portable bundle** — keeps the laptop install-flow simple (clone + run two scripts) and avoids cloning the 100+ GB ClaudeTools monorepo on a travel laptop. Repo lives at `git.azcomputerguru.com/azcomputerguru/radio-archive-portable` (private, under the user namespace).
- **DB excluded from the repo via gitignore** — the 60 MB blob is fetched via `sync-db.sh` on first run. Repo stays at ~15 KB. The fetch is idempotent and atomic (download to `.partial`, validate size, rename into place).
- **Used `docker compose up -d --build` (combined) instead of separate `build` then `up`** — separate commands chained through plink either silently buffered or failed to trigger a rebuild on a previous attempt; container kept running 2-hour-old code. Combined form was reliable.
- **Stripped API token from `.git/config` after push** — token had been embedded in the origin URL for the initial push; replaced with the bare HTTPS URL afterward so it doesn't sit in plain text. Future pushes will go via Gitea credential helper or interactive prompt.
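The download-to-`.partial`, validate, rename pattern described for `sync-db.sh` can be sketched in Python as follows. This is an illustration of the technique, not the script's actual contents: the function name `fetch_atomically`, the `min_bytes` threshold, and the injectable `opener` parameter are all assumptions made for the example.

```python
import os
import shutil
import urllib.request
from pathlib import Path


def fetch_atomically(url, dest, min_bytes=1, opener=urllib.request.urlopen):
    """Download url to dest via a .partial file, then rename into place.

    Hypothetical helper: the name, the min_bytes check, and the opener
    hook are illustrative and are not taken from sync-db.sh.
    """
    dest = Path(dest)
    partial = dest.with_name(dest.name + ".partial")
    try:
        # Stream the response body into the temporary file.
        with opener(url) as response, open(partial, "wb") as out:
            shutil.copyfileobj(response, out)
        size = partial.stat().st_size
        if size < min_bytes:
            raise ValueError(f"download too small: {size} bytes")
        # os.replace is atomic when both paths are on the same filesystem,
        # so readers see either the old file or the complete new one.
        os.replace(partial, dest)
        return size
    finally:
        # Clean up a leftover partial file if anything failed before the rename.
        if partial.exists():
            partial.unlink()
```

Because the existing `archive.db` is only touched by the final rename, an interrupted or undersized download leaves the previous copy in place.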
---
## Problems Encountered
- **First deploy attempt landed but rebuild didn't happen** — chained `docker compose build && docker compose up -d` via plink completed exit-code-0 but the container kept running yesterday's code (verified via `docker exec radio-archive grep db.sqlite /app/main.py` returning nothing). Likely BuildKit output buffering or plink session quirks. Resolved by using `docker compose up -d --build` as a single foreground command.
- **Bash background-task output capture flaky on long plink runs** — early deploy attempts went into the Bash tool's `run_in_background` mode but the output file stayed empty for minutes despite the underlying SSH session completing. Worked around by running shorter commands synchronously.
- **`/tmp` path clash between git-bash and Windows Python** — a smoke-test command fetched the DB via curl to `/tmp/test-db.sqlite` and then tried to read it with `python -c` from the same `/tmp/...` path. git-bash and native Windows Python resolve `/tmp` to different locations, so the two steps did not see the same file. Switched to a project-local `test-fetched.db` path to avoid the issue.
- **Gitea API at `/api/v1/orgs/azcomputerguru/repos` returned 404** — `azcomputerguru` is a USER, not an org. Repo creation succeeded via `/api/v1/user/repos` instead. (The token's owner is `azcomputerguru`, so user-namespace creation worked.)
- **`HEAD /api/db.sqlite` returns 405 Method Not Allowed** — FastAPI's default routing registers only GET for this endpoint. The failing `HEAD` is harmless because the sync script uses `GET`. Documented behavior, not a bug.
---
## Credentials Used
### Jupiter (Unraid Primary)
- **Vault path:** `infrastructure/jupiter-unraid-primary.sops.yaml`
- **Host:** 172.16.3.20
- **User:** root
- **Password:** `Th1nk3r^99##`
- **iDRAC IP:** 172.16.1.73 / root / `Window123!@#-idrac`
### Gitea
- **Vault path:** `services/gitea.sops.yaml`
- **URL:** https://git.azcomputerguru.com
- **Username:** `azcomputerguru`
- **Password:** `Gptf*77ttb123!@#-git` (alt: `Window123!@#-git`)
- **API token (used this session):** `9b1da4b79a38ef782268341d25a4b6880572063f`
- **SSH:** `ssh://git@172.16.3.20:2222`
### New Repo
- **Clone URL:** https://git.azcomputerguru.com/azcomputerguru/radio-archive-portable.git
- **SSH URL:** `git@172.16.3.21:azcomputerguru/radio-archive-portable.git`
- **Visibility:** private
- **Default branch:** main
---
## Infrastructure Touched
| Host | IP | Role | Action |
|---|---|---|---|
| Jupiter (Unraid Primary) | 172.16.3.20 | Hypervisor + Docker host | pscp'd updated `main.py` + `archive.db`; `docker compose up -d --build` |
| Radio-archive container | container on Jupiter, bind `172.16.3.20:8765` | FastAPI + SQLite | Rebuilt with new endpoint; restarted with classifier-populated DB |
| Gitea (on Jupiter, port 3000) | git.azcomputerguru.com | Source hosting | New repo created via API |
| pfsense-2 router | (Tailscale `100.119.153.74`) | Subnet router | No changes — verified existing 172.16.0.0/22 advertisement |
### Tailscale state at session time
```
100.101.122.4 guru-beast-rog (this machine, online)
100.65.158.123 mikes-macbook-air (last seen 4m before check)
100.95.216.79 acg-guru-5070 (offline 30d ago — boot it up next week)
100.119.153.74 pfsense-2 (active; advertises 172.16.0.0/22 as PRIMARY)
```
---
## Files Created / Modified
### New repo: `radio-archive-portable/`
| Path | Purpose |
|---|---|
| `README.md` | Quick-start, refresh procedure, architecture diagram |
| `server/main.py` | Identical to deployed upstream (with `/api/db.sqlite`) |
| `server/requirements.txt` | `fastapi==0.115.6`, `uvicorn[standard]==0.34.0` |
| `sync-db.sh` | `curl -fSL -o archive-data/archive.db.partial $URL && mv` (atomic) |
| `run.sh` | Creates `.venv` on first run, then `uvicorn server.main:app --host 127.0.0.1 --port 8765` |
| `.env.example` | `ARCHIVE_HOST=172.16.3.20:8765`, `ARCHIVE_DB=archive-data/archive.db`, `PORT=8765` |
| `.gitignore` | Excludes `archive-data/archive.db`, `.venv/`, `.env`, etc. |
| `archive-data/.gitkeep` | Placeholder so the dir exists in git but the DB file doesn't |
### ClaudeTools (upstream)
| Path | Change |
|---|---|
| `projects/radio-show/audio-processor/server/main.py` | +18 / -1 — added `from fastapi.responses import FileResponse` and the `/api/db.sqlite` GET endpoint |
### Jupiter (deployed state)
| Path | Change |
|---|---|
| `/mnt/user/appdata/radio-archive/app/main.py` | Replaced (now matches `5e3b1a2`) |
| `/mnt/user/appdata/radio-archive/data/archive.db` | Replaced with classifier-populated copy (60,583,936 bytes, 1,405/1,407 scored) |
| Container `radio-archive` | Rebuilt to image `radio-archive:latest` (`sha256:dbb5ad62bdb1...`), running |
---
## Commands Run
### Tailscale verification (local)
```bash
tailscale status --json | grep -E "advertis|route|172\.|primary"
# Confirmed 172.16.0.0/22 listed under PrimaryRoutes
```
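The `grep` above is a quick eyeball check. A structured version of the same check could look like the sketch below. It assumes the `tailscale status --json` output has a top-level `Peer` mapping whose entries carry `HostName` and an optional `PrimaryRoutes` list; the function name and the trimmed sample document are illustrative, not captured output.

```python
import json


def primary_route_advertisers(status, cidr):
    """Return hostnames of peers whose PrimaryRoutes include cidr.

    Assumes a top-level "Peer" mapping whose values carry "HostName"
    and an optional "PrimaryRoutes" list.
    """
    peers = status.get("Peer") or {}
    return sorted(
        peer.get("HostName", "")
        for peer in peers.values()
        if cidr in (peer.get("PrimaryRoutes") or [])
    )


# Trimmed, illustrative sample shaped like the assumed JSON layout.
# Real input would come from: tailscale status --json
sample = json.loads("""
{"Peer": {
  "nodekey:aaa": {"HostName": "pfsense-2", "PrimaryRoutes": ["172.16.0.0/22"]},
  "nodekey:bbb": {"HostName": "acg-guru-5070"}
}}
""")
print(primary_route_advertisers(sample, "172.16.0.0/22"))  # ['pfsense-2']
```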
### New repo creation
```bash
curl -X POST "https://git.azcomputerguru.com/api/v1/user/repos" \
-H "Authorization: token 9b1da4b79a38ef782268341d25a4b6880572063f" \
-d '{"name":"radio-archive-portable","private":true,"default_branch":"main"}'
# HTTP 201, repo id 12
cd /c/Users/guru/radio-archive-portable
git init -b main
git config user.name "Mike Swanson"
git config user.email "mike@azcomputerguru.com"
git add -A && git commit
git remote add origin https://azcomputerguru:<token>@git.azcomputerguru.com/azcomputerguru/radio-archive-portable.git
git push -u origin main
git remote set-url origin https://git.azcomputerguru.com/azcomputerguru/radio-archive-portable.git # strip token
```
### Jupiter deploy
```bash
"/c/Program Files/PuTTY/pscp.exe" -batch -pw "$PW" -scp \
c:/Users/guru/ClaudeTools/projects/radio-show/audio-processor/server/main.py \
root@172.16.3.20:/mnt/user/appdata/radio-archive/app/main.py
"/c/Program Files/PuTTY/pscp.exe" -batch -pw "$PW" -scp \
c:/Users/guru/ClaudeTools/projects/radio-show/audio-processor/archive-data/archive.db \
root@172.16.3.20:/mnt/user/appdata/radio-archive/data/archive.db
# 60.5 MB at ~580 KB/s = ~100 seconds
"/c/Program Files/PuTTY/plink.exe" -batch -ssh -pw "$PW" root@172.16.3.20 \
"cd /mnt/user/appdata/radio-archive/app && docker compose up -d --build"
# Built radio-archive:latest sha256:dbb5ad62bdb1..., container Running
```
### Live verification
```bash
curl -sS http://172.16.3.20:8765/api/stats
# {"counts":{"episodes":572,"segments":60917,...},"by_year":[{"year":2010,...
curl -sS -o test-fetched.db -w "HTTP %{http_code} | dl=%{size_download}B\n" \
http://172.16.3.20:8765/api/db.sqlite
# HTTP 200 | dl=60583936B
.venv/Scripts/python.exe -c "
import sqlite3
db = sqlite3.connect('test-fetched.db')
print(db.execute('SELECT COUNT(*) FROM qa_pairs WHERE usefulness_score IS NOT NULL').fetchone())
"
# (1405,)
curl -sS 'http://172.16.3.20:8765/api/search?q=BIOS&kind=qa&min_score=4&limit=2'
# returns 2 hits, each with usefulness_score=5, topic_class='computer-help'
```
---
## Pending / Next
1. **Test the laptop install end-to-end** when the 5070 boots up next week — confirm sync-db.sh + run.sh work cleanly on Linux. Currently untested on the actual target machine.
2. **HTML index UI update** — backend supports `min_score` and `exclude_banter` query params, but the search UI on `/` doesn't expose them as toggles or show the score/topic_class on each hit. Backend is ready when the UI is.
3. **Re-run the 2 failed classifier rows** — `classify_qa_quality.py` re-invocation will retry the NULL-scored rows; one-line cleanup.
4. **Track 2 (voice profile clustering)** — still deferred. Lower priority since content-quality filter solved most of the search-quality problem.
5. **Track 3 (speaker oracle wiring through to search UI)** — still deferred. `speaker_oracle.py` resolves names from intros but the search results still show "CALLER" rather than the resolved name.
---
## Reference
### Endpoints (all live on http://172.16.3.20:8765/ as of this commit)
| Method | Path | Notes |
|---|---|---|
| GET | `/` | Search UI (no min_score toggle yet — query string works manually) |
| GET | `/api/stats` | Counts and per-year breakdown |
| GET | `/api/episodes?year=YYYY&limit=N` | Episode list |
| GET | `/api/episodes/{id}` | Detail with intros + qa_pairs (now includes usefulness_score, topic_class, is_banter) |
| GET | `/api/episodes/{id}/transcript` | Chronological merged segments + turns |
| GET | `/api/search?q=...&kind=both\|segments\|qa&min_score=N&exclude_banter=true&limit=N` | FTS5 |
| GET | `/api/callers?limit=N` | Top recurring caller_names |
| GET | `/api/db.sqlite` | **NEW** — streams the read-only DB blob (60 MB) |
### Laptop next-week recipe (5070 / Linux)
```bash
# Tailscale already enabled on the laptop and on pfsense-2
git clone https://git.azcomputerguru.com/azcomputerguru/radio-archive-portable.git
cd radio-archive-portable
./sync-db.sh # pulls from 172.16.3.20:8765/api/db.sqlite
./run.sh # creates .venv, starts uvicorn on localhost:8765
xdg-open http://localhost:8765/
```
Refreshing: `./sync-db.sh` any time. Atomic — partial download won't corrupt existing DB.
### macOS variant (mikes-macbook-air, if used)
Same recipe. `python3 -m venv` works on Mac. Replace `xdg-open` with `open`.
### Jupiter redeploy procedure (when source or DB changes)
```bash
# Source change:
"/c/Program Files/PuTTY/pscp.exe" -pw <pw> -scp server/main.py \
root@172.16.3.20:/mnt/user/appdata/radio-archive/app/
"/c/Program Files/PuTTY/plink.exe" -ssh -pw <pw> root@172.16.3.20 \
"cd /mnt/user/appdata/radio-archive/app && docker compose up -d --build"
# DB-only change (no container restart needed):
"/c/Program Files/PuTTY/pscp.exe" -pw <pw> -scp archive-data/archive.db \
root@172.16.3.20:/mnt/user/appdata/radio-archive/data/archive.db
```
The container opens the SQLite DB with a `mode=ro` URI, so it picks up the fresh DB on the next request without a restart.
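For reference, a read-only URI open with Python's standard `sqlite3` module looks roughly like the sketch below. The log does not show the server's connection code, so the helper name `open_readonly` is hypothetical, and the no-restart behavior additionally assumes the server opens a new connection per request.

```python
import sqlite3
from pathlib import Path


def open_readonly(db_path):
    """Open a SQLite file read-only via a URI (hypothetical helper name)."""
    # as_uri() yields a file: URI with special characters percent-encoded;
    # mode=ro makes SQLite refuse writes on this connection.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

A write attempted through such a connection raises `sqlite3.OperationalError`, which is a cheap way to confirm the mode took effect.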
---
## Status at session end
- **Upstream container** rebuilt + running with `/api/db.sqlite` endpoint live
- **Classified DB** deployed to Jupiter (1,405/1,407 scored)
- **Portable repo** created and pushed to `git.azcomputerguru.com/azcomputerguru/radio-archive-portable`
- **Laptop install** is a clone + 2 shell scripts; untested on the actual 5070 (will validate next week)
- **ClaudeTools commits:** `5e3b1a2` (this session's main.py change)
- **Untested edge cases:** offline behavior (planes, no Tailscale), curl with HTTP/2 to /api/db.sqlite (was tested with HTTP/1.1)
---
## Update: 06:05 — Index UI exposes classifier filters
User asked to wire the new classifier fields into the search UI. The
backend already supported `min_score` and `exclude_banter` query params
(commit `5e3b1a2`); this update brings them into the HTML index and adds
visible quality indicators on Q/A hits.
### Update Summary
Edited `INDEX_HTML` in `server/main.py` to add two filter controls and
score badges. Verified locally via `uvicorn` on `127.0.0.1:8866` against
the classifier-populated DB (no-filter, `min_score=4`, and
`exclude_banter=true` modes all behaved correctly). Hit an unexpected
`No space left on device` from `pscp` despite Jupiter having 37 TB free
on `/mnt/user`; bypassed by streaming the file through plink stdin
(`plink ... "cat > /path" < local_file`). md5 verified byte-identical.
Container rebuilt via `docker compose up -d --build`. Synced the same
`main.py` to the portable repo so the laptop UI stays in sync.
### What changed in the UI
- **`min score` select** — values: any, 2+, 3+, 4+, 5. Default `any` to
preserve old search behavior. Filters surface 1,096 mid-and-above
pairs at `3+` or 523 useful pairs at `4+`.
- **`hide banter` checkbox** — when checked, drops the 606 rows with
`is_banter=1`.
- **Score badge per Q/A hit** — small color-coded number (1=red, 5=green)
next to each hit's metadata line. Title attribute shows
`usefulness N/5` on hover.
- **Topic class tag** — small gray pill showing `computer-help`,
`banter`, `off-topic`, `promo`, or `unclear`.
- **Dimmed rendering** — hits with score 1-2 or `is_banter=true` render
at 55% opacity. Visible but visually de-emphasized so good hits stand
out at a glance.
- **`escapeHtml` helper** — defensive XSS guard on `caller_name` and
`title` (transcript-derived strings).
### Key Decisions (this update)
- **Default filter "any"** — preserves prior search habits and saved
URLs. Mike opts into filtering when needed rather than being forced
into a curated view.
- **`URLSearchParams` instead of string concat** — only emits
`min_score=` / `exclude_banter=` when non-default, keeping URL bar
clean for the common case.
- **Color-coded badge with both score AND topic tag** — score is
numeric/comparable; topic tag is categorical and explains *why* a
score is what it is. Both together make the classifier's reasoning
visible at a glance without forcing a click.
- **Dim instead of hide for low-quality hits** — keeps everything
visible by default; the filter controls are the explicit "hide" lever.
- **Used `plink "cat > path"` instead of pscp** for the deploy when
pscp failed — faster than diagnosing the underlying scp/shfs issue
and gets the job done deterministically.
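The "only emit non-default parameters" rule is implemented in the page's JavaScript with `URLSearchParams`. For scripted queries against `/api/search`, a Python analog might look like the sketch below. The parameter names come from the endpoint table; the function name, its defaults, and the choice to always send `q` and `kind` are assumptions for illustration.

```python
from urllib.parse import urlencode


def build_search_url(base, q, kind="both", min_score=None,
                     exclude_banter=False, limit=None):
    """Build an /api/search URL, adding filters only when they are set.

    Hypothetical helper; it mirrors the UI's behavior of leaving
    min_score and exclude_banter out of the URL at their defaults.
    """
    params = {"q": q, "kind": kind}
    if min_score is not None:
        params["min_score"] = min_score
    if exclude_banter:
        params["exclude_banter"] = "true"
    if limit is not None:
        params["limit"] = limit
    # urlencode percent-escapes values and preserves insertion order.
    return f"{base}/api/search?{urlencode(params)}"
```

With no filters set, `build_search_url("http://172.16.3.20:8765", "BIOS")` yields a URL carrying only `q` and `kind`, matching the clean-URL goal described above.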
### Problems Encountered (this update)
- **pscp ENOSPC despite 37 TB free** — `pscp main.py` failed with
`No space left on device` on two retries. df showed 37 TB free on
`/mnt/user`, df -i showed inodes fine. Workaround:
`plink ... "cat > /path/main.py" < local_main.py`. md5sum confirmed
byte-identical post-transfer. Likely Unraid shfs cache-pool churn or
an issue with overwriting an in-use file from inside a container's
mount. Worth understanding eventually but didn't block the deploy.
- **plink output buffering on chained docker commands** — long
`docker compose up -d --build` runs hung from Bash's run_in_background
view (output file stayed empty for minutes). Foreground sync run with
the same command worked instantly. Same pattern observed yesterday.
Workaround: don't background long plink runs; just block.
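The post-transfer md5 check mentioned above can be scripted. The sketch below hashes the local file and compares it with a line of `md5sum` output captured from the remote side (for example via plink). The function names are hypothetical, and how the remote line is obtained is left out.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_remote(local_path, remote_md5sum_output):
    """Compare a local file against one line of `md5sum` output.

    md5sum prints "<hex digest>  <filename>", so the first
    whitespace-separated field is the hash.
    """
    remote_digest = remote_md5sum_output.split()[0].lower()
    return md5_of_file(local_path) == remote_digest
```

MD5 is used here only as a transfer-integrity check, in line with what the session did; it is not a security guarantee.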
### Files Changed (this update)
| Path | Change |
|---|---|
| `projects/radio-show/audio-processor/server/main.py` | +51 / -4 — INDEX_HTML gained controls + badge styles + topic tag + escapeHtml + dim-class JS rendering |
| `c:/Users/guru/radio-archive-portable/server/main.py` | Same diff, synced from upstream |
### Commits (this update)
| Repo | SHA | Branch |
|---|---|---|
| ClaudeTools | `b9af34f` | main |
| radio-archive-portable | `1d6c795` | main |
### Live verification
```
$ curl -s http://172.16.3.20:8765/ | grep -cE "min_score|exclude_banter|badge.s5|topic_class"
10
$ curl -so /dev/null -w "%{size_download}\n" http://172.16.3.20:8765/
5757 # was 4040 before
$ curl -s 'http://172.16.3.20:8765/api/search?q=BIOS&kind=qa&min_score=4&limit=2' \
| python -c "import sys,json; d=json.load(sys.stdin); print('hits:', len(d['qa']))"
hits: 2 # both score=5, topic_class='computer-help'
```
### Status at update end
- UI controls live on http://172.16.3.20:8765/ and on the portable repo
- Backend filters working (verified end-to-end)
- Untouched: HTML still has no per-hit deep-link to `/api/episodes/{id}`
(clicking a hit doesn't navigate). Future enhancement.
- Pending: laptop validation (still next week's task)