# Session Log 2026-04-30: Portable Laptop Bundle + /api/db.sqlite Deploy

**Project:** The Computer Guru Show: Archive Mining System
**Goal:** Make the search service usable from a laptop next week, including offline; ship it as a separate repo and add a DB-fetch endpoint to the upstream container
**Machine:** GURU-BEAST-ROG (RTX 4090)
**User:** Mike Swanson (mike)
**Continues from:** `2026-04-29-qa-quality-classifier.md` (which covered the 3.5-hour qwen3:14b classifier run that scored 1,405 of the 1,407 Q/A pairs)

---

## User

- **User:** Mike Swanson (mike)
- **Machine:** GURU-BEAST-ROG
- **Role:** admin

---

## Session Summary

The radio-archive search service needed to become portable so Mike could use it from a laptop next week, including offline on a plane or in a conference room. Three options were proposed: (1) install Tailscale on Jupiter, (2) use the existing Tailscale subnet routing on the office router, or (3) ship a self-contained laptop copy. Mike clarified that Tailscale was already running on the office router, covering Jupiter's subnet, and then asked to "box up the offline version" for the Dell 5070, in its own repo if needed.

Verified the Tailscale state with `tailscale status --json`: pfsense-2 (`100.119.153.74`) advertises `172.16.0.0/22` as a primary route, and mike's macbook-air and acg-guru-5070 are both existing tailnet members. No subnet configuration changes were needed. The existing container's bind to `172.16.3.20:8765` already accepts subnet-routed traffic without modification.

Built a new private Gitea repo, `azcomputerguru/radio-archive-portable`, with eight files: the server code (`server/main.py` and `server/requirements.txt`, identical to upstream); a `sync-db.sh` that curl-fetches `archive.db` from a new `/api/db.sqlite` endpoint; a `run.sh` that creates a venv on first invocation and starts uvicorn on `localhost:8765`; plus `README.md`, `.env.example`, `.gitignore`, and an `archive-data` placeholder. The DB itself is gitignored (60 MB; fetched on demand, never committed). The repo was created via the Gitea API and the initial commit pushed.
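
The `tailscale status --json` check can be made independent of grep by parsing the JSON directly. This is an illustrative sketch, not code from either repo: `primary_routers`, `covering_router`, and `load_status` are hypothetical names, and it assumes each entry in the status JSON's `Peer` map exposes `HostName` and `PrimaryRoutes` (the log confirms only that `172.16.0.0/22` appeared under `PrimaryRoutes`). It also answers the question that matters for the laptop: whether Jupiter's `172.16.3.20` falls inside an advertised route. A `/22` fixes the first 22 bits, so `172.16.0.0/22` spans `172.16.0.0` through `172.16.3.255`, and `172.16.3.20` is inside it.

```python
from __future__ import annotations

import ipaddress
import json
import subprocess


def primary_routers(status: dict) -> dict[str, list[str]]:
    """Map each peer's HostName to the subnet routes it is currently primary for."""
    routers: dict[str, list[str]] = {}
    for peer in status.get("Peer", {}).values():
        routes = peer.get("PrimaryRoutes") or []
        if routes:
            routers[peer.get("HostName", "?")] = list(routes)
    return routers


def covering_router(status: dict, target_ip: str) -> str | None:
    """Return the HostName of the primary router whose route contains target_ip."""
    ip = ipaddress.ip_address(target_ip)
    for host, routes in primary_routers(status).items():
        # Membership checks between IPv4 and IPv6 simply return False.
        if any(ip in ipaddress.ip_network(r, strict=False) for r in routes):
            return host
    return None


def load_status() -> dict:
    """Run the tailscale CLI (assumed to be on PATH) and parse its JSON output."""
    result = subprocess.run(
        ["tailscale", "status", "--json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)
```

On the laptop, `covering_router(load_status(), "172.16.3.20")` should return the pfsense-2 node's hostname while the route is advertised, and `None` if it is not.
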

Added the `/api/db.sqlite` endpoint to the upstream `server/main.py` using FastAPI's `FileResponse`. The rationale is disclosure equivalence: anyone who can reach `/api/search` already has full transcript access, so exposing the SQLite blob adds nothing meaningful. It also avoided needing SSH keys or stored credentials on the laptop side. Deployed to Jupiter (pscp'd `main.py` and the classified `archive.db`, then ran `docker compose up -d --build`). Verified end to end: `GET /api/db.sqlite` returns 200 with 60,583,936 bytes; the fetched DB contains all 1,405 classifier-scored rows intact; and `GET /api/search?min_score=4` filters correctly, with the new fields in the response.

---

## Key Decisions

- **Subnet routing already in place:** confirmed via `tailscale status --json` that pfsense-2 advertises `172.16.0.0/22` as a primary route. No new daemons or routing changes were required. The container's bind to `172.16.3.20:8765` is sufficient because Tailscale traffic destined for that IP leaves the router on its LAN side and hits the existing listener.
- **`/api/db.sqlite` over HTTP instead of SSH/SCP for the DB sync:** keeps everything on the same Tailscale-routed port, with no SSH key management and no stored passwords on the laptop. Because `/api/search` already returns every transcript, both endpoints disclose the same data, so no auth was added to either.
- **Separate repo for the portable bundle:** keeps the laptop install flow simple (clone, then run two scripts) and avoids cloning the 100+ GB ClaudeTools monorepo onto a travel laptop. The repo lives at `git.azcomputerguru.com/azcomputerguru/radio-archive-portable` (private, under the user namespace).
- **DB excluded from the repo via `.gitignore`:** the 60 MB blob is fetched by `sync-db.sh` on first run, so the repo stays at ~15 KB. The fetch is idempotent and atomic (download to `.partial`, validate size, rename into place).
- **Used `docker compose up -d --build` (combined) instead of separate `build` then `up`:** on a previous attempt, the separate commands chained through plink either silently buffered or failed to trigger a rebuild, and the container kept running 2-hour-old code. The combined form was reliable.
- **Stripped the API token from `.git/config` after the push:** the token had been embedded in the origin URL for the initial push; it was replaced with the bare HTTPS URL afterward so it doesn't sit in plain text. Future pushes will go through a Git credential helper or an interactive prompt.

---

## Problems Encountered

- **First deploy attempt landed, but the rebuild didn't happen:** chaining `docker compose build && docker compose up -d` via plink exited with code 0, but the container kept running yesterday's code (verified by `docker exec radio-archive grep db.sqlite /app/main.py` returning nothing). Likely BuildKit output buffering or a plink session quirk. Resolved by running `docker compose up -d --build` as a single foreground command.
- **Bash background-task output capture flaky on long plink runs:** early deploy attempts ran in the Bash tool's `run_in_background` mode, but the output file stayed empty for minutes even after the underlying SSH session had completed. Worked around by running shorter commands synchronously.
- **`/tmp` path clash between Git Bash and Windows Python:** a smoke test fetched the DB with curl into `/tmp/test-db.sqlite` and then tried to read it with `python -c` using the same `/tmp/...` path. The two tools resolved `/tmp` to different directories on Windows. Switched to a project-local `test-fetched.db` path to avoid the issue.
- **Gitea API at `/api/v1/orgs/azcomputerguru/repos` returned 404:** `azcomputerguru` is a user, not an org. Repo creation succeeded via `/api/v1/user/repos` instead (the token's owner is `azcomputerguru`, so user-namespace creation worked).
- **`HEAD /api/db.sqlite` returns 405 Method Not Allowed:** a FastAPI route declared for GET does not also answer HEAD.
  That is fine because the sync script uses `GET`. Documented behavior, not a bug.

---

## Credentials Used

### Jupiter (Unraid Primary)

- **Vault path:** `infrastructure/jupiter-unraid-primary.sops.yaml`
- **Host:** 172.16.3.20
- **User:** root
- **Password:** in vault (see path above)
- **iDRAC:** 172.16.1.73 / root / password in vault

### Gitea

- **Vault path:** `services/gitea.sops.yaml`
- **URL:** https://git.azcomputerguru.com
- **Username:** `azcomputerguru`
- **Password:** in vault (see path above)
- **API token (used this session):** in vault; exported as `$GITEA_TOKEN` for the commands below
- **SSH:** `ssh://git@172.16.3.20:2222`

### New Repo

- **Clone URL:** https://git.azcomputerguru.com/azcomputerguru/radio-archive-portable.git
- **SSH URL:** `git@172.16.3.21:azcomputerguru/radio-archive-portable.git`
- **Visibility:** private
- **Default branch:** main

---

## Infrastructure Touched

| Host | Address | Role | Action |
|---|---|---|---|
| Jupiter (Unraid Primary) | 172.16.3.20 | Hypervisor + Docker host | pscp'd updated `main.py` + `archive.db`; `docker compose up -d --build` |
| Radio-archive container | Container on Jupiter, bind `172.16.3.20:8765` | FastAPI + SQLite | Rebuilt with new endpoint; restarted with classifier-populated DB |
| Gitea (on Jupiter, port 3000) | git.azcomputerguru.com | Source hosting | New repo created via API |
| pfsense-2 router | Tailscale `100.119.153.74` | Subnet router | No changes; verified existing 172.16.0.0/22 advertisement |

### Tailscale state at session time

```
100.101.122.4    guru-beast-rog      (this machine, online)
100.65.158.123   mikes-macbook-air   (last seen 4m before check)
100.95.216.79    acg-guru-5070       (offline, last seen 30d ago; boot it up next week)
100.119.153.74   pfsense-2           (active; advertises 172.16.0.0/22 as PRIMARY)
```

---

## Files Created / Modified

### New repo: `radio-archive-portable/`

| Path | Purpose |
|---|---|
| `README.md` | Quick-start, refresh procedure, architecture diagram |
| `server/main.py` | Identical to deployed upstream (with `/api/db.sqlite`) |
| `server/requirements.txt` | `fastapi==0.115.6`, `uvicorn[standard]==0.34.0` |
| `sync-db.sh` | `curl -fSL -o archive-data/archive.db.partial $URL && mv` (atomic) |
| `run.sh` | Creates `.venv` on first run, then `uvicorn server.main:app --host 127.0.0.1 --port 8765` |
| `.env.example` | `ARCHIVE_HOST=172.16.3.20:8765`, `ARCHIVE_DB=archive-data/archive.db`, `PORT=8765` |
| `.gitignore` | Excludes `archive-data/archive.db`, `.venv/`, `.env`, etc. |
| `archive-data/.gitkeep` | Placeholder so the directory exists in git but the DB file doesn't |

### ClaudeTools (upstream)

| Path | Change |
|---|---|
| `projects/radio-show/audio-processor/server/main.py` | +18 / -1: added `from fastapi.responses import FileResponse` and the `/api/db.sqlite` GET endpoint |

### Jupiter (deployed state)

| Path | Change |
|---|---|
| `/mnt/user/appdata/radio-archive/app/main.py` | Replaced (now matches `5e3b1a2`) |
| `/mnt/user/appdata/radio-archive/data/archive.db` | Replaced with classifier-populated copy (60,583,936 bytes, 1,405/1,407 scored) |
| Container `radio-archive` | Rebuilt to image `radio-archive:latest` (`sha256:dbb5ad62bdb1...`), running |

---

## Commands Run

### Tailscale verification (local)

```bash
tailscale status --json | grep -E "advertis|route|172\.|primary"
# Confirmed 172.16.0.0/22 listed under PrimaryRoutes
```

### New repo creation

```bash
curl -X POST "https://git.azcomputerguru.com/api/v1/user/repos" \
  -H "Authorization: token $GITEA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name":"radio-archive-portable","private":true,"default_branch":"main"}'
# HTTP 201, repo id 12

cd /c/Users/guru/radio-archive-portable
git init -b main
git config user.name "Mike Swanson"
git config user.email "mike@azcomputerguru.com"
git add -A && git commit
git remote add origin "https://azcomputerguru:$GITEA_TOKEN@git.azcomputerguru.com/azcomputerguru/radio-archive-portable.git"
git push -u origin main
git remote set-url origin https://git.azcomputerguru.com/azcomputerguru/radio-archive-portable.git   # strip token
```

### Jupiter deploy

```bash
"/c/Program Files/PuTTY/pscp.exe" -batch -pw "$PW" -scp \
  c:/Users/guru/ClaudeTools/projects/radio-show/audio-processor/server/main.py \
  root@172.16.3.20:/mnt/user/appdata/radio-archive/app/main.py

"/c/Program Files/PuTTY/pscp.exe" -batch -pw "$PW" -scp \
  c:/Users/guru/ClaudeTools/projects/radio-show/audio-processor/archive-data/archive.db \
  root@172.16.3.20:/mnt/user/appdata/radio-archive/data/archive.db
# 60.6 MB at ~580 KB/s = ~100 seconds

"/c/Program Files/PuTTY/plink.exe" -batch -ssh -pw "$PW" root@172.16.3.20 \
  "cd /mnt/user/appdata/radio-archive/app && docker compose up -d --build"
# Built radio-archive:latest sha256:dbb5ad62bdb1..., container Running
```

### Live verification

```bash
curl -sS http://172.16.3.20:8765/api/stats
# {"counts":{"episodes":572,"segments":60917,...},"by_year":[{"year":2010,...

curl -sS -o test-fetched.db -w "HTTP %{http_code} | dl=%{size_download}B\n" \
  http://172.16.3.20:8765/api/db.sqlite
# HTTP 200 | dl=60583936B

.venv/Scripts/python.exe -c "
import sqlite3
db = sqlite3.connect('test-fetched.db')
print(db.execute('SELECT COUNT(*) FROM qa_pairs WHERE usefulness_score IS NOT NULL').fetchone())
"
# (1405,)

curl -sS 'http://172.16.3.20:8765/api/search?q=BIOS&kind=qa&min_score=4&limit=2'
# returns 2 hits, each with usefulness_score=5, topic_class='computer-help'
```

---

## Pending / Next

1. **Test the laptop install end to end** when the 5070 boots up next week: confirm that `sync-db.sh` and `run.sh` work cleanly on Linux. Currently untested on the actual target machine.
2. **HTML index UI update:** the backend supports the `min_score` and `exclude_banter` query params, but the search UI on `/` doesn't expose them as toggles or show the score/topic_class on each hit. The backend is ready when the UI is.
3. **Re-run the 2 failed classifier rows:** re-invoking `classify_qa_quality.py` will retry the NULL-scored rows; a one-line cleanup.
4. **Track 2 (voice profile clustering):** still deferred. Lower priority, since the content-quality filter solved most of the search-quality problem.
5. **Track 3 (speaker oracle wiring through to the search UI):** still deferred. `speaker_oracle.py` resolves names from intros, but search results still show "CALLER" rather than the resolved name.

---

## Reference

### Endpoints (all live on http://172.16.3.20:8765/ as of this commit)

| Method | Path | Notes |
|---|---|---|
| GET | `/` | Search UI (no min_score toggle yet; the query string works manually) |
| GET | `/api/stats` | Counts and per-year breakdown |
| GET | `/api/episodes?year=YYYY&limit=N` | Episode list |
| GET | `/api/episodes/{id}` | Detail with intros + qa_pairs (now includes usefulness_score, topic_class, is_banter) |
| GET | `/api/episodes/{id}/transcript` | Chronological merged segments + turns |
| GET | `/api/search?q=...&kind=both\|segments\|qa&min_score=N&exclude_banter=true&limit=N` | FTS5 |
| GET | `/api/callers?limit=N` | Top recurring caller_names |
| GET | `/api/db.sqlite` | **NEW**: streams the read-only DB blob (60 MB) |

### Laptop next-week recipe (5070 / Linux)

```bash
# Tailscale already enabled on the laptop and on pfsense-2
git clone https://git.azcomputerguru.com/azcomputerguru/radio-archive-portable.git
cd radio-archive-portable
./sync-db.sh     # pulls from 172.16.3.20:8765/api/db.sqlite
./run.sh         # creates .venv, starts uvicorn on localhost:8765
xdg-open http://localhost:8765/   # from a second terminal if run.sh keeps uvicorn in the foreground
```

Refreshing: run `./sync-db.sh` any time. It is atomic, so a partial download won't corrupt the existing DB.

### macOS variant (mikes-macbook-air, if used)

Same recipe: `python3 -m venv` works on macOS; replace `xdg-open` with `open`.
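
`sync-db.sh` is a curl one-liner, which works on Linux and macOS alike. For comparison, here is a Python sketch of the same download-to-`.partial`, validate, rename pattern; it is not what the repo ships. `fetch_db` is a hypothetical name, and the two validation checks (the byte count matches `Content-Length` when the server sends one, and the file starts with the 16-byte SQLite header) are assumptions about what "validate size" should cover. FastAPI's `FileResponse` normally sends a `Content-Length` header. `os.replace` is atomic only when the `.partial` file and the destination share a filesystem, which holds here because both live in `archive-data/`.

```python
from __future__ import annotations

import os
import shutil
import urllib.request
from pathlib import Path

# Every SQLite 3 database file starts with these 16 bytes.
SQLITE_MAGIC = b"SQLite format 3\x00"


def fetch_db(url: str, dest: Path | str, timeout: float = 60.0) -> int:
    """Download url to dest via a .partial file; return the byte count."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    partial = dest.with_name(dest.name + ".partial")
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            expected = resp.headers.get("Content-Length")
            with open(partial, "wb") as out:
                shutil.copyfileobj(resp, out)
        size = partial.stat().st_size
        # Check 1: a truncated transfer is shorter than the advertised length.
        if expected is not None and size != int(expected):
            raise OSError(f"short download: {size} of {expected} bytes")
        # Check 2: the payload really is a SQLite file, not an error page.
        with open(partial, "rb") as f:
            if f.read(len(SQLITE_MAGIC)) != SQLITE_MAGIC:
                raise OSError("downloaded file is not a SQLite database")
        # Same-directory rename: readers see the old DB or the new one, never half.
        os.replace(partial, dest)
        return size
    except BaseException:
        # Leave the existing DB untouched and drop the incomplete download.
        partial.unlink(missing_ok=True)
        raise
```

Usage would be `fetch_db("http://172.16.3.20:8765/api/db.sqlite", "archive-data/archive.db")`. If any check fails, the previous `archive.db` stays in place, matching the "partial download won't corrupt the existing DB" property of the shell script.
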
### Jupiter redeploy procedure (when source or DB changes)

```bash
# Source change:
"/c/Program Files/PuTTY/pscp.exe" -batch -pw "$PW" -scp server/main.py \
  root@172.16.3.20:/mnt/user/appdata/radio-archive/app/
"/c/Program Files/PuTTY/plink.exe" -batch -ssh -pw "$PW" root@172.16.3.20 \
  "cd /mnt/user/appdata/radio-archive/app && docker compose up -d --build"

# DB-only change (no container restart needed):
"/c/Program Files/PuTTY/pscp.exe" -batch -pw "$PW" -scp archive-data/archive.db \
  root@172.16.3.20:/mnt/user/appdata/radio-archive/data/archive.db
```

The container opens SQLite with a `mode=ro` URI, so it picks up a fresh DB on the next request without a restart.

---

## Status at session end

- **Upstream container** rebuilt and running with the `/api/db.sqlite` endpoint live
- **Classified DB** deployed to Jupiter (1,405/1,407 scored)
- **Portable repo** created and pushed to `git.azcomputerguru.com/azcomputerguru/radio-archive-portable`
- **Laptop install** is a clone plus 2 shell scripts; untested on the actual 5070 (will validate next week)
- **ClaudeTools commits:** `5e3b1a2` (this session's `main.py` change)
- **Untested edge cases:** offline behavior (planes, no Tailscale), and curl over HTTP/2 to `/api/db.sqlite` (tested only with HTTP/1.1)

---

## Update 06:05: Index UI exposes classifier filters

User asked to wire the new classifier fields into the search UI. The backend already supported the `min_score` and `exclude_banter` query params (commit `5e3b1a2`); this update brings them into the HTML index and adds visible quality indicators on Q/A hits.

### Update Summary

Edited `INDEX_HTML` in `server/main.py` to add two filter controls and score badges. Verified locally via `uvicorn` on `127.0.0.1:8866` against the classifier-populated DB (no-filter, `min_score=4`, and `exclude_banter=true` modes all behaved correctly). Hit an unexpected `No space left on device` from `pscp` despite Jupiter having 37 TB free on `/mnt/user`; bypassed it by streaming the file through plink's stdin (`plink ... "cat > /path" < local_file`).
An md5 check confirmed the copy was byte-identical. The container was rebuilt via `docker compose up -d --build`, and the same `main.py` was synced to the portable repo so the laptop UI stays in sync.

### What changed in the UI

- **`min score` select:** values are any, 2+, 3+, 4+, and 5. Defaults to `any` to preserve the old search behavior. At `3+` the filter surfaces 1,096 mid-and-above pairs; at `4+`, 523 useful pairs.
- **`hide banter` checkbox:** when checked, drops the 606 rows with `is_banter=1`.
- **Score badge per Q/A hit:** a small color-coded number (1 = red, 5 = green) next to each hit's metadata line. Its title attribute shows `usefulness N/5` on hover.
- **Topic class tag:** a small gray pill showing `computer-help`, `banter`, `off-topic`, `promo`, or `unclear`.
- **Dimmed rendering:** hits with a score of 1-2 or `is_banter=true` render at 55% opacity. They stay visible but de-emphasized, so good hits stand out at a glance.
- **`escapeHtml` helper:** a defensive XSS guard on `caller_name` and `title` (transcript-derived strings).

### Key Decisions (this update)

- **Default filter "any":** preserves prior search habits and saved URLs. Mike opts into filtering when needed rather than being forced into a curated view.
- **`URLSearchParams` instead of string concatenation:** emits `min_score=` / `exclude_banter=` only when they differ from the defaults, keeping the URL bar clean for the common case.
- **Color-coded badge with both score and topic tag:** the score is numeric and comparable; the topic tag is categorical and explains *why* a score is what it is. Together they make the classifier's reasoning visible at a glance without requiring a click.
- **Dim instead of hide for low-quality hits:** keeps everything visible by default; the filter controls are the explicit "hide" lever.
- **Used `plink "cat > path"` instead of pscp** for the deploy when pscp failed: faster than diagnosing the underlying scp/shfs issue, and it gets the job done deterministically.
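
The UI itself is JavaScript inside `INDEX_HTML`, and that code is not reproduced here. As an illustration only, here is a Python analogue of the two rules described above: emit filter params only when non-default, and dim rather than hide low-quality hits. `search_url` and `is_dimmed` are hypothetical helpers; the parameter names follow the `/api/search` query string from the endpoint reference, while treating `kind=both` as the default and leaving unscored rows (`usefulness_score` of `None`) undimmed are assumptions.

```python
from __future__ import annotations

from urllib.parse import urlencode


def search_url(q: str, kind: str = "both", min_score: int | None = None,
               exclude_banter: bool = False, limit: int | None = None) -> str:
    """Build an /api/search URL, adding filter params only when non-default."""
    params: dict[str, str] = {"q": q, "kind": kind}
    if min_score is not None:  # "any" means: omit the param entirely
        params["min_score"] = str(min_score)
    if exclude_banter:  # unchecked box means: omit the param entirely
        params["exclude_banter"] = "true"
    if limit is not None:
        params["limit"] = str(limit)
    return "/api/search?" + urlencode(params)


def is_dimmed(hit: dict) -> bool:
    """Score 1-2 or banter renders dimmed (not hidden); unscored rows stay normal."""
    score = hit.get("usefulness_score")
    return bool(hit.get("is_banter")) or (score is not None and score <= 2)
```

For example, `search_url("BIOS", "qa", min_score=4, limit=2)` produces the same query string used in this session's live verification.
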
### Problems Encountered (this update)

- **pscp ENOSPC despite 37 TB free:** `pscp main.py` failed with `No space left on device` on two retries. `df` showed 37 TB free on `/mnt/user`, and `df -i` showed inodes were fine. Workaround: `plink ... "cat > /path/main.py" < local_main.py`; `md5sum` confirmed the copy was byte-identical. Likely Unraid shfs cache-pool churn, or an issue with overwriting an in-use file on a path mounted into the container. Worth understanding eventually, but it didn't block the deploy.
- **plink output buffering on chained docker commands:** long `docker compose up -d --build` runs appeared to hang in Bash's `run_in_background` view (the output file stayed empty for minutes). A synchronous foreground run of the same command returned promptly. Same pattern observed yesterday. Workaround: don't background long plink runs; just block.

### Files Changed (this update)

| Path | Change |
|---|---|
| `projects/radio-show/audio-processor/server/main.py` | +51 / -4: `INDEX_HTML` gained filter controls, badge styles, topic tag, `escapeHtml`, and dim-class JS rendering |
| `c:/Users/guru/radio-archive-portable/server/main.py` | Same diff, synced from upstream |

### Commits (this update)

| Repo | SHA | Branch |
|---|---|---|
| ClaudeTools | `b9af34f` | main |
| radio-archive-portable | `1d6c795` | main |

### Live verification

```
$ curl -s http://172.16.3.20:8765/ | grep -cE "min_score|exclude_banter|badge.s5|topic_class"
10
$ curl -so /dev/null -w "%{size_download}\n" http://172.16.3.20:8765/
5757      # was 4040 before
$ curl -s 'http://172.16.3.20:8765/api/search?q=BIOS&kind=qa&min_score=4&limit=2' \
    | python -c "import sys,json; d=json.load(sys.stdin); print('hits:', len(d['qa']))"
hits: 2   # both score=5, topic_class='computer-help'
```

### Status at update end

- UI controls live on http://172.16.3.20:8765/ and in the portable repo
- Backend filters working (verified end to end)
- Untouched: the HTML still has no per-hit deep link to `/api/episodes/{id}` (clicking a hit doesn't navigate). Future enhancement.
- Pending: laptop validation (still next week's task)
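
For next week's laptop validation, a quick post-sync sanity check could open the fetched DB the way the container is described as doing (a `mode=ro` URI) and count the scored rows. `open_readonly` and `scored_counts` are hypothetical helpers; the `qa_pairs` table and `usefulness_score` column come from this session's verification query, and `COUNT(usefulness_score)` counts only non-NULL values, so it matches the earlier `WHERE usefulness_score IS NOT NULL` count.

```python
from __future__ import annotations

import sqlite3
from pathlib import Path


def open_readonly(db_path: str | Path) -> sqlite3.Connection:
    """Open SQLite read-only via a URI (mode=ro); any write raises an error."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def scored_counts(db_path: str | Path) -> tuple[int, int]:
    """Return (scored, total) row counts from qa_pairs."""
    con = open_readonly(db_path)
    try:
        scored, total = con.execute(
            "SELECT COUNT(usefulness_score), COUNT(*) FROM qa_pairs"
        ).fetchone()
        return scored, total
    finally:
        con.close()
```

Run it against `archive-data/archive.db` after `./sync-db.sh`; given the counts recorded in this log, the classified DB would be expected to report `(1405, 1407)`.
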