
Mike Swanson 48c8b311bf radio: session log — portable laptop bundle + /api/db.sqlite deploy

New private Gitea repo `azcomputerguru/radio-archive-portable` for
laptop offline use. Upstream gained /api/db.sqlite for HTTP-only DB
sync (no SSH keys needed). Jupiter container rebuilt + restarted with
the classifier-populated DB; verified end-to-end (200 OK, 60.5 MB,
1,405 classifier rows intact, min_score filter working).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-30 05:37:01 -07:00

14 KiB


Session Log — 2026-04-30 — Portable Laptop Bundle + /api/db.sqlite Deploy

Project: The Computer Guru Show — Archive Mining System
Goal: Make the search service usable from a laptop next week, including offline; ship it as a separate repo and add a DB-fetch endpoint to the upstream container
Machine: GURU-BEAST-ROG (RTX 4090)
User: Mike Swanson (mike)
Continues from: 2026-04-29-qa-quality-classifier.md (which covered the 3.5h qwen3:14b classifier run that produced the 1,405-row scored DB)

User

User: Mike Swanson (mike)
Machine: GURU-BEAST-ROG
Role: admin

Session Summary

The radio-archive search service needed to become portable so Mike could use it from a laptop next week, including offline scenarios on a plane or in a conference room. Three options were proposed: (1) install Tailscale on Jupiter, (2) use existing Tailscale subnet routing on the office router, (3) ship a self-contained laptop copy. Mike clarified Tailscale was already running on the office router covering Jupiter's subnet, then asked to "box up the offline version" for the Dell 5070 — its own repo if needed.

Verified Tailscale state with tailscale status --json: pfsense-2 (100.119.153.74) advertises 172.16.0.0/22 as its primary route, and mikes-macbook-air and acg-guru-5070 are both existing tailnet members. No subnet configuration changes were needed. The existing container's bind to 172.16.3.20:8765 already accepts subnet-routed traffic without modification.

Built a new private Gitea repo azcomputerguru/radio-archive-portable with eight files: server code (identical to upstream), a sync-db.sh that curl-fetches archive.db from a new /api/db.sqlite endpoint, a run.sh that creates a venv on first invocation and starts uvicorn on localhost:8765, plus README, .env.example, .gitignore, archive-data placeholder. The DB itself is gitignored (60 MB; fetched on demand, never committed). Repo created via Gitea API, initial commit pushed.

Added the /api/db.sqlite endpoint to the upstream server/main.py using FastAPI FileResponse. Disclosure equivalence: anyone who can reach /api/search already has full transcript access, so exposing the SQLite blob adds nothing meaningful. This avoided needing SSH keys or stored credentials on the laptop side. Deployed to Jupiter (pscp'd main.py + classified archive.db, then docker compose up -d --build). Verified end-to-end: GET /api/db.sqlite returns 200 with 60,583,936 bytes; the fetched DB contains all 1,405 classifier rows intact; GET /api/search?min_score=4 filters correctly with the new fields in the response.
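For reference, a minimal sketch of what a GET-only DB download endpoint built on FastAPI's FileResponse can look like. This is not the deployed main.py: the `create_app` factory, the `DEFAULT_DB` path (borrowed from the `ARCHIVE_DB` value in `.env.example`), the 404 branch, and the media type are all assumptions made for illustration.

```python
from pathlib import Path

# Assumed location; the log only gives ARCHIVE_DB=archive-data/archive.db as an example value.
DEFAULT_DB = Path("archive-data/archive.db")


def create_app(db_path: Path = DEFAULT_DB):
    """Build a FastAPI app that serves the SQLite file at GET /api/db.sqlite."""
    # Imported lazily so this module can be loaded on a machine without FastAPI.
    from fastapi import FastAPI, HTTPException
    from fastapi.responses import FileResponse

    app = FastAPI()

    @app.get("/api/db.sqlite")  # registers GET only, so HEAD is answered with 405
    def download_db():
        if not db_path.is_file():
            # Hypothetical guard; the real endpoint may behave differently.
            raise HTTPException(status_code=404, detail="archive.db not found")
        return FileResponse(
            db_path,
            media_type="application/vnd.sqlite3",
            filename="archive.db",
        )

    return app
```

No authentication is shown, matching the disclosure-equivalence reasoning above; that choice only holds while the listener stays reachable solely over the LAN and Tailscale.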

Key Decisions

Subnet routing already in place — confirmed via tailscale status --json that pfsense-2 advertises 172.16.0.0/22 as primary route. No new daemons or routing changes required. Container bind to 172.16.3.20:8765 is sufficient because Tailscale traffic destined for that IP arrives via the router's LAN egress and hits the existing listener.
/api/db.sqlite over HTTP instead of SSH/SCP for the DB sync — keeps everything on the same Tailscale-routed port, no SSH key management, no stored passwords on the laptop. Disclosure equivalence with /api/search (which already returns every transcript) means no auth was added to either.
Separate repo for the portable bundle — keeps the laptop install-flow simple (clone + run two scripts) and avoids cloning the 100+ GB ClaudeTools monorepo on a travel laptop. Repo lives at git.azcomputerguru.com/azcomputerguru/radio-archive-portable (private, under the user namespace).
DB excluded from the repo via gitignore — the 60 MB blob is fetched via sync-db.sh on first run. Repo stays at ~15 KB. The fetch is idempotent and atomic (download to .partial, validate size, rename into place).
Used docker compose up -d --build (combined) instead of separate build then up — separate commands chained through plink either silently buffered or failed to trigger a rebuild on a previous attempt; container kept running 2-hour-old code. Combined form was reliable.
Stripped API token from .git/config after push — token had been embedded in the origin URL for the initial push; replaced with the bare HTTPS URL afterward so it doesn't sit in plain text. Future pushes will go via Gitea credential helper or interactive prompt.
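The token-stripping step in the last decision can also be done programmatically before a URL is ever written to .git/config. A small sketch using only the standard library; `strip_credentials` is a hypothetical helper, not something in either repo, and the example host is a placeholder.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_credentials(url: str) -> str:
    """Return the URL with any user:password@ prefix removed from the host part."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    # Rebuild the URL from everything except the userinfo component.
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


print(strip_credentials("https://someuser:sometoken@git.example.com/owner/repo.git"))
```

The result is what `git remote set-url origin ...` would be given. Note that this simple version lowercases the host and does not re-add brackets for IPv6 literals.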

Problems Encountered

First deploy attempt landed but rebuild didn't happen — chained docker compose build && docker compose up -d via plink completed exit-code-0 but the container kept running yesterday's code (verified via docker exec radio-archive grep db.sqlite /app/main.py returning nothing). Likely BuildKit output buffering or plink session quirks. Resolved by using docker compose up -d --build as a single foreground command.
Bash background-task output capture flaky on long plink runs — early deploy attempts went into the Bash tool's run_in_background mode but the output file stayed empty for minutes despite the underlying SSH session completing. Worked around by running shorter commands synchronously.
/tmp path clash between git-bash and Windows Python: a smoke-test command fetched the DB with curl into /tmp/test-db.sqlite and then tried to open it with python -c using the same /tmp/... path. Git-bash and native Windows Python resolve /tmp to different directories. Switched to a project-local test-fetched.db path to avoid the issue.
Gitea API at /api/v1/orgs/azcomputerguru/repos returned 404 — azcomputerguru is a USER, not an org. Repo creation succeeded via /api/v1/user/repos instead. (The token's owner is azcomputerguru, so user-namespace creation worked.)
HEAD /api/db.sqlite returns 405 Method Not Allowed: the route is registered with FastAPI's GET decorator, which does not add a HEAD handler. This is harmless because the sync script uses GET. Documented behavior, not a bug.

Credentials Used

Jupiter (Unraid Primary)

Vault path: infrastructure/jupiter-unraid-primary.sops.yaml
Host: 172.16.3.20
User: root
Password: Th1nk3r^99##
iDRAC IP: 172.16.1.73 / root / Window123!@#-idrac

Gitea

Vault path: services/gitea.sops.yaml
URL: https://git.azcomputerguru.com
Username: azcomputerguru
Password: Gptf*77ttb123!@#-git (alt: Window123!@#-git)
API token (used this session): 9b1da4b79a38ef782268341d25a4b6880572063f
SSH: ssh://git@172.16.3.20:2222

New Repo

Clone URL: https://git.azcomputerguru.com/azcomputerguru/radio-archive-portable.git
SSH URL: ssh://git@172.16.3.20:2222/azcomputerguru/radio-archive-portable.git
Visibility: private
Default branch: main

Infrastructure Touched

Host	IP	Role	Action
Jupiter (Unraid Primary)	172.16.3.20	Hypervisor + Docker host	pscp'd updated `main.py` + `archive.db`; `docker compose up -d --build`
Radio-archive container	container on Jupiter, bind `172.16.3.20:8765`	FastAPI + SQLite	Rebuilt with new endpoint; restarted with classifier-populated DB
Gitea (on Jupiter, port 3000)	git.azcomputerguru.com	Source hosting	New repo created via API
pfsense-2 router	(Tailscale `100.119.153.74`)	Subnet router	No changes — verified existing 172.16.0.0/22 advertisement

Tailscale state at session time

100.101.122.4    guru-beast-rog     (this machine, online)
100.65.158.123   mikes-macbook-air  (last seen 4m before check)
100.95.216.79    acg-guru-5070      (offline 30d ago — boot it up next week)
100.119.153.74   pfsense-2          (active; advertises 172.16.0.0/22 as PRIMARY)

Files Created / Modified

New repo: `radio-archive-portable/`

Path	Purpose
`README.md`	Quick-start, refresh procedure, architecture diagram
`server/main.py`	Identical to deployed upstream (with `/api/db.sqlite`)
`server/requirements.txt`	`fastapi==0.115.6`, `uvicorn[standard]==0.34.0`
`sync-db.sh`	`curl -fSL -o archive-data/archive.db.partial $URL && mv` (atomic)
`run.sh`	Creates `.venv` on first run, then `uvicorn server.main:app --host 127.0.0.1 --port 8765`
`.env.example`	`ARCHIVE_HOST=172.16.3.20:8765`, `ARCHIVE_DB=archive-data/archive.db`, `PORT=8765`
`.gitignore`	Excludes `archive-data/archive.db`, `.venv/`, `.env`, etc.
`archive-data/.gitkeep`	Placeholder so the dir exists in git but the DB file doesn't

ClaudeTools (upstream)

Path	Change
`projects/radio-show/audio-processor/server/main.py`	+18 / -1 — added `from fastapi.responses import FileResponse` and the `/api/db.sqlite` GET endpoint

Jupiter (deployed state)

Path	Change
`/mnt/user/appdata/radio-archive/app/main.py`	Replaced (now matches `5e3b1a2`)
`/mnt/user/appdata/radio-archive/data/archive.db`	Replaced with classifier-populated copy (60,583,936 bytes, 1,405/1,407 scored)
Container `radio-archive`	Rebuilt to image `radio-archive:latest` (`sha256:dbb5ad62bdb1...`), running

Commands Run

Tailscale verification (local)

tailscale status --json | grep -E "advertis|route|172\.|primary"
# Confirmed 172.16.0.0/22 listed under PrimaryRoutes
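The grep above is a quick eyeball check. A slightly stricter version, sketched below, parses the JSON and asks which peers have a primary route covering a given LAN address. It assumes the status output has a top-level `Peer` map whose entries carry `HostName` and `PrimaryRoutes`; only `PrimaryRoutes` is confirmed by this session, the rest of that layout is an assumption, and the sample data is made up to match the hosts named in this log.

```python
import ipaddress
import json


def peers_routing(status_json: str, target_ip: str) -> list[str]:
    """Return host names of peers whose PrimaryRoutes contain target_ip."""
    status = json.loads(status_json)
    target = ipaddress.ip_address(target_ip)
    hits = []
    for peer in (status.get("Peer") or {}).values():
        for route in peer.get("PrimaryRoutes") or []:
            if target in ipaddress.ip_network(route):
                hits.append(peer.get("HostName", "?"))
                break  # one matching route per peer is enough
    return hits


# Hypothetical sample shaped like the assumed layout, not real output.
sample = json.dumps({
    "Peer": {
        "nodekey:router": {"HostName": "pfsense-2", "PrimaryRoutes": ["172.16.0.0/22"]},
        "nodekey:laptop": {"HostName": "acg-guru-5070"},
    }
})
print(peers_routing(sample, "172.16.3.20"))  # ['pfsense-2']
```

172.16.0.0/22 spans 172.16.0.0 through 172.16.3.255, which is why Jupiter at 172.16.3.20 is covered without any new route.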

New repo creation

curl -X POST "https://git.azcomputerguru.com/api/v1/user/repos" \
  -H "Authorization: token 9b1da4b79a38ef782268341d25a4b6880572063f" \
  -d '{"name":"radio-archive-portable","private":true,"default_branch":"main"}'
# HTTP 201, repo id 12

cd /c/Users/guru/radio-archive-portable
git init -b main
git config user.name "Mike Swanson"
git config user.email "mike@azcomputerguru.com"
git add -A && git commit
git remote add origin https://azcomputerguru:<token>@git.azcomputerguru.com/azcomputerguru/radio-archive-portable.git
git push -u origin main
git remote set-url origin https://git.azcomputerguru.com/azcomputerguru/radio-archive-portable.git  # strip token
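The same repo-creation call can be expressed in Python, which makes the user-versus-org distinction explicit: the request goes to /api/v1/user/repos because azcomputerguru is a user account. This sketch only builds the request object and does not send it; `build_create_repo_request` is a hypothetical helper, and the explicit JSON Content-Type header is an addition that the curl command above did not set.

```python
import json
import urllib.request


def build_create_repo_request(base_url: str, token: str, name: str) -> urllib.request.Request:
    """Build (but do not send) a Gitea request that creates a private repo for the token's owner."""
    payload = {"name": name, "private": True, "default_branch": "main"}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/user/repos",  # user namespace, not /orgs/<name>/repos
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"token {token}",
            "Content-Type": "application/json",
        },
    )


req = build_create_repo_request("https://git.azcomputerguru.com", "<token>", "radio-archive-portable")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen(req)` would perform the call; keeping the token in an environment variable rather than in the command line avoids it landing in shell history.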

Jupiter deploy

"/c/Program Files/PuTTY/pscp.exe" -batch -pw "$PW" -scp \
  c:/Users/guru/ClaudeTools/projects/radio-show/audio-processor/server/main.py \
  root@172.16.3.20:/mnt/user/appdata/radio-archive/app/main.py

"/c/Program Files/PuTTY/pscp.exe" -batch -pw "$PW" -scp \
  c:/Users/guru/ClaudeTools/projects/radio-show/audio-processor/archive-data/archive.db \
  root@172.16.3.20:/mnt/user/appdata/radio-archive/data/archive.db
# 60.5 MB at ~580 KB/s = ~100 seconds

"/c/Program Files/PuTTY/plink.exe" -batch -ssh -pw "$PW" root@172.16.3.20 \
  "cd /mnt/user/appdata/radio-archive/app && docker compose up -d --build"
# Built radio-archive:latest sha256:dbb5ad62bdb1..., container Running

Live verification

curl -sS http://172.16.3.20:8765/api/stats
# {"counts":{"episodes":572,"segments":60917,...},"by_year":[{"year":2010,...

curl -sS -o test-fetched.db -w "HTTP %{http_code} | dl=%{size_download}B\n" \
  http://172.16.3.20:8765/api/db.sqlite
# HTTP 200 | dl=60583936B

.venv/Scripts/python.exe -c "
import sqlite3
db = sqlite3.connect('test-fetched.db')
print(db.execute('SELECT COUNT(*) FROM qa_pairs WHERE usefulness_score IS NOT NULL').fetchone())
"
# (1405,)

curl -sS 'http://172.16.3.20:8765/api/search?q=BIOS&kind=qa&min_score=4&limit=2'
# returns 2 hits, each with usefulness_score=5, topic_class='computer-help'
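The size and row-count checks above can be folded into one reusable helper for future refreshes. A sketch, assuming only the `qa_pairs` table and `usefulness_score` column already used in the query above; `check_fetched_db` is a hypothetical name.

```python
import os
import sqlite3
from contextlib import closing


def check_fetched_db(path: str) -> tuple[int, int]:
    """Return (file size in bytes, number of qa_pairs rows that have a score)."""
    size = os.path.getsize(path)
    with closing(sqlite3.connect(path)) as db:
        (scored,) = db.execute(
            "SELECT COUNT(*) FROM qa_pairs WHERE usefulness_score IS NOT NULL"
        ).fetchone()
    return size, scored
```

For the DB deployed this session, `check_fetched_db("test-fetched.db")` should report 60,583,936 bytes and 1,405 scored rows, matching the curl and sqlite3 output recorded above.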

Pending / Next

Test the laptop install end-to-end when the 5070 boots up next week — confirm sync-db.sh + run.sh work cleanly on Linux. Currently untested on the actual target machine.
HTML index UI update — backend supports min_score and exclude_banter query params, but the search UI on / doesn't expose them as toggles or show the score/topic_class on each hit. Backend is ready when the UI is.
Re-run the 2 failed classifier rows — classify_qa_quality.py re-invocation will retry the NULL-scored rows; one-line cleanup.
Track 2 (voice profile clustering) — still deferred. Lower priority since content-quality filter solved most of the search-quality problem.
Track 3 (speaker oracle wiring through to search UI) — still deferred. speaker_oracle.py resolves names from intros but the search results still show "CALLER" rather than the resolved name.

Reference

Endpoints (all live on http://172.16.3.20:8765/ as of this commit)

Method	Path	Notes
GET	`/`	Search UI (no min_score toggle yet — query string works manually)
GET	`/api/stats`	Counts and per-year breakdown
GET	`/api/episodes?year=YYYY&limit=N`	Episode list
GET	`/api/episodes/{id}`	Detail with intros + qa_pairs (now includes usefulness_score, topic_class, is_banter)
GET	`/api/episodes/{id}/transcript`	Chronological merged segments + turns
GET	`/api/search?q=...&kind=both\|segments\|qa&min_score=N&exclude_banter=true&limit=N`	FTS5
GET	`/api/callers?limit=N`	Top recurring caller_names
GET	`/api/db.sqlite`	NEW — streams the read-only DB blob (60 MB)
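Until the UI exposes the new filters, search URLs have to be assembled by hand. A small sketch of a builder for the /api/search query string, using only the parameter names listed in the table above; the `search_url` helper and its defaults are assumptions, not part of the server.

```python
from urllib.parse import urlencode


def search_url(base, q, kind="both", min_score=None, exclude_banter=False, limit=None):
    """Build an /api/search URL, including only the filters that were requested."""
    params = {"q": q, "kind": kind}
    if min_score is not None:
        params["min_score"] = min_score
    if exclude_banter:
        params["exclude_banter"] = "true"
    if limit is not None:
        params["limit"] = limit
    return f"{base.rstrip('/')}/api/search?{urlencode(params)}"


print(search_url("http://172.16.3.20:8765", "BIOS", kind="qa", min_score=4, limit=2))
# http://172.16.3.20:8765/api/search?q=BIOS&kind=qa&min_score=4&limit=2
```

`urlencode` also takes care of escaping multi-word queries, which is easy to get wrong when typing the URL manually.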

Laptop next-week recipe (5070 / Linux)

# Tailscale already enabled on the laptop and on pfsense-2
git clone https://git.azcomputerguru.com/azcomputerguru/radio-archive-portable.git
cd radio-archive-portable
./sync-db.sh         # pulls from 172.16.3.20:8765/api/db.sqlite
./run.sh             # creates .venv, starts uvicorn on localhost:8765
xdg-open http://localhost:8765/

Refreshing: ./sync-db.sh any time. Atomic — partial download won't corrupt existing DB.
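The atomic behaviour comes from the download-to-.partial, validate, rename pattern. The real sync-db.sh is a curl-plus-mv shell script; the sketch below is a Python rendering of the same idea, with the `sync_db` name and the `min_bytes` sanity threshold as illustrative assumptions.

```python
import os
import shutil
import urllib.request


def sync_db(url: str, dest: str, min_bytes: int = 1) -> int:
    """Download url to dest atomically and return the number of bytes written."""
    partial = dest + ".partial"
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    try:
        with urllib.request.urlopen(url) as resp, open(partial, "wb") as out:
            shutil.copyfileobj(resp, out)
        size = os.path.getsize(partial)
        if size < min_bytes:
            raise ValueError(f"download too small: {size} bytes")
        # os.replace is an atomic rename when source and target share a filesystem,
        # so readers see either the old DB or the complete new one.
        os.replace(partial, dest)
    finally:
        # Clean up a leftover partial file after any failure.
        if os.path.exists(partial):
            os.remove(partial)
    return size
```

A call such as `sync_db("http://172.16.3.20:8765/api/db.sqlite", "archive-data/archive.db", min_bytes=1_000_000)` would mirror the shell script; if the fetch fails partway, the existing archive.db is left untouched.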

macOS variant (mikes-macbook-air, if used)

Same recipe. python3 -m venv works on Mac. xdg-open → open.

Jupiter redeploy procedure (when source or DB changes)

# Source change:
"/c/Program Files/PuTTY/pscp.exe" -pw <pw> -scp server/main.py \
  root@172.16.3.20:/mnt/user/appdata/radio-archive/app/
"/c/Program Files/PuTTY/plink.exe" -ssh -pw <pw> root@172.16.3.20 \
  "cd /mnt/user/appdata/radio-archive/app && docker compose up -d --build"

# DB-only change (no container restart needed):
"/c/Program Files/PuTTY/pscp.exe" -pw <pw> -scp archive-data/archive.db \
  root@172.16.3.20:/mnt/user/appdata/radio-archive/data/archive.db

The SQLite connection on the container side is mode=ro URI — picks up fresh DB on next request without restart.
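For context, this is roughly what a read-only URI connection looks like with Python's sqlite3 module. The `open_readonly` helper is hypothetical. The "fresh DB on next request" behaviour also assumes the server opens a new connection per request rather than holding one open, which this log does not confirm.

```python
import sqlite3
from pathlib import Path


def open_readonly(db_path: str) -> sqlite3.Connection:
    """Open an existing SQLite file read-only via a file: URI."""
    # as_uri() needs an absolute path and handles percent-encoding for us.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

With `mode=ro`, any write attempt raises `sqlite3.OperationalError`, and opening a path that does not exist fails instead of silently creating an empty database.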

Status at session end

Upstream container rebuilt + running with /api/db.sqlite endpoint live
Classified DB deployed to Jupiter (1,405/1,407 scored)
Portable repo created and pushed to git.azcomputerguru.com/azcomputerguru/radio-archive-portable
Laptop install is a clone + 2 shell scripts; untested on the actual 5070 (will validate next week)
ClaudeTools commits: 5e3b1a2 (this session's main.py change)
Untested edge cases: offline behavior (planes, no Tailscale), curl with HTTP/2 to /api/db.sqlite (was tested with HTTP/1.1)
