From 48c8b311bf7e0c298e020c88daeee583058c8b0b Mon Sep 17 00:00:00 2001
From: Mike Swanson
Date: Thu, 30 Apr 2026 05:37:01 -0700
Subject: [PATCH] =?UTF-8?q?radio:=20session=20log=20=E2=80=94=20portable?=
 =?UTF-8?q?=20laptop=20bundle=20+=20/api/db.sqlite=20deploy?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

New private Gitea repo `azcomputerguru/radio-archive-portable` for laptop offline use. Upstream gained /api/db.sqlite for HTTP-only DB sync (no SSH keys needed). Jupiter container rebuilt + restarted with the classifier-populated DB; verified end-to-end (200 OK, 60.5 MB, 1,405 classifier rows intact, min_score filter working).

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .../session-logs/2026-04-30-session.md | 252 ++++++++++++++++++
 1 file changed, 252 insertions(+)
 create mode 100644 projects/radio-show/audio-processor/session-logs/2026-04-30-session.md

diff --git a/projects/radio-show/audio-processor/session-logs/2026-04-30-session.md b/projects/radio-show/audio-processor/session-logs/2026-04-30-session.md
new file mode 100644
index 0000000..7ad22ae
--- /dev/null
+++ b/projects/radio-show/audio-processor/session-logs/2026-04-30-session.md
@@ -0,0 +1,252 @@
+# Session Log — 2026-04-30 — Portable Laptop Bundle + /api/db.sqlite Deploy
+
+**Project:** The Computer Guru Show — Archive Mining System
+**Goal:** Make the search service usable from a laptop next week, including offline; ship it as a separate repo and add a DB-fetch endpoint to the upstream container
+**Machine:** GURU-BEAST-ROG (RTX 4090)
+**User:** Mike Swanson (mike)
+**Continues from:** `2026-04-29-qa-quality-classifier.md` (which covered the 3.5h qwen3:14b classifier run that produced the 1,405-row scored DB)
+
+---
+
+## User
+- **User:** Mike Swanson (mike)
+- **Machine:** GURU-BEAST-ROG
+- **Role:** admin
+
+---
+
+## Session Summary
+
+The radio-archive search service needed to become portable so Mike could use it from a laptop next week, including offline scenarios on a plane or in a conference room. Three options were proposed: (1) install Tailscale on Jupiter, (2) use existing Tailscale subnet routing on the office router, (3) ship a self-contained laptop copy. Mike clarified Tailscale was already running on the office router covering Jupiter's subnet, then asked to "box up the offline version" for the Dell 5070 — its own repo if needed.
+
+Verified Tailscale state from `tailscale status --json` — pfsense-2 (`100.119.153.74`) advertises `172.16.0.0/22` as PRIMARY ROUTE, mike's macbook-air and acg-guru-5070 are both existing tailnet members. No subnet configuration changes needed. The existing container's bind to `172.16.3.20:8765` already accepts subnet-routed traffic without modification.
+
+Built a new private Gitea repo `azcomputerguru/radio-archive-portable` with eight files: server code (identical to upstream), a `sync-db.sh` that curl-fetches `archive.db` from a new `/api/db.sqlite` endpoint, a `run.sh` that creates a venv on first invocation and starts uvicorn on `localhost:8765`, plus README, .env.example, .gitignore, archive-data placeholder. The DB itself is gitignored (60 MB; fetched on demand, never committed). Repo created via Gitea API, initial commit pushed.
+
+Added the `/api/db.sqlite` endpoint to the upstream `server/main.py` using FastAPI `FileResponse`. Disclosure equivalence: anyone who can reach `/api/search` already has full transcript access, so exposing the SQLite blob adds nothing meaningful. This avoided needing SSH keys or stored credentials on the laptop side. Deployed to Jupiter (pscp'd `main.py` + classified `archive.db`, then `docker compose up -d --build`). Verified end-to-end: `GET /api/db.sqlite` returns 200 with 60,583,936 bytes; the fetched DB contains all 1,405 classifier rows intact; `GET /api/search?min_score=4` filters correctly with the new fields in the response.
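The end-to-end check described in the summary (byte count first, then scored-row count) can be sketched in Python. This is a minimal sketch, not code from the repo: it assumes the `qa_pairs` table and `usefulness_score` column used in this session's verification query, and the function name `check_fetched_db` and the file-header check are illustrative additions.

```python
import sqlite3
from pathlib import Path

# Every SQLite 3 database file starts with this 16-byte header.
SQLITE_MAGIC = b"SQLite format 3\x00"

def check_fetched_db(path, expected_size=None):
    """Return the number of scored qa_pairs rows; raise ValueError if the file looks wrong.

    check_fetched_db is a hypothetical helper name, not part of the portable repo.
    """
    p = Path(path)
    size = p.stat().st_size
    if expected_size is not None and size != expected_size:
        raise ValueError(f"size mismatch: got {size}, expected {expected_size}")
    with p.open("rb") as fh:
        if fh.read(16) != SQLITE_MAGIC:
            raise ValueError("not a SQLite 3 database")
    db = sqlite3.connect(str(p))
    try:
        (count,) = db.execute(
            "SELECT COUNT(*) FROM qa_pairs WHERE usefulness_score IS NOT NULL"
        ).fetchone()
    finally:
        db.close()
    return count
```

Against the DB fetched in this session, `check_fetched_db("test-fetched.db", expected_size=60583936)` would be expected to return the same count as the manual query in the Commands Run section.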
+
+---
+
+## Key Decisions
+
+- **Subnet routing already in place** — confirmed via `tailscale status --json` that pfsense-2 advertises `172.16.0.0/22` as primary route. No new daemons or routing changes required. Container bind to `172.16.3.20:8765` is sufficient because Tailscale traffic destined for that IP arrives via the router's LAN egress and hits the existing listener.
+- **`/api/db.sqlite` over HTTP instead of SSH/SCP for the DB sync** — keeps everything on the same Tailscale-routed port, no SSH key management, no stored passwords on the laptop. Disclosure equivalence with `/api/search` (which already returns every transcript) means no auth was added to either.
+- **Separate repo for the portable bundle** — keeps the laptop install-flow simple (clone + run two scripts) and avoids cloning the 100+ GB ClaudeTools monorepo on a travel laptop. Repo lives at `git.azcomputerguru.com/azcomputerguru/radio-archive-portable` (private, under the user namespace).
+- **DB excluded from the repo via gitignore** — the 60 MB blob is fetched via `sync-db.sh` on first run. Repo stays at ~15 KB. The fetch is idempotent and atomic (download to `.partial`, validate size, rename into place).
+- **Used `docker compose up -d --build` (combined) instead of separate `build` then `up`** — separate commands chained through plink either silently buffered or failed to trigger a rebuild on a previous attempt; container kept running 2-hour-old code. Combined form was reliable.
+- **Stripped API token from `.git/config` after push** — token had been embedded in the origin URL for the initial push; replaced with the bare HTTPS URL afterward so it doesn't sit in plain text. Future pushes will go via Gitea credential helper or interactive prompt.
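The atomic fetch described in the decisions above (download to `.partial`, validate size, rename into place) can be sketched in Python. The real `sync-db.sh` is a shell script built on `curl`; this is only an illustration of the same pattern, and the function name `sync_db` and the `min_bytes` threshold are assumptions, not taken from the repo.

```python
import os
import shutil
import urllib.request

def sync_db(url, dest, min_bytes=1):
    """Download url to dest atomically and return the downloaded size in bytes.

    sync_db and min_bytes are hypothetical; the repo's sync-db.sh uses curl and mv.
    """
    partial = dest + ".partial"
    try:
        # urlopen raises on HTTP errors, similar in spirit to curl -f.
        with urllib.request.urlopen(url) as resp, open(partial, "wb") as out:
            shutil.copyfileobj(resp, out)
        size = os.path.getsize(partial)
        if size < min_bytes:
            raise ValueError(f"download too small: {size} bytes")
        # os.replace is atomic when both paths are on the same filesystem,
        # so readers see either the old DB or the new one, never a partial file.
        os.replace(partial, dest)
    finally:
        # After a successful rename the partial file is gone; after a failure, clean it up.
        if os.path.exists(partial):
            os.remove(partial)
    return size
```

A call such as `sync_db("http://172.16.3.20:8765/api/db.sqlite", "archive-data/archive.db", min_bytes=1_000_000)` would mirror the laptop workflow; a failed or truncated download leaves any existing `archive.db` untouched.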
+
+---
+
+## Problems Encountered
+
+- **First deploy attempt landed but rebuild didn't happen** — chained `docker compose build && docker compose up -d` via plink completed exit-code-0 but the container kept running yesterday's code (verified via `docker exec radio-archive grep db.sqlite /app/main.py` returning nothing). Likely BuildKit output buffering or plink session quirks. Resolved by using `docker compose up -d --build` as a single foreground command.
+- **Bash background-task output capture flaky on long plink runs** — early deploy attempts went into the Bash tool's `run_in_background` mode but the output file stayed empty for minutes despite the underlying SSH session completing. Worked around by running shorter commands synchronously.
+- **`/tmp` path clash between git-bash and Windows Python** — a smoke-test command tried to fetch the DB via curl (using `/tmp/test-db.sqlite`) and then read it with `python -c` (also writing `/tmp/...`). Different tools resolved `/tmp` differently on Windows. Switched to a project-local `test-fetched.db` path to avoid the issue.
+- **Gitea API at `/api/v1/orgs/azcomputerguru/repos` returned 404** — `azcomputerguru` is a USER, not an org. Repo creation succeeded via `/api/v1/user/repos` instead. (The token's owner is `azcomputerguru`, so user-namespace creation worked.)
+- **`HEAD /api/db.sqlite` returns 405 Method Not Allowed** — FastAPI's default routing only registers GET. A `HEAD` is fine to fail because the sync script uses `GET`. Documented behavior, not a bug.
+
+---
+
+## Credentials Used
+
+### Jupiter (Unraid Primary)
+- **Vault path:** `infrastructure/jupiter-unraid-primary.sops.yaml`
+- **Host:** 172.16.3.20
+- **User:** root
+- **Password:** `Th1nk3r^99##`
+- **iDRAC IP:** 172.16.1.73 / root / `Window123!@#-idrac`
+
+### Gitea
+- **Vault path:** `services/gitea.sops.yaml`
+- **URL:** https://git.azcomputerguru.com
+- **Username:** `azcomputerguru`
+- **Password:** `Gptf*77ttb123!@#-git` (alt: `Window123!@#-git`)
+- **API token (used this session):** `9b1da4b79a38ef782268341d25a4b6880572063f`
+- **SSH:** `ssh://git@172.16.3.20:2222`
+
+### New Repo
+- **Clone URL:** https://git.azcomputerguru.com/azcomputerguru/radio-archive-portable.git
+- **SSH URL:** `git@172.16.3.21:azcomputerguru/radio-archive-portable.git`
+- **Visibility:** private
+- **Default branch:** main
+
+---
+
+## Infrastructure Touched
+
+| Host | IP | Role | Action |
+|---|---|---|---|
+| Jupiter (Unraid Primary) | 172.16.3.20 | Hypervisor + Docker host | pscp'd updated `main.py` + `archive.db`; `docker compose up -d --build` |
+| Radio-archive container | container on Jupiter, bind `172.16.3.20:8765` | FastAPI + SQLite | Rebuilt with new endpoint; restarted with classifier-populated DB |
+| Gitea (on Jupiter, port 3000) | git.azcomputerguru.com | Source hosting | New repo created via API |
+| pfsense-2 router | (Tailscale `100.119.153.74`) | Subnet router | No changes — verified existing 172.16.0.0/22 advertisement |
+
+### Tailscale state at session time
+
+```
+100.101.122.4    guru-beast-rog      (this machine, online)
+100.65.158.123   mikes-macbook-air   (last seen 4m before check)
+100.95.216.79    acg-guru-5070       (offline 30d ago — boot it up next week)
+100.119.153.74   pfsense-2           (active; advertises 172.16.0.0/22 as PRIMARY)
+```
+
+---
+
+## Files Created / Modified
+
+### New repo: `radio-archive-portable/`
+| Path | Purpose |
+|---|---|
+| `README.md` | Quick-start, refresh procedure, architecture diagram |
+| `server/main.py` | Identical to deployed upstream (with `/api/db.sqlite`) |
+| `server/requirements.txt` | `fastapi==0.115.6`, `uvicorn[standard]==0.34.0` |
+| `sync-db.sh` | `curl -fSL -o archive-data/archive.db.partial $URL && mv` (atomic) |
+| `run.sh` | Creates `.venv` on first run, then `uvicorn server.main:app --host 127.0.0.1 --port 8765` |
+| `.env.example` | `ARCHIVE_HOST=172.16.3.20:8765`, `ARCHIVE_DB=archive-data/archive.db`, `PORT=8765` |
+| `.gitignore` | Excludes `archive-data/archive.db`, `.venv/`, `.env`, etc. |
+| `archive-data/.gitkeep` | Placeholder so the dir exists in git but the DB file doesn't |
+
+### ClaudeTools (upstream)
+| Path | Change |
+|---|---|
+| `projects/radio-show/audio-processor/server/main.py` | +18 / -1 — added `from fastapi.responses import FileResponse` and the `/api/db.sqlite` GET endpoint |
+
+### Jupiter (deployed state)
+| Path | Change |
+|---|---|
+| `/mnt/user/appdata/radio-archive/app/main.py` | Replaced (now matches `5e3b1a2`) |
+| `/mnt/user/appdata/radio-archive/data/archive.db` | Replaced with classifier-populated copy (60,583,936 bytes, 1,405/1,407 scored) |
+| Container `radio-archive` | Rebuilt to image `radio-archive:latest` (`sha256:dbb5ad62bdb1...`), running |
+
+---
+
+## Commands Run
+
+### Tailscale verification (local)
+```bash
+tailscale status --json | grep -E "advertis|route|172\.|primary"
+# Confirmed 172.16.0.0/22 listed under PrimaryRoutes
+```
+
+### New repo creation
+```bash
+curl -X POST "https://git.azcomputerguru.com/api/v1/user/repos" \
+  -H "Authorization: token 9b1da4b79a38ef782268341d25a4b6880572063f" \
+  -d '{"name":"radio-archive-portable","private":true,"default_branch":"main"}'
+# HTTP 201, repo id 12
+
+cd /c/Users/guru/radio-archive-portable
+git init -b main
+git config user.name "Mike Swanson"
+git config user.email "mike@azcomputerguru.com"
+git add -A && git commit
+git remote add origin https://azcomputerguru:@git.azcomputerguru.com/azcomputerguru/radio-archive-portable.git
+git push -u origin main
+git remote set-url origin https://git.azcomputerguru.com/azcomputerguru/radio-archive-portable.git # strip token
+```
+
+### Jupiter deploy
+```bash
+"/c/Program Files/PuTTY/pscp.exe" -batch -pw "$PW" -scp \
+  c:/Users/guru/ClaudeTools/projects/radio-show/audio-processor/server/main.py \
+  root@172.16.3.20:/mnt/user/appdata/radio-archive/app/main.py
+
+"/c/Program Files/PuTTY/pscp.exe" -batch -pw "$PW" -scp \
+  c:/Users/guru/ClaudeTools/projects/radio-show/audio-processor/archive-data/archive.db \
+  root@172.16.3.20:/mnt/user/appdata/radio-archive/data/archive.db
+# 60.5 MB at ~580 KB/s = ~100 seconds
+
+"/c/Program Files/PuTTY/plink.exe" -batch -ssh -pw "$PW" root@172.16.3.20 \
+  "cd /mnt/user/appdata/radio-archive/app && docker compose up -d --build"
+# Built radio-archive:latest sha256:dbb5ad62bdb1..., container Running
+```
+
+### Live verification
+```bash
+curl -sS http://172.16.3.20:8765/api/stats
+# {"counts":{"episodes":572,"segments":60917,...},"by_year":[{"year":2010,...
+
+curl -sS -o test-fetched.db -w "HTTP %{http_code} | dl=%{size_download}B\n" \
+  http://172.16.3.20:8765/api/db.sqlite
+# HTTP 200 | dl=60583936B
+
+.venv/Scripts/python.exe -c "
+import sqlite3
+db = sqlite3.connect('test-fetched.db')
+print(db.execute('SELECT COUNT(*) FROM qa_pairs WHERE usefulness_score IS NOT NULL').fetchone())
+"
+# (1405,)
+
+curl -sS 'http://172.16.3.20:8765/api/search?q=BIOS&kind=qa&min_score=4&limit=2'
+# returns 2 hits, each with usefulness_score=5, topic_class='computer-help'
+```
+
+---
+
+## Pending / Next
+
+1. **Test the laptop install end-to-end** when the 5070 boots up next week — confirm sync-db.sh + run.sh work cleanly on Linux. Currently untested on the actual target machine.
+2. **HTML index UI update** — backend supports `min_score` and `exclude_banter` query params, but the search UI on `/` doesn't expose them as toggles or show the score/topic_class on each hit. Backend is ready when the UI is.
+3. **Re-run the 2 failed classifier rows** — `classify_qa_quality.py` re-invocation will retry the NULL-scored rows; one-line cleanup.
+4. **Track 2 (voice profile clustering)** — still deferred. Lower priority since content-quality filter solved most of the search-quality problem.
+5. **Track 3 (speaker oracle wiring through to search UI)** — still deferred. `speaker_oracle.py` resolves names from intros but the search results still show "CALLER" rather than the resolved name.
+
+---
+
+## Reference
+
+### Endpoints (all live on http://172.16.3.20:8765/ as of this commit)
+
+| Method | Path | Notes |
+|---|---|---|
+| GET | `/` | Search UI (no min_score toggle yet — query string works manually) |
+| GET | `/api/stats` | Counts and per-year breakdown |
+| GET | `/api/episodes?year=YYYY&limit=N` | Episode list |
+| GET | `/api/episodes/{id}` | Detail with intros + qa_pairs (now includes usefulness_score, topic_class, is_banter) |
+| GET | `/api/episodes/{id}/transcript` | Chronological merged segments + turns |
+| GET | `/api/search?q=...&kind=both\|segments\|qa&min_score=N&exclude_banter=true&limit=N` | FTS5 |
+| GET | `/api/callers?limit=N` | Top recurring caller_names |
+| GET | `/api/db.sqlite` | **NEW** — streams the read-only DB blob (60 MB) |
+
+### Laptop next-week recipe (5070 / Linux)
+
+```bash
+# Tailscale already enabled on the laptop and on pfsense-2
+git clone https://git.azcomputerguru.com/azcomputerguru/radio-archive-portable.git
+cd radio-archive-portable
+./sync-db.sh # pulls from 172.16.3.20:8765/api/db.sqlite
+./run.sh # creates .venv, starts uvicorn on localhost:8765
+xdg-open http://localhost:8765/
+```
+
+Refreshing: `./sync-db.sh` any time. Atomic — partial download won't corrupt existing DB.
+
+### macOS variant (mikes-macbook-air, if used)
+Same recipe. `python3 -m venv` works on Mac. `xdg-open` → `open`.
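Until the search UI exposes `min_score` and `exclude_banter` as toggles, the query string from the endpoint table has to be built by hand. A small Python sketch of that follows; the helper name `search_url` and its default `limit` of 20 are assumptions for illustration, while the parameter names and the host come from this log.

```python
from urllib.parse import urlencode

# Upstream host from this log; on the laptop the base would be http://localhost:8765.
BASE_URL = "http://172.16.3.20:8765"

def search_url(q, kind="both", min_score=None, exclude_banter=False, limit=20, base=BASE_URL):
    """Build an /api/search URL; search_url and limit=20 are hypothetical choices."""
    params = {"q": q, "kind": kind, "limit": limit}
    if min_score is not None:
        params["min_score"] = min_score
    if exclude_banter:
        params["exclude_banter"] = "true"
    # urlencode handles escaping, e.g. spaces in the query text.
    return f"{base}/api/search?{urlencode(params)}"
```

For example, `search_url("BIOS", kind="qa", min_score=4, limit=2)` produces a URL with the same parameters as the live-verification query in this log.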
+
+### Jupiter redeploy procedure (when source or DB changes)
+```bash
+# Source change:
+"/c/Program Files/PuTTY/pscp.exe" -pw "$PW" -scp server/main.py \
+  root@172.16.3.20:/mnt/user/appdata/radio-archive/app/
+"/c/Program Files/PuTTY/plink.exe" -ssh -pw "$PW" root@172.16.3.20 \
+  "cd /mnt/user/appdata/radio-archive/app && docker compose up -d --build"
+
+# DB-only change (no container restart needed):
+"/c/Program Files/PuTTY/pscp.exe" -pw "$PW" -scp archive-data/archive.db \
+  root@172.16.3.20:/mnt/user/appdata/radio-archive/data/archive.db
+```
+
+The container opens SQLite through a `mode=ro` URI, so it picks up a fresh DB on the next request without a restart.
+
+---
+
+## Status at session end
+
+- **Upstream container** rebuilt + running with `/api/db.sqlite` endpoint live
+- **Classified DB** deployed to Jupiter (1,405/1,407 scored)
+- **Portable repo** created and pushed to `git.azcomputerguru.com/azcomputerguru/radio-archive-portable`
+- **Laptop install** is a clone + 2 shell scripts; untested on the actual 5070 (will validate next week)
+- **ClaudeTools commits:** `5e3b1a2` (this session's main.py change)
+- **Untested edge cases:** offline behavior (planes, no Tailscale), curl with HTTP/2 to /api/db.sqlite (was tested with HTTP/1.1)
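The `mode=ro` URI behavior noted in the redeploy procedure can be illustrated with the standard `sqlite3` module. This is a sketch under assumptions, not the code in `server/main.py`: it assumes the connection is opened per request (which is what lets a replaced file be seen without a restart), and the helper names `open_readonly` and `count_scored` are hypothetical.

```python
import sqlite3
from pathlib import Path

def open_readonly(db_path):
    """Open db_path read-only via a SQLite URI; writes raise sqlite3.OperationalError."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)

def count_scored(db_path):
    """Open, query, and close on every call, so a swapped-in DB file is seen next time.

    The per-call connection is an assumption about how the server behaves.
    """
    db = open_readonly(db_path)
    try:
        return db.execute(
            "SELECT COUNT(*) FROM qa_pairs WHERE usefulness_score IS NOT NULL"
        ).fetchone()[0]
    finally:
        db.close()
```

With this shape, replacing `archive.db` on disk between two calls to `count_scored` changes the second result, and any attempt to write through `open_readonly` fails rather than modifying the deployed file.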