# Session Log — 2026-04-30 — Portable Laptop Bundle + /api/db.sqlite Deploy

**Project:** The Computer Guru Show: Archive Mining System

**Goal:** Make the search service usable from a laptop next week, including offline use; ship it as a separate repo and add a DB-fetch endpoint to the upstream container.

**Machine:** GURU-BEAST-ROG (RTX 4090)

**User:** Mike Swanson (mike)

**Continues from:** `2026-04-29-qa-quality-classifier.md` (which covered the 3.5-hour qwen3:14b classifier run that scored 1,405 of the 1,407 Q/A pairs in the DB)

---

## User

- **User:** Mike Swanson (mike)
- **Machine:** GURU-BEAST-ROG
- **Role:** admin

---

## Session Summary

The radio-archive search service needed to become portable so Mike could use it from a laptop next week, including offline on a plane or in a conference room. Three options were proposed:

1. Install Tailscale on Jupiter.
2. Use the existing Tailscale subnet routing on the office router.
3. Ship a self-contained laptop copy.

Mike clarified that Tailscale was already running on the office router and covering Jupiter's subnet. He then asked to "box up the offline version" for the Dell 5070, in its own repo if needed.

Verified the Tailscale state with `tailscale status --json`. pfsense-2 (`100.119.153.74`) advertises `172.16.0.0/22` as a PRIMARY route, and mike's macbook-air and acg-guru-5070 are both existing tailnet members. No subnet configuration changes were needed. The existing container's bind to `172.16.3.20:8765` already accepts subnet-routed traffic without modification.

Built a new private Gitea repo, `azcomputerguru/radio-archive-portable`, with eight files:

- the server code (identical to upstream);
- a `sync-db.sh` that uses curl to fetch `archive.db` from a new `/api/db.sqlite` endpoint;
- a `run.sh` that creates a venv on first invocation and starts uvicorn on `localhost:8765`;
- a README, `.env.example`, `.gitignore` and an `archive-data` placeholder.

The DB itself is gitignored (60 MB; fetched on demand, never committed). The repo was created via the Gitea API and the initial commit was pushed.

Added the `/api/db.sqlite` endpoint to the upstream `server/main.py` using FastAPI's `FileResponse`. The reasoning is disclosure equivalence: anyone who can reach `/api/search` already has full transcript access, so exposing the SQLite blob adds nothing meaningful. This also avoids needing SSH keys or stored credentials on the laptop.

Deployed to Jupiter: pscp'd `main.py` and the classifier-populated `archive.db`, then ran `docker compose up -d --build`. Verified end-to-end:

- `GET /api/db.sqlite` returns 200 with 60,583,936 bytes.
- The fetched DB contains all 1,405 classifier-scored rows intact.
- `GET /api/search?min_score=4` filters correctly, with the new fields in the response.

---

## Key Decisions

- **Subnet routing already in place**: confirmed via `tailscale status --json` that pfsense-2 advertises `172.16.0.0/22` as a primary route. No new daemons or routing changes were required. The container's bind to `172.16.3.20:8765` is sufficient because tailnet traffic destined for that IP leaves the router on its LAN side and reaches the existing listener.
- **`/api/db.sqlite` over HTTP instead of SSH/SCP for the DB sync**: keeps everything on the same Tailscale-routed port, with no SSH key management and no stored passwords on the laptop. Because `/api/search` already returns every transcript (disclosure equivalence), no auth was added to either endpoint.
- **Separate repo for the portable bundle**: keeps the laptop install flow simple (clone, then run two scripts) and avoids cloning the 100+ GB ClaudeTools monorepo onto a travel laptop. The repo lives at `git.azcomputerguru.com/azcomputerguru/radio-archive-portable` (private, under the user namespace).
- **DB excluded from the repo via `.gitignore`**: the 60 MB blob is fetched by `sync-db.sh` on first run, so the repo stays at ~15 KB. The fetch is idempotent and atomic (download to `.partial`, validate size, rename into place).
- **Used `docker compose up -d --build` (combined) instead of separate `build` then `up`**: on a previous attempt, separate commands chained through plink either silently buffered or failed to trigger a rebuild, and the container kept running 2-hour-old code. The combined form was reliable.
- **Stripped the API token from `.git/config` after push**: the token had been embedded in the origin URL for the initial push. It was replaced with the bare HTTPS URL afterward so it doesn't sit in plain text. Future pushes will go through a Git credential helper or an interactive prompt.

---

## Problems Encountered

- **First deploy attempt landed but the rebuild didn't happen**: chained `docker compose build && docker compose up -d` via plink exited with code 0, but the container kept running yesterday's code. This was confirmed by `docker exec radio-archive grep db.sqlite /app/main.py`, which returned nothing. The likely cause is BuildKit output buffering or a plink session quirk. Resolved by running `docker compose up -d --build` as a single foreground command.
- **Bash background-task output capture flaky on long plink runs**: early deploy attempts ran in the Bash tool's `run_in_background` mode, but the output file stayed empty for minutes even though the underlying SSH session had completed. Worked around by running shorter commands synchronously.
- **`/tmp` path clash between Git Bash and Windows Python**: a smoke test fetched the DB with curl into `/tmp/test-db.sqlite` and then read it with `python -c` from the same `/tmp/...` path. The two tools resolved `/tmp` to different locations on Windows. Switched to a project-local `test-fetched.db` path to avoid the issue.
- **Gitea API at `/api/v1/orgs/azcomputerguru/repos` returned 404**: `azcomputerguru` is a USER, not an org. Repo creation succeeded via `/api/v1/user/repos` instead, because the token's owner is `azcomputerguru`.
- **`HEAD /api/db.sqlite` returns 405 Method Not Allowed**: the route is registered for GET only. This is harmless because the sync script uses `GET`. It is documented behavior, not a bug.

---

## Credentials Used

### Jupiter (Unraid Primary)

- **Vault path:** `infrastructure/jupiter-unraid-primary.sops.yaml`
- **Host:** 172.16.3.20
- **User:** root
- **Password:** `Th1nk3r^99##`
- **iDRAC IP:** 172.16.1.73 / root / `Window123!@#-idrac`

### Gitea

- **Vault path:** `services/gitea.sops.yaml`
- **URL:** https://git.azcomputerguru.com
- **Username:** `azcomputerguru`
- **Password:** `Gptf*77ttb123!@#-git` (alt: `Window123!@#-git`)
- **API token (used this session):** `9b1da4b79a38ef782268341d25a4b6880572063f`
- **SSH:** `ssh://git@172.16.3.20:2222`

### New Repo

- **Clone URL:** https://git.azcomputerguru.com/azcomputerguru/radio-archive-portable.git
- **SSH URL:** `ssh://git@172.16.3.20:2222/azcomputerguru/radio-archive-portable.git`
- **Visibility:** private
- **Default branch:** main

---

## Infrastructure Touched

| Host | IP | Role | Action |
|---|---|---|---|
| Jupiter (Unraid Primary) | 172.16.3.20 | Hypervisor + Docker host | pscp'd updated `main.py` + `archive.db`; `docker compose up -d --build` |
| Radio-archive container | Container on Jupiter, bound to `172.16.3.20:8765` | FastAPI + SQLite | Rebuilt with new endpoint; restarted with classifier-populated DB |
| Gitea (on Jupiter, port 3000) | git.azcomputerguru.com | Source hosting | New repo created via API |
| pfsense-2 router | Tailscale `100.119.153.74` | Subnet router | No changes; verified existing `172.16.0.0/22` advertisement |

### Tailscale state at session time

```
100.101.122.4    guru-beast-rog      (this machine, online)
100.65.158.123   mikes-macbook-air   (last seen 4m before check)
100.95.216.79    acg-guru-5070       (offline 30d ago; boot it up next week)
100.119.153.74   pfsense-2           (active; advertises 172.16.0.0/22 as PRIMARY)
```

---

## Files Created / Modified

### New repo: `radio-archive-portable/`

| Path | Purpose |
|---|---|
| `README.md` | Quick-start, refresh procedure, architecture diagram |
| `server/main.py` | Identical to deployed upstream (with `/api/db.sqlite`) |
| `server/requirements.txt` | `fastapi==0.115.6`, `uvicorn[standard]==0.34.0` |
| `sync-db.sh` | `curl -fSL -o archive-data/archive.db.partial $URL && mv` (atomic) |
| `run.sh` | Creates `.venv` on first run, then `uvicorn server.main:app --host 127.0.0.1 --port 8765` |
| `.env.example` | `ARCHIVE_HOST=172.16.3.20:8765`, `ARCHIVE_DB=archive-data/archive.db`, `PORT=8765` |
| `.gitignore` | Excludes `archive-data/archive.db`, `.venv/`, `.env`, etc. |
| `archive-data/.gitkeep` | Placeholder so the directory exists in git while the DB file does not |

### ClaudeTools (upstream)

| Path | Change |
|---|---|
| `projects/radio-show/audio-processor/server/main.py` | +18 / -1: added `from fastapi.responses import FileResponse` and the `/api/db.sqlite` GET endpoint |

### Jupiter (deployed state)

| Path | Change |
|---|---|
| `/mnt/user/appdata/radio-archive/app/main.py` | Replaced (now matches `5e3b1a2`) |
| `/mnt/user/appdata/radio-archive/data/archive.db` | Replaced with classifier-populated copy (60,583,936 bytes; 1,405/1,407 scored) |
| Container `radio-archive` | Rebuilt to image `radio-archive:latest` (`sha256:dbb5ad62bdb1...`), running |

---

## Commands Run

### Tailscale verification (local)

```bash
tailscale status --json | grep -E "advertis|route|172\.|primary"
# Confirmed 172.16.0.0/22 listed under PrimaryRoutes
```

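The grep above only lets you eyeball the JSON. A slightly more robust check parses `tailscale status --json` and tests whether Jupiter's IP falls inside any peer's primary route.

This is a sketch under stated assumptions:

- The `Peer`, `HostName` and `PrimaryRoutes` field names are taken from the JSON this session inspected and may differ across Tailscale versions.
- `subnet_routers_for` and `load_tailscale_status` are hypothetical helpers, not part of either repo.

```python
import ipaddress
import json
import subprocess


def subnet_routers_for(target_ip: str, status: dict) -> list[tuple[str, str]]:
    """Return (HostName, prefix) for every peer whose PrimaryRoutes cover target_ip."""
    ip = ipaddress.ip_address(target_ip)
    matches = []
    # Assumed JSON shape: {"Peer": {<nodekey>: {"HostName": ..., "PrimaryRoutes": [...]}}}
    for peer in (status.get("Peer") or {}).values():
        for prefix in peer.get("PrimaryRoutes") or []:
            if ip in ipaddress.ip_network(prefix, strict=False):
                matches.append((peer.get("HostName", "?"), prefix))
    return matches


def load_tailscale_status() -> dict:
    """Run the local tailscale CLI and parse its JSON status output."""
    result = subprocess.run(
        ["tailscale", "status", "--json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

On the tailnet described in this log, `subnet_routers_for("172.16.3.20", load_tailscale_status())` should list pfsense-2 with `172.16.0.0/22`. An empty list means no peer currently routes the Jupiter subnet.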
### New repo creation

```bash
curl -X POST "https://git.azcomputerguru.com/api/v1/user/repos" \
-H "Authorization: token 9b1da4b79a38ef782268341d25a4b6880572063f" \
-d '{"name":"radio-archive-portable","private":true,"default_branch":"main"}'
# HTTP 201, repo id 12

cd /c/Users/guru/radio-archive-portable
git init -b main
git config user.name "Mike Swanson"
git config user.email "mike@azcomputerguru.com"
git add -A && git commit
git remote add origin https://azcomputerguru:<token>@git.azcomputerguru.com/azcomputerguru/radio-archive-portable.git
git push -u origin main
git remote set-url origin https://git.azcomputerguru.com/azcomputerguru/radio-archive-portable.git  # strip token
```
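The same repo-creation call can be scripted with the Python standard library. It makes the user-versus-org endpoint choice explicit, which was the source of the 404 noted under Problems Encountered.

This is a sketch, not the session's tooling:

- `build_create_repo_request` and `create_repo` are hypothetical helpers.
- `GITEA_TOKEN` is an assumed environment variable (it keeps the token off the command line).
- The request sets a JSON `Content-Type` explicitly.

```python
import json
import os
import urllib.request
from typing import Optional

GITEA_URL = "https://git.azcomputerguru.com"


def build_create_repo_request(token: str, name: str, *, private: bool = True,
                              org: Optional[str] = None,
                              base_url: str = GITEA_URL) -> urllib.request.Request:
    """Build (but do not send) a Gitea "create repository" request.

    Personal accounts create repos via /api/v1/user/repos; the
    /api/v1/orgs/{org}/repos form only exists for organizations, which is
    why it returned 404 for the azcomputerguru user account.
    """
    path = f"/api/v1/orgs/{org}/repos" if org else "/api/v1/user/repos"
    body = json.dumps({"name": name, "private": private, "default_branch": "main"}).encode()
    req = urllib.request.Request(base_url + path, data=body, method="POST")
    req.add_header("Authorization", f"token {token}")
    req.add_header("Content-Type", "application/json")
    return req


def create_repo(name: str) -> int:
    """Send the request; GITEA_TOKEN is a hypothetical env var holding the API token."""
    req = build_create_repo_request(os.environ["GITEA_TOKEN"], name)
    with urllib.request.urlopen(req) as resp:
        return resp.status  # Gitea answered 201 for this call in the session
```

Keeping request construction separate from sending means the URL and headers can be checked without touching the server.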

### Jupiter deploy

```bash
"/c/Program Files/PuTTY/pscp.exe" -batch -pw "$PW" -scp \
c:/Users/guru/ClaudeTools/projects/radio-show/audio-processor/server/main.py \
root@172.16.3.20:/mnt/user/appdata/radio-archive/app/main.py

"/c/Program Files/PuTTY/pscp.exe" -batch -pw "$PW" -scp \
c:/Users/guru/ClaudeTools/projects/radio-show/audio-processor/archive-data/archive.db \
root@172.16.3.20:/mnt/user/appdata/radio-archive/data/archive.db
# 60.5 MB at ~580 KB/s = ~100 seconds

"/c/Program Files/PuTTY/plink.exe" -batch -ssh -pw "$PW" root@172.16.3.20 \
"cd /mnt/user/appdata/radio-archive/app && docker compose up -d --build"
# Built radio-archive:latest sha256:dbb5ad62bdb1..., container Running
```

### Live verification

```bash
curl -sS http://172.16.3.20:8765/api/stats
# {"counts":{"episodes":572,"segments":60917,...},"by_year":[{"year":2010,...

curl -sS -o test-fetched.db -w "HTTP %{http_code} | dl=%{size_download}B\n" \
http://172.16.3.20:8765/api/db.sqlite
# HTTP 200 | dl=60583936B

.venv/Scripts/python.exe -c "
import sqlite3
db = sqlite3.connect('test-fetched.db')
print(db.execute('SELECT COUNT(*) FROM qa_pairs WHERE usefulness_score IS NOT NULL').fetchone())
"
# (1405,)

curl -sS 'http://172.16.3.20:8765/api/search?q=BIOS&kind=qa&min_score=4&limit=2'
# returns 2 hits, each with usefulness_score=5, topic_class='computer-help'
```

---

## Pending / Next

1. **Test the laptop install end-to-end** when the 5070 boots up next week: confirm that `sync-db.sh` and `run.sh` work cleanly on Linux. This is currently untested on the actual target machine.
2. **HTML index UI update**: the backend supports the `min_score` and `exclude_banter` query params, but the search UI on `/` doesn't expose them as toggles or show score/topic_class on each hit. The backend is ready when the UI is.
3. **Re-run the 2 failed classifier rows**: re-invoking `classify_qa_quality.py` will retry the NULL-scored rows; a one-line cleanup.
4. **Track 2 (voice profile clustering)**: still deferred. Lower priority, since the content-quality filter solved most of the search-quality problem.
5. **Track 3 (wiring the speaker oracle through to the search UI)**: still deferred. `speaker_oracle.py` resolves names from intros, but search results still show "CALLER" rather than the resolved name.

---

## Reference

### Endpoints (all live on http://172.16.3.20:8765/ as of this commit)

| Method | Path | Notes |
|---|---|---|
| GET | `/` | Search UI (no min_score toggle yet; the query string works manually) |
| GET | `/api/stats` | Counts and per-year breakdown |
| GET | `/api/episodes?year=YYYY&limit=N` | Episode list |
| GET | `/api/episodes/{id}` | Detail with intros + qa_pairs (now includes usefulness_score, topic_class, is_banter) |
| GET | `/api/episodes/{id}/transcript` | Chronological merged segments + turns |
| GET | `/api/search?q=...&kind=both\|segments\|qa&min_score=N&exclude_banter=true&limit=N` | FTS5 |
| GET | `/api/callers?limit=N` | Top recurring caller_names |
| GET | `/api/db.sqlite` | **NEW**: streams the read-only DB blob (60 MB) |

### Laptop next-week recipe (5070 / Linux)

```bash
# Tailscale already enabled on the laptop and on pfsense-2
git clone https://git.azcomputerguru.com/azcomputerguru/radio-archive-portable.git
cd radio-archive-portable
./sync-db.sh    # pulls from 172.16.3.20:8765/api/db.sqlite
./run.sh        # creates .venv, starts uvicorn on localhost:8765
xdg-open http://localhost:8765/   # from a second terminal if run.sh keeps uvicorn in the foreground
```

Refreshing: run `./sync-db.sh` any time. The update is atomic, so an interrupted download won't corrupt the existing DB.

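`sync-db.sh` is documented here only as a curl one-liner. For anyone who wants the same download-to-`.partial`-then-rename pattern with explicit checks, here is a Python sketch. It assumes:

- The server sends `Content-Length`; the size is checked only when that header is present.
- The download should start with the SQLite file header. This check is an addition, not something the shell script is documented to do.

`fetch_db_atomically` is a hypothetical name.

```python
import os
import shutil
import urllib.request

SQLITE_MAGIC = b"SQLite format 3\x00"  # first 16 bytes of every SQLite 3 database file


def fetch_db_atomically(url: str, dest: str, timeout: float = 300.0) -> int:
    """Download url into dest via a .partial file, then swap it into place.

    Returns the byte count. If anything fails, the .partial file is removed
    and any existing dest file is left untouched.
    """
    partial = dest + ".partial"
    os.makedirs(os.path.dirname(os.path.abspath(dest)), exist_ok=True)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp, open(partial, "wb") as out:
            expected = resp.headers.get("Content-Length")
            shutil.copyfileobj(resp, out)
        size = os.path.getsize(partial)
        # Size check: only possible when the server sent Content-Length.
        if expected is not None and size != int(expected):
            raise IOError(f"short download: {size} bytes, expected {expected}")
        # Extra sanity check (an assumption, not documented behavior of sync-db.sh).
        with open(partial, "rb") as f:
            if f.read(len(SQLITE_MAGIC)) != SQLITE_MAGIC:
                raise IOError("downloaded file is not an SQLite database")
        os.replace(partial, dest)  # atomic rename when both paths are on one filesystem
        return size
    except BaseException:
        if os.path.exists(partial):
            os.remove(partial)
        raise
```

Usage: `fetch_db_atomically("http://172.16.3.20:8765/api/db.sqlite", "archive-data/archive.db")`.

The rename is the step that makes the refresh safe. Until it happens, the old `archive.db` is never modified, so a dropped connection leaves the previous copy usable.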
### macOS variant (mikes-macbook-air, if used)

Same recipe. `python3 -m venv` works on macOS; use `open` instead of `xdg-open`.

### Jupiter redeploy procedure (when source or DB changes)

```bash
# Source change:
"/c/Program Files/PuTTY/pscp.exe" -pw <pw> -scp server/main.py \
root@172.16.3.20:/mnt/user/appdata/radio-archive/app/
"/c/Program Files/PuTTY/plink.exe" -ssh -pw <pw> root@172.16.3.20 \
"cd /mnt/user/appdata/radio-archive/app && docker compose up -d --build"

# DB-only change (no container restart needed):
"/c/Program Files/PuTTY/pscp.exe" -pw <pw> -scp archive-data/archive.db \
root@172.16.3.20:/mnt/user/appdata/radio-archive/data/archive.db
```

The container opens SQLite through a `mode=ro` URI, so it picks up a fresh DB on the next request without a restart.

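Here is a minimal sketch of that read-only access pattern. It assumes the server opens a new connection per request; the log states that a fresh DB is picked up without a restart but does not show the connection code.

- `open_readonly` and `score_summary` are hypothetical helpers.
- The `qa_pairs.usefulness_score` column comes from the verification query in this log.

```python
import sqlite3
from pathlib import Path


def open_readonly(db_path: str) -> sqlite3.Connection:
    """Open the archive DB read-only through an SQLite URI (mode=ro)."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def score_summary(db_path: str) -> dict:
    """Open a fresh connection per call, so a swapped-in DB file is seen on the next call."""
    con = open_readonly(db_path)
    try:
        # COUNT(column) skips NULLs, so it counts only classifier-scored rows.
        scored, total = con.execute(
            "SELECT COUNT(usefulness_score), COUNT(*) FROM qa_pairs"
        ).fetchone()
        return {"scored": scored, "total": total, "unscored": total - scored}
    finally:
        con.close()
```

Against the deployed DB, this should report 1,405 scored and 2 unscored, matching the 1,405/1,407 figure in this log. Any write attempted on a `mode=ro` connection fails with `sqlite3.OperationalError`, which is the guarantee the container relies on.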
---

## Status at session end

- **Upstream container** rebuilt and running with the `/api/db.sqlite` endpoint live
- **Classified DB** deployed to Jupiter (1,405/1,407 scored)
- **Portable repo** created and pushed to `git.azcomputerguru.com/azcomputerguru/radio-archive-portable`
- **Laptop install** is a clone plus 2 shell scripts; untested on the actual 5070 (will validate next week)
- **ClaudeTools commits:** `5e3b1a2` (this session's `main.py` change)
- **Untested edge cases:** offline behavior (planes, no Tailscale), curl over HTTP/2 to `/api/db.sqlite` (tested only with HTTP/1.1)

---

## Update: 06:05 — Index UI exposes classifier filters

User asked to wire the new classifier fields into the search UI. The backend already supported the `min_score` and `exclude_banter` query params (commit `5e3b1a2`). This update brings them into the HTML index and adds visible quality indicators on Q/A hits.

### Update Summary

Edited `INDEX_HTML` in `server/main.py` to add two filter controls and score badges. Verified locally via `uvicorn` on `127.0.0.1:8866` against the classifier-populated DB; the no-filter, `min_score=4` and `exclude_banter=true` modes all behaved correctly.

The deploy hit an unexpected `No space left on device` from `pscp`, even though Jupiter had 37 TB free on `/mnt/user`. Bypassed it by streaming the file through plink's stdin (`plink ... "cat > /path" < local_file`); md5 confirmed the copy was byte-identical.

Rebuilt the container via `docker compose up -d --build`, then synced the same `main.py` to the portable repo so the laptop UI stays in sync.


### What changed in the UI

- **`min score` select**: values any, 2+, 3+, 4+, 5. It defaults to `any` to preserve the old search behavior. The filter surfaces 1,096 mid-and-above pairs at `3+` or 523 useful pairs at `4+`.
- **`hide banter` checkbox**: when checked, drops the 606 rows with `is_banter=1`.
- **Score badge per Q/A hit**: a small color-coded number (1=red, 5=green) next to each hit's metadata line. Its title attribute shows `usefulness N/5` on hover.
- **Topic class tag**: a small gray pill showing `computer-help`, `banter`, `off-topic`, `promo` or `unclear`.
- **Dimmed rendering**: hits with score 1-2 or `is_banter=true` render at 55% opacity. They stay visible but are de-emphasized, so good hits stand out at a glance.
- **`escapeHtml` helper**: a defensive XSS guard on `caller_name` and `title`, which are transcript-derived strings.

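The real UI applies these rules in JavaScript inside `INDEX_HTML`, which this log does not reproduce. As an illustration of the same rules, here is a Python port sketch covering:

- the score badge;
- the topic pill;
- the 55% dim for score 1-2 or banter;
- escaping of transcript-derived strings.

It makes several assumptions:

- The class names (`hit`, `dim`, `badge sN`, `pill`) are hypothetical.
- The HSL red-to-green ramp is an invented color scheme.
- `html.escape` stands in for the UI's `escapeHtml` helper.

```python
from html import escape

# Hypothetical stylesheet: hue 0 (red) for score 1 up to hue 120 (green) for score 5.
BADGE_CSS = "\n".join(
    [f".badge.s{n} {{ background: hsl({(n - 1) * 30}, 70%, 40%); color: #fff; }}" for n in range(1, 6)]
    + [".hit.dim { opacity: 0.55; }"]
)


def render_hit_meta(hit: dict) -> str:
    """Render the metadata line for one Q/A hit as an HTML fragment."""
    score = hit.get("usefulness_score")
    # Dim rule from the log: score 1-2 or banter gets de-emphasized, not hidden.
    dim = (score is not None and score <= 2) or bool(hit.get("is_banter"))
    parts = [
        f'<span class="caller">{escape(hit.get("caller_name") or "")}</span>',
        f'<span class="title">{escape(hit.get("title") or "")}</span>',
    ]
    if score is not None:
        parts.append(f'<span class="badge s{int(score)}" title="usefulness {int(score)}/5">{int(score)}</span>')
    if hit.get("topic_class"):
        parts.append(f'<span class="pill">{escape(hit["topic_class"])}</span>')
    cls = "hit dim" if dim else "hit"
    return f'<div class="{cls}">{" ".join(parts)}</div>'
```

Dimming through a CSS class, rather than dropping rows, mirrors the "dim instead of hide" decision. Only the explicit filter controls remove hits.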
### Key Decisions (this update)

- **Default filter "any"**: preserves prior search habits and saved URLs. Mike opts into filtering when needed rather than being forced into a curated view.
- **`URLSearchParams` instead of string concatenation**: only emits `min_score=` / `exclude_banter=` when they are non-default, keeping the URL bar clean for the common case.
- **Color-coded badge with both score AND topic tag**: the score is numeric and comparable; the topic tag is categorical and explains *why* a score is what it is. Together they make the classifier's reasoning visible at a glance without requiring a click.
- **Dim instead of hide for low-quality hits**: keeps everything visible by default; the filter controls are the explicit "hide" lever.
- **Used `plink "cat > path"` instead of pscp** for the deploy when pscp failed: this was faster than diagnosing the underlying scp/shfs issue and got the job done deterministically.

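The URL-building rule above can be sketched outside the browser. Here is a Python equivalent of the "only emit non-default params" behavior, using `urllib.parse.urlencode` in place of the UI's `URLSearchParams`. It makes these assumptions:

- `build_search_url` is a hypothetical helper.
- The parameter names come from the endpoint table in this log.
- Always sending `kind`, with a default of `both`, is an assumption about the UI.

```python
from typing import Optional
from urllib.parse import urlencode


def build_search_url(base: str, q: str, *, kind: str = "both",
                     min_score: Optional[int] = None,
                     exclude_banter: bool = False,
                     limit: Optional[int] = None) -> str:
    """Build an /api/search URL, emitting filter params only when non-default."""
    params = {"q": q, "kind": kind}
    if min_score is not None:   # the UI's "any" maps to no min_score param at all
        params["min_score"] = min_score
    if exclude_banter:          # unchecked box: leave the param out entirely
        params["exclude_banter"] = "true"
    if limit is not None:
        params["limit"] = limit
    return f"{base.rstrip('/')}/api/search?{urlencode(params)}"
```

For example, `build_search_url("http://172.16.3.20:8765", "BIOS", kind="qa", min_score=4, limit=2)` reproduces the verification URL used in this log. A default search produces no `min_score` or `exclude_banter` at all, so saved URLs from before the change keep working.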
### Problems Encountered (this update)

- **pscp ENOSPC despite 37 TB free**: `pscp main.py` failed with `No space left on device` on two retries. `df` showed 37 TB free on `/mnt/user`, and `df -i` showed inodes were fine. The workaround was `plink ... "cat > /path/main.py" < local_main.py`, and `md5sum` confirmed the file was byte-identical after transfer. The cause is likely Unraid shfs cache-pool churn, or an issue with overwriting an in-use file inside a container's mount. Worth understanding eventually, but it didn't block the deploy.
- **plink output buffering on chained docker commands**: long `docker compose up -d --build` runs appeared hung in Bash's `run_in_background` view, with the output file empty for minutes. A synchronous foreground run of the same command completed immediately. The same pattern was observed yesterday. Workaround: don't background long plink runs; just block.

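The stdin-streaming workaround and its md5 check can be scripted together. Here is a sketch under stated assumptions:

- `md5_of_file`, `plink_upload_argv`, `upload_via_stdin` and `matches_remote_md5sum` are hypothetical helpers.
- Authentication is left out of the argv. The logged commands passed `-pw`; a key or agent would work as well.
- The remote side is plain `cat > path`, which overwrites the existing file in place rather than going through scp.

```python
import hashlib
import shlex
import subprocess


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 of a local file, read in chunks so large files stay cheap."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def plink_upload_argv(plink: str, host: str, remote_path: str) -> list[str]:
    """argv for `plink -batch -ssh host "cat > path"`; the path is shell-quoted for the remote shell."""
    return [plink, "-batch", "-ssh", host, f"cat > {shlex.quote(remote_path)}"]


def upload_via_stdin(argv: list[str], local_path: str) -> None:
    """Stream the local file into the remote `cat` over plink's stdin."""
    with open(local_path, "rb") as f:
        subprocess.run(argv, stdin=f, check=True)


def matches_remote_md5sum(local_path: str, md5sum_output: str) -> bool:
    """Compare a local file against one line of remote `md5sum <file>` output."""
    remote_hash = md5sum_output.split()[0].lower()
    return remote_hash == md5_of_file(local_path)
```

After the upload, run `md5sum <remote path>` over plink and pass its output to `matches_remote_md5sum`. This is the same byte-identical check recorded above, automated.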
### Files Changed (this update)

| Path | Change |
|---|---|
| `projects/radio-show/audio-processor/server/main.py` | +51 / -4: INDEX_HTML gained controls, badge styles, topic tag, escapeHtml and dim-class JS rendering |
| `c:/Users/guru/radio-archive-portable/server/main.py` | Same diff, synced from upstream |

### Commits (this update)

| Repo | SHA | Branch |
|---|---|---|
| ClaudeTools | `b9af34f` | main |
| radio-archive-portable | `1d6c795` | main |

### Live verification

```
$ curl -s http://172.16.3.20:8765/ | grep -cE "min_score|exclude_banter|badge.s5|topic_class"
10
$ curl -so /dev/null -w "%{size_download}\n" http://172.16.3.20:8765/
5757    # was 4040 before
$ curl -s 'http://172.16.3.20:8765/api/search?q=BIOS&kind=qa&min_score=4&limit=2' \
  | python -c "import sys,json; d=json.load(sys.stdin); print('hits:', len(d['qa']))"
hits: 2    # both score=5, topic_class='computer-help'
```

### Status at update end

- UI controls live on http://172.16.3.20:8765/ and in the portable repo
- Backend filters working (verified end-to-end)
- Untouched: the HTML still has no per-hit deep-link to `/api/episodes/{id}` (clicking a hit doesn't navigate). Future enhancement.
- Pending: laptop validation (still next week's task)