radio: session log — portable laptop bundle + /api/db.sqlite deploy
New private Gitea repo `azcomputerguru/radio-archive-portable` for laptop offline use. Upstream gained /api/db.sqlite for HTTP-only DB sync (no SSH keys needed). Jupiter container rebuilt + restarted with the classifier-populated DB; verified end-to-end (200 OK, 60.5 MB, 1,405 classifier rows intact, min_score filter working).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Session Log — 2026-04-30 — Portable Laptop Bundle + /api/db.sqlite Deploy

**Project:** The Computer Guru Show — Archive Mining System
**Goal:** Make the search service usable from a laptop next week, including offline; ship it as a separate repo and add a DB-fetch endpoint to the upstream container
**Machine:** GURU-BEAST-ROG (RTX 4090)
**User:** Mike Swanson (mike)
**Continues from:** `2026-04-29-qa-quality-classifier.md` (which covered the 3.5h qwen3:14b classifier run that produced the 1,405-row scored DB)

---

## User
- **User:** Mike Swanson (mike)
- **Machine:** GURU-BEAST-ROG
- **Role:** admin

---

## Session Summary

The radio-archive search service needed to become portable so Mike could use it from a laptop next week, including offline scenarios on a plane or in a conference room. Three options were proposed: (1) install Tailscale on Jupiter, (2) use existing Tailscale subnet routing on the office router, (3) ship a self-contained laptop copy. Mike clarified Tailscale was already running on the office router covering Jupiter's subnet, then asked to "box up the offline version" for the Dell 5070 — its own repo if needed.

Verified Tailscale state from `tailscale status --json`: pfsense-2 (`100.119.153.74`) advertises `172.16.0.0/22` as its primary route, and `mikes-macbook-air` and `acg-guru-5070` are both existing tailnet members. No subnet configuration changes needed. The existing container's bind to `172.16.3.20:8765` already accepts subnet-routed traffic without modification.

Built a new private Gitea repo `azcomputerguru/radio-archive-portable` with eight files: server code (identical to upstream), a `sync-db.sh` that curl-fetches `archive.db` from a new `/api/db.sqlite` endpoint, a `run.sh` that creates a venv on first invocation and starts uvicorn on `localhost:8765`, plus README, .env.example, .gitignore, archive-data placeholder. The DB itself is gitignored (60 MB; fetched on demand, never committed). Repo created via Gitea API, initial commit pushed.

Added the `/api/db.sqlite` endpoint to the upstream `server/main.py` using FastAPI `FileResponse`. Disclosure equivalence: anyone who can reach `/api/search` already has full transcript access, so exposing the SQLite blob adds nothing meaningful. This avoided needing SSH keys or stored credentials on the laptop side. Deployed to Jupiter (pscp'd `main.py` + classified `archive.db`, then `docker compose up -d --build`). Verified end-to-end: `GET /api/db.sqlite` returns 200 with 60,583,936 bytes; the fetched DB contains all 1,405 classifier rows intact; `GET /api/search?min_score=4` filters correctly with the new fields in the response.

---

## Key Decisions

- **Subnet routing already in place** — confirmed via `tailscale status --json` that pfsense-2 advertises `172.16.0.0/22` as primary route. No new daemons or routing changes required. Container bind to `172.16.3.20:8765` is sufficient because Tailscale traffic destined for that IP arrives via the router's LAN egress and hits the existing listener.
- **`/api/db.sqlite` over HTTP instead of SSH/SCP for the DB sync** — keeps everything on the same Tailscale-routed port, no SSH key management, no stored passwords on the laptop. Disclosure equivalence with `/api/search` (which already returns every transcript) means no auth was added to either.
- **Separate repo for the portable bundle** — keeps the laptop install-flow simple (clone + run two scripts) and avoids cloning the 100+ GB ClaudeTools monorepo on a travel laptop. Repo lives at `git.azcomputerguru.com/azcomputerguru/radio-archive-portable` (private, under the user namespace).
- **DB excluded from the repo via gitignore** — the 60 MB blob is fetched via `sync-db.sh` on first run. Repo stays at ~15 KB. The fetch is idempotent and atomic (download to `.partial`, validate size, rename into place).
- **Used `docker compose up -d --build` (combined) instead of separate `build` then `up`** — separate commands chained through plink either silently buffered or failed to trigger a rebuild on a previous attempt; container kept running 2-hour-old code. Combined form was reliable.
- **Stripped API token from `.git/config` after push** — token had been embedded in the origin URL for the initial push; replaced with the bare HTTPS URL afterward so it doesn't sit in plain text. Future pushes will go via Gitea credential helper or interactive prompt.

---

## Problems Encountered

- **First deploy attempt landed but rebuild didn't happen** — chained `docker compose build && docker compose up -d` via plink completed exit-code-0 but the container kept running yesterday's code (verified via `docker exec radio-archive grep db.sqlite /app/main.py` returning nothing). Likely BuildKit output buffering or plink session quirks. Resolved by using `docker compose up -d --build` as a single foreground command.
- **Bash background-task output capture flaky on long plink runs** — early deploy attempts went into the Bash tool's `run_in_background` mode but the output file stayed empty for minutes despite the underlying SSH session completing. Worked around by running shorter commands synchronously.
- **`/tmp` path clash between git-bash and Windows Python** — a smoke-test command fetched the DB with curl to `/tmp/test-db.sqlite` and then tried to read it with `python -c` using the same `/tmp/...` path. git-bash and Windows Python resolve `/tmp` to different directories, so the read did not see the downloaded file. Switched to a project-local `test-fetched.db` path to avoid the issue.
- **Gitea API at `/api/v1/orgs/azcomputerguru/repos` returned 404** — `azcomputerguru` is a USER, not an org. Repo creation succeeded via `/api/v1/user/repos` instead. (The token's owner is `azcomputerguru`, so user-namespace creation worked.)
- **`HEAD /api/db.sqlite` returns 405 Method Not Allowed** — FastAPI's default routing only registers GET. A `HEAD` is fine to fail because the sync script uses `GET`. Documented behavior, not a bug.

---

## Credentials Used

### Jupiter (Unraid Primary)
- **Vault path:** `infrastructure/jupiter-unraid-primary.sops.yaml`
- **Host:** 172.16.3.20
- **User:** root
- **Password:** `Th1nk3r^99##`
- **iDRAC IP:** 172.16.1.73 / root / `Window123!@#-idrac`
### Gitea
- **Vault path:** `services/gitea.sops.yaml`
- **URL:** https://git.azcomputerguru.com
- **Username:** `azcomputerguru`
- **Password:** `Gptf*77ttb123!@#-git` (alt: `Window123!@#-git`)
- **API token (used this session):** `9b1da4b79a38ef782268341d25a4b6880572063f`
- **SSH:** `ssh://git@172.16.3.20:2222`
### New Repo
- **Clone URL:** https://git.azcomputerguru.com/azcomputerguru/radio-archive-portable.git
- **SSH URL:** `git@172.16.3.21:azcomputerguru/radio-archive-portable.git`
- **Visibility:** private
- **Default branch:** main

---

## Infrastructure Touched

| Host | IP | Role | Action |
|---|---|---|---|
| Jupiter (Unraid Primary) | 172.16.3.20 | Hypervisor + Docker host | pscp'd updated `main.py` + `archive.db`; `docker compose up -d --build` |
| Radio-archive container | container on Jupiter, bind `172.16.3.20:8765` | FastAPI + SQLite | Rebuilt with new endpoint; restarted with classifier-populated DB |
| Gitea (on Jupiter, port 3000) | git.azcomputerguru.com | Source hosting | New repo created via API |
| pfsense-2 router | (Tailscale `100.119.153.74`) | Subnet router | No changes — verified existing 172.16.0.0/22 advertisement |

### Tailscale state at session time

```
100.101.122.4    guru-beast-rog      (this machine, online)
100.65.158.123   mikes-macbook-air   (last seen 4m before check)
100.95.216.79    acg-guru-5070       (offline 30d ago — boot it up next week)
100.119.153.74   pfsense-2           (active; advertises 172.16.0.0/22 as PRIMARY)
```

---

## Files Created / Modified

### New repo: `radio-archive-portable/`
| Path | Purpose |
|---|---|
| `README.md` | Quick-start, refresh procedure, architecture diagram |
| `server/main.py` | Identical to deployed upstream (with `/api/db.sqlite`) |
| `server/requirements.txt` | `fastapi==0.115.6`, `uvicorn[standard]==0.34.0` |
| `sync-db.sh` | `curl -fSL -o archive-data/archive.db.partial $URL && mv` (atomic) |
| `run.sh` | Creates `.venv` on first run, then `uvicorn server.main:app --host 127.0.0.1 --port 8765` |
| `.env.example` | `ARCHIVE_HOST=172.16.3.20:8765`, `ARCHIVE_DB=archive-data/archive.db`, `PORT=8765` |
| `.gitignore` | Excludes `archive-data/archive.db`, `.venv/`, `.env`, etc. |
| `archive-data/.gitkeep` | Placeholder so the dir exists in git but the DB file doesn't |

### ClaudeTools (upstream)
| Path | Change |
|---|---|
| `projects/radio-show/audio-processor/server/main.py` | +18 / -1 — added `from fastapi.responses import FileResponse` and the `/api/db.sqlite` GET endpoint |

### Jupiter (deployed state)
| Path | Change |
|---|---|
| `/mnt/user/appdata/radio-archive/app/main.py` | Replaced (now matches `5e3b1a2`) |
| `/mnt/user/appdata/radio-archive/data/archive.db` | Replaced with classifier-populated copy (60,583,936 bytes, 1,405/1,407 scored) |
| Container `radio-archive` | Rebuilt to image `radio-archive:latest` (`sha256:dbb5ad62bdb1...`), running |

---

## Commands Run

### Tailscale verification (local)
```bash
tailscale status --json | grep -E "advertis|route|172\.|primary"
# Confirmed 172.16.0.0/22 listed under PrimaryRoutes
```
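
The grep is a quick eyeball check. A more structured version is sketched below in Python, assuming the JSON from `tailscale status --json` exposes a `Peer` map whose entries carry `HostName` and `PrimaryRoutes`. Only the `PrimaryRoutes` name appears in this log; the rest of the layout, the sample data, and the helper name `peers_routing` are assumptions for illustration.

```python
import ipaddress
import json


def peers_routing(status: dict, target_ip: str) -> list[str]:
    """Return hostnames of peers whose PrimaryRoutes cover target_ip."""
    target = ipaddress.ip_address(target_ip)
    hosts = []
    for peer in (status.get("Peer") or {}).values():
        for route in peer.get("PrimaryRoutes") or []:
            if target in ipaddress.ip_network(route, strict=False):
                hosts.append(peer.get("HostName", "?"))
                break  # one matching route per peer is enough
    return hosts


# Hand-written sample shaped like this session's findings (not captured output).
sample = json.loads("""
{"Peer": {
  "key1": {"HostName": "pfsense-2", "PrimaryRoutes": ["172.16.0.0/22"]},
  "key2": {"HostName": "acg-guru-5070"}
}}
""")

print(peers_routing(sample, "172.16.3.20"))  # ['pfsense-2']
```

`172.16.0.0/22` spans `172.16.0.0` through `172.16.3.255`, which is why Jupiter's `172.16.3.20` is reachable through the advertised route.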

### New repo creation
```bash
curl -X POST "https://git.azcomputerguru.com/api/v1/user/repos" \
  -H "Authorization: token <token>" \
  -d '{"name":"radio-archive-portable","private":true,"default_branch":"main"}'
# HTTP 201, repo id 12

cd /c/Users/guru/radio-archive-portable
git init -b main
git config user.name "Mike Swanson"
git config user.email "mike@azcomputerguru.com"
git add -A && git commit
git remote add origin https://azcomputerguru:<token>@git.azcomputerguru.com/azcomputerguru/radio-archive-portable.git
git push -u origin main
git remote set-url origin https://git.azcomputerguru.com/azcomputerguru/radio-archive-portable.git  # strip token
```

### Jupiter deploy
```bash
"/c/Program Files/PuTTY/pscp.exe" -batch -pw "$PW" -scp \
  c:/Users/guru/ClaudeTools/projects/radio-show/audio-processor/server/main.py \
  root@172.16.3.20:/mnt/user/appdata/radio-archive/app/main.py

"/c/Program Files/PuTTY/pscp.exe" -batch -pw "$PW" -scp \
  c:/Users/guru/ClaudeTools/projects/radio-show/audio-processor/archive-data/archive.db \
  root@172.16.3.20:/mnt/user/appdata/radio-archive/data/archive.db
# 60.5 MB at ~580 KB/s = ~100 seconds

"/c/Program Files/PuTTY/plink.exe" -batch -ssh -pw "$PW" root@172.16.3.20 \
  "cd /mnt/user/appdata/radio-archive/app && docker compose up -d --build"
# Built radio-archive:latest sha256:dbb5ad62bdb1..., container Running
```

### Live verification
```bash
curl -sS http://172.16.3.20:8765/api/stats
# {"counts":{"episodes":572,"segments":60917,...},"by_year":[{"year":2010,...

curl -sS -o test-fetched.db -w "HTTP %{http_code} | dl=%{size_download}B\n" \
  http://172.16.3.20:8765/api/db.sqlite
# HTTP 200 | dl=60583936B

.venv/Scripts/python.exe -c "
import sqlite3
db = sqlite3.connect('test-fetched.db')
print(db.execute('SELECT COUNT(*) FROM qa_pairs WHERE usefulness_score IS NOT NULL').fetchone())
"
# (1405,)

curl -sS 'http://172.16.3.20:8765/api/search?q=BIOS&kind=qa&min_score=4&limit=2'
# returns 2 hits, each with usefulness_score=5, topic_class='computer-help'
```
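
The inline `python -c` check could be kept as a small reusable helper. The sketch below reports scored versus total rows in `qa_pairs`; the table and column names come from the command above, while the function name `scored_counts` and the in-memory demo table are illustrative assumptions. Pointed at the DB fetched in this session, the figures in this log suggest it would report 1,405 scored rows out of 1,407, though that assumes `qa_pairs` holds exactly the 1,407 classified rows.

```python
import sqlite3


def scored_counts(con: sqlite3.Connection) -> tuple[int, int]:
    """Return (rows with a usefulness_score, total rows) for qa_pairs."""
    scored = con.execute(
        "SELECT COUNT(*) FROM qa_pairs WHERE usefulness_score IS NOT NULL"
    ).fetchone()[0]
    total = con.execute("SELECT COUNT(*) FROM qa_pairs").fetchone()[0]
    return scored, total


# Demo against a throwaway in-memory table (hypothetical minimal schema).
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE qa_pairs (id INTEGER PRIMARY KEY, usefulness_score INTEGER)"
)
demo.executemany(
    "INSERT INTO qa_pairs (usefulness_score) VALUES (?)", [(5,), (None,), (3,)]
)
print(scored_counts(demo))  # (2, 3)
```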

---

## Pending / Next

1. **Test the laptop install end-to-end** when the 5070 boots up next week — confirm sync-db.sh + run.sh work cleanly on Linux. Currently untested on the actual target machine.
2. **HTML index UI update** — backend supports `min_score` and `exclude_banter` query params, but the search UI on `/` doesn't expose them as toggles or show the score/topic_class on each hit. Backend is ready when the UI is.
3. **Re-run the 2 failed classifier rows** — `classify_qa_quality.py` re-invocation will retry the NULL-scored rows; one-line cleanup.
4. **Track 2 (voice profile clustering)** — still deferred. Lower priority since content-quality filter solved most of the search-quality problem.
5. **Track 3 (speaker oracle wiring through to search UI)** — still deferred. `speaker_oracle.py` resolves names from intros but the search results still show "CALLER" rather than the resolved name.

---

## Reference

### Endpoints (all live on http://172.16.3.20:8765/ as of this commit)

| Method | Path | Notes |
|---|---|---|
| GET | `/` | Search UI (no min_score toggle yet — query string works manually) |
| GET | `/api/stats` | Counts and per-year breakdown |
| GET | `/api/episodes?year=YYYY&limit=N` | Episode list |
| GET | `/api/episodes/{id}` | Detail with intros + qa_pairs (now includes usefulness_score, topic_class, is_banter) |
| GET | `/api/episodes/{id}/transcript` | Chronological merged segments + turns |
| GET | `/api/search?q=...&kind=both\|segments\|qa&min_score=N&exclude_banter=true&limit=N` | FTS5 |
| GET | `/api/callers?limit=N` | Top recurring caller_names |
| GET | `/api/db.sqlite` | **NEW** — streams the read-only DB blob (60 MB) |

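
Since the UI does not yet expose `min_score` or `exclude_banter`, the query string has to be assembled by hand. A minimal sketch of a URL builder for `/api/search` follows; the parameter names and the base URL come from the table above, while the helper name `search_url` and its defaults are assumptions.

```python
from urllib.parse import urlencode

BASE_URL = "http://172.16.3.20:8765"  # upstream container address from this log
KINDS = {"both", "segments", "qa"}


def search_url(q, kind="both", min_score=None, exclude_banter=False,
               limit=None, base=BASE_URL):
    """Build a /api/search URL, omitting any optional filter left unset."""
    if kind not in KINDS:
        raise ValueError(f"kind must be one of {sorted(KINDS)}")
    params = {"q": q, "kind": kind}
    if min_score is not None:
        params["min_score"] = min_score
    if exclude_banter:
        params["exclude_banter"] = "true"
    if limit is not None:
        params["limit"] = limit
    return f"{base}/api/search?{urlencode(params)}"


print(search_url("BIOS", kind="qa", min_score=4, limit=2))
# http://172.16.3.20:8765/api/search?q=BIOS&kind=qa&min_score=4&limit=2
```

On the laptop copy, passing `base="http://localhost:8765"` would target the local uvicorn instance instead.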

### Laptop next-week recipe (5070 / Linux)

```bash
# Tailscale already enabled on the laptop and on pfsense-2
git clone https://git.azcomputerguru.com/azcomputerguru/radio-archive-portable.git
cd radio-archive-portable
./sync-db.sh        # pulls from 172.16.3.20:8765/api/db.sqlite
./run.sh            # creates .venv, starts uvicorn on localhost:8765
xdg-open http://localhost:8765/
```

Refreshing: `./sync-db.sh` any time. Atomic — partial download won't corrupt existing DB.
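
The atomicity comes from the download-to-`.partial`, validate, rename pattern. `sync-db.sh` itself is a shell script built on `curl`; the sketch below is a Python rendering of the same pattern, not the script's actual contents. The function name `sync_db` and the `min_bytes` threshold are assumptions.

```python
import os
import urllib.request


def sync_db(url: str, dest: str, min_bytes: int = 1) -> int:
    """Download url to dest via a .partial file; return bytes written."""
    partial = dest + ".partial"
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    try:
        # urlopen raises on HTTP error statuses, similar to `curl -f`.
        with urllib.request.urlopen(url) as resp, open(partial, "wb") as out:
            while chunk := resp.read(1 << 20):
                out.write(chunk)
        size = os.path.getsize(partial)
        if size < min_bytes:
            raise ValueError(f"download too small: {size} bytes")
        # Rename is atomic when both paths sit on the same filesystem,
        # so readers see either the old DB or the complete new one.
        os.replace(partial, dest)
        return size
    except BaseException:
        # Leave any existing dest untouched; only discard the partial file.
        if os.path.exists(partial):
            os.remove(partial)
        raise
```

A call such as `sync_db("http://172.16.3.20:8765/api/db.sqlite", "archive-data/archive.db", min_bytes=1_000_000)` would mirror the shell script's behavior; the one-megabyte floor is an arbitrary illustrative value.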

### macOS variant (mikes-macbook-air, if used)
Same recipe. `python3 -m venv` works on Mac. `xdg-open` → `open`.

### Jupiter redeploy procedure (when source or DB changes)
```bash
# Source change:
"/c/Program Files/PuTTY/pscp.exe" -pw <pw> -scp server/main.py \
  root@172.16.3.20:/mnt/user/appdata/radio-archive/app/
"/c/Program Files/PuTTY/plink.exe" -ssh -pw <pw> root@172.16.3.20 \
  "cd /mnt/user/appdata/radio-archive/app && docker compose up -d --build"

# DB-only change (no container restart needed):
"/c/Program Files/PuTTY/pscp.exe" -pw <pw> -scp archive-data/archive.db \
  root@172.16.3.20:/mnt/user/appdata/radio-archive/data/archive.db
```

The SQLite connection on the container side is `mode=ro` URI — picks up fresh DB on next request without restart.
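
For reference, a read-only URI connection in Python's `sqlite3` looks roughly like the sketch below. It assumes the server opens a new connection per request, which the note above implies but this log does not show; the helper names `open_readonly` and `count_qa_pairs` are hypothetical. With per-request connections, a replaced `archive.db` is seen by the next call, and any write attempt raises `sqlite3.OperationalError`.

```python
import sqlite3
from pathlib import Path


def open_readonly(db_path: str) -> sqlite3.Connection:
    """Open db_path read-only through a SQLite URI (mode=ro)."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def count_qa_pairs(db_path: str) -> int:
    """Count qa_pairs rows using a fresh read-only connection per call."""
    con = open_readonly(db_path)
    try:
        return con.execute("SELECT COUNT(*) FROM qa_pairs").fetchone()[0]
    finally:
        con.close()
```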

---

## Status at session end

- **Upstream container** rebuilt + running with `/api/db.sqlite` endpoint live
- **Classified DB** deployed to Jupiter (1,405/1,407 scored)
- **Portable repo** created and pushed to `git.azcomputerguru.com/azcomputerguru/radio-archive-portable`
- **Laptop install** is a clone + 2 shell scripts; untested on the actual 5070 (will validate next week)
- **ClaudeTools commits:** `5e3b1a2` (this session's main.py change)
- **Untested edge cases:** offline behavior (planes, no Tailscale), curl with HTTP/2 to /api/db.sqlite (was tested with HTTP/1.1)