Commit Graph

54 Commits

Author SHA1 Message Date
6c1cb9ed83 sync: auto-sync from Mikes-MacBook-Air.local at 2026-05-16 09:02:02
Author: Mike Swanson
Machine: Mikes-MacBook-Air.local
Timestamp: 2026-05-16 09:02:02
2026-05-16 09:02:06 -07:00
f91355d993 sync: auto-sync from GURU-BEAST-ROG at 2026-05-15 18:35:44
Author: Mike Swanson
Machine: GURU-BEAST-ROG
Timestamp: 2026-05-15 18:35:44
2026-05-15 18:35:45 -07:00
8539f62462 radio-archive: add /api/clip endpoint + download buttons + ffmpeg in Dockerfile
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 08:44:46 -07:00
b008b61440 sync: auto-sync from GURU-BEAST-ROG at 2026-05-01 15:05:53
Author: Mike Swanson
Machine: GURU-BEAST-ROG
Timestamp: 2026-05-01 15:05:53
2026-05-01 15:05:56 -07:00
281cdc4e4f Session log: radio-show UI redesign recovery + Jupiter audio-404 diagnosis
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 05:41:07 -07:00
d7ce9cb670 radio: visual redesign of search + episode pages, active-Q&A highlight follows playhead
Frontend pass on the two embedded HTML templates in the FastAPI server. No
backend / Python logic changed; only template strings, CSS, and inline JS.

Index page: full CSS custom-property theme (light, #c39733 accent),
responsive viewport meta, search input with embedded SVG magnifier and
focus ring, control bar reorganised into divider-separated groups with
the browse-mode toggle rendered via :has() selector, hit cards with
hover-lift + arrow indicator and focus-visible outline, restyled Q/A
badges and score/topic chips, animated loading dots.

Episode page: sticky audio player and sticky aside (top: 130px,
max-height calc'd against viewport). New active-Q&A highlight builds a
sorted index of QA blocks at load time, computes each block's end as
the next block's start (capped at +180s), and on timeupdate/pause
toggles .active on both the body QA block and its aside list item; a
"NOW PLAYING" pill is revealed on .qa.active. Intro-marker also gets
.active. Audio preload bumped from none to metadata so #qa-<id> deep
links can seek without a prior user gesture.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 05:35:55 -07:00
f20a9628c3 radio: browseable Q&A — /api/qa, /api/audio range streaming, /episode HTML view
Make the radio archive Q&A pairs actually browseable end to end:

- /api/qa list endpoint (year, min_score, exclude_banter, topic_class,
  pagination, sort by air_date or score). Returns the same column shape as
  /api/search Q&A hits.
- /api/audio/{episode_id} streams the MP3 with HTTP Range support so the
  browser <audio> can seek. 206 + Content-Range when ranged, 200 when
  full-file. Returns 404 cleanly when episodes/ tree is absent (Jupiter).
- /episode/{id} HTML transcript view: chronological segments with clickable
  timestamps, Q&A blocks spliced inline (anchor #qa-<id>), intros marked
  inline, right-rail summary. Hash-anchor on load auto-seeks the audio.
- New question_excerpt / answer_excerpt fields on /api/search Q&A hits and
  on /api/qa items: trim leading run-on chatter, take ~300 chars, end on a
  sentence boundary or word boundary with ellipsis.
- Index UI: each Q&A hit now links to /episode/{id}#qa-{qa_id}; new
  "Browse all Q&A" toggle (year selector, sort, append-load 50 per page,
  defaults to min_score=3); FTS snippet replaced with the plain excerpt
  when available.

No new dependencies, no schema changes, no LLM calls. Uses
EPISODES_DIR env (default /data/episodes) — Jupiter compose still only
mounts /data so audio degrades gracefully to 404 there until episodes
are uploaded.
2026-04-30 07:17:48 -07:00
6239f9fc3a radio: session log update — index UI exposes classifier filters
Backend min_score/exclude_banter wired through to HTML index. Adds
score badges (1-5 red->green), topic_class pills, dim styling on
banter rows. Live on http://172.16.3.20:8765/. Synced to portable
repo. pscp ENOSPC quirk worked around by plink-stdin streaming.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 06:07:00 -07:00
b9af34fbd8 radio: index UI exposes min_score / exclude_banter + score badges
Adds quality-filter controls to the search UI: a "min score" select
(any/2+/3+/4+/5) and a "hide banter" checkbox. Q/A hits gain a small
color-coded usefulness badge (1-5, red->green) and a topic_class tag
(computer-help, banter, off-topic, promo). Low-score and banter rows
render dimmed by default so they're visible but de-emphasized.

Defaults to "any" + banter visible to preserve existing search habits.
Mike toggles up when he wants quality. URL-encoded params built via
URLSearchParams so empty values don't leak into requests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 05:54:45 -07:00
48c8b311bf radio: session log — portable laptop bundle + /api/db.sqlite deploy
New private Gitea repo `azcomputerguru/radio-archive-portable` for
laptop offline use. Upstream gained /api/db.sqlite for HTTP-only DB
sync (no SSH keys needed). Jupiter container rebuilt + restarted with
the classifier-populated DB; verified end-to-end (200 OK, 60.5 MB,
1,405 classifier rows intact, min_score filter working).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 05:37:01 -07:00
5e3b1a2297 radio: add /api/db.sqlite for offline laptop sync
Streams the read-only archive.db over the same Tailscale-routed port
as the search service. Companion to azcomputerguru/radio-archive-portable
which curl-fetches from this endpoint and runs locally on the laptop.

Disclosure equivalent to /api/search (which already exposes every
transcript), so no auth added. Deployed to Jupiter; verified GET
returns 60 MB SQLite blob with all 1,405 classifier rows intact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 19:50:50 -07:00
8d4bb16255 radio: session log — Q/A usefulness classifier (Track 1) complete
3.5h run on qwen3:14b processed 1,405/1,407 Q/A pairs (2 failed,
will retry on next invocation). 37% scored 4-5 (useful), 41%
scored 1-2 (banter/promo/off-topic). API filter ready; Jupiter
redeploy pending Mike's manual review.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 17:34:15 -07:00
42688901f9 radio: Q/A usefulness classifier + min_score search filter (Track 1)
Adds an Ollama-based content quality classifier and exposes the
results via the search API. 1,407 existing Q/A pairs were scored
in 3.5h via qwen3:14b (1,405 succeeded, 2 failed).

Distribution: 37% scored 4-5 (useful), 41% scored 1-2 (banter/promo/
off-topic). 43% flagged as banter overall. Default-on filtering at
search time will hide ~half of the noise without losing any real
listener questions.

Files:
- new classify_qa_quality.py: walks qa_pairs, calls Ollama qwen3:14b
  per row, writes usefulness_score/topic_class/is_banter back to DB.
  Idempotent (--rebuild to reprocess), --smoke for sample check, --limit
  for partial runs. Detached run handles 1407 rows in ~3.5h on a 4090.
- server/main.py: /api/search accepts min_score (0-5) and exclude_banter
  query params. NULL scores treat as "include" so unprocessed rows still
  appear. Episode detail endpoint includes the new fields in qa results.

Schema migration in import_to_sqlite.py was made by the same agent run
(visible on the live archive.db: usefulness_score / topic_class /
is_banter columns now exist on qa_pairs).

Local archive.db updated; Jupiter container has NOT been redeployed
yet — that is a separate manual step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 17:32:41 -07:00
6b63c154d2 sync: auto-sync from GURU-BEAST-ROG at 2026-04-29 13:29:17
Author: Mike Swanson
Machine: GURU-BEAST-ROG
Timestamp: 2026-04-29 13:29:17
2026-04-29 13:29:19 -07:00
519278a664 radio: session log update — Jupiter container live at 172.16.3.20:8765
Append to 2026-04-28-session.md covering the FastAPI/SQLite container
deploy: build + ship + verify, plus credentials, paths, and re-deploy
procedures for both DB updates and source updates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 06:05:34 -07:00
71ada136a8 radio: FastAPI/SQLite query server, deployed to Jupiter
Read-only HTTP layer over archive.db. Endpoints: /api/stats,
/api/episodes, /api/episodes/{id}, /api/episodes/{id}/transcript,
/api/search (FTS5 over segments + qa_pairs, bm25-ranked, snippets),
/api/callers. Single-file HTML index with debounced search UI.

Deployed: Jupiter (Unraid Docker), bound to 172.16.3.20:8765, LAN only.
Container path: /mnt/user/appdata/radio-archive/{app,data}.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 06:00:22 -07:00
90b1ffff8b radio: session log — full archive imported (572 ep / 482.7h / 57.7 MB DB)
Execution-only follow-on to 2026-04-27. Both batch passes done (519+53,
0 errors), import_to_sqlite.py run incrementally to bring archive.db
to final state. Next step: Jupiter Docker container deploy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 05:30:08 -07:00
82940d96d7 radio: utf-8 transcript writes + sqlite archive importer + session log
- src/transcriber.py: open transcript.{json,txt,srt} with encoding="utf-8".
  Windows cp1252 default crashed on Whisper output containing U+2044.
- import_to_sqlite.py: new. Walks archive-data/transcripts, builds
  archive.db (5 tables + 2 FTS5 virtual tables, sha256-keyed idempotency).
  20.5 MB / 208 episodes at smoke-test time, 1.9s rebuild.
- batch_process.py: tracked from prior session — full-archive batch with
  resumable transcribe/diarize/intros/qa pipeline.
- .gitignore: archive-data/ and logs/.

Session log: 2026-04-27-archive-batch-and-sqlite-import.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 19:38:02 -07:00
488bf5849e radio: attach caller names to Q&A pairs from transcript intros
QAPair gets caller_name and caller_role fields populated by a new
attach_caller_names(pairs, transcript_segments) helper. For each pair,
finds the active opening intro at the question_start time (8s forward
tolerance, no backward limit — a caller's call can run for 10+ minutes
and the intro happens once at the start) and attaches the speaker name.

Validation on 9-episode test set:
  19/19 Q&A pairs (100%) now have caller names attached.

Examples of corrections from oracle attribution:
  2018-s10e18 @ 73:36  Christopher (was misattributed to "Tara")
  2015-s7e19 @ 35:45   William     (was misattributed to "Tara")
  2010-05-08-hr1       Jackie x3, Bruce
  2012-03-10-hr1       Adam x2
  2016-s8e43           John, Doug
  2017-s9e30           Tom, Denise x3, Charlie

speaker_oracle.py: adds speaker_at(time, intros) helper used both by the
existing resolve_speakers() and the new caller-name attachment. Also
adds the "let's fit/bring/put X in/on" intro pattern variant (caught
Charlie at 70:21 in 2017-s9e30 that "talk to X" missed).

download_full_archive.py: SSH keepalive every 30s + per-file retry-on-
failure (up to 3 attempts with reconnect). Earlier run hung on a dead
connection at file 109 of 589 with no recovery; restarted run is now
running at ~10 MB/s vs ~2-3 MB/s before.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 16:55:31 -07:00
1b574caba4 radio: transcript-driven speaker name resolution (oracle)
New module src/speaker_oracle.py extracts speaker introductions from
transcripts ("let's talk to William", "we have Clay from the Nerd Junkies",
"in Tara's place, we have Clay", "thanks for the call <name>") and binds
them to non-HOST diarization turns. Pure post-pass on diarization JSONs,
no audio processing — corrects audio-only cosine errors using Mike's
deterministic on-air announcements.

Algorithm:
- Extract intros: regex patterns for caller pickups, guest intros,
  fill-in announcements, caller closes. Case-strict (rejects mid-sentence
  lowercase matches), with a blacklist of common false-positive words.
  Deduplicates same-name intros within 5s.
- Resolve speakers: for each non-HOST turn, find the LATEST opening intro
  at or before turn.start (with 8s forward tolerance for boundary slop).
  Later intros implicitly close earlier callers, so the most recent
  intro wins. No artificial lookback limit (callers can talk for 10+ min).
- Falls back to caller_close patterns within 30s after a turn ends.

Validation on 9-episode test set:
  2018-s10e18: Christopher 190s correctly named (was mislabeled "Tara")
  2012-06-09 : Kay 160s correctly named (was mislabeled "Tara")
  2015-s7e19 : Clay 45s as fillin for Tara, William 40s as caller
  2016-s8e43 : Charles 630s, Bruce 210s, John 205s — most callers named
  2017-s9e30 : Denise 295s, Tom 115s, Elaine 85s, Jeff 10s
  Many other callers across all episodes correctly named.

Remaining unnamed CO-HOST/CALLER (~5-10% of non-HOST time) are real
co-host banter or callers without explicit Mike-introductions.

benchmark.py: adds Phase 2.5 "Name Resolution" between diarization and
Q&A extraction. Prints named-speaker breakdown per episode. Doesn't
modify diarization JSONs (resolution is computed on demand).

Next step: feed named turns into qa_extractor so Q&A pairs get caller
name attached for searchability. Also: bootstrap recurring-speaker
profiles (Tara, Tony, Rob, Randall, producers) by accumulating
intro-tagged windows across the full archive once download completes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 16:48:16 -07:00
4c89402df8 radio: skip Clay profile build (failed) — accept 2015-s7e19 Q&A as noisy
First attempt at Clay's voice profile from 2015-s7e19 produced
Clay-vs-Mike cosine similarity of 0.994 — essentially a Mike clone.
Root cause: 10s WavLM x-vector chunks averaged Mike's frequent
interjections together with Clay's dialogue, and Mike's well-trained
profile dominated the resulting embedding signal.

Mike's call: skip Clay, accept the 2015-s7e19 Q&A as noisy. Clay rarely
appears in other episodes, so the cost of not having his profile is
bounded to this one episode plus any rare future appearances.

Cleanup:
- voice-profiles/clay/ removed
- voice-profiles/profiles.json: Clay entry removed
- Memory updated to record the decision and the failure mode

Kept build_clay_profile.py in-repo as documentation of the attempt and
the Mike-similarity-filter pattern. Useful starting point if a future
attempt provides cleaner pure-Clay timestamps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 16:36:46 -07:00
c760e430c0 radio: bumper detection in diarizer + full archive download script
Adds a transcript-driven bumper filter to the diarization pipeline. When
a transcript segment matches qa_extractor's promo/bumper signatures, the
overlapping audio windows are labeled BUMPER and the WavLM cosine match
is skipped. Prevents music/promo from being matched against speaker
profiles (the failure mode Mike caught in 2018-s10e18 @ 09:20-10:05).

Code changes:
- src/voice_profiler.py: identify_speakers() takes optional skip_ranges
  parameter; windows whose midpoint falls in a skip range get labeled
  "[bumper]" and skip cosine match
- src/diarizer.py: diarize() takes optional transcript_path; pre-computes
  bumper time ranges via qa_extractor._is_promo_or_bumper, passes to
  identify_speakers; adds BUMPER speaker label
- benchmark.py: passes transcript_path to diarize()

Aggregate impact across 9-episode test set:
  Tara attribution: 4880s -> 3680s  (-1200s / -25%)
  Q&A pairs: 17 -> 19 (+2)
    (bumper-flagged segments had been disrupting conversation detection
     in 2017-s9e30 and 2018-s10e18)
  CALLER total: 1320s -> 1190s  (bumpers previously labeled CALLER moved)
  Per-episode bumpers caught: 1-8, total ~165 bumper segments across set

Remaining Tara false positives are real callers acoustically similar to
Tara (Christopher in 2018, Kay in 2012, William and Charles in 2015) and
guest Clay in 2015-s7e19 — those need profile rebuild + Clay profile,
not bumper filtering.

Adds download_full_archive.py — resumable mirror-style downloader that
walks IX server's /home/gurushow/public_html/archive/{year}/ and copies
all MP3s to archive-data/episodes/. Run is in progress (~589 files,
~10-15GB). Used to source clean profile windows for the remaining
co-hosts (Tara rebuild, Clay, Tony, Rob, Randall, producers).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 16:17:50 -07:00
a4f527f31e radio: per-year test set (one episode per year, 2010-2018)
Added 2010, 2015, 2018 test episodes to round out the test set to one
per available year:
- 2010-05-08-hr1 (May 2010, earliest available; pre-Tara era)
- 2015-s7e19 (Jan 2015, avoids training's s7e30)
- 2018-s10e18 (only 3 non-training 2018 episodes exist)

Archive has no 2019 directory — Rob's "2018/2019 appearances" are
constrained to the 5 available 2018 episodes only.

Per-year diarization summary (Tara presence, post-rename):
  2010-05-08    30s   1.2%   likely false positive (pre-Tara)
  2011-03-12   140s   5.6%   likely false positive (call-in only)
  2012-03-10    30s   1.1%   likely false positive (call-in only)
  2012-06-09   340s  12.8%   suspicious — Mike to confirm
  2014-s6e19   680s  23.3%   confirmed
  2015-s7e19   280s   9.9%   plausible — Mike to confirm
  2016-s8e43  1890s  35.5%   confirmed
  2017-s9e30   610s  11.4%   plausible
  2018-s10e18  880s  17.1%   COULD BE ROB — Mike flagged Rob for
                              2018/2019 appearances; cosine threshold may
                              be hitting on Rob being acoustically similar
                              to Tara

Total Tara across 9 episodes: 1h 21m / 8h 52m audio (15.3%).

Q&A counts (still suspect — every voice that isn't Mike-or-Tara is
labeled CALLER, so Randall/Rob/producers inflate the bucket):
  2010=4, 2011=1, 2012a=2, 2012b=0, 2014=0, 2015=1, 2016=2, 2017=4, 2018=3
  Total: 17 pairs across 9 episodes

4090 perf on the expanded set:
- Diarization: 31928s in 121.5s = 262.7x realtime (vs 209.7x on 5070 Ti, +25.3%)
- Transcription (3 new episodes only): 10554s in 112.4s = 93.9x

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 15:20:09 -07:00
fb683d6a05 radio: rename Tom -> Tara, expand speaker roster
Mike confirmed there is no co-host named "Tom" — the voice in 2014-s6e19
and 2016-s8e43 is Tara. The 5070 Ti session fabricated the Tom identity.
The voice profile itself (44 embeddings, 0.698 cosine vs Mike) is correct;
only the human label was wrong.

Rename swept:
- voice-profiles/tom/ -> voice-profiles/tara/ (git mv preserves all .npy)
- voice-profiles/profiles.json: "Tom" key -> "Tara"
- build_cohost_profile.py: TOM_WINDOWS -> TARA_WINDOWS, COHOST_NAME, comments
- 2026-04-27-qa-extraction-cohost-indexing.md: correction header + body sweep
- 2026-04-27-4090-benchmark-and-test-set.md: closure note
- .claude/memory/radio_show_no_cohost_named_tom.md: resolution + speaker roster

Diarization re-run after rename so speaker_map emits "Cohost: Tara".
Q&A counts unchanged (rename is label-only): 9 pairs across 6 test episodes.

Tara distribution from the post-rename diarization (per-episode % of audio):
  2011-03-12-hr1   140s   5.6%   likely false positive (call-in only)
  2012-03-10-hr1    30s   1.1%   likely false positive (call-in only)
  2012-06-09-hr1   340s  12.8%   suspicious — pending Mike confirm
  2014-s6e19       680s  23.3%   confirmed
  2016-s8e43      1890s  35.5%   confirmed
  2017-s9e30       610s  11.4%   plausible — pending Mike confirm

Broader speaker-roster context Mike provided this session (saved to
memory): the show has had multiple co-hosts (Tara, Randall, Rob) plus
producers/board ops (Andrew, Shannon, Ken, others) who would sometimes
go on-air. Only Tara has a profile so far. Every other speaker is
currently labeled CALLER, which means small CO-HOST attributions in
unexpected episodes (e.g. 2011/2012) may actually be a producer rather
than a false positive — Mike to spot-check.

Action item before full-archive run: build profiles for Randall, Rob,
and the named producers to avoid systematic Q&A false positives in
early-years and 2018/2019 episodes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 15:11:03 -07:00
b9a4bb8807 scc: 4090 benchmark with new code state — 338.1x diarize, 94.8x transcribe
Re-ran benchmark.py on GURU-BEAST-ROG against the post-overhaul code
(co-host profile, batched Whisper int8_float16, revised Q&A extractor).

Results vs 5070 Ti baseline:
- Diarization: 209.7x -> 338.1x (+61.2%)
- Transcription: 63.8x -> 94.8x (+48.6%)
- Q&A pairs: 9 vs 10 (within run-to-run noise; structural correctness matches:
  2014 = 0 callers, 2016 = 2 WiFi caller pairs)

Setup change: BENCH_SETUP.md now lists ffmpeg as a Step-2 prereq
(winget install Gyan.FFmpeg). Was missing on this machine and the pipeline
fails silently at the first diarize call without ffprobe.

Code change: benchmark.py BASELINE_RTF updated 149.5 -> 209.7 to reflect
the 5070 Ti's post-overhaul measurement (e9ac607).

Data: 6 test episode transcripts and diarizations regenerated under the
new code path (batched Whisper output + co-host-aware speaker_map).

Correction memory: voice-profiles/tom/ directory + 5070 Ti session log
fabricated a co-host named "Tom" — Mike confirms no such person exists on
the show. The audio profile is real and the diarization separation is
sound, but the human identity attached to it is wrong. Saved under
.claude/memory/radio_show_no_cohost_named_tom.md pending Mike providing
the correct name for rename.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 14:54:07 -07:00
7bb683a3ed sync: auto-sync from GURU-BEAST-ROG at 2026-04-27 14:42:18
Author: Mike Swanson
Machine: GURU-BEAST-ROG
Timestamp: 2026-04-27 14:42:18
2026-04-27 14:42:25 -07:00
e9ac607500 radio show: co-host voice profile, Q&A extraction fixes, archive index
- Build Tom (co-host) voice profile (44 embeddings, 0.698 similarity to Mike)
- diarizer.py: add CO-HOST speaker label for cohost-role profiles
- voice_profiler.py: emit "Cohost: <name>" label for cohost role
- qa_extractor.py: overlap resolution at load time (midpoint boundary split),
  4s CALLER-preference threshold, turn-based caller-intro lookback (2 HOST turns),
  _preceded_by_caller_intro() helper, _PHONE_GREETING pattern,
  751-1041 + "we'll get your problem solved" promo signatures
- benchmark.py: use src.transcriber.transcribe with batch_size=16
- add index_test_episodes.py and build_cohost_profile.py scripts
- add .gitignore (exclude episodes, transcripts, *.db, .venv)
- session log: 2026-04-27-qa-extraction-cohost-indexing.md

Result: 2016-s8e43 drops from 12 false-positive Q&A pairs to 2 real caller pairs.
archive.db: 6 episodes, 762 segments, 10 Q&A pairs, FTS5 search verified.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-27 14:41:04 -07:00
79abef9dc9 radio: diarization pipeline fixes, benchmark setup, test episode set
- Fix voice_profiler threshold bug (HOST label overwrote Unknown unconditionally)
- Audio preload optimization: single ffmpeg per episode, 149.5x realtime on 5070 Ti
- WavLM threshold raised to 0.85 (Mike 0.90-0.99, callers 0.46-0.83)
- Promo/bumper filter: weighted signature scoring, 42->27 clean Q&A pairs
- Text-only Q&A fallback for episodes with no CALLER diarization labels
- TRANSFORMERS_OFFLINE=1 to skip HuggingFace freshness checks
- Add diarize_2018.py for targeted re-run + FTS5 rebuild
- Add benchmark.py + BENCH_SETUP.md for GURU-BEAST-ROG (RTX 4090) comparison
- Commit 9-episode training diarization.json outputs
- Session log: 2026-04-27-diarization-pipeline.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-27 13:20:40 -07:00
26dbe2327f sync: Auto-sync from GURU-BEAST-ROG at 2026-04-26 15:09:57
Synced files:
- Session logs updated
- Latest context and credentials
- Command/directive updates

Machine: GURU-BEAST-ROG
Timestamp: 2026-04-26 15:09:57

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-04-26 15:10:06 -07:00
ab518fce30 radio show: patch Option D (big-money-bets) to full quality
Replaced thin Ollama draft with complete show prep:
- Full common thread narrative
- 5-7 talking points per segment (was 2-3)
- Added second story per segment (dot-com playbook, Optimus robot, Adobe/NVIDIA small biz angle)
- Specific facts: NASDAQ -78%, Amazon $107->$5.51, pets.com $82.5M raised
- Tucson-specific angles added throughout
- HTML rewritten with full template CSS matching April 18 show format

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 06:57:31 -07:00
822574429d radio: 2026-04-25 show prep — three episodes (AI jobs, GPT-5.5 arms race, big money bets)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 06:45:36 -07:00
492fbbf4c9 chore: add PROJECT_STATE.md to all active projects and clients
Establishes inter-session coordination for 29 projects/clients:
- Full lock/component format for active projects (dataforth-dos,
  radio-show, cascades-tucson, valleywide, instrumental-music-center,
  lens-auto-brokerage, msp-audit-scripts)
- Light format for complete/stalled/planning (msp-pricing, pavon,
  wrightstown-*, gururmm-agent, community-forum, glaztech, etc.)
- Onboarding stubs for recently added clients

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 18:53:34 -07:00
002a3ff69b sync: Mac session - radio show prep + vanilla cake recipe
- Added fresh radio show prep HTML (April 18, 2026 broadcast)
- Created vanilla cake recipe HTML for web publishing
- Removed guru-rmm submodule (migration incomplete, needs gururmm repo)

Machine: Mikes-MacBook-Air.local
Timestamp: 2026-04-19 08:09:00

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-04-19 08:28:31 -07:00
0d46de672f Update HTML show prep with enhanced details
- Removed gaming section per user request
- Added detailed pricing and availability for all CES gadgets
- Added company names, researchers, trial info for medical breakthroughs
- Added detailed specs for AI tools (NotebookLM, Gemini)
- Updated to 3-segment format
- Added price badges and availability badges for visual clarity
- Used ASCII markers instead of emojis per directives

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-04-12 18:43:32 -07:00
fcf4efefc9 Enhance April 18 show prep with detailed specs and pricing
- Added company names, prices, availability dates for all topics
- CES gadgets: LG (,999-,999), Samsung TriFold (,500-,000), Roborock (,599), etc.
- Medical: Galleri test (, available now), VERVE-102 gene therapy details
- AI tools: NotebookLM (free), Gemini Imagen 3 (free tier), detailed access info
- Removed gaming section per user request
- Updated common thread and show wrap for 3-segment format
- Added specific researchers, trial status, company details throughout

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-04-12 18:43:32 -07:00
b6a2faa9a2 Add radio show prep files and IX security scan
- Show prep for April 5, 11, 18, 2026 (markdown + HTML)
- IX server Smart Slider 3 Pro security scan script
- Comprehensive security audit report (87 WordPress sites)
- All sites safe: 0 PRO (compromised), 3 FREE (safe)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-04-12 18:43:32 -07:00
9011670fce sync: Auto-sync from GURU-BEAST-ROG at 2026-03-25 03:45:04
Synced files:
- Session logs updated
- Latest context and credentials
- Command/directive updates

Machine: GURU-BEAST-ROG
Timestamp: 2026-03-25 03:45:04

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-25 03:46:07 -07:00
ad88fc31f0 sync: Auto-sync from acg-guru-5070 at 2026-03-22 22:31:46
Synced files:
- Session logs updated
- Latest context and credentials
- Command/directive updates

Machine: acg-guru-5070
Timestamp: 2026-03-22 22:31:46

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-22 22:31:46 -07:00
a3a47f2d5e Add batch transcription scripts and 8 episode transcripts
Created Mac M4 batch transcription using mlx-whisper with Apple Silicon
GPU acceleration. Transcribed 8 remaining episodes (17,555 total segments).

Scripts:
- batch_transcribe_mac.py: Full batch processor with mlx-whisper
- test_mac_transcribe.py: Quick test script for faster-whisper

Transcripts (JSON, SRT, TXT formats):
- 2011-06-04-hr1: 1,503 segments
- 2011-09-10-hr1: 1,378 segments
- 2014-s6e05: 1,340 segments
- 2015-s7e30: 1,053 segments
- 2016-s8e42: 2,205 segments
- 2017-s9e26: 2,366 segments
- 2018-s10e17: 4,683 segments
- 2018-s10e21: 2,493 segments

All 9 episodes now transcribed (8 on Mac + 1 from Linux).
Ready for Stages 3-6 on Linux PC.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-21 23:12:06 -07:00
122b87a1d6 Audio processor: add Mac build task for voice training
GPU firmware bug (NVRM 0x00000062) on RTX 5070 Ti makes
GPU transcription impossible. Handoff doc for Mac M4 to
build native version and complete the 8 remaining episode
transcriptions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 17:44:52 -07:00
bdd070f055 Radio show prep: Expanded show notes for March 21 episode
- Researched and expanded all 6 segments with additional detail
- Added 35+ source links throughout
- Expanded NVIDIA GTC coverage (Vera Rubin specs, Groq acquisition, $1T orders)
- Added White House AI Framework 7 pillars breakdown
- Detailed TELUS breach attack chain via Salesloft/Drift
- Expanded Right to Repair with Colorado HB24-1121 parts pairing ban
- Added GPT-5.4, LillyPod, Uber robotaxi details to bonus section

Machine: Mikes-MacBook-Air.local

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-21 17:36:40 -07:00
a29d00c6b2 sync: Auto-sync from acg-guru-5070 at 2026-03-21 16:34:05
Synced files:
- Session logs updated
- Latest context and credentials
- Command/directive updates

Machine: acg-guru-5070
Timestamp: 2026-03-21 16:34:05

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-21 16:34:05 -07:00
6cc9043b8e Audio processor: validated voice profiling accuracy, tuned threshold
- Fine-grained speaker analysis (3s windows, 1s hop) across 42min episode
- Host voice: 0.90-0.98 similarity (clear positive match)
- Callers: 0.65-0.68 (correctly below threshold)
- Produced audio/clips: 0.53-0.65 (correctly identified as non-host)
- Co-host/other speakers: 0.56-0.62 (correctly identified)
- Tuned host_match_threshold from 0.75 to 0.83 based on empirical data
- Cross-referenced dips with transcript: correctly identifies callers,
  show intros, played audio clips, and station breaks
- Batch transcription of 7 additional training episodes in progress

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 12:48:25 -07:00
826141a319 Audio processor: working voice profiler with WavLM speaker embeddings
- Voice profiler using microsoft/wavlm-base-sv (512-dim x-vector embeddings)
- Bootstrap from archive: 180 embeddings from 9 episodes across 2010-2018
- Host identification accuracy: 0.87-0.98 similarity for live speech,
  0.60-0.64 for non-host audio (produced intros, co-host)
- Dropped speechbrain dependency (requires torchaudio, CUDA version conflicts)
- Patched torchaudio CUDA 12.8/13.1 version check (warning instead of error)
- Profile stored in voice-profiles/mike-swanson/ with per-chunk embeddings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 12:19:13 -07:00
87f5a9306a Audio processor: fix segment detection with transcript-driven breaks
- Add transcript break phrase detection (going_to_break/coming_back cues)
- Create segments from transcript breaks with silence boundary snapping
- Fix segment dedup in merge_adjacent (handle overlapping segments)
- Add CUDA 12 library path fix (gpu.py + venv activate hook)
- Auto-load existing transcript in detect command
- Tested on 2011-03-05 HR1: correctly identifies commercial break at 34:38

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 11:59:54 -07:00
a1e0442d8b Add radio show audio processor and post-show workflow
- Audio processor CLI tool with 6-stage pipeline: transcribe (faster-whisper GPU),
  diarize (pyannote), detect segments (multi-signal classifier), remove commercials,
  split segments, analyze content (Ollama)
- Post-show workflow doc for episode posts, forum threads, deep-dive blog posts
- Training plan for using 579-episode archive for voice profiles and commercial detection
- Successful test: 45min episode transcribed in 2:37 on RTX 5070 Ti
- Sample transcript output from S7E30 (March 2015)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 11:51:59 -07:00
604c9d9d4b Session log: repo reorganization, GrepAI test, radio show prep
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 18:34:11 -07:00
5cbd49ce24 Reorganize repo: compartmentalize scripts by client/project
Move 150+ scripts from root and scripts/ into client/project directories:
- clients/dataforth/scripts/ (110 files: AD2, sync, SSH, DB, DOS scripts)
- clients/bg-builders/scripts/ (14 files: Lesley mgmt, Exchange, termination)
- clients/internal-infrastructure/scripts/ (10 files: GDAP, Gitea, backups)
- projects/msp-tools/scripts/ (9 files: CIPP, MSP onboarding, Datto)
- projects/gururmm-agent/scripts/ (3 files: API test, JWT, record counts)
- clients/glaztech/scripts/ (1 file: CentraStage removal)

Also reorganized:
- VPN scripts → infrastructure/vpn-configs/
- Retrieved API/JS files → api/
- Forum posts → projects/community-forum/forum-posts/
- SSH docs → clients/internal-infrastructure/docs/
- NWTOC/CTONW docs → projects/wrightstown-smarthome/docs/
- ACG website files → projects/internal/acg-website-2025/
- Dataforth docs → clients/dataforth/docs/
- schema-retrieved.sql → docs/database/

Deleted 24 tmp_*.ps1 one-off debug scripts (preserved in git history).
Root reduced from 220+ files to 62 items (docs + directories only).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 17:15:07 -07:00
acea558406 Community page: link forum to community.azcomputerguru.com
Flarum forum is live with categories: Tech News, Security & Privacy,
AI, Space Tech, Gadgets & Hardware, How-Tos & Tips, Show Discussion,
Off-Topic. Email configured via local Exim.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 07:51:33 -07:00
4e84a7f810 sync: Auto-sync from Mikes-MacBook-Air.local at 2026-03-16 06:58:31
Synced files:
- Session logs updated
- Latest context and credentials
- Command/directive updates

Machine: Mikes-MacBook-Air.local
Timestamp: 2026-03-16 06:58:31

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-16 06:59:22 -07:00