
Mike Swanson 1b574caba4 radio: transcript-driven speaker name resolution (oracle)

New module src/speaker_oracle.py extracts speaker introductions from
transcripts ("let's talk to William", "we have Clay from the Nerd Junkies",
"in Tara's place, we have Clay", "thanks for the call <name>") and binds
them to non-HOST diarization turns. Pure post-pass on diarization JSONs,
no audio processing: uses Mike's deterministic on-air announcements to
correct mislabels from audio-only cosine-similarity matching.

Algorithm:
- Extract intros: regex patterns for caller pickups, guest intros,
  fill-in announcements, caller closes. Case-strict (rejects mid-sentence
  lowercase matches), with a blacklist of common false-positive words.
  Deduplicates same-name intros within 5s.
- Resolve speakers: for each non-HOST turn, find the LATEST opening intro
  at or before turn.start (with 8s forward tolerance for boundary slop).
  Later intros implicitly close earlier callers, so the most recent
  intro wins. No artificial lookback limit (callers can talk for 10+ min).
- Falls back to caller_close patterns within 30s after a turn ends.
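
A minimal Python sketch of the extract and resolve passes above. It is not the code in src/speaker_oracle.py: the regexes, the blacklist entries, the (start_seconds, text) segment format, and the turn dict keys are hypothetical stand-ins. Only the 5s dedupe, the 8s forward tolerance, the 30s caller-close window, and the latest-intro-wins rule come from the description.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical patterns modeled on the quoted phrases; the real pattern set
# and blacklist are not reproduced here. Trigger phrases are matched
# case-insensitively, but the captured name must be capitalized, so
# "we have time" is rejected (case-strict names).
NAME = r"([A-Z][a-z]+)"
OPEN_PATTERNS = [
    re.compile(r"(?i:\blet's talk to) " + NAME),
    re.compile(r"(?i:\bwe have) " + NAME),
]
CLOSE_PATTERNS = [re.compile(r"(?i:\bthanks for the call),? " + NAME)]
BLACKLIST = {"Everybody", "Today"}  # hypothetical false-positive words

DEDUPE_WINDOW = 5.0      # same-name intros within 5 s count once
FORWARD_TOLERANCE = 8.0  # an intro may land up to 8 s after turn.start
CLOSE_WINDOW = 30.0      # a caller close must fall within 30 s after turn end


@dataclass
class Intro:
    name: str
    time: float
    kind: str  # "open" or "close"


def extract_intros(segments) -> List[Intro]:
    """segments: iterable of (start_seconds, text) transcript pieces."""
    intros: List[Intro] = []
    for start, text in segments:
        for kind, patterns in (("open", OPEN_PATTERNS), ("close", CLOSE_PATTERNS)):
            for pattern in patterns:
                for match in pattern.finditer(text):
                    name = match.group(1)
                    if name in BLACKLIST:
                        continue
                    duplicate = any(
                        i.name == name and i.kind == kind
                        and abs(i.time - start) <= DEDUPE_WINDOW
                        for i in intros
                    )
                    if not duplicate:
                        intros.append(Intro(name, start, kind))
    return sorted(intros, key=lambda i: i.time)


def resolve_turn(start: float, end: float, intros: List[Intro]) -> Optional[str]:
    # Latest opening intro at or before start (plus tolerance) wins; with no
    # lookback limit, a later intro implicitly closes the earlier caller.
    opens = [i for i in intros if i.kind == "open" and i.time <= start + FORWARD_TOLERANCE]
    if opens:
        return max(opens, key=lambda i: i.time).name
    # Fallback: the first caller close heard within 30 s after the turn ends.
    closes = [i for i in intros if i.kind == "close" and end <= i.time <= end + CLOSE_WINDOW]
    if closes:
        return min(closes, key=lambda i: i.time).name
    return None


def resolve_speakers(turns, intros):
    """turns: dicts with hypothetical keys 'speaker', 'start', 'end'.
    Returns new dicts with a 'name' key; the input turns are not modified."""
    resolved = []
    for turn in turns:
        name = None
        if turn["speaker"] != "HOST":
            name = resolve_turn(turn["start"], turn["end"], intros)
        resolved.append({**turn, "name": name})
    return resolved
```

Returning new dicts instead of mutating the turns mirrors the note that resolution is computed on demand and the diarization JSONs stay untouched.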

Validation on 9-episode test set:
  2018-s10e18: Christopher 190s correctly named (was mislabeled "Tara")
  2012-06-09 : Kay 160s correctly named (was mislabeled "Tara")
  2015-s7e19 : Clay 45s as fillin for Tara, William 40s as caller
  2016-s8e43 : Charles 630s, Bruce 210s, John 205s — most callers named
  2017-s9e30 : Denise 295s, Tom 115s, Elaine 85s, Jeff 10s
  Many other callers across all episodes correctly named.

Remaining unnamed CO-HOST/CALLER turns (~5-10% of non-HOST time) are
genuine co-host banter or callers whom Mike never introduced by name.

benchmark.py: adds Phase 2.5 "Name Resolution" between diarization and
Q&A extraction. Prints named-speaker breakdown per episode. Doesn't
modify diarization JSONs (resolution is computed on demand).
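
A sketch of the kind of per-episode breakdown Phase 2.5 prints, assuming each resolved turn is a dict with hypothetical "name", "start", and "end" keys in seconds. The function names are illustrative, not benchmark.py's.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def named_breakdown(resolved_turns) -> List[Tuple[str, int]]:
    """Total seconds per resolved name, largest first; unnamed turns are skipped."""
    totals: Dict[str, float] = defaultdict(float)
    for turn in resolved_turns:
        if turn.get("name"):
            totals[turn["name"]] += turn["end"] - turn["start"]
    return sorted(((name, round(secs)) for name, secs in totals.items()),
                  key=lambda pair: pair[1], reverse=True)


def format_breakdown(episode: str, breakdown: List[Tuple[str, int]]) -> str:
    # One line per episode, e.g. "<episode>: <name> <seconds>s, ..."
    return f"{episode}: " + ", ".join(f"{name} {secs}s" for name, secs in breakdown)
```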

Next step: feed named turns into qa_extractor so Q&A pairs get caller
name attached for searchability. Also: bootstrap recurring-speaker
profiles (Tara, Tony, Rob, Randall, producers) by accumulating
intro-tagged windows across the full archive once download completes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-27 16:48:16 -07:00

.claude

radio: skip Clay profile build (failed) — accept 2015-s7e19 Q&A as noisy

2026-04-27 16:36:46 -07:00

api

gravityzone: add full GravityZone integration module

2026-04-24 07:13:16 -07:00

clients

client(valleywide): init app modernization project — VB6/Access stack analysis

2026-04-27 10:10:35 -07:00

docs

Reorganize repo: compartmentalize scripts by client/project

2026-03-20 17:15:07 -07:00

fleet

sync: Auto-sync from GURU-BEAST-ROG at 2026-04-26 15:09:57

2026-04-26 15:10:06 -07:00

imported-conversations

feat: Complete DOS update system with test data routing fix

2026-01-19 12:49:54 -07:00

infrastructure

Reorganize repo: compartmentalize scripts by client/project

2026-03-20 17:15:07 -07:00

mcp-servers

Add TickTick integration, MCP server, and dev project tracking

2026-03-31 10:08:53 -07:00

migrations

Add TickTick integration, MCP server, and dev project tracking

2026-03-31 10:08:53 -07:00

projects

radio: transcript-driven speaker name resolution (oracle)

2026-04-27 16:48:16 -07:00

scripts

Session 2026-03-30: SOPS vault, SC-Syncro sync, Syncro scripts

2026-03-30 19:38:38 -07:00

session-logs

session log: terminal font investigation (inconclusive)

2026-04-24 07:54:33 -07:00

temp

sync: auto-sync from Mikes-MacBook-Air.local at 2026-04-19 19:34:27

2026-04-19 19:34:27 -07:00

Test Datasheets

AD2 session 2026-03-27/28/29: Test datasheet pipeline rebuild

2026-03-29 17:48:37 -07:00

tests

feat: Major directory reorganization and cleanup

2026-01-18 20:42:28 -07:00

tmp

scc: NWTOC v5.0 - fix test exe deployment, session log

2026-03-16 18:55:48 -07:00

tools

Add session import tool + fix audit gaps (GrepAI, Ollama, MCP, settings)

2026-04-16 19:21:01 -07:00

.env.example

Complete Phase 6: MSP Work Tracking with Context Recall System

2026-01-17 06:00:26 -07:00

.gitignore

sync: auto-sync from HOWARD-HOME at 2026-04-22 21:40:31

2026-04-22 21:40:33 -07:00

.gitmodules

fix: Restore .gitmodules for gururmm submodule

2026-04-19 08:30:51 -07:00

.mcp.json.example

feat: Add Sequential Thinking to Code Review + Frontend Validation

2026-01-17 16:23:52 -07:00

ai-misconceptions-reading-list.md

sync: Auto-sync from ACG-M-L5090 at 2026-02-09

2026-02-09 20:24:03 -07:00

alembic.ini

sync: Auto-sync from Mikes-MacBook-Air.local at 2026-03-09 08:14:13

2026-03-09 08:14:13 -07:00

ANALYSIS_COMPLETE.md

feat: Complete DOS update system with test data routing fix

2026-01-19 12:49:54 -07:00

AUTOCODER_INTEGRATION.md

feat: Add Sequential Thinking to Code Review + Frontend Validation

2026-01-17 16:23:52 -07:00

BEHAVIORAL_RULES_INTEGRATION_SUMMARY.md

feat: Complete DOS update system with test data routing fix

2026-01-19 12:49:54 -07:00

bootstrap.ps1

sync: Neptune Exchange session - domain cleanup, SBR routing, Mailprotector config, AD remediation

2026-04-23 12:35:04 -07:00

CATALOG_CLIENTS.md

sync: Auto-sync from ACG-M-L5090 at 2026-01-26 16:45:54

2026-02-01 16:23:47 -07:00

CATALOG_PROJECTS.md

sync: Auto-sync from ACG-M-L5090 at 2026-01-26 16:45:54

2026-02-01 16:23:47 -07:00

CATALOG_SESSION_LOGS.md

sync: Auto-sync from ACG-M-L5090 at 2026-01-26 16:45:54

2026-02-01 16:23:47 -07:00

CATALOG_SHARED_DATA.md

sync: Auto-sync from ACG-M-L5090 at 2026-01-26 16:45:54

2026-02-01 16:23:47 -07:00

CATALOG_SOLUTIONS.md

sync: Auto-sync from ACG-M-L5090 at 2026-01-26 16:45:54

2026-02-01 16:23:47 -07:00

claudetools-api.tar.gz

[Config] Add coding guidelines and code-fixer agent

2026-01-17 12:51:43 -07:00

claudetools-migration-20260225.tar.gpg

sync: Auto-sync from ACG-M-L5090 at 2026-03-10 19:11:00

2026-03-10 19:59:08 -07:00

CLIENT_DIRECTORY.md

sync: Auto-sync from ACG-M-L5090 at 2026-01-26 16:45:54

2026-02-01 16:23:47 -07:00

CONTEXT_RECOVERY_PROMPT.md

fix: Remove all emojis from documentation for cross-platform compliance

2026-01-20 16:21:06 -07:00

CONTEXT.md

Add CONTEXT.md files for automatic context recovery

2026-04-14 20:45:46 -07:00

CREDENTIAL_AUDIT_2026-01-24.md

sync: Auto-sync from ACG-M-L5090 at 2026-01-26 16:45:54

2026-02-01 16:23:47 -07:00

CREDENTIAL_GAP_ANALYSIS.md

sync: Auto-sync from ACG-M-L5090 at 2026-01-26 16:45:54

2026-02-01 16:23:47 -07:00

credentials.md

credentials.md: add Uranus entry, note IP reuse on Saturn

2026-04-16 09:07:43 -07:00

DEPLOYMENT_CHECKLIST.txt

feat: Complete DOS update system with test data routing fix

2026-01-19 12:49:54 -07:00

DEPLOYMENT_GUIDE.md

feat: Complete DOS update system with test data routing fix

2026-01-19 12:49:54 -07:00

DOS_FIX_INDEX.txt

feat: Complete DOS update system with test data routing fix

2026-01-19 12:49:54 -07:00

FILE_DEPENDENCIES.md

Add deployment safeguards to prevent code mismatch issues

2026-01-18 15:13:47 -07:00

GREPAI_OPTIMIZATION_GUIDE.md

sync: Auto-sync from ACG-M-L5090 at 2026-01-22 19:22:24

2026-01-22 19:23:16 -07:00

GREPAI_OPTIMIZATION_SUMMARY.md

sync: Auto-sync from ACG-M-L5090 at 2026-01-22 19:22:24

2026-01-22 19:23:16 -07:00

GREPAI_SYNC_STRATEGY.md

docs: Add Mac sync guide and grepai sync strategy

2026-01-22 19:06:45 -07:00

GURURMM_API_ACCESS.md

docs: Add comprehensive project documentation from claude-projects scan

2026-01-22 09:58:32 -07:00

IMPORT_COMPLETE_REPORT.md

sync: Auto-sync from ACG-M-L5090 at 2026-01-26 16:45:54

2026-02-01 16:23:47 -07:00

IMPORT_VERIFICATION.md

sync: Auto-sync from ACG-M-L5090 at 2026-01-26 16:45:54

2026-02-01 16:23:47 -07:00

INITIAL_DATA.md

fix: Remove all emojis from documentation for cross-platform compliance

2026-01-20 16:21:06 -07:00

MAC_SYNC_PROMPT.md

docs: Add Mac sync guide and grepai sync strategy

2026-01-22 19:06:45 -07:00

MCP_INSTALLATION_SUMMARY.md

feat: Add Sequential Thinking to Code Review + Frontend Validation

2026-01-17 16:23:52 -07:00

MCP_SERVERS.md

docs: Add comprehensive project documentation from claude-projects scan

2026-01-22 09:58:32 -07:00

NEW_MACHINE_SETUP.md

Switch Gitea remotes from SSH to HTTPS for reliable access

2026-03-13 06:10:29 -07:00

ORGANIZATION_SETUP_COMPLETE.md

Complete project organization: move all DOS files to projects/dataforth-dos, create client folders, update Claude config

2026-01-20 16:03:00 -07:00

package-retrieved.json

docs: Document Dataforth test database system and troubleshooting

2026-01-21 16:38:54 -07:00

PHASE1_QUICK_SUMMARY.txt

Complete Phase 6: MSP Work Tracking with Context Recall System

2026-01-17 06:00:26 -07:00

PHASE3_TEST_REPORT.md

fix: Remove all emojis from documentation for cross-platform compliance

2026-01-20 16:21:06 -07:00

PROJECT_DIRECTORY.md

sync: Auto-sync from ACG-M-L5090 at 2026-01-26 16:45:54

2026-02-01 16:23:47 -07:00

PROJECT_ORGANIZATION.md

Complete project organization: move all DOS files to projects/dataforth-dos, create client folders, update Claude config

2026-01-20 16:03:00 -07:00

PROJECTS_INDEX.md

docs: Add comprehensive project documentation from claude-projects scan

2026-01-22 09:58:32 -07:00

QUICKSTART-retrieved.md

docs: Document Dataforth test database system and troubleshooting

2026-01-21 16:38:54 -07:00

README.md

sync: Neptune Exchange session - domain cleanup, SBR routing, Mailprotector config, AD remediation

2026-04-23 12:35:04 -07:00

requirements.txt

sync: Auto-sync from Mikes-MacBook-Air.local at 2026-03-09 08:14:13

2026-03-09 08:14:13 -07:00

save_session_context.json

[Config] Add coding guidelines and code-fixer agent

2026-01-17 12:51:43 -07:00

SESSION_NOTES-retrieved.md

docs: Document Dataforth test database system and troubleshooting

2026-01-21 16:38:54 -07:00

session-context.json

Remove conversation context/recall system from ClaudeTools

2026-01-18 19:10:41 -07:00

START_HERE.md

fix: Remove all emojis from documentation for cross-platform compliance

2026-01-20 16:21:06 -07:00

UPDATE_WORKFLOW.md

feat: Complete DOS update system with test data routing fix

2026-01-19 12:49:54 -07:00

VPN_QUICK_SETUP.md

feat: Add AD2 WinRM automation and modernize sync infrastructure

2026-01-19 14:28:24 -07:00

WORKITEMS.md

sync: auto-sync from HOWARD-HOME at 2026-04-21 18:50:48

2026-04-21 18:50:52 -07:00

README.md

ClaudeTools Bootstrap / Reinstall Guide

Complete instructions for backing up and restoring a ClaudeTools development environment on Windows 11.

Pre-Reinstall: Creating the Archive

Before wiping or reinstalling Windows, create a backup archive.

Option A: Automated Archive (Recommended)

Run the bootstrap script in archive mode:

cd D:\ClaudeTools\bootstrap
.\bootstrap.ps1 -Archive

This creates D:\ClaudeTools-backup.zip containing:

- The full ClaudeTools repository (excluding node_modules, __pycache__, and venv)
- Claude configuration and memory from C:\Users\<you>\.claude\

To specify a custom output path:

.\bootstrap.ps1 -Archive -ArchivePath "E:\Backups\claudetools-2026-03-17.zip"
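
If bootstrap.ps1 is unavailable, the same archive can be approximated in Python. This is a sketch, not what -Archive actually runs. It assumes the archive layout implied by the restore steps (the repo under its own folder name, the Claude config under a claude-config folder), and build_archive is a hypothetical helper name.

```python
import os
import zipfile
from pathlib import Path
from typing import List

# Directory names left out of the repository copy, per the list above.
EXCLUDED_DIRS = {"node_modules", "__pycache__", "venv"}


def build_archive(repo: Path, claude_dir: Path, out_zip: Path) -> List[str]:
    """Zip the repo under its own folder name and the Claude config under
    'claude-config/' (assumed layout); return the archive member names."""
    members: List[str] = []
    sources = (
        (repo, repo.name, EXCLUDED_DIRS),      # exclusions apply to the repo only
        (claude_dir, "claude-config", set()),  # config and memory copied whole
    )
    with zipfile.ZipFile(out_zip, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for source, prefix, excluded in sources:
            for dirpath, dirnames, filenames in os.walk(source):
                # Prune in place so os.walk never descends into excluded dirs.
                dirnames[:] = [d for d in dirnames if d not in excluded]
                for filename in filenames:
                    path = Path(dirpath) / filename
                    arcname = (Path(prefix) / path.relative_to(source)).as_posix()
                    zf.write(path, arcname)
                    members.append(arcname)
    return members
```

For example, build_archive(Path(r"D:\ClaudeTools"), Path.home() / ".claude", Path(r"D:\ClaudeTools-backup.zip")). Keep the output path outside the repository so the archive does not try to include itself.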

Option B: Manual Archive

If the script is unavailable, manually zip these locations:

- ClaudeTools repository: D:\ClaudeTools\ (entire directory)
- Claude memory and config: C:\Users\<you>\.claude\ (entire directory)

Copy the archive(s) to external storage (USB, NAS, cloud) before proceeding with the Windows reinstall.

What Does NOT Need Archiving

These are restored automatically by the bootstrap script:

- Git, Node.js, Python, Ollama (reinstalled via winget)
- npm global packages (reinstalled)
- Python pip packages (reinstalled)
- Ollama models (re-pulled)
- MCP server virtual environments (recreated)

Post-Reinstall: Running the Bootstrap

Step 1: Prepare the D: Drive

If D:\ClaudeTools was on a separate partition that survived the reinstall, skip to Step 2.

Otherwise, extract your archive:

# Extract the ClaudeTools repo to D:\
Expand-Archive -Path "E:\Backups\claudetools-2026-03-17.zip" -DestinationPath "D:\"

# Restore Claude config to your user profile
# (The archive contains a 'claude-config' folder; copy its contents into .claude.
#  Create .claude first, since a fresh profile does not have it yet.)
New-Item -ItemType Directory -Path "$env:USERPROFILE\.claude" -Force | Out-Null
Copy-Item -Path "D:\claude-config\*" -Destination "$env:USERPROFILE\.claude\" -Recurse -Force

Step 2: Run the Bootstrap Script

Open an elevated PowerShell (Run as Administrator):

Set-ExecutionPolicy Bypass -Scope Process -Force
D:\ClaudeTools\bootstrap\bootstrap.ps1

The script runs 9 phases and takes approximately 15-30 minutes depending on download speeds and Ollama model sizes.

Step 3: Advanced Usage

Run a single phase:

.\bootstrap.ps1 -OnlyPhase 4    # Only install Python packages

Skip specific phases:

.\bootstrap.ps1 -SkipPhase 5    # Skip Ollama model pulls (slow)
.\bootstrap.ps1 -SkipPhase 4,5  # Skip Python packages and Ollama models

Phase Reference

Phase	What It Does	Duration
1	Install Git, Node.js, Python 3.13, Ollama via winget	2-5 min
2	Install Claude Code CLI + global npm packages	1-2 min
3	Clone or configure ClaudeTools git repository	<1 min
4	Install all Python pip packages globally	3-5 min
5	Pull Ollama models (nomic-embed-text, llama3.1:8b, qwen2.5-coder:7b)	5-15 min
6	Create MCP server venv and install dependencies	1-2 min
7	Write Claude Code settings.json, copy commands, create directories	<1 min
8	Initialize GrepAI	<1 min
9	Verify all components are installed and working	<1 min
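
How -OnlyPhase and -SkipPhase combine with this table can be sketched in Python. The semantics here are an assumption read from the Advanced Usage examples (OnlyPhase runs exactly one phase and wins if both are given; SkipPhase removes phases from the full 1-9 run); select_phases is a hypothetical name, not a function in bootstrap.ps1.

```python
from typing import Iterable, List, Optional

# Phase numbers and descriptions from the Phase Reference table.
PHASES = {
    1: "Install Git, Node.js, Python 3.13, Ollama via winget",
    2: "Install Claude Code CLI + global npm packages",
    3: "Clone or configure ClaudeTools git repository",
    4: "Install all Python pip packages globally",
    5: "Pull Ollama models",
    6: "Create MCP server venv and install dependencies",
    7: "Write Claude Code settings.json, copy commands, create directories",
    8: "Initialize GrepAI",
    9: "Verify all components are installed and working",
}


def select_phases(only_phase: Optional[int] = None,
                  skip_phase: Iterable[int] = ()) -> List[int]:
    """Assumed semantics: only_phase runs exactly one phase and takes
    precedence; skip_phase removes phases from the full 1-9 sequence."""
    if only_phase is not None:
        if only_phase not in PHASES:
            raise ValueError(f"unknown phase {only_phase}")
        return [only_phase]
    skipped = set(skip_phase)
    return [n for n in sorted(PHASES) if n not in skipped]
```

Under these assumptions, -SkipPhase 4,5 corresponds to select_phases(skip_phase=[4, 5]), which still runs Phase 9 verification at the end.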

Manual Steps (Cannot Be Automated)

These steps require interactive authentication or browser actions:

1. Authenticate Claude Code

claude

Follow the prompts to enter your Anthropic API key or log in via browser.

2. GitHub Personal Access Token

Edit D:\ClaudeTools\.mcp.json and set the GITHUB_PERSONAL_ACCESS_TOKEN value:

"env": {
    "GITHUB_PERSONAL_ACCESS_TOKEN": "ghp_your_token_here"
}

Generate a new token at: https://github.com/settings/tokens
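
Instead of hand-editing the JSON, a small Python helper can fill in the token. This is a sketch with a hypothetical function name. It only assumes what is shown above: the token lives in an "env" object under the GITHUB_PERSONAL_ACCESS_TOKEN key. It does not assume anything about the rest of .mcp.json's layout.

```python
import json
from pathlib import Path

TOKEN_KEY = "GITHUB_PERSONAL_ACCESS_TOKEN"


def set_github_token(config_path: Path, token: str) -> int:
    """Fill the token into every "env" object that already has the key.
    Returns how many entries were updated."""
    data = json.loads(config_path.read_text(encoding="utf-8"))
    updated = 0

    def visit(node):
        nonlocal updated
        if isinstance(node, dict):
            env = node.get("env")
            if isinstance(env, dict) and TOKEN_KEY in env:
                env[TOKEN_KEY] = token
                updated += 1
            for value in node.values():
                visit(value)
        elif isinstance(node, list):
            for item in node:
                visit(item)

    visit(data)
    config_path.write_text(json.dumps(data, indent=2) + "\n", encoding="utf-8")
    return updated
```

Calling set_github_token(Path(r"D:\ClaudeTools\.mcp.json"), getpass.getpass("Token: ")) keeps the token off the screen and out of shell history. A return value of 0 means no matching "env" entry was found.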

3. Claude-in-Chrome Extension

Install the Chrome extension manually:

- Open Chrome and navigate to the Chrome Web Store.
- Search for "Claude in Chrome" (or install from the MCP extension source).
- Configure the extension to connect to your local MCP server.

4. Restore Memory Files (If Needed)

If the bootstrap reports memory files are missing:

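# Copy from your archive (create the destination first if it does not exist)
New-Item -ItemType Directory -Path "$env:USERPROFILE\.claude\projects\D--ClaudeTools\memory" -Force | Out-Null
Copy-Item -Path "E:\Backups\claude-config\projects\D--ClaudeTools\memory\*" `
          -Destination "$env:USERPROFILE\.claude\projects\D--ClaudeTools\memory\" `
          -Recurse -Force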

5. Git Credentials

When you first git pull or git push to Gitea, you will be prompted for credentials. Use the Gitea username and password from credentials.md.

Verification Checklist

After bootstrap completes, verify manually:

- git --version returns a version
- node --version returns v24.x or later
- python --version returns 3.13.x
- claude --version returns a version
- ollama list shows all 3 models
- D:\ClaudeTools exists and has a .git directory
- D:\ClaudeTools\.mcp.json exists
- D:\ClaudeTools\grepai.exe exists
- C:\Users\<you>\.claude\settings.json exists
- C:\Users\<you>\.claude\commands\ has command files
- Run claude from D:\ClaudeTools and confirm MCP servers connect
- Claude-in-Chrome extension is installed and responsive
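
Most of the command and file checks in this list can be scripted. The Python sketch below covers those; the MCP connection and Chrome extension checks stay manual. The helper names are hypothetical, and the version rules (Node major 24 or later, Python 3.13) are taken from the list.

```python
import re
import shutil
import subprocess
from pathlib import Path
from typing import Dict, Optional, Tuple


def parse_version(text: str) -> Optional[Tuple[int, int]]:
    """Pull the first 'major.minor' out of output like 'v24.1.0' or 'Python 3.13.2'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def command_version(cmd: str) -> Optional[str]:
    """Return the '--version' output of cmd, or None if it is not on PATH."""
    exe = shutil.which(cmd)
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True, timeout=60)
    return (result.stdout or result.stderr).strip() or None


def check_tools() -> Dict[str, bool]:
    node = parse_version(command_version("node") or "")
    python = parse_version(command_version("python") or "")
    return {
        "git installed": command_version("git") is not None,
        "node v24+": node is not None and node[0] >= 24,
        "python 3.13": python == (3, 13),
        "claude installed": command_version("claude") is not None,
    }


def check_paths(repo: Path, claude_dir: Path) -> Dict[str, bool]:
    commands = claude_dir / "commands"
    return {
        "repo has .git": (repo / ".git").is_dir(),
        ".mcp.json exists": (repo / ".mcp.json").is_file(),
        "grepai.exe exists": (repo / "grepai.exe").is_file(),
        "settings.json exists": (claude_dir / "settings.json").is_file(),
        "commands has files": commands.is_dir() and any(p.is_file() for p in commands.iterdir()),
    }
```

For example, check_paths(Path(r"D:\ClaudeTools"), Path.home() / ".claude") together with check_tools() gives a pass/fail map for the scripted items.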

Troubleshooting

winget not found

Install "App Installer" from the Microsoft Store. It ships with Windows 11 but may need updating.

Node.js/Python not on PATH after install

Close and reopen your terminal, or run:

$env:Path = [System.Environment]::GetEnvironmentVariable("Path", "Machine") + ";" + [System.Environment]::GetEnvironmentVariable("Path", "User")

Ollama models fail to pull

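Ensure the Ollama service is running. If it is not, start it in a separate terminal, since ollama serve runs in the foreground until stopped:

ollama serve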

Then retry:

.\bootstrap.ps1 -OnlyPhase 5
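
To see which of the three Phase 5 models are still missing before or after a retry, the sketch below queries Ollama's local REST endpoint GET /api/tags on the default port 11434. The helper names are hypothetical. The ":latest" normalization reflects how Ollama lists a model pulled without an explicit tag.

```python
import json
import urllib.request
from typing import Dict, List, Sequence

# The three models Phase 5 pulls, per the Phase Reference table.
REQUIRED_MODELS = ("nomic-embed-text", "llama3.1:8b", "qwen2.5-coder:7b")


def normalize(name: str) -> str:
    # Ollama lists a model pulled without a tag as "<name>:latest".
    return name if ":" in name else f"{name}:latest"


def missing_models(tags: Dict, required: Sequence[str] = REQUIRED_MODELS) -> List[str]:
    installed = {normalize(m["name"]) for m in tags.get("models", [])}
    return [r for r in required if normalize(r) not in installed]


def fetch_tags(base_url: str = "http://localhost:11434") -> Dict:
    # GET /api/tags lists locally installed models; a connection error here
    # usually means the Ollama server is not running.
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as resp:
        return json.load(resp)
```

If missing_models(fetch_tags()) returns a non-empty list, re-run Phase 5 or pull the listed models individually.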

pip install fails for specific packages

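Some packages (pywin32, opencv-python, pyzbar) require Visual C++ Build Tools when no prebuilt wheel is available for your Python version. Install the C++ build tools workload if needed:

winget install Microsoft.VisualStudio.2022.BuildTools --override "--wait --passive --add Microsoft.VisualStudio.Workload.VCTools --includeRecommended"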

Then re-run Phase 4:

.\bootstrap.ps1 -OnlyPhase 4

GrepAI init requires interaction

Run manually:

cd D:\ClaudeTools
.\grepai.exe init

Select Ollama as the provider and nomic-embed-text as the embedding model.

SSL certificate errors with Gitea

The bootstrap configures http.sslVerify false automatically. If you still see errors:

cd D:\ClaudeTools
git config http.sslVerify false
