Mike Swanson 1b574caba4 radio: transcript-driven speaker name resolution (oracle)
New module src/speaker_oracle.py extracts speaker introductions from
transcripts ("let's talk to William", "we have Clay from the Nerd Junkies",
"in Tara's place, we have Clay", "thanks for the call <name>") and binds
them to non-HOST diarization turns. Pure post-pass on diarization JSONs,
no audio processing — corrects audio-only cosine errors using Mike's
deterministic on-air announcements.

Algorithm:
- Extract intros: regex patterns for caller pickups, guest intros,
  fill-in announcements, caller closes. Case-strict (rejects mid-sentence
  lowercase matches), with a blacklist of common false-positive words.
  Deduplicates same-name intros within 5s.
- Resolve speakers: for each non-HOST turn, find the LATEST opening intro
  at or before turn.start (with 8s forward tolerance for boundary slop).
  Later intros implicitly close earlier callers, so the most recent
  intro wins. No artificial lookback limit (callers can talk for 10+ min).
- Fallback: if no opening intro matches, use a caller_close pattern found
  within 30s after the turn ends.
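A minimal Python sketch of the two passes, for illustration only. It is not the contents of src/speaker_oracle.py. The regexes, the blacklist entries, and the `Intro`/`Turn` shapes are hypothetical stand-ins. Only the 5s dedupe window, the 8s forward tolerance, the 30s close window, and the latest-intro-wins rule come from the description above. It also assumes the caller_close fallback applies only when no opening intro matches.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns standing in for the module's real regexes.
# Trigger phrases match in any case; the captured name must be capitalized,
# so lowercase mid-sentence words are rejected (case-strict).
NAME = r"([A-Z][a-z]+)"
OPEN_PATTERNS = [
    re.compile(r"\b(?i:let's talk to) " + NAME),
    re.compile(r"\b(?i:we have) " + NAME),
]
CLOSE_PATTERNS = [re.compile(r"\b(?i:thanks for the call),? " + NAME)]
BLACKLIST = {"Everybody", "Today", "Okay"}  # hypothetical false positives

DEDUPE_S = 5.0         # same-name intros within 5 s collapse to one
FORWARD_TOL_S = 8.0    # an intro may land up to 8 s after turn.start
CLOSE_WINDOW_S = 30.0  # a caller_close may land up to 30 s after turn.end


@dataclass
class Intro:
    name: str
    t: float
    kind: str  # "open" or "close"


@dataclass
class Turn:
    start: float
    end: float
    label: str  # diarization label: "HOST", "CO-HOST", "CALLER", ...


def extract_intros(segments):
    """segments: iterable of (start_seconds, text) transcript segments."""
    intros = []
    for t, text in segments:
        for kind, patterns in (("open", OPEN_PATTERNS), ("close", CLOSE_PATTERNS)):
            for pat in patterns:
                for m in pat.finditer(text):
                    name = m.group(1)
                    if name in BLACKLIST:
                        continue
                    if any(i.name == name and i.kind == kind and abs(t - i.t) <= DEDUPE_S
                           for i in intros):
                        continue  # repeated announcement of the same name
                    intros.append(Intro(name, t, kind))
    return sorted(intros, key=lambda i: i.t)


def resolve(turns, intros):
    """Return one name (or None) per turn; HOST turns are never renamed."""
    opens = [i for i in intros if i.kind == "open"]
    closes = [i for i in intros if i.kind == "close"]
    names = []
    for turn in turns:
        if turn.label == "HOST":
            names.append(None)
            continue
        # Latest opening intro at or before start (plus tolerance) wins;
        # a newer intro implicitly closes the previous caller. No lookback cap.
        cands = [i for i in opens if i.t <= turn.start + FORWARD_TOL_S]
        if cands:
            names.append(max(cands, key=lambda i: i.t).name)
            continue
        # Fallback: earliest caller_close inside the window after the turn.
        after = [i for i in closes if turn.end <= i.t <= turn.end + CLOSE_WINDOW_S]
        names.append(min(after, key=lambda i: i.t).name if after else None)
    return names
```

For example, a segment "let's talk to William" at 10s names a CALLER turn that starts at 15s, while "we have a caller" yields nothing because "a" is not capitalized.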

Validation on 9-episode test set:
  2018-s10e18: Christopher 190s correctly named (was mislabeled "Tara")
  2012-06-09 : Kay 160s correctly named (was mislabeled "Tara")
  2015-s7e19 : Clay 45s as fillin for Tara, William 40s as caller
  2016-s8e43 : Charles 630s, Bruce 210s, John 205s — most callers named
  2017-s9e30 : Denise 295s, Tom 115s, Elaine 85s, Jeff 10s
  Many other callers across all episodes correctly named.

Remaining unnamed CO-HOST/CALLER turns (~5-10% of non-HOST time) are
genuine co-host banter or callers Mike does not explicitly introduce.

benchmark.py: adds Phase 2.5 "Name Resolution" between diarization and
Q&A extraction. Prints named-speaker breakdown per episode. Doesn't
modify diarization JSONs (resolution is computed on demand).
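Computing the breakdown on demand amounts to summing turn durations per resolved name while leaving the diarization data untouched. A minimal sketch, assuming the caller passes only non-HOST turns as (start, end) pairs plus a parallel list of resolved names (None when unresolved); `named_breakdown` and the `UNNAMED` bucket are hypothetical names, not benchmark.py's.

```python
from collections import defaultdict


def named_breakdown(turns, names):
    """Total seconds per resolved speaker for one episode.

    turns: list of (start, end) tuples in seconds (non-HOST turns only).
    names: parallel list of resolved names, or None when unresolved.
    Inputs are only read, never modified.
    """
    totals = defaultdict(float)
    for (start, end), name in zip(turns, names):
        totals[name or "UNNAMED"] += end - start
    # Largest talkers first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    turns = [(0, 120), (130, 190), (200, 270), (300, 310)]
    names = ["Tom", "Tom", "Elaine", None]
    for name, secs in named_breakdown(turns, names):
        print(f"{name:<10} {secs:6.0f}s")
```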

Next step: feed named turns into qa_extractor so Q&A pairs get caller
name attached for searchability. Also: bootstrap recurring-speaker
profiles (Tara, Tony, Rob, Randall, producers) by accumulating
intro-tagged windows across the full archive once download completes.
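One possible shape for those recurring-speaker profiles: accumulate unit-normalized embeddings of intro-tagged windows per name, then score a new turn by cosine similarity against each profile's centroid. This is a hedged sketch of that idea, not code from the repository. It assumes speaker embeddings already exist as plain float vectors, and `SpeakerProfiles` and its methods are hypothetical.

```python
import math


def _unit(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


class SpeakerProfiles:
    """Per-name centroid of unit-normalized embeddings from intro-tagged windows."""

    def __init__(self):
        self._sums = {}  # name -> element-wise sum of unit embeddings

    def add(self, name, embedding):
        """Accumulate one intro-tagged window's embedding under `name`."""
        e = _unit(embedding)
        prev = self._sums.get(name, [0.0] * len(e))
        self._sums[name] = [s + x for s, x in zip(prev, e)]

    def centroid(self, name):
        # The direction of the sum equals the direction of the mean.
        return _unit(self._sums[name])

    def best_match(self, embedding):
        """Return (name, cosine similarity) of the closest profile."""
        e = _unit(embedding)
        best = (None, -1.0)
        for name in self._sums:
            score = sum(a * b for a, b in zip(e, self.centroid(name)))
            if score > best[1]:
                best = (name, score)
        return best
```

In practice a match would only be accepted above a tuned similarity threshold; no such threshold is specified here.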

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 16:48:16 -07:00

ClaudeTools Bootstrap / Reinstall Guide

Complete instructions for backing up and restoring a ClaudeTools development environment on Windows 11.


Pre-Reinstall: Creating the Archive

Before wiping or reinstalling Windows, create a backup archive.

Option A: Bootstrap Script

Run the bootstrap script in archive mode:

cd D:\ClaudeTools\bootstrap
.\bootstrap.ps1 -Archive

This creates D:\ClaudeTools-backup.zip containing:

  • The full ClaudeTools repository (excluding node_modules, __pycache__, venv)
  • Claude configuration and memory from C:\Users\<you>\.claude\

To specify a custom output path:

.\bootstrap.ps1 -Archive -ArchivePath "E:\Backups\claudetools-2026-03-17.zip"

Option B: Manual Archive

If the script is unavailable, manually zip these locations:

  1. ClaudeTools repository: D:\ClaudeTools\ (entire directory)
  2. Claude memory and config: C:\Users\<you>\.claude\ (entire directory)
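If you prefer scripting the manual archive, the sketch below uses Python's standard zipfile module to zip both locations while skipping the directories listed above for archive mode (node_modules, __pycache__, venv). It is a hypothetical helper, not part of bootstrap.ps1. It stores the config under a claude-config folder, assuming that matches the layout the restore step expects.

```python
import os
import zipfile
from pathlib import Path

# Directory names skipped anywhere in the tree (mirrors the archive-mode exclusions).
EXCLUDE = {"node_modules", "__pycache__", "venv"}


def add_tree(zf, root, arc_prefix):
    """Add every file under `root` to `zf`, stored as `arc_prefix`/<relative path>."""
    root = Path(root)
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDE]
        for fname in filenames:
            full = Path(dirpath) / fname
            arcname = Path(arc_prefix) / full.relative_to(root)
            zf.write(full, arcname.as_posix())


def make_archive(repo, claude_dir, out_zip):
    """Zip the repo as ClaudeTools/ and the Claude config as claude-config/."""
    with zipfile.ZipFile(out_zip, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        add_tree(zf, repo, "ClaudeTools")
        add_tree(zf, claude_dir, "claude-config")
```

Usage, with example paths to adjust: make_archive(r"D:\ClaudeTools", Path.home() / ".claude", r"E:\Backups\claudetools-manual.zip"). Write the output outside D:\ClaudeTools so the archive does not try to include itself.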

Copy the archive(s) to external storage (USB, NAS, cloud) before proceeding with the Windows reinstall.

What Does NOT Need Archiving

These are restored automatically by the bootstrap script:

  • Git, Node.js, Python, Ollama (reinstalled via winget)
  • npm global packages (reinstalled)
  • Python pip packages (reinstalled)
  • Ollama models (re-pulled)
  • MCP server virtual environments (recreated)

Post-Reinstall: Running the Bootstrap

Step 1: Prepare the D: Drive

If D:\ClaudeTools was on a separate partition that survived the reinstall, skip to Step 2.

Otherwise, extract your archive:

# Extract the archive to D:\ (the repo lands in D:\ClaudeTools,
# the Claude config in D:\claude-config)
Expand-Archive -Path "E:\Backups\claudetools-2026-03-17.zip" -DestinationPath "D:\"

# Copy the extracted Claude config into your user profile
# (create the .claude folder first in case it does not exist yet)
New-Item -ItemType Directory -Path "$env:USERPROFILE\.claude" -Force | Out-Null
Copy-Item -Path "D:\claude-config\*" -Destination "$env:USERPROFILE\.claude\" -Recurse -Force

Step 2: Run the Bootstrap Script

Open an elevated PowerShell (Run as Administrator):

Set-ExecutionPolicy Bypass -Scope Process -Force
D:\ClaudeTools\bootstrap\bootstrap.ps1

The script runs 9 phases and takes approximately 15-30 minutes depending on download speeds and Ollama model sizes.

Step 3: Advanced Usage

From D:\ClaudeTools\bootstrap, run a single phase:

.\bootstrap.ps1 -OnlyPhase 4    # Only install Python packages

Skip specific phases:

.\bootstrap.ps1 -SkipPhase 5    # Skip Ollama model pulls (slow)
.\bootstrap.ps1 -SkipPhase 4,5  # Skip Python packages and Ollama models

Phase Reference

Phase  What It Does                                                          Duration
1      Install Git, Node.js, Python 3.13, Ollama via winget                  2-5 min
2      Install Claude Code CLI + global npm packages                         1-2 min
3      Clone or configure ClaudeTools git repository                         <1 min
4      Install all Python pip packages globally                              3-5 min
5      Pull Ollama models (nomic-embed-text, llama3.1:8b, qwen2.5-coder:7b)  5-15 min
6      Create MCP server venv and install dependencies                       1-2 min
7      Write Claude Code settings.json, copy commands, create directories    <1 min
8      Initialize GrepAI                                                     <1 min
9      Verify all components are installed and working                       <1 min

Manual Steps (Cannot Be Automated)

These steps require interactive authentication or browser actions:

1. Authenticate Claude Code

claude

Follow the prompts to enter your Anthropic API key or log in via browser.

2. GitHub Personal Access Token

Edit D:\ClaudeTools\.mcp.json and set the GITHUB_PERSONAL_ACCESS_TOKEN value:

"env": {
    "GITHUB_PERSONAL_ACCESS_TOKEN": "ghp_your_token_here"
}

Generate a new token at: https://github.com/settings/tokens
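To avoid hand-editing JSON, the token can also be written with a short Python script. This sketch assumes .mcp.json uses Claude Code's project-scoped layout: a top-level mcpServers object whose entries may carry an env object. It only updates servers whose env already declares the key. The helper name is hypothetical, and reading the token from an environment variable (here called GITHUB_TOKEN) is an assumption, not something the bootstrap sets up.

```python
import json
from pathlib import Path

KEY = "GITHUB_PERSONAL_ACCESS_TOKEN"


def set_github_token(config_path, token):
    """Set KEY in every MCP server env block that already declares it.

    Returns the names of the updated servers; the file is rewritten only
    when something changed (json.dumps re-indents it).
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    updated = []
    for name, server in config.get("mcpServers", {}).items():
        env = server.get("env")
        if isinstance(env, dict) and KEY in env:
            env[KEY] = token
            updated.append(name)
    if updated:
        path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return updated
```

Usage: set_github_token(r"D:\ClaudeTools\.mcp.json", os.environ["GITHUB_TOKEN"]) after setting that variable in your session, which keeps the token out of your shell history.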

3. Claude-in-Chrome Extension

Install the Chrome extension manually:

  • Open Chrome and navigate to the Chrome Web Store
  • Search for "Claude in Chrome" (or install from the MCP extension source)
  • Configure the extension to connect to your local MCP server

4. Restore Memory Files (If Needed)

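If the bootstrap reports that memory files are missing:

# Copy from your extracted archive (adjust the source path to wherever
# the claude-config folder ended up)
$memDir = "$env:USERPROFILE\.claude\projects\D--ClaudeTools\memory"
New-Item -ItemType Directory -Path $memDir -Force | Out-Null
Copy-Item -Path "E:\Backups\claude-config\projects\D--ClaudeTools\memory\*" `
          -Destination "$memDir\" `
          -Recurse -Force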

5. Git Credentials

When you first git pull or git push to Gitea, you will be prompted for credentials. Use the Gitea username and password from credentials.md.


Verification Checklist

After bootstrap completes, verify manually:

  • git --version returns a version
  • node --version returns v24.x or later
  • python --version returns 3.13.x
  • claude --version returns a version
  • ollama list shows all 3 models
  • D:\ClaudeTools exists and has .git directory
  • D:\ClaudeTools\.mcp.json exists
  • D:\ClaudeTools\grepai.exe exists
  • C:\Users\<you>\.claude\settings.json exists
  • C:\Users\<you>\.claude\commands\ has command files
  • Run claude from D:\ClaudeTools and confirm MCP servers connect
  • Claude-in-Chrome extension is installed and responsive
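Most of the command-line and file checks can be scripted. A minimal Python sketch, assuming the default paths used in this guide. It does not cover the Ollama models, MCP connectivity, or the Chrome extension, which still need a manual look. The function names are hypothetical.

```python
import re
import shutil
import subprocess
from pathlib import Path

REQUIRED_PATHS = [
    Path(r"D:\ClaudeTools\.git"),
    Path(r"D:\ClaudeTools\.mcp.json"),
    Path(r"D:\ClaudeTools\grepai.exe"),
    Path.home() / ".claude" / "settings.json",
    Path.home() / ".claude" / "commands",
]


def parse_version(text):
    """Return the first dotted version in `text` as a tuple of ints, or None."""
    m = re.search(r"\d+(?:\.\d+)*", text)
    return tuple(int(p) for p in m.group(0).split(".")) if m else None


def tool_version(cmd):
    """Run `<cmd> --version`; return its output, or None if cmd is not on PATH."""
    exe = shutil.which(cmd)
    if exe is None:
        return None
    out = subprocess.run([exe, "--version"], capture_output=True, text=True)
    return (out.stdout or out.stderr).strip()


def run_checks():
    problems = []
    for cmd in ("git", "node", "python", "claude", "ollama"):
        if tool_version(cmd) is None:
            problems.append(f"{cmd} not found on PATH")
    node = parse_version(tool_version("node") or "")
    if node and node[0] < 24:
        problems.append(f"node is {node}, expected v24.x or later")
    py = parse_version(tool_version("python") or "")
    if py and py[:2] != (3, 13):
        problems.append(f"python is {py}, expected 3.13.x")
    for p in REQUIRED_PATHS:
        if not p.exists():
            problems.append(f"missing: {p}")
    return problems


if __name__ == "__main__":
    for line in run_checks() or ["all checks passed"]:
        print(line)
```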

Troubleshooting

winget not found

Install "App Installer" from the Microsoft Store. It ships with Windows 11 but may need updating.

Node.js/Python not on PATH after install

Close and reopen your terminal, or run:

$env:Path = [System.Environment]::GetEnvironmentVariable("Path", "Machine") + ";" + [System.Environment]::GetEnvironmentVariable("Path", "User")

Ollama models fail to pull

Ensure the Ollama server is running. If it is not, start it in a separate terminal, since ollama serve stays in the foreground:

ollama serve

Then retry:

.\bootstrap.ps1 -OnlyPhase 5

pip install fails for specific packages

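Some packages (pywin32, opencv-python, pyzbar) require Visual C++ Build Tools. Install them with the C++ workload if needed (a plain install of the Build Tools package does not include the compiler):

winget install Microsoft.VisualStudio.2022.BuildTools --override "--wait --passive --add Microsoft.VisualStudio.Workload.VCTools --includeRecommended"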

Then re-run Phase 4:

.\bootstrap.ps1 -OnlyPhase 4

GrepAI init requires interaction

Run manually:

cd D:\ClaudeTools
.\grepai.exe init

Select Ollama as the provider and nomic-embed-text as the embedding model.

SSL certificate errors with Gitea

The bootstrap configures http.sslVerify false automatically. If you still see errors:

cd D:\ClaudeTools
git config http.sslVerify false