Commit Graph

75 Commits

Author SHA1 Message Date
d0cbf6126e docs: record Claude-Builder=PLUTO mapping + infra working-feedback memories
- Pluto memory/wiki/machine notes: Unraid VM "Claude-Builder" == hostname PLUTO ==
  172.16.3.36 (same box); RMM-agent access path when SSH key unauthorized; now also
  builds the GuruConnect Windows agent + hosts a Gitea Actions runner.
- New feedback memories: post #bot-alerts only for client/ticket-affecting RMM commands;
  proceed autonomously through routine infra/build prerequisites.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 10:37:44 -07:00
42cf2bdd68 chore: convert guru-connect to submodule; integrate ADR-008 + 2026-05-29 session log
guru-connect is now tracked as a submodule (azcomputerguru/guru-connect @ e3e95f8);
its working state was published to the GC repo first, so no content is lost. guru-rmm
advanced to include ADR-008 (GC integration boundary) replayed on top of the team's
Integrations Center / discovery advances. Includes the native-remote-control spec
(now inside the GC submodule), the versionable-products memory, and the session log.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 06:36:54 -07:00
647e09d3cb sync: auto-sync from GURU-5070 at 2026-05-28 14:27:08
Author: Mike Swanson
Machine: GURU-5070
Timestamp: 2026-05-28 14:27:08
2026-05-28 14:27:12 -07:00
f04c5012e9 sync: auto-sync from HOWARD-HOME at 2026-05-28 12:26:48
Author: Howard Enos
Machine: HOWARD-HOME
Timestamp: 2026-05-28 12:26:48
2026-05-28 12:26:56 -07:00
3dbbbfaa6b sync: auto-sync from GURU-5070 at 2026-05-27 16:54:37
Author: Mike Swanson
Machine: GURU-5070
Timestamp: 2026-05-27 16:54:37
2026-05-27 16:54:45 -07:00
ba90915da5 docs(session)+rules: 2026-05-27 — Quantum M365 onboarding, IX autodiscover fix, Syncro emergency/labor/attribution rules
Session logs: root (Michael #32329 hosting offer + IX simplehost.email autodiscover DNS fix + Cascades #32332 emergency correction) + Quantum client log (M365 tenant 2fd0092b onboarding, break-glass GA, CA report-only).

Syncro rule overhaul:
- Emergency billing: prepaid -> 26184 @ hours x1.5 (was 26118); non-prepaid -> 26184 with channel rate (onsite $262.50 / remote+inshop $225)
- Never make up labor items (existing product + real name; QuickBooks sync)
- Corrections preserve original tech's user_id (commission); adding notes/labor never changes ticket owner

/remediation-tool: Conditional Access may be managed programmatically (report-only first + exclude break-glass + confirm before enforce); fabb3421 deprecated for customer tenants; Quantum tenant onboarded (gotchas table).

Memory: 4 new (no-madeup-labor, corrections-preserve-tech, ca-programmatic, quantum-godaddy-tenant) + updates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 14:57:55 -07:00
3f3a16a56d sync: auto-sync from HOWARD-HOME at 2026-05-27 11:24:44
Author: Howard Enos
Machine: HOWARD-HOME
Timestamp: 2026-05-27 11:24:44
2026-05-27 11:25:34 -07:00
599d861478 docs(memory): coord /messages API shape (paginated object, not array)
Pin down the coord messages endpoint shape after repeated mark-read failures:
{total,skip,limit,messages[]}; parse .messages[], strip control chars, read may be null.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 11:10:24 -07:00
19594b15dc docs(session): 2026-05-27 — RMM Phase 2 deploy, Autotask integration, Tohono DoIT #32328
- Root log: GuruRMM Phase 2 authz/IDOR deployed (v0.3.31); Autotask creds verified + vaulted; /autotask scaffolded (kept local)
- Client log (new): Tohono O'odham DoIT — Starlink static IP / site-to-site research, ticket #32328
- Memory: Syncro is default PSA, Autotask opt-in (feedback_psa_default_syncro.md)

Note: .claude/commands/autotask.md intentionally left local/uncommitted per Mike.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 10:40:06 -07:00
2ca7bfc475 chore(memory): fix shared-memory index issues
Audit of .claude/memory found and fixed:
- Broken link: Power Failure Runbook (../.claude/... -> ../...)
- 8 orphaned memories not in MEMORY.md index (Graph CA/password-reset,
  vault-write-sequence, GURU-BEAST-ROG, 3x Cascades, identity proposal)
  -> now indexed under their sections, so they're discoverable
- 5 files missing frontmatter -> added name/description/type
- Duplicate index entry for reference_workstation_setup.md -> deduped
- Trimmed the worst oversized index hooks (Syncro invoice line was 427 chars)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 07:37:59 -07:00
848389f442 feat: make FEATURE_ROADMAP a living doc — dev definition-of-done + audit default
Mike's decision (2026-05-27): the roadmap is a maintained status-and-plan
tracker ([ ]=planned, [x]=shipped, dated), consulted going in and updated
coming out.

- gururmm-development-principles memory: new "Living Roadmap (MANDATORY)"
  principle — consult before building, update the entry in the SAME change
  when shipping/modifying; roadmap update is part of definition-of-done.
  Dev is the primary maintainer; the audit is the backstop.
- rmm-audit skill: state the convention explicitly — the roadmap pass
  default is reconcile-and-flip (not annotate-only).

(Companion gururmm-repo changes — DESIGN.md principle + baseline checkbox
reconcile — pushed separately to the gururmm repo.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 06:34:41 -07:00
769920c1c2 sync: auto-sync from GURU-5070 at 2026-05-27 06:11:29
Author: Mike Swanson
Machine: GURU-5070
Timestamp: 2026-05-27 06:11:29
2026-05-27 06:11:33 -07:00
6e7b104555 sync: auto-sync from GURU-KALI at 2026-05-27 05:33:56
Author: Mike Swanson
Machine: GURU-KALI
Timestamp: 2026-05-27 05:33:56
2026-05-27 05:33:56 -07:00
63bc234d1b sync: auto-sync from GURU-KALI at 2026-05-26 20:08:37
Author: Mike Swanson
Machine: GURU-KALI
Timestamp: 2026-05-26 20:08:37
2026-05-26 20:08:39 -07:00
d2bc34a2f2 proposal: centralize machine config in identity.json
Merge Ollama fallback pattern with identity.json approach.
Store endpoint/fallback/prose_model to eliminate curl probes.
Same pattern as claudetools_root/vault_path (working).

Next: coord message rollout to populate fields on all machines.
2026-05-26 20:02:19 -07:00
2bec888ea7 sync: auto-sync from GURU-KALI at 2026-05-26 19:59:15
Author: Mike Swanson
Machine: GURU-KALI
Timestamp: 2026-05-26 19:59:15
2026-05-26 19:59:16 -07:00
15de6a7cf2 sync: auto-sync from GURU-KALI at 2026-05-26 19:41:06
Author: Mike Swanson
Machine: GURU-KALI
Timestamp: 2026-05-26 19:41:06
2026-05-26 19:41:07 -07:00
e0dd370934 sync: auto-sync from GURU-5070 at 2026-05-26 19:32:05
Author: Mike Swanson
Machine: GURU-5070
Timestamp: 2026-05-26 19:32:05
2026-05-26 19:32:09 -07:00
f91ab226dc sync: auto-sync from GURU-KALI at 2026-05-26 18:47:58
Author: Mike Swanson
Machine: GURU-KALI
Timestamp: 2026-05-26 18:47:58
2026-05-26 18:48:02 -07:00
746588a1da sync: auto-sync from GURU-5070 at 2026-05-26 14:02:23
Author: Mike Swanson
Machine: GURU-5070
Timestamp: 2026-05-26 14:02:23
2026-05-26 14:02:27 -07:00
6adf75b958 wiki/memory: Syncro contact rule is global, not Cascades-specific
Update cascades-tucson.md Syncro billing pattern to note the blank-contact
rule applies to all customers. Update feedback_syncro_cascades_contact.md
to be incident-detail only (Meredith Kuhn default), pointing to the global
rule in feedback_syncro_blank_contact.md. Update MEMORY.md index entry.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-24 16:40:36 -07:00
9310be6113 feat: add wiki knowledge layer (Phase 0 + Phase 1 seed)
Implements LLM-compiled wiki layer between raw session logs and live
CONTEXT.md, inspired by Karpathy's knowledge base workflow. Adds wiki/
directory structure, article templates, spec docs, and seeds first two
articles (Cascades of Tucson, GuruRMM) from 60+ session logs.

Updates CLAUDE.md to check wiki first on all context-loading triggers.
Captures verified ACG IP/hostname map and Neptune physical-location
clarification (Dataforth D2, subnet overlap TODO) in memory.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-24 15:42:38 -07:00
a44a0bcee0 sync: auto-sync from HOWARD-HOME at 2026-05-22 15:40:30
Author: Howard Enos
Machine: HOWARD-HOME
Timestamp: 2026-05-22 15:40:30
2026-05-22 15:40:34 -07:00
ec51a06a0d sync: auto-sync from DESKTOP-0O8A1RL at 2026-05-22 12:08:26
Author: Mike Swanson
Machine: DESKTOP-0O8A1RL
Timestamp: 2026-05-22 12:08:26
2026-05-22 12:08:31 -07:00
3ad96ac7b9 sync: auto-sync from GURU-BEAST-ROG at 2026-05-22 11:46:56
Author: Mike Swanson
Machine: GURU-BEAST-ROG
Timestamp: 2026-05-22 11:46:56
2026-05-22 11:46:58 -07:00
280428b68b sync: auto-sync from HOWARD-HOME at 2026-05-22 09:03:36
Author: Howard Enos
Machine: HOWARD-HOME
Timestamp: 2026-05-22 09:03:36
2026-05-22 09:03:39 -07:00
592262f9b7 fix(sync): detect untracked-only changes; reconcile timer-era memories
sync.sh: replace `git diff-index --quiet HEAD --` with
`[ -n "$(git status --porcelain)" ]` in both the main-repo (Phase 1) and
vault change-detection, so brand-new untracked files are no longer silently
skipped (the bug Howard hit 2026-04-17). Mark project_sync_script_bug.md
RESOLVED.

.gitignore: exclude the datto BSOD dumps (6 MB zip + 48 MB extracted) so the
detection fix doesn't sweep 54 MB of binaries into the repo.

memory: finish the add_line_item reconciliation — drop legacy "time entry" /
timer-billable framing from feedback_syncro_labor_type and
feedback_syncro_warranty_product (and their index lines); the product-selection
rules themselves are unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 10:19:52 -07:00
0eccd714bb memory: reconcile timer memories with Syncro add_line_item switch
Mike's overhaul replaced the timer workflow with add_line_item, and he
already rewrote feedback_syncro_timer_first.md. Reconcile the leftovers:
- MEMORY.md index line for timer_first still stated the superseded
  "timers required" rule as current — rewrite to the add_line_item rule.
- timer_entry response-shape memory is now about a dead workflow — mark
  it HISTORICAL (index + file banner), retained only for manual timer use.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 10:13:37 -07:00
04c7bf7564 refactor(syncro): replace timer workflow with add_line_item, lock API sequences
- Billing now uses add_line_item directly; timer_entry/charge_timer_entry removed
- Added Verified Response Shapes table for all endpoints (tested live against ACG internal customer)
- Billing workflow rewritten as strict 5-step locked script with no branches
- Added STOP rule: never try alternative endpoints/formats on unexpected responses
- bot-alerts section: explicit success ([OK] + message_id) and failure ([WARNING]) criteria
- Updated feedback memory to supersede the old timer-first rule

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-21 10:04:45 -07:00
04acf08061 sync: auto-sync from HOWARD-HOME at 2026-05-20 17:08:25
Author: Howard Enos
Machine: HOWARD-HOME
Timestamp: 2026-05-20 17:08:25
2026-05-20 17:08:29 -07:00
0227df6a3c sync: auto-sync from DESKTOP-0O8A1RL at 2026-05-17 22:07:52
Author: Mike Swanson
Machine: DESKTOP-0O8A1RL
Timestamp: 2026-05-17 22:07:52
2026-05-17 22:07:59 -07:00
422fb8ede9 docs: harden agent parity rule — all platforms in same change, no exceptions
- CODING_GUIDELINES.md: tighten parity rule wording to match Mike's intent:
  "add feature X" means Windows + Linux + macOS in the same commit
- memory: add feedback_gururmm_agent_parity for future session enforcement

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 15:52:15 -07:00
e191b713f9 session: Cascades phone verification & closeout — Entra Connect staging exited, CA policies re-pointed to AD-synced SG-Caregivers
- Full tenant verification sweep: all Intune/Entra objects match session logs
- Entra Connect staging mode exited; 17 AD groups synced to cloud
- CA policies (Block-off-network, Sign-in-frequency-8h, Block-non-compliant) patched from SG-Caregivers-Pilot to AD-synced SG-Caregivers
- Registration Campaign exclusion updated to SG-Caregivers
- Deleted test accounts: howard.enos (AD) and pilot.test (M365)
- Documented Christine Nyanzunda collision risk, Ederick Yuzon open item, standing security-group rule
- Session log written

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-14 17:45:30 -07:00
b6265cc930 sync: auto-sync from DESKTOP-0O8A1RL at 2026-05-12 05:50:33
Author: Mike Swanson
Machine: DESKTOP-0O8A1RL
Timestamp: 2026-05-12 05:50:33
2026-05-12 05:50:33 -07:00
dfa23c1f70 sync: auto-sync from HOWARD-HOME at 2026-05-08 19:54:23
Author: Howard Enos
Machine: HOWARD-HOME
Timestamp: 2026-05-08 19:54:23
2026-05-08 19:54:24 -07:00
78b5f5d8c9 sync: auto-sync from HOWARD-HOME at 2026-05-08 19:53:03
Author: Howard Enos
Machine: HOWARD-HOME
Timestamp: 2026-05-08 19:53:03
2026-05-08 19:53:06 -07:00
636281da5f sync: auto-sync from HOWARD-HOME at 2026-05-06 15:10:59
Author: Howard Enos
Machine: HOWARD-HOME
Timestamp: 2026-05-06 15:10:59
2026-05-06 15:11:04 -07:00
17f0d0becb sync: auto-sync from HOWARD-HOME at 2026-05-06 13:46:20
Author: Howard Enos
Machine: HOWARD-HOME
Timestamp: 2026-05-06 13:46:20
2026-05-06 13:46:23 -07:00
bac8e5f367 sync: auto-sync from HOWARD-HOME at 2026-05-05 16:44:25
Author: Howard Enos
Machine: HOWARD-HOME
Timestamp: 2026-05-05 16:44:25
2026-05-05 16:44:26 -07:00
a039953d6d Session work 2026-05-04: Grabb Leap calendar fix, Dataforth lobby phone VLAN, IMC printer + VPN
- Grabb & Durando: investigated and resolved Svetlana Larionova's Leap-to-M365 calendar OAuth consent issue (Graph-side report + session log). Syncro #32245.
- Dataforth: lobby phone (ext 201) was offline due to D1-Server-Room port 1 being on the wrong VLAN; reconfigured to VLAN 100, phone re-provisioned and registered. Session log + PROJECT_STATE update. Syncro #32246.
- Instrumental Music Center: Station 2 receipt printer reconnect + VPN install on Manda's machine. Syncro #32247.
- Memory: generalized the Syncro blank-contact rule (was Cascades-only) and added the labor-type rule (never use "Prepaid project labor") per Winter's 2026-05-04 corrections.
- Gitignored `.claude/tmp/` so per-session helper scripts don't sneak in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 13:51:59 -07:00
2ec07ea409 syncro skill: timer-entry-first workflow + heredoc payloads
- Promote timer_entry → charge_timer_entry to default billing path; demote
  bare add_line_item to a clearly-labeled fallback for non-time items only.
  Mike caught the bare-add_line_item bug across 31 tickets on 2026-04-30;
  repeated on 3 tickets 2026-05-01. Time entries are required for Syncro
  reporting (hours per client, tech productivity, prepay burn).
- Replace /tmp/*.json payload pattern with heredoc throughout. /tmp resolves
  to C:\tmp\ in the Write tool but %LOCALAPPDATA%\Temp\ in Git Bash on
  Windows — different real directories. Caused a wrong-comment incident on
  ticket #32225 2026-05-01 (rogue payload from prior session). Heredoc
  avoids the file handoff entirely.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 10:58:20 -07:00
4ad4bd60fc sync: auto-sync from HOWARD-HOME at 2026-05-01 10:44:36
Author: Howard Enos
Machine: HOWARD-HOME
Timestamp: 2026-05-01 10:44:36
2026-05-01 10:44:39 -07:00
dc2ffbdd28 Session log: Syncro billing batch (Sombra, Mineralogical Record, Cascades Entra) + /tmp path mismatch incident
Three tickets billed today: #32225 Sombra ($525 onsite), #32229 Mineralogical
Record ($262.50 emergency), #32214 Cascades Entra (33.5 hrs project labor at $0
debits prepaid block). Hit a real incident on Sombra: rogue comment posted with
content from a different ticket because /tmp resolves differently in the Write
tool (C:/tmp/) vs Git Bash (%LOCALAPPDATA%/Temp/) on Windows. Howard manually
deleted from GUI; subsequent posts used heredoc to avoid the file handoff
entirely. Root cause documented in feedback_tmp_path_windows.md so future
sessions don't trip the same wire. Scheduled remote agent
trig_01CAfvwoQ4nLcKEqbU4UQmSa to update the syncro skill examples 2026-05-02.
2026-05-01 10:44:39 -07:00
20f8bb0128 docs: Syncro invoice verification pattern (lesson from false alarm)
Created memory entry documenting correct way to verify ticket-invoice linkage
in Syncro API after 2026-04-30 incident where faulty verification script
falsely claimed 31 tickets had no invoices (actually 29 had invoices properly,
2 were correctly Non-Billable).

Key lessons:
- List endpoint does NOT return ticket_id or line_items
- Must query individual invoices for full data
- Invoice numbers are strings, not integers
- Use ticket ID (internal), not ticket number (user-visible)

Added to memory index for future GrepAI semantic search.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-04-30 18:44:12 -07:00
58e7feda9a Session log: Cascades CA bypass phased rollout + pilot user + phone re-enroll
Cascades caregiver shared-phone bypass pilot — 2026-04-29 evening into
2026-04-30 early morning continuation.

Major work:
- Adopted phased per-group CA rollout (corrects original tenant-wide §5
  design that would have blocked off-site office users)
- Step A: backfilled admin@ into excludeUsers on all 8 existing Cascades
  CA policies (mirrors sysadmin@ exclusion posture; Option 1 break-glass)
- Outlook + Helpany + LinkRx assigned to Cascades - Shared Phones group
  and added to MHS kiosk app list (final dashboard: 5 caregiver apps)
- Created cloud-only pilot user pilot.test@cascadestucson.com,
  SG-Caregivers-Pilot group, Business Premium license, vault entry
  pushed to Gitea vault repo
- Built 4 CA changes: PATCH legacy all-users-MFA to exclude pilot group,
  CREATE 3 new Report-only policies (block off-network, block
  non-compliant, 8h sign-in frequency) with both admins excluded
- Pilot phone wipe + re-enroll after first attempt stuck; PIN set,
  awaiting MHS to take over launcher and SDM sign-in prompt

6 new project/feedback memories. Resume point at top of new session log.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 10:57:28 -07:00
1d3c1f53f4 Session log: cPanel CVE-2026-41940 IOC scan + remediation on IX/WebSvr
Both servers were already patched (11.110.0.97 and 11.134.0.20) via
daily auto-update. IOC scan found 16 flagged sessions across both
plus 4 uncommented SSH keys on IX.

Critical remediation:
- Forensic evidence preserved before any deletion
- 4 uncommented SSH keys removed from IX (server-side backup retained)
- 16 flagged sessions purged across both servers
- Root passwords rotated via chpasswd
- New WHM API tokens created; 3 stale transfer-* tokens revoked
- Vault entries + 1Password Infrastructure items updated

Forensic deep-dive verdict: patch held. All 7 actual CVE exploit
attempts (botnet IPs hitting /json-api/version) returned HTTP 403.
The "multi-line pass" IOC hits on user sessions were false positives.
Unidentified 76.18.103.222 root session traced to routine SSL
maintenance (zero sensitive endpoints touched).

Skill hardening:
- Added MANDATORY service-token directive to .claude/commands/1password.md
  enforcing OP_SERVICE_ACCOUNT_TOKEN from SOPS for all op CLI calls
- Per Mike: memory files alone don't reliably bind agent behavior;
  baking governance into skill content loaded at moment of use.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 07:22:52 -07:00
26cc4f71c2 memory: GuruRMM holistic development principles
Documented two fundamental GuruRMM development principles:

1. Holistic Feature Development (MANDATORY):
   - Every feature requires complete stack: backend, API, UI/UX, docs
   - Features without management interfaces are incomplete
   - Design for scalability and future expansion
   - Example workflows included

2. AI-Optional Operation:
   - Product must work without AI agents (Claude, autonomous tools)
   - AI features are enhancements, not requirements
   - Core operations remain deterministic and reliable

Principles documented in guru-rmm/docs/DESIGN.md and now in memory for
cross-session reference.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-04-29 07:17:11 -07:00
a1185b4707 memory: approval workflow for tools vs projects
Tools (remediation-tool, onboard scripts, MSP utilities):
- Howard can modify directly
- Claude can execute with Howard OR Mike approval
- No roadmap process, immediate operational changes

Projects (GuruRMM, ClaudeTools API, etc.):
- Require Mike approval
- Features go to roadmap
- Bugs go to bug list

Established during Cascades CA role gap fix discussion.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-04-29 06:51:39 -07:00
cdb71c91af radio: skip Clay profile build (failed) — accept 2015-s7e19 Q&A as noisy
First attempt at Clay's voice profile from 2015-s7e19 produced
Clay-vs-Mike cosine similarity of 0.994 — essentially a Mike clone.
Root cause: 10s WavLM x-vector chunks averaged Mike's frequent
interjections together with Clay's dialogue, and Mike's well-trained
profile dominated the resulting embedding signal.

Mike's call: skip Clay, accept the 2015-s7e19 Q&A as noisy. Clay rarely
appears in other episodes, so the cost of not having his profile is
bounded to this one episode plus any rare future appearances.

Cleanup:
- voice-profiles/clay/ removed
- voice-profiles/profiles.json: Clay entry removed
- Memory updated to record the decision and the failure mode

Kept build_clay_profile.py in-repo as documentation of the attempt and
the Mike-similarity-filter pattern. Useful starting point if a future
attempt provides cleaner pure-Clay timestamps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 16:36:46 -07:00
c7affd8681 radio: bumper detection in diarizer + full archive download script
Adds a transcript-driven bumper filter to the diarization pipeline. When
a transcript segment matches qa_extractor's promo/bumper signatures, the
overlapping audio windows are labeled BUMPER and the WavLM cosine match
is skipped. Prevents music/promo from being matched against speaker
profiles (the failure mode Mike caught in 2018-s10e18 @ 09:20-10:05).

Code changes:
- src/voice_profiler.py: identify_speakers() takes optional skip_ranges
  parameter; windows whose midpoint falls in a skip range get labeled
  "[bumper]" and skip cosine match
- src/diarizer.py: diarize() takes optional transcript_path; pre-computes
  bumper time ranges via qa_extractor._is_promo_or_bumper, passes to
  identify_speakers; adds BUMPER speaker label
- benchmark.py: passes transcript_path to diarize()

Aggregate impact across 9-episode test set:
  Tara attribution: 4880s -> 3680s  (-1200s / -25%)
  Q&A pairs: 17 -> 19 (+2)
    (bumper-flagged segments had been disrupting conversation detection
     in 2017-s9e30 and 2018-s10e18)
  CALLER total: 1320s -> 1190s  (bumpers previously labeled CALLER moved)
  Per-episode bumpers caught: 1-8, total ~165 bumper segments across set

Remaining Tara false positives are real callers acoustically similar to
Tara (Christopher in 2018, Kay in 2012, William and Charles in 2015) and
guest Clay in 2015-s7e19 — those need profile rebuild + Clay profile,
not bumper filtering.

Adds download_full_archive.py — resumable mirror-style downloader that
walks IX server's /home/gurushow/public_html/archive/{year}/ and copies
all MP3s to archive-data/episodes/. Run is in progress (~589 files,
~10-15GB). Used to source clean profile windows for the remaining
co-hosts (Tara rebuild, Clay, Tony, Rob, Randall, producers).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 16:17:50 -07:00