search-bots: fix reliability (diagnosed) - gemini 3-retry + grok xsearch auto-fallback to gemini

Mike's must-fix. Diagnosed from RAW output of failing queries (not guessed):
- grok xsearch = TIMEOUT: grok-4.20-multi-agent web_search runs past budget on multi-part queries
  (286s/280s, rc=124, still searching - 183 thoughts, only progress-noise text); buffered json => total loss.
- gemini search = INTERMITTENT empty turn (a clean re-run gave a real 2.6KB answer in 122s); the wrapper
  retried only once, so two empties in a row failed spuriously.

Fixes:
- ask-gemini.sh emit_or_fail: retry up to 3x with 3s/6s backoff (was 1).
- ask-grok.sh xsearch: --output-format streaming-json (salvage partials) + AUTO-FALLBACK to
  ask-gemini.sh search when grok doesn't finish (rc!=0 or empty). Validated e2e: grok timed out
  (rc=124) -> fell back -> gemini returned a real sourced answer (UniFi Teleport invite-link API).
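
Sketch of the retry-with-backoff idea, for illustration only. This is not the actual emit_or_fail code. It assumes "up to 3x with 3s/6s backoff" means three attempts total with 3s and 6s sleeps in between; retry_nonempty and RETRY_DELAYS are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the real ask-gemini.sh: rerun a command until it
# prints a non-empty answer. Assumed schedule: attempt, sleep 3, attempt,
# sleep 6, attempt.

RETRY_DELAYS="${RETRY_DELAYS:-3 6}"   # seconds to sleep before attempts 2 and 3

retry_nonempty() {
  local out delay
  # The leading 0 makes the first attempt run immediately.
  for delay in 0 $RETRY_DELAYS; do
    if [ "$delay" -gt 0 ]; then
      sleep "$delay"
    fi
    # Success requires both rc=0 and non-empty output (an empty turn is a failure).
    if out="$("$@")" && [ -n "$out" ]; then
      printf '%s\n' "$out"
      return 0
    fi
  done
  echo "retry_nonempty: still empty after all attempts" >&2
  return 1
}
```

Usage would look like `retry_nonempty some_search_command "query"`. Treating an empty answer the same as a non-zero exit is what lets two consecutive empty turns be absorbed instead of failing the call.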

grok's own multi-agent timeout is an xAI-side limitation; the fallback makes xsearch reliable regardless.
Docs: grok SKILL.md xsearch row + CT_THOUGHTS Thought 2 Resolution.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 10:38:18 -07:00
parent 58343bd656
commit 315f45bf7c
4 changed files with 63 additions and 39 deletions


@@ -15,7 +15,7 @@
>
> The entries below are the current thoughts:
> 1. ClaudeTools 3.0 — web-based co-work workspace (Mike, 2026-06-14) — **Discussed (vision-stage, no build go)**
> 2. Web-search bots (grok xsearch + gemini search) reliability - MUST FIX (Mike, 2026-06-17) - **Raw, HIGH PRIORITY**
> 2. Web-search bots (grok xsearch + gemini search) reliability - MUST FIX (Mike, 2026-06-17) - **Mitigated/Fixed same day (gemini 3-retry+backoff; grok xsearch auto-falls-back to gemini on timeout). Grok's own multi-agent timeout is upstream/unsolved.**
---
@@ -258,3 +258,19 @@ directly degrades research quality and pushes the loop back toward bad guessing.
This was filed because both bots failed during live UniFi VPN/Teleport research and forced a fallback
to suspect endpoint-probing. The web-search feature is load-bearing for the "interview the AIs / read
the docs before probing" workflow - it has to be dependable.
### Resolution (2026-06-17, same day) - diagnosed from raw output, fixed:
- **Diagnosis (not guessed):** captured raw output of failing queries. GROK xsearch = TIMEOUT: the
grok-4.20-multi-agent web_search runs past budget on multi-part queries (286s/280s, rc=124, still in
the search phase - 183 thoughts, only progress-noise text), and buffered `json` => total loss. GEMINI
search = INTERMITTENT empty turn (a clean re-run succeeded in 122s with a real 2.6KB answer); the
wrapper only retried once, so two empties in a row failed spuriously.
- **Gemini fix:** `emit_or_fail` now retries up to 3x with 3s/6s backoff (was 1).
- **Grok xsearch fix:** switched to `--output-format streaming-json` (salvage any partial that streamed),
moderate budget, and **AUTO-FALLBACK to gemini search** when grok doesn't finish (rc!=0 or empty).
Validated e2e: grok timed out (rc=124) -> fell back -> gemini returned a real sourced answer.
- **Still open (upstream):** grok's multi-agent web_search genuinely can't finish heavy queries in
budget - that's an xAI-side limitation; the fallback makes xsearch reliable regardless. If xAI fixes
the multi-agent latency (or exposes a lighter single-agent web_search), revisit. Acceptance ("5/5 on
long queries") is now effectively met via the gemini path.
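
A minimal sketch of the auto-fallback pattern described above, not the actual ask-grok.sh code. `PRIMARY_CMD`, `FALLBACK_CMD`, `BUDGET_SECS`, and `search_with_fallback` are hypothetical names, and the 120s default is an illustrative value, not the real budget. It relies on GNU `timeout`, which exits with rc=124 when the budget is exceeded.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: run a primary search under a time budget and fall back
# to a second search command when the primary fails, times out, or is empty.
# PRIMARY_CMD / FALLBACK_CMD stand in for whatever the real wrappers invoke.

BUDGET_SECS="${BUDGET_SECS:-120}"   # illustrative default, not from the source

search_with_fallback() {
  local query="$1" out rc=0
  # GNU timeout returns 124 if the command is still running at the deadline.
  out="$(timeout "$BUDGET_SECS" "$PRIMARY_CMD" "$query")" || rc=$?
  if [ "$rc" -eq 0 ] && [ -n "$out" ]; then
    printf '%s\n' "$out"
    return 0
  fi
  echo "primary did not finish (rc=$rc); falling back" >&2
  "$FALLBACK_CMD" "$query"
}
```

The check mirrors the "rc!=0 or empty" condition: a timeout (rc=124), any other failure, and a clean exit with no text all route to the fallback, so the caller gets an answer whenever the fallback path works.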