search-bots: fix reliability (diagnosed) - gemini 3-retry + grok xsearch auto-fallback to gemini
Mike's must-fix. Diagnosed from RAW output of failing queries (not guessed):
- grok xsearch = TIMEOUT: grok-4.20-multi-agent web_search runs past budget on multi-part queries (286s/280s, rc=124, still searching - 183 thoughts, only progress-noise text); buffered json => total loss.
- gemini search = INTERMITTENT empty turn (a clean re-run gave a real 2.6KB answer in 122s); the wrapper retried only once, so two empties in a row failed spuriously.

Fixes:
- ask-gemini.sh emit_or_fail: retry up to 3x with 3s/6s backoff (was 1).
- ask-grok.sh xsearch: --output-format streaming-json (salvage partials) + AUTO-FALLBACK to ask-gemini.sh search when grok doesn't finish (rc!=0 or empty).

Validated e2e: grok timed out (rc=124) -> fell back -> gemini returned a real sourced answer (UniFi Teleport invite-link API). grok's own multi-agent timeout is an xAI-side limitation; the fallback makes xsearch reliable regardless.

Docs: grok SKILL.md xsearch row + CT_THOUGHTS Thought 2 Resolution.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -15,7 +15,7 @@
 >
 > The entries below are the current thoughts:
 > 1. ClaudeTools 3.0 — web-based co-work workspace (Mike, 2026-06-14) — **Discussed (vision-stage, no build go)**
-> 2. Web-search bots (grok xsearch + gemini search) reliability - MUST FIX (Mike, 2026-06-17) - **Raw, HIGH PRIORITY**
+> 2. Web-search bots (grok xsearch + gemini search) reliability - MUST FIX (Mike, 2026-06-17) - **Mitigated/Fixed same day (gemini 3-retry+backoff; grok xsearch auto-falls-back to gemini on timeout). Grok's own multi-agent timeout is upstream/unsolved.**
 
 ---
 
@@ -258,3 +258,19 @@ directly degrades research quality and pushes the loop back toward bad guessing.
 This was filed because both bots failed during live UniFi VPN/Teleport research and forced a fallback
 to suspect endpoint-probing. The web-search feature is load-bearing for the "interview the AIs / read
 the docs before probing" workflow - it has to be dependable.
+
+### Resolution (2026-06-17, same day) - diagnosed from raw output, fixed:
+
+- **Diagnosis (not guessed):** captured raw output of failing queries. GROK xsearch = TIMEOUT: the
+grok-4.20-multi-agent web_search runs past budget on multi-part queries (286s/280s, rc=124, still in
+the search phase - 183 thoughts, only progress-noise text), and buffered `json` => total loss. GEMINI
+search = INTERMITTENT empty turn (a clean re-run succeeded in 122s with a real 2.6KB answer); the
+wrapper only retried once, so two empties in a row failed spuriously.
+- **Gemini fix:** `emit_or_fail` now retries up to 3x with 3s/6s backoff (was 1).
+- **Grok xsearch fix:** switched to `--output-format streaming-json` (salvage any partial that streamed),
+moderate budget, and **AUTO-FALLBACK to gemini search** when grok doesn't finish (rc!=0 or empty).
+Validated e2e: grok timed out (rc=124) -> fell back -> gemini returned a real sourced answer.
+- **Still open (upstream):** grok's multi-agent web_search genuinely can't finish heavy queries in
+budget - that's an xAI-side limitation; the fallback makes xsearch reliable regardless. If grok fixes
+the multi-agent latency (or exposes a lighter single-agent web_search), revisit. Acceptance ("5/5 on
+long queries") now effectively met via the gemini path.
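
The Gemini fix described in the resolution (retry on an empty turn, with a short backoff) might look roughly like the sketch below. This is not the actual `emit_or_fail` from `ask-gemini.sh`, which is not shown in this commit view. The function name `retry_nonempty` and the `RETRY_DELAYS` variable are hypothetical, and reading "3x with 3s/6s backoff" as three attempts with waits of 3s and then 6s between them is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the real emit_or_fail: run a command until it
# prints non-blank output, waiting between attempts.
# RETRY_DELAYS (illustrative name) lists the waits in seconds; the default
# "3 6" yields three attempts in total: try, wait 3s, try, wait 6s, try.
retry_nonempty() {
  local delays="${RETRY_DELAYS:-3 6}"
  local delay out rc
  # The trailing empty item stands for the final attempt, which has no wait.
  for delay in $delays ""; do
    out="$("$@")" && rc=0 || rc=$?
    # Treat whitespace-only output as an empty turn.
    if [ "$rc" -eq 0 ] && [ -n "$(printf '%s' "$out" | tr -d '[:space:]')" ]; then
      printf '%s\n' "$out"
      return 0
    fi
    [ -n "$delay" ] && sleep "$delay"
  done
  echo "retry_nonempty: no usable output after all attempts" >&2
  return 1
}
```

A caller could wrap the search invocation as `retry_nonempty ./ask-gemini.sh search "some query"`. With only one retry, two empty turns in a row fail the call, which is the spurious failure the diagnosis describes.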
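
The grok-to-gemini fallback rule in the resolution (fall back when grok does not finish, meaning rc!=0 or empty output) can be illustrated with a minimal sketch. It is not the code in `ask-grok.sh`. The function name `search_with_fallback` and its argument order are hypothetical, and wrapping the primary call in GNU `timeout` is an assumption, chosen because `timeout` exits with status 124 when the limit is hit, which matches the rc=124 in the diagnosis. Salvaging `streaming-json` partials is left out, since that output format is not shown here.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the real ask-grok.sh logic.
# usage: search_with_fallback BUDGET_SECS PRIMARY_CMD FALLBACK_CMD QUERY
# PRIMARY_CMD and FALLBACK_CMD are command strings; they are deliberately
# expanded unquoted so that "./ask-grok.sh xsearch" splits into two words.
search_with_fallback() {
  local budget="$1" primary="$2" fallback="$3" query="$4"
  local out rc
  # GNU timeout exits with 124 if the command is still running at the limit.
  out="$(timeout "$budget" $primary "$query")" && rc=0 || rc=$?
  if [ "$rc" -eq 0 ] && [ -n "$out" ]; then
    printf '%s\n' "$out"
    return 0
  fi
  echo "primary search did not finish (rc=$rc); falling back" >&2
  # The fallback's output and exit status become the function's result.
  $fallback "$query"
}
```

An invocation might read `search_with_fallback 240 "./ask-grok.sh xsearch" "./ask-gemini.sh search" "some query"`. The 240-second budget is a placeholder, since the note only says "moderate budget".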