ct-thoughts: web-search bot reliability = MUST FIX (Mike) + research-method correction

Mike's correction: web search (grok xsearch + gemini search) carries at least as much weight as
live API probing - the searches gave the real leads this session (connector proxy, Teleport setting
path); blind endpoint-probing is "highly suspect" (mostly 404s). And the search bots MUST be properly
fixed - both returned empty repeatedly during UniFi research despite the same-day partial grok fix.

- docs/CT_THOUGHTS.md: Thought 2 (HIGH PRIORITY) - web-search reliability must-fix, with the observed
  failures + a proper-fix investigation plan (capture failing-query JSON; max-turns/streaming-json/
  retry; cross-fallback grok<->gemini; 5/5 acceptance).
- memory feedback_web_search_over_probing: lead with web search/docs; probe only to CONFIRM a
  hypothesis, never as primary discovery. Reading our own config is fine; guessing paths is not.
- errorlog correction logged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 09:36:35 -07:00
parent 8f72178d8a
commit 1dd2f208a0
4 changed files with 92 additions and 0 deletions


@@ -15,6 +15,7 @@
>
> The entries below are the current thoughts:
> 1. ClaudeTools 3.0 — web-based co-work workspace (Mike, 2026-06-14) — **Discussed (vision-stage, no build go)**
> 2. Web-search bots (grok xsearch + gemini search) reliability - MUST FIX (Mike, 2026-06-17) - **Raw, HIGH PRIORITY**
---
@@ -210,3 +211,50 @@ a day spent, not a quarter.
2. **Input arbitration** — is a soft driving-token enough, or do you want hard turns?
3. **CLI coexistence** — do existing Claude Code CLI sessions appear as first-class nodes,
or is the web the only entry point?
---
## Thought 2 — Web-search bots (grok xsearch + gemini search) reliability: MUST FIX (Mike, 2026-06-17)
**Status: Raw - HIGH PRIORITY. Mike's directive: this "absolutely must be properly fixed."**
### The problem
The live web-search path - `grok xsearch` (multi-agent web_search) and `agy/gemini search`
(Google grounding) - is the MOST VALUABLE discovery tool we have: it surfaced the UniFi cloud
connector proxy and the Teleport `/rest/setting/teleport` path, neither of which blind
endpoint-probing would have found. But it is UNRELIABLE: both bots return EMPTY intermittently,
especially on longer or multi-part queries.
Observed 2026-06-17:
- `grok xsearch` returned `[ask-grok] no result (stopReason=)` at least ~5 times on UniFi queries,
DESPITE the same-day partial fix (`--yolo`, dropping `--no-subagents`, a web-primary prompt, a 300s
budget), which DID work on a short query ("current rust version", 23s). So the fix is incomplete:
longer queries still fail.
- `agy/gemini search` returned `[ask-gemini] empty response` (even after its built-in single retry)
on the same queries.
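The two failure signatures above are the only concrete evidence so far, so counting them reliably is a prerequisite for any retry logic or acceptance run. A minimal Python sketch, assuming (unverified) that the wrappers print these lines verbatim somewhere in their combined output; `classify_output` is a hypothetical helper, not part of either wrapper today. It separates the blank-`stopReason` case because the "empty-finalization quirk" is one of the candidate causes for grok.

```python
import re

# Failure signatures copied from the 2026-06-17 observations.
# ASSUMPTION: the wrappers emit these lines exactly as they were logged.
_EMPTY_MARKERS = (
    re.compile(r"^\[ask-grok\] no result \(stopReason=(?P<reason>[^)]*)\)", re.M),
    re.compile(r"^\[ask-gemini\] empty response", re.M),
)


def classify_output(text: str) -> str:
    """Classify one wrapper run as 'ok', 'empty', or 'empty-blank-stop'."""
    if not text.strip():
        return "empty"  # no output at all also counts as a failure
    for pattern in _EMPTY_MARKERS:
        match = pattern.search(text)
        if match:
            # Only the grok pattern has a 'reason' group; a blank value is the
            # suspected empty-finalization quirk, worth tracking separately.
            if match.groupdict().get("reason") == "":
                return "empty-blank-stop"
            return "empty"
    return "ok"
```

Treating "no output at all" the same as a logged empty marker keeps a silent crash from being miscounted as a success.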
### Why it matters (Mike's reweighting)
Web search now carries AT LEAST as much weight as live API probing. Probing without a search/doc
lead is "blind guessing" and mostly yields 404s; the searches are what produce the real leads. So a
flaky search bot directly degrades research quality and pushes the loop back toward bad guessing.
### Proper fix (not a workaround) - investigation plan
1. **grok:** capture the raw `--output-format json` (or `streaming-json`) output of a FAILING long query;
determine whether it's max-turns exhaustion, the empty-finalization quirk (stopReason blank), a
timeout mid-search, or the multi-agent searcher itself returning nothing. Then fix the actual cause
(raise/auto-scale max-turns; switch to `streaming-json` so partial results survive; retry-on-empty
loop with backoff; possibly chunk multi-part queries).
2. **gemini:** same approach - capture why `search` returns empty (the wrapper already retries once
and still fails); check whether `--approval-mode yolo` + `google_web_search` is finalizing with an
empty answer, and add a real retry/fallback.
3. **Cross-fallback:** if grok returns empty, auto-try gemini (and vice-versa), and surface which one answered.
4. **Acceptance:** 5/5 success on a battery of long, multi-part research queries.
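Steps 1-3 all call for retry-on-empty with backoff plus a grok<->gemini cross-fallback that reports which bot answered, and step 4 defines success as 5/5. A minimal Python sketch of that control flow, assuming each bot is wrapped as a callable that returns the answer text, or an empty string/`None` on failure. `ask_with_fallback`, `acceptance`, the retry count and the 2s base delay are illustrative choices, not existing code or tuned values; the `sleep` parameter exists only so the backoff can be tested without waiting.

```python
import time
from typing import Callable, Optional, Sequence, Tuple

# HYPOTHETICAL interface: each bot is wrapped as query -> answer text.
# An empty string or None means "no result" (the failure observed above).
Backend = Callable[[str], Optional[str]]


def ask_with_fallback(
    query: str,
    backends: Sequence[Tuple[str, Backend]],
    retries: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[Optional[str], Optional[str]]:
    """Try each backend in order, retrying empty results with exponential backoff.

    Returns (answer, backend_name), or (None, None) if every attempt was empty.
    """
    for name, backend in backends:
        for attempt in range(retries):
            try:
                answer = backend(query)
            except Exception:
                # Treat a crash or timeout inside the wrapper like an empty result.
                answer = None
            if answer and answer.strip():
                return answer, name  # surface which bot actually answered
            if attempt < retries - 1:
                # With the default base_delay: wait 2s, then 4s, ... before retrying.
                sleep(base_delay * (2 ** attempt))
        # Every retry on this backend was empty: fall through to the next one.
    return None, None


def acceptance(
    queries: Sequence[str],
    backends: Sequence[Tuple[str, Backend]],
    **kwargs,
) -> bool:
    """Step-4 gate: every query in the battery must get a non-empty answer."""
    return all(ask_with_fallback(q, backends, **kwargs)[0] is not None for q in queries)
```

Passing the backends as `[("grok", ...), ("gemini", ...)]` or in the reverse order gives the "vice-versa" direction. This loop only masks empties; it does not replace finding the root cause in steps 1 and 2, which is why the answering backend's name is returned so fallbacks stay visible.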
### Note
This was filed because both bots failed during live UniFi VPN/Teleport research and forced a fallback
to suspect endpoint-probing. The web-search feature is load-bearing for the "interview the AIs / read
the docs before probing" workflow - it has to be dependable.