From 1dd2f208a079828630682b875facac3325547b3c Mon Sep 17 00:00:00 2001
From: Mike Swanson
Date: Wed, 17 Jun 2026 09:36:35 -0700
Subject: [PATCH] ct-thoughts: web-search bots reliability = MUST FIX (Mike) + research-method correction

Mike's correction: web search (grok xsearch + gemini search) carries at least
as much weight as live API probing - the searches gave the real leads this
session (connector proxy, teleport setting path); blind endpoint-probing is
"highly suspect" (mostly 404s). And the search bots MUST be properly fixed -
both returned empty repeatedly on UniFi research despite the same-day partial
grok fix.

- docs/CT_THOUGHTS.md: Thought 2 (HIGH PRIORITY) - web-search reliability
  must-fix, with the observed failures + a proper-fix investigation plan
  (capture failing-query JSON; max-turns/streaming-json/
  retry; cross-fallback grok<->gemini; 5/5 acceptance).
- memory feedback_web_search_over_probing: lead with web search/docs; probe
  only to CONFIRM a hypothesis, never as primary discovery. Reading our own
  config is fine; guessing paths is not.
- errorlog correction logged.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 .claude/memory/MEMORY.md                      |  1 +
 .../feedback_web_search_over_probing.md       | 29 +++++++++++
 docs/CT_THOUGHTS.md                           | 48 +++++++++++++++++++
 errorlog.md                                   | 14 ++++++
 4 files changed, 92 insertions(+)
 create mode 100644 .claude/memory/feedback_web_search_over_probing.md

diff --git a/.claude/memory/MEMORY.md b/.claude/memory/MEMORY.md
index c6ea9ec1..00a78356 100644
--- a/.claude/memory/MEMORY.md
+++ b/.claude/memory/MEMORY.md
@@ -57,6 +57,7 @@
 - [/tmp path mismatch on Windows](feedback_tmp_path_windows.md) — Write tool and Git Bash resolve `/tmp` to DIFFERENT real dirs. Use heredoc or workspace path for JSON payloads handed to curl.
 - [Windows strips embedded double-quotes](feedback_windows_quote_stripping.md) — Embedded `"` in an arg gets eaten twice over: PowerShell->curl.exe (CommandLineToArgvW) AND RMM->cmd.exe. Use single-quoted heredoc `<<'JSON'` + `--data-binary @-` for bodies; build `"` from `[char]34`; or drop the quoted part (e.g. `shutdown /c`).
 - [Interview the AI / read its docs before probing](feedback_interview_ai_read_docs.md) — To learn an external AI/CLI's syntax or capabilities, READ its bundled docs (Grok: `~/.grok/docs/user-guide/`, `README.md`, `grok inspect`/`models`/`--help`) or interview the model; don't guess flags or run slow trial-and-error. One run to confirm a doc-derived hypothesis, not a dozen to discover.
+- [Web search over blind probing](feedback_web_search_over_probing.md) — For external API/capability discovery, LEAD with web search (grok/gemini) + vendor docs; live endpoint-probing only CONFIRMS a hypothesis, never the primary discovery method (it mostly 404s, "highly suspect"). Reading a system's OWN config is fine; guessing unknown PATHS is not. Web-search bots being flaky is a must-fix (CT_THOUGHTS Thought 2).
 - [Windows bash command mapping](feedback_windows_bash_mapping.md) — `bash` often resolves to WSL stub instead of Git/MSYS bash required by the harness. Fix by prepending `C:\Program Files\Git\bin` (and usr\bin) to PATH, or source `.claude/scripts/ensure-git-bash.ps1`. Profile has the logic; use plain `bash .claude/scripts/...` after remap. See the helper and this memory file for details.
 - [Git must authenticate non-interactively](feedback_git_noninteractive_auth.md) — Mike's gripe with Git for Windows is the constant password prompts (GCM) that hang automation, NOT the tool itself. D:\ClaudeTools is set to `credential.helper=store` primed with the azcomputerguru Gitea API token (host 172.16.3.20:3000); always set `GIT_TERMINAL_PROMPT=0`. Any never-prompts solution is acceptable.
 - [Vault git auth — GCM shadows store token](feedback_vault_gcm_shadow_auth.md) — vault sync "Failed to authenticate user" on git.azcomputerguru.com: GCM is first in the helper chain and shadows the valid store token. Fix (machine-local): store-only credential.helper reset + pin `azcomputerguru@` in the vault remote URL so store returns the durable PAT (not the volatile OAUTH_USER JWT). Applied GURU-5070 2026-06-07.
diff --git a/.claude/memory/feedback_web_search_over_probing.md b/.claude/memory/feedback_web_search_over_probing.md
new file mode 100644
index 00000000..09732b78
--- /dev/null
+++ b/.claude/memory/feedback_web_search_over_probing.md
@@ -0,0 +1,29 @@
+---
+name: feedback_web_search_over_probing
+description: For external API/capability discovery, LEAD with web search (grok/gemini) and bundled docs; use live endpoint-probing only to CONFIRM a search/doc-derived hypothesis - never as the primary discovery method. Mike's correction 2026-06-17.
+metadata:
+  type: feedback
+---
+
+When figuring out an external system's API surface or capabilities, **web search (grok
+xsearch / gemini search) and the vendor's own docs carry AT LEAST as much weight as live
+experimentation** - usually more.
+
+**Why (Mike, 2026-06-17):** blind endpoint-probing ("does `/stat/openvpn` exist? does
+`/cmd/vpnmgr`?") is guessing - it mostly 404s and is "highly suspect" as a source of truth.
+The genuinely valuable leads this session came from the searches: grok surfaced the UniFi
+**cloud connector proxy** (`/v1/connector/consoles/.../proxy/...`); gemini surfaced the
+**Teleport `/rest/setting/teleport`** path. Probing only *confirmed* those after the search
+pointed the way.
+
+**How to apply:**
+- Discovery order: web search + bundled docs FIRST -> form a specific hypothesis -> then ONE
+  targeted live call to CONFIRM it. Not: spray candidate URLs and infer from status codes.
+- Reading a system's OWN config (e.g. our gateway's `networkconf`) is fine - that's reading
+  real data, not guessing endpoints. The "suspect" part is guessing unknown PATHS.
+- Do not present probe results as "authoritative" over web-search findings; weight them at
+  least equally and reconcile.
+- Corollary: the web-search bots being flaky is a real liability (see CT_THOUGHTS "Thought 2 -
+  web-search reliability MUST FIX"); when they fail, say so plainly rather than silently
+  falling back to guessing and calling it authoritative.
+- Complements [[feedback_interview_ai_read_docs]].
diff --git a/docs/CT_THOUGHTS.md b/docs/CT_THOUGHTS.md
index efb04522..27e9d138 100644
--- a/docs/CT_THOUGHTS.md
+++ b/docs/CT_THOUGHTS.md
@@ -15,6 +15,7 @@
 >
 > The entries below are the current thoughts:
 > 1. ClaudeTools 3.0 — web-based co-work workspace (Mike, 2026-06-14) — **Discussed (vision-stage, no build go)**
+> 2. Web-search bots (grok xsearch + gemini search) reliability - MUST FIX (Mike, 2026-06-17) - **Raw, HIGH PRIORITY**

 ---

@@ -210,3 +211,50 @@ a day spent, not a quarter.
 2. **Input arbitration** — is a soft driving-token enough, or do you want hard turns?
 3. **CLI coexistence** — do existing Claude Code CLI sessions appear as first-class nodes,
    or is the web the only entry point?
+
+---
+
+## Thought 2 — Web-search bots (grok xsearch + gemini search) reliability: MUST FIX (Mike, 2026-06-17)
+
+**Status: Raw - HIGH PRIORITY. Mike's directive: this "absolutely must be properly fixed."**
+
+### The problem
+
+The live web-search path - `grok xsearch` (multi-agent web_search) and `agy/gemini search`
+(Google grounding) - is the MOST VALUABLE discovery tool we have (it surfaced the UniFi cloud
+connector proxy and the Teleport `/rest/setting/teleport` path that blind endpoint-probing never
+would have). But it is UNRELIABLE: both return EMPTY intermittently, especially on longer /
+multi-part queries.
+
+Observed 2026-06-17:
+- `grok xsearch` returned `[ask-grok] no result (stopReason=)` ~5+ times on UniFi queries,
+  DESPITE the same-day partial fix (--yolo, drop --no-subagents, web-primary prompt, 300s budget)
+  that DID work on a short query ("current rust version", 23s). So the fix is incomplete - longer
+  queries still fail.
+- `agy/gemini search` returned `[ask-gemini] empty response` (even after its built-in single retry)
+  on the same queries.
+
+### Why it matters (Mike's reweighting)
+
+Web search now carries AT LEAST as much weight as live API probing. Probing without a search/doc
+lead is "blind guessing" and mostly 404s; the searches give the real leads. So a flaky search bot
+directly degrades research quality and pushes the loop back toward bad guessing.
+
+### Proper fix (not a workaround) - investigation plan
+
+1. **grok:** capture the raw `--output-format json` (or `streaming-json`) of a FAILING long query;
+   determine whether it's max-turns exhaustion, the empty-finalization quirk (stopReason blank), a
+   timeout mid-search, or the multi-agent searcher itself returning nothing. Then fix the actual cause
+   (raise/auto-scale max-turns; switch to `streaming-json` so partial results survive; retry-on-empty
+   loop with backoff; possibly chunk multi-part queries).
+2. **gemini:** same - capture why `search` returns empty (the wrapper already retries once and still
+   fails); check whether `--approval-mode yolo` + `google_web_search` is finalizing empty, and add a
+   real retry/fallback.
+3. **Cross-fallback:** if grok empty -> auto-try gemini and vice-versa, and surface which one answered.
+4. Acceptance: 5/5 success on a battery of long, multi-part research queries.
+
+### Note
+
+This was filed because both bots failed during live UniFi VPN/Teleport research and forced a fallback
+to suspect endpoint-probing. The web-search feature is load-bearing for the "interview the AIs / read
+the docs before probing" workflow - it has to be dependable.
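Reviewer sketch (not part of the patch): the retry-on-empty loop with backoff, the grok<->gemini cross-fallback, and the all-queries acceptance check from the investigation plan could look roughly like the Python below. Everything named here is hypothetical: `search_with_fallback`, `SearchResult`, and `passes_acceptance` are illustrative names, and each backend is just a callable taking a query and returning text. The sketch deliberately assumes nothing about how the ask-grok / ask-gemini wrappers are invoked or which flags they take, and it does not address the root-cause capture in plan items 1 and 2.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical backend type: any callable mapping a query to answer text.
# An empty string, whitespace, None, or a raised exception all count as "empty".
Backend = Callable[[str], Optional[str]]


@dataclass
class SearchResult:
    text: str         # non-empty, stripped answer text
    answered_by: str  # name of the backend that produced the answer
    attempts: int     # total backend calls made, across all backends


def search_with_fallback(
    query: str,
    backends: Sequence[Tuple[str, Backend]],
    retries: int = 2,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[SearchResult]:
    """Try each backend in order; retry empties with exponential backoff.

    Returns None only when every backend stayed empty, so the caller can
    report the failure plainly instead of silently falling back to guessing.
    """
    attempts = 0
    for name, backend in backends:
        for attempt in range(retries + 1):
            attempts += 1
            try:
                text = backend(query)
            except Exception:
                text = None  # treat a crashed call like an empty result
            if text and text.strip():
                return SearchResult(text.strip(), name, attempts)
            if attempt < retries:
                # Back off before retrying the same backend: 1x, 2x, 4x, ...
                sleep(base_delay * (2 ** attempt))
    return None


def passes_acceptance(queries: Sequence[str],
                      backends: Sequence[Tuple[str, Backend]],
                      **kwargs) -> bool:
    """True only if every query in the battery gets a non-empty answer."""
    return all(
        search_with_fallback(q, backends, **kwargs) is not None
        for q in queries
    )


if __name__ == "__main__":
    # Stand-in backends for demonstration only.
    def always_empty(_query: str) -> str:
        return ""

    def always_answers(query: str) -> str:
        return "answer for: " + query

    result = search_with_fallback(
        "long multi-part query",
        [("grok", always_empty), ("gemini", always_answers)],
        sleep=lambda _seconds: None,  # skip real waiting in the demo
    )
    print(result.answered_by, result.attempts)  # prints: gemini 4
```

Passing the backends as an ordered list keeps the fallback direction a caller decision (grok first or gemini first), and returning `answered_by` covers the "surface which one answered" requirement without the caller parsing wrapper output.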
diff --git a/errorlog.md b/errorlog.md
index aad82891..436a452f 100644
--- a/errorlog.md
+++ b/errorlog.md
@@ -17,6 +17,20 @@ Categories (the `[type]` tag): _(none)_ = skill/command execution failure ·
+2026-06-17 | GURU-5070 | grok | grok xsearch returned no result [ctx: mode=xsearch stopReason=]
+
+2026-06-17 | GURU-5070 | research-method | [correction] treated blind endpoint-probing as 'authoritative' over web search; Mike: web searches (grok/gemini) have been MORE valuable - they gave the real leads (connector proxy, teleport setting path), probing only confirmed and mostly 404s. Lead with web search; probe only to CONFIRM a search/doc-derived hypothesis [ctx: ref=feedback_interview_ai_read_docs]
+
+2026-06-17 | GURU-5070 | bash/background-ai | [friction] mixing a backgrounded ask-grok/ask-gemini ('&' + wait) with foreground curl probes in ONE Bash command repeatedly yields an EMPTY output capture; run AI calls as separate run_in_background Bash tool calls, never '&'+wait inline with work to capture [ctx: ref=grok/gemini wrappers]
+
+2026-06-17 | GURU-5070 | agy | gemini returned no response (empty after retry) [ctx: mode=search err=Ripgrep is not available. Falling back to GrepTool.]
+
+2026-06-17 | GURU-5070 | grok | grok xsearch returned no result [ctx: mode=xsearch stopReason=]
+
+2026-06-17 | GURU-5070 | grok | grok xsearch returned no result [ctx: mode=xsearch stopReason=]
+
+2026-06-17 | GURU-5070 | grok | grok xsearch returned no result [ctx: mode=xsearch stopReason=]
+
 2026-06-17 | Howard-Home | unifi-wifi/apply-radio | [friction] per-AP --apply loop: each call re-logs-in to the controller; rapid succession throttles -> write silently skips (no [ok]). Fix: space calls (sleep 3-4) or add multi-AP/one-login support

 2026-06-17 | Howard-Home | wiki-compile | [friction] full recompile Sonnet subagent ran ~54min then crashed on 32k output cap (tried to emit the ~490-line article despite being told to write-to-file and return only a summary); recovered via direct surgical Edits to the existing article. Fix: for --full on large existing articles, prefer targeted Edit integration over a subagent rewrite, or hard-cap/forbid article body in the subagent reply. [ctx: skill=wiki-compile target=client:cascades-tucson]