From a2427979085489fb41f68a5f074a65819f9ef5a5 Mon Sep 17 00:00:00 2001 From: Mike Swanson Date: Tue, 16 Jun 2026 19:14:59 -0700 Subject: [PATCH] grok: fix xsearch (multi-agent web_search), pin grok-build, RTFM doc sweep Root-caused the long-standing `ask-grok.sh xsearch` "no result (stopReason=)" failure by reading Grok's bundled docs (~/.grok/docs/user-guide + README) instead of probing: - web_search runs a SEPARATE multi-agent model (grok-4.20-multi-agent), so the wrapper's blanket --no-subagents strangled it -> indefinite hang, 0 bytes. Scoped --no-subagents OFF xsearch; use --yolo (documented headless tool-run posture). - xsearch prompt mandated X/Twitter search on every call (slow multi-agent) and the budget was 240s -> still timed out. Now web-primary (X only when relevant), 300s. Validated end-to-end through the wrapper: 23s, correct answer + 3 sources. Model: pin -m grok-build (xAI flagship, 512k, the documented default) for the reasoning modes (text/verify/review*) so quality is deterministic and not at the mercy of the runtime default (this machine drifted to grok-composer-2.5-fast, a fast Cursor coding model). xsearch + image/video keep the runtime default. Validated text mode on grok-build (13s). Doc accuracy (SKILL.md): corrected the model facts (default, the separate web_search model, --effort unsupported on grok-build per supports_reasoning_effort:false); documented the xsearch subagent exception. Fixed a stale in-script comment claiming --rules/--disallowed-tools "tripped the CLI" (both are valid headless flags). memory: add feedback_interview_ai_read_docs (read bundled docs / interview the model before probing) + index; errorlog correction. Co-Authored-By: Claude Opus 4.8 (1M context) --- .claude/memory/MEMORY.md | 1 + .../memory/feedback_interview_ai_read_docs.md | 35 ++++++++++++++++ .claude/skills/grok/SKILL.md | 6 +-- .claude/skills/grok/scripts/ask-grok.sh | 42 ++++++++++++++++--- errorlog.md | 10 +++++ 5 files changed, 85 insertions(+), 9 deletions(-) create mode 100644 .claude/memory/feedback_interview_ai_read_docs.md diff --git a/.claude/memory/MEMORY.md b/.claude/memory/MEMORY.md index fca1f19..b067c5c 100644 --- a/.claude/memory/MEMORY.md +++ b/.claude/memory/MEMORY.md @@ -55,6 +55,7 @@ - [Point vault-access teammates at SOPS path](feedback_vault_pointer_for_teammates.md) — When relaying infra/credential info to Howard or other vault-access teammates, hand over the SOPS path + key anchors; don't transcribe the entry's fields into the message. - [/tmp path mismatch on Windows](feedback_tmp_path_windows.md) — Write tool and Git Bash resolve `/tmp` to DIFFERENT real dirs. Use heredoc or workspace path for JSON payloads handed to curl. - [Windows strips embedded double-quotes](feedback_windows_quote_stripping.md) — Embedded `"` in an arg gets eaten twice over: PowerShell->curl.exe (CommandLineToArgvW) AND RMM->cmd.exe. Use single-quoted heredoc `<<'JSON'` + `--data-binary @-` for bodies; build `"` from `[char]34`; or drop the quoted part (e.g. `shutdown /c`). +- [Interview the AI / read its docs before probing](feedback_interview_ai_read_docs.md) — To learn an external AI/CLI's syntax or capabilities, READ its bundled docs (Grok: `~/.grok/docs/user-guide/`, `README.md`, `grok inspect`/`models`/`--help`) or interview the model; don't guess flags or run slow trial-and-error. One run to confirm a doc-derived hypothesis, not a dozen to discover. - [Windows bash command mapping](feedback_windows_bash_mapping.md) — `bash` often resolves to WSL stub instead of Git/MSYS bash required by the harness. Fix by prepending `C:\Program Files\Git\bin` (and usr\bin) to PATH, or source `.claude/scripts/ensure-git-bash.ps1`. Profile has the logic; use plain `bash .claude/scripts/...` after remap. See the helper and this memory file for details. - [Git must authenticate non-interactively](feedback_git_noninteractive_auth.md) — Mike's gripe with Git for Windows is the constant password prompts (GCM) that hang automation, NOT the tool itself. D:\ClaudeTools is set to `credential.helper=store` primed with the azcomputerguru Gitea API token (host 172.16.3.20:3000); always set `GIT_TERMINAL_PROMPT=0`. Any never-prompts solution is acceptable. - [Vault git auth — GCM shadows store token](feedback_vault_gcm_shadow_auth.md) — vault sync "Failed to authenticate user" on git.azcomputerguru.com: GCM is first in the helper chain and shadows the valid store token. Fix (machine-local): store-only credential.helper reset + pin `azcomputerguru@` in the vault remote URL so store returns the durable PAT (not the volatile OAUTH_USER JWT). Applied GURU-5070 2026-06-07. diff --git a/.claude/memory/feedback_interview_ai_read_docs.md b/.claude/memory/feedback_interview_ai_read_docs.md new file mode 100644 index 0000000..2730cac --- /dev/null +++ b/.claude/memory/feedback_interview_ai_read_docs.md @@ -0,0 +1,35 @@ +--- +name: feedback_interview_ai_read_docs +description: Before guessing or probing an external AI/CLI's command syntax or capabilities, READ its bundled docs and/or interview the model itself — probing wastes tokens and misleads. +metadata: + type: feedback +--- + +When you need to understand an external AI's or CLI tool's **command syntax or +capabilities**, do NOT blindly guess flags or run slow trial-and-error probes. +First **read its own bundled documentation**, and/or **interview the model +itself** (ask it to read its own docs and explain). The authoritative source is +almost always already on disk. + +**Why:** repeated timed probing is expensive (each Grok run is 80-300s), gives +ambiguous signals, and is the exact "blindly guessing or probing" pattern Mike +has flagged. The docs answer the question directly and for free. + +**Concrete example (the lesson):** the long-standing `ask-grok.sh xsearch` +"no result (stopReason=)" failure was root-caused not by probing but by reading +**`~/.grok/docs/user-guide/` (esp. `14-headless-mode.md`, `11-custom-models.md`, +`05-configuration.md`) and `~/.grok/README.md`**. They revealed: `web_search` +runs a SEPARATE multi-agent model (`grok-4.20-multi-agent`), so the wrapper's +blanket `--no-subagents` strangled it; the documented headless JSON schema is +`{text,stopReason,sessionId,requestId}`; and `--yolo` is the documented +tool-run posture. One confirmatory run, not a dozen. + +**How to apply:** +- For the Grok CLI: read `~/.grok/docs/user-guide/*.md` and `~/.grok/README.md` + (and `grok inspect` / `grok models` / `grok --help` for live truth) + before changing [[feedback_windows_quote_stripping]]-style wrapper internals. +- For any vendor CLI/API: locate its shipped docs/`--help`/OpenAPI first; treat + one targeted run as *confirmation* of a doc-derived hypothesis, not as the + discovery method. +- Interviewing the model (its text path) is valid even when a tool path is + broken — asking Grok doesn't require its web_search to work. diff --git a/.claude/skills/grok/SKILL.md b/.claude/skills/grok/SKILL.md index d208577..79552d2 100644 --- a/.claude/skills/grok/SKILL.md +++ b/.claude/skills/grok/SKILL.md @@ -91,10 +91,10 @@ Grok is **per-machine** — the skill syncs fleet-wide but the binary does not. ## Safety / operational notes -- `~/.grok/config.toml` defaults to `permission_mode = "always-approve"` (auto-runs tools). The wrapper overrides with `--permission-mode dontAsk` and `--no-subagents`; do not bypass that. +- `~/.grok/config.toml` defaults to `permission_mode = "always-approve"` (auto-runs tools). The wrapper's single-context modes (`text`/`verify`/`review*`) override with `--permission-mode dontAsk --no-subagents` (cheap, bounded, timeout-predictable). **`xsearch` is the deliberate exception:** it uses `--yolo` and leaves subagents ENABLED, because `web_search` runs the `grok-4.20-multi-agent` model — passing `--no-subagents` made it hang with zero output (the long-standing xsearch-empty bug; root-caused + fixed 2026-06-16). Don't re-add `--no-subagents` to xsearch. - Prompts are passed via `--prompt-file` only (inline args break on shell quoting). -- `grok-build` rejects `--effort`/`--reasoning-effort` (400) — don't pass them. -- Models available: `grok-build` (default), `grok-composer-2.5-fast`. The agent self-reports as Grok 4.3. +- `grok-build` does NOT support `--effort`/`--reasoning-effort` (`supports_reasoning_effort:false` in the model info) — passing them 400s. Don't. +- Models (verify live with `grok models`): `grok-build` ("Grok Build", xAI's latest coding model, 512k ctx — the **documented** default per `docs/user-guide/11-custom-models.md`) and `grok-composer-2.5-fast` ("Cursor's latest coding model", fast). **Gotcha:** on this machine the runtime default that `grok models` reports is `grok-composer-2.5-fast`, and the wrapper pins no `-m`, so it currently rides whatever the runtime default is (not necessarily grok-build). `web_search` uses a SEPARATE model, `grok-4.20-multi-agent` (set via `[tools] web_search` / `GROK_WEB_SEARCH_MODEL`). - After media gen, Claude should **view the image** (Read tool) to confirm correctness; videos can be confirmed by header/ffprobe. ## Reference diff --git a/.claude/skills/grok/scripts/ask-grok.sh b/.claude/skills/grok/scripts/ask-grok.sh index 2f101f5..87a074d 100644 --- a/.claude/skills/grok/scripts/ask-grok.sh +++ b/.claude/skills/grok/scripts/ask-grok.sh @@ -94,12 +94,27 @@ if [[ "$OSTYPE" == "darwin"* ]]; then TIMEOUT_CMD="$(command -v gtimeout 2>/dev/null || echo timeout)" fi +# Policy flags, overridable per mode. Defaults suit the single-context subordinate +# calls (text/verify/review): a fixed permission posture and NO subagent fan-out -- +# that keeps a one-shot second-opinion call cheap, bounded, and timeout-predictable. +# xsearch OVERRIDES both: web_search runs the grok-4.20-multi-agent model, so +# --no-subagents made it hang with zero output (root cause, 2026-06-16). See xsearch. +GROK_PERM_FLAGS=(--permission-mode dontAsk) +GROK_SUBAGENT_FLAGS=(--no-subagents) +# Pin xAI's flagship grok-build for the reasoning modes (text/verify/review*) so the +# second-opinion quality is deterministic and not at the mercy of the runtime default +# (which drifts -- this machine's `grok models` reports grok-composer-2.5-fast, a fast +# Cursor coding model). xsearch + image/video clear this to "" to use the runtime +# default (xsearch's searcher is a separate model; media gen was verified on default). +GROK_MODEL="${GROK_MODEL:-grok-build}" + run_grok() { local to="$1"; shift + local mflag=(); [ -n "$GROK_MODEL" ] && mflag=(-m "$GROK_MODEL") # Hand grok native-Windows paths (cygpath); MSYS leaves already-Windows paths alone, # so conversion is deterministic and space-safe. "$TIMEOUT_CMD" "$to" "$GROK" --prompt-file "$(winpath "$PF")" --output-format json \ - --permission-mode dontAsk --no-subagents --no-plan --cwd "$(winpath "$RUN_CWD")" "$@" \ + "${mflag[@]}" "${GROK_PERM_FLAGS[@]}" "${GROK_SUBAGENT_FLAGS[@]}" --no-plan --cwd "$(winpath "$RUN_CWD")" "$@" \ >"$OUT" 2>"$TMP/err.txt" || true } @@ -161,9 +176,12 @@ case "$MODE" in SRC="${1:-}" fi [ -z "$SRC" ] && { echo "usage: $SELF $MODE \"\" | $SELF $MODE --prompt-file " >&2; exit 2; } - # Prompt-level steering keeps it text-only and (for verify) adversarial. - # (The --disallowed-tools/--rules flags tripped the CLI; --check adds a slow - # multi-turn self-check loop — both avoided in favor of prompt steering.) + # Prompt steering + --disable-web-search (below) keeps these text-only; no tools + # are needed. NOTE for future hardening: --rules and --disallowed-tools ARE valid + # headless flags (per docs/user-guide/14-headless-mode.md) and could hard-restrict + # tools if prompt steering ever proves insufficient. --check/--self-verify also + # exists, but it makes grok verify its OWN output -- not adversarially critique an + # EXTERNAL claim -- so prompt steering is the correct semantic for verify mode. if [ "$MODE" = "verify" ]; then printf 'Adversarially evaluate the following claim/finding/document: try hard to find any reason it is WRONG, incomplete, or overstated. Then give a verdict plus specific justification. Answer in text only; do not use tools. Content:\n%s' "$SRC" > "$PF" else @@ -177,6 +195,7 @@ case "$MODE" in image) [ -z "${1:-}" ] && { echo "usage: $SELF image \"\" [out.png]" >&2; exit 2; } out="${2:-grok-image.png}" + GROK_MODEL="" # media gen verified on the runtime default; don't force grok-build here printf 'Use your image_gen tool to generate exactly this image, save it, then stop. Image: %s' "$1" > "$PF" run_grok 240 --disable-web-search --max-turns 12 sid="$(jfield sessionId)"; art="$(find_artifact "$sid" images)" @@ -189,6 +208,7 @@ case "$MODE" in input="$2"; out="${3:-grok-video.mp4}" [ -f "$input" ] || { echo "[$SELF] input image not found: $input" >&2; exit 2; } cp -f "$input" "$WORK/input.jpg" + GROK_MODEL="" # media gen verified on the runtime default; don't force grok-build here printf 'Use your image_to_video tool to animate input.jpg (in the current directory) into a short clip, save it, then stop. Motion: %s' "$1" > "$PF" run_grok 360 --disable-web-search --max-turns 20 sid="$(jfield sessionId)"; art="$(find_artifact "$sid" videos)" @@ -198,8 +218,18 @@ case "$MODE" in ;; xsearch) [ -z "${1:-}" ] && { echo "usage: $SELF xsearch \"\"" >&2; exit 2; } - printf 'Use your web_search and X/Twitter search tools to answer this, cite sources, then stop: %s' "$1" > "$PF" - run_grok 150 --max-turns 6 + # web_search runs the grok-4.20-multi-agent model -> subagents MUST stay enabled + # (--no-subagents made it hang with ZERO output, the long-standing xsearch bug). + # --yolo is the documented headless tool-run posture (README/14-headless-mode.md). + # Budget is generous: a live web_search measured ~83s end-to-end. + GROK_SUBAGENT_FLAGS=() + GROK_PERM_FLAGS=(--yolo) + GROK_MODEL="" # search runs the separate grok-4.20-multi-agent model; use runtime default orchestrator + # Lead with web_search (fast); pull in X/Twitter ONLY when the query is actually + # about social/news/sentiment. Mandating X search on every call ran the multi-agent + # searcher long enough to blow a 240s budget. Generous 300s budget for headroom. + printf 'Use your web_search tool to answer this question and cite sources. Use your X/Twitter search tools ONLY if the question is specifically about social media, breaking news, or public sentiment. Then stop.\n\nQuestion: %s' "$1" > "$PF" + run_grok 300 --max-turns 14 txt="$(jfield text)" if [ -n "$txt" ]; then printf '%s\n' "$txt"; else echo "[$SELF] no result (stopReason=$(jfield stopReason))" >&2; _logerr "grok xsearch returned no result" --context "mode=xsearch stopReason=$(jfield stopReason)"; exit 1; fi diff --git a/errorlog.md b/errorlog.md index 3e9f406..ffeaa98 100644 --- a/errorlog.md +++ b/errorlog.md @@ -17,6 +17,16 @@ Categories (the `[type]` tag): _(none)_ = skill/command execution failure · +2026-06-17 | GURU-5070 | grok/ask-grok.sh | [correction] blindly probed grok with slow timed runs to find the xsearch syntax instead of reading its bundled docs (~/.grok/docs/user-guide/ + README.md) / interviewing the model first; the docs gave the root cause directly (web_search=grok-4.20-multi-agent multi-agent model, killed by --no-subagents) [ctx: ref=feedback_interview_ai_read_docs] + +2026-06-17 | GURU-5070 | grok | grok xsearch returned no result [ctx: mode=xsearch stopReason=] + +2026-06-17 | GURU-5070 | grok | grok xsearch returned no result [ctx: mode=xsearch stopReason=] + +2026-06-17 | GURU-5070 | grok/xsearch | xsearch returned empty (stopReason finalization quirk) on Grok-co-work research; fell back to text-mode design + Claude synthesis [ctx: topic=grok-as-claude-cowork] + +2026-06-17 | GURU-5070 | grok | grok xsearch returned no result [ctx: mode=xsearch stopReason=] + 2026-06-17 | GURU-5070 | syncro/customer-comment | [friction] posted a customer-facing emailed comment to #32333 WITHOUT previewing for human review first; preview-before-send is mandatory for ALL outgoing comms [ctx: ref=CLAUDE.md syncro 'Before any POST: Always show the full payload and wait for confirmation'] 2026-06-17 | GURU-5070 | syncro/customer-comment | [correction] conflated Arizona Medical Transit (AMT-PC, Windows: Dell bloatware + misbehaving Syncro agent) cleanup into the Scileppi Mac (#32333) customer comment; Scileppi was a full-disk/Trash + Mail + Downloads-redesign job, no Dell/agent work [ctx: ticket=32333 mac=scileppi]