grok: fix xsearch (multi-agent web_search), pin grok-build, RTFM doc sweep

Root-caused the long-standing `ask-grok.sh xsearch` "no result (stopReason=)" failure by reading Grok's bundled docs (~/.grok/docs/user-guide + README) instead of probing: - web_search runs a SEPARATE multi-agent model (grok-4.20-multi-agent), so the wrapper's blanket --no-subagents strangled it -> indefinite hang, 0 bytes. Scoped --no-subagents OFF xsearch; use --yolo (documented headless tool-run posture). - xsearch prompt mandated X/Twitter search on every call (slow multi-agent) and the budget was 240s -> still timed out. Now web-primary (X only when relevant), 300s. Validated end-to-end through the wrapper: 23s, correct answer + 3 sources. Model: pin -m grok-build (xAI flagship, 512k, the documented default) for the reasoning modes (text/verify/review*) so quality is deterministic and not at the mercy of the runtime default (this machine drifted to grok-composer-2.5-fast, a fast Cursor coding model). xsearch + image/video keep the runtime default. Validated text mode on grok-build (13s). Doc accuracy (SKILL.md): corrected the model facts (default, the separate web_search model, --effort unsupported on grok-build per supports_reasoning_effort:false); documented the xsearch subagent exception. Fixed a stale in-script comment claiming --rules/--disallowed-tools "tripped the CLI" (both are valid headless flags). memory: add feedback_interview_ai_read_docs (read bundled docs / interview the model before probing) + index; errorlog correction. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 19:14:59 -07:00
parent 58ecc5ad40
commit a242797908
5 changed files with 85 additions and 9 deletions
--- a/.claude/memory/MEMORY.md
+++ b/.claude/memory/MEMORY.md
@@ -55,6 +55,7 @@
 - [Point vault-access teammates at SOPS path](feedback_vault_pointer_for_teammates.md) — When relaying infra/credential info to Howard or other vault-access teammates, hand over the SOPS path + key anchors; don't transcribe the entry's fields into the message.
 - [/tmp path mismatch on Windows](feedback_tmp_path_windows.md) — Write tool and Git Bash resolve `/tmp` to DIFFERENT real dirs. Use heredoc or workspace path for JSON payloads handed to curl.
 - [Windows strips embedded double-quotes](feedback_windows_quote_stripping.md) — Embedded `"` in an arg gets eaten twice over: PowerShell->curl.exe (CommandLineToArgvW) AND RMM->cmd.exe. Use single-quoted heredoc `<<'JSON'` + `--data-binary @-` for bodies; build `"` from `[char]34`; or drop the quoted part (e.g. `shutdown /c`).
 - [Interview the AI / read its docs before probing](feedback_interview_ai_read_docs.md) — To learn an external AI/CLI's syntax or capabilities, READ its bundled docs (Grok: `~/.grok/docs/user-guide/`, `README.md`, `grok inspect`/`models`/`--help`) or interview the model; don't guess flags or run slow trial-and-error. One run to confirm a doc-derived hypothesis, not a dozen to discover.
 - [Windows bash command mapping](feedback_windows_bash_mapping.md) — `bash` often resolves to WSL stub instead of Git/MSYS bash required by the harness. Fix by prepending `C:\Program Files\Git\bin` (and usr\bin) to PATH, or source `.claude/scripts/ensure-git-bash.ps1`. Profile has the logic; use plain `bash .claude/scripts/...` after remap. See the helper and this memory file for details.
 - [Git must authenticate non-interactively](feedback_git_noninteractive_auth.md) — Mike's gripe with Git for Windows is the constant password prompts (GCM) that hang automation, NOT the tool itself. D:\ClaudeTools is set to `credential.helper=store` primed with the azcomputerguru Gitea API token (host 172.16.3.20:3000); always set `GIT_TERMINAL_PROMPT=0`. Any never-prompts solution is acceptable.
 - [Vault git auth — GCM shadows store token](feedback_vault_gcm_shadow_auth.md) — vault sync "Failed to authenticate user" on git.azcomputerguru.com: GCM is first in the helper chain and shadows the valid store token. Fix (machine-local): store-only credential.helper reset + pin `azcomputerguru@` in the vault remote URL so store returns the durable PAT (not the volatile OAUTH_USER JWT). Applied GURU-5070 2026-06-07.
--- a/.claude/memory/feedback_interview_ai_read_docs.md
+++ b/.claude/memory/feedback_interview_ai_read_docs.md
@@ -0,0 +1,35 @@
 ---
 name: feedback_interview_ai_read_docs
 description: Before guessing or probing an external AI/CLI's command syntax or capabilities, READ its bundled docs and/or interview the model itself — probing wastes tokens and misleads.
 metadata:
  type: feedback
 ---
 When you need to understand an external AI's or CLI tool's **command syntax or
 capabilities**, do NOT blindly guess flags or run slow trial-and-error probes.
 First **read its own bundled documentation**, and/or **interview the model
 itself** (ask it to read its own docs and explain). The authoritative source is
 almost always already on disk.
 **Why:** repeated timed probing is expensive (each Grok run is 80-300s), gives
 ambiguous signals, and is the exact "blindly guessing or probing" pattern Mike
 has flagged. The docs answer the question directly and for free.
 **Concrete example (the lesson):** the long-standing `ask-grok.sh xsearch`
 "no result (stopReason=)" failure was root-caused not by probing but by reading
 **`~/.grok/docs/user-guide/` (esp. `14-headless-mode.md`, `11-custom-models.md`,
 `05-configuration.md`) and `~/.grok/README.md`**. They revealed: `web_search`
 runs a SEPARATE multi-agent model (`grok-4.20-multi-agent`), so the wrapper's
 blanket `--no-subagents` strangled it; the documented headless JSON schema is
 `{text,stopReason,sessionId,requestId}`; and `--yolo` is the documented
 tool-run posture. One confirmatory run, not a dozen.
 **How to apply:**
 - For the Grok CLI: read `~/.grok/docs/user-guide/*.md` and `~/.grok/README.md`
  (and `grok inspect` / `grok models` / `grok <cmd> --help` for live truth)
  before changing [[feedback_windows_quote_stripping]]-style wrapper internals.
 - For any vendor CLI/API: locate its shipped docs/`--help`/OpenAPI first; treat
  one targeted run as *confirmation* of a doc-derived hypothesis, not as the
  discovery method.
 - Interviewing the model (its text path) is valid even when a tool path is
  broken — asking Grok doesn't require its web_search to work.
--- a/.claude/skills/grok/SKILL.md
+++ b/.claude/skills/grok/SKILL.md
@@ -91,10 +91,10 @@ Grok is **per-machine** — the skill syncs fleet-wide but the binary does not.
 ## Safety / operational notes
- `~/.grok/config.toml` defaults to `permission_mode = "always-approve"` (auto-runs tools). The wrapper overrides with `--permission-mode dontAsk` and `--no-subagents`; do not bypass that.
+- `~/.grok/config.toml` defaults to `permission_mode = "always-approve"` (auto-runs tools). The wrapper's single-context modes (`text`/`verify`/`review*`) override with `--permission-mode dontAsk --no-subagents` (cheap, bounded, timeout-predictable). **`xsearch` is the deliberate exception:** it uses `--yolo` and leaves subagents ENABLED, because `web_search` runs the `grok-4.20-multi-agent` model — passing `--no-subagents` made it hang with zero output (the long-standing xsearch-empty bug; root-caused + fixed 2026-06-16). Don't re-add `--no-subagents` to xsearch.
 - Prompts are passed via `--prompt-file` only (inline args break on shell quoting).
- `grok-build` rejects `--effort`/`--reasoning-effort` (400) — don't pass them.
+- `grok-build` does NOT support `--effort`/`--reasoning-effort` (`supports_reasoning_effort:false` in the model info) — passing them 400s. Don't.
- Models available: `grok-build` (default), `grok-composer-2.5-fast`. The agent self-reports as Grok 4.3.
+- Models (verify live with `grok models`): `grok-build` ("Grok Build", xAI's latest coding model, 512k ctx — the **documented** default per `docs/user-guide/11-custom-models.md`) and `grok-composer-2.5-fast` ("Cursor's latest coding model", fast). **Gotcha:** on this machine the runtime default that `grok models` reports is `grok-composer-2.5-fast`, and the wrapper pins no `-m`, so it currently rides whatever the runtime default is (not necessarily grok-build). `web_search` uses a SEPARATE model, `grok-4.20-multi-agent` (set via `[tools] web_search` / `GROK_WEB_SEARCH_MODEL`).
 - After media gen, Claude should **view the image** (Read tool) to confirm correctness; videos can be confirmed by header/ffprobe.
 ## Reference
--- a/.claude/skills/grok/scripts/ask-grok.sh
+++ b/.claude/skills/grok/scripts/ask-grok.sh
@@ -94,12 +94,27 @@ if [[ "$OSTYPE" == "darwin"* ]]; then
  TIMEOUT_CMD="$(command -v gtimeout 2>/dev/null || echo timeout)"
 fi
 # Policy flags, overridable per mode. Defaults suit the single-context subordinate
 # calls (text/verify/review): a fixed permission posture and NO subagent fan-out --
 # that keeps a one-shot second-opinion call cheap, bounded, and timeout-predictable.
 # xsearch OVERRIDES both: web_search runs the grok-4.20-multi-agent model, so
 # --no-subagents made it hang with zero output (root cause, 2026-06-16). See xsearch.
 GROK_PERM_FLAGS=(--permission-mode dontAsk)
 GROK_SUBAGENT_FLAGS=(--no-subagents)
 # Pin xAI's flagship grok-build for the reasoning modes (text/verify/review*) so the
 # second-opinion quality is deterministic and not at the mercy of the runtime default
 # (which drifts -- this machine's `grok models` reports grok-composer-2.5-fast, a fast
 # Cursor coding model). xsearch + image/video clear this to "" to use the runtime
 # default (xsearch's searcher is a separate model; media gen was verified on default).
 GROK_MODEL="${GROK_MODEL:-grok-build}"
 run_grok() {
  local to="$1"; shift
  local mflag=(); [ -n "$GROK_MODEL" ] && mflag=(-m "$GROK_MODEL")
  # Hand grok native-Windows paths (cygpath); MSYS leaves already-Windows paths alone,
  # so conversion is deterministic and space-safe.
  "$TIMEOUT_CMD" "$to" "$GROK" --prompt-file "$(winpath "$PF")" --output-format json \
-    --permission-mode dontAsk --no-subagents --no-plan --cwd "$(winpath "$RUN_CWD")" "$@" \
+    "${mflag[@]}" "${GROK_PERM_FLAGS[@]}" "${GROK_SUBAGENT_FLAGS[@]}" --no-plan --cwd "$(winpath "$RUN_CWD")" "$@" \
    >"$OUT" 2>"$TMP/err.txt" || true
 }
@@ -161,9 +176,12 @@ case "$MODE" in
      SRC="${1:-}"
    fi
    [ -z "$SRC" ] && { echo "usage: $SELF $MODE \"<prompt>\" | $SELF $MODE --prompt-file <path>" >&2; exit 2; }
-    # Prompt-level steering keeps it text-only and (for verify) adversarial.
+    # Prompt steering + --disable-web-search (below) keeps these text-only; no tools
-    # (The --disallowed-tools/--rules flags tripped the CLI; --check adds a slow
+    # are needed. NOTE for future hardening: --rules and --disallowed-tools ARE valid
-    # multi-turn self-check loop — both avoided in favor of prompt steering.)
+    # headless flags (per docs/user-guide/14-headless-mode.md) and could hard-restrict
    # tools if prompt steering ever proves insufficient. --check/--self-verify also
    # exists, but it makes grok verify its OWN output -- not adversarially critique an
    # EXTERNAL claim -- so prompt steering is the correct semantic for verify mode.
    if [ "$MODE" = "verify" ]; then
      printf 'Adversarially evaluate the following claim/finding/document: try hard to find any reason it is WRONG, incomplete, or overstated. Then give a verdict plus specific justification. Answer in text only; do not use tools. Content:\n%s' "$SRC" > "$PF"
    else
@@ -177,6 +195,7 @@ case "$MODE" in
  image)
    [ -z "${1:-}" ] && { echo "usage: $SELF image \"<prompt>\" [out.png]" >&2; exit 2; }
    out="${2:-grok-image.png}"
    GROK_MODEL=""   # media gen verified on the runtime default; don't force grok-build here
    printf 'Use your image_gen tool to generate exactly this image, save it, then stop. Image: %s' "$1" > "$PF"
    run_grok 240 --disable-web-search --max-turns 12
    sid="$(jfield sessionId)"; art="$(find_artifact "$sid" images)"
@@ -189,6 +208,7 @@ case "$MODE" in
    input="$2"; out="${3:-grok-video.mp4}"
    [ -f "$input" ] || { echo "[$SELF] input image not found: $input" >&2; exit 2; }
    cp -f "$input" "$WORK/input.jpg"
    GROK_MODEL=""   # media gen verified on the runtime default; don't force grok-build here
    printf 'Use your image_to_video tool to animate input.jpg (in the current directory) into a short clip, save it, then stop. Motion: %s' "$1" > "$PF"
    run_grok 360 --disable-web-search --max-turns 20
    sid="$(jfield sessionId)"; art="$(find_artifact "$sid" videos)"
@@ -198,8 +218,18 @@ case "$MODE" in
    ;;
  xsearch)
    [ -z "${1:-}" ] && { echo "usage: $SELF xsearch \"<query>\"" >&2; exit 2; }
-    printf 'Use your web_search and X/Twitter search tools to answer this, cite sources, then stop: %s' "$1" > "$PF"
+    # web_search runs the grok-4.20-multi-agent model -> subagents MUST stay enabled
-    run_grok 150 --max-turns 6
+    # (--no-subagents made it hang with ZERO output, the long-standing xsearch bug).
    # --yolo is the documented headless tool-run posture (README/14-headless-mode.md).
    # Budget is generous: a live web_search measured ~83s end-to-end.
    GROK_SUBAGENT_FLAGS=()
    GROK_PERM_FLAGS=(--yolo)
    GROK_MODEL=""   # search runs the separate grok-4.20-multi-agent model; use runtime default orchestrator
    # Lead with web_search (fast); pull in X/Twitter ONLY when the query is actually
    # about social/news/sentiment. Mandating X search on every call ran the multi-agent
    # searcher long enough to blow a 240s budget. Generous 300s budget for headroom.
    printf 'Use your web_search tool to answer this question and cite sources. Use your X/Twitter search tools ONLY if the question is specifically about social media, breaking news, or public sentiment. Then stop.\n\nQuestion: %s' "$1" > "$PF"
    run_grok 300 --max-turns 14
    txt="$(jfield text)"
    if [ -n "$txt" ]; then printf '%s\n' "$txt"; else
      echo "[$SELF] no result (stopReason=$(jfield stopReason))" >&2; _logerr "grok xsearch returned no result" --context "mode=xsearch stopReason=$(jfield stopReason)"; exit 1; fi
--- a/errorlog.md
+++ b/errorlog.md
@@ -17,6 +17,16 @@ Categories (the `[type]` tag): _(none)_ = skill/command execution failure ·
 <!-- Append entries below this line -->
 2026-06-17 | GURU-5070 | grok/ask-grok.sh | [correction] blindly probed grok with slow timed runs to find the xsearch syntax instead of reading its bundled docs (~/.grok/docs/user-guide/ + README.md) / interviewing the model first; the docs gave the root cause directly (web_search=grok-4.20-multi-agent multi-agent model, killed by --no-subagents) [ctx: ref=feedback_interview_ai_read_docs]
 2026-06-17 | GURU-5070 | grok | grok xsearch returned no result [ctx: mode=xsearch stopReason=]
 2026-06-17 | GURU-5070 | grok | grok xsearch returned no result [ctx: mode=xsearch stopReason=]
 2026-06-17 | GURU-5070 | grok/xsearch | xsearch returned empty (stopReason finalization quirk) on Grok-co-work research; fell back to text-mode design + Claude synthesis [ctx: topic=grok-as-claude-cowork]
 2026-06-17 | GURU-5070 | grok | grok xsearch returned no result [ctx: mode=xsearch stopReason=]
 2026-06-17 | GURU-5070 | syncro/customer-comment | [friction] posted a customer-facing emailed comment to #32333 WITHOUT previewing for human review first; preview-before-send is mandatory for ALL outgoing comms [ctx: ref=CLAUDE.md syncro 'Before any POST: Always show the full payload and wait for confirmation']
 2026-06-17 | GURU-5070 | syncro/customer-comment | [correction] conflated Arizona Medical Transit (AMT-PC, Windows: Dell bloatware + misbehaving Syncro agent) cleanup into the Scileppi Mac (#32333) customer comment; Scileppi was a full-disk/Trash + Mail + Downloads-redesign job, no Dell/agent work [ctx: ticket=32333 mac=scileppi]