diff --git a/.claude/scripts/migrate-identity.sh b/.claude/scripts/migrate-identity.sh
index 64396fd..9121e50 100755
--- a/.claude/scripts/migrate-identity.sh
+++ b/.claude/scripts/migrate-identity.sh
@@ -164,7 +164,7 @@ if '$GEMINI_INSTALLED' == 'true':
     gm['installed'] = True
     gm['binary'] = r'$GEMINI_BIN'
     gm.setdefault('auth', 'oauth')
-    gm.setdefault('capabilities', ['text', 'verify', 'review'])
+    gm['capabilities'] = ['text', 'verify', 'review', 'image-analyze', 'search']
 else:
     gm['installed'] = False
 data['gemini'] = gm
diff --git a/.claude/skills/agy/SKILL.md b/.claude/skills/agy/SKILL.md
index 46ae34b..caaf31c 100644
--- a/.claude/skills/agy/SKILL.md
+++ b/.claude/skills/agy/SKILL.md
@@ -18,7 +18,8 @@ global, v0.45.1) for a genuinely independent, different-vendor second model.
 AGY is the sibling of [`grok`](../grok/SKILL.md): both are second-opinion / review
 routers. Use whichever you want a second model from (or both, to triangulate).
 Verified working on this machine (2026-06-05): text, verify, review (single
-file / file set / git diff).
+file / file set / git diff), image-analyze (vision input), search (live Google
+web search). All KEYLESS — they work on Google OAuth, no API key.
 
 **Auth:** Gemini uses **Google login (OAuth)** — **no API key**. Creds live at
 `~/.gemini/oauth_creds.json`. If calls fail with an auth error, run `gemini`
@@ -37,6 +38,8 @@ bash "$CLAUDETOOLS_ROOT/.claude/skills/agy/scripts/ask-gemini.sh" ...
 | `review` | `ask-gemini.sh review [""]` | Gemini reads the file itself (its `read_file` tool, read-only `plan` mode) and reviews it. Accepts absolute or repo-relative paths, and paths with spaces. Works even on gitignored files. |
 | `review-files` | `ask-gemini.sh review-files [-i ""] [f2 …]` | Review a **set** of files together (cross-file consistency, multi-file change). Paths absolute or repo-relative; spaces OK. No code passed as a shell arg. |
 | `review-diff` | `ask-gemini.sh review-diff [-C ] [-i ""] [-- ]` | Review a **git diff** (`git diff ` from ``; default repo root, use `-C` for a submodule e.g. `-C projects/msp-tools/guru-rmm`). Diff goes via the prompt file; Gemini can `read_file` changed files for full context. |
+| `image-analyze` | `ask-gemini.sh image-analyze [""]` | **Vision** — Gemini `read_file`s the image and describes/answers about it. Pins the **pro vision model** (the default flash-lite router hallucinates image content). Path absolute or repo-relative; spaces OK. KEYLESS (works on OAuth). |
+| `search` | `ask-gemini.sh search ""` (or `search --prompt-file `) | **Live Google web search** (sibling of `grok xsearch`) — Gemini uses its `google_web_search` tool and returns the answer **with source URLs**. KEYLESS (works on OAuth). |
 | `raw` | `ask-gemini.sh raw ` | Escape hatch — passes args straight to `gemini`. |
 
 The script runs Gemini headless with `-o json`, extracts the answer from
@@ -50,6 +53,17 @@ never corrupts the parse.
 - `verify` / `review*` pin a **strong** model — `gemini-3.1-pro-preview` (verified
   available on this account 2026-06-05; the CLI's own pro tier).
 - Override either with `GEMINI_MODEL=` (e.g. `GEMINI_MODEL=gemini-2.5-pro`).
+- `image-analyze` and `search` also pin the strong model (`GEMINI_MODEL` still honored).
+
+### Multimodal: image INPUT works, image GENERATION does not
+
+- **Image INPUT (vision) works on OAuth** — `image-analyze` reads an image with the
+  pinned **pro vision model** and describes it correctly. The default flash-lite
+  router HALLUCINATES image content, which is why the pro model is pinned.
+- **Image GENERATION (nano-banana) does NOT work on OAuth** — it needs a Google AI
+  Studio `NANOBANANA_API_KEY` plus the `nanobanana` extension. **Deferred** for now.
+  Image/video **generation** stays [GROK](../grok/SKILL.md)'s lane (`grok image` /
+  `grok video`); AGY's multimodal support is read/analyze only.
 
 ## Machine availability (fleet)
 
@@ -60,7 +74,7 @@ not. Availability is gated by `identity.json` (per-machine, gitignored):
 "gemini": { "installed": true,
             "binary": "C:/Users/guru/AppData/Roaming/npm/gemini",
             "auth": "oauth", "is_fleet_host": true,
-            "capabilities": ["text","verify","review"] }
+            "capabilities": ["text","verify","review","image-analyze","search"] }
 ```
 
 - If `gemini.installed` is `false` (or the block is absent), `ask-gemini.sh` exits
@@ -96,8 +110,9 @@ run both and compare — disagreement between them is a strong signal to slow do
 - Editing this repo's code → Claude's own agents own the codebase work. Gemini's
   `review*` modes are read-only (`--approval-mode plan`) by design; do not give
   Gemini write access to this repo.
-- Image / video generation → that's GROK's lane (`grok image` / `grok video`),
-  not Gemini here.
+- Image / video **generation** → that's GROK's lane (`grok image` / `grok video`),
+  not Gemini here (nano-banana needs an API key — deferred). Gemini CAN analyze an
+  image you give it (`image-analyze`, vision input on OAuth).
 - **Never** delegate unsupervised destructive / production actions to Gemini.
   Always review Gemini output before acting on it — like Grok, it can over-claim.
 
diff --git a/.claude/skills/agy/scripts/ask-gemini.sh b/.claude/skills/agy/scripts/ask-gemini.sh
index 6f383f3..5c1f039 100644
--- a/.claude/skills/agy/scripts/ask-gemini.sh
+++ b/.claude/skills/agy/scripts/ask-gemini.sh
@@ -32,6 +32,8 @@
 #   ask-gemini.sh review [instructions]  # gemini reads + reviews one file
 #   ask-gemini.sh review-files [-i "instr"] [f2 ...]  # review a SET of files together
 #   ask-gemini.sh review-diff [-C ] [-i "instr"] [-- ]
+#   ask-gemini.sh image-analyze ["question"]  # vision: read_file image + describe (PRO model)
+#   ask-gemini.sh search ""  # Google-grounded live web search + sources
 #   ask-gemini.sh raw  # escape hatch
 #
 # Exit: 0 ok, 1 no result, 2 usage, 3 not installed here, 127 gemini/python not found.
@@ -97,7 +99,7 @@ fi
 STRONG_MODEL="${GEMINI_MODEL:-gemini-3.1-pro-preview}"
 
 MODE="${1:-}"; shift 2>/dev/null || true
-[ -z "$MODE" ] && { echo "usage: $SELF {text|verify|review|review-files|review-diff|raw} ..." >&2; exit 2; }
+[ -z "$MODE" ] && { echo "usage: $SELF {text|verify|review|review-files|review-diff|image-analyze|search|raw} ..." >&2; exit 2; }
 
 TMP="$(mktemp -d)"; trap 'rm -rf "$TMP"' EXIT
 PF="$TMP/prompt.txt"; OUT="$TMP/out.txt"; ERR="$TMP/err.txt"
@@ -123,15 +125,62 @@ run_gemini() {
 
 # extract .response from the JSON object starting at the first '{' in $OUT.
 # Parsed via stdin so Windows python never resolves a git-bash (/c/...) path.
-gresponse() { "$PY" -c "import json,sys
+#
+# Some pinned-pro tool-using turns (notably image-analyze) leak the model's
+# internal reasoning stream into .response: a stray token + a 'thought' marker
+# followed by 'CRITICAL INSTRUCTION N:' lines, then the real answer. We strip
+# that preamble ONLY when the signature is clearly present, so clean responses
+# (text/verify/review/search) pass through byte-for-byte unchanged.
+gresponse() { "$PY" -c "import json,sys,re,os
 raw=sys.stdin.read()
 i=raw.find('{')
 if i < 0:
     print(''); sys.exit(0)
 try:
-    print(json.loads(raw[i:]).get('response','') or '')
+    r=json.loads(raw[i:]).get('response','') or ''
 except Exception:
-    print('')" < "$OUT"; }
+    print(''); sys.exit(0)
+head=r[:40].lower()
+leak=('thought' in head) or ('critical instruction' in r.lower()[:600])
+if leak:
+    lines=r.split('\n')
+    keep=[]; dropping=True
+    for ln in lines:
+        s=ln.strip()
+        low=s.lower()
+        if dropping and (
+            low.endswith('thought') or low.startswith('critical instruction')
+            or low.startswith('thought:') or low=='' ):
+            continue
+        dropping=False
+        keep.append(ln)
+    cleaned='\n'.join(keep).strip()
+    r=cleaned if cleaned else r.strip()
+# AGY_CLEAN: aggressive prefix scrub for tool-using turns (image-analyze), which
+# can fuse a stray stream/tool token onto the front of the answer (e.g. '.',
+# '.94>', 'uem_image_0_0_png}'). Off by default so text/verify/review/search are
+# byte-exact. We only remove a junk run that ends in a stream delimiter (} > :)
+# or a lone leading punctuation char, immediately before the first real sentence.
+if os.environ.get('AGY_CLEAN') == '1' and r:
+    # The pro-preview tool loop sometimes prepends a numbered/markdown reasoning
+    # block before the actual answer. If a clear answer pivot follows such a
+    # preamble, keep from the pivot onward (the user-facing answer).
+    if re.search(r'(?im)^\s*\d+[.)]\s', r) or 'thought' in r[:60].lower():
+        pivs=list(re.finditer(r'(?i)(Based on the image\b|\*\*Answer:?\*\*|The image (?:contains|shows|displays)\b)', r))
+        if pivs:
+            r=r[pivs[-1].start():]
+    m=re.match(r'^[^\n]{0,40}?(?:\.png\)|\.jpe?g\)|[}>:)])\s*([\"A-Z].*)$', r, re.S)
+    if m and m.group(1):
+        r=m.group(1)
+    else:
+        # a short leading junk run (ASCII punctuation/digits or non-Latin stream
+        # tokens) before a capitalized/quoted sentence start. Bounded length so we
+        # never eat a real lowercase sentence or real prose.
+        m=re.match(r'^(?:[^A-Za-z\"]|[^\x00-\x7f]){1,8}([A-Z\"].*)$', r, re.S)
+        if m and m.group(1):
+            r=m.group(1)
+    r=r.strip()
+print(r)" < "$OUT"; }
 
 # detect an auth failure in stderr (so we can give a precise remediation hint)
 auth_failed() { grep -qiE 'oauth|unauthor|authenticat|login|credential|invalid_grant|401' "$ERR" 2>/dev/null; }
@@ -263,10 +312,49 @@ case "$MODE" in
     emit_or_fail
     ;;
 
+  image-analyze|image|vision)
+    # Independent second-model VISION. The default flash-lite router hallucinates
+    # image content, so we PIN the pro vision model (STRONG_MODEL) and run with
+    # yolo approval so read_file can execute. The image is copied into an included
+    # temp dir (like the review modes) and handed to Gemini by absolute winpath.
+    [ -z "${1:-}" ] && { echo "usage: $SELF image-analyze [\"question\"]" >&2; exit 2; }
+    target="$1"
+    question="${2:-Describe exactly what is in this image.}"
+    if [ -f "$target" ]; then resolved="$target"
+    elif [ -f "$REPO_ROOT/$target" ]; then resolved="$REPO_ROOT/$target"
+    else echo "[$SELF] image not found: $target" >&2; exit 2; fi
+    prep_includes
+    base="$(basename "$resolved")"
+    cp -f "$resolved" "$INCLUDE_DIR/$base"
+    img_win="$(winpath "$INCLUDE_DIR/$base")"
+    inc_win="$(winpath "$INCLUDE_DIR")"
+    # Image path goes in via %s (never as a printf format string).
+    printf 'Use your read_file tool to read the image at this absolute path, then describe exactly what you see. Report only what is actually present in the image; do not guess or invent content. Then stop. Do not modify anything.\nImage path: %s\n\nQuestion: %s' "$img_win" "$question" > "$PF"
+    run_gemini 240 -m "$STRONG_MODEL" --approval-mode yolo --include-directories "$inc_win"
+    AGY_CLEAN=1 emit_or_fail
+    ;;
+
+  search|websearch)
+    # Google-grounded LIVE web search (mirrors grok xsearch). Gemini's
+    # google_web_search tool works on OAuth; run with yolo so the tool can fire.
+    # Query goes via the prompt file so long queries don't hit shell-quote limits.
+    SRC=""
+    if [ "${1:-}" = "--prompt-file" ]; then
+      [ -f "${2:-}" ] || { echo "[$SELF] prompt file not found: ${2:-}" >&2; exit 2; }
+      SRC="$(cat "$2")"
+    else
+      SRC="${1:-}"
+    fi
+    [ -z "$SRC" ] && { echo "usage: $SELF search \"\" | $SELF search --prompt-file " >&2; exit 2; }
+    printf 'Use your google_web_search tool to find current, live information answering the following, then stop. Answer concisely and ALWAYS include the source URLs you used (a Sources list of full URLs). Do not fabricate URLs.\n\nQuery: %s' "$SRC" > "$PF"
+    run_gemini 180 -m "$STRONG_MODEL" --approval-mode yolo
+    emit_or_fail
+    ;;
+
   raw)
     "$GEMINI" "$@"
     ;;
 
   *)
-    echo "[$SELF] unknown mode '$MODE' (use text|verify|review|review-files|review-diff|raw)" >&2; exit 2 ;;
+    echo "[$SELF] unknown mode '$MODE' (use text|verify|review|review-files|review-diff|image-analyze|search|raw)" >&2; exit 2 ;;
 esac