diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md
index 5c9cd71..7e84c72 100644
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@@ -292,9 +292,9 @@ Tier 0 — **Ollama is the documentation and classification engine.** Route pros
 |---------|----------|
 | Local Ollama present (GURU-5070, Howard-Home, etc.) | `http://localhost:11434` |
 | GURU-BEAST-ROG | `http://localhost:11434` (always-on RTX 4090; common fallback target) |
-| No local Ollama | per-machine fallback from `.claude/identity.json` `ollama_fallback` (e.g. Beast `http://100.101.122.4:11434`) |
+| No local Ollama (e.g. GURU-KALI) | endpoint+fallback = Beast `http://100.101.122.4:11434` |
 
-Models: `qwen3.6:latest` (strict-format: JSON, classification, structured rules, redaction, word-limited summaries), `qwen3:8b` (prose on DESKTOP-0O8A1RL — 86 tok/s, full 12 GB VRAM fit), `qwen3:14b` (prose everywhere else — ~66 tok/s), `codestral:22b` (code suggestions — always review). Fallback host is per-machine — set in `.claude/identity.json` `ollama_fallback`; GURU-BEAST-ROG (`100.101.122.4:11434`, always on, RTX 4090) is the usual choice. Full reference + per-machine routing: `.claude/OLLAMA.md`
+Models: `qwen3.6:latest` (strict-format: JSON, classification, structured rules, redaction, word-limited summaries), `qwen3:8b` (prose on DESKTOP-0O8A1RL — 86 tok/s, full 12 GB VRAM fit), `qwen3:14b` (prose everywhere else — ~66 tok/s), `codestral:22b` (code suggestions — always review). Per-machine config (endpoint, fallback, prose_model, python, platform, architecture) lives in `.claude/identity.json` `ollama`/`python` — populated by `.claude/scripts/migrate-identity.sh` (Phase 2, 2026-05-26). Scripts read `.ollama.endpoint` (no curl probe). GURU-BEAST-ROG (`100.101.122.4:11434`, always on, RTX 4090) is the usual fallback. Full reference + per-machine routing: `.claude/OLLAMA.md`
 
 ### GrepAI (Semantic Code Search)
 
diff --git a/.claude/OLLAMA.md b/.claude/OLLAMA.md
index d31c7d1..f838e6b 100644
--- a/.claude/OLLAMA.md
+++ b/.claude/OLLAMA.md
@@ -49,26 +49,20 @@ qwen3:14b and qwen3.6 are CPU-bottlenecked on this machine (split mode, PCIe ban
 
 ## Endpoints
 
-Auto-detect: any machine with a local Ollama on `127.0.0.1:11434` uses local. Otherwise it falls back to the **per-machine** host in `.claude/identity.json` `ollama_fallback` — each user/machine chooses its own (GURU-BEAST-ROG, the always-on RTX 4090, is the usual pick).
+Endpoint comes from `.claude/identity.json` `ollama` (Phase 2 centralization, 2026-05-26) — read the declared endpoint, no curl probe per call:
 
 ```bash
-# Universal resolver — the fallback is per-machine, read from identity.json (never hardcode)
-LOCAL="http://localhost:11434"
-FALLBACK=$(python3 -c "import json;print((json.load(open('.claude/identity.json')).get('ollama_fallback') or {}).get('endpoint',''))" 2>/dev/null)
-if curl -s -m 2 "$LOCAL/api/tags" >/dev/null 2>&1; then
-    OLLAMA="$LOCAL"
-elif [ -n "$FALLBACK" ]; then
-    OLLAMA="$FALLBACK" # e.g. GURU-KALI -> http://100.101.122.4:11434 (GURU-BEAST-ROG)
-else
-    OLLAMA="$LOCAL" # no fallback configured — local only
-fi
+OLLAMA=$(jq -r '.ollama.endpoint // .ollama.fallback // "http://localhost:11434"' .claude/identity.json)
+MODEL=$(jq -r '.ollama.prose_model // "qwen3:14b"' .claude/identity.json)
 ```
 
+`migrate-identity.sh` sets the `ollama` object per machine — `endpoint` (the one to use), `fallback` (backup, usually GURU-BEAST-ROG `100.101.122.4`), `prose_model` (qwen3:8b on 12 GB boxes, qwen3:14b elsewhere). Re-run `migrate-identity.sh` to re-detect after an Ollama/network change.
+
 Rationale:
-- **Local-Ollama machines** (e.g. Howard-Home, with the canonical model set) use local — faster, zero Tailscale hop; leave `ollama_fallback` unset/local.
-- **GURU-BEAST-ROG:** always-on RTX 4090; the usual `ollama_fallback` target for machines without local models.
-- **Machines without local Ollama** set `ollama_fallback` in identity.json to the host they want (commonly Beast over Tailscale).
-- **Fallback offline (rare):** graceful degradation — local users continue; remote users get a clean timeout.
+- **Local-Ollama machines** (e.g. Howard-Home, GURU-5070) get `endpoint=localhost` at migration — faster, zero Tailscale hop.
+- **GURU-BEAST-ROG:** always-on RTX 4090; the usual `fallback`, and many machines' `endpoint`.
+- **Machines without local Ollama** (e.g. GURU-KALI) get `endpoint=fallback=Beast`.
+- **No per-call probe:** the declared endpoint is trusted; re-run migrate-identity.sh if the Ollama/network topology changes.
 
 Manual override (for testing or explicit preference): set `OLLAMA=http://100.101.122.4:11434` before the call.
 
diff --git a/.claude/memory/feedback_ollama_tier0_routing.md b/.claude/memory/feedback_ollama_tier0_routing.md
index 2504251..1cd23a6 100644
--- a/.claude/memory/feedback_ollama_tier0_routing.md
+++ b/.claude/memory/feedback_ollama_tier0_routing.md
@@ -15,19 +15,12 @@ Route Tier-0 tasks (summaries, classifications, drafts, extractions) through Oll
 - Suggesting refactors / generating docstrings → codestral:22b (then review)
 - NEVER for: auth decisions, credential handling, production migrations, security review, citation work, production-change scripts
 
-**Endpoint resolution — the remote fallback is a PER-MACHINE choice in `.claude/identity.json` `ollama_fallback`, never hardcoded:**
+**Endpoint resolution — machine config is centralized in `.claude/identity.json` `ollama` (Phase 2, 2026-05-26). Read the declared endpoint; no curl probe per call:**
 ```bash
-LOCAL="http://localhost:11434"
-FALLBACK=$(python3 -c "import json;print((json.load(open('.claude/identity.json')).get('ollama_fallback') or {}).get('endpoint',''))" 2>/dev/null)
-if curl -s -m 2 "$LOCAL/api/tags" >/dev/null 2>&1; then
-    OLLAMA="$LOCAL" # local Ollama is up — use it
-elif [ -n "$FALLBACK" ]; then
-    OLLAMA="$FALLBACK" # per-machine fallback from identity.json
-else
-    OLLAMA="$LOCAL" # no fallback configured — local only
-fi
+OLLAMA=$(jq -r '.ollama.endpoint // .ollama.fallback // "http://localhost:11434"' .claude/identity.json)
+MODEL=$(jq -r '.ollama.prose_model // "qwen3:14b"' .claude/identity.json)
 ```
-Each machine sets its own `ollama_fallback` in identity.json, e.g. `{"host":"GURU-BEAST-ROG","endpoint":"http://100.101.122.4:11434"}`. GURU-BEAST-ROG (RTX 4090, always on) is the usual choice; GURU-KALI is set to it (confirmed 2026-05-26). A machine with local models loaded (e.g. Howard-Home: qwen3:14b, codestral:22b, nomic-embed-text, qwen3-coder:30b) can leave `ollama_fallback` unset/local — zero Tailscale hop. Do NOT bake a fallback IP into shared files (memory, OLLAMA.md, CLAUDE.md) — read it from identity.json.
+`migrate-identity.sh` populates the `ollama` object per machine: `endpoint` (the one to use — localhost if a local Ollama was present at migration, else the Tailscale host), `fallback` (backup, usually GURU-BEAST-ROG `http://100.101.122.4:11434`, always-on RTX 4090), and `prose_model` (qwen3:8b on 12 GB boxes like GURU-5070, qwen3:14b elsewhere). GURU-KALI: endpoint+fallback = Beast (no local Ollama). Re-run `migrate-identity.sh` to re-detect after an Ollama/network change. Don't hardcode endpoints/IPs in shared files — read identity.json. (Superseded the interim `ollama_fallback` field from earlier 2026-05-26.)
 
 **Call pattern for qwen3 — use `/api/chat` with `think:false`**, NOT `/api/generate`. qwen3 on generate endpoint dumps reasoning into internal thinking tokens and returns empty `response` field. Chat endpoint with `think:false` returns clean content in `message.content`: