chore: add Ollama Tier 0 routing — delegate low-stakes work to local models

- Tier 0 (Ollama): summarize, classify, extract, draft, format (free/fast/private)
- qwen3:14b for general tasks; codestral:22b for code suggestions
- Falls back to Haiku if Ollama unreachable or task needs agent tool use
- Bump rule extended: Ollama → Haiku on security/auth/migration/production
- Delegation pattern: direct Bash curl, not an agent spawn
- Per-task model guidance and review policy documented

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 18:55:50 -07:00
parent 492fbbf4c9
commit d37cc238d2
1 changed file with 53 additions and 4 deletions
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@@ -104,17 +104,20 @@ You are NOT an executor. You coordinate specialized agents and preserve your con

 ### Model Routing (Complexity-Based)

-Before spawning an agent, pick a tier from `.claude/COMPLEXITY_ROUTING.md`:
+Before spawning an agent, pick a tier:

 | Tier | Model | When |
 |------|-------|------|
-| 1 | `haiku` | Lookup, format, summarize, doc — no code changes |
+| 0 | **Ollama** (local) | Low-stakes: summarize, classify, extract, draft, format — no code changes, output reviewed before use |
+| 1 | `haiku` | Ollama unavailable, or task needs tool use / file access an agent provides |
 | 2 | (inherit) | Standard code, DB, tests, git — most work |
 | 3 | `opus` | Architecture, security, ambiguous failures, production risk |

-**Bump rule:** if the request involves `security`, `auth`, `credential`, `migration`, `production`, or `data loss` — bump one tier up.
+**Tier 0 rule:** Always try Ollama first for low-stakes work. It's free, fast, and private. Use `qwen3:14b` for general tasks; `codestral:22b` for code suggestions. Fall back to Haiku only if Ollama is unreachable or the task requires agent tool use.

-Pass `model: "haiku"` or `model: "opus"` explicitly. Omit for Tier 2 (inherits session model).
+**Bump rule:** if the request involves `security`, `auth`, `credential`, `migration`, `production`, or `data loss` — bump one tier up (Ollama → Haiku, Haiku → inherit, inherit → opus).
+
+Pass `model: "haiku"` or `model: "opus"` explicitly. Omit for Tier 2 (inherits session model). Tier 0 is a direct Bash call, not an agent spawn — see Ollama section below.
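
The tier table and bump rule above can be read as a small lookup plus one adjustment. The following is a minimal sketch under stated assumptions, not the routing code itself: the names `TIER_MODELS`, `bump_tier`, and `model_for` are hypothetical, and plain substring matching stands in for whatever keyword detection is actually used (it will also match words such as "author").

```python
from typing import Optional

# Tiers from the routing table: 0 = Ollama (local), 1 = haiku, 2 = inherit, 3 = opus.
# None means "omit the model argument" so the agent inherits the session model.
TIER_MODELS = {0: "ollama", 1: "haiku", 2: None, 3: "opus"}

# Keywords listed in the bump rule.
BUMP_KEYWORDS = ("security", "auth", "credential", "migration", "production", "data loss")


def bump_tier(base_tier: int, request: str) -> int:
    """Raise the tier by one if the request mentions a bump keyword, capped at Tier 3."""
    text = request.lower()
    # Assumption: naive substring matching; a real implementation may be stricter.
    if any(keyword in text for keyword in BUMP_KEYWORDS):
        return min(base_tier + 1, 3)
    return base_tier


def model_for(base_tier: int, request: str) -> Optional[str]:
    """Return the model label for the (possibly bumped) tier."""
    return TIER_MODELS[bump_tier(base_tier, request)]
```

For example, `model_for(0, "Summarize the auth middleware changes")` lands on `haiku` rather than Ollama, which matches the "Ollama → Haiku" step of the bump rule.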

 ### Coordination Flow

@@ -412,6 +415,52 @@ If it fails: verify Tailscale is connected (`tailscale status`), and that Mike's
 - NOT exposed to LAN, VPN, or internet
 - Binding: `OLLAMA_HOST=0.0.0.0:11434` (all interfaces, firewall restricts)

+### Delegation pattern (Tier 0 — use instead of spawning a Haiku agent)
+
+Determine the endpoint from identity.json, then call directly with the Bash tool:
+
+```bash
+# Resolve endpoint once per session
+OLLAMA=$([ "$(jq -r .machine ~/.claude/identity.json 2>/dev/null)" = "DESKTOP-0O8A1RL" ] \
+  && echo "http://localhost:11434" || echo "http://100.92.127.64:11434")
+
+# General task (summarize, classify, extract, draft)
+# Build the JSON body with jq so quotes and newlines in $PROMPT are escaped correctly
+jq -n --arg prompt "$PROMPT" '{model: "qwen3:14b", prompt: $prompt, stream: false}' \
+  | curl -s "$OLLAMA/api/generate" -d @- \
+  | jq -r '.response // ""'
+
+# Code suggestion (refactor ideas, docstrings — NOT production code)
+# Same call, model: "codestral:22b"
+```
+
+**Practical shorthand** — for one-off inline prompts, use python3 to avoid escaping issues:
+
+```bash
+python3 -c "
+import urllib.request, json, sys
+url = 'http://localhost:11434/api/generate'  # or 100.92.127.64
+body = json.dumps({'model':'qwen3:14b','prompt': sys.argv[1],'stream':False}).encode()
+res = json.loads(urllib.request.urlopen(urllib.request.Request(url, body)).read())
+print(res['response'])
+" "Summarize these changes in one sentence: ..."
+```
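
The shorthand above hardcodes the endpoint and raises if Ollama is down. A slightly fuller sketch might resolve the endpoint from the machine name and return `None` on failure, so the caller can fall back to Haiku as the Tier 0 rule describes. The function names, the 30-second timeout, and the `Content-Type` header are assumptions for illustration; only the URLs, the machine name, and the request fields come from the snippets above.

```python
import json
import urllib.request
from typing import Optional

LOCAL_MACHINE = "DESKTOP-0O8A1RL"             # machine name used in the Bash snippet
LOCAL_URL = "http://localhost:11434"
TAILSCALE_URL = "http://100.92.127.64:11434"


def resolve_endpoint(machine: Optional[str]) -> str:
    """Use localhost on the Ollama host itself, the Tailscale address elsewhere."""
    return LOCAL_URL if machine == LOCAL_MACHINE else TAILSCALE_URL


def build_request(endpoint: str, prompt: str, model: str = "qwen3:14b") -> urllib.request.Request:
    """Build the POST request; json.dumps handles all quoting and newlines."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        f"{endpoint}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )


def generate(endpoint: str, prompt: str, model: str = "qwen3:14b",
             timeout: float = 30.0) -> Optional[str]:
    """Return the response text, or None if Ollama is unreachable or replies with bad JSON."""
    try:
        with urllib.request.urlopen(build_request(endpoint, prompt, model), timeout=timeout) as res:
            return json.load(res).get("response", "")
    except (OSError, ValueError):
        # URLError and timeouts are OSError subclasses; JSON decode errors are ValueError.
        return None  # caller falls back to Haiku
```

Returning `None` instead of raising keeps the fallback decision in one place: a caller can try `generate(...)` and spawn a Haiku agent only when the result is `None`.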
+
+**When to use which model:**
+
+| Task | Model |
+|------|-------|
+| Summarize logs, diffs, session notes | qwen3:14b |
+| Classify bug type, severity, category | qwen3:14b |
+| Extract structured data from text output | qwen3:14b |
+| Draft commit message from diff | qwen3:14b |
+| Suggest refactor for a function (review output) | codestral:22b |
+| Docstring / comment generation | codestral:22b |
+
+**Review policy:**
+- Low-stakes output (summary, label, draft) — use directly, no review needed
+- Code suggestions from codestral — always review before applying
+- Never use Ollama for: auth decisions, credential handling, production migrations, security review
+
 **Review policy:** Always review Critical/High impact Ollama outputs (auth, security, migrations, production). Trust Low impact (classification, formatting). Flag uncertainty to user.
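
The model table and the review policy can be combined into a single lookup. This is a sketch under stated assumptions: the task labels and the `plan` function are hypothetical, and where the two review-policy statements differ it follows the stricter bullet list (forbidden tasks are never sent to Ollama). Treating unknown tasks as "review" is also an assumption, chosen to err on the cautious side.

```python
from enum import Enum
from typing import Optional, Tuple


class Review(Enum):
    DIRECT = "use directly"
    REVIEW = "review before applying"
    FORBIDDEN = "do not send to Ollama"


# Hypothetical task labels mirroring the rows of the model table and the policy bullets.
GENERAL_TASKS = {"summarize", "classify", "extract", "draft_commit_message"}
CODE_TASKS = {"refactor_suggestion", "docstring"}
FORBIDDEN_TASKS = {"auth_decision", "credential_handling",
                   "production_migration", "security_review"}


def plan(task: str) -> Tuple[Optional[str], Review]:
    """Map a task label to (Ollama model or None, review requirement)."""
    if task in FORBIDDEN_TASKS:
        return None, Review.FORBIDDEN
    if task in CODE_TASKS:
        return "codestral:22b", Review.REVIEW
    if task in GENERAL_TASKS:
        return "qwen3:14b", Review.DIRECT
    # Assumption: anything unrecognized goes to the general model but gets reviewed.
    return "qwen3:14b", Review.REVIEW
```

A caller would check the second element first: `Review.FORBIDDEN` means the task should go to a higher tier instead of Ollama.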

 ### GrepAI (Semantic Code Search)