chore: add Ollama Tier 0 routing — delegate low-stakes work to local models

- Tier 0 (Ollama): summarize, classify, extract, draft, format — free/fast/private
- qwen3:14b for general tasks; codestral:22b for code suggestions
- Falls back to Haiku if Ollama unreachable or task needs agent tool use
- Bump rule extended: Ollama → Haiku on security/auth/migration/production
- Delegation pattern: direct Bash curl, not an agent spawn
- Per-task model guidance and review policy documented

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 18:55:50 -07:00
parent 492fbbf4c9
commit d37cc238d2


@@ -104,17 +104,20 @@ You are NOT an executor. You coordinate specialized agents and preserve your con
### Model Routing (Complexity-Based)
Before spawning an agent, pick a tier from `.claude/COMPLEXITY_ROUTING.md`:
Before spawning an agent, pick a tier:
| Tier | Model | When |
|------|-------|------|
| 1 | `haiku` | Lookup, format, summarize, doc — no code changes |
| 0 | **Ollama** (local) | Low-stakes: summarize, classify, extract, draft, format — no code changes, output reviewed before use |
| 1 | `haiku` | Ollama unavailable, or task needs tool use / file access an agent provides |
| 2 | (inherit) | Standard code, DB, tests, git — most work |
| 3 | `opus` | Architecture, security, ambiguous failures, production risk |
**Bump rule:** if the request involves `security`, `auth`, `credential`, `migration`, `production`, or `data loss` — bump one tier up.
**Tier 0 rule:** Always try Ollama first for low-stakes work. It's free, fast, and private. Use `qwen3:14b` for general tasks; `codestral:22b` for code suggestions. Fall back to Haiku only if Ollama is unreachable or the task requires agent tool use.
Pass `model: "haiku"` or `model: "opus"` explicitly. Omit for Tier 2 (inherits session model).
**Bump rule:** if the request involves `security`, `auth`, `credential`, `migration`, `production`, or `data loss` — bump one tier up (Ollama → Haiku, Haiku → inherit, inherit → opus).
Pass `model: "haiku"` or `model: "opus"` explicitly. Omit for Tier 2 (inherits session model). Tier 0 is a direct Bash call, not an agent spawn — see Ollama section below.
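As a rough illustration of how the tier table and bump rule combine, here is a minimal sketch. The names `TIERS`, `BUMP_KEYWORDS`, and `pick_tier` are hypothetical and not part of this repo. Capping at Tier 3 and using plain substring matching are assumptions: substring matching will also fire on words like "author", so a real implementation may want word boundaries.

```python
# Hypothetical sketch of the routing table above; names are illustrative only.
TIERS = ["ollama", "haiku", "inherit", "opus"]  # index = tier number (0..3)
BUMP_KEYWORDS = ("security", "auth", "credential", "migration", "production", "data loss")


def pick_tier(base_tier: int, request: str) -> str:
    """Return the model label for a request, bumping one tier on risky keywords."""
    text = request.lower()
    bumped = any(keyword in text for keyword in BUMP_KEYWORDS)
    # Assumption: a Tier 3 request stays at Tier 3, since there is nothing above opus.
    tier = min(base_tier + 1, len(TIERS) - 1) if bumped else base_tier
    return TIERS[tier]
```

For example, `pick_tier(0, "Summarize the auth middleware changes")` would route to Haiku rather than Ollama because the request mentions `auth`.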
### Coordination Flow
@@ -412,6 +415,52 @@ If it fails: verify Tailscale is connected (`tailscale status`), and that Mike's
- NOT exposed to LAN, VPN, or internet
- Binding: `OLLAMA_HOST=0.0.0.0:11434` (all interfaces, firewall restricts)
### Delegation pattern (Tier 0 — use instead of spawning a Haiku agent)
Determine the endpoint from identity.json, then call directly with the Bash tool:
```bash
# Resolve endpoint once per session
OLLAMA=$([ "$(jq -r .machine ~/.claude/identity.json 2>/dev/null)" = "DESKTOP-0O8A1RL" ] \
&& echo "http://localhost:11434" || echo "http://100.92.127.64:11434")
# General task (summarize, classify, extract, draft)
# jq builds the JSON body so quotes and newlines in $PROMPT are escaped correctly
curl -s "$OLLAMA/api/generate" \
-d "$(jq -n --arg prompt "$PROMPT" '{model: "qwen3:14b", prompt: $prompt, stream: false}')" \
| jq -r '.response // empty'
# Code suggestion (refactor ideas, docstrings — NOT production code)
# Same call, model: "codestral:22b"
```
**Practical shorthand** — for one-off inline prompts, use python3 to avoid escaping issues:
```bash
python3 -c "
import urllib.request, json, sys
url = 'http://localhost:11434/api/generate' # on other machines: http://100.92.127.64:11434/api/generate
body = json.dumps({'model':'qwen3:14b','prompt': sys.argv[1],'stream':False}).encode()
res = json.loads(urllib.request.urlopen(urllib.request.Request(url, body)).read())
print(res['response'])
" "Summarize these changes in one sentence: ..."
```
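The fallback to Haiku when Ollama is unreachable could be decided with a quick probe before delegating. This is a sketch under assumptions: `ollama_reachable` and `choose_executor` are hypothetical helper names, and the 2-second timeout is an arbitrary choice. It probes Ollama's `GET /api/tags` endpoint (the model list), which is cheap and does not load a model.

```python
import urllib.error
import urllib.request


def ollama_reachable(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if the server answers GET /api/tags with HTTP 200 in time."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as res:
            return res.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Refused connections, timeouts, and malformed URLs all count as unreachable.
        return False


def choose_executor(base_url: str, needs_tools: bool) -> str:
    """Pick 'ollama' for Tier 0 work, or 'haiku' when falling back."""
    # Tool use always goes to an agent; the probe only runs when it is needed.
    if needs_tools or not ollama_reachable(base_url):
        return "haiku"
    return "ollama"
```

Checking `needs_tools` first means a task that requires an agent never waits on the network probe.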
**When to use which model:**
| Task | Model |
|------|-------|
| Summarize logs, diffs, session notes | qwen3:14b |
| Classify bug type, severity, category | qwen3:14b |
| Extract structured data from text output | qwen3:14b |
| Draft commit message from diff | qwen3:14b |
| Suggest refactor for a function (review output) | codestral:22b |
| Docstring / comment generation | codestral:22b |
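The table above can also be expressed as a small lookup, which may be convenient if the model choice is scripted. The task keys and the `model_for` helper are hypothetical. Defaulting unknown tasks to `qwen3:14b` is an assumption based on it being described as the general-purpose model.

```python
# Hypothetical task labels mapped to the models named in the table above.
MODEL_BY_TASK = {
    "summarize": "qwen3:14b",
    "classify": "qwen3:14b",
    "extract": "qwen3:14b",
    "draft": "qwen3:14b",
    "refactor": "codestral:22b",
    "docstring": "codestral:22b",
}


def model_for(task: str) -> str:
    """Return the Ollama model for a task label, defaulting to the general model."""
    return MODEL_BY_TASK.get(task, "qwen3:14b")
```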
**Review policy:**
- Low-stakes output (summary, label, draft) — use directly, no review needed
- Code suggestions from codestral — always review before applying
- Never use Ollama for: auth decisions, credential handling, production migrations, security review
**Review policy:** Always review Critical/High impact Ollama outputs (auth, security, migrations, production). Trust Low impact (classification, formatting). Flag uncertainty to user.
### GrepAI (Semantic Code Search)