chore: add Ollama Tier 0 routing — delegate low-stakes work to local models
- Tier 0 (Ollama): summarize, classify, extract, draft, format; free/fast/private
- qwen3:14b for general tasks; codestral:22b for code suggestions
- Falls back to Haiku if Ollama unreachable or task needs agent tool use
- Bump rule extended: Ollama → Haiku on security/auth/migration/production
- Delegation pattern: direct Bash curl, not an agent spawn
- Per-task model guidance and review policy documented

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -104,17 +104,20 @@ You are NOT an executor. You coordinate specialized agents and preserve your con
 
 ### Model Routing (Complexity-Based)
 
-Before spawning an agent, pick a tier from `.claude/COMPLEXITY_ROUTING.md`:
+Before spawning an agent, pick a tier:
 
 | Tier | Model | When |
 |------|-------|------|
-| 1 | `haiku` | Lookup, format, summarize, doc; no code changes |
+| 0 | **Ollama** (local) | Low-stakes: summarize, classify, extract, draft, format; no code changes, output reviewed before use |
+| 1 | `haiku` | Ollama unavailable, or task needs tool use / file access an agent provides |
 | 2 | (inherit) | Standard code, DB, tests, git; most work |
 | 3 | `opus` | Architecture, security, ambiguous failures, production risk |
 
-**Bump rule:** if the request involves `security`, `auth`, `credential`, `migration`, `production`, or `data loss`, bump one tier up.
+**Tier 0 rule:** Always try Ollama first for low-stakes work. It's free, fast, and private. Use `qwen3:14b` for general tasks; `codestral:22b` for code suggestions. Fall back to Haiku only if Ollama is unreachable or the task requires agent tool use.
 
-Pass `model: "haiku"` or `model: "opus"` explicitly. Omit for Tier 2 (inherits session model).
+**Bump rule:** if the request involves `security`, `auth`, `credential`, `migration`, `production`, or `data loss`, bump one tier up (Ollama → Haiku, Haiku → inherit, inherit → opus).
+
+Pass `model: "haiku"` or `model: "opus"` explicitly. Omit for Tier 2 (inherits session model). Tier 0 is a direct Bash call, not an agent spawn; see Ollama section below.
 
 ### Coordination Flow
 
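The tier table and bump rule in this hunk can be read as a small decision function. The sketch below is only an illustration of that reading, not code from the repository: the names `TIER_MODELS`, `BUMP_KEYWORDS`, `bump_tier`, and `pick_tier` are hypothetical, and the plain substring match on keywords is an assumption (it would also match `auth` inside `author`, for example).

```python
# Tier numbers follow the routing table: 0 = Ollama, 1 = haiku, 2 = inherit, 3 = opus.
# None means "omit the model argument and inherit the session model".
TIER_MODELS = {0: "ollama", 1: "haiku", 2: None, 3: "opus"}

# Keywords listed in the bump rule.
BUMP_KEYWORDS = ("security", "auth", "credential", "migration", "production", "data loss")


def bump_tier(base_tier: int, request: str) -> int:
    """Raise the tier by one (capped at 3) if the request mentions a bump keyword."""
    text = request.lower()
    if any(keyword in text for keyword in BUMP_KEYWORDS):
        return min(base_tier + 1, 3)
    return base_tier


def pick_tier(base_tier: int, request: str,
              ollama_available: bool = True, needs_tools: bool = False) -> int:
    """Apply the bump rule, then the Tier 0 fallback to Haiku."""
    tier = bump_tier(base_tier, request)
    # Tier 0 only applies when Ollama is reachable and no agent tool use is needed.
    if tier == 0 and (not ollama_available or needs_tools):
        tier = 1
    return tier
```

Under these assumptions, `pick_tier(0, "Summarize the auth changes")` moves a would-be Ollama task to Haiku, which matches the "Ollama → Haiku" step of the extended bump rule.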
@@ -412,6 +415,52 @@ If it fails: verify Tailscale is connected (`tailscale status`), and that Mike's
 - NOT exposed to LAN, VPN, or internet
 - Binding: `OLLAMA_HOST=0.0.0.0:11434` (all interfaces, firewall restricts)
+
+### Delegation pattern (Tier 0: use instead of spawning a Haiku agent)
+
+Determine the endpoint from identity.json, then call directly with the Bash tool:
+
+```bash
+# Resolve endpoint once per session
+OLLAMA=$([ "$(jq -r .machine ~/.claude/identity.json 2>/dev/null)" = "DESKTOP-0O8A1RL" ] \
+  && echo "http://localhost:11434" || echo "http://100.92.127.64:11434")
+
+# General task (summarize, classify, extract, draft); jq builds the JSON body so quotes and newlines in $PROMPT are escaped
+curl -s "$OLLAMA/api/generate" \
+  -d "$(jq -n --arg prompt "$PROMPT" '{model: "qwen3:14b", prompt: $prompt, stream: false}')" \
+  | python3 -c "import sys,json; print(json.load(sys.stdin).get('response',''))"
+
+# Code suggestion (refactor ideas, docstrings; NOT production code)
+# Same call, model: "codestral:22b"
+```
+
+**Practical shorthand:** for one-off inline prompts, use python3 to avoid escaping issues:
+
+```bash
+python3 -c "
+import urllib.request, json, sys
+url = 'http://localhost:11434/api/generate'  # or 100.92.127.64
+body = json.dumps({'model':'qwen3:14b','prompt': sys.argv[1],'stream':False}).encode()
+res = json.loads(urllib.request.urlopen(urllib.request.Request(url, body)).read())
+print(res['response'])
+" "Summarize these changes in one sentence: ..."
+```
+
+**When to use which model:**
+
+| Task | Model |
+|------|-------|
+| Summarize logs, diffs, session notes | qwen3:14b |
+| Classify bug type, severity, category | qwen3:14b |
+| Extract structured data from text output | qwen3:14b |
+| Draft commit message from diff | qwen3:14b |
+| Suggest refactor for a function (review output) | codestral:22b |
+| Docstring / comment generation | codestral:22b |
+
+**Review policy:**
+- Low-stakes output (summary, label, draft): use directly, no review needed
+- Code suggestions from codestral: always review before applying
+- Never use Ollama for: auth decisions, credential handling, production migrations, security review
 
-**Review policy:** Always review Critical/High impact Ollama outputs (auth, security, migrations, production). Trust Low impact (classification, formatting). Flag uncertainty to user.
+
 ### GrepAI (Semantic Code Search)
 
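The delegation pattern in this hunk (resolve the endpoint, POST to `/api/generate`, fall back to Haiku when Ollama is unreachable) could also be written as a small Python helper. This is a minimal sketch under stated assumptions, not the commit's implementation: the function names, the `opener` parameter (added so the call can be exercised without a network), and the 60-second timeout are all hypothetical, while the URLs, machine name, and model name are copied from the diff.

```python
import json
import urllib.request
from typing import Callable, Optional

LOCAL_URL = "http://localhost:11434"       # used on the machine that hosts Ollama
REMOTE_URL = "http://100.92.127.64:11434"  # used from every other machine
HOST_MACHINE = "DESKTOP-0O8A1RL"


def resolve_endpoint(identity: dict) -> str:
    """Pick the base URL from the parsed contents of identity.json."""
    return LOCAL_URL if identity.get("machine") == HOST_MACHINE else REMOTE_URL


def build_body(prompt: str, model: str = "qwen3:14b") -> bytes:
    """Serialize the request with json.dumps so quotes and newlines are escaped."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")


def ask_ollama(prompt: str, base_url: str, model: str = "qwen3:14b",
               timeout: float = 60.0,
               opener: Callable = urllib.request.urlopen) -> Optional[str]:
    """Return the model's response text, or None so the caller can fall back to Haiku."""
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=build_body(prompt, model),
        headers={"Content-Type": "application/json"},
    )
    try:
        with opener(request, timeout=timeout) as response:
            return json.loads(response.read()).get("response")
    except (OSError, ValueError):
        # OSError covers URLError and timeouts; ValueError covers malformed JSON.
        return None
```

A caller would treat `None` as the signal to spawn a Haiku agent instead, in line with the Tier 0 fallback rule.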