2.6 KiB
name, description, type
| name | description | type |
|---|---|---|
| Route Tier-0 tasks through Ollama (Mike's ClaudeTools design intent) | Drafts, summaries, classifications, extractions MUST go through Ollama per Mike's tiered-model architecture. Don't default to Claude inference for low-stakes text generation. | feedback |
Route Tier-0 tasks (summaries, classifications, drafts, extractions) through Ollama. Not optional — this is how Mike designed ClaudeTools to work.
Why: Mike built the tiered-model architecture (CLAUDE.md Model Routing section + .claude/OLLAMA.md) deliberately. Tier 0 is free + fast + private. Defaulting to Claude for every drafting task burns context window and Anthropic tokens on work that qwen3:14b does fine.
How to apply:
- Drafting emails, session-log paragraphs, status-update sentences, commit-message first-drafts → qwen3:14b
- Summarizing long output (Graph JSON, PowerShell transcripts, log tails) → qwen3:14b
- Extracting structured data from text → qwen3:14b
- Suggesting refactors / generating docstrings → codestral:22b (then review)
- NEVER for: auth decisions, credential handling, production migrations, security review, citation work, production-change scripts
Endpoint resolution (updated 2026-04-22 in .claude/OLLAMA.md):
if curl -s -m 2 http://localhost:11434/api/tags >/dev/null 2>&1; then
OLLAMA="http://localhost:11434"
else
OLLAMA="http://100.92.127.64:11434"
fi
HOWARD-HOME has the canonical models loaded locally (qwen3:14b, codestral:22b, nomic-embed-text, plus bonus qwen3-coder:30b) — so HOWARD-HOME uses local Ollama, not Mike's. Zero Tailscale hop.
Call pattern for qwen3 — use /api/chat with think:false, NOT /api/generate. qwen3 on generate endpoint dumps reasoning into internal thinking tokens and returns empty response field. Chat endpoint with think:false returns clean content in message.content:
body = json.dumps({
'model':'qwen3:14b',
'messages':[{'role':'user','content': prompt}],
'stream':False,
'think':False
}).encode()
# POST to OLLAMA + '/api/chat'
# Read res['message']['content']
Codestral doesn't need think:false — just use it on /api/chat normally.
Cold-start ~30-50s on first call per model per session; warm calls 1-5s.
Incident 2026-04-22: Spent an entire Cascades rollout session (G1 hygiene, orphan cleanup, risk register, synology discovery, etc.) without routing a single task through Ollama despite many drafting opportunities (report drafts, summary text, email drafts). Howard called this out: "just make sure ollama is being used as mike has designed claudetools to work."