43 lines
3.1 KiB
Markdown
43 lines
3.1 KiB
Markdown
---
|
|
name: Route Tier-0 tasks through Ollama (Mike's ClaudeTools design intent)
|
|
description: Drafts, summaries, classifications, extractions MUST go through Ollama per Mike's tiered-model architecture. Don't default to Claude inference for low-stakes text generation.
|
|
type: feedback
|
|
---
|
|
|
|
Route Tier-0 tasks (summaries, classifications, drafts, extractions) through Ollama. Not optional — this is how Mike designed ClaudeTools to work.
|
|
|
|
**Why:** Mike built the tiered-model architecture (`CLAUDE.md` Model Routing section + `.claude/OLLAMA.md`) deliberately. Tier 0 is free + fast + private. Defaulting to Claude for every drafting task burns context window and Anthropic tokens on work that qwen3:14b does fine.
|
|
|
|
**How to apply:**
|
|
- Drafting emails, session-log paragraphs, status-update sentences, commit-message first-drafts → qwen3:14b
|
|
- Summarizing long output (Graph JSON, PowerShell transcripts, log tails) → qwen3:14b
|
|
- Extracting structured data from text → qwen3:14b
|
|
- Suggesting refactors / generating docstrings → codestral:22b (then review)
|
|
- NEVER for: auth decisions, credential handling, production migrations, security review, citation work, production-change scripts
|
|
|
|
**Endpoint resolution — machine config is centralized in `.claude/identity.json` `ollama` (Phase 2, 2026-05-26). Read the declared endpoint; no curl probe per call:**
|
|
```bash
|
|
OLLAMA=$(jq -r '.ollama.endpoint // .ollama.fallback // "http://localhost:11434"' .claude/identity.json)
|
|
MODEL=$(jq -r '.ollama.prose_model // "qwen3:14b"' .claude/identity.json)
|
|
```
|
|
`migrate-identity.sh` populates the `ollama` object per machine: `endpoint` (the one to use — localhost if a local Ollama was present at migration, else the Tailscale host), `fallback` (backup, usually GURU-BEAST-ROG `http://100.101.122.4:11434`, always-on RTX 4090), and `prose_model` (qwen3:8b on 12 GB boxes like GURU-5070, qwen3:14b elsewhere). GURU-KALI: endpoint+fallback = Beast (no local Ollama). Re-run `migrate-identity.sh` to re-detect after an Ollama/network change. Don't hardcode endpoints/IPs in shared files — read identity.json. (Superseded the interim `ollama_fallback` field from earlier 2026-05-26.)
|
|
|
|
**Call pattern for qwen3 — use `/api/chat` with `think:false`**, NOT `/api/generate`. qwen3 on generate endpoint dumps reasoning into internal thinking tokens and returns empty `response` field. Chat endpoint with `think:false` returns clean content in `message.content`:
|
|
|
|
```python
|
|
body = json.dumps({
|
|
'model':'qwen3:14b',
|
|
'messages':[{'role':'user','content': prompt}],
|
|
'stream':False,
|
|
'think':False
|
|
}).encode()
|
|
# POST to OLLAMA + '/api/chat'
|
|
# Read res['message']['content']
|
|
```
|
|
|
|
Codestral doesn't need `think:false` — just use it on `/api/chat` normally.
|
|
|
|
Cold-start ~30-50s on first call per model per session; warm calls 1-5s.
|
|
|
|
**Incident 2026-04-22:** Spent an entire Cascades rollout session (G1 hygiene, orphan cleanup, risk register, synology discovery, etc.) without routing a single task through Ollama despite many drafting opportunities (report drafts, summary text, email drafts). Howard called this out: "just make sure ollama is being used as mike has designed claudetools to work."
|