diff --git a/.claude/OLLAMA.md b/.claude/OLLAMA.md index 827ebf6..7b2cba1 100644 --- a/.claude/OLLAMA.md +++ b/.claude/OLLAMA.md @@ -112,8 +112,6 @@ This keeps Claude tokens focused on reasoning, decisions, and execution. Ollama | Client-facing notes and summaries | qwen3:14b / qwen3:8b* | Review for accuracy | | Agent phase handoff summaries (explore → plan, plan → implement) | qwen3:14b / qwen3:8b* | Review + include in agent brief | | Client email drafts | qwen3:14b / qwen3:8b* | Review for accuracy + tone before sending | - -*Use `qwen3:8b` on DESKTOP-0O8A1RL — 4.8x faster due to full VRAM fit. Use `qwen3:14b` everywhere else. | Ticket / issue classification (priority, type, category) | qwen3.6 | Review + apply label | | Diff summarization before commit | qwen3.6 | Review + use in commit message | | Error message categorization (transient / config / bug) | qwen3.6 | Review + act on classification | @@ -124,6 +122,8 @@ This keeps Claude tokens focused on reasoning, decisions, and execution. Ollama | Code comments and docstrings | codestral:22b | Review before applying | | Refactor suggestions | codestral:22b | Review before applying | +\* Use `qwen3:8b` on DESKTOP-0O8A1RL — 4.8x faster than 14b there due to full VRAM fit. Use `qwen3:14b` on all other machines. + ### What Claude always owns (never Ollama) - Credentials, passwords, API keys — must be verbatim accurate @@ -187,8 +187,6 @@ print('warm') | Summarize logs, diffs, incident notes (no length cap) | qwen3:8b* / qwen3:14b | | Agent phase handoff summaries | qwen3:8b* / qwen3:14b | | Client email drafts | qwen3:8b* / qwen3:14b | - -*On DESKTOP-0O8A1RL only — 4.8x faster (86 tok/s vs 18 tok/s). Use qwen3:14b on all other machines. | Classify bug type, severity, category, priority | qwen3.6 | | Extract structured data from text (JSON, fields) | qwen3.6 | | Diff summarization with strict format / fields | qwen3.6 | @@ -200,7 +198,9 @@ print('warm') | Code comment / docstring generation | codestral:22b | | Refactor suggestions | codestral:22b | -**Rule of thumb:** if the output is *prose someone will read*, use qwen3:14b (2x faster). If the output is *structured data something will parse* or *must obey a tight format*, use qwen3.6. +\* On DESKTOP-0O8A1RL only — 4.8x faster (86 tok/s vs 18 tok/s). Use `qwen3:14b` on all other machines. + +**Rule of thumb:** if the output is *prose someone will read*, use the per-machine prose model (qwen3:8b on DESKTOP-0O8A1RL, qwen3:14b elsewhere). If the output is *structured data something will parse* or *must obey a tight format*, use qwen3.6. ## Review Policy