claudetools/.claude
Mike Swanson 4aadf16a9f feat: add qwen3:8b for DESKTOP-0O8A1RL, update Ollama routing
Benchmarked 2026-05-16 on DESKTOP-0O8A1RL (RTX 5070 Ti Laptop, 12 GB VRAM):
- qwen3:8b: 100% VRAM fit (10.9/10.9 GB) -> 74-86 tok/s
- qwen3:14b: 73% VRAM (11.3/15.6 GB split) -> 17-18 tok/s (4.8x slower)
- qwen3.6:  41% VRAM (11.3/27.5 GB split) -> 17-19 tok/s

qwen3:14b overflows the 12 GB of VRAM at runtime (the 9.3 GB GGUF takes 15.6 GB once loaded).
qwen3:8b fits entirely in VRAM and matches the reference machine's speed.
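
A rough check of the fit figures above, assuming "A/B GB" means A GB resident on the GPU out of B GB loaded in total. That reading and the names below (LOADED, gpu_fraction, fits_in_vram) are illustrative assumptions, not taken from OLLAMA.md:

```python
# Figures copied from the benchmark list: (GB on GPU, GB loaded in total).
# The interpretation of the pairs is an assumption.
LOADED = {
    "qwen3:8b": (10.9, 10.9),
    "qwen3:14b": (11.3, 15.6),
    "qwen3.6": (11.3, 27.5),
}


def gpu_fraction(model: str) -> float:
    """Share of the loaded model that sits in VRAM."""
    on_gpu, total = LOADED[model]
    return on_gpu / total


def fits_in_vram(model: str) -> bool:
    """True only when nothing spills out of VRAM."""
    return gpu_fraction(model) >= 1.0


for name in LOADED:
    print(f"{name}: {gpu_fraction(name):.0%} on GPU, fits={fits_in_vram(name)}")
```

Ratios computed from the rounded GB values can differ by a point from the percentages reported in the list.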

Updated OLLAMA.md: added qwen3:8b to models table, per-machine routing
table, benchmark results. Updated CLAUDE.md model one-liner.
Routing: qwen3:8b for prose on DESKTOP-0O8A1RL, qwen3:14b everywhere else,
qwen3.6 for strict-format tasks on all machines.
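
The routing rule above could be sketched as a small lookup. This is a minimal illustration only: pick_model, LOW_VRAM_HOSTS, and the task strings "prose" and "strict-format" are hypothetical names, and the actual routing lives in OLLAMA.md as a table, not code:

```python
import socket

# Hosts where qwen3:14b does not fit in VRAM (hypothetical constant name).
LOW_VRAM_HOSTS = {"DESKTOP-0O8A1RL"}


def pick_model(hostname: str, task: str) -> str:
    """Return the Ollama model tag for a host and task kind."""
    # Strict-format tasks use qwen3.6 on every machine.
    if task == "strict-format":
        return "qwen3.6"
    # Prose on the 12 GB laptop uses the model that fits entirely in VRAM.
    if hostname.upper() in LOW_VRAM_HOSTS:
        return "qwen3:8b"
    # Everywhere else, prose goes to qwen3:14b.
    return "qwen3:14b"


if __name__ == "__main__":
    print(pick_model(socket.gethostname(), "prose"))
```

Treating any task other than "strict-format" as prose is an assumption of this sketch.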

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 16:25:57 -07:00