# Ollama: Local AI Reference

Ollama runs on Mike's workstation (DESKTOP-0O8A1RL) with GPU acceleration. It is available to all team members via Tailscale.

## Models

| Model | Size | Use For |
|-------|------|---------|
| `qwen3:14b` | 9.3 GB | Summarization, classification, data extraction, drafting |
| `codestral:22b` | 12 GB | Code generation, refactoring suggestions, docstrings |
| `nomic-embed-text` | 274 MB | Embeddings only (used by GrepAI) |

## Endpoints

- **DESKTOP-0O8A1RL** (local): `http://localhost:11434`
- **Any other machine** (Tailscale required): `http://100.92.127.64:11434`

Check reachability:

```bash
curl -s http://100.92.127.64:11434/api/tags | jq -r '.models[].name'
```

If it fails, verify that Tailscale is connected (`tailscale status`) and that Mike's workstation is online.

## Access Control

- Port 11434 is allowed ONLY from 100.0.0.0/8, which covers the Tailscale address range (100.64.0.0/10)
- NOT exposed to LAN, VPN, or internet
- Binding: `OLLAMA_HOST=0.0.0.0:11434` (the firewall restricts access)

## Calling Ollama

Resolve the endpoint from identity.json first:

```bash
OLLAMA=$([ "$(jq -r .machine .claude/identity.json 2>/dev/null)" = "DESKTOP-0O8A1RL" ] \
  && echo "http://localhost:11434" || echo "http://100.92.127.64:11434")
```

Preferred call (passing the prompt as an argument avoids shell escaping):

```bash
py -c "
import urllib.request, json, sys
# The shell expands OLLAMA before Python runs; falls back to localhost if unset.
url = '${OLLAMA:-http://localhost:11434}/api/generate'
# stream=False returns one JSON object instead of a stream of chunks.
body = json.dumps({'model':'qwen3:14b','prompt': sys.argv[1],'stream':False}).encode()
res = json.loads(urllib.request.urlopen(urllib.request.Request(url, body)).read())
print(res['response'])
" "Your prompt here"
```

For code suggestions, swap `qwen3:14b` for `codestral:22b`.

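The endpoint resolution and the generate call above can also live in a single Python script, which avoids relying on a shell variable. The following is a minimal sketch, not an established team helper: the names `resolve_endpoint` and `generate` are hypothetical, the 120-second timeout is an assumed value, and it assumes `.claude/identity.json` carries a `machine` key, as the `jq` command implies.

```python
import json
import urllib.request
from pathlib import Path

LOCAL_URL = "http://localhost:11434"
TAILSCALE_URL = "http://100.92.127.64:11434"
HOST_MACHINE = "DESKTOP-0O8A1RL"


def resolve_endpoint(identity_path=".claude/identity.json"):
    """Return the local URL on the host workstation, else the Tailscale URL."""
    try:
        identity = json.loads(Path(identity_path).read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Missing or unparseable identity file: fall back to the Tailscale
        # address, mirroring the shell snippet's behavior.
        return TAILSCALE_URL
    if isinstance(identity, dict) and identity.get("machine") == HOST_MACHINE:
        return LOCAL_URL
    return TAILSCALE_URL


def generate(prompt, model="qwen3:14b", base_url=None, timeout=120):
    """POST a non-streaming request to /api/generate and return the text.

    The timeout is an assumed value, not one taken from this document.
    """
    url = (base_url or resolve_endpoint()) + "/api/generate"
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

Calling `generate("Summarize this diff: ...")` uses `qwen3:14b`; pass `model="codestral:22b"` for code suggestions.
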
## When to Use Which Model

| Task | Model |
|------|-------|
| Summarize logs, diffs, session notes | `qwen3:14b` |
| Classify bug type, severity, category | `qwen3:14b` |
| Extract structured data from text | `qwen3:14b` |
| Draft commit message from diff | `qwen3:14b` |
| Suggest refactor for a function | `codestral:22b` |
| Docstring / comment generation | `codestral:22b` |

## Review Policy

- Low-stakes output (summary, classification, draft): use directly
- Code suggestions from codestral: always review before applying
- Never use Ollama for: auth decisions, credential handling, production migrations, security review
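
The model table and review policy can be encoded as a small lookup so scripts do not hard-code model names. This is only a sketch: the task labels (`summarize`, `refactor`, and so on) and the function names `pick_model` and `needs_review` are hypothetical and chosen for illustration.

```python
# Hypothetical task labels mapped to the models listed in the table.
MODEL_FOR_TASK = {
    "summarize": "qwen3:14b",
    "classify": "qwen3:14b",
    "extract": "qwen3:14b",
    "draft_commit_message": "qwen3:14b",
    "refactor": "codestral:22b",
    "docstring": "codestral:22b",
}

# Hypothetical labels for the uses the review policy rules out entirely.
FORBIDDEN_TASKS = {
    "auth_decision",
    "credential_handling",
    "production_migration",
    "security_review",
}


def pick_model(task):
    """Return the model for a task, refusing anything the policy forbids."""
    if task in FORBIDDEN_TASKS:
        raise ValueError(f"Ollama must not be used for: {task}")
    if task not in MODEL_FOR_TASK:
        raise ValueError(f"Unknown task: {task}")
    return MODEL_FOR_TASK[task]


def needs_review(model):
    """Code suggestions from codestral are always reviewed before applying."""
    return model.startswith("codestral")
```

For example, `pick_model("refactor")` returns `codestral:22b`, and `needs_review` on that result is true, so the suggestion goes through review before it is applied.
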