# Ollama — Local AI Reference

Ollama runs on Mike's workstation (DESKTOP-0O8A1RL) with GPU acceleration. It is available to all team members via Tailscale.

## Models

| Model | Size | Use For |
|-------|------|---------|
| `qwen3:14b` | 9.3 GB | Summarization, classification, data extraction, drafting |
| `codestral:22b` | 12 GB | Code generation, refactoring suggestions, docstrings |
| `nomic-embed-text` | 274 MB | Embeddings only (used by GrepAI) |

## Endpoints

- **DESKTOP-0O8A1RL** (local): `http://localhost:11434`
- **Any other machine** (Tailscale required): `http://100.92.127.64:11434`

Check reachability (lists the installed model names):
```bash
curl -s http://100.92.127.64:11434/api/tags | jq -r '.models[].name'
```

If it fails, verify that Tailscale is connected (`tailscale status`) and that Mike's workstation is online.
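The same reachability check can be sketched in Python for scripts that already use `py`. This is a minimal sketch, not part of the existing tooling: the function names `parse_model_names` and `list_models` and the 5-second timeout are assumptions chosen for illustration. The `/api/tags` response shape (`models[].name`) is taken from the `jq` filter above.

```python
import json
import urllib.request
from typing import List, Optional

# Tailscale address from the Endpoints list; use http://localhost:11434 on the workstation itself.
DEFAULT_BASE_URL = "http://100.92.127.64:11434"


def parse_model_names(payload: bytes) -> List[str]:
    """Extract model names from an /api/tags response body."""
    data = json.loads(payload)
    return [model["name"] for model in data.get("models", [])]


def list_models(base_url: str = DEFAULT_BASE_URL, timeout: float = 5.0) -> Optional[List[str]]:
    """Return installed model names, or None if the server cannot be reached."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return parse_model_names(resp.read())
    except OSError:
        # URLError, connection refused, and socket timeouts are all OSError subclasses.
        return None
```

A `None` result from `list_models()` corresponds to the failure case above: check `tailscale status` and whether the workstation is online.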

## Access Control

- Port 11434 is allowed ONLY from the Tailscale subnet (100.0.0.0/8)
- NOT exposed to LAN, VPN, or internet
- Binding: `OLLAMA_HOST=0.0.0.0:11434` (listens on all interfaces; the firewall rule restricts access)
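To see which source addresses the allowed range admits, the match can be modelled with Python's `ipaddress` module. This sketch only illustrates the address arithmetic; it does not read or configure the actual firewall, and `is_allowed` is a hypothetical helper. For comparison it also includes 100.64.0.0/10, the range Tailscale assigns node addresses from, which is a subset of the 100.0.0.0/8 range quoted above.

```python
import ipaddress

# Range quoted in the access-control list above.
ALLOWED_NET = ipaddress.ip_network("100.0.0.0/8")
# Range Tailscale assigns node addresses from (carrier-grade NAT space).
TAILSCALE_NET = ipaddress.ip_network("100.64.0.0/10")


def is_allowed(source_ip: str) -> bool:
    """Return True if source_ip falls inside the allowed firewall range."""
    return ipaddress.ip_address(source_ip) in ALLOWED_NET
```

For example, `is_allowed("100.92.127.64")` is true and `is_allowed("192.168.1.10")` is false. An address such as `100.10.0.1` also passes, even though it lies outside `TAILSCALE_NET`.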

## Calling Ollama

Resolve the endpoint from `.claude/identity.json` first:
```bash
OLLAMA=$([ "$(jq -r .machine .claude/identity.json 2>/dev/null)" = "DESKTOP-0O8A1RL" ] \
  && echo "http://localhost:11434" || echo "http://100.92.127.64:11434")
```
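For callers that are already in Python, the same resolution might look like the sketch below. It assumes, as the shell version does, that `.claude/identity.json` is a JSON object with a `machine` key; `resolve_endpoint` and the constant names are hypothetical. A missing or malformed file falls back to the Tailscale address, mirroring what the shell version does when `jq` fails.

```python
import json
from pathlib import Path

LOCAL_URL = "http://localhost:11434"
TAILSCALE_URL = "http://100.92.127.64:11434"
HOST_MACHINE = "DESKTOP-0O8A1RL"


def resolve_endpoint(identity_path: str = ".claude/identity.json") -> str:
    """Pick the local URL on the Ollama host and the Tailscale URL elsewhere."""
    try:
        identity = json.loads(Path(identity_path).read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Missing, unreadable, or malformed file: use the Tailscale address.
        return TAILSCALE_URL
    machine = identity.get("machine") if isinstance(identity, dict) else None
    return LOCAL_URL if machine == HOST_MACHINE else TAILSCALE_URL
```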

Preferred call (passing the prompt as an argument avoids shell escaping):
```bash
py -c "
import urllib.request, json, sys
# argv[1] is the prompt; argv[2] is the base URL resolved into OLLAMA above
url = sys.argv[2] + '/api/generate'
body = json.dumps({'model': 'qwen3:14b', 'prompt': sys.argv[1], 'stream': False}).encode()
req = urllib.request.Request(url, body, {'Content-Type': 'application/json'})
res = json.loads(urllib.request.urlopen(req).read())
print(res['response'])
" "Your prompt here" "${OLLAMA:-http://localhost:11434}"
```

For code suggestions, swap `qwen3:14b` for `codestral:22b`.
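When the call is needed from more than one script, the inline snippet can be factored into a small module. The sketch below is one possible shape, using the same `/api/generate` payload and `response` field as the snippet above. The function names and the 120-second timeout are assumptions, not existing project code.

```python
import json
import urllib.request


def build_generate_request(prompt: str, model: str = "qwen3:14b",
                           base_url: str = "http://localhost:11434") -> urllib.request.Request:
    """Build a non-streaming POST request for /api/generate."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "qwen3:14b",
             base_url: str = "http://localhost:11434", timeout: float = 120.0) -> str:
    """Send the prompt and return the text in the reply's 'response' field."""
    req = build_generate_request(prompt, model, base_url)
    # The timeout is an assumed value; large models can take a while to load.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Suggest a docstring for ...", model="codestral:22b")` would correspond to the model swap described above.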

## When to Use Which Model

| Task | Model |
|------|-------|
| Summarize logs, diffs, session notes | `qwen3:14b` |
| Classify bug type, severity, category | `qwen3:14b` |
| Extract structured data from text | `qwen3:14b` |
| Draft commit message from diff | `qwen3:14b` |
| Suggest refactor for a function | `codestral:22b` |
| Docstring / comment generation | `codestral:22b` |
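If scripts need to choose a model programmatically, the table can be encoded as a lookup. In this sketch the task keys (`summarize`, `classify`, and so on) are hypothetical labels invented for illustration. Only the model assignments come from the table.

```python
# Task keys are hypothetical labels; the model assignments follow the table above.
TASK_MODEL = {
    "summarize": "qwen3:14b",
    "classify": "qwen3:14b",
    "extract": "qwen3:14b",
    "commit_message": "qwen3:14b",
    "refactor": "codestral:22b",
    "docstring": "codestral:22b",
}


def pick_model(task: str) -> str:
    """Return the model for a task category; raise ValueError for unknown ones."""
    try:
        return TASK_MODEL[task]
    except KeyError:
        raise ValueError(f"unknown task category: {task!r}") from None
```

Raising on an unknown category, rather than silently defaulting to one model, keeps a mistyped task name from being routed to the wrong model.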

## Review Policy

- Low-stakes output (summary, classification, draft): use directly
- Code suggestions from `codestral:22b`: always review before applying
- Never use Ollama for: auth decisions, credential handling, production migrations, security review