sync: auto-sync from HOWARD-HOME at 2026-04-23 06:21:23
Author: Howard Enos Machine: HOWARD-HOME Timestamp: 2026-04-23 06:21:23
This commit is contained in:
@@ -12,15 +12,31 @@ Ollama runs on Mike's workstation (DESKTOP-0O8A1RL) with GPU acceleration. Avail
|
||||
|
||||
## Endpoints
|
||||
|
||||
- **DESKTOP-0O8A1RL** (local): `http://localhost:11434`
|
||||
- **Any other machine** (Tailscale required): `http://100.92.127.64:11434`
|
||||
Auto-detect: any machine that has a local Ollama listening on `127.0.0.1:11434` uses local. Otherwise fall back to Mike's workstation over Tailscale.
|
||||
|
||||
```bash
|
||||
# Preferred universal resolver — works on any machine
|
||||
if curl -s -m 2 http://localhost:11434/api/tags >/dev/null 2>&1; then
|
||||
OLLAMA="http://localhost:11434"
|
||||
else
|
||||
OLLAMA="http://100.92.127.64:11434"
|
||||
fi
|
||||
```
|
||||
|
||||
Rationale:
|
||||
- **Mike's workstation (DESKTOP-0O8A1RL):** local matches, no change.
|
||||
- **HOWARD-HOME:** also has a local Ollama with the canonical model set (confirmed 2026-04-22). Uses local — faster, zero Tailscale hop, no load on Mike's GPU.
|
||||
- **Other team machines:** no local Ollama → falls back to Mike's over Tailscale.
|
||||
- **Mike's machine offline:** graceful degradation — local users continue working; non-local users get a clean timeout.
|
||||
|
||||
Manual override (for testing or explicit preference): set `OLLAMA=http://100.92.127.64:11434` before the call.
|
||||
|
||||
Check reachability:
|
||||
```bash
|
||||
curl -s http://100.92.127.64:11434/api/tags | jq -r '.models[].name'
|
||||
curl -s $OLLAMA/api/tags | jq -r '.models[].name'
|
||||
```
|
||||
|
||||
If it fails: verify Tailscale is connected (`tailscale status`) and Mike's workstation is online.
|
||||
If neither endpoint responds: verify Tailscale (`tailscale status`) and whether your local Ollama service is running.
|
||||
|
||||
## Access Control
|
||||
|
||||
@@ -30,24 +46,29 @@ If it fails: verify Tailscale is connected (`tailscale status`) and Mike's works
|
||||
|
||||
## Calling Ollama
|
||||
|
||||
Resolve endpoint from identity.json first:
|
||||
```bash
|
||||
OLLAMA=$([ "$(jq -r .machine .claude/identity.json 2>/dev/null)" = "DESKTOP-0O8A1RL" ] \
|
||||
&& echo "http://localhost:11434" || echo "http://100.92.127.64:11434")
|
||||
```
|
||||
Use the `/api/chat` endpoint with `think:false` for qwen3 models. The older `/api/generate` endpoint on qwen3 puts output into thinking tokens that don't appear in the `response` field — you'll get an empty response if you use `/api/generate`.
|
||||
|
||||
Preferred one-liner (avoids shell escaping):
|
||||
Preferred one-liner:
|
||||
```bash
|
||||
py -c "
|
||||
import urllib.request, json, sys
|
||||
url = 'http://localhost:11434/api/generate'
|
||||
body = json.dumps({'model':'qwen3:14b','prompt': sys.argv[1],'stream':False}).encode()
|
||||
res = json.loads(urllib.request.urlopen(urllib.request.Request(url, body)).read())
|
||||
print(res['response'])
|
||||
python -c "
|
||||
import urllib.request, json, sys, os
|
||||
OLLAMA = os.environ.get('OLLAMA') or ('http://localhost:11434' if __import__('urllib.request').request.urlopen(urllib.request.Request('http://localhost:11434/api/tags'),timeout=2) else 'http://100.92.127.64:11434')
|
||||
body = json.dumps({
|
||||
'model':'qwen3:14b',
|
||||
'messages':[{'role':'user','content': sys.argv[1]}],
|
||||
'stream':False,
|
||||
'think':False
|
||||
}).encode()
|
||||
res = json.loads(urllib.request.urlopen(urllib.request.Request(OLLAMA+'/api/chat', body), timeout=120).read())
|
||||
print(res['message']['content'])
|
||||
" "Your prompt here"
|
||||
```
|
||||
|
||||
For code suggestions, swap `qwen3:14b` for `codestral:22b`.
|
||||
Or set `$OLLAMA` once from bash (see auto-detect formula above) and reuse it across calls.
|
||||
|
||||
For code suggestions, swap `qwen3:14b` for `codestral:22b`. Codestral doesn't need `think:false`.
|
||||
|
||||
Cold-start is ~30-50s on first call per model per session. Warm calls are 1-5s.
|
||||
|
||||
## When to Use Which Model
|
||||
|
||||
|
||||
Reference in New Issue
Block a user