sync: auto-sync from GURU-5070 at 2026-05-25 06:00:45

Author: Mike Swanson
Machine: GURU-5070
Timestamp: 2026-05-25 06:00:45
2026-05-25 06:00:47 -07:00
parent cdd6e6fc8c
commit 2491660b88
3 changed files with 22 additions and 17 deletions


@@ -282,9 +282,10 @@ Tier 0 — **Ollama is the documentation and classification engine.** Route pros
| Machine | Endpoint |
|---------|----------|
| DESKTOP-0O8A1RL | `http://localhost:11434` |
| Other | `http://100.92.127.64:11434` (Tailscale) |
| GURU-BEAST-ROG | `http://localhost:11434` (always-on; canonical Tailscale fallback) |
| Other | `http://100.101.122.4:11434` (Beast via Tailscale) |
Models: `qwen3.6:latest` (strict-format: JSON, classification, structured rules, redaction, word-limited summaries), `qwen3:8b` (prose on DESKTOP-0O8A1RL — 86 tok/s, full 12 GB VRAM fit), `qwen3:14b` (prose everywhere else — ~66 tok/s), `codestral:22b` (code suggestions — always review). Full reference + per-machine routing: `.claude/OLLAMA.md`
Models: `qwen3.6:latest` (strict-format: JSON, classification, structured rules, redaction, word-limited summaries), `qwen3:8b` (prose on DESKTOP-0O8A1RL — 86 tok/s, full 12 GB VRAM fit), `qwen3:14b` (prose everywhere else — ~66 tok/s), `codestral:22b` (code suggestions — always review). Tailscale fallback host: **GURU-BEAST-ROG** (`100.101.122.4:11434`, always on, RTX 4090). Full reference + per-machine routing: `.claude/OLLAMA.md`
### GrepAI (Semantic Code Search)


@@ -1,6 +1,6 @@
# Ollama — Local AI Reference
Ollama runs on Mike's workstation (DESKTOP-0O8A1RL) with GPU acceleration. Available to all team members via Tailscale.
Ollama's always-on host is **GURU-BEAST-ROG** (RTX 4090, 24 GB VRAM, Tailscale `100.101.122.4`). It is the canonical Tailscale fallback for all machines without a local Ollama. DESKTOP-0O8A1RL and other workstations use local when available, Beast otherwise.
## Models
@@ -30,19 +30,22 @@ qwen3:14b and qwen3.6 are CPU-bottlenecked on this machine (split mode, PCIe ban
| Machine | GPU VRAM | Prose model |
|---------|----------|-------------|
| DESKTOP-0O8A1RL | 12 GB (RTX 5070 Ti Laptop) | `qwen3:8b` |
| GURU-BEAST-ROG | 24 GB (RTX 4090) | `qwen3:14b` (always-on Tailscale host — `100.101.122.4`) |
| DESKTOP-0O8A1RL | 12 GB (RTX 5070 Ti Laptop) | `qwen3:8b` (local — 4.8x faster than 14b here) |
| Mikes-MacBook-Air | unified memory | `qwen3:14b` |
| HOWARD-HOME | local Ollama | `qwen3:14b` |
| GURU-KALI | 8 GB (RTX 4070 Mobile) — see note | remote / `qwen3:14b` now; `qwen3:8b` if local installed |
| Other | Tailscale fallback | `qwen3:14b` |
| GURU-KALI | 8 GB (RTX 4070 Mobile) — see note | remote Beast / `qwen3:14b` now; `qwen3:8b` if local installed |
| Other | Tailscale fallback (Beast) | `qwen3:14b` |
> **GURU-KALI status (2026-05-24):** Tailscale installed — remote Ollama
> (`100.92.127.64`) IS reachable, so it uses the Tailscale-fallback prose model
> `qwen3:14b` (the "Other" row). No local Ollama yet. It has the strongest hardware
> in the fleet, but the GPU runs the nouveau driver (no CUDA), so a future local
> (Beast at `100.101.122.4`) is reachable, so it uses the Tailscale-fallback prose model
> `qwen3:14b` (the "Other" row). No local Ollama yet. It has strong hardware
> but the GPU runs the nouveau driver (no CUDA), so a future local
> Ollama would need the proprietary NVIDIA driver for GPU accel; `qwen3:8b` would
> then fit its 8 GB VRAM (mirrors DESKTOP-0O8A1RL), with larger models splitting to
> CPU. Full machine profile: `.claude/machines/guru-kali.md`.
>
> **GURU-BEAST-ROG models (2026-05-25):** `gemma3:27b`, `qwen3:32b`, `qwen3:14b`, `codestral:22b`, `nomic-embed-text`. Note: `qwen3.6:latest` and `qwen3:8b` are not yet installed on Beast; add them if strict-format or speed routing is needed there.
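
The per-machine prose routing in the table above could be expressed as a small lookup. This is only a sketch: the names `PROSE_MODEL_BY_HOST`, `DEFAULT_PROSE_MODEL`, and `prose_model` are hypothetical and do not appear in the repo, and it assumes the machine names in the table match what the OS reports as its hostname.

```python
# Hypothetical helper: maps a hostname to the prose model from the table above.
# Only DESKTOP-0O8A1RL deviates from the default; every other row uses qwen3:14b.
PROSE_MODEL_BY_HOST = {
    "DESKTOP-0O8A1RL": "qwen3:8b",  # 12 GB VRAM: the 8b model fits fully on GPU
}
DEFAULT_PROSE_MODEL = "qwen3:14b"  # Beast, HOWARD-HOME, MacBook Air, GURU-KALI, others


def prose_model(hostname: str) -> str:
    """Return the prose model for a machine, falling back to the default."""
    # Compare case-insensitively, since hostname casing varies by OS.
    return PROSE_MODEL_BY_HOST.get(hostname.upper(), DEFAULT_PROSE_MODEL)
```

A caller might pass `socket.gethostname()` as the argument. If GURU-KALI later gets a local Ollama with GPU acceleration, adding it to the mapping with `qwen3:8b` would mirror the note above.
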
## Endpoints
@@ -53,17 +56,18 @@ Auto-detect: any machine that has a local Ollama listening on `127.0.0.1:11434`
if curl -s -m 2 http://localhost:11434/api/tags >/dev/null 2>&1; then
OLLAMA="http://localhost:11434"
else
OLLAMA="http://100.92.127.64:11434"
OLLAMA="http://100.101.122.4:11434"
fi
```
Rationale:
- **Mike's workstation (DESKTOP-0O8A1RL):** local matches, no change.
- **HOWARD-HOME:** also has a local Ollama with the canonical model set (confirmed 2026-04-22). Uses local — faster, zero Tailscale hop, no load on Mike's GPU.
- **Other team machines:** no local Ollama → falls back to Mike's over Tailscale.
- **Mike's machine offline:** graceful degradation — local users continue working; non-local users get a clean timeout.
- **DESKTOP-0O8A1RL:** local matches, uses local Ollama — faster, no Tailscale hop.
- **HOWARD-HOME:** also has a local Ollama with the canonical model set (confirmed 2026-04-22). Uses local — faster, zero Tailscale hop.
- **GURU-BEAST-ROG:** always-on; the canonical fallback for all machines without a local Ollama.
- **Other team machines:** no local Ollama → falls back to Beast over Tailscale.
- **Beast offline (rare):** graceful degradation — local Ollama users continue; remote users get a clean timeout.
Manual override (for testing or explicit preference): set `OLLAMA=http://100.92.127.64:11434` before the call.
Manual override (for testing or explicit preference): set `OLLAMA=http://100.101.122.4:11434` before the call.
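
The auto-detect and manual-override rules above can also be sketched in Python. This is a minimal illustration, not the project's actual helper: `is_reachable` and `resolve_endpoint` are hypothetical names, and the sketch assumes an `OLLAMA` environment variable takes precedence over probing, as the override note describes. One difference from the `curl` check: an HTTP error status from `/api/tags` counts as unreachable here.

```python
import os
import urllib.request

LOCAL = "http://localhost:11434"
FALLBACK = "http://100.101.122.4:11434"  # Beast over Tailscale


def is_reachable(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/api/tags succeeds within the timeout."""
    try:
        with urllib.request.urlopen(base_url + "/api/tags", timeout=timeout):
            return True
    except (OSError, ValueError):
        # urlopen raises on failure (URLError and socket timeouts are OSError
        # subclasses), so the failure case must be caught, not tested for falsiness.
        return False


def resolve_endpoint(env=None, probe=is_reachable) -> str:
    """Pick the endpoint: explicit override, else local if up, else Beast."""
    env = os.environ if env is None else env
    override = env.get("OLLAMA")
    if override:
        return override
    return LOCAL if probe(LOCAL) else FALLBACK
```

Calling `resolve_endpoint()` with no arguments reads the real environment and probes localhost. The `env` and `probe` parameters exist only so the selection logic can be exercised without a network.
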
Check reachability:
```bash
@@ -86,7 +90,7 @@ Preferred one-liner:
```bash
python -c "
import urllib.request, json, sys, os
OLLAMA = os.environ.get('OLLAMA') or ('http://localhost:11434' if __import__('urllib.request').request.urlopen(urllib.request.Request('http://localhost:11434/api/tags'),timeout=2) else 'http://100.92.127.64:11434')
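# urlopen raises on connection failure instead of returning a falsy value,
# so the local probe is wrapped in try/except to let the fallback take effect.
def _local_up():
    try:
        urllib.request.urlopen('http://localhost:11434/api/tags', timeout=2).close()
        return True
    except OSError:
        return False
OLLAMA = os.environ.get('OLLAMA') or ('http://localhost:11434' if _local_up() else 'http://100.101.122.4:11434')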
body = json.dumps({
'model':'qwen3:14b',
'messages':[{'role':'user','content': sys.argv[1]}],