sync: auto-sync from GURU-5070 at 2026-05-25 06:00:45
Author: Mike Swanson Machine: GURU-5070 Timestamp: 2026-05-25 06:00:45
@@ -282,9 +282,10 @@ Tier 0 — **Ollama is the documentation and classification engine.** Route pros
 | Machine | Endpoint |
 |---------|----------|
 | DESKTOP-0O8A1RL | `http://localhost:11434` |
-| Other | `http://100.92.127.64:11434` (Tailscale) |
+| GURU-BEAST-ROG | `http://localhost:11434` (always-on; canonical Tailscale fallback) |
+| Other | `http://100.101.122.4:11434` (Beast via Tailscale) |
 
-Models: `qwen3.6:latest` (strict-format: JSON, classification, structured rules, redaction, word-limited summaries), `qwen3:8b` (prose on DESKTOP-0O8A1RL — 86 tok/s, full 12 GB VRAM fit), `qwen3:14b` (prose everywhere else — ~66 tok/s), `codestral:22b` (code suggestions — always review). Full reference + per-machine routing: `.claude/OLLAMA.md`
+Models: `qwen3.6:latest` (strict-format: JSON, classification, structured rules, redaction, word-limited summaries), `qwen3:8b` (prose on DESKTOP-0O8A1RL — 86 tok/s, full 12 GB VRAM fit), `qwen3:14b` (prose everywhere else — ~66 tok/s), `codestral:22b` (code suggestions — always review). Tailscale fallback host: **GURU-BEAST-ROG** (`100.101.122.4:11434`, always on, RTX 4090). Full reference + per-machine routing: `.claude/OLLAMA.md`
 
 ### GrepAI (Semantic Code Search)
 
@@ -1,6 +1,6 @@
 # Ollama — Local AI Reference
 
-Ollama runs on Mike's workstation (DESKTOP-0O8A1RL) with GPU acceleration. Available to all team members via Tailscale.
+Ollama's always-on host is **GURU-BEAST-ROG** (RTX 4090, 24 GB VRAM, Tailscale `100.101.122.4`). It is the canonical Tailscale fallback for all machines without a local Ollama. DESKTOP-0O8A1RL and other workstations use local when available, Beast otherwise.
 
 ## Models
 
@@ -30,19 +30,22 @@ qwen3:14b and qwen3.6 are CPU-bottlenecked on this machine (split mode, PCIe ban
 
 | Machine | GPU VRAM | Prose model |
 |---------|----------|-------------|
-| DESKTOP-0O8A1RL | 12 GB (RTX 5070 Ti Laptop) | `qwen3:8b` |
+| GURU-BEAST-ROG | 24 GB (RTX 4090) | `qwen3:14b` (always-on Tailscale host — `100.101.122.4`) |
+| DESKTOP-0O8A1RL | 12 GB (RTX 5070 Ti Laptop) | `qwen3:8b` (local — 4.8x faster than 14b here) |
 | Mikes-MacBook-Air | unified memory | `qwen3:14b` |
 | HOWARD-HOME | local Ollama | `qwen3:14b` |
-| GURU-KALI | 8 GB (RTX 4070 Mobile) — see note | remote / `qwen3:14b` now; `qwen3:8b` if local installed |
-| Other | Tailscale fallback | `qwen3:14b` |
+| GURU-KALI | 8 GB (RTX 4070 Mobile) — see note | remote Beast / `qwen3:14b` now; `qwen3:8b` if local installed |
+| Other | Tailscale fallback (Beast) | `qwen3:14b` |
 
 > **GURU-KALI status (2026-05-24):** Tailscale installed — remote Ollama
-> (`100.92.127.64`) IS reachable, so it uses the Tailscale-fallback prose model
-> `qwen3:14b` (the "Other" row). No local Ollama yet. It has the strongest hardware
-> in the fleet, but the GPU runs the nouveau driver (no CUDA), so a future local
+> (Beast at `100.101.122.4`) is reachable, so it uses the Tailscale-fallback prose model
+> `qwen3:14b` (the "Other" row). No local Ollama yet. It has strong hardware
+> but the GPU runs the nouveau driver (no CUDA), so a future local
 > Ollama would need the proprietary NVIDIA driver for GPU accel; `qwen3:8b` would
 > then fit its 8 GB VRAM (mirrors DESKTOP-0O8A1RL), with larger models splitting to
 > CPU. Full machine profile: `.claude/machines/guru-kali.md`.
+>
+> **GURU-BEAST-ROG models (2026-05-25):** `gemma3:27b`, `qwen3:32b`, `qwen3:14b`, `codestral:22b`, `nomic-embed-text`. Note: `qwen3.6:latest` and `qwen3:8b` not yet installed — add if strict-format or speed routing is needed.
 
 ## Endpoints
 
@@ -53,17 +56,18 @@ Auto-detect: any machine that has a local Ollama listening on `127.0.0.1:11434`
 if curl -s -m 2 http://localhost:11434/api/tags >/dev/null 2>&1; then
 OLLAMA="http://localhost:11434"
 else
-OLLAMA="http://100.92.127.64:11434"
+OLLAMA="http://100.101.122.4:11434"
 fi
 ```
 
 Rationale:
-- **Mike's workstation (DESKTOP-0O8A1RL):** local matches, no change.
-- **HOWARD-HOME:** also has a local Ollama with the canonical model set (confirmed 2026-04-22). Uses local — faster, zero Tailscale hop, no load on Mike's GPU.
-- **Other team machines:** no local Ollama → falls back to Mike's over Tailscale.
-- **Mike's machine offline:** graceful degradation — local users continue working; non-local users get a clean timeout.
+- **DESKTOP-0O8A1RL:** local matches, uses local Ollama — faster, no Tailscale hop.
+- **HOWARD-HOME:** also has a local Ollama with the canonical model set (confirmed 2026-04-22). Uses local — faster, zero Tailscale hop.
+- **GURU-BEAST-ROG:** always-on; the canonical fallback for all machines without a local Ollama.
+- **Other team machines:** no local Ollama → falls back to Beast over Tailscale.
+- **Beast offline (rare):** graceful degradation — local Ollama users continue; remote users get a clean timeout.
 
-Manual override (for testing or explicit preference): set `OLLAMA=http://100.92.127.64:11434` before the call.
+Manual override (for testing or explicit preference): set `OLLAMA=http://100.101.122.4:11434` before the call.
 
 Check reachability:
 ```bash
@@ -86,7 +90,7 @@ Preferred one-liner:
 ```bash
 python -c "
 import urllib.request, json, sys, os
-OLLAMA = os.environ.get('OLLAMA') or ('http://localhost:11434' if __import__('urllib.request').request.urlopen(urllib.request.Request('http://localhost:11434/api/tags'),timeout=2) else 'http://100.92.127.64:11434')
+OLLAMA = os.environ.get('OLLAMA') or ('http://localhost:11434' if __import__('urllib.request').request.urlopen(urllib.request.Request('http://localhost:11434/api/tags'),timeout=2) else 'http://100.101.122.4:11434')
 body = json.dumps({
 'model':'qwen3:14b',
 'messages':[{'role':'user','content': sys.argv[1]}],

Submodule projects/msp-tools/guru-rmm updated: 1ed55964db...4b3c278daa
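A note on the Python one-liner in the last hunk: it uses `urlopen(...)` as the condition of a conditional expression. When nothing is listening on localhost, `urllib.request.urlopen` raises `urllib.error.URLError` (or a timeout error) instead of returning a falsy value, so the Beast branch cannot be reached unless `OLLAMA` is already set. Below is a minimal sketch of the intended local-first, Beast-fallback selection, assuming the endpoints and per-machine prose models from the tables in this commit. The helper names (`local_ollama_up`, `pick_endpoint`, `prose_model`) are hypothetical and are not part of `.claude/OLLAMA.md`.

```python
import http.client
import os
import urllib.request

# Endpoints taken from the routing tables in this commit (assumed current).
LOCAL = "http://localhost:11434"
BEAST = "http://100.101.122.4:11434"  # GURU-BEAST-ROG over Tailscale


def local_ollama_up(timeout: float = 2.0) -> bool:
    """Return True if a local Ollama answers GET /api/tags within `timeout` seconds."""
    try:
        with urllib.request.urlopen(f"{LOCAL}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError and socket timeouts are OSError subclasses, so
        # "refused", "unreachable" and "too slow" all land here; HTTPException
        # covers a non-HTTP service squatting on the port.
        return False


def pick_endpoint(env=None, probe=local_ollama_up) -> str:
    """An explicit $OLLAMA wins; otherwise use local if reachable, else Beast."""
    env = os.environ if env is None else env
    override = env.get("OLLAMA")
    if override:
        return override
    return LOCAL if probe() else BEAST


def prose_model(hostname: str, endpoint: str) -> str:
    """Prose model per the table: qwen3:8b only on DESKTOP-0O8A1RL's local Ollama.

    Beast does not have qwen3:8b installed yet, so remote calls use qwen3:14b.
    """
    if hostname == "DESKTOP-0O8A1RL" and endpoint == LOCAL:
        return "qwen3:8b"
    return "qwen3:14b"
```

Passing `env` and `probe` explicitly keeps the selection logic testable without touching the network. On a machine with no local Ollama and no `OLLAMA` set, `pick_endpoint()` returns the Beast URL, and `prose_model` then falls through to `qwen3:14b`.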