From 2491660b880c73aef26c7d4f886db8a2b74b3fe4 Mon Sep 17 00:00:00 2001
From: Mike Swanson
Date: Mon, 25 May 2026 06:00:47 -0700
Subject: [PATCH] sync: auto-sync from GURU-5070 at 2026-05-25 06:00:45

Author: Mike Swanson
Machine: GURU-5070
Timestamp: 2026-05-25 06:00:45
---
 .claude/CLAUDE.md           |  5 +++--
 .claude/OLLAMA.md           | 32 ++++++++++++++++++--------------
 projects/msp-tools/guru-rmm |  2 +-
 3 files changed, 22 insertions(+), 17 deletions(-)

diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md
index 997ba06..40f37ac 100644
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@@ -282,9 +282,10 @@ Tier 0 — **Ollama is the documentation and classification engine.** Route pros
 | Machine | Endpoint |
 |---------|----------|
 | DESKTOP-0O8A1RL | `http://localhost:11434` |
-| Other | `http://100.92.127.64:11434` (Tailscale) |
+| GURU-BEAST-ROG | `http://localhost:11434` (always-on; canonical Tailscale fallback) |
+| Other | `http://100.101.122.4:11434` (Beast via Tailscale) |
 
-Models: `qwen3.6:latest` (strict-format: JSON, classification, structured rules, redaction, word-limited summaries), `qwen3:8b` (prose on DESKTOP-0O8A1RL — 86 tok/s, full 12 GB VRAM fit), `qwen3:14b` (prose everywhere else — ~66 tok/s), `codestral:22b` (code suggestions — always review). Full reference + per-machine routing: `.claude/OLLAMA.md`
+Models: `qwen3.6:latest` (strict-format: JSON, classification, structured rules, redaction, word-limited summaries), `qwen3:8b` (prose on DESKTOP-0O8A1RL — 86 tok/s, full 12 GB VRAM fit), `qwen3:14b` (prose everywhere else — ~66 tok/s), `codestral:22b` (code suggestions — always review). Tailscale fallback host: **GURU-BEAST-ROG** (`100.101.122.4:11434`, always on, RTX 4090). Full reference + per-machine routing: `.claude/OLLAMA.md`
 
 ### GrepAI (Semantic Code Search)
 
diff --git a/.claude/OLLAMA.md b/.claude/OLLAMA.md
index ada0705..065be89 100644
--- a/.claude/OLLAMA.md
+++ b/.claude/OLLAMA.md
@@ -1,6 +1,6 @@
 # Ollama — Local AI Reference
 
-Ollama runs on Mike's workstation (DESKTOP-0O8A1RL) with GPU acceleration. Available to all team members via Tailscale.
+Ollama's always-on host is **GURU-BEAST-ROG** (RTX 4090, 24 GB VRAM, Tailscale `100.101.122.4`). It is the canonical Tailscale fallback for all machines without a local Ollama. DESKTOP-0O8A1RL and other workstations use local when available, Beast otherwise.
 
 ## Models
 
@@ -30,19 +30,22 @@ qwen3:14b and qwen3.6 are CPU-bottlenecked on this machine (split mode, PCIe ban
 
 | Machine | GPU VRAM | Prose model |
 |---------|----------|-------------|
-| DESKTOP-0O8A1RL | 12 GB (RTX 5070 Ti Laptop) | `qwen3:8b` |
+| GURU-BEAST-ROG | 24 GB (RTX 4090) | `qwen3:14b` (always-on Tailscale host — `100.101.122.4`) |
+| DESKTOP-0O8A1RL | 12 GB (RTX 5070 Ti Laptop) | `qwen3:8b` (local — 4.8x faster than 14b here) |
 | Mikes-MacBook-Air | unified memory | `qwen3:14b` |
 | HOWARD-HOME | local Ollama | `qwen3:14b` |
-| GURU-KALI | 8 GB (RTX 4070 Mobile) — see note | remote / `qwen3:14b` now; `qwen3:8b` if local installed |
-| Other | Tailscale fallback | `qwen3:14b` |
+| GURU-KALI | 8 GB (RTX 4070 Mobile) — see note | remote Beast / `qwen3:14b` now; `qwen3:8b` if local installed |
+| Other | Tailscale fallback (Beast) | `qwen3:14b` |
 
 > **GURU-KALI status (2026-05-24):** Tailscale installed — remote Ollama
-> (`100.92.127.64`) IS reachable, so it uses the Tailscale-fallback prose model
-> `qwen3:14b` (the "Other" row). No local Ollama yet. It has the strongest hardware
-> in the fleet, but the GPU runs the nouveau driver (no CUDA), so a future local
+> (Beast at `100.101.122.4`) is reachable, so it uses the Tailscale-fallback prose model
+> `qwen3:14b` (the "Other" row). No local Ollama yet. It has strong hardware
+> but the GPU runs the nouveau driver (no CUDA), so a future local
 > Ollama would need the proprietary NVIDIA driver for GPU accel; `qwen3:8b` would
 > then fit its 8 GB VRAM (mirrors DESKTOP-0O8A1RL), with larger models splitting to
 > CPU. Full machine profile: `.claude/machines/guru-kali.md`.
+>
+> **GURU-BEAST-ROG models (2026-05-25):** `gemma3:27b`, `qwen3:32b`, `qwen3:14b`, `codestral:22b`, `nomic-embed-text`. Note: `qwen3.6:latest` and `qwen3:8b` not yet installed — add if strict-format or speed routing is needed.
 
 ## Endpoints
 
@@ -53,17 +56,18 @@ Auto-detect: any machine that has a local Ollama listening on `127.0.0.1:11434`
 if curl -s -m 2 http://localhost:11434/api/tags >/dev/null 2>&1; then
   OLLAMA="http://localhost:11434"
 else
-  OLLAMA="http://100.92.127.64:11434"
+  OLLAMA="http://100.101.122.4:11434"
 fi
 ```
 
 Rationale:
-- **Mike's workstation (DESKTOP-0O8A1RL):** local matches, no change.
-- **HOWARD-HOME:** also has a local Ollama with the canonical model set (confirmed 2026-04-22). Uses local — faster, zero Tailscale hop, no load on Mike's GPU.
-- **Other team machines:** no local Ollama → falls back to Mike's over Tailscale.
-- **Mike's machine offline:** graceful degradation — local users continue working; non-local users get a clean timeout.
+- **DESKTOP-0O8A1RL:** local matches, uses local Ollama — faster, no Tailscale hop.
+- **HOWARD-HOME:** also has a local Ollama with the canonical model set (confirmed 2026-04-22). Uses local — faster, zero Tailscale hop.
+- **GURU-BEAST-ROG:** always-on; the canonical fallback for all machines without a local Ollama.
+- **Other team machines:** no local Ollama → falls back to Beast over Tailscale.
+- **Beast offline (rare):** graceful degradation — local Ollama users continue; remote users get a clean timeout.
 
-Manual override (for testing or explicit preference): set `OLLAMA=http://100.92.127.64:11434` before the call.
+Manual override (for testing or explicit preference): set `OLLAMA=http://100.101.122.4:11434` before the call.
 
 Check reachability:
 ```bash
@@ -86,7 +90,7 @@ Preferred one-liner:
 ```bash
 python -c "
 import urllib.request, json, sys, os
-OLLAMA = os.environ.get('OLLAMA') or ('http://localhost:11434' if __import__('urllib.request').request.urlopen(urllib.request.Request('http://localhost:11434/api/tags'),timeout=2) else 'http://100.92.127.64:11434')
+OLLAMA = os.environ.get('OLLAMA') or ('http://localhost:11434' if __import__('urllib.request').request.urlopen(urllib.request.Request('http://localhost:11434/api/tags'),timeout=2) else 'http://100.101.122.4:11434')
 body = json.dumps({
     'model':'qwen3:14b',
     'messages':[{'role':'user','content': sys.argv[1]}],
diff --git a/projects/msp-tools/guru-rmm b/projects/msp-tools/guru-rmm
index 1ed5596..4b3c278 160000
--- a/projects/msp-tools/guru-rmm
+++ b/projects/msp-tools/guru-rmm
@@ -1 +1 @@
-Subproject commit 1ed55964db77d3964b330370b4e68de6fce2c3d6
+Subproject commit 4b3c278daa7ee2ed924dacce5835327cc9c3f30c
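
Review note on the "Preferred one-liner" hunk in `.claude/OLLAMA.md`: `urllib.request.urlopen` raises (for example `URLError`) when nothing is listening on `localhost:11434`, and returns a response object on success, so the `else` branch of that conditional expression appears unreachable. A machine without a local Ollama would likely see a traceback instead of falling back to Beast. A minimal sketch of the same auto-detect with the failure handled explicitly follows. The names `LOCAL`, `FALLBACK`, `is_reachable`, and `pick_endpoint` are hypothetical and not part of this patch; the addresses, the `OLLAMA` override, and the `/api/tags` probe with a 2-second timeout are taken from it.

```python
import http.client
import os
import urllib.request

LOCAL = "http://localhost:11434"
FALLBACK = "http://100.101.122.4:11434"  # Beast via Tailscale, as stated in the patch


def is_reachable(base_url, timeout=2.0):
    """Return True if GET {base_url}/api/tags answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False


def pick_endpoint(env=None, probe=is_reachable):
    """Mirror the documented order: OLLAMA override, then local, then Beast."""
    env = os.environ if env is None else env
    override = env.get("OLLAMA")
    if override:
        return override
    return LOCAL if probe(LOCAL) else FALLBACK


if __name__ == "__main__":
    print(pick_endpoint())
```

The `probe` parameter exists only so the selection order can be checked without touching the network; callers would normally use `pick_endpoint()` with no arguments.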