sync: auto-sync from GURU-5070 at 2026-05-25 06:00:45

Author: Mike Swanson Machine: GURU-5070 Timestamp: 2026-05-25 06:00:45
2026-05-25 06:00:47 -07:00
parent cdd6e6fc8c
commit 2491660b88
3 changed files with 22 additions and 17 deletions
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@@ -282,9 +282,10 @@ Tier 0 — **Ollama is the documentation and classification engine.** Route pros
 | Machine | Endpoint |
 |---------|----------|
 | DESKTOP-0O8A1RL | `http://localhost:11434` |
-| Other | `http://100.92.127.64:11434` (Tailscale) |
+| GURU-BEAST-ROG | `http://localhost:11434` (always-on; canonical Tailscale fallback) |
+| Other | `http://100.101.122.4:11434` (Beast via Tailscale) |

-Models: `qwen3.6:latest` (strict-format: JSON, classification, structured rules, redaction, word-limited summaries), `qwen3:8b` (prose on DESKTOP-0O8A1RL — 86 tok/s, full 12 GB VRAM fit), `qwen3:14b` (prose everywhere else — ~66 tok/s), `codestral:22b` (code suggestions — always review). Full reference + per-machine routing: `.claude/OLLAMA.md`
+Models: `qwen3.6:latest` (strict-format: JSON, classification, structured rules, redaction, word-limited summaries), `qwen3:8b` (prose on DESKTOP-0O8A1RL — 86 tok/s, full 12 GB VRAM fit), `qwen3:14b` (prose everywhere else — ~66 tok/s), `codestral:22b` (code suggestions — always review). Tailscale fallback host: **GURU-BEAST-ROG** (`100.101.122.4:11434`, always on, RTX 4090). Full reference + per-machine routing: `.claude/OLLAMA.md`

 ### GrepAI (Semantic Code Search)

--- a/.claude/OLLAMA.md
+++ b/.claude/OLLAMA.md
@@ -1,6 +1,6 @@
 # Ollama — Local AI Reference

-Ollama runs on Mike's workstation (DESKTOP-0O8A1RL) with GPU acceleration. Available to all team members via Tailscale.
+Ollama's always-on host is **GURU-BEAST-ROG** (RTX 4090, 24 GB VRAM, Tailscale `100.101.122.4`). It is the canonical Tailscale fallback for all machines without a local Ollama. DESKTOP-0O8A1RL and other workstations use local when available, Beast otherwise.

 ## Models

@@ -30,19 +30,22 @@ qwen3:14b and qwen3.6 are CPU-bottlenecked on this machine (split mode, PCIe ban

 | Machine | GPU VRAM | Prose model |
 |---------|----------|-------------|
-| DESKTOP-0O8A1RL | 12 GB (RTX 5070 Ti Laptop) | `qwen3:8b` |
+| GURU-BEAST-ROG | 24 GB (RTX 4090) | `qwen3:14b` (always-on Tailscale host — `100.101.122.4`) |
+| DESKTOP-0O8A1RL | 12 GB (RTX 5070 Ti Laptop) | `qwen3:8b` (local — 4.8x faster than 14b here) |
 | Mikes-MacBook-Air | unified memory | `qwen3:14b` |
 | HOWARD-HOME | local Ollama | `qwen3:14b` |
-| GURU-KALI | 8 GB (RTX 4070 Mobile) — see note | remote / `qwen3:14b` now; `qwen3:8b` if local installed |
-| Other | Tailscale fallback | `qwen3:14b` |
+| GURU-KALI | 8 GB (RTX 4070 Mobile) — see note | remote Beast / `qwen3:14b` now; `qwen3:8b` if local installed |
+| Other | Tailscale fallback (Beast) | `qwen3:14b` |

 > **GURU-KALI status (2026-05-24):** Tailscale installed — remote Ollama
-> (`100.92.127.64`) IS reachable, so it uses the Tailscale-fallback prose model
-> `qwen3:14b` (the "Other" row). No local Ollama yet. It has the strongest hardware
-> in the fleet, but the GPU runs the nouveau driver (no CUDA), so a future local
+> (Beast at `100.101.122.4`) is reachable, so it uses the Tailscale-fallback prose model
+> `qwen3:14b` (the "Other" row). No local Ollama yet. It has strong hardware
+> but the GPU runs the nouveau driver (no CUDA), so a future local
 > Ollama would need the proprietary NVIDIA driver for GPU accel; `qwen3:8b` would
 > then fit its 8 GB VRAM (mirrors DESKTOP-0O8A1RL), with larger models splitting to
 > CPU. Full machine profile: `.claude/machines/guru-kali.md`.
+>
+> **GURU-BEAST-ROG models (2026-05-25):** `gemma3:27b`, `qwen3:32b`, `qwen3:14b`, `codestral:22b`, `nomic-embed-text`. Note: `qwen3.6:latest` and `qwen3:8b` not yet installed — add if strict-format or speed routing is needed.

 ## Endpoints

@@ -53,17 +56,18 @@ Auto-detect: any machine that has a local Ollama listening on `127.0.0.1:11434`
 if curl -s -m 2 http://localhost:11434/api/tags >/dev/null 2>&1; then
    OLLAMA="http://localhost:11434"
 else
-    OLLAMA="http://100.92.127.64:11434"
+    OLLAMA="http://100.101.122.4:11434"
 fi
 ```

 Rationale:
-- **Mike's workstation (DESKTOP-0O8A1RL):** local matches, no change.
-- **HOWARD-HOME:** also has a local Ollama with the canonical model set (confirmed 2026-04-22). Uses local — faster, zero Tailscale hop, no load on Mike's GPU.
-- **Other team machines:** no local Ollama → falls back to Mike's over Tailscale.
-- **Mike's machine offline:** graceful degradation — local users continue working; non-local users get a clean timeout.
+- **DESKTOP-0O8A1RL:** local matches, uses local Ollama — faster, no Tailscale hop.
+- **HOWARD-HOME:** also has a local Ollama with the canonical model set (confirmed 2026-04-22). Uses local — faster, zero Tailscale hop.
+- **GURU-BEAST-ROG:** always-on; the canonical fallback for all machines without a local Ollama.
+- **Other team machines:** no local Ollama → falls back to Beast over Tailscale.
+- **Beast offline (rare):** graceful degradation — local Ollama users continue; remote users get a clean timeout.

-Manual override (for testing or explicit preference): set `OLLAMA=http://100.92.127.64:11434` before the call.
+Manual override (for testing or explicit preference): set `OLLAMA=http://100.101.122.4:11434` before the call.

 Check reachability:
 ```bash
@@ -86,7 +90,7 @@ Preferred one-liner:
 ```bash
 python -c "
 import urllib.request, json, sys, os
-OLLAMA = os.environ.get('OLLAMA') or ('http://localhost:11434' if __import__('urllib.request').request.urlopen(urllib.request.Request('http://localhost:11434/api/tags'),timeout=2) else 'http://100.92.127.64:11434')
+OLLAMA = os.environ.get('OLLAMA') or ('http://localhost:11434' if __import__('urllib.request').request.urlopen(urllib.request.Request('http://localhost:11434/api/tags'),timeout=2) else 'http://100.101.122.4:11434')
 body = json.dumps({
  'model':'qwen3:14b',
  'messages':[{'role':'user','content': sys.argv[1]}],
--- a/projects/msp-tools/guru-rmm
+++ b/projects/msp-tools/guru-rmm
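
Review note on the `.claude/OLLAMA.md` one-liner touched above: it uses the result of `urlopen(...)` as the condition of a conditional expression, but `urllib.request.urlopen` raises an exception on connection failure instead of returning a falsy value, so the Tailscale fallback branch is likely never reached when no local Ollama is listening. A minimal sketch of the same auto-detect order (the `OLLAMA` override, then localhost, then Beast) with the probe wrapped in exception handling follows. The helper names `is_reachable` and `pick_ollama` and the injectable `probe` parameter are hypothetical and not part of the repository; the URLs are the ones in this diff.

```python
import os
import urllib.error
import urllib.request

LOCAL = "http://localhost:11434"
FALLBACK = "http://100.101.122.4:11434"  # Beast via Tailscale, as in this diff


def is_reachable(base_url, timeout=2.0):
    """Return True if GET {base_url}/api/tags completes without an error."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout):
            return True
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, DNS failure, HTTP error, or malformed URL.
        return False


def pick_ollama(env=None, probe=is_reachable):
    """Choose an endpoint: OLLAMA override first, then local, then fallback."""
    env = os.environ if env is None else env
    override = env.get("OLLAMA")
    if override:
        return override
    return LOCAL if probe(LOCAL) else FALLBACK


if __name__ == "__main__":
    print(pick_ollama())
```

Passing `probe` as a parameter is only there so the selection logic can be exercised without a network; a real call would use the default.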