From 8bdb9197c428a290be933ac30c5822e029d72275 Mon Sep 17 00:00:00 2001
From: Mike Swanson <mike@azcomputerguru.com>
Date: Tue, 26 May 2026 19:59:16 -0700
Subject: [PATCH] sync: auto-sync from GURU-KALI at 2026-05-26 19:59:15

Author: Mike Swanson
Machine: GURU-KALI
Timestamp: 2026-05-26 19:59:15
---
 .claude/CLAUDE.md                             |  8 +++----
 .claude/OLLAMA.md                             | 23 +++++++++++--------
 .../memory/feedback_ollama_tier0_routing.md   | 17 +++++++-------
 .../memory/reference_pluto_build_server.md    |  2 +-
 4 files changed, 27 insertions(+), 23 deletions(-)
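
For reviewers, the resolution order this patch documents in `.claude/OLLAMA.md` can be sketched in Python. This is a minimal sketch under stated assumptions, not the repository's implementation: the function names `read_fallback`, `is_up`, and `resolve_endpoint`, the injectable `probe` parameter, and the error handling are hypothetical. Only the `ollama_fallback` / `endpoint` keys, the `/api/tags` probe with a 2-second limit, and the local URL come from the patch itself.

```python
import json
import urllib.error
import urllib.request
from pathlib import Path

LOCAL = "http://localhost:11434"  # local endpoint named in the patch


def read_fallback(identity_path=".claude/identity.json"):
    """Return the per-machine fallback endpoint, or "" if none is configured."""
    try:
        data = json.loads(Path(identity_path).read_text())
    except (OSError, ValueError):
        return ""  # missing or malformed identity.json: treat as no fallback
    if not isinstance(data, dict):
        return ""
    fallback = data.get("ollama_fallback") or {}
    if not isinstance(fallback, dict):
        return ""
    return fallback.get("endpoint", "") or ""


def is_up(base_url, timeout=2.0):
    """Probe /api/tags, roughly like `curl -s -m 2 "$LOCAL/api/tags"`."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server answered; curl without -f accepts this too
    except Exception:
        return False  # refused, timed out, or otherwise unreachable


def resolve_endpoint(identity_path=".claude/identity.json", probe=is_up):
    """Local if reachable, else the configured fallback, else local only."""
    if probe(LOCAL):
        return LOCAL
    return read_fallback(identity_path) or LOCAL
```

Passing `probe` explicitly (an assumption of this sketch) lets the resolution order be exercised without a running Ollama; calling `resolve_endpoint()` with no arguments probes localhost as the bash version does.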

diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md
index f6c7bff..5c9cd71 100644
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@@ -290,11 +290,11 @@ Tier 0 — **Ollama is the documentation and classification engine.** Route pros
 
 | Machine | Endpoint |
 |---------|----------|
-| DESKTOP-0O8A1RL | `http://localhost:11434` |
-| GURU-BEAST-ROG | `http://localhost:11434` (always-on; canonical Tailscale fallback) |
-| Other | `http://100.101.122.4:11434` (Beast via Tailscale) |
+| Local Ollama present (GURU-5070, Howard-Home, etc.) | `http://localhost:11434` |
+| GURU-BEAST-ROG | `http://localhost:11434` (always-on RTX 4090; common fallback target) |
+| No local Ollama | per-machine fallback from `.claude/identity.json` `ollama_fallback` (e.g. Beast `http://100.101.122.4:11434`) |
 
-Models: `qwen3.6:latest` (strict-format: JSON, classification, structured rules, redaction, word-limited summaries), `qwen3:8b` (prose on DESKTOP-0O8A1RL — 86 tok/s, full 12 GB VRAM fit), `qwen3:14b` (prose everywhere else — ~66 tok/s), `codestral:22b` (code suggestions — always review). Tailscale fallback host: **GURU-BEAST-ROG** (`100.101.122.4:11434`, always on, RTX 4090). Full reference + per-machine routing: `.claude/OLLAMA.md`
+Models: `qwen3.6:latest` (strict-format: JSON, classification, structured rules, redaction, word-limited summaries), `qwen3:8b` (prose on DESKTOP-0O8A1RL — 86 tok/s, full 12 GB VRAM fit), `qwen3:14b` (prose everywhere else — ~66 tok/s), `codestral:22b` (code suggestions — always review). Fallback host is per-machine — set in `.claude/identity.json` `ollama_fallback`; GURU-BEAST-ROG (`100.101.122.4:11434`, always on, RTX 4090) is the usual choice. Full reference + per-machine routing: `.claude/OLLAMA.md`
 
 ### GrepAI (Semantic Code Search)
 
diff --git a/.claude/OLLAMA.md b/.claude/OLLAMA.md
index 065be89..d31c7d1 100644
--- a/.claude/OLLAMA.md
+++ b/.claude/OLLAMA.md
@@ -49,23 +49,26 @@ qwen3:14b and qwen3.6 are CPU-bottlenecked on this machine (split mode, PCIe ban
 
 ## Endpoints
 
-Auto-detect: any machine that has a local Ollama listening on `127.0.0.1:11434` uses local. Otherwise fall back to Mike's workstation over Tailscale.
+Auto-detect: any machine with a local Ollama on `127.0.0.1:11434` uses local. Otherwise it falls back to the **per-machine** host in `.claude/identity.json` `ollama_fallback` — each user/machine chooses its own (GURU-BEAST-ROG, the always-on RTX 4090, is the usual pick).
 
 ```bash
-# Preferred universal resolver — works on any machine
-if curl -s -m 2 http://localhost:11434/api/tags >/dev/null 2>&1; then
-    OLLAMA="http://localhost:11434"
+# Universal resolver — the fallback is per-machine, read from identity.json (never hardcode)
+LOCAL="http://localhost:11434"
+FALLBACK=$(python3 -c "import json;print((json.load(open('.claude/identity.json')).get('ollama_fallback') or {}).get('endpoint',''))" 2>/dev/null)
+if curl -s -m 2 "$LOCAL/api/tags" >/dev/null 2>&1; then
+    OLLAMA="$LOCAL"
+elif [ -n "$FALLBACK" ]; then
+    OLLAMA="$FALLBACK"          # e.g. GURU-KALI -> http://100.101.122.4:11434 (GURU-BEAST-ROG)
 else
-    OLLAMA="http://100.101.122.4:11434"
+    OLLAMA="$LOCAL"             # no fallback configured — local only
 fi
 ```
 
 Rationale:
-- **DESKTOP-0O8A1RL:** local matches, uses local Ollama — faster, no Tailscale hop.
-- **HOWARD-HOME:** also has a local Ollama with the canonical model set (confirmed 2026-04-22). Uses local — faster, zero Tailscale hop.
-- **GURU-BEAST-ROG:** always-on; the canonical fallback for all machines without a local Ollama.
-- **Other team machines:** no local Ollama → falls back to Beast over Tailscale.
-- **Beast offline (rare):** graceful degradation — local Ollama users continue; remote users get a clean timeout.
+- **Local-Ollama machines** (e.g. Howard-Home, with the canonical model set) use local (faster, zero Tailscale hop); leave `ollama_fallback` unset or pointed at localhost.
+- **GURU-BEAST-ROG:** always-on RTX 4090; the usual `ollama_fallback` target for machines without local models.
+- **Machines without local Ollama** set `ollama_fallback` in identity.json to the host they want (commonly Beast over Tailscale).
+- **Fallback offline (rare):** graceful degradation — local users continue; remote users get a clean timeout.
 
 Manual override (for testing or explicit preference): set `OLLAMA=http://100.101.122.4:11434` before the call.
 
diff --git a/.claude/memory/feedback_ollama_tier0_routing.md b/.claude/memory/feedback_ollama_tier0_routing.md
index 8df915d..2504251 100644
--- a/.claude/memory/feedback_ollama_tier0_routing.md
+++ b/.claude/memory/feedback_ollama_tier0_routing.md
@@ -15,18 +15,19 @@ Route Tier-0 tasks (summaries, classifications, drafts, extractions) through Oll
 - Suggesting refactors / generating docstrings → codestral:22b (then review)
 - NEVER for: auth decisions, credential handling, production migrations, security review, citation work, production-change scripts
 
-**Endpoint resolution (updated 2026-04-22 in `.claude/OLLAMA.md`):**
+**Endpoint resolution — the remote fallback is a PER-MACHINE choice in `.claude/identity.json` `ollama_fallback`, never hardcoded:**
 ```bash
-if curl -s -m 2 http://localhost:11434/api/tags >/dev/null 2>&1; then
-    OLLAMA="http://localhost:11434"
+LOCAL="http://localhost:11434"
+FALLBACK=$(python3 -c "import json;print((json.load(open('.claude/identity.json')).get('ollama_fallback') or {}).get('endpoint',''))" 2>/dev/null)
+if curl -s -m 2 "$LOCAL/api/tags" >/dev/null 2>&1; then
+    OLLAMA="$LOCAL"                 # local Ollama is up — use it
+elif [ -n "$FALLBACK" ]; then
+    OLLAMA="$FALLBACK"              # per-machine fallback from identity.json
 else
-    OLLAMA="http://100.92.127.64:11434"
+    OLLAMA="$LOCAL"                 # no fallback configured — local only
 fi
 ```
-
-[DISCREPANCY 2026-05-26 — CLAUDE.md gives the canonical always-on Tailscale fallback as GURU-BEAST-ROG @ 100.101.122.4. Defer to CLAUDE.md; Mike to confirm which is correct.]
-
-Howard-Home has the canonical models loaded locally (qwen3:14b, codestral:22b, nomic-embed-text, plus bonus qwen3-coder:30b) — so Howard-Home uses local Ollama, not Mike's. Zero Tailscale hop.
+Each machine sets its own `ollama_fallback` in identity.json, e.g. `{"host":"GURU-BEAST-ROG","endpoint":"http://100.101.122.4:11434"}`. GURU-BEAST-ROG (RTX 4090, always on) is the usual choice; GURU-KALI is set to it (confirmed 2026-05-26). A machine with local models loaded (e.g. Howard-Home: qwen3:14b, codestral:22b, nomic-embed-text, qwen3-coder:30b) can leave `ollama_fallback` unset or pointed at localhost, which means zero Tailscale hop. Do NOT bake a fallback IP into shared files (memory, OLLAMA.md, CLAUDE.md); read it from identity.json.
 
 **Call pattern for qwen3 — use `/api/chat` with `think:false`**, NOT `/api/generate`. qwen3 on generate endpoint dumps reasoning into internal thinking tokens and returns empty `response` field. Chat endpoint with `think:false` returns clean content in `message.content`:
 
diff --git a/.claude/memory/reference_pluto_build_server.md b/.claude/memory/reference_pluto_build_server.md
index fe2ea74..e9c1357 100644
--- a/.claude/memory/reference_pluto_build_server.md
+++ b/.claude/memory/reference_pluto_build_server.md
@@ -9,7 +9,7 @@ Pluto is a Windows Server VM on Jupiter. It is the **general-purpose Windows bui
 - **Hostname:** PLUTO (VM on Jupiter)
 - **Static IP:** 172.16.3.36 (confirmed static 2026-04-19)
 - **SSH:** `ssh -i ~/.ssh/id_ed25519 Administrator@172.16.3.36` (key auth)
-- **Authorized key:** `ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAINXR2BOcFAlOPuB7OYOKfOZDNd3u1tCt/IINRH9beFyB guru@DESKTOP-0O8A1RL` [STALE 2026-05-26 — DESKTOP-0O8A1RL is RETIRED. VERIFY GURU-5070's key is authorized on Pluto and rotate out the old key.]
+- **Authorized keys (verified via RMM 2026-05-26):** `gururmm-build@gururmm-server` and `guru@gururmm-build` (the build server's keys), present in both `C:\ProgramData\ssh\administrators_authorized_keys` and `Administrator\.ssh\authorized_keys`. The old `guru@DESKTOP-0O8A1RL` key (retired machine) has already been rotated out. NOTE: no personal-workstation key (e.g. GURU-5070) is currently authorized — the `ssh -i ~/.ssh/id_ed25519 Administrator@172.16.3.36` workflow below works only from a host whose pubkey is in the file; add GURU-5070's pubkey to `administrators_authorized_keys` if you need direct workstation SSH.
 
 ## Installed Toolchain