From 32ea783c3138e1842852c52d13a0e343a88f1814 Mon Sep 17 00:00:00 2001 From: Mike Swanson Date: Fri, 12 Jun 2026 08:27:31 -0700 Subject: [PATCH] sync: auto-sync from GURU-5070 at 2026-06-12 08:27:16 Author: Mike Swanson Machine: GURU-5070 Timestamp: 2026-06-12 08:27:16 --- ...6-12-mike-rmm-claude-logs-vm-decom-wiki.md | 92 +++++++++++++++++++ 1 file changed, 92 insertions(+) create mode 100644 session-logs/2026-06-12-mike-rmm-claude-logs-vm-decom-wiki.md diff --git a/session-logs/2026-06-12-mike-rmm-claude-logs-vm-decom-wiki.md b/session-logs/2026-06-12-mike-rmm-claude-logs-vm-decom-wiki.md new file mode 100644 index 0000000..d88fceb --- /dev/null +++ b/session-logs/2026-06-12-mike-rmm-claude-logs-vm-decom-wiki.md @@ -0,0 +1,92 @@ +# 2026-06-12 — GuruRMM log-analysis → Claude Haiku, old-VM decommission, wiki VM/build-chain sweep + +## User +- **User:** Mike Swanson (mike) +- **Machine:** GURU-5070 +- **Role:** admin + +## Summary +Started from a `gururmm_server::api::logs` error: "Log analysis unavailable: Ollama unreachable +(http://100.101.122.4:11434/api/chat)". Diagnosed, fixed, and shipped a cutover off Ollama; then +decommissioned the old GuruRMM VM, swept stale "VM" + Windows-build-chain framing across wiki/memory, +ran wiki-lint, compiled 5 missing wiki articles, seeded GuruConnect, and patched the wiki-compile skill. + +## Root cause (the headline) +"Ollama unreachable" was a **mislabeled 120s reqwest timeout, NOT a reachability problem.** +- `100.101.122.4` already IS Beast (GURU-BEAST-ROG). The server `.30` reaches Beast fine for + `/api/tags` and short warm `/api/chat` (warm "say OK" = 1.1s), but a fleet-sized `/api/chat` + (~1500 logs) never completes — warm fleet-size curl from `.30` hit the 300s ceiling. +- Cause: qwen3:14b minutes-long inference on a big prompt over a flaky cross-LAN tailnet + (`.30` behind symmetric NAT `MappingVariesByDestIP:true`; Beast on Wi-Fi `10.2.51.228`). + reqwest's 120s `.timeout()` surfaced as "error sending request … Check Tailscale". +- Beast also had a duplicate-Ollama bind conflict (tray app's `ollama serve` couldn't bind 11434; + older PID 14144 held `0.0.0.0:11434` and served) — noisy, not the cause. + +## Fix shipped — log analysis now uses Claude Haiku 4.5 +- `server/src/api/logs.rs`: `analyze_logs_with_ollama` → `analyze_logs_with_claude`. POSTs + `https://api.anthropic.com/v1/messages` (plain HTTPS, no tailnet), `x-api-key` from env, + `ANTHROPIC_API_KEY` (required) + `ANTHROPIC_MODEL` (default `claude-haiku-4-5`). Structured + outputs (`output_config.format` + json_schema) → guaranteed-parseable findings JSON. +- Verified end-to-end against Haiku, then live via `/api/logs/analyze`: **1500 logs → 10 findings + in 24s** (was timing out at 120s). The findings even surfaced the old self-logged "Ollama + unreachable" errors — the problem we just fixed. +- ZDR (zero data retention) requested from Anthropic, **pending**. Test fleet OK in the meantime. + +## Old VM decommissioned + mgmt IP dropped +- Migration to the physical box completed 2026-06-11; soak signed off → deleted the parked old VM. +- Jupiter (172.16.3.20): `virsh destroy GuruRMM` + `undefine` + removed `/mnt/user/domains/GuruRMM` + (vdisk1.img, 64 GB). Did NOT use `--remove-all-storage` (would have wiped the shared Ubuntu ISO). + `.46` down; Pluto (`Claude-Builder`) + ISO intact. +- `.30` netplan: removed the secondary `172.16.3.47/22` (backup saved `.bak`), `netplan apply`; + `eno1` now carries only `172.16.3.30`. Route + service intact, 216 agents. + +## Wiki / knowledge sweeps +- Retired "GuruRMM VM / Linux VM on Jupiter" framing → `.30` is a **physical box** (Lenovo + ThinkCentre M83, Ubuntu 26.04): overview, index, jupiter, gururmm-build, internal-infrastructure, + POWER_FAILURE_RUNBOOK + 4 memory files. HOST_MIGRATION_RUNBOOK header flipped to COMPLETE. +- Windows build chain corrected everywhere: **Beast PRIMARY, Pluto FALLBACK** + (`attempt_build beast || attempt_build pluto`, verified in build-windows.sh): index, overview, + projects/gururmm, systems/pluto, internal-infrastructure + reference_pluto_build_server memory + + build-pipeline doc comments. +- `/wiki-lint` run: found+fixed 2 missed gaps (internal-infrastructure + pluto backlink). Pre-existing + backlog (missing/broken/index) left as-is. `guru-rmm.md` is an intentional redirect tombstone (kept); + the tailscale-enroll.ps1 "dead" link was a false positive (script exists). +- Compiled 5 missing articles via 5 parallel Sonnet sub-agents: clients gonzvar-tax-services, + tohono-oodham-doit (Syncro 33069069), tucson-golden-corral (3859123); projects gururmm-agent + (artifact-based), msp-tools (umbrella). Deduped the duplicate `system:neptune` compile-queue entry. +- Seeded **GuruConnect** wiki article (v0.3.0 production, ScreenConnect-class Rust tool; artifact-based + from guru-connect @ origin/main ded99c5). `[[guruconnect]]` backlinks now resolve. +- Per-client fixes: Gonzvar found via fuzzy `query=` ("Gonzvar Tax Service" singular, id **1830740**, + break-fix ~$175/hr, 6 assets). Golden Corral email = **Neptune Exchange** (per Mike; IX cPanel kept + as a caveat). **TGC-SERVER is colocated at ACG main office** (behind ACG office net, not a naked + public box at the restaurant). +- Patched wiki-compile Phase 2a: fuzzy `query=` + fallback ladder instead of near-exact `name=` + (root cause of the Gonzvar miss). + +## Credentials / access (vault paths — no secrets inlined) +- Anthropic API key (GuruRMM log analysis): vault `projects/gururmm/anthropic-api` `credentials.api_key`. + Deployed to `.30` at `/opt/gururmm/.env` as `ANTHROPIC_API_KEY` (root-only file). +- `.30` SSH: `~/.ssh/gururmm-physical` → `guru@172.16.3.30`; sudo password = vault + `infrastructure/gururmm-server.sops.yaml` `credentials.password` (`sudo -S`). +- Jupiter: `ssh -i ~/.ssh/id_ed25519 root@172.16.3.20`. +- Beast (Windows build PRIMARY): `guru@100.101.122.4` (tailnet), key `~/.ssh/id_ed25519`. + +## Infrastructure touched +- `.30` (physical GuruRMM/build host) — code deploy + .env + restart + netplan. +- Jupiter (.20) — deleted GuruRMM virsh domain. +- Beast (100.101.122.4) — inspected (duplicate Ollama). + +## Files changed / pushed +- gururmm repo: `c869e4d` (logs.rs→Claude), runbook `37c8593`/`8b301bf`, build-pipeline docs `a794a7f`. +- ClaudeTools: multiple — Claude-cutover memory, VM-sweep, build-chain, wiki-lint fixes, 5 compiled + articles + index, GuruConnect, per-client fixes, wiki-compile fuzzy-search fix (through `401ed7d4`). +- vault: `c1c9744` (Anthropic key entry). + +## Pending / follow-ups +- **ZDR** confirmation from Anthropic before pointing a production fleet at the key. +- Golden Corral: reconcile whether IX cPanel mail accounts/forwarders remain vs all-Neptune. +- Gonzvar: confirm contact name/email (Syncro record has phone only). +- `guru-connect` is an untracked standalone clone under `projects/msp-tools/` — consider making it a + proper submodule like guru-rmm. +- Optional: a server-side Ollama fallback in logs.rs (try Claude → fallback) — deferred; Beast path + is no longer used.