sync: auto-sync from GURU-5070 at 2026-06-12 08:27:16

Author: Mike Swanson
Machine: GURU-5070
Timestamp: 2026-06-12 08:27:16
This commit is contained in:
2026-06-12 08:27:31 -07:00
parent 401ed7d4e0
commit 32ea783c31

View File

@@ -0,0 +1,92 @@
# 2026-06-12 — GuruRMM log-analysis → Claude Haiku, old-VM decommission, wiki VM/build-chain sweep
## User
- **User:** Mike Swanson (mike)
- **Machine:** GURU-5070
- **Role:** admin
## Summary
Started from a `gururmm_server::api::logs` error: "Log analysis unavailable: Ollama unreachable
(http://100.101.122.4:11434/api/chat)". Diagnosed, fixed, and shipped a cutover off Ollama; then
decommissioned the old GuruRMM VM, swept stale "VM" + Windows-build-chain framing across wiki/memory,
ran wiki-lint, compiled 5 missing wiki articles, seeded GuruConnect, and patched the wiki-compile skill.
## Root cause (the headline)
"Ollama unreachable" was a **mislabeled 120s reqwest timeout, NOT a reachability problem.**
- `100.101.122.4` already IS Beast (GURU-BEAST-ROG). The server `.30` reaches Beast fine for
`/api/tags` and short warm `/api/chat` (warm "say OK" = 1.1s), but a fleet-sized `/api/chat`
(~1500 logs) never completes — warm fleet-size curl from `.30` hit the 300s ceiling.
- Cause: qwen3:14b minutes-long inference on a big prompt over a flaky cross-LAN tailnet
(`.30` behind symmetric NAT `MappingVariesByDestIP:true`; Beast on Wi-Fi `10.2.51.228`).
reqwest's 120s `.timeout()` surfaced as "error sending request … Check Tailscale".
- Beast also had a duplicate-Ollama bind conflict (tray app's `ollama serve` couldn't bind 11434;
older PID 14144 held `0.0.0.0:11434` and served) — noisy, not the cause.
## Fix shipped — log analysis now uses Claude Haiku 4.5
- `server/src/api/logs.rs`: `analyze_logs_with_ollama``analyze_logs_with_claude`. POSTs
`https://api.anthropic.com/v1/messages` (plain HTTPS, no tailnet), `x-api-key` from env,
`ANTHROPIC_API_KEY` (required) + `ANTHROPIC_MODEL` (default `claude-haiku-4-5`). Structured
outputs (`output_config.format` + json_schema) → guaranteed-parseable findings JSON.
- Verified end-to-end against Haiku, then live via `/api/logs/analyze`: **1500 logs → 10 findings
in 24s** (was timing out at 120s). The findings even surfaced the old self-logged "Ollama
unreachable" errors — the problem we just fixed.
- ZDR (zero data retention) requested from Anthropic, **pending**. Test fleet OK in the meantime.
## Old VM decommissioned + mgmt IP dropped
- Migration to the physical box completed 2026-06-11; soak signed off → deleted the parked old VM.
- Jupiter (172.16.3.20): `virsh destroy GuruRMM` + `undefine` + removed `/mnt/user/domains/GuruRMM`
(vdisk1.img, 64 GB). Did NOT use `--remove-all-storage` (would have wiped the shared Ubuntu ISO).
`.46` down; Pluto (`Claude-Builder`) + ISO intact.
- `.30` netplan: removed the secondary `172.16.3.47/22` (backup saved `.bak`), `netplan apply`;
`eno1` now carries only `172.16.3.30`. Route + service intact, 216 agents.
## Wiki / knowledge sweeps
- Retired "GuruRMM VM / Linux VM on Jupiter" framing → `.30` is a **physical box** (Lenovo
ThinkCentre M83, Ubuntu 26.04): overview, index, jupiter, gururmm-build, internal-infrastructure,
POWER_FAILURE_RUNBOOK + 4 memory files. HOST_MIGRATION_RUNBOOK header flipped to COMPLETE.
- Windows build chain corrected everywhere: **Beast PRIMARY, Pluto FALLBACK**
(`attempt_build beast || attempt_build pluto`, verified in build-windows.sh): index, overview,
projects/gururmm, systems/pluto, internal-infrastructure + reference_pluto_build_server memory +
build-pipeline doc comments.
- `/wiki-lint` run: found+fixed 2 missed gaps (internal-infrastructure + pluto backlink). Pre-existing
backlog (missing/broken/index) left as-is. `guru-rmm.md` is an intentional redirect tombstone (kept);
the tailscale-enroll.ps1 "dead" link was a false positive (script exists).
- Compiled 5 missing articles via 5 parallel Sonnet sub-agents: clients gonzvar-tax-services,
tohono-oodham-doit (Syncro 33069069), tucson-golden-corral (3859123); projects gururmm-agent
(artifact-based), msp-tools (umbrella). Deduped the duplicate `system:neptune` compile-queue entry.
- Seeded **GuruConnect** wiki article (v0.3.0 production, ScreenConnect-class Rust tool; artifact-based
from guru-connect @ origin/main ded99c5). `[[guruconnect]]` backlinks now resolve.
- Per-client fixes: Gonzvar found via fuzzy `query=` ("Gonzvar Tax Service" singular, id **1830740**,
break-fix ~$175/hr, 6 assets). Golden Corral email = **Neptune Exchange** (per Mike; IX cPanel kept
as a caveat). **TGC-SERVER is colocated at ACG main office** (behind ACG office net, not a naked
public box at the restaurant).
- Patched wiki-compile Phase 2a: fuzzy `query=` + fallback ladder instead of near-exact `name=`
(root cause of the Gonzvar miss).
## Credentials / access (vault paths — no secrets inlined)
- Anthropic API key (GuruRMM log analysis): vault `projects/gururmm/anthropic-api` `credentials.api_key`.
Deployed to `.30` at `/opt/gururmm/.env` as `ANTHROPIC_API_KEY` (root-only file).
- `.30` SSH: `~/.ssh/gururmm-physical``guru@172.16.3.30`; sudo password = vault
`infrastructure/gururmm-server.sops.yaml` `credentials.password` (`sudo -S`).
- Jupiter: `ssh -i ~/.ssh/id_ed25519 root@172.16.3.20`.
- Beast (Windows build PRIMARY): `guru@100.101.122.4` (tailnet), key `~/.ssh/id_ed25519`.
## Infrastructure touched
- `.30` (physical GuruRMM/build host) — code deploy + .env + restart + netplan.
- Jupiter (.20) — deleted GuruRMM virsh domain.
- Beast (100.101.122.4) — inspected (duplicate Ollama).
## Files changed / pushed
- gururmm repo: `c869e4d` (logs.rs→Claude), runbook `37c8593`/`8b301bf`, build-pipeline docs `a794a7f`.
- ClaudeTools: multiple — Claude-cutover memory, VM-sweep, build-chain, wiki-lint fixes, 5 compiled
articles + index, GuruConnect, per-client fixes, wiki-compile fuzzy-search fix (through `401ed7d4`).
- vault: `c1c9744` (Anthropic key entry).
## Pending / follow-ups
- **ZDR** confirmation from Anthropic before pointing a production fleet at the key.
- Golden Corral: reconcile whether IX cPanel mail accounts/forwarders remain vs all-Neptune.
- Gonzvar: confirm contact name/email (Syncro record has phone only).
- `guru-connect` is an untracked standalone clone under `projects/msp-tools/` — consider making it a
proper submodule like guru-rmm.
- Optional: a server-side Ollama fallback in logs.rs (try Claude → fallback) — deferred; Beast path
is no longer used.