diff --git a/.claude/memory/gururmm-log-analysis-claude-cutover.md b/.claude/memory/gururmm-log-analysis-claude-cutover.md
index 71e158d..daf548e 100644
--- a/.claude/memory/gururmm-log-analysis-claude-cutover.md
+++ b/.claude/memory/gururmm-log-analysis-claude-cutover.md
@@ -10,7 +10,9 @@ cut over from **Ollama (qwen3:14b on Beast, `100.101.122.4:11434`)** to the
 **Anthropic API (Claude Haiku 4.5)** on 2026-06-12 (decision: Mike).
 
 **Why — the "Ollama unreachable" error was a mislabeled timeout, not reachability.**
-The server VM `.30` (gururmm, `172.16.3.30`) reaches Beast fine for `/api/tags` and
+The GuruRMM server `.30` (gururmm, `172.16.3.30` — a **physical box**, Ubuntu 26.04;
+the VM-on-Jupiter was retired and the physical server took over the `.30` IP) reaches
+Beast fine for `/api/tags` and
 short warm `/api/chat` (warm "say OK" = 1.1s), but a fleet-sized `/api/chat` (~1500 log
 lines / ~17KB) never completes — it hit the curl 300s ceiling even warm. Cause is
 qwen3:14b's minutes-long inference on a big prompt over a flaky cross-LAN
@@ -33,9 +35,12 @@ per-project key, mint its own). **ZDR requested from Anthropic, pending** — or
 not a console toggle (email sales@anthropic.com). Test fleet OK to run before ZDR
 confirms; don't point a production fleet at it until ZDR is live.
 
-**Deploy shape.** Production server is a **native binary** `/opt/gururmm/gururmm-server`
-via systemd, `EnvironmentFile=/opt/gururmm/.env` (both root-only; `guru` can't write
-.env or restart). A CI pipeline builds/ships the binary on commit (`[ci-version-bump]`
-commits). `.30` has no cargo. So deploy = commit/push (CI builds binary) **+ a root
-action on `.30`** to add `ANTHROPIC_API_KEY` (and optional `ANTHROPIC_MODEL`) to
-`/opt/gururmm/.env` and restart `gururmm-server`.
+**Deploy shape (DONE 2026-06-12).** Production server is a **native binary**
+`/opt/gururmm/gururmm-server` via systemd, `EnvironmentFile=/opt/gururmm/.env`
+(root-owned). A Gitea webhook → CI builds+ships the binary on push to gururmm `main`
+(no cargo on `.30`). `guru` CAN do root ops via `sudo` with the password in vault
+`infrastructure/gururmm-server` `credentials.password` (SSH via `~/.ssh/gururmm-physical`).
+Shipped: gururmm `c869e4d` → CI redeployed the binary; `ANTHROPIC_API_KEY` appended to
+`/opt/gururmm/.env`; `gururmm-server` restarted; `/api/logs/analyze` verified end-to-end
+(1500 logs → 10 findings in 24s). **Migration note:** the key lives in `.30`'s local
+`.env`, not the repo — already on the physical `.30`, so nothing to re-add.
diff --git a/.claude/memory/gururmm-physical-server-storage.md b/.claude/memory/gururmm-physical-server-storage.md
index 19af687..5a5b64f 100644
--- a/.claude/memory/gururmm-physical-server-storage.md
+++ b/.claude/memory/gururmm-physical-server-storage.md
@@ -37,13 +37,14 @@ mgmt IPs; decommission the old VM after a stability soak (VM parked on .46, powe
 for rollback -- do NOT delete yet).
 
 
-The GuruRMM server/build-pipeline is being migrated from the VM (172.16.3.30, slow
-rotational-backed disk — the cause of the WAL-fsync pool timeouts) to a **physical box**.
-
-New box (as of 2026-06-10): **172.16.1.231** (TEMPORARY IP — will become 172.16.3.30 at
-cutover), hostname `gururmm`, **Ubuntu 26.04 LTS**. SSH: dedicated ed25519 key
-`~/.ssh/gururmm-physical` (alias `gururmm-new`), vault `infrastructure/gururmm-server-physical`
-(also holds the initial `guru` password). sudo needs that password (`sudo -S`), not passwordless.
+**History (pre-cutover — now DONE, retained for context).** The GuruRMM server/build-pipeline
+ran on a **VM** at 172.16.3.30 (slow rotational-backed disk — the WAL-fsync pool-timeout cause)
+and was migrated to a **physical box**, which took over the 172.16.3.30 IP at cutover
+(2026-06-11). During provisioning (2026-06-10) the physical box was briefly at temp DHCP IP
+**172.16.1.231**; that IP is no longer used. hostname `gururmm`, **Ubuntu 26.04 LTS**. SSH:
+dedicated ed25519 key `~/.ssh/gururmm-physical` to `guru@172.16.3.30`, vault
+`infrastructure/gururmm-server-physical` (SSH key + initial `guru` password). sudo needs that
+password (`sudo -S`), not passwordless.
 
 **Drives (storage optimized 2026-06-10):**
 - **SSD `sda`** (Samsung 860, 929 GB) = HOT tier. Installer had left root at only 100 GB;
@@ -79,4 +80,4 @@ re-point (the `http://172.16.3.30:8001` refs + Cloudflare→pfSense→.30 path a
 `pg_dumpall --globals-only` + `pg_dump -Fc`/`pg_restore -j` (14→16, schema as-is — storage
 tiering is a SEPARATE later task). Full runbook (Gate-A pre-flight, cutover from CONSOLE,
 ARP flush, credential-decrypt gate, PONR=first-agent-reconnect, rollback): `projects/msp-tools/guru-rmm/docs/
-HOST_MIGRATION_RUNBOOK.md`. NOT yet executed — needs a window + the Gate-A unknowns closed.
+HOST_MIGRATION_RUNBOOK.md`. EXECUTED and COMPLETE 2026-06-11 (see the top of this note).
diff --git a/.claude/memory/infra_office_network.md b/.claude/memory/infra_office_network.md
index 97c7c68..11cea9e 100644
--- a/.claude/memory/infra_office_network.md
+++ b/.claude/memory/infra_office_network.md
@@ -13,7 +13,7 @@ ACG office LAN is 172.16.0.0/22, routed via Tailscale through pfSense node `pfse
 | pfSense | 172.16.0.1 | port 2248, user admin | Router, DNS (Unbound), Tailscale subnet router |
 | Jupiter | 172.16.3.20 | port 22, user root | Unraid NAS — all VMs + Docker containers |
 | Uranus | 172.16.3.21 | (no key) | OwnCloud additional storage only — NOT a proxy |
-| GuruRMM VM | 172.16.3.30 | port 22, user guru | Linux VM on Jupiter — GuruRMM, Coord API, MariaDB, Gitea |
+| GuruRMM | 172.16.3.30 | port 22, user guru | PHYSICAL box (Ubuntu 26.04) — took the .30 IP when the Jupiter VM was retired 2026-06-11; runs GuruRMM, Coord API, MariaDB/PostgreSQL. Old VM parked at .46 (rollback) |
 | Pluto | 172.16.3.36 | (Windows) | Windows Server 2019 VM on Jupiter — MSI build server |
 
 **Why:** How to apply: check these IPs before assuming what's where. .21 is NOT the Seafile proxy — NPM on .20 is.
diff --git a/.claude/memory/reference_resource_map.md b/.claude/memory/reference_resource_map.md
index d938b6e..7ec2c87 100644
--- a/.claude/memory/reference_resource_map.md
+++ b/.claude/memory/reference_resource_map.md
@@ -36,7 +36,7 @@ type: reference
 - Detail: [[infra_office_network]].
 
 ### gururmm-server (172.16.3.30, hostname `gururmm`)
-- **What:** Linux VM on Jupiter. THE workhorse — runs MariaDB, PostgreSQL, ClaudeTools API (`:8001`), GuruRMM API (`:3001`), GuruConnect server (`:3002`), coord API, Gitea Actions runner, build pipeline, webhook.
+- **What:** PHYSICAL box (Ubuntu 26.04), NOT a VM — took the .30 IP when the Jupiter VM was retired 2026-06-11 (old VM parked at 172.16.3.46 as rollback). THE workhorse — runs MariaDB, PostgreSQL, ClaudeTools API (`:8001`), GuruRMM API (`:3001`), GuruConnect server (`:3002`), coord API, Gitea Actions runner, build pipeline, webhook.
 - **Default:** `ssh guru@172.16.3.30`. Password `infrastructure/gururmm-server.sops.yaml` `credentials.password`. User is **`guru`** NOT `mike`. Home `/home/guru/`.
 - **Gotcha:** for cargo/protoc/PATH, use a **login shell**: `ssh guru@172.16.3.30 'bash -lc "..."'`. Non-interactive shell doesn't source `~/.profile` and these look "missing".
 - **Layout:** repo at `/home/guru/gururmm`, build pipeline at `/opt/gururmm/` (auto-synced from repo `deploy/build-pipeline/` by `build-shared.sh`).
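
The cutover note's deploy step appends `ANTHROPIC_API_KEY` to `/opt/gururmm/.env` and then restarts `gururmm-server`. A plain append leaves a duplicate assignment if the step is ever re-run (for example, when the key is rotated). The following is a minimal bash sketch of an idempotent alternative, assuming GNU grep and `mktemp`. The `upsert_env` name is hypothetical and not part of the gururmm repo or build pipeline.

```shell
# Hypothetical helper (not in the gururmm repo): set KEY=VALUE in an env file
# such as a systemd EnvironmentFile, replacing any existing KEY= line so that
# re-runs never leave duplicate assignments.
# KEY is used as a regex prefix; fine for plain names like ANTHROPIC_API_KEY.
upsert_env() {
  local file="$1" key="$2" value="$3" tmp
  tmp="$(mktemp)" || return 1
  if [ -f "$file" ]; then
    # Keep every line except an existing assignment of KEY.
    # grep exits 1 when it selects no lines, which is not an error here.
    { grep -v -- "^${key}=" "$file" || [ "$?" -eq 1 ]; } > "$tmp" \
      || { rm -f "$tmp"; return 1; }
  fi
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Write back through the existing file (cat >) rather than mv, so the
  # original owner and permissions of a root-owned .env are preserved.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

Usage on `.30` would be as root (sudo there needs the `guru` password, per the note): `upsert_env /opt/gururmm/.env ANTHROPIC_API_KEY "$KEY"`, followed by `systemctl restart gururmm-server`. systemd reads `EnvironmentFile=` when the service starts, so the restart is what picks up the new value.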