sync: auto-sync from GURU-5070 at 2026-06-12 07:28:38

Author: Mike Swanson
Machine: GURU-5070
Timestamp: 2026-06-12 07:28:38
2026-06-12 07:28:53 -07:00
parent 7f06e47f09
commit 95c96d5dec
4 changed files with 23 additions and 17 deletions

@@ -10,7 +10,9 @@ cut over from **Ollama (qwen3:14b on Beast, `100.101.122.4:11434`)** to the
 **Anthropic API (Claude Haiku 4.5)** on 2026-06-12 (decision: Mike).
 **Why — the "Ollama unreachable" error was a mislabeled timeout, not reachability.**
-The server VM `.30` (gururmm, `172.16.3.30`) reaches Beast fine for `/api/tags` and
+The GuruRMM server `.30` (gururmm, `172.16.3.30` — a **physical box**, Ubuntu 26.04;
+the VM-on-Jupiter was retired and the physical server took over the `.30` IP) reaches
+Beast fine for `/api/tags` and
 short warm `/api/chat` (warm "say OK" = 1.1s), but a fleet-sized `/api/chat`
 (~1500 log lines / ~17KB) never completes — it hit the curl 300s ceiling even warm.
 Cause is qwen3:14b's minutes-long inference on a big prompt over a flaky cross-LAN
@@ -33,9 +35,12 @@ per-project key, mint its own). **ZDR requested from Anthropic, pending** — or
 not a console toggle (email sales@anthropic.com). Test fleet OK to run before ZDR
 confirms; don't point a production fleet at it until ZDR is live.
-**Deploy shape.** Production server is a **native binary** `/opt/gururmm/gururmm-server`
-via systemd, `EnvironmentFile=/opt/gururmm/.env` (both root-only; `guru` can't write
-.env or restart). A CI pipeline builds/ships the binary on commit (`[ci-version-bump]`
-commits). `.30` has no cargo. So deploy = commit/push (CI builds binary) **+ a root
-action on `.30`** to add `ANTHROPIC_API_KEY` (and optional `ANTHROPIC_MODEL`) to
-`/opt/gururmm/.env` and restart `gururmm-server`.
+**Deploy shape (DONE 2026-06-12).** Production server is a **native binary**
+`/opt/gururmm/gururmm-server` via systemd, `EnvironmentFile=/opt/gururmm/.env`
+(root-owned). A Gitea webhook → CI builds+ships the binary on push to gururmm `main`
+(no cargo on `.30`). `guru` CAN do root ops via `sudo` with the password in vault
+`infrastructure/gururmm-server` `credentials.password` (SSH via `~/.ssh/gururmm-physical`).
+Shipped: gururmm `c869e4d` → CI redeployed the binary; `ANTHROPIC_API_KEY` appended to
+`/opt/gururmm/.env`; `gururmm-server` restarted; `/api/logs/analyze` verified end-to-end
+(1500 logs → 10 findings in 24s). **Migration note:** the key lives in `.30`'s local
+`.env`, not the repo — already on the physical `.30`, so nothing to re-add.
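
A minimal sketch of the distinction the "Why" paragraph in this file draws between a timeout and a reachability failure, assuming the caller shells out to curl as in the 300s test. Exit statuses 6 (could not resolve host), 7 (failed to connect) and 28 (operation timed out) are curl's documented codes. The function name `classify_curl_exit` and the label strings are hypothetical and are not part of gururmm-server.

```shell
# Map a curl exit status to the label an operator should see.
# Only 6, 7 and 28 come from curl's documentation; the labels are illustrative.
classify_curl_exit() {
  case "$1" in
    0)   echo "ok" ;;
    6|7) echo "unreachable" ;;  # DNS failure or connection failure
    28)  echo "timeout" ;;      # time limit hit before the transfer finished
    *)   echo "other" ;;        # anything else: report it, do not guess
  esac
}
```

For example, `curl -sS --max-time 300 -o /dev/null http://100.101.122.4:11434/api/tags; classify_curl_exit $?` would print `timeout` rather than `unreachable` when the request runs past the 300s ceiling.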

@@ -37,13 +37,14 @@ mgmt IPs; decommission the old VM after a stability soak (VM parked on .46, powe
 for rollback -- do NOT delete yet).
-The GuruRMM server/build-pipeline is being migrated from the VM (172.16.3.30, slow
-rotational-backed disk — the cause of the WAL-fsync pool timeouts) to a **physical box**.
-New box (as of 2026-06-10): **172.16.1.231** (TEMPORARY IP — will become 172.16.3.30 at
-cutover), hostname `gururmm`, **Ubuntu 26.04 LTS**. SSH: dedicated ed25519 key
-`~/.ssh/gururmm-physical` (alias `gururmm-new`), vault `infrastructure/gururmm-server-physical`
-(also holds the initial `guru` password). sudo needs that password (`sudo -S`), not passwordless.
+**History (pre-cutover — now DONE, retained for context).** The GuruRMM server/build-pipeline
+ran on a **VM** at 172.16.3.30 (slow rotational-backed disk — the WAL-fsync pool-timeout cause)
+and was migrated to a **physical box**, which took over the 172.16.3.30 IP at cutover
+(2026-06-11). During provisioning (2026-06-10) the physical box was briefly at temp DHCP IP
+**172.16.1.231**; that IP is no longer used. hostname `gururmm`, **Ubuntu 26.04 LTS**. SSH:
+dedicated ed25519 key `~/.ssh/gururmm-physical` to `guru@172.16.3.30`, vault
+`infrastructure/gururmm-server-physical` (SSH key + initial `guru` password). sudo needs that
+password (`sudo -S`), not passwordless.
 **Drives (storage optimized 2026-06-10):**
 - **SSD `sda`** (Samsung 860, 929 GB) = HOT tier. Installer had left root at only 100 GB;
@@ -79,4 +80,4 @@ re-point (the `http://172.16.3.30:8001` refs + Cloudflare→pfSense→.30 path a
 `pg_dumpall --globals-only` + `pg_dump -Fc`/`pg_restore -j` (14→16, schema as-is — storage tiering
 is a SEPARATE later task). Full runbook (Gate-A pre-flight, cutover from CONSOLE, ARP flush,
 credential-decrypt gate, PONR=first-agent-reconnect, rollback): `projects/msp-tools/guru-rmm/docs/
-HOST_MIGRATION_RUNBOOK.md`. NOT yet executed — needs a window + the Gate-A unknowns closed.
+HOST_MIGRATION_RUNBOOK.md`. EXECUTED and COMPLETE 2026-06-11 (see the top of this note).

@@ -13,7 +13,7 @@ ACG office LAN is 172.16.0.0/22, routed via Tailscale through pfSense node `pfse
 | pfSense | 172.16.0.1 | port 2248, user admin | Router, DNS (Unbound), Tailscale subnet router |
 | Jupiter | 172.16.3.20 | port 22, user root | Unraid NAS — all VMs + Docker containers |
 | Uranus | 172.16.3.21 | (no key) | OwnCloud additional storage only — NOT a proxy |
-| GuruRMM VM | 172.16.3.30 | port 22, user guru | Linux VM on Jupiter — GuruRMM, Coord API, MariaDB, Gitea |
+| GuruRMM | 172.16.3.30 | port 22, user guru | PHYSICAL box (Ubuntu 26.04) — took the .30 IP when the Jupiter VM was retired 2026-06-11; runs GuruRMM, Coord API, MariaDB/PostgreSQL. Old VM parked at .46 (rollback) |
 | Pluto | 172.16.3.36 | (Windows) | Windows Server 2019 VM on Jupiter — MSI build server |
 **Why:** How to apply: check these IPs before assuming what's where. .21 is NOT the Seafile proxy — NPM on .20 is.

@@ -36,7 +36,7 @@ type: reference
 - Detail: [[infra_office_network]].
 ### gururmm-server (172.16.3.30, hostname `gururmm`)
-- **What:** Linux VM on Jupiter. THE workhorse — runs MariaDB, PostgreSQL, ClaudeTools API (`:8001`), GuruRMM API (`:3001`), GuruConnect server (`:3002`), coord API, Gitea Actions runner, build pipeline, webhook.
+- **What:** PHYSICAL box (Ubuntu 26.04), NOT a VM — took the .30 IP when the Jupiter VM was retired 2026-06-11 (old VM parked at 172.16.3.46 as rollback). THE workhorse — runs MariaDB, PostgreSQL, ClaudeTools API (`:8001`), GuruRMM API (`:3001`), GuruConnect server (`:3002`), coord API, Gitea Actions runner, build pipeline, webhook.
 - **Default:** `ssh guru@172.16.3.30`. Password `infrastructure/gururmm-server.sops.yaml` `credentials.password`. User is **`guru`** NOT `mike`. Home `/home/guru/`.
 - **Gotcha:** for cargo/protoc/PATH, use a **login shell**: `ssh guru@172.16.3.30 'bash -lc "..."'`. Non-interactive shell doesn't source `~/.profile` and these look "missing".
 - **Layout:** repo at `/home/guru/gururmm`, build pipeline at `/opt/gururmm/` (auto-synced from repo `deploy/build-pipeline/` by `build-shared.sh`).
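
The login-shell gotcha above can be reproduced locally without touching `.30`. This sketch assumes bash is installed and uses a throwaway HOME whose `~/.profile` adds a made-up directory (`/demo/cargo/bin`, hypothetical) to PATH. It stands in for guru's real profile, whose contents are not shown in this note.

```shell
# Throwaway HOME with a ~/.profile that extends PATH (hypothetical entry).
demo_home=$(mktemp -d)
printf 'PATH="$PATH:/demo/cargo/bin"\n' > "$demo_home/.profile"

# Non-login, non-interactive shell: ~/.profile is not read.
plain_path=$(HOME="$demo_home" bash -c 'printf "%s" "$PATH"')

# Login shell: bash reads ~/.profile, since this HOME has no
# ~/.bash_profile or ~/.bash_login to take precedence.
login_path=$(HOME="$demo_home" bash -lc 'printf "%s" "$PATH"')

rm -rf "$demo_home"
```

Only `login_path` should contain `/demo/cargo/bin`, which is the same effect that makes cargo and protoc look "missing" over plain `ssh guru@172.16.3.30 '...'` and present under `bash -lc`.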