sync: auto-sync from GURU-5070 at 2026-06-12 07:28:38
Author: Mike Swanson Machine: GURU-5070 Timestamp: 2026-06-12 07:28:38
@@ -10,7 +10,9 @@ cut over from **Ollama (qwen3:14b on Beast, `100.101.122.4:11434`)** to the
 **Anthropic API (Claude Haiku 4.5)** on 2026-06-12 (decision: Mike).

 **Why — the "Ollama unreachable" error was a mislabeled timeout, not reachability.**
-The server VM `.30` (gururmm, `172.16.3.30`) reaches Beast fine for `/api/tags` and
+The GuruRMM server `.30` (gururmm, `172.16.3.30` — a **physical box**, Ubuntu 26.04;
+the VM-on-Jupiter was retired and the physical server took over the `.30` IP) reaches
+Beast fine for `/api/tags` and
 short warm `/api/chat` (warm "say OK" = 1.1s), but a fleet-sized `/api/chat`
 (~1500 log lines / ~17KB) never completes — it hit the curl 300s ceiling even warm.
 Cause is qwen3:14b's minutes-long inference on a big prompt over a flaky cross-LAN
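The distinction this hunk draws, a request that connects but never finishes versus a host that cannot be reached, can be made explicit in error handling. The following is a minimal Python sketch, not the GuruRMM server's code: `classify_failure` and its three labels are hypothetical names, and it assumes a client built on the standard library's `urllib`.

```python
import errno
import socket
import urllib.error

# errno values that indicate the peer could not be reached at all.
UNREACHABLE_ERRNOS = {errno.ECONNREFUSED, errno.EHOSTUNREACH, errno.ENETUNREACH}


def classify_failure(exc: BaseException) -> str:
    """Label a failed HTTP call as 'timeout', 'unreachable', or 'other'."""
    # urllib wraps low-level socket errors in URLError; unwrap to the cause.
    if isinstance(exc, urllib.error.URLError) and isinstance(exc.reason, BaseException):
        exc = exc.reason
    # Check timeouts first: timeout exceptions are also OSError subclasses.
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timeout"
    if isinstance(exc, (ConnectionRefusedError, socket.gaierror)):
        return "unreachable"
    if isinstance(exc, OSError) and exc.errno in UNREACHABLE_ERRNOS:
        return "unreachable"
    return "other"
```

With a classifier like this, the 300s case described above would be reported as a timeout rather than as an unreachable host.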
@@ -33,9 +35,12 @@ per-project key, mint its own). **ZDR requested from Anthropic, pending** — or
 not a console toggle (email sales@anthropic.com). Test fleet OK to run before ZDR
 confirms; don't point a production fleet at it until ZDR is live.

-**Deploy shape.** Production server is a **native binary** `/opt/gururmm/gururmm-server`
-via systemd, `EnvironmentFile=/opt/gururmm/.env` (both root-only; `guru` can't write
-.env or restart). A CI pipeline builds/ships the binary on commit (`[ci-version-bump]`
-commits). `.30` has no cargo. So deploy = commit/push (CI builds binary) **+ a root
-action on `.30`** to add `ANTHROPIC_API_KEY` (and optional `ANTHROPIC_MODEL`) to
-`/opt/gururmm/.env` and restart `gururmm-server`.
+**Deploy shape (DONE 2026-06-12).** Production server is a **native binary**
+`/opt/gururmm/gururmm-server` via systemd, `EnvironmentFile=/opt/gururmm/.env`
+(root-owned). A Gitea webhook → CI builds+ships the binary on push to gururmm `main`
+(no cargo on `.30`). `guru` CAN do root ops via `sudo` with the password in vault
+`infrastructure/gururmm-server` `credentials.password` (SSH via `~/.ssh/gururmm-physical`).
+Shipped: gururmm `c869e4d` → CI redeployed the binary; `ANTHROPIC_API_KEY` appended to
+`/opt/gururmm/.env`; `gururmm-server` restarted; `/api/logs/analyze` verified end-to-end
+(1500 logs → 10 findings in 24s). **Migration note:** the key lives in `.30`'s local
+`.env`, not the repo — already on the physical `.30`, so nothing to re-add.
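The "append the key to `.env`" step in this hunk can be made idempotent, so that re-running it neither duplicates the line nor forces an unnecessary restart. This is a sketch under assumptions: `upsert_env_var` is a hypothetical helper, not part of the GuruRMM tooling, and it assumes the file holds plain `KEY=VALUE` lines as systemd's `EnvironmentFile=` expects. On the real host the file is root-owned, so the write would need root, and `gururmm-server` would still need a restart afterwards.

```python
from pathlib import Path


def upsert_env_var(env_path: Path, name: str, value: str) -> bool:
    """Set NAME=value in an env file; return True only if the file changed."""
    lines = env_path.read_text().splitlines() if env_path.exists() else []
    new_line = f"{name}={value}"
    for i, line in enumerate(lines):
        # Compare only the key part, so an existing entry is updated in place.
        if line.split("=", 1)[0].strip() == name:
            if line == new_line:
                return False  # already set to this value; nothing to do
            lines[i] = new_line
            break
    else:
        # No existing entry for this key: append one.
        lines.append(new_line)
    env_path.write_text("\n".join(lines) + "\n")
    return True
```

The boolean return value lets a caller restart the service only when the file actually changed.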
@@ -37,13 +37,14 @@ mgmt IPs; decommission the old VM after a stability soak (VM parked on .46, powe
 for rollback -- do NOT delete yet).


-The GuruRMM server/build-pipeline is being migrated from the VM (172.16.3.30, slow
-rotational-backed disk — the cause of the WAL-fsync pool timeouts) to a **physical box**.
-New box (as of 2026-06-10): **172.16.1.231** (TEMPORARY IP — will become 172.16.3.30 at
-cutover), hostname `gururmm`, **Ubuntu 26.04 LTS**. SSH: dedicated ed25519 key
-`~/.ssh/gururmm-physical` (alias `gururmm-new`), vault `infrastructure/gururmm-server-physical`
-(also holds the initial `guru` password). sudo needs that password (`sudo -S`), not passwordless.
+**History (pre-cutover — now DONE, retained for context).** The GuruRMM server/build-pipeline
+ran on a **VM** at 172.16.3.30 (slow rotational-backed disk — the WAL-fsync pool-timeout cause)
+and was migrated to a **physical box**, which took over the 172.16.3.30 IP at cutover
+(2026-06-11). During provisioning (2026-06-10) the physical box was briefly at temp DHCP IP
+**172.16.1.231**; that IP is no longer used. hostname `gururmm`, **Ubuntu 26.04 LTS**. SSH:
+dedicated ed25519 key `~/.ssh/gururmm-physical` to `guru@172.16.3.30`, vault
+`infrastructure/gururmm-server-physical` (SSH key + initial `guru` password). sudo needs that
+password (`sudo -S`), not passwordless.

 **Drives (storage optimized 2026-06-10):**
 - **SSD `sda`** (Samsung 860, 929 GB) = HOT tier. Installer had left root at only 100 GB;
@@ -79,4 +80,4 @@ re-point (the `http://172.16.3.30:8001` refs + Cloudflare→pfSense→.30 path a
 `pg_dumpall --globals-only` + `pg_dump -Fc`/`pg_restore -j` (14→16, schema as-is — storage tiering
 is a SEPARATE later task). Full runbook (Gate-A pre-flight, cutover from CONSOLE, ARP flush,
 credential-decrypt gate, PONR=first-agent-reconnect, rollback): `projects/msp-tools/guru-rmm/docs/
-HOST_MIGRATION_RUNBOOK.md`. NOT yet executed — needs a window + the Gate-A unknowns closed.
+HOST_MIGRATION_RUNBOOK.md`. EXECUTED and COMPLETE 2026-06-11 (see the top of this note).
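The dump-and-restore sequence named in this hunk can be written out as explicit argument lists. This is an illustrative sketch only, not the runbook's procedure: `migration_commands`, the database name, the dump directory, and the job count are all hypothetical, and the `pg_restore` line assumes the target database already exists on the new server.

```python
from pathlib import PurePosixPath
from typing import List


def migration_commands(db: str, dump_dir: PurePosixPath, jobs: int) -> List[List[str]]:
    """Return argv lists for a globals dump, a custom-format dump, and a parallel restore."""
    globals_sql = str(dump_dir / "globals.sql")
    archive = str(dump_dir / f"{db}.dump")
    return [
        # Roles and tablespaces only; run against the OLD server.
        ["pg_dumpall", "--globals-only", "-f", globals_sql],
        # Custom-format archive (-Fc), which pg_restore can read in parallel.
        ["pg_dump", "-Fc", "-f", archive, db],
        # Parallel restore with -j; run against the NEW server.
        ["pg_restore", "-j", str(jobs), "-d", db, archive],
    ]
```

Building the commands as lists rather than shell strings avoids quoting problems if they are later passed to `subprocess.run`.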
@@ -13,7 +13,7 @@ ACG office LAN is 172.16.0.0/22, routed via Tailscale through pfSense node `pfse
 | pfSense | 172.16.0.1 | port 2248, user admin | Router, DNS (Unbound), Tailscale subnet router |
 | Jupiter | 172.16.3.20 | port 22, user root | Unraid NAS — all VMs + Docker containers |
 | Uranus | 172.16.3.21 | (no key) | OwnCloud additional storage only — NOT a proxy |
-| GuruRMM VM | 172.16.3.30 | port 22, user guru | Linux VM on Jupiter — GuruRMM, Coord API, MariaDB, Gitea |
+| GuruRMM | 172.16.3.30 | port 22, user guru | PHYSICAL box (Ubuntu 26.04) — took the .30 IP when the Jupiter VM was retired 2026-06-11; runs GuruRMM, Coord API, MariaDB/PostgreSQL. Old VM parked at .46 (rollback) |
 | Pluto | 172.16.3.36 | (Windows) | Windows Server 2019 VM on Jupiter — MSI build server |

 **Why:** How to apply: check these IPs before assuming what's where. .21 is NOT the Seafile proxy — NPM on .20 is.
@@ -36,7 +36,7 @@ type: reference
 - Detail: [[infra_office_network]].

 ### gururmm-server (172.16.3.30, hostname `gururmm`)
-- **What:** Linux VM on Jupiter. THE workhorse — runs MariaDB, PostgreSQL, ClaudeTools API (`:8001`), GuruRMM API (`:3001`), GuruConnect server (`:3002`), coord API, Gitea Actions runner, build pipeline, webhook.
+- **What:** PHYSICAL box (Ubuntu 26.04), NOT a VM — took the .30 IP when the Jupiter VM was retired 2026-06-11 (old VM parked at 172.16.3.46 as rollback). THE workhorse — runs MariaDB, PostgreSQL, ClaudeTools API (`:8001`), GuruRMM API (`:3001`), GuruConnect server (`:3002`), coord API, Gitea Actions runner, build pipeline, webhook.
 - **Default:** `ssh guru@172.16.3.30`. Password `infrastructure/gururmm-server.sops.yaml` `credentials.password`. User is **`guru`** NOT `mike`. Home `/home/guru/`.
 - **Gotcha:** for cargo/protoc/PATH, use a **login shell**: `ssh guru@172.16.3.30 'bash -lc "..."'`. Non-interactive shell doesn't source `~/.profile` and these look "missing".
 - **Layout:** repo at `/home/guru/gururmm`, build pipeline at `/opt/gururmm/` (auto-synced from repo `deploy/build-pipeline/` by `build-shared.sh`).
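The login-shell gotcha in the list above is easy to get wrong when the remote command is built programmatically, because the command must survive one round of remote shell parsing. A minimal sketch, assuming a hypothetical helper `login_shell_ssh` (not part of any GuruRMM script) that wraps a command in `bash -lc` with safe quoting:

```python
import shlex
from typing import List


def login_shell_ssh(target: str, remote_cmd: str) -> List[str]:
    """Build an ssh argv that runs remote_cmd inside a login shell on target."""
    # ssh passes its command to the remote shell as a single string, so
    # remote_cmd is quoted once to stay intact as the argument to `bash -lc`.
    return ["ssh", target, f"bash -lc {shlex.quote(remote_cmd)}"]
```

For example, `login_shell_ssh("guru@172.16.3.30", "cargo --version")` yields an argv whose last element is `bash -lc 'cargo --version'`, which can be passed to `subprocess.run` without `shell=True`.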