sync: auto-sync from GURU-5070 at 2026-06-11 07:24:11

Author: Mike Swanson
Machine: GURU-5070
Timestamp: 2026-06-11 07:24:11
2026-06-11 07:24:27 -07:00
parent 83133ddce3
commit ee1eba5f4c

@@ -1,10 +1,30 @@
---
name: gururmm-physical-server-storage
description: Physical GuruRMM server (now IS 172.16.3.30) storage layout + hot/cold tiering; host migration COMPLETE 2026-06-11
metadata:
type: project
---
**MIGRATION COMPLETE (2026-06-11 ~07:20 MST).** The physical box now IS 172.16.3.30 and runs the
full stack: gururmm-server :3001, guruconnect :3002, coord/claudetools-api :8001, webhook :9000,
nginx :80, PostgreSQL 18, MariaDB 11.8, Grafana :3000, Prometheus :9090. Cred-decrypt verified
(MSP360 sync 62/0). Agents reconnected (162/212 within 15 min). SSH: `~/.ssh/gururmm-physical`
(the `gururmm-new` alias pointed at .231, the temporary DHCP lease; the box is now .30). The sudo password is the vault `guru`
password, piped via `echo "$P" | sudo -S -p ""` (a bare `sudo -u postgres` with no prior sudo in
the SSH session fails with "a terminal is required").
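The sudo-over-SSH pattern above can be scripted with the minimal Python sketch below. The login user `guru` and the `psql -c 'select 1'` query are hypothetical placeholders, because the note names only the vault entry and the key path. The sketch writes the password to ssh's stdin, which has the same effect as `echo "$P" | sudo -S -p ""`, so `sudo -S` never needs a terminal.

```python
import os
import shlex
import subprocess

# Key path and host come from the note; the login user passed in below is an
# assumption (the note only names the vault entry `guru`).
SSH_KEY = "~/.ssh/gururmm-physical"
HOST = "172.16.3.30"


def remote_sudo_argv(host: str, key: str, user: str, sudo_args: list[str]) -> list[str]:
    """Build an ssh argv that runs `sudo -S -p '' <sudo_args...>` remotely.

    -S makes sudo read the password from stdin instead of a terminal;
    -p '' blanks the prompt so it does not pollute captured output.
    """
    remote = "sudo -S -p '' " + shlex.join(sudo_args)
    return ["ssh", "-i", os.path.expanduser(key), f"{user}@{host}", remote]


def run_remote_sudo(password: str, argv: list[str]) -> subprocess.CompletedProcess:
    """Run the ssh command, feeding the sudo password on stdin."""
    return subprocess.run(
        argv,
        input=password + "\n",  # same effect as `echo "$P" | ...`
        text=True,
        capture_output=True,
        check=False,
    )


# Example (hypothetical user and query; not executed here):
# argv = remote_sudo_argv(HOST, SSH_KEY, "guru",
#                         ["-u", "postgres", "psql", "-c", "select 1"])
# result = run_remote_sudo(os.environ["P"], argv)
```

Passing an argv list avoids a local shell entirely. `shlex.join` quotes only the remote command string that the far-end shell will parse.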
**Cutover gotchas that bit us (see runbook):**
1. The box's nginx had loaded a STALE config missing `location /ws`, so agents got 404 on /ws.
   `systemctl reload nginx` fixed it (always reload after placing config).
2. Public ingress/TLS is **Nginx Proxy Manager on Jupiter 172.16.3.20**, NOT local nginx (which is
   :80-only). NPM forwards to .30:80, so no reconfig was needed because .30 was preserved.
3. The Prometheus TSDB WAL was copied mid-write, causing `segments are not sequential`. Moved
   `/var/lib/prometheus/metrics2/wal` aside (lost ~2h of data, persisted blocks intact).
4. The `.30` IP swap used a self-confirming detached netplan apply plus a fresh `.47` mgmt IP
   (a new address without the stale-ARP baggage that `.30` carried); the VM kept `.46` as an
   independent channel and released `.30`.
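The Prometheus WAL move-aside step can be sketched in Python as below. This assumes Prometheus has already been stopped. The `wal.broken-<timestamp>` name is a hypothetical convention, not taken from the runbook. Renaming instead of deleting keeps the bad segments available for inspection. On the next start Prometheus should open a fresh, empty WAL, which matches the "lost ~2h, blocks intact" outcome noted above.

```python
import shutil
import time
from pathlib import Path
from typing import Optional

# Path from the note. Stop Prometheus before running this; the exact
# service name on this host is an assumption.
WAL_DIR = Path("/var/lib/prometheus/metrics2/wal")


def move_wal_aside(wal_dir: Path, stamp: Optional[str] = None) -> Path:
    """Rename a corrupt TSDB WAL directory instead of deleting it.

    The persisted block directories next to the WAL are not touched;
    only the WAL directory itself is renamed.
    """
    if not wal_dir.is_dir():
        raise FileNotFoundError(f"no WAL directory at {wal_dir}")
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    # Hypothetical naming convention: wal.broken-<timestamp> beside the original.
    target = wal_dir.with_name(f"{wal_dir.name}.broken-{stamp}")
    if target.exists():
        raise FileExistsError(f"refusing to overwrite {target}")
    shutil.move(str(wal_dir), str(target))
    return target


# Example (run as root with Prometheus stopped; not executed here):
# moved = move_wal_aside(WAL_DIR)
# print(f"WAL moved to {moved}; start Prometheus again")
```

The function refuses to overwrite an existing target, so re-running it after a second failure cannot clobber the first saved copy.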
**Still pending post-cutover:** 7-day metrics/agent_logs backfill; Gitea runner -> Jupiter Docker
(Workstream B); drop `.47` (new box) + `.46` (VM) mgmt IPs; decommission the old VM after a
stability soak (VM is parked on .46, powered on, DATA PRISTINE for rollback -- do NOT delete yet).
The GuruRMM server/build pipeline was migrated off the VM (formerly 172.16.3.30, on a slow
rotational-backed disk that caused the WAL-fsync pool timeouts) onto this **physical box**.