sync: auto-sync from GURU-5070 at 2026-06-11 07:24:11

Author: Mike Swanson Machine: GURU-5070 Timestamp: 2026-06-11 07:24:11
2026-06-11 07:24:27 -07:00
parent 83133ddce3
commit ee1eba5f4c
1 changed file with 21 additions and 1 deletion
--- a/.claude/memory/gururmm-physical-server-storage.md
+++ b/.claude/memory/gururmm-physical-server-storage.md
@@ -1,10 +1,30 @@
 ---
 name: gururmm-physical-server-storage
-description: New physical GuruRMM server (172.16.1.231) storage layout + hot/cold tiering plan for the migration off 172.16.3.30
+description: Physical GuruRMM server (now IS 172.16.3.30) storage layout + hot/cold tiering; host migration COMPLETE 2026-06-11
 metadata:
  type: project
 ---

+**MIGRATION COMPLETE (2026-06-11 ~07:20 MST).** The physical box now IS 172.16.3.30 and runs the
+full stack: gururmm-server :3001, guruconnect :3002, coord/claudetools-api :8001, webhook :9000,
+nginx :80, PostgreSQL 18, MariaDB 11.8, Grafana :3000, Prometheus :9090. Cred-decrypt verified
+(MSP360 sync 62/0). Agents reconnected (162/212 within 15 min). SSH: `~/.ssh/gururmm-physical`
+(alias `gururmm-new` -> .231 was the temp DHCP; box is now .30). sudo password = the vault `guru`
+password, piped via `echo "$P" | sudo -S -p ""` (a bare `sudo -u postgres` with no prior sudo in
+the SSH session fails with "a terminal is required").
+**Cutover gotchas that bit us (see runbook):** (1) the box's nginx loaded a STALE config missing
+`location /ws` -> agents got 404 on /ws -> `systemctl reload nginx` fixed it (always reload after
+config placement). (2) Public ingress/TLS is **Nginx Proxy Manager on Jupiter 172.16.3.20**, NOT
+local nginx (which is :80-only) -> NPM forwards to .30:80, no reconfig needed since .30 preserved.
+(3) Prometheus TSDB WAL was copied mid-write -> `segments are not sequential` -> moved
+`/var/lib/prometheus/metrics2/wal` aside (lost ~2h, blocks intact). (4) the `.30` IP swap used a
+self-confirming detached netplan apply + a fresh `.47` mgmt IP (no stale-ARP baggage like `.30`);
+the VM kept `.46` as an independent channel and released `.30`.
+**Still pending post-cutover:** 7-day metrics/agent_logs backfill; Gitea runner -> Jupiter Docker
+(Workstream B); drop `.47` (new box) + `.46` (VM) mgmt IPs; decommission the old VM after a
+stability soak (VM is parked on .46, powered on, DATA PRISTINE for rollback -- do NOT delete yet).
+
+
 The GuruRMM server/build-pipeline is being migrated from the VM (172.16.3.30, slow
 rotational-backed disk — the cause of the WAL-fsync pool timeouts) to a **physical box**.