diff --git a/.claude/memory/MEMORY.md b/.claude/memory/MEMORY.md
index 01aa35f..5abc1f5 100644
--- a/.claude/memory/MEMORY.md
+++ b/.claude/memory/MEMORY.md
@@ -23,6 +23,7 @@
 - [Gitea git-op latency](reference_gitea_git_op_latency.md) — SSH (.20:2222) is SLOWEST (~1.5s); internal HTTP+token ~0.55s; SOPS lookup only ~0.33s. Don't switch to SSH for speed. Gitea SSH is .20:2222 (API ssh_url .21 is wrong).
 - [GuruRMM technical reference](reference_gururmm.md) — Server (172.16.3.30) layout + downloads dir `/var/www/gururmm/downloads` + `.channel` sidecar rollout control (stable/beta) + privileged server access via the server's OWN root RMM agent (hostname `gururmm`, no SSH needed; plink fallback) + API + `context=user_session` (WTS impersonation) + build-pipeline vendoring at `deploy/build-pipeline/` + Linux agent systemd sandbox trap.
 - [RMM agent update model](rmm-agent-update-model.md) — Agent updates are server-PUSH on heartbeat (no self-poll); available versions = filesystem scan needing a `.sha256`; promote flips `.channel` sidecars beta→stable globally. Two stranders: beta-first freezes stable until an explicit promote; agents older than ~0.6.50 re-enroll with a NEW device_id/agent row when updated.
+- [GuruRMM physical server storage](gururmm-physical-server-storage.md) — New box 172.16.1.231 (temp IP→will be .30), Ubuntu 26.04, ssh key `gururmm-physical`/alias `gururmm-new`. SSD (915G root) = HOT (PG default tablespace + WAL + builds); HDD ext4 at `/data` = COLD (`gururmm_cold` PG tablespace for aged `agent_logs` partitions + downloads + backups + archive). The #3 retention answer.
 - [Trebesch DESKTOP-QNP3ON5 shell replacement](reference_trebesch_qnp3on5.md) — AT Trebesch box runs an Explorer shell replacement; explorer.exe owner check returns blank — use Win32_ComputerSystem.UserName. GuruRMM SWIFT-LION-2892.
 
 ## Users
diff --git a/.claude/memory/gururmm-physical-server-storage.md b/.claude/memory/gururmm-physical-server-storage.md
new file mode 100644
index 0000000..b05b0c4
--- /dev/null
+++ b/.claude/memory/gururmm-physical-server-storage.md
@@ -0,0 +1,50 @@
+---
+name: gururmm-physical-server-storage
+description: New physical GuruRMM server (172.16.1.231) storage layout + hot/cold tiering plan for the migration off 172.16.3.30
+metadata:
+  type: project
+---
+
+The GuruRMM server/build-pipeline is being migrated from the VM (172.16.3.30, slow
+rotational-backed disk — the cause of the WAL-fsync pool timeouts) to a **physical box**.
+
+New box (as of 2026-06-10): **172.16.1.231** (TEMPORARY IP — will become 172.16.3.30 at
+cutover), hostname `gururmm`, **Ubuntu 26.04 LTS**. SSH: dedicated ed25519 key
+`~/.ssh/gururmm-physical` (alias `gururmm-new`), vault `infrastructure/gururmm-server-physical`
+(also holds the initial `guru` password). sudo needs that password (`sudo -S`), not passwordless.
+
+**Drives (storage optimized 2026-06-10):**
+- **SSD `sda`** (Samsung 860, 929 GB) = HOT tier. Installer had left root at only 100 GB;
+  extended the LV into the full VG → **root is now ~915 GB**. Holds: OS, Postgres DEFAULT
+  tablespace (live/recent data) + WAL, cargo build targets, `/opt/gururmm`. Fast fsync here is
+  the real fix for the pool-timeout root cause (could even revert `synchronous_commit=on`).
+- **HDD `sdb`** (WD 1 TB, spinning) = COLD tier. Old NTFS "Data2" (504 GB, user confirmed
+  already backed up) wiped → **ext4, mounted at `/data`** (fstab by UUID, `noatime`). Dirs:
+  `/data/gururmm/{pgcold, downloads, backups, archive}`.
+
+**Cold-storage isolation (built at migration — needs PG running):**
+- `CREATE TABLESPACE gururmm_cold LOCATION '/data/gururmm/pgcold'` (chown the dir
+  postgres:postgres first).
+- Time-partition `agent_logs` (by month). Recent partitions on SSD default tablespace (hot
+  write path: the batched multi-row INSERT + heartbeats). Nightly job `ALTER TABLE
+  agent_logs_YYYYMM SET TABLESPACE gururmm_cold` ages old partitions onto the HDD (still
+  queryable for signatures/build-correlation). Past retention horizon: pg_dump partition to
+  `/data/gururmm/archive` (compressed) then DROP.
+- `downloads` (build artifacts, served by nginx + written by pipeline) and `backups`
+  (nightly pg_dump) also live on `/data`.
+
+This is the concrete answer to the deferred "#3 log retention/archival" discussion. See
+[[rmm-agent-update-model]] (the downloads dir is the update artifact source) and the WAL-fix
+context (synchronous_commit=off + pool→30 applied to the OLD VM).
+
+**Migration architecture (ratified 2026-06-10, via a 2-round Gemini+Grok panel).** The VM
+`172.16.3.30` is a kitchen-sink host (GuruRMM + GuruConnect + coord API :8001 + Gitea runner +
+Grafana/Prometheus + MariaDB; PG 14, 5.4 GB gururmm DB). Decision: physical box **becomes
+`172.16.3.30`** and runs **everything EXCEPT the Gitea runner** (which becomes a Docker container
+on Jupiter `.20`); VM retired. (MariaDB MIGRATES — Gate-A found it backs the coord API's `claudetools`
+DB at localhost:3306, NOT droppable.) Keeping `.30` + coord on physical means NO fleet-wide
+re-point (the `http://172.16.3.30:8001` refs + Cloudflare→pfSense→.30 path are unchanged). PG via
+`pg_dumpall --globals-only` + `pg_dump -Fc`/`pg_restore -j` (14→16, schema as-is — storage tiering
+is a SEPARATE later task). Full runbook (Gate-A pre-flight, cutover from CONSOLE, ARP flush,
+credential-decrypt gate, PONR=first-agent-reconnect, rollback): `projects/msp-tools/guru-rmm/docs/
+HOST_MIGRATION_RUNBOOK.md`. NOT yet executed — needs a window + the Gate-A unknowns closed.
diff --git a/projects/msp-tools/guru-rmm b/projects/msp-tools/guru-rmm
index f68bbbe..12a6445 160000
--- a/projects/msp-tools/guru-rmm
+++ b/projects/msp-tools/guru-rmm
@@ -1 +1 @@
-Subproject commit f68bbbe8c093889e0c5239a0b5e14f20adaaefbd
+Subproject commit 12a644548fadb012091d3bc64bbc7c5a1a41a44a
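
Review note: the nightly aging step described in the new memory file (`ALTER TABLE agent_logs_YYYYMM SET TABLESPACE gururmm_cold`) could be sketched roughly as below. This is only an illustration, not the project's job. The names `partition_name`, `months_back`, `aging_statements` and the `hot_months` window are hypothetical, and the note itself gives no retention numbers. The sketch only builds the SQL strings. Running them against Postgres, and moving any indexes (a separate `ALTER INDEX ... SET TABLESPACE`), is left out.

```python
from datetime import date


def partition_name(year: int, month: int) -> str:
    """Monthly partition name in the agent_logs_YYYYMM form used in the note."""
    return f"agent_logs_{year:04d}{month:02d}"


def months_back(today: date, n: int) -> tuple[int, int]:
    """Return (year, month) for the calendar month n months before today's month."""
    idx = today.year * 12 + (today.month - 1) - n
    return idx // 12, idx % 12 + 1


def aging_statements(today: date, hot_partitions, hot_months: int = 3) -> list[str]:
    """Build ALTER TABLE statements for partitions older than the hot window.

    hot_partitions: iterable of (year, month) pairs still on the SSD default
    tablespace. hot_months is an assumed policy knob, not a value from the note.
    """
    cutoff = months_back(today, hot_months)
    return [
        f"ALTER TABLE {partition_name(y, m)} SET TABLESPACE gururmm_cold;"
        for (y, m) in sorted(hot_partitions)
        if (y, m) < cutoff
    ]


if __name__ == "__main__":
    on_ssd = [(2026, 6), (2026, 5), (2026, 3), (2026, 2), (2025, 12)]
    for stmt in aging_statements(date(2026, 6, 10), on_ssd):
        print(stmt)
```

`SET TABLESPACE` physically rewrites the table and holds an `ACCESS EXCLUSIVE` lock while it does, so a real job would probably move one partition at a time during a quiet window.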