sync: auto-sync from GURU-5070 at 2026-05-29 16:42:08

Author: Mike Swanson
Machine: GURU-5070
Timestamp: 2026-05-29 16:42:08
2026-05-29 16:42:13 -07:00
parent 36fd44a8c8
commit 2237cb911e


@@ -374,3 +374,76 @@ curl -s http://localhost:11434/api/chat \
- LogAnalysis: `projects/msp-tools/guru-rmm/dashboard/src/components/LogAnalysis.tsx`
- Roadmap spec: `projects/msp-tools/guru-rmm/docs/FEATURE_ROADMAP.md` lines 644-752
- Coord message: notified Howard about 0.3.36 deployment
---
## Update: 16:05 PT — UISP/UNMS data migration (2.4.206 -> 3.0.147) on Jupiter
## User
- **User:** Mike Swanson (mike)
- **Machine:** GURU-5070
- **Role:** admin
### Session Summary
Migrated the legacy UISP/UNMS data into the freshly deployed UISP 3.0.147 Docker container (`DockerUISP`) on Jupiter (172.16.3.20). The new container had been deployed empty; the original install was dead. Investigation found catalog corruption in the original PostgreSQL 13 cluster: 38 orphaned `pg_rewrite` matview rules, a corrupt view definition in `pg_toast_2618`, and two missing index data files. This corruption is what aborted the original in-container `pg_upgrade` (PG13->PG17). `pg_upgrade` runs an internal schema dump, and that dump hits the same sanity-check failure as a standalone `pg_dump`. The aborted run left half-finished `data_old`/`data_new` dirs and a non-booting install.
Diagnosed and repaired on throwaway copies. The table heaps were intact; only the catalog and one telemetry table (`device_interface_statistics`, missing data file) were damaged. Deleted all orphaned `pg_rewrite` entries and REINDEXed the missing indexes from the intact heaps. Then extracted a clean data-only dump (`uisp-data.dump`, 417 KB) plus a full clean dump (`uisp-full-2.4.206.dump`, 1.18 MB). The real config was fully recovered: 29 sites, 59 devices, 215 interfaces, 8 CRM clients/services, 2 users, the master key (`aesKey`), and per-device keys.
Determined the old data was UISP 2.4.206 (per `unms.version`). Confirmed that 3.0.147 is a direct continuation of the same Sequelize migration lineage: the shared head is `20250317150507`, and 3.0.147 adds only ~5 newer telemetry migrations, incl. TimescaleDB hypertables. Stood up a clean 2.4.206 container and loaded the repaired dump into it (truncate, then data-only `pg_restore --disable-triggers`, excluding `SequelizeMeta`). This produced a working 2.4.206 install with the full fleet. Then used the nico640 image's own `/migrate.sh` to perform the real `pg_upgrade` 13->17, which succeeded cleanly on the corruption-free rebuild. UISP's 2.4.206->3.0.147 app migrations followed (text->jsonb, hypertable conversion) and finished in 24.6 s.
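The lineage check above asks one question: do the migrations applied by 2.4.206 form an exact prefix of the 3.0.147 list? Below is a minimal Python sketch under stated assumptions:

- Both inputs are the `name` values from each version's `SequelizeMeta` table.
- `lineage_continuation` is a hypothetical helper.
- Every migration name except `20250317150507-change_error_data_type_to_bigint` is a hypothetical example.

```python
from __future__ import annotations


def lineage_continuation(old: list[str], new: list[str]) -> tuple[str, list[str]] | None:
    """Return (shared_head, newer_migrations) if `new` strictly extends `old`, else None."""
    # Plain string sort is chronological: names start with a fixed-width timestamp.
    old_sorted = sorted(old)
    new_sorted = sorted(new)
    # The old release must be an exact prefix of the new one (same names, same order).
    if not old_sorted or new_sorted[: len(old_sorted)] != old_sorted:
        return None
    return old_sorted[-1], new_sorted[len(old_sorted):]


old = [
    "20240101000000-example_a",  # hypothetical earlier migration
    "20250317150507-change_error_data_type_to_bigint",
]
new = old + ["20250601000000-example_hypertable"]  # hypothetical newer migration

head, extra = lineage_continuation(old, new)
print(head)   # 20250317150507-change_error_data_type_to_bigint
print(extra)  # ['20250601000000-example_hypertable']
```

Sorting works because Sequelize migration names begin with a `YYYYMMDDHHMMSS` timestamp. The strict prefix test is deliberately conservative: a migration backported into the middle of the order would be reported as a break rather than silently accepted.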
Cut the upgraded `/config` over into the production `DockerUISP` container. The swap is reversible: the prior config is backed up to `DockerUISP.preempty-bak`, and the real Let's Encrypt cert is preserved. Production is live at `https://unms.azcomputerguru.com` (-> NPM -> 172.16.3.25), serving 3.0.147 with 29 sites / 51 devices / 8 clients. Verified device reconnection: 11 devices are connected live and 28 of 51 hold valid working keys. The 22 "unauthorized" entries are 21 discovered LAN IPs (discovery noise, not adoptable agents) plus one real device (`LaHC-Casita`, LBE-5AC-Gen2) that needs re-adoption. Cleaned up all throwaway containers/dirs.
### Key Decisions
- Chose a native version upgrade (let UISP run its own migrations) over a raw data load into 3.0.147 or a fresh start, because it preserves the master key so the managed fleet auto-reconnects. A raw data-only load into 3.0.147 was tested and rejected: 47 errors due to the `setting.value` text->jsonb change and other 3.0 schema changes.
- Repaired the corrupt cluster via catalog surgery (deleting orphaned `pg_rewrite` rows, REINDEX) rather than attempting a full schema dump; the corruption was catalog-only and the table heaps were intact.
- Routed the upgrade through a clean 2.4.206 install + the image's built-in `/migrate.sh`, so `pg_upgrade` ran on a corruption-free cluster (the exact step that killed the original).
- Cut over by swapping data into the existing `DockerUISP` container (keeps Unraid's container definition, IP 172.16.3.25, and the real LE cert) rather than repointing networking to a new container.
### Problems Encountered
- `pg_dump`/`pg_upgrade` failed the sanity check (orphaned `pg_rewrite` parent OIDs) -> deleted all orphaned rewrite rules on a throwaway copy.
- The corrupt view-definition TOAST data blocked a full schema dump, and `DROP VIEW` failed with catalog heap errors -> used a data-only dump (which never calls `pg_get_viewdef`).
- `device` table unreadable (missing `device_model_index` file 202207703) -> `REINDEX` rebuilt it from the intact heap; one telemetry table (`device_interface_statistics`, missing file) excluded from the dump.
- Direct 2.4.206 data-only load into 3.0.147 produced 47 errors (text->jsonb type change on `setting.value`, etc.) -> abandoned in favor of the native upgrade path.
- Device "Decryption failed using master key" log spam -> determined non-issue: managed radios reconnected (11 live, 28 with keys); errors are from discovered LAN IPs and one un-adopted device.
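One plausible reading of the 47 load errors: a data-only dump carries `setting.value` as raw text, and a `jsonb` column only accepts valid JSON. The native 2.4.206->3.0.147 migration converts the stored values itself. The Python sketch below illustrates this under stated assumptions:

- The converted value is assumed to be a JSON string scalar. This is consistent with the `value#>>'{}'` extraction noted under Reference Information.
- The helper names and the sample value are hypothetical.
- `json.loads` is only a rough stand-in for PostgreSQL's jsonb input parser.

```python
from __future__ import annotations

import json


def accepts_as_jsonb(raw: str) -> bool:
    """Rough stand-in for jsonb input validation: does `raw` parse as JSON?"""
    try:
        json.loads(raw)
    except json.JSONDecodeError:
        return False
    return True


def legacy_text_to_jsonb(value: str) -> str:
    """Wrap a legacy text setting as a JSON string literal (hypothetical conversion)."""
    return json.dumps(value)


def jsonb_scalar_as_text(doc: str) -> str | None:
    """Python analogue of `value #>> '{}'`: the top-level value as unquoted text."""
    parsed = json.loads(doc)
    if parsed is None:
        return None               # JSON null maps to SQL NULL
    if isinstance(parsed, str):
        return parsed             # string scalar: quotes stripped
    return json.dumps(parsed)     # other values: their JSON text (approximates PostgreSQL)


legacy = "example-not-the-real-key"   # hypothetical; never paste the real aesKey
wrapped = legacy_text_to_jsonb(legacy)
print(accepts_as_jsonb(legacy))                 # False: bare text is not JSON
print(wrapped)                                  # "example-not-the-real-key"
print(jsonb_scalar_as_text(wrapped) == legacy)  # True
```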
### Configuration Changes
- Production: `/mnt/user/appdata/DockerUISP/` (Jupiter) — `/config` replaced with the migrated 3.0.147 data (PG17). Real LE cert preserved.
- Backup/rollback: `/mnt/user/appdata/DockerUISP.preempty-bak/` (pre-cutover empty 3.0.147 config, ~5.2 GB — safe to delete after validation).
- Recovery dumps: `/mnt/user/appdata/uisp-recovery/uisp-data.dump` (417 KB), `uisp-full-2.4.206.dump` (1.18 MB).
- Removed throwaway dirs: `uisp-migrate`, `uisp30test`, `uisp-upgrade`, `pg13b`, `pg13dump`.
### Credentials & Secrets
- UISP master key `aesKey` (setting table): 48-char value `pF0so5WI...56OKHz` (jsonb-wrapped in 3.0.147). Preserved through migration; do not regenerate.
- UISP admin user accounts (2) restored from old data with original credentials — log in via the portal to manage.
### Infrastructure & Servers
- Jupiter: 172.16.3.20 (Unraid host, NPM, docker).
- Prod UISP container: `DockerUISP`, image `nico640/docker-unms:latest` (3.0.147), network `br0`, dedicated IP `172.16.3.25`, PG17 + TimescaleDB internal.
- Public: `https://unms.azcomputerguru.com` -> Cloudflare DNS -> office Cox -> NPM (172.16.3.20) -> 172.16.3.25. Real LE cert CN=unms.azcomputerguru.com.
- Old `/config` (preserved, untouched): `/mnt/user/appdata/unms/` (cert, siridb, redis, rabbitmq, unms app-data, postgres_13/9.6/ascii).
### Commands & Outputs
- Catalog repair: `delete from pg_rewrite r where not exists (select 1 from pg_class c where c.oid=r.ev_class)` (DELETE 38); `reindex index unms.device_model_index`.
- Data load: `pg_restore -l dump | grep -v SequelizeMeta > rl.list; pg_restore --data-only --disable-triggers --no-owner -L rl.list dump`.
- Upgrade: nico640 `/migrate.sh <FROM_VERSION>` auto-invoked on PG version mismatch -> `pg_upgrade` 13->17; app migrations "Migrations finished in 24.586s"; PG_VERSION 13->17.
- Verify: `https://unms.azcomputerguru.com` -> 302; `select count(*)` from `unms.site`, `unms.device`, `ucrm.client` -> 29 | 51 | 8.
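The data-load command edits pg_restore's table of contents. `pg_restore -l` prints one line per archive entry, and `pg_restore -L` restores only the entries left in the edited list; lines starting with `;` are comments. The Python sketch below does the same filtering as `grep -v SequelizeMeta`, with one difference: it matches whole whitespace-separated tokens, so a differently named table that merely contains the string survives. `filter_toc` and the sample TOC lines are illustrative, not copied from the real dump.

```python
from __future__ import annotations


def filter_toc(toc_lines: list[str],
               excluded: frozenset[str] = frozenset({"SequelizeMeta"})) -> list[str]:
    """Drop TOC entries that name an excluded object; keep ';' comment lines."""
    kept = []
    for line in toc_lines:
        is_comment = line.lstrip().startswith(";")
        # Token match: the excluded name must appear as a whole field, not a substring.
        if not is_comment and excluded & set(line.split()):
            continue
        kept.append(line)
    return kept


toc = [
    ";",
    "; Selected TOC Entries:",
    "101; 0 0 TABLE DATA unms SequelizeMeta unms",         # illustrative entry
    "102; 0 0 TABLE DATA unms device unms",                # illustrative entry
    "103; 0 0 TABLE DATA unms SequelizeMetaArchive unms",  # hypothetical table
]
rl_list = filter_toc(toc)
print("\n".join(rl_list))  # everything except entry 101
```

Writing `rl_list` back out, one entry per line, yields a file in the shape `pg_restore -L` expects.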
### Pending / Incomplete Tasks
- Re-adopt `LaHC-Casita` (LBE-5AC-Gen2, fw 8.7.19) from the UISP UI when reachable.
- Optional: delete `/mnt/user/appdata/DockerUISP.preempty-bak` (~5.2 GB) after validating production.
- Optional: TimescaleDB logs background-worker warnings at startup (`out of background workers`); bump `max_worker_processes` if compression/retention jobs lag.
- 17 authorized-but-not-connected devices should reconnect as they phone home; confirm over the next day.
### Reference Information
- Image upgrade mechanism: `/migrate.sh` (old PG binaries under `/postgres/<ver>/bin`). `init-postgres` runs initdb on an empty data dir or uses an existing one as-is; a PG version mismatch triggers the migrate step.
- Old data version: UISP 2.4.206; new: 3.0.147. Shared migration head: `20250317150507-change_error_data_type_to_bigint`.
- `setting.value` is `jsonb` in 3.0.147; extract with `value#>>'{}'`.
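The `init-postgres` behavior described above can be modeled as a three-way decision on the data directory's `PG_VERSION` file. PostgreSQL writes the cluster's major version to that file. This is a hypothetical Python re-creation, not the nico640 script itself; `postgres_action` and its return labels are illustrative.

```python
import tempfile
from pathlib import Path


def postgres_action(data_dir: Path, server_major: str) -> str:
    """Hypothetical model of init-postgres: 'initdb', 'use', or 'migrate'."""
    version_file = data_dir / "PG_VERSION"
    if not version_file.is_file():
        return "initdb"   # no cluster yet: create a fresh one
    on_disk = version_file.read_text().strip()  # e.g. "13", or "9.6" for old clusters
    if on_disk == server_major:
        return "use"      # same major version: start the cluster as-is
    return "migrate"      # mismatch: hand off to /migrate.sh, which runs pg_upgrade


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp)
    results = [postgres_action(data, "17")]       # empty dir
    (data / "PG_VERSION").write_text("13\n")
    results.append(postgres_action(data, "17"))   # old PG13 cluster
    (data / "PG_VERSION").write_text("17\n")
    results.append(postgres_action(data, "17"))   # already upgraded
print(results)  # ['initdb', 'migrate', 'use']
```

The sketch treats any mismatch as an upgrade. A real script would also need to refuse an on-disk version newer than the server binaries.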