From da411b0f4895627655d9826148f57cb87224564f Mon Sep 17 00:00:00 2001 From: Howard Enos Date: Tue, 23 Jun 2026 05:37:43 -0700 Subject: [PATCH] sync: auto-sync from HOWARD-HOME at 2026-06-23 05:36:49 Author: Howard Enos Machine: HOWARD-HOME Timestamp: 2026-06-23 05:36:49 --- ...-howard-gururmm-uninstall-polish-deploy.md | 174 ++++++++++++++++++ 1 file changed, 174 insertions(+) create mode 100644 session-logs/2026-06/2026-06-23-howard-gururmm-uninstall-polish-deploy.md diff --git a/session-logs/2026-06/2026-06-23-howard-gururmm-uninstall-polish-deploy.md b/session-logs/2026-06/2026-06-23-howard-gururmm-uninstall-polish-deploy.md new file mode 100644 index 00000000..850af355 --- /dev/null +++ b/session-logs/2026-06/2026-06-23-howard-gururmm-uninstall-polish-deploy.md @@ -0,0 +1,174 @@ +## User +- **User:** Howard Enos (howard) +- **Machine:** Howard-Home +- **Role:** tech + +## Session Summary + +Resumed the GuruRMM remote software-uninstall work (SPEC-030) from a prior window. +Reconstructed state from `session-logs/2026-06/2026-06-22-howard-gururmm-software-uninstall.md`: +the engine + knowledge-catalog branch `feat/engine-bcu-improvements` (head `0e8323b`) was +pushed but NOT merged, held for more validation. Per Howard's go-ahead ("as long as it doesn't +affect the non-beta rmm site we can push"), mapped the merge-to-main deploy model +(merge = build+deploy trigger; agent/server/dashboard land on BETA, prod dashboard promotion +is a separate deliberate `promote-dashboard.sh`). Confirmed the one nuance: a single shared +prod API serves both beta and prod dashboards, so merging deploys the server binary + runs +migrations against the prod DB — but core endpoints are untouched, migrations are additive, +and the prod UI is unaffected until promotion. Ran local pre-merge verification (server +`cargo check`, dashboard build, migration collision check — 062 new, 061 identical to main), +then merged PR #49 to main. Verified the deploy live: authenticated against the internal API +over Tailscale and confirmed `GET /api/software/knowledge` returned 200 `[]` (new binary live, +migration 062 applied). + +Howard then reported a functional bug: running uninstall on several programs flashed a popup +window briefly, then nothing reported whether programs were removed; the per-row removable/unknown +labels worked but the action gave no feedback. Live forensics on the test agent showed ZERO +removal attempts recorded and ZERO knowledge entries despite a window flashing — proving the +engine ran partway but the result path recorded nothing. Howard redirected to a full audit + +polish of the uninstall function ("issues or missing functions not mapped like the unknown +flag"). Read all five layers (engine PS1, server `software.rs`, db `software_knowledge.rs` / +`software_removal.rs`, dashboard `SoftwareManager.tsx`, API `client.ts`) and produced a +seven-finding audit. + +Root cause (Finding A): the engine emits its JSON results array ONLY at the end of the batch, +so any single target throwing OUTSIDE `Invoke-Uninstall`'s inner try (under StrictMode + +`EAP=Stop`) propagates to the main `catch` -> `exit 3` -> no JSON -> server records nothing and +the dashboard shows no result, even though programs removed before the throw already ran (the +flash). Implemented fixes across all three layers on branch `fix/software-uninstall-polish`, +verified (server cargo check, dashboard tsc + build, engine PS parse + dry-run), committed +(`bd6dd27`), and merged+deployed via PR #50. The server came up healthy (internal `/health` 200) +but end-to-end behavioral re-test is blocked: the test box DESKTOP-MS42HNC went offline ~47 min +before the deploy (VM powered off), so any uninstall returns `503 agent offline`. + +## Key Decisions + +- **Merged PR #49 (engine-bcu + catalog) under Howard's conditional go.** Confirmed the + shared-server nuance explicitly before pushing; declined to promote the dashboard to prod. +- **Treated the no-feedback bug as an engine-robustness root cause, not a forensic hunt.** + The 0-attempts/0-knowledge + window-flash evidence was conclusive; stopped fighting the + GuruScan command-flood to retrieve the exact buried command record (cost > value). +- **Fixed at the engine first (always emit JSON), server second (tolerant parse).** Defense in + depth: even if a future engine path exits non-zero with partial results, the server now + records what came back instead of discarding the batch. +- **Per-device removal-history surfaced in the dashboard** rather than left as dead API methods + — the recorded attempts + logs (migration 061) were invisible; this is the "function not + mapped" Howard referenced. +- **Knowledge catalog cross-referenced onto inventory rows** ("the unknown flag"): a catalog- + confirmed `requires_ui` program is now excluded from silent bulk-select even with an uninstall + string; unknown/silent surfaced per-row. +- **Branched `fix/software-uninstall-polish` off the NEW main** (post-PR#49) rather than the old + feature branch, to avoid re-introducing already-merged work. + +## Problems Encountered + +- **Test-box agent offline blocks end-to-end verification.** DESKTOP-MS42HNC last seen + `2026-06-23T05:36:45Z` (~47 min before the PR#50 deploy) — VM powered off/asleep. The + software list/uninstall endpoints return `503 agent offline` until it reconnects. +- **No clean remote way to confirm the exact binary-swap moment.** Cloudflare blocks dashboard + fetches (JS challenge -> 403) even with a browser UA, and there's no password-SSH tool on + Howard-Home (no plink/sshpass/Posh-SSH; only system OpenSSH) to tail `.30` build logs. Relied + on: pipeline version-bump commit landing + internal `/health` 200 + auto-rollback safety net. +- **Gitea `vault.sh get-field credentials.api-token` mis-resolves to 4 chars** (recurring). Read + the raw token by decrypting `D:/vault/services/gitea.sops.yaml` and grepping the `api-token` + line. Token used via PowerShell `Invoke-RestMethod` against the public Gitea API (internal + `172.16.3.20:3000` unreachable until Tailscale came up). +- **Git-Bash heredoc mangled `\\` to single backslash** in test-data JSON (even quoted `<<'EOF'`) + -> invalid JSON -> engine `exit 3` ("Unrecognized escape sequence"). Rebuilt the test targets + via PowerShell `ConvertTo-Json`. Logged as friction (errorlog, ref `feedback_tmp_path_windows`). +- **`Invoke-WebRequest` failed in NonInteractive mode** without `-UseBasicParsing`; switched to + `Invoke-RestMethod` for API calls. +- **Large `/api/commands?limit=400` pull timed out** (2 min) — base64 engine bodies + GuruScan + command flood make the list heavy. Abandoned the buried-record retrieval. + +## Configuration Changes + +Branch `fix/software-uninstall-polish` (guru-rmm submodule), commit `bd6dd27`, merged to main +via PR #50 (`e83b06c`): +- `agent/scripts/uninstall-engine.ps1` — per-target try/catch in the main loop; one bad target + becomes a `failed` result row instead of aborting the batch; JSON array ALWAYS emitted. +- `server/src/api/software.rs` — tolerant result extraction (parse results from stdout even on + non-zero exit; only fail when no array present); `-WindowStyle Hidden` on both dispatch + wrappers (list + uninstall); factored out `extract_json_array()`. +- `server/src/db/software_knowledge.rs` — `#[allow(dead_code)]` on `get_by_name` (kept for + future server-side row enrichment). +- `dashboard/src/components/SoftwareManager.tsx` — wired per-device removal-history panel + (removalStatus/resolveRemoval) with logs + dismiss; cross-referenced fleet catalog onto + inventory rows (requires_ui excluded from silent bulk-select; unknown/silent surfaced); + StatusBadge explicit `unknown`/`dry-run`. + +Already merged earlier this session (PR #49, `feat/engine-bcu-improvements` -> main `e83b06c`'s +parent line, merge `bce9e80`, version-bump `c399e70`): engine BCU improvements + knowledge +catalog (migration 062) + audit fixes — now live on the shared prod API. + +Parent claudetools: `errorlog.md` (friction entry — heredoc JSON mangling). + +## Credentials & Secrets + +- No new credentials created. Reused (all already vaulted): + - GuruRMM API admin: `claude-api@azcomputerguru.com` / `ClaudeAPI2026!@#` — vault + `infrastructure/gururmm-server.sops.yaml` fields `credentials.gururmm-api.admin-email` / + `admin-password`. Used for API auth during deploy verification. + - Gitea API token: `9b1da4b79a38ef782268341d25a4b6880572063f` — vault + `services/gitea.sops.yaml` field `credentials.api-token` (the `get-field` helper mis-resolves + it; read via raw `sops -d`). Used for PR #49/#50 create+merge via the public Gitea API. + - GuruRMM build host SSH: `guru` / `Paper123!@#-rmm` (sudo same) — vault + `infrastructure/gururmm-server.sops.yaml`. Not used (no password-SSH tool available). + +## Infrastructure & Servers + +- GuruRMM API/server + build host: `172.16.3.30` (Ubuntu 22.04, user `guru`, repo + `/home/guru/gururmm`). API on `:3001` (prod, shared by beta+prod dashboards). Internal + `/health` -> 200. Postgres `:5432` localhost-only (not reachable over Tailscale). DB + `gururmm` / user `gururmm` / pw `43617ebf7eb242e814ca9988cc4df5ad`. +- Public API: `https://rmm-api.azcomputerguru.com` (`/health` 200; SPA hosts 403 to non-browser). +- Beta dashboard `https://rmm-beta.azcomputerguru.com` (built from main). Prod dashboard + `https://rmm.azcomputerguru.com` (promoted deliberately — UNCHANGED this session). +- Gitea internal `172.16.3.20:3000` (unreachable until Tailscale up); public + `https://git.azcomputerguru.com` behind Cloudflare (Invoke-RestMethod works; curl blocked). +- Test box **DESKTOP-MS42HNC** (AZ Computer Guru / Howard-VM), agent id + `0de89b88-b21d-4647-ab64-96157ba87cc5` — OFFLINE since `2026-06-23T05:36:45Z`. +- Internal network (172.16.3.x) reachable only over **Tailscale** from Howard-Home. + +## Commands & Outputs + +- Local pre-merge verify: `bash .claude/skills/gururmm-build/scripts/verify.sh server --check` + (PASS), `... dashboard` (PASS), `... migrations` (062 next free, no collision). +- Deploy-live check: `POST /api/auth/login` then `GET /api/software/knowledge` -> 200 `[]`. +- Forensic: `GET /api/agents//software/removal-status` -> empty; `GET /api/software/knowledge` + -> count 0 (proved nothing recorded). `GET /api/commands?agent_id=&limit=60` -> all + GuruScan monitoring (one per ~37s); uninstall commands scrolled past, 400-pull timed out. +- Engine dry-run smoke test (PowerShell ConvertTo-Json targets): exit 0, one row per target — + `Fake MSI App` tier=msi, `Fake NSIS App` tier=silent, `No Metadata App` needs_interactive, + `GuruRMM Agent` refused. PS parse: `[OK] engine parses clean`. +- PR create+merge via public Gitea API (`Invoke-RestMethod`): PR #49 -> merge `bce9e80`; PR #50 + -> merge `e83b06c`; pipeline bumps `c399e70` / `49ea0ef`. +- `list err 503` on software list post-deploy = agent offline (server returns 503 when the agent + isn't WS-connected). + +## Pending / Incomplete Tasks + +- **Power on DESKTOP-MS42HNC** and let the agent reconnect — REQUIRED to verify the fix + end-to-end. Then either Howard re-runs the uninstall from beta (expect per-program results + + the new "Removal history - this device" panel to populate), or Claude runs a controlled + single-program uninstall via the API (destructive — Howard names a throwaway). +- **Confirm the new server binary swapped** (couldn't pinpoint remotely) — implicitly proven once + a post-deploy uninstall returns non-empty results (old broken engine returned nothing). +- **Post-deploy catalog checks** still open from the prior session: verify the catalog populates + on a real uninstall + promote an unknown live on beta. +- **Prod dashboard promotion** NOT done (intentional). Promote deliberately later via + `sudo /opt/gururmm/promote-dashboard.sh --confirm` when the beta UI is validated. +- Follow-ons unchanged: GuruConnect SPEC-019, rip-and-replace Tier 1.4, Tier 1.5, Linux/macOS. + +## Reference Information + +- guru-rmm main head: `e83b06c` (merge PR #50), pipeline bump `49ea0ef`. Fix commit `bd6dd27` + on `fix/software-uninstall-polish`. +- Earlier merge: PR #49 `bce9e80`, bump `c399e70`. Pre-session main pin `1ec0256`. +- PRs: https://git.azcomputerguru.com/azcomputerguru/gururmm/pulls/49 and /50. +- Audit findings A-G (this session): A engine batch-abort (root cause, critical), B window flash, + C server intolerant of non-zero exit, D removal-status not mapped to UI, E inventory rows + ignore catalog ("unknown flag"), F StatusBadge missing unknown/dry-run, G `get_by_name` dead. +- Engine embed path: server `include_str!("../../../agent/scripts/uninstall-engine.ps1")`. +- Prior session log: `session-logs/2026-06/2026-06-22-howard-gururmm-software-uninstall.md`. +- Build model: `projects/msp-tools/guru-rmm/docs/BUILD.md`; pipeline + `deploy/build-pipeline/README.md`. Promote: `/opt/gururmm/promote-dashboard.sh`.