sync: auto-sync from HOWARD-HOME at 2026-06-23 05:36:49
Author: Howard Enos Machine: HOWARD-HOME Timestamp: 2026-06-23 05:36:49
This commit is contained in:
@@ -0,0 +1,174 @@
|
||||
## User
|
||||
- **User:** Howard Enos (howard)
|
||||
- **Machine:** Howard-Home
|
||||
- **Role:** tech
|
||||
|
||||
## Session Summary
|
||||
|
||||
Resumed the GuruRMM remote software-uninstall work (SPEC-030) from a prior window.
|
||||
Reconstructed state from `session-logs/2026-06/2026-06-22-howard-gururmm-software-uninstall.md`:
|
||||
the engine + knowledge-catalog branch `feat/engine-bcu-improvements` (head `0e8323b`) was
|
||||
pushed but NOT merged, held for more validation. Per Howard's go-ahead ("as long as it doesn't
|
||||
affect the non-beta rmm site we can push"), mapped the merge-to-main deploy model
|
||||
(merge = build+deploy trigger; agent/server/dashboard land on BETA, prod dashboard promotion
|
||||
is a separate deliberate `promote-dashboard.sh`). Confirmed the one nuance: a single shared
|
||||
prod API serves both beta and prod dashboards, so merging deploys the server binary + runs
|
||||
migrations against the prod DB — but core endpoints are untouched, migrations are additive,
|
||||
and the prod UI is unaffected until promotion. Ran local pre-merge verification (server
|
||||
`cargo check`, dashboard build, migration collision check — 062 new, 061 identical to main),
|
||||
then merged PR #49 to main. Verified the deploy live: authenticated against the internal API
|
||||
over Tailscale and confirmed `GET /api/software/knowledge` returned 200 `[]` (new binary live,
|
||||
migration 062 applied).
|
||||
|
||||
Howard then reported a functional bug: running uninstall on several programs flashed a popup
|
||||
window briefly, then nothing reported whether programs were removed; the per-row removable/unknown
|
||||
labels worked but the action gave no feedback. Live forensics on the test agent showed ZERO
|
||||
removal attempts recorded and ZERO knowledge entries despite a window flashing — proving the
|
||||
engine ran partway but the result path recorded nothing. Howard redirected to a full audit +
|
||||
polish of the uninstall function ("issues or missing functions not mapped like the unknown
|
||||
flag"). Read all five layers (engine PS1, server `software.rs`, db `software_knowledge.rs` /
|
||||
`software_removal.rs`, dashboard `SoftwareManager.tsx`, API `client.ts`) and produced a
|
||||
seven-finding audit.
|
||||
|
||||
Root cause (Finding A): the engine emits its JSON results array ONLY at the end of the batch,
|
||||
so any single target throwing OUTSIDE `Invoke-Uninstall`'s inner try (under StrictMode +
|
||||
`EAP=Stop`) propagates to the main `catch` -> `exit 3` -> no JSON -> server records nothing and
|
||||
the dashboard shows no result, even though programs removed before the throw already ran (the
|
||||
flash). Implemented fixes across all three layers on branch `fix/software-uninstall-polish`,
|
||||
verified (server cargo check, dashboard tsc + build, engine PS parse + dry-run), committed
|
||||
(`bd6dd27`), and merged+deployed via PR #50. The server came up healthy (internal `/health` 200)
|
||||
but end-to-end behavioral re-test is blocked: the test box DESKTOP-MS42HNC went offline ~47 min
|
||||
before the deploy (VM powered off), so any uninstall returns `503 agent offline`.
|
||||
|
||||
## Key Decisions
|
||||
|
||||
- **Merged PR #49 (engine-bcu + catalog) under Howard's conditional go.** Confirmed the
|
||||
shared-server nuance explicitly before pushing; declined to promote the dashboard to prod.
|
||||
- **Treated the no-feedback bug as an engine-robustness root cause, not a forensic hunt.**
|
||||
The 0-attempts/0-knowledge + window-flash evidence was conclusive; stopped fighting the
|
||||
GuruScan command-flood to retrieve the exact buried command record (cost > value).
|
||||
- **Fixed at the engine first (always emit JSON), server second (tolerant parse).** Defense in
|
||||
depth: even if a future engine path exits non-zero with partial results, the server now
|
||||
records what came back instead of discarding the batch.
|
||||
- **Per-device removal-history surfaced in the dashboard** rather than left as dead API methods
|
||||
— the recorded attempts + logs (migration 061) were invisible; this is the "function not
|
||||
mapped" Howard referenced.
|
||||
- **Knowledge catalog cross-referenced onto inventory rows** ("the unknown flag"): a catalog-
|
||||
confirmed `requires_ui` program is now excluded from silent bulk-select even with an uninstall
|
||||
string; unknown/silent surfaced per-row.
|
||||
- **Branched `fix/software-uninstall-polish` off the NEW main** (post-PR#49) rather than the old
|
||||
feature branch, to avoid re-introducing already-merged work.
|
||||
|
||||
## Problems Encountered
|
||||
|
||||
- **Test-box agent offline blocks end-to-end verification.** DESKTOP-MS42HNC last seen
|
||||
`2026-06-23T05:36:45Z` (~47 min before the PR#50 deploy) — VM powered off/asleep. The
|
||||
software list/uninstall endpoints return `503 agent offline` until it reconnects.
|
||||
- **No clean remote way to confirm the exact binary-swap moment.** Cloudflare blocks dashboard
|
||||
fetches (JS challenge -> 403) even with a browser UA, and there's no password-SSH tool on
|
||||
Howard-Home (no plink/sshpass/Posh-SSH; only system OpenSSH) to tail `.30` build logs. Relied
|
||||
on: pipeline version-bump commit landing + internal `/health` 200 + auto-rollback safety net.
|
||||
- **Gitea `vault.sh get-field credentials.api-token` mis-resolves to 4 chars** (recurring). Read
|
||||
the raw token by decrypting `D:/vault/services/gitea.sops.yaml` and grepping the `api-token`
|
||||
line. Token used via PowerShell `Invoke-RestMethod` against the public Gitea API (internal
|
||||
`172.16.3.20:3000` unreachable until Tailscale came up).
|
||||
- **Git-Bash heredoc mangled `\\` to single backslash** in test-data JSON (even quoted `<<'EOF'`)
|
||||
-> invalid JSON -> engine `exit 3` ("Unrecognized escape sequence"). Rebuilt the test targets
|
||||
via PowerShell `ConvertTo-Json`. Logged as friction (errorlog, ref `feedback_tmp_path_windows`).
|
||||
- **`Invoke-WebRequest` failed in NonInteractive mode** without `-UseBasicParsing`; switched to
|
||||
`Invoke-RestMethod` for API calls.
|
||||
- **Large `/api/commands?limit=400` pull timed out** (2 min) — base64 engine bodies + GuruScan
|
||||
command flood make the list heavy. Abandoned the buried-record retrieval.
|
||||
|
||||
## Configuration Changes
|
||||
|
||||
Branch `fix/software-uninstall-polish` (guru-rmm submodule), commit `bd6dd27`, merged to main
|
||||
via PR #50 (`e83b06c`):
|
||||
- `agent/scripts/uninstall-engine.ps1` — per-target try/catch in the main loop; one bad target
|
||||
becomes a `failed` result row instead of aborting the batch; JSON array ALWAYS emitted.
|
||||
- `server/src/api/software.rs` — tolerant result extraction (parse results from stdout even on
|
||||
non-zero exit; only fail when no array present); `-WindowStyle Hidden` on both dispatch
|
||||
wrappers (list + uninstall); factored out `extract_json_array()`.
|
||||
- `server/src/db/software_knowledge.rs` — `#[allow(dead_code)]` on `get_by_name` (kept for
|
||||
future server-side row enrichment).
|
||||
- `dashboard/src/components/SoftwareManager.tsx` — wired per-device removal-history panel
|
||||
(removalStatus/resolveRemoval) with logs + dismiss; cross-referenced fleet catalog onto
|
||||
inventory rows (requires_ui excluded from silent bulk-select; unknown/silent surfaced);
|
||||
StatusBadge explicit `unknown`/`dry-run`.
|
||||
|
||||
Already merged earlier this session (PR #49, `feat/engine-bcu-improvements` -> main `e83b06c`'s
|
||||
parent line, merge `bce9e80`, version-bump `c399e70`): engine BCU improvements + knowledge
|
||||
catalog (migration 062) + audit fixes — now live on the shared prod API.
|
||||
|
||||
Parent claudetools: `errorlog.md` (friction entry — heredoc JSON mangling).
|
||||
|
||||
## Credentials & Secrets
|
||||
|
||||
- No new credentials created. Reused (all already vaulted):
|
||||
- GuruRMM API admin: `claude-api@azcomputerguru.com` / `ClaudeAPI2026!@#` — vault
|
||||
`infrastructure/gururmm-server.sops.yaml` fields `credentials.gururmm-api.admin-email` /
|
||||
`admin-password`. Used for API auth during deploy verification.
|
||||
- Gitea API token: `9b1da4b79a38ef782268341d25a4b6880572063f` — vault
|
||||
`services/gitea.sops.yaml` field `credentials.api-token` (the `get-field` helper mis-resolves
|
||||
it; read via raw `sops -d`). Used for PR #49/#50 create+merge via the public Gitea API.
|
||||
- GuruRMM build host SSH: `guru` / `Paper123!@#-rmm` (sudo same) — vault
|
||||
`infrastructure/gururmm-server.sops.yaml`. Not used (no password-SSH tool available).
|
||||
|
||||
## Infrastructure & Servers
|
||||
|
||||
- GuruRMM API/server + build host: `172.16.3.30` (Ubuntu 22.04, user `guru`, repo
|
||||
`/home/guru/gururmm`). API on `:3001` (prod, shared by beta+prod dashboards). Internal
|
||||
`/health` -> 200. Postgres `:5432` localhost-only (not reachable over Tailscale). DB
|
||||
`gururmm` / user `gururmm` / pw `43617ebf7eb242e814ca9988cc4df5ad`.
|
||||
- Public API: `https://rmm-api.azcomputerguru.com` (`/health` 200; SPA hosts 403 to non-browser).
|
||||
- Beta dashboard `https://rmm-beta.azcomputerguru.com` (built from main). Prod dashboard
|
||||
`https://rmm.azcomputerguru.com` (promoted deliberately — UNCHANGED this session).
|
||||
- Gitea internal `172.16.3.20:3000` (unreachable until Tailscale up); public
|
||||
`https://git.azcomputerguru.com` behind Cloudflare (Invoke-RestMethod works; curl blocked).
|
||||
- Test box **DESKTOP-MS42HNC** (AZ Computer Guru / Howard-VM), agent id
|
||||
`0de89b88-b21d-4647-ab64-96157ba87cc5` — OFFLINE since `2026-06-23T05:36:45Z`.
|
||||
- Internal network (172.16.3.x) reachable only over **Tailscale** from Howard-Home.
|
||||
|
||||
## Commands & Outputs
|
||||
|
||||
- Local pre-merge verify: `bash .claude/skills/gururmm-build/scripts/verify.sh server --check`
|
||||
(PASS), `... dashboard` (PASS), `... migrations` (062 next free, no collision).
|
||||
- Deploy-live check: `POST /api/auth/login` then `GET /api/software/knowledge` -> 200 `[]`.
|
||||
- Forensic: `GET /api/agents/<id>/software/removal-status` -> empty; `GET /api/software/knowledge`
|
||||
-> count 0 (proved nothing recorded). `GET /api/commands?agent_id=<id>&limit=60` -> all
|
||||
GuruScan monitoring (one per ~37s); uninstall commands scrolled past, 400-pull timed out.
|
||||
- Engine dry-run smoke test (PowerShell ConvertTo-Json targets): exit 0, one row per target —
|
||||
`Fake MSI App` tier=msi, `Fake NSIS App` tier=silent, `No Metadata App` needs_interactive,
|
||||
`GuruRMM Agent` refused. PS parse: `[OK] engine parses clean`.
|
||||
- PR create+merge via public Gitea API (`Invoke-RestMethod`): PR #49 -> merge `bce9e80`; PR #50
|
||||
-> merge `e83b06c`; pipeline bumps `c399e70` / `49ea0ef`.
|
||||
- `list err 503` on software list post-deploy = agent offline (server returns 503 when the agent
|
||||
isn't WS-connected).
|
||||
|
||||
## Pending / Incomplete Tasks
|
||||
|
||||
- **Power on DESKTOP-MS42HNC** and let the agent reconnect — REQUIRED to verify the fix
|
||||
end-to-end. Then either Howard re-runs the uninstall from beta (expect per-program results +
|
||||
the new "Removal history - this device" panel to populate), or Claude runs a controlled
|
||||
single-program uninstall via the API (destructive — Howard names a throwaway).
|
||||
- **Confirm the new server binary swapped** (couldn't pinpoint remotely) — implicitly proven once
|
||||
a post-deploy uninstall returns non-empty results (old broken engine returned nothing).
|
||||
- **Post-deploy catalog checks** still open from the prior session: verify the catalog populates
|
||||
on a real uninstall + promote an unknown live on beta.
|
||||
- **Prod dashboard promotion** NOT done (intentional). Promote deliberately later via
|
||||
`sudo /opt/gururmm/promote-dashboard.sh --confirm` when the beta UI is validated.
|
||||
- Follow-ons unchanged: GuruConnect SPEC-019, rip-and-replace Tier 1.4, Tier 1.5, Linux/macOS.
|
||||
|
||||
## Reference Information
|
||||
|
||||
- guru-rmm main head: `e83b06c` (merge PR #50), pipeline bump `49ea0ef`. Fix commit `bd6dd27`
|
||||
on `fix/software-uninstall-polish`.
|
||||
- Earlier merge: PR #49 `bce9e80`, bump `c399e70`. Pre-session main pin `1ec0256`.
|
||||
- PRs: https://git.azcomputerguru.com/azcomputerguru/gururmm/pulls/49 and /50.
|
||||
- Audit findings A-G (this session): A engine batch-abort (root cause, critical), B window flash,
|
||||
C server intolerant of non-zero exit, D removal-status not mapped to UI, E inventory rows
|
||||
ignore catalog ("unknown flag"), F StatusBadge missing unknown/dry-run, G `get_by_name` dead.
|
||||
- Engine embed path: server `include_str!("../../../agent/scripts/uninstall-engine.ps1")`.
|
||||
- Prior session log: `session-logs/2026-06/2026-06-22-howard-gururmm-software-uninstall.md`.
|
||||
- Build model: `projects/msp-tools/guru-rmm/docs/BUILD.md`; pipeline
|
||||
`deploy/build-pipeline/README.md`. Promote: `/opt/gururmm/promote-dashboard.sh`.
|
||||
Reference in New Issue
Block a user