From f55feb07fab581d6eb2be92c16617ac26952a3c8 Mon Sep 17 00:00:00 2001
From: Howard Enos
Date: Sun, 21 Jun 2026 16:55:00 -0700
Subject: [PATCH] sync: auto-sync from HOWARD-HOME at 2026-06-21 16:54:31

Author: Howard Enos
Machine: HOWARD-HOME
Timestamp: 2026-06-21 16:54:31
---
 .../2026-06-21-howard-gururmm-bug-018-019.md | 103 ++++++++++++++++++
 1 file changed, 103 insertions(+)
 create mode 100644 session-logs/2026-06/2026-06-21-howard-gururmm-bug-018-019.md

diff --git a/session-logs/2026-06/2026-06-21-howard-gururmm-bug-018-019.md b/session-logs/2026-06/2026-06-21-howard-gururmm-bug-018-019.md
new file mode 100644
index 00000000..49088a45
--- /dev/null
+++ b/session-logs/2026-06/2026-06-21-howard-gururmm-bug-018-019.md
@@ -0,0 +1,103 @@
+# Session — GuruRMM bug fixes: BUG-016/017 verify, BUG-019 fix, BUG-018 investigation
+
+## User
+- **User:** Howard Enos (howard)
+- **Machine:** Howard-Home
+- **Role:** tech
+
+## Session Summary
+
+Picked up GuruRMM bug fixes after triaging the coord inbox (same HOWARD-HOME session that did the
+unifi-wifi pfSense build-out; see `2026-06-21-howard-unifi-pfsense-control-verbs.md`). GURU-KALI had
+flagged BUG-016/017 (P1 Linux-fleet) and offered Howard BUG-016. Verified both against the live
+guru-rmm working tree (`ed8cad3` == origin/main): **already fixed** by Mike in commit `30da053`
+(2026-06-01) — `StateDirectory=gururmm` in the systemd unit template (016) and an `OnceLock` device-id
+cache in `device_id.rs` (017). Nothing to take there.
+
+Surveyed `docs/FEATURE_ROADMAP.md` for remaining open bugs: BUG-018 (DELETE agents, P2) and BUG-019
+(containerized agent self-update, P2). Fixed **BUG-019**: `agent/src/updater/mod.rs` `perform_update()`
+now early-returns inside a container (Linux-gated `is_docker_container()`), skipping the
+download/replace/rollback path — stops the silent downgrade on container recreate and the startup
+rollback-artifact log noise. Reports `UpdateStatus::Failed` + a clear message; a structured
+`update_method=image` field was deliberately not added (would need a matching server-side enum change —
+that's SPEC-023). Host `cargo check` clean; the Linux-gated branch could not be cross-compiled from the
+Windows dev box (`openssl-sys` cross build), so the Linux agent build is the final compile gate.
+
+Investigated **BUG-018** and corrected its root cause. Mike's roadmap hypothesis was "missing index on
+child tables." A static pass over all 59 migrations showed every HIGH-volume cascade child of `agents`
+(metrics, agent_logs, commands, agent_events, check_history, checks, alerts, watchdog_events, …) is
+**already** indexed on `agent_id` — only `alert_mutes` (low-volume) lacked one, now added (migration
+`060`, FK hygiene). So the slow `DELETE /api/agents/:id` cascade is **delete volume**, not a seq-scan:
+deleting an agent with months of per-minute metrics means deleting millions of child rows synchronously
+in-request. The durable fix is handler-level (202 + background/batched delete, or a bulk-delete
+endpoint), which changes the 204 API contract + the dashboard's delete→refetch flow — left **Open** for
+Mike's decision, with the corrected analysis written into the roadmap.
+
+Landed per Howard's choice: pushed a branch (not main → no build trigger) to guru-rmm origin and handed
+it to Mike via coord. Committed **only** the three intended files; the shared guru-rmm checkout had
+concurrent uncommitted WIP from other sessions (dashboard `.tsx`, `server/api/agents.rs`, `ws/mod.rs`,
+an rmm-audit report, `script-library/`) which was left untouched, and the submodule HEAD was restored to
+detached `ed8cad3`.
+
+## Key Decisions
+- **BUG-019 reports `Failed` (not a new `Skipped`/`update_method=image`):** the server deserializes
+  `UpdateResultPayload`/`UpdateStatus`; adding an enum variant agent-side would break server parsing.
+  Using the existing `Failed` + a clear message is the minimal, no-server-change fix; structured image
+  reporting is SPEC-023.
+- **BUG-018 not "fixed":** investigation disproved the missing-index theory, so shipping just the
+  `alert_mutes` index and calling it fixed would be wrong. The real fix is a server-behavior change
+  (Mike-owned). Documented the corrected root cause rather than guessing at a handler redesign.
+- **Branch, not direct-to-main** (Howard's call): production repo with a push-to-main build webhook +
+  the Linux path wasn't locally compile-verified → branch for Mike's review/build-gate.
+- **Committed only my 3 files** from the shared checkout (explicit `git add `, not `-A`) to avoid
+  sweeping concurrent sessions' WIP; verified the FEATURE_ROADMAP diff was solely my BUG-018/019 edits.
+
+## Problems Encountered
+- **Can't cross-compile-check the Linux-gated guard from Windows** (`openssl-sys` build script needs a
+  Linux OpenSSL). Mitigated with careful review + host `cargo check`; flagged the Linux agent build as
+  the compile gate.
+- **Shared-checkout concurrency:** creating the branch moved the shared submodule HEAD off detached
+  `ed8cad3`; a concurrent session then edited `FEATURE_ROADMAP.md`, so `git checkout ed8cad3` aborted
+  (would clobber their edit). Restored HEAD without data loss via `git checkout --detach` +
+  `git reset --mixed ed8cad3` (branch ref left at the pushed commit; all WIP preserved).
+- **`cd` into the submodule persisted** across Bash calls again (a recurring friction); ran sync from
+  `/c/claudetools` explicitly.
+
+## Configuration Changes
+In the guru-rmm submodule (on branch `bugfix/bug-019-container-selfupdate-and-bug-018-index`, commit `66a7f4e`, pushed to origin — NOT in the claudetools repo):
+- Modified: `agent/src/updater/mod.rs` (BUG-019 container guard), `docs/FEATURE_ROADMAP.md` (BUG-018 + BUG-019 entries).
+- Created: `server/migrations/060_alert_mutes_agent_id_index.sql` (BUG-018 FK hygiene).
+No changes committed to the ClaudeTools repo from this work (only this session log + gitignored `.claude/tmp/` scratch).
+
+## Credentials & Secrets
+None created, discovered, or used.
+
+## Infrastructure & Servers
+- GuruRMM repo: Gitea `https://git.azcomputerguru.com/azcomputerguru/gururmm.git`; working tree `ed8cad3` == origin/main.
+- GuruRMM server: 172.16.3.30 (push-to-main webhook builds agents; server is a separate manual/webhook build; migrations run on server deploy).
+- Rust toolchain on HOWARD-HOME: cargo 1.95.0; added rustup target `x86_64-unknown-linux-gnu` (check only; openssl-sys blocks cross build).
+
+## Commands & Outputs
+```
+# verify 016/017 fixed
+grep -n StateDirectory agent/src/main.rs  # StateDirectory=gururmm (BUG-016)
+grep -n OnceLock agent/src/device_id.rs  # CACHED_ID OnceLock get_or_init (BUG-017)
+# BUG-018 index analysis: only alert_mutes lacked an agent_id index
+cargo check  # host: clean (16 pre-existing warnings)
+cargo check --target x86_64-unknown-linux-gnu  # FAILS: openssl-sys cross build (env limit, not code)
+# land on a branch (no build trigger)
+git checkout -b bugfix/bug-019-...; git add <3 files>; git commit; git push -u origin
+git checkout --detach; git reset --mixed ed8cad3  # restore shared HEAD w/o losing concurrent WIP
+```
+
+## Pending / Incomplete Tasks
+- **BUG-018 handler fix** (Open, Mike's call): 202 + background/batched delete, or bulk-delete endpoint
+  (changes 204 contract + dashboard flow). Offered to implement; awaiting Mike. Coord msg `93f2fc0b`.
+- **BUG-019 compile gate:** confirm the Linux agent build is green when Mike merges the branch.
+- Branch `bugfix/bug-019-container-selfupdate-and-bug-018-index` awaiting Mike's review/merge to main.
+
+## Reference Information
+- Branch: `bugfix/bug-019-container-selfupdate-and-bug-018-index` @ `66a7f4e` (guru-rmm origin).
+- guru-rmm: 016/017 fixed in `30da053`; repo HEAD `ed8cad3`; FEATURE_ROADMAP BUG-018 ~line 411, BUG-019 ~453.
+- Coord: handoff to Mike `93f2fc0b` (to GURU-5070).
+- Companion log this session: `session-logs/2026-06/2026-06-21-howard-unifi-pfsense-control-verbs.md`.
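
The batched delete that the log proposes for BUG-018 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the guru-rmm handler: it uses Python with an in-memory SQLite database so that it runs anywhere (the log does not say which database the server uses), and the function `delete_agent_batched`, its `batch_size` parameter, and the two-table schema are hypothetical. Only the names `agents`, `metrics`, and `agent_id` come from the log.

```python
import sqlite3


def delete_agent_batched(conn, agent_id, batch_size=1000):
    """Delete an agent's metric rows in bounded batches, then the agent row.

    Hypothetical helper: returns the number of child rows removed.
    """
    removed = 0
    while True:
        # Each batch commits as its own short transaction, so no single
        # statement has to delete millions of rows at once.
        with conn:
            cur = conn.execute(
                "DELETE FROM metrics WHERE rowid IN ("
                "  SELECT rowid FROM metrics WHERE agent_id = ? LIMIT ?)",
                (agent_id, batch_size),
            )
        if cur.rowcount == 0:
            break
        removed += cur.rowcount
    # The children are gone, so the cascade on this delete has nothing to do.
    with conn:
        conn.execute("DELETE FROM agents WHERE id = ?", (agent_id,))
    return removed


# Illustrative schema (an assumption): one parent table, one indexed child.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE agents (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE metrics ("
    "  id INTEGER PRIMARY KEY,"
    "  agent_id INTEGER NOT NULL REFERENCES agents(id) ON DELETE CASCADE,"
    "  value REAL)"
)
conn.execute("CREATE INDEX idx_metrics_agent_id ON metrics(agent_id)")
conn.execute("INSERT INTO agents (id) VALUES (1), (2)")
rows = [(1, float(i)) for i in range(2500)] + [(2, 0.0)]
conn.executemany("INSERT INTO metrics (agent_id, value) VALUES (?, ?)", rows)
conn.commit()

removed = delete_agent_batched(conn, agent_id=1, batch_size=1000)
remaining = conn.execute("SELECT COUNT(*) FROM metrics").fetchone()[0]
agents_left = [row[0] for row in conn.execute("SELECT id FROM agents ORDER BY id")]
print(removed, remaining, agents_left)  # 2500 1 [2]
```

With 2,500 metric rows and a batch size of 1,000, the loop commits three batches before removing the agent row, so no transaction touches more than 1,000 rows. In a real handler, a loop like this would presumably run in a background task after the request returns 202. That is the API contract change the log leaves open for Mike's decision.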