sync: auto-sync from HOWARD-HOME at 2026-06-21 16:54:31
Author: Howard Enos Machine: HOWARD-HOME Timestamp: 2026-06-21 16:54:31
103
session-logs/2026-06/2026-06-21-howard-gururmm-bug-018-019.md
Normal file
@@ -0,0 +1,103 @@
# Session: GuruRMM bug fixes (BUG-016/017 verify, BUG-019 fix, BUG-018 investigation)

## User
- **User:** Howard Enos (howard)
- **Machine:** Howard-Home
- **Role:** tech

## Session Summary

Picked up GuruRMM bug fixes after triaging the coord inbox (same HOWARD-HOME session that did the
unifi-wifi pfSense build-out; see `2026-06-21-howard-unifi-pfsense-control-verbs.md`). GURU-KALI had
flagged BUG-016/017 (P1 Linux-fleet) and offered Howard BUG-016. Verified both against the live
guru-rmm working tree (`ed8cad3` == origin/main): **already fixed** by Mike in commit `30da053`
(2026-06-01), via `StateDirectory=gururmm` in the systemd unit template (016) and an `OnceLock`
device-id cache in `device_id.rs` (017). Nothing to take there.

Surveyed `docs/FEATURE_ROADMAP.md` for remaining open bugs: BUG-018 (DELETE agents, P2) and BUG-019
(containerized agent self-update, P2). Fixed **BUG-019**: `perform_update()` in
`agent/src/updater/mod.rs` now returns early inside a container (Linux-gated `is_docker_container()`),
skipping the download/replace/rollback path. This stops the silent downgrade on container recreate and
the startup rollback-artifact log noise. It reports `UpdateStatus::Failed` with a clear message; a
structured `update_method=image` field was deliberately not added (it would need a matching server-side
enum change, which is SPEC-023). Host `cargo check` is clean; the Linux-gated branch could not be
cross-compiled from the Windows dev box (`openssl-sys` cross build), so the Linux agent build is the
final compile gate.

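The BUG-019 guard described above can be sketched roughly as follows. This is a minimal illustration, not the code in `agent/src/updater/mod.rs`: the detection heuristic (`/.dockerenv` plus a `/proc/1/cgroup` hint), the `Completed` variant, the `UpdateResult` struct, and the `perform_update_guarded` signature are all assumptions made for the example. Only `is_docker_container()` and `UpdateStatus::Failed` are names taken from the log.

```rust
/// Hypothetical stand-in for the agent's status enum. Only `Failed` is named
/// in the session log; `Completed` is assumed for illustration.
#[derive(Debug, PartialEq)]
pub enum UpdateStatus {
    Completed,
    Failed,
}

/// Hypothetical result type carrying the status and a human-readable message.
#[derive(Debug, PartialEq)]
pub struct UpdateResult {
    pub status: UpdateStatus,
    pub message: String,
}

/// Pure helper: does a `/proc/1/cgroup` dump mention a container runtime?
/// (Assumed heuristic; cgroup v2 hosts may not expose these strings.)
pub fn cgroup_mentions_container(cgroup: &str) -> bool {
    cgroup.lines().any(|line| {
        line.contains("docker") || line.contains("containerd") || line.contains("kubepods")
    })
}

/// Linux-gated detection: marker file first, then the cgroup hint.
#[cfg(target_os = "linux")]
pub fn is_docker_container() -> bool {
    std::path::Path::new("/.dockerenv").exists()
        || std::fs::read_to_string("/proc/1/cgroup")
            .map(|contents| cgroup_mentions_container(&contents))
            .unwrap_or(false)
}

/// On other targets the guard never fires.
#[cfg(not(target_os = "linux"))]
pub fn is_docker_container() -> bool {
    false
}

/// Early-return guard: inside a container, skip the binary swap entirely and
/// report `Failed` with a clear message instead of downloading and replacing.
pub fn perform_update_guarded(
    in_container: bool,
    binary_swap: impl FnOnce() -> UpdateResult,
) -> UpdateResult {
    if in_container {
        return UpdateResult {
            status: UpdateStatus::Failed,
            message: "self-update skipped: agent runs in a container; update the image instead"
                .to_string(),
        };
    }
    binary_swap()
}
```

A caller would pass `is_docker_container()` as the first argument. Taking the flag as a parameter keeps the guard testable on any host, which matters here because the Linux-gated path could not be compiled on the Windows dev box.
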
Investigated **BUG-018** and corrected its root cause. Mike's roadmap hypothesis was "missing index on
child tables." A static pass over all 59 migrations showed every high-volume cascade child of `agents`
(metrics, agent_logs, commands, agent_events, check_history, checks, alerts, watchdog_events, …) is
**already** indexed on `agent_id`; only `alert_mutes` (low-volume) lacked one, now added (migration
`060`, FK hygiene). So the slow `DELETE /api/agents/:id` cascade is caused by **delete volume**, not a
seq-scan: deleting an agent with months of per-minute metrics means deleting millions of child rows
synchronously in-request. The durable fix is handler-level (202 + background/batched delete, or a
bulk-delete endpoint), which changes the 204 API contract and the dashboard's delete→refetch flow. It is
left **Open** for Mike's decision, with the corrected analysis written into the roadmap.

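To make the "batched delete" option concrete, here is a minimal sketch of the batching loop alone, assuming child rows are removed in fixed-size chunks until none remain. The `ChildRows` trait, `InMemoryMetrics`, and `delete_children_in_batches` are hypothetical names for illustration; the real handler would issue SQL against the child tables, and the 202 response and background task are not shown.

```rust
/// Hypothetical storage abstraction; a real handler would run SQL instead.
pub trait ChildRows {
    /// Delete up to `limit` child rows owned by `agent_id`.
    /// Returns how many rows were actually removed.
    fn delete_batch(&mut self, agent_id: u32, limit: usize) -> usize;
}

/// In-memory stand-in for a child table: each entry is the owning agent id.
pub struct InMemoryMetrics {
    pub rows: Vec<u32>,
}

impl ChildRows for InMemoryMetrics {
    fn delete_batch(&mut self, agent_id: u32, limit: usize) -> usize {
        let mut removed = 0;
        // Drop matching rows until the per-batch limit is reached.
        self.rows.retain(|&owner| {
            if owner == agent_id && removed < limit {
                removed += 1;
                false
            } else {
                true
            }
        });
        removed
    }
}

/// Delete all child rows for one agent in bounded batches.
/// Returns (total rows deleted, number of non-empty batches).
pub fn delete_children_in_batches<S: ChildRows>(
    store: &mut S,
    agent_id: u32,
    batch_size: usize,
) -> (usize, usize) {
    assert!(batch_size > 0, "batch_size must be positive");
    let mut total = 0;
    let mut batches = 0;
    loop {
        let removed = store.delete_batch(agent_id, batch_size);
        if removed == 0 {
            break; // nothing left for this agent
        }
        total += removed;
        batches += 1;
    }
    (total, batches)
}
```

The point of the design is that each batch is a short, bounded unit of work, so no single request or transaction has to hold the whole multi-million-row delete. Whether that loop runs behind a 202 response or a bulk-delete endpoint is the open decision noted above.
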
Landed per Howard's choice: pushed a branch (not main, so no build trigger) to guru-rmm origin and
handed it to Mike via coord. Committed **only** the three intended files; the shared guru-rmm checkout
had concurrent uncommitted WIP from other sessions (dashboard `.tsx`, `server/api/agents.rs`,
`ws/mod.rs`, an rmm-audit report, `script-library/`), which was left untouched, and the submodule HEAD
was restored to detached `ed8cad3`.

## Key Decisions
- **BUG-019 reports `Failed` (not a new `Skipped`/`update_method=image`):** the server deserializes
  `UpdateResultPayload`/`UpdateStatus`; adding an enum variant agent-side would break server parsing.
  Using the existing `Failed` with a clear message is the minimal, no-server-change fix; structured
  image reporting is SPEC-023.
- **BUG-018 not "fixed":** investigation disproved the missing-index theory, so shipping just the
  `alert_mutes` index and calling it fixed would be wrong. The real fix is a server-behavior change
  (Mike-owned). Documented the corrected root cause rather than guessing at a handler redesign.
- **Branch, not direct-to-main** (Howard's call): this is a production repo with a push-to-main build
  webhook, and the Linux path wasn't locally compile-verified, so the change went to a branch for Mike's
  review/build-gate.
- **Committed only my 3 files** from the shared checkout (explicit `git add <paths>`, not `-A`) to avoid
  sweeping concurrent sessions' WIP; verified the FEATURE_ROADMAP diff was solely my BUG-018/019 edits.

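The first decision rests on how a closed enum behaves when it meets a value it does not know. The sketch below illustrates that with a hand-written `FromStr` parser as a std-only stand-in for the server's real deserializer; `ServerUpdateStatus`, the `Completed` variant, and the lowercase wire strings are all assumptions for the example, not the actual `UpdateStatus` definition.

```rust
use std::str::FromStr;

/// Hypothetical view of the status values the server already understands.
/// Only `Failed` is named in the log; `Completed` is assumed.
#[derive(Debug, PartialEq)]
pub enum ServerUpdateStatus {
    Completed,
    Failed,
}

impl FromStr for ServerUpdateStatus {
    type Err = String;

    /// Strict parse: any value outside the known set is an error, which is
    /// how a closed enum typically behaves when deserialized.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "completed" => Ok(ServerUpdateStatus::Completed),
            "failed" => Ok(ServerUpdateStatus::Failed),
            other => Err(format!("unknown variant `{}`", other)),
        }
    }
}
```

Under these assumptions, an agent that sends a new `"skipped"` value gets a parse error from an unchanged server, while `"failed"` still parses. That is the reason for reusing `Failed` until the server side changes.
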
## Problems Encountered
- **Can't cross-compile-check the Linux-gated guard from Windows** (`openssl-sys` build script needs a
  Linux OpenSSL). Mitigated with careful review + host `cargo check`; flagged the Linux agent build as
  the compile gate.
- **Shared-checkout concurrency:** creating the branch moved the shared submodule HEAD off detached
  `ed8cad3`; a concurrent session then edited `FEATURE_ROADMAP.md`, so `git checkout ed8cad3` aborted
  (would clobber their edit). Restored HEAD without data loss via `git checkout --detach` +
  `git reset --mixed ed8cad3` (branch ref left at the pushed commit; all WIP preserved).
- **`cd` into the submodule persisted** across Bash calls again (a recurring friction); ran sync from
  `/c/claudetools` explicitly.

## Configuration Changes
In the guru-rmm submodule (on branch `bugfix/bug-019-container-selfupdate-and-bug-018-index`, commit `66a7f4e`, pushed to origin; NOT in the claudetools repo):
- Modified: `agent/src/updater/mod.rs` (BUG-019 container guard), `docs/FEATURE_ROADMAP.md` (BUG-018 + BUG-019 entries).
- Created: `server/migrations/060_alert_mutes_agent_id_index.sql` (BUG-018 FK hygiene).
No changes committed to the ClaudeTools repo from this work (only this session log + gitignored `.claude/tmp/` scratch).

## Credentials & Secrets
None created, discovered, or used.

## Infrastructure & Servers
- GuruRMM repo: Gitea `https://git.azcomputerguru.com/azcomputerguru/gururmm.git`; working tree `ed8cad3` == origin/main.
- GuruRMM server: 172.16.3.30 (push-to-main webhook builds agents; server is a separate manual/webhook build; migrations run on server deploy).
- Rust toolchain on HOWARD-HOME: cargo 1.95.0; added rustup target `x86_64-unknown-linux-gnu` (check only; openssl-sys blocks cross build).

## Commands & Outputs
```
# verify 016/017 fixed
grep -n StateDirectory agent/src/main.rs         # StateDirectory=gururmm (BUG-016)
grep -n OnceLock agent/src/device_id.rs          # CACHED_ID OnceLock get_or_init (BUG-017)
# BUG-018 index analysis: only alert_mutes lacked an agent_id index
cargo check                                      # host: clean (16 pre-existing warnings)
cargo check --target x86_64-unknown-linux-gnu    # FAILS: openssl-sys cross build (env limit, not code)
# land on a branch (no build trigger)
git checkout -b bugfix/bug-019-...; git add <3 files>; git commit; git push -u origin <branch>
git checkout --detach; git reset --mixed ed8cad3 # restore shared HEAD w/o losing concurrent WIP
```

## Pending / Incomplete Tasks
- **BUG-018 handler fix** (Open, Mike's call): 202 + background/batched delete, or bulk-delete endpoint
  (changes 204 contract + dashboard flow). Offered to implement; awaiting Mike. Coord msg `93f2fc0b`.
- **BUG-019 compile gate:** confirm the Linux agent build is green when Mike merges the branch.
- Branch `bugfix/bug-019-container-selfupdate-and-bug-018-index` awaiting Mike's review/merge to main.

## Reference Information
- Branch: `bugfix/bug-019-container-selfupdate-and-bug-018-index` @ `66a7f4e` (guru-rmm origin).
- guru-rmm: 016/017 fixed in `30da053`; repo HEAD `ed8cad3`; FEATURE_ROADMAP BUG-018 ~line 411, BUG-019 ~453.
- Coord: handoff to Mike `93f2fc0b` (to GURU-5070).
- Companion log this session: `session-logs/2026-06/2026-06-21-howard-unifi-pfsense-control-verbs.md`.