sync: auto-sync from HOWARD-HOME at 2026-06-21 16:54:31

Author: Howard Enos
Machine: HOWARD-HOME
Timestamp: 2026-06-21 16:54:31
2026-06-21 16:55:00 -07:00
parent 3c4b108865
commit f55feb07fa

@@ -0,0 +1,103 @@
# Session — GuruRMM bug fixes: BUG-016/017 verify, BUG-019 fix, BUG-018 investigation
## User
- **User:** Howard Enos (howard)
- **Machine:** HOWARD-HOME
- **Role:** tech
## Session Summary
Picked up GuruRMM bug fixes after triaging the coord inbox (same HOWARD-HOME session that did the
unifi-wifi pfSense build-out; see `2026-06-21-howard-unifi-pfsense-control-verbs.md`). GURU-KALI had
flagged BUG-016/017 (P1 Linux-fleet) and offered Howard BUG-016. Verified both against the live
guru-rmm working tree (`ed8cad3` == origin/main): **already fixed** by Mike in commit `30da053`
(2026-06-01) — `StateDirectory=gururmm` in the systemd unit template (016) and an `OnceLock` device-id
cache in `device_id.rs` (017). Nothing to take there.
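
For reference, the BUG-017 pattern (an `OnceLock` initialized through `get_or_init`) can be sketched as below. This is a minimal illustration, not Mike's code from `30da053`: only the `CACHED_ID` name and the `OnceLock`/`get_or_init` usage come from the grep output in this log. `compute_device_id()`, the placeholder id value, and the call counter are hypothetical stand-ins.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Hypothetical counter, present only to make the "computed once" property observable.
static COMPUTE_CALLS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for the real (possibly expensive or unstable) id derivation.
fn compute_device_id() -> String {
    COMPUTE_CALLS.fetch_add(1, Ordering::SeqCst);
    "device-0001".to_string()
}

// Process-wide cache: written at most once, then read lock-free.
static CACHED_ID: OnceLock<String> = OnceLock::new();

/// Returns the device id, computing it on the first call only.
pub fn device_id() -> &'static str {
    CACHED_ID.get_or_init(compute_device_id).as_str()
}
```

Every caller in the process sees the same value for the process lifetime, even if the underlying derivation would return something different later.
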
Surveyed `docs/FEATURE_ROADMAP.md` for remaining open bugs: BUG-018 (DELETE agents, P2) and BUG-019
(containerized agent self-update, P2). Fixed **BUG-019**: `perform_update()` in `agent/src/updater/mod.rs`
now returns early when running inside a container (Linux-gated `is_docker_container()`), skipping the
download/replace/rollback path. This stops the silent downgrade on container recreate and the startup
rollback-artifact log noise. It reports `UpdateStatus::Failed` with a clear message; a structured
`update_method=image` field was deliberately not added because it would need a matching server-side enum
change (that is SPEC-023). Host `cargo check` is clean; the Linux-gated branch could not be cross-compiled
from the Windows dev box (`openssl-sys` cross build), so the Linux agent build is the final compile gate.
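
The shape of the BUG-019 guard can be sketched roughly as follows. This is an illustration under stated assumptions, not the code on the branch: `UpdateStatus::Completed`, `UpdateResult`, `looks_like_container()`, `perform_update_with()`, and the detection heuristic (`/.dockerenv` plus a `/proc/1/cgroup` scan) are all hypothetical. Only the names `perform_update()`, `is_docker_container()`, and `UpdateStatus::Failed` appear in this log.

```rust
/// Hypothetical status enum; the log only confirms that a `Failed` variant exists.
#[derive(Debug, PartialEq)]
pub enum UpdateStatus {
    Completed,
    Failed,
}

/// Hypothetical result type handed back to the caller.
#[derive(Debug)]
pub struct UpdateResult {
    pub status: UpdateStatus,
    pub message: String,
}

/// Pure heuristic, split out so it can be tested without a real container.
/// Docker creates `/.dockerenv`; cgroup v1 paths often name the runtime.
pub fn looks_like_container(dockerenv_exists: bool, pid1_cgroup: &str) -> bool {
    dockerenv_exists
        || pid1_cgroup
            .lines()
            .any(|l| l.contains("docker") || l.contains("kubepods"))
}

#[cfg(target_os = "linux")]
pub fn is_docker_container() -> bool {
    let cgroup = std::fs::read_to_string("/proc/1/cgroup").unwrap_or_default();
    looks_like_container(std::path::Path::new("/.dockerenv").exists(), &cgroup)
}

// Non-Linux hosts never take the container branch.
#[cfg(not(target_os = "linux"))]
pub fn is_docker_container() -> bool {
    false
}

/// Guard logic with the detection result injected, so both paths are testable.
pub fn perform_update_with(in_container: bool) -> UpdateResult {
    if in_container {
        // Early return: never touch the binary inside an image-managed container.
        return UpdateResult {
            status: UpdateStatus::Failed,
            message: "self-update skipped: running in a container; update the image instead"
                .to_string(),
        };
    }
    // The real download/replace/rollback path is elided in this sketch.
    UpdateResult {
        status: UpdateStatus::Completed,
        message: "update path elided in this sketch".to_string(),
    }
}

pub fn perform_update() -> UpdateResult {
    perform_update_with(is_docker_container())
}
```

Injecting the detection result keeps the guard testable on any host, which matters here because the Linux-gated branch could not be compiled from the Windows dev box.
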
Investigated **BUG-018** and corrected its root cause. Mike's roadmap hypothesis was "missing index on
child tables." A static pass over all 59 migrations showed that every high-volume cascade child of `agents`
(metrics, agent_logs, commands, agent_events, check_history, checks, alerts, watchdog_events, …) is
**already** indexed on `agent_id`; only `alert_mutes` (low-volume) lacked an index, now added (migration
`060`, FK hygiene). So the slow `DELETE /api/agents/:id` is caused by **delete volume**, not a sequential
scan: deleting an agent with months of per-minute metrics means deleting millions of child rows
synchronously in-request. The durable fix is handler-level (return 202 and delete in the background in
batches, or add a bulk-delete endpoint), which changes the 204 API contract and the dashboard's
delete→refetch flow. Left **Open** for Mike's decision, with the corrected analysis written into the roadmap.
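
The batched-delete option can be sketched independently of the database layer. The helper below is hypothetical and is not part of the pushed branch: `delete_in_batches` and its `delete_batch` callback are illustrative names. In a real handler the callback would issue one bounded `DELETE` per call against a child table, which assumes the table has a key that a limited subquery can select on.

```rust
/// Repeatedly deletes up to `batch_size` rows until a short batch signals
/// that nothing is left. Returns the total number of rows deleted.
///
/// `delete_batch(limit)` is a hypothetical callback: it deletes at most
/// `limit` child rows for one agent and returns how many it removed.
pub fn delete_in_batches<F>(batch_size: usize, mut delete_batch: F) -> usize
where
    F: FnMut(usize) -> usize,
{
    assert!(batch_size > 0, "batch_size must be positive");
    let mut total = 0;
    loop {
        let deleted = delete_batch(batch_size);
        total += deleted;
        // A batch smaller than the limit means the table is drained.
        if deleted < batch_size {
            break;
        }
    }
    total
}
```

Each batch is a short transaction, so a background task running this loop holds locks briefly and can be paused or resumed, while the HTTP handler returns 202 right after scheduling it.
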
Landed per Howard's choice: pushed a branch (not main → no build trigger) to guru-rmm origin and handed
it to Mike via coord. Committed **only** the three intended files; the shared guru-rmm checkout had
concurrent uncommitted WIP from other sessions (dashboard `.tsx`, `server/api/agents.rs`, `ws/mod.rs`,
an rmm-audit report, `script-library/`) which was left untouched, and the submodule HEAD was restored to
detached `ed8cad3`.
## Key Decisions
- **BUG-019 reports `Failed` (not a new `Skipped`/`update_method=image`):** the server deserializes
`UpdateResultPayload`/`UpdateStatus`; adding an enum variant agent-side would break server parsing.
Using the existing `Failed` + a clear message is the minimal, no-server-change fix; structured image
reporting is SPEC-023.
- **BUG-018 not "fixed":** investigation disproved the missing-index theory, so shipping just the
`alert_mutes` index and calling it fixed would be wrong. The real fix is a server-behavior change
(Mike-owned). Documented the corrected root cause rather than guessing at a handler redesign.
- **Branch, not direct-to-main** (Howard's call): this is a production repo with a push-to-main build
webhook, and the Linux path wasn't compile-verified locally, so the change went to a branch for Mike's
review/build-gate.
- **Committed only my 3 files** from the shared checkout (explicit `git add <paths>`, not `-A`) to avoid
sweeping concurrent sessions' WIP; verified the FEATURE_ROADMAP diff was solely my BUG-018/019 edits.
## Problems Encountered
- **Can't cross-compile-check the Linux-gated guard from Windows** (`openssl-sys` build script needs a
Linux OpenSSL). Mitigated with careful review + host `cargo check`; flagged the Linux agent build as
the compile gate.
- **Shared-checkout concurrency:** creating the branch moved the shared submodule HEAD off detached
`ed8cad3`; a concurrent session then edited `FEATURE_ROADMAP.md`, so `git checkout ed8cad3` aborted
(would clobber their edit). Restored HEAD without data loss via `git checkout --detach` +
`git reset --mixed ed8cad3` (branch ref left at the pushed commit; all WIP preserved).
- **`cd` into the submodule persisted** across Bash calls again (a recurring friction); ran sync from
`/c/claudetools` explicitly.
## Configuration Changes
In the guru-rmm submodule (on branch `bugfix/bug-019-container-selfupdate-and-bug-018-index`, commit `66a7f4e`, pushed to origin — NOT in the claudetools repo):
- Modified: `agent/src/updater/mod.rs` (BUG-019 container guard), `docs/FEATURE_ROADMAP.md` (BUG-018 + BUG-019 entries).
- Created: `server/migrations/060_alert_mutes_agent_id_index.sql` (BUG-018 FK hygiene).
No changes committed to the ClaudeTools repo from this work (only this session log + gitignored `.claude/tmp/` scratch).
## Credentials & Secrets
None created, discovered, or used.
## Infrastructure & Servers
- GuruRMM repo: Gitea `https://git.azcomputerguru.com/azcomputerguru/gururmm.git`; working tree `ed8cad3` == origin/main.
- GuruRMM server: 172.16.3.30 (push-to-main webhook builds agents; server is a separate manual/webhook build; migrations run on server deploy).
- Rust toolchain on HOWARD-HOME: cargo 1.95.0; added rustup target `x86_64-unknown-linux-gnu` (check only; openssl-sys blocks cross build).
## Commands & Outputs
```
# verify 016/017 fixed
grep -n StateDirectory agent/src/main.rs # StateDirectory=gururmm (BUG-016)
grep -n OnceLock agent/src/device_id.rs # CACHED_ID OnceLock get_or_init (BUG-017)
# BUG-018 index analysis: only alert_mutes lacked an agent_id index
cargo check # host: clean (16 pre-existing warnings)
cargo check --target x86_64-unknown-linux-gnu # FAILS: openssl-sys cross build (env limit, not code)
# land on a branch (no build trigger)
git checkout -b bugfix/bug-019-...
git add <3 files>
git commit
git push -u origin <branch>
git checkout --detach; git reset --mixed ed8cad3 # restore shared HEAD w/o losing concurrent WIP
```
## Pending / Incomplete Tasks
- **BUG-018 handler fix** (Open, Mike's call): 202 + background/batched delete, or bulk-delete endpoint
(changes 204 contract + dashboard flow). Offered to implement; awaiting Mike. Coord msg `93f2fc0b`.
- **BUG-019 compile gate:** confirm the Linux agent build is green when Mike merges the branch.
- Branch `bugfix/bug-019-container-selfupdate-and-bug-018-index` awaiting Mike's review/merge to main.
## Reference Information
- Branch: `bugfix/bug-019-container-selfupdate-and-bug-018-index` @ `66a7f4e` (guru-rmm origin).
- guru-rmm: 016/017 fixed in `30da053`; repo HEAD `ed8cad3`; FEATURE_ROADMAP BUG-018 ~line 411, BUG-019 ~453.
- Coord: handoff to Mike `93f2fc0b` (to GURU-5070).
- Companion log this session: `session-logs/2026-06/2026-06-21-howard-unifi-pfsense-control-verbs.md`.