diff --git a/session-logs/2026-06/2026-06-21-howard-gururmm-features-audit-submodule-fix.md b/session-logs/2026-06/2026-06-21-howard-gururmm-features-audit-submodule-fix.md
new file mode 100644
index 00000000..93a3c167
--- /dev/null
+++ b/session-logs/2026-06/2026-06-21-howard-gururmm-features-audit-submodule-fix.md
@@ -0,0 +1,88 @@
+## User
+- **User:** Howard Enos (howard)
+- **Machine:** Howard-Home
+- **Role:** tech
+
+## Session Summary
+
+Worked through a batch of GuruRMM roadmap/bug items, ran a full codebase audit, and ended by root-causing and fixing a ClaudeTools sync bug that had been disrupting the whole session. Started by listing outstanding skill/roadmap work (excluding bitdefender/unifi/guru-scan), then implemented in sequence: SPEC-021 (logged-in-user domain & account-type detection) + the BUG-020 watchdog tray-teardown follow-up; the "Add Devices"/"Enroll an Agent" modal UX fix (top-X close, click-outside + Escape dismissal, refresh-on-close); BUG-018 (reliable agent deletion); and the "Open in MSP360" deep-link button (RMM_THOUGHTS Feature 3). Each landed full-stack on its own branch and was compile-validated (server `cargo check`, dashboard `tsc -b && vite build` run server-side in a throwaway git worktree, since this machine has no `node_modules`).
+
+The enrollment modal fix was the only item taken all the way to production this session: built the feature branch's dashboard onto the beta web root via a git worktree, the user tested it on rmm-beta, then promoted beta to prod with `promote-dashboard.sh --confirm`. Established (and saved as a feedback memory) a standing rule that GuruRMM **dashboard** changes go to beta first, before main, unless told otherwise — and documented that the pipeline auto-builds beta from `main`, so getting a branch onto beta means a manual worktree build, not a merge.
+
+Ran `/rmm-audit` (5 parallel Opus passes + a build-pipeline pass over SSH). It surfaced one real HIGH bug — assigning/unassigning any policy silently wiped a connected agent's Event Log Watch monitoring (the third instance of a clobber class already fixed at two other config-push sites) — which was fixed by reusing the single rule-shaper. The audit also flagged a MEDIUM info-disclosure (raw error strings in 500 bodies) that turned out to be **already fixed in main** (commit `58c1a96`); it was a false positive because the audit agents read a stale working tree.
+
+Lined up the four pending GuruRMM branches as Gitea PRs (#40-#43) with the migration merge-order (060 → 061 → 062) spelled out, DM'd Mike the lineup, and synced. Finally, diagnosed why the guru-rmm submodule kept reverting to a stale commit (`2e469f1`) and discarding work all session: `sync.sh` ran `git submodule update --init` unconditionally on every sync, re-checking-out the intentionally-lagging pinned gitlink in detached HEAD. Guarded it to only populate genuinely-missing submodules, verified guru-rmm now stays on `main` through a sync, and pushed the fix fleet-wide.
+
+## Key Decisions
+
+- **Dashboard changes go to beta before main** (saved as memory `rmm-dashboard-beta-before-main`). Beta auto-builds from `main`, so a feature branch is previewed by building its `dashboard/` in a `git worktree` and rsyncing `dist/` to `/var/www/gururmm/dashboard-beta`; promotion to prod is the separate human-run `promote-dashboard.sh --confirm`.
+- **BUG-018 root cause corrected against the live DB**, not the roadmap's guess: the two huge cascade children (`metrics` ~6.4 GB, `agent_logs` ~3.5 GB) were already indexed on `agent_id`; only five small FKs were unindexed (migration 061). The real operator fix is the new `POST /api/agents/bulk-delete` (one transaction), not the indexes.
+- **MSP360 deep-link placement unified**: the backup-alert "Investigate" action already deep-links to `?tab=backup`, so a single button on the backup-tab plan card covers both placements; no separate alert-row button (the alert doesn't carry the console id).
+- **Event Log Watch fix reused the existing shaper** (`watch_rules_for_agent` retargeted from `&AppState` to `&PgPool`) rather than duplicating the rule-mapping JSON — one source of truth across all three push sites.
+- **Submodule fix guarded the clobber, did not bump the gitlink**: CLAUDE.md intends the parent's pinned gitlink to lag `main`; the bug was `sync.sh` re-checking-out that pin over live work, so the fix is a populate-only guard, preserving the lag-by-design behavior.
+- **Used a SHA-push workaround** (commit → cherry-pick onto `origin/main` → `git push origin :refs/heads/` → verify with `ls-remote`) to land branches reliably while the submodule kept detaching, before the root cause was found.
+
+## Problems Encountered
+
+- **Submodule kept reverting to `2e469f1` (detached HEAD), discarding uncommitted edits** — caused multiple dangling-commit recoveries and one reverted sed edit. Worked around with SHA-pushes, then root-caused: `sync.sh:339` `git submodule update --init` ran unconditionally. Fixed with a `git -C "$ppath" rev-parse --git-dir` populate-only guard. Verified guru-rmm now survives a sync on `main @ ed8cad3`.
+- **Audit ran against a stale checkout** (working tree pinned at `2e469f1`, 5 commits behind `origin/main`). One MEDIUM finding (500-body info-disclosure, 17 sites) was already fixed in main by `58c1a96`; caught before shipping a redundant change, deleted the bogus branch, corrected the report. Re-verified all other findings against real main — only that one was stale. Same submodule root cause as above.
+- **Build-server SSH host key changed** — `172.16.3.30` was rebuilt as a new physical box on 2026-06-11 (per wiki), so the ED25519 key legitimately changed. Refreshed `known_hosts` after confirming the rebuild, rather than blindly bypassing.
+- **No SSH key on Howard-Home for the build server** — used the dedicated `gururmm-physical` key from the SOPS vault (extract to a temp keyfile, `ssh -i`, delete after).
+
+## Configuration Changes
+
+**ClaudeTools repo (committed + pushed):**
+- `M .claude/scripts/sync.sh` — populate-only guard on the submodule update (Phase 1a) so already-populated submodules are never re-checked-out to the lagging gitlink.
+- `A .claude/memory/rmm-dashboard-beta-before-main.md` + index pointer in `.claude/memory/MEMORY.md`.
+- `M errorlog.md` — three entries: two `--friction` (detached-HEAD submodule; stale audit base) and the root-cause fix context.
+
+**guru-rmm submodule (on feature branches, not yet merged):**
+- `feat/spec-021-and-bug-020-tray-teardown`: `agent/src/metrics/mod.rs`, `agent/Cargo.toml`, `agent/src/watchdog/{mod,monitor,wts}.rs`, `server/migrations/060_logged_in_user_domain.sql`, `server/src/db/metrics.rs`, `server/src/ws/mod.rs`, `dashboard/src/{api/client.ts,pages/AgentDetail.tsx}`, `docs/specs/SPEC-021-*.md`, `docs/FEATURE_ROADMAP.md`.
+- `fix/bug-018-agent-delete-indexes`: `server/migrations/061_index_agent_foreign_keys.sql`, `server/src/db/agents.rs`, `server/src/api/{agents,mod}.rs`, `dashboard/src/{api/client.ts,pages/Agents.tsx}`, `docs/FEATURE_ROADMAP.md`.
+- `feat/msp360-deeplink`: `server/src/mspbackups/{client,sync}.rs`, `server/src/db/mspbackups.rs`, `server/src/api/mspbackups.rs`, `server/migrations/062_backup_provider_console_id.sql`, `dashboard/src/{api/client.ts,lib/provider-links.ts,components/BackupDetailTab.tsx}`, `docs/RMM_THOUGHTS.md`.
+- `fix/eventlog-watch-policy-clobber`: `server/src/api/event_log_watches.rs`, `server/src/ws/mod.rs`, `server/src/api/policies.rs`.
+- `fix/enrollment-modal-ux` (MERGED to main as `4027c86`): `dashboard/src/components/{EnrollmentModal,EnrollAgentModal}.tsx`, `dashboard/src/pages/{SiteDetail,ClientDetail}.tsx`.
+- Untracked (NOT committed): `projects/msp-tools/guru-rmm/reports/2026-06-21-rmm-audit.md`.
+
+## Credentials & Secrets
+
+No new credentials created. Used (all already vaulted):
+- SSH to build server: `gururmm-physical` ed25519 key — vault `infrastructure/gururmm-server-physical.sops.yaml` field `credentials.ssh-private-key` (user `guru@172.16.3.30`, key-only auth; password auth disabled on the new box).
+- Sudo on build server (for promote): password from vault `infrastructure/gururmm-server.sops.yaml` `credentials.password`.
+- Postgres (schema query): vault `infrastructure/gururmm-server.sops.yaml` `credentials.databases.postgresql-password` (db `gururmm`, user `gururmm`, localhost:5432).
+- Gitea API (PR creation): vault `services/gitea-howard.sops.yaml` `credentials.password` (user `howard`, internal `http://172.16.3.20:3000`).
+
+## Infrastructure & Servers
+
+- **GuruRMM server** `172.16.3.30` (hostname `gururmm`) — physical Lenovo ThinkCentre M83, Ubuntu 26.04, replaced the Jupiter VM at this IP on 2026-06-11. Repo `/home/guru/gururmm`. Build pipeline `/opt/gururmm/*.sh`, webhook `:9000`. Postgres 18, MariaDB, GuruRMM API `:3001`, Coord API `:8001`.
+- Dashboard web roots: beta `/var/www/gururmm/dashboard-beta` (https://rmm-beta.azcomputerguru.com), prod `/var/www/gururmm/dashboard` (https://rmm.azcomputerguru.com). API https://rmm-api.azcomputerguru.com.
+- Channel model: webhook auto-builds beta from `origin/main`; prod is updated only by `promote-dashboard.sh --confirm` (backs up prod first; `--rollback` to undo).
+- Gitea: git.azcomputerguru.com (internal `172.16.3.20:3000`), repo `azcomputerguru/gururmm`.
+
+## Commands & Outputs
+
+- Beta preview build (per branch, no main merge):
+  `git worktree add --detach origin/` → `cd /dashboard && npm install && npm run build` → `rsync -a --delete dist/ /var/www/gururmm/dashboard-beta/`. Do NOT touch `/opt/gururmm/last-built-commit-dashboard`.
+- Promote beta→prod: `sudo /opt/gururmm/promote-dashboard.sh --confirm` (prod backup written to `/var/www/gururmm/.dashboard-backups/dashboard-20260621-191008`). Undo: `... --rollback`.
+- Unindexed-FK / schema queries run via `PGPASSWORD=... psql -U gururmm -d gururmm -h localhost`.
+- PR creation: `POST http://172.16.3.20:3000/api/v1/repos/azcomputerguru/gururmm/pulls` (basic auth `howard:`, JSON built with `jq -nc --arg`).
+- Submodule-clobber fix verified: guru-rmm `main @ ed8cad3` before AND after a full `sync.sh` run (previously reverted to detached `2e469f1`).
+
+## Pending / Incomplete Tasks
+
+- **Four GuruRMM PRs await Mike's review/merge** — merge migrations in order:
+  - #40 SPEC-021 + BUG-020 (migration 060) — FIRST. SPEC-021 also needs a Pluto signed-MSI agent build + fleet rollout.
+  - #41 BUG-018 (migration 061) — after 060.
+  - #42 MSP360 deep-link (migration 062) — last.
+  - #43 Event Log Watch HIGH fix — no migration, merge anytime.
+- After each PR with a dashboard portion merges to main, it auto-builds to beta; promote to prod when validated.
+- **Remaining audit findings** (all re-verified valid against real main): LOW DoS `cap_field` batch (DiscoveryResult / WatchdogEvent / NetworkState / Auth hostname in `ws/mod.rs`), SSE status-stream revocation bypass + JWT-in-URL (`agents.rs:758-762`), two `console.log` stubs (`AgentDetail.tsx:670`, `Logs.tsx:198`), `ContextTree.tsx` missing `isError`, and the roadmap reconciliation (5 shipped-but-unchecked items to flip `[ ]`→`[x]` + 3 partials to annotate).
+- The audit report `reports/2026-06-21-rmm-audit.md` is untracked in the guru-rmm submodule working tree — not yet committed to the gururmm repo.
+
+## Reference Information
+
+- PRs: https://git.azcomputerguru.com/azcomputerguru/gururmm/pulls/40 (..43)
+- Branch tips: SPEC-021/BUG-020 `7083e39`; BUG-018 `604c42f`; MSP360 `776b587`; eventlog fix `432b434`; enrollment (merged) `4027c86`.
+- guru-rmm `origin/main` = `ed8cad3`; the already-shipped info-disclosure fix = `58c1a96`; the stale gitlink the submodule was pinned at = `2e469f1`.
+- Discord DMs to Mike: prod-promote confirmation (msg 1518332316114489505), PR lineup (msg 1518361486509215955).
+- Memory added: `.claude/memory/rmm-dashboard-beta-before-main.md`.