diff --git a/session-logs/2026-05-25-session.md b/session-logs/2026-05-25-session.md
index 8fa790d..6fb3be9 100644
--- a/session-logs/2026-05-25-session.md
+++ b/session-logs/2026-05-25-session.md
@@ -801,3 +801,92 @@ POST http://172.16.3.30:3001/api/logs/analyze {} (fleet scope)
 - Webhook handler: `/opt/gururmm/webhook-handler.py` (port 9000, builds agents only, NOT server)
 - gururmm Gitea: `http://172.16.3.20:3000/azcomputerguru/gururmm`
 - Beast Ollama: `http://100.101.122.4:11434` (direct), `http://172.16.0.1:11434` (via socat relay from LAN)
+
+---
+
+## Update: 09:34 MST — GuruRMM full audit + submodule infrastructure fixes (Mike Swanson / GURU-KALI)
+
+### Session Summary
+
+Ran `/rmm-audit` against GuruRMM. Because GURU-KALI was freshly recovered (see the MacBook nvidia black-screen recovery earlier today), the `projects/msp-tools/guru-rmm` submodule was uninitialized and empty, so the audit was run against a fresh clone of the active `azcomputerguru/gururmm` repo at commit `7374e8a` placed in `/tmp/gururmm-audit`. Five passes ran: four codebase passes (API coverage, Rust quality+auth, TypeScript, data integrity) as parallel subagents — security/auth/migration passes on opus, the rest on sonnet — plus a sequential build-pipeline pass that SSHed read-only into the build server (172.16.3.30). Aggregated to 61 findings: 2 critical, 10 high, 16 medium, 7 low, 26 info.
+
+The two CRITICALs share one root cause: the server has **no router-level/middleware auth** — every route is protected only by whether its handler includes the `AuthUser` extractor, so a handler that omits it is silently public. Two whole modules omit it: `metrics.rs` (per-agent + fleet metrics readable anonymously) and `logs.rs` (fleet-wide raw logs, plus `POST /logs/analyze` which fires an outbound Ollama call, and `POST /agents/:id/logs/request` which commands an agent to upload logs — all anonymous).
+HIGH highlights: unauthenticated fleet-wide agent-status SSE stream, Entra SSO callback never validating the ID-token signature, mac builds stuck 7 commits behind HEAD since the 2026-05-24 Pluto outage, and two dead frontend links (`Agent.client_id` / `Agent.update_channel` declared in TS but never returned by the agent endpoints). The agent↔server wire protocol (21 AgentMessage + 18 ServerMessage variants, all handled), policy system (5 sections all merge/default/route), migrations (001–045 no gaps), and build pipeline integrity came back clean.
+
+The report was written to the gururmm repo's `reports/` and committed to a non-main branch `audit/2026-05-25-rmm-audit` (commit `da1d4ee`) — verified via the webhook handler that a push to `main` triggers a full build (no path filtering) while a branch push triggers nothing, so the branch keeps the report off the build path. `docs/UI_GAPS.md` was updated in the same commit: Watchdog Alerts marked CLOSED, MSPBackups + Organizations downgraded to in-progress, and four new orphaned-route gaps (#12–15) added.
+
+Mike then flagged that this Linux instance was mishandling the RMM submodule. Investigation found the real issues: (1) the submodule was never initialized on GURU-KALI and `sync.sh` Phase 1a used `git submodule foreach` (which only visits initialized submodules), so it silently skipped population yet reported success — the `/tmp` clone workaround was a symptom of this; (2) an orphaned `projects/solverbot` gitlink (mode 160000, committed at `8b6f0bc` with no `.gitmodules` entry) made bare `git submodule` commands throw `fatal: no submodule mapping`. The `.gitmodules` URL for guru-rmm points to the **active** `azcomputerguru/gururmm` repo — the "stale reference copy" wording in CLAUDE.md was misleading.
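
Both submodule conditions described above can be detected mechanically. The following is a minimal sketch, not code from `sync.sh`: the function name `submodule_health` is hypothetical, it assumes it is run from the superproject root, and it assumes submodule paths contain no whitespace.

```shell
#!/bin/sh
# Hypothetical diagnostic (not part of sync.sh). Run from the superproject root.
# Prints "uninitialized <path>" for submodules declared in .gitmodules whose
# working tree is not populated, and "orphan <path>" for gitlinks (mode 160000)
# in the index that .gitmodules does not declare.
submodule_health() (
  # `--get-regexp` prints "<key> <value>"; keep only the value (the path).
  # If .gitmodules is missing, the list is simply empty.
  declared=$(git config -f .gitmodules --get-regexp '^submodule\..*\.path$' 2>/dev/null | awk '{print $2}')

  # `git submodule foreach` skips these silently, so check each declared path.
  printf '%s\n' "$declared" | while IFS= read -r path; do
    [ -n "$path" ] || continue
    # A populated submodule has a .git file or directory at its root.
    [ -e "$path/.git" ] || printf 'uninitialized %s\n' "$path"
  done

  # `git ls-files -s` prints "<mode> <object> <stage><TAB><path>".
  git ls-files -s | awk '$1 == "160000"' | cut -f2 | while IFS= read -r path; do
    printf '%s\n' "$declared" | grep -Fxq -- "$path" || printf 'orphan %s\n' "$path"
  done
)
```

On a checkout in the state this update describes, such a check would be expected to report the guru-rmm path as uninitialized and `projects/solverbot` as an orphan, instead of exiting quietly the way a `foreach`-based loop does.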
+
+Fixes applied: initialized + populated the guru-rmm submodule at its proper path (pinned `7374e8a` at the time); rewrote `sync.sh` Phase 1a to explicitly init+populate each `.gitmodules`-declared submodule with credentials inherited from the parent origin URL (so non-interactive init authenticates), then advance to remote tip, with honest reporting; removed the solverbot orphan gitlink (per Mike's choice); normalized `git config user.name` from `Mike-Swanson` to `Mike Swanson`; and corrected the CLAUDE.md submodule wording. A later sync pulled a teammate commit (`6945b42`) bumping the guru-rmm pin to `0a4db53`, which `git submodule update` checked out cleanly — confirming the new flow works.
+
+### Key Decisions
+
+- **Audited a fresh clone, not the empty submodule:** the submodule was uninitialized; rather than block, cloned the active repo to `/tmp`. The *correct* long-term fix (done afterward) was to initialize the submodule properly — the `/tmp` clone was a stopgap, now removed.
+- **Report committed to a branch, not main:** confirmed the webhook has no path filtering, so a docs-only push to main would trigger a full agent build. Branch push avoids it; Mike merges to main on his schedule.
+- **Reclassified two agent severities during aggregation:** Agent A's "script-runs/:id has no client function" CRITICAL → MEDIUM (no security/data-loss/crash; workaround exists); Agent E's tray-EXE LOW → INFO (count within threshold). Applied the rubric consistently as aggregator.
+- **Removed solverbot rather than registering it:** Mike's call. solverbot has its own Gitea repo (`azcomputerguru/solverbot` @ `0ec690f`) but doesn't belong as a claudetools submodule; dropping the gitlink clears the `fatal`. Its own repo is untouched.
+- **Credential inheritance in sync.sh, not in `.gitmodules`:** submodule clone URLs get the parent origin's embedded creds written to local `.git/config` only; `.gitmodules` stays credential-free so nothing secret is committed.
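
The credential-inheritance decision above can be illustrated with a small string-only helper. This is a sketch under stated assumptions, not the code in `sync.sh`: the function name `inherit_credentials`, the http(s)-only handling, and the same-host guard are hypothetical additions. The log only states that the scheme, userinfo, and host of the parent origin URL are copied into the submodule's local URL.

```shell
#!/bin/sh
# Hypothetical helper (not the actual sync.sh code): graft the scheme, userinfo
# and host of the parent origin URL onto the path of a credential-free URL.
# Usage: inherit_credentials <origin_url> <plain_url>
inherit_credentials() (
  origin_url=$1
  plain_url=$2
  result=$plain_url            # default: leave the plain URL untouched

  case $origin_url in
    http://*|https://*)
      case $plain_url in
        http://*|https://*)
          scheme=${origin_url%%://*}            # e.g. "https"
          origin_rest=${origin_url#*://}
          origin_authority=${origin_rest%%/*}   # "[userinfo@]host"
          origin_host=${origin_authority##*@}

          plain_rest=${plain_url#*://}
          plain_authority=${plain_rest%%/*}
          plain_host=${plain_authority##*@}
          url_path=${plain_rest#"$plain_authority"}   # "/org/repo.git" or ""

          # Assumed safety guard: never attach credentials to another host.
          if [ "$plain_host" = "$origin_host" ]; then
            result="$scheme://$origin_authority$url_path"
          fi
          ;;
      esac
      ;;
  esac

  printf '%s\n' "$result"
)
```

With placeholder URLs, `inherit_credentials "https://user:secret@git.example.com/org/parent.git" "https://git.example.com/org/child.git"` prints `https://user:secret@git.example.com/org/child.git`. Writing that result with `git config submodule."<name>".url` keeps the secret in the local `.git/config` only, which matches the decision to leave `.gitmodules` credential-free.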
+
+### Problems Encountered
+
+- **Submodule empty / `git submodule status` fatal:** root-caused to uninitialized submodule + orphaned solverbot gitlink. Resolved by `git submodule init`/`update` (path-scoped) and `git rm --cached projects/solverbot`.
+- **sync.sh false success on submodules:** `git submodule foreach` no-ops on uninitialized submodules. Rewrote Phase 1a to iterate `.gitmodules` entries and init+populate explicitly.
+- **Submodule pointer showed as modified after CLAUDE.md push:** the rebase pulled a teammate commit (`6945b42`) that advanced the guru-rmm pin; local submodule was still on the old commit. Resolved with `git submodule update` (checks out the recorded pin `0a4db53`) — not a real local change.
+- **git user.name drift:** machine had `Mike-Swanson`; normalized to `Mike Swanson` per identity.json/protocol.
+
+### Configuration Changes
+
+- `.claude/scripts/sync.sh` — Phase 1a rewritten (init+populate submodules w/ credential inheritance; honest reporting). Commit `413df93`.
+- `projects/solverbot` — orphaned gitlink removed from index + empty dir deleted. Commit `413df93`.
+- `.claude/CLAUDE.md` — corrected guru-rmm submodule wording (lines ~143, ~270). Commit `f2ece8e`.
+- `.claude/current-mode` — set to `dev` (local, gitignored).
+- guru-rmm submodule: initialized locally; `submodule.projects/msp-tools/guru-rmm.url` in `.git/config` set to the credentialed gururmm URL (local only).
+- In the gururmm repo (branch `audit/2026-05-25-rmm-audit`, commit `da1d4ee`): `reports/2026-05-25-rmm-audit.md` (new), `docs/UI_GAPS.md` (modified).
+- git `user.name`: `Mike-Swanson` → `Mike Swanson`.
+
+### Credentials & Secrets
+
+- No new credentials created. Submodule clones reuse the shared Gitea account credentials already embedded in the claudetools `remote.origin.url` (account `azcomputerguru`); sync.sh copies that scheme+userinfo+host into each submodule's local `.git/config` URL at init time.
+  Nothing secret is written to tracked files (`.gitmodules` stays credential-free).
+- GuruRMM API admin creds used by the build-pipeline pass: vault `infrastructure/gururmm-server.sops.yaml` (admin-email `claude-api@azcomputerguru.com`).
+
+### Infrastructure & Servers
+
+- GuruRMM server / build server: `172.16.3.30` — API `:3001`, webhook handler `:9000` (`/opt/gururmm/webhook-handler.py`, multi-platform split handler, `PLATFORMS`×3). Builds only on push to `refs/heads/main`; no path filtering; skip token `[ci-version-bump]`. Live repo `/home/guru/gururmm`.
+- Build artifacts: flat in `/var/www/gururmm/downloads/` with `-latest` symlinks (NOT the `windows/amd64` subdirs the rmm-audit skill assumes — skill Pass 6 paths should be updated). Current artifacts v0.6.39 built 2026-05-25.
+- Per-platform last-built-commit: Linux/Windows at HEAD `7374e8a`; mac stuck at `1ed5596` (7 behind) since the 2026-05-24 Pluto outage.
+- Pluto (Windows MSI builder): SSH from build-windows.sh pins `StrictHostKeyChecking=yes` against `/opt/gururmm/pluto_known_hosts` (3 entries).
+- gururmm Gitea repos: `azcomputerguru/gururmm` (active, main was `7374e8a`→`f5df7a53`→`0a4db53` during/after the session) and `azcomputerguru/guru-rmm` (abandoned hyphenated duplicate). `azcomputerguru/solverbot` @ `0ec690f` exists but is not a claudetools submodule.
+
+### Commands & Outputs
+
+```bash
+# Properly initialize the previously-empty submodule (the correct fix):
+git submodule init -- projects/msp-tools/guru-rmm
+git config submodule."projects/msp-tools/guru-rmm".url \
+  "https://azcomputerguru:@git.azcomputerguru.com/azcomputerguru/gururmm.git"
+git submodule update -- projects/msp-tools/guru-rmm
+# -> checked out 7374e8a...
+
+# Remove the orphaned solverbot gitlink:
+git rm --cached projects/solverbot && rmdir projects/solverbot
+# git submodule status -> now exits 0, no fatal
+
+# After a pull bumped the pin, sync the submodule working tree to the recorded commit:
+git submodule update -- projects/msp-tools/guru-rmm
+# -> checked out 0a4db53... ; git status clean
+```
+
+- Webhook finding: a docs/reports-only push to `main` DOES trigger a full build (no path inspection in `webhook-handler.py`); a non-main branch push triggers nothing (`return 200 Ignored push to {ref}`).
+
+### Pending / Incomplete Tasks
+
+- **GuruRMM CRITICAL auth fixes (not started):** add `AuthUser` to all `metrics.rs` (`:29,:57`) and `logs.rs` (`:88,101,112,124,133,178`) handlers and scope to accessible orgs; then add a router-level auth layer so "public" must be opt-in (kills the whole class). Offered to start; awaiting Mike's go.
+- HIGH follow-ups: validate Entra ID-token signature (`sso.rs:212`); auth+scope the agent-status SSE (`agents.rs:583`); bring the mac builder back online (gate stuck at `1ed5596`); add `client_id`/`update_channel` to the agent response structs (dead frontend links).
+- Audit report lives only on branch `audit/2026-05-25-rmm-audit` — merge to main when bundling code fixes (will trigger a build).
+- Optional: update the rmm-audit skill's Pass 6 artifact paths (flat `downloads/`, not `windows/amd64`).
+
+### Reference Information
+
+- Audited gururmm commit: `7374e8a`. Audit report: `reports/2026-05-25-rmm-audit.md` on branch `audit/2026-05-25-rmm-audit`, commit `da1d4ee` (gururmm remote). PR URL: `https://git.azcomputerguru.com/azcomputerguru/gururmm/pulls/new/audit/2026-05-25-rmm-audit`
+- claudetools commits this session: `413df93` (sync.sh submodule fix + solverbot removal), `f2ece8e` (CLAUDE.md wording).
+- Findings tally: API Coverage 14 (0C/5H/4M/1L), Rust+Auth 10 (2C/2H/1M), TypeScript 17 (0C/2H/7M/6L), Data Integrity 10 (0C/0H/4M), Build Pipeline 10 (0C/1H).
+  Total 61 (2C/10H/16M/7L/26I).
+- Prior GuruRMM audits: `reports/2026-05-23-rmm-audit.md`, `reports/2026-05-19-rmm-audit.md`.