diff --git a/.claude/memory/MEMORY.md b/.claude/memory/MEMORY.md index 38e7fbe7..0dc07245 100644 --- a/.claude/memory/MEMORY.md +++ b/.claude/memory/MEMORY.md @@ -101,6 +101,7 @@ - [Syncro lessons / incident archive](feedback_syncro_history.md) — Detail behind the three rule files: tickets (#32332, #32312, #32225, #32253, #32203, #32185, #32142, #32304, #32333), verbatim Mike/Howard/Winter quotes, dates, tech user_id table (Mike 1735 / Howard 1750 / Winter 1737 / Rob 1760), labor product table, and superseded-rule history. ### GuruRMM +- [GuruRMM build verification (read before touching the pipeline)](feedback_gururmm_build_verification.md) — Merge-to-main IS the build+deploy; verify locally FIRST. Canonical refs: guru-rmm `docs/BUILD.md` + the `gururmm-build` skill (`verify.sh server|agent|dashboard|migrations`) + `deploy/build-pipeline/README.md`. Compile-gate trap: Windows cargo can't verify Linux-gated agent code (openssl-sys); Linux build on .30 is the real gate. Server needs SQLX_OFFLINE + fresh server/.sqlx; check migration-number collisions. - [GuruRMM operational rules](feedback_gururmm.md) — Six rules: (1) RMM dev = Mike, never Howard (368/0 commits); GuruScan is Howard's. (2) Agent parity Win+Linux+macOS in same change. (3) Builds via Gitea webhook pipeline only, never SSH. (4) #bot-alerts only for client/ticket impact, skip internal infra/dev. (5) Identify agents by IP, not by reconning candidates. (6) UNC paths in user_session need [char]92 — literals get halved. - [Build channel default = beta](feedback_gururmm_build_channel_default.md) — New agent builds must be tagged BETA by default (stable = explicit promote re-tag); distinct from agents defaulting to the stable CHANNEL (correct). Fixed build-windows/linux.sh 2026-06-01; macOS already correct. Enables beta-first canary. - [Dashboard beta-first deploy](feedback_dashboard_beta_first.md) — Dashboard auto-builds to rmm-beta.azcomputerguru.com on push; prod (rmm.azcomputerguru.com) is explicit promote-only via promote-dashboard.sh --confirm. Never hand-rsync prod. One artifact, nginx sub_filter BETA banner. Stood up 2026-06-02. diff --git a/.claude/memory/feedback_gururmm_build_verification.md b/.claude/memory/feedback_gururmm_build_verification.md new file mode 100644 index 00000000..a4a07add --- /dev/null +++ b/.claude/memory/feedback_gururmm_build_verification.md @@ -0,0 +1,48 @@ +--- +name: feedback_gururmm_build_verification +description: Before touching the GuruRMM build pipeline, verify locally first — merge-to-main IS the build+deploy. Canonical refs: docs/BUILD.md + the gururmm-build skill. Compile-gate trap. +metadata: + type: feedback +--- + +Every Claude instance that touches the GuruRMM build pipeline must internalize this before +editing agent/server/dashboard code: + +**Merging to `main` IS the build-and-deploy trigger — there is no separate build step.** A push +to main fires the Gitea webhook → `webhook-handler.py` on 172.16.3.30 → `build-shared.sh` (version +bump + `[ci-version-bump]` commit) → per-component builds (each self-gated on change). New +agent/server/dashboard artifacts land on **beta** first; prod promotion is deliberate. + +Canonical, kept-current references (use these, don't re-derive from the scripts): +- **`projects/msp-tools/guru-rmm/docs/BUILD.md`** — developer build + pre-merge verification guide. +- **`gururmm-build` skill** — `verify.sh {server|agent|dashboard|all|migrations} [--check]` runs the + same cargo/npm the server runs, on the right OS, with the gotchas baked in + errorlog on failure. +- **`projects/msp-tools/guru-rmm/deploy/build-pipeline/README.md`** — server-side pipeline source of + truth (webhook, Beast/Pluto hosts, signing, dashboard channels, repo↔/opt sync). The repo's older + `scripts/build-agents.sh` + `scripts/webhook-handler.py` are a PRIOR generation — do not follow them. + +The traps that break a post-merge build: +- **Compile gate:** Linux-gated agent code (`#[cfg(target_os="linux")]`, e.g. `is_docker_container()` + in `agent/src/updater/mod.rs`) is ONLY compiled by the Linux agent build — Windows `cargo check` + CANNOT verify it (`openssl-sys` won't cross-build). The Linux agent build on .30 is the real gate. + Build agent changes on the OS family they target; Windows-only paths (MSI/tray/legacy/x86) are + gated only by the Beast/Pluto Windows build. +- **sqlx offline cache:** the server build uses `SQLX_OFFLINE=true` + `server/.sqlx`. If you changed + queries/migrations, run `cargo sqlx prepare` and commit `server/.sqlx`, or the build fails. +- **Migration number collisions** across branches: check the highest `NNN_` on origin/main before + naming a new one (`verify.sh migrations`). Idempotent (`IF NOT EXISTS`) keeps overlaps harmless. +- Webhook builds from **origin/main**, so verify the COMMITTED state, not the working tree. + +**Why:** merging deploys, so an unverified merge breaks the build or ships broken code. Howard hit +this 2026-06-21 — couldn't cross-build the Linux-gated BUG-019 path from his Windows box, so the +post-merge Linux build on .30 was the only gate. + +**How to apply:** before merging gururmm changes, run `gururmm-build` `verify.sh` for each touched +component (on the OS it targets); read `docs/BUILD.md` if unfamiliar with the model; after merge, +watch the matching `/var/log/gururmm-build-*.log` on .30. Promote the dashboard beta→prod only via +`promote-dashboard.sh --confirm`. + +Related: [[reference_gururmm]] [[rmm-dashboard-beta-before-main]] [[feedback_dashboard_beta_first]] +[[feedback_gururmm_build_channel_default]] [[feedback_verify_committed_state_before_push]] +[[reference_guru5070_rust_toolchain]] [[reference_sqlx_migrations_immutable]] +[[gururmm-beast-windows-build-host]] [[feedback_gururmm]] [[rmm-agent-update-model]] diff --git a/.claude/memory/reference_resource_map.md b/.claude/memory/reference_resource_map.md index 7ec2c871..5ebb26f0 100644 --- a/.claude/memory/reference_resource_map.md +++ b/.claude/memory/reference_resource_map.md @@ -185,7 +185,7 @@ ACG manages multiple M365 tenants via the **ComputerGuru tiered MSP app suite** | CW Concrete | `clients/cw-concrete/m365.sops.yaml` | | Kittle (M. Sanchez) | `clients/kittle/m365-michael-sanchez.sops.yaml` | -Also: multi-tenant Graph API service principal at `msp-tools/claude-msp-access-graph-api.sops.yaml`. +Also: the multi-tenant **ComputerGuru remediation app suite** — tiered SPs (Security Investigator `bfbc12a4`, Exchange Operator `b43e7342` (holds Graph Mail.Send), User Manager `64fac46b`, Tenant Admin `709e6eed`, Defender add-on `dbf8ad1a`), secrets `msp-tools/computerguru-*.sops.yaml`. ACG own-mail (`/mailbox`) uses the single-tenant `1873b1b0` app (`msp-tools/computerguru-mailbox.sops.yaml`). The old single-app `fabb3421` ("Claude-MSP-Access", secret `msp-tools/claude-msp-access-graph-api.sops.yaml`) was **DELETED from the tenant 2026-06-14 — do not reference** (token requests return AADSTS700016). See [[feedback_365_remediation_tool]]. **Google Workspace:** ACG service account `msp-tools/acg-msp-access-google-workspace.sops.yaml`. Client-specific: `clients/lonestar-electrical/google-workspace.sops.yaml`. diff --git a/.claude/skills/gururmm-build/SKILL.md b/.claude/skills/gururmm-build/SKILL.md new file mode 100644 index 00000000..6ff4e744 --- /dev/null +++ b/.claude/skills/gururmm-build/SKILL.md @@ -0,0 +1,101 @@ +--- +name: gururmm-build +description: > + Build and pre-merge verification for the GuruRMM codebase (agent/server/dashboard). Encodes the + fleet's build model so every dev/machine builds the SAME way Mike does: merging to main is the + build+deploy trigger (no separate build step), so verify locally first. Wraps the cargo/npm + commands the server-side pipeline runs, with the compile-gate gotcha baked in (Linux-gated agent + code can't be verified by Windows cargo check; server needs SQLX_OFFLINE + a fresh .sqlx cache; + dashboard uses npm install not ci). Triggers: build the rmm agent/server/dashboard, verify the + linux build, gururmm build, pre-merge check, will this compile, sqlx prepare, migration collision, + promote dashboard, what does merging to main do. +--- + +# gururmm-build — Build & pre-merge verification for GuruRMM + +One skill so the whole fleet builds GuruRMM the same way. The actual build+deploy happens +server-side on a push to `main` (Gitea webhook → `webhook-handler.py` on `172.16.3.30`); this +skill is the **local pre-merge verification** layer plus the mental model, so nobody merges code +that fails the server build. + +The human-facing companion doc lives in the project: `projects/msp-tools/guru-rmm/docs/BUILD.md`. +The server-side pipeline source of truth is `projects/msp-tools/guru-rmm/deploy/build-pipeline/README.md`. + +## The one rule + +**Merging to `main` is the build-and-deploy trigger. There is no separate build step.** New +agent/server/dashboard binaries land on **beta** first; prod promotion is deliberate and separate. +So: verify locally before you merge, and after merge watch the relevant build log on `.30`. + +## Commands + +```bash +# Verify a component compiles the way the server will (run from anywhere in the fleet) +bash .claude/skills/gururmm-build/scripts/verify.sh server # SQLX_OFFLINE cargo build + .sqlx freshness check +bash .claude/skills/gururmm-build/scripts/verify.sh agent # cargo build (warns about the compile gate per-OS) +bash .claude/skills/gururmm-build/scripts/verify.sh dashboard # npm install + npm run build +bash .claude/skills/gururmm-build/scripts/verify.sh all # all three + +# Faster: cargo check instead of a full release build (Rust components only) +bash .claude/skills/gururmm-build/scripts/verify.sh server --check +bash .claude/skills/gururmm-build/scripts/verify.sh agent --check + +# Check for a migration number collision before you name/merge a new migration +bash .claude/skills/gururmm-build/scripts/verify.sh migrations +``` + +The script auto-locates the guru-rmm checkout (the `projects/msp-tools/guru-rmm` submodule), +detects the host OS, applies the right toolchain/env, and logs real failures to `errorlog.md`. + +## The compile gate (why Windows cargo check is not enough) + +The agent compiles across OSes and toolchains; **each OS gates only its own platform code**: + +- **Linux-gated agent code** (`#[cfg(target_os = "linux")]`, e.g. `is_docker_container()` in + `agent/src/updater/mod.rs`) is only compiled by the **Linux** agent build. You cannot verify it + with cargo on Windows — `openssl-sys` won't cross-build there. `verify.sh agent` warns when you're + on Windows and the change may be Linux-gated. To truly gate it, build on Linux or watch the + post-merge `/var/log/gururmm-build-linux.log` on `.30`. +- **Windows-only code** — MSI (`installer/`, WiX), tray (`tray/`), legacy (Rust 1.77) + x86 variants + — is only built on the Beast/Pluto Windows hosts; a Linux/macOS box can't gate it. + +Rule: **build agent changes on the OS family they target.** + +## Server + sqlx (the other common merge-breaker) + +`build-server.sh` builds with `SQLX_OFFLINE=true` against the committed `server/.sqlx` cache. If you +changed SQL queries or added a migration and did **not** refresh the cache, the server build fails: + +```bash +cd server && cargo sqlx prepare # needs a reachable DB; regenerates server/.sqlx +git add server/.sqlx && git commit -m "chore: refresh sqlx offline cache" +``` + +`verify.sh server` flags when `server/` query code changed but `server/.sqlx` was not also touched. + +## Migrations + +Numbered `NNN_description.sql`, applied on server deploy. Keep them idempotent +(`CREATE INDEX IF NOT EXISTS`). **Number collisions across branches are real** — if two branches +both add `060_*`, the second to merge must be renumbered. `verify.sh migrations` reports the highest +migration on `main` and any duplicate numbers so you pick the next free one. + +## After merge + +Build logs on `.30`: `gururmm-build.log` (shared/version bump), `-linux`, `-windows`, `-server`, +`-dashboard`. Server deploy auto-rolls-back if the new binary won't start. Agents + dashboard land +on **beta**. Promote the dashboard to prod deliberately: + +```bash +sudo /opt/gururmm/promote-dashboard.sh # dry-run delta +sudo /opt/gururmm/promote-dashboard.sh --confirm # promote (backs up prod) +sudo /opt/gururmm/promote-dashboard.sh --rollback # undo last promotion +``` + +## Notes + +- This skill does **not** trigger the production build — that's the webhook on merge to `main`. It + verifies locally and documents the model. +- A full `cargo build --release` for agent/server is slow (minutes); use `--check` for a fast gate + and the release build only when you need the real artifact. +- Detail: `projects/msp-tools/guru-rmm/docs/BUILD.md` and `deploy/build-pipeline/README.md`. diff --git a/.claude/skills/gururmm-build/scripts/verify.sh b/.claude/skills/gururmm-build/scripts/verify.sh new file mode 100755 index 00000000..cca747ab --- /dev/null +++ b/.claude/skills/gururmm-build/scripts/verify.sh @@ -0,0 +1,168 @@ +#!/usr/bin/env bash +# verify.sh — GuruRMM local pre-merge verification. +# Builds/checks a component the SAME way the server-side pipeline does, so you don't +# merge code that fails the post-merge build on .30. Does NOT trigger the production +# build (that's the webhook on push to main) — this is the local gate only. +# +# Usage: +# bash verify.sh server [--check] # SQLX_OFFLINE cargo build (or check) + .sqlx freshness warning +# bash verify.sh agent [--check] # cargo build (or check) + per-OS compile-gate warning +# bash verify.sh dashboard # npm install + npm run build +# bash verify.sh all [--check] # server + agent + dashboard +# bash verify.sh migrations # report highest migration number + any duplicate numbers +# +# See: projects/msp-tools/guru-rmm/docs/BUILD.md +# projects/msp-tools/guru-rmm/deploy/build-pipeline/README.md + +set -uo pipefail + +# Resolve repo root (4 levels up: .claude/skills/gururmm-build/scripts/ -> repo root). +CLAUDETOOLS_ROOT="${CLAUDETOOLS_ROOT:-$(cd "$(dirname "${BASH_SOURCE[0]}")/../../../.." && pwd)}" +REPO_DIR="$CLAUDETOOLS_ROOT/projects/msp-tools/guru-rmm" + +_logerr() { bash "$CLAUDETOOLS_ROOT/.claude/scripts/log-skill-error.sh" "gururmm-build" "$@" >/dev/null 2>&1 || true; } + +# ASCII markers (no emojis — fleet rule). +OK="[OK]"; ERR="[ERROR]"; WARN="[WARNING]"; INFO="[INFO]" + +die() { echo "$ERR $*" >&2; _logerr "$*"; exit 1; } + +[ -d "$REPO_DIR" ] || die "guru-rmm checkout not found at $REPO_DIR (submodule not initialized?)" + +# Host OS family. +case "$(uname -s 2>/dev/null)" in + Linux*) OS=linux ;; + Darwin*) OS=mac ;; + MINGW*|MSYS*|CYGWIN*) OS=windows ;; + *) OS=unknown ;; +esac + +CMD="${1:-}"; shift || true +MODE="build" +for a in "$@"; do + case "$a" in + --check) MODE="check" ;; + *) die "unknown flag: $a" ;; + esac +done +[ -n "$CMD" ] || die "usage: verify.sh {server|agent|dashboard|all|migrations} [--check]" + +have() { command -v "$1" >/dev/null 2>&1; } + +# Names of files changed vs origin/main (committed) + uncommitted, restricted to a path prefix. +changed_under() { + local prefix="$1" + { + git -C "$REPO_DIR" diff --name-only origin/main...HEAD -- "$prefix" 2>/dev/null + git -C "$REPO_DIR" status --porcelain -- "$prefix" 2>/dev/null | awk '{print $2}' + } | sort -u +} + +verify_server() { + echo "$INFO === server: $MODE (host=$OS) ===" + have cargo || die "cargo not on PATH — install the Rust toolchain to verify server" + + # sqlx offline-cache freshness: the server build runs SQLX_OFFLINE=true against + # server/.sqlx. If server/ Rust changed but .sqlx did not, the build can fail on a + # cache mismatch — warn loudly (not fatal; queries may be unchanged). + local srv_rs srv_sqlx + srv_rs=$(changed_under "server" | grep -E '\.rs$' | wc -l | tr -d ' ') + srv_sqlx=$(changed_under "server/.sqlx" | wc -l | tr -d ' ') + if [ "$srv_rs" -gt 0 ] && [ "$srv_sqlx" -eq 0 ]; then + echo "$WARN server/*.rs changed but server/.sqlx was not regenerated." + echo "$WARN If you added/changed SQL queries or migrations, run (in server/, DB reachable):" + echo "$WARN cargo sqlx prepare && git add server/.sqlx && git commit" + echo "$WARN Otherwise the post-merge server build may fail on the offline cache." + fi + + local sub="build --release"; [ "$MODE" = check ] && sub="check" + ( cd "$REPO_DIR/server" && SQLX_OFFLINE=true cargo $sub ) + local rc=$? + [ $rc -eq 0 ] || { _logerr "server cargo $sub failed (rc=$rc)" --context "os=$OS"; echo "$ERR server $MODE FAILED (rc=$rc)"; return 1; } + echo "$OK server $MODE passed" +} + +verify_agent() { + echo "$INFO === agent: $MODE (host=$OS) ===" + have cargo || die "cargo not on PATH — install the Rust toolchain to verify agent" + + # Compile-gate warning: platform-gated code is only verified on its own OS. + if [ "$OS" = windows ]; then + echo "$WARN On Windows, cargo cannot verify Linux-gated agent code (#[cfg(target_os=\"linux\")];" + echo "$WARN openssl-sys won't cross-build). The Linux agent build on .30 is the real gate —" + echo "$WARN build on Linux too, or watch /var/log/gururmm-build-linux.log after merge." + elif [ "$OS" = linux ]; then + echo "$INFO Linux host: this gates Linux + cross-platform agent code. Windows-only paths" + echo "$INFO (MSI/tray/legacy/x86) are gated only by the Beast/Pluto Windows build." + fi + + local sub="build --release"; [ "$MODE" = check ] && sub="check" + ( cd "$REPO_DIR/agent" && cargo $sub ) + local rc=$? + [ $rc -eq 0 ] || { _logerr "agent cargo $sub failed (rc=$rc)" --context "os=$OS"; echo "$ERR agent $MODE FAILED (rc=$rc)"; return 1; } + echo "$OK agent $MODE passed (host-OS gate only — see compile-gate warning)" +} + +verify_dashboard() { + echo "$INFO === dashboard: build (host=$OS) ===" + have npm || die "npm not on PATH — install Node to verify dashboard" + # 'npm install' (not ci): tolerates the package.json version-bump skew the pipeline makes. + ( cd "$REPO_DIR/dashboard" && npm install --no-audit --no-fund && npm run build ) + local rc=$? + [ $rc -eq 0 ] || { _logerr "dashboard npm build failed (rc=$rc)" --context "os=$OS"; echo "$ERR dashboard build FAILED (rc=$rc)"; return 1; } + [ -f "$REPO_DIR/dashboard/dist/index.html" ] || { _logerr "dashboard build produced no dist/index.html"; echo "$ERR dashboard build produced no dist/index.html"; return 1; } + echo "$OK dashboard build passed (dist/index.html present)" +} + +check_migrations() { + echo "$INFO === migrations: numbering check ===" + local mdir="" + for cand in "server/migrations" "migrations"; do + [ -d "$REPO_DIR/$cand" ] && { mdir="$REPO_DIR/$cand"; break; } + done + [ -n "$mdir" ] || die "no migrations directory found (looked for server/migrations, migrations)" + echo "$INFO migrations dir: ${mdir#$REPO_DIR/}" + + # Highest NNN prefix and any duplicate numbers. + local nums + nums=$(ls -1 "$mdir" 2>/dev/null | grep -E '^[0-9]+_' | sed -E 's/^([0-9]+)_.*/\1/' | sort) + [ -n "$nums" ] || { echo "$WARN no NNN_ migrations found in $mdir"; return 0; } + + local highest next dupes + highest=$(printf '%s\n' "$nums" | tail -1) + next=$(printf '%03d' $((10#$highest + 1))) + echo "$OK highest migration on this checkout: $highest -> next free number: $next" + + dupes=$(printf '%s\n' "$nums" | uniq -d) + if [ -n "$dupes" ]; then + echo "$ERR duplicate migration number(s) detected: $dupes" + printf '%s\n' "$dupes" | while read -r d; do + ls -1 "$mdir" | grep -E "^${d}_" | sed "s/^/$ERR /" + done + _logerr "duplicate migration number(s): $dupes" + return 1 + fi + echo "$INFO Before naming a new migration, also check the highest number on origin/main" + echo "$INFO (another branch may have claimed $next): git fetch && ls $(printf '%s' "${mdir#$REPO_DIR/}")" +} + +rc=0 +case "$CMD" in + server) verify_server || rc=1 ;; + agent) verify_agent || rc=1 ;; + dashboard) verify_dashboard || rc=1 ;; + migrations) check_migrations || rc=1 ;; + all) + verify_server || rc=1 + verify_agent || rc=1 + verify_dashboard || rc=1 + ;; + *) die "unknown component: $CMD (use server|agent|dashboard|all|migrations)" ;; +esac + +if [ $rc -eq 0 ]; then + echo "$OK verify.sh $CMD: PASS" +else + echo "$ERR verify.sh $CMD: FAIL" +fi +exit $rc