sync: auto-sync from GURU-5070 at 2026-06-21 17:24:36

Author: Mike Swanson
Machine: GURU-5070
Timestamp: 2026-06-21 17:24:36
This commit is contained in:
2026-06-21 17:25:22 -07:00
parent b49cb21fa6
commit f8c33c9019
5 changed files with 319 additions and 1 deletions

View File

@@ -101,6 +101,7 @@
- [Syncro lessons / incident archive](feedback_syncro_history.md) — Detail behind the three rule files: tickets (#32332, #32312, #32225, #32253, #32203, #32185, #32142, #32304, #32333), verbatim Mike/Howard/Winter quotes, dates, tech user_id table (Mike 1735 / Howard 1750 / Winter 1737 / Rob 1760), labor product table, and superseded-rule history.
### GuruRMM
- [GuruRMM build verification (read before touching the pipeline)](feedback_gururmm_build_verification.md) — Merge-to-main IS the build+deploy; verify locally FIRST. Canonical refs: guru-rmm `docs/BUILD.md` + the `gururmm-build` skill (`verify.sh server|agent|dashboard|migrations`) + `deploy/build-pipeline/README.md`. Compile-gate trap: Windows cargo can't verify Linux-gated agent code (openssl-sys); Linux build on .30 is the real gate. Server needs SQLX_OFFLINE + fresh server/.sqlx; check migration-number collisions.
- [GuruRMM operational rules](feedback_gururmm.md) — Six rules: (1) RMM dev = Mike, never Howard (368/0 commits); GuruScan is Howard's. (2) Agent parity Win+Linux+macOS in same change. (3) Builds via Gitea webhook pipeline only, never SSH. (4) #bot-alerts only for client/ticket impact, skip internal infra/dev. (5) Identify agents by IP, not by reconning candidates. (6) UNC paths in user_session need [char]92 — literals get halved.
- [Build channel default = beta](feedback_gururmm_build_channel_default.md) — New agent builds must be tagged BETA by default (stable = explicit promote re-tag); distinct from agents defaulting to the stable CHANNEL (correct). Fixed build-windows/linux.sh 2026-06-01; macOS already correct. Enables beta-first canary.
- [Dashboard beta-first deploy](feedback_dashboard_beta_first.md) — Dashboard auto-builds to rmm-beta.azcomputerguru.com on push; prod (rmm.azcomputerguru.com) is explicit promote-only via promote-dashboard.sh --confirm. Never hand-rsync prod. One artifact, nginx sub_filter BETA banner. Stood up 2026-06-02.

View File

@@ -0,0 +1,48 @@
---
name: feedback_gururmm_build_verification
description: Before touching the GuruRMM build pipeline, verify locally first — merge-to-main IS the build+deploy. Canonical refs: docs/BUILD.md + the gururmm-build skill. Compile-gate trap.
metadata:
type: feedback
---
Every Claude instance that touches the GuruRMM build pipeline must internalize this before
editing agent/server/dashboard code:
**Merging to `main` IS the build-and-deploy trigger — there is no separate build step.** A push
to main fires the Gitea webhook → `webhook-handler.py` on 172.16.3.30 → `build-shared.sh` (version
bump + `[ci-version-bump]` commit) → per-component builds (each self-gated on change). New
agent/server/dashboard artifacts land on **beta** first; prod promotion is deliberate.
Canonical, kept-current references (use these, don't re-derive from the scripts):
- **`projects/msp-tools/guru-rmm/docs/BUILD.md`** — developer build + pre-merge verification guide.
- **`gururmm-build` skill** — `verify.sh {server|agent|dashboard|all|migrations} [--check]` runs the
same cargo/npm the server runs, on the right OS, with the gotchas baked in + errorlog on failure.
- **`projects/msp-tools/guru-rmm/deploy/build-pipeline/README.md`** — server-side pipeline source of
truth (webhook, Beast/Pluto hosts, signing, dashboard channels, repo↔/opt sync). The repo's older
`scripts/build-agents.sh` + `scripts/webhook-handler.py` are a PRIOR generation — do not follow them.
The traps that break a post-merge build:
- **Compile gate:** Linux-gated agent code (`#[cfg(target_os="linux")]`, e.g. `is_docker_container()`
in `agent/src/updater/mod.rs`) is ONLY compiled by the Linux agent build — Windows `cargo check`
CANNOT verify it (`openssl-sys` won't cross-build). The Linux agent build on .30 is the real gate.
Build agent changes on the OS family they target; Windows-only paths (MSI/tray/legacy/x86) are
gated only by the Beast/Pluto Windows build.
- **sqlx offline cache:** the server build uses `SQLX_OFFLINE=true` + `server/.sqlx`. If you changed
queries/migrations, run `cargo sqlx prepare` and commit `server/.sqlx`, or the build fails.
- **Migration number collisions** across branches: check the highest `NNN_` on origin/main before
naming a new one (`verify.sh migrations`). Idempotent (`IF NOT EXISTS`) keeps overlaps harmless.
- Webhook builds from **origin/main**, so verify the COMMITTED state, not the working tree.
**Why:** merging deploys, so an unverified merge breaks the build or ships broken code. Howard hit
this 2026-06-21 — couldn't cross-build the Linux-gated BUG-019 path from his Windows box, so the
post-merge Linux build on .30 was the only gate.
**How to apply:** before merging gururmm changes, run `gururmm-build` `verify.sh` for each touched
component (on the OS it targets); read `docs/BUILD.md` if unfamiliar with the model; after merge,
watch the matching `/var/log/gururmm-build-*.log` on .30. Promote the dashboard beta→prod only via
`promote-dashboard.sh --confirm`.
Related: [[reference_gururmm]] [[rmm-dashboard-beta-before-main]] [[feedback_dashboard_beta_first]]
[[feedback_gururmm_build_channel_default]] [[feedback_verify_committed_state_before_push]]
[[reference_guru5070_rust_toolchain]] [[reference_sqlx_migrations_immutable]]
[[gururmm-beast-windows-build-host]] [[feedback_gururmm]] [[rmm-agent-update-model]]

View File

@@ -185,7 +185,7 @@ ACG manages multiple M365 tenants via the **ComputerGuru tiered MSP app suite**
| CW Concrete | `clients/cw-concrete/m365.sops.yaml` |
| Kittle (M. Sanchez) | `clients/kittle/m365-michael-sanchez.sops.yaml` |
Also: multi-tenant Graph API service principal at `msp-tools/claude-msp-access-graph-api.sops.yaml`.
Also: the multi-tenant **ComputerGuru remediation app suite** — tiered SPs (Security Investigator `bfbc12a4`, Exchange Operator `b43e7342` (holds Graph Mail.Send), User Manager `64fac46b`, Tenant Admin `709e6eed`, Defender add-on `dbf8ad1a`), secrets `msp-tools/computerguru-*.sops.yaml`. ACG own-mail (`/mailbox`) uses the single-tenant `1873b1b0` app (`msp-tools/computerguru-mailbox.sops.yaml`). The old single-app `fabb3421` ("Claude-MSP-Access", secret `msp-tools/claude-msp-access-graph-api.sops.yaml`) was **DELETED from the tenant 2026-06-14 — do not reference** (token requests return AADSTS700016). See [[feedback_365_remediation_tool]].
**Google Workspace:** ACG service account `msp-tools/acg-msp-access-google-workspace.sops.yaml`. Client-specific: `clients/lonestar-electrical/google-workspace.sops.yaml`.

View File

@@ -0,0 +1,101 @@
---
name: gururmm-build
description: >
Build and pre-merge verification for the GuruRMM codebase (agent/server/dashboard). Encodes the
fleet's build model so every dev/machine builds the SAME way Mike does: merging to main is the
build+deploy trigger (no separate build step), so verify locally first. Wraps the cargo/npm
commands the server-side pipeline runs, with the compile-gate gotcha baked in (Linux-gated agent
code can't be verified by Windows cargo check; server needs SQLX_OFFLINE + a fresh .sqlx cache;
dashboard uses npm install not ci). Triggers: build the rmm agent/server/dashboard, verify the
linux build, gururmm build, pre-merge check, will this compile, sqlx prepare, migration collision,
promote dashboard, what does merging to main do.
---
# gururmm-build — Build & pre-merge verification for GuruRMM
One skill so the whole fleet builds GuruRMM the same way. The actual build+deploy happens
server-side on a push to `main` (Gitea webhook → `webhook-handler.py` on `172.16.3.30`); this
skill is the **local pre-merge verification** layer plus the mental model, so nobody merges code
that fails the server build.
The human-facing companion doc lives in the project: `projects/msp-tools/guru-rmm/docs/BUILD.md`.
The server-side pipeline source of truth is `projects/msp-tools/guru-rmm/deploy/build-pipeline/README.md`.
## The one rule
**Merging to `main` is the build-and-deploy trigger. There is no separate build step.** New
agent/server/dashboard binaries land on **beta** first; prod promotion is deliberate and separate.
So: verify locally before you merge, and after merge watch the relevant build log on `.30`.
## Commands
```bash
# Verify a component compiles the way the server will (run from anywhere in the fleet)
bash .claude/skills/gururmm-build/scripts/verify.sh server # SQLX_OFFLINE cargo build + .sqlx freshness check
bash .claude/skills/gururmm-build/scripts/verify.sh agent # cargo build (warns about the compile gate per-OS)
bash .claude/skills/gururmm-build/scripts/verify.sh dashboard # npm install + npm run build
bash .claude/skills/gururmm-build/scripts/verify.sh all # all three
# Faster: cargo check instead of a full release build (Rust components only)
bash .claude/skills/gururmm-build/scripts/verify.sh server --check
bash .claude/skills/gururmm-build/scripts/verify.sh agent --check
# Check for a migration number collision before you name/merge a new migration
bash .claude/skills/gururmm-build/scripts/verify.sh migrations
```
The script auto-locates the guru-rmm checkout (the `projects/msp-tools/guru-rmm` submodule),
detects the host OS, applies the right toolchain/env, and logs real failures to `errorlog.md`.
## The compile gate (why Windows cargo check is not enough)
The agent compiles across OSes and toolchains; **each OS gates only its own platform code**:
- **Linux-gated agent code** (`#[cfg(target_os = "linux")]`, e.g. `is_docker_container()` in
`agent/src/updater/mod.rs`) is only compiled by the **Linux** agent build. You cannot verify it
with cargo on Windows — `openssl-sys` won't cross-build there. `verify.sh agent` warns when you're
on Windows and the change may be Linux-gated. To truly gate it, build on Linux or watch the
post-merge `/var/log/gururmm-build-linux.log` on `.30`.
- **Windows-only code** — MSI (`installer/`, WiX), tray (`tray/`), legacy (Rust 1.77) + x86 variants
— is only built on the Beast/Pluto Windows hosts; a Linux/macOS box can't gate it.
Rule: **build agent changes on the OS family they target.**
## Server + sqlx (the other common merge-breaker)
`build-server.sh` builds with `SQLX_OFFLINE=true` against the committed `server/.sqlx` cache. If you
changed SQL queries or added a migration and did **not** refresh the cache, the server build fails:
```bash
cd server && cargo sqlx prepare # needs a reachable DB; regenerates server/.sqlx
git add server/.sqlx && git commit -m "chore: refresh sqlx offline cache"
```
`verify.sh server` flags when `server/` query code changed but `server/.sqlx` was not also touched.
## Migrations
Numbered `NNN_description.sql`, applied on server deploy. Keep them idempotent
(`CREATE INDEX IF NOT EXISTS`). **Number collisions across branches are real** — if two branches
both add `060_*`, the second to merge must be renumbered. `verify.sh migrations` reports the highest
migration on `main` and any duplicate numbers so you pick the next free one.
## After merge
Build logs on `.30`: `gururmm-build.log` (shared/version bump), `-linux`, `-windows`, `-server`,
`-dashboard`. Server deploy auto-rolls-back if the new binary won't start. Agents + dashboard land
on **beta**. Promote the dashboard to prod deliberately:
```bash
sudo /opt/gururmm/promote-dashboard.sh # dry-run delta
sudo /opt/gururmm/promote-dashboard.sh --confirm # promote (backs up prod)
sudo /opt/gururmm/promote-dashboard.sh --rollback # undo last promotion
```
## Notes
- This skill does **not** trigger the production build — that's the webhook on merge to `main`. It
verifies locally and documents the model.
- A full `cargo build --release` for agent/server is slow (minutes); use `--check` for a fast gate
and the release build only when you need the real artifact.
- Detail: `projects/msp-tools/guru-rmm/docs/BUILD.md` and `deploy/build-pipeline/README.md`.

View File

@@ -0,0 +1,168 @@
#!/usr/bin/env bash
# verify.sh — GuruRMM local pre-merge verification.
# Builds/checks a component the SAME way the server-side pipeline does, so you don't
# merge code that fails the post-merge build on .30. Does NOT trigger the production
# build (that's the webhook on push to main) — this is the local gate only.
#
# Usage:
# bash verify.sh server [--check] # SQLX_OFFLINE cargo build (or check) + .sqlx freshness warning
# bash verify.sh agent [--check] # cargo build (or check) + per-OS compile-gate warning
# bash verify.sh dashboard # npm install + npm run build
# bash verify.sh all [--check] # server + agent + dashboard
# bash verify.sh migrations # report highest migration number + any duplicate numbers
#
# See: projects/msp-tools/guru-rmm/docs/BUILD.md
# projects/msp-tools/guru-rmm/deploy/build-pipeline/README.md
set -uo pipefail
# Resolve repo root (4 levels up: .claude/skills/gururmm-build/scripts/ -> repo root).
CLAUDETOOLS_ROOT="${CLAUDETOOLS_ROOT:-$(cd "$(dirname "${BASH_SOURCE[0]}")/../../../.." && pwd)}"
REPO_DIR="$CLAUDETOOLS_ROOT/projects/msp-tools/guru-rmm"
_logerr() { bash "$CLAUDETOOLS_ROOT/.claude/scripts/log-skill-error.sh" "gururmm-build" "$@" >/dev/null 2>&1 || true; }
# ASCII markers (no emojis — fleet rule).
OK="[OK]"; ERR="[ERROR]"; WARN="[WARNING]"; INFO="[INFO]"
die() { echo "$ERR $*" >&2; _logerr "$*"; exit 1; }
[ -d "$REPO_DIR" ] || die "guru-rmm checkout not found at $REPO_DIR (submodule not initialized?)"
# Host OS family.
case "$(uname -s 2>/dev/null)" in
Linux*) OS=linux ;;
Darwin*) OS=mac ;;
MINGW*|MSYS*|CYGWIN*) OS=windows ;;
*) OS=unknown ;;
esac
CMD="${1:-}"; shift || true
MODE="build"
for a in "$@"; do
case "$a" in
--check) MODE="check" ;;
*) die "unknown flag: $a" ;;
esac
done
[ -n "$CMD" ] || die "usage: verify.sh {server|agent|dashboard|all|migrations} [--check]"
have() { command -v "$1" >/dev/null 2>&1; }
# Names of files changed vs origin/main (committed) + uncommitted, restricted to a path prefix.
changed_under() {
local prefix="$1"
{
git -C "$REPO_DIR" diff --name-only origin/main...HEAD -- "$prefix" 2>/dev/null
git -C "$REPO_DIR" status --porcelain -- "$prefix" 2>/dev/null | awk '{print $2}'
} | sort -u
}
verify_server() {
echo "$INFO === server: $MODE (host=$OS) ==="
have cargo || die "cargo not on PATH — install the Rust toolchain to verify server"
# sqlx offline-cache freshness: the server build runs SQLX_OFFLINE=true against
# server/.sqlx. If server/ Rust changed but .sqlx did not, the build can fail on a
# cache mismatch — warn loudly (not fatal; queries may be unchanged).
local srv_rs srv_sqlx
srv_rs=$(changed_under "server" | grep -E '\.rs$' | wc -l | tr -d ' ')
srv_sqlx=$(changed_under "server/.sqlx" | wc -l | tr -d ' ')
if [ "$srv_rs" -gt 0 ] && [ "$srv_sqlx" -eq 0 ]; then
echo "$WARN server/*.rs changed but server/.sqlx was not regenerated."
echo "$WARN If you added/changed SQL queries or migrations, run (in server/, DB reachable):"
echo "$WARN cargo sqlx prepare && git add server/.sqlx && git commit"
echo "$WARN Otherwise the post-merge server build may fail on the offline cache."
fi
local sub="build --release"; [ "$MODE" = check ] && sub="check"
( cd "$REPO_DIR/server" && SQLX_OFFLINE=true cargo $sub )
local rc=$?
[ $rc -eq 0 ] || { _logerr "server cargo $sub failed (rc=$rc)" --context "os=$OS"; echo "$ERR server $MODE FAILED (rc=$rc)"; return 1; }
echo "$OK server $MODE passed"
}
verify_agent() {
echo "$INFO === agent: $MODE (host=$OS) ==="
have cargo || die "cargo not on PATH — install the Rust toolchain to verify agent"
# Compile-gate warning: platform-gated code is only verified on its own OS.
if [ "$OS" = windows ]; then
echo "$WARN On Windows, cargo cannot verify Linux-gated agent code (#[cfg(target_os=\"linux\")];"
echo "$WARN openssl-sys won't cross-build). The Linux agent build on .30 is the real gate —"
echo "$WARN build on Linux too, or watch /var/log/gururmm-build-linux.log after merge."
elif [ "$OS" = linux ]; then
echo "$INFO Linux host: this gates Linux + cross-platform agent code. Windows-only paths"
echo "$INFO (MSI/tray/legacy/x86) are gated only by the Beast/Pluto Windows build."
fi
local sub="build --release"; [ "$MODE" = check ] && sub="check"
( cd "$REPO_DIR/agent" && cargo $sub )
local rc=$?
[ $rc -eq 0 ] || { _logerr "agent cargo $sub failed (rc=$rc)" --context "os=$OS"; echo "$ERR agent $MODE FAILED (rc=$rc)"; return 1; }
echo "$OK agent $MODE passed (host-OS gate only — see compile-gate warning)"
}
verify_dashboard() {
echo "$INFO === dashboard: build (host=$OS) ==="
have npm || die "npm not on PATH — install Node to verify dashboard"
# 'npm install' (not ci): tolerates the package.json version-bump skew the pipeline makes.
( cd "$REPO_DIR/dashboard" && npm install --no-audit --no-fund && npm run build )
local rc=$?
[ $rc -eq 0 ] || { _logerr "dashboard npm build failed (rc=$rc)" --context "os=$OS"; echo "$ERR dashboard build FAILED (rc=$rc)"; return 1; }
[ -f "$REPO_DIR/dashboard/dist/index.html" ] || { _logerr "dashboard build produced no dist/index.html"; echo "$ERR dashboard build produced no dist/index.html"; return 1; }
echo "$OK dashboard build passed (dist/index.html present)"
}
check_migrations() {
echo "$INFO === migrations: numbering check ==="
local mdir=""
for cand in "server/migrations" "migrations"; do
[ -d "$REPO_DIR/$cand" ] && { mdir="$REPO_DIR/$cand"; break; }
done
[ -n "$mdir" ] || die "no migrations directory found (looked for server/migrations, migrations)"
echo "$INFO migrations dir: ${mdir#$REPO_DIR/}"
# Highest NNN prefix and any duplicate numbers.
local nums
nums=$(ls -1 "$mdir" 2>/dev/null | grep -E '^[0-9]+_' | sed -E 's/^([0-9]+)_.*/\1/' | sort)
[ -n "$nums" ] || { echo "$WARN no NNN_ migrations found in $mdir"; return 0; }
local highest next dupes
highest=$(printf '%s\n' "$nums" | tail -1)
next=$(printf '%03d' $((10#$highest + 1)))
echo "$OK highest migration on this checkout: $highest -> next free number: $next"
dupes=$(printf '%s\n' "$nums" | uniq -d)
if [ -n "$dupes" ]; then
echo "$ERR duplicate migration number(s) detected: $dupes"
printf '%s\n' "$dupes" | while read -r d; do
ls -1 "$mdir" | grep -E "^${d}_" | sed "s/^/$ERR /"
done
_logerr "duplicate migration number(s): $dupes"
return 1
fi
echo "$INFO Before naming a new migration, also check the highest number on origin/main"
echo "$INFO (another branch may have claimed $next): git fetch && ls $(printf '%s' "${mdir#$REPO_DIR/}")"
}
rc=0
case "$CMD" in
server) verify_server || rc=1 ;;
agent) verify_agent || rc=1 ;;
dashboard) verify_dashboard || rc=1 ;;
migrations) check_migrations || rc=1 ;;
all)
verify_server || rc=1
verify_agent || rc=1
verify_dashboard || rc=1
;;
*) die "unknown component: $CMD (use server|agent|dashboard|all|migrations)" ;;
esac
if [ $rc -eq 0 ]; then
echo "$OK verify.sh $CMD: PASS"
else
echo "$ERR verify.sh $CMD: FAIL"
fi
exit $rc