diff --git a/.claude/commands/save.md b/.claude/commands/save.md
index e202d09..bfd42e9 100644
--- a/.claude/commands/save.md
+++ b/.claude/commands/save.md
@@ -34,9 +34,18 @@ Claude writes all sections directly. Be concise, factual, technical. No filler p
 
 ### Filename + append behavior
 
-- Filename: `YYYY-MM-DD-session.md` (today's local date)
-- If file exists, **append** a `## Update: HH:MM PT — ` section. Do not overwrite.
-- If two users worked on the same date, namespace: `YYYY-MM-DD--.md` (e.g. `2026-05-01-howard-syncro-billing-batch.md`)
+**Per-session-unique filenames are mandatory** — 3–4 Claude sessions can run against this one
+working tree at once, and a shared `YYYY-MM-DD-session.md` lets them overwrite each other's logs.
+Never use the bare `YYYY-MM-DD-session.md`.
+
+- Default: `YYYY-MM-DD--.md` — `` from the User block (identity.json),
+  `` a short kebab slug of this session's main work (e.g. `2026-06-05-mike-gururmm-platform-day.md`).
+  The topic naturally separates concurrent sessions.
+- Collision guard: if that exact filename already exists and belongs to a **different** session
+  (different work), append a discriminator — `YYYY-MM-DD---2.md` (increment until free).
+  Never overwrite another session's file.
+- Same-session continuation (re-saving your own ongoing work): **append** a
+  `## Update: HH:MM PT — ` section to this session's own file. Do not overwrite.
 
 ### Required sections (in order)
 
@@ -77,7 +86,7 @@ Fold what you just worked on into the wiki article so it ships in the **same com
 - If the synthesis subagent fails or is unavailable, fall back to a surgical **refresh** (bump `last_compiled` + `sources`; refresh client Syncro fields) so the article still records the session, and emit `[WARN] wiki refreshed, not recompiled; run /wiki-compile --full later`.
 - Any other failure: log it and continue to sync.
 
-The article + `wiki/index.md` are picked up by `sync.sh`'s `git add -A` and committed alongside the session log.
+The article + `wiki/index.md` are committed alongside the session log by `sync.sh` (Phase 4).
 
 ---
 
@@ -87,7 +96,9 @@ The article + `wiki/index.md` are picked up by `sync.sh`'s `git add -A` and comm
 bash .claude/scripts/sync.sh
 ```
 
-`sync.sh` handles: reconcile this machine's `git config user.name/email` to `.claude/identity.json` (so commit authorship can't drift), stage all changes with `git add -A` (after purging garbled Windows path-as-filename cruft), auto-commit, fetch + rebase, push, then the same flow for the vault repo, then surface cross-user `## Note for ` blocks.
+`sync.sh` is **serialized by a per-machine lock** (`.git/claudetools-sync.lock`) so concurrent sessions or the scheduled-task sync cannot interleave commits/rebases; if another sync is mid-flight it waits up to ~120s, then skips (the next sync catches up). It then handles: reconcile this machine's `git config user.name/email` to `.claude/identity.json` (so commit authorship can't drift), stage all changes with `git add -A` (after purging garbled Windows path-as-filename cruft), auto-commit, fetch + rebase, push, then the same flow for the vault repo, then surface cross-user `## Note for ` blocks.
+
+> Note: `git add -A` is still the catch-all sweep, so a save run will also pick up any *other* dirty files in the shared tree. The lock prevents two syncs from racing, and per-session-unique log filenames prevent log overwrites — but the bare-`add -A` capture means full per-session commit isolation is a later step (see the isolation plan: drop blind `add -A` in favour of explicit per-session staging). For now, avoid running `/save` from two sessions at the exact same moment.
 
 After sync, emit a **Post-commit Summary**:
 
diff --git a/.claude/memory/feedback_verify_committed_state_before_push.md b/.claude/memory/feedback_verify_committed_state_before_push.md
new file mode 100644
index 0000000..b791e93
--- /dev/null
+++ b/.claude/memory/feedback_verify_committed_state_before_push.md
@@ -0,0 +1,24 @@
+---
+name: feedback_verify_committed_state_before_push
+description: For webhook-builds-from-main deploys, verify the COMMITTED state builds (not just the working tree); git-add bad-pathspec aborts the whole stage
+metadata:
+  type: feedback
+---
+
+When a deploy pipeline builds from `origin/main` (e.g. GuruRMM's `build-dashboard.sh` does
+`git reset --hard origin/main` then build), the SERVER builds the COMMITTED content — so a local
+`tsc`/`vite build` passing against your **working tree** can MASK an incomplete commit and you push a
+broken main.
+
+**Why:** A `git add ` with a stale/deleted pathspec **aborts the entire add**
+("fatal: pathspec ... did not match"), silently staging nothing — so the commit captured only an
+earlier `git rm`, not the new files. Working-tree build still passed; the committed build failed on
+the server. (GuruRMM Phase-2 omnibox, 2026-06-05: main pushed importing a deleted CommandPalette.)
+
+**How to apply:**
+- Stage with the DIRECTORY (`git add dashboard/src/components/omnibox`), not the deleted file path.
+- Before pushing a merge that a webhook will build: verify the **committed** state, e.g.
+  `git stash -u && (cd dashboard && npx tsc -b && npx vite build) ; git stash pop` — or check
+  `git show HEAD:` / `git ls-files ` to confirm the intended files are actually in the commit.
+- A failed beta build does NOT deploy (marker not written), so beta stays on the last good version —
+  but main is left broken for others until fixed. See [[reference_gururmm]].
diff --git a/.claude/scripts/sync.sh b/.claude/scripts/sync.sh
index 851360c..5e870d5 100755
--- a/.claude/scripts/sync.sh
+++ b/.claude/scripts/sync.sh
@@ -121,6 +121,111 @@ cd "$REPO_ROOT"
 
 echo -e "${GREEN}[OK]${NC} Working directory: $(pwd)"
 
+# --- Concurrency lock --------------------------------------------------------
+# WHY: multiple sync runs on ONE machine must NOT overlap. An interactive /sync
+# or /save can collide with the scheduled-task sync, or two concurrent Claude
+# sessions can each stage + commit + fetch + rebase + push and interleave their
+# git state — corrupting an in-progress rebase, orphaning commits, or pushing a
+# half-built tree. We serialize the whole claudetools critical section (Phase 1a
+# submodule update, staging, commit, fetch, rebase, push — and by extension the
+# vault phase) behind a single per-machine lock.
+#
+# PORTABILITY: `flock` is frequently ABSENT on Git Bash (MSYS2), so we can't
+# depend on it. An atomic `mkdir` is the lowest common denominator — it fails if
+# the directory already exists, atomically, on every platform we run on (Windows
+# Git Bash, macOS, Linux). The lock lives under .git/ (never tracked, so a blind
+# `git add -A` can't stage it) and is scoped to this repo.
+SYNC_LOCK_DIR="$REPO_ROOT/.git/claudetools-sync.lock"
+SYNC_LOCK_WAIT=120   # max seconds to wait for a held lock before skipping the run
+SYNC_LOCK_STALE=600  # seconds after which a held lock is treated as stale (10 min)
+SYNC_LOCK_OWNED=0    # becomes 1 only once THIS run owns the lock (gates release)
+
+# Idempotent release — only removes the lock if THIS process actually owns it
+# (stored PID == $$), so a "skipping this run" exit can never clobber the lock
+# held by the live sync we deferred to. Installed as an EXIT trap because the
+# script runs under `set -e`: the lock must be released on error exits too.
+# (There is no pre-existing EXIT trap in this script, so this adds a fresh one.)
+release_sync_lock() {
+  if [ "$SYNC_LOCK_OWNED" = "1" ] && [ -d "$SYNC_LOCK_DIR" ]; then
+    local owner_pid
+    owner_pid=$(cat "$SYNC_LOCK_DIR/owner.pid" 2>/dev/null || echo "")
+    if [ -z "$owner_pid" ] || [ "$owner_pid" = "$$" ]; then
+      rm -rf "$SYNC_LOCK_DIR" 2>/dev/null || true
+    fi
+    SYNC_LOCK_OWNED=0
+  fi
+}
+trap release_sync_lock EXIT INT TERM
+
+# Portable liveness check. `kill -0 ` works on Git Bash (it maps to the
+# Windows process table), macOS, and Linux; guarded so a bad/empty PID is "dead".
+sync_pid_alive() {
+  local pid="$1"
+  [ -n "$pid" ] || return 1
+  kill -0 "$pid" 2>/dev/null
+}
+
+acquire_sync_lock() {
+  local waited=0 owner_pid owner_ts now mtime lock_age stale_aside
+  while true; do
+    if mkdir "$SYNC_LOCK_DIR" 2>/dev/null; then
+      SYNC_LOCK_OWNED=1
+      printf '%s' "$$" > "$SYNC_LOCK_DIR/owner.pid" 2>/dev/null || true
+      # PID + ISO timestamp inside the lock dir, for diagnostics.
+      {
+        printf 'pid=%s\n' "$$"
+        printf 'iso=%s\n' "$(date -u "+%Y-%m-%dT%H:%M:%SZ")"
+        printf 'machine=%s\n' "$MACHINE"
+      } > "$SYNC_LOCK_DIR/owner" 2>/dev/null || true
+      # Defense-in-depth: confirm we still own the dir we just created. If
+      # owner.pid isn't ours, drop ownership and re-evaluate (never fatal
+      # under set -e — comparison is cheap and the body just loops).
+      if [ "$(cat "$SYNC_LOCK_DIR/owner.pid" 2>/dev/null)" != "$$" ]; then
+        SYNC_LOCK_OWNED=0; continue
+      fi
+      return 0
+    fi
+
+    # mkdir failed -> the lock is held. Decide whether it's stale or live.
+    owner_pid=$(cat "$SYNC_LOCK_DIR/owner.pid" 2>/dev/null || echo "")
+    owner_ts=$(sed -n 's/^iso=//p' "$SYNC_LOCK_DIR/owner" 2>/dev/null | head -1)
+    [ -n "$owner_ts" ] || owner_ts="unknown"
+
+    # Stale if the dir is older than the threshold OR the owner PID is dead.
+    # `stat -c` is GNU/Git-Bash, `stat -f` is BSD/macOS; fall back to 0.
+    now=$(date +%s 2>/dev/null || echo 0)
+    mtime=$(stat -c %Y "$SYNC_LOCK_DIR" 2>/dev/null || stat -f %m "$SYNC_LOCK_DIR" 2>/dev/null || echo 0)
+    lock_age=$(( now - mtime ))
+    if { [ "$mtime" -gt 0 ] && [ "$lock_age" -ge "$SYNC_LOCK_STALE" ]; } \
+      || { [ -n "$owner_pid" ] && ! sync_pid_alive "$owner_pid"; }; then
+      echo -e "${YELLOW}[WARNING]${NC} removing stale sync lock (held by PID ${owner_pid:-?} since ${owner_ts}, age ${lock_age}s)"
+      # Atomically claim the right to clear the stale lock. Only ONE racer can rename
+      # the canonical dir aside (rename source vanishes after the first; the loser's mv
+      # fails and it re-evaluates next pass). The canonical lock name is thereafter only
+      # ever recreated by the atomic mkdir at the top, so a live freshly-acquired lock
+      # can never be rm'd out from under its owner.
+      stale_aside="${SYNC_LOCK_DIR}.stale.$$"
+      if mv "$SYNC_LOCK_DIR" "$stale_aside" 2>/dev/null; then
+        rm -rf "$stale_aside" 2>/dev/null || true
+      fi
+      continue  # retry mkdir immediately
+    fi
+
+    # Live lock. If we've waited the full budget, skip (a duplicate sync is
+    # harmless to drop — the next scheduled/interactive run catches up).
+    if [ "$waited" -ge "$SYNC_LOCK_WAIT" ]; then
+      echo -e "${YELLOW}[WARNING]${NC} another sync is in progress (held by PID ${owner_pid:-?} since ${owner_ts}); skipping this run"
+      exit 75  # EX_TEMPFAIL: deferred (another sync in progress), not a real success
+    fi
+    sleep 2
+    waited=$(( waited + 2 ))
+  done
+}
+
+acquire_sync_lock
+echo -e "${GREEN}[OK]${NC} Acquired sync lock ($SYNC_LOCK_DIR)"
+# --- end concurrency lock ----------------------------------------------------
+
 # Detect Python interpreter — read from identity.json first, fall back to detection
 PYTHON=""
 if [ -f ".claude/identity.json" ] && command -v jq >/dev/null 2>&1; then