diff --git a/.claude/commands/checkpoint.md b/.claude/commands/checkpoint.md index 991845a..45d936e 100644 --- a/.claude/commands/checkpoint.md +++ b/.claude/commands/checkpoint.md @@ -14,11 +14,10 @@ Please create a comprehensive git checkpoint with the following steps: - Run `git diff` to see detailed changes in tracked files - Run `git log -5 --oneline` to understand the commit message style of this repository -3. **Stage everything**: +3. **Decide what will be staged** (do NOT stage yet): - - Add ALL tracked changes (modified and deleted files) - - Add ALL untracked files (new files) - - Use `git add -A` or `git add .` to stage everything + - Identify all tracked changes (modified/deleted) and untracked (new) files via `git status`. + - Staging is done **atomically with the commit, under the repo lock, in step 5** — do not run a separate `git add` here. This prevents a concurrent session in a shared worktree (e.g. ClaudeTools) from having its dirty files swept into this checkpoint. 4. **Draft commit message body via Ollama** (documentation engine): @@ -49,7 +48,17 @@ print(res['message']['content']) - **Body**: Ollama draft (Claude reviews); Claude writes directly if Ollama unavailable - **Footer**: `Co-Authored-By: Claude Sonnet 4.6 ` -5. **Execute the commit**: Create the commit with the properly formatted message following this repository's conventions. +5. 
**Execute the commit (locked)**: Write the final message (summary line + body + footer) to a temp file, then stage + commit **atomically under the repo's commit lock** so concurrent sessions can't interleave or get swept in: + + ```bash + # MSG = path to the composed commit-message file; LOCK = the shared lock wrapper + LOCK="${CLAUDETOOLS_ROOT:-/d/claudetools}/.claude/scripts/sync-lock.sh" + bash "$LOCK" run bash -c 'git add -A && git commit -F "$1"' _ "$MSG" + ``` + - The lock is scoped to the **current repo** (`git rev-parse --show-toplevel`/.git), so this serializes correctly whether the checkpoint is in ClaudeTools (shares the same lock as `/sync` and `/scc`) or in a project repo (its own lock). The wrapper errors out (exit 2) if you're not in a git repo. + - If it **exits 75**, another commit/sync holds the lock — wait briefly and retry, or report "checkpoint deferred". + - This is a **local commit only** (no push), matching checkpoint's purpose. + - `$CLAUDETOOLS_ROOT` should be set per-machine; the `/d/claudetools` fallback is for this box only — on Mac/Linux it resolves from the env var. ## Part 2: Verify Git Checkpoint diff --git a/.claude/commands/scc.md b/.claude/commands/scc.md index 5a9d549..37dcf1b 100644 --- a/.claude/commands/scc.md +++ b/.claude/commands/scc.md @@ -6,24 +6,17 @@ Quick command to save session log, stage everything, and push to Gitea in one sh 1. **Save session log** - Create/update session log for today using the /save skill logic: - Determine correct location based on work context (project-specific or general `session-logs/`) - - Use format `YYYY-MM-DD-session.md` - - If file exists, append with `## Update: HH:MM` header + - **Per-session-unique filename (mandatory)** — concurrent sessions share this worktree, so never use the bare `YYYY-MM-DD-session.md`. Use `YYYY-MM-DD--.md`; collision-guard + same-session-append rules are in `/save` (`save.md`). 
- Include: summary, credentials (unredacted), infrastructure, commands, files changed, pending tasks -2. **Stage all changes** - Run `git add -A` to stage everything including the new session log +2. **Commit + push (locked, rebase-safe)** - Run `bash .claude/scripts/sync.sh`. This is the single serialized git path: it takes the per-machine sync lock (so it can't interleave with another session's sync/commit), reconciles git identity to `identity.json`, stages changes, commits, fetch + rebase, pushes — ClaudeTools then vault. + - **Do NOT** run raw `git add -A` / `git commit` / `git push origin main` here — that bypasses the lock AND the fetch+rebase (the old flow raced and would reject on a stale push). + - If `sync.sh` **exits 75**, another sync is in progress: report "sync deferred — your log is saved locally and will sync on the next run"; do not claim pushed. + - Note: the discrete `scc:`-prefixed message is dropped in favour of one locked git path (commit lands under `sync.sh`'s auto message). If a custom message matters, revisit later (e.g. a `-m` arg on `sync.sh`). -3. **Commit** - Auto-commit with message: - ``` - scc: Session save and push from [hostname] at [timestamp] +3. **Report** - Confirm what was saved, committed, and pushed (or deferred) - Co-Authored-By: Claude Opus 4.6 - ``` - -4. **Push to Gitea** - Run `git push origin main` - -5. **Report** - Confirm what was saved, committed, and pushed - -6. **Reaffirm roles** - After push, briefly restate: +4. 
**Reaffirm roles** - After push, briefly restate: - You are a COORDINATOR, not an executor - Delegate: DB -> Database Agent, code -> Coding Agent, git -> Gitea Agent, tests -> Testing Agent - Do yourself: simple responses, reading 1-2 files, planning, decisions diff --git a/.claude/scripts/sync-lock.sh b/.claude/scripts/sync-lock.sh new file mode 100644 index 0000000..fb7c1d2 --- /dev/null +++ b/.claude/scripts/sync-lock.sh @@ -0,0 +1,185 @@ +#!/bin/bash +# ClaudeTools shared sync-concurrency lock primitive +# ---------------------------------------------------------------------------- +# A per-repo, per-machine critical-section lock shared by every commit path +# (sync.sh, /scc, /checkpoint, ...). Extracted VERBATIM from sync.sh so the +# logic — which already survived two review rounds — is preserved exactly: +# * atomic mkdir lock (flock is frequently absent on Git Bash / MSYS2) +# * stale detection (age threshold OR dead owner PID), with a re-verify guard +# immediately before clearing so a fresh winner is never stolen from +# * rename-aside clear (mv then rm) instead of a bare rm +# * exit 75 (EX_TEMPFAIL) on live-lock contention after the wait budget +# * sleep 1 busy-spin insurance if clearing persistently fails +# * defense-in-depth owner.pid==$$ re-read right after acquisition +# * ownership-checked, idempotent release (owner.pid must be ours or empty) +# +# TWO WAYS TO USE: +# 1. SOURCE it (e.g. from sync.sh). Sourcing defines vars + functions ONLY — +# no trap is installed and the lock is NOT acquired. The caller sets +# SYNC_LOCK_DIR (optional — a default is derived from the current git repo +# if unset), installs its own `trap release_sync_lock EXIT INT TERM`, and +# calls `acquire_sync_lock` where it wants the critical section to begin. +# 2. EXECUTE it as a wrapper: bash sync-lock.sh run <cmd> [args...] 
+# Resolves the lock dir from the current git repo, installs the trap, +# acquires the lock, runs <cmd>, then releases via the EXIT trap and exits +# with <cmd>'s status. Contention propagates as exit 75. +# +# Lock-dir basename is fixed at `claudetools-sync.lock` so EVERY tool locking +# the same repo root contends on the SAME directory. +# ---------------------------------------------------------------------------- + +# Colours — define only if the caller hasn't already (sync.sh defines these +# before sourcing; standalone execution needs them too). +: "${RED:=\033[0;31m}" +: "${GREEN:=\033[0;32m}" +: "${YELLOW:=\033[1;33m}" +: "${CYAN:=\033[0;36m}" +: "${NC:=\033[0m}" + +# Machine label used in lock diagnostics. sync.sh sets MACHINE before sourcing; +# guard it so standalone wrapper use (under set -u) never trips on an unset var. +: "${MACHINE:=$(hostname 2>/dev/null || echo unknown)}" + +# --- Concurrency lock -------------------------------------------------------- +# WHY: multiple sync/commit runs on ONE machine must NOT overlap. An interactive +# /sync, /scc, or /checkpoint can collide with the scheduled-task sync, or two +# concurrent Claude sessions can each stage + commit + fetch + rebase + push and +# interleave their git state — corrupting an in-progress rebase, orphaning +# commits, or pushing a half-built tree. We serialize the whole critical section +# behind a single per-machine lock. +# +# PORTABILITY: `flock` is frequently ABSENT on Git Bash (MSYS2), so we can't +# depend on it. An atomic `mkdir` is the lowest common denominator — it fails if +# the directory already exists, atomically, on every platform we run on (Windows +# Git Bash, macOS, Linux). The lock lives under .git/ (never tracked, so a blind +# `git add -A` can't stage it) and is scoped to this repo. +# +# Lock dir: default to the current repo's .git/claudetools-sync.lock IF the +# caller hasn't already set SYNC_LOCK_DIR (sync.sh sets it explicitly). 
+: "${SYNC_LOCK_DIR:=$(git rev-parse --show-toplevel 2>/dev/null)/.git/claudetools-sync.lock}" +SYNC_LOCK_WAIT="${SYNC_LOCK_WAIT:-120}" # max seconds to wait for a held lock before skipping the run +SYNC_LOCK_STALE="${SYNC_LOCK_STALE:-600}" # seconds after which a held lock is treated as stale (10 min) +SYNC_LOCK_OWNED=0 # becomes 1 only once THIS run owns the lock (gates release) + +# Idempotent release — only removes the lock if THIS process actually owns it +# (stored PID == $$), so a "skipping this run" exit can never clobber the lock +# held by the live sync we deferred to. Installed as an EXIT trap by the caller +# because callers run under `set -e`: the lock must be released on error exits too. +release_sync_lock() { + if [ "$SYNC_LOCK_OWNED" = "1" ] && [ -d "$SYNC_LOCK_DIR" ]; then + local owner_pid + owner_pid=$(cat "$SYNC_LOCK_DIR/owner.pid" 2>/dev/null || echo "") + if [ -z "$owner_pid" ] || [ "$owner_pid" = "$$" ]; then + rm -rf "$SYNC_LOCK_DIR" 2>/dev/null || true + fi + SYNC_LOCK_OWNED=0 + fi +} + +# Portable liveness check. `kill -0 ` works on Git Bash (it maps to the +# Windows process table), macOS, and Linux; guarded so a bad/empty PID is "dead". +sync_pid_alive() { + local pid="$1" + [ -n "$pid" ] || return 1 + kill -0 "$pid" 2>/dev/null +} + +acquire_sync_lock() { + local waited=0 owner_pid owner_ts now mtime lock_age stale_aside re_pid re_now re_mtime re_age + while true; do + if mkdir "$SYNC_LOCK_DIR" 2>/dev/null; then + SYNC_LOCK_OWNED=1 + printf '%s' "$$" > "$SYNC_LOCK_DIR/owner.pid" 2>/dev/null || true + # PID + ISO timestamp inside the lock dir, for diagnostics. + { + printf 'pid=%s\n' "$$" + printf 'iso=%s\n' "$(date -u "+%Y-%m-%dT%H:%M:%SZ")" + printf 'machine=%s\n' "$MACHINE" + } > "$SYNC_LOCK_DIR/owner" 2>/dev/null || true + # Defense-in-depth: confirm we still own the dir we just created. If + # owner.pid isn't ours, drop ownership and re-evaluate (never fatal + # under set -e — comparison is cheap and the body just loops). 
+ if [ "$(cat "$SYNC_LOCK_DIR/owner.pid" 2>/dev/null)" != "$$" ]; then + SYNC_LOCK_OWNED=0; continue + fi + return 0 + fi + + # mkdir failed -> the lock is held. Decide whether it's stale or live. + owner_pid=$(cat "$SYNC_LOCK_DIR/owner.pid" 2>/dev/null || echo "") + owner_ts=$(sed -n 's/^iso=//p' "$SYNC_LOCK_DIR/owner" 2>/dev/null | head -1) + [ -n "$owner_ts" ] || owner_ts="unknown" + + # Stale if the dir is older than the threshold OR the owner PID is dead. + # `stat -c` is GNU/Git-Bash, `stat -f` is BSD/macOS; fall back to 0. + now=$(date +%s 2>/dev/null || echo 0) + mtime=$(stat -c %Y "$SYNC_LOCK_DIR" 2>/dev/null || stat -f %m "$SYNC_LOCK_DIR" 2>/dev/null || echo 0) + lock_age=$(( now - mtime )) + if { [ "$mtime" -gt 0 ] && [ "$lock_age" -ge "$SYNC_LOCK_STALE" ]; } \ + || { [ -n "$owner_pid" ] && ! sync_pid_alive "$owner_pid"; }; then + # Re-verify staleness IMMEDIATELY before clearing. Between the check + # above and here, another racer may have already cleared the stale + # lock and acquired a fresh, LIVE one. Re-read owner.pid + mtime NOW; + # only rename-aside if it is STILL stale this instant. A freshly + # acquired winner has a live PID and fresh mtime, so the loser falls + # through to the live-lock wait path instead of stealing the lock. + re_pid=$(cat "$SYNC_LOCK_DIR/owner.pid" 2>/dev/null || echo "") + re_now=$(date +%s 2>/dev/null || echo 0) + re_mtime=$(stat -c %Y "$SYNC_LOCK_DIR" 2>/dev/null || stat -f %m "$SYNC_LOCK_DIR" 2>/dev/null || echo 0) + re_age=$(( re_now - re_mtime )) + if { [ "$re_mtime" -gt 0 ] && [ "$re_age" -ge "$SYNC_LOCK_STALE" ]; } \ + || { [ -n "$re_pid" ] && ! 
sync_pid_alive "$re_pid"; }; then + echo -e "${YELLOW}[WARNING]${NC} removing stale sync lock (held by PID ${re_pid:-?} since ${owner_ts}, age ${re_age}s)" + stale_aside="${SYNC_LOCK_DIR}.stale.$$" + if mv "$SYNC_LOCK_DIR" "$stale_aside" 2>/dev/null; then + rm -rf "$stale_aside" 2>/dev/null || true + fi + fi + sleep 1 # insurance: never tight-spin if clearing persistently fails + continue + fi + + # Live lock. If we've waited the full budget, skip (a duplicate sync is + # harmless to drop — the next scheduled/interactive run catches up). + if [ "$waited" -ge "$SYNC_LOCK_WAIT" ]; then + echo -e "${YELLOW}[WARNING]${NC} another sync is in progress (held by PID ${owner_pid:-?} since ${owner_ts}); skipping this run" + exit 75 # EX_TEMPFAIL: deferred (another sync in progress), not a real success + fi + sleep 2 + waited=$(( waited + 2 )) + done +} +# --- end concurrency lock ---------------------------------------------------- + +# --- Wrapper mode (direct execution only) ------------------------------------ +# Sourcing stops here: the block below runs ONLY when this file is executed +# directly, never when sourced. So sourcing has zero side effects beyond the +# var + function definitions above (no trap, no acquire). +if [ "${BASH_SOURCE[0]}" = "$0" ]; then + # NOT set -e: a non-zero status from the wrapped command must be reported as + # this script's own exit code, not swallowed by an errexit abort. + set -uo pipefail + + if [ "${1:-}" != "run" ] || [ -z "${2:-}" ]; then + echo "usage: $(basename "$0") run <cmd> [args...]" >&2 + echo " Acquires the per-repo sync lock, runs <cmd>, releases, exits with its status." >&2 + exit 2 + fi + shift # drop the 'run' subcommand; "$@" is now the command + args + + # Resolve the lock dir from the CURRENT repo. Must be inside a git repo. 
+ _repo_root=$(git rev-parse --show-toplevel 2>/dev/null || true) + if [ -z "$_repo_root" ]; then + echo -e "${RED}[ERROR]${NC} sync-lock.sh: not inside a git repository (cannot resolve lock dir)" >&2 + exit 2 + fi + SYNC_LOCK_DIR="$_repo_root/.git/claudetools-sync.lock" + + trap release_sync_lock EXIT INT TERM + acquire_sync_lock # exits 75 on contention (propagates to our caller) + + "$@" + _status=$? + # Release happens via the EXIT trap; mirror the wrapped command's status. + exit $_status +fi diff --git a/.claude/scripts/sync.sh b/.claude/scripts/sync.sh index 904db66..f6f7043 100755 --- a/.claude/scripts/sync.sh +++ b/.claude/scripts/sync.sh @@ -130,107 +130,18 @@ echo -e "${GREEN}[OK]${NC} Working directory: $(pwd)" # submodule update, staging, commit, fetch, rebase, push — and by extension the # vault phase) behind a single per-machine lock. # -# PORTABILITY: `flock` is frequently ABSENT on Git Bash (MSYS2), so we can't -# depend on it. An atomic `mkdir` is the lowest common denominator — it fails if -# the directory already exists, atomically, on every platform we run on (Windows -# Git Bash, macOS, Linux). The lock lives under .git/ (never tracked, so a blind -# `git add -A` can't stage it) and is scoped to this repo. +# The lock primitive (mkdir-atomic lock, stale detection, ownership-checked +# release, exit-75-on-contention) lives in the SHAREABLE library sync-lock.sh so +# other commit paths (/scc, /checkpoint) can contend on the SAME lock dir. We +# set SYNC_LOCK_DIR explicitly, source the library (which defines the vars + +# functions but installs NO trap and acquires NOTHING on source), then install +# our own EXIT trap and acquire — exactly as before. We are already cd'd into +# REPO_ROOT, and the path is absolute, so the source resolves from any CWD. 
SYNC_LOCK_DIR="$REPO_ROOT/.git/claudetools-sync.lock" -SYNC_LOCK_WAIT=120 # max seconds to wait for a held lock before skipping the run -SYNC_LOCK_STALE=600 # seconds after which a held lock is treated as stale (10 min) -SYNC_LOCK_OWNED=0 # becomes 1 only once THIS run owns the lock (gates release) +# shellcheck source=./sync-lock.sh +source "$REPO_ROOT/.claude/scripts/sync-lock.sh" -# Idempotent release — only removes the lock if THIS process actually owns it -# (stored PID == $$), so a "skipping this run" exit can never clobber the lock -# held by the live sync we deferred to. Installed as an EXIT trap because the -# script runs under `set -e`: the lock must be released on error exits too. -# (There is no pre-existing EXIT trap in this script, so this adds a fresh one.) -release_sync_lock() { - if [ "$SYNC_LOCK_OWNED" = "1" ] && [ -d "$SYNC_LOCK_DIR" ]; then - local owner_pid - owner_pid=$(cat "$SYNC_LOCK_DIR/owner.pid" 2>/dev/null || echo "") - if [ -z "$owner_pid" ] || [ "$owner_pid" = "$$" ]; then - rm -rf "$SYNC_LOCK_DIR" 2>/dev/null || true - fi - SYNC_LOCK_OWNED=0 - fi -} trap release_sync_lock EXIT INT TERM - -# Portable liveness check. `kill -0 ` works on Git Bash (it maps to the -# Windows process table), macOS, and Linux; guarded so a bad/empty PID is "dead". -sync_pid_alive() { - local pid="$1" - [ -n "$pid" ] || return 1 - kill -0 "$pid" 2>/dev/null -} - -acquire_sync_lock() { - local waited=0 owner_pid owner_ts now mtime lock_age stale_aside re_pid re_now re_mtime re_age - while true; do - if mkdir "$SYNC_LOCK_DIR" 2>/dev/null; then - SYNC_LOCK_OWNED=1 - printf '%s' "$$" > "$SYNC_LOCK_DIR/owner.pid" 2>/dev/null || true - # PID + ISO timestamp inside the lock dir, for diagnostics. - { - printf 'pid=%s\n' "$$" - printf 'iso=%s\n' "$(date -u "+%Y-%m-%dT%H:%M:%SZ")" - printf 'machine=%s\n' "$MACHINE" - } > "$SYNC_LOCK_DIR/owner" 2>/dev/null || true - # Defense-in-depth: confirm we still own the dir we just created. 
If - # owner.pid isn't ours, drop ownership and re-evaluate (never fatal - # under set -e — comparison is cheap and the body just loops). - if [ "$(cat "$SYNC_LOCK_DIR/owner.pid" 2>/dev/null)" != "$$" ]; then - SYNC_LOCK_OWNED=0; continue - fi - return 0 - fi - - # mkdir failed -> the lock is held. Decide whether it's stale or live. - owner_pid=$(cat "$SYNC_LOCK_DIR/owner.pid" 2>/dev/null || echo "") - owner_ts=$(sed -n 's/^iso=//p' "$SYNC_LOCK_DIR/owner" 2>/dev/null | head -1) - [ -n "$owner_ts" ] || owner_ts="unknown" - - # Stale if the dir is older than the threshold OR the owner PID is dead. - # `stat -c` is GNU/Git-Bash, `stat -f` is BSD/macOS; fall back to 0. - now=$(date +%s 2>/dev/null || echo 0) - mtime=$(stat -c %Y "$SYNC_LOCK_DIR" 2>/dev/null || stat -f %m "$SYNC_LOCK_DIR" 2>/dev/null || echo 0) - lock_age=$(( now - mtime )) - if { [ "$mtime" -gt 0 ] && [ "$lock_age" -ge "$SYNC_LOCK_STALE" ]; } \ - || { [ -n "$owner_pid" ] && ! sync_pid_alive "$owner_pid"; }; then - # Re-verify staleness IMMEDIATELY before clearing. Between the check - # above and here, another racer may have already cleared the stale - # lock and acquired a fresh, LIVE one. Re-read owner.pid + mtime NOW; - # only rename-aside if it is STILL stale this instant. A freshly - # acquired winner has a live PID and fresh mtime, so the loser falls - # through to the live-lock wait path instead of stealing the lock. - re_pid=$(cat "$SYNC_LOCK_DIR/owner.pid" 2>/dev/null || echo "") - re_now=$(date +%s 2>/dev/null || echo 0) - re_mtime=$(stat -c %Y "$SYNC_LOCK_DIR" 2>/dev/null || stat -f %m "$SYNC_LOCK_DIR" 2>/dev/null || echo 0) - re_age=$(( re_now - re_mtime )) - if { [ "$re_mtime" -gt 0 ] && [ "$re_age" -ge "$SYNC_LOCK_STALE" ]; } \ - || { [ -n "$re_pid" ] && ! 
sync_pid_alive "$re_pid"; }; then - echo -e "${YELLOW}[WARNING]${NC} removing stale sync lock (held by PID ${re_pid:-?} since ${owner_ts}, age ${re_age}s)" - stale_aside="${SYNC_LOCK_DIR}.stale.$$" - if mv "$SYNC_LOCK_DIR" "$stale_aside" 2>/dev/null; then - rm -rf "$stale_aside" 2>/dev/null || true - fi - fi - sleep 1 # insurance: never tight-spin if clearing persistently fails - continue - fi - - # Live lock. If we've waited the full budget, skip (a duplicate sync is - # harmless to drop — the next scheduled/interactive run catches up). - if [ "$waited" -ge "$SYNC_LOCK_WAIT" ]; then - echo -e "${YELLOW}[WARNING]${NC} another sync is in progress (held by PID ${owner_pid:-?} since ${owner_ts}); skipping this run" - exit 75 # EX_TEMPFAIL: deferred (another sync in progress), not a real success - fi - sleep 2 - waited=$(( waited + 2 )) - done -} - acquire_sync_lock echo -e "${GREEN}[OK]${NC} Acquired sync lock ($SYNC_LOCK_DIR)" # --- end concurrency lock ----------------------------------------------------
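
Review note: the checkpoint.md and scc.md changes above tell the caller to "wait briefly and retry, or report deferred" when the lock path exits 75, but leave the retry itself to the caller. One way that could look is sketched below. `run_with_retry`, its attempt count, and its delay are hypothetical and not part of this change; the sketch assumes only that the wrapped command exits 75 (EX_TEMPFAIL) while the lock is held.

```bash
#!/bin/bash
# Hypothetical helper, not part of this diff: retry a locked command while it
# reports contention (exit 75), and pass every other status straight through.
run_with_retry() {
    local attempts="$1" delay="$2" status i
    shift 2
    for (( i = 1; i <= attempts; i++ )); do
        status=0
        "$@" || status=$?          # capture the status without tripping set -e
        # Anything other than 75 is final: success or a real failure.
        [ "$status" -eq 75 ] || return "$status"
        # Lock still held: pause before the next attempt (not after the last).
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
    done
    echo "checkpoint deferred: lock still held after $attempts attempts" >&2
    return 75
}
```

A checkpoint could call it as `run_with_retry 3 5 bash "$LOCK" run bash -c 'git add -A && git commit -F "$1"' _ "$MSG"`. Since `acquire_sync_lock` already waits up to `SYNC_LOCK_WAIT` seconds (120 by default) before exiting 75, each extra attempt can add that much wall time, so a small attempt count is probably enough.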