refactor(sync): share the sync lock with /scc and /checkpoint
Extract the per-machine concurrency lock from sync.sh into a sourceable
lib (.claude/scripts/sync-lock.sh) plus a `run <cmd>` wrapper that locks
the current repo (same lock-dir basename, so it mutually excludes with
sync.sh in the ClaudeTools repo and self-scopes in any project repo).
sync.sh now sources it (behavior identical — verified by review). /scc
routes its commit+push through the locked, rebase-safe sync.sh (and drops
the bare YYYY-MM-DD-session.md filename for the per-session-unique one).
/checkpoint now stages+commits atomically under the repo lock so a
concurrent session in a shared worktree can't be swept in. Closes the
remaining commit paths that bypassed the lock shipped in 6b0ce9a.
@@ -14,11 +14,10 @@ Please create a comprehensive git checkpoint with the following steps:
    - Run `git diff` to see detailed changes in tracked files
    - Run `git log -5 --oneline` to understand the commit message style of this repository

-3. **Stage everything**:
+3. **Decide what will be staged** (do NOT stage yet):

-   - Add ALL tracked changes (modified and deleted files)
-   - Add ALL untracked files (new files)
-   - Use `git add -A` or `git add .` to stage everything
+   - Identify all tracked changes (modified/deleted) and untracked (new) files via `git status`.
+   - Staging is done **atomically with the commit, under the repo lock, in step 5** — do not run a separate `git add` here. This prevents a concurrent session in a shared worktree (e.g. ClaudeTools) from having its dirty files swept into this checkpoint.

 4. **Draft commit message body via Ollama** (documentation engine):

@@ -49,7 +48,17 @@ print(res['message']['content'])
    - **Body**: Ollama draft (Claude reviews); Claude writes directly if Ollama unavailable
    - **Footer**: `Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>`

-5. **Execute the commit**: Create the commit with the properly formatted message following this repository's conventions.
+5. **Execute the commit (locked)**: Write the final message (summary line + body + footer) to a temp file, then stage + commit **atomically under the repo's commit lock** so concurrent sessions can't interleave or get swept in:
+
+   ```bash
+   # MSG = path to the composed commit-message file; LOCK = the shared lock wrapper
+   LOCK="${CLAUDETOOLS_ROOT:-/d/claudetools}/.claude/scripts/sync-lock.sh"
+   bash "$LOCK" run bash -c 'git add -A && git commit -F "$1"' _ "$MSG"
+   ```
+   - The lock is scoped to the **current repo** (`git rev-parse --show-toplevel`/.git), so this serializes correctly whether the checkpoint is in ClaudeTools (shares the same lock as `/sync` and `/scc`) or in a project repo (its own lock). The wrapper errors out (exit 2) if you're not in a git repo.
+   - If it **exits 75**, another commit/sync holds the lock — wait briefly and retry, or report "checkpoint deferred".
+   - This is a **local commit only** (no push), matching checkpoint's purpose.
+   - `$CLAUDETOOLS_ROOT` should be set per-machine; the `/d/claudetools` fallback is for this box only — on Mac/Linux it resolves from the env var.

 ## Part 2: Verify Git Checkpoint

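Review note: the "wait briefly and retry, or report checkpoint deferred" guidance for exit 75 could be sketched roughly as follows. This is an illustration only and not part of the commit; `run_locked_with_retry`, `RETRIES`, and `DELAY` are hypothetical names, and the wrapped command stands in for the `bash "$LOCK" run ...` invocation from step 5.

```shell
#!/bin/sh
# Hypothetical helper (not in the commit): retry a command while it reports
# exit 75 (EX_TEMPFAIL), the status sync-lock.sh uses for "lock is held".
run_locked_with_retry() {
    retries="${RETRIES:-3}"   # assumed default number of attempts
    delay="${DELAY:-5}"       # assumed pause between attempts, in seconds
    attempt=1
    while :; do
        status=0
        "$@" || status=$?
        # Anything other than 75 is final: success or a real failure.
        [ "$status" -ne 75 ] && return "$status"
        if [ "$attempt" -ge "$retries" ]; then
            echo "checkpoint deferred" >&2
            return 75
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}
```

A call might look like `run_locked_with_retry bash "$LOCK" run bash -c 'git add -A && git commit -F "$1"' _ "$MSG"`. Because the wrapper already waits up to `SYNC_LOCK_WAIT` seconds per attempt, a small retry count is probably enough.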
@@ -6,24 +6,17 @@ Quick command to save session log, stage everything, and push to Gitea in one sh

 1. **Save session log** - Create/update session log for today using the /save skill logic:
    - Determine correct location based on work context (project-specific or general `session-logs/`)
-   - Use format `YYYY-MM-DD-session.md`
-   - If file exists, append with `## Update: HH:MM` header
+   - **Per-session-unique filename (mandatory)** — concurrent sessions share this worktree, so never use the bare `YYYY-MM-DD-session.md`. Use `YYYY-MM-DD-<user>-<topic>.md`; collision-guard + same-session-append rules are in `/save` (`save.md`).
    - Include: summary, credentials (unredacted), infrastructure, commands, files changed, pending tasks

-2. **Stage all changes** - Run `git add -A` to stage everything including the new session log
+2. **Commit + push (locked, rebase-safe)** - Run `bash .claude/scripts/sync.sh`. This is the single serialized git path: it takes the per-machine sync lock (so it can't interleave with another session's sync/commit), reconciles git identity to `identity.json`, stages changes, commits, fetch + rebase, pushes — ClaudeTools then vault.
+   - **Do NOT** run raw `git add -A` / `git commit` / `git push origin main` here — that bypasses the lock AND the fetch+rebase (the old flow raced and would reject on a stale push).
+   - If `sync.sh` **exits 75**, another sync is in progress: report "sync deferred — your log is saved locally and will sync on the next run"; do not claim pushed.
+   - Note: the discrete `scc:`-prefixed message is dropped in favour of one locked git path (commit lands under `sync.sh`'s auto message). If a custom message matters, revisit later (e.g. a `-m` arg on `sync.sh`).

-3. **Commit** - Auto-commit with message:
-   ```
-   scc: Session save and push from [hostname] at [timestamp]
+3. **Report** - Confirm what was saved, committed, and pushed (or deferred)

-   Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
-   ```
-
-4. **Push to Gitea** - Run `git push origin main`
-
-5. **Report** - Confirm what was saved, committed, and pushed
-
-6. **Reaffirm roles** - After push, briefly restate:
+4. **Reaffirm roles** - After push, briefly restate:
    - You are a COORDINATOR, not an executor
    - Delegate: DB -> Database Agent, code -> Coding Agent, git -> Gitea Agent, tests -> Testing Agent
    - Do yourself: simple responses, reading 1-2 files, planning, decisions
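Review note: the new Report step depends on telling "pushed" apart from "deferred". A minimal sketch of that mapping follows, assuming a hypothetical `report_sync_result` helper. Only statuses 0 and 75 are described in this diff; treating every other status as a failure is an assumption of the sketch.

```shell
#!/bin/sh
# Hypothetical reporter (not in the commit): turn sync.sh's exit status into
# the message the /scc command should give the user.
report_sync_result() {
    case "$1" in
        0)  echo "saved, committed, and pushed" ;;
        75) echo "sync deferred: your log is saved locally and will sync on the next run" ;;
        *)  echo "sync failed (exit $1): not pushed" ;;   # assumed catch-all
    esac
}
```

Typical use would be `status=0; bash .claude/scripts/sync.sh || status=$?; report_sync_result "$status"`, so that a deferred run is never reported as pushed.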
185
.claude/scripts/sync-lock.sh
Normal file
@@ -0,0 +1,185 @@
+#!/bin/bash
+# ClaudeTools shared sync-concurrency lock primitive
+# ----------------------------------------------------------------------------
+# A per-repo, per-machine critical-section lock shared by every commit path
+# (sync.sh, /scc, /checkpoint, ...). Extracted VERBATIM from sync.sh so the
+# logic — which already survived two review rounds — is preserved exactly:
+#   * atomic mkdir lock (flock is frequently absent on Git Bash / MSYS2)
+#   * stale detection (age threshold OR dead owner PID), with a re-verify guard
+#     immediately before clearing so a fresh winner is never stolen from
+#   * rename-aside clear (mv then rm) instead of a bare rm
+#   * exit 75 (EX_TEMPFAIL) on live-lock contention after the wait budget
+#   * sleep 1 busy-spin insurance if clearing persistently fails
+#   * defense-in-depth owner.pid==$$ re-read right after acquisition
+#   * ownership-checked, idempotent release (owner.pid must be ours or empty)
+#
+# TWO WAYS TO USE:
+#   1. SOURCE it (e.g. from sync.sh). Sourcing defines vars + functions ONLY —
+#      no trap is installed and the lock is NOT acquired. The caller sets
+#      SYNC_LOCK_DIR (optional — a default is derived from the current git repo
+#      if unset), installs its own `trap release_sync_lock EXIT INT TERM`, and
+#      calls `acquire_sync_lock` where it wants the critical section to begin.
+#   2. EXECUTE it as a wrapper: bash sync-lock.sh run <cmd> [args...]
+#      Resolves the lock dir from the current git repo, installs the trap,
+#      acquires the lock, runs <cmd>, then releases via the EXIT trap and exits
+#      with <cmd>'s status. Contention propagates as exit 75.
+#
+# Lock-dir basename is fixed at `claudetools-sync.lock` so EVERY tool locking
+# the same repo root contends on the SAME directory.
+# ----------------------------------------------------------------------------
+
+# Colours — define only if the caller hasn't already (sync.sh defines these
+# before sourcing; standalone execution needs them too).
+: "${RED:=\033[0;31m}"
+: "${GREEN:=\033[0;32m}"
+: "${YELLOW:=\033[1;33m}"
+: "${CYAN:=\033[0;36m}"
+: "${NC:=\033[0m}"
+
+# Machine label used in lock diagnostics. sync.sh sets MACHINE before sourcing;
+# guard it so standalone wrapper use (under set -u) never trips on an unset var.
+: "${MACHINE:=$(hostname 2>/dev/null || echo unknown)}"
+
+# --- Concurrency lock --------------------------------------------------------
+# WHY: multiple sync/commit runs on ONE machine must NOT overlap. An interactive
+# /sync, /scc, or /checkpoint can collide with the scheduled-task sync, or two
+# concurrent Claude sessions can each stage + commit + fetch + rebase + push and
+# interleave their git state — corrupting an in-progress rebase, orphaning
+# commits, or pushing a half-built tree. We serialize the whole critical section
+# behind a single per-machine lock.
+#
+# PORTABILITY: `flock` is frequently ABSENT on Git Bash (MSYS2), so we can't
+# depend on it. An atomic `mkdir` is the lowest common denominator — it fails if
+# the directory already exists, atomically, on every platform we run on (Windows
+# Git Bash, macOS, Linux). The lock lives under .git/ (never tracked, so a blind
+# `git add -A` can't stage it) and is scoped to this repo.
+#
+# Lock dir: default to the current repo's .git/claudetools-sync.lock IF the
+# caller hasn't already set SYNC_LOCK_DIR (sync.sh sets it explicitly).
+: "${SYNC_LOCK_DIR:=$(git rev-parse --show-toplevel 2>/dev/null)/.git/claudetools-sync.lock}"
+SYNC_LOCK_WAIT="${SYNC_LOCK_WAIT:-120}" # max seconds to wait for a held lock before skipping the run
+SYNC_LOCK_STALE="${SYNC_LOCK_STALE:-600}" # seconds after which a held lock is treated as stale (10 min)
+SYNC_LOCK_OWNED=0 # becomes 1 only once THIS run owns the lock (gates release)
+
+# Idempotent release — only removes the lock if THIS process actually owns it
+# (stored PID == $$), so a "skipping this run" exit can never clobber the lock
+# held by the live sync we deferred to. Installed as an EXIT trap by the caller
+# because callers run under `set -e`: the lock must be released on error exits too.
+release_sync_lock() {
+    if [ "$SYNC_LOCK_OWNED" = "1" ] && [ -d "$SYNC_LOCK_DIR" ]; then
+        local owner_pid
+        owner_pid=$(cat "$SYNC_LOCK_DIR/owner.pid" 2>/dev/null || echo "")
+        if [ -z "$owner_pid" ] || [ "$owner_pid" = "$$" ]; then
+            rm -rf "$SYNC_LOCK_DIR" 2>/dev/null || true
+        fi
+        SYNC_LOCK_OWNED=0
+    fi
+}
+
+# Portable liveness check. `kill -0 <pid>` works on Git Bash (it maps to the
+# Windows process table), macOS, and Linux; guarded so a bad/empty PID is "dead".
+sync_pid_alive() {
+    local pid="$1"
+    [ -n "$pid" ] || return 1
+    kill -0 "$pid" 2>/dev/null
+}
+
+acquire_sync_lock() {
+    local waited=0 owner_pid owner_ts now mtime lock_age stale_aside re_pid re_now re_mtime re_age
+    while true; do
+        if mkdir "$SYNC_LOCK_DIR" 2>/dev/null; then
+            SYNC_LOCK_OWNED=1
+            printf '%s' "$$" > "$SYNC_LOCK_DIR/owner.pid" 2>/dev/null || true
+            # PID + ISO timestamp inside the lock dir, for diagnostics.
+            {
+                printf 'pid=%s\n' "$$"
+                printf 'iso=%s\n' "$(date -u "+%Y-%m-%dT%H:%M:%SZ")"
+                printf 'machine=%s\n' "$MACHINE"
+            } > "$SYNC_LOCK_DIR/owner" 2>/dev/null || true
+            # Defense-in-depth: confirm we still own the dir we just created. If
+            # owner.pid isn't ours, drop ownership and re-evaluate (never fatal
+            # under set -e — comparison is cheap and the body just loops).
+            if [ "$(cat "$SYNC_LOCK_DIR/owner.pid" 2>/dev/null)" != "$$" ]; then
+                SYNC_LOCK_OWNED=0; continue
+            fi
+            return 0
+        fi
+
+        # mkdir failed -> the lock is held. Decide whether it's stale or live.
+        owner_pid=$(cat "$SYNC_LOCK_DIR/owner.pid" 2>/dev/null || echo "")
+        owner_ts=$(sed -n 's/^iso=//p' "$SYNC_LOCK_DIR/owner" 2>/dev/null | head -1)
+        [ -n "$owner_ts" ] || owner_ts="unknown"
+
+        # Stale if the dir is older than the threshold OR the owner PID is dead.
+        # `stat -c` is GNU/Git-Bash, `stat -f` is BSD/macOS; fall back to 0.
+        now=$(date +%s 2>/dev/null || echo 0)
+        mtime=$(stat -c %Y "$SYNC_LOCK_DIR" 2>/dev/null || stat -f %m "$SYNC_LOCK_DIR" 2>/dev/null || echo 0)
+        lock_age=$(( now - mtime ))
+        if { [ "$mtime" -gt 0 ] && [ "$lock_age" -ge "$SYNC_LOCK_STALE" ]; } \
+           || { [ -n "$owner_pid" ] && ! sync_pid_alive "$owner_pid"; }; then
+            # Re-verify staleness IMMEDIATELY before clearing. Between the check
+            # above and here, another racer may have already cleared the stale
+            # lock and acquired a fresh, LIVE one. Re-read owner.pid + mtime NOW;
+            # only rename-aside if it is STILL stale this instant. A freshly
+            # acquired winner has a live PID and fresh mtime, so the loser falls
+            # through to the live-lock wait path instead of stealing the lock.
+            re_pid=$(cat "$SYNC_LOCK_DIR/owner.pid" 2>/dev/null || echo "")
+            re_now=$(date +%s 2>/dev/null || echo 0)
+            re_mtime=$(stat -c %Y "$SYNC_LOCK_DIR" 2>/dev/null || stat -f %m "$SYNC_LOCK_DIR" 2>/dev/null || echo 0)
+            re_age=$(( re_now - re_mtime ))
+            if { [ "$re_mtime" -gt 0 ] && [ "$re_age" -ge "$SYNC_LOCK_STALE" ]; } \
+               || { [ -n "$re_pid" ] && ! sync_pid_alive "$re_pid"; }; then
+                echo -e "${YELLOW}[WARNING]${NC} removing stale sync lock (held by PID ${re_pid:-?} since ${owner_ts}, age ${re_age}s)"
+                stale_aside="${SYNC_LOCK_DIR}.stale.$$"
+                if mv "$SYNC_LOCK_DIR" "$stale_aside" 2>/dev/null; then
+                    rm -rf "$stale_aside" 2>/dev/null || true
+                fi
+            fi
+            sleep 1 # insurance: never tight-spin if clearing persistently fails
+            continue
+        fi
+
+        # Live lock. If we've waited the full budget, skip (a duplicate sync is
+        # harmless to drop — the next scheduled/interactive run catches up).
+        if [ "$waited" -ge "$SYNC_LOCK_WAIT" ]; then
+            echo -e "${YELLOW}[WARNING]${NC} another sync is in progress (held by PID ${owner_pid:-?} since ${owner_ts}); skipping this run"
+            exit 75 # EX_TEMPFAIL: deferred (another sync in progress), not a real success
+        fi
+        sleep 2
+        waited=$(( waited + 2 ))
+    done
+}
+# --- end concurrency lock ----------------------------------------------------
+
+# --- Wrapper mode (direct execution only) ------------------------------------
+# Sourcing stops here: the block below runs ONLY when this file is executed
+# directly, never when sourced. So sourcing has zero side effects beyond the
+# var + function definitions above (no trap, no acquire).
+if [ "${BASH_SOURCE[0]}" = "$0" ]; then
+    # NOT set -e: a non-zero status from the wrapped command must be reported as
+    # this script's own exit code, not swallowed by an errexit abort.
+    set -uo pipefail
+
+    if [ "${1:-}" != "run" ] || [ -z "${2:-}" ]; then
+        echo "usage: $(basename "$0") run <command> [args...]" >&2
+        echo " Acquires the per-repo sync lock, runs <command>, releases, exits with its status." >&2
+        exit 2
+    fi
+    shift # drop the 'run' subcommand; "$@" is now the command + args
+
+    # Resolve the lock dir from the CURRENT repo. Must be inside a git repo.
+    _repo_root=$(git rev-parse --show-toplevel 2>/dev/null || true)
+    if [ -z "$_repo_root" ]; then
+        echo -e "${RED}[ERROR]${NC} sync-lock.sh: not inside a git repository (cannot resolve lock dir)" >&2
+        exit 2
+    fi
+    SYNC_LOCK_DIR="$_repo_root/.git/claudetools-sync.lock"
+
+    trap release_sync_lock EXIT INT TERM
+    acquire_sync_lock # exits 75 on contention (propagates to our caller)
+
+    "$@"
+    _status=$?
+    # Release happens via the EXIT trap; mirror the wrapped command's status.
+    exit $_status
+fi
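Review note: the core of the primitive above (atomic `mkdir`, an owner PID file, a `kill -0` liveness probe, and an ownership-checked release) can be tried in isolation with a much smaller sketch. This is not the committed code: `DEMO_LOCK`, `try_lock`, `owner_alive`, and `unlock` are hypothetical names, the lock lives in a throwaway temp path rather than under `.git/`, and stale-lock clearing and the wait budget are left out.

```shell
#!/bin/sh
# Minimal mkdir-based lock sketch (hypothetical names, temp path instead of .git/).
DEMO_LOCK="${TMPDIR:-/tmp}/demo-sync-lock.$$"

try_lock() {
    # mkdir either creates the directory or fails because it already exists,
    # so two callers cannot both succeed.
    if mkdir "$DEMO_LOCK" 2>/dev/null; then
        printf '%s' "$$" > "$DEMO_LOCK/owner.pid"
        return 0
    fi
    return 1
}

owner_alive() {
    # kill -0 delivers no signal; it only checks that the PID exists and can
    # be signalled. An empty or unreadable PID counts as dead.
    pid=$(cat "$DEMO_LOCK/owner.pid" 2>/dev/null)
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

unlock() {
    # Release only a lock whose recorded owner is this process.
    if [ "$(cat "$DEMO_LOCK/owner.pid" 2>/dev/null)" = "$$" ]; then
        rm -rf "$DEMO_LOCK"
    fi
}
```

Calling `try_lock` twice in a row shows the second call failing until `unlock` runs. The committed script layers the age check, the re-verify guard, and the rename-aside clear on top of this.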
@@ -130,107 +130,18 @@ echo -e "${GREEN}[OK]${NC} Working directory: $(pwd)"
 # submodule update, staging, commit, fetch, rebase, push — and by extension the
 # vault phase) behind a single per-machine lock.
 #
-# PORTABILITY: `flock` is frequently ABSENT on Git Bash (MSYS2), so we can't
-# depend on it. An atomic `mkdir` is the lowest common denominator — it fails if
-# the directory already exists, atomically, on every platform we run on (Windows
-# Git Bash, macOS, Linux). The lock lives under .git/ (never tracked, so a blind
-# `git add -A` can't stage it) and is scoped to this repo.
+# The lock primitive (mkdir-atomic lock, stale detection, ownership-checked
+# release, exit-75-on-contention) lives in the SHAREABLE library sync-lock.sh so
+# other commit paths (/scc, /checkpoint) can contend on the SAME lock dir. We
+# set SYNC_LOCK_DIR explicitly, source the library (which defines the vars +
+# functions but installs NO trap and acquires NOTHING on source), then install
+# our own EXIT trap and acquire — exactly as before. We are already cd'd into
+# REPO_ROOT, and the path is absolute, so the source resolves from any CWD.
 SYNC_LOCK_DIR="$REPO_ROOT/.git/claudetools-sync.lock"
-SYNC_LOCK_WAIT=120 # max seconds to wait for a held lock before skipping the run
-SYNC_LOCK_STALE=600 # seconds after which a held lock is treated as stale (10 min)
-SYNC_LOCK_OWNED=0 # becomes 1 only once THIS run owns the lock (gates release)
-
-# Idempotent release — only removes the lock if THIS process actually owns it
-# (stored PID == $$), so a "skipping this run" exit can never clobber the lock
-# held by the live sync we deferred to. Installed as an EXIT trap because the
-# script runs under `set -e`: the lock must be released on error exits too.
-# (There is no pre-existing EXIT trap in this script, so this adds a fresh one.)
-release_sync_lock() {
-    if [ "$SYNC_LOCK_OWNED" = "1" ] && [ -d "$SYNC_LOCK_DIR" ]; then
-        local owner_pid
-        owner_pid=$(cat "$SYNC_LOCK_DIR/owner.pid" 2>/dev/null || echo "")
-        if [ -z "$owner_pid" ] || [ "$owner_pid" = "$$" ]; then
-            rm -rf "$SYNC_LOCK_DIR" 2>/dev/null || true
-        fi
-        SYNC_LOCK_OWNED=0
-    fi
-}
+# shellcheck source=./sync-lock.sh
+source "$REPO_ROOT/.claude/scripts/sync-lock.sh"
 trap release_sync_lock EXIT INT TERM
-
-# Portable liveness check. `kill -0 <pid>` works on Git Bash (it maps to the
-# Windows process table), macOS, and Linux; guarded so a bad/empty PID is "dead".
-sync_pid_alive() {
-    local pid="$1"
-    [ -n "$pid" ] || return 1
-    kill -0 "$pid" 2>/dev/null
-}
-
-acquire_sync_lock() {
-    local waited=0 owner_pid owner_ts now mtime lock_age stale_aside re_pid re_now re_mtime re_age
-    while true; do
-        if mkdir "$SYNC_LOCK_DIR" 2>/dev/null; then
-            SYNC_LOCK_OWNED=1
-            printf '%s' "$$" > "$SYNC_LOCK_DIR/owner.pid" 2>/dev/null || true
-            # PID + ISO timestamp inside the lock dir, for diagnostics.
-            {
-                printf 'pid=%s\n' "$$"
-                printf 'iso=%s\n' "$(date -u "+%Y-%m-%dT%H:%M:%SZ")"
-                printf 'machine=%s\n' "$MACHINE"
-            } > "$SYNC_LOCK_DIR/owner" 2>/dev/null || true
-            # Defense-in-depth: confirm we still own the dir we just created. If
-            # owner.pid isn't ours, drop ownership and re-evaluate (never fatal
-            # under set -e — comparison is cheap and the body just loops).
-            if [ "$(cat "$SYNC_LOCK_DIR/owner.pid" 2>/dev/null)" != "$$" ]; then
-                SYNC_LOCK_OWNED=0; continue
-            fi
-            return 0
-        fi
-
-        # mkdir failed -> the lock is held. Decide whether it's stale or live.
-        owner_pid=$(cat "$SYNC_LOCK_DIR/owner.pid" 2>/dev/null || echo "")
-        owner_ts=$(sed -n 's/^iso=//p' "$SYNC_LOCK_DIR/owner" 2>/dev/null | head -1)
-        [ -n "$owner_ts" ] || owner_ts="unknown"
-
-        # Stale if the dir is older than the threshold OR the owner PID is dead.
-        # `stat -c` is GNU/Git-Bash, `stat -f` is BSD/macOS; fall back to 0.
-        now=$(date +%s 2>/dev/null || echo 0)
-        mtime=$(stat -c %Y "$SYNC_LOCK_DIR" 2>/dev/null || stat -f %m "$SYNC_LOCK_DIR" 2>/dev/null || echo 0)
-        lock_age=$(( now - mtime ))
-        if { [ "$mtime" -gt 0 ] && [ "$lock_age" -ge "$SYNC_LOCK_STALE" ]; } \
-           || { [ -n "$owner_pid" ] && ! sync_pid_alive "$owner_pid"; }; then
-            # Re-verify staleness IMMEDIATELY before clearing. Between the check
-            # above and here, another racer may have already cleared the stale
-            # lock and acquired a fresh, LIVE one. Re-read owner.pid + mtime NOW;
-            # only rename-aside if it is STILL stale this instant. A freshly
-            # acquired winner has a live PID and fresh mtime, so the loser falls
-            # through to the live-lock wait path instead of stealing the lock.
-            re_pid=$(cat "$SYNC_LOCK_DIR/owner.pid" 2>/dev/null || echo "")
-            re_now=$(date +%s 2>/dev/null || echo 0)
-            re_mtime=$(stat -c %Y "$SYNC_LOCK_DIR" 2>/dev/null || stat -f %m "$SYNC_LOCK_DIR" 2>/dev/null || echo 0)
-            re_age=$(( re_now - re_mtime ))
-            if { [ "$re_mtime" -gt 0 ] && [ "$re_age" -ge "$SYNC_LOCK_STALE" ]; } \
-               || { [ -n "$re_pid" ] && ! sync_pid_alive "$re_pid"; }; then
-                echo -e "${YELLOW}[WARNING]${NC} removing stale sync lock (held by PID ${re_pid:-?} since ${owner_ts}, age ${re_age}s)"
-                stale_aside="${SYNC_LOCK_DIR}.stale.$$"
-                if mv "$SYNC_LOCK_DIR" "$stale_aside" 2>/dev/null; then
-                    rm -rf "$stale_aside" 2>/dev/null || true
-                fi
-            fi
-            sleep 1 # insurance: never tight-spin if clearing persistently fails
-            continue
-        fi
-
-        # Live lock. If we've waited the full budget, skip (a duplicate sync is
-        # harmless to drop — the next scheduled/interactive run catches up).
-        if [ "$waited" -ge "$SYNC_LOCK_WAIT" ]; then
-            echo -e "${YELLOW}[WARNING]${NC} another sync is in progress (held by PID ${owner_pid:-?} since ${owner_ts}); skipping this run"
-            exit 75 # EX_TEMPFAIL: deferred (another sync in progress), not a real success
-        fi
-        sleep 2
-        waited=$(( waited + 2 ))
-    done
-}

 acquire_sync_lock
 echo -e "${GREEN}[OK]${NC} Acquired sync lock ($SYNC_LOCK_DIR)"
 # --- end concurrency lock ----------------------------------------------------
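Review note: the refactor relies on one file behaving as a side-effect-free library when sourced and as a wrapper when executed. The sketch below reproduces that layout with a throwaway file to show the `BASH_SOURCE` guard in action. `demo-lib.sh` and `demo_fn` are hypothetical stand-ins for `sync-lock.sh` and its functions, and the locking itself is omitted.

```shell
#!/bin/sh
# Sketch of the "library when sourced, wrapper when executed" layout.
# demo-lib.sh and demo_fn are hypothetical; only the guard mirrors sync-lock.sh.
demo_dir=$(mktemp -d)
cat > "$demo_dir/demo-lib.sh" <<'EOF'
demo_fn() { echo "defined by sourcing"; }

if [ "${BASH_SOURCE[0]}" = "$0" ]; then
    # Reached only on direct execution: act as a wrapper and mirror the
    # wrapped command's exit status.
    shift            # drop the 'run' subcommand
    "$@"
    exit $?
fi
EOF

# Sourced: the function becomes available and the wrapper block is skipped.
sourced_out=$(bash -c 'source "$1"; demo_fn' _ "$demo_dir/demo-lib.sh")

# Executed: the wrapper block runs the command and propagates its status.
wrapper_status=0
bash "$demo_dir/demo-lib.sh" run sh -c 'exit 7' || wrapper_status=$?

echo "sourced: $sourced_out; wrapper exit: $wrapper_status"
rm -rf "$demo_dir"
```

The guard works because `BASH_SOURCE[0]` names the file being read while `$0` names the script that was launched; the two match only on direct execution. This is why `sync.sh` can source the library without gaining a second trap or an early acquire.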