Coordination Protocol
Cross-session coordination uses the ClaudeTools API at http://172.16.3.30:8001/api/coord/. This replaces PROJECT_STATE.md files.
No auth token required for coordination endpoints — they are internal-only on the 172.16.3.30 private network. Pass session_id in the request body or as a query parameter to identify the calling session (e.g., DESKTOP-0O8A1RL/claude-main).
When a Lock Is Required
- Editing or creating source code files
- Git commit or push
- SSH command that modifies a server (deploy, install, config change, service restart)
- Database schema change or data migration
- Build pipeline modification
Reading files, planning, and answering questions do NOT require a lock.
Lock Lifecycle
Step 1 — Check for conflicts
GET /api/coord/locks?project_key=<key>&resource=<resource>
- Active lock present: stop, report to user, ask how to proceed.
- Lock acquired_at > 2 hours ago: note it, release it (Step 4 below), proceed.
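The Step 1 decision can be sketched as a small pure function. The record shape (a dict with an ISO 8601, timezone-aware `acquired_at` field) and the name `classify_locks` are assumptions for illustration; this document does not show the response schema of GET /api/coord/locks. The sketch also ignores the "no activity update" part of the stale lock rule, since no activity field is documented here.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(hours=2)  # threshold taken from the stale lock rule


def classify_locks(locks, now=None):
    """Return 'proceed', 'stale', or 'conflict' for a list of lock records.

    Assumes each record is a dict with a timezone-aware ISO 8601
    'acquired_at' value (hypothetical shape; check the real response).
    """
    now = now or datetime.now(timezone.utc)
    if not locks:
        return "proceed"  # no lock on the resource
    for lock in locks:
        # Accept a trailing 'Z' on Python versions older than 3.11.
        acquired = datetime.fromisoformat(lock["acquired_at"].replace("Z", "+00:00"))
        if now - acquired <= STALE_AFTER:
            return "conflict"  # a fresh lock exists: stop and ask the user
    return "stale"  # every lock is older than 2 hours: release, then proceed
```

A caller would stop and report on `conflict`, release the old lock on `stale`, and go straight to Step 2 on `proceed`.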
Step 2 — Claim your lock
POST /api/coord/locks
{
"project_key": "gururmm",
"session_id": "DESKTOP-0O8A1RL/claude-main",
"resource": "server/src/api/credentials.rs",
"description": "Adding credential endpoints",
"ttl_hours": 2
}
Response: { "id": "<uuid>", ... } — save the id for release.
ttl_hours: use 2 for normal work; 0 for no expiry (use sparingly).
Step 3 — Do the work
Step 4 — Release the lock
DELETE /api/coord/locks/<id>?session_id=<session_id>
Release on completion AND on failure. Only the claiming session may release.
Stale lock rule: A lock with acquired_at older than 2 hours and no activity update is abandoned. Release it, then proceed.
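One way to guarantee "release on completion AND on failure" is to wrap Steps 2 through 4 in a context manager. This is a minimal sketch, not the project's client: `coord_lock` and the `send(method, path, body)` transport callable are hypothetical names, and the transport is assumed to return the decoded JSON response.

```python
from contextlib import contextmanager
from urllib.parse import urlencode


@contextmanager
def coord_lock(send, project_key, session_id, resource, description, ttl_hours=2):
    """Claim a lock, yield its id, and always release it afterwards.

    `send(method, path, body)` is a hypothetical transport callable that
    performs the HTTP request and returns the decoded JSON response.
    """
    claim = send("POST", "/api/coord/locks", {
        "project_key": project_key,
        "session_id": session_id,
        "resource": resource,
        "description": description,
        "ttl_hours": ttl_hours,
    })
    lock_id = claim["id"]  # saved for the release call
    try:
        yield lock_id
    finally:
        # Runs whether the work succeeded or raised.
        query = urlencode({"session_id": session_id})
        send("DELETE", f"/api/coord/locks/{lock_id}?{query}", None)
```

Usage would look like `with coord_lock(send, "gururmm", session_id, path, "Adding credential endpoints"): ...`, with the work from Step 3 inside the `with` body.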
Component States
Record the current status of named system components so all sessions share a live view.
Upsert a component state:
PUT /api/coord/components
{
"project_key": "gururmm",
"component": "server",
"state": "deployed",
"version": "0.3.0",
"notes": "Deployed 2026-05-12; credential store live",
"updated_by": "DESKTOP-0O8A1RL/claude-main"
}
Valid states (convention — not enforced): building, built, deploying, deployed, degraded, unknown
Read all component states for a project:
GET /api/coord/components?project_key=gururmm
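Because the state list is a convention rather than a server-side constraint, a client may want to flag unusual values without rejecting them. The sketch below builds the upsert body shown above; the function name `component_state_body` is hypothetical, and treating `version` and `notes` as optional is an assumption.

```python
CONVENTIONAL_STATES = ("building", "built", "deploying", "deployed", "degraded", "unknown")


def component_state_body(project_key, component, state, updated_by, version=None, notes=None):
    """Build the body for PUT /api/coord/components and return (body, warning).

    `warning` is None when `state` is one of the conventional values.
    Treating version and notes as optional is an assumption.
    """
    body = {
        "project_key": project_key,
        "component": component,
        "state": state,
        "updated_by": updated_by,
    }
    if version is not None:
        body["version"] = version
    if notes is not None:
        body["notes"] = notes
    warning = None
    if state not in CONVENTIONAL_STATES:
        # Not an error: the API does not enforce the list.
        warning = f"state {state!r} is outside the conventional set"
    return body, warning
```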
Workflows and Work Items
Use workflows to track multi-step initiatives that span sessions or days.
Create a workflow:
POST /api/coord/workflows
{
"project_key": "gururmm",
"name": "Network Discovery Phase 1",
"description": "TCP probe scanner + DB layer + API + dashboard",
"status": "planning",
"created_by": "DESKTOP-0O8A1RL/claude-main"
}
Add work items to a workflow:
POST /api/coord/work-items
{
"workflow_id": "<uuid>",
"project_key": "gururmm",
"title": "Write migrations 017-019 for discovery tables",
"status": "pending",
"priority": 10
}
Update work item status:
PATCH /api/coord/work-items/<id>
{ "status": "completed" }
Workflow statuses: planning, active, blocked, completed, cancelled
Work item statuses: pending, in_progress, blocked, completed, cancelled
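A client-side check against the work item status list can catch typos before a request is sent. This sketch assumes nothing about server-side validation, which this document does not describe; `work_item_patch` is a hypothetical helper name.

```python
WORK_ITEM_STATUSES = {"pending", "in_progress", "blocked", "completed", "cancelled"}


def work_item_patch(item_id, status):
    """Return (method, path, body) for a work item status update.

    Raises ValueError for a status outside the documented list.
    """
    if status not in WORK_ITEM_STATUSES:
        raise ValueError(f"unknown work item status: {status!r}")
    return "PATCH", f"/api/coord/work-items/{item_id}", {"status": status}
```

Note that `planning` and `active` are workflow statuses, so the helper rejects them for work items.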
Inter-Session Messages
Send targeted messages between sessions or broadcast to a project.
Send a message:
POST /api/coord/messages
{
"from_session": "DESKTOP-0O8A1RL/claude-main",
"to_session": "HOWARD-HOME/claude-main", // omit for broadcast
"project_key": "gururmm",
"subject": "macOS build pipeline ready for wiring",
"body": "build-agents.sh updated. Section marked TODO-MACOS. Wire in from your end."
}
Check for unread messages (do this at session start):
GET /api/coord/messages?to_session=<session_id>&unread_only=true
Display each unread message prominently:
============================================================
MESSAGE FROM <from_session> — <subject>
============================================================
<body>
============================================================
Mark as read:
PUT /api/coord/messages/<id>/read
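The banner layout above can be produced with a short formatter. This sketch assumes each message is a dict with `from_session`, `subject`, and `body` keys (the fields used when sending) and that the rule is 60 characters wide; `format_message` is a hypothetical name.

```python
RULE = "=" * 60  # assumed width of the banner rule


def format_message(msg):
    """Render one unread message in the banner layout shown above."""
    # "\u2014" is the dash used between sender and subject in the banner.
    header = f"MESSAGE FROM {msg['from_session']} \u2014 {msg['subject']}"
    return "\n".join([RULE, header, RULE, msg["body"], RULE])
```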
Status Overview
Quick snapshot of everything active:
GET /api/coord/status
Returns: active locks, recent component state changes, active workflows, unread message count.
Session Cleanup
When a session ends cleanly, release all its locks:
DELETE /api/coord/locks?session_id=<session_id>&release_all=true
project_key Slugs
| Slug | Project |
|---|---|
| gururmm | GuruRMM server + dashboard |
| claudetools | ClaudeTools API + coordination system |
| dataforth-dos | Dataforth DOS project |
Slugs are free-form: add new ones as needed. project_key is NOT a foreign key to the projects table.
Softfail and Catch-Up
The coordination API must never block work. If it is unavailable:
On any network error, timeout, or 5xx response (except a 503 with a Retry-After header, which gets one retry first):
- Log the failed call to .claude/coord-queue.jsonl (one JSON object per line):
  {"ts":"2026-05-12T15:30:00Z","method":"PUT","path":"/api/coord/components/gururmm/server","body":{"state":"deployed","version":"0.3.0","notes":"...","updated_by":"DESKTOP-0O8A1RL/claude-main"}}
- Continue working. Do not retry immediately.
On 503 with Retry-After header:
Wait the specified number of seconds, then retry once. If the retry also fails, queue the call as described above.
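The client-side softfail rules can be sketched as a single wrapper. This is an illustration under stated assumptions, not the project's implementation: `call_with_softfail` and the `send(method, path, body)` transport are hypothetical, the transport is assumed to return `(status, headers)` and to raise `OSError` on network errors and timeouts, and `Retry-After` is assumed to carry a number of seconds rather than an HTTP date.

```python
import json
import time
from datetime import datetime, timezone


def call_with_softfail(send, method, path, body, queue_path, sleep=time.sleep):
    """Return True if the call was delivered, False if it was queued.

    `send(method, path, body)` is a hypothetical transport returning
    (status, headers); it raises OSError on network errors and timeouts.
    """
    def attempt():
        try:
            return send(method, path, body)
        except OSError:
            return None, {}  # treated like an unavailable API

    status, headers = attempt()
    if status == 503 and "Retry-After" in headers:
        sleep(int(headers["Retry-After"]))  # assumes the seconds form
        status, headers = attempt()  # retry exactly once
    if status is None or status >= 500:
        entry = {
            "ts": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "method": method,
            "path": path,
            "body": body,
        }
        # Append one JSON object per line, then let the caller keep working.
        with open(queue_path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(entry) + "\n")
        return False
    return True
```

Passing `sleep` as a parameter keeps the wait testable; a real caller would leave the default and pass `.claude/coord-queue.jsonl` as `queue_path`.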
Catch-up (session start and after /sync):
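queue=.claude/coord-queue.jsonl
# Replay only if the queue file exists and is non-empty
if [ -s "$queue" ]; then
  failed=0
  while IFS= read -r line; do
    method=$(printf '%s' "$line" | jq -r .method)
    path=$(printf '%s' "$line" | jq -r .path)
    body=$(printf '%s' "$line" | jq -c .body)
    # --fail makes curl exit non-zero on HTTP error responses
    curl -s --fail -X "$method" "http://172.16.3.30:8001$path" \
      -H "Content-Type: application/json" -d "$body" >/dev/null || failed=1
  done < "$queue"
  # Remove the file only if all calls succeeded
  if [ "$failed" -eq 0 ]; then rm -f "$queue"; fi
fi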
The queue file lives in .claude/coord-queue.jsonl (gitignored — local to each workstation).
API Softfail Behavior (Server Side)
When the MariaDB database is unavailable:
- Coord endpoints return 503 Service Unavailable with header Retry-After: 30
- Response body: {"detail": "Database unavailable. Retry after 30 seconds.", "retry_after": 30}
- GET /health reflects DB status: {"status":"degraded","database":"disconnected"}
This behavior is implemented in the API server and does not need to be coded by agents.
Migration Note
projects/*/PROJECT_STATE.md files are ARCHIVED — read-only historical reference. Do not edit them. Use this API for all live coordination going forward.