claudetools/.claude/COORDINATION_PROTOCOL.md
Mike Swanson 73573800b0 feat: coord API — no-auth, DB softfail 503, agent tracking protocol
- coord routers: removed JWT auth requirement (internal-only endpoints)
- error_handler: SQLAlchemy OperationalError/DisconnectionError → 503
  with Retry-After: 30 header instead of 500
- /health: live DB probe (SELECT 1) instead of static response
- CLAUDE.md: "Live State Tracking" section with full agent protocol
  for all projects — session start, lock claim/release, component
  state updates, softfail + local queue catch-up
- COORDINATION_PROTOCOL.md: softfail/catch-up section + server-side
  503 behavior documented

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-12 08:45:33 -07:00


Coordination Protocol

Cross-session coordination uses the ClaudeTools API at http://172.16.3.30:8001/api/coord/. This replaces PROJECT_STATE.md files.

No auth token is required for coordination endpoints because they are internal-only on the 172.16.3.30 private network. Pass session_id in the request body or as a query parameter to identify the calling session (e.g., DESKTOP-0O8A1RL/claude-main).


When a Lock Is Required

  • Editing or creating source code files
  • Git commit or push
  • SSH command that modifies a server (deploy, install, config change, service restart)
  • Database schema change or data migration
  • Build pipeline modification

Reading files, planning, and answering questions do NOT require a lock.


Lock Lifecycle

Step 1 — Check for conflicts

GET /api/coord/locks?project_key=<key>&resource=<resource>
  • Active lock present: stop, report to user, ask how to proceed.
  • Lock with acquired_at more than 2 hours ago: treat it as stale. Note it, release it (Step 4 below), then proceed.
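
The two outcomes above come down to comparing acquired_at with the current time. The following is a minimal sketch, assuming each lock record returned by the GET carries acquired_at as an ISO 8601 string with a UTC offset (the response shape is not documented here); classify_lock is a hypothetical helper, not part of the API.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

STALE_AFTER = timedelta(hours=2)  # threshold taken from the stale lock rule


def classify_lock(lock: dict, now: Optional[datetime] = None) -> str:
    """Return "stale" or "active" for one lock record.

    Assumption: lock["acquired_at"] is an ISO 8601 string that includes
    a UTC offset, e.g. "2026-05-12T12:30:00+00:00".
    """
    now = now or datetime.now(timezone.utc)
    acquired = datetime.fromisoformat(lock["acquired_at"])
    return "stale" if now - acquired > STALE_AFTER else "active"
```

An "active" result means stop and ask the user; a "stale" result means the lock can be noted, released, and the work can proceed.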

Step 2 — Claim your lock

POST /api/coord/locks
{
  "project_key": "gururmm",
  "session_id": "DESKTOP-0O8A1RL/claude-main",
  "resource": "server/src/api/credentials.rs",
  "description": "Adding credential endpoints",
  "ttl_hours": 2
}

Response: { "id": "<uuid>", ... } — save the id for release.

ttl_hours: use 2 for normal work; 0 for no expiry (use sparingly).
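
As an illustration, the claim request can be assembled with the Python standard library. This is a sketch only: build_claim_request is a hypothetical helper, and it builds the request without sending it so that the caller decides how to handle failures.

```python
import json
import urllib.request

BASE_URL = "http://172.16.3.30:8001"  # host and port from this document


def build_claim_request(project_key, session_id, resource, description, ttl_hours=2):
    """Build (but do not send) the POST /api/coord/locks request."""
    payload = {
        "project_key": project_key,
        "session_id": session_id,
        "resource": resource,
        "description": description,
        "ttl_hours": ttl_hours,  # 2 for normal work, 0 for no expiry
    }
    return urllib.request.Request(
        f"{BASE_URL}/api/coord/locks",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then a call to urllib.request.urlopen(req, timeout=10), after which the id field of the JSON response should be saved for the release step.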

Step 3 — Do the work

Step 4 — Release the lock

DELETE /api/coord/locks/<id>?session_id=<session_id>

Release on completion AND on failure. Only the claiming session may release.

Stale lock rule: A lock with acquired_at older than 2 hours and no activity update is abandoned. Release it, then proceed.
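
One way to guarantee release on both completion and failure is a context manager. The sketch below assumes the caller supplies two hypothetical callables, claim and release, that wrap the POST and DELETE calls shown above; neither name comes from the API.

```python
from contextlib import contextmanager


@contextmanager
def held_lock(claim, release):
    """Claim a lock, yield its id, and always release it afterwards.

    `claim()` is assumed to return the lock id from POST /api/coord/locks;
    `release(lock_id)` is assumed to issue DELETE /api/coord/locks/<id>.
    Both are caller-supplied and hypothetical.
    """
    lock_id = claim()
    try:
        yield lock_id
    finally:
        release(lock_id)  # runs on normal exit and when the body raises
```

The work from Step 3 goes inside the with block. If the release call itself cannot reach the API, the softfail rules later in this document still apply.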


Component States

Record the current status of named system components so all sessions share a live view.

Upsert a component state:

PUT /api/coord/components
{
  "project_key": "gururmm",
  "component": "server",
  "state": "deployed",
  "version": "0.3.0",
  "notes": "Deployed 2026-05-12; credential store live",
  "updated_by": "DESKTOP-0O8A1RL/claude-main"
}

Valid states (convention — not enforced): building, built, deploying, deployed, degraded, unknown
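
Because the state list is a convention rather than a server-side check, a client may want to flag unusual values before sending them. A minimal sketch, with component_payload as a hypothetical helper that warns instead of rejecting:

```python
CONVENTIONAL_STATES = {
    "building", "built", "deploying", "deployed", "degraded", "unknown",
}


def component_payload(project_key, component, state, version, notes, updated_by):
    """Return (payload, warning) for PUT /api/coord/components.

    The warning is None for conventional states. Other values are still
    allowed, since the server does not enforce the list.
    """
    warning = None
    if state not in CONVENTIONAL_STATES:
        warning = f"state {state!r} is outside the conventional set"
    payload = {
        "project_key": project_key,
        "component": component,
        "state": state,
        "version": version,
        "notes": notes,
        "updated_by": updated_by,
    }
    return payload, warning
```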

Read all component states for a project:

GET /api/coord/components?project_key=gururmm

Workflows and Work Items

Use workflows to track multi-step initiatives that span sessions or days.

Create a workflow:

POST /api/coord/workflows
{
  "project_key": "gururmm",
  "name": "Network Discovery Phase 1",
  "description": "TCP probe scanner + DB layer + API + dashboard",
  "status": "planning",
  "created_by": "DESKTOP-0O8A1RL/claude-main"
}

Add work items to a workflow:

POST /api/coord/work-items
{
  "workflow_id": "<uuid>",
  "project_key": "gururmm",
  "title": "Write migrations 017-019 for discovery tables",
  "status": "pending",
  "priority": 10
}

Update work item status:

PATCH /api/coord/work-items/<id>
{ "status": "completed" }

Workflow statuses: planning, active, blocked, completed, cancelled

Work item statuses: pending, in_progress, blocked, completed, cancelled
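
Workflows and work items share some status names but not all, so a client-side check can catch mix-ups such as using active on a work item. The sketch below is illustrative: work_item_status_update is a hypothetical helper, and whether the server itself rejects unknown statuses is not stated here.

```python
WORK_ITEM_STATUSES = {"pending", "in_progress", "blocked", "completed", "cancelled"}


def work_item_status_update(item_id: str, status: str):
    """Return (method, path, body) for a work item status change.

    Raises ValueError for a status that is not in the work item list.
    """
    if status not in WORK_ITEM_STATUSES:
        raise ValueError(f"unknown work item status: {status!r}")
    return "PATCH", f"/api/coord/work-items/{item_id}", {"status": status}
```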


Inter-Session Messages

Send targeted messages between sessions or broadcast to a project.

Send a message:

POST /api/coord/messages
{
  "from_session": "DESKTOP-0O8A1RL/claude-main",
  "to_session": "HOWARD-HOME/claude-main",
  "project_key": "gururmm",
  "subject": "macOS build pipeline ready for wiring",
  "body": "build-agents.sh updated. Section marked TODO-MACOS. Wire in from your end."
}

Omit to_session to broadcast to every session on the project.

Check for unread messages (do this at session start):

GET /api/coord/messages?to_session=<session_id>&unread_only=true

Display each unread message prominently:

============================================================
MESSAGE FROM <from_session> — <subject>
============================================================
<body>
============================================================
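
The banner can be produced by a small formatter. This sketch assumes each message object exposes from_session, subject, and body under those keys (the shape of the GET response is not documented here); format_message is a hypothetical name.

```python
RULE = "=" * 60  # horizontal rule, as in the banner above


def format_message(msg: dict) -> str:
    """Render one unread message in the banner layout from this section.

    Assumption: msg has the keys from_session, subject, and body.
    """
    # \u2014 is the dash that separates sender and subject in the banner
    header = f"MESSAGE FROM {msg['from_session']} \u2014 {msg['subject']}"
    return "\n".join([RULE, header, RULE, msg["body"], RULE])
```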

Mark as read:

PUT /api/coord/messages/<id>/read

Status Overview

Quick snapshot of everything active:

GET /api/coord/status

Returns: active locks, recent component state changes, active workflows, unread message count.


Session Cleanup

When a session ends cleanly, release all its locks:

DELETE /api/coord/locks?session_id=<session_id>&release_all=true
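
Session ids such as DESKTOP-0O8A1RL/claude-main contain a slash, so it is worth building the release URLs with proper encoding rather than string concatenation. A slash is legal in a query string, so percent-encoding it is the conservative choice rather than a requirement. The helper names below are hypothetical.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://172.16.3.30:8001"  # host and port from this document


def release_lock_url(lock_id: str, session_id: str) -> str:
    """URL for DELETE /api/coord/locks/<id>?session_id=<session_id>."""
    query = urlencode({"session_id": session_id})
    return f"{BASE_URL}/api/coord/locks/{quote(lock_id, safe='')}?{query}"


def release_all_url(session_id: str) -> str:
    """URL for releasing every lock held by one session."""
    query = urlencode({"session_id": session_id, "release_all": "true"})
    return f"{BASE_URL}/api/coord/locks?{query}"
```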

project_key Slugs

| Slug | Project |
|---|---|
| gururmm | GuruRMM server + dashboard |
| claudetools | ClaudeTools API + coordination system |
| dataforth-dos | Dataforth DOS project |

Slugs are free-form: add new ones as needed. project_key is NOT a foreign key to the projects table.


Softfail and Catch-Up

The coordination API must never block work. If it is unavailable:

On any network error, timeout, or 5xx response (except a 503 that carries a Retry-After header, covered below):

  1. Log the failed call to .claude/coord-queue.jsonl (one JSON object per line):
    {"ts":"2026-05-12T15:30:00Z","method":"PUT","path":"/api/coord/components/gururmm/server","body":{"state":"deployed","version":"0.3.0","notes":"...","updated_by":"DESKTOP-0O8A1RL/claude-main"}}
    
  2. Continue working. Do not retry immediately.
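
Step 1 can be sketched as an append to the JSONL file. The entry layout (ts, method, path, body) follows the example line above; queue_failed_call and its parameters are hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def queue_failed_call(queue_path, method, path, body, now=None):
    """Append one failed coordination call to the local JSONL queue."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    entry = {
        "ts": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "method": method,
        "path": path,
        "body": body,
    }
    queue_path = Path(queue_path)
    queue_path.parent.mkdir(parents=True, exist_ok=True)
    # One compact JSON object per line, appended so earlier entries survive
    with queue_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry, separators=(",", ":")) + "\n")
    return entry
```

Appending keeps the entries in the order the calls were attempted, which is the order a later replay will see.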

On 503 with Retry-After header:
Wait the specified seconds, then retry once. If the retry also fails, queue it.
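
Taken together, the two rules above form a small decision procedure. The sketch below assumes a hypothetical send() callable that returns (status, headers) and raises OSError on network errors or timeouts, and a hypothetical enqueue() callable that writes the call to the local queue. It also assumes Retry-After holds a number of seconds, as in the server behavior described in this document.

```python
import time


def call_with_softfail(send, enqueue, sleep=time.sleep):
    """Run one coordination call without blocking the caller's work.

    Returns True if the server answered without a 5xx, otherwise queues
    the call and returns False. `send`, `enqueue`, and `sleep` are
    caller-supplied (hypothetical) so the policy can be tested offline.
    """
    for attempt in (1, 2):
        try:
            status, headers = send()
        except OSError:
            break  # network error or timeout: queue, do not retry now
        if status < 500:
            return True
        retry_after = headers.get("Retry-After")
        if attempt == 1 and status == 503 and retry_after is not None:
            sleep(int(retry_after))  # wait the specified seconds, retry once
            continue
        break  # other 5xx, or the single retry also failed
    enqueue()
    return False
```

Passing sleep as a parameter keeps the wait out of tests and lets a caller cap the delay if needed.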

Catch-up (session start and after /sync):

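# Replay queued calls if coord-queue.jsonl exists and is non-empty
queue=.claude/coord-queue.jsonl
failed=0
if [ -s "$queue" ]; then
  while IFS= read -r line; do
    method=$(printf '%s\n' "$line" | jq -r .method)
    path=$(printf '%s\n' "$line" | jq -r .path)
    body=$(printf '%s\n' "$line" | jq -c .body)   # -c keeps the body as one-line JSON
    # -f makes curl exit non-zero on HTTP 4xx/5xx so failures are detected
    curl -sf -X "$method" "http://172.16.3.30:8001$path" \
      -H "Content-Type: application/json" -d "$body" >/dev/null || failed=1
  done < "$queue"
  # Remove the file only if all calls succeeded
  [ "$failed" -eq 0 ] && rm -f "$queue"
fi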

The queue file lives in .claude/coord-queue.jsonl (gitignored — local to each workstation).


API Softfail Behavior (Server Side)

When the MariaDB database is unavailable:

  • Coord endpoints return 503 Service Unavailable with header Retry-After: 30
  • Response body: {"detail": "Database unavailable. Retry after 30 seconds.", "retry_after": 30}
  • GET /health reflects DB status: {"status":"degraded","database":"disconnected"}

This behavior is implemented in the API server and does not need to be coded by agents.
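
Agents that want to check database status before replaying a queue can read the /health body. Only the degraded payload is documented above, so the sketch below treats any other well-formed JSON object as healthy; that is an assumption, and database_available is a hypothetical helper.

```python
import json


def database_available(health_body: str) -> bool:
    """Interpret the body of a GET /health response.

    Assumption: anything other than the documented degraded payload
    means the database is reachable.
    """
    try:
        health = json.loads(health_body)
    except ValueError:
        return False  # unparseable body: assume the database is down
    if not isinstance(health, dict):
        return False
    return (
        health.get("status") != "degraded"
        and health.get("database") != "disconnected"
    )
```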


Migration Note

projects/*/PROJECT_STATE.md files are ARCHIVED — read-only historical reference. Do not edit them. Use this API for all live coordination going forward.