feat: coord API — no-auth, DB softfail 503, agent tracking protocol
- coord routers: removed JWT auth requirement (internal-only endpoints)
- error_handler: SQLAlchemy OperationalError/DisconnectionError → 503 with Retry-After: 30 header instead of 500
- /health: live DB probe (SELECT 1) instead of static response
- CLAUDE.md: "Live State Tracking" section with full agent protocol for all projects: session start, lock claim/release, component state updates, softfail + local queue catch-up
- COORDINATION_PROTOCOL.md: softfail/catch-up section + server-side 503 behavior documented

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -252,19 +252,7 @@ Full protocol reference: `.claude/COORDINATION_PROTOCOL.md`
|
||||
|
||||
### Cross-Session Messages (MANDATORY)
|
||||
|
||||
At session start and after every `/sync`, check for unread messages:
|
||||
```
GET http://172.16.3.30:8001/api/coord/messages?to_session=<this-session>&unread_only=true
```
|
||||
If unread messages exist, display each one prominently before any other work:
|
||||
```
============================================================
MESSAGE FROM <from_session> — <subject>
============================================================
<body>
============================================================
```
|
||||
Mark as read via `PUT /api/coord/messages/{id}/read` after displaying.
|
||||
See the **Session Start Protocol** in "Live State Tracking" above. Messages must be displayed and marked read before any other work.
|
||||
|
||||
Also scan session logs pulled during `/sync` for legacy `## Note for <user>` sections (transitional — older sessions still use markdown).
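
The message check described here can be sketched in Python as follows. This is a minimal illustration, not the project's client: the response field names (`id`, `from_session`, `subject`, `body`), the `messages` wrapper key, and the 60-character rule width are assumptions inferred from the banner placeholders, and the helper names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE = "http://172.16.3.30:8001"  # host taken from the protocol text
RULE = "=" * 60                   # assumed width of the banner rule


def format_message(msg: dict) -> str:
    """Render one message in the banner layout used by the protocol."""
    # \u2014 is the dash that separates sender and subject in the banner.
    header = f"MESSAGE FROM {msg['from_session']} \u2014 {msg['subject']}"
    return "\n".join([RULE, header, RULE, msg["body"], RULE])


def fetch_unread(session_id: str, timeout: float = 5.0) -> list:
    """GET unread messages addressed to this session."""
    query = urlencode({"to_session": session_id, "unread_only": "true"})
    url = f"{BASE}/api/coord/messages?{query}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        data = json.load(resp)
    # Assumption: the API returns either a bare list or {"messages": [...]}.
    return data if isinstance(data, list) else data.get("messages", [])


def mark_read(message_id: str, timeout: float = 5.0) -> None:
    """PUT /api/coord/messages/{id}/read after the message was displayed."""
    req = urllib.request.Request(
        f"{BASE}/api/coord/messages/{message_id}/read", method="PUT"
    )
    urllib.request.urlopen(req, timeout=timeout).close()


def show_unread(session_id: str) -> None:
    """Display every unread message, then mark it read."""
    for msg in fetch_unread(session_id):
        print(format_message(msg))
        mark_read(msg["id"])
```

Calling `show_unread("DESKTOP-0O8A1RL/claude-main")` at session start would print each banner before any other work begins; network errors are left to the caller so they can be handled by the softfail rules.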
|
||||
|
||||
|
||||
@@ -2,7 +2,7 @@
|
||||
|
||||
Cross-session coordination uses the ClaudeTools API at `http://172.16.3.30:8001/api/coord/`. This replaces PROJECT_STATE.md files.
|
||||
|
||||
All endpoints require a `session_id` string identifying the current session (e.g., `DESKTOP-0O8A1RL/claude-main`). No auth token required for coordination endpoints.
|
||||
No auth token required for coordination endpoints — they are internal-only on the 172.16.3.30 private network. Pass `session_id` in the request body or as a query parameter to identify the calling session (e.g., `DESKTOP-0O8A1RL/claude-main`).
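
As a rough illustration of passing `session_id` either way, the sketch below builds (but does not send) a request without any auth header. The helper name `coord_request`, the `/api/coord/locks` path in the usage note, and the choice of `session_id` as the body field are assumptions; individual endpoints may expect a different field (the queue example elsewhere in this document uses `updated_by`).

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE = "http://172.16.3.30:8001"  # internal-only host from the protocol text


def coord_request(method, path, session_id, body=None, field="session_id"):
    """Build a coordination request that identifies the calling session.

    With no body the session goes into the query string; with a body it is
    merged into the JSON payload under `field`. No Authorization header is set.
    """
    headers = {}
    data = None
    if body is None:
        url = f"{BASE}{path}?{urlencode({field: session_id})}"
    else:
        url = f"{BASE}{path}"
        data = json.dumps({**body, field: session_id}).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)
```

For example, `coord_request("DELETE", "/api/coord/locks", "DESKTOP-0O8A1RL/claude-main")` yields a URL whose query string carries the session with the slash percent-encoded as `%2F`.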
|
||||
|
||||
---
|
||||
|
||||
@@ -187,6 +187,47 @@ Free-form — add new slugs as needed. Does NOT foreign-key to the projects tabl
|
||||
|
||||
---
|
||||
|
||||
## Softfail and Catch-Up
|
||||
|
||||
The coordination API must never block work. If it is unavailable:
|
||||
|
||||
**On any network error, timeout, or 5xx response:**
|
||||
1. Log the failed call to `.claude/coord-queue.jsonl` (one JSON object per line):
|
||||
```json
{"ts":"2026-05-12T15:30:00Z","method":"PUT","path":"/api/coord/components/gururmm/server","body":{"state":"deployed","version":"0.3.0","notes":"...","updated_by":"DESKTOP-0O8A1RL/claude-main"}}
```
|
||||
2. Continue working. Do not retry immediately.
|
||||
|
||||
**On 503 with `Retry-After` header:**
|
||||
Wait the number of seconds given in the header, then retry once. If the retry also fails, queue the call.
|
||||
|
||||
**Catch-up (session start and after `/sync`):**
|
||||
```bash
# Replay queued calls, if any; keep the queue unless every call succeeds.
queue=.claude/coord-queue.jsonl
if [ -s "$queue" ]; then
  ok=1
  while IFS= read -r line; do
    method=$(printf '%s' "$line" | jq -r .method)
    path=$(printf '%s' "$line" | jq -r .path)
    body=$(printf '%s' "$line" | jq -c .body)   # -c keeps the body as compact JSON
    # --fail (-f) makes curl exit non-zero on HTTP 4xx/5xx responses
    curl -sf -X "$method" "http://172.16.3.30:8001$path" \
      -H "Content-Type: application/json" -d "$body" >/dev/null || ok=0
  done < "$queue"
  # Remove the file only if all calls succeeded
  [ "$ok" -eq 1 ] && rm -f "$queue"
fi
```
|
||||
|
||||
The queue file lives in `.claude/coord-queue.jsonl` (gitignored — local to each workstation).
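
The queue-and-replay rules above could look like this in Python. It is a sketch under stated assumptions: `enqueue`, `replay`, and the `send` callback are hypothetical names, and unlike a whole-file replay this variant rewrites the queue with only the entries that failed again, so calls that already succeeded are not re-sent.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

QUEUE = Path(".claude/coord-queue.jsonl")  # path from the protocol text


def enqueue(method, path, body, queue=QUEUE):
    """Append one failed call to the queue as a single JSON line."""
    entry = {
        "ts": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "method": method,
        "path": path,
        "body": body,
    }
    queue.parent.mkdir(parents=True, exist_ok=True)
    with queue.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry, separators=(",", ":")) + "\n")


def replay(send, queue=QUEUE):
    """Replay queued calls through send(method, path, body) -> bool.

    Entries that fail again stay in the file; the file is removed only
    when every call succeeded. Returns the number of entries left.
    """
    if not queue.exists():
        return 0
    text = queue.read_text(encoding="utf-8")
    lines = [ln for ln in text.splitlines() if ln.strip()]
    failed = []
    for ln in lines:
        entry = json.loads(ln)
        if not send(entry["method"], entry["path"], entry["body"]):
            failed.append(ln)
    if failed:
        queue.write_text("\n".join(failed) + "\n", encoding="utf-8")
    else:
        queue.unlink()
    return len(failed)
```

`send` is whatever function actually performs the HTTP call and reports success, which keeps the queue logic independent of the HTTP client in use.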
|
||||
|
||||
---
|
||||
|
||||
## API Softfail Behavior (Server Side)
|
||||
|
||||
When the MariaDB database is unavailable:
|
||||
- Coord endpoints return `503 Service Unavailable` with header `Retry-After: 30`
|
||||
- Response body (as returned by `db_unavailable_exception_handler`): `{"error": "Database unavailable. Retry after 30 seconds.", "path": "<request path>"}`
|
||||
- `GET /health` runs a live `SELECT 1` probe; when the probe fails it returns `503` with `Retry-After: 30` and body `{"status":"degraded","database":"disconnected"}`
|
||||
|
||||
This behavior is implemented in the API server and does not need to be coded by agents.
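
On the client side, the decision between "retry once" and "queue immediately" can be reduced to one small function. This is a sketch assuming the caller already has the status code and a dict of response headers; `retry_delay` and its `default` parameter are hypothetical names, not part of the API.

```python
def retry_delay(status, headers, default=30):
    """Return seconds to wait before the single retry, or None to queue now.

    Only a 503 that carries Retry-After qualifies for a retry; any other
    failure is queued without retrying.
    """
    if status != 503:
        return None
    # HTTP header names are case-insensitive.
    lowered = {k.lower(): v for k, v in headers.items()}
    value = lowered.get("retry-after")
    if value is None:
        return None
    try:
        return max(0, int(value))
    except ValueError:
        # Retry-After may also be an HTTP date; this sketch does not parse
        # dates and falls back to the default delay instead.
        return default
```

With the server behavior described above, `retry_delay(503, {"Retry-After": "30"})` gives a 30-second wait, while a plain 500 returns `None` and goes straight to the queue.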
|
||||
|
||||
---
|
||||
|
||||
## Migration Note
|
||||
|
||||
`projects/*/PROJECT_STATE.md` files are ARCHIVED — read-only historical reference. Do not edit them. Use this API for all live coordination going forward.
|
||||
|
||||
17
api/main.py
@@ -3,8 +3,10 @@ ClaudeTools FastAPI Application
|
||||
Main entry point for the ClaudeTools MSP management system API
|
||||
"""
|
||||
|
||||
import sqlalchemy as sa
|
||||
from fastapi import FastAPI
|
||||
from fastapi.middleware.cors import CORSMiddleware
|
||||
from fastapi.responses import Response
|
||||
from contextlib import asynccontextmanager
|
||||
|
||||
from api.config import get_settings
|
||||
@@ -108,10 +110,17 @@ async def root():
|
||||
 @app.get("/health")
 async def health_check():
     """Health check endpoint for monitoring"""
-    return {
-        "status": "healthy",
-        "database": "connected"
-    }
+    try:
+        with engine.connect() as conn:
+            conn.execute(sa.text("SELECT 1"))
+        return {"status": "healthy", "database": "connected"}
+    except Exception:
+        return Response(
+            content='{"status":"degraded","database":"disconnected"}',
+            status_code=503,
+            media_type="application/json",
+            headers={"Retry-After": "30"},
+        )
|
||||
|
||||
|
||||
# Register routers
|
||||
|
||||
@@ -10,7 +10,7 @@ from typing import Any, Dict, Optional
|
||||
from fastapi import FastAPI, Request, status
|
||||
from fastapi.exceptions import RequestValidationError
|
||||
from fastapi.responses import JSONResponse
|
||||
from sqlalchemy.exc import SQLAlchemyError
|
||||
from sqlalchemy.exc import DisconnectionError, OperationalError, SQLAlchemyError
|
||||
|
||||
|
||||
class ClaudeToolsException(Exception):
|
||||
@@ -278,6 +278,17 @@ async def sqlalchemy_exception_handler(
|
||||
)
|
||||
|
||||
|
||||
+async def db_unavailable_exception_handler(request: Request, exc: Exception) -> JSONResponse:
+    return JSONResponse(
+        status_code=status.HTTP_503_SERVICE_UNAVAILABLE,
+        content={
+            "error": "Database unavailable. Retry after 30 seconds.",
+            "path": str(request.url.path),
+        },
+        headers={"Retry-After": "30"},
+    )
|
||||
|
||||
|
||||
async def generic_exception_handler(request: Request, exc: Exception) -> JSONResponse:
|
||||
"""
|
||||
Handler for unhandled exceptions.
|
||||
@@ -321,4 +332,6 @@ def register_exception_handlers(app: FastAPI) -> None:
|
||||
app.add_exception_handler(ClaudeToolsException, claudetools_exception_handler)
|
||||
app.add_exception_handler(RequestValidationError, validation_exception_handler)
|
||||
app.add_exception_handler(SQLAlchemyError, sqlalchemy_exception_handler)
|
||||
app.add_exception_handler(OperationalError, db_unavailable_exception_handler)
|
||||
app.add_exception_handler(DisconnectionError, db_unavailable_exception_handler)
|
||||
app.add_exception_handler(Exception, generic_exception_handler)
|
||||
|
||||
@@ -4,7 +4,6 @@ from fastapi import APIRouter, Depends, Query, status
|
||||
from sqlalchemy.orm import Session
|
||||
|
||||
from api.database import get_db
|
||||
from api.middleware.auth import get_current_user
|
||||
from api.schemas.coord_component_state import CoordComponentStateUpsert, CoordComponentStateResponse
|
||||
from api.services import coord_component_service
|
||||
|
||||
@@ -15,7 +14,6 @@ router = APIRouter()
|
||||
def list_component_states(
|
||||
project_key: str | None = Query(default=None),
|
||||
db: Session = Depends(get_db),
|
||||
current_user: dict = Depends(get_current_user),
|
||||
):
|
||||
"""List all component states, optionally filtered by project."""
|
||||
states = coord_component_service.get_component_states(db, project_key=project_key)
|
||||
@@ -31,7 +29,6 @@ def upsert_component_state(
|
||||
component: str,
|
||||
data: CoordComponentStateUpsert,
|
||||
db: Session = Depends(get_db),
|
||||
current_user: dict = Depends(get_current_user),
|
||||
):
|
||||
"""Create or update the state of a component within a project."""
|
||||
state = coord_component_service.upsert_component_state(db, project_key, component, data)
|
||||
|
||||
@@ -6,7 +6,6 @@ from fastapi import APIRouter, Depends, Query, status
|
||||
from sqlalchemy.orm import Session
|
||||
|
||||
from api.database import get_db
|
||||
from api.middleware.auth import get_current_user
|
||||
from api.schemas.coord_session_lock import CoordSessionLockCreate, CoordSessionLockResponse
|
||||
from api.services import coord_lock_service
|
||||
|
||||
@@ -20,7 +19,6 @@ def list_active_locks(
|
||||
skip: int = Query(default=0, ge=0),
|
||||
limit: int = Query(default=100, ge=1, le=1000),
|
||||
db: Session = Depends(get_db),
|
||||
current_user: dict = Depends(get_current_user),
|
||||
):
|
||||
"""List currently active locks with optional filters."""
|
||||
locks, total = coord_lock_service.get_active_locks(
|
||||
@@ -39,7 +37,6 @@ def check_resource_locked(
|
||||
project_key: str = Query(...),
|
||||
resource: str = Query(...),
|
||||
db: Session = Depends(get_db),
|
||||
current_user: dict = Depends(get_current_user),
|
||||
):
|
||||
"""Check whether a resource is currently locked."""
|
||||
lock = coord_lock_service.check_resource_locked(db, project_key, resource)
|
||||
@@ -52,7 +49,6 @@ def check_resource_locked(
|
||||
def claim_lock(
|
||||
data: CoordSessionLockCreate,
|
||||
db: Session = Depends(get_db),
|
||||
current_user: dict = Depends(get_current_user),
|
||||
):
|
||||
"""Claim a resource lock for a session."""
|
||||
lock = coord_lock_service.claim_lock(db, data)
|
||||
@@ -63,7 +59,6 @@ def claim_lock(
|
||||
def release_all_session_locks(
|
||||
session_id: str = Query(..., description="Release all active locks held by this session"),
|
||||
db: Session = Depends(get_db),
|
||||
current_user: dict = Depends(get_current_user),
|
||||
):
|
||||
"""Release all active locks for a session (call on session end)."""
|
||||
return coord_lock_service.release_all_session_locks(db, session_id)
|
||||
@@ -74,7 +69,6 @@ def release_lock(
|
||||
lock_id: UUID,
|
||||
session_id: str = Query(..., description="Must match the session that claimed the lock"),
|
||||
db: Session = Depends(get_db),
|
||||
current_user: dict = Depends(get_current_user),
|
||||
):
|
||||
"""Release a specific lock by ID."""
|
||||
lock = coord_lock_service.release_lock(db, lock_id, session_id)
|
||||
|
||||
@@ -6,7 +6,6 @@ from fastapi import APIRouter, Depends, Query, status
|
||||
from sqlalchemy.orm import Session
|
||||
|
||||
from api.database import get_db
|
||||
from api.middleware.auth import get_current_user
|
||||
from api.schemas.coord_message import CoordMessageCreate, CoordMessageResponse
|
||||
from api.services import coord_message_service
|
||||
|
||||
@@ -20,7 +19,6 @@ def list_messages(
|
||||
skip: int = Query(default=0, ge=0),
|
||||
limit: int = Query(default=100, ge=1, le=1000),
|
||||
db: Session = Depends(get_db),
|
||||
current_user: dict = Depends(get_current_user),
|
||||
):
|
||||
"""List messages with optional filters."""
|
||||
messages, total = coord_message_service.get_messages(
|
||||
@@ -38,7 +36,6 @@ def list_messages(
|
||||
def get_unread_count(
|
||||
session_id: str = Query(...),
|
||||
db: Session = Depends(get_db),
|
||||
current_user: dict = Depends(get_current_user),
|
||||
):
|
||||
"""Return the count of unread messages for a session."""
|
||||
count = coord_message_service.get_unread_count(db, session_id)
|
||||
@@ -49,7 +46,6 @@ def get_unread_count(
|
||||
def send_message(
|
||||
data: CoordMessageCreate,
|
||||
db: Session = Depends(get_db),
|
||||
current_user: dict = Depends(get_current_user),
|
||||
):
|
||||
"""Send a message to a session or broadcast."""
|
||||
msg = coord_message_service.send_message(db, data)
|
||||
@@ -60,7 +56,6 @@ def send_message(
|
||||
def mark_message_read(
|
||||
message_id: UUID,
|
||||
db: Session = Depends(get_db),
|
||||
current_user: dict = Depends(get_current_user),
|
||||
):
|
||||
"""Mark a message as read."""
|
||||
msg = coord_message_service.mark_read(db, message_id)
|
||||
@@ -71,7 +66,6 @@ def mark_message_read(
|
||||
def delete_message(
|
||||
message_id: UUID,
|
||||
db: Session = Depends(get_db),
|
||||
current_user: dict = Depends(get_current_user),
|
||||
):
|
||||
"""Delete a message."""
|
||||
return coord_message_service.delete_message(db, message_id)
|
||||
|
||||
@@ -4,7 +4,6 @@ from fastapi import APIRouter, Depends, status
|
||||
from sqlalchemy.orm import Session
|
||||
|
||||
from api.database import get_db
|
||||
from api.middleware.auth import get_current_user
|
||||
from api.schemas.coord_session_lock import CoordSessionLockResponse
|
||||
from api.schemas.coord_workflow import CoordWorkflowResponse
|
||||
from api.schemas.coord_component_state import CoordComponentStateResponse
|
||||
@@ -16,7 +15,6 @@ router = APIRouter()
|
||||
@router.get("", response_model=dict, status_code=status.HTTP_200_OK)
|
||||
def get_coordination_status(
|
||||
db: Session = Depends(get_db),
|
||||
current_user: dict = Depends(get_current_user),
|
||||
):
|
||||
"""Return a cross-project snapshot: active locks, in-progress workflows, component states, unread message counts."""
|
||||
active_locks, lock_total = coord_lock_service.get_active_locks(db, limit=200)
|
||||
|
||||
@@ -6,7 +6,6 @@ from fastapi import APIRouter, Depends, Query, status
|
||||
from sqlalchemy.orm import Session
|
||||
|
||||
from api.database import get_db
|
||||
from api.middleware.auth import get_current_user
|
||||
from api.schemas.coord_work_item import CoordWorkItemCreate, CoordWorkItemResponse, CoordWorkItemUpdate
|
||||
from api.services import coord_work_item_service
|
||||
|
||||
@@ -22,7 +21,6 @@ def list_work_items(
|
||||
skip: int = Query(default=0, ge=0),
|
||||
limit: int = Query(default=100, ge=1, le=1000),
|
||||
db: Session = Depends(get_db),
|
||||
current_user: dict = Depends(get_current_user),
|
||||
):
|
||||
"""List work items with optional filters."""
|
||||
items, total = coord_work_item_service.get_work_items(
|
||||
@@ -46,7 +44,6 @@ def list_work_items(
|
||||
def create_work_item(
|
||||
data: CoordWorkItemCreate,
|
||||
db: Session = Depends(get_db),
|
||||
current_user: dict = Depends(get_current_user),
|
||||
):
|
||||
"""Create a new work item within a workflow."""
|
||||
item = coord_work_item_service.create_work_item(db, data)
|
||||
@@ -57,7 +54,6 @@ def create_work_item(
|
||||
def get_work_item(
|
||||
item_id: UUID,
|
||||
db: Session = Depends(get_db),
|
||||
current_user: dict = Depends(get_current_user),
|
||||
):
|
||||
"""Get a work item by ID."""
|
||||
item = coord_work_item_service.get_work_item_by_id(db, item_id)
|
||||
@@ -69,7 +65,6 @@ def update_work_item(
|
||||
item_id: UUID,
|
||||
data: CoordWorkItemUpdate,
|
||||
db: Session = Depends(get_db),
|
||||
current_user: dict = Depends(get_current_user),
|
||||
):
|
||||
"""Update a work item."""
|
||||
item = coord_work_item_service.update_work_item(db, item_id, data)
|
||||
@@ -80,7 +75,6 @@ def update_work_item(
|
||||
def delete_work_item(
|
||||
item_id: UUID,
|
||||
db: Session = Depends(get_db),
|
||||
current_user: dict = Depends(get_current_user),
|
||||
):
|
||||
"""Delete a work item."""
|
||||
return coord_work_item_service.delete_work_item(db, item_id)
|
||||
|
||||
@@ -6,7 +6,6 @@ from fastapi import APIRouter, Depends, Query, status
|
||||
from sqlalchemy.orm import Session
|
||||
|
||||
from api.database import get_db
|
||||
from api.middleware.auth import get_current_user
|
||||
from api.schemas.coord_workflow import CoordWorkflowCreate, CoordWorkflowResponse, CoordWorkflowUpdate
|
||||
from api.schemas.coord_work_item import CoordWorkItemResponse
|
||||
from api.services import coord_workflow_service, coord_work_item_service
|
||||
@@ -21,7 +20,6 @@ def list_workflows(
|
||||
skip: int = Query(default=0, ge=0),
|
||||
limit: int = Query(default=100, ge=1, le=1000),
|
||||
db: Session = Depends(get_db),
|
||||
current_user: dict = Depends(get_current_user),
|
||||
):
|
||||
"""List workflows with optional filters."""
|
||||
workflows, total = coord_workflow_service.get_workflows(
|
||||
@@ -39,7 +37,6 @@ def list_workflows(
|
||||
def create_workflow(
|
||||
data: CoordWorkflowCreate,
|
||||
db: Session = Depends(get_db),
|
||||
current_user: dict = Depends(get_current_user),
|
||||
):
|
||||
"""Create a new coordination workflow."""
|
||||
workflow = coord_workflow_service.create_workflow(db, data)
|
||||
@@ -50,7 +47,6 @@ def create_workflow(
|
||||
def get_workflow(
|
||||
workflow_id: UUID,
|
||||
db: Session = Depends(get_db),
|
||||
current_user: dict = Depends(get_current_user),
|
||||
):
|
||||
"""Get a workflow by ID including its work items."""
|
||||
workflow = coord_workflow_service.get_workflow_by_id(db, workflow_id)
|
||||
@@ -66,7 +62,6 @@ def update_workflow(
|
||||
workflow_id: UUID,
|
||||
data: CoordWorkflowUpdate,
|
||||
db: Session = Depends(get_db),
|
||||
current_user: dict = Depends(get_current_user),
|
||||
):
|
||||
"""Update a workflow."""
|
||||
workflow = coord_workflow_service.update_workflow(db, workflow_id, data)
|
||||
@@ -77,7 +72,6 @@ def update_workflow(
|
||||
def delete_workflow(
|
||||
workflow_id: UUID,
|
||||
db: Session = Depends(get_db),
|
||||
current_user: dict = Depends(get_current_user),
|
||||
):
|
||||
"""Delete a workflow and its work items (cascade)."""
|
||||
return coord_workflow_service.delete_workflow(db, workflow_id)
|
||||
|
||||