Conversation Parser Usage Guide
Complete guide for using the ClaudeTools conversation transcript parser and intelligent categorizer.
Overview
The conversation parser extracts, analyzes, and categorizes conversation data from Claude Desktop/Code sessions. It intelligently classifies conversations as MSP Work, Development, or General and compresses them for efficient database storage.
Main Functions
1. parse_jsonl_conversation(file_path: str)
Parse conversation files (.jsonl or .json) and extract structured data.
Returns:
{
    "messages": [{"role": str, "content": str, "timestamp": str}, ...],
    "metadata": {"title": str, "model": str, "created_at": str, ...},
    "file_paths": [str, ...],  # Auto-extracted from content
    "tool_calls": [{"tool": str, "count": int}, ...],
    "duration_seconds": int,
    "message_count": int
}
Example:
from api.utils.conversation_parser import parse_jsonl_conversation
conversation = parse_jsonl_conversation("/path/to/conversation.jsonl")
print(f"Found {conversation['message_count']} messages")
print(f"Duration: {conversation['duration_seconds']} seconds")
2. categorize_conversation(messages: List[Dict])
Intelligently categorize conversation content using weighted keyword analysis.
Returns: "msp", "development", or "general"
Categorization Logic:
MSP Keywords (higher weight = stronger signal):
- Client/Infrastructure: client, customer, site, firewall, network, server
- Services: support, ticket, incident, billable, invoice
- Microsoft 365: office365, azure, exchange, sharepoint, teams
- MSP-specific: managed service, service desk, RDS, terminal server
Development Keywords:
- API/Backend: api, endpoint, fastapi, flask, rest, webhook
- Database: database, migration, alembic, sqlalchemy, postgresql
- Code: implement, refactor, debug, test, pytest, function, class
- Tools: docker, kubernetes, ci/cd, deployment
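A minimal sketch of how this weighted scoring might work, reusing the weights shown in the Score Logic examples later in this guide. The matching below is a naive substring check and the keyword tables are far smaller than the module's real ones, so treat it as an illustration only:
from typing import Dict, List
# Weights mirror the Score Logic examples below; the module's actual
# keyword tables are larger.
MSP_WEIGHTS = {"client": 3, "site": 2, "vpn": 2, "firewall": 3}
DEV_WEIGHTS = {"fastapi": 4, "api": 3, "postgresql": 3, "sqlalchemy": 3}
def categorize_sketch(messages: List[Dict]) -> str:
    text = " ".join(m.get("content", "") for m in messages).lower()
    # Naive substring match; real matching would respect word boundaries.
    msp_score = sum(w for kw, w in MSP_WEIGHTS.items() if kw in text)
    dev_score = sum(w for kw, w in DEV_WEIGHTS.items() if kw in text)
    if max(msp_score, dev_score) == 0:
        return "general"
    return "msp" if msp_score >= dev_score else "development"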
Example:
from api.utils.conversation_parser import categorize_conversation
# MSP conversation
messages = [
    {"role": "user", "content": "Client firewall blocking Office365"},
    {"role": "assistant", "content": "Checking client site configuration"}
]
category = categorize_conversation(messages)  # Returns "msp"
# Development conversation
messages = [
    {"role": "user", "content": "Build FastAPI endpoint with PostgreSQL"},
    {"role": "assistant", "content": "Creating API using SQLAlchemy"}
]
category = categorize_conversation(messages)  # Returns "development"
3. extract_context_from_conversation(conversation: Dict)
Extract dense, compressed context suitable for database storage.
Returns:
{
    "category": str,          # "msp", "development", or "general"
    "summary": Dict,          # From compress_conversation_summary()
    "tags": List[str],        # Auto-extracted technology/topic tags
    "decisions": List[Dict],  # Key decisions with rationale
    "key_files": List[str],   # Top 20 file paths mentioned
    "key_tools": List[str],   # Top 10 tools used
    "metrics": {
        "message_count": int,
        "duration_seconds": int,
        "file_count": int,
        "tool_count": int,
        "decision_count": int,
        "quality_score": float  # 0-10 quality rating
    },
    "raw_metadata": Dict      # Original metadata
}
Quality Score Calculation:
- More messages = higher quality (up to 5 points)
- Decisions indicate depth (up to 2 points)
- File mentions indicate concrete work (up to 2 points)
- Sessions >5 minutes (+1 point)
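A hedged sketch of that heuristic; only the point caps and the 5-minute bonus come from the list above, while the step sizes per message, decision, and file are assumptions:
def quality_score_sketch(message_count: int, decision_count: int,
                         file_count: int, duration_seconds: int) -> float:
    """Approximate the documented heuristic; step sizes are illustrative."""
    score = min(5.0, message_count * 0.5)    # more messages, capped at 5 points
    score += min(2.0, decision_count * 0.5)  # decisions indicate depth, capped at 2
    score += min(2.0, file_count * 0.25)     # file mentions, capped at 2
    if duration_seconds > 300:               # sessions longer than 5 minutes
        score += 1.0
    return round(min(score, 10.0), 1)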
Example:
from api.utils.conversation_parser import (
    parse_jsonl_conversation,
    extract_context_from_conversation
)
# Parse and extract context
conversation = parse_jsonl_conversation("/path/to/file.jsonl")
context = extract_context_from_conversation(conversation)
print(f"Category: {context['category']}")
print(f"Tags: {context['tags']}")
print(f"Quality: {context['metrics']['quality_score']}/10")
print(f"Decisions: {len(context['decisions'])}")
4. scan_folder_for_conversations(base_path: str)
Recursively find all conversation files in a directory.
Features:
- Finds both .jsonl and .json files
- Automatically skips config files (config.json, settings.json)
- Skips common non-conversation files (package.json, tsconfig.json)
- Cross-platform path handling
Returns: List of absolute file paths
Example:
from api.utils.conversation_parser import scan_folder_for_conversations
# Scan Claude Code sessions
files = scan_folder_for_conversations(
    r"C:\Users\MikeSwanson\claude-projects"
)
print(f"Found {len(files)} conversation files")
for file in files[:5]:
    print(f" - {file}")
Complete Workflow Example
Batch Process Conversation Folder
from api.utils.conversation_parser import (
    scan_folder_for_conversations,
    parse_jsonl_conversation,
    extract_context_from_conversation
)
# 1. Scan for conversation files
base_path = r"C:\Users\MikeSwanson\claude-projects"
files = scan_folder_for_conversations(base_path)
# 2. Process each conversation
contexts = []
for file_path in files:
    try:
        # Parse conversation
        conversation = parse_jsonl_conversation(file_path)
        # Extract context
        context = extract_context_from_conversation(conversation)
        # Add source file
        context["source_file"] = file_path
        contexts.append(context)
        print(f"Processed: {file_path}")
        print(f" Category: {context['category']}")
        print(f" Messages: {context['metrics']['message_count']}")
        print(f" Quality: {context['metrics']['quality_score']}/10")
    except Exception as e:
        print(f"Error processing {file_path}: {e}")
# 3. Categorize by type
msp_contexts = [c for c in contexts if c['category'] == 'msp']
dev_contexts = [c for c in contexts if c['category'] == 'development']
print(f"\nSummary:")
print(f" MSP conversations: {len(msp_contexts)}")
print(f" Development conversations: {len(dev_contexts)}")
Using the Batch Helper Function
from api.utils.conversation_parser import batch_process_conversations
def progress_callback(file_path, context):
    """Called for each processed file"""
    print(f"Processed: {context['category']} - {context['metrics']['quality_score']}/10")
# Process all conversations with callback
contexts = batch_process_conversations(
    r"C:\Users\MikeSwanson\claude-projects",
    output_callback=progress_callback
)
print(f"Total processed: {len(contexts)}")
Integration with Database
Insert Context into Database
from sqlalchemy.orm import Session
from api.models import ContextSnippet
from api.utils.conversation_parser import (
    parse_jsonl_conversation,
    extract_context_from_conversation
)
def import_conversation_to_db(db: Session, file_path: str):
    """Import a conversation file into the database."""
    # 1. Parse and extract context
    conversation = parse_jsonl_conversation(file_path)
    context = extract_context_from_conversation(conversation)
    # 2. Create context snippet for summary
    summary_snippet = ContextSnippet(
        content=str(context['summary']),
        snippet_type="session_summary",
        tags=context['tags'],
        importance=min(10, int(context['metrics']['quality_score'])),
        metadata={
            "category": context['category'],
            "source_file": file_path,
            "message_count": context['metrics']['message_count'],
            "duration_seconds": context['metrics']['duration_seconds']
        }
    )
    db.add(summary_snippet)
    # 3. Create decision snippets
    for decision in context['decisions']:
        decision_snippet = ContextSnippet(
            content=f"{decision['decision']} - {decision['rationale']}",
            snippet_type="decision",
            tags=context['tags'][:5],
            importance=7 if decision['impact'] == 'high' else 5,
            metadata={
                "category": context['category'],
                "impact": decision['impact'],
                "source_file": file_path
            }
        )
        db.add(decision_snippet)
    db.commit()
    print(f"Imported conversation from {file_path}")
CLI Quick Test
The module includes a standalone CLI for quick testing:
# Test a specific conversation file
python api/utils/conversation_parser.py /path/to/conversation.jsonl
# Output:
# Conversation: Build authentication system
# Category: development
# Messages: 15
# Duration: 1200s (20m)
# Tags: development, fastapi, postgresql, auth, api
# Quality: 7.5/10
Categorization Examples
MSP Conversation
User: Client at BGBuilders site reported VPN connection issues
Assistant: I'll check the firewall configuration and VPN settings for the client
Category: msp
Score Logic: client (3), site (2), vpn (2), firewall (3) = 10 points
Development Conversation
User: Build a FastAPI REST API with PostgreSQL and implement JWT authentication
Assistant: I'll create the API endpoints using SQLAlchemy ORM and add JWT token support
Category: development
Score Logic: fastapi (4), api (3), postgresql (3), jwt (auth tag), sqlalchemy (3) = 13+ points
General Conversation
User: What's the best way to organize my project files?
Assistant: I recommend organizing by feature rather than by file type
Category: general
Score Logic: No strong MSP or dev keywords, low scores on both
Advanced Features
File Path Extraction
Automatically extracts file paths from conversation content:
conversation = parse_jsonl_conversation("/path/to/file.jsonl")
print(conversation['file_paths'])
# ['api/auth.py', 'api/models.py', 'tests/test_auth.py']
Supports:
- Windows absolute paths: C:\Users\...\file.py
- Unix absolute paths: /home/user/file.py
- Relative paths: ./api/file.py, ../utils/helper.py
- Code paths: api/auth.py, src/models.py
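A rough sketch of regex-based extraction covering these formats; the patterns are illustrative approximations, not the module's actual expressions:
import re
# Illustrative patterns for the documented path formats; the module's
# real regexes may differ.
PATH_PATTERNS = [
    re.compile(r"[A-Za-z]:\\[\w\\ .\-]+\.\w+"),          # Windows absolute
    re.compile(r"/(?:[\w.\-]+/)+[\w.\-]+\.\w+"),         # Unix absolute
    re.compile(r"\.{1,2}/(?:[\w.\-]+/)*[\w.\-]+\.\w+"),  # relative ./ and ../
    re.compile(r"\b(?:[\w\-]+/)+[\w\-]+\.\w+\b"),        # bare code paths
]
def extract_paths(text: str) -> list:
    found = []
    for pattern in PATH_PATTERNS:
        found.extend(pattern.findall(text))
    return sorted(set(found))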
Tool Call Tracking
Automatically tracks which tools were used:
conversation = parse_jsonl_conversation("/path/to/file.jsonl")
print(conversation['tool_calls'])
# [
#   {"tool": "write", "count": 5},
#   {"tool": "read", "count": 3},
#   {"tool": "bash", "count": 2}
# ]
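If you ever need to rebuild this aggregation from raw tool-use events (for example, when post-processing your own logs), a minimal sketch with collections.Counter, assuming each event carries a "tool" key:
from collections import Counter
def aggregate_tool_calls(events):
    """Collapse raw tool-use events into the documented shape (assumes a 'tool' key)."""
    counts = Counter(event["tool"] for event in events)
    return [{"tool": tool, "count": count} for tool, count in counts.most_common()]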
Best Practices
- Use quality scores to filter: Only import high-quality conversations (score > 5.0); see the sketch after this list
- Batch process in chunks: Process large folders in batches to manage memory
- Add source file tracking: Always include source_file in context for traceability
- Validate before import: Check message_count > 0 before importing to the database
- Use callbacks for progress: Implement progress callbacks for long-running batch jobs
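A minimal sketch combining these practices, using the scan/parse/extract functions documented above; the chunk size and quality threshold are illustrative defaults:
from itertools import islice
from api.utils.conversation_parser import (
    scan_folder_for_conversations,
    parse_jsonl_conversation,
    extract_context_from_conversation
)
def iter_quality_contexts(base_path: str, chunk_size: int = 50, min_quality: float = 5.0):
    """Yield high-quality contexts in memory-friendly chunks (thresholds illustrative)."""
    files = iter(scan_folder_for_conversations(base_path))
    while chunk := list(islice(files, chunk_size)):
        for file_path in chunk:
            conversation = parse_jsonl_conversation(file_path)
            if conversation['message_count'] == 0:
                continue  # validate before import
            context = extract_context_from_conversation(conversation)
            context['source_file'] = file_path  # keep traceability
            if context['metrics']['quality_score'] > min_quality:
                yield context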
Error Handling
from api.utils.conversation_parser import parse_jsonl_conversation
def process_file(file_path: str):
    try:
        conversation = parse_jsonl_conversation(file_path)
        if conversation['message_count'] == 0:
            print("Warning: Empty conversation, skipping")
            return
        # Process conversation...
    except FileNotFoundError:
        print(f"File not found: {file_path}")
    except ValueError as e:
        print(f"Invalid file format: {e}")
    except Exception as e:
        print(f"Unexpected error: {e}")
Related Files
- context_compression.py: Provides compression utilities used by the parser
- test_conversation_parser.py: Comprehensive test suite with examples
- Database Models: api/models.py (ContextSnippet model for storage)
Future Enhancements
Potential improvements for future versions:
- Multi-language detection: Identify primary programming language
- Sentiment analysis: Detect problem-solving vs. exploratory conversations
- Entity extraction: Extract specific client names, project names, technologies
- Time-based patterns: Identify working hours, session patterns
- Conversation linking: Link related conversations by topic/project