# Conversation Parser Usage Guide

Complete guide for using the ClaudeTools conversation transcript parser and intelligent categorizer.

## Overview

The conversation parser extracts, analyzes, and categorizes conversation data from Claude Desktop/Code sessions. It intelligently classifies conversations as **MSP Work**, **Development**, or **General** and compresses them for efficient database storage.

## Main Functions

### 1. `parse_jsonl_conversation(file_path: str)`

Parse conversation files (`.jsonl` or `.json`) and extract structured data.

**Returns:**

```python
{
    "messages": [{"role": str, "content": str, "timestamp": str}, ...],
    "metadata": {"title": str, "model": str, "created_at": str, ...},
    "file_paths": [str, ...],  # Auto-extracted from content
    "tool_calls": [{"tool": str, "count": int}, ...],
    "duration_seconds": int,
    "message_count": int
}
```

**Example:**

```python
from api.utils.conversation_parser import parse_jsonl_conversation

conversation = parse_jsonl_conversation("/path/to/conversation.jsonl")
print(f"Found {conversation['message_count']} messages")
print(f"Duration: {conversation['duration_seconds']} seconds")
```

---

### 2. `categorize_conversation(messages: List[Dict])`

Intelligently categorize conversation content using weighted keyword analysis.

**Returns:** `"msp"`, `"development"`, or `"general"`

**Categorization Logic:**

**MSP Keywords (higher weight = stronger signal):**
- Client/Infrastructure: client, customer, site, firewall, network, server
- Services: support, ticket, incident, billable, invoice
- Microsoft 365: office365, azure, exchange, sharepoint, teams
- MSP-specific: managed service, service desk, RDS, terminal server

**Development Keywords:**
- API/Backend: api, endpoint, fastapi, flask, rest, webhook
- Database: database, migration, alembic, sqlalchemy, postgresql
- Code: implement, refactor, debug, test, pytest, function, class
- Tools: docker, kubernetes, ci/cd, deployment

**Example:**

```python
from api.utils.conversation_parser import categorize_conversation

# MSP conversation
messages = [
    {"role": "user", "content": "Client firewall blocking Office365"},
    {"role": "assistant", "content": "Checking client site configuration"}
]
category = categorize_conversation(messages)  # Returns "msp"

# Development conversation
messages = [
    {"role": "user", "content": "Build FastAPI endpoint with PostgreSQL"},
    {"role": "assistant", "content": "Creating API using SQLAlchemy"}
]
category = categorize_conversation(messages)  # Returns "development"
```
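For intuition, here is a minimal sketch of how weighted keyword scoring of this kind works. The keyword tables, weights, and tie-breaking below are illustrative assumptions for the sketch, not the module's actual lists or thresholds:

```python
from typing import Dict, List

# Illustrative weight tables -- NOT the module's actual keyword lists.
MSP_WEIGHTS = {"client": 3, "firewall": 3, "office365": 3, "site": 2, "ticket": 2}
DEV_WEIGHTS = {"fastapi": 4, "api": 3, "postgresql": 3, "sqlalchemy": 3, "refactor": 2}

def sketch_categorize(messages: List[Dict]) -> str:
    """Tally weighted keyword hits and pick the higher-scoring category."""
    text = " ".join(m.get("content", "") for m in messages).lower()
    msp_score = sum(w for kw, w in MSP_WEIGHTS.items() if kw in text)
    dev_score = sum(w for kw, w in DEV_WEIGHTS.items() if kw in text)
    if msp_score == 0 and dev_score == 0:
        return "general"  # neither signal present
    return "msp" if msp_score >= dev_score else "development"

messages = [{"role": "user", "content": "Client firewall blocking Office365"}]
print(sketch_categorize(messages))  # msp
```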
---

### 3. `extract_context_from_conversation(conversation: Dict)`

Extract dense, compressed context suitable for database storage.

**Returns:**

```python
{
    "category": str,          # "msp", "development", or "general"
    "summary": Dict,          # From compress_conversation_summary()
    "tags": List[str],        # Auto-extracted technology/topic tags
    "decisions": List[Dict],  # Key decisions with rationale
    "key_files": List[str],   # Top 20 file paths mentioned
    "key_tools": List[str],   # Top 10 tools used
    "metrics": {
        "message_count": int,
        "duration_seconds": int,
        "file_count": int,
        "tool_count": int,
        "decision_count": int,
        "quality_score": float  # 0-10 quality rating
    },
    "raw_metadata": Dict      # Original metadata
}
```

**Quality Score Calculation** (a code sketch of this heuristic follows the example below):
- More messages = higher quality (up to 5 points)
- Decisions indicate depth (up to 2 points)
- File mentions indicate concrete work (up to 2 points)
- Sessions longer than 5 minutes (+1 point)

**Example:**

```python
from api.utils.conversation_parser import (
    parse_jsonl_conversation,
    extract_context_from_conversation
)

# Parse and extract context
conversation = parse_jsonl_conversation("/path/to/file.jsonl")
context = extract_context_from_conversation(conversation)

print(f"Category: {context['category']}")
print(f"Tags: {context['tags']}")
print(f"Quality: {context['metrics']['quality_score']}/10")
print(f"Decisions: {len(context['decisions'])}")
```
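As a rough illustration, the heuristic above could be expressed like this. The per-component caps come straight from the list; the scaling factors (messages per point, points per decision, and so on) are assumptions for the sketch, not the module's exact constants:

```python
def sketch_quality_score(message_count: int, decision_count: int,
                         file_count: int, duration_seconds: int) -> float:
    """Approximate the 0-10 quality heuristic described above."""
    score = min(5.0, message_count / 5)      # more messages, up to 5 points
    score += min(2.0, decision_count * 0.5)  # decisions indicate depth, up to 2 points
    score += min(2.0, file_count * 0.25)     # file mentions = concrete work, up to 2 points
    if duration_seconds > 300:               # sessions longer than 5 minutes
        score += 1.0
    return round(score, 1)

print(sketch_quality_score(15, 2, 3, 1200))  # 5.8
```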
---

### 4. `scan_folder_for_conversations(base_path: str)`

Recursively find all conversation files in a directory.

**Features:**
- Finds both `.jsonl` and `.json` files
- Automatically skips config files (config.json, settings.json)
- Skips common non-conversation files (package.json, tsconfig.json)
- Cross-platform path handling

**Returns:** List of absolute file paths

**Example:**

```python
from api.utils.conversation_parser import scan_folder_for_conversations

# Scan Claude Code sessions
files = scan_folder_for_conversations(
    r"C:\Users\MikeSwanson\claude-projects"
)

print(f"Found {len(files)} conversation files")
for file in files[:5]:
    print(f"  - {file}")
```

---

## Complete Workflow Example

### Batch Process Conversation Folder

```python
from api.utils.conversation_parser import (
    scan_folder_for_conversations,
    parse_jsonl_conversation,
    extract_context_from_conversation
)

# 1. Scan for conversation files
base_path = r"C:\Users\MikeSwanson\claude-projects"
files = scan_folder_for_conversations(base_path)

# 2. Process each conversation
contexts = []
for file_path in files:
    try:
        # Parse conversation
        conversation = parse_jsonl_conversation(file_path)

        # Extract context
        context = extract_context_from_conversation(conversation)

        # Add source file
        context["source_file"] = file_path
        contexts.append(context)

        print(f"Processed: {file_path}")
        print(f"  Category: {context['category']}")
        print(f"  Messages: {context['metrics']['message_count']}")
        print(f"  Quality: {context['metrics']['quality_score']}/10")
    except Exception as e:
        print(f"Error processing {file_path}: {e}")

# 3. Categorize by type
msp_contexts = [c for c in contexts if c['category'] == 'msp']
dev_contexts = [c for c in contexts if c['category'] == 'development']

print("\nSummary:")
print(f"  MSP conversations: {len(msp_contexts)}")
print(f"  Development conversations: {len(dev_contexts)}")
```

### Using the Batch Helper Function

```python
from api.utils.conversation_parser import batch_process_conversations

def progress_callback(file_path, context):
    """Called for each processed file."""
    print(f"Processed: {context['category']} - {context['metrics']['quality_score']}/10")

# Process all conversations with callback
contexts = batch_process_conversations(
    r"C:\Users\MikeSwanson\claude-projects",
    output_callback=progress_callback
)

print(f"Total processed: {len(contexts)}")
```

---

## Integration with Database

### Insert Context into Database

```python
from sqlalchemy.orm import Session

from api.models import ContextSnippet
from api.utils.conversation_parser import (
    parse_jsonl_conversation,
    extract_context_from_conversation
)

def import_conversation_to_db(db: Session, file_path: str):
    """Import a conversation file into the database."""
    # 1. Parse and extract context
    conversation = parse_jsonl_conversation(file_path)
    context = extract_context_from_conversation(conversation)

    # 2. Create context snippet for summary
    summary_snippet = ContextSnippet(
        content=str(context['summary']),
        snippet_type="session_summary",
        tags=context['tags'],
        importance=min(10, int(context['metrics']['quality_score'])),
        metadata={
            "category": context['category'],
            "source_file": file_path,
            "message_count": context['metrics']['message_count'],
            "duration_seconds": context['metrics']['duration_seconds']
        }
    )
    db.add(summary_snippet)

    # 3. Create decision snippets
    for decision in context['decisions']:
        decision_snippet = ContextSnippet(
            content=f"{decision['decision']} - {decision['rationale']}",
            snippet_type="decision",
            tags=context['tags'][:5],
            importance=7 if decision['impact'] == 'high' else 5,
            metadata={
                "category": context['category'],
                "impact": decision['impact'],
                "source_file": file_path
            }
        )
        db.add(decision_snippet)

    db.commit()
    print(f"Imported conversation from {file_path}")
```

---

## CLI Quick Test

The module includes a standalone CLI for quick testing:

```bash
# Test a specific conversation file
python api/utils/conversation_parser.py /path/to/conversation.jsonl

# Output:
# Conversation: Build authentication system
# Category: development
# Messages: 15
# Duration: 1200s (20m)
# Tags: development, fastapi, postgresql, auth, api
# Quality: 7.5/10
```

---

## Categorization Examples

### MSP Conversation

```
User: Client at BGBuilders site reported VPN connection issues
Assistant: I'll check the firewall configuration and VPN settings for the client
```

**Category:** `msp`
**Score Logic:** client (3), site (2), vpn (2), firewall (3) = 10 points

### Development Conversation

```
User: Build a FastAPI REST API with PostgreSQL and implement JWT authentication
Assistant: I'll create the API endpoints using SQLAlchemy ORM and add JWT token support
```

**Category:** `development`
**Score Logic:** fastapi (4), api (3), postgresql (3), jwt (auth tag), sqlalchemy (3) = 13+ points

### General Conversation

```
User: What's the best way to organize my project files?
Assistant: I recommend organizing by feature rather than by file type
```

**Category:** `general`
**Score Logic:** No strong MSP or dev keywords, low scores on both

---

## Advanced Features

### File Path Extraction

Automatically extracts file paths from conversation content:

```python
conversation = parse_jsonl_conversation("/path/to/file.jsonl")
print(conversation['file_paths'])
# ['api/auth.py', 'api/models.py', 'tests/test_auth.py']
```

Supports:
- Windows absolute paths: `C:\Users\...\file.py`
- Unix absolute paths: `/home/user/file.py`
- Relative paths: `./api/file.py`, `../utils/helper.py`
- Code paths: `api/auth.py`, `src/models.py`
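For intuition, here is a rough sketch of how those formats might be matched with regular expressions. The patterns are illustrative assumptions, not the module's actual patterns, and are looser than a production extractor would need (quoting, trailing punctuation, and spaces in paths are not handled):

```python
import re
from typing import List

# One illustrative pattern per supported format above.
PATH_PATTERNS = [
    r"[A-Za-z]:\\(?:[\w.-]+\\)*[\w.-]+\.\w+",        # Windows absolute: C:\Users\...\file.py
    r"\.{1,2}/(?:[\w.-]+/)*[\w.-]+\.\w+",            # relative: ./api/file.py, ../utils/helper.py
    r"(?<![\w.])/(?:[\w.-]+/)*[\w.-]+\.\w+",         # Unix absolute: /home/user/file.py
    r"(?<![\w./])[\w-]+(?:/[\w.-]+)+\.\w+",          # bare code paths: api/auth.py
]

def sketch_extract_paths(text: str) -> List[str]:
    """Collect unique path-like strings from free text."""
    found = set()
    for pattern in PATH_PATTERNS:
        found.update(re.findall(pattern, text))
    return sorted(found)

print(sketch_extract_paths("Edited api/auth.py and /home/user/app.log"))
# ['/home/user/app.log', 'api/auth.py']
```

Note that the parser caps what it keeps: only the top 20 paths surface in `key_files`.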
### Tool Call Tracking

Automatically tracks which tools were used:

```python
conversation = parse_jsonl_conversation("/path/to/file.jsonl")
print(conversation['tool_calls'])
# [
#     {"tool": "write", "count": 5},
#     {"tool": "read", "count": 3},
#     {"tool": "bash", "count": 2}
# ]
```

---

## Best Practices

1. **Use quality scores to filter**: Only import high-quality conversations (score > 5.0)
2. **Batch process in chunks**: Process large folders in batches to manage memory
3. **Add source file tracking**: Always include `source_file` in context for traceability
4. **Validate before import**: Check `message_count > 0` before importing to database
5. **Use callbacks for progress**: Implement progress callbacks for long-running batch jobs

---

## Error Handling

```python
from api.utils.conversation_parser import parse_jsonl_conversation

def process_file(file_path: str) -> None:
    """Parse one conversation file, skipping empty or unreadable ones."""
    try:
        conversation = parse_jsonl_conversation(file_path)

        if conversation['message_count'] == 0:
            print("Warning: Empty conversation, skipping")
            return

        # Process conversation...

    except FileNotFoundError:
        print(f"File not found: {file_path}")
    except ValueError as e:
        print(f"Invalid file format: {e}")
    except Exception as e:
        print(f"Unexpected error: {e}")
```

---

## Related Files

- **`context_compression.py`**: Provides compression utilities used by the parser
- **`test_conversation_parser.py`**: Comprehensive test suite with examples
- **`api/models.py`**: Defines the `ContextSnippet` model used for storage

---

## Future Enhancements

Potential improvements for future versions:

1. **Multi-language detection**: Identify the primary programming language
2. **Sentiment analysis**: Detect problem-solving vs. exploratory conversations
3. **Entity extraction**: Extract specific client names, project names, technologies
4. **Time-based patterns**: Identify working hours, session patterns
5. **Conversation linking**: Link related conversations by topic/project