Conversation Parser Usage Guide

Complete guide for using the ClaudeTools conversation transcript parser and intelligent categorizer.

Overview

The conversation parser extracts, analyzes, and categorizes conversation data from Claude Desktop/Code sessions. It intelligently classifies conversations as MSP Work, Development, or General and compresses them for efficient database storage.

Main Functions

1. parse_jsonl_conversation(file_path: str)

Parse conversation files (.jsonl or .json) and extract structured data.

Returns:

{
    "messages": [{"role": str, "content": str, "timestamp": str}, ...],
    "metadata": {"title": str, "model": str, "created_at": str, ...},
    "file_paths": [str, ...],           # Auto-extracted from content
    "tool_calls": [{"tool": str, "count": int}, ...],
    "duration_seconds": int,
    "message_count": int
}

Example:

from api.utils.conversation_parser import parse_jsonl_conversation

conversation = parse_jsonl_conversation("/path/to/conversation.jsonl")
print(f"Found {conversation['message_count']} messages")
print(f"Duration: {conversation['duration_seconds']} seconds")

2. categorize_conversation(messages: List[Dict])

Intelligently categorize conversation content using weighted keyword analysis.

Returns: "msp", "development", or "general"

Categorization Logic:

MSP Keywords (higher weight = stronger signal):

  • Client/Infrastructure: client, customer, site, firewall, network, server
  • Services: support, ticket, incident, billable, invoice
  • Microsoft 365: office365, azure, exchange, sharepoint, teams
  • MSP-specific: managed service, service desk, RDS, terminal server

Development Keywords:

  • API/Backend: api, endpoint, fastapi, flask, rest, webhook
  • Database: database, migration, alembic, sqlalchemy, postgresql
  • Code: implement, refactor, debug, test, pytest, function, class
  • Tools: docker, kubernetes, ci/cd, deployment
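
The gist of the weighting can be sketched as follows; the keyword sets, weights, and threshold here are illustrative assumptions, not the module's actual tables:

# Hedged sketch of weighted keyword scoring; weights and threshold are
# assumptions, not the values used in conversation_parser.py
MSP_KEYWORDS = {"client": 3, "site": 2, "firewall": 3, "ticket": 3, "office365": 3}
DEV_KEYWORDS = {"api": 3, "fastapi": 4, "sqlalchemy": 3, "migration": 3, "pytest": 3}

def score_messages(messages, keywords):
    text = " ".join(m["content"].lower() for m in messages)
    return sum(weight for kw, weight in keywords.items() if kw in text)

def rough_categorize(messages, threshold=5):
    msp = score_messages(messages, MSP_KEYWORDS)
    dev = score_messages(messages, DEV_KEYWORDS)
    if msp < threshold and dev < threshold:
        return "general"
    return "msp" if msp >= dev else "development"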

Example:

from api.utils.conversation_parser import categorize_conversation

# MSP conversation
messages = [
    {"role": "user", "content": "Client firewall blocking Office365"},
    {"role": "assistant", "content": "Checking client site configuration"}
]
category = categorize_conversation(messages)  # Returns "msp"

# Development conversation
messages = [
    {"role": "user", "content": "Build FastAPI endpoint with PostgreSQL"},
    {"role": "assistant", "content": "Creating API using SQLAlchemy"}
]
category = categorize_conversation(messages)  # Returns "development"

3. extract_context_from_conversation(conversation: Dict)

Extract dense, compressed context suitable for database storage.

Returns:

{
    "category": str,                    # "msp", "development", or "general"
    "summary": Dict,                    # From compress_conversation_summary()
    "tags": List[str],                  # Auto-extracted technology/topic tags
    "decisions": List[Dict],            # Key decisions with rationale
    "key_files": List[str],            # Top 20 file paths mentioned
    "key_tools": List[str],            # Top 10 tools used
    "metrics": {
        "message_count": int,
        "duration_seconds": int,
        "file_count": int,
        "tool_count": int,
        "decision_count": int,
        "quality_score": float         # 0-10 quality rating
    },
    "raw_metadata": Dict               # Original metadata
}

Quality Score Calculation:

  • More messages = higher quality (up to 5 points)
  • Decisions indicate depth (up to 2 points)
  • File mentions indicate concrete work (up to 2 points)
  • Sessions >5 minutes (+1 point)
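
Putting those rules together, a minimal sketch of the scoring (the caps follow the list above; the per-item rates are assumptions, not the parser's actual values):

def estimate_quality_score(message_count, decision_count, file_count, duration_seconds):
    # Caps of 5/2/2/+1 follow the rules above; per-item rates are assumed
    score = min(5.0, message_count * 0.25)    # assumption: 4 messages per point
    score += min(2.0, decision_count * 0.5)   # assumption: 2 decisions per point
    score += min(2.0, file_count * 0.2)       # assumption: 5 files per point
    if duration_seconds > 300:                # session longer than 5 minutes
        score += 1.0
    return round(score, 1)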

Example:

from api.utils.conversation_parser import (
    parse_jsonl_conversation,
    extract_context_from_conversation
)

# Parse and extract context
conversation = parse_jsonl_conversation("/path/to/file.jsonl")
context = extract_context_from_conversation(conversation)

print(f"Category: {context['category']}")
print(f"Tags: {context['tags']}")
print(f"Quality: {context['metrics']['quality_score']}/10")
print(f"Decisions: {len(context['decisions'])}")

4. scan_folder_for_conversations(base_path: str)

Recursively find all conversation files in a directory.

Features:

  • Finds both .jsonl and .json files
  • Automatically skips config files (config.json, settings.json)
  • Skips common non-conversation files (package.json, tsconfig.json)
  • Cross-platform path handling

Returns: List of absolute file paths
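
In spirit, the scan behaves like this hedged sketch (the skip list and traversal details are assumptions, not the module's actual implementation):

from pathlib import Path

# Assumed skip list; the module's actual exclusions may differ
SKIP_NAMES = {"config.json", "settings.json", "package.json", "tsconfig.json"}

def rough_scan(base_path):
    return [
        str(path.resolve())
        for path in Path(base_path).rglob("*.json*")
        if path.suffix in {".json", ".jsonl"} and path.name not in SKIP_NAMES
    ]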

Example:

from api.utils.conversation_parser import scan_folder_for_conversations

# Scan Claude Code sessions
files = scan_folder_for_conversations(
    r"C:\Users\MikeSwanson\claude-projects"
)

print(f"Found {len(files)} conversation files")
for file in files[:5]:
    print(f"  - {file}")

Complete Workflow Example

Batch Process Conversation Folder

from api.utils.conversation_parser import (
    scan_folder_for_conversations,
    parse_jsonl_conversation,
    extract_context_from_conversation
)

# 1. Scan for conversation files
base_path = r"C:\Users\MikeSwanson\claude-projects"
files = scan_folder_for_conversations(base_path)

# 2. Process each conversation
contexts = []
for file_path in files:
    try:
        # Parse conversation
        conversation = parse_jsonl_conversation(file_path)

        # Extract context
        context = extract_context_from_conversation(conversation)

        # Add source file
        context["source_file"] = file_path

        contexts.append(context)

        print(f"Processed: {file_path}")
        print(f"  Category: {context['category']}")
        print(f"  Messages: {context['metrics']['message_count']}")
        print(f"  Quality: {context['metrics']['quality_score']}/10")

    except Exception as e:
        print(f"Error processing {file_path}: {e}")

# 3. Categorize by type
msp_contexts = [c for c in contexts if c['category'] == 'msp']
dev_contexts = [c for c in contexts if c['category'] == 'development']

print(f"\nSummary:")
print(f"  MSP conversations: {len(msp_contexts)}")
print(f"  Development conversations: {len(dev_contexts)}")

Using the Batch Helper Function

from api.utils.conversation_parser import batch_process_conversations

def progress_callback(file_path, context):
    """Called for each processed file"""
    print(f"Processed: {context['category']} - {context['metrics']['quality_score']}/10")

# Process all conversations with callback
contexts = batch_process_conversations(
    r"C:\Users\MikeSwanson\claude-projects",
    output_callback=progress_callback
)

print(f"Total processed: {len(contexts)}")

Integration with Database

Insert Context into Database

from sqlalchemy.orm import Session
from api.models import ContextSnippet
from api.utils.conversation_parser import (
    parse_jsonl_conversation,
    extract_context_from_conversation
)

def import_conversation_to_db(db: Session, file_path: str):
    """Import a conversation file into the database."""

    # 1. Parse and extract context
    conversation = parse_jsonl_conversation(file_path)
    context = extract_context_from_conversation(conversation)

    # 2. Create context snippet for summary
    summary_snippet = ContextSnippet(
        content=str(context['summary']),
        snippet_type="session_summary",
        tags=context['tags'],
        importance=min(10, int(context['metrics']['quality_score'])),
        metadata={
            "category": context['category'],
            "source_file": file_path,
            "message_count": context['metrics']['message_count'],
            "duration_seconds": context['metrics']['duration_seconds']
        }
    )
    db.add(summary_snippet)

    # 3. Create decision snippets
    for decision in context['decisions']:
        decision_snippet = ContextSnippet(
            content=f"{decision['decision']} - {decision['rationale']}",
            snippet_type="decision",
            tags=context['tags'][:5],
            importance=7 if decision['impact'] == 'high' else 5,
            metadata={
                "category": context['category'],
                "impact": decision['impact'],
                "source_file": file_path
            }
        )
        db.add(decision_snippet)

    db.commit()
    print(f"Imported conversation from {file_path}")

CLI Quick Test

The module includes a standalone CLI for quick testing:

# Test a specific conversation file
python api/utils/conversation_parser.py /path/to/conversation.jsonl

# Output:
# Conversation: Build authentication system
# Category: development
# Messages: 15
# Duration: 1200s (20m)
# Tags: development, fastapi, postgresql, auth, api
# Quality: 7.5/10
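
The entry point amounts to something like this sketch (argument handling and the exact output lines are assumptions):

# Hedged sketch of the module's __main__ block; details are assumed
if __name__ == "__main__":
    import sys

    conversation = parse_jsonl_conversation(sys.argv[1])
    context = extract_context_from_conversation(conversation)
    print(f"Category: {context['category']}")
    print(f"Messages: {context['metrics']['message_count']}")
    print(f"Duration: {context['metrics']['duration_seconds']}s")
    print(f"Quality: {context['metrics']['quality_score']}/10")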

Categorization Examples

MSP Conversation

User: Client at BGBuilders site reported VPN connection issues
Assistant: I'll check the firewall configuration and VPN settings for the client

Category: msp
Score Logic: client (3), site (2), vpn (2), firewall (3) = 10 points

Development Conversation

User: Build a FastAPI REST API with PostgreSQL and implement JWT authentication
Assistant: I'll create the API endpoints using SQLAlchemy ORM and add JWT token support

Category: development
Score Logic: fastapi (4), api (3), postgresql (3), jwt (auth tag), sqlalchemy (3) = 13+ points

General Conversation

User: What's the best way to organize my project files?
Assistant: I recommend organizing by feature rather than by file type

Category: general
Score Logic: no strong MSP or development keywords; both scores stay low

Advanced Features

File Path Extraction

Automatically extracts file paths from conversation content:

conversation = parse_jsonl_conversation("/path/to/file.jsonl")
print(conversation['file_paths'])
# ['api/auth.py', 'api/models.py', 'tests/test_auth.py']

Supports:

  • Windows absolute paths: C:\Users\...\file.py
  • Unix absolute paths: /home/user/file.py
  • Relative paths: ./api/file.py, ../utils/helper.py
  • Code paths: api/auth.py, src/models.py
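
Patterns along these lines could match the formats above; they are illustrative assumptions, not the parser's actual regexes:

import re

# Assumed patterns, one per supported format above
PATH_PATTERNS = [
    re.compile(r"[A-Za-z]:\\[\w\\.\- ]+\.\w+"),         # Windows absolute
    re.compile(r"/(?:[\w.\-]+/)+[\w.\-]+\.\w+"),        # Unix absolute
    re.compile(r"\.{1,2}/(?:[\w.\-]+/)*[\w.\-]+\.\w+"), # relative ./ and ../
    re.compile(r"\b(?:[\w\-]+/)+[\w\-]+\.\w+\b"),       # bare code paths
]

def find_paths(text):
    hits = []
    for pattern in PATH_PATTERNS:
        hits.extend(pattern.findall(text))
    return sorted(set(hits))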

Tool Call Tracking

Automatically tracks which tools were used:

conversation = parse_jsonl_conversation("/path/to/file.jsonl")
print(conversation['tool_calls'])
# [
#   {"tool": "write", "count": 5},
#   {"tool": "read", "count": 3},
#   {"tool": "bash", "count": 2}
# ]
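
A hedged sketch of such aggregation, assuming tool invocations have already been collected as a flat list of names:

from collections import Counter

def summarize_tool_calls(tool_names):
    # Fold raw invocations into {"tool", "count"} records, most-used first,
    # mirroring the output shape shown above
    return [
        {"tool": tool, "count": count}
        for tool, count in Counter(tool_names).most_common()
    ]

summarize_tool_calls(["write", "read", "write", "bash", "write", "read"])
# [{'tool': 'write', 'count': 3}, {'tool': 'read', 'count': 2}, {'tool': 'bash', 'count': 1}]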

Best Practices

  1. Use quality scores to filter: Only import high-quality conversations (score > 5.0)
  2. Batch process in chunks: Process large folders in batches to manage memory
  3. Add source file tracking: Always include source_file in context for traceability
  4. Validate before import: Check message_count > 0 before importing to database
  5. Use callbacks for progress: Implement progress callbacks for long-running batch jobs
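
For instance, combining practices 1 and 4, a hedged pre-import filter might look like:

# Thresholds follow the practices above; adjust to taste
usable = [
    c for c in contexts
    if c['metrics']['message_count'] > 0
    and c['metrics']['quality_score'] > 5.0
]
print(f"Keeping {len(usable)} of {len(contexts)} conversations")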

Error Handling

Wrap parsing in a small helper so empty or malformed files can be skipped cleanly:

from api.utils.conversation_parser import parse_jsonl_conversation

def safe_parse(file_path):
    """Parse a conversation file, returning None on any failure."""
    try:
        conversation = parse_jsonl_conversation(file_path)

        if conversation['message_count'] == 0:
            print("Warning: Empty conversation, skipping")
            return None

        # Process conversation...
        return conversation

    except FileNotFoundError:
        print(f"File not found: {file_path}")
    except ValueError as e:
        print(f"Invalid file format: {e}")
    except Exception as e:
        print(f"Unexpected error: {e}")
    return None

Related Files

  • context_compression.py: Provides the compression utilities used by the parser
  • test_conversation_parser.py: Comprehensive test suite with examples
  • Database models: api/models.py contains the ContextSnippet model used for storage

Future Enhancements

Potential improvements for future versions:

  1. Multi-language detection: Identify primary programming language
  2. Sentiment analysis: Detect problem-solving vs. exploratory conversations
  3. Entity extraction: Extract specific client names, project names, technologies
  4. Time-based patterns: Identify working hours, session patterns
  5. Conversation linking: Link related conversations by topic/project