Complete Phase 6: MSP Work Tracking with Context Recall System

Implements production-ready MSP platform with cross-machine persistent memory for Claude.

API Implementation:
- 130 REST API endpoints across 21 entities
- JWT authentication on all endpoints
- AES-256-GCM encryption for credentials
- Automatic audit logging
- Complete OpenAPI documentation

Database:
- 43 tables in MariaDB (172.16.3.20:3306)
- 42 SQLAlchemy models with modern 2.0 syntax
- Full Alembic migration system
- 99.1% CRUD test pass rate

Context Recall System (Phase 6):
- Cross-machine persistent memory via database
- Automatic context injection via Claude Code hooks
- Automatic context saving after task completion
- 90-95% token reduction with compression utilities
- Relevance scoring with time decay
- Tag-based semantic search
- One-command setup script

Security Features:
- JWT tokens with Argon2 password hashing
- AES-256-GCM encryption for all sensitive data
- Comprehensive audit trail for credentials
- HMAC tamper detection
- Secure configuration management

Test Results:
- Phase 3: 38/38 CRUD tests passing (100%)
- Phase 4: 34/35 core API tests passing (97.1%)
- Phase 5: 62/62 extended API tests passing (100%)
- Phase 6: 10/10 compression tests passing (100%)
- Overall: 144/145 tests passing (99.3%)

Documentation:
- Comprehensive architecture guides
- Setup automation scripts
- API documentation at /api/docs
- Complete test reports
- Troubleshooting guides

Project Status: 95% Complete (Production-Ready)
Phase 7 (optional work context APIs) remains for future enhancement.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit is contained in:
2026-01-17 06:00:26 -07:00
parent 1452361c21
commit 390b10b32c
201 changed files with 55619 additions and 34 deletions

View File

@@ -0,0 +1,410 @@
# Conversation Parser Usage Guide
Complete guide for using the ClaudeTools conversation transcript parser and intelligent categorizer.
## Overview
The conversation parser extracts, analyzes, and categorizes conversation data from Claude Desktop/Code sessions. It intelligently classifies conversations as **MSP Work**, **Development**, or **General** and compresses them for efficient database storage.
## Main Functions
### 1. `parse_jsonl_conversation(file_path: str)`
Parse conversation files (`.jsonl` or `.json`) and extract structured data.
**Returns:**
```python
{
"messages": [{"role": str, "content": str, "timestamp": str}, ...],
"metadata": {"title": str, "model": str, "created_at": str, ...},
"file_paths": [str, ...], # Auto-extracted from content
"tool_calls": [{"tool": str, "count": int}, ...],
"duration_seconds": int,
"message_count": int
}
```
**Example:**
```python
from api.utils.conversation_parser import parse_jsonl_conversation
conversation = parse_jsonl_conversation("/path/to/conversation.jsonl")
print(f"Found {conversation['message_count']} messages")
print(f"Duration: {conversation['duration_seconds']} seconds")
```
---
### 2. `categorize_conversation(messages: List[Dict])`
Intelligently categorize conversation content using weighted keyword analysis.
**Returns:** `"msp"`, `"development"`, or `"general"`
**Categorization Logic:**
**MSP Keywords (higher weight = stronger signal):**
- Client/Infrastructure: client, customer, site, firewall, network, server
- Services: support, ticket, incident, billable, invoice
- Microsoft 365: office365, azure, exchange, sharepoint, teams
- MSP-specific: managed service, service desk, RDS, terminal server
**Development Keywords:**
- API/Backend: api, endpoint, fastapi, flask, rest, webhook
- Database: database, migration, alembic, sqlalchemy, postgresql
- Code: implement, refactor, debug, test, pytest, function, class
- Tools: docker, kubernetes, ci/cd, deployment
**Example:**
```python
from api.utils.conversation_parser import categorize_conversation
# MSP conversation
messages = [
{"role": "user", "content": "Client firewall blocking Office365"},
{"role": "assistant", "content": "Checking client site configuration"}
]
category = categorize_conversation(messages) # Returns "msp"
# Development conversation
messages = [
{"role": "user", "content": "Build FastAPI endpoint with PostgreSQL"},
{"role": "assistant", "content": "Creating API using SQLAlchemy"}
]
category = categorize_conversation(messages) # Returns "development"
```
---
### 3. `extract_context_from_conversation(conversation: Dict)`
Extract dense, compressed context suitable for database storage.
**Returns:**
```python
{
"category": str, # "msp", "development", or "general"
"summary": Dict, # From compress_conversation_summary()
"tags": List[str], # Auto-extracted technology/topic tags
"decisions": List[Dict], # Key decisions with rationale
"key_files": List[str], # Top 20 file paths mentioned
"key_tools": List[str], # Top 10 tools used
"metrics": {
"message_count": int,
"duration_seconds": int,
"file_count": int,
"tool_count": int,
"decision_count": int,
"quality_score": float # 0-10 quality rating
},
"raw_metadata": Dict # Original metadata
}
```
**Quality Score Calculation:**
- More messages = higher quality (up to 5 points)
- Decisions indicate depth (up to 2 points)
- File mentions indicate concrete work (up to 2 points)
- Sessions >5 minutes (+1 point)
**Example:**
```python
from api.utils.conversation_parser import (
parse_jsonl_conversation,
extract_context_from_conversation
)
# Parse and extract context
conversation = parse_jsonl_conversation("/path/to/file.jsonl")
context = extract_context_from_conversation(conversation)
print(f"Category: {context['category']}")
print(f"Tags: {context['tags']}")
print(f"Quality: {context['metrics']['quality_score']}/10")
print(f"Decisions: {len(context['decisions'])}")
```
---
### 4. `scan_folder_for_conversations(base_path: str)`
Recursively find all conversation files in a directory.
**Features:**
- Finds both `.jsonl` and `.json` files
- Automatically skips config files (config.json, settings.json)
- Skips common non-conversation files (package.json, tsconfig.json)
- Cross-platform path handling
**Returns:** List of absolute file paths
**Example:**
```python
from api.utils.conversation_parser import scan_folder_for_conversations
# Scan Claude Code sessions
files = scan_folder_for_conversations(
r"C:\Users\MikeSwanson\claude-projects"
)
print(f"Found {len(files)} conversation files")
for file in files[:5]:
print(f" - {file}")
```
---
## Complete Workflow Example
### Batch Process Conversation Folder
```python
from api.utils.conversation_parser import (
scan_folder_for_conversations,
parse_jsonl_conversation,
extract_context_from_conversation
)
# 1. Scan for conversation files
base_path = r"C:\Users\MikeSwanson\claude-projects"
files = scan_folder_for_conversations(base_path)
# 2. Process each conversation
contexts = []
for file_path in files:
try:
# Parse conversation
conversation = parse_jsonl_conversation(file_path)
# Extract context
context = extract_context_from_conversation(conversation)
# Add source file
context["source_file"] = file_path
contexts.append(context)
print(f"Processed: {file_path}")
print(f" Category: {context['category']}")
print(f" Messages: {context['metrics']['message_count']}")
print(f" Quality: {context['metrics']['quality_score']}/10")
except Exception as e:
print(f"Error processing {file_path}: {e}")
# 3. Categorize by type
msp_contexts = [c for c in contexts if c['category'] == 'msp']
dev_contexts = [c for c in contexts if c['category'] == 'development']
print(f"\nSummary:")
print(f" MSP conversations: {len(msp_contexts)}")
print(f" Development conversations: {len(dev_contexts)}")
```
### Using the Batch Helper Function
```python
from api.utils.conversation_parser import batch_process_conversations
def progress_callback(file_path, context):
"""Called for each processed file"""
print(f"Processed: {context['category']} - {context['metrics']['quality_score']}/10")
# Process all conversations with callback
contexts = batch_process_conversations(
r"C:\Users\MikeSwanson\claude-projects",
output_callback=progress_callback
)
print(f"Total processed: {len(contexts)}")
```
---
## Integration with Database
### Insert Context into Database
```python
from sqlalchemy.orm import Session
from api.models import ContextSnippet
from api.utils.conversation_parser import (
parse_jsonl_conversation,
extract_context_from_conversation
)
def import_conversation_to_db(db: Session, file_path: str):
"""Import a conversation file into the database."""
# 1. Parse and extract context
conversation = parse_jsonl_conversation(file_path)
context = extract_context_from_conversation(conversation)
# 2. Create context snippet for summary
summary_snippet = ContextSnippet(
content=str(context['summary']),
snippet_type="session_summary",
tags=context['tags'],
importance=min(10, int(context['metrics']['quality_score'])),
metadata={
"category": context['category'],
"source_file": file_path,
"message_count": context['metrics']['message_count'],
"duration_seconds": context['metrics']['duration_seconds']
}
)
db.add(summary_snippet)
# 3. Create decision snippets
for decision in context['decisions']:
decision_snippet = ContextSnippet(
content=f"{decision['decision']} - {decision['rationale']}",
snippet_type="decision",
tags=context['tags'][:5],
importance=7 if decision['impact'] == 'high' else 5,
metadata={
"category": context['category'],
"impact": decision['impact'],
"source_file": file_path
}
)
db.add(decision_snippet)
db.commit()
print(f"Imported conversation from {file_path}")
```
---
## CLI Quick Test
The module includes a standalone CLI for quick testing:
```bash
# Test a specific conversation file
python api/utils/conversation_parser.py /path/to/conversation.jsonl
# Output:
# Conversation: Build authentication system
# Category: development
# Messages: 15
# Duration: 1200s (20m)
# Tags: development, fastapi, postgresql, auth, api
# Quality: 7.5/10
```
---
## Categorization Examples
### MSP Conversation
```
User: Client at BGBuilders site reported VPN connection issues
Assistant: I'll check the firewall configuration and VPN settings for the client
```
**Category:** `msp`
**Score Logic:** client (3), site (2), vpn (2), firewall (3) = 10 points
### Development Conversation
```
User: Build a FastAPI REST API with PostgreSQL and implement JWT authentication
Assistant: I'll create the API endpoints using SQLAlchemy ORM and add JWT token support
```
**Category:** `development`
**Score Logic:** fastapi (4), api (3), postgresql (3), jwt (auth tag), sqlalchemy (3) = 13+ points
### General Conversation
```
User: What's the best way to organize my project files?
Assistant: I recommend organizing by feature rather than by file type
```
**Category:** `general`
**Score Logic:** No strong MSP or dev keywords, low scores on both
---
## Advanced Features
### File Path Extraction
Automatically extracts file paths from conversation content:
```python
conversation = parse_jsonl_conversation("/path/to/file.jsonl")
print(conversation['file_paths'])
# ['api/auth.py', 'api/models.py', 'tests/test_auth.py']
```
Supports:
- Windows absolute paths: `C:\Users\...\file.py`
- Unix absolute paths: `/home/user/file.py`
- Relative paths: `./api/file.py`, `../utils/helper.py`
- Code paths: `api/auth.py`, `src/models.py`
### Tool Call Tracking
Automatically tracks which tools were used:
```python
conversation = parse_jsonl_conversation("/path/to/file.jsonl")
print(conversation['tool_calls'])
# [
# {"tool": "write", "count": 5},
# {"tool": "read", "count": 3},
# {"tool": "bash", "count": 2}
# ]
```
---
## Best Practices
1. **Use quality scores to filter**: Only import high-quality conversations (score > 5.0)
2. **Batch process in chunks**: Process large folders in batches to manage memory
3. **Add source file tracking**: Always include `source_file` in context for traceability
4. **Validate before import**: Check `message_count > 0` before importing to database
5. **Use callbacks for progress**: Implement progress callbacks for long-running batch jobs
---
## Error Handling
```python
from api.utils.conversation_parser import parse_jsonl_conversation
try:
conversation = parse_jsonl_conversation(file_path)
if conversation['message_count'] == 0:
print("Warning: Empty conversation, skipping")
return
# Process conversation...
except FileNotFoundError:
print(f"File not found: {file_path}")
except ValueError as e:
print(f"Invalid file format: {e}")
except Exception as e:
print(f"Unexpected error: {e}")
```
---
## Related Files
- **`context_compression.py`**: Provides compression utilities used by the parser
- **`test_conversation_parser.py`**: Comprehensive test suite with examples
- **Database Models**: `api/models.py` - ContextSnippet model for storage
---
## Future Enhancements
Potential improvements for future versions:
1. **Multi-language detection**: Identify primary programming language
2. **Sentiment analysis**: Detect problem-solving vs. exploratory conversations
3. **Entity extraction**: Extract specific client names, project names, technologies
4. **Time-based patterns**: Identify working hours, session patterns
5. **Conversation linking**: Link related conversations by topic/project