Bulk Import Implementation Summary
Overview
Bulk import functionality has been implemented for the ClaudeTools context recall system. It automates the import of conversation histories from Claude Desktop/Code into the ClaudeTools database for context persistence and retrieval.
Components Delivered
1. API Endpoint (api/routers/bulk_import.py)
Endpoint: POST /api/bulk-import/import-folder
Features:
- Scans folder recursively for .jsonl and .json conversation files
- Parses conversation structure using intelligent parser
- Extracts metadata, decisions, and context
- Automatic conversation categorization (MSP, Development, General)
- Quality scoring (0-10) based on content depth
- Dry-run mode for preview without database changes
- Comprehensive error handling with detailed error reporting
- Optional project/session association
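The recursive scan in the first feature can be sketched with pathlib. This is a minimal illustration, not the code in api/routers/bulk_import.py: the function name find_conversation_files and the case-insensitive suffix match are assumptions.

```python
from pathlib import Path

# Supported conversation formats, as listed in the features above.
CONVERSATION_SUFFIXES = {".jsonl", ".json"}


def find_conversation_files(root):
    """Return a sorted list of conversation files found anywhere under root.

    Hypothetical helper: the real scanner may filter differently.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        raise NotADirectoryError(f"Not a folder: {root}")
    return sorted(
        path
        for path in root_path.rglob("*")
        if path.is_file() and path.suffix.lower() in CONVERSATION_SUFFIXES
    )
```

Returning a sorted list keeps dry-run previews stable between runs, which makes it easier to compare a preview against the later import.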
Parameters:
- folder_path (required): Path to Claude projects folder
- dry_run (default: false): Preview mode
- project_id (optional): Associate with specific project
- session_id (optional): Associate with specific session
Response Structure:
{
  "dry_run": false,
  "folder_path": "/path/to/conversations",
  "files_scanned": 15,
  "files_processed": 14,
  "contexts_created": 14,
  "errors": [],
  "contexts_preview": [
    {
      "file": "conversation1.jsonl",
      "title": "Build authentication system",
      "type": "project_state",
      "category": "development",
      "message_count": 45,
      "tags": ["api", "fastapi", "auth", "jwt"],
      "relevance_score": 8.5,
      "quality_score": 8.5
    }
  ],
  "summary": "Scanned 15 files | Processed 14 successfully | Created 14 contexts"
}
Status Endpoint: GET /api/bulk-import/import-status
Returns system capabilities and supported formats.
2. Command-Line Import Script (scripts/import-claude-context.py)
Usage:
# Preview import (dry run)
python scripts/import-claude-context.py --folder "C:\Users\MikeSwanson\claude-projects" --dry-run
# Execute import
python scripts/import-claude-context.py --folder "C:\Users\MikeSwanson\claude-projects" --execute
# Associate with project
python scripts/import-claude-context.py --folder "C:\Users\MikeSwanson\claude-projects" --execute --project-id abc-123
Features:
- JWT token authentication from .claude/context-recall-config.env
- Configurable API base URL
- Rich console output with progress display
- Error reporting and summary statistics
- Cross-platform path support
Configuration File: .claude/context-recall-config.env
JWT_TOKEN=your-jwt-token-here
API_BASE_URL=http://localhost:8000
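As a rough illustration of how a script could read this file, the sketch below parses KEY=VALUE lines and builds the Bearer header. The helper names load_env_file and build_auth_headers are hypothetical, and the actual script may use a library such as python-dotenv instead.

```python
from pathlib import Path


def load_env_file(path):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Tolerate optional surrounding quotes around the value.
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def build_auth_headers(settings):
    """Build the Authorization header from the JWT_TOKEN setting."""
    token = settings.get("JWT_TOKEN")
    if not token:
        raise ValueError("JWT_TOKEN is missing from the config file")
    return {"Authorization": f"Bearer {token}"}
```

Failing early on a missing token gives a clearer message than letting the API reject the request later.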
3. API Main Router Update (api/main.py)
Registered bulk_import router with:
- Prefix: /api/bulk-import
- Tag: Bulk Import
Now accessible via:
- POST http://localhost:8000/api/bulk-import/import-folder
- GET http://localhost:8000/api/bulk-import/import-status
4. Supporting Utilities
Conversation Parser (api/utils/conversation_parser.py)
Previously created and enhanced. Provides:
- parse_jsonl_conversation(): Parse .jsonl/.json files
- extract_context_from_conversation(): Extract rich context
- categorize_conversation(): Intelligent categorization
- scan_folder_for_conversations(): Recursive file scanning
Categorization Algorithm:
- Keyword-based scoring with weighted terms
- Code pattern detection
- Ticket/incident pattern matching
- Heuristic analysis for classification confidence
Categories:
- msp: Client support, infrastructure, incidents
- development: Code, APIs, features, testing
- general: Other conversations
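The categorization approach described above (weighted keywords, code and ticket patterns, and a confidence threshold) can be sketched as follows. Every keyword, weight, regular expression, and the MIN_SCORE threshold here is an illustrative assumption, not a value taken from categorize_conversation().

```python
import re

# Hypothetical weighted terms per category.
KEYWORD_WEIGHTS = {
    "msp": {"ticket": 2.0, "client": 1.5, "firewall": 1.5, "outage": 2.0},
    "development": {"api": 1.5, "refactor": 2.0, "endpoint": 1.5, "pytest": 2.0},
}
# Hypothetical patterns hinting at source code or ticket references.
CODE_PATTERN = re.compile(r"\bdef \w+\(|\bclass \w+|\bimport \w+")
TICKET_PATTERN = re.compile(r"\b(?:ticket|incident)\s*#?\d+", re.IGNORECASE)
MIN_SCORE = 3.0  # below this, the conversation falls back to "general"


def categorize(text):
    """Return "msp", "development", or "general" for a conversation text."""
    lowered = text.lower()
    scores = {}
    for category, weights in KEYWORD_WEIGHTS.items():
        scores[category] = sum(
            weight * len(re.findall(rf"\b{re.escape(word)}\b", lowered))
            for word, weight in weights.items()
        )
    # Pattern matches add a fixed bonus on top of the keyword score.
    if CODE_PATTERN.search(text):
        scores["development"] += 2.0
    if TICKET_PATTERN.search(text):
        scores["msp"] += 2.0
    best = max(scores, key=scores.get)
    return best if scores[best] >= MIN_SCORE else "general"
```

The threshold is what keeps a single stray keyword from pulling an unrelated conversation out of the general category.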
Credential Scanner (api/utils/credential_scanner.py)
Previously created. Provides file-based credential scanning (separate from conversation import):
- scan_for_credential_files(): Find credential files
- parse_credential_file(): Extract credentials from various formats
- import_credentials_to_db(): Import with encryption
Database Schema Integration
Contexts are stored in conversation_contexts table with:
- title: Conversation title or generated name
- dense_summary: Compressed summary with metrics
- key_decisions: JSON array of extracted decisions
- tags: JSON array of categorization tags
- context_type: Mapped from category (session_summary, project_state, general_context)
- relevance_score: Quality-based score (0.0-10.0)
- project_id / session_id: Optional associations
Intelligent Features
Automatic Categorization
Conversations are automatically classified using:
- Keyword Analysis: Weighted scoring of domain-specific terms
- Pattern Matching: Code blocks, file paths, ticket references
- Heuristic Scoring: Threshold-based confidence determination
Quality Scoring
Quality scores (0-10) calculated from:
- Message count (more = higher quality)
- Decision count (decisions = depth)
- File references (concrete work)
- Session duration (longer = more substantial)
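One way to combine these four signals into a 0-10 score is a capped weighted sum, sketched below. The weights and caps are assumptions chosen for illustration; the source does not state the actual formula.

```python
def quality_score(message_count, decision_count, file_reference_count, duration_minutes):
    """Combine four capped signals into a 0-10 score (illustrative weights)."""

    def capped(value, cap):
        # Scale a raw count to the range 0..1, saturating at the cap.
        return min(max(value, 0), cap) / cap

    score = (
        4.0 * capped(message_count, 50)
        + 3.0 * capped(decision_count, 10)
        + 2.0 * capped(file_reference_count, 20)
        + 1.0 * capped(duration_minutes, 120)
    )
    return round(score, 1)
```

Because the weights sum to 10 and each signal saturates at its cap, a very long conversation cannot push the score past the top of the scale.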
Context Compression
Dense summaries include:
- Token-optimized text compression
- Key decision extraction
- File path tracking
- Tool usage statistics
- Temporal metrics
Security Features
- JWT authentication required for all endpoints
- User authorization validation
- Input validation and sanitization
- Error messages don't leak sensitive paths
- Dry-run mode prevents accidental imports
Error Handling
Comprehensive error handling with:
- File-level error isolation (one failure doesn't stop batch)
- Detailed error messages with file names
- HTTP exception mapping
- Graceful fallback for malformed files
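File-level isolation can be sketched as a loop that catches errors per file and records them instead of aborting the batch. The helpers parse_jsonl and import_files and the shape of each error entry are assumptions; only the files_scanned, files_processed, and errors field names come from the response structure shown earlier.

```python
import json
from pathlib import Path


def parse_jsonl(path):
    """Parse one JSON object per non-empty line, reporting the failing line."""
    messages = []
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue
            try:
                messages.append(json.loads(line))
            except json.JSONDecodeError as exc:
                raise ValueError(f"line {line_number}: {exc.msg}") from exc
    return messages


def import_files(paths):
    """Process every file; one failure is recorded and the batch continues."""
    result = {"files_scanned": 0, "files_processed": 0, "errors": []}
    for path in paths:
        result["files_scanned"] += 1
        try:
            parse_jsonl(path)
        except OSError as exc:
            # strerror omits the full path that str(exc) would include.
            error = exc.strerror or type(exc).__name__
        except ValueError as exc:
            error = str(exc)
        else:
            result["files_processed"] += 1
            continue
        # Report only the file name so directory layout is not exposed.
        result["errors"].append({"file": Path(path).name, "error": error})
    return result
```

Recording only the file name in each error entry is one way to give detailed messages without exposing full directory paths.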
Testing Recommendations
- Unit Tests (not yet implemented):
  - Test conversation parsing with various formats
  - Test categorization accuracy
  - Test quality score calculation
  - Test error handling edge cases
- Integration Tests (not yet implemented):
  - Test full import workflow
  - Test dry-run vs execute modes
  - Test project/session association
  - Test authentication
- Manual Testing:
  # Test dry run
  python scripts/import-claude-context.py --folder test_conversations --dry-run
  # Test actual import
  python scripts/import-claude-context.py --folder test_conversations --execute
Performance Considerations
- Recursive folder scanning optimized with pathlib
- File parsing is sequential (not parallelized)
- Database commits per-conversation (not batched)
- Large folders may take time (consider progress indicators)
Optimization Opportunities:
- Batch database inserts
- Parallel file processing
- Streaming for very large files
- Caching for repeated scans
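To illustrate the first opportunity, the sketch below inserts rows in chunks and commits once per chunk instead of once per conversation. It uses the standard-library sqlite3 module purely for demonstration; the database engine behind ClaudeTools, the function name insert_contexts_batched, and the three-column subset of conversation_contexts are assumptions.

```python
import sqlite3


def insert_contexts_batched(connection, contexts, batch_size=100):
    """Insert context rows in chunks, committing once per chunk."""
    sql = (
        "INSERT INTO conversation_contexts (title, dense_summary, relevance_score) "
        "VALUES (?, ?, ?)"
    )
    inserted = 0
    for start in range(0, len(contexts), batch_size):
        chunk = contexts[start:start + batch_size]
        rows = [
            (c["title"], c["dense_summary"], c["relevance_score"]) for c in chunk
        ]
        connection.executemany(sql, rows)
        connection.commit()  # one commit per chunk, not per row
        inserted += len(rows)
    return inserted
```

The trade-off is that a failure inside a chunk affects every row in that chunk, so batching needs to be reconciled with the per-file error isolation described earlier.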
Documentation
Created documentation files:
- BULK_IMPORT_IMPLEMENTATION.md (this file)
- .claude/context-recall-config.env.example (configuration template)
Next Steps
Recommended enhancements:
- Progress Tracking: Add real-time progress updates for large batches
- Deduplication: Detect and skip already-imported conversations
- Incremental Import: Only import new/modified files
- Batch Operations: Batch database inserts for performance
- Testing Suite: Comprehensive unit and integration tests
- Web UI: Frontend interface for import operations
- Scheduling: Cron/scheduler integration for automated imports
- Validation: Pre-import validation and compatibility checks
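The deduplication and incremental-import items could both build on a content hash per file. The sketch below is one possible approach, not part of the current implementation: file_digest and select_new_files are hypothetical names, and the set of seen digests would need to be persisted somewhere (for example in the database) between runs.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in 64 KiB blocks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def select_new_files(paths, seen_digests):
    """Return paths whose content is not yet in seen_digests.

    The seen_digests set is updated in place with each newly seen digest.
    """
    new_files = []
    for path in paths:
        digest = file_digest(path)
        if digest in seen_digests:
            continue
        seen_digests.add(digest)
        new_files.append(path)
    return new_files
```

Hashing content rather than comparing file names also catches the same conversation saved under two different paths.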
Files Modified/Created
Created:
- api/routers/bulk_import.py (230 lines)
- scripts/import-claude-context.py (278 lines)
- .claude/context-recall-config.env.example
- BULK_IMPORT_IMPLEMENTATION.md (this file)
Modified:
- api/main.py (added bulk_import router registration)
Previously Created (Dependencies):
- api/utils/conversation_parser.py (609 lines)
- api/utils/credential_scanner.py (597 lines)
Total Implementation
- Lines of Code: ~1,700 (including the previously created parser and scanner utilities)
- API Endpoints: 2 (import-folder, import-status)
- CLI Tool: 1 full-featured script
- Categories Supported: 3 (MSP, Development, General)
- File Formats: 2 (.jsonl, .json)
Usage Example
# Step 1: Set up configuration
cp .claude/context-recall-config.env.example .claude/context-recall-config.env
# Edit and add your JWT token
# Step 2: Preview import
python scripts/import-claude-context.py \
  --folder "C:\Users\MikeSwanson\claude-projects" \
  --dry-run
# Step 3: Review preview output
# Step 4: Execute import
python scripts/import-claude-context.py \
  --folder "C:\Users\MikeSwanson\claude-projects" \
  --execute
# Step 5: Verify import via API
curl -H "Authorization: Bearer YOUR_TOKEN" \
  http://localhost:8000/api/conversation-contexts
API Integration Example
import requests

# JWT token used for the Authorization header
token = "your-jwt-token"
headers = {"Authorization": f"Bearer {token}"}

# Import via the API
response = requests.post(
    "http://localhost:8000/api/bulk-import/import-folder",
    headers=headers,
    params={
        "folder_path": "/path/to/conversations",
        "dry_run": False,
        "project_id": "abc-123",
    },
)
# Fail loudly on HTTP errors instead of parsing an error body
response.raise_for_status()
result = response.json()
print(f"Imported {result['contexts_created']} contexts")
Conclusion
The bulk import system is implemented and functional, although automated unit and integration tests have not yet been written. It provides:
- Automated conversation import from Claude Desktop/Code
- Intelligent categorization and quality scoring
- Both API and CLI interfaces
- Comprehensive error handling and reporting
- Dry-run capabilities for safe testing
- Integration with existing ClaudeTools infrastructure
The system is ready for use and can be extended with the recommended enhancements for production deployment.