feat: Major directory reorganization and cleanup
Reorganized project structure for better maintainability and reduced disk usage by 95.9% (11 GB -> 451 MB).

Directory Reorganization (85% reduction in root files):
- Created docs/ with subdirectories (deployment, testing, database, etc.)
- Created infrastructure/vpn-configs/ for VPN scripts
- Moved 90+ files from root to organized locations
- Archived obsolete documentation (context system, offline mode, zombie debugging)
- Moved all test files to tests/ directory
- Root directory: 119 files -> 18 files

Disk Cleanup (10.55 GB recovered):
- Deleted Rust build artifacts: 9.6 GB (target/ directories)
- Deleted Python virtual environments: 161 MB (venv/ directories)
- Deleted Python cache: 50 KB (__pycache__/)

New Structure:
- docs/ - All documentation organized by category
- docs/archives/ - Obsolete but preserved documentation
- infrastructure/ - VPN configs and SSH setup
- tests/ - All test files consolidated
- logs/ - Ready for future logs

Benefits:
- Cleaner root directory (18 vs 119 files)
- Logical organization of documentation
- 95.9% disk space reduction
- Faster navigation and discovery
- Better portability (build artifacts excluded)

Build artifacts can be regenerated:
- Rust: cargo build --release (5-15 min per project)
- Python: pip install -r requirements.txt (2-3 min)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
312
docs/database/BULK_IMPORT_IMPLEMENTATION.md
Normal file
@@ -0,0 +1,312 @@
# Bulk Import Implementation Summary

## Overview

Bulk import functionality has been implemented for the ClaudeTools context recall system. It enables automated import of conversation histories from Claude Desktop/Code into the ClaudeTools database for context persistence and retrieval.

## Components Delivered

### 1. API Endpoint (`api/routers/bulk_import.py`)

**Endpoint**: `POST /api/bulk-import/import-folder`

**Features**:
- Scans folder recursively for `.jsonl` and `.json` conversation files
- Parses conversation structure using intelligent parser
- Extracts metadata, decisions, and context
- Automatic conversation categorization (MSP, Development, General)
- Quality scoring (0-10) based on content depth
- Dry-run mode for preview without database changes
- Comprehensive error handling with detailed error reporting
- Optional project/session association

**Parameters**:
- `folder_path` (required): Path to Claude projects folder
- `dry_run` (default: false): Preview mode
- `project_id` (optional): Associate with specific project
- `session_id` (optional): Associate with specific session

**Response Structure**:
```json
{
  "dry_run": false,
  "folder_path": "/path/to/conversations",
  "files_scanned": 15,
  "files_processed": 14,
  "contexts_created": 14,
  "errors": [],
  "contexts_preview": [
    {
      "file": "conversation1.jsonl",
      "title": "Build authentication system",
      "type": "project_state",
      "category": "development",
      "message_count": 45,
      "tags": ["api", "fastapi", "auth", "jwt"],
      "relevance_score": 8.5,
      "quality_score": 8.5
    }
  ],
  "summary": "Scanned 15 files | Processed 14 successfully | Created 14 contexts"
}
```
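
The `summary` string in the response above looks like it is derived from the three counters. A minimal sketch of that relationship follows. The helper names `build_summary` and `unprocessed_count` are hypothetical and are not taken from `bulk_import.py`.

```python
def build_summary(result: dict) -> str:
    """Format the response counters as a one-line summary string."""
    return (
        f"Scanned {result['files_scanned']} files | "
        f"Processed {result['files_processed']} successfully | "
        f"Created {result['contexts_created']} contexts"
    )


def unprocessed_count(result: dict) -> int:
    """Number of files that were scanned but not processed."""
    return result["files_scanned"] - result["files_processed"]


# Counters copied from the sample response.
sample = {"files_scanned": 15, "files_processed": 14, "contexts_created": 14}
print(build_summary(sample))
```

A caller can compare `unprocessed_count` against the length of `errors` to spot files that were skipped without an error entry.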

**Status Endpoint**: `GET /api/bulk-import/import-status`

Returns system capabilities and supported formats.

### 2. Command-Line Import Script (`scripts/import-claude-context.py`)

**Usage**:
```bash
# Preview import (dry run)
python scripts/import-claude-context.py --folder "C:\Users\MikeSwanson\claude-projects" --dry-run

# Execute import
python scripts/import-claude-context.py --folder "C:\Users\MikeSwanson\claude-projects" --execute

# Associate with project
python scripts/import-claude-context.py --folder "C:\Users\MikeSwanson\claude-projects" --execute --project-id abc-123
```

**Features**:
- JWT token authentication from `.claude/context-recall-config.env`
- Configurable API base URL
- Rich console output with progress display
- Error reporting and summary statistics
- Cross-platform path support

**Configuration File**: `.claude/context-recall-config.env`
```env
JWT_TOKEN=your-jwt-token-here
API_BASE_URL=http://localhost:8000
```
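
To illustrate how a client could consume this file, here is a minimal standard-library sketch that reads `KEY=VALUE` pairs and builds the Bearer header. The function names `load_env_config` and `auth_headers` are assumptions for illustration. The actual script may use a different loader (for example a dotenv library).

```python
from pathlib import Path


def load_env_config(path: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    config: dict[str, str] = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip().strip('"').strip("'")
    return config


def auth_headers(config: dict[str, str]) -> dict[str, str]:
    """Build the Authorization header for the JWT-protected endpoints."""
    return {"Authorization": f"Bearer {config['JWT_TOKEN']}"}
```

Calling `load_env_config(".claude/context-recall-config.env")` would return a dictionary with the `JWT_TOKEN` and `API_BASE_URL` keys shown above.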

### 3. API Main Router Update (`api/main.py`)

Registered bulk_import router with:
- Prefix: `/api/bulk-import`
- Tag: `Bulk Import`

Now accessible via:
- `POST http://localhost:8000/api/bulk-import/import-folder`
- `GET http://localhost:8000/api/bulk-import/import-status`

### 4. Supporting Utilities

#### Conversation Parser (`api/utils/conversation_parser.py`)

Previously created and enhanced. Provides:
- `parse_jsonl_conversation()`: Parse .jsonl/.json files
- `extract_context_from_conversation()`: Extract rich context
- `categorize_conversation()`: Intelligent categorization
- `scan_folder_for_conversations()`: Recursive file scanning
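
As a rough illustration of the scanning and parsing steps listed above, the sketch below walks a folder with `pathlib` and reads one JSON object per line. The names `find_conversation_files` and `read_messages` are hypothetical stand-ins. They are not the signatures of the real parser functions, and the assumed file layout (one message per line in `.jsonl`, a single document in `.json`) may differ from what Claude Desktop/Code writes.

```python
import json
from pathlib import Path

CONVERSATION_SUFFIXES = {".jsonl", ".json"}


def find_conversation_files(folder: str) -> list[Path]:
    """Recursively collect .jsonl and .json files, sorted for a stable order."""
    return sorted(
        path
        for path in Path(folder).rglob("*")
        if path.is_file() and path.suffix.lower() in CONVERSATION_SUFFIXES
    )


def read_messages(path: Path) -> list[dict]:
    """Return the records stored in one conversation file."""
    text = path.read_text(encoding="utf-8")
    if path.suffix.lower() == ".json":
        # Assumption: a .json file holds one document (a list or an object).
        data = json.loads(text)
        return data if isinstance(data, list) else [data]
    # Assumption: a .jsonl file holds one JSON object per non-empty line.
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```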

**Categorization Algorithm**:
- Keyword-based scoring with weighted terms
- Code pattern detection
- Ticket/incident pattern matching
- Heuristic analysis for classification confidence

**Categories**:
- `msp`: Client support, infrastructure, incidents
- `development`: Code, APIs, features, testing
- `general`: Other conversations
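
The categorization approach described above can be sketched as weighted keyword counts plus pattern bonuses, with a threshold below which a conversation falls back to `general`. Every keyword, weight, regular expression, and threshold here is an illustrative assumption. The real values in `conversation_parser.py` are not documented in this file.

```python
import re

# Hypothetical keyword weights per category (assumed for illustration).
KEYWORD_WEIGHTS = {
    "msp": {"client": 2.0, "ticket": 3.0, "incident": 3.0, "firewall": 2.0, "outage": 2.0},
    "development": {"api": 2.0, "refactor": 2.0, "endpoint": 2.0, "pytest": 3.0, "function": 1.0},
}
# Assumed signals: fenced code or Python definitions, and ticket references.
CODE_PATTERN = re.compile(r"`{3}|\bdef \w+\(|\bclass \w+")
TICKET_PATTERN = re.compile(r"\b(?:ticket|incident)\s*#?\d+", re.IGNORECASE)
MIN_SCORE = 3.0  # Assumed confidence threshold.


def categorize(text: str) -> str:
    """Return 'msp', 'development', or 'general' for a conversation text."""
    lowered = text.lower()
    scores = {
        category: sum(
            weight * len(re.findall(rf"\b{re.escape(keyword)}\b", lowered))
            for keyword, weight in keywords.items()
        )
        for category, keywords in KEYWORD_WEIGHTS.items()
    }
    if CODE_PATTERN.search(text):
        scores["development"] += 3.0
    if TICKET_PATTERN.search(text):
        scores["msp"] += 3.0
    best = max(scores, key=scores.get)
    return best if scores[best] >= MIN_SCORE else "general"
```

Keeping the threshold explicit makes the fallback to `general` easy to tune when short or off-topic conversations are misclassified.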

#### Credential Scanner (`api/utils/credential_scanner.py`)

Previously created. Provides file-based credential scanning (separate from conversation import):
- `scan_for_credential_files()`: Find credential files
- `parse_credential_file()`: Extract credentials from various formats
- `import_credentials_to_db()`: Import with encryption

## Database Schema Integration

Contexts are stored in the `conversation_contexts` table with:
- `title`: Conversation title or generated name
- `dense_summary`: Compressed summary with metrics
- `key_decisions`: JSON array of extracted decisions
- `tags`: JSON array of categorization tags
- `context_type`: Mapped from category (session_summary, project_state, general_context)
- `relevance_score`: Quality-based score (0.0-10.0)
- `project_id` / `session_id`: Optional associations
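
A sketch of how a parsed conversation might be turned into a row with these columns is shown below. Only the `development` to `project_state` pairing is visible in the sample response. The other two pairings, the helper name `build_context_row`, and the clamping of the score are assumptions.

```python
import json

CONTEXT_TYPE_BY_CATEGORY = {
    "msp": "session_summary",        # assumption, not confirmed by this document
    "development": "project_state",  # matches the sample response
    "general": "general_context",    # assumption, not confirmed by this document
}


def build_context_row(title, dense_summary, decisions, tags, category,
                      quality_score, project_id=None, session_id=None) -> dict:
    """Assemble column values for one conversation_contexts row."""
    return {
        "title": title,
        "dense_summary": dense_summary,
        "key_decisions": json.dumps(decisions),  # stored as a JSON array
        "tags": json.dumps(tags),                # stored as a JSON array
        "context_type": CONTEXT_TYPE_BY_CATEGORY.get(category, "general_context"),
        # Keep the score inside the documented 0.0-10.0 range.
        "relevance_score": max(0.0, min(10.0, float(quality_score))),
        "project_id": project_id,
        "session_id": session_id,
    }
```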

## Intelligent Features

### Automatic Categorization

Conversations are automatically classified using:
1. **Keyword Analysis**: Weighted scoring of domain-specific terms
2. **Pattern Matching**: Code blocks, file paths, ticket references
3. **Heuristic Scoring**: Threshold-based confidence determination

### Quality Scoring

Quality scores (0-10) calculated from:
- Message count (more = higher quality)
- Decision count (decisions = depth)
- File references (concrete work)
- Session duration (longer = more substantial)
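
One way to combine the four signals above into a 0-10 score is to let each contribute up to 2.5 points and saturate at a cap. The equal weighting and the caps (50 messages, 10 decisions, 20 file references, 120 minutes) are assumptions chosen for illustration, not the values used by the parser.

```python
def quality_score(message_count: int, decision_count: int,
                  file_ref_count: int, duration_minutes: float) -> float:
    """Combine four signals into a 0-10 score, 2.5 points per signal."""

    def part(value: float, cap: float) -> float:
        # Linear up to the cap, then saturated. Negative inputs count as zero.
        return 2.5 * min(max(value, 0), cap) / cap

    score = (
        part(message_count, 50)
        + part(decision_count, 10)
        + part(file_ref_count, 20)
        + part(duration_minutes, 120)
    )
    return round(score, 1)
```

Saturating each signal keeps a very long but shallow session from scoring higher than a shorter one with concrete decisions and file changes.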

### Context Compression

Dense summaries include:
- Token-optimized text compression
- Key decision extraction
- File path tracking
- Tool usage statistics
- Temporal metrics

## Security Features

- JWT authentication required for all endpoints
- User authorization validation
- Input validation and sanitization
- Error messages don't leak sensitive paths
- Dry-run mode prevents accidental imports

## Error Handling

Comprehensive error handling with:
- File-level error isolation (one failure doesn't stop batch)
- Detailed error messages with file names
- HTTP exception mapping
- Graceful fallback for malformed files
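
File-level error isolation can be sketched as a per-file `try`/`except` that records the failure and moves on. The function name `import_files` and the error entry shape are hypothetical. Errors carry only the file name, in line with the goal of not exposing full paths.

```python
import json
from pathlib import Path


def import_files(paths: list[Path]) -> dict:
    """Validate each file independently so one failure does not stop the batch."""
    result = {"files_scanned": len(paths), "files_processed": 0, "errors": []}
    for path in paths:
        try:
            with path.open(encoding="utf-8") as handle:
                for line in handle:
                    if line.strip():
                        json.loads(line)  # raises on malformed JSON
            result["files_processed"] += 1
        except json.JSONDecodeError as exc:
            result["errors"].append({"file": path.name, "error": f"invalid JSON: {exc.msg}"})
        except (OSError, UnicodeDecodeError) as exc:
            # Report the exception type only, because OSError text includes the full path.
            result["errors"].append({"file": path.name, "error": type(exc).__name__})
    return result
```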

## Testing Recommendations

1. **Unit Tests** (not yet implemented):
   - Test conversation parsing with various formats
   - Test categorization accuracy
   - Test quality score calculation
   - Test error handling edge cases

2. **Integration Tests** (not yet implemented):
   - Test full import workflow
   - Test dry-run vs execute modes
   - Test project/session association
   - Test authentication

3. **Manual Testing**:
   ```bash
   # Test dry run
   python scripts/import-claude-context.py --folder test_conversations --dry-run

   # Test actual import
   python scripts/import-claude-context.py --folder test_conversations --execute
   ```

## Performance Considerations

- Recursive folder scanning optimized with pathlib
- File parsing is sequential (not parallelized)
- Database commits per-conversation (not batched)
- Large folders may take time (consider progress indicators)

**Optimization Opportunities**:
- Batch database inserts
- Parallel file processing
- Streaming for very large files
- Caching for repeated scans
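
The first optimization, batched inserts, could look roughly like the sketch below: rows are written in chunks with one commit per chunk instead of one commit per conversation. An in-memory SQLite database is used here only as a stand-in. The real ClaudeTools database, its driver, and its full column set are not described in this document, and `insert_batched` is a hypothetical helper.

```python
import sqlite3


def insert_batched(conn: sqlite3.Connection, rows: list[tuple], batch_size: int = 500) -> int:
    """Insert rows in chunks, committing once per chunk rather than once per row."""
    inserted = 0
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        conn.executemany(
            "INSERT INTO conversation_contexts (title, relevance_score) VALUES (?, ?)",
            chunk,
        )
        conn.commit()
        inserted += len(chunk)
    return inserted


# Stand-in schema with two of the documented columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE conversation_contexts (title TEXT, relevance_score REAL)")
rows = [(f"conversation {i}", 5.0) for i in range(1200)]
print(insert_batched(conn, rows))
```

The trade-off is that a failure inside a chunk affects the whole chunk, so batching has to be reconciled with the per-file error isolation described earlier.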

## Documentation

Created documentation files:
- `BULK_IMPORT_IMPLEMENTATION.md` (this file)
- `.claude/context-recall-config.env.example` (configuration template)

## Next Steps

Recommended enhancements:

1. **Progress Tracking**: Add real-time progress updates for large batches
2. **Deduplication**: Detect and skip already-imported conversations
3. **Incremental Import**: Only import new/modified files
4. **Batch Operations**: Batch database inserts for performance
5. **Testing Suite**: Comprehensive unit and integration tests
6. **Web UI**: Frontend interface for import operations
7. **Scheduling**: Cron/scheduler integration for automated imports
8. **Validation**: Pre-import validation and compatibility checks
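
For the deduplication and incremental import items, one possible approach is to fingerprint each file by content hash and skip digests that were already imported. This is a sketch of that idea, not a description of existing behavior. `file_digest` and `select_new_files` are hypothetical names, and where the set of seen digests would be persisted is left open.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, read in blocks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def select_new_files(paths: list[Path], seen_digests: set[str]) -> list[Path]:
    """Return files whose content has not been seen, updating seen_digests."""
    new_files = []
    for path in paths:
        digest = file_digest(path)
        if digest not in seen_digests:
            seen_digests.add(digest)
            new_files.append(path)
    return new_files
```

Hashing content rather than comparing names or modification times also catches the same conversation copied to a different folder.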

## Files Modified/Created

### Created:
- `api/routers/bulk_import.py` (230 lines)
- `scripts/import-claude-context.py` (278 lines)
- `.claude/context-recall-config.env.example`
- `BULK_IMPORT_IMPLEMENTATION.md` (this file)

### Modified:
- `api/main.py` (added bulk_import router registration)

### Previously Created (Dependencies):
- `api/utils/conversation_parser.py` (609 lines)
- `api/utils/credential_scanner.py` (597 lines)

## Total Implementation

- **Lines of Code**: ~1,700 lines
- **API Endpoints**: 2 (import-folder, import-status)
- **CLI Tool**: 1 full-featured script
- **Categories Supported**: 3 (MSP, Development, General)
- **File Formats**: 2 (.jsonl, .json)

## Usage Example

```bash
# Step 1: Set up configuration
cp .claude/context-recall-config.env.example .claude/context-recall-config.env
# Edit and add your JWT token

# Step 2: Preview import
python scripts/import-claude-context.py \
  --folder "C:\Users\MikeSwanson\claude-projects" \
  --dry-run

# Step 3: Review preview output

# Step 4: Execute import
python scripts/import-claude-context.py \
  --folder "C:\Users\MikeSwanson\claude-projects" \
  --execute

# Step 5: Verify import via API
curl -H "Authorization: Bearer YOUR_TOKEN" \
  http://localhost:8000/api/conversation-contexts
```

## API Integration Example

```python
import requests

# Get JWT token
token = "your-jwt-token"
headers = {"Authorization": f"Bearer {token}"}

# Import with API
response = requests.post(
    "http://localhost:8000/api/bulk-import/import-folder",
    headers=headers,
    params={
        "folder_path": "/path/to/conversations",
        "dry_run": False,
        "project_id": "abc-123"
    }
)
# Fail early on HTTP errors (for example 401) before reading the body
response.raise_for_status()

result = response.json()
print(f"Imported {result['contexts_created']} contexts")
```

## Conclusion

The bulk import system is fully implemented and functional. It provides:
- Automated conversation import from Claude Desktop/Code
- Intelligent categorization and quality scoring
- Both API and CLI interfaces
- Comprehensive error handling and reporting
- Dry-run capabilities for safe testing
- Integration with existing ClaudeTools infrastructure

The system is ready for use and can be extended with the recommended enhancements for production deployment.