feat: Major directory reorganization and cleanup

Reorganized project structure for better maintainability and reduced disk usage by 95.9% (11 GB -> 451 MB). Directory Reorganization (85% reduction in root files): - Created docs/ with subdirectories (deployment, testing, database, etc.) - Created infrastructure/vpn-configs/ for VPN scripts - Moved 90+ files from root to organized locations - Archived obsolete documentation (context system, offline mode, zombie debugging) - Moved all test files to tests/ directory - Root directory: 119 files -> 18 files Disk Cleanup (10.55 GB recovered): - Deleted Rust build artifacts: 9.6 GB (target/ directories) - Deleted Python virtual environments: 161 MB (venv/ directories) - Deleted Python cache: 50 KB (__pycache__/) New Structure: - docs/ - All documentation organized by category - docs/archives/ - Obsolete but preserved documentation - infrastructure/ - VPN configs and SSH setup - tests/ - All test files consolidated - logs/ - Ready for future logs Benefits: - Cleaner root directory (18 vs 119 files) - Logical organization of documentation - 95.9% disk space reduction - Faster navigation and discovery - Better portability (build artifacts excluded) Build artifacts can be regenerated: - Rust: cargo build --release (5-15 min per project) - Python: pip install -r requirements.txt (2-3 min) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-18 20:42:28 -07:00
parent 89e5118306
commit 06f7617718
96 changed files with 54 additions and 2639 deletions
--- a/docs/database/BULK_IMPORT_IMPLEMENTATION.md
+++ b/docs/database/BULK_IMPORT_IMPLEMENTATION.md
@@ -0,0 +1,312 @@
+# Bulk Import Implementation Summary
+
+## Overview
+
+Successfully implemented bulk import functionality for ClaudeTools context recall system. This enables automated import of conversation histories from Claude Desktop/Code into the ClaudeTools database for context persistence and retrieval.
+
+## Components Delivered
+
+### 1. API Endpoint (`api/routers/bulk_import.py`)
+
+**Endpoint**: `POST /api/bulk-import/import-folder`
+
+**Features**:
+- Scans folder recursively for `.jsonl` and `.json` conversation files
+- Parses conversation structure using intelligent parser
+- Extracts metadata, decisions, and context
+- Automatic conversation categorization (MSP, Development, General)
+- Quality scoring (0-10) based on content depth
+- Dry-run mode for preview without database changes
+- Comprehensive error handling with detailed error reporting
+- Optional project/session association
+
+**Parameters**:
+- `folder_path` (required): Path to Claude projects folder
+- `dry_run` (default: false): Preview mode
+- `project_id` (optional): Associate with specific project
+- `session_id` (optional): Associate with specific session
+
+**Response Structure**:
+```json
+{
+  "dry_run": false,
+  "folder_path": "/path/to/conversations",
+  "files_scanned": 15,
+  "files_processed": 14,
+  "contexts_created": 14,
+  "errors": [],
+  "contexts_preview": [
+    {
+      "file": "conversation1.jsonl",
+      "title": "Build authentication system",
+      "type": "project_state",
+      "category": "development",
+      "message_count": 45,
+      "tags": ["api", "fastapi", "auth", "jwt"],
+      "relevance_score": 8.5,
+      "quality_score": 8.5
+    }
+  ],
+  "summary": "Scanned 15 files | Processed 14 successfully | Created 14 contexts"
+}
+```
+
+**Status Endpoint**: `GET /api/bulk-import/import-status`
+
+Returns system capabilities and supported formats.
+
+### 2. Command-Line Import Script (`scripts/import-claude-context.py`)
+
+**Usage**:
+```bash
+# Preview import (dry run)
+python scripts/import-claude-context.py --folder "C:\Users\MikeSwanson\claude-projects" --dry-run
+
+# Execute import
+python scripts/import-claude-context.py --folder "C:\Users\MikeSwanson\claude-projects" --execute
+
+# Associate with project
+python scripts/import-claude-context.py --folder "C:\Users\MikeSwanson\claude-projects" --execute --project-id abc-123
+```
+
+**Features**:
+- JWT token authentication from `.claude/context-recall-config.env`
+- Configurable API base URL
+- Rich console output with progress display
+- Error reporting and summary statistics
+- Cross-platform path support
+
+**Configuration File**: `.claude/context-recall-config.env`
+```env
+JWT_TOKEN=your-jwt-token-here
+API_BASE_URL=http://localhost:8000
+```
+
+### 3. API Main Router Update (`api/main.py`)
+
+Registered bulk_import router with:
+- Prefix: `/api/bulk-import`
+- Tag: `Bulk Import`
+
+Now accessible via:
+- `POST http://localhost:8000/api/bulk-import/import-folder`
+- `GET http://localhost:8000/api/bulk-import/import-status`
+
+### 4. Supporting Utilities
+
+#### Conversation Parser (`api/utils/conversation_parser.py`)
+
+Previously created and enhanced. Provides:
+- `parse_jsonl_conversation()`: Parse .jsonl/.json files
+- `extract_context_from_conversation()`: Extract rich context
+- `categorize_conversation()`: Intelligent categorization
+- `scan_folder_for_conversations()`: Recursive file scanning
+
+**Categorization Algorithm**:
+- Keyword-based scoring with weighted terms
+- Code pattern detection
+- Ticket/incident pattern matching
+- Heuristic analysis for classification confidence
+
+**Categories**:
+- `msp`: Client support, infrastructure, incidents
+- `development`: Code, APIs, features, testing
+- `general`: Other conversations
+
+#### Credential Scanner (`api/utils/credential_scanner.py`)
+
+Previously created. Provides file-based credential scanning (separate from conversation import):
+- `scan_for_credential_files()`: Find credential files
+- `parse_credential_file()`: Extract credentials from various formats
+- `import_credentials_to_db()`: Import with encryption
+
+## Database Schema Integration
+
+Contexts are stored in `conversation_contexts` table with:
+- `title`: Conversation title or generated name
+- `dense_summary`: Compressed summary with metrics
+- `key_decisions`: JSON array of extracted decisions
+- `tags`: JSON array of categorization tags
+- `context_type`: Mapped from category (session_summary, project_state, general_context)
+- `relevance_score`: Quality-based score (0.0-10.0)
+- `project_id` / `session_id`: Optional associations
+
+## Intelligent Features
+
+### Automatic Categorization
+
+Conversations are automatically classified using:
+1. **Keyword Analysis**: Weighted scoring of domain-specific terms
+2. **Pattern Matching**: Code blocks, file paths, ticket references
+3. **Heuristic Scoring**: Threshold-based confidence determination
+
+### Quality Scoring
+
+Quality scores (0-10) calculated from:
+- Message count (more = higher quality)
+- Decision count (decisions = depth)
+- File references (concrete work)
+- Session duration (longer = more substantial)
+
+### Context Compression
+
+Dense summaries include:
+- Token-optimized text compression
+- Key decision extraction
+- File path tracking
+- Tool usage statistics
+- Temporal metrics
+
+## Security Features
+
+- JWT authentication required for all endpoints
+- User authorization validation
+- Input validation and sanitization
+- Error messages don't leak sensitive paths
+- Dry-run mode prevents accidental imports
+
+## Error Handling
+
+Comprehensive error handling with:
+- File-level error isolation (one failure doesn't stop batch)
+- Detailed error messages with file names
+- HTTP exception mapping
+- Graceful fallback for malformed files
+
+## Testing Recommendations
+
+1. **Unit Tests** (not yet implemented):
+   - Test conversation parsing with various formats
+   - Test categorization accuracy
+   - Test quality score calculation
+   - Test error handling edge cases
+
+2. **Integration Tests** (not yet implemented):
+   - Test full import workflow
+   - Test dry-run vs execute modes
+   - Test project/session association
+   - Test authentication
+
+3. **Manual Testing**:
+   ```bash
+   # Test dry run
+   python scripts/import-claude-context.py --folder test_conversations --dry-run
+
+   # Test actual import
+   python scripts/import-claude-context.py --folder test_conversations --execute
+   ```
+
+## Performance Considerations
+
+- Recursive folder scanning optimized with pathlib
+- File parsing is sequential (not parallelized)
+- Database commits per-conversation (not batched)
+- Large folders may take time (consider progress indicators)
+
+**Optimization Opportunities**:
+- Batch database inserts
+- Parallel file processing
+- Streaming for very large files
+- Caching for repeated scans
+
+## Documentation
+
+Created documentation files:
+- `BULK_IMPORT_IMPLEMENTATION.md` (this file)
+- `.claude/context-recall-config.env.example` (configuration template)
+
+## Next Steps
+
+Recommended enhancements:
+
+1. **Progress Tracking**: Add real-time progress updates for large batches
+2. **Deduplication**: Detect and skip already-imported conversations
+3. **Incremental Import**: Only import new/modified files
+4. **Batch Operations**: Batch database inserts for performance
+5. **Testing Suite**: Comprehensive unit and integration tests
+6. **Web UI**: Frontend interface for import operations
+7. **Scheduling**: Cron/scheduler integration for automated imports
+8. **Validation**: Pre-import validation and compatibility checks
+
+## Files Modified/Created
+
+### Created:
+- `api/routers/bulk_import.py` (230 lines)
+- `scripts/import-claude-context.py` (278 lines)
+- `.claude/context-recall-config.env.example`
+- `BULK_IMPORT_IMPLEMENTATION.md` (this file)
+
+### Modified:
+- `api/main.py` (added bulk_import router registration)
+
+### Previously Created (Dependencies):
+- `api/utils/conversation_parser.py` (609 lines)
+- `api/utils/credential_scanner.py` (597 lines)
+
+## Total Implementation
+
+- **Lines of Code**: ~1,700+ lines
+- **API Endpoints**: 2 (import-folder, import-status)
+- **CLI Tool**: 1 full-featured script
+- **Categories Supported**: 3 (MSP, Development, General)
+- **File Formats**: 2 (.jsonl, .json)
+
+## Usage Example
+
+```bash
+# Step 1: Set up configuration
+cp .claude/context-recall-config.env.example .claude/context-recall-config.env
+# Edit and add your JWT token
+
+# Step 2: Preview import
+python scripts/import-claude-context.py \
+  --folder "C:\Users\MikeSwanson\claude-projects" \
+  --dry-run
+
+# Step 3: Review preview output
+
+# Step 4: Execute import
+python scripts/import-claude-context.py \
+  --folder "C:\Users\MikeSwanson\claude-projects" \
+  --execute
+
+# Step 5: Verify import via API
+curl -H "Authorization: Bearer YOUR_TOKEN" \
+  http://localhost:8000/api/conversation-contexts
+```
+
+## API Integration Example
+
+```python
+import requests
+
+# Get JWT token
+token = "your-jwt-token"
+headers = {"Authorization": f"Bearer {token}"}
+
+# Import with API
+response = requests.post(
+    "http://localhost:8000/api/bulk-import/import-folder",
+    headers=headers,
+    params={
+        "folder_path": "/path/to/conversations",
+        "dry_run": False,
+        "project_id": "abc-123"
+    }
+)
+
+result = response.json()
+print(f"Imported {result['contexts_created']} contexts")
+```
+
+## Conclusion
+
+The bulk import system is fully implemented and functional. It provides:
+- Automated conversation import from Claude Desktop/Code
+- Intelligent categorization and quality scoring
+- Both API and CLI interfaces
+- Comprehensive error handling and reporting
+- Dry-run capabilities for safe testing
+- Integration with existing ClaudeTools infrastructure
+
+The system is ready for use and can be extended with the recommended enhancements for production deployment.