# Credential Scanner and Importer Guide

**Module:** `api/utils/credential_scanner.py`

**Purpose:** Scan for credential files and import them into the ClaudeTools credential vault with automatic encryption

**Status:** Production Ready

---

## Overview

The Credential Scanner and Importer provides automated discovery and secure import of credentials from structured files into the ClaudeTools database. All credentials are automatically encrypted using AES-256-GCM before storage, and comprehensive audit logs are created for compliance.

### Key Features

- **Multi-format support**: Markdown, .env, text files
- **Automatic encryption**: Uses existing `credential_service` for AES-256-GCM encryption
- **Type detection**: Auto-detects API keys, passwords, connection strings, tokens
- **Audit logging**: Every import operation is logged with full traceability
- **Client association**: Optional linking to specific clients
- **Safe parsing**: Never logs plaintext credential values

---

## Supported File Formats

### 1. Markdown Files (`.md`)

Structured format using headers and key-value pairs:

```markdown
## Gitea Admin
Username: admin
Password: SecurePass123!
URL: https://git.example.com
Notes: Main admin account

## Database Server
Type: connection_string
Connection String: mysql://dbuser:dbpass@192.168.1.50:3306/mydb
Notes: Production database

## OpenAI API
API Key: sk-1234567890abcdefghijklmnopqrstuvwxyz
Notes: Production API key
```

**Recognized keys:**

- `Username`, `User`, `Login` → username field
- `Password`, `Pass`, `Pwd` → password field
- `API Key`, `API_Key`, `ApiKey`, `Key` → api_key field
- `Token`, `Access Token`, `Bearer` → token field
- `Client Secret`, `Secret` → client_secret field
- `Connection String`, `Conn_Str` → connection_string field
- `URL`, `Host`, `Server`, `Address` → url (auto-detects internal/external)
- `Port` → custom_port field
- `Notes`, `Description` → notes field
- `Type`, `Credential_Type` → credential_type field

### 2. Environment Files (`.env`)

Standard environment variable format:

```bash
# Database Configuration
DATABASE_URL=mysql://user:pass@host:3306/db

# API Keys
OPENAI_API_KEY=sk-1234567890abcdefghij
GITHUB_TOKEN=ghp_abc123def456ghi789

# Secrets
SECRET_KEY=super_secret_key_12345
```

**Behavior:**

- Each `KEY=value` pair creates a separate credential
- Service name derived from KEY (e.g., `DATABASE_URL` → "Database Url")
- Credential type auto-detected from value pattern

### 3. Text Files (`.txt`)

Same format as Markdown, but uses `.txt` extension:

```text
# Server Passwords

## Web Server
Username: webadmin
Password: Web@dmin2024!
Host: 192.168.1.100
Port: 22

## Backup Server
Username: backup
Password: BackupSecure789
Host: 10.0.0.50
```

---

## Credential Type Detection

The scanner automatically detects credential types based on value patterns:

| Pattern | Detected Type | Field |
|---------|--------------|-------|
| `sk-*` (20+ chars) | `api_key` | api_key |
| `api_*` (20+ chars) | `api_key` | api_key |
| `ghp_*` (36 chars) | `api_key` | api_key |
| `gho_*` (36 chars) | `api_key` | api_key |
| `xoxb-*` | `api_key` | api_key |
| `-----BEGIN * PRIVATE KEY-----` | `ssh_key` | password |
| `mysql://...` | `connection_string` | connection_string |
| `postgresql://...` | `connection_string` | connection_string |
| `Server=...;Database=...` | `connection_string` | connection_string |
| JWT (3 parts, 50+ chars) | `jwt` | token |
| `ya29.*`, `ey*`, `oauth*` | `oauth` | token |
| Default | `password` | password |

---

## API Reference

### Function 1: `scan_for_credential_files(base_path: str)`

Find all credential files in a directory tree.
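For a mental model of the directory walk, here is a minimal sketch built on `os.walk`. It is not the module's implementation: `scan_for_credential_files_sketch`, `CREDENTIAL_FILE_NAMES`, and `EXCLUDED_DIRS` are hypothetical names that simply restate the scanned file names and excluded directories listed for this function, and exact, case-sensitive name matching is an assumption.

```python
import os
from typing import List

# Hypothetical constants restating the documented file names and exclusions.
CREDENTIAL_FILE_NAMES = {
    "credentials.md", "credentials.txt",
    "passwords.md", "passwords.txt",
    "secrets.md", "secrets.txt",
    "auth.md", "auth.txt",
    ".env", ".env.local", ".env.production", ".env.development", ".env.staging",
}
EXCLUDED_DIRS = {".git", ".svn", "node_modules", "venv", "__pycache__", ".venv", "dist", "build"}


def scan_for_credential_files_sketch(base_path: str) -> List[str]:
    """Return absolute paths of credential files under base_path (illustrative only)."""
    found: List[str] = []
    for root, dirs, files in os.walk(os.path.abspath(base_path)):
        # Prune excluded directories in place so os.walk never descends into them.
        dirs[:] = [d for d in dirs if d not in EXCLUDED_DIRS]
        for name in files:
            if name in CREDENTIAL_FILE_NAMES:
                found.append(os.path.join(root, name))
    return sorted(found)
```

Assigning to `dirs[:]` while walking top-down is the standard way to stop `os.walk` from descending into a directory, which keeps scans of trees with large `node_modules` or `venv` folders cheap.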
**Parameters:**

- `base_path` (str): Root directory to search from

**Returns:**

- `List[str]`: Absolute paths to credential files found

**Scanned file names:**

- `credentials.md`, `credentials.txt`
- `passwords.md`, `passwords.txt`
- `secrets.md`, `secrets.txt`
- `auth.md`, `auth.txt`
- `.env`, `.env.local`, `.env.production`, `.env.development`, `.env.staging`

**Excluded directories:**

- `.git`, `.svn`, `node_modules`, `venv`, `__pycache__`, `.venv`, `dist`, `build`

**Example:**

```python
from api.utils.credential_scanner import scan_for_credential_files

files = scan_for_credential_files("C:/Projects/ClientA")
# Returns: ["C:/Projects/ClientA/credentials.md", "C:/Projects/ClientA/.env"]
```

---

### Function 2: `parse_credential_file(file_path: str)`

Extract credentials from a file and return structured data.

**Parameters:**

- `file_path` (str): Absolute path to credential file

**Returns:**

- `List[Dict]`: List of credential dictionaries

**Credential Dictionary Format:**

```python
{
    "service_name": "Gitea Admin",
    "credential_type": "password",
    "username": "admin",
    "password": "SecurePass123!",  # or api_key, token, etc.
    "internal_url": "192.168.1.100",
    "custom_port": 3000,
    "notes": "Main admin account"
}
```

**Example:**

```python
from api.utils.credential_scanner import parse_credential_file

creds = parse_credential_file("C:/Projects/credentials.md")
for cred in creds:
    print(f"Service: {cred['service_name']}")
    print(f"Type: {cred['credential_type']}")
```

---

### Function 3: `import_credentials_to_db(db, credentials, client_id=None, user_id="system_import", ip_address=None)`

Import credentials into the database with automatic encryption.
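The per-field encryption step can be pictured as splitting each parsed dictionary into plaintext metadata and sensitive values bound for `*_encrypted` columns. The sketch below is hypothetical: `prepare_for_storage`, `audit_details`, and `SENSITIVE_FIELDS` are illustrative names; the field list restates the per-field encryption list from Security Considerations; the `<field>_encrypted` naming follows the documented `credentials` columns (with `client_secret_encrypted` assumed by analogy); and `encrypt` stands in for whatever routine `credential_service` actually calls. It shows the shape of the data only, not real cryptography.

```python
from typing import Any, Callable, Dict

# Fields the guide lists as encrypted individually (column naming is assumed).
SENSITIVE_FIELDS = ("password", "api_key", "client_secret", "token", "connection_string")


def prepare_for_storage(cred: Dict[str, Any], encrypt: Callable[[str], str]) -> Dict[str, Any]:
    """Hypothetical helper: move sensitive fields into <field>_encrypted keys.

    Non-sensitive metadata (service_name, username, notes, ...) is copied as-is,
    so plaintext secrets never appear in the returned dict.
    """
    row: Dict[str, Any] = {}
    for key, value in cred.items():
        if key in SENSITIVE_FIELDS:
            if value:  # skip empty secrets instead of encrypting ""
                row[f"{key}_encrypted"] = encrypt(value)
        else:
            row[key] = value
    return row


def audit_details(cred: Dict[str, Any]) -> Dict[str, Any]:
    """Audit metadata limited to the non-sensitive fields the guide mentions."""
    return {
        "service_name": cred.get("service_name"),
        "credential_type": cred.get("credential_type"),
    }
```

In a unit test, a reversible dummy such as `lambda s: s[::-1]` is enough to confirm that no plaintext leaks into the stored row; in production the callable would be the service's real AES-256-GCM routine.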
**Parameters:**

- `db` (Session): SQLAlchemy database session
- `credentials` (List[Dict]): List of credential dictionaries from `parse_credential_file()`
- `client_id` (Optional[str]): UUID string to associate credentials with a client
- `user_id` (str): User ID for audit logging (default: "system_import")
- `ip_address` (Optional[str]): IP address for audit logging

**Returns:**

- `int`: Count of successfully imported credentials

**Security:**

- All sensitive fields automatically encrypted using AES-256-GCM
- Audit log entry created for each import (action: "create")
- Never logs plaintext credential values
- Uses existing `credential_service` encryption infrastructure

**Example:**

```python
from api.database import SessionLocal
from api.utils.credential_scanner import parse_credential_file, import_credentials_to_db

db = SessionLocal()
try:
    creds = parse_credential_file("C:/Projects/credentials.md")
    count = import_credentials_to_db(
        db=db,
        credentials=creds,
        client_id="a1b2c3d4-e5f6-7890-abcd-ef1234567890",
        user_id="mike@example.com",
        ip_address="192.168.1.100"
    )
    print(f"Imported {count} credentials")
finally:
    db.close()
```

---

### Function 4: `scan_and_import_credentials(base_path, db, client_id=None, user_id="system_import", ip_address=None)`

Scan for credential files and import all found credentials in one operation.
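Conceptually, this function chains the three functions above and tallies the counts it returns. The sketch below shows that composition with the scan, parse, and import steps passed in as callables so it runs without a database. `scan_and_import_sketch` and its parameter names are hypothetical, and letting exceptions propagate (rather than skipping a failing file) is an assumption, not documented behavior.

```python
from typing import Any, Callable, Dict, List

Credential = Dict[str, Any]


def scan_and_import_sketch(
    base_path: str,
    scan: Callable[[str], List[str]],
    parse: Callable[[str], List[Credential]],
    import_batch: Callable[[List[Credential]], int],
) -> Dict[str, int]:
    """Hypothetical composition of scan -> parse -> import with summary counts."""
    files = scan(base_path)
    parsed: List[Credential] = []
    for path in files:
        parsed.extend(parse(path))
    # Only touch the database when there is something to import.
    imported = import_batch(parsed) if parsed else 0
    return {
        "files_found": len(files),
        "credentials_parsed": len(parsed),
        "credentials_imported": imported,
    }
```

With the real module, `scan` and `parse` would be `scan_for_credential_files` and `parse_credential_file`, and `import_batch` could be `functools.partial(import_credentials_to_db, db, client_id=...)`, which calls `import_credentials_to_db(db, parsed, client_id=...)`.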
**Parameters:**

- `base_path` (str): Root directory to scan
- `db` (Session): Database session
- `client_id` (Optional[str]): Client UUID to associate credentials with
- `user_id` (str): User ID for audit logging
- `ip_address` (Optional[str]): IP address for audit logging

**Returns:**

- `Dict[str, int]`: Summary statistics
  - `files_found`: Number of credential files found
  - `credentials_parsed`: Total credentials parsed from all files
  - `credentials_imported`: Number successfully imported to database

**Example:**

```python
from api.database import SessionLocal
from api.utils.credential_scanner import scan_and_import_credentials

db = SessionLocal()
try:
    results = scan_and_import_credentials(
        base_path="C:/Projects/ClientA",
        db=db,
        client_id="client-uuid-here",
        user_id="mike@example.com"
    )
    print(f"Files found: {results['files_found']}")
    print(f"Credentials parsed: {results['credentials_parsed']}")
    print(f"Credentials imported: {results['credentials_imported']}")
finally:
    db.close()
```

---

## Usage Examples

### Example 1: Quick Import

```python
from api.database import SessionLocal
from api.utils.credential_scanner import scan_and_import_credentials

db = SessionLocal()
try:
    results = scan_and_import_credentials(
        "C:/Projects/ClientProject",
        db,
        client_id="your-client-uuid"
    )
    print(f"Imported {results['credentials_imported']} credentials")
finally:
    db.close()
```

### Example 2: Preview Before Import

```python
from api.utils.credential_scanner import scan_for_credential_files, parse_credential_file

# Find files
files = scan_for_credential_files("C:/Projects/ClientProject")
print(f"Found {len(files)} files")

# Preview credentials
for file_path in files:
    creds = parse_credential_file(file_path)
    print(f"\n{file_path}:")
    for cred in creds:
        print(f"  - {cred['service_name']} ({cred['credential_type']})")
```

### Example 3: Manual Import with Error Handling

```python
from api.database import SessionLocal
from api.utils.credential_scanner import (
    scan_for_credential_files,
    parse_credential_file,
    import_credentials_to_db
)

db = SessionLocal()
try:
    # Scan
    files = scan_for_credential_files("C:/Projects/ClientProject")

    # Parse and import each file separately
    for file_path in files:
        try:
            creds = parse_credential_file(file_path)
            count = import_credentials_to_db(db, creds, client_id="uuid-here")
            print(f"✓ Imported {count} from {file_path}")
        except Exception as e:
            # Roll back the failed transaction so the session stays usable for the
            # next file (assumes import_credentials_to_db commits its own work)
            db.rollback()
            print(f"✗ Failed to import {file_path}: {e}")
            continue
except Exception as e:
    print(f"Error: {e}")
finally:
    db.close()
```

### Example 4: Command-Line Import Tool

See `example_credential_import.py`:

```bash
# Preview without importing
python example_credential_import.py /path/to/project --preview

# Import with client association
python example_credential_import.py /path/to/project --client-id "uuid-here"
```

---

## Testing

Run the test suite:

```bash
python test_credential_scanner.py
```

**Tests included:**

1. Scan for credential files
2. Parse credential files (all formats)
3. Import credentials to database
4. Full workflow (scan + parse + import)
5. Markdown format variations

---

## Security Considerations

### Encryption

All credentials are encrypted before storage:

- **Algorithm**: AES-256-GCM
- **Key management**: Stored in environment variable `ENCRYPTION_KEY`
- **Per-field encryption**: password, api_key, client_secret, token, connection_string

### Audit Trail

Every import operation creates audit log entries:

- **Action**: "create"
- **User ID**: From function parameter
- **IP address**: From function parameter
- **Timestamp**: Auto-generated
- **Details**: Service name, credential type

### Logging Safety

- Plaintext credentials are **NEVER** logged
- File paths and counts are logged
- Service names (non-sensitive) are logged
- Errors are logged without credential values

### Best Practices

1. **Delete source files** after successful import
2. **Verify imports** using the API or database queries
3. **Use client_id** to associate credentials with clients
4. **Review audit logs** regularly for compliance
5. **Rotate credentials** after initial import if they were stored in plaintext

---

## Integration with ClaudeTools

### Credential Service

The scanner uses `api/services/credential_service.py` for all database operations:

- `create_credential()` - Handles encryption and audit logging
- Automatic validation via Pydantic schemas
- Foreign key enforcement (client_id, service_id, infrastructure_id)

### Database Schema

Credentials are stored in the `credentials` table:

- `id` - UUID primary key
- `service_name` - Display name
- `credential_type` - Type (password, api_key, etc.)
- `username` - Username (optional)
- `password_encrypted` - AES-256-GCM encrypted password
- `api_key_encrypted` - Encrypted API key
- `token_encrypted` - Encrypted token
- `connection_string_encrypted` - Encrypted connection string
- Plus 20+ other fields for metadata

### Audit Logging

Audit logs are stored in the `credential_audit_log` table:

- `credential_id` - Reference to credential
- `action` - "create", "view", "update", "delete", "decrypt"
- `user_id` - User performing action
- `ip_address` - Source IP
- `timestamp` - When action occurred
- `details` - JSON metadata

---

## Troubleshooting

### No files found

**Problem:** `scan_for_credential_files()` returns an empty list

**Solutions:**

- Verify that the base path exists and is a directory
- Check that file names match the expected names (credentials.md, .env, etc.)
- Ensure files are not in excluded directories (node_modules, .git, etc.)
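When the scan comes back empty, a quick way to check the first and last points above is to confirm the base path is a directory and list files that have a credential file name but sit under an excluded directory. `find_skipped_credential_files` below is a hypothetical diagnostic, not part of the module; its constants restate the excluded directories and scanned file names from the API reference, and exact, case-sensitive name matching is assumed.

```python
import os
from typing import List

# Restated from the API reference; keep in sync with the scanner if it changes.
EXCLUDED_DIRS = {".git", ".svn", "node_modules", "venv", "__pycache__", ".venv", "dist", "build"}
CREDENTIAL_FILE_NAMES = {
    "credentials.md", "credentials.txt", "passwords.md", "passwords.txt",
    "secrets.md", "secrets.txt", "auth.md", "auth.txt",
    ".env", ".env.local", ".env.production", ".env.development", ".env.staging",
}


def find_skipped_credential_files(base_path: str) -> List[str]:
    """Return credential-named files a scan would skip because of an excluded directory."""
    if not os.path.isdir(base_path):
        raise NotADirectoryError(f"Not a directory: {base_path}")
    skipped: List[str] = []
    # Walk everything (no pruning) so files inside excluded directories are visible.
    for root, _dirs, files in os.walk(base_path):
        rel_parts = os.path.relpath(root, base_path).split(os.sep)
        if any(part in EXCLUDED_DIRS for part in rel_parts):
            skipped.extend(os.path.join(root, f) for f in files if f in CREDENTIAL_FILE_NAMES)
    return sorted(skipped)
```

Because it deliberately walks into excluded directories, this check can be slow on trees with large `node_modules` folders; it is meant for one-off debugging rather than routine scans.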
### Parsing errors

**Problem:** `parse_credential_file()` returns an empty list

**Solutions:**

- Verify that the file format matches the expected structure (headers, key-value pairs)
- Check for encoding issues (files must be UTF-8)
- Ensure key names are recognized (see "Recognized keys" section)

### Import failures

**Problem:** `import_credentials_to_db()` fails or imports fewer credentials than were parsed

**Solutions:**

- Check that the database connection is active
- Verify that `client_id` exists if provided (foreign key constraint)
- Check that the encryption key is configured (`ENCRYPTION_KEY` environment variable)
- Review logs for specific validation errors

### Type detection issues

**Problem:** Credentials are imported with the wrong type

**Solutions:**

- Manually specify the `Type:` field in the credential file
- Update detection patterns in `_detect_credential_type()`
- Use explicit field names (e.g., "API Key:" instead of "Key:")

---

## Extending the Scanner

### Add New File Format

```python
from typing import Dict, List


def _parse_custom_format(content: str) -> List[Dict]:
    """Parse credentials from a custom format."""
    credentials: List[Dict] = []
    # Your parsing logic here
    return credentials


# Then, inside parse_credential_file(), extend the existing
# if/elif chain that dispatches on the file extension:
#
#     elif file_ext == '.custom':
#         credentials = _parse_custom_format(content)
```

Note that `scan_for_credential_files()` only finds the file names listed under Function 1, so files in a new format are picked up by a scan only if those names are extended as well.

### Add New Credential Type Pattern

```python
# Add to API_KEY_PATTERNS, SSH_KEY_PATTERN, or CONNECTION_STRING_PATTERNS
API_KEY_PATTERNS.append(r"^custom_[a-zA-Z0-9]{20,}")

# Or add detection logic to _detect_credential_type()
```

### Add Custom Field Mapping

```python
# In _parse_markdown_credentials(), add mapping:
elif key in ['custom_field', 'alt_name']:
    current_cred['custom_field'] = value
```

---

## Production Deployment

### Environment Setup

```bash
# Required environment variable
export ENCRYPTION_KEY="64-character-hex-string"

# Generate new key:
python -c "from api.utils.crypto import generate_encryption_key; print(generate_encryption_key())"
```

### Import Workflow

1. **Scan** client project directories
2. **Preview** credentials before import
3. **Import** with client association
4. **Verify** import success via API
5. **Delete** source credential files
6. **Rotate** credentials if needed
7. **Document** import in client notes

### Automation Example

```python
# Automated import script for all clients
import os

from api.database import SessionLocal
from api.models.client import Client
from api.utils.credential_scanner import scan_and_import_credentials

db = SessionLocal()
try:
    clients = db.query(Client).all()
    for client in clients:
        project_path = f"C:/Projects/{client.name}"
        # Skip clients without a project directory
        if os.path.isdir(project_path):
            results = scan_and_import_credentials(
                project_path,
                db,
                client_id=str(client.id)
            )
            print(f"{client.name}: {results['credentials_imported']} imported")
finally:
    db.close()
```

---

## Related Documentation

- **API Specification**: `.claude/API_SPEC.md`
- **Credential Schema**: `.claude/SCHEMA_CREDENTIALS.md`
- **Credential Service**: `api/services/credential_service.py`
- **Encryption Utils**: `api/utils/crypto.py`
- **Database Models**: `api/models/credential.py`

---

**Last Updated:** 2026-01-16

**Version:** 1.0

**Author:** ClaudeTools Development Team