claudetools/CREDENTIAL_SCANNER_GUIDE.md

# Credential Scanner and Importer Guide

**Module:** `api/utils/credential_scanner.py`
**Purpose:** Scan for credential files and import them into the ClaudeTools credential vault with automatic encryption
**Status:** Production Ready

---

## Overview

The Credential Scanner and Importer discovers credential files and imports their contents into the ClaudeTools database. Every credential is encrypted with AES-256-GCM before storage, and an audit log entry is written for each imported credential.

### Key Features

- **Multi-format support**: Markdown, .env, text files
- **Automatic encryption**: Uses existing `credential_service` for AES-256-GCM encryption
- **Type detection**: Auto-detects API keys, passwords, connection strings, tokens
- **Audit logging**: Every import operation is logged with full traceability
- **Client association**: Optional linking to specific clients
- **Safe parsing**: Never logs plaintext credential values

---

## Supported File Formats

### 1. Markdown Files (`.md`)

Structured format using headers and key-value pairs:

```markdown
## Gitea Admin
Username: admin
Password: SecurePass123!
URL: https://git.example.com
Notes: Main admin account

## Database Server
Type: connection_string
Connection String: mysql://dbuser:dbpass@192.168.1.50:3306/mydb
Notes: Production database

## OpenAI API
API Key: sk-1234567890abcdefghijklmnopqrstuvwxyz
Notes: Production API key
```

**Recognized keys:**
- `Username`, `User`, `Login` → username field
- `Password`, `Pass`, `Pwd` → password field
- `API Key`, `API_Key`, `ApiKey`, `Key` → api_key field
- `Token`, `Access Token`, `Bearer` → token field
- `Client Secret`, `Secret` → client_secret field
- `Connection String`, `Conn_Str` → connection_string field
- `URL`, `Host`, `Server`, `Address` → url (auto-detects internal/external)
- `Port` → custom_port field
- `Notes`, `Description` → notes field
- `Type`, `Credential_Type` → credential_type field
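
To make the key mapping concrete, here is a minimal sketch of how section headers and `Key: value` lines could be turned into field names. The names `FIELD_ALIASES` and `parse_sections` are hypothetical and are not taken from `credential_scanner.py`; the real parser may normalize keys differently, and this sketch does not attempt the internal/external URL detection.

```python
import re
from typing import Dict, List, Optional

# Hypothetical alias table built from the "Recognized keys" list above.
FIELD_ALIASES = {
    "username": "username", "user": "username", "login": "username",
    "password": "password", "pass": "password", "pwd": "password",
    "api key": "api_key", "api_key": "api_key", "apikey": "api_key", "key": "api_key",
    "token": "token", "access token": "token", "bearer": "token",
    "client secret": "client_secret", "secret": "client_secret",
    "connection string": "connection_string", "conn_str": "connection_string",
    "url": "url", "host": "url", "server": "url", "address": "url",
    "port": "custom_port",
    "notes": "notes", "description": "notes",
    "type": "credential_type", "credential_type": "credential_type",
}

# The key part excludes ':' so the split always happens at the first colon.
KEY_VALUE = re.compile(r"^([A-Za-z_ ]+?)\s*:\s*(.+)$")


def parse_sections(text: str) -> List[Dict[str, str]]:
    """Split text on '## ' headers and map recognized keys to field names."""
    sections: List[Dict[str, str]] = []
    current: Optional[Dict[str, str]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = {"service_name": line[3:].strip()}
            sections.append(current)
            continue
        match = KEY_VALUE.match(line)
        if current is None or match is None:
            continue  # skip blank lines, '# ' titles, and free text
        field = FIELD_ALIASES.get(match.group(1).strip().lower())
        if field is not None:
            current[field] = match.group(2).strip()
    return sections
```

Lower-casing the key before the lookup is an assumption that lets `API Key`, `api key`, and `API KEY` resolve to the same field.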

### 2. Environment Files (`.env`)

Standard environment variable format:

```bash
# Database Configuration
DATABASE_URL=mysql://user:pass@host:3306/db

# API Keys
OPENAI_API_KEY=sk-1234567890abcdefghij
GITHUB_TOKEN=ghp_abc123def456ghi789

# Secrets
SECRET_KEY=super_secret_key_12345
```

**Behavior:**
- Each `KEY=value` pair creates a separate credential
- Service name derived from KEY (e.g., `DATABASE_URL` → "Database Url")
- Credential type auto-detected from value pattern
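
The behavior above can be sketched roughly as follows. This is an illustration only: `parse_env_text` is a hypothetical name, and the handling of quotes and malformed lines is an assumption rather than documented scanner behavior.

```python
from typing import Dict, List


def parse_env_text(text: str) -> List[Dict[str, str]]:
    """Turn KEY=value lines into one record per variable."""
    records: List[Dict[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and lines without '='
        # Split at the first '=' only, so values may contain '=' themselves.
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")  # assumption: drop surrounding quotes
        if not key or not value:
            continue
        records.append({
            # DATABASE_URL -> "Database Url"
            "service_name": key.replace("_", " ").title(),
            "value": value,
        })
    return records
```

The credential type would then be chosen from `value`, as described in the type detection section.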

### 3. Text Files (`.txt`)

Same format as Markdown files, but with a `.txt` extension:

```text
# Server Passwords

## Web Server
Username: webadmin
Password: Web@dmin2024!
Host: 192.168.1.100
Port: 22

## Backup Server
Username: backup
Password: BackupSecure789
Host: 10.0.0.50
```

---

## Credential Type Detection

The scanner automatically detects credential types based on value patterns:

| Pattern | Detected Type | Field |
|---------|--------------|-------|
| `sk-*` (20+ chars) | `api_key` | api_key |
| `api_*` (20+ chars) | `api_key` | api_key |
| `ghp_*` (36 chars) | `api_key` | api_key |
| `gho_*` (36 chars) | `api_key` | api_key |
| `xoxb-*` | `api_key` | api_key |
| `-----BEGIN * PRIVATE KEY-----` | `ssh_key` | password |
| `mysql://...` | `connection_string` | connection_string |
| `postgresql://...` | `connection_string` | connection_string |
| `Server=...;Database=...` | `connection_string` | connection_string |
| JWT (3 parts, 50+ chars) | `jwt` | token |
| `ya29.*`, `ey*`, `oauth*` | `oauth` | token |
| Default | `password` | password |
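
As a rough illustration of the table, the sketch below checks the patterns top to bottom and returns the first match. The function name `detect_type` is hypothetical (the module's own helper is `_detect_credential_type()`), and the length thresholds are read here as minimum total lengths, which is an assumption about how the real patterns are written.

```python
import re

# Hypothetical patterns mirroring the table rows.
PRIVATE_KEY = re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----")
ADO_STYLE = re.compile(r"Server=[^;]+;.*Database=[^;]+", re.IGNORECASE)
JWT = re.compile(r"^[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+$")


def detect_type(value: str) -> str:
    """Return a credential type for a raw value; first matching rule wins."""
    v = value.strip()
    if v.startswith(("sk-", "api_")) and len(v) >= 20:
        return "api_key"
    if v.startswith(("ghp_", "gho_")) and len(v) >= 36:
        return "api_key"
    if v.startswith("xoxb-"):
        return "api_key"
    if PRIVATE_KEY.search(v):
        return "ssh_key"
    if v.startswith(("mysql://", "postgresql://")) or ADO_STYLE.search(v):
        return "connection_string"
    # JWT is checked before the generic "ey*" rule so it is not shadowed.
    if len(v) >= 50 and JWT.match(v):
        return "jwt"
    if v.startswith(("ya29.", "ey", "oauth")):
        return "oauth"
    return "password"
```

Order matters in a first-match design: a JWT also starts with `ey`, so testing the three-part form first keeps it from being classified as `oauth`.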

---

## API Reference

### Function 1: `scan_for_credential_files(base_path: str)`

Find all credential files in a directory tree.

**Parameters:**
- `base_path` (str): Root directory to search from

**Returns:**
- `List[str]`: Absolute paths to credential files found

**Scanned file names:**
- `credentials.md`, `credentials.txt`
- `passwords.md`, `passwords.txt`
- `secrets.md`, `secrets.txt`
- `auth.md`, `auth.txt`
- `.env`, `.env.local`, `.env.production`, `.env.development`, `.env.staging`

**Excluded directories:**
- `.git`, `.svn`, `node_modules`, `venv`, `__pycache__`, `.venv`, `dist`, `build`

**Example:**

```python
from api.utils.credential_scanner import scan_for_credential_files

files = scan_for_credential_files("C:/Projects/ClientA")
# Returns: ["C:/Projects/ClientA/credentials.md", "C:/Projects/ClientA/.env"]
```
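
For readers who want to see how such a scan can work, here is a minimal sketch built on `os.walk`. It is not the module's implementation: `find_credential_files` is a hypothetical name, and exact, case-sensitive file name matching is an assumption.

```python
import os
from typing import List

# File names and exclusions copied from the lists above.
CREDENTIAL_FILE_NAMES = {
    "credentials.md", "credentials.txt",
    "passwords.md", "passwords.txt",
    "secrets.md", "secrets.txt",
    "auth.md", "auth.txt",
    ".env", ".env.local", ".env.production", ".env.development", ".env.staging",
}
EXCLUDED_DIRS = {
    ".git", ".svn", "node_modules", "venv", "__pycache__", ".venv", "dist", "build",
}


def find_credential_files(base_path: str) -> List[str]:
    """Return sorted absolute paths of credential files under base_path."""
    found: List[str] = []
    for root, dirs, files in os.walk(base_path):
        # Prune in place so os.walk never descends into excluded directories.
        dirs[:] = [d for d in dirs if d not in EXCLUDED_DIRS]
        for name in files:
            if name in CREDENTIAL_FILE_NAMES:
                found.append(os.path.abspath(os.path.join(root, name)))
    return sorted(found)
```

Assigning to `dirs[:]` rather than rebinding `dirs` is what makes the pruning take effect, because `os.walk` reads the same list object when deciding where to descend.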

---

### Function 2: `parse_credential_file(file_path: str)`

Extract credentials from a file and return structured data.

**Parameters:**
- `file_path` (str): Absolute path to credential file

**Returns:**
- `List[Dict]`: List of credential dictionaries

**Credential Dictionary Format:**

```python
{
    "service_name": "Gitea Admin",
    "credential_type": "password",
    "username": "admin",
    "password": "SecurePass123!",  # or api_key, token, etc.
    "internal_url": "192.168.1.100",
    "custom_port": 3000,
    "notes": "Main admin account"
}
```

**Example:**

```python
from api.utils.credential_scanner import parse_credential_file

creds = parse_credential_file("C:/Projects/credentials.md")
for cred in creds:
    print(f"Service: {cred['service_name']}")
    print(f"Type: {cred['credential_type']}")
```

---

### Function 3: `import_credentials_to_db(db, credentials, client_id=None, user_id="system_import", ip_address=None)`

Import credentials into the database with automatic encryption.

**Parameters:**
- `db` (Session): SQLAlchemy database session
- `credentials` (List[Dict]): List of credential dictionaries from `parse_credential_file()`
- `client_id` (Optional[str]): UUID string to associate credentials with a client
- `user_id` (str): User ID for audit logging (default: "system_import")
- `ip_address` (Optional[str]): IP address for audit logging

**Returns:**
- `int`: Count of successfully imported credentials

**Security:**
- All sensitive fields are automatically encrypted using AES-256-GCM
- Audit log entry created for each import (action: "create")
- Never logs plaintext credential values
- Uses existing `credential_service` encryption infrastructure

**Example:**

```python
from api.database import SessionLocal
from api.utils.credential_scanner import parse_credential_file, import_credentials_to_db

db = SessionLocal()
try:
    creds = parse_credential_file("C:/Projects/credentials.md")
    count = import_credentials_to_db(
        db=db,
        credentials=creds,
        client_id="a1b2c3d4-e5f6-7890-abcd-ef1234567890",
        user_id="mike@example.com",
        ip_address="192.168.1.100"
    )
    print(f"Imported {count} credentials")
finally:
    db.close()
```

---

### Function 4: `scan_and_import_credentials(base_path, db, client_id=None, user_id="system_import", ip_address=None)`

Scan for credential files and import all found credentials in one operation.

**Parameters:**
- `base_path` (str): Root directory to scan
- `db` (Session): Database session
- `client_id` (Optional[str]): Client UUID to associate credentials with
- `user_id` (str): User ID for audit logging
- `ip_address` (Optional[str]): IP address for audit logging

**Returns:**
- `Dict[str, int]`: Summary statistics
  - `files_found`: Number of credential files found
  - `credentials_parsed`: Total credentials parsed from all files
  - `credentials_imported`: Number successfully imported to database

**Example:**

```python
from api.database import SessionLocal
from api.utils.credential_scanner import scan_and_import_credentials

db = SessionLocal()
try:
    results = scan_and_import_credentials(
        base_path="C:/Projects/ClientA",
        db=db,
        client_id="client-uuid-here",
        user_id="mike@example.com"
    )

    print(f"Files found: {results['files_found']}")
    print(f"Credentials parsed: {results['credentials_parsed']}")
    print(f"Credentials imported: {results['credentials_imported']}")
finally:
    db.close()
```

---

## Usage Examples

### Example 1: Quick Import

```python
from api.database import SessionLocal
from api.utils.credential_scanner import scan_and_import_credentials

db = SessionLocal()
try:
    results = scan_and_import_credentials(
        "C:/Projects/ClientProject",
        db,
        client_id="your-client-uuid"
    )
    print(f"Imported {results['credentials_imported']} credentials")
finally:
    db.close()
```

### Example 2: Preview Before Import

```python
from api.utils.credential_scanner import scan_for_credential_files, parse_credential_file

# Find files
files = scan_for_credential_files("C:/Projects/ClientProject")
print(f"Found {len(files)} files")

# Preview credentials
for file_path in files:
    creds = parse_credential_file(file_path)
    print(f"\n{file_path}:")
    for cred in creds:
        print(f"  - {cred['service_name']} ({cred['credential_type']})")
```

### Example 3: Manual Import with Error Handling

```python
from api.database import SessionLocal
from api.utils.credential_scanner import (
    scan_for_credential_files,
    parse_credential_file,
    import_credentials_to_db
)

db = SessionLocal()
try:
    # Scan
    files = scan_for_credential_files("C:/Projects/ClientProject")

    # Parse and import each file separately
    for file_path in files:
        try:
            creds = parse_credential_file(file_path)
            count = import_credentials_to_db(db, creds, client_id="uuid-here")
            print(f"✓ Imported {count} from {file_path}")
        except Exception as e:
            # Roll back so a failed file does not leave the session unusable
            db.rollback()
            print(f"✗ Failed to import {file_path}: {e}")
            continue

except Exception as e:
    print(f"Error: {e}")
finally:
    db.close()
```

### Example 4: Command-Line Import Tool

See `example_credential_import.py`:

```bash
# Preview without importing
python example_credential_import.py /path/to/project --preview

# Import with client association
python example_credential_import.py /path/to/project --client-id "uuid-here"
```

---

## Testing

Run the test suite:

```bash
python test_credential_scanner.py
```

**Tests included:**
1. Scan for credential files
2. Parse credential files (all formats)
3. Import credentials to database
4. Full workflow (scan + parse + import)
5. Markdown format variations

---

## Security Considerations

### Encryption

All credentials are encrypted before storage:
- **Algorithm**: AES-256-GCM (see `api/utils/crypto.py`)
- **Key management**: Stored in environment variable `ENCRYPTION_KEY`
- **Per-field encryption**: password, api_key, client_secret, token, connection_string

### Audit Trail

Every import operation creates audit log entries:
- **Action**: "create"
- **User ID**: From function parameter
- **IP address**: From function parameter
- **Timestamp**: Auto-generated
- **Details**: Service name, credential type

### Logging Safety

- Plaintext credentials are **NEVER** logged
- File paths and counts are logged
- Service names (non-sensitive) are logged
- Errors are logged without credential values
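
One way to follow these rules in calling code is to log a reduced summary instead of the credential dictionary itself. The helper below is a hypothetical sketch, not part of the module; the list of sensitive fields is taken from the per-field encryption list above.

```python
from typing import Any, Dict

# Fields listed under "Per-field encryption"; their values must never be logged.
SENSITIVE_FIELDS = ("password", "api_key", "client_secret", "token", "connection_string")


def log_safe_summary(cred: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the credential that is safe to write to logs."""
    return {
        "service_name": cred.get("service_name"),
        "credential_type": cred.get("credential_type"),
        # Record only which secrets are present, never their values.
        "sensitive_fields_present": [f for f in SENSITIVE_FIELDS if cred.get(f)],
    }
```

Building the summary from an allow-list of fields means a newly added secret field is omitted by default rather than leaked.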

### Best Practices

1. **Delete source files** after successful import
2. **Verify imports** using the API or database queries
3. **Use client_id** to associate credentials with clients
4. **Review audit logs** regularly for compliance
5. **Rotate credentials** after initial import if they were stored in plaintext

---

## Integration with ClaudeTools

### Credential Service

The scanner uses `api/services/credential_service.py` for all database operations:
- `create_credential()` - Handles encryption and audit logging
- Automatic validation via Pydantic schemas
- Foreign key enforcement (client_id, service_id, infrastructure_id)

### Database Schema

Credentials are stored in the `credentials` table:
- `id` - UUID primary key
- `service_name` - Display name
- `credential_type` - Type (password, api_key, etc.)
- `username` - Username (optional)
- `password_encrypted` - AES-256-GCM encrypted password
- `api_key_encrypted` - Encrypted API key
- `token_encrypted` - Encrypted token
- `connection_string_encrypted` - Encrypted connection string
- Plus 20+ other fields for metadata

### Audit Logging

Audit logs are stored in the `credential_audit_log` table:
- `credential_id` - Reference to credential
- `action` - "create", "view", "update", "delete", "decrypt"
- `user_id` - User performing action
- `ip_address` - Source IP
- `timestamp` - When action occurred
- `details` - JSON metadata

---

## Troubleshooting

### No files found

**Problem:** `scan_for_credential_files()` returns empty list

**Solutions:**
- Verify the base path exists and is a directory
- Check file names match expected patterns (credentials.md, .env, etc.)
- Ensure files are not in excluded directories (node_modules, .git, etc.)

### Parsing errors

**Problem:** `parse_credential_file()` returns empty list

**Solutions:**
- Verify file format matches expected structure (headers, key-value pairs)
- Check for encoding issues (must be UTF-8)
- Ensure key names are recognized (see "Recognized keys" section)
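
To check the encoding point quickly, a small standard-library helper such as the one below can report whether a file decodes as UTF-8. `check_utf8` is a hypothetical name, and whether the parser tolerates a byte order mark is not documented here, so the BOM cases are reported separately rather than treated as errors.

```python
from pathlib import Path


def check_utf8(path: str) -> str:
    """Describe the encoding of a file as far as a byte-level check allows."""
    data = Path(path).read_bytes()
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8 with BOM"
    if data.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16 (BOM detected)"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte that failed to decode.
        return f"not utf-8 (byte offset {exc.start})"
    return "utf-8"
```

Files saved by some Windows editors are UTF-16 or carry a BOM, which is a common reason for an otherwise well-formed file to parse as empty.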

### Import failures

**Problem:** `import_credentials_to_db()` fails or imports fewer credentials than were parsed

**Solutions:**
- Check database connection is active
- Verify `client_id` exists if provided (foreign key constraint)
- Check encryption key is configured (`ENCRYPTION_KEY` environment variable)
- Review logs for specific validation errors

### Type detection issues

**Problem:** Credentials imported with wrong type

**Solutions:**
- Manually specify `Type:` field in credential file
- Update detection patterns in `_detect_credential_type()`
- Use explicit field names (e.g., "API Key:" instead of "Key:")

---

## Extending the Scanner

### Add New File Format

```python
from typing import Dict, List


def _parse_custom_format(content: str) -> List[Dict]:
    """Parse credentials from custom format."""
    credentials: List[Dict] = []

    # Your parsing logic here

    return credentials


# Then add a branch to parse_credential_file(), alongside the existing ones:
#     elif file_ext == '.custom':
#         credentials = _parse_custom_format(content)
```

### Add New Credential Type Pattern

```python
# Add to API_KEY_PATTERNS, SSH_KEY_PATTERN, or CONNECTION_STRING_PATTERNS
API_KEY_PATTERNS.append(r"^custom_[a-zA-Z0-9]{20,}")

# Or add detection logic to _detect_credential_type()
```

### Add Custom Field Mapping

```python
# In _parse_markdown_credentials(), add mapping:
elif key in ['custom_field', 'alt_name']:
    current_cred['custom_field'] = value
```

---

## Production Deployment

### Environment Setup

```bash
# Required environment variable
export ENCRYPTION_KEY="64-character-hex-string"

# Generate new key:
python -c "from api.utils.crypto import generate_encryption_key; print(generate_encryption_key())"
```
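
Before starting an import, it can help to confirm that the key has the expected shape. The sketch below assumes, based on the placeholder above, that `ENCRYPTION_KEY` is 64 hexadecimal characters (32 bytes, the size of an AES-256 key). Both function names are hypothetical, and `generate_hex_key` is a standard-library stand-in, not the project's `generate_encryption_key`.

```python
import secrets
import string


def generate_hex_key() -> str:
    """Return 32 random bytes as a 64-character hex string."""
    return secrets.token_hex(32)


def validate_hex_key(value: str) -> bytes:
    """Return the 32 raw key bytes, or raise ValueError if the shape is wrong."""
    if len(value) != 64 or any(c not in string.hexdigits for c in value):
        raise ValueError("ENCRYPTION_KEY must be exactly 64 hexadecimal characters")
    return bytes.fromhex(value)
```

A startup check could call `validate_hex_key(os.environ.get("ENCRYPTION_KEY", ""))` and abort with a clear message instead of failing on the first encryption attempt.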

### Import Workflow

1. **Scan** client project directories
2. **Preview** credentials before import
3. **Import** with client association
4. **Verify** import success via API
5. **Delete** source credential files
6. **Rotate** credentials if needed
7. **Document** import in client notes

### Automation Example

```python
# Automated import script for all clients
import os

from api.database import SessionLocal
from api.models.client import Client
from api.utils.credential_scanner import scan_and_import_credentials

db = SessionLocal()
try:
    clients = db.query(Client).all()

    for client in clients:
        project_path = f"C:/Projects/{client.name}"
        if os.path.exists(project_path):
            results = scan_and_import_credentials(
                project_path,
                db,
                client_id=str(client.id)
            )
            print(f"{client.name}: {results['credentials_imported']} imported")
finally:
    db.close()
```

---

## Related Documentation

- **API Specification**: `.claude/API_SPEC.md`
- **Credential Schema**: `.claude/SCHEMA_CREDENTIALS.md`
- **Credential Service**: `api/services/credential_service.py`
- **Encryption Utils**: `api/utils/crypto.py`
- **Database Models**: `api/models/credential.py`

---

**Last Updated:** 2026-01-16
**Version:** 1.0
**Author:** ClaudeTools Development Team