claudetools/.claude/SCHEMA_CONTEXT.md
Mike Swanson 390b10b32c Complete Phase 6: MSP Work Tracking with Context Recall System
Implements production-ready MSP platform with cross-machine persistent memory for Claude.

API Implementation:
- 130 REST API endpoints across 21 entities
- JWT authentication on all endpoints
- AES-256-GCM encryption for credentials
- Automatic audit logging
- Complete OpenAPI documentation

Database:
- 43 tables in MariaDB (172.16.3.20:3306)
- 42 SQLAlchemy models with modern 2.0 syntax
- Full Alembic migration system
- 99.1% CRUD test pass rate

Context Recall System (Phase 6):
- Cross-machine persistent memory via database
- Automatic context injection via Claude Code hooks
- Automatic context saving after task completion
- 90-95% token reduction with compression utilities
- Relevance scoring with time decay
- Tag-based semantic search
- One-command setup script

Security Features:
- JWT tokens with Argon2 password hashing
- AES-256-GCM encryption for all sensitive data
- Comprehensive audit trail for credentials
- HMAC tamper detection
- Secure configuration management

Test Results:
- Phase 3: 38/38 CRUD tests passing (100%)
- Phase 4: 34/35 core API tests passing (97.1%)
- Phase 5: 62/62 extended API tests passing (100%)
- Phase 6: 10/10 compression tests passing (100%)
- Overall: 144/145 tests passing (99.3%)

Documentation:
- Comprehensive architecture guides
- Setup automation scripts
- API documentation at /api/docs
- Complete test reports
- Troubleshooting guides

Project Status: 95% Complete (Production-Ready)
Phase 7 (optional work context APIs) remains for future enhancement.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-17 06:00:26 -07:00


Learning & Context Schema

MSP Mode Database Schema - Self-Learning System

Status: Designed 2026-01-15
Database: msp_tracking (MariaDB on Jupiter)


Overview

The Learning & Context subsystem enables MSP Mode to learn from every failure, build environmental awareness, and prevent recurring mistakes. This self-improving system captures failure patterns, generates actionable insights, and proactively checks environmental constraints before making suggestions.

Core Principle: Every failure is a learning opportunity. Agents must never make the same mistake twice.

Related Documentation:


Tables Summary

| Table | Purpose | Auto-Generated |
|---|---|---|
| environmental_insights | Generated insights per client/infrastructure | Yes |
| problem_solutions | Issue tracking with root cause and resolution | Partial |
| failure_patterns | Aggregated failure analysis and learnings | Yes |
| operation_failures | Non-command failures (API, file ops, network) | Yes |

Total: 4 tables

Specialized Agents:

  • Failure Analysis Agent - Analyzes failures, identifies patterns, generates insights
  • Environment Context Agent - Pre-checks environmental constraints before operations
  • Problem Pattern Matching Agent - Searches historical solutions for similar issues

Table Schemas

environmental_insights

Auto-generated insights about client infrastructure constraints, limitations, and quirks. Used by the Environment Context Agent to prevent failures before they occur.

CREATE TABLE environmental_insights (
    id UUID PRIMARY KEY DEFAULT (UUID()),
    client_id UUID REFERENCES clients(id) ON DELETE CASCADE,
    infrastructure_id UUID REFERENCES infrastructure(id) ON DELETE CASCADE,

    -- Insight classification
    insight_category VARCHAR(100) NOT NULL CHECK(insight_category IN (
        'command_constraints', 'service_configuration', 'version_limitations',
        'custom_installations', 'network_constraints', 'permissions',
        'compatibility', 'performance', 'security'
    )),
    insight_title VARCHAR(500) NOT NULL,
    insight_description TEXT NOT NULL, -- markdown formatted

    -- Examples and documentation
    examples TEXT, -- JSON array of command/config examples
    affected_operations TEXT, -- JSON array: ["user_management", "service_restart"]

    -- Source and verification
    source_pattern_id UUID REFERENCES failure_patterns(id) ON DELETE SET NULL,
    confidence_level VARCHAR(20) CHECK(confidence_level IN ('confirmed', 'likely', 'suspected')),
    verification_count INTEGER DEFAULT 1, -- how many times verified
    last_verified TIMESTAMP,

    -- Priority (1-10, higher = more important to avoid)
    priority INTEGER DEFAULT 5 CHECK(priority BETWEEN 1 AND 10),

    -- Status
    is_active BOOLEAN DEFAULT true, -- false if pattern no longer applies
    superseded_by UUID REFERENCES environmental_insights(id), -- if replaced by better insight

    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    INDEX idx_insights_client (client_id),
    INDEX idx_insights_infrastructure (infrastructure_id),
    INDEX idx_insights_category (insight_category),
    INDEX idx_insights_priority (priority),
    INDEX idx_insights_active (is_active)
);
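The `examples` and `affected_operations` columns are TEXT columns that hold JSON arrays. Application code therefore has to encode and decode them, and it can reject bad values before the CHECK constraints do. Below is a minimal Python sketch, assuming rows are handled as plain dicts. The `Insight` class and its methods are hypothetical and are not part of the implemented API.

```python
import json
from dataclasses import dataclass, field
from typing import Optional

# Allowed values mirrored from the CHECK constraints in the DDL above.
INSIGHT_CATEGORIES = {
    "command_constraints", "service_configuration", "version_limitations",
    "custom_installations", "network_constraints", "permissions",
    "compatibility", "performance", "security",
}
CONFIDENCE_LEVELS = {"confirmed", "likely", "suspected"}


@dataclass
class Insight:
    """Hypothetical application-side model of one environmental_insights row."""
    insight_category: str
    insight_title: str
    insight_description: str
    examples: list = field(default_factory=list)             # JSON array in a TEXT column
    affected_operations: list = field(default_factory=list)  # JSON array in a TEXT column
    confidence_level: Optional[str] = None
    priority: int = 5                                         # same DEFAULT as the DDL

    def validate(self) -> None:
        """Raise ValueError for values the database CHECK constraints would reject."""
        if self.insight_category not in INSIGHT_CATEGORIES:
            raise ValueError(f"unknown insight_category: {self.insight_category!r}")
        if self.confidence_level is not None and self.confidence_level not in CONFIDENCE_LEVELS:
            raise ValueError(f"unknown confidence_level: {self.confidence_level!r}")
        if not 1 <= self.priority <= 10:
            raise ValueError(f"priority must be between 1 and 10, got {self.priority}")
        if len(self.insight_title) > 500:
            raise ValueError("insight_title exceeds VARCHAR(500)")

    def to_row(self) -> dict:
        """Validate, then encode the JSON-array columns as TEXT for an INSERT."""
        self.validate()
        row = {k: getattr(self, k) for k in (
            "insight_category", "insight_title", "insight_description",
            "confidence_level", "priority")}
        row["examples"] = json.dumps(self.examples)
        row["affected_operations"] = json.dumps(self.affected_operations)
        return row

    @classmethod
    def from_row(cls, row: dict) -> "Insight":
        """Decode a fetched row; NULL JSON columns become empty lists."""
        return cls(
            insight_category=row["insight_category"],
            insight_title=row["insight_title"],
            insight_description=row["insight_description"],
            examples=json.loads(row.get("examples") or "[]"),
            affected_operations=json.loads(row.get("affected_operations") or "[]"),
            confidence_level=row.get("confidence_level"),
            priority=row.get("priority") or 5,
        )
```

Because the CHECK lists are mirrored in code, both copies must be updated whenever a category is added. The database constraint remains the authority.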

Real-World Examples:

D2TESTNAS - Custom WINS Installation:

{
  "infrastructure_id": "d2testnas-uuid",
  "client_id": "dataforth-uuid",
  "insight_category": "custom_installations",
  "insight_title": "WINS Service: Manual Samba installation (no native ReadyNAS service)",
  "insight_description": "**Installation:** Manually installed via Samba nmbd, not a native ReadyNAS service.\n\n**Constraints:**\n- No GUI service manager for WINS\n- Cannot use standard service management commands\n- Configuration via `/etc/frontview/samba/smb.conf.overrides`\n\n**Correct commands:**\n- Check status: `ssh root@192.168.0.9 'ps aux | grep nmbd'`\n- View config: `ssh root@192.168.0.9 'cat /etc/frontview/samba/smb.conf.overrides | grep wins'`\n- Restart: `ssh root@192.168.0.9 'service nmbd restart'`",
  "examples": [
    "ps aux | grep nmbd",
    "cat /etc/frontview/samba/smb.conf.overrides | grep wins",
    "service nmbd restart"
  ],
  "affected_operations": ["service_management", "wins_configuration"],
  "confidence_level": "confirmed",
  "verification_count": 3,
  "priority": 9
}

AD2 - PowerShell Version Constraints:

{
  "infrastructure_id": "ad2-uuid",
  "client_id": "dataforth-uuid",
  "insight_category": "version_limitations",
  "insight_title": "Server 2022: PowerShell 5.1 command compatibility",
  "insight_description": "**PowerShell Version:** 5.1 (default)\n\n**Compatible:** Modern cmdlets work (Get-LocalUser, Get-LocalGroup)\n\n**Not available:** PowerShell 7 specific features\n\n**Remote execution:** Use Invoke-Command for remote operations",
  "examples": [
    "Get-LocalUser",
    "Get-LocalGroup",
    "Invoke-Command -ComputerName AD2 -ScriptBlock { Get-LocalUser }"
  ],
  "confidence_level": "confirmed",
  "verification_count": 5,
  "priority": 6
}

Server 2008 - PowerShell 2.0 Limitations:

{
  "infrastructure_id": "old-server-2008-uuid",
  "insight_category": "version_limitations",
  "insight_title": "Server 2008: PowerShell 2.0 command compatibility",
  "insight_description": "**PowerShell Version:** 2.0 only\n\n**Avoid:** Get-LocalUser, Get-LocalGroup, New-LocalUser (not available in PS 2.0)\n\n**Use instead:** Get-WmiObject Win32_UserAccount, Get-WmiObject Win32_Group\n\n**Why:** Server 2008 predates modern PowerShell user management cmdlets",
  "examples": [
    "Get-WmiObject Win32_UserAccount",
    "Get-WmiObject Win32_Group",
    "Get-WmiObject Win32_UserAccount -Filter \"Name='username'\""
  ],
  "affected_operations": ["user_management", "group_management"],
  "confidence_level": "confirmed",
  "verification_count": 5,
  "priority": 8
}

DOS Machines (TS-XX) - Batch Syntax Constraints:

{
  "infrastructure_id": "ts-27-uuid",
  "client_id": "dataforth-uuid",
  "insight_category": "command_constraints",
  "insight_title": "MS-DOS 6.22: Batch file syntax limitations",
  "insight_description": "**OS:** MS-DOS 6.22\n\n**No support for:**\n- `IF /I` (case insensitive) - added in Windows 2000\n- Long filenames (8.3 format only)\n- Unicode or special characters\n- Modern batch features\n\n**Workarounds:**\n- Use duplicate IF statements for upper/lowercase\n- Keep filenames to 8.3 format\n- Use basic batch syntax only",
  "examples": [
    "IF \"%1\"==\"STATUS\" GOTO STATUS",
    "IF \"%1\"==\"status\" GOTO STATUS",
    "COPY FILE.TXT BACKUP.TXT"
  ],
  "affected_operations": ["batch_scripting", "file_operations"],
  "confidence_level": "confirmed",
  "verification_count": 8,
  "priority": 10
}

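The duplicate-IF workaround is mechanical enough to generate. The hypothetical Python helpers below (the names are illustrative and not part of MSP Mode) emit the upper/lower-case pair of IF lines and flag filenames that do not fit 8.3. The filename check is deliberately stricter than DOS itself: it allows only letters, digits, underscore, and hyphen.

```python
import re

# Conservative 8.3 check: 1-8 character base name, optional 1-3 character extension.
_DOS_83 = re.compile(r"[A-Za-z0-9_-]{1,8}(\.[A-Za-z0-9_-]{1,3})?")


def is_dos_83_name(name: str) -> bool:
    """True if `name` fits the 8.3 format MS-DOS 6.22 requires."""
    return _DOS_83.fullmatch(name) is not None


def dos_case_branches(arg: str, value: str, label: str) -> list:
    """Emit IF lines that stand in for `IF /I` on MS-DOS 6.22.

    One comparison per spelling: UPPER and lower case only, as in the
    workaround above. Mixed-case input such as "Status" still falls through.
    """
    spellings = dict.fromkeys((value.upper(), value.lower()))  # de-duplicate, keep order
    return [f'IF "{arg}"=="{s}" GOTO {label}' for s in spellings]
```

For example, `dos_case_branches("%1", "status", "STATUS")` yields the two IF lines shown in the `examples` array above.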
D2TESTNAS - SMB Protocol Constraints:

{
  "infrastructure_id": "d2testnas-uuid",
  "insight_category": "network_constraints",
  "insight_title": "ReadyNAS: SMB1/CORE protocol for DOS compatibility",
  "insight_description": "**Protocol:** CORE/SMB1 only (for DOS machine compatibility)\n\n**Implications:**\n- Modern SMB2/3 clients may need configuration\n- Use NetBIOS name, not IP address for DOS machines\n- Security risk: SMB1 deprecated due to vulnerabilities\n\n**Configuration:**\n- Set in `/etc/frontview/samba/smb.conf.overrides`\n- `min protocol = CORE`",
  "examples": [
    "NET USE Z: \\\\D2TESTNAS\\SHARE (from DOS)",
    "smbclient -L //192.168.0.9 -m SMB1"
  ],
  "confidence_level": "confirmed",
  "priority": 7
}

Generated insights.md Example:

When the Failure Analysis Agent runs, it generates a markdown file for each client:

````markdown
# Environmental Insights: Dataforth

Auto-generated from failure patterns and verified operations.

## D2TESTNAS (192.168.0.9)

### Custom Installations

**WINS Service: Manual Samba installation**
- Manually installed via Samba nmbd, not native ReadyNAS service
- No GUI service manager for WINS
- Configure via `/etc/frontview/samba/smb.conf.overrides`
- Check status: `ssh root@192.168.0.9 'ps aux | grep nmbd'`

### Network Constraints

**SMB Protocol: CORE/SMB1 only**
- For DOS compatibility
- Modern SMB2/3 clients may need configuration
- Use NetBIOS name from DOS machines

## AD2 (192.168.0.6 - Server 2022)

### PowerShell Version

**Version:** PowerShell 5.1 (default)
- **Compatible:** Modern cmdlets work
- **Not available:** PowerShell 7 specific features

## TS-XX Machines (DOS 6.22)

### Command Constraints

**No support for:**
- `IF /I` (case insensitive) - use duplicate IF statements
- Long filenames (8.3 format only)
- Unicode or special characters
- Modern batch features

**Examples:**
```batch
REM Correct (DOS 6.22)
IF "%1"=="STATUS" GOTO STATUS
IF "%1"=="status" GOTO STATUS

REM Incorrect (requires Windows 2000+)
IF /I "%1"=="STATUS" GOTO STATUS
```
````

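This document does not specify the generator. One way it could work is sketched below in Python, assuming each row carries `hostname` (from the join on `infrastructure`), `insight_category`, `insight_title`, `insight_description`, `priority`, and `is_active`. `render_insights_md` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List


def render_insights_md(client_name: str, rows: List[dict]) -> str:
    """Render active environmental_insights rows as one per-client markdown file."""
    # host -> category -> rows
    by_host: Dict[str, Dict[str, List[dict]]] = defaultdict(lambda: defaultdict(list))
    for row in rows:
        if row.get("is_active", True):  # skip insights that no longer apply
            by_host[row["hostname"]][row["insight_category"]].append(row)

    out = [f"# Environmental Insights: {client_name}", "",
           "Auto-generated from failure patterns and verified operations.", ""]
    for host in sorted(by_host):
        out += [f"## {host}", ""]
        for category in sorted(by_host[host]):
            # "custom_installations" -> "Custom Installations"
            out += [f"### {category.replace('_', ' ').title()}", ""]
            # Highest-priority insights first within each category.
            for row in sorted(by_host[host][category], key=lambda r: -r["priority"]):
                out += [f"**{row['insight_title']}**", row["insight_description"], ""]
    return "\n".join(out)
```

Writing the result to disk is then a single `pathlib.Path(...).write_text(md, encoding="utf-8")` call to the per-client file under `D:\ClaudeTools`.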
problem_solutions

Issue tracking with root cause analysis and resolution documentation. Searchable historical knowledge base.

```sql
CREATE TABLE problem_solutions (
    id UUID PRIMARY KEY DEFAULT (UUID()),
    work_item_id UUID NOT NULL REFERENCES work_items(id) ON DELETE CASCADE,
    session_id UUID NOT NULL REFERENCES sessions(id) ON DELETE CASCADE,
    client_id UUID REFERENCES clients(id) ON DELETE SET NULL,
    infrastructure_id UUID REFERENCES infrastructure(id) ON DELETE SET NULL,

    -- Problem description
    problem_title VARCHAR(500) NOT NULL,
    problem_description TEXT NOT NULL,
    symptom TEXT, -- what user/system exhibited
    error_message TEXT, -- exact error code/message
    error_code VARCHAR(100), -- structured error code

    -- Investigation
    investigation_steps TEXT, -- JSON array of diagnostic commands/actions
    diagnostic_output TEXT, -- key outputs that led to root cause
    investigation_duration_minutes INTEGER,

    -- Root cause
    root_cause TEXT NOT NULL,
    root_cause_category VARCHAR(100), -- "configuration", "hardware", "software", "network"

    -- Solution
    solution_applied TEXT NOT NULL,
    solution_category VARCHAR(100), -- "config_change", "restart", "replacement", "patch"
    commands_run TEXT, -- JSON array of commands used to fix
    files_modified TEXT, -- JSON array of config files changed

    -- Verification
    verification_method TEXT,
    verification_successful BOOLEAN DEFAULT true,
    verification_notes TEXT,

    -- Prevention and rollback
    rollback_plan TEXT,
    prevention_measures TEXT, -- what was done to prevent recurrence

    -- Pattern tracking
    recurrence_count INTEGER DEFAULT 1, -- if same problem reoccurs
    similar_problems TEXT, -- JSON array of related problem_solution IDs
    tags TEXT, -- JSON array: ["ssl", "apache", "certificate"]

    -- Resolution
    resolved_at TIMESTAMP,
    time_to_resolution_minutes INTEGER,

    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    INDEX idx_problems_work_item (work_item_id),
    INDEX idx_problems_session (session_id),
    INDEX idx_problems_client (client_id),
    INDEX idx_problems_infrastructure (infrastructure_id),
    INDEX idx_problems_category (root_cause_category),
    FULLTEXT idx_problems_search (problem_description, symptom, error_message, root_cause)
);
```

Example Problem Solutions:

Apache SSL Certificate Expiration:

{
  "problem_title": "Apache SSL certificate expiration causing ERR_SSL_PROTOCOL_ERROR",
  "problem_description": "Website inaccessible via HTTPS. Browser shows ERR_SSL_PROTOCOL_ERROR.",
  "symptom": "Users unable to access website. SSL handshake failure.",
  "error_message": "ERR_SSL_PROTOCOL_ERROR",
  "investigation_steps": [
    "curl -I https://example.com",
    "openssl s_client -connect example.com:443",
    "systemctl status apache2",
    "openssl x509 -in /etc/ssl/certs/example.com.crt -text -noout"
  ],
  "diagnostic_output": "Certificate expiration: 2026-01-10 (3 days ago)",
  "root_cause": "SSL certificate expired on 2026-01-10. Certbot auto-renewal failed due to DNS validation issue.",
  "root_cause_category": "configuration",
  "solution_applied": "1. Fixed DNS TXT record for Let's Encrypt validation\n2. Ran: certbot renew --force-renewal\n3. Restarted Apache: systemctl restart apache2",
  "solution_category": "config_change",
  "commands_run": [
    "certbot renew --force-renewal",
    "systemctl restart apache2"
  ],
  "files_modified": [
    "/etc/apache2/sites-enabled/example.com.conf"
  ],
  "verification_method": "curl test successful. Browser loads HTTPS site without error.",
  "verification_successful": true,
  "prevention_measures": "Set up monitoring for certificate expiration (30 days warning). Fixed DNS automation for certbot.",
  "tags": ["ssl", "apache", "certificate", "certbot"],
  "time_to_resolution_minutes": 25
}

PowerShell Compatibility Issue:

{
  "problem_title": "Get-LocalUser fails on Server 2008 (PowerShell 2.0)",
  "problem_description": "Attempting to list local users on Server 2008 using Get-LocalUser cmdlet",
  "symptom": "Command not recognized error",
  "error_message": "Get-LocalUser : The term 'Get-LocalUser' is not recognized as the name of a cmdlet",
  "error_code": "CommandNotFoundException",
  "investigation_steps": [
    "$PSVersionTable",
    "Get-Command Get-LocalUser",
    "Get-WmiObject Win32_OperatingSystem | Select Caption, Version"
  ],
  "root_cause": "Server 2008 has PowerShell 2.0 only. Get-LocalUser introduced in PowerShell 5.1 (Windows 10/Server 2016).",
  "root_cause_category": "software",
  "solution_applied": "Use WMI instead: Get-WmiObject Win32_UserAccount",
  "solution_category": "alternative_approach",
  "commands_run": [
    "Get-WmiObject Win32_UserAccount | Select Name, Disabled, LocalAccount"
  ],
  "verification_method": "Successfully retrieved local user list",
  "verification_successful": true,
  "prevention_measures": "Created environmental insight for all Server 2008 machines. Environment Context Agent now checks PowerShell version before suggesting cmdlets.",
  "tags": ["powershell", "server_2008", "compatibility", "user_management"],
  "recurrence_count": 5
}
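The Environment Context Agent's version check can be reduced to a comparison on `infrastructure.powershell_version`. Below is a minimal sketch, assuming the column stores dotted version strings such as `2.0` or `5.1`; the function names are hypothetical. Gating only on the PowerShell version is a simplification. The LocalAccounts cmdlets shipped with Windows 10/Server 2016, so a stricter check would also consider the OS.

```python
from typing import Optional

# Get-LocalUser and friends need PowerShell 5.1, per the root cause recorded above.
LOCAL_ACCOUNTS_MIN = (5, 1)

WMI_FALLBACK = "Get-WmiObject Win32_UserAccount | Select Name, Disabled, LocalAccount"


def parse_ps_version(text: str) -> tuple:
    """'5.1' -> (5, 1), '7.4.1' -> (7, 4, 1)."""
    return tuple(int(part) for part in text.strip().split("."))


def list_local_users_command(powershell_version: Optional[str]) -> str:
    """Choose a local-user listing command the target's PowerShell can run.

    An unknown or unparsable version falls back to WMI, which also works on PowerShell 2.0.
    """
    try:
        version = parse_ps_version(powershell_version) if powershell_version else None
    except ValueError:
        version = None
    if version is not None and version >= LOCAL_ACCOUNTS_MIN:
        return "Get-LocalUser"
    return WMI_FALLBACK
```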

Queries:

-- Find similar problems by error text (MATCH must list exactly the FULLTEXT index columns)
SELECT problem_title, solution_applied, created_at
FROM problem_solutions
WHERE MATCH(problem_description, symptom, error_message, root_cause)
      AGAINST('ERR_SSL_PROTOCOL_ERROR' IN BOOLEAN MODE)
ORDER BY created_at DESC;

-- Most common problems (by recurrence)
SELECT problem_title, recurrence_count, root_cause_category
FROM problem_solutions
WHERE recurrence_count > 1
ORDER BY recurrence_count DESC;

-- Recent solutions for client
SELECT problem_title, solution_applied, resolved_at
FROM problem_solutions
WHERE client_id = 'dataforth-uuid'
ORDER BY resolved_at DESC
LIMIT 10;
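FULLTEXT search covers the free-text fields, and the `tags` JSON array offers a second, cheaper signal. The sketch below ranks past problems by Jaccard overlap of their tag sets. This technique is swapped in for illustration and is not a documented feature of the Problem Pattern Matching Agent. The 0.2 cutoff is arbitrary.

```python
import json
from typing import List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection over size of the union; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_tags(query_tags: List[str], problems: List[dict],
                 min_score: float = 0.2) -> List[Tuple[float, str]]:
    """Rank problem_solutions rows by tag overlap with the current issue.

    Each row needs `problem_title` and `tags` (the TEXT column holding a JSON array).
    """
    wanted = {tag.lower() for tag in query_tags}
    scored = []
    for row in problems:
        tags = {tag.lower() for tag in json.loads(row.get("tags") or "[]")}
        score = jaccard(wanted, tags)
        if score >= min_score:
            scored.append((score, row["problem_title"]))
    # Highest overlap first; ties fall back to title order (descending).
    return sorted(scored, reverse=True)
```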

failure_patterns

Aggregated failure insights learned from command/operation failures. Auto-generated by the Failure Analysis Agent.

CREATE TABLE failure_patterns (
    id UUID PRIMARY KEY DEFAULT (UUID()),
    infrastructure_id UUID REFERENCES infrastructure(id) ON DELETE CASCADE,
    client_id UUID REFERENCES clients(id) ON DELETE CASCADE,

    -- Pattern identification
    pattern_type VARCHAR(100) NOT NULL CHECK(pattern_type IN (
        'command_compatibility', 'version_mismatch', 'permission_denied',
        'service_unavailable', 'configuration_error', 'environmental_limitation',
        'network_connectivity', 'authentication_failure', 'syntax_error'
    )),
    pattern_signature VARCHAR(500) NOT NULL, -- "Modern PowerShell cmdlets on Server 2008"
    error_pattern TEXT, -- regex or keywords: "Get-LocalUser.*not recognized"

    -- Context
    affected_systems TEXT, -- JSON array: ["all_server_2008", "D2TESTNAS"]
    affected_os_versions TEXT, -- JSON array: ["Server 2008", "DOS 6.22"]
    triggering_commands TEXT, -- JSON array of command patterns
    triggering_operations TEXT, -- JSON array of operation types

    -- Failure details
    failure_description TEXT NOT NULL,
    typical_error_messages TEXT, -- JSON array of common error texts

    -- Resolution
    root_cause TEXT NOT NULL, -- "Server 2008 only has PowerShell 2.0"
    recommended_solution TEXT NOT NULL, -- "Use Get-WmiObject instead of Get-LocalUser"
    alternative_approaches TEXT, -- JSON array of alternatives
    workaround_commands TEXT, -- JSON array of working commands

    -- Metadata
    occurrence_count INTEGER DEFAULT 1, -- how many times seen
    first_seen TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_seen TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    severity VARCHAR(20) CHECK(severity IN ('blocking', 'major', 'minor', 'info')),

    -- Status
    is_active BOOLEAN DEFAULT true, -- false if pattern no longer applies (e.g., server upgraded)
    added_to_insights BOOLEAN DEFAULT false, -- environmental_insight generated

    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    INDEX idx_failure_infrastructure (infrastructure_id),
    INDEX idx_failure_client (client_id),
    INDEX idx_failure_pattern_type (pattern_type),
    INDEX idx_failure_signature (pattern_signature),
    INDEX idx_failure_active (is_active),
    INDEX idx_failure_severity (severity)
);

Example Failure Patterns:

PowerShell Version Incompatibility:

{
  "pattern_type": "command_compatibility",
  "pattern_signature": "Modern PowerShell cmdlets on Server 2008",
  "error_pattern": "(Get-LocalUser|Get-LocalGroup|New-LocalUser).*not recognized",
  "affected_systems": ["all_server_2008_machines"],
  "affected_os_versions": ["Server 2008", "Server 2008 R2"],
  "triggering_commands": [
    "Get-LocalUser",
    "Get-LocalGroup",
    "New-LocalUser",
    "Remove-LocalUser"
  ],
  "failure_description": "Modern PowerShell user management cmdlets fail on Server 2008 with 'not recognized' error",
  "typical_error_messages": [
    "Get-LocalUser : The term 'Get-LocalUser' is not recognized",
    "Get-LocalGroup : The term 'Get-LocalGroup' is not recognized"
  ],
  "root_cause": "Server 2008 has PowerShell 2.0 only. Modern user management cmdlets (Get-LocalUser, etc.) were introduced in PowerShell 5.1 (Windows 10/Server 2016).",
  "recommended_solution": "Use WMI for user/group management: Get-WmiObject Win32_UserAccount, Get-WmiObject Win32_Group",
  "alternative_approaches": [
    "Use Get-WmiObject Win32_UserAccount",
    "Use net user command",
    "Upgrade to PowerShell 5.1 (if possible on Server 2008 R2)"
  ],
  "workaround_commands": [
    "Get-WmiObject Win32_UserAccount",
    "Get-WmiObject Win32_Group",
    "net user"
  ],
  "occurrence_count": 5,
  "severity": "major",
  "added_to_insights": true
}

DOS Batch Syntax Limitation:

{
  "pattern_type": "environmental_limitation",
  "pattern_signature": "Modern batch syntax on MS-DOS 6.22",
  "error_pattern": "IF /I.*Invalid switch",
  "affected_systems": ["all_dos_machines"],
  "affected_os_versions": ["MS-DOS 6.22"],
  "triggering_commands": [
    "IF /I \"%1\"==\"value\" ...",
    "Long filenames with spaces"
  ],
  "failure_description": "Modern batch file syntax not supported in MS-DOS 6.22",
  "typical_error_messages": [
    "Invalid switch - /I",
    "File not found (long filename)",
    "Bad command or file name"
  ],
  "root_cause": "DOS 6.22 does not support /I flag (added in Windows 2000), long filenames, or many modern batch features",
  "recommended_solution": "Use duplicate IF statements for upper/lowercase. Keep filenames to 8.3 format. Use basic batch syntax only.",
  "alternative_approaches": [
    "Duplicate IF for case-insensitive: IF \"%1\"==\"VALUE\" ... + IF \"%1\"==\"value\" ...",
    "Use 8.3 filenames only",
    "Avoid advanced batch features"
  ],
  "workaround_commands": [
    "IF \"%1\"==\"STATUS\" GOTO STATUS",
    "IF \"%1\"==\"status\" GOTO STATUS"
  ],
  "occurrence_count": 8,
  "severity": "blocking",
  "added_to_insights": true
}

ReadyNAS Service Management:

{
  "pattern_type": "service_unavailable",
  "pattern_signature": "systemd commands on ReadyNAS",
  "error_pattern": "systemctl.*command not found",
  "affected_systems": ["D2TESTNAS"],
  "triggering_commands": [
    "systemctl status nmbd",
    "systemctl restart samba"
  ],
  "failure_description": "ReadyNAS does not use systemd for service management",
  "typical_error_messages": [
    "systemctl: command not found",
    "-ash: systemctl: not found"
  ],
  "root_cause": "ReadyNAS OS is based on older Linux without systemd. Uses traditional init scripts.",
  "recommended_solution": "Use 'service' command or direct process management: service nmbd status, ps aux | grep nmbd",
  "alternative_approaches": [
    "service nmbd status",
    "ps aux | grep nmbd",
    "/etc/init.d/nmbd status"
  ],
  "occurrence_count": 3,
  "severity": "major",
  "added_to_insights": true
}
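When a new failure arrives, the Failure Analysis Agent must decide whether it belongs to an existing pattern. Below is a minimal sketch, assuming two things:

- `error_pattern` holds a Python-compatible regular expression.
- Matching runs against the combined command-and-error transcript, hence `re.DOTALL`.

The patterns are the three examples above, trimmed to the fields the sketch uses.

```python
import re
from datetime import datetime, timezone
from typing import List, Optional

# error_pattern values copied from the example failure_patterns rows above.
PATTERNS = [
    {"pattern_signature": "Modern PowerShell cmdlets on Server 2008",
     "error_pattern": r"(Get-LocalUser|Get-LocalGroup|New-LocalUser).*not recognized",
     "occurrence_count": 5},
    {"pattern_signature": "Modern batch syntax on MS-DOS 6.22",
     "error_pattern": r"IF /I.*Invalid switch",
     "occurrence_count": 8},
    {"pattern_signature": "systemd commands on ReadyNAS",
     "error_pattern": r"systemctl.*command not found",
     "occurrence_count": 3},
]


def match_failure(transcript: str, patterns: List[dict]) -> Optional[dict]:
    """Return the first active pattern whose regex matches, bumping its counters in place."""
    for pattern in patterns:
        if not pattern.get("is_active", True):
            continue
        # DOTALL lets ".*" span the line break between a command and its error output.
        if re.search(pattern["error_pattern"], transcript, flags=re.IGNORECASE | re.DOTALL):
            pattern["occurrence_count"] = pattern.get("occurrence_count", 0) + 1
            pattern["last_seen"] = datetime.now(timezone.utc)
            return pattern
    return None
```

The ReadyNAS regex matches `systemctl: command not found`. It does not match the BusyBox form `-ash: systemctl: not found` listed in `typical_error_messages`. A sketch like this makes such gaps easy to test.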

operation_failures

Non-command failures (API calls, integrations, file operations, network requests). Complements commands_run failure tracking.

CREATE TABLE operation_failures (
    id UUID PRIMARY KEY DEFAULT (UUID()),
    session_id UUID REFERENCES sessions(id) ON DELETE CASCADE,
    work_item_id UUID REFERENCES work_items(id) ON DELETE CASCADE,
    client_id UUID REFERENCES clients(id) ON DELETE SET NULL,

    -- Operation details
    operation_type VARCHAR(100) NOT NULL CHECK(operation_type IN (
        'api_call', 'file_operation', 'network_request',
        'database_query', 'external_integration', 'service_restart',
        'backup_operation', 'restore_operation', 'migration'
    )),
    operation_description TEXT NOT NULL,
    target_system VARCHAR(255), -- host, URL, service name

    -- Failure details
    error_message TEXT NOT NULL,
    error_code VARCHAR(50), -- HTTP status, exit code, error number
    failure_category VARCHAR(100), -- "timeout", "authentication", "not_found", etc.
    stack_trace TEXT,

    -- Context
    request_data TEXT, -- JSON: what was attempted
    response_data TEXT, -- JSON: error response
    environment_snapshot TEXT, -- JSON: relevant env vars, versions

    -- Resolution
    resolution_applied TEXT,
    resolved BOOLEAN DEFAULT false,
    resolved_at TIMESTAMP,
    time_to_resolution_minutes INTEGER,

    -- Pattern linkage
    related_pattern_id UUID REFERENCES failure_patterns(id),

    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    INDEX idx_op_failure_session (session_id),
    INDEX idx_op_failure_type (operation_type),
    INDEX idx_op_failure_category (failure_category),
    INDEX idx_op_failure_resolved (resolved),
    INDEX idx_op_failure_client (client_id)
);

Example Operation Failures:

SyncroMSP API Timeout:

{
  "operation_type": "api_call",
  "operation_description": "Search SyncroMSP tickets for Dataforth",
  "target_system": "https://azcomputerguru.syncromsp.com/api/v1",
  "error_message": "Request timeout after 30 seconds",
  "error_code": "ETIMEDOUT",
  "failure_category": "timeout",
  "request_data": {
    "endpoint": "/api/v1/tickets",
    "params": {"customer_id": 12345, "status": "open"}
  },
  "response_data": null,
  "resolution_applied": "Increased timeout to 60 seconds. Added retry logic with exponential backoff.",
  "resolved": true,
  "time_to_resolution_minutes": 15
}
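The resolution above mentions retry logic with exponential backoff. Below is a minimal, library-agnostic sketch of that idea. `retry_with_backoff` is hypothetical, and the defaults (four attempts, one-second base delay) are assumptions. The request timeout itself (60 seconds in the resolution) would be configured on the HTTP client, not here. `TimeoutError` is only a placeholder for whatever exception that client raises on timeout.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    *,
    attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call()`, retrying on `retry_on` exceptions with jittered exponential backoff.

    The wait before retry n (0-based) is min(max_delay, base_delay * 2**n), scaled by
    a random factor in [0.5, 1.0] so parallel clients do not retry in lockstep.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller record it in operation_failures
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))
    raise AssertionError("unreachable")
```

Injecting `sleep` keeps the function testable without real waits. The jitter spreads out retries when several agents hit the same API at once.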

File Upload Permission Denied:

{
  "operation_type": "file_operation",
  "operation_description": "Upload backup file to NAS",
  "target_system": "D2TESTNAS:/mnt/backups",
  "error_message": "Permission denied: /mnt/backups/db_backup_2026-01-15.sql",
  "error_code": "EACCES",
  "failure_category": "permission",
  "environment_snapshot": {
    "user": "backupuser",
    "directory_perms": "drwxr-xr-x root root"
  },
  "resolution_applied": "Changed directory ownership: chown -R backupuser:backupgroup /mnt/backups",
  "resolved": true
}

Database Query Performance:

{
  "operation_type": "database_query",
  "operation_description": "Query sessions table for large date range",
  "target_system": "MariaDB msp_tracking",
  "error_message": "Query execution time: 45 seconds (threshold: 5 seconds)",
  "failure_category": "performance",
  "request_data": {
    "query": "SELECT * FROM sessions WHERE session_date BETWEEN '2020-01-01' AND '2026-01-15'"
  },
  "resolution_applied": "Added index on session_date column. Query now runs in 0.3 seconds.",
  "resolved": true
}

Self-Learning Workflow

1. Failure Detection and Logging

Command Execution with Failure Tracking:

User: "Check WINS status on D2TESTNAS"

Main Claude → Environment Context Agent:
  - Queries infrastructure table for D2TESTNAS
  - Reads environmental_notes: "Manual WINS install, no native service"
  - Reads environmental_insights for D2TESTNAS
  - Returns: "D2TESTNAS has manually installed WINS (not native ReadyNAS service)"

Main Claude suggests command based on environmental context:
  - Executes: ssh root@192.168.0.9 'systemctl status nmbd'

Command fails:
  - success = false
  - exit_code = 127
  - error_message = "systemctl: command not found"
  - failure_category = "command_compatibility"

Trigger Failure Analysis Agent:
  - Analyzes error: ReadyNAS doesn't use systemd
  - Identifies correct approach: "service nmbd status" or "ps aux | grep nmbd"
  - Creates failure_pattern entry
  - Updates environmental_insights with correction
  - Returns resolution to Main Claude

Main Claude tries corrected command:
  - Executes: ssh root@192.168.0.9 'ps aux | grep nmbd'
  - success = true
  - Updates original failure record with resolution
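The failure fields in the walkthrough (`success`, `exit_code`, `error_message`, `failure_category`) can be captured directly from a subprocess. Below is a minimal sketch, assuming commands run through a POSIX shell on the machine issuing them. `run_and_record` and the `output` key are hypothetical, since the `commands_run` schema is defined elsewhere. `ssh` returns the remote command's exit status, so the same check applies to calls such as `ssh root@192.168.0.9 'systemctl status nmbd'`.

```python
import subprocess


def run_and_record(command: str, timeout: float = 60.0) -> dict:
    """Run a shell command and return fields shaped like a commands_run record.

    Exit status 127 is the POSIX shell's "command not found" code, which is
    what the systemctl call on ReadyNAS produced in the walkthrough above.
    """
    proc = subprocess.run(command, shell=True, capture_output=True, text=True, timeout=timeout)
    return {
        "command": command,
        "success": proc.returncode == 0,
        "exit_code": proc.returncode,
        "output": proc.stdout,
        "error_message": proc.stderr.strip() or None,
        # Heuristic: treat "command not found" as a compatibility failure.
        "failure_category": "command_compatibility" if proc.returncode == 127 else None,
    }
```

Classifying every exit status 127 as `command_compatibility` is a heuristic. Because the call uses `shell=True`, pass in only trusted, agent-generated command strings.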

2. Pattern Analysis (Periodic Agent Run)

Failure Analysis Agent runs periodically:

Agent Task: "Analyze recent failures and update environmental insights"

  1. Query failures:

    SELECT * FROM commands_run
    WHERE success = false AND resolved = false
    ORDER BY created_at DESC;
    
    SELECT * FROM operation_failures
    WHERE resolved = false
    ORDER BY created_at DESC;
    
  2. Group by pattern:

    • Group by infrastructure_id, error_pattern, failure_category
    • Identify recurring patterns
  3. Create/update failure_patterns:

    • If pattern seen 3+ times → Create failure_pattern
    • Increment occurrence_count for existing patterns
    • Update last_seen timestamp
  4. Generate environmental_insights:

    • Transform failure_patterns into actionable insights
    • Create markdown-formatted descriptions
    • Add command examples
    • Set priority based on severity and frequency
  5. Update infrastructure environmental_notes:

    • Add constraints to infrastructure.environmental_notes
    • Set powershell_version, shell_type, limitations
  6. Generate insights.md file:

    • Query all environmental_insights for client
    • Format as markdown
    • Save to D:\ClaudeTools\insights\[client-name].md
    • Agents read this file before making suggestions
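Steps 2 through 4 come down to a group-and-count followed by a scoring rule. Below is a minimal sketch, assuming each failure dict already carries a normalized `error_pattern` string. The severity weights and the priority formula are hypothetical. They were chosen only so that the result stays within the 1-10 range the CHECK constraint allows.

```python
from collections import Counter
from typing import List, Tuple

PROMOTION_THRESHOLD = 3  # "If pattern seen 3+ times -> Create failure_pattern"
SEVERITY_WEIGHT = {"blocking": 4, "major": 3, "minor": 2, "info": 1}  # hypothetical weights


def group_failures(failures: List[dict]) -> Counter:
    """Count unresolved failures by (infrastructure_id, error_pattern, failure_category)."""
    return Counter(
        (f["infrastructure_id"], f["error_pattern"], f["failure_category"])
        for f in failures
        if not f.get("resolved", False)
    )


def patterns_to_promote(failures: List[dict]) -> List[Tuple]:
    """Group keys seen at least PROMOTION_THRESHOLD times, most frequent first."""
    return [key for key, n in group_failures(failures).most_common() if n >= PROMOTION_THRESHOLD]


def insight_priority(severity: str, occurrence_count: int) -> int:
    """Map severity and frequency onto the 1-10 priority scale.

    Severity contributes 2 points per weight step; frequency adds 1 point per
    3 occurrences; the total is clamped to CHECK(priority BETWEEN 1 AND 10).
    """
    score = 2 * SEVERITY_WEIGHT.get(severity, 1) + occurrence_count // 3
    return max(1, min(10, score))
```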

3. Pre-Operation Environment Check

Environment Context Agent runs before operations:

Agent Task: "Check environmental constraints for D2TESTNAS before command suggestion"

  1. Query infrastructure:

    SELECT environmental_notes, powershell_version, shell_type, limitations
    FROM infrastructure
    WHERE id = 'd2testnas-uuid';
    
  2. Query environmental_insights:

    SELECT insight_title, insight_description, examples, priority
    FROM environmental_insights
    WHERE infrastructure_id = 'd2testnas-uuid'
      AND is_active = true
    ORDER BY priority DESC;
    
  3. Query failure_patterns:

    SELECT pattern_signature, recommended_solution, workaround_commands
    FROM failure_patterns
    WHERE infrastructure_id = 'd2testnas-uuid'
      AND is_active = true;
    
  4. Check proposed command compatibility:

    • Proposed: "systemctl status nmbd"
    • Pattern match: "systemctl.*command not found"
    • Result: INCOMPATIBLE
    • Recommended: "ps aux | grep nmbd"
  5. Return environmental context:

    Environmental Context for D2TESTNAS:
    - ReadyNAS OS (Linux-based)
    - Manual WINS installation (Samba nmbd)
    - No systemd (use 'service' or ps commands)
    - SMB1/CORE protocol for DOS compatibility
    
    Recommended commands:
    ✓ ps aux | grep nmbd
    ✓ service nmbd status
    ✗ systemctl status nmbd (not available)
    

Main Claude uses this context to suggest the correct approach.

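The compatibility check in step 4 can be approximated by comparing the program name of the proposed command with each active pattern's `triggering_commands`. This sketch is deliberately crude. It ignores arguments and switches, so it would be too broad for patterns such as DOS `IF /I`, where the switch is what matters. `Verdict` and `check_command` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Verdict:
    compatible: bool
    matched_signature: Optional[str] = None
    recommended: Optional[List[str]] = None


def check_command(proposed: str, patterns: List[dict]) -> Verdict:
    """Flag a proposed command whose program name matches a known triggering command."""
    words = proposed.split()
    if not words:
        return Verdict(True)
    program = words[0].lower()
    for pattern in patterns:
        if not pattern.get("is_active", True):
            continue  # pattern retired, e.g. the server was upgraded
        for trigger in pattern.get("triggering_commands", []):
            trigger_words = trigger.split()
            if trigger_words and trigger_words[0].lower() == program:
                # Prefer concrete workaround commands, else the listed alternatives.
                suggestions = (pattern.get("workaround_commands")
                               or pattern.get("alternative_approaches") or [])
                return Verdict(False, pattern["pattern_signature"], suggestions)
    return Verdict(True)
```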

Benefits

1. Self-Improving System

  • Each failure makes the system smarter
  • Patterns identified automatically
  • Insights generated without manual documentation
  • Knowledge accumulates over time

2. Reduced User Friction

  • User doesn't have to keep correcting the same mistakes
  • Claude learns environmental constraints once
  • Suggestions are environmentally aware from the start
  • Proactive problem prevention

3. Institutional Knowledge Capture

  • All environmental quirks documented in database
  • Survives across sessions and Claude instances
  • Queryable: "What are known issues with D2TESTNAS?"
  • Transferable to new team members

4. Proactive Problem Prevention

  • Environment Context Agent prevents failures before they happen
  • Suggests compatible alternatives automatically
  • Warns about known limitations
  • Avoids wasting time on incompatible approaches

5. Audit Trail

  • Every failure tracked with full context
  • Resolution history for troubleshooting
  • Pattern analysis for infrastructure planning
  • ROI tracking: time saved by avoiding repeat failures

Integration with Other Schemas

Sources data from:

  • commands_run - Command execution failures
  • infrastructure - System capabilities and limitations
  • work_items - Context for failures
  • sessions - Session context for operations

Provides data to:

  • Environment Context Agent (pre-operation checks)
  • Problem Pattern Matching Agent (solution lookup)
  • MSP Mode (intelligent suggestions)
  • Reporting (failure analysis, improvement metrics)

Example Queries

Find all insights for a client

SELECT ei.insight_title, ei.insight_description, i.hostname
FROM environmental_insights ei
JOIN infrastructure i ON ei.infrastructure_id = i.id
WHERE ei.client_id = 'dataforth-uuid'
  AND ei.is_active = true
ORDER BY ei.priority DESC;

Search for similar problems

SELECT ps.problem_title, ps.solution_applied, ps.created_at
FROM problem_solutions ps
WHERE MATCH(ps.problem_description, ps.symptom, ps.error_message, ps.root_cause)
      AGAINST('SSL certificate' IN BOOLEAN MODE)
ORDER BY ps.created_at DESC
LIMIT 10;

Active failure patterns

SELECT fp.pattern_signature, fp.occurrence_count, fp.recommended_solution
FROM failure_patterns fp
WHERE fp.is_active = true
  AND fp.severity IN ('blocking', 'major')
ORDER BY fp.occurrence_count DESC;

Unresolved operation failures

SELECT opf.operation_type, opf.target_system, opf.error_message, opf.created_at
FROM operation_failures opf
WHERE opf.resolved = false
ORDER BY opf.created_at DESC;

Document Version: 1.0
Last Updated: 2026-01-15
Author: MSP Mode Schema Design Team