# Learning & Context Schema

**MSP Mode Database Schema - Self-Learning System**

**Status:** Designed 2026-01-15
**Database:** msp_tracking (MariaDB on Jupiter)

---

## Overview

The Learning & Context subsystem enables MSP Mode to learn from every failure, build environmental awareness, and prevent recurring mistakes. This self-improving system captures failure patterns, generates actionable insights, and proactively checks environmental constraints before making suggestions.

**Core Principle:** Every failure is a learning opportunity. Agents must never make the same mistake twice.

**Related Documentation:**
- [MSP-MODE-SPEC.md](../MSP-MODE-SPEC.md) - Full system specification
- [ARCHITECTURE_OVERVIEW.md](ARCHITECTURE_OVERVIEW.md) - Agent architecture
- [SCHEMA_CREDENTIALS.md](SCHEMA_CREDENTIALS.md) - Security tables
- [API_SPEC.md](API_SPEC.md) - API endpoints

---

## Tables Summary

| Table | Purpose | Auto-Generated |
|-------|---------|----------------|
| `environmental_insights` | Generated insights per client/infrastructure | Yes |
| `problem_solutions` | Issue tracking with root cause and resolution | Partial |
| `failure_patterns` | Aggregated failure analysis and learnings | Yes |
| `operation_failures` | Non-command failures (API, file ops, network) | Yes |

**Total:** 4 tables

**Specialized Agents:**
- **Failure Analysis Agent** - Analyzes failures, identifies patterns, generates insights
- **Environment Context Agent** - Pre-checks environmental constraints before operations
- **Problem Pattern Matching Agent** - Searches historical solutions for similar issues

---

## Table Schemas

### `environmental_insights`

Auto-generated insights about client infrastructure constraints, limitations, and quirks. Used by Environment Context Agent to prevent failures before they occur.
```sql
CREATE TABLE environmental_insights (
    id UUID PRIMARY KEY DEFAULT UUID(), -- UUID column type requires MariaDB 10.7+
    client_id UUID REFERENCES clients(id) ON DELETE CASCADE,
    infrastructure_id UUID REFERENCES infrastructure(id) ON DELETE CASCADE,

    -- Insight classification
    insight_category VARCHAR(100) NOT NULL CHECK(insight_category IN (
        'command_constraints', 'service_configuration', 'version_limitations',
        'custom_installations', 'network_constraints', 'permissions',
        'compatibility', 'performance', 'security'
    )),
    insight_title VARCHAR(500) NOT NULL,
    insight_description TEXT NOT NULL, -- markdown formatted

    -- Examples and documentation
    examples TEXT, -- JSON array of command/config examples
    affected_operations TEXT, -- JSON array: ["user_management", "service_restart"]

    -- Source and verification
    source_pattern_id UUID REFERENCES failure_patterns(id) ON DELETE SET NULL,
    confidence_level VARCHAR(20) CHECK(confidence_level IN ('confirmed', 'likely', 'suspected')),
    verification_count INTEGER DEFAULT 1, -- how many times verified
    last_verified TIMESTAMP NULL, -- NULL: avoids implicit auto-update defaults on the first TIMESTAMP column

    -- Priority (1-10, higher = more important to avoid)
    priority INTEGER DEFAULT 5 CHECK(priority BETWEEN 1 AND 10),

    -- Status
    is_active BOOLEAN DEFAULT true, -- false if pattern no longer applies
    superseded_by UUID REFERENCES environmental_insights(id), -- if replaced by better insight

    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,

    INDEX idx_insights_client (client_id),
    INDEX idx_insights_infrastructure (infrastructure_id),
    INDEX idx_insights_category (insight_category),
    INDEX idx_insights_priority (priority),
    INDEX idx_insights_active (is_active)
);
```

**Real-World Examples:**

**D2TESTNAS - Custom WINS Installation:**
```json
{
  "infrastructure_id": "d2testnas-uuid",
  "client_id": "dataforth-uuid",
  "insight_category": "custom_installations",
  "insight_title": "WINS Service: Manual Samba installation (no native ReadyNAS service)",
  "insight_description": "**Installation:** Manually installed via Samba nmbd, not a native ReadyNAS service.\n\n**Constraints:**\n- No GUI service manager for WINS\n- Cannot use standard service management commands\n- Configuration via `/etc/frontview/samba/smb.conf.overrides`\n\n**Correct commands:**\n- Check status: `ssh root@192.168.0.9 'ps aux | grep nmbd'`\n- View config: `ssh root@192.168.0.9 'cat /etc/frontview/samba/smb.conf.overrides | grep wins'`\n- Restart: `ssh root@192.168.0.9 'service nmbd restart'`",
  "examples": [
    "ps aux | grep nmbd",
    "cat /etc/frontview/samba/smb.conf.overrides | grep wins",
    "service nmbd restart"
  ],
  "affected_operations": ["service_management", "wins_configuration"],
  "confidence_level": "confirmed",
  "verification_count": 3,
  "priority": 9
}
```

**AD2 - PowerShell Version Constraints:**
```json
{
  "infrastructure_id": "ad2-uuid",
  "client_id": "dataforth-uuid",
  "insight_category": "version_limitations",
  "insight_title": "Server 2022: PowerShell 5.1 command compatibility",
  "insight_description": "**PowerShell Version:** 5.1 (default)\n\n**Compatible:** Modern cmdlets work (Get-LocalUser, Get-LocalGroup)\n\n**Not available:** PowerShell 7 specific features\n\n**Remote execution:** Use Invoke-Command for remote operations",
  "examples": [
    "Get-LocalUser",
    "Get-LocalGroup",
    "Invoke-Command -ComputerName AD2 -ScriptBlock { Get-LocalUser }"
  ],
  "confidence_level": "confirmed",
  "verification_count": 5,
  "priority": 6
}
```

**Server 2008 - PowerShell 2.0 Limitations:**
```json
{
  "infrastructure_id": "old-server-2008-uuid",
  "insight_category": "version_limitations",
  "insight_title": "Server 2008: PowerShell 2.0 command compatibility",
  "insight_description": "**PowerShell Version:** 2.0 only\n\n**Avoid:** Get-LocalUser, Get-LocalGroup, New-LocalUser (not available in PS 2.0)\n\n**Use instead:** Get-WmiObject Win32_UserAccount, Get-WmiObject Win32_Group\n\n**Why:** Server 2008 predates modern PowerShell user management cmdlets",
  "examples": [
    "Get-WmiObject Win32_UserAccount",
    "Get-WmiObject Win32_Group",
    "Get-WmiObject Win32_UserAccount -Filter \"Name='username'\""
  ],
  "affected_operations": ["user_management", "group_management"],
  "confidence_level": "confirmed",
  "verification_count": 5,
  "priority": 8
}
```

**DOS Machines (TS-XX) - Batch Syntax Constraints:**
```json
{
  "infrastructure_id": "ts-27-uuid",
  "client_id": "dataforth-uuid",
  "insight_category": "command_constraints",
  "insight_title": "MS-DOS 6.22: Batch file syntax limitations",
  "insight_description": "**OS:** MS-DOS 6.22\n\n**No support for:**\n- `IF /I` (case insensitive) - added in Windows 2000\n- Long filenames (8.3 format only)\n- Unicode or special characters\n- Modern batch features\n\n**Workarounds:**\n- Use duplicate IF statements for upper/lowercase\n- Keep filenames to 8.3 format\n- Use basic batch syntax only",
  "examples": [
    "IF \"%1\"==\"STATUS\" GOTO STATUS",
    "IF \"%1\"==\"status\" GOTO STATUS",
    "COPY FILE.TXT BACKUP.TXT"
  ],
  "affected_operations": ["batch_scripting", "file_operations"],
  "confidence_level": "confirmed",
  "verification_count": 8,
  "priority": 10
}
```

**D2TESTNAS - SMB Protocol Constraints:**
```json
{
  "infrastructure_id": "d2testnas-uuid",
  "insight_category": "network_constraints",
  "insight_title": "ReadyNAS: SMB1/CORE protocol for DOS compatibility",
  "insight_description": "**Protocol:** CORE/SMB1 only (for DOS machine compatibility)\n\n**Implications:**\n- Modern SMB2/3 clients may need configuration\n- Use NetBIOS name, not IP address for DOS machines\n- Security risk: SMB1 deprecated due to vulnerabilities\n\n**Configuration:**\n- Set in `/etc/frontview/samba/smb.conf.overrides`\n- `min protocol = CORE`",
  "examples": [
    "NET USE Z: \\\\D2TESTNAS\\SHARE (from DOS)",
    "smbclient -L //192.168.0.9 -m NT1"
  ],
  "confidence_level": "confirmed",
  "priority": 7
}
```

**Generated insights.md Example:**

When the Failure Analysis Agent runs, it generates a markdown file for each client:

````markdown
# Environmental Insights: Dataforth

Auto-generated from failure patterns and verified operations.

## D2TESTNAS (192.168.0.9)

### Custom Installations

**WINS Service: Manual Samba installation**
- Manually installed via Samba nmbd, not native ReadyNAS service
- No GUI service manager for WINS
- Configure via `/etc/frontview/samba/smb.conf.overrides`
- Check status: `ssh root@192.168.0.9 'ps aux | grep nmbd'`

### Network Constraints

**SMB Protocol: CORE/SMB1 only**
- For DOS compatibility
- Modern SMB2/3 clients may need configuration
- Use NetBIOS name from DOS machines

## AD2 (192.168.0.6 - Server 2022)

### PowerShell Version

**Version:** PowerShell 5.1 (default)
- **Compatible:** Modern cmdlets work
- **Not available:** PowerShell 7 specific features

## TS-XX Machines (DOS 6.22)

### Command Constraints

**No support for:**
- `IF /I` (case insensitive) - use duplicate IF statements
- Long filenames (8.3 format only)
- Unicode or special characters
- Modern batch features

**Examples:**
```batch
REM Correct (DOS 6.22)
IF "%1"=="STATUS" GOTO STATUS
IF "%1"=="status" GOTO STATUS

REM Incorrect (requires Windows 2000+)
IF /I "%1"=="STATUS" GOTO STATUS
```
````

---

### `problem_solutions`

Issue tracking with root cause analysis and resolution documentation. Searchable historical knowledge base.
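One way the `similar_problems` column could be populated is by comparing the `tags` arrays of stored problems. The sketch below is an illustration only: the Jaccard measure, the `0.5` threshold, and the function names `tag_similarity` and `find_similar_problems` are assumptions, not part of the designed system.

```python
import json


def tag_similarity(tags_a, tags_b):
    """Jaccard similarity between two tag lists (0.0 when both are empty)."""
    a, b = set(tags_a), set(tags_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_similar_problems(new_tags, known_problems, threshold=0.5):
    """Return ids of known problems with enough tag overlap, best match first.

    known_problems: iterable of (problem_id, tags_json) pairs, where tags_json
    is the JSON text stored in problem_solutions.tags (may be None).
    The threshold is a hypothetical tuning value.
    """
    scored = []
    for problem_id, tags_json in known_problems:
        score = tag_similarity(new_tags, json.loads(tags_json or "[]"))
        if score >= threshold:
            scored.append((score, problem_id))
    # Highest score first; ties broken by id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [problem_id for _, problem_id in scored]
```

The returned list can be serialized with `json.dumps` before being written to `similar_problems`, since that column stores a JSON array as text.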
```sql
CREATE TABLE problem_solutions (
    id UUID PRIMARY KEY DEFAULT UUID(), -- UUID column type requires MariaDB 10.7+
    work_item_id UUID NOT NULL REFERENCES work_items(id) ON DELETE CASCADE,
    session_id UUID NOT NULL REFERENCES sessions(id) ON DELETE CASCADE,
    client_id UUID REFERENCES clients(id) ON DELETE SET NULL,
    infrastructure_id UUID REFERENCES infrastructure(id) ON DELETE SET NULL,

    -- Problem description
    problem_title VARCHAR(500) NOT NULL,
    problem_description TEXT NOT NULL,
    symptom TEXT, -- what user/system exhibited
    error_message TEXT, -- exact error code/message
    error_code VARCHAR(100), -- structured error code

    -- Investigation
    investigation_steps TEXT, -- JSON array of diagnostic commands/actions
    diagnostic_output TEXT, -- key outputs that led to root cause
    investigation_duration_minutes INTEGER,

    -- Root cause
    root_cause TEXT NOT NULL,
    root_cause_category VARCHAR(100), -- "configuration", "hardware", "software", "network"

    -- Solution
    solution_applied TEXT NOT NULL,
    solution_category VARCHAR(100), -- "config_change", "restart", "replacement", "patch"
    commands_run TEXT, -- JSON array of commands used to fix
    files_modified TEXT, -- JSON array of config files changed

    -- Verification
    verification_method TEXT,
    verification_successful BOOLEAN DEFAULT true,
    verification_notes TEXT,

    -- Prevention and rollback
    rollback_plan TEXT,
    prevention_measures TEXT, -- what was done to prevent recurrence

    -- Pattern tracking
    recurrence_count INTEGER DEFAULT 1, -- if same problem reoccurs
    similar_problems TEXT, -- JSON array of related problem_solution IDs
    tags TEXT, -- JSON array: ["ssl", "apache", "certificate"]

    -- Resolution
    resolved_at TIMESTAMP NULL, -- NULL until resolved; avoids implicit auto-update defaults
    time_to_resolution_minutes INTEGER,

    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,

    INDEX idx_problems_work_item (work_item_id),
    INDEX idx_problems_session (session_id),
    INDEX idx_problems_client (client_id),
    INDEX idx_problems_infrastructure (infrastructure_id),
    INDEX idx_problems_category (root_cause_category),
    FULLTEXT idx_problems_search (problem_description, symptom, error_message, root_cause)
);
```

**Example Problem Solutions:**

**Apache SSL Certificate Expiration:**
```json
{
  "problem_title": "Apache SSL certificate expiration causing ERR_SSL_PROTOCOL_ERROR",
  "problem_description": "Website inaccessible via HTTPS. Browser shows ERR_SSL_PROTOCOL_ERROR.",
  "symptom": "Users unable to access website. SSL handshake failure.",
  "error_message": "ERR_SSL_PROTOCOL_ERROR",
  "investigation_steps": [
    "curl -I https://example.com",
    "openssl s_client -connect example.com:443",
    "systemctl status apache2",
    "openssl x509 -in /etc/ssl/certs/example.com.crt -text -noout"
  ],
  "diagnostic_output": "Certificate expiration: 2026-01-10 (3 days ago)",
  "root_cause": "SSL certificate expired on 2026-01-10. Certbot auto-renewal failed due to DNS validation issue.",
  "root_cause_category": "configuration",
  "solution_applied": "1. Fixed DNS TXT record for Let's Encrypt validation\n2. Ran: certbot renew --force-renewal\n3. Restarted Apache: systemctl restart apache2",
  "solution_category": "config_change",
  "commands_run": [
    "certbot renew --force-renewal",
    "systemctl restart apache2"
  ],
  "files_modified": [
    "/etc/apache2/sites-enabled/example.com.conf"
  ],
  "verification_method": "curl test successful. Browser loads HTTPS site without error.",
  "verification_successful": true,
  "prevention_measures": "Set up monitoring for certificate expiration (30 days warning). Fixed DNS automation for certbot.",
  "tags": ["ssl", "apache", "certificate", "certbot"],
  "time_to_resolution_minutes": 25
}
```

**PowerShell Compatibility Issue:**
```json
{
  "problem_title": "Get-LocalUser fails on Server 2008 (PowerShell 2.0)",
  "problem_description": "Attempting to list local users on Server 2008 using Get-LocalUser cmdlet",
  "symptom": "Command not recognized error",
  "error_message": "Get-LocalUser : The term 'Get-LocalUser' is not recognized as the name of a cmdlet",
  "error_code": "CommandNotFoundException",
  "investigation_steps": [
    "$PSVersionTable",
    "Get-Command Get-LocalUser",
    "Get-WmiObject Win32_OperatingSystem | Select Caption, Version"
  ],
  "root_cause": "Server 2008 has PowerShell 2.0 only. Get-LocalUser introduced in PowerShell 5.1 (Windows 10/Server 2016).",
  "root_cause_category": "software",
  "solution_applied": "Use WMI instead: Get-WmiObject Win32_UserAccount",
  "solution_category": "alternative_approach",
  "commands_run": [
    "Get-WmiObject Win32_UserAccount | Select Name, Disabled, LocalAccount"
  ],
  "verification_method": "Successfully retrieved local user list",
  "verification_successful": true,
  "prevention_measures": "Created environmental insight for all Server 2008 machines. Environment Context Agent now checks PowerShell version before suggesting cmdlets.",
  "tags": ["powershell", "server_2008", "compatibility", "user_management"],
  "recurrence_count": 5
}
```

**Queries:**

```sql
-- Find similar problems by error text
-- (MATCH must list exactly the columns of the idx_problems_search FULLTEXT index)
SELECT problem_title, solution_applied, created_at
FROM problem_solutions
WHERE MATCH(problem_description, symptom, error_message, root_cause)
      AGAINST('ERR_SSL_PROTOCOL_ERROR' IN BOOLEAN MODE)
ORDER BY created_at DESC;

-- Most common problems (by recurrence)
SELECT problem_title, recurrence_count, root_cause_category
FROM problem_solutions
WHERE recurrence_count > 1
ORDER BY recurrence_count DESC;

-- Recent solutions for client
SELECT problem_title, solution_applied, resolved_at
FROM problem_solutions
WHERE client_id = 'dataforth-uuid'
ORDER BY resolved_at DESC
LIMIT 10;
```

---

### `failure_patterns`

Aggregated failure insights learned from command/operation failures. Auto-generated by Failure Analysis Agent.

```sql
CREATE TABLE failure_patterns (
    id UUID PRIMARY KEY DEFAULT UUID(), -- UUID column type requires MariaDB 10.7+
    infrastructure_id UUID REFERENCES infrastructure(id) ON DELETE CASCADE,
    client_id UUID REFERENCES clients(id) ON DELETE CASCADE,

    -- Pattern identification
    pattern_type VARCHAR(100) NOT NULL CHECK(pattern_type IN (
        'command_compatibility', 'version_mismatch', 'permission_denied',
        'service_unavailable', 'configuration_error', 'environmental_limitation',
        'network_connectivity', 'authentication_failure', 'syntax_error'
    )),
    pattern_signature VARCHAR(500) NOT NULL, -- "Modern PowerShell cmdlets on Server 2008"
    error_pattern TEXT, -- regex or keywords: "Get-LocalUser.*not recognized"

    -- Context
    affected_systems TEXT, -- JSON array: ["all_server_2008", "D2TESTNAS"]
    affected_os_versions TEXT, -- JSON array: ["Server 2008", "DOS 6.22"]
    triggering_commands TEXT, -- JSON array of command patterns
    triggering_operations TEXT, -- JSON array of operation types

    -- Failure details
    failure_description TEXT NOT NULL,
    typical_error_messages TEXT, -- JSON array of common error texts

    -- Resolution
    root_cause TEXT NOT NULL, -- "Server 2008 only has PowerShell 2.0"
    recommended_solution TEXT NOT NULL, -- "Use Get-WmiObject instead of Get-LocalUser"
    alternative_approaches TEXT, -- JSON array of alternatives
    workaround_commands TEXT, -- JSON array of working commands

    -- Metadata
    occurrence_count INTEGER DEFAULT 1, -- how many times seen
    first_seen TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_seen TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    severity VARCHAR(20) CHECK(severity IN ('blocking', 'major', 'minor', 'info')),

    -- Status
    is_active BOOLEAN DEFAULT true, -- false if pattern no longer applies (e.g., server upgraded)
    added_to_insights BOOLEAN DEFAULT false, -- environmental_insight generated

    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,

    INDEX idx_failure_infrastructure (infrastructure_id),
    INDEX idx_failure_client (client_id),
    INDEX idx_failure_pattern_type (pattern_type),
    INDEX idx_failure_signature (pattern_signature),
    INDEX idx_failure_active (is_active),
    INDEX idx_failure_severity (severity)
);
```

**Example Failure Patterns:**

**PowerShell Version Incompatibility:**
```json
{
  "pattern_type": "command_compatibility",
  "pattern_signature": "Modern PowerShell cmdlets on Server 2008",
  "error_pattern": "(Get-LocalUser|Get-LocalGroup|New-LocalUser).*not recognized",
  "affected_systems": ["all_server_2008_machines"],
  "affected_os_versions": ["Server 2008", "Server 2008 R2"],
  "triggering_commands": [
    "Get-LocalUser",
    "Get-LocalGroup",
    "New-LocalUser",
    "Remove-LocalUser"
  ],
  "failure_description": "Modern PowerShell user management cmdlets fail on Server 2008 with 'not recognized' error",
  "typical_error_messages": [
    "Get-LocalUser : The term 'Get-LocalUser' is not recognized",
    "Get-LocalGroup : The term 'Get-LocalGroup' is not recognized"
  ],
  "root_cause": "Server 2008 has PowerShell 2.0 only. Modern user management cmdlets (Get-LocalUser, etc.) were introduced in PowerShell 5.1 (Windows 10/Server 2016).",
  "recommended_solution": "Use WMI for user/group management: Get-WmiObject Win32_UserAccount, Get-WmiObject Win32_Group",
  "alternative_approaches": [
    "Use Get-WmiObject Win32_UserAccount",
    "Use net user command",
    "Upgrade to PowerShell 5.1 (if possible on Server 2008 R2)"
  ],
  "workaround_commands": [
    "Get-WmiObject Win32_UserAccount",
    "Get-WmiObject Win32_Group",
    "net user"
  ],
  "occurrence_count": 5,
  "severity": "major",
  "added_to_insights": true
}
```

**DOS Batch Syntax Limitation:**
```json
{
  "pattern_type": "environmental_limitation",
  "pattern_signature": "Modern batch syntax on MS-DOS 6.22",
  "error_pattern": "IF /I.*Invalid switch",
  "affected_systems": ["all_dos_machines"],
  "affected_os_versions": ["MS-DOS 6.22"],
  "triggering_commands": [
    "IF /I \"%1\"==\"value\" ...",
    "Long filenames with spaces"
  ],
  "failure_description": "Modern batch file syntax not supported in MS-DOS 6.22",
  "typical_error_messages": [
    "Invalid switch - /I",
    "File not found (long filename)",
    "Bad command or file name"
  ],
  "root_cause": "DOS 6.22 does not support /I flag (added in Windows 2000), long filenames, or many modern batch features",
  "recommended_solution": "Use duplicate IF statements for upper/lowercase. Keep filenames to 8.3 format. Use basic batch syntax only.",
  "alternative_approaches": [
    "Duplicate IF for case-insensitive: IF \"%1\"==\"VALUE\" ... + IF \"%1\"==\"value\" ...",
    "Use 8.3 filenames only",
    "Avoid advanced batch features"
  ],
  "workaround_commands": [
    "IF \"%1\"==\"STATUS\" GOTO STATUS",
    "IF \"%1\"==\"status\" GOTO STATUS"
  ],
  "occurrence_count": 8,
  "severity": "blocking",
  "added_to_insights": true
}
```

**ReadyNAS Service Management:**
```json
{
  "pattern_type": "service_unavailable",
  "pattern_signature": "systemd commands on ReadyNAS",
  "error_pattern": "systemctl.*command not found",
  "affected_systems": ["D2TESTNAS"],
  "triggering_commands": [
    "systemctl status nmbd",
    "systemctl restart samba"
  ],
  "failure_description": "ReadyNAS does not use systemd for service management",
  "typical_error_messages": [
    "systemctl: command not found",
    "-ash: systemctl: not found"
  ],
  "root_cause": "ReadyNAS OS is based on older Linux without systemd. Uses traditional init scripts.",
  "recommended_solution": "Use 'service' command or direct process management: service nmbd status, ps aux | grep nmbd",
  "alternative_approaches": [
    "service nmbd status",
    "ps aux | grep nmbd",
    "/etc/init.d/nmbd status"
  ],
  "occurrence_count": 3,
  "severity": "major",
  "added_to_insights": true
}
```

---

### `operation_failures`

Non-command failures (API calls, integrations, file operations, network requests). Complements commands_run failure tracking.
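As a rough illustration of how a caller might assemble a row for this table, the sketch below builds the column values in Python. The helper name `build_operation_failure` and the `ERROR_CODE_CATEGORIES` mapping are hypothetical; the operation types mirror the table's `CHECK` constraint, and the two error codes come from the examples in this document.

```python
import json

# Mirrors the CHECK constraint on operation_failures.operation_type.
ALLOWED_OPERATION_TYPES = {
    "api_call", "file_operation", "network_request", "database_query",
    "external_integration", "service_restart", "backup_operation",
    "restore_operation", "migration",
}

# Hypothetical mapping from error codes to failure_category values.
ERROR_CODE_CATEGORIES = {
    "ETIMEDOUT": "timeout",
    "EACCES": "permission",
}


def build_operation_failure(operation_type, description, error_message,
                            error_code=None, target_system=None,
                            request_data=None, response_data=None):
    """Build a dict of column values for one operation_failures row."""
    if operation_type not in ALLOWED_OPERATION_TYPES:
        raise ValueError(f"unknown operation_type: {operation_type!r}")

    def to_json_text(value):
        # Context columns are TEXT, so structured data is stored as JSON text.
        return None if value is None else json.dumps(value, sort_keys=True)

    return {
        "operation_type": operation_type,
        "operation_description": description,
        "target_system": target_system,
        "error_message": error_message,
        "error_code": error_code,
        # Unknown or missing codes leave the category unset (None).
        "failure_category": ERROR_CODE_CATEGORIES.get(error_code),
        "request_data": to_json_text(request_data),
        "response_data": to_json_text(response_data),
        "resolved": False,
    }
```

Rejecting unknown operation types before the insert surfaces the problem in the agent rather than as a constraint violation in the database.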
```sql
CREATE TABLE operation_failures (
    id UUID PRIMARY KEY DEFAULT UUID(), -- UUID column type requires MariaDB 10.7+
    session_id UUID REFERENCES sessions(id) ON DELETE CASCADE,
    work_item_id UUID REFERENCES work_items(id) ON DELETE CASCADE,
    client_id UUID REFERENCES clients(id) ON DELETE SET NULL,

    -- Operation details
    operation_type VARCHAR(100) NOT NULL CHECK(operation_type IN (
        'api_call', 'file_operation', 'network_request', 'database_query',
        'external_integration', 'service_restart', 'backup_operation',
        'restore_operation', 'migration'
    )),
    operation_description TEXT NOT NULL,
    target_system VARCHAR(255), -- host, URL, service name

    -- Failure details
    error_message TEXT NOT NULL,
    error_code VARCHAR(50), -- HTTP status, exit code, error number
    failure_category VARCHAR(100), -- "timeout", "authentication", "not_found", etc.
    stack_trace TEXT,

    -- Context
    request_data TEXT, -- JSON: what was attempted
    response_data TEXT, -- JSON: error response
    environment_snapshot TEXT, -- JSON: relevant env vars, versions

    -- Resolution
    resolution_applied TEXT,
    resolved BOOLEAN DEFAULT false,
    resolved_at TIMESTAMP NULL, -- NULL until resolved; avoids implicit auto-update defaults
    time_to_resolution_minutes INTEGER,

    -- Pattern linkage
    related_pattern_id UUID REFERENCES failure_patterns(id),

    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    INDEX idx_op_failure_session (session_id),
    INDEX idx_op_failure_type (operation_type),
    INDEX idx_op_failure_category (failure_category),
    INDEX idx_op_failure_resolved (resolved),
    INDEX idx_op_failure_client (client_id)
);
```

**Example Operation Failures:**

**SyncroMSP API Timeout:**
```json
{
  "operation_type": "api_call",
  "operation_description": "Search SyncroMSP tickets for Dataforth",
  "target_system": "https://azcomputerguru.syncromsp.com/api/v1",
  "error_message": "Request timeout after 30 seconds",
  "error_code": "ETIMEDOUT",
  "failure_category": "timeout",
  "request_data": {
    "endpoint": "/api/v1/tickets",
    "params": {"customer_id": 12345, "status": "open"}
  },
  "response_data": null,
  "resolution_applied": "Increased timeout to 60 seconds. Added retry logic with exponential backoff.",
  "resolved": true,
  "time_to_resolution_minutes": 15
}
```

**File Upload Permission Denied:**
```json
{
  "operation_type": "file_operation",
  "operation_description": "Upload backup file to NAS",
  "target_system": "D2TESTNAS:/mnt/backups",
  "error_message": "Permission denied: /mnt/backups/db_backup_2026-01-15.sql",
  "error_code": "EACCES",
  "failure_category": "permission",
  "environment_snapshot": {
    "user": "backupuser",
    "directory_perms": "drwxr-xr-x root root"
  },
  "resolution_applied": "Changed directory ownership: chown -R backupuser:backupgroup /mnt/backups",
  "resolved": true
}
```

**Database Query Performance:**
```json
{
  "operation_type": "database_query",
  "operation_description": "Query sessions table for large date range",
  "target_system": "MariaDB msp_tracking",
  "error_message": "Query execution time: 45 seconds (threshold: 5 seconds)",
  "failure_category": "performance",
  "request_data": {
    "query": "SELECT * FROM sessions WHERE session_date BETWEEN '2020-01-01' AND '2026-01-15'"
  },
  "resolution_applied": "Added index on session_date column. Query now runs in 0.3 seconds.",
  "resolved": true
}
```

---

## Self-Learning Workflow

### 1. Failure Detection and Logging

**Command Execution with Failure Tracking:**

```
User: "Check WINS status on D2TESTNAS"

Main Claude → Environment Context Agent:
  - Queries infrastructure table for D2TESTNAS
  - Reads environmental_notes: "Manual WINS install, no native service"
  - Reads environmental_insights for D2TESTNAS
  - Returns: "D2TESTNAS has manually installed WINS (not native ReadyNAS service)"

Main Claude suggests command based on environmental context:
  - Executes: ssh root@192.168.0.9 'systemctl status nmbd'

Command fails:
  - success = false
  - exit_code = 127
  - error_message = "systemctl: command not found"
  - failure_category = "command_compatibility"

Trigger Failure Analysis Agent:
  - Analyzes error: ReadyNAS doesn't use systemd
  - Identifies correct approach: "service nmbd status" or "ps aux | grep nmbd"
  - Creates failure_pattern entry
  - Updates environmental_insights with correction
  - Returns resolution to Main Claude

Main Claude tries corrected command:
  - Executes: ssh root@192.168.0.9 'ps aux | grep nmbd'
  - Success = true
  - Updates original failure record with resolution
```

### 2. Pattern Analysis (Periodic Agent Run)

**Failure Analysis Agent runs periodically:**

**Agent Task:** "Analyze recent failures and update environmental insights"

1. **Query failures:**
   ```sql
   SELECT * FROM commands_run
   WHERE success = false AND resolved = false
   ORDER BY created_at DESC;

   SELECT * FROM operation_failures
   WHERE resolved = false
   ORDER BY created_at DESC;
   ```

2. **Group by pattern:**
   - Group by infrastructure_id, error_pattern, failure_category
   - Identify recurring patterns

3. **Create/update failure_patterns:**
   - If pattern seen 3+ times → Create failure_pattern
   - Increment occurrence_count for existing patterns
   - Update last_seen timestamp

4. **Generate environmental_insights:**
   - Transform failure_patterns into actionable insights
   - Create markdown-formatted descriptions
   - Add command examples
   - Set priority based on severity and frequency

5. **Update infrastructure environmental_notes:**
   - Add constraints to infrastructure.environmental_notes
   - Set powershell_version, shell_type, limitations

6. **Generate insights.md file:**
   - Query all environmental_insights for client
   - Format as markdown
   - Save to `D:\ClaudeTools\insights\[client-name].md`
   - Agents read this file before making suggestions

### 3. Pre-Operation Environment Check

**Environment Context Agent runs before operations:**

**Agent Task:** "Check environmental constraints for D2TESTNAS before command suggestion"

1. **Query infrastructure:**
   ```sql
   SELECT environmental_notes, powershell_version, shell_type, limitations
   FROM infrastructure
   WHERE id = 'd2testnas-uuid';
   ```

2. **Query environmental_insights:**
   ```sql
   SELECT insight_title, insight_description, examples, priority
   FROM environmental_insights
   WHERE infrastructure_id = 'd2testnas-uuid'
     AND is_active = true
   ORDER BY priority DESC;
   ```

3. **Query failure_patterns:**
   ```sql
   SELECT pattern_signature, recommended_solution, workaround_commands
   FROM failure_patterns
   WHERE infrastructure_id = 'd2testnas-uuid'
     AND is_active = true;
   ```

4. **Check proposed command compatibility:**
   - Proposed: "systemctl status nmbd"
   - Matches failure pattern: "systemd commands on ReadyNAS" (error pattern `systemctl.*command not found`)
   - **Result:** INCOMPATIBLE
   - Recommended: "ps aux | grep nmbd"

5. **Return environmental context:**
   ```
   Environmental Context for D2TESTNAS:
   - ReadyNAS OS (Linux-based)
   - Manual WINS installation (Samba nmbd)
   - No systemd (use 'service' or ps commands)
   - SMB1/CORE protocol for DOS compatibility

   Recommended commands:
   ✓ ps aux | grep nmbd
   ✓ service nmbd status
   ✗ systemctl status nmbd (not available)
   ```

Main Claude uses this context to suggest the correct approach.

---

## Benefits

### 1. Self-Improving System
- Each failure makes the system smarter
- Patterns identified automatically
- Insights generated without manual documentation
- Knowledge accumulates over time

### 2. Reduced User Friction
- Users don't have to keep correcting the same mistakes
- Claude learns environmental constraints once
- Suggestions are environmentally aware from the start
- Proactive problem prevention

### 3. Institutional Knowledge Capture
- All environmental quirks documented in database
- Survives across sessions and Claude instances
- Queryable: "What are known issues with D2TESTNAS?"
- Transferable to new team members

### 4. Proactive Problem Prevention
- Environment Context Agent prevents failures before they happen
- Suggests compatible alternatives automatically
- Warns about known limitations
- Avoids wasting time on incompatible approaches

### 5. Audit Trail
- Every failure tracked with full context
- Resolution history for troubleshooting
- Pattern analysis for infrastructure planning
- ROI tracking: time saved by avoiding repeat failures

---

## Integration with Other Schemas

**Sources data from:**
- `commands_run` - Command execution failures
- `infrastructure` - System capabilities and limitations
- `work_items` - Context for failures
- `sessions` - Session context for operations

**Provides data to:**
- Environment Context Agent (pre-operation checks)
- Problem Pattern Matching Agent (solution lookup)
- MSP Mode (intelligent suggestions)
- Reporting (failure analysis, improvement metrics)

---

## Example Queries

### Find all insights for a client
```sql
SELECT ei.insight_title, ei.insight_description, i.hostname
FROM environmental_insights ei
JOIN infrastructure i ON ei.infrastructure_id = i.id
WHERE ei.client_id = 'dataforth-uuid'
  AND ei.is_active = true
ORDER BY ei.priority DESC;
```

### Search for similar problems
```sql
-- MATCH must list exactly the columns of the idx_problems_search FULLTEXT index
SELECT ps.problem_title, ps.solution_applied, ps.created_at
FROM problem_solutions ps
WHERE MATCH(ps.problem_description, ps.symptom, ps.error_message, ps.root_cause)
      AGAINST('SSL certificate' IN BOOLEAN MODE)
ORDER BY ps.created_at DESC
LIMIT 10;
```

### Active failure patterns
```sql
SELECT fp.pattern_signature, fp.occurrence_count, fp.recommended_solution
FROM failure_patterns fp
WHERE fp.is_active = true
  AND fp.severity IN ('blocking', 'major')
ORDER BY fp.occurrence_count DESC;
```

### Unresolved operation failures
```sql
-- Alias is opf rather than of, which is a keyword in some SQL dialects
SELECT opf.operation_type, opf.target_system, opf.error_message, opf.created_at
FROM operation_failures opf
WHERE opf.resolved = false
ORDER BY opf.created_at DESC;
```

---

**Document Version:** 1.0
**Last Updated:** 2026-01-15
**Author:** MSP Mode Schema Design Team
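
---

The compatibility check in step 4 of the Pre-Operation Environment Check can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the agent's actual logic: since `error_pattern` describes error output rather than commands, the sketch matches the proposed command against `triggering_commands` by its leading executable, and the function name `check_command` is hypothetical.

```python
import json


def check_command(proposed, patterns):
    """Check a proposed command against active failure patterns.

    patterns: list of dicts holding the failure_patterns columns used here;
    triggering_commands and workaround_commands contain JSON text.
    Returns (compatible, matched_signature, workarounds).
    """
    words = proposed.split()
    if not words:
        return True, None, []
    executable = words[0]
    for pattern in patterns:
        if not pattern.get("is_active", True):
            continue
        triggers = json.loads(pattern.get("triggering_commands") or "[]")
        # Assumption: a pattern applies when any triggering command starts
        # with the same executable as the proposed command.
        if any(t.split()[:1] == [executable] for t in triggers):
            workarounds = json.loads(pattern.get("workaround_commands") or "[]")
            return False, pattern["pattern_signature"], workarounds
    return True, None, []


# Example row; the workaround values are illustrative.
readynas_pattern = {
    "pattern_signature": "systemd commands on ReadyNAS",
    "triggering_commands": '["systemctl status nmbd", "systemctl restart samba"]',
    "workaround_commands": '["service nmbd status", "ps aux | grep nmbd"]',
    "is_active": True,
}
result = check_command("systemctl status nmbd", [readynas_pattern])
```

Matching on the executable alone is deliberately coarse: a pattern such as the `IF /I` limitation would flag every `IF` command, so a real check would likely need a finer match on flags or arguments.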