Files
claudetools/.claude/SCHEMA_CONTEXT.md
Mike Swanson 390b10b32c Complete Phase 6: MSP Work Tracking with Context Recall System
Implements production-ready MSP platform with cross-machine persistent memory for Claude.

API Implementation:
- 130 REST API endpoints across 21 entities
- JWT authentication on all endpoints
- AES-256-GCM encryption for credentials
- Automatic audit logging
- Complete OpenAPI documentation

Database:
- 43 tables in MariaDB (172.16.3.20:3306)
- 42 SQLAlchemy models with modern 2.0 syntax
- Full Alembic migration system
- 99.1% CRUD test pass rate

Context Recall System (Phase 6):
- Cross-machine persistent memory via database
- Automatic context injection via Claude Code hooks
- Automatic context saving after task completion
- 90-95% token reduction with compression utilities
- Relevance scoring with time decay
- Tag-based semantic search
- One-command setup script

Security Features:
- JWT tokens with Argon2 password hashing
- AES-256-GCM encryption for all sensitive data
- Comprehensive audit trail for credentials
- HMAC tamper detection
- Secure configuration management

Test Results:
- Phase 3: 38/38 CRUD tests passing (100%)
- Phase 4: 34/35 core API tests passing (97.1%)
- Phase 5: 62/62 extended API tests passing (100%)
- Phase 6: 10/10 compression tests passing (100%)
- Overall: 144/145 tests passing (99.3%)

Documentation:
- Comprehensive architecture guides
- Setup automation scripts
- API documentation at /api/docs
- Complete test reports
- Troubleshooting guides

Project Status: 95% Complete (Production-Ready)
Phase 7 (optional work context APIs) remains for future enhancement.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-17 06:00:26 -07:00

893 lines
31 KiB
Markdown

# Learning & Context Schema
**MSP Mode Database Schema - Self-Learning System**
**Status:** Designed 2026-01-15
**Database:** msp_tracking (MariaDB on Jupiter)
---
## Overview
The Learning & Context subsystem enables MSP Mode to learn from every failure, build environmental awareness, and prevent recurring mistakes. This self-improving system captures failure patterns, generates actionable insights, and proactively checks environmental constraints before making suggestions.
**Core Principle:** Every failure is a learning opportunity. Agents must never make the same mistake twice.
**Related Documentation:**
- [MSP-MODE-SPEC.md](../MSP-MODE-SPEC.md) - Full system specification
- [ARCHITECTURE_OVERVIEW.md](ARCHITECTURE_OVERVIEW.md) - Agent architecture
- [SCHEMA_CREDENTIALS.md](SCHEMA_CREDENTIALS.md) - Security tables
- [API_SPEC.md](API_SPEC.md) - API endpoints
---
## Tables Summary
| Table | Purpose | Auto-Generated |
|-------|---------|----------------|
| `environmental_insights` | Generated insights per client/infrastructure | Yes |
| `problem_solutions` | Issue tracking with root cause and resolution | Partial |
| `failure_patterns` | Aggregated failure analysis and learnings | Yes |
| `operation_failures` | Non-command failures (API, file ops, network) | Yes |
**Total:** 4 tables
**Specialized Agents:**
- **Failure Analysis Agent** - Analyzes failures, identifies patterns, generates insights
- **Environment Context Agent** - Pre-checks environmental constraints before operations
- **Problem Pattern Matching Agent** - Searches historical solutions for similar issues
---
## Table Schemas
### `environmental_insights`
Auto-generated insights about client infrastructure constraints, limitations, and quirks. Used by Environment Context Agent to prevent failures before they occur.
```sql
CREATE TABLE environmental_insights (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
client_id UUID REFERENCES clients(id) ON DELETE CASCADE,
infrastructure_id UUID REFERENCES infrastructure(id) ON DELETE CASCADE,
-- Insight classification
insight_category VARCHAR(100) NOT NULL CHECK(insight_category IN (
'command_constraints', 'service_configuration', 'version_limitations',
'custom_installations', 'network_constraints', 'permissions',
'compatibility', 'performance', 'security'
)),
insight_title VARCHAR(500) NOT NULL,
insight_description TEXT NOT NULL, -- markdown formatted
-- Examples and documentation
examples TEXT, -- JSON array of command/config examples
affected_operations TEXT, -- JSON array: ["user_management", "service_restart"]
-- Source and verification
source_pattern_id UUID REFERENCES failure_patterns(id) ON DELETE SET NULL,
confidence_level VARCHAR(20) CHECK(confidence_level IN ('confirmed', 'likely', 'suspected')),
verification_count INTEGER DEFAULT 1, -- how many times verified
last_verified TIMESTAMP,
-- Priority (1-10, higher = more important to avoid)
priority INTEGER DEFAULT 5 CHECK(priority BETWEEN 1 AND 10),
-- Status
is_active BOOLEAN DEFAULT true, -- false if pattern no longer applies
superseded_by UUID REFERENCES environmental_insights(id), -- if replaced by better insight
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_insights_client (client_id),
INDEX idx_insights_infrastructure (infrastructure_id),
INDEX idx_insights_category (insight_category),
INDEX idx_insights_priority (priority),
INDEX idx_insights_active (is_active)
);
```
**Real-World Examples:**
**D2TESTNAS - Custom WINS Installation:**
```json
{
"infrastructure_id": "d2testnas-uuid",
"client_id": "dataforth-uuid",
"insight_category": "custom_installations",
"insight_title": "WINS Service: Manual Samba installation (no native ReadyNAS service)",
"insight_description": "**Installation:** Manually installed via Samba nmbd, not a native ReadyNAS service.\n\n**Constraints:**\n- No GUI service manager for WINS\n- Cannot use standard service management commands\n- Configuration via `/etc/frontview/samba/smb.conf.overrides`\n\n**Correct commands:**\n- Check status: `ssh root@192.168.0.9 'ps aux | grep nmbd'`\n- View config: `ssh root@192.168.0.9 'cat /etc/frontview/samba/smb.conf.overrides | grep wins'`\n- Restart: `ssh root@192.168.0.9 'service nmbd restart'`",
"examples": [
"ps aux | grep nmbd",
"cat /etc/frontview/samba/smb.conf.overrides | grep wins",
"service nmbd restart"
],
"affected_operations": ["service_management", "wins_configuration"],
"confidence_level": "confirmed",
"verification_count": 3,
"priority": 9
}
```
**AD2 - PowerShell Version Constraints:**
```json
{
"infrastructure_id": "ad2-uuid",
"client_id": "dataforth-uuid",
"insight_category": "version_limitations",
"insight_title": "Server 2022: PowerShell 5.1 command compatibility",
"insight_description": "**PowerShell Version:** 5.1 (default)\n\n**Compatible:** Modern cmdlets work (Get-LocalUser, Get-LocalGroup)\n\n**Not available:** PowerShell 7 specific features\n\n**Remote execution:** Use Invoke-Command for remote operations",
"examples": [
"Get-LocalUser",
"Get-LocalGroup",
"Invoke-Command -ComputerName AD2 -ScriptBlock { Get-LocalUser }"
],
"confidence_level": "confirmed",
"verification_count": 5,
"priority": 6
}
```
**Server 2008 - PowerShell 2.0 Limitations:**
```json
{
"infrastructure_id": "old-server-2008-uuid",
"insight_category": "version_limitations",
"insight_title": "Server 2008: PowerShell 2.0 command compatibility",
"insight_description": "**PowerShell Version:** 2.0 only\n\n**Avoid:** Get-LocalUser, Get-LocalGroup, New-LocalUser (not available in PS 2.0)\n\n**Use instead:** Get-WmiObject Win32_UserAccount, Get-WmiObject Win32_Group\n\n**Why:** Server 2008 predates modern PowerShell user management cmdlets",
"examples": [
"Get-WmiObject Win32_UserAccount",
"Get-WmiObject Win32_Group",
"Get-WmiObject Win32_UserAccount -Filter \"Name='username'\""
],
"affected_operations": ["user_management", "group_management"],
"confidence_level": "confirmed",
"verification_count": 5,
"priority": 8
}
```
**DOS Machines (TS-XX) - Batch Syntax Constraints:**
```json
{
"infrastructure_id": "ts-27-uuid",
"client_id": "dataforth-uuid",
"insight_category": "command_constraints",
"insight_title": "MS-DOS 6.22: Batch file syntax limitations",
"insight_description": "**OS:** MS-DOS 6.22\n\n**No support for:**\n- `IF /I` (case insensitive) - added in Windows 2000\n- Long filenames (8.3 format only)\n- Unicode or special characters\n- Modern batch features\n\n**Workarounds:**\n- Use duplicate IF statements for upper/lowercase\n- Keep filenames to 8.3 format\n- Use basic batch syntax only",
"examples": [
"IF \"%1\"=\"STATUS\" GOTO STATUS",
"IF \"%1\"=\"status\" GOTO STATUS",
"COPY FILE.TXT BACKUP.TXT"
],
"affected_operations": ["batch_scripting", "file_operations"],
"confidence_level": "confirmed",
"verification_count": 8,
"priority": 10
}
```
**D2TESTNAS - SMB Protocol Constraints:**
```json
{
"infrastructure_id": "d2testnas-uuid",
"insight_category": "network_constraints",
"insight_title": "ReadyNAS: SMB1/CORE protocol for DOS compatibility",
"insight_description": "**Protocol:** CORE/SMB1 only (for DOS machine compatibility)\n\n**Implications:**\n- Modern SMB2/3 clients may need configuration\n- Use NetBIOS name, not IP address for DOS machines\n- Security risk: SMB1 deprecated due to vulnerabilities\n\n**Configuration:**\n- Set in `/etc/frontview/samba/smb.conf.overrides`\n- `min protocol = CORE`",
"examples": [
"NET USE Z: \\\\D2TESTNAS\\SHARE (from DOS)",
"smbclient -L //192.168.0.9 -m SMB1"
],
"confidence_level": "confirmed",
"priority": 7
}
```
**Generated insights.md Example:**
When Failure Analysis Agent runs, it generates markdown files for each client:
```markdown
# Environmental Insights: Dataforth
Auto-generated from failure patterns and verified operations.
## D2TESTNAS (192.168.0.9)
### Custom Installations
**WINS Service: Manual Samba installation**
- Manually installed via Samba nmbd, not native ReadyNAS service
- No GUI service manager for WINS
- Configure via `/etc/frontview/samba/smb.conf.overrides`
- Check status: `ssh root@192.168.0.9 'ps aux | grep nmbd'`
### Network Constraints
**SMB Protocol: CORE/SMB1 only**
- For DOS compatibility
- Modern SMB2/3 clients may need configuration
- Use NetBIOS name from DOS machines
## AD2 (192.168.0.6 - Server 2022)
### PowerShell Version
**Version:** PowerShell 5.1 (default)
- **Compatible:** Modern cmdlets work
- **Not available:** PowerShell 7 specific features
## TS-XX Machines (DOS 6.22)
### Command Constraints
**No support for:**
- `IF /I` (case insensitive) - use duplicate IF statements
- Long filenames (8.3 format only)
- Unicode or special characters
- Modern batch features
**Examples:**
```batch
REM Correct (DOS 6.22)
IF "%1"=="STATUS" GOTO STATUS
IF "%1"=="status" GOTO STATUS
REM Incorrect (requires Windows 2000+)
IF /I "%1"=="STATUS" GOTO STATUS
```
```
---
### `problem_solutions`
Issue tracking with root cause analysis and resolution documentation. Searchable historical knowledge base.
```sql
CREATE TABLE problem_solutions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
work_item_id UUID NOT NULL REFERENCES work_items(id) ON DELETE CASCADE,
session_id UUID NOT NULL REFERENCES sessions(id) ON DELETE CASCADE,
client_id UUID REFERENCES clients(id) ON DELETE SET NULL,
infrastructure_id UUID REFERENCES infrastructure(id) ON DELETE SET NULL,
-- Problem description
problem_title VARCHAR(500) NOT NULL,
problem_description TEXT NOT NULL,
symptom TEXT, -- what user/system exhibited
error_message TEXT, -- exact error code/message
error_code VARCHAR(100), -- structured error code
-- Investigation
investigation_steps TEXT, -- JSON array of diagnostic commands/actions
diagnostic_output TEXT, -- key outputs that led to root cause
investigation_duration_minutes INTEGER,
-- Root cause
root_cause TEXT NOT NULL,
root_cause_category VARCHAR(100), -- "configuration", "hardware", "software", "network"
-- Solution
solution_applied TEXT NOT NULL,
solution_category VARCHAR(100), -- "config_change", "restart", "replacement", "patch"
commands_run TEXT, -- JSON array of commands used to fix
files_modified TEXT, -- JSON array of config files changed
-- Verification
verification_method TEXT,
verification_successful BOOLEAN DEFAULT true,
verification_notes TEXT,
-- Prevention and rollback
rollback_plan TEXT,
prevention_measures TEXT, -- what was done to prevent recurrence
-- Pattern tracking
recurrence_count INTEGER DEFAULT 1, -- if same problem reoccurs
similar_problems TEXT, -- JSON array of related problem_solution IDs
tags TEXT, -- JSON array: ["ssl", "apache", "certificate"]
-- Resolution
resolved_at TIMESTAMP,
time_to_resolution_minutes INTEGER,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_problems_work_item (work_item_id),
INDEX idx_problems_session (session_id),
INDEX idx_problems_client (client_id),
INDEX idx_problems_infrastructure (infrastructure_id),
INDEX idx_problems_category (root_cause_category),
FULLTEXT idx_problems_search (problem_description, symptom, error_message, root_cause)
);
```
**Example Problem Solutions:**
**Apache SSL Certificate Expiration:**
```json
{
"problem_title": "Apache SSL certificate expiration causing ERR_SSL_PROTOCOL_ERROR",
"problem_description": "Website inaccessible via HTTPS. Browser shows ERR_SSL_PROTOCOL_ERROR.",
"symptom": "Users unable to access website. SSL handshake failure.",
"error_message": "ERR_SSL_PROTOCOL_ERROR",
"investigation_steps": [
"curl -I https://example.com",
"openssl s_client -connect example.com:443",
"systemctl status apache2",
"openssl x509 -in /etc/ssl/certs/example.com.crt -text -noout"
],
"diagnostic_output": "Certificate expiration: 2026-01-10 (3 days ago)",
"root_cause": "SSL certificate expired on 2026-01-10. Certbot auto-renewal failed due to DNS validation issue.",
"root_cause_category": "configuration",
"solution_applied": "1. Fixed DNS TXT record for Let's Encrypt validation\n2. Ran: certbot renew --force-renewal\n3. Restarted Apache: systemctl restart apache2",
"solution_category": "config_change",
"commands_run": [
"certbot renew --force-renewal",
"systemctl restart apache2"
],
"files_modified": [
"/etc/apache2/sites-enabled/example.com.conf"
],
"verification_method": "curl test successful. Browser loads HTTPS site without error.",
"verification_successful": true,
"prevention_measures": "Set up monitoring for certificate expiration (30 days warning). Fixed DNS automation for certbot.",
"tags": ["ssl", "apache", "certificate", "certbot"],
"time_to_resolution_minutes": 25
}
```
**PowerShell Compatibility Issue:**
```json
{
"problem_title": "Get-LocalUser fails on Server 2008 (PowerShell 2.0)",
"problem_description": "Attempting to list local users on Server 2008 using Get-LocalUser cmdlet",
"symptom": "Command not recognized error",
"error_message": "Get-LocalUser : The term 'Get-LocalUser' is not recognized as the name of a cmdlet",
"error_code": "CommandNotFoundException",
"investigation_steps": [
"$PSVersionTable",
"Get-Command Get-LocalUser",
"Get-WmiObject Win32_OperatingSystem | Select Caption, Version"
],
"root_cause": "Server 2008 has PowerShell 2.0 only. Get-LocalUser introduced in PowerShell 5.1 (Windows 10/Server 2016).",
"root_cause_category": "software",
"solution_applied": "Use WMI instead: Get-WmiObject Win32_UserAccount",
"solution_category": "alternative_approach",
"commands_run": [
"Get-WmiObject Win32_UserAccount | Select Name, Disabled, LocalAccount"
],
"verification_method": "Successfully retrieved local user list",
"verification_successful": true,
"prevention_measures": "Created environmental insight for all Server 2008 machines. Environment Context Agent now checks PowerShell version before suggesting cmdlets.",
"tags": ["powershell", "server_2008", "compatibility", "user_management"],
"recurrence_count": 5
}
```
**Queries:**
```sql
-- Find similar problems by error message
SELECT problem_title, solution_applied, created_at
FROM problem_solutions
WHERE MATCH(error_message) AGAINST('SSL_PROTOCOL_ERROR' IN BOOLEAN MODE)
ORDER BY created_at DESC;
-- Most common problems (by recurrence)
SELECT problem_title, recurrence_count, root_cause_category
FROM problem_solutions
WHERE recurrence_count > 1
ORDER BY recurrence_count DESC;
-- Recent solutions for client
SELECT problem_title, solution_applied, resolved_at
FROM problem_solutions
WHERE client_id = 'dataforth-uuid'
ORDER BY resolved_at DESC
LIMIT 10;
```
---
### `failure_patterns`
Aggregated failure insights learned from command/operation failures. Auto-generated by Failure Analysis Agent.
```sql
CREATE TABLE failure_patterns (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
infrastructure_id UUID REFERENCES infrastructure(id) ON DELETE CASCADE,
client_id UUID REFERENCES clients(id) ON DELETE CASCADE,
-- Pattern identification
pattern_type VARCHAR(100) NOT NULL CHECK(pattern_type IN (
'command_compatibility', 'version_mismatch', 'permission_denied',
'service_unavailable', 'configuration_error', 'environmental_limitation',
'network_connectivity', 'authentication_failure', 'syntax_error'
)),
pattern_signature VARCHAR(500) NOT NULL, -- "PowerShell 7 cmdlets on Server 2008"
error_pattern TEXT, -- regex or keywords: "Get-LocalUser.*not recognized"
-- Context
affected_systems TEXT, -- JSON array: ["all_server_2008", "D2TESTNAS"]
affected_os_versions TEXT, -- JSON array: ["Server 2008", "DOS 6.22"]
triggering_commands TEXT, -- JSON array of command patterns
triggering_operations TEXT, -- JSON array of operation types
-- Failure details
failure_description TEXT NOT NULL,
typical_error_messages TEXT, -- JSON array of common error texts
-- Resolution
root_cause TEXT NOT NULL, -- "Server 2008 only has PowerShell 2.0"
recommended_solution TEXT NOT NULL, -- "Use Get-WmiObject instead of Get-LocalUser"
alternative_approaches TEXT, -- JSON array of alternatives
workaround_commands TEXT, -- JSON array of working commands
-- Metadata
occurrence_count INTEGER DEFAULT 1, -- how many times seen
first_seen TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
last_seen TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
severity VARCHAR(20) CHECK(severity IN ('blocking', 'major', 'minor', 'info')),
-- Status
is_active BOOLEAN DEFAULT true, -- false if pattern no longer applies (e.g., server upgraded)
added_to_insights BOOLEAN DEFAULT false, -- environmental_insight generated
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_failure_infrastructure (infrastructure_id),
INDEX idx_failure_client (client_id),
INDEX idx_failure_pattern_type (pattern_type),
INDEX idx_failure_signature (pattern_signature),
INDEX idx_failure_active (is_active),
INDEX idx_failure_severity (severity)
);
```
**Example Failure Patterns:**
**PowerShell Version Incompatibility:**
```json
{
"pattern_type": "command_compatibility",
"pattern_signature": "Modern PowerShell cmdlets on Server 2008",
"error_pattern": "(Get-LocalUser|Get-LocalGroup|New-LocalUser).*not recognized",
"affected_systems": ["all_server_2008_machines"],
"affected_os_versions": ["Server 2008", "Server 2008 R2"],
"triggering_commands": [
"Get-LocalUser",
"Get-LocalGroup",
"New-LocalUser",
"Remove-LocalUser"
],
"failure_description": "Modern PowerShell user management cmdlets fail on Server 2008 with 'not recognized' error",
"typical_error_messages": [
"Get-LocalUser : The term 'Get-LocalUser' is not recognized",
"Get-LocalGroup : The term 'Get-LocalGroup' is not recognized"
],
"root_cause": "Server 2008 has PowerShell 2.0 only. Modern user management cmdlets (Get-LocalUser, etc.) were introduced in PowerShell 5.1 (Windows 10/Server 2016).",
"recommended_solution": "Use WMI for user/group management: Get-WmiObject Win32_UserAccount, Get-WmiObject Win32_Group",
"alternative_approaches": [
"Use Get-WmiObject Win32_UserAccount",
"Use net user command",
"Upgrade to PowerShell 5.1 (if possible on Server 2008 R2)"
],
"workaround_commands": [
"Get-WmiObject Win32_UserAccount",
"Get-WmiObject Win32_Group",
"net user"
],
"occurrence_count": 5,
"severity": "major",
"added_to_insights": true
}
```
**DOS Batch Syntax Limitation:**
```json
{
"pattern_type": "environmental_limitation",
"pattern_signature": "Modern batch syntax on MS-DOS 6.22",
"error_pattern": "IF /I.*Invalid switch",
"affected_systems": ["all_dos_machines"],
"affected_os_versions": ["MS-DOS 6.22"],
"triggering_commands": [
"IF /I \"%1\"==\"value\" ...",
"Long filenames with spaces"
],
"failure_description": "Modern batch file syntax not supported in MS-DOS 6.22",
"typical_error_messages": [
"Invalid switch - /I",
"File not found (long filename)",
"Bad command or file name"
],
"root_cause": "DOS 6.22 does not support /I flag (added in Windows 2000), long filenames, or many modern batch features",
"recommended_solution": "Use duplicate IF statements for upper/lowercase. Keep filenames to 8.3 format. Use basic batch syntax only.",
"alternative_approaches": [
"Duplicate IF for case-insensitive: IF \"%1\"==\"VALUE\" ... + IF \"%1\"==\"value\" ...",
"Use 8.3 filenames only",
"Avoid advanced batch features"
],
"workaround_commands": [
"IF \"%1\"==\"STATUS\" GOTO STATUS",
"IF \"%1\"==\"status\" GOTO STATUS"
],
"occurrence_count": 8,
"severity": "blocking",
"added_to_insights": true
}
```
**ReadyNAS Service Management:**
```json
{
"pattern_type": "service_unavailable",
"pattern_signature": "systemd commands on ReadyNAS",
"error_pattern": "systemctl.*command not found",
"affected_systems": ["D2TESTNAS"],
"triggering_commands": [
"systemctl status nmbd",
"systemctl restart samba"
],
"failure_description": "ReadyNAS does not use systemd for service management",
"typical_error_messages": [
"systemctl: command not found",
"-ash: systemctl: not found"
],
"root_cause": "ReadyNAS OS is based on older Linux without systemd. Uses traditional init scripts.",
"recommended_solution": "Use 'service' command or direct process management: service nmbd status, ps aux | grep nmbd",
"alternative_approaches": [
"service nmbd status",
"ps aux | grep nmbd",
"/etc/init.d/nmbd status"
],
"occurrence_count": 3,
"severity": "major",
"added_to_insights": true
}
```
---
### `operation_failures`
Non-command failures (API calls, integrations, file operations, network requests). Complements commands_run failure tracking.
```sql
CREATE TABLE operation_failures (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
session_id UUID REFERENCES sessions(id) ON DELETE CASCADE,
work_item_id UUID REFERENCES work_items(id) ON DELETE CASCADE,
client_id UUID REFERENCES clients(id) ON DELETE SET NULL,
-- Operation details
operation_type VARCHAR(100) NOT NULL CHECK(operation_type IN (
'api_call', 'file_operation', 'network_request',
'database_query', 'external_integration', 'service_restart',
'backup_operation', 'restore_operation', 'migration'
)),
operation_description TEXT NOT NULL,
target_system VARCHAR(255), -- host, URL, service name
-- Failure details
error_message TEXT NOT NULL,
error_code VARCHAR(50), -- HTTP status, exit code, error number
failure_category VARCHAR(100), -- "timeout", "authentication", "not_found", etc.
stack_trace TEXT,
-- Context
request_data TEXT, -- JSON: what was attempted
response_data TEXT, -- JSON: error response
environment_snapshot TEXT, -- JSON: relevant env vars, versions
-- Resolution
resolution_applied TEXT,
resolved BOOLEAN DEFAULT false,
resolved_at TIMESTAMP,
time_to_resolution_minutes INTEGER,
-- Pattern linkage
related_pattern_id UUID REFERENCES failure_patterns(id),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_op_failure_session (session_id),
INDEX idx_op_failure_type (operation_type),
INDEX idx_op_failure_category (failure_category),
INDEX idx_op_failure_resolved (resolved),
INDEX idx_op_failure_client (client_id)
);
```
**Example Operation Failures:**
**SyncroMSP API Timeout:**
```json
{
"operation_type": "api_call",
"operation_description": "Search SyncroMSP tickets for Dataforth",
"target_system": "https://azcomputerguru.syncromsp.com/api/v1",
"error_message": "Request timeout after 30 seconds",
"error_code": "ETIMEDOUT",
"failure_category": "timeout",
"request_data": {
"endpoint": "/api/v1/tickets",
"params": {"customer_id": 12345, "status": "open"}
},
"response_data": null,
"resolution_applied": "Increased timeout to 60 seconds. Added retry logic with exponential backoff.",
"resolved": true,
"time_to_resolution_minutes": 15
}
```
**File Upload Permission Denied:**
```json
{
"operation_type": "file_operation",
"operation_description": "Upload backup file to NAS",
"target_system": "D2TESTNAS:/mnt/backups",
"error_message": "Permission denied: /mnt/backups/db_backup_2026-01-15.sql",
"error_code": "EACCES",
"failure_category": "permission",
"environment_snapshot": {
"user": "backupuser",
"directory_perms": "drwxr-xr-x root root"
},
"resolution_applied": "Changed directory ownership: chown -R backupuser:backupgroup /mnt/backups",
"resolved": true
}
```
**Database Query Performance:**
```json
{
"operation_type": "database_query",
"operation_description": "Query sessions table for large date range",
"target_system": "MariaDB msp_tracking",
"error_message": "Query execution time: 45 seconds (threshold: 5 seconds)",
"failure_category": "performance",
"request_data": {
"query": "SELECT * FROM sessions WHERE session_date BETWEEN '2020-01-01' AND '2026-01-15'"
},
"resolution_applied": "Added index on session_date column. Query now runs in 0.3 seconds.",
"resolved": true
}
```
---
## Self-Learning Workflow
### 1. Failure Detection and Logging
**Command Execution with Failure Tracking:**
```
User: "Check WINS status on D2TESTNAS"
Main Claude → Environment Context Agent:
- Queries infrastructure table for D2TESTNAS
- Reads environmental_notes: "Manual WINS install, no native service"
- Reads environmental_insights for D2TESTNAS
- Returns: "D2TESTNAS has manually installed WINS (not native ReadyNAS service)"
Main Claude suggests command based on environmental context:
- Executes: ssh root@192.168.0.9 'systemctl status nmbd'
Command fails:
- success = false
- exit_code = 127
- error_message = "systemctl: command not found"
- failure_category = "command_compatibility"
Trigger Failure Analysis Agent:
- Analyzes error: ReadyNAS doesn't use systemd
- Identifies correct approach: "service nmbd status" or "ps aux | grep nmbd"
- Creates failure_pattern entry
- Updates environmental_insights with correction
- Returns resolution to Main Claude
Main Claude tries corrected command:
- Executes: ssh root@192.168.0.9 'ps aux | grep nmbd'
- Success = true
- Updates original failure record with resolution
```
### 2. Pattern Analysis (Periodic Agent Run)
**Failure Analysis Agent runs periodically:**
**Agent Task:** "Analyze recent failures and update environmental insights"
1. **Query failures:**
```sql
SELECT * FROM commands_run
WHERE success = false AND resolved = false
ORDER BY created_at DESC;
SELECT * FROM operation_failures
WHERE resolved = false
ORDER BY created_at DESC;
```
2. **Group by pattern:**
- Group by infrastructure_id, error_pattern, failure_category
- Identify recurring patterns
3. **Create/update failure_patterns:**
- If pattern seen 3+ times → Create failure_pattern
- Increment occurrence_count for existing patterns
- Update last_seen timestamp
4. **Generate environmental_insights:**
- Transform failure_patterns into actionable insights
- Create markdown-formatted descriptions
- Add command examples
- Set priority based on severity and frequency
5. **Update infrastructure environmental_notes:**
- Add constraints to infrastructure.environmental_notes
- Set powershell_version, shell_type, limitations
6. **Generate insights.md file:**
- Query all environmental_insights for client
- Format as markdown
- Save to D:\ClaudeTools\insights\[client-name].md
- Agents read this file before making suggestions
### 3. Pre-Operation Environment Check
**Environment Context Agent runs before operations:**
**Agent Task:** "Check environmental constraints for D2TESTNAS before command suggestion"
1. **Query infrastructure:**
```sql
SELECT environmental_notes, powershell_version, shell_type, limitations
FROM infrastructure
WHERE id = 'd2testnas-uuid';
```
2. **Query environmental_insights:**
```sql
SELECT insight_title, insight_description, examples, priority
FROM environmental_insights
WHERE infrastructure_id = 'd2testnas-uuid'
AND is_active = true
ORDER BY priority DESC;
```
3. **Query failure_patterns:**
```sql
SELECT pattern_signature, recommended_solution, workaround_commands
FROM failure_patterns
WHERE infrastructure_id = 'd2testnas-uuid'
AND is_active = true;
```
4. **Check proposed command compatibility:**
- Proposed: "systemctl status nmbd"
- Pattern match: "systemctl.*command not found"
- **Result:** INCOMPATIBLE
- Recommended: "ps aux | grep nmbd"
5. **Return environmental context:**
```
Environmental Context for D2TESTNAS:
- ReadyNAS OS (Linux-based)
- Manual WINS installation (Samba nmbd)
- No systemd (use 'service' or ps commands)
- SMB1/CORE protocol for DOS compatibility
Recommended commands:
✓ ps aux | grep nmbd
✓ service nmbd status
✗ systemctl status nmbd (not available)
```
Main Claude uses this context to suggest correct approach.
---
## Benefits
### 1. Self-Improving System
- Each failure makes the system smarter
- Patterns identified automatically
- Insights generated without manual documentation
- Knowledge accumulates over time
### 2. Reduced User Friction
- User doesn't have to keep correcting same mistakes
- Claude learns environmental constraints once
- Suggestions are environmentally aware from start
- Proactive problem prevention
### 3. Institutional Knowledge Capture
- All environmental quirks documented in database
- Survives across sessions and Claude instances
- Queryable: "What are known issues with D2TESTNAS?"
- Transferable to new team members
### 4. Proactive Problem Prevention
- Environment Context Agent prevents failures before they happen
- Suggests compatible alternatives automatically
- Warns about known limitations
- Avoids wasting time on incompatible approaches
### 5. Audit Trail
- Every failure tracked with full context
- Resolution history for troubleshooting
- Pattern analysis for infrastructure planning
- ROI tracking: time saved by avoiding repeat failures
---
## Integration with Other Schemas
**Sources data from:**
- `commands_run` - Command execution failures
- `infrastructure` - System capabilities and limitations
- `work_items` - Context for failures
- `sessions` - Session context for operations
**Provides data to:**
- Environment Context Agent (pre-operation checks)
- Problem Pattern Matching Agent (solution lookup)
- MSP Mode (intelligent suggestions)
- Reporting (failure analysis, improvement metrics)
---
## Example Queries
### Find all insights for a client
```sql
SELECT ei.insight_title, ei.insight_description, i.hostname
FROM environmental_insights ei
JOIN infrastructure i ON ei.infrastructure_id = i.id
WHERE ei.client_id = 'dataforth-uuid'
AND ei.is_active = true
ORDER BY ei.priority DESC;
```
### Search for similar problems
```sql
SELECT ps.problem_title, ps.solution_applied, ps.created_at
FROM problem_solutions ps
WHERE MATCH(ps.problem_description, ps.symptom, ps.error_message)
AGAINST('SSL certificate' IN BOOLEAN MODE)
ORDER BY ps.created_at DESC
LIMIT 10;
```
### Active failure patterns
```sql
SELECT fp.pattern_signature, fp.occurrence_count, fp.recommended_solution
FROM failure_patterns fp
WHERE fp.is_active = true
AND fp.severity IN ('blocking', 'major')
ORDER BY fp.occurrence_count DESC;
```
### Unresolved operation failures
```sql
SELECT of.operation_type, of.target_system, of.error_message, of.created_at
FROM operation_failures of
WHERE of.resolved = false
ORDER BY of.created_at DESC;
```
---
**Document Version:** 1.0
**Last Updated:** 2026-01-15
**Author:** MSP Mode Schema Design Team