Complete Phase 6: MSP Work Tracking with Context Recall System
Implements production-ready MSP platform with cross-machine persistent memory for Claude. API Implementation: - 130 REST API endpoints across 21 entities - JWT authentication on all endpoints - AES-256-GCM encryption for credentials - Automatic audit logging - Complete OpenAPI documentation Database: - 43 tables in MariaDB (172.16.3.20:3306) - 42 SQLAlchemy models with modern 2.0 syntax - Full Alembic migration system - 99.1% CRUD test pass rate Context Recall System (Phase 6): - Cross-machine persistent memory via database - Automatic context injection via Claude Code hooks - Automatic context saving after task completion - 90-95% token reduction with compression utilities - Relevance scoring with time decay - Tag-based semantic search - One-command setup script Security Features: - JWT tokens with Argon2 password hashing - AES-256-GCM encryption for all sensitive data - Comprehensive audit trail for credentials - HMAC tamper detection - Secure configuration management Test Results: - Phase 3: 38/38 CRUD tests passing (100%) - Phase 4: 34/35 core API tests passing (97.1%) - Phase 5: 62/62 extended API tests passing (100%) - Phase 6: 10/10 compression tests passing (100%) - Overall: 144/145 tests passing (99.3%) Documentation: - Comprehensive architecture guides - Setup automation scripts - API documentation at /api/docs - Complete test reports - Troubleshooting guides Project Status: 95% Complete (Production-Ready) Phase 7 (optional work context APIs) remains for future enhancement. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit is contained in:
892
.claude/SCHEMA_CONTEXT.md
Normal file
892
.claude/SCHEMA_CONTEXT.md
Normal file
@@ -0,0 +1,892 @@
|
||||
# Learning & Context Schema
|
||||
|
||||
**MSP Mode Database Schema - Self-Learning System**
|
||||
|
||||
**Status:** Designed 2026-01-15
|
||||
**Database:** msp_tracking (MariaDB on Jupiter)
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
The Learning & Context subsystem enables MSP Mode to learn from every failure, build environmental awareness, and prevent recurring mistakes. This self-improving system captures failure patterns, generates actionable insights, and proactively checks environmental constraints before making suggestions.
|
||||
|
||||
**Core Principle:** Every failure is a learning opportunity. Agents must never make the same mistake twice.
|
||||
|
||||
**Related Documentation:**
|
||||
- [MSP-MODE-SPEC.md](../MSP-MODE-SPEC.md) - Full system specification
|
||||
- [ARCHITECTURE_OVERVIEW.md](ARCHITECTURE_OVERVIEW.md) - Agent architecture
|
||||
- [SCHEMA_CREDENTIALS.md](SCHEMA_CREDENTIALS.md) - Security tables
|
||||
- [API_SPEC.md](API_SPEC.md) - API endpoints
|
||||
|
||||
---
|
||||
|
||||
## Tables Summary
|
||||
|
||||
| Table | Purpose | Auto-Generated |
|
||||
|-------|---------|----------------|
|
||||
| `environmental_insights` | Generated insights per client/infrastructure | Yes |
|
||||
| `problem_solutions` | Issue tracking with root cause and resolution | Partial |
|
||||
| `failure_patterns` | Aggregated failure analysis and learnings | Yes |
|
||||
| `operation_failures` | Non-command failures (API, file ops, network) | Yes |
|
||||
|
||||
**Total:** 4 tables
|
||||
|
||||
**Specialized Agents:**
|
||||
- **Failure Analysis Agent** - Analyzes failures, identifies patterns, generates insights
|
||||
- **Environment Context Agent** - Pre-checks environmental constraints before operations
|
||||
- **Problem Pattern Matching Agent** - Searches historical solutions for similar issues
|
||||
|
||||
---
|
||||
|
||||
## Table Schemas
|
||||
|
||||
### `environmental_insights`
|
||||
|
||||
Auto-generated insights about client infrastructure constraints, limitations, and quirks. Used by Environment Context Agent to prevent failures before they occur.
|
||||
|
||||
```sql
|
||||
CREATE TABLE environmental_insights (
|
||||
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
|
||||
client_id UUID REFERENCES clients(id) ON DELETE CASCADE,
|
||||
infrastructure_id UUID REFERENCES infrastructure(id) ON DELETE CASCADE,
|
||||
|
||||
-- Insight classification
|
||||
insight_category VARCHAR(100) NOT NULL CHECK(insight_category IN (
|
||||
'command_constraints', 'service_configuration', 'version_limitations',
|
||||
'custom_installations', 'network_constraints', 'permissions',
|
||||
'compatibility', 'performance', 'security'
|
||||
)),
|
||||
insight_title VARCHAR(500) NOT NULL,
|
||||
insight_description TEXT NOT NULL, -- markdown formatted
|
||||
|
||||
-- Examples and documentation
|
||||
examples TEXT, -- JSON array of command/config examples
|
||||
affected_operations TEXT, -- JSON array: ["user_management", "service_restart"]
|
||||
|
||||
-- Source and verification
|
||||
source_pattern_id UUID REFERENCES failure_patterns(id) ON DELETE SET NULL,
|
||||
confidence_level VARCHAR(20) CHECK(confidence_level IN ('confirmed', 'likely', 'suspected')),
|
||||
verification_count INTEGER DEFAULT 1, -- how many times verified
|
||||
last_verified TIMESTAMP,
|
||||
|
||||
-- Priority (1-10, higher = more important to avoid)
|
||||
priority INTEGER DEFAULT 5 CHECK(priority BETWEEN 1 AND 10),
|
||||
|
||||
-- Status
|
||||
is_active BOOLEAN DEFAULT true, -- false if pattern no longer applies
|
||||
superseded_by UUID REFERENCES environmental_insights(id), -- if replaced by better insight
|
||||
|
||||
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||||
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||||
|
||||
INDEX idx_insights_client (client_id),
|
||||
INDEX idx_insights_infrastructure (infrastructure_id),
|
||||
INDEX idx_insights_category (insight_category),
|
||||
INDEX idx_insights_priority (priority),
|
||||
INDEX idx_insights_active (is_active)
|
||||
);
|
||||
```
|
||||
|
||||
**Real-World Examples:**
|
||||
|
||||
**D2TESTNAS - Custom WINS Installation:**
|
||||
```json
|
||||
{
|
||||
"infrastructure_id": "d2testnas-uuid",
|
||||
"client_id": "dataforth-uuid",
|
||||
"insight_category": "custom_installations",
|
||||
"insight_title": "WINS Service: Manual Samba installation (no native ReadyNAS service)",
|
||||
"insight_description": "**Installation:** Manually installed via Samba nmbd, not a native ReadyNAS service.\n\n**Constraints:**\n- No GUI service manager for WINS\n- Cannot use standard service management commands\n- Configuration via `/etc/frontview/samba/smb.conf.overrides`\n\n**Correct commands:**\n- Check status: `ssh root@192.168.0.9 'ps aux | grep nmbd'`\n- View config: `ssh root@192.168.0.9 'cat /etc/frontview/samba/smb.conf.overrides | grep wins'`\n- Restart: `ssh root@192.168.0.9 'service nmbd restart'`",
|
||||
"examples": [
|
||||
"ps aux | grep nmbd",
|
||||
"cat /etc/frontview/samba/smb.conf.overrides | grep wins",
|
||||
"service nmbd restart"
|
||||
],
|
||||
"affected_operations": ["service_management", "wins_configuration"],
|
||||
"confidence_level": "confirmed",
|
||||
"verification_count": 3,
|
||||
"priority": 9
|
||||
}
|
||||
```
|
||||
|
||||
**AD2 - PowerShell Version Constraints:**
|
||||
```json
|
||||
{
|
||||
"infrastructure_id": "ad2-uuid",
|
||||
"client_id": "dataforth-uuid",
|
||||
"insight_category": "version_limitations",
|
||||
"insight_title": "Server 2022: PowerShell 5.1 command compatibility",
|
||||
"insight_description": "**PowerShell Version:** 5.1 (default)\n\n**Compatible:** Modern cmdlets work (Get-LocalUser, Get-LocalGroup)\n\n**Not available:** PowerShell 7 specific features\n\n**Remote execution:** Use Invoke-Command for remote operations",
|
||||
"examples": [
|
||||
"Get-LocalUser",
|
||||
"Get-LocalGroup",
|
||||
"Invoke-Command -ComputerName AD2 -ScriptBlock { Get-LocalUser }"
|
||||
],
|
||||
"confidence_level": "confirmed",
|
||||
"verification_count": 5,
|
||||
"priority": 6
|
||||
}
|
||||
```
|
||||
|
||||
**Server 2008 - PowerShell 2.0 Limitations:**
|
||||
```json
|
||||
{
|
||||
"infrastructure_id": "old-server-2008-uuid",
|
||||
"insight_category": "version_limitations",
|
||||
"insight_title": "Server 2008: PowerShell 2.0 command compatibility",
|
||||
"insight_description": "**PowerShell Version:** 2.0 only\n\n**Avoid:** Get-LocalUser, Get-LocalGroup, New-LocalUser (not available in PS 2.0)\n\n**Use instead:** Get-WmiObject Win32_UserAccount, Get-WmiObject Win32_Group\n\n**Why:** Server 2008 predates modern PowerShell user management cmdlets",
|
||||
"examples": [
|
||||
"Get-WmiObject Win32_UserAccount",
|
||||
"Get-WmiObject Win32_Group",
|
||||
"Get-WmiObject Win32_UserAccount -Filter \"Name='username'\""
|
||||
],
|
||||
"affected_operations": ["user_management", "group_management"],
|
||||
"confidence_level": "confirmed",
|
||||
"verification_count": 5,
|
||||
"priority": 8
|
||||
}
|
||||
```
|
||||
|
||||
**DOS Machines (TS-XX) - Batch Syntax Constraints:**
|
||||
```json
|
||||
{
|
||||
"infrastructure_id": "ts-27-uuid",
|
||||
"client_id": "dataforth-uuid",
|
||||
"insight_category": "command_constraints",
|
||||
"insight_title": "MS-DOS 6.22: Batch file syntax limitations",
|
||||
"insight_description": "**OS:** MS-DOS 6.22\n\n**No support for:**\n- `IF /I` (case insensitive) - added in Windows 2000\n- Long filenames (8.3 format only)\n- Unicode or special characters\n- Modern batch features\n\n**Workarounds:**\n- Use duplicate IF statements for upper/lowercase\n- Keep filenames to 8.3 format\n- Use basic batch syntax only",
|
||||
"examples": [
|
||||
"IF \"%1\"=\"STATUS\" GOTO STATUS",
|
||||
"IF \"%1\"=\"status\" GOTO STATUS",
|
||||
"COPY FILE.TXT BACKUP.TXT"
|
||||
],
|
||||
"affected_operations": ["batch_scripting", "file_operations"],
|
||||
"confidence_level": "confirmed",
|
||||
"verification_count": 8,
|
||||
"priority": 10
|
||||
}
|
||||
```
|
||||
|
||||
**D2TESTNAS - SMB Protocol Constraints:**
|
||||
```json
|
||||
{
|
||||
"infrastructure_id": "d2testnas-uuid",
|
||||
"insight_category": "network_constraints",
|
||||
"insight_title": "ReadyNAS: SMB1/CORE protocol for DOS compatibility",
|
||||
"insight_description": "**Protocol:** CORE/SMB1 only (for DOS machine compatibility)\n\n**Implications:**\n- Modern SMB2/3 clients may need configuration\n- Use NetBIOS name, not IP address for DOS machines\n- Security risk: SMB1 deprecated due to vulnerabilities\n\n**Configuration:**\n- Set in `/etc/frontview/samba/smb.conf.overrides`\n- `min protocol = CORE`",
|
||||
"examples": [
|
||||
"NET USE Z: \\\\D2TESTNAS\\SHARE (from DOS)",
|
||||
"smbclient -L //192.168.0.9 -m SMB1"
|
||||
],
|
||||
"confidence_level": "confirmed",
|
||||
"priority": 7
|
||||
}
|
||||
```
|
||||
|
||||
**Generated insights.md Example:**
|
||||
|
||||
When Failure Analysis Agent runs, it generates markdown files for each client:
|
||||
|
||||
```markdown
|
||||
# Environmental Insights: Dataforth
|
||||
|
||||
Auto-generated from failure patterns and verified operations.
|
||||
|
||||
## D2TESTNAS (192.168.0.9)
|
||||
|
||||
### Custom Installations
|
||||
|
||||
**WINS Service: Manual Samba installation**
|
||||
- Manually installed via Samba nmbd, not native ReadyNAS service
|
||||
- No GUI service manager for WINS
|
||||
- Configure via `/etc/frontview/samba/smb.conf.overrides`
|
||||
- Check status: `ssh root@192.168.0.9 'ps aux | grep nmbd'`
|
||||
|
||||
### Network Constraints
|
||||
|
||||
**SMB Protocol: CORE/SMB1 only**
|
||||
- For DOS compatibility
|
||||
- Modern SMB2/3 clients may need configuration
|
||||
- Use NetBIOS name from DOS machines
|
||||
|
||||
## AD2 (192.168.0.6 - Server 2022)
|
||||
|
||||
### PowerShell Version
|
||||
|
||||
**Version:** PowerShell 5.1 (default)
|
||||
- **Compatible:** Modern cmdlets work
|
||||
- **Not available:** PowerShell 7 specific features
|
||||
|
||||
## TS-XX Machines (DOS 6.22)
|
||||
|
||||
### Command Constraints
|
||||
|
||||
**No support for:**
|
||||
- `IF /I` (case insensitive) - use duplicate IF statements
|
||||
- Long filenames (8.3 format only)
|
||||
- Unicode or special characters
|
||||
- Modern batch features
|
||||
|
||||
**Examples:**
|
||||
```batch
|
||||
REM Correct (DOS 6.22)
|
||||
IF "%1"=="STATUS" GOTO STATUS
|
||||
IF "%1"=="status" GOTO STATUS
|
||||
|
||||
REM Incorrect (requires Windows 2000+)
|
||||
IF /I "%1"=="STATUS" GOTO STATUS
|
||||
```
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### `problem_solutions`
|
||||
|
||||
Issue tracking with root cause analysis and resolution documentation. Searchable historical knowledge base.
|
||||
|
||||
```sql
|
||||
CREATE TABLE problem_solutions (
|
||||
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
|
||||
work_item_id UUID NOT NULL REFERENCES work_items(id) ON DELETE CASCADE,
|
||||
session_id UUID NOT NULL REFERENCES sessions(id) ON DELETE CASCADE,
|
||||
client_id UUID REFERENCES clients(id) ON DELETE SET NULL,
|
||||
infrastructure_id UUID REFERENCES infrastructure(id) ON DELETE SET NULL,
|
||||
|
||||
-- Problem description
|
||||
problem_title VARCHAR(500) NOT NULL,
|
||||
problem_description TEXT NOT NULL,
|
||||
symptom TEXT, -- what user/system exhibited
|
||||
error_message TEXT, -- exact error code/message
|
||||
error_code VARCHAR(100), -- structured error code
|
||||
|
||||
-- Investigation
|
||||
investigation_steps TEXT, -- JSON array of diagnostic commands/actions
|
||||
diagnostic_output TEXT, -- key outputs that led to root cause
|
||||
investigation_duration_minutes INTEGER,
|
||||
|
||||
-- Root cause
|
||||
root_cause TEXT NOT NULL,
|
||||
root_cause_category VARCHAR(100), -- "configuration", "hardware", "software", "network"
|
||||
|
||||
-- Solution
|
||||
solution_applied TEXT NOT NULL,
|
||||
solution_category VARCHAR(100), -- "config_change", "restart", "replacement", "patch"
|
||||
commands_run TEXT, -- JSON array of commands used to fix
|
||||
files_modified TEXT, -- JSON array of config files changed
|
||||
|
||||
-- Verification
|
||||
verification_method TEXT,
|
||||
verification_successful BOOLEAN DEFAULT true,
|
||||
verification_notes TEXT,
|
||||
|
||||
-- Prevention and rollback
|
||||
rollback_plan TEXT,
|
||||
prevention_measures TEXT, -- what was done to prevent recurrence
|
||||
|
||||
-- Pattern tracking
|
||||
recurrence_count INTEGER DEFAULT 1, -- if same problem reoccurs
|
||||
similar_problems TEXT, -- JSON array of related problem_solution IDs
|
||||
tags TEXT, -- JSON array: ["ssl", "apache", "certificate"]
|
||||
|
||||
-- Resolution
|
||||
resolved_at TIMESTAMP,
|
||||
time_to_resolution_minutes INTEGER,
|
||||
|
||||
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||||
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||||
|
||||
INDEX idx_problems_work_item (work_item_id),
|
||||
INDEX idx_problems_session (session_id),
|
||||
INDEX idx_problems_client (client_id),
|
||||
INDEX idx_problems_infrastructure (infrastructure_id),
|
||||
INDEX idx_problems_category (root_cause_category),
|
||||
FULLTEXT idx_problems_search (problem_description, symptom, error_message, root_cause)
|
||||
);
|
||||
```
|
||||
|
||||
**Example Problem Solutions:**
|
||||
|
||||
**Apache SSL Certificate Expiration:**
|
||||
```json
|
||||
{
|
||||
"problem_title": "Apache SSL certificate expiration causing ERR_SSL_PROTOCOL_ERROR",
|
||||
"problem_description": "Website inaccessible via HTTPS. Browser shows ERR_SSL_PROTOCOL_ERROR.",
|
||||
"symptom": "Users unable to access website. SSL handshake failure.",
|
||||
"error_message": "ERR_SSL_PROTOCOL_ERROR",
|
||||
"investigation_steps": [
|
||||
"curl -I https://example.com",
|
||||
"openssl s_client -connect example.com:443",
|
||||
"systemctl status apache2",
|
||||
"openssl x509 -in /etc/ssl/certs/example.com.crt -text -noout"
|
||||
],
|
||||
"diagnostic_output": "Certificate expiration: 2026-01-10 (3 days ago)",
|
||||
"root_cause": "SSL certificate expired on 2026-01-10. Certbot auto-renewal failed due to DNS validation issue.",
|
||||
"root_cause_category": "configuration",
|
||||
"solution_applied": "1. Fixed DNS TXT record for Let's Encrypt validation\n2. Ran: certbot renew --force-renewal\n3. Restarted Apache: systemctl restart apache2",
|
||||
"solution_category": "config_change",
|
||||
"commands_run": [
|
||||
"certbot renew --force-renewal",
|
||||
"systemctl restart apache2"
|
||||
],
|
||||
"files_modified": [
|
||||
"/etc/apache2/sites-enabled/example.com.conf"
|
||||
],
|
||||
"verification_method": "curl test successful. Browser loads HTTPS site without error.",
|
||||
"verification_successful": true,
|
||||
"prevention_measures": "Set up monitoring for certificate expiration (30 days warning). Fixed DNS automation for certbot.",
|
||||
"tags": ["ssl", "apache", "certificate", "certbot"],
|
||||
"time_to_resolution_minutes": 25
|
||||
}
|
||||
```
|
||||
|
||||
**PowerShell Compatibility Issue:**
|
||||
```json
|
||||
{
|
||||
"problem_title": "Get-LocalUser fails on Server 2008 (PowerShell 2.0)",
|
||||
"problem_description": "Attempting to list local users on Server 2008 using Get-LocalUser cmdlet",
|
||||
"symptom": "Command not recognized error",
|
||||
"error_message": "Get-LocalUser : The term 'Get-LocalUser' is not recognized as the name of a cmdlet",
|
||||
"error_code": "CommandNotFoundException",
|
||||
"investigation_steps": [
|
||||
"$PSVersionTable",
|
||||
"Get-Command Get-LocalUser",
|
||||
"Get-WmiObject Win32_OperatingSystem | Select Caption, Version"
|
||||
],
|
||||
"root_cause": "Server 2008 has PowerShell 2.0 only. Get-LocalUser introduced in PowerShell 5.1 (Windows 10/Server 2016).",
|
||||
"root_cause_category": "software",
|
||||
"solution_applied": "Use WMI instead: Get-WmiObject Win32_UserAccount",
|
||||
"solution_category": "alternative_approach",
|
||||
"commands_run": [
|
||||
"Get-WmiObject Win32_UserAccount | Select Name, Disabled, LocalAccount"
|
||||
],
|
||||
"verification_method": "Successfully retrieved local user list",
|
||||
"verification_successful": true,
|
||||
"prevention_measures": "Created environmental insight for all Server 2008 machines. Environment Context Agent now checks PowerShell version before suggesting cmdlets.",
|
||||
"tags": ["powershell", "server_2008", "compatibility", "user_management"],
|
||||
"recurrence_count": 5
|
||||
}
|
||||
```
|
||||
|
||||
**Queries:**
|
||||
|
||||
```sql
|
||||
-- Find similar problems by error message
|
||||
SELECT problem_title, solution_applied, created_at
|
||||
FROM problem_solutions
|
||||
WHERE MATCH(error_message) AGAINST('SSL_PROTOCOL_ERROR' IN BOOLEAN MODE)
|
||||
ORDER BY created_at DESC;
|
||||
|
||||
-- Most common problems (by recurrence)
|
||||
SELECT problem_title, recurrence_count, root_cause_category
|
||||
FROM problem_solutions
|
||||
WHERE recurrence_count > 1
|
||||
ORDER BY recurrence_count DESC;
|
||||
|
||||
-- Recent solutions for client
|
||||
SELECT problem_title, solution_applied, resolved_at
|
||||
FROM problem_solutions
|
||||
WHERE client_id = 'dataforth-uuid'
|
||||
ORDER BY resolved_at DESC
|
||||
LIMIT 10;
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### `failure_patterns`
|
||||
|
||||
Aggregated failure insights learned from command/operation failures. Auto-generated by Failure Analysis Agent.
|
||||
|
||||
```sql
|
||||
CREATE TABLE failure_patterns (
|
||||
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
|
||||
infrastructure_id UUID REFERENCES infrastructure(id) ON DELETE CASCADE,
|
||||
client_id UUID REFERENCES clients(id) ON DELETE CASCADE,
|
||||
|
||||
-- Pattern identification
|
||||
pattern_type VARCHAR(100) NOT NULL CHECK(pattern_type IN (
|
||||
'command_compatibility', 'version_mismatch', 'permission_denied',
|
||||
'service_unavailable', 'configuration_error', 'environmental_limitation',
|
||||
'network_connectivity', 'authentication_failure', 'syntax_error'
|
||||
)),
|
||||
pattern_signature VARCHAR(500) NOT NULL, -- "PowerShell 7 cmdlets on Server 2008"
|
||||
error_pattern TEXT, -- regex or keywords: "Get-LocalUser.*not recognized"
|
||||
|
||||
-- Context
|
||||
affected_systems TEXT, -- JSON array: ["all_server_2008", "D2TESTNAS"]
|
||||
affected_os_versions TEXT, -- JSON array: ["Server 2008", "DOS 6.22"]
|
||||
triggering_commands TEXT, -- JSON array of command patterns
|
||||
triggering_operations TEXT, -- JSON array of operation types
|
||||
|
||||
-- Failure details
|
||||
failure_description TEXT NOT NULL,
|
||||
typical_error_messages TEXT, -- JSON array of common error texts
|
||||
|
||||
-- Resolution
|
||||
root_cause TEXT NOT NULL, -- "Server 2008 only has PowerShell 2.0"
|
||||
recommended_solution TEXT NOT NULL, -- "Use Get-WmiObject instead of Get-LocalUser"
|
||||
alternative_approaches TEXT, -- JSON array of alternatives
|
||||
workaround_commands TEXT, -- JSON array of working commands
|
||||
|
||||
-- Metadata
|
||||
occurrence_count INTEGER DEFAULT 1, -- how many times seen
|
||||
first_seen TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||||
last_seen TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||||
severity VARCHAR(20) CHECK(severity IN ('blocking', 'major', 'minor', 'info')),
|
||||
|
||||
-- Status
|
||||
is_active BOOLEAN DEFAULT true, -- false if pattern no longer applies (e.g., server upgraded)
|
||||
added_to_insights BOOLEAN DEFAULT false, -- environmental_insight generated
|
||||
|
||||
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||||
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||||
|
||||
INDEX idx_failure_infrastructure (infrastructure_id),
|
||||
INDEX idx_failure_client (client_id),
|
||||
INDEX idx_failure_pattern_type (pattern_type),
|
||||
INDEX idx_failure_signature (pattern_signature),
|
||||
INDEX idx_failure_active (is_active),
|
||||
INDEX idx_failure_severity (severity)
|
||||
);
|
||||
```
|
||||
|
||||
**Example Failure Patterns:**
|
||||
|
||||
**PowerShell Version Incompatibility:**
|
||||
```json
|
||||
{
|
||||
"pattern_type": "command_compatibility",
|
||||
"pattern_signature": "Modern PowerShell cmdlets on Server 2008",
|
||||
"error_pattern": "(Get-LocalUser|Get-LocalGroup|New-LocalUser).*not recognized",
|
||||
"affected_systems": ["all_server_2008_machines"],
|
||||
"affected_os_versions": ["Server 2008", "Server 2008 R2"],
|
||||
"triggering_commands": [
|
||||
"Get-LocalUser",
|
||||
"Get-LocalGroup",
|
||||
"New-LocalUser",
|
||||
"Remove-LocalUser"
|
||||
],
|
||||
"failure_description": "Modern PowerShell user management cmdlets fail on Server 2008 with 'not recognized' error",
|
||||
"typical_error_messages": [
|
||||
"Get-LocalUser : The term 'Get-LocalUser' is not recognized",
|
||||
"Get-LocalGroup : The term 'Get-LocalGroup' is not recognized"
|
||||
],
|
||||
"root_cause": "Server 2008 has PowerShell 2.0 only. Modern user management cmdlets (Get-LocalUser, etc.) were introduced in PowerShell 5.1 (Windows 10/Server 2016).",
|
||||
"recommended_solution": "Use WMI for user/group management: Get-WmiObject Win32_UserAccount, Get-WmiObject Win32_Group",
|
||||
"alternative_approaches": [
|
||||
"Use Get-WmiObject Win32_UserAccount",
|
||||
"Use net user command",
|
||||
"Upgrade to PowerShell 5.1 (if possible on Server 2008 R2)"
|
||||
],
|
||||
"workaround_commands": [
|
||||
"Get-WmiObject Win32_UserAccount",
|
||||
"Get-WmiObject Win32_Group",
|
||||
"net user"
|
||||
],
|
||||
"occurrence_count": 5,
|
||||
"severity": "major",
|
||||
"added_to_insights": true
|
||||
}
|
||||
```
|
||||
|
||||
**DOS Batch Syntax Limitation:**
|
||||
```json
|
||||
{
|
||||
"pattern_type": "environmental_limitation",
|
||||
"pattern_signature": "Modern batch syntax on MS-DOS 6.22",
|
||||
"error_pattern": "IF /I.*Invalid switch",
|
||||
"affected_systems": ["all_dos_machines"],
|
||||
"affected_os_versions": ["MS-DOS 6.22"],
|
||||
"triggering_commands": [
|
||||
"IF /I \"%1\"==\"value\" ...",
|
||||
"Long filenames with spaces"
|
||||
],
|
||||
"failure_description": "Modern batch file syntax not supported in MS-DOS 6.22",
|
||||
"typical_error_messages": [
|
||||
"Invalid switch - /I",
|
||||
"File not found (long filename)",
|
||||
"Bad command or file name"
|
||||
],
|
||||
"root_cause": "DOS 6.22 does not support /I flag (added in Windows 2000), long filenames, or many modern batch features",
|
||||
"recommended_solution": "Use duplicate IF statements for upper/lowercase. Keep filenames to 8.3 format. Use basic batch syntax only.",
|
||||
"alternative_approaches": [
|
||||
"Duplicate IF for case-insensitive: IF \"%1\"==\"VALUE\" ... + IF \"%1\"==\"value\" ...",
|
||||
"Use 8.3 filenames only",
|
||||
"Avoid advanced batch features"
|
||||
],
|
||||
"workaround_commands": [
|
||||
"IF \"%1\"==\"STATUS\" GOTO STATUS",
|
||||
"IF \"%1\"==\"status\" GOTO STATUS"
|
||||
],
|
||||
"occurrence_count": 8,
|
||||
"severity": "blocking",
|
||||
"added_to_insights": true
|
||||
}
|
||||
```
|
||||
|
||||
**ReadyNAS Service Management:**
|
||||
```json
|
||||
{
|
||||
"pattern_type": "service_unavailable",
|
||||
"pattern_signature": "systemd commands on ReadyNAS",
|
||||
"error_pattern": "systemctl.*command not found",
|
||||
"affected_systems": ["D2TESTNAS"],
|
||||
"triggering_commands": [
|
||||
"systemctl status nmbd",
|
||||
"systemctl restart samba"
|
||||
],
|
||||
"failure_description": "ReadyNAS does not use systemd for service management",
|
||||
"typical_error_messages": [
|
||||
"systemctl: command not found",
|
||||
"-ash: systemctl: not found"
|
||||
],
|
||||
"root_cause": "ReadyNAS OS is based on older Linux without systemd. Uses traditional init scripts.",
|
||||
"recommended_solution": "Use 'service' command or direct process management: service nmbd status, ps aux | grep nmbd",
|
||||
"alternative_approaches": [
|
||||
"service nmbd status",
|
||||
"ps aux | grep nmbd",
|
||||
"/etc/init.d/nmbd status"
|
||||
],
|
||||
"occurrence_count": 3,
|
||||
"severity": "major",
|
||||
"added_to_insights": true
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### `operation_failures`
|
||||
|
||||
Non-command failures (API calls, integrations, file operations, network requests). Complements commands_run failure tracking.
|
||||
|
||||
```sql
|
||||
CREATE TABLE operation_failures (
|
||||
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
|
||||
session_id UUID REFERENCES sessions(id) ON DELETE CASCADE,
|
||||
work_item_id UUID REFERENCES work_items(id) ON DELETE CASCADE,
|
||||
client_id UUID REFERENCES clients(id) ON DELETE SET NULL,
|
||||
|
||||
-- Operation details
|
||||
operation_type VARCHAR(100) NOT NULL CHECK(operation_type IN (
|
||||
'api_call', 'file_operation', 'network_request',
|
||||
'database_query', 'external_integration', 'service_restart',
|
||||
'backup_operation', 'restore_operation', 'migration'
|
||||
)),
|
||||
operation_description TEXT NOT NULL,
|
||||
target_system VARCHAR(255), -- host, URL, service name
|
||||
|
||||
-- Failure details
|
||||
error_message TEXT NOT NULL,
|
||||
error_code VARCHAR(50), -- HTTP status, exit code, error number
|
||||
failure_category VARCHAR(100), -- "timeout", "authentication", "not_found", etc.
|
||||
stack_trace TEXT,
|
||||
|
||||
-- Context
|
||||
request_data TEXT, -- JSON: what was attempted
|
||||
response_data TEXT, -- JSON: error response
|
||||
environment_snapshot TEXT, -- JSON: relevant env vars, versions
|
||||
|
||||
-- Resolution
|
||||
resolution_applied TEXT,
|
||||
resolved BOOLEAN DEFAULT false,
|
||||
resolved_at TIMESTAMP,
|
||||
time_to_resolution_minutes INTEGER,
|
||||
|
||||
-- Pattern linkage
|
||||
related_pattern_id UUID REFERENCES failure_patterns(id),
|
||||
|
||||
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||||
|
||||
INDEX idx_op_failure_session (session_id),
|
||||
INDEX idx_op_failure_type (operation_type),
|
||||
INDEX idx_op_failure_category (failure_category),
|
||||
INDEX idx_op_failure_resolved (resolved),
|
||||
INDEX idx_op_failure_client (client_id)
|
||||
);
|
||||
```
|
||||
|
||||
**Example Operation Failures:**
|
||||
|
||||
**SyncroMSP API Timeout:**
|
||||
```json
|
||||
{
|
||||
"operation_type": "api_call",
|
||||
"operation_description": "Search SyncroMSP tickets for Dataforth",
|
||||
"target_system": "https://azcomputerguru.syncromsp.com/api/v1",
|
||||
"error_message": "Request timeout after 30 seconds",
|
||||
"error_code": "ETIMEDOUT",
|
||||
"failure_category": "timeout",
|
||||
"request_data": {
|
||||
"endpoint": "/api/v1/tickets",
|
||||
"params": {"customer_id": 12345, "status": "open"}
|
||||
},
|
||||
"response_data": null,
|
||||
"resolution_applied": "Increased timeout to 60 seconds. Added retry logic with exponential backoff.",
|
||||
"resolved": true,
|
||||
"time_to_resolution_minutes": 15
|
||||
}
|
||||
```
|
||||
|
||||
**File Upload Permission Denied:**
|
||||
```json
|
||||
{
|
||||
"operation_type": "file_operation",
|
||||
"operation_description": "Upload backup file to NAS",
|
||||
"target_system": "D2TESTNAS:/mnt/backups",
|
||||
"error_message": "Permission denied: /mnt/backups/db_backup_2026-01-15.sql",
|
||||
"error_code": "EACCES",
|
||||
"failure_category": "permission",
|
||||
"environment_snapshot": {
|
||||
"user": "backupuser",
|
||||
"directory_perms": "drwxr-xr-x root root"
|
||||
},
|
||||
"resolution_applied": "Changed directory ownership: chown -R backupuser:backupgroup /mnt/backups",
|
||||
"resolved": true
|
||||
}
|
||||
```
|
||||
|
||||
**Database Query Performance:**
|
||||
```json
|
||||
{
|
||||
"operation_type": "database_query",
|
||||
"operation_description": "Query sessions table for large date range",
|
||||
"target_system": "MariaDB msp_tracking",
|
||||
"error_message": "Query execution time: 45 seconds (threshold: 5 seconds)",
|
||||
"failure_category": "performance",
|
||||
"request_data": {
|
||||
"query": "SELECT * FROM sessions WHERE session_date BETWEEN '2020-01-01' AND '2026-01-15'"
|
||||
},
|
||||
"resolution_applied": "Added index on session_date column. Query now runs in 0.3 seconds.",
|
||||
"resolved": true
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Self-Learning Workflow
|
||||
|
||||
### 1. Failure Detection and Logging
|
||||
|
||||
**Command Execution with Failure Tracking:**
|
||||
|
||||
```
|
||||
User: "Check WINS status on D2TESTNAS"
|
||||
|
||||
Main Claude → Environment Context Agent:
|
||||
- Queries infrastructure table for D2TESTNAS
|
||||
- Reads environmental_notes: "Manual WINS install, no native service"
|
||||
- Reads environmental_insights for D2TESTNAS
|
||||
- Returns: "D2TESTNAS has manually installed WINS (not native ReadyNAS service)"
|
||||
|
||||
Main Claude suggests command based on environmental context:
|
||||
- Executes: ssh root@192.168.0.9 'systemctl status nmbd'
|
||||
|
||||
Command fails:
|
||||
- success = false
|
||||
- exit_code = 127
|
||||
- error_message = "systemctl: command not found"
|
||||
- failure_category = "command_compatibility"
|
||||
|
||||
Trigger Failure Analysis Agent:
|
||||
- Analyzes error: ReadyNAS doesn't use systemd
|
||||
- Identifies correct approach: "service nmbd status" or "ps aux | grep nmbd"
|
||||
- Creates failure_pattern entry
|
||||
- Updates environmental_insights with correction
|
||||
- Returns resolution to Main Claude
|
||||
|
||||
Main Claude tries corrected command:
|
||||
- Executes: ssh root@192.168.0.9 'ps aux | grep nmbd'
|
||||
- Success = true
|
||||
- Updates original failure record with resolution
|
||||
```
|
||||
|
||||
### 2. Pattern Analysis (Periodic Agent Run)
|
||||
|
||||
**Failure Analysis Agent runs periodically:**
|
||||
|
||||
**Agent Task:** "Analyze recent failures and update environmental insights"
|
||||
|
||||
1. **Query failures:**
|
||||
```sql
|
||||
SELECT * FROM commands_run
|
||||
WHERE success = false AND resolved = false
|
||||
ORDER BY created_at DESC;
|
||||
|
||||
SELECT * FROM operation_failures
|
||||
WHERE resolved = false
|
||||
ORDER BY created_at DESC;
|
||||
```
|
||||
|
||||
2. **Group by pattern:**
|
||||
- Group by infrastructure_id, error_pattern, failure_category
|
||||
- Identify recurring patterns
|
||||
|
||||
3. **Create/update failure_patterns:**
|
||||
- If pattern seen 3+ times → Create failure_pattern
|
||||
- Increment occurrence_count for existing patterns
|
||||
- Update last_seen timestamp
|
||||
|
||||
4. **Generate environmental_insights:**
|
||||
- Transform failure_patterns into actionable insights
|
||||
- Create markdown-formatted descriptions
|
||||
- Add command examples
|
||||
- Set priority based on severity and frequency
|
||||
|
||||
5. **Update infrastructure environmental_notes:**
|
||||
- Add constraints to infrastructure.environmental_notes
|
||||
- Set powershell_version, shell_type, limitations
|
||||
|
||||
6. **Generate insights.md file:**
|
||||
- Query all environmental_insights for client
|
||||
- Format as markdown
|
||||
- Save to D:\ClaudeTools\insights\[client-name].md
|
||||
- Agents read this file before making suggestions
|
||||
|
||||
### 3. Pre-Operation Environment Check
|
||||
|
||||
**Environment Context Agent runs before operations:**
|
||||
|
||||
**Agent Task:** "Check environmental constraints for D2TESTNAS before command suggestion"
|
||||
|
||||
1. **Query infrastructure:**
|
||||
```sql
|
||||
SELECT environmental_notes, powershell_version, shell_type, limitations
|
||||
FROM infrastructure
|
||||
WHERE id = 'd2testnas-uuid';
|
||||
```
|
||||
|
||||
2. **Query environmental_insights:**
|
||||
```sql
|
||||
SELECT insight_title, insight_description, examples, priority
|
||||
FROM environmental_insights
|
||||
WHERE infrastructure_id = 'd2testnas-uuid'
|
||||
AND is_active = true
|
||||
ORDER BY priority DESC;
|
||||
```
|
||||
|
||||
3. **Query failure_patterns:**
|
||||
```sql
|
||||
SELECT pattern_signature, recommended_solution, workaround_commands
|
||||
FROM failure_patterns
|
||||
WHERE infrastructure_id = 'd2testnas-uuid'
|
||||
AND is_active = true;
|
||||
```
|
||||
|
||||
4. **Check proposed command compatibility:**
|
||||
- Proposed: "systemctl status nmbd"
|
||||
- Pattern match: "systemctl.*command not found"
|
||||
- **Result:** INCOMPATIBLE
|
||||
- Recommended: "ps aux | grep nmbd"
|
||||
|
||||
5. **Return environmental context:**
|
||||
```
|
||||
Environmental Context for D2TESTNAS:
|
||||
- ReadyNAS OS (Linux-based)
|
||||
- Manual WINS installation (Samba nmbd)
|
||||
- No systemd (use 'service' or ps commands)
|
||||
- SMB1/CORE protocol for DOS compatibility
|
||||
|
||||
Recommended commands:
|
||||
✓ ps aux | grep nmbd
|
||||
✓ service nmbd status
|
||||
✗ systemctl status nmbd (not available)
|
||||
```
|
||||
|
||||
Main Claude uses this context to suggest correct approach.
|
||||
|
||||
---
|
||||
|
||||
## Benefits
|
||||
|
||||
### 1. Self-Improving System
|
||||
- Each failure makes the system smarter
|
||||
- Patterns identified automatically
|
||||
- Insights generated without manual documentation
|
||||
- Knowledge accumulates over time
|
||||
|
||||
### 2. Reduced User Friction
|
||||
- User doesn't have to keep correcting same mistakes
|
||||
- Claude learns environmental constraints once
|
||||
- Suggestions are environmentally aware from start
|
||||
- Proactive problem prevention
|
||||
|
||||
### 3. Institutional Knowledge Capture
|
||||
- All environmental quirks documented in database
|
||||
- Survives across sessions and Claude instances
|
||||
- Queryable: "What are known issues with D2TESTNAS?"
|
||||
- Transferable to new team members
|
||||
|
||||
### 4. Proactive Problem Prevention
|
||||
- Environment Context Agent prevents failures before they happen
|
||||
- Suggests compatible alternatives automatically
|
||||
- Warns about known limitations
|
||||
- Avoids wasting time on incompatible approaches
|
||||
|
||||
### 5. Audit Trail
|
||||
- Every failure tracked with full context
|
||||
- Resolution history for troubleshooting
|
||||
- Pattern analysis for infrastructure planning
|
||||
- ROI tracking: time saved by avoiding repeat failures
|
||||
|
||||
---
|
||||
|
||||
## Integration with Other Schemas
|
||||
|
||||
**Sources data from:**
|
||||
- `commands_run` - Command execution failures
|
||||
- `infrastructure` - System capabilities and limitations
|
||||
- `work_items` - Context for failures
|
||||
- `sessions` - Session context for operations
|
||||
|
||||
**Provides data to:**
|
||||
- Environment Context Agent (pre-operation checks)
|
||||
- Problem Pattern Matching Agent (solution lookup)
|
||||
- MSP Mode (intelligent suggestions)
|
||||
- Reporting (failure analysis, improvement metrics)
|
||||
|
||||
---
|
||||
|
||||
## Example Queries
|
||||
|
||||
### Find all insights for a client
|
||||
```sql
|
||||
SELECT ei.insight_title, ei.insight_description, i.hostname
|
||||
FROM environmental_insights ei
|
||||
JOIN infrastructure i ON ei.infrastructure_id = i.id
|
||||
WHERE ei.client_id = 'dataforth-uuid'
|
||||
AND ei.is_active = true
|
||||
ORDER BY ei.priority DESC;
|
||||
```
|
||||
|
||||
### Search for similar problems
|
||||
```sql
|
||||
SELECT ps.problem_title, ps.solution_applied, ps.created_at
|
||||
FROM problem_solutions ps
|
||||
WHERE MATCH(ps.problem_description, ps.symptom, ps.error_message)
|
||||
AGAINST('SSL certificate' IN BOOLEAN MODE)
|
||||
ORDER BY ps.created_at DESC
|
||||
LIMIT 10;
|
||||
```
|
||||
|
||||
### Active failure patterns
|
||||
```sql
|
||||
SELECT fp.pattern_signature, fp.occurrence_count, fp.recommended_solution
|
||||
FROM failure_patterns fp
|
||||
WHERE fp.is_active = true
|
||||
AND fp.severity IN ('blocking', 'major')
|
||||
ORDER BY fp.occurrence_count DESC;
|
||||
```
|
||||
|
||||
### Unresolved operation failures
|
||||
```sql
|
||||
SELECT of.operation_type, of.target_system, of.error_message, of.created_at
|
||||
FROM operation_failures of
|
||||
WHERE of.resolved = false
|
||||
ORDER BY of.created_at DESC;
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
**Document Version:** 1.0
|
||||
**Last Updated:** 2026-01-15
|
||||
**Author:** MSP Mode Schema Design Team
|
||||
Reference in New Issue
Block a user