MSP Mode Specification
Date Started: 2026-01-15
Status: Planning / Design Phase
Overview
Creating a custom "MSP Mode" for Claude Code that tracks client work, maintains context across sessions and machines, and provides structured access to historical MSP data.
Objectives
- Track MSP client work - Sessions, work items, time, credentials
- Multi-machine access - Same context available on all machines
- Long-term context - Search historical work across clients/projects
- Scalable & robust - Designed to evolve over time
Core Architectural Principle: Agent-Based Execution
Critical Design Rule: All modes (MSP, Development, Normal) use specialized agents wherever possible to preserve main Claude instance context space.
Why Agent-Based Architecture?
Context Preservation:
- Main Claude instance maintains conversation focus with user
- Agents handle data processing, queries, analysis, integration calls
- User gets concise results without context pollution
- Main instance doesn't get bloated with raw database results or API responses
Scalability:
- Multiple agents can run in parallel
- Each agent has full context window for its specific task
- Complex operations don't consume main context
Separation of Concerns:
- Main instance: Conversation, decision-making, user interaction
- Agents: Data retrieval, processing, categorization, integration
Agent Usage Patterns
When to Use Agents:
1. Database Operations
   - Querying sessions, work items, credentials
   - Searching historical data
   - Complex joins and aggregations
   - Agent returns concise summary, not raw rows
2. Session Analysis & Categorization
   - End-of-session processing
   - Auto-categorizing work items
   - Extracting structured data (commands, files, systems)
   - Generating dense summaries
   - Auto-tagging
3. External Integrations
   - Searching tickets in SyncroMSP
   - Pulling backup reports from MSP Backups
   - OAuth flows
   - Processing external API responses
4. Context Recovery
   - User asks: "What did we do for Dataforth last time?"
   - Agent searches database, retrieves sessions, summarizes
   - Returns: "Last session: 2026-01-10, worked on DNS migration (3 hours)"
5. Credential Management
   - Retrieving encrypted credentials
   - Decryption and formatting
   - Audit logging
   - Returns only the credential needed
6. Problem Pattern Matching
   - User describes error
   - Agent searches problem_solutions table
   - Returns: "Similar issue solved on 2025-12-15: [brief solution]"
7. Parallel Analysis
   - Multiple data sources need analysis
   - Launch parallel agents for each source
   - Aggregate results in main context
When NOT to Use Agents:
- Simple API calls that return small payloads
- Single credential lookups
- Quick status checks
- User is asking conversational questions (not data operations)
Agent Communication Pattern
User: "Show me all work for Dataforth in January"
↓
Main Claude: Understands request, validates parameters
↓
Launches Agent: "Explore database for Dataforth sessions in January 2026"
↓
Agent:
- Queries database (sessions WHERE client='Dataforth' AND date BETWEEN...)
- Processes 15 sessions
- Extracts key info: dates, categories, billable hours, major outcomes
- Generates concise summary
↓
Agent Returns:
"Dataforth - January 2026:
15 sessions, 38.5 billable hours
Main projects: DOS machines (8 sessions), Network migration (5), M365 (2)
Categories: Infrastructure (60%), Troubleshooting (25%), Config (15%)
Key outcomes: Completed UPDATE.BAT v2.0, migrated DNS to UDM"
↓
Main Claude: Presents summary to user, ready for follow-up questions
Context Saved: Agent processed potentially 500+ rows of data, main Claude only received 200-word summary.
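As a rough sketch of what such a database agent might run under the hood, the query below aggregates sessions for the example above using SQLAlchemy. It assumes the `sessions` and `clients` tables defined later in this spec and a MariaDB connection; names and connection details are illustrative, not final.

```python
from sqlalchemy import create_engine, text

# Placeholder connection string; real credentials would come from the synced config.
engine = create_engine("mysql+pymysql://msp_api:CHANGE_ME@172.16.3.20/msp_tracking")

SUMMARY_SQL = text("""
    SELECT COUNT(*)              AS session_count,
           SUM(s.billable_hours) AS billable_hours
    FROM sessions s
    JOIN clients c ON c.id = s.client_id
    WHERE c.name = :client
      AND s.session_date BETWEEN :start AND :end
""")

with engine.connect() as conn:
    row = conn.execute(SUMMARY_SQL, {
        "client": "Dataforth",
        "start": "2026-01-01",
        "end": "2026-01-31",
    }).one()

# The agent condenses this (plus per-category breakdowns) into the short
# summary returned to the main instance.
print(f"{row.session_count} sessions, {row.billable_hours} billable hours")
```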
Architecture Decisions
Storage: SQL Database (MariaDB)
Decision: Use SQL database instead of local files + Git sync
Rationale:
- Claude Code requires internet anyway (offline not a real advantage)
- Structured queries needed ("show all work for Client X in January")
- Relational data (clients → projects → sessions → credentials → billing)
- Fast indexing and search even with years of data
- No merge conflicts (single source of truth)
- Time tracking and billing calculations
- Report generation capabilities
Infrastructure:
- Existing MariaDB on Jupiter (172.16.3.20)
- New database: `msp_tracking`
Access Method: REST API with JWT Authentication
Decision: FastAPI REST API on Jupiter with JWT tokens
Rationale - Security:
- Token-based auth (revocable, rotatable)
- Scoped permissions (API can't access other databases)
- Audit logging (track all queries by user/token/timestamp)
- Rate limiting possible
- HTTPS encryption in transit
- Token revocation without DB password changes
- IP restrictions and 2FA possible later
Rationale - Scalability:
- Industry standard approach
- Can add team members later
- Other tools can integrate (scripts, mobile, GuruRMM)
- Stateless authentication (no session storage)
- Multiple clients supported
Rationale - Robustness:
- Comprehensive error handling
- Input validation before DB access
- Structured logging
- Health checks and monitoring
- Version controlled schema migrations
Technology Stack
API Framework: FastAPI (Python)
- Async performance for concurrent requests
- Auto-generated OpenAPI/Swagger documentation
- Type safety with Pydantic models (runtime validation)
- SQLAlchemy ORM for complex queries
- Built-in background tasks
- Industry-standard testing with pytest
- Alembic for database migrations
- Mature dependency injection
Authentication: JWT Tokens
- Stateless (no DB lookup to validate)
- Claims-based (permissions, scopes, expiration)
- Refresh token pattern for long-term access
- Multiple clients/machines supported
- Short-lived tokens minimize compromise risk
- Industry standard
Configuration Storage: Gitea (Private Repo)
- Multi-machine sync
- Version controlled
- Single source of truth
- Token rotation = one commit, all machines sync
- Encrypted token values (git-crypt or encrypted JSON)
- Backup via Gitea
Deployment: Docker Container
- Easy deployment and updates
- Resource limits
- Systemd service for auto-restart
- Portable (can migrate to dedicated host later)
Infrastructure Design
Jupiter Server (172.16.3.20)
Docker Container: msp-api
- FastAPI application (Python 3.11+)
- SQLAlchemy + Alembic (ORM and migrations)
- JWT auth library (python-jose)
- Pydantic validation
- Gunicorn/Uvicorn ASGI server
- Health checks endpoint
- Prometheus metrics (optional)
- Mounted logs: /var/log/msp-api/
MariaDB Database: msp_tracking
- Connection pooling (SQLAlchemy)
- Automated backups (critical MSP data)
- Schema versioned with Alembic
Nginx Reverse Proxy
- HTTPS with Let's Encrypt
- Rate limiting
- Access logs
- Proxies to: msp-api.azcomputerguru.com
Gitea Private Repository
Repo: azcomputerguru/claude-settings (or new msp-config repo)
Structure:
msp-api-config.json
├── api_url (https://msp-api.azcomputerguru.com)
├── api_token (encrypted JWT or refresh token)
└── database_schema_version (for migration tracking)
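A minimal sketch of how a machine could consume this config at session start, assuming the file layout above and the draft `/api/v1/auth/refresh` endpoint; decryption of the stored token value is omitted here.

```python
import json
from pathlib import Path

import requests

CONFIG_PATH = Path(r"D:\ClaudeTools\.claude\msp-api-config.json")

def get_access_token() -> str:
    """Exchange the long-lived refresh token for a short-lived access token."""
    config = json.loads(CONFIG_PATH.read_text())
    resp = requests.post(
        f"{config['api_url']}/api/v1/auth/refresh",
        json={"refresh_token": config["api_token"]},  # decrypt first if stored encrypted
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["access_token"]
```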
Local Machine (D:\ClaudeTools)
Directory Structure:
D:\ClaudeTools\
├── .claude/
│ ├── commands/
│ │ ├── msp.md (MSP Mode slash command)
│ │ ├── dev.md (Development Mode - TBD)
│ │ └── normal.md (Normal Mode - TBD)
│ └── msp-api-config.json (synced from Gitea)
├── MSP-MODE-SPEC.md (this file)
└── .git/ (synced to Gitea)
API Design Principles
Versioning
- Start with `/api/v1/` from day one
- Allows breaking changes in future versions
Security
- All endpoints require JWT authentication
- Input validation with Pydantic models
- Never expose database errors to client
- Rate limiting to prevent abuse
- Comprehensive audit logging
Endpoints (Draft - To Be Detailed)
POST /api/v1/sessions (start new MSP session)
GET /api/v1/sessions (query sessions - filters: client, date range, etc.)
POST /api/v1/work-items (log work performed)
GET /api/v1/clients (list clients)
POST /api/v1/clients (create client record)
GET /api/v1/clients/{id}/credentials
POST /api/v1/auth/token (get JWT token)
POST /api/v1/auth/refresh (refresh expired token)
GET /api/v1/health (health check)
GET /api/v1/metrics (Prometheus metrics - optional)
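As a rough illustration of the shape these endpoints could take (not a final contract), a minimal FastAPI sketch with Pydantic validation and a JWT dependency; model fields and error handling are placeholders.

```python
from fastapi import Depends, FastAPI, HTTPException, status
from fastapi.security import OAuth2PasswordBearer
from jose import JWTError, jwt
from pydantic import BaseModel

app = FastAPI(title="msp-api")
oauth2_scheme = OAuth2PasswordBearer(tokenUrl="/api/v1/auth/token")

SECRET_KEY = "change-me"   # loaded from environment in a real deployment
ALGORITHM = "HS256"

class SessionCreate(BaseModel):  # Pydantic v2 assumed
    client_id: str
    project_id: str | None = None
    session_title: str
    is_billable: bool = False

def current_user(token: str = Depends(oauth2_scheme)) -> dict:
    try:
        return jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
    except JWTError:
        raise HTTPException(status.HTTP_401_UNAUTHORIZED, "Invalid or expired token")

@app.get("/api/v1/health")
def health() -> dict:
    return {"status": "ok"}

@app.post("/api/v1/sessions", status_code=status.HTTP_201_CREATED)
def create_session(payload: SessionCreate, user: dict = Depends(current_user)) -> dict:
    # Persistence via SQLAlchemy omitted; echo the validated payload and caller.
    return {"created_by": user["sub"], **payload.model_dump()}
```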
Error Handling
- Structured JSON error responses
- HTTP status codes (400, 401, 403, 404, 429, 500)
- Never leak sensitive information in errors
- Log all errors with context
Logging
- JSON structured logs (easy parsing)
- Log levels: DEBUG, INFO, WARNING, ERROR, CRITICAL
- Include: timestamp, user/token ID, endpoint, duration, status
- Separate audit log for sensitive operations (credential access, deletions)
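One way to produce the JSON log lines described above using only the standard library; the field names mirror the list and are illustrative.

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
            # Fields passed via `extra=` are attached to the record object.
            "user": getattr(record, "user", None),
            "endpoint": getattr(record, "endpoint", None),
            "duration_ms": getattr(record, "duration_ms", None),
            "status": getattr(record, "status", None),
        }
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("msp-api")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("request completed",
            extra={"user": "mike@azcomputerguru.com",
                   "endpoint": "/api/v1/sessions",
                   "duration_ms": 42, "status": 200})
```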
JWT Token Structure
Access Token (Short-lived: 1 hour)
{
"sub": "mike@azcomputerguru.com",
"scopes": ["msp:read", "msp:write", "msp:admin"],
"machine": "windows-workstation",
"exp": 1234567890,
"iat": 1234567890,
"jti": "unique-token-id"
}
Refresh Token (Long-lived: 30 days)
- Stored securely in Gitea config
- Used to obtain new access tokens
- Can be revoked server-side
Scopes (Permissions)
- `msp:read` - Read sessions, clients, work items
- `msp:write` - Create/update sessions, work items
- `msp:admin` - Manage clients, credentials, delete operations
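A hedged sketch of issuing and checking such a token with python-jose (the JWT library named in the stack); key handling and claim values are illustrative only.

```python
import time
from uuid import uuid4

from jose import jwt

SECRET_KEY = "change-me"          # in practice: environment variable or vault
ALGORITHM = "HS256"

def issue_access_token(subject: str, scopes: list[str], machine: str) -> str:
    now = int(time.time())
    claims = {
        "sub": subject,
        "scopes": scopes,
        "machine": machine,
        "iat": now,
        "exp": now + 3600,        # short-lived: 1 hour
        "jti": str(uuid4()),
    }
    return jwt.encode(claims, SECRET_KEY, algorithm=ALGORITHM)

def require_scope(token: str, needed: str) -> dict:
    # decode() verifies signature and expiration; scope check is up to the caller
    claims = jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
    if needed not in claims.get("scopes", []):
        raise PermissionError(f"missing scope: {needed}")
    return claims
```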
Database Schema (Draft - To Be Detailed)
Tables (High-Level)
clients
- id, name, network_cidr, primary_contact, notes, created_at, updated_at
projects
- id, client_id, name, description, status, created_at, updated_at
sessions
- id, project_id, start_time, end_time, billable_hours, notes, created_at
work_items
- id, session_id, description, category, timestamp, billable, created_at
credentials
- id, client_id, service, username, password_encrypted, notes, created_at, updated_at
tags (for categorization)
- id, name, type (client_tag, project_tag, work_tag)
session_tags (many-to-many)
- session_id, tag_id
Schema Versioning
- Alembic migrations in version control
- Schema version tracked in config
- Automated migration on API startup (optional)
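For illustration, a migration under this scheme might look like the stub below; revision ids and columns are placeholders, and the real migrations would cover the full schema in this spec.

```python
"""create clients table (illustrative Alembic stub)"""
from alembic import op
import sqlalchemy as sa

revision = "0001_create_clients"
down_revision = None

def upgrade() -> None:
    op.create_table(
        "clients",
        sa.Column("id", sa.String(36), primary_key=True),
        sa.Column("name", sa.String(255), nullable=False, unique=True),
        sa.Column("created_at", sa.TIMESTAMP, server_default=sa.text("CURRENT_TIMESTAMP")),
    )

def downgrade() -> None:
    op.drop_table("clients")
```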
Modes Overview (D:\ClaudeTools Context)
1. MSP Mode
- Purpose: Track client work, maintain context across sessions
- Activation: `/msp` slash command
- Behaviors: (To Be Detailed)
- Prompt for client/project at start
- Auto-log work items as we work
- Track time spent
- Save credentials securely
- Generate session summary at end
2. Development Mode
- Purpose: (To Be Detailed)
- Activation: `/dev` slash command
- Behaviors: (To Be Detailed)
3. Normal Mode
- Purpose: Return to standard Claude behavior
- Activation: `/normal` slash command
- Behaviors: (To Be Detailed)
- Clear active mode context
- Standard conversational Claude
Database Schema Design
Status: ✅ Analyzed via 5 parallel agents on 2026-01-15
Based on comprehensive analysis of:
- 37 session logs (Dec 2025 - Jan 2026)
- shared-data/credentials.md
- All project directories and documentation
- Infrastructure and client network patterns
Schema Summary
Total Tables: 25 core tables + 5 junction tables = 30 tables
Categories:
- Core MSP Tracking (5 tables)
- Client & Infrastructure (7 tables)
- Credentials & Security (4 tables)
- Work Details (6 tables)
- Tagging & Categorization (3 tables)
- System & Audit (2 tables)
- External Integrations (3 tables) - Added 2026-01-15
1. Core MSP Tracking Tables (6 tables)
machines
Technician's machines (laptops, desktops) used for MSP work.
CREATE TABLE machines (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
-- Machine identification (auto-detected)
hostname VARCHAR(255) NOT NULL UNIQUE, -- from `hostname` command
machine_fingerprint VARCHAR(500) UNIQUE, -- hostname + username + platform hash
-- Environment details
friendly_name VARCHAR(255), -- "Main Laptop", "Home Desktop", "Travel Laptop"
machine_type VARCHAR(50) CHECK(machine_type IN ('laptop', 'desktop', 'workstation', 'vm')),
platform VARCHAR(50), -- "win32", "darwin", "linux"
os_version VARCHAR(100),
username VARCHAR(255), -- from `whoami`
home_directory VARCHAR(500), -- user home path
-- Capabilities
has_vpn_access BOOLEAN DEFAULT false, -- can connect to client networks
vpn_profiles TEXT, -- JSON array: ["dataforth", "grabb", "internal"]
has_docker BOOLEAN DEFAULT false,
has_powershell BOOLEAN DEFAULT false,
powershell_version VARCHAR(20),
has_ssh BOOLEAN DEFAULT true,
has_git BOOLEAN DEFAULT true,
-- Network context
typical_network_location VARCHAR(100), -- "home", "office", "mobile"
static_ip VARCHAR(45), -- if has static IP
-- Claude Code context
claude_working_directory VARCHAR(500), -- primary working dir
additional_working_dirs TEXT, -- JSON array
-- Tool versions
installed_tools TEXT, -- JSON: {"git": "2.40", "docker": "24.0", "python": "3.11"}
-- MCP Servers & Skills (NEW)
available_mcps TEXT, -- JSON array: ["claude-in-chrome", "filesystem", "custom-mcp"]
mcp_capabilities TEXT, -- JSON: {"chrome": {"version": "1.0", "features": ["screenshots"]}}
available_skills TEXT, -- JSON array: ["pdf", "commit", "review-pr", "custom-skill"]
skill_paths TEXT, -- JSON: {"/pdf": "/path/to/pdf-skill", ...}
-- OS-Specific Commands
preferred_shell VARCHAR(50), -- "powershell", "bash", "zsh", "cmd"
package_manager_commands TEXT, -- JSON: {"install": "choco install", "update": "choco upgrade"}
-- Status
is_primary BOOLEAN DEFAULT false, -- primary machine
is_active BOOLEAN DEFAULT true,
last_seen TIMESTAMP,
last_session_id UUID, -- last session from this machine
-- Notes
notes TEXT, -- "Travel laptop - limited tools, no VPN"
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_machines_hostname (hostname),
INDEX idx_machines_fingerprint (machine_fingerprint),
INDEX idx_machines_is_active (is_active),
INDEX idx_machines_platform (platform)
);
Machine Fingerprint Generation:
fingerprint = SHA256(hostname + "|" + username + "|" + platform + "|" + home_directory)
// Example: SHA256("ACG-M-L5090|MikeSwanson|win32|C:\Users\MikeSwanson")
Auto-Detection on Session Start:
hostname = exec("hostname") // "ACG-M-L5090"
username = exec("whoami") // "MikeSwanson" or "AzureAD+MikeSwanson"
platform = process.platform // "win32", "darwin", "linux"
home_dir = process.env.HOME || process.env.USERPROFILE
fingerprint = SHA256(`${hostname}|${username}|${platform}|${home_dir}`)
// Query database: SELECT * FROM machines WHERE machine_fingerprint = ?
// If not found: Create new machine record
// If found: Update last_seen, return machine_id
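The same detection expressed as a small Python sketch; the hash scheme follows the definition above, while the exact normalization of the username and platform strings remains an open implementation detail.

```python
import hashlib
import os
import platform
import socket

def machine_fingerprint() -> str:
    hostname = socket.gethostname()                                   # e.g. "ACG-M-L5090"
    username = os.environ.get("USERNAME") or os.environ.get("USER", "")
    plat = {"Windows": "win32", "Darwin": "darwin", "Linux": "linux"}.get(
        platform.system(), platform.system().lower())
    home_dir = os.path.expanduser("~")
    raw = f"{hostname}|{username}|{plat}|{home_dir}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()
```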
Examples:
ACG-M-L5090 (Main Laptop):
{
"hostname": "ACG-M-L5090",
"friendly_name": "Main Laptop",
"platform": "win32",
"os_version": "Windows 11 Pro",
"has_vpn_access": true,
"vpn_profiles": ["dataforth", "grabb", "internal"],
"has_docker": true,
"powershell_version": "7.4",
"preferred_shell": "powershell",
"available_mcps": ["claude-in-chrome", "filesystem"],
"available_skills": ["pdf", "commit", "review-pr", "frontend-design"],
"package_manager_commands": {
"install": "choco install {package}",
"update": "choco upgrade {package}",
"list": "choco list --local-only"
}
}
Mike-MacBook (Development Machine):
{
"hostname": "Mikes-MacBook-Pro",
"friendly_name": "MacBook Pro",
"platform": "darwin",
"os_version": "macOS 14.2",
"has_vpn_access": false,
"has_docker": true,
"powershell_version": null,
"preferred_shell": "zsh",
"available_mcps": ["filesystem"],
"available_skills": ["commit", "review-pr"],
"package_manager_commands": {
"install": "brew install {package}",
"update": "brew upgrade {package}",
"list": "brew list"
}
}
Travel-Laptop (Limited):
{
"hostname": "TRAVEL-WIN",
"friendly_name": "Travel Laptop",
"platform": "win32",
"os_version": "Windows 10 Home",
"has_vpn_access": false,
"vpn_profiles": [],
"has_docker": false,
"powershell_version": "5.1",
"preferred_shell": "powershell",
"available_mcps": [],
"available_skills": [],
"notes": "Minimal toolset, no Docker, no VPN - use for light work only"
}
clients
Master table for all client organizations.
CREATE TABLE clients (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name VARCHAR(255) NOT NULL UNIQUE,
type VARCHAR(50) NOT NULL CHECK(type IN ('msp_client', 'internal', 'project')),
network_subnet VARCHAR(100), -- e.g., "192.168.0.0/24"
domain_name VARCHAR(255), -- AD domain or primary domain
m365_tenant_id UUID, -- Microsoft 365 tenant ID
primary_contact VARCHAR(255),
notes TEXT,
is_active BOOLEAN DEFAULT true,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_clients_type (type),
INDEX idx_clients_name (name)
);
Examples: Dataforth, Grabb & Durando, Valley Wide Plastering, AZ Computer Guru (internal)
projects
Individual projects/engagements for clients.
CREATE TABLE projects (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
client_id UUID NOT NULL REFERENCES clients(id) ON DELETE CASCADE,
name VARCHAR(255) NOT NULL,
slug VARCHAR(255) UNIQUE, -- directory name: "dataforth-dos"
category VARCHAR(50) CHECK(category IN (
'client_project', 'internal_product', 'infrastructure',
'website', 'development_tool', 'documentation'
)),
status VARCHAR(50) DEFAULT 'working' CHECK(status IN (
'complete', 'working', 'blocked', 'pending', 'critical', 'deferred'
)),
priority VARCHAR(20) CHECK(priority IN ('critical', 'high', 'medium', 'low')),
description TEXT,
started_date DATE,
target_completion_date DATE,
completed_date DATE,
estimated_hours DECIMAL(10,2),
actual_hours DECIMAL(10,2),
gitea_repo_url VARCHAR(500),
notes TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_projects_client (client_id),
INDEX idx_projects_status (status),
INDEX idx_projects_slug (slug)
);
Examples: dataforth-dos, gururmm, grabb-website-move
sessions
Work sessions with time tracking (enhanced with machine tracking).
CREATE TABLE sessions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
client_id UUID REFERENCES clients(id) ON DELETE SET NULL,
project_id UUID REFERENCES projects(id) ON DELETE SET NULL,
machine_id UUID REFERENCES machines(id) ON DELETE SET NULL, -- NEW: which machine
session_date DATE NOT NULL,
start_time TIMESTAMP,
end_time TIMESTAMP,
duration_minutes INTEGER, -- auto-calculated or manual
status VARCHAR(50) DEFAULT 'completed' CHECK(status IN (
'completed', 'in_progress', 'blocked', 'pending'
)),
session_title VARCHAR(500) NOT NULL,
summary TEXT, -- markdown summary
is_billable BOOLEAN DEFAULT false,
billable_hours DECIMAL(10,2),
technician VARCHAR(255), -- "Mike Swanson", etc.
session_log_file VARCHAR(500), -- path to .md file
notes TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_sessions_client (client_id),
INDEX idx_sessions_project (project_id),
INDEX idx_sessions_date (session_date),
INDEX idx_sessions_billable (is_billable),
INDEX idx_sessions_machine (machine_id)
);
work_items
Individual tasks/actions within sessions (granular tracking).
CREATE TABLE work_items (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
session_id UUID NOT NULL REFERENCES sessions(id) ON DELETE CASCADE,
category VARCHAR(50) NOT NULL CHECK(category IN (
'infrastructure', 'troubleshooting', 'configuration',
'development', 'maintenance', 'security', 'documentation'
)),
title VARCHAR(500) NOT NULL,
description TEXT NOT NULL,
status VARCHAR(50) DEFAULT 'completed' CHECK(status IN (
'completed', 'in_progress', 'blocked', 'pending', 'deferred'
)),
priority VARCHAR(20) CHECK(priority IN ('critical', 'high', 'medium', 'low')),
is_billable BOOLEAN DEFAULT false,
estimated_minutes INTEGER,
actual_minutes INTEGER,
affected_systems TEXT, -- JSON array: ["jupiter", "172.16.3.20"]
technologies_used TEXT, -- JSON array: ["docker", "mariadb"]
item_order INTEGER, -- sequence within session
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
completed_at TIMESTAMP,
INDEX idx_work_items_session (session_id),
INDEX idx_work_items_category (category),
INDEX idx_work_items_status (status)
);
Categories distribution (from analysis):
- Infrastructure: 30%
- Troubleshooting: 25%
- Configuration: 15%
- Development: 15%
- Maintenance: 10%
- Security: 5%
pending_tasks
Open items across all clients/projects.
CREATE TABLE pending_tasks (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
client_id UUID REFERENCES clients(id) ON DELETE CASCADE,
project_id UUID REFERENCES projects(id) ON DELETE CASCADE,
work_item_id UUID REFERENCES work_items(id) ON DELETE SET NULL,
title VARCHAR(500) NOT NULL,
description TEXT,
priority VARCHAR(20) CHECK(priority IN ('critical', 'high', 'medium', 'low')),
blocked_by TEXT, -- what's blocking this
assigned_to VARCHAR(255),
due_date DATE,
status VARCHAR(50) DEFAULT 'pending' CHECK(status IN (
'pending', 'in_progress', 'blocked', 'completed', 'cancelled'
)),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
completed_at TIMESTAMP,
INDEX idx_pending_tasks_client (client_id),
INDEX idx_pending_tasks_status (status),
INDEX idx_pending_tasks_priority (priority)
);
tasks
Task/checklist management for tracking implementation steps, analysis work, and other agent activities.
-- Task/Checklist Management
CREATE TABLE tasks (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
-- Task hierarchy
parent_task_id UUID REFERENCES tasks(id) ON DELETE CASCADE,
task_order INTEGER NOT NULL,
-- Task details
title VARCHAR(500) NOT NULL,
description TEXT,
task_type VARCHAR(100) CHECK(task_type IN (
'implementation', 'research', 'review', 'deployment',
'testing', 'documentation', 'bugfix', 'analysis'
)),
-- Status tracking
status VARCHAR(50) NOT NULL CHECK(status IN (
'pending', 'in_progress', 'blocked', 'completed', 'cancelled'
)),
blocking_reason TEXT, -- Why blocked (if status='blocked')
-- Context
session_id UUID REFERENCES sessions(id) ON DELETE CASCADE,
client_id UUID REFERENCES clients(id) ON DELETE SET NULL,
project_id UUID REFERENCES projects(id) ON DELETE SET NULL,
assigned_agent VARCHAR(100), -- Which agent is handling this
-- Timing
estimated_complexity VARCHAR(20) CHECK(estimated_complexity IN (
'trivial', 'simple', 'moderate', 'complex', 'very_complex'
)),
started_at TIMESTAMP,
completed_at TIMESTAMP,
-- Context data (JSON)
task_context TEXT, -- Detailed context for this task
dependencies TEXT, -- JSON array of dependency task_ids
-- Metadata
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_tasks_session (session_id),
INDEX idx_tasks_status (status),
INDEX idx_tasks_parent (parent_task_id),
INDEX idx_tasks_client (client_id),
INDEX idx_tasks_project (project_id)
);
2. Client & Infrastructure Tables (7 tables)
sites
Physical/logical locations for clients.
CREATE TABLE sites (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
client_id UUID NOT NULL REFERENCES clients(id) ON DELETE CASCADE,
name VARCHAR(255) NOT NULL, -- "Main Office", "SLC - Salt Lake City"
network_subnet VARCHAR(100), -- "172.16.9.0/24"
vpn_required BOOLEAN DEFAULT false,
vpn_subnet VARCHAR(100), -- "192.168.1.0/24"
gateway_ip VARCHAR(45), -- IPv4/IPv6
dns_servers TEXT, -- JSON array
notes TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_sites_client (client_id)
);
infrastructure
Servers, network devices, NAS, workstations (enhanced with environmental constraints).
CREATE TABLE infrastructure (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
client_id UUID REFERENCES clients(id) ON DELETE CASCADE,
site_id UUID REFERENCES sites(id) ON DELETE SET NULL,
asset_type VARCHAR(50) NOT NULL CHECK(asset_type IN (
'physical_server', 'virtual_machine', 'container',
'network_device', 'nas_storage', 'workstation',
'firewall', 'domain_controller'
)),
hostname VARCHAR(255) NOT NULL,
ip_address VARCHAR(45),
mac_address VARCHAR(17),
os VARCHAR(255), -- "Ubuntu 22.04", "Windows Server 2022", "Unraid"
os_version VARCHAR(100), -- "6.22", "2008 R2", "22.04"
role_description TEXT, -- "Primary DC, NPS/RADIUS server"
parent_host_id UUID REFERENCES infrastructure(id) ON DELETE SET NULL, -- for VMs/containers
status VARCHAR(50) DEFAULT 'active' CHECK(status IN (
'active', 'migration_source', 'migration_destination', 'decommissioned'
)),
-- Environmental constraints (new)
environmental_notes TEXT, -- "Manual WINS install, no native service. ReadyNAS OS, SMB1 only."
powershell_version VARCHAR(20), -- "2.0", "5.1", "7.4"
shell_type VARCHAR(50), -- "bash", "cmd", "powershell", "sh"
package_manager VARCHAR(50), -- "apt", "yum", "chocolatey", "none"
has_gui BOOLEAN DEFAULT true, -- false for headless/DOS
limitations TEXT, -- JSON array: ["no_ps7", "smb1_only", "dos_6.22_commands"]
notes TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_infrastructure_client (client_id),
INDEX idx_infrastructure_type (asset_type),
INDEX idx_infrastructure_hostname (hostname),
INDEX idx_infrastructure_parent (parent_host_id),
INDEX idx_infrastructure_os (os)
);
Examples:
- Jupiter (Ubuntu 22.04, PS7, GUI)
- AD2/Dataforth (Server 2022, PS5.1, GUI)
- D2TESTNAS (ReadyNAS OS, manual WINS, no GUI service manager, SMB1)
- TS-27 (MS-DOS 6.22, no GUI, batch only)
services
Applications/services running on infrastructure.
CREATE TABLE services (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
infrastructure_id UUID REFERENCES infrastructure(id) ON DELETE CASCADE,
service_name VARCHAR(255) NOT NULL, -- "Gitea", "PostgreSQL", "Apache"
service_type VARCHAR(100), -- "git_hosting", "database", "web_server"
external_url VARCHAR(500), -- "https://git.azcomputerguru.com"
internal_url VARCHAR(500), -- "http://172.16.3.20:3000"
port INTEGER,
protocol VARCHAR(50), -- "https", "ssh", "smb"
status VARCHAR(50) DEFAULT 'running' CHECK(status IN (
'running', 'stopped', 'error', 'maintenance'
)),
version VARCHAR(100),
notes TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_services_infrastructure (infrastructure_id),
INDEX idx_services_name (service_name),
INDEX idx_services_type (service_type)
);
service_relationships
Dependencies and relationships between services.
CREATE TABLE service_relationships (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
from_service_id UUID NOT NULL REFERENCES services(id) ON DELETE CASCADE,
to_service_id UUID NOT NULL REFERENCES services(id) ON DELETE CASCADE,
relationship_type VARCHAR(50) NOT NULL CHECK(relationship_type IN (
'hosted_on', 'proxied_by', 'authenticates_via',
'backend_for', 'depends_on', 'replicates_to'
)),
notes TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
UNIQUE(from_service_id, to_service_id, relationship_type),
INDEX idx_service_rel_from (from_service_id),
INDEX idx_service_rel_to (to_service_id)
);
Examples:
- Gitea (proxied_by) NPM
- GuruRMM API (hosted_on) Jupiter container
networks
Network segments, VLANs, VPN networks.
CREATE TABLE networks (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
client_id UUID REFERENCES clients(id) ON DELETE CASCADE,
site_id UUID REFERENCES sites(id) ON DELETE CASCADE,
network_name VARCHAR(255) NOT NULL,
network_type VARCHAR(50) CHECK(network_type IN (
'lan', 'vpn', 'vlan', 'isolated', 'dmz'
)),
cidr VARCHAR(100) NOT NULL, -- "192.168.0.0/24"
gateway_ip VARCHAR(45),
vlan_id INTEGER,
notes TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_networks_client (client_id),
INDEX idx_networks_site (site_id)
);
firewall_rules
Network security rules (for documentation/audit trail).
CREATE TABLE firewall_rules (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
infrastructure_id UUID REFERENCES infrastructure(id) ON DELETE CASCADE,
rule_name VARCHAR(255),
source_cidr VARCHAR(100),
destination_cidr VARCHAR(100),
port INTEGER,
protocol VARCHAR(20), -- "tcp", "udp", "icmp"
action VARCHAR(20) CHECK(action IN ('allow', 'deny', 'drop')),
rule_order INTEGER,
notes TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
created_by VARCHAR(255),
INDEX idx_firewall_infra (infrastructure_id)
);
m365_tenants
Microsoft 365 tenant tracking.
CREATE TABLE m365_tenants (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
client_id UUID REFERENCES clients(id) ON DELETE CASCADE,
tenant_id UUID NOT NULL UNIQUE, -- Microsoft tenant ID
tenant_name VARCHAR(255), -- "dataforth.com"
default_domain VARCHAR(255), -- "dataforthcorp.onmicrosoft.com"
admin_email VARCHAR(255),
cipp_name VARCHAR(255), -- name in CIPP portal
notes TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_m365_client (client_id),
INDEX idx_m365_tenant_id (tenant_id)
);
3. Credentials & Security Tables (4 tables)
credentials
Encrypted credential storage (values encrypted at rest).
CREATE TABLE credentials (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
client_id UUID REFERENCES clients(id) ON DELETE CASCADE,
service_id UUID REFERENCES services(id) ON DELETE CASCADE,
infrastructure_id UUID REFERENCES infrastructure(id) ON DELETE CASCADE,
credential_type VARCHAR(50) NOT NULL CHECK(credential_type IN (
'password', 'api_key', 'oauth', 'ssh_key',
'shared_secret', 'jwt', 'connection_string', 'certificate'
)),
service_name VARCHAR(255) NOT NULL, -- "Gitea Admin", "AD2 sysadmin"
username VARCHAR(255),
password_encrypted BYTEA, -- AES-256-GCM encrypted
api_key_encrypted BYTEA,
client_id_oauth VARCHAR(255), -- for OAuth
client_secret_encrypted BYTEA,
tenant_id_oauth VARCHAR(255),
public_key TEXT, -- for SSH
token_encrypted BYTEA,
connection_string_encrypted BYTEA,
integration_code VARCHAR(255), -- for services like Autotask
-- Metadata
external_url VARCHAR(500),
internal_url VARCHAR(500),
custom_port INTEGER,
role_description VARCHAR(500),
requires_vpn BOOLEAN DEFAULT false,
requires_2fa BOOLEAN DEFAULT false,
ssh_key_auth_enabled BOOLEAN DEFAULT false,
access_level VARCHAR(100),
-- Lifecycle
expires_at TIMESTAMP,
last_rotated_at TIMESTAMP,
is_active BOOLEAN DEFAULT true,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_credentials_client (client_id),
INDEX idx_credentials_service (service_id),
INDEX idx_credentials_type (credential_type),
INDEX idx_credentials_active (is_active)
);
Security:
- All sensitive fields encrypted with AES-256-GCM
- Encryption key stored separately (environment variable or vault)
- Master password unlock mechanism
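A minimal sketch of the AES-256-GCM wrap/unwrap using the `cryptography` package, assuming the 32-byte key is supplied from the environment or a vault as noted above.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_secret(plaintext: str, key: bytes) -> bytes:
    # key must be 32 bytes (256 bits); the random nonce is stored with the ciphertext
    nonce = os.urandom(12)
    return nonce + AESGCM(key).encrypt(nonce, plaintext.encode("utf-8"), None)

def decrypt_secret(blob: bytes, key: bytes) -> str:
    nonce, ciphertext = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ciphertext, None).decode("utf-8")

# key = AESGCM.generate_key(bit_length=256)  # generated once, stored outside the DB
```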
credential_audit_log
Audit trail for credential access.
CREATE TABLE credential_audit_log (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
credential_id UUID NOT NULL REFERENCES credentials(id) ON DELETE CASCADE,
action VARCHAR(50) NOT NULL CHECK(action IN (
'view', 'create', 'update', 'delete', 'rotate', 'decrypt'
)),
user_id VARCHAR(255) NOT NULL, -- JWT sub claim
ip_address VARCHAR(45),
user_agent TEXT,
details TEXT, -- JSON: what changed, why
timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_cred_audit_credential (credential_id),
INDEX idx_cred_audit_user (user_id),
INDEX idx_cred_audit_timestamp (timestamp)
);
security_incidents
Track security events and remediation.
CREATE TABLE security_incidents (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
client_id UUID REFERENCES clients(id) ON DELETE CASCADE,
service_id UUID REFERENCES services(id) ON DELETE SET NULL,
infrastructure_id UUID REFERENCES infrastructure(id) ON DELETE SET NULL,
incident_type VARCHAR(100) CHECK(incident_type IN (
'bec', 'backdoor', 'malware', 'unauthorized_access',
'data_breach', 'phishing', 'ransomware', 'brute_force'
)),
incident_date TIMESTAMP NOT NULL,
severity VARCHAR(50) CHECK(severity IN ('critical', 'high', 'medium', 'low')),
description TEXT NOT NULL,
findings TEXT, -- investigation results
remediation_steps TEXT,
status VARCHAR(50) DEFAULT 'investigating' CHECK(status IN (
'investigating', 'contained', 'resolved', 'monitoring'
)),
resolved_at TIMESTAMP,
notes TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_incidents_client (client_id),
INDEX idx_incidents_type (incident_type),
INDEX idx_incidents_status (status)
);
Examples: BG Builders OAuth backdoor, CW Concrete BEC
credential_permissions
Access control for credentials (future team expansion).
CREATE TABLE credential_permissions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
credential_id UUID NOT NULL REFERENCES credentials(id) ON DELETE CASCADE,
user_id VARCHAR(255) NOT NULL, -- or role_id
permission_level VARCHAR(50) CHECK(permission_level IN ('read', 'write', 'admin')),
granted_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
granted_by VARCHAR(255),
UNIQUE(credential_id, user_id),
INDEX idx_cred_perm_credential (credential_id),
INDEX idx_cred_perm_user (user_id)
);
4. Work Details Tables (6 tables)
file_changes
Track files created/modified/deleted during sessions.
CREATE TABLE file_changes (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
work_item_id UUID NOT NULL REFERENCES work_items(id) ON DELETE CASCADE,
session_id UUID NOT NULL REFERENCES sessions(id) ON DELETE CASCADE,
file_path VARCHAR(1000) NOT NULL,
change_type VARCHAR(50) CHECK(change_type IN (
'created', 'modified', 'deleted', 'renamed', 'backed_up'
)),
backup_path VARCHAR(1000),
size_bytes BIGINT,
description TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_file_changes_work_item (work_item_id),
INDEX idx_file_changes_session (session_id)
);
commands_run
Shell/PowerShell/SQL commands executed (enhanced with failure tracking).
CREATE TABLE commands_run (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
work_item_id UUID NOT NULL REFERENCES work_items(id) ON DELETE CASCADE,
session_id UUID NOT NULL REFERENCES sessions(id) ON DELETE CASCADE,
command_text TEXT NOT NULL,
host VARCHAR(255), -- where executed: "jupiter", "172.16.3.20"
shell_type VARCHAR(50), -- "bash", "powershell", "sql", "docker"
success BOOLEAN,
output_summary TEXT, -- first/last lines or error
-- Failure tracking (new)
exit_code INTEGER, -- non-zero indicates failure
error_message TEXT, -- full error text
failure_category VARCHAR(100), -- "compatibility", "permission", "syntax", "environmental"
resolution TEXT, -- how it was fixed (if resolved)
resolved BOOLEAN DEFAULT false,
execution_order INTEGER, -- sequence within work item
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_commands_work_item (work_item_id),
INDEX idx_commands_session (session_id),
INDEX idx_commands_host (host),
INDEX idx_commands_success (success),
INDEX idx_commands_failure_category (failure_category)
);
infrastructure_changes
Audit trail for infrastructure modifications.
CREATE TABLE infrastructure_changes (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
work_item_id UUID NOT NULL REFERENCES work_items(id) ON DELETE CASCADE,
session_id UUID NOT NULL REFERENCES sessions(id) ON DELETE CASCADE,
infrastructure_id UUID REFERENCES infrastructure(id) ON DELETE SET NULL,
change_type VARCHAR(50) CHECK(change_type IN (
'dns', 'firewall', 'routing', 'ssl', 'container',
'service_config', 'hardware', 'network', 'storage'
)),
target_system VARCHAR(255) NOT NULL,
before_state TEXT,
after_state TEXT,
is_permanent BOOLEAN DEFAULT true,
rollback_procedure TEXT,
verification_performed BOOLEAN DEFAULT false,
verification_notes TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_infra_changes_work_item (work_item_id),
INDEX idx_infra_changes_session (session_id),
INDEX idx_infra_changes_infrastructure (infrastructure_id)
);
backup_log
Backup tracking with verification status.
-- Backup Tracking
CREATE TABLE backup_log (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
-- Backup details
backup_type VARCHAR(50) NOT NULL CHECK(backup_type IN (
'daily', 'weekly', 'monthly', 'manual', 'pre-migration'
)),
file_path VARCHAR(500) NOT NULL,
file_size_bytes BIGINT NOT NULL,
-- Timing
backup_started_at TIMESTAMP NOT NULL,
backup_completed_at TIMESTAMP NOT NULL,
duration_seconds INTEGER GENERATED ALWAYS AS (
TIMESTAMPDIFF(SECOND, backup_started_at, backup_completed_at)
) STORED,
-- Verification
verification_status VARCHAR(50) CHECK(verification_status IN (
'passed', 'failed', 'not_verified'
)),
verification_details TEXT, -- JSON: specific check results
-- Metadata
database_host VARCHAR(255),
database_name VARCHAR(100),
backup_method VARCHAR(50) DEFAULT 'mysqldump',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_backup_type (backup_type),
INDEX idx_backup_date (backup_completed_at),
INDEX idx_verification_status (verification_status)
);
problem_solutions
Issue tracking with root cause and resolution.
CREATE TABLE problem_solutions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
work_item_id UUID NOT NULL REFERENCES work_items(id) ON DELETE CASCADE,
session_id UUID NOT NULL REFERENCES sessions(id) ON DELETE CASCADE,
problem_description TEXT NOT NULL,
symptom TEXT, -- what user saw
error_message TEXT, -- exact error code/message
investigation_steps TEXT, -- JSON array of diagnostic commands
root_cause TEXT,
solution_applied TEXT NOT NULL,
verification_method TEXT,
rollback_plan TEXT,
recurrence_count INTEGER DEFAULT 1, -- if same problem reoccurs
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_problems_work_item (work_item_id),
INDEX idx_problems_session (session_id)
);
deployments
Track software/config deployments.
CREATE TABLE deployments (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
work_item_id UUID NOT NULL REFERENCES work_items(id) ON DELETE CASCADE,
session_id UUID NOT NULL REFERENCES sessions(id) ON DELETE CASCADE,
infrastructure_id UUID REFERENCES infrastructure(id) ON DELETE SET NULL,
service_id UUID REFERENCES services(id) ON DELETE SET NULL,
deployment_type VARCHAR(50) CHECK(deployment_type IN (
'code', 'config', 'database', 'container', 'service_restart'
)),
version VARCHAR(100),
description TEXT,
deployed_from VARCHAR(500), -- source path or repo
deployed_to VARCHAR(500), -- destination
rollback_available BOOLEAN DEFAULT false,
rollback_procedure TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_deployments_work_item (work_item_id),
INDEX idx_deployments_infrastructure (infrastructure_id),
INDEX idx_deployments_service (service_id)
);
database_changes
Track database schema/data modifications.
CREATE TABLE database_changes (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
work_item_id UUID NOT NULL REFERENCES work_items(id) ON DELETE CASCADE,
session_id UUID NOT NULL REFERENCES sessions(id) ON DELETE CASCADE,
database_name VARCHAR(255) NOT NULL,
infrastructure_id UUID REFERENCES infrastructure(id) ON DELETE SET NULL,
change_type VARCHAR(50) CHECK(change_type IN (
'schema', 'data', 'index', 'optimization', 'cleanup', 'migration'
)),
sql_executed TEXT,
rows_affected BIGINT,
size_freed_bytes BIGINT, -- for cleanup operations
backup_taken BOOLEAN DEFAULT false,
backup_location VARCHAR(500),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_db_changes_work_item (work_item_id),
INDEX idx_db_changes_database (database_name)
);
failure_patterns
Aggregated failure insights learned from command/operation failures.
CREATE TABLE failure_patterns (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
infrastructure_id UUID REFERENCES infrastructure(id) ON DELETE CASCADE,
client_id UUID REFERENCES clients(id) ON DELETE CASCADE,
-- Pattern identification
pattern_type VARCHAR(100) NOT NULL CHECK(pattern_type IN (
'command_compatibility', 'version_mismatch', 'permission_denied',
'service_unavailable', 'configuration_error', 'environmental_limitation'
)),
pattern_signature VARCHAR(500) NOT NULL, -- "PowerShell 7 cmdlets on Server 2008"
error_pattern TEXT, -- regex or keywords: "Get-LocalUser.*not recognized"
-- Context
affected_systems TEXT, -- JSON array: ["all_server_2008", "D2TESTNAS"]
triggering_commands TEXT, -- JSON array of command patterns
triggering_operations TEXT, -- JSON array of operation types
-- Resolution
failure_description TEXT NOT NULL,
root_cause TEXT NOT NULL, -- "Server 2008 only has PowerShell 2.0"
recommended_solution TEXT NOT NULL, -- "Use Get-WmiObject instead of Get-LocalUser"
alternative_approaches TEXT, -- JSON array of alternatives
-- Metadata
occurrence_count INTEGER DEFAULT 1, -- how many times seen
first_seen TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
last_seen TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
severity VARCHAR(20) CHECK(severity IN ('blocking', 'major', 'minor', 'info')),
is_active BOOLEAN DEFAULT true, -- false if pattern no longer applies
added_to_insights BOOLEAN DEFAULT false,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_failure_infrastructure (infrastructure_id),
INDEX idx_failure_client (client_id),
INDEX idx_failure_pattern_type (pattern_type),
INDEX idx_failure_signature (pattern_signature)
);
Examples:
- Pattern: "PowerShell 7 cmdlets on Server 2008" → Use PS 2.0 compatible commands
- Pattern: "WINS service GUI on D2TESTNAS" → WINS manually installed, no native service
- Pattern: "Modern batch syntax on DOS 6.22" → No IF /I, no long filenames
environmental_insights
Generated insights.md content per client/infrastructure.
CREATE TABLE environmental_insights (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
client_id UUID REFERENCES clients(id) ON DELETE CASCADE,
infrastructure_id UUID REFERENCES infrastructure(id) ON DELETE CASCADE,
-- Insight content
insight_category VARCHAR(100) NOT NULL CHECK(insight_category IN (
'command_constraints', 'service_configuration', 'version_limitations',
'custom_installations', 'network_constraints', 'permissions'
)),
insight_title VARCHAR(500) NOT NULL,
insight_description TEXT NOT NULL, -- markdown formatted
examples TEXT, -- JSON array of command examples
-- Metadata
source_pattern_id UUID REFERENCES failure_patterns(id) ON DELETE SET NULL,
confidence_level VARCHAR(20) CHECK(confidence_level IN ('confirmed', 'likely', 'suspected')),
verification_count INTEGER DEFAULT 1, -- how many times verified
priority INTEGER DEFAULT 5, -- 1-10, higher = more important
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
last_verified TIMESTAMP,
INDEX idx_insights_client (client_id),
INDEX idx_insights_infrastructure (infrastructure_id),
INDEX idx_insights_category (insight_category)
);
Generated insights.md example:
# Environmental Insights: Dataforth
## D2TESTNAS (192.168.0.9)
### Custom Installations
- **WINS Service**: Manually installed, not native ReadyNAS service
- No GUI service manager for WINS
- Configure via `/etc/frontview/samba/smb.conf.overrides`
- Check status: `ssh root@192.168.0.9 'nmbd -V'`
### Version Constraints
- **SMB Protocol**: CORE/SMB1 only (for DOS compatibility)
- Modern SMB2/3 clients may need configuration
- Use NetBIOS name, not IP address for DOS machines
## AD2 (192.168.0.6 - Server 2022)
### PowerShell Version
- **Version**: PowerShell 5.1 (default)
- **Compatible**: Modern cmdlets work
- **Not available**: PowerShell 7 specific features
## TS-XX Machines (DOS)
### Command Constraints
- **OS**: MS-DOS 6.22
- **No support for**:
- `IF /I` (case insensitive) - use duplicate IF statements
- Long filenames (8.3 format only)
- Unicode or special characters
- Modern batch features
operation_failures
Non-command failures (API calls, integrations, file operations).
CREATE TABLE operation_failures (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
session_id UUID REFERENCES sessions(id) ON DELETE CASCADE,
work_item_id UUID REFERENCES work_items(id) ON DELETE CASCADE,
-- Operation details
operation_type VARCHAR(100) NOT NULL CHECK(operation_type IN (
'api_call', 'file_operation', 'network_request',
'database_query', 'external_integration', 'service_restart'
)),
operation_description TEXT NOT NULL,
target_system VARCHAR(255), -- host, URL, service name
-- Failure details
error_message TEXT NOT NULL,
error_code VARCHAR(50), -- HTTP status, exit code, error number
failure_category VARCHAR(100), -- "timeout", "authentication", "not_found", etc.
stack_trace TEXT,
-- Resolution
resolution_applied TEXT,
resolved BOOLEAN DEFAULT false,
resolved_at TIMESTAMP,
-- Context
request_data TEXT, -- JSON: what was attempted
response_data TEXT, -- JSON: error response
environment_snapshot TEXT, -- JSON: relevant env vars, versions
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_op_failure_session (session_id),
INDEX idx_op_failure_type (operation_type),
INDEX idx_op_failure_category (failure_category),
INDEX idx_op_failure_resolved (resolved)
);
Examples:
- SyncroMSP API call timeout → Retry logic needed
- File upload to NAS fails → Permission issue detected
- Database query slow → Index missing, added
5. Tagging & Categorization Tables (3 tables)
tags
Flexible tagging system for work items.
CREATE TABLE tags (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name VARCHAR(100) UNIQUE NOT NULL,
category VARCHAR(50) CHECK(category IN (
'technology', 'client', 'infrastructure',
'problem_type', 'action', 'service'
)),
description TEXT,
usage_count INTEGER DEFAULT 0, -- auto-increment on use
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_tags_category (category),
INDEX idx_tags_name (name)
);
Pre-populated tags: 157+ tags identified from analysis
- 58 technology tags (docker, postgresql, apache, etc.)
- 24 infrastructure tags (jupiter, saturn, pfsense, etc.)
- 20+ client tags
- 30 problem type tags (connection-timeout, ssl-error, etc.)
- 25 action tags (migration, upgrade, cleanup, etc.)
work_item_tags (Junction Table)
Many-to-many relationship: work items ↔ tags.
CREATE TABLE work_item_tags (
work_item_id UUID NOT NULL REFERENCES work_items(id) ON DELETE CASCADE,
tag_id UUID NOT NULL REFERENCES tags(id) ON DELETE CASCADE,
PRIMARY KEY (work_item_id, tag_id),
INDEX idx_wit_work_item (work_item_id),
INDEX idx_wit_tag (tag_id)
);
session_tags (Junction Table)
Many-to-many relationship: sessions ↔ tags.
CREATE TABLE session_tags (
session_id UUID NOT NULL REFERENCES sessions(id) ON DELETE CASCADE,
tag_id UUID NOT NULL REFERENCES tags(id) ON DELETE CASCADE,
PRIMARY KEY (session_id, tag_id),
INDEX idx_st_session (session_id),
INDEX idx_st_tag (tag_id)
);
6. System & Audit Tables (2 tables)
api_audit_log
Track all API requests for security and debugging.
CREATE TABLE api_audit_log (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id VARCHAR(255) NOT NULL, -- JWT sub claim
endpoint VARCHAR(500) NOT NULL, -- "/api/v1/sessions"
http_method VARCHAR(10), -- GET, POST, PUT, DELETE
ip_address VARCHAR(45),
user_agent TEXT,
request_body TEXT, -- sanitized (no credentials)
response_status INTEGER, -- 200, 401, 500
response_time_ms INTEGER,
error_message TEXT,
timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_api_audit_user (user_id),
INDEX idx_api_audit_endpoint (endpoint),
INDEX idx_api_audit_timestamp (timestamp),
INDEX idx_api_audit_status (response_status)
);
schema_migrations
Track database schema versions (Alembic migrations).
CREATE TABLE schema_migrations (
version_id VARCHAR(100) PRIMARY KEY,
description TEXT,
applied_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
applied_by VARCHAR(255),
migration_sql TEXT
);
Junction Tables Summary
- work_item_tags - Work items ↔ Tags
- session_tags - Sessions ↔ Tags
- project_relationships (optional) - Projects ↔ Projects (related/dependent)
- project_session_logs (optional) - Projects ↔ Sessions (many-to-many if sessions span multiple projects)
Schema Statistics
Total Tables: 34
- Core MSP: 6 tables (added `machines` table)
- Client & Infrastructure: 7 tables
- Credentials & Security: 4 tables
- Work Details: 6 tables
- Failure Analysis & Environmental Insights: 3 tables
- Tagging: 3 tables (+ 2 junction)
- System: 2 tables
- External Integrations: 3 tables
Estimated Row Counts (1 year of MSP work):
- sessions: ~500-1000 (2-3 per day)
- work_items: ~5,000-10,000 (5-10 per session)
- file_changes: ~10,000
- commands_run: ~20,000
- tags: ~200
- clients: ~50
- projects: ~100
- credentials: ~500
- api_audit_log: ~100,000+
Storage Estimate: ~1-2 GB per year (compressed)
Design Principles Applied
- Normalized Structure - Minimizes data duplication
- Flexible Tagging - Supports evolving categorization
- Audit Trail - Comprehensive logging for security and troubleshooting
- Scalability - Designed for multi-user MSP team growth
- Security First - Encrypted credentials, access control, audit logging
- Temporal Tracking - created_at, updated_at, completed_at timestamps
- Soft Deletes - is_active flags allow recovery
- Relationships - Foreign keys enforce referential integrity
- Indexes - Strategic indexes for common query patterns
- JSON Flexibility - JSON fields for arrays/flexible data (affected_systems, technologies_used)
Next Steps for Database Implementation
- ✅ Schema designed (27 tables, relationships defined)
- ⏳ Create Alembic migration files
- ⏳ Set up encryption key management
- ⏳ Seed initial data (tags, MSP infrastructure)
- ⏳ Create database on Jupiter MariaDB
- ⏳ Build FastAPI models (SQLAlchemy + Pydantic)
- ⏳ Implement API endpoints
- ⏳ Create authentication flow
- ⏳ Build MSP Mode slash command integration
Open Questions
MSP Mode Behaviors (DEFINED)
Core Principle: Automatically categorize client interactions and store useful data in brief but information-dense format.
When /msp is Called (Session Start)
Phase 0: Machine Detection (FIRST - before everything)
Main Claude launches Machine Detection Agent:
Agent performs:
- Execute: `hostname` → "ACG-M-L5090"
- Execute: `whoami` → "MikeSwanson"
- Detect: platform → "win32"
- Detect: home_dir → "C:\Users\MikeSwanson"
- Generate fingerprint: SHA256(hostname|username|platform|home_dir)
Agent queries database:
SELECT * FROM machines WHERE machine_fingerprint = 'abc123...'
If machine NOT found (first time on this machine):
- Create machine record with auto-detected info
- Prompt user: "New machine detected: ACG-M-L5090. Please configure:"
- Friendly name? (e.g., "Main Laptop")
- Machine type? (laptop/desktop)
- Has VPN access? Which profiles?
- Docker installed? PowerShell version?
- Store capabilities in machines table
If machine found:
- Update last_seen timestamp
- Load machine capabilities
- Check for tool version changes (optional)
Agent returns to Main Claude:
Machine Context:
- machine_id: uuid-123
- friendly_name: "Main Laptop"
- capabilities: VPN (dataforth, grabb), Docker 24.0, PS 7.4
- limitations: None
Main Claude stores machine_id for session tracking.
Phase 1: Client/Project Detection
1. Auto-detect from context:
   - Mentions of client names, domains, IPs
   - If ambiguous, present quick-select list of recent clients
   - Prompt: "Working on: [Client] - [Project]? (or select different)"
2. Check VPN requirements:
   - If client requires VPN (e.g., Dataforth): check if current machine has VPN capability
   - If VPN available on this machine: confirm to user ("Dataforth requires VPN - ACG-M-L5090 has VPN access ✓")
   - If VPN not available: warn ("Travel-Laptop doesn't have VPN access - some operations may be limited")
Phase 2: Session Initialization
1. Start timer automatically
2. Create session record with:
   - session_date (today)
   - start_time (now)
   - client_id, project_id (detected or selected)
   - machine_id (from Machine Detection Agent)
   - status = 'in_progress'
3. Context display:
   - Show brief summary: "MSP Mode: [Client] - [Project] | Machine: Main Laptop | Started: [time]"
   - Machine capabilities displayed if relevant
   - Load relevant context: recent sessions, open tasks, credentials for this client
During Session (Automatic Tracking)
Auto-Categorization - Work Items (Agent-Based): As work progresses, Main Claude tracks actions, then periodically (or on-demand) launches categorization agent:
Agent Task: "Analyze recent work and categorize"
Agent receives:
- Conversation transcript since last categorization
- Commands executed
- Files modified
- User questions/issues mentioned
Agent performs:
1. Category detection - keywords trigger categories:
   - "ssh", "docker restart" → infrastructure
   - "error", "not working", "broken" → troubleshooting
   - "configure", "setup", "change settings" → configuration
   - "build", "code", "implement" → development
   - "cleanup", "optimize", "backup" → maintenance
   - "malware", "breach", "unauthorized" → security
2. Technology tagging:
   - Auto-detect from commands/context: docker, apache, mariadb, m365, etc.
   - Returns technologies_used array
3. Affected systems:
   - Extract IPs, hostnames from commands
   - Returns affected_systems array
4. Dense description generation:
   - Problem: [what was wrong]
   - Cause: [root cause if identified]
   - Fix: [solution applied]
   - Verify: [how confirmed]
Agent returns structured work_item data:
{
"category": "troubleshooting",
"title": "Fixed Apache SSL certificate expiration",
"description": "Problem: ERR_SSL_PROTOCOL_ERROR\nCause: Cert expired 2026-01-10\nFix: certbot renew, restarted apache\nVerify: curl test successful",
"technologies_used": ["apache", "ssl", "certbot"],
"affected_systems": ["jupiter", "172.16.3.20"],
"status": "completed"
}
Main Claude: Presents to user, stores to database via API
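A sketch of that storage step, assuming the draft `/api/v1/work-items` endpoint and the JSON structure above; field names remain subject to the final API.

```python
import requests

def store_work_item(api_url: str, token: str, session_id: str, item: dict) -> str:
    """POST an agent-generated work item to the MSP API and return its new id."""
    resp = requests.post(
        f"{api_url}/api/v1/work-items",
        headers={"Authorization": f"Bearer {token}"},
        json={"session_id": session_id, **item},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["id"]
```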
Information-Dense Data Capture:
1. Commands Run:
   - Auto-log every bash/powershell/SQL command executed
   - Store: command_text, host, shell_type, success, output_summary (first/last 5 lines)
   - Link to current work_item
2. File Changes:
   - Track when files are read/edited/written
   - Store: file_path, change_type, backup_path (if created), size_bytes
   - Brief description (auto-generated: "Modified Apache config for SSL")
3. Problems & Solutions:
   - When user describes an error, auto-create problem_solution record:
     - symptom: "Users can't access website"
     - error_message: "ERR_CONNECTION_TIMED_OUT"
     - investigation_steps: [array of diagnostic commands]
     - root_cause: "Firewall blocking port 443"
     - solution_applied: "Added iptables ACCEPT rule for 443"
     - verification_method: "curl test successful"
4. Credentials Accessed:
   - When retrieving credentials, log to credential_audit_log:
     - action: 'decrypt' or 'view'
     - credential_id
     - user_id (from JWT)
     - timestamp
   - Don't log the actual credential value (security)
5. Infrastructure Changes:
   - Detect infrastructure modifications:
     - DNS changes → change_type: 'dns'
     - Firewall rules → change_type: 'firewall'
     - Service configs → change_type: 'service_config'
   - Store before_state, after_state, rollback_procedure
Concise Summaries:
- Auto-generate brief descriptions:
- Work item title: "Fixed Apache SSL certificate expiration on jupiter"
- Problem description: "Website down: cert expired, renewed via certbot, verified"
- Not verbose: avoid "I then proceeded to...", just facts
Billability Detection
Auto-flag billable work:
- Client work (non-internal) → is_billable = true by default
- Internal infrastructure → is_billable = false
- User can override with quick command: "/billable false"
Time allocation:
- Track time per work_item (start when created, end when completed)
- Aggregate to session total
Session End Behavior (Agent-Based)
When /msp end or /normal is called:
Main Claude launches Session Summary Agent
Agent Task: "Generate comprehensive session summary with dense format"
Agent receives:
- Full session data from main Claude
- All work_items created during session
- Commands executed log
- Files modified log
- Problems solved
- Credentials accessed
- Infrastructure changes
Agent performs:
1. Analyzes work patterns:
   - Primary category (most frequent)
   - Time allocation per category
   - Key outcomes
2. Generates dense summary:

   Session: [Client] - [Project]
   Duration: [duration]
   Category: [primary category based on work_items]
   Work Completed:
   - [Concise bullet: category, title, affected systems]
   - [Another item]
   Problems Solved: [count]
   - [Error] → [Solution]
   Infrastructure Changes: [count]
   - [System]: [change type] - [brief description]
   Commands Run: [count] | Files Modified: [count]
   Technologies: [tag list]
   Billable: [yes/no] | Hours: [calculated]

3. Structures data for API:
   - Complete session object
   - All related: work_items, commands_run, file_changes, etc.
   - Auto-calculated fields: duration, billable_hours, category distribution
Agent returns: Structured summary + API-ready payload
Main Claude:
1. Presents summary to user:
   - Shows generated summary
   - "Save session? (y/n)"
   - "Billable hours: [auto-calculated] - adjust? (or press Enter)"
   - "Add notes? (or press Enter to skip)"
2. Stores to database:
   - POST to API: /api/v1/sessions
   - Agent's structured payload sent
   - API returns session_id
3. Generates session log file (optional):
   - Create markdown file in session-logs/
   - Format similar to current session logs but auto-generated
   - Include all dense information captured
Context Saved: Agent processed entire session history, main Claude only receives summary and confirmation prompts.
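For illustration, a sketch of the API-ready payload the Session Summary Agent might hand back for POST /api/v1/sessions. The keys mirror the tables described in this spec, but the exact JSON shape is an assumption until the API contract is finalized.

```python
session_payload = {
    "client_id": "dataforth-uuid",            # hypothetical IDs
    "session_title": "Dataforth - Apache SSL renewal",
    "duration_minutes": 210,
    "is_billable": True,
    "billable_hours": 3.5,
    "summary": "Fixed Apache SSL certificate expiration on jupiter",
    "work_items": [
        {
            "title": "Renewed SSL cert via certbot",
            "category": "troubleshooting",
            "tags": ["apache", "ssl", "jupiter"],
        }
    ],
    "commands_run": [],             # filled from the auto-captured command log
    "file_changes": [],
    "problems_solved": [],
    "infrastructure_changes": [],
}
# Main Claude POSTs session_payload to /api/v1/sessions after user confirmation
```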
Information Density Examples
Dense (Good):
Problem: Apache crash on jupiter
Error: segfault in mod_php
Cause: PHP 8.1 incompatibility
Fix: Downgraded to PHP 7.4, restarted apache
Verify: Website loads, no errors in logs
Files: /etc/apache2/mods-enabled/php*.conf
Commands: 3 (apt, systemctl, curl)
Verbose (Avoid):
I first investigated the Apache crash by checking the error logs.
Then I noticed that there was a segmentation fault in the mod_php module.
After some research, I determined this was due to a PHP version incompatibility.
I proceeded to downgrade PHP from version 8.1 to version 7.4.
Once that was complete, I restarted the Apache service.
Finally, I verified the fix by loading the website and checking the logs.
Dense storage = More information, fewer words.
Credential Handling (Agent-Based)
Storage:
- New credentials discovered → prompt: "Store credential for [service]? (y/n)"
- If yes → Credential Storage Agent:
- Receives: credential data, client context, service info
- Encrypts credential with AES-256-GCM
- Links to client_id, service_id, infrastructure_id
- Stores via API: POST /api/v1/credentials
- Returns: credential_id
- Main Claude confirms to user: "Stored [service] credential (ID: abc123)"
Retrieval:
- When credential needed, Main Claude launches Credential Retrieval Agent:
Agent Task: "Retrieve credential for AD2\sysadmin"
Agent performs:
- Query API: GET /api/v1/credentials?service=AD2&username=sysadmin
- Decrypt credential (API handles this)
- Log access to credential_audit_log:
- Who (JWT user_id)
- When (timestamp)
- What (credential_id, service_name)
- Why (current session_id, work_item context)
- Return only the credential value
Agent returns: "Paper123!@#"
Main Claude: Displays to user in context (e.g., "Using AD2\sysadmin password from vault")
Audit:
- Every credential access logged automatically by agent
- Main Claude doesn't see audit details (reduces context usage)
- Audit queryable later: "Show all credential access for last month"
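A minimal sketch of the retrieval-plus-audit flow the Credential Retrieval Agent performs. The /credentials and /credential-audit-log endpoint paths and query parameters are assumptions based on the description above; decryption happens server-side as stated.

```python
import requests

API_BASE = "https://msp-api.example.internal/api/v1"  # hypothetical

def retrieve_credential(token: str, service: str, username: str,
                        session_id: str) -> str:
    headers = {"Authorization": f"Bearer {token}"}
    # 1. Look up the credential (API decrypts server-side)
    resp = requests.get(f"{API_BASE}/credentials",
                        params={"service": service, "username": username},
                        headers=headers, timeout=10)
    resp.raise_for_status()
    cred = resp.json()[0]
    # 2. Log the access; the credential value itself is never written to the audit log
    requests.post(f"{API_BASE}/credential-audit-log", json={
        "action": "decrypt",
        "credential_id": cred["id"],
        "session_id": session_id,
    }, headers=headers, timeout=10).raise_for_status()
    # 3. Return only the decrypted value to the main instance
    return cred["value"]
```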
Auto-Tagging
As work progresses, auto-apply tags:
- Mention "docker" → tag: docker
- Working on "jupiter" → tag: jupiter
- Client "Dataforth" → tag: dataforth
- Error "connection-timeout" → tag: connection-timeout
- Action "migration" → tag: migration
Tag categories:
- technology (docker, apache, mariadb)
- infrastructure (jupiter, pfsense)
- client (dataforth)
- problem_type (ssl-error, connection-timeout)
- action (migration, upgrade, cleanup)
Tags stored in work_item_tags and session_tags junction tables.
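A simple keyword-to-tag mapping sketch of the auto-tagging behavior. The rule table below is an illustrative subset; in practice the tag list comes from the pre-identified tags in the database.

```python
import re

TAG_RULES = {  # keyword pattern -> (tag, category); illustrative subset
    r"\bdocker\b": ("docker", "technology"),
    r"\bapache\b": ("apache", "technology"),
    r"\bjupiter\b": ("jupiter", "infrastructure"),
    r"\bdataforth\b": ("dataforth", "client"),
    r"connection.?timeout": ("connection-timeout", "problem_type"),
    r"\bmigration\b": ("migration", "action"),
}

def auto_tags(text: str) -> list[tuple[str, str]]:
    """Return (tag, category) pairs detected in a work item description."""
    found = {tag_cat for pattern, tag_cat in TAG_RULES.items()
             if re.search(pattern, text, re.IGNORECASE)}
    return sorted(found)

# auto_tags("Dataforth: docker migration on jupiter hit a connection timeout")
# -> tags for client, technology, infrastructure, action, problem_type
```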
Context Awareness (Agent-Based)
When MSP session starts, Main Claude launches Context Recovery Agent:
Agent Task: "Retrieve relevant context for [Client]"
Agent queries in parallel:
- Previous sessions (last 5): GET /api/v1/sessions?client=Dataforth&limit=5
- Open pending tasks: GET /api/v1/pending-tasks?client=Dataforth&status=pending
- Recent credentials: GET /api/v1/credentials?client=Dataforth&recently_used=true
- Infrastructure: GET /api/v1/infrastructure?client=Dataforth
Agent processes and summarizes:
Context for Dataforth:
Last session: 2026-01-10 - DOS UPDATE.BAT v2.0 completion (3.5 hrs)
Open tasks (2):
- Datasheets share creation (blocked: waiting on Engineering)
- Engineer NAS access documentation (pending)
Infrastructure: AD2 (192.168.0.6), D2TESTNAS (192.168.0.9), 30x TS machines
Available credentials: AD2\sysadmin, NAS root (last used: 2026-01-10)
Agent returns concise summary
Main Claude:
- Displays context to user
- Auto-suggests: "Continue with datasheets share setup?"
- Has context for intelligent suggestions without full history in main context
During session, on-demand context retrieval:
User: "What did we do about backups for this client?"
Main Claude launches Historical Search Agent:
Agent Task: "Search Dataforth sessions for backup-related work"
Agent:
- Queries: GET /api/v1/sessions?client=Dataforth&search=backup
- Finds 3 sessions with backup work
- Extracts key outcomes
- Returns: "Found 3 backup-related sessions: 2025-12-14 (NAS setup), 2025-12-20 (Veeam config), 2026-01-05 (sync testing)"
Main Claude presents concise answer to user
Context Saved: Agent processed potentially megabytes of session data, returned 100-word summary.
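A sketch of how the Context Recovery Agent could issue its four lookups concurrently. It assumes the GET endpoints listed above and an async HTTP client such as httpx; the base URL is hypothetical.

```python
import asyncio
import httpx

API_BASE = "https://msp-api.example.internal/api/v1"  # hypothetical

async def recover_context(token: str, client_name: str) -> dict:
    headers = {"Authorization": f"Bearer {token}"}
    async with httpx.AsyncClient(base_url=API_BASE, headers=headers) as api:
        sessions, tasks, creds, infra = await asyncio.gather(
            api.get("/sessions", params={"client": client_name, "limit": 5}),
            api.get("/pending-tasks", params={"client": client_name, "status": "pending"}),
            api.get("/credentials", params={"client": client_name, "recently_used": True}),
            api.get("/infrastructure", params={"client": client_name}),
        )
    # The agent condenses these four responses into a < 300-word summary for main Claude
    return {
        "recent_sessions": sessions.json(),
        "open_tasks": tasks.json(),
        "credentials": creds.json(),
        "infrastructure": infra.json(),
    }
```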
Agent Types & Responsibilities
MSP Mode uses multiple specialized agents to preserve main context:
1. Context Recovery Agent
Launched: Session start (/msp command)
Purpose: Load relevant client context
Tasks:
- Query previous sessions (last 5)
- Retrieve open pending tasks
- Get recently used credentials
- Fetch infrastructure topology
Returns: Concise context summary (< 300 words)
API Calls: 4-5 parallel GET requests
Context Saved: ~95% (processes MB of data, returns summary)
2. Work Categorization Agent
Launched: Periodically during session or on-demand
Purpose: Analyze and categorize recent work
Tasks:
- Parse conversation transcript
- Extract commands, files, systems, technologies
- Detect category (infrastructure, troubleshooting, etc.)
- Generate dense description
- Auto-tag work items
Returns: Structured work_item object (JSON)
Context Saved: ~90% (processes conversation, returns structured data)
3. Session Summary Agent
Launched: Session end (/msp end or mode switch)
Purpose: Generate comprehensive session summary
Tasks:
- Analyze all work_items from session
- Calculate time allocation per category
- Generate dense markdown summary
- Structure data for API storage
- Create billable hours calculation
Returns: Summary + API-ready payload
Context Saved: ~92% (processes full session, returns summary)
4. Credential Retrieval Agent
Launched: When credential needed
Purpose: Securely retrieve and decrypt credentials
Tasks:
- Query credentials API
- Decrypt credential value
- Log access to audit trail
- Return only credential value
Returns: Single credential string
API Calls: 2 (retrieve + audit log)
Context Saved: ~98% (credential + minimal metadata)
5. Credential Storage Agent
Launched: When new credential discovered
Purpose: Encrypt and store credential securely
Tasks:
- Validate credential data
- Encrypt with AES-256-GCM
- Link to client/service/infrastructure
- Store via API
- Create audit log entry
Returns: credential_id confirmation
Context Saved: ~99% (only ID returned)
6. Historical Search Agent
Launched: On-demand (user asks about past work)
Purpose: Search and summarize historical sessions
Tasks:
- Query sessions database with filters
- Parse matching sessions
- Extract key outcomes
- Generate concise summary
Returns: Brief summary of findings
Example: "Found 3 backup sessions: [dates] - [outcomes]"
Context Saved: ~95% (processes potentially 100s of sessions)
7. Integration Workflow Agent
Launched: Multi-step integration requests
Purpose: Execute complex workflows with external tools
Tasks:
- Search external ticketing systems
- Generate work summaries
- Update tickets with comments
- Pull reports from backup systems
- Attach files to tickets
- Track all integrations in database
Returns: Workflow completion summary
API Calls: 5-10+ external + internal calls
Context Saved: ~90% (handles large files, API responses)
Example: SyncroMSP ticket update + MSP Backups report workflow
8. Problem Pattern Matching Agent
Launched: When user describes an error/issue
Purpose: Find similar historical problems
Tasks:
- Parse error description
- Search problem_solutions table
- Extract relevant solutions
- Rank by similarity
Returns: Top 3 similar problems with solutions
Context Saved: ~94% (searches all problems, returns matches)
9. Database Query Agent
Launched: Complex reporting or analytics requests
Purpose: Execute complex database queries
Tasks:
- Build SQL queries with filters/joins
- Execute query via API
- Process result set
- Generate summary statistics
- Format for presentation
Returns: Summary statistics + key findings
Example: "Dataforth - Q4 2025: 45 sessions, 120 hours, $12,000 billed"
Context Saved: ~93% (processes large result sets)
10. Integration Search Agent
Launched: Searching external systems
Purpose: Query SyncroMSP, MSP Backups, etc.
Tasks:
- Authenticate with external API
- Execute search query
- Parse results
- Summarize findings
Returns: Concise list of matches
API Calls: 1-3 external API calls
Context Saved: ~90% (handles API pagination, large response)
11. Failure Analysis Agent
Launched: When commands/operations fail, or periodically to analyze patterns
Purpose: Learn from failures to prevent future mistakes
Tasks:
- Log all command/operation failures with full context
- Analyze failure patterns across sessions
- Identify environmental constraints (e.g., "Server 2008 can't run PS7 cmdlets")
- Update infrastructure environmental_notes
- Generate/update insights.md from failure database
- Create actionable resolutions
Returns: Updated insights, environmental constraints
Context Saved: ~94% (analyzes failures, returns key learnings)
12. Environment Context Agent
Launched: Before making suggestions or running commands on infrastructure
Purpose: Check environmental constraints and insights to avoid known failures
Tasks:
- Query infrastructure environmental_notes
- Read insights.md for client/infrastructure
- Check failure history for similar operations
- Validate command compatibility with environment
- Return constraints and recommendations
Returns: Environmental context + compatibility warnings
Example: "D2TESTNAS: Manual WINS install (no native service), ReadyNAS OS, SMB1 only"
Context Saved: ~96% (processes failure history, returns summary)
13. Machine Detection Agent
Launched: Session start, before any other agents
Purpose: Identify current machine and load machine-specific context
Tasks:
- Execute hostname, whoami, detect platform
- Generate machine fingerprint (SHA256 hash)
- Query machines table for existing record
- If new machine: Create record, prompt user for capabilities
- If known machine: Load capabilities, VPN access, tool versions
- Update last_seen timestamp
- Check for tool updates/changes since last session
Returns: Machine context (machine_id, capabilities, limitations)
Example: "ACG-M-L5090: VPN access (dataforth, grabb), Docker 24.0, PowerShell 7.4"
Context Saved: ~97% (machine profile loaded, only key capabilities returned)
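A sketch of the machine fingerprint described above, combining hostname, username, platform, and home directory into a SHA256 hash. The exact field order and separator are assumptions consistent with the fingerprint formula noted in the change log.

```python
import getpass
import hashlib
import platform
import socket
from pathlib import Path

def machine_fingerprint() -> str:
    """Stable identifier for the current technician machine."""
    parts = [
        socket.gethostname(),        # hostname
        getpass.getuser(),           # whoami
        platform.system().lower(),   # platform family (windows / darwin / linux)
        str(Path.home()),            # home directory
    ]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()

# The Machine Detection Agent looks this hash up in the machines table;
# a miss means a new machine record should be created.
```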
Agent Execution Patterns
Sequential Agent Chain
Pattern: Agent A completes → Agent B starts with A's output
Example: Session End
- Work Categorization Agent → categorizes final work
- Session Summary Agent → uses categorized work to generate summary
- Database Storage → API call with structured data
Parallel Agent Execution
Pattern: Multiple agents run simultaneously
Example: Session Start
- Context Recovery Agent (previous sessions)
- Credential Cache Agent (load frequently used)
- Infrastructure Topology Agent (load network map)
- All return to main Claude in parallel (fastest wins)
On-Demand Agent
Pattern: Launched only when needed
Example: User asks: "What's the password for AD2?"
- Main Claude launches Credential Retrieval Agent
- Agent returns credential
- Main Claude displays to user
Background Agent
Pattern: Agent runs while user continues working
Example: Large report generation
- User continues conversation
- Report Generation Agent processes in background
- Notifies when complete
Failure-Aware Agent Chain
Pattern: Environment check → Operation → Failure logging → Pattern analysis
Example: Command execution on infrastructure
- Environment Context Agent checks constraints before suggesting command
- Command executes (success or failure)
- If failure: Failure Analysis Agent logs detailed failure
- Pattern analysis identifies if this is a recurring issue
- Environmental insights updated
- Future suggestions avoid this failure
Failure Logging & Environmental Awareness System
Core Principle: Every failure is a learning opportunity. Agents must never make the same mistake twice.
Failure Logging Workflow
1. Command Execution with Failure Tracking
When Main Claude or agent executes a command:
User: "Check WINS status on D2TESTNAS"
Main Claude launches Environment Context Agent:
- Queries infrastructure table for D2TESTNAS
- Reads environmental_notes: "Manual WINS install, no native service"
- Reads environmental_insights for D2TESTNAS
- Returns: "D2TESTNAS has manually installed WINS (not native ReadyNAS service)"
Main Claude suggests command based on environmental context:
- NOT: "Check Services GUI for WINS service" (WRONG - no GUI service)
- CORRECT: "ssh root@192.168.0.9 'systemctl status nmbd'" (right for manual install)
If command fails:
- Log to commands_run table:
- success = false
- exit_code = 1
- error_message = "systemctl: command not found"
- failure_category = "command_compatibility"
- Trigger Failure Analysis Agent:
- Analyzes error: ReadyNAS doesn't use systemd
- Identifies correct approach: "service nmbd status" or "ps aux | grep nmbd"
- Creates failure_pattern entry
- Updates environmental_insights with correction
- Returns resolution to Main Claude
Main Claude tries corrected command:
- Executes: "ssh root@192.168.0.9 'ps aux | grep nmbd'"
- Success = true
- Updates original failure record with resolution
2. Environmental Insights Generation
Failure Analysis Agent runs periodically (or after N failures):
Agent Task: "Analyze recent failures and update environmental insights"
Agent performs:

1. Query failures:
   - All unresolved command failures
   - All operation failures
   - Group by infrastructure_id, client_id, pattern_type

2. Identify patterns:
   - "Get-LocalUser on Server 2008" → 5 occurrences
   - Pattern: Server 2008 has PowerShell 2.0 only
   - Solution: Use Get-WmiObject Win32_UserAccount instead

3. Create/update failure_patterns:

   INSERT INTO failure_patterns (
     infrastructure_id, pattern_type, pattern_signature, error_pattern,
     failure_description, root_cause, recommended_solution,
     occurrence_count, severity
   ) VALUES (
     :infrastructure_id,
     'command_compatibility',
     'PowerShell 7 cmdlets on Server 2008',
     'Get-LocalUser.*not recognized',
     'Modern PowerShell cmdlets fail on Server 2008',
     'Server 2008 only has PowerShell 2.0',
     'Use Get-WmiObject Win32_UserAccount instead',
     5,
     'major'
   );

4. Generate environmental_insights:

   INSERT INTO environmental_insights (
     infrastructure_id, insight_category, insight_title, insight_description,
     examples, confidence_level, verification_count, priority
   ) VALUES (
     :infrastructure_id,
     'version_limitations',
     'Server 2008: PowerShell 2.0 command compatibility',
     '**PowerShell Version**: 2.0 only\n**Avoid**: Get-LocalUser, Get-LocalGroup, etc.\n**Use instead**: Get-WmiObject Win32_UserAccount',
     '["Get-WmiObject Win32_UserAccount", "Get-WmiObject Win32_Group"]',
     'confirmed',
     5,
     8
   );

5. Update infrastructure environmental_notes:

   UPDATE infrastructure
   SET environmental_notes = 'Server 2008 R2. PowerShell 2.0 only (no modern cmdlets). Use WMI for user/group management.'
   WHERE hostname = 'old-server';

6. Generate insights.md file:
   - Query all environmental_insights for client
   - Format as markdown
   - Store in D:\ClaudeTools\insights[client-name].md
   - Agents read this file before making suggestions
Agent returns: "Updated 3 failure patterns, added 2 insights for Dataforth"
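A sketch of how the Failure Analysis Agent might render environmental_insights rows into a per-client insights.md file. The row shape follows the insert above; the exact output filename and directory layout are assumptions.

```python
from pathlib import Path

def write_insights_md(client_name: str, insights: list[dict],
                      out_dir: str = r"D:\ClaudeTools") -> Path:
    """Render environmental_insights rows (highest priority first) to markdown."""
    lines = [f"# Environmental Insights: {client_name}", ""]
    for row in sorted(insights, key=lambda r: r["priority"], reverse=True):
        lines += [
            f"## {row['insight_title']}",
            f"*Category: {row['insight_category']} | "
            f"Confidence: {row['confidence_level']} | Priority: {row['priority']}*",
            "",
            row["insight_description"],
            "",
        ]
    path = Path(out_dir) / f"insights-{client_name.lower()}.md"   # hypothetical filename
    path.write_text("\n".join(lines), encoding="utf-8")
    return path
```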
3. Environment Context Agent Pre-Check
Before suggesting commands/operations:
Agent Task: "Check environmental constraints for D2TESTNAS before command suggestion"
Agent performs:

1. Query infrastructure:
   - Get environmental_notes
   - Get powershell_version, shell_type, limitations

2. Query environmental_insights:
   - Get all insights for this infrastructure
   - Sort by priority (high first)

3. Query failure_patterns:
   - Get patterns affecting this infrastructure
   - Check if proposed command matches any error_pattern

4. Check command compatibility:
   - Proposed: "Get-Service WINS"
   - Infrastructure: has_gui = true, powershell_version = "5.1"
   - Insights: "WINS manually installed, no native service"
   - Result: INCOMPATIBLE - suggest alternative
Agent returns:
Environmental Context for D2TESTNAS:
- ReadyNAS OS (Linux-based)
- Manual WINS installation (Samba nmbd)
- No native Windows services
- Access via SSH only
- SMB1/CORE protocol for DOS compatibility
Recommended commands:
✓ ssh root@192.168.0.9 'ps aux | grep nmbd'
✓ ssh root@192.168.0.9 'cat /etc/frontview/samba/smb.conf.overrides | grep wins'
✗ Check Services GUI (no GUI service manager)
✗ Get-Service (not Windows)
Main Claude uses this context to suggest correct approach.
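A minimal sketch of the compatibility check: the proposed command is matched against the command-name portion of stored failure_patterns regexes before it is suggested. Table access is stubbed with plain dicts, and the field names follow the schema described above.

```python
import re

def check_command_compatibility(command: str, patterns: list[dict]) -> list[str]:
    """Return warnings for any known failure pattern the command would hit."""
    warnings = []
    for p in patterns:  # rows from failure_patterns for this infrastructure
        if re.search(p["error_pattern"], command, re.IGNORECASE):
            warnings.append(
                f"Known issue: {p['pattern_signature']} - {p['recommended_solution']}"
            )
    return warnings

# Example using the Server 2008 pattern from this spec:
patterns = [{
    "pattern_signature": "Modern PowerShell cmdlets on Server 2008",
    "error_pattern": r"(Get-LocalUser|Get-LocalGroup|New-LocalUser)",
    "recommended_solution": "Use Get-WmiObject Win32_UserAccount instead",
}]
print(check_command_compatibility("Get-LocalUser -Name sysadmin", patterns))
```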
4. Real-World Examples from Your Feedback
Example 1: D2TESTNAS WINS Service
Problem: Claude suggested "Check Services GUI for WINS"
Failure: User had to correct - WINS is manually installed, no GUI service
Solution after failure logging:
1. Failure logged:
- operation_type: 'user_instruction_invalid'
- error_message: 'WINS is manually installed on D2TESTNAS, no native service GUI'
- target_system: 'D2TESTNAS'
2. Environmental insight created:
- infrastructure_id: D2TESTNAS
- insight_category: 'custom_installations'
- insight_title: 'WINS: Manual Samba installation'
- insight_description: 'WINS service manually installed via Samba nmbd. Not a native ReadyNAS service. No GUI service manager available.'
- examples: ["ssh root@192.168.0.9 'ps aux | grep nmbd'"]
- priority: 9 (high - avoid wasting user time)
3. Future behavior:
- Environment Context Agent checks before suggesting WINS commands
- Returns: "D2TESTNAS has manual WINS install (no GUI)"
- Main Claude suggests SSH commands instead
Example 2: PowerShell 7 on Server 2008
Problem: Suggested Get-LocalUser on Server 2008
Failure: Command not recognized (PowerShell 2.0 only)
Solution after failure logging:
1. Command failure logged:
- command_text: 'Get-LocalUser'
- host: 'old-server-2008'
- success: false
- error_message: 'Get-LocalUser : The term Get-LocalUser is not recognized'
- failure_category: 'compatibility'
2. Failure pattern created:
- pattern_signature: 'Modern PowerShell cmdlets on Server 2008'
- error_pattern: '(Get-LocalUser|Get-LocalGroup|New-LocalUser).*not recognized'
- root_cause: 'Server 2008 has PowerShell 2.0 (no modern user management cmdlets)'
- recommended_solution: 'Use Get-WmiObject Win32_UserAccount'
3. Infrastructure updated:
- powershell_version: '2.0'
- limitations: ["no_modern_cmdlets", "no_get_local*_commands"]
- environmental_notes: 'PowerShell 2.0 only. Use WMI for user/group management.'
4. Future behavior:
- Environment Context Agent warns: "Server 2008 has PS 2.0 - modern cmdlets unavailable"
- Main Claude suggests WMI alternatives automatically
Example 3: DOS Batch File Syntax
Problem: Used IF /I (case insensitive) in DOS batch file
Failure: IF /I not recognized in MS-DOS 6.22
Solution:
1. Command failure logged:
- command_text: 'IF /I "%1"=="STATUS" GOTO STATUS'
- host: 'TS-27'
- error_message: 'Invalid switch - /I'
- failure_category: 'environmental_limitation'
2. Failure pattern created:
- pattern_signature: 'Modern batch syntax on MS-DOS 6.22'
- error_pattern: 'IF /I.*Invalid switch'
- root_cause: 'DOS 6.22 does not support /I flag (added in Windows 2000)'
- recommended_solution: 'Use duplicate IF statements for upper/lowercase'
- alternative_approaches: '["IF "%1"=="STATUS" GOTO STATUS", "IF "%1"=="status" GOTO STATUS"]'
3. Infrastructure environmental_notes:
- 'MS-DOS 6.22. No IF /I, no long filenames (8.3), no Unicode. Use basic batch only.'
4. Future behavior:
- Environment Context Agent checks OS version before batch suggestions
- Main Claude generates DOS 6.22 compatible batch files automatically
Benefits of Failure Logging System
1. Self-Improving System:
- Each failure makes the system smarter
- Patterns identified automatically
- Insights generated without manual documentation
2. Reduced User Friction:
- User doesn't have to keep correcting same mistakes
- Claude learns environmental constraints once
- Suggestions are environmentally aware from start
3. Institutional Knowledge Capture:
- All environmental quirks documented in database
- Survives across sessions and Claude instances
- Queryable: "What are known issues with D2TESTNAS?"
4. Proactive Problem Prevention:
- Environment Context Agent prevents failures before they happen
- Suggests compatible alternatives automatically
- Warns about known limitations
5. Audit Trail:
- Every failure tracked with full context
- Resolution history for troubleshooting
- Pattern analysis for infrastructure planning
Machine-Aware Operations
Core Principle: Command suggestions and operations are tailored to the capabilities of the current machine.
Real-World Scenarios
Scenario 1: VPN-Required Client Work
User on Main Laptop (has VPN):
User: "/msp" → selects Dataforth
Machine Detection Agent returns:
- Machine: ACG-M-L5090 (Main Laptop)
- VPN access: ✓ (dataforth, grabb, internal)
- Capabilities: Full toolset
Main Claude:
"MSP Mode: Dataforth | Machine: Main Laptop (VPN: ✓) | Started: 3:45 PM"
User: "SSH to D2TESTNAS and check WINS status"
Environment Context Agent:
- Current machine: Main Laptop (has VPN: dataforth)
- Target: D2TESTNAS (requires VPN to 192.168.0.9)
- ✓ VPN check passed
Main Claude suggests:
"ssh root@192.168.0.9 'ps aux | grep nmbd'"
User on Travel Laptop (NO VPN):
User: "/msp" → selects Dataforth
Machine Detection Agent returns:
- Machine: Travel-Laptop
- VPN access: ✗ (none configured)
- Capabilities: Limited toolset
Main Claude:
"⚠️ MSP Mode: Dataforth | Machine: Travel Laptop (VPN: ✗) | Started: 3:45 PM"
"Warning: Dataforth infrastructure requires VPN access. Travel-Laptop has no VPN configured."
"Some operations may be unavailable. Switch to Main Laptop for full access."
User: "SSH to D2TESTNAS"
Environment Context Agent:
- Current machine: Travel-Laptop (no VPN)
- Target: D2TESTNAS (requires VPN to 192.168.0.9)
- ✗ VPN check FAILED
Main Claude responds:
"Cannot access D2TESTNAS from Travel-Laptop (no VPN connection to 192.168.0.0/24)."
"Options:"
" 1. Switch to Main Laptop or Desktop (have VPN access)"
" 2. Configure VPN on this machine (I can help)"
" 3. Work on different client/task"
Scenario 2: Docker-Based Development
User on Main Laptop (has Docker):
User: "Build the GuruRMM container"
Machine Detection Agent: Main Laptop has Docker 24.0 ✓
Main Claude:
"Building GuruRMM container on Main Laptop..."
[executes docker build commands]
User on Travel Laptop (no Docker):
User: "Build the GuruRMM container"
Machine Detection Agent: Travel-Laptop has no Docker ✗
Main Claude:
"Travel-Laptop doesn't have Docker installed."
"Options:"
" 1. Install Docker (I can guide you)"
" 2. Switch to Main Laptop or Desktop (have Docker)"
" 3. Use remote build server (172.16.3.30)"
Scenario 3: PowerShell Version Differences
User on Main Laptop (PowerShell 7.4):
User: "Get local users on AD2"
Machine Detection Agent: Main Laptop has PowerShell 7.4 ✓
Main Claude suggests modern cmdlet:
"Get-LocalUser -ComputerName AD2"
User on old Desktop (PowerShell 5.1):
User: "Get local users on AD2"
Machine Detection Agent: Desktop has PowerShell 5.1
Environment Context Agent:
- Desktop: PowerShell 5.1 (no Get-LocalUser remote capability)
- Suggests WinRM or alternative
Main Claude:
"Invoke-Command -ComputerName AD2 -ScriptBlock { Get-LocalUser }"
(or suggests WMI alternative if WinRM not available)
Scenario 4: Session History Per Machine
Querying past work:
User: "What did I work on last from this laptop?"
Machine Detection Agent: Current machine = Main Laptop (machine_id: uuid-123)
Historical Search Agent:
Query: SELECT * FROM sessions WHERE machine_id = 'uuid-123' ORDER BY session_date DESC LIMIT 5
Returns:
"Recent sessions from Main Laptop:"
"1. Dataforth - DOS UPDATE.BAT v2.0 (Jan 15, 3.5 hrs)"
"2. Grabb & Durando - DNS migration (Jan 14, 2.0 hrs)"
"3. Internal - GuruRMM container build (Jan 13, 1.5 hrs)"
User: "What about from my desktop?"
Historical Search Agent:
Query: SELECT * FROM sessions WHERE machine_id = (SELECT id FROM machines WHERE friendly_name = 'Desktop')
Returns:
"Recent sessions from Desktop:"
"1. Valley Wide Plastering - M365 migration planning (Jan 12, 2.5 hrs)"
"2. Internal - Infrastructure upgrades (Jan 10, 4.0 hrs)"
Machine-Specific Insights
Machine capabilities inform command suggestions:
-- Before suggesting Docker command
SELECT has_docker FROM machines WHERE id = current_machine_id
-- Before suggesting SSH to client infrastructure
SELECT vpn_profiles FROM machines WHERE id = current_machine_id
-- Check if client's network is in vpn_profiles array
-- Before suggesting PowerShell cmdlets
SELECT powershell_version FROM machines WHERE id = current_machine_id
-- Use PS 2.0 compatible commands if version = "2.0"
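A sketch of the capability gate these queries imply: before suggesting an SSH command to client infrastructure, confirm the current machine has a VPN profile for that client. Function and field names are assumptions; vpn_profiles is treated as a simple list.

```python
def can_reach_client_network(machine: dict, client_slug: str) -> tuple[bool, str]:
    """machine is a row from the machines table (vpn_profiles stored as a list)."""
    if client_slug in machine.get("vpn_profiles", []):
        return True, f"{machine['friendly_name']}: VPN to {client_slug} available"
    return False, (
        f"{machine['friendly_name']} has no VPN profile for {client_slug}; "
        "switch machines or configure VPN first"
    )

main_laptop = {"friendly_name": "Main Laptop", "vpn_profiles": ["dataforth", "grabb"]}
travel_laptop = {"friendly_name": "Travel-Laptop", "vpn_profiles": []}

print(can_reach_client_network(main_laptop, "dataforth"))    # (True, ...)
print(can_reach_client_network(travel_laptop, "dataforth"))  # (False, ...)
```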
Benefits of Machine Tracking
1. Capability-Aware Suggestions:
- Never suggest Docker commands on machines without Docker
- Never suggest VPN-required access from non-VPN machines
- Use version-compatible syntax for PowerShell/tools
2. Session Portability:
- Know which sessions were done where
- Understand tool availability context for past work
- Resume work on appropriate machine
3. Troubleshooting Context:
- "This worked on Main Laptop but not Desktop" → Check tool versions
- Machine-specific environmental issues tracked
- Cross-machine compatibility insights
4. User Experience:
- Proactive warnings about capability limitations
- Helpful suggestions to switch machines when needed
- No wasted time trying commands that won't work
5. Multi-User MSP Team (Future):
- Track which technician on which machine
- Machine capabilities per team member
- Resource allocation (who has VPN access to which clients)
OS-Specific Command Selection
Core Principle: Never suggest Windows commands on Mac, Mac commands on Windows, or Linux-only commands on either.
Command Selection Logic
Machine Detection Agent provides platform context:
{
"platform": "win32", // or "darwin", "linux"
"preferred_shell": "powershell", // or "zsh", "bash", "cmd"
"package_manager_commands": {...}
}
Main Claude selects appropriate commands based on platform:
File Operations
| Task | Windows (win32) | macOS (darwin) | Linux |
|---|---|---|---|
| List files | `dir` or `Get-ChildItem` | `ls -la` | `ls -la` |
| Find file | `Get-ChildItem -Recurse -Filter` | `find . -name` | `find . -name` |
| Copy file | `Copy-Item` | `cp` | `cp` |
| Move file | `Move-Item` | `mv` | `mv` |
| Delete file | `Remove-Item` | `rm` | `rm` |
Process Management
| Task | Windows | macOS | Linux |
|---|---|---|---|
| List processes | `Get-Process` | `ps aux` | `ps aux` |
| Kill process | `Stop-Process -Name` | `killall` | `pkill` |
| Process tree | `Get-Process \| Select-Object` | `pstree` | `pstree` |
Network Operations
| Task | Windows | macOS | Linux |
|---|---|---|---|
| IP config | `ipconfig` | `ifconfig` | `ip addr` |
| DNS lookup | `nslookup` | `dig` | `dig` |
| Ping | `ping -n 4` | `ping -c 4` | `ping -c 4` |
| Port check | `Test-NetConnection -Port` | `nc -zv` | `nc -zv` |
Package Management
| Task | Windows (Chocolatey) | macOS (Homebrew) | Linux (apt/yum) |
|---|---|---|---|
| Install | `choco install {pkg}` | `brew install {pkg}` | `apt install {pkg}` |
| Update | `choco upgrade {pkg}` | `brew upgrade {pkg}` | `apt upgrade {pkg}` |
| Search | `choco search {pkg}` | `brew search {pkg}` | `apt search {pkg}` |
| List | `choco list --local` | `brew list` | `apt list --installed` |
MCP & Skill Availability Check
Before calling MCP or Skill:
// Machine Detection Agent returns available_mcps and available_skills
current_machine = {
"available_mcps": ["claude-in-chrome", "filesystem"],
"available_skills": ["pdf", "commit", "review-pr"]
}
// User requests: "Take a screenshot of this webpage"
// Check if claude-in-chrome MCP is available:
if (current_machine.available_mcps.includes("claude-in-chrome")) {
// Use mcp__claude-in-chrome__computer screenshot action
} else {
// Inform user: "claude-in-chrome MCP not available on this machine"
// Suggest: "Switch to Main Laptop (has claude-in-chrome MCP)"
}
// User requests: "/pdf" to export document
// Check if pdf skill is available:
if (current_machine.available_skills.includes("pdf")) {
// Execute pdf skill
} else {
// Inform user: "pdf skill not available on Travel-Laptop"
// Suggest: "Install pdf skill or switch to Main Laptop"
}
Real-World Example: Cross-Platform File Search
User on Windows (ACG-M-L5090):
User: "Find all Python files in the project"
Machine Detection Agent: platform = "win32", preferred_shell = "powershell"
Main Claude uses Windows-appropriate command:
"Get-ChildItem -Path . -Recurse -Filter *.py | Select-Object FullName"
OR (if using bash-style preference):
"dir /s /b *.py"
Same user on MacBook:
User: "Find all Python files in the project"
Machine Detection Agent: platform = "darwin", preferred_shell = "zsh"
Main Claude uses macOS-appropriate command:
"find . -name '*.py' -type f"
Shell-Specific Syntax
PowerShell (Windows):
# Variables
$var = "value"
# Conditionals
if ($condition) { }
# Loops
foreach ($item in $collection) { }
# Output
Write-Host "message"
Bash/Zsh (macOS/Linux):
# Variables
var="value"
# Conditionals
if [ "$condition" ]; then fi
# Loops
for item in $collection; do done
# Output
echo "message"
Batch/CMD (Windows legacy):
REM Variables
SET var=value
REM Conditionals
IF "%var%"=="value" ( )
REM Loops
FOR %%i IN (*) DO ( )
REM Output
ECHO message
Environment-Specific Path Separators
Machine Detection Agent provides path conventions:
| Platform | Path Separator | Home Directory | Example Path |
|---|---|---|---|
| Windows | `\` (backslash) | `C:\Users\{user}` | `C:\Users\MikeSwanson\Documents` |
| macOS | `/` (forward slash) | `/Users/{user}` | `/Users/mike/Documents` |
| Linux | `/` (forward slash) | `/home/{user}` | `/home/mike/documents` |
Main Claude constructs paths appropriately:
if (platform === "win32") {
path = `${home_directory}\\claude-projects\\${project}`
} else {
path = `${home_directory}/claude-projects/${project}`
}
Benefits
1. No Cross-Platform Errors:
- Windows commands never suggested on Mac
- Mac commands never suggested on Windows
- Shell syntax matches current environment
2. MCP/Skill Availability:
- Never attempt to call unavailable MCPs
- Suggest alternative machines if MCP needed
- Track which skills are installed where
3. Package Manager Intelligence:
- Use `choco` on Windows, `brew` on Mac, `apt` on Linux
- Correct syntax for each package manager
- Installation suggestions platform-appropriate
4. User Experience:
- Commands always work on current platform
- No manual translation needed
- Helpful suggestions when capabilities missing
Summary: MSP Mode = Smart Agent-Based Auto-Tracking
Architecture:
- Main Claude Instance: Conversation, decision-making, user interaction
- Specialized Agents: Data processing, queries, integrations, analysis
Benefits:
- Context Preservation: Main instance stays focused, agents handle heavy lifting
- Scalability: Parallel agents for concurrent operations
- Information Density: Agents process raw data, return summaries
- Separation of Concerns: Clean boundaries between conversation and data operations
User Experience:
- Auto-categorize work as it happens (via agents)
- Auto-extract structured data (via agents)
- Auto-tag based on content (via agents)
- Auto-detect billability (via agents)
- Auto-generate dense summaries (via agents)
- Auto-link related data (via agents)
- Minimal user input required - agents do the categorization
- Maximum information density - agents ensure brief but complete
Result: User just works, main Claude maintains conversation, agents capture everything in structured, queryable format.
MSP Tool Integrations (Future Capability)
Core Requirement: MSP Mode will integrate with external MSP platforms to automate workflows and link session data to ticketing/documentation systems.
Planned Integrations
1. SyncroMSP (PSA/RMM Platform)
- Tickets: Create, update, search, attach files
- Time Tracking: Log billable time from sessions automatically
- Assets: Link session work to customer assets
- Contacts: Associate work with customer contacts
- API: REST API with OAuth
2. MSP Backups (Backup Management)
- Reports: Pull backup status reports
- Alerts: Check backup failures
- Statistics: Get backup size, success rates
- Export: Attach reports to tickets
- API: REST API (vendor-specific)
3. Zapier (Automation Platform)
- Triggers: Session completed → Create ticket
- Actions: Ticket created → Start session
- Webhooks: Bi-directional automation
- API: REST API + webhooks
4. Future Integrations
- Autotask (PSA)
- ConnectWise (PSA)
- Datto RMM
- IT Glue (Documentation)
- Microsoft Teams (notifications)
- Email (session summaries)
Example Workflow (Agent-Based)
User request:
"Find the ticket for Dataforth about Backups - update it with the changes we've made today, and pull a report from MSP backups to attach to that ticket"
Main Claude MSP Mode execution:
Main Claude understands multi-step integration request, launches Integration Workflow Agent:
Agent Task: "Execute SyncroMSP ticket update and MSP Backups report workflow for Dataforth"
Agent receives context from Main Claude:
- Current session: client_id = Dataforth
- Current session work_items (today's work)
Agent executes workflow:

1. Search for ticket:
   - API call: SyncroMSP.search_tickets(customer="Dataforth", subject contains "Backup")
   - Result: Ticket #12345 "Backup configuration for NAS"

2. Generate work summary:
   - Processes current session work_items
   - Creates dense summary:

     Changes made today:
     - Configured Veeam backup job for D2TESTNAS
     - Set retention: 30 days local, 90 days cloud
     - Tested backup: successful (45GB)
     - Verified restore point creation

3. Update ticket:
   - API call: SyncroMSP.add_comment(ticket_id=12345, comment=summary)
   - Store in DB: POST /api/v1/external-integrations

     {
       "session_id": "current-session-id",
       "integration_type": "syncro_ticket",
       "external_id": "12345",
       "action": "updated",
       "request_data": {"comment": "..."},
       "response_data": {"comment_id": "67890"}
     }

4. Pull backup report:
   - API call: MSPBackups.get_report(customer="Dataforth", date=today)
   - Receives report PDF (could be large - agent handles it)
   - Summarizes key metrics: "Report: 5 backup jobs, all successful, total size 245GB"

5. Attach report to ticket:
   - API call: SyncroMSP.attach_file(ticket_id=12345, file=report_pdf)
   - Store attachment action in external_integrations

6. Create ticket link:
   - Store in DB: POST /api/v1/ticket-links

     {
       "session_id": "current-session-id",
       "client_id": "dataforth-id",
       "integration_type": "syncro",
       "ticket_id": "12345",
       "ticket_subject": "Backup configuration for NAS",
       "link_type": "documents"
     }
Agent returns concise summary:
Workflow complete:
✓ Found ticket #12345: Backup configuration for NAS
✓ Added work summary comment
✓ Pulled backup report (5 jobs, all successful, 245GB)
✓ Attached report to ticket
Main Claude: Presents summary to user, confirms completion
Context Saved:
- Agent handled all API calls, file transfers, database updates
- Main Claude only received 50-word summary
- Large PDF file never entered main context
- Multi-step workflow executed autonomously
Database Schema Additions for Integrations
New table: external_integrations
CREATE TABLE external_integrations (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
session_id UUID REFERENCES sessions(id) ON DELETE CASCADE,
work_item_id UUID REFERENCES work_items(id) ON DELETE CASCADE,
-- Integration details
integration_type VARCHAR(100) NOT NULL, -- 'syncro_ticket', 'msp_backups', 'zapier_webhook'
external_id VARCHAR(255), -- ticket ID, asset ID, etc.
external_url VARCHAR(500), -- direct link to resource
-- Action tracking
action VARCHAR(50), -- 'created', 'updated', 'linked', 'attached'
direction VARCHAR(20), -- 'outbound' (we pushed) or 'inbound' (they triggered)
-- Data
request_data TEXT, -- JSON: what we sent
response_data TEXT, -- JSON: what we received
-- Metadata
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
created_by VARCHAR(255), -- user who authorized
INDEX idx_ext_int_session (session_id),
INDEX idx_ext_int_type (integration_type),
INDEX idx_ext_int_external (external_id)
);
New table: integration_credentials
CREATE TABLE integration_credentials (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
integration_name VARCHAR(100) NOT NULL UNIQUE, -- 'syncro', 'msp_backups', 'zapier'
-- OAuth or API key
credential_type VARCHAR(50) CHECK(credential_type IN ('oauth', 'api_key', 'basic_auth')),
api_key_encrypted BYTEA,
oauth_token_encrypted BYTEA,
oauth_refresh_token_encrypted BYTEA,
oauth_expires_at TIMESTAMP,
-- Endpoints
api_base_url VARCHAR(500),
webhook_url VARCHAR(500),
-- Status
is_active BOOLEAN DEFAULT true,
last_tested_at TIMESTAMP,
last_test_status VARCHAR(50),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_int_cred_name (integration_name)
);
New table: ticket_links
CREATE TABLE ticket_links (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
session_id UUID REFERENCES sessions(id) ON DELETE CASCADE,
client_id UUID REFERENCES clients(id) ON DELETE CASCADE,
-- Ticket info
integration_type VARCHAR(100) NOT NULL, -- 'syncro', 'autotask', 'connectwise'
ticket_id VARCHAR(255) NOT NULL,
ticket_number VARCHAR(100), -- human-readable: "T12345"
ticket_subject VARCHAR(500),
ticket_url VARCHAR(500),
ticket_status VARCHAR(100),
-- Linking
link_type VARCHAR(50), -- 'related', 'resolves', 'documents'
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_ticket_session (session_id),
INDEX idx_ticket_client (client_id),
INDEX idx_ticket_external (integration_type, ticket_id)
);
API Integration Layer
FastAPI endpoints for integration management:
GET /api/v1/integrations (list configured integrations)
POST /api/v1/integrations/{name}/test (test connection)
GET /api/v1/integrations/{name}/credentials (get encrypted credentials)
PUT /api/v1/integrations/{name}/credentials (update credentials)
# Syncro-specific
GET /api/v1/syncro/tickets (search tickets)
POST /api/v1/syncro/tickets/{id}/comment (add comment)
POST /api/v1/syncro/tickets/{id}/attach (attach file)
POST /api/v1/syncro/time (log time entry)
# MSP Backups
GET /api/v1/mspbackups/report (pull report)
GET /api/v1/mspbackups/status/{client} (backup status)
# Zapier webhooks
POST /api/v1/webhooks/zapier (receive webhook)
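As a sketch of the integration layer's shape (not the final API contract), here is a FastAPI route for the ticket-comment endpoint listed above. The SyncroClient wrapper is a hypothetical placeholder for the real SyncroMSP REST calls.

```python
from fastapi import APIRouter, HTTPException
from pydantic import BaseModel

router = APIRouter(prefix="/api/v1/syncro", tags=["syncro"])

class TicketComment(BaseModel):
    session_id: str
    comment: str

class SyncroClient:                      # hypothetical wrapper around Syncro's REST API
    def add_comment(self, ticket_id: str, body: str) -> dict:
        raise NotImplementedError

@router.post("/tickets/{ticket_id}/comment")
def add_ticket_comment(ticket_id: str, payload: TicketComment):
    try:
        result = SyncroClient().add_comment(ticket_id, payload.comment)
    except NotImplementedError:
        raise HTTPException(status_code=501, detail="Syncro integration not configured")
    # The caller also records this action in external_integrations
    return {"ticket_id": ticket_id, "comment_id": result.get("id")}
```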
Workflow Automation
Session → Ticket Linking: When MSP Mode session ends:
- Ask user: "Link this session to a ticket? (y/n/search)"
- If search: query Syncro for tickets matching client
- If found: link session_id to ticket_id in ticket_links table
- Auto-post session summary as ticket comment (optional)
Auto Time Tracking: When session ends with billable hours:
- Ask: "Log 2.5 hours to SyncroMSP? (y/n)"
- If yes: POST to Syncro time tracking API
- Link time entry ID to session in external_integrations
Backup Report Automation: Trigger: User mentions "backup" in MSP session for client
- Detect keyword "backup"
- Auto-suggest: "Pull latest backup report for [Client]? (y/n)"
- If yes: Query MSPBackups API, display summary
- Option to attach to ticket or save to session
Permission & Security
OAuth Flow:
- User initiates: `/msp integrate syncro`
- Claude generates OAuth URL, user authorizes in browser
- Callback URL receives token, encrypts, stores in integration_credentials
- Refresh token used to maintain access
API Key Storage:
- All integration credentials encrypted with AES-256-GCM
- Same master key as credential storage
- Audit log for all integration credential access
Scopes:
- Read-only for initial implementation (search tickets, pull reports)
- Write access requires explicit user confirmation per action
- Never auto-update tickets without user approval
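A minimal sketch of AES-256-GCM encryption for stored integration credentials using the cryptography library. Key management is simplified here; as noted in Security Considerations, the master key must live outside the database.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_api_key(master_key: bytes, plaintext: str) -> bytes:
    """Returns nonce + ciphertext, suitable for a BYTEA column."""
    nonce = os.urandom(12)                       # 96-bit nonce, unique per encryption
    ct = AESGCM(master_key).encrypt(nonce, plaintext.encode("utf-8"), None)
    return nonce + ct

def decrypt_api_key(master_key: bytes, blob: bytes) -> str:
    nonce, ct = blob[:12], blob[12:]
    return AESGCM(master_key).decrypt(nonce, ct, None).decode("utf-8")

# master_key = AESGCM.generate_key(bit_length=256)  # stored in a separate key store
```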
Future Capabilities
Natural Language Integration:
- "Create a ticket for Dataforth about today's backup work"
- "Show me all open tickets for Grabb & Durando"
- "Pull the backup report for last week and email it to [contact]"
- "Log today's 3 hours to ticket T12345"
- "What tickets mention Apache or SSL?"
Multi-Step Workflows:
- Session ends → Auto-create ticket → Auto-log time → Auto-attach session summary
- Backup failure detected (via webhook) → Create session → Investigate → Update ticket
- Ticket created in Syncro (webhook) → Notify Claude → Start MSP session
Bi-Directional Sync:
- Ticket updated in Syncro → Webhook to Claude → Add to pending_tasks
- Session completed in Claude → Auto-comment in ticket
- Time logged in Claude → Synced to Syncro billing
Implementation Priority
Phase 1 (MVP):
- Database tables for integrations
- SyncroMSP ticket search and read
- Manual ticket linking
- Session summary → ticket comment (manual)
Phase 2:
- MSP Backups report pulling
- File attachments to tickets
- OAuth token refresh automation
- Auto-suggest ticket linking
Phase 3:
- Zapier webhook triggers
- Auto time tracking
- Multi-step workflows
- Natural language commands
Phase 4:
- Bi-directional sync
- Advanced automation
- Additional PSA integrations (Autotask, ConnectWise)
- IT Glue documentation sync
Impact on Current Architecture
API Design Considerations:
- Modular integration layer (plugins per platform)
- Webhook receiver endpoints
- OAuth flow support
- Rate limiting per integration
- Retry logic for failed API calls
Database Design:
- external_integrations table (already designed above)
- integration_credentials table (already designed above)
- ticket_links table (already designed above)
- Indexes for fast external_id lookups
Security:
- Integration credentials separate from user credentials
- Per-integration permission scopes
- Audit logging for all external API calls
- User confirmation for write operations
FastAPI Architecture:
# Integration plugins
integrations/
├── __init__.py
├── base.py (BaseIntegration abstract class)
├── syncro.py (SyncroMSP integration)
├── mspbackups.py (MSP Backups integration)
├── zapier.py (Zapier webhooks)
└── future/
├── autotask.py
├── connectwise.py
└── itglue.py
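A sketch of what base.py's BaseIntegration abstract class might look like, given the plugin layout above; the method names are assumptions.

```python
# integrations/base.py (sketch)
from abc import ABC, abstractmethod

class BaseIntegration(ABC):
    """Common contract each platform plugin (syncro.py, mspbackups.py, ...) implements."""

    name: str                      # e.g. "syncro"

    def __init__(self, credentials: dict, base_url: str):
        self.credentials = credentials
        self.base_url = base_url

    @abstractmethod
    def test_connection(self) -> bool:
        """Used by POST /api/v1/integrations/{name}/test."""

    @abstractmethod
    def search(self, **filters) -> list[dict]:
        """Read-only search (tickets, reports, assets)."""
```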
This integration capability is foundational to MSP Mode's value proposition - linking real-world MSP workflows to intelligent automation.
Normal Mode Behaviors (Agent-Based Architecture)
Core Principle: Track valuable work that doesn't belong to a specific client or dev project. General research, internal tasks, exploration, learning.
Agent Usage in Normal Mode: Same agent architecture as MSP Mode, but with lighter tracking requirements.
Purpose
Normal Mode is for:
- Research and exploration - "How does JWT authentication work?"
- General questions - "What's the best way to handle SQL migrations?"
- Internal infrastructure (non-client) - Working on Jupiter/Saturn without client context
- Learning/experimentation - Testing new tools, trying approaches
- Documentation - Writing guides, updating READMEs
- Non-categorized work - Anything that doesn't fit MSP or Dev
Not for:
- Client work → Use MSP Mode
- Specific development projects → Use Dev Mode
When /normal is Called
1. Mode Switch:
   - If coming from MSP/Dev mode: preserve all knowledge/context from previous mode
   - Set session context to "general" (no client_id, no project_id)
   - Display: "Normal Mode | General work session"

2. Knowledge Retention:
   - Keep: All learned information, credentials accessed, context from previous modes
   - Clear: Client/project assignment only
   - Rationale: You might research something in Normal mode, then apply it in MSP mode

3. Session Creation:
   - Create session with (see the example below):
     - client_id = NULL
     - project_id = NULL (or link to "Internal" or "Research" pseudo-project)
     - session_title = "General work session: [auto-generated from topic]"
     - is_billable = false (by default, since not client work)
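For illustration, the session record a Normal Mode switch might create, per the defaults above. The payload shape is an assumption pending the final sessions API.

```python
normal_mode_session = {
    "client_id": None,                 # no client in Normal Mode
    "project_id": None,                # or an "Internal"/"Research" pseudo-project id
    "session_title": "General work session: FastAPI connection pooling research",
    "is_billable": False,              # default for non-client work
    "mode": "normal",
}
# POST /api/v1/sessions with this payload when the Normal Mode session starts
```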
During Normal Mode Session
Tracking (lighter than MSP):
- Still create work_items, but less granular
- Track major actions: "Researched FastAPI authentication patterns"
- Track useful findings: "Found pyjwt library, better than python-jose"
- Track decisions: "Decided to use Alembic for migrations"
What gets stored:
- Work items with category (usually 'documentation', 'research', or 'development')
- Key commands run (if applicable)
- Files modified (if applicable)
- Tags: technology, topics
- Brief summary of what was learned/accomplished
What doesn't get stored:
- Less emphasis on billability tracking
- No client/project relationships
- Less detailed command/file tracking (unless requested)
Information Density in Normal Mode
Focus on: Capturing useful knowledge, decisions, findings
Example Normal Mode work_item:
Title: Researched PostgreSQL connection pooling for FastAPI
Category: research
Description: Compared SQLAlchemy pooling vs asyncpg.
Finding: SQLAlchemy pool_size=20, max_overflow=10 optimal for our load.
Decision: Use SQLAlchemy with pool_pre_ping=True for connection health checks.
Tags: postgresql, sqlalchemy, fastapi, connection-pooling
Billable: false
Not: "I started by searching for PostgreSQL connection pooling documentation..."
Session End (Normal Mode)
Auto-save:
- session.client_id = NULL
- session.is_billable = false
- session.summary = brief summary of research/work done
Generated summary example:
Session: General Research
Duration: 45 minutes
Category: Research
Topics Explored:
- FastAPI database connection pooling
- JWT vs session authentication
- Alembic migration strategies
Key Findings:
- SQLAlchemy pooling recommended for our use case
- JWT refresh tokens better than long-lived access tokens
- Alembic supports auto-generate from models
Tags: fastapi, postgresql, jwt, alembic
Billable: No
Value of Normal Mode Sessions
Why track this?
- Knowledge base - "What did I learn about X last month?"
- Decision trail - "Why did we choose technology Y?"
- Reference - "Where did I see that solution before?"
- Context recovery - Future Claude instances can search: "Show me research on JWT authentication"
Queryable:
- "What have I researched about Docker networking?"
- "When did I decide to use FastAPI over Flask?"
- "Show all sessions tagged 'postgresql'"
Mode Comparison
| Aspect | MSP Mode | Dev Mode | Normal Mode |
|---|---|---|---|
| Purpose | Client work | Specific projects | General work/research |
| Client/Project | Required | Optional | None (NULL) |
| Billable | Default: yes | Default: no | Default: no |
| Detail Level | High (every command) | Medium | Light (key actions) |
| Focus | Client value delivery | Code/features | Knowledge/learning |
| Session Title | "[Client] - [Issue]" | "[Project] - [Feature]" | "Research: [Topic]" |
Switching Between Modes
MSP → Normal:
- User: "Let me research how to fix this SSL issue"
- `/normal` → Research mode, but can reference client context if needed
- Knowledge retained: knows which client, what SSL issue
- Categorization: research session, not billable client work
Normal → MSP:
- User: "Okay, back to Dataforth"
- `/msp` → Resumes (or starts new) Dataforth session
- Knowledge retained: knows solution from research
- Categorization: client work, billable
Dev → Normal → MSP:
- Modes are fluid, knowledge carries through
- Only categorization changes
- Session assignments change (project vs client vs general)
Development Mode Behaviors (To Define)
- What should Development Mode track?
- How does it differ from MSP and Normal modes?
- Integration with git repos?
Implementation Order
- Database schema design
- API development and deployment
- MCP server or API client for Claude Code
- MSP Mode slash command
- Development Mode
- Normal Mode
Security Considerations
Credential Storage
- Never store plaintext passwords
- Use Fernet encryption or AES-256-GCM
- Encryption key stored separately from database
- Key rotation strategy needed
API Security
- HTTPS only (no HTTP)
- Rate limiting (prevent brute force)
- IP whitelisting (optional - VPN only)
- Token expiration and refresh
- Revocation list for compromised tokens
- Audit logging for credential access
Multi-Machine Sync
- Encrypted tokens in Gitea config
- git-crypt or encrypted JSON values
- Never commit plaintext tokens to repo
Next Steps (Planning Phase)
- ✅ Architecture decisions (SQL, FastAPI, JWT)
- ⏳ Define MSP Mode behaviors in detail
- ⏳ Design database schema
- ⏳ Define API endpoints specification
- ⏳ Create authentication flow diagram
- ⏳ Design slash command interactions
Notes
- This spec will evolve as we discuss details
- Focus on scalability and robustness
- Design for future team members and integrations
- All decisions documented with rationale
Change Log
2026-01-15 (Evening Update 3):
- CRITICAL ADDITION: Machine Detection & OS-Specific Command Selection
- Added 1 new specialized agent (total: 13 agents): 13. Machine Detection Agent (identifies current machine, loads capabilities)
- Added new database tables (total: 36 tables):
- machines (technician's laptops/desktops with capabilities tracking)
- backup_log (backup tracking with verification status)
- Enhanced sessions table with machine_id tracking
- Machine fingerprinting via SHA256(hostname|username|platform|home_dir)
- Auto-detection on session start (hostname, whoami, platform)
- Machine capabilities tracked:
- VPN access per client, Docker, PowerShell version, SSH, Git
- Available MCPs (claude-in-chrome, filesystem, etc.)
- Available skills (pdf, commit, review-pr, etc.)
- OS-specific package managers (choco, brew, apt)
- Preferred shell (powershell, zsh, bash, cmd)
- OS-specific command selection:
- Windows vs macOS vs Linux command mapping
- Shell-specific syntax (PowerShell vs Bash vs Batch)
- Path separator handling (\ vs /)
- Package manager commands per platform
- MCP/Skill availability checking before calls
- VPN requirements validation before client access
- Real-world scenarios documented:
- VPN-required client work (warns if no VPN on current machine)
- Docker-based development (suggests machines with Docker)
- PowerShell version differences (uses compatible cmdlets)
- Session history per machine tracking
- User has 3 laptops + 1 desktop, each with different environments
- Benefits: No cross-platform errors, capability-aware suggestions, session portability
2026-01-15 (Evening Update 2):
- CRITICAL ADDITION: Failure Logging & Environmental Awareness System
- Added 2 new specialized agents:
  - 11. Failure Analysis Agent (learns from all failures)
  - 12. Environment Context Agent (pre-checks before suggestions)
- Added 3 new database tables (total: 33 tables):
- failure_patterns (aggregated failure insights)
- environmental_insights (generated insights.md content)
- operation_failures (non-command failures)
- Enhanced infrastructure table with environmental constraints:
- environmental_notes, powershell_version, shell_type, limitations, has_gui
- Enhanced commands_run table with failure tracking:
- exit_code, error_message, failure_category, resolution, resolved
- Documented complete failure logging workflow:
- Command execution → Failure detection → Pattern analysis → Insights generation
- Environment pre-check prevents future failures
- Self-improving system learns from every mistake
- Real-world examples documented:
- D2TESTNAS WINS service (manual install, no GUI)
- PowerShell 7 cmdlets on Server 2008 (version incompatibility)
- DOS batch file syntax (IF /I not supported in DOS 6.22)
- Benefits: Self-improving, reduced user friction, institutional knowledge, proactive prevention
2026-01-15 (Evening Update 1):
- CRITICAL ARCHITECTURAL ADDITION: Agent-Based Execution
- Added core principle: All modes use specialized agents to preserve main context
- Documented 10 specialized agent types:
- Context Recovery Agent (session start)
- Work Categorization Agent (periodic analysis)
- Session Summary Agent (session end)
- Credential Retrieval Agent (secure access)
- Credential Storage Agent (secure storage)
- Historical Search Agent (on-demand queries)
- Integration Workflow Agent (multi-step external integrations)
- Problem Pattern Matching Agent (solution lookup)
- Database Query Agent (complex reporting)
- Integration Search Agent (external system queries)
- Defined agent execution patterns: Sequential Chain, Parallel, On-Demand, Background
- Updated all MSP Mode workflows to use agents
- Updated integration example to demonstrate agent-based execution
- Added context preservation metrics (90-99% context saved per agent)
- Architecture benefits: Context preservation, scalability, separation of concerns
- User experience: Agents handle all heavy lifting, main Claude stays conversational
2026-01-15 (Initial):
- Initial spec created
- Architecture decisions: SQL, FastAPI, JWT, Gitea config
- Technology stack defined
- High-level infrastructure design
- Open questions identified
- Database schema designed: 30 tables via parallel agent analysis
- 5 parallel agents analyzed: sessions, credentials, projects, work categorization, infrastructure
- Comprehensive schema with 25 core tables + 5 junction tables
- Analyzed 37 session logs, credentials file, all projects, infrastructure docs
- Estimated storage: 1-2 GB/year
- Pre-identified 157+ tags for categorization
- MSP Mode behaviors defined:
- Auto-categorization of client work
- Information-dense storage format
- Auto-tracking: commands, files, problems, credentials, infrastructure changes
- Smart billability detection
- Context awareness and auto-suggestion
- Normal Mode behaviors defined:
- For general work/research not assigned to client or dev project
- Knowledge retention across mode switches
- Lighter tracking than MSP mode
- Captures decisions, findings, learnings
- Queryable knowledge base
- External integrations architecture added:
- SyncroMSP, MSP Backups, Zapier integration design
- 3 additional database tables (external_integrations, integration_credentials, ticket_links)
- Multi-step workflow example documented
- OAuth flow and security considerations