MSP Mode Specification
Date Started: 2026-01-15
Status: Planning / Design Phase
Overview
Creating a custom "MSP Mode" for Claude Code that tracks client work, maintains context across sessions and machines, and provides structured access to historical MSP data.
Objectives
- Track MSP client work - Sessions, work items, time, credentials
- Multi-machine access - Same context available on all machines
- Long-term context - Search historical work across clients/projects
- Scalable & robust - Designed to evolve over time
Core Architectural Principle: Agent-Based Execution
Critical Design Rule: All modes (MSP, Development, Normal) use specialized agents wherever possible to preserve main Claude instance context space.
Why Agent-Based Architecture?
Context Preservation:
- Main Claude instance maintains conversation focus with user
- Agents handle data processing, queries, analysis, integration calls
- User gets concise results without context pollution
- Main instance doesn't get bloated with raw database results or API responses
Scalability:
- Multiple agents can run in parallel
- Each agent has full context window for its specific task
- Complex operations don't consume main context
Separation of Concerns:
- Main instance: Conversation, decision-making, user interaction
- Agents: Data retrieval, processing, categorization, integration
Agent Usage Patterns
When to Use Agents:
1. Database Operations
   - Querying sessions, work items, credentials
   - Searching historical data
   - Complex joins and aggregations
   - Agent returns concise summary, not raw rows
2. Session Analysis & Categorization
   - End-of-session processing
   - Auto-categorizing work items
   - Extracting structured data (commands, files, systems)
   - Generating dense summaries
   - Auto-tagging
3. External Integrations
   - Searching tickets in SyncroMSP
   - Pulling backup reports from MSP Backups
   - OAuth flows
   - Processing external API responses
4. Context Recovery
   - User asks: "What did we do for Dataforth last time?"
   - Agent searches database, retrieves sessions, summarizes
   - Returns: "Last session: 2026-01-10, worked on DNS migration (3 hours)"
5. Credential Management
   - Retrieving encrypted credentials
   - Decryption and formatting
   - Audit logging
   - Returns only the credential needed
6. Problem Pattern Matching
   - User describes error
   - Agent searches problem_solutions table
   - Returns: "Similar issue solved on 2025-12-15: [brief solution]"
7. Parallel Analysis
   - Multiple data sources need analysis
   - Launch parallel agents for each source
   - Aggregate results in main context
When NOT to Use Agents:
- Simple API calls that return small payloads
- Single credential lookups
- Quick status checks
- User is asking conversational questions (not data operations)
Agent Communication Pattern
User: "Show me all work for Dataforth in January"
↓
Main Claude: Understands request, validates parameters
↓
Launches Agent: "Explore database for Dataforth sessions in January 2026"
↓
Agent:
- Queries database (sessions WHERE client='Dataforth' AND date BETWEEN...)
- Processes 15 sessions
- Extracts key info: dates, categories, billable hours, major outcomes
- Generates concise summary
↓
Agent Returns:
"Dataforth - January 2026:
15 sessions, 38.5 billable hours
Main projects: DOS machines (8 sessions), Network migration (5), M365 (2)
Categories: Infrastructure (60%), Troubleshooting (25%), Config (15%)
Key outcomes: Completed UPDATE.BAT v2.0, migrated DNS to UDM"
↓
Main Claude: Presents summary to user, ready for follow-up questions
Context Saved: Agent processed potentially 500+ rows of data, main Claude only received 200-word summary.
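As a rough sketch of what such a database agent might run under the hood, the query below aggregates sessions for the example above using SQLAlchemy. It assumes the `sessions` and `clients` tables defined later in this spec and a MariaDB connection; names and connection details are illustrative, not final.

```python
from sqlalchemy import create_engine, text

# Placeholder connection string; real credentials would come from the synced config.
engine = create_engine("mysql+pymysql://msp_api:CHANGE_ME@172.16.3.20/msp_tracking")

SUMMARY_SQL = text("""
    SELECT COUNT(*)              AS session_count,
           SUM(s.billable_hours) AS billable_hours
    FROM sessions s
    JOIN clients c ON c.id = s.client_id
    WHERE c.name = :client
      AND s.session_date BETWEEN :start AND :end
""")

with engine.connect() as conn:
    row = conn.execute(SUMMARY_SQL, {
        "client": "Dataforth",
        "start": "2026-01-01",
        "end": "2026-01-31",
    }).one()

# The agent condenses this (plus per-category breakdowns) into the short
# summary returned to the main instance.
print(f"{row.session_count} sessions, {row.billable_hours} billable hours")
```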
Architecture Decisions
Storage: SQL Database (MariaDB)
Decision: Use SQL database instead of local files + Git sync
Rationale:
- Claude Code requires internet anyway (offline not a real advantage)
- Structured queries needed ("show all work for Client X in January")
- Relational data (clients → projects → sessions → credentials → billing)
- Fast indexing and search even with years of data
- No merge conflicts (single source of truth)
- Time tracking and billing calculations
- Report generation capabilities
Infrastructure:
- Existing MariaDB on Jupiter (172.16.3.20)
- New database: `msp_tracking`
Access Method: REST API with JWT Authentication
Decision: FastAPI REST API on Jupiter with JWT tokens
Rationale - Security:
- Token-based auth (revocable, rotatable)
- Scoped permissions (API can't access other databases)
- Audit logging (track all queries by user/token/timestamp)
- Rate limiting possible
- HTTPS encryption in transit
- Token revocation without DB password changes
- IP restrictions and 2FA possible later
Rationale - Scalability:
- Industry standard approach
- Can add team members later
- Other tools can integrate (scripts, mobile, GuruRMM)
- Stateless authentication (no session storage)
- Multiple clients supported
Rationale - Robustness:
- Comprehensive error handling
- Input validation before DB access
- Structured logging
- Health checks and monitoring
- Version controlled schema migrations
Technology Stack
API Framework: FastAPI (Python)
- Async performance for concurrent requests
- Auto-generated OpenAPI/Swagger documentation
- Type safety with Pydantic models (runtime validation)
- SQLAlchemy ORM for complex queries
- Built-in background tasks
- Industry-standard testing with pytest
- Alembic for database migrations
- Mature dependency injection
Authentication: JWT Tokens
- Stateless (no DB lookup to validate)
- Claims-based (permissions, scopes, expiration)
- Refresh token pattern for long-term access
- Multiple clients/machines supported
- Short-lived tokens minimize compromise risk
- Industry standard
Configuration Storage: Gitea (Private Repo)
- Multi-machine sync
- Version controlled
- Single source of truth
- Token rotation = one commit, all machines sync
- Encrypted token values (git-crypt or encrypted JSON)
- Backup via Gitea
Deployment: Docker Container
- Easy deployment and updates
- Resource limits
- Systemd service for auto-restart
- Portable (can migrate to dedicated host later)
Infrastructure Design
Jupiter Server (172.16.3.20)
Docker Container: msp-api
- FastAPI application (Python 3.11+)
- SQLAlchemy + Alembic (ORM and migrations)
- JWT auth library (python-jose)
- Pydantic validation
- Gunicorn/Uvicorn ASGI server
- Health checks endpoint
- Prometheus metrics (optional)
- Mounted logs: /var/log/msp-api/
MariaDB Database: msp_tracking
- Connection pooling (SQLAlchemy)
- Automated backups (critical MSP data)
- Schema versioned with Alembic
Nginx Reverse Proxy
- HTTPS with Let's Encrypt
- Rate limiting
- Access logs
- Proxies to: msp-api.azcomputerguru.com
Gitea Private Repository
Repo: azcomputerguru/claude-settings (or new msp-config repo)
Structure:
msp-api-config.json
├── api_url (https://msp-api.azcomputerguru.com)
├── api_token (encrypted JWT or refresh token)
└── database_schema_version (for migration tracking)
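A minimal sketch of how a machine could consume this config at session start, assuming the file layout above and the draft `/api/v1/auth/refresh` endpoint; decryption of the stored token value is omitted here.

```python
import json
from pathlib import Path

import requests

CONFIG_PATH = Path(r"D:\ClaudeTools\.claude\msp-api-config.json")

def get_access_token() -> str:
    """Exchange the long-lived refresh token for a short-lived access token."""
    config = json.loads(CONFIG_PATH.read_text())
    resp = requests.post(
        f"{config['api_url']}/api/v1/auth/refresh",
        json={"refresh_token": config["api_token"]},  # decrypt first if stored encrypted
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["access_token"]
```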
Local Machine (D:\ClaudeTools)
Directory Structure:
D:\ClaudeTools\
├── .claude/
│ ├── commands/
│ │ ├── msp.md (MSP Mode slash command)
│ │ ├── dev.md (Development Mode - TBD)
│ │ └── normal.md (Normal Mode - TBD)
│ └── msp-api-config.json (synced from Gitea)
├── MSP-MODE-SPEC.md (this file)
└── .git/ (synced to Gitea)
API Design Principles
Versioning
- Start with `/api/v1/` from day one
- Allows breaking changes in future versions
Security
- All endpoints require JWT authentication
- Input validation with Pydantic models
- Never expose database errors to client
- Rate limiting to prevent abuse
- Comprehensive audit logging
Endpoints (Draft - To Be Detailed)
POST /api/v1/sessions (start new MSP session)
GET /api/v1/sessions (query sessions - filters: client, date range, etc.)
POST /api/v1/work-items (log work performed)
GET /api/v1/clients (list clients)
POST /api/v1/clients (create client record)
GET /api/v1/clients/{id}/credentials
POST /api/v1/auth/token (get JWT token)
POST /api/v1/auth/refresh (refresh expired token)
GET /api/v1/health (health check)
GET /api/v1/metrics (Prometheus metrics - optional)
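As a rough illustration of the shape these endpoints could take (not a final contract), a minimal FastAPI sketch with Pydantic validation and a JWT dependency; model fields and error handling are placeholders.

```python
from fastapi import Depends, FastAPI, HTTPException, status
from fastapi.security import OAuth2PasswordBearer
from jose import JWTError, jwt
from pydantic import BaseModel

app = FastAPI(title="msp-api")
oauth2_scheme = OAuth2PasswordBearer(tokenUrl="/api/v1/auth/token")

SECRET_KEY = "change-me"   # loaded from environment in a real deployment
ALGORITHM = "HS256"

class SessionCreate(BaseModel):  # Pydantic v2 assumed
    client_id: str
    project_id: str | None = None
    session_title: str
    is_billable: bool = False

def current_user(token: str = Depends(oauth2_scheme)) -> dict:
    try:
        return jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
    except JWTError:
        raise HTTPException(status.HTTP_401_UNAUTHORIZED, "Invalid or expired token")

@app.get("/api/v1/health")
def health() -> dict:
    return {"status": "ok"}

@app.post("/api/v1/sessions", status_code=status.HTTP_201_CREATED)
def create_session(payload: SessionCreate, user: dict = Depends(current_user)) -> dict:
    # Persistence via SQLAlchemy omitted; echo the validated payload and caller.
    return {"created_by": user["sub"], **payload.model_dump()}
```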
Error Handling
- Structured JSON error responses
- HTTP status codes (400, 401, 403, 404, 429, 500)
- Never leak sensitive information in errors
- Log all errors with context
Logging
- JSON structured logs (easy parsing)
- Log levels: DEBUG, INFO, WARNING, ERROR, CRITICAL
- Include: timestamp, user/token ID, endpoint, duration, status
- Separate audit log for sensitive operations (credential access, deletions)
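One way to produce the JSON log lines described above using only the standard library; the field names mirror the list and are illustrative.

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
            # Fields passed via `extra=` are attached to the record object.
            "user": getattr(record, "user", None),
            "endpoint": getattr(record, "endpoint", None),
            "duration_ms": getattr(record, "duration_ms", None),
            "status": getattr(record, "status", None),
        }
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("msp-api")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("request completed",
            extra={"user": "mike@azcomputerguru.com",
                   "endpoint": "/api/v1/sessions",
                   "duration_ms": 42, "status": 200})
```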
JWT Token Structure
Access Token (Short-lived: 1 hour)
{
"sub": "mike@azcomputerguru.com",
"scopes": ["msp:read", "msp:write", "msp:admin"],
"machine": "windows-workstation",
"exp": 1234567890,
"iat": 1234567890,
"jti": "unique-token-id"
}
Refresh Token (Long-lived: 30 days)
- Stored securely in Gitea config
- Used to obtain new access tokens
- Can be revoked server-side
Scopes (Permissions)
- `msp:read` - Read sessions, clients, work items
- `msp:write` - Create/update sessions, work items
- `msp:admin` - Manage clients, credentials, delete operations
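A hedged sketch of issuing and checking such a token with python-jose (the JWT library named in the stack); key handling and claim values are illustrative only.

```python
import time
from uuid import uuid4

from jose import jwt

SECRET_KEY = "change-me"          # in practice: environment variable or vault
ALGORITHM = "HS256"

def issue_access_token(subject: str, scopes: list[str], machine: str) -> str:
    now = int(time.time())
    claims = {
        "sub": subject,
        "scopes": scopes,
        "machine": machine,
        "iat": now,
        "exp": now + 3600,        # short-lived: 1 hour
        "jti": str(uuid4()),
    }
    return jwt.encode(claims, SECRET_KEY, algorithm=ALGORITHM)

def require_scope(token: str, needed: str) -> dict:
    # decode() verifies signature and expiration; scope check is up to the caller
    claims = jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
    if needed not in claims.get("scopes", []):
        raise PermissionError(f"missing scope: {needed}")
    return claims
```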
Database Schema (Draft - To Be Detailed)
Tables (High-Level)
clients
- id, name, network_cidr, primary_contact, notes, created_at, updated_at
projects
- id, client_id, name, description, status, created_at, updated_at
sessions
- id, project_id, start_time, end_time, billable_hours, notes, created_at
work_items
- id, session_id, description, category, timestamp, billable, created_at
credentials
- id, client_id, service, username, password_encrypted, notes, created_at, updated_at
tags (for categorization)
- id, name, type (client_tag, project_tag, work_tag)
session_tags (many-to-many)
- session_id, tag_id
Schema Versioning
- Alembic migrations in version control
- Schema version tracked in config
- Automated migration on API startup (optional)
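For illustration, a migration under this scheme might look like the stub below; revision ids and columns are placeholders, and the real migrations would cover the full schema in this spec.

```python
"""create clients table (illustrative Alembic stub)"""
from alembic import op
import sqlalchemy as sa

revision = "0001_create_clients"
down_revision = None

def upgrade() -> None:
    op.create_table(
        "clients",
        sa.Column("id", sa.String(36), primary_key=True),
        sa.Column("name", sa.String(255), nullable=False, unique=True),
        sa.Column("created_at", sa.TIMESTAMP, server_default=sa.text("CURRENT_TIMESTAMP")),
    )

def downgrade() -> None:
    op.drop_table("clients")
```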
Modes Overview (D:\ClaudeTools Context)
1. MSP Mode
- Purpose: Track client work, maintain context across sessions
- Activation: `/msp` slash command
- Behaviors: (To Be Detailed)
- Prompt for client/project at start
- Auto-log work items as we work
- Track time spent
- Save credentials securely
- Generate session summary at end
2. Development Mode
- Purpose: (To Be Detailed)
- Activation: `/dev` slash command
- Behaviors: (To Be Detailed)
3. Normal Mode
- Purpose: Return to standard Claude behavior
- Activation: `/normal` slash command
- Behaviors: (To Be Detailed)
- Clear active mode context
- Standard conversational Claude
Database Schema Design
Status: ✅ Analyzed via 5 parallel agents on 2026-01-15
Based on comprehensive analysis of:
- 37 session logs (Dec 2025 - Jan 2026)
- shared-data/credentials.md
- All project directories and documentation
- Infrastructure and client network patterns
Schema Summary
Total Tables: 25 core tables + 5 junction tables = 30 tables
Categories:
- Core MSP Tracking (5 tables)
- Client & Infrastructure (7 tables)
- Credentials & Security (4 tables)
- Work Details (6 tables)
- Tagging & Categorization (3 tables)
- System & Audit (2 tables)
- External Integrations (3 tables) - Added 2026-01-15
1. Core MSP Tracking Tables (6 tables)
machines
Technician's machines (laptops, desktops) used for MSP work.
CREATE TABLE machines (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
-- Machine identification (auto-detected)
hostname VARCHAR(255) NOT NULL UNIQUE, -- from `hostname` command
machine_fingerprint VARCHAR(500) UNIQUE, -- hostname + username + platform hash
-- Environment details
friendly_name VARCHAR(255), -- "Main Laptop", "Home Desktop", "Travel Laptop"
machine_type VARCHAR(50) CHECK(machine_type IN ('laptop', 'desktop', 'workstation', 'vm')),
platform VARCHAR(50), -- "win32", "darwin", "linux"
os_version VARCHAR(100),
username VARCHAR(255), -- from `whoami`
home_directory VARCHAR(500), -- user home path
-- Capabilities
has_vpn_access BOOLEAN DEFAULT false, -- can connect to client networks
vpn_profiles TEXT, -- JSON array: ["dataforth", "grabb", "internal"]
has_docker BOOLEAN DEFAULT false,
has_powershell BOOLEAN DEFAULT false,
powershell_version VARCHAR(20),
has_ssh BOOLEAN DEFAULT true,
has_git BOOLEAN DEFAULT true,
-- Network context
typical_network_location VARCHAR(100), -- "home", "office", "mobile"
static_ip VARCHAR(45), -- if has static IP
-- Claude Code context
claude_working_directory VARCHAR(500), -- primary working dir
additional_working_dirs TEXT, -- JSON array
-- Tool versions
installed_tools TEXT, -- JSON: {"git": "2.40", "docker": "24.0", "python": "3.11"}
-- MCP Servers & Skills (NEW)
available_mcps TEXT, -- JSON array: ["claude-in-chrome", "filesystem", "custom-mcp"]
mcp_capabilities TEXT, -- JSON: {"chrome": {"version": "1.0", "features": ["screenshots"]}}
available_skills TEXT, -- JSON array: ["pdf", "commit", "review-pr", "custom-skill"]
skill_paths TEXT, -- JSON: {"/pdf": "/path/to/pdf-skill", ...}
-- OS-Specific Commands
preferred_shell VARCHAR(50), -- "powershell", "bash", "zsh", "cmd"
package_manager_commands TEXT, -- JSON: {"install": "choco install", "update": "choco upgrade"}
-- Status
is_primary BOOLEAN DEFAULT false, -- primary machine
is_active BOOLEAN DEFAULT true,
last_seen TIMESTAMP,
last_session_id UUID, -- last session from this machine
-- Notes
notes TEXT, -- "Travel laptop - limited tools, no VPN"
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_machines_hostname (hostname),
INDEX idx_machines_fingerprint (machine_fingerprint),
INDEX idx_machines_is_active (is_active),
INDEX idx_machines_platform (platform)
);
Machine Fingerprint Generation:
fingerprint = SHA256(hostname + "|" + username + "|" + platform + "|" + home_directory)
// Example: SHA256("ACG-M-L5090|MikeSwanson|win32|C:\Users\MikeSwanson")
Auto-Detection on Session Start:
hostname = exec("hostname") // "ACG-M-L5090"
username = exec("whoami") // "MikeSwanson" or "AzureAD+MikeSwanson"
platform = process.platform // "win32", "darwin", "linux"
home_dir = process.env.HOME || process.env.USERPROFILE
fingerprint = SHA256(`${hostname}|${username}|${platform}|${home_dir}`)
// Query database: SELECT * FROM machines WHERE machine_fingerprint = ?
// If not found: Create new machine record
// If found: Update last_seen, return machine_id
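The same detection expressed as a small Python sketch; the hash scheme follows the definition above, while the exact normalization of the username and platform strings remains an open implementation detail.

```python
import hashlib
import os
import platform
import socket

def machine_fingerprint() -> str:
    hostname = socket.gethostname()                                   # e.g. "ACG-M-L5090"
    username = os.environ.get("USERNAME") or os.environ.get("USER", "")
    plat = {"Windows": "win32", "Darwin": "darwin", "Linux": "linux"}.get(
        platform.system(), platform.system().lower())
    home_dir = os.path.expanduser("~")
    raw = f"{hostname}|{username}|{plat}|{home_dir}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()
```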
Examples:
ACG-M-L5090 (Main Laptop):
{
"hostname": "ACG-M-L5090",
"friendly_name": "Main Laptop",
"platform": "win32",
"os_version": "Windows 11 Pro",
"has_vpn_access": true,
"vpn_profiles": ["dataforth", "grabb", "internal"],
"has_docker": true,
"powershell_version": "7.4",
"preferred_shell": "powershell",
"available_mcps": ["claude-in-chrome", "filesystem"],
"available_skills": ["pdf", "commit", "review-pr", "frontend-design"],
"package_manager_commands": {
"install": "choco install {package}",
"update": "choco upgrade {package}",
"list": "choco list --local-only"
}
}
Mike-MacBook (Development Machine):
{
"hostname": "Mikes-MacBook-Pro",
"friendly_name": "MacBook Pro",
"platform": "darwin",
"os_version": "macOS 14.2",
"has_vpn_access": false,
"has_docker": true,
"powershell_version": null,
"preferred_shell": "zsh",
"available_mcps": ["filesystem"],
"available_skills": ["commit", "review-pr"],
"package_manager_commands": {
"install": "brew install {package}",
"update": "brew upgrade {package}",
"list": "brew list"
}
}
Travel-Laptop (Limited):
{
"hostname": "TRAVEL-WIN",
"friendly_name": "Travel Laptop",
"platform": "win32",
"os_version": "Windows 10 Home",
"has_vpn_access": false,
"vpn_profiles": [],
"has_docker": false,
"powershell_version": "5.1",
"preferred_shell": "powershell",
"available_mcps": [],
"available_skills": [],
"notes": "Minimal toolset, no Docker, no VPN - use for light work only"
}
clients
Master table for all client organizations.
CREATE TABLE clients (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name VARCHAR(255) NOT NULL UNIQUE,
type VARCHAR(50) NOT NULL CHECK(type IN ('msp_client', 'internal', 'project')),
network_subnet VARCHAR(100), -- e.g., "192.168.0.0/24"
domain_name VARCHAR(255), -- AD domain or primary domain
m365_tenant_id UUID, -- Microsoft 365 tenant ID
primary_contact VARCHAR(255),
notes TEXT,
is_active BOOLEAN DEFAULT true,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_clients_type (type),
INDEX idx_clients_name (name)
);
Examples: Dataforth, Grabb & Durando, Valley Wide Plastering, AZ Computer Guru (internal)
projects
Individual projects/engagements for clients.
CREATE TABLE projects (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
client_id UUID NOT NULL REFERENCES clients(id) ON DELETE CASCADE,
name VARCHAR(255) NOT NULL,
slug VARCHAR(255) UNIQUE, -- directory name: "dataforth-dos"
category VARCHAR(50) CHECK(category IN (
'client_project', 'internal_product', 'infrastructure',
'website', 'development_tool', 'documentation'
)),
status VARCHAR(50) DEFAULT 'working' CHECK(status IN (
'complete', 'working', 'blocked', 'pending', 'critical', 'deferred'
)),
priority VARCHAR(20) CHECK(priority IN ('critical', 'high', 'medium', 'low')),
description TEXT,
started_date DATE,
target_completion_date DATE,
completed_date DATE,
estimated_hours DECIMAL(10,2),
actual_hours DECIMAL(10,2),
gitea_repo_url VARCHAR(500),
notes TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_projects_client (client_id),
INDEX idx_projects_status (status),
INDEX idx_projects_slug (slug)
);
Examples: dataforth-dos, gururmm, grabb-website-move
sessions
Work sessions with time tracking (enhanced with machine tracking).
CREATE TABLE sessions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
client_id UUID REFERENCES clients(id) ON DELETE SET NULL,
project_id UUID REFERENCES projects(id) ON DELETE SET NULL,
machine_id UUID REFERENCES machines(id) ON DELETE SET NULL, -- NEW: which machine
session_date DATE NOT NULL,
start_time TIMESTAMP,
end_time TIMESTAMP,
duration_minutes INTEGER, -- auto-calculated or manual
status VARCHAR(50) DEFAULT 'completed' CHECK(status IN (
'completed', 'in_progress', 'blocked', 'pending'
)),
session_title VARCHAR(500) NOT NULL,
summary TEXT, -- markdown summary
is_billable BOOLEAN DEFAULT false,
billable_hours DECIMAL(10,2),
technician VARCHAR(255), -- "Mike Swanson", etc.
session_log_file VARCHAR(500), -- path to .md file
notes TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_sessions_client (client_id),
INDEX idx_sessions_project (project_id),
INDEX idx_sessions_date (session_date),
INDEX idx_sessions_billable (is_billable),
INDEX idx_sessions_machine (machine_id)
);
work_items
Individual tasks/actions within sessions (granular tracking).
CREATE TABLE work_items (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
session_id UUID NOT NULL REFERENCES sessions(id) ON DELETE CASCADE,
category VARCHAR(50) NOT NULL CHECK(category IN (
'infrastructure', 'troubleshooting', 'configuration',
'development', 'maintenance', 'security', 'documentation'
)),
title VARCHAR(500) NOT NULL,
description TEXT NOT NULL,
status VARCHAR(50) DEFAULT 'completed' CHECK(status IN (
'completed', 'in_progress', 'blocked', 'pending', 'deferred'
)),
priority VARCHAR(20) CHECK(priority IN ('critical', 'high', 'medium', 'low')),
is_billable BOOLEAN DEFAULT false,
estimated_minutes INTEGER,
actual_minutes INTEGER,
affected_systems TEXT, -- JSON array: ["jupiter", "172.16.3.20"]
technologies_used TEXT, -- JSON array: ["docker", "mariadb"]
item_order INTEGER, -- sequence within session
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
completed_at TIMESTAMP,
INDEX idx_work_items_session (session_id),
INDEX idx_work_items_category (category),
INDEX idx_work_items_status (status)
);
Categories distribution (from analysis):
- Infrastructure: 30%
- Troubleshooting: 25%
- Configuration: 15%
- Development: 15%
- Maintenance: 10%
- Security: 5%
pending_tasks
Open items across all clients/projects.
CREATE TABLE pending_tasks (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
client_id UUID REFERENCES clients(id) ON DELETE CASCADE,
project_id UUID REFERENCES projects(id) ON DELETE CASCADE,
work_item_id UUID REFERENCES work_items(id) ON DELETE SET NULL,
title VARCHAR(500) NOT NULL,
description TEXT,
priority VARCHAR(20) CHECK(priority IN ('critical', 'high', 'medium', 'low')),
blocked_by TEXT, -- what's blocking this
assigned_to VARCHAR(255),
due_date DATE,
status VARCHAR(50) DEFAULT 'pending' CHECK(status IN (
'pending', 'in_progress', 'blocked', 'completed', 'cancelled'
)),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
completed_at TIMESTAMP,
INDEX idx_pending_tasks_client (client_id),
INDEX idx_pending_tasks_status (status),
INDEX idx_pending_tasks_priority (priority)
);
tasks
Task/checklist management for tracking implementation steps, analysis work, and other agent activities.
-- Task/Checklist Management
CREATE TABLE tasks (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
-- Task hierarchy
parent_task_id UUID REFERENCES tasks(id) ON DELETE CASCADE,
task_order INTEGER NOT NULL,
-- Task details
title VARCHAR(500) NOT NULL,
description TEXT,
task_type VARCHAR(100) CHECK(task_type IN (
'implementation', 'research', 'review', 'deployment',
'testing', 'documentation', 'bugfix', 'analysis'
)),
-- Status tracking
status VARCHAR(50) NOT NULL CHECK(status IN (
'pending', 'in_progress', 'blocked', 'completed', 'cancelled'
)),
blocking_reason TEXT, -- Why blocked (if status='blocked')
-- Context
session_id UUID REFERENCES sessions(id) ON DELETE CASCADE,
client_id UUID REFERENCES clients(id) ON DELETE SET NULL,
project_id UUID REFERENCES projects(id) ON DELETE SET NULL,
assigned_agent VARCHAR(100), -- Which agent is handling this
-- Timing
estimated_complexity VARCHAR(20) CHECK(estimated_complexity IN (
'trivial', 'simple', 'moderate', 'complex', 'very_complex'
)),
started_at TIMESTAMP,
completed_at TIMESTAMP,
-- Context data (JSON)
task_context TEXT, -- Detailed context for this task
dependencies TEXT, -- JSON array of dependency task_ids
-- Metadata
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_tasks_session (session_id),
INDEX idx_tasks_status (status),
INDEX idx_tasks_parent (parent_task_id),
INDEX idx_tasks_client (client_id),
INDEX idx_tasks_project (project_id)
);
2. Client & Infrastructure Tables (7 tables)
sites
Physical/logical locations for clients.
CREATE TABLE sites (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
client_id UUID NOT NULL REFERENCES clients(id) ON DELETE CASCADE,
name VARCHAR(255) NOT NULL, -- "Main Office", "SLC - Salt Lake City"
network_subnet VARCHAR(100), -- "172.16.9.0/24"
vpn_required BOOLEAN DEFAULT false,
vpn_subnet VARCHAR(100), -- "192.168.1.0/24"
gateway_ip VARCHAR(45), -- IPv4/IPv6
dns_servers TEXT, -- JSON array
notes TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_sites_client (client_id)
);
infrastructure
Servers, network devices, NAS, workstations (enhanced with environmental constraints).
CREATE TABLE infrastructure (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
client_id UUID REFERENCES clients(id) ON DELETE CASCADE,
site_id UUID REFERENCES sites(id) ON DELETE SET NULL,
asset_type VARCHAR(50) NOT NULL CHECK(asset_type IN (
'physical_server', 'virtual_machine', 'container',
'network_device', 'nas_storage', 'workstation',
'firewall', 'domain_controller'
)),
hostname VARCHAR(255) NOT NULL,
ip_address VARCHAR(45),
mac_address VARCHAR(17),
os VARCHAR(255), -- "Ubuntu 22.04", "Windows Server 2022", "Unraid"
os_version VARCHAR(100), -- "6.22", "2008 R2", "22.04"
role_description TEXT, -- "Primary DC, NPS/RADIUS server"
parent_host_id UUID REFERENCES infrastructure(id) ON DELETE SET NULL, -- for VMs/containers
status VARCHAR(50) DEFAULT 'active' CHECK(status IN (
'active', 'migration_source', 'migration_destination', 'decommissioned'
)),
-- Environmental constraints (new)
environmental_notes TEXT, -- "Manual WINS install, no native service. ReadyNAS OS, SMB1 only."
powershell_version VARCHAR(20), -- "2.0", "5.1", "7.4"
shell_type VARCHAR(50), -- "bash", "cmd", "powershell", "sh"
package_manager VARCHAR(50), -- "apt", "yum", "chocolatey", "none"
has_gui BOOLEAN DEFAULT true, -- false for headless/DOS
limitations TEXT, -- JSON array: ["no_ps7", "smb1_only", "dos_6.22_commands"]
notes TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_infrastructure_client (client_id),
INDEX idx_infrastructure_type (asset_type),
INDEX idx_infrastructure_hostname (hostname),
INDEX idx_infrastructure_parent (parent_host_id),
INDEX idx_infrastructure_os (os)
);
Examples:
- Jupiter (Ubuntu 22.04, PS7, GUI)
- AD2/Dataforth (Server 2022, PS5.1, GUI)
- D2TESTNAS (ReadyNAS OS, manual WINS, no GUI service manager, SMB1)
- TS-27 (MS-DOS 6.22, no GUI, batch only)
services
Applications/services running on infrastructure.
CREATE TABLE services (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
infrastructure_id UUID REFERENCES infrastructure(id) ON DELETE CASCADE,
service_name VARCHAR(255) NOT NULL, -- "Gitea", "PostgreSQL", "Apache"
service_type VARCHAR(100), -- "git_hosting", "database", "web_server"
external_url VARCHAR(500), -- "https://git.azcomputerguru.com"
internal_url VARCHAR(500), -- "http://172.16.3.20:3000"
port INTEGER,
protocol VARCHAR(50), -- "https", "ssh", "smb"
status VARCHAR(50) DEFAULT 'running' CHECK(status IN (
'running', 'stopped', 'error', 'maintenance'
)),
version VARCHAR(100),
notes TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_services_infrastructure (infrastructure_id),
INDEX idx_services_name (service_name),
INDEX idx_services_type (service_type)
);
service_relationships
Dependencies and relationships between services.
CREATE TABLE service_relationships (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
from_service_id UUID NOT NULL REFERENCES services(id) ON DELETE CASCADE,
to_service_id UUID NOT NULL REFERENCES services(id) ON DELETE CASCADE,
relationship_type VARCHAR(50) NOT NULL CHECK(relationship_type IN (
'hosted_on', 'proxied_by', 'authenticates_via',
'backend_for', 'depends_on', 'replicates_to'
)),
notes TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
UNIQUE(from_service_id, to_service_id, relationship_type),
INDEX idx_service_rel_from (from_service_id),
INDEX idx_service_rel_to (to_service_id)
);
Examples:
- Gitea (proxied_by) NPM
- GuruRMM API (hosted_on) Jupiter container
networks
Network segments, VLANs, VPN networks.
CREATE TABLE networks (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
client_id UUID REFERENCES clients(id) ON DELETE CASCADE,
site_id UUID REFERENCES sites(id) ON DELETE CASCADE,
network_name VARCHAR(255) NOT NULL,
network_type VARCHAR(50) CHECK(network_type IN (
'lan', 'vpn', 'vlan', 'isolated', 'dmz'
)),
cidr VARCHAR(100) NOT NULL, -- "192.168.0.0/24"
gateway_ip VARCHAR(45),
vlan_id INTEGER,
notes TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_networks_client (client_id),
INDEX idx_networks_site (site_id)
);
firewall_rules
Network security rules (for documentation/audit trail).
CREATE TABLE firewall_rules (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
infrastructure_id UUID REFERENCES infrastructure(id) ON DELETE CASCADE,
rule_name VARCHAR(255),
source_cidr VARCHAR(100),
destination_cidr VARCHAR(100),
port INTEGER,
protocol VARCHAR(20), -- "tcp", "udp", "icmp"
action VARCHAR(20) CHECK(action IN ('allow', 'deny', 'drop')),
rule_order INTEGER,
notes TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
created_by VARCHAR(255),
INDEX idx_firewall_infra (infrastructure_id)
);
m365_tenants
Microsoft 365 tenant tracking.
CREATE TABLE m365_tenants (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
client_id UUID REFERENCES clients(id) ON DELETE CASCADE,
tenant_id UUID NOT NULL UNIQUE, -- Microsoft tenant ID
tenant_name VARCHAR(255), -- "dataforth.com"
default_domain VARCHAR(255), -- "dataforthcorp.onmicrosoft.com"
admin_email VARCHAR(255),
cipp_name VARCHAR(255), -- name in CIPP portal
notes TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_m365_client (client_id),
INDEX idx_m365_tenant_id (tenant_id)
);
3. Credentials & Security Tables (4 tables)
credentials
Encrypted credential storage (values encrypted at rest).
CREATE TABLE credentials (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
client_id UUID REFERENCES clients(id) ON DELETE CASCADE,
service_id UUID REFERENCES services(id) ON DELETE CASCADE,
infrastructure_id UUID REFERENCES infrastructure(id) ON DELETE CASCADE,
credential_type VARCHAR(50) NOT NULL CHECK(credential_type IN (
'password', 'api_key', 'oauth', 'ssh_key',
'shared_secret', 'jwt', 'connection_string', 'certificate'
)),
service_name VARCHAR(255) NOT NULL, -- "Gitea Admin", "AD2 sysadmin"
username VARCHAR(255),
password_encrypted BYTEA, -- AES-256-GCM encrypted
api_key_encrypted BYTEA,
client_id_oauth VARCHAR(255), -- for OAuth
client_secret_encrypted BYTEA,
tenant_id_oauth VARCHAR(255),
public_key TEXT, -- for SSH
token_encrypted BYTEA,
connection_string_encrypted BYTEA,
integration_code VARCHAR(255), -- for services like Autotask
-- Metadata
external_url VARCHAR(500),
internal_url VARCHAR(500),
custom_port INTEGER,
role_description VARCHAR(500),
requires_vpn BOOLEAN DEFAULT false,
requires_2fa BOOLEAN DEFAULT false,
ssh_key_auth_enabled BOOLEAN DEFAULT false,
access_level VARCHAR(100),
-- Lifecycle
expires_at TIMESTAMP,
last_rotated_at TIMESTAMP,
is_active BOOLEAN DEFAULT true,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_credentials_client (client_id),
INDEX idx_credentials_service (service_id),
INDEX idx_credentials_type (credential_type),
INDEX idx_credentials_active (is_active)
);
Security:
- All sensitive fields encrypted with AES-256-GCM
- Encryption key stored separately (environment variable or vault)
- Master password unlock mechanism
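A minimal sketch of the AES-256-GCM wrap/unwrap using the `cryptography` package, assuming the 32-byte key is supplied from the environment or a vault as noted above.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_secret(plaintext: str, key: bytes) -> bytes:
    # key must be 32 bytes (256 bits); the random nonce is stored with the ciphertext
    nonce = os.urandom(12)
    return nonce + AESGCM(key).encrypt(nonce, plaintext.encode("utf-8"), None)

def decrypt_secret(blob: bytes, key: bytes) -> str:
    nonce, ciphertext = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ciphertext, None).decode("utf-8")

# key = AESGCM.generate_key(bit_length=256)  # generated once, stored outside the DB
```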
credential_audit_log
Audit trail for credential access.
CREATE TABLE credential_audit_log (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
credential_id UUID NOT NULL REFERENCES credentials(id) ON DELETE CASCADE,
action VARCHAR(50) NOT NULL CHECK(action IN (
'view', 'create', 'update', 'delete', 'rotate', 'decrypt'
)),
user_id VARCHAR(255) NOT NULL, -- JWT sub claim
ip_address VARCHAR(45),
user_agent TEXT,
details TEXT, -- JSON: what changed, why
timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_cred_audit_credential (credential_id),
INDEX idx_cred_audit_user (user_id),
INDEX idx_cred_audit_timestamp (timestamp)
);
security_incidents
Track security events and remediation.
CREATE TABLE security_incidents (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
client_id UUID REFERENCES clients(id) ON DELETE CASCADE,
service_id UUID REFERENCES services(id) ON DELETE SET NULL,
infrastructure_id UUID REFERENCES infrastructure(id) ON DELETE SET NULL,
incident_type VARCHAR(100) CHECK(incident_type IN (
'bec', 'backdoor', 'malware', 'unauthorized_access',
'data_breach', 'phishing', 'ransomware', 'brute_force'
)),
incident_date TIMESTAMP NOT NULL,
severity VARCHAR(50) CHECK(severity IN ('critical', 'high', 'medium', 'low')),
description TEXT NOT NULL,
findings TEXT, -- investigation results
remediation_steps TEXT,
status VARCHAR(50) DEFAULT 'investigating' CHECK(status IN (
'investigating', 'contained', 'resolved', 'monitoring'
)),
resolved_at TIMESTAMP,
notes TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_incidents_client (client_id),
INDEX idx_incidents_type (incident_type),
INDEX idx_incidents_status (status)
);
Examples: BG Builders OAuth backdoor, CW Concrete BEC
credential_permissions
Access control for credentials (future team expansion).
CREATE TABLE credential_permissions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
credential_id UUID NOT NULL REFERENCES credentials(id) ON DELETE CASCADE,
user_id VARCHAR(255) NOT NULL, -- or role_id
permission_level VARCHAR(50) CHECK(permission_level IN ('read', 'write', 'admin')),
granted_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
granted_by VARCHAR(255),
UNIQUE(credential_id, user_id),
INDEX idx_cred_perm_credential (credential_id),
INDEX idx_cred_perm_user (user_id)
);
4. Work Details Tables (6 tables)
file_changes
Track files created/modified/deleted during sessions.
CREATE TABLE file_changes (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
work_item_id UUID NOT NULL REFERENCES work_items(id) ON DELETE CASCADE,
session_id UUID NOT NULL REFERENCES sessions(id) ON DELETE CASCADE,
file_path VARCHAR(1000) NOT NULL,
change_type VARCHAR(50) CHECK(change_type IN (
'created', 'modified', 'deleted', 'renamed', 'backed_up'
)),
backup_path VARCHAR(1000),
size_bytes BIGINT,
description TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_file_changes_work_item (work_item_id),
INDEX idx_file_changes_session (session_id)
);
commands_run
Shell/PowerShell/SQL commands executed (enhanced with failure tracking).
CREATE TABLE commands_run (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
work_item_id UUID NOT NULL REFERENCES work_items(id) ON DELETE CASCADE,
session_id UUID NOT NULL REFERENCES sessions(id) ON DELETE CASCADE,
command_text TEXT NOT NULL,
host VARCHAR(255), -- where executed: "jupiter", "172.16.3.20"
shell_type VARCHAR(50), -- "bash", "powershell", "sql", "docker"
success BOOLEAN,
output_summary TEXT, -- first/last lines or error
-- Failure tracking (new)
exit_code INTEGER, -- non-zero indicates failure
error_message TEXT, -- full error text
failure_category VARCHAR(100), -- "compatibility", "permission", "syntax", "environmental"
resolution TEXT, -- how it was fixed (if resolved)
resolved BOOLEAN DEFAULT false,
execution_order INTEGER, -- sequence within work item
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_commands_work_item (work_item_id),
INDEX idx_commands_session (session_id),
INDEX idx_commands_host (host),
INDEX idx_commands_success (success),
INDEX idx_commands_failure_category (failure_category)
);
infrastructure_changes
Audit trail for infrastructure modifications.
CREATE TABLE infrastructure_changes (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
work_item_id UUID NOT NULL REFERENCES work_items(id) ON DELETE CASCADE,
session_id UUID NOT NULL REFERENCES sessions(id) ON DELETE CASCADE,
infrastructure_id UUID REFERENCES infrastructure(id) ON DELETE SET NULL,
change_type VARCHAR(50) CHECK(change_type IN (
'dns', 'firewall', 'routing', 'ssl', 'container',
'service_config', 'hardware', 'network', 'storage'
)),
target_system VARCHAR(255) NOT NULL,
before_state TEXT,
after_state TEXT,
is_permanent BOOLEAN DEFAULT true,
rollback_procedure TEXT,
verification_performed BOOLEAN DEFAULT false,
verification_notes TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_infra_changes_work_item (work_item_id),
INDEX idx_infra_changes_session (session_id),
INDEX idx_infra_changes_infrastructure (infrastructure_id)
);
backup_log
Backup tracking with verification status.
-- Backup Tracking
CREATE TABLE backup_log (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
-- Backup details
backup_type VARCHAR(50) NOT NULL CHECK(backup_type IN (
'daily', 'weekly', 'monthly', 'manual', 'pre-migration'
)),
file_path VARCHAR(500) NOT NULL,
file_size_bytes BIGINT NOT NULL,
-- Timing
backup_started_at TIMESTAMP NOT NULL,
backup_completed_at TIMESTAMP NOT NULL,
duration_seconds INTEGER GENERATED ALWAYS AS (
TIMESTAMPDIFF(SECOND, backup_started_at, backup_completed_at)
) STORED,
-- Verification
verification_status VARCHAR(50) CHECK(verification_status IN (
'passed', 'failed', 'not_verified'
)),
verification_details TEXT, -- JSON: specific check results
-- Metadata
database_host VARCHAR(255),
database_name VARCHAR(100),
backup_method VARCHAR(50) DEFAULT 'mysqldump',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_backup_type (backup_type),
INDEX idx_backup_date (backup_completed_at),
INDEX idx_verification_status (verification_status)
);
problem_solutions
Issue tracking with root cause and resolution.
CREATE TABLE problem_solutions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
work_item_id UUID NOT NULL REFERENCES work_items(id) ON DELETE CASCADE,
session_id UUID NOT NULL REFERENCES sessions(id) ON DELETE CASCADE,
problem_description TEXT NOT NULL,
symptom TEXT, -- what user saw
error_message TEXT, -- exact error code/message
investigation_steps TEXT, -- JSON array of diagnostic commands
root_cause TEXT,
solution_applied TEXT NOT NULL,
verification_method TEXT,
rollback_plan TEXT,
recurrence_count INTEGER DEFAULT 1, -- if same problem reoccurs
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_problems_work_item (work_item_id),
INDEX idx_problems_session (session_id)
);
deployments
Track software/config deployments.
CREATE TABLE deployments (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
work_item_id UUID NOT NULL REFERENCES work_items(id) ON DELETE CASCADE,
session_id UUID NOT NULL REFERENCES sessions(id) ON DELETE CASCADE,
infrastructure_id UUID REFERENCES infrastructure(id) ON DELETE SET NULL,
service_id UUID REFERENCES services(id) ON DELETE SET NULL,
deployment_type VARCHAR(50) CHECK(deployment_type IN (
'code', 'config', 'database', 'container', 'service_restart'
)),
version VARCHAR(100),
description TEXT,
deployed_from VARCHAR(500), -- source path or repo
deployed_to VARCHAR(500), -- destination
rollback_available BOOLEAN DEFAULT false,
rollback_procedure TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_deployments_work_item (work_item_id),
INDEX idx_deployments_infrastructure (infrastructure_id),
INDEX idx_deployments_service (service_id)
);
database_changes
Track database schema/data modifications.
CREATE TABLE database_changes (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
work_item_id UUID NOT NULL REFERENCES work_items(id) ON DELETE CASCADE,
session_id UUID NOT NULL REFERENCES sessions(id) ON DELETE CASCADE,
database_name VARCHAR(255) NOT NULL,
infrastructure_id UUID REFERENCES infrastructure(id) ON DELETE SET NULL,
change_type VARCHAR(50) CHECK(change_type IN (
'schema', 'data', 'index', 'optimization', 'cleanup', 'migration'
)),
sql_executed TEXT,
rows_affected BIGINT,
size_freed_bytes BIGINT, -- for cleanup operations
backup_taken BOOLEAN DEFAULT false,
backup_location VARCHAR(500),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_db_changes_work_item (work_item_id),
INDEX idx_db_changes_database (database_name)
);
failure_patterns
Aggregated failure insights learned from command/operation failures.
CREATE TABLE failure_patterns (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
infrastructure_id UUID REFERENCES infrastructure(id) ON DELETE CASCADE,
client_id UUID REFERENCES clients(id) ON DELETE CASCADE,
-- Pattern identification
pattern_type VARCHAR(100) NOT NULL CHECK(pattern_type IN (
'command_compatibility', 'version_mismatch', 'permission_denied',
'service_unavailable', 'configuration_error', 'environmental_limitation'
)),
pattern_signature VARCHAR(500) NOT NULL, -- "PowerShell 7 cmdlets on Server 2008"
error_pattern TEXT, -- regex or keywords: "Get-LocalUser.*not recognized"
-- Context
affected_systems TEXT, -- JSON array: ["all_server_2008", "D2TESTNAS"]
triggering_commands TEXT, -- JSON array of command patterns
triggering_operations TEXT, -- JSON array of operation types
-- Resolution
failure_description TEXT NOT NULL,
root_cause TEXT NOT NULL, -- "Server 2008 only has PowerShell 2.0"
recommended_solution TEXT NOT NULL, -- "Use Get-WmiObject instead of Get-LocalUser"
alternative_approaches TEXT, -- JSON array of alternatives
-- Metadata
occurrence_count INTEGER DEFAULT 1, -- how many times seen
first_seen TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
last_seen TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
severity VARCHAR(20) CHECK(severity IN ('blocking', 'major', 'minor', 'info')),
is_active BOOLEAN DEFAULT true, -- false if pattern no longer applies
added_to_insights BOOLEAN DEFAULT false,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_failure_infrastructure (infrastructure_id),
INDEX idx_failure_client (client_id),
INDEX idx_failure_pattern_type (pattern_type),
INDEX idx_failure_signature (pattern_signature)
);
Examples:
- Pattern: "PowerShell 7 cmdlets on Server 2008" → Use PS 2.0 compatible commands
- Pattern: "WINS service GUI on D2TESTNAS" → WINS manually installed, no native service
- Pattern: "Modern batch syntax on DOS 6.22" → No IF /I, no long filenames
environmental_insights
Generated insights.md content per client/infrastructure.
CREATE TABLE environmental_insights (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
client_id UUID REFERENCES clients(id) ON DELETE CASCADE,
infrastructure_id UUID REFERENCES infrastructure(id) ON DELETE CASCADE,
-- Insight content
insight_category VARCHAR(100) NOT NULL CHECK(insight_category IN (
'command_constraints', 'service_configuration', 'version_limitations',
'custom_installations', 'network_constraints', 'permissions'
)),
insight_title VARCHAR(500) NOT NULL,
insight_description TEXT NOT NULL, -- markdown formatted
examples TEXT, -- JSON array of command examples
-- Metadata
source_pattern_id UUID REFERENCES failure_patterns(id) ON DELETE SET NULL,
confidence_level VARCHAR(20) CHECK(confidence_level IN ('confirmed', 'likely', 'suspected')),
verification_count INTEGER DEFAULT 1, -- how many times verified
priority INTEGER DEFAULT 5, -- 1-10, higher = more important
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
last_verified TIMESTAMP,
INDEX idx_insights_client (client_id),
INDEX idx_insights_infrastructure (infrastructure_id),
INDEX idx_insights_category (insight_category)
);
Generated insights.md example:
# Environmental Insights: Dataforth
## D2TESTNAS (192.168.0.9)
### Custom Installations
- **WINS Service**: Manually installed, not native ReadyNAS service
- No GUI service manager for WINS
- Configure via `/etc/frontview/samba/smb.conf.overrides`
- Check status: `ssh root@192.168.0.9 'nmbd -V'`
### Version Constraints
- **SMB Protocol**: CORE/SMB1 only (for DOS compatibility)
- Modern SMB2/3 clients may need configuration
- Use NetBIOS name, not IP address for DOS machines
## AD2 (192.168.0.6 - Server 2022)
### PowerShell Version
- **Version**: PowerShell 5.1 (default)
- **Compatible**: Modern cmdlets work
- **Not available**: PowerShell 7 specific features
## TS-XX Machines (DOS)
### Command Constraints
- **OS**: MS-DOS 6.22
- **No support for**:
- `IF /I` (case insensitive) - use duplicate IF statements
- Long filenames (8.3 format only)
- Unicode or special characters
- Modern batch features
operation_failures
Non-command failures (API calls, integrations, file operations).
CREATE TABLE operation_failures (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
session_id UUID REFERENCES sessions(id) ON DELETE CASCADE,
work_item_id UUID REFERENCES work_items(id) ON DELETE CASCADE,
-- Operation details
operation_type VARCHAR(100) NOT NULL CHECK(operation_type IN (
'api_call', 'file_operation', 'network_request',
'database_query', 'external_integration', 'service_restart'
)),
operation_description TEXT NOT NULL,
target_system VARCHAR(255), -- host, URL, service name
-- Failure details
error_message TEXT NOT NULL,
error_code VARCHAR(50), -- HTTP status, exit code, error number
failure_category VARCHAR(100), -- "timeout", "authentication", "not_found", etc.
stack_trace TEXT,
-- Resolution
resolution_applied TEXT,
resolved BOOLEAN DEFAULT false,
resolved_at TIMESTAMP,
-- Context
request_data TEXT, -- JSON: what was attempted
response_data TEXT, -- JSON: error response
environment_snapshot TEXT, -- JSON: relevant env vars, versions
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_op_failure_session (session_id),
INDEX idx_op_failure_type (operation_type),
INDEX idx_op_failure_category (failure_category),
INDEX idx_op_failure_resolved (resolved)
);
Examples:
- SyncroMSP API call timeout → Retry logic needed
- File upload to NAS fails → Permission issue detected
- Database query slow → Index missing, added
5. Tagging & Categorization Tables (3 tables)
tags
Flexible tagging system for work items.
CREATE TABLE tags (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name VARCHAR(100) UNIQUE NOT NULL,
category VARCHAR(50) CHECK(category IN (
'technology', 'client', 'infrastructure',
'problem_type', 'action', 'service'
)),
description TEXT,
usage_count INTEGER DEFAULT 0, -- auto-increment on use
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_tags_category (category),
INDEX idx_tags_name (name)
);
Pre-populated tags: 157+ tags identified from analysis
- 58 technology tags (docker, postgresql, apache, etc.)
- 24 infrastructure tags (jupiter, saturn, pfsense, etc.)
- 20+ client tags
- 30 problem type tags (connection-timeout, ssl-error, etc.)
- 25 action tags (migration, upgrade, cleanup, etc.)
work_item_tags (Junction Table)
Many-to-many relationship: work items ↔ tags.
CREATE TABLE work_item_tags (
work_item_id UUID NOT NULL REFERENCES work_items(id) ON DELETE CASCADE,
tag_id UUID NOT NULL REFERENCES tags(id) ON DELETE CASCADE,
PRIMARY KEY (work_item_id, tag_id),
INDEX idx_wit_work_item (work_item_id),
INDEX idx_wit_tag (tag_id)
);
session_tags (Junction Table)
Many-to-many relationship: sessions ↔ tags.
CREATE TABLE session_tags (
session_id UUID NOT NULL REFERENCES sessions(id) ON DELETE CASCADE,
tag_id UUID NOT NULL REFERENCES tags(id) ON DELETE CASCADE,
PRIMARY KEY (session_id, tag_id),
INDEX idx_st_session (session_id),
INDEX idx_st_tag (tag_id)
);
6. System & Audit Tables (2 tables)
api_audit_log
Track all API requests for security and debugging.
CREATE TABLE api_audit_log (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id VARCHAR(255) NOT NULL, -- JWT sub claim
endpoint VARCHAR(500) NOT NULL, -- "/api/v1/sessions"
http_method VARCHAR(10), -- GET, POST, PUT, DELETE
ip_address VARCHAR(45),
user_agent TEXT,
request_body TEXT, -- sanitized (no credentials)
response_status INTEGER, -- 200, 401, 500
response_time_ms INTEGER,
error_message TEXT,
timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_api_audit_user (user_id),
INDEX idx_api_audit_endpoint (endpoint),
INDEX idx_api_audit_timestamp (timestamp),
INDEX idx_api_audit_status (response_status)
);
schema_migrations
Track database schema versions (Alembic migrations).
CREATE TABLE schema_migrations (
version_id VARCHAR(100) PRIMARY KEY,
description TEXT,
applied_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
applied_by VARCHAR(255),
migration_sql TEXT
);
Junction Tables Summary
- work_item_tags - Work items ↔ Tags
- session_tags - Sessions ↔ Tags
- project_relationships (optional) - Projects ↔ Projects (related/dependent)
- project_session_logs (optional) - Projects ↔ Sessions (many-to-many if sessions span multiple projects)
Schema Statistics
Total Tables: 34
- Core MSP: 6 tables (added `machines` table)
- Client & Infrastructure: 7 tables
- Credentials & Security: 4 tables
- Work Details: 6 tables
- Failure Analysis & Environmental Insights: 3 tables
- Tagging: 3 tables (+ 2 junction)
- System: 2 tables
- External Integrations: 3 tables
Estimated Row Counts (1 year of MSP work):
- sessions: ~500-1000 (2-3 per day)
- work_items: ~5,000-10,000 (5-10 per session)
- file_changes: ~10,000
- commands_run: ~20,000
- tags: ~200
- clients: ~50
- projects: ~100
- credentials: ~500
- api_audit_log: ~100,000+
Storage Estimate: ~1-2 GB per year (compressed)
Design Principles Applied
- Normalized Structure - Minimizes data duplication
- Flexible Tagging - Supports evolving categorization
- Audit Trail - Comprehensive logging for security and troubleshooting
- Scalability - Designed for multi-user MSP team growth
- Security First - Encrypted credentials, access control, audit logging
- Temporal Tracking - created_at, updated_at, completed_at timestamps
- Soft Deletes - is_active flags allow recovery
- Relationships - Foreign keys enforce referential integrity
- Indexes - Strategic indexes for common query patterns
- JSON Flexibility - JSON fields for arrays/flexible data (affected_systems, technologies_used)
Next Steps for Database Implementation
- ✅ Schema designed (27 tables, relationships defined)
- ⏳ Create Alembic migration files
- ⏳ Set up encryption key management
- ⏳ Seed initial data (tags, MSP infrastructure)
- ⏳ Create database on Jupiter MariaDB
- ⏳ Build FastAPI models (SQLAlchemy + Pydantic)
- ⏳ Implement API endpoints
- ⏳ Create authentication flow
- ⏳ Build MSP Mode slash command integration
Open Questions
MSP Mode Behaviors (DEFINED)
Core Principle: Automatically categorize client interactions and store useful data in brief but information-dense format.
When /msp is Called (Session Start)
Phase 0: Machine Detection (FIRST - before everything)
Main Claude launches Machine Detection Agent:
Agent performs:
- Execute: `hostname` → "ACG-M-L5090"
- Execute: `whoami` → "MikeSwanson"
- Detect: platform → "win32"
- Detect: home_dir → "C:\Users\MikeSwanson"
- Generate fingerprint: SHA256(hostname|username|platform|home_dir)
Agent queries database:
SELECT * FROM machines WHERE machine_fingerprint = 'abc123...'
If machine NOT found (first time on this machine):
- Create machine record with auto-detected info
- Prompt user: "New machine detected: ACG-M-L5090. Please configure:"
- Friendly name? (e.g., "Main Laptop")
- Machine type? (laptop/desktop)
- Has VPN access? Which profiles?
- Docker installed? PowerShell version?
- Store capabilities in machines table
If machine found:
- Update last_seen timestamp
- Load machine capabilities
- Check for tool version changes (optional)
Agent returns to Main Claude:
Machine Context:
- machine_id: uuid-123
- friendly_name: "Main Laptop"
- capabilities: VPN (dataforth, grabb), Docker 24.0, PS 7.4
- limitations: None
Main Claude stores machine_id for session tracking.
Phase 1: Client/Project Detection
1. Auto-detect from context:
   - Mentions of client names, domains, IPs
   - If ambiguous, present quick-select list of recent clients
   - Prompt: "Working on: [Client] - [Project]? (or select different)"
2. Check VPN requirements:
   - If client requires VPN (e.g., Dataforth): check if current machine has VPN capability
   - If VPN available on this machine: confirm to user ("Dataforth requires VPN - ACG-M-L5090 has VPN access ✓")
   - If VPN not available: warn ("Travel-Laptop doesn't have VPN access - some operations may be limited")
Phase 2: Session Initialization
1. Start timer automatically
2. Create session record with:
   - session_date (today)
   - start_time (now)
   - client_id, project_id (detected or selected)
   - machine_id (from Machine Detection Agent)
   - status = 'in_progress'
3. Context display:
   - Show brief summary: "MSP Mode: [Client] - [Project] | Machine: Main Laptop | Started: [time]"
   - Machine capabilities displayed if relevant
   - Load relevant context: recent sessions, open tasks, credentials for this client
During Session (Automatic Tracking)
Auto-Categorization - Work Items (Agent-Based): As work progresses, Main Claude tracks actions, then periodically (or on-demand) launches categorization agent:
Agent Task: "Analyze recent work and categorize"
Agent receives:
- Conversation transcript since last categorization
- Commands executed
- Files modified
- User questions/issues mentioned
Agent performs:
1. Category detection - keywords trigger categories:
   - "ssh", "docker restart" → infrastructure
   - "error", "not working", "broken" → troubleshooting
   - "configure", "setup", "change settings" → configuration
   - "build", "code", "implement" → development
   - "cleanup", "optimize", "backup" → maintenance
   - "malware", "breach", "unauthorized" → security
2. Technology tagging:
   - Auto-detect from commands/context: docker, apache, mariadb, m365, etc.
   - Returns technologies_used array
3. Affected systems:
   - Extract IPs, hostnames from commands
   - Returns affected_systems array
4. Dense description generation:
   - Problem: [what was wrong]
   - Cause: [root cause if identified]
   - Fix: [solution applied]
   - Verify: [how confirmed]
Agent returns structured work_item data:
{
"category": "troubleshooting",
"title": "Fixed Apache SSL certificate expiration",
"description": "Problem: ERR_SSL_PROTOCOL_ERROR\nCause: Cert expired 2026-01-10\nFix: certbot renew, restarted apache\nVerify: curl test successful",
"technologies_used": ["apache", "ssl", "certbot"],
"affected_systems": ["jupiter", "172.16.3.20"],
"status": "completed"
}
Main Claude: Presents to user, stores to database via API
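A sketch of that storage step, assuming the draft `/api/v1/work-items` endpoint and the JSON structure above; field names remain subject to the final API.

```python
import requests

def store_work_item(api_url: str, token: str, session_id: str, item: dict) -> str:
    """POST an agent-generated work item to the MSP API and return its new id."""
    resp = requests.post(
        f"{api_url}/api/v1/work-items",
        headers={"Authorization": f"Bearer {token}"},
        json={"session_id": session_id, **item},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["id"]
```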
Information-Dense Data Capture:
1. Commands Run:
   - Auto-log every bash/powershell/SQL command executed
   - Store: command_text, host, shell_type, success, output_summary (first/last 5 lines)
   - Link to current work_item
2. File Changes:
   - Track when files are read/edited/written
   - Store: file_path, change_type, backup_path (if created), size_bytes
   - Brief description (auto-generated: "Modified Apache config for SSL")
3. Problems & Solutions:
   - When user describes an error, auto-create problem_solution record:
     - symptom: "Users can't access website"
     - error_message: "ERR_CONNECTION_TIMED_OUT"
     - investigation_steps: [array of diagnostic commands]
     - root_cause: "Firewall blocking port 443"
     - solution_applied: "Added iptables ACCEPT rule for 443"
     - verification_method: "curl test successful"
4. Credentials Accessed:
   - When retrieving credentials, log to credential_audit_log:
     - action: 'decrypt' or 'view'
     - credential_id
     - user_id (from JWT)
     - timestamp
   - Don't log the actual credential value (security)
5. Infrastructure Changes:
   - Detect infrastructure modifications:
     - DNS changes → change_type: 'dns'
     - Firewall rules → change_type: 'firewall'
     - Service configs → change_type: 'service_config'
   - Store before_state, after_state, rollback_procedure
Concise Summaries:
- Auto-generate brief descriptions:
- Work item title: "Fixed Apache SSL certificate expiration on jupiter"
- Problem description: "Website down: cert expired, renewed via certbot, verified"
- Not verbose: avoid "I then proceeded to...", just facts
Billability Detection
Auto-flag billable work:
- Client work (non-internal) → is_billable = true by default
- Internal infrastructure → is_billable = false
- User can override with quick command: "/billable false"
Time allocation:
- Track time per work_item (start when created, end when completed)
- Aggregate to session total
Session End Behavior (Agent-Based)
When /msp end or /normal is called:
Main Claude launches Session Summary Agent
Agent Task: "Generate comprehensive session summary with dense format"
Agent receives:
- Full session data from main Claude
- All work_items created during session
- Commands executed log
- Files modified log
- Problems solved
- Credentials accessed
- Infrastructure changes
Agent performs:
1. Analyzes work patterns:
   - Primary category (most frequent)
   - Time allocation per category
   - Key outcomes
2. Generates dense summary:

   Session: [Client] - [Project]
   Duration: [duration]
   Category: [primary category based on work_items]
   Work Completed:
   - [Concise bullet: category, title, affected systems]
   - [Another item]
   Problems Solved: [count]
   - [Error] → [Solution]
   Infrastructure Changes: [count]
   - [System]: [change type] - [brief description]
   Commands Run: [count] | Files Modified: [count]
   Technologies: [tag list]
   Billable: [yes/no] | Hours: [calculated]

3. Structures data for API:
   - Complete session object
   - All related: work_items, commands_run, file_changes, etc.
   - Auto-calculated fields: duration, billable_hours, category distribution
Agent returns: Structured summary + API-ready payload
Main Claude:
1. Presents summary to user:
   - Shows generated summary
   - "Save session? (y/n)"
   - "Billable hours: [auto-calculated] - adjust? (or press Enter)"
   - "Add notes? (or press Enter to skip)"
2. Stores to database:
   - POST to API: /api/v1/sessions
   - Agent's structured payload sent
   - API returns session_id
3. Generates session log file (optional):
   - Create markdown file in session-logs/
   - Format similar to current session logs but auto-generated
   - Include all dense information captured
Context Saved: Agent processed entire session history, main Claude only receives summary and confirmation prompts.
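For illustration, a sketch of the API-ready payload the Session Summary Agent might hand back for POST /api/v1/sessions. The keys mirror the tables described in this spec, but the exact JSON shape is an assumption until the API contract is finalized.

```python
session_payload = {
    "client_id": "dataforth-uuid",            # hypothetical IDs
    "session_title": "Dataforth - Apache SSL renewal",
    "duration_minutes": 210,
    "is_billable": True,
    "billable_hours": 3.5,
    "summary": "Fixed Apache SSL certificate expiration on jupiter",
    "work_items": [
        {
            "title": "Renewed SSL cert via certbot",
            "category": "troubleshooting",
            "tags": ["apache", "ssl", "jupiter"],
        }
    ],
    "commands_run": [],             # filled from the auto-captured command log
    "file_changes": [],
    "problems_solved": [],
    "infrastructure_changes": [],
}
# Main Claude POSTs session_payload to /api/v1/sessions after user confirmation
```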
Information Density Examples
Dense (Good):
Problem: Apache crash on jupiter
Error: segfault in mod_php
Cause: PHP 8.1 incompatibility
Fix: Downgraded to PHP 7.4, restarted apache
Verify: Website loads, no errors in logs
Files: /etc/apache2/mods-enabled/php*.conf
Commands: 3 (apt, systemctl, curl)
Verbose (Avoid):
I first investigated the Apache crash by checking the error logs.
Then I noticed that there was a segmentation fault in the mod_php module.
After some research, I determined this was due to a PHP version incompatibility.
I proceeded to downgrade PHP from version 8.1 to version 7.4.
Once that was complete, I restarted the Apache service.
Finally, I verified the fix by loading the website and checking the logs.
Dense storage = More information, fewer words.
Credential Handling (Agent-Based)
Storage:
- New credentials discovered → prompt: "Store credential for [service]? (y/n)"
- If yes → Credential Storage Agent:
- Receives: credential data, client context, service info
- Encrypts credential with AES-256-GCM
- Links to client_id, service_id, infrastructure_id
- Stores via API: POST /api/v1/credentials
- Returns: credential_id
- Main Claude confirms to user: "Stored [service] credential (ID: abc123)"
Retrieval:
- When credential needed, Main Claude launches Credential Retrieval Agent:
Agent Task: "Retrieve credential for AD2\sysadmin"
Agent performs:
- Query API: GET /api/v1/credentials?service=AD2&username=sysadmin
- Decrypt credential (API handles this)
- Log access to credential_audit_log:
- Who (JWT user_id)
- When (timestamp)
- What (credential_id, service_name)
- Why (current session_id, work_item context)
- Return only the credential value
Agent returns: "Paper123!@#"
Main Claude: Displays to user in context (e.g., "Using AD2\sysadmin password from vault")
Audit:
- Every credential access logged automatically by agent
- Main Claude doesn't see audit details (reduces context usage)
- Audit queryable later: "Show all credential access for last month"
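A minimal sketch of the retrieval-plus-audit flow the Credential Retrieval Agent performs. The /credentials and /credential-audit-log endpoint paths and query parameters are assumptions based on the description above; decryption happens server-side as stated.

```python
import requests

API_BASE = "https://msp-api.example.internal/api/v1"  # hypothetical

def retrieve_credential(token: str, service: str, username: str,
                        session_id: str) -> str:
    headers = {"Authorization": f"Bearer {token}"}
    # 1. Look up the credential (API decrypts server-side)
    resp = requests.get(f"{API_BASE}/credentials",
                        params={"service": service, "username": username},
                        headers=headers, timeout=10)
    resp.raise_for_status()
    cred = resp.json()[0]
    # 2. Log the access; the credential value itself is never written to the audit log
    requests.post(f"{API_BASE}/credential-audit-log", json={
        "action": "decrypt",
        "credential_id": cred["id"],
        "session_id": session_id,
    }, headers=headers, timeout=10).raise_for_status()
    # 3. Return only the decrypted value to the main instance
    return cred["value"]
```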
Auto-Tagging
As work progresses, auto-apply tags:
- Mention "docker" → tag: docker
- Working on "jupiter" → tag: jupiter
- Client "Dataforth" → tag: dataforth
- Error "connection-timeout" → tag: connection-timeout
- Action "migration" → tag: migration
Tag categories:
- technology (docker, apache, mariadb)
- infrastructure (jupiter, pfsense)
- client (dataforth)
- problem_type (ssl-error, connection-timeout)
- action (migration, upgrade, cleanup)
Tags stored in work_item_tags and session_tags junction tables.
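A simple keyword-to-tag mapping sketch of the auto-tagging behavior. The rule table below is an illustrative subset; in practice the tag list comes from the pre-identified tags in the database.

```python
import re

TAG_RULES = {  # keyword pattern -> (tag, category); illustrative subset
    r"\bdocker\b": ("docker", "technology"),
    r"\bapache\b": ("apache", "technology"),
    r"\bjupiter\b": ("jupiter", "infrastructure"),
    r"\bdataforth\b": ("dataforth", "client"),
    r"connection.?timeout": ("connection-timeout", "problem_type"),
    r"\bmigration\b": ("migration", "action"),
}

def auto_tags(text: str) -> list[tuple[str, str]]:
    """Return (tag, category) pairs detected in a work item description."""
    found = {tag_cat for pattern, tag_cat in TAG_RULES.items()
             if re.search(pattern, text, re.IGNORECASE)}
    return sorted(found)

# auto_tags("Dataforth: docker migration on jupiter hit a connection timeout")
# -> tags for client, technology, infrastructure, action, problem_type
```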
Context Awareness (Agent-Based)
When MSP session starts, Main Claude launches Context Recovery Agent:
Agent Task: "Retrieve relevant context for [Client]"
Agent queries in parallel:
- Previous sessions (last 5): GET /api/v1/sessions?client=Dataforth&limit=5
- Open pending tasks: GET /api/v1/pending-tasks?client=Dataforth&status=pending
- Recent credentials: GET /api/v1/credentials?client=Dataforth&recently_used=true
- Infrastructure: GET /api/v1/infrastructure?client=Dataforth
Agent processes and summarizes:
Context for Dataforth:
Last session: 2026-01-10 - DOS UPDATE.BAT v2.0 completion (3.5 hrs)
Open tasks (2):
- Datasheets share creation (blocked: waiting on Engineering)
- Engineer NAS access documentation (pending)
Infrastructure: AD2 (192.168.0.6), D2TESTNAS (192.168.0.9), 30x TS machines
Available credentials: AD2\sysadmin, NAS root (last used: 2026-01-10)
Agent returns concise summary
Main Claude:
- Displays context to user
- Auto-suggests: "Continue with datasheets share setup?"
- Has context for intelligent suggestions without full history in main context
During session, on-demand context retrieval:
User: "What did we do about backups for this client?"
Main Claude launches Historical Search Agent:
Agent Task: "Search Dataforth sessions for backup-related work"
Agent:
- Queries: GET /api/v1/sessions?client=Dataforth&search=backup
- Finds 3 sessions with backup work
- Extracts key outcomes
- Returns: "Found 3 backup-related sessions: 2025-12-14 (NAS setup), 2025-12-20 (Veeam config), 2026-01-05 (sync testing)"
Main Claude presents concise answer to user
Context Saved: Agent processed potentially megabytes of session data, returned 100-word summary.
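A sketch of how the Context Recovery Agent could issue its four lookups concurrently. It assumes the GET endpoints listed above and an async HTTP client such as httpx; the base URL is hypothetical.

```python
import asyncio
import httpx

API_BASE = "https://msp-api.example.internal/api/v1"  # hypothetical

async def recover_context(token: str, client_name: str) -> dict:
    headers = {"Authorization": f"Bearer {token}"}
    async with httpx.AsyncClient(base_url=API_BASE, headers=headers) as api:
        sessions, tasks, creds, infra = await asyncio.gather(
            api.get("/sessions", params={"client": client_name, "limit": 5}),
            api.get("/pending-tasks", params={"client": client_name, "status": "pending"}),
            api.get("/credentials", params={"client": client_name, "recently_used": True}),
            api.get("/infrastructure", params={"client": client_name}),
        )
    # The agent condenses these four responses into a < 300-word summary for main Claude
    return {
        "recent_sessions": sessions.json(),
        "open_tasks": tasks.json(),
        "credentials": creds.json(),
        "infrastructure": infra.json(),
    }
```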
Agent Types & Responsibilities
MSP Mode uses multiple specialized agents to preserve main context:
1. Context Recovery Agent
Launched: Session start (/msp command)
Purpose: Load relevant client context
Tasks:
- Query previous sessions (last 5)
- Retrieve open pending tasks
- Get recently used credentials
- Fetch infrastructure topology
Returns: Concise context summary (< 300 words)
API Calls: 4-5 parallel GET requests
Context Saved: ~95% (processes MB of data, returns summary)
2. Work Categorization Agent
Launched: Periodically during session or on-demand
Purpose: Analyze and categorize recent work
Tasks:
- Parse conversation transcript
- Extract commands, files, systems, technologies
- Detect category (infrastructure, troubleshooting, etc.)
- Generate dense description
- Auto-tag work items
Returns: Structured work_item object (JSON)
Context Saved: ~90% (processes conversation, returns structured data)
3. Session Summary Agent
Launched: Session end (/msp end or mode switch)
Purpose: Generate comprehensive session summary
Tasks:
- Analyze all work_items from session
- Calculate time allocation per category
- Generate dense markdown summary
- Structure data for API storage
- Create billable hours calculation
Returns: Summary + API-ready payload
Context Saved: ~92% (processes full session, returns summary)
4. Credential Retrieval Agent
Launched: When credential needed
Purpose: Securely retrieve and decrypt credentials
Tasks:
- Query credentials API
- Decrypt credential value
- Log access to audit trail
- Return only credential value
Returns: Single credential string
API Calls: 2 (retrieve + audit log)
Context Saved: ~98% (credential + minimal metadata)
5. Credential Storage Agent
Launched: When new credential discovered
Purpose: Encrypt and store credential securely
Tasks:
- Validate credential data
- Encrypt with AES-256-GCM
- Link to client/service/infrastructure
- Store via API
- Create audit log entry
Returns: credential_id confirmation
Context Saved: ~99% (only ID returned)
6. Historical Search Agent
Launched: On-demand (user asks about past work)
Purpose: Search and summarize historical sessions
Tasks:
- Query sessions database with filters
- Parse matching sessions
- Extract key outcomes
- Generate concise summary
Returns: Brief summary of findings
Example: "Found 3 backup sessions: [dates] - [outcomes]"
Context Saved: ~95% (processes potentially 100s of sessions)
7. Integration Workflow Agent
Launched: Multi-step integration requests
Purpose: Execute complex workflows with external tools
Tasks:
- Search external ticketing systems
- Generate work summaries
- Update tickets with comments
- Pull reports from backup systems
- Attach files to tickets
- Track all integrations in database
Returns: Workflow completion summary
API Calls: 5-10+ external + internal calls
Context Saved: ~90% (handles large files, API responses)
Example: SyncroMSP ticket update + MSP Backups report workflow
8. Problem Pattern Matching Agent
Launched: When user describes an error/issue
Purpose: Find similar historical problems
Tasks:
- Parse error description
- Search problem_solutions table
- Extract relevant solutions
- Rank by similarity
Returns: Top 3 similar problems with solutions
Context Saved: ~94% (searches all problems, returns matches)
9. Database Query Agent
Launched: Complex reporting or analytics requests
Purpose: Execute complex database queries
Tasks:
- Build SQL queries with filters/joins
- Execute query via API
- Process result set
- Generate summary statistics
- Format for presentation
Returns: Summary statistics + key findings
Example: "Dataforth - Q4 2025: 45 sessions, 120 hours, $12,000 billed"
Context Saved: ~93% (processes large result sets)
10. Integration Search Agent
Launched: Searching external systems
Purpose: Query SyncroMSP, MSP Backups, etc.
Tasks:
- Authenticate with external API
- Execute search query
- Parse results
- Summarize findings
Returns: Concise list of matches
API Calls: 1-3 external API calls
Context Saved: ~90% (handles API pagination, large response)
11. Failure Analysis Agent
Launched: When commands/operations fail, or periodically to analyze patterns
Purpose: Learn from failures to prevent future mistakes
Tasks:
- Log all command/operation failures with full context
- Analyze failure patterns across sessions
- Identify environmental constraints (e.g., "Server 2008 can't run PS7 cmdlets")
- Update infrastructure environmental_notes
- Generate/update insights.md from failure database
- Create actionable resolutions
Returns: Updated insights, environmental constraints
Context Saved: ~94% (analyzes failures, returns key learnings)
12. Environment Context Agent
Launched: Before making suggestions or running commands on infrastructure
Purpose: Check environmental constraints and insights to avoid known failures
Tasks:
- Query infrastructure environmental_notes
- Read insights.md for client/infrastructure
- Check failure history for similar operations
- Validate command compatibility with environment
- Return constraints and recommendations
Returns: Environmental context + compatibility warnings
Example: "D2TESTNAS: Manual WINS install (no native service), ReadyNAS OS, SMB1 only"
Context Saved: ~96% (processes failure history, returns summary)
13. Machine Detection Agent
Launched: Session start, before any other agents
Purpose: Identify current machine and load machine-specific context
Tasks:
- Execute hostname, whoami, detect platform
- Generate machine fingerprint (SHA256 hash)
- Query machines table for existing record
- If new machine: Create record, prompt user for capabilities
- If known machine: Load capabilities, VPN access, tool versions
- Update last_seen timestamp
- Check for tool updates/changes since last session
Returns: Machine context (machine_id, capabilities, limitations)
Example: "ACG-M-L5090: VPN access (dataforth, grabb), Docker 24.0, PowerShell 7.4"
Context Saved: ~97% (machine profile loaded, only key capabilities returned)
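A sketch of the machine fingerprint described above, combining hostname, username, platform, and home directory into a SHA256 hash. The exact field order and separator are assumptions consistent with the fingerprint formula noted in the change log.

```python
import getpass
import hashlib
import platform
import socket
from pathlib import Path

def machine_fingerprint() -> str:
    """Stable identifier for the current technician machine."""
    parts = [
        socket.gethostname(),        # hostname
        getpass.getuser(),           # whoami
        platform.system().lower(),   # platform family (windows / darwin / linux)
        str(Path.home()),            # home directory
    ]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()

# The Machine Detection Agent looks this hash up in the machines table;
# a miss means a new machine record should be created.
```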
Agent Execution Patterns
Sequential Agent Chain
Pattern: Agent A completes → Agent B starts with A's output
Example: Session End
- Work Categorization Agent → categorizes final work
- Session Summary Agent → uses categorized work to generate summary
- Database Storage → API call with structured data
Parallel Agent Execution
Pattern: Multiple agents run simultaneously
Example: Session Start
- Context Recovery Agent (previous sessions)
- Credential Cache Agent (load frequently used)
- Infrastructure Topology Agent (load network map)
- All return to main Claude in parallel (fastest wins)
On-Demand Agent
Pattern: Launched only when needed
Example: User asks: "What's the password for AD2?"
- Main Claude launches Credential Retrieval Agent
- Agent returns credential
- Main Claude displays to user
Background Agent
Pattern: Agent runs while user continues working
Example: Large report generation
- User continues conversation
- Report Generation Agent processes in background
- Notifies when complete
Failure-Aware Agent Chain
Pattern: Environment check → Operation → Failure logging → Pattern analysis
Example: Command execution on infrastructure
- Environment Context Agent checks constraints before suggesting command
- Command executes (success or failure)
- If failure: Failure Analysis Agent logs detailed failure
- Pattern analysis identifies if this is a recurring issue
- Environmental insights updated
- Future suggestions avoid this failure
Failure Logging & Environmental Awareness System
Core Principle: Every failure is a learning opportunity. Agents must never make the same mistake twice.
Failure Logging Workflow
1. Command Execution with Failure Tracking
When Main Claude or agent executes a command:
User: "Check WINS status on D2TESTNAS"
Main Claude launches Environment Context Agent:
- Queries infrastructure table for D2TESTNAS
- Reads environmental_notes: "Manual WINS install, no native service"
- Reads environmental_insights for D2TESTNAS
- Returns: "D2TESTNAS has manually installed WINS (not native ReadyNAS service)"
Main Claude suggests command based on environmental context:
- NOT: "Check Services GUI for WINS service" (WRONG - no GUI service)
- CORRECT: "ssh root@192.168.0.9 'systemctl status nmbd'" (right for manual install)
If command fails:
- Log to commands_run table:
- success = false
- exit_code = 1
- error_message = "systemctl: command not found"
- failure_category = "command_compatibility"
- Trigger Failure Analysis Agent:
- Analyzes error: ReadyNAS doesn't use systemd
- Identifies correct approach: "service nmbd status" or "ps aux | grep nmbd"
- Creates failure_pattern entry
- Updates environmental_insights with correction
- Returns resolution to Main Claude
Main Claude tries corrected command:
- Executes: "ssh root@192.168.0.9 'ps aux | grep nmbd'"
- Success = true
- Updates original failure record with resolution
2. Environmental Insights Generation
Failure Analysis Agent runs periodically (or after N failures):
Agent Task: "Analyze recent failures and update environmental insights"
Agent performs:

1. Query failures:
   - All unresolved command failures
   - All operation failures
   - Group by infrastructure_id, client_id, pattern_type

2. Identify patterns:
   - "Get-LocalUser on Server 2008" → 5 occurrences
   - Pattern: Server 2008 has PowerShell 2.0 only
   - Solution: Use Get-WmiObject Win32_UserAccount instead

3. Create/update failure_patterns:

   INSERT INTO failure_patterns (
     infrastructure_id, pattern_type, pattern_signature, error_pattern,
     failure_description, root_cause, recommended_solution,
     occurrence_count, severity
   ) VALUES (
     :infrastructure_id,
     'command_compatibility',
     'PowerShell 7 cmdlets on Server 2008',
     'Get-LocalUser.*not recognized',
     'Modern PowerShell cmdlets fail on Server 2008',
     'Server 2008 only has PowerShell 2.0',
     'Use Get-WmiObject Win32_UserAccount instead',
     5,
     'major'
   );

4. Generate environmental_insights:

   INSERT INTO environmental_insights (
     infrastructure_id, insight_category, insight_title, insight_description,
     examples, confidence_level, verification_count, priority
   ) VALUES (
     :infrastructure_id,
     'version_limitations',
     'Server 2008: PowerShell 2.0 command compatibility',
     '**PowerShell Version**: 2.0 only\n**Avoid**: Get-LocalUser, Get-LocalGroup, etc.\n**Use instead**: Get-WmiObject Win32_UserAccount',
     '["Get-WmiObject Win32_UserAccount", "Get-WmiObject Win32_Group"]',
     'confirmed',
     5,
     8
   );

5. Update infrastructure environmental_notes:

   UPDATE infrastructure
   SET environmental_notes = 'Server 2008 R2. PowerShell 2.0 only (no modern cmdlets). Use WMI for user/group management.'
   WHERE hostname = 'old-server';

6. Generate insights.md file:
   - Query all environmental_insights for client
   - Format as markdown
   - Store in D:\ClaudeTools\insights[client-name].md
   - Agents read this file before making suggestions
Agent returns: "Updated 3 failure patterns, added 2 insights for Dataforth"
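A sketch of how the Failure Analysis Agent might render environmental_insights rows into a per-client insights.md file. The row shape follows the insert above; the exact output filename and directory layout are assumptions.

```python
from pathlib import Path

def write_insights_md(client_name: str, insights: list[dict],
                      out_dir: str = r"D:\ClaudeTools") -> Path:
    """Render environmental_insights rows (highest priority first) to markdown."""
    lines = [f"# Environmental Insights: {client_name}", ""]
    for row in sorted(insights, key=lambda r: r["priority"], reverse=True):
        lines += [
            f"## {row['insight_title']}",
            f"*Category: {row['insight_category']} | "
            f"Confidence: {row['confidence_level']} | Priority: {row['priority']}*",
            "",
            row["insight_description"],
            "",
        ]
    path = Path(out_dir) / f"insights-{client_name.lower()}.md"   # hypothetical filename
    path.write_text("\n".join(lines), encoding="utf-8")
    return path
```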
3. Environment Context Agent Pre-Check
Before suggesting commands/operations:
Agent Task: "Check environmental constraints for D2TESTNAS before command suggestion"
Agent performs:

1. Query infrastructure:
   - Get environmental_notes
   - Get powershell_version, shell_type, limitations

2. Query environmental_insights:
   - Get all insights for this infrastructure
   - Sort by priority (high first)

3. Query failure_patterns:
   - Get patterns affecting this infrastructure
   - Check if proposed command matches any error_pattern

4. Check command compatibility:
   - Proposed: "Get-Service WINS"
   - Infrastructure: has_gui = true, powershell_version = "5.1"
   - Insights: "WINS manually installed, no native service"
   - Result: INCOMPATIBLE - suggest alternative
Agent returns:
Environmental Context for D2TESTNAS:
- ReadyNAS OS (Linux-based)
- Manual WINS installation (Samba nmbd)
- No native Windows services
- Access via SSH only
- SMB1/CORE protocol for DOS compatibility
Recommended commands:
✓ ssh root@192.168.0.9 'ps aux | grep nmbd'
✓ ssh root@192.168.0.9 'cat /etc/frontview/samba/smb.conf.overrides | grep wins'
✗ Check Services GUI (no GUI service manager)
✗ Get-Service (not Windows)
Main Claude uses this context to suggest correct approach.
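A minimal sketch of the compatibility check: the proposed command is matched against the command-name portion of stored failure_patterns regexes before it is suggested. Table access is stubbed with plain dicts, and the field names follow the schema described above.

```python
import re

def check_command_compatibility(command: str, patterns: list[dict]) -> list[str]:
    """Return warnings for any known failure pattern the command would hit."""
    warnings = []
    for p in patterns:  # rows from failure_patterns for this infrastructure
        if re.search(p["error_pattern"], command, re.IGNORECASE):
            warnings.append(
                f"Known issue: {p['pattern_signature']} - {p['recommended_solution']}"
            )
    return warnings

# Example using the Server 2008 pattern from this spec:
patterns = [{
    "pattern_signature": "Modern PowerShell cmdlets on Server 2008",
    "error_pattern": r"(Get-LocalUser|Get-LocalGroup|New-LocalUser)",
    "recommended_solution": "Use Get-WmiObject Win32_UserAccount instead",
}]
print(check_command_compatibility("Get-LocalUser -Name sysadmin", patterns))
```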
4. Real-World Examples from Your Feedback
Example 1: D2TESTNAS WINS Service
Problem: Claude suggested "Check Services GUI for WINS"
Failure: User had to correct - WINS is manually installed, no GUI service
Solution after failure logging:
1. Failure logged:
- operation_type: 'user_instruction_invalid'
- error_message: 'WINS is manually installed on D2TESTNAS, no native service GUI'
- target_system: 'D2TESTNAS'
2. Environmental insight created:
- infrastructure_id: D2TESTNAS
- insight_category: 'custom_installations'
- insight_title: 'WINS: Manual Samba installation'
- insight_description: 'WINS service manually installed via Samba nmbd. Not a native ReadyNAS service. No GUI service manager available.'
- examples: ["ssh root@192.168.0.9 'ps aux | grep nmbd'"]
- priority: 9 (high - avoid wasting user time)
3. Future behavior:
- Environment Context Agent checks before suggesting WINS commands
- Returns: "D2TESTNAS has manual WINS install (no GUI)"
- Main Claude suggests SSH commands instead
Example 2: PowerShell 7 on Server 2008
Problem: Suggested Get-LocalUser on Server 2008
Failure: Command not recognized (PowerShell 2.0 only)
Solution after failure logging:
1. Command failure logged:
- command_text: 'Get-LocalUser'
- host: 'old-server-2008'
- success: false
- error_message: 'Get-LocalUser : The term Get-LocalUser is not recognized'
- failure_category: 'compatibility'
2. Failure pattern created:
- pattern_signature: 'Modern PowerShell cmdlets on Server 2008'
- error_pattern: '(Get-LocalUser|Get-LocalGroup|New-LocalUser).*not recognized'
- root_cause: 'Server 2008 has PowerShell 2.0 (no modern user management cmdlets)'
- recommended_solution: 'Use Get-WmiObject Win32_UserAccount'
3. Infrastructure updated:
- powershell_version: '2.0'
- limitations: ["no_modern_cmdlets", "no_get_local*_commands"]
- environmental_notes: 'PowerShell 2.0 only. Use WMI for user/group management.'
4. Future behavior:
- Environment Context Agent warns: "Server 2008 has PS 2.0 - modern cmdlets unavailable"
- Main Claude suggests WMI alternatives automatically
Example 3: DOS Batch File Syntax
Problem: Used IF /I (case insensitive) in DOS batch file
Failure: IF /I not recognized in MS-DOS 6.22
Solution:
1. Command failure logged:
- command_text: 'IF /I "%1"=="STATUS" GOTO STATUS'
- host: 'TS-27'
- error_message: 'Invalid switch - /I'
- failure_category: 'environmental_limitation'
2. Failure pattern created:
- pattern_signature: 'Modern batch syntax on MS-DOS 6.22'
- error_pattern: 'IF /I.*Invalid switch'
- root_cause: 'DOS 6.22 does not support /I flag (added in Windows 2000)'
- recommended_solution: 'Use duplicate IF statements for upper/lowercase'
- alternative_approaches: '["IF "%1"=="STATUS" GOTO STATUS", "IF "%1"=="status" GOTO STATUS"]'
3. Infrastructure environmental_notes:
- 'MS-DOS 6.22. No IF /I, no long filenames (8.3), no Unicode. Use basic batch only.'
4. Future behavior:
- Environment Context Agent checks OS version before batch suggestions
- Main Claude generates DOS 6.22 compatible batch files automatically
Benefits of Failure Logging System
1. Self-Improving System:
- Each failure makes the system smarter
- Patterns identified automatically
- Insights generated without manual documentation
2. Reduced User Friction:
- User doesn't have to keep correcting same mistakes
- Claude learns environmental constraints once
- Suggestions are environmentally aware from start
3. Institutional Knowledge Capture:
- All environmental quirks documented in database
- Survives across sessions and Claude instances
- Queryable: "What are known issues with D2TESTNAS?"
4. Proactive Problem Prevention:
- Environment Context Agent prevents failures before they happen
- Suggests compatible alternatives automatically
- Warns about known limitations
5. Audit Trail:
- Every failure tracked with full context
- Resolution history for troubleshooting
- Pattern analysis for infrastructure planning
Machine-Aware Operations
Core Principle: Command suggestions and operations are tailored to the capabilities of the current machine.
Real-World Scenarios
Scenario 1: VPN-Required Client Work
User on Main Laptop (has VPN):
User: "/msp" → selects Dataforth
Machine Detection Agent returns:
- Machine: ACG-M-L5090 (Main Laptop)
- VPN access: ✓ (dataforth, grabb, internal)
- Capabilities: Full toolset
Main Claude:
"MSP Mode: Dataforth | Machine: Main Laptop (VPN: ✓) | Started: 3:45 PM"
User: "SSH to D2TESTNAS and check WINS status"
Environment Context Agent:
- Current machine: Main Laptop (has VPN: dataforth)
- Target: D2TESTNAS (requires VPN to 192.168.0.9)
- ✓ VPN check passed
Main Claude suggests:
"ssh root@192.168.0.9 'ps aux | grep nmbd'"
User on Travel Laptop (NO VPN):
User: "/msp" → selects Dataforth
Machine Detection Agent returns:
- Machine: Travel-Laptop
- VPN access: ✗ (none configured)
- Capabilities: Limited toolset
Main Claude:
"⚠️ MSP Mode: Dataforth | Machine: Travel Laptop (VPN: ✗) | Started: 3:45 PM"
"Warning: Dataforth infrastructure requires VPN access. Travel-Laptop has no VPN configured."
"Some operations may be unavailable. Switch to Main Laptop for full access."
User: "SSH to D2TESTNAS"
Environment Context Agent:
- Current machine: Travel-Laptop (no VPN)
- Target: D2TESTNAS (requires VPN to 192.168.0.9)
- ✗ VPN check FAILED
Main Claude responds:
"Cannot access D2TESTNAS from Travel-Laptop (no VPN connection to 192.168.0.0/24)."
"Options:"
" 1. Switch to Main Laptop or Desktop (have VPN access)"
" 2. Configure VPN on this machine (I can help)"
" 3. Work on different client/task"
Scenario 2: Docker-Based Development
User on Main Laptop (has Docker):
User: "Build the GuruRMM container"
Machine Detection Agent: Main Laptop has Docker 24.0 ✓
Main Claude:
"Building GuruRMM container on Main Laptop..."
[executes docker build commands]
User on Travel Laptop (no Docker):
User: "Build the GuruRMM container"
Machine Detection Agent: Travel-Laptop has no Docker ✗
Main Claude:
"Travel-Laptop doesn't have Docker installed."
"Options:"
" 1. Install Docker (I can guide you)"
" 2. Switch to Main Laptop or Desktop (have Docker)"
" 3. Use remote build server (172.16.3.30)"
Scenario 3: PowerShell Version Differences
User on Main Laptop (PowerShell 7.4):
User: "Get local users on AD2"
Machine Detection Agent: Main Laptop has PowerShell 7.4 ✓
Main Claude suggests modern cmdlet:
"Get-LocalUser -ComputerName AD2"
User on old Desktop (PowerShell 5.1):
User: "Get local users on AD2"
Machine Detection Agent: Desktop has PowerShell 5.1
Environment Context Agent:
- Desktop: PowerShell 5.1 (no Get-LocalUser remote capability)
- Suggests WinRM or alternative
Main Claude:
"Invoke-Command -ComputerName AD2 -ScriptBlock { Get-LocalUser }"
(or suggests WMI alternative if WinRM not available)
Scenario 4: Session History Per Machine
Querying past work:
User: "What did I work on last from this laptop?"
Machine Detection Agent: Current machine = Main Laptop (machine_id: uuid-123)
Historical Search Agent:
Query: SELECT * FROM sessions WHERE machine_id = 'uuid-123' ORDER BY session_date DESC LIMIT 5
Returns:
"Recent sessions from Main Laptop:"
"1. Dataforth - DOS UPDATE.BAT v2.0 (Jan 15, 3.5 hrs)"
"2. Grabb & Durando - DNS migration (Jan 14, 2.0 hrs)"
"3. Internal - GuruRMM container build (Jan 13, 1.5 hrs)"
User: "What about from my desktop?"
Historical Search Agent:
Query: SELECT * FROM sessions WHERE machine_id = (SELECT id FROM machines WHERE friendly_name = 'Desktop')
Returns:
"Recent sessions from Desktop:"
"1. Valley Wide Plastering - M365 migration planning (Jan 12, 2.5 hrs)"
"2. Internal - Infrastructure upgrades (Jan 10, 4.0 hrs)"
Machine-Specific Insights
Machine capabilities inform command suggestions:
-- Before suggesting Docker command
SELECT has_docker FROM machines WHERE id = current_machine_id
-- Before suggesting SSH to client infrastructure
SELECT vpn_profiles FROM machines WHERE id = current_machine_id
-- Check if client's network is in vpn_profiles array
-- Before suggesting PowerShell cmdlets
SELECT powershell_version FROM machines WHERE id = current_machine_id
-- Use PS 2.0 compatible commands if version = "2.0"
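A sketch of the capability gate these queries imply: before suggesting an SSH command to client infrastructure, confirm the current machine has a VPN profile for that client. Function and field names are assumptions; vpn_profiles is treated as a simple list.

```python
def can_reach_client_network(machine: dict, client_slug: str) -> tuple[bool, str]:
    """machine is a row from the machines table (vpn_profiles stored as a list)."""
    if client_slug in machine.get("vpn_profiles", []):
        return True, f"{machine['friendly_name']}: VPN to {client_slug} available"
    return False, (
        f"{machine['friendly_name']} has no VPN profile for {client_slug}; "
        "switch machines or configure VPN first"
    )

main_laptop = {"friendly_name": "Main Laptop", "vpn_profiles": ["dataforth", "grabb"]}
travel_laptop = {"friendly_name": "Travel-Laptop", "vpn_profiles": []}

print(can_reach_client_network(main_laptop, "dataforth"))    # (True, ...)
print(can_reach_client_network(travel_laptop, "dataforth"))  # (False, ...)
```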
Benefits of Machine Tracking
1. Capability-Aware Suggestions:
- Never suggest Docker commands on machines without Docker
- Never suggest VPN-required access from non-VPN machines
- Use version-compatible syntax for PowerShell/tools
2. Session Portability:
- Know which sessions were done where
- Understand tool availability context for past work
- Resume work on appropriate machine
3. Troubleshooting Context:
- "This worked on Main Laptop but not Desktop" → Check tool versions
- Machine-specific environmental issues tracked
- Cross-machine compatibility insights
4. User Experience:
- Proactive warnings about capability limitations
- Helpful suggestions to switch machines when needed
- No wasted time trying commands that won't work
5. Multi-User MSP Team (Future):
- Track which technician on which machine
- Machine capabilities per team member
- Resource allocation (who has VPN access to which clients)
OS-Specific Command Selection
Core Principle: Never suggest Windows commands on Mac, Mac commands on Windows, or Linux-only commands on either.
Command Selection Logic
Machine Detection Agent provides platform context:
{
"platform": "win32", // or "darwin", "linux"
"preferred_shell": "powershell", // or "zsh", "bash", "cmd"
"package_manager_commands": {...}
}
Main Claude selects appropriate commands based on platform:
File Operations
| Task | Windows (win32) | macOS (darwin) | Linux |
|---|---|---|---|
| List files | `dir` or `Get-ChildItem` | `ls -la` | `ls -la` |
| Find file | `Get-ChildItem -Recurse -Filter` | `find . -name` | `find . -name` |
| Copy file | `Copy-Item` | `cp` | `cp` |
| Move file | `Move-Item` | `mv` | `mv` |
| Delete file | `Remove-Item` | `rm` | `rm` |
Process Management
| Task | Windows | macOS | Linux |
|---|---|---|---|
| List processes | `Get-Process` | `ps aux` | `ps aux` |
| Kill process | `Stop-Process -Name` | `killall` | `pkill` |
| Process tree | `Get-Process \| Select-Object` | `pstree` | `pstree` |
Network Operations
| Task | Windows | macOS | Linux |
|---|---|---|---|
| IP config | `ipconfig` | `ifconfig` | `ip addr` |
| DNS lookup | `nslookup` | `dig` | `dig` |
| Ping | `ping -n 4` | `ping -c 4` | `ping -c 4` |
| Port check | `Test-NetConnection -Port` | `nc -zv` | `nc -zv` |
Package Management
| Task | Windows (Chocolatey) | macOS (Homebrew) | Linux (apt/yum) |
|---|---|---|---|
| Install | `choco install {pkg}` | `brew install {pkg}` | `apt install {pkg}` |
| Update | `choco upgrade {pkg}` | `brew upgrade {pkg}` | `apt upgrade {pkg}` |
| Search | `choco search {pkg}` | `brew search {pkg}` | `apt search {pkg}` |
| List | `choco list --local` | `brew list` | `apt list --installed` |
MCP & Skill Availability Check
Before calling MCP or Skill:
// Machine Detection Agent returns available_mcps and available_skills
current_machine = {
"available_mcps": ["claude-in-chrome", "filesystem"],
"available_skills": ["pdf", "commit", "review-pr"]
}
// User requests: "Take a screenshot of this webpage"
// Check if claude-in-chrome MCP is available:
if (current_machine.available_mcps.includes("claude-in-chrome")) {
// Use mcp__claude-in-chrome__computer screenshot action
} else {
// Inform user: "claude-in-chrome MCP not available on this machine"
// Suggest: "Switch to Main Laptop (has claude-in-chrome MCP)"
}
// User requests: "/pdf" to export document
// Check if pdf skill is available:
if (current_machine.available_skills.includes("pdf")) {
// Execute pdf skill
} else {
// Inform user: "pdf skill not available on Travel-Laptop"
// Suggest: "Install pdf skill or switch to Main Laptop"
}
Real-World Example: Cross-Platform File Search
User on Windows (ACG-M-L5090):
User: "Find all Python files in the project"
Machine Detection Agent: platform = "win32", preferred_shell = "powershell"
Main Claude uses Windows-appropriate command:
"Get-ChildItem -Path . -Recurse -Filter *.py | Select-Object FullName"
OR (if using bash-style preference):
"dir /s /b *.py"
Same user on MacBook:
User: "Find all Python files in the project"
Machine Detection Agent: platform = "darwin", preferred_shell = "zsh"
Main Claude uses macOS-appropriate command:
"find . -name '*.py' -type f"
Shell-Specific Syntax
PowerShell (Windows):
# Variables
$var = "value"
# Conditionals
if ($condition) { }
# Loops
foreach ($item in $collection) { }
# Output
Write-Host "message"
Bash/Zsh (macOS/Linux):
# Variables
var="value"
# Conditionals
if [ "$condition" ]; then fi
# Loops
for item in $collection; do done
# Output
echo "message"
Batch/CMD (Windows legacy):
REM Variables
SET var=value
REM Conditionals
IF "%var%"=="value" ( )
REM Loops
FOR %%i IN (*) DO ( )
REM Output
ECHO message
Environment-Specific Path Separators
Machine Detection Agent provides path conventions:
| Platform | Path Separator | Home Directory | Example Path |
|---|---|---|---|
| Windows | `\` (backslash) | `C:\Users\{user}` | `C:\Users\MikeSwanson\Documents` |
| macOS | `/` (forward slash) | `/Users/{user}` | `/Users/mike/Documents` |
| Linux | `/` (forward slash) | `/home/{user}` | `/home/mike/documents` |
Main Claude constructs paths appropriately:
if (platform === "win32") {
path = `${home_directory}\\claude-projects\\${project}`
} else {
path = `${home_directory}/claude-projects/${project}`
}
Benefits
1. No Cross-Platform Errors:
- Windows commands never suggested on Mac
- Mac commands never suggested on Windows
- Shell syntax matches current environment
2. MCP/Skill Availability:
- Never attempt to call unavailable MCPs
- Suggest alternative machines if MCP needed
- Track which skills are installed where
3. Package Manager Intelligence:
- Use `choco` on Windows, `brew` on Mac, `apt` on Linux
- Correct syntax for each package manager
- Installation suggestions platform-appropriate
4. User Experience:
- Commands always work on current platform
- No manual translation needed
- Helpful suggestions when capabilities missing
Summary: MSP Mode = Smart Agent-Based Auto-Tracking
Architecture:
- Main Claude Instance: Conversation, decision-making, user interaction
- Specialized Agents: Data processing, queries, integrations, analysis
Benefits:
- Context Preservation: Main instance stays focused, agents handle heavy lifting
- Scalability: Parallel agents for concurrent operations
- Information Density: Agents process raw data, return summaries
- Separation of Concerns: Clean boundaries between conversation and data operations
User Experience:
- Auto-categorize work as it happens (via agents)
- Auto-extract structured data (via agents)
- Auto-tag based on content (via agents)
- Auto-detect billability (via agents)
- Auto-generate dense summaries (via agents)
- Auto-link related data (via agents)
- Minimal user input required - agents do the categorization
- Maximum information density - agents ensure brief but complete
Result: User just works, main Claude maintains conversation, agents capture everything in structured, queryable format.
MSP Tool Integrations (Future Capability)
Core Requirement: MSP Mode will integrate with external MSP platforms to automate workflows and link session data to ticketing/documentation systems.
Planned Integrations
1. SyncroMSP (PSA/RMM Platform)
- Tickets: Create, update, search, attach files
- Time Tracking: Log billable time from sessions automatically
- Assets: Link session work to customer assets
- Contacts: Associate work with customer contacts
- API: REST API with OAuth
2. MSP Backups (Backup Management)
- Reports: Pull backup status reports
- Alerts: Check backup failures
- Statistics: Get backup size, success rates
- Export: Attach reports to tickets
- API: REST API (vendor-specific)
3. Zapier (Automation Platform)
- Triggers: Session completed → Create ticket
- Actions: Ticket created → Start session
- Webhooks: Bi-directional automation
- API: REST API + webhooks
4. Future Integrations
- Autotask (PSA)
- ConnectWise (PSA)
- Datto RMM
- IT Glue (Documentation)
- Microsoft Teams (notifications)
- Email (session summaries)
Example Workflow (Agent-Based)
User request:
"Find the ticket for Dataforth about Backups - update it with the changes we've made today, and pull a report from MSP backups to attach to that ticket"
Main Claude MSP Mode execution:
Main Claude understands multi-step integration request, launches Integration Workflow Agent:
Agent Task: "Execute SyncroMSP ticket update and MSP Backups report workflow for Dataforth"
Agent receives context from Main Claude:
- Current session: client_id = Dataforth
- Current session work_items (today's work)
Agent executes workflow:

1. Search for ticket:
   - API call: SyncroMSP.search_tickets(customer="Dataforth", subject contains "Backup")
   - Result: Ticket #12345 "Backup configuration for NAS"

2. Generate work summary:
   - Processes current session work_items
   - Creates dense summary:

     Changes made today:
     - Configured Veeam backup job for D2TESTNAS
     - Set retention: 30 days local, 90 days cloud
     - Tested backup: successful (45GB)
     - Verified restore point creation

3. Update ticket:
   - API call: SyncroMSP.add_comment(ticket_id=12345, comment=summary)
   - Store in DB: POST /api/v1/external-integrations

     {
       "session_id": "current-session-id",
       "integration_type": "syncro_ticket",
       "external_id": "12345",
       "action": "updated",
       "request_data": {"comment": "..."},
       "response_data": {"comment_id": "67890"}
     }

4. Pull backup report:
   - API call: MSPBackups.get_report(customer="Dataforth", date=today)
   - Receives report PDF (could be large - agent handles it)
   - Summarizes key metrics: "Report: 5 backup jobs, all successful, total size 245GB"

5. Attach report to ticket:
   - API call: SyncroMSP.attach_file(ticket_id=12345, file=report_pdf)
   - Store attachment action in external_integrations

6. Create ticket link:
   - Store in DB: POST /api/v1/ticket-links

     {
       "session_id": "current-session-id",
       "client_id": "dataforth-id",
       "integration_type": "syncro",
       "ticket_id": "12345",
       "ticket_subject": "Backup configuration for NAS",
       "link_type": "documents"
     }
Agent returns concise summary:
Workflow complete:
✓ Found ticket #12345: Backup configuration for NAS
✓ Added work summary comment
✓ Pulled backup report (5 jobs, all successful, 245GB)
✓ Attached report to ticket
Main Claude: Presents summary to user, confirms completion
Context Saved:
- Agent handled all API calls, file transfers, database updates
- Main Claude only received 50-word summary
- Large PDF file never entered main context
- Multi-step workflow executed autonomously
Database Schema Additions for Integrations
New table: external_integrations
CREATE TABLE external_integrations (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
session_id UUID REFERENCES sessions(id) ON DELETE CASCADE,
work_item_id UUID REFERENCES work_items(id) ON DELETE CASCADE,
-- Integration details
integration_type VARCHAR(100) NOT NULL, -- 'syncro_ticket', 'msp_backups', 'zapier_webhook'
external_id VARCHAR(255), -- ticket ID, asset ID, etc.
external_url VARCHAR(500), -- direct link to resource
-- Action tracking
action VARCHAR(50), -- 'created', 'updated', 'linked', 'attached'
direction VARCHAR(20), -- 'outbound' (we pushed) or 'inbound' (they triggered)
-- Data
request_data TEXT, -- JSON: what we sent
response_data TEXT, -- JSON: what we received
-- Metadata
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
created_by VARCHAR(255), -- user who authorized
INDEX idx_ext_int_session (session_id),
INDEX idx_ext_int_type (integration_type),
INDEX idx_ext_int_external (external_id)
);
New table: integration_credentials
CREATE TABLE integration_credentials (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
integration_name VARCHAR(100) NOT NULL UNIQUE, -- 'syncro', 'msp_backups', 'zapier'
-- OAuth or API key
credential_type VARCHAR(50) CHECK(credential_type IN ('oauth', 'api_key', 'basic_auth')),
api_key_encrypted BYTEA,
oauth_token_encrypted BYTEA,
oauth_refresh_token_encrypted BYTEA,
oauth_expires_at TIMESTAMP,
-- Endpoints
api_base_url VARCHAR(500),
webhook_url VARCHAR(500),
-- Status
is_active BOOLEAN DEFAULT true,
last_tested_at TIMESTAMP,
last_test_status VARCHAR(50),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_int_cred_name (integration_name)
);
New table: ticket_links
CREATE TABLE ticket_links (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
session_id UUID REFERENCES sessions(id) ON DELETE CASCADE,
client_id UUID REFERENCES clients(id) ON DELETE CASCADE,
-- Ticket info
integration_type VARCHAR(100) NOT NULL, -- 'syncro', 'autotask', 'connectwise'
ticket_id VARCHAR(255) NOT NULL,
ticket_number VARCHAR(100), -- human-readable: "T12345"
ticket_subject VARCHAR(500),
ticket_url VARCHAR(500),
ticket_status VARCHAR(100),
-- Linking
link_type VARCHAR(50), -- 'related', 'resolves', 'documents'
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_ticket_session (session_id),
INDEX idx_ticket_client (client_id),
INDEX idx_ticket_external (integration_type, ticket_id)
);
API Integration Layer
FastAPI endpoints for integration management:
GET /api/v1/integrations (list configured integrations)
POST /api/v1/integrations/{name}/test (test connection)
GET /api/v1/integrations/{name}/credentials (get encrypted credentials)
PUT /api/v1/integrations/{name}/credentials (update credentials)
# Syncro-specific
GET /api/v1/syncro/tickets (search tickets)
POST /api/v1/syncro/tickets/{id}/comment (add comment)
POST /api/v1/syncro/tickets/{id}/attach (attach file)
POST /api/v1/syncro/time (log time entry)
# MSP Backups
GET /api/v1/mspbackups/report (pull report)
GET /api/v1/mspbackups/status/{client} (backup status)
# Zapier webhooks
POST /api/v1/webhooks/zapier (receive webhook)
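As a sketch of the integration layer's shape (not the final API contract), here is a FastAPI route for the ticket-comment endpoint listed above. The SyncroClient wrapper is a hypothetical placeholder for the real SyncroMSP REST calls.

```python
from fastapi import APIRouter, HTTPException
from pydantic import BaseModel

router = APIRouter(prefix="/api/v1/syncro", tags=["syncro"])

class TicketComment(BaseModel):
    session_id: str
    comment: str

class SyncroClient:                      # hypothetical wrapper around Syncro's REST API
    def add_comment(self, ticket_id: str, body: str) -> dict:
        raise NotImplementedError

@router.post("/tickets/{ticket_id}/comment")
def add_ticket_comment(ticket_id: str, payload: TicketComment):
    try:
        result = SyncroClient().add_comment(ticket_id, payload.comment)
    except NotImplementedError:
        raise HTTPException(status_code=501, detail="Syncro integration not configured")
    # The caller also records this action in external_integrations
    return {"ticket_id": ticket_id, "comment_id": result.get("id")}
```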
Workflow Automation
Session → Ticket Linking: When MSP Mode session ends:
- Ask user: "Link this session to a ticket? (y/n/search)"
- If search: query Syncro for tickets matching client
- If found: link session_id to ticket_id in ticket_links table
- Auto-post session summary as ticket comment (optional)
Auto Time Tracking: When session ends with billable hours:
- Ask: "Log 2.5 hours to SyncroMSP? (y/n)"
- If yes: POST to Syncro time tracking API
- Link time entry ID to session in external_integrations
Backup Report Automation: Trigger: User mentions "backup" in MSP session for client
- Detect keyword "backup"
- Auto-suggest: "Pull latest backup report for [Client]? (y/n)"
- If yes: Query MSPBackups API, display summary
- Option to attach to ticket or save to session
Permission & Security
OAuth Flow:
- User initiates: `/msp integrate syncro`
- Claude generates OAuth URL, user authorizes in browser
- Callback URL receives token, encrypts, stores in integration_credentials
- Refresh token used to maintain access
API Key Storage:
- All integration credentials encrypted with AES-256-GCM
- Same master key as credential storage
- Audit log for all integration credential access
Scopes:
- Read-only for initial implementation (search tickets, pull reports)
- Write access requires explicit user confirmation per action
- Never auto-update tickets without user approval
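A minimal sketch of AES-256-GCM encryption for stored integration credentials using the cryptography library. Key management is simplified here; as noted in Security Considerations, the master key must live outside the database.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_api_key(master_key: bytes, plaintext: str) -> bytes:
    """Returns nonce + ciphertext, suitable for a BYTEA column."""
    nonce = os.urandom(12)                       # 96-bit nonce, unique per encryption
    ct = AESGCM(master_key).encrypt(nonce, plaintext.encode("utf-8"), None)
    return nonce + ct

def decrypt_api_key(master_key: bytes, blob: bytes) -> str:
    nonce, ct = blob[:12], blob[12:]
    return AESGCM(master_key).decrypt(nonce, ct, None).decode("utf-8")

# master_key = AESGCM.generate_key(bit_length=256)  # stored in a separate key store
```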
Future Capabilities
Natural Language Integration:
- "Create a ticket for Dataforth about today's backup work"
- "Show me all open tickets for Grabb & Durando"
- "Pull the backup report for last week and email it to [contact]"
- "Log today's 3 hours to ticket T12345"
- "What tickets mention Apache or SSL?"
Multi-Step Workflows:
- Session ends → Auto-create ticket → Auto-log time → Auto-attach session summary
- Backup failure detected (via webhook) → Create session → Investigate → Update ticket
- Ticket created in Syncro (webhook) → Notify Claude → Start MSP session
Bi-Directional Sync:
- Ticket updated in Syncro → Webhook to Claude → Add to pending_tasks
- Session completed in Claude → Auto-comment in ticket
- Time logged in Claude → Synced to Syncro billing
Implementation Priority
Phase 1 (MVP):
- Database tables for integrations
- SyncroMSP ticket search and read
- Manual ticket linking
- Session summary → ticket comment (manual)
Phase 2:
- MSP Backups report pulling
- File attachments to tickets
- OAuth token refresh automation
- Auto-suggest ticket linking
Phase 3:
- Zapier webhook triggers
- Auto time tracking
- Multi-step workflows
- Natural language commands
Phase 4:
- Bi-directional sync
- Advanced automation
- Additional PSA integrations (Autotask, ConnectWise)
- IT Glue documentation sync
Impact on Current Architecture
API Design Considerations:
- Modular integration layer (plugins per platform)
- Webhook receiver endpoints
- OAuth flow support
- Rate limiting per integration
- Retry logic for failed API calls
Database Design:
- external_integrations table (already designed above)
- integration_credentials table (already designed above)
- ticket_links table (already designed above)
- Indexes for fast external_id lookups
Security:
- Integration credentials separate from user credentials
- Per-integration permission scopes
- Audit logging for all external API calls
- User confirmation for write operations
FastAPI Architecture:
# Integration plugins
integrations/
├── __init__.py
├── base.py (BaseIntegration abstract class)
├── syncro.py (SyncroMSP integration)
├── mspbackups.py (MSP Backups integration)
├── zapier.py (Zapier webhooks)
└── future/
├── autotask.py
├── connectwise.py
└── itglue.py
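A sketch of what base.py's BaseIntegration abstract class might look like, given the plugin layout above; the method names are assumptions.

```python
# integrations/base.py (sketch)
from abc import ABC, abstractmethod

class BaseIntegration(ABC):
    """Common contract each platform plugin (syncro.py, mspbackups.py, ...) implements."""

    name: str                      # e.g. "syncro"

    def __init__(self, credentials: dict, base_url: str):
        self.credentials = credentials
        self.base_url = base_url

    @abstractmethod
    def test_connection(self) -> bool:
        """Used by POST /api/v1/integrations/{name}/test."""

    @abstractmethod
    def search(self, **filters) -> list[dict]:
        """Read-only search (tickets, reports, assets)."""
```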
This integration capability is foundational to MSP Mode's value proposition - linking real-world MSP workflows to intelligent automation.
Normal Mode Behaviors (Agent-Based Architecture)
Core Principle: Track valuable work that doesn't belong to a specific client or dev project. General research, internal tasks, exploration, learning.
Agent Usage in Normal Mode: Same agent architecture as MSP Mode, but with lighter tracking requirements.
Purpose
Normal Mode is for:
- Research and exploration - "How does JWT authentication work?"
- General questions - "What's the best way to handle SQL migrations?"
- Internal infrastructure (non-client) - Working on Jupiter/Saturn without client context
- Learning/experimentation - Testing new tools, trying approaches
- Documentation - Writing guides, updating READMEs
- Non-categorized work - Anything that doesn't fit MSP or Dev
Not for:
- Client work → Use MSP Mode
- Specific development projects → Use Dev Mode
When /normal is Called
1. Mode Switch:
   - If coming from MSP/Dev mode: preserve all knowledge/context from previous mode
   - Set session context to "general" (no client_id, no project_id)
   - Display: "Normal Mode | General work session"

2. Knowledge Retention:
   - Keep: All learned information, credentials accessed, context from previous modes
   - Clear: Client/project assignment only
   - Rationale: You might research something in Normal mode, then apply it in MSP mode

3. Session Creation:
   - Create session with (see the example below):
     - client_id = NULL
     - project_id = NULL (or link to "Internal" or "Research" pseudo-project)
     - session_title = "General work session: [auto-generated from topic]"
     - is_billable = false (by default, since not client work)
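For illustration, the session record a Normal Mode switch might create, per the defaults above. The payload shape is an assumption pending the final sessions API.

```python
normal_mode_session = {
    "client_id": None,                 # no client in Normal Mode
    "project_id": None,                # or an "Internal"/"Research" pseudo-project id
    "session_title": "General work session: FastAPI connection pooling research",
    "is_billable": False,              # default for non-client work
    "mode": "normal",
}
# POST /api/v1/sessions with this payload when the Normal Mode session starts
```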
During Normal Mode Session
Tracking (lighter than MSP):
- Still create work_items, but less granular
- Track major actions: "Researched FastAPI authentication patterns"
- Track useful findings: "Found pyjwt library, better than python-jose"
- Track decisions: "Decided to use Alembic for migrations"
What gets stored:
- Work items with category (usually 'documentation', 'research', or 'development')
- Key commands run (if applicable)
- Files modified (if applicable)
- Tags: technology, topics
- Brief summary of what was learned/accomplished
What doesn't get stored:
- Less emphasis on billability tracking
- No client/project relationships
- Less detailed command/file tracking (unless requested)
Information Density in Normal Mode
Focus on: Capturing useful knowledge, decisions, findings
Example Normal Mode work_item:
Title: Researched PostgreSQL connection pooling for FastAPI
Category: research
Description: Compared SQLAlchemy pooling vs asyncpg.
Finding: SQLAlchemy pool_size=20, max_overflow=10 optimal for our load.
Decision: Use SQLAlchemy with pool_pre_ping=True for connection health checks.
Tags: postgresql, sqlalchemy, fastapi, connection-pooling
Billable: false
Not: "I started by searching for PostgreSQL connection pooling documentation..."
Session End (Normal Mode)
Auto-save:
- session.client_id = NULL
- session.is_billable = false
- session.summary = brief summary of research/work done
Generated summary example:
Session: General Research
Duration: 45 minutes
Category: Research
Topics Explored:
- FastAPI database connection pooling
- JWT vs session authentication
- Alembic migration strategies
Key Findings:
- SQLAlchemy pooling recommended for our use case
- JWT refresh tokens better than long-lived access tokens
- Alembic supports auto-generate from models
Tags: fastapi, postgresql, jwt, alembic
Billable: No
Value of Normal Mode Sessions
Why track this?
- Knowledge base - "What did I learn about X last month?"
- Decision trail - "Why did we choose technology Y?"
- Reference - "Where did I see that solution before?"
- Context recovery - Future Claude instances can search: "Show me research on JWT authentication"
Queryable:
- "What have I researched about Docker networking?"
- "When did I decide to use FastAPI over Flask?"
- "Show all sessions tagged 'postgresql'"
Mode Comparison
| Aspect | MSP Mode | Dev Mode | Normal Mode |
|---|---|---|---|
| Purpose | Client work | Specific projects | General work/research |
| Client/Project | Required | Optional | None (NULL) |
| Billable | Default: yes | Default: no | Default: no |
| Detail Level | High (every command) | Medium | Light (key actions) |
| Focus | Client value delivery | Code/features | Knowledge/learning |
| Session Title | "[Client] - [Issue]" | "[Project] - [Feature]" | "Research: [Topic]" |
Switching Between Modes
MSP → Normal:
- User: "Let me research how to fix this SSL issue"
- `/normal` → Research mode, but can reference client context if needed
- Knowledge retained: knows which client, what SSL issue
- Categorization: research session, not billable client work
Normal → MSP:
- User: "Okay, back to Dataforth"
- `/msp` → Resumes (or starts new) Dataforth session
- Knowledge retained: knows solution from research
- Categorization: client work, billable
Dev → Normal → MSP:
- Modes are fluid, knowledge carries through
- Only categorization changes
- Session assignments change (project vs client vs general)
Development Mode Behaviors (To Define)
- What should Development Mode track?
- How does it differ from MSP and Normal modes?
- Integration with git repos?
Implementation Order
- Database schema design
- API development and deployment
- MCP server or API client for Claude Code
- MSP Mode slash command
- Development Mode
- Normal Mode
Security Considerations
Credential Storage
- Never store plaintext passwords
- Use Fernet encryption or AES-256-GCM
- Encryption key stored separately from database
- Key rotation strategy needed
API Security
- HTTPS only (no HTTP)
- Rate limiting (prevent brute force)
- IP whitelisting (optional - VPN only)
- Token expiration and refresh
- Revocation list for compromised tokens
- Audit logging for credential access
Multi-Machine Sync
- Encrypted tokens in Gitea config
- git-crypt or encrypted JSON values
- Never commit plaintext tokens to repo
Next Steps (Planning Phase)
- ✅ Architecture decisions (SQL, FastAPI, JWT)
- ⏳ Define MSP Mode behaviors in detail
- ⏳ Design database schema
- ⏳ Define API endpoints specification
- ⏳ Create authentication flow diagram
- ⏳ Design slash command interactions
Notes
- This spec will evolve as we discuss details
- Focus on scalability and robustness
- Design for future team members and integrations
- All decisions documented with rationale
Change Log
2026-01-15 (Evening Update 3):
- CRITICAL ADDITION: Machine Detection & OS-Specific Command Selection
- Added 1 new specialized agent (total: 13 agents): 13. Machine Detection Agent (identifies current machine, loads capabilities)
- Added new database tables (total: 36 tables):
- machines (technician's laptops/desktops with capabilities tracking)
- backup_log (backup tracking with verification status)
- Enhanced sessions table with machine_id tracking
- Machine fingerprinting via SHA256(hostname|username|platform|home_dir)
- Auto-detection on session start (hostname, whoami, platform)
- Machine capabilities tracked:
- VPN access per client, Docker, PowerShell version, SSH, Git
- Available MCPs (claude-in-chrome, filesystem, etc.)
- Available skills (pdf, commit, review-pr, etc.)
- OS-specific package managers (choco, brew, apt)
- Preferred shell (powershell, zsh, bash, cmd)
- OS-specific command selection:
- Windows vs macOS vs Linux command mapping
- Shell-specific syntax (PowerShell vs Bash vs Batch)
- Path separator handling (\ vs /)
- Package manager commands per platform
- MCP/Skill availability checking before calls
- VPN requirements validation before client access
- Real-world scenarios documented:
- VPN-required client work (warns if no VPN on current machine)
- Docker-based development (suggests machines with Docker)
- PowerShell version differences (uses compatible cmdlets)
- Session history per machine tracking
- User has 3 laptops + 1 desktop, each with different environments
- Benefits: No cross-platform errors, capability-aware suggestions, session portability
2026-01-15 (Evening Update 2):
- CRITICAL ADDITION: Failure Logging & Environmental Awareness System
- Added 2 new specialized agents:
  - 11. Failure Analysis Agent (learns from all failures)
  - 12. Environment Context Agent (pre-checks before suggestions)
- Added 3 new database tables (total: 33 tables):
- failure_patterns (aggregated failure insights)
- environmental_insights (generated insights.md content)
- operation_failures (non-command failures)
- Enhanced infrastructure table with environmental constraints:
- environmental_notes, powershell_version, shell_type, limitations, has_gui
- Enhanced commands_run table with failure tracking:
- exit_code, error_message, failure_category, resolution, resolved
- Documented complete failure logging workflow:
- Command execution → Failure detection → Pattern analysis → Insights generation
- Environment pre-check prevents future failures
- Self-improving system learns from every mistake
- Real-world examples documented:
- D2TESTNAS WINS service (manual install, no GUI)
- PowerShell 7 cmdlets on Server 2008 (version incompatibility)
- DOS batch file syntax (IF /I not supported in DOS 6.22)
- Benefits: Self-improving, reduced user friction, institutional knowledge, proactive prevention
2026-01-15 (Evening Update 1):
- CRITICAL ARCHITECTURAL ADDITION: Agent-Based Execution
- Added core principle: All modes use specialized agents to preserve main context
- Documented 10 specialized agent types:
- Context Recovery Agent (session start)
- Work Categorization Agent (periodic analysis)
- Session Summary Agent (session end)
- Credential Retrieval Agent (secure access)
- Credential Storage Agent (secure storage)
- Historical Search Agent (on-demand queries)
- Integration Workflow Agent (multi-step external integrations)
- Problem Pattern Matching Agent (solution lookup)
- Database Query Agent (complex reporting)
- Integration Search Agent (external system queries)
- Defined agent execution patterns: Sequential Chain, Parallel, On-Demand, Background
- Updated all MSP Mode workflows to use agents
- Updated integration example to demonstrate agent-based execution
- Added context preservation metrics (90-99% context saved per agent)
- Architecture benefits: Context preservation, scalability, separation of concerns
- User experience: Agents handle all heavy lifting, main Claude stays conversational
2026-01-15 (Initial):
- Initial spec created
- Architecture decisions: SQL, FastAPI, JWT, Gitea config
- Technology stack defined
- High-level infrastructure design
- Open questions identified
- Database schema designed: 30 tables via parallel agent analysis
- 5 parallel agents analyzed: sessions, credentials, projects, work categorization, infrastructure
- Comprehensive schema with 25 core tables + 5 junction tables
- Analyzed 37 session logs, credentials file, all projects, infrastructure docs
- Estimated storage: 1-2 GB/year
- Pre-identified 157+ tags for categorization
- MSP Mode behaviors defined:
- Auto-categorization of client work
- Information-dense storage format
- Auto-tracking: commands, files, problems, credentials, infrastructure changes
- Smart billability detection
- Context awareness and auto-suggestion
- Normal Mode behaviors defined:
- For general work/research not assigned to client or dev project
- Knowledge retention across mode switches
- Lighter tracking than MSP mode
- Captures decisions, findings, learnings
- Queryable knowledge base
- External integrations architecture added:
- SyncroMSP, MSP Backups, Zapier integration design
- 3 additional database tables (external_integrations, integration_credentials, ticket_links)
- Multi-step workflow example documented
- OAuth flow and security considerations