Ollama MCP Server - Local AI Assistant

Purpose: Integrate local Ollama models with Claude Code via MCP, allowing Claude to delegate tasks to a model that runs entirely on the local machine.

Use Cases

  • Code Analysis: Delegate code review to local model for privacy-sensitive code
  • Data Processing: Process large local datasets without API costs
  • Offline Work: Continue working when internet/API is unavailable
  • Cost Optimization: Use local model for simple tasks, Claude for complex reasoning

Architecture

┌─────────────────┐
│  Claude Code    │ (Coordinator)
└────────┬────────┘
         │
         │ MCP Protocol
         ↓
┌─────────────────────────────┐
│  Ollama MCP Server          │
│  - Exposes tools:           │
│    • ask_ollama()           │
│    • analyze_code()         │
│    • process_data()         │
└────────┬────────────────────┘
         │
         │ HTTP API
         ↓
┌─────────────────────────────┐
│  Ollama                     │
│  - Model: llama3.1:8b       │
│  - Local execution          │
└─────────────────────────────┘
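
To make the two hops concrete, here is a rough sketch of what flows across them. The tool name and arguments match the server defined later in this document; the JSON-RPC framing is handled by the MCP SDK, so you never write it by hand.

# Illustrative only: the shape of one request as it moves through the stack.

# 1. Claude Code -> MCP server (JSON-RPC 2.0 over stdio):
mcp_tool_call = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "ask_ollama",
        "arguments": {"prompt": "What is MCP?"},
    },
}

# 2. MCP server -> Ollama (HTTP POST to /api/generate):
ollama_payload = {
    "model": "llama3.1:8b",
    "prompt": "What is MCP?",
    "system": "You are a helpful coding assistant.",
    "stream": False,
}
# Ollama's JSON reply carries the answer in a "response" field, which the
# server wraps in a TextContent result for Claude Code.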

Installation

1. Install Ollama

Windows:

# Download from https://ollama.ai/download
# Or use winget
winget install Ollama.Ollama

Verify Installation:

ollama --version

2. Pull a Model

# Recommended models:
ollama pull llama3.1:8b      # Best balance (4.7GB)
ollama pull codellama:13b    # Code-focused (7.4GB)
ollama pull mistral:7b       # Fast, good reasoning (4.1GB)
ollama pull qwen2.5-coder:7b # Excellent for code (4.7GB)
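
ollama list shows what has been pulled; the same information is available over HTTP from Ollama's /api/tags endpoint. A minimal sketch of that check (the file name is arbitrary; assumes the default port 11434 and the httpx package installed in step 5):

# list_models.py - confirm the pulled models are visible over the HTTP API
import httpx

tags = httpx.get("http://localhost:11434/api/tags").json()
for model in tags["models"]:
    print(model["name"], f"{model['size'] / 1e9:.1f} GB")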

3. Test Ollama

ollama run llama3.1:8b "What is MCP?"
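
The MCP server built in the next step talks to Ollama over HTTP rather than through the CLI, so it is worth confirming that path as well. A minimal sketch against the /api/generate endpoint (the file name is arbitrary; assumes the default port 11434 and the httpx package installed in step 5):

# http_smoke_test.py - verify the HTTP API the MCP server will use
import httpx

resp = httpx.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.1:8b", "prompt": "What is MCP?", "stream": False},
    timeout=120.0,
)
resp.raise_for_status()
print(resp.json()["response"])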

4. Create MCP Server

File: mcp-servers/ollama-assistant/server.py

#!/usr/bin/env python3
"""
Ollama MCP Server
Provides local AI assistance to Claude Code via MCP protocol
"""

import asyncio
from typing import Any
import httpx
from mcp.server import Server
from mcp.types import Tool, TextContent

# Configuration
OLLAMA_HOST = "http://localhost:11434"
DEFAULT_MODEL = "llama3.1:8b"

# Create MCP server
app = Server("ollama-assistant")

@app.list_tools()
async def list_tools() -> list[Tool]:
    """List available Ollama tools"""
    return [
        Tool(
            name="ask_ollama",
            description="Ask the local Ollama model a question. Use for simple queries, code review, or when you want a second opinion. The model has no context of the conversation.",
            inputSchema={
                "type": "object",
                "properties": {
                    "prompt": {
                        "type": "string",
                        "description": "The question or task for Ollama"
                    },
                    "model": {
                        "type": "string",
                        "description": "Model to use (default: llama3.1:8b)",
                        "default": DEFAULT_MODEL
                    },
                    "system": {
                        "type": "string",
                        "description": "System prompt to set context/role",
                        "default": "You are a helpful coding assistant."
                    }
                },
                "required": ["prompt"]
            }
        ),
        Tool(
            name="analyze_code_local",
            description="Analyze code using local Ollama model. Good for privacy-sensitive code or large codebases. Returns analysis without sending code to external APIs.",
            inputSchema={
                "type": "object",
                "properties": {
                    "code": {
                        "type": "string",
                        "description": "Code to analyze"
                    },
                    "language": {
                        "type": "string",
                        "description": "Programming language"
                    },
                    "analysis_type": {
                        "type": "string",
                        "enum": ["security", "performance", "quality", "bugs", "general"],
                        "description": "Type of analysis to perform"
                    }
                },
                "required": ["code", "language"]
            }
        ),
        Tool(
            name="summarize_large_file",
            description="Summarize large files using local model. No size limits or API costs.",
            inputSchema={
                "type": "object",
                "properties": {
                    "content": {
                        "type": "string",
                        "description": "File content to summarize"
                    },
                    "summary_length": {
                        "type": "string",
                        "enum": ["brief", "detailed", "technical"],
                        "default": "brief"
                    }
                },
                "required": ["content"]
            }
        )
    ]

@app.call_tool()
async def call_tool(name: str, arguments: Any) -> list[TextContent]:
    """Execute Ollama tool"""

    if name == "ask_ollama":
        prompt = arguments["prompt"]
        model = arguments.get("model", DEFAULT_MODEL)
        system = arguments.get("system", "You are a helpful coding assistant.")

        response = await query_ollama(prompt, model, system)
        return [TextContent(type="text", text=response)]

    elif name == "analyze_code_local":
        code = arguments["code"]
        language = arguments["language"]
        analysis_type = arguments.get("analysis_type", "general")

        system = f"You are a {language} code analyzer. Focus on {analysis_type} analysis."
        prompt = f"Analyze this {language} code:\n\n```{language}\n{code}\n```\n\nProvide a {analysis_type} analysis."

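        # NOTE: assumes codellama:13b has been pulled (step 2); swap in
        # DEFAULT_MODEL if that model is not available locally.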
        response = await query_ollama(prompt, "codellama:13b", system)
        return [TextContent(type="text", text=response)]

    elif name == "summarize_large_file":
        content = arguments["content"]
        summary_length = arguments.get("summary_length", "brief")

        system = f"You are a file summarizer. Create {summary_length} summaries."
        prompt = f"Summarize this file content:\n\n{content}"

        response = await query_ollama(prompt, DEFAULT_MODEL, system)
        return [TextContent(type="text", text=response)]

    else:
        raise ValueError(f"Unknown tool: {name}")

async def query_ollama(prompt: str, model: str, system: str) -> str:
    """Query Ollama API"""
    async with httpx.AsyncClient(timeout=120.0) as client:
        response = await client.post(
            f"{OLLAMA_HOST}/api/generate",
            json={
                "model": model,
                "prompt": prompt,
                "system": system,
                "stream": False
            }
        )
        response.raise_for_status()
        result = response.json()
        return result["response"]

async def main():
    """Run MCP server"""
    from mcp.server.stdio import stdio_server

    async with stdio_server() as (read_stream, write_stream):
        await app.run(
            read_stream,
            write_stream,
            app.create_initialization_options()
        )

if __name__ == "__main__":
    asyncio.run(main())

5. Install MCP Server Dependencies

cd D:\ClaudeTools\mcp-servers\ollama-assistant
python -m venv venv
venv\Scripts\activate
pip install mcp httpx
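
Before wiring the server into Claude Code, you can sanity-check that it reaches Ollama. A minimal sketch (the file name is arbitrary; run it from the ollama-assistant directory with the venv active and Ollama running):

# sanity_check.py - confirm server.py can reach the local Ollama instance
import asyncio

from server import DEFAULT_MODEL, query_ollama

async def main():
    reply = await query_ollama(
        "Reply with the single word: ready",
        DEFAULT_MODEL,
        "You are a helpful coding assistant.",
    )
    print(reply)

asyncio.run(main())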

6. Configure in Claude Code

Edit: .mcp.json (in D:\ClaudeTools)

{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "your-token-here"
      }
    },
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "D:\\ClaudeTools"]
    },
    "sequential-thinking": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-sequential-thinking"]
    },
    "ollama-assistant": {
      "command": "python",
      "args": [
        "D:\\ClaudeTools\\mcp-servers\\ollama-assistant\\venv\\Scripts\\python.exe",
        "D:\\ClaudeTools\\mcp-servers\\ollama-assistant\\server.py"
      ]
    }
  }
}
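
A small, optional check that the new entry points at real files (paths are the ones used in this document; adjust if your layout differs):

# check_mcp_config.py - confirm the ollama-assistant entry resolves to real files
import json
import pathlib

cfg = json.loads(pathlib.Path(r"D:\ClaudeTools\.mcp.json").read_text())
entry = cfg["mcpServers"]["ollama-assistant"]

assert pathlib.Path(entry["command"]).exists(), "venv python.exe not found"
assert pathlib.Path(entry["args"][0]).exists(), "server.py not found"
print("ollama-assistant entry looks valid")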

Usage Examples

Example 1: Ask Ollama for a Second Opinion

User: "Review this authentication code for security issues"

Claude: Let me delegate this to the local Ollama model for a privacy-focused review.

[Uses ask_ollama tool]
Ollama: "Found potential issues: 1. Password not hashed... 2. No rate limiting..."

Claude: Based on the local analysis, here are the security concerns...

Example 2: Analyze Large Codebase Locally

User: "Analyze this 10,000 line file for performance issues"

Claude: This is large - I'll use the local model to avoid API costs.

[Uses summarize_large_file tool]
Ollama: "Main performance bottlenecks: 1. N+1 queries... 2. Missing indexes..."

Claude: Here's the performance analysis from the local model...

Example 3: Offline Development

User: "Help me debug this code" (while offline)

Claude: API unavailable, using local Ollama model...

[Uses analyze_code_local tool]
Ollama: "Bug found on line 42: null reference..."

Claude: The local model identified the issue...

Option 2: Standalone Ollama with MCP Tools

The setup described above, where Ollama is exposed to Claude as MCP tools, is Option 1. In this option, Ollama instead runs as a separate agent with its own MCP server access.

Architecture:

┌─────────────────┐     ┌─────────────────────┐
│  Claude Code    │     │  Ollama + MCP       │
│  (Main Agent)   │────▶│  (Helper Agent)     │
└─────────────────┘     └──────────┬──────────┘
                                   │
                                   │ MCP Protocol
                                   ↓
                        ┌──────────────────────┐
                        │  MCP Servers         │
                        │  - Filesystem        │
                        │  - Bash              │
                        │  - Custom tools      │
                        └──────────────────────┘

Tool: Use ollama-mcp or a similar wrapper that gives Ollama access to MCP servers.
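
Wrappers of this kind typically build on Ollama's native tool calling. The sketch below shows that underlying mechanism: tool definitions are passed to /api/chat and any requested calls come back in tool_calls. The read_file tool is a made-up stand-in for whatever the wrapper actually exposes from its MCP servers, and a tool-capable model such as llama3.1 is assumed.

# tool_call_sketch.py - how a wrapper can let Ollama request tool use
# (illustrative; read_file is a stand-in for a real MCP-backed tool)
import httpx

tools = [{
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a text file from the local filesystem",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

resp = httpx.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.1:8b",
        "messages": [{"role": "user", "content": "Summarize README.md"}],
        "tools": tools,
        "stream": False,
    },
    timeout=120.0,
)

# If the model decided to use a tool, the wrapper would dispatch each call to
# the matching MCP server and feed the result back as a "tool" message.
for call in resp.json()["message"].get("tool_calls", []):
    print(call["function"]["name"], call["function"]["arguments"])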


Option 3: Hybrid Task Distribution

Use Claude as coordinator, Ollama for execution.

When to use Ollama:

  • Privacy-sensitive code review
  • Large file processing (no token limits)
  • Offline work
  • Cost optimization (simple tasks)
  • Repetitive analysis

When to use Claude:

  • Complex reasoning
  • Multi-step planning
  • API integrations
  • Final decision-making
  • User communication
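
A toy routing heuristic for this split is sketched below; the function name, keywords, and thresholds are illustrative choices, not part of any existing tool.

# route_task.py - toy heuristic for the Claude/Ollama split described above
LOCAL_KEYWORDS = {"secret", "credential", "proprietary", "internal"}

def route(task_description: str, content_size_chars: int, online: bool) -> str:
    """Return "ollama" for local execution or "claude" for the API."""
    if not online:
        return "ollama"   # offline work
    if any(word in task_description.lower() for word in LOCAL_KEYWORDS):
        return "ollama"   # privacy-sensitive material stays local
    if content_size_chars > 200_000:
        return "ollama"   # very large inputs, no API token costs
    return "claude"       # complex reasoning, planning, final decisions

print(route("review internal auth module", 12_000, online=True))  # ollama

The table below summarizes which local model tends to fit each task type.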

Task Type         Recommended Model    Size     Reason
Code Review       qwen2.5-coder:7b     4.7GB    Best code understanding
Code Generation   codellama:13b        7.4GB    Trained on code
General Queries   llama3.1:8b          4.7GB    Balanced performance
Fast Responses    mistral:7b           4.1GB    Speed optimized
Large Context     llama3.1:70b         40GB     128k context (needs GPU)

Performance Considerations

CPU Only:

  • llama3.1:8b: ~2-5 tokens/sec
  • Usable for short queries

GPU (NVIDIA):

  • llama3.1:8b: ~30-100 tokens/sec
  • codellama:13b: ~20-50 tokens/sec
  • Much faster, recommended

Enable GPU in Ollama:

# Ollama auto-detects GPU
# Verify: check Ollama logs for "CUDA" or "Metal"
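
A programmatic check is also possible: recent Ollama versions expose a /api/ps endpoint that reports how much of each loaded model resides in VRAM (a sketch under that assumption; a size_vram of 0 means the model is running on CPU):

# gpu_check.py - see whether currently loaded models are using the GPU
# (load a model first, e.g. by running the test prompt from step 3)
import httpx

ps = httpx.get("http://localhost:11434/api/ps").json()
for model in ps.get("models", []):
    on_gpu = model.get("size_vram", 0) > 0
    print(model["name"], "GPU" if on_gpu else "CPU")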

Next Steps

  1. Install Ollama
  2. Pull a model (llama3.1:8b recommended)
  3. Create MCP server (use code above)
  4. Configure .mcp.json
  5. Restart Claude Code
  6. Test: "Use the local Ollama model to analyze this code"

Status: Design phase - ready to implement
Created: 2026-01-22