# Ollama MCP Server - Local AI Assistant
**Purpose:** Integrate local Ollama models with Claude Code via MCP, allowing Claude to delegate tasks to a model that runs entirely on the local machine.
## Use Cases
- **Code Analysis:** Delegate code review to local model for privacy-sensitive code
- **Data Processing:** Process large local datasets without API costs
- **Offline Work:** Continue working when internet/API is unavailable
- **Cost Optimization:** Use local model for simple tasks, Claude for complex reasoning
---
## Architecture
```
┌─────────────────┐
│   Claude Code   │  (Coordinator)
└────────┬────────┘
         │ MCP Protocol
┌─────────────────────────────┐
│  Ollama MCP Server          │
│  Exposes tools:             │
│   • ask_ollama()            │
│   • analyze_code_local()    │
│   • summarize_large_file()  │
└────────┬────────────────────┘
         │ HTTP API
┌─────────────────────────────┐
│  Ollama                     │
│  - Model: llama3.1:8b       │
│  - Local execution          │
└─────────────────────────────┘
```
---
## Installation
### 1. Install Ollama
**Windows:**
```powershell
# Download from https://ollama.ai/download
# Or use winget
winget install Ollama.Ollama
```
**Verify Installation:**
```bash
ollama --version
```
### 2. Pull a Model
```bash
# Recommended models:
ollama pull llama3.1:8b # Best balance (4.7GB)
ollama pull codellama:13b # Code-focused (7.4GB)
ollama pull mistral:7b # Fast, good reasoning (4.1GB)
ollama pull qwen2.5-coder:7b # Excellent for code (4.7GB)
```
### 3. Test Ollama
```bash
ollama run llama3.1:8b "What is MCP?"
```
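The MCP server below talks to Ollama over its local HTTP API rather than the CLI, so it is worth confirming that endpoint responds as well. A minimal sketch using `httpx` (`pip install httpx` if it is not already available), assuming the default host `http://localhost:11434` and the `llama3.1:8b` model pulled above:
```python
# quick_check.py - verify the Ollama HTTP API is reachable before wiring up MCP
import httpx

OLLAMA_HOST = "http://localhost:11434"

# List locally available models (GET /api/tags)
tags = httpx.get(f"{OLLAMA_HOST}/api/tags", timeout=10.0).json()
print("Installed models:", [m["name"] for m in tags.get("models", [])])

# Run a single non-streaming generation (POST /api/generate)
resp = httpx.post(
    f"{OLLAMA_HOST}/api/generate",
    json={"model": "llama3.1:8b", "prompt": "What is MCP?", "stream": False},
    timeout=120.0,
)
resp.raise_for_status()
print(resp.json()["response"])
```
If this fails with a connection error, make sure the Ollama service is actually running before continuing.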
### 4. Create MCP Server
**File:** `mcp-servers/ollama-assistant/server.py`
```python
#!/usr/bin/env python3
"""
Ollama MCP Server
Provides local AI assistance to Claude Code via MCP protocol
"""
import asyncio
from typing import Any

import httpx
from mcp.server import Server
from mcp.types import Tool, TextContent

# Configuration
OLLAMA_HOST = "http://localhost:11434"
DEFAULT_MODEL = "llama3.1:8b"

# Create MCP server
app = Server("ollama-assistant")


@app.list_tools()
async def list_tools() -> list[Tool]:
    """List available Ollama tools"""
    return [
        Tool(
            name="ask_ollama",
            description="Ask the local Ollama model a question. Use for simple queries, code review, or when you want a second opinion. The model has no context of the conversation.",
            inputSchema={
                "type": "object",
                "properties": {
                    "prompt": {
                        "type": "string",
                        "description": "The question or task for Ollama"
                    },
                    "model": {
                        "type": "string",
                        "description": "Model to use (default: llama3.1:8b)",
                        "default": DEFAULT_MODEL
                    },
                    "system": {
                        "type": "string",
                        "description": "System prompt to set context/role",
                        "default": "You are a helpful coding assistant."
                    }
                },
                "required": ["prompt"]
            }
        ),
        Tool(
            name="analyze_code_local",
            description="Analyze code using local Ollama model. Good for privacy-sensitive code or large codebases. Returns analysis without sending code to external APIs.",
            inputSchema={
                "type": "object",
                "properties": {
                    "code": {
                        "type": "string",
                        "description": "Code to analyze"
                    },
                    "language": {
                        "type": "string",
                        "description": "Programming language"
                    },
                    "analysis_type": {
                        "type": "string",
                        "enum": ["security", "performance", "quality", "bugs", "general"],
                        "description": "Type of analysis to perform"
                    }
                },
                "required": ["code", "language"]
            }
        ),
        Tool(
            name="summarize_large_file",
            description="Summarize large files using the local model. No API costs; limited only by the local model's context window.",
            inputSchema={
                "type": "object",
                "properties": {
                    "content": {
                        "type": "string",
                        "description": "File content to summarize"
                    },
                    "summary_length": {
                        "type": "string",
                        "enum": ["brief", "detailed", "technical"],
                        "default": "brief"
                    }
                },
                "required": ["content"]
            }
        )
    ]


@app.call_tool()
async def call_tool(name: str, arguments: Any) -> list[TextContent]:
    """Execute Ollama tool"""
    if name == "ask_ollama":
        prompt = arguments["prompt"]
        model = arguments.get("model", DEFAULT_MODEL)
        system = arguments.get("system", "You are a helpful coding assistant.")
        response = await query_ollama(prompt, model, system)
        return [TextContent(type="text", text=response)]

    elif name == "analyze_code_local":
        code = arguments["code"]
        language = arguments["language"]
        analysis_type = arguments.get("analysis_type", "general")
        system = f"You are a {language} code analyzer. Focus on {analysis_type} analysis."
        prompt = f"Analyze this {language} code:\n\n```{language}\n{code}\n```\n\nProvide a {analysis_type} analysis."
        # Code analysis is routed to the code-focused model (pull codellama:13b in step 2)
        response = await query_ollama(prompt, "codellama:13b", system)
        return [TextContent(type="text", text=response)]

    elif name == "summarize_large_file":
        content = arguments["content"]
        summary_length = arguments.get("summary_length", "brief")
        system = f"You are a file summarizer. Create {summary_length} summaries."
        prompt = f"Summarize this file content:\n\n{content}"
        response = await query_ollama(prompt, DEFAULT_MODEL, system)
        return [TextContent(type="text", text=response)]

    else:
        raise ValueError(f"Unknown tool: {name}")


async def query_ollama(prompt: str, model: str, system: str) -> str:
    """Query Ollama API"""
    async with httpx.AsyncClient(timeout=120.0) as client:
        response = await client.post(
            f"{OLLAMA_HOST}/api/generate",
            json={
                "model": model,
                "prompt": prompt,
                "system": system,
                "stream": False
            }
        )
        response.raise_for_status()
        result = response.json()
        return result["response"]


async def main():
    """Run MCP server"""
    from mcp.server.stdio import stdio_server

    async with stdio_server() as (read_stream, write_stream):
        await app.run(
            read_stream,
            write_stream,
            app.create_initialization_options()
        )


if __name__ == "__main__":
    asyncio.run(main())
```
### 5. Install MCP Server Dependencies
```bash
cd D:\ClaudeTools\mcp-servers\ollama-assistant
python -m venv venv
venv\Scripts\activate
pip install mcp httpx
```
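Before adding the server to Claude Code, you can exercise it directly with a small MCP client script. This is a sketch using the `mcp` SDK's stdio client; it assumes the venv and paths from step 5, so adjust them if your checkout lives elsewhere:
```python
# test_client.py - spawn the MCP server over stdio and call one tool
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

SERVER = StdioServerParameters(
    command=r"D:\ClaudeTools\mcp-servers\ollama-assistant\venv\Scripts\python.exe",
    args=[r"D:\ClaudeTools\mcp-servers\ollama-assistant\server.py"],
)

async def main():
    async with stdio_client(SERVER) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Confirm the three tools are exposed
            tools = await session.list_tools()
            print([t.name for t in tools.tools])

            # Call ask_ollama through the MCP protocol
            result = await session.call_tool(
                "ask_ollama", {"prompt": "In one sentence, what is MCP?"}
            )
            print(result.content[0].text)

asyncio.run(main())
```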
### 6. Configure in Claude Code
**Edit:** `.mcp.json` (in D:\ClaudeTools)
```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "your-token-here"
      }
    },
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "D:\\ClaudeTools"]
    },
    "sequential-thinking": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-sequential-thinking"]
    },
    "ollama-assistant": {
      "command": "D:\\ClaudeTools\\mcp-servers\\ollama-assistant\\venv\\Scripts\\python.exe",
      "args": ["D:\\ClaudeTools\\mcp-servers\\ollama-assistant\\server.py"]
    }
  }
}
```
Note that `command` points at the venv interpreter created in step 5 so the `mcp` and `httpx` dependencies resolve.
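If the server does not appear in Claude Code after a restart, mis-escaped Windows paths are the usual culprit. A small sketch that checks the paths in this entry actually resolve (it assumes the config lives at `D:\ClaudeTools\.mcp.json` as above):
```python
# check_mcp_config.py - verify the ollama-assistant paths in .mcp.json exist
import json
from pathlib import Path

config = json.loads(Path(r"D:\ClaudeTools\.mcp.json").read_text())
entry = config["mcpServers"]["ollama-assistant"]

# The command and every arg for this entry should be real files on disk
for p in [entry["command"], *entry["args"]]:
    print("OK " if Path(p).exists() else "MISSING ", p)
```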
---
## Usage Examples
### Example 1: Ask Ollama for a Second Opinion
```
User: "Review this authentication code for security issues"
Claude: Let me delegate this to the local Ollama model for a privacy-focused review.
[Uses ask_ollama tool]
Ollama: "Found potential issues: 1. Password not hashed... 2. No rate limiting..."
Claude: Based on the local analysis, here are the security concerns...
```
### Example 2: Analyze Large Codebase Locally
```
User: "Analyze this 10,000 line file for performance issues"
Claude: This is large - I'll use the local model to avoid API costs.
[Uses summarize_large_file tool]
Ollama: "Main performance bottlenecks: 1. N+1 queries... 2. Missing indexes..."
Claude: Here's the performance analysis from the local model...
```
### Example 3: Offline Development
```
User: "Help me debug this code" (while offline)
Claude: API unavailable, using local Ollama model...
[Uses analyze_code_local tool]
Ollama: "Bug found on line 42: null reference..."
Claude: The local model identified the issue...
```
---
## Option 2: Standalone Ollama with MCP Tools
Run Ollama as a separate agent with its own access to MCP servers (Option 1 is the setup above, where Ollama is only a set of tools inside Claude Code).
**Architecture:**
```
┌─────────────────┐      ┌─────────────────────┐
│   Claude Code   │      │    Ollama + MCP     │
│  (Main Agent)   │─────▶│   (Helper Agent)    │
└─────────────────┘      └──────────┬──────────┘
                                    │ MCP Protocol
                         ┌──────────┴───────────┐
                         │  MCP Servers         │
                         │  - Filesystem        │
                         │  - Bash              │
                         │  - Custom tools      │
                         └──────────────────────┘
```
**Tool:** Use `ollama-mcp` or a similar wrapper that gives Ollama access to MCP servers.
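The wrapper's core job is a loop in which the local model decides which tool to call and the wrapper executes it. A rough sketch of that loop using Ollama's `/api/chat` tool-calling support, assuming a tool-capable model such as `llama3.1:8b`; the `read_file` tool and its dispatch are illustrative placeholders, not something shipped with this repo, and a real wrapper would forward calls to MCP servers instead:
```python
# helper_agent.py - minimal tool-calling loop for a local Ollama helper agent (illustrative)
from pathlib import Path

import httpx

OLLAMA_HOST = "http://localhost:11434"

# Hypothetical tool definition exposed to the model
TOOLS = [{
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a text file from the local filesystem",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

def run_tool(name: str, args: dict) -> str:
    """Execute a requested tool locally (placeholder dispatch)."""
    if name == "read_file":
        return Path(args["path"]).read_text(errors="replace")[:4000]
    return f"Unknown tool: {name}"

def chat(messages: list[dict]) -> dict:
    """One non-streaming /api/chat round with tools attached."""
    resp = httpx.post(
        f"{OLLAMA_HOST}/api/chat",
        json={"model": "llama3.1:8b", "messages": messages, "tools": TOOLS, "stream": False},
        timeout=120.0,
    )
    resp.raise_for_status()
    return resp.json()["message"]

messages = [{"role": "user", "content": "Summarize the file D:/ClaudeTools/README.md"}]
msg = chat(messages)

# If the model requested tools, run them and feed the results back for a final answer
while msg.get("tool_calls"):
    messages.append(msg)
    for call in msg["tool_calls"]:
        fn = call["function"]
        messages.append({"role": "tool", "content": run_tool(fn["name"], fn["arguments"])})
    msg = chat(messages)

print(msg["content"])
```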
---
## Option 3: Hybrid Task Distribution
Use Claude as the coordinator and Ollama for execution; a rough routing sketch follows the lists below.
**When to use Ollama:**
- Privacy-sensitive code review
- Large file processing (no token limits)
- Offline work
- Cost optimization (simple tasks)
- Repetitive analysis
**When to use Claude:**
- Complex reasoning
- Multi-step planning
- API integrations
- Final decision-making
- User communication
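As a loose illustration of that split, the coordinator's choice can be thought of as a simple routing predicate. The `Task` fields and thresholds below are hypothetical, not part of the server above:
```python
# route_task.py - illustrative heuristic for choosing the local model vs. Claude
from dataclasses import dataclass

@dataclass
class Task:
    description: str
    content_chars: int          # size of the code/data involved
    privacy_sensitive: bool     # must stay on the local machine
    needs_planning: bool        # multi-step reasoning / final decisions
    online: bool = True         # is the Claude API reachable?

def use_local_ollama(task: Task) -> bool:
    """Return True when the task fits the 'use Ollama' column above."""
    if not task.online or task.privacy_sensitive:
        return True                     # offline work or private code stays local
    if task.needs_planning:
        return False                    # complex reasoning stays with Claude
    return task.content_chars > 50_000  # large, simple jobs go local to save tokens

print(use_local_ollama(Task("review auth module", 12_000, True, False)))  # True
```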
---
## Recommended Models for Different Tasks
| Task Type | Recommended Model | Size | Reason |
|-----------|------------------|------|--------|
| Code Review | qwen2.5-coder:7b | 4.7GB | Best code understanding |
| Code Generation | codellama:13b | 7.4GB | Trained on code |
| General Queries | llama3.1:8b | 4.7GB | Balanced performance |
| Fast Responses | mistral:7b | 4.1GB | Speed optimized |
| Large Context | llama3.1:70b | 40GB | 128k context (needs GPU) |
---
## Performance Considerations
**CPU Only:**
- llama3.1:8b: ~2-5 tokens/sec
- Usable for short queries
**GPU (NVIDIA):**
- llama3.1:8b: ~30-100 tokens/sec
- codellama:13b: ~20-50 tokens/sec
- Much faster, recommended
**GPU support:**
```bash
# Ollama auto-detects NVIDIA (CUDA) and Apple (Metal) GPUs - nothing to enable manually.
# Verify by checking the Ollama logs for "CUDA"/"Metal", or see where a loaded model runs:
ollama ps
```
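Throughput varies widely with hardware, so it is worth measuring your own. The non-streaming `/api/generate` response includes `eval_count` (generated tokens) and `eval_duration` (nanoseconds), which give tokens per second; a small sketch, assuming the default host and the llama3.1:8b model:
```python
# throughput_check.py - measure local generation speed from Ollama's timing fields
import httpx

resp = httpx.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.1:8b", "prompt": "Explain MCP in one paragraph.", "stream": False},
    timeout=300.0,
)
resp.raise_for_status()
data = resp.json()

tokens_per_sec = data["eval_count"] / (data["eval_duration"] / 1e9)
print(f"{data['eval_count']} tokens in {data['eval_duration'] / 1e9:.1f}s -> {tokens_per_sec:.1f} tok/s")
```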
---
## Next Steps
1. Install Ollama
2. Pull a model (llama3.1:8b recommended)
3. Create MCP server (use code above)
4. Configure `.mcp.json`
5. Restart Claude Code
6. Test: "Use the local Ollama model to analyze this code"
---
**Status:** Design phase - ready to implement
**Created:** 2026-01-22