# Ollama MCP Server - Local AI Assistant

**Purpose:** Integrate Ollama local models with Claude Code via MCP, allowing Claude to delegate tasks to a local model that has computer access.

## Use Cases

- **Code Analysis:** Delegate code review to a local model for privacy-sensitive code
- **Data Processing:** Process large local datasets without API costs
- **Offline Work:** Continue working when the internet/API is unavailable
- **Cost Optimization:** Use the local model for simple tasks and Claude for complex reasoning

---

## Architecture

```
┌─────────────────┐
│   Claude Code   │  (Coordinator)
└────────┬────────┘
         │
         │ MCP Protocol
         ↓
┌─────────────────────────────┐
│   Ollama MCP Server         │
│   - Exposes tools:          │
│     • ask_ollama()          │
│     • analyze_code()        │
│     • process_data()        │
└────────┬────────────────────┘
         │
         │ HTTP API
         ↓
┌─────────────────────────────┐
│           Ollama            │
│   - Model: llama3.1:8b      │
│   - Local execution         │
└─────────────────────────────┘
```

---

## Installation

### 1. Install Ollama

**Windows:**
```powershell
# Download from https://ollama.ai/download
# Or use winget
winget install Ollama.Ollama
```

**Verify Installation:**
```bash
ollama --version
```

### 2. Pull a Model

```bash
# Recommended models:
ollama pull llama3.1:8b        # Best balance (4.7GB)
ollama pull codellama:13b      # Code-focused (7.4GB)
ollama pull mistral:7b         # Fast, good reasoning (4.1GB)
ollama pull qwen2.5-coder:7b   # Excellent for code (4.7GB)
```

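To confirm the pulls succeeded, you can also list the locally available models over Ollama's HTTP API (the same API the MCP server below will use). A minimal sketch, assuming the default endpoint `http://localhost:11434`:

```python
# List locally available models via Ollama's HTTP API to confirm the pulls above.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:11434/api/tags", timeout=10) as resp:
    for model in json.loads(resp.read())["models"]:
        print(model["name"])
```
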
### 3. Test Ollama

```bash
ollama run llama3.1:8b "What is MCP?"
```

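You can also hit the HTTP endpoint directly, which is exactly what the MCP server in step 4 does. A minimal sketch using only the standard library, assuming Ollama is listening on its default port 11434 and `llama3.1:8b` has been pulled:

```python
# Send a single non-streaming generation request to Ollama's HTTP API.
import json
import urllib.request

payload = json.dumps({
    "model": "llama3.1:8b",
    "prompt": "What is MCP?",
    "stream": False,  # return one JSON object instead of a token stream
}).encode("utf-8")

request = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request, timeout=120) as resp:
    print(json.loads(resp.read())["response"])
```
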
### 4. Create MCP Server

**File:** `mcp-servers/ollama-assistant/server.py`

```python
#!/usr/bin/env python3
"""
Ollama MCP Server
Provides local AI assistance to Claude Code via the MCP protocol
"""

import asyncio
from typing import Any

import httpx
from mcp.server import Server
from mcp.types import Tool, TextContent

# Configuration
OLLAMA_HOST = "http://localhost:11434"
DEFAULT_MODEL = "llama3.1:8b"

# Create MCP server
app = Server("ollama-assistant")


@app.list_tools()
async def list_tools() -> list[Tool]:
    """List available Ollama tools"""
    return [
        Tool(
            name="ask_ollama",
            description="Ask the local Ollama model a question. Use for simple queries, code review, or when you want a second opinion. The model has no context of the conversation.",
            inputSchema={
                "type": "object",
                "properties": {
                    "prompt": {
                        "type": "string",
                        "description": "The question or task for Ollama"
                    },
                    "model": {
                        "type": "string",
                        "description": "Model to use (default: llama3.1:8b)",
                        "default": DEFAULT_MODEL
                    },
                    "system": {
                        "type": "string",
                        "description": "System prompt to set context/role",
                        "default": "You are a helpful coding assistant."
                    }
                },
                "required": ["prompt"]
            }
        ),
        Tool(
            name="analyze_code_local",
            description="Analyze code using local Ollama model. Good for privacy-sensitive code or large codebases. Returns analysis without sending code to external APIs.",
            inputSchema={
                "type": "object",
                "properties": {
                    "code": {
                        "type": "string",
                        "description": "Code to analyze"
                    },
                    "language": {
                        "type": "string",
                        "description": "Programming language"
                    },
                    "analysis_type": {
                        "type": "string",
                        "enum": ["security", "performance", "quality", "bugs", "general"],
                        "description": "Type of analysis to perform"
                    }
                },
                "required": ["code", "language"]
            }
        ),
        Tool(
            name="summarize_large_file",
            description="Summarize large files using local model. No size limits or API costs.",
            inputSchema={
                "type": "object",
                "properties": {
                    "content": {
                        "type": "string",
                        "description": "File content to summarize"
                    },
                    "summary_length": {
                        "type": "string",
                        "enum": ["brief", "detailed", "technical"],
                        "default": "brief"
                    }
                },
                "required": ["content"]
            }
        )
    ]


@app.call_tool()
async def call_tool(name: str, arguments: Any) -> list[TextContent]:
    """Execute Ollama tool"""

    if name == "ask_ollama":
        prompt = arguments["prompt"]
        model = arguments.get("model", DEFAULT_MODEL)
        system = arguments.get("system", "You are a helpful coding assistant.")

        response = await query_ollama(prompt, model, system)
        return [TextContent(type="text", text=response)]

    elif name == "analyze_code_local":
        code = arguments["code"]
        language = arguments["language"]
        analysis_type = arguments.get("analysis_type", "general")

        system = f"You are a {language} code analyzer. Focus on {analysis_type} analysis."
        prompt = f"Analyze this {language} code:\n\n```{language}\n{code}\n```\n\nProvide a {analysis_type} analysis."

        # Uses the code-focused model; assumes codellama:13b was pulled in step 2
        response = await query_ollama(prompt, "codellama:13b", system)
        return [TextContent(type="text", text=response)]

    elif name == "summarize_large_file":
        content = arguments["content"]
        summary_length = arguments.get("summary_length", "brief")

        system = f"You are a file summarizer. Create {summary_length} summaries."
        prompt = f"Summarize this file content:\n\n{content}"

        response = await query_ollama(prompt, DEFAULT_MODEL, system)
        return [TextContent(type="text", text=response)]

    else:
        raise ValueError(f"Unknown tool: {name}")


async def query_ollama(prompt: str, model: str, system: str) -> str:
    """Query the Ollama HTTP API"""
    async with httpx.AsyncClient(timeout=120.0) as client:
        response = await client.post(
            f"{OLLAMA_HOST}/api/generate",
            json={
                "model": model,
                "prompt": prompt,
                "system": system,
                "stream": False
            }
        )
        response.raise_for_status()
        result = response.json()
        return result["response"]


async def main():
    """Run the MCP server over stdio"""
    from mcp.server.stdio import stdio_server

    async with stdio_server() as (read_stream, write_stream):
        await app.run(
            read_stream,
            write_stream,
            app.create_initialization_options()
        )


if __name__ == "__main__":
    asyncio.run(main())
```

### 5. Install MCP Server Dependencies

```bash
cd D:\ClaudeTools\mcp-servers\ollama-assistant
python -m venv venv
venv\Scripts\activate
pip install mcp httpx
```

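With the dependencies installed, you can exercise the server's `query_ollama()` helper directly, before any MCP wiring. A minimal sketch, run from the `ollama-assistant` directory with the venv activated:

```python
# smoke_test.py - call query_ollama() from server.py directly (no MCP involved).
# Assumes server.py from step 4 is in the current directory and Ollama is running.
import asyncio

from server import DEFAULT_MODEL, query_ollama

answer = asyncio.run(
    query_ollama(
        prompt="Summarize what an MCP server does in two sentences.",
        model=DEFAULT_MODEL,
        system="You are a helpful coding assistant.",
    )
)
print(answer)
```
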
### 6. Configure in Claude Code

**Edit:** `.mcp.json` (in D:\ClaudeTools)

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "your-token-here"
      }
    },
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "D:\\ClaudeTools"]
    },
    "sequential-thinking": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-sequential-thinking"]
    },
    "ollama-assistant": {
      "command": "D:\\ClaudeTools\\mcp-servers\\ollama-assistant\\venv\\Scripts\\python.exe",
      "args": [
        "D:\\ClaudeTools\\mcp-servers\\ollama-assistant\\server.py"
      ]
    }
  }
}
```

The `ollama-assistant` entry points `command` directly at the venv's `python.exe`, so the server runs with the dependencies installed in step 5.

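To catch path typos before restarting Claude Code, a small check script can verify that the `ollama-assistant` entry points at files that exist. A sketch, assuming `.mcp.json` lives in `D:\ClaudeTools` as above:

```python
# check_mcp_config.py - verify the ollama-assistant entry in .mcp.json
# references an interpreter and server script that actually exist on disk.
import json
from pathlib import Path

config = json.loads(Path(r"D:\ClaudeTools\.mcp.json").read_text(encoding="utf-8"))
entry = config["mcpServers"]["ollama-assistant"]

for path in [entry["command"], *entry["args"]]:
    status = "OK     " if Path(path).exists() else "MISSING"
    print(f"{status} {path}")
```
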
---

## Usage Examples

### Example 1: Ask Ollama for a Second Opinion

```
User: "Review this authentication code for security issues"

Claude: Let me delegate this to the local Ollama model for a privacy-focused review.

[Uses ask_ollama tool]
Ollama: "Found potential issues: 1. Password not hashed... 2. No rate limiting..."

Claude: Based on the local analysis, here are the security concerns...
```

### Example 2: Analyze Large Codebase Locally

```
User: "Analyze this 10,000 line file for performance issues"

Claude: This is large - I'll use the local model to avoid API costs.

[Uses summarize_large_file tool]
Ollama: "Main performance bottlenecks: 1. N+1 queries... 2. Missing indexes..."

Claude: Here's the performance analysis from the local model...
```

### Example 3: Offline Development

```
User: "Help me debug this code" (while offline)

Claude: API unavailable, using local Ollama model...

[Uses analyze_code_local tool]
Ollama: "Bug found on line 42: null reference..."

Claude: The local model identified the issue...
```

---

## Option 2: Standalone Ollama with MCP Tools

Run Ollama as a separate agent with its own MCP server access.

**Architecture:**
```
┌─────────────────┐     ┌─────────────────────┐
│  Claude Code    │     │    Ollama + MCP     │
│  (Main Agent)   │────▶│   (Helper Agent)    │
└─────────────────┘     └──────────┬──────────┘
                                   │
                                   │ MCP Protocol
                                   ↓
                        ┌──────────────────────┐
                        │    MCP Servers       │
                        │    - Filesystem      │
                        │    - Bash            │
                        │    - Custom tools    │
                        └──────────────────────┘
```

**Tool:** Use `ollama-mcp` or similar wrapper that gives Ollama access to MCP servers.

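No specific wrapper is prescribed here, so the following is only a rough sketch of the pattern, not any particular tool's API. It assumes Ollama's `/api/chat` tool-calling support (available for models such as llama3.1) and uses a hypothetical `run_mcp_tool()` placeholder where a real wrapper would invoke its MCP client:

```python
# Sketch of a helper-agent loop: Ollama picks a tool, a placeholder
# run_mcp_tool() would execute it via an MCP client, and the result is
# fed back to the model until it produces a final answer.
import httpx

OLLAMA_HOST = "http://localhost:11434"

TOOLS = [{
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a file from the local filesystem",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]


def run_mcp_tool(name: str, arguments: dict) -> str:
    """Placeholder: a real wrapper would forward this call to an MCP server."""
    raise NotImplementedError


def helper_agent(task: str, model: str = "llama3.1:8b") -> str:
    messages = [{"role": "user", "content": task}]
    while True:
        reply = httpx.post(
            f"{OLLAMA_HOST}/api/chat",
            json={"model": model, "messages": messages,
                  "tools": TOOLS, "stream": False},
            timeout=120.0,
        ).json()["message"]
        messages.append(reply)

        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:  # no tool requested: the model answered directly
            return reply["content"]
        for call in tool_calls:  # execute each requested tool and report back
            fn = call["function"]
            messages.append({"role": "tool",
                             "content": run_mcp_tool(fn["name"], fn["arguments"])})
```
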
---

## Option 3: Hybrid Task Distribution

Use Claude as the coordinator and Ollama for execution.

**When to use Ollama:**
- Privacy-sensitive code review
- Large file processing (no token limits)
- Offline work
- Cost optimization (simple tasks)
- Repetitive analysis

**When to use Claude:**
- Complex reasoning
- Multi-step planning
- API integrations
- Final decision-making
- User communication

A simple routing heuristic built from these criteria is sketched below.

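The sketch is illustrative only: the `Task` fields and the size threshold are assumptions standing in for whatever metadata a coordinator actually tracks.

```python
# Sketch: route a task to the local Ollama model or to Claude based on
# the criteria listed above. The Task fields and threshold are hypothetical.
from dataclasses import dataclass


@dataclass
class Task:
    description: str
    privacy_sensitive: bool = False   # e.g. proprietary code review
    content_chars: int = 0            # size of the payload to process
    offline: bool = False             # no internet/API available
    needs_planning: bool = False      # multi-step reasoning / final decisions


def choose_backend(task: Task) -> str:
    """Return 'ollama' or 'claude' for a given task."""
    if task.offline or task.privacy_sensitive:
        return "ollama"                  # must stay local
    if task.content_chars > 50_000 and not task.needs_planning:
        return "ollama"                  # large, mechanical processing
    if task.needs_planning:
        return "claude"                  # complex reasoning, final decisions
    return "ollama"                      # default simple tasks to the cheap model


print(choose_backend(Task("Summarize build log", content_chars=200_000)))       # ollama
print(choose_backend(Task("Design the migration plan", needs_planning=True)))   # claude
```
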
---

## Recommended Models for Different Tasks

| Task Type | Recommended Model | Size | Reason |
|-----------|-------------------|------|--------|
| Code Review | qwen2.5-coder:7b | 4.7GB | Best code understanding |
| Code Generation | codellama:13b | 7.4GB | Trained on code |
| General Queries | llama3.1:8b | 4.7GB | Balanced performance |
| Fast Responses | mistral:7b | 4.1GB | Speed optimized |
| Large Context | llama3.1:70b | 40GB | 128k context (needs GPU) |

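If you want the server to pick a model per task instead of hard-coding one (step 4 uses `codellama:13b` for `analyze_code_local`), a small lookup mirroring this table could be added to `server.py`. A sketch with hypothetical task keys:

```python
# Per-task default models mirroring the table above; model_for() could
# replace the hard-coded choices in call_tool() in server.py.
MODEL_BY_TASK = {
    "code_review": "qwen2.5-coder:7b",
    "code_generation": "codellama:13b",
    "general": "llama3.1:8b",
    "fast": "mistral:7b",
}


def model_for(task: str) -> str:
    """Fall back to the general-purpose model for unknown task types."""
    return MODEL_BY_TASK.get(task, MODEL_BY_TASK["general"])
```
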
---

## Performance Considerations

**CPU Only:**
- llama3.1:8b: ~2-5 tokens/sec
- Usable for short queries

**GPU (NVIDIA):**
- llama3.1:8b: ~30-100 tokens/sec
- codellama:13b: ~20-50 tokens/sec
- Much faster, recommended

**Enable GPU in Ollama:**
```bash
# Ollama auto-detects GPU
# Verify: check Ollama logs for "CUDA" or "Metal"
```

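To measure what your own hardware delivers, the non-streaming `/api/generate` response includes timing counters that convert directly into tokens per second. A minimal sketch; the field names (`eval_count`, `eval_duration` in nanoseconds) are as exposed by current Ollama releases and should be treated as an assumption if your version differs:

```python
# Measure generation speed using the timing fields returned by /api/generate
# ("eval_count" = generated tokens, "eval_duration" = time in nanoseconds).
import httpx

resp = httpx.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.1:8b",
          "prompt": "Explain the MCP protocol in one paragraph.",
          "stream": False},
    timeout=300.0,
).json()

tokens = resp["eval_count"]
seconds = resp["eval_duration"] / 1e9
print(f"{tokens} tokens in {seconds:.1f}s -> {tokens / seconds:.1f} tokens/sec")
```
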
---

## Next Steps

1. Install Ollama
2. Pull a model (llama3.1:8b recommended)
3. Create the MCP server (use the code above)
4. Configure `.mcp.json`
5. Restart Claude Code
6. Test: "Use the local Ollama model to analyze this code"

---

**Status:** Design phase - ready to implement
**Created:** 2026-01-22