sync: Auto-sync from ACG-M-L5090 at 2026-01-22 19:22:24
Synced files:
- Grepai optimization documentation
- Ollama Assistant MCP server implementation
- Session logs and context updates

Machine: ACG-M-L5090
Timestamp: 2026-01-22 19:22:24

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

mcp-servers/ollama-assistant/INSTALL.md (new file, 345 lines)

# Ollama MCP Server Installation Guide

Follow these steps to set up local AI assistance for Claude Code.

---

## Step 1: Install Ollama

**Option A: Using winget (Recommended)**
```powershell
winget install Ollama.Ollama
```

**Option B: Manual Download**
1. Go to https://ollama.ai/download
2. Download the Windows installer
3. Run the installer

**Verify Installation:**
```powershell
ollama --version
```

Expected output: `ollama version is X.Y.Z`

---

## Step 2: Start Ollama Server

**Start the server:**
```powershell
ollama serve
```

Leave this terminal open - Ollama needs to run in the background.

**Tip:** Ollama usually starts automatically after installation. Check the system tray for the Ollama icon.

---

## Step 3: Pull a Model

**Open a NEW terminal** and pull a model:

**Recommended for most users:**
```powershell
ollama pull llama3.1:8b
```
Size: 4.7GB | Speed: Fast | Quality: Good

**Best for code:**
```powershell
ollama pull qwen2.5-coder:7b
```
Size: 4.7GB | Speed: Fast | Quality: Excellent for code

**Alternative options:**
```powershell
# Faster, smaller
ollama pull mistral:7b        # 4.1GB

# Better quality, larger
ollama pull llama3.1:70b      # 40GB (requires good GPU)

# Code-focused
ollama pull codellama:13b     # 7.4GB
```

**Verify model is available:**
```powershell
ollama list
```
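
If you prefer to check from code, the same model list is available from Ollama's HTTP API at `/api/tags`, the endpoint the MCP server's `ollama_status` tool queries. A minimal sketch, assuming Ollama is on its default port and `httpx` is installed:

```python
# Sketch: list locally installed models via Ollama's HTTP API.
# Assumes Ollama is running on the default port (11434) and httpx is installed.
import httpx

resp = httpx.get("http://localhost:11434/api/tags", timeout=10.0)
resp.raise_for_status()
for model in resp.json().get("models", []):
    size_gb = model.get("size", 0) / (1024 ** 3)
    print(f"{model['name']} ({size_gb:.1f} GB)")
```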

---

## Step 4: Test Ollama

```powershell
ollama run llama3.1:8b "Explain what MCP is in one sentence"
```

Expected: You should get a response from the model.

Press `Ctrl+D` or type `/bye` to exit the chat.
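
The CLI test above exercises the same HTTP endpoint the MCP server will use. If you want to hit that API directly, a minimal sketch (assuming `httpx` is installed and `llama3.1:8b` has been pulled) mirrors what the `ask_ollama` tool does internally:

```python
# Sketch: call Ollama's /api/generate endpoint directly,
# mirroring what the MCP server's ask_ollama tool does under the hood.
import httpx

payload = {
    "model": "llama3.1:8b",
    "prompt": "Explain what MCP is in one sentence",
    "system": "You are a helpful coding assistant.",
    "stream": False,
}
resp = httpx.post("http://localhost:11434/api/generate", json=payload, timeout=120.0)
resp.raise_for_status()
print(resp.json()["response"])
```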

---

## Step 5: Setup MCP Server

**Run the setup script:**
```powershell
cd D:\ClaudeTools\mcp-servers\ollama-assistant
.\setup.ps1
```

This will:
- Create Python virtual environment
- Install MCP dependencies (mcp, httpx)
- Check Ollama installation
- Verify everything is configured

**Expected output:**
```
[OK] Python installed
[OK] Virtual environment created
[OK] Dependencies installed
[OK] Ollama installed
[OK] Ollama server is running
[OK] Found compatible models
Setup Complete!
```

---

## Step 6: Configure Claude Code

The `.mcp.json` file has already been updated with the Ollama configuration.

**Verify configuration:**
```powershell
cat D:\ClaudeTools\.mcp.json
```

You should see an `ollama-assistant` entry.
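
If you'd rather verify this from code than by eye, a small sketch like the following (assuming the `D:\ClaudeTools\.mcp.json` path used throughout this guide) checks for the entry:

```python
# Sketch: confirm that .mcp.json contains an ollama-assistant server entry.
# The path below matches the one used in this guide; adjust if yours differs.
import json
from pathlib import Path

config = json.loads(Path(r"D:\ClaudeTools\.mcp.json").read_text(encoding="utf-8"))
servers = config.get("mcpServers", {})
if "ollama-assistant" in servers:
    print("ollama-assistant entry found:", servers["ollama-assistant"].get("command"))
else:
    print("ollama-assistant entry is missing - rerun setup or edit .mcp.json")
```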

---

## Step 7: Restart Claude Code

**IMPORTANT:** You must completely restart Claude Code for MCP changes to take effect.

1. Close Claude Code completely
2. Reopen Claude Code
3. Navigate to the D:\ClaudeTools directory

---

## Step 8: Test Integration

Try these commands in Claude Code:

**Test 1: Check status**
```
Use the ollama_status tool to check if Ollama is running
```

**Test 2: Ask a question**
```
Use ask_ollama to ask: "What is the fastest sorting algorithm?"
```

**Test 3: Analyze code**
```
Use analyze_code_local to review this Python function for bugs:

def divide(a, b):
    return a / b
```

---

## Troubleshooting

### Ollama Not Running

**Error:** `Cannot connect to Ollama at http://localhost:11434`

**Fix:**
```powershell
# Start Ollama
ollama serve

# Or check if it's already running
netstat -ano | findstr :11434
```

### Model Not Found

**Error:** `Model 'llama3.1:8b' not found`

**Fix:**
```powershell
# Pull the model
ollama pull llama3.1:8b

# Verify it's installed
ollama list
```

### Python Virtual Environment Issues

**Error:** `python: command not found`

**Fix:**
1. Install Python 3.8+ from python.org
2. Add Python to PATH
3. Rerun setup.ps1

### MCP Server Not Loading

**Check Claude Code logs:**
```powershell
# Look for MCP-related errors
# Logs are typically in: %APPDATA%\Claude\logs\
```

**Verify Python path:**
```powershell
D:\ClaudeTools\mcp-servers\ollama-assistant\venv\Scripts\python.exe --version
```

### Port 11434 Already in Use

**Error:** `Port 11434 is already in use`

**Fix:**
```powershell
# Find what's using the port
netstat -ano | findstr :11434

# Kill the process (replace PID)
taskkill /F /PID <PID>

# Restart Ollama
ollama serve
```
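
If the `netstat` output is hard to read, a quick Python check (a generic sketch, nothing Ollama-specific) tells you whether anything is listening on port 11434:

```python
# Sketch: check whether something is already listening on Ollama's default port.
import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    sock.settimeout(1.0)
    in_use = sock.connect_ex(("localhost", 11434)) == 0

print("Port 11434 is in use" if in_use else "Port 11434 is free")
```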

---

## Performance Tips

### GPU Acceleration

**Ollama automatically uses your GPU if available (NVIDIA/AMD).**

**Check GPU usage:**
```powershell
# NVIDIA
nvidia-smi

# AMD
# Check Task Manager > Performance > GPU
```

### CPU Performance

If using CPU only:
- Smaller models (7b-8b) work better
- Expect 2-5 tokens/second
- Close other applications for better performance

### Faster Response Times

```powershell
# Use smaller models for speed
ollama pull mistral:7b

# Or quantized versions (smaller, faster)
ollama pull llama3.1:8b-q4_0
```

---

## Usage Examples

### Example 1: Private Code Review

```
I have some proprietary code I don't want to send to external APIs.
Can you use the local Ollama model to review it for security issues?

[Paste code]
```

Claude will use `analyze_code_local` to review it locally.
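
Under the hood, `analyze_code_local` builds a language- and focus-specific prompt and prefers a code model, falling back to the default model if it isn't installed. A simplified sketch of that flow (the real implementation is in `server.py`; the standalone `analyze` helper here is only for illustration):

```python
# Simplified sketch of the analyze_code_local flow from server.py:
# build a focused prompt, prefer a code model, fall back to the default.
import httpx

OLLAMA_HOST = "http://localhost:11434"

def analyze(code: str, language: str, analysis_type: str = "security") -> str:
    system = f"You are a {language} code analyzer. Focus on {analysis_type} analysis."
    prompt = f"Analyze this {language} code:\n\n```{language}\n{code}\n```"
    for model in ("qwen2.5-coder:7b", "codellama:13b", "llama3.1:8b"):
        try:
            resp = httpx.post(
                f"{OLLAMA_HOST}/api/generate",
                json={"model": model, "prompt": prompt, "system": system, "stream": False},
                timeout=120.0,
            )
            resp.raise_for_status()
            return resp.json()["response"]
        except httpx.HTTPError:
            continue  # model missing or request failed; try the next one
    raise RuntimeError("No usable Ollama model found")

print(analyze("def divide(a, b):\n    return a / b", "python"))
```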

### Example 2: Large File Summary

```
Summarize this 50,000 line log file using the local model to avoid API costs.

[Paste content]
```

Claude will use `summarize_large_file` locally.
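
For reference, `summarize_large_file` keeps the local request bounded by truncating the content to the first 50,000 characters and steering the summary style through the system prompt. A condensed sketch (the `build_summary_request` helper is illustrative, not part of the server):

```python
# Condensed sketch of summarize_large_file from server.py:
# truncate the input and steer the summary style via the system prompt.
LENGTH_INSTRUCTIONS = {
    "brief": "Create a concise 2-3 sentence summary.",
    "detailed": "Create a comprehensive paragraph summary covering main points.",
    "technical": "Create a technical summary highlighting key functions, classes, and architecture.",
}

def build_summary_request(content: str, summary_length: str = "brief") -> dict:
    return {
        "model": "llama3.1:8b",
        "system": f"You are a file summarizer. {LENGTH_INSTRUCTIONS[summary_length]}",
        "prompt": f"Summarize this content:\n\n{content[:50000]}",  # first 50k chars only
        "stream": False,
    }

print(build_summary_request("line 1\nline 2\n" * 10000)["prompt"][:80])
```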

### Example 3: Offline Development

```
I'm offline - can you still help with this code?
```

Claude will delegate to the local Ollama model automatically.

---

## What Models to Use When

| Task | Best Model | Why |
|------|-----------|-----|
| Code review | qwen2.5-coder:7b | Trained specifically for code |
| Code generation | codellama:13b | Best code completion |
| General questions | llama3.1:8b | Balanced performance |
| Speed priority | mistral:7b | Fastest responses |
| Quality priority | llama3.1:70b | Best reasoning (needs GPU) |

---

## Uninstall

To remove the Ollama MCP server:

1. **Remove from `.mcp.json`:**
   Delete the `ollama-assistant` entry

2. **Delete files:**
   ```powershell
   Remove-Item -Recurse D:\ClaudeTools\mcp-servers\ollama-assistant
   ```

3. **Uninstall Ollama (optional):**
   ```powershell
   winget uninstall Ollama.Ollama
   ```

4. **Restart Claude Code**

---

## Next Steps

Once installed:
1. Try asking me to use local Ollama for tasks
2. I'll automatically delegate when appropriate:
   - Privacy-sensitive code
   - Large files
   - Offline work
   - Cost optimization

The integration is transparent - you can work normally and I'll decide when to use local vs. cloud AI.

---

**Status:** Ready to install
**Estimated Setup Time:** 10-15 minutes (including model download)
**Disk Space Required:** ~5-10GB (for models)

mcp-servers/ollama-assistant/README.md (new file, 413 lines)

# Ollama MCP Server - Local AI Assistant

**Purpose:** Integrate Ollama local models with Claude Code via MCP, allowing Claude to delegate tasks to a local model that has computer access.

## Use Cases

- **Code Analysis:** Delegate code review to local model for privacy-sensitive code
- **Data Processing:** Process large local datasets without API costs
- **Offline Work:** Continue working when internet/API is unavailable
- **Cost Optimization:** Use local model for simple tasks, Claude for complex reasoning

---

## Architecture

```
┌─────────────────┐
│   Claude Code   │  (Coordinator)
└────────┬────────┘
         │
         │ MCP Protocol
         ↓
┌─────────────────────────────┐
│  Ollama MCP Server          │
│  - Exposes tools:           │
│    • ask_ollama()           │
│    • analyze_code()         │
│    • process_data()         │
└────────┬────────────────────┘
         │
         │ HTTP API
         ↓
┌─────────────────────────────┐
│  Ollama                     │
│  - Model: llama3.1:8b       │
│  - Local execution          │
└─────────────────────────────┘
```

---

## Installation

### 1. Install Ollama

**Windows:**
```powershell
# Download from https://ollama.ai/download
# Or use winget
winget install Ollama.Ollama
```

**Verify Installation:**
```bash
ollama --version
```

### 2. Pull a Model

```bash
# Recommended models:
ollama pull llama3.1:8b        # Best balance (4.7GB)
ollama pull codellama:13b      # Code-focused (7.4GB)
ollama pull mistral:7b         # Fast, good reasoning (4.1GB)
ollama pull qwen2.5-coder:7b   # Excellent for code (4.7GB)
```

### 3. Test Ollama

```bash
ollama run llama3.1:8b "What is MCP?"
```

### 4. Create MCP Server

**File:** `mcp-servers/ollama-assistant/server.py`

```python
#!/usr/bin/env python3
"""
Ollama MCP Server
Provides local AI assistance to Claude Code via MCP protocol
"""

import asyncio
import json
from typing import Any

import httpx
from mcp.server import Server
from mcp.types import Tool, TextContent

# Configuration
OLLAMA_HOST = "http://localhost:11434"
DEFAULT_MODEL = "llama3.1:8b"

# Create MCP server
app = Server("ollama-assistant")


@app.list_tools()
async def list_tools() -> list[Tool]:
    """List available Ollama tools"""
    return [
        Tool(
            name="ask_ollama",
            description="Ask the local Ollama model a question. Use for simple queries, code review, or when you want a second opinion. The model has no context of the conversation.",
            inputSchema={
                "type": "object",
                "properties": {
                    "prompt": {
                        "type": "string",
                        "description": "The question or task for Ollama"
                    },
                    "model": {
                        "type": "string",
                        "description": "Model to use (default: llama3.1:8b)",
                        "default": DEFAULT_MODEL
                    },
                    "system": {
                        "type": "string",
                        "description": "System prompt to set context/role",
                        "default": "You are a helpful coding assistant."
                    }
                },
                "required": ["prompt"]
            }
        ),
        Tool(
            name="analyze_code_local",
            description="Analyze code using local Ollama model. Good for privacy-sensitive code or large codebases. Returns analysis without sending code to external APIs.",
            inputSchema={
                "type": "object",
                "properties": {
                    "code": {
                        "type": "string",
                        "description": "Code to analyze"
                    },
                    "language": {
                        "type": "string",
                        "description": "Programming language"
                    },
                    "analysis_type": {
                        "type": "string",
                        "enum": ["security", "performance", "quality", "bugs", "general"],
                        "description": "Type of analysis to perform"
                    }
                },
                "required": ["code", "language"]
            }
        ),
        Tool(
            name="summarize_large_file",
            description="Summarize large files using local model. No size limits or API costs.",
            inputSchema={
                "type": "object",
                "properties": {
                    "content": {
                        "type": "string",
                        "description": "File content to summarize"
                    },
                    "summary_length": {
                        "type": "string",
                        "enum": ["brief", "detailed", "technical"],
                        "default": "brief"
                    }
                },
                "required": ["content"]
            }
        )
    ]


@app.call_tool()
async def call_tool(name: str, arguments: Any) -> list[TextContent]:
    """Execute Ollama tool"""

    if name == "ask_ollama":
        prompt = arguments["prompt"]
        model = arguments.get("model", DEFAULT_MODEL)
        system = arguments.get("system", "You are a helpful coding assistant.")

        response = await query_ollama(prompt, model, system)
        return [TextContent(type="text", text=response)]

    elif name == "analyze_code_local":
        code = arguments["code"]
        language = arguments["language"]
        analysis_type = arguments.get("analysis_type", "general")

        system = f"You are a {language} code analyzer. Focus on {analysis_type} analysis."
        prompt = f"Analyze this {language} code:\n\n```{language}\n{code}\n```\n\nProvide a {analysis_type} analysis."

        response = await query_ollama(prompt, "codellama:13b", system)
        return [TextContent(type="text", text=response)]

    elif name == "summarize_large_file":
        content = arguments["content"]
        summary_length = arguments.get("summary_length", "brief")

        system = f"You are a file summarizer. Create {summary_length} summaries."
        prompt = f"Summarize this file content:\n\n{content}"

        response = await query_ollama(prompt, DEFAULT_MODEL, system)
        return [TextContent(type="text", text=response)]

    else:
        raise ValueError(f"Unknown tool: {name}")


async def query_ollama(prompt: str, model: str, system: str) -> str:
    """Query Ollama API"""
    async with httpx.AsyncClient(timeout=120.0) as client:
        response = await client.post(
            f"{OLLAMA_HOST}/api/generate",
            json={
                "model": model,
                "prompt": prompt,
                "system": system,
                "stream": False
            }
        )
        response.raise_for_status()
        result = response.json()
        return result["response"]


async def main():
    """Run MCP server"""
    from mcp.server.stdio import stdio_server

    async with stdio_server() as (read_stream, write_stream):
        await app.run(
            read_stream,
            write_stream,
            app.create_initialization_options()
        )


if __name__ == "__main__":
    asyncio.run(main())
```

### 5. Install MCP Server Dependencies

```bash
cd D:\ClaudeTools\mcp-servers\ollama-assistant
python -m venv venv
venv\Scripts\activate
pip install mcp httpx
```

### 6. Configure in Claude Code

**Edit:** `.mcp.json` (in D:\ClaudeTools)

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "your-token-here"
      }
    },
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "D:\\ClaudeTools"]
    },
    "sequential-thinking": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-sequential-thinking"]
    },
    "ollama-assistant": {
      "command": "D:\\ClaudeTools\\mcp-servers\\ollama-assistant\\venv\\Scripts\\python.exe",
      "args": [
        "D:\\ClaudeTools\\mcp-servers\\ollama-assistant\\server.py"
      ]
    }
  }
}
```

---

## Usage Examples

### Example 1: Ask Ollama for a Second Opinion

```
User: "Review this authentication code for security issues"

Claude: Let me delegate this to the local Ollama model for a privacy-focused review.

[Uses ask_ollama tool]
Ollama: "Found potential issues: 1. Password not hashed... 2. No rate limiting..."

Claude: Based on the local analysis, here are the security concerns...
```

### Example 2: Analyze Large Codebase Locally

```
User: "Analyze this 10,000 line file for performance issues"

Claude: This is large - I'll use the local model to avoid API costs.

[Uses summarize_large_file tool]
Ollama: "Main performance bottlenecks: 1. N+1 queries... 2. Missing indexes..."

Claude: Here's the performance analysis from the local model...
```

### Example 3: Offline Development

```
User: "Help me debug this code" (while offline)

Claude: API unavailable, using local Ollama model...

[Uses analyze_code_local tool]
Ollama: "Bug found on line 42: null reference..."

Claude: The local model identified the issue...
```

---

## Option 2: Standalone Ollama with MCP Tools

Run Ollama as a separate agent with its own MCP server access.

**Architecture:**
```
┌─────────────────┐     ┌─────────────────────┐
│   Claude Code   │     │   Ollama + MCP      │
│  (Main Agent)   │────▶│   (Helper Agent)    │
└─────────────────┘     └──────────┬──────────┘
                                   │
                                   │ MCP Protocol
                                   ↓
                        ┌──────────────────────┐
                        │   MCP Servers        │
                        │   - Filesystem       │
                        │   - Bash             │
                        │   - Custom tools     │
                        └──────────────────────┘
```

**Tool:** Use `ollama-mcp` or a similar wrapper that gives Ollama access to MCP servers.

---

## Option 3: Hybrid Task Distribution

Use Claude as the coordinator and Ollama for execution.

**When to use Ollama:**
- Privacy-sensitive code review
- Large file processing (no token limits)
- Offline work
- Cost optimization (simple tasks)
- Repetitive analysis

**When to use Claude:**
- Complex reasoning
- Multi-step planning
- API integrations
- Final decision-making
- User communication
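
As a rough illustration of this split, here is a hypothetical routing helper; `choose_backend` and its thresholds are illustrative only and not part of the MCP server:

```python
# Hypothetical routing helper illustrating the hybrid policy above.
# The heuristics and thresholds are illustrative, not prescriptive.
def choose_backend(task: str, *, sensitive: bool = False, offline: bool = False,
                   content_chars: int = 0, needs_planning: bool = False) -> str:
    if offline or sensitive:
        return "ollama"          # privacy-sensitive or no connectivity: stay local
    if content_chars > 100_000:
        return "ollama"          # very large inputs: avoid API token costs
    if needs_planning:
        return "claude"          # multi-step reasoning and coordination
    return "claude" if task == "user_communication" else "ollama"

print(choose_backend("code_review", sensitive=True))          # -> ollama
print(choose_backend("refactor_plan", needs_planning=True))   # -> claude
```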

---

## Recommended Models for Different Tasks

| Task Type | Recommended Model | Size | Reason |
|-----------|------------------|------|--------|
| Code Review | qwen2.5-coder:7b | 4.7GB | Best code understanding |
| Code Generation | codellama:13b | 7.4GB | Trained on code |
| General Queries | llama3.1:8b | 4.7GB | Balanced performance |
| Fast Responses | mistral:7b | 4.1GB | Speed optimized |
| Large Context | llama3.1:70b | 40GB | 128k context (needs GPU) |

---

## Performance Considerations

**CPU Only:**
- llama3.1:8b: ~2-5 tokens/sec
- Usable for short queries

**GPU (NVIDIA):**
- llama3.1:8b: ~30-100 tokens/sec
- codellama:13b: ~20-50 tokens/sec
- Much faster, recommended

**Enable GPU in Ollama:**
```bash
# Ollama auto-detects GPU
# Verify: check Ollama logs for "CUDA" or "Metal"
```
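
The throughput figures above can be measured rather than guessed: Ollama's non-streaming `/api/generate` response includes `eval_count` and `eval_duration` (nanoseconds) metadata, so tokens/sec follows directly. A small sketch, assuming those fields are present in your Ollama version:

```python
# Sketch: measure generation throughput from Ollama's response metadata.
# Assumes the response includes eval_count / eval_duration (nanoseconds);
# adjust if your Ollama version reports different fields.
import httpx

resp = httpx.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.1:8b", "prompt": "Write a haiku about GPUs", "stream": False},
    timeout=120.0,
)
resp.raise_for_status()
data = resp.json()
tokens_per_sec = data["eval_count"] / (data["eval_duration"] / 1e9)
print(f"{tokens_per_sec:.1f} tokens/sec")
```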

---

## Next Steps

1. Install Ollama
2. Pull a model (llama3.1:8b recommended)
3. Create MCP server (use code above)
4. Configure `.mcp.json`
5. Restart Claude Code
6. Test: "Use the local Ollama model to analyze this code"

---

**Status:** Design phase - ready to implement
**Created:** 2026-01-22

mcp-servers/ollama-assistant/requirements.txt (new file, 7 lines)

# Ollama MCP Server Dependencies

# MCP SDK
mcp>=0.1.0

# HTTP client for Ollama API
httpx>=0.25.0

mcp-servers/ollama-assistant/server.py (new file, 238 lines)

#!/usr/bin/env python3
"""
Ollama MCP Server
Provides local AI assistance to Claude Code via MCP protocol
"""

import asyncio
import json
import sys
from typing import Any

import httpx

# MCP imports
try:
    from mcp.server import Server
    from mcp.types import Tool, TextContent
except ImportError:
    print("[ERROR] MCP package not installed. Run: pip install mcp", file=sys.stderr)
    sys.exit(1)

# Configuration
OLLAMA_HOST = "http://localhost:11434"
DEFAULT_MODEL = "llama3.1:8b"

# Create MCP server
app = Server("ollama-assistant")


@app.list_tools()
async def list_tools() -> list[Tool]:
    """List available Ollama tools"""
    return [
        Tool(
            name="ask_ollama",
            description="Ask the local Ollama model a question. Use for simple queries, code review, or when you want a second opinion. The model has no context of the conversation.",
            inputSchema={
                "type": "object",
                "properties": {
                    "prompt": {
                        "type": "string",
                        "description": "The question or task for Ollama"
                    },
                    "model": {
                        "type": "string",
                        "description": "Model to use (default: llama3.1:8b)",
                        "default": DEFAULT_MODEL
                    },
                    "system": {
                        "type": "string",
                        "description": "System prompt to set context/role",
                        "default": "You are a helpful coding assistant."
                    }
                },
                "required": ["prompt"]
            }
        ),
        Tool(
            name="analyze_code_local",
            description="Analyze code using local Ollama model. Good for privacy-sensitive code or large codebases. Returns analysis without sending code to external APIs.",
            inputSchema={
                "type": "object",
                "properties": {
                    "code": {
                        "type": "string",
                        "description": "Code to analyze"
                    },
                    "language": {
                        "type": "string",
                        "description": "Programming language"
                    },
                    "analysis_type": {
                        "type": "string",
                        "enum": ["security", "performance", "quality", "bugs", "general"],
                        "description": "Type of analysis to perform",
                        "default": "general"
                    }
                },
                "required": ["code", "language"]
            }
        ),
        Tool(
            name="summarize_large_file",
            description="Summarize large files using local model. No size limits or API costs.",
            inputSchema={
                "type": "object",
                "properties": {
                    "content": {
                        "type": "string",
                        "description": "File content to summarize"
                    },
                    "summary_length": {
                        "type": "string",
                        "enum": ["brief", "detailed", "technical"],
                        "default": "brief"
                    }
                },
                "required": ["content"]
            }
        ),
        Tool(
            name="ollama_status",
            description="Check Ollama server status and list available models",
            inputSchema={
                "type": "object",
                "properties": {}
            }
        )
    ]


@app.call_tool()
async def call_tool(name: str, arguments: Any) -> list[TextContent]:
    """Execute Ollama tool"""

    if name == "ask_ollama":
        prompt = arguments["prompt"]
        model = arguments.get("model", DEFAULT_MODEL)
        system = arguments.get("system", "You are a helpful coding assistant.")

        try:
            response = await query_ollama(prompt, model, system)
            return [TextContent(type="text", text=response)]
        except Exception as e:
            return [TextContent(type="text", text=f"[ERROR] Ollama query failed: {str(e)}")]

    elif name == "analyze_code_local":
        code = arguments["code"]
        language = arguments["language"]
        analysis_type = arguments.get("analysis_type", "general")

        system = f"You are a {language} code analyzer. Focus on {analysis_type} analysis. Be concise and specific."
        prompt = f"Analyze this {language} code for {analysis_type} issues:\n\n```{language}\n{code}\n```\n\nProvide specific findings with line references where possible."

        # Try to use a code-specific model if available, fall back to the default
        try:
            response = await query_ollama(prompt, "qwen2.5-coder:7b", system)
        except Exception:
            try:
                response = await query_ollama(prompt, "codellama:13b", system)
            except Exception:
                response = await query_ollama(prompt, DEFAULT_MODEL, system)

        return [TextContent(type="text", text=response)]

    elif name == "summarize_large_file":
        content = arguments["content"]
        summary_length = arguments.get("summary_length", "brief")

        length_instructions = {
            "brief": "Create a concise 2-3 sentence summary.",
            "detailed": "Create a comprehensive paragraph summary covering main points.",
            "technical": "Create a technical summary highlighting key functions, classes, and architecture."
        }

        system = f"You are a file summarizer. {length_instructions[summary_length]}"
        prompt = f"Summarize this content:\n\n{content[:50000]}"  # Limit to first 50k chars

        response = await query_ollama(prompt, DEFAULT_MODEL, system)
        return [TextContent(type="text", text=response)]

    elif name == "ollama_status":
        try:
            status = await check_ollama_status()
            return [TextContent(type="text", text=status)]
        except Exception as e:
            return [TextContent(type="text", text=f"[ERROR] Failed to check Ollama status: {str(e)}")]

    else:
        raise ValueError(f"Unknown tool: {name}")


async def query_ollama(prompt: str, model: str, system: str) -> str:
    """Query Ollama API"""
    async with httpx.AsyncClient(timeout=120.0) as client:
        try:
            response = await client.post(
                f"{OLLAMA_HOST}/api/generate",
                json={
                    "model": model,
                    "prompt": prompt,
                    "system": system,
                    "stream": False,
                    "options": {
                        "temperature": 0.7,
                        "top_p": 0.9
                    }
                }
            )
            response.raise_for_status()
            result = response.json()
            return result["response"]
        except httpx.ConnectError:
            raise Exception(f"Cannot connect to Ollama at {OLLAMA_HOST}. Is Ollama running? Try: ollama serve")
        except httpx.HTTPStatusError as e:
            if e.response.status_code == 404:
                raise Exception(f"Model '{model}' not found. Pull it with: ollama pull {model}")
            raise Exception(f"Ollama API error: {e.response.status_code} - {e.response.text}")


async def check_ollama_status() -> str:
    """Check Ollama server status and list models"""
    async with httpx.AsyncClient(timeout=10.0) as client:
        try:
            # Check server
            await client.get(f"{OLLAMA_HOST}/")

            # List models
            response = await client.get(f"{OLLAMA_HOST}/api/tags")
            response.raise_for_status()
            models = response.json().get("models", [])

            if not models:
                return "[WARNING] Ollama is running but no models are installed. Pull a model with: ollama pull llama3.1:8b"

            status = "[OK] Ollama is running\n\nAvailable models:\n"
            for model in models:
                name = model["name"]
                size = model.get("size", 0) / (1024**3)  # Convert to GB
                status += f"  - {name} ({size:.1f} GB)\n"

            return status

        except httpx.ConnectError:
            return "[ERROR] Ollama is not running. Start it with: ollama serve\nOr install from: https://ollama.ai/download"


async def main():
    """Run MCP server"""
    try:
        from mcp.server.stdio import stdio_server

        async with stdio_server() as (read_stream, write_stream):
            await app.run(
                read_stream,
                write_stream,
                app.create_initialization_options()
            )
    except Exception as e:
        print(f"[ERROR] MCP server failed: {e}", file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    asyncio.run(main())

mcp-servers/ollama-assistant/setup.ps1 (new file, 84 lines)

# Setup Ollama MCP Server
# Run this script to install dependencies

$ErrorActionPreference = "Stop"

Write-Host ("=" * 80) -ForegroundColor Cyan
Write-Host "Ollama MCP Server Setup" -ForegroundColor Cyan
Write-Host ("=" * 80) -ForegroundColor Cyan
Write-Host ""

# Check if Python is available
Write-Host "[INFO] Checking Python..." -ForegroundColor Cyan
try {
    $pythonVersion = python --version 2>&1
    Write-Host "[OK] $pythonVersion" -ForegroundColor Green
}
catch {
    Write-Host "[ERROR] Python not found. Install Python 3.8+ from python.org" -ForegroundColor Red
    exit 1
}

# Create virtual environment
Write-Host "[INFO] Creating virtual environment..." -ForegroundColor Cyan
if (Test-Path "venv") {
    Write-Host "[SKIP] Virtual environment already exists" -ForegroundColor Yellow
}
else {
    python -m venv venv
    Write-Host "[OK] Virtual environment created" -ForegroundColor Green
}

# Activate and install dependencies
Write-Host "[INFO] Installing dependencies..." -ForegroundColor Cyan
& "venv\Scripts\activate.ps1"
python -m pip install --upgrade pip -q
pip install -r requirements.txt

Write-Host "[OK] Dependencies installed" -ForegroundColor Green
Write-Host ""

# Check Ollama installation
Write-Host "[INFO] Checking Ollama installation..." -ForegroundColor Cyan
try {
    $ollamaVersion = ollama --version 2>&1
    Write-Host "[OK] Ollama installed: $ollamaVersion" -ForegroundColor Green

    # Check if Ollama is running
    try {
        $response = Invoke-WebRequest -Uri "http://localhost:11434" -Method GET -TimeoutSec 2 -ErrorAction Stop
        Write-Host "[OK] Ollama server is running" -ForegroundColor Green
    }
    catch {
        Write-Host "[WARNING] Ollama is installed but not running" -ForegroundColor Yellow
        Write-Host "[INFO] Start Ollama with: ollama serve" -ForegroundColor Cyan
    }

    # Check for models
    Write-Host "[INFO] Checking for installed models..." -ForegroundColor Cyan
    $models = ollama list 2>&1
    if ($models -match "llama3.1:8b|qwen2.5-coder|codellama") {
        Write-Host "[OK] Found compatible models" -ForegroundColor Green
    }
    else {
        Write-Host "[WARNING] No recommended models found" -ForegroundColor Yellow
        Write-Host "[INFO] Pull a model with: ollama pull llama3.1:8b" -ForegroundColor Cyan
    }
}
catch {
    Write-Host "[WARNING] Ollama not installed" -ForegroundColor Yellow
    Write-Host "[INFO] Install from: https://ollama.ai/download" -ForegroundColor Cyan
    Write-Host "[INFO] Or run: winget install Ollama.Ollama" -ForegroundColor Cyan
}

Write-Host ""
Write-Host ("=" * 80) -ForegroundColor Cyan
Write-Host "Setup Complete!" -ForegroundColor Green
Write-Host ("=" * 80) -ForegroundColor Cyan
Write-Host ""
Write-Host "Next steps:" -ForegroundColor Cyan
Write-Host "1. Install Ollama if not already installed: winget install Ollama.Ollama"
Write-Host "2. Pull a model: ollama pull llama3.1:8b"
Write-Host "3. Start Ollama: ollama serve"
Write-Host "4. Add to .mcp.json and restart Claude Code"
Write-Host ""