Ollama MCP Server Installation Guide
Follow these steps to set up local AI assistance for Claude Code.
Step 1: Install Ollama
Option A: Using winget (Recommended)
winget install Ollama.Ollama
Option B: Manual Download
- Go to https://ollama.ai/download
- Download the Windows installer
- Run the installer
Verify Installation:
ollama --version
Expected output: ollama version is X.Y.Z
Step 2: Start Ollama Server
Start the server:
ollama serve
Leave this terminal open - Ollama needs to run in the background.
Tip: Ollama usually starts automatically after installation. Check system tray for Ollama icon.
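If you'd rather check the server programmatically, here is a minimal sketch using httpx (the same HTTP client the MCP server installs later); it assumes Ollama's default endpoint at http://localhost:11434 and its /api/version route:

```python
# Minimal reachability check for the local Ollama server (sketch).
# Assumes the default endpoint http://localhost:11434.
import httpx

try:
    resp = httpx.get("http://localhost:11434/api/version", timeout=5)
    resp.raise_for_status()
    print("Ollama is running, version", resp.json().get("version"))
except httpx.HTTPError as exc:
    print("Ollama is not reachable:", exc)
```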
Step 3: Pull a Model
Open a NEW terminal and pull a model:
Recommended for most users:
ollama pull llama3.1:8b
Size: 4.7GB | Speed: Fast | Quality: Good
Best for code:
ollama pull qwen2.5-coder:7b
Size: 4.7GB | Speed: Fast | Quality: Excellent for code
Alternative options:
# Faster, smaller
ollama pull mistral:7b # 4.1GB
# Better quality, larger
ollama pull llama3.1:70b # 40GB (requires good GPU)
# Code-focused
ollama pull codellama:13b # 7.4GB
Verify model is available:
ollama list
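The same verification is possible over Ollama's HTTP API, which is how the MCP server will query models; a rough sketch against the /api/tags endpoint (default port assumed):

```python
# List locally installed models via Ollama's /api/tags endpoint (sketch).
import httpx

resp = httpx.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()
for model in resp.json().get("models", []):
    print(model["name"])  # e.g. "llama3.1:8b"
```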
Step 4: Test Ollama
ollama run llama3.1:8b "Explain what MCP is in one sentence"
Expected: You should get a response from the model.
Press Ctrl+D or type /bye to exit the chat.
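Because the setup script installs httpx, the MCP server presumably reaches Ollama over its HTTP API rather than the CLI. As a rough sketch, the same test as an API call looks like this (model name and prompt are just examples; "stream": False returns one JSON object instead of a token stream):

```python
# Non-interactive generation request against Ollama's /api/generate (sketch).
import httpx

payload = {
    "model": "llama3.1:8b",  # use whichever model you pulled in Step 3
    "prompt": "Explain what MCP is in one sentence",
    "stream": False,         # single JSON response instead of a stream
}
resp = httpx.post("http://localhost:11434/api/generate", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["response"])
```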
Step 5: Setup MCP Server
Run the setup script:
cd D:\ClaudeTools\mcp-servers\ollama-assistant
.\setup.ps1
This will:
- Create Python virtual environment
- Install MCP dependencies (mcp, httpx)
- Check Ollama installation
- Verify everything is configured
Expected output:
[OK] Python installed
[OK] Virtual environment created
[OK] Dependencies installed
[OK] Ollama installed
[OK] Ollama server is running
[OK] Found compatible models
Setup Complete!
Step 6: Configure Claude Code
The .mcp.json file has already been updated with the Ollama configuration.
Verify configuration:
cat D:\ClaudeTools\.mcp.json
You should see an ollama-assistant entry.
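If you prefer a scripted check over eyeballing the file, a small sketch like the following reads .mcp.json and looks for the entry; note that the mcpServers key is an assumption about the file's layout, so adjust it if your configuration differs:

```python
# Confirm .mcp.json contains an ollama-assistant entry (sketch).
# The "mcpServers" key name is an assumed layout, not taken from this guide.
import json
from pathlib import Path

config = json.loads(Path(r"D:\ClaudeTools\.mcp.json").read_text(encoding="utf-8"))
servers = config.get("mcpServers", {})
print("ollama-assistant configured:", "ollama-assistant" in servers)
```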
Step 7: Restart Claude Code
IMPORTANT: You must completely restart Claude Code for MCP changes to take effect.
- Close Claude Code completely
- Reopen Claude Code
- Navigate to D:\ClaudeTools directory
Step 8: Test Integration
Try these commands in Claude Code:
Test 1: Check status
Use the ollama_status tool to check if Ollama is running
Test 2: Ask a question
Use ask_ollama to ask: "What is the fastest sorting algorithm?"
Test 3: Analyze code
Use analyze_code_local to review this Python function for bugs:
def divide(a, b):
return a / b
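For reference, the bug the local review should flag is an unhandled division by zero. One possible fix (a sketch, not the only valid answer) guards the denominator:

```python
# One way to harden the divide() example against division by zero (sketch).
def divide(a: float, b: float) -> float:
    if b == 0:
        raise ValueError("b must be non-zero")
    return a / b
```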
Troubleshooting
Ollama Not Running
Error: Cannot connect to Ollama at http://localhost:11434
Fix:
# Start Ollama
ollama serve
# Or check if it's already running
netstat -ano | findstr :11434
Model Not Found
Error: Model 'llama3.1:8b' not found
Fix:
# Pull the model
ollama pull llama3.1:8b
# Verify it's installed
ollama list
Python Virtual Environment Issues
Error: python: command not found
Fix:
- Install Python 3.8+ from python.org
- Add Python to PATH
- Rerun setup.ps1
MCP Server Not Loading
Check Claude Code logs:
# Look for MCP-related errors
# Logs are typically in: %APPDATA%\Claude\logs\
Verify Python path:
D:\ClaudeTools\mcp-servers\ollama-assistant\venv\Scripts\python.exe --version
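You can also confirm that the virtual environment actually contains the MCP dependencies. A small sketch (reusing the venv path shown above; the import names match the packages listed in Step 5):

```python
# Check that the venv interpreter can import the MCP server's dependencies (sketch).
import subprocess

venv_python = r"D:\ClaudeTools\mcp-servers\ollama-assistant\venv\Scripts\python.exe"
result = subprocess.run(
    [venv_python, "-c", "import mcp, httpx; print('dependencies OK')"],
    capture_output=True, text=True,
)
print(result.stdout or result.stderr)
```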
Port 11434 Already in Use
Error: Port 11434 is already in use
Fix:
# Find what's using the port
netstat -ano | findstr :11434
# Kill the process (replace PID)
taskkill /F /PID <PID>
# Restart Ollama
ollama serve
Performance Tips
GPU Acceleration
Ollama automatically uses your GPU if available (NVIDIA/AMD).
Check GPU usage:
# NVIDIA
nvidia-smi
# AMD
# Check Task Manager > Performance > GPU
CPU Performance
If using CPU only:
- Smaller models (7b-8b) work better
- Expect 2-5 tokens/second
- Close other applications for better performance
Faster Response Times
# Use smaller models for speed
ollama pull mistral:7b
# Or quantized versions (smaller, faster)
ollama pull llama3.1:8b-q4_0
Usage Examples
Example 1: Private Code Review
I have some proprietary code I don't want to send to external APIs.
Can you use the local Ollama model to review it for security issues?
[Paste code]
Claude will use analyze_code_local to review locally.
Example 2: Large File Summary
Summarize this 50,000 line log file using the local model to avoid API costs.
[Paste content]
Claude will use summarize_large_file locally.
Example 3: Offline Development
I'm offline - can you still help with this code?
Claude will delegate to local Ollama model automatically.
What Models to Use When
| Task | Best Model | Why |
|---|---|---|
| Code review | qwen2.5-coder:7b | Trained specifically for code |
| Code generation | codellama:13b | Best code completion |
| General questions | llama3.1:8b | Balanced performance |
| Speed priority | mistral:7b | Fastest responses |
| Quality priority | llama3.1:70b | Best reasoning (needs GPU) |
Uninstall
To remove the Ollama MCP server:
- Remove from .mcp.json: delete the ollama-assistant entry
- Delete files:
Remove-Item -Recurse D:\ClaudeTools\mcp-servers\ollama-assistant
- Uninstall Ollama (optional):
winget uninstall Ollama.Ollama
- Restart Claude Code
Next Steps
Once installed:
- Try asking me to use local Ollama for tasks
- I'll automatically delegate when appropriate:
- Privacy-sensitive code
- Large files
- Offline work
- Cost optimization
The integration is transparent - you can work normally and I'll decide when to use local vs. cloud AI.
Status: Ready to install
Estimated Setup Time: 10-15 minutes (including model download)
Disk Space Required: ~5-10GB (for models)