
Ollama MCP Server Installation Guide

Follow these steps to set up local AI assistance for Claude Code.


Step 1: Install Ollama

Option A: Using winget (Recommended)

winget install Ollama.Ollama

Option B: Manual Download

  1. Go to https://ollama.ai/download
  2. Download the Windows installer
  3. Run the installer

Verify Installation:

ollama --version

Expected output: ollama version is X.Y.Z


Step 2: Start Ollama Server

Start the server:

ollama serve

Leave this terminal open - Ollama needs to run in the background.

Tip: Ollama usually starts automatically after installation. Check the system tray for the Ollama icon.
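
If you want to confirm the server is reachable without opening a browser, the root endpoint at http://localhost:11434 answers with the plain-text banner "Ollama is running". A minimal Python check (standard library only, default port assumed):

# check_ollama.py - confirm the Ollama server answers on its default port
from urllib.request import urlopen
from urllib.error import URLError

try:
    with urlopen("http://localhost:11434/", timeout=5) as resp:
        # Root endpoint returns the banner "Ollama is running"
        print(resp.read().decode().strip())
except URLError as exc:
    print(f"Ollama is not reachable: {exc}")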


Step 3: Pull a Model

Open a NEW terminal and pull a model:

Recommended for most users:

ollama pull llama3.1:8b

Size: 4.7GB | Speed: Fast | Quality: Good

Best for code:

ollama pull qwen2.5-coder:7b

Size: 4.7GB | Speed: Fast | Quality: Excellent for code

Alternative options:

# Faster, smaller
ollama pull mistral:7b        # 4.1GB

# Better quality, larger
ollama pull llama3.1:70b      # 40GB (requires a capable GPU)

# Code-focused
ollama pull codellama:13b     # 7.4GB

Verify model is available:

ollama list
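
The same list is available from Ollama's REST API at /api/tags, which is handy if you want to check from a script rather than a terminal. A minimal standard-library sketch:

# list_models.py - list locally installed Ollama models via the REST API
import json
from urllib.request import urlopen

with urlopen("http://localhost:11434/api/tags", timeout=5) as resp:
    data = json.load(resp)

for model in data.get("models", []):
    size_gb = model["size"] / 1e9
    print(f"{model['name']:<25} {size_gb:.1f} GB")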

Step 4: Test Ollama

ollama run llama3.1:8b "Explain what MCP is in one sentence"

Expected: You should get a response from the model.

If you start an interactive session instead (ollama run llama3.1:8b with no prompt), press Ctrl+D or type /bye to exit the chat.
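
The same test can be run against Ollama's HTTP API on port 11434, which is the interface the MCP server talks to. A minimal sketch using httpx (run pip install httpx if you want to try it before Step 5; swap in whichever model you pulled):

# generate_test.py - one-shot, non-streaming completion via Ollama's REST API
import httpx

response = httpx.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.1:8b",
        "prompt": "Explain what MCP is in one sentence",
        "stream": False,  # return a single JSON object instead of a stream
    },
    timeout=120.0,  # local generation can be slow on CPU
)
response.raise_for_status()
print(response.json()["response"])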


Step 5: Setup MCP Server

Run the setup script:

cd D:\ClaudeTools\mcp-servers\ollama-assistant
.\setup.ps1

This will:

  • Create Python virtual environment
  • Install MCP dependencies (mcp, httpx)
  • Check Ollama installation
  • Verify everything is configured

Expected output:

[OK] Python installed
[OK] Virtual environment created
[OK] Dependencies installed
[OK] Ollama installed
[OK] Ollama server is running
[OK] Found compatible models
Setup Complete!
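
For reference, the heart of an MCP server like this is just a set of tool functions exposed over the MCP protocol. The shipped implementation in the ollama-assistant folder may differ in detail; the following is only a sketch of the pattern, using the official mcp Python SDK and httpx (the two dependencies installed above):

# Illustrative sketch of an Ollama-backed MCP tool (not the shipped server)
import httpx
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("ollama-assistant")

@mcp.tool()
async def ask_ollama(prompt: str, model: str = "llama3.1:8b") -> str:
    """Send a prompt to the local Ollama server and return its reply."""
    async with httpx.AsyncClient(timeout=120.0) as client:
        resp = await client.post(
            "http://localhost:11434/api/generate",
            json={"model": model, "prompt": prompt, "stream": False},
        )
        resp.raise_for_status()
        return resp.json()["response"]

if __name__ == "__main__":
    mcp.run()  # serve the tool over stdio for Claude Code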

Step 6: Configure Claude Code

The .mcp.json file has already been updated with the Ollama configuration.

Verify configuration:

cat D:\ClaudeTools\.mcp.json

You should see an ollama-assistant entry.
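
If you prefer to check programmatically rather than eyeballing the file, MCP server entries in a project .mcp.json live under the mcpServers key. A minimal sketch (path as used throughout this guide):

# verify_mcp_config.py - confirm the ollama-assistant entry is registered
import json
from pathlib import Path

config = json.loads(Path(r"D:\ClaudeTools\.mcp.json").read_text())
servers = config.get("mcpServers", {})

if "ollama-assistant" in servers:
    print("ollama-assistant is configured:", servers["ollama-assistant"].get("command"))
else:
    print("ollama-assistant entry is missing from .mcp.json")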


Step 7: Restart Claude Code

IMPORTANT: You must completely restart Claude Code for MCP changes to take effect.

  1. Close Claude Code completely
  2. Reopen Claude Code
  3. Navigate to D:\ClaudeTools directory

Step 8: Test Integration

Try these commands in Claude Code:

Test 1: Check status

Use the ollama_status tool to check if Ollama is running

Test 2: Ask a question

Use ask_ollama to ask: "What is the fastest sorting algorithm?"

Test 3: Analyze code

Use analyze_code_local to review this Python function for bugs:
def divide(a, b):
    return a / b
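
The point of this test snippet is that divide raises ZeroDivisionError when b is 0, and the review from the local model should flag exactly that. One possible hardened version (illustrative only, not what the tool will necessarily suggest):

def divide(a: float, b: float) -> float:
    """Divide a by b, raising a clear error instead of a bare ZeroDivisionError."""
    if b == 0:
        raise ValueError("divisor must be non-zero")
    return a / b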

Troubleshooting

Ollama Not Running

Error: Cannot connect to Ollama at http://localhost:11434

Fix:

# Start Ollama
ollama serve

# Or check if it's already running
netstat -ano | findstr :11434

Model Not Found

Error: Model 'llama3.1:8b' not found

Fix:

# Pull the model
ollama pull llama3.1:8b

# Verify it's installed
ollama list

Python Virtual Environment Issues

Error: python: command not found

Fix:

  1. Install Python 3.8+ from python.org
  2. Add Python to PATH
  3. Rerun setup.ps1

MCP Server Not Loading

Check Claude Code logs:

# Look for MCP-related errors
# Logs are typically in: %APPDATA%\Claude\logs\

Verify Python path:

D:\ClaudeTools\mcp-servers\ollama-assistant\venv\Scripts\python.exe --version
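
You can also confirm that the virtual environment actually has the MCP dependencies importable. A minimal check, run from any Python, using the venv path shown above:

# check_venv.py - verify the server's venv can import its dependencies
import subprocess

venv_python = r"D:\ClaudeTools\mcp-servers\ollama-assistant\venv\Scripts\python.exe"
result = subprocess.run(
    [venv_python, "-c", "import mcp, httpx; print('mcp and httpx import OK')"],
    capture_output=True,
    text=True,
)
print(result.stdout or result.stderr)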

Port 11434 Already in Use

Error: Port 11434 is already in use

Fix:

# Find what's using the port
netstat -ano | findstr :11434

# Kill the process (replace PID)
taskkill /F /PID <PID>

# Restart Ollama
ollama serve
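
If the netstat output is ambiguous, a quick Python check tells you whether anything is currently accepting connections on the port (it does not tell you whether that something is Ollama):

# port_check.py - see whether anything is listening on 11434
import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    sock.settimeout(2)
    in_use = sock.connect_ex(("127.0.0.1", 11434)) == 0

print("port 11434 is in use" if in_use else "port 11434 is free")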

Performance Tips

GPU Acceleration

Ollama automatically uses your GPU if available (NVIDIA/AMD).

Check GPU usage:

# NVIDIA
nvidia-smi

# AMD
# Check Task Manager > Performance > GPU

CPU Performance

If using CPU only:

  • Smaller models (7b-8b) work better
  • Expect 2-5 tokens/second
  • Close other applications for better performance

Faster Response Times

# Use smaller models for speed
ollama pull mistral:7b

# Or quantized versions (smaller, faster)
ollama pull llama3.1:8b-q4_0

Usage Examples

Example 1: Private Code Review

I have some proprietary code I don't want to send to external APIs.
Can you use the local Ollama model to review it for security issues?

[Paste code]

Claude will use analyze_code_local to review locally.

Example 2: Large File Summary

Summarize this 50,000 line log file using the local model to avoid API costs.

[Paste content]

Claude will use summarize_large_file locally.

Example 3: Offline Development

I'm offline - can you still help with this code?

Claude will delegate to the local Ollama model automatically.


What Models to Use When

Task                 Best Model          Why
Code review          qwen2.5-coder:7b    Trained specifically for code
Code generation      codellama:13b       Best code completion
General questions    llama3.1:8b         Balanced performance
Speed priority       mistral:7b          Fastest responses
Quality priority     llama3.1:70b        Best reasoning (needs GPU)

Uninstall

To remove the Ollama MCP server:

  1. Remove from .mcp.json: Delete the ollama-assistant entry

  2. Delete files:

    Remove-Item -Recurse D:\ClaudeTools\mcp-servers\ollama-assistant
    
  3. Uninstall Ollama (optional):

    winget uninstall Ollama.Ollama
    
  4. Restart Claude Code


Next Steps

Once installed:

  1. Try asking me to use local Ollama for tasks
  2. I'll automatically delegate when appropriate:
    • Privacy-sensitive code
    • Large files
    • Offline work
    • Cost optimization

The integration is transparent - you can work normally and I'll decide when to use local vs. cloud AI.


Status: Ready to install
Estimated Setup Time: 10-15 minutes (including model download)
Disk Space Required: ~5-10GB (for models)