# Ollama MCP Server - Local AI Assistant

**Purpose:** Integrate Ollama local models with Claude Code via MCP, allowing Claude to delegate tasks to a local model that has computer access.

## Use Cases

- **Code Analysis:** Delegate code review to a local model for privacy-sensitive code
- **Data Processing:** Process large local datasets without API costs
- **Offline Work:** Continue working when the internet/API is unavailable
- **Cost Optimization:** Use the local model for simple tasks and Claude for complex reasoning

---

## Architecture

```
┌─────────────────┐
│   Claude Code   │  (Coordinator)
└────────┬────────┘
         │
         │ MCP Protocol
         ↓
┌─────────────────────────────┐
│   Ollama MCP Server         │
│   - Exposes tools:          │
│     • ask_ollama()          │
│     • analyze_code()        │
│     • process_data()        │
└────────┬────────────────────┘
         │
         │ HTTP API
         ↓
┌─────────────────────────────┐
│           Ollama            │
│   - Model: llama3.1:8b      │
│   - Local execution         │
└─────────────────────────────┘
```

---

## Installation

### 1. Install Ollama

**Windows:**
```powershell
# Download from https://ollama.ai/download
# Or use winget
winget install Ollama.Ollama
```

**Verify Installation:**
```bash
ollama --version
```

### 2. Pull a Model

```bash
# Recommended models:
ollama pull llama3.1:8b        # Best balance (4.7GB)
ollama pull codellama:13b      # Code-focused (7.4GB)
ollama pull mistral:7b         # Fast, good reasoning (4.1GB)
ollama pull qwen2.5-coder:7b   # Excellent for code (4.7GB)
```

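To confirm the pulls succeeded, you can also list the locally available models over Ollama's HTTP API (the same API the MCP server below will use). A minimal sketch, assuming the default endpoint `http://localhost:11434`:

```python
# List locally available models via Ollama's HTTP API to confirm the pulls above.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:11434/api/tags", timeout=10) as resp:
    for model in json.loads(resp.read())["models"]:
        print(model["name"])
```
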
### 3. Test Ollama

```bash
ollama run llama3.1:8b "What is MCP?"
```

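You can also hit the HTTP endpoint directly, which is exactly what the MCP server in step 4 does. A minimal sketch using only the standard library, assuming Ollama is listening on its default port 11434 and `llama3.1:8b` has been pulled:

```python
# Send a single non-streaming generation request to Ollama's HTTP API.
import json
import urllib.request

payload = json.dumps({
    "model": "llama3.1:8b",
    "prompt": "What is MCP?",
    "stream": False,  # return one JSON object instead of a token stream
}).encode("utf-8")

request = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request, timeout=120) as resp:
    print(json.loads(resp.read())["response"])
```
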
### 4. Create MCP Server

**File:** `mcp-servers/ollama-assistant/server.py`

```python
#!/usr/bin/env python3
"""
Ollama MCP Server
Provides local AI assistance to Claude Code via the MCP protocol
"""

import asyncio
from typing import Any

import httpx
from mcp.server import Server
from mcp.types import Tool, TextContent

# Configuration
OLLAMA_HOST = "http://localhost:11434"
DEFAULT_MODEL = "llama3.1:8b"

# Create MCP server
app = Server("ollama-assistant")


@app.list_tools()
async def list_tools() -> list[Tool]:
    """List available Ollama tools"""
    return [
        Tool(
            name="ask_ollama",
            description="Ask the local Ollama model a question. Use for simple queries, code review, or when you want a second opinion. The model has no context of the conversation.",
            inputSchema={
                "type": "object",
                "properties": {
                    "prompt": {
                        "type": "string",
                        "description": "The question or task for Ollama"
                    },
                    "model": {
                        "type": "string",
                        "description": "Model to use (default: llama3.1:8b)",
                        "default": DEFAULT_MODEL
                    },
                    "system": {
                        "type": "string",
                        "description": "System prompt to set context/role",
                        "default": "You are a helpful coding assistant."
                    }
                },
                "required": ["prompt"]
            }
        ),
        Tool(
            name="analyze_code_local",
            description="Analyze code using local Ollama model. Good for privacy-sensitive code or large codebases. Returns analysis without sending code to external APIs.",
            inputSchema={
                "type": "object",
                "properties": {
                    "code": {
                        "type": "string",
                        "description": "Code to analyze"
                    },
                    "language": {
                        "type": "string",
                        "description": "Programming language"
                    },
                    "analysis_type": {
                        "type": "string",
                        "enum": ["security", "performance", "quality", "bugs", "general"],
                        "description": "Type of analysis to perform"
                    }
                },
                "required": ["code", "language"]
            }
        ),
        Tool(
            name="summarize_large_file",
            description="Summarize large files using local model. No size limits or API costs.",
            inputSchema={
                "type": "object",
                "properties": {
                    "content": {
                        "type": "string",
                        "description": "File content to summarize"
                    },
                    "summary_length": {
                        "type": "string",
                        "enum": ["brief", "detailed", "technical"],
                        "default": "brief"
                    }
                },
                "required": ["content"]
            }
        )
    ]


@app.call_tool()
async def call_tool(name: str, arguments: Any) -> list[TextContent]:
    """Execute Ollama tool"""

    if name == "ask_ollama":
        prompt = arguments["prompt"]
        model = arguments.get("model", DEFAULT_MODEL)
        system = arguments.get("system", "You are a helpful coding assistant.")

        response = await query_ollama(prompt, model, system)
        return [TextContent(type="text", text=response)]

    elif name == "analyze_code_local":
        code = arguments["code"]
        language = arguments["language"]
        analysis_type = arguments.get("analysis_type", "general")

        system = f"You are a {language} code analyzer. Focus on {analysis_type} analysis."
        prompt = f"Analyze this {language} code:\n\n```{language}\n{code}\n```\n\nProvide a {analysis_type} analysis."

        # Uses the code-focused model; assumes codellama:13b was pulled in step 2
        response = await query_ollama(prompt, "codellama:13b", system)
        return [TextContent(type="text", text=response)]

    elif name == "summarize_large_file":
        content = arguments["content"]
        summary_length = arguments.get("summary_length", "brief")

        system = f"You are a file summarizer. Create {summary_length} summaries."
        prompt = f"Summarize this file content:\n\n{content}"

        response = await query_ollama(prompt, DEFAULT_MODEL, system)
        return [TextContent(type="text", text=response)]

    else:
        raise ValueError(f"Unknown tool: {name}")


async def query_ollama(prompt: str, model: str, system: str) -> str:
    """Query the Ollama HTTP API"""
    async with httpx.AsyncClient(timeout=120.0) as client:
        response = await client.post(
            f"{OLLAMA_HOST}/api/generate",
            json={
                "model": model,
                "prompt": prompt,
                "system": system,
                "stream": False
            }
        )
        response.raise_for_status()
        result = response.json()
        return result["response"]


async def main():
    """Run the MCP server over stdio"""
    from mcp.server.stdio import stdio_server

    async with stdio_server() as (read_stream, write_stream):
        await app.run(
            read_stream,
            write_stream,
            app.create_initialization_options()
        )


if __name__ == "__main__":
    asyncio.run(main())
```

### 5. Install MCP Server Dependencies

```bash
cd D:\ClaudeTools\mcp-servers\ollama-assistant
python -m venv venv
venv\Scripts\activate
pip install mcp httpx
```

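With the dependencies installed, you can exercise the server's `query_ollama()` helper directly, before any MCP wiring. A minimal sketch, run from the `ollama-assistant` directory with the venv activated:

```python
# smoke_test.py - call query_ollama() from server.py directly (no MCP involved).
# Assumes server.py from step 4 is in the current directory and Ollama is running.
import asyncio

from server import DEFAULT_MODEL, query_ollama

answer = asyncio.run(
    query_ollama(
        prompt="Summarize what an MCP server does in two sentences.",
        model=DEFAULT_MODEL,
        system="You are a helpful coding assistant.",
    )
)
print(answer)
```
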
### 6. Configure in Claude Code

**Edit:** `.mcp.json` (in D:\ClaudeTools)

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "your-token-here"
      }
    },
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "D:\\ClaudeTools"]
    },
    "sequential-thinking": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-sequential-thinking"]
    },
    "ollama-assistant": {
      "command": "D:\\ClaudeTools\\mcp-servers\\ollama-assistant\\venv\\Scripts\\python.exe",
      "args": [
        "D:\\ClaudeTools\\mcp-servers\\ollama-assistant\\server.py"
      ]
    }
  }
}
```

The `ollama-assistant` entry points `command` directly at the venv's `python.exe`, so the server runs with the dependencies installed in step 5.

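To catch path typos before restarting Claude Code, a small check script can verify that the `ollama-assistant` entry points at files that exist. A sketch, assuming `.mcp.json` lives in `D:\ClaudeTools` as above:

```python
# check_mcp_config.py - verify the ollama-assistant entry in .mcp.json
# references an interpreter and server script that actually exist on disk.
import json
from pathlib import Path

config = json.loads(Path(r"D:\ClaudeTools\.mcp.json").read_text(encoding="utf-8"))
entry = config["mcpServers"]["ollama-assistant"]

for path in [entry["command"], *entry["args"]]:
    status = "OK     " if Path(path).exists() else "MISSING"
    print(f"{status} {path}")
```
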
---

## Usage Examples

### Example 1: Ask Ollama for a Second Opinion

```
User: "Review this authentication code for security issues"

Claude: Let me delegate this to the local Ollama model for a privacy-focused review.

[Uses ask_ollama tool]
Ollama: "Found potential issues: 1. Password not hashed... 2. No rate limiting..."

Claude: Based on the local analysis, here are the security concerns...
```

### Example 2: Analyze Large Codebase Locally

```
User: "Analyze this 10,000 line file for performance issues"

Claude: This is large - I'll use the local model to avoid API costs.

[Uses summarize_large_file tool]
Ollama: "Main performance bottlenecks: 1. N+1 queries... 2. Missing indexes..."

Claude: Here's the performance analysis from the local model...
```

### Example 3: Offline Development

```
User: "Help me debug this code" (while offline)

Claude: API unavailable, using local Ollama model...

[Uses analyze_code_local tool]
Ollama: "Bug found on line 42: null reference..."

Claude: The local model identified the issue...
```

---

## Option 2: Standalone Ollama with MCP Tools

Run Ollama as a separate agent with its own MCP server access.

**Architecture:**
```
┌─────────────────┐     ┌─────────────────────┐
│  Claude Code    │     │    Ollama + MCP     │
│  (Main Agent)   │────▶│   (Helper Agent)    │
└─────────────────┘     └──────────┬──────────┘
                                   │
                                   │ MCP Protocol
                                   ↓
                        ┌──────────────────────┐
                        │    MCP Servers       │
                        │    - Filesystem      │
                        │    - Bash            │
                        │    - Custom tools    │
                        └──────────────────────┘
```

**Tool:** Use `ollama-mcp` or similar wrapper that gives Ollama access to MCP servers.

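No specific wrapper is prescribed here, so the following is only a rough sketch of the pattern, not any particular tool's API. It assumes Ollama's `/api/chat` tool-calling support (available for models such as llama3.1) and uses a hypothetical `run_mcp_tool()` placeholder where a real wrapper would invoke its MCP client:

```python
# Sketch of a helper-agent loop: Ollama picks a tool, a placeholder
# run_mcp_tool() would execute it via an MCP client, and the result is
# fed back to the model until it produces a final answer.
import httpx

OLLAMA_HOST = "http://localhost:11434"

TOOLS = [{
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a file from the local filesystem",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]


def run_mcp_tool(name: str, arguments: dict) -> str:
    """Placeholder: a real wrapper would forward this call to an MCP server."""
    raise NotImplementedError


def helper_agent(task: str, model: str = "llama3.1:8b") -> str:
    messages = [{"role": "user", "content": task}]
    while True:
        reply = httpx.post(
            f"{OLLAMA_HOST}/api/chat",
            json={"model": model, "messages": messages,
                  "tools": TOOLS, "stream": False},
            timeout=120.0,
        ).json()["message"]
        messages.append(reply)

        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:  # no tool requested: the model answered directly
            return reply["content"]
        for call in tool_calls:  # execute each requested tool and report back
            fn = call["function"]
            messages.append({"role": "tool",
                             "content": run_mcp_tool(fn["name"], fn["arguments"])})
```
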
---

## Option 3: Hybrid Task Distribution

Use Claude as the coordinator and Ollama for execution.

**When to use Ollama:**
- Privacy-sensitive code review
- Large file processing (no token limits)
- Offline work
- Cost optimization (simple tasks)
- Repetitive analysis

**When to use Claude:**
- Complex reasoning
- Multi-step planning
- API integrations
- Final decision-making
- User communication

A simple routing heuristic built from these criteria is sketched below.

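The sketch is illustrative only: the `Task` fields and the size threshold are assumptions standing in for whatever metadata a coordinator actually tracks.

```python
# Sketch: route a task to the local Ollama model or to Claude based on
# the criteria listed above. The Task fields and threshold are hypothetical.
from dataclasses import dataclass


@dataclass
class Task:
    description: str
    privacy_sensitive: bool = False   # e.g. proprietary code review
    content_chars: int = 0            # size of the payload to process
    offline: bool = False             # no internet/API available
    needs_planning: bool = False      # multi-step reasoning / final decisions


def choose_backend(task: Task) -> str:
    """Return 'ollama' or 'claude' for a given task."""
    if task.offline or task.privacy_sensitive:
        return "ollama"                  # must stay local
    if task.content_chars > 50_000 and not task.needs_planning:
        return "ollama"                  # large, mechanical processing
    if task.needs_planning:
        return "claude"                  # complex reasoning, final decisions
    return "ollama"                      # default simple tasks to the cheap model


print(choose_backend(Task("Summarize build log", content_chars=200_000)))       # ollama
print(choose_backend(Task("Design the migration plan", needs_planning=True)))   # claude
```
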
---

## Recommended Models for Different Tasks

| Task Type | Recommended Model | Size | Reason |
|-----------|-------------------|------|--------|
| Code Review | qwen2.5-coder:7b | 4.7GB | Best code understanding |
| Code Generation | codellama:13b | 7.4GB | Trained on code |
| General Queries | llama3.1:8b | 4.7GB | Balanced performance |
| Fast Responses | mistral:7b | 4.1GB | Speed optimized |
| Large Context | llama3.1:70b | 40GB | 128k context (needs GPU) |

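If you want the server to pick a model per task instead of hard-coding one (step 4 uses `codellama:13b` for `analyze_code_local`), a small lookup mirroring this table could be added to `server.py`. A sketch with hypothetical task keys:

```python
# Per-task default models mirroring the table above; model_for() could
# replace the hard-coded choices in call_tool() in server.py.
MODEL_BY_TASK = {
    "code_review": "qwen2.5-coder:7b",
    "code_generation": "codellama:13b",
    "general": "llama3.1:8b",
    "fast": "mistral:7b",
}


def model_for(task: str) -> str:
    """Fall back to the general-purpose model for unknown task types."""
    return MODEL_BY_TASK.get(task, MODEL_BY_TASK["general"])
```
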
---

## Performance Considerations

**CPU Only:**
- llama3.1:8b: ~2-5 tokens/sec
- Usable for short queries

**GPU (NVIDIA):**
- llama3.1:8b: ~30-100 tokens/sec
- codellama:13b: ~20-50 tokens/sec
- Much faster, recommended

**Enable GPU in Ollama:**
```bash
# Ollama auto-detects GPU
# Verify: check Ollama logs for "CUDA" or "Metal"
```

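To measure what your own hardware delivers, the non-streaming `/api/generate` response includes timing counters that convert directly into tokens per second. A minimal sketch; the field names (`eval_count`, `eval_duration` in nanoseconds) are as exposed by current Ollama releases and should be treated as an assumption if your version differs:

```python
# Measure generation speed using the timing fields returned by /api/generate
# ("eval_count" = generated tokens, "eval_duration" = time in nanoseconds).
import httpx

resp = httpx.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.1:8b",
          "prompt": "Explain the MCP protocol in one paragraph.",
          "stream": False},
    timeout=300.0,
).json()

tokens = resp["eval_count"]
seconds = resp["eval_duration"] / 1e9
print(f"{tokens} tokens in {seconds:.1f}s -> {tokens / seconds:.1f} tokens/sec")
```
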
---

## Next Steps

1. Install Ollama
2. Pull a model (llama3.1:8b recommended)
3. Create the MCP server (use the code above)
4. Configure `.mcp.json`
5. Restart Claude Code
6. Test: "Use the local Ollama model to analyze this code"

---

**Status:** Design phase - ready to implement
**Created:** 2026-01-22