Session log: Ollama + GrepAI setup, coordinator review policy

Installed Ollama with GPU support (qwen3:14b, codestral:22b, nomic-embed-text), configured GrepAI semantic code search with optimized 256-token chunks and context file boosting, added MCP server integration and deep-explore agent. Updated claude.md with local AI usage guidelines and 4-tier output review policy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 16:42:01 -07:00
parent 481b02ed46
commit 98ea867d2c
3 changed files with 265 additions and 2 deletions
--- a/.claude/claude.md
+++ b/.claude/claude.md
@@ -15,6 +15,7 @@ You are NOT an executor. You coordinate specialized agents and preserve your con
 | Git commits/push/branch | Gitea Agent |
 | Backups/restore | Backup Agent |
 | File exploration (broad) | Explore Agent |
+| Semantic code search | deep-explore Agent (uses GrepAI) |
 | Complex reasoning | General-purpose + Sequential Thinking |

 **Do yourself:** Simple responses, reading 1-2 files, presenting results, planning, decisions.
@@ -39,7 +40,7 @@ You are NOT an executor. You coordinate specialized agents and preserve your con

 - **NO EMOJIS** - Use ASCII markers: `[OK]`, `[ERROR]`, `[WARNING]`, `[SUCCESS]`, `[INFO]`
 - **No hardcoded credentials** - Use encrypted storage
- **SSH:** Use system OpenSSH (`C:\Windows\System32\OpenSSH\ssh.exe`), never Git for Windows SSH
+- **SSH:** Use system OpenSSH (on Windows: `C:\Windows\System32\OpenSSH\ssh.exe`); never use Git for Windows SSH
 - **Data integrity:** Never use placeholder/fake data. Check credentials.md or ask user.
 - **Full coding standards:** `.claude/CODING_GUIDELINES.md` (agents read on-demand, not every session)

@@ -85,6 +86,87 @@ When user references previous work, use `/context` command. Never ask user for i

 ---

+## Local AI (Ollama)
+
+Ollama runs locally with GPU acceleration. Use it for tasks that don't need Claude-level reasoning.
+
+### Available Models
+
+| Model | Size | Use For |
+|-------|------|---------|
+| `qwen3:14b` | 9.3 GB | General sub-tasks: summarization, classification, data extraction, drafting |
+| `codestral:22b` | 12 GB | Code-specific sub-tasks: code generation, refactoring suggestions, docstring generation |
+| `nomic-embed-text` | 274 MB | Embeddings only (used by GrepAI, not for direct use) |
+
+### GrepAI (Semantic Code Search)
+
+GrepAI indexes the codebase using `nomic-embed-text` embeddings and provides semantic search via MCP server.
+
+**When to use GrepAI instead of Grep/Glob:**
+- Finding code by intent ("how does authentication work") rather than exact text
+- Exploring unfamiliar areas of the codebase
+- Finding related implementations across files
+- Context recovery — searching session logs and credentials by meaning
+
+**How to use:**
+- **MCP tool:** Use the `grepai` MCP server tools directly (available after MCP loads)
+- **deep-explore agent:** Delegate to the `deep-explore` agent for thorough semantic exploration
+- **CLI fallback:** `grepai search "your query" --json --compact`
+
+**Maintenance:** The watcher daemon runs in the background and auto-indexes file changes. If search results seem stale, run `grepai watch --stop && grepai watch --background` to restart it.
+
+### Using Ollama for Sub-Tasks
+
+For bulk or repetitive work that doesn't require Claude's full reasoning, offload to local models via Ollama's API.
+
+**When to use Ollama:**
+- Processing many items in a loop (e.g., summarizing 50 session logs)
+- Generating boilerplate or repetitive code patterns
+- Data extraction/classification from structured text
+- Draft content that Claude will review/refine
+- Any task where speed matters more than quality and the results will be verified
+
+**When NOT to use Ollama (use Claude instead):**
+- Architectural decisions or complex reasoning
+- Security-sensitive code review
+- Tasks requiring tool use or multi-step planning
+- Final output that goes directly to production
+
+**How to call Ollama:**
+```bash
+# Simple prompt
+curl -s http://localhost:11434/api/generate -d '{"model":"qwen3:14b","prompt":"Summarize this: ...","stream":false}' | jq -r '.response'
+
+# Chat format
+curl -s http://localhost:11434/api/chat -d '{"model":"codestral:22b","messages":[{"role":"user","content":"Refactor this function: ..."}],"stream":false}' | jq -r '.message.content'
+```
+
+### Ollama Output Review Policy
+
+The coordinator (Claude) must review Ollama outputs based on impact level. Local models are useful but unreliable — they hallucinate, miss edge cases, and produce subtly wrong code.
+
+**Impact levels and review requirements:**
+
+| Level | Review | Examples |
+|-------|--------|----------|
+| **Critical** | ALWAYS review, verify against source | Code touching auth/security/encryption, credential handling, database migrations, production config, anything user-facing |
+| **High** | Review for correctness, spot-check details | API endpoint logic, business rules, infrastructure scripts, client-specific work |
+| **Medium** | Skim for obvious errors, trust if reasonable | Internal documentation drafts, session log summaries, data extraction from structured input, boilerplate code |
+| **Low** | Trust without review | Classification/tagging of items, reformatting text, generating placeholder content for later editing |
+
+**Review process for Critical/High:**
+1. Read Ollama's full output — don't just check if it "looks right"
+2. Verify claims against actual files/data (e.g., if it says a function exists, confirm it does)
+3. Check for: hallucinated function names, wrong parameter types, missing error handling, security gaps
+4. If output is wrong or uncertain, redo the task yourself rather than patching Ollama's attempt
+
+**Batch processing pattern:**
+When using Ollama for bulk tasks (e.g., processing N items), review the first 2-3 results fully before trusting the rest. If any are wrong, switch to doing it yourself or fix the prompt and reprocess.
+
+**Flag to user:** If Ollama produces output for a Critical task and you are not confident in your review, tell the user explicitly: "This was generated by a local model and I'm not fully confident in [specific concern]."
+
+---
+
 ## Reference (read on-demand, not every session)

 - **Project structure, endpoints, workflows, troubleshooting:** `.claude/REFERENCE.md`
@@ -94,4 +176,4 @@ When user references previous work, use `/context` command. Never ask user for i

 ---

-**Last Updated:** 2026-02-17
+**Last Updated:** 2026-03-20
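
Follow-up note (not part of the commit): the batch processing pattern in the review policy above could be scripted roughly as below. This is a minimal sketch, not the project's actual tooling. The endpoint URL, model name, and request fields are taken from the curl example in the diff. The helper names `ollama_generate` and `batch_with_review`, the `review` callback, and the `review_first` default of 3 are hypothetical and chosen for illustration only.

```python
import json
import urllib.request
from typing import Callable, Iterable, List, Tuple

# Endpoint taken from the curl example in claude.md.
OLLAMA_URL = "http://localhost:11434/api/generate"


def ollama_generate(prompt: str, model: str = "qwen3:14b", timeout: float = 120.0) -> str:
    """POST a non-streaming prompt to Ollama and return the 'response' field."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["response"]


def batch_with_review(
    items: Iterable[str],
    make_prompt: Callable[[str], str],
    review: Callable[[str, str], bool],
    generate: Callable[[str], str] = ollama_generate,
    review_first: int = 3,
) -> Tuple[List[str], bool]:
    """Run each item through the model, fully reviewing the first few outputs.

    `review` is a hypothetical stand-in for the coordinator's check. If it
    rejects any of the first `review_first` outputs, processing stops and the
    caller should fix the prompt or do the task itself.
    Returns (accepted_outputs, ok).
    """
    results: List[str] = []
    for index, item in enumerate(items):
        output = generate(make_prompt(item))
        if index < review_first and not review(item, output):
            return results, False  # stop early instead of trusting the rest
        results.append(output)
    return results, True
```

Passing `generate` as a parameter keeps the review loop testable without a running Ollama server. For Critical or High impact work the policy still calls for reviewing every output, not only the first few.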