Synced files: - Grepai optimization documentation - Ollama Assistant MCP server implementation - Session logs and context updates Machine: ACG-M-L5090 Timestamp: 2026-01-22 19:22:24 Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
# GrepAI Optimization Guide - Bite-Sized Chunks & Enhanced Context

**Created:** 2026-01-22
**Purpose:** Configure GrepAI for optimal context search with smaller, more precise chunks
**Status:** Ready to Apply


---

## What Changed

### 1. Bite-Sized Chunks (512 → 256 tokens)

**Before:**
- Chunk size: 512 tokens (~2,048 characters, ~40-50 lines)
- Total chunks: 6,458

**After:**
- Chunk size: 256 tokens (~1,024 characters, ~20-25 lines)
- Expected chunks: ~13,000
- Index size: ~80 MB (from 41 MB)

**Benefits:**
- ✅ More precise search results
- ✅ Better semantic matching on specific concepts
- ✅ Easier to locate exact code snippets
- ✅ Improved context for AI analysis
- ✅ Can find smaller functions/methods independently

**Trade-offs:**
- ⚠️ Doubles chunk count (more storage)
- ⚠️ Initial re-indexing: 10-15 minutes
- ⚠️ Slightly higher memory usage

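The rough doubling of the chunk count can be sanity-checked with a little arithmetic. The sketch below assumes a plain sliding-window chunker with a fixed 50-token overlap (the `overlap` value shown in the configuration summary); GrepAI's actual chunker may split files differently, and `estimate_chunks` is a hypothetical helper, not part of GrepAI.

```python
import math


def estimate_chunks(total_tokens: int, size: int, overlap: int = 50) -> int:
    """Estimate the sliding-window chunk count for one file (assumed chunking model)."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    if total_tokens <= size:
        return 1
    stride = size - overlap  # tokens the window advances per chunk
    return math.ceil((total_tokens - overlap) / stride)


# A hypothetical 2,000-token source file
before = estimate_chunks(2000, size=512)
after = estimate_chunks(2000, size=256)
print(before, after)
```

For this hypothetical file the estimate goes from 5 chunks to 10, which matches the expected doubling. Because the overlap stays at 50 tokens while the window shrinks, large files may grow slightly more than 2x under this model.
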

---

### 2. Enhanced Context File Search

**Problem:** Important context files (`credentials.md`, `directives.md`, session logs) were penalized at 0.6x relevance, making them harder to find.

**Solution:** Strategic boost system for critical files

#### Critical Context Files (1.5x boost)
- `credentials.md` - Infrastructure credentials for context recovery
- `directives.md` - Operational guidelines and agent coordination rules

#### Session Logs (1.4x boost)
- `session-logs/*.md` - Complete work history with credentials and decisions

#### Claude Configuration (1.3-1.4x boost)
- `.claude/CLAUDE.md` - Project instructions
- `.claude/FILE_PLACEMENT_GUIDE.md` - File organization
- `.claude/AGENT_COORDINATION_RULES.md` - Agent delegation rules
- `MCP_SERVERS.md` - MCP server configuration

#### Documentation (Neutral 1.0x)
- Changed from 0.6x penalty to 1.0x neutral
- All `.md` files now searchable without penalty
- README files and `/docs/` no longer penalized


---

## What Gets Indexed

### ✅ Currently Indexed (955 files)
- All source code (`.py`, `.rs`, `.ts`, `.js`, etc.)
- All markdown files (`.md`)
- Session logs (`session-logs/*.md`)
- Configuration files (`.yaml`, `.json`, `.toml`)
- Shell scripts (`.sh`, `.ps1`, `.bat`)
- SQL files (`.sql`)

### ❌ Excluded (Ignored Patterns)
- `.git/` - Git repository internals
- `.grepai/` - GrepAI index itself
- `node_modules/` - npm dependencies
- `venv/`, `.venv/` - Python virtual environments
- `__pycache__/` - Python bytecode
- `dist/`, `build/` - Build artifacts
- `.idea/`, `.vscode/` - IDE settings

### ⚠️ Penalized (Lower Relevance)
- Test files: `*_test.*`, `*.spec.*`, `*.test.*` (0.5x)
- Mock files: `/mocks/`, `.mock.*` (0.4x)
- Generated code: `/generated/`, `.gen.*` (0.4x)

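To make the effect of the bonus and penalty factors concrete, here is a minimal sketch of how they could reorder results. It assumes the final score is the raw similarity multiplied by the factor of every matching pattern, and that patterns match as plain substrings of the path; GrepAI's real matching and combination rules may differ. The similarity values are made up for illustration, and `adjusted_score` is a hypothetical helper.

```python
# Factors taken from this guide; substring matching is an assumption.
BONUSES = [
    ("credentials.md", 1.5),
    ("directives.md", 1.5),
    ("/session-logs/", 1.4),
    ("/.claude/", 1.3),
]
PENALTIES = [
    ("_test.", 0.5),
    (".spec.", 0.5),
    (".test.", 0.5),
    ("/mocks/", 0.4),
    (".mock.", 0.4),
    ("/generated/", 0.4),
    (".gen.", 0.4),
]


def adjusted_score(path: str, similarity: float) -> float:
    """Multiply the raw similarity by every matching bonus or penalty factor."""
    # Normalize separators and add a leading slash so "/.claude/" matches at the root.
    normalized = "/" + path.replace("\\", "/").lstrip("/")
    score = similarity
    for pattern, factor in BONUSES + PENALTIES:
        if pattern in normalized:
            score *= factor
    return score


# Hypothetical raw similarities for three hits
hits = [
    ("api/database.py", 0.70),
    ("credentials.md", 0.60),
    ("tests/database_test.py", 0.72),
]
ranked = sorted(hits, key=lambda h: adjusted_score(*h), reverse=True)
for path, sim in ranked:
    print(f"{adjusted_score(path, sim):.2f}  {path}")
```

In this made-up example `credentials.md` moves from last place to first (0.60 × 1.5 = 0.90), while the test file drops to the bottom despite having the highest raw similarity.
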

---

## Implementation Steps

The commands below mix `copy`, `Remove-Item`, and `./grepai.exe`, so run them in PowerShell from the project root.

### Step 1: Stop the Watcher

```powershell
cd D:\ClaudeTools
./grepai.exe watch --stop
```

Expected output: "Watcher stopped"

### Step 2: Backup Current Config

```powershell
copy .grepai\config.yaml .grepai\config.yaml.backup
```

### Step 3: Apply New Configuration

```powershell
copy .grepai\config.yaml.new .grepai\config.yaml
```

Or manually edit `.grepai\config.yaml` and change:
- Line 10: `size: 512` → `size: 256`
- Add bonus patterns (lines 22-41 in new config)
- Remove `.md` penalty (delete lines 49-50)

### Step 4: Delete Old Index (Forces Re-indexing)

```powershell
# Delete index files but keep config
Remove-Item .grepai\*.gob -Force
Remove-Item .grepai\embeddings -Recurse -Force -ErrorAction SilentlyContinue
```

### Step 5: Re-Index with New Settings

```powershell
./grepai.exe index --force
```

**Expected time:** 10-15 minutes for ~955 files

**Progress indicators:**
- Shows "Indexing files..." with progress bar
- Displays file count and ETA
- Updates every few seconds

### Step 6: Restart Watcher

```powershell
./grepai.exe watch --background
```

**Verify it's running:**
```powershell
./grepai.exe watch --status
```

Expected output:
```
Watcher status: running
PID: <process_id>
Indexed files: 955
Last update: <timestamp>
```

### Step 7: Verify New Index

```powershell
./grepai.exe status
```

Expected output:
```
Files indexed: 955
Total chunks: ~13,000 (doubled from 6,458)
Index size: ~80 MB (increased from 41 MB)
Provider: ollama (nomic-embed-text)
```

### Step 8: Restart Claude Code

Claude Code must be restarted to pick up the updated MCP server configuration.

1. Quit Claude Code completely
2. Relaunch Claude Code
3. Test: "Use grepai to search for database credentials"


---

## Testing the Optimizations

### Test 1: Bite-Sized Chunks

**Query:** "database connection pool setup"

**Expected:**
- More granular results (specific to pool config)
- Find `create_engine()` call independently
- Find `SessionLocal` configuration separately
- Better line-level precision

**Before (512 tokens):** Returns entire `api\database.py` module (68 lines)
**After (256 tokens):** Returns specific sections:
- Engine creation (lines 20-30)
- Session factory (lines 50-60)
- `get_db` dependency (lines 61-80)


---

### Test 2: Context File Search

**Query:** "SSH credentials for GuruRMM server"

**Expected:**
- `credentials.md` should rank FIRST (1.5x boost)
- Should find SSH access section directly
- Higher relevance score than code files

**Verify:**
```powershell
./grepai.exe search "SSH credentials GuruRMM" -n 5
```

---

### Test 3: Session Log Context Recovery

**Query:** "previous work on session logs or context recovery"

**Expected:**
- `session-logs/*.md` files should rank highly (1.4x boost)
- Find relevant past work sessions
- Session logs rank above generic documentation

---

### Test 4: Operational Guidelines

**Query:** "agent coordination rules or delegation"

**Expected:**
- `directives.md` should rank first (1.5x boost)
- `.claude/AGENT_COORDINATION_RULES.md` should rank second (1.3x boost)
- Find operational guidelines before generic docs


---

## Performance Expectations

### Indexing Performance
- **Initial indexing:** 10-15 minutes (one-time)
- **Incremental updates:** <5 seconds per file
- **Full re-index:** 10-15 minutes (rarely needed)

### Search Performance
- **Query latency:** 50-150ms (may increase slightly due to more chunks)
- **Relevance:** Improved for specific concepts
- **Memory usage:** 150-250 MB (increased from 100-200 MB)

### Storage Requirements
- **Index size:** ~80 MB (increased from 41 MB)
- **Disk I/O:** Minimal after initial indexing
- **Ollama embeddings:** 768-dimensional vectors (unchanged)

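The index-size figures above can be cross-checked against the embedding dimensions. The sketch below assumes each vector is stored as 768 float32 values (4 bytes each); how GrepAI actually serializes its index is not documented here, so treat the result as a rough lower bound on the vector payload rather than a prediction of the file size.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as float32
DIMENSIONS = 768       # nomic-embed-text vector size, from this guide


def vector_megabytes(chunks: int) -> float:
    """Raw embedding payload in MB (10**6 bytes), excluding chunk text and metadata."""
    return chunks * DIMENSIONS * BYTES_PER_FLOAT32 / 1_000_000


print(f"before: {vector_megabytes(6_458):.1f} MB")
print(f"after:  {vector_megabytes(13_000):.1f} MB")
```

Under that assumption the raw vectors come to about 19.8 MB before and 39.9 MB after, roughly half of the 41 MB and ~80 MB index sizes; the remainder would be chunk text and metadata.
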

---

## Troubleshooting

### Issue: Re-indexing Stuck or Slow

**Solution:**
1. Check Ollama is running: `curl http://localhost:11434/api/tags`
2. Check CPU usage (embedding generation is CPU-intensive)
3. Monitor logs: `C:\Users\<username>\AppData\Local\grepai\logs\grepai-watch.log`

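The Ollama check in step 1 can also be scripted. This is a minimal sketch using only the Python standard library; it calls the same `/api/tags` endpoint as the `curl` command and additionally looks for the `nomic-embed-text` model named in this guide. The helper names `list_ollama_models` and `has_model` are hypothetical, and the response shape (a `models` list whose entries carry a `name` field) is assumed from Ollama's API.

```python
import json
import urllib.request


def list_ollama_models(base_url: str = "http://localhost:11434", timeout: float = 3.0):
    """Return model names from Ollama's /api/tags, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts; ValueError covers bad JSON.
        return None
    return [m.get("name", "") for m in payload.get("models", [])]


def has_model(names, wanted: str = "nomic-embed-text") -> bool:
    """True if a listed model matches the wanted name, ignoring any ':tag' suffix."""
    return any(n.split(":")[0] == wanted for n in names or [])


if __name__ == "__main__":
    models = list_ollama_models()
    if models is None:
        print("Ollama is not reachable on localhost:11434")
    elif not has_model(models):
        print("Ollama is up, but nomic-embed-text is not installed")
    else:
        print("Ollama is up and nomic-embed-text is available")
```

Separating "not reachable" from "model missing" helps narrow down why re-indexing stalls: the first points at the Ollama service, the second at the embedding model.
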
### Issue: Search Results Less Relevant

**Solution:**
1. Verify config applied: `type .grepai\config.yaml | findstr "size:"`
   - Should show: `size: 256`
2. Verify bonuses applied: `Select-String -Path .grepai\config.yaml -Pattern "credentials.md" -Context 0,1`
   - The line after the match should show: `factor: 1.5` (a plain `findstr` would print only the `pattern:` line)
3. Re-index if needed: `./grepai.exe index --force`

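The two configuration checks above can be combined into one script. The sketch below is a text-based check using regular expressions, since the Python standard library has no YAML parser; it assumes the layout shown in this guide's configuration summary, with `factor:` on the line directly after its `pattern:` entry. `check_config` is a hypothetical helper, and the sample text stands in for the contents of `.grepai\config.yaml`.

```python
import re


def check_config(text: str) -> dict:
    """Extract the chunk size and the credentials.md bonus factor from config text."""
    size = re.search(r"^\s*size:\s*(\d+)", text, flags=re.MULTILINE)
    # The factor is expected on the line right after the pattern entry.
    bonus = re.search(r"pattern:\s*credentials\.md\s*\n\s*factor:\s*([\d.]+)", text)
    return {
        "chunk_size": int(size.group(1)) if size else None,
        "credentials_factor": float(bonus.group(1)) if bonus else None,
    }


# Sample text mirroring the new configuration in this guide
sample = """\
chunking:
  size: 256
  overlap: 50
search:
  boost:
    bonuses:
      - pattern: credentials.md
        factor: 1.5
"""
result = check_config(sample)
print(result)
```

To run it against the real file, read the config with `pathlib.Path(".grepai/config.yaml").read_text()` and pass the text to `check_config`. A value of `None` for either key means the corresponding setting was not found in the expected layout.
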
### Issue: Watcher Won't Start

**Solution:**
1. Kill existing process: `taskkill /F /IM grepai.exe`
2. Delete stale PID: `Remove-Item .grepai\watch.pid -Force`
3. Restart watcher: `./grepai.exe watch --background`

### Issue: MCP Server Not Responding

**Solution:**
1. Verify grepai running: `./grepai.exe watch --status`
2. Restart Claude Code completely
3. Test MCP manually: `./grepai.exe mcp-serve`


---

## Rollback Plan

If issues occur, roll back to the original configuration:

```powershell
# Stop watcher
./grepai.exe watch --stop

# Restore backup config
copy .grepai\config.yaml.backup .grepai\config.yaml

# Re-index with old settings
./grepai.exe index --force

# Restart watcher
./grepai.exe watch --background

# Restart Claude Code
```


---

## Configuration Summary

### Old Configuration
```yaml
chunking:
  size: 512
  overlap: 50

search:
  boost:
    penalties:
      - pattern: .md
        factor: 0.6  # Markdown penalized
```

### New Configuration
```yaml
chunking:
  size: 256  # REDUCED for bite-sized chunks
  overlap: 50

search:
  boost:
    bonuses:
      # Critical context files
      - pattern: credentials.md
        factor: 1.5
      - pattern: directives.md
        factor: 1.5
      - pattern: /session-logs/
        factor: 1.4
      - pattern: /.claude/
        factor: 1.3
    penalties:
      # .md penalty REMOVED
      # Markdown now neutral or boosted
```


---

## Expected Results

### Improved Search Scenarios

**Scenario 1: Finding Infrastructure Credentials**
- Query: "database connection string"
- Old: Generic code files ranked first
- New: `credentials.md` ranked first with full connection details

**Scenario 2: Finding Operational Guidelines**
- Query: "how to coordinate with agents"
- Old: Generic documentation or code examples
- New: `directives.md` and `AGENT_COORDINATION_RULES.md` ranked first

**Scenario 3: Context Recovery**
- Query: "previous work on authentication system"
- Old: Current code files only
- New: Session logs with full context of past decisions

**Scenario 4: Specific Code Snippets**
- Query: "JWT token verification"
- Old: Entire `auth.py` file (100+ lines)
- New: Specific `verify_token()` function (10-20 lines)


---

## Maintenance

### Weekly Checks
- Verify watcher running: `./grepai.exe watch --status`
- Check index health: `./grepai.exe status`

### Monthly Review
- Review log files for errors
- Consider re-indexing: `./grepai.exe index --force`
- Update this guide with findings

### As Needed
- Add new critical files to boost patterns
- Adjust chunk size if needed (128, 384, 512)
- Monitor search relevance and adjust factors


---

## References

- GrepAI Documentation: https://yoanbernabeu.github.io/grepai/
- Chunking Best Practices: https://yoanbernabeu.github.io/grepai/chunking/
- Search Boost Configuration: https://yoanbernabeu.github.io/grepai/search-boost/
- MCP Integration: https://yoanbernabeu.github.io/grepai/mcp/

---

**Next Steps:**
1. Review this guide
2. Backup current config
3. Apply new configuration
4. Re-index with optimized settings
5. Test search improvements
6. Update MCP_SERVERS.md with findings