docs: establish Ollama as the documentation engine

Route all prose generation (session logs, commit messages, Syncro
comments, client notes, code docs) through Ollama qwen3:14b by default.
Claude reviews output and owns verbatim-accuracy sections (credentials,
IPs, command outputs). GrepAI context lookups keep the Ollama service
warm, eliminating the 30-50s cold-start in normal workflow.

Updates: OLLAMA.md (documentation engine scope + warm-start note),
CLAUDE.md (Ollama section), save.md (narrative drafting), checkpoint.md
(commit message body drafting).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
2026-04-24 07:37:45 -07:00
parent 693766d05e
commit 88bdc3d4c9
4 changed files with 131 additions and 19 deletions

View File

@@ -241,11 +241,13 @@ Vault structure: `infrastructure/`, `clients/`, `services/`, `projects/`, `msp-t
## Local AI (Ollama)
Tier 0 — use before spawning Haiku agents for low-stakes tasks (free, fast, private):
Tier 0 — **Ollama is the documentation engine.** Route all prose generation through it: session log narratives, commit messages, ticket comments, client notes, code docs. Claude reviews output, owns credentials/facts/execution.
- **DESKTOP-0O8A1RL:** `http://localhost:11434`
- **Other machines:** `http://100.92.127.64:11434` (Tailscale required)
- **Models:** `qwen3:14b` (summarize/classify/draft), `codestral:22b` (code suggestions — always review)
- **Full reference:** `.claude/OLLAMA.md` (connection examples, model selection, review policy)
- **Models:** `qwen3:14b` (all documentation/prose), `codestral:22b` (code suggestions — always review)
- **Warm-start:** GrepAI keeps the Ollama service running; qwen3 VRAM swap is ~5s worst case, not 50s
- **Full reference:** `.claude/OLLAMA.md` (documentation engine scope, model selection, review policy)
### GrepAI (Semantic Code Search)

View File

@@ -70,19 +70,61 @@ For code suggestions, swap `qwen3:14b` for `codestral:22b`. Codestral doesn't ne
Cold-start is ~30-50s on first call per model per session. Warm calls are 1-5s.
## Documentation Engine
**Ollama is the default documentation engine for all prose output.** Any time stored text needs to be generated — session logs, commit messages, ticket comments, client notes, code docs — route it through Ollama first. Claude reviews, corrects if needed, then writes or posts.
This keeps Claude tokens focused on reasoning, decisions, and execution. Ollama handles the writing.
### What Ollama owns
| Output | Model | Claude's role |
|--------|-------|---------------|
| Session log narrative (summary, decisions, problems) | qwen3:14b | Review + assemble with factual sections |
| Commit message body | qwen3:14b | Review + execute git commit |
| Syncro comment bodies + billing descriptions | qwen3:14b | Review checklist + post via API |
| Ticket initial issue / description text | qwen3:14b | Review + post |
| Client-facing notes and summaries | qwen3:14b | Review for accuracy |
| Code comments and docstrings | codestral:22b | Review before applying |
| Refactor suggestions | codestral:22b | Review before applying |
### What Claude always owns (never Ollama)
- Credentials, passwords, API keys — must be verbatim accurate
- Infrastructure details, IPs, hostnames — must be verbatim accurate
- Command outputs and error messages — verbatim from actual output
- Security decisions, auth review, production migrations
- Final field values on API payloads (rates, IDs, quantities)
### Warm-start and GrepAI
GrepAI uses `nomic-embed-text` for context lookups, which keeps the Ollama **service** running continuously. The 30-50s service cold-start is effectively eliminated in normal workflow. `qwen3:14b` may take ~5s to swap into VRAM if it hasn't been called recently, but that's the worst case — not 50s.
If the first Ollama call of a session needs to be fast, send a throwaway warm-up ping:
```bash
py -c "
import urllib.request, json
body = json.dumps({'model':'qwen3:14b','messages':[{'role':'user','content':'ok'}],'stream':False,'think':False}).encode()
urllib.request.urlopen(urllib.request.Request('$OLLAMA/api/chat', body), timeout=60).read()
print('warm')
"
```
## When to Use Which Model
| Task | Model |
|------|-------|
| Summarize logs, diffs, session notes | qwen3:14b |
| Session log narrative sections | qwen3:14b |
| Commit message body | qwen3:14b |
| Ticket / client comment drafting | qwen3:14b |
| Summarize logs, diffs, incident notes | qwen3:14b |
| Classify bug type, severity, category | qwen3:14b |
| Extract structured data from text | qwen3:14b |
| Draft commit message from diff | qwen3:14b |
| Suggest refactor for a function | codestral:22b |
| Docstring / comment generation | codestral:22b |
| Code comment / docstring generation | codestral:22b |
| Refactor suggestions | codestral:22b |
## Review Policy
- Low-stakes output (summary, classification, draft) — use directly
- Documentation output (session logs, commit messages, comments) — Claude reviews before writing/posting
- Code suggestions from codestral — always review before applying
- Never use Ollama for: auth decisions, credential handling, production migrations, security review
- Never use Ollama for: credentials, auth decisions, production migrations, security review, API payload field values

View File

@@ -20,17 +20,34 @@ Please create a comprehensive git checkpoint with the following steps:
- Add ALL untracked files (new files)
- Use `git add -A` or `git add .` to stage everything
4. **Create a detailed commit message**:
4. **Draft commit message body via Ollama** (documentation engine):
- **First line**: Write a clear, concise summary (50-72 chars) describing the primary change
- Use imperative mood (e.g., "Add feature" not "Added feature")
- Examples: "feat: add user authentication", "fix: resolve database connection issue", "refactor: improve API route structure"
- **Body**: Provide a detailed description including:
- What changes were made (list of key modifications)
- Why these changes were made (purpose/motivation)
- Any important technical details or decisions
- Breaking changes or migration notes if applicable
- **Footer**: Include co-author attribution as shown in the Git Safety Protocol
```bash
# Resolve Ollama
if curl -s -m 2 http://localhost:11434/api/tags >/dev/null 2>&1; then OLLAMA="http://localhost:11434"
elif curl -s -m 3 http://100.92.127.64:11434/api/tags >/dev/null 2>&1; then OLLAMA="http://100.92.127.64:11434"
else OLLAMA=""; fi
# Capture diff summary for Ollama prompt
{ git diff --stat HEAD; echo "---"; git diff HEAD | head -200; } \
> "C:/Users/guru/AppData/Local/Temp/checkpoint_diff.txt"
# Ollama drafts the body; fallback to Claude if unavailable
if [ -n "$OLLAMA" ]; then
BODY=$(py -c "
import urllib.request, json
diff = open('C:/Users/guru/AppData/Local/Temp/checkpoint_diff.txt', encoding='utf-8').read()
prompt = 'Write a git commit message BODY only (not the summary line). Imperative mood. What changed and why. No filler. Under 150 words.\n\nDIFF:\n' + diff
body = json.dumps({'model':'qwen3:14b','messages':[{'role':'user','content':prompt}],'stream':False,'think':False}).encode()
res = json.loads(urllib.request.urlopen(urllib.request.Request('$OLLAMA/api/chat', body), timeout=60).read())
print(res['message']['content'])
")
fi
```
- **Summary line** (first line): Claude writes — 50-72 chars, imperative mood, from `git diff --stat`
- **Body**: Ollama draft (Claude reviews); Claude writes directly if Ollama unavailable
- **Footer**: `Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>`
5. **Execute the commit**: Create the commit with the properly formatted message following this repository's conventions.

View File

@@ -1,5 +1,56 @@
Save a COMPREHENSIVE session log to appropriate session-logs/ directory. This is critical for context recovery.
## Ollama drafting (documentation engine)
Narrative sections are drafted by Ollama (qwen3:14b), then assembled with Claude-generated factual sections. Claude reviews the full document before writing.
**Ollama drafts:** Session Summary, Key Decisions, Problems Encountered
**Claude owns (verbatim, never delegated):** Credentials, infrastructure IPs/hostnames, command outputs, file paths, pending tasks
### Draft call
```bash
# Check Ollama (reuse $OLLAMA across the save operation)
if curl -s -m 2 http://localhost:11434/api/tags >/dev/null 2>&1; then OLLAMA="http://localhost:11434"
elif curl -s -m 3 http://100.92.127.64:11434/api/tags >/dev/null 2>&1; then OLLAMA="http://100.92.127.64:11434"
else OLLAMA=""; fi
# Write narrative prompt to temp file
cat > "C:/Users/guru/AppData/Local/Temp/save_narrative_prompt.txt" << 'ENDPROMPT'
You are a technical session log writer for an MSP (managed service provider).
Write three sections of a session log in markdown. Be concise, factual, and technical.
No filler phrases. Use past tense.
WORK DONE THIS SESSION:
<paste bullet list of what happened>
Write these three sections only:
## Session Summary
<2-4 paragraph narrative: what was accomplished, in what order, why>
## Key Decisions
<bullet list of non-obvious decisions made and their rationale>
## Problems Encountered
<bullet list of problems hit and how each was resolved; omit if none>
ENDPROMPT
NARRATIVE=$(py -c "
import urllib.request, json
prompt = open('C:/Users/guru/AppData/Local/Temp/save_narrative_prompt.txt', encoding='utf-8').read()
body = json.dumps({'model':'qwen3:14b','messages':[{'role':'user','content':prompt}],'stream':False,'think':False}).encode()
res = json.loads(urllib.request.urlopen(urllib.request.Request('$OLLAMA/api/chat', body), timeout=120).read())
print(res['message']['content'])
")
# Fallback: if OLLAMA empty, Claude writes narrative directly
```
Claude reviews the narrative output before assembling the final document.
---
## Determine Correct Location
**IMPORTANT: Save to project-specific or general session-logs based on work context**