docs: establish Ollama as the documentation engine

Route all prose generation (session logs, commit messages, Syncro comments, client notes, code docs) through Ollama qwen3:14b by default. Claude reviews output and owns verbatim-accuracy sections (credentials, IPs, command outputs). GrepAI context lookups keep the Ollama service warm, eliminating the 30-50s cold-start in normal workflow. Updates: OLLAMA.md (documentation engine scope + warm-start note), CLAUDE.md (Ollama section), save.md (narrative drafting), checkpoint.md (commit message body drafting). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 07:37:45 -07:00
parent 693766d05e
commit 88bdc3d4c9
4 changed files with 131 additions and 19 deletions
--- a/.claude/OLLAMA.md
+++ b/.claude/OLLAMA.md
@@ -70,19 +70,61 @@ For code suggestions, swap `qwen3:14b` for `codestral:22b`. Codestral doesn't ne

 Cold-start is ~30-50s on first call per model per session. Warm calls are 1-5s.

+## Documentation Engine
+
+**Ollama is the default documentation engine for all prose output.** Any time stored text needs to be generated — session logs, commit messages, ticket comments, client notes, code docs — route it through Ollama first. Claude reviews, corrects if needed, then writes or posts.
+
+This keeps Claude tokens focused on reasoning, decisions, and execution. Ollama handles the writing.
+
+### What Ollama owns
+
+| Output | Model | Claude's role |
+|--------|-------|---------------|
+| Session log narrative (summary, decisions, problems) | qwen3:14b | Review + assemble with factual sections |
+| Commit message body | qwen3:14b | Review + execute git commit |
+| Syncro comment bodies + billing descriptions | qwen3:14b | Review checklist + post via API |
+| Ticket initial issue / description text | qwen3:14b | Review + post |
+| Client-facing notes and summaries | qwen3:14b | Review for accuracy |
+| Code comments and docstrings | codestral:22b | Review before applying |
+| Refactor suggestions | codestral:22b | Review before applying |
+
+### What Claude always owns (never Ollama)
+
+- Credentials, passwords, API keys — must be verbatim accurate
+- Infrastructure details, IPs, hostnames — must be verbatim accurate
+- Command outputs and error messages — verbatim from actual output
+- Security decisions, auth review, production migrations
+- Final field values on API payloads (rates, IDs, quantities)
+
+### Warm-start and GrepAI
+
+GrepAI uses `nomic-embed-text` for context lookups, which keeps the Ollama **service** running continuously. The 30-50s service cold-start is effectively eliminated in normal workflow. `qwen3:14b` may take ~5s to swap into VRAM if it hasn't been called recently, but that's the worst case — not 50s.
+
+If the first Ollama call of a session needs to be fast, send a throwaway warm-up ping:
+```bash
+py -c "
+import urllib.request, json
+body = json.dumps({'model':'qwen3:14b','messages':[{'role':'user','content':'ok'}],'stream':False,'think':False}).encode()
+urllib.request.urlopen(urllib.request.Request('$OLLAMA/api/chat', body), timeout=60).read()
+print('warm')
+"
+```
+
 ## When to Use Which Model

 | Task | Model |
 |------|-------|
-| Summarize logs, diffs, session notes | qwen3:14b |
+| Session log narrative sections | qwen3:14b |
+| Commit message body | qwen3:14b |
+| Ticket / client comment drafting | qwen3:14b |
+| Summarize logs, diffs, incident notes | qwen3:14b |
 | Classify bug type, severity, category | qwen3:14b |
 | Extract structured data from text | qwen3:14b |
-| Draft commit message from diff | qwen3:14b |
-| Suggest refactor for a function | codestral:22b |
-| Docstring / comment generation | codestral:22b |
+| Code comment / docstring generation | codestral:22b |
+| Refactor suggestions | codestral:22b |

 ## Review Policy

- Low-stakes output (summary, classification, draft) — use directly
+- Documentation output (session logs, commit messages, comments) — Claude reviews before writing/posting
 - Code suggestions from codestral — always review before applying
- Never use Ollama for: auth decisions, credential handling, production migrations, security review
+- Never use Ollama for: credentials, auth decisions, production migrations, security review, API payload field values