docs: establish Ollama as the documentation engine

Route all prose generation (session logs, commit messages, Syncro
comments, client notes, code docs) through Ollama qwen3:14b by default.
Claude reviews output and owns verbatim-accuracy sections (credentials,
IPs, command outputs). GrepAI context lookups keep the Ollama service
warm, eliminating the 30-50s cold-start in normal workflow.

Updates: OLLAMA.md (documentation engine scope + warm-start note),
CLAUDE.md (Ollama section), save.md (narrative drafting), checkpoint.md
(commit message body drafting).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 07:37:45 -07:00
parent 693766d05e
commit 88bdc3d4c9
4 changed files with 131 additions and 19 deletions


@@ -70,19 +70,61 @@ For code suggestions, swap `qwen3:14b` for `codestral:22b`. Codestral doesn't ne
Cold-start is ~30-50s on first call per model per session. Warm calls are 1-5s.
## Documentation Engine
**Ollama is the default documentation engine for all prose output.** Any time stored text needs to be generated — session logs, commit messages, ticket comments, client notes, code docs — route it through Ollama first. Claude reviews, corrects if needed, then writes or posts.
This keeps Claude tokens focused on reasoning, decisions, and execution. Ollama handles the writing.
### What Ollama owns
| Output | Model | Claude's role |
|--------|-------|---------------|
| Session log narrative (summary, decisions, problems) | qwen3:14b | Review + assemble with factual sections |
| Commit message body | qwen3:14b | Review + execute git commit |
| Syncro comment bodies + billing descriptions | qwen3:14b | Review checklist + post via API |
| Ticket initial issue / description text | qwen3:14b | Review + post |
| Client-facing notes and summaries | qwen3:14b | Review for accuracy |
| Code comments and docstrings | codestral:22b | Review before applying |
| Refactor suggestions | codestral:22b | Review before applying |
### What Claude always owns (never Ollama)
- Credentials, passwords, API keys — must be verbatim accurate
- Infrastructure details, IPs, hostnames — must be verbatim accurate
- Command outputs and error messages — verbatim from actual output
- Security decisions, auth review, production migrations
- Final field values on API payloads (rates, IDs, quantities)
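
The two ownership lists above can be read as a small routing rule. The following is a minimal sketch, not part of the existing tooling: the category keys, `OLLAMA_MODELS`, `CLAUDE_ONLY`, and `pick_model` are hypothetical names chosen for illustration, and only the model tags come from the table.

```python
# Hypothetical routing table mirroring the "What Ollama owns" table.
# The keys are illustrative labels, not identifiers used by any real tool.
OLLAMA_MODELS = {
    "session_log_narrative": "qwen3:14b",
    "commit_message_body": "qwen3:14b",
    "syncro_comment": "qwen3:14b",
    "ticket_description": "qwen3:14b",
    "client_note": "qwen3:14b",
    "code_docstring": "codestral:22b",
    "refactor_suggestion": "codestral:22b",
}

# Hypothetical labels for the "What Claude always owns" list.
CLAUDE_ONLY = frozenset({
    "credentials",
    "infrastructure_details",
    "command_output",
    "security_decision",
    "api_payload_values",
})


def pick_model(output_kind: str) -> str:
    """Return the Ollama model for an output kind, or raise if Claude owns it."""
    if output_kind in CLAUDE_ONLY:
        raise ValueError(f"{output_kind!r} must stay verbatim; do not route to Ollama")
    if output_kind not in OLLAMA_MODELS:
        raise ValueError(f"unknown output kind: {output_kind!r}")
    return OLLAMA_MODELS[output_kind]
```

Raising for Claude-owned categories, rather than silently picking a model, keeps verbatim-accuracy content from reaching Ollama by accident.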
### Warm-start and GrepAI
GrepAI uses `nomic-embed-text` for context lookups, which keeps the Ollama **service** running continuously. The 30-50s service cold-start is effectively eliminated in normal workflow. `qwen3:14b` may take ~5s to swap into VRAM if it hasn't been called recently, but that's the worst case — not 50s.
If the first Ollama call of a session needs to be fast, send a throwaway warm-up ping:
```bash
py -c "
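import urllib.request, json
# One-word chat with streaming and thinking disabled, so the reply is short and arrives as a single response.
body = json.dumps({'model':'qwen3:14b','messages':[{'role':'user','content':'ok'}],'stream':False,'think':False}).encode()
# The shell substitutes the OLLAMA variable (the base URL) before Python runs.
req = urllib.request.Request('$OLLAMA/api/chat', body, {'Content-Type': 'application/json'})
urllib.request.urlopen(req, timeout=60).read()
print('warm')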
"
```
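
To decide whether the warm-up ping is needed at all, one option is to ask the service which models are currently loaded. The sketch below assumes Ollama's `/api/ps` endpoint returns a JSON object with a `models` list whose entries carry a `name` field; verify that shape against your Ollama version. `loaded_models`, `needs_warmup`, and `fetch_ps` are hypothetical helper names.

```python
import json
import urllib.request


def loaded_models(ps_response: dict) -> set[str]:
    """Collect model names from a parsed /api/ps response (shape is an assumption)."""
    return {entry["name"] for entry in ps_response.get("models", [])}


def needs_warmup(ps_response: dict, model: str = "qwen3:14b") -> bool:
    """True when the given model is not among the currently loaded models."""
    return model not in loaded_models(ps_response)


def fetch_ps(base_url: str, timeout: float = 5.0) -> dict:
    """Fetch and parse /api/ps from a running Ollama service at base_url."""
    with urllib.request.urlopen(f"{base_url}/api/ps", timeout=timeout) as resp:
        return json.load(resp)
```

With GrepAI keeping only `nomic-embed-text` resident, `needs_warmup(fetch_ps(base_url))` would be expected to return `True` until `qwen3:14b` has been called once.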
## When to Use Which Model
| Task | Model |
|------|-------|
| Summarize logs, diffs, session notes | qwen3:14b |
| Session log narrative sections | qwen3:14b |
| Commit message body | qwen3:14b |
| Ticket / client comment drafting | qwen3:14b |
| Summarize logs, diffs, incident notes | qwen3:14b |
| Classify bug type, severity, category | qwen3:14b |
| Extract structured data from text | qwen3:14b |
| Draft commit message from diff | qwen3:14b |
| Suggest refactor for a function | codestral:22b |
| Docstring / comment generation | codestral:22b |
| Code comment / docstring generation | codestral:22b |
| Refactor suggestions | codestral:22b |
## Review Policy
- Low-stakes output (summary, classification, draft) — use directly
- Documentation output (session logs, commit messages, comments) — Claude reviews before writing/posting
- Code suggestions from codestral — always review before applying
- Never use Ollama for: auth decisions, credential handling, production migrations, security review
- Never use Ollama for: credentials, auth decisions, production migrations, security review, API payload field values
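
The review policy can also be expressed as a lookup, which makes the tiers easy to check in a script. This is a sketch under stated assumptions: the `Review` enum, the category labels, and `review_level` are hypothetical, and falling back to Claude review for unlisted kinds is a conservative choice made here, not something the policy states.

```python
from enum import Enum


class Review(Enum):
    USE_DIRECTLY = "use directly"
    CLAUDE_REVIEW = "Claude reviews before writing/posting"
    REVIEW_BEFORE_APPLY = "always review before applying"
    NEVER_OLLAMA = "never use Ollama"


# Hypothetical category labels grouped by the policy tiers listed above.
POLICY = {
    "summary": Review.USE_DIRECTLY,
    "classification": Review.USE_DIRECTLY,
    "draft": Review.USE_DIRECTLY,
    "session_log": Review.CLAUDE_REVIEW,
    "commit_message": Review.CLAUDE_REVIEW,
    "comment": Review.CLAUDE_REVIEW,
    "code_suggestion": Review.REVIEW_BEFORE_APPLY,
    "credentials": Review.NEVER_OLLAMA,
    "auth_decision": Review.NEVER_OLLAMA,
    "production_migration": Review.NEVER_OLLAMA,
    "security_review": Review.NEVER_OLLAMA,
    "api_payload_values": Review.NEVER_OLLAMA,
}


def review_level(kind: str) -> Review:
    """Look up the review tier; unlisted kinds fall back to Claude review (assumption)."""
    return POLICY.get(kind, Review.CLAUDE_REVIEW)
```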