docs: establish Ollama as the documentation engine
Route all prose generation (session logs, commit messages, Syncro comments, client notes, code docs) through Ollama qwen3:14b by default. Claude reviews output and owns verbatim-accuracy sections (credentials, IPs, command outputs). GrepAI context lookups keep the Ollama service warm, eliminating the 30-50s cold-start in normal workflow. Updates: OLLAMA.md (documentation engine scope + warm-start note), CLAUDE.md (Ollama section), save.md (narrative drafting), checkpoint.md (commit message body drafting). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -70,19 +70,61 @@ For code suggestions, swap `qwen3:14b` for `codestral:22b`. Codestral doesn't ne
|
||||
|
||||
Cold-start is ~30-50s on first call per model per session. Warm calls are 1-5s.
|
||||
|
||||
## Documentation Engine
|
||||
|
||||
**Ollama is the default documentation engine for all prose output.** Any time stored text needs to be generated — session logs, commit messages, ticket comments, client notes, code docs — route it through Ollama first. Claude reviews, corrects if needed, then writes or posts.
|
||||
|
||||
This keeps Claude tokens focused on reasoning, decisions, and execution. Ollama handles the writing.
|
||||
|
||||
### What Ollama owns
|
||||
|
||||
| Output | Model | Claude's role |
|
||||
|--------|-------|---------------|
|
||||
| Session log narrative (summary, decisions, problems) | qwen3:14b | Review + assemble with factual sections |
|
||||
| Commit message body | qwen3:14b | Review + execute git commit |
|
||||
| Syncro comment bodies + billing descriptions | qwen3:14b | Review checklist + post via API |
|
||||
| Ticket initial issue / description text | qwen3:14b | Review + post |
|
||||
| Client-facing notes and summaries | qwen3:14b | Review for accuracy |
|
||||
| Code comments and docstrings | codestral:22b | Review before applying |
|
||||
| Refactor suggestions | codestral:22b | Review before applying |
|
||||
|
||||
### What Claude always owns (never Ollama)
|
||||
|
||||
- Credentials, passwords, API keys — must be verbatim accurate
|
||||
- Infrastructure details, IPs, hostnames — must be verbatim accurate
|
||||
- Command outputs and error messages — verbatim from actual output
|
||||
- Security decisions, auth review, production migrations
|
||||
- Final field values on API payloads (rates, IDs, quantities)
|
||||
|
||||
### Warm-start and GrepAI
|
||||
|
||||
GrepAI uses `nomic-embed-text` for context lookups, which keeps the Ollama **service** running continuously. The 30-50s service cold-start is effectively eliminated in normal workflow. `qwen3:14b` may take ~5s to swap into VRAM if it hasn't been called recently, but that's the worst case — not 50s.
|
||||
|
||||
If the first Ollama call of a session needs to be fast, send a throwaway warm-up ping:
|
||||
```bash
|
||||
py -c "
|
||||
import urllib.request, json
|
||||
body = json.dumps({'model':'qwen3:14b','messages':[{'role':'user','content':'ok'}],'stream':False,'think':False}).encode()
|
||||
urllib.request.urlopen(urllib.request.Request('$OLLAMA/api/chat', body), timeout=60).read()
|
||||
print('warm')
|
||||
"
|
||||
```
|
||||
|
||||
## When to Use Which Model
|
||||
|
||||
| Task | Model |
|
||||
|------|-------|
|
||||
| Summarize logs, diffs, session notes | qwen3:14b |
|
||||
| Session log narrative sections | qwen3:14b |
|
||||
| Commit message body | qwen3:14b |
|
||||
| Ticket / client comment drafting | qwen3:14b |
|
||||
| Summarize logs, diffs, incident notes | qwen3:14b |
|
||||
| Classify bug type, severity, category | qwen3:14b |
|
||||
| Extract structured data from text | qwen3:14b |
|
||||
| Draft commit message from diff | qwen3:14b |
|
||||
| Suggest refactor for a function | codestral:22b |
|
||||
| Docstring / comment generation | codestral:22b |
|
||||
| Code comment / docstring generation | codestral:22b |
|
||||
| Refactor suggestions | codestral:22b |
|
||||
|
||||
## Review Policy
|
||||
|
||||
- Low-stakes output (summary, classification, draft) — use directly
|
||||
- Documentation output (session logs, commit messages, comments) — Claude reviews before writing/posting
|
||||
- Code suggestions from codestral — always review before applying
|
||||
- Never use Ollama for: auth decisions, credential handling, production migrations, security review
|
||||
- Never use Ollama for: credentials, auth decisions, production migrations, security review, API payload field values
|
||||
|
||||
Reference in New Issue
Block a user