docs: establish Ollama as the documentation engine

Route all prose generation (session logs, commit messages, Syncro comments, client notes, code docs) through Ollama qwen3:14b by default. Claude reviews output and owns verbatim-accuracy sections (credentials, IPs, command outputs). GrepAI context lookups keep the Ollama service warm, eliminating the 30-50s cold-start in normal workflow. Updates: OLLAMA.md (documentation engine scope + warm-start note), CLAUDE.md (Ollama section), save.md (narrative drafting), checkpoint.md (commit message body drafting). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 07:37:45 -07:00
parent 693766d05e
commit 88bdc3d4c9
4 changed files with 131 additions and 19 deletions
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@@ -241,11 +241,13 @@ Vault structure: `infrastructure/`, `clients/`, `services/`, `projects/`, `msp-t

 ## Local AI (Ollama)

-Tier 0 — use before spawning Haiku agents for low-stakes tasks (free, fast, private):
+Tier 0 — **Ollama is the documentation engine.** Route all prose generation through it: session log narratives, commit messages, ticket comments, client notes, code docs. Claude reviews output, owns credentials/facts/execution.
+
 - **DESKTOP-0O8A1RL:** `http://localhost:11434`
 - **Other machines:** `http://100.92.127.64:11434` (Tailscale required)
- **Models:** `qwen3:14b` (summarize/classify/draft), `codestral:22b` (code suggestions — always review)
- **Full reference:** `.claude/OLLAMA.md` (connection examples, model selection, review policy)
+- **Models:** `qwen3:14b` (all documentation/prose), `codestral:22b` (code suggestions — always review)
+- **Warm-start:** GrepAI keeps the Ollama service running; qwen3 VRAM swap is ~5s worst case, not 50s
+- **Full reference:** `.claude/OLLAMA.md` (documentation engine scope, model selection, review policy)

 ### GrepAI (Semantic Code Search)

--- a/.claude/OLLAMA.md
+++ b/.claude/OLLAMA.md
@@ -70,19 +70,61 @@ For code suggestions, swap `qwen3:14b` for `codestral:22b`. Codestral doesn't ne

 Cold-start is ~30-50s on first call per model per session. Warm calls are 1-5s.

+## Documentation Engine
+
+**Ollama is the default documentation engine for all prose output.** Any time stored text needs to be generated — session logs, commit messages, ticket comments, client notes, code docs — route it through Ollama first. Claude reviews, corrects if needed, then writes or posts.
+
+This keeps Claude tokens focused on reasoning, decisions, and execution. Ollama handles the writing.
+
+### What Ollama owns
+
+| Output | Model | Claude's role |
+|--------|-------|---------------|
+| Session log narrative (summary, decisions, problems) | qwen3:14b | Review + assemble with factual sections |
+| Commit message body | qwen3:14b | Review + execute git commit |
+| Syncro comment bodies + billing descriptions | qwen3:14b | Review checklist + post via API |
+| Ticket initial issue / description text | qwen3:14b | Review + post |
+| Client-facing notes and summaries | qwen3:14b | Review for accuracy |
+| Code comments and docstrings | codestral:22b | Review before applying |
+| Refactor suggestions | codestral:22b | Review before applying |
+
+### What Claude always owns (never Ollama)
+
+- Credentials, passwords, API keys — must be verbatim accurate
+- Infrastructure details, IPs, hostnames — must be verbatim accurate
+- Command outputs and error messages — verbatim from actual output
+- Security decisions, auth review, production migrations
+- Final field values on API payloads (rates, IDs, quantities)
+
+### Warm-start and GrepAI
+
+GrepAI uses `nomic-embed-text` for context lookups, which keeps the Ollama **service** running continuously. The 30-50s service cold-start is effectively eliminated in normal workflow. `qwen3:14b` may take ~5s to swap into VRAM if it hasn't been called recently, but that's the worst case — not 50s.
+
+If the first Ollama call of a session needs to be fast, send a throwaway warm-up ping:
+```bash
+py -c "
+import urllib.request, json
+body = json.dumps({'model':'qwen3:14b','messages':[{'role':'user','content':'ok'}],'stream':False,'think':False}).encode()
+urllib.request.urlopen(urllib.request.Request('$OLLAMA/api/chat', body), timeout=60).read()
+print('warm')
+"
+```
+
 ## When to Use Which Model

 | Task | Model |
 |------|-------|
-| Summarize logs, diffs, session notes | qwen3:14b |
+| Session log narrative sections | qwen3:14b |
+| Commit message body | qwen3:14b |
+| Ticket / client comment drafting | qwen3:14b |
+| Summarize logs, diffs, incident notes | qwen3:14b |
 | Classify bug type, severity, category | qwen3:14b |
 | Extract structured data from text | qwen3:14b |
-| Draft commit message from diff | qwen3:14b |
-| Suggest refactor for a function | codestral:22b |
-| Docstring / comment generation | codestral:22b |
+| Code comment / docstring generation | codestral:22b |
+| Refactor suggestions | codestral:22b |

 ## Review Policy

- Low-stakes output (summary, classification, draft) — use directly
+- Documentation output (session logs, commit messages, comments) — Claude reviews before writing/posting
 - Code suggestions from codestral — always review before applying
- Never use Ollama for: auth decisions, credential handling, production migrations, security review
+- Never use Ollama for: credentials, auth decisions, production migrations, security review, API payload field values
--- a/.claude/commands/checkpoint.md
+++ b/.claude/commands/checkpoint.md
@@ -20,17 +20,34 @@ Please create a comprehensive git checkpoint with the following steps:
   - Add ALL untracked files (new files)
   - Use `git add -A` or `git add .` to stage everything

-4. **Create a detailed commit message**:
+4. **Draft commit message body via Ollama** (documentation engine):

-   - **First line**: Write a clear, concise summary (50-72 chars) describing the primary change
-     - Use imperative mood (e.g., "Add feature" not "Added feature")
-     - Examples: "feat: add user authentication", "fix: resolve database connection issue", "refactor: improve API route structure"
-   - **Body**: Provide a detailed description including:
-     - What changes were made (list of key modifications)
-     - Why these changes were made (purpose/motivation)
-     - Any important technical details or decisions
-     - Breaking changes or migration notes if applicable
-   - **Footer**: Include co-author attribution as shown in the Git Safety Protocol
+   ```bash
+   # Resolve Ollama
+   if curl -s -m 2 http://localhost:11434/api/tags >/dev/null 2>&1; then OLLAMA="http://localhost:11434"
+   elif curl -s -m 3 http://100.92.127.64:11434/api/tags >/dev/null 2>&1; then OLLAMA="http://100.92.127.64:11434"
+   else OLLAMA=""; fi
+
+   # Capture diff summary for Ollama prompt
+   { git diff --stat HEAD; echo "---"; git diff HEAD | head -200; } \
+     > "C:/Users/guru/AppData/Local/Temp/checkpoint_diff.txt"
+
+   # Ollama drafts the body; fallback to Claude if unavailable
+   if [ -n "$OLLAMA" ]; then
+     BODY=$(py -c "
+import urllib.request, json
+diff = open('C:/Users/guru/AppData/Local/Temp/checkpoint_diff.txt', encoding='utf-8').read()
+prompt = 'Write a git commit message BODY only (not the summary line). Imperative mood. What changed and why. No filler. Under 150 words.\n\nDIFF:\n' + diff
+body = json.dumps({'model':'qwen3:14b','messages':[{'role':'user','content':prompt}],'stream':False,'think':False}).encode()
+res = json.loads(urllib.request.urlopen(urllib.request.Request('$OLLAMA/api/chat', body), timeout=60).read())
+print(res['message']['content'])
+")
+   fi
+   ```
+
+   - **Summary line** (first line): Claude writes — 50-72 chars, imperative mood, from `git diff --stat`
+   - **Body**: Ollama draft (Claude reviews); Claude writes directly if Ollama unavailable
+   - **Footer**: `Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>`

 5. **Execute the commit**: Create the commit with the properly formatted message following this repository's conventions.

--- a/.claude/commands/save.md
+++ b/.claude/commands/save.md
@@ -1,5 +1,56 @@
 Save a COMPREHENSIVE session log to appropriate session-logs/ directory. This is critical for context recovery.

+## Ollama drafting (documentation engine)
+
+Narrative sections are drafted by Ollama (qwen3:14b), then assembled with Claude-generated factual sections. Claude reviews the full document before writing.
+
+**Ollama drafts:** Session Summary, Key Decisions, Problems Encountered
+**Claude owns (verbatim, never delegated):** Credentials, infrastructure IPs/hostnames, command outputs, file paths, pending tasks
+
+### Draft call
+
+```bash
+# Check Ollama (reuse $OLLAMA across the save operation)
+if curl -s -m 2 http://localhost:11434/api/tags >/dev/null 2>&1; then OLLAMA="http://localhost:11434"
+elif curl -s -m 3 http://100.92.127.64:11434/api/tags >/dev/null 2>&1; then OLLAMA="http://100.92.127.64:11434"
+else OLLAMA=""; fi
+
+# Write narrative prompt to temp file
+cat > "C:/Users/guru/AppData/Local/Temp/save_narrative_prompt.txt" << 'ENDPROMPT'
+You are a technical session log writer for an MSP (managed service provider).
+Write three sections of a session log in markdown. Be concise, factual, and technical.
+No filler phrases. Use past tense.
+
+WORK DONE THIS SESSION:
+<paste bullet list of what happened>
+
+Write these three sections only:
+
+## Session Summary
+<2-4 paragraph narrative: what was accomplished, in what order, why>
+
+## Key Decisions
+<bullet list of non-obvious decisions made and their rationale>
+
+## Problems Encountered
+<bullet list of problems hit and how each was resolved; omit if none>
+ENDPROMPT
+
+NARRATIVE=$(py -c "
+import urllib.request, json
+prompt = open('C:/Users/guru/AppData/Local/Temp/save_narrative_prompt.txt', encoding='utf-8').read()
+body = json.dumps({'model':'qwen3:14b','messages':[{'role':'user','content':prompt}],'stream':False,'think':False}).encode()
+res = json.loads(urllib.request.urlopen(urllib.request.Request('$OLLAMA/api/chat', body), timeout=120).read())
+print(res['message']['content'])
+")
+
+# Fallback: if OLLAMA empty, Claude writes narrative directly
+```
+
+Claude reviews the narrative output before assembling the final document.
+
+---
+
 ## Determine Correct Location

 **IMPORTANT: Save to project-specific or general session-logs based on work context**