sync: auto-sync from HOWARD-HOME at 2026-04-23 06:21:23
Author: Howard Enos Machine: HOWARD-HOME Timestamp: 2026-04-23 06:21:23
This commit is contained in:
@@ -23,6 +23,7 @@
|
||||
- [D2TESTNAS SSH Access](feedback_d2testnas_ssh.md) - Use root@192.168.0.9 with Paper123!@#, not sysadmin
|
||||
- [Bypass Permissions Setting](feedback_bypass_permissions_setting.md) - Set permissions.defaultMode to bypassPermissions in settings.json on all machines
|
||||
- [365 Remediation Tool](feedback_365_remediation_tool.md) - Always means Graph API app fabb3421, not CIPP
|
||||
- [Ollama Tier-0 Routing](feedback_ollama_tier0_routing.md) - Route drafts/summaries/classifications through Ollama (qwen3:14b). Mike designed ClaudeTools this way — not optional.
|
||||
|
||||
## Machine
|
||||
- [ACG-5070 Workstation Setup](reference_workstation_setup.md) - Windows 11 Pro clean install 2026-03-30, replaced CachyOS. All tools installed.
|
||||
|
||||
46
.claude/memory/feedback_ollama_tier0_routing.md
Normal file
46
.claude/memory/feedback_ollama_tier0_routing.md
Normal file
@@ -0,0 +1,46 @@
|
||||
---
|
||||
name: Route Tier-0 tasks through Ollama (Mike's ClaudeTools design intent)
|
||||
description: Drafts, summaries, classifications, extractions MUST go through Ollama per Mike's tiered-model architecture. Don't default to Claude inference for low-stakes text generation.
|
||||
type: feedback
|
||||
---
|
||||
|
||||
Route Tier-0 tasks (summaries, classifications, drafts, extractions) through Ollama. Not optional — this is how Mike designed ClaudeTools to work.
|
||||
|
||||
**Why:** Mike built the tiered-model architecture (`CLAUDE.md` Model Routing section + `.claude/OLLAMA.md`) deliberately. Tier 0 is free + fast + private. Defaulting to Claude for every drafting task burns context window and Anthropic tokens on work that qwen3:14b does fine.
|
||||
|
||||
**How to apply:**
|
||||
- Drafting emails, session-log paragraphs, status-update sentences, commit-message first-drafts → qwen3:14b
|
||||
- Summarizing long output (Graph JSON, PowerShell transcripts, log tails) → qwen3:14b
|
||||
- Extracting structured data from text → qwen3:14b
|
||||
- Suggesting refactors / generating docstrings → codestral:22b (then review)
|
||||
- NEVER for: auth decisions, credential handling, production migrations, security review, citation work, production-change scripts
|
||||
|
||||
**Endpoint resolution (updated 2026-04-22 in `.claude/OLLAMA.md`):**
|
||||
```bash
|
||||
if curl -s -m 2 http://localhost:11434/api/tags >/dev/null 2>&1; then
|
||||
OLLAMA="http://localhost:11434"
|
||||
else
|
||||
OLLAMA="http://100.92.127.64:11434"
|
||||
fi
|
||||
```
|
||||
|
||||
HOWARD-HOME has the canonical models loaded locally (qwen3:14b, codestral:22b, nomic-embed-text, plus bonus qwen3-coder:30b) — so HOWARD-HOME uses local Ollama, not Mike's. Zero Tailscale hop.
|
||||
|
||||
**Call pattern for qwen3 — use `/api/chat` with `think:false`**, NOT `/api/generate`. qwen3 on generate endpoint dumps reasoning into internal thinking tokens and returns empty `response` field. Chat endpoint with `think:false` returns clean content in `message.content`:
|
||||
|
||||
```python
|
||||
body = json.dumps({
|
||||
'model':'qwen3:14b',
|
||||
'messages':[{'role':'user','content': prompt}],
|
||||
'stream':False,
|
||||
'think':False
|
||||
}).encode()
|
||||
# POST to OLLAMA + '/api/chat'
|
||||
# Read res['message']['content']
|
||||
```
|
||||
|
||||
Codestral doesn't need `think:false` — just use it on `/api/chat` normally.
|
||||
|
||||
Cold-start ~30-50s on first call per model per session; warm calls 1-5s.
|
||||
|
||||
**Incident 2026-04-22:** Spent an entire Cascades rollout session (G1 hygiene, orphan cleanup, risk register, synology discovery, etc.) without routing a single task through Ollama despite many drafting opportunities (report drafts, summary text, email drafts). Howard called this out: "just make sure ollama is being used as mike has designed claudetools to work."
|
||||
Reference in New Issue
Block a user