Commit Graph

14 Commits

Author SHA1 Message Date
38cff65fba ollama: fix broken endpoint auto-detect in OLLAMA.md one-liner (RTFM audit)
Audited the Ollama reference (no wrapper script — it's the OLLAMA.md doc + inline
HTTP-API call pattern) against the live server (Ollama 0.30.8 on GURU-5070):
- /api/chat + think:false + res['message']['content'] confirmed working (clean
  output, no thinking leak) -- the core call pattern is correct.
- All referenced models exist on the server (qwen3:8b, qwen3.6:latest, qwen3:14b,
  codestral:22b, nomic-embed-text).

Real bug found + fixed: the "Preferred one-liner" auto-detected the endpoint with
`urlopen(...)` used as a truthiness test. urlopen RAISES URLError on a down host
(proven), so the ternary's fallback branch was dead code -- it crashed on a down
localhost instead of failing over to Beast, and it did a per-call probe that
contradicts the doc's own "read endpoint from identity.json, no probe" rule 30 lines
above. Replaced with the identity.json endpoint+model pattern (also swaps the
hardcoded qwen3:14b for the per-machine prose_model). Validated verbatim end-to-end.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 20:22:59 -07:00
63bc234d1b sync: auto-sync from GURU-KALI at 2026-05-26 20:08:37
Author: Mike Swanson
Machine: GURU-KALI
Timestamp: 2026-05-26 20:08:37
2026-05-26 20:08:39 -07:00
2bec888ea7 sync: auto-sync from GURU-KALI at 2026-05-26 19:59:15
Author: Mike Swanson
Machine: GURU-KALI
Timestamp: 2026-05-26 19:59:15
2026-05-26 19:59:16 -07:00
9595c9059b sync: auto-sync from GURU-5070 at 2026-05-25 06:00:45
Author: Mike Swanson
Machine: GURU-5070
Timestamp: 2026-05-25 06:00:45
2026-05-25 06:01:37 -07:00
1cf9c48461 sync: auto-sync from GURU-KALI at 2026-05-24 06:54:59
Author: Mike Swanson
Machine: GURU-KALI
Timestamp: 2026-05-24 06:54:59
2026-05-24 06:54:59 -07:00
d4d9a71aa7 docs: fix broken markdown tables in OLLAMA.md
The qwen3:8b routing update inserted footnote lines mid-table in both
the "What Ollama owns" and "When to Use Which Model" sections, splitting
each table in half so renderers treated the qwen3.6 rows as paragraph
text. Moved footnotes below the closing table row in both places.

Also updated the bottom "Rule of thumb" line: previously named qwen3:14b
with a "2x faster" claim that's now stale on DESKTOP-0O8A1RL where 8b is
the prose model. Generalized to "the per-machine prose model".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 16:54:20 -07:00
66a4a03e28 feat: add qwen3:8b for DESKTOP-0O8A1RL, update Ollama routing
Benchmarked 2026-05-16 on DESKTOP-0O8A1RL (RTX 5070 Ti Laptop, 12 GB VRAM):
- qwen3:8b: 100% VRAM fit (10.9/10.9 GB) -> 74-86 tok/s
- qwen3:14b: 73% VRAM (11.3/15.6 GB split) -> 17-18 tok/s (4.8x slower)
- qwen3.6:  41% VRAM (11.3/27.5 GB split) -> 17-19 tok/s

qwen3:14b overflows 12 GB VRAM at runtime (9.3 GB GGUF = 15.6 GB loaded).
qwen3:8b fits entirely in VRAM and matches the reference machine speed.

Updated OLLAMA.md: added qwen3:8b to models table, per-machine routing
table, benchmark results. Updated CLAUDE.md model one-liner.
Routing: qwen3:8b for prose on DESKTOP-0O8A1RL, qwen3:14b everywhere else,
qwen3.6 for strict-format tasks on all machines.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 16:25:57 -07:00
12d5a976d4 Session log: qwen3.6 benchmark, route strict-format to 3.6
Benchmarked qwen3.6 (36B MoE) vs qwen3:14b and qwen3:32b on 16
representative prompts. qwen3.6 scored 15/16 vs 14b 11/16 and 32b
12/16, winning every strict-format/adherence test (multi-step rules,
weekend-aware scheduling, prompt-injection resistance, word-limit
summary). Single reasoning regression noted for re-check at qwen3.7.

Updated .claude/OLLAMA.md (Models, Documentation Engine, and
When-to-Use tables) and .claude/CLAUDE.md one-line model summary to
route strict-format work to qwen3.6 and keep bulk prose on qwen3:14b
(2x faster). Also removed openclaw npm package + ~/.openclaw data dir
earlier in the session.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 16:03:07 -07:00
5da286e18f docs: apply vix-inspired token efficiency optimizations
- CLAUDE.md: trim ~45 lines — compress Live State Tracking, Automatic
  Context Loading, File Placement, Ollama sections; add single-agent
  guidance for coupled explore→implement tasks
- CODING_GUIDELINES.md: add GrepAI-first rule with token cost rationale;
  add GuruRMM platform parity matrix and cross-platform coding standards
- OLLAMA.md: expand tier-0 scope to include diff summarization, error
  categorization, agent phase handoff summaries, client email drafts,
  ticket classification with priority

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 15:50:29 -07:00
3e2b04c489 grepai: fix index staleness, mandate usage, document config for new machines
Index was dead since 2026-04-19 (watcher not running). Fixes:
- Watcher restarted; scheduled task registered for login persistence
- Removed .md 0.6x penalty — markdown is primary content in this repo
- Added session-logs/ 1.3x, .claude/ 1.2x, /clients/ 1.1x relevance bonuses
- CLAUDE.md: grepai_search is now the first step for any context lookup
- OLLAMA.md: documents config overrides + watcher setup for new machines

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 07:42:01 -07:00
36cc1ddedd docs: establish Ollama as the documentation engine
Route all prose generation (session logs, commit messages, Syncro
comments, client notes, code docs) through Ollama qwen3:14b by default.
Claude reviews output and owns verbatim-accuracy sections (credentials,
IPs, command outputs). GrepAI context lookups keep the Ollama service
warm, eliminating the 30-50s cold-start in normal workflow.

Updates: OLLAMA.md (documentation engine scope + warm-start note),
CLAUDE.md (Ollama section), save.md (narrative drafting), checkpoint.md
(commit message body drafting).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 07:37:45 -07:00
73c28cd9db sync: auto-sync from HOWARD-HOME at 2026-04-23 06:21:23
Author: Howard Enos
Machine: HOWARD-HOME
Timestamp: 2026-04-23 06:21:23
2026-04-23 06:21:24 -07:00
391178ef02 fix: replace python3 with py/jq throughout scripts and docs
Windows Store python3 stub returns exit 49 instead of running Python.
Replace with: py (Windows launcher) for actual Python code, jq for
simple JSON extraction. Reorder fallback loops to try py first.
Add Bash(py:*) to settings.local.json allowlist.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 12:14:43 -07:00
a50af7faf1 refactor: optimize CLAUDE.md context footprint (-49%)
Extract Ollama docs and PROJECT_STATE locking protocol to on-demand
reference files. Trim Work Mode to detection table only. Remove verbose
anti-pattern examples and credential encryption details.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 12:09:17 -07:00