Ollama shared via Tailscale: per-machine URL detection + Howard access

CLAUDE.md: Ollama section rewritten. localhost for Mike's workstation, 100.92.127.64:11434 via Tailscale for all other machines. Claude reads identity.json hostname to determine which URL to use. Firewall rule restricts to Tailscale 100.0.0.0/8 subnet only. ONBOARDING.md: updated Ollama section for remote access. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 13:04:44 -07:00
parent b99f8512e4
commit 5995511011
2 changed files with 37 additions and 4 deletions
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@@ -290,7 +290,7 @@ Service account token in vault: `infrastructure/1password-service-account.sops.y

 ## Local AI (Ollama)

-Ollama runs locally with GPU acceleration for tasks that don't need Claude-level reasoning.
+Ollama runs on Mike's workstation (DESKTOP-0O8A1RL) with GPU acceleration. Available to all team members via Tailscale.

 | Model | Size | Use For |
 |-------|------|---------|
@@ -298,11 +298,37 @@ Ollama runs locally with GPU acceleration for tasks that don't need Claude-level
 | `codestral:22b` | 12 GB | Code generation, refactoring suggestions, docstrings |
 | `nomic-embed-text` | 274 MB | Embeddings only (used by GrepAI) |

+### How to connect
+
+**On Mike's workstation (local):**
 ```bash
-# Simple prompt
 curl -s http://localhost:11434/api/generate -d '{"model":"qwen3:14b","prompt":"...","stream":false}' | jq -r '.response'
 ```

+**On any other machine via Tailscale:**
+```bash
+curl -s http://100.92.127.64:11434/api/generate -d '{"model":"qwen3:14b","prompt":"...","stream":false}' | jq -r '.response'
+```
+
+### Per-machine setup
+
+Read `.claude/identity.json` to determine which machine you're on:
+- **DESKTOP-0O8A1RL** (Mike's workstation): Ollama runs locally. Use `localhost:11434`.
+- **Any other machine** (Howard's laptop, other workstations): Ollama is remote via Tailscale. Use `100.92.127.64:11434`. Requires Tailscale to be connected.
+
+**To check if Ollama is reachable:**
+```bash
+curl -s http://100.92.127.64:11434/api/tags | python -c "import sys,json; [print(m['name']) for m in json.load(sys.stdin).get('models',[])]"
+```
+
+If it fails: verify Tailscale is connected (`tailscale status`), and that Mike's workstation is online.
+
+### Access control
+
+- Firewall rule on Mike's workstation allows port 11434 ONLY from Tailscale subnet (100.0.0.0/8)
+- NOT exposed to LAN, VPN, or internet
+- Binding: `OLLAMA_HOST=0.0.0.0:11434` (all interfaces, firewall restricts)
+
 **Review policy:** Always review Critical/High impact Ollama outputs (auth, security, migrations, production). Trust Low impact (classification, formatting). Flag uncertainty to user.

 ### GrepAI (Semantic Code Search)
--- a/.claude/ONBOARDING.md
+++ b/.claude/ONBOARDING.md
@@ -154,9 +154,16 @@ Ollama runs AI models locally on your GPU. Used for tasks that don't need Claude
 - `codestral:22b` — code generation assistance
 - `nomic-embed-text` — embeddings for semantic search

-**Status:** May not be installed/running on your machine yet. If Claude says "Ollama not available", it's fine — it falls back to doing the work itself.
+**Ollama runs on Mike's workstation** and is shared via Tailscale. You don't need to install it locally.

-**To check:** `curl http://localhost:11434/api/tags` — if it returns models, Ollama is running.
+**To use from your machine (Tailscale must be connected):**
+```bash
+curl -s http://100.92.127.64:11434/api/tags
+```
+
+If that returns models, you're connected. Claude automatically uses the right URL based on which machine you're on (reads from `identity.json`).
+
+If it fails: check that Tailscale is connected (`tailscale status`) and Mike's workstation is online.

 ### GrepAI (semantic code search)