sync: Add Wrightstown Solar and Smart Home projects
New projects from 2026-02-09 research session:

Wrightstown Solar:
- DIY 48V LiFePO4 battery storage (EVE C40 cells)
- Victron MultiPlus II whole-house UPS design
- BMS comparison (Victron CAN bus compatible)
- EV salvage analysis (new cells won)
- Full parts list and budget

Wrightstown Smart Home:
- Home Assistant Yellow setup (local voice, no cloud)
- Local LLM server build guide (Ollama + RTX 4090)
- Hybrid LLM bridge (LiteLLM + Claude API + Grok API)
- Network security (VLAN architecture, PII sanitization)

Machine: ACG-M-L5090
Timestamp: 2026-02-09

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
projects/wrightstown-smarthome/documentation/hybrid-bridge.md (new file, 290 lines)
@@ -0,0 +1,290 @@
# Hybrid LLM Bridge - Local + Cloud Routing

**Created:** 2026-02-09
**Purpose:** Route queries intelligently between local Ollama, Claude API, and Grok API

---

## Architecture

```
User Query (voice, chat, HA automation)
                 |
          [LiteLLM Proxy]
          localhost:4000
                 |
         Routing Decision
          /      |      \
   [Ollama]  [Claude]  [Grok]
    Local    Anthropic   xAI
    Free     Reasoning  Search
   Private   $3/$15/1M $3/$15/1M
```

---

## Recommended: LiteLLM Proxy

A unified API gateway that presents a single OpenAI-compatible endpoint. Everything talks to `localhost:4000`, and LiteLLM routes each request to the right backend.

### Installation

```bash
pip install 'litellm[proxy]'   # quotes keep zsh from globbing the brackets
```

### Configuration (`config.yaml`)

```yaml
model_list:
  # Local models (free, private)
  - model_name: local-fast
    litellm_params:
      model: ollama/qwen2.5:7b
      api_base: http://localhost:11434

  - model_name: local-reasoning
    litellm_params:
      model: ollama/llama3.1:70b-q4
      api_base: http://localhost:11434

  # Cloud: Claude (complex reasoning)
  - model_name: cloud-reasoning
    litellm_params:
      model: anthropic/claude-sonnet-4-5-20250929
      api_key: os.environ/ANTHROPIC_API_KEY

  - model_name: cloud-reasoning-cheap
    litellm_params:
      model: anthropic/claude-haiku-4-5-20251001
      api_key: os.environ/ANTHROPIC_API_KEY

  # Cloud: Grok (internet search)
  - model_name: cloud-search
    litellm_params:
      model: xai/grok-4
      api_key: os.environ/XAI_API_KEY
      api_base: https://api.x.ai/v1

router_settings:
  routing_strategy: simple-shuffle
  allowed_fails: 2
  num_retries: 3
  # Monthly cloud spend caps (USD) via LiteLLM's provider budget routing;
  # the local Ollama models are free, so they get no budget entry.
  provider_budget_config:
    anthropic:
      budget_limit: 75   # cloud-reasoning ($50) + cloud-reasoning-cheap ($25)
      time_period: 30d
    xai:
      budget_limit: 25   # cloud-search
      time_period: 30d
```

### Start the Proxy

```bash
litellm --config config.yaml --port 4000
```
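
Before wiring anything else up, it's worth confirming the proxy actually loaded your model list. A minimal sanity check, assuming the config above is running on localhost:

```python
import openai

# Any string works as the key for local-only use; LiteLLM doesn't check it.
client = openai.OpenAI(api_key="anything", base_url="http://localhost:4000")

# The proxy exposes the OpenAI-compatible /models endpoint, so the names
# from config.yaml (local-fast, cloud-reasoning, ...) should appear here.
print([m.id for m in client.models.list().data])
```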

### Usage

Everything talks to `http://localhost:4000` with OpenAI-compatible format:

```python
import openai

client = openai.OpenAI(
    api_key="anything",  # LiteLLM doesn't need this for local
    base_url="http://localhost:4000"
)

# Route to local
response = client.chat.completions.create(
    model="local-fast",
    messages=[{"role": "user", "content": "Turn on the lights"}]
)

# Route to Claude
response = client.chat.completions.create(
    model="cloud-reasoning",
    messages=[{"role": "user", "content": "Analyze my energy usage patterns"}]
)

# Route to Grok
response = client.chat.completions.create(
    model="cloud-search",
    messages=[{"role": "user", "content": "What's the current electricity rate in PA?"}]
)
```

---

## Routing Strategy

### What Goes Where

**Local (Ollama) -- Default for everything private:**
- Home automation commands ("turn on lights", "set thermostat to 72")
- Sensor data queries ("what's the temperature in the garage?")
- Camera-related queries (never send video to cloud)
- Personal information queries
- Simple Q&A
- Quick lookups from local knowledge

**Claude API -- Complex reasoning tasks:**
- Detailed analysis ("analyze my energy trends this month")
- Code generation ("write an HA automation for...")
- Long-form content creation
- Multi-step reasoning problems
- Function calling for HA service control

**Grok API -- Internet/real-time data:**
- Current events ("latest news on solar tariffs")
- Real-time pricing ("current electricity rates")
- Weather data (if not using a local integration)
- Web searches
- Anything requiring information the local model doesn't have

### Manual vs Automatic Routing

**Phase 1 (Start here):** Manual model selection
- User picks "local-fast", "cloud-reasoning", or "cloud-search" in Open WebUI
- Simple, no mistakes, full control
- Good for learning which queries work best where

**Phase 2 (Later):** Keyword-based routing in LiteLLM (sketched after this list)
- Route based on keywords in the query
- "search", "latest", "current" --> Grok
- "analyze", "explain in detail", "write code" --> Claude
- Everything else --> local

**Phase 3 (Advanced):** Semantic routing
- Use sentence embeddings to classify query intent
- Small local model (all-MiniLM-L6-v2) classifies in 50-200ms
- Most intelligent routing, but requires Python development
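
Until Phase 2 is wired into LiteLLM itself (e.g. via a custom pre-call hook), a client-side stand-in shows the idea. The keyword lists and the `pick_model`/`route_query` helpers are illustrative, not part of any library:

```python
import openai

client = openai.OpenAI(api_key="anything", base_url="http://localhost:4000")

# Keyword lists are a starting point; tune them as you learn your query mix.
SEARCH_HINTS = ("search", "latest", "current", "news", "today")
REASONING_HINTS = ("analyze", "explain in detail", "write code", "automation for")

def pick_model(query: str) -> str:
    """Crude Phase 2 router: search keywords, then reasoning keywords, else local."""
    q = query.lower()
    if any(k in q for k in SEARCH_HINTS):
        return "cloud-search"
    if any(k in q for k in REASONING_HINTS):
        return "cloud-reasoning"
    return "local-fast"

def route_query(query: str) -> str:
    response = client.chat.completions.create(
        model=pick_model(query),
        messages=[{"role": "user", "content": query}],
    )
    return response.choices[0].message.content

print(route_query("What's the latest news on solar tariffs?"))  # --> cloud-search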

---

## Cloud API Details

### Claude (Anthropic)

**Endpoint:** `https://api.anthropic.com/v1/messages`
**Get API key:** https://console.anthropic.com/

**Pricing (2025-2026):**

| Model | Input/1M tokens | Output/1M tokens | Best For |
|---|---|---|---|
| Claude Haiku 4.5 | $1.00 | $5.00 | Fast, cheap tasks |
| Claude Sonnet 4.5 | $3.00 | $15.00 | Best balance |
| Claude Opus 4.5 | $5.00 | $25.00 | Top quality |

**Cost optimization:**
- Prompt caching: up to 90% savings on repeated system prompts (sketched after this list)
- Use Haiku for simple tasks, Sonnet for complex ones
- Batch processing (50% discount) available for non-urgent tasks
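
A minimal prompt-caching sketch using the Anthropic SDK directly (rather than through the LiteLLM proxy). The `HOUSE_CONTEXT` string is a placeholder; note that Anthropic only caches prompts above a minimum length (1024-2048 tokens depending on model), so a real system prompt needs to be much longer than this:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# The long, stable part (HA entity list, house rules) is what's worth caching;
# the user turn changes on every call and is never cached.
HOUSE_CONTEXT = "You are the Wrightstown home assistant. Entities: ..."  # abridged

response = client.messages.create(
    model="claude-haiku-4-5-20251001",
    max_tokens=256,
    system=[
        {
            "type": "text",
            "text": HOUSE_CONTEXT,
            "cache_control": {"type": "ephemeral"},  # cache this prefix
        }
    ],
    messages=[{"role": "user", "content": "What's my battery charge level?"}],
)
print(response.content[0].text)
```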

**Features:**
- 200k context window
- Extended thinking mode
- Function calling (perfect for HA control)
- Vision support (could analyze charts, screenshots)

### Grok (xAI)

**Endpoint:** `https://api.x.ai/v1/chat/completions`
**Get API key:** https://console.x.ai/
**Format:** OpenAI SDK compatible

**Pricing:**

| Model | Input/1M tokens | Output/1M tokens | Best For |
|---|---|---|---|
| Grok 4.1 Fast | $0.20 | $1.00 | Budget queries |
| Grok 4 | $3.00 | $15.00 | Full capability |

**Free credits:** $25 for new users, plus $150/month for accounts opting into the data-sharing program

**Features:**
- 2 million token context window (Grok 4.1 Fast)
- Real-time X (Twitter) integration
- Internet search capability
- OpenAI SDK compatibility
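
Because the endpoint is OpenAI-compatible, you can also hit it directly, outside LiteLLM, with the stock SDK; only the base URL and key change from the proxy examples:

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["XAI_API_KEY"],
    base_url="https://api.x.ai/v1",
)

response = client.chat.completions.create(
    model="grok-4",
    messages=[{"role": "user", "content": "Latest news on solar tariffs?"}],
)
print(response.choices[0].message.content)
```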

---

## Monthly Cost Estimates

Assumes roughly 1,000 input + 1,000 output tokens per cloud query.

### Conservative Use (80/15/5 Split, 1000 queries/month)

| Route | Queries | Model | Cost |
|---|---|---|---|
| Local (80%) | 800 | Ollama | $0 |
| Claude (15%) | 150 | Haiku 4.5 | ~$0.90 |
| Grok (5%) | 50 | Grok 4.1 Fast | ~$0.07 |
| **Total** | **1000** | | **~$0.97/month** |

### Heavy Use (60/25/15 Split, 3000 queries/month)

| Route | Queries | Model | Cost |
|---|---|---|---|
| Local (60%) | 1800 | Ollama | $0 |
| Claude (25%) | 750 | Sonnet 4.5 | ~$15 |
| Grok (15%) | 450 | Grok 4 | ~$9 |
| **Total** | **3000** | | **~$24/month** |

**Add electricity for the LLM server:** ~$15-30/month (RTX 4090 build)
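
A back-of-envelope sketch reproducing these estimates. The per-query token counts are assumptions, not measurements; exactly 1k tokens each way lands slightly under the tables' rounder figures:

```python
# Monthly cost: queries x (in_tokens x in_rate + out_tokens x out_rate) / 1M
PRICES = {  # USD per 1M tokens: (input, output)
    "haiku-4.5": (1.00, 5.00),
    "sonnet-4.5": (3.00, 15.00),
    "grok-4.1-fast": (0.20, 1.00),
    "grok-4": (3.00, 15.00),
}

def monthly_cost(queries: int, model: str, in_tok: int = 1000, out_tok: int = 1000) -> float:
    in_rate, out_rate = PRICES[model]
    return queries * (in_tok * in_rate + out_tok * out_rate) / 1_000_000

# Conservative: 150 Haiku + 50 Grok Fast queries
print(f"{monthly_cost(150, 'haiku-4.5') + monthly_cost(50, 'grok-4.1-fast'):.2f}")  # ~0.96
# Heavy: 750 Sonnet + 450 Grok 4 queries
print(f"{monthly_cost(750, 'sonnet-4.5') + monthly_cost(450, 'grok-4'):.2f}")  # ~21.60
```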

---

## Home Assistant Integration

### Connect HA to LiteLLM Proxy

**Option 1: Extended OpenAI Conversation (Recommended)**

Install via HACS, then configure:
- API Base URL: `http://<llm-server-ip>:4000/v1`
- API Key: (any string; LiteLLM doesn't validate it for local models)
- Model: `local-fast` (or any model name from your config)

This gives HA natural language control (see the sketch after this list):
- "Turn off all lights downstairs" --> local LLM understands --> calls HA service
- "What's my battery charge level?" --> queries HA entities --> responds

**Option 2: Native Ollama Integration**

Settings > Integrations > Ollama:
- URL: `http://<llm-server-ip>:11434`
- Simpler, but bypasses the routing layer

### Voice Assistant Pipeline

```
Wake word detected ("Hey Jarvis")
        |
Whisper (speech-to-text, local)
        |
    Query text
        |
Extended OpenAI Conversation
        |
LiteLLM Proxy (routing)
        |
   Response text
        |
Piper (text-to-speech, local)
        |
  Speaker output
```

---

## Sources

- https://docs.litellm.ai/
- https://github.com/open-webui/open-webui
- https://console.anthropic.com/
- https://docs.x.ai/developers/models
- https://github.com/jekalmin/extended_openai_conversation
- https://github.com/aurelio-labs/semantic-router