sync: Add Wrightstown Solar and Smart Home projects

New projects from 2026-02-09 research session:

Wrightstown Solar:
- DIY 48V LiFePO4 battery storage (EVE C40 cells)
- Victron MultiPlus II whole-house UPS design
- BMS comparison (Victron CAN bus compatible)
- EV salvage analysis (new cells won)
- Full parts list and budget

Wrightstown Smart Home:
- Home Assistant Yellow setup (local voice, no cloud)
- Local LLM server build guide (Ollama + RTX 4090)
- Hybrid LLM bridge (LiteLLM + Claude API + Grok API)
- Network security (VLAN architecture, PII sanitization)

Machine: ACG-M-L5090
Timestamp: 2026-02-09

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

# Hybrid LLM Bridge - Local + Cloud Routing
**Created:** 2026-02-09
**Purpose:** Route queries intelligently between local Ollama, Claude API, and Grok API
---
## Architecture
```
User Query (voice, chat, HA automation)
                 |
          [LiteLLM Proxy]
          localhost:4000
                 |
         Routing Decision
         /       |       \
   [Ollama]  [Claude]   [Grok]
    Local    Anthropic   xAI
    Free     Reasoning   Search
   Private   $3/$15/1M  $3/$15/1M
```
---
## Recommended: LiteLLM Proxy
Unified API gateway that presents a single OpenAI-compatible endpoint. Everything talks to `localhost:4000` and LiteLLM routes to the right backend.
### Installation
```bash
pip install 'litellm[proxy]'
```
### Configuration (`config.yaml`)
```yaml
model_list:
  # Local models (free, private)
  - model_name: local-fast
    litellm_params:
      model: ollama/qwen2.5:7b
      api_base: http://localhost:11434
  - model_name: local-reasoning
    litellm_params:
      model: ollama/llama3.1:70b-q4
      api_base: http://localhost:11434

  # Cloud: Claude (complex reasoning)
  - model_name: cloud-reasoning
    litellm_params:
      model: anthropic/claude-sonnet-4-5-20250929
      api_key: os.environ/ANTHROPIC_API_KEY   # read from the environment, never hardcode keys
  - model_name: cloud-reasoning-cheap
    litellm_params:
      model: anthropic/claude-haiku-4-5-20251001
      api_key: os.environ/ANTHROPIC_API_KEY

  # Cloud: Grok (internet search)
  - model_name: cloud-search
    litellm_params:
      model: xai/grok-4
      api_key: os.environ/XAI_API_KEY
      api_base: https://api.x.ai/v1

router_settings:
  routing_strategy: simple-shuffle
  allowed_fails: 2
  num_retries: 3

# Per-route spending targets (aspirational; check the LiteLLM docs for the exact budget syntax)
budget_policy:
  local-fast: unlimited
  local-reasoning: unlimited
  cloud-reasoning: $50/month
  cloud-reasoning-cheap: $25/month
  cloud-search: $25/month
```
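The config above reads API keys from the environment via `os.environ/...` references, so export them in the shell that launches the proxy (the variable names just need to match the config):
```bash
# Placeholders, same as above -- substitute your real keys
export ANTHROPIC_API_KEY="sk-ant-XXXXX"
export XAI_API_KEY="xai-XXXXX"
```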
### Start the Proxy
```bash
litellm --config config.yaml --port 4000
```
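Once it's running, a quick sanity check against the OpenAI-compatible surface (list the configured models, then send a test completion routed to the local backend):
```bash
curl http://localhost:4000/v1/models

curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "local-fast", "messages": [{"role": "user", "content": "hello"}]}'
```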
### Usage
Everything talks to `http://localhost:4000` with OpenAI-compatible format:
```python
import openai

client = openai.OpenAI(
    api_key="anything",  # LiteLLM doesn't need this for local
    base_url="http://localhost:4000"
)

# Route to local
response = client.chat.completions.create(
    model="local-fast",
    messages=[{"role": "user", "content": "Turn on the lights"}]
)

# Route to Claude
response = client.chat.completions.create(
    model="cloud-reasoning",
    messages=[{"role": "user", "content": "Analyze my energy usage patterns"}]
)

# Route to Grok
response = client.chat.completions.create(
    model="cloud-search",
    messages=[{"role": "user", "content": "What's the current electricity rate in PA?"}]
)
```
---
## Routing Strategy
### What Goes Where
**Local (Ollama) -- Default for everything private:**
- Home automation commands ("turn on lights", "set thermostat to 72")
- Sensor data queries ("what's the temperature in the garage?")
- Camera-related queries (never send video to cloud)
- Personal information queries
- Simple Q&A
- Quick lookups from local knowledge
**Claude API -- Complex reasoning tasks:**
- Detailed analysis ("analyze my energy trends this month")
- Code generation ("write an HA automation for...")
- Long-form content creation
- Multi-step reasoning problems
- Function calling for HA service control
**Grok API -- Internet/real-time data:**
- Current events ("latest news on solar tariffs")
- Real-time pricing ("current electricity rates")
- Weather data (if not using local integration)
- Web searches
- Anything requiring information the local model doesn't have
### Manual vs Automatic Routing
**Phase 1 (Start here):** Manual model selection
- User picks "local-fast", "cloud-reasoning", or "cloud-search" in Open WebUI
- Simple, no mistakes, full control
- Good for learning which queries work best where
**Phase 2 (Later):** Keyword-based routing in LiteLLM
- Route based on keywords in the query
- "search", "latest", "current" --> Grok
- "analyze", "explain in detail", "write code" --> Claude
- Everything else --> local
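A minimal Phase 2 sketch, done in the client in front of the proxy rather than inside LiteLLM's own config (model names come from the config above; the keyword lists are illustrative, not exhaustive):
```python
import openai

client = openai.OpenAI(api_key="anything", base_url="http://localhost:4000")

SEARCH_HINTS = ("search", "latest", "current", "news", "price")
REASONING_HINTS = ("analyze", "explain in detail", "write code", "write an automation")

def pick_model(query: str) -> str:
    """Crude keyword routing: only go to the cloud when the query clearly needs it."""
    q = query.lower()
    if any(k in q for k in SEARCH_HINTS):
        return "cloud-search"
    if any(k in q for k in REASONING_HINTS):
        return "cloud-reasoning"
    return "local-fast"

def ask(query: str) -> str:
    response = client.chat.completions.create(
        model=pick_model(query),
        messages=[{"role": "user", "content": query}],
    )
    return response.choices[0].message.content
```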
**Phase 3 (Advanced):** Semantic routing
- Use sentence embeddings to classify query intent
- Small local model (all-MiniLM-L6-v2) classifies in 50-200ms
- Most intelligent routing, but requires Python development
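A rough Phase 3 sketch using sentence-transformers with `all-MiniLM-L6-v2`; the example queries per intent are placeholders you would tune against real usage:
```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # small local model; see the 50-200ms estimate above

# Illustrative examples; each intent maps to a LiteLLM model name from the config
INTENTS = {
    "local-fast": ["turn on the kitchen lights", "what is the garage temperature"],
    "cloud-reasoning": ["analyze my energy usage this month", "write a home assistant automation"],
    "cloud-search": ["latest news on solar tariffs", "current electricity rates in PA"],
}
intent_names = list(INTENTS)
intent_embeddings = [encoder.encode(examples, convert_to_tensor=True) for examples in INTENTS.values()]

def classify(query: str) -> str:
    """Return the model name whose example queries are most similar to this query."""
    q = encoder.encode(query, convert_to_tensor=True)
    scores = [util.cos_sim(q, emb).max().item() for emb in intent_embeddings]
    return intent_names[scores.index(max(scores))]

print(classify("any news about solar tariffs today?"))  # likely "cloud-search"
```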
---
## Cloud API Details
### Claude (Anthropic)
**Endpoint:** `https://api.anthropic.com/v1/messages`
**Get API key:** https://console.anthropic.com/
**Pricing (2025-2026):**
| Model | Input/1M tokens | Output/1M tokens | Best For |
|---|---|---|---|
| Claude Haiku 4.5 | $0.50 | $2.50 | Fast, cheap tasks |
| Claude Sonnet 4.5 | $3.00 | $15.00 | Best balance |
| Claude Opus 4.5 | $5.00 | $25.00 | Top quality |
**Cost optimization:**
- Prompt caching: 90% savings on repeated system prompts
- Use Haiku for simple tasks, Sonnet for complex ones
- Batch processing available for non-urgent tasks
**Features:**
- 200k context window
- Extended thinking mode
- Function calling (perfect for HA control)
- Vision support (could analyze charts, screenshots)
### Grok (xAI)
**Endpoint:** `https://api.x.ai/v1/chat/completions`
**Get API key:** https://console.x.ai/
**Format:** OpenAI SDK compatible
**Pricing:**
| Model | Input/1M tokens | Output/1M tokens | Best For |
|---|---|---|---|
| Grok 4.1 Fast | $0.20 | $1.00 | Budget queries |
| Grok 4 | $3.00 | $15.00 | Full capability |
**Free credits:** $25 for new users, plus $150/month for opting into the data-sharing program
**Features:**
- 2 million token context window (industry-leading)
- Real-time X (Twitter) integration
- Internet search capability
- OpenAI SDK compatibility
---
## Monthly Cost Estimates
### Conservative Use (80/15/5 Split, 1000 queries/month)
| Route | Queries | Model | Cost |
|---|---|---|---|
| Local (80%) | 800 | Ollama | $0 |
| Claude (15%) | 150 | Haiku 4.5 | ~$0.45 |
| Grok (5%) | 50 | Grok 4.1 Fast | ~$0.07 |
| **Total** | **1000** | | **~$0.52/month** |
### Heavy Use (60/25/15 Split, 3000 queries/month)
| Route | Queries | Model | Cost |
|---|---|---|---|
| Local (60%) | 1800 | Ollama | $0 |
| Claude (25%) | 750 | Sonnet 4.5 | ~$15 |
| Grok (15%) | 450 | Grok 4 | ~$9 |
| **Total** | **3000** | | **~$24/month** |
**Add electricity for LLM server:** ~$15-30/month (RTX 4090 build)
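For reference, the Claude line in the heavy-use table works out roughly as follows, assuming ~1,000 input and ~1,000 output tokens per query (an assumption, not a measurement):
```python
# Heavy-use Claude line: 750 queries at Sonnet 4.5 pricing ($3 in / $15 out per 1M tokens)
queries = 750
tokens_in, tokens_out = 1_000, 1_000           # assumed per-query token counts
cost = queries * (tokens_in * 3 + tokens_out * 15) / 1_000_000
print(f"~${cost:.2f}/month")                    # ~$13.50, shown as ~$15 in the table
```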
---
## Home Assistant Integration
### Connect HA to LiteLLM Proxy
**Option 1: Extended OpenAI Conversation (Recommended)**
Install via HACS, then configure:
- API Base URL: `http://<llm-server-ip>:4000/v1`
- API Key: (any string, LiteLLM doesn't validate for local)
- Model: `local-fast` (or any model name from your config)
This gives HA natural language control:
- "Turn off all lights downstairs" --> local LLM understands --> calls HA service
- "What's my battery charge level?" --> queries HA entities --> responds
**Option 2: Native Ollama Integration**
Settings > Integrations > Ollama:
- URL: `http://<llm-server-ip>:11434`
- Simpler but bypasses the routing layer
### Voice Assistant Pipeline
```
Wake word detected ("Hey Jarvis")
|
Whisper (speech-to-text, local)
|
Query text
|
Extended OpenAI Conversation
|
LiteLLM Proxy (routing)
|
Response text
|
Piper (text-to-speech, local)
|
Speaker output
```
---
## Sources
- https://docs.litellm.ai/
- https://github.com/open-webui/open-webui
- https://console.anthropic.com/
- https://docs.x.ai/developers/models
- https://github.com/jekalmin/extended_openai_conversation
- https://github.com/aurelio-labs/semantic-router