Hybrid LLM Bridge - Local + Cloud Routing
Created: 2026-02-09
Purpose: Route queries intelligently between local Ollama, Claude API, and Grok API
Architecture
```
User Query (voice, chat, HA automation)
                 |
          [LiteLLM Proxy]
          localhost:4000
                 |
         Routing Decision
         /       |        \
   [Ollama]  [Claude]    [Grok]
    Local    Anthropic    xAI
    Free     Reasoning    Search
    Private  $3/$15/1M    $3/$15/1M
```
Recommended: LiteLLM Proxy
Unified API gateway that presents a single OpenAI-compatible endpoint. Everything talks to localhost:4000 and LiteLLM routes to the right backend.
Installation
```bash
pip install 'litellm[proxy]'
```
Configuration (config.yaml)
```yaml
model_list:
  # Local models (free, private)
  - model_name: local-fast
    litellm_params:
      model: ollama/qwen2.5:7b
      api_base: http://localhost:11434
  - model_name: local-reasoning
    litellm_params:
      model: ollama/llama3.1:70b-q4
      api_base: http://localhost:11434

  # Cloud: Claude (complex reasoning)
  - model_name: cloud-reasoning
    litellm_params:
      model: anthropic/claude-sonnet-4-5-20250929
      api_key: sk-ant-XXXXX   # better: api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: cloud-reasoning-cheap
    litellm_params:
      model: anthropic/claude-haiku-4-5-20251001
      api_key: sk-ant-XXXXX

  # Cloud: Grok (internet search)
  - model_name: cloud-search
    litellm_params:
      model: xai/grok-4
      api_key: xai-XXXXX
      api_base: https://api.x.ai/v1

router_settings:
  routing_strategy: simple-shuffle
  allowed_fails: 2
  num_retries: 3
  # Monthly spend caps per model (illustrative; check the LiteLLM budget /
  # spend-tracking docs for the exact keys your version supports)
  budget_policy:
    local-fast: unlimited
    local-reasoning: unlimited
    cloud-reasoning: $50/month
    cloud-reasoning-cheap: $25/month
    cloud-search: $25/month
```
Start the Proxy
```bash
litellm --config config.yaml --port 4000
```
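Before wiring anything else up, a quick sanity check (a sketch using the OpenAI SDK, assuming the proxy is running on the same machine) confirms the proxy is up and serving the model names from config.yaml:
```python
import openai

# Point the stock OpenAI client at the LiteLLM proxy.
client = openai.OpenAI(api_key="anything", base_url="http://localhost:4000")

# The proxy exposes /v1/models, so the configured model_name entries should appear here.
print([m.id for m in client.models.list().data])
# Expected to include: local-fast, local-reasoning, cloud-reasoning, cloud-reasoning-cheap, cloud-search
```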
Usage
Everything talks to http://localhost:4000 with OpenAI-compatible format:
```python
import openai

client = openai.OpenAI(
    api_key="anything",  # LiteLLM doesn't need this for local
    base_url="http://localhost:4000",
)

# Route to local
response = client.chat.completions.create(
    model="local-fast",
    messages=[{"role": "user", "content": "Turn on the lights"}],
)

# Route to Claude
response = client.chat.completions.create(
    model="cloud-reasoning",
    messages=[{"role": "user", "content": "Analyze my energy usage patterns"}],
)

# Route to Grok
response = client.chat.completions.create(
    model="cloud-search",
    messages=[{"role": "user", "content": "What's the current electricity rate in PA?"}],
)
```
Routing Strategy
What Goes Where
Local (Ollama) -- Default for everything private:
- Home automation commands ("turn on lights", "set thermostat to 72")
- Sensor data queries ("what's the temperature in the garage?")
- Camera-related queries (never send video to cloud)
- Personal information queries
- Simple Q&A
- Quick lookups from local knowledge
Claude API -- Complex reasoning tasks:
- Detailed analysis ("analyze my energy trends this month")
- Code generation ("write an HA automation for...")
- Long-form content creation
- Multi-step reasoning problems
- Function calling for HA service control
Grok API -- Internet/real-time data:
- Current events ("latest news on solar tariffs")
- Real-time pricing ("current electricity rates")
- Weather data (if not using local integration)
- Web searches
- Anything requiring information the local model doesn't have
Manual vs Automatic Routing
Phase 1 (Start here): Manual model selection
- User picks "local-fast", "cloud-reasoning", or "cloud-search" in Open WebUI
- Simple, no mistakes, full control
- Good for learning which queries work best where
Phase 2 (Later): Keyword-based routing in front of the LiteLLM call (see the sketch after this list)
- Route based on keywords in the query
- "search", "latest", "current" --> Grok
- "analyze", "explain in detail", "write code" --> Claude
- Everything else --> local
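A minimal sketch of that keyword pre-router, assuming queries arrive as plain strings and the OpenAI client is pointed at the proxy as in the Usage section (the keyword lists are illustrative, not exhaustive):
```python
import openai

client = openai.OpenAI(api_key="anything", base_url="http://localhost:4000")

# Illustrative keyword lists; tune these as you learn which queries belong where.
SEARCH_KEYWORDS = ("search", "latest", "current", "news", "price", "rate")
REASONING_KEYWORDS = ("analyze", "explain in detail", "write code", "automation for")

def pick_model(query: str) -> str:
    """Choose a LiteLLM model_name based on simple keyword matching."""
    q = query.lower()
    if any(k in q for k in SEARCH_KEYWORDS):
        return "cloud-search"
    if any(k in q for k in REASONING_KEYWORDS):
        return "cloud-reasoning"
    return "local-fast"  # default: stay local, free, private

query = "What's the current electricity rate in PA?"
response = client.chat.completions.create(
    model=pick_model(query),  # -> cloud-search
    messages=[{"role": "user", "content": query}],
)
```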
Phase 3 (Advanced): Semantic routing (see the sketch after this list)
- Use sentence embeddings to classify query intent
- Small local model (all-MiniLM-L6-v2) classifies in 50-200ms
- Most intelligent routing, but requires Python development
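A sketch of the semantic approach using the sentence-transformers library; the seed queries per route are hypothetical and should be replaced with real examples from your own usage:
```python
from sentence_transformers import SentenceTransformer, util

# Hypothetical seed queries per route; expand with real queries over time.
ROUTES = {
    "local-fast": ["turn on the kitchen lights", "what's the garage temperature"],
    "cloud-reasoning": ["analyze my energy usage this month", "write an automation that runs at sunset"],
    "cloud-search": ["latest news on solar tariffs", "current electricity rates in PA"],
}

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # small embedding model, runs locally
route_names = list(ROUTES)
route_vectors = [encoder.encode(examples, convert_to_tensor=True) for examples in ROUTES.values()]

def pick_route(query: str) -> str:
    """Return the route whose seed queries are most similar to the incoming query."""
    q = encoder.encode(query, convert_to_tensor=True)
    scores = [util.cos_sim(q, vecs).max().item() for vecs in route_vectors]
    return route_names[scores.index(max(scores))]

print(pick_route("what are wholesale electricity prices doing today?"))  # likely "cloud-search"
```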
Cloud API Details
Claude (Anthropic)
Endpoint: https://api.anthropic.com/v1/messages
Get API key: https://console.anthropic.com/
Pricing (2025-2026):
| Model | Input/1M tokens | Output/1M tokens | Best For |
|---|---|---|---|
| Claude Haiku 4.5 | $0.50 | $2.50 | Fast, cheap tasks |
| Claude Sonnet 4.5 | $3.00 | $15.00 | Best balance |
| Claude Opus 4.5 | $5.00 | $25.00 | Top quality |
Cost optimization:
- Prompt caching: 90% savings on repeated system prompts (see the sketch after this list)
- Use Haiku for simple tasks, Sonnet for complex ones
- Batch processing available for non-urgent tasks
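A hedged sketch of prompt caching through the proxy: it assumes LiteLLM forwards Anthropic's cache_control content-block marker from an OpenAI-style request (verify against the LiteLLM prompt-caching docs for your version), and the system prompt text is a placeholder:
```python
import openai

client = openai.OpenAI(api_key="anything", base_url="http://localhost:4000")

HOUSE_CONTEXT = "You are the Wrightstown home assistant. <several KB of house/entity context>"

response = client.chat.completions.create(
    model="cloud-reasoning",
    messages=[
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": HOUSE_CONTEXT,
                    # Marks this block for Anthropic prompt caching; repeated calls
                    # reread it at the cached-input rate instead of full price.
                    "cache_control": {"type": "ephemeral"},
                }
            ],
        },
        {"role": "user", "content": "Summarize today's energy usage."},
    ],
)
```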
Features:
- 200k context window
- Extended thinking mode
- Function calling (perfect for HA control; see the sketch after this list)
- Vision support (could analyze charts, screenshots)
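A sketch of function calling through the proxy using the OpenAI tools format, which LiteLLM translates to Anthropic tool use. The light_turn_on tool is hypothetical; your own glue code would execute the returned call against Home Assistant's API:
```python
import openai

client = openai.OpenAI(api_key="anything", base_url="http://localhost:4000")

# Hypothetical tool describing a Home Assistant light service call.
tools = [{
    "type": "function",
    "function": {
        "name": "light_turn_on",
        "description": "Turn on a Home Assistant light entity, optionally at a brightness.",
        "parameters": {
            "type": "object",
            "properties": {
                "entity_id": {"type": "string", "description": "e.g. light.kitchen"},
                "brightness_pct": {"type": "integer", "minimum": 1, "maximum": 100},
            },
            "required": ["entity_id"],
        },
    },
}]

response = client.chat.completions.create(
    model="cloud-reasoning",
    messages=[{"role": "user", "content": "Dim the kitchen lights to 30%"}],
    tools=tools,
)

# Assumes the model chose to call the tool; the caller executes it against HA.
call = response.choices[0].message.tool_calls[0]
print(call.function.name, call.function.arguments)  # light_turn_on {"entity_id": "light.kitchen", ...}
```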
Grok (xAI)
Endpoint: https://api.x.ai/v1/chat/completions
Get API key: https://console.x.ai/
Format: OpenAI SDK compatible
Pricing:
| Model | Input/1M tokens | Output/1M tokens | Best For |
|---|---|---|---|
| Grok 4.1 Fast | $0.20 | $1.00 | Budget queries |
| Grok 4 | $3.00 | $15.00 | Full capability |
Free credits: $25 new user + $150/month if opting into data sharing program
Features:
- 2 million token context window (industry-leading)
- Real-time X (Twitter) integration
- Internet search capability
- OpenAI SDK compatibility
Monthly Cost Estimates
Conservative Use (80/15/5 Split, 1000 queries/month)
| Route | Queries | Model | Cost |
|---|---|---|---|
| Local (80%) | 800 | Ollama | $0 |
| Claude (15%) | 150 | Haiku 4.5 | ~$0.45 |
| Grok (5%) | 50 | Grok 4.1 Fast | ~$0.07 |
| Total | 1000 | | ~$0.52/month |
Heavy Use (60/25/15 Split, 3000 queries/month)
| Route | Queries | Model | Cost |
|---|---|---|---|
| Local (60%) | 1800 | Ollama | $0 |
| Claude (25%) | 750 | Sonnet 4.5 | ~$15 |
| Grok (15%) | 450 | Grok 4 | ~$9 |
| Total | 3000 | | ~$24/month |
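If you assume roughly 1,000 input and 1,000 output tokens per cloud query (an illustrative assumption, not a measurement), the table figures above reproduce almost exactly; a quick sketch for redoing the arithmetic with your own averages and the pricing above:
```python
def monthly_cost(queries, input_price, output_price, in_tokens=1000, out_tokens=1000):
    """Dollars per month; prices are per 1M tokens."""
    return queries * (in_tokens * input_price + out_tokens * output_price) / 1_000_000

print(monthly_cost(150, 0.50, 2.50))   # Haiku 4.5, conservative split: ~$0.45
print(monthly_cost(50, 0.20, 1.00))    # Grok 4.1 Fast, conservative split: ~$0.06 (table rounds to ~$0.07)
print(monthly_cost(750, 3.00, 15.00))  # Sonnet 4.5, heavy split: ~$13.50 (table rounds to ~$15)
print(monthly_cost(450, 3.00, 15.00))  # Grok 4, heavy split: ~$8.10 (table rounds to ~$9)
```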
Add electricity for LLM server: ~$15-30/month (RTX 4090 build)
Home Assistant Integration
Connect HA to LiteLLM Proxy
Option 1: Extended OpenAI Conversation (Recommended)
Install via HACS, then configure:
- API Base URL: http://<llm-server-ip>:4000/v1
- API Key: any string (LiteLLM doesn't validate it for local routing)
- Model: local-fast (or any model name from your config)
This gives HA natural language control:
- "Turn off all lights downstairs" --> local LLM understands --> calls HA service
- "What's my battery charge level?" --> queries HA entities --> responds
Option 2: Native Ollama Integration
Settings > Integrations > Ollama:
- URL: http://<llm-server-ip>:11434
- Simpler, but bypasses the routing layer
Voice Assistant Pipeline
```
Wake word detected ("Hey Jarvis")
        |
Whisper (speech-to-text, local)
        |
Query text
        |
Extended OpenAI Conversation
        |
LiteLLM Proxy (routing)
        |
Response text
        |
Piper (text-to-speech, local)
        |
Speaker output
```