
Hybrid LLM Bridge - Local + Cloud Routing

Created: 2026-02-09
Purpose: Route queries intelligently between local Ollama, Claude API, and Grok API


Architecture

User Query (voice, chat, HA automation)
              |
      [LiteLLM Proxy]
       localhost:4000
              |
     Routing Decision
     /       |        \
[Ollama]  [Claude]   [Grok]
 Local    Anthropic    xAI
 Free     Reasoning   Search
 Private  $3/$15/1M   $3/$15/1M

LiteLLM is a unified API gateway that presents a single OpenAI-compatible endpoint: every client talks to localhost:4000, and LiteLLM routes each request to the right backend.

Installation

pip install 'litellm[proxy]'

Configuration (config.yaml)

model_list:
  # Local models (free, private)
  - model_name: local-fast
    litellm_params:
      model: ollama/qwen2.5:7b
      api_base: http://localhost:11434

  - model_name: local-reasoning
    litellm_params:
      model: ollama/llama3.1:70b-q4
      api_base: http://localhost:11434

  # Cloud: Claude (complex reasoning)
  - model_name: cloud-reasoning
    litellm_params:
      model: claude-sonnet-4-5-20250929
      api_key: sk-ant-XXXXX

  - model_name: cloud-reasoning-cheap
    litellm_params:
      model: claude-haiku-4-5-20251001
      api_key: sk-ant-XXXXX

  # Cloud: Grok (internet search)
  - model_name: cloud-search
    litellm_params:
      model: xai/grok-4
      api_key: xai-XXXXX
      api_base: https://api.x.ai/v1

router_settings:
  routing_strategy: simple-shuffle
  allowed_fails: 2
  num_retries: 3

  budget_policy:  # illustrative caps; check LiteLLM's budget docs for the exact schema
    local-fast: unlimited
    local-reasoning: unlimited
    cloud-reasoning: $50/month
    cloud-reasoning-cheap: $25/month
    cloud-search: $25/month

Start the Proxy

litellm --config config.yaml --port 4000
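
Once the proxy is up, a quick sanity check is to list the configured model names through the proxy's OpenAI-compatible /v1/models route. A minimal sketch using the OpenAI Python SDK:

import openai

client = openai.OpenAI(api_key="anything", base_url="http://localhost:4000")
for m in client.models.list():
    print(m.id)  # expect: local-fast, local-reasoning, cloud-reasoning, ...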

Usage

Every client talks to http://localhost:4000 using the standard OpenAI request format:

import openai

client = openai.OpenAI(
    api_key="anything",  # LiteLLM doesn't need this for local
    base_url="http://localhost:4000"
)

# Route to local
response = client.chat.completions.create(
    model="local-fast",
    messages=[{"role": "user", "content": "Turn on the lights"}]
)

# Route to Claude
response = client.chat.completions.create(
    model="cloud-reasoning",
    messages=[{"role": "user", "content": "Analyze my energy usage patterns"}]
)

# Route to Grok
response = client.chat.completions.create(
    model="cloud-search",
    messages=[{"role": "user", "content": "What's the current electricity rate in PA?"}]
)

Routing Strategy

What Goes Where

Local (Ollama) -- Default for everything private:

  • Home automation commands ("turn on lights", "set thermostat to 72")
  • Sensor data queries ("what's the temperature in the garage?")
  • Camera-related queries (never send video to cloud)
  • Personal information queries
  • Simple Q&A
  • Quick lookups from local knowledge

Claude API -- Complex reasoning tasks:

  • Detailed analysis ("analyze my energy trends this month")
  • Code generation ("write an HA automation for...")
  • Long-form content creation
  • Multi-step reasoning problems
  • Function calling for HA service control

Grok API -- Internet/real-time data:

  • Current events ("latest news on solar tariffs")
  • Real-time pricing ("current electricity rates")
  • Weather data (if not using local integration)
  • Web searches
  • Anything requiring information the local model doesn't have

Manual vs Automatic Routing

Phase 1 (Start here): Manual model selection

  • User picks "local-fast", "cloud-reasoning", or "cloud-search" in Open WebUI
  • Simple, no mistakes, full control
  • Good for learning which queries work best where

Phase 2 (Later): Keyword-based routing in LiteLLM

  • Route based on keywords in the query (a sketch follows this list)
  • "search", "latest", "current" --> Grok
  • "analyze", "explain in detail", "write code" --> Claude
  • Everything else --> local
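
A minimal client-side sketch of Phase 2, assuming the model names from the config above. LiteLLM also offers server-side routing hooks, but keeping the keyword check in the caller is the simplest starting point:

import openai

client = openai.OpenAI(api_key="anything", base_url="http://localhost:4000")

# Keyword lists are illustrative; tune them from Phase 1 experience.
KEYWORD_ROUTES = [
    (("search", "latest", "current", "news"), "cloud-search"),
    (("analyze", "explain in detail", "write code"), "cloud-reasoning"),
]

def pick_model(query: str) -> str:
    q = query.lower()
    for keywords, model in KEYWORD_ROUTES:
        if any(k in q for k in keywords):
            return model
    return "local-fast"  # default: private and free

def ask(query: str) -> str:
    response = client.chat.completions.create(
        model=pick_model(query),
        messages=[{"role": "user", "content": query}],
    )
    return response.choices[0].message.content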

Phase 3 (Advanced): Semantic routing

  • Use sentence embeddings to classify query intent
  • Small local model (all-MiniLM-L6-v2) classifies in 50-200ms
  • Most intelligent routing, but requires Python development (see the sketch below)
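
A minimal sketch of the Phase 3 classifier, assuming the sentence-transformers package; the embedding model (all-MiniLM-L6-v2) is the one mentioned above, and the route descriptions are illustrative:

from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

# One short description per route; embeddings are computed once at startup.
ROUTES = {
    "local-fast": "home automation command, sensor reading, private question",
    "cloud-reasoning": "detailed analysis, code generation, multi-step reasoning",
    "cloud-search": "current events, real-time prices, web search",
}
route_names = list(ROUTES)
route_embs = encoder.encode(list(ROUTES.values()), normalize_embeddings=True)

def classify(query: str) -> str:
    q_emb = encoder.encode(query, normalize_embeddings=True)
    scores = util.cos_sim(q_emb, route_embs)[0]
    return route_names[int(scores.argmax())]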

Cloud API Details

Claude (Anthropic)

Endpoint: https://api.anthropic.com/v1/messages
Get API key: https://console.anthropic.com/

Pricing (2025-2026):

Model               Input/1M tokens   Output/1M tokens   Best For
Claude Haiku 4.5    $1.00             $5.00              Fast, cheap tasks
Claude Sonnet 4.5   $3.00             $15.00             Best balance
Claude Opus 4.5     $5.00             $25.00             Top quality

Cost optimization:

  • Prompt caching: 90% savings on repeated system prompts (sketch after this list)
  • Use Haiku for simple tasks, Sonnet for complex ones
  • Batch processing available for non-urgent tasks
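
With Anthropic models, caching is requested per content block via a cache_control marker; assuming LiteLLM forwards that field through the OpenAI-compatible format (its docs describe this passthrough), a sketch:

import openai

client = openai.OpenAI(api_key="anything", base_url="http://localhost:4000")

# Hypothetical long system prompt reused across every request
LONG_SYSTEM_PROMPT = "You are the Wrightstown home assistant. ..."

response = client.chat.completions.create(
    model="cloud-reasoning",
    messages=[
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": LONG_SYSTEM_PROMPT,
                    # Cached reads bill at ~10% of the base input price
                    "cache_control": {"type": "ephemeral"},
                }
            ],
        },
        {"role": "user", "content": "Summarize today's sensor alerts"},
    ],
)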

Features:

  • 200k context window
  • Extended thinking mode
  • Function calling (perfect for HA control; example after this list)
  • Vision support (could analyze charts, screenshots)
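
Tool use travels in the standard OpenAI tools format, which LiteLLM translates for Anthropic. A minimal sketch with a hypothetical set_light tool (the tool name and schema are illustrative, not an HA API):

import openai

client = openai.OpenAI(api_key="anything", base_url="http://localhost:4000")

tools = [{
    "type": "function",
    "function": {
        "name": "set_light",  # hypothetical tool, wired to HA by your glue code
        "description": "Turn a light on or off",
        "parameters": {
            "type": "object",
            "properties": {
                "entity_id": {"type": "string"},
                "state": {"type": "string", "enum": ["on", "off"]},
            },
            "required": ["entity_id", "state"],
        },
    },
}]

response = client.chat.completions.create(
    model="cloud-reasoning",
    messages=[{"role": "user", "content": "Turn off the porch light"}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)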

Grok (xAI)

Endpoint: https://api.x.ai/v1/chat/completions
Get API key: https://console.x.ai/
Format: OpenAI SDK compatible

Pricing:

Model           Input/1M tokens   Output/1M tokens   Best For
Grok 4.1 Fast   $0.20             $1.00              Budget queries
Grok 4          $3.00             $15.00             Full capability

Free credits: $25 for new users, plus $150/month for opting into the data-sharing program

Features:

  • Up to 2 million token context window (Grok 4.1 Fast)
  • Real-time X (Twitter) integration
  • Internet search capability
  • OpenAI SDK compatibility

Monthly Cost Estimates

Conservative Use (80/15/5 Split, 1000 queries/month)

Route          Queries   Model           Cost
Local (80%)    800       Ollama          $0
Claude (15%)   150       Haiku 4.5       ~$0.90
Grok (5%)      50        Grok 4.1 Fast   ~$0.07
Total          1000                      ~$0.97/month

Heavy Use (60/25/15 Split, 3000 queries/month)

Route          Queries   Model        Cost
Local (60%)    1800      Ollama       $0
Claude (25%)   750       Sonnet 4.5   ~$15
Grok (15%)     450       Grok 4       ~$9
Total          3000                   ~$24/month
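
The cloud rows follow from simple per-query arithmetic. A sketch under assumed averages of ~3,000 input and ~600 output tokens per cloud query (the token counts are assumptions, not measurements):

def monthly_cost(queries, in_price, out_price, in_tok=3000, out_tok=600):
    """Prices are $ per 1M tokens; token counts are assumed per-query averages."""
    return queries * (in_tok * in_price + out_tok * out_price) / 1e6

print(monthly_cost(150, 1.00, 5.00))   # Haiku 4.5:  ~$0.90
print(monthly_cost(750, 3.00, 15.00))  # Sonnet 4.5: ~$13.50, near the ~$15 above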

Add electricity for LLM server: ~$15-30/month (RTX 4090 build)
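
For the electricity line, a rough duty-cycle calculation (the wattages, hours, and rate are assumptions to show the method):

idle_w, load_w = 80, 350        # assumed idle and inference draw in watts
load_hours = 4                  # assumed hours/day under load
rate = 0.15                     # assumed $/kWh
daily_kwh = (idle_w * (24 - load_hours) + load_w * load_hours) / 1000
print(daily_kwh * 30 * rate)    # ~$13.50/month at these assumptions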


Home Assistant Integration

Connect HA to LiteLLM Proxy

Option 1: Extended OpenAI Conversation (Recommended)

Install via HACS, then configure:

  • API Base URL: http://<llm-server-ip>:4000/v1
  • API Key: (any string, LiteLLM doesn't validate for local)
  • Model: local-fast (or any model name from your config)

This gives HA natural language control:

  • "Turn off all lights downstairs" --> local LLM understands --> calls HA service
  • "What's my battery charge level?" --> queries HA entities --> responds

Option 2: Native Ollama Integration

Settings > Integrations > Ollama:

  • URL: http://<llm-server-ip>:11434
  • Simpler but bypasses the routing layer

Voice Assistant Pipeline

Wake word detected ("Hey Jarvis")
         |
   Whisper (speech-to-text, local)
         |
   Query text
         |
   Extended OpenAI Conversation
         |
   LiteLLM Proxy (routing)
         |
   Response text
         |
   Piper (text-to-speech, local)
         |
   Speaker output
