Model Routing
Intelligent model selection that balances developer requirements with user preferences
WebLLM's router is what makes the protocol feel magical. Instead of hardcoding model names like gpt-4-turbo-2024-01-15, developers describe what they need. The router automatically selects the best available model based on:
Developer Requirements
Task type, quality, speed, capabilities
Model Capabilities
16 scoring criteria, benchmarks
User Preferences
Priorities, budget, privacy settings
How Routing Works
Instead of specifying a model, developers describe what they need:
Available Task Types:
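As a sketch of what such a request might look like: the field names (`task`, `hints`, `constraints`) and the task-type list below are assumptions for illustration, not the actual WebLLM SDK surface; only "coding" and "creative" are named elsewhere in these docs.

```typescript
// Hypothetical request shape -- field names are illustrative, not the real WebLLM API.
type TaskType = "coding" | "creative" | "chat" | "analysis"; // only "coding" and "creative" are confirmed by these docs

interface RouteRequest {
  task: TaskType; // what kind of work the model will do
  hints?: {
    quality?: "low" | "medium" | "high"; // prefer smarter models
    speed?: "low" | "medium" | "high";   // prefer faster models
  };
  constraints?: {
    maxCostPerRequest?: number; // cost ceiling the router must respect
    maxLatencyMs?: number;      // latency ceiling
  };
}

// Describe the need; never name a model.
const request: RouteRequest = {
  task: "coding",
  hints: { quality: "high" },
  constraints: { maxCostPerRequest: 0.05 },
};
```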
The router first determines which models are actually available:
User has Anthropic API key configured
User has OpenAI API key configured
Local Llama 70B model downloaded
Google Gemini not configured
Gateway fallback available
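The availability pass above could be sketched as follows; the provider names come from the list, but the detection logic (API keys, downloaded models, gateway flag) is an assumed simplification.

```typescript
// Sketch of the availability pass -- detection logic is assumed, not WebLLM's actual code.
type Availability = { provider: string; available: boolean; reason: string };

function checkAvailability(env: {
  apiKeys: Record<string, string | undefined>;
  localModels: string[];
  gatewayEnabled: boolean;
}): Availability[] {
  return [
    { provider: "anthropic", available: Boolean(env.apiKeys["anthropic"]), reason: "API key configured" },
    { provider: "openai", available: Boolean(env.apiKeys["openai"]), reason: "API key configured" },
    { provider: "local", available: env.localModels.includes("llama-70b"), reason: "model downloaded" },
    { provider: "google", available: Boolean(env.apiKeys["google"]), reason: "API key configured" },
    { provider: "gateway", available: env.gatewayEnabled, reason: "fallback" },
  ];
}

// The scenario above: Anthropic + OpenAI keys, local Llama 70B, no Google, gateway on.
const available = checkAvailability({
  apiKeys: { anthropic: "sk-ant-...", openai: "sk-..." },
  localModels: ["llama-70b"],
  gatewayEnabled: true,
}).filter((a) => a.available);
```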
Each available model is scored across 16 different criteria:
Performance
• Speed / Latency
• Throughput
• Tokens per second
Quality
• Accuracy
• Overall quality
• Instruction following
Capabilities
• Coding proficiency
• Reasoning ability
• Math & logic
• Creative writing
Practical
• Cost efficiency
• Context length
• Output length
• Reliability
• Privacy features
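One way to picture per-criterion scoring is a weighted average, with the criteria the request cares about weighted up. The weighting scheme here is an assumption for illustration; the docs specify the criteria, not the aggregation formula.

```typescript
// Minimal scoring sketch: average a model's per-criterion scores (0..1),
// weighting request-relevant criteria more heavily. Aggregation is assumed.
type Scores = Record<string, number>; // criterion -> 0..1

function baseScore(model: Scores, emphasis: Record<string, number> = {}): number {
  let total = 0;
  let weightSum = 0;
  for (const [criterion, score] of Object.entries(model)) {
    const w = emphasis[criterion] ?? 1; // unlisted criteria get neutral weight 1
    total += score * w;
    weightSum += w;
  }
  return total / weightSum;
}

// Illustrative numbers, not real benchmark data.
const claude: Scores = { coding: 0.95, speed: 0.8, cost: 0.7 };
const codingEmphasis = { coding: 3 }; // a coding task weights that criterion up
```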
User preferences are applied as multipliers to the base scores:
User Priority Settings:
1. Anthropic (Claude) - Priority: 100
2. Local Models - Priority: 80
3. OpenAI - Priority: 60
4. DeepSeek - Priority: 40
Final Scores:
claude-sonnet-4: 0.91 × 1.0 = 0.91 ✅ Winner
llama-70b-local: 0.68 × 0.8 = 0.54
gpt-4o: 0.89 × 0.6 = 0.53
deepseek-coder: 0.87 × 0.4 = 0.35
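The multiplier step above can be reproduced directly; priorities are normalized so the top-priority provider (100) keeps its base score, which matches the numbers shown.

```typescript
// Applying user priority multipliers (priority / 100) to base scores.
const baseScores: Record<string, number> = {
  "claude-sonnet-4": 0.91,
  "llama-70b-local": 0.68,
  "gpt-4o": 0.89,
  "deepseek-coder": 0.87,
};
const priorities: Record<string, number> = {
  "claude-sonnet-4": 100,
  "llama-70b-local": 80,
  "gpt-4o": 60,
  "deepseek-coder": 40,
};

const ranked = Object.keys(baseScores)
  .map((model) => ({ model, score: baseScores[model] * (priorities[model] / 100) }))
  .sort((a, b) => b.score - a.score);
// ranked[0] is the winner: claude-sonnet-4 at 0.91
```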
The router creates a prioritized fallback chain in case the primary model fails:
Primary: Claude Sonnet 4 (0.91)
Fallback: Llama 70B Local (0.54)
Fallback: GPT-4o (0.53)
Fallback: Gateway
If Claude fails (rate limit, API down, etc.), the router automatically tries the next option. Your app never breaks.
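The fallback behavior can be sketched as a simple loop down the chain; `callModel` here is a stand-in for whatever actually invokes a provider, not a real WebLLM function.

```typescript
// Fallback sketch: try each model in score order until one succeeds.
async function completeWithFallback(
  chain: string[],
  callModel: (model: string, prompt: string) => Promise<string>, // provider invocation stand-in
  prompt: string,
): Promise<string> {
  let lastError: unknown;
  for (const model of chain) {
    try {
      return await callModel(model, prompt); // first success wins
    } catch (err) {
      lastError = err; // rate limit, outage, etc. -- move down the chain
    }
  }
  throw lastError ?? new Error("no models in chain");
}
```

For the scenario above, the chain would be `["claude-sonnet-4", "llama-70b-local", "gpt-4o", "gateway"]`.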
Real-World Example
Developer's Request:
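An illustrative reconstruction of the request in this example, inferred from the decision rationale below (field names assumed, as throughout):

```typescript
// Hypothetical request for this walkthrough: a coding task with a quality requirement.
const exampleRequest = {
  task: "coding",
  hints: { quality: "high" },
};
```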
Available Models
✅ Claude Sonnet 4
✅ GPT-4o
✅ DeepSeek Coder
✅ Llama 70B
Base Scores
Claude: 0.91
GPT-4o: 0.89
DeepSeek: 0.87
Llama: 0.68
With Priorities
Claude: 0.91 ✅
Llama: 0.54
GPT-4o: 0.53
DeepSeek: 0.35
✅ Decision: Claude Sonnet 4
• Highest coding capability score (0.95)
• Matches quality requirement
• User's highest priority provider
• User already pays for Claude Pro (maximize value)
Fallback chain: Llama 70B → GPT-4o → DeepSeek → Gateway
Why Intelligent Routing Matters
• GPT-5 launches? Users get it automatically
• Claude 4 comes out? Router adapts immediately
• New providers? Add in 30 lines of code
• Developer code never changes
• Automatically use cheaper models when appropriate
• Respect user budget constraints
• Maximize user's existing subscriptions
• Balance cost vs quality intelligently
• Automatic fallback if primary fails
• No single point of failure
• Handles rate limits gracefully
• Your app never breaks
• Task-optimized model selection
• Speed vs quality trade-offs
• Capability matching
• Intelligent benchmarking
Best Practices
✅ Always specify task type
Use specific task types (coding, creative, etc.) to help the router choose the best model for your use case.
✅ Provide quality/speed hints
Give the router guidance on whether you prioritize speed or quality. This helps balance trade-offs.
✅ Set reasonable constraints
Use cost and latency constraints to ensure the router respects your requirements and user budgets.
❌ Don't hardcode model names
Avoid specifying exact models. Let the router choose. Your code stays future-proof and works with whatever the user has.
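The do/don't above can be condensed into one contrast; the commented-out call is hypothetical, shown only to illustrate the anti-pattern.

```typescript
// Don't: pin your code to a model name that will age out.
// const response = await chat({ model: "gpt-4-turbo-2024-01-15", prompt }); // hypothetical call

// Do: describe the task, give hints, set constraints -- let the router pick.
const goodRequest = {
  task: "coding",
  hints: { quality: "high" },
  constraints: { maxCostPerRequest: 0.05, maxLatencyMs: 5000 },
};
```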
Next Steps
Experiment with different scenarios and see how the router makes decisions
Open Playground →
See all available task types, hints, and constraints
View API Docs →