Model Routing
Intelligent model selection that balances developer requirements with user preferences
WebLLM's router is what makes the protocol feel magical. Instead of hardcoding model names like gpt-4-turbo-2024-01-15, developers describe what they need. The router automatically selects the best available model based on:
Developer Requirements
Task type, quality, speed, capabilities
Model Capabilities
16 scoring criteria, benchmarks
User Preferences
Priorities, budget, privacy settings
How Routing Works
Instead of specifying a model, developers describe what they need:
Available Task Types:
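As a sketch of what such a request might look like: the field names (`task`, `hints`, `constraints`) and the task-type list below are assumptions for illustration, not the actual WebLLM SDK surface; only "coding" and "creative" are named elsewhere in these docs.

```typescript
// Hypothetical request shape -- field names are illustrative, not the real WebLLM API.
type TaskType = "coding" | "creative" | "chat" | "analysis"; // only "coding" and "creative" are confirmed by these docs

interface RouteRequest {
  task: TaskType; // what kind of work the model will do
  hints?: {
    quality?: "low" | "medium" | "high"; // prefer smarter models
    speed?: "low" | "medium" | "high";   // prefer faster models
  };
  constraints?: {
    maxCostPerRequest?: number; // cost ceiling the router must respect
    maxLatencyMs?: number;      // latency ceiling
  };
}

// Describe the need; never name a model.
const request: RouteRequest = {
  task: "coding",
  hints: { quality: "high" },
  constraints: { maxCostPerRequest: 0.05 },
};
```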
The router first determines which models are actually available:
User has Anthropic API key configured
User has OpenAI API key configured
Local Llama 70B model downloaded
Google Gemini not configured
Gateway fallback available
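The availability pass above could be sketched as follows; the provider names come from the list, but the detection logic (API keys, downloaded models, gateway flag) is an assumed simplification.

```typescript
// Sketch of the availability pass -- detection logic is assumed, not WebLLM's actual code.
type Availability = { provider: string; available: boolean; reason: string };

function checkAvailability(env: {
  apiKeys: Record<string, string | undefined>;
  localModels: string[];
  gatewayEnabled: boolean;
}): Availability[] {
  return [
    { provider: "anthropic", available: Boolean(env.apiKeys["anthropic"]), reason: "API key configured" },
    { provider: "openai", available: Boolean(env.apiKeys["openai"]), reason: "API key configured" },
    { provider: "local", available: env.localModels.includes("llama-70b"), reason: "model downloaded" },
    { provider: "google", available: Boolean(env.apiKeys["google"]), reason: "API key configured" },
    { provider: "gateway", available: env.gatewayEnabled, reason: "fallback" },
  ];
}

// The scenario above: Anthropic + OpenAI keys, local Llama 70B, no Google, gateway on.
const available = checkAvailability({
  apiKeys: { anthropic: "sk-ant-...", openai: "sk-..." },
  localModels: ["llama-70b"],
  gatewayEnabled: true,
}).filter((a) => a.available);
```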
Each available model is scored across 16 different criteria:
Performance
• Speed / Latency
• Throughput
• Tokens per second
Quality
• Accuracy
• Overall quality
• Instruction following
Capabilities
• Coding proficiency
• Reasoning ability
• Math & logic
• Creative writing
Practical
• Cost efficiency
• Context length
• Output length
• Reliability
• Privacy features
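One way to picture per-criterion scoring is a weighted average, with the criteria the request cares about weighted up. The weighting scheme here is an assumption for illustration; the docs specify the criteria, not the aggregation formula.

```typescript
// Minimal scoring sketch: average a model's per-criterion scores (0..1),
// weighting request-relevant criteria more heavily. Aggregation is assumed.
type Scores = Record<string, number>; // criterion -> 0..1

function baseScore(model: Scores, emphasis: Record<string, number> = {}): number {
  let total = 0;
  let weightSum = 0;
  for (const [criterion, score] of Object.entries(model)) {
    const w = emphasis[criterion] ?? 1; // unlisted criteria get neutral weight 1
    total += score * w;
    weightSum += w;
  }
  return total / weightSum;
}

// Illustrative numbers, not real benchmark data.
const claude: Scores = { coding: 0.95, speed: 0.8, cost: 0.7 };
const codingEmphasis = { coding: 3 }; // a coding task weights that criterion up
```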
User preferences are applied as multipliers to the base scores:
User Priority Settings:
1. Anthropic (Claude) - Priority: 100
2. Local Models - Priority: 80
3. OpenAI - Priority: 60
4. DeepSeek - Priority: 40
Final Scores:
claude-sonnet-4: 0.91 × 1.0 = 0.91 ✅ Winner
llama-70b-local: 0.68 × 0.8 = 0.54
gpt-4o: 0.89 × 0.6 = 0.53
deepseek-coder: 0.87 × 0.4 = 0.35
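The multiplier step above can be reproduced directly; priorities are normalized so the top-priority provider (100) keeps its base score, which matches the numbers shown.

```typescript
// Applying user priority multipliers (priority / 100) to base scores.
const baseScores: Record<string, number> = {
  "claude-sonnet-4": 0.91,
  "llama-70b-local": 0.68,
  "gpt-4o": 0.89,
  "deepseek-coder": 0.87,
};
const priorities: Record<string, number> = {
  "claude-sonnet-4": 100,
  "llama-70b-local": 80,
  "gpt-4o": 60,
  "deepseek-coder": 40,
};

const ranked = Object.keys(baseScores)
  .map((model) => ({ model, score: baseScores[model] * (priorities[model] / 100) }))
  .sort((a, b) => b.score - a.score);
// ranked[0] is the winner: claude-sonnet-4 at 0.91
```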
The router creates a prioritized fallback chain in case the primary model fails:
Primary: Claude Sonnet 4 (0.91)
Fallback: Llama 70B Local (0.54)
Fallback: GPT-4o (0.53)
Fallback: Gateway
If Claude fails (rate limit, API down, etc.), the router automatically tries the next option. Your app never breaks.
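The fallback behavior can be sketched as a simple loop down the chain; `callModel` here is a stand-in for whatever actually invokes a provider, not a real WebLLM function.

```typescript
// Fallback sketch: try each model in score order until one succeeds.
async function completeWithFallback(
  chain: string[],
  callModel: (model: string, prompt: string) => Promise<string>, // provider invocation stand-in
  prompt: string,
): Promise<string> {
  let lastError: unknown;
  for (const model of chain) {
    try {
      return await callModel(model, prompt); // first success wins
    } catch (err) {
      lastError = err; // rate limit, outage, etc. -- move down the chain
    }
  }
  throw lastError ?? new Error("no models in chain");
}
```

For the scenario above, the chain would be `["claude-sonnet-4", "llama-70b-local", "gpt-4o", "gateway"]`.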
Real-World Example
Developer's Request:
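An illustrative reconstruction of the request in this example, inferred from the decision rationale below (field names assumed, as throughout):

```typescript
// Hypothetical request for this walkthrough: a coding task with a quality requirement.
const exampleRequest = {
  task: "coding",
  hints: { quality: "high" },
};
```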
Available Models
✅ Claude Sonnet 4
✅ GPT-4o
✅ DeepSeek Coder
✅ Llama 70B
Base Scores
Claude: 0.91
GPT-4o: 0.89
DeepSeek: 0.87
Llama: 0.68
With Priorities
Claude: 0.91 ✅
Llama: 0.54
GPT-4o: 0.53
DeepSeek: 0.35
✅ Decision: Claude Sonnet 4
• Highest coding capability score (0.95)
• Matches quality requirement
• User's highest priority provider
• User already pays for Claude Pro (maximize value)
Fallback chain: Llama 70B → GPT-4o → DeepSeek → Gateway
Why Intelligent Routing Matters
• GPT-5 launches? Users get it automatically
• Claude 4 comes out? Router adapts immediately
• New providers? Add in 30 lines of code
• Developer code never changes
• Automatically use cheaper models when appropriate
• Respect user budget constraints
• Maximize user's existing subscriptions
• Balance cost vs quality intelligently
• Automatic fallback if primary fails
• No single point of failure
• Handles rate limits gracefully
• Your app never breaks
• Task-optimized model selection
• Speed vs quality trade-offs
• Capability matching
• Intelligent benchmarking
Best Practices
✅ Always specify task type
Use specific task types (coding, creative, etc.) to help the router choose the best model for your use case.
✅ Provide quality/speed hints
Give the router guidance on whether you prioritize speed or quality. This helps balance trade-offs.
✅ Set reasonable constraints
Use cost and latency constraints to ensure the router respects your requirements and user budgets.
❌ Don't hardcode model names
Avoid specifying exact models. Let the router choose. Your code stays future-proof and works with whatever the user has.
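The do/don't above can be condensed into one contrast; the commented-out call is hypothetical, shown only to illustrate the anti-pattern.

```typescript
// Don't: pin your code to a model name that will age out.
// const response = await chat({ model: "gpt-4-turbo-2024-01-15", prompt }); // hypothetical call

// Do: describe the task, give hints, set constraints -- let the router pick.
const goodRequest = {
  task: "coding",
  hints: { quality: "high" },
  constraints: { maxCostPerRequest: 0.05, maxLatencyMs: 5000 },
};
```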
Next Steps
Experiment with different scenarios and see how the router makes decisions
Open Playground →
See all available task types, hints, and constraints
View API Docs →