WebLLM

Model Routing

Intelligent model selection that balances developer requirements with user preferences

The Secret Sauce

WebLLM's router is what makes the protocol feel magical. Instead of hardcoding model names like gpt-4-turbo-2024-01-15, developers describe what they need. The router automatically selects the best available model based on:

Developer Requirements

Task type, quality, speed, capabilities

Model Capabilities

16 scoring criteria, benchmarks

User Preferences

Priorities, budget, privacy settings

How Routing Works

1. Developer Describes Requirements

Instead of specifying a model, developers describe what they need:

Available Task Types:

coding
creative
summarization
qa
educational
general
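The request can be sketched as a small options object. The shape below is a hypothetical illustration — the field and hint names are assumptions, not the official API:

```typescript
// Hypothetical request shape -- names are illustrative, not the official API.
interface RouteRequest {
  task: "coding" | "creative" | "summarization" | "qa" | "educational" | "general";
  quality?: "low" | "medium" | "high"; // hint: how good the output must be
  speed?: "low" | "medium" | "high";   // hint: how latency-sensitive the call is
  capabilities?: string[];             // e.g. ["long-context", "function-calling"]
}

const request: RouteRequest = {
  task: "coding",
  quality: "high",
  capabilities: ["long-context"],
};
```

Note that nothing in the request names a specific model — that choice is left entirely to the router.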
2. Router Checks Availability

The router first determines which models are actually available:

User has Anthropic API key configured

User has OpenAI API key configured

Local Llama 70B model downloaded

Google Gemini not configured

Gateway fallback available
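This availability step amounts to filtering a provider list. A minimal sketch, with provider names and flags mirroring the configuration above (the data shape is an assumption):

```typescript
// Mirrors the availability picture above; names and flags are illustrative.
const providers: { name: string; available: boolean }[] = [
  { name: "anthropic", available: true },       // API key configured
  { name: "openai", available: true },          // API key configured
  { name: "llama-70b-local", available: true }, // model downloaded
  { name: "google-gemini", available: false },  // not configured
  { name: "gateway", available: true },         // fallback service
];

// Only available models go on to the scoring step.
const candidates = providers.filter((p) => p.available).map((p) => p.name);
// candidates: ["anthropic", "openai", "llama-70b-local", "gateway"]
```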

3. Score Against 16 Criteria

Each available model is scored across 16 different criteria:

Performance

  • Speed / Latency
  • Throughput
  • Tokens per second

Quality

  • Accuracy
  • Overall quality
  • Instruction following

Capabilities

  • Coding proficiency
  • Reasoning ability
  • Math & logic
  • Creative writing

Practical

  • Cost efficiency
  • Context length
  • Output length
  • Reliability
  • Privacy features
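Scoring over criteria like these can be sketched as a weighted average, where the weights depend on the task (a coding task weights coding proficiency heavily). The criterion names, weights, and numbers below are all illustrative, not the router's actual values:

```typescript
// All criterion names, weights, and scores below are illustrative.
function baseScore(
  scores: Record<string, number>,  // per-criterion scores in [0, 1]
  weights: Record<string, number>, // task-dependent weights
): number {
  let total = 0;
  let weightSum = 0;
  for (const [criterion, w] of Object.entries(weights)) {
    total += (scores[criterion] ?? 0) * w;
    weightSum += w;
  }
  return weightSum > 0 ? total / weightSum : 0;
}

const claude = { coding: 0.95, accuracy: 0.9, speed: 0.8 };
const codingWeights = { coding: 3, accuracy: 2, speed: 1 };
const score = baseScore(claude, codingWeights); // ~0.91, dominated by coding ability
```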

4. Apply User Priorities

User preferences are applied as multipliers to the base scores:

User Priority Settings:

1. Anthropic (Claude) - Priority: 100

2. Local Models - Priority: 80

3. OpenAI - Priority: 60

4. DeepSeek - Priority: 40

Final Scores:

claude-sonnet-4: 0.91 × 1.0 = 0.91 ✅ Winner

gpt-4o: 0.91 × 0.6 = 0.55

llama-70b-local: 0.68 × 0.8 = 0.54

deepseek-coder: 0.89 × 0.4 = 0.36
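The arithmetic above is a per-model multiplication, with each priority normalized to a 0–1 multiplier by dividing by 100. A sketch that reproduces these numbers:

```typescript
// Base scores and provider priorities taken from the example above.
const baseScores: Record<string, number> = {
  "claude-sonnet-4": 0.91,
  "gpt-4o": 0.91,
  "deepseek-coder": 0.89,
  "llama-70b-local": 0.68,
};
const priorities: Record<string, number> = {
  "claude-sonnet-4": 100, // Anthropic
  "llama-70b-local": 80,  // Local models
  "gpt-4o": 60,           // OpenAI
  "deepseek-coder": 40,   // DeepSeek
};

// Priority acts as a 0-1 multiplier on the base score.
const finalScores: Record<string, number> = Object.fromEntries(
  Object.entries(baseScores).map(([model, s]) => [model, s * (priorities[model] / 100)]),
);

const ranked = Object.entries(finalScores).sort((a, b) => b[1] - a[1]);
const winner = ranked[0][0]; // "claude-sonnet-4" at 0.91
```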

5. Build Fallback Chain

The router creates a prioritized fallback chain in case the primary model fails:

Primary: Claude Sonnet 4 (0.91)

Fallback: GPT-4o (0.55)

Fallback: Llama 70B Local (0.54)

Fallback: Gateway

If Claude fails (rate limit, API down, etc.), the router automatically tries the next option. Your app never breaks.
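The retry behavior can be sketched as a loop over the sorted chain. `callModel` below is a hypothetical stand-in for whatever transport the protocol actually uses:

```typescript
type ModelCall = (model: string) => Promise<string>;

// Walk the chain in score order; if a model throws (rate limit, outage, ...),
// move on to the next one. Only fail if every option is exhausted.
async function routeWithFallback(chain: string[], callModel: ModelCall): Promise<string> {
  let lastError: unknown;
  for (const model of chain) {
    try {
      return await callModel(model);
    } catch (err) {
      lastError = err; // primary failed -- fall through to the next model
    }
  }
  throw new Error(`All models in fallback chain failed: ${String(lastError)}`);
}

// Example: the primary is rate-limited, so the first fallback answers.
const flaky: ModelCall = async (model) => {
  if (model === "claude-sonnet-4") throw new Error("429 rate limit");
  return `answered by ${model}`;
};

routeWithFallback(["claude-sonnet-4", "gpt-4o", "llama-70b-local"], flaky)
  .then((answer) => console.log(answer)); // "answered by gpt-4o"
```

The key property is that the failure handling lives in the router, not in application code.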

Real-World Example

Coding Task: "Write a Python function"

Developer's Request:
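A request consistent with this scenario (field names are hypothetical, not the official API) might look like:

```typescript
// Hypothetical request for this scenario -- field names are illustrative.
const request = {
  task: "coding",
  quality: "high",
  prompt: "Write a Python function",
};
```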

Available Models

✅ Claude Sonnet 4

✅ GPT-4o

✅ DeepSeek Coder

✅ Llama 70B

Base Scores

Claude: 0.91

GPT-4o: 0.89

DeepSeek: 0.87

Llama: 0.68

With Priorities

Claude: 0.91 ✅

Llama: 0.54

GPT-4o: 0.53

DeepSeek: 0.35

✅ Decision: Claude Sonnet 4

  • Highest coding capability score (0.95)
  • Matches quality requirement
  • User's highest priority provider
  • User already pays for Claude Pro (maximize value)

Fallback chain: Llama 70B → GPT-4o → DeepSeek → Gateway

Why Intelligent Routing Matters

Future-Proof Code

• GPT-5 launches? Users get it automatically

• Claude 4 comes out? Router adapts immediately

• New providers? Add in 30 lines of code

• Developer code never changes

Cost Optimization

• Automatically use cheaper models when appropriate

• Respect user budget constraints

• Maximize user's existing subscriptions

• Balance cost vs quality intelligently

Reliability

• Automatic fallback if primary fails

• No single point of failure

• Handles rate limits gracefully

• Your app never breaks

Best Performance

• Task-optimized model selection

• Speed vs quality trade-offs

• Capability matching

• Intelligent benchmarking

Best Practices

✅ Always specify task type

Use specific task types (coding, creative, etc.) to help the router choose the best model for your use case.

✅ Provide quality/speed hints

Give the router guidance on whether you prioritize speed or quality. This helps balance trade-offs.

✅ Set reasonable constraints

Use cost and latency constraints to ensure the router respects your requirements and user budgets.

❌ Don't hardcode model names

Avoid specifying exact models. Let the router choose. Your code stays future-proof and works with whatever the user has.
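As a side-by-side sketch of the two styles (plain objects with hypothetical field names, not the official API):

```typescript
// ❌ Brittle: breaks when this exact model is renamed, deprecated, or unavailable.
const hardcoded = { model: "gpt-4-turbo-2024-01-15", prompt: "Write a Python function" };

// ✅ Future-proof: describe the task and let the router pick whatever is best today.
const declarative = { task: "coding", quality: "high", prompt: "Write a Python function" };
```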

Next Steps

Try the Router Playground

Experiment with different scenarios and see how the router makes decisions

Open Playground →
View API Reference

See all available task types, hints, and constraints

View API Docs →