Gateway Federation
Connect gateways together to create a distributed inference network
Gateway federation lets a WebLLM instance connect to other gateways, forming a "web of computing nodes". Federation supports fallback, load distribution, and geographic routing across multiple gateway instances.
               ┌─────────────────┐
               │   User's App    │
               │    (Browser)    │
               └────────┬────────┘
                        │
         ┌──────────────┴──────────────┐
         │                             │
         ▼                             ▼
┌──────────────────┐          ┌──────────────────┐
│ Chrome Extension │          │  Remote Gateway  │
│  (Local WebLLM)  │          │  (Token-Gated)   │
└────────┬─────────┘          └────────┬─────────┘
         │                             │
         │     Provider Selection      │
         ▼                             ▼
┌────────────────────────────────────────────────┐
│             Provider Priority List             │
│ 1. OpenAI (if API key configured)              │
│ 2. Anthropic (if API key configured)           │
│ 3. WebLLM Gateway @ company-gateway.com ←──    │
│ 4. Resource Pool (community)                   │
│ 5. Ollama (if running locally)                 │
└────────────────────────────────────────────────┘
Use Cases
If your primary providers fail, requests automatically route to a backup gateway. Keep your app running even during outages.
Spread requests across multiple gateways to avoid rate limits and improve response times under heavy load.
Route users to the nearest gateway for lower latency. Deploy gateways in multiple regions for global coverage.
Run your own gateway with custom providers and API keys, while falling back to public infrastructure when needed.
Configuration
Add WebLLM Gateway Provider
In your Chrome extension or Node daemon, add the "WebLLM Gateway" provider.
Extension UI: Providers → Add → WebLLM Gateway
Configure Gateway URL
Enter the URL of the remote gateway you want to connect to.
Gateway URL: https://gateway.your-company.com
Add Authentication (Optional)
For private gateways, provide authentication credentials.
API Key
Full access for server-to-server
X-Gateway-Key: sk-...
Access Token
Limited access with quotas
Bearer: wlm-abc...
Set Priority
Drag the provider in the list to set its priority relative to other providers. Providers nearer the top of the list are tried first.
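The ordering that the provider list expresses can be sketched in code. This is a minimal illustration, assuming that a lower `priority` number means the provider is tried earlier, as the numbered list in the diagram suggests. The `ProviderEntry` type, the `orderProviders` function, and the provider ids other than `webllm-server` are hypothetical names, not part of WebLLM's API.

```typescript
// Hypothetical shape: only `id`, `enabled`, and `priority` mirror fields
// that appear in this guide's configuration example.
interface ProviderEntry {
  id: string;
  enabled: boolean;
  priority: number; // assumption: lower number = tried earlier
}

// Return the enabled providers in the order they would be attempted.
function orderProviders(providers: ProviderEntry[]): ProviderEntry[] {
  return providers
    .filter((p) => p.enabled) // filter() returns a copy, so the input is not mutated
    .sort((a, b) => a.priority - b.priority);
}

const attemptOrder = orderProviders([
  { id: 'ollama', enabled: true, priority: 5 },
  { id: 'webllm-server', enabled: true, priority: 3 },
  { id: 'anthropic', enabled: false, priority: 2 },
  { id: 'openai', enabled: true, priority: 1 },
]).map((p) => p.id);

console.log(attemptOrder);
```

Disabled providers are dropped before sorting, so a gateway that is configured but switched off never takes part in fallback.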
Programmatic Configuration
Configure the WebLLM Gateway provider programmatically:
// Add a WebLLM Gateway provider to your configuration
const providerConfig = {
  id: 'webllm-server',
  name: 'Company Gateway',
  enabled: true,
  priority: 3, // After direct API providers
  config: {
    gatewayUrl: 'https://gateway.your-company.com',
    // Set exactly one credential.
    // For token-gated access:
    accessToken: 'wlm-abc123.eyJ...',
    // OR for full API access (server-side only):
    // apiKey: 'sk-webllm-gateway-...',
    // Request timeout in milliseconds (optional)
    timeout: 30000,
  },
};

// The provider will:
// 1. Check gateway health on /api/v1/health
// 2. Forward requests to /api/v1/inference
// 3. Stream responses via Server-Sent Events
// 4. Handle quota/auth errors gracefully
Request Flow
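The steps the provider follows (health check, forward, stream) can be illustrated with two small helpers. This is a sketch under stated assumptions: the endpoint paths come from the comments above, but the function names `gatewayEndpoints` and `parseSseData` are hypothetical, and the payload carried in each event is treated as an opaque string because the gateway's response schema is not described here.

```typescript
// Build the two endpoint URLs named in the configuration comments.
function gatewayEndpoints(gatewayUrl: string): { health: string; inference: string } {
  const base = gatewayUrl.replace(/\/+$/, ''); // drop trailing slashes
  return {
    health: base + '/api/v1/health',
    inference: base + '/api/v1/inference',
  };
}

// Split a Server-Sent Events body into the `data` payload of each event.
// Events are separated by a blank line; several `data:` lines within one
// event are joined with newlines, as the SSE format specifies.
function parseSseData(body: string): string[] {
  const events: string[] = [];
  for (const block of body.split(/\r?\n\r?\n/)) {
    const dataLines = block
      .split(/\r?\n/)
      .filter((line) => line.indexOf('data:') === 0)
      .map((line) => line.slice(5).replace(/^ /, '')); // strip "data:" and one optional space
    if (dataLines.length > 0) {
      events.push(dataLines.join('\n'));
    }
  }
  return events;
}

const endpoints = gatewayEndpoints('https://gateway.your-company.com/');
const chunks = parseSseData('data: Hel\n\ndata: lo\n\n');
console.log(endpoints.health, endpoints.inference, chunks);
```

A real client reads the stream incrementally, so it would also need to buffer partial events that arrive split across network chunks; this sketch parses a complete body for clarity.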
Error Handling
The WebLLM Gateway provider handles errors gracefully and provides helpful messages:
Authentication Failed
Invalid or expired token/API key
Access Denied
Origin not in allowed domains list
Quota Exceeded
Token usage limit reached for this period
Gateway Unavailable
Gateway is down or no providers available
When a gateway returns an error, WebLLM automatically tries the next provider in the priority list.
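The fallback behaviour described above can be sketched as a loop over providers that stops at the first success. This is an illustration only: the failure labels, the `Provider` interface, and `completeWithFallback` are hypothetical names, and the provider call is synchronous here to keep the control flow visible, whereas a real provider call would be asynchronous.

```typescript
// The four error conditions listed above; the string labels are hypothetical.
type GatewayFailure = 'auth_failed' | 'access_denied' | 'quota_exceeded' | 'unavailable';

type AttemptResult =
  | { status: 'ok'; text: string }
  | { status: 'error'; failure: GatewayFailure };

interface Provider {
  id: string;
  // Synchronous for brevity; a real provider would return a Promise.
  complete: (prompt: string) => AttemptResult;
}

interface FallbackOutcome {
  text: string | null; // null when every provider failed
  servedBy: string | null;
  failures: string[]; // one entry per provider that was tried and failed
}

// Try providers in the given (already priority-sorted) order.
function completeWithFallback(providers: Provider[], prompt: string): FallbackOutcome {
  const failures: string[] = [];
  for (const provider of providers) {
    const result = provider.complete(prompt);
    if (result.status === 'ok') {
      return { text: result.text, servedBy: provider.id, failures };
    }
    failures.push(provider.id + ': ' + result.failure);
  }
  return { text: null, servedBy: null, failures };
}

const outcome = completeWithFallback(
  [
    { id: 'webllm-server', complete: () => ({ status: 'error', failure: 'quota_exceeded' }) },
    { id: 'ollama', complete: (prompt) => ({ status: 'ok', text: 'echo: ' + prompt }) },
  ],
  'hello',
);
console.log(outcome.servedBy, outcome.failures);
```

Collecting the failures rather than discarding them makes it possible to show the user why earlier providers were skipped.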
Security Considerations
Circular Reference Prevention
Avoid configuring loops such as Gateway A → Gateway B → Gateway A. Each gateway should forward only to lower-priority backends, never back to a gateway that routes to it.
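One way to catch such loops before deployment is to describe which gateway forwards to which and run a cycle check over that description. The sketch below uses a depth-first search. The `FederationGraph` structure and the `findFederationCycle` function are hypothetical; nothing in this guide indicates that WebLLM exposes such a structure or performs this check itself.

```typescript
// Hypothetical description of a deployment: gateway name -> gateways it forwards to.
type FederationGraph = { [gateway: string]: string[] };

// Depth-first search that returns the first forwarding loop found, or null.
function findFederationCycle(graph: FederationGraph): string[] | null {
  const state: { [gateway: string]: 'visiting' | 'done' } = {};
  const path: string[] = [];

  function visit(node: string): string[] | null {
    if (state[node] === 'done') return null;
    if (state[node] === 'visiting') {
      // Back edge: report the loop starting from the node's first occurrence.
      return path.slice(path.indexOf(node)).concat(node);
    }
    state[node] = 'visiting';
    path.push(node);
    for (const next of graph[node] || []) {
      const cycle = visit(next);
      if (cycle) return cycle;
    }
    path.pop();
    state[node] = 'done';
    return null;
  }

  for (const start of Object.keys(graph)) {
    const cycle = visit(start);
    if (cycle) return cycle;
  }
  return null;
}

const looped = findFederationCycle({ A: ['B'], B: ['A'] });
const safe = findFederationCycle({ A: ['B'], B: [] });
console.log(looped, safe);
```

The returned array lists the gateways in forwarding order with the repeated gateway at both ends, which is convenient for an error message.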
Credential Security
Access tokens are safe to use client-side (domain-locked). API keys should only be used in server-side configurations.
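A small guard can enforce this split when building request headers. The sketch below is illustrative: `X-Gateway-Key` is the header shown in the configuration steps, while sending the access token as a standard `Authorization: Bearer` header is an assumption, and the `buildAuthHeaders` function and the placeholder credential values are hypothetical.

```typescript
interface GatewayCredentials {
  accessToken?: string; // token-gated, domain-locked access
  apiKey?: string; // full access, server-to-server only
}

// Build authentication headers, refusing to expose an API key in a browser.
function buildAuthHeaders(
  creds: GatewayCredentials,
  runningInBrowser: boolean,
): { [name: string]: string } {
  if (creds.apiKey && runningInBrowser) {
    throw new Error('Gateway API keys must not be used client-side; use an access token.');
  }
  if (creds.apiKey) {
    return { 'X-Gateway-Key': creds.apiKey };
  }
  if (creds.accessToken) {
    // Assumption: the token travels as a standard bearer Authorization header.
    return { Authorization: 'Bearer ' + creds.accessToken };
  }
  return {}; // authentication is optional for public gateways
}

// Placeholder credentials for illustration only.
const browserHeaders = buildAuthHeaders({ accessToken: 'wlm-example' }, true);
const serverHeaders = buildAuthHeaders({ apiKey: 'sk-example' }, false);
console.log(browserHeaders, serverHeaders);
```

Passing the environment in as a parameter keeps the function easy to test; an application could derive it from its build target or runtime checks.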
Latency Overhead
Each gateway hop adds roughly 50-100 ms of latency. Use federation for redundancy rather than as the primary request path.