WebLLM

Gateway Federation

Connect gateways together to create a distributed inference network

Gateway federation lets WebLLM instances connect to other gateways, creating a true "web of computing nodes" with fallback, load distribution, and geographic routing across multiple gateway instances.

Federation Architecture
                    ┌─────────────────┐
                    │   User's App    │
                    │   (Browser)     │
                    └────────┬────────┘
                             │
              ┌──────────────┴──────────────┐
              │                              │
              ▼                              ▼
   ┌──────────────────┐          ┌──────────────────┐
   │ Chrome Extension │          │  Remote Gateway  │
   │   (Local WebLLM) │          │  (Token-Gated)   │
   └────────┬─────────┘          └────────┬─────────┘
            │                              │
            │ Provider Selection           │
            ▼                              ▼
   ┌────────────────────────────────────────────────┐
   │              Provider Priority List            │
   │  1. OpenAI (if API key configured)             │
   │  2. Anthropic (if API key configured)          │
   │  3. WebLLM Gateway @ company-gateway.com  ←──  │
   │  4. Resource Pool (community)                  │
   │  5. Ollama (if running locally)                │
   └────────────────────────────────────────────────┘

Use Cases

Fallback & Redundancy

If your primary providers fail, requests automatically route to a backup gateway. Keep your app running even during outages.

Load Distribution

Spread requests across multiple gateways to avoid rate limits and improve response times under heavy load.

Geographic Routing

Route users to the nearest gateway for lower latency. Deploy gateways in multiple regions for global coverage.
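As a rough sketch of latency-based selection (the probing strategy and object shapes here are assumptions, not part of the gateway API; only the `/api/v1/health` endpoint comes from this page):

```javascript
// Hypothetical sketch: time a health check against each regional gateway
// and route to whichever responds fastest.
async function probeLatency(url) {
  const start = Date.now();
  const res = await fetch(`${url}/api/v1/health`);
  if (!res.ok) return Infinity; // unhealthy gateways sort last
  return Date.now() - start;
}

// Pure selection step: pick the URL with the lowest measured latency.
function pickNearest(probes) {
  return probes.reduce((best, p) => (p.latency < best.latency ? p : best)).url;
}

async function nearestGateway(urls) {
  const probes = await Promise.all(
    urls.map(async (url) => ({ url, latency: await probeLatency(url) }))
  );
  return pickNearest(probes);
}
```

In practice you would cache the probe results rather than measuring on every request.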

Organization Gateways

Run your own gateway with custom providers and API keys, while falling back to public infrastructure when needed.

Configuration

1. Add WebLLM Gateway Provider

In your Chrome extension or Node daemon, add the "WebLLM Gateway" provider.

Extension UI: Providers → Add → WebLLM Gateway

2. Configure Gateway URL

Enter the URL of the remote gateway you want to connect to.

Gateway URL: https://gateway.your-company.com

3. Add Authentication (Optional)

For private gateways, provide authentication credentials:

API Key (full access, for server-to-server): X-Gateway-Key: sk-...

Access Token (limited access with quotas): Authorization: Bearer wlm-abc...

4. Set Priority

Drag the provider in the list to set its priority relative to other providers. Higher-priority providers are tried first.

Programmatic Configuration

Configure the WebLLM Gateway provider programmatically:

// Add a WebLLM Gateway provider to your configuration
const providerConfig = {
  id: 'webllm-server',
  name: 'Company Gateway',
  enabled: true,
  priority: 3, // After direct API providers
  config: {
    gatewayUrl: 'https://gateway.your-company.com',
    // For token-gated access:
    accessToken: 'wlm-abc123.eyJ...',
    // OR for full API access:
    apiKey: 'sk-webllm-gateway-...',
    // Request timeout (optional)
    timeout: 30000,
  }
};

// The provider will:
// 1. Check gateway health on /api/v1/health
// 2. Forward requests to /api/v1/inference
// 3. Stream responses via Server-Sent Events
// 4. Handle quota/auth errors gracefully

Request Flow

1. Health Check: GET /api/v1/health - Verify the gateway is available
2. Send Request: POST /api/v1/inference - Submit the chat/completion request
3. Token Validation: The gateway validates auth and checks quota
4. Provider Selection: The gateway routes to the best available provider
5. Stream Response: Results are streamed back via SSE (Server-Sent Events)
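The flow above can be sketched in JavaScript. The endpoint paths and SSE transport come from this page; the request body shape and the `Authorization` header usage are assumptions:

```javascript
// Sketch of one request against a federated gateway.
async function callGateway(gatewayUrl, token, messages) {
  // 1. Health check: verify the gateway is available.
  const health = await fetch(`${gatewayUrl}/api/v1/health`);
  if (!health.ok) throw new Error('gateway unavailable');

  // 2. Submit the inference request (payload shape is hypothetical).
  const res = await fetch(`${gatewayUrl}/api/v1/inference`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${token}`,
    },
    body: JSON.stringify({ messages, stream: true }),
  });
  if (!res.ok) throw new Error(`gateway error ${res.status}`);

  // 3. Read the SSE stream chunk by chunk.
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let text = '';
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    for (const payload of extractSSEData(decoder.decode(value, { stream: true }))) {
      text += payload;
    }
  }
  return text;
}

// Pull `data:` payloads out of a raw SSE chunk.
function extractSSEData(chunk) {
  return chunk
    .split('\n')
    .filter((line) => line.startsWith('data:'))
    .map((line) => line.slice(5).trim())
    .filter((p) => p && p !== '[DONE]');
}
```

Steps 3 and 4 of the flow (token validation and provider selection) happen server-side inside the gateway, so the client only sees their results.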

Error Handling

The WebLLM Gateway provider handles errors gracefully and provides helpful messages:

401 Authentication Failed - Invalid or expired token/API key

403 Access Denied - Origin not in allowed domains list

429 Quota Exceeded - Token usage limit reached for this period

503 Gateway Unavailable - Gateway is down or no providers available

When a gateway returns an error, WebLLM automatically tries the next provider in the priority list.
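A minimal sketch of that fallback behavior, assuming hypothetical provider objects with a `send()` method; the status codes and messages mirror the table above, but the internal provider interface is an assumption:

```javascript
// Human-readable explanations for the gateway error codes above.
const GATEWAY_ERRORS = {
  401: 'Authentication failed: invalid or expired token/API key',
  403: 'Access denied: origin not in allowed domains list',
  429: 'Quota exceeded: token usage limit reached for this period',
  503: 'Gateway unavailable: gateway is down or no providers available',
};

// Try providers in priority order (lower number = tried first),
// falling through to the next one on any error.
async function tryProviders(providers, request) {
  const failures = [];
  for (const provider of [...providers].sort((a, b) => a.priority - b.priority)) {
    try {
      return await provider.send(request);
    } catch (err) {
      // Record why this provider failed, then move on to the next.
      failures.push(`${provider.name}: ${GATEWAY_ERRORS[err.status] ?? err.message}`);
    }
  }
  throw new Error(`all providers failed:\n${failures.join('\n')}`);
}
```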

Security Considerations

Circular Reference Prevention

Avoid configuring Gateway A → Gateway B → Gateway A loops. Each gateway should only route to lower-priority backends.
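One common guard is a hop counter on forwarded requests, so a looping request dies after a bounded number of gateway-to-gateway forwards. The `X-Gateway-Hops` header below is hypothetical, not part of the gateway protocol:

```javascript
const MAX_HOPS = 2; // tolerate at most two gateway-to-gateway forwards

// Decide whether a gateway should forward an incoming request onward.
function checkHops(headers) {
  const hops = parseInt(headers['x-gateway-hops'] ?? '0', 10);
  if (hops >= MAX_HOPS) {
    // Refuse to forward: this request has already crossed too many gateways.
    return { forward: false, status: 508 };
  }
  // Forward with the counter incremented so the next hop can check it too.
  return {
    forward: true,
    headers: { ...headers, 'x-gateway-hops': String(hops + 1) },
  };
}
```

With this in place, even a misconfigured A → B → A loop terminates with an error instead of recursing indefinitely.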

Credential Security

Access tokens are safe to use client-side (domain-locked). API keys should only be used in server-side configurations.

Latency Overhead

Each gateway hop adds latency (~50-100ms). Use federation strategically for redundancy, not as the primary path.