WebLLM

Architecture

Browser-native LLM integration designed in the style of standard web platform APIs

Think of it as: "Geolocation API for AI" - Install once, use your AI everywhere on the web

Vision: AI to the People

For Users

• Your AI, everywhere: Install once, use on any website

• Stop paying twice: Use ChatGPT Plus/Claude Pro across all sites

• Data control: Local models, per-site permissions, full transparency

• Choice: Switch between local, cloud, or company-provided models

For Developers

• Free infrastructure: Zero API costs, no key management

• 3 lines of code: navigator.llm.generate() (see the sketch below)

• Universal compatibility: Works with any model, future-proof

• Privacy built-in: GDPR/HIPAA ready, user-controlled
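
As a rough illustration of the "3 lines of code" claim, assuming the @webllm/client polyfill is loaded and exposes navigator.llm as described below; the option and field names here are assumptions, not a documented signature:

// Assumes the extension or polyfill has installed navigator.llm.
// Option and field names are illustrative, not a final API.
const result = await navigator.llm.generate({ prompt: 'Summarize this page.' });
console.log(result.text);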

Three-Layer Architecture

Layer 1
Client SDK (@webllm/client)

Browser polyfill that adds the navigator.llm API

• Auto-detects transport (Extension → Daemon → Gateway)

• Provides a simple API: generateText(), streamText(), etc. (sketched below)

• Works in any browser once extension is installed
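
A hedged sketch of calling the SDK helpers named above; the package name @webllm/client and the function names generateText()/streamText() come from this section, but the import form, option names, and return shapes are assumptions:

// Illustrative only: option names and return shapes are assumptions.
import { generateText, streamText } from '@webllm/client';

// One-shot generation.
const { text } = await generateText({ prompt: 'Write a haiku about browsers.' });
console.log(text);

// Streaming variant: handle chunks as they arrive (stream shape assumed).
for await (const chunk of streamText({ prompt: 'Explain WebGPU briefly.' })) {
  console.log(chunk);
}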

Layer 2
Server (@webllm/server)

Core orchestration running in the extension service worker or a Node.js daemon

• Request coordination and progress tracking

• Provider selection and routing (@webllm/router)

• Permission management (per-origin)

• Storage (IndexedDB) and usage tracking

Layer 3
Providers

16+ AI providers execute requests

• Cloud APIs: OpenAI, Anthropic, Google, Azure, DeepSeek, etc.

• Gateways: OpenRouter, Portkey, Developer Gateways

• Local: Ollama, LM Studio, browser-based models

Transport Modes

The client SDK automatically detects the best transport in priority order:

Extension

Priority: 1 (Highest)

When: Chrome extension installed

Best for: Desktop users

User brings their own AI, zero cost to developer

Daemon

Priority: 2

When: localhost:54321 responds

Best for: Development

Local testing with hot reload

Gateway

Priority: 3 (Fallback)

When: Developer provides token

Best for: Mobile, 100% coverage

Works on iOS/Android without installation
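
A minimal sketch of how that priority order could be probed from the client, assuming the extension exposes navigator.llm and the daemon answers a health check on localhost:54321; the endpoint path and function name are illustrative:

// Illustrative transport detection: Extension -> Daemon -> Gateway.
async function detectTransport(gatewayToken?: string): Promise<'extension' | 'daemon' | 'gateway' | 'none'> {
  // 1. Extension: the content script has injected navigator.llm.
  if ('llm' in navigator) return 'extension';

  // 2. Daemon: a local process answers on port 54321 (health path is assumed).
  try {
    const res = await fetch('http://localhost:54321/health');
    if (res.ok) return 'daemon';
  } catch {
    // Daemon not running; fall through.
  }

  // 3. Gateway: only available if the developer supplied a token.
  return gatewayToken ? 'gateway' : 'none';
}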

Developer Gateway Platform

Critical for mobile support during the transition period. Developers create hosted gateways backed by their own API keys and distribute rate-limited tokens to users.

How It Works

1. Developer creates gateway

At gateway.webllm.org, the developer inputs their API keys (OpenAI, Anthropic, etc.)

2. Get an encoded token

Set limits: requests/day, tokens/month, expiration. Example token: webllm-gateway-abc123-5k-limit

3. Users make requests

Works on mobile! No extension or daemon needed

4. Gateway proxies to provider

Enforces limits, tracks usage, returns the response
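
A rough sketch of the user-facing side, assuming the client SDK accepts the developer's gateway token as a configuration option; the option name and call shape are assumptions:

// Illustrative: pass the developer's gateway token so the Gateway transport
// works on mobile with no extension or daemon installed.
import { generateText } from '@webllm/client';

const { text } = await generateText({
  prompt: 'Suggest three trail names for a hiking app.',
  // Token issued at gateway.webllm.org; format shown in step 2 above.
  gatewayToken: 'webllm-gateway-abc123-5k-limit',
});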

Mobile Users

✓ Works on iOS/Android

✓ Zero installation

✓ Same API everywhere

Developers

✓ No server needed

✓ Usage control per token

✓ Monitor and revoke

Architecture

✓ No backend secrets

✓ Client-side tools

✓ Direct execution

Client-Side Tool Execution

One of the most powerful architectural benefits: AI tools execute directly in the browser with zero server round-trips.

Why This Matters

No Server Round-Trips: Tools execute immediately in the browser

No Secret Handling: Developer's API keys stay in gateway, not in backend code

Simpler Architecture: Modern apps just need client-side code

Better UX: Instant UI updates, no latency from server relay

Rich Interactions: AI can directly manipulate DOM, play sounds, trigger animations

Only Fetch What's Needed: Get AI response from gateway, execute tools locally

Example Use Cases

• Interactive UIs: Theme changes, layout adjustments

• Media Control: Play sounds, show images, control video

• Form Interactions: Auto-fill, validate, show/hide fields

• Data Visualization: Update charts, graphs, tables

• Game Logic: AI-driven state changes

• Accessibility: Dynamic ARIA updates

Code Example
tools: [
  {
    // Tool the model can call to switch the page theme.
    name: 'change_theme',
    execute: (params) => {
      changeTheme(params.theme); // the page's own helper
      return { success: true };
    }
  },
  {
    // Tool that plays a notification sound directly in the browser.
    name: 'play_sound',
    execute: () => {
      new Audio('/notify.mp3').play();
      return { success: true };
    }
  }
]
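
For context, a hedged sketch of how a tools array like this might be passed with a request; navigator.llm.generate() is named earlier in this document, but whether it accepts tools this way, and the exact option names, are assumptions:

// Illustrative: the tool definitions travel with the request, but each
// execute() callback runs locally in the page, so no server round-trip.
const result = await navigator.llm.generate({
  prompt: 'Switch to dark mode and ping me when it is done.',
  tools: [ /* tool definitions as in the example above */ ],
});
console.log(result.text);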

Request Flow

1. Permission Check (0-20%)

Verify the site has user permission (like the Geolocation API)

2. Router Selection (20-40%)

@webllm/router scores models by 16 criteria and builds a fallback chain

3. Provider Execution (40-80%)

Try the primary provider, auto-fallback on failure

4. Storage (80-90%)

Save the conversation to IndexedDB

5. Usage Tracking (90-100%)

Record tokens, cost, and provider used
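
The percentage ranges above suggest per-stage progress reporting; a minimal sketch of consuming it from the client, assuming an onProgress option exists (the option name, stage labels, and payload shape are assumptions):

// Illustrative: mapping the pipeline stages above onto a progress callback.
await navigator.llm.generate({
  prompt: 'Draft a release note for v1.2.',
  onProgress: ({ stage, percent }) => {
    // Stage labels such as 'permission' | 'routing' | 'provider' are assumed.
    console.log(`${stage}: ${percent}%`);
  },
});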

Key Components

RequestCoordinator: Orchestrates full pipeline with progress tracking

ProviderManager: Provider selection and routing using factory pattern

RouterManager: Integrates @webllm/router for intelligent model selection

PermissionManager: Per-origin permission system (like geolocation API)

UsageTracker: Tracks token consumption and cost

Provider Architecture

16+ providers supported via factory pattern. Add new providers in ~30 lines of code.

Cloud APIs

• Anthropic (Claude), OpenAI (GPT)

• Google Generative AI, Vertex AI

• Azure OpenAI, Mistral AI

• DeepSeek, Groq, Fireworks AI

• Together.ai, Cohere

Gateways & Local

• OpenRouter (100+ models)

• Portkey (unified API)

• Cloudflare Workers AI

• Ollama, LM Studio

• Local Browser (WebGPU/WASM)

Adding a New Provider

1. Define Provider Metadata

In @webllm/data registry: ID, name, type, category, tier, config fields

2. Create Provider Class (~30 lines)

Extend APIProvider for API providers, or BaseProvider for custom

3. Register Factory

One line in register-providers.ts

4. Done!

UI automatically shows correct category, tier, config form, connection testing
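
A hedged sketch of what step 2 might look like; APIProvider and register-providers.ts are named above, but the import path, method names, config shape, and registration call here are assumptions about internal interfaces, and "Example AI" is a hypothetical provider:

// Illustrative provider skeleton; base-class method names are assumed.
import { APIProvider } from '@webllm/server'; // assumed import path

export class ExampleAIProvider extends APIProvider {
  readonly id = 'example-ai';
  readonly name = 'Example AI'; // hypothetical provider

  // Translate a WebLLM request into the provider's HTTP API.
  protected buildRequest(req: { model: string; messages: unknown[] }) {
    return {
      url: 'https://api.example-ai.com/v1/chat',
      headers: { Authorization: `Bearer ${this.config.apiKey}` },
      body: { model: req.model, messages: req.messages },
    };
  }

  // Map the provider's response back to WebLLM's common result shape.
  protected parseResponse(json: { output: string; usage?: unknown }) {
    return { text: json.output, usage: json.usage };
  }
}

// Step 3: one line in register-providers.ts (factory signature assumed).
// registerProvider('example-ai', (config) => new ExampleAIProvider(config));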

Data Storage

All data stored locally in IndexedDB (no external telemetry):

• conversations: Request history with configurable retention

• providers: API keys, configs, priorities

• permissions: Per-origin access control

• settings: User preferences, retention policy
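
To make the stores above concrete, a rough sketch of what individual records might contain; all field names are assumptions rather than the actual schema:

// Illustrative record shapes for two of the IndexedDB stores above.
interface ConversationRecord {
  id: string;
  origin: string;       // site that made the request
  messages: { role: 'user' | 'assistant'; content: string }[];
  createdAt: number;    // used by the retention policy
}

interface PermissionRecord {
  origin: string;       // per-origin access control
  granted: boolean;
  grantedAt: number;
}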

Philosophy & Principles

• User Sovereignty: You choose your AI provider, not each website

• Developer Freedom: Build AI features without infrastructure costs

• Privacy by Design: Local processing is always an option

• Open Standards: Moving toward W3C browser API standardization

• Universal Compatibility: Works with any model, new models added automatically

• Zero Vendor Lock-in: Switch providers without changing code

Key Takeaways

  • ✓ WebLLM is like the Geolocation API, but for AI - Standardized browser API for LLM access
  • ✓ Three-layer architecture: Client SDK → Server (orchestration) → Providers
  • ✓ Three transport modes: Extension (desktop) → Daemon (dev) → Gateway (mobile/fallback)
  • ✓ Developer Gateway Platform: Solves mobile problem with hosted gateways (critical for transition!)
  • ✓ Client-side tool execution: AI tools run in browser - instant UI updates, no server round-trips
  • ✓ Intelligent routing: @webllm/router selects best model based on 16 criteria
  • ✓ Privacy-first: Local models, per-site permissions, transparent logging
  • ✓ Developer-friendly: 3 lines of code, zero infrastructure, no backend secrets needed
  • ✓ User control: Choose AI, approve sites, switch models, track usage
  • ✓ 16+ providers supported via factory pattern (30 lines to add new)
  • ✓ Automatic fallback: Graceful provider switching on failure
  • ✓ Heading to W3C: Extension is phase 1, native browser API is the goal

Learn More

• Browser Usage: Get started using WebLLM in the browser

• Providers: Learn about supported AI providers

• Gateway System: Set up developer gateways for mobile support

• Daemon: Run the Node.js daemon for development

• Security: Understand WebLLM's security model

• Playground: Test WebLLM features in the playground