Adding AI to your web app currently means one of two things: set up server infrastructure to call AI APIs, or figure out complex browser-based ML frameworks. Neither is great.
What if adding AI was as simple as adding geolocation?
// This is what WebLLM enables
// (getCurrentPosition is callback-based, so it's promisified here for comparison)
const position = await new Promise((resolve, reject) =>
  navigator.geolocation.getCurrentPosition(resolve, reject)
); // Location
const result = await navigator.llm.prompt('Summarize this page'); // AI
WebLLM is building toward a future where AI is a native browser capability—available through a simple API, controlled by users, and working with any AI provider the user chooses.
This guide explains what WebLLM is, how it works, and why it matters for the future of web development.
The Problem WebLLM Solves
The Current State of Web AI
Option 1: Server-side AI
// Your server calls OpenAI, Anthropic, etc.
const response = await fetch('/api/chat', {
  method: 'POST',
  body: JSON.stringify({ message: userInput }),
});
Problems:
- You need server infrastructure
- You pay for API costs
- You manage API keys and rate limits
- User data leaves their device
- Latency for every request
- You're locked into whichever providers you integrate
Option 2: Client-side ML libraries
// Complex setup with transformers.js or similar
import { pipeline } from '@xenova/transformers';
// Downloads 100MB+ models
const classifier = await pipeline('sentiment-analysis');
const result = await classifier(text);
Problems:
- Large downloads for each model
- Limited to specific models
- Complex implementation
- No user choice in providers
- Different APIs for different tasks
Option 3: Native apps
Problems:
- Not the web
- Platform fragmentation
- Installation friction
- Distribution challenges
What WebLLM Provides
WebLLM introduces navigator.llm—a browser API for AI that follows established web platform patterns:
// Simple, permission-gated AI access
if ('llm' in navigator) {
  const result = await navigator.llm.prompt('Explain this concept');
  console.log(result);
}
This is what WebLLM enables:
- One API for all AI providers
- User controls which AI processes their data
- Privacy options including fully local AI
- No server needed for many use cases
- Permission-based like camera/location
How WebLLM Works
Architecture Overview
┌───────────────────────────────────────────────────────────────┐
│                         Your Web App                          │
├───────────────────────────────────────────────────────────────┤
│                       navigator.llm API                       │
├───────────────────────────────────────────────────────────────┤
│                       WebLLM Extension                        │
├───────────────┬───────────────┬───────────────┬───────────────┤
│    OpenAI     │   Anthropic   │    Ollama     │   On-Device   │
│    (Cloud)    │    (Cloud)    │    (Local)    │    (WebGPU)   │
└───────────────┴───────────────┴───────────────┴───────────────┘
Layer 1: Your Web App
- Calls navigator.llm.prompt() or similar methods
- Doesn't know or care which provider fulfills the request
- Works the same regardless of the user's AI setup
Layer 2: WebLLM API
- Provides the navigator.llm interface
- Handles permissions (like geolocation prompts)
- Routes requests to configured providers
- Currently delivered via browser extension
Layer 3: Providers
- Cloud APIs: OpenAI, Anthropic, Google, etc.
- Local servers: Ollama, LM Studio
- On-device: WebGPU-based inference
- User configures which providers to use (a sketch of this abstraction follows below)
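One way to picture Layer 3: every backend, cloud or local, boils down to the same minimal shape, which is what lets the layers above stay provider-agnostic. The sketch below is purely illustrative; the isAvailable()/complete() names, the llama3 model, and the default Ollama endpoints are assumptions, not the extension's actual internals.
// Hypothetical provider shape (illustrative only, not the extension's real code)
const ollamaProvider = {
  name: 'Ollama (Local)',
  // Reachability check: Ollama's /api/tags endpoint lists installed models
  async isAvailable() {
    try {
      const res = await fetch('http://localhost:11434/api/tags');
      return res.ok;
    } catch {
      return false;
    }
  },
  // Send a prompt to the local server and return plain text
  // (assumes a default Ollama install with a llama3 model pulled)
  async complete(promptText) {
    const res = await fetch('http://localhost:11434/api/generate', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model: 'llama3', prompt: promptText, stream: false }),
    });
    const data = await res.json();
    return data.response;
  },
};
A cloud provider such as OpenAI would expose the same two methods, just backed by an API key and an HTTPS call instead of localhost.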
The Extension: A Polyfill for the Future
Today, navigator.llm doesn't exist natively in browsers. WebLLM provides it through a Chrome extension.
// The extension injects the API
window.navigator.llm = {
  prompt: async (input) => {
    /* ... */
  },
  streamPrompt: async function* (input) {
    /* ... */
  },
  requestPermission: async () => {
    /* ... */
  },
  // ...
};
This is the same pattern used to prototype other browser APIs:
- Service Workers were tested via polyfills before shipping
- Push notifications had experimental implementations first
- WebGPU had origin trials before stable release
The extension proves the API design. When browsers eventually add native AI support, apps built on WebLLM patterns will work without changes.
The Permission Model
WebLLM follows the browser permission pattern:
// Explicit permission request (optional - also happens on first use)
const permission = await navigator.llm.requestPermission();
if (permission === 'granted') {
  // AI available
} else if (permission === 'denied') {
  // User declined
} else {
  // Permission not yet determined
}
The permission flow:
- The prompt shows which site is requesting AI access
- Clear Allow/Deny options
- The decision is remembered per site
- Access is revocable via the extension settings
This is familiar to users from camera, microphone, and location permissions.
The API
Basic Usage
// Simple prompt (non-streaming)
const response = await navigator.llm.prompt('What is the capital of France?');
console.log(response); // "The capital of France is Paris."
Streaming Responses
// Streaming for real-time display
const output = document.getElementById('output');
output.textContent = '';
const stream = navigator.llm.streamPrompt('Write a short story about a robot');
for await (const chunk of stream) {
  output.textContent += chunk;
}
System Messages and Context
// Create a session with system context
const session = await navigator.llm.createSession({
  system: 'You are a helpful coding assistant. Be concise.',
});
// Conversation maintains context
const response1 = await session.prompt('How do I center a div?');
const response2 = await session.prompt('What about vertical centering?');
// Second response understands we're still discussing CSS
Checking Capabilities
// Check if WebLLM is available
if ('llm' in navigator) {
  // Check permission status
  const status = await navigator.permissions.query({ name: 'llm' });

  if (status.state === 'granted') {
    // Ready to use
  } else if (status.state === 'prompt') {
    // Will prompt on first use
  } else {
    // Denied
  }
} else {
  // WebLLM not available (extension not installed)
}
Error Handling
try {
  const result = await navigator.llm.prompt('Hello');
} catch (error) {
  if (error.name === 'PermissionDenied') {
    // User denied AI access
  } else if (error.name === 'NoProviderAvailable') {
    // No AI providers configured
  } else if (error.name === 'ProviderError') {
    // Provider returned an error (rate limit, invalid key, etc.)
  }
}
Provider System
How Providers Work
Users configure providers in the WebLLM extension:
Extension Settings:
┌─────────────────────────────────────────┐
│  AI Providers                           │
├─────────────────────────────────────────┤
│  ✓ Ollama (Local)        [Priority: 1]  │
│      Running on localhost:11434         │
│                                         │
│  ✓ OpenAI                [Priority: 2]  │
│      API Key: sk-...                    │
│                                         │
│  ○ Anthropic             [Priority: 3]  │
│      Not configured                     │
└─────────────────────────────────────────┘
When your app calls navigator.llm.prompt():
- WebLLM checks configured providers
- Uses highest-priority available provider
- Falls back to next provider on failure
- Returns the result (your app never knows which provider answered; see the sketch below)
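A rough sketch of that routing loop, assuming each configured provider exposes the hypothetical isAvailable()/complete() shape sketched earlier (again illustrative, not the extension's actual implementation):
// Illustrative priority-based fallback routing (not the extension's real code)
async function routePrompt(providers, promptText) {
  // providers is assumed to be pre-sorted by the user's priority setting
  for (const provider of providers) {
    if (!(await provider.isAvailable())) continue;
    try {
      return await provider.complete(promptText);
    } catch (error) {
      // Rate limit, invalid key, network failure: try the next provider
      console.warn(`${provider.name} failed, falling back`, error);
    }
  }
  throw new DOMException('No AI providers are configured or reachable', 'NoProviderAvailable');
}
The calling page only ever sees the final result or a NoProviderAvailable error, matching the error names in the error-handling section above.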
Supported Providers
Cloud APIs:
- OpenAI (GPT-4, GPT-3.5)
- Anthropic (Claude)
- Google (Gemini)
- And many more...
Local/Self-hosted:
- Ollama
- LM Studio
- llama.cpp
- Any OpenAI-compatible endpoint (see the example after this list)
On-device:
- WebGPU-based inference (experimental)
- Smaller models that run entirely in browser
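"Any OpenAI-compatible endpoint" is what makes the local options cheap to support: Ollama and LM Studio both speak the same chat-completions dialect as the cloud APIs, so one adapter covers them all. Your app never makes this call itself, but here is roughly what the extension would send to a default local Ollama install (the port and model name are assumptions about the user's setup):
// Talking to a local Ollama server through its OpenAI-compatible endpoint
// (assumes Ollama on the default port with a llama3 model pulled)
const res = await fetch('http://localhost:11434/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'llama3',
    messages: [{ role: 'user', content: 'Say hello' }],
  }),
});
const data = await res.json();
console.log(data.choices[0].message.content);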
Why Provider Agnosticism Matters
For developers:
- Write once, works with any AI
- No need to support multiple APIs
- No API key management in your app
- Users bring their own AI
For users:
- Choose their preferred AI
- Use local AI for privacy
- Use different AI for different sites
- No per-app subscriptions
Privacy Model
User Control Spectrum
Full Cloud ◄──────────────────────────────► Full Local
OpenAI/Anthropic │ Hybrid │ Ollama/LM Studio │ On-device
                     │             │               │
                     │             │               └─ No network
                     │             └─ Local server, no cloud
                     └─ Local first, cloud fallback
Users choose where they are on this spectrum. Your app works regardless.
Privacy-First Use Cases
Sensitive applications:
- Health journaling
- Financial planning
- Personal notes
- Therapy aids
For these, users can configure local-only providers:
- Data never leaves their device
- No cloud API calls
- Full privacy
Your app doesn't change:
// Same code for cloud or local
const result = await navigator.llm.prompt(userInput);
Permission Visibility
Like camera permissions, users can:
- See which sites have AI access
- Revoke access per-site
- See when AI is being used (optional indicator)
- Control provider selection per-site
Current Status
What's Working Now
The Extension:
- Chrome extension available
- navigator.llm API functional
- Multiple providers supported
- Permission system working
Provider Support:
- OpenAI (GPT-4, GPT-3.5-turbo)
- Anthropic (Claude)
- Ollama (local models)
- LM Studio
- Many OpenAI-compatible endpoints
API Features:
- prompt() - Simple completion
- streamPrompt() - Streaming responses
- Session management
- Error handling
What's Experimental
- On-device inference via WebGPU
- Model download/caching
- Multi-modal support (images)
- Tool use / function calling
What's Planned
- Firefox/Safari extensions
- Native browser integration proposals
- More providers
- Advanced features (embeddings, fine-tuning access)
Why This Architecture?
Lessons from Other Browser APIs
Geolocation succeeded because:
- Simple API (getCurrentPosition)
- User permission required
- Works regardless of GPS source
- Browser mediates, not website
WebLLM follows the same principles:
- Simple API (prompt)
- User permission required
- Works regardless of AI provider
- Extension mediates (later: browser)
Lessons from AI Integration Failures
The Notifications API had problems:
- Sites spammed users with prompts
- Low-value prompts ("Subscribe!", "Allow notifications!")
- Users learned to auto-deny
WebLLM avoids this by:
- Requiring clear value (AI that helps the page)
- Extension controls prompt frequency
- Users configure once, not per-site
The Path to Native
- Extension proves concept (now)
- Developers build with it (now)
- Browser vendors take interest (soon)
- Standards process begins (2025?)
- Native implementation ships (2026-2027?)
Apps built on WebLLM today will work when native support ships.
Getting Started
For Users
- Install the WebLLM Chrome extension
- Configure at least one provider:
  - OpenAI API key (if you have one)
  - Ollama (free, local)
  - Other supported providers
- Browse normally—AI-enabled sites will prompt for permission
For Developers
Basic integration:
<!DOCTYPE html>
<html>
  <body>
    <input id="prompt" placeholder="Ask anything..." />
    <button id="ask">Ask</button>
    <div id="response"></div>

    <script>
      document.getElementById('ask').addEventListener('click', async () => {
        const prompt = document.getElementById('prompt').value;

        if (!navigator.llm) {
          alert('Please install the WebLLM extension');
          return;
        }

        try {
          document.getElementById('response').textContent = 'Thinking...';
          const result = await navigator.llm.prompt(prompt);
          document.getElementById('response').textContent = result;
        } catch (error) {
          document.getElementById('response').textContent = `Error: ${error.message}`;
        }
      });
    </script>
  </body>
</html>
With streaming:
const stream = navigator.llm.streamPrompt(prompt);
const responseEl = document.getElementById('response');
responseEl.textContent = '';
for await (const chunk of stream) {
  responseEl.textContent += chunk;
}
With capability detection:
async function getAIResponse(prompt) {
  // Best: WebLLM
  if ('llm' in navigator) {
    return await navigator.llm.prompt(prompt);
  }

  // Fallback: your server
  const response = await fetch('/api/ai', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt }),
  });
  return response.json(); // assumes the endpoint returns the completion as a JSON string
}
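Callers get back a plain string either way (assuming your fallback endpoint returns one):
// Works the same whether or not the user has WebLLM installed
const answer = await getAIResponse('Summarize this article in one sentence');
console.log(answer);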
Comparison: WebLLM vs Alternatives
| Feature | WebLLM | Server API | transformers.js |
|---|---|---|---|
| Server needed | No | Yes | No |
| User provider choice | Yes | No | No |
| Privacy option | Yes | No | Yes |
| Model flexibility | High | High | Limited |
| Setup complexity | Low | Medium | High |
| API simplicity | navigator.llm.prompt() | fetch + parse | Import + pipeline setup |
| Cost model | User's API/free local | Your API costs | Free (bandwidth only) |
Conclusion
WebLLM represents a vision: AI as a browser capability, not a server dependency.
Today, it's an extension that provides navigator.llm. Tomorrow, it could be native browser functionality—just like geolocation, camera access, and notifications evolved from experiments to standards.
For developers, it offers the simplest path to adding AI: one API that works regardless of which AI provider users choose.
For users, it offers control: decide whether AI runs locally or in the cloud, which provider to use, and which sites get access.
The web platform keeps expanding. AI is obviously next. WebLLM is building the bridge.
