Adding AI to your web app currently means one of two things: set up server infrastructure to call AI APIs, or figure out complex browser-based ML frameworks. Neither is great.
What if adding AI was as simple as adding geolocation?
// This is what WebLLM enables
// (getCurrentPosition is callback-based, so it's promisified here for comparison)
const position = await new Promise((resolve, reject) =>
  navigator.geolocation.getCurrentPosition(resolve, reject)
); // Location
const result = await navigator.llm.prompt('Summarize this page'); // AI
WebLLM is building toward a future where AI is a native browser capability—available through a simple API, controlled by users, and working with any AI provider the user chooses.
This guide explains what WebLLM is, how it works, and why it matters for the future of web development.
The Problem WebLLM Solves
The Current State of Web AI
Option 1: Server-side AI
// Your server calls OpenAI, Anthropic, etc.
const response = await fetch('/api/chat', {
  method: 'POST',
  body: JSON.stringify({ message: userInput }),
});
Problems:
- You need server infrastructure
- You pay for API costs
- You manage API keys and rate limits
- User data leaves their device
- Latency for every request
- You're locked into whichever providers you integrate
Option 2: Client-side ML libraries
// Complex setup with transformers.js or similar
import { pipeline } from '@xenova/transformers';
// Downloads 100MB+ models
const classifier = await pipeline('sentiment-analysis');
const result = await classifier(text);
Problems:
- Large downloads for each model
- Limited to specific models
- Complex implementation
- No user choice in providers
- Different APIs for different tasks
Option 3: Native apps
Problems:
- Not the web
- Platform fragmentation
- Installation friction
- Distribution challenges
What WebLLM Provides
WebLLM introduces navigator.llm—a browser API for AI that follows established web platform patterns:
// Simple, permission-gated AI access
if ('llm' in navigator) {
  const result = await navigator.llm.prompt('Explain this concept');
  console.log(result);
}
This is what WebLLM enables:
- One API for all AI providers
- User controls which AI processes their data
- Privacy options including fully local AI
- No server needed for many use cases
- Permission-based like camera/location
How WebLLM Works
Architecture Overview
┌───────────────────────────────────────────────────────────────┐
│                         Your Web App                          │
├───────────────────────────────────────────────────────────────┤
│                       navigator.llm API                       │
├───────────────────────────────────────────────────────────────┤
│                       WebLLM Extension                        │
├───────────────┬───────────────┬───────────────┬───────────────┤
│    OpenAI     │   Anthropic   │    Ollama     │   On-Device   │
│    (Cloud)    │    (Cloud)    │    (Local)    │    (WebGPU)   │
└───────────────┴───────────────┴───────────────┴───────────────┘
Layer 1: Your Web App
- Calls navigator.llm.prompt() or similar methods
- Doesn't know or care which provider fulfills the request
- Works the same regardless of the user's AI setup
Layer 2: WebLLM API
- Provides the navigator.llm interface
- Handles permissions (like geolocation prompts)
- Routes requests to configured providers
- Currently delivered via browser extension
Layer 3: Providers
- Cloud APIs: OpenAI, Anthropic, Google, etc.
- Local servers: Ollama, LM Studio
- On-device: WebGPU-based inference
- User configures which providers to use (a sketch of this abstraction follows below)
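One way to picture Layer 3: every backend, cloud or local, boils down to the same minimal shape, which is what lets the layers above stay provider-agnostic. The sketch below is purely illustrative; the isAvailable()/complete() names, the llama3 model, and the default Ollama endpoints are assumptions, not the extension's actual internals.
// Hypothetical provider shape (illustrative only, not the extension's real code)
const ollamaProvider = {
  name: 'Ollama (Local)',
  // Reachability check: Ollama's /api/tags endpoint lists installed models
  async isAvailable() {
    try {
      const res = await fetch('http://localhost:11434/api/tags');
      return res.ok;
    } catch {
      return false;
    }
  },
  // Send a prompt to the local server and return plain text
  // (assumes a default Ollama install with a llama3 model pulled)
  async complete(promptText) {
    const res = await fetch('http://localhost:11434/api/generate', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model: 'llama3', prompt: promptText, stream: false }),
    });
    const data = await res.json();
    return data.response;
  },
};
A cloud provider such as OpenAI would expose the same two methods, just backed by an API key and an HTTPS call instead of localhost.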
The Extension: A Polyfill for the Future
Today, navigator.llm doesn't exist natively in browsers. WebLLM provides it through a Chrome extension.
// The extension injects the API
window.navigator.llm = {
  prompt: async (input) => {
    /* ... */
  },
  streamPrompt: async function* (input) {
    /* ... */
  },
  requestPermission: async () => {
    /* ... */
  },
  // ...
};
This is the same pattern used to prototype other browser APIs:
- Service Workers were tested via polyfills before shipping
- Push notifications had experimental implementations first
- WebGPU had origin trials before stable release
The extension proves the API design. When browsers eventually add native AI support, apps built on WebLLM patterns will work without changes.
The Permission Model
WebLLM follows the browser permission pattern:
// Explicit permission request (optional - also happens on first use)
const permission = await navigator.llm.requestPermission();
if (permission === 'granted') {
  // AI available
} else if (permission === 'denied') {
  // User declined
} else {
  // Permission not yet determined
}
The permission flow:
- The prompt shows which site is requesting AI access
- Clear Allow/Deny options
- The decision is remembered per site
- Access is revocable via the extension settings
This is familiar to users from camera, microphone, and location permissions.
The API
Basic Usage
// Simple prompt (non-streaming)
const response = await navigator.llm.prompt('What is the capital of France?');
console.log(response); // "The capital of France is Paris."
Streaming Responses
// Streaming for real-time display
const output = document.getElementById('output');
output.textContent = '';
const stream = navigator.llm.streamPrompt('Write a short story about a robot');
for await (const chunk of stream) {
  output.textContent += chunk;
}
System Messages and Context
// Create a session with system context
const session = await navigator.llm.createSession({
  system: 'You are a helpful coding assistant. Be concise.',
});
// Conversation maintains context
const response1 = await session.prompt('How do I center a div?');
const response2 = await session.prompt('What about vertical centering?');
// Second response understands we're still discussing CSS
Checking Capabilities
// Check if WebLLM is available
if ('llm' in navigator) {
  // Check permission status
  const status = await navigator.permissions.query({ name: 'llm' });

  if (status.state === 'granted') {
    // Ready to use
  } else if (status.state === 'prompt') {
    // Will prompt on first use
  } else {
    // Denied
  }
} else {
  // WebLLM not available (extension not installed)
}
Error Handling
try {
  const result = await navigator.llm.prompt('Hello');
} catch (error) {
  if (error.name === 'PermissionDenied') {
    // User denied AI access
  } else if (error.name === 'NoProviderAvailable') {
    // No AI providers configured
  } else if (error.name === 'ProviderError') {
    // Provider returned an error (rate limit, invalid key, etc.)
  }
}
Provider System
How Providers Work
Users configure providers in the WebLLM extension:
Extension Settings:
┌─────────────────────────────────────────┐
│  AI Providers                           │
├─────────────────────────────────────────┤
│  ✓ Ollama (Local)        [Priority: 1]  │
│      Running on localhost:11434         │
│                                         │
│  ✓ OpenAI                [Priority: 2]  │
│      API Key: sk-...                    │
│                                         │
│  ○ Anthropic             [Priority: 3]  │
│      Not configured                     │
└─────────────────────────────────────────┘
When your app calls navigator.llm.prompt():
- WebLLM checks configured providers
- Uses highest-priority available provider
- Falls back to next provider on failure
- Returns the result (your app never knows which provider answered; see the sketch below)
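A rough sketch of that routing loop, assuming each configured provider exposes the hypothetical isAvailable()/complete() shape sketched earlier (again illustrative, not the extension's actual implementation):
// Illustrative priority-based fallback routing (not the extension's real code)
async function routePrompt(providers, promptText) {
  // providers is assumed to be pre-sorted by the user's priority setting
  for (const provider of providers) {
    if (!(await provider.isAvailable())) continue;
    try {
      return await provider.complete(promptText);
    } catch (error) {
      // Rate limit, invalid key, network failure: try the next provider
      console.warn(`${provider.name} failed, falling back`, error);
    }
  }
  throw new DOMException('No AI providers are configured or reachable', 'NoProviderAvailable');
}
The calling page only ever sees the final result or a NoProviderAvailable error, matching the error names in the error-handling section above.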
Supported Providers
Cloud APIs:
- OpenAI (GPT-4, GPT-3.5)
- Anthropic (Claude)
- Google (Gemini)
- And many more...
Local/Self-hosted:
- Ollama
- LM Studio
- llama.cpp
- Any OpenAI-compatible endpoint (see the example after this list)
On-device:
- WebGPU-based inference (experimental)
- Smaller models that run entirely in browser
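"Any OpenAI-compatible endpoint" is what makes the local options cheap to support: Ollama and LM Studio both speak the same chat-completions dialect as the cloud APIs, so one adapter covers them all. Your app never makes this call itself, but here is roughly what the extension would send to a default local Ollama install (the port and model name are assumptions about the user's setup):
// Talking to a local Ollama server through its OpenAI-compatible endpoint
// (assumes Ollama on the default port with a llama3 model pulled)
const res = await fetch('http://localhost:11434/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'llama3',
    messages: [{ role: 'user', content: 'Say hello' }],
  }),
});
const data = await res.json();
console.log(data.choices[0].message.content);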
Why Provider Agnosticism Matters
For developers:
- Write once, works with any AI
- No need to support multiple APIs
- No API key management in your app
- Users bring their own AI
For users:
- Choose their preferred AI
- Use local AI for privacy
- Use different AI for different sites
- No per-app subscriptions
Privacy Model
User Control Spectrum
Full Cloud ◄──────────────────────────────► Full Local
OpenAI/Anthropic │ Hybrid │ Ollama/LM Studio │ On-device
                     │             │               │
                     │             │               └─ No network
                     │             └─ Local server, no cloud
                     └─ Local first, cloud fallback
Users choose where they are on this spectrum. Your app works regardless.
Privacy-First Use Cases
Sensitive applications:
- Health journaling
- Financial planning
- Personal notes
- Therapy aids
For these, users can configure local-only providers:
- Data never leaves their device
- No cloud API calls
- Full privacy
Your app doesn't change:
// Same code for cloud or local
const result = await navigator.llm.prompt(userInput);
Permission Visibility
Like camera permissions, users can:
- See which sites have AI access
- Revoke access per-site
- See when AI is being used (optional indicator)
- Control provider selection per-site
Current Status
What's Working Now
The Extension:
- Chrome extension available
- navigator.llm API functional
- Multiple providers supported
- Permission system working
Provider Support:
- OpenAI (GPT-4, GPT-3.5-turbo)
- Anthropic (Claude)
- Ollama (local models)
- LM Studio
- Many OpenAI-compatible endpoints
API Features:
- prompt() - Simple completion
- streamPrompt() - Streaming responses
- Session management
- Error handling
What's Experimental
- On-device inference via WebGPU
- Model download/caching
- Multi-modal support (images)
- Tool use / function calling
What's Planned
- Firefox/Safari extensions
- Native browser integration proposals
- More providers
- Advanced features (embeddings, fine-tuning access)
Why This Architecture?
Lessons from Other Browser APIs
Geolocation succeeded because:
- Simple API (getCurrentPosition)
- User permission required
- Works regardless of GPS source
- Browser mediates, not website
WebLLM follows the same principles:
- Simple API (prompt)
- User permission required
- Works regardless of AI provider
- Extension mediates (later: browser)
Lessons from AI Integration Failures
The Notifications API had problems:
- Sites spammed users with prompts
- Low-value prompts ("Subscribe!", "Allow notifications!")
- Users learned to auto-deny
WebLLM avoids this by:
- Requiring clear value (AI that helps the page)
- Extension controls prompt frequency
- Users configure once, not per-site
The Path to Native
- Extension proves concept (now)
- Developers build with it (now)
- Browser vendors take interest (soon)
- Standards process begins (2025?)
- Native implementation ships (2026-2027?)
Apps built on WebLLM today will work when native support ships.
Getting Started
For Users
- Install the WebLLM Chrome extension
- Configure at least one provider:
  - OpenAI API key (if you have one)
  - Ollama (free, local)
  - Other supported providers
- Browse normally—AI-enabled sites will prompt for permission
For Developers
Basic integration:
<!DOCTYPE html>
<html>
  <body>
    <input id="prompt" placeholder="Ask anything..." />
    <button id="ask">Ask</button>
    <div id="response"></div>

    <script>
      document.getElementById('ask').addEventListener('click', async () => {
        const prompt = document.getElementById('prompt').value;

        if (!navigator.llm) {
          alert('Please install the WebLLM extension');
          return;
        }

        try {
          document.getElementById('response').textContent = 'Thinking...';
          const result = await navigator.llm.prompt(prompt);
          document.getElementById('response').textContent = result;
        } catch (error) {
          document.getElementById('response').textContent = `Error: ${error.message}`;
        }
      });
    </script>
  </body>
</html>
With streaming:
const stream = navigator.llm.streamPrompt(prompt);
const responseEl = document.getElementById('response');
responseEl.textContent = '';
for await (const chunk of stream) {
  responseEl.textContent += chunk;
}
With capability detection:
async function getAIResponse(prompt) {
  // Best: WebLLM
  if ('llm' in navigator) {
    return await navigator.llm.prompt(prompt);
  }

  // Fallback: your server
  const response = await fetch('/api/ai', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt }),
  });
  return response.json(); // assumes the endpoint returns the completion as a JSON string
}
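Callers get back a plain string either way (assuming your fallback endpoint returns one):
// Works the same whether or not the user has WebLLM installed
const answer = await getAIResponse('Summarize this article in one sentence');
console.log(answer);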
Comparison: WebLLM vs Alternatives
| Feature | WebLLM | Server API | transformers.js |
|---|---|---|---|
| Server needed | No | Yes | No |
| User provider choice | Yes | No | No |
| Privacy option | Yes | No | Yes |
| Model flexibility | High | High | Limited |
| Setup complexity | Low | Medium | High |
| API simplicity | navigator.llm.prompt() | fetch + parse | Import + pipeline setup |
| Cost model | User's API/free local | Your API costs | Free (bandwidth only) |
Conclusion
WebLLM represents a vision: AI as a browser capability, not a server dependency.
Today, it's an extension that provides navigator.llm. Tomorrow, it could be native browser functionality—just like geolocation, camera access, and notifications evolved from experiments to standards.
For developers, it offers the simplest path to adding AI: one API that works regardless of which AI provider users choose.
For users, it offers control: decide whether AI runs locally or in the cloud, which provider to use, and which sites get access.
The web platform keeps expanding. AI is obviously next. WebLLM is building the bridge.
