AI Integration Approaches: A Framework for Developers

There are 5 ways to add AI to your app. Here is how to choose.

WebLLM Team

You've decided to add AI to your web app. Now what?

There are multiple approaches, each with different tradeoffs. This guide helps you choose.

The 5 Approaches

1. Direct API Integration

Call AI providers (OpenAI, Anthropic, etc.) directly from your backend.

// Your backend (assumes an Express app with express.json() middleware)
import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_KEY });

app.post('/api/chat', async (req, res) => {
  const response = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: req.body.messages
  });
  res.json(response);
});

Best for: Quick prototypes, single-provider apps

2. AI Gateway Services

Use managed services like Vercel AI SDK, Cloudflare AI Gateway, or AWS Bedrock.

// Using Vercel AI SDK
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = await generateText({
  model: openai('gpt-4'),
  prompt: 'Hello'
});

Best for: Production apps needing reliability, caching, analytics

3. Backend Proxy

Build your own abstraction layer over multiple providers.

// Your custom proxy (schematic; see Approach 3 for a full implementation)
class AIProxy {
  async generate(prompt) {
    try {
      return await this.openai.generate(prompt);
    } catch {
      return await this.anthropic.generate(prompt); // Fallback
    }
  }
}

Best for: Custom routing logic, multi-provider fallback, cost optimization

4. Browser-Native (User-Powered)

Users bring their own AI via browser extension or native browser support.

// Frontend - no backend needed
if ('llm' in navigator) {
  const response = await navigator.llm.prompt('Hello');
}

Best for: Privacy-focused apps, cost-sensitive projects, user control

5. Local Models

Run models on-device using Ollama, LM Studio, or in-browser via WebGPU.

// Connect to local Ollama
const response = await fetch('http://localhost:11434/api/generate', {
  method: 'POST',
  body: JSON.stringify({ model: 'llama3', prompt: 'Hello' })
});

Best for: Offline capability, full privacy, air-gapped environments

Comparison Matrix

Factor               | Direct API         | AI Gateway     | Backend Proxy | Browser-Native | Local Models
Setup complexity     | Low                | Medium         | High          | Low            | Medium
Cost to you          | High               | Medium         | High          | Zero           | Zero
Privacy              | Low                | Low            | Medium        | High           | Highest
User control         | None               | None           | None          | Full           | Full
Provider flexibility | Low                | Medium         | High          | High           | Medium
Offline support      | No                 | No             | No            | Possible       | Yes
Reliability          | Provider-dependent | High (managed) | You manage    | User-dependent | You manage

Decision Tree

Do you need offline support?
├── Yes → Local Models (5)
└── No
    │
    Do users need to control their AI provider?
    ├── Yes → Browser-Native (4)
    └── No
        │
        Do you need multi-provider fallback?
        ├── Yes
        │   │
        │   Build vs Buy?
        │   ├── Build → Backend Proxy (3)
        │   └── Buy → AI Gateway (2)
        └── No → Direct API (1)

Approach 1: Direct API Integration

When to Use

  • Rapid prototyping
  • Single provider is acceptable
  • Simple use case
  • Small scale

Implementation

// Backend (Node.js/Express, with express.json() body parsing)
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic();

app.post('/api/chat', async (req, res) => {
  const message = await anthropic.messages.create({
    model: 'claude-3-sonnet-20240229',
    max_tokens: 1024,
    messages: [{ role: 'user', content: req.body.prompt }]
  });

  res.json({ response: message.content[0].text });
});
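
The frontend side of this approach is just a fetch to your own route; a minimal sketch, assuming the /api/chat endpoint defined above:

// Frontend
const res = await fetch('/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ prompt: 'Summarize this page for me.' })
});

const { response } = await res.json();
console.log(response);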

Pros

  • Simplest to implement
  • Direct access to provider features
  • Good documentation

Cons

  • Locked to one provider
  • No fallback
  • You manage everything

Approach 2: AI Gateway Services

When to Use

  • Production applications
  • Need caching, rate limiting, analytics
  • Want managed reliability
  • Multiple models/providers

Implementation

// Using Vercel AI SDK
import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { anthropic } from '@ai-sdk/anthropic';

// Easy provider switching
const model = useOpenAI ? openai('gpt-4') : anthropic('claude-3-sonnet-20240229');

const result = await streamText({
  model,
  prompt: input
});

// Streaming built-in
for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}
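
The same gateway idea also works without an SDK: proxy-style services such as Cloudflare AI Gateway are typically used by keeping your existing provider client and pointing its base URL at the gateway. A minimal sketch (the account ID and gateway name are placeholders you'd replace with your own):

// Route OpenAI calls through a gateway for caching, rate limiting, and analytics
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: 'https://gateway.ai.cloudflare.com/v1/ACCOUNT_ID/GATEWAY_NAME/openai' // placeholders
});

const completion = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'Hello' }]
});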

Pros

  • Managed infrastructure
  • Built-in streaming, caching
  • Provider abstraction
  • Good for production

Cons

  • Additional service dependency
  • May have usage limits
  • Less control than custom proxy

Approach 3: Backend Proxy

When to Use

  • Custom routing logic needed
  • Cost optimization across providers
  • Specific compliance requirements
  • Full control required

Implementation

// Custom AI proxy with fallback
class AIService {
  constructor() {
    this.providers = [
      { name: 'groq', client: new Groq(), priority: 1 },
      { name: 'openai', client: new OpenAI(), priority: 2 },
      { name: 'anthropic', client: new Anthropic(), priority: 3 }
    ];
  }

  async generate(prompt, options = {}) {
    const sorted = [...this.providers].sort((a, b) => a.priority - b.priority);

    for (const provider of sorted) {
      try {
        return await this.callProvider(provider, prompt, options);
      } catch (error) {
        console.warn(`${provider.name} failed (${error.message}), trying next...`);
      }
    }

    throw new Error('All providers failed');
  }

  async callProvider(provider, prompt, options) {
    // Provider-specific implementation
    switch (provider.name) {
      case 'groq':
        return this.callGroq(provider.client, prompt, options);
      case 'openai':
        return this.callOpenAI(provider.client, prompt, options);
      case 'anthropic':
        return this.callAnthropic(provider.client, prompt, options);
    }
  }
}
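
The priority field above is static. For the cost-optimization case mentioned under "When to Use", one option is to derive the ordering from price instead; a sketch with placeholder per-million-token figures (not real quotes):

// Order providers by an illustrative cost figure instead of a fixed priority
const providersByCost = [
  { name: 'groq', costPerMTok: 0.10 },
  { name: 'openai', costPerMTok: 5.00 },
  { name: 'anthropic', costPerMTok: 3.00 }
].sort((a, b) => a.costPerMTok - b.costPerMTok);

// Feed this ordering into AIService's fallback loop in place of `priority`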

Pros

  • Full control
  • Custom fallback logic
  • Cost optimization
  • Compliance flexibility

Cons

  • Significant engineering effort
  • You maintain everything
  • Still server-side costs

Approach 4: Browser-Native (User-Powered)

When to Use

  • Privacy is important
  • Don't want to pay AI costs
  • Users likely have AI subscriptions
  • Simple AI features (enhancement, not core)

Implementation

// Frontend - no backend AI needed
class BrowserAI {
  constructor() {
    this.available = 'llm' in navigator;
  }

  async generate(prompt, fallback = null) {
    if (this.available) {
      return await navigator.llm.prompt(prompt);
    }

    if (fallback) {
      return await fallback(prompt);
    }

    throw new Error('AI not available');
  }

  async stream(prompt) {
    if (!this.available) {
      throw new Error('AI not available');
    }

    return navigator.llm.streamPrompt(prompt);
  }
}

// Usage
const ai = new BrowserAI();

if (ai.available) {
  const response = await ai.generate('Improve this text: ' + text);
}

Pros

  • Zero cost to you
  • User controls privacy
  • Simpler architecture
  • No API key management

Cons

  • Users must have AI configured
  • Less control over model
  • Graceful degradation needed (see the sketch below)
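
A minimal sketch of that graceful degradation, assuming a hypothetical #ai-improve button and the same navigator.llm check used throughout this article:

// Hide (or reroute) AI features when no browser AI is available
const improveButton = document.querySelector('#ai-improve'); // hypothetical element

if (!('llm' in navigator)) {
  // Option 1: hide the feature entirely
  improveButton.hidden = true;

  // Option 2: keep the button but send clicks to a backend fallback instead
  // improveButton.addEventListener('click', () => callBackendAI()); // callBackendAI is hypothetical
}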

Approach 5: Local Models

When to Use

  • Offline capability required
  • Maximum privacy needed
  • Air-gapped environments
  • Specific model requirements

Implementation

// Using Ollama
class LocalAI {
  constructor(baseUrl = 'http://localhost:11434') {
    this.baseUrl = baseUrl;
  }

  async generate(prompt, model = 'llama3') {
    const response = await fetch(`${this.baseUrl}/api/generate`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model,
        prompt,
        stream: false
      })
    });

    const data = await response.json();
    return data.response;
  }

  async *stream(prompt, model = 'llama3') {
    const response = await fetch(`${this.baseUrl}/api/generate`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model,
        prompt,
        stream: true
      })
    });

    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    let buffer = '';

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      // Ollama streams newline-delimited JSON; buffer partial lines across chunks
      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split('\n');
      buffer = lines.pop(); // keep any trailing partial line for the next read

      for (const line of lines) {
        if (!line.trim()) continue;
        const data = JSON.parse(line);
        if (data.response) yield data.response;
      }
    }
  }
}
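
A usage sketch for the class above, assuming Ollama is running locally with the llama3 model pulled:

const local = new LocalAI();

// One-shot generation
const answer = await local.generate('Explain WebGPU in one sentence.');
console.log(answer);

// Token-by-token streaming
for await (const token of local.stream('Explain WebGPU in one sentence.')) {
  process.stdout.write(token);
}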

Pros

  • Works offline
  • Full privacy
  • No API costs
  • No rate limits

Cons

  • User needs to set up Ollama
  • Hardware-dependent performance
  • Limited model selection
  • No cloud fallback

Hybrid Approaches

Real applications often combine approaches:

Browser-First with Backend Fallback

async function getAIResponse(prompt) {
  // Try browser AI first (free for you)
  if ('llm' in navigator) {
    try {
      return await navigator.llm.prompt(prompt);
    } catch {
      // Fall through to backend
    }
  }

  // Fall back to your backend
  const res = await fetch('/api/ai', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt })
  });
  const data = await res.json();
  return data.response; // match the string returned by the browser path
}

Local-First with Cloud Fallback

async function generate(prompt) {
  // Try local Ollama first
  if (await isOllamaRunning()) {
    return await localAI.generate(prompt);
  }

  // Fall back to cloud
  return await cloudAI.generate(prompt);
}
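
Here localAI is a LocalAI instance from Approach 5 and cloudAI is whichever cloud client you use. The isOllamaRunning() check isn't defined above; a minimal sketch, assuming Ollama's default port and its /api/tags endpoint:

// Treat Ollama as available if its HTTP API answers within a short timeout
async function isOllamaRunning(baseUrl = 'http://localhost:11434') {
  try {
    const res = await fetch(`${baseUrl}/api/tags`, {
      signal: AbortSignal.timeout(1000) // don't hang when Ollama isn't running
    });
    return res.ok;
  } catch {
    return false;
  }
}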

Recommendations by Use Case

Use Case              | Recommended Approach
Prototype             | Direct API
Production SaaS       | AI Gateway
Privacy-focused app   | Browser-Native + Local
Cost-sensitive        | Browser-Native
Enterprise/compliance | Backend Proxy
Offline-first         | Local Models
Open source project   | Browser-Native

Conclusion

There's no single "best" approach. The right choice depends on:

  1. Privacy requirements → Browser-Native or Local
  2. Cost constraints → Browser-Native
  3. Control needs → Backend Proxy
  4. Simplicity needs → Direct API or AI Gateway
  5. Offline needs → Local Models

Most production apps benefit from hybrid approaches—browser-native for enhancement features, backend for core functionality.

