Client-Side AI: The Architecture Advantage

AI doesn't have to go through your backend. Here's why that matters.

WebLLM Team

Most AI-powered web apps follow this architecture:

Browser → Your Backend → AI API → Your Backend → Browser

Four network hops. Your backend is a proxy. You're paying for servers that mostly forward requests.

There's another way:

Browser → AI Provider → Browser

Two hops. No backend (for AI at least). Client-side AI.

This isn't just simpler. It's fundamentally better for certain use cases.

The Traditional Architecture

┌─────────────────────────────────────────────────────────────┐
│                    Traditional AI Integration                │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   Browser          Backend           AI Provider            │
│   ┌─────┐         ┌─────┐           ┌─────┐                │
│   │     │ ──1──▶  │     │ ──2──▶    │     │                │
│   │     │         │     │           │     │                │
│   │     │ ◀──4──  │     │ ◀──3──    │     │                │
│   └─────┘         └─────┘           └─────┘                │
│                                                             │
│   1. User request                                           │
│   2. Backend forwards to AI                                 │
│   3. AI responds                                            │
│   4. Backend forwards to browser                            │
│                                                             │
│   For tool execution (like function calling):               │
│   Repeat 1-4 for each tool result = 8+ hops                 │
│                                                             │
└─────────────────────────────────────────────────────────────┘

What This Requires

Backend infrastructure:

  • Server to receive requests
  • API key storage and management
  • Rate limiting logic
  • Error handling
  • Logging and monitoring
  • Scaling considerations

Ongoing costs:

  • Server hosting ($50-500+/month)
  • AI API costs (varies)
  • DevOps time
  • Security maintenance

Why People Use It

  1. API key protection: Don't expose keys to browser
  2. Rate limiting: Control usage server-side
  3. Custom logic: Pre/post processing
  4. Logging: Track usage for billing/analytics

These are valid reasons. But they're not always necessary.
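Each of those responsibilities translates into real code your proxy has to carry. As a rough sketch of the kind of per-request bookkeeping involved (the in-memory limiter and its parameters are illustrative, not a production design):

```javascript
// Sketch of one piece of backend-proxy bookkeeping: a per-user rate limiter.
// A real proxy would also inject the API key, log the request, and handle
// provider errors before forwarding anything.
function makeProxyGate({ maxPerMinute = 10 } = {}) {
  const hits = new Map(); // userId -> request timestamps in the last minute

  return function allow(userId, now = Date.now()) {
    const recent = (hits.get(userId) || []).filter(t => now - t < 60_000);
    if (recent.length >= maxPerMinute) return false; // over the limit
    recent.push(now);
    hits.set(userId, recent);
    return true;
  };
}
```

All of this disappears when there is no proxy to write.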

The Client-Side Architecture

┌─────────────────────────────────────────────────────────────┐
│                    Client-Side AI Integration                │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   Browser                              AI Provider          │
│   ┌─────┐                             ┌─────┐              │
│   │     │ ─────────1───────────▶      │     │              │
│   │     │                             │     │              │
│   │     │ ◀────────2───────────       │     │              │
│   └─────┘                             └─────┘              │
│                                                             │
│   1. Request (via navigator.llm)                           │
│   2. Response                                               │
│                                                             │
│   For tool execution:                                       │
│   Tools execute in browser = no extra hops                  │
│                                                             │
└─────────────────────────────────────────────────────────────┘

What This Requires

For the developer:

  • Frontend code
  • That's it

For the user:

  • Browser AI configured (extension or native)
  • Their own AI subscription or local model

Why It's Better (When It Fits)

Lower latency:

  • 2 hops instead of 4
  • Tool execution is local (no roundtrip)
  • Streaming is more direct

Lower cost:

  • No backend infrastructure
  • No server hosting
  • User pays their own AI subscription

Better privacy:

  • Data goes directly to user's chosen provider
  • No intermediate logging
  • User controls the relationship

Simpler architecture:

  • No backend to maintain
  • No API keys to manage
  • No scaling concerns for AI

Performance Comparison

Simple Prompt

Architecture    Hops    Typical Latency
Traditional     4       500ms + AI time
Client-side     2       100ms + AI time

With Tool Execution (3 tools)

Architecture    Hops    Typical Latency
Traditional     16      2000ms + AI time
Client-side     2       100ms + AI time

Tool execution is where client-side really wins. Each tool call in the traditional architecture requires:

  1. AI returns tool request
  2. Backend receives, executes tool
  3. Backend sends result back to AI
  4. Repeat
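The hop arithmetic is simple: four hops for the base exchange, plus four more for every tool call, since steps 1-4 repeat for each tool result. As a sketch:

```javascript
// Hop count for the traditional proxy architecture: 4 hops for the base
// request/response, plus 4 more per tool call (steps 1-4 repeat for each
// tool result that has to travel browser -> backend -> provider and back).
function traditionalHops(toolCalls) {
  return 4 + 4 * toolCalls;
}
```

With three tool calls this gives 4 + 4 × 3 = 16 hops, matching the table above.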

With client-side, tools execute in the browser:

  1. AI returns tool request
  2. Browser executes it locally
  3. Result goes straight back to the AI

No backend roundtrip at any step.
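That browser-side loop can be sketched as a small local dispatcher. The tool-request shape (`{ name, args }`) is an assumption for illustration; the article's `navigator.llm` tool protocol isn't specified here:

```javascript
// Minimal sketch of a client-side tool dispatcher. Tools run entirely in
// the browser, so executing one never touches a backend.
const localTools = {
  // Example tool: flips the page theme (hypothetical page hook).
  toggleTheme: ({ dark }) => {
    document.body.classList.toggle('dark', dark);
    return { ok: true, dark };
  },
};

// Execute one tool request locally and return its result to hand back
// to the AI. Unknown tools produce a structured error instead of a throw.
function executeToolRequest(request, tools = localTools) {
  const tool = tools[request.name];
  if (!tool) return { ok: false, error: `Unknown tool: ${request.name}` };
  return tool(request.args);
}
```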

Use Cases for Client-Side AI

✓ Good Fits

UI manipulation:

// AI can change themes, layouts, preferences
const changes = await navigator.llm.prompt(
  `User wants dark mode. Return JSON for settings changes.`,
  { tools: [themeToggle, layoutChange] }
);
// Executes instantly in browser

Form assistance:

// AI helps fill forms based on context
const suggestions = await navigator.llm.prompt(
  `Help user complete shipping form: ${formContext}`,
  { tools: [fillField, validateAddress] }
);
// No server needed

Content enhancement:

// AI improves user's text
const enhanced = await navigator.llm.prompt(
  `Improve this writing: ${userText}`
);
// Direct to AI, no backend

Local data processing:

// AI analyzes data that stays in browser
const analysis = await navigator.llm.prompt(
  `Analyze this spreadsheet data: ${localData}`
);
// Data never leaves browser (if using local AI)

✗ Poor Fits

Database queries: AI needs server-side data access → use backend

Authentication flows: Need server-side validation → use backend

Multi-user coordination: Need server-side state → use backend

Sensitive operations: Need audit logging → use backend

Implementation Examples

Client-Side Chat

class AIChat {
  constructor() {
    // Feature-detect the browser AI API once, up front
    this.available = 'llm' in navigator;
  }

  async send(message, history = []) {
    if (!this.available) {
      throw new Error('Browser AI not available');
    }

    // Flatten prior turns into a plain-text transcript
    const context = history.map(h =>
      `${h.role}: ${h.content}`
    ).join('\n');

    const response = await navigator.llm.prompt(
      `${context}\nUser: ${message}\nAssistant:`
    );

    return response;
  }
}

// Usage - no backend needed
const chat = new AIChat();
const response = await chat.send("Hello!");

Client-Side Form Helper

async function smartFormAssist(form, userRequest) {
  if (!('llm' in navigator)) return;

  const formFields = getFormFields(form);

  const result = await navigator.llm.prompt(`
    User is filling a form with fields: ${JSON.stringify(formFields)}
    User says: "${userRequest}"

    Return JSON with field suggestions:
    { "fieldName": "suggestedValue", ... }
  `);

  try {
    const suggestions = JSON.parse(result);
    applyFormSuggestions(form, suggestions);
  } catch {
    console.error('Could not parse AI suggestions');
  }
}

// Helpers: read the form's named fields, and write suggestions back
function getFormFields(form) {
  return [...form.elements]
    .filter(el => el.name)
    .map(el => ({ name: el.name, value: el.value }));
}

function applyFormSuggestions(form, suggestions) {
  for (const [name, value] of Object.entries(suggestions)) {
    if (form.elements[name]) form.elements[name].value = value;
  }
}

Client-Side Search Enhancement

async function smartSearch(query, items) {
  // Try AI-enhanced search
  if ('llm' in navigator) {
    const enhanced = await navigator.llm.prompt(`
      User search: "${query}"
      Available items: ${JSON.stringify(items.slice(0, 50))}

      Return JSON array of matching item IDs, ranked by relevance.
    `);

    try {
      const ids = JSON.parse(enhanced);
      return items.filter(i => ids.includes(i.id));
    } catch {
      // Fall through to traditional search
    }
  }

  // Fallback: keyword search
  return items.filter(i =>
    i.name.toLowerCase().includes(query.toLowerCase())
  );
}

Hybrid Architecture

For many apps, the answer is both:

┌─────────────────────────────────────────────────────────────┐
│                    Hybrid Architecture                       │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   AI Tasks (client-side):        Data Tasks (backend):     │
│   • Text enhancement             • Database queries         │
│   • UI suggestions               • Authentication          │
│   • Form assistance              • Payments                │
│   • Local analysis               • Multi-user state        │
│                                                             │
│   Browser ──▶ AI Provider        Browser ──▶ Your API      │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Pattern:

  • AI stuff: client-side via navigator.llm
  • Data stuff: traditional backend
  • No backend server just for AI proxying
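The pattern can be made explicit with a tiny router that decides, per task, which path a request takes (the task names here are illustrative):

```javascript
// Sketch of the hybrid split: route each task either to the (hypothetical)
// navigator.llm API in the browser or to your own backend API.
const CLIENT_SIDE_TASKS = new Set([
  'enhance-text',  // pure text improvement, no server data
  'suggest-ui',    // theme/layout suggestions
  'assist-form',   // form filling from local context
]);

function routeTask(kind) {
  return CLIENT_SIDE_TASKS.has(kind) ? 'client' : 'backend';
}
```

Anything the set doesn't cover (database queries, auth, payments) falls through to the backend by default, which is the safe direction to fail in.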

Cost Analysis

Traditional (Backend AI Proxy)

Cost                      Monthly
Server hosting            $50-200
AI API costs              $100-1000
DevOps time (1hr/week)    $200-400
Total                     $350-1600

Client-Side (User-Powered)

Cost              Monthly
Server hosting    $0 (no AI proxy)
AI API costs      $0 (user pays)
DevOps time       $0 (nothing to maintain)
Total             $0

Real savings depend on your scale, but for many apps, client-side AI is essentially free.

When to Choose What

Choose Client-Side When:

  • AI enhances UI/UX
  • Data stays in browser
  • Users likely have AI subscriptions
  • You want simple architecture
  • Privacy is important
  • Cost optimization matters

Choose Backend When:

  • AI needs server data
  • You must control the AI provider
  • You need comprehensive logging
  • Multi-user AI coordination
  • You're the AI provider (SaaS)

Choose Hybrid When:

  • You have both use cases
  • Some features need backend data
  • Some features are pure enhancement

Conclusion

Client-side AI isn't always the right choice. But when it fits:

  • Lower latency
  • Lower cost
  • Better privacy
  • Simpler architecture

The question to ask: "Does this AI feature actually need my backend?"

If not, consider client-side.

